Fonto: Why & How: My XPath is acting up?!

In this weekly series, Martin describes a question raised by a Fonto developer, how it was resolved, and why Fonto behaved that way in the first place. This week, a developer was confused about why an XPath looking for the closest ancestor kept returning the document node!

Let’s look at the code. It is a transform that searches for the first ancestor of a node that matches a certain query: the setContextNodeIdToSelectionAncestor transform. The developer was using the nightly version of the platform, which is known to be unstable at times. At the time, the transform looked like this:

const commonAncestorContainer =
	selectionManager.getCommonAncestorContainer() as FontoNode<'readable'> | null;
if (commonAncestorContainer) {
	let selector: XQExpression;
	if (stepData.selectionAncestorMatchesSomeOf) {
		selector = makeOrExpression(
			stepData.selectionAncestorMatchesSomeOf
		);
	} else {
		selector = stepData.selectionAncestorNodeSpec;
	}
	const matchingAncestor = evaluateXPathToFirstNode(
		xq`ancestor-or-self::node()[${selector}]`,
		commonAncestorContainer,
		readOnlyBlueprint
	);

	if (matchingAncestor) {
		stepData.contextNodeId = getNodeId(matchingAncestor);
	} else {
		stepData.contextNodeId = null;
	}
}

return stepData;

One thing that stands out is the use of the brand-new xq API, which was introduced in version 7.17 of the editor and will become more powerful in version 8.0. It stands out because a selector is interpolated into the XQExpression: this transform was recently refactored to allow the use of an XQExpression.

The selector is odd, because in the developer's case it should never match the document node: it was set to 'self::p'. So how is the document node matching that?!

We need to understand how the xq function works. It can interpolate sub-expressions (typed as XQExpression), strings (typed as string), numbers and more. Because the selector was a string, it was interpolated as a string literal, building this expression: ancestor-or-self::node()['self::p']. Note the apostrophes around self::p. They should not be there: every non-empty string in XPath has an effective boolean value of true, so the query is effectively ancestor-or-self::node()[true()]. This is why the document node passes the filter.
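
To make the difference concrete, here is a minimal TypeScript sketch of a template tag that treats interpolated values the way described above: strings become quoted string literals, while sub-expressions are spliced in as query text. `XQExpressionSketch` and `xqSketch` are hypothetical stand-ins, not Fonto's actual `xq` implementation, and wrapping sub-expressions in parentheses is an assumption made to keep operator precedence intact.

```typescript
// Hypothetical stand-in for an XQExpression; not the real Fonto type.
class XQExpressionSketch {
	readonly source: string;
	constructor(source: string) {
		this.source = source;
	}
}

type Interpolation = XQExpressionSketch | string;

// Turn a JavaScript string into an XPath string literal.
// XPath escapes an apostrophe inside a '...' literal by doubling it.
function toStringLiteral(value: string): string {
	return `'${value.replace(/'/g, "''")}'`;
}

// Hypothetical template tag: sub-expressions are inlined (wrapped in
// parentheses), strings become quoted string literals.
function xqSketch(
	strings: TemplateStringsArray,
	...values: Interpolation[]
): XQExpressionSketch {
	let source = strings[0];
	values.forEach((value, index) => {
		if (value instanceof XQExpressionSketch) {
			source += `(${value.source})`;
		} else {
			source += toStringLiteral(value);
		}
		source += strings[index + 1];
	});
	return new XQExpressionSketch(source);
}

// A string selector ends up as a string literal in the predicate...
const fromString = xqSketch`ancestor-or-self::node()[${'self::p'}]`;
console.log(fromString.source); // ancestor-or-self::node()['self::p']

// ...while a sub-expression is used as actual query text.
const fromExpression = xqSketch`ancestor-or-self::node()[${xqSketch`self::p`}]`;
console.log(fromExpression.source); // ancestor-or-self::node()[(self::p)]
```

The first query filters on a constant that is always true; the second filters on an actual node test.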

But wait: the XPath specification defines ancestor-or-self as a reverse axis, so positions along it are counted in reverse document order, starting from the context node. After all, ancestor-or-self::*[1] returns the closest ancestor! We are passing in a text node somewhere deep in the document, so how is Fonto getting to the document node?!

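However, the XPath specification also states that the result of any path expression is in document order. In the expression ancestor-or-self::*[1], the [1] binds to the axis step, so it counts along the reverse axis. But if you rewrite it as (ancestor-or-self::*)[1], the first node in document order is returned: the furthest ancestor. Taking the first node of the path's result, as evaluateXPathToFirstNode does, behaves the same way. See it for yourself on our XPath playground!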
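
The two orderings can be modelled without an XPath engine. The TypeScript sketch below walks a hypothetical four-node ancestor chain (document node, `book`, `p`, text node); it is a model of the ordering rules only, not how FontoXPath evaluates queries. A predicate that always holds stands in for the accidental `['self::p']` filter.

```typescript
// Minimal node model: only what is needed to walk up the tree.
interface SketchNode {
	name: string;
	parent: SketchNode | null;
}

// Nodes on the ancestor-or-self axis, in axis order: because it is a
// reverse axis, the context node comes first and the root comes last.
function ancestorOrSelf(node: SketchNode): SketchNode[] {
	const nodes: SketchNode[] = [];
	for (let current: SketchNode | null = node; current !== null; current = current.parent) {
		nodes.push(current);
	}
	return nodes;
}

// Models ancestor-or-self::node()[pred][1]: the positional predicate
// belongs to the step, so it counts along the axis (closest first).
function firstInStep(node: SketchNode, pred: (n: SketchNode) => boolean): SketchNode | null {
	return ancestorOrSelf(node).filter(pred)[0] ?? null;
}

// Models taking the first item of ancestor-or-self::node()[pred]: the
// path result is in document order, which on one ancestor chain is root first.
function firstInDocumentOrder(node: SketchNode, pred: (n: SketchNode) => boolean): SketchNode | null {
	return ancestorOrSelf(node).filter(pred).reverse()[0] ?? null;
}

const documentNode: SketchNode = { name: '#document', parent: null };
const book: SketchNode = { name: 'book', parent: documentNode };
const paragraph: SketchNode = { name: 'p', parent: book };
const text: SketchNode = { name: '#text', parent: paragraph };

const alwaysTrue = (): boolean => true; // what ['self::p'] effectively does
const isP = (n: SketchNode): boolean => n.name === 'p'; // what [self::p] does

console.log(firstInDocumentOrder(text, alwaysTrue)?.name); // #document (the bug)
console.log(firstInStep(text, isP)?.name); // p (the intended result)
```

Both mistakes compound: the always-true filter lets every ancestor through, and document order puts the document node first.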

Fixing it

To fix this issue, we used a (for now internal) API to force the argument to the xq function to be an XQExpression. We also changed the expression to xq`ancestor-or-self::node()[${selector}][1]`, so the positional predicate is part of the step and selects the closest match. This solved both issues: the filter that was not being applied, and the furthest node being returned. But fixing it is not enough. This was an honest mistake in using our APIs, and better APIs should prevent it!
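
A guard of that kind could look roughly like the sketch below. The internal Fonto API is not public, so `XQExpressionSketch`, `requireExpression` and `closestAncestorQuery` are hypothetical names. The sketch rejects plain strings at runtime (step data often comes from untyped JSON, where the compiler cannot help) and, assuming the fixed query adds a positional `[1]` to the step, places it inside the step so the closest match wins.

```typescript
// Hypothetical stand-in for an XQExpression; not the real Fonto type.
class XQExpressionSketch {
	readonly source: string;
	constructor(source: string) {
		this.source = source;
	}
}

// Hypothetical guard: untyped step data can smuggle in a plain string,
// so check at runtime instead of relying on the type checker.
function requireExpression(value: unknown): XQExpressionSketch {
	if (!(value instanceof XQExpressionSketch)) {
		throw new TypeError(
			`Expected an XQExpression, got ${typeof value}: ${JSON.stringify(value)}`
		);
	}
	return value;
}

// Builds the closest-ancestor query. The [1] sits inside the step, so it
// counts along the reverse axis and selects the closest matching node.
function closestAncestorQuery(selector: unknown): string {
	const expression = requireExpression(selector);
	return `ancestor-or-self::node()[${expression.source}][1]`;
}

console.log(closestAncestorQuery(new XQExpressionSketch('self::p')));
// ancestor-or-self::node()[self::p][1]

try {
	closestAncestorQuery('self::p');
} catch (error) {
	console.log((error as Error).message);
	// Expected an XQExpression, got string: "self::p"
}
```

Failing loudly on a string is a design choice: a silent always-true filter is much harder to track down than an exception.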

Besides adding the usual unit test that everyone should always write when fixing a bug, we did one more thing to prevent this from ever happening again. We now detect these errors: we parse the XPath using the parseScript API from FontoXPath, find where a token from a string interpolation ends up, check whether its direct parent is an <xqx:predicate/> element, and if so, push a warning like this:

The interpolated string value "self::p" in the query "ancestor-or-self::node()["self::p"]" was used directly in a filter expression. This could be an error. If you meant to use this as a sub-expression, make sure the value is an XQExpression (using the xq template tag) rather than a string. If it was meant to be a string, surround the interpolation with an explicit string cast to disable this warning: "xq`self::*[xs:string(my interpolation)]`".
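
The check described above can be sketched on a simplified syntax tree. FontoXPath's parseScript returns an XQueryX document; to keep this example self-contained, the trees below are written out by hand with a hypothetical shape, so the element names are illustrative rather than exact XQueryX. Marking literals with a `fromInterpolation` flag is also an assumption about how interpolated tokens could be tracked.

```typescript
// Simplified, hand-built syntax tree node; element names are illustrative.
interface AstNode {
	name: string;
	children: AstNode[];
	value?: string;
	// Set for string literals that came from a template interpolation.
	fromInterpolation?: boolean;
}

// Walk the tree and warn for every interpolated string literal whose
// direct parent is a predicate.
function findSuspiciousInterpolations(root: AstNode, query: string): string[] {
	const warnings: string[] = [];
	const visit = (node: AstNode, parent: AstNode | null): void => {
		if (
			node.name === 'stringConstantExpr' &&
			node.fromInterpolation === true &&
			parent !== null &&
			parent.name === 'predicate'
		) {
			warnings.push(
				`The interpolated string value "${node.value}" in the query "${query}" was used directly in a filter expression.`
			);
		}
		for (const child of node.children) {
			visit(child, node);
		}
	};
	visit(root, null);
	return warnings;
}

// Hand-built tree for ancestor-or-self::node()["self::p"]
const tree: AstNode = {
	name: 'pathExpr',
	children: [
		{
			name: 'stepExpr',
			children: [
				{ name: 'xpathAxis', value: 'ancestor-or-self', children: [] },
				{ name: 'anyKindTest', children: [] },
				{
					name: 'predicate',
					children: [
						{ name: 'stringConstantExpr', value: 'self::p', fromInterpolation: true, children: [] },
					],
				},
			],
		},
	],
};

console.log(findSuspiciousInterpolations(tree, 'ancestor-or-self::node()["self::p"]'));

// With an explicit xs:string(...) cast, the literal's parent is the function
// call rather than the predicate, so no warning is produced.
const castTree: AstNode = {
	name: 'predicate',
	children: [
		{
			name: 'functionCallExpr',
			value: 'xs:string',
			children: [{ name: 'stringConstantExpr', value: 'p', fromInterpolation: true, children: [] }],
		},
	],
};
console.log(findSuspiciousInterpolations(castTree, 'self::*[xs:string("p")]').length); // 0
```

Checking only the direct parent keeps the rule narrow: a string that is compared, cast, or passed to a function is left alone.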

This warning started popping up in a few places in the platform. We fixed each of them and wrote an automated test that prevents them from coming back. The exact workings of this detection script will be covered in a future blog post. If you have any other great ideas for common linter-like warnings for XPath and XQuery in Fonto, hit us up! We are always interested in feedback!

I hope this explained how Fonto works and why it works like that. Over the years we have built quite the product, and we are aware that some parts work in unexpected ways for those who have not been with it from the start. If there are parts of Fonto you would like us to focus on, we are always ready and willing to share! Reach out to Martin Middel on Twitter (@dr_rataplan) or file a support issue!
