Make your editor fast: Optimize your XPath and XQuery

Optimize your XPaths for better performance

This blog post is part of a series on improving the performance of Fonto editor implementations. In our experience, XPath is often one of the biggest bottlenecks for the editor’s performance. This makes sense when you look at most modern Fonto APIs: SX configuration, mutation hooks and even custom sidebars use a lot of XPath. We will discuss the new XPath profiling infrastructure and how to use it to get the most out of the most powerful optimization in Fonto.

XPath profiler

Starting from Fonto 7.13, the Fonto editor will include tooling to diagnose some common mistakes that we see in some editors.

How to use it

The new XPath profiler is enabled by default in all Fonto editor debug builds, which is what you get when you start the editor with fdt editor run. It places two functions on the window object: startProfilingXPathPerformance and getXPathPerformanceSummary. These functions are not present in production builds created with fdt editor build.

  1. In the browser debugger console, call the startProfilingXPathPerformance function just before some process starts, for example rendering a new document by jumping to it, inserting a large table, or opening a heavy toolbar tab.
  2. Once the process is done, execute copy(getXPathPerformanceSummary()) to stop the profiling and put a comma-separated value list on the clipboard.
  3. Paste that list into your favorite spreadsheet application.
  4. ???
  5. Profit

The fourth step is where your expertise and knowledge of the specific editor and schema come in to help Fonto be as fast as it can be. Later on in this post, we will introduce buckets. In future posts, we will go into rewriting selectors to make them faster, and into leveraging attribute indices to remove heavy selectors that traverse whole documents to find elements with specific attributes.
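If a spreadsheet is not at hand, the exported summary can also be sorted with a short script. The following is a minimal sketch and not part of Fonto. It assumes the clipboard text is RFC 4180-style CSV with a header row and the four columns shown in the profile below (selector, times executed, average time, total time). The function names parseCsv and topSelectors are made up for this example, so check the column order against your own export.

```javascript
// Minimal CSV parser for RFC 4180-style quoting.
// ASSUMPTION: the exported summary follows this quoting style.
function parseCsv(text) {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"' && text[i + 1] === '"') {
        field += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one break
      row.push(field);
      rows.push(row);
      row = [];
      field = '';
    } else {
      field += ch;
    }
  }
  // Flush the last row when the text does not end with a line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

// Returns the `limit` selectors with the highest total time.
// ASSUMPTION: column order is selector, times executed, average ms, total ms.
function topSelectors(csvText, limit = 10) {
  const [, ...dataRows] = parseCsv(csvText); // skip the header row
  return dataRows
    .filter((cells) => cells.length >= 4)
    .map(([selector, timesExecuted, averageMs, totalMs]) => ({
      selector,
      timesExecuted: Number(timesExecuted),
      averageMs: Number(averageMs),
      totalMs: Number(totalMs),
    }))
    .sort((a, b) => b.totalMs - a.totalMs)
    .slice(0, limit);
}

// Three rows taken from the profile discussed in this post.
const sampleSummary = [
  'Selector,Times executed,Average time in ms,Total time spent',
  'as:is-content-chunk(.),56735,0.05415704594,3072.600001',
  '"Q{http://niso-sts-authoring-solution/authoring-schema/numbering}sections-reducer(fonto:current-hierarchy-node-id(), .)",13,161.1711538,2095.225',
  'fonto:title-content(.),13,1114.588846,14489.655',
].join('\n');

console.log(topSelectors(sampleSummary, 2));
```

Sorting by total time rather than by average time keeps cheap but very frequently executed selectors, such as as:is-content-chunk(.), in view.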

In this post, we will use a profile we extracted from one of the editors we have recently worked with. The editor uses the NISO-STS schema, which is a JATS derivative.

Test scenario: loading a document

For the first test, we perform the first render of a document by clicking on it in the outline view. This document is 950 KB on disk and contains 58 table elements. In total, this document contains about 17,000 elements.

The top ten XPaths in the initial profile look like this. Note that the last column is not part of the CSV generated by Fonto:

| Selector | Times executed | Average time in ms | Total time spent | Platform or editor |
| --- | --- | --- | --- | --- |
| fonto:title-content(.) | 13 | 1114.588846 | 14489.655 | Platform, but calls configuration code |
| let $titleQuery := fonto:metadata-property(., "title-query") return if ($titleQuery) then fontoxpath:evaluate($titleQuery, map { ".": . }) else let $titleSelector := fonto:metadata-property(., "title-selector") return if ($titleSelector) then fontoxpath:evaluate( "descendant-or-self::node()[" \|\| $titleSelector \|\| "]", map { ".": . } )//text()/string() => string-join(" ") else () | 13 | 1114.444231 | 14487.775 | Platform, but calls configuration code |
| asn:tables-reference(.) | 36 | 376.7470833 | 13562.895 | Configuration |
| as:is-content-chunk(.) | 56735 | 0.05415704594 | 3072.600001 | Configuration |
| asn:sections-reference(.) | 78 | 35.35814103 | 2757.935 | Configuration |
| as:is-structure-chunk(.) | 56708 | 0.04507856038 | 2556.315002 | Configuration |
| self::element() and fonto:is-read-only-root(.) | 18278 | 0.1272053835 | 2325.06 | Platform |
| self::*[fonto:is-on-review-route()] | 47442 | 0.04495404913 | 2132.709999 | Platform |
| (fonto:metadata-property(., "title-query"), "()")[1] => fontoxpath:evaluate(map{".": .}) | 6 | 350.43 | 2102.58 | Platform, but calls configuration code |
| Q{http://niso-sts-authoring-solution/authoring-schema/numbering}sections-reducer(fonto:current-hierarchy-node-id(), .) | 13 | 161.1711538 | 2095.225 | Configuration |

Some of these XPaths are executed by the editor platform, while others are defined by the editor configuration itself. Telling which are which is a matter of experience and familiarity with the configuration, but in general, XPaths that are defined by the application are the easiest to optimize. Most of the platform XPaths are already optimized in the most recent version of the platform, but if you do see some heavy XPaths you cannot recognize, do contact our support!

Disclaimer: the profiling tools do not make a distinction between public APIs and internals. As always, only use internals if you are ready to take the risk.

Buckets

The biggest performance optimization we do in the Fonto editor is in the Content Visualization Kit (CVK). It tries to disregard configurations for nodes that their selector cannot possibly match. This is currently based on the type and the name of the nodes that can match the selector. Selectors that explicitly mention the name of the element they match, in the form self::element-name, benefit the most from this optimization, as they are never evaluated for any other node. Use the XPath playground to discover which selectors result in which bucket.

Some examples:

| Selector | Bucket |
| --- | --- |
| self::p | name-p |
| self::div | name-div |
| self::p[@class="abc"] | name-p |
| self::* | Type-1-or-2 (in the Fonto editor, both attribute nodes and element nodes may match the self::* selector) |
| child::p | Null (any node may contain a <p /> element. We do not yet leverage schema information) |
| @class="abc" | Type-1 (any element may match this selector. We do not yet leverage schema information) |
| fonto:dita-class(., "topic/p") | Null (any element may match this selector. The class attribute in DITA may be used to indicate specialization. Also, schema information is not yet leveraged) |
| self::p or self::div | Type-1 (only elements may match this selector; this is the most specific single bucket we can use) |
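To make the idea concrete, here is a toy model of bucket pre-filtering. It is a sketch for illustration only and not the CVK implementation: the helper names bucketsForNode, isCandidate and candidatesFor are invented, nodes are plain objects, and the bucket names are lowercased versions of the ones in the table above.

```javascript
// Toy model of bucket pre-filtering. NOT Fonto's implementation.
// DOM nodeType values: 1 = element, 2 = attribute, 9 = document node.

// All buckets a node falls into (hypothetical helper).
function bucketsForNode(node) {
  const buckets = new Set([`type-${node.nodeType}`]);
  if (node.nodeType === 1) {
    buckets.add(`name-${node.localName}`);
  }
  if (node.nodeType === 1 || node.nodeType === 2) {
    buckets.add('type-1-or-2');
  }
  return buckets;
}

// A selector has to be evaluated only when it has no bucket (null)
// or when its bucket is one of the node's buckets.
function isCandidate(selectorBucket, node) {
  return selectorBucket === null || bucketsForNode(node).has(selectorBucket);
}

// Selector/bucket pairs copied from the table above.
const selectorBuckets = [
  ['self::p', 'name-p'],
  ['self::div', 'name-div'],
  ['self::p[@class="abc"]', 'name-p'],
  ['self::*', 'type-1-or-2'],
  ['child::p', null],
  ['@class="abc"', 'type-1'],
  ['self::p or self::div', 'type-1'],
];

// Selectors that still need a real XPath evaluation for this node.
function candidatesFor(node) {
  return selectorBuckets
    .filter(([, bucket]) => isCandidate(bucket, node))
    .map(([selector]) => selector);
}

const paragraph = { nodeType: 1, localName: 'p' };
const table = { nodeType: 1, localName: 'table' };
const documentNode = { nodeType: 9 };

console.log(candidatesFor(paragraph));
console.log(candidatesFor(table));
console.log(candidatesFor(documentNode));
```

In this model the name-based selectors drop out for every element with a different name, while the selector with the null bucket remains a candidate for every node, including the document node.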

Custom functions

The bucketing optimization is applied to any selector that is passed to a configureAs function. In the profile table above, a few of the selectors are passed to configureAsSheetFrame:

configureAsSheetFrame(sxModule, 'as:is-content-chunk(.)', undefined, { defaultTextContainer: 'p'});

This means that because the bucket for the selector as:is-content-chunk is null, each node was considered to be a candidate to be a sheetframe. Every node is tested against the query to verify it is indeed a sheetframe, even though in practice only a handful of nodes are actually sheetframes.

After looking into the implementation of this custom function, it turns out that the implementation was basically going over a couple of node names and testing the input with it. The node names were conveniently stored in a global array somewhere else.

We first tried replacing the function with a chain of or expressions: self::node1 or self::node2 or self::node3. This fixed part of the issue. However, the bucket is still too general, so this can be optimized further. We can do so by registering one selector per node name, so that each selector gets its own name bucket:

CHUNK_NODE_NAMES.forEach(chunk =>
	configureAsSheetFrame(
		sxModule,
		`self::${chunk}[parent::document-node()]`,
		undefined,
		{ defaultTextContainer: 'p' }
	)
);

After addressing this issue, we no longer encounter these queries in the top 100 most frequently executed XPaths, shaving about five seconds off the load time of the very large document.
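The effect of the three approaches can be estimated with a toy count of how many selector evaluations each one requires. This is a sketch under loud assumptions: the node names in CHUNK_NODE_NAMES, the synthetic document and the isCandidate helper are all made up for illustration, and the real CVK does more than this.

```javascript
// Toy count of selector evaluations for three registration strategies.
// ASSUMPTION: these node names and the document below are invented examples.
const CHUNK_NODE_NAMES = ['sec', 'app', 'front'];

// Synthetic flat "document": 3 chunk elements, 1000 paragraphs, 2000 text nodes.
const nodes = [
  ...CHUNK_NODE_NAMES.map((name) => ({ nodeType: 1, localName: name })),
  ...Array.from({ length: 1000 }, () => ({ nodeType: 1, localName: 'p' })),
  ...Array.from({ length: 2000 }, () => ({ nodeType: 3 })),
];

// Hypothetical bucket check: null matches everything, 'type-1' matches
// all elements, 'name-x' matches only elements named x.
function isCandidate(bucket, node) {
  if (bucket === null) return true;
  if (bucket === 'type-1') return node.nodeType === 1;
  return node.nodeType === 1 && bucket === `name-${node.localName}`;
}

// Number of (node, selector) pairs that still need an XPath evaluation.
function countEvaluations(registrations) {
  let evaluations = 0;
  for (const node of nodes) {
    for (const { bucket } of registrations) {
      if (isCandidate(bucket, node)) evaluations++;
    }
  }
  return evaluations;
}

const strategies = {
  // One custom function: no bucket can be derived.
  customFunction: [{ selector: 'as:is-content-chunk(.)', bucket: null }],
  // One chain of "or" expressions: only the element type is known.
  orChain: [
    {
      selector: CHUNK_NODE_NAMES.map((name) => `self::${name}`).join(' or '),
      bucket: 'type-1',
    },
  ],
  // One registration per node name: each gets its own name bucket.
  onePerName: CHUNK_NODE_NAMES.map((name) => ({
    selector: `self::${name}[parent::document-node()]`,
    bucket: `name-${name}`,
  })),
};

const counts = Object.fromEntries(
  Object.entries(strategies).map(([key, registrations]) => [
    key,
    countEvaluations(registrations),
  ])
);
console.log(counts);
```

In this toy document the custom function is evaluated once per node, the or chain once per element, and the per-name selectors only once per chunk element.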

Read only

One of the other XPaths we saw in the table above was self::*[fonto:is-on-review-route()], which uses an internal, private API to make all documents read-only. The review add-on used it to configure everything as read-only: configureAsReadOnly(sxModule, 'self::*[fonto:is-on-review-route()]'). The documentation of configureAsReadOnly states that every matching node and all of its descendants are marked as read-only. So if only the root of the document (i.e. the document node) is marked as read-only, the full content is already read-only! Moreover, the self::document-node() selector has the type-9 bucket instead of the type-1-or-2 bucket that the self::* selector has. Since most documents contain far more elements than document nodes, the bucket optimization reduces the time we need to determine the effective read-onlyness of a node: instead of going over all ancestor elements of an element, we can go directly to the document node, because those ancestors won’t have any read-onlyness set.

In the end, we changed configureAsReadOnly(sxModule, 'self::*[fonto:is-on-review-route()]') to configureAsReadOnly(sxModule, 'self::document-node()[fonto:is-on-review-route()]'). This was done on the platform in the 7.12 release.

This change removed not only this selector from the XPath overviews, but also another XPath: self::element() and fonto:is-read-only-root(.). It turned out that the fonto:is-read-only-root function called the other one. This shows that functions used in XPaths may themselves call other functions. If you do not see how an XPath with a function call can be optimized, try another one.

Wrap-up

Using these optimizations alone, we managed to improve the load time of the document we were working with by about 50% in a single day. Some improvements were made in the application code, others in the platform code. We also applied other improvements that made the editor even faster in other areas. These improvements will be highlighted in future blog posts, so stay tuned!
