How to configure Fonto for large documents?!

In this weekly series, Martin describes a question that was raised by a Fonto developer, how it was resolved and why Fonto behaved like that in the first place. This week, a change from the usual. I will cover a bigger subject that comes up a lot: how do I make Fonto edit megabytes and megabytes of structured content?

At Fonto, we see the editor being used for diverse document types. Some authors edit a single DITA topic of a few hundred characters. Some edit a single document in a custom schema that (when printed) spans a couple of pages. Some edit a document that covers an entire ISO standard. Some edit a map of tens of thousands of DITA topics describing how to maintain an aircraft. Some partners edit a single HTML document that prints to a book too heavy to lift with one hand. All of them use Fonto, and all of them have their own challenges. In this post, I want to go into a common challenge: the performance of huge documents. I will do this by going over the different phases of an edit session, explaining one by one how we have helped partners tackle the performance aspect.

Loading the editor and downloading the documents

By default, Fonto Editor loads all documents that are added to the documentsHierarchy. Four different ‘styles’ of documents are loaded in the editor, and each has its own interesting features:

Small single document: we are talking about a couple of KB at most, a screen or two worth, or about three pages when printed. These are not challenging; they can just be loaded without any issue.
Small set of documents, either in DITA or in another schema: a couple of dozen or so at the most. These are also not challenging.
Large single document: half a megabyte or more. These take (on an average 3G connection) multiple seconds to download for editing or to upload when saving. We use an approach called ‘chunking’ to split such a document up into multiple smaller parts.
Large set of documents, either in DITA or in another schema: fifty or more, with no upper bound. These take significant time to download and display, even when the author only wants to edit a few of them while keeping the whole set in scope. We use an approach we call JIT loading to prevent downloading all these documents at once.
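
As a rough illustration of how these four styles map onto the two techniques, here is a minimal sketch. The helper name and the exact thresholds are assumptions made for this example (they are loosely taken from the list above), not values that Fonto itself uses.

```javascript
// Illustrative thresholds only, loosely based on the four styles above.
const LARGE_DOCUMENT_BYTES = 512 * 1024; // "half a megabyte or more"
const LARGE_SET_COUNT = 50; // "fifty or more"

// Hypothetical helper: suggests a loading approach for a publication.
// `documentSizes` holds the size in bytes of every document in the set.
function suggestLoadingApproach(documentSizes) {
    const approaches = [];
    if (documentSizes.length >= LARGE_SET_COUNT) {
        approaches.push('jit-loading');
    }
    if (documentSizes.some((size) => size >= LARGE_DOCUMENT_BYTES)) {
        approaches.push('chunking');
    }
    // Small documents and small sets need no special treatment.
    return approaches.length > 0 ? approaches : ['load-everything'];
}

console.log(suggestLoadingApproach([2048])); // [ 'load-everything' ]
console.log(suggestLoadingApproach([800 * 1024])); // [ 'chunking' ]
console.log(suggestLoadingApproach(new Array(200).fill(4096))); // [ 'jit-loading' ]
```

The two techniques are not mutually exclusive: a large set that also contains very large documents would get both suggestions.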

Most of the subjects in this post also apply to Fonto Review and Fonto Document History. They may not edit large documents, but they will still load them. Using chunking or JIT loading will speed up the load time for those products, and speed up authors!

JIT loading

When Fonto is asked to load a big set of documents, it will by default load all of them. This takes time and memory. If an author attempts to load a few thousand documents, the browser may even refuse. To alleviate this, Fonto needs to be told to load documents Just In Time (JIT loading). Instead of always loading everything, the editor only loads visible documents and postpones loading the others until they are scrolled into view. This shortens the loading time because only the first documents (usually fewer than 10) are loaded instead of all of them.
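
To make the idea concrete, here is a minimal sketch of the general technique. It is not Fonto's implementation: `createJitLoader`, `loadVisible` and the fake `fetchDocument` callback are names invented for this example.

```javascript
// Hypothetical loader. `fetchDocument(id)` must return a Promise of the content.
function createJitLoader(fetchDocument) {
    const requestedDocuments = new Map(); // document id -> Promise of its content

    return {
        // Call this with the ids of the documents that are currently in view.
        loadVisible(visibleIds) {
            return Promise.all(
                visibleIds.map((id) => {
                    if (!requestedDocuments.has(id)) {
                        // First time this document becomes visible: request it.
                        requestedDocuments.set(id, fetchDocument(id));
                    }
                    return requestedDocuments.get(id);
                })
            );
        },
        requestedCount() {
            return requestedDocuments.size;
        }
    };
}

// Usage with a fake CMS: only the documents that come into view are requested.
const requested = [];
const loader = createJitLoader(async (id) => {
    requested.push(id);
    return `<topic id="${id}"/>`;
});

loader
    .loadVisible(['doc-1', 'doc-2', 'doc-3'])
    .then(() => loader.loadVisible(['doc-3', 'doc-4'])) // the author scrolls down
    .then(() => console.log(requested)); // [ 'doc-1', 'doc-2', 'doc-3', 'doc-4' ]
```

Documents that were requested before are served from the map, so scrolling back and forth does not trigger new downloads.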

Fonto has first-class support for DITA. For editors that edit DITA, it is very straightforward to enable JIT loading. Just set the map-manager-automatically-load-topicrefs configuration value to false:

configurationManager.set(
    'map-manager-automatically-load-topicrefs',
    false
);

For custom hierarchies, it is a bit less straightforward. The full approach is described in the guide for custom hierarchies: How to configure hierarchical multi-document management.

When you ‘just’ enable JIT loading in an editor, the outline view will look empty: many documents will be labelled as ‘untitled document’. Not loading all documents means that not all data that relates to unloaded documents is present in the editor. For example, the title of unloaded documents will not be there. When the editor uses DITA, we recommend generating the content of the navtitle element in the map document.

<map xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" id="id-bd0b95ec-b828-4520-bf34-3ee1dde307ba" xsi:noNamespaceSchemaLocation="urn:fontoxml:names:fnd:xsd:map.xsd">
    <topicref href="landing-page.xml" id="id-03c90d66-8967-4aa4-aeb8-655a4a49ce45">
        <topicmeta>
            <navtitle>Fonto Documentation</navtitle>
        </topicmeta>
        <topicref href="get-started/_index.xml" id="id-a74975ad-b95c-47c9-93d0-8cd379ea533d">
            <topicmeta>
                <navtitle>Get started</navtitle>
            </topicmeta>
            <topicref href="get-started/prepare-your-environment.xml" id="id-d6afcb3b-d03a-4219-ed23-585a99b823a7">
                <topicmeta>
                    <navtitle>Set up local dev environment</navtitle>
                </topicmeta>
            </topicref>
            <topicref href="get-started/create-a-schema-bundle.xml" id="id-daa1f6e3-9d4a-424e-dab1-16671c3f4670">
                <topicmeta>
                    <navtitle>Create a schema bundle</navtitle>
                </topicmeta>
            </topicref>
            <!-- excerpt truncated -->
        </topicref>
    </topicref>
</map>
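
A CMS could generate a map like the one above whenever topics are saved. The following is a minimal sketch under a few assumptions: the `{ href, title, children }` input shape and the function names are invented for this example, and the `id` and schema location attributes from the sample are left out for brevity.

```javascript
// Escapes characters that are not allowed in XML text or attribute values.
function escapeXml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Hypothetical CMS-side helper. `entry.title` is assumed to have been read
// from the topic the last time it was saved.
function renderTopicref(entry, indent) {
    const children = (entry.children || [])
        .map((child) => renderTopicref(child, indent + '    '))
        .join('');
    return (
        `${indent}<topicref href="${escapeXml(entry.href)}">\n` +
        `${indent}    <topicmeta>\n` +
        `${indent}        <navtitle>${escapeXml(entry.title)}</navtitle>\n` +
        `${indent}    </topicmeta>\n` +
        children +
        `${indent}</topicref>\n`
    );
}

function renderMap(entries) {
    const topicrefs = entries.map((entry) => renderTopicref(entry, '    ')).join('');
    return `<map>\n${topicrefs}</map>`;
}

console.log(
    renderMap([
        {
            href: 'get-started/_index.xml',
            title: 'Get started',
            children: [
                { href: 'get-started/create-a-schema-bundle.xml', title: 'Create a schema bundle' }
            ]
        }
    ])
);
```

Because the titles live in the map, the outline can show them without the editor having to load the topics themselves.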

JIT loading does not always work out of the box. To make JIT loading feasible, the whole structure of your publication needs to be available without loading all documents. For example, if your sub-documents contain references to their own sub-documents, all documents need to be loaded before the outline can be computed. In that case JIT loading has no benefit: all documents are loaded in advance, not just in time.

JIT loading can still work in this case. We recommend making a single map-like document that references all sub-documents in the publication. If this central document can be built on the CMS, the editor can compute the outline of the publication without loading all documents.
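
Building such a central document on the CMS comes down to following the nested references once, on the server. A minimal sketch, assuming a hypothetical `getChildIds(id)` lookup that returns the ids of the sub-documents a document references:

```javascript
// Hypothetical CMS-side helper that resolves nested references into one tree.
function buildCentralOutline(id, getChildIds, ancestors = []) {
    if (ancestors.includes(id)) {
        throw new Error(`Circular reference to ${id}`);
    }
    const path = [...ancestors, id];
    return {
        id,
        children: getChildIds(id).map((childId) =>
            buildCentralOutline(childId, getChildIds, path)
        )
    };
}

// Usage: the references are spread over several documents in CMS storage.
const references = {
    publication: ['chapter-1', 'chapter-2'],
    'chapter-1': ['section-1.1'],
    'chapter-2': [],
    'section-1.1': []
};
const outline = buildCentralOutline('publication', (id) => references[id] || []);
console.log(JSON.stringify(outline, null, 2));
```

The resulting tree can be serialized as the single map-like document, so the editor only needs that one file to draw the outline.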

Chunking

But what if the publication does not consist of many smaller documents? Loading a single huge document is a common challenge for a number of Fonto partners. A single large document takes time to load, takes time to save and can only be locked as a whole: multiple authors cannot collaborate in the same document.

Chunking is an approach we use to split up a single document into many smaller documents when it is loaded in the editor, and to merge them back into a single one when saving. We have done this with the NISO-STS schema and others. Most schemata have some kind of division or section-like element, so this approach should work for most of them.

When Fonto Editor (or Fonto Document History, or Fonto Review) requests a document, the CMS responds with a single catalog-like XML file, akin to a DITA map. This document contains references to the ‘divisions’ of the main document, like DITA topics. These have references like ‘<main document id>/<division id>’. When Fonto then requests a division document, the CMS retrieves the main document, finds the corresponding division and responds with that. When Fonto saves a division, the CMS replaces the old division with the new one.
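
The CMS side of this flow can be sketched as follows. This is an in-memory illustration only: the storage model and all function names are assumptions made for this example, and a real CMS would locate the divisions inside the stored XML instead of keeping them in a list.

```javascript
// In-memory stand-in for the CMS: each main document is an ordered list of divisions.
const mainDocuments = new Map([
    [
        'standard-1',
        [
            { id: 'sec-1', xml: '<sec><title>Scope</title></sec>' },
            { id: 'sec-2', xml: '<sec><title>Normative references</title></sec>' }
        ]
    ]
]);

// Splits '<main document id>/<division id>' on the last slash.
function parseChunkId(chunkId) {
    const separator = chunkId.lastIndexOf('/');
    if (separator === -1) {
        throw new Error(`Not a division reference: ${chunkId}`);
    }
    return { mainId: chunkId.slice(0, separator), divisionId: chunkId.slice(separator + 1) };
}

function findDivision(chunkId) {
    const { mainId, divisionId } = parseChunkId(chunkId);
    const divisions = mainDocuments.get(mainId) || [];
    const division = divisions.find((candidate) => candidate.id === divisionId);
    if (!division) {
        throw new Error(`Unknown division: ${chunkId}`);
    }
    return division;
}

// The catalog only lists references, so it stays small even for huge documents.
function getCatalog(mainId) {
    return (mainDocuments.get(mainId) || []).map((division) => `${mainId}/${division.id}`);
}

// Loading a division: find it in the main document and respond with just that part.
function getDivision(chunkId) {
    return findDivision(chunkId).xml;
}

// Saving a division: replace the old division with the new one.
function saveDivision(chunkId, xml) {
    findDivision(chunkId).xml = xml;
}

// The main document is the divisions merged back together, in order.
function getMergedDocument(mainId) {
    return (mainDocuments.get(mainId) || []).map((division) => division.xml).join('\n');
}

console.log(getCatalog('standard-1')); // [ 'standard-1/sec-1', 'standard-1/sec-2' ]
saveDivision('standard-1/sec-1', '<sec><title>Scope and purpose</title></sec>');
console.log(getMergedDocument('standard-1'));
```

Splitting on the last slash is a choice made for this sketch, so that main document ids may themselves contain slashes.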

To help you implement this custom hierarchy, we have set up a guide that describes how to load sub-documents: How to configure hierarchical multi-document management.

When the document state endpoint, the locking endpoints and the other document-related endpoints are set up to work with sub-documents instead of the main one, other approaches, such as JIT loading, can be used to speed up the editor even more! Furthermore, authors can collaborate, since they are able to lock the document one division at a time.
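
Locking per division can be illustrated with a small lock table keyed on the division reference. This sketch does not show the contract of Fonto's locking endpoints; `createLockTable`, `acquire` and `release` are hypothetical names for what a CMS might do internally.

```javascript
// Hypothetical CMS-side lock table: one lock per division, not per main document.
function createLockTable() {
    const holders = new Map(); // division reference -> author holding the lock

    return {
        // Returns true when `author` holds the lock after this call.
        acquire(chunkId, author) {
            const holder = holders.get(chunkId);
            if (holder !== undefined && holder !== author) {
                return false; // somebody else is editing this division
            }
            holders.set(chunkId, author);
            return true;
        },
        // Only the author holding the lock can release it.
        release(chunkId, author) {
            if (holders.get(chunkId) === author) {
                holders.delete(chunkId);
            }
        }
    };
}

// Two authors work in the same main document, each in their own division.
const locks = createLockTable();
console.log(locks.acquire('standard-1/sec-1', 'alice')); // true
console.log(locks.acquire('standard-1/sec-2', 'bob')); // true
console.log(locks.acquire('standard-1/sec-1', 'bob')); // false
```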

Example of a chunked document:

<as:sec-ref as:href="iec/iec62481-1-1{ed3.0}en/0002-body/0001-sec.xml">
  <as:outline>
    <label>1</label>
    <title>Scope</title>
  </as:outline>
</as:sec-ref>
<as:sec-ref as:href="iec/iec62481-1-1{ed3.0}en/0002-body/0002-sec.xml">
  <as:outline>
    <label>2</label>
    <title>Normative references</title>
  </as:outline>
</as:sec-ref>
<as:sec-ref as:href="iec/iec62481-1-1{ed3.0}en/0002-body/0003-sec.xml">
  <as:outline>
    <label>3</label>
    <title>Terms, definitions, symbols, abbreviated terms and conventions</title>
  </as:outline>
  <as:sec-ref as:href="iec/iec62481-1-1{ed3.0}en/0002-body/0003-sec/0001-sec.xml">
    <as:outline>
      <label>3.1</label>
      <title>Terms and definitions</title>
    </as:outline>
    <as:term-sec-ref as:href="iec/iec62481-1-1{ed3.0}en/0002-body/0003-sec/0001-sec/0001-term-sec.xml">
      <as:outline>
        <label>3.1.1</label>
        <tbx:term id="ter-adu">ADU</tbx:term>
      </as:outline>
    </as:term-sec-ref>
    <!-- excerpt truncated -->
  </as:sec-ref>
</as:sec-ref>

The single document hierarchy add-on provides a kind of soft-chunking, which optimizes rendering for large documents but retains the performance overhead of dealing with large file sizes. If possible, using full CMS-side chunking is preferred.

Editing the document

Just loading the content is not enough; authors will also need to edit it. These edits cause changes in the XML, which in turn cause (parts of) the rendered document to rerender.

Computing what to rerender can take time, and the author will notice that time as lag. We have written a number of blog posts that describe how to optimise this in various ways:

Optimize your XQuery: https://www.fontoxml.com/blog/make-your-editor-fast-optimize-your-xpaths/
Improving typing behavior: https://www.fontoxml.com/blog/fonto-why-how-why-is-typing-so-slow/
Respond to XML changes: using indices: https://www.fontoxml.com/blog/observing-query-results/

These blog posts are a starting point to address and prevent performance issues in your editor.

Searching the document using find and replace

When working with a large set of documents (especially with JIT loading implemented), it becomes even more important for an author to be able to search through them. The find and replace add-on supports an optional presearch endpoint, which avoids having to search all documents on the client side. Implementing this endpoint greatly speeds up search.
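
The core of such a server-side pre-filter could look like the sketch below. It deliberately does not show the request or response format of the presearch endpoint, which is described in the add-on's documentation; `findCandidateDocuments` and the `{ id, xml }` document shape are assumptions made for this example.

```javascript
// Hypothetical server-side pre-filter: returns the ids of the documents whose
// text contains the query, so only those need to be searched in the editor.
function findCandidateDocuments(documents, query) {
    const needle = query.toLowerCase();
    return documents
        .filter((doc) => {
            // Crude text extraction: drop the tags, keep the text between them.
            const text = doc.xml.replace(/<[^>]*>/g, '').toLowerCase();
            return text.includes(needle);
        })
        .map((doc) => doc.id);
}

const documents = [
    { id: 'topic-a', xml: '<p>Check the <b>fuel</b> pump</p>' },
    { id: 'topic-b', xml: '<p>Landing gear</p>' }
];
console.log(findCandidateDocuments(documents, 'fuel pump')); // [ 'topic-a' ]
```

A pre-filter like this may report a few documents too many (for example, when text from two adjacent paragraphs is glued together), which is acceptable as long as it never misses a document: the exact matching still happens in the editor.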

Conclusion

This was an introduction to the most common ways in which Fonto can load large documents. We have partners with authors who edit tens of thousands of documents, whole books, and documents ranging into hundreds of megabytes of content! With the right optimizations and the right architecture, the sky is the limit!

I hope this explains how Fonto works and why it works like that. Over the years we have built quite the product, and we are aware that some parts work in unexpected ways for those who have not been with it from the start. If there are any parts of Fonto you would like us to focus on, we are always ready and willing to share! Reach out on Twitter to Martin Middel or file a support issue!

Martin Middel

Developer advocate / Evangelist. Has been with Fonto since it all began in 2013. He’s currently designing the next steps in Fonto Developer APIs with the input of our valuable partners.

In his spare time, Martin is an avid home brewer.
