How do I generate IDs in a more predictable way?! Fonto Why & How

In this weekly series, Martin describes a question that was raised by a Fonto developer, how it was resolved and why Fonto behaves like that in the first place. This week, a support question came in asking how to get more control over ID generation!

As usual, it started with a support question! This one read as follows:

We are editing XML documents that have IDs. The standard UUIDs that get generated are okay, but it would be better if we can generate IDs that conform to our customer’s ID format.
In the past we’ve tried to generate IDs when we create elements. But when elements are created outside of these operations (e.g. splitting a para), we got complaints that they were duplicating the @code of the existing para. So I switched it over to the unique ID configuration, but now we get complaints that they are in the wrong format.
Could you help us out here? Maybe we could inject a function in the ID generation feature?
There may be a way via existing APIs (maybe mutation hooks?) but I have not had any success so far. It is crucial that I don’t miss a single scenario or edge case where a new lid/code would need to be generated, so using your built-in unique ID system has been the safest option for me so far.
A Fonto Partner

That’s a cool question! Especially since it already pointed at our MutationHook API, which is a very powerful one. This API lets you hook into operations that edit XML in a very generic way: you can react to any change, such as new nodes being inserted or existing nodes being removed.
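To make the "react to any change" idea concrete, here is a small TypeScript sketch of the general pattern. It is a hypothetical model, not Fonto's implementation: `ValueHook`, `valueQuery`, `equals` and `onEvent` are names made up for this example. After every edit the hook recomputes a value for the thing it watches, and it only calls its callback when that value differs from the last one it saw.

```typescript
// A minimal, hypothetical model of the mutation-hook pattern; not Fonto's API.
// After every edit the hook recomputes a value and calls its callback only
// when that value differs from the last value it saw.
type HookCallback<V> = (previousValue: V | null, currentValue: V) => void;

class ValueHook<S, V> {
  private previous: V | undefined = undefined;
  private hasValue = false;

  constructor(
    private readonly valueQuery: (state: S) => V,
    private readonly equals: (a: V, b: V) => boolean,
    private readonly onEvent: HookCallback<V>,
  ) {}

  // Call this after every change to the document state.
  notify(state: S): void {
    const current = this.valueQuery(state);
    if (this.hasValue && this.equals(this.previous as V, current)) {
      return; // Unchanged value: the callback is skipped.
    }
    // The first notification reports "no previous value" as null.
    const previous = this.hasValue ? (this.previous as V) : null;
    this.previous = current;
    this.hasValue = true;
    this.onEvent(previous, current);
  }
}
```

The equality check is what keeps the callback from running on unrelated edits. In the setup in this post, that job is done by the attribute index, which only invalidates when the result of a lookup actually changes.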

For this use case, what we actually want to know is whether IDs are colliding. If they are, we should regenerate the ID of one of the two occurrences. By using an attribute index, we can quickly look up elements by their ID. This is another API that is very powerful when used correctly. Because it invalidates very sparingly (only when the return value of a call actually changes), it can be used in mutation hooks, which run every time they are invalidated. Attribute indices also fix a common and significant performance issue that some editors run into when they contain cross-reference functionality. Used in the right place, attribute indices can replace //*[@id = "something"] queries, which have to walk the whole document, with a constant-time lookup of those elements.
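To illustrate the difference between a lookup and a scan, here is a small TypeScript sketch. It models an ID index as a map from each ID value to the set of elements carrying it. `SketchElement`, `IdIndex` and `scanForId` are hypothetical names for this example only; they are not part of Fonto's `addAttributeIndex` API, which also takes care of invalidation for you.

```typescript
// Hypothetical in-memory model of an element; not a Fonto type.
interface SketchElement {
  name: string;
  id: string | null;
}

// A minimal sketch of an attribute index: ID value -> elements carrying it.
// Lookups are a hash-map access instead of a walk over the whole document.
class IdIndex {
  private byId = new Map<string, Set<SketchElement>>();

  add(element: SketchElement): void {
    if (element.id === null) return;
    let bucket = this.byId.get(element.id);
    if (bucket === undefined) {
      bucket = new Set<SketchElement>();
      this.byId.set(element.id, bucket);
    }
    bucket.add(element);
  }

  remove(element: SketchElement): void {
    if (element.id === null) return;
    const bucket = this.byId.get(element.id);
    if (bucket === undefined) return;
    bucket.delete(element);
    if (bucket.size === 0) this.byId.delete(element.id);
  }

  // Similar in spirit to //*[@xml:id = $id], without walking the tree.
  lookup(id: string): SketchElement[] {
    return Array.from(this.byId.get(id) ?? []);
  }

  // An ID collides when more than one element carries it.
  hasCollision(id: string): boolean {
    return (this.byId.get(id)?.size ?? 0) > 1;
  }
}

// Linear-time alternative for comparison: checks every element, every time.
function scanForId(all: SketchElement[], id: string): SketchElement[] {
  return all.filter((element) => element.id === id);
}
```

Both approaches return the same elements; the difference is that the index pays the cost once, when elements are added or removed, instead of on every query.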

Let’s follow the steps in the guide on how to create a mutation hook:

Creating the required files:
- I used a random editor I often use for testing, which edits documents in the DocBook schema. It already contained the required install.ts and hooks.xqm files. If you’re following along, you might already have files you can start from.
The interesting part of the second step is the actual wiring of the callback to the hook, especially the valueQuery. If we make the valueQuery use an attribute index to scan for collisions, we can make it perform the way we want.
- So we need to make an attribute index. Because DocBook stores IDs in the xml:id attribute (the id attribute in the XML namespace), we should configure the index like this:

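// Index the xml:id attribute (the id attribute in the XML namespace) and expose
// lookups as the XPath function Q{http://www.fontoxml.com/app/docbook}id
addAttributeIndex(
    'http://www.w3.org/XML/1998/namespace',
    'id',
    'http://www.fontoxml.com/app/docbook',
    'id'
);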

From there, the call to addMutationHook can be like this:

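addMutationHook({
    // Watch every simpara that carries an xml:id attribute
    selector: 'self::simpara[@xml:id]',
    // Look up all elements sharing this element's ID through the attribute index
    valueQuery: 'array{Q{http://www.fontoxml.com/app/docbook}id(@xml:id)}',
    expectedResultType: ReturnTypes.ARRAY,
    // The XQuery function to call when the result of valueQuery changes
    onEvent: {
        functionLocalName: 'on-index-collision',
        namespaceURI: 'http://www.fontoxml.com/app/docbook'
    }
});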

In that call, the valueQuery returns an array(*) rather than a node()*. We do that because in JavaScript there is no one-to-one translation between an XPath sequence of nodes and an array of nodes. To make the behaviour more predictable, we assume that, if multiple values are returned, the JavaScript equivalent is an array. If an array of multiple items is later passed back to XPath, we create an XPath array from it. Wrapping the lookup result in array{...} therefore gives the hook the same shape of value whether the index finds zero, one or several elements. Furthermore, if there’s no id set on the element, we just return an array of the current value.
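Here is a small TypeScript sketch of why the wrapping helps. The conversion rule in `sequenceToJs` is an assumption that mirrors the paragraph above (no items become `null`, one item becomes that item, several items become an array); it is not Fonto's actual conversion code, and both function names are hypothetical.

```typescript
// Hypothetical conversion of a plain XPath sequence into a JS value.
// Assumption: empty -> null, one item -> the item, several -> a JS array.
function sequenceToJs<T>(sequence: T[]): T | T[] | null {
  if (sequence.length === 0) return null;
  if (sequence.length === 1) return sequence[0];
  return sequence;
}

// Hypothetical conversion of an XPath array (array{...}) into a JS value:
// the members always arrive as a JS array, however many there are.
function arrayToJs<T>(xpathArrayMembers: T[]): T[] {
  return [...xpathArrayMembers];
}
```

With the first shape, a hook comparing previous and current values has to handle `null`, a single node and an array of nodes separately. With the second, it always compares two arrays.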

Finally, the hook! It can simply check for collisions using the data it is passed and, if there is one, give the newly added occurrence a fresh ID:

declare %updating %public function db:on-index-collision ($event-type as xs:string, $node as node(), $previous-value as array(*)?, $current-value as array(*)?) {
    (: Nothing to do if the index found no elements or the node has no ID :)
    if (empty(array:flatten($current-value)) or not($node/@xml:id)) then
        ()
    else
        (: Every element in the current lookup result that was not in the previous one gets a fresh ID :)
        for $colliding-node in array:flatten($current-value) except array:flatten($previous-value)
        return
            replace value of node $colliding-node/@xml:id with db:generate-id()
};
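For readers more at home in JavaScript, here is the same logic as a TypeScript sketch. `IdNode` and `onIndexCollision` are hypothetical stand-ins, and object identity plays the role of XQuery's node identity in `except`. Like the XQuery, it regenerates the ID of every element that appears in the current lookup result but not in the previous one; note that when the previous value is empty, that includes every element in the result.

```typescript
// Hypothetical stand-in for an element with an optional xml:id attribute.
interface IdNode {
  xmlId: string | null;
}

// Mirrors the XQuery hook: elements that are new in the lookup result
// (current minus previous) get a freshly generated ID.
function onIndexCollision(
  node: IdNode,
  previousValue: IdNode[] | null,
  currentValue: IdNode[] | null,
  generateId: () => string,
): void {
  // Same guard as the XQuery: nothing found, or the node has no ID at all.
  if (currentValue === null || currentValue.length === 0 || node.xmlId === null) {
    return;
  }
  // Object identity stands in for node identity in XQuery's `except`.
  const previous = new Set<IdNode>(previousValue ?? []);
  for (const newcomer of currentValue) {
    if (!previous.has(newcomer)) {
      newcomer.xmlId = generateId();
    }
  }
}
```

For example, after splitting a paragraph, the original paragraph's lookup goes from `[original]` to `[original, copy]`, so only the copy receives a new ID.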

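The db:generate-id function used there is something that should be implemented with care: in a real editor it has to produce IDs in the customer’s format that do not collide with existing ones. Because this is just demo code, the following is fine for now: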

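declare %public function db:generate-id () as xs:string {
  (: Demo only: not guaranteed to be unique or to match any customer format.
     The xs:integer cast keeps large values out of exponent notation, which
     casting an xs:float of this size to a string would otherwise produce. :)
  "GENERATED-" || (xs:float(random-number-generator()?number) * 1000000000) => ceiling() => xs:integer()
};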
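The partner’s real goal was IDs in their customer’s format. As a sketch of what a more careful generator could look like, here is a TypeScript version that builds IDs in a made-up format and retries when a candidate is already in use. The `CUST-` format, the name `generateCustomerId` and the `isTaken` callback are all assumptions for this example, and how such a function would be plugged into Fonto is not covered here. In an editor, `isTaken` could be backed by an ID lookup such as the attribute index described earlier.

```typescript
// Hypothetical customer format: "CUST-" followed by exactly eight digits,
// e.g. "CUST-00042137". Swap in the real format your customer requires.
const ID_PATTERN = /^CUST-\d{8}$/;

function generateCustomerId(
  isTaken: (candidate: string) => boolean,
  random: () => number = Math.random, // injectable so the sketch is testable
  maxAttempts = 100,
): string {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Math.floor yields an integer in [0, 99999999], so the string form never
    // uses exponent notation; slice(-8) left-pads it with zeros.
    const n = Math.floor(random() * 100000000);
    const candidate = "CUST-" + ("00000000" + n).slice(-8);
    if (!isTaken(candidate)) {
      return candidate;
    }
  }
  throw new Error(`No free ID found after ${maxAttempts} attempts`);
}
```

Bounding the number of attempts keeps a nearly full ID space from turning into an endless loop; the caller gets an explicit error instead.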

And it works! IDs are now generated when pressing Enter in a paragraph.

This exercise actually uncovered a bug in our open source XPath engine, FontoXPath, which has been filed on GitHub. That bug is why the code above casts the number returned by random-number-generator() to xs:float.

I hope this explained how Fonto works and why it works like that. Over the years we have built quite the product, and we are aware that some parts work in unexpected ways for those who have not been with it from the start. If there are any parts of Fonto you would like us to focus on, we are always ready and willing to share! Reach out to Martin Middel on Twitter or file a support issue!

Martin Middel

Developer advocate / Evangelist. Has been with Fonto since it all began in 2013. He’s currently designing the next steps in Fonto Developer APIs with the input of our valuable partners.

In his spare time, Martin is an avid home brewer.
