Maarten Cleeren (Noordhoff): “Structured data will be the key to finding and retrieving documents”


In December 2020, Forrester published a report that strongly confirmed our thinking and the developments we see in our markets: The Future Of Documents: A Look Beyond The Paradigm Of Paper And Into Opportunities For Innovation (Forrester, 2020). Based on this report, we interviewed thought leaders about the future of documents. In this article, Maarten Cleeren (MC), Director Operations at Noordhoff, shares his opinion.

The Forrester report points out that, although the tools for creating documents have become friendlier, the way workers and organizations think about documents really hasn’t changed since the introduction of personal computers.

What does a real disruption of this way of thinking about documents look like to you?
MC: “Humans tend to consume information best in structured, short form. So for me there is a difference between consumption and creation. Ultimately, I would say that not a lot will change on the consumption end. The true disruption lies in creation. The old-style idea of database publishing comes to mind as a disruption, although that has had a very specific use. With the arrival of AI and machine learning come opportunities to make authoring a ‘collaborative’ effort, where authors are supported by tooling: tooling to facilitate metadata generation, summarization, translation and ‘light’ authoring (binding text etc.). Ultimately though, information mostly works in context, so even an author will want to retain some type of structured information snippet, which will be for all intents and purposes a ‘document’.”

According to Forrester, document authoring is ready for its moment of disruption, though information worker habits have yet to change because:

  • Friction between cloud-native documents and file storage tools remains.
  • The mental model of paper dominates the language of content management.
  • Static file types clog up processes and require content fracking.
  • Employee preferences are entrenched.

What do you see as the biggest hurdles to changing the way workers and businesses work with documents?
MC: “For me this is much more of a mental model change than anything else. Authors are used to structuring full documents and placing those in the context of perhaps larger documents (e.g. paragraphs in chapters, chapters in books). There is a fundamental need for oversight across these sets of content. The true change will be in new ways to organize that context through other content structures (such as taxonomies, for example).”

Can you name any practical examples where the disruption is already taking place?
MC: “AI is already disrupting document creation in news media, scientific publishing and advertising. Small disruptions are now starting to permeate other areas such as educational publishing, where content re-use is demanding the deconstruction of existing document structures in favour of new ways of aggregating content.”

It’s more of a mental model change than anything else

Structured data will surround content. Documents that include structured data must become the norm when they’re the input for automated processes, such as invoice processing. Document creators must take an outside-in approach and deliver documents in formats fit for purpose.

How can we motivate authors to take this outside-in approach, i.e. to change their behaviour? What’s in it for them?
MC: “Technology should facilitate this, preferably by automating the process of metadata generation as much as possible. Ultimately, structured data will be the key to finding and retrieving documents. If you want to share your work with an audience (whether by publishing on an intranet, the internet, a platform or a commercial database), structure is the way to share.”

It is expected that robots will share the writing credits with humans. AI authorship will affect document authoring in the near future.

What developments do you expect to see in this context, and what do you already see happening today?
MC: “Content aggregation and summarisation are the most real examples of this, as well as some light content authoring in advertising and journalism. This technology is likely to grow quickly, and it will impact what is written (i.e. tailored to best suit market needs), how it is written (fully automated, automated editing, keyword suggestions, summarisation etc.), and who is doing the writing. In some key areas that last question still has a large human component, and for some that will always be the case, but there is a large part of text production that will be affected.”

A large part of text production will be affected by AI authorship

In the future, documents will be more fluid, componentized, and structured to separate underlying information from its presentation. A strong metadata-first strategy must replace the folder as an organizing principle and become a foundation for automation and AI.

What is in your opinion the difference between ‘documents’ and ‘data’?
MC: “There is no such distinction. A document can be a representation of a set of data.”
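
As a rough illustration of what separating information from its presentation, and organizing by metadata rather than folders, could look like in practice, here is a minimal Python sketch. The `Component` class, the `find` helper, the two render functions and the sample content are all hypothetical, invented for this example; they do not describe Noordhoff's systems or anything specified in the Forrester report.

```python
import html
from dataclasses import dataclass, field


@dataclass
class Component:
    """One structured content snippet plus the metadata that describes it."""
    body: str
    metadata: dict = field(default_factory=dict)


def find(components, **criteria):
    """Return the components whose metadata matches every given criterion."""
    return [
        c for c in components
        if all(c.metadata.get(key) == value for key, value in criteria.items())
    ]


def render_text(components):
    """Present components as plain text, one paragraph per component."""
    return "\n\n".join(c.body for c in components)


def render_html(components):
    """Present the same components as HTML paragraphs."""
    return "".join(f"<p>{html.escape(c.body)}</p>" for c in components)


# Hypothetical sample content: no folders, only metadata.
library = [
    Component("Cells are the basic unit of life.",
              {"subject": "biology", "level": "intro"}),
    Component("Photosynthesis converts light into chemical energy.",
              {"subject": "biology", "level": "advanced"}),
    Component("A prime number has exactly two divisors.",
              {"subject": "mathematics", "level": "intro"}),
]

# The same data, aggregated two ways and presented in two formats.
biology = find(library, subject="biology")
print(render_text(biology))
print(render_html(find(library, level="intro")))
```

In this sketch a ‘document’ is simply whatever a metadata query and a renderer produce together: the biology snippets as plain text, or the introductory snippets as HTML, drawn from one shared set of components.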

Which content platforms are already evolving in this direction?
MC: “Scientific content platforms (e.g. ScienceDirect PDF reader), news platforms, some fiction, platforms for documentation.”

Which industries are leading the way regarding the future of documents?
MC: “Advertising, STEM publishing, news media.”

Maarten Cleeren

Director Operations Noordhoff

