By Ingrid S. Goldstein
We live in a networked society. The Internet, digitization, and globalization have turned our world into a complex system, a tangled web of relationships. Web 2.0 is already being replaced by Web 3.0, the Social Semantic Web. The Web of documents is turning into the Web of data. We are drowning in information, yet what we need is the right information at the right time, available on whatever medium is currently within reach.
But the established workflows in the publishing industry are no longer compatible with today’s realities. New ways of thinking and new approaches are needed to meet consumer demand for a constant flow of connected information. A new paradigm is emerging; at its center is the content, not its printed form, the book. It opens up an opportunity that has not existed for several centuries: an opportunity to organize familiar work processes in completely different ways, to address new tasks, and to discover a whole new world with all its potential.
Another new paradigm has emerged at a rapid pace: companies outside the publishing industry, such as Apple and Amazon, are increasingly the ones that define the technical standards for how information is published. Publishers are finding themselves in a defensive position when it comes to the means of production. They deliver the content, which has always been the core of their activities. How that content reaches the customer (in which technical forms, through which distribution channels, and on which business models) is decided largely by others, and publishers have little influence over those decisions.
E-books are currently the big hype in publishing. Nevertheless, e-books are not the crucial point. Perhaps they are so popular because in concept, name, and production workflow they most resemble the printed book. The e-book is both alien and familiar at the same time, and can easily be connected to the publishing business. Things look very different, however, when it comes to smartphone apps or mash-ups: there, everything familiar quickly disappears. While the e-book is an excellent entry into the digital world, it should not blind publishers to the larger point. The task is definitely not to focus on specific formats, because how long any of them will last in the market (or how briefly) is impossible to predict. They may be called e-books or printed books, online publications or iPhone apps. A publisher should be able to serve all of these formats, without focusing exclusively on one.
But how can this be done?
XML, Semantics and RDF
Particular attention must be paid to the quality of the content and its editing. XML, semantics and RDF emerge as main priorities.
Over the past decades, XML, the eXtensible Markup Language, has become an established and indispensable part of the publishing world. The basic idea of XML is to describe content in a media-neutral form, independent of software and hardware platforms, to allow for flexible and automated processing. Not all XML structures are the same. Depending on how the XML structure is defined, different goals can be achieved. Three levels of markup illustrate this.
The first level simply marks the beginning and end of a paragraph. The second additionally tags typographic attributes such as bold, which can then be implemented in the output medium. The third combines XML with semantics in a semantic markup, which makes the meaning of the content transparent, rather than its design. The word Berlin is clearly identified as a city and the word Germany as a state. This semantic, or meaning-oriented, markup is completely independent of the output medium. It can be automatically converted to various typographic realizations for various output formats, or it can be used to create an on-the-fly search index, to name a few possibilities.
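The three levels of markup described above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the markup of any particular publisher: the sample sentence, the element names (p, b, city, state), and the mapping of city to bold and state to italic are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample sentence and element names, chosen for illustration only.
STRUCTURAL = "<p>Berlin is the capital of Germany.</p>"
TYPOGRAPHIC = "<p><b>Berlin</b> is the capital of <b>Germany</b>.</p>"
SEMANTIC = "<p><city>Berlin</city> is the capital of <state>Germany</state>.</p>"

# One possible typographic realization of the semantic tags (an assumption).
STYLE = {"city": "b", "state": "i"}


def to_plain_text(xml_source):
    """Drop all markup and keep only the text, e.g. for a text-only channel."""
    return "".join(ET.fromstring(xml_source).itertext())


def to_html(xml_source):
    """Convert semantic tags to display tags, keeping the meaning as a class."""
    root = ET.fromstring(xml_source)
    for element in root.iter():
        if element.tag in STYLE:
            element.set("class", element.tag)
            element.tag = STYLE[element.tag]
    return ET.tostring(root, encoding="unicode")


def build_index(xml_source):
    """Collect the tagged terms by meaning, as a very small search index."""
    index = {}
    for element in ET.fromstring(xml_source).iter():
        if element.tag in STYLE:
            index.setdefault(element.tag, []).append(element.text)
    return index


if __name__ == "__main__":
    print(to_plain_text(SEMANTIC))
    print(to_html(SEMANTIC))
    print(build_index(SEMANTIC))
    print(build_index(TYPOGRAPHIC))  # empty: bold alone carries no meaning
```

The same semantic source yields plain text, one typographic rendering, and an index of cities and states. The typographic version can be rendered as well, but nothing in it says that Berlin is a city, so no index can be derived from it.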
RDF, the Resource Description Framework, can be expressed in XML. It was originally developed as a data model for describing metadata on the World Wide Web, and is now used, among other things, as a language for knowledge representation. RDF puts the data in the context of their meaning by storing complete statements: “Berlin is in Germany.” Such a statement consists of three parts: the subject (Berlin), the object (Germany), and the predicate (is in), which connects the two. Together with other statements, it forms a knowledge network that allows the data to be accessed in various ways.
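The idea of statements forming a knowledge network can be sketched as follows. This is a deliberately simplified model in plain Python, not an RDF library: real RDF identifies resources by URIs rather than bare names, and the two extra statements, as well as the helper names match and located_in, are assumptions added for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


STATEMENTS = [
    Triple("Berlin", "is in", "Germany"),      # the statement from the text
    Triple("Heidelberg", "is in", "Germany"),  # added for illustration
    Triple("Germany", "is in", "Europe"),      # added for illustration
]


def match(statements, subject=None, predicate=None, obj=None):
    """Return the statements that fit the pattern; None acts as a wildcard."""
    return [
        t for t in statements
        if subject in (None, t.subject)
        and predicate in (None, t.predicate)
        and obj in (None, t.obj)
    ]


def located_in(statements, place):
    """Follow 'is in' statements step by step through the network."""
    found = []
    frontier = [place]
    while frontier:
        current = frontier.pop()
        for t in match(statements, subject=current, predicate="is in"):
            if t.obj not in found:
                found.append(t.obj)
                frontier.append(t.obj)
    return found


if __name__ == "__main__":
    # Access by object: what is in Germany?
    print([t.subject for t in match(STATEMENTS, obj="Germany")])
    # Access by subject, combining statements: where is Berlin?
    print(located_in(STATEMENTS, "Berlin"))
```

The same three statements answer questions from either direction, and combining them yields a fact that was never stored explicitly: that Berlin is in Europe.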
This triad of XML (media-neutral notation), semantics (meaning of the data), and RDF (context of the data and metadata) is the best way for a publisher to be prepared for future requirements. Such a tagging strategy leads to increased transparency, and hence to a growing potential to respond quickly and inexpensively to changing market and user requirements. Such data is referred to as agile content. Any content may require further editing or improvement in the course of its lifetime.
Content published today that is not prepared as agile content is likely to incur substantial additional expense whenever further output formats have to be produced.
Depending on the type of content, different strategies apply. Change is not instantaneous; it is a process that takes time, though it can be accelerated. Any publisher who is not already on this path should take the first steps today.
Ingrid S. Goldstein (igoldstein[at]know-arch[dot]com) has over 20 years’ experience in publishing, with a focus on publishing strategies based on semantics and XML. She founded the London/Heidelberg-based consulting firm Knowledge Architectures last year.