By Nick Ruffilo
Hugh McGuire of Pressbooks gave a wonderful speech about books as APIs at O’Reilly’s Tools of Change for Publishing Conference last month (you can see Hugh’s Slideshare here), and that spurred a great deal of discussion and buzz. Many people have been talking about how APIs are the future of publishing. All this talk is great, but we need a better understanding of what an API is, from a business perspective, before the discussion can continue. The acronyms API, XML, JSON and HTML have been tossed around in the publishing industry for several years. Unfortunately, they are frequently used incorrectly. The best education, in my eyes, is one you can see and play with. So, I built a book-as-API example that anyone can use.
What is an API?
API stands for Application Programming Interface and that is really just a fancy way of saying “a way for programs to talk to each other.” An API defines a set of parameters — in the example I’ve created, I let you define if you wish to see metadata, a specific scene, or the dramatis personae. Then, I have additional parameters that let you further refine things.
The second function of an API is to provide information back to you. The example lets you choose TEXT or HTML: text is more human-friendly, HTML is more computer-friendly. Most APIs likewise let you choose the format that best suits your application.
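To make the two halves (parameters in, formatted data out) concrete, here is a minimal Python sketch. It is not the code behind the example; the function name book_api, the parameter names type, format, act and scene, and the tiny in-memory book are all assumptions made for illustration.

```python
import html

# Hypothetical in-memory "book"; a real API would read from the book's files.
BOOK = {
    "metadata": {"title": "The Comedy of Errors", "author": "William Shakespeare"},
    "characters": ["Antipholus of Ephesus", "Antipholus of Syracuse"],
    "scenes": {("1", "1"): "(text of Act 1, Scene 1 goes here)"},
}


def book_api(params):
    """Return the requested part of the book in the requested format."""
    kind = params.get("type", "metadata")   # what the caller wants
    fmt = params.get("format", "text")      # how the caller wants it

    if kind == "metadata":
        lines = [f"{key}: {value}" for key, value in BOOK["metadata"].items()]
    elif kind == "characters":
        lines = list(BOOK["characters"])
    elif kind == "scene":
        # Extra parameters refine the request further.
        key = (params.get("act"), params.get("scene"))
        if key not in BOOK["scenes"]:
            return "error: no such scene"
        lines = [BOOK["scenes"][key]]
    else:
        return "error: unknown type"

    if fmt == "html":
        return "".join(f"<p>{html.escape(line)}</p>" for line in lines)
    return "\n".join(lines)
```

Calling book_api({"type": "characters"}) returns the character list as plain text, while adding "format": "html" to the same request wraps each line in paragraph tags.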
API Examples in the Real World
APIs are everywhere. One of the most widely used APIs is Twitter. If you’ve ever used Twitter within a third-party app, or from anywhere but Twitter.com, you’ve interacted (indirectly) with an API. The Twitter API lets external applications send and receive tweets as well as gather additional data, like user profiles. In fact, in its broadest definition, Google (and all search engines) are APIs. You tell it what data you want (your search query) and it provides a list of formatted results (the search results).
APIs are like a telephone call. Whether it is your cellphone or a landline, you pick up the phone, dial a number, and talk to someone on the other end of the line. How it works is completely irrelevant; all the other person needs is a phone. You can call cell to cell, cell to landline, or landline to landline, and it doesn’t matter.
Why Do We Need APIs in Publishing?
APIs are the simplest (and most preferred) way for applications to talk to each other. APIs let anyone you grant access get (or send) data in a structured and logical form. Amazon has a full product API, which allows any website to display the image, description, and selling information for a product. This API has enabled millions of web users to become affiliates for Amazon, which fueled its explosive growth in the early 2000s. Twitter’s APIs allowed a host of app makers to create different tweet experiences and allowed third-party app developers to create value-added services.
Experiment with our API Example
Try out our API example here: Book As API Example – The Comedy of Errors by William Shakespeare
The code is freely available and visible from the API itself. I have added a convenient form so that you can simply select the parameters you wish and get the data in whatever format you’d like (with or without HTML tags). Go. Play with it for a little while. When you’re done, come back and we’ll get into the nitty gritty.
The proof of concept I built provides three different types of information: Metadata, Character List, and Scenes. The value proposition for metadata should be obvious, but I’ll outline some potential uses. Let’s say you make all the metadata for your books available via an API (to be clear, when I say Metadata, I mean your title, author, ISBN, cover image, short description, long description, categories, and any other useful information that describes your book). As a developer, I can now access your metadata through the API and create a WordPress widget that will allow bloggers to simply provide your ISBN and all of your metadata will be displayed on their blog as you want it to appear. Instead of relying on a blogger to copy/paste the description from Amazon, or write up their own, you can ensure that the data they are using is accurate. They can still write their review, for example, but your data is controlled.
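As a rough illustration of the widget idea, the Python sketch below turns a metadata response into a snippet of HTML a blog could display. The JSON field names, the sample description, and the render_widget function are all hypothetical; a real publisher API would define its own fields, and a real widget would fetch the response over the web using the ISBN.

```python
import html
import json

# Hypothetical response body a metadata API might return for one book.
response_body = (
    '{"title": "The Comedy of Errors", '
    '"author": "William Shakespeare", '
    '"description": "A comedy of mistaken identity."}'
)


def render_widget(body):
    """Build a small HTML block from the publisher-supplied metadata."""
    data = json.loads(body)
    # Escape every value so stray characters cannot break the blog's page.
    title = html.escape(data.get("title", ""))
    author = html.escape(data.get("author", ""))
    description = html.escape(data.get("description", ""))
    return (
        f'<div class="book"><h3>{title}</h3>'
        f'<p class="author">{author}</p>'
        f"<p>{description}</p></div>"
    )
```

Because the text comes straight from the publisher's data, every blog using the widget shows the same title, author, and description.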
Another use is within Twitter or Facebook. If I were to tweet out the ISBN or book title, your API could be used by Facebook or my Twitter client to grab the additional metadata and display it to the people reading my post. My life has become easier as the poster and your data is now better displayed.
So why would you need a character list or scenes? Well, for almost the same reasons you need metadata. It’s content. You may want to provide access to only certain content, but making your content easily available means people will share it and consume it. It could also lead to sales. Your API could provide access to anything from a simple table of contents or an index all the way to the full text of a book. (McGuire offers several examples in his Slideshare from his TOC talk.)
XML, JSON, and HTML
I mentioned that an API returns data. XML and JSON are both formats for structuring data. XML uses tags (denoted using < and >). An XML response may look like this:
<Item1>My first item</Item1>
<Item2>My 2nd item</Item2>
JSON uses name/value pairs wrapped in curly braces. The same data as a JSON response may look like this:
{
"Item1": "My first item",
"Item2": "My 2nd item"
}
It is all the same data, just represented in a different way. There are a few (minor) pros and cons to using each, but for the most part, it does not matter. It’s like using a spoon or a knife to stir your sugar in your coffee in the morning. Both ways get the job done, and the efficiency gains from the spoon are rarely an issue.
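A short Python sketch shows that the two formats really do carry the same data. It uses only Python's standard json and xml.etree.ElementTree modules. One assumption: a complete XML document needs a single root element, so the two items are wrapped in a made-up <Items> tag here.

```python
import json
import xml.etree.ElementTree as ET

# The same two items in each format. <Items> is a hypothetical wrapper,
# added because a complete XML document needs exactly one root element.
xml_response = "<Items><Item1>My first item</Item1><Item2>My 2nd item</Item2></Items>"
json_response = '{"Item1": "My first item", "Item2": "My 2nd item"}'


def from_xml(text):
    """Turn each child tag into a name/value pair."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def from_json(text):
    """JSON maps directly onto a dictionary of name/value pairs."""
    return json.loads(text)


# Both responses end up as the same dictionary inside the program.
print(from_xml(xml_response) == from_json(json_response))  # prints True
```

Once the response is parsed, the application no longer cares which format it arrived in.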
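Where does HTML come in? HTML is a close relative of XML. Strictly speaking, only the XHTML flavor of HTML (which is what EPUB files use) is well-formed XML, but for our purposes you can think of HTML as a defined set of tags, where XML lets you make up your own. So, HTML is the specific, XML is the generic. The same way that all beagles are dogs, but not all dogs are beagles. Every XHTML file is also an XML document, but not all XML documents are HTML or are meant to be viewed in a web browser.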
There’s a Debate Over Using XML vs HTML for Storage? Why?
Since XML is a generic language, you can define pretty much any tags your heart desires. Left at that, though, there would be no way to ensure your XML document was useful to anyone else, so some standards have floated around. Anyone who worked with digital books in the late ’80s and ’90s is familiar with the wonderful SGML, the predecessor of XML (which is still in use in some places). There are a few arguments that HTML lacks the tag support to do everything that publishing needs, but if you look at the EPUB 3 spec, I’d argue that very little is not possible using EPUB 3, and what isn’t there can be adjusted to work within the HTML5 spec. To learn more about XML and HTML, see my other post as part of my Tips for Technologists series, Tips for Technologists #4: Understanding XML/HTML/CSS.
Why Does Any of This Matter?
As you can see from my example, as long as you have a valid HTML document (as all your EPUB files are), with about 50 lines of code, you can immediately provide access to that content (and yes, you can limit it so that only 10% of the content is available, etc.). What that means is that any company (or person) out there can now get data from your content — such as title, author, ISBN, buy links (if you wish to provide those) and first chapter. That information can be used to help your content be discovered and ultimately be sold — whether as full text, snippets, chunks, or something entirely reconfigured. While a full-scale enterprise level book API is not as simple as the 50 lines of code I provided, it is a very low-resource project that can provide quick value to your content.
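To give a feel for how little code that takes, here is a minimal Python sketch (not the code behind the example) that pulls the title and a limited share of the paragraphs out of an XHTML document. The sample document and the excerpt function are hypothetical. It also assumes clean, well-formed XHTML; real EPUB files may contain a DOCTYPE or named entities that this simple parser would reject.

```python
import xml.etree.ElementTree as ET

# XHTML tags live in this namespace; "x" is just a local shorthand.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# A tiny stand-in for one XHTML file from an EPUB (hypothetical content).
paragraphs = "".join(f"<p>Paragraph {i}.</p>" for i in range(1, 11))
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample Book</title></head>"
    f"<body>{paragraphs}</body></html>"
)


def excerpt(xhtml, fraction=0.1):
    """Return the title plus the first `fraction` of the paragraphs."""
    root = ET.fromstring(xhtml)
    title = root.findtext("x:head/x:title", default="", namespaces=XHTML_NS)
    paras = ["".join(p.itertext()) for p in root.iterfind(".//x:p", XHTML_NS)]
    # Always expose at least one paragraph, but never more than the limit.
    keep = max(1, int(len(paras) * fraction))
    return {"title": title, "paragraphs": paras[:keep]}


print(excerpt(SAMPLE))
```

With the default limit of 10%, only the first of the ten sample paragraphs is returned; raising fraction exposes more of the text.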
This Is Not New
Over a year ago, Oren Michels, CEO of Mashery.com, wrote this piece for Publishing Perspectives about making APIs for books. Amazon launched its product API in July of 2002. And more recently in these pages, Helmut von Berg of Klopotek went into a deep discussion of “Why the Future of Publishing is ‘Networked Publishing,’” which is a broader look at the possibilities open to publishers through such tools as APIs.
While the term API may be new to you, you’ve been interacting (indirectly) with APIs as long as you’ve been using a computer. They provide a solid framework for communication of data between applications, and, often, their true potential isn’t understood until they are released and begin to be used.