The session “Big Data / Little Data: The Practical Capture, Analysis, and Integration of Data for Publishers” takes place on Tuesday, October 8 from 10:40 to 11:30 a.m. at the Frankfurt Academy’s CONTEC 2013 Conference as part of the Frankfurt Book Fair.
By Porter Anderson
The digital dynamic sweeping the global industry makes mastering many forms of data, with all their challenges and potential, a top priority.
Publishing Perspectives spoke with three publishing practitioners who will discuss this challenge at this year’s Frankfurt Book Fair, during the October 8 CONTEC 2013 Conference panel “Big Data/Little Data: The Practical Capture, Analysis, and Integration of Data for Publishers.”
“Our foundations as an industry are creative, not analytical,” said Kristen McLean, the Miami-based founder and CEO of marketplace startup Bookigee. “We’re talking about an evolutionary change in our DNA. It won’t happen overnight, and we won’t go from being one type of organism to a completely different type of organism in one jump.”
“With the shift to digital,” said Posth, “publishers are confronted with an entirely new way of dealing with content and stories. The way that texts and types of media are being consumed is changing radically and rapidly. And with it, the logic and logistics of a digital world.”
Posth is the CEO of Berlin-based Publishing Data Networks, which offers analytics to the German publishing industry.
Dawson, who is Bowker’s Product Manager for Identifiers, is one of the best-known speakers on data on the world stage and is a go-to authority on changing issues in metadata for journalists as well as for publishing colleagues.
“Shopping has radically shifted,” Dawson said. “Today, ‘shop’ equals ‘search.’ It’s no longer enough to have clean bibliographic metadata on Amazon. You now have to think about things like, ‘How will someone find my book on Google and then hop from Google to Amazon?’ You can influence this behavior. And that fact can affect everything from what you call your book to how you mark up your Web site.”
Data, Great and Small
Many in the industry are working to sort out what’s meant by the terms “big data” and “little data,” and the terms are used, not surprisingly, with varying interpretations and imprecision.
Big Data is what organizations know about people — be they customers, citizens, employees, or voters…Big Data is what enables banks to predict credit card fraud by analyzing billions of transactions, marketers to understand customer sentiment by analyzing millions of interactions on social media, and retailers to target promotions and offers by analyzing millions of purchases.
In contrast, Little Data is what we know about ourselves. What we buy. Who we know. Where we go. How we spend our time. We’ve always had a sense for these things — after all, it’s our lives. But thanks to the combination of mobile, social, and cloud technologies, it’s easier than ever to gain insight into our own behavior.
It’s clear that data’s fast-growing centrality in business has major and sometimes daunting implications for publishers.
“As an industry,” McLean said, “we lag behind most major consumer industries, including the music, TV, and film industries, in using data to make informed decisions about our content and audience. We have been super-resistant to this idea that we should let audience insights drive content development, to the point that when asked, most folks in the editorial and content production areas of mainstream publishers are unable to give even the most basic metrics on who their actual customers are, and how much it costs the company to get and retain that customer.”
“Selling,” added Dawson, “is now a data game. Not learning everything you can about how search works is leaving money on the table.”
“And data analysis, by the way,” Posth said, “is nothing entirely new to publishers.
“But the digital world offers so much more data and different information to analyze, combine, integrate, visualize,” he said, “which might make data analysis in the digital world look like Neuland,” the word for “virgin territory” that German Chancellor Angela Merkel applied to the Internet itself during a June news conference with President Obama.
McLean also points out that publishers can’t necessarily be blamed for having to race to catch up with data’s newly pivotal role in their business.
“This is not entirely their fault,” McLean said. “We have a locked-up data pipeline in which publishers don’t have access to complete data. By and large, they don’t own that customer (the reader), so they don’t know that customer.
“The data they do own,” she said, “is primarily ‘4 walls’ data: data that comes from within the company. That kind of data does a good job of telling you what’s going on in your own house, but not in your neighborhood, or in your city. You and your corporate family may be doing swell, but did you know developers just bought 1,000 acres next door?”
What Is Data’s “Elephant in the Room”?
Asked to name the proverbial “elephant in the room” (a major problem no one wants to acknowledge) in publishing’s handling of data, Posth went straight to the denial, however understandable, that some publishers show about data’s new importance.
“Data analysis is a business requirement and a necessary means to deal with the digital change,” Posth said. “The publishing industry needs to learn this lesson if it wants to survive. Publishers need to make sure that they work with partners (retailers, intermediaries, distributors) that in general support the idea of exchanging, at best, real-time information between people and organizations in a distributed supply chain.
“Data is not a giveaway or supplement to a business deal,” Posth said, “it is a prerequisite.”
Bookigee’s McLean named “lack of a central metadata warehouse” as the data-elephant in the room.
“I see it as a huge structural problem,” she said, “that adds tremendous complexity to a growing data ecosystem. It’s going to get solved eventually, but perhaps not in a way that will benefit many of the existing players. I feel like we need some good leadership here, but I am skeptical about whether the system forces in play will allow it.”
And Bowker’s Dawson, who is a steadfast proponent of what is sometimes called “the networked book,” said, “From my perspective, our current elephant is the Web.
“Book publishers studiously avoid knowing how the Web works. If I had required reading for anyone in book publishing now, it would be Weaving the Web by Tim Berners-Lee, because the Web is how people find out about things – or find out more about the thing they heard about. Using, creating, interpreting and encoding data needs to be as natural as breathing to us in book publishing now.”
Dawson added an ironic note: Berners-Lee’s book? “Is not in ebook form.”
How Do We Look Ahead?
“In the future,” Posth said, “content will be available in many different ways via APIs, Web sites, apps, native formats, streams. This will make it impossible to come up with a successful offering without knowing how the readers like to interact and engage with the content.
“Publishing is not about shipping units, entities or containers to a single user anymore,” he said. “It is about distributing transmedia stories across multiple platforms and formats to an audience that will use the content in ways that the publisher might not even have intended.”
Dawson said she believes publishing has made some progress, if only to a degree, in grasping the rising importance of data.
“I think we ‘get’ the idea of good bibliographic metadata, an ONIX record,” she said. “But product data is a moving target because the way we find out about products changes rapidly. Now semantically marking up your web pages can generate things like ‘rich snippets’ and Google’s Knowledge Graph.
“And at some point we’ll be semantically marking up the ebooks themselves,” she said, “linking them to other books and concepts, Wikipedia-style, almost. Which is why it’s important to understand how the web works – because that’s the direction books are headed in.”
Beyond such issues of product identification and linking, McLean said she’s seeing clouds on the rights horizon.
“I think global consumer behavior is on a direct collision course with our existing rights environment,” she said.
“If I am a reader in Australia, and I want a book or movie produced only in the US, I’m going to get that book or movie. Anything you do to make that more difficult for me is going to frustrate me, and make it more likely that I’m going to get a pirated version of that content.
“Expansive platforms like Wattpad that have no rights issues are already drawing huge audiences for English-language material from all over the world,” she said. “There is nothing stopping them from expanding their language and content communities. If they decide to start publishing traditional projects from within that platform, they could have a huge advantage and very little friction.”
McLean may have had a moment of clairvoyance there. Just after she made her comments to Publishing Perspectives last week, it was announced on September 5 that Allen Lau’s Wattpad was entering a partnership with Dominique Raccah’s Sourcebooks to publish Wattpad writers through the Sourcebooks Fire imprint, which produces YA material.
Sourcebooks will select Wattpad authors writing for the Young Adult category to publish through the Sourcebooks Fire YA imprint and plans to use Wattpad’s new Fan Funding platform to encourage Wattpad members to support their favorite Wattpad authors. Wattpad material acquired by Sourcebooks will remain on Wattpad in its original form, though the books will be edited (with new chapters and other added material) and produced by Sourcebooks for general bookstore distribution.
In a partnership announcement like that, you can see some of the considerations coming together from Posth (those multiple platforms beyond traditional “containers”), Dawson (material created and persisting in-network), and McLean (some intriguing potential rights points).
The trio’s session at CONTEC is an attempt to bring it all into focus, even as data development in publishing seems to fly by like trains on Frankfurt’s S-Bahn.
“Web, web, web, web, web,” Dawson said. “I hope to talk about the Linked Open Data movement, the Linked Content Coalition, and show a very scary diagram.”
The “Big Data / Little Data” session at CONTEC starts at 10:40 a.m. on October 8 at Frankfurt Marriott, following addresses from keynote speakers including Wiley CEO and president Stephen Smith and SoBooks’ founder Sascha Lobo. More information is here and a 20-percent discount on registration is available with the use of the code CONTEC13KPTW20.