Publishing Perspectives

Is BookLamp’s “Book Genome Project” the Future of Discovery?

In Digital by Edward Nawotka, August 24, 2011

This new book discovery search engine breaks a book down into 32,160 data points and quantifies everything from density to pacing.

By Edward Nawotka

How BookLamp ingests a book

If you thought metadata was complicated, meet BookLamp.org, a new book discovery search engine that tracks 32,160 distinct data points per book. “We do this by taking the full text provided by a publisher in a digital format and running it through our computer,” explains CEO Aaron Stanton. “Our program breaks a book up into 100 scenes and measures the ‘DNA’ of each scene, looking for 132 different thematic ingredients, and another 2,000 variables.” A reader can go to the BookLamp site, which launched in beta last week, plug in a title they like, and search for other titles with similar characteristics. Pundits have dubbed it the “Pandora for Books,” though Stanton prefers the term “Book Genome Project.”
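To make the "scenes and ingredients" idea concrete, here is a minimal Python sketch. It is not BookLamp's actual method: it assumes scenes are simply equal-length chunks of words, and it scores each scene by the share of words that match a small keyword list. The ingredient names and keyword lists are hypothetical placeholders.

```python
from collections import Counter

# Hypothetical keyword lists (assumption); the real ingredient
# definitions are not described in the article.
INGREDIENTS = {
    "religion": {"church", "priest", "cathedral"},
    "police": {"detective", "police", "murder"},
}

def split_into_scenes(words, n_scenes=100):
    """Split a word list into n_scenes contiguous, nearly equal chunks."""
    n = len(words)
    bounds = [(i * n) // n_scenes for i in range(n_scenes + 1)]
    return [words[bounds[i]:bounds[i + 1]] for i in range(n_scenes)]

def scene_profile(scene):
    """Return the fraction of a scene's words matching each ingredient."""
    counts = Counter(w.lower().strip('.,;:!?"') for w in scene)
    total = max(len(scene), 1)  # avoid dividing by zero on empty scenes
    return {
        name: sum(counts[k] for k in keywords) / total
        for name, keywords in INGREDIENTS.items()
    }

def book_profile(text, n_scenes=100):
    """Compute one ingredient profile per scene for a whole text."""
    scenes = split_into_scenes(text.split(), n_scenes)
    return [scene_profile(scene) for scene in scenes]

sample = "The priest met the detective in the cathedral. " * 50
profile = book_profile(sample)
print(len(profile), profile[0], profile[1])
```

A real system would presumably detect scene boundaries and themes with far richer models; the sketch only shows the shape of the data, one small feature vector per scene.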

“Say you’re looking for a novel like The Da Vinci Code. We have found that it contains 18.6% Religion and Religious Institutions, 9.4% Police & Murder Investigation, 8.2% Art and Art Galleries, and 6.7% Secret Societies & Communities, and other elements; we’ll pull out a book with similar elements, provided it is in our database,” says Stanton.
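One common way to find "a book with similar elements" is to treat each book's thematic percentages as a vector and rank candidates by cosine similarity. The sketch below assumes that approach; the article does not say which similarity measure BookLamp uses. The Da Vinci Code percentages come from Stanton's quote, while the two catalog entries and all key names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two sparse theme profiles (dicts)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, catalog, top_n=3):
    """Rank catalog titles by similarity to the query profile."""
    scored = [
        (title, cosine_similarity(query, profile))
        for title, profile in catalog.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]

# Percentages quoted in the article for The Da Vinci Code.
da_vinci = {
    "religion": 18.6,
    "police_investigation": 9.4,
    "art": 8.2,
    "secret_societies": 6.7,
}

# Hypothetical catalog entries, invented for illustration only.
catalog = {
    "Hypothetical Thriller A": {
        "religion": 15.0,
        "police_investigation": 10.0,
        "secret_societies": 8.0,
    },
    "Hypothetical Romance B": {
        "office_environments": 12.0,
        "intimacy": 20.0,
    },
}

ranked = most_similar(da_vinci, catalog)
for title, score in ranked:
    print(f"{title}: {score:.3f}")
```

Because the romance profile shares no themes with the query, its score is zero and the thriller ranks first. As Stanton notes, a match is only possible if the book is in the database at all.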

Stanton began the BookLamp project in 2003 while a student in Boise, Idaho, when he and his roommates scanned in a copy of Richard Bachman’s Thinner (something that then took a full six hours to do) before realizing that what he wanted to do was likely beyond the scope of a college student. In 2007, he thought it would be perfect for Google and managed to land a meeting (see “CanGoogleHearMe.com,” which became a viral meme in its day). Stanton then took the project to Dr. Matthew Jockers, professor of computational linguistics at Stanford University, who helped develop the protocols for BookLamp’s “contextual stylistic analysis.”

Today, BookLamp has some 20,000 texts in its database — primarily from Random House and Kensington publishers — and has amassed nearly 650 million “data points” in all. “We expect it to be in the billions in a few months,” says Stanton.
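The figures quoted here are consistent with each other, as a quick back-of-the-envelope check shows. The sketch assumes exactly 20,000 titles and 32,160 data points per title; the article gives the first figure only approximately ("some 20,000").

```python
TITLES_NOW = 20_000        # "some 20,000 texts" (approximate)
POINTS_PER_BOOK = 32_160   # data points tracked per book
TITLES_GOAL = 100_000      # Stanton's stated year-end goal

total_now = TITLES_NOW * POINTS_PER_BOOK
total_at_goal = TITLES_GOAL * POINTS_PER_BOOK

print(f"{total_now:,}")      # 643,200,000
print(f"{total_at_goal:,}")  # 3,216,000,000
```

The first number matches "nearly 650 million," and reaching 100,000 titles would put the total at roughly 3.2 billion, in line with Stanton's "in the billions."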

BookLamp's assessment of Stieg Larsson

But can a computer really accurately assess the content of a book? Stanton thinks so. “Our original models are based on focus groups,” he says. “We would give them a highly dense scene and a low density scene, for example, and ask them to assess them, which gave us a basis for training the models. Then we looked at books that might exceed the models and tweaked the formulas. In this way, our algorithms are trained like a human being.”

BookLamp quantifies such elements as density, pacing, description, dialogue and motion, in addition to numerous nuanced micro-categories, such as “pistols/rifles/weapons” or “explicit depictions of intimacy” or “office environments.”

“In many ways, we’re using thematic and other ‘ingredients’ as an alternative to traditional metadata,” says Stanton, who envisions the project serving readers, writers and publishers equally.

The first iteration of BookLamp — what you currently see online — is squarely aimed at readers. Writers and publishers, on the other hand, will soon be offered the ability to upload their manuscripts to BookLamp and have their books assessed along the same criteria. These works will go into a “living database of manuscripts” — which can be used by publishers who want to seek out manuscripts with certain characteristics. “For example,” says Stanton, “say vampires are hot one year, so you turn down all the books about space aliens, but then the trend shifts to space aliens — you can search our database for manuscripts that match these emerging trends and stay ahead of the curve. For authors, a rejected book is never a rejected book, since it can always be found.”

At the moment, BookLamp’s biggest obstacle might just be the publishers and writers themselves, who may very well be reluctant to see their books converted into data points. The limited database of just 20,000 titles “is by far the biggest criticism of the site,” says Stanton. His goal is to hit 100,000 titles by the end of the year.

Curiosity seekers can sign up for and explore BookLamp now at www.booklamp.org.

DISCUSS: Do You Trust a Computer to Tell You What to Read?

About the Author

Edward Nawotka

A widely published critic and essayist, Edward Nawotka serves as a speaker, educator and consultant for institutions and businesses involved in the global publishing and content industries. He was also editor-in-chief of Publishing Perspectives from the launch of the publication in 2009 until January 2016.

Tags: book discovery, BookLamp, computers, Dr. Matthew Jockers, future of publishing, Metadata, online book search

Publishing Perspectives, a brand of Frankfurter Buchmesse and operated at MVB US, Inc. | newsletter@publishingperspectives.com