How Keywords, Machine Learning Unlock Book Discovery


Jim Bryant of Trajectory argues that advances in natural language processing and machine learning, which make it possible to extract keywords from a book's text, may unlock a revolution in book discovery.

By Jim Bryant


Leading publishing industry analysts began highlighting the growing urgency of better “book discovery” practices more than ten years ago. Several years ago, the issue was named one of the biggest challenges facing the future of the publishing industry. Today, no one in the publishing world seriously questions its relevance.

The challenge of “book discovery” stems from a well-accepted fact: never before in history have so many books been available to so many readers.

  • Hundreds of thousands of backlist titles are being released by publishers.
  • Hundreds of thousands of self-published books are being offered by ebook retailers and self-publishing platforms.
  • Hundreds of thousands of titles from foreign publishers are making their way into domestic supply chains.

All of this is happening, of course, because of the widespread adoption of ebooks and the convenience of discovering and ordering a book online and beginning to read it instantly on our mobile devices.

One new solution to book discovery is to use keywords extracted from the text of a book. Typically, the approach is to algorithmically deconstruct each sentence and tag each word by part of speech. Each word can then be assigned a value based on how often it is used within the book compared with how often it is used in the English language as a whole. Keyword extraction can be extended to identify people, places, and a wide range of other useful entities, such as the presence of SAT words (the vocabulary students are expected to master before taking the test) or even the presence of profanity. Using keywords and sentence structure, it is also possible to algorithmically measure the complexity of a story and estimate the reading level required to read and understand the book.
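To make the frequency comparison and the reading-level estimate more concrete, here is a minimal sketch in Python. It is not Trajectory's method. It assumes a tiny, invented table of general-English word frequencies (a real system would draw on a large reference corpus), scores each word by a count-weighted log ratio of its frequency in the book to its reference frequency, and uses a plain regular-expression tokenizer in place of true part-of-speech tagging. For reading level it swaps in the well-known Flesch-Kincaid grade formula, with a rough vowel-group heuristic standing in for real syllable counting.

```python
import math
import re
from collections import Counter

# Toy reference frequencies (occurrences per million words of general English).
# These values are invented for illustration; a real system would load them
# from a large reference corpus.
REFERENCE_PER_MILLION = {
    "the": 50000, "and": 27000, "a": 21000, "you": 12000, "do": 5000,
    "so": 4000, "up": 3000, "there": 3000, "here": 1500, "day": 700,
    "every": 600, "eat": 300, "add": 300, "count": 200, "track": 150,
    "exercise": 40, "calorie": 5,
}
DEFAULT_PER_MILLION = 1.0  # assumed frequency for words missing from the table


def tokenize(text):
    """Lowercase word tokens; a crude stand-in for real NLP tokenization."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_scores(text, top_n=5):
    """Rank words by how much more frequent they are in the book than in general English."""
    words = tokenize(text)
    counts = Counter(words)
    total = len(words)
    scores = {}
    for word, count in counts.items():
        book_per_million = count / total * 1_000_000
        reference = REFERENCE_PER_MILLION.get(word, DEFAULT_PER_MILLION)
        # Log frequency ratio, weighted by count so one stray word cannot dominate.
        scores[word] = count * math.log(book_per_million / reference)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


def count_syllables(word):
    """Approximate syllables by counting vowel groups (a rough heuristic)."""
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and n > 1:
        n -= 1  # a final 'e' is often silent
    return max(n, 1)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level from words per sentence and syllables per word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = tokenize(text)
    if not words or not sentences:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    sample = ("Count every calorie you eat and track the exercise you do. "
              "A calorie here and a calorie there add up, so exercise every day.")
    print(keyword_scores(sample, top_n=2))  # calorie and exercise rank highest
    print(round(flesch_kincaid_grade(sample), 1))
```

Weighting the log ratio by the word's count is one simple design choice; it keeps a word used once from outranking a distinctive word the book returns to again and again.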

Here is an example of how useful keyword extraction can be. Consider The Mayo Clinic Diet. A word cloud generated from the book's text shows that two of its keywords are calorie and exercise, yet neither word appears in the title or the brief description. Great books like this one become easier to discover when such keywords are integrated into the search systems used by ebook retailers and libraries.
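The effect on search can be sketched with a toy inverted index in Python. This is a hypothetical illustration, not any retailer's or library's actual system: the `Catalog` class, the book id, and the placeholder description are all invented. A query for "calorie exercise" finds nothing when only the title and description are indexed, and finds the book once the extracted keywords are indexed too.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase word tokens as a set."""
    return set(re.findall(r"[a-z']+", text.lower()))


class Catalog:
    """A toy inverted index over book metadata (hypothetical, for illustration)."""

    def __init__(self):
        self.index = defaultdict(set)  # term -> set of book ids

    def add(self, book_id, title, description, keywords=()):
        # Index title and description words, plus any extracted keywords.
        terms = tokens(title) | tokens(description) | {k.lower() for k in keywords}
        for term in terms:
            self.index[term].add(book_id)

    def search(self, query):
        """Return ids of books matching every query term (boolean AND)."""
        terms = tokens(query)
        if not terms:
            return set()
        return set.intersection(*(self.index.get(t, set()) for t in terms))


if __name__ == "__main__":
    metadata_only = Catalog()
    metadata_only.add("mayo-diet", "The Mayo Clinic Diet", "Placeholder description text")

    with_keywords = Catalog()
    with_keywords.add("mayo-diet", "The Mayo Clinic Diet", "Placeholder description text",
                      keywords=["calorie", "exercise"])

    print(metadata_only.search("calorie exercise"))  # set()
    print(with_keywords.search("calorie exercise"))  # {'mayo-diet'}
```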


Algorithmic recommendations offer another novel solution to the book discovery dilemma. With millions of books to choose from, accurate recommendations are increasingly valuable, and they are now being generated with great success. Sophisticated algorithms have been developed to compare each book with every other book across keywords, sentiment, intensity, mood, complexity, reading level, age level, and dozens of other points of comparison.
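One common way to compare a book with every other book across many attributes is to treat each book as a vector of scores and rank the others by cosine similarity. The Python sketch below assumes four hypothetical attributes scaled to 0..1 and three invented books; it illustrates the general technique rather than how Trajectory or any retailer actually computes recommendations.

```python
import math

# Hypothetical attributes, each assumed to be scaled to 0..1 by upstream text analysis.
FEATURES = ["sentiment", "intensity", "complexity", "reading_level"]

# Invented example profiles; real ones would come from analyzing each book's text.
BOOKS = {
    "book_a": {"sentiment": 0.8, "intensity": 0.3, "complexity": 0.4, "reading_level": 0.5},
    "book_b": {"sentiment": 0.7, "intensity": 0.4, "complexity": 0.5, "reading_level": 0.5},
    "book_c": {"sentiment": 0.1, "intensity": 0.9, "complexity": 0.9, "reading_level": 0.9},
}


def to_vector(profile):
    """Arrange a book's attribute scores in a fixed feature order."""
    return [profile[name] for name in FEATURES]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means identical direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(target, books, top_n=3):
    """Compare the target book with every other book and return the closest matches."""
    target_vec = to_vector(books[target])
    scored = [
        (book_id, cosine_similarity(target_vec, to_vector(profile)))
        for book_id, profile in books.items()
        if book_id != target
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    for book_id, score in recommend("book_a", BOOKS):
        print(book_id, round(score, 3))
```

Here book_b ranks ahead of book_c because its profile sits close to book_a's on every attribute. Keyword overlap could be folded in as further features; attributes on very different scales would normally be normalized first so no single one dominates.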


Many accomplishments in human history happen over a relatively short period of time. Consider that only 66 years separated the Wright brothers’ first flight at Kitty Hawk from Neil Armstrong’s landing on the moon.

Just three years ago, Small Demons used natural language processing techniques to identify specific brands and famous people that appeared in the books it processed. Two years ago, BookLamp (now part of Apple) announced early success in identifying the themes contained in trade books. Today, Trajectory is extracting relevant keywords and generating algorithmic recommendations based on dozens of points of comparison. Given how quickly natural language processing and machine learning are evolving, it is interesting to wonder what will happen over the next 60 years. Will algorithms be created to help authors write custom stories for individual readers, or will the algorithms create the stories themselves?

Jim Bryant is co-founder and CEO of Trajectory, Inc.

About the Author

Guest Contributor

Guest contributors to Publishing Perspectives have diverse backgrounds in publishing, media and technology. They live across the globe and bring unique, first-hand experience to their writing.