Publishing Perspectives Staff Report
Offering 251 Voices in 72 Languages
The digital acceleration experienced by many parts of the world publishing industry during the ongoing coronavirus COVID-19 pandemic meant not only a greater adoption (and for some consumers discovery) of ebooks, but also an uptake in audiobook sales and listening.
Logically, this has led some publishing houses to look at their backlists as a source of new, salable audio content.
An American company with Siberia-born founders, Speechki was created just in time, it would seem, for this new interest among many publishers in generating audio products from existing content. As Publishing Perspectives readers know, audiobooks are a favorite among publishing’s people because they’ve had a strong track record of sales for many years, with subscription services gaining traction in many international markets. The company will be exhibiting at Frankfurter Buchmesse (October 20 to 24) in Hall 4, stand B101, and is giving a digital Master Class at Frankfurt on October 13.
While many major publishing houses have developed their own divisions to produce audiobooks—and many contract the work out to independent production companies—expense remains a consideration. The work of talented narrators as well as the costs of good technical preparation make it necessary to triage a list, deciding where the best chances for audio income lie, and choosing carefully which titles get audiobook editions.
Speechki’s answer to this is the use of synthetic voices, text-to-speech read by machine. The company offers 251 voices in 72 languages, which makes it possible to produce internationally salable audio. Automation doesn’t stop with the production of a chosen voice, either. The system was designed especially for publishers, to make the traditional steps of audiobook production easy. With a few clicks, publishers can upload a book’s text, select a voice and language, and choose audio settings including the speed of the narration and the desired type of audio file.
Speed to market might be an appreciated factor, as well. While traditional audiobook production can take weeks and sometimes months, Speechki reduces it to one or two days: the company says it needs only 15 minutes to generate an eight-hour audiobook and about 10 more hours for “proof-listening” and fixing machine errors. Part of the team’s interest also lies in taking the mystique out of what falls under the heading of artificial intelligence (AI).
“There’s no need to cheat or dissemble” about the fact that a listener is hearing a machine-generated voicing of a book. “You can be absolutely up-front about the fact that you’re using a synthetic voice,” say company materials.
‘A Solid Network in Publishing’
When Publishing Perspectives asked co-founding CEO Dima Abramov why he and his partner Sergey Baranov (COO) established the business, Abramov said he’d had the same disappointment that many of us have run into—looking for the audiobook edition of something and finding out there wasn’t one.
“I often had trouble finding the books I wanted to read as audiobooks,” he says. “Audio is the only convenient format for me.
“Doing research, I realized that more than 95 percent of the books published each year were not produced as audio. So my partner Sergey Baranov and I founded Speechki in 2018. We’d worked together on another successful software development company. And Speechki now has produced nearly 1,000 books for our publishing clients to date.”
He and Baranov, he says, came from a background in news and media. “We had a solid network in publishing,” he says. “We know the market very well.”
Both Baranov and Abramov have advanced degrees from Russia’s Siberian State Automobile and Highway Academy in Omsk, Abramov with a master’s and PhD in computer science and Baranov with a master’s degree in information technology and management and a doctorate in computer software and media applications.
Abramov says the response from publishers to date has been promising. Most who learn about the service seem interested in a pilot program, he says, and the company has been contacted about possible audiobook production for some 7,000 projects.
“Our low costs allow publishers to publish many more books in audio than they ever could before,” he says. “We make scaling up their audio divisions fast and inexpensive.”
An audiobook production, including the narration talent and technical factors, can cost between US$3,000 and $5,000, by some estimates. “The price for one title recorded by Speechki with the most modern synthetic voice is $1,000,” he says.
“And we have less expensive options, including a subscription model for publishers ready to start making conversions in bulk quantities” from text to audio editions.
The reception of an audiobook, of course, depends heavily on how comfortable a listener is with the voice of a narrator. On the Speechki homepage, you can use a slider to run through many “Smiths”—William Smith from Australia, Carol Smith from the United States, Nanda Devi from India, and Evelyn Smith from England, for example—each of whom will read a brief passage to you.
Speechki is able to manipulate the voices to a degree, allowing a client to make choices as to how they sound.
It’s true that there can be issues that need handling. In one sample we’ve heard, the synthetic narrator reads the name of the 16th-century mathematician Copernicus as “Koh-per-NICK-us,” a mistake many human schoolchildren might easily make, too. This is what the proof-listening stage of production is for. Publishers can listen to a recording to catch such needed adjustments, or they can have Speechki do the proof-listening as an in-house service.
And while some professional narrators have been concerned about artificially generated narration challenging their work opportunities, Speechki’s corporate stance is that its service isn’t meant to replace the aesthetic excellence of a fine human narration.
“The company is about expanding the audiobook industry as it already exists,” the team writes, “not replacing it.” No technology, they say, can make the skills of a talented voice actor obsolete. “But actors deserve to be paid and have only so much time,” the team writes in its statement. “At the current rate, they produce recordings of a small fraction of available books. Their skills will continue to be in demand.
“Professional narration is an art, not a science. But Speechki’s services allow publishers to present a huge new variety of titles to the listening public in a pleasant, easily accessible form.”
One of the key advantages of what Speechki’s service does, Abramov and his associates say, is the long-form sustainability of lifelike renditions. Each voice is “cultivated,” if you will, for its own values and sonic factors, so that its freshness and distinction can cover a wide range of textual requirements.
And so the company hopes to be able to offer its demo to trade visitors next month at Frankfurt, and is also available for consultations and online demonstrations at its site—as well as during the Speechki digital Master Class at Frankfurt on October 13 at 3 p.m. CEST / 1300 GMT / 9 a.m. ET.