Listening for Wider Adoption: ReadSpeaker Talks Up Text-to-Speech

Even as some in publishing seem resistant to the educational benefits of text-to-speech, proponents like ReadSpeaker say the technology is getting stronger.
At Frankfurt Book Fair. Image: Frankfurter Buchmesse, Marc Jacquemin

By Mark Piesing | @MarkPiesing

This Is Not Your Computer Talking
Roy Lindemann co-founded text-to-speech company ReadSpeaker in 1999.

Today, a technology that 18 years ago may have seemed like science fiction is used by some of the biggest names in educational publishing.

Discovery Education and Cengage Learning use ReadSpeaker to increase students’ understanding and recall of their digital content, whether those students are dyslexic, learn better and recall more by listening, or simply lead very busy lives on the move.

Roy Lindemann

As Lindemann said during his presentation on the Education Hot Spot stage at Frankfurt Book Fair, the program can read more than 40 languages and speak in more than 150 voices.

Languages and voices offered by the software include Turkish male and Cantonese female, along with a new American English male voice called Mark.

According to Lindemann, since ReadSpeaker was embedded in Cengage’s MindTap platform, clicks on the listening button have jumped from 40,000 in 2011 to some 9 million by 2015. MindTap is an e-learning platform for personalized digital learning tools. Ten percent of all page views now generate a ReadSpeaker activation.

And yet text-to-speech software does seem to have a problem: adoption isn’t always quick, and traditional publishers may be among those holding out.

‘Making Content More Comprehensible’

Cengage is now asking to have its offline mobile application speech-enabled by ReadSpeaker, which has also been embedded in Discovery Education’s platform. This means that both HTML content and documents on the site can be viewed and listened to: last year, the system recorded more than 10 million listens. Users can listen to an entire document or to just a section of it.

Meanwhile, Educational Testing Service (ETS) develops, administers, and marks more than 50 million tests annually in more than 180 countries at 9,000 locations. To reduce costs, it wanted a text-to-speech application that works offline to replace human readers of examination papers in a number of the largest US states.

At a demonstration on the ReadSpeaker stand at Frankfurt, Publishing Perspectives had a look at how easy the tech is for users to handle: a complex text about how a petrol engine works was converted to audio.

It proved to be relatively simple to highlight the text on the screen that a user might want to hear spoken.

“It’s well known today that no two students have the same learning styles,” Lindemann said in his presentation, “and educational publishers have to cater for different ways of accessing their content.

“Twenty percent of the population has some form of limited literacy skills, visual impairment, or learning disabilities. It’s about making content more comprehensible. Bimodal learning [reading and listening at the same time] has been shown to improve word recognition skills, reading comprehension, fluency, accuracy, and concentration.

“It can improve information recall. It can also increase student motivation and encourage more positive attitudes toward reading, and this increases the self-confidence in reading for learners. For educational publishers, text-to-speech is another distribution channel for their content.”

When It’s Not Embraced: The Artificiality Objection
But when parts of the industry seem resistant to putting voice to text, Lindemann tells Publishing Perspectives, it’s often because “there’s still a perception that text-to-speech technology is totally generated by computers.”

“In fact, it’s not a voice generated by computers. Instead, we use voice actors who are trained to read a text in a certain stable way.

“The recording of the voice actor reading a set text aloud is broken down into very small units of speech like individual syllables and phonemes. Then these units are reassembled by an algorithm.

“We use different technologies to do this, one of which is unit selection synthesis. As a result of this process, we are really getting to a natural voice quality.”
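For readers curious about what unit selection involves, here is a minimal sketch in Python. It is not ReadSpeaker's implementation: the `Unit` fields, the pitch-only cost functions, and the tiny inventory are all assumptions made up for illustration. The idea it shows is the general one Lindemann describes: recorded snippets are stored per phoneme, and an algorithm picks the sequence that best matches the wanted sound while joining smoothly from one snippet to the next.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded snippet (hypothetical, simplified to pitch only)."""
    phoneme: str        # label of the recorded snippet
    start_pitch: float  # pitch in Hz where the snippet begins
    end_pitch: float    # pitch in Hz where the snippet ends


def target_cost(unit, wanted_pitch):
    # How far the snippet's average pitch is from what the sentence needs.
    return abs((unit.start_pitch + unit.end_pitch) / 2 - wanted_pitch)


def join_cost(left, right):
    # How audible the seam would be between two neighbouring snippets.
    return abs(left.end_pitch - right.start_pitch)


def select_units(targets, inventory):
    """Pick one unit per target so total target + join cost is minimal.

    targets: list of (phoneme, wanted_pitch) pairs.
    Returns (chosen_units, total_cost). Uses dynamic programming (Viterbi).
    """
    if not targets:
        return [], 0.0
    candidates = [[u for u in inventory if u.phoneme == ph] for ph, _ in targets]
    if any(not c for c in candidates):
        raise ValueError("inventory lacks a recording for some phoneme")

    # best[i][j] = (cheapest cost ending in candidates[i][j], back-pointer)
    best = [[(target_cost(u, targets[0][1]), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        row = []
        for u in candidates[i]:
            tc = target_cost(u, targets[i][1])
            row.append(min(
                (best[i - 1][k][0] + join_cost(prev, u) + tc, k)
                for k, prev in enumerate(candidates[i - 1])
            ))
        best.append(row)

    # Walk the back-pointers from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = best[i][j][1]
    path.reverse()
    return path, total


# Toy inventory: two recordings each of "h" and "i" at different pitches.
inventory = [
    Unit("h", 100, 110), Unit("h", 140, 150),
    Unit("i", 112, 120), Unit("i", 150, 160),
]
chosen, cost = select_units([("h", 105), ("i", 116)], inventory)
print([u.phoneme for u in chosen], cost)
```

Real systems weigh many more features than pitch (duration, spectral shape, context), but the trade-off between matching the target and hiding the seams is the same.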

ReadSpeaker’s American English female voice, Sophie, was recognized by ASR News this month as the most accurate text-to-speech voice on the market after being tested on 1,800 phrases.

About the Author

Mark Piesing

Mark Piesing is a freelance journalist (and teacher) based in Oxford, UK. He now writes mainly about technology, culture, and the intersection between the two for some of the biggest brands in the UK media, such as The Economist, Wired.co.uk, and The Guardian. He also contributes to Warwick Business School's Core magazine. WBS is one of the top business schools in the UK.
