By Mark Piesing
Was this story written by a robot?
Unless C-3PO was writing the story right next to you, or unless I cracked a joke, expressed an opinion or demonstrated some other form of human creativity such as an elegant turn of phrase, it might be a lot harder to tell than you’d think.
This isn’t solely a theoretical question either: in March 2014, within three minutes of a 4.7 magnitude earthquake occurring off the coast of California, a story had been written by a piece of software called Quakebot, approved by an editor and published online by the LA Times.
This wasn’t the first time the LA Times had used a robot to write a story (or anyone else for that matter: a bot was being used to write weather forecasts in Canada as far back as the early 1990s), but last March seemed to be the moment when robot journalism stepped out of Hollywood and on to the high street. And journalists like me had to start taking a long hard look at our career plans as some commentators started to argue that 90% of the journalism read by the public will be written by robots in 2025.
Now media outlets including Yahoo and Associated Press use a piece of software called Wordsmith to write automated text: Yahoo for personalized, colorful commentary about every individual team, every week, in its Fantasy Sport series; and AP to churn out 4,400 quarterly earnings reports, a fifteen-fold increase on what it achieved using humans. Some big household names like Société Générale and L’Oreal are said to be already using robo-writing software called Yseop to produce documents such as company reports and even dialog with customers online. And these are only the ones willing to go public with it. In private many more companies like these are already using robo-writers, but are hiding behind non-disclosure orders out of fear perhaps of public reaction or, more likely, of losing their competitive advantage over their rivals.
These robot journalists work at superhuman speeds. The software program Yseop, uploaded to a customer’s server, can write 3,000 pages per second – vital speed in a world where he who publishes first gets the most hits. Wordsmith can write 2,000 articles every second.
And while the software salespeople tell journalists that they don’t have to worry about keeping their jobs because this software will in fact give them more time to focus on the interesting bits, their wide, shark-like grins suggest that journalism will soon join all the other professions that robots may take over.
Robo-journalism depends on software that can be categorized as Natural Language Generation (NLG) systems. Like a lazy student who cuts and pastes from Wikipedia, at their simplest these systems match lines of text from a library to facts and figures from a database to produce a short factual story. More advanced systems can spot repetition, choose individual words, decide a narrative structure and even alter tone or style for different outlets. They can publish direct to content management systems and in different languages.
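The simplest kind of system described above can be sketched in a few lines of Python. This is only an illustration of the general template-filling idea, not the code behind Quakebot, Wordsmith or Yseop: the template library, the `write_story` function and the earthquake record (apart from the 4.7 magnitude mentioned earlier) are all invented for the example.

```python
# Hypothetical template library: each entry pairs a test on the data
# with a line of text containing named slots. Order matters: the first
# template whose test passes is the one that gets used.
TEMPLATES = [
    (lambda q: q["magnitude"] >= 6.0,
     "A strong magnitude {magnitude} earthquake struck {distance_km} km "
     "from {place} on {day}."),
    (lambda q: q["magnitude"] >= 4.0,
     "A magnitude {magnitude} earthquake was reported {distance_km} km "
     "from {place} on {day}."),
    (lambda q: True,
     "A minor magnitude {magnitude} tremor was recorded {distance_km} km "
     "from {place} on {day}."),
]


def write_story(record):
    """Pick the first matching template and fill its slots from the record."""
    for matches, template in TEMPLATES:
        if matches(record):
            return template.format(**record)
    raise ValueError("no template matches this record")


# A made-up database row standing in for a real data feed.
quake = {
    "magnitude": 4.7,
    "distance_km": 10,
    "place": "the California coast",
    "day": "Monday",
}

print(write_story(quake))
```

Everything in the finished sentence comes either from the library of text or from the data, which is why such a system is fast and accurate but has nothing to say when the data runs out.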
Several companies have started to build into these systems artificial intelligence (AI) such as an “inference engine” so they can make the next step and do more than just write a simple factual report. Inference engines are not some kind of huge self-aware steampunk machine, but rather act something like an autopilot on an airliner by being able to analyze data based on rules they have been given. Yet even with these AI robo-journalists, we are still talking more quarterly figures than War and Peace.
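For the curious, here is roughly what “analyzing data based on rules” can look like. It is a toy forward-chaining engine, one common textbook approach, and is not a description of how any vendor’s product works; the fact names, the rules and the `infer` function are all hypothetical.

```python
def infer(facts, rules):
    """Forward chaining: keep applying rules until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # A rule fires when all of its premises are already known.
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules for a quarterly earnings report:
# (set of premises, conclusion drawn when all premises hold).
RULES = [
    (frozenset({"revenue_up", "costs_down"}), "profit_up"),
    (frozenset({"profit_up", "beat_forecast"}), "strong_quarter"),
]

derived = infer({"revenue_up", "costs_down", "beat_forecast"}, RULES)
print("strong_quarter" in derived)
```

The engine never sees the conclusion “strong quarter” in its input; it reaches it by chaining two rules together, which is the sense in which such software can say something about why a figure moved rather than just report it.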
A Million Stories for an Audience of One
“We are not aware of a single person who has been replaced by our technology,” says James Kotecki, Manager of Media and Public Relations for Automated Insights, the company based in Durham, North Carolina, that is behind Wordsmith. “Instead, our Wordsmith automation platform both produces content that humans cannot and frees humans to do more interesting work.
“Wordsmith dynamically spots patterns and trends in raw data and then describes those findings in natural language, just like a human would. Wordsmith authors insightful, personalized reports around individual user data at unprecedented scale and in real time.
“Our software is running before the first word is even written, to determine topic, tone, style, fact-generation, and lexicon. The output can be just about any format – web, email, even social media.
“The traditional journalism model is to write one story and hope a million people read it. We flip that model on its head, writing a million personalized stories for audiences as small as one. That type of model only makes sense because of automation.”
There is no theoretical limit to the length of Wordsmith’s output: as long as there is structured data, it can produce content. In the case of an entire book, the limiting factors would likely be the availability of enough of this kind of data and the time it would take to configure the system. Subjects that lack good data are difficult to automate with today’s technology, but this will change in the future, because as the internet of things grows so will the range of data that could be fed into Wordsmith.
Yseop’s Director of Communication Arden Manning says that “Yseop is the next step” by not just analyzing data like its rivals but using an inference engine to analyze how and why something has occurred. Yseop has offices in Dallas, New York, London, Paris and Lyon, and was once listed as one of the top ten most innovative start-ups in France.
“It is important to differentiate between AI and Hollywood AI. Hollywood AI is Mr. Data. This is AI which is a tool for people. When I first started in PR ten years ago journalists had one major deadline a day; now they have to blog, tweet and respond to comments, so their job has gone from working really hard on one story every day to many more, less well-researched – no offense – stories. So what we hope is this will free up your time so you can write the well-researched story that you did ten years ago.”
They have also built the first system that clients can alter or build their own versions of, and “even if you can just change a formula in Excel” you should be able to play with the software.
Manning concedes that there are limitations to his software, but “speed isn’t an issue. Language isn’t an issue. Nor is length. Creativity and opinion are the only real limitation.”
Yseop can write stories in English, Spanish, French and German, with more like Chinese and Japanese on the horizon.
“Sitting down questioning me like you are doing now is not something that a machine is going to do as you are being creative with your questioning. Anyone who says that a computer is going to win the next Pulitzer Prize is exaggerating for effect; it’s not going to happen.
“When you have a computer with a clear reasoning process it can offer non-biased advice. I suppose you could program in some sort of bias. ‘If this is happening, the answer is always lower taxes.’”
Christer Clerwall, an assistant professor of media and communications at Sweden’s Karlstad University, has run a small study with a group of undergraduates to see what public reactions to robot-written content might be. While students aren’t typical members of the public, they did rate the recap of an American football game written by a robot as more credible but more boring and tiring than the report written by an LA Times journalist, which was more pleasant to read. However, to his “surprise” they thought both reports were readable.
In the end, Clerwall says “I always get asked the question, if robot journalism is going to be good for journalism.” And his answer is always the same: “If you can free journalists from this kind of manual work and use these resources for more in-depth journalism then it would be great; however, technology has usually been used to decrease cost and increase profit so I am more skeptical about this more positive side.”
However, at the moment he is certainly not seeing any signs that it is being used in the Swedish media, although that could be down to the simple fact that the software doesn’t work in Swedish yet.
The future of robot journalism for Clerwall is data-journalism, as the bots will make it cheaper because “there is a lot of effort going into allowing these programs to mine unstructured data, as in the past someone had to do a lot of manual work analyzing it and then placing it in a format that is accessible.
“But while systems are getting better and better at analyzing and retrieving usable information from non-structured data I don’t see in any way how the software itself could become the investigative journalist.”
But doubtless they will come up with an algorithm for that.