By Porter Anderson, Editor-in-Chief | @Porter_Anderson
‘Advanced Statistical Algorithms’
Of all the commentary about OpenAI’s model ChatGPT (and who hasn’t commented on it?), some of the more level-headed observations for the publishing industry may come from Thomas Cox, a specialist in computer science and the managing director of Arq Works near Oxford.
Cox’s Arq Works focuses on software for the book publishing industry. This, of course, is one of the key reasons that we wanted to speak with him for today’s article (February 17). He’s familiar to many of our readers for his work with Frankfurt Rights.
And with Arq’s team having “integrated artificial intelligence into their content management and discovery tools” for clients, he recently wrote a brief opinion piece at Arq Works’ site about “AI” and ChatGPT, “The Future of AI, What is ChatGPT, and Does This Affect Publishing?”
Cox’s input is useful, of course, because the international publishing industry at times can be quite emotional in its responses to technological developments. You’ll remember no small amount of tearing of hair over anything dubbed “digital” years ago, for example, and the days when “enhanced ebooks” were predicted to guarantee a future in which print books would have no place. Similarly, there have been rash warnings of computers using ChatGPT to write and sell whole books—and trashing the book publishing industry forever, just as video killed the radio star, right?
‘It Will Happily Lie to You All Day’
Many people, even in publishing, can be unnerved by cyber-rattling when ChatGPT is discussed.
But what may be more seriously concerning, as Cox points out, is the potential for such systems to generate and promulgate content that’s incorrect and misleading: misinformation and, in the wrong hands, disinformation.
So extensive are concerns about ChatGPT that, as Anna Tong in San Francisco is writing for Reuters, “OpenAI, the startup behind ChatGPT, on Thursday said it is developing an upgrade to its viral chatbot that users can customize, as it works to address concerns about bias in artificial intelligence.”
Cox talks about how the software can “hallucinate,” as technologists call it, generating a stream of authoritative-sounding verbiage that’s in fact utterly wrong.
“Advanced statistical algorithms. It’s never going to be a thinking machine.” Thomas Cox, Arq Works
“Which is to say, it will lie to you,” Cox says. “Like it’s the truth. Because it has no idea that it’s not the truth. So if you ask it for a particular set of facts about something, it will tell you that ‘whatever was invaded on this date in this year, and all of these things happened’—and you can’t trust it.
“A lot of the time it’s wrong because it’s based on how it’s trained. Because of the way the statistical model in the background works, it’s not a truth engine at this point. It’s not a knowledge base. It’s not the source of all human knowledge. It’s just representing an answer which has come back [because] it’s matching on all the algorithms that it’s been trained on. So it will happily lie to you all day.”
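Cox’s point about a statistical model can be seen in miniature. The sketch below is a toy word-pair counter, not ChatGPT’s actual architecture (which is a far larger neural network), and its three-sentence training text is made up for illustration. It picks whichever word most often followed the previous one, and nothing in it checks whether the result is true.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, length):
    """Extend `start` by always choosing the most frequent next word.

    There is no fact checking here, only frequency.
    """
    out = [start]
    for _ in range(length - 1):
        options = follows.get(out[-1])
        if not options:
            break
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

# Hypothetical toy corpus: one date simply appears more often than the other.
corpus = "rome fell in 476 . rome fell in 476 . paris fell in 1940 ."
model = train_bigrams(corpus)
sentence = generate(model, "paris", length=4)
print(sentence)
```

Asked about “paris,” this toy model pairs it with the date it has seen most often after “in,” which in the made-up corpus belongs to a different city. The sentence is fluent and confident, and it contradicts the very text the model was trained on.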
Answering one concern for educators in publishing, Cox says, “I think that for professors, educated people who are reading blog posts, it’s going to be easy to see the ones spewed out by ChatGPT or similar systems because if they’ve not been updated, they will just be absolutely filled with inaccuracies.”
Microsoft already is writing “Learning From Our First Week” blog posts on its Bing site, where it’s integrating OpenAI tech into its search engine. Those back-pedaling wheels of change are quickly in force, Microsoft’s tone now being one of “healthy engagement” and “good feedback on how to improve.”
This Too, Too Sullied Software
In his own post, Cox shows you a bit of text that ChatGPT generated for him “in the style of William Shakespeare.” A short excerpt:
“To publish online, or not to publish online, that is the question:
“Whether ’tis nobler in the mind to share one’s thoughts and creations
“On the World Wide Web, or to keep them locked away,
“Never to be seen by the digital eyes of the world?”
It’s this sort of stylistic mimicry—algorithms at work on the key phrase “style of Shakespeare”—that might turn a head or two in a publishing office. Give it online writing as a theme and a reference to Hamlet’s soliloquy, and the software can make clever-ish substitutions and allusions that might even compare thee to a summer’s day.
But Cox knows to start where many people in publishing—people who work with words for a living, after all—might also begin: with the erroneous nature of the term artificial intelligence itself.
This is not actually intelligence. There is no known sentience in any “AI” system. The “AIs,” as they’re called, are not self-aware and they “know” nothing. Machine learning, in fact, is not even learning as a child might learn, but the rapid selection of responses developed through super-fast iterations of a question or factor. The machine plunders existing texts, searching for matches to a phrase, without consciousness or context.
“In the short term, I’m sure it will be used by the publishing community to help research and produce blog posts, news articles, and content.” Thomas Cox, Arq Works
What are we really talking about here? “Advanced statistical algorithms,” Cox says, proving that reality rarely turns out to be so sexy as a buzz phrase. If A.I. Artificial Intelligence played well as a 2001 film from Steven Spielberg, it’s a phrase far too facile for the truth of what we’re talking about today.
The software is capable of making vast numbers of comparisons across huge volumes of textual sources, with lightning speed. “Reinforcement learning,” Cox points out, means that the program uses what he calls “a loop of human feedback to refine and rank the answers that were provided.” So this is vast algorithmic capability with a human follow-up. That’s how it arrives at the best answer to a question, and how it still can “byte” into a huge pile of responses that are way off the mark.
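The “loop of human feedback” Cox describes can be pictured with a very small sketch. This is an illustration under stated assumptions, not OpenAI’s training procedure: the candidate answers and the reviewers’ choices are hypothetical, and a real system would use such preferences to adjust a model rather than just sort a list.

```python
from collections import defaultdict

def rank_by_feedback(candidates, preferences):
    """Order candidate answers by how often human reviewers preferred them.

    `preferences` is a list of (preferred, rejected) pairs, one per judgment.
    Ties keep their original order because Python's sort is stable.
    """
    wins = defaultdict(int)
    for preferred, _rejected in preferences:
        wins[preferred] += 1
    return sorted(candidates, key=lambda answer: wins[answer], reverse=True)

# Hypothetical answers and reviewer judgments, for illustration only.
candidates = ["answer A", "answer B", "answer C"]
preferences = [
    ("answer B", "answer A"),
    ("answer B", "answer C"),
    ("answer C", "answer A"),
]
ranking = rank_by_feedback(candidates, preferences)
print(ranking)
```

The ranking reflects only what reviewers happened to prefer. If they favor an answer that sounds good but is wrong, the loop rewards it all the same.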
“It’s never going to be a thinking machine,” Cox says. “Parts of its architecture are kind of modeled on neurons and how they seem to act. But that’s only at quite an abstract level in the sense that we understand how the brain works and how those neurons interact, an understanding that’s still relatively rudimentary.
“So whenever someone says, ‘Okay, we’ve got our new AI system,’ in my head I’m always thinking about how that can range from someone writing an algorithm in order to solve a particular specific issue all the way to something like ChatGPT, which has moved the bar a fair bit because of its general nature. It’s not solving a specific problem. It’s designed to generate text, but that’s any text that you want.”
And this is a key point, he notes, the “general” nature of ChatGPT, a “generalist generative AI,” wowing us because it seems—like a huge human mind—to have access to so many topics, themes, concepts, data.
‘A Lot of Hype’
“Many of the technologies that OpenAI is using could be developed by Google,” Cox points out. “They have similar technologies but to a completely different scale, because obviously they’ve got the resources to put into it. And, you know, their search changes all the time. Who knows how much of the algorithm underneath is using their ‘AI’ technologies in order to respond better?
“It must be five years ago that they launched Google Lens. Take a picture or hold your phone up to anything and it will tell you what it is, and if you want, it will do a search for it. It’s a slightly different technology, but it’s all ‘AI’-based, all advanced algorithms again, turning that picture into vectors, essentially comparing it to other objects.”
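What Cox calls “turning that picture into vectors” and “comparing it to other objects” can be sketched in a few lines. The three-number vectors and the labels below are invented for illustration, and this is not a description of how Google Lens is actually built. Real systems derive much longer vectors from images with trained models. The comparison step, though, is often a similarity measure like this one.

```python
import math

def cosine_similarity(a, b):
    """Return how closely two vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def closest_label(photo_vector, catalog):
    """Pick the catalog entry whose vector is most similar to the photo's."""
    return max(catalog, key=lambda label: cosine_similarity(photo_vector, catalog[label]))

# Hypothetical, hand-made vectors standing in for what a trained model would produce.
known_objects = {
    "book": [0.9, 0.1, 0.2],
    "mug": [0.1, 0.8, 0.3],
    "phone": [0.2, 0.2, 0.9],
}
photo = [0.8, 0.2, 0.1]
best = closest_label(photo, known_objects)
print(best)
```

The program never “sees” a book. It reports only which stored list of numbers the new list of numbers most resembles.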
Never one to miss the sound of another shoe falling, Cox, having mentioned Google, adds, “I would imagine they’ll hit back with something” as ChatGPT fever spreads. “There’s a huge amount of funding going to OpenAI with a lot of hype, and an awful lot of the technology underlying it is actually Google’s.”
In his own post, Cox writes, “In the short term, I’m sure it will be used by the publishing community to help research and produce blog posts, news articles, and content. In the longer term, similar technologies will revolutionize the way we access information and will automate the types of tasks that up until now have been seen as skilled work.” That, of course, could mean disruption in a workforce that gradually shifts to certain amounts of automation, though at this point it’s unclear in what scenarios.
“Because of the way the statistical model in the background works, it’s not a truth engine at this point. It’s just representing an answer which has come back [because] it’s matching on all the algorithms that it’s been trained on. So it will happily lie to you all day.” Thomas Cox, Arq Works
For now, he says, it’s good to keep in mind that software like ChatGPT is “feeding off found texts. There are a few publicly available databases right now in text, and that’s what it was trained on. That includes sites like Wikipedia” as well as the content of freely available books, sites “from all over the Internet,” and so on.
“And with that,” Cox says, “you get all those kinds of things we’ve worried about: ‘AI’ reproducing the biases it finds online.”
Think about instances of misinformation and disinformation online relevant to the COVID-19 pandemic alone, and you realize the potential breadth of what he’s talking about as a bot sifts through those texts. Efforts to contextualize what the software might offer from such a trove of unfiltered source material have led some to suggest that one development may let you have it produce responses that are more formal and more guarded. “You might be fine with it being more loose in how it speaks,” he says. Or not.
But at this point, the deeper concerns here have to do with the potential for deliberate customization to impact a reader.
“The really scary thing,” Thomas Cox says, “is that if you’re using an app that knows something about you—say, for instance, one of your social media apps—then if any of that information about you is available externally to advertisers or people who are trying to tap into that, [someone using a ChatGPT-style software] can start customizing the message to you.”
What may, in fact, become the token of publishing’s realm, then, is that it’s not algorithm-generated content. A publisher’s marketing campaign for a new book that’s “100-percent human-generated literature” could become the most valuable pitch around.