Stanford Literary Lab Applies “Big Data” to Reading


By Dennis Abrams


A topic of words from Moby Dick. Image: Matthew Jockers.

What with all the recent headlines about the NSA, big data and metadata, it’s nice to know that non-governmental organizations can make use of such things as well. One such use, suggested by Franco Moretti, professor of English at Stanford University and co-founder of the Stanford Literary Lab, is for those with too many books on their hands and not enough time to read them: “Feed the books into a computer program and make graphs, maps and charts: it’s the best way to get to grips with the vastness of literature.” (Ed. note: This sounds an awful lot like an offshoot of BookLamp’s ‘Book Genome Project,’ which also worked with the Lab in its early days.)

Writing about Moretti in the Financial Times, John Sunyer notes that “For centuries, the basic task of literary scholarship has been close reading of texts. But for digitally savvy academics such as Moretti, literary study doesn’t always require scholars to read books. This new approach to literature depends on computers to crunch ‘big data,’ or stores of massive amounts of information, to produce new insights.”

Examples? A 2011 Harvard study of roughly 5 million digitized books, about four percent of all books ever published, found that fewer than half of the words used in English books appear in standard dictionaries; the rest are what the researchers called “lexical dark matter.” Or, as another recent and highly publicized study showed, “American English has become decidedly more ‘emotional’ than British English in the last half-century.”
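For readers curious what “crunching” a corpus for this kind of finding looks like, here is a minimal sketch of the basic measurement: the share of distinct words in a set of texts that a reference dictionary does not list. The function names, the toy dictionary and the optional `min_count` frequency cutoff are hypothetical illustrations; the Harvard team’s actual corpus, dictionaries and thresholds were far more elaborate, and this is not their code.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters, with internal apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def dark_matter_share(texts: list[str], dictionary: set[str], min_count: int = 1) -> float:
    """Return the fraction of distinct words (seen at least min_count times)
    that are missing from the given dictionary.

    Hypothetical helper for illustration only; min_count is an assumed cutoff
    that drops very rare words, loosely mimicking a frequency threshold.
    """
    # Count every token across the whole corpus.
    counts = Counter(tok for text in texts for tok in tokenize(text))
    # The corpus "lexicon" is the set of distinct words frequent enough to count.
    lexicon = {word for word, n in counts.items() if n >= min_count}
    if not lexicon:
        return 0.0
    # Words that appear in the books but not in the reference dictionary.
    missing = {word for word in lexicon if word not in dictionary}
    return len(missing) / len(lexicon)
```

Run on the opening line of Moby Dick with a toy dictionary that lacks “Ishmael,” one of the eleven distinct words is missing, so the function returns 1/11, about 0.09. At the scale of millions of books, the same ratio is what lets researchers say how much of the language the dictionaries leave out.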

Of course, this approach to literature has its detractors. Elif Batuman, author of The Possessed: Adventures with Russian Literature and the People Who Read Them, wrote in n+1 that “[His] concepts have all the irresistible magnetism of the diabolical.” But as Sunyer writes, for Moretti, “The use of technology to study literature is only radical when you consider it in the context of the humanities – the most backward discipline in the academy. Mining texts for data makes it possible to look at the bigger picture – to understand the context in which a writer worked on a scale we haven’t seen before.”

In the article, Sunyer describes joining a Stanford Literary Lab seminar via Skype, where he spoke with Ryan Heuser, the lab’s 27-year-old associate director for research. Heuser told Sunyer that he couldn’t remember the last time he had actually read a novel: “It was probably a few years ago and it was probably a sci-fi. But I don’t think I’ve read any fiction since I’ve been involved with the lab.”

This might seem odd, given that Sunyer describes the lab as “the office of the world’s most elite group of data-diggers in the humanities.” But as Matthew Jockers, a 46-year-old professor of English and co-founder of the lab, told Sunyer, “We are reaching a tipping point. Today’s student of literature must be adept at gathering evidence from individual texts and mining digital text repositories.”

Jockers’ big claim to fame? He’s the first professor to assign 1,200 novels for just one class. “Luckily for the students, they didn’t have to read them,” he told Sunyer.

Other potential uses for big data? In his book To Save Everything, Click Here, technology writer Evgeny Morozov notes that Amazon has access to huge amounts of data collected from its Kindle devices, including exactly where in a book readers give up. Before we know it, Morozov speculates, “Amazon could build a system that uses this aggregated reading data to write novels automatically that are tailored to readers’ tastes.”
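The aggregation Morozov describes can be illustrated with a small sketch. Neither the article nor Amazon says how Kindle reading data is stored, so everything here is an assumption: each reader is reduced to a single number, the furthest point reached expressed as a fraction of the book (1.0 meaning finished), and the hypothetical function simply tallies, for the readers who did not finish, which tenth of the book they stopped in.

```python
from collections import Counter


def abandonment_by_decile(furthest_positions: list[float]) -> dict[int, int]:
    """Count how many readers gave up in each tenth of a book.

    Hypothetical helper. Assumes each value is a reader's furthest position
    as a fraction of the book in [0, 1], where 1.0 means the reader finished.
    Finishers are skipped, so the counts show only where readers stopped.
    """
    counts = Counter()
    for pos in furthest_positions:
        if not 0.0 <= pos <= 1.0:
            raise ValueError(f"position out of range: {pos}")
        if pos >= 1.0:
            continue  # finished the book, so not an abandonment
        # Decile 0 covers [0, 0.1), decile 9 covers [0.9, 1.0).
        counts[int(pos * 10)] += 1
    # Report every decile, including those where nobody stopped.
    return {decile: counts.get(decile, 0) for decile in range(10)}
```

For seven sample readers at positions 0.05, 0.08, 0.12, 0.55, 1.0, 1.0 and 0.93, the tally puts two abandonments in the first tenth and one each in the second, sixth and last tenths. Summed over millions of devices, a table like this is the kind of signal Morozov imagines feeding back into what gets written.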

Where might it all lead? Moretti concedes that distant reading is still a “complex, thorny issue,” but adds, “Will we succeed? Who knows. But in the next few years, people will use this data in ways we can’t imagine yet. For me, that’s the most exciting development.”

Franco Moretti is the author of Distant Reading, a collection of essays published earlier this month by Verso.

About the Author

Dennis Abrams

Dennis Abrams is a contributing editor for Publishing Perspectives, responsible for news, children's publishing and media. He's also a restaurant critic, literary blogger, and the author of "The Play's The Thing," a complete YA guide to the plays of William Shakespeare published by Pentian, as well as more than 30 YA biographies and histories for Chelsea House publishers.