Little Words Help Crack the "Cuckoo's Calling" Case
"Harry Potter" author J.K. Rowling was recently revealed to have written a crime novel, "The Cuckoo's Calling," under the pseudonym Robert Galbraith. She was found out in part because two experts in authorship attribution analyzed the "little words" used in the novel's text.
The literary world is still abuzz over the revelation by London’s Sunday Times that J.K. Rowling of “Harry Potter” fame secretly wrote the well-received crime novel “The Cuckoo’s Calling” under the pen name Robert Galbraith. In chasing the scoop, the reporters called upon two experts in the field of authorship attribution to determine if “Galbraith” was really Rowling. The experts ran the texts through software programs designed to spot stylistic similarities, and the results were compelling enough for the Times to confront Rowling, who confessed to the pseudonymous work. ...
In pursuing the Rowling bombshell, freelance writer Cal Flyn, who worked with Times arts editor Richard Brooks on the story, contacted two academics who have developed software specifically to examine questions of authorship: Peter Millican, who teaches philosophy and computing at Oxford University, and Patrick Juola, a computer science professor at Duquesne University in Pittsburgh. Flyn provided them with machine-readable texts of “The Cuckoo’s Calling” along with Rowling’s previous novel, “The Casual Vacancy,” and novels by three British women who specialize in crime fiction: Ruth Rendell, P.D. James, and Val McDermid.
Millican’s program, known as Signature, and Juola’s Java Graphical Authorship Attribution Program (JGAAP for short), didn’t take much time to yield an answer: “Cuckoo” was stylistically more similar to “The Casual Vacancy” than it was to the work of any of the three other novelists. Millican requested an additional book by each of the writers, and he found that Rowling’s “Harry Potter and the Deathly Hallows,” despite being in a genre far removed from detective fiction, came in second place, ahead of the six non-Rowling novels he analyzed.
I asked Juola to sketch out his research findings for a guest post on the linguistics blog Language Log, where I am a contributor. His post offers a fascinating glimpse into the nuts and bolts of “forensic stylometry,” the machine-based method of extracting features from different texts and calculating their similarities. Juola fed the texts into JGAAP and ran them through four different tests. He looked at the distribution of word lengths in each book, an easy way to generate potentially useful data. He also looked at the distribution of the one hundred most commonly occurring words in the language, which mostly consist of lowly “function words” like prepositions, conjunctions, and articles. Even if you are trying to mask your usual writing style by choosing different vocabulary, it’s hard to fake your typical palette of function words.
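To make the idea concrete, here is a minimal Python sketch of two of the measurements described above: the distribution of word lengths and the relative frequency of common function words. It is an illustration under stated assumptions, not a reconstruction of JGAAP or Signature. The ten-word FUNCTION_WORDS list is a small stand-in for the hundred-word list, the Manhattan distance is just one simple way to compare two profiles, and every function name is hypothetical.

```python
import re
from collections import Counter

# Illustrative subset only (an assumption): the analysis described above
# used the one hundred most common words, not these ten.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "it", "with", "but"]


def tokenize(text):
    """Lowercase the text and pull out word tokens, keeping contractions whole."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_length_profile(text, max_len=12):
    """Share of tokens with each length 1..max_len; longer words go in the last bin."""
    tokens = tokenize(text)
    counts = Counter(min(len(t), max_len) for t in tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return [counts[n] / total for n in range(1, max_len + 1)]


def function_word_profile(text, words=FUNCTION_WORDS):
    """Share of all tokens accounted for by each listed function word."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in words]


def manhattan_distance(p, q):
    """Sum of absolute differences between two equal-length profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def closest_author(unknown_text, candidates, profile=function_word_profile):
    """Return the candidate name whose text has the profile nearest the unknown text.

    `candidates` maps an author name to a sample of that author's writing.
    """
    target = profile(unknown_text)
    return min(
        candidates,
        key=lambda name: manhattan_distance(target, profile(candidates[name])),
    )
```

With full-length novels, one would call closest_author with the disputed text and a dictionary of known samples, then repeat the comparison with word_length_profile as the profile argument. A small distance shows only that two texts resemble each other on that one feature, which is why several independent tests were run rather than one.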
Read the rest of the Speakeasy piece here, and then check out Patrick Juola's guest post on Language Log and his interview with Time Newsfeed. You can hear the other expert, Peter Millican, interviewed by the BBC here.
Update: You can hear Ben Zimmer talk about the Rowling case on WNYC's Leonard Lopate Show.