Exploring the pathways of our lexicon
Does E-Mail Have Fingerprints?
In the Sunday Review section of the New York Times, I took a look at how forensic linguists try to determine the author of an e-mail by picking up on subtle clues of style and grammar. This is very much in the news, thanks to a lawsuit filed against Facebook founder Mark Zuckerberg by one Paul Ceglia, who claims that Zuckerberg promised him half of Facebook's holdings, as proven by e-mail exchanges he says they had. Did Zuckerberg actually write the e-mails? Call the language detectives.
As I learned from researching the piece for the Times, the field of forensic linguistics is a contentious one, especially when it comes to matters of authorship attribution. It's one thing when a scholar is trying to determine who wrote a literary work — for instance, when the English professor Donald Foster correctly identified Joe Klein as the author of the political novel Primary Colors, despite Klein's protests to the contrary. Even with a long text, like a play that may or may not have been written by Shakespeare, there can be vehement debates among scholars. Now imagine trying to determine the author of a handful of e-mails or text messages.
The expert report filed on behalf of Zuckerberg concluded that he probably didn't write the e-mails that Ceglia said he did, but the forensic linguists I talked to saw the evidence as rather skimpy. (For further details, see this discussion by Mark Liberman on Language Log, especially the comments made on the post by Ron Butters, Larry Solan, and Carole Chaski, all experts in the field.) A handful of style markers were claimed to reveal an authorial difference between the e-mails in question (Ceglia quoted 35 of them in his amended complaint) and actual Zuckerberg e-mails from the time. These markers included variations in spelling (cannot vs. can not), capitalization (Internet vs. internet), punctuation ("..." vs. ". . ."), and syntax (run-on sentences vs. sentences with separating punctuation). The expert, Gerald McMenamin, found that nine of the 11 style markers he analyzed showed differences and two showed similarities, which he saw as strong enough evidence to conclude that Zuckerberg was not the author of the questioned e-mails. But others have argued that the sample size was simply too small to draw meaningful conclusions, and that the style markers were not systematically measured.
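For readers curious what tallying style markers might look like in practice, here is a minimal Python sketch. It is not McMenamin's method, which the report summary above does not spell out. The marker table, the function names, and the rule of comparing each side's most frequent variant are all assumptions made for illustration, and only the three surface markers named above are covered (the syntactic one is left out).

```python
import re
from collections import Counter

# Hypothetical marker table (an assumption for illustration, not taken from
# the expert report): each marker maps a variant label to a regex.
MARKERS = {
    "cannot": {
        "cannot": re.compile(r"\bcannot\b", re.IGNORECASE),
        "can not": re.compile(r"\bcan not\b", re.IGNORECASE),
    },
    # Case-sensitive on purpose; a sentence-initial "Internet" is ambiguous
    # and would need separate handling in any serious analysis.
    "internet": {
        "Internet": re.compile(r"\bInternet\b"),
        "internet": re.compile(r"\binternet\b"),
    },
    "ellipsis": {
        "...": re.compile(r"\.\.\."),
        ". . .": re.compile(r"\. \. \."),
    },
}


def marker_profile(texts):
    """Count how often each variant of each marker occurs across the texts."""
    profile = {}
    for marker, variants in MARKERS.items():
        counts = Counter()
        for label, pattern in variants.items():
            counts[label] = sum(len(pattern.findall(text)) for text in texts)
        profile[marker] = counts
    return profile


def preferred_variant(counts):
    """Return the most frequent variant, or None if it is absent or tied."""
    ranked = counts.most_common(2)
    if not ranked or ranked[0][1] == 0:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


def compare_markers(questioned, known):
    """Label each marker 'same', 'different', or 'no data' for two text sets."""
    q_profile = marker_profile(questioned)
    k_profile = marker_profile(known)
    result = {}
    for marker in MARKERS:
        q_pref = preferred_variant(q_profile[marker])
        k_pref = preferred_variant(k_profile[marker])
        if q_pref is None or k_pref is None:
            result[marker] = "no data"
        else:
            result[marker] = "same" if q_pref == k_pref else "different"
    return result


if __name__ == "__main__":
    # Invented toy sentences, not text from the actual e-mails.
    questioned = ["I can not make the call... sorry.", "The internet site is down."]
    known = ["I cannot make it. . . sorry.", "The internet site is up."]
    print(compare_markers(questioned, known))
```

A tally like this makes the critics' objection easy to see: with only a handful of short messages, each marker may turn up once or twice, so a "different" label can rest on a single occurrence.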
The Facebook case is just one of many where forensic linguists studying authorship attribution have become key expert witnesses in litigation proceedings. Their expertise has also been sought by law enforcement agencies in the hunt for criminal suspects. One famous case is that of the Unabomber, where textual clues were crucial in identifying Ted Kaczynski as the author of the Unabomber's manifesto in 1996. For instance, the manifesto used the phrase "You can't eat your cake and have it, too," instead of the more typical "You can't have your cake and eat it, too." The "eat/have" word order is, in a way, more logical, and also happens to predate the "have/eat" order (see my On Language reader response for more). Ted Kaczynski had used the "eat/have" version in known letters (his brother David said that their mother had taught them that this was how the expression should go), and this was one of the pieces of evidence that helped FBI agents convince a judge to issue a search warrant for Kaczynski's Montana cabin.
It's unclear, however, whether such evidence against Kaczynski would have stood up in court, had the case come to trial instead of ending in a plea agreement. Federal courts have set criteria for expert testimony, known as the Daubert standard, requiring that conclusions be based on sound scientific methodology rather than anecdotal impressions. Some forensic linguists, such as Carole Chaski of ALIAS Technology, have sought to put authorship attribution on firmer scientific footing. She cross-validates her methodology, which focuses on syntactic features. But as Chaski told me, even attaining a high accuracy rate in identifying the author of an e-mail or other text doesn't mean that we each leave a unique "fingerprint" in our writing. And in any case, she pointed out, the fingerprint metaphor may be rather misplaced, given the barrage of criticism that fingerprint identification has received lately in criminal forensics.
So there might not be a foolproof way of detecting whether an e-mail was or was not written by someone, whether it's Mark Zuckerberg or you or me. Nonetheless, we each have our stylistic quirks in writing, just as we do in speaking. In the Times piece, I made an off-hand mention that one of my own tics is an overreliance on em-dashes — I just can't help myself. My em-dash overuse struck a chord with readers, many of whom told me on Twitter that they too are big em-dash fans. John McIntyre, copy editor for the Baltimore Sun, took me to task, however, counseling restraint with the em-dash when other punctuation like commas or parentheses would do. Now that I've become more self-conscious about em-dashes, I've toned it down a bit, which just goes to show how much variability a single person can have in his or her writing. As Walt Whitman might say, we are large; we contain multitudes.