Word Routes

Exploring the pathways of our lexicon

Does E-Mail Have Fingerprints?

In the Sunday Review section of the New York Times, I took a look at how forensic linguists try to determine the author of an e-mail by picking up on subtle clues of style and grammar. This is very much in the news, thanks to a lawsuit filed against Facebook founder Mark Zuckerberg by one Paul Ceglia, who claims that Zuckerberg promised him half of Facebook's holdings, as proven by e-mail exchanges he says they had. Did Zuckerberg actually write the e-mails? Call the language detectives.

As I learned from researching the piece for the Times, the field of forensic linguistics is a contentious one, especially when it comes to matters of authorship attribution. It's one thing when a scholar is trying to determine who wrote a literary work — for instance, when the English professor Donald Foster correctly identified Joe Klein as the author of the political novel Primary Colors, despite Klein's protests to the contrary. Even with a long text, like a play that may or may not have been written by Shakespeare, there can be vehement debates among scholars. Now imagine trying to determine the author of a handful of e-mails or text messages.

The expert report filed on behalf of Zuckerberg came to the conclusion that he probably didn't write the e-mails that Ceglia said he did, but the evidence was seen as rather skimpy by the forensic linguists I talked to. (For further details, see this discussion by Mark Liberman on Language Log, especially the comments made on the post by Ron Butters, Larry Solan, and Carole Chaski, all experts in the field.) A handful of style markers were claimed to reveal an authorial difference between the e-mails in question (Ceglia quoted 35 of them in his amended complaint) and actual Zuckerberg e-mails from the time. These markers included variations in spelling (cannot vs. can not), capitalization (Internet vs. internet), punctuation ("..." vs. ". . ."), and syntax (run-on sentences vs. sentences with separating punctuation). The expert, Gerald McMenamin, found that nine of the 11 style markers he analyzed showed differences and two showed similarities, which he saw as strong enough evidence to conclude that Zuckerberg was not the author of the questioned e-mails. But others have argued that the sample size was simply too small to draw meaningful conclusions, and that the style markers were not systematically measured.

The Facebook case is just one of many where forensic linguists studying authorship attribution have become key expert witnesses in litigation proceedings. Their expertise has also been sought by law enforcement agencies in the hunt for criminal suspects. One famous case is that of the Unabomber, where textual clues were key in identifying Ted Kaczynski as the author of the Unabomber's manifesto in 1996. For instance, the manifesto used the phrase "You can't eat your cake and have it, too," instead of the more typical "You can't have your cake and eat it, too." The "eat/have" word order is, in a way, more logical, and also happens to predate the "have/eat" order; see my On Language reader response for more. Ted Kaczynski had used the "eat/have" version in known letters (his brother David said that their mother had taught them that was the way the expression should be), and this was one of the pieces of evidence that helped FBI agents convince a judge to issue a search warrant for Kaczynski's Montana cabin.

It's unclear, however, whether such evidence against Kaczynski would have stood up in court, had the case come to trial instead of ending in a plea agreement. Federal courts have set criteria for expert testimony, known as the Daubert standard, requiring that conclusions be based on sound scientific methodology rather than anecdotal impressions. Some forensic linguists, such as Carole Chaski of ALIAS Technology, have sought to put authorship attribution on firmer scientific footing. She cross-validates her methodology, which focuses on syntactic features. But as Chaski told me, even attaining a high accuracy rate in identifying the author of an e-mail or other text doesn't mean that we each leave a unique "fingerprint" in our writing. And in any case, she pointed out, the fingerprint metaphor may be rather misplaced, given the barrage of criticism that fingerprint identification has received lately in criminal forensics.

So there might not be a foolproof way of detecting whether an e-mail was or was not written by someone, whether it's Mark Zuckerberg or you or me. Nonetheless, we each have our stylistic quirks in writing, just as we do in speaking. In the Times piece, I made an off-hand mention that one of my own tics is an overreliance on em-dashes — I just can't help myself. My em-dash overuse struck a chord with readers, many of whom told me on Twitter that they too are big em-dash fans. John McIntyre, copy editor for the Baltimore Sun, took me to task, however, counseling restraint with the em-dash when other punctuation like commas or parentheses would do. Now that I've become more self-conscious about em-dashes, I've toned it down a bit, which just goes to show how much variability a single person can have in his or her writing. As Walt Whitman might say, we are large; we contain multitudes.

Ben Zimmer is language columnist for The Wall Street Journal and former language columnist for The Boston Globe and The New York Times Magazine. He has worked as editor for American dictionaries at Oxford University Press and as a consultant to the Oxford English Dictionary. In addition to his regular "Word Routes" column here, he contributes to the group weblog Language Log. He is also the chair of the New Words Committee of the American Dialect Society.

Comments from our users:

Thursday July 28th 2011, 1:58 AM
Comment by: Kcecelia (San Francisco, CA)
Enjoyed your original New York Times article as well as this follow-up piece. My dad told me em-dashes were a lazy habit people employed instead of determining what punctuation was actually appropriate. I disagree. I will admit to overusing them, but I also think if I write the way I speak, there is an awful lot of em-dash activity going on.
Thursday July 28th 2011, 3:29 AM
Comment by: Narjas
I know I use brackets a lot to mark 'inner thoughts' (I noticed I just used inverted commas there - possibly for emphasis in this case, something else I do a lot of).

Hyphens punctuate my emails too - usually when I find myself 'speaking' in staccato phrases - instead of finishing my sentences properly. A sort of shorthand, if you will.

Frequent use of commas is another clue that it's me - I don't know many contemporaries of mine who use commas to the extent that I do!!!!

I'd better not get up to anything fishy via email, I reckon. (Oh, and then there're the apostrophe shorteners....)
Thursday July 28th 2011, 3:40 AM
Comment by: Alice M. (Neuss Germany)
Is style variability not also a question of mood, subject matter, available time, target audience or recipient, and changes in general language usage? I find that my own style varies considerably depending on all these factors.
Thursday July 28th 2011, 5:38 AM
Comment by: Narjas
Yes, but I'm guessing that like your fingerprints - sometimes they're sweaty, inky, hot or cold, but they still can be traced back to your hands... Your mode of expression, whether time-pressed, moody, or socially (un-)inclined, is still heavily dependent on the extracts coming from your own unique brain (and its consciousness) which harbours your experiences to date, cultural influences, environmental conditioning that modify what we say and how it's said - in our own particular manner. (I think, therefore I'm [a believer in the expression-signature])
Thursday July 28th 2011, 10:33 AM
Comment by: Wood F.
I would love to learn more about the nuts and bolts of forensic linguistics. Are there any books or articles anyone knows about that go into greater detail?
Thursday July 28th 2011, 11:08 AM
Comment by: Ben Zimmer (New York, NY), Visual Thesaurus Contributor
Wood F.: If you're interested in the legal side of all of this, I'd recommend Speaking of Crime by Larry Solan and Peter Tiersma. Chapter 8 ("Who Wrote That?") is all about authorship attribution.
Thursday July 28th 2011, 12:13 PM
Comment by: Roger Dee (Haslett, MI), Top 10 Commenter
Isn't it really all about how the writer's thoughts and ideas are most accurately perceived in the reader's mind?
Thursday July 28th 2011, 1:04 PM
Comment by: Alice M. (Neuss Germany)
Narjas: Thanks for confirming I'm unique ;-). And yes, each one of us has an expression signature. That would certainly apply in varying degrees to everything I write or say when acting in my own right. But here's another aspect: the linguistic chameleon. I'm a translator and interpreter and I have to convey not just the meaning and intent of what my clients write or say, but their style and tone as well, which requires extreme adaptability and discipline. (Although I must admit to tidying up woolly prose out of personal antipathy to that and sympathy with the ultimate readers.) So the outcome of my translations is probably an 80/20 or 70/30 blend of the original author's expression and mine. Regarding interpreting, that blend would be more like 60/40 or maybe even 50/50, because in simultaneous interpreting you have two to five seconds at most to assimilate what is being said and transmit it in the other language while registering in a parallel brain loop what the speaker says next, and also monitoring your own voice in a third parallel loop to correct yourself if necessary.
Friday July 29th 2011, 12:51 AM
Comment by: tiger M.
If he sent the emails from his own computer, then there should be an email stamp on the server, which would include the IP address, the computer's date and time, and the email. Would this not be enough of a fingerprint, together with the linguistic part, to prove him guilty?
Friday July 29th 2011, 7:36 AM
Comment by: Narjas
Alice - thanks for explaining what you get up to. Fascinating. My mother used to work as a written translator (German-English/ English-German), so had the luxury of time for dictionaries or thesauri.

I have first-hand experience of being that chameleon because we moved countries (continents even!) when I was growing up. It was pointed out by friends that I seemed to adapt my accent and manner of speaking to that of the person conversing with me: e.g. exaggerating a Pakistani accent, leaving out words when speaking to someone for whom English would not be their first choice of language, Scandinavian/ Scottish lilts, fast-paced youthful questioning-at-the-ends-of-sentences? mode of expression, etc... I am guessing this went beyond wanting to be liked, but more to be understood!!!

Like the blend ratios you describe, an interesting way of quantifying the 'barely tangible' (or noticeable).
Friday July 29th 2011, 9:47 AM
Comment by: Ben Zimmer (New York, NY), Visual Thesaurus Contributor
Tiger M.: As I mention in the Times piece, Ceglia claims to have saved these e-mail exchanges by copying and pasting them into a Word document, so they lack the metadata (message headers, etc.) that might be useful in determining their provenance. (Of course, message headers can be faked too, so even having that information wouldn't necessarily establish anything.) The case is currently in a discovery process that allows Facebook's forensic experts to examine Ceglia's computer files and hardware, so we may be hearing more about this soon.
Friday July 29th 2011, 10:19 AM
Comment by: Alice M. (Neuss Germany)
Narjas: It would be very interesting to continue this conversation, but I don't think this is the place for it. Just a thought: I've been meaning to buy "Third Culture Kids" for ages. Have you ever read it?
Friday July 29th 2011, 5:07 PM
Comment by: begum F., Top 10 Commenter
It requires a lot of patience to be a language detective because they have to read and analyze all the contemporary writings.
These detectives must be familiar with almost identical writings. As a science person, I would say this detective method is based on only physical criteria.
As such surprise and denial are expected in each case.
Friday July 29th 2011, 5:56 PM
Comment by: Narjas
Find me on LinkedIn, Alice. Cheers! Not read it yet, btw.
Saturday July 30th 2011, 9:16 AM
Comment by: Karen F. (New Castle, PA)
Another field that will be needed in future generations. As for me, how I use any/all of the above methods clearly is my "frame of mind" on any given day. Great topic and discussion.
Sunday July 31st 2011, 12:28 AM
Comment by: Douglas B. (CA)
Email also tends to be bland and nondescript. One does not emphasize his individuality because he knows that the communication can be read by people for whom it is not really intended. Therefore, I deemphasize my own personality in such communications to indicate that the communication could have come from anyone and therefore should not be offensive to anyone. 42Elyboy
Tuesday August 2nd 2011, 8:39 AM
Comment by: Stan Carey (Galway Ireland), Visual Thesaurus Contributor
It's a fascinating area. Thanks for your insights into it, Ben. One thing that has struck me is how a person's writing/emailing style can change according to the recipient, just as their speech might in different social contexts.
Friday November 2nd 2012, 12:52 PM
Comment by: anoushka A.
fingerprints on e-mails sounds really fun
Wednesday May 1st 2013, 6:57 AM
Comment by: Kedarnath A. (Pune India)
If anyone knew that they were about to write an email that might become part of a contentious issue later on, it would not be so difficult for the author to deliberately change his or her style...possibly including misstakes where the author might not have made any under normal circumstances. (?)
