Behind the Dictionary

Lexicographers Talk About Language

Inside the OED, Part 1: The Wisdom of Crowds

Ever wonder how work is done at the Oxford English Dictionary, the world's largest and most prestigious English-language dictionary project? We got the inside story from none other than Jesse Sheidlower, OED editor at large, who works on North American materials out of the dictionary's New York office. In the first installment of our three-part interview, Jesse explains how the OED's North American Reading Program operates. (Note the firmly American spelling of "Program"!) The reading programs (or programmes) have been radically transformed by the digital revolution, but at the same time they still follow the traditions set down 150 years ago by James Murray, the dictionary's first editor. As Jesse explains, the OED relied on "the wisdom of crowds" for the gathering of historical evidence long before the age of Wikipedia.

VT: One of your responsibilities is overseeing the North American Reading Program.  Could you describe what that is and how it works?

JS: The OED is a historical dictionary, which means that for every sense of every word it contains quotations from chiefly written sources, showing how that word has been used over time. Originally the way that you would get these quotations, which are called citations, was that you simply read a wide variety of texts.  And any time you come across an interesting word, you write it down on a slip of paper.  You do this for enough years and read enough texts and you will eventually have a very, very large file with slips of paper in it that shows how the word has been used throughout its history.  You take a file of these, you sort them into order, you divide them up into senses.  And you have your dictionary there, based on the evidence that's in front of you. 

In a way, it's one of the earliest collaborative projects, of the kind that Wikipedia and things like that are thought to be now. These books were read by a very large number of people, thousands of people spread all over the world. Readers would take books, either ones that they were interested in reading or ones that were assigned to them, that would illustrate some time period of English or a particular subject area. They would find the interesting words and send them in. So everyone was contributing the words that they found to the OED.

VT: Nowadays that would be called "crowdsourcing."

JS: Yes, and this process still goes on with the North American Reading Program and the OED's other reading programmes. There is one in the UK, one for world English, one for scientific sources, one devoted just to pre-1800 material.  And each of them has slightly different goals.  But in general the idea is that you're reading a lot of sources and trying to come up with interesting words. 

One of the things that has changed over time is that 150 years ago, or even 25 years ago, the only way of assembling this kind of material was to pretty much read a text through and find examples and write them down.  Now with the tremendous growth in online databases, it's very easy to find large numbers of good examples, even from published sources from just about any time period in English, by looking at online sources.  So it's no longer necessary to have a reading program to find a word. 

James Murray's classic example of the difficulties of reading programs is that people tend to notice unusual words and they don't notice usual words. So when he was first starting to edit, he noticed that in the files there were five examples of the word abuse but 50 examples of the word abusement. This does not mean that abusement is 10 times more common. It means that any time you come across a word that's unusual, of course you'll write it down. But it will never occur to you to write down a word like abuse because it's so common. So then when you're working on abuse, you have this problem where the evidence you have in front of you is not sufficient. In the old days, you'd have to either use a text-based concordance, such as was written for the Bible or Shakespeare or a very small number of other sources, or just read through and hope you can randomly find an example of this word from the time period you need. Now you can go online and punch up every example of abuse published in an English source in the entire eighteenth century, for instance.

VT: So what's the point of having a reading program then, if you can do all of this simply by searching online databases now?

JS: The nature of the reading program has changed over time, where reading in order to find a particular example of any given word is no longer that important a goal because you can find these online.  The things that you want now are, first of all, identifying new words, or in particular new senses.  And this is something that's very hard to do online. Even if you know what you're looking for, it can be hard to find. 

To take one example, there's the so-called "Gen X so," where the word so is used to emphasize words that typically don't allow for comparison — something like, "Blogging is so 2004," or, "You are so not going to discuss that with me." This is something where it's very hard to find examples online because even if you can imagine a frame in which this can appear, the word so is so common that you're either going to find extremely narrow things because you're required to search so narrowly, or you're going to miss things that are out there because you don't know to search for them or you can't easily construct a search for them.  You get 10,000 examples and only one of them might be the thing you're interested in. A reading program person would identify this as a new sense.  And the examples of so you have in a database will reflect this, rather than the 9,999 other examples that you're not interested in.

VT: Just for the record, what's the earliest recorded use of "Gen X so"?

JS: I found an example from 1979, in Woody Allen's movie Manhattan: "'He's a big Bergman fan, you know?'  'Oh, please! God, you're so the opposite.'"  But the canonical example is from the movie Heathers: "Grow up, Heather, bulimia's so '86."

VT: What are some of the new ways that the reading program is working to bring together collaborators in the search for citations?

JS: We have a project that we started a number of years ago devoted to science fiction terms, where volunteer moderators are running a website devoted to science fiction terms in the OED, with examples of the words and definitions and discussions of how they're used.  Enthusiasts can add words to that, which will then be added to the OED's database and eventually either appear as part of OED entries or just be on the website for people to see. 

This was the perfect example of a kind of field where people are extremely devoted to the subject and very knowledgeable about it. They're able to find examples of things that would be very hard to find without this kind of specialist knowledge, and they contribute commentary to it that would be very hard to do without lots of specialist research.  It appears as a separate website, Science Fiction Citations.  Many of the entries are now being used as part of OED itself.  And there's been a book published out of it called Brave New Words, which is effectively a historical glossary of science fiction terms taken from the website. 

I think it's a good example of how this kind of work can benefit everyone involved, where the people who are contributing get to have a website devoted to their subject that's very detailed and is shared for everyone to be able to see. The OED has extremely high-quality research that would be hard to get in any other way. The world at large has a free site, where they can see tons of information about the vocabulary of science fiction. And it seems like a win-win situation for everyone. So this sort of project is something that we've been thinking about expanding into other areas. The problem is, it is time-consuming to set up and you do need to have moderators to run it. But there's no reason why you couldn't have forums devoted to this and people actively discussing particular words and how they're used. Any sort of specialist area could benefit from such an approach.


Comments from our users:

Wednesday July 30th 2008, 11:31 AM
Comment by: Talley Sue H. (New York, NY)
What happened to Jesse's bow tie?

He said: ". . . the people who are contributing get to have a website devoted to their subject that's very detailed and is shared for everyone to be able to see."

But he forgot the part about the people contributing getting to have FUN, and feel important and authoritative.

Over at Language Log, there are two posts about whether not being in the dictionary means a word is not a word. And lexicographer Grant Barrett made some interesting comments about the limitations--including manpower shortages--on getting words into a dictionary.
http://languagelog.ldc.upenn.edu/nll/?p=410#comments


Combined w/ Jesse's comments, that's an interesting perspective on how words get into dictionaries.

And how does one get into a reading program for the OED? Wouldn't that just be fun? And give you bragging rights?

(we had a big discussion in our family about "points" versus "bragging rights" after my daughter "got a point" for remembering to ask them to hold the ketchup on her McDonald's burger when we were on vacation--something we don't have to worry about in NYC)

Wednesday July 30th 2008, 1:27 PM
Comment by: Shirley R.
Given the impermanence of some of the words/phrases being paid attention to, this seems like a waste of time and effort. Furthermore, they do not seem to provide clarity of meaning, but create confusion. When I first taught ELS in a former USSR republic, our younger volunteers had so much "throw away language" that they were not understood by the Russians who had taught and spoke English. However, teaching slang (for example, "cool," which is so irrelevant) gave the impression that English was being taught.

Wednesday July 30th 2008, 10:35 PM
Comment by: Talley Sue H. (New York, NY)
The word "cool" is irrelevant?

Saturday January 3rd 2009, 3:15 PM
Comment by: Rosina W. (San Rafael, CA)
If "cool" were irrelevant, would it have endured, with meaning intact, for at least half a century? (The earliest citation I can think of, off the top of my head, is the eponymous song from West Side Story (1961?), with its finger-snapping gang members. I suspect that the usage is a good deal older.)

I also question calling the word "slang," given its firm foothold in the language. "Cool" has clearly stood the test of time through several generations, and its use isn't confined to any particular demographic. Preschool kids use it. Sixty-something Baby Boomers use it. No other word quite captures the same connotations, making its niche in the language even more secure.

Idiom? Sure. But irrelevant? Never! The word is just too, uh, *cool.*

Sunday October 7th 2012, 12:12 PM
Comment by: begum F.
To Rosina W., nicely written argument.

