Publishers of dictionaries today face a major dilemma: how can they justify continuing to devote the tremendous resources required to produce and distribute a dictionary in book form when an increasing number of people (a number that grows with each new birth in a developed country) will probably never need to use or own a paper dictionary? It's a question that has gotten publishers' attention but is far from being answered. There is considerable prestige associated with publishing a reputable dictionary in book form, and a certain amount of chagrin in having to stop doing so.

One tactic to keep the paper dictionary on the bookstore shelf — whether it is a stopgap or permanent solution remains to be seen — is to continue to publish the paper dictionary at a nominal loss while making money in other ways on the dictionary database: through related low-production-cost titles, electronic publishing, and licensing of the data for various uses. A promising but also problematic use of dictionary data is in natural language processing (NLP), wherein computers are fed bucketfuls of language with the expectation that they will be able to do something useful with it faster and more efficiently than humans do. A dictionary database can supply a full inventory of the meanings of words that, in theory, can aid a computer in word sense disambiguation (WSD): that is, determining which of many senses of a particular word is intended in a given context. 

This interface between language database and machine is a busy place that we've visited before in the Lounge (here and here). Our visit this month takes us into a subject that is never far from the lexicographer's heart, and one that is especially problematic for computers when they deal with language: lumping and splitting.

You're probably aware of the phenomena of lumping and splitting in dictionaries and in the VT, even if you don't think of them in these terms. Splitting is the easy one: have a look at fin, for example. It's a good example of a polysemous word, one with many senses. Dictionary writers and users both find it useful to split a word like fin into multiple senses that are reasonably distinct from each other. The VT's definitions of fin, both the noun and the less frequent verb, are what we call splitty in the trade: each sense designates essentially one thing.

Lumping, on the other hand, is the grouping together of related meanings of a word under a single sense. Look, for example, at recall. It has some splitty senses, but one particularly lumpy one: "cause one's (or someone else's) thoughts or attention to return from a reverie or digression". This kind of lumping is typical in dictionaries, and makes it possible for them to weigh no more than about five pounds when delivered. This definition of recall contains three or's, and can generate eight distinct definitions:

cause one's thoughts to return from a reverie
cause one's thoughts to return from a digression
cause one's attention to return from a reverie
cause one's attention to return from a digression
cause someone's thoughts to return from a reverie
cause someone's thoughts to return from a digression
cause someone's attention to return from a reverie
cause someone's attention to return from a digression
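
The arithmetic behind that count is a simple product of alternatives: two possessors, two things recalled, and two states returned from give 2 x 2 x 2 = 8 readings. As a minimal sketch (in Python, where the three-slot breakdown and the names SLOTS and expand_definition are illustrative assumptions of ours, not any dictionary's data format), the expansion could be generated like this:

```python
from itertools import product

# Each "or" in the lumped definition becomes a slot with two alternatives.
# This slot breakdown is our own reading of the definition (an assumption).
SLOTS = [
    ["one's", "someone's"],
    ["thoughts", "attention"],
    ["reverie", "digression"],
]

def expand_definition(slots):
    """Return every definition made by picking one alternative per slot."""
    template = "cause {} {} to return from a {}"
    return [template.format(*choice) for choice in product(*slots)]

definitions = expand_definition(SLOTS)
for definition in definitions:
    print(definition)
```

Each additional "or" doubles the count, which is a fair measure of how much space a single lumped sense saves on the page.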

The lumping in this case is not terribly problematic, because the things lumped (thoughts and attention, reverie and digression, reflexive use and transitive use) are not worlds apart from each other: they are not things of an entirely different nature. But take another definition, not from the VT but from a leading British dictionary, of the noun clipping:

 something cut out or trimmed off, especially an article from a newspaper

This definition contains two time-honored lumping tools: or, which we've noted already, and especially: a definition code-word for indicating that among the meanings lumped in a sense, one particular meaning is far more frequently found than others.

This sort of dictionary-speak is not a challenge to most natural language users, that is, human beings: you take what you need and leave the rest, and you probably find in the definition of clipping what you're looking for because you have a context that tells you pretty quickly which meaning is closest to the one you seek. A computer, on the other hand, is not a natural language user: at best it's an artificial user of natural language, and dictionary-speak is not exactly natural language. The lumping in the above definition of clipping can be somewhat pernicious with regard to NLP: it passes over a "real world" distinction that the human mind makes automatically, but that a computer would not know to make: a clipping that you "cut out" is usually one that you want to keep; a clipping that you "trim off" (nail clippings, lawn clippings) is one that you typically discard. So the definition seamlessly lumps two entirely different kinds of things: one that you separate in order to keep, one that you separate in order to throw away.

When a computer is processing text at a fast clip — say, five thousand words a minute or so — how is it going to decide, on the basis of a definition like "something cut out or trimmed off, especially an article from a newspaper" which meaning of clipping is the one intended? Let's look at some typical uses of the word:

 someone anonymously sent us a newspaper clipping , dated a few years back. It was about how  
blow. Then I explained to him about the  clipping  I'd received from South Africa, the article 
 of contemporary newspaper and magazine  clippings  of the famous events, plus another fine 
ing a fence. No one gets three years for clipping ." She made another quick turn, this time  
een Julius and that woman in the Ledger  clippings  ?"   `I wouldn't have, but he showed me 
er death notice among the old newspaper  clippings  that Miss Grant had collected. Interestingly 
 built, modern suburb. He had with him a clipping from the local newspaper giving the names  
ews. But he did not just throw away the  clippings  . He spliced together all the gaps, the 
ht,' whispered Ellie,'I save my toenail  clippings  and leave them in his sock drawer.''I heard 
eafed through the thick file of Delafoy  clippings  and found my piece near the top. Delafoy 
 dangerous, for blood, like hair or nail clippings , can form a link between you and any forces  
 a very important weapon. "They collect  clippings  , they distribute material in small ways 
ing the lawn while a third raked up the  clippings  . Two herons flew above the distant stream 
 the same day. Impatient with newspaper  clippings  , she set off again to Wolfrats-hausen towards 
tractive shape. Hedges will need regular clipping . Not suitable for growing in pots. Harvesting  
reathe, plant a lettuce, throw the lawn  clippings  onto the compost heap, start the car, fell 
irdresser, with a palmful of straw-blond clippings , had smilingly informed Grillo that he  
ight!" cackled the ancient, stowing the  clipping  away. `And I'll tell you some more. I know 
the Yusufzai clan. Khalil still had the  clippings  from the Melbourne Age, with pictures and 

The newspaper sense is indeed the most frequent, and any sentence that contains "newspaper" as well as "clipping" would probably be safely sorted by the computer into the proper sense. But beyond this, the definition does not provide many cues to a naïve user like a computer about which sense of clipping might be meant. The upshot, if we may go from the particular to the general in one step, is that traditional dictionary definitions — the kind that humans have been happily dealing with for hundreds of years — are often not very useful to a computer processing text.
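
To make the limitation concrete, here is a rough sketch of the kind of cue-word matching just described. It is an illustration only, not a real WSD system: the cue lists, the sense labels "keep" and "discard", and the function guess_sense are hypothetical, hand-picked from the sample lines above, and Python is used purely for convenience.

```python
import re

# Hypothetical cue words for the two readings of "clipping" (assumption:
# chosen by hand from the sample lines, not taken from any dictionary).
CUES = {
    "keep": {"newspaper", "magazine", "article", "file"},
    "discard": {"lawn", "nail", "toenail", "hair", "compost"},
}

def guess_sense(context):
    """Return the sense whose cues overlap the context most, or None."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(words & cues) for sense, cues in CUES.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    # A tie (including no cues at all) means the context decides nothing.
    return best if top > runner_up else None
```

A line such as "someone anonymously sent us a newspaper clipping" falls into the keep sense, but "No one gets three years for clipping" matches no cue and comes back as None. The definition itself supplies only "newspaper" and "article"; every other cue in the sketch had to come from somewhere else.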

So here's the dilemma for dictionary publishers, rephrased: can dictionary definition language be made more computer-friendly, as a way of ensuring that dictionary-making is sustainably profitable in the future? And would doing this detract from the usefulness of dictionary definitions for their core (if not very profit-generating) audience, namely human beings? The short answers to these questions are, respectively, "yes" and "yes", and this of course does not resolve the dilemma; it only perpetuates it. Later this month, at the meeting of the Dictionary Society of North America, we'll address the question, and next month in the Lounge we'll unpack the answers to these questions in a little more detail.


Orin Hargraves is an independent lexicographer and contributor to numerous dictionaries published in the US, the UK, and Europe. He is also the author of Mighty Fine Words and Smashing Expressions (Oxford), the definitive guide to British and American differences, and Slang Rules! (Merriam-Webster), a practical guide for English learners. In addition to writing the Language Lounge column, Orin also writes for the Macmillan Dictionary Blog.

Comments from our users:

Friday May 1st 2009, 7:11 AM
Comment by: Maria D.
There is an interesting conversation going on at O'Reilly about making book formats fit computer formats, in general. It uses a book about Twitter, written in the format of a slide show, as an example I find funny.

Reinventing the Book in the Age of the Web:
Friday May 1st 2009, 11:27 AM
Comment by: Phil M. (Tujunga, CA)
I love my Oxford Dictionary and Thesaurus with more than 150,000 entries. There is a feeling of having something solid in my hands and it's not throwing back radiation in your face. I can skip back and forth from A to Z and not have to click a lot of buttons or drag my finger on or around a metal pad until it turns raw. There is a friendliness, a natural feel of paper; something organic giving way in your hands that a computer cannot replace for me. Plus, the added benefits are, some people might just think I'm smart for carrying a paper dictionary around. I cannot tell you how many times I have had to loan it out while working in coffee shops.

Phil Mendez
Saturday May 2nd 2009, 9:50 AM
Comment by: Adele C. M. (Charlotte, NC)
While reading the article, I had a nagging feeling that I did not like the idea of no more dictionaries and I vaguely wondered why. When I read Phil M.'s comments, I had my answer! When I am at home, I'll often go to my dictionary, which I keep open on a stand, to remind myself of the meaning of a word or to be certain my understanding is correct and it is far easier than, as he states, going back and forth with the buttons or, heaven forbid, misspelling or mistyping the word and having to start all over again. Thanks Phil!
Saturday May 2nd 2009, 4:48 PM
Comment by: larry A.
Re lumping:
One of my early forays into translation involved using the word plant in the horticultural sense only to find that in a French translation, it became a building used for manufacturing.
I also want to add that having a "real" dictionary helps me check spelling effortlessly. My computer is not able to check bizarre spellings.
Monday May 4th 2009, 11:52 AM
Comment by: Wood F.
I have the American Heritage Dictionary (Fourth Edition) as both a physical book and as an iPhone application. While I love the book -- hefty, detailed, and beautifully designed typographically -- I must say that I find the iPhone app a lot more convenient to use. When I look up words I frequently "chain" through a series of definitions, either of words used within a definition or to follow the path of an etymology. The iPhone edition has the complete text of the printed edition -- but it is fully hyperlinked (even through to the Indo-European root appendix), which makes it so easy to transition between definitions without flipping through ungainly pages. While nothing will ever replace physical books for their satisfying weight and "objecthood," electronic media have many advantages in speed and convenience over old-fashioned, static paper.

As for whether printed dictionaries -- or even electronic ones meant for human consumption -- should be optimized for computer word sense disambiguation, I don't really see why the two need to intersect. As you point out, computers and humans have different sets of needs and abilities. I think that a lumped definition is appropriate when it can be quickly processed and understood by a human, while split definitions are clearly needed for computer parsing. But why force a human to wade through a series of finely distinguished split definitions when one sentence with a couple of "ors" or "especiallies" is perfectly serviceable?

(P.S. I just had to look up "serviceable" to see whether there was an 'e' in the middle, and I did it in about 5 seconds on my iPhone.)
Monday May 4th 2009, 12:31 PM
Comment by: Shira C.
Personally, I think you should retire the paper books sooner rather than later. To give only a single reason, the splitting / lumping issue is less problematic in electronic form -- you can have it both ways. And you should. For instance, it may be intuitive to lump the related senses of "recall" for a native English-speaker, but a foreign speaker might benefit from knowing that several words in his own language can all be expressed by the English word "recall", so the definitions should be split off.

Ultimately I suppose I imagine that THE online dictionary would be this massive database with many useful views provided for different purposes. And I'd pay a reasonable fee (NOT several hundred dollars a year, sorry) to have access to THE dictionary with some good tools for mining it. Compared to that, a paper dictionary is a poor substitute.
Monday May 4th 2009, 9:51 PM
Comment by: L B.
I have about seven dictionaries (printed form) in my home. I will not part with any of them. But when I am busy writing, the computer dictionary is the quickest and sweetest way to go. As for the question of publishing or not publishing a paper dictionary, I should hope that lexicographers and publishers will never have to come to that dichotomy. All hail the paper book in my hands. LB
Thursday May 7th 2009, 3:29 PM
Comment by: Clarence W.
Unlump me please.

Flashcards, and other ever more compact mega-storage capacities, make the need for dictionary-speak unnecessary, if its primary purpose is to render physical books less than five pounds.

When it comes to physical books, I still have multiple dictionaries that differ in size and scope from pocket-sized to multi-volume. Sometimes a quick glance at a medium-sized one satisfies my need, other times I want something deeper and more nuanced and have to consult the one that requires a pedestal.

With technology I can have the pedestal and more in the palm, and would not be bothered by lack of dictionary speak. After all, when I seek the biggest volume I'm looking for the most detail. What have I been missing when even the most massive compilation is seeking to be svelte? Would it be too cumbersome to have five pedestals? Yikes! But, with technology, five, count them, five pedestals in a palm, I don't mind, really I don't.

The dilemma becomes deciding whether it is necessary, and if so how, to preserve the fluency of dictionary-speak among the masses.
