A Monthly Column for Word Lovers
Lumping, As You Like It
We left off last month on the horns of the dictionary publishers' dilemma: how do you keep a flagship title in print when it costs far more to produce than it will ever generate in sales? We noted the lure of electronic licensing rights as a factor that might influence the way dictionaries are put together and marketed in the future; and we heard from a few readers who, not unpredictably, lamented any future in which dictionaries in book form were not available.
We also love printed dictionaries; there are dozens of them lying around in the Lounge, and indeed in every room of the Language Mansion, but these days we don't actually use them very much. If we crack open a printed dictionary, it is likely to be late in the evening, when all the computers are turned off for the day and our leisure reading has brought us to a word that requires a look-up. We, too, savor the feel and physicality of a hefty dictionary on our lap, but we find that this experience is as available to us with a ten-year-old dictionary as it is with one published last month. Most of the words we look up were not coined in this century, so, in fact, any good old dictionary will do. From the publishers' perspective, it is not enough that paper dictionary users love their books. Publishers don't feel the love unless users keep buying dictionaries, year in and year out, upgrading regularly as they do with their cars, computers, and cell phones.
By day, in the glow of our computer monitor, we give electronic dictionaries a regular and rigorous workout. Two of the most useful on CD-ROM, the Random House Unabridged and Merriam-Webster's 11th, are installed on our hard drive and always open: their unsurpassed algorithmic searching and sorting facilities make them constantly useful companions. A printed dictionary, as we all know, is an excellent tool when you already know what word you're looking for. But what if you want to collect terms related to blacksmithing? Or list verbs ending in -ize that didn't come from Greek? Or find words of Japanese derivation that are the names of foods? A print dictionary is not going to help you here: unless you're just browsing for pleasure, its alphabetical order offers no way in except through the spelling of the word itself.
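To make the contrast concrete, here is a minimal Python sketch of the kind of combined query an electronic dictionary makes possible and an alphabetical one does not. It assumes each entry is stored as a record with fields for headword, part of speech, origin, and subject labels. The `Entry` record, the `search` function, and the six-word sample lexicon are all hypothetical, with simplified etymologies and invented subject labels; this is not how the Random House or Merriam-Webster software actually works.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    headword: str
    pos: str                                   # part of speech, e.g. "verb" or "noun"
    origin: str                                # simplified language the word came from
    domains: set = field(default_factory=set)  # hypothetical subject labels


# A tiny hand-made sample lexicon. Etymologies are simplified and the
# subject labels are invented for illustration, not taken from any real
# dictionary's data.
LEXICON = [
    Entry("baptize", "verb", "Greek"),
    Entry("realize", "verb", "French"),
    Entry("tenderize", "verb", "English", {"food"}),
    Entry("sushi", "noun", "Japanese", {"food"}),
    Entry("teriyaki", "noun", "Japanese", {"food"}),
    Entry("tsunami", "noun", "Japanese", {"nature"}),
]


def search(lexicon, suffix=None, pos=None, origin=None,
           exclude_origin=None, domain=None):
    """Return, sorted, the headwords that satisfy every criterion supplied.

    A criterion left as None is ignored, so any combination of criteria
    can be applied at once, which an alphabetical index cannot do.
    """
    hits = []
    for e in lexicon:
        if suffix is not None and not e.headword.endswith(suffix):
            continue
        if pos is not None and e.pos != pos:
            continue
        if origin is not None and e.origin != origin:
            continue
        if exclude_origin is not None and e.origin == exclude_origin:
            continue
        if domain is not None and domain not in e.domains:
            continue
        hits.append(e.headword)
    return sorted(hits)


# Verbs ending in -ize that did not come from Greek:
print(search(LEXICON, suffix="ize", pos="verb", exclude_origin="Greek"))
# Food words of Japanese derivation:
print(search(LEXICON, origin="Japanese", domain="food"))
```

Because each criterion is simply another filter on structured fields, adding a new one (a date of first use, say) costs a line of code, whereas a print dictionary can only ever be entered alphabetically.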
The online competition for paper dictionaries is also formidable. There are of course many online dictionaries, ranging from the low end (dictionary.com, Wiktionary, The Free Dictionary) on up to premium subscription-only sites like the OED, Merriam-Webster's Unabridged, and, of course, the VT. In addition, there are dozens of other word-detective tools on the Internet, some of which exist by virtue of the way data is organized online. Let's say, for example, that you want to know what a credit default swap is. You would look in vain in most dictionaries, online or elsewhere, because until recently these things would have been regarded as too esoteric to appear in general dictionaries. But go to Google and type:
define: credit default swap
You'll be rewarded with nearly a dozen definitions of the term, along with links that will take you to sites where you can learn more. What dictionary could ever offer you this facility? The long and short of it is that printed dictionaries, as we know and love them, are a mature but obsolescent product. This is not to say they're not useful, but nearly everything they are useful for, their digital counterparts now do as well, and a good deal more besides.
An artifact of dictionaries that we talked about last month is what we called "dictionary-speak": the habit of lumping multiple senses into a single definition as a way of saving space. One commenter on last month's column questioned whether, in the digital age, there was still any need to preserve fluency in dictionary-speak among readers. This goes to the heart of the topic we began last month, which we might now pose another way: should dictionary publishers still define words in a way that is optimized for human users of print dictionaries, when they can't make any money from print and are no longer constrained by its space limitations?
Here's the entry for a polysemous word, foundation, in what we might call an "old school" (but still in print) British dictionary:
foundation noun 1 that on which something is founded; basis 2 (often plural) a construction below the ground that distributes the load of a building, wall, etc 3 the base on which something stands 4 the act of founding or establishing or the state of being founded or established 5 an endowment or legacy for the perpetual support of an institution such as a school or hospital 6 an institution supported by an endowment, often one that provides funds for charities, research, etc 7 the charter incorporating or establishing a society or institution and the statutes or rules governing its affairs 8 a cosmetic in cream or cake form used as a base for make-up 9 a foundation garment 10 a card on which a sequence may be built
These senses, many of which correspond with ones you'll see in the VT wordmap, are variously lumpy and splitty. For an experienced human dictionary user, it doesn't much matter: as we and some commenters noted last month, the human mind is quite capable of parsing this sort of language. For a computational user, splitty is usually good: a computer deals more easily with senses that are kept apart, for example "act of" and "state of" (which are lumped together in sense 4, above). On the other hand, a computer is happy to see senses lumped together when they share roughly the same collocates (the words that typically occur alongside them), and so might not object to merging senses 2 and 3, or 5 and 7.
It's unlikely that dictionary publishers are going to reinvent defining for the digital age: like print dictionaries themselves, definitions of English words as we know them are a mature technology, having evolved over 500 or so years of practice. But dictionary publishers are acutely aware of their need to retool and reorganize their valuable data in a way that will be profitable for them. As they do this, Aunt Edna, who uses a Merriam-Webster 7th edition to help her with crossword puzzles, is not going to be very high on the list of people to please. Research institutions, data-mining start-ups, and intelligence organizations that need to crunch massive amounts of text in order to learn what it says may have the upper hand in determining what dictionary databases look like in the future.