Language Lounge

A Monthly Column for Word Lovers

On Some Deficiencies in Our Search Engines

"Look it up!" used to be a directive mainly about words in dictionaries; these days it's as likely to be about information on the Internet. A common experience in both cases is that you don't always find what you're looking for. This month in the Lounge we look at some of the overlapping reasons why.

In the mid-19th century, British scholar Richard Chenevix Trench gave two papers that were later published as a booklet with the title "On Some Deficiencies in Our English Dictionaries." His observations were a major impetus for the work that eventually became the Oxford English Dictionary. Trench enumerated seven points that he considered the major failings of the English dictionaries of his day.

We still look up words in dictionaries, but a lot of our looking-up today is not just for words but for information; and the place we look it up is on the Internet. For that, our point of entry is a search engine, the indispensable tool for access to online information. As we reviewed Trench's criticisms of dictionaries we were struck by how apt his observations were, mutatis mutandis, for Internet look-ups today.

We like to think of the Internet as a more or less complete repository of information, and of a search engine as providing an index that enables us to access that information. But everyone who uses search engines finds them wanting, mainly in being unable to locate for us the information we seek (and that we "know" is there), or in returning to us information that was not what we sought.

Trench's main idea was that "A Dictionary . . . is an inventory of the language: much more indeed, but this primarily." The common theme of all his criticisms is that dictionaries fail as an inventory of the language, and while he uses the term inventory, his points suggest that what Trench really wants a dictionary to be is an index: an index in the Peircean sense (an idea we explored in the Lounge a year ago), in which the index is a genuine indication of its referent. If there is a change in the referent, there must be a corresponding change in the index for fidelity to be maintained. This is also what we want a search engine's index to be: complete, appropriately granular, and up-to-date, such that it will always point unambiguously and accurately to what is there on the basis of the indications we supply. Why doesn't it always work that way?

The reasons could fill, and probably have filled, a book. From our perspective in the Lounge, the interesting points have to do with precision and fuzziness, and the ways they are reflected in two fields that have everything to do with Internet searches and their results: logic and language.

Most of our searches employ a fuzzy tool (language) to retrieve a fuzzy object (information encoded in language). We have the opportunity to introduce some logical parameters into our search, via the various tools that Google and other search engines offer (there is a sample here), and once our search string is sent off to the engine, various algorithmic operations take place that employ logic. What comes back? If we are lucky and if we framed our search skillfully, we get just what we were looking for. But sometimes we don't, and the ways in which search engines fail us are analogous to the ways that dictionaries failed Trench: the dictionaries he criticized contained faulty information, or they did not contain words, or information about them, that he knew to be in the lexicon; our searches may fail to return information that is actually present online, or they may return information that is not a match, in every sense, for what we sought.

The reasons for these failures also overlap to some degree: the two fields of inquiry (all networked information, on the one hand, and the lexicon, on the other) are both constantly changing, redundant, highly ambiguous, and not logically constructed, and both contain innumerable asymmetrical relationships among their members (all of which give rise to what we are calling fuzziness). Constructing a perfect, failproof tool for access (an Internet index on the one hand, a dictionary on the other) is probably an impossibility.

It seems to us that a goal of a good search engine should be that the Principle of Least Astonishment prevails whenever possible: that is, given any ambiguity in our query or in the nature of the information we seek, we would like the search engine to be biased toward giving us what we want, rather than giving us something completely unexpected. There is ample documentation of cases where this principle fails: sites like Reddit and Digg regularly feature posts by people who got some (to them, anyway) astonishing and counterintuitive result from a simple or complex search. All search-engine users probably have an example of this sort in mind; in the Lounge, we have our own recurring faulty search result, which seems to be due to an error of logic brought about by ambiguity in language.

Our personalized Google News page has a section for stories associated with ZIP code 87901: the code for Truth or Consequences, New Mexico. Astonishingly, however, Google News consistently delivers stories to us that are not about this charming desert town at all, but that simply contain the phrase "Truth or Consequences" — a not uncommon journalistic trope. Here, for example, is a recent section from our personalized news page:

Of the four stories pointed to, two have nothing at all to do with Truth or Consequences, NM. This, surely, is simply a logical error that cannot be difficult to put right: just as not every "87901" has to do with Truth or Consequences, New Mexico, not every "Truth or Consequences" has to do with a small desert town in the American Southwest. Is it beyond the ability of men and machines to correct this?
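
The distinction we have in mind can be sketched in a few lines of Python. This is only an illustration under our own assumptions, not a description of how Google News works: the headlines are invented, and the function names and the short list of location cues (NM, New Mexico, 87901) are hypothetical choices of ours. The sketch contrasts a match on the phrase alone with a match that also requires some sign that a place is meant.

```python
import re

# Invented headlines, for illustration only.
HEADLINES = [
    "Truth or Consequences, NM, council approves new spa district",
    "Truth or consequences: lawmakers face a budget deadline",
    "Hot springs draw winter visitors to Truth or Consequences, New Mexico",
    "For the senator, a moment of truth or consequences",
]

# The phrase itself, matched without regard to capitalization.
PHRASE = re.compile(r"truth or consequences", re.IGNORECASE)

# Assumed location cues; a real system would need much richer evidence.
PLACE_CUE = re.compile(r"\b(?:NM|New Mexico|87901)\b")


def matches_phrase(headline: str) -> bool:
    """Naive test: the phrase appears anywhere in the headline."""
    return PHRASE.search(headline) is not None


def matches_place(headline: str) -> bool:
    """Stricter test: the phrase appears AND a location cue appears."""
    return matches_phrase(headline) and PLACE_CUE.search(headline) is not None


if __name__ == "__main__":
    for headline in HEADLINES:
        print(matches_phrase(headline), matches_place(headline), headline)
```

Under these assumptions the phrase-only test accepts all four headlines, while the stricter test keeps only the two that mention the town. A handful of cues is of course a crude stand-in for real disambiguation, but it shows that the error is one of logic and not of computing power.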

Trench made his observations about dictionary failings long before the wonders of modern information technology were even dreamt of. He, and those who responded to his challenge, used only the old-fashioned tools available to them: a plan for systematic and thorough research and synthesis of the lexicon, starting with a detailed analysis of how dictionaries imperfectly reflected it. Modern crafters of search engines have the advantage of programming languages and lightning-fast computation, but in some ways have still not overcome basic challenges. We wonder if a closer examination of search engine failures, using the same old-fashioned tools available to Trench and his followers, might prove to be a fruitful avenue for search engine improvements.



Orin Hargraves is an independent lexicographer and contributor to numerous dictionaries published in the US, the UK, and Europe. He is also the author of Mighty Fine Words and Smashing Expressions (Oxford), the definitive guide to British and American differences, and Slang Rules! (Merriam-Webster), a practical guide for English learners. In addition to writing the Language Lounge column, Orin also writes for the Macmillan Dictionary Blog.

Join the conversation

Comments from our users:

Monday February 1st 2010, 8:08 AM
Comment by: Ravi K.
Great article! Google, are you listening?
Monday February 1st 2010, 9:06 AM
Comment by: Robert L. (Phoenix, AZ)
Sounds like an opportunity to fill the niche. I love the phrase 'Principal of Least Astonishment'!
Thanks for great article!
Monday February 1st 2010, 10:33 AM
Comment by: Don H. (Antioch, CA), Top 10 Commenter
Interesting article!

I realized years ago that searching the Internet often felt like trying to fill the Dixie cup of a search from the Niagara-falls-size cataract of information flowing through the Web.

Part of the problem with unintended returns from searches can be credited to the pernicious activities of unprincipled people who try to use popular searches to draw traffic to their sites. Early in the days of Internet searching, before Google, I remember how offended I was when I searched for "hubble telescope" and got links to pornography sites, since the Webmeisters of those sites knew that men and boys would be searching on that string.

The search engine companies spend a lot of time tracking links from popular terms back to spurious sources who simply insert keywords as a way of piggybacking on popular searches. I've heard that it's a difficult challenge for them.
Monday February 1st 2010, 10:37 AM
Comment by: Françoise T. (Montreal Canada)
Great article, thank you.
Monday February 1st 2010, 11:19 AM
Comment by: Jane B. (Winnipeg Canada), Top 10 Commenter
This is too sadly true for the most part, but one has to realize when searching that each one of the uncommon words is 'searched'. That's the way those engines work as I understand them.

If you want limitation to "Truth or Consequences NM", you have to put it that way. Then, according to Google, you will get only those references related to the small town.

You will probably also get a little cautionary note that by removing those quote marks, you'll get more information.

Google chooses its search results slightly differently from others as I understand searching (and I do an awful lot of it). The frequency of the item mentioned or most of the words in the item will determine its placement. The more exact you can be, the more sure your result will be.

Having said that, I have discovered that some things are almost impossible to find information about until I come up with a really, really precise wording. For example, my husband has trouble gaining weight. Not losing it. 'Losing weight' resulted in diet plans galore; 'gaining weight' resulted in more of the same. 'How to' didn't help. I can't remember ever getting satisfactory information on how to help a person gain weight. I went with the age-old advice: Eat more. Gradually build up the portions.

That has helped where the internet failed.

By the way, we've had his health checked out. That doesn't appear to be the problem. He just needs to eat more.

Finding a wonderful site about food values and calories helped!

I'm frequently searching for information about prehistoric times which is how I've learned to overcome some of the many challenges connected with Google.

Despite its shortcomings, it remains my favourite.

My only criticism is that there is no indication on the search page that I will not have access to the information that arises.

If Google could figure that one out, put a label on those sites that require some sort of special 'badge' to enter, I'd love it even more.

By the way, if you want the definition of a word, type it in the box. Type define and you'll get it.

Maybe.

Sigh!
Monday February 1st 2010, 11:26 AM
Comment by: hazel W.
So often when we look for information on the internet we are looking for an expansion of the word not a definition. It's a quick brainstorming. We look for phrases and topics that are related but not exactly the same.
Monday February 1st 2010, 11:35 AM
Comment by: Steve H.
Alas, poor Mr. Trench must be rolling in his grave to see that a century and a half later, in the year 2010, some people still can't be bothered to distinguish the word "principle" from "principal", even when having just seen its correct usage moments before...
Monday February 1st 2010, 11:39 AM
Comment by: John S.
Excellent introduction to a topic that in my opinion ventures into an area that as an educator I am finding to be greatly troubling: intelligent inquiry.
It is one thing to have an idea of what you are looking for, but this "browsing" seems to put too much responsibility on the search engine and not enough on the searcher.
I haven't quite thought this through, but in general I am finding a real lack of logical/critical thinking in our K-12 classrooms (I am a guest instructor in dozens a month), and very little rigor on the part of teachers to hone these skills, either for her/himself, or the students.
Call me old fashioned, but can a 5th grader really start a sentence: "Me and Jim went ..." without stopping the class and drilling down on the logic of subject and object?
This lack of dignifying human capacity is perhaps evident in Googling.
Given the obvious ambiguity in accessing all information maybe the first results should be just the beginning for refining your questioning and a great opportunity for the development of critical inquiry skills.
Monday February 1st 2010, 1:05 PM
Comment by: Don H. (Antioch, CA), Top 10 Commenter
If you want to teach the logic behind online searching, the type of logic employed is Boolean. (Check out http://en.wikipedia.org/wiki/Boolean_algebra_%28logic%29. For a much easier presentation, see http://websearch.about.com/od/internetresearch/a/boolean.htm — or any of the other 5,000,000 websites addressing the issue.)
Monday February 1st 2010, 2:30 PM
Comment by: Jane B. (Winnipeg Canada), Top 10 Commenter
John S. I agree. I think that 'searching' presents a wonderful way to teach those skills of discerning just what information answers the search the best.

The next step is judging the source of that information, and finding other backups for it.

Sloppy searching should be stopped as soon as possible, just as sloppy grammar should be.
Monday February 1st 2010, 3:01 PM
Comment by: Renato V. (Americana Brazil)
Great article. You are a good teacher. C O N G R A T U L A T I O N S !!
Monday February 1st 2010, 5:47 PM
Comment by: Daniel C. (Leicester United Kingdom)
What bugs me most is when Google thinks it knows what I want better than me. No I didn't mean [correctly spelled word], I was actually searching for what I typed!
Monday February 1st 2010, 6:49 PM
Comment by: Rosanne C. (Pittsburgh, PA)
Google may be Goliath, but Clusty is David! Clusty.com not only clusters the results but under their little icon, they categorize Sources and cluster Sites, i.e., .org, .com, .edu, etc. Under 'web' type: 87901 and explore your result options. Clusty was developed initially @ Carnegie Mellon in Pittsburgh.
The next best-kept secret is that librarians are still the experts at finding relevant information. We have master's degrees in finding and evaluating information. Use us, please, or we will go the way of the "first edition."
Thank you for the platform, Mr. Hargraves.
Monday February 1st 2010, 7:17 PM
Comment by: Don H. (Antioch, CA), Top 10 Commenter
Thanks for the heads-up, Rosanne. I bookmarked Clusty and will test it out. (I entered SEARCH ENGINES and got 17 million hits in Clusty, vs. more than 60 million on Google.)
Tuesday February 2nd 2010, 12:02 PM
Comment by: Jane B. (Winnipeg Canada), Top 10 Commenter
Thanks for the heads up about Clusty. I'll look into that one!
Tuesday February 2nd 2010, 6:33 PM
Comment by: Margaret O. (Brunswick Australia)
As an Aussie who likes to use the Visual Thesaurus as an aid to my academic writing, I am regularly astonished by the differences between the listings I find in your US English based tool and what I find if I go to a paper-based Aussie English thesaurus. For me, the comments about Google searches also apply to VT searches. I am sure that some of this reflects cultural differences. Some of it will be to do with differences in core vocabulary in the different societies. Some of it I assume is to do with VT being "a work in progress".

I enjoy your reflections on language and often forward them on to others I know share this quirky habit.
Saturday February 6th 2010, 7:18 PM
Comment by: L B.
I am so pleased with what VT can do. That whirling, phantasmic, growing-in-time presentation is a delight to watch and learn. Please consider me one of the grateful users. I commend those who came up with the plan to show word meanings schematically. You are angels.
Wednesday March 10th 2010, 2:37 PM
Comment by: Stan Carey (Galway Ireland), Visual Thesaurus Contributor
Interesting article! Thank you for the link to Trench's criticism.

I must say I've never thought of the internet as anything close to a complete repository of information (the Tristram Shandy paradox immediately pops into view), but it did come up with the goods when I recently searched for "rere" as a Hiberno-English variant of "rear". My Shorter OED listed it as obsolete, but I knew that was inaccurate, at least in Ireland, and it didn't take much Googling to find both historical and contemporary examples of the word's usage.
