Language Lounge
A Monthly Column for Word Lovers
Safe Search is Off
The index asserts nothing; it only says "There!" It takes hold of our eyes, as it were, and forcibly directs them to a particular object, and there it stops.
— C. S. Peirce
A new plague of time-squandering has descended on the Lounge; it has easily pushed out earlier rivals, like playing Scrabble with friends on Facebook and watching old Patsy Cline videos on YouTube. Now the Loungeurs are in the grip of Google Image Labeler. The devil himself could not have devised a better method of timesucking: you can easily incorporate it into any multitasking scheme; it asks, nominally, only two minutes out of your day; and it's all about words!
The initial thrill of playing the game is the challenge of coming up with perfect labels: words that most succinctly describe a small image appearing on your screen. This thrill quickly disappears, however, because you find that perfect labels don't win points, and points are what it's about: not only because everyone likes scoring, but because you cannot move on to a new image until you and a perfect stranger who is your playing partner in cyberspace land on the same label for the current picture (and thus each score). So you evolve very quickly from trying to find the perfect label (an activity that you could almost justify to yourself since it exercises the depth and breadth of your imagination and vocabulary) to trying to find the lowest-common-denominator label: the word most likely to come into the mind of your partner, of whom you develop a composite picture after a few games. The composite picture that develops of the typical Google Image Labeler, in our view, is a twenty-something male, steeped in contemporary pop culture but perhaps not much else, who is probably multislacking while at work, or perhaps taking a break from gaming online. Is this who should be entrusted with the monumental task of indexing images on the Internet?
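For the mechanically minded, the agreement rule described above can be sketched in a few lines of Python. This is only our illustration of the idea, not Google's code: the function name `first_agreed_label`, the turn-by-turn alternation, and the lowercasing of labels are all assumptions, and the real game's timer and scoring are left out.

```python
from typing import Iterable, Optional


def first_agreed_label(guesses_a: Iterable[str],
                       guesses_b: Iterable[str]) -> Optional[str]:
    """Return the first label that both players have typed, or None.

    Hypothetical model: the players take turns, one guess each per turn,
    and the round ends as soon as either player types a label that the
    other has already typed.
    """
    seen_a, seen_b = set(), set()
    iter_a, iter_b = iter(guesses_a), iter(guesses_b)
    while True:
        a = next(iter_a, None)
        b = next(iter_b, None)
        if a is None and b is None:
            return None  # both players ran out of guesses without a match
        if a is not None:
            a = a.strip().lower()  # assumed normalization of labels
            seen_a.add(a)
            if a in seen_b:
                return a
        if b is not None:
            b = b.strip().lower()
            seen_b.add(b)
            if b in seen_a:
                return b


# A player aiming for the perfect label, paired with one typing the obvious.
connoisseur = ["Brandenburg Gate", "Berlin", "arches", "sky"]
partner = ["sky", "people", "blue", "tourists"]
print(first_agreed_label(connoisseur, partner))
```

In this toy pairing the round ends only when the first player gives up and types "sky"; the precise labels never score, because the partner never types them.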
Google Image Labeler will (according to Google) help "improve the relevance of image search for users like yourself." While in the thrall of it we are often put in mind of Charles Sanders Peirce: he's a late-19th/early-20th century American philosopher, much beloved in the Lounge not only for his clear thought but because he was also a lexicographer: he contributed more than 5,000 definitions to the Century Dictionary. Peirce devoted a lot of thought to signs, by which he meant something intermediary between an object and the mind. His most famous classification of signs is threefold: the ikon, the index, and the symbol. Ikons resemble their objects; any photograph, image, or realistic drawing or painting is an example of an ikon. Indices bear a real relationship to their objects, such that a change in the object would be reflected in a change in the index: an index is an indication of another thing, in the most literal sense. Symbols bear an arbitrary relationship to their objects, and are connected to them only by virtue of usage and convention; most words are symbols.
In Google Image Labeler, players examine ikons. They assign symbols to them (words, or "labels"), with the ostensible view of generating an index. This would be an index in two senses: the conventional one, being an organized list of items with systematic reference to another thing; and a Peircean index, in that the index as a whole would bear a real relationship to the set of images labeled.
Here's the problem: Google Image Labeler is currently designed to attract a maximum of inferior symbols (labels, in Google's terminology), and a minimum of good ones. A couple of examples:
sample image* | labels that would helpfully index the image | labels typically rewarded in Google Image Labeler
[image] | Brandenburg Gate, Brandenburger Tor, Berlin | stone, sky, people, tourists, arches, blue, summer, tourist, walking, outside, sun
[image] | Brigitte Bardot, Brigitte, Bardot, French actress, actress, portrait, pose, glossy photo, 20th century figures | blonde, lips, hair, blue, skin, dress, hot, sexy, woman, girl, babe, chick, face, eyes
The second picture here would also attract (and Google would reward) two other labels that we have not listed, referring to features of Ms. Bardot's anatomy — that's the typical level of cognition where minds meet in Google Image Labeler.
The value of an index, as any user of a reference book can attest, lies in the thoroughness, aptness, and granularity of its contents. Indexers probably work in even greater obscurity than lexicographers, but the service they provide is immeasurable: they enable us to find a needle in a haystack. A good index effectively imposes a numbered, three-dimensional grid on the haystack and tells us which numbered box to look in.
Indexing a book, though it is a highly specialized skill, has an inbuilt simplicity: it equates like with like, that is, words with words. Words found in an index are overwhelmingly also found in their reference, and thus a book index actually bears an ikonic relationship with its object. Indexing something other than words (images, smells, sounds, and so forth) is inherently more complex: it requires the assignment of words to things that are not words, and mainly do not contain words. This relationship can be symbolic only. Surely, then, this is a job that, to be done properly, requires even more specialized skill. So it seems a pretty far stretch to think that Google is going to succeed in generating a valuable word index by crowdsourcing the job to anyone willing to have a crack at it: their labeler seems better suited to collecting noise than signal, and everyone who uses search engines can attest that excess noise is already a big part of the problem in trying to find information online.
The proviso in our observation is that we don't know what Google is going to do with the data it generates, and of course it is possible, if not likely, that the marvelous minds there have anticipated or discovered the shortcomings we note and have found a way to deal with them. One thing is clear: they will have no shortage of data, because they have cleverly entrained an army of volunteers who will feed their datastream 24/7.
Some good introductions to Peirce's semiotics, which we recommend as being just as stimulating and much more edifying than labeling images, can be found here:
http://www.helsinki.fi/science/commens/dictionary.html
http://plato.stanford.edu/entries/peirce-semiotics/#Int
Google Image Labeler is an example of a gwap: a "game with a purpose." If you're into that sort of thing, you will be able to waste (or put to good use! it all depends on your view) a considerable amount of time here:
Luis von Ahn, Assistant Professor at Carnegie Mellon University, developed a game he called the ESP Game, which is the basis for Google Image Labeler. He gave a fascinating talk about his work to Google employees, in which he addresses some of the questions we raise, while leaving others a bit dangling:
http://video.google.com/videoplay?docid=-8246463980976635143&hl=en
* Images from Wikimedia Commons, but typical in size and resolution of ones that appear in Google Image Labeler.




Comments from our users:
Thanks for indulging this tangent. I tried indexing in one class and I agree, it is incredibly difficult and underappreciated. When I come across a book that has no index, I immediately place the book in a lower category of quality and tend to think these are self-published books.
Based on your idea for best label I searched for "Lili Marlene" and found some things that were also close but nothing quite right; "woman smoking lamppost" was better.
I have decided that as a user of images, I find the 20-something-gamer is more helpful, unless I have a specific person or place in mind to represent the idea or mood I wish to convey.
Emily O, you said "they should develop some sort of tiered membership and place more weight on two qualified people playing the game." Well, let's see how the collective we (apart from Google) can take another step. Suppose Google licensed or gave away the tools to a community that cares (call them editors) to create their own sandbox of content.
Many useful web sites could come from this. Sorting images, sure, and even a match-making site. Content (search-engine) sites could be created by specialist teams "playing GWAP" on a work on the scale of the largest encyclopedia. Collectors and curators of art could assemble an index of the world's art, AIA an index of world buildings, doctors and medical schools an index of, well, I'm not sure I'd want to browse that site, but doctors would! These search engine web sites could have lasting (content) value, more so than the inevitable "Google collection of babes" (though I suspect Google's advertisers would draw more ad revenue from the latter database than all the former; good for all, if that makes the tools available for free---hah!).
The ThinkMap analogy: a thesaurus is rendered in a dramatically different interactive form, distinct from but based upon the printed thesaurus. I could just as easily see ThinkMap being used to map seven degrees to Kevin Bacon on the E! web site. That application of the tool doesn't debase the underlying tool. GIL as a collaborative tool brings great promise.
First of all, you have a wonderful name and you are an entertaining writer. I appreciate the euphony of your name and the quality of your articles.
The day this article came out I read it and was intrigued by the subject. I am a passionate advocate for good indexes and meaningful icons. But, I was particularly fascinated by your choice of the form of the word ikon instead of icon to mean "a visual representation." Ikon and icon are synonyms on VT, but ikon is a visual representation or a religious painting or panel while icon is primarily used in the sense of a graphical symbol used in a graphical user interface. The American Heritage Dictionary defines icon as (1) a visual representation, (2) a symbol, (3) a person who has become a symbol, (4) the aforementioned GUI symbol. When I look up ikon in the AHD it says variant of icon in the sense of a visual representation. The AHD definitions match my idea of the word.
For years I worked in Silicon Valley and was the director of a team of people who wrote highly technical object-oriented database management systems documentation and training courses for a programmer audience. We also designed a graphical user interface for a software product for our Windows users. We had long meetings attempting to determine the best icons (in a teensy size, which further complicated the issue) to show a selected action or topic. My team included, among others, two writers with PhDs in linguistics, an electrical engineer/biologist, a Java programmer, a C/C++ programmer, a graphic designer/illustrator, and a geologist who had become my production person. This multitalented group, after much hard work, came up with some creative and elegant solutions for extremely difficult-to-label concepts. These design meetings were even more dramatically challenging than our style and standards meetings. I see by your resume that you have some experience in technical documentation so you may know whereof I speak. And, during all of this work we spoke of icons but never ikons, which is why your choice has particular significance to me.
In the days after reading your article I could not get the choice of ikon as a usage out of my head and I tried to find a reason for your choice. I thought maybe you were British and had a classical education at a British public school and you were naturally gravitating towards the Greek form of the word. I also notice that you have written a book about the differences between British and American English usage and was wondering if your choice was an example of that. But your Web site places your birth in Creede, Colorado and indicates that the last members of your family to be born in Britain experienced that circumstance in the 17th century. Then I wondered if you were intentionally trying to use the form of the word that Mr. Peirce had used, but I looked up his paper on the subject and he did not use the word at all as far as I can tell. I thought maybe you were deliberately trying to avoid the usage most popular in high tech, but that does not make sense when you are using the terms in reference to Google's Image Labeler, which is obviously in the high tech realm.
So, I am intrigued, why did you choose your form, and upon further reflection, would you choose it still?
1) Kcecelia: So sorry for creating confusion. My intention was to follow Peirce, so the spelling ‘icon’ should have been used. I reread some of his papers (from The Essential Peirce, which I checked out of the library) and took notes the week before I wrote the column; my notes all had ‘ikon,’ but in looking at the same papers in the scanned book online, I can’t imagine why. Your theories about it, however, do you credit for great thoroughness!
2) Others: many of your comments have made me rethink GIL and I’m now a bit more charitably inclined towards its usefulness. It seems likely, as Mary Beth J suggests, that multiple labels for a single image are a more useful indexing tool than single labels, and GIL certainly does generate multiples. As Wood F’s comment implies, Google has the luxury of collecting as much data about a single image as it wants, for there will always be willing players. So one is reminded of the monkeys-at-keyboards-producing-the-Bible analogy: if you wait long enough, the data you want will arrive.
3) charles F., I’m with you! Peirce’s writings are fantastic brain food and it’s unfortunate that he is not widely read outside of universities; I hope that the column might bring a few more readers to him.
Lawrence B.