Word Routes

Exploring the pathways of our lexicon

How Watson Trounced the Humans

The field of natural language processing doesn't usually get showcased in a widely watched game show, but that's exactly what happened on Jeopardy! over the last three evenings, as IBM's Watson supercomputer squared off against the two best humans ever to play the game. IBM had sunk tens of millions of dollars in research money to develop Watson over the past four years, and a loss would have been highly embarrassing. Luckily for IBM, and unluckily for the carbon-based life forms Ken Jennings and Brad Rutter, Watson came through with flying colors.

Prior to this week's televised tournament (which was actually taped about a month ago), Watson had trained in "sparring games" against former Jeopardy! contestants. One of them, Greg Lindsay, had done quite well against Watson. According to Stephen Baker in his new book, Final Jeopardy: Man vs. Machine and the Quest to Know Everything, Lindsay "saw that Watson mastered factoids but struggled with humor and irony... Clues based on allusions, not facts, left it vulnerable." Lindsay created a blueprint for Jennings and Rutter to defeat Watson by focusing on these blind spots. If the humans were to have a hope of beating Watson, it would be by capitalizing on clues that have some semantic ambiguity or indirection to them. Though Watson did get tripped up by language at times, those stumbles weren't enough to offset its blazing speed on more straightforward clues.

The tournament consisted of two games played over three half-hour shows, with the winner determined by the cumulative winnings over both games. Watson was unable to see or hear, and was instead fed the clues as electronic text. (Its answers came through in a synthesized voice that one tech blogger said sounded like "HAL's diffident nephew," HAL, of course, being the computer in Stanley Kubrick's 2001.)

In the first round of Game 1, broadcast on Monday, Watson was impressive right out of the gate. An early clue in the category "Alternate Meanings" was "4-letter word for the iron fitting on the hoof of a horse or a card-dealing box in a casino." Watson's avatar, equipped with a mechanical thumb, buzzed in with the correct answer, shoe. Multiple meanings did not seem to be much of a problem in this case. It's notable that one of the data sources used by Watson (stored in 15 terabytes of memory) was the semantic database WordNet, which also helps power the Visual Thesaurus. WordNet includes both of these meanings in its sense inventory for the word shoe.
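To make the idea of a sense inventory concrete, here is a minimal Python sketch. The tiny dictionary is hand-written for illustration and is not WordNet's actual data or API; the names SENSES and words_matching_all are hypothetical. It only shows why a word that lists both meanings can satisfy a two-part clue like the one for shoe.

```python
# Toy sense inventory: each word maps to a set of short sense labels.
# These entries are illustrative assumptions, not WordNet's real glosses.
SENSES = {
    "shoe": {"footwear", "horse hoof fitting", "casino card-dealing box"},
    "boot": {"footwear", "car trunk"},
    "deck": {"ship floor", "pack of cards"},
}


def words_matching_all(required_senses, inventory=SENSES):
    """Return the words whose sense sets contain every required sense."""
    required = set(required_senses)
    return sorted(word for word, senses in inventory.items() if required <= senses)


# Both halves of the clue must be covered by the same word.
clue_senses = ["horse hoof fitting", "casino card-dealing box"]
print(words_matching_all(clue_senses))  # ['shoe']
```

A word that covers only one half of the clue is filtered out, which is the check a response needs to pass when a clue joins two definitions.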

A later clue in the same category led to a rare incorrect response from Watson. (Generally, if Watson had low confidence in a possible response, it waited longer to buzz in, letting the humans answer those.) The clue was "Stylish elegance, or students who all graduated in the same year." Watson answered with chic, which fits the first part but not the second part. Brad was then able to buzz in with the correct answer, class.
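The link between confidence and buzzing can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not IBM's actual method: the function names, the linear formula, and every timing constant are assumptions made up for the example.

```python
# Hypothetical sketch: lower confidence means a longer wait before buzzing.
# The timing constants are invented illustration values, not Watson's.
def buzz_delay_ms(confidence, base_ms=10, max_extra_ms=500):
    """Map a confidence in [0, 1] to a delay; high confidence buzzes sooner."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return base_ms + round((1.0 - confidence) * max_extra_ms)


def first_to_buzz(delays_ms):
    """Return the name of the player with the shortest delay."""
    return min(delays_ms, key=delays_ms.get)


# A confident machine beats a 200 ms human; an unsure one does not.
print(first_to_buzz({"machine": buzz_delay_ms(0.9), "human": 200}))  # machine
print(first_to_buzz({"machine": buzz_delay_ms(0.3), "human": 200}))  # human
```

Under these made-up numbers, a shaky guess like chic would simply arrive too late to matter most of the time, which is why wrong answers from Watson were rare.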

Another goof by Watson in the first round was in the category "Olympic Oddities," for a clue about "the anatomical oddity of U.S. gymnast George Eyser, who won a gold medal on the parallel bars in 1904." Watson answered "leg," but the answer they were looking for was "his missing leg." IBM's project leader David Ferrucci later explained that Watson didn't really understand what an "oddity" was in this context. Watson had far less trouble in the category of "Beatles People," though some would quibble with its response to the clue, "'Bang bang' his 'silver hammer came down upon her head.'" Instead of answering "Maxwell," Watson provided the entire song title, "Maxwell's Silver Hammer," but apparently that was good enough for the judges.

Watson's deafness was on display when Ken incorrectly guessed that the 1920s were the decade when "the first modern crossword puzzle is published & Oreo cookies are introduced." Watson then gave the same answer (it was actually the 1910s), drawing a sour comment from host Alex Trebek, "Ken just said that." At the end of the round, Watson was tied with Brad at $5,000, with Ken at $2,000.

In the Double Jeopardy round of the first game, broadcast on Tuesday, Watson built a commanding lead. In fact it didn't miss a single question that it buzzed in on, 24 in all out of the 30 in the round. Then came Final Jeopardy, and Watson got stumped by this clue in the category "U.S. Cities": "Its largest airport is named for a WWII hero. Its second largest for a WWII battle." Ken and Brad got "Chicago," but Watson mystifyingly chose "Toronto." That no doubt occasioned a lot of water-cooler discussion in both Chicago and Toronto. On Stephen Baker's blog, Ferrucci tried to explain how Watson could have thought Toronto was a U.S. city (besides the fact that there's a city of Toronto in Ohio): because the Toronto Blue Jays play in baseball's American League, it might infer that Toronto was therefore "American."

The error didn't matter much, since Watson only bet $947 on Final Jeopardy, leaving it with a total of $35,734, to Brad's $10,400 and Ken's $4,800. For Game 2 on Wednesday night, the humans would need Watson to stumble badly. In the first round, Ken did quite well, and Watson ran into some hitches. It was completely shut out of one category, "Actors Who Direct." As Baker explains in his book, the clues were simply too short for Watson to develop the correct responses quickly enough. (The clues just consisted of movie titles, like "A Bronx Tale" for "Robert DeNiro.")

Ken continued to perform well in the Double Jeopardy round, but he still needed to find the Daily Doubles if he was going to have any chance of building up a dollar amount that would erase the disparity with Watson from Game 1. Watson found the first one, but flubbed it. The clue in "Nonfiction" was: "The New Yorker's 1959 review of this said in its brevity & clarity it is 'unlike most such manuals, a book, as well as a tool.'" The correct answer was Strunk and White's The Elements of Style, but Watson didn't even know it was looking for a book, answering "Dorothy Parker." Ken still might have had a shot in Final Jeopardy if he had found the other Daily Double on the board, but Watson eventually got that, too. At that point, the tournament was essentially over, as Watson's lead was insurmountable.

The Final Jeopardy clue for the second game was: "William Wilkinson's 'An Account of the Principalities of Wallachia and Moldavia' inspired this author's most famous novel." All three contestants got "Bram Stoker" (author of Dracula), and the final cumulative scores were Watson $77,147, Ken $24,000, and Brad $21,600. The million-dollar prize went to Watson, and all of it was donated to charity. Ken as first runner-up got $300,000, and Brad got $200,000. (The humans donated half of their winnings to charity.)
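Since the winner was decided on cumulative winnings, the Game 2 totals can be backed out from the figures reported above. The short Python sketch below does just that arithmetic; the variable names are only for illustration, and the derived numbers are computed from the article's totals rather than quoted from the broadcast.

```python
# Figures as reported in the article.
game1_totals = {"Watson": 35_734, "Ken": 4_800, "Brad": 10_400}
cumulative = {"Watson": 77_147, "Ken": 24_000, "Brad": 21_600}

# Game 2 winnings are whatever each cumulative total adds to Game 1.
game2_totals = {name: cumulative[name] - game1_totals[name] for name in cumulative}
print(game2_totals)  # {'Watson': 41413, 'Ken': 19200, 'Brad': 11200}

# Watson lost its $947 wager in Game 1, so it entered Final Jeopardy with:
watson_before_final = game1_totals["Watson"] + 947
print(watson_before_final)  # 36681
```

Ken won more than Brad in Game 2 by this reckoning, which is how he overtook Brad for second place despite trailing him after Game 1.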

Ken managed to inject a bit of levity at the very end. On his Final Jeopardy response card, he wrote, "I, for one, welcome our new computer overlords." (That's a "Simpsons" reference, in case you didn't know.) So, should we indeed be welcoming our new computer overlords? I'll have more to say on the topic in a guest-post on The Atlantic later today.

Update: you can read my Atlantic piece here. And you can hear Stephen Baker and me talking about Watson on WNYC's "The Brian Lehrer Show" here.



Ben Zimmer is executive producer of the Visual Thesaurus and Vocabulary.com. He is language columnist for The Wall Street Journal and former language columnist for The Boston Globe and The New York Times Magazine. He has worked as editor for American dictionaries at Oxford University Press and as a consultant to the Oxford English Dictionary. In addition to his regular "Word Routes" column here, he contributes to the group weblog Language Log. He is also the chair of the New Words Committee of the American Dialect Society.

Join the conversation

Comments from our users:

Thursday February 17th 2011, 10:59 AM
Comment by: Michael Lydon (New York, NY), Visual Thesaurus Contributor
Damn, I missed the shows, but this gives a good wrap-up. The whole thing is fascinating, and I feel sure that the tons of work and money poured into the game will pay off in all kinds of more serious computing.
Thursday February 17th 2011, 11:26 AM
Comment by: Jane B. (Winnipeg, Canada), Top 10 Commenter
I thoroughly enjoyed the shows, including the IBM explanations of Watson. I did, however, feel sorry at times for the humans. That was a sympathetic or empathetic response!
Thursday March 10th 2011, 9:58 AM
Comment by: Karen D. (Laurel, MD)
I thought that Watson's weaknesses were best on display in the "Also on your computer keyboard" category, where he failed miserably - including the one time he rang in with "Chemise" instead of "Shift".

It would be interesting to see him go head-to-head with Brad or Ken.
