Few of us get through a week without asking the question that titles this month's Language Lounge. We may pose the query more informally (What's going on? What's up?) or more succinctly with the help of demonstratives (What's that?), but pose it we do, and that's the case whether we're speaking English or some other language. Inquiring minds want to know, and often need to know. There's always a lot going on and there are always a lot of people who want to know about it. Authorities want to know about it too, if what's going on threatens the safety or security of a population that is under their protection. A 21st-century way that authorities try to find out what's going on can be organized under the rubric of event detection — a field that is in its infancy and that does not yet, as of this writing, even have its own Wikipedia article.

The linguistic element in event detection emerges naturally from dependable premises: coordinated action requires communication via language, and when things happen, people talk about it. For things already happening, the digital age has provided an indispensable forum by which we may learn about them: Twitter (see below). For events that are being planned but have not yet happened, the available information is spotty, ambiguous, not always dependable, and difficult to analyze, but the need for authorities to get access to and interpret this language is just as vital as their need to get a handle on things that are already underway.

To talk about the easier case first: Twitter, among other electronic forums, offers a goldmine of data for those seeking to know what is happening now or what just happened. Research dollars flow to this activity, and if you Google "Twitter event detection" you'll find that many inquiring minds are already working the problem. One example is the EPIC project, an acronym for "Empowering the Public with Information in Crisis." EPIC is training computers to capture tweets and compile the information they contain into a composite understanding of a crisis that will give authorities vital information about the nature, size, and trajectory of an event that affects a population — information that more traditional news sources may be unable to compile because of the nature of the event.

The much thornier underside of event detection is the ability to detect imminent events that may affect a population: for example, an event that is planned and coordinated with malice aforethought, and that is therefore by its nature clandestine or camouflaged. Such events, when they happen, may be catastrophic and costly in every conceivable way. So, the thinking goes, there is benefit in investing in technology that would enable the early detection of such events and the possibility of preventing them.

Two programs currently being funded by the US Department of Defense are DEFT (Deep Exploration and Filtering of Text) and BOLT (Broad Operational Language Translation). The online splash page for DEFT has a graphic that illustrates one of the core problems of intelligent natural language processing in a rather innocent way. Silhouetted figures named Aaron and John exchange dialogue in which Aaron reveals his (somewhat unusual) allergy to apples, and John helpfully warns him against eating the cake. Aaron (and any human listening to the short dialogue) would infer that the cake contains apples, but making such a connection is currently beyond the ability of computers.

Taxpayers may rest assured that DARPA, the research arm of the Department of Defense, is not spending millions of dollars to prevent the apple-averse Aarons of the world from eating apple cake. But it is not hard to imagine a short interchange between two individuals in which background knowledge would facilitate an ability to make critical inferences about things they leave unsaid, and possibly to prevent something far more dire than an allergic reaction.

The goal of DEFT, in fact, is to go one better than ordinary human inference. As the DEFT homepage states, it aims at a solution that will enable "understanding connections in text that might not be readily apparent to humans…. Sophisticated artificial intelligence of this nature has the potential to enable defense analysts to efficiently investigate orders of magnitude more documents so they can discover implicitly expressed, actionable information contained within them."

In a real-life scenario it seems like a pretty good bet that the interlocutors may not be named Aaron and John, their subjects will not be apples and allergies, and — perhaps most critically from a linguistic and computational point of view — they may not be speaking English. This adds greatly to the complexity of the problem because whichever of myriad languages (or regional dialects of little-studied languages) real-world Aaron and John may be using, the gist of their meanings and intentions will need to be transcribed (in order to become "text") and then translated (in order to become English) before any information they contain may become actionable.

While Aaron and John are shooting the breeze they will probably say a number of things that are of no interest to authorities. Which bits of their talk require study and analysis? The current thinking is that talk of events is significant, and so a computational way of reliably detecting mention of events in language is an important goal. But what's an event? Event is an abstract, very general noun. The definitions of more than 400 entries in one dictionary we looked at contain the word event. Everyone has a general notion of an event, and most people would probably subscribe to the idea that a distinguishing feature of events is that they have time boundaries somewhat in the way that physical objects have space boundaries. Is a party an event? A birthday party certainly is. A political party is not an event, and neither is a party to a lawsuit an event. So before events can be reliably tagged in text, computers must first perform the more basic task of word sense disambiguation, which we've talked about before in the Lounge, most recently here.
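
For readers who like to see the idea in miniature, here is a rough sketch of how a program might pick a sense of "party" by counting overlapping context words, a much simplified cousin of the dictionary-overlap approach often called the Lesk algorithm. Everything in it is an assumption made for illustration: the three sense labels, the cue words attached to each, and the function names `disambiguate` and `is_event` are invented here and come from no dictionary or project mentioned in this column.

```python
import re

# Hypothetical, hand-written sense inventory for "party".
# The cue words are invented for illustration, not taken from any dictionary.
SENSES = {
    "social gathering": {"birthday", "cake", "guests", "invited", "celebrate", "music"},
    "political organization": {"election", "vote", "candidate", "platform", "leader", "political"},
    "participant in a legal matter": {"lawsuit", "contract", "court", "plaintiff", "defendant", "agreement"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose cue words overlap most with the sentence,
    or None when no cue word appears at all."""
    context = tokenize(sentence)
    scores = {sense: len(context & cues) for sense, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def is_event(sentence):
    """Treat only the 'social gathering' sense of party as an event."""
    return disambiguate(sentence) == "social gathering"


if __name__ == "__main__":
    for s in [
        "We invited thirty guests to the birthday party",
        "The party chose its candidate before the election",
        "Neither party to the lawsuit appeared in court",
    ]:
        print(s, "->", disambiguate(s), "| event:", is_event(s))
```

A sentence with no cue words at all simply gets no answer, which is a fair picture of the difficulty: real text rarely supplies such obliging neighbors, and a working system needs far richer knowledge than a handful of hand-picked words.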

Another view into the murky waters of event detection is anomaly detection: a discipline that does have its own Wikipedia article. Anomaly detection is built around detecting patterns in a given dataset or data stream that do not conform to established norms. The thinking goes that abnormalities of this kind — anomalies — merit looking into. If the anomaly you are trying to detect is a spike in voltage or a surge in traffic, it's pretty easy to measure against a baseline that constitutes the normal state of affairs. Linguistic patterns, on the other hand, are extremely varied, subtle, and complex. How much would a computer have to know about the patterns of a language to make an educated guess about an abnormality in a given text requiring investigation? We are a long way from knowing that today.
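
The easy case, the voltage spike, can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not a description of any system named in this column: it assumes a list of readings already known to be normal, and it flags any new reading that lies more than a chosen number of standard deviations from the baseline average. The function name `find_anomalies`, the threshold of 3, and the voltage figures are all made up for the example.

```python
from statistics import mean, stdev


def find_anomalies(baseline, stream, threshold=3.0):
    """Return (index, value) pairs from `stream` that lie more than
    `threshold` standard deviations from the mean of `baseline`."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation of the normal readings
    if sigma == 0:
        raise ValueError("baseline has no variation; cannot compute a z-score")
    return [(i, x) for i, x in enumerate(stream) if abs(x - mu) / sigma > threshold]


if __name__ == "__main__":
    # Invented readings: a steady supply near 120 volts, then one spike.
    baseline_volts = [119.8, 120.1, 120.0, 119.9, 120.2, 120.0, 119.7, 120.3]
    incoming_volts = [120.1, 119.9, 131.5, 120.0]
    print(find_anomalies(baseline_volts, incoming_volts))
```

The sketch works only because a voltage reading is a single number with an obvious normal range. Nothing comparable exists off the shelf for a sentence, which is exactly the gap the paragraph above describes.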


Orin Hargraves is an independent lexicographer and contributor to numerous dictionaries published in the US, the UK, and Europe. He is also the author of Mighty Fine Words and Smashing Expressions (Oxford), the definitive guide to British and American differences, and Slang Rules! (Merriam-Webster), a practical guide for English learners. In addition to writing the Language Lounge column, Orin also writes for the Macmillan Dictionary Blog.

Comments from our users:

Friday March 1st 2013, 6:31 AM
Comment by: Edson Lopes (SÃO PAULO Brazil)
Besides the inferences that can be made by analyzing context, there is a further and more complicated issue: metaphors and coded meanings that can take up any disguise, and only be meaningful to the parties that exchange the message.
Friday March 1st 2013, 9:41 AM
Comment by: Morse W. (Huntsville, AL)
Excellent and interesting article. However, I would add that anomaly detection in the physical domain is not as easy as you might assume. The basic problem is that “what is normal” changes in time and context. Is that big bump your car just experienced the engine falling out, or did you just drive over a speed bump? The data that characterizes context is usually not available. Hence, there is uncertainty about what is normal. Bottom line is that you must deal with some very challenging probability and statistics.
Friday March 1st 2013, 4:58 PM
Comment by: mac
after millennia of evolving we have incomplete communication skills. good luck to bot.
Sunday March 3rd 2013, 4:38 AM
Comment by: Juan Jose Hartlohner (Madrid Spain)
The problem will always be the snowballing amount of data to be handled by a small team of human beings, who will have to judge the importance of event alarms. Then, how this intelligence is transmitted up the command chain, and how fast and befitting the "leaders" react to the sifted information submitted to them.
It's like surveillance cameras, they don't prevent crime; they are helpful in the investigation afterwards.
Sunday March 3rd 2013, 2:26 PM
Comment by: Mary C A.
Putting aside DEFT and BOLT and anomaly detection and the discussion that inevitably ensues, What is Normal? I'd like to get back to the discussion of words and their connections and etymology. Would somebody please do this for the word 'meme?'
Monday March 4th 2013, 3:21 PM
Comment by: Cody (Eugene, OR)
Mary C.A.:

Assuming you haven't already done this, I used Wikipedia to offer you 1) a short definition, and 2) the etymology of the word "meme."

A meme (pron.: /ˈmiːm/; meem) is "an idea, behavior or style that spreads from person to person within a culture."[2] A meme acts as a unit for carrying cultural ideas, symbols or practices, which can be transmitted from one mind to another through writing, speech, gestures, rituals or other imitable phenomena. Supporters of the concept regard memes as cultural analogues to genes in that they self-replicate, mutate, and respond to selective pressures.

The word meme is a shortening (modeled on gene) of mimeme (from Ancient Greek μίμημα Greek pronunciation: [míːmɛːma] mīmēma, "imitated thing", from μιμεῖσθαι mimeisthai, "to imitate", from μῖμος mimos "mime")[4] and it was coined by the British evolutionary biologist Richard Dawkins in The Selfish Gene (1976) as a concept for discussion of evolutionary principles in explaining the spread of ideas and cultural phenomena. Examples of memes given in the book included melodies, catch-phrases, fashion and the technology of building arches.
