Few of us get through a week without asking the question that titles this month's Language Lounge. We may pose the query more informally (What's going on? What's up?) or more succinctly with the help of demonstratives (What's that?), but pose it we do, and that's the case whether we're speaking English or some other language. Inquiring minds want to know, and often need to know. There's always a lot going on and there are always a lot of people who want to know about it. Authorities want to know about it too, if what's going on threatens the safety or security of a population that is under their protection. One 21st-century way that authorities try to find out what's going on goes under the rubric of event detection — a field that is in its infancy and that does not yet, as of this writing, even have its own Wikipedia article.
The linguistic element in event detection emerges naturally from dependable premises: coordinated action requires communication via language, and when things happen, people talk about them. For things already happening, the digital age has provided an indispensable forum through which we may learn about them: Twitter (see below). For events that are being planned but have not yet happened, the available information is spotty, ambiguous, not always dependable, and difficult to analyze — but the need for authorities to get access to and interpret this language is just as vital as their need to get a handle on things that are already underway.
To talk about the easier case first: Twitter, among other electronic forums, offers a goldmine of data for those seeking to know what is happening now or what just happened. Research dollars flow to this activity, and if you Google "Twitter event detection" you'll find that many inquiring minds are already working the problem. One example is the EPIC project, whose name is an acronym for "Empowering the Public with Information in Crisis." EPIC is training computers to capture tweets and compile the information they contain into a composite understanding of a crisis that will give authorities vital information about the nature, size, and trajectory of an event that affects a population — information that more traditional news sources may be unable to compile because of the nature of the event.
The much thornier side of event detection is detecting imminent events that may affect a population: for example, an event that is planned and coordinated with malice aforethought, and that is therefore by its nature clandestine or camouflaged. Such events, when they happen, may be catastrophic and costly in every conceivable way. So, the thinking goes, there is benefit in investing in technology that would enable the early detection of such events and the possibility of preventing them.
Two programs currently being funded by the US Department of Defense are DEFT (Deep Exploration and Filtering of Text) and BOLT (Broad Operational Language Translation). The online splash page for DEFT features a graphic that illustrates one of the core problems of intelligent natural language processing in a rather innocent way. Silhouetted Aaron and John exchange dialogue in which Aaron reveals his (somewhat unusual) allergy to apples, and John helpfully warns him against eating the cake. Aaron (and any human listening to the short dialogue) would make the inference that the cake contains apples, but it is currently beyond the ability of computers to make such a connection.
Taxpayers may rest assured that DARPA, the research arm of the Department of Defense, is not spending millions of dollars to prevent the apple-averse Aarons of the world from eating apple cake. But it is not hard to imagine a short interchange between two individuals in which background knowledge would facilitate an ability to make critical inferences about things they leave unsaid, and possibly to prevent something far more dire than an allergic reaction.
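As a toy illustration (emphatically not anything DEFT actually does), the missing-premise inference in the Aaron-and-John exchange can be sketched as a lookup against background knowledge: John's warning only makes sense if the cake contains something Aaron is allergic to. All names, facts, and function names here are hypothetical, invented for the sketch.

```python
# Toy sketch of the inference a human makes effortlessly: given a known
# allergy and a warning against a food, infer the unstated premise.
# All facts and names are hypothetical, drawn from the DEFT example.
facts = {("Aaron", "allergic_to", "apples")}
warning = ("John", "warns_against", "cake")  # "Don't eat the cake!"

def abduce_missing_premise(facts, warning):
    """Return the unstated premise that would make the warning sensible."""
    _speaker, _relation, avoided_item = warning
    for (_person, relation, allergen) in facts:
        if relation == "allergic_to":
            # Most plausible explanation: the avoided item contains the allergen.
            return (avoided_item, "contains", allergen)
    return None

print(abduce_missing_premise(facts, warning))
# → ('cake', 'contains', 'apples')
```

The point of the sketch is how much it cheats: the background fact is handed to the program in a tidy triple, whereas a real system would have to extract it from free-flowing talk first — which is exactly the hard part.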
The goal of DEFT, in fact, is to go one better than ordinary human inference. As the DEFT homepage states, it aims at a solution that will enable "understanding connections in text that might not be readily apparent to humans…. Sophisticated artificial intelligence of this nature has the potential to enable defense analysts to efficiently investigate orders of magnitude more documents so they can discover implicitly expressed, actionable information contained within them."
In a real-life scenario it seems like a pretty good bet that the interlocutors will not be named Aaron and John, their subjects will not be apples and allergies, and — perhaps most critically from a linguistic and computational point of view — they may not be speaking English. This adds greatly to the complexity of the problem because whichever of myriad languages (or regional dialects of little-studied languages) real-world Aaron and John may be using, their talk will need to be transcribed (in order to become "text") and then translated (in order to become English) before the information it contains may become actionable.
While Aaron and John are shooting the breeze they will probably say a number of things that are of no interest to authorities. Which bits of their talk require study and analysis? The current thinking is that talk of events is significant, and so a computational way of reliably detecting mentions of events in language is an important goal. But what's an event? Event is an abstract, very general noun. The definitions of more than 400 entries in one dictionary we looked at contain the word event. Everyone has a general notion of an event, and most people would probably subscribe to the idea that a distinguishing feature of events is that they have time boundaries somewhat in the way that physical objects have space boundaries. Is a party an event? A birthday party certainly is. A political party is not an event, and neither is a party to a lawsuit an event. So before events can be reliably tagged in text, computers must first perform the more basic task of word sense disambiguation, which we've talked about before in the Lounge, most recently here.
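That disambiguation step can be illustrated with a bare-bones sketch in the spirit of the classic Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the surrounding context. The glosses below are hand-written for illustration, not drawn from any real dictionary, and real systems are considerably more sophisticated.

```python
# A minimal word-sense-disambiguation sketch (Lesk-style gloss overlap).
# The glosses for "party" are hypothetical, written for this example.
SENSES = {
    "party_event": "a social gathering of invited guests for celebration",
    "party_political": "an organization of people who share political beliefs",
    "party_legal": "a person or group involved in a lawsuit or contract",
}

def disambiguate(context):
    """Return the sense whose gloss overlaps most with the context words."""
    context_words = set(context.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES.items():
        overlap = len(context_words & set(gloss.split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(disambiguate("the guests arrived at the birthday celebration"))
# → party_event
```

Even this crude overlap count separates a birthday party from a political one — but only because the glosses and context happen to share vocabulary, which is precisely what cannot be counted on in unconstrained text.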
Another view into the murky waters of event detection is anomaly detection: a discipline that does have its own Wikipedia article. Anomaly detection is built around detecting patterns in a given dataset or data stream that do not conform to established norms. The thinking goes that abnormalities of this kind — anomalies — merit looking into. If the anomaly you are trying to detect is a spike in voltage or a surge in traffic, it's pretty easy to measure against a baseline that constitutes the normal state of affairs. Linguistic patterns, on the other hand, are extremely varied, subtle, and complex. How much would a computer have to know about the patterns of a language to make an educated guess about an abnormality in a given text requiring investigation? We are a long way from knowing that today.