The Data Cloud : Language Lounge : Thinkmap Visual Thesaurus

We've been thinking about the data cloud in the Lounge these days. "What data cloud?" you may ask, and well that you should: it's a term relatively new to English and it hasn't yet settled down to a single fixed meaning. The data cloud we've been thinking about is the Big One: the nebulous dataset consisting of all the data that is, in principle, at your fingertips when they are poised above an Internet-connected keyboard.

In particular, we've been thinking about that part of the data cloud that ordinary folks are the authors of. These days, so many entities -- roughly, those that are designated as "Web 2.0" phenomena -- invite us to upload, store, share, label, tag, and comment on our own and other folks' data.

We've been pondering the various sets of words that English now employs to talk about aspects of the data cloud: how we interact with it, how we characterize its operations, and how we consider the implications of its existence, even if we don't understand them all very well. A while back, in The New Food, we explored the way in which a number of food terms have morphed into meanings commonly associated with information technology. That's an example of how English -- like other modern languages, we suspect -- is quite economical in dealing with the data cloud. English has required relatively little new language for grappling with the new paradigm: instead, we simply put to work tried-and-true words that have proved amenable to having their meanings extended and applied to a new area of discourse. But what do these various bags of words -- of which food words are one example -- say about the data cloud and our relationship to it?

Let's start with the term "data cloud" itself. We don't know who the original coiner of this term is (we find hits online going back to about the turn of the 21st century), but we happily affix a gold star to his or her label for aptness. The data cloud does indeed have many of the features of a cloud: indistinct boundaries, the quality of being homogeneous when viewed from a distance or from within, the sense that it hovers above us, and that it is constantly changing its indeterminate shape under the influence of forces much greater than an individual can command.

As real clouds exist in physical space, it seems inescapable to conceive of the data cloud as existing in space, or even as being a kind of space unto itself: it has addresses and locations; it can be both navigated and mined; we put things into it and take things out of it, though for these operations we use the mainly 20th-century words upload and download. The persistent space metaphor that we use in dealing with the data cloud probably arises from many causes: our already established habit of talking about memory as if it existed in physical space; the fact that many function words in English (like prepositions) are grounded in spatial relations; and perhaps most of all, the fact that humans and their languages can't make much sense of anything that isn't grounded in time and space.

But we wonder, is there a deceptive simplicity in dealing with the data cloud as a notional space? Remember, the data cloud consists entirely of data. There's something slightly chilling in the definition of data: "a collection of facts from which conclusions may be drawn." Conclusions? What conclusions? And who's drawing them? These questions point to the dark underbelly of the data cloud and to another set of English words that have been given new life in talking about new technology. This set of words forces us to think about the data cloud as something other than a pretty, fluffy white thing that scuds across the horizon on a summer afternoon. The data cloud is home to a lot of curious things: bots, spiders, crawlers, gophers, and other critters that work tirelessly by night and day, sifting, indexing, collecting, comparing, and no doubt, drawing conclusions.

It's marvelous that various companies will give us multigigabytes of storage for our stuff for free, or at only a nominal cost. But it's all information we're putting out there, whether it's our Gmail archive, our photostream on Flickr, our blog, or our financial records on our bank's bill paying facility. It's all data -- the food of the data cloud, the fodder from which conclusions can be drawn -- and in what other space would we leave valuable morsels lying about, knowing that various predatory critters were poised to feed on them?

We use familiar language in dealing with the data cloud because we need to make it familiar: we must have terms that enable us to talk about it in a way that makes sense to our time-and-space-bound brains, and we have to start from where we are. But the language that has developed about the data cloud seems to be a little compartmentalized now, and perhaps simplified in a way that allows us to think of the data cloud as something benign. We overlook the fact that we mix things in the notional space of the data cloud that we would never mix, or that we would make an effort to keep separate, in any real space. The data cloud really is something new under the sun. As we continue to internalize the implications of its existence, we wonder how our language-based metaphors about it will change.

Here are a couple of things we saw online (appropriately enough!) that got us started thinking about the data cloud. First, a diagram in Wired magazine that can be viewed interactively here:

http://www.wired.com/special_multimedia/2008/ff_secretlife_1602

Secondly, we saw an article in Forbes magazine. The line that caught our eye in this article was "...the development of the Nexus 7000, a network switch that's capable of routing 15 terabits of data per second -- the equivalent of moving the entire contents of Wikipedia in a hundredth of a second, or downloading every movie available on Netflix in about 40 seconds." This led us to a description of the product in question:

http://www.cisco.com/en/US/products/ps9402/index.html

What we liked most about this page was watching the "video data sheet," in which a Cisco executive talks in deadpan, matter-of-fact tones about the brave new world of high-speed data transfer. It's good medicine if you're inclined to think that the data cloud just happens!

Orin Hargraves is an independent lexicographer and contributor to numerous dictionaries published in the US, the UK, and Europe. He is also the author of Mighty Fine Words and Smashing Expressions (Oxford), the definitive guide to British and American differences, and Slang Rules! (Merriam-Webster), a practical guide for English learners. In addition to writing the Language Lounge column, Orin also writes for the Macmillan Dictionary Blog.