A news story that flitted across the headlines earlier this year reported on a study called "The Geography of Happiness," in which researchers in Vermont subjected 10 million geotagged tweets to sentiment analysis (touched on briefly in the Lounge last summer) and correlated their findings successfully with annually surveyed characteristics of people in all 50 U.S. states, including nearly 400 urban populations. Their object was to arrive at a metric for the relative happiness of people in a place. The study is a goldmine for data junkies and I urge anyone with an interest in its findings to look at the whole study (at the link above), not just the digested and spun bits of it that have appeared in the news media. "The Geography of Happiness" breaks new ground in the analysis of digital-age linguistic data, while also raising interesting questions about the limits of obtaining reliable results from algorithm-driven research on big bags of words.
Journalists in search of a story have found news hooks of many shapes and sizes in the study, and different aspects of its findings are still being reported in the popular press and mulled over in blogs. Commentators can adjust their focus for close-up or wide-angle because there's material here for everyone. One of the more popular findings of the study was the somewhat coarse-grained happy state/sad state map, reproduced here:
Happier states are shown in red, sadder states in blue, and neutral states in gray. No one is surprised that Hawaii seems to be the happiest state. It also fits a popular stereotype that two states in the Deep South, Louisiana and Mississippi, where quality-of-life indices lag far behind the national averages in many areas, are the least happy.
Anyone who reads the study can nitpick, and if you read pundit blogs or trawl the comments section of any of the online news coverage about the study, this is what you find. Are people in economically depressed Nevada really happy, or are the happy tweets coming from besotted revelers in Las Vegas? Why were tweets in Spanish not surveyed as well as ones in English? Is it really fair to characterize Louisiana as a sad place because of the abundance of profanity that is tweeted there? If wine is considered a happy word, is it any wonder that Napa, California is the happiest city in the country? Does it not imply a strong and unscientific prejudice against fat people that the researchers even looked for a correlation between word use and obesity?
The researchers did not actually "read" 10 million tweets, ponder their meaning, and rank them on a happiness scale. They were looking at words, not sentences. They (or rather, their minions: see below) ranked the happiness quotient of a set of individual words and then measured the relative frequency of happy and sad words across all geographical areas. So, for example, rainbow is a rather happy word and hate is a sad word. The important question that many linguists raise here is this: Is it valid to assess the meanings of words divorced from their context? Breakthrough seems like a pretty happy word on the face of it, but it isn't if you're talking about toilet paper. The researchers' assertion, however, is that when you're looking at millions of words, you can still obtain valid results while ignoring context. Their work follows that of others who have looked to Twitter as the pulse on which to place a finger in order to discover what people are thinking and feeling, as this frequently cited paper from 2010 does.
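For readers who like to see the mechanics, here is a minimal sketch of what word-level scoring of this kind might look like. It is not the researchers' code: the word scores, the 1-to-9 scale, and the function name `average_happiness` are all invented for illustration, and a frequency-weighted average is simply one natural way to combine per-word scores with word counts.

```python
import re
from collections import Counter

# Hypothetical happiness scores on a 1 (sad) to 9 (happy) scale.
# These numbers are made up for illustration; they are not from the study.
WORD_SCORES = {
    "rainbow": 8.0,
    "hate": 2.0,
    "wine": 6.0,
    "breakthrough": 7.0,
}

def average_happiness(texts, scores=WORD_SCORES):
    """Frequency-weighted average score of the scored words in `texts`.

    Words without a score are ignored. Returns None when no scored
    word appears at all.
    """
    counts = Counter()
    for text in texts:
        # Crude tokenizer: lowercase runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in scores:
                counts[word] += 1

    total = sum(counts.values())
    if total == 0:
        return None
    return sum(scores[word] * n for word, n in counts.items()) / total

# Pretend these are all the tweets geotagged to one place.
tweets = ["I hate Mondays", "Rainbow after the rain, another rainbow!"]
print(average_happiness(tweets))
```

Notice that the sketch has no idea what any tweet is about. A tweet complaining about a toilet paper breakthrough would add a full 7.0 to the tally, which is exactly the context problem that linguists raise.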
A core flaw of the happiness study may be its use of Amazon's Mechanical Turk service to arrive at the "average happiness" of each word examined in the study. Mechanical Turk is a massive crowdsourcing/outsourcing labor market in which simple, highly repetitive tasks are performed by Internet-connected workers for small payments. It's interesting to note that a separate, independent study of Mechanical Turk found that
certain homogeneous aspects of the [Mechanical Turk] population, such as education level and nationality, may impose limits on the appropriateness of Turkers as a target community for some interventions or research areas. An awareness of the demographics and behaviors of Mechanical Turk workers is important for understanding the capabilities and potential side effects of using this system.
So in other words, there may be a compromising overlap between the small minority of people who tweet, and the even tinier minority of people who work as Mechanical Turks, making "The Geography of Happiness" in a highly reduced view simply a snapshot of what hip and savvy 20-something Americans think happiness is in relation to themselves and others that they know little about.
One wall that this study comes up against is that happiness is a subjective state, and so attempts to measure it objectively will never rise above all objections. But what must not be overlooked in the study is the huge volume of data that was studied, as well as the great care taken to minimize bias from the peculiarities of measuring characteristics of language when it is separated from almost any meaningful context except its point of origin. The fact that the linguistic analysis correlates so well with survey-based analysis that is also aimed at measuring the elusive quality of happiness is the best validation of the researchers' approach.
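To make "correlates so well" a little more concrete, here is a small sketch of how one might check agreement between a tweet-based score and a survey-based score for the same places. The Pearson coefficient is my choice for illustration, not necessarily the statistic the researchers used, and the numbers below are invented, not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Invented scores for five hypothetical places, listed in the same order.
tweet_scores = [6.2, 5.9, 6.0, 5.8, 6.1]
survey_scores = [68.0, 61.5, 64.0, 60.0, 66.5]
print(round(pearson_r(tweet_scores, survey_scores), 3))
```

A value near 1 would mean the two measures rank places in much the same way, a value near 0 would mean they have little to do with each other, and neither outcome says anything about which measure is closer to the truth.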
The study was intriguing to me personally for a couple of reasons. Until very recently I lived for 20 years in Maryland, one of the saddest states according to the study. Now I live in Niwot, Colorado, a very sweet spot that is on the edge of a small triangle formed by three of the happiest cities in the nation, according to "The Geography of Happiness": No. 2 Longmont, No. 9 Lafayette, and No. 12 Boulder, Colorado. This suggests that I should be tripping over my own happy feet every day. Yet more interestingly, my father (may he rest in peace) was born in Shreveport, Louisiana and spent his childhood in Beaumont, Texas: two cities that rank among the five most long-faced in the nation. So of course my first inclination was to consult my own experience to see if it correlated with the findings of the study.
I can report unequivocally that I am happier in Colorado than I was in Maryland, though that is partly due to the fact that I grew up here and now I feel like I am back home after many years away. I can report anecdotally that I am surrounded here by much happier people than I was in Maryland. The default way that you greet a stranger in this part of Colorado is to make eye contact, smile, and say something nice. That rarely happened in Maryland, and it is not the rule in any of the other places I have lived. As for my father's sad legacy: I only visited his neck of the woods once, long after he had died but well within the time frame coincident with the data assessed in "The Geography of Happiness." On leaving there, I will only say that I praised my late grandmother's fortitude and courage for moving her family out of the swamp in the 1940s and bringing it to Colorado!
As a non-tweeter, I have one nagging doubt about "The Geography of Happiness" that I can't address but that I would like to see investigated: how reliably do people actually tweet what they feel? There is an overwhelming pressure in American culture to be upbeat and "accentuate the positive," and I wonder how much this plays into the phenomenon of feelings that get tweeted, as opposed to ones that merely fester.