WikiLeaks Science in Peer Review Research
31st May 2011
On August 30, 2010, Back, Kufner, and Egloff published a short study in the peer-reviewed journal Psychological Science. They got a Really Big database of over 400,000 text messages sent on September 11, 2001. They fed those messages into a well-established software program, the Linguistic Inquiry and Word Count (LIWC), which counts words in categories such as anger, anxiety, and sadness, to measure the emotions being expressed that day. They found a correlation of .84 between angry text messages and time, indicating that people got angrier, a lot angrier, across the day. From this analysis of a Really Big dataset, the researchers concluded:
In sum, we investigated a large data set providing unobtrusive behavioral measures of negative emotions actually expressed during September 11. We were able to determine that people did not react primarily with sadness; that they experienced a number of anxiety outbursts, but recovered quickly; and that they steadily became angrier . . . anger is known to predict moral outrage and a desire for vengeance, which — once aroused — seem to require an outlet (Skitka et al., 2004). This might help to explain individual acts of discrimination following the attacks, as well as societal responses such as political intolerance and confrontational policy.
I would argue that their conclusions do not follow from this research, especially given that they employed Observational Research, where Really Big can seem Really Important, but never forget it’s Really Convenient and Biased to boot. And getting from Angry Text Messages to Intolerance and Confrontation seems to require proof of intervening stages. But whether it is calorie counts on menus, cell phones and driving, or statistically insignificant variation in something called Global Temperature, some folks have no trouble moving from a correlation to causality and thence to law and regulation. Still, people smarter than I found a different reason to dispute this study.
See, on May 9, 2011, less than a year later, Cynthia Pury published her observations in the same journal about the Angry American research from Back et al. Pury noted a slight problem with one case in this Really Big dataset.
The data contained many technical codes; thus, Back et al. counted only words recognized by LIWC. However, this procedure did not exclude automatically generated messages. Consequently, LIWC words in such messages were counted, even if the words lacked emotional meaning in context. Furthermore, computers can send messages with superhuman frequency, turning an otherwise minor measurement error into a serious confound. This confound can be detected by treating individual text messages as primary units, reading samples of each key word in context, and looking for repeating false positives.
Stated another way, Back et al. did not inspect that database of 400,000 messages and just assumed every text message in it was sent by one human to another. It never occurred to them that the database might include technical messages sent between devices (e.g., a pager and its server). Pury found that over one third of the messages in the database were indeed these technical exchanges and, more importantly, that every one contained the word “critical.” Worse still, all of these technical messages came from just one device, a single pager.
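For the curious, the kind of check Pury describes (treat each message as the unit, look for the same keyword-bearing message repeating from one sender) can be roughed out in a few lines of Python. This is only a sketch under my own assumptions, not Pury's actual procedure: the tiny word list stands in for LIWC's real anger dictionary, and the sender names and message texts are invented.

```python
import re
from collections import Counter

# Hypothetical stand-in for LIWC's anger category; the real dictionary is much larger.
ANGER_WORDS = {"critical", "hate", "angry", "furious"}

def repeated_hits(messages, min_repeats=3):
    """List (sender, text, count) for anger-word messages one sender repeats verbatim."""
    counts = Counter()
    for sender, text in messages:
        normalized = text.strip().lower()
        words = set(re.findall(r"[a-z']+", normalized))
        if words & ANGER_WORDS:  # message contains at least one lexicon word
            counts[(sender, normalized)] += 1
    # A human rarely sends the identical angry message many times; a machine does.
    return [(s, t, n) for (s, t), n in counts.most_common() if n >= min_repeats]

# Invented toy records of the form (sender_id, text); not from the real archive.
sample = (
    [("pager_A", "system status CRITICAL")] * 4
    + [("pager_B", "I hate this traffic"), ("pager_C", "call me when you land")]
)

for sender, text, n in repeated_hits(sample):
    print(sender, n, text)
```

Anything this turns up still has to be read in context by a person, which is the step that got skipped.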
Now it turns out that the well-established software program that determines emotion from semantic content counts the word “critical” as an anger word. More of these “critical” messages were sent between the pager and server as the day progressed, probably because of network failure from heavy volume. Thus, Back et al. concluded that on September 11 Americans got angrier, more intolerant, and more discriminatory without ever carefully inspecting the dataset and its contents. When the data from this one pager, which carry only a technical message between the device and its server, are removed from the dataset, the Angry, Intolerant, and Confrontational American Effect disappears.
Back and colleagues responded in Psychological Science on May 13, 2011, four days after Pury’s note. They confirmed Pury’s analysis of their error. They provided a revised conclusion:
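To see how one chatty device can manufacture a trend, here is a toy Python sketch, again under my own assumptions. The word list is a hypothetical stand-in for LIWC's anger category, the messages are invented, and the hourly anger share is a simplification of whatever scoring Back et al. actually used. It computes the correlation between hour of day and the share of anger words, once with every sender included and once with the automated pager left out.

```python
import re
from collections import defaultdict
from math import sqrt

# Hypothetical stand-in for LIWC's anger category.
ANGER_WORDS = {"critical", "hate", "angry", "furious"}

def pearson(xs, ys):
    """Plain Pearson correlation; assumes neither series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def anger_by_hour(messages, exclude=()):
    """Map hour -> share of words that are anger words, skipping excluded senders."""
    hits, total = defaultdict(int), defaultdict(int)
    for hour, sender, text in messages:
        if sender in exclude:
            continue
        words = re.findall(r"[a-z']+", text.lower())
        total[hour] += len(words)
        hits[hour] += sum(w in ANGER_WORDS for w in words)
    return {h: hits[h] / total[h] for h in sorted(total) if total[h]}

def trend(messages, exclude=()):
    """Correlation between hour and that hour's anger-word share."""
    series = anger_by_hour(messages, exclude)
    return pearson(list(series), list(series.values()))

# Invented toy data: people are occasionally angry with no trend over the day,
# while one automated pager repeats a status message more often each hour.
toy = []
for hour in range(6):
    toy += [(hour, "human", "call me when you land")] * 2
    if hour in (1, 4):
        toy.append((hour, "human", "i hate this traffic"))
    toy += [(hour, "pager_A", "system status CRITICAL")] * hour

print("all senders:    ", round(trend(toy), 2))
print("without pager_A:", round(trend(toy, exclude={"pager_A"}), 2))
```

On this made-up data the first number is strongly positive and the second is about zero. It proves nothing about the real archive; it only shows the mechanism.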
As Pury’s (2011) analyses suggest, however, the timeline of anger was not as straightforward as indicated in our original analyses . . . Additional analyses and sources of data will be needed for a thorough evaluation of the course of anger on September 11, 2001.
You could put it that way. Or maybe . . .
This is about as embarrassing as it gets in research publication. I’m sure that everyone who’s ever published peer-reviewed science experienced a near heart attack merely reading this exchange. It’s your worst nightmare. You bust your butt doing a study, turn everything inside out and upside down three times, show your results to your worst enemy, and when everything seems copasetic you send it in for another beating at peer review. It passes, you publish, then one day you open your email program and find a message with ?????? in the subject line from someone you don’t know from Adam, Eve, or the Serpent. You read a polite inquiry about inconsistencies and technical questions about methods and you’re thinking, “Who is this knucklehead?”
But, because you’re doing science, you take this seriously and run a quick check of the database looking for a particular pager and the key term, Critical, and then you feel your heart constrict even before your mind can explain why. You run a sort routine on the data and, BOOM, your screen fills up with a gazillion cases of CRITICAL from the same pager.
Your life flashes before your eyes. A moment from a Methods 101 seminar. A tenure committee. A crying grad student. An empty office. Then you contact your coauthors who repeat your movie, but for themselves. The gift keeps on giving as you verify the biggest mistake of your life, there in black and white, forever in digital storage, like a brain in a bottle filled with formaldehyde, except it’s your reputation in a peer review journal that will endure until the last scientist and the last storage drive.
And, too, something like this, but at a lower intensity, is going on with the journal editor and reviewers. Sure, you can blame a technical error like this on the authors; you can’t and shouldn’t analyze their data for them. But, nobody in charge caught this whopper. Due diligence? Let’s guess how the attributional search on this one will work out.
This is the bad news about peer review research. People make huge errors. But, the good news in this is that Really Big errors tend to get caught and the literature becomes self-correcting. The additional good news is that errors like this rarely seem to occur. Even those Errata notes (where authors catch their own errors) are relatively rare and most exchanges between teams of authors typically turn more on He Said, I Did Not, You Did, Too! rather than fabulous errors like this. Thus, peer review not only seems to publish fewer whoppers, it can also catch them and hold them up to the light of day.
In retrospect, all of the errors from all the players in this may seem more apparent now than anyone could have realized then. I’m not so sure. My radar does go off when I read an observational methodology like this that reports a correlation of .84 between a psychological variable and anything living or dead, real or imagined. With this kind of data and all the measurement and sampling error in it, a correlation of this size is a virtual identity, like parallel forms of a proven self-report scale. Nobody (the authors, the reviewers, the editor), nobody flagged this?
And then what was the source of the database, those 400,000 text messages? What, AT&T? Sprint? Some public record from a 9-11 Commission Report? Nope. Hold on to your BVDs or your panties or both.
WikiLeaks.
Yeah. The researchers went online to WikiLeaks and simply downloaded this double-secret, Dick Cheney/George Bush/CIA database. That’s why it was so easy for Cynthia Pury to catch the stupid error Back, Kufner, and Egloff committed. She, too, simply went to the WikiLeaks link, downloaded the file, and opened it and this can of worms. There is no provenance on this database, how it was built, how it was acquired, who accessed it, whether it was manipulated for any purpose. It’s just a hodgepodge of messages and technical codes that the scientists and freedom fighters at WikiLeaks acquired and posted online to expose the wrongdoings of governments. And no one thought that peculiar from a scientific point of view. Of course, Americans and America jumped on the Hate Wagon after 9-11. Gitmo! Waterboarding! Infringed Civil Liberties! The Angry Left! Oops. That last one doesn’t belong there.
Imagine instead if this had been from a Fox News server. Don’t change anything else, just that provenance. Think how the researchers, reviewers, and editor would have responded.
While I see a clear Biased Process in Back et al.’s conclusions about Angry Americans and all that zealotry, Back, Kufner, and Egloff are made to look foolish not for their silly, fevered, and unwarranted conclusions, but because they don’t know how to inspect a database. And, the journal editor and reviewers look incompetent because of their political beliefs. Who needs to think about a correlation of .84 or a wildly biased database source like WikiLeaks when we’ve got the right conclusion about anger, hate, and intolerance in America and Americans?
All Bad Science Is Persuasive, baby.
Mitja D. Back, Albrecht C.P. Kufner, and Boris Egloff
The Emotional Timeline of September 11, 2001. Psychological Science, October 2010, 21: 1417-1419, first published on August 30, 2010, doi:10.1177/0956797610382124
Cynthia L.S. Pury
Automation Can Lead to Confounds in Text Analysis: Back, Küfner, and Egloff (2010) and the Not-So-Angry Americans. Psychological Science, May 2011, first published on May 9, 2011, doi:10.1177/0956797611408735
Mitja D. Back, Albrecht C.P. Kufner, and Boris Egloff
“Automatic or the People?”: Anger on September 11, 2001, and Lessons Learned for the Analysis of Large Digital Data Sets. Psychological Science, May 2011, first published on May 13, 2011, doi:10.1177/0956797611409592