The Disputed "Science" of Online Behavior
Wednesday, May 30, 2012 - 01:19 PM
In the Internet era, both companies and scientists are well aware that more and more of our daily activities have moved to cyberspace. And they know the value of understanding the meaning and trends behind the countless links we follow every day. Facebook scientists use data to study users’ ethnicities, improve geolocation and even predict election results. At universities all over the country, schools of information study the effects of people’s attention shifting to screens.
According to a NY Times article, academic researchers and tech companies have recently had some heated debates about the essential ingredient underlying their work: data. Companies like Amazon and Facebook have refused to open their massive troves of data to scientists, citing their users’ privacy and their own competitive interests.
Bernardo Huberman, a physicist who directs the social computing group at HP Labs, argues that not only do companies’ researchers have an unfair advantage, but their findings shouldn’t be called “science” at all. Company researchers conduct studies that their equally talented peers at universities can’t, and their results can’t be verified by independent scientists. That means that some of the most crucial social science research of the Internet era can’t be checked for inaccuracies or potential fraud.
When I spoke to Huberman, he told me that company researchers should be given two options: Either they figure out a way to anonymize the data and publish it with their findings, or science journals and conferences should ban them from presenting their work as “science.”
The first choice seemed promising to me. And so I called Andreas Weigend, a physicist and former chief scientist at Amazon, to ask how it could be done. His answer? It’s impossible. Companies like Facebook have so much data about individuals that leaving out the subject’s name or location wouldn’t be sufficient to anonymize it (a topic we have discussed on the show before). The remaining data points could still be used to identify people.
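To make Weigend's point concrete, here is a minimal sketch in Python. The records, field names and the `unique_fraction` helper are all invented for illustration and do not come from any company's data. It assumes a tiny "anonymized" table with names already removed and counts how many rows are still one of a kind once a few ordinary attributes are combined.

```python
from collections import Counter

# Hypothetical "anonymized" records: names removed, other attributes kept.
records = [
    {"zip": "94301", "birth_year": 1975, "gender": "F"},
    {"zip": "94301", "birth_year": 1975, "gender": "F"},
    {"zip": "94301", "birth_year": 1982, "gender": "M"},
    {"zip": "10027", "birth_year": 1990, "gender": "F"},
    {"zip": "10027", "birth_year": 1990, "gender": "M"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` is unique."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(rows)


# The more attributes are combined, the more rows stand alone.
for fields in (["zip"], ["zip", "birth_year"], ["zip", "birth_year", "gender"]):
    print(fields, unique_fraction(records, fields))
```

In this toy table, no row is unique by ZIP code alone, but most become unique once birth year and gender are added. A row that is unique on such attributes could, in principle, be matched against an outside source that does carry names.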
But what happens to the central tenets of science, verifiability and generalizability, when companies don’t release research data? Can and should the “science” of online behavior be evaluated by these criteria? Weigend thinks that the requirement to publish data so that other scientists can verify the findings is a relic of an old gate-keeping system. Companies like Amazon and Facebook don’t need scientists to verify their research. They just need to evaluate how effective their algorithms are in targeting consumers and where their page ranks are. Take Amazon’s “other books you might like” feature. If Amazon’s algorithms are good, then they’ve analyzed data about you in a way that results in your purchasing those recommendations. To them, that’s verification.
To me, this still seems unsatisfying. Perhaps all Amazon cares about is creating algorithms that better target customers. But they still have data that could answer a whole slew of other questions that social scientists deeply care about. If they refuse to publish it, are we forced to rely on them to conduct studies that matter to social scientists, policy makers and those of us who want to understand how our behaviors are changing? Do we simply have to trust the accuracy of their findings? To that, Weigend simply answered: “What’s the alternative?”