
Does Big Data have the flu?

These days, when people start feeling a fever and a sore throat coming on, their first move often isn’t to the medicine cabinet. Instead, it’s to a computer or smartphone to Google their symptoms.

These queries, which make up only a tiny fraction of the more than 7 billion total queries the search engine handles each day, are all stored by Google. The company uses this data for a variety of reasons: it can help Google improve its search results for users, which also boosts the company’s bottom line, and it can benefit the population as a whole in other ways.

One example of the latter is Google Flu Trends, a statistical model developed by engineers at Google.org, the company’s foundational arm, in an effort to “now-cast” what’s happening with the flu on any given day.

But research has shown that GFT often misses its target. These results led Northeastern University network scientists and their colleagues to take a closer look at how Big Data should be used to advance scientific research. Their report was published online Thursday in the journal Science.

“Big Data have enormous scientific possibilities,” said Northeastern professor David Lazer. “But we have to be aware that most Big Data aren’t designed for scientific purposes.” Fully achieving Big Data’s enthusiastically lauded potential, he added, requires a synthesis of computer science approaches to data and traditional approaches from the social sciences.

The paper was co-authored by Lazer, who holds joint appointments in the Department of Political Science and the College of Computer and Information Science; Alessandro Vespignani, the Sternberg Family Distinguished University Professor of Physics at Northeastern who has joint appointments in the College of Science, Bouvé College of Health Sciences, and the College of Computer and Information Science; Northeastern visiting research professor of political science Ryan Kennedy; and Gary King, a professor in the Harvard University Department of Government.

“In a sense, Google Flu Trends is not bad, but it’s no better than any basic approach to time series prediction,” Vespignani said. “So the issue is in the claims and the disregard of other techniques or data more than the actual result.”

In their paper, the researchers explain where Google Flu Trends went wrong and examine how the research community can best utilize the outputs of Big Data companies as well as how those companies should participate in the research effort.

By incorporating lagged data from the Centers for Disease Control and Prevention as well as making a few simple statistical tweaks to the model, Lazer said, the GFT engineers could have significantly improved their results. But in a companion report also released Thursday on the Social Science Research Network, an online repository of scholarly research and related materials, Lazer and his colleagues show that an updated version of GFT, which came about in response to a 2013 Nature article revealing GFT’s limitations, does little better than its predecessor.

While Big Data certainly holds great promise for research, Lazer said, it will only be successful if the methods and data are made at least partially accessible to the community. But that so far has not been the case with Google.

“Google wants to contribute to science but at the same time does not follow scientific praxis and the principles of reproducibility and data availability that are crucial for progress,” Vespignani said. “In other words they want to contribute to science with a black box, which we cannot fully scrutinize and understand.”

If scientists are to “stand on the shoulders of giants,” as the old adage requires for moving knowledge forward, they will need some help from the giants, Lazer said. Otherwise, failures like that of Google Flu Trends will be rampant, with the potential to tarnish our understanding of anything from stock market trends to the spread of disease.

– By Angela Herring
