Epidemiologists Tackle Influenza with Data
Computational epidemiologists are exploring ways to use Google, social media, and other data sources to improve public health.
Published Mar 1, 2013
By Diana Friedman
Academy Contributor
It’s fairly evident when flu hits a given area: employees start taking sick days, lines become longer at the doctor’s office, and emergency rooms fill up. But what if people, particularly healthcare workers and those not yet vaccinated, could get just a little more warning that flu was coming, or that the current flu season had not yet peaked? These were questions that Rajan Patel, PhD, senior scientist at Google Inc., and two of his coworkers, Jeremy Ginsberg and Matthew Mohebbi, asked themselves in 2007.
The CDC has its own method for estimating flu outbreaks, relying mainly on select doctors around the country to report counts of influenza-like illness. But those reports must be collected, aggregated, and disseminated, which creates about a 2-week lag between data collection and public reporting.
Patel and coworkers embarked on a project to create a real-time measurement of flu—measurable down to the city level, even in remote areas where it is hard to collect data from on-the-ground physicians—using the data source they knew best: search engine data.
Utilizing Search Engine Data
“We built a simple linear model that used the cumulative frequency of search terms, normalized for total search volume, to estimate the influenza-like illness rates provided by the CDC,” says Patel. They had to start by filtering billions of potentially flu-related search engine queries through a correlation analysis to determine which queries tracked the CDC’s influenza-like illness data most closely. “If we just guessed at the most likely search terms, we could have had misses,” says Patel.
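The approach Patel describes can be illustrated with a minimal sketch. It is not the published Flu Trends model: the query names, weekly counts, CDC rates, and the 0.9 correlation cutoff below are all hypothetical values chosen for illustration, and the helper functions are assumptions rather than anything from Google's code. The sketch normalizes each query's count by total search volume, keeps the queries whose normalized frequency correlates strongly with the CDC rate, sums them, and fits a straight line.

```python
from math import sqrt

# Hypothetical weekly data; none of these numbers come from Google or the CDC.
total_searches = [1000, 1200, 1100, 1300, 1250, 1400]
query_counts = {
    "flu symptoms": [10, 18, 22, 39, 50, 70],
    "fever remedy": [5, 9, 11, 19, 26, 35],
    "basketball scores": [40, 30, 45, 33, 41, 36],
}
cdc_ili_rate = [1.0, 1.5, 2.0, 3.0, 4.0, 5.0]  # percent of visits, illustrative


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Step 1: normalize each query's count by total search volume for that week.
fractions = {
    query: [c / t for c, t in zip(counts, total_searches)]
    for query, counts in query_counts.items()
}

# Step 2: correlation filter; keep queries that track the CDC series.
# The 0.9 cutoff is an arbitrary assumption for this sketch.
scores = {query: pearson(f, cdc_ili_rate) for query, f in fractions.items()}
selected = [query for query, r in scores.items() if r > 0.9]

# Step 3: sum the selected queries' normalized frequencies and fit a line.
weeks = range(len(total_searches))
combined = [sum(fractions[q][w] for q in selected) for w in weeks]
slope, intercept = fit_line(combined, cdc_ili_rate)


def estimate_ili(query_fraction):
    """Estimate the ILI rate from the combined normalized query frequency."""
    return slope * query_fraction + intercept
```

With this toy data the filter keeps the two flu-related queries and drops the unrelated one, which is the point of letting correlation choose the terms rather than guessing at them.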
The process of building, tuning, and validating the model took about a year. The results, published in a 2009 Nature paper, showed that data from the new model, called Google Flu Trends, was consistent with CDC data, although Flu Trends could often predict influenza upticks in a given area at least a few days earlier. Because the project was undertaken through Google.org, the search engine giant’s not-for-profit organization, Flu Trends results are completely open access, available at www.google.org/flutrends.
Patel has since moved on to other projects, including strengthening a core search algorithm that seeks to provide better search responses to users’ search engine queries (such as “What do I do if I have the flu?”). And other researchers have since created similar epidemic-tracking models for locations around the world.
Social Networks & Epidemic Forecasting
“I hope that other companies with social network data can think of ways to use it for good,” says Patel. Answering his wish, Lucky Gunasekara, founder of data firm Vulcan, is looking at using social network data for epidemic modeling. Gunasekara explains that in contrast to Google Flu Trends, which relies on large volumes of search engine data to capture incidence, models based on social networks could “look at the actual drivers and pathways of an epidemic.”
“If you know the topology of a network, essentially its structure, then you would just have to survey certain people—your canaries in the coal mine.” These people would allow you to predict who the flu might hit, based on their social interactions. The key, says Gunasekara, is identifying the right people and the right data—“it’s a very bad idea to collect all data from everyone and assume that because we have so much, we’ll be able to do something useful with it.”
“Say there is a potential bioterrorism alert in New York City. Would you need every person to check into emergency rooms to find out who is actually sick, inciting mass panic along the way? Or would it be better to figure out models of the epidemic within our social networks and to directly message or call a sample of potentially infected people to ask, ‘Are you feeling sick?’,” says Gunasekara.
A Tall Order
There are still large challenges to overcome before social network data can be successfully used for wide-scale epidemic forecasting, says Gunasekara, not the least of which is ensuring that forecasting doesn’t spill over into profiling territory.
“Technology should enable personal agency, not take it away.” It’s also necessary to distinguish between data that indicates online interaction (like commenting on a friend’s status) and data that could reasonably indicate real-life, person-to-person interaction (such as photo-tagging). “It’s also really hard to incentivize people to provide you with their personal data for something with such a high social stigma as being sick,” says Gunasekara.
But Gunasekara and other data scientists are not discouraged but rather energized by the many challenges facing them. “Nicholas Christakis at Harvard has already started to put together a successful epidemic surveillance model using key individuals within social networks to identify the emergence of a new epidemic,” he says.
The Key Challenge
Christakis’s model emerged from a campus-wide flu study conducted among Harvard undergraduates and published in PLOS One in 2010. The findings could apply, as Dr. Christakis told the TED community, not just to the early detection of seasonal flu but also to “viral” memes and fads in both the real and online worlds.
“People like Nicholas and myself are trying to build thoughtfully designed and scientifically rigorous social experiments and models that, yes, could lay the basis for massively beneficial public services and platforms. Scale though doesn’t equal scientific quality,” says Gunasekara.
“The key challenge in front of us is better understanding the science behind the dynamics of these epidemics and then translating those findings into the design of new services and products that we can all collectively benefit from through one common shared experience.”
This story originally appeared in the Winter 2013 issue of The New York Academy of Sciences Magazine.