We need a strategic stockpile of infectious disease data


The U.S. needs a new strategic stockpile … of data. Our economic and national security depend on it.

Now that we are on the path to containing COVID-19, we need to ensure that such a crisis never happens again. Since we can’t prevent new diseases from emerging, we have to learn to manage them better. Better data means better situational awareness, better models, better decisions, quicker response, a safer population, better health outcomes and a strong and resilient economy.

As a scientist, I’ve spent the last fifteen years developing models of epidemics. Before COVID-19, the models I developed were relatively limited in scope. They were intended to apply to a small population or geographic area (e.g., the 2014–2016 Ebola outbreak in West Africa). The COVID-19 pandemic was the first time I worked on models at a truly large scale.

This work showed me the serious flaws in our data collection systems at the local, regional and national levels. Although the Centers for Disease Control and Prevention did and does provide data about the state of the epidemic, those data were not fine-grained, and at the height of the epidemic they were often out of date. News media such as The New York Times and The Atlantic, and universities such as Johns Hopkins, stepped in to fill the gap with their personnel, expertise and dashboards.

But managing the data needed to coordinate a national response to epidemics is not the job of the media or of academic researchers. The U.S. needs a centralized repository for collecting and cleaning data and disseminating it in real time to relevant stakeholders, including decision-makers, doctors, scientists and the public. An important role for the new National Center for Outbreak Analytics is the collection of high-resolution spatial and temporal information about pathogen-specific testing, cases and outcomes, including demographic information about patients, such as age and race.

Our understanding of disease outbreaks like COVID-19 can also be considerably improved by joining data about transmission (i.e., cases, hospitalizations and deaths) with other data streams. During COVID-19, anonymized mobility data was used to understand how the population moves throughout the country while still respecting individual privacy. But modelers were slow to make use of this information because they had to negotiate access separately with different technology companies. Other information that needs to be collected and curated in real time includes:

  • Information about the effects of pharmacological mitigation efforts such as vaccination, of the kind the Covid Act Now Coalition currently distributes;
  • Information about public policies meant to prevent transmission. In the case of COVID-19, these included gathering bans, face mask mandates, school and business closures, and shelter-in-place orders; and
  • Information from digital surveillance systems such as ProMED and HealthMap.

Real-time data can also feed early warning systems for disease outbreaks. Such systems are still in their infancy, but there is tremendous scope for future development if the right data streams are put in place.

However, collecting data in real time is not enough. Just as the strategic national stockpile of pharmaceuticals and medical supplies such as ventilators exists to save lives in a national medical emergency, a strategic data stockpile would save lives with information. But data collection must begin well before the emergency.

There is no excuse for not having a wide range of data sources archived, standardized and cataloged to be placed in service when needed. Stockpiled data might include:

  • Geographic patterns. Where is the global population located? Where are the roads, rivers and railways? Where are the hospitals and clinics, farms, factories and forests? Because the geographic features of our world are constantly changing, geographic data needs to be kept up to date.
  • Behavioral data. One of the greatest challenges to forecasting infectious diseases is the highly unpredictable nature of human behavior. But even if human attitudes (like vaccine hesitancy) and actions (like wearing a face mask) are variable, they are not inscrutable. Health psychology and health behavior form an entire academic discipline, but its data and knowledge are not integrated into our understanding of epidemics.
  • Mobility and transportation patterns. Infectious diseases spread when people move around. Spatial movement is a classic example of a multi-scale problem. People move and interact micro-locally within households, schools and workplaces; locally within neighborhoods; regionally between counties and states; and internationally. Each of these scales is relevant to different aspects of outbreak containment.
  • Environmental data. Many diseases are strongly affected by environmental conditions. Pathogens like West Nile virus, Chikungunya virus and Zika virus are transmitted by mosquitoes. Lyme disease is transmitted by ticks. Mosquitoes, ticks and other arthropod vectors are acutely sensitive to their environments, including atmospheric conditions, moisture, vegetation, habitats and the presence of other animals.
  • Comparative information. By definition, when a new disease emerges there is no information about it. Data from prior outbreaks of closely related pathogens can be very useful for anticipating how an outbreak of a novel pathogen will progress. The tendency toward superspreading in the first SARS coronavirus outbreak was one of the first clues that superspreading would be an issue with COVID-19.
  • Post-event analyses. After-action reports, post facto statistical analyses and narratives about near misses all provide information about what worked and what didn’t. We can only learn from the past if we document it and curate that documentation so it is accessible.

Before the pandemic, the United States had no shortage of pandemic-preparedness plans. What was missing when SARS-CoV-2 began to spread was the acknowledgment that data don’t simply “exist.” Data have to be created and interpreted, and that takes sustained investment. Public health should be a nonpartisan issue, and funding must be continuous to ensure that an infectious disease strategic data stockpile is always up to date.

Epidemics grow exponentially when left unchecked, which is why early recognition and intervention are key to ensuring that outbreaks don’t become epidemics.

The COVID-19 pandemic is not yet over, and it is only a matter of time before the next pandemic occurs. COVID-19 has shown that building an operational strategic data stockpile must be a priority for the United States Congress before the next novel pathogen emerges.

John M. Drake is a professor and director of the Center for the Ecology of Infectious Diseases at the University of Georgia. 
