Coronavirus: It’s time to get real about the misleading data

There is no doubt about it: The numbers are just not right. Whether it is diagnosed cases, deaths or projections, much of the data you see about the coronavirus is misleading. The only question is how far off the numbers really are.

How the data on coronavirus is presented and discussed is a serious problem, as efforts to contain the pandemic — and support for those efforts — are based on the math of transmission. Far-reaching public policy decisions are made based not only on data but on public demands as shaped by the data presented to the public. The public needs to know the truth to have confidence in those policy decisions — even if the truth means that there is a great deal of uncertainty which will not be resolved for some time.

Fundamentally, the problem is that people — and the news media — don’t like uncertainty and demand exact answers even when there are none to be had. The result too often is bad numbers, bad reporting and a disillusioned public when the numbers turn out to be off-base.

Let’s tackle four of the current data and analysis problems.

The number of actual cases is far higher than the number of ‘confirmed cases’

First off, the “confirmed cases” statistic ubiquitously reported every day is just that: only the cases that have been definitively confirmed by testing. The vast majority of people have not been tested and likely will not be. That includes people who contracted the virus and had mild or few symptoms or have recovered without hospitalization. For weeks now, for the most part, only the most severe cases have been tested, which means people who had or have the disease are undercounted, if counted at all. That gap is a big deal, as those with mild cases can still transmit the virus (even before they develop any symptoms).

So how many have likely been infected in the United States?

Estimates range from a few hundred thousand up to just under 2 million with a median of 362,000 as of March 26.

In a Reuters poll, 2.3 percent of respondents claimed they had been diagnosed as having the virus. Extrapolated to the adult population, that is 4.8 million adults. Now, that figure reflects people at least in part making a self-diagnosis, but the experts in behavior and statistics who spoke to Reuters consider the survey results a much better approximation of the true level of contagion than the “confirmed cases” statistic.
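The arithmetic behind that kind of extrapolation can be sketched in a few lines of Python. The 2.3 percent share and the 4.8 million figure come from the poll as cited above; the adult-population base is not stated here, so this sketch backs it out from those two numbers rather than assuming one. The function names are illustrative only.

```python
def extrapolate(share: float, base_population: float) -> float:
    """Scale a survey share up to a head count in the base population."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return share * base_population


def implied_base(share: float, head_count: float) -> float:
    """Back out the population base that a reported share and head count imply."""
    if share <= 0.0:
        raise ValueError("share must be positive")
    return head_count / share


POLL_SHARE = 0.023           # 2.3 percent of respondents, as cited
POLL_HEAD_COUNT = 4_800_000  # 4.8 million adults, as cited

# The base is derived, not sourced: it is whatever makes the two cited numbers agree.
base = implied_base(POLL_SHARE, POLL_HEAD_COUNT)
round_trip = extrapolate(POLL_SHARE, base)

print(f"Implied adult base: {base / 1e6:.1f} million")
print(f"Round trip: {round_trip / 1e6:.1f} million adults")
```

The implied base works out to roughly 209 million adults. A small change in the survey share moves the head count by millions, which is one reason a figure like this is better read as an approximation than as a count.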

Deaths are also likely underestimated. Since most deaths are linked to additional health conditions, that leaves multiple options for official cause of death. Further, overwhelmed hospitals set a priority on care, not on data collection. That does not mean deaths are 10 times higher, but the number of actual deaths from the virus is likely higher than the number of reported “confirmed” deaths.

Bottom line: It’s worse than the numbers suggest.

It’s not just the United States. One study estimates that infections in China in January were over six times higher than the official reports. Which brings us to the second point…

China is lying

On March 10, Chinese President Xi Jinping made a triumphant victory lap at the origination point of the pandemic. China started reporting no new local transmissions soon after the Xi visit. Does anyone really believe this?

The timing is too tidy, and there is mounting reason to be skeptical.

In a closed society with a pervasive surveillance state, getting at the truth is a major challenge, but media sources from Hong Kong and Japan as well as The Guardian have pieced together enough independent sources to cast serious doubt on China’s claims. International media sources not under the thumb of the government indicate a much higher number of cases and continued local transmission, with the official figures held down by refusing to test even symptomatic patients, manipulating data or outright undercounting.

Again, the same holds true for reported deaths.

Given the Chinese government’s refusal to allow entry and access to outside observers, its expulsion of journalists from America’s top news outlets — the New York Times, the Wall Street Journal and the Washington Post — and its utter lack of external independent accountability, we will likely never know the true toll of the virus on China.

The only thing we can safely conclude is that China’s numbers are fraudulent.

Statistical model results are badly misunderstood

Statistical models are not magic boxes — they simply provide predictions based on currently known data and historical evidence. Like all statistical techniques, the models are dependent on the data. There’s an old saying in computer science: Garbage In, Garbage Out. If you have poor data, your results will be unreliable no matter how sophisticated your statistical modeling is.

What does this have to do with coronavirus? Due to the uncertainty of the data we have (true number of infections, true mortality rate) and the myriad factors that will determine the ultimate toll on human health and the economy (extent of compliance with social distancing, ability of the health system to keep up, availability of medical supplies, etc.), any model is going to have a high degree of uncertainty in its conclusions.

Consider the University of Washington model. The headline figure of 81,000 predicted deaths is just the middle value, or most likely outcome, from a range of 38,000 to 162,000 deaths. What researchers have (appropriately) done is run the model multiple times, entering a series of different assumptions. The better communities do at reducing transmissibility and providing care, the lower the predicted deaths, but we don’t know how well different communities will actually do. So, there’s a range.
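To see why running a model under different assumptions produces a range rather than a single number, consider a deliberately simple Python sketch. This is not the University of Washington model; it is a toy formula (population times share infected times share of infected who die), and every number in it is hypothetical and chosen only for illustration.

```python
from statistics import median


def projected_deaths(population: float, attack_rate: float, fatality_rate: float) -> float:
    """Toy projection: people infected multiplied by the share of them who die."""
    return population * attack_rate * fatality_rate


POPULATION = 1_000_000  # hypothetical community size

# Hypothetical scenarios: better distancing lowers the attack rate,
# better access to care lowers the fatality rate.
scenarios = [
    {"attack_rate": 0.05, "fatality_rate": 0.005},  # strong distancing, good care
    {"attack_rate": 0.10, "fatality_rate": 0.010},  # middling on both
    {"attack_rate": 0.20, "fatality_rate": 0.010},  # weak distancing
]

results = sorted(projected_deaths(POPULATION, **s) for s in scenarios)
low, mid, high = results[0], median(results), results[-1]

print(f"Projected deaths: {mid:,.0f} (range {low:,.0f} to {high:,.0f})")
```

Reporting only the middle value hides the fact that the high scenario here is eight times the low one. The spread comes entirely from the assumptions, which is the point the headline number leaves out.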

The university is predicting the pandemic will subside in June, but that is also part of a range and based on specific assumptions and their estimation of current data.

The news media is making a critical mistake in reporting exact numbers and predicted dates instead of the range of possibilities.

Any specific prediction is dependent on a series of equally specific assumptions and estimations of what the true extent of infection might be. The models would be better if China provided factual information on its experience, but that is not the case and will not be.

Bottom line: Both the media and the public should accept the uncertainty inherent in the ranges of predictions in these models.

Hold the praise for outliers

Germany got quite a lot of praise for how the country has handled the outbreak based on a single statistic: its low mortality rate. High levels of testing, a superior health care system, and better preparation supposedly led to Germany’s stunning success.

It sure is nice to be number one — too bad it didn’t last.

In just five days, from March 26 to March 31, Germany saw its mortality rate rise to 1.08 percent, putting it behind several developed countries including Norway (0.84 percent), Israel (0.44 percent), Czechia (0.94 percent), Australia (0.42 percent), and even tiny Croatia (0.69 percent). These are countries that fall behind Germany in almost all the measures used by Politico to “explain” German brilliance. The German rate has since continued to rise.

That Germany’s numbers would become more “normal” should not be surprising. When you consider the best analogs for Germany, the adjacent wealthy Germanic and Scandinavian countries (Austria, Denmark, Netherlands, Sweden, Switzerland), the nation does not stand out, at least not in a positive way. Germany’s population is older than that of any other country in Europe (age is a strong predictor of coronavirus mortality). Its health care system’s general ranking is lower than all of those countries’ but Sweden’s.

It is a serious mistake to take an outlier like Germany and automatically assume that it should be the norm. This is a typical problem amongst reporters who understand little about statistics. Outliers are called “outliers” for a reason. They usually denote some set of unusual, non-replicable circumstances, data errors or, in this case, an arbitrary point in time where the data is incomplete.
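One way a mortality rate can look exceptional at an arbitrary point in time is that deaths lag behind diagnoses, so an early snapshot divides few deaths by many recent cases. The Python sketch below illustrates that effect with entirely hypothetical cumulative counts; it does not use Germany’s or any other country’s actual data.

```python
def naive_fatality_rate(cumulative_deaths: int, cumulative_cases: int) -> float:
    """Deaths to date divided by confirmed cases to date."""
    if cumulative_cases <= 0:
        raise ValueError("cumulative_cases must be positive")
    return cumulative_deaths / cumulative_cases


# Hypothetical snapshots: (label, cumulative confirmed cases, cumulative deaths).
# Cases grow first; deaths catch up later.
snapshots = [
    ("day 10", 10_000, 40),
    ("day 20", 40_000, 320),
    ("day 30", 60_000, 900),
]

rates = [(label, naive_fatality_rate(deaths, cases)) for label, cases, deaths in snapshots]

for label, rate in rates:
    print(f"{label}: {rate:.2%}")
```

In this made-up series the rate climbs at each snapshot even though nothing about the underlying disease changes. A ranking built on the earliest snapshot would reward whichever country happened to be earliest in its outbreak.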

The bottom line: When a particular statistic stands out, that number bears extra scrutiny and skepticism. Don’t automatically assume there is some kind of secret genius-level thinking happening.


There is still a great deal we don’t know about the current state of the coronavirus pandemic, much less the future course of it. But there are three conclusions we can draw right now:

1) The number of infected people in both the U.S. and the world is much, much higher than the “confirmed cases” statistic;

2) All predictions about the future course of the pandemic are highly uncertain and are ultimately dependent on how vigilant communities and individuals are;

3) No country had or has a secret magic solution.

As repetitive and trite as it sounds, low or no social contact, proper hygiene and personal responsibility remain the best practices for everyone.

Keith Naughton, Ph.D., co-founder of Silent Majority Strategies, is a public affairs consultant who specialized in Pennsylvania judicial elections. Follow him on Twitter @KNaughton711
