Deleted NIH data fuels further debate on COVID-19 origins

A scientist who says he uncovered data on the genetic sequence of the SARS-CoV-2 virus that had been deleted from a National Institutes of Health (NIH) database has thrown more fuel on the debate over the virus's origins. 

Jesse Bloom, a principal researcher at the Fred Hutchinson Cancer Research Center, wrote in a preprint paper that he recovered a data set containing SARS-CoV-2 sequences from early in the Wuhan epidemic that had been deleted from the NIH’s Sequence Read Archive (SRA). 

The paper was posted to the bioRxiv server, an open repository for papers that have not yet been peer-reviewed or published. 

According to Bloom, the data show that the virus was circulating in the Chinese city of Wuhan before the December 2019 outbreak of COVID-19 that was linked to a "wet market" selling live animals.

On Twitter, Bloom noted that the "fact this dataset was deleted should make us skeptical that all other relevant early Wuhan sequences have been shared."

The early samples that have been the focus of most studies, including the joint World Health Organization-China report, are not fully representative of the viruses actually present in Wuhan at that time, Bloom concluded.

The SRA is where scientists publish deep sequencing data for others to analyze. 

The data were first published in a preprint in March 2020 and then in the journal Small in June 2020. According to the NIH, the submitter who originally posted the sequences to the SRA requested that they be withdrawn.

In a statement, the NIH said the requestor wanted the data removed from the SRA and indicated they were being submitted to another database. Submitting investigators hold the rights to their data and can request that the data be withdrawn, the agency said.

“These SARS-CoV-2 sequences were submitted for posting in SRA in March 2020 and subsequently requested to be withdrawn by the submitting investigator in June 2020. The requestor indicated the sequence information had been updated, was being submitted to another database, and wanted the data removed from SRA to avoid version control issues,” the NIH said.

NIH said staff "can’t speculate on motive beyond a submitter’s stated intentions."

In his paper, Bloom said "there is no plausible scientific reason for the deletion. ... There are no corrections to the paper, the paper states human subjects approval was obtained, and the sequencing shows no evidence of plasmid or sample-to-sample contamination. It therefore seems likely the sequences were deleted to obscure their existence."

Bloom has been one of the leading outside proponents of the need to further investigate the origins of the coronavirus. 

He was a co-author of a letter to the journal Science, signed by a group of 17 leading scientists, calling for further investigation of the “lab leak” hypothesis.

The scientists noted that they were not advocating for one possibility over another but that the lack of evidence called into question any existing hypothesis. 

Bloom's paper doesn't draw conclusions on whether the virus escaped from a lab or jumped naturally from animals to humans, but he noted that it raises questions about the openness of Chinese researchers and about whether scientists studying the origins of the virus have access to all relevant information.

"These data do not provide specific additional support for either a zoonosis or a lab accident, they could be consistent with both," Bloom said in a brief email to The Hill. "What they do suggest is that we may have an incomplete picture of early Wuhan viruses, and that it is potentially possible to obtain more sequence data."