Last week, the Wall Street Journal reported on Project Nightingale — a controversial initiative spearheaded by Google’s parent company Alphabet and its health-care cloud-computing customer Ascension Health.
Millions of patient records had been copied to Google servers and were viewable by hundreds of Alphabet employees. This was done without the knowledge of the patients or their physicians.
While both Ascension and Google claim they abided by the law, Project Nightingale has raised serious concerns and sparked federal inquiry.
Why would Alphabet and Ascension engage in a deal with such regulatory risk? Health data has enormous value. The data Ascension turned over to Alphabet may be worth billions of dollars, if prior health-care data transactions are any indication. This value is rooted in medical data’s ability to drive innovation and fuel artificial intelligence-assisted medicine.
The risks and rewards of health data privacy
Consider the following: A few years ago, a good friend of mine suffered an unexpected tendon rupture. Jesse (not his real name) was young and fit at the time of the incident. He figured he’d chalk the rupture up to bad luck.
Then he realized he’d been taking the oft-prescribed antibiotic ciprofloxacin (Cipro). Used to treat a range of infections, Cipro launched in 1987 and became a multi-billion-dollar business by 2003. We now know the antibiotic has a number of rare but serious side effects.
But because the data was hard to obtain, these side effects remained hidden for years. In 2008, with clear evidence of increased risk of tendon rupture, Cipro was slapped with a black-box warning by the FDA. Additional black-box warnings were issued for irreversible peripheral neuropathy in 2016, and for aortic rupture in 2018. The maker of this wonder-drug is still facing litigation.
Disturbingly, lack of data means we remain largely in the dark. How can we address the data gap — to get better therapies faster and cheaper, identify risks and benefits sooner, and personalize therapies?
An ethical approach to health data
There is certainly a case to be made for data privacy. Far too often, we learn our personal data has been jeopardized. Data breaches or misuse by tech giants, major retailers, financial institutions, and credit agencies have affected nearly everyone in the modern world.
In health care, the lack of data privacy is arguably worse. Chances are that some of your health data is used, bought and sold daily without your knowledge. And the tens of billions of dollars generated do not flow back into primary health care, although the purveyors will argue that data-sharing does create value.
While stringent privacy rules may sound appealing, they too can be dangerous. Increased regulation could amplify the problem and further enable the gray market. HIPAA de-identification safe harbor provisions permit organizations to freely aggregate, transfer, and even sell data when specific identifiers are removed from a record — without having to comply with HIPAA.
Yet, data science demonstrates that just a handful of data elements can be used to re-identify an individual with remarkable precision. The emergence of machine learning in health applications has made de-identification highly ineffective.
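To make the re-identification point concrete, here is a minimal sketch using made-up records. The field names (`zip3`, `birth_year`, `sex`) and the helper `unique_fraction` are illustrative assumptions, not drawn from any real dataset or from HIPAA itself. It simply counts how many records become one of a kind once a few innocuous attributes are combined.

```python
from collections import Counter

# Toy, entirely fictional records with names and other direct identifiers removed.
records = [
    {"zip3": "606", "birth_year": 1984, "sex": "M", "diagnosis": "tendon rupture"},
    {"zip3": "606", "birth_year": 1984, "sex": "F", "diagnosis": "asthma"},
    {"zip3": "606", "birth_year": 1990, "sex": "M", "diagnosis": "fracture"},
    {"zip3": "602", "birth_year": 1984, "sex": "M", "diagnosis": "diabetes"},
    {"zip3": "606", "birth_year": 1990, "sex": "M", "diagnosis": "migraine"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` appears exactly once."""
    keys = [tuple(row[f] for f in fields) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


# One attribute alone singles out few people; three together single out most.
one_field = unique_fraction(records, ["zip3"])
three_fields = unique_fraction(records, ["zip3", "birth_year", "sex"])
print(f"unique on zip3 only: {one_field:.0%}")
print(f"unique on zip3 + birth year + sex: {three_fields:.0%}")
```

Anyone holding an outside list that links those same attributes to names could match each one-of-a-kind record back to a person, which is why removing direct identifiers alone offers limited protection.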
In health care, privacy could kill you
Privacy is the bedrock of trust and the key to unlocking the true promise of data.
Data drives research and innovation; it lays the foundation for AI-assisted medicine.
For deep learning in particular, large volumes of data are critical. Medical knowledge was largely anecdotal in decades past, but today it is increasingly a numbers game. Electronic data and the submission of research findings to Data Commons like the National Cancer Institute Genomic Data Commons are paramount. Yet the standards for merging this data are spotty.
That said, imagine a future where hundreds of millions of deep medical records can be analyzed by the most advanced machine learning algorithms. This could help to identify new trends and optimal treatments not just for populations, but for individuals.
Will taking statins improve your personal outcome when undergoing radiation treatment for prostate or breast cancer, as some studies suggest? Will taking Cipro be risky, given your genetic and medical history? Analysis of massive data will readily answer questions such as these.
A Hippocratic Oath for health data
In short, if we are to reach the full potential of healthcare data, we cannot go down the path of exploitation. We must develop an ethical framework for the use and sharing of data centered on four pillars:
To preserve privacy, no individual data — even “de-identified” data — should be transferred without full audit and perpetual provenance. Those who say this is impossible are either not technologists, or they are in the business of profiting from patient data.
Patient consent should be fully revocable at any time. My experience, and that of my colleagues in clinical practice, is that most patients want their data used because it benefits them and society. That said, the patient’s wishes should be honored, with the option to change the conditions of use at any point.
As in other domains, data proceeds should flow to the data owner/custodian. It may not be practical to return pennies to individuals at this time. For now, we might consider data as a revenue stream for the healthcare centers that bear the unfunded mandate of curating and storing the data. Returns to health systems will improve patient care and the services not-for-profit hospitals can offer.
Use data at the source. The highest-value and most current patient data should come directly from the source — that is, generally from the hospitals where episodic treatment and testing are done. Increasingly, however, the data should reside at the edge, with the patient.
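As a rough illustration of how the first two pillars might fit together, the sketch below records every request to use a patient's data in a tamper-evident log and honors consent that can be withdrawn at any time. The `ConsentLedger` class, its method names, and the sample identifiers are hypothetical, invented for this example. A real system would also need secure storage, authentication, and much more.

```python
import hashlib
import json


class ConsentLedger:
    """Hypothetical hash-chained audit log with revocable patient consent."""

    GENESIS = "0" * 64  # placeholder hash that precedes the first entry

    def __init__(self):
        self.entries = []   # append-only list of audit entries
        self.consent = {}   # patient_id -> True (granted) or False (revoked)

    def _digest(self, prev_hash, event):
        payload = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def _append(self, event):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"event": event, "prev": prev_hash,
                 "hash": self._digest(prev_hash, event)}
        self.entries.append(entry)

    def grant(self, patient_id):
        self.consent[patient_id] = True
        self._append({"type": "grant", "patient": patient_id})

    def revoke(self, patient_id):
        self.consent[patient_id] = False
        self._append({"type": "revoke", "patient": patient_id})

    def request_use(self, patient_id, requester, purpose):
        """Log the request either way; allow it only if consent is currently granted."""
        allowed = self.consent.get(patient_id, False)
        self._append({"type": "use", "patient": patient_id, "requester": requester,
                      "purpose": purpose, "allowed": allowed})
        return allowed

    def verify(self):
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["event"]):
                return False
            prev_hash = entry["hash"]
        return True


ledger = ConsentLedger()
ledger.grant("patient-001")
first = ledger.request_use("patient-001", "research-lab", "drug safety study")
ledger.revoke("patient-001")
second = ledger.request_use("patient-001", "research-lab", "drug safety study")
print(first, second, ledger.verify())
```

Because each entry's hash depends on the one before it, quietly altering or deleting a past record changes every later hash, so the history of who asked for what, and whether the patient had agreed at the time, stays checkable.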
In this view of the future, patient data isn’t copied or exposed. It isn’t exploited by the gray market. Rather, it retains maximum value for discovery, innovation, and care.
The technology needed to implement this process requires further development, and those who profit from exploitation are powerful. However, the advantages of finding a better path forward are well worth the effort.
Piers Nash, MBA, Ph.D., is a cancer biologist who was a University of Chicago professor involved in genome project analysis for dozens of species and tens of thousands of human genomes. He was a director in the Center for Data Intensive Science, working on the architecture and deployment of the National Cancer Institute’s Genomic Data Commons, which is now the nation’s core system for harmonized genomic and clinical data for cancer research. Nash is the CEO of Sympatic Inc.