What we're missing in the CCPA de-identification debate

The California Consumer Privacy Act (CCPA) is predicted to reshape how technology companies operate across the United States.

Unsurprisingly, it is under constant attack from both businesses and lawyers, despite ‘eloquent’ calls from big tech companies in favor of comprehensive privacy laws. As usual, it is relatively easy to build consensus around high-level, pro-user principles.

But the devil is in the details.


When it comes to making clear decisions about what to protect, the temptation is to throw the baby out with the bathwater. A piece of legislation really comes alive once it has been interpreted by judges, who have the chance to be informed by experts. If those experts included the first generation of legal engineers, the soon-to-be-born CCPA could probably be saved.

The CCPA’s de-identification provisions are widely misunderstood. They identify the types of data that can be excluded from the CCPA’s scope or that should carry a much lighter compliance burden.

As a result, companies have an incentive to de-identify their data, both to reduce their obligations and to protect consumers. Yet by focusing on legalese and apparent inconsistencies of language between sections, lawyers often fail to see that de-identification is in fact a spectrum. Businesses using de-identified data should not be given ‘carte blanche.’

Admittedly, the CCPA’s drafting can appear relatively clumsy at first glance. What’s more, it does not precisely identify the methods to be used to achieve de-identification, although it does list a series of key controls.

Some lack of clarity was to be expected. Regulators in different parts of the world have struggled to capture what the core components of an effective de-identification process look like and what its actual effect should be. Just look at the six-year debate about ‘pseudonymization vs. anonymization’ in the wake of the adoption of the General Data Protection Regulation (GDPR) in Europe.

The main source of confusion


Confusion mainly stems from the fact that, while the CCPA provides a clear exclusion for “publicly available” data, it is more ambiguous about de-identified or aggregate consumer information (compare section 1798.140(o)(2) with section 1798.145(a)(5)). Nevertheless, and this should be underlined twice, the intent seems to be that the collection, use, retention, sale, or disclosure of de-identified information should not be restricted. Because de-identification is a spectrum, however, de-identification techniques should be coupled with other technical and organizational measures or safeguards, such as access control, auditing, and obligations not to re-identify.
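As a rough illustration of coupling de-identified data with organizational safeguards, the sketch below gates a de-identified dataset behind a role-and-purpose check, requires the requester to have accepted an obligation not to re-identify, and records every request, allowed or denied, in an audit log. The `DeidentifiedStore` class, the roles, and the purposes are hypothetical; a real organization would rely on its own identity, policy, and logging infrastructure rather than an in-memory list.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DeidentifiedStore:
    """Hypothetical gateway to a de-identified dataset, adding access control and auditing."""
    rows: list[dict]
    policy: dict[str, set[str]]  # role -> purposes that role may use the data for
    audit_log: list[dict] = field(default_factory=list)

    def read(self, user: str, role: str, purpose: str, accepted_no_reid: bool) -> list[dict]:
        """Return the rows only if the role/purpose pair is allowed and the user has
        accepted an obligation not to re-identify; log the request either way."""
        allowed = accepted_no_reid and purpose in self.policy.get(role, set())
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "purpose": purpose,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} ({role}) may not use this data for {purpose!r}")
        return list(self.rows)


store = DeidentifiedStore(
    rows=[{"age": "30-39", "zip": "941**"}],
    policy={"analyst": {"aggregate_reporting"}},
)
store.read("alice", "analyst", "aggregate_reporting", accepted_no_reid=True)
try:
    store.read("bob", "marketing", "ad_targeting", accepted_no_reid=True)
except PermissionError as err:
    print(err)
```

Logging denied requests as well as granted ones is a deliberate choice here: an audit trail is most useful when it also shows attempted uses that fall outside the agreed purposes.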

California legislators seem to be aware of these ambiguities and have attempted to remove them through amendments. For instance, three amendment bills (AB 873, AB 874, and AB 1355, which the California Legislature passed) all aimed to clarify that de-identified data is excluded from the definition of personal information, and thus from the scope of the CCPA.

Legal standards are progressively converging

Despite the difficulty of the drafting task, the good news is that standards seem to be progressively converging and that the CCPA in fact builds on existing recommendations.

The FTC’s 2012 privacy framework recommends three steps for de-identification: a company should (1) take reasonable measures to ensure that the data is de-identified; (2) publicly commit not to try to re-identify the data; and (3) contractually prohibit downstream recipients from trying to re-identify the data.

The CCPA’s own conditions for de-identification, in its 1.0 version, appear to overlap with the FTC’s framework:

  1. Implement technical safeguards that prohibit re-identification.
  2. Implement business processes that specifically prohibit re-identification.
  3. Implement business processes that prevent inadvertent release of de-identified information.
  4. Make no attempts to re-identify the information.
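To see the overlap side by side, the four CCPA 1.0 conditions and the FTC’s three steps can be written down as a simple checklist over the controls an organization has documented for a dataset. This is a minimal sketch, not a legal test: the `DeidControls` fields, the condition labels, and the choice to treat a single technical-safeguards flag as covering the FTC’s “reasonable measures” step are assumptions made for illustration, and whether a given control actually satisfies either framework remains a legal judgment.

```python
from dataclasses import dataclass


@dataclass
class DeidControls:
    """Hypothetical record of controls documented for one de-identified dataset."""
    technical_safeguards: bool = False       # e.g. generalization, access control
    processes_prohibit_reid: bool = False    # internal policy banning re-identification
    processes_prevent_release: bool = False  # release review, leak prevention
    no_reid_attempts: bool = False           # the business itself never tries to re-identify
    public_commitment: bool = False          # public pledge not to re-identify
    downstream_contracts: bool = False       # recipients contractually barred from re-identifying


# CCPA 1.0 conditions, paraphrased; each maps to the field that would evidence it.
CCPA_1_0 = {
    "technical safeguards that prohibit re-identification": "technical_safeguards",
    "business processes that prohibit re-identification": "processes_prohibit_reid",
    "business processes that prevent inadvertent release": "processes_prevent_release",
    "no attempts to re-identify": "no_reid_attempts",
}

# FTC 2012 steps, paraphrased. Mapping "reasonable measures" onto
# technical_safeguards is an assumption of this sketch.
FTC_2012 = {
    "reasonable measures to de-identify": "technical_safeguards",
    "public commitment not to re-identify": "public_commitment",
    "contractual ban on downstream re-identification": "downstream_contracts",
}


def unmet(framework: dict[str, str], controls: DeidControls) -> list[str]:
    """Return the framework's conditions that the documented controls do not cover."""
    return [cond for cond, attr in framework.items() if not getattr(controls, attr)]


controls = DeidControls(technical_safeguards=True, no_reid_attempts=True)
print(unmet(CCPA_1_0, controls))
print(unmet(FTC_2012, controls))
```

The same set of controls leaves different gaps under each framework, which is a small-scale version of why the frameworks overlap without being identical.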

The most recent draft of the new CCPA consumer privacy ballot initiative (“CCPA 2.0”) is even closer to the FTC’s three-prong test.

Best practices are progressively maturing

Best practices are maturing around a key insight: only when organizations combine a variety of controls can regulators justify lifting restrictions on the processing of personal information.

It is not enough to perturb the data to claim effective de-identification.

A successful de-identification strategy, as captured by both the CCPA and the FTC recommendations, combines control of downstream data usage, through contractual obligations and auditing, with a range of techniques applied to the data itself, chosen to strike a better balance between data utility and data perturbation.
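The article does not name specific techniques; generalization checked against k-anonymity is used here only as one familiar example of transforming the data itself. Quasi-identifiers such as age and ZIP code are coarsened until every combination of their values is shared by at least k records. The records, the 10-year age bands, the three-digit ZIP prefix, and k = 2 are arbitrary assumptions, and passing this check does not by itself make data “de-identified” under the CCPA.

```python
from collections import Counter

# Hypothetical records: "age" and "zip" are quasi-identifiers, "diagnosis" is the payload.
records = [
    {"age": 34, "zip": "94110", "diagnosis": "asthma"},
    {"age": 37, "zip": "94117", "diagnosis": "diabetes"},
    {"age": 52, "zip": "94301", "diagnosis": "asthma"},
    {"age": 58, "zip": "94305", "diagnosis": "flu"},
]


def generalize(record: dict, age_band: int = 10, zip_digits: int = 3) -> dict:
    """Coarsen quasi-identifiers: bucket age into bands and mask the ZIP code's tail."""
    low = (record["age"] // age_band) * age_band
    return {
        "age": f"{low}-{low + age_band - 1}",
        "zip": record["zip"][:zip_digits] + "*" * (len(record["zip"]) - zip_digits),
        "diagnosis": record["diagnosis"],
    }


def is_k_anonymous(rows: list[dict], quasi_ids: tuple[str, ...], k: int) -> bool:
    """True if every combination of quasi-identifier values appears at least k times."""
    groups = Counter(tuple(row[q] for q in quasi_ids) for row in rows)
    return all(count >= k for count in groups.values())


QUASI_IDS = ("age", "zip")
raw_ok = is_k_anonymous(records, QUASI_IDS, k=2)
generalized = [generalize(r) for r in records]
gen_ok = is_k_anonymous(generalized, QUASI_IDS, k=2)
print(raw_ok, gen_ok)  # False True
```

Wider bands or shorter ZIP prefixes produce larger groups but blur the data further, which is the utility-versus-perturbation trade-off in miniature. k-anonymity also has known gaps (for example, when everyone in a group shares the same sensitive value), one more reason such techniques are paired with contractual and organizational controls rather than relied on alone.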

The following regulatory trends support this conclusion: 

  • The National Committee on Vital and Health Statistics sent a letter to the U.S. Department of Health and Human Services in 2017 calling for the range of mechanisms found in the HIPAA Safe Harbor method to be reinforced. The Committee called for additional measures, such as “consider[ing] the intended uses or the security and access controls used by recipients of a particular de-identified data set.”
  • The U.S. National Institute of Standards and Technology (NIST) published a full Guide to Protecting the Confidentiality of Personally Identifiable Information in 2010, followed in 2015 by additional de-identification guidance aimed at researchers, government, and industry.
  • The UK Information Commissioner’s Office (ICO) has also suggested a range of controls in its Code of Practice on anonymisation, released in 2012. The Code has become a global point of reference in the field (see, e.g., the guidelines issued by the Information and Privacy Commissioner of Ontario or the guidance produced by the Office of the Australian Information Commissioner) and, in particular, has informed the work of the European Medicines Agency.

Yet this is not how lawyers usually advise businesses. The more common approach is to dwell on the uncertainty rather than identify the range of possible solutions, which ends up discouraging businesses.

The only way to break this vicious circle, and to give privacy regulations a chance to survive the storm, is to call on legal engineers, who can combine legal and technical insights to support responsible decision-making within organizations. These new professionals should be tasked with helping build solutions that embed as many legally relevant safeguards as possible within products, services, and organizational processes.

Sophie Stalla-Bourdillon is a leading expert on the EU GDPR, and Senior Privacy Counsel & Legal Engineer at Immuta, a leading automated data governance platform, where she works on tackling the ethical challenges of AI. She holds a master's degree in English and North-American Business Law from Panthéon-Sorbonne University and an LL.M. from Cornell Law School. Follow her on Twitter @SophieStallaB

Dan Wu is a Privacy Counsel & Legal Engineer at Immuta, a leading automated data governance platform. He holds a J.D. & Ph.D. from Harvard University. His work on legal tech, data ethics, and inclusive urban innovation has been featured in TechCrunch, Harvard Business Review, and Bloomberg. Follow him on Twitter @danwu_danwu