Credit card databases not so anonymous

There is virtually no such thing as an anonymous database of credit card transactions, a study released Friday concludes.

Given just four random, non-personal pieces of information about a shopper's credit card transactions, such as where purchases were made or how much was spent, researchers were able to tie the transactions to the correct shopper 90 percent of the time.

The researchers could then combine that shopper’s record with publicly available information on social media sites to reveal the shopper’s identity — a so-called “correlation attack.”

Yves-Alexandre de Montjoye, a graduate student at the Massachusetts Institute of Technology’s Media Lab, was the lead author on the study, which was published in the journal Science.

“The open sharing of raw data sets is not the future,” he told the journal.

De Montjoye and his team analyzed a database containing 1.1 million people’s credit card transactions in 10,000 shops over a three-month span.

The information had been stripped of names, credit card numbers, store addresses and the time stamp on the transaction. What was left is considered metadata, including purchase amount, the type of shop in which the purchase was made and a code representing each person.

Many companies argue that sharing anonymized metadata is harmless. But De Montjoye’s study shows people’s shopping habits are so distinctive that it is nearly impossible to truly anonymize a credit card transaction data set.
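To make the idea concrete, the kind of uniqueness test described here can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the function name `unicity`, the record layout of (person code, shop type, amount) and the toy data are all assumptions made for the example.

```python
import random
from collections import defaultdict


def unicity(records, k=4, seed=0):
    """Share of shoppers singled out by k of their own transactions.

    records: iterable of (person_code, shop_type, amount) tuples.
    The layout is hypothetical, chosen to mirror the metadata in the article.
    """
    rng = random.Random(seed)

    # Collect each shopper's distinct (shop_type, amount) points.
    points = defaultdict(set)
    for person, shop_type, amount in records:
        points[person].add((shop_type, amount))

    # Only shoppers with at least k points can be tested.
    eligible = [p for p, pts in points.items() if len(pts) >= k]
    if not eligible:
        return 0.0

    singled_out = 0
    for person in eligible:
        # Pretend an outsider learned k random points about this shopper.
        known = set(rng.sample(sorted(points[person]), k))
        # Count every shopper whose record contains all k known points.
        matches = sum(1 for pts in points.values() if known <= pts)
        if matches == 1:
            singled_out += 1
    return singled_out / len(eligible)


# Made-up data: u1 and u2 have identical habits, u3 and u4 do not.
toy = [
    ("u1", "grocery", 12.5), ("u1", "cafe", 4.0), ("u1", "fuel", 40.0), ("u1", "books", 18.0),
    ("u2", "grocery", 12.5), ("u2", "cafe", 4.0), ("u2", "fuel", 40.0), ("u2", "books", 18.0),
    ("u3", "grocery", 30.0), ("u3", "cafe", 3.5), ("u3", "fuel", 55.0), ("u3", "toys", 22.0),
    ("u4", "pharmacy", 9.0), ("u4", "cafe", 6.0), ("u4", "fuel", 35.0), ("u4", "books", 11.0),
]
print(unicity(toy))  # 0.5: only u3 and u4 can be singled out
```

In this toy set, two of the four shoppers are picked out by four known purchases. The study's point is that in real data the share is far higher.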

Other studies have revealed similar findings in other types of datasets. Researchers have re-identified Netflix users from an anonymous database of customers' viewing histories and medical patients from anonymous hospital data released by Washington state.

But many industry groups are pushing back against what they believe are overstated fears. Large databases are key to improving people's lives, they argue, from helping companies run more efficiently to aiding doctors in diagnosing diseases.

De Montjoye didn’t downplay the benefits of big data. He told Science that one possible solution could be to keep sensitive databases in cloud-based storage, protected by strong “gatekeeper” software.

The software would prevent researchers from accessing individual files, while still allowing them to obtain overall statistics from the database.
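A rough sketch of what such a gatekeeper might look like follows. The article does not say how the software would work, so everything here is an assumption made for illustration: the class name `Gatekeeper`, the record fields and the rule that a query must cover at least five people before an answer is released.

```python
from statistics import mean


class Gatekeeper:
    """Answers aggregate questions but never hands back individual rows.

    Hypothetical design: a query is refused when it covers fewer than
    min_group_size distinct people.
    """

    def __init__(self, records, min_group_size=5):
        self._records = list(records)  # kept private to the gatekeeper
        self._min_group_size = min_group_size

    def average_amount(self, shop_type):
        rows = [r for r in self._records if r["shop_type"] == shop_type]
        people = {r["person"] for r in rows}
        # Refuse queries narrow enough to describe only a few shoppers.
        if len(people) < self._min_group_size:
            raise PermissionError("query covers too few people")
        return mean(r["amount"] for r in rows)


# Made-up data: five grocery shoppers, one jewelry shopper.
records = [
    {"person": f"p{i}", "shop_type": "grocery", "amount": 10 * (i + 1)}
    for i in range(5)
]
records.append({"person": "p0", "shop_type": "jeweler", "amount": 900})

gatekeeper = Gatekeeper(records)
print(gatekeeper.average_amount("grocery"))  # 30
```

Asking the same gatekeeper for the average jewelry purchase raises an error, because only one person made one. A size threshold like this is just one possible safeguard, and a real system would likely need more than that.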

Lawmakers have been grappling with similar issues. Congress has been trying to move a bill that would enable companies to share cybersecurity information, stripped of personally identifiable information, with the government.

Privacy advocates have argued that such "anonymization" techniques are not necessarily sufficient to protect individuals' identities.