33 Bits of Entropy: Myths and Fallacies of "Personally Identifiable Information"
Abstract
Data is the currency of the digital economy, but growing data collection by companies and sharing with third parties threatens privacy. “Anonymization” is the usual answer to privacy concerns, typically implemented by removing “personally identifiable information” (PII). Sweeney’s work on reidentification of Massachusetts hospital records showed that naive deidentification via PII removal can be reversed [3]. That led to a cat-and-mouse game between deidentification and reidentification, with standards such as HIPAA mandating removal of a more comprehensive set of attributes. In parallel, researchers developed data transformations that enable specific categories of computations in a mathematically rigorous, privacy-preserving way; “differential privacy” sidesteps the need for anonymization altogether [1]. However, the anonymization paradigm is extremely popular because it is convenient and avoids the need to circumscribe allowed computations in advance. Several theoretical and practical questions remained open. Given the increasingly easy availability of public “auxiliary information” about individuals (e.g., from social media), is it possible to provide any technical privacy guarantees via anonymization while maintaining data utility? How identifiable are people’s footprints in the rich “longitudinal” databases that are common today? Can we characterize which types of data can lead to reidentification, thus salvaging the notion of “Personally Identifiable Information”? Finally, how many bits of uncorrelated information (“entropy”) are required to reidentify individuals in large datasets? This paper references the recent work, Myths and Fallacies of “Personally Identifiable Information” [2].

Body: Anonymization of rich consumer data is infeasible: people are unique, and any piece of data can help reidentify. 33 bits of entropy will do.
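
The "33 bits" figure can be checked with a short calculation: singling out one person among N requires log2(N) bits of information. The sketch below is illustrative only. The world-population constant and the attribute frequencies are assumptions chosen for the example, not figures from the paper, and summing bits across attributes is only valid if the attributes are uncorrelated, as the abstract stipulates.

```python
import math

# Assumption: a rough world-population figure, not taken from the paper.
WORLD_POPULATION = 7_000_000_000


def bits_to_identify(population: int) -> float:
    """Bits of entropy needed to single out one individual among `population`."""
    return math.log2(population)


def surprisal_bits(fraction: float) -> float:
    """Bits revealed by learning an attribute shared by `fraction` of people."""
    return -math.log2(fraction)


# Hypothetical attribute frequencies, assumed uniform and mutually independent.
attributes = {
    "gender": 1 / 2,
    "birthday (day of year)": 1 / 365,
    "city of 100,000 people": 100_000 / WORLD_POPULATION,
}

needed = bits_to_identify(WORLD_POPULATION)
# Independent attributes contribute additively to the total information.
revealed = sum(surprisal_bits(f) for f in attributes.values())

print(f"bits needed to identify one person: {needed:.1f}")
for name, fraction in attributes.items():
    print(f"  {name}: {surprisal_bits(fraction):.2f} bits")
print(f"bits revealed by these attributes:  {revealed:.1f}")
print(f"whole bits required: {math.ceil(needed)}")  # 33
```

Under these assumptions, a handful of innocuous-looking attributes already supplies a large share of the required 33 bits, which is the intuition behind the claim that any piece of data can help reidentify.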
Journal: TinyToCS
Volume: 1
Publication date: 2012