Network of Excellence Internet Science Summer School. The theme of the summer school is "Internet Privacy and Identity, Trust and Reputation Mechanisms".
More information: http://www.internet-science.eu/
Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees
Privacy Mechanisms Notable Cases State of the Art Conclusions
Privacy-Preserving Data Analysis
Mechanisms and Formal Guarantees
Joss Wright
joss.wright@oii.ox.ac.uk
Oxford Internet Institute
Oxford University
Joss Wright Privacy-Preserving Data Analysis: 1/57
Privacy
What is privacy?
Many definitions in different areas of application.
A useful definition: informational self-determination
Enable data subjects to control how, and to whom, their data is made available.
Privacy
What is privacy?
Within the privacy enhancing technologies community:
Protecting the relations between communicating parties from
observation.
Context privacy.
Anonymous communications.
Preventing deduction of identities or attributes from collections of
data.
Data privacy.
Strongly related concepts, but surprisingly separate fields of research.
Data Privacy
Protection of individual data subjects from identification.
Typically we work within the context of statistical queries on
databases.
Counts, averages, histogram queries, etc.
Model
Consider a database as made up of a number of rows, each representing a single, unique individual, with columns giving attributes.
Not all databases are like this, but the model is useful for mechanism design and gives sufficient generality.
Model
Name Age Height
Joss 31 168
Alice 30 144
Bob 25 200
Actors
Data subjects
Owners of the data
Holders and publishers of data
Recipients of data
Attacker
Trust in the System
Where do we place trust in the system?
Subjects
Need not be trusted as they control their own data.
Publishers
May need to be trusted in how they gather the data.
If you expect them to control release, they must be trusted.
Data Recipients
Adversarial and malicious.
Basic Mechanisms
Anonymization
Remove explicit identifiers such as names.
Privacy-preserving data mining
Restrict queries so that their results preserve privacy.
Preferably enforced by the data publisher.
Data perturbation
Alter data to prevent undesirable inferences from being drawn.
Anonymization
Remove names or other obvious identifiers from data.
Problems arise with quasi-identifiers.
Combinations of record values that uniquely identify individuals.
These can be difficult to specify or even detect.
Exacerbated by the fact that data from external sources may
combine with the database to form a quasi-identifier.
We’ll come back to this.
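As a concrete illustration, here is a short sketch (Python, using the toy table from these slides; the helper function is my own, not from the talk) of how unique attribute combinations can be detected:

```python
from collections import Counter

# Toy table from the slides; each row describes one individual.
rows = [
    {"name": "Joss",    "age": 31, "height": 168},
    {"name": "Alice",   "age": 30, "height": 144},
    {"name": "Bob",     "age": 25, "height": 200},
    {"name": "Charles", "age": 31, "height": 187},
    {"name": "David",   "age": 27, "height": 168},
]

def unique_combinations(rows, attrs):
    """Return attribute-value tuples occurring exactly once: these act
    as quasi-identifiers even after the name column is dropped."""
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    return [combo for combo, n in counts.items() if n == 1]

# Every (age, height) pair here is unique, so dropping the names
# does not prevent re-identification.
print(unique_combinations(rows, ["age", "height"]))
```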
Anonymization
Name Age Height
Joss 31 168
Alice 30 144
Bob 25 200
Charles 31 187
David 27 168
Anonymization
Name Age Height
Joss 31 168
Alice 30 144
Bob 25 200
Charles 31 187
David 30 168
Some single values, such as Bob's age of 25, are unique in the table, and are therefore quasi-identifiers.
Anonymization
Name Age Height
Joss 31 168
Alice 30 144
Bob 25 200
Charles 31 187
David 30 168
Other values are unique in combination, such as (Age 31, Height 168), and so also form quasi-identifiers.
Anonymization Methods
One of the best-known anonymization mechanisms applied to data is k-anonymity.
Each record should be indistinguishable from at least (k − 1) other records in the database.
Any given record therefore describes at least k people.
The probability that you are identified by that record is at most 1/k.
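A minimal check of this property can be sketched as follows (Python; the function and generalized example values are illustrative, not from the talk):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """A table is k-anonymous w.r.t. the quasi-identifiers if every
    combination of their values appears in at least k rows."""
    counts = Counter(tuple(r[a] for a in quasi_identifiers) for r in rows)
    return all(n >= k for n in counts.values())

# Generalized records: ages coarsened to a range, heights to a threshold.
generalized = [
    {"age": "[25-35]", "height": "<=180"},
    {"age": "[25-35]", "height": "<=180"},
    {"age": "[25-35]", "height": ">180"},
    {"age": "[25-35]", "height": ">180"},
    {"age": "[25-35]", "height": "<=180"},
]

print(is_k_anonymous(generalized, ["age", "height"], k=2))  # True
```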
k-anonymity
Name Age Height
Joss 31 168
Alice 30 144
Bob 25 200
Charles 31 187
David 27 168
k-anonymity
Name Age Height
Joss [25-35] ≤180
Alice [25-35] ≤180
Bob [25-35] >180
Charles [25-35] >180
David [25-35] ≤180
k-anonymity Applied
This is not a hypothetical issue.
When Sweeney proposed k-anonymity, she demonstrated the
risks.
Took postcode, date of birth and sex from a published voter
register
Took anonymized published medical records
Identified the record belonging to a former governor of
Massachusetts.
Beyond k-anonymity
k-anonymity gives a basic level of anonymization that prevents an
individual being simply re-identified from their published attributes.
There are, naturally, more subtle issues.
We may still be able to infer sensitive information about a person,
even if we can’t directly identify them.
l-diversity
k-anonymity ensures that an individual is indistinguishable from a
group of other individuals, preventing their direct re-identification.
It could be, however, that attributes shared by the entire group are
sensitive.
l-diversity
Name Age Height Illness
Joss 31 168 Flu
Alice 30 144 Flu
Bob 25 200 HIV
Charles 31 187 HIV
David 27 168 Flu
l-diversity
Name Age Height Illness
Joss [25-35] ≤180 Flu
Alice [25-35] ≤180 Flu
Bob [25-35] >180 HIV
Charles [25-35] >180 HIV
David [25-35] ≤180 Flu
l-diversity
Name Age Height Illness
Joss [25-35] ≤200 Flu
Alice [25-35] ≤200 Flu
Bob [25-35] ≤200 HIV
Charles [25-35] ≤200 HIV
David [25-35] ≤200 Flu
l-diversity
l-diversity ensures that not only are all users k-anonymous, but
that each group of users shares a variety of sensitive attributes.
Variations ensure that sensitive attributes are evenly or sufficiently distributed within each group, to avoid associating a user with an attribute at high probability.
One notable extension is t-closeness, which ensures that the distribution of sensitive attributes within each group is close to their distribution across the entire table.
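The l-diversity condition can be sketched in the same style as the k-anonymity check (Python; the helper and example records are illustrative):

```python
from collections import defaultdict

def is_l_diverse(rows, quasi_identifiers, sensitive, l):
    """Each group of rows sharing the same quasi-identifier values must
    contain at least l distinct values of the sensitive attribute."""
    groups = defaultdict(set)
    for r in rows:
        key = tuple(r[a] for a in quasi_identifiers)
        groups[key].add(r[sensitive])
    return all(len(values) >= l for values in groups.values())

records = [
    {"age": "[25-35]", "height": "<=180", "illness": "Flu"},
    {"age": "[25-35]", "height": "<=180", "illness": "Flu"},
    {"age": "[25-35]", "height": ">180", "illness": "HIV"},
    {"age": "[25-35]", "height": ">180", "illness": "HIV"},
    {"age": "[25-35]", "height": "<=180", "illness": "Flu"},
]

# The >180 group only ever contains HIV, so the table is 2-anonymous
# but not 2-diverse: group membership alone reveals the diagnosis.
print(is_l_diverse(records, ["age", "height"], "illness", l=2))  # False
```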
Perturbation
The above approaches maintain the consistency of the database.
One of the oldest ideas is simply to replace genuine values with
perturbed values that maintain almost-correct desirable
properties.
For numeric quantities this can be as simple as adding random noise drawn from some appropriate distribution.
For categorical data, this can instead result in attributes being re-assigned in a variety of ways.
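A tiny sketch of numeric perturbation (Python; the Gaussian noise and scale are my own illustrative choices, not a prescription from the talk):

```python
import random

def perturb(values, scale):
    """Add zero-mean Gaussian noise to each numeric value; aggregates
    such as the mean are approximately preserved."""
    return [v + random.gauss(0, scale) for v in values]

heights = [168, 144, 200, 187, 168]
random.seed(0)  # for a reproducible illustration
noisy = perturb(heights, scale=10)
# Individual values change, but the mean stays close to the original.
print(round(sum(heights) / len(heights), 1), round(sum(noisy) / len(noisy), 1))
```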
Permutation
Sensitive attributes can be swapped between data records,
maintaining statistical quantities such as aggregate counts, averages
and distribution of data.
This has to be performed sensitively with respect to the required
analyses.
Typically on an ad-hoc, per-database basis.
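Swapping can be sketched as a random permutation of the sensitive column (Python; the function is my own illustration):

```python
import random

def swap_sensitive(rows, sensitive):
    """Randomly permute the sensitive attribute across records.
    Marginal statistics of that column (counts, averages, distribution)
    are exactly preserved, but the link between an individual and
    their value is broken."""
    values = [r[sensitive] for r in rows]
    random.shuffle(values)
    return [{**r, sensitive: v} for r, v in zip(rows, values)]

records = [{"name": n, "illness": i} for n, i in
           [("Joss", "Flu"), ("Alice", "Flu"), ("Bob", "HIV")]]
swapped = swap_sensitive(records, "illness")
# The multiset of illnesses is unchanged.
print(sorted(r["illness"] for r in swapped))  # ['Flu', 'Flu', 'HIV']
```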
Sweeney’s k-anonymity Re-identification
In 2001, Sweeney set out to prove the ideas behind k-anonymity.
Took publicly available voter registration data and published,
anonymized medical records. (GIC Healthcare Data.)
At the time of the data collection, William Weld was the governor
of Massachusetts.
According to the voter records, only six people in Cambridge,
Massachusetts shared his birth date.
Of those six, three were male.
Only one lived within his (5-digit) ZIP code.
Sweeney’s k-anonymity Re-identification
The anonymized medical records contained over 100 attributes
detailing diagnoses, procedures and medications.
Sweeney calculated that 87% of US citizens were uniquely
identifiable through the quasi-identifier of {sex, date of birth,
5-digit ZIP}
53% from {sex, date of birth, city}
18% from {sex, date of birth, county}
Netflix Prize
Netflix wanted to improve its film recommendation algorithm.
Published a database of over 100,000,000 film ratings by roughly
500,000 subscribers between 1999 and 2005.
A million dollar prize was offered for an algorithm that would
improve the recommendations given to users by a given degree of
accuracy.
“...all customer identifying information has been removed.”
Netflix Prize
Narayanan and Shmatikov disagreed.
Combined Netflix data with IMDb data to re-identify a large
number of users.
Linked Netflix ratings to IMDb profiles.
Showed the entire viewing history of many users.
Demonstrated how information such as political preference could
be extracted from the available data.
Proof of concept algorithm used IMDb. Easily adaptable for
alternative information sources.
Netflix Prize
With 8 film ratings, 96% of subscribers can be uniquely identified.
With 2 ratings, and dates, 64% can be completely deanonymized.
With 2 ratings, and dates, 89% can be reduced to a possible 8
users.
Netflix Prize Redux
Following this publication, Netflix’s response was...
... to announce a second Netflix prize containing more data points,
including age, zip code, gender and previously-chosen films.
Eventually cancelled, but only in response to legal action from
customers and concerns from the US Federal Trade Commission.
Mechanisms Revisited
The mechanisms we’ve looked at so far are:
Typically ad hoc, based on the desired utility: the purpose for which the data will be used.
Without formal guarantees.
Left with a quantifiable probability that individuals could be re-identified.
Sensitive to auxiliary information from external data sources.
Mechanisms Revisited
We can also consider privacy mechanisms as falling into one of
two families:
Non-interactive
Anonymize the data somehow, then release it.
Interactive
Keep the database secret, and only release results to queries.
Non-Interactive Mechanisms
Historically, the main way of doing things.
Including most of the methods we’ve looked at so far.
A major limitation of this approach to anonymization is that it
requires you to fix the utility before you release the data.
Data is either useless and anonymous
Or useful and identifiable.
It is difficult to predict interactions with data that might be released
in the future.
Interactive Mechanisms
In interactive mechanisms, the data is never released.
Instead, queries are sent to the holder of the database, who
releases an answer.
This approach is taken by the current state of the art: differential
privacy.
Differential Privacy
In 1978, Dalenius stated the following desirable property for
privacy-preserving statistical databases:
“A statistical database should reveal nothing about an individual
that could not be learned without access to the database.”
This is impossible, largely due to the existence of auxiliary external
information that can be combined with the data in the database.
Differential Privacy
‘Suppose one’s height were considered a sensitive piece of
information, and that revealing the height of an individual were a
privacy breach. Assume that a database yields the average
heights of women of different nationalities. An adversary who
has access to the statistical database and the auxiliary
information “Terry Gross is two inches shorter than the average
Lithuanian woman” learns Terry Gross’ height, while anyone
learning only the auxiliary information, without access to the
average heights, learns relatively little.’
– Dwork
Differential Privacy
Critically, this privacy breach occurs whether or not Terry Gross’ data is in
the database.
Differential Privacy
Rather than guaranteeing that a privacy breach will not occur,
differential privacy guarantees that the privacy breach will not
occur due to the data in the database.
Reformulated: Anything that can happen if your data is in the
database could have happened even if your data weren’t in the
database.
This neatly accommodates any and all possible auxiliary information available now or in the future.
It also divorces the privacy mechanism from the nature of the
underlying data, providing a general mechanism.
Differential Privacy Core
A randomised function K achieves ε-differential privacy if, for any two databases D1, D2 differing on at most one element, and all S ⊆ Range(K):

Pr[K(D1) ∈ S] ≤ e^ε × Pr[K(D2) ∈ S]
Differential Privacy Core
Alternatively:
Pr[K(D1) ∈ S] / Pr[K(D2) ∈ S] ≤ e^ε

The ratio between the two probabilities is bounded by e^ε.
The Exponential Function
[Figure: plot of e^ε for ε from 0 to 5; the bound rises from 1 to nearly 150.]
Differential Privacy Core
Translated: for any calculation that you make on a database, any
result you get is (almost) equally probable if you add a person, and
thus a single record, to that database.
Alternatively put: two databases that differ in a single record
should be indistinguishable, with given probability, when accessed
via the privacy mechanism.
Achieving Differential Privacy
How do we achieve this guarantee?
There are a variety of mechanisms proposed in the literature, but
Dwork’s original suggestion remains popular:
Appropriately chosen random noise is added to the result of a
query of arbitrary complexity.
Noise added to the result means that the original database retains
its accuracy.
The Laplace distribution provides desirable properties for the
appropriate noise.
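A minimal sketch of this mechanism for a counting query (Python; the inverse-transform Laplace sampler and the toy data are my own illustration, not production code):

```python
import math
import random

def laplace_noise(scale):
    """Sample Laplace(0, scale) by inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1, u) * math.log(1 - 2 * abs(u))

def dp_count(rows, predicate, epsilon):
    """Counting query with Laplace noise. A count has L1-sensitivity 1,
    so noise with scale 1/epsilon gives epsilon-differential privacy."""
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon)

people = [{"left_handed": x} for x in (True, False, False, True, False)]
# The true answer is 2; each call returns a noisy value near it.
print(dp_count(people, lambda r: r["left_handed"], epsilon=0.5))
```

Note the trade-off: a smaller epsilon means a larger noise scale, so answers are more private but less accurate.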
The Laplace Distribution
Achieving Differential Privacy
How do we know how much noise to add?
We use the L1-sensitivity of the function to bound the noise:
Defined as the amount by which the query could change if a single
record were added to the database.
Recall that our guarantee is based around indistinguishability
between similar databases.
As an example: the count function (e.g. “How many people in the
database are left-handed?”) can only differ by one.
Other query types differ, but many complex queries have manageable L1-sensitivity.
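A small numeric check of why this works (Python; the values are my own illustration): for two neighbouring databases whose true counts differ by the sensitivity, the ratio of Laplace densities at any output is bounded by e^ε, which is exactly the differential privacy guarantee.

```python
import math

def laplace_pdf(x, mu, scale):
    """Density of the Laplace distribution centred at mu."""
    return math.exp(-abs(x - mu) / scale) / (2 * scale)

epsilon = 0.5
scale = 1.0 / epsilon   # scale = sensitivity / epsilon, sensitivity = 1
mu1, mu2 = 10, 11       # true counts on two neighbouring databases

# For any candidate output x, the density ratio never exceeds e^epsilon.
for x in (5.0, 10.5, 20.0):
    ratio = laplace_pdf(x, mu1, scale) / laplace_pdf(x, mu2, scale)
    assert max(ratio, 1 / ratio) <= math.exp(epsilon) + 1e-9
print("ratio bounded by e^epsilon =", round(math.exp(epsilon), 3))
```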
Properties of Differential Privacy
Use of the Laplace distribution to add noise provably adds the
smallest amount required to preserve privacy.
The privacy parameter ε can be tuned for stronger or weaker guarantees.
Lower values of ε decrease the likelihood that the databases can be distinguished as a result of queries, but require more noise, making results less accurate.
Differential Privacy Illustrated
[Figure: two overlapping Laplace distributions centred at the true query results µ1 and µ2, with two example noisy outputs a and b on the x-axis and Pr[x] on the y-axis.]
Differential Privacy Illustrated (Explanation)
In the previous slide, let µ1 and µ2 be two “true” results of a
query, such as a count function, from each of two databases that
differ in a single record.
With random noise added, drawn from the Laplace distribution,
both a and b are possible “noisy” results of the query for either
database.
Importantly, the ratio between the probability of a given noisy result, such as a or b, under µ1, and the probability of that result under µ2, is bounded by e^ε.
. . . . . . . . . . . . . . . . . . . .
.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..
Joss Wright Privacy-Preserving Data Analysis: 51/57
52. Properties of Differential Privacy
Differentially private queries are neatly composable in two senses:
A complex sequence of queries can be given to the database
owner, each of which depends on the accurate result of the
previous query. At the end, only the final result need be perturbed.
Each differentially private query consumes part of an overall privacy
budget: the guarantees of successive queries compose additively, and
further queries can be made until this budget is exhausted.
At this point the database should be destroyed!
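The budget bookkeeping in the second sense of composability can be sketched as follows; the class name and API are hypothetical, chosen only to illustrate the idea.

```python
class PrivacyBudget:
    """Track sequential composition of differentially private queries:
    the epsilon costs of successive queries add up, and querying must
    stop once the total budget is spent."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        # Refuse the query rather than exceed the overall guarantee.
        if epsilon > self.remaining + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon

budget = PrivacyBudget(1.0)
budget.spend(0.4)  # first query
budget.spend(0.4)  # second query
# budget.spend(0.4) would now raise: only 0.2 of the budget remains
```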
53. Practical Application
Privacy Integrated Queries (PINQ)
For practical application, we do not want database owners to need
to understand the theory.
There is now a simple database query language, similar in flavour to
SQL, that automatically enforces differential-privacy guarantees.
It has been used in academic analyses, but not yet commercially.
54. Practical Application
Smart Grids
Recent work by Danezis demonstrates differentially private smart
metering for electrical grids.
Noise is injected into bills by increasing the amount the customer pays.
This rapidly becomes expensive, but it gives quantifiable privacy
guarantees.
55. Future Work
Differential privacy is a very strong guarantee. How can it be
usefully relaxed in exchange for greater accuracy?
Distributed settings for data sources and noise addition.
Streaming, or otherwise changing, data rather than static databases.
56. Lessons
A step back: anonymizing data is hard.
We are only just beginning to realise just how hard.
Differential privacy, and tools such as PINQ, are good examples of
how to approach the problem, and of the limitations we face.
Netflix and other examples show that these risks are not isolated
or theoretical.
This is before we look at Facebook, Google, Amazon.
57. Lessons
If you are in a position where you need to anonymize data, think
very carefully about how you treat the data, and what you release.
Eyeballing data, and removing obvious linkages, is not even close to
sufficient.
Do it if you want to, but don’t claim it’s anonymized.
The most important principle is data minimisation.
Only gather what you need.
Only use it for what you (initially) need.
Only share it when you must.