                              Privacy-Preserving Data Analysis
                                   Mechanisms and Formal Guarantees


                                                        Joss Wright
                                                   joss.wright@oii.ox.ac.uk


                                                   Oxford Internet Institute
                                                     Oxford University




     Privacy


      What is privacy?
              Many definitions in different areas of application.
              A useful definition: informational self-determination
                        Enable data subjects to control how, to what extent, and to whom
                        their data is made available.




     Privacy
      What is privacy?
              Within the privacy enhancing technologies community:
                       Protecting the relations between communicating parties from
                       observation.
                               Context privacy.
                               Anonymous communications.
                       Preventing deduction of identities or attributes from collections of
                       data.
                               Data privacy.
                               Strongly related concepts, but surprisingly separate fields of research.




     Data Privacy


              Protection of individual data subjects from identification.
              Typically we work within the context of statistical queries on
              databases.
              Counts, averages, histogram queries, etc.




     Model


              Consider a database as made up of a number of rows, each
              representing a single, unique individual, with columns showing
              that individual's attributes.
              Not all databases are like this, but the model is useful for
              mechanism design and gives sufficient generality.




     Model


                                               Name           Age         Height
                                                Joss          31           168
                                               Alice          30           144
                                                Bob           25           200




     Actors


              Data subjects
                       Owners of the data
              Holders and publishers of data
              Recipients of data
                       Attacker




     Trust in the System
      Where do we place trust in the system?
              Subjects
                       Need not be trusted as they control their own data.
              Publishers
                       May need to be trusted in how they gather the data.
                       If you expect them to control release, they must be trusted.
              Data Recipients
                       Adversarial and malicious.




     Basic Mechanisms

              Anonymization
                       Remove explicit identifiers such as names.
              Privacy-preserving data mining
                        Restrict queries, or their results, so that privacy is preserved.
                       Preferably enforced by the data publisher.
              Data perturbation
                        Alter data to prevent undesirable inferences from being drawn.




     Anonymization

              Remove names or other obvious identifiers from data.
              Problems arise with quasi-identifiers.
                       Combinations of record values that uniquely identify individuals.
                       These can be difficult to specify or even detect.
                       Exacerbated by the fact that data from external sources may
                       combine with the database to form a quasi-identifier.
                       We’ll come back to this.
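              A minimal sketch of how such quasi-identifiers might be hunted for within a
              single table, assuming pandas and mirroring the Age/Height values of the
              example on the following slides (names dropped). Note this only finds
              combinations that are unique within the table; external data sources can
              still create quasi-identifiers it cannot see.

      # Sketch: flag attribute combinations under which some row is unique (pandas assumed).
      from itertools import combinations

      import pandas as pd

      df = pd.DataFrame({
          "age":    [31, 30, 25, 31, 27],
          "height": [168, 144, 200, 187, 168],
      })

      def risky_combinations(df, max_size=2):
          """Return attribute sets for which at least one row stands alone."""
          risky = []
          for size in range(1, max_size + 1):
              for cols in combinations(df.columns, size):
                  group_sizes = df.groupby(list(cols)).size()
                  if (group_sizes == 1).any():   # some combination of values is unique
                      risky.append(cols)
          return risky

      print(risky_combinations(df))   # [('age',), ('height',), ('age', 'height')]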




     Anonymization


                                              Name             Age         Height
                                                Joss           31           168
                                               Alice           30           144
                                               Bob             25           200
                                              Charles          31           187
                                               David           27           168




     Anonymization

                                              Name             Age         Height
                                                Joss           31           168
                                               Alice           30           144
                                               Bob             25           200
                                              Charles          31           187
                                               David           30           168

      Red values are unique, therefore quasi-identifiers.




     Anonymization

                                              Name             Age         Height
                                                Joss           31           168
                                               Alice           30           144
                                               Bob             25           200
                                              Charles          31           187
                                               David           30           168

      Blue values are unique combinations, and so quasi-identifiers.




     Anonymization Methods


              One of the most well-known anonymizing mechanisms applied to
              data is k-anonymity.
              Each record in the database should be indistinguishable, on its
              quasi-identifying attributes, from at least (k − 1) other records.
              Any given record therefore describes at least k people.
                        The probability that you are identified by that record is at most 1/k.
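              A minimal sketch, assuming pandas, of what the k-anonymity check looks like
              in code; the column names and generalised values mirror the table shown a
              few slides below.

      # Sketch: a table is k-anonymous over the chosen quasi-identifiers if every
      # combination of their values occurs at least k times.
      import pandas as pd

      def is_k_anonymous(df, quasi_identifiers, k):
          return df.groupby(quasi_identifiers).size().min() >= k

      generalised = pd.DataFrame({
          "age":    ["[25-35]"] * 5,
          "height": ["<=180", "<=180", ">180", ">180", "<=180"],
      })

      print(is_k_anonymous(generalised, ["age", "height"], k=2))   # True: group sizes are 3 and 2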




     k-anonymity


                                              Name             Age         Height
                                                Joss           31           168
                                               Alice           30           144
                                               Bob             25           200
                                              Charles          31           187
                                               David           27           168




     k-anonymity


                                           Name                Age            Height
                                             Joss            [25-35]          ≤180
                                            Alice            [25-35]          ≤180
                                            Bob              [25-35]          >180
                                           Charles           [25-35]          >180
                                            David            [25-35]          ≤180




     k-anonymity Applied

              This is not a hypothetical issue.
              When Sweeney proposed k-anonymity, she demonstrated the
              risks.
                       Took postcode, date of birth and sex from a published voter
                       register
                       Took anonymized published medical records
                       Identified the record belonging to a former governor of
                       Massachusetts.




     Beyond k-anonymity


              k-anonymity gives a basic level of anonymization that prevents an
              individual being simply re-identified from their published attributes.
              There are, naturally, more subtle issues.
              We may still be able to infer sensitive information about a person,
              even if we can’t directly identify them.




     l-diversity


              k-anonymity ensures that an individual is indistinguishable from a
              group of other individuals, preventing their direct re-identification.
              It could be, however, that attributes shared by the entire group are
              sensitive.




     l-diversity


                                       Name             Age         Height          Illness
                                         Joss           31           168               Flu
                                        Alice           30           144               Flu
                                        Bob             25           200              HIV
                                       Charles          31           187              HIV
                                        David           27           168               Flu




     l-diversity


                                    Name               Age             Height         Illness
                                      Joss           [25-35]           ≤180              Flu
                                     Alice           [25-35]           ≤180              Flu
                                     Bob             [25-35]           >180             HIV
                                    Charles          [25-35]           >180             HIV
                                     David           [25-35]           ≤180              Flu




     l-diversity


                                    Name               Age             Height         Illness
                                      Joss           [25-35]           ≤200              Flu
                                     Alice           [25-35]           ≤200              Flu
                                     Bob             [25-35]           ≤200             HIV
                                    Charles          [25-35]           ≤200             HIV
                                     David           [25-35]           ≤200              Flu




     l-diversity

              l-diversity ensures not only that all users are k-anonymous, but
              that each group of users contains a variety of sensitive attribute
              values.
              Variations ensure that sensitive values are evenly or sufficiently
              distributed to avoid a high-probability association of a user with
              an attribute.
              One notable extension is t-closeness, which ensures that the
              distribution of sensitive values within each group is close to their
              distribution across the entire table.
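              A minimal sketch of the simplest ("distinct") form of the l-diversity check,
              assuming pandas; the values mirror the generalised table above, where the
              check fails because each group shares a single illness.

      # Sketch: distinct l-diversity -- every quasi-identifier group must contain
      # at least l different values of the sensitive attribute.
      import pandas as pd

      def is_l_diverse(df, quasi_identifiers, sensitive, l):
          return df.groupby(quasi_identifiers)[sensitive].nunique().min() >= l

      table = pd.DataFrame({
          "age":     ["[25-35]"] * 5,
          "height":  ["<=180", "<=180", ">180", ">180", "<=180"],
          "illness": ["Flu", "Flu", "HIV", "HIV", "Flu"],
      })

      print(is_l_diverse(table, ["age", "height"], "illness", l=2))   # False: each group holds one illness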




     Perturbation

              The above approaches maintain the consistency of the database.
              One of the oldest ideas is simply to replace genuine values with
              perturbed values that maintain almost-correct desirable
              properties.
              For numeric quantities this can simply be the addition of random
              noise according to some appropriate distribution.
                       Obviously this works best for numerical data.
                       For categories, this can result in attributes being re-assigned in a
                       variety of ways.
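              A minimal sketch of numeric perturbation, assuming NumPy; the noise
              distribution and scale here are illustrative choices rather than prescribed
              values.

      # Sketch: perturb a numeric column by adding zero-mean random noise, so that
      # aggregates such as the mean remain approximately correct.
      import numpy as np

      rng = np.random.default_rng(seed=0)
      heights = np.array([168, 144, 200, 187, 168], dtype=float)

      noisy_heights = heights + rng.normal(loc=0.0, scale=5.0, size=heights.shape)

      print(heights.mean(), noisy_heights.mean())   # close, but individual values are masked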




     Permutation


              Sensitive attributes can be swapped between data records,
              maintaining statistical quantities such as aggregate counts, averages
              and distribution of data.
              This has to be performed sensitively with respect to the required
              analyses.
                       Typically on an ad-hoc, per-database basis.
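              A minimal sketch of swapping, assuming NumPy and pandas: the sensitive
              column is shuffled among records, which preserves its marginal counts while
              breaking the link to individuals. Whether this keeps the analyses of
              interest valid is, as noted above, a per-database judgement.

      # Sketch: permute the sensitive column across records; value counts are unchanged.
      import numpy as np
      import pandas as pd

      rng = np.random.default_rng(seed=0)
      table = pd.DataFrame({
          "age":     [31, 30, 25, 31, 27],
          "illness": ["Flu", "Flu", "HIV", "HIV", "Flu"],
      })

      table["illness"] = rng.permutation(table["illness"].to_numpy())

      print(table["illness"].value_counts())   # still 3 x Flu, 2 x HIV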




     Sweeney’s k-anonymity Re-identification

              In 2001, Sweeney set out to prove the ideas behind k-anonymity.
              Took publicly available voter registration data and published,
              anonymized medical records. (GIC Healthcare Data.)
              At the time of the data collection, William Weld was the governor
              of Massachusetts.
                       According to the voter records, only six people in Cambridge,
                       Massachusetts shared his birth date.
                       Of those six, three were male.
                       Only one lived within his (5-digit) ZIP code.




     Sweeney’s k-anonymity Re-identification

              The anonymized medical records contained over 100 attributes
              detailing diagnoses, procedures and medications.
              Sweeney calculated that 87% of US citizens were uniquely
              identifiable through the quasi-identifier of {sex, date of birth,
              5-digit ZIP}
                       53% from {sex, date of birth, city}
                       18% from {sex, date of birth, county}




     Netflix Prize

              Netflix wanted to improve its film recommendation algorithm.
              Published a database of over 100,000,000 film ratings by roughly
              500,000 subscribers between 1999 and 2005.
              A million dollar prize was offered for an algorithm that would
              improve the recommendations given to users by a given degree of
              accuracy.
              “...all customer identifying information has been removed.”




     Netflix Prize

              Narayanan and Shmatikov disagreed.
              Combined Netflix data with IMDb data to re-identify a large
              number of users.
                       Linked Netflix ratings to IMDb profiles.
                       Showed the entire viewing history of many users.
                       Demonstrated how information such as political preference could
                       be extracted from the available data.
                       Proof of concept algorithm used IMDb. Easily adaptable for
                       alternative information sources.




     Netflix Prize


              With 8 film ratings, 96% of subscribers can be uniquely identified.
              With 2 ratings, and dates, 64% can be completely deanonymized.
              With 2 ratings, and dates, 89% can be reduced to a possible 8
              users.




     Netflix Prize Redux


              Following this publication, Netflix’s response was...
              ... to announce a second Netflix prize containing more data points,
              including age, zip code, gender and previously-chosen films.
              Eventually cancelled, but only in response to legal action from
              customers and concerns from the US Federal Trade Commission.




     Mechanisms Revisited


              The mechanisms we’ve looked at so far are:
                        Typically ad hoc, designed around the desired utility: the purpose
                        for which the data will be used.
                        Without formal guarantees.
                                No quantifiable bound on the probability that individuals could be
                                re-identified.
                        Sensitive to auxiliary information from external data sources.




     Mechanisms Revisited


              We can also consider privacy mechanisms as falling into one of
              two families:
                       Non-interactive
                               Anonymize the data somehow, then release it.
                       Interactive
                               Keep the database secret, and only release results to queries.




     Non-Interactive Mechanisms

              Historically, the main way of doing things.
                       Including most of the methods we’ve looked at so far.
              A major limitation of this approach to anonymization is that it
              requires you to fix the utility before you release the data.
                       Data is either useless and anonymous
                       Or useful and identifiable.
                       It is difficult to predict interactions with data that might be released
                       in the future.




     Interactive Mechanisms


              In interactive mechanisms, the data is never released.
              Instead, queries are sent to the holder of the database, who
              releases an answer.
              This approach is taken by the current state of the art: differential
              privacy.




     Differential Privacy


              In 1977, Dalenius stated the following desirable property for
              privacy-preserving statistical databases:
              “A statistical database should reveal nothing about an individual
              that could not be learned without access to the database.”
              This is impossible, largely due to the existence of auxiliary external
              information that can be combined with the data in the database.




     Differential Privacy
              ‘Suppose one’s height were considered a sensitive piece of
              information, and that revealing the height of an individual were a
              privacy breach. Assume that a database yields the average
              heights of women of different nationalities. An adversary who
              has access to the statistical database and the auxiliary
              information “Terry Gross is two inches shorter than the average
              Lithuanian woman” learns Terry Gross’ height, while anyone
              learning only the auxiliary information, without access to the
              average heights, learns relatively little.’

                                                                                                                                 – Dwork


     Differential Privacy



      Critically, this privacy breach occurs whether or not Terry Gross’ data is in
      the database.




     Differential Privacy
              Rather than guaranteeing that a privacy breach will not occur,
              differential privacy guarantees that the privacy breach will not
              occur due to the data in the database.
              Reformulated: Anything that can happen if your data is in the
              database could have happened even if your data weren’t in the
              database.
              This neatly accommodates any and all possible auxiliary information
              available now or in the future.
              It also divorces the privacy mechanism from the nature of the
              underlying data, providing a general mechanism.


     Differential Privacy Core


       A randomised function K achieves ε-differential privacy if, for any two
       databases D1, D2 differing on at most one element, and all
       S ⊆ Range(K):

                                  Pr[K(D1) ∈ S] ≤ e^ε × Pr[K(D2) ∈ S]




     Differential Privacy Core


       Alternatively:

                                  Pr[K(D1) ∈ S] / Pr[K(D2) ∈ S] ≤ e^ε

       The ratio between the two probabilities is bounded by e^ε.




     The Exponential Function
       [Plot: e^ε against ε for ε from 0 to 5; the bound rises steeply from 1 to roughly 150.]


     Differential Privacy Core


              Translated: for any calculation that you make on a database, any
              result you get is (almost) equally probable if you add a person, and
              thus a single record, to that database.
              Alternatively put: two databases that differ in a single record
              should be indistinguishable, up to the probability ratio e^ε, when
              accessed via the privacy mechanism.




     Achieving Differential Privacy

              How do we achieve this guarantee?
              There are a variety of mechanisms proposed in the literature, but
              Dwork’s original suggestion remains popular:
              Appropriately chosen random noise is added to the result of a
              query of arbitrary complexity.
                       Because the noise is added to the query result, not to the stored
                       data, the underlying database retains its accuracy.
                       The Laplace distribution provides desirable properties for the
                       appropriate noise.






     The Laplace Distribution
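      As a reminder: the Laplace distribution centred at µ with scale b has
      density f(x | µ, b) = (1 / 2b) · exp(−|x − µ| / b). For ϵ-differential
      privacy the scale is typically set to b = ∆f / ϵ, where ∆f is the
      L1-sensitivity of the query, introduced on the following slides.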






     Achieving Differential Privacy
              How do we know how much noise to add?
                       We use the L1-sensitivity of the query function to calibrate the noise:
                       Defined as the maximum amount by which the query result could change
                       if a single record were added to the database.
              Recall that our guarantee is based around indistinguishability
              between similar databases.
              As an example: the count function (e.g. “How many people in the
              database are left-handed?”) can only differ by one.
              Other query types differ, but many complex queries have
              manageable L1-sensitivity.
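
      To make the mechanism concrete, here is a minimal sketch of a
      differentially private count using Laplace noise. It is illustrative
      only: the function names and the example data are invented here, and a
      counting query with L1-sensitivity 1 is assumed.

        import numpy as np

        def private_count(records, predicate, epsilon):
            # Laplace mechanism for a counting query. Counts have
            # L1-sensitivity 1: adding or removing one record changes the
            # result by at most 1, so noise with scale 1/epsilon suffices
            # for epsilon-differential privacy.
            true_count = sum(1 for record in records if predicate(record))
            noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
            return true_count + noise

        # "How many people in the database are left-handed?"
        people = [
            {"name": "Alice", "left_handed": True},
            {"name": "Bob", "left_handed": False},
            {"name": "Charles", "left_handed": True},
        ]
        print(private_count(people, lambda p: p["left_handed"], epsilon=0.5))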




     Properties of Differential Privacy


              Use of the Laplace distribution to add noise provably adds the
              smallest amount required to preserve privacy.
              The privacy parameter ϵ, which sets the multiplicative factor in
              the guarantee, can be tuned for stronger or weaker guarantees.
                       Smaller values of ϵ make it less likely that the two databases can be
                       distinguished from query results, but require more noise and so make
                       results less accurate.






     Differential Privacy Illustrated
                    [Figure: two overlapping noise distributions Pr[x], centred on the
                    true query results µ1 and µ2, with two possible noisy outputs a and
                    b marked on the horizontal axis.]





     Differential Privacy Illustrated (Explanation)
              In the previous slide, let µ1 and µ2 be two “true” results of a
              query, such as a count function, from each of two databases that
              differ in a single record.
              With random noise added, drawn from the Laplace distribution,
              both a and b are possible “noisy” results of the query for either
              database.
              Importantly, the ratio between the probability of a given noisy
              result, such as a or b, based on µ1, and the probability of that
              result based on µ2, is bounded by e^ϵ, whichever result is observed.
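
      A small numeric check of this bound, under assumed example values (ϵ,
      the two true results, and the grid of candidate outputs are arbitrary
      choices made here):

        import numpy as np

        epsilon = 0.5
        scale = 1.0 / epsilon        # Laplace scale for a sensitivity-1 query
        mu1, mu2 = 10.0, 11.0        # true results from two neighbouring databases

        def laplace_pdf(x, mu, b):
            return np.exp(-np.abs(x - mu) / b) / (2.0 * b)

        xs = np.linspace(0.0, 20.0, 2001)    # candidate noisy outputs
        ratios = laplace_pdf(xs, mu1, scale) / laplace_pdf(xs, mu2, scale)

        # The ratio never exceeds e^epsilon, however the noisy output falls.
        print(ratios.max(), np.exp(epsilon))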





     Properties of Differential Privacy

              Differentially private queries are neatly composable in two senses:
                       A complex sequence of queries can be given to the database
                       owner, each of which depends on the accurate result of the
                       previous query. At the end, only the final result need be perturbed.
                       Each differentially private query consumes some portion of an overall
                       privacy budget. Further queries can be made until this budget is
                       exhausted (a minimal sketch of the idea follows below).
                               At this point the database should be destroyed!
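
      A minimal sketch of the privacy-budget idea; the class and method names
      are invented for illustration and are not taken from PINQ or any other
      system:

        class PrivacyBudget:
            """Track how much of a total epsilon budget has been spent."""

            def __init__(self, total_epsilon):
                self.remaining = total_epsilon

            def spend(self, epsilon):
                # Sequential composition: an epsilon_1 query followed by an
                # epsilon_2 query is (epsilon_1 + epsilon_2)-differentially
                # private, so each query draws down the shared budget.
                if epsilon > self.remaining:
                    raise RuntimeError("Privacy budget exhausted; refuse further queries.")
                self.remaining -= epsilon

        budget = PrivacyBudget(total_epsilon=1.0)
        budget.spend(0.4)    # first query
        budget.spend(0.4)    # second query
        # budget.spend(0.4) would now raise: only 0.2 of the budget remains.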






     Practical Application


              Privacy Integrated Queries (PINQ)
                       For practical application, we do not want database owners to need
                       to understand the theory.
                       There is now a simple database query language, similar to SQL,
                       that automatically enforces differential privacy guarantees.
                       Has been used in academic analyses, but not commercially.






     Practical Application


              Smart Grids
                       Recent work by Danezis demonstrates differentially-private smart
                       metering for electrical grids.
                       Injects noise into the bill by increasing the amount you pay.
                       Rapidly gets very expensive, but gives quantifiable privacy goals.






     Future Work


              Differential privacy is a very strong guarantee. How effectively can
              it be weakened?
              Distributed settings for data sources and noise addition.
              Streaming, or otherwise changing, data rather than static databases.






     Lessons

              A step back: anonymizing data is hard.
                       We are only beginning to realise just how hard.
                       Differential privacy, and PINQ, are good examples of how to go
                       about this, and of the limitations we face.
              Netflix and other examples show that these risks are not isolated
              or theoretical.
                       This is before we look at Facebook, Google, Amazon.






     Lessons

              If you are in a position where you need to anonymize data, think
              very carefully about how you treat the data, and what you release.
                       Eyeballing data, and removing obvious linkages, is not even close to
                       sufficient.
                       Do it if you want to, but don’t claim it’s anonymized.
              The most important principle is data minimisation.
                       Only gather what you need.
                       Only use it for what you (initially) need.
                       Only share it when you must.





Más contenido relacionado

Similar a Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

Joss wright
Joss wrightJoss wright
Joss wright
oiisdp
 
1. What are two items to consider when creating a malware analysis.docx
1. What are two items to consider when creating a malware analysis.docx1. What are two items to consider when creating a malware analysis.docx
1. What are two items to consider when creating a malware analysis.docx
jackiewalcutt
 
WRIGHT_JEREMY_1000738685-1
WRIGHT_JEREMY_1000738685-1WRIGHT_JEREMY_1000738685-1
WRIGHT_JEREMY_1000738685-1
Jeremy Wright
 
Integrating Information Lifecycles
Integrating Information LifecyclesIntegrating Information Lifecycles
Integrating Information Lifecycles
Betsy Martens
 
jy-web-visualization-ux08-slides
jy-web-visualization-ux08-slidesjy-web-visualization-ux08-slides
jy-web-visualization-ux08-slides
Jeremy Yuille
 
Alfreda DudleyTowson University, USAJames BramanTo.docx
Alfreda DudleyTowson University, USAJames BramanTo.docxAlfreda DudleyTowson University, USAJames BramanTo.docx
Alfreda DudleyTowson University, USAJames BramanTo.docx
daniahendric
 

Similar a Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees (20)

Through a Router Darkly - Remote Investigation of Internet Censorship
Through a Router Darkly - Remote Investigation of Internet CensorshipThrough a Router Darkly - Remote Investigation of Internet Censorship
Through a Router Darkly - Remote Investigation of Internet Censorship
 
Joss wright
Joss wrightJoss wright
Joss wright
 
Fine-Grained Censorship Mapping
Fine-Grained Censorship MappingFine-Grained Censorship Mapping
Fine-Grained Censorship Mapping
 
Humanizing bioinformatics
Humanizing bioinformaticsHumanizing bioinformatics
Humanizing bioinformatics
 
CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730CLIR Fellows - Science Data - 14_0730
CLIR Fellows - Science Data - 14_0730
 
1. What are two items to consider when creating a malware analysis.docx
1. What are two items to consider when creating a malware analysis.docx1. What are two items to consider when creating a malware analysis.docx
1. What are two items to consider when creating a malware analysis.docx
 
SELF-STUDY MATERIAL FOR THE USERS OF EUROSTAT MICRODATA SETS
SELF-STUDY MATERIAL FOR THE USERS OF EUROSTAT MICRODATA SETSSELF-STUDY MATERIAL FOR THE USERS OF EUROSTAT MICRODATA SETS
SELF-STUDY MATERIAL FOR THE USERS OF EUROSTAT MICRODATA SETS
 
Iot privacy vs convenience
Iot privacy vs  convenienceIot privacy vs  convenience
Iot privacy vs convenience
 
It walks, It talks and it will conduct economic espionage by Greg Carpenter
It walks, It talks and it will conduct economic espionage by Greg CarpenterIt walks, It talks and it will conduct economic espionage by Greg Carpenter
It walks, It talks and it will conduct economic espionage by Greg Carpenter
 
Achieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logsAchieving Privacy in Publishing Search logs
Achieving Privacy in Publishing Search logs
 
WRIGHT_JEREMY_1000738685-1
WRIGHT_JEREMY_1000738685-1WRIGHT_JEREMY_1000738685-1
WRIGHT_JEREMY_1000738685-1
 
Identification of newborn abandoned babies, criminals of rape cases & unknown...
Identification of newborn abandoned babies, criminals of rape cases & unknown...Identification of newborn abandoned babies, criminals of rape cases & unknown...
Identification of newborn abandoned babies, criminals of rape cases & unknown...
 
The Rising Tide Raises All Boats: The Advancement of Science of Cybersecurity
The Rising Tide Raises All Boats:  The Advancement of Science of CybersecurityThe Rising Tide Raises All Boats:  The Advancement of Science of Cybersecurity
The Rising Tide Raises All Boats: The Advancement of Science of Cybersecurity
 
Integrating Information Lifecycles
Integrating Information LifecyclesIntegrating Information Lifecycles
Integrating Information Lifecycles
 
Biometric Databases and Hadoop__HadoopSummit2010
Biometric Databases and Hadoop__HadoopSummit2010Biometric Databases and Hadoop__HadoopSummit2010
Biometric Databases and Hadoop__HadoopSummit2010
 
jy-web-visualization-ux08-slides
jy-web-visualization-ux08-slidesjy-web-visualization-ux08-slides
jy-web-visualization-ux08-slides
 
Dataset Citation and Identifiers: DOIs, ARKs, and EZID
Dataset Citation and Identifiers: DOIs, ARKs, and EZIDDataset Citation and Identifiers: DOIs, ARKs, and EZID
Dataset Citation and Identifiers: DOIs, ARKs, and EZID
 
A workflow experiment; or (The Unexpected Virtue of Ignorance)
A workflow experiment; or (The Unexpected Virtue of Ignorance)A workflow experiment; or (The Unexpected Virtue of Ignorance)
A workflow experiment; or (The Unexpected Virtue of Ignorance)
 
NIST - определения для Интернета вещей
NIST - определения для Интернета вещейNIST - определения для Интернета вещей
NIST - определения для Интернета вещей
 
Alfreda DudleyTowson University, USAJames BramanTo.docx
Alfreda DudleyTowson University, USAJames BramanTo.docxAlfreda DudleyTowson University, USAJames BramanTo.docx
Alfreda DudleyTowson University, USAJames BramanTo.docx
 

Más de i_scienceEU

Más de i_scienceEU (20)

Internet science conference
Internet science conferenceInternet science conference
Internet science conference
 
Social life in digital societies: Trust, Reputation and Privacy EINS summer s...
Social life in digital societies: Trust, Reputation and Privacy EINS summer s...Social life in digital societies: Trust, Reputation and Privacy EINS summer s...
Social life in digital societies: Trust, Reputation and Privacy EINS summer s...
 
Privacy 2020 (Participants) EINS summer school
Privacy 2020 (Participants) EINS summer schoolPrivacy 2020 (Participants) EINS summer school
Privacy 2020 (Participants) EINS summer school
 
[participants Communicating Privacy Risks to Users] EINS summer school
[participants Communicating Privacy Risks to Users] EINS summer school[participants Communicating Privacy Risks to Users] EINS summer school
[participants Communicating Privacy Risks to Users] EINS summer school
 
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
Kave Salamatian, Universite de Savoie and Eiko Yoneki, University of Cambridg...
 
Runa Sandvik, The Tor Project, London: Online Anonymity: Before and After th...
 Runa Sandvik, The Tor Project, London: Online Anonymity: Before and After th... Runa Sandvik, The Tor Project, London: Online Anonymity: Before and After th...
Runa Sandvik, The Tor Project, London: Online Anonymity: Before and After th...
 
Karmen Guevara, University of Cambridge: Dimensions of Identity, Trust and Pr...
Karmen Guevara, University of Cambridge: Dimensions of Identity, Trust and Pr...Karmen Guevara, University of Cambridge: Dimensions of Identity, Trust and Pr...
Karmen Guevara, University of Cambridge: Dimensions of Identity, Trust and Pr...
 
Jonathan Cave, University of Warwick (Plenary): Agreeing to Disagree About Pr...
Jonathan Cave, University of Warwick (Plenary): Agreeing to Disagree About Pr...Jonathan Cave, University of Warwick (Plenary): Agreeing to Disagree About Pr...
Jonathan Cave, University of Warwick (Plenary): Agreeing to Disagree About Pr...
 
Chris Marsden, University of Essex (Plenary): Regulation, Standards, Governan...
Chris Marsden, University of Essex (Plenary): Regulation, Standards, Governan...Chris Marsden, University of Essex (Plenary): Regulation, Standards, Governan...
Chris Marsden, University of Essex (Plenary): Regulation, Standards, Governan...
 
Lizzie Coles-Kemp, Royal Holloway University of London: Privacy Awareness: An...
Lizzie Coles-Kemp, Royal Holloway University of London: Privacy Awareness: An...Lizzie Coles-Kemp, Royal Holloway University of London: Privacy Awareness: An...
Lizzie Coles-Kemp, Royal Holloway University of London: Privacy Awareness: An...
 
Caspar Bowden EINS Summer School
Caspar Bowden EINS Summer SchoolCaspar Bowden EINS Summer School
Caspar Bowden EINS Summer School
 
Joanna Kulesza, University of Lodz: Transboundary Challenges of Privacy Prote...
Joanna Kulesza, University of Lodz: Transboundary Challenges of Privacy Prote...Joanna Kulesza, University of Lodz: Transboundary Challenges of Privacy Prote...
Joanna Kulesza, University of Lodz: Transboundary Challenges of Privacy Prote...
 
Network of Excellence in Internet Science (Supported Activities, Stavrakakis,...
Network of Excellence in Internet Science (Supported Activities, Stavrakakis,...Network of Excellence in Internet Science (Supported Activities, Stavrakakis,...
Network of Excellence in Internet Science (Supported Activities, Stavrakakis,...
 
Network of Excellence in Internet Science (Supported Activities, Callegati, U...
Network of Excellence in Internet Science (Supported Activities, Callegati, U...Network of Excellence in Internet Science (Supported Activities, Callegati, U...
Network of Excellence in Internet Science (Supported Activities, Callegati, U...
 
Network of Excellence in Internet Science (SEA4, Organisation of open calls, ...
Network of Excellence in Internet Science (SEA4, Organisation of open calls, ...Network of Excellence in Internet Science (SEA4, Organisation of open calls, ...
Network of Excellence in Internet Science (SEA4, Organisation of open calls, ...
 
Network of Excellence in Internet Science (SEA3, Dissemination & Cooperation,...
Network of Excellence in Internet Science (SEA3, Dissemination & Cooperation,...Network of Excellence in Internet Science (SEA3, Dissemination & Cooperation,...
Network of Excellence in Internet Science (SEA3, Dissemination & Cooperation,...
 
Network of Excellence in Internet Science (SEA2, Standardisation & Legislatio...
Network of Excellence in Internet Science (SEA2, Standardisation & Legislatio...Network of Excellence in Internet Science (SEA2, Standardisation & Legislatio...
Network of Excellence in Internet Science (SEA2, Standardisation & Legislatio...
 
Network of Excellence in Internet Science (SEA1, E-presence, Dissemination an...
Network of Excellence in Internet Science (SEA1, E-presence, Dissemination an...Network of Excellence in Internet Science (SEA1, E-presence, Dissemination an...
Network of Excellence in Internet Science (SEA1, E-presence, Dissemination an...
 
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
 
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
Network of Excellence in Internet Science (Multidisciplinarity and its Implic...
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Último (20)

How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 

Joss Wright, Oxford Internet Institute (Plenary): Privacy-Preserving Data Analysis - Mechanisms and Formal Guarantees

  • 1. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Privacy-Preserving Data Analysis Mechanisms and Formal Guarantees Joss Wright joss.wright@oii.ox.ac.uk Oxford Internet Institute Oxford University . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 1/57
  • 2. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Privacy What is privacy? Many definitions in different areas of application. A useful definition: informational self-determination Enable data subjects to control how, in what way, and to whom their data is made available. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 2/57
  • 3. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Privacy What is privacy? Within the privacy enhancing technologies community: Protecting the relations between communicating parties from observation. Context privacy. Anonymous communications. Preventing deduction of identities or attributes from collections of data. Data privacy. Strongly related concepts, but surprisingly separate fields of research. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 3/57
  • 4. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Data Privacy Protection of individual data subjects from identification. Typically we work within the context of statistical queries on databases. Counts, averages, histogram queries, etc. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 4/57
  • 5. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Model Consider a database as made up from a number of rows representing a single, unique individual, with columns showing attributes. All databases are not like this, but it’s useful for mechanism design and gives sufficient generality. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 5/57
  • 6. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Model Name Age Height Joss 31 168 Alice 30 144 Bob 25 200 . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 6/57
  • 7. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Actors Data subjects Owners of the data Holders and publishers of data Recipients of data Attacker . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 7/57
  • 8. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Trust in the System Where do we place trust in the system? Subjects Need not be trusted as they control their own data. Publishers May need to be trusted in how they gather the data. If you expect them to control release, they must be trusted. Data Recipients Adversarial and malicious. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 8/57
  • 9. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Basic Mechanisms Anonymization Remove explicit identifiers such as names. Privacy-preserving data mining Restrict queries to preserve privacy or results. Preferably enforced by the data publisher. Data peturbation Alter data to prevent undesirable inferences from being drawn . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 9/57
  • 10. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Anonymization Remove names or other obvious identifiers from data. Problems arise with quasi-identifiers. Combinations of record values that uniquely identify individuals. These can be difficult to specify or even detect. Exacerbated by the fact that data from external sources may combine with the database to form a quasi-identifier. We’ll come back to this. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 10/57
  • 11. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Anonymization Name Age Height Joss 31 168 Alice 30 144 Bob 25 200 Charles 31 187 David 27 168 . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 11/57
  • 12. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Anonymization Name Age Height Joss 31 168 Alice 30 144 Bob 25 200 Charles 31 187 David 27 168 . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 12/57
  • 13. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Anonymization Name Age Height Joss 31 168 Alice 30 144 Bob 25 200 Charles 31 187 David 30 168 Red values are unique, therefore quasi-identifiers. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 13/57
  • 14. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Anonymization Name Age Height Joss 31 168 Alice 30 144 Bob 25 200 Charles 31 187 David 30 168 Blue values are unique combinations, and so quasi-identifiers. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 14/57
  • 15. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Anonymization Methods One of the most well-known anonymizing mechanisms applied to data is k-anonymity Each unique set of records in a database should be combined with (1 − k) other records in the database. Any given record therefore describes at least k people. The probability that you are identified by that record is 1/k. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 15/57
  • 16. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions k-anonymity Name Age Height Joss 31 168 Alice 30 144 Bob 25 200 Charles 31 187 David 27 168 . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 16/57
  • 17. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions k-anonymity Name Age Height Joss [25-35] ≤180 Alice [25-35] ≤180 Bob [25-35] >180 Charles [25-35] >180 David [25-35] ≤180 . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 17/57
  • 18. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions k-anonymity Applied This is not a hypothetical issue. When Sweeney proposed k-anonymity, she demonstrated the risks. Took postcode, date of birth and sex from a published voter register Took anonymized published medical records Identified the record belonging to a former governor of Massachusetts. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 18/57
  • 19. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Beyond k-anonymity k-anonymity gives a basic level of anonymization that prevents an individual being simply re-identified from their published attributes. There are, naturally, more subtle issues. We may still be able to infer sensitive information about a person, even if we can’t directly identify them. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 19/57
  • 20. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions l-diversity k-anonymity ensures that an individual is indistinguishable from a group of other individuals, preventing their direct re-identification. It could be, however, that attributes shared by the entire group are sensitive. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 20/57
  • 21. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions l-diversity Name Age Height Illness Joss 31 168 Flu Alice 30 144 Flu Bob 25 200 HIV Charles 31 187 HIV David 27 168 Flu . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 21/57
  • 22. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions l-diversity Name Age Height Illness Joss [25-35] ≤180 Flu Alice [25-35] ≤180 Flu Bob [25-35] >180 HIV Charles [25-35] >180 HIV David [25-35] ≤180 Flu . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 22/57
  • 23. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions l-diversity Name Age Height Illness Joss [25-35] ≤180 Flu Alice [25-35] ≤180 Flu Bob [25-35] >180 HIV Charles [25-35] >180 HIV David [25-35] ≤180 Flu . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 23/57
  • 24. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions l-diversity Name Age Height Illness Joss [25-35] ≤200 Flu Alice [25-35] ≤200 Flu Bob [25-35] ≤200 HIV Charles [25-35] ≤200 HIV David [25-35] ≤200 Flu . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 24/57
  • 25. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions l-diversity l-diversity ensures that not only are all users k-anonymous, but that each group of users shares a variety of sensitive attributes. Variations ensure that all sensitive attributes are evenly or sufficiently distributed to avoid high probability association of user with attribute. One notable extenstion is t-closeness that ensures that the distribution of attributes in the group is close to the distribution across the entire table. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 25/57
  • 26. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Peturbation The above approaches maintain the consistency of the database. One of the oldest ideas is simply to replace genuine values with perturbed values that maintain almost-correct desirable properties. For numeric quantities this can simply be the addition of random noise according to some appropriate distribution. Obviously this works best for numerical data. For categories, this can result in attributes being re-assigned in a variety of ways. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 26/57
  • 27. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Permutation Sensitive attributes can be swapped between data records, maintaining statistical quantities such as aggregate counts, averages and distribution of data. This has to be performed sensitively with respect to the required analyses. Typically on an ad-hoc, per-database basis. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 27/57
  • 28. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Sweeney’s k-anonymity Re-identification In 2001, Sweeney set out to prove the ideas behind k-anonymity. Took publicly available voter registration data and published, anonymized medical records. (GIC Healthcare Data.) At the time of the data collection, William Weld was the governor of Massachusetts. According to the voter records, only six people in Cambridge, Massachusetts shared his birth date. Of those six, three were male. Only one lived within his (5-digit) ZIP code. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 28/57
  • 29. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Sweeney’s k-anonymity Re-identification The anonymized medical records contained over 100 attributes detailing diagnoses, procedures and medications. Sweeney calculated that 87% of US citizens were uniquely identifiable through the quasi-identifier of {sex, date of birth, 5-digit ZIP} 53% from {sex, date of birth, city} 18% from {sex, date of birth, county} . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 29/57
  • 30. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Netflix Prize Netflix wanted to improve its film recommendation algorithm. Published a database of over 100,000,000 film ratings by roughly 500,000 subscribers between 1999 and 2005. A million dollar prize was offered for an algorithm that would improve the recommendations given to users by a given degree of accuracy. “...all customer identifying information has been removed.” . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 30/57
  • 31. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Netflix Prize Narayanan and Shmatikov disagreed. Combined Netflix data with IMDb data to re-identify a large number of users. Linked Netflix ratings to IMDb profiles. Showed the entire viewing history of many users. Demonstrated how information such as political preference could be extracted from the available data. Proof of concept algorithm used IMDb. Easily adaptable for alternative information sources. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 31/57
  • 32. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Netflix Prize With 8 film ratings, 96% of subscribers can be uniquely identified. With 2 ratings, and dates, 64% can be completely deanonymized. With 2 ratings, and dates, 89% can be reduced to a possible 8 users. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 32/57
  • 33. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Netflix Prize Redux Following this publication, Netflix’s response was... ... to announce a second Netflix prize containing more data points, including age, zip code, gender and previously-chosen films. Eventually cancelled, but only in response to legal action from customers and concerns from the US Federal Trade Commission. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 33/57
  • 34. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Mechanisms Revisited The mechanisms we’ve looked at so far are: Typically ad-hoc based on the desired utility; the purpose for which the data will be used. Without formal guarantees. Quantifiable probability that individuals could be reidentified. Sensitive to auxiliary information from external data sources. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 34/57
  • 35. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Mechanisms Revisited We can also consider privacy mechanisms as falling into one of two families: Non-interactive Anonymize the data somehow, then release it. Interactive Keep the database secret, and only release results to queries. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 35/57
  • 36. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Non-Interactive Mechanisms Historically, the main way of doing things. Including most of the methods we’ve looked at so far. A major limitation of this approach to anonymization is that it requires you to fix the utility before you release the data. Data is either useless and anonymous Or useful and identifiable. It is difficult to predict interactions with data that might be released in the future. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 36/57
  • 37. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Interactive Mechanisms In interactive mechanisms, the data is never released. Instead, queries are sent to the holder of the database, who releases an answer. This approach is taken by the current state of the art: differential privacy. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 37/57
  • 38. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Differential Privacy In 1978, Dalenius stated the following desirable property for privacy-preserving statistical databases: “A statistical database should reveal nothing about an individual that could not be learned without access to the database.” This is impossible, largely due to the existence of auxiliary external information that can be combined with the data in the database. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 38/57
  • 39. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Differential Privacy ‘Suppose one’s height were considered a sensitive piece of information, and that revealing the height of an individual were a privacy breach. Assume that a database yields the average heights of women of different nationalities. An adversary who has access to the statistical database and the auxiliary information “Terry Gross is two inches shorter than the average Lithuanian woman” learns Terry Gross’ height, while anyone learning only the auxiliary information, without access to the average heights, learns relatively little.’ – Dwork . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 39/57
  • 40. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Differential Privacy Critically, this privacy breach occurs whether or not Terry Gross’ data is in the database. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 40/57
  • 41. iioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiioiio ioiioiioiioiio oiioiioiioiioiio iioiioxford internet ins�tute university of oxfordoiioi Privacy Mechanisms Notable Cases State of the Art Conclusions Differential Privacy Rather than guaranteeing that a privacy breach will not occur, differential privacy guarantees that the privacy breach will not occur due to the data in the database. Reformulated: Anything that can happen if your data is in the database could have happened even if your data weren’t in the database. This neatly accomodates any and all possible auxiliary information available now or in the future. It also divorces the privacy mechanism from the nature of the underlying data, providing a general mechanism. . . . . . . . . . . . . . . . . . . . . .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. .. Joss Wright Privacy-Preserving Data Analysis: 41/57
• 42. Differential Privacy Core: A randomised function K achieves ε-differential privacy if, for any two databases D1 and D2 differing on at most one element, and all S ⊆ Range(K): Pr[K(D1) ∈ S] ≤ e^ε × Pr[K(D2) ∈ S]
• 43. Differential Privacy Core: Alternatively: Pr[K(D1) ∈ S] / Pr[K(D2) ∈ S] ≤ e^ε. The ratio between the two probabilities is bounded by e^ε.
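As a rough sense of scale for this bound (my own worked illustration, not from the slides): for ε = 0.1 the factor e^ε is about 1.105, so no outcome's probability may shift by more than roughly 10% when one record changes; for ε = 1 the factor is about 2.72.

```python
import math

# Illustrative privacy parameters and the corresponding bound e^epsilon on how
# much any outcome's probability may change when a single record is added or removed.
for eps in (0.01, 0.1, 1.0, 5.0):
    print(f"epsilon = {eps:>5}: probability ratio bounded by {math.exp(eps):.3f}")
```

This is exactly the curve plotted on the next slide.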
• 44. The Exponential Function: [Figure: plot of e^ε for ε from 0 to 5; the bound grows from 1 at ε = 0 to roughly 150 at ε = 5.]
• 45. Differential Privacy Core: Translated: for any calculation that you make on a database, any result you get is (almost) equally probable whether or not a single person’s record is added to that database. Alternatively put: two databases that differ in a single record should be indistinguishable, with given probability, when accessed via the privacy mechanism.
• 46. Achieving Differential Privacy: How do we achieve this guarantee? A variety of mechanisms have been proposed in the literature, but Dwork’s original suggestion remains popular: appropriately chosen random noise is added to the result of a query of arbitrary complexity. Because the noise is added to the result, rather than to the stored data, the underlying database retains its accuracy. The Laplace distribution provides desirable properties for the appropriate noise.
• 47. The Laplace Distribution: [Figure: probability density of the Laplace distribution.]
• 48. Achieving Differential Privacy: How do we know how much noise to add? We use the L1-sensitivity of the function to bound the noise, defined as the amount by which the query result could change if a single record were added to the database. Recall that our guarantee is based around indistinguishability between similar databases. As an example: the count function (e.g. “How many people in the database are left-handed?”) can only differ by one. Other query types differ, but many complex queries have manageable L1-sensitivity.
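A minimal sketch of the Laplace mechanism applied to the count example above (my own illustration, assuming numpy; the function name is not from any particular library). The L1-sensitivity of a count is 1, so the noise scale is 1/ε:

```python
import numpy as np

def dp_count(records, predicate, epsilon):
    """Differentially private count: the true count plus Laplace noise.

    Adding or removing one record changes a count by at most 1, so the
    L1-sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    sensitivity = 1.0
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# Example: "How many people in the database are left-handed?"
# noisy = dp_count(people, lambda p: p["left_handed"], epsilon=0.5)
```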
• 49. Properties of Differential Privacy: Using the Laplace distribution to add noise provably adds the smallest amount of noise required to preserve privacy. The privacy parameter ε can be scaled for stronger or weaker guarantees: smaller values of ε (and hence more noise) decrease the likelihood that the databases can be distinguished from query results, but make those results less accurate.
• 50. Differential Privacy Illustrated: [Figure: two Laplace distributions centred at µ1 and µ2, the true results from two neighbouring databases, with two possible noisy outputs a and b marked on the x-axis; vertical axis Pr[x].]
• 51. Differential Privacy Illustrated (Explanation): In the previous slide, let µ1 and µ2 be the two “true” results of a query, such as a count, from two databases that differ in a single record. With random noise drawn from the Laplace distribution added, both a and b are possible “noisy” results of the query for either database. Importantly, the ratio between the probability of a given noisy result, such as a or b, under µ1 and the probability of that result under µ2 is bounded by e^ε, whichever result is observed.
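A small numeric check of that property (my own illustration; the density is written out by hand rather than taken from a library): with noise scale b = 1/ε and true answers µ1 and µ2 differing by the sensitivity of 1, the ratio of the two output densities at any observed value never exceeds e^ε.

```python
import math

def laplace_pdf(x, mu, b):
    """Density of the Laplace distribution with mean mu and scale b."""
    return math.exp(-abs(x - mu) / b) / (2 * b)

epsilon = 0.5
b = 1.0 / epsilon            # noise scale for a query with L1-sensitivity 1
mu1, mu2 = 42.0, 43.0        # "true" counts from two neighbouring databases

for x in (40.0, 42.5, 45.0): # a few possible noisy outputs, like a and b above
    ratio = laplace_pdf(x, mu1, b) / laplace_pdf(x, mu2, b)
    print(f"output {x}: density ratio {ratio:.3f} <= e^eps = {math.exp(epsilon):.3f}")
```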
• 52. Properties of Differential Privacy: Differentially private queries compose neatly in two senses. First, a complex sequence of queries can be given to the database owner, each depending on the accurate result of the previous query; at the end, only the final result need be perturbed. Second, each differentially private query consumes some amount of the privacy guarantee; further queries can be made until this budget is exhausted, at which point the database should be destroyed!
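A minimal sketch of the second sense of composition, the privacy budget (class and method names are my own, not taken from any real system): under sequential composition the ε values of successive queries add up, and no further queries are answered once the total budget is spent.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon spent under sequential composition."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        """Reserve epsilon for one query, or refuse if the budget is exhausted."""
        if epsilon > self.remaining:
            raise RuntimeError("Privacy budget exhausted: no further queries allowed.")
        self.remaining -= epsilon

# budget = PrivacyBudget(total_epsilon=1.0)
# budget.spend(0.4)   # first query
# budget.spend(0.4)   # second query
# budget.spend(0.4)   # raises: only 0.2 of the budget remains
```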
• 53. Practical Application: Privacy Integrated Queries (PINQ). For practical application, we do not want database owners to need to understand the theory. There is now a simple database query language, built on LINQ and similar in spirit to SQL, that automatically enforces differential privacy guarantees. It has been used in academic analyses, but not commercially.
• 54. Practical Application: Smart Grids. Recent work by Danezis demonstrates differentially private smart metering for electrical grids. It injects noise into billing by increasing the amount you pay. This rapidly becomes very expensive, but gives quantifiable privacy guarantees.
• 55. Future Work: Differential privacy is a very strong guarantee; how effectively can it be weakened? Other open directions include distributed settings for data sources and noise addition, and streaming, or otherwise changing, data rather than static databases.
• 56. Lessons: A step back: anonymizing data is hard, and we are only just beginning to realise just how hard. Differential privacy, and PINQ, are good examples of how to go about it and of what limitations we face. Netflix and other examples show that these risks are not isolated or theoretical, and this is before we look at Facebook, Google, or Amazon.
• 57. Lessons: If you are in a position where you need to anonymize data, think very carefully about how you treat the data and what you release. Eyeballing data and removing obvious linkages is not even close to sufficient: do it if you want to, but don’t claim the result is anonymized. The most important principle is data minimisation: only gather what you need, only use it for what you (initially) needed it for, and only share it when you must.