Many applications benefit from user location data, but location data raises privacy concerns. Anonymization can protect privacy, but identities can sometimes be inferred from supposedly anonymous data. This paper studies a new attack on the anonymity of location data. We show that if the approximate locations of an individual's home and workplace can both be deduced from a location trace, then the median size of the individual's anonymity set in the U.S. working population is 1, 21, and 34,980 for locations known at the granularity of a census block, census tract, and county respectively. The location data of people who live and work in different regions can be re-identified even more easily. Our results show that the threat of re-identification for location data is much greater when the individual's home and work locations can both be deduced from the data. To preserve anonymity, we offer guidance for obfuscating location traces before they are disclosed.
1. On the Anonymity of Home/Work
Location Pairs
Philippe Golle
Kurt Partridge
Presented by: Bo Begole
2. Location traces are useful for
personalized location-based services
[Figure: a location history table (timestamps and GPS coordinates, e.g. 11:57–12:45 at 37°26'39", -122°9'38") is mapped to inferred restaurant preferences by time of day (price range, cuisine such as Chinese or Italian), which feed personalized recommendations. Magitti, CHI 2008]
3. Location traces can be sensitive
They can reveal business connections, political
affiliation, medical condition, risky behaviors, etc.
Can damage reputations, unfairly raise insurance
premiums, increase divorce rates, etc.
Some ways to mitigate the risk
– “Only my trusted network provider knows my location”
» Location data is often shared with third party LBS providers
– “Privacy policies and legislation protect my location traces”
» Data collector may fall victim to attacks, or be dishonest themselves
– “I’ll report my location only infrequently when I need service”
» Many services require frequent location updates (e.g. friend finder)
– “I’ll report my location pseudonymously”
» Watch out for inferences!
4. Sometimes possible to infer identity by
combining data sources
Inference: what can be learned from combining existing knowledge with newly acquired information
E.g.: learn the medical record of William Weld [Swe00]
– Knowing the birth date and ZIP code of the governor of Massachusetts
– Can retrieve his health records from a supposedly anonymous database of state employee health-insurance claims
87% of the US population have a unique combination of date of birth, gender and postal code.
[Figure: Voter Registration records (name, street address, gender, postal code, date of birth, …) overlap Patient Records (gender, postal code, date of birth, cancer type) on the shared attributes gender, postal code and date of birth]
5. Inferences from location data
With 2-weeks’ worth of GPS data collected from a
subject’s car, Krumm [Pervasive 2007] showed we can
infer:
– Home address (median error < 60 m)
>5% of subjects are identifiable by combining the inferred address with
– a reverse geo-coder
– a Web-based white pages directory
7. The K-anonymity Guarantee
Problem
– If a location trace is unique…
– It may be combined with other public data
– To generate undesirable inferences
Solution: k-anonymity [Sweeney 2000]
– Principle: “Data is safe to release if at least k
people share the same attributes”
– Anonymity set: set of people with same attributes
– K-anonymity: size of the anonymity set
8. K-anonymity Example
– Cancer DB: (Post code: 34012, DOB 7/31/1945, Male)
– Goal: ensure 5-anonymity
– (Cambridge MA) 54,805 people
– (Cambridge MA, male) 26,854 people
– (Cambridge MA, 55 years old) 2,096 people
– (Cambridge MA, DOB: 7/31/1945) 6 people
– (Cambridge MA, DOB: 7/31/1945, Male) 3 people
– (Post code: 34012, DOB: 7/31/1945, Male) 1 person
Combine with voter registration: William Weld
The same can be done for 69 percent of the 54,805
people on the voting list of Cambridge, MA.
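The counting behind these anonymity-set sizes can be sketched in a few lines. This is an illustrative example with made-up records, not code or data from the paper: each person is reduced to a tuple of quasi-identifiers, and everyone sharing the same tuple forms one anonymity set.

```python
from collections import Counter

# Hypothetical records (postal code, date of birth, sex) -- illustrative
# values only, not drawn from any real database.
records = [
    ("34012", "7/31/1945", "M"),
    ("34012", "7/31/1945", "M"),
    ("34012", "1/2/1960",  "F"),
    ("02139", "7/31/1945", "M"),
]

def anonymity_set_size(rows, person):
    """Number of people sharing this person's quasi-identifiers."""
    return Counter(rows)[person]

def k_anonymity(rows):
    """k of the whole release: the size of the smallest anonymity set."""
    return min(Counter(rows).values())

print(anonymity_set_size(records, ("34012", "7/31/1945", "M")))  # -> 2
print(k_anonymity(records))  # -> 1: someone is unique, so 5-anonymity fails
```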
9. Inferences from location data
What level of accuracy of location information
could result in small sizes of k anonymity for
some portion of the population?
Assume a location trace reveals approximate …
– region someone lives (county or postal code)
– and also, region someone works (county or postal code)
County and postal code are coarse-grained
– A trace that reveals less is probably not very useful
– Useful traces in fact reveal considerably more!
– Therefore, our results are an upper bound on the size of
the anonymity set (things could be worse!)
10. Data source
Longitudinal Employer-Household Dynamics
(LEHD)
– Census Bureau program to compile information about
where people work and where they live
» County
» Census Tract ~= US Postal code
– Also records age, earnings and distribution across
industries.
– 103,289,243 workers from 42 states
LEHD is a synthetic dataset
– “The key statistical property to preserve in the
synthetic data is the joint distribution of workers
across home and work areas”
– 3 implicates (synthetic datasets) are available for
researchers to check robustness of results
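The paper's measurement reduces to grouping workers by their home/work pair and counting each group. A minimal sketch of that computation, using invented tract identifiers in place of the LEHD data:

```python
from collections import Counter
from statistics import median

# Toy stand-in for the LEHD joint distribution of workers across home
# and work areas; the tract names are invented for illustration.
workers = [
    ("tract_A", "tract_A"),   # lives and works in the same tract
    ("tract_A", "tract_B"),
    ("tract_A", "tract_B"),
    ("tract_C", "tract_B"),
]

# Each worker's anonymity set is everyone with the same home/work pair.
pair_counts = Counter(workers)
per_worker_k = [pair_counts[w] for w in workers]

print(per_worker_k)          # -> [1, 2, 2, 1]
print(median(per_worker_k))  # -> 1.5
```

Run over the full dataset, and separately with home-only or work-only keys, this per-worker distribution is what the plots on the following slide summarize.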
11. Size of live/work anonymity set
[Figure: log-scale plots of anonymity set size versus fraction of the working population, at census-tract granularity (left) and county granularity (right), for home location only, work location only, and both home & work. At census-tract granularity, 7% have k-anonymity of 1 and 50% have k-anonymity of 21; at county granularity, 5% have k-anonymity of 92 and 50% have k-anonymity of 35,000.]
12. Influence of living & working in the
same region versus different regions
[Figure: log-scale plots of anonymity set size versus fraction of the population, at census-tract (left) and county (right) granularity, comparing workers who live and work in the same location, workers who live and work in different locations, and all workers.]
13. Conclusion: Even coarse grained
location traces can identify individuals
Approximate home and work locations narrow a person to a
very small anonymity set
Geographic Region              Median Anonymity Set Size
County                         34,980
Census Tract (~postal code)    21
Census Block                   1
Being unique is not quite the same as being identifiable
– But the identity of a specific individual may be found by combining other
sources (white pages, patient records, police records, private
knowledge, employer records, Facebook pages, …)
We can estimate the k-anonymity of location-aware and other context-aware
technologies prior to widespread adoption using public datasets
– Population Census
– Longitudinal Employment Household Data (LEHD)
– American Time Use Survey (ATUS)
– Japan Statistics Bureau Survey on Time Use and Leisure Activities
14. Techniques to Estimate Privacy Effects of
Context-Aware Technologies
K-anonymity (the size of the anonymity set) is a metric to
characterize the level of privacy
Can estimate k-anonymity of context-awareness
technologies with public datasets
– Population Census
– Longitudinal Employment Household Data (LEHD)
– American Time Use Survey (ATUS)
– Japan Statistics Bureau Survey on Time Use and Leisure
Activities
K-anonymity is not a perfect measure
– May still leak information
» L-diversity [MGKV 2006]
» ε-differential privacy [Dwork et al]
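To make the l-diversity caveat concrete, here is a small illustrative check (hypothetical data, not code from the cited papers): a release can be k-anonymous yet still leak the sensitive attribute if everyone in a group shares the same value.

```python
from collections import defaultdict

def is_l_diverse(rows, l):
    """Distinct l-diversity: every group of records sharing the same
    quasi-identifiers must contain at least l distinct sensitive values."""
    groups = defaultdict(set)
    for quasi_ids, sensitive in rows:
        groups[quasi_ids].add(sensitive)
    return all(len(values) >= l for values in groups.values())

# 2-anonymous (two identical quasi-identifier tuples), but both records
# carry the same diagnosis, so the release is not 2-diverse.
release = [
    (("34012", "M"), "cancer"),
    (("34012", "M"), "cancer"),
]
print(is_l_diverse(release, 2))  # -> False
```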
Editor's notes
Thank you. I'll be presenting the work of my colleagues Philippe Golle, who is possibly the nicest man on the planet, and Kurt Partridge, who is probably the most healthy-living person I know. The research is an exploration of privacy risk from location data.
To explore this question, we turn to a publicly available data set called the Longitudinal Employer-Household Dynamics, which contains data on more than 100 million workers in the US, including the county and census tract (which is roughly similar to a postal code) where they live and work. I need to point out that the publicly available LEHD is actually synthesized from the real data set to preserve privacy. However, the key characteristic of the synthesis is that the distribution of where people live and where they work is preserved in the synthetic data. The Census Bureau makes three synthetic sets available, and Philippe and Kurt have conducted their analysis on all three sets with identical results.
Here are the results. The left diagram is for location at the granularity of census tract, the right is at the granularity of county. The vertical axis is the size of the anonymity set on a logarithmic scale. The horizontal axis is the fraction of the population. The curves plot the fraction of the population who have an anonymity set of a specific size. The red curve is derived from looking only at where a person works and the green from where they live. Notice that for those data alone, 50% of the population have an anonymity set of size 1,000 or more. But if you combine where a person lives with where they work, the anonymity set for 7% of the population is just 1, and for 50% of the population it is only as high as 21. On the county scale, the anonymity set size is substantially higher, with 5% being no smaller than 92 and 50% having greater than 35,000.
Median size of anonymity set: census tract, 21 people; county, 35,000 people. 5th-percentile size of anonymity set: census tract, 1 person; county, 92 people.
County: there are 2,784 counties and county equivalents (boroughs, parishes) in the 42 states included in the LEHD dataset.
Census Tract: census tracts are county subdivisions defined by the U.S. Census Bureau. In terms of size and number of residents, they are roughly comparable to ZIP codes. The LEHD dataset contains 64,881 tracts where workers live and 52,852 tracts where they work. The mean number of workers living in a tract is 1,592 and the median is 1,479. The mean number of people working in a tract is 1,954 and the median is 951 (it ranges from zero in completely residential tracts to 166,680 in a tract of Los Angeles County).
Workers who live and work in different locations (brown stars) have a much smaller anonymity set (i.e. are less anonymous) than those who live and work in the same location (blue diamonds). This holds true both for locations revealed at census-tract (left graph) and county granularity (right graph). The traces of workers who cross location boundaries to go to work are particularly vulnerable to re-identification attacks. Across the states included in the LEHD dataset, 94.1% of workers live and work in different census tracts. These numbers show that the extra anonymity afforded by living and working in the same census tract is uncommon. The fraction of workers who live and work in different counties is 43.5%. The percentages range from 63.8% in Virginia to 10.7% in Hawaii. Living and working in the same county is more frequent, and therefore had a bigger effect on the national average.
The point is that even coarse grained location traces can render individuals uniquely identifiable. The size k for US county granularity is fairly secure, with a k of 34 thousand for 50% of the population. For census tract and census block, which is described in the paper, k is quite small for 50% of the US population. Such small anonymity set sizes can leave some portion of the population vulnerable to being uniquely identified by combining with data from other sources.
One final takeaway is this. It is possible to understand the privacy risk of context awareness before we attain widespread adoption of the technologies. There are publicly available data sets that can serve as surrogates for context technologies, such as the LEHD which was described in this paper. In other work we have used census data itself, and there is a data set of a large number of individuals' self-described daily activities at the granularity of 15-minute time periods. The Japan Statistics Bureau provides a similar data set, and there are probably comparable data from other governments.
Write questions on a 1,000-yen note and I will take it to the author. Do you have any questions that I can take back to the authors?
We and others in this research field have been collecting traces of location for personalization of location-based services. From location traces, we can infer a person's individual preferences. For example, in the Magitti project that we reported at CHI 2008, we used location traces to identify a person's activity patterns and store and restaurant preferences, which were used to improve the quality of recommendations for local venues.
We know that some users are uncomfortable with the existence of records of their movement for a variety of reasons. In addition to restaurant and store preferences, such traces can reveal business connections, political affiliation, medical conditions and other aspects of a person's life that they might not want to expose, so that they don't have to deal with undesired consequences.
There are a number of ways to mitigate the risk of such exposure. You may trust that a service provider will not misuse the data, but sometimes they share such data with business affiliates. Legislation may be enacted to prevent misuse, but malicious parties may not abide by the laws. You can also just report your data infrequently, as needed, but some services require frequent updates in order to be useful. Finally, you can just use a false name, such as a user ID number, with the data. The risk here is that someone could potentially reverse-identify you from some unique characteristic of the data.
Another potential objection: do all the processing locally. The problem is that this consumes bandwidth for services that integrate with other data, and doesn't work well for social applications. "Pseudonymously" means, for example, obtaining service under a user ID that cannot be tied to the user's real identity.
So, for example, here we have two separate publicly available databases. One is the voter registration records, which record a person's name, gender and postal code. The second is a record of state employee health claims for cancer, which carefully does not include the names of people who made claims. However, in 2000, Sweeney used these databases to discover that William Weld, the governor of Massachusetts at the time, had cancer. Sweeney's analysis of US census data showed that 87% of the U.S. population is uniquely identifiable given their full date of birth (year, month and day), sex, and the ZIP code where they live.
Returning now to location traces, John Krumm showed at Pervasive 2007 that you can infer a person's home address with a median error of 60 meters from 2 weeks of GPS location data. Combining that with a reverse geo-coder and online white pages, he was able to accurately identify more than 5% of the people.
Krumm went on to show a number of techniques that obscure an individual's data traces, making it much more difficult to identify a particular individual. He showed that you can hide location points near the person's home, or add noise to all of the data points, or round the location samples to points on a grid. One of the conclusions of Krumm's work was that location obfuscation defeats re-identification at the cost of a significant loss in location precision, which could render some location-based services unusable. Yet there is no guarantee that even this degree of location obfuscation protects against other or all re-identification attacks.
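The grid-rounding and noise ideas sketched above can be illustrated as follows; the cell size and noise scale are arbitrary illustrative choices, not Krumm's actual parameters:

```python
import random

def snap_to_grid(lat, lon, cell_deg=0.01):
    """Round a coordinate pair to a grid of cell_deg degrees
    (roughly 1 km of latitude at this setting)."""
    return (round(lat / cell_deg) * cell_deg,
            round(lon / cell_deg) * cell_deg)

def add_noise(lat, lon, sigma_deg=0.005):
    """Perturb a point with independent Gaussian noise on each axis."""
    return (lat + random.gauss(0.0, sigma_deg),
            lon + random.gauss(0.0, sigma_deg))

# Every point inside the same cell maps to the same released location,
# so nearby readings become indistinguishable from one another.
print(snap_to_grid(37.4443, -122.1605))  # ~ (37.44, -122.16)
```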
So, to summarize: if a location trace is unique, it may be combined with other public data to generate undesirable inferences. But how big is the risk? What can we use to estimate the degree of risk? In 2000, Sweeney proposed one metric of privacy certainty called k-anonymity, which states that data are safe to release if at least k people share the same attributes. Those k people are called the anonymity set, and k describes the size of that set.
For example, recall the cancer database. Suppose we had a goal of ensuring k-anonymity of at least 5. If the dataset revealed only the city in which the patients lived, the k-anonymity size is 54 thousand. Adding gender reduces it to 26 thousand, which is still quite good. We can even add the age of the patient, reducing k-anonymity to approximately 2 thousand. But when we disclose date of birth, plus gender, plus postal code, the k-anonymity goes down to 6, then 3, and finally to just a single individual. The same can be done for 69 percent of the 54,805 people on the voting list of Cambridge, MA.
How would we apply this principle to location trace data? What granularity of location information could result in small k for some portion of the population? Well, let's look at some very coarse grain sizes to begin with: county and ZIP code. Assume a GPS location trace reveals not just where a person lives, which Krumm showed is inferable to 60 m accuracy, but also where they work, which is similarly inferable from a location trace. We start at this level because it is very coarse grained; anything less precise would probably not be useful to any location-based service. And useful traces would generally be even more precise. So the results are an upper bound on the anonymity set.