PRESERVING PRIVACY IN
SEMANTIC-RICH TRAJECTORIES
OF HUMAN MOBILITY
Anna Monreale, Roberto Trasarti, Dino
Pedreschi, Chiara Renso
KDDLab, Pisa
Vania Bogorny
Univ. Santa Catarina, Brazil
Knowledge Discovery and Delivery Lab
(ISTI-CNR & Univ. Pisa)
www-kdd.isti.cnr.it
ANONIMO MEETING, Pisa, September 20-21, 2010
SPRINGL 2010, San Jose, November 2, 2010
How the story begins…
Semantic trajectories represent the important places
visited by people.
This information can be privacy sensitive!
We should find a good generalization of the visited
places… preserving semantics! But how?
Can we use a taxonomy of places to generalize
and find anonymous datasets?
Let’s ask Anna, Dino and Roberto for help!
Semantic Trajectories
 Availability of trajectory data increases
 From raw trajectories to new forms of trajectory data with
richer semantic information: semantic trajectories
 Semantic trajectories represent the traces of moving objects as
sequences of stops and moves
 A semantic trajectory can be represented as the sequence
of stops, e.g.
<Home, Work, ShoppingCenter, Gym>
Semantic Trajectory and
Privacy
 Data owner should not reveal personal sensitive
information
 Disclosure of personal sensitive information puts
the citizen’s privacy at risk.
 Hiding personal identifiers may not be sufficient
 Need for new privacy-preserving DT techniques
 Privacy by Design
 Natural trade-off between privacy quantification
and data utility
 Analysis results should not be altered significantly
 Privacy has to be maximized
Semantic Trajectories Analysis and
Privacy Issues
 Analyzing datasets of semantic trajectories
may cause privacy issues
 A visited place can allow an attacker to infer sensitive
personal information about an individual
 Example: From the fact that a person has
stopped at an oncology clinic, an attacker can
derive private information about that
person’s health.
Semantic Trajectories Analysis
and Privacy Issues
k-anonymity is not enough for robust protection.
When individuals with similar trajectories stop at
the same sensitive place, an attacker can easily infer
each individual's sensitive information.
Example:
#U1 <Park, Restaurant, Oncology Clinic>
#U2 <Park, Restaurant, Oncology Clinic>
This dataset is 2-anonymous, but the attacker can
still infer that the user has been to the Oncology
Clinic!
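The example above can be checked mechanically. The following is a minimal sketch, assuming that "Oncology Clinic" is the only sensitive place and that every other stop is a quasi-identifier; the function names are illustrative and not part of the original framework.

```python
from collections import Counter

# Assumed labelling: which places count as sensitive is application dependent.
SENSITIVE = {"Oncology Clinic"}

dataset = {
    "U1": ("Park", "Restaurant", "Oncology Clinic"),
    "U2": ("Park", "Restaurant", "Oncology Clinic"),
}

def split(trajectory):
    """Separate a trajectory into its quasi-identifier and sensitive stops."""
    quasi = tuple(p for p in trajectory if p not in SENSITIVE)
    sensitive = tuple(p for p in trajectory if p in SENSITIVE)
    return quasi, sensitive

def k_anonymity(data):
    """Smallest number of users sharing the same quasi-identifier sequence."""
    counts = Counter(split(t)[0] for t in data.values())
    return min(counts.values())

def inference_probability(data, quasi, sensitive):
    """Fraction of users matching `quasi` whose sensitive stops equal `sensitive`."""
    matching = [t for t in data.values() if split(t)[0] == quasi]
    hits = sum(1 for t in matching if split(t)[1] == sensitive)
    return hits / len(matching)

k = k_anonymity(dataset)
p = inference_probability(dataset, ("Park", "Restaurant"), ("Oncology Clinic",))
print(k, p)
```

The dataset is 2-anonymous on the quasi-identifier sequence, yet the sensitive stop is inferred with probability 1, which is the gap that c-safety is meant to close.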
The Privacy Framework
 Anonymizes datasets of semantic trajectories
 Based on semantic generalization and the
notion of c-safety, which is similar to the notion of
l-diversity for relational, tabular data
 Relies on a taxonomy of places and on the notions of
quasi-identifier places and sensitive places
 Preserves pattern mining results
Quasi-identifier and Sensitive
stops
 The taxonomy of places
 Represents important places and their semantic
categories in a given domain
 quasi-identifier places: can be used to infer the
identity of the user
 sensitive places: can disclose sensitive
information about the user
 In general we don’t have an a priori
classification, since it depends on the
application and the context
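One simple way to hold such a taxonomy in code is a child-to-parent map with a separate set of sensitive labels. This is only a sketch: the place names and the sensitive/quasi-identifier split below are hypothetical and are not the taxonomy used by the authors.

```python
# Hypothetical taxonomy (child -> parent); the labels are illustrative only.
PARENT = {
    "Oncology Clinic": "Clinic",
    "Dental Clinic": "Clinic",
    "Clinic": "Health Service",
    "Pizzeria": "Restaurant",
    "Restaurant": "Entertainment",
    "Health Service": "Place",
    "Entertainment": "Place",
}

# Assumed labelling: in practice it depends on the application and context.
SENSITIVE = {"Oncology Clinic", "Dental Clinic"}

def ancestors(place):
    """Return the path from `place` up to the taxonomy root, inclusive."""
    path = [place]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def generalize(place, hops=1):
    """Replace `place` with its ancestor `hops` levels up (capped at the root)."""
    path = ancestors(place)
    return path[min(hops, len(path) - 1)]

def is_sensitive(place):
    """True if `place` is labelled sensitive; everything else is a quasi-identifier."""
    return place in SENSITIVE

print(generalize("Oncology Clinic"))
print(ancestors("Pizzeria"))
```

Keeping the sensitive labels outside the tree makes it easy to swap them per application, which matches the remark that no a priori classification exists.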
Privacy place taxonomy
Privacy Model
 Adversary Knowledge:
 how we anonymize the data
 the privacy place taxonomy describing the levels of
abstraction
 the user U is in the dataset
 a quasi-identifier place sequence SQ visited by the user
U
 Attack Model:
 Given SQ, the attacker builds a set of candidate semantic
trajectories containing SQ and tries to infer the sensitive
places visited by U.
 We denote by Prob(SQ,S) the probability that, given a
quasi-identifier place sequence SQ related to a user U,
the attacker infers the sequence of sensitive places S
visited by the user.
C-Safe Dataset
We want to control the probability Prob(SQ,S).
 A dataset ST is said to be c-safe with respect to the place set Q if,
for every quasi-identifier place sequence SQ
and for each set of sensitive places S,
Prob(SQ,S) ≤ c, with c ∈ [0,1].
 Given a sequence of sensitive places S = s1, …, sh
and a quasi-identifier sequence SQ, the
probability of inferring S is the conditional
probability:
Prob(SQ,S) = P(S|SQ)
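The definition can be turned into an empirical check. The sketch below makes two simplifying assumptions that the slides do not state: each trajectory is stored as a pair (quasi-identifier sequence, set of sensitive places), and P(S|SQ) is estimated for a single sensitive place as the share of trajectories whose quasi-identifier sequence equals SQ and that contain that place. The data is a toy example.

```python
from fractions import Fraction

def prob(dataset, sq, s):
    """Estimate P(S | SQ): share of trajectories matching SQ that contain S."""
    candidates = [sens for quasi, sens in dataset if quasi == sq]
    if not candidates:
        return Fraction(0)
    hits = sum(1 for sens in candidates if s in sens)
    return Fraction(hits, len(candidates))

def is_c_safe(dataset, c):
    """Check Prob(SQ, S) <= c for every observed SQ and sensitive place S."""
    for sq in {quasi for quasi, _ in dataset}:
        places = set().union(*(sens for quasi, sens in dataset if quasi == sq))
        if any(prob(dataset, sq, s) > c for s in places):
            return False
    return True

# Toy data (assumed): (quasi-identifier sequence, sensitive places visited).
toy = [
    (("Park", "Restaurant"), {"Clinic A"}),
    (("Park", "Restaurant"), {"Clinic B"}),
    (("Park", "Restaurant"), {"Clinic C"}),
    (("Park", "Restaurant"), set()),
]

print(prob(toy, ("Park", "Restaurant"), "Clinic A"))
print(is_c_safe(toy, 0.3), is_c_safe(toy, 0.2))
```

Using `Fraction` keeps the estimates exact, so the comparison against the threshold c is not affected by rounding.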
How can we obtain a c-safe dataset?
The CAST (C-safe Anonymization of Semantic
Trajectories) algorithm guarantees that P(S|SQ)
≤ c for every pair of sequences S and SQ.
While (|S| > 0)
    SL = { s ∈ S | length(s) = MaxLength(S) }
    While (|SL| >= m)
        1. Compute the cost of every possible group Gi of m
           sequences in SL as: CostGi = CostQGi + CostSGi.
        2. Apply the generalization with the lowest cost,
           storing the results in R.
        3. Remove Gi from S and SL.
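The loop above can be sketched in Python as follows. This is not the authors' implementation: `cost` and `generalize_group` are hypothetical callables standing in for CostQ + CostS and for the taxonomy-based generalization, `length` is a parameter because the worked example suggests that length may be measured on quasi-identifier stops only, and the handling of fewer than m leftover sequences is an assumption, since the pseudocode does not say what happens to them.

```python
from itertools import combinations

def cast_grouping(sequences, m, cost, generalize_group, length=len):
    """Greedy grouping loop following the pseudocode's three steps."""
    remaining = list(sequences)
    result, leftover = [], []
    while remaining:
        max_len = max(length(s) for s in remaining)
        longest = [s for s in remaining if length(s) == max_len]
        while len(longest) >= m:
            # Step 1: cost of every possible group of m sequences.
            best = min(combinations(range(len(longest)), m),
                       key=lambda idx: cost([longest[i] for i in idx]))
            group = [longest[i] for i in best]
            # Step 2: apply the cheapest generalization and store it.
            result.extend(generalize_group(group))
            # Step 3: remove the group from both working sets.
            for s in group:
                longest.remove(s)
                remaining.remove(s)
        # Assumption: fewer than m sequences of this length are set aside.
        for s in longest:
            remaining.remove(s)
            leftover.append(s)
    return result, leftover

# Toy run with stand-in cost and generalization functions.
seqs = [(1, 2, 3), (1, 2, 4), (9, 9, 9), (1, 2, 5), (7,)]
result, leftover = cast_grouping(
    seqs, 3,
    cost=lambda g: sum(sum(s) for s in g),
    generalize_group=lambda g: g,
)
print(result, leftover)
```

Enumerating every group of m sequences grows combinatorially with the size of SL, so this sketch is only practical for small inputs.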
Example (1): The process
Consider the following set of sequences, with m=3 and c=0.45:
S = {<S1, R2, H1, R1, C1, S2>
<S3, D1, R1, C1, S2>
<S1, P3, C2, D2, S2>
…}
Example (2): CostQ
CostQ is the number of hops in the taxonomy tree needed to generalize the
quasi-identifier sequences to a common one.
Consider the group:
<S1, R2, H1, R1, C1, S2>
<S3, D1, R1, C1, S2>
<S1, P3, C2, D2, S2>
CostQ = 6 + 6 + 6 = 18
<Station,Place,Entertainment,S2 (H1,C1)>
<Station,Place,Entertainment,S2 (C1)>
<Station,Place,Entertainment,S2 (C2)>
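Hop counting of this kind can be sketched as below. The taxonomy is hypothetical (the slide's taxonomy figure is not reproduced here), so the numbers do not match the 18 hops above; the sketch also assumes that the quasi-identifier sequences in a group have equal length and are generalized position by position to their lowest common ancestor.

```python
# Hypothetical taxonomy (child -> parent); not the one used on the slides.
PARENT = {
    "S1": "Station", "S3": "Station",
    "R1": "Restaurant", "R2": "Restaurant",
    "Restaurant": "Entertainment",
    "Station": "Place", "Entertainment": "Place",
}

def path_to_root(node):
    """Nodes from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def common_ancestor(nodes):
    """Lowest node that lies on every node's path to the root."""
    paths = [path_to_root(n) for n in nodes]
    for candidate in paths[0]:
        if all(candidate in p for p in paths):
            return candidate
    raise ValueError("nodes do not share a root")

def hops(node, ancestor):
    """Number of edges between `node` and one of its ancestors."""
    return path_to_root(node).index(ancestor)

def cost_q(group):
    """Total hops to lift aligned quasi-identifier stops to a common label."""
    total, generalized = 0, []
    for column in zip(*group):
        target = common_ancestor(column)
        total += sum(hops(n, target) for n in column)
        generalized.append(target)
    return total, tuple(generalized)

group = [("S1", "R1"), ("S3", "R2"), ("S1", "R1")]
print(cost_q(group))
```

Each of the three sequences needs one hop per position here, so the total grows with both the group size and how far apart the places sit in the tree.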
Example (3): CostS
CostS is the number of hops in the taxonomy tree needed to generalize the
sensitive places in order to obtain c-safety.
From the generalized group:
<Station,Place,Entertainment,S2 (H1,C1)>
<Station,Place,Entertainment,S2 (C1)>
<Station,Place,Entertainment,S2 (C2)>
CostS = 3
The total cost of this group is 21 hops (18 + 3),
the lowest among the candidate groups.
<Station, Place, H1, Entertainment, Clinic, S2>
<Station, Place, Entertainment, Clinic, S2>
<Station, Place, Clinic, Entertainment, S2>
Example (4): Why is it c-safe?
<Station,Place,Entertainment,S2 (H1,C1)>
<Station,Place,Entertainment,S2 (C1)>
<Station,Place,Entertainment,S2 (C2)>
SQ = ⟨Station, Place, Entertainment, S2⟩
Probability of inference: P(SQ,H1) = 1/3 < c, P(SQ,C1) = 2/3 > c and
P(SQ,C2) = 1/3 < c
We need to generalize C1 to the next higher level in the
taxonomy: Clinic.
The probability of C1 becomes 2/5 < c.
C-safe dataset:
<Station, Place, H1, Entertainment, Clinic, S2>
<Station, Place, Entertainment, Clinic, S2>
<Station, Place, Clinic, Entertainment, S2>
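With the threshold c = 0.45 from Example (1), the comparisons in this example can be written out numerically. The values are taken from the slide; the derivation of the 2/5 figure after generalization is not shown in the deck and is simply restated here.

```latex
\[
\begin{aligned}
P(\mathrm{H1} \mid SQ) &= \tfrac{1}{3} \approx 0.33 \le 0.45, \\
P(\mathrm{C1} \mid SQ) &= \tfrac{2}{3} \approx 0.67 > 0.45, \\
P(\mathrm{C2} \mid SQ) &= \tfrac{1}{3} \approx 0.33 \le 0.45, \\
\text{after generalizing C1 to Clinic:}\quad & \tfrac{2}{5} = 0.40 \le 0.45 .
\end{aligned}
\]
```

Only C1 violates the bound before generalization, which is why it is the only place that has to be lifted in the taxonomy.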
Experiments
The dataset contains trajectories of
17000 moving cars in Milan, collected
through GPS devices over one week.
From it we obtained 6225 semantic trajectories with an
average length of 5.2 stops.
We ran a sequential pattern mining algorithm and
measured the quality of the results with two
measures:
 the coverage coefficient
 the distance coefficient
Experiments: Quality of the
analysis
The coverage coefficient measures how many of the
patterns extracted from the original dataset
are covered by (i.e., have a superclass in the taxonomy among)
the patterns extracted from the anonymized
dataset.
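One possible reading of this measure is sketched below. The slides do not give the exact formula, so the normalization (share of original patterns that are covered) and the position-by-position matching of equal-length patterns are assumptions, and the taxonomy is hypothetical.

```python
# Hypothetical taxonomy (child -> parent), for illustration only.
PARENT = {"C1": "Clinic", "C2": "Clinic", "Clinic": "Place",
          "S1": "Station", "Station": "Place"}

def is_ancestor_or_self(general, specific):
    """True if `general` equals `specific` or lies above it in the taxonomy."""
    node = specific
    while node != general and node in PARENT:
        node = PARENT[node]
    return node == general

def covers(general_pattern, pattern):
    """True if each item of `pattern` is generalized by the aligned item."""
    return (len(general_pattern) == len(pattern) and
            all(is_ancestor_or_self(g, p)
                for g, p in zip(general_pattern, pattern)))

def coverage_coefficient(original, anonymized):
    """Share of original patterns covered by some anonymized pattern."""
    if not original:
        return 0.0
    covered = sum(1 for p in original
                  if any(covers(a, p) for a in anonymized))
    return covered / len(original)

original = [("S1", "C1"), ("S1", "C2"), ("C1", "S1")]
anonymized = [("Station", "Clinic")]
print(coverage_coefficient(original, anonymized))
```

Two of the three original patterns are generalized by the single anonymized pattern, while the third has its places in the opposite order and is therefore not covered.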
Experiments: Coverage
Coefficient
Experiments: Quality of the
analysis
The distance coefficient represents the distance, in
terms of steps in the taxonomy, needed to transform
the patterns extracted from the
original dataset into those extracted from the
anonymized dataset.
Experiments: Distance
Coefficient
Conclusions and Future work
 Improve the algorithm with better heuristics,
and so that it does not consider only groups of a
fixed size.
 More experiments with other mining
algorithms
 More utility measures for the evaluation of
results
 Another future research direction is
the exploitation of c-safe semantic
trajectory datasets for the semantic tagging of
trajectories. How does the anonymization step
