Privacy Preserving Publication of Set-Valued Data

Written by {ga=terrovitis}


Set-values are a natural representation of data that appear in a large number of application
areas, ranging from retail sales logs to medical records. Set-valued datasets are sparse
multidimensional data that usually contain unique combinations of values. Publishing such
datasets poses great dangers to the privacy of the individuals that are associated with the
records: adversaries with partial background knowledge can use that knowledge to retrieve the
whole record that is associated with an individual.

The aim of this article is to provide an overview of the basic anonymization methods for
set-valued data. Anonymization methods sanitize set-valued datasets in such a way that
the privacy of the individuals who are associated with the data is guaranteed. The article
describes the basic privacy guarantees that are used in the research literature, and briefly
presents the state-of-the-art anonymization methods.

Privacy preserving publishing of set-values

As technology infiltrates more and more aspects of our lives, each human activity leaves a
digital trace in some repository. Vast amounts of personal data are implicitly or explicitly created
each day, and one is rarely aware of the extent of the information that is kept and processed
about him or her. These personal data give rise to significant concerns about user privacy, since
important and sensitive details about one's life are collected and exploited by third parties.

The goal of privacy preservation technologies is to provide tools that allow
greater control over the dissemination of user data. A promising trend in the
field is Privacy Preserving Data Publishing (PPDP), which allows sharing of
anonymized data. Anonymizing a dataset is not limited to the removal of direct
identifiers that might exist in the dataset, e.g., the name or the Social Security
Number of a person. It also includes removing secondary information, e.g., age
and zipcode, that might lead indirectly to the true identity of an individual. This
secondary information is often referred to as quasi-identifiers.

A basic attack scenario against user privacy is as follows: an adversary who knows some
characteristics of a person, e.g., age and zipcode, searches the anonymized data for records
that match her background knowledge. If a single record is found, then she can be certain that
the record refers to the person she knows. The sparser the data are, the more unique
combinations exist, and the easier it is for an adversary to locate unique records that
correspond to specific users. This makes collections of set-values, which are sparse
multidimensional data, very hard to anonymize effectively without compromising user privacy
[Aggarwal 2007]. There are several methods in the research literature that address the challenges
of anonymizing set-valued data. I present the basic techniques classified in three categories:
anonymization methods that protect against identity disclosure, methods that protect against
attribute disclosure, and methods that offer differential privacy.




Protection against identity disclosure

Protection against identity disclosure guarantees that adversaries will not be able to
associate specific records with a known individual. The most popular guarantee is
k-anonymity [Samarati 2001, Sweeney 2002]. k-anonymity guarantees that each record is
indistinguishable from at least k-1 other records, with respect to the quasi-identifiers: every
combination of quasi-identifier values appears either zero or at least k times in the
anonymized dataset.
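
As a concrete illustration, the following sketch (mine, not taken from the cited papers) treats each record's full item set as its quasi-identifier and checks whether every distinct record occurs at least k times; all names and data are illustrative.

```python
from collections import Counter

def is_k_anonymous(records, k):
    """Treat each record's full item set as its quasi-identifier and
    require every distinct record to occur at least k times."""
    counts = Counter(frozenset(r) for r in records)
    return all(c >= k for c in counts.values())

# The second record is unique, so this dataset is not 2-anonymous.
data = [{"bread", "milk"}, {"bread", "wine"}, {"bread", "milk"}]
print(is_k_anonymous(data, 2))  # False
```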



In [He et. al. 2009] a method for transforming set-valued data to
a k-anonymous form, the Partition algorithm, is proposed. Partition employs
generalization for transforming the data to a k-anonymous form. Generalization
is the replacement of a group of original values by one new, more abstract one.
For example, if the residence area of an individual is reported in terms of
cities, the city name can be replaced by the country name in the anonymized data.
Partition employs local recoding; not all appearances of an original
value are replaced by a generalized one. Partition is a top-down algorithm: it starts
by considering all values generalized to the most generic value of the
generalization hierarchy, and then drills down the hierarchy for as long as the k-anonymity
property still holds.
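
A minimal sketch of value generalization under a hierarchy follows; the toy city-to-country hierarchy is an assumption for illustration, and the partitioning and drill-down logic of the actual Partition algorithm is omitted.

```python
# Toy generalization hierarchy (illustrative values): city -> country -> Any.
HIERARCHY = {"Athens": "Greece", "Patras": "Greece",
             "Lyon": "France", "Greece": "Any", "France": "Any"}

def generalize(value, levels=1):
    """Replace a value by its ancestor `levels` steps up the hierarchy."""
    for _ in range(levels):
        value = HIERARCHY.get(value, value)
    return value

print(generalize("Athens"))   # Greece
print(generalize("Lyon", 2))  # Any
```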

Despite using a flexible local recoding transformation, Partition has to significantly
transform the original data in order to achieve k-anonymity. Records of different cardinalities
that contain items from a large domain must end up in groups of identical records of size at
least k. In response to this problem, [Terrovitis et. al. 2008, Terrovitis et. al. 2010]
propose k^m-anonymity. k^m-anonymity requires that each combination of up to m quasi-identifiers
appears at least k times in the published data. The intuition behind k^m-anonymity
is that there is little privacy gain from protecting against adversaries who already know most of
the items of one record, and significant information loss in the effort to do so. In [Terrovitis et. al.
2008, Terrovitis et. al. 2010] algorithms that rely on generalization using both global and local
recoding are proposed.
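
A brute-force check of k^m-anonymity might look as follows; this is only a sketch for small data, not one of the optimized algorithms of [Terrovitis et. al. 2008], since enumerating all item combinations grows quickly with m.

```python
from collections import Counter
from itertools import combinations

def is_km_anonymous(records, k, m):
    """Check k^m-anonymity: every combination of up to m items that
    occurs in some record must be contained in at least k records."""
    support = Counter()
    for r in records:
        for size in range(1, m + 1):
            for combo in combinations(sorted(r), size):
                support[combo] += 1
    return all(c >= k for c in support.values())

# {"b", "c"} appears together in only one record, so the check fails for k=2, m=2.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}]
print(is_km_anonymous(data, k=2, m=2))  # False
```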




In [Loukides et. al. 2011, Gkoulalas et. al. 2011] the authors provide an even more targeted
approach. These works assume that a set of privacy and utility constraints is known by the data
publisher. Such an assumption holds in specific domains, like that of medical records [Loukides et.
al. 2010/II]. Exploiting this explicit knowledge about privacy dangers and utility, the authors
propose algorithms that achieve protection against identity disclosure with limited information
loss.

Protection against attribute disclosure

The data values in a dataset are not all equally important as personal information.
A common distinction in the privacy literature is between quasi-identifiers and
sensitive values. Quasi-identifiers are usually known through several sources and they
do not by themselves threaten the privacy of an individual. Sensitive values, on the other
hand, are not considered available through other sources and they reveal important personal
information. If such a distinction holds and it is known by the data publisher, then the data
must also be protected against the disclosure of sensitive values. A common
guarantee for protecting sensitive values is l-diversity [Machanavajjhala et. al.
2006]. l-diversity guarantees that an adversary cannot associate her background knowledge
with fewer than l well-represented sensitive values. Well-represented is usually defined via a
probability threshold: an adversary cannot associate her background knowledge with any
sensitive value with probability over 1/l.
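
Under this probabilistic reading, a check over already-formed anonymized groups can be sketched as follows; the groups and sensitive values are illustrative.

```python
from collections import Counter

def is_l_diverse(groups, l):
    """Probabilistic l-diversity: within every group of records, no
    sensitive value may occur with relative frequency above 1/l."""
    for sensitive_values in groups:
        counts = Counter(sensitive_values)
        if max(counts.values()) > len(sensitive_values) / l:
            return False
    return True

# "flu" has probability 2/3 in the first group, violating 2-diversity.
groups = [["flu", "flu", "hiv"], ["flu", "hiv", "cancer"]]
print(is_l_diverse(groups, 2))  # False
```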




The first anonymization method that provided protection against attribute disclosure for
set-valued data was proposed in [Ghinita et. al. 2008]. The proposal of [Ghinita et. al.
2008, Ghinita et. al. 2011] relies on separating sensitive values from quasi-identifiers.
The idea of separation was first proposed in the relational context in
[Xiao et. al. 2006], but it was adjusted and extended in [Ghinita et. al. 2008, Ghinita et. al. 2011]
for the set-valued context. The basic idea of the proposed anonymization method is to create
clusters of similar records (with respect to quasi-identifiers) and then publish, for each cluster,
the quasi-identifiers and the sensitive values separately. A benefit of this transformation over
generalization and suppression is that it does not require creating groups with identical
quasi-identifiers. This way the information loss is kept low, even for data of very high cardinality
and dimensionality.
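
The separation step can be sketched as follows, assuming clusters of similar records have already been formed (the clustering itself, which [Ghinita et. al. 2008] develops in detail, is omitted); record contents are illustrative.

```python
from collections import Counter

def publish_separated(clusters):
    """For each cluster, publish the quasi-identifier part of every
    record intact and the sensitive values only as a bag of counts,
    severing the record-level link between the two."""
    result = []
    for cid, cluster in enumerate(clusters):
        qi_table = [(cid, sorted(qi)) for qi, _ in cluster]
        sens_summary = (cid, Counter(s for _, sens in cluster for s in sens))
        result.append((qi_table, sens_summary))
    return result

# Each record: (quasi-identifier items, sensitive items); values illustrative.
cluster = [({"bread", "milk"}, {"diagnosis-A"}),
           ({"bread", "beer"}, {"diagnosis-B"})]
for qi_table, summary in publish_separated([cluster]):
    print(qi_table, summary)
```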

Protection against attribute disclosure is also studied in [Cao et. al. 2010]. The
proposed guarantee, ρ-uncertainty, is similar to l-diversity and requires that quasi-identifiers
cannot be associated with sensitive values with probability over ρ. The novelty of the
approach is that it considers as a quasi-identifier every record subset that appears in the dataset,
including those that contain sensitive values. This means that sensitive values can also act
as quasi-identifiers. The proposed anonymization method relies on both generalization and
suppression.

A guarantee that provides protection against both identity and attribute disclosure,
(h,k,p)-coherence, is proposed in [Xu et. al. 2008/I, Xu et. al. 2008/II].
Similarly to k^m-anonymity, it protects against adversaries who
know up to p items, by guaranteeing that every such combination of items will
appear at least k times. Moreover, (h,k,p)-coherence guarantees that combinations of up to p
items cannot be associated with a sensitive value with probability over h. The proposed
anonymization method relies solely on suppression.
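
A direct, brute-force check of (h,k,p)-coherence could look as follows; it assumes the set of sensitive items is known and enumerates every public-item combination of size up to p, which is feasible only for small data.

```python
from collections import Counter
from itertools import combinations

def is_hkp_coherent(records, sensitive, h, k, p):
    """Brute-force (h,k,p)-coherence check: every combination of up to p
    public items that occurs must be supported by at least k records, and
    within its supporting records no sensitive item may occur with
    relative frequency above h."""
    support, sens_count = Counter(), Counter()
    for r in records:
        public = sorted(x for x in r if x not in sensitive)
        for size in range(1, p + 1):
            for combo in combinations(public, size):
                support[combo] += 1
                for s in r & sensitive:
                    sens_count[(combo, s)] += 1
    return all(
        sup >= k and all(sens_count[(combo, s)] / sup <= h for s in sensitive)
        for combo, sup in support.items()
    )

# P(x | {a, b}) = 2/3 > h = 0.5, so coherence is violated.
data = [{"a", "b", "x"}, {"a", "b"}, {"a", "b", "x"}]
print(is_hkp_coherent(data, sensitive={"x"}, h=0.5, k=2, p=2))  # False
```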

In [Loukides et. al. 2010/I] an anonymization method that can be tailored to specific sensitive
inferences is presented. The authors propose the notion of PS-rules, which are sensitive
association rules specified by the data owner. The anonymization procedure guarantees that an
adversary will not be able to infer these rules with high certainty. The proposed anonymization
algorithms are based on generalization.

Differential privacy for set-valued data

Differential privacy [Dwork et. al. 2006] is a more generic guarantee than k-anonymity
and l-diversity, and the protection it offers is broader than protection against identity and attribute
disclosure. It does not make any assumptions about the adversary's background knowledge
(although anonymization methods that provide differential privacy might make some implicit
ones [Kifer et. al. 2011]), and it requires that any analysis of the anonymized data is not
significantly affected by the addition or removal of one record in the original data. The trade-off
for providing such a strong anonymity guarantee is that the information quality of the result is
significantly affected: only the results of specific analyses can be provided, with noise added in a
non-deterministic procedure.

A first approach to the anonymization of set-values using differential privacy appears
in [Korolova et. al. 2009], in the context of web search query logs. Each record is the
web search history of a single user, which can be modeled as a set-value, but the result
of the anonymization is not set-values: only the frequent query terms and their supports
(with the addition of Laplace noise) are published. A more recent approach that publishes
anonymized set-values under differential privacy, and provides set-values as a result,
appears in [Chen 2011]. The paper proposes DiffPart, a top-down algorithm that decides which
frequent itemsets can be published without violating differential privacy. The supports of all
published itemsets are distorted by adding Laplace noise.
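
In that spirit, a toy version of publishing noisy item supports might look as follows; the parameters and data are illustrative, the sensitivity analysis is simplified to the maximum record cardinality, and a faithful implementation of either approach needs more careful privacy accounting, e.g., for the thresholding step.

```python
import numpy as np
from collections import Counter

def noisy_item_supports(records, epsilon, threshold):
    """Publish per-item supports with Laplace noise. Removing one record
    shifts the count of each item it contains by 1, so the L1 sensitivity
    of the whole support vector is the maximum record cardinality; the
    Laplace scale is calibrated accordingly."""
    counts = Counter(item for r in records for item in r)
    scale = max(len(r) for r in records) / epsilon
    published = {}
    for item, c in counts.items():
        noisy = c + np.random.laplace(0.0, scale)
        if noisy >= threshold:  # suppress items whose noisy support is too low
            published[item] = noisy
    return published

data = [{"a", "b"}, {"a"}, {"a", "c"}]
print(noisy_item_supports(data, epsilon=1.0, threshold=2.0))
```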

References

      [Aggarwal 2007] Charu C. Aggarwal. On Randomization, Public Information and the Curse
      of Dimensionality. ICDE, 2007.
      [Cao et. al. 2010] J. Cao, P. Karras, C. Raïssi, and K.-L. Tan. ρ-uncertainty:
      Inference-Proof Transaction Anonymization. PVLDB, 2010.
      [Chen 2011] Rui Chen, Noman Mohammed, Benjamin C. M. Fung, Bipin C. Desai, and Li
      Xiong. Publishing Set-Valued Data via Differential Privacy. PVLDB, 4(11), 2011.
      [Dwork et. al. 2006] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating Noise to
      Sensitivity in Private Data Analysis. Theory of Cryptography Conference, pages 265-284,
      2006.
      [Ghinita et. al. 2008] G. Ghinita, Y. Tao, and P. Kalnis. On the Anonymization of Sparse
      High-Dimensional Data. ICDE, 2008.
      [Ghinita et. al. 2011] G. Ghinita, P. Kalnis, and Y. Tao. Anonymous Publication of Sensitive
      Transactional Data. IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE),
      23(2): 161-174, 2011.
      [Gkoulalas et. al. 2011] A. Gkoulalas-Divanis and G. Loukides. PCTA: Privacy-constrained
      Clustering-based Transaction Data Anonymization. 4th International Workshop on Privacy
      and Anonymity in the Information Society, ACM, 2011.
      [He et. al. 2009] Y. He and J. F. Naughton. Anonymization of Set-Valued Data via Top-Down,
      Local Generalization. VLDB, pages 934-945, 2009.
      [Kifer et. al. 2011] D. Kifer and A. Machanavajjhala. No Free Lunch in Data Privacy.
      SIGMOD, 2011.
      [Korolova et. al. 2009] A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing
      Search Queries and Clicks Privately. WWW, 2009.
      [Loukides et. al. 2010/I] G. Loukides, A. Gkoulalas-Divanis, and J. Shao. Anonymizing
      Transaction Data to Eliminate Sensitive Inferences. DEXA (1), 2010.
      [Loukides et. al. 2010/II] G. Loukides, A. Gkoulalas-Divanis, and B. Malin. Anonymization of
      Electronic Medical Records for Validating Genome-Wide Association Studies. Proceedings
      of the National Academy of Sciences, 2010.
      [Loukides et. al. 2011] G. Loukides, A. Gkoulalas-Divanis, and B. Malin. COAT:
      COnstraint-based Anonymization of Transactions. Knowl. Inf. Syst., 28(2): 251-282, 2011.
      [Machanavajjhala et. al. 2006] A. Machanavajjhala, J. Gehrke, D. Kifer, and M.
      Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. ICDE, 2006.
      [Samarati 2001] P. Samarati. Protecting Respondents' Identities in Microdata Release.
      IEEE TKDE, 13(6), 2001.
      [Sweeney 2002] L. Sweeney. k-Anonymity: A Model for Protecting Privacy. International
      Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 2002.
      [Terrovitis et. al. 2008] M. Terrovitis, N. Mamoulis, and P. Kalnis. Privacy-preserving
      Anonymization of Set-valued Data. PVLDB, 1(1): 115-125, 2008.
      [Terrovitis et. al. 2010] M. Terrovitis, N. Mamoulis, and P. Kalnis. Local and Global
      Recoding Methods for Anonymizing Set-valued Data. The VLDB Journal, (to appear),
      2010.
      [Xiao et. al. 2006] X. Xiao and Y. Tao. Anatomy: Simple and Effective Privacy
      Preservation. VLDB, pages 139-150, 2006.
      [Xu et. al. 2008/I] Y. Xu, B. C. M. Fung, K. Wang, A. W.-C. Fu, and J. Pei. Publishing
      Sensitive Transactions for Itemset Utility. ICDM, pages 1109-1114, 2008.
      [Xu et. al. 2008/II] Y. Xu, K. Wang, A. W.-C. Fu, and P. S. Yu. Anonymizing Transaction
      Databases for Publication. KDD, pages 767-775, 2008.



