ACEEE International Journal on Network Security, Vol 1, No. 2, July 2010




Enhancing Privacy of Confidential Data using K Anonymization
R. Vidyabanu 1, Divya Suzanne Thomas 2 and N. Nagaveni 3
1 Sri Krishna College of Engineering and Technology, Department of Applied Sciences, Coimbatore, India
Email - vidhyabanu@yahoo.com
2 Sri Krishna College of Engineering and Technology, Department of Applied Sciences, Coimbatore, India
3 Coimbatore Institute of Technology, Department of Mathematics, Coimbatore, Tamilnadu, India
© 2010 ACEEE
DOI: 01.ijns.01.02.07

Abstract - Recent advances in data collection and related technologies have inaugurated a new era of research in which existing data mining algorithms should be reconsidered from a different point of view, that of privacy preservation. Much research has been done recently on privacy preserving data mining (PPDM) based on perturbation, randomization and secure multiparty computation, and more recently on anonymity, including k-anonymity and l-diversity.
We use the technique of k-anonymization to de-associate sensitive attributes from the corresponding identifiers. This is done by anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. This paper proposes a k-anonymization solution for classification. The proposed method has been implemented and evaluated using UCI repository datasets.
After the k-anonymization solution is determined for the original data, classification, a data mining technique, is performed with the ID3 algorithm on both the original table and the compressed table. The accuracies of the two are compared by determining the entropy and information gain values. Experiments show that the quality of classification can be preserved even for highly restrictive anonymity requirements.

Index Terms - k-Anonymization, privacy, masking, classification

I INTRODUCTION
In today's information society, given the unprecedented ease of finding and accessing information, protection of privacy has become a very important concern. In particular, large databases that include sensitive information (e.g., health information) have often been made publicly available, frequently with identifiers stripped in an attempt to protect privacy. However, if such information can be associated with the corresponding people's identifiers, perhaps using other publicly available databases, then privacy can be seriously violated.
The notion of k-anonymity was first proposed in [2]. In general, a cost metric is used to measure the data distortion caused by anonymization. Two types of cost metric have been considered. The first type, based on the notion of minimal generalization [3] [4], is independent of the purpose of the data release. The second type factors in the purpose of the data release, such as classification [5]. The goal is to find the optimal k-anonymization that minimizes this cost metric. In general, achieving optimal k-anonymization is NP-hard [6] [7]. Greedy methods were proposed in [8] [9] [10] [11]. Scalable algorithms (with exponential worst-case complexity) for finding the optimal k-anonymization were studied in [3] [4] [5].
The optimal k-anonymization is not suitable for classification, where masking structures and masking noise have different effects: the former is considered to damage classification, whereas the latter helps classification. It is well known in data mining and machine learning that the unmodified data, which has the lowest possible cost according to any cost metric, often yields worse classification than some generalized (i.e., masked) data. Similarly, less masked data can yield worse classification than some more masked data. The optimal k-anonymization seeks to minimize the error on the training data subject to the privacy constraint, and thus over-fits the data. Neither the over-fitting nor the privacy constraint is relevant to the classification goal, which is to minimize the error on future data.

II PRELIMINARIES
A. Masking
To transform the table T to satisfy the anonymity requirement, we consider three types of masking operations on the attributes D(j) in ∪QID(i), where QID refers to a quasi-identifier.
Generalize:
D(j) is a categorical attribute with a taxonomy tree. A leaf node represents a domain value and a parent node represents a less specific value. A generalized D(j) can be viewed as a cut through its taxonomy tree. A cut of a tree is a subset of values in the tree that contains exactly one value on each root-to-leaf path. This type of generalization does not suffer from the interpretation difficulty discussed earlier.
Suppress:
D(j) is a categorical attribute with no taxonomy tree. The suppression of a value on D(j) means replacing all occurrences of the value with the special value ⊥(j). All suppressed values on D(j) are represented by the same special value ⊥(j),
which is treated as a new value in D(j) by a classification algorithm.
Discretize:
D(j) is a continuous attribute. The discretization of a value v on D(j) means replacing all occurrences of v with an interval containing the value.

B. ID3 Algorithm
Each non-leaf node of a decision tree corresponds to an input attribute, and each arc to a possible value of that attribute. A leaf node corresponds to the expected value of the output attribute when the input attributes are described by the path from the root node to that leaf node.
In a "good" decision tree, each non-leaf node should correspond to the input attribute that is the most informative about the output attribute among all the input attributes not yet considered on the path from the root node to that node. This is because we would like to predict the output attribute using the smallest possible number of questions on average.
Entropy is used to determine how informative a particular input attribute is about the output attribute for a subset of the training data. Entropy is a measure of uncertainty in communication systems and is fundamental in modern information theory.

III PROPOSED K-ANONYMIZATION TECHNIQUE

A. Masking operation
To transform T to satisfy the anonymity requirement, we apply the three types of masking operations described above to the attributes D(j) in ∪QID(i).

B. Anonymization by top-down refinement
For a given data set, consider the p quasi-identifiers QID(1), ..., QID(p) on table T. a(qid(i)) denotes the number of data records in T that share the value qid(i) on QID(i). The anonymity of QID(i), denoted A(QID(i)), is the smallest a(qid(i)) for any value qid(i) on QID(i). A table T satisfies the anonymity requirement {<QID(1), k(1)>, ..., <QID(p), k(p)>} if A(QID(i)) ≥ k(i) for 1 ≤ i ≤ p, where k(i) is the anonymity threshold on QID(i) specified by the data provider. We record the number of records sharing each value of each quasi-identifier in the given database and check it against the anonymity threshold k(i) for each QID(i).

C. Refinement of the masked datasets
A table T can be masked by a sequence of refinements starting from the most masked state, in which each attribute is either generalized to the topmost value, suppressed to the special value ⊥(j), or represented by a single interval. The method iteratively refines a masked value selected from the current set of cuts, suppressed values and intervals, until any further refinement would violate the anonymity requirement. Each refinement increases the information and decreases the anonymity, since records with specific values are more distinguishable. The key is selecting the best refinement at each step with both impacts considered.
The refinement on the different types of attributes D(j) ∈ ∪QID(i), and the selection criterion for a single refinement, are formally defined below.
Refinement for Generalization: Consider a categorical attribute D(j) with a user-specified taxonomy tree. Let child(v) be the set of child values of v in the user-specified taxonomy tree. A refinement, written v → child(v), replaces the parent value v with the child value in child(v) that generalizes the domain value in each (generalized) record that contains v.
Refinement for Suppression: For a categorical attribute D(j) without a taxonomy tree, a refinement refers to disclosing one value v from the set of suppressed values Sup(j). Let R(j) denote the set of suppressed records that currently contain ⊥(j). Disclosing v means replacing ⊥(j) with v in all records in R(j) that originally contain v.
Refinement for Discretization: For a continuous attribute, refinement is similar to that for generalization, except that no prior taxonomy tree is given and the taxonomy tree has to be grown dynamically in the process of refinement. Initially, the interval that covers the full range of the attribute forms the root. The refinement on an interval v, written v → child(v), refers to the optimal split of v into two child intervals child(v) that maximizes the information gain. The anonymity is not used for finding a split good for classification. This is similar to defining a taxonomy tree where the main consideration is how the taxonomy best describes the application. Due to this extra step of identifying the optimal split of the parent interval, we treat continuous attributes separately from categorical attributes with taxonomy trees.
A refinement is valid (with respect to T) if T satisfies the anonymity requirement after the refinement. A refinement is beneficial (with respect to T) if more than one class is involved in the refined records. A refinement is performed only if it is both valid and beneficial. Therefore, a refinement guarantees that every newly generated qid has a(qid) ≥ k.

D. Computation of InfoGain, AnonyLoss and Score of the refinement
InfoGain(v): defined as
$$\mathrm{InfoGain}(v) = I(R_v) - \sum_{c \in child(v)} \frac{|R_c|}{|R_v|}\, I(R_c) \qquad (1)$$
where R_v is the set of records that contain v, R_c is the set of records in R_v refined to the child value c, and I(R_x) is the entropy of a record set R_x:

$$I(R_x) = -\sum_{cls} \frac{freq(R_x, cls)}{|R_x|} \times \log_2 \frac{freq(R_x, cls)}{|R_x|} \qquad (2)$$
freq(R_x, cls) is the number of data records in R_x having the class cls. Intuitively, I(R_x) measures the entropy (or "impurity") of classes in R_x. The more dominant the majority class in R_x is, the smaller I(R_x) is (i.e., the less entropy in R_x). Therefore, I(R_x) measures the error, because records of the non-majority classes in R_x count as errors.
AnonyLoss(v): defined as
$$\mathrm{AnonyLoss}(v) = \mathrm{avg}\{A(QID_j) - A_v(QID_j)\} \qquad (3)$$
where A(QID_j) and A_v(QID_j) represent the anonymity before and after refining v. avg{A(QID_j) - A_v(QID_j)} is the average loss of anonymity for all QID_j that contain the attribute of v.
The selection criterion guides the refinement process to heuristically maximize the classification goal. Consider a refinement v → child(v), where v ∈ D(j) and D(j) is either a categorical attribute with a user-specified taxonomy tree or a continuous attribute with a dynamically grown taxonomy tree.
The refinement has two effects: it increases the information of the refined records with respect to classification, and it decreases the anonymity of the refined records with respect to privacy. These effects are measured by the information gain, InfoGain(v) in (1), and the anonymity loss, AnonyLoss(v) in (3). v is a good candidate for refinement if InfoGain(v) is large and AnonyLoss(v) is small. Our selection criterion chooses, for the next refinement, the candidate v that has the maximum information-gain/anonymity-loss trade-off, defined as
$$\mathrm{Score}(v) = \frac{\mathrm{InfoGain}(v)}{\mathrm{AnonyLoss}(v) + 1} \qquad (4)$$
1 is added to AnonyLoss(v) to avoid division by zero. Each choice of InfoGain(v) and AnonyLoss(v) gives a trade-off between classification and anonymization. It should be noted that Score is not a goodness metric of k-anonymization. In fact, it is difficult to have a closed-form metric that captures the classification goal on future data.

E. Implementation of ID3 algorithm
The ID3 algorithm constructs the decision tree by a top-down, greedy search through the given sets, testing each attribute at every tree node. To select the attribute that is most useful for classifying a given set, we use the information gain metric.
To find an optimal way to classify a learning set, we need to minimize the number of questions asked. Thus, we need a function that measures which questions provide the most balanced splitting.

IV RESULTS
Table 1 depicts the runtime of top-down refinement using generalization, suppression and discretization for 200K to 1M data records, based on two types of anonymity requirements. AllAttQID refers to the single quasi-identifier containing all 14 attributes. This is one of the most time-consuming settings because of the largest number of candidate refinements to consider at each iteration. Top-down refinement requires more iterations to reach a solution, hence more runtime. Top-down refinement takes approximately 80 seconds to transform 1M records. Compared to AllAttQID, top-down refinement becomes less efficient for handling multiple quasi-identifiers. The implementation results are summarized below. Fig. 1 shows the time taken for masking when executed with different numbers of records.
Data Source: UCI Repository: Adult dataset
Fields: Age, Work Class, Sex
Operation: Masking
Suppression: Work Class, Sex
Discretization: Age

TABLE 1 TIME TAKEN IN SECONDS

Figure 1. Time taken for masking

Table 2 summarizes the final results of applying the ID3 algorithm to the compressed and the original data for a series of inputs. A comparison of the classification accuracy on the transformed data with that on the original data is depicted in Fig. 2.
Data Source: UCI Repository: Adult dataset
Fields: Age, Work Class, Sex
Operation: Masking
Suppression: Work Class, Sex
Discretization: Age
Generalization: Education
Operation: ID3 Algorithm
Accuracy Calculation:
Accuracy % = No. of records selected / Entropy
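To make the selection criterion of Section III concrete, the following Python sketch computes A(QID), the entropy I(R_x) of (2), InfoGain(v) of (1), AnonyLoss(v) of (3) and Score(v) of (4), and runs the top-down loop that applies only valid and beneficial refinements. It is an illustrative sketch, not the authors' implementation: the record layout (dicts with a "cls" class key), the helper names (anonymity, evaluate, top_down and so on), the taxonomy root "ANY" and the "*" placeholder for the suppressed value ⊥(j) are all assumptions, and only a generalization-style refinement is exercised.

```python
from collections import Counter
from math import log2

# Assumed record layout (illustrative, not from the paper): each record is a
# dict from attribute name to its current masked value, plus the class label
# under the key "cls". A requirement is a list of (qid, k) pairs, where qid is
# a tuple of attribute names.


def anonymity(records, qid):
    """A(QID): the smallest a(qid) over all value combinations seen on QID."""
    groups = Counter(tuple(r[a] for a in qid) for r in records)
    return min(groups.values())


def satisfies(records, requirement):
    """True if A(QID_i) >= k_i for every (QID_i, k_i) in the requirement."""
    return all(anonymity(records, qid) >= k for qid, k in requirement)


def entropy(records):
    """I(R_x) of Eq. (2), in bits; an empty record set has entropy 0."""
    n = len(records)
    if n == 0:
        return 0.0
    counts = Counter(r["cls"] for r in records)
    return -sum(c / n * log2(c / n) for c in counts.values())


def refine(records, attr, v, child_of):
    """Apply v -> child(v) on attr: each record holding v gets child_of(record)."""
    return [{**r, attr: child_of(r)} if r[attr] == v else r for r in records]


def evaluate(records, attr, v, child_of, requirement):
    """Return (Score(v), valid, beneficial) for the refinement v -> child(v)."""
    after = refine(records, attr, v, child_of)
    r_v = [r for r in records if r[attr] == v]
    # Group R_v by the child value each record is refined to: the sets R_c.
    r_c = {}
    for r in r_v:
        r_c.setdefault(child_of(r), []).append(r)
    # Eq. (1): InfoGain(v) = I(R_v) - sum_c |R_c| / |R_v| * I(R_c)
    gain = entropy(r_v) - sum(len(g) / len(r_v) * entropy(g) for g in r_c.values())
    # Eq. (3): average drop in anonymity over the QIDs that contain attr.
    losses = [anonymity(records, q) - anonymity(after, q)
              for q, _ in requirement if attr in q]
    anony_loss = sum(losses) / len(losses) if losses else 0.0
    valid = satisfies(after, requirement)
    beneficial = len({r["cls"] for r in r_v}) > 1
    # Eq. (4): Score(v) = InfoGain(v) / (AnonyLoss(v) + 1)
    return gain / (anony_loss + 1), valid, beneficial


def top_down(records, candidates, requirement):
    """Repeatedly apply the valid, beneficial refinement with the highest Score."""
    while True:
        best, best_score = None, float("-inf")
        for attr, v, child_of in candidates(records):
            score, valid, beneficial = evaluate(records, attr, v, child_of, requirement)
            if valid and beneficial and score > best_score:
                best, best_score = (attr, v, child_of), score
        if best is None:
            return records
        records = refine(records, *best)


# Toy table: Work Class starts at a hypothetical taxonomy root "ANY" and Sex is
# suppressed to "*", standing in for the special value; "raw_work" keeps the
# original value so the refinement ANY -> {Private, Gov} can be applied.
rows = [("Private", "Y"), ("Private", "Y"), ("Private", "Y"), ("Private", "N"),
        ("Gov", "N"), ("Gov", "N"), ("Gov", "N"), ("Gov", "Y")]
table = [{"work": "ANY", "sex": "*", "raw_work": w, "cls": c} for w, c in rows]
requirement = [(("work", "sex"), 2)]


def candidates(records):
    """Hypothetical candidate generator offering the single refinement ANY -> child."""
    return [("work", "ANY", lambda r: r["raw_work"])]


score, valid, beneficial = evaluate(table, "work", "ANY", lambda r: r["raw_work"], requirement)
print(round(score, 4), valid, beneficial)  # prints 0.0377 True True
masked = top_down(table, candidates, requirement)
```

On this toy table the refinement ANY → {Private, Gov} gains about 0.189 bits of class information while A(QID) drops from 8 to 4, so Score(v) = 0.189 / (4 + 1) ≈ 0.0377. The loop then stops because no remaining candidate covers records of more than one class, mirroring the "valid and beneficial" rule; the 1 added in the denominator keeps the score defined when a refinement costs no anonymity.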

Figure 2. Accuracy using k-Anonymity (plot titled "Privacy Using K-Anonymity"; y-axis: Decision Tree Privacy, 0 to 120; x-axis: no of records, 10 to 350; series: Original Table, Compressed Table)

V CONCLUSION
The problem of ensuring individuals' anonymity while releasing person-specific data for classification analysis is considered. Previous optimal k-anonymization methods, based on a closed-form cost metric, do not address this classification requirement. The proposed method can be used to hide valuable information while publishing the data in a publicly accessible place such as the Internet. The results show that when classification is applied to both the original table and the transformed table, the accuracy on the transformed table is not much lower than that on the original table. The proposed privacy preserving transformation preserves the nature of the data even in its transformed form. The classification accuracy obtained with the transformed data is almost equal to that obtained with the original dataset.

REFERENCES
[1] B. C. M. Fung, K. Wang, and P. S. Yu, "Anonymizing classification data for privacy preservation," IEEE Transactions on Knowledge and Data Engineering, 2007.
[2] P. Samarati and L. Sweeney, "Generalizing data to provide anonymity when disclosing information," Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 98), Seattle, WA, June 1998, p. 188.
[3] P. Samarati, "Protecting respondents' identities in microdata release," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 13, no. 6, 2001, pp. 1010-1027.
[4] L. Sweeney, "Achieving k-anonymity privacy protection using generalization and suppression," International Journal on Uncertainty, Fuzziness, and Knowledge-based Systems, vol. 10, no. 5, 2002, pp. 571-588.
[5] R. J. Bayardo and R. Agrawal, "Data privacy through optimal k-anonymization," Proc. of the 21st International Conference on Data Engineering (ICDE), Tokyo, Japan, April 2005, pp. 217-228.
[6] G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu, "Approximation algorithms for k-anonymity," Journal of Privacy Technology, no. 20051120001, November 2005.
[7] A. Meyerson and R. Williams, "On the complexity of optimal k-anonymity," Proc. of the 23rd ACM Symposium on Principles of Database Systems (PODS), 2004, pp. 223-228.
[8] L. Sweeney, "Datafly: A system for providing anonymity in medical data," Proc. of the International Conference on Database Security, 1998, pp. 356-381.
[9] V. S. Iyengar, "Transforming data to satisfy privacy constraints," Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, July 2002, pp. 279-288.
[10] K. Wang, P. Yu, and S. Chakraborty, "Bottom-up generalization: a data mining solution to privacy protection," Proc. of the 4th IEEE International Conference on Data Mining (ICDM), November 2004.
[11] B. C. M. Fung, K. Wang, and P. S. Yu, "Top-down specialization for information and privacy preservation," Proc. of the 21st International Conference on Data Engineering (ICDE), Tokyo, Japan, April 2005, pp. 205-216.




Ieeepro techno solutions ieee java project - generalized approach for datahemanthbbc
 
Ieeepro techno solutions ieee dotnet project - generalized approach for data
Ieeepro techno solutions  ieee dotnet project - generalized approach for dataIeeepro techno solutions  ieee dotnet project - generalized approach for data
Ieeepro techno solutions ieee dotnet project - generalized approach for dataASAITHAMBIRAJAA
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for datahemanthbbc
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for datahemanthbbc
 
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdfDWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdfChristinaGayenMondal
 
Analysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data MiningAnalysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data Miningijdmtaiir
 
PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...
PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...
PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...IJNSA Journal
 
ANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATIONANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATIONpharmaindexing
 

Similar to Enhancing Privacy of Confidential Data using K Anonymization (20)

DATA SHARING TAXONOMY RECORDS FOR SECURITY CONSERVATION
DATA SHARING TAXONOMY RECORDS FOR SECURITY CONSERVATIONDATA SHARING TAXONOMY RECORDS FOR SECURITY CONSERVATION
DATA SHARING TAXONOMY RECORDS FOR SECURITY CONSERVATION
 
Bj24390398
Bj24390398Bj24390398
Bj24390398
 
T24144148
T24144148T24144148
T24144148
 
41 125-1-pb
41 125-1-pb41 125-1-pb
41 125-1-pb
 
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATIONGRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION
 
A New Method for Preserving Privacy in Data Publishing
A New Method for Preserving Privacy in Data PublishingA New Method for Preserving Privacy in Data Publishing
A New Method for Preserving Privacy in Data Publishing
 
An Effective Semantic Encrypted Relational Data Using K-Nn Model
An Effective Semantic Encrypted Relational Data Using K-Nn ModelAn Effective Semantic Encrypted Relational Data Using K-Nn Model
An Effective Semantic Encrypted Relational Data Using K-Nn Model
 
VECTOR QUANTIZATION FOR PRIVACY PRESERVING CLUSTERING IN DATA MINING
VECTOR QUANTIZATION FOR PRIVACY PRESERVING CLUSTERING IN DATA MININGVECTOR QUANTIZATION FOR PRIVACY PRESERVING CLUSTERING IN DATA MINING
VECTOR QUANTIZATION FOR PRIVACY PRESERVING CLUSTERING IN DATA MINING
 
winbis1005
winbis1005winbis1005
winbis1005
 
Reduct generation for the incremental data using rough set theory
Reduct generation for the incremental data using rough set theoryReduct generation for the incremental data using rough set theory
Reduct generation for the incremental data using rough set theory
 
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODELAN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
AN EFFECTIVE SEMANTIC ENCRYPTED RELATIONAL DATA USING K-NN MODEL
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for data
 
Ieeepro techno solutions ieee dotnet project - generalized approach for data
Ieeepro techno solutions  ieee dotnet project - generalized approach for dataIeeepro techno solutions  ieee dotnet project - generalized approach for data
Ieeepro techno solutions ieee dotnet project - generalized approach for data
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for data
 
Ieeepro techno solutions ieee java project - generalized approach for data
Ieeepro techno solutions  ieee java project - generalized approach for dataIeeepro techno solutions  ieee java project - generalized approach for data
Ieeepro techno solutions ieee java project - generalized approach for data
 
DATA INTEGRITY AUDITING WITHOUT PRIVATE KEY STORAGE FOR SECURE CLOUD STORAGE
DATA INTEGRITY AUDITING WITHOUT PRIVATE KEY STORAGE  FOR SECURE CLOUD STORAGEDATA INTEGRITY AUDITING WITHOUT PRIVATE KEY STORAGE  FOR SECURE CLOUD STORAGE
DATA INTEGRITY AUDITING WITHOUT PRIVATE KEY STORAGE FOR SECURE CLOUD STORAGE
 
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdfDWDM-AG-day-1-2023-SEC A plus Half B--.pdf
DWDM-AG-day-1-2023-SEC A plus Half B--.pdf
 
Analysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data MiningAnalysis of Classification Algorithm in Data Mining
Analysis of Classification Algorithm in Data Mining
 
PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...
PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...
PRIVACY PRESERVING NAIVE BAYES CLASSIFIER FOR HORIZONTALLY PARTITIONED DATA U...
 
ANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATIONANONYMIZATION OF PRIVACY PRESERVATION
ANONYMIZATION OF PRIVACY PRESERVATION
 

More from IDES Editor

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A ReviewIDES Editor
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...IDES Editor
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...IDES Editor
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...IDES Editor
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCIDES Editor
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...IDES Editor
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingIDES Editor
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...IDES Editor
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsIDES Editor
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...IDES Editor
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...IDES Editor
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkIDES Editor
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetIDES Editor
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyIDES Editor
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’sIDES Editor
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...IDES Editor
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance AnalysisIDES Editor
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesIDES Editor
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...IDES Editor
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...IDES Editor
 

More from IDES Editor (20)

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A Review
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFC
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive Thresholds
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability Framework
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through Steganography
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’s
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance Analysis
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
 

Recently uploaded

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 

Recently uploaded (20)

DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 

Enhancing Privacy of Confidential Data using K Anonymization

ACEEE International Journal on Network Security, Vol 1, No. 2, July 2010

Enhancing Privacy of Confidential Data using K Anonymization

R. Vidyabanu 1, Divya Suzanne Thomas 2 and N. Nagaveni 3
1 Sri Krishna College of Engineering and Technology, Department of Applied Sciences, Coimbatore, India
Email - vidhyabanu@yahoo.com
2 Sri Krishna College of Engineering and Technology, Department of Applied Sciences, Coimbatore, India
3 Coimbatore Institute of Technology, Department of Mathematics, Coimbatore, Tamilnadu, India

Abstract-Recent advances in the field of data collection and related technologies have inaugurated a new era of research in which existing data mining algorithms should be reconsidered from a different point of view, that of privacy preservation. Much recent research on privacy preserving data mining (PPDM) has been based on perturbation, randomization and secure multiparty computation, and more recently on anonymity, including k-anonymity and l-diversity.
We use the technique of k-anonymization to de-associate sensitive attributes from the corresponding identifiers. This is done by anonymizing the linking attributes so that at least k released records match each value combination of the linking attributes. This paper proposes a k-anonymization solution for classification. The proposed method has been implemented and evaluated using UCI repository datasets.
After the k-anonymization solution is determined for the original data, classification, a data mining technique, is performed with the ID3 algorithm on both the original table and the compressed (anonymized) table. The accuracy of the two is compared by determining the entropy and the information gain values. Experiments show that the quality of classification can be preserved even for highly restrictive anonymity requirements.

Index Terms-k-Anonymization, privacy, masking, classification

I INTRODUCTION
In today's information society, given the unprecedented ease of finding and accessing information, protection of privacy has become a very important concern. In particular, large databases that include sensitive information (e.g., health information) have often been made available to public access, frequently with identifiers stripped in an attempt to protect privacy. However, if such information can be associated with the corresponding people's identifiers, perhaps using other publicly available databases, then privacy can be seriously violated.
The notion of k-anonymity was first proposed in [2]. In general, a cost metric is used to measure the data distortion of anonymization. Two types of cost metric have been considered. The first type, based on the notion of minimal generalization [3] [4], is independent of the purpose of the data release. The second type factors in the purpose of the data release, such as classification [5]. The goal is to find the optimal k-anonymization that minimizes this cost metric. In general, achieving optimal k-anonymization is NP-hard [6] [7]. Greedy methods were proposed in [8] [9] [10] [11]. Scalable algorithms (with exponential complexity in the worst case) for finding the optimal k-anonymization were studied in [3] [4] [5].
Optimal k-anonymization is not suitable for classification, where masking structures and masking noise have different effects: the former is deemed to damage classification, whereas the latter helps classification. It is well known in data mining and machine learning that the unmodified data, which has the lowest possible cost according to any cost metric, often yields worse classification than some generalized (i.e., masked) data. Similarly, less masked data can yield worse classification than some more masked data. Optimal k-anonymization seeks to minimize the error on the training data subject to the privacy constraint, and thus over-fits the data. Neither the over-fitting nor the privacy constraint is relevant to the classification goal, which seeks to minimize the error on future data.

II PRELIMINARIES
A. Masking
To transform the table T to satisfy the anonymity requirement, we consider three types of masking operations on the attributes D(j) in ∪QID(i), where QID refers to a quasi identifier.
Generalize:
D(j) is a categorical attribute with a taxonomy tree. A leaf node represents a domain value and a parent node represents a less specific value. A generalized D(j) can be viewed as a cut through its taxonomy tree. A cut of a tree is a subset of values in the tree that contains exactly one value on each root-to-leaf path. This type of generalization does not suffer from the interpretation difficulty discussed earlier.
Suppress:
D(j) is a categorical attribute with no taxonomy tree. The suppression of a value on D(j) means replacing all occurrences of the value with the special value ⊥j. All suppressed values on D(j) are represented by the same ⊥j, which is treated as a new value in D(j) by a classification algorithm.
Discretize:
D(j) is a continuous attribute. The discretization of a value v on D(j) means replacing all occurrences of v with an interval containing the value.

34 © 2010 ACEEE DOI: 01.ijns.01.02.07
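The three masking operations of Section II.A can be illustrated with a small Python sketch. It is not the authors' implementation; it assumes a taxonomy tree stored as a hypothetical child-to-parent dictionary, a cut given as a set of taxonomy values, the string "*" standing in for the special value ⊥j, and discretization intervals described by a sorted list of split points. The helper names generalize, suppress and discretize are illustrative only.

```python
from bisect import bisect_right

# Hypothetical marker playing the role of the special value ⊥j for suppressed values.
SUPPRESSED = "*"

def generalize(value, parent, cut):
    """Walk up the taxonomy tree (child -> parent dict) until a value in the cut is reached.

    Raises KeyError if the cut does not cover this value's root-to-leaf path.
    """
    while value not in cut:
        value = parent[value]
    return value

def suppress(value, disclosed):
    """Keep disclosed values; every other value is replaced by the same special value."""
    return value if value in disclosed else SUPPRESSED

def discretize(x, bounds):
    """Map a number to the half-open interval [lo, hi) between consecutive split points."""
    i = bisect_right(bounds, x) - 1
    if i < 0 or i >= len(bounds) - 1:
        raise ValueError(f"{x} lies outside [{bounds[0]}, {bounds[-1]})")
    return (bounds[i], bounds[i + 1])
```

For example, with a hypothetical Education taxonomy in which Bachelors and Masters sit under University, generalize("Masters", parent, {"University", "Secondary"}) returns "University", and discretize(37, [0, 30, 50, 100]) returns (30, 50). Walking up a parent map keeps each generalized value on exactly one root-to-leaf path, which is what makes the result a cut.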
B. ID3 Algorithm
Each non-leaf node of a decision tree corresponds to an input attribute, and each arc to a possible value of that attribute. A leaf node corresponds to the expected value of the output attribute when the input attributes are described by the path from the root node to that leaf node.
In a "good" decision tree, each non-leaf node should correspond to the input attribute that is the most informative about the output attribute among all the input attributes not yet considered on the path from the root node to that node. This is because we would like to predict the output attribute using the smallest possible number of questions on average.
Entropy is used to determine how informative a particular input attribute is about the output attribute for a subset of the training data. Entropy is a measure of uncertainty in communication systems and is fundamental in modern information theory.

III PROPOSED K-ANONYMIZATION TECHNIQUE
A. Masking operation
To transform T to satisfy the anonymity requirement, we apply the three types of masking operations described above (generalization, suppression and discretization) to the attributes D(j) in ∪QID(i).

B. Anonymization by top-down refinement
For a given data set, consider the p quasi identifiers QID(1), ..., QID(p) on table T. a(qid(i)) denotes the number of data records in T that share the value qid(i) on QID(i). The anonymity of QID(i), denoted A(QID(i)), is the smallest a(qid(i)) for any value qid(i) on QID(i). A table T satisfies the anonymity requirement {(QID(1), k(1)), ..., (QID(p), k(p))} if A(QID(i)) ≥ k(i) for 1 ≤ i ≤ p, where k(i) is the anonymity threshold on QID(i) specified by the data provider. We record, for each quasi identifier, the number of records sharing each of its values in the given database, and check the anonymity requirement against the threshold level k(i) for each QID(i).

C. Refinement of the masked datasets
A table T can be masked by a sequence of refinements starting from the most masked state, in which each attribute is either generalized to the topmost value, suppressed to the special value, or represented by a single interval. The method iteratively refines a masked value selected from the current set of cuts, suppressed values and intervals, until any further refinement would violate the anonymity requirement. Each refinement increases the information and decreases the anonymity, since records with specific values are more distinguishable. The key is selecting the best refinement at each step with both impacts considered.
The notion of refinement on the different types of attributes D(j) ∈ ∪QID(i), and the selection criterion for a single refinement, are formally defined below.
Refinement for Generalization: Consider a categorical attribute D(j) with a user-specified taxonomy tree. Let child(v) be the set of child values of v in the taxonomy tree. A refinement, written v → child(v), replaces the parent value v with the child value in child(v) that generalizes the domain value in each (generalized) record that contains v.
Refinement for Suppression: For a categorical attribute D(j) without a taxonomy tree, a refinement refers to disclosing one value v from the set of suppressed values Sup(j). Let R(j) denote the set of suppressed records that currently contain the special value ⊥j. Disclosing v means replacing ⊥j with v in all records in R(j) that originally contain v.
Refinement for Discretization: For a continuous attribute, refinement is similar to that for generalization, except that no prior taxonomy tree is given and the taxonomy tree has to be grown dynamically in the process of refinement. Initially, the interval that covers the full range of the attribute forms the root. The refinement on an interval v, written v → child(v), refers to the optimal split of v into two child intervals child(v) that maximizes the information gain. Anonymity is not used for finding a split that is good for classification. This is similar to defining a taxonomy tree, where the main consideration is how the taxonomy best describes the application. Because of this extra step of identifying the optimal split of the parent interval, we treat continuous attributes separately from categorical attributes with taxonomy trees.
A refinement is valid (with respect to T) if T satisfies the anonymity requirement after the refinement. A refinement is beneficial (with respect to T) if more than one class is involved in the refined records. A refinement is performed only if it is both valid and beneficial. Therefore, a refinement guarantees that every newly generated qid has a(qid) ≥ k.
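The anonymity requirement of Section III.B and the valid/beneficial tests of Section III.C can be checked directly on a table. The following Python sketch assumes, hypothetically, that T is a non-empty list of dictionaries mapping attribute names to (already masked) values, and that a requirement is a list of (QID, k) pairs where each QID is a tuple of attribute names. The function names are illustrative, not taken from the paper.

```python
from collections import Counter

def anonymity(records, qid):
    """A(QID): the smallest a(qid), i.e. the size of the rarest value combination on QID.

    Assumes `records` is non-empty.
    """
    counts = Counter(tuple(r[attr] for attr in qid) for r in records)
    return min(counts.values())

def satisfies_requirement(records, requirement):
    """True if A(QID(i)) >= k(i) for every (QID(i), k(i)) pair in the requirement."""
    return all(anonymity(records, qid) >= k for qid, k in requirement)

def is_valid(refined_records, requirement):
    """Valid: the table obtained after the refinement still satisfies the requirement."""
    return satisfies_requirement(refined_records, requirement)

def is_beneficial(refined_subset, class_attr):
    """Beneficial: the records affected by the refinement involve more than one class."""
    return len({r[class_attr] for r in refined_subset}) > 1
```

On a four-record table where each (Age interval, Sex) combination occurs twice, anonymity returns 2, so a requirement with k = 2 is met while k = 3 is not. In a top-down search, one would apply a candidate refinement to a copy of T and keep it only if both is_valid and is_beneficial hold.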
D. Computation of InfoGain, AnonyLoss and Score of the refinement
InfoGain(v) is defined as

$$\mathrm{InfoGain}(v) = I(R_v) - \sum_{c} \frac{|R_c|}{|R_v|}\, I(R_c) \qquad (1)$$

where Rv is the set of records generalized to v, Rc is the set of records generalized to a child value c ∈ child(v), and I(Rx) is the entropy of Rx:

$$I(R_x) = -\sum_{cls} \frac{freq(R_x, cls)}{|R_x|} \times \log_2 \frac{freq(R_x, cls)}{|R_x|} \qquad (2)$$

freq(Rx, cls) is the number of data records in Rx having the class cls. Intuitively, I(Rx) measures the entropy (or "impurity") of classes in Rx. The more dominant the majority class in Rx is, the smaller I(Rx) is (i.e., the less entropy in Rx). Therefore, I(Rx) measures the error, because records of non-majority classes are counted as errors.
AnonyLoss(v) is defined as

$$\mathrm{AnonyLoss}(v) = \mathrm{avg}\{A(QID_j) - A_v(QID_j)\} \qquad (3)$$

where A(QIDj) and Av(QIDj) represent the anonymity before and after refining v. avg{A(QIDj) − Av(QIDj)} is the average loss of anonymity over all QIDj that contain the attribute of v.
We use the refinement process to heuristically maximize the classification goal. Consider a refinement v → child(v), where v ∈ Dj and Dj is either a categorical attribute with a user-specified taxonomy tree or a continuous attribute with a dynamically grown taxonomy tree. The refinement has two effects: it increases the information of the refined records with respect to classification, and it decreases the anonymity of the refined records with respect to privacy. These effects are measured by "information gain", denoted InfoGain(v) in (1), and "anonymity loss", denoted AnonyLoss(v) in (3). v is a good candidate for refinement if InfoGain(v) is large and AnonyLoss(v) is small. Our selection criterion chooses, for the next refinement, the candidate v that has the maximum information-gain/anonymity-loss trade-off, defined as

$$\mathrm{Score}(v) = \frac{\mathrm{InfoGain}(v)}{\mathrm{AnonyLoss}(v) + 1} \qquad (4)$$

1 is added to AnonyLoss(v) to avoid division by zero. Each choice of InfoGain(v) and AnonyLoss(v) gives a trade-off between classification and anonymization. It should be noted that Score is not a goodness metric of k-anonymization. In fact, it is difficult to have a closed-form metric that captures the classification goal on future data.

E. Implementation of ID3 algorithm
The ID3 algorithm is used to construct the decision tree by employing a top-down, greedy search through the given sets, testing each attribute at every tree node. To select the attribute that is most useful for classifying a given set, we introduce a metric: information gain. To find an optimal way to classify a learning set, we need to minimize the number of questions asked. Thus, we need a function that can measure which questions provide the most balanced splitting.

IV RESULTS
Table 1 depicts the runtime of top-down refinement using generalization, suppression and discretization for 200K to 1M data records, based on two types of anonymity requirements. AllAttQID refers to the single quasi identifier containing all 14 attributes. This is one of the most time-consuming settings because it has the largest number of candidate refinements to consider at each iteration, so top-down refinement requires more iterations to reach a solution and hence more runtime. Top-down refinement takes approximately 80 seconds to transform 1M records. Compared to AllAttQID, top-down refinement becomes less efficient for handling multiple quasi identifiers. The implementation results are summarized below. Fig. 1 shows the time taken for masking when executed with different numbers of records.
Data Source: UCI Repository: Adult dataset
Fields: Age, Work Class, Sex
Operation: Masking
Suppression: Work Class, Sex
Discretization: Age

TABLE 1. TIME TAKEN IN SECONDS

Figure 1. Time taken for masking

Table 2 summarizes the final results of applying the ID3 algorithm to the compressed and the original data for a series of inputs. A comparison of the classification accuracy on the transformed data with that on the original data is depicted in Fig. 2.
Data Source: UCI Repository: Adult dataset
Fields: Age, Work Class, Sex
Operation: Masking
Suppression: Work Class, Sex
Discretization: Age
Generalization: Education
Operation: ID3 Algorithm
Accuracy Calculation:
Accuracy % = No. of records selected / Entropy
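Returning to Section III.D and III.E, equations (1) to (4) and the ID3 attribute selection that relies on information gain can be sketched in Python as below. This is an illustrative sketch, not the authors' code: class labels are passed as plain lists, the anonymity values before and after a refinement are supplied as hypothetical dictionaries keyed by QID name (restricted to the QIDs that contain the refined attribute), and id3 assumes rows are dictionaries with categorical values. Base-2 logarithms are used for the entropy.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """I(Rx), equation (2): entropy in bits of the class labels of a record set Rx."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(parent_labels, child_label_groups):
    """InfoGain(v), equation (1): I(Rv) minus the size-weighted entropies I(Rc) of the children."""
    n = len(parent_labels)
    return entropy(parent_labels) - sum(len(g) / n * entropy(g) for g in child_label_groups)

def anony_loss(before, after):
    """AnonyLoss(v), equation (3): average of A(QIDj) - Av(QIDj).

    `before` and `after` are hypothetical dicts mapping each QIDj that contains
    the refined attribute to its anonymity before and after refining v.
    """
    drops = [before[q] - after[q] for q in before]
    return sum(drops) / len(drops)

def score(parent_labels, child_label_groups, before, after):
    """Score(v), equation (4); the +1 avoids division by zero."""
    return info_gain(parent_labels, child_label_groups) / (anony_loss(before, after) + 1)

def id3(rows, attrs, target):
    """Minimal ID3: greedy top-down splits on the attribute with the highest information gain.

    Returns a class label for a leaf, or a tuple (attribute, {value: subtree}).
    Assumes `rows` is a non-empty list of dicts with categorical values.
    """
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure node, or no attributes left to test

    def gain(attr):
        # Group the class labels by the value of `attr`, then score the split.
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r[target])
        return info_gain(labels, list(groups.values()))

    best = max(attrs, key=gain)
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, remaining, target)
    return (best, branches)
```

On a toy four-record table where Age alone determines the class, id3 splits on Age. A refinement with an information gain of 1.0 bit that lowers the anonymity of one QID from 4 to 2 gets Score(v) = 1.0 / (2 + 1), about 0.33; the same gain with no anonymity loss would score 1.0, which is how the criterion favors cheap refinements.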
Figure 2. Accuracy using k-Anonymity (chart "Privacy Using K-Anonymity": Decision Tree Privacy, 0 to 120, plotted against no of records, 10 to 350, for the Original Table and the Compressed Table)

V CONCLUSION
The problem of ensuring individuals' anonymity while releasing person-specific data for classification analysis is considered. Previous optimal k-anonymization approaches based on a closed-form cost metric do not address this classification requirement. The proposed method can be used to hide valuable information while presenting it in a publicly accessible place such as the Internet. The results show that when classification is applied to both the original table and the transformed table, the accuracy on the transformed table is not much lower than that on the original table. The proposed privacy preserving transformation preserves the nature of the data even in the transformed form: the classification accuracy on the transformed data is almost equal to that on the original dataset.

REFERENCES
[1] B. C. M. Fung, K. Wang, and P. S. Yu, "Anonymizing classification data for privacy preservation," IEEE Transactions on Knowledge and Data Engineering, 2007.
[2] P. Samarati and L. Sweeney, "Generalizing data to provide anonymity when disclosing information," Proc. of the 17th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS 98), Seattle, WA, June 1998, p. 188.
[3] P. Samarati, "Protecting respondents' identities in microdata release," IEEE Transactions on Knowledge and Data Engineering (TKDE), vol. 13, no. 6, 2001, pp. 1010-1027.
[4] L. Sweeney, "Achieving k-anonymity privacy protection using generalization and suppression," International Journal on Uncertainty, Fuzziness, and Knowledge-based Systems, vol. 10, no. 5, 2002, pp. 571-588.
[5] R. J. Bayardo and R. Agrawal, "Data privacy through optimal k-anonymization," Proc. of the 21st International Conference on Data Engineering (ICDE), Tokyo, Japan, April 2005, pp. 217-228.
[6] G. Aggarwal, T. Feder, K. Kenthapadi, R. Motwani, R. Panigrahy, D. Thomas, and A. Zhu, "Approximation algorithms for k-anonymity," Journal of Privacy Technology, no. 20051120001, November 2005.
[7] A. Meyerson and R. Williams, "On the complexity of optimal k-anonymity," Proc. of the 23rd ACM Symposium on Principles of Database Systems (PODS), 2004, pp. 223-228.
[8] L. Sweeney, "Datafly: A system for providing anonymity in medical data," Proc. of the International Conference on Database Security, 1998, pp. 356-381.
[9] V. S. Iyengar, "Transforming data to satisfy privacy constraints," Proc. of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, AB, Canada, July 2002, pp. 279-288.
[10] K. Wang, P. Yu, and S. Chakraborty, "Bottom-up generalization: a data mining solution to privacy protection," Proc. of the 4th IEEE International Conference on Data Mining (ICDM), November 2004.
[11] B. C. M. Fung, K. Wang, and P. S. Yu, "Top-down specialization for information and privacy preservation," Proc. of the 21st International Conference on Data Engineering (ICDE), Tokyo, Japan, April 2005, pp. 205-216.