IOSR Journal of Computer Engineering (IOSRJCE)
ISSN: 2278-0661 Volume 5, Issue 1 (Sep-Oct. 2012), PP 06-11
www.iosrjournals.org
An Effective Heuristic Approach for Hiding Sensitive Patterns in
Databases
Mrs. P. Cynthia Selvi¹, Dr. A. R. Mohamed Shanavas²
¹Associate Professor, Dept. of Computer Science, KNGA College(W), Thanjavur 613007, Affiliated to Bharathidasan University, Tiruchirapalli, TamilNadu, India.
²Associate Professor, Dept. of Computer Science, Jamal Mohamed College, Tiruchirapalli 620020, Affiliated to Bharathidasan University, Tiruchirapalli, TamilNadu, India.
Abstract: Privacy has been identified as a vital requirement in designing and implementing Data Mining (DM) systems. This has motivated Privacy Preservation in Data Mining (PPDM) as a rising field of research, and various approaches are being introduced by researchers. One of these approaches is the sanitization process, which transforms the source database into a modified one from which adversaries cannot extract the sensitive patterns. This study addresses this concept and proposes an effective heuristic-based algorithm aimed at minimizing the number of items removed from the source database, possibly with no hiding failure.
Keywords: Privacy Preserving Data Mining, Restrictive Patterns, Sanitized Database, Sensitive Transactions.
I. Introduction
PPDM is a novel research direction in DM, in which DM algorithms are analyzed for the side effects they incur on data privacy. The main objective of PPDM is to develop algorithms for modifying the original data in some way, so that the private data and private knowledge remain private even after the mining process [1]. In DM, users are provided with the data rather than the association rules and are free to use their own mining tools; so the restriction for privacy has to be applied to the data itself, before the mining phase.
For this reason, we need to develop mechanisms that can lead to new privacy control systems that convert a given database into a new one in such a way as to preserve the general rules mined from the original database. The procedure of transforming the source database into a new database that hides some sensitive patterns or rules is called the sanitization process [2]. To do so, a small number of transactions have to be modified by deleting one or more items from them, or even by adding noise to the data by turning some items from 0 to 1 in some transactions. The released database is called the sanitized database. On one hand, this approach slightly modifies some data, which is perfectly acceptable in some real applications [3, 4]. On the other hand, such an approach must satisfy the following restrictions:
o The impact on the source database has to be minimal
o An appropriate balance between the need for privacy and knowledge has to be guaranteed.
This study mainly focuses on the task of minimizing the impact on the source database by reducing the number of items removed from it, with only one scan of the database. Section-2 briefly summarizes the previous work done by various researchers; in Section-3, preliminaries are given. Section-4 states some basic definitions, of which Definition 5 is framed by us and used in the proposed heuristic-based algorithm. In Section-5, the proposed algorithm is presented with an illustration and example. As the detailed analysis of experimental results on large databases is still in progress, only the basic measures of effectiveness are presented in this paper, after testing the algorithm on a sample generated database.
II. Related Work
Many researchers have paid attention to the problem of privacy preservation in association rule mining in recent years. The class of solutions to this problem has basically been restricted to randomization, data partition, and data sanitization.
The idea behind data sanitization, i.e., reducing the support values of restrictive itemsets, was first introduced by Atallah et al. [2], who proved that optimal sanitization is an NP-hard problem. In [4], the authors generalized the problem in the sense that they considered hiding both sensitive frequent itemsets and sensitive rules. Although these algorithms ensure privacy preservation, they are CPU-intensive, since they require multiple scans over the transactional database. In the same direction, Saygin et al. [5] introduced a method for selectively removing individual values from a database to prevent the discovery of a set of rules, while preserving the data for other applications. They proposed some algorithms to obscure a given set of sensitive rules by replacing known values with unknowns, while minimizing the side effects on non-sensitive rules. These algorithms also require multiple scans to sanitize a database, depending on the number of association rules to be hidden.
Oliveira introduced many algorithms, of which IGA [6] and SWA [7] aim at multiple rule hiding. IGA has a low misses cost; it groups restrictive itemsets and assigns a victim item to each group. However, this clustering leads to overlap between groups, and it is not an efficient method for optimally clustering the itemsets; it can be improved further by reducing the number of deleted items. SWA, on the other hand, improves the balance between protection of sensitive knowledge and pattern discovery, but it incurs an extra cost because some rules are removed inadvertently. In our work, we focus on the heuristic-based data sanitization approach.
III. Preliminaries
Transactional Database. A transactional database is a relation consisting of transactions, in which each transaction t is characterized by an ordered pair, defined as t = ⟨Tid, list-of-elements⟩, where Tid is a unique transaction identifier number and list-of-elements represents the list of items making up the transaction. For instance, in market basket data, a transactional database is composed of business transactions in which the list-of-elements represents items purchased in a store.
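For concreteness, such a database is conveniently represented in code as a mapping from Tid to itemset. The following minimal Python sketch is our illustration only (the items are made up) and is the layout assumed by the later examples in this article:

```python
# A transactional database: each transaction is a pair <Tid, list-of-elements>.
D = {
    "T01": {"bread", "milk", "eggs"},
    "T02": {"bread", "butter"},
    "T03": {"milk", "eggs", "butter"},
}
print(len(D), "transactions;", "T02 holds", sorted(D["T02"]))
```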
Basics of Association Rules. One of the most studied problems in data mining is the process of discovering association rules from large databases. Most of the existing algorithms for association rules rely on the support-confidence framework introduced in [8].
Formally, association rules are defined as follows: Let I = {i1, ..., in} be a set of literals, called items. Let D be a database of transactions, where each transaction t is an itemset such that t ⊆ I. A unique identifier, called Tid, is associated with each transaction. A transaction t supports X, a set of items in I, if X ⊆ t. An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I and A ∩ B = ∅. Thus, we say that a rule A ⇒ B holds in the database D with support s if s = |A ∪ B| / N, where |A ∪ B| is the number of transactions in D containing all items of A ∪ B and N is the number of transactions in D. Similarly, we say that a rule A ⇒ B holds in the database D with confidence c if c = |A ∪ B| / |A|, where |A| is the number of occurrences of the set of items A in the set of transactions D. While the support is a measure of the frequency of a rule, the confidence is a measure of the strength of the relation between sets of items.
Association rule mining algorithms rely on two thresholds: minimum support (minSup) and minimum confidence (minConf). The problem of mining association rules was first proposed in 1993 [8].
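As a concrete illustration of the support-confidence framework above, the following Python sketch (the function names and data are ours, purely illustrative) computes the support of an itemset and the confidence of a rule A ⇒ B over a list of transactions:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(A, B, transactions):
    """conf(A => B) = sup(A u B) / sup(A)."""
    return support(A | B, transactions) / support(A, transactions)

transactions = [{"C", "D", "E"}, {"C", "D"}, {"D", "E"}, {"C", "E"}]
print(support({"C", "D"}, transactions))       # 0.5
print(confidence({"C"}, {"D"}, transactions))  # sup{C,D}/sup{C} = 2/3
```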
Frequent Pattern. A pattern X is called a frequent pattern if Sup(X) ≥ minSup, or if the absolute support of X satisfies the corresponding minimum support count threshold. [A pattern is an itemset; in this article, both terms are used synonymously.] All association rules can be derived directly from the set of frequent patterns [8, 9]. The conventions followed here are:
o Apriori property [10]: all nonempty subsets of a frequent itemset (pattern) must also be frequent.
o Antimonotone property: if a set cannot pass a test, then all of its supersets will fail the same test as well.
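These two properties justify the standard Apriori candidate-pruning step. The small sketch below (our illustration, not part of the paper) discards a k-itemset candidate as soon as one of its (k-1)-subsets is known to be infrequent:

```python
from itertools import combinations

def prune(candidates, frequent_prev):
    """Apriori pruning: a k-itemset survives only if all of its
    (k-1)-subsets are already known to be frequent."""
    kept = []
    for c in candidates:
        if all(frozenset(s) in frequent_prev for s in combinations(c, len(c) - 1)):
            kept.append(c)
    return kept

frequent_2 = {frozenset({"C", "D"}), frozenset({"C", "E"})}
print(prune([frozenset({"C", "D", "E"})], frequent_2))  # [] -- {D,E} is not frequent
```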
Privacy Preservation in Frequent Patterns. The most basic model of privacy preserving data processing is one
in which we erase the sensitive entries in the data. These erased entries are usually particular patterns which are
decided by the user, who may either be the owner or the contributor of the data.
IV. Problem Definition
In this approach, the goal is to hide a group of frequent patterns which contains highly sensitive
knowledge. Such sensitive patterns that should be hidden are called restrictive patterns. Restrictive patterns can
always be generated from frequent patterns.
Definition 1. Let D be a source database containing the set of all transactions. T denotes a set of transactions, each transaction t being an itemset with t ⊆ I. In addition, each k-itemset X has an associated set of transactions T(X) = { t ∈ T | X ⊆ t }, where T(X) ⊆ T and X ⊆ t for every t ∈ T(X).
Definition 2: Restrictive Patterns. Let D be a source database, P be the set of all frequent patterns that can be mined from D, and RulesH be a set of decision support rules that need to be hidden according to some security policies. A set of patterns, denoted by RP, is said to be restrictive if RP ⊂ P and if and only if RP would derive the set RulesH. ¬RP is the set of non-restrictive patterns, such that RP ∪ ¬RP = P.
Definition 3: Sensitive Transactions. Let T be the set of all transactions in a source database D, and RP be the set of restrictive patterns mined from D. A set of transactions is said to be sensitive, denoted by ST, if every t ∈ ST contains at least one restrictive pattern, i.e., ST = { t ∈ T | ∃X ∈ RP, X ⊆ t }. Moreover, if ST ⊆ T then all restrictive patterns can be mined from ST and only from ST.
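Definition 3 translates directly into code. The sketch below (our notation and sample data, for illustration only) extracts ST from a database given the restrictive patterns:

```python
def sensitive_transactions(D, RP):
    """ST = { t in T | some X in RP satisfies X subset-of t } (Definition 3)."""
    return {tid for tid, items in D.items()
            if any(rp <= items for rp in RP)}

D = {"T01": {"C", "D", "E", "A", "B"}, "T05": {"E", "B", "F"}}
RP = [{"C", "D"}, {"D", "E"}, {"C", "A"}]
print(sensitive_transactions(D, RP))  # {'T01'}; T05 holds no restrictive pattern
```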
Definition 4: Transaction Degree. Let D be a source database and ST be the set of all sensitive transactions in D. The degree of a sensitive transaction t ∈ ST, denoted deg(t), is defined as the number of restrictive patterns that t contains.
Definition 5: Cover. The cover of an item Ak is defined as CAk = { rpi | Ak ∈ rpi, rpi ∈ RP, 1 ≤ i ≤ |RP| }, i.e., the set of all restrictive patterns (rpi's) that contain Ak. The item included in the maximum number of rpi's is the one with maximal cover, or maxCover; i.e., maxCover = max( |CA1|, |CA2|, ..., |CAn| ) such that Ak ∈ rpi ∈ RP.
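In code, the covers and the maxCover item can be collected in a single pass over RP. The following sketch is illustrative (names are ours); a tie between items of equal cover is broken arbitrarily:

```python
from collections import defaultdict

def covers(RP):
    """C_Ak = set of restrictive patterns containing item Ak (Definition 5)."""
    cov = defaultdict(set)
    for i, rp in enumerate(RP):
        for item in rp:
            cov[item].add(i)          # store pattern indices
    return cov

RP = [{"C", "D"}, {"D", "E"}, {"C", "A"}]
cov = covers(RP)
max_item = max(cov, key=lambda a: len(cov[a]))  # item with maxCover
print(max_item, len(cov[max_item]))             # e.g. 'C' (or 'D'), cover = 2
```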
Based on the above definitions, the main strategy addressed in this work can be stated as given below:
If D is the source database of transactions and P is the set of relevant patterns that can be mined from D, the goal is to transform D into a sanitized database D', so that the non-restrictive patterns in P can still be mined from D' while the restrictive ones are hidden. In this case, D' becomes the released database.
V. Sanitization Algorithm
The optimal sanitization has been proved to be an NP-hard problem. To alleviate the complexity of the
optimal sanitization, some heuristics could be used. A heuristic does not guarantee the optimal solution but
usually finds a solution close to the best one in a faster response time. In this section, the proposed sanitizing
algorithm and the heuristics to sanitize a source database are introduced.
Given the source database (D), and the restrictive patterns(RP), the goal of the sanitization process is to
protect RP against the mining techniques used to disclose them. The sanitization process decreases the support
values of restrictive patterns by removing items from sensitive transactions. This process mainly includes four
sub-problems:
1. identifying the set of sensitive transactions for each restrictive pattern;
2. selecting the partial sensitive transactions to sanitize;
3. identifying the candidate (victim) item to be removed;
4. rewriting the modified database after removing the victim items.
Basically, all sanitizing algorithms differ only in sub-problems 2 and 3.
5.1. Heuristic approach
In this work, the proposed algorithm is based on the following heuristics:
Heuristic-1. To solve sub-problem 2 stated above, ST is sorted in decreasing order of (deg + size), so that the sensitive transactions containing more patterns are selected first; this enables multiple patterns to be sanitized in a single iteration.
Heuristic-2. To solve subproblem-3, the following heuristic is used in the algorithm :
for every item Ak ∈ RP, find its cover and, starting from the maximal cover, find T = ∩ { t-list(rpi) | rpi ∈ CAk };
for every t ∈ T, mark Ak as the victim item and remove it.
The rationale behind these two heuristics is to minimize the sanitization rate and thereby reduce the impact on the source database.
Note: In this work, no sensitive transaction is completely removed (i.e., the number of transactions in the source database is not altered).
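In code, Heuristic-1 amounts to a single descending sort on the (deg + size) key. A minimal illustrative sketch, with sample data of our own:

```python
D = {"T01": {"C", "D", "E", "A", "B"}, "T03": {"C", "D", "E"}, "T04": {"C", "D", "E", "A"}}
deg = {"T01": 3, "T03": 2, "T04": 3}   # restrictive patterns per transaction
# Heuristic-1: process transactions with the largest (deg + size) first.
ST_sorted = sorted(D, key=lambda t: deg[t] + len(D[t]), reverse=True)
print(ST_sorted)  # ['T01', 'T04', 'T03']
```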
5.2. Algorithm
Input: (i) D – Source Database (ii) RP – Set of all Restrictive Patterns
Output: D' – Sanitized Database
Pre-requisites:
(i) Find the frequent patterns (itemsets) using the Matrix Apriori algorithm;
(ii) Form Look-up Table-1: ∀ Ak ∈ RP, LT1(Ak) ← t-list of Ak, t ∈ D
(iii) Form Look-up Table-2: ∀ rpi ∈ RP, LT2(rpi) ← t-list of rpi, t ∈ D
(iv) Form Look-up Table-3: ∀ Ak ∈ RP, LT3(Ak) ← rpi-list of Ak, rpi ∈ RP
Algorithm maxCover1: // based on Heuristics 1 & 2 //
Step 1: calculate supCount(rpi) ∀ rpi ∈ RP and sort in decreasing order;
Step 2: find the Sensitive Transactions (ST) w.r.t. RP;
a) calculate deg(t), size(t) ∀ t ∈ ST;
b) sort t ∈ ST in decreasing order of deg & size;
Step 3: find ¬ST ← D − ST; // ¬ST – non-sensitive transactions //
Step 4: // Find ST' //
find the cover for every item Ak ∈ RP and sort in decreasing order of cover;
for each item Ak ∈ RP do
{
repeat
find T = ∩ { LT2(rpi) | rpi ∈ LT3(Ak) } // transactions common to all rpi containing Ak //
for each t ∈ T do
{
delete item Ak in non-victim transactions such that Ak ∈ rpi ∈ LT3(Ak); // Ak – victim item //
// initially all t are non-victim //
decrease the supCount of the rpi's for which t is non-victim;
mark t as a victim transaction in each t-list of rpi ∈ LT3(Ak);
}
until (supCount = 0) for all rpi ∈ RP
}
Step 5: D' ← ¬ST ∪ ST'
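A compact Python rendering of maxCover1 follows. This is a sketch of our reading of the pseudocode, not the paper's Java implementation: the data layout (a dict of Tid → itemset, patterns as sets), the tie-breaking between items of equal cover, and the final fallback pass are all our own choices.

```python
from functools import reduce

def max_cover1(D, RP):
    """Sketch of maxCover1: greedily remove victim items (Heuristic-2)
    until every restrictive pattern in RP has supCount 0 in D'."""
    D_prime = {tid: set(items) for tid, items in D.items()}  # non-sensitive t stay intact
    # Pre-requisites (ii)-(iv): look-up tables.
    LT2 = [{tid for tid, items in D.items() if rp <= items} for rp in RP]
    LT3 = {}
    for i, rp in enumerate(RP):
        for item in rp:
            LT3.setdefault(item, []).append(i)
    sup = [len(tlist) for tlist in LT2]            # Step 1: supCount(rp_i)
    victim = [set() for _ in RP]                   # victim transactions per rp_i
    # Step 4: visit items in decreasing order of cover.
    for Ak in sorted(LT3, key=lambda a: len(LT3[a]), reverse=True):
        if not any(sup):
            break                                  # all patterns already hidden
        T = reduce(set.__and__, (LT2[i] for i in LT3[Ak]))  # common transactions
        for t in sorted(T):
            affected = [i for i in LT3[Ak] if sup[i] > 0 and t not in victim[i]]
            if not affected:
                continue                           # removing Ak here hides nothing new
            D_prime[t].discard(Ak)                 # Ak is the victim item in t
            for i in affected:
                sup[i] -= 1
                victim[i].add(t)                   # mark t as victim for rp_i
    # Fallback pass (our addition): finish any pattern the greedy pass missed.
    for i, rp in enumerate(RP):
        for t in sorted(LT2[i] - victim[i]):
            if sup[i] == 0:
                break
            D_prime[t].discard(min(rp))            # arbitrary victim item
            sup[i] -= 1
            victim[i].add(t)
    return D_prime
```

The greedy pass mirrors Heuristic-2: items are visited in decreasing cover order, and each transaction's contribution to a pattern's supCount is subtracted only once (the victim marking), exactly as in the worked example of Section 5.3.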
5.3. Illustration
The following example helps in understanding how the proposed heuristics work. Refer to the source database (D) in Table-1. The set of all restrictive patterns to be hidden is given in Table-2. The sensitive transactions ST (transactions that include at least one restrictive pattern) are identified in D and extracted; they are sorted in decreasing order of their deg and size (Table-3). The non-sensitive transactions (¬ST) are also filtered and stored separately (Table-5).
The proposed algorithm refers to the look-up tables (listed below) to speed up the process.
LookUp Table-1 [ item ← t-list ]
Item | Transactions         | No. of Trans.
C    | T01,T04,T02,T03,T06  | 5
D    | T01,T04,T02,T03,T06  | 5
E    | T01,T04,T03,T06      | 4
A    | T01,T04,T02          | 3

LookUp Table-2 [ rpi ← t-list ]
rpi  | Transactions         | SupCount
R1   | T01,T04,T02,T03,T06  | 5
R2   | T01,T04,T03,T06      | 4
R3   | T01,T04,T02          | 3
LookUp Table-3 [ item ← rpi-list ]
Item | Rules  | Cover
C    | R1, R3 | 2
D    | R1, R2 | 2
E    | R2     | 1
A    | R3     | 1

Table-6. Sanitized Database (D')
Tid | Pattern (Itemset)
T01 | E,A,B
T02 | D,A,B,F
T03 | C,E
T04 | E,A
T05 | E,B,F
T06 | C,E
The proposed algorithm is based on Heuristics 1 and 2. Here, for every item Ak starting from the maxCover item (refer LookUp Table-3), find the sensitive transactions common to all rules associated with Ak, i.e., T = ∩ { LT2(rpi) | rpi ∈ rpi-list(Ak) }. In every transaction t ∈ T, remove Ak and decrease the supCount of all restrictive patterns (rpi) associated with t that contain Ak.
Example: For item C (the first item with maxCover), R1 & R3 form rpi-list(C), whose common transactions are [T01, T04, T02]. Remove C from these transactions, which reduces the supCount of both R1 and R3 by 3, and mark these t as victim transactions for R1 and R3. When we consider the next item D, the common t-list of rpi-list(D) is [T01, T04, T03, T06]. Removing D from T01 and T04 does not reduce the supCount of R1 (because those transactions were already counted in the previous iteration) but does reduce the supCount of R2; hence, remove it and decrease only the supCount of R2. Removing D from T03 and T06, in contrast, reduces the supCount of both R1 and R2. This process is repeated until the supCount of every rpi is reduced to 0. The modified form of the sensitive transactions is denoted ST'.
The sanitized database D' (refer Table-6) is then formed by combining ¬ST (Table-5) and ST'.
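For reference, running the max_cover1 sketch given after Section 5.2 on this example reproduces Table-6. The source database below is reconstructed by us from Table-6 plus the removals described above (Table-1 itself is not reproduced in this extract), and the patterns R1 = {C, D}, R2 = {D, E}, R3 = {C, A} are our inference from LookUp Tables 1-3, whose t-lists and covers they match exactly:

```python
# Assumes max_cover1 from the sketch above.
# Source database D, reconstructed from Table-6 and the described removals.
D = {
    "T01": {"C", "D", "E", "A", "B"},
    "T02": {"C", "D", "A", "B", "F"},
    "T03": {"C", "D", "E"},
    "T04": {"C", "D", "E", "A"},
    "T05": {"E", "B", "F"},          # the only non-sensitive transaction
    "T06": {"C", "D", "E"},
}
RP = [{"C", "D"}, {"D", "E"}, {"C", "A"}]   # R1, R2, R3 (our inference)

D_prime = max_cover1(D, RP)
for tid in sorted(D_prime):
    print(tid, sorted(D_prime[tid]))
# Matches Table-6, e.g. T01 -> ['A', 'B', 'E'], T03 -> ['C', 'E']
```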
VI. Implementation
The algorithm was tested in the environment of an Intel Core 2 Duo processor with 2.5 GHz speed and 4 GB RAM, running Windows XP. We used NetBeans 6.9.1 to code the algorithm in Java (JDK 1.7) with SQL Server 2005. It has been tested on a sample database to verify the main objectives (minimal removal of items and no hiding failure), and it requires only one scan of the source database. However, testing on very large real databases is in progress for a detailed analysis. Before the hiding process, the frequent patterns are obtained using the Matrix Apriori algorithm [11], which is faster and uses a simpler data structure than the Apriori algorithm [9, 10]. Moreover, it scans the database only twice and works without candidate generation.
VII. Effectiveness Measures
(i) Sanitization Rate (SR): the ratio of removed (victim) items to the total support count of the restrictive patterns (rpi) in the source database D:

SR = (number of victim items removed) / Σ rpi∈RP supCount(rpi, D)

With the sample database tested, the SR is found to be less than 50%.
(ii) Hiding Failure (HF): assessed by the restrictive patterns that failed to be hidden. In other words, if a hidden restrictive pattern cannot be extracted from the released database D' with an arbitrary minSup, that pattern causes no hiding-failure occurrence:

HF = |RP(D')| / |RP(D)|

where RP(X) denotes the restrictive patterns discoverable in database X. As far as the algorithm maxCover1 is concerned, the restrictive patterns are modified until their respective supCount becomes zero; moreover, it ensures that no transaction is completely removed. Hence this algorithm is 100% HF-free.
(iii) Misses Cost (MC): this measure deals with the legitimate (non-restrictive) patterns that were accidentally missed:

MC = ( |¬RP(D)| − |¬RP(D')| ) / |¬RP(D)|

Note: there is a compromise between MC and HF, i.e., the more patterns we hide, the more legitimate patterns we miss. With the sample database tested, however, this algorithm has 0% MC. (A sketch computing these measures is given at the end of this section.)
(iv) Artifactual Pattern (AP): AP occurs when artificial patterns are generated from D' as an outcome of the sanitization process:

AP = ( |P'| − |P ∩ P'| ) / |P'|

where P and P' are the sets of frequent patterns mined from D and D' respectively.
As this algorithm hides restrictive patterns by selectively removing items (instead of swapping, replacement, etc.) from the source database (D), it does not generate any artifactual pattern.
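As referenced in the note under (iii), the ratio measures can be computed directly from the pattern sets. The sketch below follows the formula readings given above; the function and variable names, and the toy pattern sets, are ours:

```python
def effectiveness(P_D, P_Dp, RP):
    """HF, MC, AP per the ratio definitions above (our reconstruction).
    P_D / P_Dp: frequent patterns mined from D and D'; RP: restrictive patterns."""
    RP = {frozenset(p) for p in RP}
    P_D = {frozenset(p) for p in P_D}
    P_Dp = {frozenset(p) for p in P_Dp}
    hf = len(P_Dp & RP) / len(P_D & RP)            # restrictive patterns surviving
    legit_D, legit_Dp = P_D - RP, P_Dp - RP
    mc = len(legit_D - legit_Dp) / len(legit_D)    # legitimate patterns lost
    ap = len(P_Dp - P_D) / len(P_Dp)               # patterns invented by sanitization
    return hf, mc, ap

P_D  = [{"C","D"}, {"D","E"}, {"C","A"}, {"A","B"}, {"E"}]  # mined from D
P_Dp = [{"A","B"}, {"E"}]                                   # mined from D'
RP   = [{"C","D"}, {"D","E"}, {"C","A"}]
print(effectiveness(P_D, P_Dp, RP))  # (0.0, 0.0, 0.0): no HF, no MC, no AP
```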
VIII. Conclusion
In this competitive yet cooperative business environment, companies need to share information with others while, at the same time, protecting their own confidential knowledge. To facilitate this kind of data sharing with privacy protection, the algorithm based on maxCover is proposed. This algorithm ensures that no counterpart or adversary can mine the restrictive patterns, even with an arbitrarily small supCount.
The algorithm is based on the strategy of simultaneously decreasing the support count of the maximum number of sensitive patterns (itemsets) with the minimum possible number of item removals, thereby reducing the impact on the source database. The proposed algorithm has a minimal sanitization rate, possibly with no hiding failure and a low misses cost. Above all, it scans the original database only once. It is important to note that the proposed algorithm is robust in the sense that no desanitization is possible: the alterations to the original database are not saved anywhere, since the owner of the database keeps an original copy intact while distributing the sanitized database. Moreover, there is no way to reproduce the original database from the sanitized one, as no encryption is involved.
As already mentioned, work on the time complexity analysis and an elaborate dissimilarity study between the original and sanitized databases using very large, publicly available real databases is in progress.
References
[1] Verykios, V.S., Bertino, E., Fovino, I.N., Provenza, L.P., Saygin, Y. and Theodoridis, Y., "State-of-the-art in Privacy Preserving Data Mining", ACM SIGMOD Record, vol. 33, no. 2, pp. 50-57, 2004.
[2] Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M. and Verykios, V.S., "Disclosure Limitation of Sensitive Rules", in Proc. of the IEEE Knowledge and Data Engineering Workshop, pp. 45-52, Chicago, Illinois, November 1999.
[3] Clifton, C. and Marks, D., "Security and Privacy Implications of Data Mining", in Workshop on Data Mining and Knowledge Discovery, pp. 15-19, Montreal, Canada, February 1996.
[4] Dasseni, E., Verykios, V.S., Elmagarmid, A.K. and Bertino, E., "Hiding Association Rules by Using Confidence and Support", in Proc. of the 4th Information Hiding Workshop, pp. 369-383, Pittsburgh, PA, April 2001.
[5] Saygin, Y., Verykios, V.S. and Clifton, C., "Using Unknowns to Prevent Discovery of Association Rules", ACM SIGMOD Record, 30(4):45-54, December 2001.
[6] Oliveira, S.R.M. and Zaiane, O.R., "Privacy Preserving Frequent Itemset Mining", in Proc. of the IEEE ICDM Workshop on Privacy, Security, and Data Mining, pp. 43-54, Maebashi City, Japan, December 2002.
[7] Oliveira, S.R.M. and Zaiane, O.R., "An Efficient One-Scan Sanitization for Improving the Balance between Privacy and Knowledge Discovery", Technical Report TR 03-15, June 2003.
[8] Agrawal, R., Imielinski, T. and Swami, A., "Mining Association Rules between Sets of Items in Large Databases", in Proc. of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, 1993, pp. 207-216.
[9] Agrawal, R. and Srikant, R., "Fast Algorithms for Mining Association Rules", in Proc. of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994, pp. 487-499.
[10] Han, J. and Kamber, M., Data Mining Concepts and Techniques, Oxford University Press, 2009.
[11] Pavon, J., Viana, S. and Gomez, S., "Matrix Apriori: Speeding up the Search for Frequent Patterns", in Proc. of the 24th IASTED International Conference on Databases and Applications, 2006, pp. 75-82.
Más contenido relacionado

La actualidad más candente

A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...IOSR Journals
 
Comparative study of frequent item set in data mining
Comparative study of frequent item set in data miningComparative study of frequent item set in data mining
Comparative study of frequent item set in data miningijpla
 
Classification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining techniqueClassification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining techniqueeSAT Publishing House
 
Application of data mining tools for
Application of data mining tools forApplication of data mining tools for
Application of data mining tools forIJDKP
 
Bj32809815
Bj32809815Bj32809815
Bj32809815IJMER
 
Frequent Item Set Mining - A Review
Frequent Item Set Mining - A ReviewFrequent Item Set Mining - A Review
Frequent Item Set Mining - A Reviewijsrd.com
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaIJDKP
 
A Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association MiningA Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association MiningIOSR Journals
 
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...ITIIIndustries
 
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset CreationTop Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset Creationcscpconf
 
A Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining MethodologiesA Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining MethodologiesIOSR Journals
 

La actualidad más candente (16)

A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
A Survey on Identification of Closed Frequent Item Sets Using Intersecting Al...
 
Comparative study of frequent item set in data mining
Comparative study of frequent item set in data miningComparative study of frequent item set in data mining
Comparative study of frequent item set in data mining
 
Classification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining techniqueClassification on multi label dataset using rule mining technique
Classification on multi label dataset using rule mining technique
 
Hu3414421448
Hu3414421448Hu3414421448
Hu3414421448
 
Application of data mining tools for
Application of data mining tools forApplication of data mining tools for
Application of data mining tools for
 
Bj32809815
Bj32809815Bj32809815
Bj32809815
 
130509
130509130509
130509
 
K355662
K355662K355662
K355662
 
Frequent Item Set Mining - A Review
Frequent Item Set Mining - A ReviewFrequent Item Set Mining - A Review
Frequent Item Set Mining - A Review
 
Ap26261267
Ap26261267Ap26261267
Ap26261267
 
Enhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging areaEnhancement techniques for data warehouse staging area
Enhancement techniques for data warehouse staging area
 
A Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association MiningA Quantified Approach for large Dataset Compression in Association Mining
A Quantified Approach for large Dataset Compression in Association Mining
 
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
GeneticMax: An Efficient Approach to Mining Maximal Frequent Itemsets Based o...
 
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
EXECUTION OF ASSOCIATION RULE MINING WITH DATA GRIDS IN WEKA 3.8
 
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset CreationTop Down Approach to find Maximal Frequent Item Sets using Subset Creation
Top Down Approach to find Maximal Frequent Item Sets using Subset Creation
 
A Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining MethodologiesA Survey on Fuzzy Association Rule Mining Methodologies
A Survey on Fuzzy Association Rule Mining Methodologies
 

Destacado

New Skills for Testers
New Skills for TestersNew Skills for Testers
New Skills for TestersIOSR Journals
 
Implementation of FC-TCR for Reactive Power Control
Implementation of FC-TCR for Reactive Power ControlImplementation of FC-TCR for Reactive Power Control
Implementation of FC-TCR for Reactive Power ControlIOSR Journals
 
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRMMDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRMIOSR Journals
 
A New Filtering Method and a Novel Converter Transformer for HVDC System.
A New Filtering Method and a Novel Converter Transformer for HVDC System.A New Filtering Method and a Novel Converter Transformer for HVDC System.
A New Filtering Method and a Novel Converter Transformer for HVDC System.IOSR Journals
 
Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...IOSR Journals
 
Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...
Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...
Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...IOSR Journals
 
Social Networking Websites and Image Privacy
Social Networking Websites and Image PrivacySocial Networking Websites and Image Privacy
Social Networking Websites and Image PrivacyIOSR Journals
 
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...IOSR Journals
 
Adjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers State
Adjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers StateAdjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers State
Adjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers StateIOSR Journals
 
A Music Visual Interface via Emotion Detection Supervisor
A Music Visual Interface via Emotion Detection SupervisorA Music Visual Interface via Emotion Detection Supervisor
A Music Visual Interface via Emotion Detection SupervisorIOSR Journals
 
A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...
A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...
A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...IOSR Journals
 
Generation and Implementation of Barker and Nested Binary codes
Generation and Implementation of Barker and Nested Binary codesGeneration and Implementation of Barker and Nested Binary codes
Generation and Implementation of Barker and Nested Binary codesIOSR Journals
 
E-Healthcare Billing and Record Management Information System using Android w...
E-Healthcare Billing and Record Management Information System using Android w...E-Healthcare Billing and Record Management Information System using Android w...
E-Healthcare Billing and Record Management Information System using Android w...IOSR Journals
 
Comparison of different Ant based techniques for identification of shortest p...
Comparison of different Ant based techniques for identification of shortest p...Comparison of different Ant based techniques for identification of shortest p...
Comparison of different Ant based techniques for identification of shortest p...IOSR Journals
 
Power Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANETPower Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANETIOSR Journals
 

Destacado (20)

G0444246
G0444246G0444246
G0444246
 
New Skills for Testers
New Skills for TestersNew Skills for Testers
New Skills for Testers
 
Implementation of FC-TCR for Reactive Power Control
Implementation of FC-TCR for Reactive Power ControlImplementation of FC-TCR for Reactive Power Control
Implementation of FC-TCR for Reactive Power Control
 
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRMMDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
MDSR to Reduce Link Breakage Routing Overhead in MANET Using PRM
 
A New Filtering Method and a Novel Converter Transformer for HVDC System.
A New Filtering Method and a Novel Converter Transformer for HVDC System.A New Filtering Method and a Novel Converter Transformer for HVDC System.
A New Filtering Method and a Novel Converter Transformer for HVDC System.
 
Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...Improving security for data migration in cloud computing using randomized enc...
Improving security for data migration in cloud computing using randomized enc...
 
Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...
Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...
Solar Based Stand Alone High Performance Interleaved Boost Converter with Zvs...
 
A0510107
A0510107A0510107
A0510107
 
F0523840
F0523840F0523840
F0523840
 
Social Networking Websites and Image Privacy
Social Networking Websites and Image PrivacySocial Networking Websites and Image Privacy
Social Networking Websites and Image Privacy
 
E0552228
E0552228E0552228
E0552228
 
C01111519
C01111519C01111519
C01111519
 
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
An Optimal Approach to derive Disjunctive Positive and Negative Rules from As...
 
Adjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers State
Adjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers StateAdjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers State
Adjustment of Cost 231 Hata Path Model For Cellular Transmission in Rivers State
 
A Music Visual Interface via Emotion Detection Supervisor
A Music Visual Interface via Emotion Detection SupervisorA Music Visual Interface via Emotion Detection Supervisor
A Music Visual Interface via Emotion Detection Supervisor
 
A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...
A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...
A Novel Rebroadcast Technique for Reducing Routing Overhead In Mobile Ad Hoc ...
 
Generation and Implementation of Barker and Nested Binary codes
Generation and Implementation of Barker and Nested Binary codesGeneration and Implementation of Barker and Nested Binary codes
Generation and Implementation of Barker and Nested Binary codes
 
E-Healthcare Billing and Record Management Information System using Android w...
E-Healthcare Billing and Record Management Information System using Android w...E-Healthcare Billing and Record Management Information System using Android w...
E-Healthcare Billing and Record Management Information System using Android w...
 
Comparison of different Ant based techniques for identification of shortest p...
Comparison of different Ant based techniques for identification of shortest p...Comparison of different Ant based techniques for identification of shortest p...
Comparison of different Ant based techniques for identification of shortest p...
 
Power Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANETPower Optimization in MIMO-CN with MANET
Power Optimization in MIMO-CN with MANET
 

Similar a An Effective Heuristic for Hiding Sensitive Database Patterns

An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...IOSR Journals
 
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining IJDKP
 
A literature review of modern association rule mining techniques
A literature review of modern association rule mining techniquesA literature review of modern association rule mining techniques
A literature review of modern association rule mining techniquesijctet
 
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set MiningAn Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set Miningijsrd.com
 
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...Editor IJMTER
 
Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039ijsrd.com
 
An apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distanceAn apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distanceIJDKP
 
Comparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streamsComparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streamsIJCI JOURNAL
 
Privacy Preserving Approaches for High Dimensional Data
Privacy Preserving Approaches for High Dimensional DataPrivacy Preserving Approaches for High Dimensional Data
Privacy Preserving Approaches for High Dimensional Dataijtsrd
 
A Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data MiningA Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data MiningIRJET Journal
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIJSRD
 
An Ontological Approach for Mining Association Rules from Transactional Dataset
An Ontological Approach for Mining Association Rules from Transactional DatasetAn Ontological Approach for Mining Association Rules from Transactional Dataset
An Ontological Approach for Mining Association Rules from Transactional DatasetIJERA Editor
 

Similar a An Effective Heuristic for Hiding Sensitive Database Patterns (20)

An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
An improved Item-based Maxcover Algorithm to protect Sensitive Patterns in La...
 
K017167076
K017167076K017167076
K017167076
 
F212734
F212734F212734
F212734
 
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
Distortion Based Algorithms For Privacy Preserving Frequent Item Set Mining
 
A literature review of modern association rule mining techniques
A literature review of modern association rule mining techniquesA literature review of modern association rule mining techniques
A literature review of modern association rule mining techniques
 
F04713641
F04713641F04713641
F04713641
 
F04713641
F04713641F04713641
F04713641
 
F0423038041
F0423038041F0423038041
F0423038041
 
Dy33753757
Dy33753757Dy33753757
Dy33753757
 
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set MiningAn Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
An Efficient Compressed Data Structure Based Method for Frequent Item Set Mining
 
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
SECURED FREQUENT ITEMSET DISCOVERY IN MULTI PARTY DATA ENVIRONMENT FREQUENT I...
 
Ijsrdv1 i2039
Ijsrdv1 i2039Ijsrdv1 i2039
Ijsrdv1 i2039
 
An apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distanceAn apriori based algorithm to mine association rules with inter itemset distance
An apriori based algorithm to mine association rules with inter itemset distance
 
Comparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streamsComparative analysis of association rule generation algorithms in data streams
Comparative analysis of association rule generation algorithms in data streams
 
Privacy Preserving Approaches for High Dimensional Data
Privacy Preserving Approaches for High Dimensional DataPrivacy Preserving Approaches for High Dimensional Data
Privacy Preserving Approaches for High Dimensional Data
 
A Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data MiningA Novel Filtering based Scheme for Privacy Preserving Data Mining
A Novel Filtering based Scheme for Privacy Preserving Data Mining
 
Introduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its MethodsIntroduction To Multilevel Association Rule And Its Methods
Introduction To Multilevel Association Rule And Its Methods
 
Hl2513421351
Hl2513421351Hl2513421351
Hl2513421351
 
Hl2513421351
Hl2513421351Hl2513421351
Hl2513421351
 
An Ontological Approach for Mining Association Rules from Transactional Dataset
An Ontological Approach for Mining Association Rules from Transactional DatasetAn Ontological Approach for Mining Association Rules from Transactional Dataset
An Ontological Approach for Mining Association Rules from Transactional Dataset
 

Más de IOSR Journals (20)

A011140104
A011140104A011140104
A011140104
 
M0111397100
M0111397100M0111397100
M0111397100
 
L011138596
L011138596L011138596
L011138596
 
K011138084
K011138084K011138084
K011138084
 
J011137479
J011137479J011137479
J011137479
 
I011136673
I011136673I011136673
I011136673
 
G011134454
G011134454G011134454
G011134454
 
H011135565
H011135565H011135565
H011135565
 
F011134043
F011134043F011134043
F011134043
 
E011133639
E011133639E011133639
E011133639
 
D011132635
D011132635D011132635
D011132635
 
C011131925
C011131925C011131925
C011131925
 
B011130918
B011130918B011130918
B011130918
 
A011130108
A011130108A011130108
A011130108
 
I011125160
I011125160I011125160
I011125160
 
H011124050
H011124050H011124050
H011124050
 
G011123539
G011123539G011123539
G011123539
 
F011123134
F011123134F011123134
F011123134
 
E011122530
E011122530E011122530
E011122530
 
D011121524
D011121524D011121524
D011121524
 

Último

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Último (20)

"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

An Effective Heuristic for Hiding Sensitive Database Patterns

  • 1. IOSR Journal of Computer Engineering (IOSRJCE) ISSN: 2278-0661 Volume 5, Issue 1 (Sep-Oct. 2012), PP 06-11 Www.iosrjournals.org www.iosrjournals.org 6 | Page An Effective Heuristic Approach for Hiding Sensitive Patterns in Databases Mrs. P.Cynthia Selvi1 , Dr. A.R.Mohamed Shanavas2 1 Associate Professor, Dept. of Computer Science, KNGA College(W), Thanjavur 613007 /Affiliated to Bharathidasan University, Tiruchirapalli, TamilNadu, India. 2 Associate Professor, Dept. of Comuter Science, Jamal Mohamed College, Tiruchirapalli 620 020/ Affiliated to Bharathidasan University, Tiruchirapalli, TamilNadu, India. Abstract: Privacy has been identified as a vital requirement in designing and implementing Data Mining (DM) systems. This motivated Privacy Preservation in Data Mining (PPDM) as a rising field of research and various approaches are being introduced by the researchers. One of the approaches is a sanitization process, that transforms the source database into a modified one that the adversaries cannot extract the sensitive patterns from. This study address this concept and proposes an effective heuristic-based algorithm which is aimed at minimizing the number of removal of items in the source database possibly with no hiding failure. Keywords: Privacy Preserving Data Mining, Restrictive Patterns, Sanitized database, Sensitive Transactions. I. Introduction PPDM is a novel research direction in DM, where DM algorithms are analyzed for the side-effects they incur in data privacy. The main objective of PPDM is to develop algorithms for modifying the original data in some way, so that the private data and private knowledge remain private even after the mining process[1]. In DM, the users are provided with the data and not the association rules and are free to use their own tools; So, the restriction for privacy has to be applied on the data itself before the mining phase. For this reason, we need to develop mechanisms that can lead to new privacy control systems to convert a given database into a new one in such a way to preserve the general rules mined from the original database. The procedure of transforming the source database into a new database that hides some sensitive patterns or rules is called the sanitization process[2]. To do so, a small number of transactions have to be modified by deleting one or more items from them or even adding noise to the data by turning some items from 0 to 1 in some transactions. The released database is called the sanitized database. On one hand, this approach slightly modifies some data, but this is perfectly acceptable in some real applications[3, 4]. On the other hand, such an approach must hold the following restrictions: o The impact on the source database has to be minimal o An appropriate balance between the need for privacy and knowledge has to be guaranteed. This study mainly focus on the task of minimizing the impact on the source database by reducing the number of removed items from the source database with only one scan of the database. Section-2 briefly summarizes the previous work done by various researchers; In Section-3 preliminaries are given. Section-4 states some basic definitions and of which definition 5 is framed by us which is used in the proposed heuristic- based algorithm. In Section-5 the proposed algorithm is presented with illustration and example. 
As the detailed analysis of the experimental results on large databases is still in progress, only the basic measures of effectiveness are presented in this paper, after testing the algorithm on a sample generated database.

II. Related Work
Many researchers have paid attention to the problem of privacy preservation in association rule mining in recent years. The class of solutions for this problem has been restricted basically to randomization, data partition, and data sanitization. The idea behind data sanitization, reducing the support values of restrictive itemsets, was first introduced by Atallah et al.[2], who proved that the optimal sanitization process is an NP-hard problem. In [4], the authors generalized the problem in the sense that they considered the hiding of both sensitive frequent itemsets and sensitive rules. Although these algorithms ensure privacy preservation, they are CPU-intensive, since they require multiple scans over a transactional database. In the same direction, Saygin et al.[5] introduced a method for selectively removing individual values from a database to prevent the discovery of a set of rules, while preserving the data for other applications. They proposed algorithms that obscure a given set of sensitive rules by replacing known values with unknowns, while minimizing the side effects on non-sensitive rules. These algorithms also require multiple scans to sanitize a database, the number depending on the number of association rules to be hidden.
Oliveira introduced several algorithms, of which IGA[6] and SWA[7] aim at multiple-rule hiding. IGA has a low misses cost; it groups restrictive itemsets and assigns a victim item to each group. This clustering leads to overlap between groups, however, and is not an efficient method for optimally clustering the itemsets; it can be improved further by reducing the number of deleted items. SWA, on the other hand, improves the balance between protection of sensitive knowledge and pattern discovery, but it incurs an extra cost because some rules are removed inadvertently. In our work, we focus on the heuristic-based data sanitization approach.

III. Preliminaries
Transactional Database. A transactional database is a relation consisting of transactions in which each transaction t is characterized by an ordered pair, defined as t = <Tid, list-of-elements>, where Tid is a unique transaction identifier number and list-of-elements represents a list of items making up the transaction. For instance, in market basket data, a transactional database is composed of business transactions in which the list-of-elements represents the items purchased in a store.

Basics of Association Rules. One of the most studied problems in data mining is the process of discovering association rules from large databases. Most of the existing algorithms for association rules rely on the support-confidence framework introduced in [8]. Formally, association rules are defined as follows: Let I = {i1,...,in} be a set of literals, called items. Let D be a database of transactions, where each transaction t is an itemset such that t ⊆ I. A unique identifier, called Tid, is associated with each transaction. A transaction t supports X, a set of items in I, if X ⊆ t. An association rule is an implication of the form A ⇒ B, where A ⊂ I, B ⊂ I, and A ∩ B = ∅. Thus, we say that a rule A ⇒ B holds in the database D with support s if s = |A ∪ B| / N, where N is the number of transactions in D and |A ∪ B| is the number of transactions supporting A ∪ B. Similarly, we say that a rule A ⇒ B holds in the database D with confidence c if c = |A ∪ B| / |A|, where |A| is the number of occurrences of the set of items A in the set of transactions in D. While the support is a measure of the frequency of a rule, the confidence is a measure of the strength of the relation between sets of items. Association rule mining algorithms rely on two parameters, minimum support (minSup) and minimum confidence (minConf). The problem of mining association rules was first proposed in 1993[8].

Frequent Pattern. A pattern X is called a frequent pattern if Sup(X) ≥ minSup, or if the absolute support of X satisfies the corresponding minimum support count threshold. [A pattern is an itemset; in this article, both terms are used synonymously.] All association rules can be derived directly from the set of frequent patterns[8, 9]. The conventions followed here are:
o Apriori property[10]: all nonempty subsets of a frequent itemset (pattern) must also be frequent.
o Antimonotone property: if a set cannot pass a test, then all of its supersets will fail the same test as well.

Privacy Preservation in Frequent Patterns. The most basic model of privacy-preserving data processing is one in which we erase the sensitive entries in the data. These erased entries are usually particular patterns decided by the user, who may be either the owner or the contributor of the data.
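As a concrete illustration of the support-confidence framework just defined, the following is a minimal, hedged Java sketch. It uses modern Java collection factories for brevity (the implementation described in Section VI used JDK 1.7), and every name in it is illustrative rather than taken from the paper's code.

import java.util.*;

// Computing support and confidence of a rule A => B over a toy
// transaction list: t supports X iff X is a subset of t.
public class RuleMeasures {
    static boolean supports(Set<String> t, Set<String> items) {
        return t.containsAll(items);
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("C", "D", "E"), Set.of("C", "D"),
            Set.of("D", "E"),      Set.of("C", "E"));

        Set<String> a  = Set.of("C");        // antecedent A
        Set<String> ab = Set.of("C", "D");   // A union B, with B = {D}

        long nA  = db.stream().filter(t -> supports(t, a)).count();
        long nAB = db.stream().filter(t -> supports(t, ab)).count();

        double support    = (double) nAB / db.size();  // |A u B| / N
        double confidence = (double) nAB / nA;         // |A u B| / |A|
        System.out.printf("sup=%.2f conf=%.2f%n", support, confidence);
    }
}

With the four toy transactions above, the rule C ⇒ D has support 2/4 = 0.50 and confidence 2/3 ≈ 0.67.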
IV. Problem Definition
In this approach, the goal is to hide a group of frequent patterns which contains highly sensitive knowledge. Such sensitive patterns that should be hidden are called restrictive patterns. Restrictive patterns can always be generated from frequent patterns.

Definition 1. Let D be a source database, containing a set of all transactions. T denotes a set of transactions, each transaction t containing an itemset X ⊆ I. In addition, each k-itemset X has an associated set of transactions T(X) = { t | t ∈ T and X ⊆ t }.

Definition 2 (Restrictive Patterns). Let D be a source database, P be the set of all frequent patterns that can be mined from D, and RulesH be a set of decision-support rules that need to be hidden according to some security policies. A set of patterns, denoted by RP, is said to be restrictive if RP ⊂ P and if and only if RP would derive the set RulesH. ~RP denotes the set of non-restrictive patterns, such that ~RP ∪ RP = P.

Definition 3 (Sensitive Transactions). Let T be the set of all transactions in a source database D, and RP be the set of restrictive patterns mined from D. A set of transactions is said to be sensitive, denoted by ST, if every t ∈ ST contains at least one restrictive pattern, i.e., ST = { t ∈ T | ∃ X ∈ RP, X ⊆ t }. Moreover, since ST ⊆ T, all restrictive patterns can be mined from ST and only from ST.
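A minimal sketch of Definition 3, again with illustrative names and toy data rather than anything from the paper's code: a transaction joins ST as soon as it contains any restrictive pattern.

import java.util.*;

// Identifying sensitive transactions (Definition 3): t is sensitive
// iff it contains at least one restrictive pattern.
public class SensitiveTransactions {
    public static void main(String[] args) {
        Map<String, Set<String>> db = new LinkedHashMap<>();
        db.put("T01", Set.of("C", "D", "E", "A", "B"));
        db.put("T05", Set.of("E", "B", "F"));

        List<Set<String>> rp = List.of(Set.of("C", "D"), Set.of("D", "E"));

        List<String> st = new ArrayList<>();   // the sensitive transactions
        for (var e : db.entrySet())
            if (rp.stream().anyMatch(p -> e.getValue().containsAll(p)))
                st.add(e.getKey());
        System.out.println("ST = " + st);      // prints: ST = [T01]
    }
}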
Definition 4 (Transaction Degree). Let D be a source database and ST be the set of all sensitive transactions in D. The degree of a sensitive transaction t ∈ ST, denoted deg(t), is defined as the number of restrictive patterns that t contains.

Definition 5 (Cover). The cover of an item Ak is defined as C_Ak = { rpi | Ak ∈ rpi, rpi ∈ RP, 1 ≤ i ≤ |RP| }, i.e., the set of all restrictive patterns (rpi's) that contain Ak. The item included in the maximum number of rpi's is the one with maximal cover, or maxCover; i.e., maxCover = max( |C_A1|, |C_A2|, ..., |C_An| ) over all items Ak appearing in some rpi ∈ RP.

Based on the above definitions, the main strategy addressed in this work can be stated as follows: if D is the source database of transactions and P is the set of relevant patterns that would be mined from D, the goal is to transform D into a sanitized database D', so that the most frequent patterns in P can still be mined from D' while the restrictive ones are hidden. In this case, D' becomes the released database.

V. Sanitization Algorithm
The optimal sanitization has been proved to be an NP-hard problem. To alleviate the complexity of the optimal sanitization, heuristics can be used. A heuristic does not guarantee the optimal solution, but usually finds a solution close to the best one in a faster response time. In this section, the proposed sanitizing algorithm and the heuristics used to sanitize a source database are introduced. Given the source database (D) and the restrictive patterns (RP), the goal of the sanitization process is to protect RP against the mining techniques used to disclose them. The sanitization process decreases the support values of restrictive patterns by removing items from sensitive transactions. This process mainly involves four sub-problems:
1. identifying the set of sensitive transactions for each restrictive pattern;
2. selecting the partial sensitive transactions to sanitize;
3. identifying the candidate item (victim item) to be removed;
4. rewriting the modified database after removing the victim items.
Basically, all sanitizing algorithms differ only in sub-problems 2 and 3.

5.1. Heuristic approach
In this work, the proposed algorithm is based on the following heuristics:
Heuristic 1. To solve sub-problem 2 above, ST is sorted in decreasing order of (deg + size), so that the sensitive transactions containing more patterns are selected first; this enables multiple patterns to be sanitized in a single iteration.
Heuristic 2. To solve sub-problem 3, the following heuristic is used in the algorithm: for every item Ak appearing in RP, find its cover; then, starting from the item with maximal cover, find T = ⋂ { t-list(rpi) | rpi ∈ C_Ak }, the set of sensitive transactions common to all restrictive patterns containing Ak; for every t ∈ T, mark Ak as the victim item and remove it (a sketch of this cover-based ordering is given at the end of this subsection).
The rationale behind these two heuristics is to minimize the sanitization rate and thereby reduce the impact on the source database.
Note: In this work, no sensitive transaction is completely removed (i.e., the number of transactions in the source database is not altered).
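The following hedged sketch shows the cover computation of Definition 5 and the decreasing-cover ordering that Heuristic 2 relies on. The pattern contents (R1 = {C, D}, R2 = {D, E}, R3 = {A, C}) are our assumption, chosen so that the resulting covers mirror the LookUp Table-3 of the illustration in Section 5.3; they are not stated explicitly in the paper.

import java.util.*;

// Cover (Definition 5): for each item in a restrictive pattern,
// collect the patterns containing it, then rank items by cover size
// so that sanitization starts from the item with maxCover.
public class CoverRanking {
    public static void main(String[] args) {
        List<Set<String>> rp = List.of(
            Set.of("C", "D"),    // R1 (assumed contents)
            Set.of("D", "E"),    // R2
            Set.of("A", "C"));   // R3

        // cover: item -> indices of the restrictive patterns containing it
        Map<String, List<Integer>> cover = new TreeMap<>();
        for (int i = 0; i < rp.size(); i++)
            for (String item : rp.get(i))
                cover.computeIfAbsent(item, k -> new ArrayList<>()).add(i);

        // rank items by decreasing cover size: C and D (2) precede A and E (1)
        cover.entrySet().stream()
             .sorted((x, y) -> y.getValue().size() - x.getValue().size())
             .forEach(e -> System.out.println(
                 e.getKey() + " cover=" + e.getValue().size()));
    }
}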
5.2. Algorithm
Input : (i) D – source database; (ii) RP – set of all restrictive patterns.
Output : D' – sanitized database.
Pre-requisites :
(i) find the frequent patterns (itemsets) using the Matrix Apriori algorithm;
(ii) form Look-up Table-1 : for each item Ak ∈ RP, LT1(Ak) ← t-list of the transactions t ∈ D containing Ak;
(iii) form Look-up Table-2 : for each rpi ∈ RP, LT2(rpi) ← t-list of the transactions t ∈ D containing rpi;
(iv) form Look-up Table-3 : for each item Ak ∈ RP, LT3(Ak) ← rpi-list of the patterns rpi ∈ RP containing Ak.

Algorithm maxCover1 : // based on Heuristics 1 & 2 //
Step 1 : calculate supCount(rpi) for every rpi ∈ RP and sort in decreasing order;
Step 2 : find the sensitive transactions (ST) w.r.t. RP;
    a) calculate deg(t) and size(t) for every t ∈ ST;
    b) sort the t ∈ ST in decreasing order of deg and size;
Step 3 : find ~ST ← D − ST; // ~ST – non-sensitive transactions //
Step 4 : // find ST' //
find the cover of every item Ak ∈ RP and sort the items in decreasing order of cover;
for each item Ak ∈ RP do {
    repeat
        find T = ⋂ { LT2(rpi) | rpi ∈ LT3(Ak) } ; // sensitive transactions common to every rpi containing Ak //
        for each t ∈ T do {
            delete the item Ak from t, where t is a non-victim transaction for at least one rpi ∈ LT3(Ak) ; // Ak – the victim item; initially, every t is non-victim //
            decrease the supCount of the rpi's for which t is non-victim ;
            mark t as a victim transaction in each t-list of rpi ∈ LT3(Ak) ;
        }
    until (supCount(rpi) = 0) for all rpi ∈ RP
}
Step 5 : D' ← ~ST ∪ ST'
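Before the illustration, here is a hedged end-to-end sketch of our reading of maxCover1, not the authors' implementation: Steps 1-4 are compressed into one pass over the items in decreasing cover order, skipping patterns whose supCount has already reached zero, which is how the repeat-until condition of Step 4 terminates on the example below. The pattern contents and source transactions are our reconstruction from the look-up tables and Table-6 of Section 5.3; with them, the sketch reproduces Table-6 exactly.

import java.util.*;

// Hedged sketch of the maxCover1 loop. For each item, taken in
// decreasing cover order, it deletes the item from the sensitive
// transactions common to all still-unsanitized patterns in the item's
// cover, decrementing a pattern's supCount only the first time a
// transaction is "victimized" for that pattern.
public class MaxCover1Sketch {

    static void sanitize(Map<String, Set<String>> db, List<Set<String>> rp) {
        int[] supCount = new int[rp.size()];
        List<Set<String>> victims = new ArrayList<>();
        List<List<String>> tList = new ArrayList<>();        // plays LT2
        for (Set<String> p : rp) {
            List<String> ts = new ArrayList<>();
            for (var e : db.entrySet())
                if (e.getValue().containsAll(p)) ts.add(e.getKey());
            tList.add(ts);
            supCount[tList.size() - 1] = ts.size();
            victims.add(new HashSet<>());
        }
        Map<String, List<Integer>> cover = new HashMap<>();  // plays LT3
        for (int i = 0; i < rp.size(); i++)
            for (String a : rp.get(i))
                cover.computeIfAbsent(a, k -> new ArrayList<>()).add(i);

        List<String> items = new ArrayList<>(cover.keySet()); // Heuristic 2:
        items.sort((x, y) -> cover.get(y).size() - cover.get(x).size());

        for (String ak : items) {
            List<Integer> group = new ArrayList<>();
            for (int i : cover.get(ak))
                if (supCount[i] > 0) group.add(i);   // skip hidden patterns
            if (group.isEmpty()) continue;
            // T = sensitive transactions common to every pattern in the group
            Set<String> common = new LinkedHashSet<>(tList.get(group.get(0)));
            for (int i : group) common.retainAll(tList.get(i));
            for (String tid : common) {
                db.get(tid).remove(ak);              // ak is the victim item
                for (int i : group)
                    if (victims.get(i).add(tid)) supCount[i]--;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> db = new LinkedHashMap<>();  // reconstructed D
        db.put("T01", new HashSet<>(List.of("C", "D", "E", "A", "B")));
        db.put("T02", new HashSet<>(List.of("C", "D", "A", "B", "F")));
        db.put("T03", new HashSet<>(List.of("C", "D", "E")));
        db.put("T04", new HashSet<>(List.of("C", "D", "E", "A")));
        db.put("T05", new HashSet<>(List.of("E", "B", "F")));
        db.put("T06", new HashSet<>(List.of("C", "D", "E")));
        List<Set<String>> rp = List.of(                       // R1, R2, R3
            Set.of("C", "D"), Set.of("D", "E"), Set.of("A", "C"));
        sanitize(db, rp);
        System.out.println(db);   // matches Table-6 of the illustration
    }
}

The guard victims.get(i).add(tid) implements the rule walked through in the example below: a pattern's supCount is decremented at most once per transaction, even when several of its items are removed from that transaction.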
5.3. Illustration
The following example helps in understanding how the proposed heuristics work. Refer to the source database (D) in Table-1. The set of all restrictive patterns to be hidden is given in Table-2. The sensitive transactions ST (transactions which include at least one restrictive pattern) are identified in D and extracted; they are sorted in decreasing order of their deg and size (Table-3). The non-sensitive transactions (~ST) are also filtered out and stored separately (Table-5). The proposed algorithm refers to the look-up tables listed below to speed up the process.

LookUp Table-1 [ item ← t-list ]
Item | Transactions            | No. of Trans.
C    | T01, T04, T02, T03, T06 | 5
D    | T01, T04, T02, T03, T06 | 5
E    | T01, T04, T03, T06      | 4
A    | T01, T04, T02           | 3

LookUp Table-2 [ rpi ← t-list ]
Pattern | Transactions            | SupCount
R1      | T01, T04, T02, T03, T06 | 5
R2      | T01, T04, T03, T06      | 4
R3      | T01, T04, T02           | 3

LookUp Table-3 [ item ← rpi-list ]
Item | Rules  | Cover
C    | R1, R3 | 2
D    | R1, R2 | 2
E    | R2     | 1
A    | R3     | 1

Table-6. Sanitized Database (D')
Tid | Pattern (Itemset)
T01 | E, A, B
T02 | D, A, B, F
T03 | C, E
T04 | E, A
T05 | E, B, F
T06 | C, E

The proposed algorithm is based on Heuristics 1 and 2. Here, for every item Ak, starting from the one with maxCover (refer to LookUp Table-3), find the sensitive transactions which are common to all rules associated with Ak, i.e., T = ⋂ { LT2(rpi) | rpi ∈ LT3(Ak) }. In every transaction t ∈ T, remove Ak and decrease the supCount of all restrictive patterns (rpi's) associated with t that contain Ak.

Example: Item C (being the first item with maxCover) has R1 and R3 as its rpi-list; their common transactions are [T01, T04, T02]. Remove C from these transactions, which reduces the supCount of both R1 and R3 by 3. Mark these t as victim transactions for R1 and R3. When we consider the next item D, the common t-list of its rpi-list is [T01, T04, T03, T06]. Removing D from T01 and T04 does not reduce the supCount of R1 (because these transactions were already considered in the previous iteration), but it does reduce the supCount of R2; hence, remove D there and decrease only the supCount of R2. In contrast, removing D from T03 and T06 reduces the supCount of both R1 and R2. This process is repeated until the supCounts of all rpi's are reduced to 0. The modified form of the sensitive transactions is denoted ST'. The sanitized database D' (refer to Table-6) is then formed by combining ~ST (Table-5) and ST'.

VI. Implementation
This algorithm was tested in the environment of an Intel Core 2 Duo processor with 2.5 GHz speed and 4 GB RAM, running Windows XP. We used NetBeans 6.9.1 to code the algorithm in Java (JDK 1.7) with SQL Server 2005. It has been tested on a sample database to verify the main objectives (minimal removal of items and no hiding failure), and it requires only one scan of the source database. However, the testing process for very large real databases is in progress for a detailed analysis. Before the hiding process, the frequent patterns are obtained using the Matrix Apriori algorithm[11], which is faster and uses a simpler data structure than the Apriori algorithm[9, 10]. Moreover, it scans the database only twice and works without candidate generation.

VII. Effectiveness Measures
(i) Sanitization Rate (SR): the ratio of removed (victim) items to the total support value of the restrictive patterns (rpi's) in the source database D:
SR = (number of victim items removed from D) / (Σ_{rpi ∈ RP} supCount(rpi) in D)
With the sample database tested, it is found that the SR is less than 50%.

(ii) Hiding Failure (HF): assessed by the restrictive patterns that fail to be hidden. In other words, if a hidden restrictive pattern cannot be extracted from the released database D' with an arbitrary minSup, the hidden pattern has no hiding-failure occurrence:
HF = |RP(D')| / |RP(D)|
where RP(D) and RP(D') denote the restrictive patterns mined from D and D', respectively. As far as the algorithm maxCover1 is concerned, the restrictive patterns are modified until their respective supCounts become zero. Moreover, it ensures that no transaction is completely removed. Hence this algorithm is 100% HF-free.

(iii) Misses Cost (MC): this measure deals with the legitimate (non-restrictive) patterns that are accidentally missed:
MC = ( |~RP(D)| − |~RP(D')| ) / |~RP(D)|
Note: there is a compromise between MC and HF; i.e., the more patterns we hide, the more legitimate patterns we miss. But with the sample database tested, this algorithm has 0% MC.

(iv) Artifactual Pattern (AP): an AP occurs when some artificial patterns are generated from D' as an outcome of the sanitization process:
AP = ( |P'| − |P ∩ P'| ) / |P'|
where P and P' denote the patterns discovered from D and D', respectively. As this algorithm hides restrictive patterns by selectively removing items (instead of swapping, replacement, etc.) from the source database D, it does not generate any artifactual pattern.
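A small sketch of how these ratios could be computed once the patterns mined from D and D' are known. The HF and MC forms follow the reconstructions given above (which track the definitions used in the sanitization literature cited in [6, 7]); every number below is a toy value for illustration, not an experimental result from the paper.

import java.util.*;

// Toy computation of SR, HF, and MC; AP would follow the same set
// algebra over P and P'. All inputs here are illustrative.
public class EffectivenessMeasures {
    public static void main(String[] args) {
        int removedItems   = 7;   // victim items deleted during sanitization
        int totalRpSupport = 12;  // sum of supCount(rpi) over RP in D

        Set<String> rpD  = Set.of("R1", "R2", "R3"); // RP mined from D
        Set<String> rpD1 = Set.of();                 // RP still mined from D'
        Set<String> npD  = Set.of("P1", "P2");       // ~RP mined from D
        Set<String> npD1 = Set.of("P1", "P2");       // ~RP mined from D'

        double sr = (double) removedItems / totalRpSupport;
        double hf = (double) rpD1.size() / rpD.size();
        Set<String> missed = new HashSet<>(npD);     // legitimate patterns lost
        missed.removeAll(npD1);
        double mc = (double) missed.size() / npD.size();
        System.out.printf("SR=%.2f HF=%.2f MC=%.2f%n", sr, hf, mc);
    }
}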
VIII. Conclusion
In this competitive but cooperative business environment, companies need to share information with others while, at the same time, protecting their own confidential knowledge. To facilitate this kind of data sharing with privacy protection, the algorithm based on maxCover is proposed. This algorithm ensures that no counterpart or adversary can mine the restrictive patterns, even with an arbitrarily small supCount. It is based on the strategy of simultaneously decreasing the support counts of the maximum number of sensitive patterns (itemsets) with the minimum possible number of item removals, thereby reducing the impact on the source database. The proposed algorithm has a minimal sanitization rate, possibly with no hiding failure and a low misses cost. Above all, the algorithm scans the original database only once.

It is important to note that the proposed algorithm is robust in the sense that no de-sanitization is possible. The alterations to the original database are not saved anywhere, and the owner of the database still keeps an original copy of the database intact while distributing the sanitized one. Moreover, there is no possible way to reproduce the original database from the sanitized one, as there is no encryption involved. As already mentioned, the time-complexity analysis and an elaborate dissimilarity study between the original and sanitized databases using very large databases are in progress, for which publicly available real databases are being used.

References
[1] Verykios, V.S., Bertino, E., Fovino, I.N., Provenza, L.P., Saygin, Y., and Theodoridis, Y., "State-of-the-art in Privacy Preserving Data Mining", ACM SIGMOD Record, vol. 33, no. 2, pp. 50-57, 2004.
[2] Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M., and Verykios, V.S., "Disclosure Limitation of Sensitive Rules", in Proc. of the IEEE Knowledge and Data Engineering Exchange Workshop, pp. 45-52, Chicago, Illinois, November 1999.
[3] Clifton, C. and Marks, D., "Security and Privacy Implications of Data Mining", in Workshop on Data Mining and Knowledge Discovery, pp. 15-19, Montreal, Canada, February 1996.
[4] Dasseni, E., Verykios, V.S., Elmagarmid, A.K., and Bertino, E., "Hiding Association Rules by Using Confidence and Support", in Proc. of the 4th Information Hiding Workshop, pp. 369-383, Pittsburgh, PA, April 2001.
[5] Saygin, Y., Verykios, V.S., and Clifton, C., "Using Unknowns to Prevent Discovery of Association Rules", SIGMOD Record, 30(4):45-54, December 2001.
[6] Oliveira, S.R.M. and Zaiane, O.R., "Privacy Preserving Frequent Itemset Mining", in Proc. of the IEEE ICDM Workshop on Privacy, Security, and Data Mining, pp. 43-54, Maebashi City, Japan, December 2002.
[7] Oliveira, S.R.M. and Zaiane, O.R., "An Efficient One-Scan Sanitization for Improving the Balance between Privacy and Knowledge Discovery", Technical Report TR 03-15, June 2003.
[8] Agrawal, R., Imielinski, T., and Swami, A., "Mining association rules between sets of items in large databases", in Proc. of the 1993 ACM SIGMOD International Conference on Management of Data, Washington, DC, 1993, pp. 207-216.
[9] Agrawal, R. and Srikant, R., "Fast algorithms for mining association rules", in Proc. of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 1994, pp. 487-499.
[10] Han, J. and Kamber, M., Data Mining: Concepts and Techniques, Oxford University Press, 2009.
[11] Pavon, J., Viana, S., and Gomez, S., "Matrix Apriori: speeding up the search for frequent patterns", in Proc. of the 24th IASTED International Conference on Databases and Applications, 2006, pp. 75-82.