Association Rule Mining with Privacy Preservation
In Horizontally Distributed Databases
Group 1 – Abhra Basak, Apoorva Kumar, Sachin K. Saini, Shiv Sankar, Suraj B. Malode
Introduction
Look before you leap
The Flow
Association
Rule Mining
Privacy
Preservation
Horizontally
Distributed
Datasets
Before we start mining!
trends or patterns in
large datasets
extracting useful
information
useful and
unexpected insights
analyze and
predicting system
behavior
Data Mining
Scalability
?
Artificial
Engineering
Machine
Learning
Statistics
Database
Systems
Association Rule Learning
By Rakesh Agrawal, IBM Almaden Research Center
• 80% of people who buy bread + butter, buy milk
• {Bread, Butter} → {Milk}
What is an Association Rule?
Antecedent
Consequent
Definitions
• 80% of people who buy bread + butter, buy milk
• {Bread, Butter} → {Milk}
Antecedent
• Prerequisites for
the rule to be
applied
Consequent
• The outcome
Support
• Percentage of transactions containing the itemset
Confidence
• Fraction of the transactions containing the antecedent that also contain the consequent
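A minimal sketch of the two measures for the rule {Bread, Butter} → {Milk}. The ten transactions are made up for illustration and chosen so that the confidence comes out at the 80% used in the example; `support` and `confidence` are hypothetical helper names.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Illustrative data only (not from the slides).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
    {"butter", "milk"},
    {"bread", "milk"},
    {"eggs"},
]

rule_support = support(transactions, {"bread", "butter", "milk"})
rule_confidence = confidence(transactions, {"bread", "butter"}, {"milk"})
print(rule_support, rule_confidence)
```

Here 5 of the 10 transactions contain bread and butter, and 4 of those 5 also contain milk, which gives the rule a support of 40% and a confidence of 80%.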
• Two forms of constraints are used to generate the required association rules:
• Syntactic constraints: restrict the attributes that may be present in a rule.
• Support constraints: the number of transactions, out of the full set of transactions, that support a rule.
Constraints
Association Rule Learning in Large Datasets
• To find association rules in large datasets:
Generating Large Itemsets
• Find the combinations of items whose support is above a minimum support threshold
Generating Association Rules
• Mine all rules that are satisfied within each large itemset
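The two steps can be sketched roughly as follows. This is a simplified level-wise search in the spirit of Apriori, not a tuned implementation; the function names, the toy transactions, and the thresholds are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: collect every itemset whose support meets the threshold."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    size = 1
    while candidates:
        level = {}
        for c in candidates:
            s = sum(1 for t in transactions if c <= t) / n
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        size += 1
        # Join pairs of frequent sets to form the next, larger candidates.
        candidates = list({a | b for a in level for b in level
                           if len(a | b) == size})
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: split each frequent itemset into antecedent -> consequent."""
    found = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                conf = s / frequent[ante]  # subsets of a frequent set are frequent
                if conf >= min_confidence:
                    found.append((ante, itemset - ante, conf))
    return found


# Illustrative data only.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.6)
```

Every subset of a large itemset is itself large, which is why the rule step can look up the antecedent's support in the same dictionary.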
Association Rule Learning in Distributed Datasets
And Privacy Preservation
• Most tools used for mining association rules assume that data to be analyzed can be
collected at one central site.
• But issues like Privacy Preservation restrict the collection of data.
• Alternative methods for mining have to be devised for distributed datasets to make the mining
process feasible while ensuring privacy.
Preview
• Dataset
• Combined data of Twitter and Facebook
• Rule
• What percentage of people log in to a social networking
site and post within the next 2 minutes?
Privacy Preservation
• Horizontally Partitioned (Example: Insurance Companies)
• Rule Being Mined: Does a procedure have an unusual rate of
complication?
• Implications:
• A company may have a high number of cases of the procedure failing, and
it may change its policies to help.
• At the same time, if this rule is exposed, it may be a huge
problem for the company.
• The risks outweigh the gains.
Privacy Preservation
Each company holds its own rows under the same schema:
Company A: Patient ID, Disease, Prescription, Effect
Company B: Patient ID, Disease, Prescription, Effect
Company C: Patient ID, Disease, Prescription, Effect
• Vertically Partitioned
Privacy Preservation
Credit Card No.     Bought tablet
2365987545623526    1
3639871526589414    1
4365845698742563    1
5962845632561200    1
6621563289657412    1

Credit Card No.     Bought TCover
2365987545623526    0
7639871526589414    1
4365845698742563    1
9962845632561200    0
6621563289657412    1

Common property: not one we can exploit.
Mining of Association Rules
In Horizontally Partitioned Databases
What we want
• Compute association rules without revealing private information, obtaining:
• The global support
• The global confidence
What we have
• Only the following information is available
• Local Support
• Local Confidence
• Size of the DB
Fundamental Steps
Even this information may not be shared freely between sites.
But we’ll get to that.
Calculating Required Values
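A minimal sketch of how the global values could be calculated if every site shared its raw counts and database size. It assumes the usual definitions: global support is the summed local support counts over the summed database sizes, and global confidence is the summed rule counts over the summed antecedent counts. The itemset counts and sizes reuse the three-site figures from the Phase 2 example in this deck; the antecedent counts are invented for illustration.

```python
def global_support(local_counts, db_sizes):
    """Total number of supporting transactions over total transactions."""
    return sum(local_counts) / sum(db_sizes)


def global_confidence(rule_counts, antecedent_counts):
    """Total transactions satisfying the rule over total containing the antecedent."""
    return sum(rule_counts) / sum(antecedent_counts)


# Sites A, B, C: count of itemset ABC and database size.
abc_counts = [5, 6, 20]
db_sizes = [100, 200, 300]

# Hypothetical counts of the antecedent AB at each site (not from the slides).
ab_counts = [8, 10, 22]

support_abc = global_support(abc_counts, db_sizes)        # 31 / 600
confidence_ab_c = global_confidence(abc_counts, ab_counts)  # 31 / 40
is_large = support_abc >= 0.05
```

Note that the global support is not the average of the local percentages, because the sites differ in size.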
• It protects individual privacy but each site has to disclose information.
• It reveals the local support and confidence in a rule at each site.
• This information, if revealed, can be harmful to an organization.
Problems with the approach
• We will explore two algorithms that have been used.
• The first algorithm combines encryption with data distortion
while data is shared between sites.
• The second algorithm uses a particular secure sum protocol (CK Secure Sum) to protect the shared values.
Introducing the two Algorithms
Algorithm Uno
Some people are honest
• Phase 1: Uses encryption for mining of the large itemsets
• Phase 2: Uses a random number to preserve the privacy of each site (assuming three or more parties)
Two-phase algorithm
Phase 1: Commutative Encryption
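A minimal sketch of the property Phase 1 relies on: with a commutative cipher, an itemset encrypted by every site ends up as the same ciphertext regardless of the order of encryption, so duplicates can be merged without knowing who contributed them. The slides do not name a cipher; this sketch assumes Pohlig-Hellman style modular exponentiation, and the prime, the key seeds, and the helper names are illustrative choices that are not secure for real use.

```python
import hashlib
from math import gcd

P = 2**127 - 1  # a Mersenne prime, used here as a toy modulus


def encode(itemset):
    """Map an itemset to an integer in [1, P - 1] via a hash."""
    digest = hashlib.sha256(",".join(sorted(itemset)).encode()).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def make_key(seed):
    """Pick an exponent coprime to P - 1 so that decryption exists."""
    k = seed | 1
    while gcd(k, P - 1) != 1:
        k += 2
    return k


def encrypt(value, key):
    return pow(value, key, P)


def decrypt(value, key):
    return pow(value, pow(key, -1, P - 1), P)


key_a, key_b = make_key(0x1234567), make_key(0x7654321)
m = encode({"A", "B", "C"})

ab = encrypt(encrypt(m, key_a), key_b)  # Site A first, then Site B
ba = encrypt(encrypt(m, key_b), key_a)  # Site B first, then Site A
recovered = decrypt(decrypt(ab, key_a), key_b)
```

Because (m^a)^b and (m^b)^a are the same value modulo P, `ab` and `ba` match, and the layers can be removed in any order.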
Phase 2: Data Distortion
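Site A: ABC count = 5, Size = 100
Site B: ABC count = 6, Size = 200
Site C: ABC count = 20, Size = 300
Minimum support = 5%, random number R = 17
Each site adds (count - 5% * Size) to the running value and passes it on:
Site A: R + count - 5% * Size = 17 + 5 - 5% * 100 = 17
Site B: 17 + 6 - 5% * 200 = 13
Site C: 13 + 20 - 5% * 300 = 18
18 >= R, so ABC is globally supported.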
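The Phase 2 example can be sketched as a small simulation, assuming the running value travels Site A, Site B, Site C and the final value is compared with R. The function name is hypothetical. A real protocol would pick R at random, work modulo a large number, and do the last comparison securely; those parts are left out here.

```python
from fractions import Fraction


def masked_excess_chain(sites, min_support, r):
    """Pass r around the sites; each adds (count - min_support * size).

    Returns the value leaving each site and whether the itemset is
    globally supported (final value >= r).
    """
    running = r
    trace = []
    for count, size in sites:
        running += count - min_support * size
        trace.append(running)
    return trace, running >= r


# (count of ABC, database size) for Sites A, B, C, as on the slide.
sites = [(5, 100), (6, 200), (20, 300)]

# Fraction keeps 5% exact, so the intermediate values stay whole numbers.
trace, supported = masked_excess_chain(sites, Fraction(5, 100), r=17)
print(trace, supported)
```

Each site sees only a running value masked by R, yet the final test is equivalent to checking that the total count (31) is at least 5% of the total size (30).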
• Doesn’t work for a 2 party system
• Assumes honest parties
• Assumes Boolean responses for the support of rules rather than a
subjective or weighted approach.
• As the number of candidate itemsets increases, the encryption overhead
increases.
• The encryption overhead also grows in proportion to the number of
sites or partitions.
Problems with the Algorithm
Algorithm Dua
Don’t trust anyone
• Primarily used to tackle semi-honest sites.
• The data of each site is broken down into segments.
• Two nodes may collude to expose the data of the node in between them.
• The neighbors are changed for each round, so colluding neighbors can obtain only one such segment.
CK Secure Sum
Ring of parties: P1, P2, P3, P4
Changing Neighbors
Rounds 1 to 3 use different ring orderings, for example:
P1, P2, P4, P3
P1, P4, P2, P3
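A simplified simulation of the idea, assuming four parties, one segment per round, and a different ring ordering each round so that no party keeps the same pair of neighbors. It is a sketch of the segment-and-reorder principle, not the published protocol; the party values, the segment range, and the function names are invented for illustration.

```python
import random


def split_into_segments(value, k, rng):
    """Split `value` into k integer segments that add up to `value`."""
    parts = [rng.randint(-1000, 1000) for _ in range(k - 1)]
    parts.append(value - sum(parts))
    return parts


def changing_neighbors_sum(values, orderings, rng):
    """Run one round per ordering; each party adds one segment per round."""
    k = len(orderings)
    segments = {p: split_into_segments(v, k, rng) for p, v in values.items()}
    total = 0
    for round_no, ring in enumerate(orderings):
        running = total
        for party in ring:
            # A party's neighbors in this round see only this one segment.
            running += segments[party][round_no]
        total = running
    return total


# Hypothetical private values held by each party.
values = {"P1": 5, "P2": 6, "P3": 20, "P4": 9}

# With four parties, these three rings give each party new neighbors per round.
orderings = [
    ["P1", "P2", "P3", "P4"],
    ["P1", "P2", "P4", "P3"],
    ["P1", "P4", "P2", "P3"],
]

result = changing_neighbors_sum(values, orderings, random.Random(0))
print(result)
```

Two colluding neighbors can subtract what went in from what came out to recover one segment, but a single segment reveals little about the party's full value.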
Conclusion
The moral of the story...
Before you leave
• Association rules play a vital role in data mining.
• Through them, what appears to be unrelated can turn out to have a logical explanation on
careful analysis.
• This aspect of data mining can be very useful in predicting patterns and foreseeing
trends in consumer behavior, choices, and preferences.
• Association rules can be a valuable way for a business to enjoy the
harvest from data mining.
There are no dumb questions
(No questions please shhhh…)

Privacy Preservation Issues in Association Rule Mining in Horizontally Partitioned Databases


Editor's notes

  1. Replace arrows :P
  2. Support - It provides the idea of feasibility of a rule; sometimes applied to antecedent only
  3. Replace arrow