Outline
1. Data Mining (DM) ~ KDD [Definition]
2. DM Technique
-> Association rules [support & confidence]
3. Example
(4. Apriori Algorithm)
1. Data Mining ~ KDD [Definition]
- "Data mining (DM), also called Knowledge Discovery in Databases (KDD), is the process
of automatically searching large volumes of
data for patterns using specific DM
techniques."

- [more formal definition] KDD ~ "the non-trivial
extraction of implicit, previously unknown
and potentially useful knowledge from data"
1. Data Mining ~ KDD [Definition]
Data Mining techniques
• Information Visualization
• k-nearest neighbor
• decision trees
• neural networks
• association rules
• …
2. Association rules
Support
Every association rule has a support and a confidence.
“The support is the percentage of transactions that demonstrate the rule.”

Example: Database with transactions (customer_# : item_a1, item_a2, …)

1: 1, 3, 5.
2: 1, 8, 14, 17, 12.
3: 4, 6, 8, 12, 9, 104.
4: 2, 1, 8.

support {8,12} = 2 (or 50% ~ 2 of 4 customers)
support {1, 5} = 1 (or 25% ~ 1 of 4 customers)
support {1} = 3 (or 75% ~ 3 of 4 customers)
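
The support counts above can be checked with a few lines of Python. This is only a minimal sketch: the names `transactions`, `support_count` and `support` are chosen here for illustration and are not part of the slides.

```python
# The four customer transactions from the example, keyed by customer number.
transactions = {
    1: {1, 3, 5},
    2: {1, 8, 14, 17, 12},
    3: {4, 6, 8, 12, 9, 104},
    4: {2, 1, 8},
}


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for items in transactions.values() if itemset <= items)


def support(itemset, transactions):
    """Support expressed as a fraction of all transactions."""
    return support_count(itemset, transactions) / len(transactions)


for itemset in ({8, 12}, {1, 5}, {1}):
    print(sorted(itemset),
          support_count(itemset, transactions),
          support(itemset, transactions))
```

The subset test `itemset <= items` is what "the transaction demonstrates the itemset" means here: every item of the itemset occurs in the transaction.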
2. Association rules
Support

An itemset is called frequent if its support is equal to or
greater than an agreed-upon minimum value, the support
threshold.
Applied to the previous example:
if the threshold is 50%,
then of the itemsets above, {8,12} and {1} are called frequent.
2. Association rules
Confidence
Every association rule has a support and a confidence.
An association rule is of the form:

X => Y

• X => Y: if someone buys X, they also buy Y

The confidence is the conditional probability that, given X is
present in a transaction, Y will also be present.
Confidence measure, by definition:
Confidence(X=>Y) = support(X,Y) / support(X)
where support(X,Y) is the support of the itemset containing both X and Y.
2. Association rules
Confidence

We should only consider rules derived from
itemsets with high support, and that also have
high confidence.
“A rule with low confidence is not meaningful.”
Rules don’t explain anything, they just point out
hard facts in data volumes.
3. Example
Example: Database with transactions (customer_# : item_a1, item_a2, …)

1: 3, 5, 8.
2: 2, 6, 8.
3: 1, 4, 7, 10.
4: 3, 8, 10.
5: 2, 5, 8.
6: 1, 5, 6.
7: 4, 5, 6, 8.
8: 2, 3, 4.
9: 1, 5, 7, 8.
10: 3, 8, 9, 10.

Conf ( {5} => {8} ) ?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
then conf( {5} => {8} ) = 4/5 = 0.8 or 80%
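
The same calculation can be sketched in Python. The helper names `count` and `confidence` are illustrative assumptions, not something the slides define; the transaction list is the ten-customer database of this example.

```python
# The ten transactions from the example, in customer order 1..10.
transactions = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]


def count(itemset):
    """Support count: transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(x, y):
    """conf(X => Y) = supp(X and Y together) / supp(X)."""
    return count(set(x) | set(y)) / count(x)


print(count({5}), count({8}), count({5, 8}))
print(confidence({5}, {8}))
```

Because the denominator is the support of the left-hand side only, the measure is not symmetric: swapping X and Y generally gives a different value.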
3. Example
Conf ( {5} => {8} ) ? 80% Done.
Conf ( {8} => {5} ) ?
supp({5}) = 5, supp({8}) = 7, supp({5,8}) = 4,
then conf( {8} => {5} ) = 4/7 = 0.57 or 57%
3. Example
Conf ( {5} => {8} ) ? 80% Done.
Conf ( {8} => {5} ) ? 57% Done.
Rule ( {5} => {8} ) is more meaningful than
Rule ( {8} => {5} )
3. Example
Conf ( {9} => {3} ) ?
supp({9}) = 1, supp({3}) = 4, supp({3,9}) = 1,
then conf( {9} => {3} ) = 1/1 = 1.0 or 100%. OK?
3. Example
Conf( {9} => {3} ) = 100%. Done.
Notice: High Confidence, Low Support.
-> Rule ( {9} => {3} ) not meaningful
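
One way to make "high confidence, low support" checkable is to require both measures to pass a threshold. The sketch below does that for the rules of this example; the thresholds `min_support=0.3` and `min_confidence=0.7` are assumptions picked for illustration, since the slides do not fix any values here.

```python
transactions = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]


def count(itemset):
    """Support count of `itemset` in the example database."""
    return sum(1 for t in transactions if itemset <= t)


def evaluate(x, y, min_support=0.3, min_confidence=0.7):
    """Return (support, confidence, accepted) for the rule X => Y.

    The two thresholds are hypothetical values, not taken from the slides.
    """
    both = count(x | y)
    supp = both / len(transactions)
    conf = both / count(x)
    return supp, conf, supp >= min_support and conf >= min_confidence


for x, y in (({5}, {8}), ({8}, {5}), ({9}, {3})):
    print(x, "=>", y, evaluate(x, y))
```

With these assumed thresholds, {9} => {3} is rejected on support even though its confidence is 100%, which matches the conclusion drawn above.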
Apriori Algorithm
• In computer science and data mining, Apriori is
a classic algorithm for learning association rules.
• Apriori is designed to operate on databases
containing transactions (for example, collections
of items bought by customers, or details of
visits to a website).
• The algorithm attempts to find itemsets which are
common to at least a minimum number C (the
cutoff, or support threshold) of the transactions.

Definition (contd.)
• Apriori uses a "bottom up" approach, where
frequent subsets are extended one item at a
time (a step known as candidate generation), and
groups of candidates are tested against the
data.
• The algorithm terminates when no further
successful extensions are found.
• Apriori uses breadth-first search and a hash
tree structure to count candidate item sets
efficiently.
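
The candidate generation step described above can be sketched as a join followed by a prune. This is an illustrative version only: it uses plain Python sets rather than the hash tree mentioned in the slide, and the function name `generate_candidates` and the item labels A to D are made up for the example.

```python
from itertools import combinations


def generate_candidates(frequent_k):
    """Extend frequent k-itemsets by one item.

    Join: union two frequent k-itemsets that differ in exactly one item.
    Prune: keep the union only if all of its k-item subsets are frequent.
    """
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != len(a) + 1:
                continue  # not a one-item extension
            if all(frozenset(s) in frequent_k
                   for s in combinations(union, len(a))):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets, used only to exercise the function.
frequent_2 = {frozenset(p) for p in
              [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D")]}
print(generate_candidates(frequent_2))
```

In this made-up input, {A, B, D} and {B, C, D} are joined but then pruned because {A, D} and {C, D} are not frequent, so only {A, B, C} remains to be counted against the data.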
Steps to Perform Apriori Algorithm
Apriori Algorithm Examples
Problem Decomposition
Transaction ID   Items Bought
1                Shoes, Shirt, Jacket
2                Shoes, Jacket
3                Shoes, Jeans
4                Shirt, Sweatshirt

If the minimum support is 50%, then {Shoes, Jacket} is the only 2-itemset that satisfies the minimum support.

Frequent Itemset   Support
{Shoes}            75%
{Shirt}            50%
{Jacket}           50%
{Shoes, Jacket}    50%

If the minimum confidence is 50%, then the two rules generated from this 2-itemset both have confidence of at least 50%:
Shoes ⇒ Jacket Support=50%, Confidence=66.7%
Jacket ⇒ Shoes Support=50%, Confidence=100%
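
The second half of the decomposition, turning a frequent itemset into rules, can be sketched as follows. The function name `rules_from` is an assumption made for this illustration; the transactions and the 50% minimum confidence come from the example.

```python
from itertools import combinations

transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]


def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def rules_from(itemset, min_confidence):
    """Split a frequent itemset into every rule lhs => rhs and keep
    those whose confidence reaches `min_confidence`."""
    itemset = frozenset(itemset)
    rules = []
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            rhs = itemset - lhs
            conf = support(itemset) / support(lhs)
            if conf >= min_confidence:
                rules.append((set(lhs), set(rhs), support(itemset), conf))
    return rules


for lhs, rhs, supp, conf in rules_from({"Shoes", "Jacket"}, 0.5):
    print(lhs, "=>", rhs, f"support={supp:.0%}", f"confidence={conf:.1%}")
```

Both rules share the support of the itemset they were split from; only the confidence depends on which side the items are placed.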

The Apriori Algorithm — Example
Min support = 50%

Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D -> C1
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (generated from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D -> C2
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (generated from L2)
itemset
{2 3 5}

Scan D -> L3
itemset   sup
{2 3 5}   2
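
The walk-through above can be reproduced with a short level-wise loop. This is a simplified sketch rather than the original pseudo code: it counts candidates with a plain scan instead of a hash tree, and the names `D`, `MIN_COUNT`, `count` and `apriori` are chosen here for illustration.

```python
from itertools import combinations

# Database D from the example; a 50% minimum support means 2 of 4 transactions.
D = {100: {1, 3, 4}, 200: {2, 3, 5}, 300: {1, 2, 3, 5}, 400: {2, 5}}
MIN_COUNT = 2


def count(itemset):
    """One scan of D: how many transactions contain `itemset`."""
    return sum(1 for items in D.values() if itemset <= items)


def apriori():
    """Return [L1, L2, ...], each mapping a frequent itemset to its count."""
    items = sorted({i for t in D.values() for i in t})
    candidates = [frozenset([i]) for i in items]  # C1
    levels = []
    while candidates:
        counts = {c: count(c) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= MIN_COUNT}
        if not frequent:
            break
        levels.append(frequent)
        k = len(next(iter(frequent))) + 1
        # Join frequent itemsets into k-item unions, then prune any union
        # that has an infrequent (k-1)-item subset.
        unions = {a | b for a in frequent for b in frequent if len(a | b) == k}
        candidates = [u for u in unions
                      if all(frozenset(s) in frequent
                             for s in combinations(u, k - 1))]
    return levels


for k, level in enumerate(apriori(), start=1):
    print(f"L{k}:", {tuple(sorted(s)): n for s, n in level.items()})
```

The loop stops after L3 because a single frequent 3-itemset cannot be joined with anything to form a 4-item candidate.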

Pseudo Code for Apriori Algorithm
Apriori Advantages/Disadvantages
• Advantages
– Uses large itemset property
– Easily parallelized
– Easy to implement

• Disadvantages
– Assumes transaction database is memory
resident.
– Requires many database scans.

Summary
• Association Rules form a widely applied data mining
approach.
• Association Rules are derived from frequent itemsets.
• The Apriori algorithm is an efficient algorithm for
finding all frequent itemsets.
• The Apriori algorithm implements level-wise search
using the frequent itemset property.
• The Apriori algorithm can be additionally optimized.
• There are many measures for association rules.
