Association Analysis
Association Analysis: Definition
Association analysis is the task of uncovering relationships among data items. An association rule is a model that identifies how data items are associated with each other. Example: in retail sales, it is used to identify items that are frequently purchased together.
What is a rule?
A rule has the form: If (condition) then (result). Example: if a customer purchases Coke, then the customer also purchases orange juice. The first part is the rule body and the second part is the rule head.
Strength of a rule: confidence
How certain is the rule? Confidence measures the certainty of a rule. It is the percentage of transactions containing all items in the condition that also contain the items in the result: Confidence(A, B) = P(B | A). Example: the rule "If Coke then orange juice" has a confidence of 100%.
Strength of a rule: support
How often does the rule occur? Support measures the usefulness of a rule. It is the percentage of transactions that contain all items in the rule: Support(A, B) = P(A, B). Example: for the rule "If Coke then orange juice", 2 of the 5 transactions contain both Coke and orange juice, so the support of the rule is 40%.
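The two measures can be checked with a few lines of Python. This is a minimal sketch: the five transactions below are hypothetical, chosen only so that they reproduce the 40% support and 100% confidence quoted on the slides, and the helper names `support` and `confidence` are illustrative.

```python
# Hypothetical transaction data (not from the slides), chosen to match
# the quoted figures: 5 transactions, 2 of which contain coke and orange juice.
transactions = [
    {"coke", "orange juice"},
    {"coke", "orange juice", "milk"},
    {"milk", "bread"},
    {"bread"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(body, head, transactions):
    """Estimate P(head | body) as support(body and head) / support(body)."""
    both = support(set(body) | set(head), transactions)
    return both / support(body, transactions)


rule_support = support({"coke", "orange juice"}, transactions)
rule_confidence = confidence({"coke"}, {"orange juice"}, transactions)
print(f"support = {rule_support:.0%}, confidence = {rule_confidence:.0%}")
```

Confidence is 100% here because every transaction that contains coke also contains orange juice, while support is only 40% because the pair appears in 2 of the 5 transactions.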
Association Rule Mining
A two-step process.
1. Find all frequent k-itemsets, k = 1, 2, 3, …
The set of all items in a rule is called an itemset; a rule that contains k items forms a k-itemset. The occurrence frequency of a k-itemset is the number of transactions that contain all k items in the itemset. An itemset that satisfies a minimum support (or minimum occurrence frequency) is called a frequent itemset.
Association Rule Mining (continued)
2. Generate strong association rules from the frequent k-itemsets. Rules that satisfy both a minimum support threshold and a minimum confidence threshold are called strong rules.
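Apriori Algorithm: find all frequent k-itemsets
Apriori principle: if an itemset is frequent, then all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, every superset of it is infrequent too, which is what allows candidates to be pruned.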
Illustrating Apriori Principle
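One way to see the principle is to count occurrences directly: an itemset can never appear in more transactions than any of its subsets. The sketch below uses a small hypothetical market-basket data set (not taken from the slides) and an illustrative helper `support_count`.

```python
from itertools import combinations

# Hypothetical transactions, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]


def support_count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)


itemset = ("bread", "milk", "diapers")
full_count = support_count(itemset)

# Count every non-empty proper subset of the 3-itemset.
subset_counts = {
    subset: support_count(subset)
    for size in (1, 2)
    for subset in combinations(itemset, size)
}

for subset, count in subset_counts.items():
    print(subset, count, ">=", full_count)
```

Every subset count is at least as large as the count of the full itemset, so if any subset falls below the minimum support, the full itemset can be discarded without counting it.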
Apriori Algorithm: method
1. Let k = 1.
2. Generate frequent itemsets of length 1.
3. Repeat until no new frequent itemsets are identified:
   a. Generate length-(k+1) candidate itemsets from the length-k frequent itemsets.
   b. Prune candidate itemsets containing subsets of length k that are infrequent.
   c. Count the support of each candidate by scanning the database.
   d. Eliminate candidates that are infrequent, leaving only those that are frequent.
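The method above can be sketched in Python roughly as follows. This is an unoptimized illustration rather than a reference implementation: the function name `apriori`, the use of an absolute count `min_count` as the minimum support, and the sample transactions are all assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset found in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    def count_and_filter(candidates):
        # One scan of the database per level, then drop infrequent candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # k = 1: frequent single items.
    singles = {frozenset([item]) for t in transactions for item in t}
    frequent_k = count_and_filter(singles)
    all_frequent = dict(frequent_k)
    k = 1
    while frequent_k:
        # Generate (k+1)-candidates by joining pairs of frequent k-itemsets.
        candidates = {
            a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1
        }
        # Prune candidates that have an infrequent k-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent_k for s in combinations(c, k))
        }
        frequent_k = count_and_filter(candidates)
        all_frequent.update(frequent_k)
        k += 1
    return all_frequent


# Hypothetical transactions, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]
frequent = apriori(transactions, min_count=3)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

With this data, the candidates {bread, diapers, beer} and {milk, diapers, beer} are pruned before any counting because they contain an infrequent pair, which is the saving the Apriori principle provides.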
Generating strong association rules from the frequent k-itemsets
For each frequent k-itemset, generate all non-empty proper subsets. For every such subset s, generate the rule "s => (itemset minus s)" and compute its confidence. Output the rule if the minimum confidence threshold is satisfied.
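Rule generation can be sketched as below. The dictionary of frequent itemsets and their counts is hypothetical (it is simply assumed to be the output of an earlier mining step), and `generate_rules` is an illustrative name. Confidence is computed as count(itemset) / count(body), which relies on every subset of a frequent itemset also being present in the dictionary.

```python
from itertools import combinations


def generate_rules(frequent, min_confidence):
    """Return (body, head, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, count in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty body and a non-empty head
        for size in range(1, len(itemset)):
            for body in map(frozenset, combinations(itemset, size)):
                head = itemset - body
                conf = count / frequent[body]
                if conf >= min_confidence:
                    rules.append((body, head, conf))
    return rules


# Hypothetical frequent itemsets with their transaction counts.
frequent = {
    frozenset({"bread"}): 4,
    frozenset({"milk"}): 4,
    frozenset({"diapers"}): 4,
    frozenset({"beer"}): 3,
    frozenset({"bread", "milk"}): 3,
    frozenset({"bread", "diapers"}): 3,
    frozenset({"milk", "diapers"}): 3,
    frozenset({"diapers", "beer"}): 3,
}

rules = generate_rules(frequent, min_confidence=0.8)
for body, head, conf in rules:
    print(f"IF {sorted(body)} THEN {sorted(head)} (confidence {conf:.0%})")
```

Only "beer => diapers" survives the 80% threshold here; the reverse rule "diapers => beer" has confidence 3/4 and is dropped, which shows that confidence depends on the direction of the rule.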
Multilevel association rules
It is difficult to find strong associations at very low (primitive) levels of data. Few people may buy "IBM desktop computer" and "Sony b/w printer" together, but many people may purchase "computer" and "printer" together.
Concept hierarchy
A concept hierarchy defines a sequence of mappings from a set of low-level concepts to higher-level, more general concepts. Example concepts: IBM, Microsoft, HP, … ; computer, software, printer, accessory.
Steps to be followed
Use a top-down, progressive deepening approach: first mine high-level frequent items, then mine their lower-level frequent items, and so on. At each level, the Apriori algorithm is used. Either use a uniform minimum support for all levels, or use a reduced minimum support at lower levels.
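The effect of moving between levels can be illustrated with a small sketch. Everything in it is hypothetical: the item-to-category mapping, the transactions, the threshold values, and the helper names `support` and `generalize`. It only checks one pair at each level rather than running a full Apriori pass.

```python
# Hypothetical concept hierarchy: low-level item -> higher-level concept.
hierarchy = {
    "IBM desktop computer": "computer",
    "Dell laptop computer": "computer",
    "Sony b/w printer": "printer",
    "HP colour printer": "printer",
}

# Hypothetical low-level transactions.
transactions = [
    {"IBM desktop computer", "Sony b/w printer"},
    {"IBM desktop computer", "HP colour printer"},
    {"Dell laptop computer", "Sony b/w printer"},
    {"Dell laptop computer", "HP colour printer"},
    {"Dell laptop computer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)


def generalize(transactions, hierarchy):
    """Replace each item by its higher-level concept (items without one stay as-is)."""
    return [{hierarchy.get(item, item) for item in t} for t in transactions]


# Assumed thresholds: reduced minimum support at the lower level.
min_support = {"high": 0.5, "low": 0.2}

high_support = support({"computer", "printer"}, generalize(transactions, hierarchy))
low_support = support({"IBM desktop computer", "Sony b/w printer"}, transactions)

# Top-down: only descend to the lower level if the high-level itemset is frequent.
high_is_frequent = high_support >= min_support["high"]
low_is_frequent = high_is_frequent and low_support >= min_support["low"]
print(high_support, high_is_frequent, low_support, low_is_frequent)
```

Under a uniform threshold of 0.5 the low-level pair would be rejected, since its support is only 0.2; the reduced threshold at the lower level is what lets it through.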
Sequential Association Rule
Concerns sequences of events. Examples: new homeowners purchase shower curtains before purchasing furniture. When a customer goes into a bank branch and asks for an account reconciliation, there is a good chance that he or she will close all of his or her accounts.
Sequential Association Rule: data requirements
Each transaction must have two additional features: (1) a time stamp or sequencing information to determine when transactions occurred relative to each other, and (2) identifying information, such as an account number or ID number.
Some important parameters
Duration: may be the entire available sequence in the database, or a user-selected subsequence, such as the year 1999.
Event folding window: a set of events occurring within a specified period of time, such as within the same day, can be viewed as occurring together.
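Both parameters can be applied while building per-customer sequences. The sketch below assumes events arrive as (customer id, date, item) tuples, uses the year 1999 as the duration, and folds events on the same calendar day into one set; the data and the name `build_sequences` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (customer id, date, item).
events = [
    ("c1", date(1999, 1, 5), "shower curtain"),
    ("c1", date(1999, 1, 5), "towels"),
    ("c1", date(1999, 3, 2), "furniture"),
    ("c2", date(1998, 12, 30), "shower curtain"),
    ("c2", date(1999, 2, 1), "furniture"),
]


def build_sequences(events, start, end):
    """Keep events with start <= date <= end and fold same-day events together."""
    folded = defaultdict(lambda: defaultdict(set))
    for customer, day, item in events:
        if start <= day <= end:  # duration filter
            folded[customer][day].add(item)  # same-day folding window
    # For each customer, return a time-ordered list of (day, set of items).
    return {
        customer: [(day, days[day]) for day in sorted(days)]
        for customer, days in folded.items()
    }


sequences = build_sequences(events, date(1999, 1, 1), date(1999, 12, 31))
for customer, sequence in sequences.items():
    print(customer, sequence)
```

Customer c2's December 1998 purchase falls outside the chosen duration, so the "shower curtain before furniture" pattern is visible only for c1; the choice of duration directly affects which sequences can be found.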
Some important parameters (continued)
Interval: the interval between events in the discovered pattern.
interval = 0 finds strictly consecutive sequences.
min_int <= interval <= max_int finds patterns whose events are separated by at least min_int and at most max_int.
interval = c finds patterns carrying an exact interval c.
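A minimal sketch of the interval constraint follows. It assumes one particular reading of "interval", namely the number of events lying between the two matched events in an ordered sequence, so that interval 0 means the events are adjacent; a time-based interval would work the same way with timestamps. The function name `occurs` and the sample sequence are hypothetical.

```python
def occurs(sequence, first, second, min_int=0, max_int=None):
    """True if `first` is followed by `second` with min_int <= interval <= max_int.

    Assumption: the interval is the number of events between the two matches.
    """
    for i, event in enumerate(sequence):
        if event != first:
            continue
        for j in range(i + 1, len(sequence)):
            interval = j - i - 1
            if max_int is not None and interval > max_int:
                break  # later positions only increase the interval
            if interval >= min_int and sequence[j] == second:
                return True
    return False


# Hypothetical ordered purchase sequence for one customer.
purchases = ["shower curtain", "towels", "paint", "furniture"]

consecutive = occurs(purchases, "shower curtain", "towels", 0, 0)
not_consecutive = occurs(purchases, "shower curtain", "furniture", 0, 0)
within_range = occurs(purchases, "shower curtain", "furniture", 1, 3)
exact = occurs(purchases, "shower curtain", "furniture", 2, 2)
print(consecutive, not_consecutive, within_range, exact)
```

Setting min_int and max_int to the same value c gives the exact-interval case, and leaving max_int as None places no upper bound on the gap.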
Some Practical Issues
Time window of transactions; level of aggregation; level of support and confidence.
Time window of transactions
Select a time window that covers at least 2 product cycles. For example, if customers purchase a product at intervals of six months or less, select a 12-month window of customer transaction data. For frequently purchased products, a short time window is sufficient; for low-frequency items, a longer time window is necessary.
Level of aggregation
If product codes in the data are too specific (for example, distinguishing products by details such as size and flavour), few associations will be discovered. Group products into categories according to the product hierarchy, or create a new level manually.
Level of support and confidence
Start with a high support threshold and gradually reduce it. Set the confidence threshold to around 50% to reduce the number of permutations.
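The "start high and reduce" advice can be sketched as a simple loop. For brevity the sketch counts frequent pairs only instead of running a full Apriori pass; the transactions, the threshold schedule (80%, 60%, 40%), the stopping target of three pairs, and the function names are all assumptions made for illustration. Thresholds are kept as integer percentages to avoid floating-point comparison issues.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, used only for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "coke"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "coke"},
]


def frequent_pairs(transactions, min_support_pct):
    """Pairs whose support is at least min_support_pct percent."""
    counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    n = len(transactions)
    return {pair: c for pair, c in counts.items() if 100 * c >= min_support_pct * n}


def tune_support(transactions, thresholds_pct=(80, 60, 40), target=3):
    """Lower the support threshold step by step until enough pairs are found."""
    pairs = {}
    for pct in thresholds_pct:
        pairs = frequent_pairs(transactions, pct)
        if len(pairs) >= target:
            break
    return pct, pairs


chosen_pct, pairs = tune_support(transactions)
print(chosen_pct, pairs)
```

At 80% no pair qualifies, so the loop drops to 60%, where four pairs are found and the search stops; lowering the threshold further would add more, weaker itemsets to review.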
Conclusion
Association rules, including multilevel and sequential association rules, were studied. The Apriori algorithm was described in detail, and various practical issues in association rule mining were analyzed.
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net
