Introduction to Machine Learning
Lecture 16
Advanced Topics in Association Rules Mining

Albert Orriols i Puig
http://www.albertorriols.net
aorriols@salle.url.edu

Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
Recap of Lecture 13-15
        Ideas come from the market basket analysis (MBA)
                Let’s go shopping!

                Customer1: Milk, eggs, sugar, bread
                Customer2: Milk, eggs, cereal, bread
                Customer3: Eggs, sugar

                What do my customers buy? Which products are bought together?
                Aim: Find associations and correlations between the different
                items that customers place in their shopping basket
Recap of Lecture 15
        Aim: Find associations between items

        But wait!
                There are many different diapers
                          Dodot, Huggies …

                There are many different beers:
                          Heineken, Desperados, Kingfisher … in bottle/can …

        Which rule do you prefer?
                diapers ⇒ beer
                Dodot diapers M ⇒ Dam beer in Can
        Which will have greater support?

        [Figure: concept hierarchy. Clothes → Outwear, Shirts; Outwear → Jackets, Ski Pants]

Today’s Agenda
        Continuing our journey through some advanced
        topics in ARM
                Mining frequent patterns without candidate
                generation
                Multiple Level AR
                Sequential Pattern Mining
                Quantitative association rules
                Mining class association rules
                Beyond support & confidence
                Applications

Introduction to Seq. AR
        So far, we have seen
                Apriori
                FP-growth
                Mining multiple level AR
        But none of them considers the order of transactions
        However, is the sequence important?
                Which came first, the hen or the egg?

                Sometimes, it is really important
                          Analyze the sequence of items bought by a customer
                          Web usage mining searches for navigational patterns of
                          users

An Example in Web Usage Mining




           Web sequence: < {Homepage} {Electronics} {Computers}
           {Laptops} {Sony Vaio} {Order Confirmation} {Return to Shopping} >



Definition
        Defining the problem:
                Let I = {i_1, i_2, …, i_m} be a set of items
                Sequence: An ordered list of itemsets
                Itemset/element: A non-empty set of items X ⊆ I. We denote a
                sequence s by 〈a_1 a_2 … a_r〉, where a_i is an itemset, which is also
                called an element of s
                An element (or an itemset) of a sequence is denoted by {x_1, x_2,
                …, x_k}, where x_j ∈ I is an item
                We assume without loss of generality that items in an element
                of a sequence are in lexicographic order




Definition
        Defining the problem:
                Size: The size of a sequence is the number of elements (or
                itemsets) in the sequence
                Length: The length of a sequence is the number of items in the
                sequence
                A sequence of length k is called a k-sequence
                A sequence s1 = 〈a_1 a_2 … a_r〉 is a subsequence of another
                sequence s2 = 〈b_1 b_2 … b_v〉, or s2 is a supersequence of s1, if
                there exist integers 1 ≤ j_1 < j_2 < … < j_{r−1} < j_r ≤ v such that
                a_1 ⊆ b_{j_1}, a_2 ⊆ b_{j_2}, …, a_r ⊆ b_{j_r}. We also say that s2 contains s1




Example
        Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}.
        Sequence 〈{3}{4, 5}{8}〉 is contained in (or is a
        subsequence of) 〈{6}{3, 7}{9}{4, 5, 8}{3, 8}〉
                because {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}.
                However, 〈{3}{8}〉 is not contained in 〈{3, 8}〉 or vice versa.
                The size of the sequence 〈{3}{4, 5}{8}〉 is 3, and the length of
                the sequence is 4
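
        The definitions of size, length, and containment can be checked with a
        minimal Python sketch. It assumes a sequence is represented as a list of
        sets; that representation and the function names are chosen here for
        illustration and are not part of the lecture. Containment is tested by
        matching each element of the candidate subsequence, in order, against
        the first later element that is a superset of it.

```python
def size(seq):
    """Number of elements (itemsets) in the sequence."""
    return len(seq)


def length(seq):
    """Total number of items over all elements of the sequence."""
    return sum(len(element) for element in seq)


def contains(s2, s1):
    """True if s1 is a subsequence of s2, i.e. s2 contains s1."""
    j = 0  # index of the next element of s1 that still needs a match
    for element in s2:
        if j < len(s1) and s1[j] <= element:  # subset test on sets
            j += 1
    return j == len(s1)


s1 = [{3}, {4, 5}, {8}]
s2 = [{6}, {3, 7}, {9}, {4, 5, 8}, {3, 8}]

print(contains(s2, s1))                     # s1 is contained in s2
print(contains([{3, 8}], [{3}, {8}]))       # two elements cannot map to one
print(contains([{3}, {8}], [{3, 8}]))       # {3, 8} fits in neither element
print(size(s1), length(s1))
```

        Matching greedily from left to right is enough here: taking the earliest
        possible match for each element never rules out a later match.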




Objective
        Objective of sequential pattern mining (SPM)
                Input: A set S of input data sequences (or sequence database)
                Goal: The problem of mining sequential patterns is to find all the
                sequences that have a user-specified minimum support
        Each such sequence is called a frequent sequence, or a
        sequential pattern
        The support for a sequence is the fraction of total data
        sequences in S that contain this sequence




Example
        Customer ID    Transaction time    Transaction (items bought)
             1         July 20, 2005       30
             1         July 25, 2005       90
             2         July 9, 2005        10, 20
             2         July 14, 2005       30
             2         July 20, 2005       40, 60, 70
             3         July 25, 2005       30, 50, 70
             4         July 25, 2005       30
             4         July 29, 2005       40, 70
             4         August 2, 2005      90
             5         July 12, 2005       90

        Customer ID    Customer Sequence
             1         <(30) (90)>
             2         <(10 20) (30) (40 60 70)>
             3         <(30 50 70)>
             4         <(30) (40 70) (90)>
             5         <(90)>

        Sequential patterns with support >25%
                1-sequence   <(30)> <(40)> <(70)> <(90)>
                2-sequence   <(30)(40)> <(30)(70)> <(30)(90)> <(40 70)>
                3-sequence   <(30)(40 70)>

                                        Example borrowed from Bing Liu
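
        The step from the transaction table to customer sequences, and the
        support of a pattern, can be sketched as follows. This is an
        illustrative sketch rather than code from the lecture: the tuple layout
        and the helper names `customer_sequences`, `contains`, and `support`
        are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# (customer ID, transaction time, items bought), copied from the table above
transactions = [
    (1, date(2005, 7, 20), {30}),
    (1, date(2005, 7, 25), {90}),
    (2, date(2005, 7, 9), {10, 20}),
    (2, date(2005, 7, 14), {30}),
    (2, date(2005, 7, 20), {40, 60, 70}),
    (3, date(2005, 7, 25), {30, 50, 70}),
    (4, date(2005, 7, 25), {30}),
    (4, date(2005, 7, 29), {40, 70}),
    (4, date(2005, 8, 2), {90}),
    (5, date(2005, 7, 12), {90}),
]


def customer_sequences(rows):
    """Group transactions by customer and order them by time."""
    by_customer = defaultdict(list)
    for customer, _when, items in sorted(rows, key=lambda r: (r[0], r[1])):
        by_customer[customer].append(frozenset(items))
    return dict(by_customer)


def contains(data_seq, pattern):
    """True if pattern is a subsequence of data_seq."""
    j = 0
    for element in data_seq:
        if j < len(pattern) and pattern[j] <= element:
            j += 1
    return j == len(pattern)


def support(db, pattern):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db.values()) / len(db)


db = customer_sequences(transactions)
print(support(db, [{30}, {40, 70}]))  # customers 2 and 4
print(support(db, [{30, 70}]))        # customer 3 only, so not above 25%
```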
GSP
        GSP closely follows Apriori, but for sequential patterns
                If a sequence S is not frequent, then none of the
                supersequences of S is frequent
                For instance, if <ab> is infrequent, so are <acb> and <(ca)b>
        GSP follows these steps:
                Initially, every item in DB is a candidate of length 1
                For each level (i.e., sequences of length k) do
                          Scan database to collect support count for each candidate
                          sequence
                          Generate candidate length-(k+1) sequences from length-k
                          frequent sequences using Apriori
                          Repeat until no frequent sequence or no candidate can be
                          found
                Strength: Candidate pruning by Apriori
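
        The level-wise loop above can be sketched in Python. This is a
        simplified illustration, not GSP itself: instead of GSP's
        candidate-generation step, it builds length-(k+1) candidates by
        extending each frequent length-k sequence with one frequent item, either
        as a new last element or inside the last element. The names `mine`,
        `extend`, `count_support`, and `show` are hypothetical. The data are
        the customer sequences of the earlier example, with a minimum count of
        2 out of 5 (support > 25%).

```python
def contains(data_seq, pattern):
    """True if pattern is a subsequence of data_seq."""
    j = 0
    for element in data_seq:
        if j < len(pattern) and pattern[j] <= element:
            j += 1
    return j == len(pattern)


def count_support(db, pattern):
    """Number of data sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db)


def extend(pattern, items):
    """Candidates with exactly one more item than pattern."""
    last = pattern[-1]
    out = []
    for x in items:
        out.append(pattern + (frozenset([x]),))       # x starts a new element
        if x > max(last):
            out.append(pattern[:-1] + (last | {x},))  # x joins the last element
    return out


def mine(db, min_count):
    """Level-wise search: count candidates, keep frequent ones, extend them."""
    all_items = sorted({x for seq in db for element in seq for x in element})
    candidates = [(frozenset([x]),) for x in all_items]  # length-1 candidates
    frequent_items, result = [], {}
    while candidates:
        counts = {c: count_support(db, c) for c in candidates}
        frequent = [c for c in candidates if counts[c] >= min_count]
        result.update((c, counts[c]) for c in frequent)
        if not frequent_items:
            frequent_items = [min(c[0]) for c in frequent]
        # Only frequent sequences are extended (Apriori-style pruning).
        candidates = [e for c in frequent for e in extend(c, frequent_items)]
    return result


def show(pattern):
    """Render a pattern in the <(30)(40 70)> notation of the slides."""
    body = "".join("(" + " ".join(str(x) for x in sorted(e)) + ")" for e in pattern)
    return "<" + body + ">"


db = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

patterns = mine(db, min_count=2)
for p in sorted(patterns, key=lambda p: (sum(len(e) for e in p), show(p))):
    print(show(p), patterns[p])
```

        Extending only frequent sequences is safe because removing the last
        item of a frequent sequence always leaves a frequent sequence, so every
        frequent pattern is reached from a frequent parent.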
The Algorithm




                           Does this remind you of Apriori?

Quantitative AR

            Transaction ID      Age        Married       NumCars
                          1      23            No           1
                          2      25           Yes           1
                          3      29            No           0
                          4      34           Yes           2
                          5      38           Yes           2



                <Age: 30..39> and <Married: Yes> => <NumCars: 2>
                Support = 40%, Conf = 100%
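
        The support and confidence of this rule can be verified directly on the
        five records. The sketch below is only an illustration; the dictionary
        layout and the helper `support_confidence` are assumptions, with the
        rule's two sides written as plain Python predicates.

```python
records = [
    {"Age": 23, "Married": "No", "NumCars": 1},
    {"Age": 25, "Married": "Yes", "NumCars": 1},
    {"Age": 29, "Married": "No", "NumCars": 0},
    {"Age": 34, "Married": "Yes", "NumCars": 2},
    {"Age": 38, "Married": "Yes", "NumCars": 2},
]


def support_confidence(rows, antecedent, consequent):
    """Support and confidence of the rule antecedent => consequent."""
    covered = [r for r in rows if antecedent(r)]
    both = [r for r in covered if consequent(r)]
    support = len(both) / len(rows)
    confidence = len(both) / len(covered) if covered else 0.0
    return support, confidence


sup, conf = support_confidence(
    records,
    antecedent=lambda r: 30 <= r["Age"] <= 39 and r["Married"] == "Yes",
    consequent=lambda r: r["NumCars"] == 2,
)
print(f"support={sup:.0%} confidence={conf:.0%}")
```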
        How can we deal with these data?


Map to Boolean Values

 Record ID   Age [20..29]   Age [30..39]   Married: Yes   Married: No   NumCars: 0   NumCars: 1
    100           1              0              0              1             0            1
    200           1              0              1              0             0            1
    300           1              0              0              1             1            0
    400           0              1              1              0             0            0
    500           0              1              1              0             0            0



        Now, use any system for mining Boolean AR
                Apriori
                FP-growth
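
        The mapping to Boolean values can be sketched as below. The sketch
        assumes that records 100 to 500 correspond to transactions 1 to 5 of the
        previous table, and the item labels such as "Age:20..29" are naming
        choices made here. Each record becomes a set of Boolean items, which is
        the input format a Boolean miner such as Apriori or FP-growth expects.

```python
AGE_INTERVALS = [(20, 29), (30, 39)]  # the partition used on the slide

# record ID -> (Age, Married, NumCars)
records = {
    100: (23, "No", 1),
    200: (25, "Yes", 1),
    300: (29, "No", 0),
    400: (34, "Yes", 2),
    500: (38, "Yes", 2),
}


def to_items(age, married, num_cars):
    """Turn one quantitative record into a set of Boolean items."""
    items = {f"Married:{married}", f"NumCars:{num_cars}"}
    for low, high in AGE_INTERVALS:
        if low <= age <= high:
            items.add(f"Age:{low}..{high}")
    return items


# Columns in the order shown in the table above
COLUMNS = ["Age:20..29", "Age:30..39", "Married:Yes", "Married:No",
           "NumCars:0", "NumCars:1"]

boolean_db = {rid: to_items(*values) for rid, values in records.items()}
for rid, items in boolean_db.items():
    print(rid, [int(c in items) for c in COLUMNS])
```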
Problems with this Approach
        MinSup
                If the number of intervals is large,
                the support of a single interval
                can be low
        MinConf
                Information is lost when partitioning
                values into intervals.
                Confidence can be lower when the
                number of intervals is smaller
        Example
                In the partition used:
                          <NumCars:0> ⇒ <Married:No> c=100%
                But now, assume that in the partition, NumCars:0 and NumCars:1 go
                to the same interval
                          <NumCars:0,1> ⇒ <Married:No> c=66.67%
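
        The drop in confidence can be reproduced on the five records of the
        quantitative example. A small sketch, where the tuple layout and the
        `confidence` helper are assumptions made for illustration:

```python
# (Age, Married, NumCars) for transactions 1 to 5
records = [
    (23, "No", 1),
    (25, "Yes", 1),
    (29, "No", 0),
    (34, "Yes", 2),
    (38, "Yes", 2),
]


def confidence(rows, car_values, married):
    """Confidence of <NumCars in car_values> => <Married: married>."""
    covered = [r for r in rows if r[2] in car_values]
    hits = [r for r in covered if r[1] == married]
    return len(hits) / len(covered) if covered else 0.0


print(confidence(records, {0}, "No"))               # NumCars:0 on its own
print(round(confidence(records, {0, 1}, "No"), 4))  # NumCars:0 and 1 merged
```

        Merging the two values brings in transaction 2 (married, one car), so
        only two of the three covered records satisfy the consequent.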

Problems with this Approach
        How can we solve this problem?
                Increase the number of intervals
                (to reduce information loss)
                while combining adjacent ones (to increase support)
                ExecTime: Execution time blows up as the number of items
                per record increases
                ManyRules: Number of rules also blows up.
                Many of them will not be interesting




Second Approach
        Other solutions?
                Well, the problem was that intervals were not the best ones
                Let’s try to create the best intervals for our data
        How?
                Discretizing/Clustering techniques
                          Apply a discretizing/clustering technique to find the best
                          partitions
                          Employ those partitions
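
        As one possible illustration of data-driven partitioning, the sketch
        below splits a numeric attribute at its largest gaps. The slides do not
        name a specific technique, so this heuristic and the function
        `split_at_largest_gaps` are assumptions chosen for simplicity; they are
        not the method covered in the course.

```python
def split_at_largest_gaps(values, k):
    """Split 1-D values into k groups by cutting at the k-1 largest gaps."""
    ordered = sorted(values)
    if not 1 <= k <= len(ordered):
        raise ValueError("k must be between 1 and the number of values")
    # Position i stands for the gap between ordered[i] and ordered[i + 1].
    by_gap = sorted(
        range(len(ordered) - 1),
        key=lambda i: ordered[i + 1] - ordered[i],
        reverse=True,
    )
    cuts = sorted(by_gap[: k - 1])
    groups, start = [], 0
    for cut in cuts:
        groups.append(ordered[start : cut + 1])
        start = cut + 1
    groups.append(ordered[start:])
    return groups


ages = [23, 25, 29, 34, 38]  # Age column of the quantitative example
groups = split_at_largest_gaps(ages, k=2)
for g in groups:
    print(f"[{g[0]}..{g[-1]}]", g)
```

        On the Age column, the largest gap lies between 29 and 34, so the two
        groups fall inside the [20..29] and [30..39] intervals used earlier.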




                We’ll see how clustering techniques work in the next class. So,
                keep this in mind and put the pieces together next class!

Third Approach
        And what if we do not map the input to a Boolean
        space?
                Create interval-based association
                rules directly
                So, decide the best interval and,
                then, count the support
                Usually, these approaches do not
                provide all the association rules,
                but the ones with larger support
                and confidence
                Fuzzy logic can also be applied here.
                          But again, we’ll see
                          GFS in two or three lectures



Mining Class Association Rules
        So far, we have seen ARM without any specific target
                It finds all possible rules that exist in data, i.e., any item can appear as
                a consequent or a condition of a rule
        However, what if we are interested in some specific targets?
                E.g.:
                          The user has a set of text documents from some known topics.
                          He/she wants to find out what words are associated or correlated
                          with each topic

        So, now, we want to find:
                X ⇒ y, where X ⊆ I, and y ∈ Y (the set of class labels)
        The algorithms are very similar to those of ARM
        We are not going to see them in class, but you have
        information on the estudy
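
        To make the rule form X ⇒ y concrete, here is a brute-force sketch on
        a tiny, made-up document collection. The documents, topic labels,
        thresholds, and the function `class_rules` are all hypothetical, and
        the exhaustive enumeration of word sets stands in for the Apriori-like
        algorithms mentioned above; it is not taken from the course material.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (set of words in the document, topic label)
docs = [
    ({"goal", "match", "team"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"vote", "party"}, "politics"),
    ({"vote", "team"}, "politics"),
    ({"match", "goal"}, "sport"),
]


def class_rules(data, min_sup, min_conf, max_len=2):
    """Rules X => y with X a word set of at most max_len words, y a label."""
    n = len(data)
    cond_count = Counter()  # documents containing X
    rule_count = Counter()  # documents containing X and labelled y
    for words, label in data:
        for k in range(1, max_len + 1):
            for x in combinations(sorted(words), k):
                cond_count[x] += 1
                rule_count[(x, label)] += 1
    rules = []
    for (x, label), count in rule_count.items():
        sup, conf = count / n, count / cond_count[x]
        if sup >= min_sup and conf >= min_conf:
            rules.append((x, label, sup, conf))
    return sorted(rules)


rules = class_rules(docs, min_sup=0.4, min_conf=0.9)
for x, label, sup, conf in rules:
    print(f"{set(x)} => {label}  sup={sup:.0%} conf={conf:.0%}")
```

        The only difference from unrestricted rule mining is that the
        consequent is always a class label, never an ordinary item.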

Beyond Support and Confidence
        Support and Confidence are the basic measures of
        interestingness
        But many more have been proposed during the last few
        years
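
        Lift is one widely used example of such a measure; it is not named on
        this slide and is shown here only as an illustration. It compares how
        often both sides of a rule occur together with what would be expected
        if they were independent. The sketch reuses the five records of the
        quantitative example; the `lift` helper is an assumption.

```python
records = [
    {"Age": 23, "Married": "No", "NumCars": 1},
    {"Age": 25, "Married": "Yes", "NumCars": 1},
    {"Age": 29, "Married": "No", "NumCars": 0},
    {"Age": 34, "Married": "Yes", "NumCars": 2},
    {"Age": 38, "Married": "Yes", "NumCars": 2},
]


def lift(rows, antecedent, consequent):
    """lift(X => Y) = P(X and Y) / (P(X) * P(Y))."""
    n = len(rows)
    p_x = sum(1 for r in rows if antecedent(r)) / n
    p_y = sum(1 for r in rows if consequent(r)) / n
    p_xy = sum(1 for r in rows if antecedent(r) and consequent(r)) / n
    if p_x == 0 or p_y == 0:
        raise ValueError("lift is undefined when one side never occurs")
    return p_xy / (p_x * p_y)


value = lift(
    records,
    antecedent=lambda r: r["Married"] == "Yes",
    consequent=lambda r: r["NumCars"] == 2,
)
print(round(value, 3))  # above 1: the two sides co-occur more than by chance
```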




Some Applications
        Wal-Mart has used the technique
        for years to mine POS data and
        arrange their store to maximize
        sales from such analysis




        Medical databases to discover commonly occurring diseases
        amongst groups of people
        Lottery results databases, to discover those lucky combinations of
        numbers




Some Applications
        Power System Restoration
                PSR is a multi-objective, multi-period, nonlinear, mixed integer
                optimization problem with various constraints and
                unforeseeable factors
                Discovery of associations that help build heuristics for PSR
                Actions in a PSR
                          start_black_start_unit(x)
                          energize_line(x)
                          pick_up_load(x)
                          synchronize(x,y)
                          connect_tie_line(x)
                          crank_unit(x)
                          energize_busbar(x)

Some Applications
        Correlations with color, spatial relationships, etc.
        From coarse to Fine Resolution mining




Next Class



        Clustering




                                               Slide 25
Artificial Intelligence     Machine Learning
Introduction to Machine
       Learning
                      Lecture 16
 Advanced Topics in Association Rules Mining

                     Albert Orriols i Puig
                 http://www.albertorriols.net
                 htt //       lb t i l      t
                    aorriols@salle.url.edu

          Artificial Intelligence – Machine Learning
                            g                      g
              Enginyeria i Arquitectura La Salle
                     Universitat Ramon Llull

More Related Content

What's hot

Tech natives 22042013_bartde_witte_watson_v01
Tech natives 22042013_bartde_witte_watson_v01Tech natives 22042013_bartde_witte_watson_v01
Tech natives 22042013_bartde_witte_watson_v01Bart de Witte
 
2 tri partite model algebra
2 tri partite model algebra2 tri partite model algebra
2 tri partite model algebraAle Cignetti
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSAlbert Orriols-Puig
 
Knowledge Components & Objects
Knowledge Components & ObjectsKnowledge Components & Objects
Knowledge Components & Objectsmohdazrulazlan
 
Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...
Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...
Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...Communardo Software GmbH
 
Personal World Absorber - a tool to filter information garbage and boost user...
Personal World Absorber - a tool to filter information garbage and boost user...Personal World Absorber - a tool to filter information garbage and boost user...
Personal World Absorber - a tool to filter information garbage and boost user...Vladimir Kryukov
 
Creating Documentation Your Users Will Love
Creating Documentation Your Users Will LoveCreating Documentation Your Users Will Love
Creating Documentation Your Users Will LoveEna Arel
 
Formal Concept Analysis
Formal Concept AnalysisFormal Concept Analysis
Formal Concept AnalysisSSA KPI
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsPier Luca Lanzi
 
Sixth Grade Math Curriculum Map
Sixth Grade Math Curriculum Map Sixth Grade Math Curriculum Map
Sixth Grade Math Curriculum Map Isaac_Schools_5
 
Chc v2.0 model 2 13-12
Chc v2.0 model 2 13-12Chc v2.0 model 2 13-12
Chc v2.0 model 2 13-12Kevin McGrew
 
Agile Business Analysis - The Key to Effective Requirements on Agile Projects
Agile Business Analysis - The Key to Effective Requirements on Agile ProjectsAgile Business Analysis - The Key to Effective Requirements on Agile Projects
Agile Business Analysis - The Key to Effective Requirements on Agile ProjectsLilian De Munno
 
Kindergarten Curriculum Map Math10 11
Kindergarten Curriculum Map Math10 11Kindergarten Curriculum Map Math10 11
Kindergarten Curriculum Map Math10 11Isaac_Schools_5
 
4th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-14th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-1Isaac_Schools_5
 

What's hot (20)

Tech natives 22042013_bartde_witte_watson_v01
Tech natives 22042013_bartde_witte_watson_v01Tech natives 22042013_bartde_witte_watson_v01
Tech natives 22042013_bartde_witte_watson_v01
 
2 tri partite model algebra
2 tri partite model algebra2 tri partite model algebra
2 tri partite model algebra
 
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCSHIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
HIS'2008: New Crossover Operator for Evolutionary Rule Discovery in XCS
 
Knowledge Components & Objects
Knowledge Components & ObjectsKnowledge Components & Objects
Knowledge Components & Objects
 
Making Intelligence
Making IntelligenceMaking Intelligence
Making Intelligence
 
Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...
Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...
Manifesto for a Standard on Knowledge Exchange in Social Knowledge Management...
 
Personal World Absorber - a tool to filter information garbage and boost user...
Personal World Absorber - a tool to filter information garbage and boost user...Personal World Absorber - a tool to filter information garbage and boost user...
Personal World Absorber - a tool to filter information garbage and boost user...
 
Creating Documentation Your Users Will Love
Creating Documentation Your Users Will LoveCreating Documentation Your Users Will Love
Creating Documentation Your Users Will Love
 
Formal Concept Analysis
Formal Concept AnalysisFormal Concept Analysis
Formal Concept Analysis
 
Lecture 04 Association Rules Basics
Lecture 04 Association Rules BasicsLecture 04 Association Rules Basics
Lecture 04 Association Rules Basics
 
Sixth Grade Math Curriculum Map
Sixth Grade Math Curriculum Map Sixth Grade Math Curriculum Map
Sixth Grade Math Curriculum Map
 
Chc v2.0 model 2 13-12
Chc v2.0 model 2 13-12Chc v2.0 model 2 13-12
Chc v2.0 model 2 13-12
 
Agile Business Analysis - The Key to Effective Requirements on Agile Projects
Agile Business Analysis - The Key to Effective Requirements on Agile ProjectsAgile Business Analysis - The Key to Effective Requirements on Agile Projects
Agile Business Analysis - The Key to Effective Requirements on Agile Projects
 
Back to information discovery
Back to information discoveryBack to information discovery
Back to information discovery
 
Kindergarten Curriculum Map Math10 11
Kindergarten Curriculum Map Math10 11Kindergarten Curriculum Map Math10 11
Kindergarten Curriculum Map Math10 11
 
Lecture04 / scenarios
Lecture04 / scenariosLecture04 / scenarios
Lecture04 / scenarios
 
IntelliGO semantic similarity measure for Gene Ontology annotations
IntelliGO semantic similarity measure for Gene Ontology annotationsIntelliGO semantic similarity measure for Gene Ontology annotations
IntelliGO semantic similarity measure for Gene Ontology annotations
 
4th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-14th grade math curriculum map 2011 2012-1
4th grade math curriculum map 2011 2012-1
 
Integrated Learning
Integrated LearningIntegrated Learning
Integrated Learning
 
VO Course 06: VO Data-models
VO Course 06: VO Data-modelsVO Course 06: VO Data-models
VO Course 06: VO Data-models
 

Similar to Advanced Machine Learning Association Rule Mining

Artificial Intelligence for Undergrads
Artificial Intelligence for UndergradsArtificial Intelligence for Undergrads
Artificial Intelligence for UndergradsJose Berengueres
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxPyData
 
Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)Think Machine Learning with Scikit-Learn (Python)
Think Machine Learning with Scikit-Learn (Python)Chetan Khatri
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in PythonGael Varoquaux
 
Building a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetBuilding a cutting-edge data processing environment on a budget
Building a cutting-edge data processing environment on a budgetGael Varoquaux
 
Puppet for dummies - ZendCon 2011 Edition
Puppet for dummies - ZendCon 2011 EditionPuppet for dummies - ZendCon 2011 Edition
Puppet for dummies - ZendCon 2011 EditionJoshua Thijssen
 
Building Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchBuilding Interpretable & Secure AI Systems using PyTorch
Building Interpretable & Secure AI Systems using PyTorchgeetachauhan
 
Back To The Future.Key 2
Back To The Future.Key 2Back To The Future.Key 2
Back To The Future.Key 2gueste8cc560
 
Software and all that comes with it
Software and all that comes with itSoftware and all that comes with it
Software and all that comes with itAlberto Brandolini
 
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...
AET vs. AED: Unsupervised Representation Learning by Auto-Encoding Transforma...Tomoyuki Suzuki
 
Java Hurdling: Obstacles and Techniques in Java Client Penetration-Testing
Java Hurdling: Obstacles and Techniques in Java Client Penetration-TestingJava Hurdling: Obstacles and Techniques in Java Client Penetration-Testing
Java Hurdling: Obstacles and Techniques in Java Client Penetration-TestingTal Melamed
 
On the code of data science
On the code of data scienceOn the code of data science
On the code of data scienceGael Varoquaux
 
Transformative iPad Use in Elementary School
Transformative iPad Use in  Elementary SchoolTransformative iPad Use in  Elementary School
Transformative iPad Use in Elementary SchoolSilvia Rosenthal Tolisano
 
Seven Ineffective Coding Habits of Many Java Programmers
Seven Ineffective Coding Habits of Many Java ProgrammersSeven Ineffective Coding Habits of Many Java Programmers
Seven Ineffective Coding Habits of Many Java ProgrammersKevlin Henney
 

Similar to Advanced Machine Learning Association Rule Mining (20)

Artificial Intelligence for Undergrads
Artificial Intelligence for UndergradsArtificial Intelligence for Undergrads
Artificial Intelligence for Undergrads
 
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael VaroquauxBuilding a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
Building a Cutting-Edge Data Process Environment on a Budget by Gael Varoquaux
 
Think Machine Learning with Scikit-Learn (Python)
Advanced Machine Learning Association Rule Mining

  • 1. Introduction to Machine Learning, Lecture 16: Advanced Topics in Association Rules Mining. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull
  • 2. Recap of Lecture 13-15. Ideas come from market basket analysis (MBA). Let’s go shopping! Customer1: milk, eggs, sugar, bread. Customer2: milk, eggs, cereal, bread. Customer3: eggs, sugar. What do my customers buy? Which products are bought together? Aim: find associations and correlations between the different items that customers place in their shopping basket.
  • 3. Recap of Lecture 15. Aim: find associations between items. But wait! There are many different diapers: Dodot, Huggies … There are many different beers: Heineken, Desperados, Kingfisher … in bottle/can … Which rule do you prefer: diapers ⇒ beer, or Dodot diapers M ⇒ Dam beer in can? Which will have greater support? (Figure: item taxonomy with the nodes Clothes, Outerwear, Shirts, Jackets, and Ski Pants.)
  • 4. Today’s Agenda. Continuing our journey through some advanced topics in ARM: mining frequent patterns without candidate generation; multiple-level AR; sequential pattern mining; quantitative association rules; mining class association rules; beyond support & confidence; applications.
  • 5. Introduction to Seq. AR. So far, we have seen Apriori, FP-growth, and mining multiple-level AR. But none of them considers the order of transactions. Is the sequence important? Which came first, the hen or the egg? Sometimes it is really important: for example, when analyzing the sequence of items bought by a customer, or in web usage mining, which searches for navigational patterns of users.
  • 6. An Example in Web Usage Mining. Web sequence: < {Homepage} {Electronics} {Computers} {Laptops} {Sony Vaio} {Order Confirmation} {Return to Shopping} >
  • 7. Definition. Defining the problem: Let I = {i1, i2, …, im} be a set of items. Sequence: an ordered list of itemsets. Itemset/element: a non-empty set of items X ⊆ I. We denote a sequence s by <a1a2…ar>, where ai is an itemset, which is also called an element of s. An element (or an itemset) of a sequence is denoted by {x1, x2, …, xk}, where xj ∈ I is an item. We assume without loss of generality that items in an element of a sequence are in lexicographic order.
  • 8. Definition. Defining the problem: Size: the size of a sequence is the number of elements (or itemsets) in the sequence. Length: the length of a sequence is the number of items in the sequence. A sequence of length k is called a k-sequence. A sequence s1 = 〈a1a2…ar〉 is a subsequence of another sequence s2 = 〈b1b2…bv〉, or s2 is a supersequence of s1, if there exist integers 1 ≤ j1 < j2 < … < jr−1 < jr ≤ v such that a1 ⊆ bj1, a2 ⊆ bj2, …, ar ⊆ bjr. We also say that s2 contains s1.
  • 9. Example. Let I = {1, 2, 3, 4, 5, 6, 7, 8, 9}. Sequence 〈{3}{4, 5}{8}〉 is contained in (or is a subsequence of) 〈{6}{3, 7}{9}{4, 5, 8}{3, 8}〉 because {3} ⊆ {3, 7}, {4, 5} ⊆ {4, 5, 8}, and {8} ⊆ {3, 8}. However, 〈{3}{8}〉 is not contained in 〈{3, 8}〉 or vice versa. The size of the sequence 〈{3}{4, 5}{8}〉 is 3, and the length of the sequence is 4.
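The containment test from the definition can be sketched in Python as follows. This is a minimal illustration, not code from the lecture: the names `is_subsequence`, `size`, and `length` are assumptions, and a sequence is modelled as a list of frozensets. A greedy left-to-right scan is enough here, because matching each element at the earliest possible position never rules out a later match.

```python
def is_subsequence(s1, s2):
    """Return True if s1 is contained in s2 (s2 is a supersequence of s1)."""
    j = 0
    for a in s1:
        # Advance until some element b_j of s2 is a superset of a.
        while j < len(s2) and not a <= s2[j]:
            j += 1
        if j == len(s2):
            return False
        j += 1  # the next element of s1 must match strictly later in s2
    return True


def size(s):
    """Number of elements (itemsets) in the sequence."""
    return len(s)


def length(s):
    """Number of items in the sequence."""
    return sum(len(element) for element in s)


# The sequences from the example slide.
s1 = [frozenset({3}), frozenset({4, 5}), frozenset({8})]
s2 = [frozenset({6}), frozenset({3, 7}), frozenset({9}),
      frozenset({4, 5, 8}), frozenset({3, 8})]

print(is_subsequence(s1, s2))
print(size(s1), length(s1))
```

Note that 〈{3}{8}〉 and 〈{3, 8}〉 do not contain each other under this test: the first needs two distinct positions, the second needs a single element holding both items.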
  • 10. Objective. Objective of sequential pattern mining (SPM). Input: a set S of input data sequences (or sequence database). Goal: the problem of mining sequential patterns is to find all the sequences that have a user-specified minimum support. Each such sequence is called a frequent sequence, or a sequential pattern. The support for a sequence is the fraction of total data sequences in S that contain this sequence.
  • 11. Example. Transactions:
      Customer ID | Transaction time | Transaction (items bought)
      1           | July 20, 2005    | 30
      1           | July 25, 2005    | 90
      2           | July 9, 2005     | 10, 20
      2           | July 14, 2005    | 30
      2           | July 20, 2005    | 40, 60, 70
      3           | July 25, 2005    | 30, 50, 70
      4           | July 25, 2005    | 30
      4           | July 29, 2005    | 40, 70
      4           | August 2, 2005   | 90
      5           | July 12, 2005    | 90
    Customer sequences:
      Customer ID | Customer sequence
      1           | <(30) (90)>
      2           | <(10 20) (30) (40 60 70)>
      3           | <(30 50 70)>
      4           | <(30) (40 70) (90)>
      5           | <(90)>
    Sequential patterns with support > 25%:
      1-sequences: <(30)>, <(40)>, <(70)>, <(90)>
      2-sequences: <(30)(40)>, <(30)(70)>, <(30)(90)>, <(40 70)>
      3-sequences: <(30)(40 70)>
    Example borrowed from Bing Liu.
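The example can be reproduced with a short Python sketch: group the transactions by customer, order them by time to obtain one sequence per customer, and count the fraction of customer sequences that contain a pattern. The helper names (`build_sequences`, `contains`, `support`) are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date

# (customer ID, transaction time, items bought), copied from the example table.
transactions = [
    (1, date(2005, 7, 20), {30}),
    (1, date(2005, 7, 25), {90}),
    (2, date(2005, 7, 9), {10, 20}),
    (2, date(2005, 7, 14), {30}),
    (2, date(2005, 7, 20), {40, 60, 70}),
    (3, date(2005, 7, 25), {30, 50, 70}),
    (4, date(2005, 7, 25), {30}),
    (4, date(2005, 7, 29), {40, 70}),
    (4, date(2005, 8, 2), {90}),
    (5, date(2005, 7, 12), {90}),
]


def build_sequences(rows):
    """One time-ordered list of itemsets per customer."""
    by_customer = defaultdict(list)
    for customer, _when, items in sorted(rows, key=lambda r: (r[0], r[1])):
        by_customer[customer].append(frozenset(items))
    return dict(by_customer)


def contains(sequence, pattern):
    """True if pattern is a subsequence of sequence (greedy matching)."""
    j = 0
    for element in pattern:
        while j < len(sequence) and not element <= sequence[j]:
            j += 1
        if j == len(sequence):
            return False
        j += 1
    return True


def support(db, pattern):
    """Fraction of customer sequences that contain the pattern."""
    return sum(contains(seq, pattern) for seq in db.values()) / len(db)


db = build_sequences(transactions)
print(support(db, [{30}]))            # four of the five customers
print(support(db, [{30}, {40, 70}]))  # customers 2 and 4
print(support(db, [{30, 70}]))        # only customer 3 buys both at once
```

Customer 3 buys 30 and 70 in the same transaction, so that customer supports <(30 70)> but not <(30)(70)>, which requires 70 to be bought in a later transaction.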
  • 12. GSP. GSP follows Apriori closely, but for sequential patterns. If a sequence S is not frequent, then none of the supersequences of S is frequent. For instance, if <ab> is infrequent, so are <acb> and <(ca)b>. GSP follows these steps: Initially, every item in the DB is a candidate of length 1. For each level (i.e., sequences of length k): scan the database to collect the support count for each candidate sequence; generate candidate length-(k+1) sequences from length-k frequent sequences using Apriori. Repeat until no frequent sequence or no candidate can be found. Strength: candidate pruning by Apriori.
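The level-wise loop described above can be sketched as follows. This is a simplified illustration in the spirit of GSP, not the GSP algorithm itself: instead of GSP's join of two frequent k-sequences, candidates are generated by extending each frequent k-sequence with one frequent item, either as a new element or inside the last element. The function name `frequent_sequences` and the toy database (the customer sequences from the earlier example) are assumptions.

```python
def contains(sequence, pattern):
    """True if pattern is a subsequence of sequence (greedy matching)."""
    j = 0
    for element in pattern:
        while j < len(sequence) and not element <= sequence[j]:
            j += 1
        if j == len(sequence):
            return False
        j += 1
    return True


def frequent_sequences(db, min_support):
    """Level-wise search: level k holds the frequent sequences of length k."""
    def support(pattern):
        return sum(contains(seq, pattern) for seq in db) / len(db)

    items = sorted({x for seq in db for element in seq for x in element})
    frequent_items = [x for x in items
                      if support((frozenset([x]),)) >= min_support]
    level = [(frozenset([x]),) for x in frequent_items]
    result, k = {}, 1
    while level:
        result[k] = level
        candidates = []
        for pattern in level:
            last = pattern[-1]
            for x in frequent_items:
                # Extension 1: x starts a new element at the end.
                candidates.append(pattern + (frozenset([x]),))
                # Extension 2: x joins the last element (kept in item order
                # so that each candidate is generated exactly once).
                if x > max(last):
                    candidates.append(pattern[:-1] + (last | {x},))
        # One pass over the database per level to keep frequent candidates.
        level = [c for c in candidates if support(c) >= min_support]
        k += 1
    return result


db = [
    (frozenset({30}), frozenset({90})),
    (frozenset({10, 20}), frozenset({30}), frozenset({40, 60, 70})),
    (frozenset({30, 50, 70}),),
    (frozenset({30}), frozenset({40, 70}), frozenset({90})),
    (frozenset({90}),),
]

patterns = frequent_sequences(db, min_support=0.25)
for k, found in patterns.items():
    print(k, [[sorted(e) for e in p] for p in found])
```

Because every frequent (k+1)-sequence has a frequent k-sequence obtained by dropping its last item, extending only frequent sequences loses no pattern; this is the same anti-monotonicity argument that justifies Apriori pruning.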
  • 13. The Algorithm. Does this remind you of Apriori?
  • 14. Quantitative AR.
      Transaction ID | Age | Married | NumCars
      1              | 23  | No      | 1
      2              | 25  | Yes     | 1
      3              | 29  | No      | 0
      4              | 34  | Yes     | 2
      5              | 38  | Yes     | 2
    <Age: 30..39> and <Married: Yes> ⇒ <NumCars: 2>, Support = 40%, Conf = 100%. How can we deal with these data?
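The support and confidence quoted for this rule can be checked directly on the five records. The snippet below is only an illustrative sketch; the dictionary layout and the helper names `antecedent` and `consequent` are assumptions.

```python
# The five records from the table.
records = [
    {"id": 1, "Age": 23, "Married": "No", "NumCars": 1},
    {"id": 2, "Age": 25, "Married": "Yes", "NumCars": 1},
    {"id": 3, "Age": 29, "Married": "No", "NumCars": 0},
    {"id": 4, "Age": 34, "Married": "Yes", "NumCars": 2},
    {"id": 5, "Age": 38, "Married": "Yes", "NumCars": 2},
]


def antecedent(r):
    """<Age: 30..39> and <Married: Yes>"""
    return 30 <= r["Age"] <= 39 and r["Married"] == "Yes"


def consequent(r):
    """<NumCars: 2>"""
    return r["NumCars"] == 2


n_both = sum(antecedent(r) and consequent(r) for r in records)
n_antecedent = sum(antecedent(r) for r in records)

support = n_both / len(records)     # records matching the whole rule
confidence = n_both / n_antecedent  # of those matching the antecedent
print(f"support={support:.0%}, confidence={confidence:.0%}")
```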
  • 15. Map to Boolean Values.
      Record ID | Age [20..29] | Age [30..39] | Married: Yes | Married: No | NumCars: 0 | NumCars: 1
      100       | 1            | 0            | 0            | 1           | 0          | 1
      200       | 1            | 0            | 1            | 0           | 0          | 1
      300       | 1            | 0            | 0            | 1           | 1          | 0
      400       | 0            | 1            | 1            | 0           | 0          | 0
      500       | 0            | 1            | 1            | 0           | 0          | 0
    Now, use any system for mining boolean AR: Apriori, FP-growth.
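A minimal sketch of this mapping in Python: each record becomes a set of boolean items, one per attribute value or interval, which any boolean association rule miner can then consume. The item naming scheme and the `to_items` helper are assumptions. Unlike the table above, the sketch also emits a `NumCars:2` item, since the rule on the previous slide needs it.

```python
AGE_BINS = [(20, 29), (30, 39)]  # the intervals used in the table

# (record ID, Age, Married, NumCars)
records = [
    (100, 23, "No", 1),
    (200, 25, "Yes", 1),
    (300, 29, "No", 0),
    (400, 34, "Yes", 2),
    (500, 38, "Yes", 2),
]


def to_items(age, married, num_cars):
    """Map one quantitative/categorical record to a set of boolean items."""
    items = {f"Married:{married}", f"NumCars:{num_cars}"}
    for lo, hi in AGE_BINS:
        if lo <= age <= hi:
            items.add(f"Age:{lo}..{hi}")
    return frozenset(items)


baskets = {rid: to_items(age, married, cars)
           for rid, age, married, cars in records}


def support(itemset):
    """Fraction of records whose boolean items include the whole itemset."""
    return sum(itemset <= basket for basket in baskets.values()) / len(baskets)


print(sorted(baskets[100]))
print(support({"Age:30..39", "Married:Yes", "NumCars:2"}))
```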
  • 16. Problems with this Approach. MinSup: if the number of intervals is large, the support of a single interval can be low. MinConf: information is lost when partitioning values into intervals, so confidence can be lower as the number of intervals gets smaller. Example: in the partition used, <NumCars:0> ⇒ <Married:No> has c=100%. But now assume that, in the partition, NumCars:0 and NumCars:1 go to the same interval: <NumCars:0,1> ⇒ <Married:No> has c=66.67%.
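The drop in confidence can be verified on the five records of the quantitative example. This is an illustrative sketch; the `confidence` helper and its interval argument are assumptions.

```python
# (Age, Married, NumCars) for the five records of the quantitative example.
rows = [
    (23, "No", 1),
    (25, "Yes", 1),
    (29, "No", 0),
    (34, "Yes", 2),
    (38, "Yes", 2),
]


def confidence(rows, cars_interval, married):
    """Confidence of <NumCars in interval> => <Married: married>."""
    lo, hi = cars_interval
    matching = [m for _age, m, cars in rows if lo <= cars <= hi]
    return sum(m == married for m in matching) / len(matching)


fine = confidence(rows, (0, 0), "No")    # NumCars:0 kept as its own interval
coarse = confidence(rows, (0, 1), "No")  # NumCars:0 and NumCars:1 merged
print(f"{fine:.2%} vs {coarse:.2%}")
```

With the merged interval, the antecedent also covers record 2 (married, one car), so only two of the three matching records satisfy the consequent.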
  • 17. Problems with this Approach. How can we solve this problem? Increase the number of intervals (to reduce information loss) while combining adjacent ones (to increase support). ExecTime: blows up as the number of items per record increases. ManyRules: the number of rules also blows up. Many of them will not be interesting.
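The idea of starting from fine intervals and combining adjacent ones can be sketched as below. The base intervals are hypothetical (they are not given in the lecture), and `merged_intervals` is an illustrative name; the ages are those of the quantitative example. The sketch also hints at why the number of items grows: n base intervals yield up to n(n+1)/2 merged ranges.

```python
ages = [23, 25, 29, 34, 38]

# Hypothetical fine-grained base intervals (an assumption for this sketch).
base = [(20, 24), (25, 29), (30, 34), (35, 39)]


def merged_intervals(values, base, min_support):
    """Every range of adjacent base intervals with enough support."""
    n = len(values)
    kept = {}
    for i in range(len(base)):
        for j in range(i, len(base)):
            lo, hi = base[i][0], base[j][1]
            sup = sum(lo <= v <= hi for v in values) / n
            if sup >= min_support:
                kept[(lo, hi)] = sup
    return kept


result = merged_intervals(ages, base, min_support=0.4)
for interval, sup in result.items():
    print(interval, sup)
```

Here the interval 20..24 alone falls below the threshold, but the merged interval 20..29 passes it.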
  • 18. Second Approach. Other solutions? Well, the problem was that the intervals were not the best ones. Let’s try to create the best intervals for our data. How? Discretizing/clustering techniques: apply a discretizing/clustering technique to find the best partitions, then employ those partitions. We’ll see how clustering techniques work in the next class. So, keep this in mind and put the pieces together next class!
  • 19. Third Approach. And what if we do not map the input to a boolean space? Create interval-based association rules directly. So, decide the best interval and then count the support. Usually, these approaches do not provide all the association rules, only the ones with larger support and confidence. Fuzzy logic can also be applied here. But again, we’ll see GFS in two or three lectures.
  • 20. Mining Class Association Rules. So far, we have seen ARM without any specific target. It finds all possible rules that exist in data, i.e., any item can appear as a consequent or a condition of a rule. However, what if we are interested in some specific targets? E.g., the user has a set of text documents from some known topics and wants to find out what words are associated or correlated with each topic. So, now, we want to find X ⇒ y, where X ⊆ I and y ∈ Y (the set of target classes). The algorithms are very similar to those of ARM. We are not going to see them in class, but you have information on the estudy.
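A brute-force sketch of the idea, using the text-document scenario: rules are restricted to the form X ⇒ y, where X is a set of words and y is a topic. The documents below are made up for illustration, the names `class_rules` and `max_len` are assumptions, and the exhaustive enumeration stands in for the Apriori-style algorithms the slide refers to.

```python
from collections import Counter
from itertools import combinations

# Made-up toy corpus: (words in the document, known topic).
docs = [
    ({"goal", "team", "match"}, "sports"),
    ({"team", "coach"}, "sports"),
    ({"vote", "party"}, "politics"),
    ({"vote", "team"}, "politics"),
]


def class_rules(data, min_support, min_conf, max_len=2):
    """Enumerate rules X => y with X a word set and y a class label."""
    n = len(data)
    condition_count = Counter()  # documents containing X
    rule_count = Counter()       # documents containing X with class y
    for words, label in data:
        for k in range(1, max_len + 1):
            for x in combinations(sorted(words), k):
                condition_count[x] += 1
                rule_count[(x, label)] += 1
    rules = []
    for (x, y), count in rule_count.items():
        support = count / n
        conf = count / condition_count[x]
        if support >= min_support and conf >= min_conf:
            rules.append((x, y, support, conf))
    return rules


rules = class_rules(docs, min_support=0.5, min_conf=1.0)
print(rules)
```

On this toy corpus, "team" appears in both topics, so team ⇒ sports fails the confidence threshold, while vote ⇒ politics passes both thresholds.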
  • 21. Beyond Support and Confidence. Support and confidence are the basic measures of interestingness. But many more have been proposed during the last few years.
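The slide does not name any of these additional measures. As one well-known example (chosen here, not taken from the lecture), lift compares the support of a rule with what would be expected if antecedent and consequent were independent: lift(X ⇒ Y) = support(X ∪ Y) / (support(X) · support(Y)). A sketch on the five records of the quantitative example, for the rule <Married: Yes> ⇒ <NumCars: 2>:

```python
# (Age, Married, NumCars) for the five records of the quantitative example.
rows = [
    (23, "No", 1),
    (25, "Yes", 1),
    (29, "No", 0),
    (34, "Yes", 2),
    (38, "Yes", 2),
]


def support(predicate):
    """Fraction of records satisfying the predicate."""
    return sum(predicate(r) for r in rows) / len(rows)


def is_married(r):
    return r[1] == "Yes"


def has_two_cars(r):
    return r[2] == 2


sup_x = support(is_married)
sup_y = support(has_two_cars)
sup_xy = support(lambda r: is_married(r) and has_two_cars(r))

confidence = sup_xy / sup_x
lift = sup_xy / (sup_x * sup_y)  # > 1 suggests a positive association
print(f"confidence={confidence:.2f}, lift={lift:.2f}")
```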
  • 22. Some Applications. Wal-Mart has used the technique for years to mine POS data and arrange their stores to maximize sales from such analysis. Medical databases, to discover commonly occurring diseases amongst groups of people. Lottery results databases, to discover those lucky combinations of numbers.
  • 23. Some Applications. Power System Restoration (PSR): PSR is a multi-objective, multi-period, nonlinear, mixed-integer optimization problem with various constraints and unforeseeable factors. Discovering associations that help build heuristics for PSR. Actions in a PSR: start_black_start_unit(x), energize_line(x), pick_up_load(x), synchronize(x,y), connect_tie_line(x), crank_unit(x), energize_busbar(x).
  • 24. Some Applications. Correlations with color, spatial relationships, etc. From coarse- to fine-resolution mining.
  • 25. Next Class. Clustering.
  • 26. Introduction to Machine Learning, Lecture 16: Advanced Topics in Association Rules Mining. Albert Orriols i Puig, http://www.albertorriols.net, aorriols@salle.url.edu. Artificial Intelligence – Machine Learning, Enginyeria i Arquitectura La Salle, Universitat Ramon Llull