SlideShare una empresa de Scribd logo
1 de 49
Descargar para leer sin conexión
Mining Adaptively Frequent Closed Unlabeled
       Rooted Trees in Data Streams

            Albert Bifet and Ricard Gavaldà

               Universitat Politècnica de Catalunya


14th ACM SIGKDD International Conference on Knowledge
         Discovery and Data Mining (KDD’08)
                2008 Las Vegas, USA
Tree Mining
                                Mining frequent trees is
                                becoming an important
                                task
                                Applications:
                                    chemical informatics
                                    computer vision
                                    text retrieval
                                    bioinformatics
Data Streams
                                    Web analysis.
   Sequence is potentially
                                Many link-based
   infinite
                                structures may be
   High amount of data:         studied formally by
   sublinear space              means of unordered
   High speed of arrival:       trees
   sublinear time per
   example
Introduction: Data Streams

   Data Streams
      Sequence is potentially infinite
      High amount of data: sublinear space
      High speed of arrival: sublinear time per example
      Once an element from a data stream has been processed
      it is discarded or archived

  Example
  Puzzle: Finding Missing Numbers
      Let π be a permutation of {1, . . . , n}.
      Let π−1 be π with one element
      missing.
      π−1 [i] arrives in increasing order
  Task: Determine the missing number
Introduction: Data Streams

   Data Streams
      Sequence is potentially infinite
      High amount of data: sublinear space
      High speed of arrival: sublinear time per example
      Once an element from a data stream has been processed
      it is discarded or archived

  Example
  Puzzle: Finding Missing Numbers                  Use a n-bit
      Let π be a permutation of {1, . . . , n}.   vector to
      Let π−1 be π with one element               memorize all the
      missing.                                    numbers (O(n)
                                                  space)
      π−1 [i] arrives in increasing order
  Task: Determine the missing number
Introduction: Data Streams

   Data Streams
      Sequence is potentially infinite
      High amount of data: sublinear space
      High speed of arrival: sublinear time per example
      Once an element from a data stream has been processed
      it is discarded or archived

  Example
  Puzzle: Finding Missing Numbers
      Let π be a permutation of {1, . . . , n}.   Data Streams:
      Let π−1 be π with one element               O(log(n)) space.
      missing.
      π−1 [i] arrives in increasing order
  Task: Determine the missing number
Introduction: Data Streams

   Data Streams
      Sequence is potentially infinite
      High amount of data: sublinear space
      High speed of arrival: sublinear time per example
      Once an element from a data stream has been processed
      it is discarded or archived

  Example                                         Data Streams:
  Puzzle: Finding Missing Numbers                 O(log(n)) space.
      Let π be a permutation of {1, . . . , n}.   Store
      Let π−1 be π with one element               n(n + 1)
      missing.                                             − ∑ π−1 [j].
                                                     2       j≤i
      π−1 [i] arrives in increasing order
  Task: Determine the missing number
Introduction: Trees


  Our trees are:                   Our subtrees are:
      Unlabeled                         Induced
      Ordered and Unordered
                    Two different ordered trees
                   but the same unordered tree
Introduction


      Induced subtrees: obtained by repeatedly removing leaf
      nodes




      Embedded subtrees: obtained by contracting some of the
      edges
Introduction

   What Is Tree Pattern Mining?

   Given a dataset of trees, find the complete set of frequent
   subtrees
       Frequent Tree Pattern (FS):
            Include all the trees whose support is no less than min_sup
       Closed Frequent Tree Pattern (CS):
            Include no tree which has a super-tree with the same
            support
       CS ⊆ FS
       Closed Frequent Tree Mining provides a compact
       representation of frequent trees without loss of information
Introduction

   Unordered Subtree Mining
    A:               B:                   X:        Y:




                          D = {A, B}, min_sup = 2

                           # Closed Subtrees : 2
                          # Frequent Subtrees: 9

         Closed Subtrees: X, Y


         Frequent Subtrees:
Introduction


   Problem
   Given a data stream D of rooted, unlabelled and unordered
   trees, find frequent closed trees.


                                    We provide three algorithms,
                                    of increasing power
                                        Incremental
                                        Sliding Window
                                        Adaptive

               D
Outline


   1   Introduction


   2   Data Streams


   3   ADWIN : Concept Drift Mining


   4   Adaptive Closed Frequent Tree Mining


   5   Summary
Data Streams



  Data Streams
  At any time t in the data stream, we would like the per-item
  processing time and storage to be simultaneously
  O(log k (N, t)).

  Approximation algorithms
      Small error rate with high probability
                                                        ˜
      An algorithm (ε, δ )−approximates F if it outputs F for
                 ˜
      which Pr[|F − F | > εF ] < δ .
Data Streams Approximation Algorithms



            1011000111 1010101

  Sliding Window
  We can maintain simple statistics over sliding windows, using
  O( 1 log2 N) space, where
     ε
      N is the length of the sliding window
      ε is the accuracy parameter

      M. Datar, A. Gionis, P. Indyk, and R. Motwani.
      Maintaining stream statistics over sliding windows. 2002
Data Streams Approximation Algorithms



            10110001111 0101011

  Sliding Window
  We can maintain simple statistics over sliding windows, using
  O( 1 log2 N) space, where
     ε
      N is the length of the sliding window
      ε is the accuracy parameter

      M. Datar, A. Gionis, P. Indyk, and R. Motwani.
      Maintaining stream statistics over sliding windows. 2002
Data Streams Approximation Algorithms



            101100011110 1010111

  Sliding Window
  We can maintain simple statistics over sliding windows, using
  O( 1 log2 N) space, where
     ε
      N is the length of the sliding window
      ε is the accuracy parameter

      M. Datar, A. Gionis, P. Indyk, and R. Motwani.
      Maintaining stream statistics over sliding windows. 2002
Data Streams Approximation Algorithms



            1011000111101 0101110

  Sliding Window
  We can maintain simple statistics over sliding windows, using
  O( 1 log2 N) space, where
     ε
      N is the length of the sliding window
      ε is the accuracy parameter

      M. Datar, A. Gionis, P. Indyk, and R. Motwani.
      Maintaining stream statistics over sliding windows. 2002
Data Streams Approximation Algorithms



            10110001111010 1011101

  Sliding Window
  We can maintain simple statistics over sliding windows, using
  O( 1 log2 N) space, where
     ε
      N is the length of the sliding window
      ε is the accuracy parameter

      M. Datar, A. Gionis, P. Indyk, and R. Motwani.
      Maintaining stream statistics over sliding windows. 2002
Data Streams Approximation Algorithms



            101100011110101 0111010

  Sliding Window
  We can maintain simple statistics over sliding windows, using
  O( 1 log2 N) space, where
     ε
      N is the length of the sliding window
      ε is the accuracy parameter

      M. Datar, A. Gionis, P. Indyk, and R. Motwani.
      Maintaining stream statistics over sliding windows. 2002
Outline


   1   Introduction


   2   Data Streams


   3   ADWIN : Concept Drift Mining


   4   Adaptive Closed Frequent Tree Mining


   5   Summary
ADWIN: Adaptive sliding window

  ADWIN
  An adaptive sliding window whose size is recomputed online
  according to the rate of change observed.

  ADWIN has rigorous guarantees (theorems)
      On ratio of false positives and negatives
      On the relation of the size of the current window and
      change rates

  ADWIN using a Data Stream Sliding Window Model,
      can provide the exact counts of 1’s in O(1) time per point.
      tries O(log W ) cutpoints
      uses O( 1 log W ) memory words
              ε
      the processing time per example is O(log W ) (amortized
      and worst-case).
Time Change Detectors and Predictors: A General
Framework


                                          Estimation
                                          -
   xt
        -
            Estimator
Time Change Detectors and Predictors: A General
Framework


                                             Estimation
                                             -
   xt
        -
            Estimator                        Alarm
                        -                    -
                            Change Detect.
Time Change Detectors and Predictors: A General
Framework


                                                   Estimation
                                                   -
   xt
        -
            Estimator                              Alarm
                              -                    -
                                  Change Detect.
                 6
                                    6
                                    ?
            -
                     Memory
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           1 01010110111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           10 1010110111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           101 010110111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           1010 10110111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           10101 0110111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           101010 110111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           1010101 10111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           10101011 0111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           101010110 111111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           1010101101 11111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           10101011011 1111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           101010110111 111
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           1010101101111 11
[Dasu+ 06]
Window Management Models


                      W = 101010110111111

Equal & fixed size              Total window against
subwindows                     subwindow
      1010 1011011 1111               10101011011 1111
[Kifer+ 04]                    [Gama+ 04]

Equal size adjacent            ADWIN: All Adjacent
subwindows                     subwindows
     1010101 1011     1111           10101011011111 1
[Dasu+ 06]                     11
Outline


   1   Introduction


   2   Data Streams


   3   ADWIN : Concept Drift Mining


   4   Adaptive Closed Frequent Tree Mining


   5   Summary
Pattern Relaxed Support


     Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng,
     Yunfeng Liu and Kunqing Xie.
     CLAIM: An Efficient Method for Relaxed Frequent Closed
     Itemsets Mining over Stream Data
      Linear Relaxed Interval:The support space of all
      subpatterns can be divided into n = 1/εr intervals, where
      εr is a user-specified relaxed factor, and each interval can
      be denoted by Ii = [li , ui ), where li = (n − i) ∗ εr ≥ 0,
      ui = (n − i + 1) ∗ εr ≤ 1 and i ≤ n.
      Linear Relaxed closed subpattern t: if and only if there
      exists no proper superpattern t of t such that their suports
      belong to the same interval Ii .
Pattern Relaxed Support



  As the number of closed frequent patterns is not linear with
  respect support, we introduce a new relaxed support:
      Logarithmic Relaxed Interval:The support space of all
      subpatterns can be divided into n = 1/εr intervals, where
      εr is a user-specified relaxed factor, and each interval can
      be denoted by Ii = [li , ui ), where li = c i , ui = c i+1 − 1
      and i ≤ n.
      Logarithmic Relaxed closed subpattern t: if and only if
      there exists no proper superpattern t of t such that their
      suports belong to the same interval Ii .
Galois Lattice of closed set of trees




                           1            2          3




             D
  We need                  12                 13   23
      a Galois
      connection pair
      a closure operator


                                        123
Incremental mining on closed frequent trees

    1   Adding a tree
        transaction, does
        not decrease the
        number of closed
        trees for D.         1       2         3
    2   Adding a
        transaction with a
        closed tree, does
        not modify the
        number of closed     12           13   23
        trees for D.




                                    123
Sliding Window mining on closed frequent trees

    1   Deleting a tree
        transaction, does
        not increase the
        number of closed
        trees for D.          1     2            3
    2   Deleting a tree
        transaction that is
        repeated, does not
        modify the number
        of closed trees for   12         13      23
        D.




                                   123
Algorithms

  Algorithms
      Incremental: I NC T REE N AT
      Sliding Window: W IN T REE N AT
      Adaptive: A DAT REE N AT Uses ADWIN to monitor change

  ADWIN
  An adaptive sliding window whose size is recomputed online
  according to the rate of change observed.

  ADWIN has rigorous guarantees (theorems)
      On ratio of false positives and negatives
      On the relation of the size of the current window and
      change rates
Experimental Validation: TN1



                                         CMTreeMiner
            300

      Time 200
     (sec.)
            100
                                                  I NC T REE N AT

                          2        4          6         8
                               Size (Milions)

     Figure: Time on experiments on ordered trees on TN1 dataset
Experimental Validation


                                       45




                                       35
              Number of Closed Trees




                                       25                                                                             AdaTreeInc 1
                                                                                                                      AdaTreeInc 2



                                       15




                                       5
                                            0   21.460 42.920 64.380 85.840 107.300 128.760 150.220 171.680 193.140
                                                                       Number of Samples




   Figure: Number of closed trees maintaining the same number of
   closed datasets on input data
Outline


   1   Introduction


   2   Data Streams


   3   ADWIN : Concept Drift Mining


   4   Adaptive Closed Frequent Tree Mining


   5   Summary
Summary



  Conclusions
      New logarithmic relaxed closed support
      Using Galois Latice Theory, we present methods for mining
      closed trees
          Incremental: I NC T REE N AT
          Sliding Window: W IN T REE N AT
          Adaptive: A DAT REE N AT using ADWIN to monitor change

  Future Work
  Labeled Trees and XML data.

Más contenido relacionado

La actualidad más candente

Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Pythonindico data
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learnJimmy Lai
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through ExamplesSri Ambati
 
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...Simplilearn
 
[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codesNAVER D2
 
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...Masumi Shirakawa
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the WeightsMark Chang
 
Scalable membership management
Scalable membership management Scalable membership management
Scalable membership management Vinay Setty
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsSri Ambati
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowNicholas McClure
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Cloudera, Inc.
 
Latent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkLatent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkSandy Ryza
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Oswald Campesato
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataTravis Oliphant
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...MLconf
 
Data Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnData Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnAsim Jalis
 

La actualidad más candente (20)

Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Python
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
What is TensorFlow? | Introduction to TensorFlow | TensorFlow Tutorial For Be...
 
[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codes
 
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
N-gram IDF: A Global Term Weighting Scheme Based on Information Distance (WWW...
 
Codes and Isogenies
Codes and IsogeniesCodes and Isogenies
Codes and Isogenies
 
Information in the Weights
Information in the WeightsInformation in the Weights
Information in the Weights
 
Scalable membership management
Scalable membership management Scalable membership management
Scalable membership management
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Alex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning ApplicationsAlex Tellez, Deep Learning Applications
Alex Tellez, Deep Learning Applications
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in Tensorflow
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
Hadoop Summit 2012 | Bayesian Counters AKA In Memory Data Mining for Large Da...
 
Latent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkLatent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with Spark
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Data Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learnData Science and Machine Learning Using Python and Scikit-learn
Data Science and Machine Learning Using Python and Scikit-learn
 
Bayesian Counters
Bayesian CountersBayesian Counters
Bayesian Counters
 

Destacado

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesAlbert Bifet
 
Postgraduate Studies: Graduate School Experience
Postgraduate Studies: Graduate School ExperiencePostgraduate Studies: Graduate School Experience
Postgraduate Studies: Graduate School ExperienceLighton Phiri
 
@Travis pm. presents 'what is klout?'
@Travis pm. presents 'what is klout?'@Travis pm. presents 'what is klout?'
@Travis pm. presents 'what is klout?'Travis P. Mulenga
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data ScienceAlbert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAlbert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online AnalysisAlbert Bifet
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streamsAlbert Bifet
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsAlbert Bifet
 
Understanding the Effects of Streamlining the Orchestration of Learning Activ...
Understanding the Effects of Streamlining the Orchestration of Learning Activ...Understanding the Effects of Streamlining the Orchestration of Learning Activ...
Understanding the Effects of Streamlining the Orchestration of Learning Activ...Lighton Phiri
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsAlbert Bifet
 
Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...
Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...
Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...Lighton Phiri
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsArinto Murdopo
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARNArinto Murdopo
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsAlbert Bifet
 

Destacado (20)

Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed Trees
 
Postgraduate Studies: Graduate School Experience
Postgraduate Studies: Graduate School ExperiencePostgraduate Studies: Graduate School Experience
Postgraduate Studies: Graduate School Experience
 
@Travis pm. presents 'what is klout?'
@Travis pm. presents 'what is klout?'@Travis pm. presents 'what is klout?'
@Travis pm. presents 'what is klout?'
 
Introduction to Big Data Science
Introduction to Big Data ScienceIntroduction to Big Data Science
Introduction to Big Data Science
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
Understanding the Effects of Streamlining the Orchestration of Learning Activ...
Understanding the Effects of Streamlining the Orchestration of Learning Activ...Understanding the Effects of Streamlining the Orchestration of Learning Activ...
Understanding the Effects of Streamlining the Orchestration of Learning Activ...
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data Streams
 
Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...
Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...
Ad hoc vs. organised orchestration: A comparative analysis of technology-driv...
 
Distributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data StreamsDistributed Decision Tree Learning for Mining Big Data Streams
Distributed Decision Tree Learning for Mining Big Data Streams
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
High Availability in YARN
High Availability in YARNHigh Availability in YARN
High Availability in YARN
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data Streams
 
Data warehouse
Data warehouseData warehouse
Data warehouse
 

Similar a Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams

Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksSang Jun Lee
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Simplilearn
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-iKrish_ver2
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Oswald Campesato
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Publishing consuming Linked Sensor Data meetup Cuenca
Publishing consuming Linked Sensor Data meetup CuencaPublishing consuming Linked Sensor Data meetup Cuenca
Publishing consuming Linked Sensor Data meetup CuencaJean-Paul Calbimonte
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...Simplilearn
 
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban DonatoEsteban Donato
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
19 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp0219 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp02Muhammad Aslam
 
Spectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG dataSpectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG dataRobert Oostenveld
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache CassandraEric Evans
 
End of Sprint 5
End of Sprint 5End of Sprint 5
End of Sprint 5dm_work
 
EOS5 Demo
EOS5 DemoEOS5 Demo
EOS5 Demodm_work
 
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...Koh Takeuchi
 
14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.ppt14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.pptManiMaran230751
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache CassandraEric Evans
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Claudio Greco
 

Similar a Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams (20)

18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
5.3 dyn algo-i
5.3 dyn algo-i5.3 dyn algo-i
5.3 dyn algo-i
 
Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)Java and Deep Learning (Introduction)
Java and Deep Learning (Introduction)
 
5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Publishing consuming Linked Sensor Data meetup Cuenca
Publishing consuming Linked Sensor Data meetup CuencaPublishing consuming Linked Sensor Data meetup Cuenca
Publishing consuming Linked Sensor Data meetup Cuenca
 
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
TensorFlow Tutorial | Deep Learning With TensorFlow | TensorFlow Tutorial For...
 
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams   Esteban DonatoEvaluating Classification Algorithms Applied To Data Streams   Esteban Donato
Evaluating Classification Algorithms Applied To Data Streams Esteban Donato
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
19 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp0219 algorithms-and-complexity-110627100203-phpapp02
19 algorithms-and-complexity-110627100203-phpapp02
 
Defense
DefenseDefense
Defense
 
Spectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG dataSpectral-, source-, connectivity- and network analysis of EEG and MEG data
Spectral-, source-, connectivity- and network analysis of EEG and MEG data
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
 
End of Sprint 5
End of Sprint 5End of Sprint 5
End of Sprint 5
 
EOS5 Demo
EOS5 DemoEOS5 Demo
EOS5 Demo
 
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
Higher Order Fused Regularization for Supervised Learning with Grouped Parame...
 
14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.ppt14889574 dl ml RNN Deeplearning MMMm.ppt
14889574 dl ml RNN Deeplearning MMMm.ppt
 
Time Series Data with Apache Cassandra
Time Series Data with Apache CassandraTime Series Data with Apache Cassandra
Time Series Data with Apache Cassandra
 
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neur...
 

Más de Albert Bifet

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data ManagementAlbert Bifet
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsAlbert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAlbert Bifet
 

Más de Albert Bifet (11)

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Real Time Big Data Management
Real Time Big Data ManagementReal Time Big Data Management
Real Time Big Data Management
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 

Último

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Último (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams

  • 1. Mining Adaptively Frequent Closed Unlabeled Rooted Trees in Data Streams Albert Bifet and Ricard Gavaldà Universitat Politècnica de Catalunya 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08) 2008 Las Vegas, USA
  • 2. Tree Mining Mining frequent trees is becoming an important task Applications: chemical informatics computer vision text retrieval bioinformatics Data Streams Web analysis. Sequence is potentially Many link-based infinite structures may be High amount of data: studied formally by sublinear space means of unordered High speed of arrival: trees sublinear time per example
  • 3. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Puzzle: Finding Missing Numbers Let π be a permutation of {1, . . . , n}. Let π−1 be π with one element missing. π−1 [i] arrives in increasing order Task: Determine the missing number
  • 4. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Puzzle: Finding Missing Numbers Use a n-bit Let π be a permutation of {1, . . . , n}. vector to Let π−1 be π with one element memorize all the missing. numbers (O(n) space) π−1 [i] arrives in increasing order Task: Determine the missing number
  • 5. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Puzzle: Finding Missing Numbers Let π be a permutation of {1, . . . , n}. Data Streams: Let π−1 be π with one element O(log(n)) space. missing. π−1 [i] arrives in increasing order Task: Determine the missing number
  • 6. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Data Streams: Puzzle: Finding Missing Numbers O(log(n)) space. Let π be a permutation of {1, . . . , n}. Store Let π−1 be π with one element n(n + 1) missing. − ∑ π−1 [j]. 2 j≤i π−1 [i] arrives in increasing order Task: Determine the missing number
  • 7. Introduction: Trees Our trees are: Our subtrees are: Unlabeled Induced Ordered and Unordered Two different ordered trees but the same unordered tree
  • 8. Introduction Induced subtrees: obtained by repeatedly removing leaf nodes Embedded subtrees: obtained by contracting some of the edges
  • 9. Introduction What Is Tree Pattern Mining? Given a dataset of trees, find the complete set of frequent subtrees Frequent Tree Pattern (FS): Include all the trees whose support is no less than min_sup Closed Frequent Tree Pattern (CS): Include no tree which has a super-tree with the same support CS ⊆ FS Closed Frequent Tree Mining provides a compact representation of frequent trees without loss of information
  • 10. Introduction Unordered Subtree Mining A: B: X: Y: D = {A, B}, min_sup = 2 # Closed Subtrees : 2 # Frequent Subtrees: 9 Closed Subtrees: X, Y Frequent Subtrees:
  • 11. Introduction Problem Given a data stream D of rooted, unlabelled and unordered trees, find frequent closed trees. We provide three algorithms, of increasing power Incremental Sliding Window Adaptive D
  • 12. Outline 1 Introduction 2 Data Streams 3 ADWIN : Concept Drift Mining 4 Adaptive Closed Frequent Tree Mining 5 Summary
  • 13. Data Streams Data Streams At any time t in the data stream, we would like the per-item processing time and storage to be simultaneously O(log k (N, t)). Approximation algorithms Small error rate with high probability ˜ An algorithm (ε, δ )−approximates F if it outputs F for ˜ which Pr[|F − F | > εF ] < δ .
  • 14. Data Streams Approximation Algorithms 1011000111 1010101 Sliding Window We can maintain simple statistics over sliding windows, using O( 1 log2 N) space, where ε N is the length of the sliding window ε is the accuracy parameter M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
  • 15. Data Streams Approximation Algorithms 10110001111 0101011 Sliding Window We can maintain simple statistics over sliding windows, using O( 1 log2 N) space, where ε N is the length of the sliding window ε is the accuracy parameter M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
  • 16. Data Streams Approximation Algorithms 101100011110 1010111 Sliding Window We can maintain simple statistics over sliding windows, using O( 1 log2 N) space, where ε N is the length of the sliding window ε is the accuracy parameter M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
  • 17. Data Streams Approximation Algorithms 1011000111101 0101110 Sliding Window We can maintain simple statistics over sliding windows, using O( 1 log2 N) space, where ε N is the length of the sliding window ε is the accuracy parameter M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
  • 18. Data Streams Approximation Algorithms 10110001111010 1011101 Sliding Window We can maintain simple statistics over sliding windows, using O( 1 log2 N) space, where ε N is the length of the sliding window ε is the accuracy parameter M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
  • 19. Data Streams Approximation Algorithms 101100011110101 0111010 Sliding Window We can maintain simple statistics over sliding windows, using O( 1 log2 N) space, where ε N is the length of the sliding window ε is the accuracy parameter M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
  • 20. Outline 1 Introduction 2 Data Streams 3 ADWIN : Concept Drift Mining 4 Adaptive Closed Frequent Tree Mining 5 Summary
  • 21. ADWIN: Adaptive sliding window ADWIN An adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) On ratio of false positives and negatives On the relation of the size of the current window and change rates ADWIN using a Data Stream Sliding Window Model, can provide the exact counts of 1’s in O(1) time per point. tries O(log W ) cutpoints uses O( 1 log W ) memory words ε the processing time per example is O(log W ) (amortized and worst-case).
  • 22. Time Change Detectors and Predictors: A General Framework Estimation - xt - Estimator
  • 23. Time Change Detectors and Predictors: A General Framework Estimation - xt - Estimator Alarm - - Change Detect.
  • 24. Time Change Detectors and Predictors: A General Framework Estimation - xt - Estimator Alarm - - Change Detect. 6 6 ? - Memory
  • 25. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 1 01010110111111 [Dasu+ 06]
  • 26. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 10 1010110111111 [Dasu+ 06]
  • 27. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 101 010110111111 [Dasu+ 06]
  • 28. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 1010 10110111111 [Dasu+ 06]
  • 29. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 10101 0110111111 [Dasu+ 06]
  • 30. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 101010 110111111 [Dasu+ 06]
  • 31. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 1010101 10111111 [Dasu+ 06]
  • 32. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 10101011 0111111 [Dasu+ 06]
  • 33. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 101010110 111111 [Dasu+ 06]
  • 34. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 1010101101 11111 [Dasu+ 06]
  • 35. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 10101011011 1111 [Dasu+ 06]
  • 36. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 101010110111 111 [Dasu+ 06]
  • 37. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 1010101101111 11 [Dasu+ 06]
  • 38. Window Management Models W = 101010110111111 Equal & fixed size Total window against subwindows subwindow 1010 1011011 1111 10101011011 1111 [Kifer+ 04] [Gama+ 04] Equal size adjacent ADWIN: All Adjacent subwindows subwindows 1010101 1011 1111 10101011011111 1 [Dasu+ 06] 11
  • 39. Outline 1 Introduction 2 Data Streams 3 ADWIN : Concept Drift Mining 4 Adaptive Closed Frequent Tree Mining 5 Summary
  • 40. Pattern Relaxed Support Guojie Song, Dongqing Yang, Bin Cui, Baihua Zheng, Yunfeng Liu and Kunqing Xie. CLAIM: An Efficient Method for Relaxed Frequent Closed Itemsets Mining over Stream Data Linear Relaxed Interval:The support space of all subpatterns can be divided into n = 1/εr intervals, where εr is a user-specified relaxed factor, and each interval can be denoted by Ii = [li , ui ), where li = (n − i) ∗ εr ≥ 0, ui = (n − i + 1) ∗ εr ≤ 1 and i ≤ n. Linear Relaxed closed subpattern t: if and only if there exists no proper superpattern t of t such that their suports belong to the same interval Ii .
  • 41. Pattern Relaxed Support As the number of closed frequent patterns is not linear with respect support, we introduce a new relaxed support: Logarithmic Relaxed Interval:The support space of all subpatterns can be divided into n = 1/εr intervals, where εr is a user-specified relaxed factor, and each interval can be denoted by Ii = [li , ui ), where li = c i , ui = c i+1 − 1 and i ≤ n. Logarithmic Relaxed closed subpattern t: if and only if there exists no proper superpattern t of t such that their suports belong to the same interval Ii .
  • 42. Galois Lattice of closed set of trees 1 2 3 D We need 12 13 23 a Galois connection pair a closure operator 123
  • 43. Incremental mining on closed frequent trees 1 Adding a tree transaction, does not decrease the number of closed trees for D. 1 2 3 2 Adding a transaction with a closed tree, does not modify the number of closed 12 13 23 trees for D. 123
  • 44. Sliding Window mining on closed frequent trees 1 Deleting a tree transaction, does not increase the number of closed trees for D. 1 2 3 2 Deleting a tree transaction that is repeated, does not modify the number of closed trees for 12 13 23 D. 123
  • 45. Algorithms Algorithms Incremental: I NC T REE N AT Sliding Window: W IN T REE N AT Adaptive: A DAT REE N AT Uses ADWIN to monitor change ADWIN An adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) On ratio of false positives and negatives On the relation of the size of the current window and change rates
  • 46. Experimental Validation: TN1 CMTreeMiner 300 Time 200 (sec.) 100 I NC T REE N AT 2 4 6 8 Size (Milions) Figure: Time on experiments on ordered trees on TN1 dataset
  • 47. Experimental Validation 45 35 Number of Closed Trees 25 AdaTreeInc 1 AdaTreeInc 2 15 5 0 21.460 42.920 64.380 85.840 107.300 128.760 150.220 171.680 193.140 Number of Samples Figure: Number of closed trees maintaining the same number of closed datasets on input data
  • 48. Outline 1 Introduction 2 Data Streams 3 ADWIN : Concept Drift Mining 4 Adaptive Closed Frequent Tree Mining 5 Summary
  • 49. Summary Conclusions New logarithmic relaxed closed support Using Galois Latice Theory, we present methods for mining closed trees Incremental: I NC T REE N AT Sliding Window: W IN T REE N AT Adaptive: A DAT REE N AT using ADWIN to monitor change Future Work Labeled Trees and XML data.