A Short Course in Data Stream Mining
Albert Bifet
Shenzhen, 23 January 2015
Huawei Noah’s Ark Lab
Real time analytics
Introduction: Data Streams
Data Streams
Sequence is potentially infinite
High amount of data: sublinear space
High speed of arrival: sublinear time per example
Once an element from a data stream has been processed it is
discarded or archived
Example
Puzzle: Finding Missing Numbers
Let π be a permutation of {1,...,n}.
Let π−1 be π with one element missing.
π−1[i] arrives in increasing order
Task: Determine the missing number
Naive solution: use an n-bit vector to memorize all the numbers (O(n) space)
Data Stream solution: O(log(n)) space.
Store n(n+1)/2 − ∑j≤i π−1[j].
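A minimal sketch of the streaming solution in Python (not part of the original slides): the only state kept is the running sum of the values seen so far.

```python
# Streaming solution to the missing-number puzzle: O(log n) bits of state (the running sum).
def find_missing(stream, n):
    """stream yields the n-1 values of a permutation of {1,...,n} with one value missing."""
    total = n * (n + 1) // 2   # sum of 1..n
    seen = 0                   # running sum, the only state kept
    for x in stream:
        seen += x
    return total - seen

# Example: permutation of {1,...,6} with 4 missing.
print(find_missing(iter([3, 1, 6, 2, 5]), 6))  # -> 4
```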
Data Streams
Data Streams
Sequence is potentially infinite
High amount of data: sublinear space
High speed of arrival: sublinear time per example
Once an element from a data stream has been processed it is
discarded or archived
Tools:
approximation
randomization, sampling
sketching
Data Streams
Approximation algorithms
Small error rate with high probability
An algorithm (ε,δ)-approximates F if it outputs F̃ for which Pr[|F̃ − F| > εF] < δ.
Data Streams Approximation Algorithms
Sliding Window
We can maintain simple statistics over sliding windows, using O((1/ε) log² N) space, where
N is the length of the sliding window
ε is the accuracy parameter
M. Datar, A. Gionis, P. Indyk, and R. Motwani.
Maintaining stream statistics over sliding windows. 2002
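Below is a rough Python sketch in the spirit of the exponential histograms of Datar et al. (an illustrative simplification, not the algorithm from the paper): the number of 1's in the last N bits is approximated with buckets whose sizes are powers of two, keeping at most k buckets per size.

```python
from collections import deque

class DGIMCounter:
    """Approximate count of 1's in the last N bits using power-of-two buckets."""
    def __init__(self, window_size, k=2):
        self.N = window_size
        self.k = k                       # at most k buckets of each size before merging
        self.t = 0                       # current time step
        self.buckets = deque()           # (timestamp of most recent 1, size), newest first

    def add(self, bit):
        self.t += 1
        if self.buckets and self.buckets[-1][0] <= self.t - self.N:
            self.buckets.pop()           # oldest bucket fell out of the window
        if bit == 1:
            self.buckets.appendleft((self.t, 1))
            self._merge()

    def _merge(self):
        size = 1
        while True:
            idx = [i for i, (_, s) in enumerate(self.buckets) if s == size]
            if len(idx) <= self.k:
                break
            i, j = idx[-2], idx[-1]                      # two oldest buckets of this size
            merged = (self.buckets[i][0], 2 * size)      # keep the more recent timestamp
            del self.buckets[j]
            del self.buckets[i]
            self.buckets.insert(i, merged)
            size *= 2

    def estimate(self):
        if not self.buckets:
            return 0
        sizes = [s for _, s in self.buckets]
        return sum(sizes) - sizes[-1] // 2               # oldest bucket counted as half full
```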
Classification
Classification
Definition
Given nC different classes, a classifier algorithm builds a model that
predicts, for every unlabelled instance I, the class C to which it belongs,
with high accuracy.
Example
A spam filter
Example
Twitter Sentiment analysis: analyze tweets with positive or negative
feelings
Data stream classification cycle
1 Process an example at a time, and
inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
Classification
Example: a data set that describes e-mail features for deciding if it is spam.
Contains "Money"  Domain type  Has attach.  Time received  spam
yes               com          yes          night          yes
yes               edu          no           night          yes
no                com          yes          night          yes
no                edu          no           day            no
no                com          no           day            no
yes               cat          no           day            yes
Assume we have to classify the following new instance:
Contains "Money"  Domain type  Has attach.  Time received  spam
yes               edu          yes          day            ?
Bayes Classifiers
Naïve Bayes
Based on Bayes' Theorem:
P(c|d) = P(c) P(d|c) / P(d)
posterior = (prior × likelihood) / evidence
Estimates the probability of observing each attribute a given class c, and the prior
probability P(c)
Probability of class c given an instance d:
P(c|d) = P(c) ∏a∈d P(a|c) / P(d)
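A small Python sketch (not from the slides) of Naïve Bayes on the e-mail table above: the priors P(c) and the conditionals P(a|c) are estimated from counts, with crude add-one smoothing to avoid zero probabilities.

```python
from collections import defaultdict

data = [  # (Contains "Money", Domain, Has attach., Time), spam?
    (("yes", "com", "yes", "night"), "yes"),
    (("yes", "edu", "no",  "night"), "yes"),
    (("no",  "com", "yes", "night"), "yes"),
    (("no",  "edu", "no",  "day"),   "no"),
    (("no",  "com", "no",  "day"),   "no"),
    (("yes", "cat", "no",  "day"),   "yes"),
]

class_count = defaultdict(int)          # N(c)
attr_count = defaultdict(int)           # N(attribute i = value, c)
for x, c in data:
    class_count[c] += 1
    for i, v in enumerate(x):
        attr_count[(i, v, c)] += 1

def score(x, c):
    """P(c) * prod_i P(x_i | c), with crude +1/+2 smoothing regardless of attribute arity."""
    p = class_count[c] / len(data)
    for i, v in enumerate(x):
        p *= (attr_count[(i, v, c)] + 1) / (class_count[c] + 2)
    return p

new = ("yes", "edu", "yes", "day")
print(max(class_count, key=lambda c: score(new, c)))   # -> yes
```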
Bayes Classifiers
Multinomial Naïve Bayes
Considers a document as a bag-of-words.
Estimates the probability of observing word w and the prior probability P(c)
Probability of class c given a test document d:
P(c|d) = P(c) ∏w∈d P(w|c)^nwd / P(d)
where nwd is the number of times w occurs in d
Perceptron
[Figure: single-layer perceptron with inputs Attribute 1–5, weights w1–w5, and output hw(xi)]
Data stream: ⟨xi, yi⟩
Classical perceptron: hw(xi) = sgn(wᵀxi)
Minimize the mean-square error: J(w) = ½ ∑i (yi − hw(xi))²
Perceptron
[Figure: single-layer perceptron with inputs Attribute 1–5, weights w1–w5, and output hw(xi)]
We use the sigmoid function hw = σ(wᵀx) where
σ(x) = 1/(1 + e^−x)
σ′(x) = σ(x)(1 − σ(x))
Perceptron
Minimize the mean-square error: J(w) = ½ ∑i (yi − hw(xi))²
Stochastic Gradient Descent: w = w − η ∇J|xi
Gradient of the error function:
∇J = −∑i (yi − hw(xi)) ∇hw(xi)
∇hw(xi) = hw(xi)(1 − hw(xi))
Weight update rule:
w = w + η ∑i (yi − hw(xi)) hw(xi)(1 − hw(xi)) xi
Perceptron
PERCEPTRON LEARNING(Stream, η)
1 for each class
2 do PERCEPTRON LEARNING(Stream, class, η)
PERCEPTRON LEARNING(Stream, class, η)
1 ▷ Let w0 and w be randomly initialized
2 for each example (x, y) in Stream
3 do if class = y
4 then δ = (1 − hw(x)) · hw(x) · (1 − hw(x))
5 else δ = (0 − hw(x)) · hw(x) · (1 − hw(x))
6 w = w + η · δ · x
PERCEPTRON PREDICTION(x)
1 return argmaxclass hwclass(x)
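A compact Python sketch of the learner above (assumptions: NumPy is available, targets are encoded as 1 for the model's own class and 0 otherwise): one sigmoid unit per class, updated online with the delta rule, predicting the class with the highest output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OnlinePerceptron:
    def __init__(self, n_features, classes, eta=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.eta = eta
        self.w = {c: rng.normal(scale=0.01, size=n_features) for c in classes}

    def learn_one(self, x, y):
        for c, w in self.w.items():
            h = sigmoid(w @ x)
            target = 1.0 if c == y else 0.0
            delta = (target - h) * h * (1.0 - h)   # delta rule for the sigmoid unit
            self.w[c] = w + self.eta * delta * x

    def predict_one(self, x):
        return max(self.w, key=lambda c: sigmoid(self.w[c] @ x))

# Usage on a toy stream of 2-feature examples with classes "a"/"b":
model = OnlinePerceptron(n_features=2, classes=["a", "b"])
for x, y in [([1.0, 0.0], "a"), ([0.0, 1.0], "b"), ([0.9, 0.1], "a")]:
    model.learn_one(np.array(x), y)
print(model.predict_one(np.array([1.0, 0.2])))
```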
Classification
Example: the data set that describes e-mail features for deciding if it is spam.
Contains "Money"  Domain type  Has attach.  Time received  spam
yes               com          yes          night          yes
yes               edu          no           night          yes
no                com          yes          night          yes
no                edu          no           day            no
no                com          no           day            no
yes               cat          no           day            yes
Assume we have to classify the following new instance:
Contains "Money"  Domain type  Has attach.  Time received  spam
yes               edu          yes          day            ?
Classification
Assume we have to classify the following new instance:
Contains "Money"  Domain type  Has attach.  Time received  spam
yes               edu          yes          day            ?
[Figure: decision tree splitting on Contains "Money" and Time, with leaves YES/NO]
Decision Trees
Basic induction strategy:
A ← the “best” decision attribute for next node
Assign A as decision attribute for node
For each value of A, create new descendant of node
Sort training examples to leaf nodes
If training examples perfectly classified, Then STOP, Else iterate
over new leaf nodes
Hoeffding Trees
Hoeffding Tree: VFDT
Pedro Domingos and Geoff Hulten.
Mining high-speed data streams. 2000
With high probability, constructs a model identical to the one a
traditional (greedy batch) method would learn
With theoretical guarantees on the error rate
[Figure: decision tree splitting on Contains "Money" and Time, with leaves YES/NO]
Hoeffding Bound Inequality
Bounds the probability of deviation from the expected value.
Hoeffding Bound Inequality
Let X = ∑i Xi where X1,...,Xn are independent and identically
distributed in [0,1]. Then
1 Chernoff: for each ε < 1
Pr[X > (1+ε)E[X]] ≤ exp(−(ε²/3) E[X])
2 Hoeffding: for each t > 0
Pr[X > E[X]+t] ≤ exp(−2t²/n)
3 Bernstein: let σ² = ∑i σi² be the variance of X. If Xi − E[Xi] ≤ b for
each i ∈ [n], then for each t > 0
Pr[X > E[X]+t] ≤ exp(−t²/(2σ² + (2/3)bt))
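A quick numerical illustration (not from the slides) of the three tail bounds for n i.i.d. variables in [0,1], evaluated at t = εE[X]; the variance and the range bound b below are worst-case assumptions for [0,1]-valued variables.

```python
import math

n, mean, eps = 1000, 0.5, 0.1
EX = n * mean
t = eps * EX
var = n * 0.25          # assumed worst-case Var(Xi) = 1/4 for variables in [0,1]
b = 1.0                 # |Xi - E[Xi]| <= 1 for variables in [0,1]

chernoff  = math.exp(-(eps ** 2 / 3.0) * EX)
hoeffding = math.exp(-2.0 * t ** 2 / n)
bernstein = math.exp(-t ** 2 / (2.0 * var + (2.0 / 3.0) * b * t))
print(f"Chernoff={chernoff:.3g}  Hoeffding={hoeffding:.3g}  Bernstein={bernstein:.3g}")
```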
Hoeffding Tree or VFDT
HT(Stream, δ)
1 ▷ Let HT be a tree with a single leaf (root)
2 ▷ Init counts nijk at root
3 for each example (x, y) in Stream
4 do HTGROW((x, y), HT, δ)
HTGROW((x, y), HT, δ)
1 ▷ Sort (x, y) to leaf l using HT
2 ▷ Update counts nijk at leaf l
3 if examples seen so far at l are not all of the same class
4 then ▷ Compute G for each attribute
5 if G(Best Attr.) − G(2nd best) > √(R² ln(1/δ) / (2n))
6 then ▷ Split leaf on best attribute
7 for each branch
8 do ▷ Start new leaf and initialize counts
Hoeffding Trees
HT features
With high probability, constructs a model identical to the one a
traditional (greedy batch) method would learn
Ties: when two attributes have similar G, split if
G(Best Attr.) − G(2nd best) < √(R² ln(1/δ) / (2n)) < τ
(see the sketch below)
Compute G every nmin instances
Memory: deactivate the least promising leaves, those with lower pl × el, where
pl is the probability to reach leaf l
el is the error in the node
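A minimal sketch of the VFDT split decision with the Hoeffding bound and the tie-breaking parameter τ (illustrative, not the MOA implementation); R is the range of the split criterion G, e.g. log2(#classes) for information gain.

```python
import math

def hoeffding_bound(R, delta, n):
    return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

def should_split(g_best, g_second, R, delta, n, tau=0.05):
    eps = hoeffding_bound(R, delta, n)
    if g_best - g_second > eps:       # best attribute is clearly better
        return True
    return eps < tau                  # tie: the bound is already tighter than tau

# Example: two attributes with very close gains after 5000 examples.
print(should_split(g_best=0.21, g_second=0.20, R=1.0, delta=1e-7, n=5000))
# -> True (a tie, but the Hoeffding bound is already below tau)
```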
Hoeffding Naive Bayes Tree
Hoeffding Tree
Majority Class learner at leaves
Hoeffding Naive Bayes Tree
G. Holmes, R. Kirkby, and B. Pfahringer.
Stress-testing Hoeffding trees, 2005.
monitors accuracy of a Majority Class learner
monitors accuracy of a Naive Bayes learner
predicts using the most accurate method
Bagging
Example
Dataset of 4 Instances : A, B, C, D
Classifier 1: B, A, C, B
Classifier 2: D, B, A, D
Classifier 3: B, A, C, B
Classifier 4: B, C, B, B
Classifier 5: D, C, A, C
Bagging builds a set of M base models, with a bootstrap sample
created by drawing random samples with
replacement.
Bagging
Example
Dataset of 4 Instances : A, B, C, D
Classifier 1: A, B, B, C
Classifier 2: A, B, D, D
Classifier 3: A, B, B, C
Classifier 4: B, B, B, C
Classifier 5: A, C, C, D
Bagging builds a set of M base models, with a bootstrap sample
created by drawing random samples with
replacement.
Bagging
Example
Dataset of 4 Instances : A, B, C, D
Classifier 1: A, B, B, C: A(1) B(2) C(1) D(0)
Classifier 2: A, B, D, D: A(1) B(1) C(0) D(2)
Classifier 3: A, B, B, C: A(1) B(2) C(1) D(0)
Classifier 4: B, B, B, C: A(0) B(3) C(1) D(0)
Classifier 5: A, C, C, D: A(1) B(0) C(2) D(1)
Each base model's training set contains each of the original training
examples K times, where P(K = k) follows a binomial distribution.
Bagging
[Figure: Poisson(1) distribution]
Each base model's training set contains each of the original training
examples K times, where P(K = k) follows a binomial distribution, which for
large datasets tends to a Poisson(1) distribution.
Oza and Russell’s Online Bagging
for M models
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: for all training examples do
3: for m = 1,2,...,M do
4: Set w = Poisson(1)
5: Update hm with the current example with weight w
6: anytime output:
7: return hypothesis: hfin(x) = argmaxy∈Y ∑t=1..T I(ht(x) = y)
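A Python sketch of Oza and Russell's online bagging (assuming base learners with learn_one/predict_one methods, such as the perceptron sketch above): each of the M models sees every example k ~ Poisson(1) times.

```python
import copy
from collections import Counter
import numpy as np

class OnlineBagging:
    def __init__(self, base_model, M=10, seed=0):
        self.models = [copy.deepcopy(base_model) for _ in range(M)]
        self.rng = np.random.default_rng(seed)

    def learn_one(self, x, y):
        for m in self.models:
            k = self.rng.poisson(1.0)      # weight of this example for this model
            for _ in range(k):
                m.learn_one(x, y)

    def predict_one(self, x):
        votes = Counter(m.predict_one(x) for m in self.models)
        return votes.most_common(1)[0][0]  # majority vote

# Usage (with the OnlinePerceptron sketch): bag = OnlineBagging(OnlinePerceptron(2, ["a", "b"]), M=5)
```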
Evolving Stream Classification
Data Mining Algorithms with Concept Drift
No Concept Drift:
[Diagram: input → DM Algorithm (Counter1 ... Counter5) → output]
Concept Drift, using a change detector:
[Diagram: input → DM Algorithm (Static Model + Change Detector) → output]
Concept Drift, replacing the counters by estimators:
[Diagram: input → DM Algorithm (Estimator1 ... Estimator5) → output]
Optimal Change Detector and
Predictor
High accuracy
Low false positive and false negative rates
Theoretical guarantees
Fast detection of change
Low computational cost: minimum space and time needed
No parameters needed
Algorithm ADaptive Sliding WINdow
Example
W = 101010110111111
ADWIN checks every split W = W0 · W1 of the window
(W0 = 1, W0 = 10, W0 = 101, ..., W0 = 101010110):
when |µ̂W0 − µ̂W1| ≥ εc holds for some split: CHANGE DETECTED!
Then ADWIN drops elements from the tail of W:
W = 01010110111111
ADWIN: ADAPTIVE WINDOWING ALGORITHM
1 Initialize Window W
2 for each t > 0
3 do W ← W ∪ {xt} (i.e., add xt to the head of W)
4 repeat Drop elements from the tail of W
5 until |µ̂W0 − µ̂W1| < εc holds
6 for every split of W into W = W0 · W1
7 Output µ̂W
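A simplified ADWIN-style detector in Python (illustrative only: the window is stored explicitly and the cut threshold is a Hoeffding-style bound whose constants differ from the paper's εcut; the real ADWIN compresses the window into exponential buckets to obtain the O(log W) costs discussed below).

```python
import math
import random

class SimpleAdwin:
    def __init__(self, delta=0.002, max_window=1000):
        self.delta = delta
        self.max_window = max_window
        self.window = []                                     # most recent element last

    def _eps_cut(self, n0, n1):
        m = 1.0 / (1.0 / n0 + 1.0 / n1)                      # harmonic mean of |W0| and |W1|
        dp = math.log(2.0 * math.log(len(self.window)) / self.delta)
        return math.sqrt(dp / (2.0 * m))

    def add(self, x):
        """Add x (a value in [0,1]); return True if a change was detected."""
        self.window.append(x)
        if len(self.window) > self.max_window:
            self.window.pop(0)
        change, shrunk = False, True
        while shrunk and len(self.window) >= 4:
            shrunk = False
            total, head = sum(self.window), 0.0
            for i in range(1, len(self.window)):             # every split W = W0 . W1
                head += self.window[i - 1]
                n0, n1 = i, len(self.window) - i
                mu0, mu1 = head / n0, (total - head) / n1
                if abs(mu0 - mu1) >= self._eps_cut(n0, n1):
                    self.window = self.window[i:]            # drop the stale part W0
                    change = shrunk = True
                    break
        return change

# Usage: the stream mean jumps from 0.2 to 0.8 at t = 1000.
random.seed(1)
det = SimpleAdwin()
for t in range(2000):
    p = 0.2 if t < 1000 else 0.8
    if det.add(1.0 if random.random() < p else 0.0):
        print("change detected at t =", t)
        break
```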
Algorithm ADaptive Sliding
WINdow
Theorem
At every time step we have:
1 (False positive rate bound). If µt remains constant within W, the
probability that ADWIN shrinks the window at this step is at most
δ.
2 (False negative rate bound). Suppose that for some partition of
W in two parts W0W1 (where W1 contains the most recent items)
we have |µW0 − µW1| > 2εc. Then with probability 1−δ ADWIN
shrinks W to W1, or shorter.
ADWIN tunes itself to the data stream at hand, with no need for the
user to hardwire or precompute parameters.
Algorithm ADaptive Sliding
WINdow
ADWIN using a Data Stream Sliding Window Model,
can provide the exact counts of 1's in O(1) time per point.
tries O(log W) cutpoints
uses O((1/ε) log W) memory words
the processing time per example is O(log W) (amortized and worst-case).
Sliding Window Model (exponential buckets):
Buckets:   1010101  101  11  1  1
Content:         4    2   2  1  1
Capacity:        7    3   2  1  1
VFDT / CVFDT
Concept-adapting Very Fast Decision Trees: CVFDT
G. Hulten, L. Spencer, and P. Domingos.
Mining time-changing data streams. 2001
It keeps its model consistent with a sliding window of examples
Constructs "alternative branches" as preparation for changes
If an alternative branch becomes more accurate, the tree switches to it
[Figure: decision tree splitting on Contains "Money" and Time, with leaves YES/NO]
Decision Trees: CVFDT
[Figure: decision tree splitting on Contains "Money" and Time, with leaves YES/NO]
No theoretical guarantees on the error rate of CVFDT
CVFDT parameters:
1 W: the example window size.
2 T0: number of examples used to check at each node if the
splitting attribute is still the best.
3 T1: number of examples used to build the alternate tree.
4 T2: number of examples used to test the accuracy of the alternate
tree.
Decision Trees: Hoeffding Adaptive Tree
Hoeffding Adaptive Tree:
replaces the frequency-statistics counters by estimators
needs no window to store examples, since the estimators maintain the
statistics required
changes the way the substitution of alternate subtrees is checked,
using a change detector with theoretical guarantees
Advantages over CVFDT:
1 Theoretical guarantees
2 No parameters
ADWIN Bagging (KDD’09)
ADWIN
An adaptive sliding window whose size is recomputed online
according to the rate of change observed.
ADWIN has rigorous guarantees (theorems)
On ratio of false positives and negatives
On the relation of the size of the current window and change
rates
ADWIN Bagging
When a change is detected, the worst classifier is removed and a new
classifier is added.
Leveraging Bagging for Evolving
Data Streams (ECML-PKDD’10)
Randomization as a powerful tool to increase accuracy and diversity
There are three ways of using randomization:
Manipulating the input data
Manipulating the classifier algorithms
Manipulating the output targets
Leveraging Bagging for Evolving
Data Streams
Leveraging Bagging
Using Poisson(λ)
Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes
Fast Leveraging Bagging ME
if an instance is misclassified: weight = 1
if not: weight = eT/(1−eT)
Empirical evaluation
Accuracy RAM-Hours
Hoeffding Tree 74.03% 0.01
Online Bagging 77.15% 2.98
ADWIN Bagging 79.24% 1.48
Leveraging Bagging 85.54% 20.17
Leveraging Bagging MC 85.37% 22.04
Leveraging Bagging ME 80.77% 0.87
Leveraging Bagging
Leveraging Bagging
Using Poisson(λ)
Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes
Leveraging Bagging ME
Using weight 1 if misclassified, otherwise eT/(1−eT)
Clustering
Clustering
Definition
Clustering is the grouping of a set of instances into previously unknown
groups according to some common relations or affinities.
Example
Market segmentation of customers
Example
Social network communities
Clustering
Definition
Given
a set of instances I
a number of clusters K
an objective function cost(I)
a clustering algorithm computes an assignment of a cluster for each
instance
f : I → {1,...,K}
that minimizes the objective function cost(I)
Clustering
Definition
Given
a set of instances I
a number of clusters K
an objective function cost(C,I)
a clustering algorithm computes a set C of instances with |C| = K that
minimizes the objective function
cost(C,I) = ∑x∈I d²(x,C)
where
d(x,c): distance function between x and c
d²(x,C) = minc∈C d²(x,c): distance from x to the nearest point in C
k-means
1. Choose k initial centers C = {c1,...,ck}
2. while stopping criterion has not been met
For i = 1,...,N
find the closest center ck ∈ C to each instance pi
assign instance pi to cluster Ck
For k = 1,...,K
set ck to be the center of mass of all points in Ck
k-means++
1. Choose an initial center c1
For k = 2,...,K
select ck = p ∈ I with probability d²(p,C)/cost(C,I)
2. while stopping criterion has not been met
For i = 1,...,N
find the closest center ck ∈ C to each instance pi
assign instance pi to cluster Ck
For k = 1,...,K
set ck to be the center of mass of all points in Ck
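A small batch sketch of k-means++ seeding followed by Lloyd iterations (assumes NumPy; this illustrates the pseudocode above, not the streaming variant).

```python
import numpy as np

def kmeanspp_seed(points, k, rng):
    centers = [points[rng.integers(len(points))]]            # first center uniformly at random
    for _ in range(1, k):
        d2 = np.min(((points[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        probs = d2 / d2.sum()                                 # P(p) = d^2(p, C) / cost(C, I)
        centers.append(points[rng.choice(len(points), p=probs)])
    return np.array(centers)

def kmeans(points, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = kmeanspp_seed(points, k, rng)
    for _ in range(iters):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d2.argmin(axis=1)                            # assign each point to its closest center
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0) # center of mass of the cluster
    return centers, labels

pts = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
centers, labels = kmeans(pts, k=2)
print(np.round(centers, 2))
```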
Performance Measures
Internal Measures
Sum of squared distances
Dunn index D = dmin / dmax
C-Index C = (S − Smin) / (Smax − Smin)
External Measures
Rand Measure
F Measure
Jaccard
Purity
BIRCH
BALANCED ITERATIVE REDUCING AND CLUSTERING USING
HIERARCHIES
Clustering Features CF = (N,LS,SS)
N: number of data points
LS: linear sum of the N data points
SS: square sum of the N data points
Properties:
Additivity: CF1 +CF2 = (N1 +N2,LS1 +LS2,SS1 +SS2)
Easy to compute: average inter-cluster distance
and average intra-cluster distance
Uses CF tree
Height-balanced tree with two parameters
B: branching factor
T: radius leaf threshold
BIRCH
BALANCED ITERATIVE REDUCING AND CLUSTERING USING
HIERARCHIES
Phase 1: Scan all data and build an initial in-memory CF tree
Phase 2: Condense into desirable range by building a smaller CF
tree (optional)
Phase 3: Global clustering
Phase 4: Cluster refining (optional and off line, as requires more
passes)
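A sketch of the clustering feature CF = (N, LS, SS) with the additivity property and the centroid and radius derived from it (names are illustrative, not the BIRCH implementation).

```python
import numpy as np

class CF:
    """Clustering feature CF = (N, LS, SS)."""
    def __init__(self, dim):
        self.N = 0                       # number of points
        self.LS = np.zeros(dim)          # linear sum of the points
        self.SS = 0.0                    # sum of squared norms of the points

    def add_point(self, x):
        self.N += 1
        self.LS += x
        self.SS += float(x @ x)

    def merge(self, other):
        # Additivity: CF1 + CF2 = (N1 + N2, LS1 + LS2, SS1 + SS2)
        self.N += other.N
        self.LS = self.LS + other.LS
        self.SS += other.SS

    def centroid(self):
        return self.LS / self.N

    def radius(self):
        # average squared distance to the centroid is SS/N - ||centroid||^2
        c = self.centroid()
        return float(np.sqrt(max(self.SS / self.N - float(c @ c), 0.0)))
```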
Clu-Stream
Clu-Stream
Uses micro-clusters to store statistics on-line
Clustering Features CF = (N,LS,SS,LT,ST)
N: number of data points
LS: linear sum of the N data points
SS: square sum of the N data points
LT: linear sum of the time stamps
ST: square sum of the time stamps
Uses pyramidal time frame
Clu-Stream
On-line Phase
For each new point that arrives:
either the point is absorbed by an existing micro-cluster,
or the point starts a new micro-cluster of its own; to keep the number of
micro-clusters constant, the oldest micro-cluster is deleted
or two of the oldest micro-clusters are merged
Off-line Phase
Apply k-means using micro-clusters as points
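An illustrative sketch of the online absorb-or-create decision over micro-clusters (heavily simplified: a fixed absorption radius instead of the per-cluster RMS boundary, no time statistics or pyramidal time frame).

```python
import numpy as np

class MicroClusterer:
    def __init__(self, max_clusters=50, radius=1.0):
        self.max_clusters = max_clusters
        self.radius = radius
        self.clusters = []                                  # dicts: N, LS, SS, t

    def add(self, x, t):
        if self.clusters:
            centroids = np.array([c["LS"] / c["N"] for c in self.clusters])
            i = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))
            if np.linalg.norm(centroids[i] - x) <= self.radius:
                c = self.clusters[i]                        # absorb into the nearest micro-cluster
                c["N"] += 1; c["LS"] += x; c["SS"] += float(x @ x); c["t"] = t
                return
        if len(self.clusters) >= self.max_clusters:         # make room: drop the oldest
            self.clusters.remove(min(self.clusters, key=lambda c: c["t"]))
        self.clusters.append({"N": 1, "LS": x.copy(), "SS": float(x @ x), "t": t})

    def centroids(self):
        return [c["LS"] / c["N"] for c in self.clusters]    # feed these to the offline k-means
```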
StreamKM++: Coresets
Coreset of a set P with respect to some problem
Small subset that approximates the original set P.
Solving the problem for the coreset provides an approximate
solution for the problem on P.
(k,ε)-coreset
A (k,ε)-coreset S of P is a subset of P that for each C of size k
(1−ε)cost(P,C) ≤ costw(S,C) ≤ (1+ε)cost(P,C)
StreamKM++: Coresets
Coreset Tree
Choose a leaf node l at random
Choose a new sample point qt+1 from Pl according to d²
Based on ql and qt+1, split Pl into two subclusters and create two
child nodes
StreamKM++
Maintain L = ⌈log2(n/m)⌉ + 2 buckets B0, B1, ..., BL−1
Frequent Pattern Mining
Frequent Patterns
Suppose D is a dataset of patterns, t ∈ D, and min_sup is a constant.
Definition
Support(t): number of patterns in D that are superpatterns of t.
Definition
Pattern t is frequent if Support(t) ≥ min_sup.
Frequent Subpattern Problem
Given D and min_sup, find all frequent subpatterns of patterns in D.
Pattern Mining
Dataset Example
Document Patterns
d1 abce
d2 cde
d3 abce
d4 acde
d5 abcde
d6 bcd
Itemset Mining
d1 abce
d2 cde
d3 abce
d4 acde
d5 abcde
d6 bcd
Support (documents)    Frequent itemsets
d1,d2,d3,d4,d5,d6      c
d1,d2,d3,d4,d5         e, ce
d1,d3,d4,d5            a, ac, ae, ace
d1,d3,d5,d6            b, bc
d2,d4,d5,d6            d, cd
d1,d3,d5               ab, abc, abe, be, bce, abce
d2,d4,d5               de, cde
minimal support = 3
Itemset Mining
d1 abce
d2 cde
d3 abce
d4 acde
d5 abcde
d6 bcd
Support  Frequent                        Gen      Closed  Max
6        c                               c        c
5        e, ce                           e        ce
4        a, ac, ae, ace                  a        ace
4        b, bc                           b        bc
4        d, cd                           d        cd
3        ab, abc, abe, be, bce, abce     ab, be   abce    abce
3        de, cde                         de       cde     cde
Examples of generators and their closures: e → ce, a → ace
Closed Patterns
Usually, there are too many frequent patterns. We can compute a
smaller set, while keeping the same information.
Example
A set of 1000 items has 2^1000 ≈ 10^301 subsets, which is more than the
number of atoms in the universe ≈ 10^79
Closed Patterns
A priori property
If t′ is a subpattern of t, then Support(t′) ≥ Support(t).
Definition
A frequent pattern t is closed if none of its proper superpatterns has
the same support as it has.
Frequent subpatterns and their supports can be generated from closed
patterns.
Maximal Patterns
Definition
A frequent pattern t is maximal if none of its proper superpatterns is
frequent.
Frequent subpatterns can be generated from maximal patterns, but not
with their support.
All maximal patterns are closed, but not all closed patterns are
maximal.
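A small itertools-based sketch that recomputes the frequent, closed and maximal itemsets of the running example above (brute-force enumeration, exponential in the number of items, so only suitable for toy data).

```python
from itertools import combinations

docs = {"d1": "abce", "d2": "cde", "d3": "abce", "d4": "acde", "d5": "abcde", "d6": "bcd"}
min_sup = 3
items = sorted(set("".join(docs.values())))

def support(itemset):
    return sum(1 for d in docs.values() if set(itemset) <= set(d))

frequent = {}
for r in range(1, len(items) + 1):
    for combo in combinations(items, r):
        s = support(combo)
        if s >= min_sup:
            frequent[frozenset(combo)] = s

closed = {t for t in frequent
          if not any(t < u and frequent[u] == frequent[t] for u in frequent)}
maximal = {t for t in frequent if not any(t < u for u in frequent)}

print(len(frequent))                                 # 19 frequent itemsets
print(sorted("".join(sorted(t)) for t in closed))    # ['abce', 'ace', 'bc', 'c', 'cd', 'cde', 'ce']
print(sorted("".join(sorted(t)) for t in maximal))   # ['abce', 'cde']
```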
Non streaming frequent itemset
miners
Representation:
Horizontal layout
T1: a, b, c
T2: b, c, e
T3: b, d, e
Vertical layout
a: 1 0 0
b: 1 1 1
c: 1 1 0
Search:
Breadth-first (levelwise): Apriori
Depth-first: Eclat, FP-Growth
Mining Patterns over Data Streams
Requirements: fast, use small amount of memory and adaptive
Type:
Exact
Approximate
Per batch, per transaction
Incremental, Sliding Window, Adaptive
Frequent, Closed, Maximal patterns
Moment
Computes closed frequent itemsets in a sliding window
Uses a Closed Enumeration Tree
Uses 4 types of nodes:
Closed Nodes
Intermediate Nodes
Unpromising Gateway Nodes
Infrequent Gateway Nodes
Adding transactions: closed itemsets remain closed
Removing transactions: infrequent itemsets remain infrequent
FP-Stream
Mining Frequent Itemsets at Multiple Time Granularities
Based on FP-Growth
Maintains
a pattern tree
a tilted-time window
Allows answering time-sensitive queries
Gives greater weight to recent data
Drawback: time and memory complexity
Tree and Graph Mining: Dealing
with time changes
Keep a window on recent stream elements
Actually, just its lattice of closed sets!
Keep track of number of closed patterns in lattice, N
Use some change detector on N
When change is detected:
Drop stale part of the window
Update lattice to reflect this deletion, using deletion rule
Alternatively, sliding window of some fixed size
Summary
A short course in Data Stream
Mining
Short Course Summary
1 Introduction
2 Classification
3 Ensemble Methods
4 Clustering
5 Frequent Pattern Mining
Open Source Software
1 MOA: http://moa.cms.waikato.ac.nz/
2 SAMOA: http://samoa-project.net/
More Related Content

What's hot

Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsAlbert Bifet
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsAlbert Bifet
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịHong Ong
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupSri Ambati
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningMark Chang
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Raja Chiky
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningMark Chang
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16MLconf
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLFlink Forward
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine LearningFabian Pedregosa
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsTravis Oliphant
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonWes McKinney
 
Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Pythonindico data
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkDatabricks
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAAlbert Bifet
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learnJimmy Lai
 
Domain Adaptation
Domain AdaptationDomain Adaptation
Domain AdaptationMark Chang
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 

What's hot (20)

Efficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive WindowsEfficient Data Stream Classification via Probabilistic Adaptive Windows
Efficient Data Stream Classification via Probabilistic Adaptive Windows
 
Leveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data StreamsLeveraging Bagging for Evolving Data Streams
Leveraging Bagging for Evolving Data Streams
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 
Europy17_dibernardo
Europy17_dibernardoEuropy17_dibernardo
Europy17_dibernardo
 
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thịDistance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
Distance oracle - Truy vấn nhanh khoảng cách giữa hai điểm bất kỳ trên đồ thị
 
Distributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta MeetupDistributed GLM with H2O - Atlanta Meetup
Distributed GLM with H2O - Atlanta Meetup
 
PAC Bayesian for Deep Learning
PAC Bayesian for Deep LearningPAC Bayesian for Deep Learning
PAC Bayesian for Deep Learning
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
PAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep LearningPAC-Bayesian Bound for Deep Learning
PAC-Bayesian Bound for Deep Learning
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
 
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSLSebastian Schelter – Distributed Machine Learing with the Samsara DSL
Sebastian Schelter – Distributed Machine Learing with the Samsara DSL
 
Parallel Optimization in Machine Learning
Parallel Optimization in Machine LearningParallel Optimization in Machine Learning
Parallel Optimization in Machine Learning
 
Scaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUsScaling Python to CPUs and GPUs
Scaling Python to CPUs and GPUs
 
Scipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in PythonScipy 2011 Time Series Analysis in Python
Scipy 2011 Time Series Analysis in Python
 
Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Python
 
TensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache SparkTensorFrames: Google Tensorflow on Apache Spark
TensorFrames: Google Tensorflow on Apache Spark
 
Mining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOAMining Big Data Streams with APACHE SAMOA
Mining Big Data Streams with APACHE SAMOA
 
Text classification in scikit-learn
Text classification in scikit-learnText classification in scikit-learn
Text classification in scikit-learn
 
Domain Adaptation
Domain AdaptationDomain Adaptation
Domain Adaptation
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 

Viewers also liked

5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streamsKrish_ver2
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsAlbert Bifet
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataDataminingTools Inc
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAlbert Bifet
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalseSAT Journals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Managementk_tauhid
 
CIKM 2009 - Efficient itemset generator discovery over a stream sliding window
CIKM 2009 - Efficient itemset generator discovery over a stream sliding windowCIKM 2009 - Efficient itemset generator discovery over a stream sliding window
CIKM 2009 - Efficient itemset generator discovery over a stream sliding windowChuancong Gao
 
Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniquesShanmukha S. Potti
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsJustin Cletus
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsAlbert Bifet
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Slim Baltagi
 
Anyone Can Cook Report - WOWEL
Anyone Can Cook Report - WOWELAnyone Can Cook Report - WOWEL
Anyone Can Cook Report - WOWELyoonsukim1110
 
XY Lao Tablet
XY Lao TabletXY Lao Tablet
XY Lao Tabletlaonux
 
Gerusalemme terrena-educaz.crist
Gerusalemme terrena-educaz.cristGerusalemme terrena-educaz.crist
Gerusalemme terrena-educaz.cristUmberto Rosi
 

Viewers also liked (20)

5.1 mining data streams
5.1 mining data streams5.1 mining data streams
5.1 mining data streams
 
Moa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data StreamsMoa: Real Time Analytics for Data Streams
Moa: Real Time Analytics for Data Streams
 
18 Data Streams
18 Data Streams18 Data Streams
18 Data Streams
 
Data Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence dataData Mining: Mining stream time series and sequence data
Data Mining: Mining stream time series and sequence data
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Adaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent PatternsAdaptive Learning and Mining for Data Streams and Frequent Patterns
Adaptive Learning and Mining for Data Streams and Frequent Patterns
 
Implementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signalsImplementation of adaptive stft algorithm for lfm signals
Implementation of adaptive stft algorithm for lfm signals
 
Data Stream Management
Data Stream ManagementData Stream Management
Data Stream Management
 
CIKM 2009 - Efficient itemset generator discovery over a stream sliding window
CIKM 2009 - Efficient itemset generator discovery over a stream sliding windowCIKM 2009 - Efficient itemset generator discovery over a stream sliding window
CIKM 2009 - Efficient itemset generator discovery over a stream sliding window
 
Slides lausanne-2013-v2
Slides lausanne-2013-v2Slides lausanne-2013-v2
Slides lausanne-2013-v2
 
Time series data mining techniques
Time series data mining techniquesTime series data mining techniques
Time series data mining techniques
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and CorrelationsMining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
 
Slides guanauato
Slides guanauatoSlides guanauato
Slides guanauato
 
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and SolutionsPAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
PAKDD 2011 TUTORIAL Handling Concept Drift: Importance, Challenges and Solutions
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 
Anyone Can Cook Report - WOWEL
Anyone Can Cook Report - WOWELAnyone Can Cook Report - WOWEL
Anyone Can Cook Report - WOWEL
 
Saim chishti books eemane abitalib2of2
Saim chishti books eemane abitalib2of2Saim chishti books eemane abitalib2of2
Saim chishti books eemane abitalib2of2
 
XY Lao Tablet
XY Lao TabletXY Lao Tablet
XY Lao Tablet
 
Gerusalemme terrena-educaz.crist
Gerusalemme terrena-educaz.cristGerusalemme terrena-educaz.crist
Gerusalemme terrena-educaz.crist
 

Similar to A Short Course in Data Stream Mining

Machine Learning
Machine LearningMachine Learning
Machine Learningbutest
 
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataDedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataVrije Universiteit Amsterdam
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Charles Martin
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Sean Golliher
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorizationmidi
 
Automatically Tolerating And Correcting Memory Errors
Automatically Tolerating And Correcting Memory ErrorsAutomatically Tolerating And Correcting Memory Errors
Automatically Tolerating And Correcting Memory ErrorsEmery Berger
 
Design and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam HelpDesign and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam HelpProgramming Exam Help
 
lecture 11
lecture 11lecture 11
lecture 11sajinsc
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmKaniska Mandal
 
FDA and Statistical learning theory
FDA and Statistical learning theoryFDA and Statistical learning theory
FDA and Statistical learning theorytuxette
 
Max Entropy
Max EntropyMax Entropy
Max Entropyjianingy
 
Ch03 Mining Massive Data Sets stanford
Ch03 Mining Massive Data Sets  stanfordCh03 Mining Massive Data Sets  stanford
Ch03 Mining Massive Data Sets stanfordSakthivel C R
 
Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1Deepak John
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationFeynman Liang
 
Review session2
Review session2Review session2
Review session2NEEDY12345
 

Similar to A Short Course in Data Stream Mining (20)

Machine Learning
Machine LearningMachine Learning
Machine Learning
 
002.decision trees
002.decision trees002.decision trees
002.decision trees
 
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked DataDedalo, looking for Cluster Explanations in a labyrinth of Linked Data
Dedalo, looking for Cluster Explanations in a labyrinth of Linked Data
 
Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3Applied machine learning for search engine relevance 3
Applied machine learning for search engine relevance 3
 
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
Probabilistic Retrieval Models - Sean Golliher Lecture 8 MSU CSCI 494
 
AI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, FlexudyAI applications in education, Pascal Zoleko, Flexudy
AI applications in education, Pascal Zoleko, Flexudy
 
20070702 Text Categorization
20070702 Text Categorization20070702 Text Categorization
20070702 Text Categorization
 
Automatically Tolerating And Correcting Memory Errors
Automatically Tolerating And Correcting Memory ErrorsAutomatically Tolerating And Correcting Memory Errors
Automatically Tolerating And Correcting Memory Errors
 
Chapter 4 ds
Chapter 4 dsChapter 4 ds
Chapter 4 ds
 
Design and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam HelpDesign and Analysis of Algorithms Exam Help
Design and Analysis of Algorithms Exam Help
 
lecture 11
lecture 11lecture 11
lecture 11
 
Into to prob_prog_hari (2)
Into to prob_prog_hari (2)Into to prob_prog_hari (2)
Into to prob_prog_hari (2)
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning Algorithm
 
FDA and Statistical learning theory
FDA and Statistical learning theoryFDA and Statistical learning theory
FDA and Statistical learning theory
 
Max Entropy
Max EntropyMax Entropy
Max Entropy
 
Ch03 Mining Massive Data Sets stanford
Ch03 Mining Massive Data Sets  stanfordCh03 Mining Massive Data Sets  stanford
Ch03 Mining Massive Data Sets stanford
 
Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1Anlysis and design of algorithms part 1
Anlysis and design of algorithms part 1
 
Accelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference CompilationAccelerating Metropolis Hastings with Lightweight Inference Compilation
Accelerating Metropolis Hastings with Lightweight Inference Compilation
 
Review session2
Review session2Review session2
Review session2
 
Function Approx2009
Function Approx2009Function Approx2009
Function Approx2009
 

More from Albert Bifet

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream miningAlbert Bifet
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 Albert Bifet
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkAlbert Bifet
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataAlbert Bifet
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labelsAlbert Bifet
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themAlbert Bifet
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real TimeAlbert Bifet
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataAlbert Bifet
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsAlbert Bifet
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online AnalysisAlbert Bifet
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streamsAlbert Bifet
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Albert Bifet
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAlbert Bifet
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesAlbert Bifet
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsAlbert Bifet
 

More from Albert Bifet (15)

Artificial intelligence and data stream mining
Artificial intelligence and data stream miningArtificial intelligence and data stream mining
Artificial intelligence and data stream mining
 
MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016 MOA for the IoT at ACML 2016
MOA for the IoT at ACML 2016
 
Apache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache FlinkApache Samoa: Mining Big Data Streams with Apache Flink
Apache Samoa: Mining Big Data Streams with Apache Flink
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Multi-label Classification with Meta-labels
Multi-label Classification with Meta-labelsMulti-label Classification with Meta-labels
Multi-label Classification with Meta-labels
 
Pitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid themPitfalls in benchmarking data stream classification and how to avoid them
Pitfalls in benchmarking data stream classification and how to avoid them
 
Mining Big Data in Real Time
Mining Big Data in Real TimeMining Big Data in Real Time
Mining Big Data in Real Time
 
Sentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming DataSentiment Knowledge Discovery in Twitter Streaming Data
Sentiment Knowledge Discovery in Twitter Streaming Data
 
Fast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data StreamsFast Perceptron Decision Tree Learning from Evolving Data Streams
Fast Perceptron Decision Tree Learning from Evolving Data Streams
 
MOA : Massive Online Analysis
MOA : Massive Online AnalysisMOA : Massive Online Analysis
MOA : Massive Online Analysis
 
New ensemble methods for evolving data streams
New ensemble methods for evolving data streamsNew ensemble methods for evolving data streams
New ensemble methods for evolving data streams
 
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
Métodos Adaptativos de Minería de Datos y Aprendizaje para Flujos de Datos.
 
Adaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data StreamsAdaptive XML Tree Mining on Evolving Data Streams
Adaptive XML Tree Mining on Evolving Data Streams
 
Mining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed TreesMining Implications from Lattices of Closed Trees
Mining Implications from Lattices of Closed Trees
 
Kalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data StreamsKalman Filters and Adaptive Windows for Learning in Data Streams
Kalman Filters and Adaptive Windows for Learning in Data Streams
 

A Short Course in Data Stream Mining

  • 1. A Short Course in Data Stream Mining Albert Bifet Shenzhen, 23 January 2015 Huawei Noah’s Ark Lab
  • 4. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Puzzle: Finding Missing Numbers Let π be a permutation of {1,...,n}. Let π−1 be π with one element missing. π−1[i] arrives in increasing order Task: Determine the missing number
  • 5. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Puzzle: Finding Missing Numbers Let π be a permutation of {1,...,n}. Let π−1 be π with one element missing. π−1[i] arrives in increasing order Task: Determine the missing number Use a n-bit vector to memorize all the numbers (O(n) space)
  • 6. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Puzzle: Finding Missing Numbers Let π be a permutation of {1,...,n}. Let π−1 be π with one element missing. π−1[i] arrives in increasing order Task: Determine the missing number Data Streams: O(log(n)) space.
  • 7. Introduction: Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Example Puzzle: Finding Missing Numbers Let π be a permutation of {1,...,n}. Let π−1 be π with one element missing. π−1[i] arrives in increasing order Task: Determine the missing number Data Streams: O(log(n)) space. Store n(n+1)/2 − ∑j≤i π−1[j].
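As a minimal sketch of the O(log n) trick on this slide, the running-sum idea can be written in a few lines of Python (the stream is simulated here with a plain list; names are illustrative):

    def find_missing(stream, n):
        """Return the one number of {1,...,n} absent from the stream.

        Only a running sum is kept, i.e. a single O(log n)-bit counter
        instead of an n-bit presence vector.
        """
        total = n * (n + 1) // 2          # sum of 1..n
        for x in stream:                  # each element is seen once and discarded
            total -= x
        return total

    # Example: permutation of {1,...,6} with the element 4 removed.
    print(find_missing([1, 2, 3, 5, 6], 6))   # -> 4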
  • 8. Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Tools: approximation randomization, sampling sketching
  • 9. Data Streams Data Streams Sequence is potentially infinite High amount of data: sublinear space High speed of arrival: sublinear time per example Once an element from a data stream has been processed it is discarded or archived Approximation algorithms Small error rate with high probability An algorithm (ε,δ)-approximates F if it outputs an estimate F̃ for which Pr[|F̃ − F| > εF] < δ.
  • 10–15. Data Streams Approximation Algorithms (animation of a window sliding over a bit stream, e.g. 1011000111 1010101) Sliding Window We can maintain simple statistics over sliding windows, using O((1/ε) log² N) space, where N is the length of the sliding window and ε is the accuracy parameter M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
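To make the sublinear-space sliding-window statistic above concrete, here is a simplified Python sketch of DGIM-style exponential-histogram bucketing. The class name, the choice k = 2, and the demo stream are mine, and the returned count is only an approximation of the number of 1s in the last N bits:

    class DGIMCounter:
        """Approximate count of 1s among the last N bits of a bit stream,
        using exponential-histogram buckets (a simplified sketch of the
        Datar-Gionis-Indyk-Motwani scheme)."""
        def __init__(self, N, k=2):
            self.N, self.k = N, k          # window length, max buckets per size
            self.t = 0
            self.buckets = []              # (timestamp of newest 1, size), newest first

        def add(self, bit):
            self.t += 1
            # drop buckets that have slid entirely out of the window
            while self.buckets and self.buckets[-1][0] <= self.t - self.N:
                self.buckets.pop()
            if bit != 1:
                return
            self.buckets.insert(0, (self.t, 1))
            size = 1
            while True:                    # merge the oldest pair when a size overflows
                idx = [i for i, b in enumerate(self.buckets) if b[1] == size]
                if len(idx) <= self.k:
                    break
                i, j = idx[-2], idx[-1]    # the two oldest buckets of this size
                self.buckets[i] = (self.buckets[i][0], 2 * size)  # keep the newer timestamp
                del self.buckets[j]
                size *= 2

        def count(self):
            """Estimate of the number of 1s in the last N bits."""
            if not self.buckets:
                return 0
            total = sum(size for _, size in self.buckets)
            return total - self.buckets[-1][1] // 2   # count only half of the oldest bucket

    c = DGIMCounter(N=16)
    for b in [1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1]:
        c.add(b)
    print(c.count())   # approximate number of 1s in the last 16 bits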
  • 17. Classification Definition Given nC different classes, a classifier algorithm builds a model that predicts for every unlabelled instance I the class C to which it belongs with accuracy. Example A spam filter Example Twitter Sentiment analysis: analyze tweets with positive or negative feelings
  • 18. Data stream classification cycle 1 Process an example at a time, and inspect it only once (at most) 2 Use a limited amount of memory 3 Work in a limited amount of time 4 Be ready to predict at any point
  • 19. Classification Data set that describes e-mail features for deciding if it is spam. Example:
    Contains "Money" | Domain type | Has attach. | Time received | spam
    yes | com | yes | night | yes
    yes | edu | no | night | yes
    no | com | yes | night | yes
    no | edu | no | day | no
    no | com | no | day | no
    yes | cat | no | day | yes
    Assume we have to classify the following new instance:
    yes | edu | yes | day | ?
  • 20. Bayes Classifiers Naïve Bayes Based on Bayes Theorem: P(c|d) = P(c)P(d|c)/P(d), i.e. posterior = prior × likelihood / evidence Estimates the probability of observing attribute a and the prior probability P(c) Probability of class c given an instance d: P(c|d) = P(c) ∏a∈d P(a|c) / P(d)
  • 21. Bayes Classifiers Multinomial Naïve Bayes Considers a document as a bag-of-words. Estimates the probability of observing word w and the prior probability P(c) Probability of class c given a test document d: P(c|d) = P(c) ∏w∈d P(w|c)^nwd / P(d)
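A small, illustrative Python sketch of the multinomial Naive Bayes prediction above, done incrementally so that each document is seen only once; the Laplace smoothing constant and the toy documents are my own choices, not from the slides:

    import math
    from collections import defaultdict

    class StreamingMultinomialNB:
        """One-pass multinomial Naive Bayes: keeps only per-class word counts."""
        def __init__(self, alpha=1.0):
            self.alpha = alpha                          # Laplace smoothing
            self.class_docs = defaultdict(int)          # documents per class
            self.word_counts = defaultdict(lambda: defaultdict(int))
            self.class_words = defaultdict(int)         # total words per class
            self.vocab = set()
            self.n_docs = 0

        def learn(self, words, c):
            self.n_docs += 1
            self.class_docs[c] += 1
            for w in words:
                self.word_counts[c][w] += 1
                self.class_words[c] += 1
                self.vocab.add(w)

        def predict(self, words):
            best, best_lp = None, float("-inf")
            V = len(self.vocab)
            for c in self.class_docs:
                lp = math.log(self.class_docs[c] / self.n_docs)      # log P(c)
                for w in words:                                      # log P(w|c) per occurrence
                    num = self.word_counts[c][w] + self.alpha
                    den = self.class_words[c] + self.alpha * V
                    lp += math.log(num / den)
                if lp > best_lp:
                    best, best_lp = c, lp
            return best

    nb = StreamingMultinomialNB()
    nb.learn("cheap money offer money".split(), "spam")
    nb.learn("meeting agenda for monday".split(), "ham")
    print(nb.predict("money offer".split()))   # -> 'spam'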
  • 22. Perceptron (diagram: attributes 1–5 with weights w1,...,w5 feeding the output hw(xi)) Data stream: ⟨xi, yi⟩ Classical perceptron: hw(xi) = sgn(wT xi) Minimize mean-square error: J(w) = (1/2) ∑(yi − hw(xi))²
  • 23. Perceptron (same diagram) We use the sigmoid function hw = σ(wT x), where σ(x) = 1/(1 + e^−x) and σ′(x) = σ(x)(1 − σ(x))
  • 24. Perceptron Minimize mean-square error: J(w) = (1/2) ∑(yi − hw(xi))² Stochastic Gradient Descent: w = w − η ∇Jxi Gradient of the error function: ∇J = −∑i (yi − hw(xi)) ∇hw(xi), with ∇hw(xi) = hw(xi)(1 − hw(xi)) Weight update rule: w = w + η ∑i (yi − hw(xi)) hw(xi)(1 − hw(xi)) xi
  • 25. Perceptron PERCEPTRON LEARNING(Stream, η) 1 for each class 2 do PERCEPTRON LEARNING(Stream, class, η) PERCEPTRON LEARNING(Stream, class, η) 1 ▷ Let w0 and w be randomly initialized 2 for each example (x, y) in Stream 3 do if class = y 4 then δ = (1 − hw(x)) · hw(x) · (1 − hw(x)) 5 else δ = (0 − hw(x)) · hw(x) · (1 − hw(x)) 6 w = w + η · δ · x PERCEPTRON PREDICTION(x) 1 return argmaxclass hwclass(x)
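A runnable Python sketch of the per-class sigmoid perceptron with the delta-rule update shown above; the learning rate, weight initialization and the synthetic two-class example are my own choices:

    import math, random

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    class OnlinePerceptron:
        """One sigmoid perceptron per class, trained online with the delta rule."""
        def __init__(self, n_features, classes, eta=0.1, seed=1):
            rnd = random.Random(seed)
            self.eta = eta
            self.w = {c: [rnd.uniform(-0.05, 0.05) for _ in range(n_features)]
                      for c in classes}

        def _h(self, w, x):
            return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

        def learn(self, x, y):
            for c, w in self.w.items():
                h = self._h(w, x)
                target = 1.0 if c == y else 0.0
                delta = (target - h) * h * (1.0 - h)      # delta rule from the slide
                for i, xi in enumerate(x):
                    w[i] += self.eta * delta * xi

        def predict(self, x):
            return max(self.w, key=lambda c: self._h(self.w[c], x))

    p = OnlinePerceptron(n_features=2, classes=["pos", "neg"])
    for _ in range(200):
        p.learn([1.0, 0.0], "pos")
        p.learn([0.0, 1.0], "neg")
    print(p.predict([0.9, 0.1]))   # -> 'pos'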
  • 26. Classification Data set that describes e-mail features for deciding if it is spam (same table as slide 19). Assume we have to classify the following new instance: Contains "Money" = yes, Domain type = edu, Has attach. = yes, Time received = day, spam = ?
  • 27. Classification Assume we have to classify the following new instance: Contains "Money" = yes, Domain type = edu, Has attach. = yes, Time received = day, spam = ? (decision tree: Time = Night → YES; Time = Day → Contains "Money"? Yes → YES, No → NO)
  • 28. Decision Trees Basic induction strategy: A ← the “best” decision attribute for next node Assign A as decision attribute for node For each value of A, create new descendant of node Sort training examples to leaf nodes If training examples perfectly classified, Then STOP, Else iterate over new leaf nodes
  • 29. Hoeffding Trees Hoeffding Tree: VFDT Pedro Domingos and Geoff Hulten. Mining high-speed data streams. 2000 With high probability, constructs a model identical to the one a traditional (greedy) batch method would learn With theoretical guarantees on the error rate (decision tree: Time = Night → YES; Time = Day → Contains "Money"? Yes → YES, No → NO)
  • 30. Hoeffding Bound Inequality Bounds the probability that a random variable deviates from its expected value.
  • 31. Hoeffding Bound Inequality Let X = ∑i Xi where X1,...,Xn are independent and identically distributed in [0,1]. Then 1 Chernoff For each ε < 1: Pr[X > (1+ε)E[X]] ≤ exp(−ε² E[X]/3) 2 Hoeffding For each t > 0: Pr[X > E[X]+t] ≤ exp(−2t²/n) 3 Bernstein Let σ² = ∑i σi² be the variance of X. If Xi − E[Xi] ≤ b for each i ∈ [n], then for each t > 0: Pr[X > E[X]+t] ≤ exp(−t²/(2σ² + (2/3)bt))
  • 32–33. Hoeffding Tree or VFDT HT(Stream, δ) 1 ▷ Let HT be a tree with a single leaf (root) 2 ▷ Init counts nijk at root 3 for each example (x, y) in Stream 4 do HTGROW((x, y), HT, δ) HTGROW((x, y), HT, δ) 1 ▷ Sort (x, y) to leaf l using HT 2 ▷ Update counts nijk at leaf l 3 if examples seen so far at l are not all of the same class 4 then ▷ Compute G for each attribute 5 if G(Best Attr.) − G(2nd best) > √(R² ln(1/δ) / (2n)) 6 then ▷ Split leaf on best attribute 7 for each branch 8 do ▷ Start new leaf and initialize counts
  • 34. Hoeffding Trees HT features With high probability, constructs a model identical to the one a traditional (greedy) batch method would learn Ties: when two attributes have similar G, split if G(Best Attr.) − G(2nd best) < √(R² ln(1/δ) / (2n)) < τ Compute G every nmin instances Memory: deactivate least promising nodes with lower pl × el pl is the probability to reach leaf l el is the error in the node
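The split decision above boils down to one bound computation; a hedged Python sketch (the information-gain values, δ, n and the tie threshold τ below are illustrative numbers, not from the slides):

    import math

    def hoeffding_bound(R, delta, n):
        """epsilon = sqrt(R^2 * ln(1/delta) / (2n)), the deviation bound used by VFDT."""
        return math.sqrt(R * R * math.log(1.0 / delta) / (2.0 * n))

    def should_split(g_best, g_second, R, delta, n, tau=0.05):
        eps = hoeffding_bound(R, delta, n)
        if g_best - g_second > eps:    # best attribute really is better, with prob. 1 - delta
            return True
        return eps < tau               # tie-breaking: the two attributes are close enough

    # e.g. information gain with 2 classes: R = log2(2) = 1
    print(should_split(g_best=0.28, g_second=0.22, R=1.0, delta=1e-6, n=3000))   # True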
  • 35. Hoeffding Naive Bayes Tree Hoeffding Tree Majority Class learner at leaves Hoeffding Naive Bayes Tree G. Holmes, R. Kirkby, and B. Pfahringer. Stress-testing Hoeffding trees, 2005. monitors accuracy of a Majority Class learner monitors accuracy of a Naive Bayes learner predicts using the most accurate method
  • 36. Bagging Example Dataset of 4 Instances : A, B, C, D Classifier 1: B, A, C, B Classifier 2: D, B, A, D Classifier 3: B, A, C, B Classifier 4: B, C, B, B Classifier 5: D, C, A, C Bagging builds a set of M base models, with a bootstrap sample created by drawing random samples with replacement.
  • 37. Bagging Example Dataset of 4 Instances : A, B, C, D Classifier 1: A, B, B, C Classifier 2: A, B, D, D Classifier 3: A, B, B, C Classifier 4: B, B, B, C Classifier 5: A, C, C, D Bagging builds a set of M base models, with a bootstrap sample created by drawing random samples with replacement.
  • 38. Bagging Example Dataset of 4 Instances : A, B, C, D Classifier 1: A, B, B, C: A(1) B(2) C(1) D(0) Classifier 2: A, B, D, D: A(1) B(1) C(0) D(2) Classifier 3: A, B, B, C: A(1) B(2) C(1) D(0) Classifier 4: B, B, B, C: A(0) B(3) C(1) D(0) Classifier 5: A, C, C, D: A(1) B(0) C(2) D(1) Each base model’s training set contains each of the original training example K times where P(K = k) follows a binomial distribution.
  • 39. Bagging Figure : Poisson(1) Distribution. Each base model’s training set contains each of the original training example K times where P(K = k) follows a binomial distribution.
  • 40. Oza and Russell’s Online Bagging for M models 1: Initialize base models hm for all m ∈ {1,2,...,M} 2: for all training examples do 3: for m = 1,2,...,M do 4: Set w = Poisson(1) 5: Update hm with the current example with weight w 6: anytime output: 7: return hypothesis: hfin(x) = argmaxy∈Y ∑T t=1 I(ht(x) = y)
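A compact Python sketch of the Oza and Russell online bagging loop above, where the Poisson(1) example weight is simulated by presenting the example K times to each base model. The base learner below (a tiny nearest-centroid classifier) and the demo data are placeholders; any incremental learner with learn/predict would do:

    import math, random
    from collections import Counter

    def poisson1(rnd):
        """Sample K ~ Poisson(1) by Knuth's inversion method."""
        L, k, p = math.exp(-1.0), 0, 1.0
        while True:
            k += 1
            p *= rnd.random()
            if p <= L:
                return k - 1

    class NearestCentroid:
        """Tiny incremental base learner: per-class feature means."""
        def __init__(self):
            self.sums, self.counts = {}, {}
        def learn(self, x, y):
            s = self.sums.setdefault(y, [0.0] * len(x))
            for i, xi in enumerate(x):
                s[i] += xi
            self.counts[y] = self.counts.get(y, 0) + 1
        def predict(self, x):
            def dist(y):
                c = [s / self.counts[y] for s in self.sums[y]]
                return sum((xi - ci) ** 2 for xi, ci in zip(x, c))
            return min(self.sums, key=dist)

    class OnlineBagging:
        """Oza & Russell: each base model sees each example K ~ Poisson(1) times."""
        def __init__(self, base_factory, M=10, seed=1):
            self.models = [base_factory() for _ in range(M)]
            self.rnd = random.Random(seed)
        def learn(self, x, y):
            for h in self.models:
                for _ in range(poisson1(self.rnd)):   # weight w = Poisson(1)
                    h.learn(x, y)
        def predict(self, x):
            return Counter(h.predict(x) for h in self.models).most_common(1)[0][0]

    bag = OnlineBagging(NearestCentroid, M=5)
    for _ in range(100):
        bag.learn([1.0, 0.0], "a")
        bag.learn([0.0, 1.0], "b")
    print(bag.predict([0.9, 0.2]))   # -> 'a' (majority vote of the 5 models)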
  • 42. Data Mining Algorithms with Concept Drift (diagram) No Concept Drift: input → DM Algorithm with Counter1–Counter5 → output. Concept Drift: input → DM Algorithm with a Static Model plus a Change Detector → output
  • 43. Data Mining Algorithms with Concept Drift (diagram) No Concept Drift: input → DM Algorithm with Counter1–Counter5 → output. Concept Drift: input → DM Algorithm with Estimator1–Estimator5 → output
  • 44. Optimal Change Detector and Predictor High accuracy Low false positives and false negatives ratios Theoretical guarantees Fast detection of change Low computational cost: minimum space and time needed No parameters needed
  • 45–56. Algorithm ADaptive Sliding WINdow Example (animation): W = 101010110111111 is checked over every split W = W0 · W1, with W0 growing from the tail (W0 = 1, 10, 101, ..., 10101011); when the split W0 = 101010110, W1 = 111111 satisfies |µ̂W0 − µ̂W1| ≥ εc, change is detected and elements are dropped from the tail of W. ADWIN: ADAPTIVE WINDOWING ALGORITHM 1 Initialize Window W 2 for each t > 0 3 do W ← W ∪ {xt} (i.e., add xt to the head of W) 4 repeat Drop elements from the tail of W 5 until |µ̂W0 − µ̂W1| < εc holds 6 for every split of W into W = W0 · W1 7 Output µ̂W
  • 57. Algorithm ADaptive Sliding WINdow Theorem At every time step we have: 1 (False positive rate bound). If µt remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ. 2 (False negative rate bound). Suppose that for some partition of W into two parts W0W1 (where W1 contains the most recent items) we have |µW0 − µW1| > 2εc. Then with probability 1 − δ ADWIN shrinks W to W1, or shorter. ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.
  • 58. Algorithm ADaptive Sliding WINdow ADWIN, using a Data Stream Sliding Window Model, can provide the exact counts of 1's in O(1) time per point. tries O(log W) cutpoints uses O((1/ε) log W) memory words the processing time per example is O(log W) (amortized and worst-case). Sliding Window Model 1010101 101 11 1 1 Content: 4 2 2 1 1 Capacity: 7 3 2 1 1
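For intuition only, here is a much-simplified Python sketch of the window-splitting test sketched above. It stores the window explicitly and checks every split, so it has none of ADWIN's O(log W) time and memory guarantees, and the cut threshold εc is my own transcription of the Hoeffding-style bound from the ADWIN paper; treat the formula as an assumption rather than the exact published constant:

    import math
    from collections import deque

    class SimpleAdaptiveWindow:
        """Naive ADWIN-style change detector over a stream of numbers in [0,1]."""
        def __init__(self, delta=0.01):
            self.delta = delta
            self.w = deque()

        def _eps_cut(self, n0, n1):
            # Hoeffding-style cut threshold (assumed form, not the exact paper constant)
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            dp = self.delta / max(len(self.w), 1)
            return math.sqrt(math.log(4.0 / dp) / (2.0 * m))

        def add(self, x):
            """Append x, then shrink the window while some split W = W0 . W1
            shows |mean(W0) - mean(W1)| >= eps_cut. Returns True on change."""
            self.w.append(x)
            changed, shrinking = False, True
            while shrinking and len(self.w) > 1:
                shrinking = False
                s0, n0 = 0.0, 0
                s1, n1 = float(sum(self.w)), len(self.w)
                for x0 in list(self.w)[:-1]:          # every split point
                    s0, n0 = s0 + x0, n0 + 1
                    s1, n1 = s1 - x0, n1 - 1
                    if abs(s0 / n0 - s1 / n1) >= self._eps_cut(n0, n1):
                        self.w.popleft()              # drop the oldest element
                        changed = shrinking = True
                        break
            return changed

    aw = SimpleAdaptiveWindow(delta=0.002)
    stream = [0] * 200 + [1] * 60                     # mean jumps from 0 to 1
    for t, x in enumerate(stream):
        if aw.add(x):
            print("change detected at item", t)
            break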
  • 59. VFDT / CVFDT Concept-adapting Very Fast Decision Trees: CVFDT G. Hulten, L. Spencer, and P. Domingos. Mining time-changing data streams. 2001 It keeps its model consistent with a sliding window of examples Constructs “alternative branches” as preparation for changes If the alternative branch becomes more accurate, a switch of tree branches occurs (decision tree: Time = Night → YES; Time = Day → Contains “Money”? Yes → YES, No → NO)
  • 60. Decision Trees: CVFDT (same decision tree) No theoretical guarantees on the error rate of CVFDT CVFDT parameters: 1 W: the example window size. 2 T0: number of examples used to check at each node if the splitting attribute is still the best. 3 T1: number of examples used to build the alternate tree. 4 T2: number of examples used to test the accuracy of the alternate tree.
  • 61. Decision Trees: Hoeffding Adaptive Tree Hoeffding Adaptive Tree: replace frequency statistics counters by estimators don’t need a window to store examples, due to the fact that we maintain the statistics data needed with estimators change the way of checking the substitution of alternate subtrees, using a change detector with theoretical guarantees Advantages over CVFDT: 1 Theoretical guarantees 2 No Parameters
  • 62. ADWIN Bagging (KDD’09) ADWIN An adaptive sliding window whose size is recomputed online according to the rate of change observed. ADWIN has rigorous guarantees (theorems) On ratio of false positives and negatives On the relation of the size of the current window and change rates ADWIN Bagging When a change is detected, the worst classifier is removed and a new classifier is added.
  • 63. Leveraging Bagging for Evolving Data Streams (ECML-PKDD’10) Randomization as a powerful tool to increase accuracy and diversity There are three ways of using randomization: Manipulating the input data Manipulating the classifier algorithms Manipulating the output targets
  • 64. Leveraging Bagging for Evolving Data Streams Leveraging Bagging Using Poisson(λ) Leveraging Bagging MC Using Poisson(λ) and Random Output Codes Fast Leveraging Bagging ME if an instance is misclassified: weight = 1 if not: weight = eT/(1−eT),
  • 65. Empirical evaluation
    Classifier | Accuracy | RAM-Hours
    Hoeffding Tree | 74.03% | 0.01
    Online Bagging | 77.15% | 2.98
    ADWIN Bagging | 79.24% | 1.48
    Leveraging Bagging | 85.54% | 20.17
    Leveraging Bagging MC | 85.37% | 22.04
    Leveraging Bagging ME | 80.77% | 0.87
    Leveraging Bagging: Using Poisson(λ) Leveraging Bagging MC: Using Poisson(λ) and Random Output Codes Leveraging Bagging ME: Using weight 1 if misclassified, otherwise eT/(1−eT)
  • 67. Clustering Definition Clustering is the partitioning of a set of instances into previously unknown groups according to some common relations or affinities. Example Market segmentation of customers Example Social network communities
  • 68. Clustering Definition Given a set of instances I a number of clusters K an objective function cost(I) a clustering algorithm computes an assignment of a cluster for each instance f : I → {1,...,K} that minimizes the objective function cost(I)
  • 69. Clustering Definition Given a set of instances I a number of clusters K an objective function cost(C,I) a clustering algorithm computes a set C of instances with |C| = K that minimizes the objective function cost(C,I) = ∑x∈I d²(x,C) where d(x,c): distance function between x and c d²(x,C) = minc∈C d²(x,c): distance from x to the nearest point in C
  • 70. k-means 1. Choose k initial centers C = {c1,...,ck} 2. while stopping criterion has not been met For i = 1,...,N find the closest center ck ∈ C to each instance pi assign instance pi to cluster Ck For k = 1,...,K set ck to be the center of mass of all points in Ck
  • 71. k-means++ 1. Choose an initial center c1 For k = 2,...,K select ck = p ∈ I with probability d²(p,C)/cost(C,I) 2. while stopping criterion has not been met For i = 1,...,N find the closest center ck ∈ C to each instance pi assign instance pi to cluster Ck For k = 1,...,K set ck to be the center of mass of all points in Ck
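A short Python sketch of the k-means++ seeding step described above, using only the standard library; the toy point set and the fixed random seed are mine:

    import random

    def d2(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

    def kmeanspp_seeds(points, k, rnd=None):
        """Choose k initial centers: the first uniformly at random, each further
        center with probability proportional to d2(p, C), the squared distance
        of p to the nearest center chosen so far."""
        rnd = rnd or random.Random(7)
        centers = [rnd.choice(points)]
        while len(centers) < k:
            dists = [min(d2(p, c) for c in centers) for p in points]
            r = rnd.uniform(0, sum(dists))
            acc = 0.0
            for p, d in zip(points, dists):
                acc += d
                if acc >= r:
                    centers.append(p)
                    break
        return centers

    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0)]
    print(kmeanspp_seeds(pts, k=3))   # tends to pick one point from each distant group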
  • 72. Performance Measures Internal Measures Sum square distance Dunn index D = dmin/dmax C-Index C = (S − Smin)/(Smax − Smin) External Measures Rand Measure F Measure Jaccard Purity
  • 73. BIRCH BALANCED ITERATIVE REDUCING AND CLUSTERING USING HIERARCHIES Clustering Features CF = (N,LS,SS) N: number of data points LS: linear sum of the N data points SS: square sum of the N data points Properties: Additivity: CF1 +CF2 = (N1 +N2,LS1 +LS2,SS1 +SS2) Easy to compute: average inter-cluster distance and average intra-cluster distance Uses CF tree Height-balanced tree with two parameters B: branching factor T: radius leaf threshold
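The clustering-feature algebra above is easy to make concrete; a small Python sketch (the field and method names are mine, and only the CF arithmetic is shown, not the CF tree):

    class CF:
        """BIRCH clustering feature (N, LS, SS) for a set of d-dimensional points."""
        def __init__(self, d):
            self.N, self.LS, self.SS = 0, [0.0] * d, 0.0

        def add(self, x):
            self.N += 1
            self.LS = [a + b for a, b in zip(self.LS, x)]
            self.SS += sum(v * v for v in x)

        def merge(self, other):
            """Additivity: CF1 + CF2 = (N1+N2, LS1+LS2, SS1+SS2)."""
            self.N += other.N
            self.LS = [a + b for a, b in zip(self.LS, other.LS)]
            self.SS += other.SS

        def centroid(self):
            return [v / self.N for v in self.LS]

        def radius(self):
            # root of the average squared distance to the centroid, from N, LS, SS only
            c2 = sum(v * v for v in self.centroid())
            return max(self.SS / self.N - c2, 0.0) ** 0.5

    a, b = CF(2), CF(2)
    for p in [(0, 0), (0, 2)]:
        a.add(p)
    for p in [(2, 0), (2, 2)]:
        b.add(p)
    a.merge(b)
    print(a.N, a.centroid(), round(a.radius(), 3))   # 4 [1.0, 1.0] 1.414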
  • 74. BIRCH BALANCED ITERATIVE REDUCING AND CLUSTERING USING HIERARCHIES Phase 1: Scan all data and build an initial in-memory CF tree Phase 2: Condense into desirable range by building a smaller CF tree (optional) Phase 3: Global clustering Phase 4: Cluster refining (optional and off line, as requires more passes)
  • 75. Clu-Stream Clu-Stream Uses micro-clusters to store statistics on-line Clustering Features CF = (N, LS, SS, LT, ST) N: number of data points LS: linear sum of the N data points SS: square sum of the N data points LT: linear sum of the time stamps ST: square sum of the time stamps Uses pyramidal time frame
  • 76. Clu-Stream On-line Phase For each new point that arrives the point is absorbed by a micro-cluster the point starts a new micro-cluster of its own delete oldest micro-cluster merge two of the oldest micro-cluster Off-line Phase Apply k-means using microclusters as points
  • 77. StreamKM++: Coresets Coreset of a set P with respect to some problem Small subset that approximates the original set P. Solving the problem for the coreset provides an approximate solution for the problem on P. (k,ε)-coreset A (k,ε)-coreset S of P is a subset of P that for each C of size k (1−ε)cost(P,C) ≤ costw(S,C) ≤ (1+ε)cost(P,C)
  • 78. StreamKM++: Coresets Coreset Tree Choose a leaf node l at random Choose a new sample point denoted by qt+1 from Pl according to d² Based on ql and qt+1, split Pl into two subclusters and create two child nodes StreamKM++ Maintain L = log2(n/m) + 2 buckets B0, B1,..., BL−1
  • 80–83. Frequent Patterns Suppose D is a dataset of patterns, t ∈ D, and min sup is a constant. Definition Support(t): number of patterns in D that are superpatterns of t. Definition Pattern t is frequent if Support(t) ≥ min sup. Frequent Subpattern Problem Given D and min sup, find all frequent subpatterns of patterns in D.
  • 84. Pattern Mining Dataset Example Document Patterns d1 abce d2 cde d3 abce d4 acde d5 abcde d6 bcd
  • 85–94. Itemset Mining d1 abce d2 cde d3 abce d4 acde d5 abcde d6 bcd (minimal support = 3; slide 85 lists the supporting documents for each row, e.g. c appears in d1,...,d6)
    Support | Frequent | Gen | Closed | Max
    6 | c | c | c |
    5 | e, ce | e | ce |
    4 | a, ac, ae, ace | a | ace |
    4 | b, bc | b | bc |
    4 | d, cd | d | cd |
    3 | ab, abc, abe, be, bce, abce | ab, be | abce | abce
    3 | de, cde | de | cde | cde
    (Slides 90 and 93 highlight that each frequent itemset has the same support as its closure, e.g. e → ce and a → ace.)
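The table above can be reproduced by a brute-force pass over all item subsets; a small Python sketch over the same six documents (exhaustive enumeration, so only suitable for toy alphabets like this one):

    from itertools import combinations

    docs = ["abce", "cde", "abce", "acde", "abcde", "bcd"]
    min_sup = 3
    items = sorted(set("".join(docs)))

    def support(itemset):
        return sum(1 for d in docs if set(itemset) <= set(d))

    frequent = {}
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            s = support(combo)
            if s >= min_sup:
                frequent[frozenset(combo)] = s

    # closed: no proper superpattern with the same support; maximal: no frequent superpattern
    closed = {t for t in frequent
              if not any(t < u and frequent[u] == frequent[t] for u in frequent)}
    maximal = {t for t in frequent if not any(t < u for u in frequent)}

    fmt = lambda t: "".join(sorted(t))
    print("frequent:", sorted((fmt(t), s) for t, s in frequent.items()))
    print("closed:  ", sorted(map(fmt, closed)))     # the Closed column above
    print("maximal: ", sorted(map(fmt, maximal)))    # the Max column above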
  • 95. Closed Patterns Usually, there are too many frequent patterns. We can compute a smaller set, while keeping the same information. Example A set of 1000 items has 2^1000 ≈ 10^301 subsets, that is more than the number of atoms in the universe ≈ 10^79
  • 96. Closed Patterns A priori property If t′ is a subpattern of t, then Support(t′) ≥ Support(t). Definition A frequent pattern t is closed if none of its proper superpatterns has the same support as it has. Frequent subpatterns and their supports can be generated from closed patterns.
  • 97. Maximal Patterns Definition A frequent pattern t is maximal if none of its proper superpatterns is frequent. Frequent subpatterns can be generated from maximal patterns, but not with their support. All maximal patterns are closed, but not all closed patterns are maximal.
  • 98. Non streaming frequent itemset miners Representation: Horizontal layout T1: a, b, c T2: b, c, e T3: b, d, e Vertical layout a: 1 0 0 b: 1 1 1 c: 1 1 0 Search: Breadth-first (levelwise): Apriori Depth-first: Eclat, FP-Growth
  • 99. Mining Patterns over Data Streams Requirements: fast, use small amount of memory and adaptive Type: Exact Approximate Per batch, per transaction Incremental, Sliding Window, Adaptive Frequent, Closed, Maximal patterns
  • 100. Moment Computes closed frequent itemsets in a sliding window Uses Closed Enumeration Tree Uses 4 types of Nodes: Closed Nodes Intermediate Nodes Unpromising Gateway Nodes Infrequent Gateway Nodes Adding transactions: closed items remain closed Removing transactions: infrequent items remain infrequent
  • 101. FP-Stream Mining Frequent Itemsets at Multiple Time Granularities Based on FP-Growth Maintains a pattern tree and a tilted-time window Allows answering time-sensitive queries Gives greater weight to recent data Drawback: time and memory complexity
  • 102. Tree and Graph Mining: Dealing with time changes Keep a window on recent stream elements Actually, just its lattice of closed sets! Keep track of number of closed patterns in lattice, N Use some change detector on N When change is detected: Drop stale part of the window Update lattice to reflect this deletion, using deletion rule Alternatively, sliding window of some fixed size
  • 104. A short course in Data Stream Mining Short Course Summary 1 Introduction 2 Classification 3 Ensemble Methods 4 Clustering 5 Frequent Pattern Mining Open Source Software 1 MOA: http://moa.cms.waikato.ac.nz/ 2 SAMOA: http://samoa-project.net/