Leveraging Bagging for Evolving Data Streams
1. Leveraging Bagging for Evolving Data Streams
Albert Bifet, Geoff Holmes, and Bernhard Pfahringer
University of Waikato
Hamilton, New Zealand
Barcelona, 21 September 2010
ECML PKDD 2010
2. Mining Data Streams with Concept Drift
Extract information from
potentially infinite sequence of data
possibly varying over time
using few resources
Adaptively:
no prior knowledge of type or rate of change
2 / 32
3. Mining Data Streams with Concept Drift
Extract information from
potentially infinite sequence of data
possibly varying over time
using few resources
Leveraging Bagging
New improvements for adaptive bagging methods using
input randomization
output randomization
4. Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
6. Mining Massive Data
Eric Schmidt, August 2010
Every two days now we create as much information as we did
from the dawn of civilization up until 2003.
5 exabytes of data
7. Data stream classification cycle
1 Process an example at a time, and inspect it only once (at most)
2 Use a limited amount of memory
3 Work in a limited amount of time
4 Be ready to predict at any point
8. Mining Massive Data
Koichi Kawana
Simplicity means the achievement of maximum effect with
minimum means.
Figure: Data Streams — balancing time, accuracy, and memory.
9. Evaluation Example
Accuracy Time Memory
Classifier A 70% 100 20
Classifier B 80% 20 40
Which classifier is performing better?
11. Evaluation Example
Accuracy Time Memory RAM-Hours
Classifier A 70% 100 20 2,000
Classifier B 80% 20 40 800
Which classifier is performing better?
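The RAM-Hours comparison above can be reproduced with a one-line cost function. This is a sketch: the units are the arbitrary time and memory units of the table, not literal gigabyte-hours.

```python
# Sketch of the RAM-Hours cost measure: a classifier's resource cost is
# its memory footprint multiplied by its running time (units here are
# the arbitrary ones from the table above).

def ram_hours(time, memory):
    """Resource cost: memory occupied over the whole running time."""
    return time * memory

# Values from the evaluation example above.
classifier_a = ram_hours(time=100, memory=20)   # 2,000
classifier_b = ram_hours(time=20, memory=40)    # 800

# B is both more accurate (80% vs 70%) and cheaper in RAM-Hours.
assert classifier_a == 2000 and classifier_b == 800
```

Folding time and memory into one number is what lets the single table answer "which classifier is performing better?" directly.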
12. Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
13. Hoeffding Trees
Hoeffding Tree : VFDT
Pedro Domingos and Geoff Hulten.
Mining high-speed data streams. 2000
With high probability, builds a model nearly identical to the one a
traditional (greedy) batch learner would construct
With theoretical guarantees on the error rate
Figure: an example tree with splits on "Contains 'Money'" (Yes/No) and Time (Day/Night), leading to YES/NO leaves.
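The guarantee above rests on the Hoeffding bound. A minimal sketch of the bound as used by VFDT follows; the range `R = 1.0`, delta, and n values are illustrative, not taken from the slides.

```python
import math

# Sketch of the Hoeffding bound behind VFDT: with probability 1 - delta,
# the true mean of a random variable with range R is within epsilon of
# its empirical mean over n observations. The tree splits on the best
# attribute once its gain advantage over the runner-up exceeds epsilon.

def hoeffding_bound(R, delta, n):
    return math.sqrt((R * R * math.log(1.0 / delta)) / (2.0 * n))

# Illustrative values: information gain with range R = 1.0 (binary class),
# confidence delta = 1e-7, after 10,000 examples.
eps = hoeffding_bound(R=1.0, delta=1e-7, n=10_000)
print(f"epsilon after 10,000 examples: {eps:.4f}")  # shrinks as n grows
```

The bound shrinks with the square root of n, which is why a split decision made on a large-enough prefix of the stream matches the batch decision with high probability.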
14. Hoeffding Naive Bayes Tree
Hoeffding Tree
Majority Class learner at leaves
Hoeffding Naive Bayes Tree
G. Holmes, R. Kirkby, and B. Pfahringer.
Stress-testing Hoeffding trees, 2005.
monitors accuracy of a Majority Class learner
monitors accuracy of a Naive Bayes learner
predicts using the most accurate method
16. Bagging
Figure: Poisson(1) Distribution.
Each base model’s training set contains each of the original training
examples K times, where P(K = k) follows a binomial distribution; for
large training sets, this binomial is closely approximated by Poisson(1).
17. Oza and Russell’s Online Bagging for M models
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: for all training examples do
3: for m = 1,2,...,M do
4: Set w = Poisson(1)
5: Update hm with the current example with weight w
6: anytime output:
7: return hypothesis: hfin(x) = argmax_{y∈Y} Σ_{m=1}^{M} I(hm(x) = y)
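The algorithm above can be sketched as follows. The `MajorityClass` base learner and the example stream are toy stand-ins; any online learner that accepts weighted examples would slot in.

```python
import math
import random

# Sketch of Online Bagging: every incoming example is given to each base
# model with a weight drawn from Poisson(1), approximating bootstrap
# resampling without storing the stream.

def poisson(lam, rng):
    """Knuth's algorithm for sampling a Poisson-distributed integer."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

class MajorityClass:
    """Toy weighted base learner (stand-in for any online learner)."""
    def __init__(self):
        self.counts = {}
    def update(self, x, y, weight):
        self.counts[y] = self.counts.get(y, 0) + weight
    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

def online_bagging(stream, n_models=10, seed=1):
    rng = random.Random(seed)
    models = [MajorityClass() for _ in range(n_models)]
    for x, y in stream:
        for m in models:                 # steps 3-5 of the pseudocode
            w = poisson(1.0, rng)
            if w > 0:
                m.update(x, y, w)
    return models

def vote(models, x):
    """Anytime output: plurality vote over the base models."""
    votes = {}
    for m in models:
        y = m.predict(x)
        votes[y] = votes.get(y, 0) + 1
    return max(votes, key=votes.get)

# Toy stream: two of every three examples are "spam".
stream = [(i, "spam" if i % 3 else "ham") for i in range(300)]
models = online_bagging(stream)
print(vote(models, x=0))
```

Because each example is seen once and only weighted counts are stored, this respects the one-pass, bounded-memory constraints listed earlier.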
18. ADWIN Bagging (KDD’09)
ADWIN
An adaptive sliding window whose size is recomputed online
according to the rate of change observed.
ADWIN has rigorous guarantees (theorems)
On the rates of false positives and false negatives
On the relation between the size of the current window and the
rate of change
ADWIN Bagging
When a change is detected, the worst classifier is removed and
a new classifier is added.
19. ADWIN Bagging for M models
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: for all training examples do
3: for m = 1,2,...,M do
4: Set w = Poisson(1)
5: Update hm with the current example with weight w
6: if ADWIN detects a change in the error of one of the classifiers then
7: Replace the classifier with the highest error with a new one
8: anytime output:
9: return hypothesis: hfin(x) = argmax_{y∈Y} Σ_{m=1}^{M} I(hm(x) = y)
20. Leveraging Bagging for Evolving Data Streams
Randomization as a powerful tool to increase accuracy and
diversity
There are three ways of using randomization:
Manipulating the input data
Manipulating the classifier algorithms
Manipulating the output targets
22. ECOC Output Randomization
Table: Example matrix of random output codes for 3 classes and 6
classifiers
Class 1 Class 2 Class 3
Classifier 1 0 0 1
Classifier 2 0 1 1
Classifier 3 1 0 0
Classifier 4 1 1 0
Classifier 5 1 0 1
Classifier 6 0 1 0
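Prediction with such a matrix decodes by nearest code word. The `decode` helper below is an illustrative sketch mirroring the 3-class, 6-classifier matrix above.

```python
# Sketch of random output codes: each of the 6 classifiers is trained on
# a binary relabelling of the 3 classes; prediction picks the class
# whose code-word column is closest (Hamming distance) to the votes.

code = {                  # rows: classifiers, columns: Class 1..3
    1: (0, 0, 1),
    2: (0, 1, 1),
    3: (1, 0, 0),
    4: (1, 1, 0),
    5: (1, 0, 1),
    6: (0, 1, 0),
}

def decode(binary_predictions, code):
    """Pick the class whose column best matches the binary votes."""
    n_classes = len(next(iter(code.values())))
    def distance(c):
        return sum(code[m][c] != b for m, b in binary_predictions.items())
    return min(range(n_classes), key=distance) + 1  # classes are 1-based

# Suppose all six binary classifiers answer exactly as Class 2's column.
preds = {m: code[m][1] for m in code}
print(decode(preds, code))  # → 2
```

Even if a few binary classifiers err, the nearest column still wins, which is what makes output randomization robust without needing multi-class base learners.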
23. Leveraging Bagging for Evolving Data Streams
Leveraging Bagging
Using Poisson(λ)
Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes
Fast Leveraging Bagging ME
if an instance is misclassified: weight = 1
otherwise: weight = e_T/(1 − e_T), where e_T is the classifier's error estimate
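The ME weighting rule can be written as a tiny function. This is a sketch; `error_estimate` stands in for the e_T error estimate mentioned above.

```python
# Sketch of the "ME" weighting rule: a misclassified instance keeps
# weight 1, a correctly classified one is down-weighted by
# e_T / (1 - e_T), where e_T is the current error estimate.

def me_weight(misclassified, error_estimate):
    if misclassified:
        return 1.0
    return error_estimate / (1.0 - error_estimate)

# With a 20% error estimate, correct examples are down-weighted to 0.25.
print(me_weight(False, 0.2))  # → 0.25
print(me_weight(True, 0.2))   # → 1.0
```

Down-weighting easy (correctly classified) examples concentrates training effort on hard ones, which is why the ME variant is cheap in RAM-Hours.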
24. Input Randomization
Bagging
resampling with replacement using Poisson(1)
Other Strategies
subagging
resampling without replacement
half subagging
resampling without replacement half of the instances
bagging without taking out any instance
using 1+Poisson(1)
25. Leveraging Bagging for Evolving Data Streams
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: for all training examples (x,y) do
3: for m = 1,2,...,M do
4: Set w = Poisson(λ)
5: Update hm with the current example with weight w
6: if ADWIN detects a change in the error of one of the classifiers then
7: Replace the classifier with the highest error with a new one
8: anytime output:
9: return hfin(x) = argmax_{y∈Y} Σ_{m=1}^{M} I(hm(x) = y)
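A runnable sketch of this loop follows, with two loud substitutions: a toy majority-class base learner, and a crude windowed error comparison standing in for ADWIN (the real detector carries the guarantees listed earlier; this stand-in has none).

```python
import math
import random

LAMBDA = 6.0  # Leveraging Bagging draws weights from Poisson(lambda), lambda > 1

def poisson(lam, rng):
    """Knuth's algorithm for sampling a Poisson-distributed integer."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

class Model:
    """Toy base learner: weighted majority class, with an error history."""
    def __init__(self):
        self.counts, self.errors = {}, []
    def update(self, y, w):
        self.counts[y] = self.counts.get(y, 0) + w
    def predict(self):
        return max(self.counts, key=self.counts.get) if self.counts else None
    def drifted(self, window=50, margin=0.3):
        # crude stand-in for ADWIN: recent error far above overall error
        if len(self.errors) < 2 * window:
            return False
        recent = sum(self.errors[-window:]) / window
        overall = sum(self.errors) / len(self.errors)
        return recent - overall > margin

def leveraging_bagging(stream, n_models=5, seed=7):
    rng = random.Random(seed)
    models = [Model() for _ in range(n_models)]
    for x, y in stream:
        for m in models:
            m.errors.append(0 if m.predict() == y else 1)  # prequential error
            w = poisson(LAMBDA, rng)
            if w > 0:
                m.update(y, w)
        if any(m.drifted() for m in models):
            # replace the model with the highest recent error by a fresh one
            worst = max(models, key=lambda m: sum(m.errors[-50:]))
            models[models.index(worst)] = Model()
    return models

# A stream whose class flips halfway: the ensemble recovers after the drift.
stream = [(i, "a") for i in range(300)] + [(i, "b") for i in range(300)]
models = leveraging_bagging(stream)
print([m.predict() for m in models])
```

The larger Poisson(λ) parameter gives each example more influence than Poisson(1) does, and the drift-triggered replacement is what lets the ensemble track the flipped concept.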
26. Leveraging Bagging for Evolving Data Streams MC
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: Compute coloring µm(y)
3: for all training examples (x,y) do
4: for m = 1,2,...,M do
5: Set w = Poisson(λ)
6: Update hm with the current example with weight w and class µm(y)
7: if ADWIN detects a change in the error of one of the classifiers then
8: Replace the classifier with the highest error with a new one
9: anytime output:
10: return hfin(x) = argmax_{y∈Y} Σ_{m=1}^{M} I(hm(x) = µm(y))
27. Leveraging Bagging for Evolving Data Streams ME
1: Initialize base models hm for all m ∈ {1,2,...,M}
2: for all training examples (x,y) do
3: for m = 1,2,...,M do
4: Set w = 1 if the example is misclassified, otherwise e_T/(1 − e_T)
5: Update hm with the current example with weight w
6: if ADWIN detects a change in the error of one of the classifiers then
7: Replace the classifier with the highest error with a new one
8: anytime output:
9: return hfin(x) = argmax_{y∈Y} Σ_{m=1}^{M} I(hm(x) = y)
28. Outline
1 Data stream constraints
2 Leveraging Bagging for Evolving Data Streams
3 Empirical evaluation
29. What is MOA?
Massive Online Analysis (MOA) is a framework for mining data streams.
Based on experience with Weka and VFML
Focussed on classification trees, but lots of active
development: clustering, item set and sequence mining,
regression
Easy to extend
Easy to design and run experiments
30. MOA: the bird
The Moa (another native NZ bird) is not only flightless, like the
Weka, but also extinct.
31. Leveraging Bagging Empirical evaluation
Figure: Accuracy (%) over 10,000–990,000 instances on the SEA dataset with three concept drifts, comparing Leveraging Bagging, Leveraging Bagging MC, ADWIN Bagging, and Online Bagging.
32. Empirical evaluation
Accuracy RAM-Hours
Hoeffding Tree 74.03% 0.01
Online Bagging 77.15% 2.98
ADWIN Bagging 79.24% 1.48
ADWIN Half Subagging 78.36% 1.04
ADWIN Subagging 78.68% 1.13
ADWIN Bagging WT 81.49% 2.74
ADWIN Bagging Strategies
half subagging
resampling without replacement half of the instances
subagging
resampling without replacement
WT: bagging without taking out any instance
using 1+Poisson(1)
33. Empirical evaluation
Accuracy RAM-Hours
Hoeffding Tree 74.03% 0.01
Online Bagging 77.15% 2.98
ADWIN Bagging 79.24% 1.48
Leveraging Bagging 85.54% 20.17
Leveraging Bagging MC 85.37% 22.04
Leveraging Bagging ME 80.77% 0.87
Leveraging Bagging
Leveraging Bagging
Using Poisson(λ)
Leveraging Bagging MC
Using Poisson(λ) and Random Output Codes
Leveraging Bagging ME
Using weight 1 if misclassified, otherwise e_T/(1 − e_T)
34. Empirical evaluation
Accuracy RAM-Hours
Hoeffding Tree 74.03% 0.01
Online Bagging 77.15% 2.98
ADWIN Bagging 79.24% 1.48
Random Forest Leveraging Bagging 80.69% 5.51
Random Forest Online Bagging 72.91% 1.30
Random Forest ADWIN Bagging 74.24% 0.89
Random Forests
the input training set is obtained by sampling with replacement
the nodes of the tree use only a random subset of the attributes to split
we only keep statistics of these attributes
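The per-node attribute restriction can be sketched as below; the sqrt(n) subset size is the common random-forest default, assumed here since the slide does not pin it down.

```python
import math
import random

# Sketch of the attribute restriction above: at each node, only a random
# subset of the attributes is considered for the split, and only their
# statistics are kept. The sqrt(n) subset size is an assumed default.

def candidate_attributes(n_attributes, rng):
    k = max(1, int(math.sqrt(n_attributes)))
    return rng.sample(range(n_attributes), k)

rng = random.Random(42)
print(candidate_attributes(100, rng))  # 10 distinct attribute indices out of 100
```

Keeping statistics for only the sampled attributes is what cuts the memory (and hence RAM-Hours) of each tree relative to a full Hoeffding tree.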
35. Leveraging Bagging Diversity
Figure: Kappa-Error diagram for Leveraging Bagging and Online Bagging on the SEA data with three concept drifts, plotting 576 pairs of classifiers (Kappa statistic on the x-axis, error on the y-axis).
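The Kappa statistic plotted above measures agreement between a pair of classifiers beyond chance (low kappa means a diverse pair). A minimal sketch on two hypothetical prediction lists:

```python
# Sketch of the Kappa statistic for a pair of classifiers: observed
# agreement corrected by the agreement expected from each classifier's
# marginal class frequencies alone.

def kappa(labels_a, labels_b):
    n = len(labels_a)
    classes = sorted(set(labels_a) | set(labels_b))
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # chance agreement from each classifier's class frequencies
    chance = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in classes
    )
    return (observed - chance) / (1.0 - chance)

# Hypothetical predictions from two classifiers on eight instances.
a = ["x", "x", "y", "y", "x", "y", "x", "x"]
b = ["x", "x", "y", "x", "x", "y", "y", "x"]
print(round(kappa(a, b), 3))  # → 0.467
```

On the diagram, each of the 576 points is one such pairwise kappa against the pair's error; Leveraging Bagging's cloud sits at lower kappa, i.e. more diverse ensembles.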
36. Summary
http://moa.cs.waikato.ac.nz/
Conclusions
New improvements for bagging methods using input
randomization
Improving Accuracy: Using Poisson(λ)
Improving RAM-Hours: Using weight 1 if misclassified,
otherwise e_T/(1 − e_T)
New improvements for bagging methods using output
randomization
No need for multi-class classifiers