Pitfalls in Benchmarking Data Stream Classification and How to Avoid Them
Albert Bifet (1), Jesse Read (2), Indrė Žliobaitė (3), Bernhard Pfahringer (4), Geoff Holmes (4)
(1) Yahoo! Research Barcelona
(2) Universidad Carlos III, Madrid, Spain
(3) Aalto University and Helsinki Institute for Information Technology (HIIT), Finland
(4) University of Waikato, Hamilton, New Zealand
ECML-PKDD 2013, 25 September 2013
Data Streams
Sequence is potentially infinite
High amount of data: sublinear space
High speed of arrival: sublinear time per example
Once an element from a data stream has been processed
it is discarded or archived
Big Data & Real Time
1. Motivation
Electricity Dataset
Popular benchmark for testing adaptive classifiers
Collected from the Australian New South Wales Electricity Market
Contains 45,312 instances recording electricity prices at 30-minute intervals
The class label identifies the change of the price (UP or DOWN) relative to a moving average of the last 24 hours
Electricity Dataset, Accuracy
[Plot: accuracy (%) over time (instances, 0 to 4·10^4); curves: VFDT, Majority Class, Naive Bayes]
Electricity Dataset, Accuracy
[Plot: accuracy (%) over time (instances, 0 to 4·10^4); curves: Magic Classifier, VFDT, Majority Class, Naive Bayes]
Electricity Dataset, Kappa Statistic
[Plot: Kappa statistic (%) over time (instances, 0 to 4·10^4); curves: VFDT, Naive Bayes]
Electricity Dataset, Kappa Statistic
[Plot: Kappa statistic (%) over time (instances, 0 to 4·10^4); curves: Magic Classifier, VFDT, Naive Bayes]
Electricity Dataset, Accuracy
Algorithm name      Acc. (%)    Algorithm name      Acc. (%)
DDM                 89.6*       Local detection     80.4
Learn++.CDS         88.5        Perceptron          79.1
KNN-SPRT            88.0        AUE2                77.3
GRI                 88.0        ADWIN               76.6
FISH3               86.2        EAE                 76.6
EDDM-IB1            85.7        Prop. method        76.1
Magic classifier    85.3        Cont. λ-perc.       74.1
ASHT                84.8        CALDS               72.5
bagADWIN            82.8        TA-SVM              68.9
DWM-NB              80.8
* tested on a subset
2. Problem
No-Change classifier: Weather classifier
Prediction for tomorrow: the same as
today
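The no-change rule can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `no_change_accuracy` and the toy stream are assumptions, and the first instance is skipped because no previous label exists yet.

```python
from typing import Hashable, Iterable, Optional


def no_change_accuracy(labels: Iterable[Hashable]) -> float:
    """Prequential accuracy of a classifier that always predicts the previous true label."""
    previous: Optional[Hashable] = None
    correct = 0
    total = 0
    for label in labels:
        # Test first: the prediction is the label revealed on the previous instance.
        if previous is not None:
            correct += int(previous == label)
            total += 1
        # Then "train": remember the label that was just revealed.
        previous = label
    return correct / total if total else 0.0


# Hypothetical toy stream with long runs of identical labels.
stream = ["UP"] * 5 + ["DOWN"] * 5 + ["UP"] * 5
print(no_change_accuracy(stream))
```

On a stream whose labels come in long runs, this rule is wrong only at the points where the label switches, which is why it can score well without using any input attributes.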
Electricity Dataset, Accuracy
[Plot: accuracy (%) over time (instances, 0 to 4·10^4); curves: No-Change, VFDT, Majority Class, Naive Bayes]
Electricity Dataset, Kappa Statistic
[Plot: Kappa statistic (%) over time (instances, 0 to 4·10^4); curves: No-Change, VFDT, Naive Bayes]
Characteristics of the Electricity Dataset
[Plot: class prior (%) over time (instances, 0.5·10^4 to 4.5·10^4)]
Characteristics of the Electricity Dataset
[Plot: autocorrelation (0 to 1) against lag (instances, 20 to 200)]
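A quantity of this kind can be computed as sketched below, assuming the plotted series is the class-label sequence encoded as 1 (UP) and 0 (DOWN). That encoding, the function name `autocorrelation`, and the toy labels are illustrative assumptions, not details taken from the slides.

```python
from typing import Sequence


def autocorrelation(values: Sequence[float], lag: int) -> float:
    """Sample autocorrelation of a numeric sequence at the given lag."""
    n = len(values)
    if n == 0 or lag < 0 or lag >= n:
        return 0.0
    mean = sum(values) / n
    variance_sum = sum((v - mean) ** 2 for v in values)
    if variance_sum == 0:
        # Constant sequence: correlation is undefined, reported here as 0.
        return 0.0
    covariance_sum = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return covariance_sum / variance_sum


# Hypothetical labels arriving in runs of ten identical values.
labels = ([1] * 10 + [0] * 10) * 3
for lag in (1, 5, 10):
    print(lag, round(autocorrelation(labels, lag), 3))
```

High values at small lags indicate that consecutive labels tend to agree, which is the temporal dependence that the no-change classifier exploits.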
3. Proposal
New Evaluation for Stream Classifiers
Kappa Statistic
$p_0$: classifier's prequential accuracy
$p_c$: probability that a chance classifier makes a correct prediction
κ statistic:
$$\kappa = \frac{p_0 - p_c}{1 - p_c}$$
κ = 1 if the classifier is always correct
κ = 0 if the predictions coincide with the correct ones as often as those of the chance classifier
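As a rough sketch, the statistic can be computed from the true and predicted label sequences. Here the chance classifier is taken in the usual Cohen's kappa sense: it predicts each class as often as the evaluated classifier does, but independently of the true label. The function name `kappa` and the toy data are assumptions for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Kappa statistic: (p0 - pc) / (1 - pc)."""
    n = len(y_true)
    # p0: observed accuracy of the classifier.
    p0 = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # pc: accuracy expected from a chance classifier with the same
    # predicted-class frequencies as the classifier under test.
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    pc = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    if pc == 1.0:
        # Degenerate single-class case; returning 0 is a convention chosen here.
        return 0.0
    return (p0 - pc) / (1.0 - pc)


# Hypothetical example: a majority-class predictor on an imbalanced sample.
y_true = ["UP", "UP", "UP", "DOWN"]
y_majority = ["UP", "UP", "UP", "UP"]
print(kappa(y_true, y_majority))
```

The majority-class predictor reaches 75% accuracy on this sample, yet its kappa is 0 because the chance classifier with the same prediction frequencies does just as well.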
New Evaluation for Stream Classifiers
Kappa Plus Statistic
$p_0$: classifier's prequential accuracy
$p_e$: no-change classifier's prequential accuracy
κ+ statistic:
$$\kappa^{+} = \frac{p_0 - p_e}{1 - p_e}$$
κ+ = 1 if the classifier is always correct
κ+ = 0 if the predictions coincide with the correct ones as often as those of the no-change classifier
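A small worked sketch of the formula follows. The accuracies used are made-up numbers chosen only to show the sign of the result, and the function name `kappa_plus` is an assumption.

```python
def kappa_plus(p0: float, pe: float) -> float:
    """Kappa Plus: classifier accuracy p0 normalised against no-change accuracy pe."""
    if pe >= 1.0:
        raise ValueError("kappa plus is undefined when the no-change classifier is always correct")
    return (p0 - pe) / (1.0 - pe)


# Hypothetical accuracies, not results from the talk.
print(kappa_plus(0.80, 0.85))  # classifier below the no-change baseline: negative
print(kappa_plus(0.85, 0.85))  # classifier equal to the baseline: zero
print(kappa_plus(0.95, 0.85))  # classifier above the baseline: positive
```

Unlike plain accuracy, the value drops below zero as soon as a classifier does worse than repeating the previous label, and it has no lower bound as the baseline accuracy approaches 1.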
Electricity Market Dataset Accuracy
[Plot: accuracy (%, 60 to 100) over time (instances, 0 to 4·10^4); curves: No-Change, HAT, Lev. Bagging]
Electricity Market Dataset κ
[Plot: Kappa statistic (%) over time (instances, 0 to 4·10^4); curves: No-Change, HAT, Lev. Bagging]
Electricity Market Dataset κ+
[Plot: Kappa Plus statistic (%, −300 to 100) over time (instances, 0 to 4·10^4); curves: No-Change, HAT, Lev. Bagging]
SWT: Temporally Augmented Classifier
SWT: a meta-strategy that builds meta-instances by augmenting the original input attributes with the values of recent class labels from the past:
$$\Pr[\text{class is } c] \equiv h(x_t, c_{t-\ell}, \ldots, c_{t-1})$$
for the t-th test instance, where $\ell$ is the size of the sliding window over the most recent true labels.
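The wrapper idea can be sketched as follows. This is a hedged illustration only: the `predict_one`/`learn_one` interface, the toy `TableLearner` base learner, the `pad` value used before any label has been seen, and the alternating toy stream are all assumptions made here, and the sketch returns a label rather than a class probability.

```python
from collections import Counter, defaultdict, deque
from typing import Deque, Dict, Hashable, List, Optional, Tuple

Features = Tuple[Hashable, ...]


class TableLearner:
    """Toy base learner (hypothetical): majority label per exact feature tuple."""

    def __init__(self) -> None:
        self.counts: Dict[Features, Counter] = defaultdict(Counter)

    def predict_one(self, x: Features) -> Optional[Hashable]:
        seen = self.counts.get(x)
        return seen.most_common(1)[0][0] if seen else None

    def learn_one(self, x: Features, y: Hashable) -> None:
        self.counts[x][y] += 1


class SWTWrapper:
    """Appends the last `window` true labels to every instance before delegating."""

    def __init__(self, base, window: int, pad: Hashable = None) -> None:
        self.base = base
        # Sliding window over the most recent true labels, oldest first.
        self.recent: Deque[Hashable] = deque([pad] * window, maxlen=window)

    def _augment(self, x: Features) -> Features:
        return tuple(x) + tuple(self.recent)

    def predict_one(self, x: Features) -> Optional[Hashable]:
        return self.base.predict_one(self._augment(x))

    def learn_one(self, x: Features, y: Hashable) -> None:
        # Train on the augmented instance, then slide the label window.
        self.base.learn_one(self._augment(x), y)
        self.recent.append(y)


def prequential_accuracy(model, stream: List[Tuple[Features, Hashable]]) -> float:
    """Test-then-train accuracy over the whole stream."""
    correct = 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)
        model.learn_one(x, y)
    return correct / len(stream)


# Hypothetical stream: uninformative attributes, strictly alternating labels.
stream = [(("a",), "UP" if t % 2 == 0 else "DOWN") for t in range(20)]
plain = prequential_accuracy(TableLearner(), stream)
swt = prequential_accuracy(SWTWrapper(TableLearner(), window=1), stream)
print(plain, swt)
```

Because the attributes carry no information in this toy stream, the plain learner cannot track the alternation, while the wrapped learner can once it has seen each previous-label context.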
Electricity Market Dataset κ+
[Plot: Kappa Plus statistic (%, −300 to 100) over time (instances, 0 to 4·10^4); curves: No-Change, SWT HAT, SWT Lev. Bagging]
Forest Cover Type Dataset
[Six plots over time (instances, 0 to 4·10^5).
Top row, curves No-Change, HAT, Lev. Bagging: accuracy (%); Kappa statistic (%); Kappa Plus statistic (%, −300 to 100).
Bottom row, curves No-Change, SWT HAT, SWT Lev. Bagging: accuracy (%); Kappa statistic (%); Kappa Plus statistic (%, −300 to 100).]
Conclusions
Temporal dependence in data stream mining
new κ+ measure
a wrapper classifier SWT
Thanks!