HistoSketch: Fast Similarity-Preserving Sketching
of Streaming Histograms with Concept Drift
Dingqi Yang*, Bin Li†, Laura Rettig*, Philippe Cudré-Mauroux*
*eXascale Infolab, University of Fribourg, Switzerland
†School of Computer Science, Fudan University, Shanghai, China
What kind of location is this?
[Figure: places I've been (bar, university, museum, supermarket) and the location in question are represented as vectors such as (0.7, 0.6, 0.14, 0.21, 0.41, 0.63) and (0.64, 0.65, 0.21, 0.86, 0.24, 0.82); the answer is found by computing the similarity between them.]
Motivation
• Histogram similarity: foundation for many machine learning tasks
• Cardinality of histograms over data streams continuously increases
• Similarity-preserving data sketches
• Compact, fixed size
• Preserve similarity under certain measure
• Are incrementally updateable
• Concept drift: distribution of a histogram changes over time
• Taking it into account can improve the accuracy of histogram-based similarity
techniques
• Typical method: gradual forgetting
Background
Given a data stream of incoming elements $x_t$, each with a weight $w_t$,
we compute a histogram $V$ such that
$V_i$ is the weighted cumulative count of the element $i$.
[Figure: streaming histogram elements $\ldots, x_{t-2}, x_{t-1}, x_t$ with weights $w_t$, and the corresponding histogram $V$.]
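As a minimal sketch of the definition above, the histogram can be accumulated from (element, weight) pairs. The function name `build_histogram` and the toy stream are illustrative assumptions, not part of the slides.

```python
from collections import defaultdict


def build_histogram(stream):
    """Accumulate V[i] = sum of the weights w_t of all stream items with x_t == i."""
    V = defaultdict(float)
    for x_t, w_t in stream:
        V[x_t] += w_t
    return dict(V)


# Toy stream of (element, weight) pairs; the values are made up for illustration.
stream = [("bar", 1.0), ("museum", 2.0), ("bar", 0.5)]
V = build_histogram(stream)
```

The dictionary keeps one entry per distinct element, so its size grows with the number of distinct elements seen in the stream.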
Problem Formulation
• Create and maintain the similarity-preserving sketch $S$ for the full
streaming histogram $V$ such that
• each sketch has a fixed size $K$ ($K \ll |\mathcal{E}|$);
• the collision probability between two sketches $S^a$ and $S^b$ is the normalized
min-max similarity between the histograms $V^a$ and $V^b$
→ the Hamming similarity (fraction of matching positions) between $S^a$ and $S^b$ approximates $\mathrm{SIM}_{NMM}(V^a, V^b)$;
• the sketch $S(t+1)$ can be efficiently computed from the incoming histogram
element $x_{t+1}$, $S(t)$, and a weight decay factor $\lambda$.
[Figure: incremental updating. When a new element $x_{t+1}$ is received, the sketch $S(t+1)$ is computed from $x_{t+1}$, $S(t)$ and $\lambda$.]
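Because each sketch position collides with probability equal to the similarity, the similarity of two histograms can be estimated by counting the positions at which their sketches agree. The following is a minimal sketch of that estimate; the function name `sketch_similarity` and the example sketches are hypothetical.

```python
def sketch_similarity(S_a, S_b):
    """Fraction of positions j with S_a[j] == S_b[j] (estimated collision probability)."""
    if len(S_a) != len(S_b):
        raise ValueError("sketches must have the same length K")
    if not S_a:
        raise ValueError("sketches must not be empty")
    matches = sum(1 for s_a, s_b in zip(S_a, S_b) if s_a == s_b)
    return matches / len(S_a)


# Two made-up sketches of length K = 5 that agree at three positions.
S_a = [3, 1, 4, 1, 5]
S_b = [3, 1, 2, 1, 6]
estimate = sketch_similarity(S_a, S_b)
```

The comparison costs K equality checks regardless of how many distinct elements the underlying histograms contain.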
HistoSketch
• Based on the idea of consistent weighted sampling
• Generate samples such that the probability of drawing identical samples from
two vectors is equal to their min-max similarity.
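• The method draws three random variables $r_{i,j} \sim \mathrm{Gamma}(2,1)$, $c_{i,j} \sim \mathrm{Gamma}(2,1)$,
$\beta_{i,j} \sim \mathrm{Uniform}(0,1)$ and then computes
$$y_{i,j} = \exp\left( r_{i,j} \left( \left\lfloor \frac{\log V_i}{r_{i,j}} + \beta_{i,j} \right\rfloor - \beta_{i,j} \right) \right)$$
which is used as input to the random hash value generation.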
HistoSketch
• We propose a new method to compute $y_{i,j}$:
$$y_{i,j} = \exp(\log V_i - r_{i,j}\,\beta_{i,j})$$
• and show that this method is 1) correct and 2) scale-invariant.
Sketch creation
$$a_{i,j} = \frac{c_{i,j}}{y_{i,j}\,\exp(r_{i,j})}$$
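Computing hash values
[Figure: for a histogram $V$ with elements $i = 1, \ldots, 5$, the hash values $a_{i,j}$ are 0.7, 0.6, 0.14, 0.21, 0.41. The minimum, 0.14 at $i = 3$, gives the sketch element $S_j = 3$ and the corresponding hash value $A_j = 0.14$.]
1. compute $y_{i,j}$
2. compute hash value $a_{i,j}$
3. set $S_j = \arg\min_{i \in \mathcal{E}} a_{i,j}$
4. set $A_j = \min_{i \in \mathcal{E}} a_{i,j}$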
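The four steps can be sketched as follows, using the HistoSketch formulas for $y_{i,j}$ and $a_{i,j}$. This is an illustration under stated assumptions, not the authors' implementation: the names `element_params`, `hash_value` and `create_sketch` are hypothetical, and seeding one generator per (element, position) pair is an assumption made here so that different histograms share the same $r_{i,j}$, $c_{i,j}$, $\beta_{i,j}$.

```python
import math
import random


def element_params(element, j, seed=0):
    """Reproducibly draw (r, c, beta) for element i and sketch position j.

    Seeding per (seed, j, element) is an assumption of this sketch; it makes
    the same random variables available to every histogram.
    """
    rng = random.Random(f"{seed}:{j}:{element}")
    r = rng.gammavariate(2.0, 1.0)  # r ~ Gamma(2, 1)
    c = rng.gammavariate(2.0, 1.0)  # c ~ Gamma(2, 1)
    beta = rng.random()             # beta ~ Uniform(0, 1)
    return r, c, beta


def hash_value(weight, r, c, beta):
    """Steps 1 and 2: y = exp(log V_i - r * beta), then a = c / (y * exp(r))."""
    y = math.exp(math.log(weight) - r * beta)
    return c / (y * math.exp(r))


def create_sketch(histogram, K, seed=0):
    """Steps 3 and 4: for each position j keep the argmin element and the min hash value."""
    S, A = [], []
    for j in range(K):
        best_i, best_a = None, math.inf
        for i, v in histogram.items():
            if v <= 0:
                continue  # elements with zero weight cannot be sampled
            a = hash_value(v, *element_params(i, j, seed))
            if a < best_a:
                best_i, best_a = i, a
        S.append(best_i)
        A.append(best_a)
    return S, A


# Made-up histogram for illustration.
V = {"bar": 0.7, "museum": 0.2, "university": 0.1}
S, A = create_sketch(V, K=50)
```

Multiplying every weight by the same constant divides every $a_{i,j}$ by that constant, so the argmin, and with it the sketch, stays the same. This is the scale invariance that the incremental update relies on.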
HistoSketch
Incremental Sketch Update
Computation of sketch $S(t+1)$ relies only on $S(t)$ (with its corresponding hash values
$A(t)$), an incoming element $x_{t+1}$ and the weight decay factor $\lambda$.
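[Figure: incremental update of one sketch element over elements $i = 1, \ldots, 5$.]
Step I. Scale $V(t)$ by $e^{-\lambda}$: the original hash value $A_j(t) = 0.14$ of sketch element $S_j(t) = 3$ becomes the adjusted hash value $A_j(t)/e^{-\lambda} = 0.14 \times 1/e^{-\lambda} = 0.147$.
Step II. Add $x_{t+1}$: compute the hash value for the incoming element, here $i = 2$ with $a_{2,j} = 0.142$.
Step III. Update sketch: take the minimum of the adjusted and the new hash value, giving sketch element $S_j(t+1) = 2$ with hash value $A_j(t+1) = 0.142$.
1. scale existing elements in $A$
2. add $i'$ to histogram
3. recompute $a_{i',j}$
4. update sketch $S_j$ and hash values $A_j$ with minimum $a_j$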
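The update steps can be sketched as below. This is an illustration under assumptions rather than the authors' code: the helper names are hypothetical, the per-(element, position) seeding is an assumption, and the histogram is kept in a plain dictionary here for simplicity.

```python
import math
import random


def element_params(element, j, seed=0):
    """Reproducibly draw (r, c, beta) for an element and a sketch position (assumed seeding)."""
    rng = random.Random(f"{seed}:{j}:{element}")
    return rng.gammavariate(2.0, 1.0), rng.gammavariate(2.0, 1.0), rng.random()


def hash_value(weight, r, c, beta):
    """a = c / (y * exp(r)) with y = exp(log V_i - r * beta)."""
    y = math.exp(math.log(weight) - r * beta)
    return c / (y * math.exp(r))


def update_sketch(S, A, V, element, weight, lam, seed=0):
    """Apply one incoming (element, weight) pair to S, A and V in place."""
    decay = math.exp(-lam)
    # Step I: scaling V by e^{-lam} divides every hash value by e^{-lam}.
    for i in V:
        V[i] *= decay
    for j in range(len(A)):
        A[j] /= decay
    # Step II: add the incoming element to the histogram.
    V[element] = V.get(element, 0.0) + weight
    # Step III: only the incoming element's hash value changed, so compare
    # it against the adjusted minimum at every sketch position.
    for j in range(len(S)):
        a_new = hash_value(V[element], *element_params(element, j, seed))
        if a_new < A[j]:
            S[j], A[j] = element, a_new


# An empty sketch of length K, then a made-up stream of three items.
K = 20
S, A, V = [None] * K, [math.inf] * K, {}
for x, w in [("bar", 1.0), ("museum", 1.0), ("bar", 1.0)]:
    update_sketch(S, A, V, x, w, lam=0.05)
```

Each update computes K new hash values for the incoming element only; the hash values of all other elements are never recomputed.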
Experimental Evaluation
• Classification task
• Given labeled streaming histograms, classify those histogram instances
without label
• KNN classifier takes data in the form of sketches for classification with 𝐾 = 5
• KNN takes most up-to-date training data for classification from continuously
updated sketches
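A minimal sketch of such a classifier is shown below, assuming the neighbours are ranked by the fraction of matching sketch positions and the label is chosen by majority vote. The function names and the toy training data are hypothetical.

```python
from collections import Counter


def sketch_similarity(S_a, S_b):
    """Fraction of positions at which two equal-length sketches agree."""
    return sum(1 for s_a, s_b in zip(S_a, S_b) if s_a == s_b) / len(S_a)


def knn_classify(query, labeled_sketches, k=5):
    """Return the majority label among the k sketches most similar to the query."""
    ranked = sorted(
        labeled_sketches,
        key=lambda pair: sketch_similarity(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up training sketches of length 4 with their labels.
training = [
    ([1, 2, 3, 4], "bar"),
    ([1, 2, 3, 5], "bar"),
    ([1, 2, 9, 9], "bar"),
    ([7, 7, 7, 7], "museum"),
    ([7, 7, 7, 8], "museum"),
    ([7, 7, 8, 8], "museum"),
]
label = knn_classify([1, 2, 3, 4], training, k=5)
```

Since the training sketches are updated continuously, the list passed to `knn_classify` would hold the most recent sketch of each labeled stream at classification time.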
Experimental Evaluation
• Synthetic dataset
• Generated from two Gaussian distributions representing two classes
• Simulate data streams with concept drift
• Abrupt: one stream starts to receive all elements from the other distribution
• Gradual: one stream starts to receive elements from the other distribution with
increasing probability, and the labels change
• Criteria:
1. How well is the similarity approximated? (impact of sketch length K)
2. How fast can it adapt to concept drift? (impact of weight decay factor λ)
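The two drift types could be simulated along the following lines. Everything specific here is an assumption for illustration: the slides give neither the Gaussian parameters nor the discretization or the drift schedule, so the means, the rounding to integer bins and the linear probability ramp are made up, as is the name `drifting_stream`.

```python
import random


def drifting_stream(n, drift_start, mode, rng, mu_own=0.0, mu_other=5.0, sigma=1.0):
    """Yield n histogram elements (Gaussian samples rounded to integer bins).

    Before drift_start every element comes from the stream's own distribution.
    Afterwards, "abrupt" draws everything from the other distribution, while
    "gradual" draws from it with a probability that grows linearly to 1.
    """
    for t in range(n):
        if t < drift_start:
            p_other = 0.0
        elif mode == "abrupt":
            p_other = 1.0
        elif mode == "gradual":
            p_other = (t - drift_start + 1) / (n - drift_start)
        else:
            raise ValueError("mode must be 'abrupt' or 'gradual'")
        mu = mu_other if rng.random() < p_other else mu_own
        yield round(rng.gauss(mu, sigma))


abrupt = list(drifting_stream(1000, 500, "abrupt", random.Random(0)))
gradual = list(drifting_stream(1000, 500, "gradual", random.Random(1)))
```

Passing a seeded `random.Random` instance keeps the simulated streams reproducible across runs.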
Experimental Evaluation
1. Impact of sketch length K
• Fix 𝜆 = 0.02 and vary
𝐾 = [20, 50, 100, 200, 500, 1000]
• Compare against two methods that
retain the full histograms:
• Histogram-Classical with unweighted
elements
• Histogram-Forgetting with gradual
forgetting weights
• A sketch length of 𝐾 = 500 is
sufficient to approximate Histogram-
Forgetting
Experimental Evaluation
2. Impact of weight decay factor λ
• Fix 𝐾 = 100 and vary
𝜆 = [0, 0.005, 0.01, 0.02, 0.05, 0.1]
• Compare against Histogram-LatestK
which builds a histogram from the
latest 𝐾 = 100 elements in the
stream (unweighted)
• Similarity computation time:
• HistoSketch: 13ms
• Histogram-LatestK: 133ms
Experimental Evaluation
• POI dataset
• Infer a place’s category from its customers’ visiting pattern
• Foursquare dataset: user check-ins for two years from NYC, TKY, IST
• Data: user-time visit pairs discretized to the 168 hours in a week
• Compared methods:
• Histogram-Coarse: discretized time slots are considered as histogram elements
• Histogram-Fine-Classical: user-time pairs are considered as histogram elements
• Histogram-Fine-LatestK: only latest K histogram elements
• Histogram-Fine-Forgetting: gradual forgetting weights (𝜆 = 0.01)
• POISketch: unweighted sketching method that approximates Histogram-Fine-Classical
• HistoSketch: approximates Histogram-Fine-Forgetting (𝜆 = 0.01)
• Fix 𝐾 = 100
Experimental Evaluation
Classification accuracy
Experimental Evaluation
Runtime performance: classification time
Conclusion
• We introduced HistoSketch, an efficient similarity-preserving sketching method
for streaming histograms with concept drift.
• We demonstrated the effectiveness in approximating normalized min-max
similarity.
• We use incremental updates to the sketches with gradual forgetting to adapt to
concept drift.
• We showed on both synthetic and real-world data sets that this method
effectively and efficiently approximates similarity and adapts to concept drift.
• We observed a speed-up of 7500x on classification with a small loss of accuracy
of around 3.5%.
Thank you!
Backup: Histogram Similarity
• Min-max similarity
$$\mathrm{SIM}_{MM}(V^a, V^b) = \frac{\sum_{i \in \mathcal{E}} \min(V_i^a, V_i^b)}{\sum_{i \in \mathcal{E}} \max(V_i^a, V_i^b)}$$
• …normalized: sum-to-one normalization
$$\sum_{i \in \mathcal{E}} V_i^a = 1, \quad \sum_{i \in \mathcal{E}} V_i^b = 1$$
• The collision probability between two sketches $S^a$, $S^b$ is exactly the
normalized min-max similarity between $V^a$, $V^b$
$$\Pr[S_j^a = S_j^b] = \mathrm{SIM}_{NMM}(V^a, V^b)$$
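For reference, both similarity measures can be computed exactly on full histograms as in the following sketch. Histograms are represented as dictionaries from element to weight, which is an assumption of this example, as are the function names and the toy values.

```python
def min_max_similarity(V_a, V_b):
    """Sum of element-wise minima divided by sum of element-wise maxima."""
    elements = set(V_a) | set(V_b)
    numerator = sum(min(V_a.get(i, 0.0), V_b.get(i, 0.0)) for i in elements)
    denominator = sum(max(V_a.get(i, 0.0), V_b.get(i, 0.0)) for i in elements)
    if denominator == 0:
        raise ValueError("at least one histogram must have positive weight")
    return numerator / denominator


def normalize(V):
    """Sum-to-one normalization of a histogram."""
    total = sum(V.values())
    if total <= 0:
        raise ValueError("histogram must have positive total weight")
    return {i: v / total for i, v in V.items()}


def normalized_min_max_similarity(V_a, V_b):
    """Min-max similarity of the two sum-to-one normalized histograms."""
    return min_max_similarity(normalize(V_a), normalize(V_b))


# Made-up histograms for illustration.
V_a = {"bar": 2.0, "museum": 2.0}
V_b = {"bar": 3.0}
sim_mm = min_max_similarity(V_a, V_b)
sim_nmm = normalized_min_max_similarity(V_a, V_b)
```

This exact computation touches every distinct element of both histograms, which is the cost that comparing fixed-size sketches avoids.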
Backup: HistoSketch Implementation
• The former histogram $V(t)$ is required to compute $V(t+1)$
• The previous histogram is maintained in a modified count-min sketch $Q$
• We extend the count-min sketch with decay weights by scaling all counters: $Q(t) \cdot e^{-\lambda}$
• Parameter configuration: $d = 10$, $g = 50$
guarantees an error of at most 4% with probability 0.999
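A count-min sketch with decayed counters might look like the following. This is a sketch under assumptions: reading $d$ as the number of hash rows and $g$ as the number of counters per row is an interpretation, and the class name `DecayedCountMin` and the SHA-256-based row hashes are choices made for this example. Under that reading, the stated figures are consistent with the usual count-min bound of an overestimate of at most $2/g = 4\%$ of the total weight with probability at least $1 - 2^{-d} \approx 0.999$.

```python
import hashlib
import math


class DecayedCountMin:
    """Count-min sketch whose counters are all scaled by e^{-lam} before each update."""

    def __init__(self, d=10, g=50, lam=0.0):
        self.d, self.g, self.lam = d, g, lam
        self.table = [[0.0] * g for _ in range(d)]

    def _column(self, row, item):
        # One hash function per row, derived from SHA-256 (an assumption of this sketch).
        digest = hashlib.sha256(f"{row}:{item!r}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.g

    def add(self, item, weight=1.0):
        decay = math.exp(-self.lam)
        for row in self.table:
            for col in range(self.g):
                row[col] *= decay  # Q(t) * e^{-lam}
        for r in range(self.d):
            self.table[r][self._column(r, item)] += weight

    def estimate(self, item):
        """Decayed cumulative weight of item; collisions can only increase it."""
        return min(self.table[r][self._column(r, item)] for r in range(self.d))


Q = DecayedCountMin(d=10, g=50, lam=0.1)
Q.add("bar")
Q.add("museum")
bar_weight = Q.estimate("bar")
```

Scaling all $d \times g$ counters on every update keeps the code close to the description above; a single global scale factor applied lazily would be a possible optimization.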
Backup: Experimental Evaluation
Classification accuracy over time
Backup: Future Work
• The way to compute $a_{i,j}$ can be further simplified
• Applications to other domains: e.g., recommendation, community
detection
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 

HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift

  • 1. HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift. Dingqi Yang*, Bin Li†, Laura Rettig*, Philippe Cudré-Mauroux*. *eXascale Infolab, University of Fribourg, Switzerland; †School of Computer Science, Fudan University, Shanghai, China
  • 2. What kind of location is this? Places I’ve been: Bar, University, Museum, Supermarket. [Figure: the unknown location and the known places are represented as numeric vectors, such as (0.7, 0.6, 0.14, 0.21, 0.41, 0.63) and (0.64, 0.65, 0.21, 0.86, 0.24, 0.82), which are compared by computing their similarity.]
  • 3. Motivation • Histogram similarity: foundation for many machine learning tasks • Cardinality of histograms over data streams continuously increases • Similarity-preserving data sketches: compact and fixed-size, preserve similarity under a certain measure, and are incrementally updatable • Concept drift: the distribution of a histogram changes over time • Taking concept drift into account can improve the accuracy of histogram-based similarity techniques • Typical method: gradual forgetting
  • 4. Background • Given a data stream of incoming elements $x_t$, each with a weight $w_t$, we compute a histogram $V$ such that $V_i$ is the weighted cumulative count of element $i$. [Figure: streaming histogram elements ..., $x_{t-2}$, $x_{t-1}$, $x_t$ with weights $w_t$, and the corresponding histogram $V$.]
  • 5. Problem Formulation • Create and maintain the similarity-preserving sketch $S$ for the full streaming histogram $V$ such that • each sketch has a fixed size $K$ ($K \ll |\mathcal{E}|$); • the collision probability between two sketches $S^a$ and $S^b$ is the normalized min-max similarity between the histograms $V^a$ and $V^b$, so the Hamming distance between $S^a$ and $S^b$ approximates $Sim_{NMM}(V^a, V^b)$; • the sketch $S(t+1)$ can be efficiently computed from the incoming histogram element $x_{t+1}$, $S(t)$, and a weight decay factor $\lambda$. [Figure: a new element $x_{t+1}$ is received and $S(t+1)$ is obtained by incremental updating from $x_{t+1}$, $S(t)$, and $\lambda$.]
  • 6. HistoSketch • Based on the idea of consistent weighted sampling • Generate samples such that the probability of drawing identical samples from two vectors is equal to their min-max similarity. • The method draws three random variables $r_{i,j} \sim Gamma(2,1)$, $c_{i,j} \sim Gamma(2,1)$, $\beta_{i,j} \sim Uniform(0,1)$ and then computes $y_{i,j} = \exp\left(r_{i,j}\left(\left\lfloor \frac{\log V_i}{r_{i,j}} + \beta_{i,j} \right\rfloor - \beta_{i,j}\right)\right)$, which is used as input to the random hash value generation.
  • 7. HistoSketch • We propose a new method to compute $y_{i,j}$: $y_{i,j} = \exp(\log V_i - r_{i,j}\beta_{i,j})$ • and show that this method is 1) correct and 2) scale-invariant. • Sketch creation uses the hash value $a_{i,j} = \frac{c_{i,j}}{y_{i,j}\exp(r_{i,j})}$ • Procedure: 1. compute $y_{i,j}$ 2. compute hash value $a_{i,j}$ 3. set $S_j = \arg\min_{i \in \mathcal{E}} a_{i,j}$ 4. set $A_j = \min_{i \in \mathcal{E}} a_{i,j}$ [Figure: hash values $a_{i,j}$ of 0.7, 0.6, 0.14, 0.21, 0.41 are computed for the elements $i = 1, \dots, 5$ of histogram $V$; the minimum gives sketch element $S_j = 3$ with the corresponding hash value $A_j = 0.14$.]
  • 8. HistoSketch: Incremental Sketch Update • Computation of sketch $S(t+1)$ relies only on $S(t)$ (with its corresponding hash values $A(t)$), an incoming element $x_{t+1}$, and the weight decay factor $\lambda$. • Step I: scale $V(t)$ by $e^{-\lambda}$, which adjusts each stored hash value to $A_j(t) \cdot 1/e^{-\lambda}$ (in the figure, the original hash value 0.14 of sketch element $S_j(t) = 3$ becomes 0.147) • Step II: add $x_{t+1}$ and compute the hash value for its element only (in the figure, $a_{2,j} = 0.142$) • Step III: update the sketch with the minimum (in the figure, $S_j(t+1) = 2$ with hash value $A_j(t+1) = 0.142$) • Procedure: 1. scale the existing elements in $A$ 2. add $i'$ to the histogram 3. recompute $a_{i',j}$ 4. update sketch $S_j$ and hash value $A_j$ with the minimum $a_j$
  • 9. Experimental Evaluation • Classification task • Given labeled streaming histograms, classify the unlabeled histogram instances • A KNN classifier with $K = 5$ takes the data in the form of sketches • KNN uses the most up-to-date training data from the continuously updated sketches
  • 10. Experimental Evaluation • Synthetic dataset • Generated from two Gaussian distributions representing two classes • Simulate data streams with concept drift • Abrupt: one stream starts to receive all elements from the other distribution • Gradual: one stream starts to receive elements from the other distribution with increasing probability, and the labels change • Criteria: 1. How well is the similarity approximated? (impact of sketch length $K$) 2. How fast can it adapt to concept drift? (impact of weight decay factor $\lambda$)
  • 11. Experimental Evaluation 1. Impact of sketch length $K$ • Fix $\lambda = 0.02$ and vary $K \in \{20, 50, 100, 200, 500, 1000\}$ • Compare against two methods that retain the full histograms: • Histogram-Classical with unweighted elements • Histogram-Forgetting with gradual forgetting weights • A sketch length of $K = 500$ is sufficient to approximate Histogram-Forgetting
  • 12. Experimental Evaluation 2. Impact of weight decay factor $\lambda$ • Fix $K = 100$ and vary $\lambda \in \{0, 0.005, 0.01, 0.02, 0.05, 0.1\}$ • Compare against Histogram-LatestK, which builds a histogram from the latest $K = 100$ elements in the stream (unweighted) • Similarity computation time: • HistoSketch: 13 ms • Histogram-LatestK: 133 ms
  • 13. Experimental Evaluation • POI dataset • Infer a place’s category from its customers’ visiting pattern • Foursquare dataset: user check-ins for two years from NYC, TKY, IST • Data: user-time visit pairs discretized to the 168 hours in a week • Compared methods: • Histogram-Coarse: discretized time slots are considered as histogram elements • Histogram-Fine-Classical: user-time pairs are considered as histogram elements • Histogram-Fine-LatestK: only the latest $K$ histogram elements • Histogram-Fine-Forgetting: gradual forgetting weights ($\lambda = 0.01$) • POISketch: unweighted sketching method that approximates Histogram-Fine-Classical • HistoSketch: approximates Histogram-Fine-Forgetting ($\lambda = 0.01$) • Fix $K = 100$
  • 16. Conclusion • We introduced HistoSketch, an efficient similarity-preserving sketching method for streaming histograms with concept drift. • We demonstrated its effectiveness in approximating normalized min-max similarity. • We use incremental updates to the sketches with gradual forgetting to adapt to concept drift. • We showed on both synthetic and real-world datasets that this method effectively and efficiently approximates similarity and adapts to concept drift. • We observed a speed-up of 7500x on classification with a small loss of accuracy of around 3.5%. Thank you!
  • 17. Backup: Histogram Similarity • Min-max similarity: $Sim_{MM}(V^a, V^b) = \frac{\sum_{i \in \mathcal{E}} \min(V_i^a, V_i^b)}{\sum_{i \in \mathcal{E}} \max(V_i^a, V_i^b)}$ • Normalized: sum-to-one normalization, $\sum_{i \in \mathcal{E}} V_i^a = 1$, $\sum_{i \in \mathcal{E}} V_i^b = 1$ • The collision probability between two sketches $S^a$, $S^b$ is exactly the normalized min-max similarity between $V^a$, $V^b$: $\Pr[S_j^a = S_j^b] = Sim_{NMM}(V^a, V^b)$
  • 18. Backup: HistoSketch Implementation • The previous histogram $V(t)$ is required to compute $V(t+1)$ • The previous histogram is maintained in a modified count-min sketch $Q$ • We extend the count-min sketch with decay weights by scaling all counters: $Q(t) \cdot e^{-\lambda}$ • Parameter configuration: $d = 10$, $g = 50$ guarantees an error of at most 4% with probability 0.999
  • 20. Backup: Future Work • The way to compute $a_{i,j}$ can be further simplified • Applications to other domains: e.g., recommendation, community detection
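
The sketch-creation steps on slides 6 and 7 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`random_vars`, `hash_value`, `create_sketch`, `estimate_similarity`, `normalized_min_max`), the dictionary representation of a histogram, and the way the random variables are derived from a generator seeded per (element, position) pair are all assumptions made for illustration. Only the formulas for $y_{i,j}$ and $a_{i,j}$, the argmin/min steps, and the definition of normalized min-max similarity are taken from the slides.

```python
import math
import random


def random_vars(i, j, seed=0):
    """Draw (r, c, beta) for element i and sketch position j.

    Seeding by (seed, i, j) is an assumption of this example: it makes the
    draws identical for every histogram, which consistent sampling needs.
    """
    rng = random.Random(f"{seed}:{i}:{j}")
    r = rng.gammavariate(2.0, 1.0)
    c = rng.gammavariate(2.0, 1.0)
    beta = rng.uniform(0.0, 1.0)
    return r, c, beta


def hash_value(i, j, weight, seed=0):
    """a_ij = c_ij / (y_ij * exp(r_ij)), with y_ij = exp(log V_i - r_ij * beta_ij)."""
    r, c, beta = random_vars(i, j, seed)
    y = math.exp(math.log(weight) - r * beta)
    return c / (y * math.exp(r))


def create_sketch(hist, K, seed=0):
    """Return (S, A) for a non-empty histogram {element: positive weight}."""
    S, A = [], []
    for j in range(K):
        # Minimum hash value over all elements, together with its element.
        a_min, i_min = min(
            (hash_value(i, j, w, seed), i) for i, w in hist.items() if w > 0
        )
        S.append(i_min)
        A.append(a_min)
    return S, A


def estimate_similarity(S_a, S_b):
    """Fraction of positions where two equal-length sketches collide."""
    return sum(a == b for a, b in zip(S_a, S_b)) / len(S_a)


def normalized_min_max(hist_a, hist_b):
    """Min-max similarity after sum-to-one normalization of both histograms."""
    total_a, total_b = sum(hist_a.values()), sum(hist_b.values())
    num = den = 0.0
    for k in set(hist_a) | set(hist_b):
        va = hist_a.get(k, 0.0) / total_a
        vb = hist_b.get(k, 0.0) / total_b
        num += min(va, vb)
        den += max(va, vb)
    return num / den
```

To try it, build sketches of the same length `K` for two histograms with `create_sketch` and compare `estimate_similarity` on the sketches with `normalized_min_max` on the full histograms.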

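The decayed count-min sketch mentioned on slide 18 can be illustrated in the same hedged way. The class name `DecayingCountMin`, its methods, and the SHA-256-based column hashing are hypothetical choices for this example; only the idea of scaling all counters by $e^{-\lambda}$ and the configuration $d = 10$, $g = 50$ come from the slide, and reading $d$ as the number of rows and $g$ as the number of columns is an assumption.

```python
import hashlib
import math


class DecayingCountMin:
    """Count-min sketch whose counters are scaled by exp(-lam) on each arrival."""

    def __init__(self, d=10, g=50):
        self.d, self.g = d, g  # assumed: d rows (hash functions), g columns
        self.Q = [[0.0] * g for _ in range(d)]

    def _col(self, row, item):
        # One deterministic hash per row, derived from SHA-256 (an assumption).
        digest = hashlib.sha256(f"{row}:{item}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.g

    def add(self, item, lam, weight=1.0):
        """Decay every counter, then add the new element's weight."""
        decay = math.exp(-lam)
        for row in range(self.d):
            self.Q[row] = [q * decay for q in self.Q[row]]
            self.Q[row][self._col(row, item)] += weight

    def query(self, item):
        """Estimated decayed count; count-min never underestimates."""
        return min(self.Q[row][self._col(row, item)] for row in range(self.d))
```

Rescaling every counter costs O(d·g) per arrival, which is small for the configuration on the slide. A lazily applied global scale factor would avoid that cost, but it is left out here to keep the example close to the description.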
Editor's Notes

  1. Overall. Highlight some words (lots of monotonous text)
  2. Just from looking at the photo, what location is this? Can’t go there, but I know the places I’ve been to. Check which of these is the most similar. Analogy: full histogram = going there (potentially high effort, costly); sketch = looking at the photo to judge similarity to known locations. Talk about concept drift.
  3. You may be familiar with common sketching techniques such as the very simple bloom filter or the count min sketch; these are not similarity-preserving
  4. V: the classical histogram is a vector of ever-growing cardinality. The weights $w_t$ are inversely proportional to the age.
  5. V: the classical histogram is a vector of ever-growing cardinality. K is significantly smaller than the actual cardinality $|\mathcal{E}|$.
  6. Refer to the paper for the proofs of these properties. Keep both the sketch $S$ and its corresponding hash values $A$. Input: $V$, sketch length $K$, random variables $r$, $c$, $\beta$. Output: $S$ with hash values $A$. For $j = 1, \dots, K$: compute $y_{i,j}$; compute $a_{i,j}$ using $y_{i,j}$; set sketch element $S_j = \arg\min_{i \in \mathcal{E}} a_{i,j}$; set hash value $A_j = \min_{i \in \mathcal{E}} a_{i,j}$.
  7. Sum-to-one normalization ~ uniform scaling => we can scale only $A$ and still be correct. Upon arrival of $x_{t+1}$: the weights of existing histogram elements are adjusted by a factor of $e^{-\lambda}$ by scaling $A$. Add the incoming element $i'$ with weight 1 to the scaled histogram. Recompute $a_{i',j}$. Then $S_j(t+1) = i'$ if $a_{i',j} < A_j(t) \cdot e^{\lambda}$, else $S_j(t)$; and $A_j(t+1) = a_{i',j}$ if $a_{i',j} < A_j(t) \cdot e^{\lambda}$, else $A_j(t) \cdot e^{\lambda}$.
  8. Approximation = overall accuracy. Adaptation to concept drift = accuracy recovery speed after concept drift.
  9. Histogram-Classical adapts very slowly. Histogram-Forgetting is fast to adapt. HistoSketch adapts to concept drift just as quickly as Histogram-Forgetting. Sketch length K has no impact on adaptation speed, but has a positive impact on the classification accuracy: larger K → longer sketch → more accurate, as longer sketches better approximate the original histogram. There is no big difference beyond K=500, though, which is almost equal to Histogram-Forgetting.
  10. Trade-off between concept drift adaptation speed and accuracy: a high weight decay factor means quicker recovery, as outdated data is quickly forgotten, but overall lower accuracy, as less information from former histograms is used. We observe the same trade-off in Histogram-LatestK, with its adaptation being slower than λ = 0.05 and faster than λ = 0.02, but also its accuracy being higher than λ = 0.05 and lower than λ = 0.02. Advantages of HistoSketch: it can balance adaptation speed and accuracy, and it is much faster for similarity computation (relies only on Hamming distance).
  11. Reminder: what is a POI. Different places typically have different temporal visiting patterns. Fine-grained patterns: user+time instead of only time gives more accuracy. POI abrupt change: e.g. change of type of POI (clothing store to art gallery). POI gradual change: e.g. small change in POI (new menu items at a restaurant). Split into 80% train and 20% unlabeled test. Histogram-Coarse: visitor count per time slot, no information on user. POISketch = HistoSketch with decay factor 0.
  12. Histogram-Coarse: worst accuracy (to be expected). Histogram-Fine-Forgetting: highest accuracy. HistoSketch outperforms POISketch due to gradual forgetting weights. HistoSketch: small loss in accuracy against Histogram-Fine-Forgetting.
  13. HistoSketch presents a speedup against Forgetting of about 7500x, since the Hamming distance can be computed much more efficiently than normalized min-max similarity between full histograms. HistoSketch also takes much less memory to maintain. Longer sketches = slightly higher processing time, but still much less than other methods. Can handle real-world scenarios: Foursquare peaks at 7 million check-ins per day, which is 81 check-ins per second. The method is also parallelizable, as sketches for different POIs can be independently maintained.
  14. TODO: Add normalization
  15. Maybe remove this slide. Accuracy increases over time with more information. POISketch approximates Classical, HistoSketch approximates Forgetting => higher accuracy. Larger improvement in the presence of abrupt drift: HistoSketch is more accurate at handling concept drift than Classical by approximating Forgetting. No sudden drop in accuracy, as POIs don’t change their type simultaneously.
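
The incremental update described in note 7 can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: the names `hash_value`, `create_sketch`, and `update` are hypothetical, the random variables are derived from a generator seeded per (element, position) pair, and the histogram is kept in a plain dictionary rather than in the count-min sketch used in the talk.

```python
import math
import random


def hash_value(i, j, weight, seed=0):
    """a_ij = c_ij / (y_ij * exp(r_ij)), with y_ij = exp(log V_i - r_ij * beta_ij)."""
    rng = random.Random(f"{seed}:{i}:{j}")  # assumed source of r, c, beta
    r = rng.gammavariate(2.0, 1.0)
    c = rng.gammavariate(2.0, 1.0)
    beta = rng.uniform(0.0, 1.0)
    y = math.exp(math.log(weight) - r * beta)
    return c / (y * math.exp(r))


def create_sketch(hist, K, seed=0):
    """Build (S, A) from scratch for a non-empty histogram {element: weight}."""
    pairs = [
        min((hash_value(i, j, w, seed), i) for i, w in hist.items())
        for j in range(K)
    ]
    return [i for _, i in pairs], [a for a, _ in pairs]


def update(hist, S, A, new_elem, lam, seed=0):
    """Apply one incoming element to hist, S and A in place."""
    decay = math.exp(-lam)
    for i in hist:                      # decay the existing weights
        hist[i] *= decay
    hist[new_elem] = hist.get(new_elem, 0.0) + 1.0  # add i' with weight 1
    for j in range(len(S)):
        A[j] *= math.exp(lam)           # scaled hash value A_j(t) * e^lambda
        a_new = hash_value(new_elem, j, hist[new_elem], seed)
        if a_new < A[j]:                # only i' can produce a new minimum
            S[j], A[j] = new_elem, a_new
```

Because every hash value is inversely proportional to the element's weight, decaying all weights by $e^{-\lambda}$ multiplies every stored minimum by $e^{\lambda}$ without changing which element attains it, so only the incoming element has to be rehashed.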