SlideShare a Scribd company logo
1 of 43
Download to read offline
Two Strategies for Large-Scale
Multi-label Classification on
the YouTube-8M Dataset
Author: Dalei Li

Supervisor: Professor Luis Baumela

Technical University of Madrid
1
YouTube-8M Video
Understanding Challenge
https://www.kaggle.com/c/youtube8m
2
YouTube-8M
•7 million videos

• 120 ~ 500 s

• 1000+ reviews

•4716 topics

• Visually recognizable

•3.4 topics per video
Electric guitar
3
Frame-level Dataset
• Extracted 1024-dimensional visual
features using GoogLeNet (Inception-
V3), per frame per second

• Extracted 128-dimensional audio
features from a CNN architecture for
audio classification, per second
4
Video-level Dataset
• Average the visual and audio features
over the frames of a video

(1024 + 128)-dimensional vector
Train
(70%)
Validation
(20%)
Test
(10%)
5
Multi-label Classification
• Given the feature vector x of an
instance (video) and a label (topic) l
from L, predict its probability to have l
6
Architecture to Learn A
Classification Model
7
Challenges
• Labels are not mutually exclusive

• One-vs-rest strategy

• Huge number of instances

• Streaming instances

• Incremental learning
8
One-vs-rest Strategy
• Learn a binary classification model
for each label l from L, i.e., P(l=1|x)

• The instances that have label l are
positive, the other instances are
negative
9
Streaming Instances
A B
Truck
Training
set Process Model
The goods can be split. The
process can be applied to t
instances one by one.
10
Incremental Learning
Improve the model from new
instances (by learning new skills in
human evolution).
11
Models
• Streaming Instances

• Multi-label k-nearest neighbor (k-NN)

• Multi-label radial basis function (RBF) network

• Incremental Learning

• One-vs-rest logistic regression

• Multi-layer neural network
12
1. Multi-label k-NN
• One-vs-rest transforms to m independent
binary classification problems

• Binary k-NN

• Assume when an instance x has label l or not its k
nearest neighbors have different characteristics

• Given an unseen instance x, observe the
characteristics of its k nearest neighbors to
determine it has l or not
13
the instances that have the same label
are usually more similar than the
instances that have different labels.
We assume the instances that are
similar tend to have the same label.

Similarity metric: Cosine similarity
Binary k-NN Idea
• Among the k nearest neighbors, the
number Cl of instances that also have
label l follow different distributions
when an instance x has label l or not

• Compute the posterior probability
P(l=1|x) using these distributions
14
The assumption is formulated to
Binary k-NN Model
• Prior distribution - without looking at an
instance x, the probability distribution
its label l follows

• Likelihood - knowing the label of x, the
distribution of the number Cl follows
• Posterior distribution - knowing the
number Cl of x, the distribution l follows
15
Implementation
• Prior distribution - while streaming the training
instances into a sequence of batches,
compute the sum of label and count

• Likelihood - compute Cl for each training
instance with (without) label l and compute the
proportion of possible each value, i.e., 0,…,k
• Posterior distribution - compute based on the
prior distribution and likelihoods
16
Implementation
17
The Tensorflow graph to find k nearest neighbors for a batch of training instances
2. Logistic Regression
18
A logistic regression model naturally
outputs the posterior probability.
3. Multi-label RBF Network
19
RBF network
An RBF network was proposed to fit a
real-valued function in a high-
dimensional space with a group of
radial basis functions.

In the known data points, the
interpolation value should equal the
desired value. In essence, it is linear
regression in the transformed space
Multi-label RBF Network
20
In essence, it is linear
regression in the transformed
space.
Three-phase Learning
21
RBF parameters (centers and scaling
factors) can be tunable. The linear
regression result is used to initialize
the logistic regression models.
Implementation
22
• Combine ML-RBF network proposed
by Zhang et al. and three-phase
learning by Schwenker et al. …

• The binary logistic regression models
share the same group of centers

• Replace squared error in fine-tuning
with cross entropy
4. Multi-layer Neural
Network
23
RBF transformations are replaced
with activation functions, such as
sigmoid, rectified and tanh.
Experimental result
• Evaluation metric: global average precision

• ML-kNN

• Logistic regression

• Multi-label RBF network

• Multi-layer neural network
24
1. Global Average Precision
• Binary classification

• Precision = true positive / (true positive + false positive)

• Recall = true positive / (true positive + false negative)

• Average precision

• Global average precision

• Concatenate the predictions for the test set and sort them based on the
posterior probability and compute the average precision
25
Over
2. ML-kNN - model
26
Prior probability (smooth parameter s=1.0)
Likelihood - first 30 labels
27
k=8, s=1.0
Likelihood - last 30 labels
28
k=8, s=1.0
Posterior probability
29
k=8, s=1.0
Tune k
30
Validation GAPs (s=1.0)
0.4723, 0.5805, 0.6178, 0.6497,
0.6673, 0.6802, 0.6837, 0.6805,
0.6721.
3. Logistic regression -
standard scale
31
Features mean and variance in the training set
Result
32
Comparison among different logistic regression models
Case
Initialization
method
Standard
scale
Misc Best validation GAP
1 random no N/A 0.6554
2 random yes N/A 0.5882
3 linear regression no N/A 0.6555
4 linear regression yes N/A 0.5885
5 random no weighted classes 0.5560
6 random no
instances
weighting
0.6397
4. Multi-label RBF Network
33
Training and validation GAPs versus training steps
(proportion of centers 1e-5, 5e-5, 1e-4, 1e-3, 5e-3)
GAPs are 0.46, 0.60, 0.64, 0.68, 0.71,
respectively.
Centers selection and fine-
tuning
34
Case #Centers Mean distance Weights initialization Misc Validation GAP
1 3753 (k-means) 0.4844 truncated normal N/A 0.6827
2 4862 (random) N/A truncated normal N/A 0.6243
3 3859 (k-means) 0.4839 truncated normal standard scale 0.7235
4 3919 (k-means) 0.4832 linear regression fine tuning 0.7769
5. Multi-layer Neural
Network
35
Architecture Hidden_2 layer
Training and validation
36
Dropout and Bagging
• Applying dropout improves the
performance slightly

• Bagging of six neural networks
achieves the private GAP 0.801
(ranked at 111 / 650 teams)
37
Comparison with other
methods
• Using frame-level data, 1.7 TB

• Concatenate raw features to hidden
layer activations (residual network)

• Explore labels dependency

• Ensemble much more models and the
same model at different training steps
38
Summary
• Implemented ML-kNN that supports tuning k (passing a list of k)

• Implemented k-means that supports removing empty or small clusters during each iteration
and supports Cosine and Euclidean distance

• Implemented linear regression that does not require data to be shifted and supports tuning l2
regularization (passing a list of l2
reg. rates)

• Implemented one-vs-rest logistic regression that supports class weights and instance
weights. More importantly, it supports any feature transformation implemented in Tensorflow

• Implemented stable standard scale transformation that supports any feature transformation

• Implemented multi-label RBF network that supports three-phase learning and experimented
with the hyper-parameters, such as the number of centers and scaling factors

• Implemented multi-layer neural networks and experimented with architecture, batch
normalization and dropout and bagging

• Finally, the implementations can be easily adapted to other large datasets
39
Future Work
• Make use of frame-level dataset

• Larger k in ML-kNN

• Exploit more centers and the centers from
different clustering results in RBF network

• Exploit the validation set

• Explore other ensemble methods
40
Acknowledges
• Professor Luis Baumela, Marta Patiño

• Friend Xinye Fu, Xiaofeng Liu

• The algorithms authors

• Etc.
41
References
• [1] ML-kNN,Zhang, Min-Ling and Zhou, Zhi-Hua. “ML-KNN: A lazy learning approach to multi-label
learning”. In: Pattern recognition 40.7 (2007), pp. 2038–2048.

• [2] Three-phase learning RBF network, Schwenker, Friedhelm, Kestler, Hans A, and Palm, Günther. “Three
learning phases for radial-basis-function networks”. In: Neural networks 14.4 (2001), pp. 439–458.

• [3] Multi-label RBF network, Zhang, Min-Ling. “ML-RBF: RBF neural networks for multi-label learning”. In:
Neural Processing Letters 29.2 (2009), pp. 61–74.

• [4] YouTube-8M dataset, Abu-El-Haija, Sami et al. “Youtube-8m: A large-scale video classification bench-
mark”. In: arXiv preprint arXiv:1609.08675 (2016).

• Open course Neural Networks for Machine Learning by Geoffrey Hinton, https://www.coursera.org/learn/
neural-networks

• Open course Machine Learning by Hsuan-Tien Lin, https://www.csie.ntu.edu.tw/~htlin/course/ml15fall/

• Open course Data Mining by Jia Li, http://www.personal.psu.edu/jol2/course/stat557/material.html

• Etc.
42
Thank you!
43

More Related Content

What's hot

MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MLAI2
 

What's hot (20)

Transfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyTransfer Learning in NLP: A Survey
Transfer Learning in NLP: A Survey
 
Object detection - RCNNs vs Retinanet
Object detection - RCNNs vs RetinanetObject detection - RCNNs vs Retinanet
Object detection - RCNNs vs Retinanet
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
Image Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A surveyImage Segmentation Using Deep Learning : A survey
Image Segmentation Using Deep Learning : A survey
 
Exploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation LearningExploring Simple Siamese Representation Learning
Exploring Simple Siamese Representation Learning
 
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From ScratchPR-232:  AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
PR-232: AutoML-Zero:Evolving Machine Learning Algorithms From Scratch
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
Transfer Learning (D2L4 Insight@DCU Machine Learning Workshop 2017)
 
Learning deep features for discriminative localization
Learning deep features for discriminative localizationLearning deep features for discriminative localization
Learning deep features for discriminative localization
 
Trackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity CalorimeterTrackster Pruning at the CMS High-Granularity Calorimeter
Trackster Pruning at the CMS High-Granularity Calorimeter
 
Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)Tutorial on Object Detection (Faster R-CNN)
Tutorial on Object Detection (Faster R-CNN)
 
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
D1L5 Visualization (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Deep Learning in Computer Vision
Deep Learning in Computer VisionDeep Learning in Computer Vision
Deep Learning in Computer Vision
 
Toward wave net speech synthesis
Toward wave net speech synthesisToward wave net speech synthesis
Toward wave net speech synthesis
 
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and ArchitecturesMetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
MetaPerturb: Transferable Regularizer for Heterogeneous Tasks and Architectures
 
YOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection reviewYOLOv4: optimal speed and accuracy of object detection review
YOLOv4: optimal speed and accuracy of object detection review
 
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object DetectorPR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
PR-270: PP-YOLO: An Effective and Efficient Implementation of Object Detector
 
Paraphrasing complex network
Paraphrasing complex networkParaphrasing complex network
Paraphrasing complex network
 
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
PR-373: Revisiting ResNets: Improved Training and Scaling Strategies.
 
Relational knowledge distillation
Relational knowledge distillationRelational knowledge distillation
Relational knowledge distillation
 

Similar to Two strategies for large-scale multi-label classification on the YouTube-8M dataset

Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
MLconf
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
MLconf
 

Similar to Two strategies for large-scale multi-label classification on the YouTube-8M dataset (20)

StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
 
lecture_16.pptx
lecture_16.pptxlecture_16.pptx
lecture_16.pptx
 
How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?How Machine Learning Helps Organizations to Work More Efficiently?
How Machine Learning Helps Organizations to Work More Efficiently?
 
Multi-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video SurveillanceMulti-class Classification on Riemannian Manifolds for Video Surveillance
Multi-class Classification on Riemannian Manifolds for Video Surveillance
 
Mnist soln
Mnist solnMnist soln
Mnist soln
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Mo...
CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Mo...CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Mo...
CTF: Anomaly Detection in High-Dimensional Time Series with Coarse-to-Fine Mo...
 
convolutional_rbm.ppt
convolutional_rbm.pptconvolutional_rbm.ppt
convolutional_rbm.ppt
 
Data mining with Weka
Data mining with WekaData mining with Weka
Data mining with Weka
 
Machine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University ChhattisgarhMachine Learning workshop by GDSC Amity University Chhattisgarh
Machine Learning workshop by GDSC Amity University Chhattisgarh
 
ExplainableAI.pptx
ExplainableAI.pptxExplainableAI.pptx
ExplainableAI.pptx
 
crossvalidation.pptx
crossvalidation.pptxcrossvalidation.pptx
crossvalidation.pptx
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and Innovations
 
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
Dr. Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf SEA - 5/20/16
 
Large Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate DescentLarge Scale Kernel Learning using Block Coordinate Descent
Large Scale Kernel Learning using Block Coordinate Descent
 
Mattar_PhD_Thesis
Mattar_PhD_ThesisMattar_PhD_Thesis
Mattar_PhD_Thesis
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Deep learning Tutorial - Part II
Deep learning Tutorial - Part IIDeep learning Tutorial - Part II
Deep learning Tutorial - Part II
 
General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
A new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid dataA new development in the hierarchical clustering of repertory grid data
A new development in the hierarchical clustering of repertory grid data
 

Recently uploaded

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 

Recently uploaded (20)

Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 

Two strategies for large-scale multi-label classification on the YouTube-8M dataset

  • 1. Two Strategies for Large-Scale Multi-label Classification on the YouTube-8M Dataset Author: Dalei Li Supervisor: Professor Luis Baumela Technical University of Madrid 1
  • 3. YouTube-8M •7 million videos • 120 ~ 500 s • 1000+ reviews •4716 topics • Visually recognizable •3.4 topics per video Electric guitar 3
  • 4. Frame-level Dataset • Extracted 1024-dimensional visual features using GoogLeNet (Inception- V3), per frame per second • Extracted 128-dimensional audio features from a CNN architecture for audio classification, per second 4
  • 5. Video-level Dataset • Average the visual and audio features over the frames of a video (1024 + 128)-dimensional vector Train (70%) Validation (20%) Test (10%) 5
  • 6. Multi-label Classification • Given the feature vector x of an instance (video) and a label (topic) l from L, predict its probability to have l 6
  • 7. Architecture to Learn A Classification Model 7
  • 8. Challenges • Labels are not mutually exclusive • One-vs-rest strategy • Huge number of instances • Streaming instances • Incremental learning 8
  • 9. One-vs-rest Strategy • Learn a binary classification model for each label l from L, i.e., P(l=1|x) • The instances that have label l are positive, the other instances are negative 9
  • 10. Streaming Instances A B Truck Training set Process Model The goods can be split. The process can be applied to t instances one by one. 10
  • 11. Incremental Learning Improve the model from new instances (by learning new skills in human evolution). 11
  • 12. Models • Streaming Instances • Multi-label k-nearest neighbor (k-NN) • Multi-label radial basis function (RBF) network • Incremental Learning • One-vs-rest logistic regression • Multi-layer neural network 12
  • 13. 1. Multi-label k-NN • One-vs-rest transforms to m independent binary classification problems • Binary k-NN • Assume when an instance x has label l or not its k nearest neighbors have different characteristics • Given an unseen instance x, observe the characteristics of its k nearest neighbors to determine it has l or not 13 the instances that have the same label are usually more similar than the instances that have different labels. We assume the instances that are similar tend to have the same label. Similarity metric: Cosine similarity
  • 14. Binary k-NN Idea • Among the k nearest neighbors, the number Cl of instances that also have label l follow different distributions when an instance x has label l or not • Compute the posterior probability P(l=1|x) using these distributions 14 The assumption is formulated to
  • 15. Binary k-NN Model • Prior distribution - without looking at an instance x, the probability distribution its label l follows • Likelihood - knowing the label of x, the distribution of the number Cl follows • Posterior distribution - knowing the number Cl of x, the distribution l follows 15
  • 16. Implementation • Prior distribution - while streaming the training instances into a sequence of batches, compute the sum of label and count • Likelihood - compute Cl for each training instance with (without) label l and compute the proportion of possible each value, i.e., 0,…,k • Posterior distribution - compute based on the prior distribution and likelihoods 16
  • 17. Implementation 17 The Tensorflow graph to find k nearest neighbors for a batch of training instances
  • 18. 2. Logistic Regression 18 A logistic regression model naturally outputs the posterior probability.
  • 19. 3. Multi-label RBF Network 19 RBF network An RBF network was proposed to fit a real-valued function in a high- dimensional space with a group of radial basis functions. In the known data points, the interpolation value should equal the desired value. In essence, it is linear regression in the transformed space
  • 20. Multi-label RBF Network 20 In essence, it is linear regression in the transformed space.
  • 21. Three-phase Learning 21 RBF parameters (centers and scaling factors) can be tunable. The linear regression result is used to initialize the logistic regression models.
  • 22. Implementation 22 • Combine ML-RBF network proposed by Zhang et al. and three-phase learning by Schwenker et al. … • The binary logistic regression models share the same group of centers • Replace squared error in fine-tuning with cross entropy
  • 23. 4. Multi-layer Neural Network 23 RBF transformations are replaced with activation functions, such as sigmoid, rectified and tanh.
  • 24. Experimental result • Evaluation metric: global average precision • ML-kNN • Logistic regression • Multi-label RBF network • Multi-layer neural network 24
  • 25. 1. Global Average Precision • Binary classification • Precision = true positive / (true positive + false positive) • Recall = true positive / (true positive + false negative) • Average precision • Global average precision • Concatenate the predictions for the test set and sort them based on the posterior probability and compute the average precision 25 Over
  • 26. 2. ML-kNN - model 26 Prior probability (smooth parameter s=1.0)
  • 27. Likelihood - first 30 labels 27 k=8, s=1.0
  • 28. Likelihood - last 30 labels 28 k=8, s=1.0
  • 30. Tune k 30 Validation GAPs (s=1.0) 0.4723, 0.5805, 0.6178, 0.6497, 0.6673, 0.6802, 0.6837, 0.6805, 0.6721.
  • 31. 3. Logistic regression - standard scale 31 Features mean and variance in the training set
  • 32. Result 32 Comparison among different logistic regression models Case Initialization method Standard scale Misc Best validation GAP 1 random no N/A 0.6554 2 random yes N/A 0.5882 3 linear regression no N/A 0.6555 4 linear regression yes N/A 0.5885 5 random no weighted classes 0.5560 6 random no instances weighting 0.6397
  • 33. 4. Multi-label RBF Network 33 Training and validation GAPs versus training steps (proportion of centers 1e-5, 5e-5, 1e-4, 1e-3, 5e-3) GAPs are 0.46, 0.60, 0.64, 0.68, 0.71, respectively.
  • 34. Centers selection and fine- tuning 34 Case #Centers Mean distance Weights initialization Misc Validation GAP 1 3753 (k-means) 0.4844 truncated normal N/A 0.6827 2 4862 (random) N/A truncated normal N/A 0.6243 3 3859 (k-means) 0.4839 truncated normal standard scale 0.7235 4 3919 (k-means) 0.4832 linear regression fine tuning 0.7769
  • 37. Dropout and Bagging • Applying dropout improves the performance slightly • Bagging of six neural networks achieves the private GAP 0.801 (ranked at 111 / 650 teams) 37
  • 38. Comparison with other methods • Using frame-level data, 1.7 TB • Concatenate raw features to hidden layer activations (residual network) • Explore labels dependency • Ensemble much more models and the same model at different training steps 38
  • 39. Summary • Implemented ML-kNN that supports tuning k (passing a list of k) • Implemented k-means that supports removing empty or small clusters during each iteration and supports Cosine and Euclidean distance • Implemented linear regression that does not require data to be shifted and supports tuning l2 regularization (passing a list of l2 reg. rates) • Implemented one-vs-rest logistic regression that supports class weights and instance weights. More importantly, it supports any feature transformation implemented in Tensorflow • Implemented stable standard scale transformation that supports any feature transformation • Implemented multi-label RBF network that supports three-phase learning and experimented with the hyper-parameters, such as the number of centers and scaling factors • Implemented multi-layer neural networks and experimented with architecture, batch normalization and dropout and bagging • Finally, the implementations can be easily adapted to other large datasets 39
  • 40. Future Work • Make use of frame-level dataset • Larger k in ML-kNN • Exploit more centers and the centers from different clustering results in RBF network • Exploit the validation set • Explore other ensemble methods 40
  • 41. Acknowledges • Professor Luis Baumela, Marta Patiño • Friend Xinye Fu, Xiaofeng Liu • The algorithms authors • Etc. 41
  • 42. References • [1] ML-kNN,Zhang, Min-Ling and Zhou, Zhi-Hua. “ML-KNN: A lazy learning approach to multi-label learning”. In: Pattern recognition 40.7 (2007), pp. 2038–2048. • [2] Three-phase learning RBF network, Schwenker, Friedhelm, Kestler, Hans A, and Palm, Günther. “Three learning phases for radial-basis-function networks”. In: Neural networks 14.4 (2001), pp. 439–458. • [3] Multi-label RBF network, Zhang, Min-Ling. “ML-RBF: RBF neural networks for multi-label learning”. In: Neural Processing Letters 29.2 (2009), pp. 61–74. • [4] YouTube-8M dataset, Abu-El-Haija, Sami et al. “Youtube-8m: A large-scale video classification bench- mark”. In: arXiv preprint arXiv:1609.08675 (2016). • Open course Neural Networks for Machine Learning by Geoffrey Hinton, https://www.coursera.org/learn/ neural-networks • Open course Machine Learning by Hsuan-Tien Lin, https://www.csie.ntu.edu.tw/~htlin/course/ml15fall/ • Open course Data Mining by Jia Li, http://www.personal.psu.edu/jol2/course/stat557/material.html • Etc. 42