YouTube-8M: A Large-Scale Video Classification Benchmark (UPC Reading Group)


1. The document describes the YouTube-8M dataset, which contains over 8 million YouTube videos labeled with visual entities, and explores several baseline machine learning models for multi-label video classification on it.
2. The best-performing models were deep learning models that aggregate frame-level features, such as Deep Bag of Frame pooling and LSTMs. Results are reported on the validation set and are consistent with those on the human-rated test set.
3. It also briefly introduces Google Cloud Machine Learning Engine, a cloud platform for training and deploying machine learning models at scale, which was used to train models on the YouTube-8M dataset.


YouTube-8M: A Large-Scale Video Classification
Benchmark (and Google Cloud ML Engine)
Slides by Dídac Surís
ReadAI Reading Group, UPC
13th March, 2017
Sami Abu-El-Haija, Nisarg Kothari, Joonseok Lee, Paul
Natsev, George Toderici, Balakrishnan Varadarajan,
Sudheendra Vijayanarasimhan
[arxiv] (27 Sep 2016) [web]

Index
1. YouTube-8M
a. Dataset
b. Baseline approaches
c. Results
2. Google Cloud ML Engine

YouTube-8M: Dataset
Main features
● Multi-label (average of 1.8 labels per video)
● 4800 entities (24 top-level categories)
● 8,264,650 videos
● 500K hours of video
● Only visual entities
● Remove computational barriers

YouTube-8M: Dataset
Collection
● YouTube video annotation system (metadata, context, …)
● First step: define entities
○ Human ratings to define entities (only visual ones)
○ At least 200 videos per entity
● Second step: collect videos
○ 10M randomly sampled videos
○ Discard according to several criteria
○ Split into train/validate/test

YouTube-8M: Dataset
Feature Extraction
● Roughly 50 years of video if processed in real time: impractical
● Sampling at 1 frame per second
● Frame-level feature extraction: take the ReLU activation of the last hidden layer of the Inception network trained on ImageNet
● 2048 dimensions per frame; PCA + quantization reduce the size 8x
● Audio features were also extracted later:
https://www.kaggle.com/c/youtube8m/discussion/29475
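The compression step above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the slides only state "PCA + quantization" and an 8x reduction, so the split used here (halving the dimension and storing one byte per coefficient instead of a 32-bit float) is an assumption, as are the toy sizes (64 to 32 dimensions instead of 2048) and the helper names `fit_pca`, `quantize_uint8` and `dequantize_uint8`.

```python
import numpy as np

def fit_pca(features, out_dim):
    """Fit PCA on the rows of `features`; return (mean, projection matrix)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:out_dim].T

def quantize_uint8(x, lo, hi):
    """Uniformly map values in [lo, hi] to 8-bit codes (outliers are clipped)."""
    scaled = (np.clip(x, lo, hi) - lo) / (hi - lo)
    return np.round(scaled * 255).astype(np.uint8)

def dequantize_uint8(codes, lo, hi):
    """Approximate inverse of quantize_uint8."""
    return codes.astype(np.float32) / 255 * (hi - lo) + lo

# Toy stand-in for per-frame activations (hypothetical data and sizes).
rng = np.random.default_rng(0)
frames = rng.standard_normal((500, 64)).astype(np.float32)

mean, proj = fit_pca(frames, out_dim=32)
reduced = (frames - mean) @ proj              # shape (500, 32)
lo, hi = float(reduced.min()), float(reduced.max())
codes = quantize_uint8(reduced, lo, hi)       # one byte per coefficient

# Half the dimensions times a quarter of the bytes per value.
ratio = frames.nbytes / codes.nbytes
print(ratio)
```

Halving the dimension and moving from 4 bytes to 1 byte per value is one way to arrive at the 8x figure; the quantization range `[lo, hi]` would normally be fixed once for the whole dataset rather than per batch.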

YouTube-8M: Dataset
Imperfect ground truth
● 78.8% precision
● 14.5% recall
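To make the two numbers concrete, here is a small sketch of how label precision and recall could be computed, assuming the dataset's labels are compared against a reference set of human ratings. The function name and the toy label sets are hypothetical; they are not taken from the slides or the paper.

```python
def label_precision_recall(machine_labels, human_labels):
    """Precision and recall of machine labels against human ratings,
    pooled over all videos (each video is a set of label names)."""
    true_pos = sum(len(m & h) for m, h in zip(machine_labels, human_labels))
    predicted = sum(len(m) for m in machine_labels)
    relevant = sum(len(h) for h in human_labels)
    return true_pos / predicted, true_pos / relevant

# Toy example: few machine labels, most of them correct, many missed.
machine = [{"Guitar"}, {"Car", "Racing"}]
human = [{"Guitar", "Concert", "Musician"}, {"Car", "Vehicle", "Road"}]

precision, recall = label_precision_recall(machine, human)
print(precision, recall)
```

High precision with low recall, as on this slide, means the labels that are present are mostly right, but many applicable labels are missing.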

YouTube-8M: Baseline approaches
Frame-level
Train 4800 independent one-vs-all classifiers
1. Average pooling + logistic regression
○ Frame-level probabilities are aggregated to the video level with a simple average
2. Deep Bag of Frame (DBoF) pooling
○ k frames projected to an M-dimensional space with ReLU activations
○ Batch normalization
○ Aggregation of frames with max-pooling
3. LSTM
○ 2 LSTM layers with 1024 hidden units
○ Linearly increasing per-frame weights, from 1/N for the first frame to 1 for the last frame
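The three aggregation ideas on this slide can be sketched in NumPy as below. This is an illustrative sketch only: the projection weights are random rather than learned, batch normalization and the LSTM itself are omitted, and the function names are hypothetical.

```python
import numpy as np

def average_pool_probs(frame_probs):
    """Average per-frame class probabilities (N, C) into video scores (C,)."""
    return frame_probs.mean(axis=0)

def dbof_pool(frames, weights, bias):
    """Project k frames (k, D) to M dimensions, apply ReLU,
    then max-pool over the frame axis to get one (M,) vector."""
    projected = np.maximum(frames @ weights + bias, 0.0)   # (k, M)
    return projected.max(axis=0)

def linear_frame_weights(n_frames):
    """Per-frame weights rising linearly from 1/N (first) to 1 (last)."""
    return np.arange(1, n_frames + 1) / n_frames

rng = np.random.default_rng(0)

# 1. Average pooling of frame-level probabilities (5 frames, 3 classes).
frame_probs = rng.uniform(size=(5, 3))
video_scores = average_pool_probs(frame_probs)

# 2. DBoF-style pooling with k=4 frames, D=8 inputs, M=16 projected dims.
frames = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 16))   # would be learned in practice
b = np.zeros(16)
pooled = dbof_pool(frames, W, b)

# 3. Frame weights as described for the LSTM baseline.
frame_weights = linear_frame_weights(4)
print(frame_weights)
```

Max-pooling after a ReLU projection makes the pooled vector insensitive to frame order, while the linearly increasing weights give later LSTM steps, which have seen more of the video, more influence.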

YouTube-8M: Baseline approaches
Video-level
The only difference is that features are now combined before the neural network, giving fixed-length video features
● Mean, standard deviation, top 5 ordinal statistics
● Subsequent normalization (subtract mean, PCA)
Online learning algorithms instead of batch optimization (?)
1. Logistic regression
2. SVM (online) + hinge loss
3. Mixture of Experts
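A fixed-length video descriptor built from the statistics listed above might look like the following sketch. It assumes "top 5 ordinal statistics" means the five largest values of each feature dimension across frames; the function name is hypothetical, and the subsequent normalization (mean subtraction, PCA) is left out.

```python
import numpy as np

def video_level_features(frames, top_k=5):
    """Turn frame features (N, D) into one vector of length D * (2 + top_k):
    per-dimension mean, standard deviation and the top_k largest values.
    Requires N >= top_k."""
    mean = frames.mean(axis=0)
    std = frames.std(axis=0)
    # Sort each dimension over frames, descending, and keep the k largest.
    top = np.sort(frames, axis=0)[::-1][:top_k]   # (top_k, D)
    return np.concatenate([mean, std, top.ravel()])

# Toy video: 6 frames with 2-dimensional features.
frames = np.arange(12, dtype=np.float64).reshape(6, 2)
features = video_level_features(frames)
print(features.shape)
```

Whatever the number of frames, the output length depends only on D and `top_k`, which is what lets simple classifiers such as logistic regression consume videos of different durations.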

YouTube-8M: Results
Evaluation metrics and comparison
● Mean Average Precision (Precision, Recall)
● Hit@k
● Precision at equal recall rate (PERR)
These are results on the validation set. On the human-rated test set the results are consistent.
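The metrics named on this slide can be sketched as below, using common definitions: Hit@k checks whether any true label is among the k top-scored classes, PERR takes the precision over the top G predictions of a video with G true labels, and mean Average Precision is the mean of per-class AP. The benchmark's exact evaluation protocol may differ in details (for example, how many predictions per class are considered), so treat this as an assumption-laden illustration with hypothetical function names and toy data.

```python
import numpy as np

def hit_at_k(scores, labels, k):
    """Fraction of videos with at least one true label in the top k scores."""
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = np.take_along_axis(labels, top_k, axis=1).any(axis=1)
    return float(hits.mean())

def perr(scores, labels):
    """Precision at equal recall rate, averaged over videos with labels."""
    order = np.argsort(-scores, axis=1)
    precisions = []
    for row_order, row_labels in zip(order, labels):
        g = int(row_labels.sum())
        if g == 0:
            continue  # videos without labels are skipped in this sketch
        precisions.append(row_labels[row_order[:g]].mean())
    return float(np.mean(precisions))

def average_precision(class_scores, class_labels):
    """AP for one class: mean precision at the rank of each positive video."""
    ranked = class_labels[np.argsort(-class_scores)]
    if ranked.sum() == 0:
        return 0.0
    precision_at_rank = np.cumsum(ranked) / np.arange(1, len(ranked) + 1)
    return float((precision_at_rank * ranked).sum() / ranked.sum())

# Toy data: 2 videos, 3 classes (rows are videos, columns are classes).
scores = np.array([[0.9, 0.1, 0.8],
                   [0.2, 0.7, 0.1]])
labels = np.array([[True, False, False],
                   [True, False, True]])

print(hit_at_k(scores, labels, k=1))
print(perr(scores, labels))
print(average_precision(scores[:, 2], labels[:, 2]))
```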

YouTube-8M: Results
Results on other databases (transfer learning)
● Sports-1M
● ActivityNet

Google Cloud Machine Learning Engine
Basics
● Google Cloud Platform: $300 trial
● Google Cloud Shell
● Pricing
○ Training: ML units (depending on scale tier) × hours
○ Prediction: per hour + number of predictions
● Google Cloud Storage for the results
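The pricing structure above amounts to simple arithmetic, sketched here. Every rate and quantity in the example is a made-up placeholder, not an actual Google Cloud price, and the function and parameter names are hypothetical; the point is only the shape of the two formulas.

```python
def training_cost(ml_units, hours, price_per_ml_unit_hour):
    """Training is billed as ML units (set by the scale tier) times hours."""
    return ml_units * hours * price_per_ml_unit_hour

def prediction_cost(node_hours, price_per_node_hour,
                    n_predictions, price_per_1000_predictions):
    """Prediction is billed per hour plus per number of predictions."""
    return (node_hours * price_per_node_hour
            + n_predictions / 1000 * price_per_1000_predictions)

# Placeholder numbers, for illustration only.
train = training_cost(ml_units=3, hours=10, price_per_ml_unit_hour=0.5)
predict = prediction_cost(node_hours=2, price_per_node_hour=0.4,
                          n_predictions=5000, price_per_1000_predictions=0.1)
print(train, predict)
```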

Google Cloud Machine Learning Engine
Task submission

Google Cloud Machine Learning Engine
TensorBoard

