SlideShare una empresa de Scribd logo
1 de 27
Large-scale Deep Unsupervised Learning using Graphics Processors Rajat Raina Anand Madhavan Andrew Y. Ng Stanford University
Learning from unlabeled data Classify vs. car motorcycle Learn higher-level representation Deep Belief Networks Sparse Coding Input space Unlabeled examples Higher-level representation Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
The promise of unsupervised learning Use large amounts of unlabeled data to learn complex/deep models, possibly with many parameters. Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Some recent work on DBNs (Similar situation for sparse coding.) Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Large-scale learning [Banko & Brill, 2001] Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Large-scale unsupervised learning Current models: 1000s of input dimensions, 1000s of hidden units. 106  parameters. Our desired model:  108  parameters Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Graphics Processors CPU RAM Graphics Card (GPU) Motherboard Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Why graphics processors? 1000 750 500 250 0 NVIDIA GPU Peak Gflops (billion ops / sec) Intel CPU 2003           2004          2005         2006            2007                 2008 (Source: NVIDIA CUDA Programming Guide) Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Why graphics processors? IBM ASCI White Supercomputer Cost: $110 million Space: 2 basketball courts 13 graphics cards Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
GPU Schematic 30 MPs … MP MP MP Shared Memory (16K) Shared Memory (16K) Shared Memory (16K) 1000GB/s … SP SP SP SP SP SP SP SP SP SP SP SP Registers Registers SP SP SP SP SP SP SP SP SP SP SP SP 100 GB/s (coalesced) Registers Global Memory (~1GB) Slow transfer from RAM RAM (Note: Some additional features not displayed.)
GPU Programming Two-level parallelism Split task into blocks, blocks into threads. Access to global memory (not RAM). Restrictions on memory access patterns. Main bottleneck: Getting data into GPU memory, and accessing it in efficient ways. NVIDIA CUDA High-level routines to allocate/copy GPU memory. Good GPU matrix libraries that suffice for many machine learning tasks. MP MP MP Shared Memory Shared Memory Shared Memory RAM SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP Global Memory (~1GB) Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Unsupervised learning on GPUs Initialize parameters in global memory. while  convergence criterion is not satisfied Periodically transfer a large number of unlabeled examples into global memory. Pick a few of the unlabeled examples at a time, and compute the updates in parallel using the GPU's two-level parallelism (blocks and threads) or GPU matrix libraries. end Transfer learnt parameters from global memory. Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Deep Belief Networks Learning Large DBNs using Graphics Processors                                        Rajat Raina, Andrew Y. Ng
Restricted Boltzmann Machine (RBM) . . . h2 h1 h Contrastive divergence learning via conditional distributions: . . . v v1 v2 v3 Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Experimental setup Single graphics card: Nvidia GTX 280 1GB on-board memory, 240 cores. Current price: US $250. CPU: Two cores, each @3.16GHz.
Learning Large RBMs 2 weeks Dual-core CPU    1 week 35 hours 72x faster    1 day Learning time for 10 million examples  (log scale) 8 hours 2 hours 5 hours GPU    1 hour ½ hour     1                             18                                 36          45 Millions of parameters Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Overlapping patches DBN Hidden UnitsB Hidden UnitsA . . . . . . WA, bA, cA WB, bB, cB Patch A Patch B Input image Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
[object Object],Overlapping patches DBN example 8192 units 15680 units 2048 units 20736 units (144x144) 32768 units (128 units per 24x24 patch)         	 …  	 …    .                 . 	 … All layers can be learnt in about 1 day on a GPU. Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Sparse Coding Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Sparse coding Basis vectors b Given unlabeled data x(i),  obtain b by solving: Alternating minimization Keep a fixed, find optimal b. Keep b fixed, find optimal a. Input x 	    =   0.8 *       b87+  0.3 *          b376+  0.5 *       b411 Activations a = 0.8 *                   + 0.3 *                     + 0.5 * Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Parallel Sparse Coding Alternating minimization Keep a fixed, find optimal b.  Easy on GPU (projected grad descent). Keep b fixed, find optimal a.  Not as straightforward. Need to parallelize: Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Parallel Sparse Coding Easy to optimize for one coordinate (keeping the others fixed). 							          (Friedman et al., 2007) One iteration of our algorithm: Descent direction Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Sparse coding with 106 parameters 19 days Dual-core CPU Learning time (days) with  10 million examples 15x faster GPU 1 day 6 hours 3% nonzero                                                   10% nonzero Sparsity Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
Summary Large-scale unsupervised learning. Ten-times more data might transform an OK algorithm into a good algorithm. Working at smaller-scale risks confounding the effects of the model itself, with the effect of scale. GPUs are a powerful tool for machine learning. Easy to program (no low-level programming). Especially useful for stochastic learning methods. Learning algorithms for DBNs and sparse coding can be an order-of-magnitude faster. Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
THE  END
Why graphics processors?   120 100 80     60     40     20       0 NVIDIA GPU Bandwidth from memory to processor (GB/s) Intel CPU 2003          2004         2005         2006          2007 (Source: NVIDIA CUDA Programming Guide) Large-scale Deep Unsupervised Learning                             Rajat Raina, Anand Madhavan, Andrew Y. Ng
GPU Programming: A=A+B __global__ void vecAdd(float* A, float* B){ int my = threadIdx.x + blockIdx.x  * 128; 	A[my]=A[my]+B[my]; } int main(intargc, char** argv){ 	float A[SIZE], B[SIZE]; 	float* d_A, * d_B; cudaMalloc((void**)&d_A,SIZE_BYTES); cudaMalloc((void**)&d_B,SIZE_BYTES); cudaMemcpy(d_A,A,SIZE_BYTES,cudaMemcpyHostToDevice); cudaMemcpy(d_B,B,SIZE_BYTES,cudaMemcpyHostToDevice); vecAdd<<<32,128>>>(d_A,d_B); cudaThreadSynchronize(); cudaMemcpy(A,d_A,SIZE_BYTES,cudaMemcpyDeviceToHost); } GPU CPU (Adapted from http://www.cs.technion.ac.il/~marks/docs/LinuxClubGPGPU.pdf)

Más contenido relacionado

La actualidad más candente

[2A4]DeepLearningAtNAVER
[2A4]DeepLearningAtNAVER[2A4]DeepLearningAtNAVER
[2A4]DeepLearningAtNAVERNAVER D2
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntel Nervana
 
DeepImage_EmTech-public-small
DeepImage_EmTech-public-smallDeepImage_EmTech-public-small
DeepImage_EmTech-public-smallRen Wu
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Greg Makowski
 
Deep Learning Jump Start
Deep Learning Jump StartDeep Learning Jump Start
Deep Learning Jump StartMichele Toni
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningBrodmann17
 
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017MLconf
 
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...Mohamed Elawady
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksChristian Perone
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for RoboticsIntel Nervana
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network Yan Xu
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
Language translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowLanguage translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowS N
 

La actualidad más candente (13)

[2A4]DeepLearningAtNAVER
[2A4]DeepLearningAtNAVER[2A4]DeepLearningAtNAVER
[2A4]DeepLearningAtNAVER
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
DeepImage_EmTech-public-small
DeepImage_EmTech-public-smallDeepImage_EmTech-public-small
DeepImage_EmTech-public-small
 
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
Using Deep Learning to do Real-Time Scoring in Practical Applications - 2015-...
 
Deep Learning Jump Start
Deep Learning Jump StartDeep Learning Jump Start
Deep Learning Jump Start
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
Sara Hooker & Sean McPherson, Delta Analytics, at MLconf Seattle 2017
 
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
(Msc Thesis) Sparse Coral Classification Using Deep Convolutional Neural Netw...
 
Deep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural NetworksDeep Learning - Convolutional Neural Networks
Deep Learning - Convolutional Neural Networks
 
Deep Learning for Robotics
Deep Learning for RoboticsDeep Learning for Robotics
Deep Learning for Robotics
 
Convolutional neural network
Convolutional neural network Convolutional neural network
Convolutional neural network
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 
Language translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlowLanguage translation with Deep Learning (RNN) with TensorFlow
Language translation with Deep Learning (RNN) with TensorFlow
 

Destacado

Lessons from 2MM machine learning models
Lessons from 2MM machine learning modelsLessons from 2MM machine learning models
Lessons from 2MM machine learning modelsExtract Data Conference
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learningTonmoy Bhagawati
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?Philip Zheng
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networksSi Haem
 

Destacado (6)

Lessons from 2MM machine learning models
Lessons from 2MM machine learning modelsLessons from 2MM machine learning models
Lessons from 2MM machine learning models
 
Presentation on supervised learning
Presentation on supervised learningPresentation on supervised learning
Presentation on supervised learning
 
Tutorial on Deep Learning
Tutorial on Deep LearningTutorial on Deep Learning
Tutorial on Deep Learning
 
Andrew Ng, Chief Scientist at Baidu
Andrew Ng, Chief Scientist at BaiduAndrew Ng, Chief Scientist at Baidu
Andrew Ng, Chief Scientist at Baidu
 
What's Wrong With Deep Learning?
What's Wrong With Deep Learning?What's Wrong With Deep Learning?
What's Wrong With Deep Learning?
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 

Similar a Large-scale Deep Unsupervised Learning using Graphics Processors

Scalable Learning in Computer Vision
Scalable Learning in Computer VisionScalable Learning in Computer Vision
Scalable Learning in Computer Visionbutest
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesArvind Rapaka
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingDESMOND YUEN
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESSubhajit Sahu
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUsiguazio
 
Multimedia hardware
Multimedia hardwareMultimedia hardware
Multimedia hardwareUtsav Roy
 
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...Infoshare
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Matej Misik
 
Introduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningIntroduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningSri Ambati
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedTuri, Inc.
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensOscar Law
 
Design and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processorDesign and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processorNajeeb Ahmad
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistAlexey Zinoviev
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3muayyad alsadi
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPURaul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPUEduardo Gaspar
 
GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2NVIDIA
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Ganesh Raju
 

Similar a Large-scale Deep Unsupervised Learning using Graphics Processors (20)

Scalable Learning in Computer Vision
Scalable Learning in Computer VisionScalable Learning in Computer Vision
Scalable Learning in Computer Vision
 
Big data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial UsecasesBig data 2.0, deep learning and financial Usecases
Big data 2.0, deep learning and financial Usecases
 
Early Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic ComputingEarly Benchmarking Results for Neuromorphic Computing
Early Benchmarking Results for Neuromorphic Computing
 
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDESDynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
Dynamic Batch Parallel Algorithms for Updating Pagerank : SLIDES
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Multimedia hardware
Multimedia hardwareMultimedia hardware
Multimedia hardware
 
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
infoShare AI Roadshow 2018 - Tomasz Kopacz (Microsoft) - jakie możliwości daj...
 
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
Fast data in times of crisis with GPU accelerated database QikkDB | Business ...
 
Introduction to GPUs for Machine Learning
Introduction to GPUs for Machine LearningIntroduction to GPUs for Machine Learning
Introduction to GPUs for Machine Learning
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
Machine Learning with New Hardware Challegens
Machine Learning with New Hardware ChallegensMachine Learning with New Hardware Challegens
Machine Learning with New Hardware Challegens
 
Design and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processorDesign and implementation of GPU-based SAR image processor
Design and implementation of GPU-based SAR image processor
 
Joker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data ScientistJoker'14 Java as a fundamental working tool of the Data Scientist
Joker'14 Java as a fundamental working tool of the Data Scientist
 
Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3Accelerating stochastic gradient descent using adaptive mini batch size3
Accelerating stochastic gradient descent using adaptive mini batch size3
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Hardware in Space
Hardware in SpaceHardware in Space
Hardware in Space
 
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPURaul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
Raul sena - Apresentação Analiticsemtudo - Scientific Applications using GPU
 
GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2GPU Accelerated Deep Learning for CUDNN V2
GPU Accelerated Deep Learning for CUDNN V2
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 

Más de butest

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEbutest
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jacksonbutest
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...butest
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALbutest
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer IIbutest
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazzbutest
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.docbutest
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1butest
 
Facebook
Facebook Facebook
Facebook butest
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...butest
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...butest
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTbutest
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docbutest
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docbutest
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.docbutest
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!butest
 

Más de butest (20)

EL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBEEL MODELO DE NEGOCIO DE YOUTUBE
EL MODELO DE NEGOCIO DE YOUTUBE
 
1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同1. MPEG I.B.P frame之不同
1. MPEG I.B.P frame之不同
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Timeline: The Life of Michael Jackson
Timeline: The Life of Michael JacksonTimeline: The Life of Michael Jackson
Timeline: The Life of Michael Jackson
 
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
Popular Reading Last Updated April 1, 2010 Adams, Lorraine The ...
 
LESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIALLESSONS FROM THE MICHAEL JACKSON TRIAL
LESSONS FROM THE MICHAEL JACKSON TRIAL
 
Com 380, Summer II
Com 380, Summer IICom 380, Summer II
Com 380, Summer II
 
PPT
PPTPPT
PPT
 
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet JazzThe MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
The MYnstrel Free Press Volume 2: Economic Struggles, Meet Jazz
 
MICHAEL JACKSON.doc
MICHAEL JACKSON.docMICHAEL JACKSON.doc
MICHAEL JACKSON.doc
 
Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1Social Networks: Twitter Facebook SL - Slide 1
Social Networks: Twitter Facebook SL - Slide 1
 
Facebook
Facebook Facebook
Facebook
 
Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...Executive Summary Hare Chevrolet is a General Motors dealership ...
Executive Summary Hare Chevrolet is a General Motors dealership ...
 
Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...Welcome to the Dougherty County Public Library's Facebook and ...
Welcome to the Dougherty County Public Library's Facebook and ...
 
NEWS ANNOUNCEMENT
NEWS ANNOUNCEMENTNEWS ANNOUNCEMENT
NEWS ANNOUNCEMENT
 
C-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.docC-2100 Ultra Zoom.doc
C-2100 Ultra Zoom.doc
 
MAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.docMAC Printing on ITS Printers.doc.doc
MAC Printing on ITS Printers.doc.doc
 
Mac OS X Guide.doc
Mac OS X Guide.docMac OS X Guide.doc
Mac OS X Guide.doc
 
hier
hierhier
hier
 
WEB DESIGN!
WEB DESIGN!WEB DESIGN!
WEB DESIGN!
 

Large-scale Deep Unsupervised Learning using Graphics Processors

  • 1. Large-scale Deep Unsupervised Learning using Graphics Processors Rajat Raina Anand Madhavan Andrew Y. Ng Stanford University
  • 2. Learning from unlabeled data Classify vs. car motorcycle Learn higher-level representation Deep Belief Networks Sparse Coding Input space Unlabeled examples Higher-level representation Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 3. The promise of unsupervised learning Use large amounts of unlabeled data to learn complex/deep models, possibly with many parameters. Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 4. Some recent work on DBNs (Similar situation for sparse coding.) Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 5. Large-scale learning [Banko & Brill, 2001] Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 6. Large-scale unsupervised learning Current models: 1000s of input dimensions, 1000s of hidden units. 106 parameters. Our desired model: 108 parameters Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 7. Graphics Processors CPU RAM Graphics Card (GPU) Motherboard Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 8. Why graphics processors? 1000 750 500 250 0 NVIDIA GPU Peak Gflops (billion ops / sec) Intel CPU 2003 2004 2005 2006 2007 2008 (Source: NVIDIA CUDA Programming Guide) Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 9. Why graphics processors? IBM ASCI White Supercomputer Cost: $110 million Space: 2 basketball courts 13 graphics cards Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 10. GPU Schematic 30 MPs … MP MP MP Shared Memory (16K) Shared Memory (16K) Shared Memory (16K) 1000GB/s … SP SP SP SP SP SP SP SP SP SP SP SP Registers Registers SP SP SP SP SP SP SP SP SP SP SP SP 100 GB/s (coalesced) Registers Global Memory (~1GB) Slow transfer from RAM RAM (Note: Some additional features not displayed.)
  • 11. GPU Programming Two-level parallelism Split task into blocks, blocks into threads. Access to global memory (not RAM). Restrictions on memory access patterns. Main bottleneck: Getting data into GPU memory, and accessing it in efficient ways. NVIDIA CUDA High-level routines to allocate/copy GPU memory. Good GPU matrix libraries that suffice for many machine learning tasks. MP MP MP Shared Memory Shared Memory Shared Memory RAM SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP SP Global Memory (~1GB) Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 12. Unsupervised learning on GPUs Initialize parameters in global memory. while convergence criterion is not satisfied Periodically transfer a large number of unlabeled examples into global memory. Pick a few of the unlabeled examples at a time, and compute the updates in parallel using the GPU's two-level parallelism (blocks and threads) or GPU matrix libraries. end Transfer learnt parameters from global memory. Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 13. Deep Belief Networks Learning Large DBNs using Graphics Processors Rajat Raina, Andrew Y. Ng
  • 14. Restricted Boltzmann Machine (RBM) . . . h2 h1 h Contrastive divergence learning via conditional distributions: . . . v v1 v2 v3 Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 15. Experimental setup Single graphics card: Nvidia GTX 280 1GB on-board memory, 240 cores. Current price: US $250. CPU: Two cores, each @3.16GHz.
  • 16. Learning Large RBMs 2 weeks Dual-core CPU 1 week 35 hours 72x faster 1 day Learning time for 10 million examples (log scale) 8 hours 2 hours 5 hours GPU 1 hour ½ hour 1 18 36 45 Millions of parameters Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 17. Overlapping patches DBN Hidden UnitsB Hidden UnitsA . . . . . . WA, bA, cA WB, bB, cB Patch A Patch B Input image Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 18.
  • 19. Sparse Coding Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 20. Sparse coding Basis vectors b Given unlabeled data x(i), obtain b by solving: Alternating minimization Keep a fixed, find optimal b. Keep b fixed, find optimal a. Input x = 0.8 * b87+ 0.3 * b376+ 0.5 * b411 Activations a = 0.8 * + 0.3 * + 0.5 * Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 21. Parallel Sparse Coding Alternating minimization Keep a fixed, find optimal b. Easy on GPU (projected grad descent). Keep b fixed, find optimal a. Not as straightforward. Need to parallelize: Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 22. Parallel Sparse Coding Easy to optimize for one coordinate (keeping the others fixed). (Friedman et al., 2007) One iteration of our algorithm: Descent direction Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 23. Sparse coding with 106 parameters 19 days Dual-core CPU Learning time (days) with 10 million examples 15x faster GPU 1 day 6 hours 3% nonzero 10% nonzero Sparsity Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 24. Summary Large-scale unsupervised learning. Ten-times more data might transform an OK algorithm into a good algorithm. Working at smaller-scale risks confounding the effects of the model itself, with the effect of scale. GPUs are a powerful tool for machine learning. Easy to program (no low-level programming). Especially useful for stochastic learning methods. Learning algorithms for DBNs and sparse coding can be an order-of-magnitude faster. Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 26. Why graphics processors? 120 100 80 60 40 20 0 NVIDIA GPU Bandwidth from memory to processor (GB/s) Intel CPU 2003 2004 2005 2006 2007 (Source: NVIDIA CUDA Programming Guide) Large-scale Deep Unsupervised Learning Rajat Raina, Anand Madhavan, Andrew Y. Ng
  • 27. GPU Programming: A=A+B __global__ void vecAdd(float* A, float* B){ int my = threadIdx.x + blockIdx.x * 128; A[my]=A[my]+B[my]; } int main(intargc, char** argv){ float A[SIZE], B[SIZE]; float* d_A, * d_B; cudaMalloc((void**)&d_A,SIZE_BYTES); cudaMalloc((void**)&d_B,SIZE_BYTES); cudaMemcpy(d_A,A,SIZE_BYTES,cudaMemcpyHostToDevice); cudaMemcpy(d_B,B,SIZE_BYTES,cudaMemcpyHostToDevice); vecAdd<<<32,128>>>(d_A,d_B); cudaThreadSynchronize(); cudaMemcpy(A,d_A,SIZE_BYTES,cudaMemcpyDeviceToHost); } GPU CPU (Adapted from http://www.cs.technion.ac.il/~marks/docs/LinuxClubGPGPU.pdf)

Notas del editor

  1. In modern computing, sometimes the bottleneck is getting data into the processor from the memory.
  2. http://www.cs.technion.ac.il/~marks/docs/LinuxClubGPGPU.pdf