SlideShare una empresa de Scribd logo
1 de 66
Deep Learning
Ruslan Salakhutdinov
Department of Computer Science
University of Toronto
Images & Video
Relational Data/
Social Network
Massive increase in both computational power and the amount of
data available from web, video cameras, laboratory measurements.
Mining for Structure
Speech & Audio
Gene Expression
Text & Language
Geological Data
Product
Recommendation
Climate Change
Mostly Unlabeled
• Develop statistical models that can discover underlying structure, cause, or
statistical correlation from data in unsupervised or semi-supervised way.
• Multiple application domains.
Deep Learning
Impact of Deep Learning
• Speech Recognition
• Computer Vision
• Language Understanding
• Recommender Systems
• Drug Discovery and Medical
Image Analysis
Deep Learning in Action
• Achieves state-of-the-art on many object recognition tasks!
Try it at deeplearning.cs.toronto.edu!
Example: Understanding Images
Model Samples
• a group of people in a crowded area .
• a group of people are walking and talking .
• a group of people, standing around and talking .
• a group of people that are in the outside .
strangers, coworkers, conventioneers,
attendants, patrons
TAGS:
Nearest Neighbor Sentence:
people taking pictures of a crazy person
Image Tagging and Retrieval
mosque, tower,
building, cathedral,
dome, castle
kitchen, stove, oven,
refrigerator,
microwave
ski, skiing,
skiers, skiiers,
snowmobile
bowl, cup,
soup, cups,
coffee
beach
snow
Speech Recognition
Merck Molecular Activity Challenge
• Deep Learning technique: Predict biological activities of different
molecules, given numerical descriptors generated from their
chemical structures.
• To develop new medicines, it is important to identify molecules
that are highly active toward their intended targets.
Toronto team takes first place!
• From their blog:
- Restricted Boltzmann machines
- Probabilistic Matrix Factorization
(Salakhutdinov et. al. ICML, 2007, Salakhutdinov and Mnih, 2008)
To put these algorithms to use, we had to work to overcome some limitations, for
instance that they were built to handle 100 million ratings, instead of the more than
5 billion that we have, and that they were not built to adapt as members added
more ratings. But once we overcame those challenges, we put the two algorithms
into production, where they are still used as part of our recommendation engine.
Netflix uses:
Both of these algorithms were
developed by us at Toronto!
Deep Learning in the News
Key Computational Challenges
- Learning from billions of
(unlabeled) data points
- Developing new parallel
algorithms
Building bigger models using more data improves
performance of deep learning algorithms!
Scaling up our deep learning algorithms:
- Scaling up Computation using clusters of GPUs and
FPGAs
Building Artificial Intelligence
Develop computer algorithms that can:
- See and recognize objects around us
- Perceive human speech
- Understand natural language
- Navigate around autonomously
- Display human like Intelligence
Personal assistants, self-driving cars, etc.
Talk Roadmap
• Introduction
• Key Deep Learning Models
• Applications: Multimodal Learning and
Language Modeling
Learning Feature Representations
pixel 1
pixel 2 Learning
Algorithm
pixel 2
pixel1
Segway
Non-SegwayInput Space
Learning Feature Representations
pixel 2
pixel1
Segway
Non-SegwayInput Space
Handle
Wheel
Learning
Algorithm
Feature
Representation
Handle
Wheel
Feature Space
Traditional Approaches
Image vision features Recognition
Object
detection
Audio
classification
Audio audio features
Speaker
identification
Data Feature
extraction
Learning
algorithm
Computer Vision Features
SIFT Spin image
HoG RIFT
Textons GLOH
Computer Vision Features
SIFT Spin image
HoG RIFT
Textons GLOH
Deep Learning
ZCR
Spectrogram MFCC
RolloffFlux
Audio Features
Audio Features
ZCR
Spectrogram MFCC
RolloffFlux
Deep Learning
Example: Boltzmann Machine
Input data (e.g. pixel
intensities of an image,
words from webpages,
speech signal).
Target variables (response)
(e.g. class labels,
categories, phonemes).
Model parameters
Latent (hidden)
variables
Markov Random Fields, Undirected Graphical Models.
Unsupervised Learning
Vector of word counts
on a webpage
Latent variables:
semantic topics
804,414 newswire stories
(Hinton & Salakhutdinov, Science 2006)
Talk Roadmap
• Introduction
• Key Deep Learning Models
• Applications: Multimodal Learning and
Language Modeling.
Restricted Boltzmann Machines
- Can characterize uncertainty.
Pair-wise Unary
Markov random fields, Boltzmann machines, log-linear models.
Image visible variables
Feature Detectors
Define a proper probabilistic model:
- Deal with missing or noisy data.
- Can simulate from the model.
Modeling Images
(Salakhutdinov & Hinton, NIPS 2007; Salakhutdinov & Murray, ICML 2008)
Learned features (out of 10,000)
4 million unlabelled images
= 0.9 * + 0.8 * + 0.6 * …
New Image
Modeling Images and Text
(Salakhutdinov & Hinton, NIPS 2007; Salakhutdinov & Murray, ICML 2008)
Learned ``strokes’’Data: Handwritten characters
Learned features: ``topics’’
russian
russia
moscow
yeltsin
soviet
clinton
house
president
bill
congress
computer
system
product
software
develop
trade
country
import
world
economy
stock
wall
street
point
dow
Reuters dataset:
804,414 unlabeled
newswire stories
Bag-of-Words
Learned features: ``genre’’
Fahrenheit 9/11
Bowling for Columbine
The People vs. Larry Flynt
Canadian Bacon
La Dolce Vita
Independence Day
The Day After Tomorrow
Con Air
Men in Black II
Men in Black
Friday the 13th
The Texas Chainsaw Massacre
Children of the Corn
Child's Play
The Return of Michael Myers
Scary Movie
Naked Gun
Hot Shots!
American Pie
Police Academy
Netflix dataset:
480,189 users
17,770 movies
Over 100 million ratings
State-of-the-art performance
on the Netflix dataset.
Recommender Engine
(Salakhutdinov, Mnih, Hinton, ICML 2007)
Multinomial visible: user ratings
Binary hidden: user preferences
Image
Low-level features:
Edges
Input: Pixels
(Salakhutdinov & Hinton, Neural Computation 2012)
Deep Boltzmann Machines:
Learning Hierarchies of Features
Image
Higher-level features:
Combination of edges
Low-level features:
Edges
Input: Pixels
Learn simpler representations,
then compose more complex ones
(Salakhutdinov & Hinton, Neural Computation 2012)
Deep Boltzmann Machines:
Learning Hierarchies of Features
Learning Multiple Layers
• Biological and theoretical justification for learning multiple
layers of representation
• Biologically inspired learning:
- Brain has hierarchical
architecture
- Cortex appears to have a
generic learning algorithm
- Humans learn simpler representations, then
compose more complex ones
Learning Feature Hierarchies
Layer 1 Primitives
Lee et.al., ICML 2009
Layer 2
Parts
Layer 3
Objects
Learn simpler representations, then compose more complex ones.
Good Generative Model?
Handwritten Characters
Good Generative Model?
Handwritten Characters
Good Generative Model?
Handwritten Characters
Real DataSimulated
Good Generative Model?
Handwritten Characters
Real Data Simulated
Good Generative Model?
Handwritten Characters
Good Generative Model?
MNIST Handwritten Digit Dataset
Talk Roadmap
• Introduction
• Key Deep Learning Models
• Applications: Multimodal Learning and
Language Modeling.
Data – Collection of Modalities
• Multimedia content on the web -
image + text + audio.
• Product recommendation
systems.
• Robotics applications.
Audio
Vision
Touch sensors
Motor control
sunset,
pacificocean,
bakerbeach,
seashore, ocean
car,
automobile
Shared Concept
“Modality-free” representation
“Modality-full” representation
“Concept”
sunset, pacific ocean,
baker beach, seashore,
ocean
• Improve Classification
Multi-Modal Input
pentax, k10d, kangarooisland
southaustralia, sa australia
australiansealion 300mm
SEA / NOT SEA
• Retrieve data from one modality when queried using data from
another modality
beach, sea, surf,
strand, shore, wave,
seascape, sand,
ocean, waves
• Fill in Missing Modalities
beach, sea, surf,
strand, shore, wave,
seascape, sand,
ocean, waves
Challenges - I
Very different input
representations
Image Text
sunset, pacific ocean,
baker beach, seashore,
ocean • Images – real-valued, dense
Difficult to learn
cross-modal features
from low-level
representations.
Dense
• Text – discrete, sparse
Sparse
Challenges - II
Noisy and missing data
Image Text
pentax, k10d,
pentaxda50200,
kangarooisland, sa,
australiansealion
mickikrimmel,
mickipedia,
headshot
unseulpixel,
naturey
< no text>
Challenges - II
Image Text Text generated by the model
beach, sea, surf, strand,
shore, wave, seascape,
sand, ocean, waves
portrait, girl, woman, lady,
blonde, pretty, gorgeous,
expression, model
night, notte, traffic, light,
lights, parking, darkness,
lowlight, nacht, glow
fall, autumn, trees, leaves,
foliage, forest, woods,
branches, path
pentax, k10d,
pentaxda50200,
kangarooisland, sa,
australiansealion
mickikrimmel,
mickipedia,
headshot
unseulpixel,
naturey
< no text>
0
0
1
0
0
Dense, real-valued
image features
Gaussian model
Replicated Softmax
Multimodal DBM
Word
counts
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)
Multimodal DBM
0
0
1
0
0
Dense, real-valued
image features
Gaussian model
Replicated Softmax
Word
counts
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)
Gaussian model
Replicated Softmax
0
0
1
0
0
Multimodal DBM
Word
counts
Dense, real-valued
image features
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)
Text Generated from Images
canada, nature,
sunrise, ontario, fog,
mist, bc, morning
insect, butterfly, insects,
bug, butterflies,
lepidoptera
graffiti, streetart, stencil,
sticker, urbanart, graff,
sanfrancisco
portrait, child, kid,
ritratto, kids, children,
boy, cute, boys, italy
dog, cat, pet, kitten, puppy,
ginger, tongue, kitty, dogs,
furry
sea, france, boat, mer,
beach, river, bretagne,
plage, brittany
Given Generated Given Generated
Text Generated from Images
Given Generated
water, glass, beer, bottle,
drink, wine, bubbles, splash,
drops, drop
portrait, women, army, soldier,
mother, postcard, soldiers
obama, barackobama, election,
politics, president, hope, change,
sanfrancisco, convention, rally
Images from Text
water, red,
sunset
nature, flower,
red, green
blue, green,
yellow, colors
chocolate, cake
Given Retrieved
MIR-Flickr Dataset
Huiskes et. al.
• 1 million images along with user-assigned tags.
sculpture, beauty,
stone
nikon, green, light,
photoshop, apple, d70
white, yellow,
abstract, lines, bus,
graphic
sky, geotagged,
reflection, cielo,
bilbao, reflejo
food, cupcake,
vegan
d80
anawesomeshot,
theperfectphotographer,
flash, damniwishidtakenthat,
spiritofphotography
nikon, abigfave,
goldstaraward, d80,
nikond80
Results
• Logistic regression on top-level representation.
• Multimodal Inputs
Learning Algorithm MAP Precision@50
Random 0.124 0.124
LDA [Huiskes et. al.] 0.492 0.754
SVM [Huiskes et. al.] 0.475 0.758
DBM-Labelled 0.526 0.791
Deep Belief Net 0.638 0.867
Autoencoder 0.638 0.875
DBM 0.641 0.873
Mean Average Precision
Labeled
25K
examples
+ 1 Million
unlabelled
State-of-the-art performance
Generating Sentences
Input
A man skiing down the snow
covered mountain with a dark
sky in the background.
Output
• More challenging problem.
• How can we generate complete descriptions of images?
Learning Semantic Representation
• Key Idea: Each word w is represented as a D-dimensional
real-valued vector rw 2 RD.
Dimension 2
Dimension2
Semantic Space
table
chair
dolphin
whale
November
Joint Feature space
A castle and
reflecting water
A ship sailing
in the ocean
A plane flying
in the sky
Multimodal Neural Language Models (Kiros, et.al., ICML 2014)
Learning Semantic Representation
Tagging and Retrieval
mosque, tower,
building, cathedral,
dome, castle
kitchen, stove, oven,
refrigerator,
microwave
ski, skiing,
skiers, skiiers,
snowmobile
bowl, cup,
soup, cups,
coffee
beach
snow
Multimodal Linguistic Regularities
Nearest Images
Ryan Kiros, 2014
Multimodal Linguistic Regularities
Nearest Images
Ryan Kiros, 2014
Caption Generation
Caption Generation
Model Samples
• Two men in a room talking on a table .
• Two men are sitting next to each other .
• Two men are having a conversation at a table .
• Two men sitting at a desk next to each other .
colleagues waiters waiter
entrepreneurs busboy
TAGS:
More Examples
spider, spiders, arachnid,
insects, insect
creepy, spooky, elfin
Model Samples
Giant spider found in the Netherlands.
Look at the new spider web.
This was near the black spider web.
I like the spider.
The pattern of one spider web.
TAGS:
Multi-Modal Models
Laser scans
Images
Video
Text & Language
Time series
data
Speech &
Audio
Develop learning systems that come
closer to displaying human like intelligence
Summary
• Efficient learning algorithms for Hierarchical Generative Models.
Learning more adaptive, robust, and structured representations.
• Deep models can improve current state-of-the art in many
application domains:
 Object recognition and detection, text and image retrieval, handwritten
character and speech recognition, and others.
Text & image retrieval /
Object recognition
Learning a Category
Hierarchy
Dealing with
missing/occluded data
HMM decoder
Speech Recognition
sunset, pacific ocean,
beach, seashore
Multimodal Data
Object Detection
Our Toronto Lab
We collaborate with and consult for various
organizations
Thank you

Más contenido relacionado

Similar a Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS Global Leadership

AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)Amazon Web Services
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep LearningAmazon Web Services
 
Deep Learning Use Cases - Data Science Pop-up Seattle
Deep Learning Use Cases - Data Science Pop-up SeattleDeep Learning Use Cases - Data Science Pop-up Seattle
Deep Learning Use Cases - Data Science Pop-up SeattleDomino Data Lab
 
Introduction to Deep Learning (September 2017)
Introduction to Deep Learning (September 2017)Introduction to Deep Learning (September 2017)
Introduction to Deep Learning (September 2017)Julien SIMON
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptxfahmi324663
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxPierre Schaus
 
Big, Open, Data and Semantics for Real-World Application Near You
Big, Open, Data and Semantics for Real-World Application Near YouBig, Open, Data and Semantics for Real-World Application Near You
Big, Open, Data and Semantics for Real-World Application Near YouBiplav Srivastava
 
Deep Learning AtoC with Image Perspective
Deep Learning AtoC with Image PerspectiveDeep Learning AtoC with Image Perspective
Deep Learning AtoC with Image PerspectiveDong Heon Cho
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014Paris Open Source Summit
 
Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...
Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...
Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...Data Con LA
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep LearningDavid Khosid
 
A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...
A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...
A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...Databricks
 
AI in Finance: Moving forward!
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!Adrian Hornsby
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxVishnuRajuV
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial IntelligenceZavain Dar
 
Social Semantic (Sensor) Web
Social Semantic (Sensor) WebSocial Semantic (Sensor) Web
Social Semantic (Sensor) WebDavid Crowley
 
Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6Paul Houle
 

Similar a Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS Global Leadership (20)

AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
AWS re:Invent 2016: Deep Learning in Alexa (MAC202)
 
An introduction to Deep Learning
An introduction to Deep LearningAn introduction to Deep Learning
An introduction to Deep Learning
 
Deep Learning Use Cases - Data Science Pop-up Seattle
Deep Learning Use Cases - Data Science Pop-up SeattleDeep Learning Use Cases - Data Science Pop-up Seattle
Deep Learning Use Cases - Data Science Pop-up Seattle
 
Introduction to Deep Learning (September 2017)
Introduction to Deep Learning (September 2017)Introduction to Deep Learning (September 2017)
Introduction to Deep Learning (September 2017)
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 
Week1- Introduction.pptx
Week1- Introduction.pptxWeek1- Introduction.pptx
Week1- Introduction.pptx
 
The Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- ReduxThe Concurrent Constraint Programming Research Programmes -- Redux
The Concurrent Constraint Programming Research Programmes -- Redux
 
Big, Open, Data and Semantics for Real-World Application Near You
Big, Open, Data and Semantics for Real-World Application Near YouBig, Open, Data and Semantics for Real-World Application Near You
Big, Open, Data and Semantics for Real-World Application Near You
 
Deep Learning AtoC with Image Perspective
Deep Learning AtoC with Image PerspectiveDeep Learning AtoC with Image Perspective
Deep Learning AtoC with Image Perspective
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014
 
Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...
Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...
Data Con LA 2019 - State of the Art of Innovation in Computer Vision by Chris...
 
Promises of Deep Learning
Promises of Deep LearningPromises of Deep Learning
Promises of Deep Learning
 
A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...
A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...
A Spark-Based Intelligent Assistant: Making Data Exploration in Natural Langu...
 
AI in Finance: Moving forward!
AI in Finance: Moving forward!AI in Finance: Moving forward!
AI in Finance: Moving forward!
 
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptxLiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
LiDeng-BerlinOct2015-ASR-GenDisc-4by3.pptx
 
AI meets Big Data
AI meets Big DataAI meets Big Data
AI meets Big Data
 
Big Data & Artificial Intelligence
Big Data & Artificial IntelligenceBig Data & Artificial Intelligence
Big Data & Artificial Intelligence
 
Social Semantic (Sensor) Web
Social Semantic (Sensor) WebSocial Semantic (Sensor) Web
Social Semantic (Sensor) Web
 
Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6Chatbots in 2017 -- Ithaca Talk Dec 6
Chatbots in 2017 -- Ithaca Talk Dec 6
 
Our World is Socio-technical
Our World is Socio-technicalOur World is Socio-technical
Our World is Socio-technical
 

Más de MaRS Discovery District

How to Pitch a VC - Entrepreneurship 101
How to Pitch a VC - Entrepreneurship 101How to Pitch a VC - Entrepreneurship 101
How to Pitch a VC - Entrepreneurship 101MaRS Discovery District
 
25 lessons learned - Entrepreneurship 101
25 lessons learned - Entrepreneurship 10125 lessons learned - Entrepreneurship 101
25 lessons learned - Entrepreneurship 101MaRS Discovery District
 
So you want to start a business? - Entrepreneurship 101
So you want to start a business? - Entrepreneurship 101So you want to start a business? - Entrepreneurship 101
So you want to start a business? - Entrepreneurship 101MaRS Discovery District
 
Lessons in Startup Leadership - Entrepreneurship 101
Lessons in Startup Leadership - Entrepreneurship 101Lessons in Startup Leadership - Entrepreneurship 101
Lessons in Startup Leadership - Entrepreneurship 101MaRS Discovery District
 
Startup finances: Forecasting, Modelling & Metrics
Startup finances:  Forecasting, Modelling & MetricsStartup finances:  Forecasting, Modelling & Metrics
Startup finances: Forecasting, Modelling & MetricsMaRS Discovery District
 
10+ Steps to Scaling Your Cheer Squad - Entrepreneurship 101
10+ Steps to Scaling Your Cheer Squad - Entrepreneurship 10110+ Steps to Scaling Your Cheer Squad - Entrepreneurship 101
10+ Steps to Scaling Your Cheer Squad - Entrepreneurship 101MaRS Discovery District
 
Scaling Your Startup - Entrepreneurship 101
Scaling Your Startup - Entrepreneurship 101Scaling Your Startup - Entrepreneurship 101
Scaling Your Startup - Entrepreneurship 101MaRS Discovery District
 
Scaling Outside Canada - Entrepreneurship 101
Scaling Outside Canada - Entrepreneurship 101Scaling Outside Canada - Entrepreneurship 101
Scaling Outside Canada - Entrepreneurship 101MaRS Discovery District
 
Partnership Negotiations - Entrepreneurship 101
Partnership Negotiations - Entrepreneurship 101Partnership Negotiations - Entrepreneurship 101
Partnership Negotiations - Entrepreneurship 101MaRS Discovery District
 
Art of the deal 101: Notes from the Trenches - Entrepreneurship 101
Art of the deal 101: Notes from the Trenches - Entrepreneurship 101Art of the deal 101: Notes from the Trenches - Entrepreneurship 101
Art of the deal 101: Notes from the Trenches - Entrepreneurship 101MaRS Discovery District
 
The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101
The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101
The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101MaRS Discovery District
 
Sales Putting the Fun in Funnel - Entrepreneurship 101
Sales Putting the Fun in Funnel - Entrepreneurship 101Sales Putting the Fun in Funnel - Entrepreneurship 101
Sales Putting the Fun in Funnel - Entrepreneurship 101MaRS Discovery District
 

Más de MaRS Discovery District (20)

How to Pitch a VC - Entrepreneurship 101
How to Pitch a VC - Entrepreneurship 101How to Pitch a VC - Entrepreneurship 101
How to Pitch a VC - Entrepreneurship 101
 
The Pitch - Entrepreneurship 101
The Pitch - Entrepreneurship 101The Pitch - Entrepreneurship 101
The Pitch - Entrepreneurship 101
 
25 lessons learned - Entrepreneurship 101
25 lessons learned - Entrepreneurship 10125 lessons learned - Entrepreneurship 101
25 lessons learned - Entrepreneurship 101
 
So you want to start a business? - Entrepreneurship 101
So you want to start a business? - Entrepreneurship 101So you want to start a business? - Entrepreneurship 101
So you want to start a business? - Entrepreneurship 101
 
Lessons in Startup Leadership - Entrepreneurship 101
Lessons in Startup Leadership - Entrepreneurship 101Lessons in Startup Leadership - Entrepreneurship 101
Lessons in Startup Leadership - Entrepreneurship 101
 
Why Should I Work for You? (The EVP)
Why Should I Work for You? (The EVP)Why Should I Work for You? (The EVP)
Why Should I Work for You? (The EVP)
 
A New Hiring Paradigm
A New Hiring ParadigmA New Hiring Paradigm
A New Hiring Paradigm
 
How to Find and Hire Top Talent
How to Find and Hire Top TalentHow to Find and Hire Top Talent
How to Find and Hire Top Talent
 
Startup finances: Forecasting, Modelling & Metrics
Startup finances:  Forecasting, Modelling & MetricsStartup finances:  Forecasting, Modelling & Metrics
Startup finances: Forecasting, Modelling & Metrics
 
Financial Modelling
Financial Modelling Financial Modelling
Financial Modelling
 
Forecasting Revenue
Forecasting RevenueForecasting Revenue
Forecasting Revenue
 
10+ Steps to Scaling Your Cheer Squad - Entrepreneurship 101
10+ Steps to Scaling Your Cheer Squad - Entrepreneurship 10110+ Steps to Scaling Your Cheer Squad - Entrepreneurship 101
10+ Steps to Scaling Your Cheer Squad - Entrepreneurship 101
 
Scaling Your Startup - Entrepreneurship 101
Scaling Your Startup - Entrepreneurship 101Scaling Your Startup - Entrepreneurship 101
Scaling Your Startup - Entrepreneurship 101
 
Scaling Outside Canada - Entrepreneurship 101
Scaling Outside Canada - Entrepreneurship 101Scaling Outside Canada - Entrepreneurship 101
Scaling Outside Canada - Entrepreneurship 101
 
Partnership Negotiations - Entrepreneurship 101
Partnership Negotiations - Entrepreneurship 101Partnership Negotiations - Entrepreneurship 101
Partnership Negotiations - Entrepreneurship 101
 
Licensing - Entrepreneurship 101
Licensing - Entrepreneurship 101Licensing - Entrepreneurship 101
Licensing - Entrepreneurship 101
 
Art of the deal 101: Notes from the Trenches - Entrepreneurship 101
Art of the deal 101: Notes from the Trenches - Entrepreneurship 101Art of the deal 101: Notes from the Trenches - Entrepreneurship 101
Art of the deal 101: Notes from the Trenches - Entrepreneurship 101
 
Social Selling - Entrepreneurship 101
Social Selling - Entrepreneurship 101Social Selling - Entrepreneurship 101
Social Selling - Entrepreneurship 101
 
The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101
The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101
The Art & Science of Sales: Tips, Tricks & Tools - Entrepreneurship 101
 
Sales Putting the Fun in Funnel - Entrepreneurship 101
Sales Putting the Fun in Funnel - Entrepreneurship 101Sales Putting the Fun in Funnel - Entrepreneurship 101
Sales Putting the Fun in Funnel - Entrepreneurship 101
 

Último

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 

Último (20)

SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Deep Learning: Changing the Playing Field of Artificial Intelligence - MaRS Global Leadership

  • 1. Deep Learning Ruslan Salakhutdinov Department of Computer Science University of Toronto
  • 2. Images & Video Relational Data/ Social Network Massive increase in both computational power and the amount of data available from web, video cameras, laboratory measurements. Mining for Structure Speech & Audio Gene Expression Text & Language Geological Data Product Recommendation Climate Change Mostly Unlabeled • Develop statistical models that can discover underlying structure, cause, or statistical correlation from data in unsupervised or semi-supervised way. • Multiple application domains. Deep Learning
  • 3. Impact of Deep Learning • Speech Recognition • Computer Vision • Language Understanding • Recommender Systems • Drug Discovery and Medical Image Analysis
  • 4. Deep Learning in Action • Achieves state-of-the-art on many object recognition tasks! Try it at deeplearning.cs.toronto.edu!
  • 5. Example: Understanding Images Model Samples • a group of people in a crowded area . • a group of people are walking and talking . • a group of people, standing around and talking . • a group of people that are in the outside . strangers, coworkers, conventioneers, attendants, patrons TAGS: Nearest Neighbor Sentence: people taking pictures of a crazy person
  • 6. Image Tagging and Retrieval mosque, tower, building, cathedral, dome, castle kitchen, stove, oven, refrigerator, microwave ski, skiing, skiers, skiiers, snowmobile bowl, cup, soup, cups, coffee beach snow
  • 8. Merck Molecular Activity Challenge • Deep Learning technique: Predict biological activities of different molecules, given numerical descriptors generated from their chemical structures. • To develop new medicines, it is important to identify molecules that are highly active toward their intended targets. Toronto team takes first place!
  • 9. • From their blog: - Restricted Boltzmann machines - Probabilistic Matrix Factorization (Salakhutdinov et. al. ICML, 2007, Salakhutdinov and Mnih, 2008) To put these algorithms to use, we had to work to overcome some limitations, for instance that they were built to handle 100 million ratings, instead of the more than 5 billion that we have, and that they were not built to adapt as members added more ratings. But once we overcame those challenges, we put the two algorithms into production, where they are still used as part of our recommendation engine. Netflix uses: Both of these algorithms were developed by us at Toronto!
  • 10.
  • 11. Deep Learning in the News
  • 12. Key Computational Challenges - Learning from billions of (unlabeled) data points - Developing new parallel algorithms Building bigger models using more data improves performance of deep learning algorithms! Scaling up our deep learning algorithms: - Scaling up Computation using clusters of GPUs and FPGAs
  • 13. Building Artificial Intelligence Develop computer algorithms that can: - See and recognize objects around us - Perceive human speech - Understand natural language - Navigate around autonomously - Display human like Intelligence Personal assistants, self-driving cars, etc.
  • 14. Talk Roadmap • Introduction • Key Deep Learning Models • Applications: Multimodal Learning and Language Modeling
  • 15. Learning Feature Representations pixel 1 pixel 2 Learning Algorithm pixel 2 pixel1 Segway Non-SegwayInput Space
  • 16. Learning Feature Representations pixel 2 pixel1 Segway Non-SegwayInput Space Handle Wheel Learning Algorithm Feature Representation Handle Wheel Feature Space
  • 17. Traditional Approaches Image vision features Recognition Object detection Audio classification Audio audio features Speaker identification Data Feature extraction Learning algorithm
  • 18. Computer Vision Features SIFT Spin image HoG RIFT Textons GLOH
  • 19. Computer Vision Features SIFT Spin image HoG RIFT Textons GLOH Deep Learning
  • 22. Example: Boltzmann Machine Input data (e.g. pixel intensities of an image, words from webpages, speech signal). Target variables (response) (e.g. class labels, categories, phonemes). Model parameters Latent (hidden) variables Markov Random Fields, Undirected Graphical Models.
  • 23. Unsupervised Learning Vector of word counts on a webpage Latent variables: semantic topics 804,414 newswire stories (Hinton & Salakhutdinov, Science 2006)
  • 24. Talk Roadmap • Introduction • Key Deep Learning Models • Applications: Multimodal Learning and Language Modeling.
  • 25. Restricted Boltzmann Machines - Can characterize uncertainty. Pair-wise Unary Markov random fields, Boltzmann machines, log-linear models. Image visible variables Feature Detectors Define a proper probabilistic model: - Deal with missing or noisy data. - Can simulate from the model.
  • 26. Modeling Images (Salakhutdinov & Hinton, NIPS 2007; Salakhutdinov & Murray, ICML 2008) Learned features (out of 10,000) 4 million unlabelled images = 0.9 * + 0.8 * + 0.6 * … New Image
  • 27. Modeling Images and Text (Salakhutdinov & Hinton, NIPS 2007; Salakhutdinov & Murray, ICML 2008) Learned ``strokes’’Data: Handwritten characters Learned features: ``topics’’ russian russia moscow yeltsin soviet clinton house president bill congress computer system product software develop trade country import world economy stock wall street point dow Reuters dataset: 804,414 unlabeled newswire stories Bag-of-Words
  • 28. Learned features: ``genre’’ Fahrenheit 9/11 Bowling for Columbine The People vs. Larry Flynt Canadian Bacon La Dolce Vita Independence Day The Day After Tomorrow Con Air Men in Black II Men in Black Friday the 13th The Texas Chainsaw Massacre Children of the Corn Child's Play The Return of Michael Myers Scary Movie Naked Gun Hot Shots! American Pie Police Academy Netflix dataset: 480,189 users 17,770 movies Over 100 million ratings State-of-the-art performance on the Netflix dataset. Recommender Engine (Salakhutdinov, Mnih, Hinton, ICML 2007) Multinomial visible: user ratings Binary hidden: user preferences
  • 29. Image Low-level features: Edges Input: Pixels (Salakhutdinov & Hinton, Neural Computation 2012) Deep Boltzmann Machines: Learning Hierarchies of Features
  • 30. Image Higher-level features: Combination of edges Low-level features: Edges Input: Pixels Learn simpler representations, then compose more complex ones (Salakhutdinov & Hinton, Neural Computation 2012) Deep Boltzmann Machines: Learning Hierarchies of Features
  • 31. Learning Multiple Layers • Biological and theoretical justification for learning multiple layers of representation • Biologically inspired learning: - Brain has hierarchical architecture - Cortex appears to have a generic learning algorithm - Humans learn simpler representations, then compose more complex ones
  • 32. Learning Feature Hierarchies Layer 1 Primitives Lee et.al., ICML 2009 Layer 2 Parts Layer 3 Objects Learn simpler representations, then compose more complex ones.
  • 35. Good Generative Model? Handwritten Characters Real DataSimulated
  • 36. Good Generative Model? Handwritten Characters Real Data Simulated
  • 38. Good Generative Model? MNIST Handwritten Digit Dataset
  • 39. Talk Roadmap • Introduction • Key Deep Learning Models • Applications: Multimodal Learning and Language Modeling.
  • 40. Data – Collection of Modalities • Multimedia content on the web - image + text + audio. • Product recommendation systems. • Robotics applications. Audio Vision Touch sensors Motor control sunset, pacificocean, bakerbeach, seashore, ocean car, automobile
  • 41. Shared Concept “Modality-free” representation “Modality-full” representation “Concept” sunset, pacific ocean, baker beach, seashore, ocean
  • 42. • Improve Classification Multi-Modal Input pentax, k10d, kangarooisland southaustralia, sa australia australiansealion 300mm SEA / NOT SEA • Retrieve data from one modality when queried using data from another modality beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves • Fill in Missing Modalities beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves
  • 43. Challenges - I Very different input representations Image Text sunset, pacific ocean, baker beach, seashore, ocean • Images – real-valued, dense Difficult to learn cross-modal features from low-level representations. Dense • Text – discrete, sparse Sparse
  • 44. Challenges - II Noisy and missing data Image Text pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion mickikrimmel, mickipedia, headshot unseulpixel, naturey < no text>
  • 45. Challenges - II Image Text Text generated by the model beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves portrait, girl, woman, lady, blonde, pretty, gorgeous, expression, model night, notte, traffic, light, lights, parking, darkness, lowlight, nacht, glow fall, autumn, trees, leaves, foliage, forest, woods, branches, path pentax, k10d, pentaxda50200, kangarooisland, sa, australiansealion mickikrimmel, mickipedia, headshot unseulpixel, naturey < no text>
  • 46. 0 0 1 0 0 Dense, real-valued image features Gaussian model Replicated Softmax Multimodal DBM Word counts (Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)
  • 47. Multimodal DBM 0 0 1 0 0 Dense, real-valued image features Gaussian model Replicated Softmax Word counts (Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)
  • 48. Gaussian model Replicated Softmax 0 0 1 0 0 Multimodal DBM Word counts Dense, real-valued image features (Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014)
  • 49. Text Generated from Images canada, nature, sunrise, ontario, fog, mist, bc, morning insect, butterfly, insects, bug, butterflies, lepidoptera graffiti, streetart, stencil, sticker, urbanart, graff, sanfrancisco portrait, child, kid, ritratto, kids, children, boy, cute, boys, italy dog, cat, pet, kitten, puppy, ginger, tongue, kitty, dogs, furry sea, france, boat, mer, beach, river, bretagne, plage, brittany Given Generated Given Generated
  • 50. Text Generated from Images Given Generated water, glass, beer, bottle, drink, wine, bubbles, splash, drops, drop portrait, women, army, soldier, mother, postcard, soldiers obama, barackobama, election, politics, president, hope, change, sanfrancisco, convention, rally
  • 51. Images from Text water, red, sunset nature, flower, red, green blue, green, yellow, colors chocolate, cake Given Retrieved
  • 52. MIR-Flickr Dataset Huiskes et. al. • 1 million images along with user-assigned tags. sculpture, beauty, stone nikon, green, light, photoshop, apple, d70 white, yellow, abstract, lines, bus, graphic sky, geotagged, reflection, cielo, bilbao, reflejo food, cupcake, vegan d80 anawesomeshot, theperfectphotographer, flash, damniwishidtakenthat, spiritofphotography nikon, abigfave, goldstaraward, d80, nikond80
  • 53. Results • Logistic regression on top-level representation. • Multimodal Inputs Learning Algorithm MAP Precision@50 Random 0.124 0.124 LDA [Huiskes et. al.] 0.492 0.754 SVM [Huiskes et. al.] 0.475 0.758 DBM-Labelled 0.526 0.791 Deep Belief Net 0.638 0.867 Autoencoder 0.638 0.875 DBM 0.641 0.873 Mean Average Precision Labeled 25K examples + 1 Million unlabelled State-of-the-art performance
  • 54. Generating Sentences Input A man skiing down the snow covered mountain with a dark sky in the background. Output • More challenging problem. • How can we generate complete descriptions of images?
  • 55. Learning Semantic Representation • Key Idea: Each word w is represented as a D-dimensional real-valued vector rw 2 RD. Dimension 2 Dimension2 Semantic Space table chair dolphin whale November
  • 56. Joint Feature space A castle and reflecting water A ship sailing in the ocean A plane flying in the sky Multimodal Neural Language Models (Kiros, et.al., ICML 2014) Learning Semantic Representation
  • 57. Tagging and Retrieval mosque, tower, building, cathedral, dome, castle kitchen, stove, oven, refrigerator, microwave ski, skiing, skiers, skiiers, snowmobile bowl, cup, soup, cups, coffee beach snow
  • 61. Caption Generation Model Samples • Two men in a room talking on a table . • Two men are sitting next to each other . • Two men are having a conversation at a table . • Two men sitting at a desk next to each other . colleagues waiters waiter entrepreneurs busboy TAGS:
  • 62. More Examples spider, spiders, arachnid, insects, insect creepy, spooky, elfin Model Samples Giant spider found in the Netherlands. Look at the new spider web. This was near the black spider web. I like the spider. The pattern of one spider web. TAGS:
  • 63. Multi-Modal Models Laser scans Images Video Text & Language Time series data Speech & Audio Develop learning systems that come closer to displaying human like intelligence
  • 64. Summary • Efficient learning algorithms for Hierarchical Generative Models. Learning more adaptive, robust, and structured representations. • Deep models can improve current state-of-the art in many application domains:  Object recognition and detection, text and image retrieval, handwritten character and speech recognition, and others. Text & image retrieval / Object recognition Learning a Category Hierarchy Dealing with missing/occluded data HMM decoder Speech Recognition sunset, pacific ocean, beach, seashore Multimodal Data Object Detection
  • 65. Our Toronto Lab We collaborate with and consult for various organizations