Training Drone Image Models with Grand Theft Auto
Video Learning for Analysis from Deep Embeddings
Timothy Emerick, PhD Sue He Alexander Polis Monica Rajendiran
Example object labels: truck, truck, bicyclist
A green truck is crossing an intersection.
A group of people are crossing the street.
★ Machine vision models often require large amounts of labeled data to
train well
★ Existing labeled datasets can be too generic, covering a broader concept
space than our purposes need
ImageNet
14 million+ images of 21K+ class entities
YouTube-8M
450K+ hours of 4700+ class entities
Olga Russakovsky*, Jia Deng*, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma,
Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg
and Li Fei-Fei. (* = equal contribution) ImageNet Large Scale Visual Recognition
Challenge. IJCV, 2015.
Abu-El-Haija, Sami, et al. "YouTube-8M: A large-scale video classification
benchmark." arXiv preprint arXiv:1609.08675 (2016).
★ Game graphics have become
extremely realistic over the
years
★ Games are scriptable, enabling
complex simulations
★ Simulating in-game lets you
skip low-level tasks such as
movement animation and
routing
★ The Rockstar Advanced Game
Engine (RAGE) renders highly realistic
graphics
★ A huge modding community
provides lots of customization
★ Programmatically configurable
options
○ Script Hook V is a library that
lets you write scripts that run in-game
○ Thousands of callable game functions
○ We can generate entities of choice
in-game and have them perform
complex actions
○ Vehicles: driving, turning, waiting at
stoplights
○ People: entering/exiting vehicles,
waiting to cross the street, parking
○ Environment: weather, time of day,
camera elevation, zoom
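The configurable options above could be captured as a scene description that is sampled at random before each recording. The sketch below is only an illustration: the `Scenario` class, the option lists, the weather names, and the numeric ranges are all hypothetical, and it does not call Script Hook V or any game API.

```python
import random
from dataclasses import dataclass

# Hypothetical option lists, loosely based on the categories named on the slide.
VEHICLE_ACTIONS = ["driving", "turning", "waiting at stoplight"]
PERSON_ACTIONS = ["entering vehicle", "exiting vehicle", "waiting to cross the street"]
WEATHER = ["clear", "rain", "fog"]  # assumed values, not taken from the game


@dataclass
class Scenario:
    vehicle_action: str
    person_action: str
    weather: str
    hour_of_day: int            # 0-23
    camera_elevation_m: float   # assumed unit: metres above ground
    zoom: float                 # assumed zoom factor


def sample_scenario(rng: random.Random) -> Scenario:
    """Draw one random scene configuration from the option lists."""
    return Scenario(
        vehicle_action=rng.choice(VEHICLE_ACTIONS),
        person_action=rng.choice(PERSON_ACTIONS),
        weather=rng.choice(WEATHER),
        hour_of_day=rng.randrange(24),
        camera_elevation_m=rng.uniform(20.0, 120.0),
        zoom=rng.uniform(1.0, 4.0),
    )


rng = random.Random(0)  # fixed seed so a run can be reproduced
scenarios = [sample_scenario(rng) for _ in range(3)]
for scenario in scenarios:
    print(scenario)
```

Seeding the generator makes it possible to regenerate the same set of scenes later, which can help when footage needs to be re-rendered.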
★ Grand Theft Auto Dataset:
○ Video footage
○ Objects of interest per frame
(vehicles and pedestrians)
○ Object location information
(bounding box information)
○ Text Descriptions
(e.g. a white truck is turning left)
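One way to picture a single record of such a dataset is a per-frame structure holding the objects, their bounding boxes, and the text description. The field names and the JSON layout below are assumptions made for illustration, not the schema the authors used.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List


@dataclass
class BoundingBox:
    label: str   # e.g. "truck" or "pedestrian"
    x_min: int   # pixel coordinates, top-left origin (assumed convention)
    y_min: int
    x_max: int
    y_max: int


@dataclass
class FrameAnnotation:
    video_id: str       # hypothetical identifier for the clip
    frame_index: int
    objects: List[BoundingBox] = field(default_factory=list)
    caption: str = ""


frame = FrameAnnotation(
    video_id="clip_0001",
    frame_index=42,
    objects=[BoundingBox("truck", 120, 80, 260, 190)],
    caption="a white truck is turning left",
)

# asdict() converts nested dataclasses too, so the record serializes directly.
record = json.dumps(asdict(frame))
restored = json.loads(record)
print(restored["objects"][0]["label"], "-", restored["caption"])
```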
CNNs
★ Extract features from the input image and
distill them down to class predictions
★ Preserve the spatial relationships between
pixels
Example classes: Bird, Airplane, Superman, Car
Input Image
0 2 2 3 0
2 5 3 3 7
8 7 4 5 3
5 4 4 0 2
8 7 3 8 5

Filter (Weights)
1 0 1
0 1 0
0 0 0

Feature Map (Input Image * Filter)
7 8 5
12 12 15
16 16 7
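The feature map in this walkthrough comes from sliding the 3x3 filter over every 3x3 window of the 5x5 image, multiplying each pixel by the weight on top of it, and summing the products. A minimal sketch in plain Python, assuming stride 1 and no padding (the function name is ours); as in most CNN libraries, the filter is applied without flipping:

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


image = [
    [0, 2, 2, 3, 0],
    [2, 5, 3, 3, 7],
    [8, 7, 4, 5, 3],
    [5, 4, 4, 0, 2],
    [8, 7, 3, 8, 5],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [0, 0, 0],
]

feature_map = convolve2d_valid(image, kernel)
for row in feature_map:
    print(row)
# [7, 8, 5]
# [12, 12, 15]
# [16, 16, 7]
```

For the top-left window, only the three positions with weight 1 contribute: 0 + 2 + 5 = 7.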
3 feature maps produced from 3 filters
Example filter (a common edge-detection kernel):
-1 -1 -1
-1  8 -1
-1 -1 -1
★ YOLO9000 (YOLO v2) is a convolutional neural network
architecture for real-time object
detection
★ Redmon, Joseph and Farhadi, Ali. "YOLO9000:
better, faster, stronger." arXiv (2017).
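Boxes predicted by a detector are commonly compared with annotated boxes using intersection over union (IoU). The sketch below shows that measure only; it is not part of YOLO9000 itself, and the `(x_min, y_min, x_max, y_max)` box layout is an assumption.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 10, 10)
annotated = (5, 0, 15, 10)
print(iou(predicted, annotated))  # overlap of 50 over a union of 150
```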
Game Engine
Action Generation
Camera Control
Environment Control
Annotations
Text Extraction
Pedestrians/Vehicles
Camera
Environment
RNNs
★ Work well with sequential input (e.g. words in
a sentence or a vector of numbers representing
an image)
★ For a given input, incorporate a “feedback”
loop that carries forward the information received and the
decision made for the previous input in the
sequence
Input -> Neural Network -> Output
Vocabulary of 4 letters:
h e l o
Letters could be encoded as:
h [1 0 0 0]
e [0 1 0 0]
l [0 0 1 0]
o [0 0 0 1]
Input -> Output at each step:
“h” -> “e”
“e” -> “l”
“l” -> “l”
“l” -> “o”
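The letter example can be sketched in a few lines: one-hot encode each letter, pair every letter with the one that follows it, and pass the sequence through a recurrent step that mixes the current input with the previous hidden state. The weights below are arbitrary fixed numbers chosen for illustration (nothing is trained), and the hidden size of 2 is an assumption.

```python
import math

VOCAB = ["h", "e", "l", "o"]


def one_hot(letter):
    """Encode a letter as a 4-element vector, e.g. 'e' -> [0, 1, 0, 0]."""
    return [1 if letter == v else 0 for v in VOCAB]


def training_pairs(word):
    """Pair each letter with the next one: (input, expected output)."""
    return list(zip(word[:-1], word[1:]))


def rnn_step(x, h_prev, w_xh, w_hh):
    """One recurrent step: h_t = tanh(W_xh x + W_hh h_prev)."""
    h = []
    for k in range(len(h_prev)):
        s = sum(w_xh[k][i] * x[i] for i in range(len(x)))
        s += sum(w_hh[k][j] * h_prev[j] for j in range(len(h_prev)))
        h.append(math.tanh(s))
    return h


pairs = training_pairs("hello")
print(pairs)

# Arbitrary illustrative weights: 2 hidden units, 4 inputs.
w_xh = [[0.5, -0.3, 0.8, 0.1], [-0.2, 0.4, 0.3, -0.6]]
w_hh = [[0.1, 0.2], [-0.3, 0.4]]

h = [0.0, 0.0]
states = []
for letter, _ in pairs:
    h = rnn_step(one_hot(letter), h, w_xh, w_hh)
    states.append(h)
```

The two “l” inputs produce different hidden states because each step also sees the state left by the previous letter, which is what lets the network output “l” after the first “l” and “o” after the second.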
Diagram: a chain of LSTM cells emitting caption words in sequence (A, white, car, driving)
LSTM
★ Long Short-Term Memory (LSTM) networks are a variation of RNNs
★ LSTMs use additional units of “memory” to maintain longer
connections across sequence inputs
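The extra “memory” is usually described as a cell state that gates can keep, overwrite, or expose. Below is a minimal single-unit sketch of the standard LSTM update with scalar weights; the parameter names and values are assumptions chosen so that the forget gate stays open and the input gate stays closed.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM with scalar weights in dict `p`."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g      # memory: keep old content, add new content
    h = o * math.tanh(c)        # output exposed to the next step
    return h, c


# Illustrative parameters: large forget bias keeps memory, negative input
# bias blocks new writes.
params = {
    "wf": 0.1, "uf": 0.1, "bf": 10.0,
    "wi": 0.1, "ui": 0.1, "bi": -10.0,
    "wo": 0.1, "uo": 0.1, "bo": 0.0,
    "wg": 0.1, "ug": 0.1, "bg": 0.0,
}

h, c = 0.0, 1.0  # start with a value of 1.0 stored in memory
for _ in range(20):
    h, c = lstm_step(0.5, h, c, params)
print(round(c, 3))
```

With these settings the stored value survives 20 steps almost unchanged, which illustrates how the cell state can carry information across long stretches of a sequence.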
Attention
★ Train the model to focus on salient objects in
the image
★ Instead of feeding features from the
entire image to an RNN, feed only the
salient region’s features
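The slide describes passing only the salient region's features to the RNN. A common soft variant, swapped in here for illustration, scores every region, turns the scores into weights with a softmax, and feeds the weighted average instead. The region features and scores below are made-up numbers, and in a real model the scores would be learned.

```python
import math


def softmax(scores):
    """Convert raw scores to weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_features, scores):
    """Weighted average of region feature vectors using softmax weights."""
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return context, weights


# Three image regions with 2-dimensional features (illustrative values).
region_features = [[0.1, 0.9], [0.8, 0.2], [0.4, 0.4]]
scores = [0.1, 4.0, 0.2]  # the second region is scored as most salient

context, weights = attend(region_features, scores)
print(weights)
print(context)
```

Because the second region dominates the weights, the context vector ends up close to that region's features.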
“A man in a white shirt is walking”
“A white service vehicle is parked”
Search by Text in Video
★ Extract captions from video and store
them in an index
★ Fast video search by text query over large
amounts of video
Example query: “red truck”
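A simple way to picture the caption index is an inverted index that maps each caption word to the frames where it appears, so a query such as “red truck” becomes a set intersection. The sketch below assumes captions arrive as `(video_id, frame_index, caption)` tuples; the clip names and the first caption are invented for the example, and the authors' actual index is not described in the slides.

```python
from collections import defaultdict


def build_index(captions):
    """Map each lowercase caption word to the set of (video_id, frame) hits."""
    index = defaultdict(set)
    for video_id, frame_index, caption in captions:
        for word in caption.lower().split():
            index[word.strip(".,")].add((video_id, frame_index))
    return index


def search(index, query):
    """Return frames whose caption contains every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


captions = [
    ("clip_a", 10, "A red truck is turning left."),
    ("clip_a", 55, "A white service vehicle is parked"),
    ("clip_b", 3, "A man in a white shirt is walking"),
    ("clip_b", 90, "A red car is parked"),
]

index = build_index(captions)
hits = search(index, "red truck")
print(hits)
```

This matches individual words, not phrases, so a caption containing both “red” and “truck” anywhere would count as a hit.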
Search by Example in Video
★ A user-defined bounding box on a video
frame
★ Query for similar objects of interest in the
entirety of a video dataset, at the frame
level
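Search by example can be sketched as nearest-neighbour lookup over embeddings: embed the contents of the user's bounding box, then rank stored per-frame object embeddings by similarity. The 3-dimensional vectors below are invented stand-ins for real model embeddings, and cosine similarity is an assumed choice of metric.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_by_example(query_embedding, frame_embeddings, top_k=2):
    """Return the keys of the top_k stored embeddings most similar to the query."""
    scored = [
        (cosine_similarity(query_embedding, emb), key)
        for key, emb in frame_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:top_k]]


# Hypothetical embeddings keyed by (video_id, frame_index).
frame_embeddings = {
    ("clip_a", 10): [0.9, 0.1, 0.0],
    ("clip_a", 55): [0.0, 1.0, 0.2],
    ("clip_b", 3): [0.8, 0.2, 0.1],
}
query = [1.0, 0.1, 0.0]  # embedding of the user-selected box (made up)

matches = search_by_example(query, frame_embeddings, top_k=2)
print(matches)
```

A linear scan like this is fine for a sketch; a large video collection would normally need an approximate nearest-neighbour index.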
★ GTA V allows us to create fully annotated, custom-tailored,
photorealistic datasets
★ We can use this dataset to train models for object
detection/localization, captioning, and search by example or text in
overhead video
★ Models trained on GTA data may also apply to areas
such as real-time security camera alerting and self-driving cars
www.ccri.com
mrajendiran@ccri.com


Editor's notes

  1. Vehicles - Color, Type, Damage; People - Clothing Color, Gender, Number; Buildings - Type
  2. Video captioning