3D Multi Object GAN
Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi Object Scenes
8/31/2017
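[Figure: FCR-alphaGAN overview. A fully convolutional encoder maps a real scene x to z_enc; the generator and refiner produce x_rec from z_enc and x_gen from z drawn from a normal distribution; a discriminator judges scenes as real/fake, and a code discriminator judges (reshaped) codes as real/fake]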
Agenda
• Introduction
• Dataset
• Network Architecture
• Loss Functions
• Experiments
• Evaluations
• Suggestions of Future Work
Source Code:
https://github.com/yunishi3/3D-FCR-alphaGAN
Introduction
Generative models of 3D multi-object scenes should be an extremely important task for the AR/VR and graphics fields, in order to:
- Synthesize a variety of novel 3D multi-object scenes
- Recognize scenes, including their shapes, objects, and layouts
So far, only single objects have been generated.
[Figure: single objects generated by a simple 3D-GAN [1] and a simple 3D-VAE [2], contrasted with multi-object scenes]
Dataset
• SUNCG dataset
Only the voxel models were extracted from the SUNCG dataset.
-Voxel grid: 80 x 48 x 80 (downsampled from 240 x 144 x 240)
-12 classes:
[empty, ceiling, floor, wall, window, chair, bed, sofa, table, tvs, furn, objs]
-Amount: around 185,000 scenes
-Removed the cropping by camera angles
-Kept only scenes with more than 10,000 occupied voxels
-No labels
From Princeton [3]
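The preprocessing above can be sketched roughly as follows. The slides do not say how the 240x144x240 grids were reduced by a factor of 3, so the majority vote over each 3x3x3 block used here is a hypothetical choice, as is storing a scene as a sparse dict of occupied voxels. `downsample` and `keep_scene` are illustrative names, not functions from the linked repository.

```python
from collections import Counter

FACTOR = 3              # 240x144x240 -> 80x48x80
MIN_OCCUPIED = 10_000   # keep only scenes with more than this many occupied voxels


def downsample(scene, factor=FACTOR):
    """Downsample a sparse scene {(x, y, z): class_id} by `factor` along each axis.

    Hypothetical rule: each coarse voxel takes the most frequent class among
    the occupied fine voxels that fall inside its factor^3 block.
    """
    votes = {}
    for (x, y, z), cls in scene.items():
        block = (x // factor, y // factor, z // factor)
        votes.setdefault(block, Counter())[cls] += 1
    return {block: counts.most_common(1)[0][0] for block, counts in votes.items()}


def keep_scene(scene, min_occupied=MIN_OCCUPIED):
    """Filter: keep the scene only if it has more than `min_occupied` occupied voxels."""
    return len(scene) > min_occupied
```

For example, two chair voxels and one sofa voxel in the same 3x3x3 block collapse into a single chair voxel at the coarse resolution.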
Challenges and difficulties
The data is sparse and highly varied.
Average occupancy ratio of each class in the dataset [%]
empty   ceiling  floor  wall   window  chair  bed    sofa   table  tv     furniture  objects
92.466  0.945    1.103  1.963  0.337   0.070  0.368  0.378  0.107  0.009  1.406      0.846
[Figure: example scenes: dining room, bedroom, garage, living room]
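A minimal sketch of how occupancy ratios like those in the table can be computed, assuming class ids follow the order of the class list on the dataset slide (0 = empty, ..., 11 = objects). Because every scene uses the same 80x48x80 grid, pooling voxels over all scenes gives the same numbers as averaging per-scene ratios. The table also makes the imbalance concrete: empty voxels outnumber TV voxels by roughly 10,000 to 1.

```python
from collections import Counter

CLASSES = ["empty", "ceiling", "floor", "wall", "window", "chair",
           "bed", "sofa", "table", "tv", "furniture", "objects"]


def occupancy_ratios(labels):
    """Percentage of voxels per class, pooled over every voxel of every scene.

    labels: iterable of per-voxel class ids (0 = empty, ..., 11 = objects).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    return {name: 100.0 * counts[i] / total for i, name in enumerate(CLASSES)}
```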
Network Architecture
Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks
-Architecture similar to 3D-GAN [1]
-The encoder and discriminator are almost mirror images of the generator
-The latent space is a fully convolutional layer (5x3x5x16)
-The fully convolutional latent space lets z_enc represent more features
-The last activation of the generator is a softmax over the 12 classes
-The code discriminator is fully connected with 2 hidden layers [4]
-The refiner's architecture is similar to SimGAN [5]
Novel contributions:
-Multi-class activation for multi-object scenes
-Fully convolutional latent space
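Two of the design points above can be checked in a few lines. The generator's per-voxel softmax turns 12 logits into class probabilities, with the predicted class being the argmax. The 5x3x5x16 latent holds 5*3*5*16 = 1200 values, each tied to a cell of the spatial grid. This is a plain-Python sketch. `voxel_class` is a hypothetical helper, not a function from the linked repository.

```python
import math

NUM_CLASSES = 12
LATENT_SHAPE = (5, 3, 5, 16)  # 5x3x5 spatial grid with 16 channels


def softmax(logits):
    """Numerically stable softmax over one voxel's class logits."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def voxel_class(logits):
    """Class probabilities and the most likely class index for a single voxel."""
    probs = softmax(logits)
    return probs, max(range(len(probs)), key=probs.__getitem__)


LATENT_SIZE = math.prod(LATENT_SHAPE)  # total number of latent values (1200)
```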
Network Architecture
Inspired by [1]
Generator: z -> 5x3x5x512 -> 10x6x10x256 -> 20x12x20x128 -> 40x24x40x64 -> 80x48x80x12
Each network
-3D deconvolution (stride 2)
-Batch normalization
-Leaky ReLU (discriminator), ReLU (encoder, generator)
Last activation
-Softmax
[Figure: generator network. z (5x3x5x16) -> reshape & FC, batchnorm, ReLU -> three 5x3x5 stride-2 deconvolutions, each followed by batchnorm and ReLU -> a final 5x3x5 stride-2 deconvolution followed by softmax. Adapted from Wu et al. 2016, MIT]
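The generator's shape progression follows from the stride. Assuming "same" padding (an assumption; the slides give only the stride), a stride-2 3D transposed convolution doubles each spatial dimension, so four of them take the 5x3x5 grid to 80x48x80. The function names are illustrative.

```python
def deconv3d_output_shape(shape, out_channels, stride=2):
    """Output shape (D, H, W, C) of a 'same'-padded 3D transposed convolution."""
    d, h, w, _ = shape
    return (d * stride, h * stride, w * stride, out_channels)


def generator_shapes(start=(5, 3, 5, 512), channels=(256, 128, 64, 12)):
    """Feature-map shapes after reshape & FC and after each of the four deconvolutions."""
    shapes = [start]
    for c in channels:
        shapes.append(deconv3d_output_shape(shapes[-1], c))
    return shapes


for s in generator_shapes():
    print("x".join(map(str, s)))
# prints, one per line:
# 5x3x5x512
# 10x6x10x256
# 20x12x20x128
# 40x24x40x64
# 80x48x80x12
```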
Network Architecture
Inspired by [5]
Each network
-3D deconvolution (stride 1)
-ReLU (encoder, generator)
-ResNet block repeated 4 times (each with different weights)
Last activation
-Softmax
Refiner Network
[Figure from SimGAN [5]: unlabeled real images, simulated synthetic images, and refined images; a ResNet block with two n x n convolutional layers, ReLU, and a skip connection]
Refiner: 80x48x80x12 -> 80x48x80x32 -> ResNet block x4 (3x3x3 convolutions with ReLU) -> 80x48x80x32 -> 80x48x80x12
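The refiner's residual structure is illustrated here with a 1D toy analogue. It uses kernel width 3 instead of 3x3x3 and one channel instead of 32. The block follows the SimGAN pattern of conv, ReLU, conv, skip-add, ReLU. Each of the four blocks gets its own kernels, mirroring "different weights". All names are hypothetical; a real refiner would use a deep learning framework's 3D convolution.

```python
def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding so the output length equals len(x).

    `kernel` must have odd length (3 here, standing in for a 3x3x3 kernel).
    """
    r = len(kernel) // 2
    padded = [0.0] * r + list(x) + [0.0] * r
    return [sum(k * padded[i + j] for j, k in enumerate(kernel)) for i in range(len(x))]


def relu(x):
    return [max(0.0, v) for v in x]


def resnet_block(x, k1, k2):
    """SimGAN-style block: conv -> ReLU -> conv, add the input back, then ReLU."""
    h = conv1d_same(relu(conv1d_same(x, k1)), k2)
    return relu([a + b for a, b in zip(x, h)])


def refiner_body(x, blocks):
    """Apply the ResNet blocks in order; each (k1, k2) pair is a separate set of weights."""
    for k1, k2 in blocks:
        x = resnet_block(x, k1, k2)
    return x
```

With all-zero kernels a block reduces to the identity on non-negative inputs. This is one reason residual blocks are a convenient choice for a refiner that should only make small corrections.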
Loss / Training
Encoder: distribution GAN loss and reconstruction loss
The discriminator tries to distinguish real and fake scenes accurately; the generator tries to fool the discriminator.
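Reconstruction loss (reconstruction accuracy should be high):
$$\mathcal{L}_{rec} = \sum_{n \in \text{class}} w_n \left( -\gamma\, x \log x_{rec} - (1-\gamma)(1-x) \log\left(1-x_{rec}\right) \right)$$
where $w_n$ are occupancy-normalized class weights, recomputed for every batch.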
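The reconstruction loss can be sketched as a class-weighted binary cross-entropy. Here `gamma` trades off the occupied (x = 1) term against the empty (x = 0) term. Its value is not given on the slides, so it is left as a parameter. The slides also do not define "occupancy normalized weights" precisely. `occupancy_weights` below is a hypothetical rule: inverse per-batch occupancy, smoothed and normalized to sum to 1, so rare classes such as TVs weigh more than empty space.

```python
import math


def occupancy_weights(batch, smoothing=0.01):
    """Hypothetical per-batch class weights w_n: inverse occupancy, normalized to sum to 1.

    batch: list of voxels, each a one-hot list over the classes.
    `smoothing` keeps classes absent from the batch from getting infinite weight.
    """
    num_voxels = len(batch)
    occupancy = [sum(v[n] for v in batch) / num_voxels for n in range(len(batch[0]))]
    inverse = [1.0 / (o + smoothing) for o in occupancy]
    total = sum(inverse)
    return [i / total for i in inverse]


def reconstruction_loss(x, x_rec, w, gamma, eps=1e-7):
    """sum_n w_n * (-gamma x log x_rec - (1 - gamma)(1 - x) log(1 - x_rec)), averaged over voxels.

    x: one-hot targets per voxel; x_rec: predicted class probabilities per voxel.
    """
    total = 0.0
    for target, pred in zip(x, x_rec):
        for n, (t, p) in enumerate(zip(target, pred)):
            p = min(max(p, eps), 1.0 - eps)  # clamp so both logs stay finite
            total += w[n] * (-gamma * t * math.log(p)
                             - (1.0 - gamma) * (1.0 - t) * math.log(1.0 - p))
    return total / len(x)
```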
GAN loss
$$\mathcal{L}_{GAN}^{D} = -\log D(x) - \log\left(1 - D(x_{rec})\right) - \log\left(1 - D(x_{gen})\right)$$
$$\mathcal{L}_{GAN}^{G} = -\log D(x_{rec}) - \log D(x_{gen})$$
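As a sketch, both GAN losses can be evaluated directly from the discriminator's output probabilities for one real scene x, one reconstruction x_rec, and one generated scene x_gen. The code-discriminator losses have the same form, with D_code(z) and D_code(z_enc) in place of D. A training implementation would normally work on logits with a numerically stable library routine; the plain `math.log` version here only illustrates the formulas.

```python
import math


def discriminator_loss(d_real, d_rec, d_gen):
    """L_GAN^D for one sample, given the discriminator's probabilities of 'real'."""
    return -math.log(d_real) - math.log(1.0 - d_rec) - math.log(1.0 - d_gen)


def generator_loss(d_rec, d_gen):
    """L_GAN^G: the generator (with refiner) wants D to call x_rec and x_gen real."""
    return -math.log(d_rec) - math.log(d_gen)
```

When the discriminator outputs 0.5 everywhere, the losses are 3 log 2 and 2 log 2. The generator loss falls as D(x_rec) and D(x_gen) approach 1.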
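Training loss
Encoder:
$$\min_{E} \mathcal{L} = \mathcal{L}_{cGAN}^{E} + \lambda \mathcal{L}_{rec}$$
Generator with refiner:
$$\min_{G} \mathcal{L} = \lambda \mathcal{L}_{rec} + \mathcal{L}_{GAN}^{G}$$
Discriminator:
$$\min_{D} \mathcal{L} = \mathcal{L}_{GAN}^{D}$$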
Learning rate: 0.0001
Batch size: 20 (base), 8 (refiner)
Iterations: 100,000 (75,000 base, 25,000 refiner)
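$$\mathcal{L}_{cGAN}^{D} = -\log D_{code}(z) - \log\left(1 - D_{code}(z_{enc})\right)$$
$$\mathcal{L}_{cGAN}^{E} = -\log D_{code}(z_{enc})$$
The code discriminator tries to distinguish codes from the real (prior) distribution and the encoded (fake) distribution accurately; the encoder tries to fool the code discriminator.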
Code discriminator:
$$\min_{C} \mathcal{L} = \mathcal{L}_{cGAN}^{D}$$
The refiner is trained after the first 75,000 iterations.
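The training schedule on this slide can be written down as a small lookup. This is a sketch under two assumptions: iterations are counted from 0, and the same learning rate is used in both phases. The slides do not say whether the base networks keep updating during the refiner phase, so the sketch encodes only the stated numbers.

```python
BASE_ITERS = 75_000      # base networks (encoder, generator, discriminators)
TOTAL_ITERS = 100_000    # 75,000 base + 25,000 with the refiner
LEARNING_RATE = 1e-4


def phase(step):
    """Training phase, batch size, and learning rate for a 0-indexed iteration."""
    if not 0 <= step < TOTAL_ITERS:
        raise ValueError(f"step must be in [0, {TOTAL_ITERS})")
    if step < BASE_ITERS:
        return {"phase": "base", "batch_size": 20, "lr": LEARNING_RATE}
    return {"phase": "refiner", "batch_size": 8, "lr": LEARNING_RATE}
```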
Experiments
Refiner smooths and refines shapes visually.
Scenes generated from samples of the random distribution were not realistic.
[Figure: scenes generated from the random distribution by FC-VAE and by 3D FCR-alphaGAN]
Reconstruction
[Figure: real scenes and their reconstructions, before and after refinement]
Scenes are almost fully reconstructed, but small shapes disappear.
This architecture worked better than a plain VAE, but not well enough. This is because the encoder did not generalize to the prior distribution.
[Figure: generated scenes before and after refinement]
Result
Numerical evaluation of reconstruction by IoU
Intersection over Union (IoU) [6]
Reconstruction accuracy improved thanks to the fully convolutional latent space and alpha-GAN training.
[Figure: IoU for every class and IoU for all classes, compared at the same latent-space dimensionality]
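IoU compares a reconstruction with its target as |intersection| / |union|. Here it is computed on flat lists of voxel labels, once per class and once overall. The slides do not define "IoU for all". The `iou_all` below treats it as occupancy IoU (any non-empty class counts as occupied), which is an assumption; a mean over per-class IoUs would be another reasonable reading.

```python
def iou_per_class(target, pred, num_classes=12):
    """IoU for each class over flat lists of voxel labels; None if a class is absent in both."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for t, p in zip(target, pred) if t == c and p == c)
        union = sum(1 for t, p in zip(target, pred) if t == c or p == c)
        ious.append(inter / union if union else None)
    return ious


def iou_all(target, pred, empty=0):
    """Occupancy IoU: a voxel counts as occupied if its label is not `empty` (assumed definition)."""
    inter = sum(1 for t, p in zip(target, pred) if t != empty and p != empty)
    union = sum(1 for t, p in zip(target, pred) if t != empty or p != empty)
    return inter / union if union else None
```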
Evaluations
Interpolation
Interpolating between latent codes produces smooth transitions between scenes.
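A minimal interpolation sketch: blend two flattened latent codes (for example two 5x3x5x16 = 1200-value encodings) in equal steps, then decode each intermediate code with the generator. The slides do not state the interpolation scheme, so linear interpolation is an assumption here, and the decoding step is not shown.

```python
def interpolate(z_a, z_b, steps):
    """Return `steps` codes evenly spaced on the line from z_a to z_b (steps >= 2)."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    ts = [i / (steps - 1) for i in range(steps)]
    return [[(1 - t) * a + t * b for a, b in zip(z_a, z_b)] for t in ts]
```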
Evaluations
Latent Space Evaluation
2D mapping of 200 encoded samples, obtained by SVD
Color: 1D embedding of each scene's centroid coordinates, obtained by SVD
[Figure panels: fully convolutional latent space vs. standard VAE]
The fully convolutional latent space is related to spatial context.
The fully convolutional mapping follows the 1D embedding of centroid coordinates from lower right to upper left; the standard VAE mapping does not.
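The 1D coloring can be reproduced in outline. Compute each scene's centroid, center the centroids, and project them onto their first right singular vector. The sketch below is dependency-free, so it finds that vector by power iteration on the 3x3 matrix X^T X instead of calling a library SVD. The same projection onto two components of the 1200-value encoded codes would give the 2D map. Scenes are represented as sets of occupied voxel coordinates, which is an assumption.

```python
def centroid(scene):
    """Mean (x, y, z) of a scene's occupied voxel coordinates."""
    n = len(scene)
    return [sum(p[i] for p in scene) / n for i in range(3)]


def svd_1d_embedding(points, iters=100):
    """Project centered 3D points onto their first right singular vector.

    Uses power iteration on X^T X. The singular vector's sign is arbitrary, so the
    embedding is defined only up to a flip.
    """
    n = len(points)
    mean = [sum(p[i] for p in points) / n for i in range(3)]
    X = [[p[i] - mean[i] for i in range(3)] for p in points]
    C = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
    v = [1.0, 0.5, 0.25]  # fixed start; fails only if orthogonal to the top vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(3)) for i in range(3)]
        norm = sum(a * a for a in w) ** 0.5
        if norm == 0.0:  # all points identical: nothing to embed
            return [0.0] * n
        v = [a / norm for a in w]
    return [sum(r[i] * v[i] for i in range(3)) for r in X]
```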
Evaluations
Latent space evaluation by adding noise
Effect of each individual spatial position of the 5x3x5 latent grid.
Red indicates the amount of change caused by adding normally distributed noise at a single position.
・Position (2,0,4) changes objects in the right back area.
・Position (4,0,1) changes objects in the left front area.
・Position (1,0,0) changes objects in the left back area.
・Position (4,0,4) changes objects in the right front area.
The fully convolutional latent space is related to spatial context.
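The perturbation experiment can be sketched as adding Gaussian noise to the 16 channels at a single spatial position of the 5x3x5x16 latent, then decoding and comparing the result with the unperturbed scene (decoding not shown). The row-major (x, y, z, channel) layout of the flattened code and the default `sigma` are assumptions; `perturb_position` is a hypothetical helper.

```python
import random

LATENT_SHAPE = (5, 3, 5, 16)  # (x, y, z, channels); this memory layout is assumed


def perturb_position(z, pos, sigma=1.0, seed=None):
    """Copy of a flattened latent with N(0, sigma^2) noise added at one spatial position.

    pos = (i, j, k) indexes the 5x3x5 grid; all 16 channels at that position are perturbed.
    """
    X, Y, Z, C = LATENT_SHAPE
    i, j, k = pos
    if not (0 <= i < X and 0 <= j < Y and 0 <= k < Z):
        raise ValueError(f"position {pos} is outside the {X}x{Y}x{Z} grid")
    rng = random.Random(seed)
    out = list(z)
    base = ((i * Y + j) * Z + k) * C  # row-major offset of the position's first channel
    for c in range(C):
        out[base + c] += rng.gauss(0.0, sigma)
    return out
```

For example, `perturb_position(z, (2, 0, 4), seed=0)` touches only indices 544 to 559 of the 1200-value code under this layout.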
Suggestions of Future Work
・Revise the dataset
This dataset is extremely sparse and highly varied. Floors and small objects appear in a huge variety of positions, and some small parts, such as chair legs, broke apart in the dataset because of the downsampling. This makes predicting the latent space very hard. Therefore, it is important to revise the dataset, for example by limiting its variety or adjusting the positions of objects.
・Redefine the latent space
In this work, I defined a single latent space that contains all information, such as the shape and position of each object. As a result, some small objects disappeared in the generated models, and many unrealistic objects were generated. To solve this, it is important to redefine the latent space, for example by separating it into per-object and layout components. However, that would require handling more object varieties and taking multiple objects into account.
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 

3D Multi Object GAN

  • 1. 3D Multi Object GAN: Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks for 3D Multi Object Scenes (8/31/2017)
    [Figure: architecture overview showing the input x, the encoder, the fully convolutional code z_enc (reshape), z drawn from a normal distribution, the generator, two refiners, the outputs x_rec and x_gen, a discriminator (real/fake), and a code discriminator (real/fake)]
  • 2. Agenda • Introduction • Dataset • Network Architecture • Loss Functions • Experiments • Evaluations • Suggestions of Future Work Source Code: https://github.com/yunishi3/3D-FCR-alphaGAN
  • 3. Introduction: 3D multi-object generative models should be an extremely important task for the AR/VR and graphics fields. Such models could:
    - Synthesize a variety of novel 3D multi-object scenes
    - Recognize scenes, including object shapes, objects, and layouts
    So far, only single objects have been generated (e.g., simple 3D-GANs [1] and simple 3D-VAEs [2]); this work targets multi-object scenes.
  • 4. Dataset: SUNCG dataset (Princeton [3]); only the voxel models were extracted.
    - Voxel size: 80 x 48 x 80 (downsized from 240 x 144 x 240)
    - 12 object classes: [empty, ceiling, floor, wall, window, chair, bed, sofa, table, tvs, furn, objs]
    - Size: around 185,000 scenes
    - Scenes are used whole, without trimming by camera angle
    - Only scenes with more than 10,000 occupied voxels were chosen
    - No labels
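The two preprocessing steps above (downsizing by a factor of 3 per axis and filtering by voxel count) can be sketched as follows. This is an illustration under assumptions, not the repository's code: the slide does not say how labels were downsized, so mode pooling (each 3x3x3 block takes its most frequent label) is a hypothetical choice; `EMPTY = 0` assumes "empty" is class index 0; and the resolution at which the 10,000-voxel threshold is applied is not stated. With about 92% empty voxels, mode pooling tends to erase thin parts, so the choice of downsampling rule matters.

```python
from collections import Counter

EMPTY = 0  # assumed class index of "empty" (first entry of the 12-class list)

def downsample_labels(grid, f=3):
    """Mode-pool a 3D label grid (nested lists indexed [x][y][z]) by an integer factor f.
    Hypothetical scheme: each f*f*f block becomes its most frequent label.
    Assumes every dimension is divisible by f (240, 144 and 240 all are for f=3)."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = []
    for i in range(0, nx, f):
        plane = []
        for j in range(0, ny, f):
            row = []
            for k in range(0, nz, f):
                block = Counter(
                    grid[a][b][c]
                    for a in range(i, i + f)
                    for b in range(j, j + f)
                    for c in range(k, k + f)
                )
                row.append(block.most_common(1)[0][0])  # ties: first label encountered
            plane.append(row)
        out.append(plane)
    return out

def occupied_voxels(grid):
    """Count voxels whose label is not EMPTY."""
    return sum(1 for plane in grid for row in plane for v in row if v != EMPTY)

def keep_scene(grid, threshold=10000):
    """Filter from the slide: keep scenes with more than `threshold` occupied voxels."""
    return occupied_voxels(grid) > threshold
```

On a full 240 x 144 x 240 scene, `downsample_labels(scene)` would return an 80 x 48 x 80 grid; whether `keep_scene` should be applied before or after downsizing is left open here.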
  • 5. Challenges, Difficulties: the data is sparse and extremely varied.
    Average occupancy ratio of each object class in the dataset [%]
    Class        Occupancy [%]
    empty        92.466
    ceiling       0.945
    floor         1.103
    wall          1.963
    window        0.337
    chair         0.070
    bed           0.368
    sofa          0.378
    table         0.107
    tv            0.009
    furniture     1.406
    objects       0.846
    [Figure: example scenes: dining room, bedroom, garage, living room]
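The table's numbers make the imbalance concrete. The short sketch below only re-derives quantities from the table (the dictionary keys are my own names for the columns). The ratios sum to about 100%, occupied voxels make up about 7.5% of a scene on average, and empty voxels outnumber TV voxels by roughly 10,000 to 1. It also follows that a trivial model predicting "empty" everywhere would already reach about 92.5% voxel accuracy, which is why plain accuracy is a poor measure for this data.

```python
# Average occupancy ratio per class [%], copied from the table above.
OCCUPANCY_PCT = {
    "empty": 92.466, "ceiling": 0.945, "floor": 1.103, "wall": 1.963,
    "window": 0.337, "chair": 0.070, "bed": 0.368, "sofa": 0.378,
    "table": 0.107, "tv": 0.009, "furniture": 1.406, "objects": 0.846,
}

def sparsity_summary(pct):
    """Return (total %, occupied %, rarest class, empty-to-rarest ratio)."""
    total = sum(pct.values())               # should be ~100 %
    occupied = total - pct["empty"]         # average share of non-empty voxels
    rarest = min(pct, key=pct.get)          # least frequent class
    imbalance = pct["empty"] / pct[rarest]  # empty voxels per rarest-class voxel
    return total, occupied, rarest, imbalance

total, occupied, rarest, imbalance = sparsity_summary(OCCUPANCY_PCT)
print(f"total={total:.3f}% occupied={occupied:.3f}% rarest={rarest} ratio={imbalance:.0f}:1")
```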
  • 6. Network Architecture: Fully Convolutional Refined Auto-Encoding Generative Adversarial Networks
    - Architecture similar to 3D-GAN [1]
    - The encoder and discriminator almost mirror the generator
    - The latent space is a fully convolutional layer (5x3x5x16)
    - Full convolution lets z_enc represent more features
    - The generator's last activation is a softmax over the 12 classes
    - The code discriminator is fully connected (2 hidden layers) [4]
    - The refiner has an architecture similar to SimGAN [5]
    Novel contributions: multi-class activation for multi-object scenes; full convolution
    [Figure: architecture overview with encoder, generator, refiners, discriminator, and code discriminator; inputs x and z (normal distribution), code z_enc, outputs x_rec and x_gen]
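To illustrate the generator's last activation: each output voxel holds 12 scores, one per class; a softmax turns them into class probabilities, and the argmax gives one label per voxel. A minimal per-voxel sketch, assuming the class order of the dataset slide; the `CLASSES` names and the argmax decoding step are illustrative, not taken from the source code.

```python
import math

# Class order assumed to follow the dataset slide.
CLASSES = ["empty", "ceiling", "floor", "wall", "window", "chair",
           "bed", "sofa", "table", "tv", "furniture", "objects"]

def softmax(logits):
    """Numerically stable softmax over one voxel's 12 class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def voxel_label(logits):
    """Name of the most probable class for one voxel."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best]
```

Because the classes are mutually exclusive per voxel, a softmax (rather than 12 independent sigmoids) forces each voxel to belong to exactly one class, "empty" included.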
  • 7. Network Architecture (inspired by [1], Wu et al. 2016, MIT)
    Each network: 3D deconv (stride 2), batch norm, LReLU (discriminator) or ReLU (encoder, generator); last activation: softmax
    Generator network: z (5x3x5x16) → reshape & FC, batchnorm, ReLU → 5x3x5x512 → [5x3x5, stride 2, batchnorm, ReLU] → 10x6x10x256 → [5x3x5, stride 2, batchnorm, ReLU] → 20x12x20x128 → [5x3x5, stride 2, batchnorm, ReLU] → 40x24x40x64 → [5x3x5, stride 2, softmax] → 80x48x80x12
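The shape progression above can be checked with the standard transposed-convolution length formula, out = (in - 1) * stride - 2 * padding + kernel + output_padding (dilation 1, as given in PyTorch's `ConvTranspose3d` documentation). The sketch assumes the "5x3x5" label is the kernel size and picks padding (2, 1, 2) with output padding 1, which makes every stride-2 layer exactly double each spatial axis; the original code may use different settings that give the same doubling. It also confirms that the 5x3x5x16 latent has 1,200 dimensions.

```python
LATENT_SHAPE = (5, 3, 5, 16)        # z / z_enc, from the slide
CHANNELS = [512, 256, 128, 64, 12]  # channels after reshape & FC, then after each deconv

def deconv_out(n, kernel, stride, padding, output_padding):
    """Output length of a transposed convolution along one axis (dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding

def generator_shapes(spatial=(5, 3, 5), kernel=(5, 3, 5), padding=(2, 1, 2),
                     stride=2, output_padding=1):
    """Trace tensor shapes through the generator; kernel/padding values are assumptions."""
    shapes = [tuple(spatial) + (CHANNELS[0],)]
    for ch in CHANNELS[1:]:
        spatial = tuple(deconv_out(n, k, stride, p, output_padding)
                        for n, k, p in zip(spatial, kernel, padding))
        shapes.append(spatial + (ch,))
    return shapes

for shape in generator_shapes():
    print("x".join(map(str, shape)))
```

With an odd kernel k and padding (k - 1) / 2, each layer maps n to 2n for any n, so the same settings work for the 5-wide and 3-wide axes.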
  • 8. Network Architecture: Refiner (inspired by [5])
    Each network: 3D deconv (stride 1), ReLU (encoder, generator), ResNet block looped 4 times (different weights); last activation: softmax
    Refiner network: 80x48x80x12 → 80x48x80x32 → ResNet block x4 → 80x48x80x32 → 80x48x80x12, using 3x3x3 convolutions with ReLU
    [Figure excerpts from SimGAN [5]: refined UnityEyes gaze images compared with real MPIIGaze images (Fig. 5), and a ResNet block with two n x n convolutional layers (Fig. 6)]
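The refiner's residual pattern (conv, ReLU, conv, add the block input, ReLU, as in SimGAN's ResNet block [5]) can be shown on a toy signal. The sketch below is a deliberately reduced, hypothetical version: one channel, 1D, 3-tap kernels instead of 32-channel 3x3x3 convolutions, and no input/output projection layers. It only demonstrates how four blocks with separate weights are chained.

```python
def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding; output has the same length as x."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]

def relu(v):
    return [max(0.0, a) for a in v]

def resnet_block(x, k1, k2):
    """conv -> ReLU -> conv, add the block input, then ReLU."""
    h = relu(conv1d_same(x, k1))
    h = conv1d_same(h, k2)
    return relu([a + b for a, b in zip(x, h)])

def refiner(x, block_weights):
    """Chain the ResNet blocks; each (k1, k2) pair is a separate set of weights."""
    for k1, k2 in block_weights:
        x = resnet_block(x, k1, k2)
    return x
```

With all-zero kernels each block reduces to ReLU of its input, so a refiner built from residual blocks starts close to a pass-through and only has to learn small corrections, which suits its role of smoothing already-generated scenes.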
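  • 9. Loss / Training
    Losses: encoder distribution (code GAN) loss, GAN loss, reconstruction loss.
    Reconstruction loss (drives reconstruction accuracy up), where $w_n$ are occupancy-normalized class weights recomputed for every batch:
    $$\mathcal{L}_{rec} = \sum_{n}^{n_{class}} w_n \left( -\gamma\, x \log x_{rec} - (1-\gamma)(1-x)\log(1-x_{rec}) \right)$$
    GAN loss (the discriminator learns to tell real from fake scenes accurately; the generator learns to fool it):
    $$\mathcal{L}^{D}_{GAN} = -\log D(x) - \log\left(1 - D(x_{rec})\right) - \log\left(1 - D(x_{gen})\right)$$
    $$\mathcal{L}^{G}_{GAN} = -\log D(x_{rec}) - \log D(x_{gen})$$
    Code GAN loss (the code discriminator learns to tell the real distribution from the encoded one; the encoder learns to fool it):
    $$\mathcal{L}^{D}_{cGAN} = -\log D_{code}(z) - \log\left(1 - D_{code}(z_{enc})\right)$$
    $$\mathcal{L}^{E}_{cGAN} = -\log D_{code}(z_{enc})$$
    Training objectives:
    - Encoder: $\min_E \mathcal{L} = \mathcal{L}^{E}_{cGAN} + \lambda \mathcal{L}_{rec}$
    - Generator with refiner: $\min_G \mathcal{L} = \lambda \mathcal{L}_{rec} + \mathcal{L}^{G}_{GAN}$
    - Discriminator: $\min_D \mathcal{L} = \mathcal{L}^{D}_{GAN}$
    - Code discriminator: $\min_C \mathcal{L} = \mathcal{L}^{D}_{cGAN}$
    Learning rate: 0.0001. Batch size: 20 (base), 8 (refiner). Iterations: 100,000 (75,000 base, 25,000 refiner); the refiner is trained after the first 75,000 iterations.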
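The objectives above translate directly into code. Below is a minimal pure-Python sketch over scalar discriminator outputs and flattened one-hot voxels. It assumes (a) that "occupancy-normalized weights" means inverse per-class voxel counts in the batch, scaled to sum to 1, since the slide does not define the normalization; (b) that the reconstruction loss is averaged over voxels; and (c) that a small `EPS` guards the logarithms. `gamma` and `lam` stay parameters because their values are not given here.

```python
import math

EPS = 1e-7  # hypothetical guard against log(0)

def batch_class_weights(x_batch, n_class):
    """Assumed occupancy normalization: inverse class counts in this batch, summing to 1."""
    counts = [0] * n_class
    for voxel in x_batch:  # voxel = one-hot list of length n_class
        for n in range(n_class):
            counts[n] += voxel[n]
    inv = [1.0 / (c + 1) for c in counts]  # +1 keeps classes absent from the batch finite
    s = sum(inv)
    return [v / s for v in inv]

def rec_loss(x_batch, xrec_batch, w, gamma):
    """Weighted binary cross-entropy per class, averaged over voxels (assumption)."""
    total = 0.0
    for x_v, r_v in zip(x_batch, xrec_batch):
        for n in range(len(w)):
            x, r = x_v[n], r_v[n]
            total += w[n] * (-gamma * x * math.log(r + EPS)
                             - (1 - gamma) * (1 - x) * math.log(1 - r + EPS))
    return total / len(x_batch)

def d_gan_loss(d_x, d_xrec, d_xgen):
    return -math.log(d_x + EPS) - math.log(1 - d_xrec + EPS) - math.log(1 - d_xgen + EPS)

def g_gan_loss(d_xrec, d_xgen):
    return -math.log(d_xrec + EPS) - math.log(d_xgen + EPS)

def code_d_loss(dc_z, dc_zenc):
    return -math.log(dc_z + EPS) - math.log(1 - dc_zenc + EPS)

def code_e_loss(dc_zenc):
    return -math.log(dc_zenc + EPS)

def objectives(l_rec, d_x, d_xrec, d_xgen, dc_z, dc_zenc, lam):
    """The four minimized losses, one per network."""
    return {
        "encoder": code_e_loss(dc_zenc) + lam * l_rec,
        "generator": lam * l_rec + g_gan_loss(d_xrec, d_xgen),
        "discriminator": d_gan_loss(d_x, d_xrec, d_xgen),
        "code_discriminator": code_d_loss(dc_z, dc_zenc),
    }
```

The inverse-count weighting is one way to keep the roughly 92% empty voxels from dominating the loss; a larger `gamma` likewise penalizes missed occupied voxels more than false positives.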
  • 10. Experiments
    - Reconstruction (real vs. reconstruction, before and after refining): scenes are almost reconstructed, but small shapes have disappeared. The refiner smooths and refines shapes visually.
    - Generation from a random distribution (FC-VAE vs. 3D FCR-alphaGAN, before and after refining): the generated scenes were not realistic. This architecture worked better than a plain VAE, but not well enough, because the encoder did not generalize to the distribution.
  • 11. Result: numerical evaluation of reconstruction by Intersection-over-Union (IoU) [6], per class and over all classes. Reconstruction accuracy improved thanks to full convolution and alphaGAN (models compared at the same latent-space dimensionality).
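Per-class IoU compares predicted and ground-truth labels voxel by voxel: the number of voxels labelled c in both, divided by the number labelled c in either. A minimal sketch over flattened label lists; skipping classes absent from both volumes and averaging the remaining per-class scores is one common convention, and the slide's "IoU for all" may be defined differently (for example, over all non-empty voxels).

```python
def per_class_iou(pred, gt, n_class):
    """IoU for each class that appears in the prediction or the ground truth."""
    ious = {}
    for c in range(n_class):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:  # skip classes absent from both volumes (assumption)
            ious[c] = inter / union
    return ious

def mean_iou(pred, gt, n_class):
    """Average of the per-class IoUs."""
    ious = per_class_iou(pred, gt, n_class)
    return sum(ious.values()) / len(ious)
```

Unlike voxel accuracy, IoU is not inflated by the large number of correctly predicted empty voxels, which is why it suits this sparse dataset.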
  • 13. Evaluations: Latent Space Evaluation
    2D mapping of 200 encoded samples, computed by SVD. Color: 1D SVD embedding of the centroid coordinates of each scene.
    - Fully convolutional: the mapping follows the 1D embedding of centroid coordinates from lower right to upper left.
    - Standard VAE: the mapping does not.
    Full convolution enables the latent space to be related to spatial context.
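The latent-space map can be reproduced in outline: flatten each encoded z_enc (5x3x5x16 = 1,200 values), project the samples onto their top two right singular vectors for the 2D map, and color each point by a 1D SVD embedding of its scene's centroid. The pure-Python sketch below replaces a full SVD with power iteration on A^T A plus deflation, and centers the data first; both are assumptions for illustration. In practice `numpy.linalg.svd` would be the natural tool, especially for 200 samples of 1,200 dimensions, where these loops would be slow. The sketch assumes k does not exceed the rank of the data.

```python
import math
import random

def centroid(coords):
    """Mean (x, y, z) of a scene's occupied voxel coordinates."""
    n = len(coords)
    return [sum(c[d] for c in coords) / n for d in range(3)]

def center(rows):
    d = len(rows[0])
    mean = [sum(r[i] for r in rows) / len(rows) for i in range(d)]
    return [[r[i] - mean[i] for i in range(d)] for r in rows]

def top_right_singular_vectors(rows, k, iters=100, seed=0):
    """Power iteration with Gram-Schmidt deflation on A^T A (stand-in for an SVD)."""
    d = len(rows[0])
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    vecs = []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            w = [sum(gram[i][j] * v[j] for j in range(d)) for i in range(d)]
            for u in vecs:  # remove components along vectors already found
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        vecs.append(v)
    return vecs

def svd_embed(rows, k):
    """Project centered rows onto their top-k right singular vectors."""
    centered = center(rows)
    vecs = top_right_singular_vectors(centered, k)
    return [[sum(a * b for a, b in zip(r, v)) for v in vecs] for r in centered]
```

Usage: `svd_embed(flat_codes, 2)` gives the 2D positions and `svd_embed([centroid(c) for c in scene_coords], 1)` the color values, where `flat_codes` and `scene_coords` are hypothetical lists of flattened codes and per-scene occupied-voxel coordinates.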
  • 14. Evaluations: Latent space evaluation by added noise
    The effects of individual spatial dimensions (positions in the 5x3x5 latent grid). Red indicates the amount of change caused by adding normal-distribution noise to one dimension.
    ・Dimension (2,0,4) changes objects in the right back area.
    ・Dimension (4,0,1) changes objects in the left front area.
    ・Dimension (1,0,0) changes objects in the left back area.
    ・Dimension (4,0,4) changes objects in the right front area.
    Full convolution enables the latent space to be related to spatial context.
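The noise experiment can be sketched as perturbing one spatial cell of the 5x3x5x16 latent while leaving the others untouched, then decoding both codes and marking voxels whose class changed. In this sketch, adding noise to all 16 channels at the chosen cell is an assumption about what "one dimension" means, and the decoder is not included.

```python
import copy
import random

SHAPE = (5, 3, 5, 16)  # latent grid from the slides: 5x3x5 spatial cells, 16 channels

def zeros_latent():
    """A latent of zeros, indexed [x][y][z][channel]."""
    nx, ny, nz, nc = SHAPE
    return [[[[0.0] * nc for _ in range(nz)] for _ in range(ny)] for _ in range(nx)]

def perturb_cell(z, cell, sigma=1.0, seed=0):
    """Copy of z with Gaussian noise added to every channel at one spatial cell (i, j, k)."""
    rng = random.Random(seed)
    i, j, k = cell
    out = copy.deepcopy(z)  # leave the original code untouched
    out[i][j][k] = [v + rng.gauss(0.0, sigma) for v in out[i][j][k]]
    return out

def change_map(labels_a, labels_b):
    """Mark (1) the voxels whose class differs between two flattened label lists."""
    return [int(a != b) for a, b in zip(labels_a, labels_b)]
```

For example, `change_map(decode(z), decode(perturb_cell(z, (2, 0, 4))))`, where `decode` is a hypothetical stand-in for the trained generator returning flattened voxel labels, would highlight the region that cell controls (the right back area, according to the slide).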
  • 15. Suggestions of Future Work
    ・Revise the dataset
    This dataset is extremely sparse and highly varied. Floors and small objects appear at a huge variety of positions, and some small parts, such as chair legs, were broken up by the downsizing. This makes learning the latent space very hard. Revising the dataset, for example by limiting its variety or adjusting the positions of objects, is therefore important future work.
    ・Redefine the latent space
    In this work, I defined a single latent space that includes all information, such as the shapes and positions of every object. As a result, some small objects disappeared from the generated models, and many unrealistic objects were generated. To solve this, it is important to redefine the latent space, for example by separating it into each object and the layout. However, that would require increasing the variety of objects and taking multiple objects into account.