SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
Deep Learning on
Everyday Devices
Amir Alush, CTO & co founder, IMVC 2018
Deep Learning “Cycle”
untrained model
new data
(test)
training
data
trained model
car
car pedestrian
Yes No
car
TRAINING INFERENCE
Inference on Everyday devices
8-Core ARM Qualcomm 6-core
Snapdragon 820 TBD
ARM
x86
MIPS
Power
other
Embedded processors market share (Source: AMD 2016)
Inference on the Edge
Motivation
● Low Latency
● Keep Privacy
● Small/no Bandwidth
● Utilizing Existing HW
Challenges
● NN High complexity
● Low HW resources
● Complex porting
The Deep Learning Stack
HARDWARE
GPU, CPU, TPU, FPGA, DSP, ASIC
Primitives Libraries
BLAS*, NNPACK,CUDNN
Frameworks
TF, CAFFE, PyTorch, MXNet
Algorithms
NN Architectures, Meta-Architectures
Engines
TensorRT, Core ML, SNPE
It’s the algorithms!
Efficient algorithms:
1. More AI on any processor
2. Critical for everyday devices with low power processors
Speed-accuracy tradeoff
Running off-the-shelf DL algorithms on the edge
requires sacrificing accuracy. The tradeoffs:
1. Reduce input resolution → reduce accuracy
2. Reduce model size (backbone) → reduce accuracy
3. Reduce model bit precision → reduce accuracy?
4. Less accurate algorithm
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Application”, Howard et al. 2017
nverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation”, Sandler et al. 2018
mAP GPU msecs
faster-rcnn + resnet 101 + high resolution 35.6 140
ssd + mobilenetV1 + low resolution 18.8 50
Reduce network size (pruning)
● Should be structured, non structured is usually not HW supported
● Requires re-training
● How effective on already small models?
● Can hurt accuracy!
“Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks”, Mittal 2018
Reduce model bit precision (quantization)
● Reduce from 32 bits to 8/4/2/1 bits ?
● Activations & Weights are within a narrow range
● Networks are robust to small changes
● 8 bits:
○ Need to be supported in processors instructions
○ Reduces DRAM bandwidth (more in the SRAM)
○ Supported natively by various engines
● Below 8 bits:
○ Accuracy drops
○ Not supported in existing HW
How to deploy your models?
Training Frameworks Inference Engines
● Research flexibility:
○ Fast POC from idea to results
○ Easy to extended: new data loaders, layers
(loss, operations)
○ Easy debugging
● Good training speed (+ parallelization)
○ Fast research iterations
● Large active community:
○ Explore new research ideas fast
● Personal flavour
● Portability should not a factor
● Optimized for latency (forward pass)
● Should be efficient on a specific hardware
○ Takes advantage of specific hw
instructions
● Little / no overhead
● Efficient working memory
● Small code size
● Support all of your NN layers
Training frameworks vs Inference Engines
Build your own Inference Environment?
1. Use the engines (TensorRT, SNPE...):
○ Faster to production (if the conversion works out of the box)
○ Current generation has some limitations and minimal tech support
○ Not as mature as the training frameworks or DL primitives libraries
○ Not all NN capabilities are supported
2. Write your own inference engine:
○ Slower to production, but, with low risk
○ Use the DL primitives libraries (CuDNN, BLAS, NNPACK..) they are more matures than the
engines and they are optimized!
○ You’ll need to write your own logic on top!
○ You’re in control and can adjust to your needs
Brodmann17 R&D workflow
DeployData Research
Brodmann17 Use Case - IoT
● MWC 2018 recent cooperation with &
● Embedded World 2018 with
● CES 2018 with
Brodmann17 Use Case - Smart City
* 1 CPU core
Brodmann17 Use Case - Adas
Benchmarking on the KITTI dataset car detection
* 1 CPU core
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite, Geiger et al, 2012
1. motivation and challenges in NN edge processing
2. Speed accuracy tradeoffs made today
3. Brodmann17 key design principles for keeping efficiency and accuracy
on the edge
4. key considerations when choosing your inference environment
5. Brodmann17 example use cases
Summary

Más contenido relacionado

La actualidad más candente

Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...
Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...
Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...AI Frontiers
 
Presentation
PresentationPresentation
Presentationbutest
 
"Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori...
"Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori..."Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori...
"Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori...Edge AI and Vision Alliance
 
IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...
IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...
IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...Ryft
 
How Do I Understand Deep Learning Performance?
How Do I Understand Deep Learning Performance?How Do I Understand Deep Learning Performance?
How Do I Understand Deep Learning Performance?NVIDIA
 
“A Highly Data-Efficient Deep Learning Approach,” a Presentation from Samsung
“A Highly Data-Efficient Deep Learning Approach,” a Presentation from Samsung“A Highly Data-Efficient Deep Learning Approach,” a Presentation from Samsung
“A Highly Data-Efficient Deep Learning Approach,” a Presentation from SamsungEdge AI and Vision Alliance
 
Machine Learning with Scala
Machine Learning with ScalaMachine Learning with Scala
Machine Learning with ScalaSusan Eraly
 
Vertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Holdings
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeNVIDIA Taiwan
 
"Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"...
"Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"..."Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"...
"Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"...Edge AI and Vision Alliance
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...LEGATO project
 
"How to Test and Validate an Automated Driving System," a Presentation from M...
"How to Test and Validate an Automated Driving System," a Presentation from M..."How to Test and Validate an Automated Driving System," a Presentation from M...
"How to Test and Validate an Automated Driving System," a Presentation from M...Edge AI and Vision Alliance
 
Google Devfest kolkata'19
Google Devfest kolkata'19Google Devfest kolkata'19
Google Devfest kolkata'19Akshay Bahadur
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...Edge AI and Vision Alliance
 
Deep learning for smart manufacturing
Deep learning for smart manufacturingDeep learning for smart manufacturing
Deep learning for smart manufacturingSunil Kumar Pradhan
 
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...Edge AI and Vision Alliance
 
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart..."Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...Edge AI and Vision Alliance
 
Tiny intelligent computers and sensors - Open Hardware Event 2020
Tiny intelligent computers and sensors - Open Hardware Event 2020Tiny intelligent computers and sensors - Open Hardware Event 2020
Tiny intelligent computers and sensors - Open Hardware Event 2020Jan Jongboom
 

La actualidad más candente (20)

Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...
Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...
Kevin Shaw at AI Frontiers: AI on the Edge: Bringing Intelligence to Small De...
 
Presentation
PresentationPresentation
Presentation
 
"Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori...
"Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori..."Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori...
"Blending Cloud and Edge Machine Learning to Deliver Real-time Video Monitori...
 
IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...
IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...
IoT Slam Keynote: Harnessing the Flood of Data with Heterogeneous Computing a...
 
How Do I Understand Deep Learning Performance?
How Do I Understand Deep Learning Performance?How Do I Understand Deep Learning Performance?
How Do I Understand Deep Learning Performance?
 
“A Highly Data-Efficient Deep Learning Approach,” a Presentation from Samsung
“A Highly Data-Efficient Deep Learning Approach,” a Presentation from Samsung“A Highly Data-Efficient Deep Learning Approach,” a Presentation from Samsung
“A Highly Data-Efficient Deep Learning Approach,” a Presentation from Samsung
 
Machine Learning with Scala
Machine Learning with ScalaMachine Learning with Scala
Machine Learning with Scala
 
Original
OriginalOriginal
Original
 
Vertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part IIVertex Perspectives | AI Optimized Chipsets | Part II
Vertex Perspectives | AI Optimized Chipsets | Part II
 
Affordable AI Connects To A Better Life
Affordable AI Connects To A Better LifeAffordable AI Connects To A Better Life
Affordable AI Connects To A Better Life
 
"Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"...
"Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"..."Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"...
"Emerging Processor Architectures for Deep Learning: Options and Trade-offs,"...
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
 
"How to Test and Validate an Automated Driving System," a Presentation from M...
"How to Test and Validate an Automated Driving System," a Presentation from M..."How to Test and Validate an Automated Driving System," a Presentation from M...
"How to Test and Validate an Automated Driving System," a Presentation from M...
 
On-Device AI
On-Device AIOn-Device AI
On-Device AI
 
Google Devfest kolkata'19
Google Devfest kolkata'19Google Devfest kolkata'19
Google Devfest kolkata'19
 
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co..."New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
"New Dataflow Architecture for Machine Learning," a Presentation from Wave Co...
 
Deep learning for smart manufacturing
Deep learning for smart manufacturingDeep learning for smart manufacturing
Deep learning for smart manufacturing
 
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
“An Introduction to Data Augmentation Techniques in ML Frameworks,” a Present...
 
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart..."Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
 
Tiny intelligent computers and sensors - Open Hardware Event 2020
Tiny intelligent computers and sensors - Open Hardware Event 2020Tiny intelligent computers and sensors - Open Hardware Event 2020
Tiny intelligent computers and sensors - Open Hardware Event 2020
 

Similar a Deep Learning on Everyday Devices

“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...Edge AI and Vision Alliance
 
Deploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfDeploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfObject Automation
 
Deep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesDeep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesNumenta
 
Technologies comparison: Genuino 101 vs uTensor
Technologies comparison: Genuino 101 vs uTensor Technologies comparison: Genuino 101 vs uTensor
Technologies comparison: Genuino 101 vs uTensor AndreaNapoletani
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Codemotion
 
HiPEAC-CSW 2022_Pedro Trancoso presentation
HiPEAC-CSW 2022_Pedro Trancoso presentationHiPEAC-CSW 2022_Pedro Trancoso presentation
HiPEAC-CSW 2022_Pedro Trancoso presentationVEDLIoT Project
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformGetInData
 
How AI and ML are driving Memory Architecture changes
How AI and ML are driving Memory Architecture changesHow AI and ML are driving Memory Architecture changes
How AI and ML are driving Memory Architecture changesDanny Sabour
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Community
 
“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...
“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...
“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...Edge AI and Vision Alliance
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SBrandon Liu
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-isctembreternitz
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200xIBM Sverige
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...Edge AI and Vision Alliance
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed_Hat_Storage
 
Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...
Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...
Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...StampedeCon
 
Chiplets in Data Centers
Chiplets in Data CentersChiplets in Data Centers
Chiplets in Data CentersODSA Workgroup
 
Intel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big DataIntel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big DataDESMOND YUEN
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningSergey Karayev
 

Similar a Deep Learning on Everyday Devices (20)

“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
“Enabling Ultra-low Power Edge Inference and On-device Learning with Akida,” ...
 
Deploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfDeploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdf
 
Deep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devicesDeep learning at the edge: 100x Inference improvement on edge devices
Deep learning at the edge: 100x Inference improvement on edge devices
 
Technologies comparison: Genuino 101 vs uTensor
Technologies comparison: Genuino 101 vs uTensor Technologies comparison: Genuino 101 vs uTensor
Technologies comparison: Genuino 101 vs uTensor
 
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
Artificial Intelligence in practice - Gerbert Kaandorp - Codemotion Amsterdam...
 
HiPEAC-CSW 2022_Pedro Trancoso presentation
HiPEAC-CSW 2022_Pedro Trancoso presentationHiPEAC-CSW 2022_Pedro Trancoso presentation
HiPEAC-CSW 2022_Pedro Trancoso presentation
 
Benefits of a Homemade ML Platform
Benefits of a Homemade ML PlatformBenefits of a Homemade ML Platform
Benefits of a Homemade ML Platform
 
How AI and ML are driving Memory Architecture changes
How AI and ML are driving Memory Architecture changesHow AI and ML are driving Memory Architecture changes
How AI and ML are driving Memory Architecture changes
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...
“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...
“Autonomous Driving AI Workloads: Technology Trends and Optimization Strategi...
 
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "SHow I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
How I Sped up Complex Matrix-Vector Multiplication: Finding Intel MKL's "S
 
Mauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscteMauricio breteernitiz hpc-exascale-iscte
Mauricio breteernitiz hpc-exascale-iscte
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200x
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...
Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...
Analytics, Big Data and Nonvolatile Memory Architectures – Why you Should Car...
 
Chiplets in Data Centers
Chiplets in Data CentersChiplets in Data Centers
Chiplets in Data Centers
 
Summit workshop thompto
Summit workshop thomptoSummit workshop thompto
Summit workshop thompto
 
Intel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big DataIntel Distribution for Python - Scaling for HPC and Big Data
Intel Distribution for Python - Scaling for HPC and Big Data
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
 

Más de Brodmann17

5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning ResearchBrodmann17
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methodsAdvanced deep learning based object detection methods
Advanced deep learning based object detection methodsBrodmann17
 
Fast methods for deep learning based object detection
Fast methods for deep learning based object detectionFast methods for deep learning based object detection
Fast methods for deep learning based object detectionBrodmann17
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basicsBrodmann17
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningBrodmann17
 
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides Brodmann17
 

Más de Brodmann17 (8)

5 Practical Steps to a Successful Deep Learning Research
5 Practical Steps to a Successful  Deep Learning Research5 Practical Steps to a Successful  Deep Learning Research
5 Practical Steps to a Successful Deep Learning Research
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methodsAdvanced deep learning based object detection methods
Advanced deep learning based object detection methods
 
Fast methods for deep learning based object detection
Fast methods for deep learning based object detectionFast methods for deep learning based object detection
Fast methods for deep learning based object detection
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
DLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep LearningDLD meetup 2017, Efficient Deep Learning
DLD meetup 2017, Efficient Deep Learning
 
Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides Brodmann17 CVPR 2017 review - meetup slides
Brodmann17 CVPR 2017 review - meetup slides
 
Geektime 2017
Geektime 2017Geektime 2017
Geektime 2017
 

Último

Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Silpa
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....muralinath2
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)AkefAfaneh2
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformationAreesha Ahmad
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxseri bangash
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxRenuJangid3
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...Monika Rani
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingadibshanto115
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Silpa
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxSuji236384
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curveAreesha Ahmad
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSérgio Sacani
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxMohamedFarag457087
 

Último (20)

Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Conjugation, transduction and transformation
Conjugation, transduction and transformationConjugation, transduction and transformation
Conjugation, transduction and transformation
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
The Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptxThe Mariana Trench remarkable geological features on Earth.pptx
The Mariana Trench remarkable geological features on Earth.pptx
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS  ESCORT SERVICE In Bhiwan...
Bhiwandi Bhiwandi ❤CALL GIRL 7870993772 ❤CALL GIRLS ESCORT SERVICE In Bhiwan...
 
An introduction on sequence tagged site mapping
An introduction on sequence tagged site mappingAn introduction on sequence tagged site mapping
An introduction on sequence tagged site mapping
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.Selaginella: features, morphology ,anatomy and reproduction.
Selaginella: features, morphology ,anatomy and reproduction.
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 

Deep Learning on Everyday Devices

  • 1. Deep Learning on Everyday Devices Amir Alush, CTO & co founder, IMVC 2018
  • 2. Deep Learning “Cycle” untrained model new data (test) training data trained model car car pedestrian Yes No car TRAINING INFERENCE
  • 3. Inference on Everyday devices 8-Core ARM Qualcomm 6-core Snapdragon 820 TBD ARM x86 MIPS Power other Embedded processors market share (Source: AMD 2016)
  • 4. Inference on the Edge Motivation ● Low Latency ● Keep Privacy ● Small/no Bandwidth ● Utilizing Existing HW Challenges ● NN High complexity ● Low HW resources ● Complex porting
  • 5. The Deep Learning Stack HARDWARE GPU, CPU, TPU, FPGA, DSP, ASIC Primitives Libraries BLAS*, NNPACK,CUDNN Frameworks TF, CAFFE, PyTorch, MXNet Algorithms NN Architectures, Meta-Architectures Engines TensorRT, Core ML, SNPE
  • 6. It’s the algorithms! Efficient algorithms: 1. More AI on any processor 2. Critical for everyday devices with low power processors
  • 7. Speed-accuracy tradeoff Running off-the-shelf DL algorithms on the edge requires sacrificing accuracy. The tradeoffs: 1. Reduce input resolution → reduce accuracy 2. Reduce model size (backbone) → reduce accuracy 3. Reduce model bit precision → reduce accuracy? 4. Less accurate algorithm MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Application”, Howard et al. 2017 nverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation”, Sandler et al. 2018 mAP GPU msecs faster-rcnn + resnet 101 + high resolution 35.6 140 ssd + mobilenetV1 + low resolution 18.8 50
  • 8. Reduce network size (pruning) ● Should be structured, non structured is usually not HW supported ● Requires re-training ● How effective on already small models? ● Can hurt accuracy! “Recovering from Random Pruning: On the Plasticity of Deep Convolutional Neural Networks”, Mittal 2018
  • 9. Reduce model bit precision (quantization) ● Reduce from 32 bits to 8/4/2/1 bits ? ● Activations & Weights are within a narrow range ● Networks are robust to small changes ● 8 bits: ○ Need to be supported in processors instructions ○ Reduces DRAM bandwidth (more in the SRAM) ○ Supported natively by various engines ● Below 8 bits: ○ Accuracy drops ○ Not supported in existing HW
  • 10. How to deploy your models? Training Frameworks Inference Engines ● Research flexibility: ○ Fast POC from idea to results ○ Easy to extended: new data loaders, layers (loss, operations) ○ Easy debugging ● Good training speed (+ parallelization) ○ Fast research iterations ● Large active community: ○ Explore new research ideas fast ● Personal flavour ● Portability should not a factor ● Optimized for latency (forward pass) ● Should be efficient on a specific hardware ○ Takes advantage of specific hw instructions ● Little / no overhead ● Efficient working memory ● Small code size ● Support all of your NN layers
  • 11. Training frameworks vs Inference Engines
  • 12. Build your own Inference Environment? 1. Use the engines (TensorRT, SNPE...): ○ Faster to production (if the conversion works out of the box) ○ Current generation has some limitations and minimal tech support ○ Not as mature as the training frameworks or DL primitives libraries ○ Not all NN capabilities are supported 2. Write your own inference engine: ○ Slower to production, but, with low risk ○ Use the DL primitives libraries (CuDNN, BLAS, NNPACK..) they are more matures than the engines and they are optimized! ○ You’ll need to write your own logic on top! ○ You’re in control and can adjust to your needs
  • 14. Brodmann17 Use Case - IoT ● MWC 2018 recent cooperation with & ● Embedded World 2018 with ● CES 2018 with
  • 15. Brodmann17 Use Case - Smart City * 1 CPU core
  • 16. Brodmann17 Use Case - Adas Benchmarking on the KITTI dataset car detection * 1 CPU core Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite, Geiger et al, 2012
  • 17. 1. motivation and challenges in NN edge processing 2. Speed accuracy tradeoffs made today 3. Brodmann17 key design principles for keeping efficiency and accuracy on the edge 4. key considerations when choosing your inference environment 5. Brodmann17 example use cases Summary