Copyright © 2015 Synopsys Inc. 1
Bruno Lavigueur
12 May 2015
Tailoring CNNs for Low-cost,
Low-power Implementations
• Embedded vision subsystem, built from many silicon-proven IPs
• DesignWare: ARC HS processor, AXI, DMA, Memory Compiler, …
• HAPS FPGA-based rapid prototyping system
Synopsys at a Glance
• >5,300 Masters/PhD degrees
• >2,300 IP designers
• >1,500 applications engineers
• >$2.2B FY14 revenue
• 32% of revenue on R&D
• >9,300 employees
• Convolutional Neural Network (CNN)
  • Wide range of detection and classification possible
• The majority of published CNN graphs are not tailored for embedded use
  • Memory requirements
  • Number of floating-point operations (# of MACs)
• Yet CNNs have properties well suited to parallelization on embedded devices
  • Regular processing, feed-forward dataflow, no data-dependent computation
• Key questions
  • Can the size and complexity of the graph be reduced with minimal impact on detection rates?
    • Number of layers, connectivity, size of convolution
  • What is the impact of moving from floating point to fixed point?
CNN on Embedded Devices
How CNN Works (Once Trained)
• Multiple feature extraction layers
• Progressive refinement process
• Each successive layer extracts more complex features (higher level)
• Last layer performs classification
• Same computation (neuron) replicated multiple times
[Figure: Input image → Layer 1 (low-level feature extraction, pooling & down-sampling) → Layer 2 (mid-level features, partially connected) → Layer 3 (high-level features) → fully connected classification]
• Each layer of convolutions extracts progressively higher-level features
• Subsampling / max pooling to “zoom out” and detect bigger objects with smaller convolutions
• Non-linear function on each neuron to activate it
Visualising a CNN
[Figure: output samples of Layer 1, Layer 2, Layer 3 and Layer 4]
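The "zoom out" effect of max pooling can be illustrated with a minimal sketch. The 2x2 window, the helper name `max_pool`, and the sample values are assumptions chosen for illustration; the slides do not specify the pooling configuration actually used.

```python
def max_pool(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size tile."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(
                feature_map[y * size + dy][x * size + dx]
                for dy in range(size)
                for dx in range(size)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


# Hypothetical 4x4 feature map; pooling halves each dimension.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool(feature_map)
print(pooled)
```

After pooling, a convolution of the same kernel size covers twice the original image extent in each direction, which is why later layers can respond to bigger objects without bigger kernels.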
• Convolution of multiple inputs together
• Fixed kernel size
• Optional subsampling (1, 2, 4x)
• Optional max-pooling
• Very regular, repetitive computation
• Dominated by MAC
• Deterministic
• Non-linear activation function (sigmoid, hyperbolic tangent, rectifier)
CNN Computation
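[Figure: M inputs I0 … IM-1 (each XI * YI) are convolved with Z kernels (each K * K, with associated weights) and passed through an activation (tanh, ReLU) to produce N outputs O0 … ON-1 (each XO * YO)]
Oj = act(Bj + (Iv x Kw) + …), where x denotes convolution and act the activation function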
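The formula above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the helper names, the one-kernel-per-input pairing, stride 1, no padding, and the sample data are all hypothetical, and the sliding-window product is computed without flipping the kernel, as is common in CNN code.

```python
def conv2d_valid(image, kernel):
    """Slide a K x K kernel over a 2-D input (stride 1, no padding, no kernel flip)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(k)
                for dx in range(k)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu(value):
    """Rectifier activation."""
    return max(0.0, value)


def output_map(inputs, kernels, bias, act=relu):
    """One output map: act(bias + sum over inputs of conv(input, its kernel))."""
    partials = [conv2d_valid(i, k) for i, k in zip(inputs, kernels)]
    height, width = len(partials[0]), len(partials[0][0])
    return [
        [act(bias + sum(p[y][x] for p in partials)) for x in range(width)]
        for y in range(height)
    ]


# Two hypothetical 3x3 inputs, one 2x2 kernel per input, one output map.
inputs = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
kernels = [
    [[1, 0], [0, 1]],
    [[0.5, 0.5], [0.5, 0.5]],
]
result = output_map(inputs, kernels, bias=-10.0)
print(result)
```

The inner loops are nothing but multiply-accumulate operations with a fixed, data-independent access pattern, which is the regularity the slide points to.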
• Given the nature of the algorithm, there are many ways to accelerate CNNs, including:
  • Vector / SIMD unit
  • Systolic array / streaming
  • GPU
• Performance / power / area trade-offs will vary depending on the architecture
• In all cases the main limitations will be
  • Amount of closely coupled memory available
  • Maximum number of Giga-MAC/s that can be sustained
  • I/O bandwidth required & available
  • Optimized data movement, efficient streaming
Moving Towards Embedded CNN
[Figure: EV Processor block diagram: a RISC CPU with four 32-bit cores and a CNN Engine built from an array of PEs, connected through an interconnect to shared memory and DMA]
Moving CNN to Embedded Systems
• Graph complexity
  • Number of layers (depth)
  • Size of the convolution filters
  • Number of connections between the layers
[Figure: Input → Layer 1 → Layer 2 → Layer 3 → Layer 4, with the labels "Compute requirements", "ALU width/cost", "Memory size", "Data precision" and "# Coefficients". Inset example: a 3x3 image (3 2 1 / 1 2 6 / 1 2 1) and a 2x2 filter (0 1 / 1 0); "Conv. =" is shown as (5 8 / 3 3) and, after "Act.", the feature map as (4 6 / 2 2)]
• Starting point:
  • Multicoreware generated ~10 million face/non-face samples from over 200 full-length Hollywood and Bollywood movies
  • Trained a CNN to detect faces in those movies
Example of a Big & Small CNN Application
Metric          AlexNet-like         Embedded version
Weight space    400 MB               0.5 MB
Layers          10 (7 Cv + 3 FC)     5 (3 Cv + 2 FC)
Compute         200x                 1x
Bandwidth       400x                 1x
F1-Score        .963                 .905
Accuracy        .993                 .981
VGA 30 FPS      4800 GOPS            24 GOPS
• Cv: Convolution layers (partially connected)
• FC: Fully connected layers
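To put the "VGA 30 FPS" row in perspective, the short calculation below converts the two GOPS figures into operations per input pixel. It is a back-of-the-envelope sketch: it assumes the standard 640x480 VGA resolution, and the function name is hypothetical. The slide does not define what counts as one operation, so only the ratios should be read as exact.

```python
VGA_WIDTH, VGA_HEIGHT = 640, 480  # assumed: standard VGA resolution
FRAMES_PER_SECOND = 30


def ops_per_pixel(gops, width=VGA_WIDTH, height=VGA_HEIGHT, fps=FRAMES_PER_SECOND):
    """Operations spent on each input pixel for a given sustained GOPS budget."""
    pixels_per_second = width * height * fps
    return gops * 1e9 / pixels_per_second


big_ops = ops_per_pixel(4800)    # AlexNet-like network
small_ops = ops_per_pixel(24)    # embedded network

compute_ratio = 4800 / 24        # matches the 200x "Compute" row
weight_ratio = 400 / 0.5         # weight space: 400 MB vs 0.5 MB

print(f"big: {big_ops:,.0f} ops/pixel, small: {small_ops:,.0f} ops/pixel")
print(f"compute ratio: {compute_ratio:.0f}x, weight-space ratio: {weight_ratio:.0f}x")
```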
• Used standard open-source projects to train networks in floating point with GPU acceleration and explore network topology
  • Cuda-convnet, Caffe, Theano
• Didn’t worry initially about numerical precision, as the literature has shown CNNs are robust to reduced precision
• From scratch: small networks can be trained very fast
  • Enables lots of shots on goal:
    • Using scripting and many GPUs
    • Number of network layers, convolutions, subsampling & pooling
  • Explored a huge space and quickly converged on a graph with good learning
• From an existing graph: also worked backwards from a high-accuracy large graph
  • Iteratively reduced it and retrained the best ones
• Ended up with similar networks in both cases
Reducing Complexity of the Graph
• Improve F-1 score with classic techniques such as
  • Data normalization
  • Hard negative mining (boosting)
  • Annealing the learning rate
  • Data augmentation: flip, random cropping, color space, ..
• Moved initial system from F1 of ~.74 to ~.90
• Once the graph topology and training are satisfactory, look at the impact of moving to fixed point
• Tests below were done with 31437 positive and 263145 negative samples
Training Optimizations
                  Initial    Optimized
True positive     19706      27093
False positive    1769       1335
False negative    11731      4344
F-1 Score         0.7449     0.9051
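The F-1 scores in the table can be reproduced from the true positive, false positive and false negative counts using the standard definition of F1 as the harmonic mean of precision and recall. The function name below is a hypothetical helper, not something from the presentation.

```python
def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall."""
    precision = tp / (tp + fp)   # fraction of detections that are real faces
    recall = tp / (tp + fn)      # fraction of real faces that were detected
    return 2 * precision * recall / (precision + recall)


initial = f1_score(tp=19706, fp=1769, fn=11731)
optimized = f1_score(tp=27093, fp=1335, fn=4344)

print(f"initial F1:   {initial:.4f}")
print(f"optimized F1: {optimized:.4f}")
```

In both columns the true positives and false negatives add up to the 31437 positive samples, so most of the improvement comes from recall: far fewer faces are missed after the training optimizations.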
• Compare the output of every layer with the reference floating-point version
• Differences may grow after each layer
• Detection threshold might need to be tweaked to achieve similar results
Moving to Fixed Point: Empirical Approach
[Figure: worked fixed-point example]
• Image (greyscale, 8-bit pixels):
    200  64   1
    150  50   1
      1  10 220
• Filter (convert to fixed point based on range, e.g. 16 bit, Q2S13):
     4   0
     0  -1
• Convolution result in the accumulator (make sure the accumulator is wide enough, e.g. 32 bit, signed):
    750 255
    590 -20
• After the non-linear function (ReLU):
    750 255
    590   0
• Feature map after shift + saturate:
    255 127
    255   0
• Shift values right to avoid overflow: x = max(0, x) >> N
• Choose ‘N’ according to the dynamic range of the ‘x’ values
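The numbers in the worked example can be reproduced with integer arithmetic only. The sketch below is an illustration, not the presenters' implementation: the function names are hypothetical, and the shift of N = 1 and the saturation limit of 255 are inferred from the figure's values rather than stated on the slide.

```python
def conv2d_valid(image, kernel):
    """Integer sliding-window multiply-accumulate (stride 1, no padding)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(
                image[y + dy][x + dx] * kernel[dy][dx]
                for dy in range(k)
                for dx in range(k)
            )
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


def relu_shift_saturate(acc, shift, max_value=255):
    """x = max(0, x) >> N, then clamp to the output range."""
    return min(max(0, acc) >> shift, max_value)


image = [
    [200, 64, 1],
    [150, 50, 1],
    [1, 10, 220],
]
kernel = [
    [4, 0],
    [0, -1],
]

accumulator = conv2d_valid(image, kernel)
feature_map = [
    [relu_shift_saturate(value, shift=1) for value in row]  # N = 1 is an assumption
    for row in accumulator
]

print(accumulator)
print(feature_map)
```

With 8-bit pixels and small integer weights the sums stay far below the 32-bit accumulator limit, while the shift and saturation bring the results back into an 8-bit range for the next layer.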
• FDDB: Face Detection Data Set and Benchmark
• Results shown for the embedded small & fixed-point graph
• Localization can be improved with pre/post processing
  • Impacts scores
  • Not done here
Results For Face Detection Application
Type                   F-1
Best (CascadeCNN)      0.91
Middle 10 average      0.85
Embedded – 40%         0.84
Embedded – 50%         0.82
(Embedded rows: fixed point, 8 bit)
• Design time configurable
  • Number of CNN Processing Elements (2 to 8)
  • Streaming interconnection network configured for number of cores
• Runtime reconfigurable
  • Flexible point-to-point connections between all cores
• CNN-optimized instruction set
  • Convolutions, MAC, LUT, …
  • Micro-DMA & stream interface for data movement
• Programmable
  • Using the generated C compiler
• Each CNN PE has a local data & program memory
Low-cost, Low-power, Flexible CNN Subsystem
[Figure: block diagram: an interconnect links the DMA, the shared DMem, a RISC MP of four 32-bit RISC cores with Sync, and the CNN Engine, whose PE 1 … PE 8 are joined by a reconfigurable streaming interconnect]
Mapping Example and Performance
[Figure: layers L1 to L4 mapped onto PEs connected by FIFOs over the Subsystem Interconnect, with L1 & L4 sharing a PE and L3 split into L3a and L3b (4 PE, 5 FIFO configuration)]
• Mapping on 4 processing elements
  • Smaller layers merged together
• Input image read only once
• 30 cycles on average to do 8 convolutions of 5x5 in parallel
  • Including all data movement & contention
• Over 85% MAC resource utilization (8 MACs / CNN PE)
• ~15 mW per PE @ 28 nm HPM
  • With memory & interconnect
Demonstrator
[Figure: demonstrator block diagram: an ARC EV52 Processor (RISC multi-core with Core 1 and Core 2, shared data memory, DMA, and a CNN Engine with PE 1 to PE 8 and local MEM, on an AXI subsystem interconnect) running the CNN graph, connected through an AXI interconnect to DDR and an ARC HS core, with an AXI 2 UMRBus link to a workstation with a webcam]
• ARC HS Core
  • Read in frame
  • Pyramid (scaling)
  • Non-max suppression
  • Softmax
  • Display the result
• Host application streaming video frames to DDR over UMRBus and back
• HAPS 70-S12 Prototyping System
  • Clocked at 50 MHz (10% of real-time)
• CNN compute requirements can be dramatically reduced with a small impact on the detection rates
  • Works well when the number of object classes to detect is kept small
  • Offline training is the critical step to obtain good performance
• Specialized and programmable hardware can be used to efficiently implement many different CNN graphs
  • Low power and area
• Some pre- and post-processing is needed to have a complete and useful application
  • CNN accelerator coupled with quad-core RISC cluster
• Useful to couple CNN with other processing steps to improve performance
  • Shrinking the image when it doesn’t impact detection rates
  • Sliding a detection window on an image
  • Region of interest
Lessons Learned
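One of the post-processing steps the demonstrator runs outside the CNN engine is non-max suppression, which merges the overlapping detections that a sliding window produces around the same face. The sketch below shows a common greedy variant; the function names, the (x1, y1, x2, y2) box format, the 0.5 overlap threshold and the sample detections are assumptions for illustration, not details from the presentation.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS: visit detections by descending score, drop those overlapping a kept one."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for _, kept_box in kept):
            kept.append((score, box))
    return kept


# Hypothetical detections as (score, box): two overlapping hits and one separate hit.
detections = [
    (0.80, (12, 12, 52, 52)),
    (0.90, (10, 10, 50, 50)),
    (0.70, (100, 100, 140, 140)),
]
kept = non_max_suppression(detections)
print(kept)
```

This kind of step is irregular and data dependent, which is one reason to run it on a general-purpose RISC core rather than on the CNN processing elements.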
• Selected CNN papers
  • Embedded facial image processing with Convolutional Neural Networks
    • http://liris.cnrs.fr/Documents/Liris-6072.pdf
  • Memory-Centric Accelerator Design for Convolutional Neural Networks
    • http://parse.ele.tue.nl/system/attachments/58/original/iccdMP17.pdf?1381908921
• CNN tutorial & courses
  • Stanford CNN course
    • http://cs231n.github.io/
  • Neural network intro and visualization
    • http://colah.github.io/
• Synopsys DesignWare Embedded Vision Processors
  • http://www.synopsys.com/ev
• More information and demo available at the Technology Showcase (Mission City Ballroom, Tables 3 & 4)
Resources

"Tailoring Convolutional Neural Networks for Low-Cost, Low-Power Implementation," a Presentation From Synopsys

  • 1. Copyright © 2015 Synopsys Inc. 1 Bruno Lavigueur 12 May 2015 Tailoring CNNs for Low-cost, Low-power Implementations
  • 2. Copyright © 2015 Synopsys Inc. 2 • Embedded vision subsystem, build from many silicon proven IPs • DesignWare: ARC HS processor, AXI, DMA, Memory Compiler, … • HAPS FPGA-based rapid prototyping system Synopsys at a Glance >5,300 Masters/PhD Degrees >2,300 IP Designers >1,500 Applications Engineers >$2.2B FY14 Revenue 32% Revenue on R&D >9,300 Employees
  • 3. Copyright © 2015 Synopsys Inc. 3 • Convolutional Neural Network (CNN) • Wide range of detection and classification possible • The majority of the published CNN graphs are not tailored for embedded • Memory requirements • Number of floating point operations (# of MAC) • Yet CNN have nice properties for parallelization on embedded devices • Regular processing, feed forward dataflow, no data dependant computation • Key questions • Can the size and complexity of the graph be reduced with minimal impact on detection rates ? • Number of layers, connectivity, size of convolution • What is the impact of moving from floating to fixed point ? CNN on Embedded Devices
  • 4. Copyright © 2015 Synopsys Inc. 4 How CNN Works (Once Trained) • Multiple feature extraction layers • Progressive refinement process • Each successive layer extracts more complex features (higher level) • Last layer performs classification • Same computation (neuron) replicated multiple times Input image Layer 1 Low level feature extraction Pooling & down sampling Layer 2 Mid-level features Partially connected Layer 3 High-level features Fully connected classification
  • 5. Copyright © 2015 Synopsys Inc. 5 • Each layer of convolutions extract progressively higher level features • Subsampling / max pooling to “zoom out” and detect bigger objects with smaller convolutions • Non-linear function on each neuron to activate it Visualising a CNN Layer 1 output sample Layer 2 output sample Layer 3 output sample Layer 4 output sample
  • 6. Copyright © 2015 Synopsys Inc. 6 • Convolution of multiple inputs together • Fixed kernel size • Optional subsampling • 1, 2, 4x • Optional max-pooling • Very regular, repetitive computation • Dominated by MAC • Deterministic • Non-linear activation function (sigmoid, hyperbolic tangent, rectifier) CNN Computation I0 IM-1 I1 O0 ON-1 M inputs (XI * YI) Z kernels (K * K) with associated weights N outputs (XO * YO) Oj = act(Bj+ (Iv x Kw) + …) Convolution (x) act act Activation (tanh, ReLU) …
  • 7. Copyright © 2015 Synopsys Inc. 7 • Given the nature of the algorithm, there are many ways to accelerate CNNs including: • Vector / SIMD unit • Systolic array / Streaming • GPU • Performance / Power / Area trade-offs will vary • Depending on the architecture • In all cases the main limitations will be • Amount of closely coupled memory available • Maximum number of Giga-MAC/s that can be sustained • I/O bandwidth required & available • Optimized data movement, efficient streaming Moving Towards Embedded CNN EV Processor Shared Memory DMA Interconnect RISC CPU 32-bit Core 32-bit Core 32-bit Core 32-bit Core CNN Engine … … PE PE PE PE PE PE
  • 8. Moving CNN to Embedded Systems • Graph complexity: number of layers (depth), size of the convolution filters, number of connections between the layers • Figure labels: compute requirements, ALU width/cost, memory size, data precision, # coefficients • Figure: a network (input, layers 1 to 4) and a small numeric example of an image convolved with a filter, followed by an activation, to give a feature map
  • 9. Example of a Big & Small CNN Application • Starting point: Multicoreware generated ~10 million faces/non-faces from over 200 full-length Hollywood and Bollywood movies • Trained a CNN to detect faces in those movies • Cv: convolution layers (partially connected) • FC: fully connected layers
      Metric           AlexNet-like        Embedded version
      Weight space     400 MB              0.5 MB
      Layers           10 (7 Cv + 3 FC)    5 (3 Cv + 2 FC)
      Compute          200x                1x
      Bandwidth        400x                1x
      F1 score         0.963               0.905
      Accuracy         0.993               0.981
      VGA at 30 FPS    4800 GOPS           24 GOPS
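As a quick sanity check on the compute figures above, the GOPS numbers can be turned into a ratio and an average per-pixel budget. This is a back-of-the-envelope sketch: it assumes "VGA" means 640 x 480 pixels, and the helper name `ops_per_pixel` is hypothetical.

```python
# Figures taken from the table on the slide.
BIG_GOPS = 4800      # AlexNet-like network, VGA at 30 FPS
SMALL_GOPS = 24      # embedded version, VGA at 30 FPS

# Assumption: "VGA" means 640 x 480 pixels per frame.
WIDTH, HEIGHT, FPS = 640, 480, 30

def ops_per_pixel(gops, width=WIDTH, height=HEIGHT, fps=FPS):
    """Average number of operations spent on each input pixel."""
    return gops * 1e9 / (width * height * fps)

compute_ratio = BIG_GOPS / SMALL_GOPS
print(f"compute ratio: {compute_ratio:.0f}x")
print(f"big network:   {ops_per_pixel(BIG_GOPS):,.0f} ops/pixel")
print(f"small network: {ops_per_pixel(SMALL_GOPS):,.0f} ops/pixel")
```

The ratio reproduces the 200x in the "Compute" row, and the embedded version works out to roughly 2,600 operations per input pixel.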
  • 10. Reducing Complexity of the Graph • Used standard open-source projects (cuda-convnet, Caffe, Theano) to train networks in floating point with GPU acceleration and explore network topology • Did not worry initially about numerical precision, as the literature has shown that CNNs are robust to precision • From scratch: small networks can be trained very fast • Enables lots of shots on goal, using scripting and many GPUs • Varied the number of network layers, convolutions, subsampling and pooling • Explored a huge space and quickly converged on a graph with good learning • From an existing graph: also worked backwards from a large, high-accuracy graph • Iteratively reduced it and retrained the best candidates • Ended up with similar networks in both cases
  • 11. Training Optimizations • Improve the F1 score with classic techniques such as: data normalization; hard negative mining (boosting); annealing the learning rate; data augmentation (flipping, random cropping, color space, ...) • Moved the initial system from an F1 of ~0.74 to ~0.90 • Once the graph topology and training are satisfactory, look at the impact of moving to fixed point • The tests below were done with 31437 positive and 263145 negative samples
      Metric            Initial    Optimized
      True positive     19706      27093
      False positive    1769       1335
      False negative    11731      4344
      F1 score          0.7449     0.9051
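The F1 scores in the table follow directly from the true-positive, false-positive and false-negative counts. The sketch below recomputes them with the standard definitions of precision, recall and F1; the function name `detection_metrics` is hypothetical, and accuracy is derived on the assumption that every negative sample is either a false positive or a true negative.

```python
def detection_metrics(tp, fp, fn, negatives):
    """Precision, recall, F1 and accuracy from raw detection counts."""
    tn = negatives - fp                      # negatives correctly rejected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + negatives)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

NEGATIVES = 263145                           # negative samples, from the slide
initial = detection_metrics(tp=19706, fp=1769, fn=11731, negatives=NEGATIVES)
optimized = detection_metrics(tp=27093, fp=1335, fn=4344, negatives=NEGATIVES)

print(f"initial   F1 = {initial['f1']:.4f}")     # 0.7449
print(f"optimized F1 = {optimized['f1']:.4f}")   # 0.9051
```

In both columns, true positives plus false negatives add up to the 31437 positive samples. Most of the gain comes from recall (far fewer missed faces) rather than from precision.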
  • 12. Moving to Fixed Point: Empirical Approach • Compare the output of every layer with the reference floating-point version • Differences may grow after each layer • The detection threshold might need to be tweaked to achieve similar results • Steps: • Greyscale image, 8-bit pixels • Convert to fixed point based on range, e.g. 16 bit (Q2S13) • Make sure the accumulator is wide enough, e.g. 32 bit (signed) • Shift values right to avoid overflow: x = max(0, x) >> N • Choose N according to the dynamic range of the x values • Worked example (figure): image [200 64 1; 150 50 1; 1 10 220] convolved with filter [4 0; 0 -1] gives accumulator values [750 255; 590 -20]; the non-linear function (ReLU) gives [750 255; 590 0]; shift + saturate gives the feature map [255 127; 255 0]
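The numbers in the slide's figure can be reproduced with integer arithmetic only. The sketch below is a minimal illustration, with some assumptions: the shift N = 1 and the saturation limit of 255 are inferred from the figure's values rather than stated on the slide, the function names are hypothetical, and Python integers are unbounded, so the 32-bit accumulator width is not modelled.

```python
def conv2d_valid(image, kernel):
    """Integer convolution (no padding, stride 1) into a wide accumulator."""
    k = len(kernel)
    return [[sum(image[y + dy][x + dx] * kernel[dy][dx]
                 for dy in range(k) for dx in range(k))
             for x in range(len(image[0]) - k + 1)]
            for y in range(len(image) - k + 1)]

def relu_shift_saturate(acc, shift, max_value=255):
    """x = max(0, x) >> N, then clamp to the 8-bit output range."""
    return [[min(max(0, v) >> shift, max_value) for v in row] for row in acc]

image = [[200, 64, 1],
         [150, 50, 1],
         [1, 10, 220]]      # 8-bit greyscale pixels from the slide
kernel = [[4, 0],
          [0, -1]]          # integer filter from the slide

accumulator = conv2d_valid(image, kernel)
feature_map = relu_shift_saturate(accumulator, shift=1)   # N = 1 is assumed

print(accumulator)   # [[750, 255], [590, -20]]
print(feature_map)   # [[255, 127], [255, 0]]
```

With N = 1, two of the four values still saturate at 255. A larger N would keep them in range at the cost of resolution in the smaller values, which is the trade-off behind choosing N from the dynamic range.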
  • 13. Results for Face Detection Application • FDDB: Face Detection Data Set and Benchmark • Results shown are for the small embedded fixed-point graph (fixed point, 8 bit) • Localization can be improved with pre/post-processing; this impacts scores and was not done here
      Type                  F1
      Best (CascadeCNN)     0.91
      Middle 10 average     0.85
      Embedded – 40%        0.84
      Embedded – 50%        0.82
  • 14. Low-cost, Low-power, Flexible CNN • Design-time configurable: number of CNN processing elements (2 to 8); streaming interconnection network configured for the number of cores • Runtime reconfigurable: flexible point-to-point connections between all cores • CNN-optimized instruction set: convolutions, MAC, LUT, … • Micro-DMA and stream interface for data movement • Programmable using the generated C compiler • Each CNN PE has local data and program memory • Figure: block diagram with a RISC MP (four 32-bit RISC cores, sync), shared DMem, DMA and subsystem interconnect, plus a CNN engine with PE 1 … PE 8 on a reconfigurable streaming interconnect
  • 15. Mapping Example and Performance • Mapping on 4 processing elements (4 PE, 5 FIFO configuration) • Smaller layers merged together • Input image read only once • 30 cycles on average to do 8 convolutions of 5x5 in parallel, including all data movement and contention • Over 85% MAC resource utilization (8 MACs per CNN PE) • ~15 mW per PE at 28 nm HPM, with memory and interconnect • Figure: layers L1, L2, L3 and L4 mapped onto PEs labelled L1&4, L2, L3a and L3b, connected through FIFOs and the subsystem interconnect
  • 16. Demonstrator • Processing steps around the CNN graph: read in frame, pyramid (scaling), non-max suppression, softmax, display the result • Host application on a workstation with a webcam streams video frames to DDR over UMRBus and back • HAPS 70-S12 prototyping system clocked at 50 MHz (10% of real time) • Figure: ARC EV52 processor (RISC multi-core with per-core memories, shared data memory, DMA, CNN engine with PE 1 … PE 8, AXI subsystem interconnect), ARC HS core, AXI interconnect, DDR and an AXI-to-UMRBus bridge
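Two of the post-processing steps named on this slide, softmax and non-max suppression, are small enough to sketch. The slides do not say which variants the demonstrator uses, so the code below assumes the common ones: a numerically stable softmax and greedy, IoU-based suppression. The box format (x1, y1, x2, y2), the 0.3 threshold, the function names and the sample windows are all hypothetical.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def non_max_suppression(detections, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score))
    return kept

# Hypothetical raw (face, non-face) scores for three detection windows.
windows = [((10, 10, 50, 50), (2.0, 0.0)),
           ((12, 12, 52, 52), (1.0, 0.0)),
           ((100, 100, 140, 140), (4.0, 1.0))]
detections = [(box, softmax(raw)[0]) for box, raw in windows]
faces = non_max_suppression(detections)
print([box for box, _ in faces])
```

In this example the second window overlaps the first heavily and scores lower, so it is suppressed; the two remaining boxes are reported as separate faces.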
  • 17. Lessons Learned • CNN compute requirements can be dramatically reduced with a small impact on detection rates • Works well when the number of object classes to detect is kept small • Offline training is the critical step for obtaining good performance • Specialized, programmable hardware can efficiently implement many different CNN graphs at low power and area • Some pre- and post-processing is needed for a complete and useful application • CNN accelerator coupled with a quad-core RISC cluster • It is useful to couple the CNN with other processing steps to improve performance: shrinking the image when it doesn't impact detection rates; sliding a detection window over an image; region of interest
  • 18. Resources • Selected CNN papers • Embedded facial image processing with Convolutional Neural Networks: http://liris.cnrs.fr/Documents/Liris-6072.pdf • Memory-Centric Accelerator Design for Convolutional Neural Networks: http://parse.ele.tue.nl/system/attachments/58/original/iccdMP17.pdf?1381908921 • CNN tutorials and courses • Stanford CNN course: http://cs231n.github.io/ • Neural network intro and visualization: http://colah.github.io/ • Synopsys DesignWare Embedded Vision Processors: http://www.synopsys.com/ev • More information and a demo are available at the Technology Showcase (Mission City Ballroom, Tables 3 & 4)