EM3V - Embedded Vision
Sundance Multiprocessor Technology, Ltd.
Pedro Machado pedro.m@sundance.com
Fatima Kishwar fatima.k@sundance.com
Flemming Christensen flemming.c@sundance.com
7/16/2018
OVERVIEW
• Company profile
• Introduction
• Technologies
• VCS-1 (EMC2) System
• Discussion
• Future Work
SERVICE AND EXCELLENCE!
Established in 1989 by Flemming CHRISTENSEN
• Employee Owned and a ’Life-Style’ company
• 10 people with 300+ years of combined experience
• 4 with accredited Xilinx FPGA training
• Always designed and built our own products
• BSI ISO 9001:2015 certified; certified since 2003
Technology Focus
• Acceleration, Vision, Sensors & Robotics
MODULAR, RECONFIGURABLE - LIKE LEGO®
• Reduced time-to-market
• Rapid Prototyping
• Reconfigurable
• Flexible
• Modular
• Scalable
• Reliable
Year 1989
TURN-KEY SYSTEM SOLUTIONS FROM A-Z
• Custom Bespoke Systems
• Hazardous Environment
• Custom Integration Service
• Design of Enclosures
• Commercial Off-The-Shelf (COTS) Systems
• Flexible and Upgradeable
• Design for Excellence
• Maintenance with ease
INTRODUCTION
• Increasing demand for High Performance Computing
  • Everyone wants more compute power
  • Finer time-steps; larger data-sets; better models
• Slowing growth in single-threaded performance
  • Emphasis on multi-core CPUs and parallelism
  • Do computational biologists need to learn PThreads?
• Increasing focus on power and space
  • Boxes are cheap: 16-node clusters are very affordable
  • Where do you put them? Who is paying for power?
• How can we use hardware acceleration to help?
TYPES OF HARDWARE ACCELERATOR
• GPU: Graphics Processing Unit
  • Many-core - 30 SIMD processors per device
  • High bandwidth, low complexity memory – no caches
• MPPA: Massively Parallel Processor Array
  • Grid of simple processors – 300 tiny RISC CPUs
  • Point-to-point connections on 2-D grid
• FPGA: Field Programmable Gate Array
  • Fine-grained grid of logic and small RAMs
  • Build whatever you want
HARDWARE ADVANTAGES: PERFORMANCE
Source: D. Thomas, L. Howes, and W. Luk, "A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation", in Proc. of FPGA, pp. 22-24, 2009
[Bar chart: speed-up relative to CPU]
            CPU   GPU   MPPA   FPGA
Speed-up      1    10      1     31
• More parallelism - more performance
• GPU: 30 cores, 16-way SIMD
• MPPA: 300 tiny RISC cores
• FPGA: hundreds of parallel functional units
HARDWARE ADVANTAGES: POWER
Source: D. Thomas, L. Howes, and W. Luk, "A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation", in Proc. of FPGA, pp. 22-24, 2009
[Bar chart: speed-up and efficiency relative to CPU]
              CPU   GPU   MPPA   FPGA
Speed-up        1    10      1     31
Efficiency      1     9     18    175
• GPU: 1.2GHz - same power as CPU
• MPPA: 300MHz - same performance as CPU, but 18x less power
• FPGA: 300MHz - faster and less power
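The chart values can be cross-checked with a little arithmetic. The sketch below assumes that "efficiency" means speed-up per unit of power, relative to the CPU; that is an interpretation, not something the slide states. Under that assumption, relative power is speed-up divided by efficiency. The dictionary names are illustrative.

```python
# Speed-up and efficiency values as read from the slide's chart.
speed_up = {"CPU": 1, "GPU": 10, "MPPA": 1, "FPGA": 31}
efficiency = {"CPU": 1, "GPU": 9, "MPPA": 18, "FPGA": 175}

# Assumption: efficiency = speed-up / relative power, both relative to the CPU.
relative_power = {dev: speed_up[dev] / efficiency[dev] for dev in speed_up}

for dev, power in relative_power.items():
    print(f"{dev}: {power:.2f}x the CPU's power")
```

Read this way, the GPU draws roughly the same power as the CPU (about 1.1x), the MPPA about 1/18 of it, and the FPGA less than a fifth, which matches the bullets on this slide.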
FPGA ACCELERATED APPLICATIONS
• Finance
  • 2006: Option pricing: 30x CPU
  • 2007: Multivariate Value-at-Risk: 33x Quad CPU
  • 2008: Credit-risk analysis: 60x Quad CPU
• Bioinformatics
  • 2007: Protein Graph Labelling: 100x Quad CPU
• Neural Networks
  • 2008: Spiking Neural Networks: 4x Quad CPU, 1.1x GPU
All with less than a fifth of the power
PROBLEM: DESIGN EFFORT
• Researchers love scripting languages: Matlab, Python, Perl
  • Simple to use and understand, lots of libraries
  • Easy to experiment and develop promising prototype
• Eventually prototype is ready: need to scale to large problems
  • Need to rewrite prototype to improve performance: e.g. Matlab to C
  • Simplicity of prototype is hidden by layers of optimisation
[Chart: relative performance (0.25x to 256x, log scale) versus design time (hour to year) on a CPU; stages: Scripted, Compiled, Multi-threaded, Vectorised]
PROBLEMS: DESIGN EFFORT
• GPUs provide a somewhat gentle learning curve
  • CUDA and OpenCL almost allow compilation of ordinary C code
• User must understand GPU architecture to maximise speed-up
  • Code must be radically altered to maximise use of functional units
  • Memory structures and accesses must map onto physical RAM banks
• We are asking the user to learn about things they don’t care about
[Chart: relative performance (0.25x to 256x, log scale) versus design time (hour to year) for CPU and GPU; GPU stages: Compiled, C to GPU, Reorganisation, Memory Opt.]
PROBLEMS: DESIGN EFFORT
• FPGAs provide large speed-up and power savings – at a price!
  • Days or weeks to get an initial version working
  • Multiple optimisation and verification cycles to get high performance
• Too risky and too specialised for most users
  • Months of retraining for an uncertain speed-up
• Currently only used in large projects, with dedicated FPGA engineer
[Chart: relative performance (0.25x to 256x, log scale) versus design time (hour to year) for CPU, GPU, and FPGA; FPGA stages: Initial Design, Parallelisation, Clock Rate]
GOAL: FPGAS FOR THE MASSES
• Accelerate niche applications with limited user-base
  • Don’t have to wait for traditional “heroic” optimisation
• Single-source description
  • The prototype code is the final code
• Encourage experimentation
  • Give users freedom to tweak and modify
• Target platforms at multiple scales
  • Individual user; Research group; Enterprise
• Use domain specific knowledge about applications
  • Identify bottlenecks: optimise them
  • Identify design patterns: automate them
  • Don’t try to do general purpose “C to hardware”
WHICH TECHNOLOGY?
• Clear architectural trend of parallelism and heterogeneity
• Heterogeneous devices have many tradeoffs
• Use cases also affect best device choice
• Problem: huge design space
[Figure: the same problem solved on different configurations, with execution times of 10 sec and 2.5 sec; configurations shown include multiple CPUs, CPU + FPGA over PCI, and CPU + GPU over PCI]
[Figure: orders-of-magnitude comparison of CPU, multicore CPU, GPU, and FPGA against a base for clock rate, complexity, parallelism, and power]
[Chart: execution time versus task size for sequential CPU, multicore CPU, GPU, and FPGA]
Huge Design Space
• Which accelerator?
• Which brand?
• Number of cores?
• Which device?
• Device cost?
• Design time?
• Which algorithm?
• Use-case optimization?
TYPICAL CV APPLICATIONS: SLIDING WINDOW
• Contribution: thorough analysis of devices and use cases for sliding window applications
• Sliding window used in many domains, including image processing and embedded
[Figure: design space explored. Devices: multicore CPU, GPU, FPGA. Algorithms: correntropy, convolution, sum of absolute differences. Use cases: kernel size, image size]
TYPICAL CV APPLICATIONS: SLIDING WINDOW APPLICATIONS
• Consider a 2D sliding window with 16-bit grayscale image inputs
  • Applies the window function to a window of the image and the kernel
  • “Slides” the window to get the next input
  • Repeats for every possible window
• 45x45 kernel on 1080p 30-FPS video = roughly 120 billion memory accesses/second
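The 120 billion figure can be reproduced with a back-of-the-envelope count. The sketch below assumes one image read per kernel element for every valid window position, and it ignores kernel reads and output writes; the function name is illustrative.

```python
def window_reads_per_second(width, height, kernel, fps):
    """Image reads per second when every kernel-sized window is read in full."""
    windows_per_frame = (width - kernel + 1) * (height - kernel + 1)
    reads_per_window = kernel * kernel
    return windows_per_frame * reads_per_window * fps

reads = window_reads_per_second(width=1920, height=1080, kernel=45, fps=30)
print(f"{reads / 1e9:.1f} billion reads/second")
```

Under these assumptions the count comes to about 118 billion reads per second, in line with the rounded figure on the slide.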
[Figure: windows 0 to W-1 of the image are each combined with the kernel by the window function to produce one output pixel]
Input: image of size x×y, kernel of size n×m
for (row = 0; row <= x-n; row++) {
  for (col = 0; col <= y-m; col++) {
    // get n*m pixels (i.e., the window
    // starting at the current row and col)
    window = image[row:row+n-1][col:col+m-1]
    // apply the window function to the window and the kernel
    output[row][col] = f(window, kernel)
  }
}
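The loop above can be written as a small runnable sketch. It assumes the image and kernel are plain lists of rows and that the window function is passed in as a callable; `sliding_window` and `window_sum` are illustrative names, not part of any library.

```python
def sliding_window(image, kernel, window_fn):
    """Apply window_fn to every full window of image (lists of rows)."""
    x, y = len(image), len(image[0])      # image height and width
    n, m = len(kernel), len(kernel[0])    # kernel height and width
    output = []
    for row in range(x - n + 1):
        out_row = []
        for col in range(y - m + 1):
            # n*m pixels starting at the current row and col
            window = [r[col:col + m] for r in image[row:row + n]]
            out_row.append(window_fn(window, kernel))
        output.append(out_row)
    return output

def window_sum(window, kernel):
    """Placeholder window function: ignores the kernel and sums the window."""
    return sum(sum(r) for r in window)

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 0],
          [0, 0]]
result = sliding_window(image, kernel, window_sum)
print(result)  # [[12, 16], [24, 28]]
```

Passing the window function as an argument lets the same loop serve each of the three applications that follow.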
APP 1: SUM OF ABSOLUTE DIFFERENCES (SAD)
• Used for: H.264 encoding, object identification
• Window function: point-wise absolute difference, followed by summation
[Figure: window pixels W1..W4 and kernel values k1..k4 are subtracted point-wise (d1..d4), passed through the absolute value (a1..a4), and summed into the output pixel Ox]
Ox = 0;
for each pixel i in window:
  Ox += abs(pixel_i - kernel_i)
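As a runnable illustration of the SAD window function, the sketch below assumes the window and kernel are equally sized lists of rows; the function name `sad` is illustrative.

```python
def sad(window, kernel):
    """Sum of absolute differences between two equally sized 2D blocks."""
    return sum(
        abs(p - k)
        for w_row, k_row in zip(window, kernel)
        for p, k in zip(w_row, k_row)
    )

window = [[10, 12],
          [9, 14]]
kernel = [[11, 12],
          [7, 10]]
print(sad(window, kernel))  # 7  (1 + 0 + 2 + 4)
```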
APP 2: 2D CONVOLUTION
• Used for: filtering, edge detection
• Window function: point-wise product followed by summation
[Figure: window pixels W1..W4 are multiplied point-wise by the flipped kernel k4..k1 (products p1..p4) and summed into the output pixel Ox]
Ox = 0;
for each pixel i in window (i = 1..n):
  Ox += pixel_i * kernel_(n-i+1)
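The convolution window function differs from SAD mainly in that the kernel is flipped before the point-wise product. A minimal sketch, assuming lists of rows and an illustrative function name `conv_window`:

```python
def conv_window(window, kernel):
    """Point-wise product of a window with the flipped kernel, then summation."""
    flat_w = [p for row in window for p in row]
    flat_k = [k for row in kernel for k in row]
    # Reversing the flattened kernel flips it along both axes.
    return sum(p * k for p, k in zip(flat_w, reversed(flat_k)))

window = [[1, 2],
          [3, 4]]
kernel = [[1, 0],
          [0, -1]]
print(conv_window(window, kernel))  # 3  (1*-1 + 2*0 + 3*0 + 4*1)
```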
APP 3: CORRENTROPY
• Used for: optical flow, obstacle avoidance
• Window function: Gaussian of point-wise absolute difference, followed by summation
[Figure: absolute differences a1..a4 between window pixels W1..W4 and kernel values k1..k4 are passed through a Gaussian function (g1..g4) and summed into the output pixel Ox]
Ox = 0;
for each pixel i in window:
  Ox += Gauss(abs(pixel_i - kernel_i))
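A minimal sketch of the correntropy window function follows. The slide does not specify the Gaussian, so the unnormalised form exp(-d²/(2σ²)) and the bandwidth parameter `sigma` are assumptions made here for illustration, as are the function names.

```python
import math

def gauss(d, sigma=1.0):
    """Unnormalised Gaussian; sigma is an assumed bandwidth parameter."""
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def correntropy(window, kernel, sigma=1.0):
    """Sum of Gaussians of the point-wise absolute differences."""
    return sum(
        gauss(abs(p - k), sigma)
        for w_row, k_row in zip(window, kernel)
        for p, k in zip(w_row, k_row)
    )

window = [[5, 5],
          [5, 5]]
# Identical blocks: every difference is 0, so each term is exp(0) = 1.
print(correntropy(window, window))  # 4.0
```

Unlike SAD, the result grows with similarity: a perfect match gives the maximum value, and large differences contribute almost nothing.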
HETEROGENEOUS COMPUTING SYSTEMS IN TOP500 LIST
• Reason: Significant performance/energy-efficiency boost from GPU/CPU
[Bar chart: heterogeneous computing systems in the Top500, by year]
Year:     2007  2008  2009  2010  2011  2012  2013  2014  2015  2016 (June)
Systems:     1     8     7    17    39    62    53    75   102    94
GPU: SPECIALIZED ACCELERATOR FOR A SET OF APPLICATIONS
• Specialized accelerator for data-parallel applications
  • Optimized for processing massive data
• Give up unrelated goals and features
  • Give up optimizing latency for processing single data
  • Give up branch prediction, out-of-order execution
  • Give up large traditional cache hierarchy
• More resources for parallel processing
  • More cores, more ALUs
[Figure: four cores, each with its own data input and data output]
CREATING APPLICATION-SPECIFIC ACCELERATOR WITH FPGA
• Only provides primitive building blocks for computation
  • Registers, addition/multiplication, memories, programmable Boolean operations and connections
• Build application-specific accelerator from primitive building blocks
  • Interconnection between primitive functional units
  • Timing of data movement between primitive functional units
• Opportunities for optimizations for a specific application!
  • Maximizing efficiency while throwing away redundancy
[Figure: Virtex®-7 FPGA resources]
• Precise, low-jitter clocking: MMCMs
• Logic fabric: LUT-6 CLB
• DSP engines: DSP48E1 slices
• On-chip memory: 36Kbit/18Kbit block RAM
• Enhanced connectivity: PCIe® interface blocks
• Hi-perf. parallel I/O connectivity: SelectIO™ technology
• Hi-performance serial I/O connectivity: transceiver technology
THE CHALLENGES OF PROMOTING FPGAS AMONG SOFTWARE ENGINEERS
• Requires tremendous effort
  • Extensive knowledge of digital circuit design
• The potential of FPGAs is not easily accessible to typical software engineers
[Figure: FPGA design jargon: AXI master, burst inference, DSP48, timing closure, stable interface, loop rewind]
HYBRID ARCHITECTURE
• Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28nm Artix-7 or Kintex®-7 based programmable logic for excellent performance-per-watt and maximum design flexibility.
• Sundance’s EMC2 carrier board is compatible with all devices in the Zynq-7000 series.
WHAT IS THE BEST SOLUTION?
• Zynq® UltraScale+™ MPSoC devices provide 64-bit processor scalability while combining real-time control with soft and hard engines for graphics, video, waveform, and packet processing.
• Sundance’s EMC2 carrier board is compatible with all Zynq® UltraScale+™ MPSoC devices. Our focus is now on the XCZU4EV device (automotive grade).
VCS-1 (EMC2) HARDWARE FEATURES
Connectivity:
• FM191-R; FMC-LPC to:
  • 15x Digital I/Os [DB9]
  • 12x Analogue Inputs [DB9]
  • 8x Analogue Outputs [DB9]
  • 1x Expansion [SEIC]
• FM191-U; SEIC to:
  • 4x USB3.0 [USB-C]
  • 28x GPIO [40-pin GPIO]
• FM191-A1; 40-pin GPIO
  • 28x GPIO [DB9]
VCS-1 (EMC2) SENSORS COMPATIBILITY
The ZU4EV MPSoC is compatible with a wide range of sensors.
• Depth sensor (up to 20m). Interface: USB3.0
• Thermal imaging. Interface: Ethernet
• Movidius Neural Compute Stick. Interface: USB3.0
• JAI AD-130GE. Interface: Ethernet
VCS-1 (EMC2) COMPATIBILITY
VCS-1 features:
• Raspberry Pi and Arduino compatible
• Compatible with most Arduino/RPi sensors and actuators
• 4x USB3.0 ports for interfacing with a wide range of sensors
• MQTT and OpenCV compatible
• ROS compatible
• ROS2 ready
• HIPPEROS ready
DEEP LEARNING ON THE VCS-1 (EMC2)
The VCS-1 will be fully compatible with the Xilinx reVISION stack.
• Includes support for the most popular neural networks, including AlexNet, GoogLeNet, VGG, SSD, and FCN.
• Optimized implementations for CNN network layers, required to build custom neural networks (DNN/CNN)
[Figure: Xilinx reVISION stack]
VCS-1 (EMC2) OPEN SOURCE SOFTWARE AND FIRMWARE
Open-source hardware/software and online documentation:
• Open Hardware Repository
  https://www.ohwr.org/projects/emc2-dp
• Microsoft Windows/Linux 64-bit SDK
  https://github.com/SundanceMultiprocessorTechnology/VCS-1_SDK
• FM191 ARM SDK
  https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_SDK
• FM191 Firmware
  https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_FW
• EMC2 ROS
  https://github.com/SundanceMultiprocessorTechnology/VCS-1_emc2_ros
VCS-1 (EMC2) ENCLOSURE
A custom enclosure was specially designed to accommodate the VCS-1 system.
DISCUSSION
The VCS-1 has the following characteristics:
1. High performance (24V@0.595A)
2. Low power consumption
3. Highly compatible with a wide range of commercially available sensors and actuators
4. Highly optimised for computer vision applications
5. Fully reconfigurable
WHAT NEXT?
• Be part of our customer family by ordering our products or hiring our services.
• Sundance University Program (SUP)
  • Access to hardware prototypes
  • Advisory Board Members
  • BSc/MSc/PhD Internships
• Funding capture
  • H2020
  • InnovateUK
  • EPSRC
Learn more about SUP and ongoing projects:
https://www.sundance.com/sundance-in-eu-projects-programs/
UNIVERSITY CLIENTS
INDUSTRIAL CLIENTS
QUESTIONS?

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

MIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platformMIT's experience on OpenPOWER/POWER 9 platform
MIT's experience on OpenPOWER/POWER 9 platform
 
02 ai inference acceleration with components all in open hardware: opencapi a...
02 ai inference acceleration with components all in open hardware: opencapi a...02 ai inference acceleration with components all in open hardware: opencapi a...
02 ai inference acceleration with components all in open hardware: opencapi a...
 
Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?Overview of Scientific Workflows - Why Use Them?
Overview of Scientific Workflows - Why Use Them?
 
PG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
PG-4119, 3D Geometry Compression on GPU, by Jacques LefaucheuxPG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
PG-4119, 3D Geometry Compression on GPU, by Jacques Lefaucheux
 
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon SelleyPT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
PT-4052, Introduction to AMD Developer Tools, by Yaki Tebeka and Gordon Selley
 
TAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platformTAU E4S ON OpenPOWER /POWER9 platform
TAU E4S ON OpenPOWER /POWER9 platform
 
Improving engineering efficiency through tiled hierarchical flows
Improving engineering efficiency through tiled hierarchical flowsImproving engineering efficiency through tiled hierarchical flows
Improving engineering efficiency through tiled hierarchical flows
 
ANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft VehiclesANSYS SCADE Usage for Unmanned Aircraft Vehicles
ANSYS SCADE Usage for Unmanned Aircraft Vehicles
 
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian BallantyneWT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
WT-4066, The Making of Turbulenz’ Polycraft WebGL Benchmark, by Ian Ballantyne
 
HC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu DasHC-4017, HSA Compilers Technology, by Debyendu Das
HC-4017, HSA Compilers Technology, by Debyendu Das
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systems
 
OpenPOWER System Marconi100
OpenPOWER System Marconi100OpenPOWER System Marconi100
OpenPOWER System Marconi100
 
SGI: Meeting Manufacturing's Need for Production Supercomputing
SGI: Meeting Manufacturing's Need for Production SupercomputingSGI: Meeting Manufacturing's Need for Production Supercomputing
SGI: Meeting Manufacturing's Need for Production Supercomputing
 
Xilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsXilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systems
 
Hardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and MLHardware & Software Platforms for HPC, AI and ML
Hardware & Software Platforms for HPC, AI and ML
 
Part 2 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 2   Maximizing the utilization of GPU resources on-premise and in the cloudPart 2   Maximizing the utilization of GPU resources on-premise and in the cloud
Part 2 Maximizing the utilization of GPU resources on-premise and in the cloud
 
High Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & RankingsHigh Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & Rankings
 
OpenCAPI next generation accelerator
OpenCAPI next generation accelerator OpenCAPI next generation accelerator
OpenCAPI next generation accelerator
 
It's Time to ROCm!
It's Time to ROCm!It's Time to ROCm!
It's Time to ROCm!
 

Similar a E3MV - Embedded Vision - Sundance

Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrj
Roberto Brandao
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
NVIDIA Taiwan
 

Similar a E3MV - Embedded Vision - Sundance (20)

Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
Evolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server SolutionEvolution of Supermicro GPU Server Solution
Evolution of Supermicro GPU Server Solution
 
OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020OpenACC Monthly Highlights: November 2020
OpenACC Monthly Highlights: November 2020
 
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge22by7 and DellEMC Tech Day July 20 2017 - Power Edge
22by7 and DellEMC Tech Day July 20 2017 - Power Edge
 
Design installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttuDesign installation-commissioning-red raider-cluster-ttu
Design installation-commissioning-red raider-cluster-ttu
 
Infrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep LearningInfrastructure and Tooling - Full Stack Deep Learning
Infrastructure and Tooling - Full Stack Deep Learning
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Cuda meetup presentation 5
Cuda meetup presentation 5Cuda meetup presentation 5
Cuda meetup presentation 5
 
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU ServerModular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
 
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU ServerModular by Design: Supermicro’s New Standards-Based Universal GPU Server
Modular by Design: Supermicro’s New Standards-Based Universal GPU Server
 
POWER9 for AI & HPC
POWER9 for AI & HPCPOWER9 for AI & HPC
POWER9 for AI & HPC
 
TechEvent Exdata X7-2 POC with OVM
TechEvent Exdata X7-2 POC with OVMTechEvent Exdata X7-2 POC with OVM
TechEvent Exdata X7-2 POC with OVM
 
08 Supercomputer Fugaku
08 Supercomputer Fugaku08 Supercomputer Fugaku
08 Supercomputer Fugaku
 
Amd accelerated computing -ufrj
Amd   accelerated computing -ufrjAmd   accelerated computing -ufrj
Amd accelerated computing -ufrj
 
NGD Systems and Microsoft Keynote Presentation at IPDPS MPP in Vacouver
NGD Systems and Microsoft Keynote Presentation at IPDPS MPP in VacouverNGD Systems and Microsoft Keynote Presentation at IPDPS MPP in Vacouver
NGD Systems and Microsoft Keynote Presentation at IPDPS MPP in Vacouver
 
PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018PGI Compilers & Tools Update- March 2018
PGI Compilers & Tools Update- March 2018
 
HPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand ChallengeHPC Infrastructure To Solve The CFD Grand Challenge
HPC Infrastructure To Solve The CFD Grand Challenge
 
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
GTC Taiwan 2017 在 Google Cloud 當中使用 GPU 進行效能最佳化
 
Trends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systemsTrends towards the merge of HPC + Big Data systems
Trends towards the merge of HPC + Big Data systems
 

Más de Sundance Multiprocessor Technology Ltd.

Más de Sundance Multiprocessor Technology Ltd. (20)

Sundance Perception Blade
Sundance Perception BladeSundance Perception Blade
Sundance Perception Blade
 
Sundance's presentation at B:RAI 2020
Sundance's presentation at B:RAI 2020Sundance's presentation at B:RAI 2020
Sundance's presentation at B:RAI 2020
 
Sundance at the 49th Intelligent Sensing Program
Sundance at the 49th Intelligent Sensing ProgramSundance at the 49th Intelligent Sensing Program
Sundance at the 49th Intelligent Sensing Program
 
Sundance VCS-1 for Precision Robotics
Sundance VCS-1 for Precision RoboticsSundance VCS-1 for Precision Robotics
Sundance VCS-1 for Precision Robotics
 
TULIPP at the 10th Intelligent Imaging Event
TULIPP at the 10th Intelligent Imaging EventTULIPP at the 10th Intelligent Imaging Event
TULIPP at the 10th Intelligent Imaging Event
 
Sundance TULIPP Workshop at Nottingham Trent University
Sundance TULIPP Workshop at Nottingham Trent UniversitySundance TULIPP Workshop at Nottingham Trent University
Sundance TULIPP Workshop at Nottingham Trent University
 
TULIPP Starter Kit – AGRI
TULIPP Starter Kit – AGRITULIPP Starter Kit – AGRI
TULIPP Starter Kit – AGRI
 
System Design on Zynq using SDSoC
System Design on Zynq using SDSoCSystem Design on Zynq using SDSoC
System Design on Zynq using SDSoC
 
Re-Vision stack presentation
Re-Vision stack presentationRe-Vision stack presentation
Re-Vision stack presentation
 
Moving object detection on FPGA
Moving object detection on FPGAMoving object detection on FPGA
Moving object detection on FPGA
 
ANPR FPGA Workshop
ANPR FPGA WorkshopANPR FPGA Workshop
ANPR FPGA Workshop
 
HiPEAC 2018 - CPS, why all the fuss?
HiPEAC 2018 - CPS, why all the fuss?HiPEAC 2018 - CPS, why all the fuss?
HiPEAC 2018 - CPS, why all the fuss?
 
Sundance HiPEAC 2018 Presentation
Sundance HiPEAC 2018 PresentationSundance HiPEAC 2018 Presentation
Sundance HiPEAC 2018 Presentation
 
TULIPP - Leaving a legacy: The ultimate Low-Power Image Processing Handbook
TULIPP - Leaving a legacy: The ultimate Low-Power Image Processing HandbookTULIPP - Leaving a legacy: The ultimate Low-Power Image Processing Handbook
TULIPP - Leaving a legacy: The ultimate Low-Power Image Processing Handbook
 
TULIPP at NMI 18-5-17
TULIPP at NMI 18-5-17TULIPP at NMI 18-5-17
TULIPP at NMI 18-5-17
 
Open VPX Tutorial
Open VPX TutorialOpen VPX Tutorial
Open VPX Tutorial
 
VF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSP
VF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSPVF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSP
VF360 OpenVPX Board w. Altera Stratix and TI KeyStone DSP
 
Stack PC in PC104 Land
Stack PC in PC104 LandStack PC in PC104 Land
Stack PC in PC104 Land
 
EMC2 Xilinx SDSoC presentation
EMC2 Xilinx SDSoC presentationEMC2 Xilinx SDSoC presentation
EMC2 Xilinx SDSoC presentation
 
Pc 104 series 1 application showcase
Pc 104 series 1 application showcasePc 104 series 1 application showcase
Pc 104 series 1 application showcase
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

E3MV - Embedded Vision - Sundance

  • 1. EM3V - Embedded Vision Sundance Multiprocessor Technology, Ltd. Pedro Machado pedro.m@sundance.com Fatima Kishwar fatima.k@sundance.com Flemming Christensen flemming.c@sundance.com 7/16/2018 1
  • 2. OVERVIEW • Company profile • Introduction • Technologies • VCS-1 (EMC2) System • Discussion • Future Work 7/16/2018 2
  • 3. SERVICE AND EXCELLENCE! Established in 1989 by Flemming CHRISTENSEN • Employee Owned and a ’Life-Style’ company • 10x people with 300+ years experince • 4x with accredited Xilinx FPGA training • Always designed and built our own products • BSI - ISO9001-2015 certified, since 2003 Techology Focus • Acceleration,Vision, Sensor & Robotics 7/16/2018 3
  • 4. MODULAR, RECONFIGURABLE - LIKE LEGO® • Reduced time-to-market • Rapid Prototyping • Reconfigurable • Flexible • Modular • Scalable • Reliable Year 1989 7/16/2018 4
  • 5. 5 TURN-KEY SYSTEM SOLUTIONS FROM A-Z • Custom Bespoke Systems • Hazardous Environment • Custom Integration Service • Design of Enclosures • Commercial-of-the-Shelves Systems (COTS) • Flexible and Upgradeable • Design for Excellence • Maintenance with ease
  • 6. INTRODUCTION  Increasing demand for High Performance Computing  Everyone wants more compute-power  Finer time-steps; larger data-sets; better models  Decreasing single-threaded performance  Emphasis on multi-core CPUs and parallelism  Do computational biologists need to learn PThreads?  Increasing focus on power and space  Boxes are cheap: 16 node clusters are very affordable  Where do you put them? Who is paying for power?  How can we use hardware acceleration to help? 7/16/2018 6
  • 7. TYPES OF HARDWARE ACCELERATOR • GPU : Graphics Processing Unit • Many-core - 30 SIMD processors per device • High bandwidth, low complexity memory – no caches • MPPA : Massively Parallel Processor Array • Grid of simple processors – 300 tiny RISC CPUs • Point-to-point connections on 2-D grid • FPGA : Field Programmable Gate Array • Fine-grained grid of logic and small RAMs • Build whatever you want
  • 8. HARDWARE ADVANTAGES: PERFORMANCE Source: A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation, D. Thomas, L. Howes, and W. Luk, In Proc. of FPGA, pgs. 22-24, 2009. [Chart: speed-up relative to CPU: CPU 1x, GPU 10x, MPPA 1x, FPGA 31x] • More parallelism - more performance • GPU: 30 cores, 16-way SIMD • MPPA: 300 tiny RISC cores • FPGA: hundreds of parallel functional units
  • 9. HARDWARE ADVANTAGES: POWER Source: A Comparison of CPUs, GPUs, FPGAs, and MPPAs for Random Number Generation, D. Thomas, L. Howes, and W. Luk, In Proc. of FPGA, pgs. 22-24, 2009. [Chart: speed-up relative to CPU: CPU 1x, GPU 10x, MPPA 1x, FPGA 31x; efficiency relative to CPU: CPU 1x, GPU 9x, MPPA 18x, FPGA 175x] • GPU: 1.2GHz - same power as CPU • MPPA: 300MHz - same performance as CPU, but 18x less power • FPGA: 300MHz - faster and less power
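
The chart figures on slide 9 can be cross-checked against the bullet points with a little arithmetic. The sketch below assumes that "efficiency" means speed-up divided by power relative to the CPU; the slide does not define the term, so this reading is an assumption. Under it, relative power is speed-up divided by efficiency.

```python
# Figures read from the slide 9 chart, all relative to the CPU baseline.
speed_up = {"CPU": 1, "GPU": 10, "MPPA": 1, "FPGA": 31}
efficiency = {"CPU": 1, "GPU": 9, "MPPA": 18, "FPGA": 175}

# Assumption (not stated on the slide): efficiency = speed-up / relative power,
# hence relative power = speed-up / efficiency.
relative_power = {dev: speed_up[dev] / efficiency[dev] for dev in speed_up}

for dev, power in relative_power.items():
    print(f"{dev}: {power:.2f}x the CPU power")
```

With this reading the GPU draws roughly the same power as the CPU (10/9), the MPPA draws 1/18 of it, and the FPGA draws about 0.18x, which agrees with the "less than a fifth of the power" remark on slide 10.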
  • 10. FPGA ACCELERATED APPLICATIONS • Finance • 2006: Option pricing: 30x CPU • 2007: Multivariate Value-at-Risk: 33x Quad CPU • 2008: Credit-risk analysis: 60x Quad CPU • Bioinformatics • 2007: Protein Graph Labelling: 100x Quad CPU • Neural Networks • 2008: Spiking Neural Networks: 4x Quad CPU, 1.1x GPU All with less than a fifth of the power
  • 11. PROBLEM: DESIGN EFFORT • Researchers love scripting languages: Matlab, Python, Perl • Simple to use and understand, lots of libraries • Easy to experiment and develop a promising prototype • Eventually the prototype is ready: need to scale to large problems • Need to rewrite the prototype to improve performance: e.g. Matlab to C • Simplicity of the prototype is hidden by layers of optimisation [Figure: relative performance (0.25x to 256x) versus design time (hour to year) for the CPU: scripted, compiled, multi-threaded, vectorised]
  • 12. PROBLEMS: DESIGN EFFORT • GPUs provide a somewhat gentle learning curve • CUDA and OpenCL almost allow compilation of ordinary C code • User must understand GPU architecture to maximise speed-up • Code must be radically altered to maximise use of functional units • Memory structures and accesses must map onto physical RAM banks • We are asking the user to learn about things they don’t care about [Figure: relative performance (0.25x to 256x) versus design time (hour to year) for CPU and GPU; GPU stages: compiled C to GPU, reorganisation, memory optimisation]
  • 13. PROBLEMS: DESIGN EFFORT • FPGAs provide large speed-up and power savings – at a price! • Days or weeks to get an initial version working • Multiple optimisation and verification cycles to get high performance • Too risky and too specialised for most users • Months of retraining for an uncertain speed-up • Currently only used in large projects, with a dedicated FPGA engineer [Figure: relative performance (0.25x to 256x) versus design time (hour to year) for CPU, GPU and FPGA; FPGA stages: initial design, parallelisation, clock rate]
  • 14. GOAL: FPGAS FOR THE MASSES • Accelerate niche applications with limited user-base • Don’t have to wait for traditional “heroic” optimisation • Single-source description • The prototype code is the final code • Encourage experimentation • Give users freedom to tweak and modify • Target platforms at multiple scales • Individual user; Research group; Enterprise • Use domain specific knowledge about applications • Identify bottlenecks: optimise them • Identify design patterns: automate them • Don’t try to do general purpose “C to hardware”
  • 15. WHICH TECHNOLOGY? • Clear architectural trend of parallelism and heterogeneity • Heterogeneous devices have many tradeoffs • Use cases also affect the best device choice • Problem: huge design space [Figure: execution time versus task size for sequential CPU, multicore CPU, GPU and FPGA; example execution times of 10 sec and 2.5 sec; PCI-attached FPGA or GPU: orders of magnitude improvement] Huge design space: • Which accelerator? • Which brand? • Number of cores? • Which device? • Device cost? • Design time? • Which algorithm? • Use case optimization?
  • 16. TYPICAL CV APPLICATIONS: SLIDING WINDOW • Contribution: thorough analysis of devices and use cases for sliding window applications • Sliding window used in many domains, including image processing and embedded [Figure: devices (multicore CPU, GPU, FPGA); algorithms (correntropy, convolution, sum of absolute differences); use cases (kernel size, image size)]
  • 17. TYPICAL CV APPLICATIONS: SLIDING WINDOW APPLICATIONS • Consider a 2D sliding window with 16-bit grayscale image inputs • Applies a window function to a window from the image and the kernel • “Slides” the window to get the next input • Repeats for every possible window • 45x45 kernel on 1080p 30-FPS video = 120 billion memory accesses/second [Figure: windows 0 to W-1 and the kernel feed the window function, which produces one output pixel per window] Input: image of size x×y, kernel of size n×m
    for (row=0; row <= x-n; row++) {
      for (col=0; col <= y-m; col++) {
        // get n*m pixels (i.e., the window starting at the current row and col)
        window=image[row:row+n-1][col:col+m-1]
        output[row][col]=f(window,kernel)
      }
    }
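
The slide 17 pseudocode can be turned into a small runnable sketch. The version below uses NumPy and is only an illustration: the names `sliding_window` and `window_reads_per_second` are hypothetical, and the loops run up to and including the last full window so that every possible window is visited. The second function reproduces the memory-access estimate, assuming one read per window pixel and only fully contained windows.

```python
import numpy as np

def sliding_window(image, kernel, window_fn):
    """Apply window_fn(window, kernel) to every full window of the image."""
    x, y = image.shape
    n, m = kernel.shape
    out = np.empty((x - n + 1, y - m + 1), dtype=np.float64)
    for row in range(x - n + 1):
        for col in range(y - m + 1):
            # n*m pixels starting at the current row and column
            window = image[row:row + n, col:col + m]
            out[row, col] = window_fn(window, kernel)
    return out

def window_reads_per_second(width, height, k, fps):
    """Pixel reads per second for a k x k kernel (assumes no data reuse)."""
    windows = (width - k + 1) * (height - k + 1)
    return windows * k * k * fps

reads = window_reads_per_second(1920, 1080, 45, 30)
print(f"{reads / 1e9:.0f} billion reads per second")
```

For a 45x45 kernel on 1080p video at 30 FPS this gives about 118 billion reads per second, in line with the roughly 120 billion quoted on the slide.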
  • 18. APP 1: SUM OF ABSOLUTE DIFFERENCES (SAD) • Used for: H.264 encoding, object identification • Window function: point-wise absolute difference, followed by summation [Figure: window pixels W1..W4 and kernel values k1..k4 give differences d1..d4, absolute values a1..a4, and their sum Ox, the output pixel] Ox = 0; for each pixel i in the window: Ox += abs(pixel_i – kernel_i)
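
A minimal NumPy sketch of the SAD window function from slide 18. The function name `sad` is hypothetical. The inputs are widened to a signed type before subtracting, because the 16-bit grayscale pixels mentioned on slide 17 would wrap around if subtracted as unsigned integers.

```python
import numpy as np

def sad(window, kernel):
    """Sum of absolute differences between a window and a kernel."""
    # Widen to signed 64-bit first: uint16 subtraction would wrap around.
    diff = window.astype(np.int64) - kernel.astype(np.int64)
    return int(np.abs(diff).sum())

window = np.array([[1, 2], [3, 4]], dtype=np.uint16)
kernel = np.array([[4, 3], [2, 1]], dtype=np.uint16)
print(sad(window, kernel))  # 3 + 1 + 1 + 3 = 8
```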
  • 19. APP 2: 2D CONVOLUTION • Used for: filtering, edge detection • Window function: point-wise product followed by summation [Figure: window pixels W1..W4 are multiplied by the reversed kernel k4..k1 to give products p1..p4, which are summed into Ox, the output pixel] Ox = 0; for each pixel i in the window: Ox += (pixel_i × kernel_(n-i))
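
The convolution window function on slide 19 differs from a plain point-wise product in that the kernel is traversed in reverse order (the `kernel_(n-i)` index). A minimal NumPy sketch, with the hypothetical name `convolve_window`, reverses the kernel along both axes, which is equivalent to reversing its flattened order.

```python
import numpy as np

def convolve_window(window, kernel):
    """Point-wise product of a window with the reversed kernel, then summed."""
    flipped = kernel[::-1, ::-1]  # reverse rows and columns
    return int((window.astype(np.int64) * flipped.astype(np.int64)).sum())

window = np.array([[1, 2], [3, 4]], dtype=np.uint16)
kernel = np.array([[1, 2], [3, 4]], dtype=np.uint16)
print(convolve_window(window, kernel))  # 1*4 + 2*3 + 3*2 + 4*1 = 20
```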
  • 20. APP 3: CORRENTROPY • Used for: optical flow, obstacle avoidance • Window function: Gaussian of point-wise absolute difference, followed by summation [Figure: window pixels W1..W4 and kernel values k1..k4 give absolute differences a1..a4, which pass through a Gaussian function to give g1..g4, summed into Ox, the output pixel] Ox = 0; for each pixel i in the window: Ox += Gauss(abs(pixel_i – kernel_i))
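
A minimal NumPy sketch of the correntropy window function from slide 20. The slide does not specify the Gaussian, so the unnormalised form exp(-d²/(2σ²)) and the `sigma` parameter used here are assumptions, as is the function name `correntropy`.

```python
import numpy as np

def correntropy(window, kernel, sigma=1.0):
    """Sum of a Gaussian applied to point-wise absolute differences."""
    # Assumption: unnormalised Gaussian exp(-d^2 / (2 * sigma^2)).
    diff = np.abs(window.astype(np.float64) - kernel.astype(np.float64))
    return float(np.exp(-diff ** 2 / (2.0 * sigma ** 2)).sum())

window = np.array([[1, 2], [3, 4]], dtype=np.uint16)
print(correntropy(window, window))  # identical inputs: each term is 1
```

With this form an exact match scores the number of pixels in the window, and the score decays towards zero as the window and kernel diverge.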
  • 21. HETEROGENEOUS COMPUTING SYSTEM IN TOP500 LIST • Reason: Significant performance/energy-efficiency boost from GPU/CPU [Chart: heterogeneous computing systems in the Top500 by year: 2007: 1, 2008: 8, 2009: 7, 2010: 17, 2011: 39, 2012: 62, 2013: 53, 2014: 75, 2015: 102, 2016 (June): 94]
  • 22. GPU: SPECIALIZED ACCELERATOR FOR A SET OF APPLICATIONS • Specialized accelerator for data-parallel applications • Optimized for processing massive data • Give up unrelated goals and features • Give up optimizing latency for processing single data • Give up branch prediction, out-of-order execution • Give up large traditional cache hierarchy • More resources for parallel processing • More cores, more ALUs [Figure: several cores, each processing its own data in parallel]
  • 23. CREATING APPLICATION-SPECIFIC ACCELERATOR WITH FPGA • Only provides primitive building blocks for computation • Registers, addition/multiplication, memories, programmable Boolean operations and connections • Build application-specific accelerator from primitive building blocks • Interconnection between primitive functional units • Timing of data movement between primitive functional units • Opportunities for optimizations for a specific application! • Maximizing efficiency while throwing away redundancy [Figure: Virtex®-7 FPGA: precise, low jitter clocking (MMCMs); logic fabric (LUT-6 CLB); DSP engines (DSP48E1 slices); on-chip memory (36Kbit/18Kbit block RAM); enhanced connectivity (PCIe® interface blocks); high-performance parallel I/O connectivity (SelectIO™ technology); high-performance serial I/O connectivity (transceiver technology)]
  • 24. THE CHALLENGES OF PROMOTING FPGAS AMONG SOFTWARE ENGINEERS • Requires tremendous effort • Extensive knowledge of digital circuit design • The potential of FPGAs is not easily accessible to ordinary software engineers [Figure: FPGA-specific concepts: AXI master, burst inference, DSP48, timing closure, stable interface, loop rewind]
  • 25. HYBRID ARCHITECTURE • Zynq-7000 devices are equipped with dual-core ARM Cortex-A9 processors integrated with 28nm Artix-7 or Kintex®-7 based programmable logic for excellent performance-per-watt and maximum design flexibility. • Sundance’s EMC2 carrier board is compatible with the whole Zynq-7000 series.
  • 26. WHAT IS THE BEST SOLUTION? • Zynq® UltraScale+™ MPSoC devices provide 64-bit processor scalability while combining real-time control with soft and hard engines for graphics, video, waveform, and packet processing. • Sundance’s EMC2 carrier board is compatible with all Zynq® UltraScale+™ MPSoC devices. Our focus is now on the XCZU4EV device (automotive grade).
  • 27. VCS-1 (EMC2) HARDWARE FEATURES Connectivity: • FM191-R; FMC-LPC to: • 15x Digital I/Os [DB9] • 12x Analogue Inputs [DB9] • 8x Analogue Outputs [DB9] • 1x Expansion [SEIC] • FM191-U; SEIC to: • 4x USB3.0 [USB-C] • 28x GPIO [40-pin GPIO] • FM191-A1; 40-pin GPIO to: • 28x GPIO [DB9]
  • 28. VCS-1 (EMC2) SENSORS COMPATIBILITY The ZU4EV MPSoC is compatible with a wide range of sensors. • Depth sensor (up to 20m), interface: USB3.0 • Thermal imaging, interface: Ethernet • Movidius Neural Compute Stick, interface: USB3.0 • JAI AD-130GE, interface: Ethernet
  • 29. VCS-1 (EMC2) COMPATIBILITY VCS-1 features: • Raspberry Pi and Arduino compatible • Compatible with most Arduino/RPi sensors and actuators • 4x USB3.0 ports for interfacing with a wide range of sensors • MQTT and OpenCV compatible • ROS compatible • ROS2 ready • HIPPEROS ready
  • 30. DEEP LEARNING ON THE VCS-1 (EMC2) The VCS-1 will be fully compatible with the Xilinx reVISION stack. • Includes support for the most popular neural networks including AlexNet, GoogLeNet, VGG, SSD, and FCN. • Optimized implementations for CNN network layers, required to build custom neural networks (DNN/CNN)
  • 31. VCS-1 (EMC2) OPEN SOURCE SOFTWARE AND FIRMWARE Open source hardware/software and online documentation: • Open Hardware Repository: https://www.ohwr.org/projects/emc2-dp • Microsoft Windows/Linux 64-bit SDK: https://github.com/SundanceMultiprocessorTechnology/VCS-1_SDK • FM191 ARM SDK: https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_SDK • FM191 Firmware: https://github.com/SundanceMultiprocessorTechnology/VCS-1_FM191_FW • EMC2 ROS: https://github.com/SundanceMultiprocessorTechnology/VCS-1_emc2_ros
  • 32. VCS-1 (EMC2) ENCLOSURE A custom enclosure was specially designed to accommodate the VCS-1 system.
  • 33. DISCUSSION The VCS-1 has the following characteristics: 1. High performance (24V@0.595A) 2. Low power consumption 3. Highly compatible with a wide range of commercially available sensors and actuators 4. Highly optimised for computer vision applications 5. Fully reconfigurable
  • 34. WHAT NEXT? • Become part of our customer family by ordering our products or hiring our services. • Sundance University Program (SUP) • Access to hardware prototypes • Advisory Board Members • BSc/MSc/PhD Internships • Funding capture • H2020 • InnovateUK • EPSRC Learn more about SUP and ongoing projects: https://www.sundance.com/sundance-in-eu-projects-programs/
  • 37. QUESTIONS? EM3V - Embedded Vision Sundance Multiprocessor Technology, Ltd. Pedro Machado pedro.m@sundance.com Fatima Kishwar fatima.k@sundance.com Flemming Christensen flemming.c@sundance.com