The LEGaTO project has received funding from the European Union's
Horizon 2020 research and innovation programme under the grant
agreement No 780681
LEGaTO: First Steps Towards
Energy-Efficient Toolset for
Heterogeneous Computing
SAMOS XVIII
Tobias Becker (Maxeler)
18/July/2018
The industry challenge
• Dennard scaling is dead
• Moore's law is slowing down
• Trend towards heterogeneous architectures
• Increasing focus on energy: ICT sector is responsible for
5% of global energy consumption
• Creates a programmability challenge
LEGaTO Ambition
• Create software stack-support for energy-efficient
heterogeneous computing
o Starting with Made-in-Europe mature software stack, and
optimizing this stack to support energy-efficiency
o Computing on a commercial cutting-edge European-developed
heterogeneous hardware substrate with CPU + GPU + FPGA +
FPGA-based Dataflow Engines (DFE)
• Main goal: energy efficiency
LEGaTO Objectives
Partners
• BSC (Barcelona Supercomputing Center)
• CHALMERS (Chalmers University of Technology)
• UNINE (Neuchatel University)
• TUD (Technical University of Dresden)
• CHR (Christmann)
• UNIBI (University of Bielefeld)
• TECHNION (Technion, Israel Institute of Technology)
• MAXELER (Maxeler Technologies Limited)
• DIS (Data Intelligence Sweden)
• HZI (Helmholtz Centre for Infection Research)
• Duration: Dec 2017 – Nov 2020
Overview
Hardware Platforms
• Christmann (RECS Box)
o Heterogeneous platform: supports CPU (x86 or Arm), GPU
(Nvidia) and FPGA (Xilinx and Altera)
• MAXELER
o Maxeler DFE platforms and toolset
RECS|Box overview: Concept
• Heterogeneous microserver approach, integrating CPU, GPU and FPGA technology
• High-speed, low-latency communication infrastructure, enabling accelerators, scalable across multiple servers
• Modular, blade-style design, hot pluggable/swappable
RECS|Box overview: Architecture
• Dedicated network for monitoring, control and iKVM
• Multiple 1Gb/10Gb Ethernet links per microserver, with an integrated 40 Gb Ethernet backbone
• Dedicated high-speed, low-latency communication infrastructure
• Host-2-Host PCIe between CPUs
• Switched high-speed serial lanes between FPGAs
• Enables pooling of accelerators and assignment to different hosts
[Figure: RECS|Box architecture. Microserver modules (CPU, Mem, I/O) sit on baseboards, with 3/16 microservers per baseboard and up to 15 baseboards per server. The backplane carries distributed monitoring and KVM, the Ethernet communication infrastructure, high-speed low-latency communication, and storage / I/O extension.]
RECS|Box overview: Microserver
Microservers come in low-power and high-performance classes and in CPU, GPGPU and FPGA variants.
• CPU microservers: Intel x86, Arm server SoCs
o ARMv8 Server SoC: 32 Core A72@2.1 GHz
• GPGPU microservers
o NVIDIA Jetson TX1: 4 Core A57@1.73 GHz + Maxwell GPGPU@1.5 GHz
o NVIDIA Jetson TX2: 2 Core Denver + 4 Core A57 + Pascal GPGPU@1.5 GHz
o NVIDIA Tesla P100/V100: Pascal/Volta GPGPU@1.3 GHz
• FPGA microservers: Xilinx FPGAs, Intel FPGAs
o Intel Stratix 10 SoC: 2,800 kLE
o Xilinx Zynq 7020: 85 kLC
• PCIe extensions: PCIe SSD, Xeon Phi, RAPTOR FPGA acc.
Maxeler Dataflow Computing Systems
MPC-X3000
• 8 Dataflow Engines in 1U
• Up to 1TB of DFE RAM
• Zero-copy RDMA
• Dynamic allocation of DFEs to
conventional CPU servers through
Infiniband
• Equivalent performance to
20-50 x86 servers
Dataflow System at Jülich
Supercomputing Center
• Pilot system deployed in October 2017 as part of PRACE PCP
• System configuration
o 1U Maxeler MPC-X with 8 MAX5 DFEs
o 1U AMD EPYC server
• 2x AMD 7601 EPYC CPUs
• 1 TB RAM
• 9.6 TB SSD (data storage)
o one 1U login head node
• Runs 4 scientific workloads
[Figure: pilot system diagram. Labels: MPC-X node with MAX5 DFEs; Supermicro EPYC node with EPYC CPUs and 1 TB DDR4; head/build node serving remote users; IPMI; links of 56 Gbps, 2x InfiniBand @ 56 Gbps and 10 Gbps.]
http://www.prace-ri.eu/pcp/
Scaling Dataflow Computing into the
Cloud
• MAX5 DFEs are compatible with Amazon EC2 F1
instances
• MaxCompiler supports programming F1
• Hybrid Cloud Model:
o On-premise Dataflow system from Maxeler
o Elastically expand to the Amazon Cloud when
needed
[Figure: on-premise dataflow system alongside cloud resources.]
Use Cases
• Healthcare: Infection biomarkers
o Statistical search for biomarkers, which often
requires intensive computation. A biomarker is
a measurable value that indicates the state of
an organism, often the presence, absence or
severity of a specific disease
• Smart Home: Assisted Living
o The ability of the home to learn from the
user's behavior and anticipate future behavior
is still an open problem; solving it is necessary
for broad acceptance of assisted living by the
general public
• Smart City: operational urban
pollutant dispersion modelling
o Modeling city landscape + sensor data +
wind prediction to issue a “pollutant
weather prediction”
• Machine Learning: Automated driving
and graphics rendering
o Object detection using CNNs for automated
driving systems, and CNN- and LSTM-based
methods for realistic rendering of graphics
for gaming and multi-camera systems
• Secure IoT Gateway
o Variety of sensors and actuators in industrial
and private environments
Programming Models and Runtimes:
Task based programming + Dataflow!
• OmpSs / OmpSs@FPGA (BSC)
• MaxCompiler (Maxeler)
• DFiant HLS (TECHNION)
• XiTAO (CHALMERS)
Towards a single source for any
target
• A need if we expect programmers to survive
o Architectures appear like mushrooms
• Productivity
o Develop once → run everywhere
• Performance
• Key concept behind OmpSs
o Sequential task based program on single address/name space
o + directionality annotations
o Executed in parallel: Automatic runtime computation of dependences among
tasks
o LEGaTO: Extend tasks with resource requirements, propagate through the
stack to find the most energy efficient solution at run time
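The automatic dependence computation described above can be illustrated with a minimal Python sketch. This is not OmpSs code and does not use the OmpSs runtime: the `Task` class, the `build_dependences` function and the task names are hypothetical, chosen only to show how directionality annotations on sequentially declared tasks are enough to derive which tasks may run in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A task with directionality annotations (hypothetical model, not the OmpSs API)."""
    name: str
    reads: set = field(default_factory=set)    # data the task takes as input
    writes: set = field(default_factory=set)   # data the task produces


def build_dependences(tasks):
    """Return {task name: set of earlier tasks it must wait for}.

    Tasks are given in sequential program order. A task depends on an
    earlier one when it touches data the earlier task wrote
    (read-after-write, write-after-write) or overwrites data the
    earlier task read (write-after-read).
    """
    deps = {t.name: set() for t in tasks}
    for i, later in enumerate(tasks):
        for earlier in tasks[:i]:
            raw = later.reads & earlier.writes
            waw = later.writes & earlier.writes
            war = later.writes & earlier.reads
            if raw or waw or war:
                deps[later.name].add(earlier.name)
    return deps


# A four-task program written in sequential order (illustrative names).
program = [
    Task("load_a", writes={"A"}),
    Task("load_b", writes={"B"}),
    Task("multiply", reads={"A", "B"}, writes={"C"}),
    Task("store", reads={"C"}),
]
deps = build_dependences(program)
for name, waits_for in deps.items():
    print(name, "waits for", sorted(waits_for))
```

In this sketch `load_a` and `load_b` have no dependences and could execute concurrently, while `multiply` waits for both and `store` waits for `multiply`. A real runtime would also track memory regions and schedule tasks onto devices, which is left out here.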
Example for OmpSs@FPGA (Matrix multiply
with OmpSs and Vivado HLS annotations)
OmpSs@FPGA Experiments (Matrix multiply)
AXIOM board: Xilinx Zynq UltraScale+ chip with 4 ARM Cortex-A53 cores and the ZU9EG FPGA.
Parameter | Values | Description
IPs configuration | 1*256, 3*128 | Number of instances * size
Frequency (MHz) | 200, 250, 300 | Working frequency of the FPGA
Number of SMP cores | SMP: 1 to 4; FPGA: 3+1 helper, 2+2 helpers | Combination of SMP and helper threads
Number of FPGA helper threads | SMP: 0; FPGA: 1, 2 | Helper threads are used to manage tasks on the FPGA
Number of pending tasks | 4, 8, 16 and 32 | Number of tasks sent to the IP cores before waiting for their finalization
DFiant HLS
• Aims to bridge the programmability gap by combining constructs and
semantics from software, hardware and dataflow languages
• Programming model accommodates a middle-ground between low-
level HDL and high-level sequential programming
High-Level Synthesis Languages and Tools (e.g., C and Vivado HLS)
• Automatic pipelining
• Not an HDL
• Problem with state
DFiant: A Dataflow HDL
• Separating timing from functionality
• Concurrency
• Fine-grain control
• Automatic pipelining
Register-Transfer Level HDLs (e.g., VHDL)
• Concurrency
• Fine-grain control
• Bound to clock
• Explicit pipelining
MaxCompiler: Application Development
CPU Code:
    for (int i = 0; i < DATA_SIZE; i++)
        y[i] = x[i] * x[i] + 30;
Host Code (.c):
    int *x, *y;
    MyKernel(DATA_SIZE,
             x, DATA_SIZE*4,
             y, DATA_SIZE*4);
MyManager (.maxj):
    Manager m = new Manager();
    Kernel k = new MyKernel();
    m.setKernel(k);
    m.setIO(
        link("x", CPU),
        link("y", CPU));
    m.build();
MyKernel (.maxj):
    DFEVar x = io.input("x", dfeInt(32));
    DFEVar result = x * x + 30;
    io.output("y", result, dfeInt(32));
[Figure labels: CPU, Main Memory, SLiC, MaxelerOS, PCI Express, Manager, DFE Memory. The kernel's dataflow graph multiplies x by x, adds 30, and outputs y.]
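The kernel on this slide computes y = x * x + 30 for every element of an input stream. As a rough software model only (plain Python, not MaxJ and not the MaxCompiler API), the streaming behaviour can be sketched with a generator. The function names `my_kernel` and `cpu_reference` are assumptions made for illustration, and the 32-bit integer wraparound implied by `dfeInt(32)` is not modelled.

```python
def my_kernel(x_stream):
    """Software model of the slide's kernel: one output per input element."""
    for x in x_stream:
        yield x * x + 30


def cpu_reference(xs):
    """The equivalent CPU loop from the slide, written over a list."""
    return [x * x + 30 for x in xs]


data = [0, 1, 2, 3]
result = list(my_kernel(iter(data)))
print(result)  # [30, 31, 34, 39]
```

The generator consumes and produces one value at a time, which mirrors the stream-in, stream-out shape of the kernel; on a DFE the same arithmetic is laid out as a hardware pipeline instead of being executed instruction by instruction.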
Task-based kernel identification/DFE mapping
• OmpSs identifies "static" task graphs while running, and a DFE kilo-task instance is selected
• Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines
• Dataflow Engines (DFEs) produce one pair of results at each clock cycle
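Why a very deep pipeline can still deliver a result every clock cycle follows from simple arithmetic, sketched below in Python. The model assumes a fully pipelined datapath that accepts one new item per cycle; the depth of 1000 stages and the item counts are illustrative values, not measurements from a DFE.

```python
def pipeline_cycles(num_items, depth):
    """Cycles needed by a fully pipelined datapath accepting one item per cycle.

    The first result appears after `depth` cycles; every further item
    adds exactly one cycle.
    """
    if num_items <= 0:
        return 0
    return depth + num_items - 1


def throughput(num_items, depth):
    """Average results per cycle over the whole run."""
    cycles = pipeline_cycles(num_items, depth)
    return num_items / cycles if cycles else 0.0


depth = 1000  # illustrative depth, same order of magnitude as ">1,000 stages"
for n in (1_000, 1_000_000):
    print(n, pipeline_cycles(n, depth), round(throughput(n, depth), 3))
```

Under these assumptions the fill latency is paid once, so for long streams the average throughput approaches one result per cycle regardless of pipeline depth.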
XiTAO: Generalized Tasks
Reduces Overheads
Reduces Parallel Slackness
Overcommits memory resources
• Improve parallel slackness by reducing dimensionality
• Reduce cache/BW pressure by allocating multiple resources
OmpSs + XiTAO
• Extend XiTAO with support for heterogeneous resources
Why extremely low voltage for FPGAs?
Why energy efficiency through undervolting?
• Still a long way to go to reach <20 MW at exascale
• FPGAs are at least 20X less power- and energy-efficient than ASICs
• Undervolting directly reduces dynamic and static power and energy
• Note: undervolting is not DVFS
o Frequency is not lowered!
FPGA Undervolting Preliminary Experiments
Voltage scaling below the standard nominal level in on-chip memories of Xilinx technology:
o Power decreases quadratically, delivering savings of more than 10X.
o A conservative voltage guardband exists below nominal.
o The fault rate increases exponentially below the guardband.
o Faults can be detected and managed.
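The quadratic relation between supply voltage and power can be made concrete with a short Python sketch. It assumes only that power is proportional to the square of the voltage at fixed frequency; the nominal voltage of 1.0 V and the sample voltages are hypothetical values for illustration, not figures from the LEGaTO experiments, and the function names are invented for this example.

```python
import math


def relative_power(v, v_nominal):
    """Power relative to nominal, assuming P is proportional to V**2 at fixed frequency."""
    return (v / v_nominal) ** 2


def voltage_for_saving(factor, v_nominal):
    """Voltage at which power drops by `factor` under the same quadratic assumption."""
    return v_nominal / math.sqrt(factor)


V_NOM = 1.0  # hypothetical nominal voltage in volts, for illustration only
for v in (1.0, 0.8, 0.6):
    print(f"{v:.2f} V -> {relative_power(v, V_NOM):.2f}x nominal power")
print(f"10x saving needs about {voltage_for_saving(10, V_NOM):.3f} V")
```

Because frequency is held constant, any power reduction in this model translates directly into an energy reduction for the same workload. Measured savings on real devices depend on effects this simple model does not capture.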
LEGaTO System Stack
Questions?
https://legato-project.eu/
legato-project@bsc.es

Más contenido relacionado

La actualidad más candente

Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418inside-BigData.com
 
Data Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityData Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityAPNIC
 
20170523 T-Systems Iberia Object Storage Cloud Services
20170523 T-Systems Iberia Object Storage Cloud Services20170523 T-Systems Iberia Object Storage Cloud Services
20170523 T-Systems Iberia Object Storage Cloud ServicesJavier Gallego Aguagil
 
Processing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialProcessing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialGreg Landrum
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkDataWorks Summit
 
GPU Computing with Python and Anaconda: The Next Frontier
GPU Computing with Python and Anaconda: The Next FrontierGPU Computing with Python and Anaconda: The Next Frontier
GPU Computing with Python and Anaconda: The Next FrontierNVIDIA
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC
 
Review of Cloud Computing Simulation Platforms and Related Environments
Review of Cloud Computing Simulation Platforms and Related EnvironmentsReview of Cloud Computing Simulation Platforms and Related Environments
Review of Cloud Computing Simulation Platforms and Related EnvironmentsRECAP Project
 
Big & Open Data: Challenges for Smartcity
Big & Open Data:  Challenges for SmartcityBig & Open Data:  Challenges for Smartcity
Big & Open Data: Challenges for SmartcityVictoria López
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systemsinside-BigData.com
 
Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017Helix Nebula The Science Cloud
 
OpenACC Monthly Highlights: July 2021
OpenACC Monthly Highlights: July  2021OpenACC Monthly Highlights: July  2021
OpenACC Monthly Highlights: July 2021OpenACC
 
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based MultiprocessingArm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based MultiprocessingArm
 
Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)Linaro
 
Project Trillium: Arm Machine Learning Platform
Project Trillium: Arm Machine Learning PlatformProject Trillium: Arm Machine Learning Platform
Project Trillium: Arm Machine Learning PlatformArm
 
Science Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and AstrophysicsScience Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and AstrophysicsEOSCpilot .eu
 

La actualidad más candente (20)

Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418Panda scalable hpc_bestpractices_tue100418
Panda scalable hpc_bestpractices_tue100418
 
Data Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and FlexibilityData Plane Evolution: Towards Openness and Flexibility
Data Plane Evolution: Towards Openness and Flexibility
 
20170523 T-Systems Iberia Object Storage Cloud Services
20170523 T-Systems Iberia Object Storage Cloud Services20170523 T-Systems Iberia Object Storage Cloud Services
20170523 T-Systems Iberia Object Storage Cloud Services
 
Processing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorialProcessing malaria HTS results using KNIME: a tutorial
Processing malaria HTS results using KNIME: a tutorial
 
Postcard: NECOS
Postcard: NECOSPostcard: NECOS
Postcard: NECOS
 
CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 
GPU Computing with Python and Anaconda: The Next Frontier
GPU Computing with Python and Anaconda: The Next FrontierGPU Computing with Python and Anaconda: The Next Frontier
GPU Computing with Python and Anaconda: The Next Frontier
 
Vlsi titles 2017 18
Vlsi titles 2017 18Vlsi titles 2017 18
Vlsi titles 2017 18
 
OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021OpenACC Monthly Highlights: January 2021
OpenACC Monthly Highlights: January 2021
 
Review of Cloud Computing Simulation Platforms and Related Environments
Review of Cloud Computing Simulation Platforms and Related EnvironmentsReview of Cloud Computing Simulation Platforms and Related Environments
Review of Cloud Computing Simulation Platforms and Related Environments
 
Big & Open Data: Challenges for Smartcity
Big & Open Data:  Challenges for SmartcityBig & Open Data:  Challenges for Smartcity
Big & Open Data: Challenges for Smartcity
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
 
Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017Network experiences with Public Cloud Services @ TNC2017
Network experiences with Public Cloud Services @ TNC2017
 
OpenACC Monthly Highlights: July 2021
OpenACC Monthly Highlights: July  2021OpenACC Monthly Highlights: July  2021
OpenACC Monthly Highlights: July 2021
 
HNSciCloud Overview
HNSciCloud OverviewHNSciCloud Overview
HNSciCloud Overview
 
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based MultiprocessingArm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
Arm DynamIQ: Intelligent Solutions Using Cluster Based Multiprocessing
 
Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)Linaro: High Performance Computing (HPC)
Linaro: High Performance Computing (HPC)
 
Project Trillium: Arm Machine Learning Platform
Project Trillium: Arm Machine Learning PlatformProject Trillium: Arm Machine Learning Platform
Project Trillium: Arm Machine Learning Platform
 
Helix Nebula Phase 1
Helix Nebula Phase 1Helix Nebula Phase 1
Helix Nebula Phase 1
 
Science Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and AstrophysicsScience Demonstrator Session: Physics and Astrophysics
Science Demonstrator Session: Physics and Astrophysics
 

Similar a SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceLEGATO project
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCFacultad de Informática UCM
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGATO project
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT Project
 
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationSS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationVEDLIoT Project
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...Edge AI and Vision Alliance
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableRebekah Rodriguez
 
Leveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformationLeveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformationJohn Archer
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3mustafa sarac
 
OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017Radisys Corporation
 
Xilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsXilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsGanesan Narayanasamy
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...LEGATO project
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceAlison B. Lowndes
 
Xilinx Data Center Strategy and CCIX
Xilinx Data Center Strategy and CCIXXilinx Data Center Strategy and CCIX
Xilinx Data Center Strategy and CCIXYoshihiro Horie
 

Similar a SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing (20)

DATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe ConferenceDATE 2020: Design, Automation and Test in Europe Conference
DATE 2020: Design, Automation and Test in Europe Conference
 
Automatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPCAutomatic generation of hardware memory architectures for HPC
Automatic generation of hardware memory architectures for HPC
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
Santhosh Resume
Santhosh ResumeSanthosh Resume
Santhosh Resume
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
 
LEGaTO: Use cases
LEGaTO: Use casesLEGaTO: Use cases
LEGaTO: Use cases
 
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentationSS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
SS-CPSIoT 2023_Kevin Mika and Piotr Zierhoffer presentation
 
NWU and HPC
NWU and HPCNWU and HPC
NWU and HPC
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Exascale Capabl
Exascale CapablExascale Capabl
Exascale Capabl
 
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
“Making Edge AI Inference Programming Easier and Flexible,” a Presentation fr...
 
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super AffordableSupermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
Supermicro AI Pod that’s Super Simple, Super Scalable, and Super Affordable
 
Leveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformationLeveraging IoT as part of your digital transformation
Leveraging IoT as part of your digital transformation
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017OCP Telco Engineering Workshop at BCE2017
OCP Telco Engineering Workshop at BCE2017
 
Xilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systemsXilinx Edge Compute using Power 9 /OpenPOWER systems
Xilinx Edge Compute using Power 9 /OpenPOWER systems
 
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
HiPEAC 2020: Energy-aware Task Scheduling in LEGaTO: Low Energy Toolset for H...
 
Harnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligenceHarnessing the virtual realm for successful real world artificial intelligence
Harnessing the virtual realm for successful real world artificial intelligence
 
Xilinx Data Center Strategy and CCIX
Xilinx Data Center Strategy and CCIXXilinx Data Center Strategy and CCIX
Xilinx Data Center Strategy and CCIX
 

Más de LEGATO project

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitLEGATO project
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemLEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsLEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworkLEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edgeLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGATO project
 
LEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGaTO: Software Stack Programming Models
LEGaTO: Software Stack Programming ModelsLEGATO project
 
LEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGaTO: Software Stack Runtimes
LEGaTO: Software Stack RuntimesLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGATO project
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneLEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingLEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edgeLEGATO project
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyLEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingLEGATO project
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXLEGATO project
 

Más de LEGATO project (20)

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for Profit
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating system
 

SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

  • 1. LEGaTO: First Steps Towards Energy-Efficient Toolset for Heterogeneous Computing. SAMOS XVIII, Tobias Becker (Maxeler), 18 July 2018. The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 780681.
  • 2. The industry challenge
      • Dennard scaling is dead
      • Moore's law is slowing down
      • Trend towards heterogeneous architectures
      • Increasing focus on energy: the ICT sector is responsible for 5% of global energy consumption
      • Creates a programmability challenge
  • 3. LEGaTO Ambition
      • Create software stack support for energy-efficient heterogeneous computing
          o Starting with a mature Made-in-Europe software stack, and optimizing this stack to support energy efficiency
          o Computing on a commercial, cutting-edge, European-developed heterogeneous hardware substrate with CPU + GPU + FPGA + FPGA-based Dataflow Engines (DFE)
      • Main goal: energy efficiency
  • 5. Partners
      • BSC (Barcelona Supercomputing Center)
      • CHALMERS (Chalmers University of Technology)
      • UNINE (Neuchatel University)
      • TUD (Technical University of Dresden)
      • CHR (Christmann)
      • UNIBI (University of Bielefeld)
      • TECHNION (Technion, Israel Institute of Technology)
      • MAXELER (Maxeler Technologies Limited)
      • DIS (Data Intelligence Sweden)
      • HZI (Helmholtz Centre for Infection Research)
      • Duration: Dec 2017 – Nov 2020
  • 7. Hardware Platforms
      • Christmann (RECS Box)
          o Heterogeneous platform: supports CPU (x86 or Arm), GPU (Nvidia) and FPGA (Xilinx and Altera)
      • MAXELER
          o Maxeler DFE platforms and toolset
  • 8. RECS|Box overview: Concept
      • Heterogeneous microserver approach, integrating CPU, GPU and FPGA technology
      • High-speed, low-latency communication infrastructure, enabling accelerators, scalable across multiple servers
      • Modular, blade-style design, hot pluggable/swappable
  • 9. RECS|Box overview: Architecture
      • Dedicated network for monitoring, control and iKVM
      • Multiple 1Gb/10Gb Ethernet links per microserver, integrated 40 Gb Ethernet backbone
      • Dedicated high-speed, low-latency communication infrastructure
      • Host-2-Host PCIe between CPUs
      • Switched high-speed serial lanes between FPGAs
      • Enables pooling of accelerators and assignment to different hosts
      [Architecture diagram: microserver modules (CPU, Mem, I/O) on baseboards, 3/16 microservers per baseboard, up to 15 baseboards per server; backplane carrying distributed monitoring and KVM, the Ethernet communication infrastructure, and high-speed low-latency communication; storage / I/O extension]
  • 10. RECS|Box overview: Microserver
      • Microserver classes: low-power and high-performance; CPU, FPGA and GPGPU microservers
      • CPU and FPGA families: Intel x86, Arm server SoCs, Xilinx FPGAs, Intel FPGAs
      • NVIDIA Jetson TX1/TX2: 4-core A57 @ 1.73 GHz + Maxwell GPGPU @ 1.5 GHz (TX1); 2-core Denver + 4-core A57 + Pascal GPGPU @ 1.5 GHz (TX2)
      • ARMv8 server SoC: 32-core A72 @ 2.1 GHz
      • NVIDIA Tesla P100/V100: Pascal/Volta GPGPU @ 1.3 GHz
      • Intel Stratix 10 SoC: 2,800 kLE
      • Xilinx Zynq 7020: 85 kLC
      • PCIe extensions: PCIe SSD, Xeon Phi, RAPTOR FPGA accelerator
  • 11. Maxeler Dataflow Computing Systems: MPC-X3000
      • 8 Dataflow Engines in 1U
      • Up to 1 TB of DFE RAM
      • Zero-copy RDMA
      • Dynamic allocation of DFEs to conventional CPU servers through Infiniband
      • Equivalent performance to 20-50 x86 servers
  • 12. Dataflow System at Jülich Supercomputing Center
      • Pilot system deployed in October 2017 as part of PRACE PCP (http://www.prace-ri.eu/pcp/)
      • System configuration
          o 1U Maxeler MPC-X with 8 MAX5 DFEs
          o 1U AMD EPYC server: 2x AMD 7601 EPYC CPUs, 1 TB RAM, 9.6 TB SSD (data storage)
          o One 1U login head node
      • Runs 4 scientific workloads
      [System diagram: remote users, head/build node, Supermicro EPYC node (EPYC CPU, 1 TB DDR4) and MPC-X node (MAX5 DFEs); links labelled ipmi, 10 Gbps, 56 Gbps and 2x Infiniband @ 56 Gbps]
  • 13. Scaling Dataflow Computing into the Cloud
      • MAX5 DFEs are compatible with Amazon EC2 F1 instances
      • MaxCompiler supports programming F1
      • Hybrid cloud model:
          o On-premise dataflow system from Maxeler
          o Elastically expand to the Amazon cloud when needed
      [Diagram: on-premise system and cloud]
  • 14. Use Cases
      • Healthcare: Infection biomarkers
          o Statistical search for biomarkers, which often needs intensive computation. A biomarker is a measurable value that can indicate the state of an organism, often the presence, absence or severity of a specific disease
      • Smart Home: Assisted Living
          o Enabling the home to learn from the users' behavior and anticipate future behavior is still an open task, and is necessary for broad acceptance of assisted living among the general public
  • 15. Use Cases
      • Smart City: operational urban pollutant dispersion modelling
          o Modeling city landscape + sensor data + wind prediction to issue a "pollutant weather prediction"
      • Machine Learning: Automated driving and graphics rendering
          o Object detection using CNNs for automated driving systems, and CNN- and LSTM-based methods for realistic rendering of graphics for gaming and multi-camera systems
      • Secure IoT Gateway
          o Variety of sensors and actuators in industrial and private settings
  • 16. Programming Models and Runtimes: Task-based programming + Dataflow!
      • OmpSs / OmpSs@FPGA (BSC)
      • MaxCompiler (Maxeler)
      • DFiant HLS (TECHNION)
      • XiTAO (CHALMERS)
  • 17. Towards a single source for any target
      • A need if we expect programmers to survive
          o Architectures appear like mushrooms
      • Productivity
          o Develop once → run everywhere
      • Performance
      • Key concept behind OmpSs
          o Sequential task-based program on a single address/name space
          o + directionality annotations
          o Executed in parallel: automatic runtime computation of dependences among tasks
          o LEGaTO: extend tasks with resource requirements, propagate them through the stack to find the most energy-efficient solution at run time
  • 18. Example for OmpSs@FPGA (matrix multiply with OmpSs and Vivado HLS annotations)
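The code on this slide is not part of the transcript, so the following is only a minimal sketch of what a task-annotated blocked matrix multiply might look like, not the example shown in the talk. The pragma spelling follows common OmpSs conventions as an assumption, `USE_OMPSS` is a hypothetical macro introduced here so that the file also builds as plain sequential C, and the block size and function names are illustrative. Vivado HLS annotations are left out.

```c
#include <assert.h>
#include <stddef.h>

#define BS 4 /* block edge length; an arbitrary choice for this sketch */

/* One task: c += a * b on BS x BS blocks stored contiguously (row-major).
 * The annotations are only compiled when USE_OMPSS (hypothetical macro) is
 * defined; their exact syntax is an assumption, not taken from the slide. */
#ifdef USE_OMPSS
#pragma omp target device(fpga)
#pragma omp task in([BS*BS]a, [BS*BS]b) inout([BS*BS]c)
#endif
void matmul_block(const float *a, const float *b, float *c)
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++) {
            float sum = c[i * BS + j];
            for (int k = 0; k < BS; k++)
                sum += a[i * BS + k] * b[k * BS + j];
            c[i * BS + j] = sum;
        }
}

/* Matrices are stored as nb x nb grids of contiguous BS x BS blocks.
 * The loop nest reads like sequential code; with task annotations enabled,
 * a runtime could derive the ordering between calls from in/inout. */
void matmul(int nb, const float *A, const float *B, float *C)
{
    assert(nb > 0);
    const size_t blk = (size_t)BS * BS;
    for (int i = 0; i < nb; i++)
        for (int j = 0; j < nb; j++)
            for (int k = 0; k < nb; k++)
                matmul_block(&A[(i * nb + k) * blk],
                             &B[(k * nb + j) * blk],
                             &C[(i * nb + j) * blk]);
#ifdef USE_OMPSS
#pragma omp taskwait
#endif
}
```

The `inout` on the C block is what would serialize the updates to one output block along `k`, while calls that touch different output blocks carry no dependence on each other and could run concurrently.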
  • 19. OmpSs@FPGA Experiments (matrix multiply)
      AXIOM board: Xilinx Zynq Ultrascale+ chip with 4 ARM Cortex-A53 cores and the ZU9EG FPGA.
      • IPs configuration: 1*256, 3*128 (number of instances * size)
      • Frequency (MHz): 200, 250, 300 (working frequency of the FPGA)
      • Number of SMP cores: SMP: 1 to 4; FPGA: 3+1 helper, 2+2 helpers (combination of SMP and helper threads)
      • Number of FPGA helper threads: SMP: 0; FPGA: 1, 2 (helper threads are used to manage tasks on the FPGA)
      • Number of pending tasks: 4, 8, 16 and 32 (number of tasks sent to the IP cores before waiting for their finalization)
  • 20. DFiant HLS
      • Aims to bridge the programmability gap by combining constructs and semantics from software, hardware and dataflow languages
      • Programming model accommodates a middle ground between low-level HDL and high-level sequential programming
      • High-Level Synthesis languages and tools (e.g., C and Vivado HLS): automatic pipelining; not an HDL; problem with state
      • DFiant, a dataflow HDL: separating timing from functionality; concurrency; fine-grain control; automatic pipelining
      • Register-Transfer Level HDLs (e.g., VHDL): concurrency; fine-grain control; bound to clock; explicit pipelining
  • 21. MaxCompiler: Application Development
      CPU code (.c):
          int *x, *y;
          for (int i = 0; i < DATA_SIZE; i++)
              y[i] = x[i] * x[i] + 30;
      Host code (.c), calling the DFE through SLiC and MaxelerOS:
          MyKernel(DATA_SIZE, x, DATA_SIZE*4, y, DATA_SIZE*4);
      MyManager (.maxj):
          Manager m = new Manager();
          Kernel k = new MyKernel();
          m.setKernel(k);
          m.setIO(link("x", CPU), link("y", CPU));
          m.build();
      MyKernel (.maxj):
          DFEVar x = io.input("x", dfeInt(32));
          DFEVar result = x * x + 30;
          io.output("y", result, dfeInt(32));
      [Diagram labels: CPU, Main Memory, SLiC, MaxelerOS, PCI Express, DFE, Manager, DFE Memory; kernel dataflow graph computing x * x + 30 from input x to output y]
  • 22. Task-based kernel identification/DFE mapping
      • OmpSs identifies "static" task graphs while running, and a DFE kilo-task instance is selected
      • Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines
      • DataFlow Engines (DFEs) produce one pair of results at each clock cycle
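To make the throughput claim concrete, here is a small sketch of the standard cycle count for a fully pipelined datapath. It assumes one new item enters per clock cycle and that the pipeline never stalls; the function names and the numbers in the usage note are illustrative and not taken from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles needed to stream n items through a fully pipelined datapath of the
 * given depth, assuming one item enters per cycle and there are no stalls:
 * the first result appears after `depth` cycles, then one more per cycle. */
uint64_t pipeline_cycles(uint64_t depth, uint64_t n)
{
    assert(depth >= 1);
    return n == 0 ? 0 : depth + n - 1;
}

/* Average cycles per result; approaches 1 once n is much larger than depth. */
double cycles_per_result(uint64_t depth, uint64_t n)
{
    assert(n >= 1);
    return (double)pipeline_cycles(depth, n) / (double)n;
}
```

Under these assumptions, a 1,000-stage pipeline fed with 1,000,000 items takes 1,000,999 cycles, so the fill latency is amortized to about 0.1% of the run. This is why very deep pipelines pay off only on long data streams.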
  • 23. XiTAO: Generalized Tasks
      • Diagram labels: reduces overheads; reduces parallel slackness; overcommits memory resources
      • Improve parallel slackness by reducing dimensionality
      • Reduce cache/BW pressure by allocating multiple resources
  • 24. OmpSs + XiTAO
      • Extend XiTAO with support for heterogeneous resources
  • 25. Why extremely low voltage for FPGAs? Why energy efficiency through undervolting?
      • Still a way to go for <20 MW at exascale
      • FPGAs are at least 20X less power- and energy-efficient than ASICs
      • Undervolting directly delivers dynamic and static power/energy reduction
      • Note: undervolting is not DVFS; the frequency is not lowered!
  • 26. FPGA Undervolting: Preliminary Experiments
      Voltage scaling below the standard nominal level, in on-chip memories of Xilinx technology:
          o Power decreases quadratically, delivering savings of more than 10X
          o A conservative voltage guardband exists below nominal
          o Faults increase exponentially below the guardband
          o Faults can be detected / managed
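The quadratic dependence can be illustrated with the textbook dynamic-power model P_dyn = alpha * C * V^2 * f. The sketch below assumes that activity factor, capacitance and frequency stay fixed, consistent with undervolting leaving the frequency untouched. It covers only the dynamic term, so it is not meant to reproduce the savings reported in the talk, which also include static power. The function name is illustrative.

```c
#include <assert.h>

/* Ratio of dynamic power at supply voltage v to dynamic power at v_nominal,
 * from P_dyn = alpha * C * V^2 * f with alpha, C and f held constant.
 * Static (leakage) power is not modelled here. */
double dynamic_power_ratio(double v, double v_nominal)
{
    assert(v > 0.0 && v_nominal > 0.0);
    double scale = v / v_nominal;
    return scale * scale;
}
```

For instance, halving the supply voltage in this model gives a ratio of 0.25, that is, a quarter of the nominal dynamic power at the same clock frequency. The voltages here are illustrative, not measurements from the experiments.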