HiPEAC CSW Autumn 2020
The LEGaTO project has received funding from the European Union's Horizon 2020 research and
innovation programme under the grant agreement No 780681
16.10.2020
LEGaTO:
Software Stack
Programming Models
HiPEAC 2020
Computer Systems Week
16-10-2020
Pascal Felber
University of Neuchatel
Outline
• Programming models in LEGaTO’s big picture
• Common programming model for different targets
• Energy efficiency
• High-level dataflow hardware description language
• Kernel identification and dataflow engine mapping
• Fault tolerance and security
LEGaTO big picture
[Figure: LEGaTO stack, from use cases down to hardware]
• LEGaTO aspects: Security, Programmability, Energy-Efficiency, Fault-tolerance
• Use cases: Smart Home, Smart City, Secure IoT Gateway, Machine Learning, Healthcare
• Programming model: Sequential Task-Based OmpSs programs; OmpSs Eclipse IDE Plug-In
• Compiler & HLS: Mercurium Compilation, XiTAO Front-End, SCONE Compiler, MaxCompiler, AutoAit, DFiant HLS
− C and HLS Source Code; C Source Code, RTL
− Native compiler and Linker, FPGA Synthesis; CPU/GPU Binaries, Bitstream
• Runtime: Nanos Runtime, XiTAO Runtime, SCONE Runtime, HEATS, Fault-Tolerance Interface
• Middleware: OpenStack Middleware; Deployment, Monitoring, Control; Node Composition, Redfish API, Monitoring and Control REST API
• Hardware: Microserver Hardware Platform; CPU, GPU, FPGA/DFE
Main achievements
• Programming model, annotations
• Compiler support for OmpSs-2 with GPUs and FPGAs,
annotated task model, LLVM code generation
• IDE plugin for Eclipse
• Task groups, resource partitioning
• Energy efficiency in task scheduling (XiTAO, HEATS, DiAS)
• DFiant high-productivity HDL
• Mapping of OmpSs tasks onto MaxJ
• Fault-tolerance through compiler-based error detection,
co-scheduling, checkpointing, secure task execution in TEE
Towards a single source for any target
• New architectures continue to appear
− Common programming model
− Increase programmers’ productivity
− Develop once → run everywhere
• Performance and energy efficiency
• Key concept behind OmpSs
− Sequential task-based program on a single address/name space +
directionality annotations
− Executed in parallel: automatic runtime computation of
dependences among tasks
− LEGaTO: extend tasks with resource requirements, propagate
through the stack to find the most energy efficient solution at
run time
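The automatic dependence computation can be illustrated with a small sketch. It is not the algorithm used by the OmpSs runtime; the names `task_t`, `access_t`, `depends_on` and `build_edges` are hypothetical and introduced only for this example. The assumption is the usual rule that two tasks conflict when they touch the same address and at least one of them writes it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical directionality of one task argument. */
typedef enum { ACC_IN, ACC_OUT, ACC_INOUT } access_t;

#define MAX_ARGS 4

/* Hypothetical task record: the addresses it touches and how. */
typedef struct {
    const void *addr[MAX_ARGS];
    access_t    mode[MAX_ARGS];
    size_t      nargs;
} task_t;

static bool writes(access_t m) { return m != ACC_IN; }

/* True when `later` must wait for `earlier` (program order assumed):
 * they share an address and at least one of the two accesses writes it.
 * This covers read-after-write, write-after-read and write-after-write. */
bool depends_on(const task_t *later, const task_t *earlier)
{
    for (size_t i = 0; i < later->nargs; i++)
        for (size_t j = 0; j < earlier->nargs; j++)
            if (later->addr[i] == earlier->addr[j] &&
                (writes(later->mode[i]) || writes(earlier->mode[j])))
                return true;
    return false;
}

/* Fill the n*n matrix `edges` so that edges[i*n + j] is true when
 * task i depends on the earlier task j. Conservative: it also records
 * edges that are implied transitively. */
void build_edges(const task_t *tasks, size_t n, bool *edges)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            edges[i * n + j] = (j < i) && depends_on(&tasks[i], &tasks[j]);
}
```

With two tasks that produce `a` and `b` and a third that reads both, the third task gets an edge to each producer while the producers stay independent of each other, so they may run in parallel.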
Front-end tool box
[Figure: front-end tool box]
OmpSs with SMP, OpenCL and FPGA
SMP
#pragma omp target device(smp) copy_deps
#pragma omp task depend(in:a, b) depend(inout:c)
void matrix_multiply(float a[BS][BS], float b[BS][BS],
                     float c[BS][BS]);
GPGPU or OpenCL FPGA
#pragma omp target device(opencl) ndrange(2, NB, NB, 16, 16) \
        implements(matrix_multiply)
#pragma omp task depend(in:a, b) depend(inout:c)
__kernel void matrix_multiply_opencl(float a[BS][BS],
                                     float b[BS][BS],
                                     float c[BS][BS]);
FPGA
#pragma omp target device(fpga) implements(matrix_multiply) \
        num_instances(3)
#pragma omp task depend(in:a, b) depend(inout:c)
void matrix_multiply_fpga(float a[BS][BS], float b[BS][BS],
                          float c[BS][BS]);
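A possible call site for the SMP declaration is sketched below. The block size `BS` value, the block count `NBLOCKS`, the function body and the driver `blocked_matmul` are assumptions made for the example and do not come from the slides. A plain C compiler ignores the pragmas, so the sketch then runs sequentially, which matches the idea of a sequential program plus annotations.

```c
#include <stddef.h>

#define BS 4        /* block size (value assumed for this sketch) */
#define NBLOCKS 2   /* blocks per matrix dimension (hypothetical name) */

/* Task declaration as on the slide: under OmpSs, calls to this function
 * are expected to become tasks ordered by the in/inout annotations. */
#pragma omp target device(smp) copy_deps
#pragma omp task depend(in:a, b) depend(inout:c)
void matrix_multiply(float a[BS][BS], float b[BS][BS], float c[BS][BS]);

/* Assumed SMP implementation: c += a * b on one BS x BS block. */
void matrix_multiply(float a[BS][BS], float b[BS][BS], float c[BS][BS])
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            for (int k = 0; k < BS; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Blocked C += A * B: one call (one task) per (i, j, k) block triple.
 * Calls that update the same C block are chained through inout:c. */
void blocked_matmul(float A[NBLOCKS][NBLOCKS][BS][BS],
                    float B[NBLOCKS][NBLOCKS][BS][BS],
                    float C[NBLOCKS][NBLOCKS][BS][BS])
{
    for (int i = 0; i < NBLOCKS; i++)
        for (int j = 0; j < NBLOCKS; j++)
            for (int k = 0; k < NBLOCKS; k++)
                matrix_multiply(A[i][k], B[k][j], C[i][j]);
    /* Wait for all outstanding tasks before the results are read. */
    #pragma omp taskwait
}
```

The call site does not name a device. Choosing among the SMP, OpenCL and FPGA versions declared with `implements(matrix_multiply)` is left to the runtime.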
OmpSs with FPGA experiments
• IPs configuration: 1*256, 3*128
− Number of instances * size
• Frequency (MHz): 200, 250, 300
− Working frequency of the FPGA
• Number of SMP cores: SMP: 1 to 4; FPGA: 3+1 helper, 2+2 helpers
− Combination of SMP and helper threads
• Number of FPGA helper threads: SMP: 0; FPGA: 1, 2
− Helper threads are used to manage tasks on the FPGA
• Number of pending tasks: 4, 8, 16 and 32
− Number of tasks sent to the IP cores before waiting for their finalization
IDE plug-in
• OpenMP and OmpSs support in Eclipse
− Support for most of the programming models'
directives and clauses
− Including small help descriptions
− Context-based auto-completion
DFiant HDL
• Aims to bridge the programmability gap by combining constructs
and semantics from software, hardware and dataflow languages
• Programming model accommodates a middle-ground between
low-level HDL and high-level sequential programming
[Figure: comparison of three hardware design approaches]
• High-Level Synthesis languages and tools (e.g., C and Vivado HLS)
− Automatic pipelining
− Not an HDL
− Problem with state
• Register-Transfer Level HDLs (e.g., VHDL)
− Concurrency
− Fine-grain control
− Bound to clock
− Explicit pipelining
• DFiant: a dataflow HDL
− Separating timing from functionality
− Concurrency
− Fine-grain control
− Automatic pipelining
Task-based kernel identification for DFE mapping
• OmpSs identifies “static” task graphs while running
• Annotations of I/O and compute help to create the DFE task model
• Instantiate static, customized, ultra-deep (>1,000 stages) computing
pipelines
Static task groups
Cluster static subgraphs into macrotasks targeting FPGA
execution and/or elastic multicore scheduling (XiTAO)
[Figure: task graph with a static subgraph, and the final graph with the static macrotask]
#pragma oss task in(…) out(…)
#pragma oss taskgroup num_threads(auto)
XiTAO data parallel nodes
• Energy Efficient
− Data parallel nodes hide internal task parallelism and can be
scheduled with XiTAO’s energy efficient scheduler
• Programmable
− C++-based interface that requires minimal application code
changes
• Task/Data Parallel
− Easy and intuitive nesting of data parallel nodes in a coarser
TAO-DAG
• Granularity/Slackness Control
− User-level control on the granularity of internal parallelism
(control of the BLOCK_LENGTH for dynamically scheduled TAOs)
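The granularity control can be pictured as chunking an iteration range. The sketch below is plain C and is not XiTAO's interface (which is C++); `for_each_block`, `range_fn`, `scale_ctx` and `scale_range` are hypothetical names, and the chunks are executed one after another where a runtime would schedule them on workers.

```c
#include <stddef.h>

/* Hypothetical loop body applied to the half-open range [begin, end). */
typedef void (*range_fn)(size_t begin, size_t end, void *ctx);

/* Split [0, n) into chunks of at most block_length iterations and run
 * them in order; returns the number of chunks created. A smaller
 * block_length gives more, finer-grained units of work. */
size_t for_each_block(size_t n, size_t block_length, range_fn body, void *ctx)
{
    size_t chunks = 0;
    if (block_length == 0)
        return 0;   /* no valid chunking; nothing is executed */
    for (size_t begin = 0; begin < n; begin += block_length) {
        size_t end = (n - begin < block_length) ? n : begin + block_length;
        body(begin, end, ctx);
        chunks++;
    }
    return chunks;
}

/* Example body: y[i] = 2 * x[i] over a sub-range. */
typedef struct {
    const float *x;
    float       *y;
} scale_ctx;

void scale_range(size_t begin, size_t end, void *ctx)
{
    scale_ctx *s = ctx;
    for (size_t i = begin; i < end; i++)
        s->y[i] = 2.0f * s->x[i];
}
```

For ten elements, a block length of 4 yields three chunks (4, 4 and 2 iterations) and a block length of 10 yields a single chunk, which is the trade-off between scheduling slack and per-chunk overhead that the user-level control exposes.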
Energy efficiency for large jobs
HEATS: heterogeneity- and
energy-aware task scheduling
• Exploit the requirements of a
given task to identify the most
efficient configuration of nodes
• Monitoring tasks and nodes in
real time to perform the best
fitting placement and
migrations when necessary
• Prototype in Kubernetes
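A placement decision of this kind could be sketched as a filter followed by a weighted score. This is not the HEATS scoring function: the structures `node_t` and `task_req_t`, the normalised scores, the `energy_weight` parameter and the function `pick_node` are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical view of a node as seen by the scheduler. */
typedef struct {
    double free_cpus;     /* CPUs currently available */
    double free_mem_gb;   /* memory currently available, in GB */
    double perf_score;    /* assumed normalised to [0, 1], higher is faster */
    double energy_score;  /* assumed normalised to [0, 1], higher is more efficient */
} node_t;

/* Hypothetical resource requirements of a task. */
typedef struct {
    double cpus;
    double mem_gb;
} task_req_t;

/* Return the index of the fitting node with the best weighted score,
 * or -1 when no node can host the task. energy_weight in [0, 1] trades
 * performance (0) against energy efficiency (1). */
int pick_node(const node_t *nodes, size_t n, task_req_t req, double energy_weight)
{
    int best = -1;
    double best_score = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (nodes[i].free_cpus < req.cpus || nodes[i].free_mem_gb < req.mem_gb)
            continue;   /* node cannot satisfy the requirements */
        double score = energy_weight * nodes[i].energy_score +
                       (1.0 - energy_weight) * nodes[i].perf_score;
        if (best < 0 || score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

Re-running the same selection when monitoring data changes is one simple way to decide whether a migration is worthwhile.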
Energy efficiency for large jobs
• Big data production systems usually implement priority
scheduling
− Job streams with different characteristics, latency
requirements
− Jobs with varying numbers of tasks
− High-priority jobs are promptly served with little queueing
− Low-priority jobs suffer from repetitive evictions
− Pre-emptive priority scheduling = significant resource waste
• DiAS: differentially approximate and sprint CPU frequency
• DiAS improves the latency for all priorities and eliminates waste
from re-executing the evicted low-priority jobs
Fault tolerance and security
• Fault tolerance front-end (compiler annotations)
− Initial work that translates pragma annotations to FTI
API calls
#pragma chk init                     Initialize the fault tolerance interface (FTI) library
#pragma chk load(data-expr-list)     Protect variables in data-expr-list and recover them from file
#pragma chk store(data-expr-list)    Protect variables in data-expr-list and create a checkpoint file
#pragma chk shutdown                 Finalize/de-allocate the internal FTI data structures
• Fault tolerance back-end
− Implemented incremental checkpoint on FTI,
to be used to partially update checkpoint files
− Implemented partial recovery from checkpoint files,
to be used on recovery to extract output data of a
task from the checkpoint file
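The store/load semantics of the annotations can be illustrated with a file-based sketch. It does not call FTI and does not show what the compiler actually generates; `chk_var`, `chk_store` and `chk_load` are hypothetical stand-ins that write the protected variables to a single file and read them back in the same order.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical descriptor of one protected variable. */
typedef struct {
    void  *ptr;
    size_t size;
} chk_var;

/* Stand-in for `#pragma chk store(...)`: write every protected variable
 * to the checkpoint file. Returns 0 on success, -1 on failure. */
int chk_store(const char *path, const chk_var *vars, size_t n)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++)
        if (fwrite(vars[i].ptr, 1, vars[i].size, f) != vars[i].size)
            rc = -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Stand-in for `#pragma chk load(...)`: restore the variables from the
 * checkpoint file. Returns -1 when no usable checkpoint exists, in
 * which case the caller would start from its initial state. */
int chk_load(const char *path, const chk_var *vars, size_t n)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++)
        if (fread(vars[i].ptr, 1, vars[i].size, f) != vars[i].size)
            rc = -1;
    fclose(f);
    return rc;
}
```

An incremental checkpoint, as described for the back-end, would rewrite only the variables that changed. This sketch always rewrites the whole file.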
SCONE platform
• Enables native applications to run inside Intel SGX
enclaves without code changes
• Transparently attests applications
• Supports network and file system shields
• Manages secrets and configuration
• Supports secure multi-stakeholder machine learning
computations:
− Code, data, and models are encrypted
− TensorFlow, PyTorch, OpenVINO, OpenCV, etc.
Summary
• Heterogeneity
− Integrated programming model around OmpSs
• Energy efficiency
− XiTAO scheduling
• Programming models in LEGaTO’s big picture
• Common programming model for different targets
• High-level dataflow hardware description language
• Kernel identification and dataflow engine mapping
• Fault tolerance and security
HiPEAC CSW Autumn 2020

More Related Content

What's hot

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomFacultad de Informática UCM
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsIntel® Software
 
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen IIPorting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen IIGeorge Markomanolis
 
A DSP technical challange for an FPGA Engineer
A DSP technical challange for an FPGA EngineerA DSP technical challange for an FPGA Engineer
A DSP technical challange for an FPGA EngineerMaikon
 
Demosaic RTL for ISP workflow
Demosaic RTL for ISP workflowDemosaic RTL for ISP workflow
Demosaic RTL for ISP workflowMaikon
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Intel® Software
 
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...waqarnabi
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningAMD Developer Central
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...AMD Developer Central
 
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesOptimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesIntel® Software
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Anne Nicolas
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingJonathan Dursi
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)vaidehi87
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerGeorge Markomanolis
 
Improve Vectorization Efficiency
Improve Vectorization EfficiencyImprove Vectorization Efficiency
Improve Vectorization EfficiencyIntel® Software
 
Some experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon PhiSome experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon PhiMaho Nakata
 
On Context-Orientation in Aggregate Programming
On Context-Orientation in Aggregate ProgrammingOn Context-Orientation in Aggregate Programming
On Context-Orientation in Aggregate ProgrammingRoberto Casadei
 
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...Shinya Takamaeda-Y
 
Involvement in OpenHPC
Involvement in OpenHPC	Involvement in OpenHPC
Involvement in OpenHPC Linaro
 

What's hot (20)

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ ProcessorsUnderstand and Harness the Capabilities of Intel® Xeon Phi™ Processors
Understand and Harness the Capabilities of Intel® Xeon Phi™ Processors
 
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen IIPorting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
Porting an MPI application to hybrid MPI+OpenMP with Reveal tool on Shaheen II
 
A DSP technical challange for an FPGA Engineer
A DSP technical challange for an FPGA EngineerA DSP technical challange for an FPGA Engineer
A DSP technical challange for an FPGA Engineer
 
Demosaic RTL for ISP workflow
Demosaic RTL for ISP workflowDemosaic RTL for ISP workflow
Demosaic RTL for ISP workflow
 
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...
 
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Appli...
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
 
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by...
 
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splinesOptimize Single Particle Orbital (SPO) Evaluations Based on B-splines
Optimize Single Particle Orbital (SPO) Evaluations Based on B-splines
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)
 
Evaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI SupercomputerEvaluating GPU programming Models for the LUMI Supercomputer
Evaluating GPU programming Models for the LUMI Supercomputer
 
Improve Vectorization Efficiency
Improve Vectorization EfficiencyImprove Vectorization Efficiency
Improve Vectorization Efficiency
 
Some experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon PhiSome experiences for porting application to Intel Xeon Phi
Some experiences for porting application to Intel Xeon Phi
 
0507036
05070360507036
0507036
 
On Context-Orientation in Aggregate Programming
On Context-Orientation in Aggregate ProgrammingOn Context-Orientation in Aggregate Programming
On Context-Orientation in Aggregate Programming
 
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
A Framework for Efficient Rapid Prototyping by Virtually Enlarging FPGA Resou...
 
Involvement in OpenHPC
Involvement in OpenHPC	Involvement in OpenHPC
Involvement in OpenHPC
 

Similar to LEGaTO: Software Stack Programming Models

D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)Igalia
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...AMD Developer Central
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemGyula Fóra
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Databricks
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Databricks
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustEvan Chan
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Thomas Weise
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Target updated track f
Target updated   track fTarget updated   track f
Target updated track fAlona Gradman
 
Chip Ex2010 Gert Goossens
Chip Ex2010 Gert GoossensChip Ex2010 Gert Goossens
Chip Ex2010 Gert GoossensAlona Gradman
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementGanesan Narayanasamy
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Intel® Software
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamDataWorks Summit/Hadoop Summit
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKShu-Jeng Hsieh
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...LEGATO project
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesJeff Larkin
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsStijn Decubber
 

Similar to LEGaTO: Software Stack Programming Models (20)

D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
D. Fast, Simple User-Space Network Functions with Snabb (RIPE 77)
 
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...
 
Large-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop EcosystemLarge-Scale Stream Processing in the Hadoop Ecosystem
Large-Scale Stream Processing in the Hadoop Ecosystem
 
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
Accelerating Real Time Analytics with Spark Streaming and FPGAaaS with Prabha...
 
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
Accelerating Spark MLlib and DataFrame with Vector Processor “SX-Aurora TSUBASA”
 
Porting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to RustPorting a Streaming Pipeline from Scala to Rust
Porting a Streaming Pipeline from Scala to Rust
 
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
Python Streaming Pipelines on Flink - Beam Meetup at Lyft 2019
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Target updated track f
Target updated   track fTarget updated   track f
Target updated track f
 
Chip Ex2010 Gert Goossens
Chip Ex2010 Gert GoossensChip Ex2010 Gert Goossens
Chip Ex2010 Gert Goossens
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
Fast Insights to Optimized Vectorization and Memory Using Cache-aware Rooflin...
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
A New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDKA New Chapter of Data Processing with CDK
A New Chapter of Data Processing with CDK
 
Nike tech talk.2
Nike tech talk.2Nike tech talk.2
Nike tech talk.2
 
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best Practices
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.jsTensorFlow meetup: Keras - Pytorch - TensorFlow.js
TensorFlow meetup: Keras - Pytorch - TensorFlow.js
 

More from LEGATO project

Scrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitScrooge Attack: Undervolting ARM Processors for Profit
Scrooge Attack: Undervolting ARM Processors for ProfitLEGATO project
 
A practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemA practical approach for updating an integrity-enforced operating system
A practical approach for updating an integrity-enforced operating systemLEGATO project
 
TEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsTEEMon: A continuous performance monitoring framework for TEEs
TEEMon: A continuous performance monitoring framework for TEEsLEGATO project
 
secureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworksecureTF: A Secure TensorFlow Framework
secureTF: A Secure TensorFlow FrameworkLEGATO project
 
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...
PipeTune: Pipeline Parallelism of Hyper and System Parameters Tuning for Deep...LEGATO project
 
LEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGaTO: Machine Learning Use Case
LEGaTO: Machine Learning Use CaseLEGATO project
 
Smart Home AI at the edge
Smart Home AI at the edgeSmart Home AI at the edge
Smart Home AI at the edgeLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the projectLEGATO project
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGATO project
 
LEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGaTO: Low-Energy Heterogeneous Computing Workshop
LEGaTO: Low-Energy Heterogeneous Computing WorkshopLEGATO project
 
TZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneTZ4Fabric: Executing Smart Contracts with ARM TrustZone
TZ4Fabric: Executing Smart Contracts with ARM TrustZoneLEGATO project
 
Infection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingInfection Research with Maxeler Dataflow Computing
Infection Research with Maxeler Dataflow ComputingLEGATO project
 
Smart Home - AI at the edge
Smart Home - AI at the edgeSmart Home - AI at the edge
Smart Home - AI at the edgeLEGATO project
 
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyFPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-ResiliencyLEGATO project
 
Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...Device Data Directory and Asynchronous execution: A path to heterogeneous com...
Device Data Directory and Asynchronous execution: A path to heterogeneous com...LEGATO project
 
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsScheduling Task-parallel Applications in Dynamically Asymmetric Environments
Scheduling Task-parallel Applications in Dynamically Asymmetric EnvironmentsLEGATO project
 
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingRECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing
RECS – Cloud to Edge Microserver Platform for Energy-Efficient ComputingLEGATO project
 
Secure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXSecure Task-Based Programming with OmpSs and SGX
Secure Task-Based Programming with OmpSs and SGXLEGATO project
 



LEGaTO: Software Stack Programming Models

  • 1. HiPEAC CSW Autumn 2020
    LEGaTO: Software Stack Programming Models
    HiPEAC 2020 Computer Systems Week, 16-10-2020
    Pascal Felber, University of Neuchatel
    The LEGaTO project has received funding from the European Union's Horizon 2020 research and innovation programme under the grant agreement No 780681
  • 2. Outline
    • Programming models in LEGaTO’s big picture
    • Common programming model for different targets
    • Energy efficiency
    • High-level dataflow hardware description language
    • Kernel identification and dataflow engine mapping
    • Fault tolerance and security
  • 3. LEGaTO big picture
    [Figure: the LEGaTO stack, layer by layer]
    • Use cases: Smart Home, Smart City, Secure IoT Gateway, Machine Learning, Healthcare
    • Programming model: sequential task-based OmpSs programs; OmpSs Eclipse IDE plug-in
    • Compiler & HLS: Mercurium compilation, XiTAO front-end, SCONE compiler, MaxCompiler, AutoAit, DFiant HLS; C and HLS source code, C source code, RTL; native compiler and linker, FPGA synthesis; CPU/GPU binaries, bitstream
    • Runtime: Nanos runtime, XiTAO runtime, SCONE runtime, HEATS, fault-tolerance interface
    • Middleware: OpenStack middleware (deployment, monitoring, control); node composition (Redfish API); monitoring and control (REST API)
    • Hardware: microserver hardware platform with CPU, GPU, FPGA/DFE
    • LEGaTO aspects: security, programmability, energy efficiency, fault tolerance
  • 4. Main achievements
    • Programming model, annotations
    • Compiler support for OmpSs-2 with GPUs and FPGAs, annotated task model, LLVM code generation
    • IDE plugin for Eclipse
    • Task groups, resource partitioning
    • Energy efficiency in task scheduling (XiTAO, HEATS, DiAS)
    • DFiant high-productivity HDL
    • Mapping of OmpSs tasks onto MaxJ
    • Fault tolerance through compiler-based error detection, co-scheduling, checkpointing, secure task execution in TEE
  • 5. Towards a single source for any target
    • New architectures continue to appear
      − Common programming model
      − Increase programmers’ productivity
      − Develop once → run everywhere
    • Performance and energy efficiency
    • Key concept behind OmpSs
      − Sequential task-based program on a single address/name space + directionality annotations
      − Executed in parallel: automatic runtime computation of dependences among tasks
      − LEGaTO: extend tasks with resource requirements, propagate through the stack to find the most energy-efficient solution at run time
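The "automatic runtime computation of dependences" on slide 5 can be illustrated with a small plain-C sketch. It is not the Nanos or OmpSs implementation: the types `dir_t`, `arg_t`, `task_t` and the functions `depends_on` and `count_predecessors` are hypothetical names introduced here. It assumes that each task lists the addresses it touches together with an in/out/inout direction, and that a later task depends on an earlier one when both touch the same address and at least one of them writes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ARGS 4

/* Directionality of one task argument, mirroring in/out/inout annotations. */
typedef enum { DIR_IN, DIR_OUT, DIR_INOUT } dir_t;

typedef struct {
    const void *addr;   /* address of the data the task touches */
    dir_t dir;          /* how the task uses that data */
} arg_t;

/* Hypothetical task descriptor: only the annotated arguments matter here. */
typedef struct {
    size_t nargs;
    arg_t args[MAX_ARGS];
} task_t;

static bool writes(dir_t d) { return d != DIR_IN; }

/* True if `later` must wait for `earlier` (both given in program order):
 * they share an address and at least one of the two accesses is a write
 * (read-after-write, write-after-read or write-after-write). */
bool depends_on(const task_t *later, const task_t *earlier)
{
    assert(later != NULL && earlier != NULL);
    for (size_t i = 0; i < later->nargs; i++)
        for (size_t j = 0; j < earlier->nargs; j++)
            if (later->args[i].addr == earlier->args[j].addr &&
                (writes(later->args[i].dir) || writes(earlier->args[j].dir)))
                return true;
    return false;
}

/* Number of earlier tasks that task `idx` depends on, for a task list
 * stored in sequential creation order. A task with zero predecessors
 * is ready to run immediately. */
size_t count_predecessors(const task_t *tasks, size_t idx)
{
    assert(tasks != NULL);
    size_t n = 0;
    for (size_t k = 0; k < idx; k++)
        if (depends_on(&tasks[idx], &tasks[k]))
            n++;
    return n;
}
```

The pairwise comparison keeps the sketch short; it also reports transitive edges that a production runtime would normally avoid by tracking only the last writer and the current readers of each address.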
  • 6. Front-end tool box
    [Figure: front-end]
  • 7. OmpSs with SMP, OpenCL and FPGA
    SMP:
      #pragma omp target device(smp) copy_deps
      #pragma omp task depend(in:a, b) depend(inout:c)
      void matrix_multiply(float a[BS][BS], float b[BS][BS], float c[BS][BS]);
    GPGPU or OpenCL:
      #pragma omp target device(opencl) ndrange(2, NB, NB, 16, 16) implements(matrix_multiply)
      #pragma omp task depend(in:a, b) depend(inout:c)
      __kernel void matrix_multiply_opencl(float a[BS][BS], float b[BS][BS], float c[BS][BS]);
    FPGA:
      #pragma omp target device(fpga) implements(matrix_multiply) num_instances(3)
      #pragma omp task depend(in:a, b) depend(inout:c)
      void matrix_multiply_fpga(float a[BS][BS], float b[BS][BS], float c[BS][BS]);
  • 8. OmpSs with FPGA experiments
    • IPs configuration: 1*256, 3*128 (number of instances * size)
    • Frequency (MHz): 200, 250, 300 (working frequency of the FPGA)
    • Number of SMP cores: SMP: 1 to 4; FPGA: 3+1 helper, 2+2 helpers (combination of SMP and helper threads)
    • Number of FPGA helper threads: SMP: 0; FPGA: 1, 2 (helper threads are used to manage tasks on the FPGA)
    • Number of pending tasks: 4, 8, 16 and 32 (number of tasks sent to the IP cores before waiting for their finalization)
  • 9. IDE plug-in
    • OpenMP and OmpSs support in Eclipse
      − Support for most of the programming models’ directives and clauses
      − Including small help descriptions
      − Context-based auto-completion
  • 10. DFiant HDL
    • Aims to bridge the programmability gap by combining constructs and semantics from software, hardware and dataflow languages
    • Programming model accommodates a middle ground between low-level HDL and high-level sequential programming
    • High-Level Synthesis Languages and Tools (e.g., C and Vivado HLS)
      − Automatic pipelining
      − Not an HDL
      − Problem with state
    • DFiant: A Dataflow HDL
      − Separating timing from functionality
      − Concurrency
      − Fine-grain control
      − Automatic pipelining
    • Register-Transfer Level HDLs (e.g., VHDL)
      − Concurrency
      − Fine-grain control
      − Bound to clock
      − Explicit pipelining
  • 11. Task-based kernel identification for DFE mapping
    • OmpSs identifies “static” task graphs while running
    • Annotations of I/O and compute help to create the DFE task model
    • Instantiate static, customized, ultra-deep (>1,000 stages) computing pipelines
  • 12. Static task groups
    • Cluster static subgraphs into macrotasks targeting FPGA execution and/or elastic multicore scheduling (XiTAO)
    • Annotations: #pragma oss task in(…) out(…) and #pragma oss taskgroup num_threads(auto)
    [Figure: static subgraph; final graph with static macrotask]
  • 13. XiTAO data parallel nodes
    • Energy efficient
      − Data parallel nodes hide internal task parallelism and can be scheduled with XiTAO’s energy-efficient scheduler
    • Programmable
      − C++ based interface that requires minimal application code changes
    • Task/data parallel
      − Easy and intuitive nesting of data parallel nodes in a coarser TAO-DAG
    • Granularity/slackness control
      − User-level control of the granularity of internal parallelism (control of the BLOCK_LENGTH for dynamically scheduled TAOs)
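How a block length controls the granularity of a data parallel node can be sketched in a few lines of plain C. This is not the XiTAO interface (which the slide describes as C++ based): `range_t`, `chunk_count` and `chunk_range` are hypothetical helpers, and the only assumption is that a node covering `n` elements is split into chunks of at most `block_length` elements that a scheduler can hand out dynamically.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open index range [begin, end) handled by one chunk. */
typedef struct { size_t begin, end; } range_t;

/* Number of chunks needed to cover n elements with the given block length.
 * A smaller block length yields more, finer-grained chunks. */
size_t chunk_count(size_t n, size_t block_length)
{
    assert(block_length > 0);
    return (n + block_length - 1) / block_length;
}

/* Index range of the i-th chunk; the last chunk may be shorter. */
range_t chunk_range(size_t n, size_t block_length, size_t i)
{
    assert(block_length > 0 && i < chunk_count(n, block_length));
    range_t r;
    r.begin = i * block_length;
    r.end = (r.begin + block_length < n) ? r.begin + block_length : n;
    return r;
}
```

With 10 elements and a block length of 4, the node is split into three chunks, the last of which covers only elements 8 and 9.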
  • 14. Energy efficiency for large jobs
    HEATS: heterogeneity- and energy-aware task scheduling
    • Exploit the requirements of a given task to identify the most efficient configuration of nodes
    • Monitor tasks and nodes in real time to perform the best-fitting placement, and migrations when necessary
    • Prototype in Kubernetes
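The placement idea on the HEATS slide, matching a task's requirements against the nodes and preferring the most efficient fit, can be sketched as follows. This is an illustrative stand-in and not the HEATS algorithm or its Kubernetes integration: the structures `node_t` and `task_req_t`, the `watts_per_core` estimate and the function `pick_node` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a node as seen by the scheduler. */
typedef struct {
    int free_cores;
    int free_mem_mb;
    double watts_per_core;   /* assumed per-core power estimate from monitoring */
} node_t;

/* Hypothetical resource requirements attached to a task. */
typedef struct {
    int cores;
    int mem_mb;
} task_req_t;

/* Return the index of the feasible node with the lowest estimated power
 * draw for this task, or -1 if no node satisfies the requirements. */
int pick_node(const node_t *nodes, size_t n, task_req_t req)
{
    assert(nodes != NULL || n == 0);
    int best = -1;
    double best_cost = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Skip nodes that cannot host the task at all. */
        if (nodes[i].free_cores < req.cores || nodes[i].free_mem_mb < req.mem_mb)
            continue;
        double cost = nodes[i].watts_per_core * req.cores;
        if (best < 0 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

A scheduler that also weighs performance, as the slide's "best-fitting placement" suggests, would replace the single power estimate with a combined score; re-running the selection on fresh monitoring data is one way to decide when a migration is worthwhile.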
  • 15. Energy efficiency for large jobs
    • Big data production systems usually implement priority scheduling
      − Job streams with different characteristics, latency requirements
      − Jobs with varying numbers of tasks
      − High-priority jobs are promptly served with little queueing
      − Low-priority jobs suffer from repetitive evictions
      − Pre-emptive priority scheduling = significant resource waste
    • DiAS: differentially approximate and sprint CPU frequency
    • DiAS improves the latency for all priorities and eliminates waste from re-executing the evicted low-priority jobs
  • 16. Fault tolerance and security
    • Fault tolerance front-end (compiler annotations)
      − Initial work that translates pragma annotations to FTI API calls
      − #pragma chk init: initialize the fault tolerance interface (FTI) library
      − #pragma chk load(data-expr-list): protect variables in expr-list & recover from file
      − #pragma chk store(data-expr-list): protect variables in expr-list & create a checkpoint file
      − #pragma chk shutdown: finalize/de-allocate the internal FTI data structures
    • Fault tolerance back-end
      − Implemented incremental checkpoint on FTI, to be used to partially update checkpoint files
      − Implemented partial recovery from checkpoint files, to be used on recovery to extract output data of a task from the checkpoint file
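The store/load semantics of the checkpoint annotations can be pictured with a minimal stand-in written against plain C stdio. It deliberately does not call the FTI library, and it is not what the LEGaTO compiler generates: `protected_t`, `chk_store` and `chk_load` are hypothetical names, and the assumption is simply that "store" writes every protected variable to a checkpoint file and "load" restores them from it when such a file exists.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One protected variable: where it lives and how many bytes it occupies. */
typedef struct {
    void *ptr;
    size_t size;
} protected_t;

/* Write all protected variables to `path`, in order.
 * Returns 0 on success and -1 on any I/O error. */
int chk_store(const char *path, const protected_t *vars, size_t n)
{
    assert(path != NULL && vars != NULL);
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++)
        if (fwrite(vars[i].ptr, 1, vars[i].size, f) != vars[i].size)
            rc = -1;
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}

/* Restore all protected variables from `path`, in the same order.
 * Returns -1 if there is no checkpoint or it is too short, in which
 * case the caller should start from its initial state. */
int chk_load(const char *path, const protected_t *vars, size_t n)
{
    assert(path != NULL && vars != NULL);
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    int rc = 0;
    for (size_t i = 0; i < n && rc == 0; i++)
        if (fread(vars[i].ptr, 1, vars[i].size, f) != vars[i].size)
            rc = -1;
    fclose(f);
    return rc;
}
```

A typical use is to call `chk_load` once at start-up and `chk_store` every few iterations of the main loop. The sketch rewrites the whole file on each store; the incremental checkpoint and partial recovery mentioned for the back-end would instead update or read only part of it.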
  • 17. SCONE platform
    • Enables native applications to run inside Intel SGX enclaves without code changes
    • Transparently attests applications
    • Supports network and file system shields
    • Manages secrets and configuration
    • Supports secure multi-stakeholder machine learning computations:
      − Code, data, and models are encrypted
      − TensorFlow, PyTorch, OpenVINO, OpenCV, etc.
  • 18. Summary
    • Heterogeneity
      − Integrated programming model around OmpSs
    • Energy efficiency
      − XiTAO scheduling
    • Programming models in LEGaTO’s big picture
    • Common programming model for different targets
    • High-level dataflow hardware description language
    • Kernel identification and dataflow engine mapping
    • Fault tolerance and security

Editor's Notes

  1. HLS: High-Level Synthesis. HDL: Hardware Description Language. AutoAit: mapping of OmpSs to Vivado.
  2. Overall software toolchain for the cluster runtime. The LEGaTO programming model front-end is shown on the left-hand side of the figure. The LEGaTO front-end consists of the tools that process the source code and generate the LEGaTO binary targeting the heterogeneous platforms. These tools include extensions to Mercurium (previously developed by BSC) to analyze OmpSs source code and generate Nanos/XiTAO/FPGA/GPU binaries, and two high level programming methodologies to generate dataflow kernels: DFiant and MaxJ.
  3. SMP: OmpSs provides tasking on SMP cores, with the usual scheduling policies and techniques (FIFO, Cilk, immediate successor). GPU: the “implements” clause provides different targets for the same task; the kernel is provided in CUDA or OpenCL, and data transfers are issued automatically by OmpSs. FPGA: single-source parallel programming with FPGA acceleration; the “implements” technique is available, and “num_instances(N)” generates the indicated number of IP accelerators.
  4. AXIOM board: Xilinx Zynq Ultrascale+ chip, with 4 ARM Cortex-A53 cores, and the ZU9EG FPGA.
  5. DFE: Data Flow Engine. Task-based kernel identification/DFE mapping: the purpose of Task T4.6 is to identify static sub-graphs in the OmpSs task graph and map them to kernels on a Maxeler FPGA-based Dataflow Engine (DFE). The rationale is that OmpSs tasks appear naturally suitable for FPGA mapping: they have clearly defined inputs and outputs and have self-contained state. Maxeler's programming model is based on dataflow, where large dataflow graphs, described in MaxJ, are mapped and optimised to generate FPGA configurations. These dataflow graphs are essentially static, highly customised and ultra-deep pipelines that achieve very high computational throughput. Generating these dataflow graphs is supported by Maxeler's MaxCompiler toolchain, and runtime execution from a host application is enabled through the MaxelerOS runtime. A task-based programming model such as OmpSs is a good match to act as a front end for the dataflow graph generation. However, due to the high context-switching overhead of FPGAs, task graphs mapped to FPGAs need to be static. SLiC: interface into Maxeler hardware that allows tasks to be spawned.
  6. Granularity of internal parallelism: thread to core mapping
  7. We have further improved HEATS by designing an updated version of it where not only migrations across heterogeneous nodes are exploited but also across the three layers of the deployment architecture (edge, fog and cloud). The prototype implementation and test of the system update is currently under development. On top of that, we have also observed performance and energy improvements when tuning the CPU frequency of the nodes. Based on this, we have developed a sprinting approach which runs jobs at a higher frequency for as long as a predefined budget is not used up. In this work we also take into account that jobs might have different priorities. The system was implemented in Go and tested using Spark. The outcome of this work has been submitted to Middleware’19 in a paper called “Differential Approximation and Sprinting for Multi-Priority Big Data Engines” which is currently under revision.
  8. Trick: reduce a fraction of the data load for low-priority jobs and temporarily increase the CPU frequency for high-priority jobs. Differential approximation: a controllable approximation level that discriminates among priority classes by dropping different fractions of data. This gives better latencies for low-priority jobs at the cost of some accuracy loss, and less latency increase for high-priority jobs. Stochastic models control approximation and sprinting. Sprinting adjusts the frequency levels to accelerate high-priority jobs after they have temporarily waited behind low-priority ones. Result: reduced tail latencies (90% low-priority, 60% high-priority) and energy (20%).
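The two mechanisms described in the notes on DiAS, dropping a fraction of the input data and sprinting for as long as a predefined budget lasts, can be sketched in plain C. This is an illustration under stated assumptions rather than the DiAS implementation (which the notes say was written in Go and tested with Spark): `records_to_process`, `sprint_t` and `next_frequency` are hypothetical names, the drop fraction and the budget are taken as given inputs instead of being derived from stochastic models, and only high-priority jobs are allowed to sprint.

```c
#include <assert.h>
#include <stddef.h>

/* Differential approximation: number of input records a job still
 * processes when a fraction `drop` (between 0 and 1) of its data is
 * dropped. Different priority classes would use different fractions. */
size_t records_to_process(size_t total, double drop)
{
    assert(drop >= 0.0 && drop <= 1.0);
    return total - (size_t)((double)total * drop);
}

/* Remaining sprint budget, in seconds of execution at the higher frequency. */
typedef struct {
    double budget_s;
} sprint_t;

/* Choose the CPU frequency (in MHz) for the next time slice of a job.
 * A high-priority job sprints while enough budget remains, and the slice
 * is charged against the budget; everything else runs at the base frequency. */
int next_frequency(sprint_t *s, int high_priority, double slice_s,
                   int base_mhz, int sprint_mhz)
{
    assert(s != NULL && slice_s > 0.0);
    if (high_priority && s->budget_s >= slice_s) {
        s->budget_s -= slice_s;
        return sprint_mhz;
    }
    return base_mhz;
}
```

The frequency values are passed in by the caller because the available levels depend on the machine; actually applying the chosen frequency is platform specific and left out of the sketch.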
  9. FTI: fault tolerance interface. Fault tolerance mechanisms: in order to improve the availability of the platform, we implemented a secure checkpointing mechanism. Our first version was implemented using the file system shield available in SCONE. Moreover, we added support for the system calls vfork() and fork() in the SCONE toolchain in order to provide additional fault tolerance mechanisms such as rejuvenation, which is a commonly implemented fault tolerance technique. Providing fork support for applications running in Trusted Execution Environments is a non-trivial problem. First, it requires creating a new enclave and copying the whole application state to the new enclave, including running an attestation. Furthermore, the state must be consistent when using multiple enclave threads. It also requires pre-emption of non-forking threads. We have recently completed the implementation of this functionality, which is under testing at the time of writing of this report.