SlideShare una empresa de Scribd logo
1 de 16
www.bsc.es
Leipzig, June 17th 2013
Jesus Labarta
Jesus.labarta@bsc.es
OmpSs – improving the
scalability of OpenMP
2
The parallel programming revolution
Parallel programming in the past
– Where to place data
– What to run where
– How to communicate
Parallel programming in the future
– What do I need to compute
– What data do I need to use
– hints (not necessarily very precise) on
potential concurrency, locality,…
Schedule @ programmers mind
Static
Schedule @ system
Dynamic
Complexity  ! Awareness
Variability
3
Key concept
– Sequential task based program on single address/name space
+ directionality annotations
– Happens to execute parallel: Automatic run time computation of
dependencies between tasks
Differentiation of StarSs
– Dependences: Tasks instantiated but not ready. Order IS defined
• Lookahead
– Avoid stalling the main control flow when a computation depending on previous
tasks is reached
– Possibility to “see” the future searching for further potential concurrency
– Locality aware
– Homogenizing heterogeneity
The StarSs family of programming models
4
The StarSs “Granularities”
StarSs
OmpSs COMPSs
@ SMP @ GPU @ Cluster
Average task Granularity:
100 microseconds – 10 milliseconds 1second - 1 day
Language binding:
C, C++, FORTRAN Java, Python
Address space to compute dependences:
Memory Files, Objects (SCM)
Parallel Ensemble, workflow
5
OpenMP
void Cholesky(int NT, float *A[NT][NT] ) {
#pragma omp parallel
#pragma omp single
for (int k=0; k<NT; k++) {
#pragma omp task
spotrf (A[k][k], TS);
#pragma omp taskwait
for (int i=k+1; i<NT; i++) {
#pragma omp task
strsm (A[k][k], A[k][i], TS);
}
#pragma omp taskwait
for (int i=k+1; i<NT; i++) {
for (j=k+1; j<i; j++)
#pragma omp task
sgemm( A[k][i], A[k][j], A[j][i], TS);
#pragma omp task
ssyrk (A[k][i], A[i][i], TS);
#pragma omp taskwait
}
}
}
6
Extend OpenMP with a data-flow execution model to exploit
unstructured parallelism (compared to fork-join)
– in/out pragmas to enable dependence and locality
– Inlined/outlined
void Cholesky(int NT, float *A[NT][NT] ) {
for (int k=0; k<NT; k++) {
#pragma omp task inout ([TS][TS]A[k][k])
spotrf (A[k][k], TS) ;
for (int i=k+1; i<NT; i++) {
#pragma omp task in(([TS][TS]A[k][k])) 
inout ([TS][TS]A[k][i])
strsm (A[k][k], A[k][i], TS);
}
for (int i=k+1; i<NT; i++) {
for (j=k+1; j<i; j++) {
#pragma omp task in([TS][TS]A[k][i]),
in([TS][TS]A[k][j]) inout ([TS][TS]A[j][i])
sgemm( A[k][i], A[k][j], A[j][i], TS);
}
#pragma omp task in ([TS][TS]A[k][i]) 
inout([TS][TS]A[i][i])
ssyrk (A[k][i], A[i][i], TS);
}
}
}
OmpSs
7
OmpSs
Other features and characteristics
– Contiguous / strided dependence detection and data management
– Nesting/recursion
– Multiple implementations for a given task
– Integrated with powerful performance tools (Extrae+Paraver)
Continuous development and use
– Since 2004
– On large applications and systems
Pushing ideas into the OpenMP standard
– Developer Positioning: efforts in OmpSs will not be lost.
– Dependences  v 4.0
18
Used in projects and applications …
Undertaken significant efforts to port real large scale
applications:
–
• Scalapack, PLASMA, SPECFEM3D, LBC, CPMD PSC, PEPC,
LS1 Mardyn, Asynchronous algorithms, Microbenchmarks
–
• YALES2, EUTERPE, SPECFEM3D, MP2C, BigDFT,
QuantumESPRESSO, PEPC, SMMP, ProFASI, COSMO, BQCD
– DEEP
• NEURON, iPIC3D, ECHAM/MESSy, AVBP, TurboRVB, Seismic
– G8_ECS
• CGPOP, NICAM (planed) …
– Consolider project (Spanish ministry)
• MRGENESIS
– BSC initiatives and collaborations:
• GROMACS, GADGET, WRF,…
19
… but NOT only for «scientific computing» …
Plagiarism detection
– Histograms, sorting, … (FhI FIRST)
Trace browsing
– Paraver (BSC)
Clustering algorithms
– G-means (BSC)
Image processing
– Tracking (USAF)
Embedded and consumer
– H.264 (TUBerlin), …
8
The potential of asynchrony
Performance …
– Cholesky 8K x 8K @ SandyBridge (E5-2670, 2.6 GHz)
– Intel Math Kernel Library (MKL) parallel vs. OmpSs+ MKL sequential
• Nested OmpSs (large block size, small block size)
 Reproducibility / less sensitivity to environment options
Strong scaling
Impact of problem size
for fixed core count
9
• Detected issues (ongoing work)
– scheduling policies (generic of nested OmpSs),
– contention on task queues (specific of high #cores),…
• Still fairly good performance!!!
Xeon Phi (I)
10
Hybrid MPI/SMPSs
Linpack example
Overlap communication/computation
Extend asynchronous data-flow execution to
outer level
Automatic lookahead
…
for (k=0; k<N; k++) {
if (mine) {
Factor_panel(A[k]);
send (A[k])
} else {
receive (A[k]);
if (necessary) resend (A[k]);
}
for (j=k+1; j<N; j++)
update (A[k], A[j]);
…
#pragma css task inout(A[SIZE])
void Factor_panel(float *A);
#pragma css task input(A[SIZE]) inout(B[SIZE])
void update(float *A, float *B);
#pragma css task input(A[SIZE])
void send(float *A);
#pragma css task output(A[SIZE])
void receive(float *A);
#pragma css task input(A[SIZE])
void resend(float *A);
P0 P1 P2
V. Marjanovic, et al, “Overlapping Communication and Computation by using a Hybrid MPI/SMPSs Approach” ICS 2010
11
Automatically achieved by the runtime
– Load balance within node
– Fine grain.
– Complementary to application level
load balance.
– Leverage OmpSs malleability
LeWI: Lend core When Idle
– User level Run time Library (DLB)
coordinating processes within node
– Fighting the Linux kernel: Explicit pinning of
threads and handoff scheduling
Overcome Amdahl’s law in hybrid
programming
– Enable partial node level parallelization
– hope for lazy programmers
… and us all
Dynamic Load Balancing: LeWI
Communication
Running task
Idle
4 MPI processes @ 4 cores node
“LeWI: A Runtime Balancing
Algorithm for Nested Parallelism”.
M.Garcia et al. ICPP09
12
Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP)
Host
KNC
KNC
KNC
KNC
Host
KNC
KNC
KNC
KNC
MPI
MPI
MPI
void main(int argc, char* argv[]) {
//MPI stuff ….
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
#pragma omp task out(worker_image[idx])
worker(worker_image[idx]);
#pragma omp task inout(master_image) in(worker_image[idx]);
accum(master_image, worker_image[idx])
}
#pragma omp taskwait
global_accum(global_pool, master_image); // MPI
}
Host
KNC
KNC
KNC
KNC
Host
KNC
KNC
KNC
KNC
MPI
OmpSs
OmpSs
void main(int argc, char* argv[]) {
//MPI stuff ….
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
// MPI stuff to send job to worker
//…
//….manually taking care of load balance
// …
// …. and collect results on worker_pool[idx]
accum(master_image, worker_image[idx])
}
global_accum(global_image, master_image); // MPI
}
void main(int argc, char* argv[]) {
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
#pragma omp task out(worker_image[idx])
worker(worker_image[idx]);
#pragma omp task inout(master_image) in(worker_image[idx]);
accum(global_image, worker_image[idx])
}
#pragma omp taskwait
}
13
Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP)
Productivity …
… and performance
… flexibility …
14
MPI + OmpSs @ MIC
– Hydro 8 MPI processes, 2 cores/process, 4 OmpSs threads/process
– Mapped to 64 hardware threads (16 cores)
4 OmpSs threads/Core
2 OmpSs threads/Core
3 OmpSs threads@ Core 1
2 OmpSs threads/Core
1 OmpSs threads@ Core 2
15
OmpSs: Enabler for exascale
 Can exploit very unstructured
parallelism
 Not just loop/data parallelism
 Easy to change structure
 Supports large amounts of
lookahead
 Not stalling for dependence satisfaction
 Allow for locality optimizations to
tolerate latency
 Overlap data transfers, prefetch
 Reuse
 Nicely hybridizes into MPI/StarSs
 Propagates to large scale the node level
dataflow characteristics
 Overlap communication and computation
 A chance against Amdahl’s law
 Support for heterogeneity
 Any # and combination of CPUs, GPUs
 Including autotuning
 Malleability: Decouple program from
resources
 Allowing dynamic resource allocation and
load balance
 Tolerate noise
15
Asynchrony: Data-flow
Locality
Simple / incremental
interface to programmers
Compatible with proprietary
lower level technologies
THANKS

Más contenido relacionado

La actualidad más candente

An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsLinaro
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法x1 ichi
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Jen Aman
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clustersBurak Himmetoglu
 
Flux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixFlux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixInfluxData
 
Linuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLinuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLenz Gschwendtner
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemFlink Forward
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Intel® Software
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsJeff Larkin
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel ProgrammingAlex Moore
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flinkFlink Forward
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat DetectionJen Aman
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraSomnath Mazumdar
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentationkammeyer
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityAndrii Gakhov
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 BasicFlink Forward
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
 

La actualidad más candente (20)

An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loops
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
Flux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixFlux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul Dix
 
Linuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLinuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talk
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
scikit-cuda
scikit-cudascikit-cuda
scikit-cuda
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentation
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. Cardinality
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 

Similar a OmpSs - Improving Parallelism of OpenMP Applications

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomFacultad de Informática UCM
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingEd Kohlwey
 
Big data shim
Big data shimBig data shim
Big data shimtistrue
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Task based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationTask based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationFacultad de Informática UCM
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesAkihiro Hayashi
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialRoger Rafanell Mas
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiDatabricks
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Anubhav Jain
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeRizwan Habib
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16MLconf
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Holden Karau
 
HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016Ehsan Totoni
 
Three Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataThree Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataDynamical Software, Inc.
 

Similar a OmpSs - Improving Parallelism of OpenMP Applications (20)

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel Processing
 
Big data shim
Big data shimBig data shim
Big data shim
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Task based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationTask based Programming with OmpSs and its Application
Task based Programming with OmpSs and its Application
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
 
HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016
 
Three Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataThree Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big Data
 

Más de Intel IT Center

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- SupercomputingIntel IT Center
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraIntel IT Center
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationIntel IT Center
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsIntel IT Center
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationIntel IT Center
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Intel IT Center
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayIntel IT Center
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.Intel IT Center
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldIntel IT Center
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel IT Center
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...Intel IT Center
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital AgeIntel IT Center
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityIntel IT Center
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Intel IT Center
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel IT Center
 

Más de Intel IT Center (20)

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- Supercomputing
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsara
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel Station
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User Authentication
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace Today
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital World
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital Age
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a Reality
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
 

Último

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 

OmpSs - Improving Parallelism of OpenMP Applications

  • 1. www.bsc.es Leipzig, June 17th 2013 Jesus Labarta Jesus.labarta@bsc.es OmpSs – improving the scalability of OpenMP
  • 2. 2 The parallel programming revolution Parallel programming in the past – Where to place data – What to run where – How to communicate Parallel programming in the future – What do I need to compute – What data do I need to use – hints (not necessarily very precise) on potential concurrency, locality,… Schedule @ programmers mind Static Schedule @ system Dynamic Complexity  ! Awareness Variability
  • 3. 3 Key concept – Sequential task based program on single address/name space + directionality annotations – Happens to execute parallel: Automatic run time computation of dependencies between tasks Differentiation of StarSs – Dependences: Tasks instantiated but not ready. Order IS defined • Lookahead – Avoid stalling the main control flow when a computation depending on previous tasks is reached – Possibility to “see” the future searching for further potential concurrency – Locality aware – Homogenizing heterogeneity The StarSs family of programming models
  • 4. 4 The StarSs “Granularities” StarSs OmpSs COMPSs @ SMP @ GPU @ Cluster Average task Granularity: 100 microseconds – 10 milliseconds 1second - 1 day Language binding: C, C++, FORTRAN Java, Python Address space to compute dependences: Memory Files, Objects (SCM) Parallel Ensemble, workflow
  • 5. 5 OpenMP void Cholesky(int NT, float *A[NT][NT] ) { #pragma omp parallel #pragma omp single for (int k=0; k<NT; k++) { #pragma omp task spotrf (A[k][k], TS); #pragma omp taskwait for (int i=k+1; i<NT; i++) { #pragma omp task strsm (A[k][k], A[k][i], TS); } #pragma omp taskwait for (int i=k+1; i<NT; i++) { for (j=k+1; j<i; j++) #pragma omp task sgemm( A[k][i], A[k][j], A[j][i], TS); #pragma omp task ssyrk (A[k][i], A[i][i], TS); #pragma omp taskwait } } }
  • 6. 6 Extend OpenMP with a data-flow execution model to exploit unstructured parallelism (compared to fork-join) – in/out pragmas to enable dependence and locality – Inlined/outlined void Cholesky(int NT, float *A[NT][NT] ) { for (int k=0; k<NT; k++) { #pragma omp task inout ([TS][TS]A[k][k]) spotrf (A[k][k], TS) ; for (int i=k+1; i<NT; i++) { #pragma omp task in(([TS][TS]A[k][k])) inout ([TS][TS]A[k][i]) strsm (A[k][k], A[k][i], TS); } for (int i=k+1; i<NT; i++) { for (j=k+1; j<i; j++) { #pragma omp task in([TS][TS]A[k][i]), in([TS][TS]A[k][j]) inout ([TS][TS]A[j][i]) sgemm( A[k][i], A[k][j], A[j][i], TS); } #pragma omp task in ([TS][TS]A[k][i]) inout([TS][TS]A[i][i]) ssyrk (A[k][i], A[i][i], TS); } } } OmpSs
  • 7. 7 OmpSs Other features and characteristics – Contiguous / strided dependence detection and data management – Nesting/recursion – Multiple implementations for a given task – Integrated with powerful performance tools (Extrae+Paraver) Continuous development and use – Since 2004 – On large applications and systems Pushing ideas into the OpenMP standard – Developer Positioning: efforts in OmpSs will not be lost. – Dependences  v 4.0 18 Used in projects and applications … Undertaken significant efforts to port real large scale applications: – • Scalapack, PLASMA, SPECFEM3D, LBC, CPMD PSC, PEPC, LS1 Mardyn, Asynchronous algorithms, Microbenchmarks – • YALES2, EUTERPE, SPECFEM3D, MP2C, BigDFT, QuantumESPRESSO, PEPC, SMMP, ProFASI, COSMO, BQCD – DEEP • NEURON, iPIC3D, ECHAM/MESSy, AVBP, TurboRVB, Seismic – G8_ECS • CGPOP, NICAM (planed) … – Consolider project (Spanish ministry) • MRGENESIS – BSC initiatives and collaborations: • GROMACS, GADGET, WRF,… 19 … but NOT only for «scientific computing» … Plagiarism detection – Histograms, sorting, … (FhI FIRST) Trace browsing – Paraver (BSC) Clustering algorithms – G-means (BSC) Image processing – Tracking (USAF) Embedded and consumer – H.264 (TUBerlin), …
  • 8. 8 The potential of asynchrony Performance … – Cholesky 8K x 8K @ SandyBridge (E5-2670, 2.6 GHz) – Intel Math Kernel Library (MKL) parallel vs. OmpSs+ MKL sequential • Nested OmpSs (large block size, small block size)  Reproducibility / less sensitivity to environment options Strong scaling Impact of problem size for fixed core count
  • 9. 9 • Detected issues (ongoing work) – scheduling policies (generic of nested OmpSs), – contention on task queues (specific of high #cores),… • Still fairly good performance!!! Xeon Phi (I)
  • 10. 10 Hybrid MPI/SMPSs Linpack example Overlap communication/computation Extend asynchronous data-flow execution to outer level Automatic lookahead … for (k=0; k<N; k++) { if (mine) { Factor_panel(A[k]); send (A[k]) } else { receive (A[k]); if (necessary) resend (A[k]); } for (j=k+1; j<N; j++) update (A[k], A[j]); … #pragma css task inout(A[SIZE]) void Factor_panel(float *A); #pragma css task input(A[SIZE]) inout(B[SIZE]) void update(float *A, float *B); #pragma css task input(A[SIZE]) void send(float *A); #pragma css task output(A[SIZE]) void receive(float *A); #pragma css task input(A[SIZE]) void resend(float *A); P0 P1 P2 V. Marjanovic, et al, “Overlapping Communication and Computation by using a Hybrid MPI/SMPSs Approach” ICS 2010
  • 11. 11 Automatically achieved by the runtime – Load balance within node – Fine grain. – Complementary to application level load balance. – Leverage OmpSs malleability LeWI: Lend core When Idle – User level Run time Library (DLB) coordinating processes within node – Fighting the Linux kernel: Explicit pinning of threads and handoff scheduling Overcome Amdahl’s law in hybrid programming – Enable partial node level parallelization – hope for lazy programmers … and us all Dynamic Load Balancing: LeWI Communication Running task Idle 4 MPI processes @ 4 cores node “LeWI: A Runtime Balancing Algorithm for Nested Parallelism”. M.Garcia et al. ICPP09
  • 12. 12 Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP) Host KNC KNC KNC KNC Host KNC KNC KNC KNC MPI MPI MPI void main(int argc, char* argv[]) { //MPI stuff …. for(int i=0; i<my_jobs; i++) { int idx = i % workers; #pragma omp task out(worker_image[idx]) worker(worker_image[idx]); #pragma omp task inout(master_image) in(worker_image[idx]); accum(master_image, worker_image[idx]) } #pragma omp taskwait global_accum(global_pool, master_image); // MPI } Host KNC KNC KNC KNC Host KNC KNC KNC KNC MPI OmpSs OmpSs void main(int argc, char* argv[]) { //MPI stuff …. for(int i=0; i<my_jobs; i++) { int idx = i % workers; // MPI stuff to send job to worker //… //….manually taking care of load balance // … // …. and collect results on worker_pool[idx] accum(master_image, worker_image[idx]) } global_accum(global_image, master_image); // MPI } void main(int argc, char* argv[]) { for(int i=0; i<my_jobs; i++) { int idx = i % workers; #pragma omp task out(worker_image[idx]) worker(worker_image[idx]); #pragma omp task inout(master_image) in(worker_image[idx]); accum(global_image, worker_image[idx]) } #pragma omp taskwait }
  • 13. 13 Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP) Productivity … … and performance … flexibility …
  • 14. 14 MPI + OmpSs @ MIC – Hydro 8 MPI processes, 2 cores/process, 4 OmpSs threads/process – Mapped to 64 hardware threads (16 cores) 4 OmpSs threads/Core 2 OmpSs threads/Core 3 OmpSs threads@ Core 1 2 OmpSs threads/Core 1 OmpSs threads@ Core 2
  • 15. 15 OmpSs: Enabler for exascale  Can exploit very unstructured parallelism  Not just loop/data parallelism  Easy to change structure  Supports large amounts of lookahead  Not stalling for dependence satisfaction  Allow for locality optimizations to tolerate latency  Overlap data transfers, prefetch  Reuse  Nicely hybridizes into MPI/StarSs  Propagates to large scale the node level dataflow characteristics  Overlap communication and computation  A chance against Amdahl’s law  Support for heterogeneity  Any # and combination of CPUs, GPUs  Including autotuning  Malleability: Decouple program from resources  Allowing dynamic resource allocation and load balance  Tolerate noise 15 Asynchrony: Data-flow Locality Simple / incremental interface to programmers Compatible with proprietary lower level technologies