SlideShare una empresa de Scribd logo
1 de 16
www.bsc.es
Leipzig, June 17th 2013
Jesus Labarta
Jesus.labarta@bsc.es
OmpSs – improving the
scalability of OpenMP
2
The parallel programming revolution
Parallel programming in the past
– Where to place data
– What to run where
– How to communicate
Parallel programming in the future
– What do I need to compute
– What data do I need to use
– hints (not necessarily very precise) on
potential concurrency, locality,…
Schedule @ programmers mind
Static
Schedule @ system
Dynamic
Complexity  ! Awareness
Variability
3
Key concept
– Sequential task based program on single address/name space
+ directionality annotations
– Happens to execute parallel: Automatic run time computation of
dependencies between tasks
Differentiation of StarSs
– Dependences: Tasks instantiated but not ready. Order IS defined
• Lookahead
– Avoid stalling the main control flow when a computation depending on previous
tasks is reached
– Possibility to “see” the future searching for further potential concurrency
– Locality aware
– Homogenizing heterogeneity
The StarSs family of programming models
4
The StarSs “Granularities”
StarSs
OmpSs COMPSs
@ SMP @ GPU @ Cluster
Average task Granularity:
100 microseconds – 10 milliseconds 1second - 1 day
Language binding:
C, C++, FORTRAN Java, Python
Address space to compute dependences:
Memory Files, Objects (SCM)
Parallel Ensemble, workflow
5
OpenMP
void Cholesky(int NT, float *A[NT][NT] ) {
#pragma omp parallel
#pragma omp single
for (int k=0; k<NT; k++) {
#pragma omp task
spotrf (A[k][k], TS);
#pragma omp taskwait
for (int i=k+1; i<NT; i++) {
#pragma omp task
strsm (A[k][k], A[k][i], TS);
}
#pragma omp taskwait
for (int i=k+1; i<NT; i++) {
for (j=k+1; j<i; j++)
#pragma omp task
sgemm( A[k][i], A[k][j], A[j][i], TS);
#pragma omp task
ssyrk (A[k][i], A[i][i], TS);
#pragma omp taskwait
}
}
}
6
Extend OpenMP with a data-flow execution model to exploit
unstructured parallelism (compared to fork-join)
– in/out pragmas to enable dependence and locality
– Inlined/outlined
void Cholesky(int NT, float *A[NT][NT] ) {
for (int k=0; k<NT; k++) {
#pragma omp task inout ([TS][TS]A[k][k])
spotrf (A[k][k], TS) ;
for (int i=k+1; i<NT; i++) {
#pragma omp task in(([TS][TS]A[k][k])) 
inout ([TS][TS]A[k][i])
strsm (A[k][k], A[k][i], TS);
}
for (int i=k+1; i<NT; i++) {
for (j=k+1; j<i; j++) {
#pragma omp task in([TS][TS]A[k][i]),
in([TS][TS]A[k][j]) inout ([TS][TS]A[j][i])
sgemm( A[k][i], A[k][j], A[j][i], TS);
}
#pragma omp task in ([TS][TS]A[k][i]) 
inout([TS][TS]A[i][i])
ssyrk (A[k][i], A[i][i], TS);
}
}
}
OmpSs
7
OmpSs
Other features and characteristics
– Contiguous / strided dependence detection and data management
– Nesting/recursion
– Multiple implementations for a given task
– Integrated with powerful performance tools (Extrae+Paraver)
Continuous development and use
– Since 2004
– On large applications and systems
Pushing ideas into the OpenMP standard
– Developer Positioning: efforts in OmpSs will not be lost.
– Dependences  v 4.0
18
Used in projects and applications …
Undertaken significant efforts to port real large scale
applications:
–
• Scalapack, PLASMA, SPECFEM3D, LBC, CPMD PSC, PEPC,
LS1 Mardyn, Asynchronous algorithms, Microbenchmarks
–
• YALES2, EUTERPE, SPECFEM3D, MP2C, BigDFT,
QuantumESPRESSO, PEPC, SMMP, ProFASI, COSMO, BQCD
– DEEP
• NEURON, iPIC3D, ECHAM/MESSy, AVBP, TurboRVB, Seismic
– G8_ECS
• CGPOP, NICAM (planed) …
– Consolider project (Spanish ministry)
• MRGENESIS
– BSC initiatives and collaborations:
• GROMACS, GADGET, WRF,…
19
… but NOT only for «scientific computing» …
Plagiarism detection
– Histograms, sorting, … (FhI FIRST)
Trace browsing
– Paraver (BSC)
Clustering algorithms
– G-means (BSC)
Image processing
– Tracking (USAF)
Embedded and consumer
– H.264 (TUBerlin), …
8
The potential of asynchrony
Performance …
– Cholesky 8K x 8K @ SandyBridge (E5-2670, 2.6 GHz)
– Intel Math Kernel Library (MKL) parallel vs. OmpSs+ MKL sequential
• Nested OmpSs (large block size, small block size)
 Reproducibility / less sensitivity to environment options
Strong scaling
Impact of problem size
for fixed core count
9
• Detected issues (ongoing work)
– scheduling policies (generic of nested OmpSs),
– contention on task queues (specific of high #cores),…
• Still fairly good performance!!!
Xeon Phi (I)
10
Hybrid MPI/SMPSs
Linpack example
Overlap communication/computation
Extend asynchronous data-flow execution to
outer level
Automatic lookahead
…
for (k=0; k<N; k++) {
if (mine) {
Factor_panel(A[k]);
send (A[k])
} else {
receive (A[k]);
if (necessary) resend (A[k]);
}
for (j=k+1; j<N; j++)
update (A[k], A[j]);
…
#pragma css task inout(A[SIZE])
void Factor_panel(float *A);
#pragma css task input(A[SIZE]) inout(B[SIZE])
void update(float *A, float *B);
#pragma css task input(A[SIZE])
void send(float *A);
#pragma css task output(A[SIZE])
void receive(float *A);
#pragma css task input(A[SIZE])
void resend(float *A);
P0 P1 P2
V. Marjanovic, et al, “Overlapping Communication and Computation by using a Hybrid MPI/SMPSs Approach” ICS 2010
11
Automatically achieved by the runtime
– Load balance within node
– Fine grain.
– Complementary to application level
load balance.
– Leverage OmpSs malleability
LeWI: Lend core When Idle
– User level Run time Library (DLB)
coordinating processes within node
– Fighting the Linux kernel: Explicit pinning of
threads and handoff scheduling
Overcome Amdahl’s law in hybrid
programming
– Enable partial node level parallelization
– hope for lazy programmers
… and us all
Dynamic Load Balancing: LeWI
Communication
Running task
Idle
4 MPI processes @ 4 cores node
“LeWI: A Runtime Balancing
Algorithm for Nested Parallelism”.
M.Garcia et al. ICPP09
12
Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP)
Host
KNC
KNC
KNC
KNC
Host
KNC
KNC
KNC
KNC
MPI
MPI
MPI
void main(int argc, char* argv[]) {
//MPI stuff ….
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
#pragma omp task out(worker_image[idx])
worker(worker_image[idx]);
#pragma omp task inout(master_image) in(worker_image[idx]);
accum(master_image, worker_image[idx])
}
#pragma omp taskwait
global_accum(global_pool, master_image); // MPI
}
Host
KNC
KNC
KNC
KNC
Host
KNC
KNC
KNC
KNC
MPI
OmpSs
OmpSs
void main(int argc, char* argv[]) {
//MPI stuff ….
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
// MPI stuff to send job to worker
//…
//….manually taking care of load balance
// …
// …. and collect results on worker_pool[idx]
accum(master_image, worker_image[idx])
}
global_accum(global_image, master_image); // MPI
}
void main(int argc, char* argv[]) {
for(int i=0; i<my_jobs; i++) {
int idx = i % workers;
#pragma omp task out(worker_image[idx])
worker(worker_image[idx]);
#pragma omp task inout(master_image) in(worker_image[idx]);
accum(global_image, worker_image[idx])
}
#pragma omp taskwait
}
13
Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP)
Productivity …
… and performance
… flexibility …
14
MPI + OmpSs @ MIC
– Hydro 8 MPI processes, 2 cores/process, 4 OmpSs threads/process
– Mapped to 64 hardware threads (16 cores)
4 OmpSs threads/Core
2 OmpSs threads/Core
3 OmpSs threads@ Core 1
2 OmpSs threads/Core
1 OmpSs threads@ Core 2
15
OmpSs: Enabler for exascale
 Can exploit very unstructured
parallelism
 Not just loop/data parallelism
 Easy to change structure
 Supports large amounts of
lookahead
 Not stalling for dependence satisfaction
 Allow for locality optimizations to
tolerate latency
 Overlap data transfers, prefetch
 Reuse
 Nicely hybridizes into MPI/StarSs
 Propagates to large scale the node level
dataflow characteristics
 Overlap communication and computation
 A chance against Amdahl’s law
 Support for heterogeneity
 Any # and combination of CPUs, GPUs
 Including autotuning
 Malleability: Decouple program from
resources
 Allowing dynamic resource allocation and
load balance
 Tolerate noise
15
Asynchrony: Data-flow
Locality
Simple / incremental
interface to programmers
Compatible with proprietary
lower level technologies
THANKS

Más contenido relacionado

La actualidad más candente

An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsLinaro
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法x1 ichi
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Jen Aman
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clustersBurak Himmetoglu
 
Flux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixFlux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixInfluxData
 
Linuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLinuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLenz Gschwendtner
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemFlink Forward
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Intel® Software
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsJeff Larkin
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel ProgrammingAlex Moore
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flinkFlink Forward
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat DetectionJen Aman
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraSomnath Mazumdar
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentationkammeyer
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Databricks
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityAndrii Gakhov
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 BasicFlink Forward
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
 

La actualidad más candente (20)

An evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loopsAn evaluation of LLVM compiler for SVE with fairly complicated loops
An evaluation of LLVM compiler for SVE with fairly complicated loops
 
あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法あなたのScalaを爆速にする7つの方法
あなたのScalaを爆速にする7つの方法
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
Using R in remote computer clusters
Using R in remote computer clustersUsing R in remote computer clusters
Using R in remote computer clusters
 
Flux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul DixFlux and InfluxDB 2.0 by Paul Dix
Flux and InfluxDB 2.0 by Paul Dix
 
Linuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talkLinuxconf 2011 parallel languages talk
Linuxconf 2011 parallel languages talk
 
Ufuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one SystemUfuc Celebi – Stream & Batch Processing in one System
Ufuc Celebi – Stream & Batch Processing in one System
 
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
Accelerate Your Python* Code through Profiling, Tuning, and Compilation Part ...
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
 
.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming.Net 4.0 Threading and Parallel Programming
.Net 4.0 Threading and Parallel Programming
 
Michael Häusler – Everyday flink
Michael Häusler – Everyday flinkMichael Häusler – Everyday flink
Michael Häusler – Everyday flink
 
scikit-cuda
scikit-cudascikit-cuda
scikit-cuda
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
A Graph-Based Method For Cross-Entity Threat Detection
 A Graph-Based Method For Cross-Entity Threat Detection A Graph-Based Method For Cross-Entity Threat Detection
A Graph-Based Method For Cross-Entity Threat Detection
 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
 
Buzzwords Numba Presentation
Buzzwords Numba PresentationBuzzwords Numba Presentation
Buzzwords Numba Presentation
 
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
Data Science and Deep Learning on Spark with 1/10th of the Code with Roope As...
 
Probabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. CardinalityProbabilistic data structures. Part 2. Cardinality
Probabilistic data structures. Part 2. Cardinality
 
Apache Flink Training: DataStream API Part 1 Basic
 Apache Flink Training: DataStream API Part 1 Basic Apache Flink Training: DataStream API Part 1 Basic
Apache Flink Training: DataStream API Part 1 Basic
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 

Similar a OmpSs - Improving Parallelism of OpenMP Applications

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomFacultad de Informática UCM
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA Japan
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingEd Kohlwey
 
Big data shim
Big data shimBig data shim
Big data shimtistrue
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine ParallelismSri Prasanna
 
Task based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationTask based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationFacultad de Informática UCM
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesAkihiro Hayashi
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalArvind Surve
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialRoger Rafanell Mas
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiDatabricks
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Anubhav Jain
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeRizwan Habib
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16MLconf
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Holden Karau
 
HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016Ehsan Totoni
 
Three Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataThree Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataDynamical Software, Inc.
 

Similar a OmpSs - Improving Parallelism of OpenMP Applications (20)

OpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroomOpenMP tasking model: from the standard to the classroom
OpenMP tasking model: from the standard to the classroom
 
NVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読みNVIDIA HPC ソフトウエア斜め読み
NVIDIA HPC ソフトウエア斜め読み
 
Beyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel ProcessingBeyond Map/Reduce: Getting Creative With Parallel Processing
Beyond Map/Reduce: Getting Creative With Parallel Processing
 
Big data shim
Big data shimBig data shim
Big data shim
 
MapReduce basics
MapReduce basicsMapReduce basics
MapReduce basics
 
Intermachine Parallelism
Intermachine ParallelismIntermachine Parallelism
Intermachine Parallelism
 
Task based Programming with OmpSs and its Application
Task based Programming with OmpSs and its ApplicationTask based Programming with OmpSs and its Application
Task based Programming with OmpSs and its Application
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS LanguagesChapel-on-X: Exploring Tasking Runtimes for PGAS Languages
Chapel-on-X: Exploring Tasking Runtimes for PGAS Languages
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul JindalOverview of Apache SystemML by Berthold Reinwald and Nakul Jindal
Overview of Apache SystemML by Berthold Reinwald and Nakul Jindal
 
IS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorialIS-ENES COMP Superscalar tutorial
IS-ENES COMP Superscalar tutorial
 
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali ZaidiNatural Language Processing with CNTK and Apache Spark with Ali Zaidi
Natural Language Processing with CNTK and Apache Spark with Ali Zaidi
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014Spark with Elasticsearch - umd version 2014
Spark with Elasticsearch - umd version 2014
 
HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016HPAT presentation at JuliaCon 2016
HPAT presentation at JuliaCon 2016
 
Three Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big DataThree Functional Programming Technologies for Big Data
Three Functional Programming Technologies for Big Data
 

Más de Intel IT Center

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- SupercomputingIntel IT Center
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraIntel IT Center
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationIntel IT Center
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsIntel IT Center
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationIntel IT Center
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Intel IT Center
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayIntel IT Center
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.Intel IT Center
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldIntel IT Center
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel IT Center
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...Intel IT Center
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital AgeIntel IT Center
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityIntel IT Center
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Intel IT Center
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel IT Center
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel IT Center
 

Más de Intel IT Center (20)

AI Crash Course- Supercomputing
AI Crash Course- SupercomputingAI Crash Course- Supercomputing
AI Crash Course- Supercomputing
 
FPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsaraFPGA Inference - DellEMC SURFsara
FPGA Inference - DellEMC SURFsara
 
High Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel StationHigh Memory Bandwidth Demo @ One Intel Station
High Memory Bandwidth Demo @ One Intel Station
 
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutionsINFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
INFOGRAPHIC: Advantages of Intel vs. IBM Power on SAP HANA solutions
 
Disrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User AuthenticationDisrupt Hackers With Robust User Authentication
Disrupt Hackers With Robust User Authentication
 
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
Strengthen Your Enterprise Arsenal Against Cyber Attacks With Hardware-Enhanc...
 
Harness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace TodayHarness Digital Disruption to Create 2022’s Workplace Today
Harness Digital Disruption to Create 2022’s Workplace Today
 
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.Don't Rely on Software Alone.Protect Endpoints with Hardware-Enhanced Security.
Don't Rely on Software Alone. Protect Endpoints with Hardware-Enhanced Security.
 
Achieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital WorldAchieve Unconstrained Collaboration in a Digital World
Achieve Unconstrained Collaboration in a Digital World
 
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing GuideIntel® Xeon® Scalable Processors Enabled Applications Marketing Guide
Intel® Xeon® Scalable Processors Enabled Applications Marketing Guide
 
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
#NABshow: National Association of Broadcasters 2017 Super Session Presentatio...
 
Identity Protection for the Digital Age
Identity Protection for the Digital AgeIdentity Protection for the Digital Age
Identity Protection for the Digital Age
 
Three Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a RealityThree Steps to Making a Digital Workplace a Reality
Three Steps to Making a Digital Workplace a Reality
 
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
Three Steps to Making The Digital Workplace a Reality - by Intel’s Chad Const...
 
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
Intel® Xeon® Processor E7-8800/4800 v4 EAMG 2.0
 
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Enterprise Database Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Core Business Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Financial Security Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Telco Cloud Digital Applications Showcase
 
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications ShowcaseIntel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
Intel® Xeon® Processor E5-2600 v4 Tech Computing Applications Showcase
 

Último

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 

Último (20)

How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 

OmpSs - Improving Parallelism of OpenMP Applications

  • 1. www.bsc.es Leipzig, June 17th 2013 Jesus Labarta Jesus.labarta@bsc.es OmpSs – improving the scalability of OpenMP
  • 2. 2 The parallel programming revolution Parallel programming in the past – Where to place data – What to run where – How to communicate Parallel programming in the future – What do I need to compute – What data do I need to use – hints (not necessarily very precise) on potential concurrency, locality,… Schedule @ programmers mind Static Schedule @ system Dynamic Complexity  ! Awareness Variability
  • 3. 3 Key concept – Sequential task based program on single address/name space + directionality annotations – Happens to execute parallel: Automatic run time computation of dependencies between tasks Differentiation of StarSs – Dependences: Tasks instantiated but not ready. Order IS defined • Lookahead – Avoid stalling the main control flow when a computation depending on previous tasks is reached – Possibility to “see” the future searching for further potential concurrency – Locality aware – Homogenizing heterogeneity The StarSs family of programming models
  • 4. 4 The StarSs “Granularities” StarSs OmpSs COMPSs @ SMP @ GPU @ Cluster Average task Granularity: 100 microseconds – 10 milliseconds 1second - 1 day Language binding: C, C++, FORTRAN Java, Python Address space to compute dependences: Memory Files, Objects (SCM) Parallel Ensemble, workflow
  • 5. 5 OpenMP void Cholesky(int NT, float *A[NT][NT] ) { #pragma omp parallel #pragma omp single for (int k=0; k<NT; k++) { #pragma omp task spotrf (A[k][k], TS); #pragma omp taskwait for (int i=k+1; i<NT; i++) { #pragma omp task strsm (A[k][k], A[k][i], TS); } #pragma omp taskwait for (int i=k+1; i<NT; i++) { for (j=k+1; j<i; j++) #pragma omp task sgemm( A[k][i], A[k][j], A[j][i], TS); #pragma omp task ssyrk (A[k][i], A[i][i], TS); #pragma omp taskwait } } }
  • 6. 6 Extend OpenMP with a data-flow execution model to exploit unstructured parallelism (compared to fork-join) – in/out pragmas to enable dependence and locality – Inlined/outlined void Cholesky(int NT, float *A[NT][NT] ) { for (int k=0; k<NT; k++) { #pragma omp task inout ([TS][TS]A[k][k]) spotrf (A[k][k], TS) ; for (int i=k+1; i<NT; i++) { #pragma omp task in(([TS][TS]A[k][k])) inout ([TS][TS]A[k][i]) strsm (A[k][k], A[k][i], TS); } for (int i=k+1; i<NT; i++) { for (j=k+1; j<i; j++) { #pragma omp task in([TS][TS]A[k][i]), in([TS][TS]A[k][j]) inout ([TS][TS]A[j][i]) sgemm( A[k][i], A[k][j], A[j][i], TS); } #pragma omp task in ([TS][TS]A[k][i]) inout([TS][TS]A[i][i]) ssyrk (A[k][i], A[i][i], TS); } } } OmpSs
  • 7. 7 OmpSs Other features and characteristics – Contiguous / strided dependence detection and data management – Nesting/recursion – Multiple implementations for a given task – Integrated with powerful performance tools (Extrae+Paraver) Continuous development and use – Since 2004 – On large applications and systems Pushing ideas into the OpenMP standard – Developer Positioning: efforts in OmpSs will not be lost. – Dependences  v 4.0 18 Used in projects and applications … Undertaken significant efforts to port real large scale applications: – • Scalapack, PLASMA, SPECFEM3D, LBC, CPMD PSC, PEPC, LS1 Mardyn, Asynchronous algorithms, Microbenchmarks – • YALES2, EUTERPE, SPECFEM3D, MP2C, BigDFT, QuantumESPRESSO, PEPC, SMMP, ProFASI, COSMO, BQCD – DEEP • NEURON, iPIC3D, ECHAM/MESSy, AVBP, TurboRVB, Seismic – G8_ECS • CGPOP, NICAM (planed) … – Consolider project (Spanish ministry) • MRGENESIS – BSC initiatives and collaborations: • GROMACS, GADGET, WRF,… 19 … but NOT only for «scientific computing» … Plagiarism detection – Histograms, sorting, … (FhI FIRST) Trace browsing – Paraver (BSC) Clustering algorithms – G-means (BSC) Image processing – Tracking (USAF) Embedded and consumer – H.264 (TUBerlin), …
  • 8. 8 The potential of asynchrony Performance … – Cholesky 8K x 8K @ SandyBridge (E5-2670, 2.6 GHz) – Intel Math Kernel Library (MKL) parallel vs. OmpSs+ MKL sequential • Nested OmpSs (large block size, small block size)  Reproducibility / less sensitivity to environment options Strong scaling Impact of problem size for fixed core count
  • 9. 9 • Detected issues (ongoing work) – scheduling policies (generic of nested OmpSs), – contention on task queues (specific of high #cores),… • Still fairly good performance!!! Xeon Phi (I)
  • 10. 10 Hybrid MPI/SMPSs Linpack example Overlap communication/computation Extend asynchronous data-flow execution to outer level Automatic lookahead … for (k=0; k<N; k++) { if (mine) { Factor_panel(A[k]); send (A[k]) } else { receive (A[k]); if (necessary) resend (A[k]); } for (j=k+1; j<N; j++) update (A[k], A[j]); … #pragma css task inout(A[SIZE]) void Factor_panel(float *A); #pragma css task input(A[SIZE]) inout(B[SIZE]) void update(float *A, float *B); #pragma css task input(A[SIZE]) void send(float *A); #pragma css task output(A[SIZE]) void receive(float *A); #pragma css task input(A[SIZE]) void resend(float *A); P0 P1 P2 V. Marjanovic, et al, “Overlapping Communication and Computation by using a Hybrid MPI/SMPSs Approach” ICS 2010
  • 11. 11 Automatically achieved by the runtime – Load balance within node – Fine grain. – Complementary to application level load balance. – Leverage OmpSs malleability LeWI: Lend core When Idle – User level Run time Library (DLB) coordinating processes within node – Fighting the Linux kernel: Explicit pinning of threads and handoff scheduling Overcome Amdahl’s law in hybrid programming – Enable partial node level parallelization – hope for lazy programmers … and us all Dynamic Load Balancing: LeWI Communication Running task Idle 4 MPI processes @ 4 cores node “LeWI: A Runtime Balancing Algorithm for Nested Parallelism”. M.Garcia et al. ICPP09
  • 12. 12 Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP) Host KNC KNC KNC KNC Host KNC KNC KNC KNC MPI MPI MPI void main(int argc, char* argv[]) { //MPI stuff …. for(int i=0; i<my_jobs; i++) { int idx = i % workers; #pragma omp task out(worker_image[idx]) worker(worker_image[idx]); #pragma omp task inout(master_image) in(worker_image[idx]); accum(master_image, worker_image[idx]) } #pragma omp taskwait global_accum(global_pool, master_image); // MPI } Host KNC KNC KNC KNC Host KNC KNC KNC KNC MPI OmpSs OmpSs void main(int argc, char* argv[]) { //MPI stuff …. for(int i=0; i<my_jobs; i++) { int idx = i % workers; // MPI stuff to send job to worker //… //….manually taking care of load balance // … // …. and collect results on worker_pool[idx] accum(master_image, worker_image[idx]) } global_accum(global_image, master_image); // MPI } void main(int argc, char* argv[]) { for(int i=0; i<my_jobs; i++) { int idx = i % workers; #pragma omp task out(worker_image[idx]) worker(worker_image[idx]); #pragma omp task inout(master_image) in(worker_image[idx]); accum(global_image, worker_image[idx]) } #pragma omp taskwait }
  • 13. 13 Offloading to Intel® Xeon Phi™ (SRMPI app @ DEEP) Productivity … … and performance … flexibility …
  • 14. 14 MPI + OmpSs @ MIC – Hydro 8 MPI processes, 2 cores/process, 4 OmpSs threads/process – Mapped to 64 hardware threads (16 cores) 4 OmpSs threads/Core 2 OmpSs threads/Core 3 OmpSs threads@ Core 1 2 OmpSs threads/Core 1 OmpSs threads@ Core 2
  • 15. 15 OmpSs: Enabler for exascale  Can exploit very unstructured parallelism  Not just loop/data parallelism  Easy to change structure  Supports large amounts of lookahead  Not stalling for dependence satisfaction  Allow for locality optimizations to tolerate latency  Overlap data transfers, prefetch  Reuse  Nicely hybridizes into MPI/StarSs  Propagates to large scale the node level dataflow characteristics  Overlap communication and computation  A chance against Amdahl’s law  Support for heterogeneity  Any # and combination of CPUs, GPUs  Including autotuning  Malleability: Decouple program from resources  Allowing dynamic resource allocation and load balance  Tolerate noise 15 Asynchrony: Data-flow Locality Simple / incremental interface to programmers Compatible with proprietary lower level technologies