SlideShare una empresa de Scribd logo
1 de 14
Descargar para leer sin conexión
OpenMP and NVIDIA
Jeff Larkin
NVIDIA Developer Technologies
OpenMP and NVIDIA
OpenMP is the dominant standard for directive-based parallel
programming.
NVIDIA joined OpenMP in 2011 to contribute to discussions around
parallel accelerators.
NVIDIA proposed the TEAMS construct for accelerators in 2012
OpenMP 4.0 with accelerator support released in 2013
Why does NVIDIA care
about OpenMP?
3 Ways to Accelerate Applications
Applications
Libraries

Compiler
Directives

Programming
Languages

“Drop-in”
Acceleration

Easily Accelerate
Applications

Maximum
Flexibility
Reaching a Broader Set of Developers

1,000,000’s

Universities
Supercomputing Centers
Oil & Gas

100,000’s
Research

Early Adopters

2004

Time

CAE
CFD
Finance
Rendering
Data Analytics
Life Sciences
Defense
Weather
Climate
Plasma Physics

Present
Introduction to
OpenMP
Teams/Distribute
OpenMP PARALLEL FOR
Executes the iterations of the
next for loop in parallel across
a team of threads.

#pragma omp parallel for
for (i=0; i<N; i++)
p[i] = v1[i] * v2[i];
OpenMP TARGET PARALLEL FOR
Offloads data and execution to
a target device, then
Executes the iterations of the
next for loop in parallel across
a team of threads.

#pragma omp target
#pragma omp parallel for
for (i=0; i<N; i++)
p[i] = v1[i] * v2[i];
OpenMP 4.0 TEAMS/DISTRIBUTE
Creates a league of teams on
the target device, distributes
blocks of work among those
teams, and executes the
remaining work in parallel
within each team

This code is portable whether 1
team or many teams are used.

#pragma omp target teams
#pragma omp 
distribute parallel for 
reduction(+:sum)
for (i=0; i<N; i++)
sum += B[i] * C[i];
OpenMP 4.0 Teams Distribute Parallel For
#pragma omp target
#pragma omp parallel for reduction(+:sum)
for (i=0; i< ??;i++) sum += B[i] **C[i];
N; i++) sum += B[i]
C[i];
OpenMP 4.0 Teams Distribute Parallel For
#pragma omp target teams
#pragma omp distribute parallel for
#pragma omp parallel for reduction(+:sum)
for (i=0; i< ??; i++) sum += B[i] * C[i];

The programmer
doesn’t have to
#pragma omp parallel for reduction(+:sum) think about how
the loop is
for (i=??; i< ??; i++) sum += B[i] * C[i];
decomposed!

#pragma omp parallel for reduction(+:sum)
for (i=??; i< ??; i++) sum += B[i] * C[i];
#pragma omp parallel for reduction(+:sum)
for (i=??; i< N; i++) sum += B[i] * C[i];
OMP + NV: We’re not
done yet!
OMP + NV: We’re not done yet!
Hardware parallelism is not going away, programmers demand a
simple, portable way to use it.
OpenMP 4.0 is just the first step toward a portable standard for
directive-based acceleration
We will continue to work with OpenMP to address the challenges
of parallel computing
Improved Interoperability
Improved Portability
Improved Expressibility
Thank You

Más contenido relacionado

La actualidad más candente

Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Chris Fregly
 
Concurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionConcurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionCherryBerry2
 
Distributed Prioritized Experience Replay(Ape-X)
Distributed Prioritized Experience Replay(Ape-X)Distributed Prioritized Experience Replay(Ape-X)
Distributed Prioritized Experience Replay(Ape-X)Younggyo Seo
 
Compiler Optimization-Space Exploration
Compiler Optimization-Space ExplorationCompiler Optimization-Space Exploration
Compiler Optimization-Space Explorationtmusabbir
 
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...Edward Willink
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...Databricks
 
Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Akhila Prabhakaran
 
CILK/CILK++ and Reducers
CILK/CILK++ and ReducersCILK/CILK++ and Reducers
CILK/CILK++ and ReducersYunming Zhang
 
Two-level Just-in-Time Compilation with One Interpreter and One Engine
Two-level Just-in-Time Compilation with One Interpreter and One EngineTwo-level Just-in-Time Compilation with One Interpreter and One Engine
Two-level Just-in-Time Compilation with One Interpreter and One EngineYusuke Izawa
 
Matlab m files and scripts
Matlab m files and scriptsMatlab m files and scripts
Matlab m files and scriptsAmeen San
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Yusuke Izawa
 

La actualidad más candente (20)

Openmp
OpenmpOpenmp
Openmp
 
Introduction to OpenMP
Introduction to OpenMPIntroduction to OpenMP
Introduction to OpenMP
 
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
Swift for TensorFlow - Tanmay Bakshi - Advanced Spark and TensorFlow Meetup -...
 
Concurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System DiscussionConcurrent Programming OpenMP @ Distributed System Discussion
Concurrent Programming OpenMP @ Distributed System Discussion
 
Distributed Prioritized Experience Replay(Ape-X)
Distributed Prioritized Experience Replay(Ape-X)Distributed Prioritized Experience Replay(Ape-X)
Distributed Prioritized Experience Replay(Ape-X)
 
Compiler Optimization-Space Exploration
Compiler Optimization-Space ExplorationCompiler Optimization-Space Exploration
Compiler Optimization-Space Exploration
 
Peephole Optimization
Peephole OptimizationPeephole Optimization
Peephole Optimization
 
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
Local Optimizations in Eclipse QVTc and QVTr using the Micro-Mapping Model of...
 
Nug2004 yhe
Nug2004 yheNug2004 yhe
Nug2004 yhe
 
Callgraph analysis
Callgraph analysisCallgraph analysis
Callgraph analysis
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 
Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)Introduction to OpenMP (Performance)
Introduction to OpenMP (Performance)
 
Lecture 1
Lecture 1Lecture 1
Lecture 1
 
CILK/CILK++ and Reducers
CILK/CILK++ and ReducersCILK/CILK++ and Reducers
CILK/CILK++ and Reducers
 
Two-level Just-in-Time Compilation with One Interpreter and One Engine
Two-level Just-in-Time Compilation with One Interpreter and One EngineTwo-level Just-in-Time Compilation with One Interpreter and One Engine
Two-level Just-in-Time Compilation with One Interpreter and One Engine
 
openmp
openmpopenmp
openmp
 
Mit cilk
Mit cilkMit cilk
Mit cilk
 
Matlab m files and scripts
Matlab m files and scriptsMatlab m files and scripts
Matlab m files and scripts
 
Lesson 9.1 value returning
Lesson 9.1 value returningLesson 9.1 value returning
Lesson 9.1 value returning
 
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
Stack Hybridization: A Mechanism for Bridging Two Compilation Strategies in a...
 

Similar a SC13: OpenMP and NVIDIA

OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...Ganesan Narayanasamy
 
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5Jeff Larkin
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016MLconf
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsJeff Larkin
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterSudhang Shankar
 
DevOps Demystified
DevOps DemystifiedDevOps Demystified
DevOps Demystifiedadil raja
 
Chapel Comes of Age: a Language for Productivity, Parallelism, and Performance
Chapel Comes of Age: a Language for Productivity, Parallelism, and PerformanceChapel Comes of Age: a Language for Productivity, Parallelism, and Performance
Chapel Comes of Age: a Language for Productivity, Parallelism, and Performanceinside-BigData.com
 
Maths&programming forartists wip
Maths&programming forartists wipMaths&programming forartists wip
Maths&programming forartists wipkedar nath
 
Parallel processing -open mp
Parallel processing -open mpParallel processing -open mp
Parallel processing -open mpTanjilla Sarkar
 
Better Code: Concurrency
Better Code: ConcurrencyBetter Code: Concurrency
Better Code: ConcurrencyPlatonov Sergey
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeRizwan Habib
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16MLconf
 
Parallel and Async Programming With C#
Parallel and Async Programming With C#Parallel and Async Programming With C#
Parallel and Async Programming With C#Rainer Stropek
 
R4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning FrameworkR4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning FrameworkAlok Singh
 

Similar a SC13: OpenMP and NVIDIA (20)

OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...OpenPOWER Webinar from University of Delaware  - Title :OpenMP (offloading) o...
OpenPOWER Webinar from University of Delaware - Title :OpenMP (offloading) o...
 
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
GTC16 - S6410 - Comparing OpenACC 2.5 and OpenMP 4.5
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
 
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUsEarly Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
Early Results of OpenMP 4.5 Portability on NVIDIA GPUs & CPUs
 
Parallel Programming on the ANDC cluster
Parallel Programming on the ANDC clusterParallel Programming on the ANDC cluster
Parallel Programming on the ANDC cluster
 
DevOps Demystified
DevOps DemystifiedDevOps Demystified
DevOps Demystified
 
Chapel Comes of Age: a Language for Productivity, Parallelism, and Performance
Chapel Comes of Age: a Language for Productivity, Parallelism, and PerformanceChapel Comes of Age: a Language for Productivity, Parallelism, and Performance
Chapel Comes of Age: a Language for Productivity, Parallelism, and Performance
 
Maths&programming forartists wip
Maths&programming forartists wipMaths&programming forartists wip
Maths&programming forartists wip
 
Parallel processing -open mp
Parallel processing -open mpParallel processing -open mp
Parallel processing -open mp
 
Better Code: Concurrency
Better Code: ConcurrencyBetter Code: Concurrency
Better Code: Concurrency
 
Python introduction
Python introductionPython introduction
Python introduction
 
NYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKeeNYAI - Scaling Machine Learning Applications by Braxton McKee
NYAI - Scaling Machine Learning Applications by Braxton McKee
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
Matlab ppt
Matlab pptMatlab ppt
Matlab ppt
 
Parallel and Async Programming With C#
Parallel and Async Programming With C#Parallel and Async Programming With C#
Parallel and Async Programming With C#
 
What`s New in Java 8
What`s New in Java 8What`s New in Java 8
What`s New in Java 8
 
presentation
presentationpresentation
presentation
 
25-MPI-OpenMP.pptx
25-MPI-OpenMP.pptx25-MPI-OpenMP.pptx
25-MPI-OpenMP.pptx
 
R4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning FrameworkR4ML: An R Based Scalable Machine Learning Framework
R4ML: An R Based Scalable Machine Learning Framework
 
Data Science
Data ScienceData Science
Data Science
 

Más de Jeff Larkin

Best Practices for OpenMP on GPUs - OpenMP UK Users Group
Best Practices for OpenMP on GPUs - OpenMP UK Users GroupBest Practices for OpenMP on GPUs - OpenMP UK Users Group
Best Practices for OpenMP on GPUs - OpenMP UK Users GroupJeff Larkin
 
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesFortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesJeff Larkin
 
Performance Portability Through Descriptive Parallelism
Performance Portability Through Descriptive ParallelismPerformance Portability Through Descriptive Parallelism
Performance Portability Through Descriptive ParallelismJeff Larkin
 
GTC16 - S6510 - Targeting GPUs with OpenMP 4.5
GTC16 - S6510 - Targeting GPUs with OpenMP 4.5GTC16 - S6510 - Targeting GPUs with OpenMP 4.5
GTC16 - S6510 - Targeting GPUs with OpenMP 4.5Jeff Larkin
 
Refactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid ArchitecturesRefactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid ArchitecturesJeff Larkin
 
Optimizing GPU to GPU Communication on Cray XK7
Optimizing GPU to GPU Communication on Cray XK7Optimizing GPU to GPU Communication on Cray XK7
Optimizing GPU to GPU Communication on Cray XK7Jeff Larkin
 
Progress Toward Accelerating CAM-SE
Progress Toward Accelerating CAM-SEProgress Toward Accelerating CAM-SE
Progress Toward Accelerating CAM-SEJeff Larkin
 
HPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorialHPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorialJeff Larkin
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingJeff Larkin
 
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Jeff Larkin
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-optJeff Larkin
 
A Comparison of Accelerator Programming Models
A Comparison of Accelerator Programming ModelsA Comparison of Accelerator Programming Models
A Comparison of Accelerator Programming ModelsJeff Larkin
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesJeff Larkin
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best PracticesJeff Larkin
 
Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)
Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)
Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)Jeff Larkin
 

Más de Jeff Larkin (15)

Best Practices for OpenMP on GPUs - OpenMP UK Users Group
Best Practices for OpenMP on GPUs - OpenMP UK Users GroupBest Practices for OpenMP on GPUs - OpenMP UK Users Group
Best Practices for OpenMP on GPUs - OpenMP UK Users Group
 
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC DirectivesFortranCon2020: Highly Parallel Fortran and OpenACC Directives
FortranCon2020: Highly Parallel Fortran and OpenACC Directives
 
Performance Portability Through Descriptive Parallelism
Performance Portability Through Descriptive ParallelismPerformance Portability Through Descriptive Parallelism
Performance Portability Through Descriptive Parallelism
 
GTC16 - S6510 - Targeting GPUs with OpenMP 4.5
GTC16 - S6510 - Targeting GPUs with OpenMP 4.5GTC16 - S6510 - Targeting GPUs with OpenMP 4.5
GTC16 - S6510 - Targeting GPUs with OpenMP 4.5
 
Refactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid ArchitecturesRefactoring Applications for the XK7 and Future Hybrid Architectures
Refactoring Applications for the XK7 and Future Hybrid Architectures
 
Optimizing GPU to GPU Communication on Cray XK7
Optimizing GPU to GPU Communication on Cray XK7Optimizing GPU to GPU Communication on Cray XK7
Optimizing GPU to GPU Communication on Cray XK7
 
Progress Toward Accelerating CAM-SE
Progress Toward Accelerating CAM-SEProgress Toward Accelerating CAM-SE
Progress Toward Accelerating CAM-SE
 
HPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorialHPCMPUG2011 cray tutorial
HPCMPUG2011 cray tutorial
 
CUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU ComputingCUG2011 Introduction to GPU Computing
CUG2011 Introduction to GPU Computing
 
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
Maximizing Application Performance on Cray XT6 and XE6 Supercomputers DOD-MOD...
 
May2010 hex-core-opt
May2010 hex-core-optMay2010 hex-core-opt
May2010 hex-core-opt
 
A Comparison of Accelerator Programming Models
A Comparison of Accelerator Programming ModelsA Comparison of Accelerator Programming Models
A Comparison of Accelerator Programming Models
 
Cray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best PracticesCray XT Porting, Scaling, and Optimization Best Practices
Cray XT Porting, Scaling, and Optimization Best Practices
 
XT Best Practices
XT Best PracticesXT Best Practices
XT Best Practices
 
Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)
Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)
Practical Examples for Efficient I/O on Cray XT Systems (CUG 2009)
 

Último

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 

Último (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 

SC13: OpenMP and NVIDIA

  • 1. OpenMP and NVIDIA Jeff Larkin NVIDIA Developer Technologies
  • 2. OpenMP and NVIDIA OpenMP is the dominant standard for directive-based parallel programming. NVIDIA joined OpenMP in 2011 to contribute to discussions around parallel accelerators. NVIDIA proposed the TEAMS construct for accelerators in 2012 OpenMP 4.0 with accelerator support released in 2013
  • 3. Why does NVIDIA care about OpenMP?
  • 4. 3 Ways to Accelerate Applications Applications Libraries Compiler Directives Programming Languages “Drop-in” Acceleration Easily Accelerate Applications Maximum Flexibility
  • 5. Reaching a Broader Set of Developers 1,000,000’s Universities Supercomputing Centers Oil & Gas 100,000’s Research Early Adopters 2004 Time CAE CFD Finance Rendering Data Analytics Life Sciences Defense Weather Climate Plasma Physics Present
  • 7. OpenMP PARALLEL FOR Executes the iterations of the next for loop in parallel across a team of threads. #pragma omp parallel for for (i=0; i<N; i++) p[i] = v1[i] * v2[i];
  • 8. OpenMP TARGET PARALLEL FOR Offloads data and execution to a target device, then Executes the iterations of the next for loop in parallel across a team of threads. #pragma omp target #pragma omp parallel for for (i=0; i<N; i++) p[i] = v1[i] * v2[i];
  • 9. OpenMP 4.0 TEAMS/DISTRIBUTE Creates a league of teams on the target device, distributes blocks of work among those teams, and executes the remaining work in parallel within each team This code is portable whether 1 team or many teams are used. #pragma omp target teams #pragma omp distribute parallel for reduction(+:sum) for (i=0; i<N; i++) sum += B[i] * C[i];
  • 10. OpenMP 4.0 Teams Distribute Parallel For #pragma omp target #pragma omp parallel for reduction(+:sum) for (i=0; i< ??;i++) sum += B[i] **C[i]; N; i++) sum += B[i] C[i];
  • 11. OpenMP 4.0 Teams Distribute Parallel For #pragma omp target teams #pragma omp distribute parallel for #pragma omp parallel for reduction(+:sum) for (i=0; i< ??; i++) sum += B[i] * C[i]; The programmer doesn’t have to #pragma omp parallel for reduction(+:sum) think about how the loop is for (i=??; i< ??; i++) sum += B[i] * C[i]; decomposed! #pragma omp parallel for reduction(+:sum) for (i=??; i< ??; i++) sum += B[i] * C[i]; #pragma omp parallel for reduction(+:sum) for (i=??; i< N; i++) sum += B[i] * C[i];
  • 12. OMP + NV: We’re not done yet!
  • 13. OMP + NV: We’re not done yet! Hardware parallelism is not going away, programmers demand a simple, portable way to use it. OpenMP 4.0 is just the first step toward a portable standard for directive-based acceleration We will continue to work with OpenMP to address the challenges of parallel computing Improved Interoperability Improved Portability Improved Expressibility