Ralph H. Castain
Intel
Joshua Hursey
IBM
PMIx: State-of-the-Union
https://pmix.org
Slack: pmix-workspace.slack.com
Agenda
• Quick status update
 Library releases
 Standards documents
• Application examples
 PMIx Basics: Querying information
 OpenSHMEM
 Interlibrary coordination
 MPI Sessions: Dynamic Process Grouping
 Dynamic Programming Models (web only)
Ralph H. Castain, Joshua Hursey, Aurelien Bouteiller, David Solt, “PMIx: Process management for exascale
environments”, Parallel Computing, 2018.
https://doi.org/10.1016/j.parco.2018.08.002
Overview Paper
• Standards Documents
 v2.0 released Sept. 27, 2018
 v2.1 (errata/clarifications, Dec.)
 v3.0 (Dec.)
 v4.0 (1Q19)
Status Snapshot
Current Library Releases
https://github.com/pmix/pmix-standard/releases
• Clarify terms
 Session, job, application
• New chapter detailing namespace registration
data organization
• Data retrieval examples
 Application size in multi-app scenarios
 What rank to provide for which data attributes
PMIx Standard – In Progress
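As a quick illustration of the rank question above (a minimal sketch, not taken from the deck; it assumes the PMIX_JOB_SIZE and PMIX_APP_SIZE attributes and the convention of using the wildcard rank for job-realm data and the caller's own rank for app-realm data):

#include <pmix.h>
#include <stdio.h>
#include <string.h>

/* Assumes PMIx_Init(&myproc, NULL, 0) has already been called. */
static void show_sizes(const pmix_proc_t *myproc) {
    pmix_proc_t wildcard;
    pmix_value_t *val = NULL;

    /* Job-realm attribute: query with the wildcard rank */
    PMIX_PROC_CONSTRUCT(&wildcard);
    (void)strncpy(wildcard.nspace, myproc->nspace, PMIX_MAX_NSLEN);
    wildcard.rank = PMIX_RANK_WILDCARD;
    if (PMIX_SUCCESS == PMIx_Get(&wildcard, PMIX_JOB_SIZE, NULL, 0, &val)) {
        printf("job size: %u\n", val->data.uint32);
        PMIX_VALUE_RELEASE(val);
    }

    /* App-realm attribute: query with our own rank so the server
     * resolves the application we belong to in a multi-app launch */
    if (PMIX_SUCCESS == PMIx_Get(myproc, PMIX_APP_SIZE, NULL, 0, &val)) {
        printf("my app size: %u\n", val->data.uint32);
        PMIX_VALUE_RELEASE(val);
    }
}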
• Standards Documents
 v2.0 released Sept. 27, 2018
 v2.1 (errata/clarifications, Dec.)
 v3.0 (Dec.)
 v4.0 (1Q19)
• Reference Implementation
 v3.0 => v3.1 (Dec)
 v2.1 => v2.2 (Dec)
Status Snapshot
Current Library Releases
New dstore implementation
Full Standard Compliance
• v2 => workflow orchestration
 (Ch.7) Job Allocation Management and Reporting, (Ch.8) Event Notification
• v3 => tools support
 Security credentials, I/O management
• v4 => tools, network, dynamic programming models
 Complete debugger attach/detach/re-attach
• Direct and indirect launch
 Fabric topology information (switches, NICs, …)
• Communication cost, connectivity graphs, inventory collection
 PMIx Groups
 Bindings (Python, …)
What’s in the Standard?
• PRRTE
 Formal releases to begin in Dec. or Jan.
 Ties to external PMIx, libevent, hwloc installations
 Support up thru current head PMIx master branch
• Cross-version support
 Regularly tested, automated script
• Help Wanted: Container developers for regular regression testing
 v2.1.1 and above remain interchangeable
 https://pmix.org/support/faq/how-does-pmix-work-with-containers/
Status Snapshot
Adoption Updates
• MPI use-cases
 Re-ordering for load balance
(UTK/ECP)
 Fault management (UTK)
 On-the-fly session
formation/teardown (MPIF)
• MPI libraries
 OMPI, MPICH, Intel MPI,
HPE-MPI, Spectrum MPI,
Fujitsu MPI
• Resource Managers (RMs)
 Slurm, Fujitsu,
IBM’s Job Step Manager (JSM),
PBSPro (2019), Kubernetes(?)
• Spark, TensorFlow
 Just getting underway
• OpenSHMEM
 Multiple implementations
Artem Y. Polyakov, Joshua S. Ladd, Boris I. Karasev
Mellanox Technologies
Shared Memory Optimizations (ds21)
• Improvements to intra-node performance
 Highly optimized the intra-node communication component dstor/ds21[1]
• Significantly improved the PMIx_Get latency using the 2N-lock algorithm on systems with a large
number of processes per node accessing many keys.
• Designed for exascale: Deployed in production on CORAL systems, Summit and Sierra,
POWER9-based systems.
• Available in PMIx v2.2, v3.1, and subsequent releases.
 Enabled by default.
• Improvements to the SLURM PMIx plug-in
 Added a ring-based PMIx_Fence collective (Slurm v18.08)
 Added support for high-performance RDMA point-to-point
via UCX (Slurm v17.11) [2]
 Verified PMIx compatibility through v3.x API (Slurm v18.08)
[1] Artem Polyakov et al. A Scalable PMIx Database (poster) EuroMPI’18: https://eurompi2018.bsc.es/sites/default/files/uploaded/dstore_empi2018.pdf
[2] Artem Polyakov PMIx Plugin with UCX Support (SC17), SchedMD Booth talk: https://slurm.schedmd.com/publications.html
Mellanox update
(Figure: PMIx_Get() latency on a POWER8 system)
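For context, the operation these optimizations target is the standard PMIx business-card exchange: each process posts data with PMIx_Put/PMIx_Commit, the job synchronizes with PMIx_Fence (optionally collecting all posted data), and peers retrieve values with PMIx_Get. A minimal fence sketch in generic PMIx client code (not Slurm plug-in code):

#include <pmix.h>
#include <string.h>

/* Full-job fence that gathers all posted data; this is the kind of
 * collective the ring-based Slurm fence implements. */
static pmix_status_t full_modex(const pmix_proc_t *myproc)
{
    pmix_proc_t wildcard;
    pmix_info_t info;
    bool collect = true;

    PMIX_PROC_CONSTRUCT(&wildcard);
    (void)strncpy(wildcard.nspace, myproc->nspace, PMIX_MAX_NSLEN);
    wildcard.rank = PMIX_RANK_WILDCARD;

    PMIX_INFO_CONSTRUCT(&info);
    PMIX_INFO_LOAD(&info, PMIX_COLLECT_DATA, &collect, PMIX_BOOL);
    return PMIx_Fence(&wildcard, 1, &info, 1);
}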
Joshua Hursey
IBM
PMIx Basics: Querying Information
IBM Systems
Harnessing PMIx capabilities
#include <pmix.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    pmix_proc_t myproc, proc_wildcard;
    pmix_value_t value;
    pmix_value_t *val = &value;
    uint32_t job_size;
    char hostname[256];

    PMIx_Init(&myproc, NULL, 0);

    PMIX_PROC_CONSTRUCT(&proc_wildcard);
    (void)strncpy(proc_wildcard.nspace, myproc.nspace, PMIX_MAX_NSLEN);
    proc_wildcard.rank = PMIX_RANK_WILDCARD;

    PMIx_Get(&proc_wildcard, PMIX_JOB_SIZE, NULL, 0, &val);
    job_size = val->data.uint32;

    PMIx_Get(&myproc, PMIX_HOSTNAME, NULL, 0, &val);
    strncpy(hostname, val->data.string, 256);
    printf("%d/%d) Hello World from %s\n", myproc.rank, job_size, hostname);

    PMIx_Get(&proc_wildcard, PMIX_LOCALLDR, NULL, 0, &val);    // Lowest rank on this node
    printf("%d/%d) Lowest Local Rank: %d\n", myproc.rank, job_size, val->data.rank);

    PMIx_Get(&proc_wildcard, PMIX_LOCAL_SIZE, NULL, 0, &val);  // Number of ranks on this node
    printf("%d/%d) Local Ranks: %d\n", myproc.rank, job_size, val->data.uint32);

    PMIx_Fence(&proc_wildcard, 1, NULL, 0);  // Synchronize processes (not required, just for demo)
    PMIx_Finalize(NULL, 0);                  // Cleanup
    return 0;
}
• https://pmix.org/support/faq/rm-provided-information/
• https://pmix.org/support/faq/what-information-is-an-rm-supposed-to-provide/
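As a usage sketch (build and launch commands are assumptions, not from the slide): the example can typically be compiled with something like cc hello_pmix.c -lpmix and started under a PMIx-enabled launcher, e.g. PRRTE's prun or Slurm's srun --mpi=pmix, which provides the PMIx server that PMIx_Init() connects to.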
IBM Systems
Harnessing PMIx capabilities
volatile bool isdone = false;

static void cbfunc(pmix_status_t status, pmix_info_t *info, size_t ninfo, void *cbdata,
                   pmix_release_cbfunc_t release_fn, void *release_cbdata)
{
    size_t n;
    if (0 < ninfo) {
        for (n = 0; n < ninfo; n++) {
            // Iterate through the info[n].value.data.darray->array of:
            /*
             * typedef struct pmix_proc_info {
             *     pmix_proc_t proc;
             *     char *hostname;
             *     char *executable_name;
             *     pid_t pid;
             *     int exit_code;
             *     pmix_proc_state_t state;
             * } pmix_proc_info_t;
             */
        }
    }
    if (NULL != release_fn) {
        release_fn(release_cbdata);
    }
    isdone = true;
}
• https://pmix.org/support/how-to/example-direct-launch-debugger-tool/
• https://pmix.org/support/how-to/example-indirect-launch-debugger-tool/
#include <pmix.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv) {
    pmix_query_t *query;
    pmix_status_t rc;
    char clientspace[PMIX_MAX_NSLEN+1];

    // ... Initialize and setup – PMIx_tool_init(), fill in clientspace ...

    /* get the proctable for this nspace */
    PMIX_QUERY_CREATE(query, 1);
    PMIX_ARGV_APPEND(rc, query[0].keys, PMIX_QUERY_PROC_TABLE);
    query[0].nqual = 1;
    PMIX_INFO_CREATE(query->qualifiers, query[0].nqual);
    PMIX_INFO_LOAD(&query->qualifiers[0], PMIX_NSPACE, clientspace, PMIX_STRING);
    if (PMIX_SUCCESS != (rc = PMIx_Query_info_nb(query, 1, cbfunc, NULL))) {
        exit(42);
    }

    /* wait to get a response */
    while (!isdone) { usleep(10); }

    // ... Finalize and cleanup
    return 0;
}
Tony Curtis <anthony.curtis@stonybrook.edu>
Dr. Barbara Chapman
Abdullah Shahneous Bari Sayket, Wenbin Lü, Wes Suttle
Stony Brook University
OSSS OpenSHMEM-on-UCX / Fault Tolerance
https://www.iacs.stonybrook.edu/
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• OpenSHMEM
 Partitioned Global Address Space library
 Take advantage of RDMA
• put/get directly to/from remote memory
• Atomics, locks
• Collectives
• And more…
http://www.openshmem.org/
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• OpenSHMEM: our bit
 Reference Implementation
• OSSS + LANL + SBU + Rice + Duke + Rhodes
• Wireup/maintenance: PMIx
• Communication substrate: UCX
 shm, IB, uGNI, CUDA, …
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Reference Implementation
 PMIx/UCX for OpenSHMEM ≥ 1.4
 Also for < 1.4 based on GASNet
 Open-Source @ github
• Or my dev repo if you’re brave
• Our research vehicle
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Reference Implementation
 PMIx/PRRTE used as o-o-b startup / launch
 Exchange of wireup info e.g. heap addresses/rkeys
 Rank/locality and other “ambient” queries
 Stays out of the way until OpenSHMEM finalized
 Could kill it once we’re up but used for fault tolerance
project
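A minimal sketch of that wireup exchange using the generic PMIx put/commit/fence/get pattern (the key name "osss.heap.rkey" is a hypothetical placeholder, and this is illustrative code rather than OSSS source):

#include <pmix.h>
#include <string.h>

/* Publish this PE's heap rkey blob and fetch a peer's. */
static pmix_status_t exchange_rkeys(const pmix_proc_t *myproc,
                                    void *my_rkey, size_t my_len,
                                    pmix_rank_t peer)
{
    pmix_value_t val;
    pmix_value_t *pval = NULL;
    pmix_proc_t wildcard, peer_proc;
    pmix_status_t rc;

    /* Post our wireup blob for remote peers */
    PMIX_VALUE_CONSTRUCT(&val);
    val.type = PMIX_BYTE_OBJECT;
    val.data.bo.bytes = (char *)my_rkey;
    val.data.bo.size = my_len;
    rc = PMIx_Put(PMIX_REMOTE, "osss.heap.rkey", &val);   /* hypothetical key */
    if (PMIX_SUCCESS != rc) return rc;
    PMIx_Commit();

    /* Collective exchange across the job */
    PMIX_PROC_CONSTRUCT(&wildcard);
    (void)strncpy(wildcard.nspace, myproc->nspace, PMIX_MAX_NSLEN);
    wildcard.rank = PMIX_RANK_WILDCARD;
    PMIx_Fence(&wildcard, 1, NULL, 0);

    /* Retrieve the peer's blob on demand */
    PMIX_PROC_CONSTRUCT(&peer_proc);
    (void)strncpy(peer_proc.nspace, myproc->nspace, PMIX_MAX_NSLEN);
    peer_proc.rank = peer;
    rc = PMIx_Get(&peer_proc, "osss.heap.rkey", NULL, 0, &pval);
    if (PMIX_SUCCESS == rc) {
        /* ... hand pval->data.bo to UCX to unpack the remote key ... */
        PMIX_VALUE_RELEASE(pval);
    }
    return rc;
}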
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Fault Tolerance (NSF) #1
 Project with UTK and Rutgers
 Current OpenSHMEM specification lacks general
fault tolerance (FT) features
 PMIx has basic FT building blocks already
• Event handling, process monitoring, job control
 Using these features to build FT API for
OpenSHMEM
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Fault Tolerance (NSF) #2
 User specifies
• Desired process monitoring scheme
• Application-specific fault mitigation procedure
 Then our FT API takes care of:
• Initiating specified process monitoring
• Registering fault mitigation routine with PMIx server
 PMIx takes care of the rest
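A minimal sketch of the registration step using PMIx's generic event API (this is not the OSSS FT API itself; the status code chosen and the handler body are illustrative assumptions):

#include <pmix.h>
#include <stddef.h>

/* Application-specific mitigation hook, invoked when a peer terminates. */
static void fault_handler(size_t evhdlr_registration_id, pmix_status_t status,
                          const pmix_proc_t *source,
                          pmix_info_t info[], size_t ninfo,
                          pmix_info_t results[], size_t nresults,
                          pmix_event_notification_cbfunc_fn_t cbfunc, void *cbdata)
{
    /* ... application-specific fault mitigation for 'source' goes here ... */

    if (NULL != cbfunc) {
        /* tell the PMIx progress thread we are done with this event */
        cbfunc(PMIX_EVENT_ACTION_COMPLETE, NULL, 0, NULL, NULL, cbdata);
    }
}

static void register_fault_handler(void)
{
    pmix_status_t codes[] = { PMIX_ERR_PROC_ABORTED };  /* code choice is illustrative */
    PMIx_Register_event_handler(codes, 1, NULL, 0, fault_handler, NULL, NULL);
}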
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• Fault Tolerance (NSF) #3
 Longer-term goal: automated selection and
implementation of FT techniques
 Compiler chooses from a small set of pre-packaged
FT schemes
 Appropriate technique selected based on application
structure and data access patterns
 Compiler implements the scheme at compile-time
PMIx@SC18: OSSS OpenSHMEM-on-UCX / Fault Tolerance
• PMIx @ github
 We try to keep up on the bleeding edge of both PMIx
& PRRTE for development
 But make sure releases work too!
 We’ve opened quite a few tickets
• Always fixed or nixed quickly!
Any (easy) Questions?
Tony Curtis <anthony.curtis@stonybrook.edu>
Dr. Barbara Chapman
Abdullah Shahneous Bari Sayket, Wenbin Lü, Wes Suttle
Stony Brook University
OSSS OpenSHMEM-on-UCX / Fault Tolerance
https://www.iacs.stonybrook.edu/
http://www.openshmem.org/
http://www.openucx.org/
Geoffroy Vallee
Oak Ridge National Laboratory
Runtime Coordination Based on PMIx
Introduction
• Many pre-exascale systems show a fairly
drastic hardware architecture change
 Bigger/complex compute nodes
 Summit: 2 IBM Power-9 CPUs, 6 GPUs per node
• Requires most scientific simulations to switch from pure MPI to an MPI+X paradigm
• MPI+OpenMP is a popular solution
Challenges
• MPI and OpenMP are independent communities
• Implementations are unaware of each other
 MPI will deploy ranks without any knowledge of the
OpenMP threads that will run under each rank
 OpenMP assumes by default that all available
resources can be used
It is very difficult for users to have fine-grained control over application execution
Context
• U.S. DOE Exascale Computing Project (ECP)
• ECP OMPI-X project
• ECP SOLLVE project
• Oak Ridge Leadership Computing Facility
(OLCF)
Challenges (2)
• Impossible to coordinate runtimes and optimize
resource utilization
 Runtimes independently allocate resources
 No fine-grain & coordinated control over the
placement of MPI ranks and OpenMP threads
 Difficult to express affinity across runtimes
 Difficult to optimize applications
Illustration
• On Summit nodes
(Figure: Summit node architecture, two NUMA nodes and six GPUs, shown alongside the desired placement of MPI ranks / OpenMP master threads and their OpenMP threads)
Proposed Solution
• Make all runtimes PMIx aware for data
exchange and notification through events
• Key PMIx characteristics
 Distributed key/value store for data exchange
 Asynchronous events for coordination
 Enable interactions with the resource manager
• Modification of LLVM OpenMP and Open MPI
Two Use-cases
• Fine-grain placement over MPI ranks and
OpenMP threads (prototyping stage)
• Inter-runtime progress (design stage)
Fine-grain Placement
• Goals
 Explicit placement of MPI ranks
 Explicit placement of OpenMP threads
 Re-configure MPI+OpenMP layout at runtime
• Propose the concept of layout
 Static layout: explicit definition of ranks and threads
placement at initialization time only
 Dynamic layouts: change placement at runtime
Layout: Example
(Figure: example layout)
Static Layouts
• New MPI mapper
 Get the layout specified by the user
 Publish it through PMIx
• New MPI-OpenMP Coordination (MOC) helper library gets the layout and sets OpenMP places to control where threads will be created (a publish/lookup sketch follows this slide)
• No modification to OpenMP standard or runtime
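A minimal sketch of the publish/lookup step referenced above (the key name "moc.layout" and the string encoding of a layout are hypothetical stand-ins for whatever MOC actually defines; PMIx_Put/PMIx_Get on a rank-local key would be an equally plausible mechanism):

#include <pmix.h>
#include <stdlib.h>
#include <string.h>

/* MPI-side mapper: publish the user-specified layout. */
static pmix_status_t publish_layout(const char *layout_descr)
{
    pmix_info_t info;
    PMIX_INFO_CONSTRUCT(&info);
    PMIX_INFO_LOAD(&info, "moc.layout", layout_descr, PMIX_STRING);  /* hypothetical key */
    return PMIx_Publish(&info, 1);
}

/* MOC helper library: look the layout up and derive OpenMP places from it. */
static pmix_status_t lookup_layout(char **layout_descr)
{
    pmix_pdata_t pdata;
    pmix_status_t rc;

    PMIX_PDATA_CONSTRUCT(&pdata);
    (void)strncpy(pdata.key, "moc.layout", PMIX_MAX_KEYLEN);
    rc = PMIx_Lookup(&pdata, 1, NULL, 0);
    if (PMIX_SUCCESS == rc && PMIX_STRING == pdata.value.type) {
        *layout_descr = strdup(pdata.value.data.string);
    }
    PMIX_PDATA_DESTRUCT(&pdata);
    return rc;
}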
Dynamic Layouts
• New API to define phases within the application
 A static layout is deployed for each phase
 Modification of the layout between phases (w/ MOC)
 Well adapted to the nature of some applications
• Requires
 OpenMP runtime modifications (LLVM prototype)
• Get phases’ layouts and set new places internally
 OpenMP standard modifications to allow dynamicity
Architecture Overview
• MOC ensures inter-runtime data exchange (via MOC-specific PMIx keys)
• Runtimes are PMIx-aware
• Runtimes have a layout-aware
mapper/affinity manager
• Respect the separation between
standards, resource manager
and runtimes
Inter-runtime Progress
• Runtimes usually guarantee internal progress but are not aware of other runtimes
• Runtimes can end up in a deadlock if mutually waiting for completion
 An MPI operation is ready to complete
 The application calls into, and blocks in, another runtime
Inter-runtime Progress (2)
• Proposed solution
 Runtimes use PMIx to notify of progress
 When a runtime gets a progress notification, it yields
once done with all tasks that can be completed
• Currently working on prototyping
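A minimal sketch of what such a notification could look like with PMIx's generic event API (the event code is a hypothetical user-defined constant built on PMIX_EXTERNAL_ERR_BASE, and the handler is assumed to be registered by the peer runtime via PMIx_Register_event_handler):

#include <pmix.h>
#include <stddef.h>

/* Hypothetical user-defined event code: "work is ready to progress". */
#define PROGRESS_NEEDED_EVT (PMIX_EXTERNAL_ERR_BASE - 1)

/* MPI runtime: tell co-located runtimes that an operation can complete. */
static void notify_progress_needed(void)
{
    (void) PMIx_Notify_event(PROGRESS_NEEDED_EVT, NULL /* source = me */,
                             PMIX_RANGE_LOCAL, NULL, 0, NULL, NULL);
}

/* Peer runtime: registered for PROGRESS_NEEDED_EVT; finishes runnable
 * tasks and then yields into the notifying runtime's progress engine. */
static void progress_handler(size_t id, pmix_status_t status,
                             const pmix_proc_t *source,
                             pmix_info_t info[], size_t ninfo,
                             pmix_info_t results[], size_t nresults,
                             pmix_event_notification_cbfunc_fn_t cbfunc, void *cbdata)
{
    /* ... complete outstanding tasks, then yield ... */
    if (NULL != cbfunc) {
        cbfunc(PMIX_EVENT_ACTION_COMPLETE, NULL, 0, NULL, NULL, cbdata);
    }
}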
Conclusion
• There is a real need for making runtimes aware
of each other
• PMIx is a preferred option
 Asynchronous / event based
 Data exchange
 Enable interaction with the resource manager
• Opens new opportunities for both research and
development (e.g. new project focusing on QoS)
Dan Holmes
EPCC University of Edinburgh
Howard Pritchard, Nathan Hjelm
Los Alamos National Laboratory
MPI Sessions: Dynamic Proc. Groups
Using PMIx to Help
Replace MPI_Init
14th November 2018
Dan Holmes, Howard Pritchard, Nathan Hjelm
LA-UR-18-30830
Problems with MPI_Init
● All MPI processes must initialize MPI exactly once
● MPI cannot be initialized within an MPI process from different
application components without coordination
● MPI cannot be re-initialized after MPI is finalized
● Error handling for MPI initialization cannot be specified
Sessions – a new way to start MPI
● General scheme:
● Query the underlying
run-time system
● Get a “set” of
processes
● Determine the processes
you want
● Create an MPI_Group
● Create a communicator
with just those processes
● Create an MPI_Comm
(Diagram: MPI_Session → query runtime for set of processes → MPI_Group → MPI_Comm)
MPI Sessions proposed API
● Create (or destroy) a session:
● MPI_SESSION_INIT (and MPI_SESSION_FINALIZE)
● Get names of sets of processes:
● MPI_SESSION_GET_NUM_PSETS,
MPI_SESSION_GET_NTH_PSET
● Create an MPI_GROUP from a process set name:
● MPI_GROUP_CREATE_FROM_SESSION
● Create an MPI_COMM from an MPI_GROUP:
● MPI_COMM_CREATE_FROM_GROUP
PMIx groups helps here
MPI_COMM_CREATE_FROM_GROUP
The ‘uri’ is supplied by the application.
Implementation challenge: ‘group’ is a local object.
Need some way to synchronize with other “joiners” to the
communicator. The ‘uri’ is different than a process set
name.
MPI_Create_comm_from_group(IN MPI_Group group,
                           IN const char *uri,
                           IN MPI_Info info,
                           IN MPI_Errhandler hndl,
                           OUT MPI_Comm *comm);
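For orientation, a pseudocode sketch of the whole proposed flow; only MPI_Create_comm_from_group's parameter list comes from the slide above, the other signatures are assumptions for illustration:

/* Pseudocode sketch of the proposed flow. */
MPI_Session session;
MPI_Group   group;
MPI_Comm    comm;
char        pset_name[256];

MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

/* Pick one of the runtime-advertised process sets, e.g. "mpi://world" */
MPI_Session_get_nth_pset(session, MPI_INFO_NULL, 0, sizeof(pset_name), pset_name);

/* Group from the named set, then a communicator from the group */
MPI_Group_create_from_session(session, pset_name, &group);
MPI_Create_comm_from_group(group, "app://ocean-main" /* app-chosen uri */,
                           MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);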
Using PMIx Groups
● PMIx Groups - a collection of processes desiring a unified
identifier for purposes such as passing events or participating in
PMIx fence operations
● Invite/join/leave semantics
● Sessions prototype implementation currently uses
PMIX_Group_construct/PMIX_Group_destruct
● Can be used to generate a “unique” 64-bit identifier for the group.
Used by the sessions prototype to generate a communicator ID.
● Useful options for future work
● Timeout for processes joining the group
● Asynchronous notification when a process leaves the group
Using PMIx_Group_Construct
● ‘id’ maps to/from the ‘uri’ in MPI_Comm_create_from_group
(plus additional Open MPI internal info)
● ‘procs’ array comes from information previously supplied by PMIx
○ “mpi://world” and “mpi://self” already available
○ mpiexec -np 2 --pset user://ocean ocean.x : -np 2 --pset user://atmosphere atmosphere.x
PMIx_Group_Construct(const char id[],
const pmix_proc_t procs[],
const pmix_info_t info[],
size_t ninfo);
MPI Sessions Prototype Status
● All “MPI_*_from_group” functions have been implemented
● Only pml/ob1 supported at this time
● Working on MPI_Group creation (from PSET) now
● Will start with mpi://world and mpi://self
● User-defined process sets need additional support from PMIx
● Up next: break apart MPI initialization
● Goal is to reduce startup time and memory footprint
Summary
● PMIx Groups provides an OOB mechanism for MPI processes to
bootstrap the formation of a communication context (MPI
Communicator) from a group of MPI processes
● Functionality for future work
● Handling (unexpected) process exit
● User-defined process sets
● Group expansion
Funding Acknowledgments
Useful Links:
General Information: https://pmix.org/
PMIx Library: https://github.com/pmix/pmix
PMIx Reference RunTime Environment (PRRTE): https://github.com/pmix/prrte
PMIx Standard: https://github.com/pmix/pmix-standard
Slack: pmix-workspace.slack.com
Q&A
