SlideShare a Scribd company logo
1 of 21
MPI Requirements
of the Network Layer
Presented to the OpenFabrics libfabric Working Group
January 21, 2014
Community feedback assembled
by Jeff Squyres, Cisco Systems
Many thanks to the contributors
(in no particular order)
•

ETZ Zurich

•

• Torsten Hoefler

•

•
•
•
•
•

Sandia National Labs
• Ron Brightwell
• Brian Barrett
• Ryan Grant

•

IBM
•
•
•
•

Chulho Kim
Carl Obert
Michael Blocksome
Perry Schmidt

Cisco Systems

•

Jeff Squyres
Dave Goodell
Reese Faucette
Cesare Cantu
Upinder Malhi

Oak Ridge National Labs
• Scott Atchley
• Pavel Shamis

•

Argonne National Labs
• Jeff Hammond

Slide 2
Many thanks to the contributors
(in no particular order)
• Intel
• Sayantan Sur
• Charles Archer

• AMD
• Brad Benton

• Microsoft

• Cray
• Krishna Kandalla

• Fab Tillier

• U. Edinburgh / EPCC

• Mellanox
• Devendar Bureddy

• Dan Holmes

• U. Alabama Birmingham

• SGI

• Tony Skjellum
• Amin Hassani
• Shane Farmer

• Michael Raymond

Slide 3
Quick MPI overview
• High-level abstraction API
• No concept of a connection

• All communication:
• Is reliable
• Has some ordering rules
• Is comprised of typed messages

• Peer address is (communicator, integer) tuple
• I.e., virtualized
Slide 4
Quick MPI overview
• Communication modes
• Blocking and non-blocking (polled completion)
• Point-to-point: two-sided and one-sided
• Collective operations: broadcast, scatter, reduce,
…etc.
• …and others, but those are the big ones

• Asynch. progression is required/strongly desired

• Message buffers are provided by the application
• They are not “special” (e.g., registered)
Slide 5
Quick MPI overview
• MPI specification
• Governed by the MPI Forum standards body
• Currently at MPI-3.0

• MPI implementations
• Software + hardware implementation of the spec
• Some are open source, some are closed source
• Generally don’t care about interoperability (e.g., wire
protocols)

Slide 6
MPI is a large community
• Community feedback represents union of:
• Different viewpoints
• Different MPI implementations
• Different hardware perspectives

• …and not all agree with each other
• For example…

Slide 7
Different MPI camps
Those who want
high level interfaces

Those who want
low level interfaces

• Do not want to see
memory registration

• Want to have good
memory registration
infrastructure

• Want tag matching

• Want direct access to
hardware capabilities

• E.g., PSM
• Trust the network layer to
do everything well under
the covers

• Want to fully implement
MPI interfaces themselves
• Or, the MPI implementers
are the kernel / firmware
/hardware developers
Slide 8
Be careful what you ask for…
• …because you just got it

• Members of the MPI
Forum would like to be
involved in the libfabric
design on an ongoing
basis
• Can we get an MPI
libfabric listserv?

Slide 9
This is as far as we got
On 21 Jan 2014
Basic things MPI needs
• Messages (not streams)
• Efficient API
• Allow for low latency / high bandwidth
• Low number of instructions in the critical path
• Enable “zero copy”

• Separation of local action initiation and completion
• One-sided (including atomics) and two-sided
semantics
• No requirement for communication buffer alignment
Slide
11
Basic things MPI needs
• Asynchronous progress independent of API calls
• Preferably via dedicated hardware

• Scalable communications with millions of peers
• Think of MPI as a fully-connected model
(even though it usually isn’t implemented that way)

• Today, runs with 3 million MPI processes in a job

Slide
12
Things MPI likes in verbs
• (all the basic needs from previous slide)
• Different modes of communication
• Reliable vs. unreliable
• Scalable connectionless communications (i.e., UD)

• Specify peer read/write address (i.e., RDMA)
• RDMA write with immediate (*)
• …but we want more (more on this later)

Slide
13
Things MPI likes in verbs
• Ability to re-use (short/inline) buffers immediately
• Polling and OS-native/fd-based blocking QP
modes
• Discover devices, ports, and their capabilities (*)
• …but let’s not tie this to a specific hardware model

• Scatter / gather lists for sends

• Atomic operations (*)
• …but we want more (more on this later)
Slide
14
Things MPI likes in verbs
• Can have multiple
consumers in a single
process
• API handles are
independent of each other

Slide
15
Things MPI likes in verbs
• Verbs does not:
• Require collective initialization across multiple
processes
• Require peers to have the same process image
• Restrict completion order vs. delivery order
• Restrict source/target address region (stack, data,
heap)
• Require a specific wire protocol (*)
• …but it does impose limitations, e.g., 40-byte GRH UD
header
Slide
16
Things MPI likes in verbs
• Ability to connect to “unrelated” peers
• Cannot access peer (memory) without permission
• Cleans up everything upon process termination
• E.g., kernel and hardware resources are released

Slide
17
Other things MPI wants
(described as verbs
improvements)
• MTU is an int (not an enum)
• Specify timeouts to connection requests
• …or have a CM that completes connections
asynchronously

• All operations need to be non-blocking, including:
• Address handle creation
• Communication setup / teardown
• Memory registration / deregistration
Slide
18
Other things MPI wants
(described as verbs
improvements)
• Specify buffer/length as function parameters
• Specified as struct requires extra memory accesses
• …more on this later

• Ability to query how many credits currently
available in a QP
• To support actions that consume more than one
credit

• Remove concept of “queue pair”
• Have standalone send channels and receive
channels
Slide
19
Other things MPI wants
(described as verbs
improvements)
• Completion at target for an RDMA write
• Have ability to query if loopback communication is
supported
• Clearly delineate what functionality must be
supported vs. what is optional
• Example: MPI provides (almost) the same
functionality everywhere, regardless of hardware /
platform
• Verbs functionality is wildly different for each provider
Slide
20
Stop for today
Still consolidating more MPI community feedback

More Related Content

What's hot

An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...
An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...
An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...Jean Vanderdonckt
 
Proxying DBI with DBD::Gofer and App::Staticperl
Proxying DBI with DBD::Gofer and App::StaticperlProxying DBI with DBD::Gofer and App::Staticperl
Proxying DBI with DBD::Gofer and App::Staticperlnohuhu
 
They why behind php frameworks
They why behind php frameworksThey why behind php frameworks
They why behind php frameworksKirk Madera
 
Adding Real-time Features to PHP Applications
Adding Real-time Features to PHP ApplicationsAdding Real-time Features to PHP Applications
Adding Real-time Features to PHP ApplicationsRonny López
 
CrossWorlds: Unleash the Power of Domino for Connections Development
CrossWorlds: Unleash the Power of Domino for Connections Development CrossWorlds: Unleash the Power of Domino for Connections Development
CrossWorlds: Unleash the Power of Domino for Connections Development LetsConnect
 
dot net final year project in jalandhar
dot net final year project in jalandhardot net final year project in jalandhar
dot net final year project in jalandhardeepikakaler1
 
PHP Web Development Frameworks & Advantages
PHP Web Development Frameworks & AdvantagesPHP Web Development Frameworks & Advantages
PHP Web Development Frameworks & AdvantagesAditMicrosys Australia
 
Performance and Abstractions
Performance and AbstractionsPerformance and Abstractions
Performance and AbstractionsMetosin Oy
 
ASP.NET Core Demos
ASP.NET Core DemosASP.NET Core Demos
ASP.NET Core DemosErik Noren
 
Microsoft .Net Framework
Microsoft .Net FrameworkMicrosoft .Net Framework
Microsoft .Net FrameworkRohit Rao
 
Content Management Systems and Refactoring - Drupal, WordPress and eZ Publish
Content Management Systems and Refactoring - Drupal, WordPress and eZ PublishContent Management Systems and Refactoring - Drupal, WordPress and eZ Publish
Content Management Systems and Refactoring - Drupal, WordPress and eZ PublishJani Tarvainen
 
Revamping Mailjet API documentation @ ParisAPI meetup
Revamping Mailjet API documentation @ ParisAPI meetupRevamping Mailjet API documentation @ ParisAPI meetup
Revamping Mailjet API documentation @ ParisAPI meetupMailjet
 
Net Framework overview
Net Framework overviewNet Framework overview
Net Framework overviewMohitKumar1985
 
ASP.NET Core Demos Part 2
ASP.NET Core Demos Part 2ASP.NET Core Demos Part 2
ASP.NET Core Demos Part 2Erik Noren
 
Bccon use notes objects in memory and other useful
Bccon   use notes objects in memory and other usefulBccon   use notes objects in memory and other useful
Bccon use notes objects in memory and other usefulFrank van der Linden
 

What's hot (19)

An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...
An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...
An Open Source Workbench for Prototyping Multimodal Interactions Based on Off...
 
Proxying DBI with DBD::Gofer and App::Staticperl
Proxying DBI with DBD::Gofer and App::StaticperlProxying DBI with DBD::Gofer and App::Staticperl
Proxying DBI with DBD::Gofer and App::Staticperl
 
They why behind php frameworks
They why behind php frameworksThey why behind php frameworks
They why behind php frameworks
 
Adding Real-time Features to PHP Applications
Adding Real-time Features to PHP ApplicationsAdding Real-time Features to PHP Applications
Adding Real-time Features to PHP Applications
 
CrossWorlds: Unleash the Power of Domino for Connections Development
CrossWorlds: Unleash the Power of Domino for Connections Development CrossWorlds: Unleash the Power of Domino for Connections Development
CrossWorlds: Unleash the Power of Domino for Connections Development
 
dot net final year project in jalandhar
dot net final year project in jalandhardot net final year project in jalandhar
dot net final year project in jalandhar
 
PHP Web Development Frameworks & Advantages
PHP Web Development Frameworks & AdvantagesPHP Web Development Frameworks & Advantages
PHP Web Development Frameworks & Advantages
 
Performance and Abstractions
Performance and AbstractionsPerformance and Abstractions
Performance and Abstractions
 
.NET Framework
.NET Framework.NET Framework
.NET Framework
 
.Net overview by cetpa
.Net overview by cetpa.Net overview by cetpa
.Net overview by cetpa
 
ASP.NET Core Demos
ASP.NET Core DemosASP.NET Core Demos
ASP.NET Core Demos
 
Microsoft .Net Framework
Microsoft .Net FrameworkMicrosoft .Net Framework
Microsoft .Net Framework
 
Content Management Systems and Refactoring - Drupal, WordPress and eZ Publish
Content Management Systems and Refactoring - Drupal, WordPress and eZ PublishContent Management Systems and Refactoring - Drupal, WordPress and eZ Publish
Content Management Systems and Refactoring - Drupal, WordPress and eZ Publish
 
Revamping Mailjet API documentation @ ParisAPI meetup
Revamping Mailjet API documentation @ ParisAPI meetupRevamping Mailjet API documentation @ ParisAPI meetup
Revamping Mailjet API documentation @ ParisAPI meetup
 
Net Framework overview
Net Framework overviewNet Framework overview
Net Framework overview
 
ASP.NET Core Demos Part 2
ASP.NET Core Demos Part 2ASP.NET Core Demos Part 2
ASP.NET Core Demos Part 2
 
.Net framework
.Net framework.Net framework
.Net framework
 
Bccon use notes objects in memory and other useful
Bccon   use notes objects in memory and other usefulBccon   use notes objects in memory and other useful
Bccon use notes objects in memory and other useful
 
Net overview
Net overviewNet overview
Net overview
 

Similar to 2014 01-21-mpi-community-feedback

EuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPIEuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPIDan Holmes
 
Threads in Operating System | Multithreading | Interprocess Communication
Threads in Operating System | Multithreading | Interprocess CommunicationThreads in Operating System | Multithreading | Interprocess Communication
Threads in Operating System | Multithreading | Interprocess CommunicationShivam Mitra
 
ch4-Software is Everywhere
ch4-Software is Everywherech4-Software is Everywhere
ch4-Software is Everywheressuser06ea42
 
Embedded Systems: Lecture 8: The Raspberry Pi as a Linux Box
Embedded Systems: Lecture 8: The Raspberry Pi as a Linux BoxEmbedded Systems: Lecture 8: The Raspberry Pi as a Linux Box
Embedded Systems: Lecture 8: The Raspberry Pi as a Linux BoxAhmed El-Arabawy
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyserAlex Moskvin
 
Opening last bits of the infrastructure
Opening last bits of the infrastructureOpening last bits of the infrastructure
Opening last bits of the infrastructureErwan Velu
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learnJohn D Almon
 
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...Adam Dunkels
 
The Python in the Apple
The Python in the AppleThe Python in the Apple
The Python in the ApplezeroSteiner
 
Putting Compilers to Work
Putting Compilers to WorkPutting Compilers to Work
Putting Compilers to WorkSingleStore
 
The world is not black and white – Impact of decisions over the lifetime of a...
The world is not black and white – Impact of decisions over the lifetime of a...The world is not black and white – Impact of decisions over the lifetime of a...
The world is not black and white – Impact of decisions over the lifetime of a...Eric Reiche
 
Php internal architecture
Php internal architecturePhp internal architecture
Php internal architectureElizabeth Smith
 
State of Puppet - Puppet Camp Silicon Valley 2014
State of Puppet - Puppet Camp Silicon Valley 2014State of Puppet - Puppet Camp Silicon Valley 2014
State of Puppet - Puppet Camp Silicon Valley 2014Puppet
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageKernel TLV
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableComsysto Reply GmbH
 
Allwinner Kernel Upstreaming Experiences
Allwinner Kernel Upstreaming ExperiencesAllwinner Kernel Upstreaming Experiences
Allwinner Kernel Upstreaming ExperiencesChen-Yu Tsai
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android BenchmarksKoan-Sin Tan
 
Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!All Things Open
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyComsysto Reply GmbH
 

Similar to 2014 01-21-mpi-community-feedback (20)

EuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPIEuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPI
 
Threads in Operating System | Multithreading | Interprocess Communication
Threads in Operating System | Multithreading | Interprocess CommunicationThreads in Operating System | Multithreading | Interprocess Communication
Threads in Operating System | Multithreading | Interprocess Communication
 
ch4-Software is Everywhere
ch4-Software is Everywherech4-Software is Everywhere
ch4-Software is Everywhere
 
Embedded Systems: Lecture 8: The Raspberry Pi as a Linux Box
Embedded Systems: Lecture 8: The Raspberry Pi as a Linux BoxEmbedded Systems: Lecture 8: The Raspberry Pi as a Linux Box
Embedded Systems: Lecture 8: The Raspberry Pi as a Linux Box
 
Realtime traffic analyser
Realtime traffic analyserRealtime traffic analyser
Realtime traffic analyser
 
Opening last bits of the infrastructure
Opening last bits of the infrastructureOpening last bits of the infrastructure
Opening last bits of the infrastructure
 
Hpc lunch and learn
Hpc lunch and learnHpc lunch and learn
Hpc lunch and learn
 
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
Advanced Internet of Things firmware engineering with Thingsquare and Contiki...
 
The Python in the Apple
The Python in the AppleThe Python in the Apple
The Python in the Apple
 
Putting Compilers to Work
Putting Compilers to WorkPutting Compilers to Work
Putting Compilers to Work
 
The world is not black and white – Impact of decisions over the lifetime of a...
The world is not black and white – Impact of decisions over the lifetime of a...The world is not black and white – Impact of decisions over the lifetime of a...
The world is not black and white – Impact of decisions over the lifetime of a...
 
Php internal architecture
Php internal architecturePhp internal architecture
Php internal architecture
 
State of Puppet - Puppet Camp Silicon Valley 2014
State of Puppet - Puppet Camp Silicon Valley 2014State of Puppet - Puppet Camp Silicon Valley 2014
State of Puppet - Puppet Camp Silicon Valley 2014
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
 
Software Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuableSoftware Architecture and Architectors: useless VS valuable
Software Architecture and Architectors: useless VS valuable
 
Allwinner Kernel Upstreaming Experiences
Allwinner Kernel Upstreaming ExperiencesAllwinner Kernel Upstreaming Experiences
Allwinner Kernel Upstreaming Experiences
 
Smart Object Architecture
Smart Object ArchitectureSmart Object Architecture
Smart Object Architecture
 
Understanding Android Benchmarks
Understanding Android BenchmarksUnderstanding Android Benchmarks
Understanding Android Benchmarks
 
Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!
 
Architectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and ConsistentlyArchitectural Decisions: Smoothly and Consistently
Architectural Decisions: Smoothly and Consistently
 

More from Jeff Squyres

Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFJeff Squyres
 
MPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI ForumMPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI ForumJeff Squyres
 
MPI Fourm SC'15 BOF
MPI Fourm SC'15 BOFMPI Fourm SC'15 BOF
MPI Fourm SC'15 BOFJeff Squyres
 
Open MPI SC'15 State of the Union BOF
Open MPI SC'15 State of the Union BOFOpen MPI SC'15 State of the Union BOF
Open MPI SC'15 State of the Union BOFJeff Squyres
 
Cisco's journey from Verbs to Libfabric
Cisco's journey from Verbs to LibfabricCisco's journey from Verbs to Libfabric
Cisco's journey from Verbs to LibfabricJeff Squyres
 
(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE
(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE
(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZEJeff Squyres
 
Fun with Github webhooks: verifying Signed-off-by
Fun with Github webhooks: verifying Signed-off-byFun with Github webhooks: verifying Signed-off-by
Fun with Github webhooks: verifying Signed-off-byJeff Squyres
 
Open MPI new version number scheme and roadmap
Open MPI new version number scheme and roadmapOpen MPI new version number scheme and roadmap
Open MPI new version number scheme and roadmapJeff Squyres
 
The State of libfabric in Open MPI
The State of libfabric in Open MPIThe State of libfabric in Open MPI
The State of libfabric in Open MPIJeff Squyres
 
Cisco usNIC libfabric provider
Cisco usNIC libfabric providerCisco usNIC libfabric provider
Cisco usNIC libfabric providerJeff Squyres
 
(Open) MPI, Parallel Computing, Life, the Universe, and Everything
(Open) MPI, Parallel Computing, Life, the Universe, and Everything(Open) MPI, Parallel Computing, Life, the Universe, and Everything
(Open) MPI, Parallel Computing, Life, the Universe, and EverythingJeff Squyres
 
Cisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPICisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPIJeff Squyres
 
Cisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentationCisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentationJeff Squyres
 
Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)
Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)
Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)Jeff Squyres
 
MOSSCon 2013, Cisco Open Source talk
MOSSCon 2013, Cisco Open Source talkMOSSCon 2013, Cisco Open Source talk
MOSSCon 2013, Cisco Open Source talkJeff Squyres
 
Ethernet and TCP optimizations
Ethernet and TCP optimizationsEthernet and TCP optimizations
Ethernet and TCP optimizationsJeff Squyres
 
Friends don't let friends leak MPI_Requests
Friends don't let friends leak MPI_RequestsFriends don't let friends leak MPI_Requests
Friends don't let friends leak MPI_RequestsJeff Squyres
 
MPI-3 Timer requests proposal
MPI-3 Timer requests proposalMPI-3 Timer requests proposal
MPI-3 Timer requests proposalJeff Squyres
 
MPI_Mprobe is good for you
MPI_Mprobe is good for youMPI_Mprobe is good for you
MPI_Mprobe is good for youJeff Squyres
 

More from Jeff Squyres (20)

Open MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOFOpen MPI State of the Union X SC'16 BOF
Open MPI State of the Union X SC'16 BOF
 
MPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI ForumMPI Sessions: a proposal to the MPI Forum
MPI Sessions: a proposal to the MPI Forum
 
MPI Fourm SC'15 BOF
MPI Fourm SC'15 BOFMPI Fourm SC'15 BOF
MPI Fourm SC'15 BOF
 
Open MPI SC'15 State of the Union BOF
Open MPI SC'15 State of the Union BOFOpen MPI SC'15 State of the Union BOF
Open MPI SC'15 State of the Union BOF
 
Cisco's journey from Verbs to Libfabric
Cisco's journey from Verbs to LibfabricCisco's journey from Verbs to Libfabric
Cisco's journey from Verbs to Libfabric
 
(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE
(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE
(Very) Loose proposal to revamp MPI_INIT and MPI_FINALIZE
 
Fun with Github webhooks: verifying Signed-off-by
Fun with Github webhooks: verifying Signed-off-byFun with Github webhooks: verifying Signed-off-by
Fun with Github webhooks: verifying Signed-off-by
 
Open MPI new version number scheme and roadmap
Open MPI new version number scheme and roadmapOpen MPI new version number scheme and roadmap
Open MPI new version number scheme and roadmap
 
The State of libfabric in Open MPI
The State of libfabric in Open MPIThe State of libfabric in Open MPI
The State of libfabric in Open MPI
 
Cisco usNIC libfabric provider
Cisco usNIC libfabric providerCisco usNIC libfabric provider
Cisco usNIC libfabric provider
 
(Open) MPI, Parallel Computing, Life, the Universe, and Everything
(Open) MPI, Parallel Computing, Life, the Universe, and Everything(Open) MPI, Parallel Computing, Life, the Universe, and Everything
(Open) MPI, Parallel Computing, Life, the Universe, and Everything
 
Cisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPICisco usNIC: how it works, how it is used in Open MPI
Cisco usNIC: how it works, how it is used in Open MPI
 
Cisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentationCisco EuroMPI'13 vendor session presentation
Cisco EuroMPI'13 vendor session presentation
 
Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)
Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)
Open MPI Explorations in Process Affinity (EuroMPI'13 presentation)
 
MPI History
MPI HistoryMPI History
MPI History
 
MOSSCon 2013, Cisco Open Source talk
MOSSCon 2013, Cisco Open Source talkMOSSCon 2013, Cisco Open Source talk
MOSSCon 2013, Cisco Open Source talk
 
Ethernet and TCP optimizations
Ethernet and TCP optimizationsEthernet and TCP optimizations
Ethernet and TCP optimizations
 
Friends don't let friends leak MPI_Requests
Friends don't let friends leak MPI_RequestsFriends don't let friends leak MPI_Requests
Friends don't let friends leak MPI_Requests
 
MPI-3 Timer requests proposal
MPI-3 Timer requests proposalMPI-3 Timer requests proposal
MPI-3 Timer requests proposal
 
MPI_Mprobe is good for you
MPI_Mprobe is good for youMPI_Mprobe is good for you
MPI_Mprobe is good for you
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 

2014 01-21-mpi-community-feedback

  • 1. MPI Requirements of the Network Layer Presented to the OpenFabrics libfabric Working Group January 21, 2014 Community feedback assembled by Jeff Squyres, Cisco Systems
  • 2. Many thanks to the contributors (in no particular order) • ETZ Zurich • • Torsten Hoefler • • • • • • Sandia National Labs • Ron Brightwell • Brian Barrett • Ryan Grant • IBM • • • • Chulho Kim Carl Obert Michael Blocksome Perry Schmidt Cisco Systems • Jeff Squyres Dave Goodell Reese Faucette Cesare Cantu Upinder Malhi Oak Ridge National Labs • Scott Atchley • Pavel Shamis • Argonne National Labs • Jeff Hammond Slide 2
  • 3. Many thanks to the contributors (in no particular order) • Intel • Sayantan Sur • Charles Archer • AMD • Brad Benton • Microsoft • Cray • Krishna Kandalla • Fab Tillier • U. Edinburgh / EPCC • Mellanox • Devendar Bureddy • Dan Holmes • U. Alabama Birmingham • SGI • Tony Skjellum • Amin Hassani • Shane Farmer • Michael Raymond Slide 3
  • 4. Quick MPI overview • High-level abstraction API • No concept of a connection • All communication: • Is reliable • Has some ordering rules • Is comprised of typed messages • Peer address is (communicator, integer) tuple • I.e., virtualized Slide 4
  • 5. Quick MPI overview • Communication modes • Blocking and non-blocking (polled completion) • Point-to-point: two-sided and one-sided • Collective operations: broadcast, scatter, reduce, …etc. • …and others, but those are the big ones • Asynch. progression is required/strongly desired • Message buffers are provided by the application • They are not “special” (e.g., registered) Slide 5
  • 6. Quick MPI overview • MPI specification • Governed by the MPI Forum standards body • Currently at MPI-3.0 • MPI implementations • Software + hardware implementation of the spec • Some are open source, some are closed source • Generally don’t care about interoperability (e.g., wire protocols) Slide 6
  • 7. MPI is a large community • Community feedback represents union of: • Different viewpoints • Different MPI implementations • Different hardware perspectives • …and not all agree with each other • For example… Slide 7
  • 8. Different MPI camps Those who want high level interfaces Those who want low level interfaces • Do not want to see memory registration • Want to have good memory registration infrastructure • Want tag matching • Want direct access to hardware capabilities • E.g., PSM • Trust the network layer to do everything well under the covers • Want to fully implement MPI interfaces themselves • Or, the MPI implementers are the kernel / firmware /hardware developers Slide 8
  • 9. Be careful what you ask for… • …because you just got it • Members of the MPI Forum would like to be involved in the libfabric design on an ongoing basis • Can we get an MPI libfabric listserv? Slide 9
  • 10. This is as far as we got On 21 Jan 2014
  • 11. Basic things MPI needs • Messages (not streams) • Efficient API • Allow for low latency / high bandwidth • Low number of instructions in the critical path • Enable “zero copy” • Separation of local action initiation and completion • One-sided (including atomics) and two-sided semantics • No requirement for communication buffer alignment Slide 11
  • 12. Basic things MPI needs • Asynchronous progress independent of API calls • Preferably via dedicated hardware • Scalable communications with millions of peers • Think of MPI as a fully-connected model (even though it usually isn’t implemented that way) • Today, runs with 3 million MPI processes in a job Slide 12
  • 13. Things MPI likes in verbs • (all the basic needs from previous slide) • Different modes of communication • Reliable vs. unreliable • Scalable connectionless communications (i.e., UD) • Specify peer read/write address (i.e., RDMA) • RDMA write with immediate (*) • …but we want more (more on this later) Slide 13
  • 14. Things MPI likes in verbs • Ability to re-use (short/inline) buffers immediately • Polling and OS-native/fd-based blocking QP modes • Discover devices, ports, and their capabilities (*) • …but let’s not tie this to a specific hardware model • Scatter / gather lists for sends • Atomic operations (*) • …but we want more (more on this later) Slide 14
  • 15. Things MPI likes in verbs • Can have multiple consumers in a single process • API handles are independent of each other Slide 15
  • 16. Things MPI likes in verbs • Verbs does not: • Require collective initialization across multiple processes • Require peers to have the same process image • Restrict completion order vs. delivery order • Restrict source/target address region (stack, data, heap) • Require a specific wire protocol (*) • …but it does impose limitations, e.g., 40-byte GRH UD header Slide 16
  • 17. Things MPI likes in verbs • Ability to connect to “unrelated” peers • Cannot access peer (memory) without permission • Cleans up everything upon process termination • E.g., kernel and hardware resources are released Slide 17
  • 18. Other things MPI wants (described as verbs improvements) • MTU is an int (not an enum) • Specify timeouts to connection requests • …or have a CM that completes connections asynchronously • All operations need to be non-blocking, including: • Address handle creation • Communication setup / teardown • Memory registration / deregistration Slide 18
  • 19. Other things MPI wants (described as verbs improvements) • Specify buffer/length as function parameters • Specified as struct requires extra memory accesses • …more on this later • Ability to query how many credits currently available in a QP • To support actions that consume more than one credit • Remove concept of “queue pair” • Have standalone send channels and receive channels Slide 19
  • 20. Other things MPI wants (described as verbs improvements) • Completion at target for an RDMA write • Have ability to query if loopback communication is supported • Clearly delineate what functionality must be supported vs. what is optional • Example: MPI provides (almost) the same functionality everywhere, regardless of hardware / platform • Verbs functionality is wildly different for each provider Slide 20
  • 21. Stop for today Still consolidating more MPI community feedback

Editor's Notes

  1. Feedback from call:We like RDMA write with immediate… but 4 bytes isn’t enoughRe-state last bullet better: want completion on peer for an RDMA write
  2. Replaced “inline messages” with “ability to re-use buffer immediately”
  3. Replaced MPI + PGAS example with “API handles are independent of each other”Split off into its own slide
  4. Specifically mentioned the 40-byte GHR UD header
  5. Translated “Multiple PDs per process” to “ability to connect to ‘unrelated’ peers”
  6. After all:- Added 1st bullet as reaction to feedback from slide 12