Stefano will give an introduction to the most commonly used programming models for performing parallel I/O on supercomputers. He will first give a broad overview of parallel APIs for programming I/O on supercomputers. He will then introduce MPI I/O, one of the most widely used programming interfaces for parallel I/O, presenting its basic concepts and providing programming examples and guidelines for achieving high-performance I/O on supercomputers.
Visit: https://www.eudat.eu/eudat-summer-school
Introduction to HPC Programming Models - EUDAT Summer School (Stefano Markidis, KTH)
1. www.eudat.eu
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Introduction to HPC Programming Models
Stefano Markidis
KTH, Sweden
2. Supercomputing - I
Use of computer simulation as a tool for greater understanding of the real world
Complements experimentation and theory
Problems are increasingly computationally challenging:
Large parallel machines are needed to perform the calculations
It is critical to leverage parallelism in all phases
Data access is a huge challenge:
Using parallelism to obtain performance
Finding usable, efficient, portable I/O interfaces
[Images: the Millennium simulation project; thermal hydraulics with Nek5000]
6. [Diagram: supercomputer compute nodes, each containing many cores (c) with caches ($), local DRAM, and a NIC connecting the node to the interconnect]
7. HPC I/O System is also rather complex…
An HPC I/O system is attached to the supercomputer; the HPC I/O system is a supercomputer itself.
[Diagram: architecture of the 557 TF Argonne Leadership Computing Facility Blue Gene/P I/O system]
Gateway nodes run parallel file system client software and forward I/O operations from the compute nodes.
A commodity network primarily carries the storage traffic.
Storage nodes run parallel file system software and manage the storage hardware.
Enterprise storage: controllers and large racks of disks, connected via the links below.
Link speeds in the diagram: BG/P tree 6.8 Gbit/s, Ethernet 10 Gbit/s, InfiniBand 16 Gbit/s, Serial ATA 3.0 Gbit/s.
The hardware bottleneck is at the storage controllers, which can manage only 4.6 GByte/s; the peak I/O system bandwidth is 78.2 GByte/s.
8. Supercomputing – II
Most modern supercomputer hardware is built following two principles:
Use of commodity hardware: Intel CPUs, AMD CPUs, DDR4, NVIDIA GPUs, …
Use of parallelism to achieve very high performance
The file systems connected to supercomputers are built in the same way:
Gather large numbers of storage devices: HDDs, SSDs
Connect them together in parallel to create a high-bandwidth, high-capacity storage device
10. Largest HPC I/O Systems
https://www.vi4io.org/hpsl/2017/start
This is where Big Data starts for HPC.
11. Supercomputing - III
Supercomputing, n. [sˌuːpəkəmpjˈuːtɪŋ]: a special branch of scientific computing that turns a computation-bound problem into an I/O-bound problem.
12. Why is that? I/O vs. Compute Performance
[Figure: Disk Access Rates over Time]
13. HPC Programming Models
Programming models are an abstraction of parallel computer architectures:
To express algorithms conveniently without focusing on the details of the underlying hardware
To remove the complexity of the architecture when designing algorithms
To allow for high-performance implementations
14. Two HPC Programming Models for Supercomputers
Problem: move the value of a from p0 to p1, and then to p2.
Message-Passing: explicit send and receive operations (explicit communication)
[Diagram: a = 12 on p0 is sent to p1, then forwarded to p2]
PGAS: access to global memory that is physically distributed (implicit communication)
Get/Put are load/store operations on the global memory
[Diagram: a(1), a(2), a(3) form a global array across p0, p1, p2; the value 12 is stored into a(2) and then into a(3)]
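A minimal message-passing sketch of this problem (not from the original slides; run with at least three processes): the value of a is forwarded with explicit MPI_Send/MPI_Recv calls.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int a, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        a = 12;
        MPI_Send(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);   /* p0 -> p1 */
    } else if (rank == 1) {
        MPI_Recv(&a, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Send(&a, 1, MPI_INT, 2, 0, MPI_COMM_WORLD);   /* p1 -> p2 */
    } else if (rank == 2) {
        MPI_Recv(&a, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("p2 received a = %d\n", a);
    }

    MPI_Finalize();
    return 0;
}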
15. How do you program a supercomputer?
99% of the codes for supercomputers are written in Fortran (including Fortran 77) and C/C++
Other languages supporting multithreading are used for on-node parallelism (Python, Java, …)
99% of the large HPC codes use MPI libraries (the message-passing programming model)
MPI is used to move data from one computing node to another, but also for on-node parallelism
Data-analytics frameworks for supercomputers often use MPI as the transport layer
16. MPI
MPI = a standardized specification document for a message-passing library to support parallel computing in C/C++ and Fortran.
Portability
High performance
Two main implementations: MPICH and Open MPI (you can install them on your laptop)
Supercomputer vendors provide highly tuned implementations of these two.
Only four fundamental functions: MPI_Init, MPI_Finalize, MPI_Send, MPI_Recv
Other collective functions involve all the communicating processes, e.g. broadcast, scatter, …
Includes RDMA (one-sided) operations; streaming models have also been built on top of MPI
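As a sketch of a collective operation (not from the original slides), MPI_Bcast distributes the value held by the root to every process in the communicator:

#include "mpi.h"

int main(int argc, char *argv[])
{
    int a = 0, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        a = 12;

    /* Every process in MPI_COMM_WORLD receives the value from rank 0. */
    MPI_Bcast(&a, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}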
18. What is Parallel I/O?
At the program level: concurrent reads or writes from multiple processes to a common file
At the system level: a parallel file system and hardware that support such concurrent access
Three strategies of I/O in HPC:
Spokesperson
Multiple writers, multiple files
Cooperative
20. Multiple writers, multiple files
All the processes write to individual files
Easy to program
Might be limited by the file system
It doesn't scale:
The number of files creates a bottleneck in metadata operations
The number of simultaneous disk accesses creates contention for file system resources
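A minimal sketch of this pattern (the out.<rank> file names are illustrative): each rank writes its own file with ordinary stdio calls, so no coordination between processes is needed.

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank;
    int buf[1000] = {0};     /* data to write; zero-filled for the sketch */
    char filename[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One file per process: out.0, out.1, ... */
    snprintf(filename, sizeof(filename), "out.%d", rank);
    FILE *fp = fopen(filename, "wb");
    fwrite(buf, sizeof(int), 1000, fp);
    fclose(fp);

    MPI_Finalize();
    return 0;
}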
21. Cooperative Parallel I/O (Real Parallel I/O)
Multiple processes write to a shared file, potentially in a non-contiguous way
Truly parallel I/O
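A minimal sketch of cooperative I/O (the shared.out file name is illustrative): every rank writes its block at a disjoint offset of the same shared file with MPI_File_write_at.

#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_File fh;
    int rank;
    int buf[1000] = {0};     /* this rank's block of data */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "shared.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Rank r writes at byte offset r * 1000 * sizeof(int). */
    MPI_Offset offset = (MPI_Offset)rank * 1000 * sizeof(int);
    MPI_File_write_at(fh, offset, buf, 1000, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}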
22. HPC I/O Software Stack
Applications (weather forecast, CFD, astrophysics, …)
High-level I/O libraries: HDF5, NetCDF, SionLib, ADIOS
I/O middleware: MPI I/O
I/O forwarding: CIOD/DVS
Parallel file system: Lustre, GPFS, …
I/O hardware
Knowing about the lower layers allows you to optimize the higher levels of the software stack.
23. MPI I/O
Why Parallel I/O in MPI?
Writing is like sending and reading is like receiving.
Any parallel I/O system will need collective operations, communicators, …
Why do I/O in MPI? Why not just POSIX?
Parallel performance
A single file (instead of one file per process)
MPI has replacement functions for POSIX I/O
Provides a migration path
Multiple styles of I/O can all be expressed in MPI, including some that cannot be expressed without MPI
24. MPI I/O: the basics
I/O operations are for unformatted binary files, similar to read and write; there is no fwrite or fread.
Just like POSIX I/O, you need to
Open the file
Read or Write data to the file
Close the file
In MPI, these steps are almost the same:
Open the file: MPI_File_open
Write to the file: MPI_File_write
Close the file: MPI_File_close
25. An example of MPI I/O
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_File fh;
    int buf[1000], rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Collective call: every process in the communicator opens the file. */
    MPI_File_open(MPI_COMM_WORLD, "test.out",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* Independent write: only rank 0 writes its buffer. */
    if (rank == 0)
        MPI_File_write(fh, buf, 1000, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
Example code to write to a shared file
26. High-Level Parallel Libraries
Provide structure to files
Well-defined, portable formats
Self-describing
APIs more appropriate for computational science
Typed data
Noncontiguous regions in memory and file
Interfaces are implemented on top of MPI-IO
27. HDF5
The most widely used high-level library in scientific codes
HDF5 = Hierarchical Data Format
HDF5 is three things:
Data model: container, dataset, group, and link
Library: support for parallel I/O operations
File format: hierarchical data organization in a single file; typed, multidimensional array storage; attributes on datasets and data
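A minimal parallel HDF5 sketch (assuming an MPI-enabled HDF5 build; the file and dataset names are illustrative): the MPI-I/O file access property list is what makes the file collectively accessible by all ranks.

#include "hdf5.h"
#include "mpi.h"

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    /* File access property list telling HDF5 to use MPI I/O underneath. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* All ranks collectively create the file. */
    hid_t file = H5Fcreate("data.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* A 1000-element 1-D dataset of native ints. */
    hsize_t dims[1] = {1000};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "values", H5T_NATIVE_INT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* For simplicity, every rank writes the whole (identical) dataset. */
    int buf[1000] = {0};
    H5Dwrite(dset, H5T_NATIVE_INT, H5S_ALL, H5S_ALL, H5P_DEFAULT, buf);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}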
28. What about PGAS I/O?
There is an effort to design PGAS-like programming systems for I/O operations.
Different parts of a shared file are virtually mapped to a global memory space that is accessible by all processes (think of mmap, for instance):
To write to disk, make a store to global memory
To read from disk, make a load from global memory
The I/O system is becoming very heterogeneous, so it is good to have a unique, flat, global "memory" space to hide this architectural complexity.
[Diagram: a global "memory" a(0)…a(6) mapped onto File 1 and File 2; a user write of a(5) = 8.7 becomes a store into the mapped region]
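A minimal single-process sketch of the mmap analogy mentioned above (plain POSIX; file1.bin is an illustrative name): after mapping, a plain store into the array is effectively a write to the file.

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    size_t n = 7;                       /* a(0) .. a(6), as in the diagram */
    int fd = open("file1.bin", O_RDWR | O_CREAT, 0644);
    ftruncate(fd, n * sizeof(double));  /* make the file large enough */

    /* Map the file into the address space: the file now looks like memory. */
    double *a = mmap(NULL, n * sizeof(double),
                     PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    a[5] = 8.7;                         /* a store to "memory" writes the file */
    msync(a, n * sizeof(double), MS_SYNC);

    munmap(a, n * sizeof(double));
    close(fd);
    return 0;
}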
29. Conclusions
Supercomputers consist of several computing nodes connected by a high-performance network
Programming models abstract the supercomputer hardware to allow for efficient implementation of algorithms
MPI, C/C++, and Fortran are dominant
MPI I/O provides the means for real parallel I/O
HDF5 is the most widely used data format, library, and data model in HPC
PGAS I/O might be a viable option
31. www.eudat.eu
Acknowledgements
These slides are largely based on and adapted from:
- "Parallel I/O in Practice" by Rob Ross: https://www.nersc.gov/assets/Training/pio-in-practice-sc12.pdf
- "Short Introduction on Optimizing I/O" by Cray: https://www.pdc.kth.se/education/course-resources/introduction-to-cray-xc30-xc40/feb-2015/05_Short_Intro_Optimizing-IO.pdf
- "Lecture 32: Introduction to MPI I/O" by Bill Gropp: http://wgropp.cs.illinois.edu/courses/cs598-s16/lectures/lecture32.pdf