New Process/Thread Runtime
Process in Process
Techniques for Practical
Address-Space Sharing
Atsushi Hori (RIKEN)
Dec. 13, 2017
Arm HPC Workshop@Akihabara 2017
Background
• The rise of many-core architectures
• The current parallel execution models are
designed for multi-core architectures
• Shall we have a new parallel execution
model?
2
Arm HPC Workshop@Akihabara 2017
What should be shared and what should not be shared
• Isolated address spaces
• slow communication
• Shared variables
• contentions on shared variables
3
                          Address Space
                     Isolated               Shared
Variables
  Privatized   Multi-Process (MPI)
  Shared               ??             Multi-Thread (OpenMP)
Arm HPC Workshop@Akihabara 2017
What should be shared and what should not be shared
• Isolated address spaces
• slow communication
• Shared variables
• contentions on shared variables
4
                          Address Space
                     Isolated               Shared
Variables
  Privatized   Multi-Process (MPI)     3rd Exec. Model
  Shared               ??             Multi-Thread (OpenMP)
Arm HPC Workshop@Akihabara 2017
Implementation of 3rd Execution Model
• MPC (by CEA)
• Multi-thread approach
• Compiler converts all variables to thread-local storage
• a.out and b.out cannot run simultaneously
• PVAS (by RIKEN)
• Multi-process approach
• Patched Linux
• OS kernel allows processes to share address
space
• MPC, PVAS, and SMARTMAP are not portable
5
Arm HPC Workshop@Akihabara 2017
Why portability matters
• On large supercomputers (e.g., the K computer), users are
not allowed to install a modified OS kernel or kernel module
• When I tried to port PVAS onto McKernel, the
core developer denied the modification
• DO NOT CONTAMINATE MY CODE !!
6
Arm HPC Workshop@Akihabara 2017
PiP is very PORTABLE
7
                        CPU        OS
Xeon and Xeon Phi       x86_64     Linux
                        x86_64     McKernel
the K and FX10          SPARC64    XTCOS
ARM (Opteron A1170)     Aarch64    Linux
Task Spawning Time
[Figure: task spawning time (seconds) vs. number of tasks (1 to 200) on Xeon, KNL, Aarch64, and the K, comparing PiP:preload, PiP:thread, Fork&Exec, Vfork&Exec, PosixSpawn, and Pthread.]
Arm HPC Workshop@Akihabara 2017
Portability
• PiP can run on machines where
• pthread_create() (or the clone() system call)
• PIE
• dlmopen()
are supported
• PiP does not run on
• BG/Q: PIE is not supported
• Windows: PIE is not fully supported
• Mac OSX: dlmopen() is not supported
• FACT: All machines listed in the Top500 (Nov. 2017)
use a Linux-family OS !!
8
Arm HPC Workshop@Akihabara 2017
• User-level implementation of 3rd exec. model
• Portable and practical
Process in Process (PiP)
9
555555554000-555555556000 r-xp ... /PIP/test/basic
555555755000-555555756000 r--p ... /PIP/test/basic
555555756000-555555757000 rw-p ... /PIP/test/basic
555555757000-555555778000 rw-p ... [heap]
7fffe8000000-7fffe8021000 rw-p ...
7fffe8021000-7fffec000000 ---p ...
7ffff0000000-7ffff0021000 rw-p ...
7ffff0021000-7ffff4000000 ---p ...
7ffff4b24000-7ffff4c24000 rw-p ...
7ffff4c24000-7ffff4c27000 r-xp ... /PIP/lib/libpip.so
7ffff4c27000-7ffff4e26000 ---p ... /PIP/lib/libpip.so
7ffff4e26000-7ffff4e27000 r--p ... /PIP/lib/libpip.so
7ffff4e27000-7ffff4e28000 rw-p ... /PIP/lib/libpip.so
7ffff4e28000-7ffff4e2a000 r-xp ... /PIP/test/basic
7ffff4e2a000-7ffff5029000 ---p ... /PIP/test/basic
7ffff5029000-7ffff502a000 r--p ... /PIP/test/basic
7ffff502a000-7ffff502b000 rw-p ... /PIP/test/basic
7ffff502b000-7ffff502e000 r-xp ... /PIP/lib/libpip.so
7ffff502e000-7ffff522d000 ---p ... /PIP/lib/libpip.so
7ffff522d000-7ffff522e000 r--p ... /PIP/lib/libpip.so
7ffff522e000-7ffff522f000 rw-p ... /PIP/lib/libpip.so
7ffff522f000-7ffff5231000 r-xp ... /PIP/test/basic
7ffff5231000-7ffff5430000 ---p ... /PIP/test/basic
7ffff5430000-7ffff5431000 r--p ... /PIP/test/basic
7ffff5431000-7ffff5432000 rw-p ... /PIP/test/basic
...
7ffff5a52000-7ffff5a56000 rw-p ...
...
7ffff5c6e000-7ffff5c72000 rw-p ...
7ffff5c72000-7ffff5e28000 r-xp ... /lib64/libc.so
7ffff5e28000-7ffff6028000 ---p ... /lib64/libc.so
7ffff6028000-7ffff602c000 r--p ... /lib64/libc.so
7ffff602c000-7ffff602e000 rw-p ... /lib64/libc.so
7ffff602e000-7ffff6033000 rw-p ...
7ffff6033000-7ffff61e9000 r-xp ... /lib64/libc.so
7ffff61e9000-7ffff63e9000 ---p ... /lib64/libc.so
7ffff63e9000-7ffff63ed000 r--p ... /lib64/libc.so
7ffff63ed000-7ffff63ef000 rw-p ... /lib64/libc.so
7ffff63ef000-7ffff63f4000 rw-p ...
7ffff63f4000-7ffff63f5000 ---p ...
7ffff63f5000-7ffff6bf5000 rw-p ... [stack:10641]
7ffff6bf5000-7ffff6bf6000 ---p ...
7ffff6bf6000-7ffff73f6000 rw-p ... [stack:10640]
7ffff73f6000-7ffff75ac000 r-xp ... /lib64/libc.so
7ffff75ac000-7ffff77ac000 ---p ... /lib64/libc.so
7ffff77ac000-7ffff77b0000 r--p ... /lib64/libc.so
7ffff77b0000-7ffff77b2000 rw-p ... /lib64/libc.so
7ffff77b2000-7ffff77b7000 rw-p ...
...
7ffff79cf000-7ffff79d3000 rw-p ...
7ffff79d3000-7ffff79d6000 r-xp ... /PIP/lib/libpip.so
7ffff79d6000-7ffff7bd5000 ---p ... /PIP/lib/libpip.so
7ffff7bd5000-7ffff7bd6000 r--p ... /PIP/lib/libpip.so
7ffff7bd6000-7ffff7bd7000 rw-p ... /PIP/lib/libpip.so
7ffff7ddb000-7ffff7dfc000 r-xp ... /lib64/ld.so
7ffff7edc000-7ffff7fe0000 rw-p ...
7ffff7ff7000-7ffff7ffa000 rw-p ...
7ffff7ffa000-7ffff7ffc000 r-xp ... [vdso]
7ffff7ffc000-7ffff7ffd000 r--p ... /lib64/ld.so
7ffff7ffd000-7ffff7ffe000 rw-p ... /lib64/ld.so
7ffff7ffe000-7ffff7fff000 rw-p ...
7ffffffde000-7ffffffff000 rw-p ... [stack]
ffffffffff600000-ffffffffff601000 r-xp ... [vsyscall]
[Figure annotation: the address map above holds the program and Glibc multiple times in one shared address space; Task-0 through Task-(n-1) are loaded from a.out and each has its own instance of "int x", while Task-(n) through Task-(m-1) are loaded from b.out and each has its own instance of "int a".]
Arm HPC Workshop@Akihabara 2017
Why is address-space sharing better?
• Memory mapping techniques in multi-process model
• POSIX (SYS-V, mmap, ..) shmem
• XPMEM
• Same page table is shared by tasks
• no page table coherency overhead
• saving memory for page tables
• pointers can be used as they are
10
[Figure: Proc-0 and Proc-1 each keep their own page-table entries for the shared region and must keep them coherent with the shared physical memory pages; maintaining this coherency is overhead (system calls, page faults, and page-table size).]
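To make that overhead concrete, here is a minimal sketch of the sender-side POSIX shmem setup (shm_open, ftruncate, mmap, close), the same call sequence whose per-call cycle counts appear in Table 5 on the next slide. The segment name "/pip_demo" and the 1 MiB size are illustrative placeholders, not part of PiP.

```c
/* Minimal POSIX shmem sender-side setup (illustrative placeholder names).
 * Every process wanting access repeats a similar sequence, and each process
 * gets its own page-table entries that the kernel must keep coherent --
 * the cost PiP avoids by sharing a single page table.
 * Link with -lrt on older glibc. */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

static void *create_shared_region(size_t size) {
    int fd = shm_open("/pip_demo", O_CREAT | O_RDWR, 0600); /* create/open segment */
    if (fd < 0) return NULL;
    if (ftruncate(fd, size) != 0) { close(fd); return NULL; } /* set segment size */
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                              /* the mapping survives the close */
    return (p == MAP_FAILED) ? NULL : p;
}

int main(void) {
    void *region = create_shared_region(1 << 20); /* 1 MiB demo region */
    shm_unlink("/pip_demo");                      /* remove the name when done */
    return region ? 0 : 1;
}
```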
Arm HPC Workshop@Akihabara 2017
Memory Mapping vs. PiP
11
(Excerpt from the authors' PPoPP 2018 paper on techniques for practical address-space sharing, February 24-28, 2018, Vienna, Austria; the surrounding paper text and platform hardware/software tables are cropped in the slide.)
Table 5. Overhead of XPMEM and POSIX shmem functions (Wallaby/Linux)

  XPMEM                     Cycles
  xpmem_make()               1,585
  xpmem_get()               15,294
  xpmem_attach()             2,414
  xpmem_detach()            19,183
  xpmem_release()              693

  POSIX shmem               Cycles
  Sender    shm_open()      22,294
            ftruncate()      4,080
            mmap()           5,553
            close()          6,017
  Receiver  shm_open()      13,522
            mmap()          16,232
            close()         16,746
6.2 Page Fault Overhead
Figure 4 shows the time series of each access using the same
microbenchmark program used in the preceding subsection.
Element access was strided with 64 bytes so that each cache
block was accessed only once, to eliminate the cache block
effect. In the XPMEM case, the mmap()ed region was attached
by using the XPMEM functions. The upper-left graph in
this figure shows the time series using POSIX shmem and
XPMEM, and the lower-left graph shows the time series
using PiP. Both graphs on the left-hand side show spikes at
every 4 KiB. Because of space limitations, we do not show ...
[Figure (Xeon/Linux): per-access time in CPU ticks vs. array byte offset (0 to 16,384) for POSIX shmem and XPMEM with 4 KiB and 2 MiB page sizes, and for PiP:process and PiP:thread; the shmem/XPMEM curves spike into the thousands of ticks at page boundaries, while PiP takes less than 100 clocks !!]
Arm HPC Workshop@Akihabara 2017
Process in Process (PiP)
• dlmopen (not a typo of dlopen)
• loads a program into a new name space
• The same variable “foo” can have multiple
instances with different addresses
• Position Independent Executable (PIE)
• PIE programs can be loaded at any location
• Combine dlmopen and PIE
• load a PIE program with dlmopen
• We can privatize variables in the same
address space
12
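A minimal sketch of this combination, under stated assumptions: ./a.out is built as a PIE with -fpie -pie -rdynamic and exports a global variable named foo, and the glibc in use (as in 2017) still permits dlmopen() of a PIE executable. This shows only the underlying glibc mechanism, not the PiP API. Build the loader with -ldl.

```c
/* Sketch: load the same PIE twice, each time into a new link-map namespace,
 * so each load gets its own private instance of the global "foo" while both
 * live in this one address space.  "./a.out" and "foo" are assumptions. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    void *task0 = dlmopen(LM_ID_NEWLM, "./a.out", RTLD_NOW);
    void *task1 = dlmopen(LM_ID_NEWLM, "./a.out", RTLD_NOW);
    if (task0 == NULL || task1 == NULL) {
        fprintf(stderr, "dlmopen: %s\n", dlerror());
        return 1;
    }
    /* The "same" variable now exists twice, at different addresses. */
    void *foo0 = dlsym(task0, "foo");
    void *foo1 = dlsym(task1, "foo");
    printf("foo instances: %p and %p\n", foo0, foo1);
    return 0;
}
```

PiP combines this loading step with the pthread_create()/clone() requirement listed on the portability slide so that each loaded program runs as its own task.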
Arm HPC Workshop@Akihabara 2017
Glibc Issue
• In the current Glibc, dlmopen() can create only up to 16
name spaces
• Each PiP task requires one name space to hold its
privatized variables
• Many-core architectures can run more than 16 PiP tasks,
up to the number of CPU cores
• A Glibc patch is also provided to allow a larger number of
name spaces, in case 16 is not enough
• by changing the size of the name space table
• Currently 260 PiP tasks can be created
• Some workaround code can be found in the PiP library
source
13
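The 16-name-space cap can be observed with a small probe (a sketch only; ./libdummy.so stands in for any small shared object): stock glibc limits link-map namespaces to DL_NNS = 16, so the loop below stops well before the core count of a many-core CPU.

```c
/* Probe how many new link-map namespaces dlmopen() grants before failing.
 * "./libdummy.so" is a placeholder shared object.  Link with -ldl. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    int n = 0;
    while (dlmopen(LM_ID_NEWLM, "./libdummy.so", RTLD_NOW) != NULL)
        n++;
    printf("created %d extra namespaces, then: %s\n", n, dlerror());
    return 0;
}
```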
Arm HPC Workshop@Akihabara 2017
PiP Showcases
14
Arm HPC Workshop@Akihabara 2017
Showcase 1 : MPI pt2pt
• Current Eager/Rndv.: 2 copies
• PiP Rndv.: 1 copy (see the sketch after the figure below)
15
[Figure (Xeon/Linux): pt2pt bandwidth (MB/s) vs. message size (bytes) for eager-2copy, rndv-2copy, and PiP (rndv-1copy); PiP is 3.5x faster at 128 KB (higher is better).]
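A conceptual sketch of the copy counts above (not MPICH code): in a conventional shared-memory path the payload is staged through an intermediate shared buffer (two copies in total), whereas under PiP the receiver can copy straight out of the sender's user buffer (one copy), assuming the buffer address has already been exchanged, e.g. with a call like the pip_get_addr() shown later.

```c
#include <string.h>

/* Conventional path: the sender has already copied its user buffer into a
 * shared staging slot (copy #1); the receiver copies it out (copy #2). */
void shm_recv_2copy(void *dst, const void *shm_slot, size_t len) {
    memcpy(dst, shm_slot, len);
}

/* PiP rendezvous path: both tasks share the address space, so the receiver
 * copies directly from the sender's user buffer -- the only copy. */
void pip_recv_1copy(void *dst, const void *sender_user_buf, size_t len) {
    memcpy(dst, sender_user_buf, len);
}
```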
Arm HPC Workshop@Akihabara 2017
Showcase 2 : MPI DDT
• Derived Data Type (DDT) Communication
• Non-contiguous data transfer
• Current pack - send - unpack (3 copies)
• PiP non-contig send (1 copy)
16
[Figure (Xeon/Linux): normalized time (relative to eager-2copy) vs. the count of double elements in the X, Y, Z dimensions, from (64K, 16, 128) down to (64, 16K, 128), for eager-2copy (base), rndv-2copy, and PiP with non-contiguous vector datatypes (lower is better).]
Arm HPC Workshop@Akihabara 2017
Showcase 3 : MPI_Win_allocate_shared (1/2)
17
MPI Implementation
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);
...
MPI_Win_allocate_shared(size, 1,
MPI_INFO_NULL, comm, &mem, &win);
...
MPI_Win_shared_query(win, north, &sz,
&dsp_unit, &northptr);
MPI_Win_shared_query(win, south, &sz,
&dsp_unit, &southptr);
MPI_Win_shared_query(win, east, &sz,
&dsp_unit, &eastptr);
MPI_Win_shared_query(win, west, &sz,
&dsp_unit, &westptr);
...
MPI_Win_lock_all(0, win);
for(int iter=0; iter<niters; ++iter) {
MPI_Win_sync(win);
MPI_Barrier(shmcomm);
/* stencil computation */
}
MPI_Win_unlock_all(win);
...
}
PiP Implementation
int main(int argc, char **argv) {
pip_init( &pipid, &p, NULL, 0 );
...
mem = malloc( size );
...
pip_get_addr( north, mem, &northptr );
pip_get_addr( south, mem, &southptr );
pip_get_addr( east, mem, &eastptr );
pip_get_addr( west, mem, &westptr );
...
for(int iter=0; iter<niters; ++iter) {
pip_barrier( p );
...
/* stencil computation */
}
...
pip_fin();
}
Arm HPC Workshop@Akihabara 2017
Showcase 3 : MPI_Win_allocate_shared (2/2)
18
5P Stencil (4K x 4K), KNL
[Figures: total number of page faults (left) and total page-table size in KiB (right) vs. number of tasks (1 to 1,000) for PiP and MPI (MPI_Win_allocate_shared); the right graph also shows the MPI page-table size as a percentage of the array size. PiP incurs far fewer page faults and much smaller page tables (lower is better).]
Arm HPC Workshop@Akihabara 2017
Showcase 4 : In Situ
19
[Figure: in the original SHMEM-based in situ, the LAMMPS process copies data into a pre-allocated shared buffer (copy-in) and the in situ process copies it out (copy-out) before gathering the data chunks, analysis, and dump; in the PiP-based in situ, the in situ process copies the data out of the LAMMPS process directly.]
LAMMPS in situ: POSIX shmem vs. PiP (Xeon/Linux)
[Figure: slowdown ratio (relative to running without in situ) for the LAMMPS 3d Lennard-Jones melt at problem sizes (4,4,4) through (12,12,12), POSIX shmem vs. PiP (lower is better).]
• LAMMPS process ran with four OpenMP threads
• In situ process ran with a single thread
• The O(N^2) computation cost exceeds the data transfer cost at (12,12,12)
Arm HPC Workshop@Akihabara 2017
Showcase 5 : SNAP
20
PiP vs. threads in hybrid MPI+X SNAP, strong scaling on OFP (1-16 nodes, flat mode)
• Hybrid ( MPI + OpenMP ) compared with ( MPI + PiP )

Number of cores              16     32     64    128    256    512   1024
Solve time, MPICH/Threads  683.3  379.1  207.9  153.0  106.4   91.6   83.3  [s]
Solve time, MPICH/PiP      430.5  221.2  123.0   68.3   42.0   27.7   22.0  [s]
Speedup (PiP vs. Threads)   1.6x   1.7x   1.7x   2.2x   2.5x   3.3x   3.8x

(lower solve time and higher speedup are better)
Arm HPC Workshop@Akihabara 2017
Showcase 5 : Using in Hybrid MPI + “X” as the
“X” (2)
21
• PiP-based parallelism
• Easy application data sharing across cores
• No multithreading safety overhead
• Naturally utilizing multiple network ports
[Figure: each PiP task keeps its own MPI stack and network port while sharing the application data.]
[Figures: multipair message rate (osu_mbw_mr, K messages/s) vs. message size (1 B to 4 MB) for 1, 4, 16, and 64 pairs, between PiP tasks and between threads, measured between two OFP nodes (Xeon Phi + Linux, flat mode).]
Arm HPC Workshop@Akihabara 2017
Research Collaboration
• ANL (Dr. Pavan and Dr. Min) — DOE-MEXT
• MPICH
• UT/ICL (Prof. Bosilca)
• Open MPI
• CEA (Dr. Pérache) — CEA-RIKEN
• MPC
• UIUC (Prof. Kale) — JLESC
• AMPI
• Intel (Dr. Dayal)
• In Situ
22
Arm HPC Workshop@Akihabara 2017
Summary
• Process in Process (PiP)
• A new implementation of the 3rd execution
model
• better than memory mapping techniques
• PiP is portable and practical because of its
user-level implementation
• it can run on the K and OFP
supercomputers
• The showcases show that PiP can improve
performance
23
Arm HPC Workshop@Akihabara 2017
Final words
• The Glibc issues will be reported to Red Hat
• We are seeking PiP applications not only in HPC
but also in enterprise
24