Implementation and integration of GPU-accelerated easyWave for instant tsunami propagation calculations in the TRIDEC tsunami early warning system demonstrator
An alternative way to quickly access simulated tsunami wave propagations in early warning systems, based on the actual earthquake parameters, is on-demand computation with feasible algorithms, high-performance hardware, and optimized code. One simulation, or several simulations with varying granularity, covering several hours of tsunami wave propagation and tailored to the actual situation, can thus be computed within seconds. Since the main uncertainty in early warning originates from the source parameters, on-demand computation furthermore allows fast re-computation whenever the parameters are updated.
Mathematical Modelling for Tsunami Early Warning Systems, Malaga, April 9-11, 2014
https://edanya.uma.es/tsumamos2014/
http://youtu.be/6xFJZzWNi7o
1. Implementation and integration of GPU-accelerated easyWave
for instant tsunami propagation calculations
in the TRIDEC tsunami early warning system demonstrator
Martin Hammitzsch¹, Johannes Spazier², Andrey Babeyko¹, and Sven Reißland¹
¹ GFZ German Research Centre for Geosciences, ² University Potsdam
TsuMaMoS 2014 – Mathematical Modelling for Tsunami Early Warning Systems
9-11 April 2014, Malaga, Spain
2. Motivation
• Matching Simulation Databases (MSDB)
– Tsunami Early Warning Systems (TEWS) store a large number of pre-computed tsunami
simulations in a database, which typically weighs several terabytes
– Given earthquake event parameters, from pre-computed simulations
• Either the best matches are picked with the closest source model,
• Or a composite simulation is built by combining individual pre-computed simulations
– The construction, operation and maintenance of MSDBs not only require time-consuming
pre-computation and set-up but also pose an IT and management challenge
– MSDBs may introduce problems because the closest pre-computed source model and fault
parameters do not necessarily coincide with the actual seismic observations
• On-demand simulation computation
– The fast development of computational power in recent years raises the question of
employing on-demand (on-the-fly) computation as an alternative way to quickly access
simulated tsunami wave propagations in TEWS
– Given actual earthquake event parameters, on-demand computations
• Use feasible algorithms, high performance hardware, and optimized code
• Compute within seconds one simulation, or several simulations with varying granularity,
covering several hours of tsunami wave propagation, tailored to the actual situation
• Allow fast re-computation in case of updated parameters
– Changes in algorithms, data etc. can be applied immediately
4. easyWave
• Application used to simulate tsunami generation and
propagation in the context of early warning
– Employs a lightweight numerical scheme to simulate tsunami wave propagation
and run-ups with an accuracy reasonable for early warning purposes
– Computes spherical shallow water equations in linear approximation without coastal
inundations and without detailed run-ups
– Applies Green's law to estimate peak coastal tsunami amplitudes based on tsunami
waves calculated for the validity limit of the linear shallow water model, usually for
20-50m depth
• Use of GPU acceleration to speed up calculations
• AGPL licensed free and open source software (FOSS)
– Go to http://trac.gfz-potsdam.de/easywave and make use of it.
5. Theoretical and numerical background (1)
• Linear shallow water equations in spherical coordinates
(fluxes formulation):
• Variables to solve – h: wave height, M: longitudinal flux, N: latitudinal flux
• Parameters – D: bathymetry, g: gravity, R: Earth radius
• Coordinates – θ and λ
∂h/∂t + 1/(R cosθ) · [∂M/∂λ + ∂(N cosθ)/∂θ] = 0   (mass conservation)
∂M/∂t + gD/(R cosθ) · ∂h/∂λ = 0                   (momentum conservation, longitude)
∂N/∂t + gD/R · ∂h/∂θ = 0                          (momentum conservation, latitude)
6. Theoretical and numerical background (2)
• Leap-frog explicit time stepping at expanding staggered finite-
difference uniform grid
• Follows well-known TUNAMI-F1 numerical algorithm (IUGG/IOC
Time Project, IOC Manuals and Guides No. 35, UNESCO 1997)
• Boundary conditions:
− Open ocean radiation
− Full normal reflection on land
• Coastal flow-depth by extrapolation from offshore positions at
50-100 m depth similar to Japanese TEWS (Kamigaichi, 2009)
• Input:
− Okada's faults (multiple)
− Direct uplift at a grid (Golden Software GRD-format)
• Output:
− EWH and ETA at given points-of-interest (POI)
− 1D time series at POIs
− 2D maps (GRD-format) of wave propagation, max wave heights and arrival times
− 2D post-processing to PNG-images and MPEG
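The leap-frog time stepping described above can be sketched in a minimal form, reduced here to 1-D Cartesian coordinates for brevity (easyWave itself solves the spherical 2-D equations on an expanding grid). All names and parameter values are illustrative:

```c
#include <assert.h>
#include <math.h>

#define NX 64

/* One leap-frog step of the 1-D linear shallow water equations on a
 * staggered grid (h at cell centres, flux M at cell faces), as a
 * simplified Cartesian sketch of the TUNAMI-F1-style scheme; the real
 * code works on a spherical, expanding 2-D grid. */
static void leapfrog_step(double h[NX], double M[NX + 1],
                          const double D[NX + 1],
                          double dt, double dx, double g) {
    /* continuity: update wave height from the flux divergence */
    for (int i = 0; i < NX; ++i)
        h[i] -= dt / dx * (M[i + 1] - M[i]);
    /* momentum: update flux from the new height gradient */
    for (int i = 1; i < NX; ++i)
        M[i] -= g * D[i] * dt / dx * (h[i] - h[i - 1]);
    /* full normal reflection at the domain ends (M = 0 on land) */
    M[0] = M[NX] = 0.0;
}
```

With reflective boundaries this discretization conserves the total water volume exactly, which is a useful sanity check on any implementation.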
7. Example
• The synthetic mareograms in Fig. 3 were not fitted to the buoy observations. They result
solely from the GPS source inversion and are hence fully independent of the buoy
records. The comparison confirms the quality of the fast source model.
• See Hoechner, A., Ge, M., Babeyko, A. Y., and Sobolev, S. V.: Instant tsunami early
warning based on real-time GPS – Tohoku 2011 case study, Nat. Hazards Earth Syst. Sci.,
13, 1285-1292, doi:10.5194/nhess-13-1285-2013, 2013.
9. Concept
• Compute Unified Device Architecture (CUDA)
– Introduced by NVIDIA in 2006
– CUDA enables direct programming of the GPU and thus to exploit its computing power
for scientific applications
– Possibility to offload special portions (kernels) of the code to the GPU, where they run
with thousands of threads on hundreds of cores in parallel
– CUDA-C extends C/C++ by special syntax and runtime libraries
(no new language to learn)
– CUDA runs on all current NVIDIA devices (but not on other cards, e.g. AMD)
• Porting easyWave to a parallel GPU version using CUDA
– Offload calculation of the linear shallow water equations to the GPU and process many
grid points at the same time
– Handle data transfers that are required because of separated memory areas on CPU
and GPU
– Leave rest of the code unchanged
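The offload pattern can be illustrated with the per-cell update that a single GPU thread would perform. This is a plain-C sketch: in CUDA-C the caller would be a __global__ kernel deriving the indices from blockIdx/threadIdx, as noted in the comments, and one thread would execute this body per grid point. The array layout and all names are ours, not easyWave's:

```c
#include <assert.h>
#include <math.h>

/* Per-cell height update as a CUDA kernel thread would compute it:
 * one thread handles one (i, j) grid point. In CUDA-C the indices
 * would come from i = blockIdx.x * blockDim.x + threadIdx.x (and j
 * likewise from the .y components); here a plain C function stands
 * in for the thread body. Layout and names are illustrative. */
static void update_height_cell(double *h, const double *M, const double *N,
                               int i, int j, int stride, double c) {
    int k = j * stride + i;
    /* flux divergence at cell (i, j); M staggered in x, N in y */
    h[k] -= c * (M[k + 1] - M[k] + N[k + stride] - N[k]);
}
```

Because each cell depends only on its own neighbours, thousands of such updates are independent and can run in parallel, which is exactly what makes the scheme a good fit for the GPU.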
10. GPU performance compared to CPU
X – Model integration time in minutes, Y – Computational time in seconds
11. Hardware specific software optimization
• GPU hardware matters
– Latest CUDA compute architectures beat old devices
– Latest CUDA compute architectures beat hardware specific software optimization
• Optimization relevant for future application with old hardware
– Hardware specific software optimization for Tesla architecture (compute capability 1.x) only
– Hardware specific software optimization achieves performance of new hardware generation
compared to GPU cores (C1060 with 240 cores and C2075 with 448 cores)
See Christgau, S., Spazier, J., Schnor, B., Hammitzsch, M., Babeyko, A., and Waechter, J.: A Comparison of CUDA and OpenACC: Accelerating the Tsunami Simulation EasyWave, Architecture of Computing Systems (ARCS) 2014, February 25-28, 2014
12. Optimized grid stripe GPU parallelization
• Hardware performance comparison
– Speed-up factor 37 from E5-1603 (red) to GTX Titan 1x (grey)
– Speed-up factor 29 from i7-3970X (green) to GTX Titan 1x (grey)
• GPU parallelization relevant for future application with long run time
– Speed-up factor 1.65 from C1060 1x (blue) to C1060 2x (pink)
– Speed-up factor 2.08 from C1060 1x (blue) to C1060 4x (azure), with exchange via CPU communication
– Speed-up factor 1.94 from GTX Titan 1x (grey) to GTX Titan 2x (brown), with exchange via P2P memcopies
14. TRIDEC Project
• Focuses on new technologies for real-time intelligent information management in collaborative, complex critical decision processes
• An important application field of the developed technology is the management of natural crises, i.e. tsunamis
• Based on the development of and experiences with the German Indonesian Tsunami Early Warning System (GITEWS) and the Distant Early Warning System (DEWS)
• In TRIDEC, new developments extend the existing platform for both sensor integration and warning dissemination
• Builds distributed tsunami warning systems for transnational deployment based on a component-based technology framework
15. On-demand computation in TRIDEC
• Re-engineering and porting of easyWave
– Re-engineering of algorithm and code to serve as the essential foundation for the GPU version
– Parallelisation from sequential CPU computation to parallel multi GPU processing as
native CUDA implementation for NVIDIA cards
– Optimization by various techniques and analysis of impact on performance for different
hardware generations
– Time savings from the more than twenty-fold accelerated computation
• Integration and use in the TRIDEC TEWS demonstrator
– Wrapping of most optimised easyWave GPU version by an abstraction layer for a
service-like integration in the TRIDEC TEWS demonstrator
– Request of simulation computations based on earthquake event parameters by
operators on duty working with the TRIDEC Command and Control User Interface
(CCUI)
• Computation of two simulations in parallel for the Portuguese system set-up, one
for the Gulf of Cadiz region with 3 hours wave propagation and another one for the
North East Atlantic region with 10 hours wave propagation.
– Serving input for the tsunami warning message generation and dissemination
• Provision of Estimated Time of Arrival (ETA) and the Estimated Wave Height (EWH)
for Tsunami Forecast Points (TFP)
19. New concept
• Backend
– Private cloud with NVIDIA GPUs on different servers with varying hardware
– easyWave GPU port computing the whole wave field with wave amplitudes
– Software for managing computations with GPUs on different servers
• Frontend
– Web 2.0 website with map interface
– List with latest moment tensor events from GEOFON
– Automatic processing based on EQ thresholds defined in simplified decision matrix
– Re-processing of GEOFON events with modified EQ parameters
– Processing of events with self-defined parameters
– Computation of isochrones (tsunami travel times) and isohypses (wave heights)
– Computation of estimated time of arrival (ETA) and wave height (EWH/SSH) for
selected Tsunami Forecast Points (TFP) in the North-eastern Atlantic, the
Mediterranean and connected seas (NEAM) region
• Limited public access, full access for registered researchers
Wave propagation time, and thus the expanding grid size, matters.
The initial, non-linear portion of the plot is due to the expanding grid used by easyWave. Only a few codes use expanding grids; usually the whole grid is used at each time step, in which case the graphs increase linearly.
GPU usage relevant for future application with high-resolution grids
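The effect of the expanding grid on runtime can be made concrete with a toy cost model: the work per step grows with the active window until it covers the whole grid, which is what bends the runtime curve before it becomes linear. The functions and numbers below are illustrative, not measured easyWave behaviour:

```c
#include <assert.h>

/* Cumulative work with an active window that grows each step, capped
 * at the full grid, versus updating the full grid at every step.
 * Work is counted in cell updates; all parameters are illustrative. */
static long cumulative_work_expanding(int steps, int full, int growth) {
    long work = 0;
    int active = 1;                        /* active window edge length */
    for (int s = 0; s < steps; ++s) {
        work += (long)active * active;     /* cells updated this step */
        active += growth;                  /* wave front expands the window */
        if (active > full) active = full;  /* window capped at full grid */
    }
    return work;
}

static long cumulative_work_full(int steps, int full) {
    return (long)steps * full * full;      /* whole grid every step */
}
```

Once the window is capped, each further step adds a constant amount of work, so the cumulative curve turns linear, matching the plot described above.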
Version 1.1 - Memory alignment
Version 2.0 - Call by value
Version 2.1 - Parallel grid enlargement
Version 2.2 - Register usage
Version 2.3 - Call-by-value extension
Version 3.0 - Shared memory
Version 3.1 - Shared memory extension
Tesla C1060, compute capability 1.x, Tesla compute architecture (GFZ)
Tesla C2075, compute capability 2.x, Fermi compute architecture (IfI)
GeForce GTX Titan, compute capability 3.x, Kepler compute architecture (TRIDEC)