
byteLAKE's exceptional experience with NVIDIA

Looking for expertise in #NVIDIA technologies? Look no further! We have worked with NVIDIA technologies in every configuration and across all microarchitectures.

► Expertise in all possible configurations
• desktop, mobile, server
• Tesla, Fermi, Kepler (K80, GeForce GTX Titan, Jetson), Maxwell (NVIDIA GeForce GTX 980), Pascal (P100), Tesla V100, T4
• CUDA, OpenCL, OpenACC

► Several case studies delivered
• AI training (machine/deep learning)
• Edge AI inferencing
• Classic HPC simulations (CFD, weather)

► Very active in research space
• several publications in prestigious journals (Concurrency and Computation: Practice and Experience, Parallel Computing, Journal of Supercomputing, etc.)

Learn more at:



Slide 1: byteLAKE’s exceptional experience with NVIDIA (SUMMARY)
Brief summary. Read latest updates at
Artificial Intelligence • Machine Learning • Computer Vision • Edge AI • Cognitive Automation • HPC
byteLAKE, Europe & USA: +48 508 091 885, +48 505 322 282, +1 650 735 2063
Slide 2: Summary: byteLAKE’s exceptional experience with NVIDIA (Feb-20)
HPC Simulations
• parallelization of the EULAG model (weather simulations)
• porting and adaptation of various applications/algorithms to HPC (CPU + GPU) architectures
• about EULAG: the model has a proven record of successful applications and excellent efficiency and scalability on conventional supercomputer architectures; for instance, it is being implemented as the new dynamical core of the COSMO weather prediction framework
Expertise in CUDA, OpenCL, OpenACC and NVIDIA hardware (from small edge devices to HPC clusters)
• NVIDIA architectures such as Kepler (e.g. K80 for servers, GeForce GTX Titan for desktop, Jetson for mobile), Maxwell (e.g. NVIDIA GeForce GTX 980 for desktop) and Pascal (e.g. P100 for servers)
• we have worked on NVIDIA platforms since the Tesla architecture (e.g. the C1060 card, 2008) and the Fermi architecture (e.g. the C2050 card)
• our recent projects targeted the V100 and T4
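The porting work above revolves around stencil kernels such as those in EULAG's MPDATA advection scheme. As a purely illustrative sketch (not byteLAKE's code), a serial 5-point stencil sweep — the kind of loop that such porting maps onto CUDA kernels — might look like:

```python
def stencil_step(grid):
    """One 5-point stencil sweep over the interior of a 2D grid.
    Boundary cells are copied unchanged (simplified boundary handling).
    Illustrative only: real MPDATA stencils are 3D and more involved."""
    n, m = len(grid), len(grid[0])
    out = [row[:] for row in grid]  # copy, so boundaries carry over
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out
```

On a GPU, each (i, j) interior point becomes one thread, and the decomposition of this loop nest into thread blocks is exactly what the tuning work on the following slides addresses.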
Slide 3: More about HPC weather simulations
• we analyzed the algorithm’s overall resource usage and its influence on system performance
• based on that analysis, we removed bottlenecks and developed a method for efficiently distributing computation across GPU kernels
• the method analyzes memory transactions between GPU global and shared memory, which lets us apply several acceleration strategies: stencil decomposition, block decomposition (weighing computation against communication), reduced inter-memory communication, and register-file reuse
• we also applied further optimization techniques, including 2.5D blocking, coalesced memory access, padding and high GPU occupancy, as well as algorithm-specific optimizations such as rearranging boundary conditions (to reduce branch divergence) and managing the exchange of halo areas between graphics processors within a single node
• together, these significantly improved the overall performance of the simulation algorithm
• on top of these, we built a machine-learning-based auto-tuning procedure that automates adapting the simulation to a given set of GPUs, taking their individual characteristics into account; the tuned parameters include the CUDA block size for each kernel, the data-alignment boundary for each of the algorithm’s arrays, the GPU shared-memory configuration, cached vs. non-cached memory access, and the CUDA compute-capability setting
Results of the HPC weather simulation improvements:
• we experimentally validated our methods on NVIDIA Kepler-based GPUs (incl. Tesla K20X, GeForce GTX TITAN, a single Tesla K80, and a multi-GPU system with two K80 cards) as well as on a GeForce GTX 980 based on the NVIDIA Maxwell architecture
• depending on the grid size and device architecture, our method achieved a speed-up over the basic version of the HPC simulation (without the auto-tuning mechanism) from 1.1x on the GeForce GTX 980 to 1.92x on 2x Tesla K80 (the low GeForce GTX 980 speed-up is case specific)
• we then focused on inter- and intra-node overlapping of data transfers with GPU computations on the GPU-accelerated cluster
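To make the auto-tuning idea concrete, here is a minimal sketch of the simplest possible variant: exhaustively timing a kernel under each candidate block size and keeping the fastest. All names here are hypothetical; the system described above tunes many more parameters (alignment, shared-memory configuration, compute capability) and uses machine learning rather than brute force.

```python
import time

def autotune_block_size(kernel, candidate_blocks, runs=3):
    """Time `kernel` for each candidate block size and return the fastest.
    `kernel` is any callable taking a block-size parameter.
    (Hypothetical sketch, not byteLAKE's ML-driven tuner.)"""
    best_block, best_time = None, float("inf")
    for block in candidate_blocks:
        start = time.perf_counter()
        for _ in range(runs):          # average over a few runs to damp noise
            kernel(block)
        elapsed = (time.perf_counter() - start) / runs
        if elapsed < best_time:
            best_block, best_time = block, elapsed
    return best_block
```

The appeal of learning-based tuning over this brute-force loop is that the search space (block sizes x alignments x memory configs, per kernel) explodes combinatorially, and a trained model can prune it without timing every point.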
Slide 4:
• for the Piz Daint cluster (equipped with NVIDIA Tesla K20 GPUs, as of 2015), our approach achieved weak scalability up to 136 nodes, with performance exceeding 16 TFlop/s in double precision; all in all, the improved code was almost twice as fast as the basic one
• beyond performance, we also reduced energy consumption: we applied mixed-precision arithmetic to the algorithm and managed it dynamically using a modified random-forest (machine learning) algorithm; we deployed it on the Piz Daint supercomputer (ranked 3rd on the TOP500 list as of Nov. 2017), which is equipped with NVIDIA Tesla P100 GPU accelerators based on the NVIDIA Pascal architecture
• we also deployed it on the MICLAB cluster containing NVIDIA Tesla K80 (Kepler-based) GPUs; as a result, we reduced energy consumption by up to 36%
Example research publications using NVIDIA hardware:
• Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators, Parallel Computing 40(8), 2014, 425-447
• Adaptation of fluid model EULAG to graphics processing unit architecture, Concurrency and Computation: Practice and Experience 27(4), 2015, 937-957
• Performance modeling of 3D MPDATA simulations on GPU cluster, Journal of Supercomputing 73(2), 2017, 664-675
• Systematic adaptation of stencil-based 3D MPDATA algorithm to GPU architectures, Concurrency and Computation: Practice and Experience 29(9), 2017
• Machine learning method for energy reduction by utilizing dynamic mixed precision on GPU-based supercomputers, Concurrency and Computation: Practice and Experience
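The core trade-off behind the dynamic mixed-precision work above is that single precision halves memory traffic (and cuts energy) but only carries about 7 decimal digits. A crude, purely illustrative stand-in for that decision — a static error threshold instead of the modified random forest the slides describe — could look like:

```python
import struct

def to_float32(x):
    """Round a Python float (double) to the nearest IEEE-754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def demote_if_safe(values, rel_tol):
    """Keep a field in single precision only when the worst-case relative
    rounding error stays within the field's tolerance; otherwise keep double.
    (Illustrative only: the system above decides dynamically with a modified
    random forest over runtime features, not a static threshold.)"""
    worst = max((abs(to_float32(v) - v) / abs(v) for v in values if v != 0.0),
                default=0.0)
    if worst <= rel_tol:
        return [to_float32(v) for v in values], "float32"
    return list(values), "float64"
```

Powers of two convert exactly, so a field like [1.0, 0.5, 0.25] stays in float32 even under a tight tolerance, while a value like 1/3 (relative rounding error near float32's epsilon) forces float64.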
Slide 5: AI & HPC Convergence Research at byteLAKE
Slide 6: Learn more: