NVIDIA Tesla GPUs can accelerate computational research by delivering more performance per dollar and per watt than CPUs alone. GPUs allow for faster simulation times, higher accuracy, and more research to be conducted. UCLA's physics and astronomy department saw 20% higher performance at the same power cost by upgrading its cluster from Tesla M2070 to Tesla M2090 GPUs. By 2012, about 22,500 academic papers had been published on GPU computing, up from 4,000 in 2008, showing its widespread adoption for accelerating science applications.
2. Lift the Barriers of HPC
Faster / More Research: Faster, More Discovery, Higher Accuracy
Maximum Performance: More Performance per dollar
Greater Budget & Power Efficiencies: More Performance per watt
3. GPU Impact to Computational Research
More Research: 88 ns/day, 6x faster JAC simulation time (23,558 atoms, DHFR, AMBER 11)
Maximum Performance: 318% higher performance, 54% added cost
Efficient Power: 2.5x flops/watt (Tianhe-1A: CPU + GPU, #2 Top500; Jaguar: CPU only, #3 Top500)
CPU: Dual socket Intel Xeon X5670, 2.93 GHz (12 cores)
Axel Kohlmeyer: Temple University
4. GPU Computing by Numbers
                   2008     2012
Universities       60       583
CUDA Downloads     150K     1.5M
Academic Papers    4,000    22,500
Supercomputers     1        52
5. UCLA
Department of Physics and Astronomy
Challenge
Accelerate Plasma Research with innovative Particle-in-Cell (PIC) Simulations
Overcome space and power constraints in data centers
Integrate into shared computing strategy across institutes and centers at UCLA
Solution
GPU cluster
96 server nodes
288 NVIDIA Tesla GPUs
Upgraded GPUs to NVIDIA Tesla M2090s (from M2070)
Impact
Upgrades resulted in 20% higher performance with same power cost
GPUs extended to new groups within department for greatly accelerated modeling
Meets higher performance requirements within limited space and power constraints
#235 on prestigious Top500 list with only 6 Racks
8. 3 Ways to Accelerate Applications
Libraries (“Drop-in” Acceleration): Thrust; BLAS, LAPACK; FFT; NPP; Sparse; Imaging; RNG
OpenACC Directives (Easily Accelerate Applications): PGI Accelerator; CAPS HMPP; Cray
Programming Languages (Maximum Flexibility): C; C++; Fortran; OpenCL; DirectCompute; Java; Python
9. GPU-Accelerated MATLAB Results
10x speedup in data clustering via K-means clustering algorithm
14x speedup in template matching routine (part of cancer cell image analysis)
3x speedup in estimating 7.6 million contract prices using Black-Scholes model
17x speedup in simulating the movement of 3072 celestial objects
4x speedup in adaptive filtering routine (part of acoustic tracking algorithm)
4x speedup in wave equation solving (part of seismic data processing algorithm)
10. AMBER 12 - Extreme Performance with K20
DHFR JAC 23K Atoms (NVE), running AMBER 12 GPU Support Revision 12.1
SPFP with CUDA 4.2.9, ECC Off
[Bar chart: DHFR throughput in nanoseconds/day. 1 CPU node: 12.47; 1 CPU + GPU node: 95.59]
The blue node contains 2x Intel E5-2687W CPUs (8 Cores per CPU)
Each green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 2x NVIDIA K20 GPUs
Gain > 7.5X throughput/performance by adding just 2 K20 GPUs when compared to dual CPU performance
11. NAMD 2.9
Outstanding Strong Scaling with Multi-STMV (100 STMV on Hundreds of Nodes), running NAMD version 2.9
[Line chart: nanoseconds/day vs. # of nodes (32, 64, 128, 256, 512, 640, 768) for Fermi XK6 and CPU XK6; labeled speedups: 3.8x, 3.6x, 2.9x, 2.7x]
Each blue XE6 CPU node contains 1x AMD 1600 Opteron (16 Cores per CPU).
Each green XK6 CPU+GPU node contains 1x AMD 1600 Opteron (16 Cores per CPU) and an additional 1x NVIDIA X2090 GPU.
Concatenation of 100 Satellite Tobacco Mosaic Virus
Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers
12. Try NVIDIA GPUs
Available Applications: Applications Catalog
www.nvidia.com/appscatalog
Quick Application Acceleration: OpenACC Directives
www.nvidia.com/gpudirectives
Easy & Free GPU Test Drive: GPU Test Drive Cluster
www.nvidia.com/gputestdrive
Welcome. Today I am excited to show you how NVIDIA Tesla GPU solutions are having a profound impact on science by breaking new barriers in computing performance. Researchers all over the world have embraced computing as the third pillar of science. Now, with Tesla GPU computing, explosive performance gains are allowing academic researchers to discover new theories, build more robust models, and publish more papers. I will share highlights of academic institutions and researchers achieving their goals of faster, better science within academic budget constraints.
As the need to use computing to reach new frontiers in science and research grows, we quickly identified the barriers that hold it back. First, we need to enable researchers and scientists to make more discoveries, faster and with higher accuracy. We also need to do that with maximum performance per dollar, because we all have budgets. And we need to do it in the most efficient manner, whether that is efficiency of power or efficiency of space.
It’s exciting to show that GPU computing addresses the most important barriers to delivering game-changing capability in computational research. For example, AMBER, a very popular computational chemistry application, lets researchers see 6x more simulation data per day, achieving 88 nanoseconds in a day, which would take about a week to simulate on CPUs alone. Now let’s see how much that actually costs: by adding roughly 50% to the cost of a system, you get over a 300% performance gain. And finally, GPUs are very power efficient. The #2 and #3 most powerful supercomputers in the world are a great example: China’s Tianhe-1A, in the #2 spot, is 2.5x more power efficient than Oak Ridge’s CPU-only Jaguar system.
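The cost argument can be checked with a little arithmetic. The sketch below uses the slide's figures (318% higher performance for 54% added cost); the function name is made up for illustration, and it assumes "318% higher" means 4.18x the baseline performance.

```python
def perf_per_dollar_gain(perf_gain_pct: float, added_cost_pct: float) -> float:
    """Performance per dollar of the upgraded system relative to the baseline.

    perf_gain_pct:  percent *higher* performance (318 means 4.18x the baseline).
    added_cost_pct: percent added system cost (54 means 1.54x the baseline).
    """
    relative_perf = 1.0 + perf_gain_pct / 100.0
    relative_cost = 1.0 + added_cost_pct / 100.0
    return relative_perf / relative_cost


# Slide figures: 318% higher performance for 54% added cost.
slide_gain = perf_per_dollar_gain(318, 54)
print(f"Performance per dollar vs. CPU only: {slide_gain:.2f}x")
```

Under that reading, the GPU system delivers roughly 2.7 times the performance per dollar of the CPU-only system.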
We have certainly reached the inflection point of broad adoption of GPU computing. Over 580 universities teach GPU computing as part of their regular curriculum. In fact, this year the Chinese Ministry of Education will require 200 of its higher education institutions to make NVIDIA’s CUDA parallel programming part of the curriculum. There is also a growing trend of government funding being awarded to GPU projects by the NIH, NSF, and DOE. These awards go not only to large projects, like Oak Ridge’s Titan, which incorporates some 18 thousand GPUs, but also to university infrastructure grants and department and research grants for developing GPU computing applications.
UCLA faced many of these HPC barriers. They needed to accelerate an innovative new plasma simulation, and they needed to overcome space and power constraints. Their solution was a cluster with 96 nodes and 288 NVIDIA Tesla GPUs. The impact was considerable: upgrading the GPUs resulted in 20% higher performance at the same power cost. Additionally, the GPUs were extended to new groups within the department for greatly accelerated modeling. So they were able to offer faster, higher performance while fitting within the budget they had for both space and power.
NVIDIA’s GPU-accelerated application footprint is growing exponentially year over year. Computational scientists and developers have realized that the future is in parallel computing. Native GPU acceleration has now made its way into the scientific applications that are most widely used and most often cited in publications. This breadth of applications enables each school’s and department’s domain scientists, specifically those who aren’t programmers, to reap the benefits of GPU acceleration.
Equally important to applications that enable domain scientists, we have been developing easier and easier ways to build your own applications for GPUs. The fastest and easiest approach is our “drop-in” libraries. Many scientific applications make wide use of standard template or math libraries. NVIDIA makes the most commonly used ones freely available, such as Thrust, a templated library, and math libraries for BLAS, FFT, and sparse matrices. Another extremely non-invasive way to accelerate an application is to add OpenACC directives to your existing code. It takes only a few lines of code to get a 2-10x speedup in a matter of days or hours. Finally, if you are a developer and need the maximum amount of performance, we support you in your native programming language.
Engineers and scientists worldwide rely on MATLAB to accelerate the pace of discovery, innovation, and development in disciplines such as automotive, aerospace, electronics, financial services, biotech, and many other industries. They are successfully employing GPU technology to accelerate their discipline-specific calculations. With minimal effort and without extensive knowledge of GPUs, you can now use the power of GPUs with MATLAB.
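One of the MATLAB results on the slide is Black-Scholes pricing of 7.6 million contracts. As a rough illustration of why that kind of workload suits a GPU, the sketch below prices a few contracts in plain Python. It assumes European call options with no dividends, which the slide does not specify, and the contract values are made up. It is not the MATLAB code behind the reported speedup.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, volatility, years):
    """Black-Scholes price of a European call option with no dividends."""
    sigma_sqrt_t = volatility * math.sqrt(years)
    d1 = (math.log(spot / strike)
          + (rate + 0.5 * volatility ** 2) * years) / sigma_sqrt_t
    d2 = d1 - sigma_sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * years) * norm_cdf(d2)


# Hypothetical contracts: (spot, strike, rate, volatility, years to expiry).
contracts = [
    (100.0, 100.0, 0.05, 0.20, 1.0),
    (100.0, 110.0, 0.05, 0.20, 1.0),
    (100.0, 90.0, 0.05, 0.20, 1.0),
]

# Each price depends only on its own contract's inputs.
prices = [black_scholes_call(*contract) for contract in contracts]
for contract, price in zip(contracts, prices):
    print(f"strike {contract[1]:6.1f}: call price {price:.4f}")
```

Because no contract's price depends on any other, millions of them can be evaluated independently, which is the data-parallel pattern GPUs handle well.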
(Previous script from AMBER 11 benchmarks; the slide shows K20 results.) I briefly spoke about AMBER’s price performance in our opening. Now that you have seen how easy it is for researchers and scientists to benefit from GPU computing, with ready-to-go applications or easy-to-implement developer approaches such as directives, we should revisit price performance. Again, adding 2 GPUs to a single node increases the node cost by roughly 50%, yet we get much more than a 50% performance improvement. In fact, with this application we achieve greater than 300% higher performance, making GPUs a clear winning investment.
Additional information on the K20 slide:
1 CPU node (dual CPUs) = 12.47 ns/day
1 CPU + GPU node (dual CPUs and GPUs) = 95.59 ns/day
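The two K20 throughput figures translate directly into a speedup and a time to solution. The sketch below takes the ns/day values from the slide; the 1 microsecond target trajectory and the function names are hypothetical, chosen only to make the arithmetic concrete.

```python
CPU_NODE_NS_PER_DAY = 12.47  # dual-CPU node, from the slide
GPU_NODE_NS_PER_DAY = 95.59  # same node plus 2x K20 GPUs, from the slide


def throughput_speedup(gpu_ns_per_day: float, cpu_ns_per_day: float) -> float:
    """How many times more simulated time per day the GPU node delivers."""
    return gpu_ns_per_day / cpu_ns_per_day


def days_to_simulate(target_ns: float, ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns nanoseconds."""
    return target_ns / ns_per_day


TARGET_NS = 1000.0  # hypothetical target: a 1 microsecond trajectory

node_speedup = throughput_speedup(GPU_NODE_NS_PER_DAY, CPU_NODE_NS_PER_DAY)
cpu_days = days_to_simulate(TARGET_NS, CPU_NODE_NS_PER_DAY)
gpu_days = days_to_simulate(TARGET_NS, GPU_NODE_NS_PER_DAY)

print(f"Speedup from adding 2 GPUs: {node_speedup:.2f}x")
print(f"CPU node: {cpu_days:.1f} days, CPU + GPU node: {gpu_days:.1f} days")
```

The ratio comes out above the "> 7.5X" stated on the slide, and the same trajectory that would occupy a CPU node for months finishes in under two weeks on the GPU node.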
NAMD, another extremely popular molecular dynamics package, shows a 2.7x to 3.8x speedup with GPUs. We’ve benchmarked it with a typical STMV benchmark, which is 1 million atoms. So this is a very large system. But these are the systems and simulation times needed for researchers to make breakthroughs in science.
Nodes                32         64          128        256        512        640        768
s/step, GPU XK6      1.2414     0.660887    0.342743   0.199465   0.10837    0.089752   0.0774948
s/step, CPU XK6      4.62633    2.36707     1.19722    0.609124   0.314745   0.255016   0.209511
ns/day, Fermi XK6    0.069599   0.13073339  0.252084   0.433159   0.797269   0.962655   1.114913517
ns/day, CPU XK6      0.018676   0.03650082  0.072167   0.141843   0.274508   0.338802   0.412388848
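The s/step and ns/day rows in the benchmark data are two views of the same measurement. The sketch below converts one to the other and computes the GPU speedup at each node count. The 1 fs timestep is an assumption: the notes do not state it, but it is the value that reproduces the listed ns/day figures.

```python
SECONDS_PER_DAY = 86_400
TIMESTEP_FS = 1.0  # assumption: inferred from the data, not stated in the notes

NODES = [32, 64, 128, 256, 512, 640, 768]
GPU_S_PER_STEP = [1.2414, 0.660887, 0.342743, 0.199465,
                  0.10837, 0.089752, 0.0774948]
CPU_S_PER_STEP = [4.62633, 2.36707, 1.19722, 0.609124,
                  0.314745, 0.255016, 0.209511]


def ns_per_day(s_per_step: float, timestep_fs: float = TIMESTEP_FS) -> float:
    """Convert wall-clock seconds per MD step into simulated ns per day."""
    steps_per_day = SECONDS_PER_DAY / s_per_step
    return steps_per_day * timestep_fs * 1e-6  # 1 fs = 1e-6 ns


# Speedup is the ratio of CPU time per step to GPU time per step.
speedups = [cpu / gpu for cpu, gpu in zip(CPU_S_PER_STEP, GPU_S_PER_STEP)]

for n, gpu, cpu, s in zip(NODES, GPU_S_PER_STEP, CPU_S_PER_STEP, speedups):
    print(f"{n:4d} nodes: GPU {ns_per_day(gpu):.4f} ns/day, "
          f"CPU {ns_per_day(cpu):.4f} ns/day, speedup {s:.2f}x")
```

Computed this way, the speedup shrinks as nodes are added, from about 3.7x at 32 nodes to about 2.7x at 768 nodes.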
Today it is easier than ever for researchers, scientists, and academic institutions to benefit from GPU computing. We have ready-to-go GPU-accelerated applications (see the Applications Catalog). We are continuously investing in creating the easiest approaches to quickly accelerating your own applications, with OpenACC directives being our latest development. And finally, the GPU Test Drive cluster is the ideal way to test how a particular application accelerates with GPUs. The GPU Test Drive cluster is also pre-configured for easy purchase and installation.
Thank you for following along. I hope we have shown you that GPU computing is making extraordinary contributions to science and research. Now is the time to reach your next scientific computing achievements by investing in NVIDIA Tesla GPUs, which have worldwide adoption and world-class developer support.