Advancing Science in Alternative Energy and 
Bioengineering with Many-Core Processors 
W. Michael Brown, Intel® Corporation† 
Jan-Michael Carrillo, Oak Ridge National Laboratory* 
Intel Confidential — Do Not Forward 
54th HPC User Forum 
September 15-17, 2014 
Seattle, Washington 
†Disclaimer: This talk includes results from my tenure at ORNL* that do not necessarily 
reflect the views, research directions, etc. of Intel® Corporation. 
* Other names and brands may be claimed as the property of others.
Legal Disclaimers 
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS 
AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR 
A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. 
A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND 
ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY 
CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY 
OF ITS PARTS. 
Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for 
conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. 
Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. 
Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm 
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those 
factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. 
Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks 
are accurate and reflect performance of systems available for purchase. 
Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates 
with the performance improvements reported. 
SPEC, SPECint, SPECfp, SPECrate. SPECpower, SPECjAppServer, SPECjbb, SPECjvm, SPECWeb, SPECompM, SPECompL, SPEC MPI, SPECjEnterprise* are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information. TPC-C, TPC-H, TPC-E are 
trademarks of the Transaction Processing Council. See http://www.tpc.org for more information. 
Hyper-Threading Technology requires a computer system with a processor supporting HT Technology and an HT Technology-enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. For more information including details on which 
processors support HT Technology, see here 
Intel® Turbo Boost Technology requires a Platform with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration. Check with your platform manufacturer on whether your system delivers Intel 
Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost 
No computer system can provide absolute security. Requires an enabled Intel® processor and software optimized for use of the technology. Consult your system manufacturer and/or software vendor for more information. 
Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: 
Learn About Intel® Processor Numbers 
Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps. 
Copyright © 2014 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon and Intel Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. All dates and products specified are for planning purposes only and are subject to change 
without notice 
*Other names and brands may be claimed as the property of others.
Risk Factors 
The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking statements that involve a number of 
risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations identify forward-looking 
statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel’s actual results, and 
variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers 
the following to be the important factors that could cause actual results to differ materially from the company’s expectations. Demand could be different from Intel's expectations due to factors 
including changes in business and economic conditions; customer acceptance of Intel’s and competitors’ products; supply constraints and other disruptions affecting customers; changes in 
customer order patterns including order cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and financial conditions poses a risk that consumers and 
businesses may defer purchases in response to negative financial events, which could negatively affect product demand and other related matters. Intel operates in intensely competitive industries 
that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. Revenue and the gross 
margin percentage are affected by the timing of Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product 
offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to technological developments and to incorporate 
new features into its products. The gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to 
the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; start-up costs; excess or 
obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; product manufacturing quality/yields; and impairments of long-lived assets, including 
manufacturing, assembly/test and intangible assets. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its 
customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. 
Expenses, particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the 
level of revenue and profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. Intel's results could be affected by adverse effects associated with product 
defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such 
as the litigation and regulatory matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or 
more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed 
discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release. 
Rev. 7/17/13
Optimization Notice 
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for 
optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, 
SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the 
availability, functionality, or effectiveness of any optimization on microprocessors not 
manufactured by Intel. 
Microprocessor-dependent optimizations in this product are intended for use with Intel 
microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for 
Intel microprocessors. Please refer to the applicable product User and Reference Guides for 
more information regarding the specific instruction sets covered by this notice. 
Notice revision #20110804
Outline 
Motivation: A subset of science problems I have been involved with 
that have been difficult to investigate with previous 
hardware/software solutions 
What we have done: Recent hardware and software advances that have 
enabled new investigations 
What we are doing: Issues and the solutions required to continue to advance 
HPC capabilities
3 Example Science Drivers 
Liquid Crystal Biosensors 
Label Free Diagnostics 
• Alignment changes induced by interactions with biomolecules can be detected with optical microscopy 
• "Critical to this area is a sound understanding of the theory and modelling of liquid-crystal materials at interfaces. Computer simulations of the molecular and mesoscopic interactions between thermotropic liquid-crystal materials … are a necessity. It is important to understand the interactions at these complicated interfaces" - Woltman, et al., Nature Materials*, 6, 929 (2007) 
• Prior to 2013, no molecular simulations at experimentally relevant scales 
• Computational limitation 
• Extrapolations from small-scale studies difficult/dangerous due to the presence of undulative modes 
Figure (Woltman, et al., Nature Materials*, 6, 929 (2007)). Labels: Aqueous lipid-rich solution; Aligned liquid crystal; Lipid-depleted regions; Glass substrate; Patterned well. Panels: T = 0 min, T = 15 min, T = 45 min, T = 75 min. 
Icephobic Surfaces for Wind Power 
Many of the regions of the world that 
can benefit from wind power are in cold 
climates 
Ice can reduce the efficiency of 
turbines or force shutdown 
Extremely difficult to probe ice 
formation experimentally 
Probing with simulation requires large 
simulations at time scales sufficient to 
capture rare nucleation events leading 
to ice formation
Organic Solar Cells 
Organic photovoltaic (OPV) solar cells are promising 
renewable energy sources 
• Low cost, high-flexibility, light weight 
Morphology of the OPV active layer has a significant impact 
on device efficiency 
• Donor regions should be large enough to absorb light 
efficiently, but small enough to allow charge excitations 
to diffuse to acceptor before recombining 
• High interface-to-volume ratio for efficient exciton 
dissociation 
• Proper donor molecular alignment to optimize the 
carrier mobility of the donor phase 
Multi-scale problem 
• Very large simulations to reach experimental scales 
Figure labels: PCBM (Electron Acceptor); P3HT (Electron Donor)
Hardware and Software Advances 
Many-core and GPUs 
Issues with power consumption, heat dissipation, and memory access latencies 
have forced a change in direction towards many-core processors for current and 
next-generation supercomputers that advance the limits of peak floating point 
performance 
GPU-based architectures were the first affordable many-core processors 
generally available 
ORNL* Titan came online in late 2012 as a hybrid supercomputer capable of over 
17PF HPL performance. 
• Cray* XK7 architecture with a Gemini* network interconnect and nodes 
containing a single AMD* Opteron* 6274 processor and Nvidia* Tesla K20X 
Center for Accelerated Application Readiness 
(CAAR at ORNL*) 
In order to exploit the potential performance gains on Titan, changes to the 
software are required 
CAAR was formed to address this issue for several codes before the upgrade to 
Titan 
• LAMMPS* (Large-scale Atomic/Molecular Massively Parallel Simulator) was 
the molecular dynamics code chosen for CAAR 
• We implemented the ‘GPU package’ for LAMMPS* with new algorithms suited for 
running on hybrid GPGPU architectures 
Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing Molecular Dynamics on Hybrid High Performance 
Computers - Short Range Forces. Computer Physics Communications*. 2011. 182: p. 898-911. 
Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N. Implementing Molecular Dynamics on Hybrid High Performance 
Computers - Particle-Particle Particle-Mesh. Computer Physics Communications*. 2012. 183: p. 449-459. 
Spinodal Dewetting in Liquid Crystal Layers 
Science Team: Trung Dac Nguyen, Jan-Michael Carrillo, Mike Matheson, W. Michael Brown (ORNL*) 
Problem: How/why do defects form in liquid 
crystal layers? 
• Not possible to study on previous Jaguar 
supercomputer with software available 
unless the machine was dedicated to the 
problem 
Innovation: Coarse-grain simulation model 
with high arithmetic intensity/vector potential, 
GPU algorithms 
Result: > 7X performance gain (1S CPU+GPU 
vs 2S CPU); First simulations at scale with 
molecular detail 
Nguyen, T. D., Carrillo, J.-M. Y., Matheson, M. A., 
Brown, W.M., Rupture Mechanism of liquid 
crystal thin films realized by large-scale 
molecular simulations. Nanoscale*. 2014. 6: 
3083-96. 
Icephobic Surfaces 
Science Team: Masako Yamada, et al. (GE* Global Research) 
Problem: Probe mechanism for water droplet 
freezing on a surface with molecular detail 
Innovation: New 3-body potential for water 
simulation (E. B. Moore, V. Molinero, Nature* 
479 (2011) 506–508) 
• Coarse-grain model, eliminate all-to-all 
communications for electrostatics, larger 
timestep 
• > 100X Simulation rate speedup from model 
• Additional concurrency for 3-body: O(N^3) 
independent force computations vs O(N^2) 
[for N atoms within cutoff radius] 
• Additional 2-3X Simulation speedup on XK7 
node (1S CPU+GPU) vs 2S Opteron* 
Result: Simulation of the water freezing 
process is now typically run on hundreds of 
nodes. 
Brown, W.M., Yamada, M. Implementing Molecular 
Dynamics on Hybrid High Performance 
Computers – Three-Body Potentials. Computer 
Physics Communications*. 2013. 184: p. 2785-2793. 
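The concurrency point on this slide can be made concrete with a small counting sketch. It assumes N atoms that are all within the cutoff of one another, and it counts one three-body term per central atom and unordered pair of its neighbors; the function name `count_force_terms` is hypothetical and this is not LAMMPS code.

```python
from itertools import combinations

def count_force_terms(n_atoms):
    """Count two-body and three-body terms for atoms that are all within the cutoff."""
    atoms = range(n_atoms)
    # One two-body term per unordered pair (i, j): grows as O(N^2).
    pairs = sum(1 for _ in combinations(atoms, 2))
    # One three-body term per central atom i and unordered pair (j, k)
    # of its neighbors: grows as O(N^3).
    triplets = sum(
        1
        for i in atoms
        for _ in combinations([a for a in atoms if a != i], 2)
    )
    return pairs, triplets

for n in (4, 8, 16):
    pairs, triplets = count_force_terms(n)
    print(n, pairs, triplets, triplets / pairs)
```

Under these assumptions the ratio of three-body to two-body terms is N - 2, so the amount of independent work per atom grows with the neighbor count.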
Issues with Hybrid GPU Machines 
Issues with the Programming Model 
Using CUDA* with C/C++ semantics introduces multiple languages, code paths, 
compiler requirements, etc. into a code base. 
• Support for Fortran or x86 targets requires proprietary compilers 
OpenACC* can potentially address this, but in my opinion, will still require 
separate code paths and different algorithms for x86 in general 
• GPUs have 10,000s of threads in flight, different performance for atomic 
operations, different penalties for thread synchronization, context switching, 
etc. 
• For example, the algorithms used for molecular dynamics on the GPU would 
not perform well on x86 processors. 
• For the 3-body water/ice simulation presented here, we triple the number of force 
computations on the GPU with redundant computations! 
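One way to picture the trade-off in the last bullet is the sketch below. It is an illustration under stated assumptions, not the GPU package's algorithm: the toy three-body energy, the 1-D positions, and the function names are all hypothetical. It compares evaluating each triplet once and scattering the result to three atoms against letting each atom re-evaluate every triplet it belongs to and keep only its own force.

```python
from itertools import combinations

def triplet_forces(xi, xj, xk):
    """Forces on (i, j, k) from the toy energy U = (xi-xj)^2 + (xj-xk)^2 + (xk-xi)^2."""
    return (-2 * (2 * xi - xj - xk),
            -2 * (2 * xj - xk - xi),
            -2 * (2 * xk - xi - xj))

def scatter_forces(x):
    """Evaluate each triplet once and add the result to all three atoms."""
    forces, evaluations = [0] * len(x), 0
    for i, j, k in combinations(range(len(x)), 3):
        fi, fj, fk = triplet_forces(x[i], x[j], x[k])
        evaluations += 1
        forces[i] += fi  # three writes to shared data per evaluation
        forces[j] += fj
        forces[k] += fk
    return forces, evaluations

def redundant_forces(x):
    """Each atom re-evaluates every triplet it belongs to and keeps only its own force."""
    forces, evaluations = [0] * len(x), 0
    for i in range(len(x)):  # stands in for one thread per atom
        others = [a for a in range(len(x)) if a != i]
        for j, k in combinations(others, 2):
            fi, _, _ = triplet_forces(x[i], x[j], x[k])
            evaluations += 1
            forces[i] += fi  # only this "thread" writes forces[i]
    return forces, evaluations

positions = [0, 1, 3, 7, 12]  # hypothetical integer positions, so sums are exact
scattered, n_scatter = scatter_forces(positions)
redundant, n_redundant = redundant_forces(positions)
print(scattered == redundant, n_scatter, n_redundant)
```

Both versions give the same forces, but the second performs three times as many evaluations in exchange for having no two writers touch the same force entry. That exchange may pay off with many threads in flight and is unlikely to on a processor with far fewer threads.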
Optimizing separate code paths 
Optimizations for GPU do not necessarily improve 
performance on CPUs 
• Many production codes supporting GPU acceleration still use the 
CPU for many routines and must still support CPU-only 
• Intel® Xeon® processors are still improving the performance/ 
socket 
• 85% of all systems on TOP500* use Intel® processors (97% of 
new systems) 
• Future Intel® many-core processors will be bootable (no 
coprocessor necessary) 
• Difficult to manage/debug/port software with separate optimization 
code paths 
Figure: Liquid Crystal Benchmark Simulation Rate (Higher is Better). Chart annotation: No Coprocessor/GPU. Bars and plotted values, paired in extraction order: 
• (2012) LAMMPS Baseline on 2S AMD* Opteron* 6274 [1600MHz DDR3]: 1 
• (2012) LAMMPS w/ GPU Optimizations on 1S AMD* Opteron* 6274 + Nvidia* Tesla* K20X: 8.38 
• (2014) LAMMPS w/ GPU Optimizations on 2S Intel® Xeon® E5-2697v3 [2133 MHz DDR4]: 2.94 
• (2014) LAMMPS w/ Intel® Xeon Phi™ Coprocessor Optimizations on 2S Intel® Xeon® E5-2697v3 [2133 MHz DDR4]: 12.43 
Source: 
AMD* & AMD/NVIDIA* results: http://www.nvidia.com/docs/IO/122634/computational-chemistry-benchmarks.pdf 
Intel Results: Intel Measured August 2014 
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and 
MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. 
You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when 
combined with other products. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance 
Intel® Package for LAMMPS 
Motivation 
Provide a workspace for demonstrating code 
modernization strategies with vectorization for 
different precision modes, OpenMP*, and 
MPI*-3 
Support computation offload to Intel® 
coprocessors with performance that is 
comparable to or better than GPU performance 
Demonstrate portable performance with a 
single code path using standard MD 
algorithms with C++/OpenMP* 
Provide routines that allow the community to 
easily add new functionality by example 
Figure: LAMMPS* Rhodopsin Protein Benchmark, 512K Atoms (TACC* Stampede). Simulation Time (Lower is better; axis from 0.50 to 32.00) vs. Nodes (1 to 32). Series: 
• 2S Intel® Xeon® Processor E5-2680 
• 2S Intel® Xeon® Processor E5-2680 + Intel® Xeon Phi™ Coprocessor SE10P 
• 2S Intel® Xeon® Processor E5-2680 + Nvidia* Tesla K20m 
Stampede Configuration: HT Off, 32GB DDR3-1600MHz, PCIe2 
(GPU), PCIe2 (Intel® Coprocessor), FDR 56 Gb/s, MPSS 3.3, 
MVAPICH 2.0b 
Performance improvements 
with Intel® package still 
significant with 1000 atoms 
per CPU core 
Source: Intel Measured August 2014 
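The "1000 atoms per CPU core" figure appears to correspond to the largest node count in the chart. A quick check, assuming "512K" means 512,000 atoms and that each 2S Intel® Xeon® E5-2680 node provides 16 cores (8 per socket); the constant and function names are hypothetical:

```python
ATOMS = 512_000          # "512K" read as 512,000 atoms (assumption)
CORES_PER_NODE = 2 * 8   # two sockets, 8 cores each, for the E5-2680

def atoms_per_core(nodes):
    """Average number of atoms assigned to each CPU core at a given node count."""
    return ATOMS / (nodes * CORES_PER_NODE)

for nodes in (1, 2, 4, 8, 16, 32):
    print(nodes, atoms_per_core(nodes))
```

Under these assumptions the 32-node run works out to 1000 atoms per core, which is the strong-scaling limit the note refers to.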
Current Optimizations in Intel Package 
Data alignment 
Support for mixed and single precision modes in addition to double 
SIMD directives to allow compiler vectorization for routines with data 
dependencies 
Modifications to conditional branches to better support compiler vectorization for 
both coprocessors and Intel® Advanced Vector Extensions (Intel® AVX) 
Neighbor-list padding to prevent execution of vector remainder code 
Offload directives to manage data allocation, transfer, and concurrent 
computation on the coprocessor 
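Two of the items above, branch modification and neighbor-list padding, can be sketched together. The following is a toy illustration only: the vector width, the far-away dummy atom, the 1/r^2 energy, and all names are assumptions, and the Intel package's actual scheme is not described on this slide.

```python
VECTOR_WIDTH = 8  # assumed SIMD width in elements (hypothetical)

def pad_neighbors(neighbors, dummy_index, width=VECTOR_WIDTH):
    """Pad a neighbor list to a multiple of `width` using a dummy atom index."""
    remainder = len(neighbors) % width
    if remainder == 0:
        return list(neighbors)
    return list(neighbors) + [dummy_index] * (width - remainder)

def pair_energy(positions, i, neighbors, cutoff):
    """Sum a toy 1/r^2 energy, masking by the cutoff instead of skipping iterations."""
    total = 0.0
    xi = positions[i]
    for j in neighbors:
        r = abs(positions[j] - xi)
        mask = float(r < cutoff)        # 1.0 inside the cutoff, 0.0 outside
        total += mask * (1.0 / (r * r)) # every iteration does the same work
    return total

# The last entry is the dummy atom, placed far outside any cutoff.
positions = [0.0, 1.0, 2.0, 5.0, 1.0e6]
neighbors = [1, 2, 3]
padded = pad_neighbors(neighbors, dummy_index=4)
print(len(padded), pair_energy(positions, 0, padded, cutoff=3.0))
```

Because the padded entries fall outside the cutoff, they contribute exactly zero, and the loop length is always a whole number of vectors, so no scalar remainder loop is needed.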
Advantages of Intel® Package vs GPU Package (1) 
• Same code for routines run on the CPU and coprocessor (with or without offload) 
• Optimizations for Intel® Xeon Phi™ coprocessors resulted in faster performance on Intel® Xeon® 
processors (up to 3.5X) 
• GPU package uses different algorithms and different code/language 
• Support for both ‘newton’ settings allows for more flexibility for new force-fields 
• Improved flexibility for heterogeneous calculations 
• Intel® coprocessor offload not limited to 16 MPI* tasks on CPU (CUDA*-MPS limitation) 
• Intel® package supports OpenMP* with multiple threads on the CPU (GPU package does not use 
OpenMP) 
• MPI* tasks sharing coprocessor are able to get exclusive core affinity 
Advantages of Intel® Package vs GPU Package (2) 
• More options for overlap of MPI* communications and computation 
• Build process is simpler and does not require building a separate library for 
coprocessor routines 
• One compiler/Makefile for everything 
• Precision mode (single, mixed, or double) can be switched at run-time without 
rebuilding 
• Package written in standard C++ with OpenMP* 
• Offload directives used for the coprocessor 
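As a rough illustration of why a run-time precision mode is useful, the sketch below emulates single precision with the standard `struct` module and compares two accumulation strategies. It assumes "mixed" means terms computed in single precision and summed in double; the function names and the test data are hypothetical and not taken from the package.

```python
import struct

def to_float32(value):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

def accumulate(terms, precision):
    """Sum single-precision terms with a 'single' or 'mixed' (double) accumulator."""
    total = 0.0
    for term in terms:
        term32 = to_float32(term)  # each term is held in single precision
        if precision == "single":
            total = to_float32(total + term32)  # round the running sum as well
        else:
            total = total + term32              # mixed: keep the sum in double
    return total

terms = [0.1] * 100_000
reference = to_float32(0.1) * len(terms)
single_error = abs(accumulate(terms, "single") - reference)
mixed_error = abs(accumulate(terms, "mixed") - reference)
print(single_error, mixed_error)
```

The mode is an ordinary argument here, which mirrors the idea of choosing precision at run time. The double accumulator removes most of the rounding error that builds up in the pure single-precision sum.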
Organic Solar Cells 
Science Team: Jan-Michael Y Carrillo, Rajeev Kumar, Monojoy Goswami, S. Michael Kilbey II, Bobby G Sumpter (ORNL*/ 
UT*) 
Problem: Predictive simulation of active 
layer morphology and molecular alignment 
based on blend composition 
Innovation: Code Modernization/HW 
Result: With code modernization and advanced 
HPC resources, we have been the first to perform 
simulations at scales that match experiment. 
• Left: Up to 1.9X with use of a coprocessor 
• Simulations include all of the statistics and I/O 
(about 10% of run time) from the production runs 
• Significant potential for advanced multiscale 
simulation models with coprocessors… 
Figure: OPV Simulation, 1.77M Atoms (Stampede). Simulation Time (Lower is better; axis from 8.00 to 256.00) vs. Nodes (2 to 64). Series: 
• 2S Intel® Xeon® Processor E5-2680 (Baseline) 
• 2S Intel® Xeon® Processor E5-2680 (Intel® Package) 
• 2S Intel® Xeon® Processor E5-2680 + Intel® Xeon Phi™ SE10P 
• 2S Intel® Xeon® Processor E5-2680 + Nvidia* Tesla* K20m 
Figure label: COARSE-GRAINED MD (Morphology; Phase segregation) 
Carrillo, J.-M. Y., Kumar, R., Goswami, M., Sumpter, B., Brown, W.M., New Insights 
into Dynamics and Morphology of P3HT:PCBM Active Layers in Bulk 
Heterojunctions. Physical Chemistry Chemical Physics. 2013. 15: p. 17873-17882. 
Source: Michael Brown 
LAMMPS performance with current processors 
Figure: LAMMPS Liquid Crystal Benchmark, 524K Atoms. Simulation Rate (Higher is Better). Values are listed per configuration in the order Baseline (CPU Only), Intel® Package (CPU Only), Offload to GPU, Offload to Intel® Xeon Phi™ Coprocessor; the grouping is inferred from the extraction order. 
• 2S Intel® Xeon® Processor E5-2680 / Nvidia* Tesla* K20m / Intel® Xeon Phi™ Coprocessor SE10P†: 1.00, 4.68, 7.64, 7.12 
• 2S Intel® Xeon® Processor E5-2697v2 / Nvidia* Tesla* K40c / Intel® Xeon Phi™ Coprocessor 7120A‡: 1.92, 6.58, 8.66, 9.71 
• 2S Intel® Xeon® Processor E5-2697v3 / HW Not Available / Intel® Xeon Phi™ Coprocessor 7120A‡: 2.30, 10.36, (no GPU value), 12.44 
Figure: LAMMPS Rhodopsin Protein Benchmark, 512K Atoms. Simulation Rate (Higher is Better). Same configurations and value order as above. 
• 2S Intel® Xeon® Processor E5-2680 / Nvidia* Tesla* K20m / Intel® Xeon Phi™ Coprocessor SE10P†: 1.00, 1.21, 2.30, 2.16 
• 2S Intel® Xeon® Processor E5-2697v2 / Nvidia* Tesla* K40c / Intel® Xeon Phi™ Coprocessor 7120A‡: 1.51, 1.82, 2.68, 2.70 
• 2S Intel® Xeon® Processor E5-2697v3 / HW Not Available / Intel® Xeon Phi™ Coprocessor 7120A‡: 1.82, 2.31, (no GPU value), 3.06 
Issues/Improvements 
Available/Ongoing/Future Improvements 
Compiler vectorization 
• Significant advances in compiler 
vectorization using the Intel® 
Composer XE 2015. 
• New: Vector variant functions that 
allow explicit vector coding for small 
subroutines 
• Advanced compiler optimization 
reports with Intel® Composer XE 
2015 and runtime analysis with 
Intel® VTune™ Amplifier XE 
MPI Performance in symmetric modes 
• Significant performance 
improvements with Intel® MPI* 5. 
• Future many-core chips will have 
bootable options and also options 
for integrated fabric
Unveiling Details of Knights Landing 
(Next Generation Intel® Xeon Phi™ Products) 
Compute: Energy-efficient IA cores [2] 
• Microarchitecture enhanced for HPC [3] 
• 3X Single Thread Performance vs Knights Corner [4] 
• Intel Xeon Processor Binary Compatible [5] 
• 1/3X the Space [6] 
• 5X Power Efficiency [6] 
Platform Memory: DDR4 Bandwidth and Capacity Comparable to Intel® Xeon® Processors 
Diagram labels: Processor Package; Intel® Silvermont Arch. Enhanced for HPC; Integrated Fabric 
All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. 
[1] Over 3 Teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per second per cycle. 
[2] Modified version of Intel® Silvermont microarchitecture currently found in Intel® Atom™ processors. 
[3] Modifications include AVX512 and 4 threads/core support. 
[4] Projected peak theoretical single-thread performance relative to 1st Generation Intel® Xeon Phi™ Coprocessor 7120P (formerly codenamed Knights Corner). 
[5] Binary Compatible with Intel Xeon processors using Haswell Instruction Set (except TSX). 
[6] Projected results based on internal Intel analysis of Knights Landing memory vs Knights Corner (GDDR5). 
[7] Projected result based on internal Intel analysis of STREAM benchmark using a Knights Landing processor with 16GB of ultra high-bandwidth versus DDR4 memory only with all channels populated. 
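The peak-performance relation quoted in the first footnote (cores x clock frequency x floating-point operations per cycle) can be written out as a short sketch. The function name is hypothetical and the input values below are purely illustrative; they are not Knights Landing specifications.

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Peak theoretical FLOPS = cores x clock frequency x floating-point operations per cycle."""
    return cores * clock_hz * flops_per_cycle

# Illustrative inputs only: 16 cores at 2.0 GHz with 8 operations per cycle per core.
example = peak_flops(cores=16, clock_hz=2.0e9, flops_per_cycle=8)
print(example / 1e9, "GFLOPS")
```

A figure computed this way is an upper bound from the hardware parameters alone and says nothing about what an application will sustain.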
Conceptual - Not Actual Package Layout 
Jointly Developed with Micron Technology 
Designed using Intel’s cutting-edge 14nm Transistor Technology 
Not bound by “offloading” bottlenecks: Standalone CPU or PCIe Coprocessor 
Common instruction set architecture: Intel® Advanced Vector Extensions 512 
On-Package Memory: 
• up to 16GB at launch 
• 5X Bandwidth vs DDR4 [7]
Summary 
We have demonstrated that important research for a variety of science problems 
continues to benefit from hardware and software advances in HPC. 
We believe that the Intel® path forward for HPC architecture and software offers 
a solution that allows for a simpler code base and reduced software effort 
• We are dedicated to providing solutions that allow for portable performance 
across a range of different processors for both conventional HPC centers and 
those on a path to exa-scale. 
We have demonstrated, in a large production code, that optimizations for Intel® 
Xeon Phi™ coprocessors also improve performance on Intel® Xeon® processors 
and that the same routine can be used for efficient computations on both
Summary (2) 
The innovations here also included
reconsideration of the simulation model.
• It is important to realize that there is often a
choice in the model used, and that in the
past, minimizing computation was
important to performance.
• It may be the case that you can exploit
additional arithmetic intensity and
concurrency to increase the accuracy,
time-step, etc. in order to advance
research capabilities.
• This has already been the case for
simulation in materials problems.
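The point about arithmetic intensity can be illustrated with a simple roofline-style estimate. The sketch below is not from the talk: it assumes the standard roofline bound, in which the attainable rate is the smaller of peak compute and arithmetic intensity times memory bandwidth. The function name and the peak and bandwidth figures are hypothetical placeholders, not measurements of any real processor.

```python
def attainable_gflops(intensity_flop_per_byte, peak_gflops, bandwidth_gb_per_s):
    """Roofline bound: limited either by memory traffic or by peak compute."""
    return min(peak_gflops, intensity_flop_per_byte * bandwidth_gb_per_s)


# Hypothetical machine parameters, chosen only for illustration.
PEAK_GFLOPS = 1000.0
BANDWIDTH_GB_PER_S = 100.0

for intensity in (0.5, 2.0, 10.0, 50.0):
    rate = attainable_gflops(intensity, PEAK_GFLOPS, BANDWIDTH_GB_PER_S)
    bound = "memory-bound" if rate < PEAK_GFLOPS else "compute-bound"
    print(f"{intensity:5.1f} flop/byte -> {rate:7.1f} GFLOP/s ({bound})")
```

Under this simplified model, a kernel that is memory-bound can perform more arithmetic per byte moved at little extra cost in time, which is one way to read the suggestion that a more expensive but more accurate model may be affordable on many-core hardware.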
[Figure: Cost [core-sec/atom-timestep] (log scale, 10^-6 to 10^-2) versus Year Published (1980 to 2010) for interatomic potentials: GPT, BOP, CHARMm, Stillinger-Weber, EAM, Tersoff, AIREBO, MEAM, ReaxFF, ReaxFF/C, eFF, COMB, REBO, EIM, GAP; trend line O(2^(T/2)).]
Source: Computational Aspects of Many-body Potentials, S. J. Plimpton and A. P. Thompson,
MRS Bulletin*, 37, 513-521 (2012).
Acknowledgements 
Research by the teams presented here was made possible by:
DOE Early Science and ALCC Programs 
This research used resources of the Oak Ridge Leadership Computing Facility* at the Oak Ridge National Laboratory*, 
which is supported by the Office of Science of the U.S. Department of Energy* under Contract No. DE-AC05-00OR22725. 
NSF TACC* Stampede Project: ACI-1134872 
The researchers acknowledge the Texas Advanced Computing Center* (TACC) at The University of Texas at Austin* for 
providing HPC resources that have contributed to the research results reported within this paper. URL: http://www.tacc.utexas.edu
NSF UT* Beacon Project 
This material is based upon work supported by the National Science Foundation under Grant Number 1137097 and by the 
University of Tennessee* through the Beacon Project. Any opinions, findings, and conclusions or recommendations 
expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science 
Foundation* or the University of Tennessee*. 
Getting Access to Intel® Xeon Phi™ HPC Systems 
NSF XSEDE*: 
TACC* Stampede, LSU* SuperMIC 
https://www.xsede.org/allocations 
NSF UT* Beacon: 
https://www.nics.tennessee.edu/computing-resources/beacon/allocations 
DOE NERSC* Users (Babbage): 
https://www.nersc.gov/users/computational-systems/testbeds/babbage/ 
Purdue* Conte (Purchase available): 
https://www.rcac.purdue.edu/compute/conte/ 
Thanks! 


  • 1. Advancing Science in Alternative Energy and Bioengineering with Many-Core Processors W. Michael Brown, Intel® Corporation† Jan-Michael Carrillo, Oak Ridge National Laboratory* Intel Confidential — Do Not Forward 54th HPC User Forum September 15-17, 2014 Seattle, Washington †Disclaimer: This talk includes results from my tenure at ORNL* that do not necessarily reflect the views, research directions, etc. of Intel® Corporation. * Other names and brands may be claimed as the property of others.
  • 2. Legal Disclaimers INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information. 
The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Intel does not control or audit the design or implementation of third party benchmarks or Web sites referenced in this document. Intel encourages all of its customers to visit the referenced Web sites or others where similar performance benchmarks are reported and confirm whether the referenced benchmarks are accurate and reflect performance of systems available for purchase. Relative performance is calculated by assigning a baseline value of 1.0 to one benchmark result, and then dividing the actual benchmark result for the baseline platform into each of the specific benchmark results of each of the other platforms, and assigning them a relative performance number that correlates with the performance improvements reported. SPEC, SPECint, SPECfp, SPECrate. 
SPECpower, SPECjAppServer, SPECjbb, SPECjvm, SPECWeb, SPECompM, SPECompL, SPEC MPI, SPECjEnterprise* are trademarks of the Standard Performance Evaluation Corporation. See http://www.spec.org for more information. TPC-C, TPC-H, TPC-E are trademarks of the Transaction Processing Council. See http://www.tpc.org for more information. Hyper-Threading Technology requires a computer system with a processor supporting HT Technology and an HT Technology-enabled chipset, BIOS and operating system. Performance will vary depending on the specific hardware and software you use. For more information including details on which processors support HT Technology, see here Intel® Turbo Boost Technology requires a Platform with a processor with Intel Turbo Boost Technology capability. Intel Turbo Boost Technology performance varies depending on hardware, software and overall system configuration. Check with your platform manufacturer on whether your system delivers Intel Turbo Boost Technology. For more information, see http://www.intel.com/technology/turboboost No computer system can provide absolute security. Requires an enabled Intel® processor and software optimized for use of the technology. Consult your system manufacturer and/or software vendor for more information. Intel processor numbers are not a measure of performance. Processor numbers differentiate features within each processor family, not across different processor families: Go to: Learn About Intel® Processor Numbers Intel product plans in this presentation do not constitute Intel plan of record product roadmaps. Please contact your Intel representative to obtain Intel’s current plan of record product roadmaps. Copyright © 2014 Intel Corporation. All rights reserved. Intel, the Intel logo, Xeon and Intel Core are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. 
All dates and products specified are for planning purposes only and are subject to change without notice. *Other names and brands may be claimed as the property of others.
  • 3. Risk Factors The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,” “estimates,” “may,” “will,” “should” and their variations identify forward-looking statements. Statements that refer to or are based on projections, uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel’s actual results, and variances from Intel’s current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements. Intel presently considers the following to be the important factors that could cause actual results to differ materially from the company’s expectations. Demand could be different from Intel's expectations due to factors including changes in business and economic conditions; customer acceptance of Intel’s and competitors’ products; supply constraints and other disruptions affecting customers; changes in customer order patterns including order cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and financial conditions poses a risk that consumers and businesses may defer purchases in response to negative financial events, which could negatively affect product demand and other related matters. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in the short term and product demand that is highly variable and difficult to forecast. 
Revenue and the gross margin percentage are affected by the timing of Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including product offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond quickly to technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale; changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; start-up costs; excess or obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; product manufacturing quality/yields; and impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's results could be affected by adverse economic, social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Expenses, particularly certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for Intel's products and the level of revenue and profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. 
Intel's results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies such as compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel’s results is included in Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release. Rev. 7/17/13
  • 4. Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice. Notice revision #20110804
  • 5. Outline
      • Motivation: a subset of science problems I have been involved with that have been difficult to investigate with previous hardware/software solutions
      • What we have done: recent hardware and software advances that have enabled new investigations
      • What we are doing: issues and the solutions required to continue to advance HPC capabilities
  • 6. Three Example Science Drivers
  • 7. Liquid Crystal Biosensors: Label-Free Diagnostics
      • Alignment changes induced by interactions with biomolecules can be detected with optical microscopy.
      • "Critical to this area is a sound understanding of the theory and modelling of liquid-crystal materials at interfaces. Computer simulations of the molecular and mesoscopic interactions between thermotropic liquid-crystal materials … are a necessity. It is important to understand the interactions at these complicated interfaces." - Woltman, et al., Nature Materials*, 6, 929 (2007)
      • Prior to 2013, there were no molecular simulations at experimentally relevant scales because of computational limitations.
      • Extrapolations from small-scale studies are difficult/dangerous due to the presence of undulative modes.
      [Figure from Woltman, et al., Nature Materials*, 6, 929 (2007): aqueous lipid-rich solution over an aligned liquid crystal in a patterned well on a glass substrate, showing lipid-depleted regions; panels at T = 0, 15, 45, and 75 min.]
  • 8. Icephobic Surfaces for Wind Power
      • Many of the regions of the world that can benefit from wind power are in cold climates.
      • Ice can reduce the efficiency of turbines or force shutdown.
      • Ice formation is extremely difficult to probe experimentally.
      • Probing with simulation requires large simulations at time scales sufficient to capture the rare nucleation events leading to ice formation.
  • 9. Organic Solar Cells
      • Organic photovoltaic (OPV) solar cells are promising renewable energy sources: low cost, high flexibility, light weight.
      • Morphology of the OPV active layer has a significant impact on device efficiency:
          • Donor regions should be large enough to absorb light efficiently, but small enough to allow charge excitations to diffuse to the acceptor before recombining
          • High interface-to-volume ratio for efficient exciton dissociation
          • Proper donor molecular alignment to optimize the carrier mobility of the donor phase
      • Multi-scale problem: very large simulations are needed to reach experimental scales.
      [Figure: PCBM (electron acceptor) and P3HT (electron donor).]
  • 10. Hardware and Software Advances
  • 11. Many-core and GPUs
      • Issues with power consumption, heat dissipation, and memory access latencies have forced a change in direction towards many-core processors for current and next-generation supercomputers that advance the limits of peak floating-point performance.
      • GPU-based architectures were the first affordable many-core processors generally available.
      • ORNL* Titan came online in late 2012 as a hybrid supercomputer capable of over 17 PF HPL performance.
          • Cray* XK7 architecture with a Gemini* network interconnect and nodes containing a single AMD* Opteron* 6274 processor and an Nvidia* Tesla K20X
  • 12. Center for Accelerated Application Readiness (CAAR at ORNL*)
      • In order to exploit the potential performance gains on Titan, changes to the software are required.
      • CAAR was formed to address this issue for several codes before the upgrade to Titan.
          • LAMMPS* (Large-scale Atomic/Molecular Massively Parallel Simulator) was the molecular dynamics code chosen for CAAR
          • We implemented the ‘GPU package’ for LAMMPS* with new algorithms suited for running on hybrid GPGPU architectures
      • Brown, W.M., Wang, P., Plimpton, S.J., Tharrington, A.N. Implementing Molecular Dynamics on Hybrid High Performance Computers - Short Range Forces. Computer Physics Communications*. 2011. 182: p. 898-911.
      • Brown, W.M., Kohlmeyer, A., Plimpton, S.J., Tharrington, A.N. Implementing Molecular Dynamics on Hybrid High Performance Computers - Particle-Particle Particle-Mesh. Computer Physics Communications*. 2012. 183: p. 449-459.
  • 13. Spinodal Dewetting in Liquid Crystal Layers
      Science Team: Trung Dac Nguyen, Jan-Michael Carrillo, Mike Matheson, W. Michael Brown (ORNL*)
      • Problem: How/why do defects form in liquid crystal layers?
          • Not possible to study on the previous Jaguar supercomputer with the software available unless the machine was dedicated to the problem
      • Innovation: Coarse-grain simulation model with high arithmetic intensity/vector potential, GPU algorithms
      • Result: > 7X performance gain (1S CPU+GPU vs 2S CPU); first simulations at scale with molecular detail
      • Nguyen, T. D., Carrillo, J.-M. Y., Matheson, M. A., Brown, W.M. Rupture mechanism of liquid crystal thin films realized by large-scale molecular simulations. Nanoscale*. 2014. 6: 3083-96.
  • 14. Icephobic Surfaces
      Science Team: Yamada Masako, et al. (GE* Global Research)
      • Problem: Probe the mechanism for water droplet freezing on a surface with molecular detail
      • Innovation: New 3-body potential for water simulation (E. B. Moore, V. Molinero, Nature* 479 (2011) 506–508)
          • Coarse-grain model, eliminates all-to-all communications for electrostatics, larger timestep
          • > 100X simulation rate speedup from the model
          • Additional concurrency for 3-body: O(N^3) independent force computations vs O(N^2) [for N atoms within cutoff radius]
          • Additional 2-3X simulation speedup on XK7 node (1S CPU+GPU) vs 2S Opteron*
      • Result: Simulation of the water freezing process is now typical with hundreds of nodes.
      • Brown, W.M., Masako, Y. Implementing Molecular Dynamics on Hybrid High Performance Computers – Three-Body Potentials. Computer Physics Communications*. 2013. 184: p. 2785-2793.
  • 15. Issues with Hybrid GPU Machines
  • 16. Issues with the Programming Model
      • Using CUDA* with C/C++ semantics introduces multiple languages, code paths, compiler requirements, etc. into a code base.
          • Support for Fortran or x86 targets requires proprietary compilers
      • OpenACC* can potentially address this but, in my opinion, will still require separate code paths and different algorithms for x86 in general.
          • GPUs have 10,000s of threads in flight, different performance for atomic operations, different penalties for thread synchronization, context switching, etc.
          • For example, the algorithms used for molecular dynamics on the GPU would not perform well on x86 processors.
          • For the 3-body water/ice simulation presented here, we triple the number of force computations on the GPU with redundant computations!
  • 17. Optimizing separate code paths
      • Optimizations for GPU do not necessarily improve performance on CPUs.
          • Many production codes supporting GPU acceleration still use the CPU for many routines and must still support CPU-only
          • Intel® Xeon® processors are still improving the performance/socket
          • 85% of all systems on TOP500* use Intel® processors (97% of new systems)
          • Future Intel® many-core processors will be bootable (no coprocessor necessary)
          • Difficult to manage/debug/port software with separate optimization code paths
      [Figure: Liquid Crystal Benchmark, simulation rate (higher is better). Extracted bar values: 1, 8.38, 2.94, 12.43; chart annotation: "No Coprocessor/GPU". Legend: (2012) LAMMPS Baseline on 2S AMD* Opteron* 6274 [1600MHz DDR3]; (2012) LAMMPS w/ GPU Optimizations on 1S AMD* Opteron* 6274 + Nvidia* Tesla* K20X; (2014) LAMMPS w/ GPU Optimizations on 2S Intel® Xeon® E5-2697v3 [2133 MHz DDR4]; (2014) LAMMPS w/ Intel® Xeon Phi™ Coprocessor Optimizations on 2S Intel® Xeon® E5-2697v3 [2133 MHz DDR4].]
      Source: AMD* & AMD/NVIDIA* results: http://www.nvidia.com/docs/IO/122634/computational-chemistry-benchmarks.pdf; Intel results: Intel measured August 2014. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
  • 18. Intel® Package for LAMMPS
  • 19. Motivation
      • Provide a workspace for demonstrating code modernization strategies with vectorization for different precision modes, OpenMP*, and MPI*-3
      • Support computation offload to Intel® coprocessors with performance that is comparable to or better than GPU performance
      • Demonstrate portable performance with a single code path using standard MD algorithms with C++/OpenMP*
      • Provide routines that allow the community to easily add new functionality by example
      • Performance improvements with Intel® package still significant with 1000 atoms per CPU core
      [Figure: LAMMPS* Rhodopsin Protein Benchmark, 512K atoms (TACC* Stampede). Simulation time (lower is better, axis ticks from 0.50 to 32.00) vs. nodes (1, 2, 4, 8, 16, 32). Series: 2S Intel® Xeon® Processor E5-2680; 2S Intel® Xeon® Processor E5-2680 + Intel® Xeon Phi™ Coprocessor SE10P; 2S Intel® Xeon® Processor E5-2680 + Nvidia* Tesla K20m.]
      Stampede Configuration: HT Off, 32GB DDR3-1600MHz, PCIe2 (GPU), PCIe2 (Intel® Coprocessor), FDR 56 Gb/s, MPSS 3.3, MVAPICH 2.0b
      Source: Intel measured August 2014. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
  • 20. Current Optimizations in Intel Package
      • Data alignment
      • Support for mixed and single precision modes in addition to double
      • SIMD directives to allow compiler vectorization for routines with data dependencies
      • Modifications to conditional branches to better support compiler vectorization for both coprocessors and Intel® Advanced Vector Extensions (Intel® AVX)
      • Neighbor-list padding to prevent execution of vector remainder code
      • Offload directives to manage data allocation, transfer, and concurrent computation on the coprocessor
  • 21. Advantages of Intel® Package vs GPU Package (1)
      • Same code for routines run on the CPU and coprocessor (with or without offload)
          • Optimizations for Intel® Xeon Phi™ coprocessors resulted in faster performance on Intel® Xeon® processors (up to 3.5X)
          • GPU package uses different algorithms and different code/language
      • Support for both ‘newton’ settings allows for more flexibility for new force-fields
      • Improved flexibility for heterogeneous calculations
          • Intel® coprocessor offload not limited to 16 MPI* tasks on CPU (CUDA*-MPS limitation)
          • Intel® package supports OpenMP* with multiple threads on the CPU (GPU package does not use OpenMP)
          • MPI* tasks sharing a coprocessor are able to get exclusive core affinity
  • 22. Advantages of Intel® Package vs GPU Package (2)
      • More options for overlap of MPI* communications and computation
      • Build process is simpler and does not require building a separate library for coprocessor routines
          • One compiler/Makefile for everything
      • Precision mode (single, mixed, or double) can be switched at run-time without rebuilding
      • Package written in standard C++ with OpenMP*
          • Offload directives used for the coprocessor
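One common way to get run-time precision switching from a single source routine is to template the kernel on its arithmetic and accumulation types and dispatch on a run-time setting. The sketch below shows that pattern only; whether the Intel package does exactly this is not stated in the slides. The names `Precision`, `kernel`, and `run_kernel` are hypothetical, and reading "mixed" as single-precision arithmetic with double-precision accumulation is an assumption.

```cpp
#include <cassert>
#include <vector>

enum class Precision { Single, Mixed, Double };

// One source routine, templated on the type used for arithmetic (flt_t) and
// the type used for accumulation (acc_t). Toy kernel: sum of 1/r^2.
template <typename flt_t, typename acc_t>
double kernel(const std::vector<double>& r) {
  acc_t acc = 0;
  for (double ri : r) {
    const flt_t rf = static_cast<flt_t>(ri);
    acc += static_cast<acc_t>(flt_t(1) / (rf * rf));
  }
  return static_cast<double>(acc);
}

// Run-time dispatch: all three instantiations are compiled into the binary,
// so changing the mode is an input setting rather than a rebuild.
double run_kernel(const std::vector<double>& r, Precision mode) {
  switch (mode) {
    case Precision::Single: return kernel<float, float>(r);
    case Precision::Mixed:  return kernel<float, double>(r);
    case Precision::Double: return kernel<double, double>(r);
  }
  assert(false && "unknown precision mode");
  return 0.0;
}
```

The cost of this design is a larger binary and longer compile times, since every kernel is instantiated once per mode; the benefit is that users can compare accuracy and speed across modes with the same executable.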
  • 23. Organic Solar Cells
      Science Team: Jan-Michael Y Carrillo, Rajeev Kumar, Monojoy Goswami, S. Michael Kilbey II, Bobby G Sumpter (ORNL*/UT*)
      • Problem: Predictive simulation of active layer morphology and molecular alignment based on blend composition
      • Innovation: Code modernization/HW
      • Result: With code modernization and advanced HPC resources, we have been the first to perform simulations at scales that match experiment.
          • Left: Up to 1.9X with use of a coprocessor
          • Simulations include all of the statistics and I/O (about 10% of run time) from the production runs
          • Significant potential for advanced multiscale simulation models with coprocessors…
      [Figure, left: OPV Simulation, 1.77M atoms (Stampede). Simulation time (lower is better, axis ticks from 8.00 to 256.00) vs. nodes (2, 4, 8, 16, 32, 64). Series: 2S Intel® Xeon® Processor E5-2680 (Baseline); 2S Intel® Xeon® Processor E5-2680 (Intel® Package); 2S Intel® Xeon® Processor E5-2680 + Intel® Xeon Phi™ SE10P; 2S Intel® Xeon® Processor E5-2680 + Nvidia* Tesla* K20m. Figure label: coarse-grained MD (morphology, phase segregation).]
      • Carrillo, J.-M. Y., Kumar, R., Goswami, M., Sumpter, B., Brown, W.M. New Insights into Dynamics and Morphology of P3HT:PCBM Active Layers in Bulk Heterojunctions. Physical Chemistry Chemical Physics. 2013. 15: p. 17873-17882.
      Source: Michael Brown. See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
  • 24. LAMMPS performance with current processors
      Simulation rate (higher is better). Values are listed as Baseline (CPU Only) / Intel® Package (CPU Only) / Offload to GPU / Offload to Intel® Xeon Phi™ Coprocessor.
      LAMMPS Liquid Crystal Benchmark, 524K atoms:
      • 2S Intel® Xeon® Processor E5-2680 / Nvidia* Tesla* K20m / Intel® Xeon Phi™ Coprocessor SE10P†: 1.00 / 4.68 / 7.64 / 7.12
      • 2S Intel® Xeon® Processor E5-2697v2 / Nvidia* Tesla* K40c / Intel® Xeon Phi™ Coprocessor 7120A‡: 1.92 / 6.58 / 8.66 / 9.71
      • 2S Intel® Xeon® Processor E5-2697v3 / HW Not Available / Intel® Xeon Phi™ Coprocessor 7120A‡: 2.30 / 10.36 / not available / 12.44
      LAMMPS Rhodopsin Protein Benchmark, 512K atoms:
      • 2S Intel® Xeon® Processor E5-2680 / Nvidia* Tesla* K20m / Intel® Xeon Phi™ Coprocessor SE10P†: 1.00 / 1.21 / 2.30 / 2.16
      • 2S Intel® Xeon® Processor E5-2697v2 / Nvidia* Tesla* K40c / Intel® Xeon Phi™ Coprocessor 7120A‡: 1.51 / 1.82 / 2.68 / 2.70
      • 2S Intel® Xeon® Processor E5-2697v3 / HW Not Available / Intel® Xeon Phi™ Coprocessor 7120A‡: 1.82 / 2.31 / not available / 3.06
      See benchmark tests and configurations in the speaker notes. For more information go to http://www.intel.com/performance
  • 26. Available/Ongoing/Future Improvements
      • Compiler vectorization
          • Significant advances in compiler vectorization using Intel® Composer XE 2015
          • New: Vector variant functions that allow explicit vector coding for small subroutines
          • Advanced compiler optimization reports with Intel® Composer XE 2015 and runtime analysis with Intel® VTune™ Amplifier XE
      • MPI performance in symmetric modes
          • Significant performance improvements with Intel® MPI* 5
          • Future many-core chips will have bootable options and also options for integrated fabric
  • 27. Unveiling Details of Knights Landing (Next Generation Intel® Xeon Phi™ Products)
      • Compute: Energy-efficient IA cores (2)
          • Microarchitecture enhanced for HPC (3)
          • 3X single-thread performance vs Knights Corner (4)
          • Intel Xeon processor binary compatible (5)
      • On-Package Memory:
          • Up to 16GB at launch
          • 1/3X the space (6)
          • 5X bandwidth vs DDR4 (7)
          • 5X power efficiency (6)
      • Platform Memory: DDR4 bandwidth and capacity comparable to Intel® Xeon® processors
      • Intel® Silvermont architecture enhanced for HPC
      • Integrated fabric
      • Jointly developed with Micron Technology
      • Designed using Intel's cutting-edge 14nm transistor technology
      • Not bound by "offloading" bottlenecks
      • Standalone CPU or PCIe coprocessor
      • Common instruction set architecture: Intel® Advanced Vector Extensions 512
      [Figure: processor package diagram. Conceptual, not actual package layout.]
      All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice.
      (1) Over 3 Teraflops of peak theoretical double-precision performance is preliminary and based on current expectations of cores, clock frequency and floating point operations per cycle. FLOPS = cores x clock frequency x floating-point operations per second per cycle.
      (2) Modified version of Intel® Silvermont microarchitecture currently found in Intel® Atom™ processors.
      (3) Modifications include AVX512 and 4 threads/core support.
      (4) Projected peak theoretical single-thread performance relative to 1st Generation Intel® Xeon Phi™ Coprocessor 7120P (formerly codenamed Knights Corner).
      (5) Binary compatible with Intel Xeon processors using Haswell Instruction Set (except TSX).
      (6) Projected results based on internal Intel analysis of Knights Landing memory vs Knights Corner (GDDR5).
      (7) Projected result based on internal Intel analysis of STREAM benchmark using a Knights Landing processor with 16GB of ultra high-bandwidth versus DDR4 memory only with all channels populated.
  • 29. Summary. • We have demonstrated that important research for a variety of science problems continues to benefit from hardware and software advances in HPC. • We believe that the Intel® path forward for HPC architecture and software offers a solution that allows for a simpler code base and reduced software effort. • We are dedicated to providing solutions that allow for portable performance across a range of different processors for both conventional HPC centers and those on a path to exa-scale. • We have demonstrated, in a large production code, that optimizations for Intel® Xeon Phi™ coprocessors also improve performance on Intel® Xeon® processors and that the same routine can be used for efficient computations on both.
  • 30. Summary (2). The innovations here also included reconsideration of the simulation model. • Important to realize that there is often a choice in the model used, and that in the past, minimizing computation was important to performance. • It may be the case that you can exploit additional arithmetic intensity and concurrency to increase the accuracy, time-step, etc. in order to advance research capabilities. • This has already been the case for simulation in materials problems. [Figure: cost in core-sec/atom-timestep (log scale, 10^-6 to 10^-2) versus year published (1980 to 2010) for the potentials GPT, BOP, CHARMm, Stillinger-Weber, EAM, Tersoff, AIREBO, MEAM, ReaxFF, ReaxFF/C, eFF, COMB, REBO, EIM and GAP, with an O(2^(T/2)) trend line. Source: Computational Aspects of Many-body Potentials, S. J. Plimpton and A. P. Thompson, MRS Bulletin*, 37, 513-521 (2012).]
  • 31. Acknowledgements. Research by the teams presented here made possible by: • DOE Early Science and ALCC Programs: This research used resources of the Oak Ridge Leadership Computing Facility* at the Oak Ridge National Laboratory*, which is supported by the Office of Science of the U.S. Department of Energy* under Contract No. DE-AC05-00OR22725. • NSF TACC* Stampede Project (ACI-1134872): The researchers acknowledge the Texas Advanced Computing Center* (TACC) at The University of Texas at Austin* for providing HPC resources that have contributed to the research results reported within this paper. URL: http://www.tacc.utexas.edu • NSF UT* Beacon Project: This material is based upon work supported by the National Science Foundation under Grant Number 1137097 and by the University of Tennessee* through the Beacon Project. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation* or the University of Tennessee*.
  • 32. Getting Access to Intel® Xeon Phi™ HPC Systems. • NSF XSEDE* (TACC* Stampede, LSU* SuperMIC): https://www.xsede.org/allocations • NSF UT* Beacon: https://www.nics.tennessee.edu/computing-resources/beacon/allocations • DOE NERSC* Users (Babbage): https://www.nersc.gov/users/computational-systems/testbeds/babbage/ • Purdue* Conte (purchase available): https://www.rcac.purdue.edu/compute/conte/
  • 33. Thanks!
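
Two pieces of arithmetic in the slides above can be sketched in a few lines: the peak-performance formula from the Knights Landing footnote (FLOPS = cores x clock frequency x floating-point operations per cycle) and the cost unit on the many-body potentials plot (core-sec/atom-timestep). This is a minimal sketch, not anything from the talk: the function names `peak_flops` and `core_hours` are hypothetical, and every number in the example is an illustrative assumption, not a product specification or a measured result.

```python
def peak_flops(cores: int, clock_hz: float, flops_per_cycle: int) -> float:
    """Peak theoretical FLOPS = cores x clock frequency x FP operations per cycle."""
    return cores * clock_hz * flops_per_cycle


def core_hours(cost_core_sec: float, atoms: int, timesteps: int) -> float:
    """Total core-hours for a run, given a cost in core-sec per atom-timestep."""
    total_core_seconds = cost_core_sec * atoms * timesteps
    return total_core_seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical chip: 64 cores at 1.5 GHz, 32 double-precision FLOPs per cycle.
    # These values are assumptions chosen for illustration only.
    teraflops = peak_flops(cores=64, clock_hz=1.5e9, flops_per_cycle=32) / 1e12
    print(f"peak: {teraflops:.3f} TFLOPS")

    # Hypothetical run: a potential costing 1e-5 core-sec/atom-timestep,
    # one million atoms, one million timesteps.
    hours = core_hours(cost_core_sec=1e-5, atoms=1_000_000, timesteps=1_000_000)
    print(f"run cost: {hours:.0f} core-hours")
```

The second function shows why the choice of model matters: on the plot's log scale, each decade of cost per atom-timestep multiplies the core-hours of an otherwise identical run by ten.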