7. Illustrative example
Fortran code using MPI, single threaded originally.
Run on Intel® Xeon Phi™ coprocessor natively (no offload).
Based on an actual customer example.
Shown to illustrate a point about common techniques.
Your results may vary!
[Chart: untuned performance on Intel® Xeon® processor vs. untuned performance on Intel® Xeon Phi™ coprocessor]
8. Illustrative example
Fortran code using MPI, single threaded originally.
Run on Intel® Xeon Phi™ coprocessor natively (no offload).
Yeah!
[Chart: untuned and TUNED performance on Intel® Xeon® processor and on Intel® Xeon Phi™ coprocessor]
9. Illustrative example
Fortran code using MPI, single threaded originally.
Run on Intel® Xeon Phi™ coprocessor natively (no offload).
Yeah!
Common optimization techniques… “dual benefit”
[Chart: untuned and TUNED performance on Intel® Xeon® processor and on Intel® Xeon Phi™ coprocessor]
10. Illustrative example
Fortran code using MPI, single threaded originally.
Run on Intel® Xeon Phi™ coprocessor natively (no offload).
Common optimization techniques… “dual benefit”
[Chart: untuned and TUNED performance on Intel® Xeon® processor and on Intel® Xeon Phi™ coprocessor]
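The slides do not name the techniques used on the customer code. As a minimal sketch only, assuming the tuning amounts to threading and vectorizing a hot loop, the C fragment below shows one source change that can help both targets. The kernel and the names `axpy_untuned` and `axpy_tuned` are hypothetical and are not taken from the customer's Fortran code.

```c
/* Hypothetical kernel (not the customer code): y = a*x + y.
 * Baseline: a plain serial loop, as in "single threaded originally". */
void axpy_untuned(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Tuned: the same loop, shared among threads and marked for SIMD with the
 * OpenMP 4.0 combined construct. The source is identical for a processor
 * build and a coprocessor build; only the compile target differs.
 * The simd directive asserts that iterations are independent, so the
 * caller must not pass overlapping x and y. */
void axpy_tuned(int n, float a, const float *x, float *y)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

If OpenMP is not enabled at compile time the pragma is ignored and `axpy_tuned` behaves like the baseline, so results stay the same and only the speed changes.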
13. TRAINING TAB… “WEBINAR” link
Webinar: Introduction to High Performance
Application Development for Multicore and Manycore
Abstract: This two-day webinar series introduces developers to the world of multicore
and manycore computing with Intel® Xeon® processors and Intel® Xeon Phi™
coprocessors. Expert technical teams at Intel discuss development tools, programming
models, vectorization, and execution models that will get your development efforts
powered up to get the best out of your applications and platforms.
When: Recorded
Where: Online (software.intel.com/mic-developer > training > webinar)
Who: High Performance Application Developers
15. Next Intel® Xeon Phi™ Processor
(codename Knights Landing)
• Enhances CPU program compatibility by being one, delivering on the advantages of heterogeneous programming without the disadvantages.
• Preserves investments in current Intel® Xeon Phi™ programming.
• Including OpenMP* 4.0 offload (and Intel offload directives) because “offload” for coprocessor version becomes “native” for processor version automatically in compilation!
• Integrated on-package memory, enhances performance over off-package alone, using standard programming APIs.
• Standalone CPU (or coprocessor)
• Intel’s industry leading 14nm process
• Intel® AVX-512 instructions.
All products, computer systems, dates and figures specified are preliminary based on
current expectations, and are subject to change without notice.
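As an illustration of the OpenMP* 4.0 offload bullet above, here is a minimal sketch of a loop wrapped in a `target` region. The function name `scale_on_target` and the kernel are hypothetical. How a Knights Landing compiler treats the directive is the slide's claim and is not demonstrated here; the sketch only shows that the source carries no device-specific code.

```c
/* Hypothetical example: scale a vector inside an OpenMP 4.0 target region. */
void scale_on_target(int n, float a, float *v)
{
    /* map(tofrom: ...) asks for v to be copied to the device before the
     * region and back afterwards. When no separate device is present,
     * implementations typically run the region on the host instead, and
     * when OpenMP is disabled the pragmas are ignored altogether. */
    #pragma omp target map(tofrom: v[0:n])
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        v[i] *= a;
}
```

The result is the same wherever the region runs, which is why offload-style source of this kind needs no edits to run on a standalone processor.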
19. Intel® Xeon Phi™ Coprocessor Starter Kits
Go parallel today with a fully-configured
system starting below $5K*
3120A
OR
5110P
software.intel.com/xeon-phi-starter-kit
Other brands and names are the property of their respective owners.
*Pricing and starter kit configurations will vary. See software.intel.com/xeon-phi-starter-kit and provider websites for full details and disclaimers. Stated currency is US Dollars.
21. Intel® Parallel Computing Centers
http://tinyurl.com/parallelcenter
Five Centers Announced on October 22.
Intel investing in Parallel Application Development – open source
– for everyone!
“RFP” (request for proposal) open until December 1.
22. Intel at SC13 includes…
1 Keynote: “The Secret Life of Data”
60 Theater Presentations
• Clean energy: predictive modeling
• Robotic Welding Systems
• Optimizing weather models
• Optimizing code for Intel® Xeon® processors and coprocessors
1st Parallel Universe Computing Challenge: Winner $25k charity donation
1st time for TOP500 class system running on show floor
5K ICR Charity Fun Run
9 Industry & Research demonstrations
Discover Your Parallel Universe
4 Collaboration Hubs
• Neo-Heterogeneity
• Storage and Fabric
• Exascale & Intel® Parallel Computing Centers
• Expanding Tech. Computing Usage
To compete, you must compute
Parallel is your path forward
Let’s get there together
23. Intel at SC13 includes…
1 Keynote: “The Secret Life of Data”
60 Theater Presentations
• Clean energy: predictive modeling
• Robotic Welding Systems
• Optimizing weather models
• Optimizing code for Intel® Xeon® processors and coprocessors
9 Industry & Research demonstrations
1st Parallel Universe Computing Challenge: Winner $25k charity donation
Matches:
• TODAY 8pm
• Tuesday 11am and 4pm
• Wednesday 11am and 4pm
• Thursday 11am
• FINAL Thursday 2pm
All about FUN! (and $25,000)
1st time for TOP500 class system running on show floor
5K ICR Charity Fun Run
Discover Your Parallel Universe
4 Collaboration Hubs
• Neo-Heterogeneity
• Storage and Fabric
• Exascale & Intel® Parallel Computing Centers
• Expanding Tech. Computing Usage
To compete, you must compute
Parallel is your path forward
Let’s get there together
27. Risk Factors
The above statements and any others in this document that refer to plans and expectations for the third quarter, the year and the future are forward-looking statements that involve a number of risks and uncertainties. Words such as “anticipates,” “expects,” “intends,” “plans,” “believes,” “seeks,”
“estimates,” “may,” “will,” “should” and their variations identify forward-looking statements. Statements that refer to or are based on projections,
uncertain events or assumptions also identify forward-looking statements. Many factors could affect Intel’s actual results, and variances from Intel’s
current expectations regarding such factors could cause actual results to differ materially from those expressed in these forward-looking statements.
Intel presently considers the following to be the important factors that could cause actual results to differ materially from the company’s expectations.
Demand could be different from Intel's expectations due to factors including changes in business and economic conditions; customer acceptance of
Intel’s and competitors’ products; supply constraints and other disruptions affecting customers; changes in customer order patterns including order
cancellations; and changes in the level of inventory at customers. Uncertainty in global economic and financial conditions poses a risk that consumers
and businesses may defer purchases in response to negative financial events, which could negatively affect product demand and other related
matters. Intel operates in intensely competitive industries that are characterized by a high percentage of costs that are fixed or difficult to reduce in
the short term and product demand that is highly variable and difficult to forecast. Revenue and the gross margin percentage are affected by the
timing of Intel product introductions and the demand for and market acceptance of Intel's products; actions taken by Intel's competitors, including
product offerings and introductions, marketing programs and pricing pressures and Intel’s response to such actions; and Intel’s ability to respond
quickly to technological developments and to incorporate new features into its products. The gross margin percentage could vary significantly from
expectations based on capacity utilization; variations in inventory valuation, including variations related to the timing of qualifying products for sale;
changes in revenue levels; segment product mix; the timing and execution of the manufacturing ramp and associated costs; start-up costs; excess or
obsolete inventory; changes in unit costs; defects or disruptions in the supply of materials or resources; product manufacturing quality/yields; and
impairments of long-lived assets, including manufacturing, assembly/test and intangible assets. Intel's results could be affected by adverse economic,
social, political and physical/infrastructure conditions in countries where Intel, its customers or its suppliers operate, including military conflict and
other security risks, natural disasters, infrastructure disruptions, health concerns and fluctuations in currency exchange rates. Expenses, particularly
certain marketing and compensation expenses, as well as restructuring and asset impairment charges, vary depending on the level of demand for
Intel's products and the level of revenue and profits. Intel’s results could be affected by the timing of closing of acquisitions and divestitures. Intel's
results could be affected by adverse effects associated with product defects and errata (deviations from published specifications), and by litigation or
regulatory matters involving intellectual property, stockholder, consumer, antitrust, disclosure and other issues, such as the litigation and regulatory
matters described in Intel's SEC reports. An unfavorable ruling could include monetary damages or an injunction prohibiting Intel from manufacturing
or selling one or more products, precluding particular business practices, impacting Intel’s ability to design its products, or requiring other remedies
such as compulsory licensing of intellectual property. A detailed discussion of these and other factors that could affect Intel’s results is included in
Intel’s SEC filings, including the company’s most recent reports on Form 10-Q, Form 10-K and earnings release.
Rev. 7/17/13
Editor's Notes
DP/SP = double/single precision
FMA is unique to MIC wrt Xeon now, but Xeon allows 2 general instructions to execute concurrently
Quicker summary foil here, longer one for theater presentation
A spectrum of execution models provides flexible options to best meet the needs of the application
• Most applications (80-90%) will run best on Intel Xeon processors
• Highly parallel and vectorized applications will run even faster on Intel Xeon Phi coprocessors
Multicore is the standard, Xeon-based clustering as we know it today.
Offload
• MPI ranks on Intel Xeon processors (only)
• All messages into/out of Xeon processors
• Offload models used to accelerate MPI ranks – use pragmas
• TBB, OpenMP, Cilk Plus, Pthreads within Intel® MIC
• Support for automatic offload when Intel MKL is used
Symmetric (Windows host cannot be “symmetric” with Linux coprocessor)
• MPI ranks on Intel Xeon Phi coprocessors and Intel Xeon processors
• Messages to/from any core
• TBB, OpenMP, Cilk Plus, Pthreads used directly within MPI processes
Many Core only
• Sometimes called native mode
• MPI ranks on Intel MIC (only)
• All messages into/out of Intel MIC
• TBB, OpenMP*, Cilk Plus, Pthreads used directly within MPI processes
Note that the card OS is still Linux. The model that is not supported is running in the symmetric mode or where MPI ranks are distributed amongst the Xeon and Xeon Phi resources. MPI ranks can offload tasks to the Xeon Phi coprocessor if it is available as a resource to that node.
Executive Summary (5 W’s)
What is it? A Live Webinar – Developing High Performance Applications for Intel® Xeon and Xeon Phi Processors and Coprocessors
When is it? June 25th and June 26th. Duration – 3 hours, each day
Who is it for? Software developers who develop, or are developing, high performance applications utilizing C/C++ and Fortran looking to build and scale forward their applications.
Where is it? Online only event! Sign up and attend from the comfort of your office or your home. All that is needed is a PC/Mac with a browser that is web enabled with an internet connection.
Why should I attend? This webinar will introduce you to the world of multicore and manycore computing with the Intel® Xeon and Intel® Xeon Phi processors and coprocessors. Development tools, programming models, execution models will be presented and discussed by the expert technical teams at Intel that will get your development efforts powered up to get the best out of your applications and platforms.
https://www1.gotomeeting.com/register/686457049
Exploit the parallel power of the Intel Xeon Phi coprocessor for high-performance computing
Intel® Xeon Phi™ Coprocessor High Performance Programming
Jim Jeffers and James Reinders
This book belongs on the bookshelf of every HPC professional. Not only does it successfully and accessibly teach us how to use and obtain high performance on the Intel MIC architecture, it is about much more than that. It takes us back to the universal fundamentals of high-performance computing including how to think and reason about the performance of algorithms mapped to modern architectures, and it puts into your hands powerful tools that will be useful for years to come.
Robert J. Harrison, Institute for Advanced Computational Science, Stony Brook University, from the Foreword
Reinders and Jeffers have written an outstanding book about much more than the Intel Xeon Phi. This is a comprehensive overview of the challenges in realizing the performance potential of advanced architectures, including modern multi-core processors and many-core coprocessors. The authors provide a cogent explanation of the reasons why applications often fall short of theoretical performance, and include steps that application developers can take to bridge the gap. This will be recommended reading for all of my staff.
James A. Ang, Ph.D., Sandia National Laboratories
The authors have provided a very readable, big-picture introduction to programming the Intel Xeon Phi Coprocessor. By chronicling step-by-step optimizations of several computational kernels, software interfaces are illustrated for getting the most out of key architectural features of the Intel Xeon Phi Coprocessor.
James L. Schwarzmeier, Cray Inc.
The authors' consummate knowledge of the architecture shines through in this excellent introduction to the fundamentals of programming for the Intel® Xeon Phi™ coprocessor. I highly recommend this engaging treatise to programmers interested in effectively utilizing the Intel® Xeon Phi™ coprocessor.
R. Glenn Brook, Ph.D., Chief Technology Officer, Joint Institute for Computational Sciences; Director, Application Acceleration Center of Excellence, University of Tennessee / Oak Ridge National Laboratory
Authors Jim Jeffers and James Reinders spent two years helping educate customers about the prototype and pre-production hardware before Intel introduced the first Intel Xeon Phi coprocessor. They have distilled their own experiences coupled with insights from many expert customers, Intel Field Engineers, Application Engineers and Technical Consulting Engineers, to create this authoritative first book on the essentials of programming for this new architecture and these new products.
This book is useful even before you ever touch a system with an Intel Xeon Phi coprocessor. To ensure that your applications run at maximum efficiency, the authors emphasize key techniques for programming any modern parallel computing system whether based on Intel Xeon processors, Intel Xeon Phi coprocessors, or other high performance microprocessors. Applying these techniques will generally increase your program performance on any system, and better prepare you for Intel Xeon Phi coprocessors and the Intel MIC architecture.
Features:
• A practical guide to the essentials of the Intel Xeon Phi coprocessor
• Presents best practices for portable, high-performance computing and a familiar and proven threaded, scalar-vector programming model
• Includes simple but informative code examples that explain the unique aspects of this new highly parallel and high performance computational product
• Covers wide vectors, many cores, many threads and high bandwidth cache/memory architecture
About the authors:
Jim Jeffers [Photo], MIC Architecture Specialist, Intel
James Reinders [Photo: pull from previous book, Structured Parallel Programming, ISBN: 978-0124159938], Director, Chief Evangelist, Intel Software
Shelving category: Parallel Programming / Computer Architecture
Neo-heterogeneous is being used by some of our customers to describe a unique benefit of a system that combines Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. Neo-heterogeneous means an HPC system can have heterogeneous processor and coprocessor hardware, but a common programming model for both parts of the system – easing the burden on developers in training, testing, deploying, and maintaining code.
Konrad-Zuse-Zentrum für Informationstechnik Berlin (ZIB)
ZIB and Intel have set up a “Research Center for Many-core High-Performance Computing” at ZIB. This Center will foster the uptake of current and next generation Intel many- and multicore technology in high performance computing and big data analytics.
CINECA
CINECA is a nonprofit Consortium, made up of Italian universities and Institutions, hosting one of the largest public Italian computing centers, with EMEA and worldwide visibility. CINECA has high expertise in parallel codes and specifically in material modeling codes. In the initial project, the parallelization of codes like Quantum Espresso is the target.
Purdue University
The Intel Parallel Computing Center at Purdue University is focused on improving the computing experience of researchers conducting work in nanoelectronics as they try to better understand how electrons flow through nano-scale devices, such as next-generation transistors.
Texas Advanced Computing Center
The Texas Advanced Computing Center at the University of Texas supports cutting-edge research in nearly every field of science, powering the discoveries of tomorrow. The Texas Advanced Computing Center (TACC) has deployed the first large scale system integrated with the new Intel Xeon Phi Coprocessor technology. The system, called Stampede, went into full scale production in early 2013 and is currently ranked as the #6 most powerful system in the Top500 world rankings.
University of Tennessee
Intel supports three efforts at the University of Tennessee that utilize the Intel® Xeon Phi™ coprocessor architecture for porting and optimization. GROMACS is a molecular dynamics program led by world-renowned computational molecular biophysicist Dr. Jeremy C. Smith. BLAST is a bioinformatics toolchain for sequencing genomic data that will yield advances in biotechnology and personalized medicine. The Innovative Computing Laboratory, under the direction of Dr. Jack Dongarra, is developing MAGMA MIC, the first of a new generation of highly optimized linear algebra libraries.