Learn how to accelerate parallel processing while simplifying heterogeneous environment management by optimizing IBM System x and Intel MIC clusters with IBM Platform HPC. For more information on IBM Systems, visit http://ibm.co/RKEeMO.
Thought Leadership White Paper
IBM Systems & Technology Group November 2012
Accelerating parallel processing
while simplifying heterogeneous
environment management
Optimizing IBM System x and Intel MIC clusters with IBM Platform HPC
The challenges of application scaling and
distributed computing
With the rapid advancement in hardware, such as multi-core
processors and faster interconnects, the latest technical
computing systems offer unprecedented computing power for
running parallel applications. However, building a parallel
application that can take advantage of all the available
computing resources is no easy task.
Although the number of CPU cores continues to increase
thanks to Moore’s Law, the total memory per CPU has not
kept up. As a result, per-core memory is decreasing. For
certain technical computing applications that require large
amounts of memory to run at an optimal speed, the existing
hardware architecture does not offer the performance that
meets user requirements. In these cases, users have two
options. They can turn off some of the cores so that the
remaining cores have more memory each, in which case the idle
cores become wasted resources. Or they can live with
coarse-grained parallel applications that run on
older-generation hardware with fewer cores per CPU in exchange
for greater memory bandwidth. Both alternatives lead to lower
throughput and decreased return on infrastructure investment.
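
The first option can be made concrete with a little arithmetic. The Python sketch below is illustrative only: the node size (16 cores sharing 32 GB) and the 4 GB per-process requirement are assumed figures, not measurements from this paper, and usable_cores is a hypothetical helper.

```python
def usable_cores(total_cores: int, node_memory_gb: float, mem_per_process_gb: float) -> int:
    """Return how many cores can each run one process without exceeding node memory."""
    if mem_per_process_gb <= 0:
        raise ValueError("mem_per_process_gb must be positive")
    fit_by_memory = int(node_memory_gb // mem_per_process_gb)
    return min(total_cores, fit_by_memory)


# Hypothetical node: 16 cores sharing 32 GB, which is 2 GB per core.
cores, memory_gb = 16, 32.0
memory_per_core_gb = memory_gb / cores

# Hypothetical application that needs 4 GB per process.
active = usable_cores(cores, memory_gb, 4.0)
idle = cores - active
print(f"{memory_per_core_gb:.1f} GB per core; {active} cores busy, {idle} idle")
```

Under these assumed numbers, half of the cores sit idle because memory, not core count, is the limiting resource.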
The task of converting existing serial software into a parallel
version is daunting because of the volume of code and because
it requires programmers to master sophisticated parallel
programming models such as MPI or OpenMP. Programmers with
these skills are scarce, so rewriting older applications can
be time consuming and costly.
Contents
The challenges of application scaling and distributed computing
The Intel Many Integrated Core architecture
The Intel Xeon Phi coprocessor
IBM Platform HPC for IBM System x
The IBM System x platform
An integrated solution for performance and efficiency in parallel environments
Delivering breakthrough innovations
Accelerating performance and simplifying management
Technical computing, or high performance computing (HPC),
has been adopted in many industry segments because of the
superior price/performance of clusters built on commodity
technologies. Technical computing can help accelerate research
and design—from automobile and airplane design to
developing household products, such as potato chips and
diapers. Technical computing is becoming a critical part of
many organizations’ strategy as they look for new ways to
grow, compete and innovate. However, many users also face
challenges that prevent them from harnessing the power of
HPC resources available to them, which leads to poor
infrastructure utilization and longer time to results. This white
paper explores the challenges organizations face today and
explains how an integrated solution that uses IBM®
Platform™ HPC for IBM System x® and the Intel® MIC
architecture can help address these challenges.
Figure 1: Intel MIC Knights Corner Architecture
The Intel MIC architecture design has two distinct features.
The design:
• Comprises many smaller, lower-power Intel processor
cores.
• Contains wider vector processing units than previous
architecture designs, which results in greater floating point
performance per watt.
With its innovative design, the Intel MIC architecture delivers
superior aggregated performance. It supports data parallelism,
thread parallelism and process parallelism, and it increases
total memory bandwidth. Along with this performance gain,
programmability is another significant benefit of the Intel MIC
architecture.
The increased complexity of managing a distributed
computing environment is another challenge. Although the
absolute computing power of hardware continues to advance,
managing a data center intelligently so various workloads get
the necessary hardware resources needed to run at an optimal
speed is not a trivial task. Sophisticated infrastructure
management software is critical to ensure high resource
utilization and faster application performance.
For as long as clusters have been leveraged in technical
computing environments, users have been dealing with these
challenges. The continued disparity between the advancement
in hardware and software hinders broader adoption of
technical computing and ultimately affects the competitiveness
of a business.
Technical computing users need a solution that addresses both
parallel programming and system management challenges.
This paper details a viable, integrated technology solution
developed by IBM and Intel Corporation that can help users
overcome the barriers mentioned earlier, improve the
utilization of their infrastructure and benefit from accelerated
application performance.
The Intel MIC architecture
The Intel Many Integrated Core (MIC) architecture is an
innovative multiprocessor computer architecture that can
provide higher aggregated performance than other solutions. It
is designed to simplify application parallelization and deliver
significant performance improvements.
The Intel Xeon Phi coprocessor
The Intel Xeon Phi™ coprocessor is based on the Intel MIC
architecture. It is built on a 22 nm process and offers more
than 50 compute cores. The Intel Xeon Phi coprocessor offers
technical computing users who are developing and running
highly parallel applications these benefits:
• Greater performance and performance per watt than
alternative products
• Easy programmability with the preservation of a general
purpose, standard programming environment
• Improved application scalability from small systems to large
supercomputer clusters
While the existing Intel Xeon processor delivers
industrial-strength processing power for an enterprise to
address mission-critical workloads, the Intel Xeon Phi
coprocessor is optimized to deliver the highest level of
parallel performance to technical computing applications to
power breakthrough innovations.
IBM Platform HPC for IBM System x
Platform HPC is complete, end-to-end management software.
It includes a rich set of out-of-the-box features that empower
technical computing users by reducing the complexity of their
HPC environment and improving their time to results.
Platform HPC includes a complete set of management
capabilities:
• Cluster management
• Workload management
• Workload monitoring and reporting
• System monitoring and reporting
• Dynamic operating system multi-boot
• MPI libraries (IBM Platform MPI)
• Integrated application scripts and templates
• A unified web portal
The ease-of-programming features of the Intel MIC architecture include:
• Support for standard C/C++/Fortran programming
languages, compilers, libraries and tools.
• Freedom from the need to program to the underlying
hardware because the technology is hardware-agnostic.
• Ease-of-tuning because many-core and multi-core
architectures generally benefit from the same tuning
methods.
• The ability to run applications independently, because Intel
MIC is more than just an accelerator: a system built on
Intel MIC is a fully addressable, independent node in a
cluster.
• Support for Intel Cluster Studio XE and Intel Parallel
Studio XE.
• Support for standard parallel programming models, including
MPI, OpenMP, Intel Threading Building Blocks (TBB) and
Intel Cilk™ Plus.
The Intel MIC architecture addresses highly parallel workload
requirements and is specifically designed for technical
computing users who are reaching the limit of application
scaling and aggregated memory bandwidth. Intel MIC helps
users expand their computing capabilities by delivering
simplified parallelization and greater throughput for programs
that run hundreds of threads and make active use of the
512-bit vector units, such as Monte Carlo simulations and
Black-Scholes pricing in financial services, seismic analysis
in oil and gas, weather modeling, molecular dynamics and
digital content creation.
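
To illustrate why Monte Carlo workloads parallelize so readily, the following Python sketch prices a European call option by splitting the simulation into independent chunks and compares the estimate with the closed-form Black-Scholes price. It is a minimal sketch under stated assumptions: the option parameters, chunk counts and function names are hypothetical, and Python threads stand in for the OpenMP threads or MPI processes that would be used on Intel MIC hardware. In standard CPython the threads do not run this loop concurrently, so the example shows the decomposition pattern rather than a speedup.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def black_scholes_call(s0, k, r, sigma, t):
    """Closed-form Black-Scholes price of a European call option."""
    d1 = (math.log(s0 / k) + (r + 0.5 * sigma**2) * t) / (sigma * math.sqrt(t))
    d2 = d1 - sigma * math.sqrt(t)

    def cdf(x):
        # Standard normal cumulative distribution function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * cdf(d1) - k * math.exp(-r * t) * cdf(d2)


def payoff_sum(seed, n_paths, s0, k, r, sigma, t):
    """Sum of call payoffs over n_paths simulated terminal prices (one chunk)."""
    rng = random.Random(seed)  # private generator, so chunks are independent
    drift = (r - 0.5 * sigma**2) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        terminal = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(terminal - k, 0.0)
    return total


def monte_carlo_call(s0, k, r, sigma, t, chunks=8, paths_per_chunk=20_000):
    """Estimate the call price by combining independently simulated chunks."""
    with ThreadPoolExecutor(max_workers=chunks) as pool:
        futures = [
            pool.submit(payoff_sum, seed, paths_per_chunk, s0, k, r, sigma, t)
            for seed in range(chunks)
        ]
        total = sum(f.result() for f in futures)
    return math.exp(-r * t) * total / (chunks * paths_per_chunk)


# Hypothetical option: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
exact = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
estimate = monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0)
print(f"closed form: {exact:.4f}, Monte Carlo estimate: {estimate:.4f}")
```

Because each chunk needs only its own seed and the shared option parameters, the only communication is the final sum, which is the property that lets this class of workload scale across many cores.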
Figure 2: IBM Platform HPC Architecture
With these capabilities, Platform HPC makes it easy for cluster
users to:
• Deploy OS, tools and multiple applications.
• Manage and maintain the cluster, including applying patches
and upgrades while monitoring and reporting on cluster
health.
• Ensure the fair use of a cluster by multiple user groups while
optimizing performance and avoiding application conflicts.
• Isolate problems and perform troubleshooting using
easy-to-use tools.
With Platform HPC, users benefit from faster time-to-cluster
readiness and a shorter learning curve for both system
administration and application workload submission.
By using its workload management capabilities, which are
based on industry-leading IBM Platform LSF®, they can also
achieve higher application throughput for faster
time-to-results.
The IBM System x platform
In addition to the management support provided by Platform
HPC, the Intel Xeon Phi coprocessor is offered as an
expansion option on the IBM System x iDataPlex® dx360 M4
server to address specific workload needs. It will ship
broadly to clients upon Intel’s general availability date.
Figure 3: IBM iDataPlex
The iDataPlex dx360 M4 is the first server from IBM to
support Intel Xeon Phi coprocessors. The iDataPlex dx360 M4
is a dual-socket, half-depth server that is built on the latest
Intel Xeon processors, which offer high performance,
flexibility and energy efficiency. The flexible design supports
up to two MIC adapters to meet the requirements of highly
parallel workloads.
An integrated solution for performance
and efficiency in parallel environments
Platform HPC is tightly integrated with System x hardware
and is now delivered as a value-add to technical computing
cluster solutions from IBM. Built-in support for the Intel Xeon
Phi coprocessor enables technical computing users to easily
take advantage of the unprecedented computing capability
delivered by Intel Xeon Phi coprocessors to gain better
throughput for their parallel applications.
Platform HPC includes both node-level and cluster-level
monitoring of the systems that incorporate Intel Xeon Phi
coprocessors. At the cluster level, by selecting Intel Xeon
Phi-related monitoring metrics, a system administrator can
easily identify the nodes that have Intel Xeon Phi coprocessor
cards installed.
When the administrator drills down to the node level, the
detailed metrics of individual Intel Xeon Phi cards are listed.
The dashboard shows static configuration information such as
the flash version and board SKU along with dynamic metrics
such as core frequency and fan speed.
To automate the monitoring process, these monitored metrics
can be used to define alerts and their corresponding alert
actions. The metrics can also be used to specify a threshold
at job submission, so that the workload scheduler places
Intel Xeon Phi coprocessor-related jobs correctly.
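
The idea of pairing a monitored metric with a threshold and an alert action can be sketched in a few lines of Python. This is not the Platform HPC interface: the AlertRule class, the evaluate function, the metric names and the threshold values are all hypothetical and chosen only to illustrate the pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class AlertRule:
    metric: str                           # hypothetical metric name
    threshold: float                      # alert fires when the value exceeds this
    action: Callable[[str, float], str]   # alert action to run when the rule fires


def evaluate(rules: List[AlertRule], node: str, sample: Dict[str, float]) -> List[str]:
    """Run the action of every rule whose metric exceeds its threshold."""
    fired = []
    for rule in rules:
        value = sample.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append(rule.action(node, value))
    return fired


# Hypothetical rules and a hypothetical metric sample for one coprocessor node.
rules = [
    AlertRule("temperature_c", 85.0, lambda node, v: f"notify admin: {node} at {v:.0f} C"),
    AlertRule("fan_speed_rpm", 5000.0, lambda node, v: f"log: {node} fan at {v:.0f} rpm"),
]
sample = {"temperature_c": 91.0, "fan_speed_rpm": 3200.0, "core_frequency_mhz": 1053.0}

for message in evaluate(rules, "node07", sample):
    print(message)
```

With the assumed sample, only the temperature rule fires. The same comparison of a metric against a threshold is what a scheduler would apply when deciding whether a node is eligible for a job.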
Figure 4: IBM Platform HPC dashboard showing Intel Xeon Phi voltage state and temperature
Delivering breakthrough innovations
With the Intel MIC architecture, application parallelization
becomes easy and application throughput gets a significant
boost. Adding Intel Xeon Phi coprocessor support into
Platform HPC enhances the experience of technical computing
users who need to run highly parallel applications on hardware
systems that include Intel Xeon Phi coprocessors. With the
integration between System x, Platform HPC and the Intel
Xeon Phi coprocessor, Platform HPC automatically allocates
the Intel Xeon Phi coprocessors to application workloads that
are suited for running on those resources, which frees CPU
resources in a cluster so they are available to other types of
workloads. This intelligent scheduling mechanism enables
better matching between available resources and the different
types of workloads running on a cluster. It also results in faster
overall application throughput and higher resource utilization.
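
The matching of workloads to coprocessor-equipped nodes can be illustrated with a simple first-fit placement sketch in Python. It is not the algorithm used by Platform HPC or Platform LSF, which this paper does not describe; the Node and Job classes, the place function and the example cluster are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    free_cpu_slots: int
    installed_coprocessors: int
    free_coprocessors: int


@dataclass
class Job:
    name: str
    cpu_slots: int
    coprocessors: int = 0


def place(job: Job, nodes: List[Node]) -> Optional[str]:
    """Assign the job to the first node with enough free resources, else None."""
    candidates = nodes
    if job.coprocessors == 0:
        # CPU-only jobs try nodes without coprocessors first,
        # which keeps accelerated nodes available for jobs that need them.
        candidates = sorted(nodes, key=lambda n: n.installed_coprocessors)
    for node in candidates:
        if node.free_cpu_slots >= job.cpu_slots and node.free_coprocessors >= job.coprocessors:
            node.free_cpu_slots -= job.cpu_slots
            node.free_coprocessors -= job.coprocessors
            return node.name
    return None  # job stays pending until resources free up


# Hypothetical two-node cluster and three hypothetical jobs.
nodes = [Node("mic01", 16, 2, 2), Node("cpu01", 16, 0, 0)]
jobs = [Job("mc_pricing", 1, 2), Job("preprocess", 8), Job("seismic", 1, 1)]

placement: Dict[str, Optional[str]] = {job.name: place(job, nodes) for job in jobs}
print(placement)
```

In this assumed scenario the coprocessor job lands on the accelerated node, the CPU-only job is steered to the plain node, and the third job waits because no coprocessor is free.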
Accelerating performance and
simplifying management
With the Intel Xeon Phi coprocessor, technical computing
users can accelerate parallel applications beyond the limits of
existing processors in their hardware platform. The built-in
support for the Intel Xeon Phi coprocessor offered by Platform
HPC enables automatic matching between parallel applications
and available Intel Xeon Phi coprocessors on System x clusters.
This support reduces workload scheduling and resource
management complexity for IT administrators. Ultimately, the combined
solution of Platform HPC, System x and the Intel Xeon Phi
coprocessor delivers compelling benefits to both IT
administrators and users to ensure faster time-to-results,
improved user and administrator productivity and optimal
infrastructure utilization.
Figure 5: Node-level monitoring