Deploying and Managing HPC Clusters with IBM Platform and Intel Xeon Phi Coprocessor
1. Best Practices in Deploying and Managing
HPC Clusters with Intel® Xeon Phi™
Louise Westoby
WW Marketing Manager, IBM Platform Computing
June 18, 2013
2. Business Innovation Stressing IT
Objective: Gain competitive advantage
• Innovate with more complex applications / simulations / analytics
• Long processing limits number of iterations in a given time period
• Explosion of data improves results but adds complexity
• Delays and high cost of adding new applications
• Difficult to use systems
End Users /
Business
Objective: Reduce cost while maintaining service
• Infrastructure silos to meet peak service level requirements
• CapEx and OpEx budget growth constrained
• Infrastructure issues – power/cooling, space, etc.
• Rise of lower cost resources (x86) and virtualization
• Evolving trend toward heterogeneous, multi-core programming models
IT
Organizations
3. VIRTUALIZED VIEW OF COMPUTE, NETWORK AND STORAGE RESOURCES
Application
Businesses need to overcome infrastructure limitations to
maximize the value of compute and data-intensive applications
Application
Examples
• Simulation
• Analysis
• Design
• Big data
IT constrained
• Long wait times
• Low utilization
• IT Sprawl
IBM Platform Computing
Software
Big Data / Hadoop
Simulation and
Modeling
Analytics
Today Future
Make lots of
computers look like
“one”
Prioritized matching
of supply with
demand
Benefits
• High utilization
• Throughput
• Performance
• Prioritization
• Reduced cost
Repeated for
many
applications
and groups
• Clusters
• Grid
• HPC Cloud
Faster time
to results
Use fewer
resources
HPC Cloud / Cluster
Mgmt
4. Complete range of technical computing management software to
maximize high performance applications
Workload and
Resource
Management
Data
Management
Infrastructure
Management
Platform LSF Family
Batch, MPI workloads with process
mgmt, monitoring, analytics, user
portal, license mgmt
Platform HPC
Simplified, integrated HPC
management software for batch, MPI
workloads integrated with systems
Platform Symphony Family
High throughput, near ‘real time’
parallel compute and Big Data /
MapReduce workloads
Big Data /
Hadoop
Simulation /
Modeling
Analytics
Applications
Heterogeneous
Resources
Compute Storage Network
Virtual, Physical, Desktop, Server, Cloud
Platform Cluster Manager Family
Provision and manage
Single Cluster (Standard) to Dynamic Clouds (Advanced)
General Parallel File System (GPFS)
High performance, distributed parallel file system
5. System x and Platform Computing: better together
Reference Ecosystem – Leverage the tight integration between IBM System x,
Platform Computing software and Intel technology
RHEL MS
System x
App App App
QLogic
InfiniBand
Intel
Xeon
Intel
Xeon Phi
Intel
Intel Cluster
Ready
IBM Platform
Computing
6. Leveraging Platform HPC to properly provision and configure
Xeon Phi environment
Add Intel MPSS
packages to the
repository
Create provisioning
template to include
MPSS package
Provision all nodes
with Xeon Phi cards
Generate MPSS
configuration on
nodes with Xeon Phi
Create network
bridge & configure
Xeon Phi network
Start mpss service
automatically on
system boot up
1. Provision nodes and install MPSS
2. Install Intel® Xeon Phi™ compilers and runtime software
3. Configure Platform HPC ELIM
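The node-side part of this workflow can be sketched as a dry-run plan. This is a minimal illustration, not the Platform HPC provisioning template itself: the helper `mpss_plan` and the node names are hypothetical, and the commands are assumptions based on RHEL 6-era MPSS releases that should be checked against the MPSS documentation for your version. The network bridge step is left out because it is site-specific.

```python
# Dry-run sketch: builds and prints a per-node command plan, executes nothing.
# The commands are assumptions based on RHEL 6-era MPSS releases; verify them
# against the documentation of the MPSS version you actually install.

def mpss_plan(node, has_phi):
    """Return the ordered shell commands for one node (hypothetical helper)."""
    if not has_phi:
        return []  # nodes without Xeon Phi cards need no MPSS setup
    return [
        "micctrl --initdefaults",  # generate the default MPSS configuration
        "chkconfig mpss on",       # start the mpss service on system boot
        "service mpss start",      # start the service now
    ]


# Hypothetical inventory: node name -> whether it has Xeon Phi cards.
nodes = {"compute000": True, "compute001": False}

for name, has_phi in sorted(nodes.items()):
    for cmd in mpss_plan(name, has_phi):
        print(f"{name}: {cmd}")
```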
7. Leveraging Platform LSF or Platform HPC to simplify
scheduling of Intel® Xeon Phi™ jobs
• Job can be submitted by specifying the following
metrics:
– Number of Xeon Phi cards required on each node
– Any metrics the Xeon Phi ELIM collects
• Job will be placed on nodes with available Xeon Phi
cards that meet the resource requirements
– Xeon Phi cards on a node are enumerated, allowing multiple
jobs to run on the same node using designated cards
• Agnostic to Xeon Phi execution mode (offload, native,
etc.)
• Job information
– Indication of which Xeon Phi cards are used
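The placement idea above (match a job to a node with enough free cards, then record which cards it uses) can be illustrated with a small sketch. This is not the Platform LSF scheduling algorithm; `place_job` and the node names are hypothetical and only show how enumerated cards let several jobs share one node.

```python
def place_job(free_cards, cards_needed):
    """Pick the first node with enough free cards.

    free_cards maps node name -> set of free card indices and is updated
    in place. Returns (node, card_ids), or None if no node qualifies.
    """
    for node in sorted(free_cards):
        available = sorted(free_cards[node])
        if len(available) >= cards_needed:
            chosen = available[:cards_needed]
            free_cards[node] -= set(chosen)  # mark the cards as in use
            return node, chosen
    return None


# Hypothetical cluster: node01 has two free cards, node02 has one.
cluster = {"node01": {0, 1}, "node02": {0}}

first = place_job(cluster, 1)   # gets card 0 on node01
second = place_job(cluster, 1)  # shares node01, gets card 1
third = place_job(cluster, 2)   # no node has two free cards left
print(first, second, third)
```

The returned card indices correspond to the job information on the slide: each job knows which designated cards it was given.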
Collecting Xeon Phi Metrics
• Total number of cards per
node
• Number of cores per
accelerator
• Core temperature (Celsius)
• Frequency (GHz)
• Total power (Watts)
• Total Free memory (MB)
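A sketch of how an ELIM might report such metrics: an LSF ELIM writes lines to standard output containing the number of resources followed by name/value pairs. The resource names below (`num_mics`, `mic_temp0`, and so on) and the `read_metrics` stub are assumptions for illustration; the names used by the actual Xeon Phi ELIM may differ, and a real ELIM would query the cards and repeat the report periodically.

```python
import sys


def elim_line(metrics):
    """Format metrics as one ELIM report line: '<count> name value ...'."""
    parts = [str(len(metrics))]
    for name, value in metrics.items():
        parts += [name, str(value)]
    return " ".join(parts)


def read_metrics():
    """Placeholder values; a real ELIM would query the cards here."""
    return {
        "num_mics": 2,       # total number of cards on this node (assumed name)
        "mic_temp0": 54,     # core temperature of card 0, Celsius (assumed name)
        "mic_freq0": 1.1,    # frequency of card 0, GHz (assumed name)
    }


if __name__ == "__main__":
    sys.stdout.write(elim_line(read_metrics()) + "\n")
    sys.stdout.flush()  # the parent LIM reads the pipe, so flush each report
```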
8. Cluster Node
Platform HPC monitoring system
• Single agent for both resource monitoring
and resource management
• Based on 20 years of Platform technology
– Light weight and small footprint
– Scalable
– Robust
– Extendable
– Fully automated failover
• Added monitoring metrics shown in
Platform HPC web GUI automatically
• Added monitoring metrics can be used to
define alerts
LIM
Xeon Phi
ELIM
GPU ELIM
Other
ELIMs
Management Node
Master
LIM PERF:
Monitoring &
Reporting
Master
Scheduler
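As an illustration of defining alerts on added metrics, the sketch below compares one metric sample against per-metric thresholds. It is a generic example, not the Platform HPC alert mechanism; the metric names and limits are hypothetical.

```python
def check_alerts(sample, thresholds):
    """Return the sorted names of metrics whose value exceeds its threshold."""
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in sample and sample[name] > limit
    )


# Hypothetical thresholds and one sample from a node with a Xeon Phi card.
thresholds = {"mic_temp0": 85, "mic_power0": 250}
sample = {"mic_temp0": 91, "mic_power0": 180}

print(check_alerts(sample, thresholds))
```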
9. Mudpot: Intel® Xeon Phi™ Cluster is used for advanced
computing at the NCAR Wyoming Supercomputing Center
10. IBM Platform LSF Leveraged at NCAR to manage complex,
heterogeneous compute environment
• From user POV there is one
place to submit jobs, regardless
of resource
• Different queues depending on
job type (e.g. regular, bigmem,
gpgpu)
• Allows multistage jobs to run on
multiple resources
– Large model run on
Yellowstone
– Dependent Data-Analysis Run
on Geyser
• Sharing between projects
managed transparently
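A multistage job of this kind can be sketched with LSF job dependencies: `bsub -q` selects a queue, `-J` names the job, and `-w "done(name)"` holds a job until the named job finishes successfully. The helper `bsub_cmd`, the script names, and the choice of queues are hypothetical; the sketch only builds the command lines and does not submit anything.

```python
import shlex


def bsub_cmd(queue, job_name, command, depends_on=None):
    """Build a bsub argument list; add a dependency condition if requested."""
    args = ["bsub", "-q", queue, "-J", job_name]
    if depends_on:
        args += ["-w", f"done({depends_on})"]  # wait for successful completion
    args += shlex.split(command)
    return args


# Stage 1: large model run. Stage 2: data analysis that depends on stage 1.
model = bsub_cmd("regular", "model_run", "./run_model.sh")
analysis = bsub_cmd("bigmem", "analysis", "./analyze.sh", depends_on="model_run")

for cmd in (model, analysis):
    print(shlex.join(cmd))
```

From the user's point of view both stages go through the same submission command; only the queue and the dependency differ.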