IT Service Management (ITSM) Best Practices for Advanced Computing
e-Infrastructures for Science and Industry
1. 8th Int. Conference on Parallel Processing and Applied Mathematics
Wroclaw, Poland, Sep 13 – 16, 2009
e-Infrastructures
for Science and Industry
– Clusters, Grids, and Clouds –
Wolfgang Gentzsch, The DEISA Project and OGF
2. HPC Centers
• Service providers for the past 40 years
• For research, education, and industry
• Computing, storage, apps, data, services
• Very professional
• To end users, they look (almost) like Clouds
PPAM, Wroclaw, Sept 2009 Wolfgang Gentzsch 2
RI-222919
3. Grids
1998: The Grid: Blueprint for a
New Computing Infrastructure
Ian Foster, Carl Kesselman
2001: The Anatomy of the Grid
Ian Foster, Carl Kesselman, Steve Tuecke
4. Grids (Sun in 2001)
Departmental Grids → Enterprise Grids → Global Grids → Clouds
5. Public Clouds
• IaaS, PaaS, SaaS
• Access over the Internet
• Outside the corporate data center
• Elasticity
• Virtualization (VMware, Xen, ...)
• Abstraction of the hardware
• Public, private, hybrid
• Service oriented: SaaS, PaaS, IaaS, HaaS
• Variable cost of services (QoS)
• CapEx => OpEx, pay-per-use IT services
• Scaling up and down
Clouds: we have all the components available today.
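The CapEx => OpEx shift can be made concrete with a toy break-even calculation. All prices, the opex rate, and the utilization figure below are invented for illustration, not real vendor numbers:

```python
# Hypothetical comparison: owning a cluster (CapEx) vs. renting cloud
# capacity pay-per-use (OpEx). Every number here is an assumption.

def capex_cost(nodes, price_per_node, annual_opex_rate, years):
    """Total cost of ownership: purchase price plus yearly operations
    (power, cooling, admin) as a fraction of the purchase price."""
    purchase = nodes * price_per_node
    return purchase * (1 + annual_opex_rate * years)

def cloud_cost(node_hours, price_per_node_hour):
    """Pay-per-use: you pay only for the hours you actually compute."""
    return node_hours * price_per_node_hour

# A 32-node cluster over 3 years vs. renting the same capacity
# at 20% average utilization (assumed numbers).
own = capex_cost(nodes=32, price_per_node=5_000, annual_opex_rate=0.25, years=3)
utilization = 0.20
hours = 32 * 24 * 365 * 3 * utilization
rent = cloud_cost(node_hours=hours, price_per_node_hour=0.50)

print(f"own:  ${own:,.0f}")   # fixed cost, paid regardless of usage
print(f"rent: ${rent:,.0f}")  # scales with actual usage
```

At low utilization renting wins; as utilization approaches 100%, the owned cluster becomes the cheaper option, which is exactly the trade-off HPC centers face.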
6. Benefits of moving HPC to Grids
• Closer collaboration with your colleagues (VCs)
• More resources allow faster/more processing
• Different architectures serve more users
• Failover: move jobs to another system
. . . and Clouds
• No upfront cost for additional resources
• CapEx => OpEx, pay-per-use
• Elasticity, scaling up and down
• Hybrid solution (private and public cloud)
7. The Cloud of Cloud Companies
• Akamai
• Amazon
• Google
• Sun
• Salesforce
• Microsoft
• IBM
• Oracle
• EMC
• Cloudera
• Cloudsoft
• NICE/EnginFrame
• Areti Internet
• Enki
• Fortress ITX
• Joyent
• Layered Technologies
• Rackspace
• Terremark
• Xcalibre
• Manjrasoft / Aneka
• GridwiseTech / Momentum
8. NICE EnginFrame
Cluster/Grid/Cloud Portal
Remote, interactive, transparent, secure access to apps & data
on corporate Intranet or Internet, or in the Cloud.
[Diagram: users and administrators on Windows, Linux, Mac, and UX intranet clients reach batch and interactive applications, licenses, data-center clusters, and virtualized storage through the EnginFrame cloud portal/gateway, over standard protocols.]
9. A Scalable Data Cloud Infrastructure
Example: GridwiseTech Momentum
10. ANEKA Cloud Platform
SaaS: Cloud applications (social computing, enterprise, ISV, scientific, CDNs, ...)
PaaS: Aneka Cloud Platform
• Cloud programming models & SDK: Task, Thread, MapReduce, Workflow, and third-party models
• Core cloud services: SLA negotiation, QoS, pricing, billing, metering, job scheduling, execution management, admission control, monitoring, data storage
• VM management and deployment: Windows, Mac with Mono, Linux with Mono
IaaS: private cloud, Amazon, Microsoft, Google, Sun (LAN network, data center)
Courtesy: Manjrasoft
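The MapReduce model listed among Aneka's programming models can be sketched in a few lines of plain Python. This is not the Aneka API, just the map–shuffle–reduce pattern that such a platform distributes across cloud nodes:

```python
# A minimal, single-process sketch of the MapReduce programming model.
from collections import defaultdict

def map_phase(records, map_fn):
    # Apply the user's map function to every input record,
    # producing intermediate (key, value) pairs.
    pairs = []
    for record in records:
        pairs.extend(map_fn(record))
    return pairs

def shuffle(pairs):
    # Group intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reduce_fn):
    # Apply the user's reduce function to each key group.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Word count: the canonical MapReduce example.
def wc_map(doc):
    return [(word, 1) for word in doc.split()]

def wc_reduce(word, counts):
    return sum(counts)

docs = ["clouds and grids", "grids and clusters"]
result = reduce_phase(shuffle(map_phase(docs, wc_map)), wc_reduce)
print(result)  # {'clouds': 1, 'and': 2, 'grids': 2, 'clusters': 1}
```

In a real platform the map and reduce calls run on different machines and the shuffle moves data over the network; the user supplies only `wc_map` and `wc_reduce`.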
13. Animoto EC2 image usage
[Chart: Animoto's EC2 instance usage climbing from near zero to about 4,000 instances between Day 1 and Day 8.]
14. 'My' current project:
DEISA: Grid or Cloud ?
Distributed European Infrastructure for Supercomputing Applications
Ecosystem for HPC Grand-Challenge Applications
15. DEISA HPC Centers
DEISA1: May 1st, 2004 – April 30th, 2008
DEISA2: May 1st, 2008 – April 30th, 2011
16. DEISA UNICORE Infrastructure
[Diagram: every DEISA site (BSC, CINECA, CSC, ECMWF, FZJ, HLRS, HPCX, IDRIS, LRZ, RZG, SARA) runs a UNICORE Gateway and NJS with its IDB/UUDB, so a user job submitted at one gateway can be routed to any partner system: IDRIS IBM P6, FZJ IBM, HLRS NEC SX8 (Super-UX, NQS II), ECMWF IBM P5 (AIX, LL-MC), HPCX Cray XT4 and CSC Cray XT4/5 (UNICOS/lc, PBS Pro), LRZ SGI Altix (Linux, PBS Pro), CINECA IBM P5 (AIX, LL-MC), BSC IBM PPC (Linux, Maui/Slurm), RZG IBM (AIX, LL), SARA IBM.]
17. Categories of DEISA services
[Diagram: the three categories of DEISA services, Operations, Technologies, and Applications, are linked by request/offer relationships: requests for support, requests for development, offers of services, and offers of technology.]
18. DEISA Service Layers
• Presentation layer: multiple ways to access, common production environment, workflow management
• Job management and monitoring layer: single monitoring system, job rerouting, co-reservation and co-allocation
• Data management layer: data staging tools, data transfer tools, WAN shared file system
• Network and AAA layers: DEISA network connectivity across sites, unified AAA
19. DEISA Global File System
[Diagram: the sites' heterogeneous systems, including IBM P5/P5+/P6 and BlueGene/P (AIX/Linux, LL-MC), NEC SX8 (Super-UX, NQS II), Cray XT4/5 (UNICOS/lc, PBS Pro), SGI Altix (Linux, PBS Pro), and IBM PPC (Linux, Maui/Slurm), all mount one shared file system.]
Global transparent file system based on the Multi-Cluster General Parallel File System (MC-GPFS) of IBM
20. User management in DEISA
• A dedicated LDAP-based distributed repository
administers DEISA users
• Trusted LDAP servers are authorized to access each
other (based on X.509 certificates) and encrypted
communication is used to maintain confidentiality
[Diagram: replicated LDAP servers at BSC, CINECA, CSC, ECMWF, EPCC, FZJ, HLRS, IDRIS, LRZ, RZG, and SARA.]
21. DEISA: Grid or Cloud ?
• Built on top of proven, professional infrastructure of HPC
centers with expertise in implementation, operation, services.
• Ecosystem of resources, middleware, and applications that respects the
administrative, cultural, and political autonomy of partners.
• Globalizing existing HPC services - from local to global -
according to user requirements: revolution by evolution.
• User support: user-friendly access to resources, porting user
apps onto turnkey architecture.
• After EU funding, DEISA HPC ecosystem will operate in a
sustainable way, in the interest of the ‘global scientist’, as...
... almost an HPC Cloud !
22. There are still many
Challenges with Clouds
[Diagram: a sustainable competitive advantage with Clouds requires overcoming technical, cultural, and legal & regulatory challenges.]
23. Challenges, Potential Cloud Inhibitors
• Not all applications are cloud-ready or cloud-enabled
• Interoperability of clouds (standards ?)
• Sensitive data, sensitive applications (e.g. medical patient records)
• Different organizations have different ROI
• Security: end-to-end from your resources to the cloud !
• Current IT culture is not predisposed to sharing resources
• “Static” licensing model doesn’t embrace cloud
• Protection of intellectual property
• Legal issues (FDA, HIPAA)
24. A Cloud Checklist for HPC
When is your HPC app ready for the Cloud ?
... no issues with licenses, IP, secrecy, privacy, sensitive
data and big data movement, legal or regulatory issues,
trust, . . .
...your app is architecture-independent, not optimized for a
specific architecture (single process, loosely coupled, low-level
parallel, I/O-robust)
...it’s just one app and zillions of parameters
...latency and bandwidth are not an issue
Ideally, your meta-scheduler knows your
requirements and schedules automatically ☺
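The checklist above is mechanical enough that a meta-scheduler could encode it directly. A minimal sketch, with invented field names and an assumed data-volume threshold:

```python
# A toy encoding of the HPC-to-cloud checklist. The Job fields and the
# 100 GB transfer limit are illustrative assumptions, not a real API.
from dataclasses import dataclass

@dataclass
class Job:
    has_license_restrictions: bool
    handles_sensitive_data: bool
    architecture_independent: bool
    latency_sensitive: bool
    data_volume_gb: float

def cloud_ready(job, max_transfer_gb=100.0):
    """Return True if the job passes every checklist item."""
    if job.has_license_restrictions or job.handles_sensitive_data:
        return False                    # licensing / legal / privacy inhibitors
    if not job.architecture_independent:
        return False                    # tuned for one machine: keep it there
    if job.latency_sensitive:
        return False                    # tightly coupled: interconnect matters
    return job.data_volume_gb <= max_transfer_gb  # big data movement is costly

sweep = Job(False, False, True, False, 10.0)      # parameter study
mpi_app = Job(False, False, False, True, 500.0)   # tightly coupled solver
print(cloud_ready(sweep))    # True  -> dispatch to the cloud
print(cloud_ready(mpi_app))  # False -> keep on the supercomputer
```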
25. Hybrid Grid/Cloud
Resource Management
Define policies according to priorities, budget, and time.
[Diagram: users, teams, projects (A, C), Contractor X, and Departments 1–4 draw on department resources and campus-wide capacity, with external cloud resources absorbing peak demand.]
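One way such policies might look in code: a toy placement function that prefers internal resources, bursts higher-priority work to the external cloud while budget remains, and queues the rest. Priority levels, budgets, and prices are illustrative assumptions:

```python
# A sketch of policy-driven hybrid Grid/Cloud resource management.
# All thresholds and prices are invented for illustration.

def place_job(job_priority, internal_free_nodes, nodes_needed,
              cloud_budget, cloud_cost_per_node):
    """Return (placement, remaining_cloud_budget)."""
    if internal_free_nodes >= nodes_needed:
        return "internal", cloud_budget              # prefer owned resources
    cost = nodes_needed * cloud_cost_per_node
    if job_priority >= 2 and cloud_budget >= cost:
        return "external_cloud", cloud_budget - cost  # burst to the cloud
    return "queue", cloud_budget                     # wait for internal capacity

budget = 100.0
print(place_job(1, 16, 8, budget, 2.0))   # ('internal', 100.0)
print(place_job(3, 4, 8, budget, 2.0))    # ('external_cloud', 84.0)
print(place_job(1, 4, 8, budget, 2.0))    # ('queue', 100.0)
```

A real resource manager would layer time windows, per-department quotas, and fair-share on top, but the policy shape is the same: owned capacity first, paid elasticity second.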
26. – 29. [Benchmark figures from: Ed Walker, Benchmarking Amazon EC2 for high-performance scientific computing, ;login:, October 2008.]
30. A Closer Look at HPC Load
Single parallel job, cpu-intensive, tightly-coupled,
highly scalable, peta, exa,..
Single parallel job, cpu-intensive, weakly-scalable
Capacity computing, throughput, parameter jobs
Managing massive data sets, possibly
geographically distributed
Analysis and visualization of data sets
*) Similar to the analysis of T. Sterling and D. Stark, LSU, in HPCwire
31. Clouds and supercomputers:
Conventional wisdom?
[Chart, courtesy of Ian Foster: conventional wisdom holds that clouds/clusters are too slow for tightly coupled applications, while supercomputers are too expensive for loosely coupled applications.]
32. Loosely coupled problems
• Ensemble runs to quantify climate model uncertainty
• Identify potential drug targets by screening a database of ligand
structures against target proteins
• Study economic model sensitivity to parameters
• Analyze turbulence dataset from many perspectives
• Perform numerical optimization to determine optimal resource
assignment in energy problems
• Mine collection of data from advanced light sources
• Construct databases of computed properties of chemical compounds
• Analyze data from the Large Hadron Collider
• Analyze log data from 100,000-node parallel computations
☺ all can run in the cloud ☺
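What makes these problems cloud-friendly is that every task is independent. A local stand-in for the pattern, using a thread pool in place of a pool of cloud instances:

```python
# Loosely coupled work like the parameter studies above is trivially
# parallel: each parameter point is an independent task.
from concurrent.futures import ThreadPoolExecutor

def simulate(param):
    # Placeholder for one expensive, independent model run
    # (e.g. one climate-ensemble member or one ligand docking).
    return param, param ** 2

params = range(8)
with ThreadPoolExecutor(max_workers=4) as pool:  # think: 4 cloud instances
    results = dict(pool.map(simulate, params))
print(results[3])  # 9
```

Because no task talks to any other, adding instances speeds things up almost linearly, which is why such workloads sidestep the interconnect limitations of public clouds.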
33. Thank You
Dziękuję
gentzsch@rzg.mpg.de