SlideShare a Scribd company logo
1 of 20
Download to read offline
HPC Cloud
Calligo system
“Focus on your research”
Floris Sluiter
Project leader
SARA
SARA Project involvements
What is High Performance Cloud
Computing?
● Cloud Computing: Self Service
Dynamically Scalable Computing Facilities
– Cloud computing is not about new technology, it is
about new uses of technology
● “HPC Cloud” is a “Cloud built with high end
HPC hardware”
● Through virtualization:
● Better seperation between users, allowing
more administrative rights
● Very fast to set up custom environments,
saving application porting time
● Users can fully define and control a private
HPC cluster
Why an HPC Cloud?
● Christophe Blanchet, IDB - Infrastructure Distributing Biology:
● Big task to port them all to your favorite architecture
● Is true for many scientific fields
● HPC Cloud:
● Provides a different solution to “porting applications”
● Provides the possibility to scale up your Desktop environment to HPC
The SARA / BiG Grid
Cloud project
● The beginning
● Started in 2009 from a powerpoint presentation
● Pilot
● 3 months with 10 users on 48 core cluster
– Participation was a “price” in a competition
– Very succesfull
● Preparing for Operations
● 2010 BiG Grid grants funding for project
– 2010 – 2011 Product development
– 50 usergroups on 128 core cluster
● 2011 Big Grid grants funding for HPC hardware
● Now
● Presenting a production infrastructure
● HPC Cloud day 4th
of October (slides and video is online at www.sara.nl)
HPC Cloud Project
Publicity & dissemination
● In 2009 we were the first in the global Grid community to offer this
to end users
● We started international interest group and presented at many
events (ECEE, EGI, Sienna Cloud Scape, etc) and institutions
● We are in the top 5 search results for “HPC Cloud” and “HPC
Cloud Computing” at Google
● News articles on paper and in online publications (HPCCloud,
HPCWire, Computable, etc)
User participation
30 involved in Beta testing
nr. Title Core Hours Storage Objective Group/instiute
1 10-100GB / VM
2 2000 (+) 75-100GB Analyse 20 million Flickr Geocoded data points Uva, GPIO institute
3 Urban Flood Simulation 1500 1 GB UvA, Computational Science
4 testing 1GB / VM
5 8000 150GB
6
7 Department of Geography, UvA
8 1 TB Deltares
9 20TB
10 320 20GB
11 160 630GB Video Feature extraction
12 150-300GB Chris Klijn, NKI
Cloud computing for sequence
assembly
14 samples * 2 vms *
2-4 cores * 2 days =
5000
Run a set of prepared vm's for different and specific
sequence assembly tasks
Bacterial Genomics, CMBI
Nijmegen
Cloud computing for a multi-
method perspective study of
construction of (cyber)space and
place
asses cloud technology potential and efficiency on
ported Urban Flood simulation modules
A user friendly cloud-based
inverse modelling environment
Further develop a user-friendly desktop environment
running in the cloud supporting modelling, testing and
large scale running of model.
Computational Geo-ecology,
UvA
Real life HPC cloud computing
experiences for MicroArray
analyses
Test, development and acquire real life experiences
using vm's for microarray analysis
Microarray Department,
Integrative BioInformatics Unit,
UvA
Customized pipelines for the
processing of MRI brain data ?
up to 1TB of data ->
transferred out quickly.
Configure a customized virtual infrastructure for MRI
image processing pipelines
Biomedical Imaging Group,
Rotterdam, Erasmus MC
Cloud computing for historical
map collections: access and
georeferencing ?
7VM's of 500 GB = 3.5
TB
Set up distributed, decentralized autonomous
georeferencing data delivery system.
Parallellization of MT3DMS for
modeling contaminant transport at
large scale
64 cores, schaling
experiments / * 80
hours = 5000 hours
Goal, investigate massive parallell scaling for code
speed-up
An imputation pipeline on Grid
Gain
Estimate an execution time of existing bioinformatics
pipelines and, in particular, heavy imputation pipelines
on a new HPC cloud
Groningen Bioinformatics
Center, university of groningen
Regional Atmospheric Soaring
Prediction
Demonstrate how cloud computing eliminates porting
problems.
Computational Geo-ecology,
UvA
Extraction of Social Signals from
video
Pattern Recognition Laboratory,
TU DelftAnalysis of next generation
sequencing data from mouse
tumors ?
Run analysis pipeline to create mouse model for
genome analysis
User participation
30 involved in Beta testing
nr. Title Core Hours Storage Objective Group/instiute
1 10-100GB / VM
2 2000 (+) 75-100GB Analyse 20 million Flickr Geocoded data points Uva, GPIO institute
3 Urban Flood Simulation 1500 1 GB UvA, Computational Science
4 testing 1GB / VM
5 8000 150GB
6
7 Department of Geography, UvA
8 1 TB Deltares
9 20TB
10 320 20GB
11 160 630GB Video Feature extraction
12 150-300GB Chris Klijn, NKI
Cloud computing for sequence
assembly
14 samples * 2 vms *
2-4 cores * 2 days =
5000
Run a set of prepared vm's for different and specific
sequence assembly tasks
Bacterial Genomics, CMBI
Nijmegen
Cloud computing for a multi-
method perspective study of
construction of (cyber)space and
place
asses cloud technology potential and efficiency on
ported Urban Flood simulation modules
A user friendly cloud-based
inverse modelling environment
Further develop a user-friendly desktop environment
running in the cloud supporting modelling, testing and
large scale running of model.
Computational Geo-ecology,
UvA
Real life HPC cloud computing
experiences for MicroArray
analyses
Test, development and acquire real life experiences
using vm's for microarray analysis
Microarray Department,
Integrative BioInformatics Unit,
UvA
Customized pipelines for the
processing of MRI brain data ?
up to 1TB of data ->
transferred out quickly.
Configure a customized virtual infrastructure for MRI
image processing pipelines
Biomedical Imaging Group,
Rotterdam, Erasmus MC
Cloud computing for historical
map collections: access and
georeferencing ?
7VM's of 500 GB = 3.5
TB
Set up distributed, decentralized autonomous
georeferencing data delivery system.
Parallellization of MT3DMS for
modeling contaminant transport at
large scale
64 cores, schaling
experiments / * 80
hours = 5000 hours
Goal, investigate massive parallell scaling for code
speed-up
An imputation pipeline on Grid
Gain
Estimate an execution time of existing bioinformatics
pipelines and, in particular, heavy imputation pipelines
on a new HPC cloud
Groningen Bioinformatics
Center, university of groningen
Regional Atmospheric Soaring
Prediction
Demonstrate how cloud computing eliminates porting
problems.
Computational Geo-ecology,
UvA
Extraction of Social Signals from
video
Pattern Recognition Laboratory,
TU DelftAnalysis of next generation
sequencing data from mouse
tumors ?
Run analysis pipeline to create mouse model for
genome analysis
Field # projects
Bioinformatics 14
Ecology 4
Geography 3
Computer science 4
Alpha/Gamma 5
Virtual HPC Cluster
● A true HPC cluster with dedicated storage,
network and compute nodes
● User has full control
● Upload and run any image
● Custom environment examples
– A much bigger workstation and/or
many identical workstations
– Closed source software
● Different environments in same cluster
● Example: combine Linux with windows nodes
● High mem, disk, I/O, CPU and in any
combination
● Long running jobs
HPC Cloud Application types
● Applications with different
requirements can co-exist
on the same physical host
and in the same virtual
cluster
● Single node (remote
desktop on HPC node)
● Master with workers
(standard cluster)
● Token server
● Pipelines/workflows
● example:
MSWindows+Linux
● 24/7 Services that start
workers
HPC Type Examples Requirements
Compute
Intensive
Monte Carlo simulations
and parameter
optimizations, etc
CPU Cycles
Data intensive Signal/Image processing
in Astronomy, Remote
Sensing, Medical
Imaging, DNA matching,
Pattern matching, etc
I/O to data (SAN
File Servers)
Communication
intensive
Particle Physics, MPI, etc Fast
interconnect
network
Memory
intensive
DNA assembly, etc Large (Shared)
RAM
Continuous
services
Databases, webservers,
webservices
Dynamically
scalable
HPC Cloud Architecture
Calligo “I make clouds”
1 Node:
● CPU Intel 2.13 GHz 32 cores (Xeon-E7
"Westmere-EX")
● RAM 256 Gbyte
● "Local disk" 10 Tbyte
● Ethernet 4*10GE
Total System (19 nodes)
● 608 cores
● RAM 4,75TB
● 144 ports 10GE, 1-hop, non-blocking
interconnect (fabric: 2.5 Terabit/s full-
duplex,4.5usec port to port latency
● 400TB shared storage (ISCSI,NFS,CIFS,WebDAV...)
● 11.5K specints / 5TFlops
Platform and tools:
Redmine collaboration portal
Custom GUI (Open Source)
Open Nebula + custom add-ons
KVM + Libvirt
● Novel architecture
● Low latecy 10GE switch chassis at the heart, with mountable storage
● Per node High mem, high core, 4* network (but smaller partitions possible)
● Fast and dynamic changes in configurations and partitions
● Security and safety through strict separation of users + monitoring network
● Invested in Expandability
● Network switch and storage controllers can be expanded to support more compute nodes
● Future architecture focus might change, depending on usage patterns
● Virtualization overhead <10%
● Very flexible for any application. “Just great” is often good enough
Cloud = Expensive???
● Please Note:
● The hardware is what makes it
“High performance computing”
– We choose this type of hardware to fill a specific niche in
our offerings
● The Software stack is what makes it “Cloud”
● You can do this with any type of hardware (but be
careful with Infiniband)!!
● Actually we give much less user support
– expensive hardware but cheaper in personnel
HPC Cloud Architecture vs
other HPC Architectures
System Node Characteristics
Huygens National Super CPU Power6, 4.7Ghz, 32/64 cores
RAM 128/256 GB
"local disk" 8Tbyte
Infiniband 8*20 Gbit/s
Network, storage, and compute
are fully balanced resulting in a
highly optimized system for
I/O- intensive, parallel
applications
LISA National Compute Cluster CPU Intel 2.26Ghz 8/12 cores
RAM 24 Gbyte
Local disk 65Gbyte/200Gbyte
Infiniband 20Gbit/s
Network, storage, and compute
are optimized for efficient
parallel applications
Grid CPU Intel 2.2 Ghz 8 cores
RAM 24 Gbyte
Local disk 300Gbyte
Ethernet 1 Gbit/s
Network, storage, and compute
are optimized for high
throughput of data and single
core data processing
Cloud CPU Intel 2.13 GHz 32 cores
(Xeon-E7 "Westmere-EX")
RAM 256 Gbyte
"Local disk" 10 Tbyte
Ethernet 4*10GE
Network, storage, and compute
are dynamically scalable and
configurable for both large and
small workloads
Cloud security & trust
Chief principle: !!! CONTAINMENT !!!
● Protect
– the outside from the cloud users
– the cloud users from the outside
– the cloud users from each other
– Not possible to protect the cloud user from himself because user has full access/control/responsibility
● ex. virus research must be possible
● Keep each VM inside its own sandbox, completely separated from others and its host
● Private Vlan per user
– Access to outside only through proxy
– Access from outside only through VNC on proxy (port forwarding)
● Public ip is allowed (both incoming and outgoing), but Source & Target (range) need to be
specified
– Self service & Dynamically: rules are autogranted
– Traffic is monitored
– Open Ports are monitored
● Specialized hardware helps, but not necessary
– QinQ, Bridges, IP tables
Calligo Network
● 1 entry point from
administration network
● No connection from
admin to user space
● Almost full redundancy
● Switch fabric is
impressive: 3.75 Terabit
full-duplex with 4.5usec
port to port latency
● Dedicated lightpaths to
various systems in the
Netherlands
Cloud Security
“Public” side
SRIOV SRIOV SRIOV SRIOV
Arista
VLANs / QinQ
QOS/Throttle
Firewall
VM 2 VMVM 1 VM 6VM 3VM 5VM 4
PCI
PCI
PCI
PCI
PCI
PCI
PCI400 Tbyte DDN Storage
NFS/CIFS/ISCSI
TricorderProxy
Cloud Security
(VLANS + monitoring)
SRIOV SRIOV SRIOV SRIOV
Arista
VLANs / QinQ
QOS/Throttle
Firewall
VM 2 VMVM 1 VM 6VM 3VM 5VM 4
PCI
PCI
PCI
PCI
PCI
PCI
PCI400 Tbyte DDN Storage
NFS/CIFS/ISCSI
TricorderProxy
GUI + Collaboration Portal
For current and future users
● Most current users only asked a few support
questions when they start to use it, it is easy to do
support on this system ;-)
● In the Pilot and Beta phase, the hardware was not
ideal for this type of system and applications. The
new hardware is/will be much bigger and better!
● On the HPC Cloud: Do not change your HPC
applications for the system, but rather adapt the
environment they run in
● The result will be: a shorter time to a solution
Thank you!
Questions?
www.cloud.sara.nl
Photo's: http://cloudappreciationsociety.org/

More Related Content

What's hot

Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudRyousei Takano
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland mictc
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale SupercomputerSagar Dolas
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring Tim Bell
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)Ryousei Takano
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraRyousei Takano
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASAIan Foster
 
Toward a National Research Platform
Toward a National Research PlatformToward a National Research Platform
Toward a National Research PlatformLarry Smarr
 
UberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for BeginnersUberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for Beginnershpcexperiment
 
20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating Science20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating ScienceTim Bell
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015Karel Ha
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresRafael Ferreira da Silva
 
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)Jose Antonio Coarasa Perez
 
Virtual Network Functions as Real-Time Containers in Private Clouds
Virtual Network Functions as Real-Time Containers in Private CloudsVirtual Network Functions as Real-Time Containers in Private Clouds
Virtual Network Functions as Real-Time Containers in Private Cloudstcucinotta
 
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...OW2
 
Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...tcucinotta
 

What's hot (20)

Exploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC CloudExploring the Performance Impact of Virtualization on an HPC Cloud
Exploring the Performance Impact of Virtualization on an HPC Cloud
 
Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland Cloud Roundtable at Microsoft Switzerland
Cloud Roundtable at Microsoft Switzerland
 
Programmable Exascale Supercomputer
Programmable Exascale SupercomputerProgrammable Exascale Supercomputer
Programmable Exascale Supercomputer
 
Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3Gupta_Keynote_VTDC-3
Gupta_Keynote_VTDC-3
 
CERN IT Monitoring
CERN IT Monitoring CERN IT Monitoring
CERN IT Monitoring
 
USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)USENIX NSDI 2016 (Session: Resource Sharing)
USENIX NSDI 2016 (Session: Resource Sharing)
 
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore EraFlow-centric Computing - A Datacenter Architecture in the Post Moore Era
Flow-centric Computing - A Datacenter Architecture in the Post Moore Era
 
Big Process for Big Data @ NASA
Big Process for Big Data @ NASABig Process for Big Data @ NASA
Big Process for Big Data @ NASA
 
Toward a National Research Platform
Toward a National Research PlatformToward a National Research Platform
Toward a National Research Platform
 
UberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for BeginnersUberCloud HPC Experiment Introduction for Beginners
UberCloud HPC Experiment Introduction for Beginners
 
20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating Science20121017 OpenStack CERN Accelerating Science
20121017 OpenStack CERN Accelerating Science
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
 
Helix Nebula - The Science Cloud, Status Update
Helix Nebula - The Science Cloud, Status UpdateHelix Nebula - The Science Cloud, Status Update
Helix Nebula - The Science Cloud, Status Update
 
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud InfrastructuresExperiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
Experiments with Complex Scientific Applications on Hybrid Cloud Infrastructures
 
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
The CMS openstack, opportunistic, overlay, online-cluster Cloud (CMSooooCloud)
 
Federated HPC Clouds applied to Radiation Therapy
Federated HPC Clouds applied to Radiation TherapyFederated HPC Clouds applied to Radiation Therapy
Federated HPC Clouds applied to Radiation Therapy
 
Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better
Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, BetterMachine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better
Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better
 
Virtual Network Functions as Real-Time Containers in Private Clouds
Virtual Network Functions as Real-Time Containers in Private CloudsVirtual Network Functions as Real-Time Containers in Private Clouds
Virtual Network Functions as Real-Time Containers in Private Clouds
 
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
Using Community Clouds for Load Testing- the ProActive CLIF solution, OW2con'...
 
Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...Modeling and simulation of power consumption and execution times for real-tim...
Modeling and simulation of power consumption and execution times for real-tim...
 

Similar to HPC Cloud Calligo system

The OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDThe OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDLarry Smarr
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumbergerinside-BigData.com
 
Security Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research PlatformSecurity Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research PlatformLarry Smarr
 
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...Amazon Web Services
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIRyousei Takano
 
OptIPuter Overview
OptIPuter OverviewOptIPuter Overview
OptIPuter OverviewLarry Smarr
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task ComputingEric Van Hensbergen
 
EGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebula
EGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebulaEGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebula
EGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebulaOpenNebula Project
 
Bringing Private Cloud computing to HPC and Science - EGI TF tf 2013
Bringing Private Cloud computing to HPC and Science -  EGI TF tf 2013Bringing Private Cloud computing to HPC and Science -  EGI TF tf 2013
Bringing Private Cloud computing to HPC and Science - EGI TF tf 2013Ignacio M. Llorente
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsGanesan Narayanasamy
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Productioniguazio
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...OpenStack
 
Supporting bioinformatics applications with hybrid multi-cloud services
Supporting bioinformatics applications with hybrid multi-cloud servicesSupporting bioinformatics applications with hybrid multi-cloud services
Supporting bioinformatics applications with hybrid multi-cloud servicesAhmed Abdullah
 
Distributed and Cloud Computing 1st Edition Hwang Solutions Manual
Distributed and Cloud Computing 1st Edition Hwang Solutions ManualDistributed and Cloud Computing 1st Edition Hwang Solutions Manual
Distributed and Cloud Computing 1st Edition Hwang Solutions Manualkyxeminut
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrationsinside-BigData.com
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 

Similar to HPC Cloud Calligo system (20)

The OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XDThe OptIPuter as a Prototype for CalREN-XD
The OptIPuter as a Prototype for CalREN-XD
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
Security Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research PlatformSecurity Challenges and the Pacific Research Platform
Security Challenges and the Pacific Research Platform
 
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
 
Opportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCIOpportunities of ML-based data analytics in ABCI
Opportunities of ML-based data analytics in ABCI
 
OptIPuter Overview
OptIPuter OverviewOptIPuter Overview
OptIPuter Overview
 
Systems Support for Many Task Computing
Systems Support for Many Task ComputingSystems Support for Many Task Computing
Systems Support for Many Task Computing
 
EGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebula
EGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebulaEGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebula
EGITF 2013 - Bringing Private Cloud Computing to HPC and Science with OpenNebula
 
Bringing Private Cloud computing to HPC and Science - EGI TF tf 2013
Bringing Private Cloud computing to HPC and Science -  EGI TF tf 2013Bringing Private Cloud computing to HPC and Science -  EGI TF tf 2013
Bringing Private Cloud computing to HPC and Science - EGI TF tf 2013
 
Grid computing & its applications
Grid computing & its applicationsGrid computing & its applications
Grid computing & its applications
 
Scallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systemsScallable Distributed Deep Learning on OpenPOWER systems
Scallable Distributed Deep Learning on OpenPOWER systems
 
Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?Cloud, Fog, or Edge: Where and When to Compute?
Cloud, Fog, or Edge: Where and When to Compute?
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
 
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
The Why and How of HPC-Cloud Hybrids with OpenStack - Lev Lafayette, Universi...
 
01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf01-06 OCRE Test Suite - Fernandes.pdf
01-06 OCRE Test Suite - Fernandes.pdf
 
Supporting bioinformatics applications with hybrid multi-cloud services
Supporting bioinformatics applications with hybrid multi-cloud servicesSupporting bioinformatics applications with hybrid multi-cloud services
Supporting bioinformatics applications with hybrid multi-cloud services
 
Distributed and Cloud Computing 1st Edition Hwang Solutions Manual
Distributed and Cloud Computing 1st Edition Hwang Solutions ManualDistributed and Cloud Computing 1st Edition Hwang Solutions Manual
Distributed and Cloud Computing 1st Edition Hwang Solutions Manual
 
Applying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System IntegrationsApplying Cloud Techniques to Address Complexity in HPC System Integrations
Applying Cloud Techniques to Address Complexity in HPC System Integrations
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

HPC Cloud Calligo system

  • 1. HPC Cloud Calligo system “Focus on your research” Floris Sluiter Project leader SARA
  • 3. What is High Performance Cloud Computing? ● Cloud Computing: Self Service Dynamically Scalable Computing Facilities – Cloud computing is not about new technology, it is about new uses of technology ● “HPC Cloud” is a “Cloud built with high end HPC hardware” ● Through virtualization: ● Better seperation between users, allowing more administrative rights ● Very fast to set up custom environments, saving application porting time ● Users can fully define and control a private HPC cluster
  • 4. Why an HPC Cloud? ● Christophe Blanchet, IDB - Infrastructure Distributing Biology: ● Big task to port them all to your favorite architecture ● Is true for many scientific fields ● HPC Cloud: ● Provides a different solution to “porting applications” ● Provides the possibility to scale up your Desktop environment to HPC
  • 5. The SARA / BiG Grid Cloud project ● The beginning ● Started in 2009 from a powerpoint presentation ● Pilot ● 3 months with 10 users on 48 core cluster – Participation was a “price” in a competition – Very succesfull ● Preparing for Operations ● 2010 BiG Grid grants funding for project – 2010 – 2011 Product development – 50 usergroups on 128 core cluster ● 2011 Big Grid grants funding for HPC hardware ● Now ● Presenting a production infrastructure ● HPC Cloud day 4th of October (slides and video is online at www.sara.nl)
  • 6. HPC Cloud Project Publicity & dissemination ● In 2009 we were the first in the global Grid community to offer this to end users ● We started international interest group and presented at many events (ECEE, EGI, Sienna Cloud Scape, etc) and institutions ● We are in the top 5 search results for “HPC Cloud” and “HPC Cloud Computing” at Google ● News articles on paper and in online publications (HPCCloud, HPCWire, Computable, etc)
  • 7. User participation 30 involved in Beta testing nr. Title Core Hours Storage Objective Group/instiute 1 10-100GB / VM 2 2000 (+) 75-100GB Analyse 20 million Flickr Geocoded data points Uva, GPIO institute 3 Urban Flood Simulation 1500 1 GB UvA, Computational Science 4 testing 1GB / VM 5 8000 150GB 6 7 Department of Geography, UvA 8 1 TB Deltares 9 20TB 10 320 20GB 11 160 630GB Video Feature extraction 12 150-300GB Chris Klijn, NKI Cloud computing for sequence assembly 14 samples * 2 vms * 2-4 cores * 2 days = 5000 Run a set of prepared vm's for different and specific sequence assembly tasks Bacterial Genomics, CMBI Nijmegen Cloud computing for a multi- method perspective study of construction of (cyber)space and place asses cloud technology potential and efficiency on ported Urban Flood simulation modules A user friendly cloud-based inverse modelling environment Further develop a user-friendly desktop environment running in the cloud supporting modelling, testing and large scale running of model. Computational Geo-ecology, UvA Real life HPC cloud computing experiences for MicroArray analyses Test, development and acquire real life experiences using vm's for microarray analysis Microarray Department, Integrative BioInformatics Unit, UvA Customized pipelines for the processing of MRI brain data ? up to 1TB of data -> transferred out quickly. Configure a customized virtual infrastructure for MRI image processing pipelines Biomedical Imaging Group, Rotterdam, Erasmus MC Cloud computing for historical map collections: access and georeferencing ? 7VM's of 500 GB = 3.5 TB Set up distributed, decentralized autonomous georeferencing data delivery system. Parallellization of MT3DMS for modeling contaminant transport at large scale 64 cores, schaling experiments / * 80 hours = 5000 hours Goal, investigate massive parallell scaling for code speed-up An imputation pipeline on Grid Gain Estimate an execution time of existing bioinformatics pipelines and, in particular, heavy imputation pipelines on a new HPC cloud Groningen Bioinformatics Center, university of groningen Regional Atmospheric Soaring Prediction Demonstrate how cloud computing eliminates porting problems. Computational Geo-ecology, UvA Extraction of Social Signals from video Pattern Recognition Laboratory, TU DelftAnalysis of next generation sequencing data from mouse tumors ? Run analysis pipeline to create mouse model for genome analysis
  • 8. User participation 30 involved in Beta testing nr. Title Core Hours Storage Objective Group/instiute 1 10-100GB / VM 2 2000 (+) 75-100GB Analyse 20 million Flickr Geocoded data points Uva, GPIO institute 3 Urban Flood Simulation 1500 1 GB UvA, Computational Science 4 testing 1GB / VM 5 8000 150GB 6 7 Department of Geography, UvA 8 1 TB Deltares 9 20TB 10 320 20GB 11 160 630GB Video Feature extraction 12 150-300GB Chris Klijn, NKI Cloud computing for sequence assembly 14 samples * 2 vms * 2-4 cores * 2 days = 5000 Run a set of prepared vm's for different and specific sequence assembly tasks Bacterial Genomics, CMBI Nijmegen Cloud computing for a multi- method perspective study of construction of (cyber)space and place asses cloud technology potential and efficiency on ported Urban Flood simulation modules A user friendly cloud-based inverse modelling environment Further develop a user-friendly desktop environment running in the cloud supporting modelling, testing and large scale running of model. Computational Geo-ecology, UvA Real life HPC cloud computing experiences for MicroArray analyses Test, development and acquire real life experiences using vm's for microarray analysis Microarray Department, Integrative BioInformatics Unit, UvA Customized pipelines for the processing of MRI brain data ? up to 1TB of data -> transferred out quickly. Configure a customized virtual infrastructure for MRI image processing pipelines Biomedical Imaging Group, Rotterdam, Erasmus MC Cloud computing for historical map collections: access and georeferencing ? 7VM's of 500 GB = 3.5 TB Set up distributed, decentralized autonomous georeferencing data delivery system. Parallellization of MT3DMS for modeling contaminant transport at large scale 64 cores, schaling experiments / * 80 hours = 5000 hours Goal, investigate massive parallell scaling for code speed-up An imputation pipeline on Grid Gain Estimate an execution time of existing bioinformatics pipelines and, in particular, heavy imputation pipelines on a new HPC cloud Groningen Bioinformatics Center, university of groningen Regional Atmospheric Soaring Prediction Demonstrate how cloud computing eliminates porting problems. Computational Geo-ecology, UvA Extraction of Social Signals from video Pattern Recognition Laboratory, TU DelftAnalysis of next generation sequencing data from mouse tumors ? Run analysis pipeline to create mouse model for genome analysis Field # projects Bioinformatics 14 Ecology 4 Geography 3 Computer science 4 Alpha/Gamma 5
  • 9. Virtual HPC Cluster ● A true HPC cluster with dedicated storage, network and compute nodes ● User has full control ● Upload and run any image ● Custom environment examples – A much bigger workstation and/or many identical workstations – Closed source software ● Different environments in same cluster ● Example: combine Linux with windows nodes ● High mem, disk, I/O, CPU and in any combination ● Long running jobs
  • 10. HPC Cloud Application types ● Applications with different requirements can co-exist on the same physical host and in the same virtual cluster ● Single node (remote desktop on HPC node) ● Master with workers (standard cluster) ● Token server ● Pipelines/workflows ● example: MSWindows+Linux ● 24/7 Services that start workers HPC Type Examples Requirements Compute Intensive Monte Carlo simulations and parameter optimizations, etc CPU Cycles Data intensive Signal/Image processing in Astronomy, Remote Sensing, Medical Imaging, DNA matching, Pattern matching, etc I/O to data (SAN File Servers) Communication intensive Particle Physics, MPI, etc Fast interconnect network Memory intensive DNA assembly, etc Large (Shared) RAM Continuous services Databases, webservers, webservices Dynamically scalable
  • 11. HPC Cloud Architecture Calligo “I make clouds” 1 Node: ● CPU Intel 2.13 GHz 32 cores (Xeon-E7 "Westmere-EX") ● RAM 256 Gbyte ● "Local disk" 10 Tbyte ● Ethernet 4*10GE Total System (19 nodes) ● 608 cores ● RAM 4,75TB ● 144 ports 10GE, 1-hop, non-blocking interconnect (fabric: 2.5 Terabit/s full- duplex,4.5usec port to port latency ● 400TB shared storage (ISCSI,NFS,CIFS,WebDAV...) ● 11.5K specints / 5TFlops Platform and tools: Redmine collaboration portal Custom GUI (Open Source) Open Nebula + custom add-ons KVM + Libvirt ● Novel architecture ● Low latecy 10GE switch chassis at the heart, with mountable storage ● Per node High mem, high core, 4* network (but smaller partitions possible) ● Fast and dynamic changes in configurations and partitions ● Security and safety through strict separation of users + monitoring network ● Invested in Expandability ● Network switch and storage controllers can be expanded to support more compute nodes ● Future architecture focus might change, depending on usage patterns ● Virtualization overhead <10% ● Very flexible for any application. “Just great” is often good enough
  • 12. Cloud = Expensive??? ● Please Note: ● The hardware is what makes it “High performance computing” – We choose this type of hardware to fill a specific niche in our offerings ● The Software stack is what makes it “Cloud” ● You can do this with any type of hardware (but be careful with Infiniband)!! ● Actually we give much less user support – expensive hardware but cheaper in personnel
  • 13. HPC Cloud Architecture vs other HPC Architectures System Node Characteristics Huygens National Super CPU Power6, 4.7Ghz, 32/64 cores RAM 128/256 GB "local disk" 8Tbyte Infiniband 8*20 Gbit/s Network, storage, and compute are fully balanced resulting in a highly optimized system for I/O- intensive, parallel applications LISA National Compute Cluster CPU Intel 2.26Ghz 8/12 cores RAM 24 Gbyte Local disk 65Gbyte/200Gbyte Infiniband 20Gbit/s Network, storage, and compute are optimized for efficient parallel applications Grid CPU Intel 2.2 Ghz 8 cores RAM 24 Gbyte Local disk 300Gbyte Ethernet 1 Gbit/s Network, storage, and compute are optimized for high throughput of data and single core data processing Cloud CPU Intel 2.13 GHz 32 cores (Xeon-E7 "Westmere-EX") RAM 256 Gbyte "Local disk" 10 Tbyte Ethernet 4*10GE Network, storage, and compute are dynamically scalable and configurable for both large and small workloads
  • 14. Cloud security & trust Chief principle: !!! CONTAINMENT !!! ● Protect – the outside from the cloud users – the cloud users from the outside – the cloud users from each other – Not possible to protect the cloud user from himself because user has full access/control/responsibility ● ex. virus research must be possible ● Keep each VM inside its own sandbox, completely separated from others and its host ● Private Vlan per user – Access to outside only through proxy – Access from outside only through VNC on proxy (port forwarding) ● Public ip is allowed (both incoming and outgoing), but Source & Target (range) need to be specified – Self service & Dynamically: rules are autogranted – Traffic is monitored – Open Ports are monitored ● Specialized hardware helps, but not necessary – QinQ, Bridges, IP tables
  • 15. Calligo Network ● 1 entry point from administration network ● No connection from admin to user space ● Almost full redundancy ● Switch fabric is impressive: 3.75 Terabit full-duplex with 4.5usec port to port latency ● Dedicated lightpaths to various systems in the Netherlands
  • 16. Cloud Security “Public” side SRIOV SRIOV SRIOV SRIOV Arista VLANs / QinQ QOS/Throttle Firewall VM 2 VMVM 1 VM 6VM 3VM 5VM 4 PCI PCI PCI PCI PCI PCI PCI400 Tbyte DDN Storage NFS/CIFS/ISCSI TricorderProxy
  • 17. Cloud Security (VLANS + monitoring) SRIOV SRIOV SRIOV SRIOV Arista VLANs / QinQ QOS/Throttle Firewall VM 2 VMVM 1 VM 6VM 3VM 5VM 4 PCI PCI PCI PCI PCI PCI PCI400 Tbyte DDN Storage NFS/CIFS/ISCSI TricorderProxy
  • 19. For current and future users ● Most current users only asked a few support questions when they start to use it, it is easy to do support on this system ;-) ● In the Pilot and Beta phase, the hardware was not ideal for this type of system and applications. The new hardware is/will be much bigger and better! ● On the HPC Cloud: Do not change your HPC applications for the system, but rather adapt the environment they run in ● The result will be: a shorter time to a solution