2. Why this Experiment?
Access to remote computing resources
Focus on: simulation applications in digital manufacturing and High Performance Technical Computing (HPTC)
Focus on: remote resources in HPTC Centers & in HPTC Clouds
Observation: while business clouds are becoming widely used, the acceptance of simulation clouds in industry is still in an early adopter stage (CAE, Bio, Finance, Oil & Gas, DCC)
Reason: big difference between business and simulation clouds
Barriers: complexity, IP, data transfer, software licenses, parallel communications, specific system requirements, data security, interoperability, cost, etc.
3. Our Goal for the Experiment
Current Experiment: August – October 2012
Form a community around the benefits of HPTC in the cloud
Hands-on exploring and understanding the challenges of digital manufacturing in the Cloud
Study each end-to-end process and find ways to overcome these challenges
Document our findings
4. Participants – Some of our Providers
Some of our Resource Providers want to be anonymous
Media Sponsor
5. Participants – Some of our Software Providers
Some of our ISVs want to be anonymous
6. Participants – Some of our HPC Experts
Some of our HPC Experts want to be anonymous
7. Participants – Many of our industry End-Users
Some of our End-Users want to be anonymous
8. Where are we with the experiment?
We currently have over 170 participating organizations and individuals
The experiment reaches every corner of the globe; participants come from 22 countries
Participants sign up through www.hpcexperiment.com and www.cfrdexperiment.com
25 teams have been formed and are active
9. Participants by geography (% of site traffic)
US 36%, Germany 12%, Italy 6%, Australia 6%, Spain 5%, UK 5%, Russia 3%, France 3%, Other 24%
10. Teams, it's all about teams
Anchor Bolt, Cement Flow, Wind Turbines, Resonance, Sprinkler, Combustion, Radiofrequency, Space Capsule, Blood Flow, Supersonic Car, Acoustics, ChinaCFD, Liquid-Gas, Dosimetry, Gas Bubbles, Wing-Flow, Weathermen, Side Impact, Ship-Hull, ColombiaBio
11. Building the teams
An end-user joins the experiment
Organizers (Burak, Wolfgang) identify the perfect team expert
Organizers contact the ISV and ask them to join the experiment
End-user and team expert analyze the resource requirements and send them to the organizers
Organizers suggest one or two computational resource providers
After all four team members agree on the right resource provider, the team is ready to go
12. Bumps on the road – the top 4
Delays because of vacation time in August and other (internal and customer) projects on our participants' side
Getting HPC participants was quite easy; getting CAE participants was a challenge
Participants can spend only a small portion of their time on the experiment
Learning the access and usage processes of our software and compute resource providers can take many days
Process automation capabilities of providers vary greatly: some have focused on enrollment and registration automation, while others haven't
Experiment organizers' lack of automation: currently the whole end-to-end process is manual (intentionally)
Getting regular updates from Team Experts is a challenge because this is not their day job
Consider: the sample size is still small
13. Are we discovering hurdles?
Reaching end-users who are ready and willing to engage in HPTC, and especially HPTC in the Cloud
About half of our participants want to remain anonymous, for different reasons (failure, policies, internal processes, …)
HPC is complex; at times it requires multiple experts
Matching end-user projects with the appropriate resource providers is tricky and critical to a team's success
Resource providers (e.g. HPC Centers) often face internal policy and legal hurdles
Sometimes, the 1,000 CPU-core hours are a limit
14. Let's hear from Team Experts!
Chris Dagdigian, Co-founder and Principal Consultant, BioTeam Inc.
Ingo Seipp, Science + Computing
15. Team 2 Short Status Report
HPC Expert:
End User: Anonymous
16. Team 2 Overview
OUR END USER
• Individual & organization has requested anonymity
• Goal: hybrid model in which local and cloud resources are leveraged simultaneously
• We can say this:
  – It's a medical device
  – Simulating a new probe design for a particular device
  – Tests involve running at a simulation size & resolution that cannot be performed internally*
  – *Using fake data at this time
OUR HPC SOFTWARE
• CST Studio Suite "Electromagnetic Simulation" – www.cst.com
• Diverse Architecture Options:
  1. Local Windows Workstation
  2. CST Distributed Computing
  3. CST MPI
• OS Diversity: various combinations of Windows and Linux based machines
18. First Design Failed Miserably
The Good
• Looked pretty on paper!
• Total isolation of systems via Amazon VPC and custom subnets
• VPC allows for "elastic" NIC devices w/ persistent MAC addresses
  – Awesome for license servers
• VPN server allowed end-user remote resources to directly join our cloud environment
The Bad
• Can't launch GPU nodes from inside a VPC
  … so we ran them on "regular" EC2
  … and this did not work well
• NAT translation between EC2 and VPC private IP addresses wreaked havoc with the CST Distributed Computing Master
20. Second Design – Good So Far
The Good
• It works; we are running tasks across multiple GPU solver nodes right now
• Security surprisingly good despite losing VPC isolation
  – EC2 Security Groups block us from the rest of the Internet & AWS (see the sketch below)
• CST license server now running on a much cheaper Linux instance
• Clear path to elasticity and high-scale simulation runs
The Bad
• Lost the persistent MAC address when we left the VPC; need to treat our license server very carefully
• Unclear today how we will attempt to incorporate remote solver & workstation resources at the end-user site
• We know from attempt #1 that CST and NAT translation don't work well together …
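The security-group arrangement mentioned above can be illustrated with a short sketch. This is an assumption-laden reconstruction rather than the team's actual setup: it uses today's boto3 Python SDK (which did not exist in 2012) and a modern VPC-based account, whereas the team ran GPU nodes on classic EC2; the group name, VPC id, and end-user address range are hypothetical.

```python
# Illustrative sketch only: one self-referencing security group shared by all
# solver and license-server instances, so they can reach each other freely
# while everything else stays blocked (security groups deny by default).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

sg = ec2.create_security_group(
    GroupName="team2-solvers",                      # hypothetical name
    Description="Team 2: solvers + license server",
    VpcId="vpc-0123456789abcdef0",                  # hypothetical VPC id
)
sg_id = sg["GroupId"]

# Allow all traffic, but only between members of this same group.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "-1",
        "UserIdGroupPairs": [{"GroupId": sg_id}],
    }],
)

# Admit the end user's own address range for remote access, e.g. SSH.
ec2.authorize_security_group_ingress(
    GroupId=sg_id,
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
        "IpRanges": [{"CidrIp": "203.0.113.0/24"}],  # hypothetical end-user range
    }],
)
```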
21. Next Steps
Run at large scale
Refine our architecture
We might move the License Server back into a VPC in order to leverage the significant benefit of persistent MAC addresses and elastic NIC devices (a minimal sketch of this option follows below)
Figure out how to bring in the remote resources sitting at the end-user site
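A minimal sketch of the "license server back into a VPC" option: creating a standalone elastic network interface (ENI) so the instance behind the CST license keeps a stable MAC address even if the instance itself is stopped or replaced. Again this is illustrative only, written against today's boto3 SDK; the subnet id, security group, AMI, and instance type are hypothetical placeholders, not what the team actually used.

```python
# Illustrative sketch (not the team's scripts): pinning a license server's
# MAC address by launching it on a pre-created ENI inside a VPC.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Create a standalone ENI in the VPC subnet. Its MAC address is fixed for
#    the lifetime of the ENI, independent of any instance attached to it,
#    which is what MAC-locked license files need.
eni = ec2.create_network_interface(
    SubnetId="subnet-0123456789abcdef0",            # hypothetical subnet
    Description="CST license server NIC (persistent MAC)",
    Groups=["sg-0123456789abcdef0"],                # hypothetical security group
)["NetworkInterface"]
print("License MAC:", eni["MacAddress"])

# 2. Launch a small Linux instance with that ENI as its primary interface.
#    A replacement instance can later reuse the same ENI, and therefore the
#    same MAC address and license file.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",                # hypothetical Linux AMI
    InstanceType="t3.small",                        # any small instance works
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "NetworkInterfaceId": eni["NetworkInterfaceId"],
    }],
)
```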
30. Team 14, Short Status Report
Simulation of electromagnetic radiation and dosimetry in humans in cars induced by mobile phone technology
Ingo Seipp and Team, Science + Computing
35. Announcing: Uber-Cloud Experiment Round 2
Why 'Uber-Cloud': HPC/CAE/BIO in the Cloud is only one part of the Experiment; in addition we provide access to HPC Centers and other resources.
Round 1 is a proof of concept => YES, remote access to HPC resources works, and there is real interest and need!
Round 2 will be more professional => more tools instead of hands-on work, more teams, more applications beyond CAE, a list of professional services, measuring the effort, how much it would cost, etc.
Existing Round 1 Teams are encouraged in Round 2 to use other resources, or they can participate in forming new Teams.
Oct 15: Call for Participation; Nov 15: Start of Experiment Round 2
36. What is next?
07/24/2012  Publish updated kick-off document
07/24/2012  Request for detailed participant profiles
08/10/2012  End-user projects submitted
08/17/2012  Resources are assigned, end-user projects start
09/14/2012  Half-time meeting webinar
10/15/2012  End-user projects are completed
10/31/2012  Experiment is completed
11/15/2012  Experiment findings are published
11/15/2012  Start of Experiment Round 2, kick-off at SC in Salt Lake City
37. Conclusion
Response to the Uber-Cloud Experiment is overwhelming
Everybody is learning and working along their very specific business interests
At least 20 of the 25 teams will finish successfully and on time
97% of current participants will continue in Round 2
Univa Grid Engine Community: forming teams which explore bursting into a public HPC Cloud from their Univa GE cluster
The experiment could help Grid Engine customers become more flexible, profitable, cost-effective, and customer-friendly in their business…