How local, regional, and national cyberinfrastructure can be coordinated and linked to advance science and engineering, based on experiences and lessons from the Center for Computation & Technology at LSU (ideas, funding, implementation), plus some thoughts on what might be done differently if we were starting today. Presented at First Workshop - Center for Computational Engineering & Sciences, Unicamp, Campinas, Brazil 10 APR 2014
Advancing Science through Coordinated Cyberinfrastructure
1.
www.ci.anl.gov
www.ci.uchicago.edu
Daniel S. Katz, d.katz@ieee.org
Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, Louisiana State University
Adjunct Associate Professor, Electrical and Computer Engineering, LSU
2. Topics
• What we did in Louisiana from 2006-2010
• What I would do differently now
• A short video to highlight some additional issues that I hope the Center for Computational Engineering & Sciences will keep in mind
3. Louisiana
• Area: 134,382 km² (rank 33 of 51)
• Population: 4,533,000 (2010; rank 25 of 51)
• GDP: $208 billion (2009; rank 24 of 51)
• GDP per person: $45,700 (2009; rank 21 of 51)
• In poverty: 17% (2009; rank 44 of 51)
• High school degree: 82% (2009; rank 46 of 51)
• BS degree: 21% (2009; rank 47 of 51)
• Advanced degree: 7% (2009; rank 48 of 51)
State goals: talented workforce, great competitiveness, strong educational system, increased economic development
4. PITAC Report Summary
• “Computational science -- the use of advanced computing capabilities to understand and solve complex problems -- is critical to scientific leadership, economic competitiveness, and national security. It is one of the most important technical fields of the 21st century because it is essential to advances throughout society.”
• “Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for the US] to remain competitive in global science”
Complex problems: innovations will occur at boundaries
5. Big Science and Infrastructure
• Higgs* boson discovery announced at CERN, July 4, 2012
• Instrument: Large Hadron Collider (LHC)
• Infrastructure
– Computing hardware: Worldwide LHC Computing Grid (WLCG): 235,000 cores across 36 countries, including Open Science Grid (OSG, US), European Grid Infrastructure (EGI, Europe), ...
– Data: ~20 PB of data created in 2011-2012
– Software: grid middleware, physics analysis applications, ...
– Networks
– Education & training
• Data generated centrally, then moved (~3 PB/week) across a multi-tiered infrastructure to be computed upon
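The ~3 PB/week figure implies a demanding sustained network rate. A minimal back-of-the-envelope sketch, assuming decimal units (1 PB = 10^15 bytes) and a perfectly steady rate, which real, bursty traffic is not:

```python
# Average network rate implied by moving a fixed data volume every week.
# Assumptions: decimal petabytes and a perfectly steady transfer rate.
PB = 10**15                      # bytes per petabyte (decimal)
SECONDS_PER_WEEK = 7 * 24 * 3600


def sustained_gbps(petabytes_per_week: float) -> float:
    """Return the average rate, in Gbit/s, needed to move this volume each week."""
    bits = petabytes_per_week * PB * 8
    return bits / SECONDS_PER_WEEK / 1e9


if __name__ == "__main__":
    print(f"{sustained_gbps(3):.1f} Gbit/s")  # prints "39.7 Gbit/s"
```

In other words, keeping pace with the weekly average alone takes on the order of 40 Gbit/s of continuous, aggregate capacity across the grid.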
6. Big Science and Infrastructure
• Hurricanes affect humans
• Multi-physics: atmosphere, ocean, coast, vegetation, soil
– Sensors and data as inputs
• Humans: what they have built, where they are, what they will do
– Data and models as inputs
• Infrastructure:
– Urgent/scheduled processing, workflow systems
– Software applications, workflows
– Networks
– Decision-support systems, visualization
– Data storage, interoperability
7. Long-tail Science and Infrastructure
• Exploding data volumes & powerful simulation methods mean that more researchers need advanced infrastructure
• Such “long-tail” researchers cannot afford expensive expertise and unique infrastructure
• Challenge: outsource and/or automate time-consuming common processes
– Tools, e.g., Globus Online and data management
o Note: much LHC data is moved by Globus GridFTP, e.g., May/June 2012: >20 PB, >20M files
– Gateways, e.g., nanoHUB, CIPRES, providing access to scientific simulation software
(Figure: NSF grant size, 2007; from “Dark data in the long tail of science”, B. Heidorn)
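The GridFTP figures are lower bounds (">20 PB, >20M files"), but dividing them still gives a feel for the workload. A rough sketch, assuming decimal units and that "May/June 2012" means a 61-day window:

```python
# Rough averages implied by the GridFTP figures for May/June 2012.
# Assumptions: decimal units, a 61-day window, and the lower-bound totals as given.
PB = 10**15
SECONDS_PER_DAY = 86_400


def mean_file_size_gb(total_pb: float, n_files: float) -> float:
    """Average file size in GB (10**9 bytes)."""
    return total_pb * PB / n_files / 1e9


def mean_rate_gbps(total_pb: float, days: float) -> float:
    """Average transfer rate in Gbit/s over the window."""
    return total_pb * PB * 8 / (days * SECONDS_PER_DAY) / 1e9


if __name__ == "__main__":
    print(f"mean file size ~{mean_file_size_gb(20, 20e6):.1f} GB")
    print(f"mean rate ~{mean_rate_gbps(20, 61):.0f} Gbit/s")
```

So the typical file is on the order of a gigabyte, and the average aggregate rate over the two months is at least roughly 30 Gbit/s.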
8. Long-tail Science and Infrastructure
• CIPRES Science Gateway for Phylogenetics
– Study of the diversification of life and the relationships among living things through time
• Highly used, as of mid 2013:
– Cited in at least 400 publications, e.g., Nature, PNAS, Cell
– More than 5000 unique users in 3 years
– Used routinely in at least 68 undergraduate classes
– 45% from the US (including most states), 55% from 70 other countries
• Infrastructure
– Flexible web application
o A science gateway; uses software and lessons from the XSEDE gateways team, e.g., identity management, HPC job control
– Science software: tree inference and sequence alignment
o Parallel versions of MrBayes, RAxML, GARLI, BEAST, MAFFT
o PAUP*, Poy, ClustalW, Contralign, FSA, MUSCLE, ...
– Data
o Personal user space for storing results
o Tools to transfer and view data
Credit: Mark Miller, SDSC
9. Infrastructure Challenges
• Science
– Larger teams, more disciplines, more countries
• Data
– Size, complexity, and rates all increasing rapidly
– Need for interoperability (systems and policies)
• Systems
– More cores, more architectures (GPUs), deeper memory hierarchies
– Changing balances (latency vs. bandwidth)
– Changing limits (power, funds)
– System architecture and business models changing (clouds)
– Network capacity growing; more networking brings increased security concerns
• Software
– Multiphysics algorithms, frameworks
– Programming models and abstractions for science, data, and hardware
– V&V, reproducibility, fault tolerance
• People
– Education and training
– Career paths
– Credit and attribution
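The "latency vs. bandwidth" balance matters most for data movement. A minimal model, assuming each file pays a fixed per-file latency plus its size divided by the link bandwidth; the 50 ms and 10 Gbit/s values in the example are hypothetical:

```python
# Simple latency/bandwidth model for moving many files one after another.
# Assumption: no pipelining or parallel streams, so every file pays the full latency.


def transfer_time_s(n_files: int, file_bytes: float,
                    latency_s: float, bandwidth_bps: float) -> float:
    """Total seconds = n_files * (per-file latency + file size / bandwidth)."""
    return n_files * (latency_s + file_bytes * 8 / bandwidth_bps)


if __name__ == "__main__":
    latency, link = 0.05, 10e9   # hypothetical: 50 ms per file, 10 Gbit/s link
    one_big = transfer_time_s(1, 1e9, latency, link)          # 1 GB as one file
    many_small = transfer_time_s(10**6, 1e3, latency, link)   # 1 GB as 1M x 1 kB files
    print(f"one 1 GB file: {one_big:.2f} s; a million 1 kB files: {many_small:,.0f} s")
```

The same gigabyte takes under a second as one file but many hours as a million small files, which is why transfer tools pipeline requests and use parallel streams to hide latency.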
10. Cyberinfrastructure
“Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.” (Craig Stewart)
11. Computational & Data-enabled Science & Engineering (CDS&E)
• LIGO: Laser Interferometer Gravitational-Wave Observatory
• Ties together theory, computation, and experiment
– Each drives the other two!
12. How We Started
• State commitment: $25M/year for Vision 20/20
– $9M: LSU -> CCT (similarly, ULL -> LITE)
• University commitment to build new programs for the 21st century
• State and university willingness to make extraordinary investments
• Opportunity to build a new world-class program in interdisciplinary research and education, involving all of LSU
• Ed Seidel-led vision to instigate statewide collaboration
13. Advancing Research
• Potentially requires advances in three areas, depending on existing strengths
14. CCT Organization
(Organization chart. Director's Office: Edward Seidel. Divisions: HPC Partnership (McMahon); Cyberinfrastructure Development (Katz); Focus Areas (Allen); Corporate Relations. Other boxes: LONI; Systems and Software; Coast to Cosmos; LSU HPC; Performance Team; Core Comp. Sci.; Blue Waters, etc.; Material World; Labs: ACAL, DSL, Viz, LCAT, …; NSF TeraGrid; Cultural Computing; Visualization.)
15. Cyberinfrastructure Development
• Vision: combine research and infrastructure
– Research
o Computer science
o Applications
o Tools
– Infrastructure
o Hardware
o Operations
o Policies
• Together, the two grow as the square of either alone
• CyD staff: PhDs in CS and applications who understand the whole picture and want to grow the ecosystem
16. Computing in Louisiana
(Map: LONI sites UNO, Tulane, UL-L, SUBR, LSU, LA Tech, connected to National LambdaRail)
• LONI: 40 Gbps network
• LONI: ~100 TF IBM and Dell supercomputers
• CyberTools: tools and services
• LONI Institute: people and collaborations
• TeraGrid, OSG
17. LONI: Networking, Computing
(Map of LONI sites: LSU, La Tech, LSU HSC (two sites), ULL, Tulane, SU, UNO, ULM, McNeese, NSU, SLU, Alex. Legend: LONI node; multiple 10GE links; ~500-core Dell cluster; 112-processor IBM P5 cluster; ~4500-core Dell cluster.)
• Network: partners and customers
18. LONI Computing Resources (2010)
• One central Dell cluster (Queen Bee)
– 5500 InfiniBand-connected cores at ISB in Baton Rouge
– Archival storage contracted through NCSA
– 50% of allocations dedicated to TeraGrid from 2008
• Six distributed 512-core Dell clusters
• Five distributed 14-node (112-processor) IBM P5-575 clusters
• Distributed PetaShare storage
– 32 TB disk at each small Dell cluster
– 8 TB disk on the LSU and LaTech small Dell clusters, for LBRN
– 8 TB at SC-S HSC-NO, for LBRN
– 250 TB tape
• All run by HPC@LSU, including user support/training
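Adding up the compute systems listed on this slide gives a sense of LONI's aggregate scale in 2010. A small tally, using the counts exactly as listed here (cores for the Dell clusters, processors for the IBM P5-575 clusters, so the total mixes the two units):

```python
# Tally of LONI compute capacity as listed on the slide (2010).
# Each entry: (number of systems, cores or processors per system).
LONI_2010 = {
    "Queen Bee (central Dell cluster)": (1, 5500),
    "Distributed Dell clusters": (6, 512),
    "IBM P5-575 clusters (14 nodes, 112 procs)": (5, 112),
}


def total_processors(inventory: dict[str, tuple[int, int]]) -> int:
    """Sum count * per-system size over all entries."""
    return sum(n * size for n, size in inventory.values())


if __name__ == "__main__":
    print(total_processors(LONI_2010))  # 9132
```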
20. Cactus
• Component-based HPC framework
– Freely available environment for collaborative application development
• Cutting-edge CS
– Grid computing, petascale, accelerators, steering, remote viz
• Active user and developer communities
– 10-year pedigree, $10M in support
– Numerical relativity, CFD, coastal, reservoir engineering, …
• Domain-specific toolkits, e.g., CFD toolkit
– FD/FV/FE numerical methods
– Structured, multi-block, unstructured
– Uses PETSc, Trilinos, MUMPS, HYPRE
– Used to build the Black Oil Toolkit
21. PetaShare
• Main concept: data is managed (migrated, moved, replicated, cached, etc.) automatically
• Data-aware storage systems, data-aware schedulers, cross-domain metadata scheme
• Provides: 250 TB disk, 400 TB tape storage (and access to national storage facilities)
• Applications: coastal environmental modeling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering, numerical relativity, high energy physics
Credit: Tevfik Kosar
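A data-aware scheduler places work where the data already is, rather than moving data to wherever a processor happens to be free. A minimal sketch of that placement rule; it is not PetaShare's actual implementation, and the site names, dataset names, and sizes are hypothetical:

```python
# Data-aware placement: run each job at the site that minimizes bytes to stage in.
# Hypothetical illustration only; site names, datasets, and sizes are made up.
from dataclasses import dataclass, field


@dataclass
class Site:
    name: str
    holds: set[str] = field(default_factory=set)  # datasets already stored here


def bytes_to_stage(site: Site, inputs: list[str], sizes: dict[str, int]) -> int:
    """Bytes that must be moved to `site` before a job reading `inputs` can start."""
    return sum(sizes[d] for d in inputs if d not in site.holds)


def choose_site(sites: list[Site], inputs: list[str], sizes: dict[str, int]) -> Site:
    """Pick the site with the least data to move (ties go to the first site listed)."""
    return min(sites, key=lambda s: bytes_to_stage(s, inputs, sizes))


if __name__ == "__main__":
    sizes = {"coastal_mesh": 500 * 10**9, "wind_forcing": 20 * 10**9}
    sites = [Site("siteA", {"wind_forcing"}), Site("siteB", {"coastal_mesh"})]
    best = choose_site(sites, ["coastal_mesh", "wind_forcing"], sizes)
    print(best.name)  # siteB: only the 20 GB forcing data has to move
```

A production scheduler would also weigh queue lengths, link speeds, and replica freshness; the sketch keeps only the "minimize data movement" criterion.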
22. LONI Institute: “CCT for Louisiana”
• $15M 5-year project
– $7M from the Board of Regents (BoR), $8M from LaTech, LSU, SUBR, Tulane, UNO, ULL
• Catalyzes new inter-institutional collaborations, ambitious projects, and top-level hires:
– LONI network and computing
– NSF projects: PetaShare, VizTangibles, TeraGrid, Blue Waters
– EPSCoR: NSF CyberTools, DOE UCoMS, DoD
– NIH: $17M LBRN
– Promote collaborative research at interfaces for innovation
23. LONI Institute Vision
• LONI investments create world-leading infrastructure
• Create a bold new inter-university superstructure
– New faculty, staff, students; train others. Focus on CS, Bio, Materials, but all disciplines impacted
– Promote research at interfaces for innovation
• Draw on and enhance the strengths of all universities
– Strong groups recently created; collectively world-class
– Solve complex problems through collaboration and computation
– Much stronger recruiting opportunities for all institutions
– Statewide interdisciplinary education and research program
• Create University-Industry Research Centers (UIRCs)
– Research Triangle, NCSA/UIUC, Bay Area, others
• Transform Louisiana
– Such committed cooperation between sites is extraordinary
24. LONI Institute Hiring and Projects
• Two new faculty at each institution (12 total)
– Six in CS, six in Comp. Bio/Materials
• Six computational scientists
– Following the Bavarian KONWIHR project
– Support 70-90 projects over five years; lead to external funding
• Graduate students
– 36 new students funded and trained; two years each
• One coordinator/economic development position
• All hiring coordinated across the state
• Leading faculty across the state create multi-institutional seed projects
• Building on the seeds, dozens of new projects selected and started
• Exploit common themes, computing environments, and tools found in all areas
25. TeraGrid (XSEDE)
• TeraGrid: world’s largest open scientific discovery infrastructure
• Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
– High-performance networks
– High-performance computers (1 Pflops (~100,000 cores) - 1.75 Pflops)
o And a Condor pool (with ~13,000 CPUs)
– Visualization systems
– Data collections (30 PB, 100 discipline-specific databases)
– Science gateways
– User portal
– User services: help desk, training, advanced application support
• Allocated to US researchers and their collaborators through a national peer-review process
– Generally, review of the computing plan, not the science
• Mid 2011: TeraGrid succeeded by XSEDE
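The "1 Pflops (~100,000 cores)" figure implies a per-core peak rate. A one-line check, assuming both numbers describe the same system and that Pflops means peak floating-point operations per second:

```python
# Peak floating-point rate per core implied by the quoted system size.
# Assumption: 1 Pflop/s peak spread evenly over ~100,000 cores.
PEAK_FLOPS = 1e15
CORES = 100_000

gflops_per_core = PEAK_FLOPS / CORES / 1e9
print(f"~{gflops_per_core:.0f} Gflop/s per core")  # ~10 Gflop/s per core
```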
26. Campus Champions
• A “Champion” is a staff or faculty member on a campus who provides information on XSEDE to his/her colleagues
• Currently ~160 institutions represented by champions
• Champions get:
– Monthly training and updates
– Start-up accounts
– Forum for sharing and interactions
– Access to information on usage by local users
– Registration for the annual XSEDE Conference waived
• Champions do:
– Raise awareness locally
– Provide training
– Get users started with access quickly
– Represent the needs of the local community
– Provide feedback to improve services
– Attend the annual XSEDE conference
– Share their training and education materials
– Build community across campus, and among all Champions
(Map: Campus Champion Institutions, March 26, 2014; revised March 22, 2014)
Standard: 87
EPSCoR States: 51
Minority Serving Institutions: 12
EPSCoR States and Minority Serving Institutions: 8
Total Campus Champion Institutions: 158
Credit: Kay Hunt
27. LONI and National Cyberinfrastructure
• TeraGrid
– One of the 11 TeraGrid Resource Providers
– Playing a role in TG-wide governance (TeraGrid Forum, Executive Steering Committee, various working groups, GIG Director of Science)
– Contributed administrative software AmieGold (glue between TG account info and local info) and CS software (HARC, PetaShare, SAGA)
• OSG
– Currently providing resources
• XSEDE
– LONI not a partner in XSEDE, but a service provider
• Nationally
– Bringing in new users from the southeast US
– LONI Institute computational scientists, Campus Champions
28. NSF Vision
(Diagram labels: Infrastructure, Role, Lifecycle)
• Create and maintain a CI ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
• Support the foundational research needed to continue to efficiently advance CI
• Enable transformative, interdisciplinary, collaborative science and engineering research and education through the use of advanced CI
• Transform practice through new policies for CI addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of authorship
• Develop a next-generation diverse workforce of scientists and engineers equipped with essential skills to use and develop CI, with CI used in both the research and education process
29. Relevant NSF Programs
• EPSCoR: targeted support for states that are less successful in NSF funding
• MRI: Major Research Instrumentation
• CIF21 (NSF’s CI umbrella)
– eXtreme Digital (XD)
– Track 1 (Blue Waters)
– Software Infrastructure for Sustained Innovation (SI2)
– Campus Cyberinfrastructure - Network Infrastructure and Engineering (CC-NIE)
• Integrative Graduate Education and Research Traineeship Program (IGERT)
• General research programs
30. Recap (to 2010)
• Louisiana decides that science and technology can lead to a better future
• Builds a regional cyberinfrastructure (network, computing, software, (some) data, people) that connects to national-scale infrastructure
– Using a mix of national, state, and local funding
• Starts to change culture: infuse computation in academic departments, interdisciplinary hiring, large collaborative projects
• But...
• We didn’t think about data as much as we would if we were starting again today
31.
• Swift is designed to compose large parallel workflows from serial or parallel application programs, and to run them fast and efficiently on a variety of platforms
– A parallel scripting system for grids and clusters, for loosely coupled applications: programs (executables, shell, Python, R, Octave, Matlab, etc.) linked by exchanging files
– Easy to write: a simple, high-level, C-like functional language allows small Swift scripts to do large-scale work
– Easy to run: contains all services for running in one Java application
o Works on multicore workstations, HPC, grids (interfaces to schedulers, Globus, ssh)
– A powerful, efficient, scalable, and flexible execution engine
o Scaling to O(10M) tasks; 0.5M in live science work, and growing
o Collective data management being developed to optimize I/O
• Used in earth science, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, knowledge modeling, and more
• http://www.ci.uchicago.edu/swift
M. Wilde, N. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, I. Foster, “Swift: A language for distributed parallel scripting,” Parallel Computing, v. 37(9), pp. 633-652, 2011.
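Swift's core pattern, many independent runs of an existing program linked only through files, can be approximated in ordinary Python for intuition. A minimal sketch of a foreach-style parameter sweep; it is not Swift, and the "application" is a hypothetical one-line Python program that squares the number in its input file:

```python
# Foreach-style sweep: run an external program once per input file, in parallel.
# The "application" is a hypothetical stand-in invoked via the current interpreter.
import subprocess
import sys
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

APP = "import sys; n = int(open(sys.argv[1]).read()); open(sys.argv[2], 'w').write(str(n * n))"


def run_app(inp: Path) -> Path:
    """Run the app on one input file; its output file links it to later steps."""
    out = inp.with_name("out_" + inp.name[len("in_"):])
    subprocess.run([sys.executable, "-c", APP, str(inp), str(out)], check=True)
    return out


def sweep(values: list[int], workdir: Path, max_workers: int = 4) -> list[int]:
    """Write one input file per value, run the app on each concurrently, gather results."""
    inputs = []
    for i, v in enumerate(values):
        p = workdir / f"in_{i}.txt"
        p.write_text(str(v))
        inputs.append(p)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outputs = list(pool.map(run_app, inputs))  # map preserves input order
    return [int(o.read_text()) for o in outputs]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(sweep([1, 2, 3, 4], Path(d)))  # [1, 4, 9, 16]
```

Threads suffice here because the real work happens in separate processes. Swift's runtime additionally dispatches such runs to clusters, grids, and clouds, which this local sketch does not attempt.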
32. Swift Programming Model: all execution driven by parallel data flow
• analyze1() and analyze2() are computed in parallel
• analyze() returns r when they are done
• This parallelism is automatic
• Works recursively throughout the program’s call graph
– E.g., can embed within a foreach loop, itself done in parallel
– Foreach loops can be nested

    (float r) analyze(int i)
    {
      float j = analyze1(i);
      float k = analyze2(i);
      r = 0.5 * (j + k);
    }
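In Swift, the concurrency of analyze1() and analyze2() falls out of the data dependencies with no extra syntax. Python needs explicit futures to express the same shape; a minimal analogue, in which the bodies of analyze1() and analyze2() are hypothetical placeholders:

```python
# Python analogue of the Swift analyze() example, using explicit futures.
# analyze1/analyze2 bodies are hypothetical placeholders for real computations.
from concurrent.futures import ThreadPoolExecutor


def analyze1(i: int) -> float:
    return float(i + 1)


def analyze2(i: int) -> float:
    return float(2 * i)


def analyze(i: int, pool: ThreadPoolExecutor) -> float:
    j = pool.submit(analyze1, i)  # j and k are independent,
    k = pool.submit(analyze2, i)  # so both are started at once
    return 0.5 * (j.result() + k.result())  # r waits only for j and k


def run_all(n: int) -> list[float]:
    """foreach i in [0, n): analyze(i), with the outer loop also run in parallel."""
    # Separate pools keep outer tasks (which block on inner futures) from
    # starving the inner tasks of worker threads.
    with ThreadPoolExecutor(max_workers=8) as inner, ThreadPoolExecutor(max_workers=4) as outer:
        return list(outer.map(lambda i: analyze(i, inner), range(n)))


if __name__ == "__main__":
    print(run_all(4))  # [0.5, 2.0, 3.5, 5.0]
```

The nesting in run_all() mirrors the slide's point that the same dataflow rule applies inside foreach loops; in Swift no pool management is needed at all.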
33. Swift Environment
(Diagram: a Swift script on a submit host (login node, laptop, Linux server), with a data server, dispatches application programs to diverse resources, including clouds: Amazon EC2, XSEDE Wispy, FutureGrid, …)
• The Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments
34. Globus
Big data transfer and sharing…
…with Dropbox-like simplicity…
…directly from your own storage systems
Run as a non-profit service to the non-profit research community
35. Globus Users
• “I need a good place to store or back up my (big) research data, at a reasonable price.”
• “I need to easily, quickly, and reliably move or mirror portions of my data to other places, including my campus HPC system, lab server, desktop, laptop, XSEDE, cloud, etc.”
• “I need a way to easily and securely share my data with my colleagues at other institutions.”
• “I want to publish my data so that it’s available and discoverable long-term.”
• “I want to archive my data in case it’s needed sometime in the future.”
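The "move or mirror portions of my data" need comes down to copying only what is missing or has changed, and verifying the result. A minimal local sketch of checksum-based mirroring using only the Python standard library; it illustrates the idea and is not how Globus itself is implemented or invoked:

```python
# Checksum-based one-way mirror between two local directory trees.
# Illustrative only: real services add retries, parallelism, and remote endpoints.
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def mirror(src: Path, dst: Path) -> list[str]:
    """Copy files missing from dst or whose checksums differ; return copied paths."""
    copied = []
    for s in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = s.relative_to(src)
        d = dst / rel
        if d.is_file() and sha256(d) == sha256(s):
            continue  # already identical, skip
        d.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(s, d)
        copied.append(rel.as_posix())
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        src, dst = Path(a), Path(b)
        (src / "run1").mkdir()
        (src / "run1" / "out.dat").write_bytes(b"results")
        print(mirror(src, dst))  # ['run1/out.dat']
        print(mirror(src, dst))  # []
```

Comparing checksums rather than timestamps costs extra reads, but it catches silent corruption and makes repeated runs safe to resume.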
36. Globus is SaaS
• Web, command line, and REST interfaces
• Reduced IT operational costs
• New features automatically available
• Consolidated support and troubleshooting
• Easy to add your laptop, server, cluster, supercomputer, etc. with Globus Connect
37. Globus-Connected Resources on Campus
• Research computing center
• Department/lab storage
• Campus-wide home/project file system
• Mass storage systems
• Science instruments
• Desktops and laptops
• Custom web applications
• Amazon Web Services S3
38. Lessons
• The three triangle facets (infrastructure, computational, interdisciplinary) have to be taken seriously at the highest levels, and seen as an important component of academic research
• Infrastructure needs to be integrated at all levels (laboratory, campus, regional, national, international): users need to be able to easily move work and data to appropriate systems, and to collaborate across locations
• Education and training of students and faculty is crucial: vast improvements are needed over the small numbers currently reached through HPC center tutorials; computation and computational thinking need to be part of new curricula across all disciplines
• Emphasis should be placed on broadening participation in computation, not just on high-end systems that decreasing numbers of researchers can join in on. This means making tools much more usable and intuitive, freeing all researchers from the limitations of their personal workstations, and providing access to simple tools for large-scale parameter studies, data archiving, visualization, and collaboration
• The vision needs to be consistent; it cannot depend on just one person
• Funding needs to be stable (activities need to be sustainable)
39. Video
• Data Sharing: https://www.youtube.com/watch?v=N2zK3sAtr-4
40. Sources
• D. S. Katz et al., “Louisiana: A Model for Advancing Regional e-Science through Cyberinfrastructure,” Philosophical Transactions of the Royal Society A, 367(1897), 2009.
– Authors from Louisiana State University, Tulane University, University of Louisiana at Lafayette, Louisiana Tech University, Louisiana Community and Technical College System, Southern University, University of New Orleans
• G. Allen and D. S. Katz, “Computational science, infrastructure and interdisciplinary research on university campuses: experiences and lessons from the Center for Computation and Technology,” NSF Workshop on Sustainable Funding and Business Models for Academic Cyberinfrastructure Facilities, Cornell University, 2010.
• D. S. Katz and D. Proctor, “A Framework for Discussing e-Research Infrastructure Sustainability,” http://dx.doi.org/10.6084/m9.figshare.790767, submitted to the Workshop on Sustainable Software for Science: Practice and Experiences (http://wssspe.researchcomputing.org.uk) at SC13.
• Swift: Swift Team, led by Mike Wilde, http://www.ci.uchicago.edu/swift
• Globus: Globus Team, led by Ian Foster and Steve Tuecke, http://www.globus.org