www.ci.anl.gov
www.ci.uchicago.edu

Advancing Science through Coordinated Cyberinfrastructure

Daniel S. Katz
d.katz@ieee.org
Senior Fellow, Computation Institute, University of Chicago & Argonne National Laboratory
Affiliate Faculty, Center for Computation & Technology, Louisiana State University
Adjunct Associate Professor, Electrical and Computer Engineering, LSU
  	
  
Advancing Science through CI – d.katz@ieee.org
  
Topics
•  What we did in Louisiana from 2006-2010
•  What I would do differently now
•  A short video to highlight some additional issues that I hope the Center for Computational Engineering & Sciences will keep in mind
  
Louisiana (rank among the 50 states and DC shown as rank/51)
•  Area: 134,382 km² (33/51)
•  Population: 4,533,000 (2010, 25/51)
•  GDP: $208 billion (2009, 24/51)
•  GDP/person: $45,700 (2009, 21/51)
•  In poverty: 17% (2009, 44/51)
•  High school degree: 82% (2009, 46/51)
•  BS degree: 21% (2009, 47/51)
•  Advanced degree: 7% (2009, 48/51)
State goals: talented workforce, great competitiveness, strong educational system, increased economic development
  
PITAC Report Summary:
•  “Computational science -- the use of advanced computing capabilities to understand and solve complex problems -- is critical to scientific leadership, economic competitiveness, and national security. It is one of the most important technical fields of the 21st century because it is essential to advances throughout society.”
•  “Universities must significantly change organizational structures: multidisciplinary & collaborative research are needed [for US] to remain competitive in global science”
Complex problems: Innovations will occur at boundaries
  
Big Science and Infrastructure
•  Higgs* boson discovery announced at CERN July 4, 2012
•  Instrument: Large Hadron Collider (LHC)
•  Infrastructure
–  Computing hardware: Worldwide LHC Computing Grid (WLCG): 235,000 cores across 36 countries, including Open Science Grid (OSG, US), European Grid Infrastructure (EGI, Europe), ...
–  Data: ~20 PB of data created in 2011-2012
–  Software: grid middleware, physics analysis applications, ...
–  Networks
–  Education & Training
•  Data generated centrally, moved (~3 PB/week) across multi-tiered infrastructure to be computed upon
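To put the ~3 PB/week figure in network terms, the short Python sketch below converts a weekly volume into the average sustained bit rate it implies. It assumes decimal units (1 PB = 10^15 bytes) and a perfectly uniform transfer rate over the week; real transfers are bursty, so peak rates would be higher.

```python
SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 seconds


def sustained_rate_gbps(petabytes_per_week: float) -> float:
    """Average bit rate (Gbit/s) needed to move a weekly data volume.

    Assumes decimal units (1 PB = 10**15 bytes) and a perfectly uniform
    transfer rate across the whole week.
    """
    bits = petabytes_per_week * 10**15 * 8
    return bits / SECONDS_PER_WEEK / 10**9


print(f"{sustained_rate_gbps(3):.1f} Gbit/s")  # 39.7 Gbit/s
```

That is roughly 40 Gbit/s sustained around the clock, before any allowance for bursts or retransfers.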
  
Big Science and Infrastructure
•  Hurricanes affect humans
•  Multi-physics: atmosphere, ocean, coast, vegetation, soil
–  Sensors and data as inputs
•  Humans: what have they built, where are they, what will they do
–  Data and models as inputs
•  Infrastructure:
–  Urgent/scheduled processing, workflow systems
–  Software applications, workflows
–  Networks
–  Decision-support systems, visualization
–  Data storage, interoperability
  
Long-tail Science and Infrastructure
•  Exploding data volumes & powerful simulation methods mean that more researchers need advanced infrastructure
•  Such “long-tail” researchers cannot afford expensive expertise and unique infrastructure
•  Challenge: Outsource and/or automate time-consuming common processes
–  Tools, e.g., Globus Online and data management
o  Note: much LHC data is moved by Globus GridFTP, e.g., May/June 2012, >20 PB, >20M files
–  Gateways, e.g., nanoHUB, CIPRES, access to scientific simulation software
(Figure: NSF grant size, 2007. From “Dark data in the long tail of science”, B. Heidorn)
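The GridFTP note above gives two lower bounds (>20 PB, >20M files). Dividing the round numbers gives a rough sense of typical file size; because both quantities are minimums, the result is only an indicative average, not a bound. A minimal Python sketch, assuming decimal units:

```python
def average_file_size_gb(total_petabytes: float, n_files: float) -> float:
    """Mean file size in GB, using decimal units (1 PB = 10**6 GB)."""
    return total_petabytes * 10**6 / n_files


# Round numbers from the GridFTP note (both are lower bounds on the slide).
print(average_file_size_gb(20, 20_000_000))  # 1.0
```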
  
Long-tail Science and Infrastructure
•  CIPRES Science Gateway for Phylogenetics
–  Study of diversification of life and relationships among living things through time
•  Highly used, as of mid 2013:
–  Cited in at least 400 publications, e.g., Nature, PNAS, Cell
–  More than 5000 unique users in 3 years
–  Used routinely in at least 68 undergraduate classes
–  45% US (including most states), 55% from 70 other countries
•  Infrastructure
–  Flexible web application
o  A science gateway; uses software and lessons from the XSEDE gateways team, e.g., identity management, HPC job control
–  Science software: tree inference and sequence alignment
o  Parallel versions of MrBayes, RAxML, GARLI, BEAST, MAFFT
o  PAUP*, Poy, ClustalW, Contralign, FSA, MUSCLE, ...
–  Data
o  Personal user space for storing results
o  Tools to transfer and view data
Credit: Mark Miller, SDSC
  
Infrastructure Challenges
•  Science
–  Larger teams, more disciplines, more countries
•  Data
–  Size, complexity, rates all increasing rapidly
–  Need for interoperability (systems and policies)
•  Systems
–  More cores, more architectures (GPUs), more memory hierarchy
–  Changing balances (latency vs bandwidth)
–  Changing limits (power, funds)
–  System architecture and business models changing (clouds)
–  Network capacity growing; increased networking -> increased security concerns
•  Software
–  Multiphysics algorithms, frameworks
–  Programming models and abstractions for science, data, and hardware
–  V&V, reproducibility, fault tolerance
•  People
–  Education and training
–  Career paths
–  Credit and attribution
  
Cyberinfrastructure
“Cyberinfrastructure consists of computing systems, data storage systems, advanced instruments and data repositories, visualization environments, and people, all linked together by software and high performance networks to improve research productivity and enable breakthroughs not otherwise possible.”
-- Craig Stewart
  
Computational & Data-enabled Science & Engineering (CDS&E)
•  LIGO: Laser Interferometer Gravitational-Wave Observatory
•  Ties together theory, computation, and experiment
–  Each drives the other two!
  
How We Started
•  State commitment: $25M/year for Vision 20/20
–  $9M: LSU -> CCT (similarly, ULL -> LITE)
•  University commitment to build new programs for the 21st century
•  State and University willingness to make extraordinary investments
•  Opportunity to build a new world-class program in interdisciplinary research and education, involving all of LSU
•  Ed Seidel-led vision to instigate state-wide collaboration
  
Advancing Research
•  Potentially requires advances in three areas, depending on existing strengths
  
CCT Organization
(Organization chart. Director Office: Edward Seidel. Groups: HPC Partnership (McMahon); Cyberinfrastructure Development (Katz); Focus Areas (Allen). Units shown: LONI; LSU HPC; Blue Waters, etc.; NSF TeraGrid; Systems and Software; Performance Team; Core Comp. Sci.; Coast to Cosmos; Material World; Cultural Computing; Corporate Relations; Labs: ACAL, DSL, Viz, LCAT, …; Visualization.)
  
Cyberinfrastructure Development
•  Vision: combine research and infrastructure
–  Research
o  Computer science
o  Applications
o  Tools
–  Infrastructure
o  Hardware
o  Operations
o  Policies
•  Both together have squared growth of either alone
•  CyD staff: PhDs in CS and apps who understand the whole picture and want to grow the ecosystem
  
Computing in Louisiana
(Map figure labels: National LambdaRail; UNO, Tulane, UL-L, SUBR, LSU, LA Tech)
•  LONI: 40 Gbps network
•  LONI: ~100 TF IBM, Dell supercomputers
•  Cybertools: tools and services
•  LONI Institute: people and collaborations
•  TeraGrid, OSG
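For a rough feel of what a 40 Gbps network means for a researcher moving data, the Python sketch below estimates idealized transfer time. The 100 TB dataset size and the `efficiency` factor (fraction of line rate actually achieved) are hypothetical parameters for illustration, not LONI measurements.

```python
def transfer_hours(terabytes: float, link_gbps: float, efficiency: float = 1.0) -> float:
    """Idealized time in hours to move `terabytes` over a `link_gbps` link.

    Uses decimal units (1 TB = 10**12 bytes). `efficiency` is the fraction of
    the line rate actually achieved (hypothetical; 1.0 means full line rate).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    bits = terabytes * 10**12 * 8
    seconds = bits / (link_gbps * 10**9 * efficiency)
    return seconds / 3600


# Hypothetical 100 TB dataset over a 40 Gbps link
print(f"{transfer_hours(100, 40):.1f} h at full line rate")     # 5.6 h at full line rate
print(f"{transfer_hours(100, 40, efficiency=0.5):.1f} h at 50%")  # 11.1 h at 50%
```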
  
LONI - Networking & Computing
(Map figure labels: LSU, La Tech, LSU HSC, ULL, Tulane, SU, UNO, LSU HSC, ULM, McNeese, NSU, SLU, Alex. Legend: LONI node; multiple 10GE; ~500 core Dell cluster; 112 proc. IBM P5 cluster; ~4500 core Dell cluster)
Network: partners and customers
  
LONI Computing Resources (2010)
•  One central Dell cluster (Queen Bee)
–  5500 InfiniBand-connected cores at ISB in Baton Rouge
–  Archival storage contracted through NCSA
–  50% of allocations dedicated to TeraGrid from 2008
•  Six distributed 512-core Dell clusters
•  Five distributed 14-node (112 procs) IBM P5-575 clusters
•  Distributed PetaShare storage
–  32 TB disk @ each small Dell cluster
–  8 TB disk on LSU & LaTech small Dell clusters – for LBRN
–  8 TB at SC-S & HSC-NO – for LBRN
–  250 TB tape
•  All run by HPC@LSU, including user support/training
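Totaling the compute resources listed above gives a sense of LONI's 2010 scale. The Python sketch below assumes each IBM P5-575 “proc” counts as one core and takes the Queen Bee figure as exactly 5500 cores; both are simplifications of the slide's numbers.

```python
# (number of clusters, cores per cluster), as listed on the slide.
# Assumption: each IBM P5-575 "proc" is counted as one core.
LONI_CLUSTERS = {
    "Queen Bee (central Dell)": (1, 5500),
    "Distributed Dell clusters": (6, 512),
    "Distributed IBM P5-575 clusters": (5, 112),
}


def total_cores(clusters: dict) -> int:
    """Sum of count * cores_per_cluster over all cluster types."""
    return sum(count * cores for count, cores in clusters.values())


for name, (count, cores) in LONI_CLUSTERS.items():
    print(f"{name}: {count * cores} cores")
print("Total:", total_cores(LONI_CLUSTERS))  # Total: 9132
```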
  
$12M NSF CyberTools Project: Enabler and Driver
  
Cactus
•  Component-based HPC framework
–  Freely available environment for collaborative application development
•  Cutting-edge CS
–  Grid computing, petascale, accelerators, steering, remote viz
•  Active user & developer communities
–  10-year pedigree, $10M support
–  Numerical Relativity, CFD, Coastal, Reservoir Engineering, …
•  Domain-specific toolkits, e.g., CFD toolkit
–  FD/FV/FE numerical methods
–  Structured, multi-block, unstructured
–  Uses PETSc, Trilinos, MUMPS, HYPRE
–  Used to build Black Oil Toolkit
  
PetaShare
•  Main concept: data is managed (migrated, moved, replicated, cached, etc.) automatically
•  Data-aware storage systems, data-aware schedulers, cross-domain metadata scheme
•  Provides: 250 TB disk, 400 TB tape storage (and access to national storage facilities)
•  Applications: coastal & environmental modeling, geospatial analysis, bioinformatics, medical imaging, fluid dynamics, petroleum engineering, numerical relativity, high energy physics.
Credit: Tevfik Kosar
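To illustrate what “managed automatically” can mean in practice, here is a toy Python sketch of a two-tier store: every file is archived to tape, recently used files are cached on a bounded disk tier, and the least recently used file is migrated off disk when the disk fills. The `TieredStore` class and its LRU policy are hypothetical illustrations, not PetaShare's actual design or API.

```python
from collections import OrderedDict


class TieredStore:
    """Toy disk-over-tape store with automatic LRU migration.

    Hypothetical illustration only; not PetaShare's actual policy or API.
    """

    def __init__(self, disk_capacity: int):
        self.disk_capacity = disk_capacity  # max number of files kept on disk
        self.disk = OrderedDict()           # name -> data, least recently used first
        self.tape = {}                      # archival copy of every file

    def put(self, name: str, data: bytes) -> None:
        self.tape[name] = data   # always archive to tape
        self._cache(name, data)  # and keep a fast copy on disk

    def get(self, name: str) -> bytes:
        if name in self.disk:
            self.disk.move_to_end(name)  # disk hit: mark as most recently used
            return self.disk[name]
        data = self.tape[name]           # disk miss: recall from tape
        self._cache(name, data)          # and cache it on disk again
        return data

    def _cache(self, name: str, data: bytes) -> None:
        self.disk[name] = data
        self.disk.move_to_end(name)
        while len(self.disk) > self.disk_capacity:
            # Migrate the least recently used file off disk; tape keeps a copy.
            self.disk.popitem(last=False)


store = TieredStore(disk_capacity=2)
store.put("a", b"1")
store.put("b", b"2")
store.get("a")           # "a" becomes most recently used
store.put("c", b"3")     # disk is full, so "b" is migrated off disk
print(list(store.disk))  # ['a', 'c']
```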
  
LONI Institute: “CCT for Louisiana”
•  $15M 5-year project
–  $7M Board of Regents (BoR), $8M from LaTech, LSU, SUBR, Tulane, UNO, ULL
•  Catalyzes new inter-institutional collaborations, ambitious projects and top-level hires:
–  LONI network and computing
–  NSF projects: PetaShare, VizTangibles, TeraGrid, Blue Waters
–  EPSCoR: NSF CyberTools, DOE UCoMS, DoD
–  NIH: $17M LBRN
–  Promote collaborative research at interfaces for innovation
  
LONI Institute Vision
•  LONI investments create world-leading infrastructure
•  Create bold new inter-university superstructure
–  New faculty, staff, students; train others. Focus on CS, Bio, Materials, but all disciplines impacted
–  Promote research at interfaces for innovation
•  Draw on, enhance strengths of all universities
–  Strong groups recently created; collectively world-class
–  Solve complex problems through collaboration & computation
–  Much stronger recruiting opportunities for all institutions
–  Statewide interdisciplinary education & research program
•  Create University-Industry Research Centers (UIRCs)
–  Research Triangle, NCSA/UIUC, Bay Area, others
•  Transform Louisiana
–  Such committed cooperation between sites is extraordinary
  
LONI Institute Hiring and Projects
•  Two new faculty at each institution (12 total)
–  Six in CS, six in Comp. Bio/Materials
•  Six Computational Scientists
–  Following Bavarian KONWIHR project
–  Support 70-90 projects over five years; lead to external funding
•  Graduate students
–  36 new students funded, trained; two years each
•  One Coordinator/economic development
•  All hiring coordinated across state
•  Leading faculty across state create multi-institutional seed projects
•  Building on seeds, dozens of new projects selected, started
•  Exploit common themes, computing environments, tools found in all areas
  
TeraGrid (XSEDE)
•  TeraGrid: world’s largest open scientific discovery infrastructure
•  Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
–  High-performance networks
–  High-performance computers (1 Pflops (~100,000 cores) to 1.75 Pflops)
o  And a Condor pool (w/ ~13,000 CPUs)
–  Visualization systems
–  Data collections (30 PB, 100 discipline-specific databases)
–  Science Gateways
–  User portal
–  User services - help desk, training, advanced app support
•  Allocated to US researchers and their collaborators through national peer-review process
–  Generally, review of computing, not science
•  Mid 2011: TeraGrid transitioned to XSEDE
  
Campus Champions
•  A “Champion” is a staff or faculty member on a campus who provides information on XSEDE to his/her colleagues
•  Currently ~160 institutions represented by champions
•  Champions get:
–  Monthly training and updates
–  Start-up accounts
–  Forum for sharing and interactions
–  Access to information on usage by local users
–  Registrations for annual XSEDE Conference waived
•  Champions do:
–  Raise awareness locally
–  Provide training
–  Get users started with access quickly
–  Represent needs of local community
–  Provide feedback to improve services
–  Attend annual XSEDE conference
–  Share their training and education materials
–  Build community across campus, and among all Champions
(Map: Campus Champion Institutions, March 26, 2014; revised March 22, 2014)
Standard – 87
EPSCoR States – 51
Minority Serving Institutions – 12
EPSCoR States and Minority Serving Institutions – 8
Total Campus Champion Institutions – 158
Credit: Kay Hunt
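The legend's categories appear to be disjoint, since they sum exactly to the stated total (consistent with the “~160 institutions” above). A quick Python check; the EPSCoR share computed at the end is derived arithmetic, not a figure from the slide:

```python
# Legend counts from the Campus Champion map (revised March 22, 2014).
champion_institutions = {
    "Standard": 87,
    "EPSCoR states": 51,
    "Minority Serving Institutions": 12,
    "EPSCoR states and Minority Serving Institutions": 8,
}

total = sum(champion_institutions.values())
print(total)  # 158

# Derived (not on the slide): share of champion institutions in EPSCoR states.
epscor = (
    champion_institutions["EPSCoR states"]
    + champion_institutions["EPSCoR states and Minority Serving Institutions"]
)
print(f"{epscor / total:.1%}")  # 37.3%
```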
  
LONI and National Cyberinfrastructure
•  TeraGrid
–  One of the 11 TeraGrid Resource Providers
–  Playing a role in TG-wide governance (TeraGrid Forum, Executive Steering Committee, various working groups, GIG Director of Science)
–  Contributed administrative software AmieGold (glue between TG account info and local info) and CS software (HARC, PetaShare, SAGA)
•  OSG
–  Currently providing resources
•  XSEDE
–  LONI not a partner in XSEDE, but a service provider
•  Nationally
–  Bringing in new users from the southeast US
–  LONI Institute Computational Scientists – Campus Champions
  
NSF Vision: Infrastructure Role & Lifecycle
•  Create and maintain a CI ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
•  Support the foundational research needed to continue to efficiently advance CI
•  Enable transformative, interdisciplinary, collaborative science and engineering research and education through the use of advanced CI
•  Transform practice through new policies for CI addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of authorship
•  Develop a next-generation diverse workforce of scientists and engineers equipped with essential skills to use and develop CI, with CI used in both the research and education process
  
Relevant NSF Programs
•  EPSCoR – targeted support for states that are less successful in NSF funding
•  MRI – Major Research Instrumentation
•  CIF21 (NSF’s CI umbrella)
–  eXtreme Digital (XD)
–  Track 1 (Blue Waters)
–  Software Infrastructure for Sustained Innovation (SI2)
–  Campus Cyberinfrastructure - Network Infrastructure and Engineering (CC-NIE)
•  Integrative Graduate Education and Research Traineeship Program (IGERT)
•  General research programs
  
Recap (to 2010)
•  Louisiana decides that science and technology can lead to a better future
•  Builds a regional cyberinfrastructure (network, computing, software, ~data, people) that connects to national-scale infrastructure
–  Using a mix of national, state, and local funding
•  Starts to change culture – infuse computation in academic departments, interdisciplinary hiring, large collaborative projects
•  But...
•  Didn’t really think about data as much as we would have were we starting again today
  
•  Swift is designed to compose large parallel workflows, from serial or parallel application programs, to run fast and efficiently on a variety of platforms
–  A parallel scripting system for Grids and clusters for loosely-coupled applications - programs (executable, shell, python, R, Octave, Matlab, etc.) linked by exchanging files
–  Easy to write: simple high-level C-like functional language, allows small Swift scripts to do large-scale work
–  Easy to run: contains all services for running, in one Java application
o  Works on multicore workstations, HPC, Grids (interfaces to schedulers, Globus, ssh)
–  A powerful, efficient, scalable and flexible execution engine
o  Scaling O(10M) tasks – 0.5M in live science work, and growing
o  Collective data management being developed to optimize I/O
•  Used in earth science, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, knowledge modeling, and more
•  http://www.ci.uchicago.edu/swift
M. Wilde, N. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, I. Foster, “Swift: A language for distributed parallel scripting,” Parallel Computing, v. 37(9), pp. 633-652, 2011.
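As a rough analogy only (this is Python, not Swift), the sketch below shows the loosely coupled pattern described above: independent “application” steps run in parallel, each writing an output file, and a downstream step consumes those files. The `simulate` and `summarize` functions are hypothetical stand-ins for external programs; Swift itself would launch real executables and track the files as data dependencies.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List


def simulate(param: int, out_dir: Path) -> Path:
    """Hypothetical 'application program': writes one result file per input."""
    out = out_dir / f"sim_{param}.txt"
    out.write_text(str(param * param))
    return out


def summarize(files: List[Path], out_dir: Path) -> Path:
    """Hypothetical downstream program: reads every upstream output file."""
    total = sum(int(f.read_text()) for f in files)
    out = out_dir / "summary.txt"
    out.write_text(str(total))
    return out


def run_workflow(params: Iterable[int], out_dir: Path) -> int:
    # The simulate() steps are independent, so they can run concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        sim_files = list(pool.map(lambda p: simulate(p, out_dir), params))
    # summarize() depends on all of their output files, so it runs last.
    return int(summarize(sim_files, out_dir).read_text())


with tempfile.TemporaryDirectory() as tmp:
    print(run_workflow(range(5), Path(tmp)))  # 30
```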
  
Swift programming model: all execution driven by parallel data flow
•  analyze1() and analyze2() are computed in parallel
•  analyze() returns r when they are done
•  This parallelism is automatic
•  Works recursively throughout the program’s call graph
–  E.g., can embed within a foreach loop, itself done in parallel
–  Foreach loops can be nested

(float r) analyze(int i)
{
  float j = analyze1(i);  // j and k do not depend on each other,
  float k = analyze2(i);  // so Swift can evaluate them in parallel
  r = 0.5 * (j + k);      // runs once both j and k have values
}
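The same data-flow behavior can be approximated in Python with futures, as a hedged analogy rather than a description of Swift's runtime. `analyze1` and `analyze2` below are hypothetical placeholders; the two calls are submitted together so they may run concurrently, and `analyze` blocks until both results exist, much as the Swift function waits on `j` and `k`. The outer loop plays the role of a parallel `foreach`. Separate thread pools are used for the outer and inner levels so that outer tasks waiting on inner results cannot exhaust the workers those inner results need.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical placeholders for the slide's analyze1()/analyze2().
def analyze1(i: int) -> float:
    return float(2 * i)


def analyze2(i: int) -> float:
    return float(i + 10)


def analyze(inner: ThreadPoolExecutor, i: int) -> float:
    j = inner.submit(analyze1, i)  # j and k do not depend on each other,
    k = inner.submit(analyze2, i)  # so both are started immediately
    return 0.5 * (j.result() + k.result())  # waits until both values exist


def analyze_all(values):
    with ThreadPoolExecutor(max_workers=8) as inner:
        # Outer level acts like a parallel foreach over the inputs.
        with ThreadPoolExecutor(max_workers=4) as outer:
            futures = [outer.submit(analyze, inner, i) for i in values]
            return [f.result() for f in futures]


print(analyze_all([0, 1, 2]))  # [5.0, 6.5, 8.0]
```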
  
Swift Environment
Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments
(Diagram labels: submit host (login node, laptop, Linux server); data server; Swift script; application programs; clouds: Amazon EC2, XSEDE Wispy, FutureGrid …)
  
Globus
Big data transfer and sharing…
…with Dropbox-like simplicity…
…directly from your own storage systems
Run as a non-profit service to the non-profit research community
  
Globus Users
•  “I need a good place to store or back up my (big) research data, at a reasonable price.”
•  “I need to easily, quickly, and reliably move or mirror portions of my data to other places, including my campus HPC system, lab server, desktop, laptop, XSEDE, cloud, etc.”
•  “I need a way to easily and securely share my data with my colleagues at other institutions.”
•  “I want to publish my data so that it’s available and discoverable long-term.”
•  “I want to archive my data in case it’s needed sometime in the future.”
  
Globus is SaaS
•  Web, command line, and REST interfaces
•  Reduced IT operational costs
•  New features automatically available
•  Consolidated support & troubleshooting
•  Easy to add your laptop, server, cluster, supercomputer, etc. with Globus Connect
  
Globus Connected Resources on Campus
•  Research computing center
•  Department / lab storage
•  Campus-wide home/project file system
•  Mass Storage Systems
•  Science instruments
•  Desktops and laptops
•  Custom web applications
•  Amazon Web Services S3
  
Lessons
•  The three triangle facets (infrastructure, computational, interdisciplinary) have to be taken seriously at the highest levels and seen as an important component of academic research
•  Infrastructure needs to be integrated at all levels (laboratory, campus, regional, national, international) – users need to be able to easily move work and data to appropriate systems, and collaborate across locations
•  Education and training of students and faculty is crucial – vast improvements are needed over the small numbers currently reached through HPC center tutorials; computation and computational thinking need to be part of new curricula across all disciplines
•  Emphasis should be placed on broadening participation in computation, not just focusing on high-end systems where decreasing numbers of researchers can join in, but making tools much more easily usable and intuitive, freeing all researchers from the limitations of their personal workstations, and providing access to simple tools for large-scale parameter studies, data archiving, visualization and collaboration
•  Vision needs to be consistent – cannot be just one person
•  Funding needs to be stable (activities need to be sustainable)
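As a minimal sketch of the “simple tools for large-scale parameter studies” mentioned above, the Python below enumerates a full parameter grid and evaluates it in parallel. `model` is a hypothetical stand-in for a real simulation run, and a thread pool is used only to keep the example self-contained; CPU-bound runs would normally go to separate processes or a batch system.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def model(resolution: int, timestep: float) -> float:
    """Hypothetical stand-in for one simulation run, returning a scalar."""
    return resolution * timestep


def parameter_study(resolutions, timesteps, max_workers: int = 4) -> dict:
    """Evaluate `model` on every (resolution, timestep) combination."""
    grid = list(itertools.product(resolutions, timesteps))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda p: model(*p), grid))  # order matches grid
    return dict(zip(grid, scores))


results = parameter_study([64, 128], [0.1, 0.05])
for params, score in sorted(results.items()):
    print(params, score)
```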
  
Video
•  Data Sharing - https://www.youtube.com/watch?v=N2zK3sAtr-4
  
Sources
•  D. S. Katz et al., “Louisiana: A Model for Advancing Regional e-Science through Cyberinfrastructure,” Philosophical Transactions of the Royal Society A, 367(1897), 2009.
–  authors from Louisiana State University, Tulane University, University of Louisiana at Lafayette, Louisiana Tech University, Louisiana Community and Technical College System, Southern University, University of New Orleans
•  G. Allen and D. S. Katz, “Computational science, infrastructure and interdisciplinary research on university campuses: experiences and lessons from the Center for Computation and Technology,” NSF Workshop on Sustainable Funding and Business Models for Academic Cyberinfrastructure Facilities, Cornell University, 2010.
•  D. S. Katz and D. Proctor, “A Framework for Discussing e-Research Infrastructure Sustainability,” http://dx.doi.org/10.6084/m9.figshare.790767, submitted to Workshop on Sustainable Software for Science: Practice and Experiences (http://wssspe.researchcomputing.org.uk) at SC13.
•  Swift: Swift Team, led by Mike Wilde, http://www.ci.uchicago.edu/swift
•  Globus: Globus Team, led by Ian Foster and Steve Tuecke, http://www.globus.org
  

Advancing Science through Coordinated Cyberinfrastructure

  • 1.     www.ci.anl.gov   www.ci.uchicago.edu   Advancing  Science  through   Coordinated  Cyberinfrastructure   Daniel  S.  Katz   d.katz@ieee.org   Senior  Fellow,  ComputaBon  InsBtute,  University  of  Chicago  &  Argonne  NaBonal  Laboratory   Affiliate  Faculty,  Center  for  ComputaBon  &  Technology,  Louisiana  State  University   Adjunct  Associate  Professor,  Electrical  and  Computer  Engineering,  LSU    
  • 2. www.ci.anl.gov   www.ci.uchicago.edu   2   Advancing  Science  through  CI  –  d.katz@ieee.org   Topics   •  What  we  did  in  Louisiana  from  2006-­‐2010   •  What  I  would  do  differently  now   •  A  short  video  to  highlight  some  addiBonal  issues   that  I  hope  the  Center  for  ComputaBonal   Engineering  &  Sciences  will  keep  in  mind  
  • 3. www.ci.anl.gov   www.ci.uchicago.edu   3   Advancing  Science  through  CI  –  d.katz@ieee.org   Louisiana   •  Area: 134 382 km2 (33/51) •  Population: 4 533 000 (2010, 25/51) •  GDP: $208 billion (2009, 24/51) •  GDP/person: $45 700 (2009, 21/51) •  In Poverty: 17% (2009, 44/51) •  High School Degree: 82% (2009, 46/51) •  BS Degree: 21% (2009, 47/51) •  Advanced Degree: 7% (2009, 48/51) State  Goals:  talented  workforce,  great  compeBBveness,  strong   educaBonal  system,  increased  economic  development  
  • 4. www.ci.anl.gov   www.ci.uchicago.edu   4   Advancing  Science  through  CI  –  d.katz@ieee.org   PITAC  Report  Summary:     •  “ComputaBonal  science  -­‐-­‐  the  use  of   advanced  compuBng  capabiliBes  to   understand  and  solve  complex   problems  -­‐-­‐  is  criBcal  to  scienBfic   leadership,  economic  compeBBveness,   and  naBonal  security.  It  is  one  of  the   most  important  technical  fields  of  the   21st  century  because  it  is  essenBal  to   advances  throughout  society.”   •  “UniversiBes  must  significantly  change   organizaBonal  structures:     mulBdisciplinary  &  collaboraBve   research  are  needed  [for  US]  to  remain   compeBBve  in  global  science”   Complex  problems:    Innova1ons  will  occur  at  boundaries  
  • 5. www.ci.anl.gov   www.ci.uchicago.edu   5   Advancing  Science  through  CI  –  d.katz@ieee.org   Big  Science  and  Infrastructure   •  Higgs*  boson  discovery  announced  at  CERN  July  4,  2012   •  Instrument:  Large  Hadron  Collider  (LHC)   •  Infrastructure   –  CompuBng  Hardware:  Worldwide  LHC  CompuBng  Grid  (WLCG):  235,000  cores   across  36  countries,  including  OpenScience  Grid  (OSG,  US),  European  Grid   Infrastructure  (EGI,  Europe),  ...   –  Data:  ~20  PB  of  data  created  in  2011-­‐2012   –  Soiware:  grid  middleware,  physics  analysis  applicaBons,  ...   –  Networks   –  EducaBon  &   Training   •  Data  generated     centrally,  moved     (~3  PB/week)   across  mulB-­‐Bered     infrastructure  to  be     compuBng  upon  
  • 6. www.ci.anl.gov   www.ci.uchicago.edu   6   Advancing  Science  through  CI  –  d.katz@ieee.org   Big  Science  and  Infrastructure   •  Hurricanes  affect  humans   •  MulB-­‐physics:  atmosphere,  ocean,  coast,  vegetaBon,  soil   –  Sensors  and  data  as  inputs   •  Humans:  what  have  they  built,  where  are  they,  what  will  they  do   –  Data  and  models  as  inputs   •  Infrastructure:   –  Urgent/scheduled  processing,     workflow  systems   –  Soiware  applicaBons,  workflows   –  Networks   –  Decision-­‐support  systems,     visualizaBon   –  Data  storage,   interoperability  
  • 7. www.ci.anl.gov   www.ci.uchicago.edu   7   Advancing  Science  through  CI  –  d.katz@ieee.org   Long-­‐tail  Science  and  Infrastructure   •  Exploding  data  volumes  &  powerful   simulaBon  methods    mean  that  more   researchers  need  advanced  infrastructure   •  Such  “long-­‐tail”  researchers    cannot  afford   expensive  experBse  and  unique   infrastructure     •  Challenge:  Outsource  and/or  automate   Bme-­‐consuming  common  processes   –  Tools,  e.g.,  Globus  Online  and  data   management   o  Note:  much  LHC  data  is  moved  by  Globus  GridFTP,   e.g.,  May/June  2012,  >20  PB,  >20M  files   –  Gateways,  e.g.,  nanoHUB,  CIPRES,  access  to   scienBfic  simulaBon  soiware   NSF  grant  size,  2007.  (“Dark   data  in  the  long  tail  of   science”,  B.  Heidorn)  
  • 8. www.ci.anl.gov   www.ci.uchicago.edu   8   Advancing  Science  through  CI  –  d.katz@ieee.org   Long-­‐tail  Science  and  Infrastructure   •  CIPRES  Science  Gateway  for  PhylogeneBcs   –  Study  of  diversificaBon  of  life  and  relaBonships  among  living  things  through  Bme   •  Highly  used,  as  of  mid  2013:   –  Cited  in  at  least  400  publicaBons,  e.g.,  Nature,  PNAS,  Cell   –  More  than  5000  unique  users  in  3  years   –  Used  rouBnely  in  at  least  68  undergraduate  classes   –  45%  US  (including  most  states),  55%  70  other  countries   •  Infrastructure   –  Flexible  web  applicaBon   o  A  science  gateway,  uses  soiware  and  lessons  from  XSEDE  gateways  team,  e.g.,  idenBfy   management,  HPC  job  control   –  Science  soiware:  tree  inference  and  sequence  alignment   o  Parallel  versions  of  MrBayes,  RAxML,  GARLI,  BEAST,  MAFFT   o  PAUP*,  Poy,  ClustalW,  Contralign,  FSA,  MUSCLE,  ...   –  Data   o  Personal  user  space  for  storing     results   o  Tools  to  transfer  and  view  data   Credit:  Mark  Miller,  SDSC  
  • 9. www.ci.anl.gov   www.ci.uchicago.edu   9   Advancing  Science  through  CI  –  d.katz@ieee.org   Infrastructure  Challenges   •  Science   –  Larger  teams,  more  disciplines,  more  countries   •  Data     –  Size,  complexity,  rates  all  increasing  rapidly   –  Need  for  interoperability  (systems  and  policies)   •  Systems   –  More  cores,  more  architectures  (GPUs),  more  memory  hierarchy   –  Changing  balances  (latency  vs  bandwidth)   –  Changing  limits  (power,  funds)   –  System  architecture  and  business  models  changing  (clouds)   –  Network  capacity  growing;  increase  networks  -­‐>  increased  security   •  Soiware   –  MulBphysics  algorithms,  frameworks   –  Programing  models  and  abstracBons  for  science,  data,  and  hardware   –  V&V,  reproducibility,  fault  tolerance   •  People   –  EducaBon  and  training   –  Career  paths   –  Credit  and  avribuBon  
  • 10. www.ci.anl.gov   www.ci.uchicago.edu   10   Advancing  Science  through  CI  –  d.katz@ieee.org   Cyberinfrastructure   “Cyberinfrastructure  consists  of    compu1ng  systems,    data  storage  systems,      advanced  instruments  and      data  repositories,      visualiza1on  environments,  and      people,     all  linked  together  by  so@ware  and      high  performance  networks     to  improve  research  produc1vity  and  enable  breakthroughs      not  otherwise  possible.”              -­‐-­‐  Craig  Stewart    
  • 11. www.ci.anl.gov   www.ci.uchicago.edu   11   Advancing  Science  through  CI  –  d.katz@ieee.org   ComputaBonal  &  Data-­‐enabled     Science  &  Engineering  (CDS&E)   •  LIGO:    Laser  Interferometric  GravitaBonal  Wave   Observatory   •  Ties  together  theory,  computaBon,  and  experiment   –  Each  drives  the  other  two!  
  • 12. www.ci.anl.gov   www.ci.uchicago.edu   12   Advancing  Science  through  CI  –  d.katz@ieee.org   How  We  Started   •  State  commitment:  $25M/year  for  Vision  20/20   –  $9M:  LSU  -­‐>  CCT  (similarly,  ULL  -­‐>  LITE)   •  University  commitment  to  build  new  programs  for   21st  century   •  State  and  University  willingness  to  make   extraordinary  investments   •  Opportunity  to  build  new  world  class  program  in   interdisciplinary  research  and  educaBon,  involving   all  of  LSU   •  Ed  Seidel-­‐led  vision  to  insBgate  state-­‐wide   collaboraBon  
  • 13. www.ci.anl.gov   www.ci.uchicago.edu   13   Advancing  Science  through  CI  –  d.katz@ieee.org   Advancing  Research   •  PotenBally  requires  advances  in  three  areas,   depending  on  exisBng  strengths  
  • 14. www.ci.anl.gov   www.ci.uchicago.edu   14   Advancing  Science  through  CI  –  d.katz@ieee.org   CCT Director Office Edward Seidel HPC Partnership McMahon Cyberinfrastructure Development Katz Focus Areas Allen LONI Systems and Software Coast to Cosmos LSU HPC Performance Team Core Comp. Sci. Corporate Relations Blue Waters, etc. Material World Labs: ACAL, DSL, Viz, LCAT, … NSF TeraGrid Cultural Computing Visualization 14   CCT  OrganizaBon  
  • 15. www.ci.anl.gov   www.ci.uchicago.edu   15   Advancing  Science  through  CI  –  d.katz@ieee.org   Cyberinfrastructure  Development   •  Vision:  combine  research  and  infrastructure   –  Research   o  Computer  science   o  ApplicaBons   o  Tools   •  Both  together  have  squared  growth  of  either   alone   •  CyD  staff  –  PhDs  in  CS  and  apps  who  understand   the  whole  picture  and  want  to  grow  the   ecosystem   15   –  Infrastructure   o  Hardware   o  OperaBons   o  Policies  
  • 16. www.ci.anl.gov   www.ci.uchicago.edu   16   Advancing  Science  through  CI  –  d.katz@ieee.org   NaBonal  Lambda  Rail  UNO   Tulane  UL-­‐L   SUBR   LSU   LA  Tech       LONI:  40  Gbps  network   LONI:  ~100TF  IBM,  Dell   Supercomputers   Cybertools:  Tools  and   Services   CompuBng  in  Louisiana   LONI  InsBtute:  People   and  CollaboraBons   TeraGrid,  OSG  
  • 17. www.ci.anl.gov   www.ci.uchicago.edu   17   Advancing  Science  through  CI  –  d.katz@ieee.org   LONI  -­‐  Networking    CompuBng   LSU La TechLSU HSC ULL Tulane SU UNOLSU HSC LONI node Multiple 10GE ~500 core Dell cluster 112 proc. IBM P5 cluster ~4500 core Dell Cluster ULM McNeese NSU SLU Alex Network:  partners  and  customers  
  • 18. www.ci.anl.gov   www.ci.uchicago.edu   18   Advancing  Science  through  CI  –  d.katz@ieee.org   LONI  CompuBng  Resources  (2010)   •  One  central  Dell  cluster  (Queen  Bee)   –  5500  IB-­‐connected  cores  at  ISB  in  Baton  Rouge   –  Archival  storage  contracted  through  NCSA   –  50%  of  allocaBons  dedicated  to  TeraGrid  from  2008       •  Six  distributed  512-­‐core  Dell  clusters   •  Five  distributed  14-­‐node  (112  procs)  IBM  P5-­‐575  clusters   •  Distributed  PetaShare  storage   –  32  TB  disk  @  each  small  Dell  cluster   –  8  TB  disk  on  LSU    LaTech  small  Dell  clusters  –  for  LBRN   –  8  TB  at  SC-­‐S    HSC-­‐NO  –  for  LBRN   –  250  TB  tape   •  All  run  by  HPC@LSU,  including  user  support/training  
  • 19. www.ci.anl.gov   www.ci.uchicago.edu   19   Advancing  Science  through  CI  –  d.katz@ieee.org   $12M  NSF  CyberTools  Project:  Enabler  and  Driver  
  • 20. www.ci.anl.gov   www.ci.uchicago.edu   20   Advancing  Science  through  CI  –  d.katz@ieee.org   Cactus   •  Component-­‐based     HPC  framework     –  Freely-­‐available     environment  for     collaboraBve  applicaBon     development   •  Cuzng  edge  CS   –  Grid  compuBng,  petascale,  accelerators,  steering,  remote  viz   •  AcBve  user    developer  communiBes   –  10  year  pedigree,  $10M  support   –  Numerical  RelaBvity,  CFD,  Coastal,  Reservoir  Engineering,  …   •  Domain-­‐specific  toolkits,  e.g.  CFD  toolkit   –  FD/FV/FE  numerical  methods   –  Structured,  mulB-­‐block,  unstructured   –  Uses  PETSc,  Trilinos,  MUMPS,  HYPRE   –  Used  to  build  Black  Oil  Toolkit  
  • 21. www.ci.anl.gov   www.ci.uchicago.edu   21   Advancing  Science  through  CI  –  d.katz@ieee.org   PetaShare   •  Main  concept:  data  is  managed  (migrated,  moved,  replicated,  cached,  etc.)     automaBcally   •  Data-­‐aware  storage  systems,  data-­‐aware  schedulers,  cross-­‐domain  metadata   scheme   •  Provides:  250  TB  disk,  400  TB  tape     storage  (and  access  to  naBonal     storage  faciliBes)   •  ApplicaBons:     coastal    environmental     modeling,     geospaBal  analysis,     bioinformaBcs,     medical  imaging,     fluid  dynamics,     petroleum  engineering,     numerical  relaBvity,     high  energy  physics.         Credit:  Tevfik  Kosar  
• 22. LONI Institute: "CCT for Louisiana"
  • $15M 5-year project
    – $7M BoR, $8M from LaTech, LSU, SUBR, Tulane, UNO, ULL
  • Catalyzes new inter-institutional collaborations, ambitious projects and top-level hires:
    – LONI network and computing
    – NSF projects: PetaShare, VizTangibles, TeraGrid, Blue Waters
    – EPSCoR: NSF CyberTools, DOE UCoMS, DoD
    – NIH: $17M LBRN
    – Promote collaborative research at interfaces for innovation
• 23. LONI Institute Vision
  • LONI investments create world-leading infrastructure
  • Create bold new inter-university superstructure
    – New faculty, staff, students; train others. Focus on CS, Bio, Materials, but all disciplines impacted
    – Promote research at interfaces for innovation
  • Draw on, enhance strengths of all universities
    – Strong groups recently created; collectively world-class
    – Solve complex problems through collaboration & computation
    – Much stronger recruiting opportunities for all institutions
    – Statewide interdisciplinary education & research program
  • Create University-Industry Research Centers (UIRCs)
    – Research Triangle, NCSA/UIUC, Bay Area, others
  • Transform Louisiana
    – Such committed cooperation between sites is extraordinary
• 24. LONI Institute Hiring and Projects
  • Two new faculty at each institution (12 total)
    – Six in CS, six in Comp. Bio/Materials
  • Six Computational Scientists
    – Following Bavarian KONWIHR project
    – Support 70–90 projects over five years; lead to external funding
  • Graduate students
    – 36 new students funded, trained; two years each
  • One Coordinator/economic development
  • All hiring coordinated across state
  • Leading faculty across state create multi-institutional seed projects
  • Building on seeds, dozens of new projects selected, started
  • Exploit common themes, computing environments, tools found in all areas
• 25. TeraGrid (XSEDE)
  • TeraGrid: world's largest open scientific discovery infrastructure
  • Leadership-class resources at eleven partner sites combined to create an integrated, persistent computational resource
    – High-performance networks
    – High-performance computers (1 Pflops (~100,000 cores) – 1.75 Pflops)
      o And a Condor pool (w/ ~13,000 CPUs)
    – Visualization systems
    – Data Collections (30 PB, 100 discipline-specific databases)
    – Science Gateways
    – User portal
    – User services – Help desk, training, advanced app support
  • Allocated to US researchers and their collaborators through national peer-review process
    – Generally, review of computing, not science
  • Mid 2011: TeraGrid → XSEDE
• 26. Campus Champions
  • A "Champion" is a staff or faculty member on a campus who provides information on XSEDE to his/her colleagues
  • Currently ~160 institutions represented by champions
  • Champions get:
    – Monthly training and updates
    – Start-up accounts
    – Forum for sharing and interactions
    – Access to information on usage by local users
    – Registrations for annual XSEDE Conference waived
  • Champions do:
    – Raise awareness locally
    – Provide training
    – Get users started with access quickly
    – Represent needs of local community
    – Provide feedback to improve services
    – Attend annual XSEDE conference
    – Share their training and education materials
    – Build community across campus, and among all Champions
  [Map: Campus Champion Institutions, March 26, 2014 (revised March 22, 2014). Standard – 87; EPSCoR States – 51; Minority Serving Institutions – 12; EPSCoR States and Minority Serving Institutions – 8; Total Campus Champion Institutions – 158]
  Credit: Kay Hunt
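The map legend's category counts are mutually consistent. The four categories sum to the stated total, which matches the "~160 institutions" bullet. Treating the categories as disjoint is an assumption, suggested by the separate "EPSCoR States and Minority Serving Institutions" category.

```latex
\underbrace{87}_{\text{standard}}
+ \underbrace{51}_{\text{EPSCoR}}
+ \underbrace{12}_{\text{MSI}}
+ \underbrace{8}_{\text{EPSCoR and MSI}}
= 158
```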
• 27. LONI and National Cyberinfrastructure
  • TeraGrid
    – One of the 11 TeraGrid Resource Providers
    – Playing a role in TG-wide governance (TeraGrid Forum, Executive Steering Committee, various working groups, GIG Director of Science)
    – Contributed administrative software AmieGold (glue between TG account info and local info) and CS software (HARC, PetaShare, SAGA)
  • OSG
    – Currently providing resources
  • XSEDE
    – LONI not a partner in XSEDE, but a service provider
  • Nationally
    – Bringing in new users from the southeast US
    – LONI Institute Computational Scientists – Campus Champions
• 28. NSF Vision: Infrastructure Role & Lifecycle
  • Create and maintain a CI ecosystem providing new capabilities that advance and accelerate scientific inquiry at unprecedented complexity and scale
  • Support the foundational research needed to continue to efficiently advance CI
  • Enable transformative, interdisciplinary, collaborative science and engineering research and education through the use of advanced CI
  • Transform practice through new policies for CI addressing challenges of academic culture, open dissemination and use, reproducibility and trust, curation, sustainability, governance, citation, stewardship, and attribution of authorship
  • Develop a next-generation diverse workforce of scientists and engineers equipped with essential skills to use and develop CI, with CI used in both the research and education process
• 29. Relevant NSF Programs
  • EPSCoR – targeted support for states that are less successful in NSF funding
  • MRI – Major Research Instrumentation
  • CIF21 (NSF's CI umbrella)
    – eXtreme Digital (XD)
    – Track 1 (Blue Waters)
    – Software Infrastructure for Sustained Innovation (SI2)
    – Campus Cyberinfrastructure – Network Infrastructure and Engineering (CC-NIE)
  • Integrative Graduate Education and Research Traineeship Program (IGERT)
  • General research programs
• 30. Recap (to 2010)
  • Louisiana decides that science and technology can lead to a better future
  • Builds a regional cyberinfrastructure (network, computing, software, ~data, people) that connects to national-scale infrastructure
    – Using a mix of national, state, and local funding
  • Starts to change culture – infuse computation in academic departments, interdisciplinary hiring, large collaborative projects
  • But...
  • Didn't really think about data as much as we would have were we starting again today
• 31.
  • Swift is designed to compose large parallel workflows, from serial or parallel application programs, to run fast and efficiently on a variety of platforms
    – A parallel scripting system for Grids and clusters for loosely coupled applications – programs (executable, shell, Python, R, Octave, Matlab, etc.) linked by exchanging files
    – Easy to write: simple high-level C-like functional language, allows small Swift scripts to do large-scale work
    – Easy to run: contains all services for running, in one Java application
      o Works on multicore workstations, HPC, Grids (interfaces to schedulers, Globus, ssh)
    – A powerful, efficient, scalable and flexible execution engine
      o Scaling O(10M) tasks – 0.5M in live science work, and growing
      o Collective data management being developed to optimize I/O
  • Used in earth science, neuroscience, proteomics, molecular dynamics, biochemistry, economics, statistics, knowledge modeling, and more
  • http://www.ci.uchicago.edu/swift
  M. Wilde, M. Hategan, J. M. Wozniak, B. Clifford, D. S. Katz, I. Foster, "Swift: A language for distributed parallel scripting," Parallel Computing, v. 37(9), pp. 633–652, 2011.
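Swift itself is not reproduced here. As a rough Python analogue under stated assumptions, the following sketch treats each "application program" as a separate process whose only interface is its input and output files. Independent invocations run concurrently, much as a Swift foreach over a list of inputs would. The inline summing program, the file names, and the function names are hypothetical.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Stand-in "application program" (hypothetical): reads whitespace-separated
# numbers from argv[1] and writes their sum to argv[2]. Any executable could
# take its place; the tasks are coupled only through files.
SUM_APP = (
    "import sys\n"
    "with open(sys.argv[1]) as f:\n"
    "    nums = [float(x) for x in f.read().split()]\n"
    "with open(sys.argv[2], 'w') as f:\n"
    "    f.write(str(sum(nums)))\n"
)


def run_app(inp: Path, out: Path) -> Path:
    """Run one loosely coupled task as its own process; fail loudly on error."""
    subprocess.run([sys.executable, "-c", SUM_APP, str(inp), str(out)], check=True)
    return out


def parallel_map_files(inputs: list[Path], workdir: Path, workers: int = 4) -> list[Path]:
    """Foreach-style map: launch every independent task concurrently."""
    outs = [workdir / f"{p.stem}.out" for p in inputs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(run_app, inputs, outs))
```

Threads suffice here because each task's real work happens in a child process. Swift goes further by shipping such tasks to remote schedulers and managing the files for you.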
• 32. Swift Programming model: all execution driven by parallel data flow
  • analyze1() and analyze2() are computed in parallel
  • analyze() returns r when they are done
  • This parallelism is automatic
  • Works recursively throughout the program's call graph
    – E.g., can embed within foreach loop, itself done in parallel
    – Foreach loops can be nested

    (int r) analyze(int i)
    {
      j = analyze1(i);
      k = analyze2(i);
      r = 0.5*(j + k);
    }
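For readers more familiar with Python, here is a minimal sketch of the same data-flow idea using `concurrent.futures`. It assumes `analyze1` and `analyze2` are independent, side-effect-free tasks; their bodies here are hypothetical stand-ins. Unlike Swift, where this parallelism is implicit, Python requires submitting each call explicitly and then blocking on `result()`.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def analyze1(i: int) -> int:
    time.sleep(0.1)  # stand-in for a long-running application (hypothetical)
    return i * 2


def analyze2(i: int) -> int:
    time.sleep(0.1)  # stand-in for a second, independent application
    return i + 10


def analyze(pool: ThreadPoolExecutor, i: int) -> float:
    # Neither call depends on the other, so both are submitted before waiting.
    j = pool.submit(analyze1, i)
    k = pool.submit(analyze2, i)
    # r is computed only once both inputs are available (a data-flow dependency).
    return 0.5 * (j.result() + k.result())
```

With `ThreadPoolExecutor(max_workers=2)`, `analyze(pool, 4)` overlaps the two 0.1 s calls and returns 0.5 × (8 + 14) = 11.0.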
• 33. Swift Environment
  • Swift runtime system has drivers and algorithms to efficiently support and aggregate vastly diverse runtime environments
  [Diagram: Swift script on a submit host (login node, laptop, Linux server); data server; application programs; clouds: Amazon EC2, XSEDE Wispy, FutureGrid, …]
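One way to picture a runtime that aggregates diverse environments is a pluggable driver layer. The script hands tasks to a uniform interface, and per-environment drivers decide how to run them. This is a hypothetical Python sketch of that design pattern, not Swift's actual driver API. `Driver`, `LocalDriver`, `DryRunDriver`, and `dispatch` are illustrative names, and the dry-run driver stands in for a real scheduler or cloud back end.

```python
import subprocess
import sys
from abc import ABC, abstractmethod


class Driver(ABC):
    """Hypothetical driver interface: one implementation per runtime environment."""

    @abstractmethod
    def submit(self, argv: list[str]) -> str:
        """Run (or enqueue) one application invocation; return a result/status."""


class LocalDriver(Driver):
    """Runs the application directly on the submit host."""

    def submit(self, argv: list[str]) -> str:
        done = subprocess.run(argv, capture_output=True, text=True, check=True)
        return done.stdout


class DryRunDriver(Driver):
    """Stand-in for a cluster or cloud driver: records tasks instead of sending them."""

    def __init__(self, site: str) -> None:
        self.site = site
        self.log: list[list[str]] = []

    def submit(self, argv: list[str]) -> str:
        self.log.append(argv)
        return f"queued on {self.site}"


def dispatch(tasks: list[list[str]], drivers: list[Driver]) -> list[str]:
    """Spread tasks round-robin over drivers; callers never see which ran where."""
    return [drivers[n % len(drivers)].submit(t) for n, t in enumerate(tasks)]
```

Adding a new environment then means writing one more `Driver` subclass. The scripts that call `dispatch` stay unchanged. Round-robin placement is only the simplest possible policy and is used here for illustration.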
• 34. Globus
  Big data transfer and sharing…
  …with Dropbox-like simplicity…
  …directly from your own storage systems
  Run as a non-profit service to the non-profit research community
• 35. Globus Users
  • "I need a good place to store or back up my (big) research data, at a reasonable price."
  • "I need to easily, quickly, and reliably move or mirror portions of my data to other places, including my campus HPC system, lab server, desktop, laptop, XSEDE, cloud, etc."
  • "I need a way to easily and securely share my data with my colleagues at other institutions."
  • "I want to publish my data so that it's available and discoverable long-term."
  • "I want to archive my data in case it's needed sometime in the future."
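The requirement to "easily, quickly, and reliably move or mirror" data usually comes down to two things: transferring only what has changed, and verifying each copy end to end. The following local-filesystem sketch shows both ideas using SHA-256 checksums. It is not the Globus API; `mirror` and `sha256` are hypothetical helper names.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path: Path, chunk: int = 1 << 20) -> str:
    """Checksum a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def mirror(src: Path, dst: Path) -> list[str]:
    """Copy new or changed files from src to dst, verifying each copy.

    Returns the relative paths that were actually transferred.
    """
    copied = []
    for f in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = f.relative_to(src)
        target = dst / rel
        want = sha256(f)
        if target.exists() and sha256(target) == want:
            continue  # already mirrored and identical: skip (incremental sync)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)
        if sha256(target) != want:  # end-to-end integrity check
            raise OSError(f"checksum mismatch after copying {rel}")
        copied.append(rel.as_posix())
    return copied
```

If you run `mirror` twice on the same directories, the second run copies nothing. After one source file changes, only that file is sent again. A managed service adds retries, parallel streams, and cross-site authentication on top of this basic pattern.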
• 36. Globus is SaaS
  • Web, command line, and REST interfaces
  • Reduced IT operational costs
  • New features automatically available
  • Consolidated support & troubleshooting
  • Easy to add your laptop, server, cluster, supercomputer, etc. with Globus Connect
• 37. Globus Connected Resources on Campus
  • Research computing center
  • Department / lab storage
  • Campus-wide home/project file system
  • Mass Storage Systems
  • Science instruments
  • Desktops and laptops
  • Custom web applications
  • Amazon Web Services S3
• 38. Lessons
  • The three triangle facets (infrastructure, computational, interdisciplinary) have to be taken seriously at the highest levels and seen as an important component of academic research
  • Infrastructure needs to be integrated at all levels (laboratory, campus, regional, national, international) – users need to be able to easily move work and data to appropriate systems, and to collaborate across locations
  • Education and training of students and faculty are crucial – vast improvements are needed over the small numbers currently reached through HPC center tutorials; computation and computational thinking need to be part of new curricula across all disciplines
  • Emphasis should be placed on broadening participation in computation: not just focusing on high-end systems, where decreasing numbers of researchers can join in, but making tools much more usable and intuitive, freeing all researchers from the limitations of their personal workstations, and providing access to simple tools for large-scale parameter studies, data archiving, visualization and collaboration
  • Vision needs to be consistent – cannot be just one person
  • Funding needs to be stable (activities need to be sustainable)
• 39. Video
  • Data Sharing – https://www.youtube.com/watch?v=N2zK3sAtr-4
• 40. Sources
  • D. S. Katz et al., "Louisiana: A Model for Advancing Regional e-Science through Cyberinfrastructure," Philosophical Transactions of the Royal Society A, 367(1897), 2009.
    – Authors from Louisiana State University, Tulane University, University of Louisiana at Lafayette, Louisiana Tech University, Louisiana Community and Technical College System, Southern University, University of New Orleans
  • G. Allen and D. S. Katz, "Computational science, infrastructure and interdisciplinary research on university campuses: experiences and lessons from the Center for Computation and Technology," NSF Workshop on Sustainable Funding and Business Models for Academic Cyberinfrastructure Facilities, Cornell University, 2010.
  • D. S. Katz and D. Proctor, "A Framework for Discussing e-Research Infrastructure Sustainability," http://dx.doi.org/10.6084/m9.figshare.790767, submitted to the Workshop on Sustainable Software for Science: Practice and Experiences (http://wssspe.researchcomputing.org.uk) at SC13.
  • Swift: Swift Team, led by Mike Wilde, http://www.ci.uchicago.edu/swift
  • Globus: Globus Team, led by Ian Foster and Steve Tuecke, http://www.globus.org