1. Advancing Life Sciences Research
with High Performance Computing
and Cyberinfrastructure
Ian Stokes-Rees
Harvard Medical School
SHOW - Making Biology Binary, June 2010
4. Science Behind the Movie
Multi-scale
Data intensive
Dynamic
Models
Simulation
Analysis
Ian Stokes-Rees, NEBioGrid, Harvard Medical School June 23rd, 2010
5. Water channel through aquaporin tetramer in lipid bilayer
Tajkhorshid, E., Nollert, P., Jensen, M.O., Miercke, L.J., O'Connell, J., Stroud, R.M., and Schulten, K. (2002). Science 296, 525-530
6. Molecular Dynamics
Computationally intensive
Necessarily parallel
Nanosecond scale today
Millisecond to second tomorrow
Rapidly growing interest
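A short sketch can make the nanosecond-to-second gap concrete. It assumes a 2 fs integration step, a common choice for biomolecular MD with constrained bonds, counts the steps needed for each timescale, and shows velocity Verlet, a standard MD integrator, on a toy 1-D harmonic oscillator. The function names, units and parameters are illustrative assumptions, not taken from any particular MD package.

```python
# Minimal sketch: why molecular dynamics is computationally intensive.
# Assumption: a 2 fs timestep, typical for biomolecular MD with constrained bonds.

TIMESTEP_S = 2e-15  # 2 femtoseconds, expressed in seconds


def steps_needed(simulated_seconds: float, dt: float = TIMESTEP_S) -> int:
    """Number of integration steps required to cover a simulated time span."""
    return round(simulated_seconds / dt)


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate one particle with velocity Verlet; force(x) returns the force at x."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # recompute the force at the new position
        v = v_half + 0.5 * dt * f / mass  # complete the velocity update
    return x, v


if __name__ == "__main__":
    for label, t in [("1 ns", 1e-9), ("1 ms", 1e-3), ("1 s", 1.0)]:
        print(f"{label}: {steps_needed(t):.2e} steps")
    # Toy harmonic oscillator (k = m = 1, arbitrary units); total energy stays near 0.5.
    k = 1.0
    x, v = velocity_verlet(1.0, 0.0, lambda pos: -k * pos, 1.0, 0.01, 1000)
    print("energy:", 0.5 * v * v + 0.5 * k * x * x)
```

Every step of a real simulation requires a force evaluation over the whole system, so going from 5e5 steps (1 ns) to 5e14 steps (1 s) is what forces MD onto parallel hardware.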
14. Boston Life Sciences
Universities
Hospitals
Pharmaceuticals
Research Institutes
Tufts University School of Medicine
15. Institutions and laboratories
Institutions: Washington U. School of Med., Cornell U., NE-CAT, Rosalind Franklin, NIH, UMass Medical, U. Washington, U. Maryland, Brandeis U., UC Davis, Tufts U., UCSF, Columbia U., Rockefeller U., Stanford, Yale U., CalTech, Harvard and Affiliates, Rice University, Vanderbilt Center for Structural Biology, WesternU, UCSD, Thomas Jefferson
Laboratories: R. Cerione, T. Ellenberger, B. Crane, R. Oswald, D. Fremont, S. Ealick, C. Parrish, M. Jin, H. Sondermann, D. Harrison, M. Mayer, A. Ke, T. Gonen, W. Royer, E. Toth, N. Grigorieff, H. Stahlberg, K. Heldwein, JJ Miranda, Q. Fan, Y. Cheng, R. MacKinnon, A. Brunger, K. Garcia, T. Boggon, K. Reinisch, T. Jardetzky, D. Braddock, J. Schlessinger, Y. Ha, F. Sigworth, E. Lolis, F. Zhou, P. Bjorkman, W. Clemons, N. Beglova, A. Leschziner, G. Jensen, S. Blacklow, K. Miller, D. Rees, E. Nikonowicz, B. Chen, A. Rao, Y. Shamoo, J. Chou, T. Rapoport, Y.J. Tao, J. Clardy, M. Samso, W. Chazin, C. Sanders, M. Eck, P. Sliz, M. Swairjo, B. Eichman, B. Spiller, B. Furie, T. Springer, M. Egli, M. Stone, R. Gaudet, G. Verdine, B. Lacy, M. Waterman, M. Grant, G. Wagner, T. Nakagawa, M. Ohi, S.C. Harrison, L. Walensky, H. Viadiu, J. Hogle, S. Walker, J. Williams, D. Jeruzalmi, T. Walz, D. Kahne, J. Wang, T. Kirchhausen, S. Wong
Not Pictured:
University of Toronto: L. Howell, E. Pai, F. Sicheri; NHRI (Taiwan): G. Liou; Trinity College, Dublin: Amir Khan
18. Grid Computing
Federated and scalable
Secure
Standardized
Compute sharing & cycle scavenging
Dynamic formation of collaborations
Data sharing
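Compute sharing and cycle scavenging work best for "bag of tasks" workloads: many independent jobs that can run wherever idle capacity appears. The sketch below is a hypothetical illustration of that pattern, not grid middleware. It assumes a static snapshot of free slots per site (the site names are made up) and fills those slots in repeated waves until every job is placed.

```python
# Hypothetical sketch of opportunistic ("cycle scavenging") dispatch of independent jobs.
# Site names and the static idle-slot snapshot are assumptions for illustration only.
from collections import deque


def dispatch(tasks, idle_slots):
    """Fill each site's idle slots in repeated waves until every task is placed.

    idle_slots maps a site name to its number of free job slots; real grid
    middleware would refresh this continuously rather than use one snapshot.
    Returns (per-site task lists, number of waves).
    """
    if not any(slots > 0 for slots in idle_slots.values()):
        raise ValueError("no idle slots available at any site")
    queue = deque(tasks)
    schedule = {site: [] for site in idle_slots}
    waves = 0
    while queue:
        waves += 1
        for site, slots in idle_slots.items():
            # Take at most `slots` tasks, and never more than remain in the queue.
            for _ in range(max(0, min(slots, len(queue)))):
                schedule[site].append(queue.popleft())
    return schedule, waves


if __name__ == "__main__":
    jobs = [f"job-{i:03d}" for i in range(10)]
    plan, waves = dispatch(jobs, {"site-a": 3, "site-b": 1})
    print(waves, {site: len(j) for site, j in plan.items()})  # 3 {'site-a': 8, 'site-b': 2}
```

Because the jobs are independent, no site needs to know about any other; the larger site simply absorbs more of the work.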
23. Acknowledgements
Piotr Sliz
PI and SBGrid team leader
Ian Levesque
Systems Architect
Ben Eisenbraun
Software Curator
Peter Doherty
Grid Administrator
Caitlin Colgrove
Intern Software Engineer
Steve Jahl
System Administrator
Ian Stokes-Rees, http://sbgrid.org
24. Summary
Compute power increasingly affordable
New computational techniques
New hardware (multi-core, GPU)
Grid and cloud computing
Fast networking, cheap storage
Scientists developing necessary skills
Be in touch - ijstokes@hkl.hms.harvard.edu
26. How to get structural biologists to use CI (cyberinfrastructure)
Ease of use
No command line
X.509 certificates (initial request, VOs, proxies, roles, etc.) are really complicated
Support infrastructure (mailing lists, tickets, phone, training)
Killer apps
They will use it if they see peers using it to advance scientific goals
They will use it if some novel workflows or workflow patterns are established
Data management is a big problem for everyone (see bonus slides, time permitting); we believe grid infrastructure could provide a solution
Security
Data needs to be secure ...
... but users still want to control sharing/access
Roadblocks
Reliability of underlying infrastructure and difficulty in debugging
Applications tied to GUIs, rudimentary interfaces
Ian Stokes-Rees, SBGrid, Harvard Medical School October 13th, 2009
27. Security Challenges
Identity Management
Mixture of .htpasswd, PAM, X.509, and application-specific IDs
Complexity of X.509 (and associated paraphernalia) confuses users
account creation, use, and management
Virtual Organization hierarchies and user-driven collaborations
Inheritance of rights/policies
How to allow users to easily create and manage groups
Merging security policies
Site/resource, VO, and user policies need to be merged
Encryption and Privacy Preservation
Generic mechanisms for encryption and key management
Preserving privacy of actions and data in federated grid environment
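Two of the challenges above, inheritance of rights in VO hierarchies and merging of site, VO and user policies, can be sketched in a few lines. This is a hypothetical model, not SBGrid's design: it assumes VO groups are slash-separated paths that inherit the grants of every ancestor group, and it assumes the simplest conservative merge rule, where an action is allowed only if the site, the VO and the data owner all allow it.

```python
# Hypothetical sketch: VO group inheritance plus merging of site, VO and user policies.
# Group paths and the "intersection" merge rule are assumptions, not SBGrid's design.


def inherited_rights(group: str, vo_rights: dict[str, set[str]]) -> set[str]:
    """Rights for a VO group path: its own grants plus those of every ancestor group."""
    parts = group.strip("/").split("/")
    rights: set[str] = set()
    for depth in range(1, len(parts) + 1):
        ancestor = "/" + "/".join(parts[:depth])  # "/sbgrid", then "/sbgrid/harvard", ...
        rights |= vo_rights.get(ancestor, set())
    return rights


def effective_rights(group, site_allowed, vo_rights, user_granted):
    """Assumed merge rule: permitted only if site, VO and user policies all permit it."""
    return set(site_allowed) & inherited_rights(group, vo_rights) & set(user_granted)


if __name__ == "__main__":
    vo = {"/sbgrid": {"read"}, "/sbgrid/harvard": {"write"}}
    site = {"read", "write", "admin"}  # what the resource owner permits on this site
    owner = {"read"}                   # what the data owner chose to share
    print(effective_rights("/sbgrid/harvard", site, vo, owner))
```

A real system would also need deny rules and conflict resolution between layers; intersection is only the easiest rule to reason about.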
28. Security Work
Metadata system
Provide more generic pointers to ACLs and encryption keys
Extension of GACL system
Include non-X.509 ID tokens as policy principals
Allow GACL policies to apply to web framework objects (pyGACL)
Simple replicated key system for file encryption
Use of metadata framework to point to encryption key (and replicas)
Use GACL to control key access (regular file)
Libraries to automatically read/write encrypted files
Future
VO hierarchies
Tools for user driven ACL management
Tools for policy management (merging site, VO and user policies)
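The security work items combine naturally: an ACL whose principals may be X.509 DNs or other ID tokens, and a metadata record that points to an encryption key and its replicas, with the ACL guarding key access. The sketch below is a hypothetical, simplified in-memory model loosely inspired by GACL-style allow/deny entries; it does not use the real GACL XML format or pyGACL, and the class names, credential types, deny-overrides rule and replica URLs are all assumptions. Actual encryption is left out so the example needs only the standard library.

```python
# Hypothetical sketch, loosely inspired by GACL-style allow/deny entries.
# Not the GACL XML format or pyGACL; all names and URLs here are illustrative.
from dataclasses import dataclass, field
from typing import Callable

Principal = tuple[str, str]  # (credential type, value), e.g. ("x509-dn", "/O=Example/CN=Alice")


@dataclass
class AclEntry:
    principal: Principal
    allow: set[str] = field(default_factory=set)
    deny: set[str] = field(default_factory=set)


def permitted(acl: list[AclEntry], credentials: set[Principal], perm: str) -> bool:
    """Deny-overrides (an assumption here): a matching deny wins over any matching allow."""
    granted = False
    for entry in acl:
        if entry.principal not in credentials:
            continue
        if perm in entry.deny:
            return False
        if perm in entry.allow:
            granted = True
    return granted


def fetch_key(meta: dict, credentials: set[Principal], fetch: Callable[[str], bytes]) -> bytes:
    """Check the key's ACL, then try each replica location listed in the metadata record."""
    if not permitted(meta["key_acl"], credentials, "read"):
        raise PermissionError("credentials may not read this key")
    failures = []
    for location in meta["key_replicas"]:
        try:
            return fetch(location)
        except OSError as exc:  # missing or unreachable replica: try the next one
            failures.append(f"{location}: {exc}")
    raise OSError("no key replica reachable: " + "; ".join(failures))


if __name__ == "__main__":
    store = {"https://site-b.example/keys/k1": b"secret-key"}  # site-a replica is "down"

    def fetch(location: str) -> bytes:
        if location not in store:
            raise FileNotFoundError(location)
        return store[location]

    meta = {
        "key_replicas": ["https://site-a.example/keys/k1", "https://site-b.example/keys/k1"],
        "key_acl": [AclEntry(("web-user", "alice"), allow={"read"})],
    }
    print(fetch_key(meta, {("web-user", "alice")}, fetch))  # b'secret-key'
```

Treating the key as just another ACL-protected file means the same policy machinery covers both data and keys, which matches the "regular file" note on the slide.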