CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
1. CYVERSE: TRANSFORMING LIFE SCIENCE RESEARCH VIA CYBERINFRASTRUCTURE
Matthew Vaughn @mattdotvaughn
Director, Life Sciences Computing, TACC
Co-PI CyVerse, Araport, Jetstream Cloud
9/8/2016 1
2. OVERVIEW
• WHAT IS CYVERSE?
• HOW IS IT TRANSFORMATIONAL FOR LIFE SCIENCES RESEARCH?
• HOW DOES IT FIT INTO THE BIGGER SCHEME?
• WHAT DIRECTIONS AND CHALLENGES ARE IN ITS FUTURE?
3. CYVERSE IS A CYBERINFRASTRUCTURE
Vision: Transforming science through data-driven discovery
Mission: To design, develop, deploy, and expand a national cyberinfrastructure for life science research, and to train scientists in its use
4. SUPPORTED BY THE NSF BIO DIRECTORATE
• Division of Biological Infrastructure
• $100 Million, 10-year investment
• CyVerse resources are
– Freely available to the community
– Intended to spur national and international collaboration for research and education
iPlant 2008: Empowering a New Plant Biology
iPlant 2013: Cyberinfrastructure for Life Science
CyVerse 2016: Transforming Science Through Data-Driven Discovery
NSF awards DBI-0735191 and DBI-1265383
9. RESEARCH TEAMS NEED THIS
Store, organize, share primary data
Do basic analysis
Store, organize, share data products
Generate and explore hypotheses
Share analysis code with the scientific public
Integrate results from new experiments
Publish data alongside plots, visualizations and
analytical tools
10. BUT END UP DEALING WITH THIS
Data lifecycle management
Fine-grained permission management
Discoverability
Version control
Taming promising new analysis codes (usually based on immature technology)
Paying for storage, cycles, and consulting
Making their science reproducible
12. CYVERSE PRODUCT MATRIX
Atmosphere: User-provisioned, highly configurable cloud computing environment tailored for sciences
Discovery Environment: Web-accessible analysis workbench and gateway to national HPC infrastructure (XSEDE)
Bisque: Software for managing, analyzing and visualizing high throughput imaging data
Data Store: Scalable data storage for managing and sharing data across CyVerse’s CI and external data resources
Science APIs: Automation interfaces to connect data and computation for rapid integration
DNA Subway: Classroom-friendly bioinformatics teaching platform. Also used as a graduate teaching platform.
Powered by CyVerse: Third-party applications built on CyVerse’s foundational services and external resources.
13. EMPOWER USERS AT ALL LEVELS
Bioinformatics Specialist
Computing Professional
Bench Scientist
Help them avoid data and operations silos
Welch et al. 2013
15. IMPACTS
• 500+ publications
• >2PB user data stored
• 40+k registered users
• Millions of compute
hours annually
• Hundreds of trainees
16. CYVERSE IS A HUB IN A RICH & COLLABORATIVE ECOSYSTEM
• Using
• Collaborating
• Contributing
• Supporting
• Inventing
17. CURRENT INITIATIVES
Enabling Data-Driven Discovery. Providing Advanced Training to Researchers. Removing
Barriers to Reproducible Science.
CyVerse Data Commons
Portable Science Lab
Intensive Engagement
18. CYVERSE DATA COMMONS
Make research data discoverable and reusable. Ensure it ends up stored in its natural repository.
CyVerse Data Store: Publish in place simply by sharing
Staging Area: Curate, format, describe metadata
Data Commons Portal: Published snapshot with DOI and open access
Natural Repositories: Facilitated deposit to NCBI-SRA, GenBank, and more
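The flow on this slide can be sketched as a small state model. This is only an illustration, assuming the four boxes are read as a left-to-right progression and that metadata must be described before a snapshot is published. The class names, the `advance` method, and the dataset name are hypothetical and are not part of any CyVerse API.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # The four boxes named on the slide.
    DATA_STORE = "CyVerse Data Store"
    STAGING_AREA = "Staging Area"
    COMMONS_PORTAL = "Data Commons Portal"
    NATURAL_REPOSITORY = "Natural Repositories"


# Assumed ordering: each stage hands off to the next one on the slide.
NEXT_STAGE = {
    Stage.DATA_STORE: Stage.STAGING_AREA,
    Stage.STAGING_AREA: Stage.COMMONS_PORTAL,
    Stage.COMMONS_PORTAL: Stage.NATURAL_REPOSITORY,
}


@dataclass
class Dataset:
    """Hypothetical record tracking where a dataset sits in the flow."""
    name: str
    stage: Stage = Stage.DATA_STORE
    metadata: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move the dataset to the next stage, enforcing the assumed rules."""
        if self.stage not in NEXT_STAGE:
            raise ValueError(f"{self.name} is already in its final stage")
        # Assumption: curation in the staging area requires some metadata
        # before a snapshot can be published.
        if self.stage is Stage.STAGING_AREA and not self.metadata:
            raise ValueError("describe metadata before publishing a snapshot")
        self.history.append(self.stage)
        self.stage = NEXT_STAGE[self.stage]
        return self.stage


if __name__ == "__main__":
    ds = Dataset("example_reads")  # hypothetical dataset name
    ds.advance()                   # shared in place, now in the staging area
    ds.metadata["title"] = "Example sequencing run"
    ds.advance()                   # published snapshot in the portal
    ds.advance()                   # deposited to a natural repository
    print([s.value for s in ds.history], "->", ds.stage.value)
```

Modeling the stages as an enum with an explicit transition table keeps the assumed ordering in one place, so a different reading of the slide (for example, deposit to natural repositories in parallel with the portal) would only require changing `NEXT_STAGE`.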
19. PORTABLE SCIENCE LAB
Continue adoption of technologies to describe, encapsulate, and share research code and data.
Virtual machines, Linux containers, Web Service APIs, Workflow Standards
Integrated via Interactive, Narrative Notebooks
21. SUMMARY
• CyVerse is a reference model for cyberinfrastructure that is already
being extended to other disciplines
• CyVerse provides a vertically integrated, scalable data-to-discovery
cyberinfrastructure that leverages existing federal and state
investments to transform life science research
• CyVerse is driving technological and operational innovation via a
web of interactions and collaborations with other projects,
platforms, and infrastructures.
22. KEY CHALLENGE - CYVERSE VALUE PROPOSITION
“Are you still going to be around in 3 years?”
“Why did my analysis fail? Don’t you have big computers?”
“Shouldn’t we just go to Amazon Web Services?”
“I don’t want my students spending time learning computing.”
“Why aren’t you working on X?”
Everyone’s generating TERASCALE DATA
There aren’t thousands of locations capable of computing at this scale
Collaborative teams are geographically dispersed
iPlant can HELP
But now we need flexibility! Jetstream doesn’t solve hardware issues but is aimed at other challenging aspects.
“Create detailed spatial-temporal molecular atlas (RNA, proteins, metabolites) of the developing lung”
Here are their high level requirements. Seems familiar, right?
They’re inevitably bogged down in these kinds of details…
all while their NEED for computing is outpacing their resources
Ah-ha, you say. They should just move to the cloud!