Next Generation Biomedical Research using AWS
1. Public Sector Summit
Washington, DC
2. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved. Public Sector Summit
Next Generation Biomedical Research using AWS
Session ID 301096
Nick Weber
Program Manager, Cloud Services for Research
National Institutes of Health,
Center for Information Technology
James Wiggins
Sr. Solutions Architect,
Academic Medical Centers
Amazon Web Services
3.
Pillars of biomedical research
Global infrastructure
Security and compliance
Analytics and AI/ML
Life science/biomedical solutions
Partner/Collaboration ecosystem
4.
5.
AWS compliance programs
ISO 27017 (International Organization for Standardization)
ISO 27018 (International Organization for Standardization)
ISO 27001 (International Organization for Standardization)
ISO 9001 (International Organization for Standardization)
AWS Artifact is a free, online portal for self-service access to download AWS compliance reports and manage select agreements.
Getting Started>>
FAQ>>
Documentation>>
All Assurance Programs>>
6.
AWS Guidance for HIPAA
HIPAA Security Controls Matrix
AWS HIPAA Whitepaper
HIPAA Reference Architecture
7.
Stanford University: Deep Learning
• Early detection of diabetic retinopathy
• Leading cause of blindness in adults
• If caught early enough, preventable 90% of the time
“Before AWS, we couldn’t even attempt these projects… AWS makes research liberating.”
Jason Su, Stanford University Student
8.
Diagnosing Skin Cancer Better than Humans
The Stanford AI Lab uses deep learning to detect skin cancer at physician level, or better.
• Trains an algorithm to pinpoint markers of skin cancer even with variations in lighting, camera angle, and zoom
• Can be adjusted for sensitivity and specificity, depending on what the human users are looking for
9.
Detecting Heart Disease with Deep Learning
HeartFlow creates personalized medical technology using deep learning to help diagnose heart disease.
• Analyzes CT scans to create an accurate 3D model of a patient’s heart and coronary arteries
• Simulates the flow of blood in each vessel
• 100% non-invasive solution means 60% of patients can avoid an angiogram, reducing healthcare system costs by 25%
10.
New Research Approaches
Emory University launched The RHEDcloud Project, which helps identify security controls, provide common implementation frameworks, and implement the common automation required to integrate cloud platforms with on-premises security, network, and identity management infrastructure.
UCSF School of Medicine implemented a secure research computing environment on AWS for PHI. It increased infrastructure delivery speed by 90x, allowed for 2x the capacity, and reduced costs by 30-50%.
11.
OHDSI
• Observational Health Data Sciences and Informatics (OHDSI)
• Developed by an open-source community
• Train machine learning models to predict patient health outcomes
• Analyze treatment pathways
• Automatically deploy a complete enterprise environment on AWS in ~1 hour or a training environment in ~5 mins
12.
Medical Image De-identification
• Amazon Rekognition provides computer vision
• Amazon Comprehend Medical provides natural language processing (NLP)
• No algorithm development or model training required
• Implemented in less than 100 lines of Python
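A minimal sketch of how the two services on this slide could be combined, assuming the boto3 SDK and configured AWS credentials. `detect_text` (Amazon Rekognition) and `detect_phi` (Amazon Comprehend Medical) are existing boto3 calls; the helper names, the choice to redact whole text lines, and the 0.5 score cutoff are illustrative assumptions, not the presenters' implementation.

```python
"""Sketch: find image regions whose burned-in text looks like PHI."""


def text_lines(rekognition_response):
    """Return (text, bounding_box) pairs for each LINE Rekognition detected."""
    return [
        (d["DetectedText"], d["Geometry"]["BoundingBox"])
        for d in rekognition_response["TextDetections"]
        if d["Type"] == "LINE"
    ]


def join_with_offsets(lines):
    """Join line texts with newlines; return the text and each line's (start, end)."""
    spans, pos = [], 0
    for text, _ in lines:
        spans.append((pos, pos + len(text)))
        pos += len(text) + 1  # +1 for the newline separator
    return "\n".join(text for text, _ in lines), spans


def boxes_to_redact(lines, spans, phi_entities, min_score=0.5):
    """Return bounding boxes of lines that overlap a sufficiently confident PHI entity."""
    boxes = []
    for (_, box), (start, end) in zip(lines, spans):
        for ent in phi_entities:
            overlaps = ent["BeginOffset"] < end and ent["EndOffset"] > start
            if overlaps and ent["Score"] >= min_score:
                boxes.append(box)
                break
    return boxes


def find_phi_regions(image_bytes, min_score=0.5):
    """Call both services; requires boto3 and AWS credentials (assumption)."""
    import boto3  # imported lazily so the pure helpers above run without it

    rekognition = boto3.client("rekognition")
    comprehend_medical = boto3.client("comprehendmedical")
    lines = text_lines(rekognition.detect_text(Image={"Bytes": image_bytes}))
    if not lines:
        return []
    text, spans = join_with_offsets(lines)
    entities = comprehend_medical.detect_phi(Text=text)["Entities"]
    return boxes_to_redact(lines, spans, entities, min_score)
```

The returned bounding boxes use Rekognition's relative coordinates (fractions of image width and height), so a caller would scale them to pixels before masking those regions.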
13.
Medical Image Classification
• Use machine learning to detect diseases shown by medical images
• ML models created using Amazon SageMaker
• Train a model on over 100,000 images in about 9 hours using large GPU instances
• Generalized pattern for model training and threshold selection can be applied to any image modality
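The slide does not say how the threshold is selected. One common approach, sketched here as an assumption, is to pick the highest score cutoff that still meets a target sensitivity and then report the specificity it yields. The function names and sample data are illustrative only.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Treat score >= threshold as a positive prediction (labels: 1 = disease)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def pick_threshold(scores, labels, target_sensitivity):
    """Return (threshold, sensitivity, specificity) for the highest cutoff meeting the target.

    Sensitivity can only grow as the cutoff drops, so scanning candidate
    cutoffs from high to low finds the highest qualifying one first.
    Returns None if no cutoff reaches the target.
    """
    for threshold in sorted(set(scores), reverse=True):
        sens, spec = sensitivity_specificity(scores, labels, threshold)
        if sens >= target_sensitivity:
            return threshold, sens, spec
    return None


if __name__ == "__main__":
    # Made-up validation scores and labels, for illustration only.
    scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
    labels = [1, 1, 0, 1, 0, 0]
    print(pick_threshold(scores, labels, target_sensitivity=1.0))
```

Raising the target sensitivity lowers the cutoff and usually costs specificity, which is the trade-off a screening application has to weigh.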
14.
Amazon Comprehend Medical
15.
16.
Accelerate time to research
Project
Design your Research Infrastructure
Work with IT to build the Research Infrastructure
IRB
Certify Data Security protocols and audits
Install your research tools
Search for Data Sets
Load the data set into the research environment
Research
Opportunity for increased agility
17.
Current state of self-service
I Need a Server
AWS console
Broad Choices…
Amazon S3, Amazon EC2, Amazon EMR, Amazon DynamoDB, Amazon Kinesis Data Analytics, AWS Marketplace, Amazon RDS, AWS Lambda, AWS IoT Core, AWS CloudFormation, Amazon Redshift, AWS Service Catalog
Requires Security Policy
Time consuming
Incorrectly tagged
Cost overruns
18.
Self-service with preconfigured compliance
Constrains
Security controls
Parameter validation
IAM assignment
Tag enforcement
Standardizes best practices
JSON or YAML
AWS Services
AWS Marketplace third-party products
Customer-Created AWS-Based Solution
AWS Service Catalog
Admin
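Service Catalog products are commonly defined as AWS CloudFormation templates, which is where the "JSON or YAML" on this slide comes in. The sketch below builds such a template in Python and prints it as JSON, showing parameter validation (`AllowedValues`, `AllowedPattern`) and a required tag. The resource name, approved instance types, and cost-center format are hypothetical.

```python
import json

# Hypothetical list of instance types an administrator has approved.
ALLOWED_INSTANCE_TYPES = ["t3.medium", "m5.large"]


def research_server_template():
    """Build a CloudFormation template (as a dict) for a constrained EC2 server."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Research server with preconfigured guardrails (sketch)",
        "Parameters": {
            # Parameter validation: only approved instance types can be chosen.
            "InstanceType": {
                "Type": "String",
                "Default": ALLOWED_INSTANCE_TYPES[0],
                "AllowedValues": ALLOWED_INSTANCE_TYPES,
            },
            "ImageId": {"Type": "AWS::EC2::Image::Id"},
            # Hypothetical cost-center format, e.g. "AB-1234".
            "CostCenter": {
                "Type": "String",
                "AllowedPattern": "^[A-Z]{2}-[0-9]{4}$",
            },
        },
        "Resources": {
            "ResearchServer": {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "InstanceType": {"Ref": "InstanceType"},
                    "ImageId": {"Ref": "ImageId"},
                    # The cost-center tag is always applied to the instance.
                    "Tags": [{"Key": "CostCenter", "Value": {"Ref": "CostCenter"}}],
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(research_server_template(), indent=2))
```

The template covers only parameter validation and tagging. In Service Catalog itself, IAM assignment and tag enforcement are typically configured separately, through launch constraints and TagOptions on the portfolio or product.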
19.
New state of self-service
20.
AWS enables researcher self-service
[Architecture diagram; labels recovered from the figure:]
University Organization (AWS Landing Zone / Control Tower): Centralized Identities, Audit Logging, Security Policies, Cost Tracking
Research Lab A
Lab A Genomics Account (Production, PHI): Cromwell (genomics), Data Access
Research Lab B
Lab B Pop Health Account (Sandbox, Non-PHI): OHDSI (bioinformatics), Data Access
Lab B HPC Account (Production, PHI): SLURM (hpc), Data Access
Public Data Sets, Internal Data Sets, Research Partners, Results, Data Lake, Other Data Connectivity
Security Configuration
AWS Service Catalog (shown three times)
IT/SecOps
21.
Researcher Autonomy and Selection
Open-source tools
A variety of automation is freely available for common open-source tools in Service Catalog.
• Cromwell (GATK)
• OMOP Common Data Model
• Observational Health Data Sciences and Informatics (OHDSI) tools
• Project REDCap
• Hail, FastQC for genomics
• Hadoop ecosystem tools like Spark
• RStudio, Jupyter Notebooks
• HPC orchestrators like CfnCluster
• ML frameworks like TensorFlow, MXNet, etc.
• WordPress, Drupal
• and more…
Commercial Tools
Tools available through the AWS Marketplace can be provisioned via Service Catalog.
• Illumina DRAGEN
• SAS
• Tableau
• Hortonworks
• Corda Enterprise Blockchain
• Informatica
• CloudyCluster
• Alces Flight
• MathWorks MATLAB
• Teradata
• Univa Grid Engine
• and more…
Internal Tools
Customize your own automation.
• Create automation for any of your existing tools.
• Automate the application of your organization’s security best practices.
22.
Research data lake
Research Data
Lake Formation: Data Catalog, Access Control
Collaborating Researchers
23.
Public Research Data Sets
AWS hosts a variety of public datasets that anyone can access for free. Below are just a few examples.
• 1000 Genomes Project
• The Cancer Genome Atlas
• International Cancer Genome Consortium
• 3000 Rice Genomes Project
• Genome in a Bottle (GIAB)
• The Genome Modeling System
• Medicare Drug Spending
• The Human Connectome Project
• The Human Microbiome Project
• OpenNeuro
• PhysioNet
• Tabula Muris
• OpenStreetMap
• and more…
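Many of these datasets are stored in Amazon S3 buckets that allow anonymous reads. A minimal sketch of listing a few objects without AWS credentials, assuming boto3 is installed; the bucket name in the usage note is an assumption and should be checked against the dataset's entry in the Registry of Open Data on AWS.

```python
def anonymous_s3_client():
    """Create an S3 client that sends unsigned (anonymous) requests."""
    import boto3  # lazy imports: only needed when actually calling AWS
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", config=Config(signature_version=UNSIGNED))


def list_keys(s3_client, bucket, prefix="", limit=10):
    """Return up to `limit` object keys under `prefix` in a bucket."""
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=limit)
    # "Contents" is absent from the response when nothing matches.
    return [obj["Key"] for obj in response.get("Contents", [])]
```

For example, `list_keys(anonymous_s3_client(), "1000genomes")` would list the first few keys if the 1000 Genomes data is published under that bucket name. Passing the client as an argument keeps `list_keys` easy to exercise with a stub.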
24.
Collaboration: Biomedical research
Nick Weber
Program Manager, Cloud Services for Research
National Institutes of Health, Center for Information Technology
Session 301096
25.
NIH Mission
…to seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability.
26.
Data Infrastructure
• Optimize data storage and security
• Connect NIH data systems
Modernized Data Ecosystem
• Modernize repositories
• Support storage/sharing of individual datasets
• Better integrate clinical/observational data into biomedical data science
Data Management, Analytics, and Tools
• Make tools useful/generalizable/accessible
• Broaden utility of, and access to, specialized tools
• Improve discovery and cataloging resources
Workforce Development
• Enhance the NIH data science workforce
• Expand the national research workforce
• Engage a broader community
Stewardship and Sustainability
• Develop policies for a FAIR data ecosystem
• Enhance stewardship
• Identify and maintain high-use, high-value data
https://datascience.nih.gov
27.
Making Data FAIR
Findable: Must have unique identifiers, effectively labeling data within searchable resources
Accessible: Must be easily retrievable via open systems with effective and secure authentication and authorization procedures
Interoperable: Should “use and speak the same language” via use of standardized vocabularies
Reusable: Must be adequately described to a new user, have clear information about data-usage licenses, and have a traceable “owner’s manual” (provenance)
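The four FAIR criteria on this slide can be turned into a rough checklist over a dataset's metadata record. The sketch below is illustrative only: the field names are hypothetical, and it is not an NIH or community FAIR metric.

```python
# Hypothetical metadata fields standing in for each FAIR criterion.
FAIR_CHECKS = {
    "Findable": ["identifier", "keywords"],
    "Accessible": ["access_url", "auth_method"],
    "Interoperable": ["vocabulary"],
    "Reusable": ["description", "license", "provenance"],
}


def fair_report(metadata):
    """Map each FAIR principle to the required fields that are missing or empty."""
    return {
        principle: [field for field in fields if not metadata.get(field)]
        for principle, fields in FAIR_CHECKS.items()
    }


def is_fair(metadata):
    """True when no principle has a missing field under this toy checklist."""
    return all(not missing for missing in fair_report(metadata).values())
```

A record lacking a `license` entry, for instance, would be reported under "Reusable" and fail `is_fair`.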
28.
A Cloud-based Model Promotes FAIRness
[Diagram] Traditional Model: Data, with four separate “Researcher & Tools” groups
[Diagram] Cloud-based Model: “Data & Tools” together, with three Researchers
29.
Use of Cloud for NIH Research is Greatly Increasing
• Many NIH programs are using the cloud to store and compute on data
• Supports increasing size and complexity of data
• Has robust compute and analytical tools that are constantly evolving
• Provides the ability to share information among geographically distributed groups
• Allows researchers to focus on what they do best!
BUT… cloud adoption doesn’t address all of the aforementioned research challenges
• The way the data are stored and managed is often unique to each NIH program
• Not enough attention is paid to data organization, structure, access, utility, findability, and reusability
• Data are the byproduct, whereas the end goal is scientific findings and journal articles
• The result is a reduced ability to use/reuse the data, both within and across programs
30.
Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability (STRIDES)
Harnessing the power of commercial cloud computing and providing NIH researchers access to the most advanced computational infrastructure, tools, and services
31.
The STRIDES Initiative: Overview
• A series of public-private sector relationships to obtain cloud-based storage, computing, and related services at cost-effective rates for NIH and NIH-funded researchers
• Aimed at significantly lowering the barriers to entry for accessing and computing against biomedical research data
• Administrative obstacles
• Technical obstacles
• Financial obstacles
• Facilitates access to the most advanced computational infrastructure, tools, and services available today in order to accelerate biomedical research and discovery
• Includes mechanisms to explore unique opportunities to collaborate with commercial partners on the development of new ways to expand access to, and the value of, high-value research data
32.
Research Data or Experimental Hypothesis
Discovery & Planning Phase
• Identify scope, budget, etc.
• Determine appropriate services
• Agree to enroll in STRIDES
CSP Setup & Billing Phase
• Finalize funding
• Establish cloud account(s)
Data Migration Phase
• Identify requirements, develop migration plan
• Migrate data, tools
• Establish data access
Data Use & Research Phase
• Conduct analyses and collaborative research
• Initiate data lifecycle management
Scientific Knowledge & Innovation
33.
Data Infrastructure
Modernized Data Ecosystem
Data Management, Analytics, and Tools
Workforce Development
Stewardship and Sustainability
https://datascience.nih.gov
34.
Much coordination and work needed to realize this vision!
• Metadata standards—consistency and provenance
• Harmonized data—combined, cleaned, and updated
• Cross-cutting metadata models—querying within and across programs
• FAIR assessment—tools to assess/improve FAIRness of data (common metrics)
• Authentication/Authorization—robust permissions to access/use controlled-access data
• Data dashboards—monitoring data management activities
• Data portal—directory to available data sets
• Analysis platforms—cloud-based workspaces supporting end user interactions
• Training—materials for end users to help analyze and understand data
35.
36.
SRA
Framingham Heart Study
…and so many others!
37.
Developing data-driven platforms that integrate large amounts of genomic and clinical data from different disease types.
Empowering the collaborative discovery, engagement, and necessary partnerships across disease communities that are crucial for progress in our biological understanding of diseases.
Enabling rapid translation to personalized treatments for patients diagnosed with childhood cancer or structural birth defects.
Accelerating discovery of genetic causes and shared biologic pathways within and across these conditions.
Supported by
38.
MCP is a collaboration with Amazon Web Services that aims to improve access to and analysis of data from the Human Microbiome Project.
~5 TB of Human Microbiome Project data, hosted in a public dataset at no cost
Data analytic tools: researchers can analyze data online
Supported by
https://nephele.niaid.nih.gov/
39.
What might we discover when we can link individuals’ electronic health care records with their personal data, alongside clinical and basic research data?
40.
Too many to count!
NIH Leadership Team: Andrea Norris, Jim Anderson, Susan Gregurick*, Betsy Wilder, Scott Jackson, Teresa Marquette, Jeff Snyder, Kate O’Sullivan, Ann Gawalt, Taylor Gilliland, Belinda Seto, Vivien Bonazzi*
STRIDES Initiative Team & Friends: Tom Shaw, James Davis, Nigel Horne, Todd Reilly, Antej Nuhanovic, Matt Gieseke, Valerie Virta, Joel Peterson, Simon Twigger, Jen Yttri, Michael Ojiere
AWS Team: Sanjay Padhi, Eric Egan, Jamie Baker, Kevin Froelke, Brett McMillen, Marcy Collinson, Jeff Cole, Matt Wascak, Gargi Chhatwal (and more!)
Four Points Technology Team: Joel Lipkin, Dana Sawyer, Michelle Kosiorek, Laurie Lucas, Emily Jones
Many thanks for additional valued support from The MITRE Corporation, Grant Thornton, BioTeam, and the CIT Communications and Outreach team!
*Including for my shameless borrowing of slide content & ideas!
41.
Thank you!
James Wiggins: wiggjame@amazon.com
Kaushik Mohanty: kmohanty@amazon.com
Nick Weber: nick.weber@nih.gov