© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Brian O’Connor
Technical Director - Analysis Core
UCSC Genomics Institute
Nov 28th, 2016
Large-scale, Cloud-based Analysis of Cancer Genomes
Lessons Learned from the PCAWG Project
Overview
Past Present Future
PCAWG: A Cloud-Based, Distributed Collaboration
● International Cancer Genome Consortium (ICGC)
● ~5,800 Whole Genomes
– ~2,800 Cancer Donors
– ~1,300 with RNASeq data
– Goal is to consistently analyze data
● 8 sites storing and sharing data via GNOS
– 300TB -> 900TB
● 14 Cloud (and HPC) environments
– 3 Commercial, 7 OpenStack, 4 HPC
– ~630 VMs, ~15K cores, ~60TB of RAM
PCAWG Cloud Analysis “Core” Workflows
PCAWG Lessons Learned
1. Commercial cloud policies
2. Portable tools
3. Failure-tolerant, distributed execution infrastructure
4. Commercial cloud costs
Lesson 1: Commercial Cloud Policies
• PCAWG analysis showed the power of clouds
• Key policy changes enabled commercial cloud usage
• NIH updated dbGaP cloud policy - March 2015
• ICGC DACO updated ICGC cloud policy - May 2015
• Partnerships with commercial cloud entities
• Amazon Public Datasets Program
• Seven Bridges
• DNAnexus
PCAWG Cloud Analysis Architecture
[Diagram labels: Sequencing Projects, GNOS, Metadata Index, Cloud Orchestrator, Work Orders, Academic Compute Centers, AWS Cloud Compute (Spot Instances)]
PCAWG Analysis Architecture & AWS
[Diagram labels: Sequencing Projects, GNOS, Metadata Index, Cloud Orchestrator, Work Orders, Academic Compute Centers, AWS Cloud Compute (Spot Instances), Amazon S3, DNAnexus, Seven Bridges]
Represents a major shift: ICGC data is now redistributed within Amazon’s cloud
Lesson 2: Portable Tools
Containerized workflows for portability between sites
Core Workflows
Alignment: BWA-Mem
Variant Calling: Broad, DKFZ/EMBL, and Sanger
https://github.com/ICGC-TCGA-PanCancer
Lesson 3: Fault-Tolerant Cloud Execution
Architecture 1.0, Architecture 2.0, Architecture 3.0
● cloud-based clusters
● gluster distributed filesystem
● scheduling per cloud
● single-node workers
● no distributed filesystem
● ansible for setup
● a complete rethink
Lesson 4: Cloud Costs
Workflow       Hardware (cores / machine)   Runtimes                      Cost on AWS
BWA            8 cores (16 GB RAM)          5 days (± 5) per specimen     $11.16
Sanger         8 cores (32 GB RAM)          4 days (± 3) per donor        $17.22
DKFZ / EMBL    16 cores (64 GB RAM)         2 days (± 0.6) per donor      $12.80
Broad          32 cores (256 GB RAM)        2.6 days per donor            $20.48

Workflow       Storage required per donor
BWA            240 GB
Sanger         4 GB
DKFZ / EMBL    5 GB
Total          249 GB
Data analysis: Create a cloud commons, Nature 2015
$62/donor
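The per-donor figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below only adds up the numbers listed in the two tables. That the $62/donor figure is the rounded sum of the four workflow costs is an assumption, and the BWA cost is listed per specimen, so a donor with several specimens would cost more than this sum.

```python
# Per-workflow AWS costs copied from the cost table (USD).
workflow_cost_usd = {
    "BWA": 11.16,        # listed per specimen, counted once here (assumption)
    "Sanger": 17.22,
    "DKFZ/EMBL": 12.80,
    "Broad": 20.48,
}

# Per-donor storage copied from the storage table (GB).
storage_gb = {"BWA": 240, "Sanger": 4, "DKFZ/EMBL": 5}

total_cost = round(sum(workflow_cost_usd.values()), 2)
total_storage = sum(storage_gb.values())

print("compute: $%.2f per donor" % total_cost)
print("storage: %d GB per donor" % total_storage)
```

The storage total reproduces the 249 GB row of the table, and the compute total lands just under the $62 quoted on the slide.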
ICGC PCAWG Legacy
Publications soon
AWS Public Datasets Program
~1,400 PCAWG Donors
- BAM (~70% of ICGC donors)
- VCF from all three pipelines
- more ICGC data uploaded regularly
https://dcc.icgc.org/icgc-in-the-cloud
The Present
Goal: to formalize lessons from PCAWG into reusable tools
Dockstore: Tool/Workflow Sharing
Toil: Workflow Execution
Redwood: File Storage
Ecosystem of Tools
Redwood - Scalable Storage
Key features: based on the ICGC Storage Service; supports FUSE, BAM slicing, and highly parallel access; typically used in a WORM (write once, read many) pattern
[Diagram labels: client, Authentication & Storage Services, Amazon EC2 instance, Amazon S3, AWS cloud]
Redwood - Storage System Performance
The Redwood Storage System (and the underlying S3) provided a stable and secure mechanism to store and use genomic data
Example run of ~100 simultaneous downloads saw ~45-100 MB/s
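Highly parallel access to object storage usually means fetching several byte ranges of one file at the same time. The sketch below is not Redwood's client; it only shows how a file could be split into ranges and expressed as standard HTTP `Range` headers, which Amazon S3 accepts on GET requests. The function names are hypothetical.

```python
def split_into_ranges(size_bytes, parts):
    """Split [0, size_bytes) into at most `parts` contiguous, inclusive byte ranges."""
    if size_bytes <= 0 or parts <= 0:
        return []
    chunk = -(-size_bytes // parts)  # ceiling division
    ranges = []
    start = 0
    while start < size_bytes:
        end = min(start + chunk, size_bytes) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Build the HTTP header that requests bytes start..end inclusive."""
    return {"Range": "bytes=%d-%d" % (start, end)}


# Each (start, end) pair could be handed to a separate worker or thread.
example_ranges = split_into_ranges(10, 3)
example_headers = [range_header(s, e) for s, e in example_ranges]
```

BAM slicing follows the same idea at a higher level: only the byte ranges that cover the requested genomic region need to be read.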
Dockstore.org - Sharing Tools & Workflows
Dockstore:
● Share tools and workflows
● Package tools with Docker, describe with CWL/WDL
● PCAWG goal: provide our tools via Dockstore
http://dockstore.org and https://github.com/ga4gh
Dockstore Architecture
Built on DockerHub/Quay.io and GitHub/BitBucket
Adds metadata to address shortcomings for bioinformatics workflows
CWL/WDL is the natural choice for Descriptor
Dockstore 1.0 Release
Highlighted New Features
Support for 1.0.0 GA4GH Tool Registry API
Support for displaying, sharing, and natively launching CWL 1.0 & WDL tools
Preliminary support for CWL/WDL workflows
Full list of updates since 0.4-beta.4 at https://github.com/ga4gh/dockstore/releases
New Content
ICGC PanCancer Analysis of Whole Genomes (PCAWG) tools
• BWA-mem, Sanger, Delly, DKFZ
Dockstore Tour
Search
Main Page
Tool Management
Running Dockstore Tools
Execution with the Dockstore Command Line Interface (CLI)
The goal was something simple, with the same process accessible via other execution systems!
1. Provision input files
2. Pull Docker images
3. Execute tool with inputs using CWL
4. Provision output files somewhere
Other execution systems: Seven Bridges, Curoverse, Galaxy, Consonance, etc.
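The four launch steps can be sketched as a plan of commands. This is not the Dockstore CLI itself, whose commands are not shown in this transcript. It assumes local input files copied with `cp`, the standard `docker pull` command, and the `cwltool` reference runner with its `--outdir` option. The function name, paths, and image name are hypothetical, and nothing is executed.

```python
def build_launch_plan(image, descriptor, job_file, input_paths, output_dir):
    """Return the four launch steps as (label, argv) pairs without running them."""
    return [
        # Step 1: stage inputs next to the tool (assumes they are local files).
        ("provision input files", ["cp"] + list(input_paths) + ["work/inputs/"]),
        # Step 2: fetch the container image the tool is packaged in.
        ("pull Docker image", ["docker", "pull", image]),
        # Step 3: run the CWL descriptor against the job (inputs) file.
        ("execute tool with inputs using CWL",
         ["cwltool", "--outdir", "work/outputs", descriptor, job_file]),
        # Step 4: copy results to their final destination.
        ("provision output files", ["cp", "-r", "work/outputs/.", output_dir]),
    ]


plan = build_launch_plan(
    image="quay.io/example/tool:1.0",   # hypothetical image name
    descriptor="tool.cwl",
    job_file="job.json",
    input_paths=["sample.bam"],
    output_dir="results/",
)
for label, argv in plan:
    print(label + ": " + " ".join(argv))
```

Keeping the plan as data rather than running it directly is one way the same process could be handed to a different execution system.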
Simple Dockstore Command Line
Coming Soon to Dockstore
Workflow DAG view
Testing PCAWG
Test Data
“Launch With…”
• Consonance
• Commercial partner(s)
Signed Dockers
Cross site indexing
See Roadmap:
https://goo.gl/4D9a8F
Toil - Efficient Compute on AWS
● A system for large-scale, efficient work on AWS
● Toil recently completed a 30K core, 20K sample recompute
● Per-job granularity allows for better efficiency and robustness
The job graph in Toil can be either statically or dynamically declared.
Toil - Dynamic DAGs
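The difference between a statically and a dynamically declared job graph can be illustrated with a toy scheduler. This is plain Python written for illustration, not Toil's API; `ToyJob`, `run_graph`, and the job names are hypothetical.

```python
class ToyJob(object):
    """A node in a job graph; `fn` may return extra child jobs at run time."""

    def __init__(self, name, fn=None):
        self.name = name
        self.fn = fn
        self.children = []

    def add_child(self, job):
        self.children.append(job)
        return job


def run_graph(root):
    """Run jobs depth-first and return the order in which they executed."""
    order = []
    stack = [root]
    while stack:
        job = stack.pop()
        order.append(job.name)
        discovered = job.fn(job) if job.fn else []
        # Children declared up front (static) plus those created while running (dynamic).
        for child in reversed(job.children + list(discovered)):
            stack.append(child)
    return order


# Static: the whole graph is known before anything runs.
static_root = ToyJob("align")
static_root.add_child(ToyJob("call_variants"))


# Dynamic: the number of children depends on a value seen at run time.
def split_by_chromosome(job):
    chromosomes = ["chr1", "chr2", "chr3"]  # stand-in for a run-time result
    return [ToyJob("call_" + c) for c in chromosomes]


dynamic_root = ToyJob("align", fn=split_by_chromosome)

static_order = run_graph(static_root)
dynamic_order = run_graph(dynamic_root)
```

In the dynamic case the scheduler cannot know how many jobs exist until the parent has run, which is why per-job granularity matters for this style of workflow.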
Toil - Spark & ADAM Integration
[Diagram: Amazon EC2 instances, one master and four slave nodes]
User scripts are written in pure Python:

from toil.job import Job

def helloWorld(message, memory="2G", cores=2, disk="3G"):
    return "Hello, world!, here's a message: %s" % message

# Wrap the function as a Toil job, passing its argument.
j = Job.wrapFn(helloWorld, "You did it!")

if __name__ == "__main__":
    parser = Job.Runner.getDefaultArgumentParser()
    options = parser.parse_args()
    # Prints Hello, world!, ...
    print(Job.Runner.startToil(j, options))
Toil - Accessible to New Developers
● Toil can be installed on any system with Python 2.7
● Built-in support for various batch systems, a few thanks in part to open-source community support!
○ Mesos
○ SGE (GridEngine)
○ UCSC’s Parasol
○ Single Machine Mode
○ LSF
○ SLURM
● All batch systems can be used interchangeably with any of the job stores
Toil - Portable
● Cloud-based job stores are designed to handle many concurrent workers
● Mesos has been shown to scale to 50k simulated nodes in Amazon Elastic Compute Cloud (EC2)
● Workers try to reduce interactions with the master by scheduling jobs locally
Toil - Scalable
● Jobs are checkpointed upon completion, allowing for resumability after job failure
● Toil’s jobstore can resume from any combination of leader/worker failure
● Toil currently supports job stores for:
○ Shared file systems
○ AWS (Amazon S3 + Amazon SimpleDB)
○ Experimental support for Azure / Google Cloud
Toil - Robust to Failures
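Checkpointing on job completion can be illustrated with a toy job store that records each finished job on disk, so a restarted run skips work that is already done. This is a sketch of the general idea in Python 3, not Toil's job store; `FileJobStore` and the job names are hypothetical, and a shared file stands in for S3 + SimpleDB.

```python
import json
import os
import tempfile


class FileJobStore(object):
    """Persist results of completed jobs in a JSON file."""

    def __init__(self, path):
        self.path = path
        self.done = {}
        if os.path.exists(path):
            with open(path) as handle:
                self.done = json.load(handle)

    def run(self, name, fn):
        # A job that was checkpointed earlier is not executed again.
        if name in self.done:
            return self.done[name]
        result = fn()
        self.done[name] = result
        # Write to a temporary file, then swap it in, so a crash mid-write
        # never leaves a half-written checkpoint behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as handle:
            json.dump(self.done, handle)
        os.replace(tmp_path, self.path)
        return result


calls = []


def step(name):
    def fn():
        calls.append(name)
        return name + " ok"
    return fn


store_path = os.path.join(tempfile.mkdtemp(), "jobstore.json")

first_run = FileJobStore(store_path)
first_run.run("align", step("align"))

# Simulate a leader restart: a new store object reloads the checkpoint.
second_run = FileJobStore(store_path)
resumed = second_run.run("align", step("align"))      # skipped, result reloaded
second_run.run("call_variants", step("call_variants"))
```

After the restart only `call_variants` actually executes, because `align` was already recorded as complete.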
Toil in Action
20,000 RNA-seq Sample Recompute
Scalable and robust to failure
Toil RNA-seq Recompute
The Future
PCAWG showed the power of cloud for large scientific analysis
Current work with Redwood, Dockstore, and Toil formalized lessons learned and methodologies
Our future work focuses on establishing standards from our previous work and applying these to future larger-scale efforts
Tool Registry API
● Formalizing the standard with the GA4GH through the Containers and Workflows Task Team, implemented in Dockstore
● Basic read API with extended support for write and search
[Diagram labels: Tool(s) (Docker + descriptor), CWL/WDL Conventions, API Standard to Share (GET list, GET search, POST register)]
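The three operations named on the slide (list, search, register) can be mimicked with an in-memory toy. This is not the GA4GH Tool Registry API or Dockstore's implementation; the class, field names, and example tool are hypothetical and only show the read and write split.

```python
class ToyToolRegistry(object):
    """In-memory stand-in for a registry of Docker-packaged tools."""

    def __init__(self):
        self._tools = {}

    def register(self, tool_id, docker_image, descriptor_url):
        """Counterpart of 'POST register': add a tool and its descriptor."""
        if tool_id in self._tools:
            raise ValueError("tool already registered: " + tool_id)
        entry = {"id": tool_id, "image": docker_image, "descriptor": descriptor_url}
        self._tools[tool_id] = entry
        return entry

    def list(self):
        """Counterpart of 'GET list': every tool, sorted by id."""
        return [self._tools[key] for key in sorted(self._tools)]

    def search(self, text):
        """Counterpart of 'GET search': case-insensitive match on the id."""
        needle = text.lower()
        return [tool for tool in self.list() if needle in tool["id"].lower()]


registry = ToyToolRegistry()
registry.register("example/bwa-mem", "quay.io/example/bwa-mem:1.0", "bwa-mem.cwl")
registry.register("example/delly", "quay.io/example/delly:1.0", "delly.cwl")
matches = registry.search("BWA")
```

A real registry would serve these operations over HTTP; the toy keeps only the shape of the interface.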
Emerging GA4GH API Standards
Further work of the Containers and Workflows Task Team
Workflow/Task Execution APIs
POST new task
GET task status
GET task stderr/stdout
[Diagram labels: Tools or WDL/CWL Workflow (Docker + JSON), API Standard to Execute, Cloud-specific Implementation, outputs (stderr, stdout, file(s), status)]
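The three task operations can be sketched with a toy service that runs a local command and keeps its status and logs. This is not the GA4GH task execution API; the class, method names, and status strings are hypothetical. The task runs synchronously as a plain subprocess instead of in a Docker container on a cloud backend. It needs Python 3.5 or later for `subprocess.run`.

```python
import subprocess
import sys
import uuid


class ToyTaskService(object):
    """Run commands locally and remember their status and logs."""

    def __init__(self):
        self._tasks = {}

    def post_task(self, argv):
        """Counterpart of 'POST new task': run argv and return a task id."""
        task_id = str(uuid.uuid4())
        proc = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            universal_newlines=True,
        )
        self._tasks[task_id] = {
            "status": "COMPLETE" if proc.returncode == 0 else "ERROR",
            "stdout": proc.stdout,
            "stderr": proc.stderr,
        }
        return task_id

    def get_status(self, task_id):
        """Counterpart of 'GET task status'."""
        return self._tasks[task_id]["status"]

    def get_logs(self, task_id):
        """Counterpart of 'GET task stderr/stdout'."""
        task = self._tasks[task_id]
        return {"stdout": task["stdout"], "stderr": task["stderr"]}


service = ToyTaskService()
ok_id = service.post_task([sys.executable, "-c", "print('hello')"])
bad_id = service.post_task([sys.executable, "-c", "import sys; sys.exit(3)"])
```

Separating submission from status and log retrieval is what lets a client stay the same while the implementation behind the API changes per cloud.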
GA4GH Containers & Workflow Vision
Toil
Dockstore.org
Redwood
- GA4GH Containers & Workflows Task Team
- Broad Institute
- Cincinnati Children’s Hospital
- Curoverse
- European Bioinformatics Institute
- Intel
- Institute for Systems Biology
- Google, Microsoft, and Amazon
- Ontario Institute for Cancer Research
- Oregon Health and Science University
- Seven Bridges Genomics
- University of California Santa Cruz
● Lincoln Stein, Josh Stuart, Gad Getz, Peter Campbell, Jan Korbel - PCAWG
● Vincent Ferretti - Storage
● Denis Yuen - Dockstore
● Kyle Ellrott - Task API
● Peter Amstutz - Workflow API and Co-leader
● Jeff Gentry - Co-leader
● Hannes Schmidt, Frank Nothaft & the Toil Team
Acknowledgements
Software Availability
Dockstore (Tool/Workflow Sharing): https://dockstore.org/
Toil (Workflow Execution): https://toil.readthedocs.io
Redwood (File Storage): https://github.com/icgc-dcc/dcc-storage
All three projects are open source and welcome your contributions
The AWS Perspective
Enabling science
Scalable compute resource only when needed
Time to result was greatly reduced
Cost of analysis was greatly reduced
Data is able to be securely shared in place
Global community access
Open data as a platform
[Diagram labels: Data Creation, Data Enrichment, Sensemaking; Data at Rest (object storage), Basic APIs, Complex APIs; consumer applications, algorithmic policy, data-driven journalism, data catalogs, focused data dashboards, predictive modeling, visualizations; lower cost of knowledge (efficiency)]
Open data as a platform
[Same diagram as on the previous slide, with added labels: BAM, gVCF, Wig, GFF, and several question marks]
Amazon S3 for science
[Diagram labels: Amazon S3 Data Lake, Data Science Sandbox, Visualization / Reporting]
Public datasets on AWS
To enable more innovation, AWS hosts a selection of datasets that anyone
can access for free. Data in our public datasets is available for rapid
access to our flexible and low-cost computing resources.
Earth Science
• Landsat
• NEXRAD
• NASA NEX
Life Science
• TCGA & ICGC
• 1000 Genomes
• Genome in a Bottle
• Human Microbiome Project
• 3000 Rice Genome
Internet Science
• Common Crawl Corpus
• Google Books Ngrams
• Multimedia Commons
https://aws.amazon.com/public-datasets/
Serverless Science with AWS Lambda
AWS Lambda: serverless, event-driven compute service
No Servers to Manage
AWS Lambda automatically runs your code without requiring you to provision or manage servers. Just write the code and upload it to Lambda.
Continuous Scaling
AWS Lambda automatically scales your application by running code in response to each trigger. Your code runs in parallel and processes each trigger individually, scaling precisely with the size of the workload.
Subsecond Metering
With AWS Lambda, you are charged for every 100ms your code executes and the number of times your code is triggered. You don't pay anything when your code isn't running.
Key Scenarios
Data processing: Stateless processing of discrete or streaming updates to your data-store or message bus
Control systems: Customize responses and response workflows to state and data changes within AWS
App backend development: Execute server side backend logic in a cross platform fashion
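The data processing scenario can be sketched as a minimal Python Lambda handler. It assumes the function is triggered by Amazon S3 event notifications, whose records carry the bucket name and object key. The bucket and key in the sample event are hypothetical, and the handler only collects object locations where a real function would process each object.

```python
def handler(event, context):
    """Collect the bucket and key of every S3 object named in the event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        objects.append({
            "bucket": s3_info["bucket"]["name"],
            "key": s3_info["object"]["key"],
        })
    # Stateless: everything needed arrives in the event, nothing is kept between calls.
    return {"count": len(objects), "objects": objects}


# Local invocation with a hand-built event (hypothetical bucket and key).
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-genomics"},
                "object": {"key": "reads/sample1.fastq"}}},
    ]
}
sample_result = handler(sample_event, None)
```

Because each invocation handles one event independently, many uploads can be processed in parallel without any coordination in the code.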
Evented genome sequence processing
Nanocall*
* Matei David (Jared T. Simpson lab)
doi:10.1093/bioinformatics/btw569
• API Gateway executes Lambda functions that bundle a statistical function in R, which calculates the significance of the association of a gene’s expression level with patient survival for every gene in the genome (~20K)
• This serverless architecture enabled Station X to scale dynamically without paying for idle compute, while leveraging robust error handling capabilities
• Exemplifies how researchers can leverage PHI data de-identification to use more resources on the AWS platform
Data analysis using R, API Gateway, and Lambda
Station X’s GenePool platform enables real-time biomarker analysis and management of clinical genomic data at scale.
“The patient data has been de-identified…API Gateway and Lambda only receive the event, time-to-event, and expression values [which] ensures that we are able to use Lambda and API Gateway...while still complying with the AWS BAA and HIPAA.”
GT-Scan2 – Scaling CRISPR-Cas9 searches
Thank you!
Remember to complete
your evaluations!

Source: AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Lessons Learned from the PCAWG Project (LFS304)

Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...AWS User Group - Thailand
 
Ceph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph Community
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumbergerinside-BigData.com
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...Cisco DevNet
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataAlexMiowski
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...t_ivanov
 
StorPool Storage Оverview and Integration with CloudStack
StorPool Storage Оverview and Integration with CloudStackStorPool Storage Оverview and Integration with CloudStack
StorPool Storage Оverview and Integration with CloudStackShapeBlue
 
Hpc Cloud project Overview
Hpc Cloud project OverviewHpc Cloud project Overview
Hpc Cloud project OverviewFloris Sluiter
 
Revolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and InfrastrctureRevolutionary Storage for Modern Databases, Applications and Infrastrcture
Revolutionary Storage for Modern Databases, Applications and Infrastrcturesabnees
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSSteve Wong
 
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...
Ceph Day New York 2014: Best Practices for Ceph-Powered Implementations of St...Ceph Community
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
High Performance Computing with AWS
High Performance Computing with AWSHigh Performance Computing with AWS
High Performance Computing with AWSAmazon Web Services
 
XDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemXDF 2019 Xilinx Accelerated Database and Data Analytics Ecosystem
XDF 2019 Xilinx Accelerated Database and Data Analytics EcosystemDan Eaton
 
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...Amazon Web Services
 

Similar a AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Lessons Learned from the PCAWG Project (LFS304) (20)

Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
AWS Community Day Bangkok 2019 - How AWS Parallel Cluster can accelerate high...
 
Ceph used in Cancer Research at OICR
Ceph used in Cancer Research at OICRCeph used in Cancer Research at OICR
Ceph used in Cancer Research at OICR
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...DEVNET-1140	InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
DEVNET-1140 InterCloud Mapreduce and Spark Workload Migration and Sharing: Fi...
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Self-Service Supercomputing
Self-Service SupercomputingSelf-Service Supercomputing
Self-Service Supercomputing
 
Geospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning DataGeospatial Sensor Networks and Partitioning Data
Geospatial Sensor Networks and Partitioning Data
 
The Impact of Columnar File Formats on SQL-on-Hadoop Engine Performance: A St...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Lessons Learned from the PCAWG Project (LFS304)

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Brian O’Connor Technical Director - Analysis Core UCSC Genomics Institute Nov 28th, 2016 Large-scale, Cloud-based Analysis of Cancer Genomes Lessons Learned from the PCAWG Project
  • 3. PCAWG: A Cloud-Based, Distributed Collaboration ● International Cancer Genome Consortium (ICGC) ● ~5,800 Whole Genomes –~2,800 Cancer Donors –~1,300 with RNASeq data –Goal is to consistently analyze data ● 8 sites storing and sharing data via GNOS – 300TB -> 900TB ● 14 Cloud (and HPC) environments –3 Commercial, 7 OpenStack, 4 HPC –~630 VMs, ~15K cores, ~60TB of RAM
  • 4. PCAWG Cloud Analysis “Core” Workflows
  • 5. PCAWG Lessons Learned 1. Commercial cloud policies 2. Portable tools 3. Failure-tolerant, distributed execution infrastructure 4. Commercial cloud costs
  • 6. Lesson 1: Commercial Cloud Policies • PCAWG analysis showed the power of clouds • Key policy changes enabled commercial cloud usage • NIH updated dbGaP cloud policy - March 2015 • ICGC DACO updated ICGC cloud policy - May 2015 • Partnerships with commercial cloud entities • Amazon Public Datasets Program • Seven Bridges • DNAnexus
  • 7. PCAWG Cloud Analysis Architecture GNOS Academic Compute Centers Cloud Orchestrator Compute AWS Cloud Cloud Orchestrator Metadata Index Sequencing Projects Spot Instances Work Orders
  • 8. PCAWG Analysis Architecture & AWS GNOS Academic Compute Centers Cloud Orchestrator Compute AWS Cloud Cloud Orchestrator Metadata Index DNAnexus Seven Bridges Sequencing Projects Represents a major shift, ICGC data now redistributed within Amazon’s Cloud Spot Instances Work Orders Amazon S3
  • 9. Lesson 2: Portable Tools Containerized workflows for portability between sites Core Workflows Alignment: BWA-Mem Variant Calling: Broad, DKFZ/EMBL, and Sanger https://github.com/ICGC-TCGA-PanCancer
  • 10. Lesson 3: Fault-Tolerant Cloud Execution Architecture 1.0: ● cloud-based clusters ● gluster distributed filesystem ● scheduling per cloud Architecture 2.0: ● single-node workers ● no distributed filesystem ● ansible for setup Architecture 3.0: ● a complete rethink
  • 11. Lesson 4: Cloud Costs
      Workflow    | Hardware (cores / machine) | Runtime                    | Cost on AWS
      BWA         | 8 cores (16 GB RAM)        | 5 days (± 5) per specimen  | $11.16
      Sanger      | 8 cores (32 GB RAM)        | 4 days (± 3) per donor     | $17.22
      DKFZ / EMBL | 16 cores (64 GB RAM)       | 2 days (± 0.6) per donor   | $12.80
      Broad       | 32 cores (256 GB RAM)      | 2.6 days per donor         | $20.48

      Workflow    | Storage required per donor
      BWA         | 240 GB
      Sanger      | 4 GB
      DKFZ / EMBL | 5 GB
      Total       | 249 GB

      $62/donor
      Data analysis: Create a cloud commons, Nature 2015
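The $62/donor figure on the cost slide appears to be the sum of the four per-workflow AWS costs, rounded to the nearest dollar. The short Python check below is a sketch under that assumption (the slide does not spell out how the total was derived, and BWA is quoted per specimen while the other workflows are quoted per donor). The dictionary and function names are illustrative only.

```python
# Per-run AWS costs in US dollars, copied from the slide's cost table.
COST_USD = {"BWA": 11.16, "Sanger": 17.22, "DKFZ / EMBL": 12.80, "Broad": 20.48}

# Per-donor storage in GB, copied from the slide's storage table.
STORAGE_GB = {"BWA": 240, "Sanger": 4, "DKFZ / EMBL": 5}


def total(values):
    """Sum the per-workflow figures in a table."""
    return sum(values.values())


# Assumption: one run of each workflow is simply added together.
total_cost = round(total(COST_USD), 2)
total_storage = total(STORAGE_GB)

print(f"compute: ${total_cost:.2f} (about ${round(total_cost)}), "
      f"storage: {total_storage} GB")
```

The storage rows add up to the 249 GB total stated on the slide, and the compute rows add up to roughly the quoted $62.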
  • 12. ICGC PCAWG Legacy Publications soon AWS Public Datasets Program ~1,400 PCAWG Donors - BAM (~70% of ICGC donors) - VCF from all three pipelines - more ICGC data uploaded regularly https://dcc.icgc.org/icgc-in-the-cloud
  • 13. The Present Goal: to formalize lessons from PCAWG into reusable tools Dockstore Tool/Workflow Sharing Toil Workflow Execution Redwood File Storage
  • 15. Redwood - Scalable Storage Authentication & Storage Services Key Features: based on ICGC Storage Service, supports FUSE, BAM Slicing, and Highly Parallel access, typically WORM usage pattern client Amazon S3 Amazon EC2 instance AWS cloud
  • 16. Redwood - Storage System Performance The Redwood Storage System (and underlying S3) provided a stable and secure mechanism to store and use genomic data. An example run of ~100 simultaneous downloads saw ~45-100 MB/s.
  • 17. Dockstore.org - Sharing Tools & Workflows Dockstore: ● Share tools and workflows ● Package tools with Docker, Describe with CWL/WDL ● PCAWG goal, provide our tools via Dockstore http://dockstore.org and https://github.com/ga4gh
  • 18. Dockstore Architecture Built on DockerHub/Quay.io and GitHub/BitBucket Adds metadata to address shortcomings for bioinformatics workflows CWL/WDL is the natural choice for Descriptor
  • 19. Dockstore 1.0 Release Highlighted New Features Support for 1.0.0 GA4GH Tool Registry API Support for displaying, sharing, and natively launching CWL 1.0 & WDL tools Preliminary support for CWL/WDL workflows Full list of updates since 0.4-beta.4 in https://github.com/ga4gh/dockstore/releases New Content ICGC PanCancer Analysis of Whole Genomes (PCAWG) tools • BWA-mem, Sanger, Delly, DKFZ
  • 22. Running Dockstore Tools Execution with the Dockstore Command Line Interface (CLI). The goal was something simple, with the same process accessible via other execution systems (Seven Bridges, Curoverse, Galaxy, Consonance, etc.). Simple Dockstore command line: 1. provision input files 2. pull Docker images 3. execute tool with inputs using CWL 4. provision output files somewhere
  • 27. Coming Soon to Dockstore Workflow DAG view Testing PCAWG Test Data “Launch With…” • Consonance • Commercial partner(s) Signed Dockers Cross site indexing See Roadmap: https://goo.gl/4D9a8F
  • 28. Toil - Efficient Compute on AWS ● A system for large-scale, efficient work on AWS ● Toil recently completed a 30K core, 20K sample recompute ● Per-job granularity allows for better efficiency and robustness
  • 29. Toil - Dynamic DAGs The job graph in Toil can be either statically or dynamically declared.
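To make the static/dynamic distinction concrete, here is a toy sketch in plain Python. It does not use Toil's API: `ToyJob`, `add_child`, and `run` are hypothetical names invented for this illustration. A static graph is fully wired before execution starts, while a dynamic graph grows as jobs return new children at run time.

```python
from collections import deque


class ToyJob:
    """Hypothetical stand-in for a workflow job (not Toil's Job class)."""

    def __init__(self, name, fn=None):
        self.name = name
        # fn runs the job and may return newly discovered child jobs.
        self.fn = fn if fn is not None else (lambda: [])
        self.children = []

    def add_child(self, job):
        """Statically declare a child before the workflow starts."""
        self.children.append(job)
        return job


def run(root):
    """Run jobs breadth-first, following static and dynamic children."""
    order = []
    queue = deque([root])
    while queue:
        job = queue.popleft()
        order.append(job.name)
        dynamic_children = job.fn()      # children created while running
        queue.extend(job.children)       # children declared up front
        queue.extend(dynamic_children)
    return order


# Static declaration: the whole graph is known before execution.
static_root = ToyJob("align")
static_root.add_child(ToyJob("call_variants"))

# Dynamic declaration: "split" decides at run time how many chunks to spawn.
dynamic_root = ToyJob("split", fn=lambda: [ToyJob(f"chunk_{i}") for i in range(3)])

static_order = run(static_root)
dynamic_order = run(dynamic_root)
print(static_order)
print(dynamic_order)
```

In a real workflow the number of dynamic children would typically depend on the input data, for example one job per genomic region or per sample.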
  • 30. Toil - Spark & ADAM Integration Amazon EC2 Instances master slave slave slave slave
  • 31. Toil - Accessible to New Developers User scripts are written in pure Python:

        from toil.job import Job

        def helloWorld(message, memory="2G", cores=2, disk="3G"):
            return "Hello, world!, here's a message: %s" % message

        j = Job.wrapFn(helloWorld, "You did it!")

        if __name__ == "__main__":
            parser = Job.Runner.getDefaultArgumentParser()
            options = parser.parse_args()
            # Prints Hello, world!, ...
            print(Job.Runner.startToil(j, options))
  • 32. Toil - Portable ● Toil can be installed on any system with Python 2.7 ● Built-in support for various batch systems, a few thanks in part to open-source community support: ○ Mesos ○ SGE (GridEngine) ○ UCSC’s Parasol ○ Single Machine Mode ○ LSF ○ SLURM ● All batch systems can be used interchangeably with any of the job stores
  • 33. Toil - Scalable ● Cloud-based job stores are designed to handle many concurrent workers ● Mesos has been shown to scale to 50k simulated nodes in Amazon Elastic Compute Cloud (EC2) ● Workers try to reduce interactions with the master by scheduling jobs locally
  • 34. Toil - Robust to Failures ● Jobs are checkpointed upon completion, allowing for resumability after job failure ● Toil’s job store can resume from any combination of leader/worker failure ● Toil currently supports job stores for: ○ Shared file systems ○ AWS (Amazon S3 + Amazon SimpleDB) ○ Experimental support for Azure / Google Cloud
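The checkpoint-on-completion idea can be illustrated with a small, self-contained Python sketch. This is not Toil's job store implementation: `FileJobStore`, `run_workflow`, and the JSON file layout are assumptions made for the example, standing in for a shared-file-system job store. Each finished job is recorded durably, so a rerun after a failure skips completed work.

```python
import json
import os
import tempfile


class FileJobStore:
    """Hypothetical job store that records finished jobs in one JSON file."""

    def __init__(self, path):
        self.path = path

    def load(self):
        """Return a dict of completed job names to their results."""
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as fh:
            return json.load(fh)

    def checkpoint(self, name, result):
        """Record a completed job, replacing the file atomically."""
        done = self.load()
        done[name] = result
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(done, fh)
        os.replace(tmp_path, self.path)


def run_workflow(jobs, store):
    """Run jobs in order, skipping any that were already checkpointed."""
    done = store.load()
    executed = []
    for name, fn in jobs:
        if name in done:
            continue
        store.checkpoint(name, fn())
        executed.append(name)
    return executed


attempts = {"call": 0}


def flaky_call():
    """Fail on the first attempt to simulate a lost worker."""
    attempts["call"] += 1
    if attempts["call"] == 1:
        raise RuntimeError("simulated worker failure")
    return "variants"


jobs = [("align", lambda: "bam"), ("call", flaky_call), ("report", lambda: "done")]
store = FileJobStore(os.path.join(tempfile.mkdtemp(), "jobstore.json"))

try:
    run_workflow(jobs, store)          # fails partway through
except RuntimeError:
    pass

first_done = sorted(store.load())      # only the job that finished
resumed = run_workflow(jobs, store)    # picks up where the first run stopped
print(first_done, resumed)
```

A cloud-backed store would replace the local file with durable remote storage, but the resume logic would follow the same pattern.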
  • 35. Toil in Action 20,000 RNA-seq Sample Recompute
  • 36. Scalable and robust to failure Toil RNA-seq Recompute
  • 37. The Future PCAWG showed the power of cloud for large scientific analysis Current work with Redwood, Dockstore, and Toil formalized lessons learned and methodologies Our future work focuses on establishing standards from our previous work and applying these to future larger-scale efforts
  • 38. Emerging GA4GH API Standards Tool Registry API: API Standard to Share ● Formalizing the standard with the GA4GH through the Containers and Workflows Task Team, implemented in Dockstore ● Basic read API with extended support for write and search: GET list, GET search, POST register ● Tool(s): descriptor (CWL/WDL), Docker, Conventions
  • 39. Emerging GA4GH API Standards Further work of the Containers and Workflows Task Team Workflow/Task Execution APIs POST new task GET task status GET task stderr/stdout API Standard to Execute Tools Docker JSON stderr stdout file(s) status + Cloud-specific Implementation WDL/CWL Workflow or
  • 40. GA4GH Containers & Workflow Vision Toil Dockstore.org Redwood
  • 41. Acknowledgements - GA4GH Containers & Workflows Task Team - Broad Institute - Cincinnati Children’s Hospital - Curoverse - European Bioinformatics Institute - Intel - Institute for Systems Biology - Google, Microsoft, and Amazon - Ontario Institute for Cancer Research - Oregon Health and Science University - Seven Bridges Genomics - University of California Santa Cruz ● Lincoln Stein, Josh Stuart, Gad Getz, Peter Campbell, Jan Korbel - PCAWG ● Vincent Ferretti - Storage ● Denis Yuen - Dockstore ● Kyle Ellrott - Task API ● Peter Amstutz - Workflow API and Co-leader ● Jeff Gentry - Co-leader ● Hannes Schmidt, Frank Nothaft & the Toil Team
  • 44. Enabling science Scalable compute resource only when needed Time to result was greatly reduced Cost of analysis was greatly reduced Data is able to be securely shared in place Global community access
  • 45. Open data as a platform Data Creation Data Enrichment Sensemaking Data at Rest (Object storage) Basic APIs Complex APIs Consumer applications Algorithmic policy Data-driven journalism Data Catalogs Focused data dashboards Predictive modeling Visualizations Lower cost of knowledge (Efficiency)
  • 46. Open data as a platform Data Creation Data Enrichment Sensemaking Data at Rest (Object storage) Basic APIs Complex APIs Consumer applications Algorithmic policy Data-driven journalism Data Catalogs Focused data dashboards Predictive modeling Visualizations Lower cost of knowledge (Efficiency) BAM gVCF Wig, GFF ? ? ? ??
  • 47. Amazon S3 for science Amazon S3 Data Lake Data Science Sandbox Visualization / Reporting
  • 48. Public datasets on AWS To enable more innovation, AWS hosts a selection of datasets that anyone can access for free. Data in our public datasets is available for rapid access to our flexible and low-cost computing resources. Earth Science • Landsat • NEXRAD • NASA NEX Life Science • TCGA & ICGC • 1000 Genomes • Genome in a Bottle • Human Microbiome Project • 3000 Rice Genome Internet Science • Common Crawl Corpus • Google Books Ngrams • Multimedia Commons https://aws.amazon.com/public-datasets/
  • 50. AWS Lambda: Serverless, event-driven compute service ● No Servers to Manage: AWS Lambda automatically runs your code without requiring you to provision or manage servers. Just write the code and upload it to Lambda. ● Continuous Scaling: AWS Lambda automatically scales your application by running code in response to each trigger. Your code runs in parallel and processes each trigger individually, scaling precisely with the size of the workload. ● Subsecond Metering: With AWS Lambda, you are charged for every 100ms your code executes and the number of times your code is triggered. You don't pay anything when your code isn't running.
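The subsecond metering model described on the slide (a charge per 100 ms of execution plus a charge per trigger) can be sketched as simple arithmetic. The Python below is illustrative only: the function names are hypothetical, and the prices are parameters rather than real AWS rates, which are not given in the slides.

```python
def billed_ms(duration_ms, increment_ms=100):
    """Round a run time up to the next billing increment (100 ms per the slide)."""
    return -(-duration_ms // increment_ms) * increment_ms


def invoice(durations_ms, price_per_increment, price_per_request, increment_ms=100):
    """Total charge for a batch of invocations.

    The prices are placeholders supplied by the caller, not actual AWS pricing.
    """
    increments = sum(billed_ms(d, increment_ms) // increment_ms for d in durations_ms)
    return increments * price_per_increment + len(durations_ms) * price_per_request


# Three invocations lasting 30 ms, 230 ms, and 100 ms.
runs = [30, 230, 100]
print([billed_ms(d) for d in runs])
print(invoice(runs, price_per_increment=2, price_per_request=1))
print(invoice([], price_per_increment=2, price_per_request=1))
```

With no invocations the total is zero, which matches the slide's point that nothing is charged while code is not running.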
  • 51. Key Scenarios ● Data processing: Stateless processing of discrete or streaming updates to your data-store or message bus ● Control systems: Customize responses and response workflows to state and data changes within AWS ● App backend development: Execute server-side backend logic in a cross-platform fashion
  • 52. Evented genome sequence processing Nanocall* * Matei David (Jared T. Simpson lab) doi:10.1093/bioinformatics/btw569
  • 53. Data analysis using R, API Gateway, and Lambda ● The use of API Gateway to execute Lambda functions that bundle a statistical program function in R for calculating the significance of an association of a gene’s expression level with patient survival for every gene in the genome (~20K) ● Utilization of this serverless architecture enabled them to scale dynamically without paying for idle compute and leveraging robust error handling capabilities ● Exemplifies how researchers can leverage PHI data de-identification to use more resources on the AWS platform Station X’s GenePool platform enables real-time biomarker analysis and management of clinical genomic data at scale. The patient data has been de-identified…API Gateway and Lambda only receive the event, time-to-event, and expression values [which] ensures that we are able to use Lambda and API Gateway...while still complying with the AWS BAA and HIPAA.
  • 54. GT-Scan2 – Scaling CRISPR-Cas9 searches