The Problem and Promise of Translational Genetics and a
Step to the Clouded Solution of Scalable Clinical Whole
Genome Sequencing
Jafar Shameem
Amazon Web Services
November 14, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Agenda
• Bio-Informatics and Amazon Web Services
• Examples of collaboration
• Building Blocks
  – Compute
  – Storage
  – Tools
  – Pricing Models
A rich history of collaboration with Life Sciences organizations
AWS Public Data Sets
• A centralized repository of public datasets
• Seamless integration with cloud-based applications
• No charge to the community
• Some of the datasets available today:
  – 1000 Genomes Project
  – Human Microbiome Project
  – Ensembl
  – GenBank
  – Illumina – Jay Flatley Human Genome Dataset
  – YRI Trio Dataset
  – The Cannabis Sativa Genome
  – UniGene
  – Influenza Virus
  – PubChem
• Tell us what else you’d like us to host …
Understanding how human genetics contributes to heart disease and aging
CHARGE Consortium
• DNAnexus
• Baylor College of Medicine
Compute
EC2 instance types, charted by memory (GiB) against EC2 Compute Units (ECUs):
• Micro: 613 MB, up to 2 ECUs
• Small: 1.7 GB, 1 ECU, 1 virtual core
• Medium: 3.7 GB, 2 ECUs, 1 virtual core
• Large: 7.5 GB, 4 ECUs, 2 virtual cores
• Extra Large: 15 GB, 8 ECUs, 4 virtual cores
• High-CPU Medium: 1.7 GB, 5 ECUs, 2 virtual cores
• High-CPU XL: 7 GB, 20 ECUs, 8 virtual cores
• Hi-Mem XL: 17.1 GB, 6.5 ECUs, 2 virtual cores
• Hi-Mem 2XL: 34.2 GB, 13 ECUs, 4 virtual cores
• Hi-Mem 4XL: 68.4 GB, 26 ECUs, 8 virtual cores
• Cluster Compute 4XL: 23 GB, 33.5 ECUs
• Cluster GPU 4XL: 22 GB, 33.5 ECUs, 2 x NVIDIA Tesla “Fermi” M2050 GPUs
• Cluster Compute 8XL: 60.5 GB, 88 ECUs
• High I/O 4XL: 60.5 GB, 35 ECUs, 2 x 1024 GB SSD-based local instance storage
• High Storage 8XL: 117 GB, 35 ECUs, 24 x 2 TB instance store
• Cluster High Mem 8XL: 89 ECUs, 244 GB SSD instance storage
Storage
• Relational Database Service: fully managed database (MySQL, Oracle, MSSQL)
• SimpleDB: NoSQL, schemaless, for smaller datasets
• DynamoDB: NoSQL, schemaless, provisioned-throughput database
• S3: object datastore, up to 5 TB per object, 99.999999999% durability
• Redshift: petabyte-scale data warehousing service, fully managed
Tools of the trade
• GATK
• NCBI BLAST
• Crossbow
• CloudBurst
• Myrna
• CloVR
• BioPerl Max
• VIPDAC
• Superfamily
• Cloud-Coffee
• BioNimbus
• GMOD
• CloudAligner
• Bioconductor
• QIIME
• SNAP
• BWA
• Bowtie/TopHat/Cufflinks
• STAR, GSNAP, RUM
Cluster and configuration management: MIT StarCluster, Galaxy CloudMan, Rocks, Torque, Slurm, Condor, Chef, Puppet, SaltStack

Get links to AMIs at:
https://github.com/mndoci/mndoci.github.com/wiki/Life-Science-Apps-on-AWS
Many purchase models to support different needs
• Free Tier: Get started on AWS with free usage and no commitment. For POCs and getting started.
• On-Demand: Pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
• Reserved: Make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
• Spot: Bid for unused capacity, charged at a Spot Price which fluctuates based on supply and demand. For time-insensitive or transient workloads.
• Dedicated: Launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance-related workloads.
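Choosing between On-Demand and Reserved comes down to a break-even number of hours. A minimal sketch, assuming a simple model of one upfront payment plus a discounted hourly rate; the prices below are hypothetical placeholders, not AWS list prices:

```python
def breakeven_hours(on_demand_rate: float, upfront: float, reserved_rate: float) -> float:
    """Hours of use after which a reservation costs less than On-Demand.

    Solves on_demand_rate * h == upfront + reserved_rate * h for h.
    """
    saving_per_hour = on_demand_rate - reserved_rate
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the On-Demand rate")
    return upfront / saving_per_hour


def cheaper_option(hours: float, on_demand_rate: float,
                   upfront: float, reserved_rate: float) -> str:
    """Return which purchase model is cheaper for a given number of hours."""
    on_demand_cost = on_demand_rate * hours
    reserved_cost = upfront + reserved_rate * hours
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# Hypothetical prices: $2.40/h On-Demand vs. $4,000 upfront plus $0.60/h reserved.
hours = breakeven_hours(2.40, 4000.0, 0.60)
print(f"break-even after {hours:.0f} hours")  # break-even after 2222 hours
print(cheaper_option(1000, 2.40, 4000.0, 0.60))  # on-demand
print(cheaper_option(5000, 2.40, 4000.0, 0.60))  # reserved
```

Below the break-even point On-Demand is cheaper and above it the reservation wins, which is why Reserved is positioned for committed utilization.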
How to use Spot?
Ideal applications:
• Batch Processing
• Time-Delayable
• Fault-Tolerant or Restartable
• Compute-Intensive
• Horizontally Scalable
• Stateless Worker Nodes
• Region and AZ Independent
• Uses Deployment Automation
Less ideal applications:
• Interactive
• Strict/Tight SLA for Completion
• Expensive to Handle Terminations
• Data-Intensive
• In-Memory Scaling
• Long-Running Worker Nodes (weeks)
• Requires a Single AZ
• Manually Launched and Managed
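Why restartable, time-delayable batch work suits Spot can be seen in a small simulation. A minimal sketch, assuming a simplified hourly model of the bidding scheme described above: the instance runs, and is billed the current Spot Price rather than the bid, in each hour the price is at or below the bid, and loses that hour to interruption otherwise. The price series is invented for illustration:

```python
from typing import Optional, Sequence, Tuple


def run_restartable_job(work_hours: int, bid: float,
                        hourly_spot_prices: Sequence[float]) -> Optional[Tuple[float, int]]:
    """Simulate a restartable batch job on one Spot instance, hour by hour.

    In each hour the instance runs only if the Spot Price is at or below the
    bid, and is charged the Spot Price (not the bid). Hours priced above the
    bid are lost to interruption; the job resumes from its last completed hour
    when the price drops again. Returns (total_cost, elapsed_hours), or None if
    the price series ends before the work is done.
    """
    if work_hours <= 0:
        return 0.0, 0
    done, cost = 0, 0.0
    for elapsed, price in enumerate(hourly_spot_prices, start=1):
        if price <= bid:
            done += 1
            cost += price
            if done == work_hours:
                return round(cost, 2), elapsed
    return None


prices = [0.27, 0.30, 0.95, 1.10, 0.28, 0.26, 0.29]  # hypothetical $/hour
print(run_restartable_job(4, bid=0.50, hourly_spot_prices=prices))  # (1.11, 6)
print(run_restartable_job(4, bid=0.20, hourly_spot_prices=prices))  # None
```

With a $0.50 bid the 4 hours of work finish after 6 elapsed hours: the two expensive hours delay the job but are not paid for, which only works if the job can tolerate delay and resume where it stopped.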
Tractable, scalable, and economical processing of
clinical whole genome sequences in AWS
Clinical Genomics for Cancer Diagnosis
Amazon Web Services re:Invent 2013
Nov 14th, 2013 Las Vegas, NV

Peter J. Tonellato, PhD
Harvard Medical School

Dennis P. Wall, PhD*
Stanford University*

Stanford University
Whole Genome Breast Cancer Program (WGBC)
Objective: to demonstrate the clinical utility and value of whole genome analysis (WGA) for practical breast cancer detection, diagnosis, prognosis and improved outcomes.

WGA in Clinical Turn-Around
Demonstrate the use of Amazon Web Services to establish Clinical Whole Genome Analysis in “clinical turn-around”: from whole genome NGS sequence to actionable health care information.
• Clock time: < 3 hours
• Cost: < $100
Whole Genome Breast Cancer Program
1. Organization and Progress to date
2. Historical BIDMC Breast Cancer cases
3. Clinical Whole Genome Analysis – Laboratory Test
4. COSMOS: Clinical Whole Genome Analysis on AWS

1. Organization and Progress to date
N-MDBCTB
• Surgery: Mike
• Genetics: Nadine
• Oncology: Gerburg
• Pathology: Stu
• Radiation Oncology: Abram
• Genetic Counseling: Jill
• Social Work: Barbara
• Imaging: Tejas
• Bioinformatics: Peter & Dennis

LPM: Sheida, Latrice, Jared, Yassine, Val, Michiyo

Program Coordinator: Michiyo
  – Research Assistant* (Emily Poles?)
  – Technician*

Case management
  – Case identification
  – Case review (N-MDBCTB)
  – Consent (RN)
  – Clinical data management
  – Tissue collection
  – Sample management
  – Follow-up assay/bioinformatics

Assay
  – Preparation/storage
  – DNA extraction/purification
  – Sample delivery (to outsource)
  – Whole genome sequencing
  – OncoScan v3 (BI)
  – DNA/RNA/NGS sequencing (outsource)

Bioinformatics
  – Data transfer
  – Genome data integration/management
  – Annotation
  – Analysis
  – Translation
  – Case evidence report

Translation
  – Data integration/management
  – Case report (N-MDBCTB)

Oversight: External Advisory Board, Clinical Executive Committee, Cancer Center, Regulatory Affairs
Clinical Lab
* Hiring

BIDMC Breast Cancer Patient/Sample Process
[Flowchart: patient flow, clinical evaluation, case identification, sample and analysis workflows. Diagnosis work-up at the BIDMC Clinic (Oncology, Surgery, Radiation Oncology) with MMG, US, MRI and biopsy (immunohistochemistry, FISH); N-BCMDTB case selection; consent to care and consent to research; surgical (or post-NAC surgical), biopsy and blood samples stored as FFPE or FF; Research Pathology Lab DNA/RNA extraction, DNA sequencing, exome sequencing and OncoScan v3™ (copy number, somatic mutation); Analysis Lab (LPM) gene expression, OncoScan™, SNP chip and integrative pipelines; OMR clinical data and follow-up outcome; translation; adjuvant therapy (chemotherapy, radiation therapy).]
X: No further treatment and research
NAC: Neoadjuvant chemotherapy
OMR: Online Medical Record
FFPE: Formalin-Fixed, Paraffin-Embedded (tissue)
FF: Fresh frozen (tissue)

IRB Approved Protocol: from N-BCMDTB result evaluation to identification of targeted therapy and personalized medicine
• Evaluation – Treatment decision: Yes -> undergo surgery; No -> excluded (no surgery)
• Decision – Case decision: Yes -> eligible; No -> excluded (not eligible)
• Consent – Getting consent: Yes -> agreed; No -> excluded (disagreed)
• Clinical Workup – Blood test; breast surgery
• Tissue Workup – Pathology workup; sample collection (extract DNA/RNA from tissue and blood); excluded if poor sample
• Assay – DNA genome sequencing; exome sequencing; copy number and somatic mutation analysis using an array platform (OncoScan)
• Analysis – Analysis outcome data; clinical outcome; clinicopathological characteristics
• Translation – Discussion at N-BCMDTB; traditional treatment or personalized medicine (identification of targeted therapy)
2. Historical BIDMC Breast Cancer cases
Breast Cancer Clinical Use of WGA
1. Family and Individual Risk prediction
2. Breast Cancer Tumor Characterization
3. Breast Cancer Diagnosis
4. Breast Cancer Prognosis
5. Prediction of response to targeted therapies
6. Indications of outcome and assessment for future treatment refinement
Breast Cancer Genomic Devices
35 devices reviewed; 26 used clinically
Device categories: Risk Prediction, Prognosis, Research
• 23andMe*, deCODEme*, BRACAnalysis*, Ambry Genetics*, CCDG Panel
• OncoScan, TargetPrint, BluePrint**, PAM50*, BreastProfile*, Her2Pro*, MammaPrint
• Methyl-Profiler, Rotterdam Signature, MammoStrat, BreastGeneDX, Breast Cancer Array, OncotypeDX*, Breast Cancer Index
• Research: OncoMap3**, AsuraSeq-1000**, OncoCarta**
• SNaPshot, MapQuant DX, TheraPrint**, NexCourse Bca, Wash U Panel, Target Now
*Associated CPT/CMS codes
**Not for clinical use
Clinically Actionable Breast Cancer Information
Data type: # unique entries
• Gene: 773
• SNP: 1733 (52 SNPs for risk prediction, 1681 SNPs for prognosis)
• Small insertion: 75
• Small deletion: 205
• Translocation: 3
• Gene expression: 383 (drug target commonly based on gene expression profile)
• Protein expression: 7 (HER2, estrogen and progesterone receptor status)
• Amplification: 64
• Deletion: 48 (9 deletions in BRCA1 or BRCA2 detected by BRACAnalysis confer increased breast cancer risk)
Total “clinically” actionable: 3291
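The per-type counts add up to the stated total, and the SNP count is itself the sum of its risk-prediction and prognosis subsets. A quick arithmetic check using only the numbers on the slide (the dictionary labels mirror the table rows):

```python
actionable = {
    "Gene": 773,
    "SNP": 1733,
    "Small insertion": 75,
    "Small deletion": 205,
    "Translocation": 3,
    "Gene expression": 383,
    "Protein expression": 7,
    "Amplification": 64,
    "Deletion": 48,
}

# SNP row: 52 risk-prediction SNPs plus 1681 prognosis SNPs.
snp_risk, snp_prognosis = 52, 1681
assert snp_risk + snp_prognosis == actionable["SNP"]

total = sum(actionable.values())
print(total)  # 3291
```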
3. Clinical Whole Genome Analysis (WGA) – Laboratory Test
Clinical WGA Workflow
Patients -> Samples -> Next Generation Sequencers -> Bioinformatics Analysis -> Clinical Genomics Interpretation Service -> Clinical Report, Biomedical Report
Analysis pipelines by data type:
• DNA-Seq: BWA, GATK, Picard -> SNP/indel
• DNA-Seq: CNV-seq, ReadDepth, SegSeq -> CNV
• RNA-Seq: TopHat, Cufflinks, BLAST -> Gene expression
• miRNA: miRNAkey, miRBase -> miRNA targets
• Methylation: Bismark -> % gene methylation
• Downstream: risk prediction; pre-clinical and clinical variant annotation; classification (tumor, disease); pathway analysis
Reduced Cost of Next Generation Sequencing (NGS)
• NGS platforms: 5,000 megabases/day
• Drop in per-base sequencing cost
• Data on the petabyte scale
• NGS analysis involves complex workflows
WGA in “Clinical Turn-around” – Future
Sample Collection (12 hours) -> Sequencing (500 hours) -> Analysis (3 hours) -> Clinical Action (12 hours)
< 40 hours, < $100
Current Costs to Run on Amazon Web Services
Details:
• 1 whole genome, 60x coverage
• Spot and Reserved Instances
• Amazon Glacier for long-term storage
Whole Genome Analysis:
• Approximately 1 day
• Approximately $1500
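Comparing the current run (about 1 day and $1500 for one 60x genome) with the clinical turn-around objective (under 3 hours and under $100) shows the size of the gap the optimization work has to close. A worked calculation using only those figures, treating "approximately 1 day" as 24 hours (an assumption):

```python
current_hours, current_cost = 24.0, 1500.0   # "approximately 1 day", "approximately $1500"
target_hours, target_cost = 3.0, 100.0       # clinical turn-around objective

speedup_needed = current_hours / target_hours          # 8.0
cost_reduction_needed = current_cost / target_cost     # 15.0

# Average spend per wall-clock hour, now and at the target.
current_rate = current_cost / current_hours            # 62.5 $/h
target_rate = target_cost / target_hours               # about 33.33 $/h

print(f"{speedup_needed:.0f}x faster, {cost_reduction_needed:.0f}x cheaper")
print(f"${current_rate:.2f}/h now vs. at most ${target_rate:.2f}/h")
```

Because the required cost reduction (15x) is larger than the required speedup (8x), finishing faster on the same hardware is not enough by itself; the average hourly spend also has to drop to roughly half.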

4. COSMOS: Clinical Whole Genome Analysis on AWS
• AWS
• Applications
• Workflow
• COSMOS
Clinical Whole Genome Analysis Computational Objective: < 3 hours, < $100
Four approaches to optimize and achieve our Clinical Turn-Around Objective:
• AWS
• Refine and Improve WGA Applications
• Create a Standardized, Robust CWGA Workflow
• Stabilize a new Workflow and Distributed Computing Platform: COSMOS
Approach 1 of 4: AWS
Dynamic Cluster: the number and type of instances are adapted to data sets, jobs and applications.
• EC2 instances: On-demand master(s) and load-balanced Spot Instance workers
• AMIs
• S3 storage: BAM files
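The cost of this design is driven by the On-Demand master plus the pool of Spot workers for the length of a run. A minimal sketch that checks a configuration against the < 3 hours, < $100 objective, assuming billing per started instance-hour (as EC2 billed at the time) and hypothetical hourly prices; it ignores storage and data-transfer charges:

```python
import math


def cluster_run_cost(wall_hours: float, n_workers: int, master_rate: float,
                     worker_spot_rate: float, n_masters: int = 1) -> float:
    """Cost of one run: On-Demand master(s) plus Spot workers, billed per started hour."""
    billed_hours = math.ceil(wall_hours)
    return billed_hours * (n_masters * master_rate + n_workers * worker_spot_rate)


def meets_objective(wall_hours: float, cost: float,
                    max_hours: float = 3.0, max_cost: float = 100.0) -> bool:
    """True if the run is under both the time and the cost targets."""
    return wall_hours < max_hours and cost < max_cost


# Hypothetical run: 2.5 h wall time, 20 Spot workers at $0.30/h, master at $2.40/h.
cost = cluster_run_cost(2.5, n_workers=20, master_rate=2.40, worker_spot_rate=0.30)
print(round(cost, 2), meets_objective(2.5, cost))  # 25.2 True
```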

Optimization: Correct type and number of EC2 instances and cluster
• Current non-optimized master: cc2.8xlarge
• Current non-optimized worker: cc2.8xlarge
• Requirements for both:
  – High memory: single job (BWA) ~10 GB RAM
  – High I/O: access to common data files
  – Virtualization: HVM for HugePage
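Picking the correct type and number of instances starts with how many memory-bound jobs fit on one node. A minimal sketch, assuming each BWA job needs about 10 GB of RAM (the figure on the slide) and one core; the vCPU count and the memory held back for the OS are assumptions, not measured values:

```python
import math


def jobs_per_node(node_mem_gb: float, node_vcpus: int,
                  job_mem_gb: float, reserved_mem_gb: float = 0.0) -> int:
    """Concurrent jobs one node can run, limited by memory or by cores."""
    by_memory = math.floor((node_mem_gb - reserved_mem_gb) / job_mem_gb)
    return max(0, min(by_memory, node_vcpus))


def workers_needed(total_jobs: int, per_node: int) -> int:
    """Nodes required to run every job at once."""
    if per_node <= 0:
        raise ValueError("a node must fit at least one job")
    return math.ceil(total_jobs / per_node)


# 60.5 GB node (Cluster Compute 8XL), assumed 32 vCPUs, ~10 GB per BWA job,
# 2 GB held back for the OS (hypothetical); 48 jobs to place.
per_node = jobs_per_node(60.5, 32, 10.0, reserved_mem_gb=2.0)
print(per_node, workers_needed(48, per_node))  # 5 10
```

On these numbers the node is memory-bound: only 5 of its assumed 32 cores would be busy, which is the kind of mismatch that instance-type optimization aims to remove.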
Create stable CWGA AMI(s)
Required applications, libraries and dependencies:
• Applications (GATK): Samtools, BWA, …
• Human Reference Genome
• Annotation Databases
Optimize: AMI
• Compiler: GCC 4.6+ supports AVX mode; refined GCC parameters
• Compressed libraries: zlib and snappy
• Refined Java parameters for GATK optimization
• Memory: HugePage (2 MB) configured for every node/application
• Disks:
  – Ephemeral: RAID 0
  – Cluster disks: GlusterFS
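Configuring 2 MB HugePages means reserving a fixed pool of pages on each node in advance. A minimal sketch of the sizing arithmetic, assuming the pool should back the memory of every concurrent job; `vm.nr_hugepages` is the standard Linux sysctl for the persistent huge-page pool, while the job size and count are hypothetical:

```python
import math

HUGEPAGE_BYTES = 2 * 1024 * 1024  # 2 MB pages, as on the slide


def nr_hugepages(job_mem_gib: float, concurrent_jobs: int) -> int:
    """Number of 2 MB huge pages needed to back every concurrent job."""
    total_bytes = job_mem_gib * 1024**3 * concurrent_jobs
    return math.ceil(total_bytes / HUGEPAGE_BYTES)


pages = nr_hugepages(job_mem_gib=10, concurrent_jobs=5)
print(f"vm.nr_hugepages = {pages}")  # vm.nr_hugepages = 25600
```

Reserving the pool is only half of the setup: applications still have to request large pages, for example through the JVM's `-XX:+UseLargePages` option in the case of GATK.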
S3 storage:
• Storage of BAM files
• Transfer of BAM and other files
• “Checkpoint” after each successful workflow stage
• Backup of intermediate and final results
• Storage of all timings and job information
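A checkpoint after each successful stage lets a rerun, for example after a Spot interruption, skip the work that already finished. A minimal sketch, assuming a local directory stands in for the S3 bucket and each stage is a plain Python function; the stage names and the marker-file layout are hypothetical, and a real deployment would read and write the markers through an S3 client:

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], dict]]


def run_with_checkpoints(stages: List[Stage], checkpoint_dir: Path) -> List[str]:
    """Run stages in order, skipping any whose checkpoint marker already exists.

    Returns the names of the stages that actually ran in this invocation.
    """
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    ran = []
    for name, func in stages:
        marker = checkpoint_dir / f"{name}.done.json"
        if marker.exists():
            continue  # finished in an earlier run
        result = func()  # if this raises, no marker is written and the stage reruns
        marker.write_text(json.dumps(result))  # checkpoint only after success
        ran.append(name)
    return ran


stages = [
    ("alignment", lambda: {"bam": "sample.bam"}),
    ("variant_calling", lambda: {"vcf": "sample.vcf"}),
    ("annotation", lambda: {"report": "sample.tsv"}),
]
with tempfile.TemporaryDirectory() as tmp:
    print(run_with_checkpoints(stages, Path(tmp)))  # all three stages run
    print(run_with_checkpoints(stages, Path(tmp)))  # [] (everything checkpointed)
```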
Approach 2 of 4: Refine and Improve WGA Applications
WGA Applications
Genome Analysis Toolkit (GATK) “best practices”: Preparation/Alignment -> Variant calling -> Annotation
Source: GATK best practices, Broad Institute, http://www.broadinstitute.org/gatk/guide/topic?name=best-practices
Applications Parallelization: 5 exomes example
[Chart: 5 exomes, values on a 0 to 600 scale for the Preparation/Alignment, Variant calling and Annotation stages]

Alignment: Burrows-Wheeler Aligner (BWA)
Approach 3 of 4: Create a Standardized, Robust CWGA Workflow
GenomeKey
GenomeKey implements the GATK “best practices” for variant calling: Preparation/Alignment -> Variant calling -> Annotation
Source: GATK best practices, Broad Institute, http://www.broadinstitute.org/gatk/guide/topic?name=best-practices
Databases Integrated
CytoBand, The_1000g_Febuary_all, dbSNP135, NHLBI_Exome_Project_euro, NHLBI_Exome_Project_aa, NHLBI_Exome_Project_all, TFBS, Segmental_Duplications, RepeatMasker, Self Chain, HGMD_INDEL, HGMD_SNP, mirBase, COSMIC, TargetScan, GWAS_Catalog, SIFT, PolyPhen2, Mutation_Taster, GERP, PhyloP, LRT, Mce46way, Complete_Genomics_69, ENCODE_DNaseI_Hypersensitivity, ENCODE_Transcription_Factor, UCSC_Gene, Refseq_Gene, Ensembl_Gene, CCDS_Gene, DrugBank
Plus support for generic database file formats such as .bed and .gff3
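Support for generic .bed files comes down to interval overlap, with one common trap: BED coordinates are 0-based and half-open, while VCF positions are 1-based. A minimal sketch, assuming a small in-memory BED track and variants given as (chromosome, 1-based position); the track contents are invented, and a production annotator would use an indexed interval structure rather than this linear scan:

```python
from typing import Dict, List, Tuple

BedRecord = Tuple[str, int, int, str]  # chrom, start (0-based), end (exclusive), name


def annotate(variants: List[Tuple[str, int]],
             bed: List[BedRecord]) -> Dict[Tuple[str, int], List[str]]:
    """Map each (chrom, 1-based pos) variant to the names of BED features covering it."""
    hits: Dict[Tuple[str, int], List[str]] = {}
    for chrom, pos in variants:
        zero_based = pos - 1  # convert the VCF-style position to BED coordinates
        hits[(chrom, pos)] = [
            name for b_chrom, start, end, name in bed
            if b_chrom == chrom and start <= zero_based < end
        ]
    return hits


track = [("chr17", 100, 200, "regionA"), ("chr17", 150, 300, "regionB")]
print(annotate([("chr17", 100), ("chr17", 101), ("chr17", 200)], track))
# {('chr17', 100): [], ('chr17', 101): ['regionA'], ('chr17', 200): ['regionA', 'regionB']}
```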
Workflow Optimization
• Speed:
  – Replacing BWA with SNAP (at the same accuracy)
  – Re-implementing some slow algorithms (e.g. BQSR)
• Accuracy:
  – Adding quality control steps
  – Replacing UnifiedGenotyper with HaplotypeCaller
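Base Quality Score Recalibration (BQSR) compares the quality each base was reported with against how often bases in that group actually mismatch the reference at sites not expected to vary. A deliberately simplified sketch of that core idea, assuming a single covariate (the reported quality) and a pseudocount of 1 mismatch in 2 extra observations chosen for this sketch; GATK's implementation also models other covariates such as read group, machine cycle and sequence context, and is not reproduced here:

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def phred(error_rate: float) -> float:
    """Convert an error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_rate)


def recalibrate(observations: Iterable[Tuple[int, bool]]) -> Dict[int, float]:
    """Empirical quality per reported quality score.

    `observations` yields (reported_quality, is_mismatch) for each base aligned
    to a reference position assumed not to be a true variant.
    """
    counts = defaultdict(lambda: [0, 0])  # reported_q -> [mismatches, total]
    for reported_q, is_mismatch in observations:
        counts[reported_q][0] += int(is_mismatch)
        counts[reported_q][1] += 1
    return {
        q: round(phred((mism + 1) / (total + 2)), 1)  # pseudocount avoids log10(0)
        for q, (mism, total) in counts.items()
    }


# 998 bases reported at Q30 (1 expected error in 1,000), 9 of which mismatch.
obs = [(30, i < 9) for i in range(998)]
print(recalibrate(obs))  # {30: 20.0}
```

Here bases reported at Q30 behave empirically like Q20, so their qualities would be lowered before variant calling.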
Approach 4 of 4: Stabilize a new Workflow and Distributed Computing Platform: COSMOS
COSMOS
Workflow management system providing job splitting, job tracking and a web interface.
Stack components:
• GenomeKey: Preparation/Alignment, Variant calling, Annotation
• COSMOS: job splitting, job tracking, web interface
• MySQL DB, GlusterFS, networking, OS & software, grid engine
• AWS: EC2 instances and S3 storage
COSMOS Parallelization
[Chart: number of jobs (0 to 1200) per stage (Preparation/Alignment, Variant calling, Annotation) for 1 exome, 5 exomes, 10 exomes and all runs]
COSMOS Job Splitting
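Job splitting turns each per-sample stage into many smaller jobs that can run on separate workers. A minimal sketch, assuming a split of one job per sample per chromosome for variant calling; the chromosome list, job naming and split granularity are illustrative and not necessarily how COSMOS or GenomeKey partitions work:

```python
from itertools import product
from typing import List, Tuple

CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def split_jobs(samples: List[str],
               chromosomes: List[str] = CHROMOSOMES) -> List[Tuple[str, str, str]]:
    """One (job_id, sample, chromosome) tuple per sample-chromosome pair."""
    return [
        (f"{sample}.{chrom}.call", sample, chrom)
        for sample, chrom in product(samples, chromosomes)
    ]


jobs = split_jobs(["exome01", "exome02", "exome03", "exome04", "exome05"])
print(len(jobs), jobs[0])  # 120 ('exome01.chr1.call', 'exome01', 'chr1')
```

Under this scheme the job count grows with the number of samples, which is consistent with the number of jobs rising as runs go from 1 to 10 exomes.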

Job Dependency Tracking
PREPARATION / ALIGNMENT -> VARIANT CALLING -> ANNOTATION
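Dependency tracking means a job is released only when every job it depends on has finished. A minimal sketch using Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+), assuming per-chromosome alignment and calling jobs that feed a single annotation job; the job names are hypothetical, and jobs are released here in whole "waves" for clarity:

```python
from graphlib import TopologicalSorter

# job -> set of jobs it depends on (hypothetical names)
dependencies = {
    "align.chr1": set(),
    "align.chr2": set(),
    "call.chr1": {"align.chr1"},
    "call.chr2": {"align.chr2"},
    "annotate": {"call.chr1", "call.chr2"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
    waves.append(ready)
    sorter.done(*ready)  # a real scheduler would call done() as each job finishes

for i, wave in enumerate(waves, start=1):
    print(i, wave)
# 1 ['align.chr1', 'align.chr2']
# 2 ['call.chr1', 'call.chr2']
# 3 ['annotate']
```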
COSMOS Web Interface
PREPARATION / ALIGNMENT -> VARIANT CALLING -> ANNOTATION
Whole Exome Analysis: Pre- and Post-Optimization
[Chart: wall time (hours, 0 to 30) and cost before and after optimization for 1, 5 and 10 exomes; cost labels shown: ~$90, ~$48, ~$47, ~$27, ~$27 and ~$10]
Acknowledgments
• LPM (Tonellato): Erik Gafni (InVitae), Vince Fusaro (InVitae), Jared B. Hawkins, Ryan Powles, Yassine Souilmi
• Wall lab (Harvard & Stanford University): Jae-Yoon Jung, Alex Lancaster, David Tulga
• Autism Speaks: 6000 exomes (current), 10,000 genomes
• Ancient Human Genomes: David Reich
Tractable, scalable, and economical processing of
clinical whole genome sequences in AWS
Clinical Genomics for Cancer Diagnosis
Amazon Web Services Re-Invent 2013
Nov 14th, 2013 Las Vegas, NV

Peter J. Tonellato, PhD
Harvard Medical School

Dennis P. Wall, PhD*
Stanford University*

Stanford University

Más contenido relacionado

La actualidad más candente

Life sciences big data use cases
Life sciences big data use casesLife sciences big data use cases
Life sciences big data use casesGuy Coates
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreUri Laserson
 
Sharing data: Sanger Experiences
Sharing data: Sanger ExperiencesSharing data: Sanger Experiences
Sharing data: Sanger ExperiencesGuy Coates
 
Managing Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger InstituteManaging Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger Instituteinside-BigData.com
 
Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences EMC
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science researchDenis C. Bauer
 
The Gordon Data-intensive Supercomputer. Enabling Scientific Discovery
The Gordon Data-intensive Supercomputer. Enabling Scientific DiscoveryThe Gordon Data-intensive Supercomputer. Enabling Scientific Discovery
The Gordon Data-intensive Supercomputer. Enabling Scientific DiscoveryIntel IT Center
 
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...EMC
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009Ian Foster
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xNPN Training
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and DataGuy Coates
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.KGMGROUP
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchRobert Grossman
 
Sanger HPC infrastructure Report (2007)
Sanger HPC infrastructure  Report (2007)Sanger HPC infrastructure  Report (2007)
Sanger HPC infrastructure Report (2007)Guy Coates
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009Ian Foster
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceRobert Grossman
 
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dipayan Dev
 
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...Sri Ambati
 
HPC lab projects
HPC lab projectsHPC lab projects
HPC lab projectsJason Riedy
 

La actualidad más candente (20)

Life sciences big data use cases
Life sciences big data use casesLife sciences big data use cases
Life sciences big data use cases
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant Store
 
Sharing data: Sanger Experiences
Sharing data: Sanger ExperiencesSharing data: Sanger Experiences
Sharing data: Sanger Experiences
 
Managing Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger InstituteManaging Genomics Data at the Sanger Institute
Managing Genomics Data at the Sanger Institute
 
Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences Whitepaper : CHI: Hadoop's Rise in Life Sciences
Whitepaper : CHI: Hadoop's Rise in Life Sciences
 
How novel compute technology transforms life science research
How novel compute technology transforms life science researchHow novel compute technology transforms life science research
How novel compute technology transforms life science research
 
The Gordon Data-intensive Supercomputer. Enabling Scientific Discovery
The Gordon Data-intensive Supercomputer. Enabling Scientific DiscoveryThe Gordon Data-intensive Supercomputer. Enabling Scientific Discovery
The Gordon Data-intensive Supercomputer. Enabling Scientific Discovery
 
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
 
Computing Outside The Box September 2009
Computing Outside The Box September 2009Computing Outside The Box September 2009
Computing Outside The Box September 2009
 
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.xModule 01 - Understanding Big Data and Hadoop 1.x,2.x
Module 01 - Understanding Big Data and Hadoop 1.x,2.x
 
Clouds, Grids and Data
Clouds, Grids and DataClouds, Grids and Data
Clouds, Grids and Data
 
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.Don't Be Scared. Data Don't Bite. Introduction to Big Data.
Don't Be Scared. Data Don't Bite. Introduction to Big Data.
 
Using the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science ResearchUsing the Open Science Data Cloud for Data Science Research
Using the Open Science Data Cloud for Data Science Research
 
Sanger HPC infrastructure Report (2007)
Sanger HPC infrastructure  Report (2007)Sanger HPC infrastructure  Report (2007)
Sanger HPC infrastructure Report (2007)
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
Whither Small Data?
Whither Small Data?Whither Small Data?
Whither Small Data?
 
The Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of ScienceThe Open Science Data Cloud: Empowering the Long Tail of Science
The Open Science Data Cloud: Empowering the Long Tail of Science
 
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
Dr.Hadoop- an infinite scalable metadata management for Hadoop-How the baby e...
 
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...H2O World -  Sparkling water on the Spark Notebook: Interactive Genomes Clust...
H2O World - Sparkling water on the Spark Notebook: Interactive Genomes Clust...
 
HPC lab projects
HPC lab projectsHPC lab projects
HPC lab projects
 

Destacado

Ahmed Absi slides bigbwa
Ahmed Absi slides  bigbwaAhmed Absi slides  bigbwa
Ahmed Absi slides bigbwaAbsi Ahmed
 
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...
Introduction to NGS Variant Calling Analysis (UEB-UAT Bioinformatics Course -...VHIR Vall d’Hebron Institut de Recerca
 
Using Security to Build with Confidence in AWS
Using Security to Build with Confidence in AWSUsing Security to Build with Confidence in AWS
Using Security to Build with Confidence in AWSAmazon Web Services
 
(DVO207) Defending Your Workloads Against the Next Zero-Day Attack
(DVO207) Defending Your Workloads Against the Next Zero-Day Attack(DVO207) Defending Your Workloads Against the Next Zero-Day Attack
(DVO207) Defending Your Workloads Against the Next Zero-Day AttackAmazon Web Services
 
Journey Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesJourney Through the AWS Cloud; Application Services
Journey Through the AWS Cloud; Application ServicesAmazon Web Services
 
RMG204 Optimizing Costs with AWS - AWS re: Invent 2012
A Step to the Clouded Solution of Scalable Clinical Genome Sequencing (BDT308) | AWS re:Invent 2013

  • 1. The Problem and Promise of Translational Genetics and a Step to the Clouded Solution of Scalable Clinical Whole Genome Sequencing Jafar Shameem Amazon Web Services November 14, 2013 © 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
  • 2. Agenda
    – Bio-Informatics and Amazon Web Services
    – Examples of collaboration
    – Building Blocks: Compute, Storage, Tools, Pricing Models
  • 3. A rich history of collaboration with Life Sciences organizations
  • 4. AWS Public Data Sets
    – A centralized repository of public datasets
    – Seamless integration with cloud-based applications
    – No charge to the community
    – Some of the datasets available today: 1000 Genomes Project, Human Microbiome Project, Ensembl, GenBank, Illumina – Jay Flatley Human Genome Dataset, YRI Trio Dataset, The Cannabis Sativa Genome, UniGene, Influenza Virus, PubChem
    – Tell us what else you’d like us to host …
  • 5. Understanding how human genetics contributes to heart disease and aging
    – CHARGE Consortium
    – DNAnexus
    – Baylor College of Medicine
  • 6. Compute: EC2 instance types (chart of Memory (GiB) vs. EC2 Compute Units)
    – Micro: 613 MB, up to 2 ECUs
    – Small: 1.7 GB, 1 EC2 Compute Unit, 1 virtual core
    – Medium: 3.7 GB, 2 EC2 Compute Units, 1 virtual core
    – Large: 7.5 GB, 4 EC2 Compute Units, 2 virtual cores
    – Extra Large: 15 GB, 8 EC2 Compute Units, 4 virtual cores
    – High-CPU Med: 1.7 GB, 5 EC2 Compute Units, 2 virtual cores
    – High-CPU XL: 7 GB, 20 EC2 Compute Units, 8 virtual cores
    – Hi-Mem XL: 17.1 GB, 6.5 EC2 Compute Units, 2 virtual cores
    – Hi-Mem 2XL: 34.2 GB, 13 EC2 Compute Units, 4 virtual cores
    – Hi-Mem 4XL: 68.4 GB, 26 EC2 Compute Units, 8 virtual cores
    – Cluster Compute 4XL: 23 GB, 33.5 EC2 Compute Units
    – Cluster GPU 4XL: 22 GB, 33.5 EC2 Compute Units, 2 x NVIDIA Tesla “Fermi” M2050 GPUs
    – Cluster Compute 8XL: 60.5 GB, 88 EC2 Compute Units
    – Cluster High Mem 8XL: 89 EC2 Compute Units, 244 GB SSD instance storage
    – High I/O 4XL: 60.5 GB, 35 EC2 Compute Units, 2*1024 GB SSD-based local instance storage
    – High Storage 8XL: 117 GB, 35 EC2 Compute Units, 24 * 2 TB instance store
  • 7. Storage
    – Relational Database Service: fully managed database (MySQL, Oracle, MSSQL)
    – SimpleDB: NoSQL, schemaless; smaller datasets
    – DynamoDB: NoSQL, schemaless; provisioned throughput database
    – S3: object datastore, up to 5 TB per object; 99.999999999% durability
    – Redshift: petabyte-scale data warehousing service; fully managed
  • 8. Tools of the trade
    – GATK, NCBI BLAST, Crossbow, CloudBurst, Myrna, CloVR, BioPerl Max
    – VIPDAC, Superfamily, Cloud-Coffee, BioNimbus, GMOD, CloudAligner, BioConductor
    – QIIME, SNAP, BWA, Bowtie/TopHat/Cufflinks, STAR, GSNAP, RUM
    – MIT StarCluster, Galaxy, CloudMan, Rocks, Torque, Slurm, Condor, Chef, Puppet, SaltStack
    – Get links to AMIs at: https://github.com/mndoci/mndoci.github.com/wiki/Life-Science-Apps-on-AWS
  • 9. Many purchase models to support different needs
    – Free Tier: get started on AWS with free usage and no commitment. For POCs and getting started.
    – On-Demand: pay for compute capacity by the hour with no long-term commitments. For spiky workloads, or to define needs.
    – Reserved: make a low, one-time payment and receive a significant discount on the hourly charge. For committed utilization.
    – Spot: bid for unused capacity, charged at a Spot Price that fluctuates based on supply and demand. For time-insensitive or transient workloads.
    – Dedicated: launch instances within Amazon VPC that run on hardware dedicated to a single customer. For highly sensitive or compliance-related workloads.
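The Reserved model trades a one-time payment for a lower hourly rate, so whether it pays off depends on how many hours the instance is actually used. A minimal sketch of that break-even arithmetic; the prices are hypothetical (not actual AWS rates) and the helper names are our own:

```python
def break_even_hours(upfront: float, on_demand_rate: float, reserved_rate: float) -> float:
    """Hours of use after which a Reserved Instance becomes cheaper than On-Demand.

    The one-time payment is recovered at (on_demand_rate - reserved_rate) per hour.
    """
    saving_per_hour = on_demand_rate - reserved_rate
    if saving_per_hour <= 0:
        raise ValueError("reserved hourly rate must be below the on-demand rate")
    return upfront / saving_per_hour


def cheaper_option(hours: float, upfront: float, on_demand_rate: float, reserved_rate: float) -> str:
    """Compare total cost of the two models for a given number of hours."""
    on_demand_cost = hours * on_demand_rate
    reserved_cost = upfront + hours * reserved_rate
    return "reserved" if reserved_cost < on_demand_cost else "on-demand"


# Hypothetical prices, chosen only to make the arithmetic easy to follow.
hours = break_even_hours(upfront=1000.0, on_demand_rate=2.00, reserved_rate=0.75)
print(f"break-even after {hours:.0f} hours")     # 1000 / 1.25 = 800 hours
print(cheaper_option(500, 1000.0, 2.00, 0.75))   # on-demand
print(cheaper_option(2000, 1000.0, 2.00, 0.75))  # reserved
```

Below the break-even point On-Demand stays cheaper, which is why the slide pairs Reserved with committed utilization.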
  • 10. How to use Spot?
    – Ideal applications: batch processing; time-delayable; fault-tolerant or restartable; compute-intensive; horizontally scalable; stateless worker nodes; region and AZ independent; uses deployment automation
    – Less ideal applications: interactive; strict/tight SLA for completion; expensive to handle terminations; data-intensive; in-memory scaling; long-running worker nodes (weeks); requires a single AZ; manually launched and managed
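One way to apply this checklist is to score a workload against each ideal/less-ideal pair. This is a hypothetical helper, not an AWS tool; the field names are our own shorthand for the slide's rows:

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # Each flag is True when the workload sits on the "ideal" side of one slide row.
    batch: bool                  # batch processing vs. interactive
    deadline_flexible: bool      # time-delayable vs. strict/tight completion SLA
    restartable: bool            # fault-tolerant/restartable vs. expensive to handle terminations
    compute_bound: bool          # compute-intensive vs. data-intensive
    horizontally_scalable: bool  # horizontal scaling vs. in-memory scaling
    stateless_workers: bool      # stateless workers vs. long-running workers (weeks)
    az_independent: bool         # region/AZ independent vs. requires a single AZ
    automated: bool              # deployment automation vs. manually launched and managed


def spot_fit(w: Workload) -> tuple[int, list[str]]:
    """Return how many criteria are met, plus the names of the criteria that are not."""
    misses = [name for name, ok in vars(w).items() if not ok]
    return len(vars(w)) - len(misses), misses


genome_batch = Workload(True, True, True, True, True, True, True, True)
print(spot_fit(genome_batch))  # (8, [])
```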
  • 11. Tractable, scalable, and economical processing of clinical whole genome sequences in AWS. Clinical Genomics for Cancer Diagnosis. Amazon Web Services re:Invent 2013, Nov 14th, 2013, Las Vegas, NV. Peter J. Tonellato, PhD, Harvard Medical School; Dennis P. Wall, PhD*, Stanford University*
  • 12. Whole Genome Breast Cancer Program
    – Objective: The objective of the Whole Genome Breast Cancer Program (WGBC) is to demonstrate the clinical utility and value of whole genome analysis for practical breast cancer detection, diagnosis, prognosis, and improved outcomes.
    – WGA in Clinical Turn-Around: Demonstrate the use of Amazon Web Services to establish Clinical Whole Genome Analysis in “clinical turn-around”: from whole-genome NGS sequence to actionable health care information.
    – Clock time: < 3 hours; cost: < $100
  • 13. Whole Genome Breast Cancer Program: 1. Organization and Progress to date; 2. Historical BIDMC Breast Cancer cases; 3. Clinical Whole Genome Analysis – Laboratory Test; 4. COSMOS: Clinical Whole Genome Analysis on AWS
  • 15. Program organization
    – N-MDBCTB: Surgery (Mike), Genetics (Nadine), Oncology (Gerburg), Pathology (Stu), Radiation Oncology (Abram), Genetic Counseling (Jill), Social Work (Barbara), Bioinformatics (Peter & Dennis)
    – LPM: Sheida, Latrice, Jared, Yassine, Val, Michiyo
    – Program Coordinator: Michiyo; Research Assistant* (Emily Poles?); Technician*
    – Case management: case identification; case review (N-MDBCTB); consent (RN); clinical data management; tissue collection; sample management; follow-up
    – Imaging: Tejas
    – Assay: preparation/storage; DNA extraction/purification; sample delivery (to outsource); whole genome sequencing; OncoScan v3 (BI); DNA/RNA/NGS sequencing (outsource)
    – Bioinformatics: data transfer; genome data integration/management; annotation; analysis; translation; case evidence report
    – Translation: data integration/management; case report (N-MDBCTB)
    – Oversee: External Advisory Board; Clinical Executive Committee; Cancer Center; Regulatory Affairs
    – * Hiring
  • 16. BIDMC Breast Cancer Patient/Sample Process (flowchart)
    – Patient flow: BIDMC Clinic (Oncology, Surgery, Radiation Oncology); N-BCMDTB case selection; diagnosis work-up (MMG, US, MRI, biopsy with immunohistochemistry and FISH); pathology (FFPE); BCMDTB diagnoses; case presentation and scheduling; case evaluation (surgery, surgery after NAC, or no surgery); consent to care; adjuvant therapy (chemotherapy, radiation therapy)
    – Case identification workflow: consent to research (Yes → * and **; No → *); surgery or biopsy; blood sample; tissue specimen
    – Sample workflow: surgical specimen, biopsy specimen, and blood sample; clinical lab blood test; storage (FFPE, FF); clinical research pathology lab (* blood sample, ** research tissue sample); DNA and RNA extraction
    – Analysis workflow, Analysis Lab (LPM): DNA sequencing, exome sequencing, and OncoScan v3™ (copy number, somatic mutation); gene expression pipeline, OncoScan™ pipeline, SNP chip pipeline, integrative pipeline
    – Translation workflow: OMR (clinical data, follow-up, outcome); N-BCMDTB result evaluation; identification of targeted therapy; personalized medicine
    – Abbreviations: X = no further treatment and research; NAC = neoadjuvant chemotherapy; OMR = Online Medical Record; FFPE = formalin-fixed, paraffin-embedded (tissue); FF = fresh frozen (tissue)
  • 17. IRB-Approved Protocol (flowchart)
    – Evaluation: treatment decision (Yes → undergo surgery; No → excluded)
    – Decision: case decision (Yes → eligible; No → not eligible: no surgery, disagreed, poor sample)
    – Consent: getting consent (Yes → agreed; No → excluded)
    – Clinical workup: blood test; breast surgery
    – Tissue workup: pathology workup; sample collection (extract DNA/RNA from tissue and blood)
    – Assay: DNA genome sequencing; exome sequencing; copy number and somatic mutation analysis using an array platform (OncoScan)
    – Clinical outcome analysis: analysis of outcome data; clinicopathological characteristics
    – Translation: discussion at NBCMDTB; traditional treatment vs. personalized medicine (identification of targeted therapy)
  • 18. Whole Genome Breast Cancer Program: 1. Organization and Progress to date; 2. Historical BIDMC Breast Cancer cases; 3. Clinical Whole Genome Analysis – Laboratory Test; 4. COSMOS: Clinical Whole Genome Analysis on AWS
  • 19. Breast Cancer Clinical Use of WGA
    1. Family and individual risk prediction
    2. Breast cancer tumor characterization
    3. Breast cancer diagnosis
    4. Breast cancer prognosis
    5. Prediction of response to targeted therapies
    6. Indications of outcome and assessment for future treatment refinement
  • 20. Breast Cancer Genomic Devices: 35 devices reviewed; 26 used clinically (chart grouped under Prognosis, Risk Prediction, and Research)
    – Devices: 23andMe*, deCODEme*, BRACAnalysis*, Ambry Genetics*, CCDG Panel, OncoScan, TargetPrint, BluePrint**, PAM50*, BreastProfile*, Her2Pro*, MammaPrint, Methyl-Profiler, Rotterdam Signature, MammoStrat, BreastGeneDX, Breast Cancer Array, OncotypeDX*, Breast Cancer Index, OncoMap3**, AsuraSeq-1000**, OncoCarta**, SNaPshot, MapQuant DX, TheraPrint**, NexCourse Bca, Wash U Panel, Target Now
    – * Associated CPT/CMS codes; ** Not for clinical use
  • 21. Clinically Actionable Breast Cancer Information (data type: number of unique entries)
    – Gene: 773
    – SNP: 1733 (52 SNPs for risk prediction, 1681 SNPs for prognosis)
    – Small insertion: 75
    – Small deletion: 205
    – Translocation: 3
    – Gene expression: 383 (drug targets commonly based on gene expression profile)
    – Protein expression: 7 (HER2, estrogen, and progesterone receptor status)
    – Amplification: 64
    – Deletion: 48 (9 deletions in BRCA1 or BRCA2 detected by BRACAnalysis confer increased breast cancer risk)
    – Total “clinically” actionable: 3291
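Because the slide's table was flattened in extraction, assigning its counts to rows takes some reconstruction. A quick arithmetic check, a sketch using the counts as transcribed and assuming the 48 belongs to the Deletion row, shows the rows are consistent with the stated SNP subtotal and the overall total:

```python
# Counts transcribed from the "Clinically Actionable Breast Cancer Information" slide.
# Assumption: 48 is the Deletion count (the "9" belongs to the BRCA1/BRCA2 note).
snp_breakdown = {"risk prediction": 52, "prognosis": 1681}
entries = {
    "Gene": 773,
    "SNP": 1733,
    "Small Insertion": 75,
    "Small Deletion": 205,
    "Translocation": 3,
    "Gene Expression": 383,
    "Protein Expression": 7,
    "Amplification": 64,
    "Deletion": 48,
}

# The SNP row should equal its two sub-counts.
assert sum(snp_breakdown.values()) == entries["SNP"]  # 52 + 1681 = 1733

total = sum(entries.values())
print(total)  # 3291, matching the slide's total
```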
  • 22. Whole Genome Breast Cancer Program: 1. Organization and Progress to date; 2. Historical BIDMC Breast Cancer cases; 3. Clinical Whole Genome Analysis (WGA) – Laboratory Test; 4. COSMOS: Clinical Whole Genome Analysis on AWS
  • 23. Clinical WGA Workflow: Patients → Samples → Next Generation Sequencers → Bioinformatics Analysis → Clinical Genomics Interpretation Service → Clinical Report and Biomedical Report
  • 25. Reduced Cost of Next Generation Sequencing (NGS)
    – NGS platforms: 5,000 megabases/day
    – Falling per-base sequencing cost
    – Data at petabyte scale
    – NGS analysis involves complex workflows
  • 26. WGA in “Clinical Turn-around”: Future
    – Sample collection: 12 hours
    – Sequencing: 500 hours (target < 40 hours)
    – Analysis: 3 hours, < $100
    – Clinical action: 12 hours
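Totaling the timeline shows how much sequencing dominates the turn-around. A sketch, assuming the slide's "< 40 hours" label is the sequencing target (the flattened slide does not say so explicitly):

```python
# Stage durations in hours, as read from the "Future" timeline slide.
# Assumption: "< 40 hours" is the target for sequencing, which takes ~500 hours today.
today = {"sample collection": 12, "sequencing": 500, "analysis": 3, "clinical action": 12}
target = dict(today, sequencing=40)


def total_hours(stages: dict[str, int]) -> int:
    """End-to-end turn-around as the sum of sequential stages."""
    return sum(stages.values())


print(total_hours(today))   # 527 hours, about 22 days
print(total_hours(target))  # 67 hours, under 3 days (an upper bound, since sequencing is "< 40")
```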
  • 27. Current Costs to Run on Amazon Web Services
    – Details: 1 whole genome at 60x coverage; Spot and Reserved Instances; Amazon Glacier for long-term storage
    – Whole genome analysis: approximately 1 day, approximately $1500
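Comparing this run with the program's < 3 hours / < $100 objective gives the size of the remaining gap. A sketch, taking "approximately 1 day" as 24 hours:

```python
# Current whole-genome run versus the clinical turn-around objective.
current_hours, current_cost = 24.0, 1500.0  # "approximately 1 day", "approximately $1500"
target_hours, target_cost = 3.0, 100.0      # "< 3 hours", "< $100"

speedup_needed = current_hours / target_hours        # 8.0
cost_reduction_needed = current_cost / target_cost   # 15.0
print(f"{speedup_needed:.0f}x faster, {cost_reduction_needed:.0f}x cheaper")  # 8x faster, 15x cheaper
```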
  • 28. Whole Genome Breast Cancer Program: 1. Organization and Progress to date; 2. Historical BIDMC Breast Cancer cases; 3. Clinical Whole Genome Analysis – Laboratory Test; 4. COSMOS: Clinical Whole Genome Analysis on AWS (AWS, Applications, Workflow, COSMOS)
  • 29. Clinical Whole Genome Analysis. Computational objective: < 3 hours, < $100. Four approaches to optimize and achieve our clinical turn-around objective:
    – AWS
    – Refine and Improve WGA Applications
    – Create a Standardized, Robust CWGA Workflow
    – Stabilize a new Workflow and Distributed Computing Platform: COSMOS
  • 31. Dynamic cluster, with the number and type of instances adapted to the data sets, jobs, and applications: on-demand master(s) and load-balanced Spot Instance workers, with BAM files held in S3 storage (diagram).
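The cluster shape described here (on-demand master, Spot workers sized to the workload) can be sketched as a small planning function. The function, its parameters, and the rationale for keeping masters on-demand are illustrative assumptions, not COSMOS code:

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterPlan:
    masters_on_demand: int
    workers_spot: int


def plan_cluster(pending_jobs: int, slots_per_worker: int, max_workers: int, masters: int = 1) -> ClusterPlan:
    """Size a cluster: on-demand master(s) plus enough Spot workers to run pending jobs at once.

    Hypothetical helper: masters stay on-demand so a Spot reclaim cannot take down the
    scheduler; workers scale with the job count but are capped at max_workers.
    """
    if slots_per_worker <= 0:
        raise ValueError("slots_per_worker must be positive")
    wanted = math.ceil(pending_jobs / slots_per_worker)
    return ClusterPlan(masters_on_demand=masters, workers_spot=min(wanted, max_workers))


print(plan_cluster(pending_jobs=100, slots_per_worker=6, max_workers=20))
# ClusterPlan(masters_on_demand=1, workers_spot=17)
```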
  • 32. EC2 instances. Optimization: correct type and number of EC2 instances in the cluster
    – Current non-optimized master: CC2.8xlarge. High memory: single job (BWA) ~10 GB RAM; high I/O: access to common data files; virtualization: HVM for HugePage
    – Current non-optimized worker: CC2.8xlarge. High memory: single job (BWA) ~10 GB RAM; high I/O: access to common data files; virtualization: HVM for HugePage
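Because BWA needs roughly 10 GB per job, memory rather than cores may cap how many jobs share a node. A sketch using the Cluster Compute 8XL figure (60.5 GB) from the instance-type slide; the `reserve_gb` headroom parameter is a hypothetical addition:

```python
import math


def jobs_per_node(node_mem_gb: float, job_mem_gb: float, reserve_gb: float = 0.0) -> int:
    """How many memory-bound jobs fit on one node after reserving memory for the OS."""
    if job_mem_gb <= 0:
        raise ValueError("job_mem_gb must be positive")
    return max(0, math.floor((node_mem_gb - reserve_gb) / job_mem_gb))


# CC2.8xlarge (Cluster Compute 8XL): 60.5 GB; BWA: ~10 GB per job.
print(jobs_per_node(60.5, 10))                  # 6
print(jobs_per_node(60.5, 10, reserve_gb=4.0))  # 5
```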
  • 33. Create stable CWGA AMI(s) with the required applications, libraries, and dependencies: applications (GATK, Samtools, BWA, …), the human reference genome, and annotation databases
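A stable AMI is easier to trust if each node checks its own toolchain at boot. A minimal sketch using Python's `shutil.which`; the tool list is an assumption based on the slide (GATK is assumed to be launched through `java`), and checks for the reference genome and annotation databases are left out:

```python
import shutil

# Hypothetical tool list derived from the slide; GATK is assumed to run via the java launcher.
REQUIRED_TOOLS = ["samtools", "bwa", "java"]


def missing_tools(tools: list[str]) -> list[str]:
    """Return the required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(REQUIRED_TOOLS)
print("AMI ready" if not missing else f"missing: {', '.join(missing)}")
```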
  • 34. Optimize: AMI
    – Compiler: GCC 4.6+ supports AVX mode; refined GCC parameters
    – Compressed libraries: zlib and snappy
    – Refined Java parameters for GATK optimization
    – Memory: HugePage (2M) configured for every node/application
    – Disks: ephemeral disks in RAID 0; cluster disks on GlusterFS
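HugePage memory is reserved up front, so the pool has to be sized for the jobs that will use it. A sketch of that arithmetic for 2 MiB pages, treating the ~10 GB per BWA job as 10 GiB; the result is the kind of value one would set through the Linux `vm.nr_hugepages` sysctl:

```python
import math

HUGEPAGE_BYTES = 2 * 1024 ** 2  # 2 MiB huge pages


def hugepages_needed(mem_gib: float, jobs: int = 1) -> int:
    """Number of 2 MiB huge pages to reserve for `jobs` processes of `mem_gib` each."""
    return math.ceil(jobs * mem_gib * 1024 ** 3 / HUGEPAGE_BYTES)


print(hugepages_needed(10))          # 5120 pages for one ~10 GiB BWA job
print(hugepages_needed(10, jobs=5))  # 25600
```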
  • 35. S3 storage:
    – Storage of BAM files
    – Transfer of BAM and other files
    – “Checkpoint” after each successful workflow stage
    – Backup of intermediate and final results
    – Storage of all timings and job information
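Checkpointing after each successful stage means an interrupted run (for example, one on a reclaimed Spot worker) resumes where it stopped. A minimal sketch of that pattern; the in-memory store stands in for S3, and the key layout and function names are hypothetical:

```python
from typing import Callable, Protocol


class CheckpointStore(Protocol):
    def exists(self, key: str) -> bool: ...
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for S3 in this sketch; a real store would write objects to a bucket."""

    def __init__(self) -> None:
        self.objects: dict[str, str] = {}

    def exists(self, key: str) -> bool:
        return key in self.objects

    def put(self, key: str, value: str) -> None:
        self.objects[key] = value


def run_stages(sample: str, stages: list[tuple[str, Callable[[], None]]], store: CheckpointStore) -> list[str]:
    """Run stages in order, skipping checkpointed ones; checkpoint after each success."""
    ran = []
    for name, action in stages:
        key = f"{sample}/checkpoints/{name}.done"
        if store.exists(key):
            continue  # finished in an earlier, possibly interrupted, run
        action()      # an exception here leaves no checkpoint, so the stage reruns next time
        store.put(key, "ok")
        ran.append(name)
    return ran


store = InMemoryStore()
stages = [(n, lambda: None) for n in ("align", "call_variants", "annotate")]
print(run_stages("sample01", stages, store))  # ['align', 'call_variants', 'annotate']
print(run_stages("sample01", stages, store))  # [] (everything already checkpointed)
```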
  • 36. Clinical Whole Genome Analysis. Computational objective: < 3 hours, < $100. Four approaches to optimize and achieve our clinical turn-around objective:
    – AWS
    – Refine and Improve WGA Applications
    – Create a Standardized, Robust CWGA Workflow
    – Stabilize a new Workflow and Distributed Computing Platform: COSMOS
  • 38. WGA Applications: Genome Analysis Toolkit (GATK) “best practice” phases: Preparation/Alignment, Variant calling, Annotation. Source: GATK best practices, Broad Institute, http://www.broadinstitute.org/gatk/guide/topic?name=best-practices
  • 39. Applications Parallelization: 5-exome example (chart of a 5-exome run across the Preparation/Alignment, Variant calling, and Annotation phases; y-axis 0 to 600)
  • 41. Clinical Whole Genome Analysis. Computational objective: < 3 hours, < $100. Four approaches to optimize and achieve our clinical turn-around objective:
    – AWS
    – Refine and Improve WGA Applications
    – Create a Standardized, Robust CWGA Workflow
    – Stabilize a new Workflow and Distributed Computing Platform: COSMOS
  • 44. GenomeKey implements GATK "best practices" for variant calling: Preparation/Alignment, Variant calling, Annotation. Source: GATK best practices, Broad Institute, http://www.broadinstitute.org/gatk/guide/topic?name=best-practices
  • 46. Databases integrated: CytoBank, The_1000g_Febuary_all, dbSNP135, NHLBI_Exome_Project_euro, NHLBI_Exome_Project_aa, NHLBI_Exome_Project_all, TFBS, Segmental_Duplications, RepeatMasker, HGMD_INDEL, HGMD_SNP, Self Chain, mirBase, COSMIC, TargetScan, GWAS_Catalog, SIFT, PolyPhen2, Mutation_Taster, GERP, PhyloP, LRT, Mce46way, ENCODE_DNaseI_Hypersensitivity, ENCODE_Transcription_Factor, UCSC_Gene, Refseq_Gene, Ensembl_Gene, CCDS_Gene, DrugBank, Complete_Genomics_69. Plus support for generic database file formats such as .bed and .gff3.
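Generic formats such as .bed make it possible to annotate variants against arbitrary region sets. A minimal sketch, not GenomeKey's implementation, of parsing BED lines and finding the intervals that overlap a VCF-style 1-based position; BED coordinates are 0-based and half-open, so the lookup converts before comparing:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BedInterval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str


def parse_bed(lines: list[str]) -> list[BedInterval]:
    """Parse the first four BED columns, skipping blank, comment, track and browser lines."""
    intervals = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        name = fields[3] if len(fields) > 3 else "."
        intervals.append(BedInterval(fields[0], int(fields[1]), int(fields[2]), name))
    return intervals


def annotate(chrom: str, pos_1based: int, intervals: list[BedInterval]) -> list[str]:
    """Names of intervals overlapping a 1-based variant position (as used in VCF)."""
    pos0 = pos_1based - 1
    return [iv.name for iv in intervals if iv.chrom == chrom and iv.start <= pos0 < iv.end]


bed = ["track name=example", "chr17\t100\t200\tregionA", "chr17\t150\t300\tregionB"]
regions = parse_bed(bed)
print(annotate("chr17", 151, regions))  # ['regionA', 'regionB']
print(annotate("chr17", 101, regions))  # ['regionA']
```

A linear scan is fine for a handful of regions; for genome-wide tracks an interval tree or a sorted binary search would be the usual choice.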
  • 47. Workflow Optimization
    – Speed: replacing BWA with SNAP (for the same accuracy); re-implementing some slow algorithms (e.g., BQSR)
    – Accuracy: adding quality control steps; replacing Unified Genotyper with Haplotype Caller
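BQSR compares the base quality a sequencer reports with the error rate actually observed. A heavily simplified sketch of that core idea, grouping only by reported quality; GATK's BQSR also models covariates such as machine cycle and sequence context and uses its own estimator, so the +1/+2 pseudocount here is our assumption:

```python
import math
from collections import defaultdict


def empirical_quality(observations: list[tuple[int, bool]]) -> dict[int, float]:
    """Map each reported base quality to the Phred-scaled quality actually observed.

    observations: (reported_quality, is_mismatch) pairs for bases at sites not expected
    to vary (e.g. known SNP sites excluded). The +1/+2 pseudocount keeps bins with
    zero mismatches finite.
    """
    counts: dict[int, list[int]] = defaultdict(lambda: [0, 0])  # quality -> [mismatches, total]
    for reported, mismatch in observations:
        counts[reported][0] += int(mismatch)
        counts[reported][1] += 1
    return {q: -10 * math.log10((mm + 1) / (n + 2)) for q, (mm, n) in sorted(counts.items())}


# 998 bases reported at Q30, 9 of them mismatching: (9 + 1) / (998 + 2) = 0.01 error rate.
obs = [(30, True)] * 9 + [(30, False)] * 989
print(empirical_quality(obs))  # reported Q30 bases behave like ~Q20 here
```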
  • 48. Clinical Whole Genome Analysis. Computational objective: < 3 hours, < $100. Four approaches to optimize and achieve our clinical turn-around objective:
    – AWS
    – Refine and Improve WGA Applications
    – Create a Standardized, Robust CWGA Workflow
    – Stabilize a new Workflow and Distributed Computing Platform: COSMOS
  • 49. COSMOS Workflow Management System
    – COSMOS: job splitting, job tracking
    – GenomeKey: Preparation/Alignment, Variant calling, Annotation
    – GlusterFS, MySQL DB, networking, web interface, OS & software, grid engine
    – AWS: EC2 instances and S3 storage
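Job splitting is what lets a single sample fan out into many grid-engine jobs. A sketch of one common strategy, cutting each contig into fixed-size regions written in the `chr:start-end` form that tools such as samtools accept; the contig lengths and chunk size are made up, and this is not COSMOS's own splitting logic:

```python
def split_into_intervals(contig_lengths: dict[str, int], chunk_size: int) -> list[str]:
    """Scatter a genome-wide job into region strings ("chr:start-end", 1-based, inclusive)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    regions = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            regions.append(f"{contig}:{start}-{end}")
    return regions


# Toy contig lengths, not real chromosome sizes.
jobs = split_into_intervals({"chrA": 250, "chrB": 100}, chunk_size=100)
print(jobs)  # ['chrA:1-100', 'chrA:101-200', 'chrA:201-250', 'chrB:1-100']
```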
  • 50. COSMOS Parallelization (chart: number of jobs, 0 to 1200, in the Preparation/Alignment, Variant calling, and Annotation phases for 1 exome, 5 exomes, 10 exomes, and all runs)
  • 54. Job Dependency Tracking (Preparation / Alignment, Variant Calling, Annotation)
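Dependency tracking amounts to running each job only after its inputs exist. A sketch using Python's standard `graphlib.TopologicalSorter` (Python 3.9+) on a hypothetical job graph for the three phases; how COSMOS itself tracks dependencies is not shown on the slide:

```python
from graphlib import TopologicalSorter

# Hypothetical job graph for one sample: each job maps to the jobs it depends on.
deps = {
    "align_chr1": set(),
    "align_chr2": set(),
    "merge_bam": {"align_chr1", "align_chr2"},
    "call_chr1": {"merge_bam"},
    "call_chr2": {"merge_bam"},
    "combine_vcf": {"call_chr1", "call_chr2"},
    "annotate": {"combine_vcf"},
}

ts = TopologicalSorter(deps)
ts.prepare()
waves = []
while ts.is_active():
    ready = sorted(ts.get_ready())  # every job whose dependencies have all finished
    waves.append(ready)
    ts.done(*ready)                 # a real scheduler would call done() as each job completes
print(waves)
# [['align_chr1', 'align_chr2'], ['merge_bam'], ['call_chr1', 'call_chr2'], ['combine_vcf'], ['annotate']]
```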
  • 55. COSMOS Web Interface (Preparation / Alignment, Variant Calling, Annotation)
  • 56. Clinical Whole Genome Analysis. Computational objective: < 3 hours, < $100. Four approaches to optimize and achieve our clinical turn-around objective:
    – AWS
    – Refine and Improve WGA Applications
    – Create a Standardized, Robust CWGA Workflow
    – Stabilize a new Workflow and Distributed Computing Platform: COSMOS
  • 57. Whole Exome Analysis Pre- and Post-Optimization (chart: wall time, 0 to 30, before and after optimization for 1, 5, and 10 exomes; cost labels on the chart: ~$90 (before), ~$48, ~$47, ~$27, ~$27, and ~$10)
  • 61. Acknowledgments
    – LPM (Tonellato): Erik Gafni (InVitae), Vince Fusaro (InVitae), Jared B. Hawkins, Ryan Powles, Yassine Souilmi
    – Autism Speaks: 6000 exomes (current), 10,000 genomes
    – Wall lab (Harvard & Stanford University): Jae-Yoon Jung, Alex Lancaster, David Tulga
    – Ancient Human Genomes: David Reich