How novel compute technology
transforms life science research
From Hadoop/Spark to cloud-based micro-services
HEALTH & BIOSECURITY
Dr Denis Bauer | Bioinformatics | @allPowerde
6 Dec 2016 – Cloudera Public Sector Government Forum, Canberra
stuckincustoms
Overview
Transformational Bioinformatics | Denis C. Bauer | @allPowerde
GT-Scan2: How can genome engineering be made safer?
VariantSpark: How to find disease genes in population-size cohorts?
CSIRO: How to facilitate better collaborations?
Team CSIRO
• 5319 talented staff
• $1 billion+ budget
• Working with 2800+ industry partners
• 55 sites across Australia
• Top 1% of global research agencies
• Each year 6 CSIRO technologies contribute $5 billion to the economy
Big ideas start here
• EXTENDED WEAR CONTACTS
• POLYMER BANKNOTES
• RELENZA FLU TREATMENT
• Fast WLAN (Wireless Local Area Network)
• AEROGARD
• TOTAL WELLBEING DIET
• RAFT POLYMERISATION
• BARLEYmax™
• SELF TWISTING YARN
• SOFTLY WASHING LIQUID
• HENDRA VACCINE
• NOVACQ™ PRAWN FEED
Convenient cardiac rehabilitation
Enhancing relationship between patient and mentor
Digital data collection
Equitable access
World's first clinically validated smartphone-based cardiac rehab: uptake +30% and completion +70%
Preparation for and recovery from a Total Knee Replacement
• Remote monitoring by clinician
• Physiotherapy
• Wearable technology
• Gamification
Genomic sequencing is revolutionizing health care today. It offers up to 50%
more diagnoses than the standard of care
and is on average 96% cheaper
Bauer et al. Trends Mol Med. 2014 PMID: 24801560
Advances in sequencing technology have
generated the capacity to sequence the
Earth’s genome in just 10 days
The human genome is 3 billion letters long:
we would need 3 billion samples to analyze it robustly
Genomics projects hence are getting bigger
• Human genome: ~1 sample
• The HapMap Project: 270 samples (2002)
• 1000 Genomes Project: 1097 samples (2012)
• The Cancer Genome Atlas: 11,000 samples (2015)
• 100,000 Genomes Project: 70,000 individuals by 2017
• ASPREE: 4000 healthy 70+ year olds
• Project MinE: 15,000 people with ALS
Single samples are around 200GB in size
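To give a feel for what 200 GB per sample means at cohort scale, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that every sample in every listed project is a whole genome of about 200 GB (this is not true of the older projects) and uses decimal units (1 TB = 1000 GB). The names `SAMPLE_SIZE_GB`, `COHORTS` and `cohort_size_tb` are made up for this example.

```python
# Assumption taken from the slide: every sample is about 200 GB.
SAMPLE_SIZE_GB = 200

# Cohort sizes as listed on the slide.
COHORTS = {
    "HapMap Project": 270,
    "1000 Genomes Project": 1_097,
    "ASPREE": 4_000,
    "The Cancer Genome Atlas": 11_000,
    "Project MinE": 15_000,
    "100,000 Genomes Project": 70_000,
}


def cohort_size_tb(n_samples, sample_size_gb=SAMPLE_SIZE_GB):
    """Return the raw data volume in terabytes (1 TB = 1000 GB)."""
    return n_samples * sample_size_gb / 1000


for name, n_samples in COHORTS.items():
    print(f"{name}: {cohort_size_tb(n_samples):,.1f} TB")
```

Under that assumption, the largest cohort listed (70,000 individuals) comes to roughly 14,000 TB, i.e. about 14 PB of raw data.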
New demands on sequence analysis
• The sheer volume of new data
necessitates new approaches.
Computational genomics must
progress from file formats to APIs,
from local hardware to the elasticity
of the cloud, from a cottage industry
of poorly maintained academic
software to professional-grade,
scalable code, and from one-time
evaluation by publication to
continuous evaluation by online
benchmarks.
Paten et al. The NIH BD2K center for big data in
translational genomics JAMIA 2015
1. Elasticity in the Cloud
Elastic cloud compute… is like an in-room sound system
Benefits:
• Instant availability of adequately powered system
• Images can be shared and everything on it is automatically version controlled
2. Efficient scalability
Kelly et al. Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of
human genetic variation in clinical and population-scale genomics Genome Biology 2015
• Bespoke parallelization, e.g. Churchill
• Chromosomal split, e.g. NGSANE
• MapReduce, e.g. GATK Queue
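The chromosomal-split strategy listed above can be sketched in a few lines: group records by chromosome, then hand each group to a separate worker. This is only an illustration of the idea, not how NGSANE or any other named tool is implemented. The function names and the toy "count variants" task are hypothetical, and a thread pool is used only to keep the example self-contained; a real pipeline would use separate processes or cluster jobs.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_chromosome(variants):
    """Group (chromosome, position) records by chromosome."""
    groups = defaultdict(list)
    for chrom, pos in variants:
        groups[chrom].append(pos)
    return dict(groups)


def count_variants(item):
    """Toy per-chromosome task: count the records in one group."""
    chrom, positions = item
    return chrom, len(positions)


def run_chromosomal_split(variants, max_workers=4):
    """Run the per-chromosome task on each group via a worker pool."""
    groups = split_by_chromosome(variants)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(count_variants, groups.items()))


if __name__ == "__main__":
    demo = [("chr1", 100), ("chr2", 5), ("chr1", 200)]
    print(run_chromosomal_split(demo))
```

One limitation of this strategy is visible even in the toy version: the amount of work per worker depends on chromosome size, so the load is not balanced across workers.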
Population-scale genomic data analysis requires BigData solutions

                            Desktop compute   High-performance compute cluster   Hadoop/Spark compute cluster
Focus                       small data        Compute-intensive                  Data-intensive
Fault tolerant              No                No                                 Yes
Node-bound                  Yes               Yes                                No
Parallelization             10 CPU            100 CPU                            1000 CPU
Parallelization procedure   bespoke           bespoke                            standardized
CSIRO solution
Spark Summit 2016 (June) by Frank Austin Nothaft (UC Berkeley)
(70TB – 300 individuals)
One human genome analyzed (variant called) every 3.2 hours
Still not fast enough…
Clinical genomics facilities expect to deal with >18,000 genomes a
year, so a 3.2-hour turnaround time (TAT) per genome would accumulate about 6.5 years of compute for each year of samples.
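The arithmetic behind that figure can be checked with a short sketch. It assumes exactly 18,000 genomes (the slide says "more than"), 3.2 hours per genome processed one after another, and a 365-day year; the variable names are illustrative.

```python
import math

GENOMES_PER_YEAR = 18_000   # lower bound quoted on the slide
HOURS_PER_GENOME = 3.2      # turnaround time per genome
HOURS_PER_YEAR = 24 * 365   # assuming a 365-day year

# Total compute needed for one year's worth of genomes.
total_hours = GENOMES_PER_YEAR * HOURS_PER_GENOME
compute_years = total_hours / HOURS_PER_YEAR

# Number of genomes that must be in flight at once to keep up.
min_parallel_pipelines = math.ceil(compute_years)

print(f"{total_hours:,.0f} compute hours, about {compute_years:.1f} years")
print(f"at least {min_parallel_pipelines} genomes must be processed concurrently")
```

The result is about 57,600 compute hours, or a little over 6.5 years, so at this rate a facility would need at least seven such pipelines running around the clock just to keep pace with incoming samples.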
CSIRO along with other prominent research institutes (MIT,
Berkeley) partnered with Cloudera and AWS to investigate
• HPC-based solutions
• GATKspark (the Spark reimplementation of the accepted gold
standard)
• ADAM
Setup
• Instances
– 5 workers
– 3 Hadoop schedulers
– one Cloudera Manager
• Why we chose to go with a Cloudera solution
– Setup and deployment are automated, e.g. no manual IP-address matching
– No need for admin support, e.g. preconfigured
– Setup is portable to other providers and on-premise
All humans carry between 200 and 800
mutations that disrupt the function of a
gene.
Which needle is the right one?
http://science.sciencemag.org/content/335/6070/823.full
https://waynealliance.wordpress.com/2010/06/02/all-needles-no-hay/
BMC Genomics 2015, 16:1052 PMID: 26651996 (IF=4)
[Figure: time in seconds (axis 0 to 2000) per method (Python, R, Hadoop, ADAM, ADMIXTURE, VariantSpark), broken down by task (binary conversion, clustering, pre-processing)]
VariantSpark can classify 3000 individuals and 80 million variants in
under 30 minutes
• Collaboration between CSIRO, NCI and the John Curtin School of
Medical Research (JCSMR)
• Reuse of the AWS cluster setup on the NCI on-premise cluster.
– Cluster built jointly by a CSIRO Hadoop administrator and local
Cloudera staff
– VariantSpark deployed and running within only 3 days
• Demonstration of the lower risk for organisations with a proof of
concept
Setup
Application cases for a VariantSpark cluster
NHMRC Dementia Research Teams Grant led by Ian Blair (MQ): Developing insight into the molecular origins of familial and sporadic frontotemporal dementia and amyotrophic lateral sclerosis
Affected: 900 WGS
Normal: 1400 WGS
Identify causative mutations
Cluster individuals on disease progression
Kidney disease, Simon Foote (JCSMR): Uncover genetic cause of early onset kidney failure.
Genome engineering is currently
being developed for medical
treatments in humans, for example for
cancer, blindness and HIV.
However, the molecular
technology, CRISPR, is not 100%
efficient.
Aim: Develop a computational
guidance framework to enable
edits the first time, every time.
Achieving the first time; every time
1. Better understanding of the science
2. Higher powered computational tools
• Super-computing-scale analysis
• Interactive real time analysis (query style research)
lauren riddoch
iconfinder
GT-Scan2
Ranked choices
• We tested GT-Scan2.0 against two publicly available models:
– sgRNAscorer (Chari et al 2015, Nature Methods)
– WU-CRISPR (Wong et al 2015, Genome Biology)
• Tested on 2 independent datasets (>4000 sgRNAs)
• Our chromatin-aware model consistently outperformed the other models
Better Science
[Figure: precision/recall curves (precision vs. recall) and area under the precision/recall curve, Validation Set 1]
Higher powered instantaneous compute
                            Desktop compute   High-performance compute   Hadoop/Spark     Microservices
Focus                       small data        Compute-intensive          Data-intensive   Agility
Fault tolerant              No                No                         Yes              (Yes)
Node-bound                  Yes               Yes                        No               No
Parallelization             10 CPU            100 CPU                    1000 CPU         1000 CPU
Parallelization procedure   bespoke           bespoke                    standardized     standardized
Overhead in the cloud       NA                spin-up lag                spin-up lag      instantaneous
CSIRO solution
stuckincustoms
International Recognition
[Figure: Area Under the Precision/Recall Curve]
Implementation
• GT-Scan2.0 is implemented as an AWS Lambda function
• Serverless function:
– Does not require users to have high compute power
• Scalable:
– Can be easily scaled to whole-genome analysis
• We also intend to implement it as a “stand-alone” version
– Can be run on local servers
– Can incorporate your own ChIP-seq data rather than public data
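To make the serverless idea concrete, here is a minimal sketch of what a Python AWS Lambda handler for a target-site search could look like. It is not the GT-Scan2 implementation, whose internals the slides do not describe. The event shape (`{"sequence": ...}`), the function names, and the choice to search only the forward strand for 20-nt protospacers followed by an NGG PAM (the usual SpCas9 pattern) are all assumptions made for illustration, and no scoring or chromatin information is included.

```python
import json
import re

# Assumed target pattern: a 20-nt protospacer followed by an NGG PAM
# (forward strand only). The lookahead lets overlapping sites be reported.
TARGET = re.compile(r"(?=([ACGT]{20})[ACGT]GG)")


def find_candidate_sites(sequence):
    """Return start offsets and protospacers of candidate target sites."""
    seq = sequence.upper()
    return [
        {"start": match.start(), "protospacer": match.group(1)}
        for match in TARGET.finditer(seq)
    ]


def lambda_handler(event, context):
    """Lambda entry point; expects a hypothetical {"sequence": "..."} event."""
    sites = find_candidate_sites(event.get("sequence", ""))
    return {"statusCode": 200, "body": json.dumps({"sites": sites})}


if __name__ == "__main__":
    # Local invocation; no AWS account is needed to try the handler.
    demo_event = {"sequence": "A" * 20 + "TGG"}
    print(lambda_handler(demo_event, None))
```

Because the handler keeps no state between calls, many copies can run at once on different stretches of sequence, which is what makes this style of function easy to scale out.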
On-demand instances vs Lambda

                Pro                          Con
Lambda          Instantaneously available    Relatively small processing power
Spark cluster   Unlimited processing power   Spin-up time
There is a sweet spot beyond which a large number of “nimble” small processors
performs worse than a powerful cluster, despite the cluster's overhead,
especially as spin-up overhead is reduced by managers like Cloudera
Director.
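One way to reason about that sweet spot is a simple break-even model. The sketch below assumes, hypothetically, that Lambda starts instantly but processes work at a lower aggregate rate, while a cluster pays a fixed spin-up cost and then processes work faster. All rates and times are made-up illustration values, not measurements from the talk.

```python
def lambda_seconds(work_units, lambda_rate):
    """Time on Lambda: no start-up cost, lower throughput (units/second)."""
    return work_units / lambda_rate


def cluster_seconds(work_units, cluster_rate, spin_up_s):
    """Time on a cluster: fixed spin-up cost plus faster processing."""
    return spin_up_s + work_units / cluster_rate


def break_even_work(lambda_rate, cluster_rate, spin_up_s):
    """Workload size at which both options take the same time."""
    if cluster_rate <= lambda_rate:
        raise ValueError("cluster must be faster than Lambda to break even")
    return spin_up_s * lambda_rate * cluster_rate / (cluster_rate - lambda_rate)


if __name__ == "__main__":
    # Hypothetical numbers: 10 vs 100 units/s and a 10-minute spin-up.
    w = break_even_work(lambda_rate=10, cluster_rate=100, spin_up_s=600)
    print(f"break-even at about {w:,.0f} work units")
```

In this model, shortening the spin-up time lowers the break-even workload proportionally, which matches the point that cluster managers shift the balance toward the cluster.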
Three things to remember
• Large volumes of detailed data?
VariantSpark, bringing bigLearning to genomics, can
classify 3000 individuals and 80 million variants in under
30 minutes using Spark
• Parallelizable tasks that need persistent cloud availability?
GT-Scan2, computationally guiding genome engineering,
uses chromatin information and the latest in cloud
compute to improve CRISPR target site identification
• CSIRO specializes in using the latest advances in
compute technology to push the boundary on
bioinformatics problems
Natalie Twine
Acknowledgements
Denis Bauer, Oscar Luo, Rob Dunne, Piotr Szul
Transformational Bioinformatics Team
Aidan O’Brien, Laurence Wilson
Adrian White
Mia Champion
Gaetan Burgio
Collaborators
David Levy, Ian Blair
Kelly Williams
News
Software
Open Position
Dan Andrews

 
Introduction to Bioinformatics
Introduction to BioinformaticsIntroduction to Bioinformatics
Introduction to Bioinformatics
 
The missing data issue for HiSeq runs
The missing data issue for HiSeq runsThe missing data issue for HiSeq runs
The missing data issue for HiSeq runs
 
Deciphering the regulatory code in the genome
Deciphering the regulatory code in the genomeDeciphering the regulatory code in the genome
Deciphering the regulatory code in the genome
 
ReliF
ReliFReliF
ReliF
 
STAR: Recombination site prediction
STAR: Recombination site predictionSTAR: Recombination site prediction
STAR: Recombination site prediction
 
SUMOylation site prediction
SUMOylation site predictionSUMOylation site prediction
SUMOylation site prediction
 

Recently uploaded

Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bSérgio Sacani
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsSérgio Sacani
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsSumit Kumar yadav
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 

Recently uploaded (20)

CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Botany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questionsBotany krishna series 2nd semester Only Mcq type questions
Botany krishna series 2nd semester Only Mcq type questions
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 

How novel compute technology transforms life science research

  • 1. How novel compute technology transforms life science research: From Hadoop Spark to cloud-based micro-services HEALTH & BIOSECURITY Dr Denis Bauer | Bioinformatics | @allPowerde 6 Dec 2016 – Cloudera Public Sector Government Forum, Canberra stuckincustoms
  • 2. Overview Transformational Bioinformatics | Denis C. Bauer | @allPowerde GT-Scan2 How can genome engineering be made safer? VariantSpark How to find disease genes in population-size cohorts? CSIRO How to facilitate better collaborations?
  • 3. Team CSIRO: 5319 talented staff; $1 billion+ budget; working with 2800+ industry partners; 55 sites across Australia; top 1% of global research agencies; each year 6 CSIRO technologies contribute $5 billion to the economy
  • 4. Big ideas start here: extended wear contacts; polymer banknotes; Relenza flu treatment; fast WLAN (Wireless Local Area Network); Aerogard; Total Wellbeing Diet; RAFT polymerisation; BARLEYmax™; self-twisting yarn; Softly washing liquid; Hendra vaccine; NOVACQ™ prawn feed. Convenient cardiac rehabilitation: enhancing relationship between patient and mentor; digital data collection; equitable access. World's first clinically validated smartphone-based cardiac rehab: uptake +30% and completion +70%
  • 5. Preparation for and recovery from a Total Knee Replacement: remote monitoring by clinician; physiotherapy; wearable technology; gamification
  • 6. Genomic sequencing is revolutionizing Health Care today. It offers up to 50% more diagnoses than standard of care and is on average 96% cheaper. (Bauer et al. Trends Mol Med. 2014 PMID: 24801560)
  • 7. Advances in sequencing technology have generated the capacity to sequence the Earth’s Genome in just 10 days. The human genome is 3 billion letters long. Need 3 billion samples to robustly analyze
  • 8. Genomics projects hence are getting bigger: Human genome, ~1 sample; The HapMap Project, 270 samples (2002); 1000 Genome Project, 1097 samples (2012); The Cancer Genome Atlas, 11,000 samples (2015); 100,000 Genomes project, 70,000 individuals by 2017; ASPREE, 4000 healthy 70+ year olds; Project MinE, 15,000 people with ALS. Single samples are around 200GB in size
  • 9. New demands on sequence analysis: The sheer volume of new data necessitates new approaches. Computational genomics must progress from file formats to APIs, from local hardware to the elasticity of the cloud, from a cottage industry of poorly maintained academic software to professional-grade, scalable code, and from one-time evaluation by publication to continuous evaluation by online benchmarks. (Paten et al. The NIH BD2K center for big data in translational genomics. JAMIA 2015)
  • 10. Elasticity in the Cloud (1). Elastic cloud compute… is like an in-room sound system. Benefits: instant availability of adequately powered system; images can be shared and everything on it is automatically version controlled
  • 11. Efficient scalability (2). Bespoke parallelization, e.g. Churchill; chromosomal split, e.g. NGSANE; MapReduce, e.g. GATK queue. (Kelly et al. Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics. Genome Biology 2015) Beunder 2010 Embedded
  • 12. Population-scale genomic data analysis requires BigData solutions
       Platform: Desktop compute | High-performance compute cluster | Hadoop/Spark compute cluster
       Focus: small data | compute-intensive | data-intensive
       Fault tolerant: No | No | Yes
       Node-bound: Yes | Yes | No
       Parallelization: 10 CPU | 100 CPU | 1000 CPU
       Parallelization procedure: bespoke | bespoke | standardized
       CSIRO solution
  • 13. Spark Summit 2016 (June) by Frank Austin Nothaft (UC Berkeley) (70TB – 300 individuals): One human genome analyzed (variant called) every 3.2 hours
  • 14. Still not fast enough… Clinical genomics facilities expect to deal with >18,000 genomes a year, so a 3.2h turnaround time (TAT) would accumulate 6.5 years of compute. CSIRO, along with other prominent research institutes (MIT, Berkeley), partnered with Cloudera and AWS to investigate: HPC-based solutions; GATKspark (the Spark reimplementation of the accepted gold standard); ADAM
  • 15. Setup. Instances: 5 workers, 3 Hadoop schedulers, one Cloudera manager. Why we chose to go with a Cloudera solution: set-up and deployment are automated, e.g. no manual IP-address matching; no need for admin support, e.g. preconfigured; set-up is portable to other providers and on-premise
  • 16. All humans carry between 200 and 800 mutations that disrupt the function of a gene. Which needle is the right one? http://science.sciencemag.org/content/335/6070/823.full https://waynealliance.wordpress.com/2010/06/02/all-needles-no-hay/
  • 17. VariantSpark can classify 3000 individuals and 80 million variants in under 30 minutes. [Figure: time in seconds (0 to 2000) per method (Python, R, Hadoop, Adam, ADMIXTURE, VariantSpark), split by task (binary-conversion, clustering, pre-processing)] BMC Genomics 2015, 16:1052 PMID: 26651996 (IF=4)
  • 18. Setup. Collaboration between CSIRO, NCI and the John Curtin School of Medical Research (JCSMR). Reuse of the AWS cluster set-up on the NCI on-premise cluster: cluster built in a joint effort by the CSIRO Hadoop administrator and local Cloudera staff; VariantSpark deployed and running within only 3 days. Demonstration of the lower risk for organisations with proof of concept
  • 19. Application cases for a VariantSpark cluster. NHMRC Dementia Research Teams Grant led by Ian Blair (MQ): developing insight into the molecular origins of familial and sporadic frontotemporal dementia and amyotrophic lateral sclerosis; affected 900 WGS, normal 1400 WGS; identify causative mutations; cluster individuals on disease progression. Kidney disease, Simon Foote (JCSMR): uncover genetic cause of early onset kidney failure.
  • 20. Genome engineering is currently being developed for medical treatments in humans, such as for cancer, blindness and HIV. However, the molecular technology, CRISPR, is not 100% efficient. Aim: Develop a computational guidance framework to enable edits the first time; every time.
  • 21. Achieving the first time; every time. 1. Better understanding of the science. 2. Higher powered computational tools: super-computing-scale analysis; interactive real time analysis (query style research). GT-Scan2: ranked choices. (lauren riddoch; iconfinder)
  • 22. Better Science. We tested GT-Scan2.0 against two publicly available models: sgRNAscorer (Chari et al 2015, Nature Methods) and WU-CRISPR (Wong et al 2015, Genome Biology). Tested on 2 independent datasets (>4000 sgRNAs). Our chromatin aware model consistently outperformed the other models. [Figure: Area Under the Precision/Recall Curve; axes Recall and Precision; Validation Set 1]
  • 23. Higher powered instantaneous compute
       Platform: Desktop compute | High-performance compute | Hadoop/Spark | Microservices
       Focus: small data | compute-intensive | data-intensive | agility
       Fault tolerant: No | No | Yes | (Yes)
       Node-bound: Yes | Yes | No | No
       Parallelization: 10 CPU | 100 CPU | 1000 CPU | 1000 CPU
       Parallelization procedure: bespoke | bespoke | standardized | standardized
       Overhead in the cloud: NA | spin-up lag | spin-up lag | instantaneously
       CSIRO solution
  • 24. stuckincustoms Area Under the Precision/Recall Curve International Recognition
  • 25. Implementation. GT-Scan2.0 is implemented as an AWS Lambda function. Server-less function: does not require users to have high-compute power. Scalable: can be easily scaled to whole genome analysis. Also intend to implement as a “stand-alone”: can be run on local servers; can incorporate your own ChIP-seq data rather than public data
  • 26. On-demand instances vs Lambda
       Lambda: Pro: instantaneously available | Con: rel. small processing power
       Spark-cluster: Pro: unlimited processing power | Con: spin-up time
       There is a sweet spot beyond which a large number of “nimble” small processors perform worse than a powerful cluster with overhead, especially when spin-up overhead is reduced by managers like Cloudera Director.
  • 27. Three things to remember. Large volumes of detailed data? VariantSpark, bringing bigLearning to genomics, can classify 3000 individuals and 80 million variants in under 30 minutes using Spark. Parallelizable tasks, persistent cloud-availability? GT-Scan2, computationally guiding genome engineering, uses chromatin information and the latest in cloud compute to improve CRISPR target site identification. CSIRO specializes in using the latest advances in compute technology to push the boundary on bioinformatics problems
  • 28. Acknowledgements (Transformational Bioinformatics Team and Collaborators): Natalie Twine, Denis Bauer, Oscar Luo, Rob Dunne, Piotr Szul, Aidan O’Brien, Laurence Wilson, Adrian White, Mia Champion, Gaetan Burgio, David Levy, Ian Blair, Kelly Williams, Dan Andrews. News; Software; Open Position
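
The throughput figures quoted on slides 8, 13 and 14 can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a 365-day year, pipelines running around the clock, and decimal units (1 PB = 1,000,000 GB), none of which the slides state.

```python
import math

# Assumption: a non-leap year with pipelines running 24 hours a day.
HOURS_PER_YEAR = 24 * 365


def serial_compute_years(genomes_per_year: int, hours_per_genome: float) -> float:
    """Years a single pipeline needs to process one year's intake of genomes."""
    return genomes_per_year * hours_per_genome / HOURS_PER_YEAR


def parallel_pipelines_needed(genomes_per_year: int, hours_per_genome: float) -> int:
    """Smallest number of always-on pipelines that keeps up with the yearly intake."""
    return math.ceil(genomes_per_year * hours_per_genome / HOURS_PER_YEAR)


def raw_storage_petabytes(genomes: int, gb_per_sample: float) -> float:
    """Raw data volume in decimal petabytes (assumes 1 PB = 1,000,000 GB)."""
    return genomes * gb_per_sample / 1_000_000


if __name__ == "__main__":
    genomes = 18_000        # slide 14: >18,000 genomes a year
    hours_per_genome = 3.2  # slide 13: one genome every 3.2 hours
    gb_per_sample = 200     # slide 8: single samples are around 200GB

    print(f"serial compute: {serial_compute_years(genomes, hours_per_genome):.2f} years")
    print(f"pipelines needed: {parallel_pipelines_needed(genomes, hours_per_genome)}")
    print(f"raw storage: {raw_storage_petabytes(genomes, gb_per_sample):.1f} PB")
```

Under these assumptions the serial figure comes out at roughly 6.6 years, in line with the 6.5 years quoted on slide 14, and the same intake implies several petabytes of raw data per year.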

Editor's Notes

  1. Staff as at 3 March 2016: 5319. 2014–15 budget: $1.2 billion. Today we have around 5300 talented people working out of 50-plus centres in Australia and internationally. We are a billion dollar organisation. We generate $485+ million in external revenue; nearly 40 per cent of our revenue is externally sourced. Our people work closely with industry and communities to leave a lasting legacy. Our ability to achieve results is shown by the quality of our research. We are in the top 1% of global research institutions in 15 of 22 research fields and in the top 0.1% in four research fields. CSIRO is the key connector of institutions in the Australian system for some areas. CSIRO is the most central Australian institution in 6 research fields: Agricultural Sciences, Environment/Ecology, Plant and Animal Sciences, Geosciences, Chemistry and Materials Science. CSIRO works with 1208 SMEs and 2,877 customers each year. We’re always looking for ways we can help business and industry.
  2. Our work has impacted the daily lives of Australians and those around the world. These are some of our top inventions. We invented the world’s best wireless technology for our homes and offices. We developed the Total Wellbeing Diet, a higher protein, low-fat diet that’s nutritious and facilitates sustainable weight loss. We developed Softly washing liquid, the first formula to successfully wash wool at high temperatures, killing bacteria while not shrinking the wool. We developed BARLEYmax, a high fibre wholegrain which has four times the resistant starch and twice the dietary fibre of regular grains. We invented Relenza, a treatment for flu. We kept flies off Her Majesty, Queen Elizabeth II, by creating Aerogard. We invented plastic (polymer) banknotes, which are now exported to 25 countries with more than 3 billion notes currently in circulation. We invented RAFT (Reverse Addition Fragmentation chain Transfer) technology, enabling companies to develop new and advanced materials. We developed self-twisting yarn and made children's clothing safer than anywhere in the world. We invented contact lenses that can be worn for a month at a time. We invented Equivac HeV Vaccine for Hendra virus to protect Australian horse owners and the equine industry. Novacq prawn feed – need words
  3. http://www.nature.com/nature/journal/v462/n7276/fig_tab/nature08645_F1.html Bauer et al. Trends Mol Med. 2014 PMID: 24801560.
  4. http://www.nature.com/nature/journal/v462/n7276/fig_tab/nature08645_F1.html Bauer et al. Trends Mol Med. 2014 PMID: 24801560.
  5. https://waynealliance.wordpress.com/2010/06/02/all-needles-no-hay/
  6. http://www.nature.com/nature/journal/v462/n7276/fig_tab/nature08645_F1.html Bauer et al. Trends Mol Med. 2014 PMID: 24801560.
  7. http://lauren-riddoch.squarespace.com/about-10-seconds/ https://www.iconfinder.com/icons/805428/chemistry_experiment_lab_laboratory_researcher_scientist_icon
  8. Image source: https://www.youtube.com/watch?v=GEzP4lY-24Q