Deep learning has enabled dramatic advances in image recognition performance. In this talk I will discuss using a deep convolutional neural network to detect genetic variation in aligned next-generation sequencing (NGS) read data from humans. Our method, called DeepVariant, both outperforms existing genotyping tools and generalizes across genome builds and even to other species. DeepVariant represents a significant step from expert-driven statistical modeling towards more automatic deep learning approaches for developing software to interpret data from biological instruments.
Deep learning in medicine: An introduction and applications to next-generation sequencing and disease diagnostics
1. Confidential + Proprietary
Allen Day, PhD, allenday@google.com, Twitter @allenday
4.
Observation: programming a computer to be clever is harder than
programming a computer to learn to be clever.
Intro to machine learning and deep learning
5.
Traditional machine learning vs. the new way

The old way: write a computer program with explicit rules to follow:

if email contains V!agrå then mark is-spam;
if email contains …
if email contains …

The new way: write a computer program to learn from examples:

try to classify some emails;
change self to reduce errors;
repeat;
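The contrast above can be made concrete with a toy bag-of-words spam classifier: a hand-written rule versus a perceptron-style loop that classifies, adjusts itself to reduce errors, and repeats. All data, tokens, and function names below are made up for illustration.

```python
# The old way: explicit, hand-written rules.
def rule_based(email):
    return "spam" if "v!agra" in email.lower() else "ham"

# The new way: learn word weights from labeled examples.
def train_perceptron(examples, epochs=10):
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            score = sum(weights.get(w, 0.0) for w in text.lower().split())
            pred = "spam" if score > 0 else "ham"
            if pred != label:  # change self to reduce errors
                delta = 1.0 if label == "spam" else -1.0
                for w in text.lower().split():
                    weights[w] = weights.get(w, 0.0) + delta
    return weights

def classify(weights, text):
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "spam" if score > 0 else "ham"

examples = [
    ("cheap pills buy now", "spam"),
    ("meeting moved to tuesday", "ham"),
    ("buy cheap pills", "spam"),
    ("lunch on tuesday", "ham"),
]
w = train_perceptron(examples)
```

The learned classifier generalizes to new word combinations without anyone writing a rule for them.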
9.
Key innovation: learns features from the data
A deep network builds up a feature hierarchy, from bottom to top:
● Input: raw data
● Primitive features: edges, blocks of color, etc.
● Parts of objects, more complex patterns
● High-level complex detectors
10.
Deep Learning Revolution
Modern reincarnation of artificial neural networks: a collection of simple trainable mathematical units, organized in layers, that work together to solve complicated tasks (e.g., mapping an image to the label “cat”).
Key benefit: learns features from raw, heterogeneous data; no explicit feature engineering required.
What’s new: layered network architectures, new training math, and, above all, *scale*.
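The definition above can be sketched in a few lines of NumPy: each unit is a weighted sum plus a simple nonlinearity, and stacking layers yields the feature hierarchy from raw input to high-level detectors. Shapes and random weights here are purely illustrative; a real network learns them by gradient descent at a much larger scale.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    # one layer of simple units: linear map followed by a ReLU nonlinearity
    return np.maximum(0.0, W @ x + b)

# three stacked layers: raw input -> primitive features -> parts -> detectors
x = rng.normal(size=8)                        # input: raw data
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 16)), np.zeros(16)
W3, b3 = rng.normal(size=(4, 16)), np.zeros(4)

h1 = layer(x, W1, b1)    # primitive features (edges, color blocks)
h2 = layer(h1, W2, b2)   # parts of objects, more complex patterns
logits = W3 @ h2 + b3    # high-level detectors
probs = np.exp(logits - logits.max())
probs /= probs.sum()     # softmax over the output classes
```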
14. Szegedy et al., 2014
GoogLeNet (aka “Inception”) architecture
[Diagram: stacked “Inception” modules with auxiliary classifiers and a main classifier outputting, e.g., Pr(dog)]
15.
Image understanding is getting better than human level
[Chart: ImageNet Challenge error rate (%) by year, falling below human performance. Task: given an image, predict one of 1000+ classes.]
* Human performance based on an analysis by Andrej Karpathy.
16.
Machine learning has transformed Google’s products
● Search: search ranking, speech recognition
● Gmail: Smart Reply, spam classification
● Photos: photos search
● Translate: text, graphic, and speech translations
● Android: keyboard & speech input
● Drive: intelligence in apps
● YouTube: video recommendations, better thumbnails
● Cardboard: smart stitching
● Play: app recommendations, game developer experience
● Ads: richer text ads, automated bidding
● Chrome: search by image
● Maps: Street View image parsing, local search
18.
Medical applications of deep learning technology
● Deep learning has remarkable efficacy
○ Amazing with images: photos, search, streetview, Android cameras, …
○ And with speech, language, data centers, …
● How and where can we apply this in medicine and biotechnology?
○ Medical imaging: ophthalmology, pathology, ...
○ Genomics
○ ...
19.
Diabetes causes blindness
● 5-10% of the population is diabetic and should be screened annually for diabetic retinopathy, the fastest-growing cause of blindness.
● # Diabetics >> qualified graders: 387M diabetics vs. ~200k ophthalmologists, and grading is highly technical.
● Poor adherence to the care plan: there are no symptoms, and treatment is preventive rather than curative; only 30-50% are screened in the US (10% in high-risk populations), and many are lost to follow-up.
20.
How DR is diagnosed: retinal fundus images
[Images: healthy vs. diseased fundus showing hemorrhages; grades: No DR, Mild DR, Moderate DR, Severe DR, Proliferative DR]
21.
Even when available, ophthalmologists are not consistent...
Consistency: intragrader agreement ~65%, intergrader ~60%
[Figure: grades assigned by multiple ophthalmologist graders across patient images]
22.
Adapt a deep neural network to read fundus images
● Model: 26-layer convolutional network
● Outputs: No DR, Mild DR, Moderate DR, Severe DR, Proliferative DR
● Training data: a labeling tool used by 54 ophthalmologists produced 880k diagnoses on 130k images
23.
F-score: algorithm 0.95 vs. ophthalmologist (median) 0.91

“The study by Gulshan and colleagues truly represents the brave new world in medicine.” (Dr. Andrew Beam and Dr. Isaac Kohane, Harvard Medical School)

“Google just published this paper in JAMA (impact factor 37) [...] It actually lives up to the hype.” (Dr. Luke Oakden-Rayner, University of Adelaide)
24.
Digital pathology
Example: breast cancer biopsies (JAMA. 2015;313(11):1122-1132)
● 1 in 12 breast cancer biopsies is misdiagnosed (population adjusted)
● Similar rates for other cancer types (prostate: 1 in 7, etc.)
[Figure: correct-diagnosis rates by category (48%-96%), with the remainder split between overdiagnosis and underdiagnosis]
25.
Detecting breast cancer metastases in lymph nodes
● Goal: train a deep learning model to identify cancerous cells in pathology slide images.
● Output: a map over the whole image, indicating the probability that each region harbors cancer cells.
● Training data: ~23M image patches extracted from gigapixel slide images of normal (n=127) and cancerous (n=88) tissues from the Camelyon16 dataset.
● A multi-scale model (detail ←→ context) resembles microscope magnifications.
26.
Metastatic cell detection results are encouraging
● Tumor localization score (FROC): 0.89 vs. 0.73 for a pathologist with unlimited time (92% sensitivity with 8 false positives per slide vs. 73% sensitivity with 0 false positives per slide)
● Slide-level classification: AUC of 0.96 (on par with a pathologist)
[Images: original slide, ground-truth mask, and predicted regions of cancer cells]
Read more at https://arxiv.org/abs/1703.02442
27.
Deep learning in genomics

New application area. Example papers: Alipanahi et al. (2015); Park & Kellis (2015); Xiong et al. (2015); Zhou & Troyanskaya (2015); Angermueller et al. (2016).

Variant calling is a key challenge in genomics due to the complex errors of NGS technologies. Current error rates vary from <1% for germline SNPs to >25% for somatic indels.

Deep learning to call variants. Goals: (1) replace the statistical machinery with a single deep learning model; (2) achieve state-of-the-art or better performance; (3) generalize to new technologies.

Start with the human germline: use the germline case to work out the deep learning data representation and models, then extend the approach to somatic mutations, non-human organisms, etc.
28.
Where should we get started applying deep learning to genetics and genomics problems?
Must-haves for deep learning:
● Lots of data: >50k examples; >1M is ideal.
● High-quality input data and labels for training.
● The mapping from data to label is unknown but certainly exists.
● High-quality previous efforts, so we know the problem is hard to solve with classical statistical/ML approaches.
Answer: SNP and indel calling from NGS data.
29.
Figuring out the true genome sequence from NGS data is a computational and statistical challenge

True genome sequence: 3 billion bases in 23 contiguous chunks (chromosomes):

.......... cttgggttga tattgtcttg gaacatggag gttgtgtcac cgtaatggca caggacaaac cgactgtcga catagagctg gttacaacaa cagtcagcaa catggcggag gtaagatcct actgctatga ggcatcaata tcagacatgg cttcggacag ..........

Actual sequencer output: ~1 billion DNA reads, each ~100 basepairs long (30x coverage):

Read1: cttgggttgatattgtcttggaacatggaggttgtgtcaccgtaatggcacaggacaaacc
Read2: gatattgtcttggaacatggaggttgtgtcaccgtaatggcacaggacaaaccgactgtcg
Read3: tggaacatggaggttgtgtcaccgtaatggcacaggacaaaccgactgtcgacatagagct
Read4: ggttgtgtcaccgtaatggcacaggacaaaccgactgtcgacatagagctggttactgtcg
....
Read 1,000,000,000: ....caactgtcgacatagagctggttactgtcgacatagagctggtt

Step 1: align the reads to a reference genome. Step 2: infer the true genomic sequence(s).

Reads aligned to a reference genome (most positions match the reference; consistent differences suggest variants):

Reference: ...ttgtcttggaacatggaggttgtgtcaccgtaatggcacaggacaaacc...
Read1:     ...ttgtcttggaacatggaggttgtgtgaccgtaatggcacaggacaaacc
Read2:     ...ttgtcttggaacatggaggttgtgtgaccgtaatggcacaggacaaacc...
Read3:              tggaacatggaggttgtgtgaccgtaatggcacaggacaaacc...
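Step 2 above can be illustrated with a deliberately naive caller: pile up the aligned bases at each reference position and report sites where a non-reference base dominates. The reads and threshold below are toy values for illustration; real callers such as GATK and DeepVariant must additionally model correlated sequencing errors.

```python
from collections import Counter

reference = "ttgtcttggaacatggaggttgtgtcaccg"
# each read: (offset into the reference, bases); toy data for illustration
reads = [
    (0, "ttgtcttggaacatggaggttgtgtgaccg"),
    (5, "ttggaacatggaggttgtgtgaccg"),
    (9, "aacatggaggttgtgtgaccg"),
]

def naive_calls(reference, reads, min_fraction=0.8):
    """Return (position, ref_base, alt_base) where reads disagree with the reference."""
    pile = [Counter() for _ in reference]
    for offset, bases in reads:          # count observed bases per position
        for i, base in enumerate(bases):
            pile[offset + i][base] += 1
    calls = []
    for pos, counts in enumerate(pile):
        if not counts:                   # no read coverage here
            continue
        base, n = counts.most_common(1)[0]
        if base != reference[pos] and n / sum(counts.values()) >= min_fraction:
            calls.append((pos, reference[pos], base))
    return calls
```

Here all three reads carry a g where the reference has a c, so the single variant site is reported.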
30.
A complex error process makes it difficult to call variants accurately in NGS data

Errors come from many uncontrollable sources: the quality of the sample DNA, the protocol used to prepare the sample for the sequencer, the physical properties of the instrument itself, and data-processing artifacts. Worse, errors are correlated among the reads.

Existing statistical techniques work okay. The most accurate variant callers, such as the GATK, use multiple techniques, e.g. logistic regression, hidden Markov models, Bayesian inference, and Gaussian mixture models. All make approximations known to be invalid.

...but they have well-known drawbacks: they rely on hand-crafted features and hand-optimized parameters, require years of work by domain experts, are specialized to a specific prep, sequencer, and tool chain, and are hard to generalize to new technologies.
31.
DeepVariant: recasting variant calling for deep learning
Pipeline: (1) find candidate variants; (2) create pileup images; (3) evaluate each image with a CNN and call variants.
A pileup image encodes the reference, the read bases, their qualities, and other features around a candidate site:

Ref:   ACGTGCCCCAAACGTGATGATC
Reads: ACGTGCCCCAACC---------
       --GTGCCCCAAACGT-------
       ----GCCCCAAACGTGA-----
       -------CCAACCGTGATG---
       --------CAAACGTGATGATC
       ----------ACCGTGATGATC

At the candidate site the reads carry a mix of A and C against the reference A. The CNN outputs genotype likelihoods over {hom-ref, het, hom-alt}, here 0.01 / 0.95 / 0.04, producing a heterozygous variant call.
32.
Recasting variant calling for deep learning: encode reads and the reference genome as images
The encoding is roughly: red = {A,C,G,T}; green = {quality score}; blue = {read strand}; alpha = {matches ref genome}.
[Example pileup images: true SNPs, true indels, false variants]
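The channel recipe above can be sketched as a small encoder. The specific channel values, the quality cap of 40, and the function names below are guesses for illustration only; they are not DeepVariant's actual encoding.

```python
import numpy as np

# hypothetical per-base intensities for the red channel
BASE_VALUE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

def encode_pileup(reference, reads):
    """reads: list of (bases, qualities, is_reverse_strand); one image row per read."""
    height, width = len(reads), len(reference)
    img = np.zeros((height, width, 4), dtype=np.float32)
    for row, (bases, quals, reverse) in enumerate(reads):
        for col, (base, q) in enumerate(zip(bases, quals)):
            img[row, col, 0] = BASE_VALUE[base]               # red: base identity
            img[row, col, 1] = min(q, 40) / 40.0              # green: quality score
            img[row, col, 2] = 1.0 if reverse else 0.0        # blue: read strand
            img[row, col, 3] = float(base == reference[col])  # alpha: matches reference
    return img

ref = "ACGTG"
reads = [("ACGTG", [30, 30, 30, 30, 30], False),
         ("ACCTG", [30, 30, 12, 30, 30], True)]
img = encode_pileup(ref, reads)
```

The resulting tensor has shape (reads, positions, channels) and can be fed to an image-classification CNN.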
33.
Recasting variant calling for deep learning: use Inception-v3 to call the variant genotype (Szegedy et al. 2015, https://arxiv.org/abs/1512.00567).
34.
Genome in a Bottle provides ground-truth human variation
● Extensive sequencing of a single human (NA12878) by orthogonal methods
● Stringent criteria identify “callable genomic regions” and true variants
○ ~3.7M regions (covering 80% of the genome) identified as callable
○ ~2.8M single nucleotide polymorphisms
○ ~350k small insertions/deletions
● Train and test on biological replicates of NA12878
○ Each germline WGS dataset provides ~3.7M labeled training variants
○ 2.1M true heterozygous variants
○ 1.3M true homozygous variants
○ 215k false positive variants
Zook et al. 2014
35.
DeepVariant works well in our in-house evaluations
Methodology: train the model on training chromosomes, call variants, and evaluate on the held-out chromosomes.
Result: outperforms GATK on human data.
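The methodology above holds out whole chromosomes rather than random examples, so nearby variants never straddle the train/test boundary. A minimal sketch of such a split (the chromosome names and record shape are assumptions for illustration, not DeepVariant's actual code):

```python
TRAIN_CHROMS = {f"chr{i}" for i in range(1, 20)}   # chr1-chr19 for training
EVAL_CHROMS = {"chr20", "chr21", "chr22"}          # held out for evaluation

def split_examples(examples):
    """examples: iterable of (chrom, pos, label) records; split by chromosome."""
    train, evaluation = [], []
    for ex in examples:
        chrom = ex[0]
        if chrom in TRAIN_CHROMS:
            train.append(ex)
        elif chrom in EVAL_CHROMS:
            evaluation.append(ex)
        # records on other chromosomes (chrX, chrY, ...) are used in neither split
    return train, evaluation
```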
36.
DeepVariant learns an accurate model of the likelihood function P(genotype | reads)
[Plot: observed P(error) vs. estimated P(error), Phred-scaled (-10 log10 P(error)), for DeepVariant and GATK against the perfect-calibration line]
This is the calibration for heterozygous SNPs, but other variant types and genotype states are similar.
37.
DeepVariant learns an accurate model of the likelihood function P(genotype | reads)
● To be well calibrated, variants should be correct at their assigned confidence rate.
● Genotype likelihoods are the critical input to genomic analyses such as imputation, de novo mutation detection, and association studies.
● Most callers are overconfident in their likelihoods.
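The Phred scaling used on the calibration axes above is a simple log transform of the error probability. A minimal helper pair (the function names are my own) makes the conversion concrete: an error probability of 0.01 corresponds to Q20, and 0.001 to Q30.

```python
import math

def phred(p_error):
    # convert an error probability to Phred scale: Q = -10 * log10(p)
    return -10.0 * math.log10(p_error)

def p_error(q):
    # invert the transform: p = 10^(-Q/10)
    return 10.0 ** (-q / 10.0)
```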
38.
After lots of internal testing, we entered the public FDA-sponsored PrecisionFDA competition in April 2016
● Unblinded training sample
● Blinded evaluation sample
39.
DeepVariant won an award at the 2016 PrecisionFDA competition
[Chart: F-measures of 99.85 and 98.91 under two conditions: v2 => v3 truth set for the unblinded sample, and unblinded => blinded sample with the v3 truth set]
F-measure is the harmonic mean of precision and recall.
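Since the slide defines F-measure as the harmonic mean of precision and recall, a tiny helper (the naming is my own) makes the definition and a sanity check concrete: perfect recall with 50% precision only scores about 0.667.

```python
def f1(precision, recall):
    # harmonic mean of precision and recall; 0 when both are 0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```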
40.
A trained DeepVariant model encodes everything needed to call variants, enabling us to apply it in novel contexts

You can train on one genome build and call variants on another:

Training data   Evaluation data   F1
b37 chr1-19     b38 chr20-22      99.45%
b38 chr1-19     b38 chr20-22      99.47%

Calling variants on b38 using a model trained on either b37 or b38 gives effectively identical quality. This means we can call variants on a genome build without needing all of the metadata mapped to that build.

You can train on human data and call variants on mouse data:

Training data   Evaluation data   F1
Human chr1-19   Mouse chr18-19    98.29%
Mouse chr1-17   Mouse chr18-19    97.84%

The model is robust to protocol differences (human: 50x 2x148bp HiSeq 2500; mouse: 27x 2x100bp GAII). This lets us leverage the larger and better truth data available for humans (~5M labeled variants vs. ~700K in mouse) to call variants in other organisms.

F1 is the harmonic mean of precision and recall.
41.
DeepVariant can learn to call variants in many sequencing technologies

Dataset                    DeepVariant (F1)   Comparator (F1)   Comparator caller
10X Chromium 75x WGS       99.3%              98.2%             Long Ranger
Ion AmpliSeq exome         96.9%              97.3% (1)         TVC
PacBio raw reads 40x WGS   92.7%              56.1% (2)         samtools
SOLID SE 85x WGS           86.4%              78.8% (3)         GATK
Illumina TruSeq exome      96.1%              95.4%             ensemble

(1) Uses four lanes of data vs. one for DeepVariant. (2) No standard caller exists for this technology for human samples. (3) Old technology without any maintained variant callers.
42.
DeepVariant can learn to call variants at a range of input sequence depths
[Plots: sensitivity and precision vs. sequencing depth for GATK and for DeepVariant models trained on 35-45x, 15-25x, and 4-45x data]
43.
DeepVariant outperforms GATK on low-coverage samples
(Training on chromosomes 1-19; evaluation on chromosomes 20-22)
44.
DeepVariant conclusions
● Deep learning is a remarkably powerful and flexible technology.
● DeepVariant is an example of how to apply deep learning to a genomics problem.
● It achieves equivalent or better performance than current variant calling tools.
● It works for many (any?) sequencing technologies.
● Run it now at https://cloud.google.com/genomics/v1alpha2/deepvariant
● An open-sourced version is coming soon!
● Read more in our bioRxiv paper: https://doi.org/10.1101/092890
45. Google confidential │ Do not distribute
Google’s Data Research...
[Timeline 2002-2016: GFS, MapReduce, BigTable, Dremel, Colossus, Flume, Megastore, Spanner, Millwheel, PubSub, F1, TensorFlow]
46.
...are the technologies used in DeepVariant...
47.
... which are available to you today on GCP
[Timeline 2002-2016: Cloud Storage, BigQuery, BigTable, DataProc, DataFlow, DataStore, PubSub, ML]
48.
Sharing our tools with researchers and developers around the world
TensorFlow, released in Nov. 2015, is the #1 repository in the “machine learning” category on GitHub.