Introduction to
machine learning
in genomics
BRIAN SCHILDER
BIOINFORMATICIAN II
RAJ LAB 08/21/2020
[1] Nash Family Department of Neuroscience & Friedman Brain Institute
[2] Ronald M. Loeb Center for Alzheimer’s Disease
[3] Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology
[4] Estelle and Daniel Maggin Department of Neurology
Approaches to making predictions
L Breiman, Statistical modeling: The two
cultures. Statistical Science. 16, 199–215 (2001).
Explicit modeling
(your brain learns x~y)
“I will predict y from x by
assuming relationships based on
my knowledge/the literature.”
Pros
Can utilize the
prevailing wisdom.
Highly interpretable
models.
Cons
Susceptible to bias/
assumptions/
arbitrary parameters.
May not explain the
variance very well.
Machine learning
(your computer learns x~y)
“I will predict y from x by having
an algorithm learn the
relationships from data.”
Pros
Less susceptible to
(some forms of)
human bias.
Can make
predictions from
complex/multi-
variate data.
Cons
Can be less
interpretable.
May not generalize
to other data.
• What’s the relationship
between x and y?
• If you do something to x,
what will happen to y?
Science in a nutshell
cells
What is machine learning?
Artificial
Intelligence
The automation of
tasks that normally
require human
intelligence.
Machine
Learning
Automated
optimization of
some function by
learning directly
from data (as
opposed to
following explicit
rules).
x > 4
  If True: y + z < 2?
    If True: Go to Dr.
    If False: Go to ER
  If False: Stay home
vippng.com
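The "explicit rules" side of this contrast can be written directly as code. A minimal sketch, copying the thresholds from the decision tree above; the function name `triage` is hypothetical, and the slide does not say what x, y and z measure.

```python
def triage(x: float, y: float, z: float) -> str:
    """Hand-written rules: nothing here is learned from data."""
    if x > 4:
        if y + z < 2:
            return "Go to Dr."
        return "Go to ER"
    return "Stay home"


# Every threshold (4 and 2) was chosen by a person, not fitted.
print(triage(5, 0.5, 1.0))
print(triage(1, 0.0, 0.0))
```

A machine learning approach would instead estimate such thresholds from labelled examples.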
General ML framework
1. Training phase
Input training data → Output predictions → Evaluate accuracy against real answer → Adjust model
2. Testing phase
Input testing data → Output predictions
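The two phases above can be sketched with a very small model. This is an illustrative example only: the data are simulated, the model is a straight line fitted by gradient descent, and the learning rate and epoch count are assumptions chosen for this toy case.

```python
import random


def predict(w, b, x):
    return w * x + b


def mse(w, b, data):
    """Mean squared error of the line (w, b) on a list of (x, y) pairs."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def train(data, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Output predictions and evaluate them against the real answers...
        grad_w = sum(2 * (predict(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (predict(w, b, x) - y) for x, y in data) / n
        # ...then adjust the model a small step in the direction that lowers the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Simulated data: y = 3x + 1 plus a little noise (made up for this sketch).
rng = random.Random(0)
points = [(x, 3.0 * x + 1.0 + rng.gauss(0, 0.1)) for x in (i / 10 for i in range(50))]
rng.shuffle(points)
train_set, test_set = points[:40], points[40:]

# 1. Training phase
w, b = train(train_set)

# 2. Testing phase: the held-out data are never used to adjust the model.
test_error = mse(w, b, test_set)
print(f"w={w:.2f} b={b:.2f} test MSE={test_error:.4f}")
```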
Supervised learning example
Input data
• Categorical
• Continuous
MODEL
• Logistic
regression
• Linear
regression
• GLMM
• SVM
• Neural
network
• Genetic
algorithm
• etc...
Output
prediction
• Categorical
• Continuous
Dog
(.04)
Cat
(.96)
Transform data
(or a gene
expression
vector…)
Correct!
+1
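For a categorical output, a model typically returns one probability per class, and the most probable class is compared with the true label. A minimal sketch using a softmax; the raw scores are hypothetical values chosen so that the probabilities match the Dog (.04) / Cat (.96) example above.

```python
import math


def softmax(scores):
    """Turn raw model scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["Dog", "Cat"]
raw_scores = [0.0, math.log(24)]  # hypothetical outputs of some trained model
probs = softmax(raw_scores)

predicted = labels[probs.index(max(probs))]
true_label = "Cat"
correct = int(predicted == true_label)  # the "+1" when the prediction is right
print(dict(zip(labels, (round(p, 2) for p in probs))), predicted, correct)
```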
ML vs. statistics: an increasingly blurry line
◦ Math and statistics were developed well before the advent of computers.
◦ Modern computers enable rapid iterative processes (optimization, distribution simulation)
◦ Linear regression, PCA and t-SNE are all technically AI/ML, but we often don’t think of them that way anymore.
https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb
https://ai.googleblog.com/2018/06/realtime-tsne-visualizations-with.html
[Figure panels: linear regression, PCA, t-SNE]
Ways we use ML in biology
Regression
• DGE
• GWAS
• LDScore
• Batch correction
• …
Dimensionality Reduction
• PCA
• MDS
• t-SNE
• UMAP
• Manifold learning
• Autoencoders
• …
Clustering
• Centroid-based
• K-means
• Hierarchical
• Agglomerative
• Density-based
• DBSCAN
• Louvain
• Distribution-based
• Expectation-maximization
Networks
• Co-expression
• Multi-omics
• Causality
• Temporal graphs
• Imputation
• …
…and much more!
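As one concrete case from the clustering list, K-means alternates between assigning points to the nearest centroid and moving each centroid to the mean of its points. A minimal sketch on made-up 2-D points; the starting centroids are picked by hand here, whereas real implementations usually initialise them randomly.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on tuples of floats."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Two obvious groups of toy points (e.g. two expression features per cell).
points = [(1.0, 1.1), (0.9, 1.0), (1.2, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
centroids, clusters = kmeans(points, centroids=[points[0], points[3]])
print(centroids)
```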
Which ML model do I use?
Complex relationships?
  Yes: Need high interpretability?
    Yes: Simpler model
    No: Lots of data?
      Yes: More complex model
      No: Simpler model
  No: Simpler model
https://towardsdatascience.com/the-balance-accuracy-vs-interpretability-1b3861408062
In practice, you try multiple models of
varying complexity and compare
performances.
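A small sketch of that comparison: three models of increasing flexibility are fitted on the same training data and scored on held-out data. The data are a made-up noiseless quadratic, so the most flexible model wins here; with noisy or scarce data the ranking can differ.

```python
def fit_mean(train):
    """Simplest model: always predict the training mean."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y


def fit_line(train):
    """Ordinary least squares with one predictor."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    slope = sum((x - mx) * (y - my) for x, y in train) / sum((x - mx) ** 2 for x, _ in train)
    return lambda x: my + slope * (x - mx)


def fit_nearest_neighbor(train):
    """Most flexible model here: copy the answer of the closest training point."""
    return lambda x: min(train, key=lambda pair: abs(pair[0] - x))[1]


def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)


# Toy non-linear relationship y = x^2, split into training and held-out points.
data = [(i / 10, (i / 10) ** 2) for i in range(31)]
train, test = data[::2], data[1::2]

fitters = {"mean": fit_mean, "line": fit_line, "nearest neighbor": fit_nearest_neighbor}
scores = {name: mse(fit(train), test) for name, fit in fitters.items()}
for name, score in scores.items():
    print(f"{name}: held-out MSE = {score:.3f}")
```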
Deep learning in genomics
What is deep learning?
Eraslan et al. 2019, Nature Genetics Review
Given their sequences (input),
how probable is it that these
regions are binding motifs for
TF A (output)?
Given their sequences (input),
how probable is it that these
regions are binding motifs for
TF A (output 1) or TF B
(output 2)?
Given their sequences (input 1)
and chromatin accessibility
profiles (input 2), how
probable is it that these regions
are binding motifs for TF A
(output)?
[Figure: neural network with Layer 1 (input: nodes 1-1, 1-2), Layer 2 (hidden: nodes 2-1, 2-2, 2-3) and Layer 3 (output: nodes 3-1, 3-2)]
Pros
• Extremely flexible framework.
• Highly parallelizable (GPUs).
• Able to learn complex, non-linear
features.
Cons
• Challenging to interpret.
• Can require lots of compute.
• [Usually] requires lots of data.
https://www.cybercontrols.org/neuralnetworks
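A forward pass through a network shaped like the diagram (2 input nodes, 3 hidden nodes, 2 output nodes) is just repeated weighted sums followed by a non-linear function. A minimal sketch; the weights are arbitrary made-up numbers, whereas in practice they would be learned during training.

```python
import math


def sigmoid(v):
    return 1 / (1 + math.exp(-v))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum per node, then a non-linearity."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


# Arbitrary illustrative weights (one row per node in the receiving layer).
W1 = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]   # layer 1 -> layer 2 (3 hidden nodes)
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2], [-0.4, 0.6, 0.9]]     # layer 2 -> layer 3 (2 output nodes)
b2 = [0.05, -0.05]

inputs = [1.0, 2.0]                            # layer 1 (input)
hidden = dense(inputs, W1, b1, math.tanh)      # layer 2 (hidden)
output = dense(hidden, W2, b2, sigmoid)        # layer 3 (output), each value in (0, 1)
print(hidden, output)
```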
Deep learning in genomics
So what exactly can you do
with deep learning?
Predict [x] from DNA sequence
• Disease risk: primateAI, Deep Structured Phenotype Network (DSPN)
• Gene expression: xpresso
• Splicing: spliceAI
• TF motifs: Equivariant networks
• Epigenomic impact: DeepSEA, DeepFIGV, Avocado
• In many cases, deep learning
models have performed far
better than other approaches
(e.g. heuristics, SVM).
• That said, rigorous testing on
datasets sufficiently different
from those used in training is
key (but often difficult).
Why are ANNs so useful for sequences?
◦ DNA sequences are really hard for
humans to understand.
◦ Especially true when considering
long sequences, or multi-scale non-
linear interactions.
◦ Artificial neural networks (ANN)
excel at complex feature learning (e.g.
image recognition).
◦ CNNs are great for learning
hierarchical features
◦ nostril < nose < face < cat
◦ Humans can then interrogate and
interpret these features.
Eraslan et al. 2019, Nature Genetics Review
Encoded input → ANN model → Prediction
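The "encoded input" step is commonly a one-hot encoding of the sequence, and a convolutional filter then behaves like a motif detector slid along it. A minimal sketch under stated assumptions: the filter is set by hand to the motif TATA purely for illustration, whereas a CNN would learn its filter weights from data; the sequence is made up.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element indicator vector (A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def scan(encoded, motif_filter):
    """Slide the filter along the sequence and score every window (a 1-D convolution)."""
    width = len(motif_filter)
    scores = []
    for start in range(len(encoded) - width + 1):
        window = encoded[start:start + width]
        scores.append(sum(
            w * x
            for filter_row, base_row in zip(motif_filter, window)
            for w, x in zip(filter_row, base_row)
        ))
    return scores


# Hand-set filter that rewards an exact TATA match (an illustrative assumption).
tata_filter = one_hot("TATA")
scores = scan(one_hot("GGTATAGC"), tata_filter)
best = scores.index(max(scores))
print(scores, "best window starts at position", best)
```

Stacking such filters is what lets later layers combine small motifs into larger features.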
Other data types
Denoise noisy data (e.g. scRNA-seq)
Deep count autoencoder (DCA)
◦ Eraslan et al. (2019), Nature Communications
Learning with AuToEncoder
◦ Badsha et al. (2020), Quantitative Biology
SAVER-X
◦ Wang et al. (preprint) bioRxiv
Dimensionality reduction
~ 70k PBMCs
Transcriptomes
Realistic inference
Latent space interpolation:
◦ [Conditional] variational autoencoders (VAE)
◦ [style transfer] Generative adversarial networks (GAN)
(Gómez-Bombarelli et al., 2018)
scGen (Lotfollahi, Wolf, & Theis, 2019)
(Pieters & Wiering, 2018, bioRxiv)
Stimulated (e.g. IFN-β )
peripheral blood mononuclear cell (PBMC):
e.g. T/B/NK cells, monocytes
Drugs
Faces
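The idea behind latent space interpolation can be sketched as vector arithmetic: estimate the shift between control and stimulated cells in the latent space, then apply it to cells where only the control state was observed. This is a simplified illustration, not the scGen implementation; the 2-D latent vectors are made up, and the encoder and decoder of a trained autoencoder are omitted.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


# Hypothetical latent coordinates (in practice produced by a trained encoder).
control_tcells = [[1.0, 0.0], [1.2, 0.2]]
stimulated_tcells = [[2.0, 1.0], [2.2, 1.2]]
control_bcells = [[-1.0, 0.5]]

# Direction in latent space that corresponds to stimulation.
delta = [s - c for s, c in zip(mean_vector(stimulated_tcells), mean_vector(control_tcells))]

# Predicted latent position of stimulated B cells; a decoder would map it back to expression.
predicted_bcells = [[v + d for v, d in zip(cell, delta)] for cell in control_bcells]
print(delta, predicted_bcells)
```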
Disease prediction
“…we developed an interpretable deep-learning framework, the
Deep Structured Phenotype Network (DSPN) (21). This model
combines a Deep Boltzmann Machine architecture with
conditional and lateral connections derived from the regulatory
network (50).”
Improvement over baseline (50%)
• Logistic predictor: 2.4-fold
• DSPN: 6-fold
• Captures non-linear interactions
When does deep learning fail?
When there’s not enough
training/testing data.
Can contribute to
overfitting; model
can’t translate to
other datasets.
When the data hasn’t been
preprocessed properly, or
has some other
uncorrected confound.
e.g. White label on
bottom of image
from disease-
specialty hospital.
When a high degree of
interpretability and
explainability are required.
e.g. Clinical
decision support.
When a simpler model can
do just as well for less
compute.
Always compare
performance to
other methods.
When you’re asking the
wrong question, or the
fitness function is
misspecified.
Requires domain
knowledge.
Deep learning references
Reviews
◦ J Zou et al., A primer on deep learning in genomics. Nature Genetics
(2018), doi:10.1038/s41588-018-0295-5.
◦ G Eraslan et al., Deep learning: new computational modelling
techniques for genomics. Nature Reviews Genetics (2019),
doi:10.1038/s41576-019-0122-6.
◦ TJ Cleophas et al., Machine Learning in Medicine. Circulation. 132,
1920–1930 (2015).
◦ P Baldi, Deep Learning in Biomedical Data Science. Annual Review
of Biomedical Data Science. 1, 181–205 (2018).
◦ R Miotto et al., Deep learning for healthcare: review, opportunities
and challenges. Briefings in Bioinformatics. 19, 1236–1246 (2017).
◦ VI Jurtz et al., An introduction to deep learning on biological
sequence data: Examples and solutions. Bioinformatics. 33, 3685–
3690 (2017).
◦ MKK Leung et al., Machine Learning in Genomic Medicine: A
Review of Computational Problems and Data Sets. Proceedings of the
IEEE. 104, 176–197 (2016).
◦ DSW Ho et al., Machine learning SNP based prediction for
precision medicine. Frontiers in Genetics. 10, 1–10 (2019).
◦ A Taylor-Weiner et al., Scaling computational genomics to millions
of individuals with GPUs. Genome Biology. 20, 1–5 (2019).
◦ L Breiman, Statistical modeling: The two cultures. Statistical Science.
16, 199–215 (2001).
◦ BS Ullman, Using neuroscience to develop artificial intelligence.
363, 692–694 (2019).
◦ A Marblestone et al., Towards an integration of deep learning and
neuroscience. 10, 1–41 (2016).
◦ Y Bengio et al., Towards Biologically Plausible Deep Learning
(2015), doi:10.1007/s13398-014-0173-7.2.
◦ KM Chen et al., Selene: a PyTorch-based deep learning library for
sequence-level data. Nature Methods. 16, 315–318 (2019).
Genomics
◦ Disease risk
◦ D Wang et al., Comprehensive functional genomic resource and integrative model for the
adult brain. Science, 1266 (2018).
◦ L Sundaram et al., Predicting the clinical impact of human mutation with deep neural
networks. Nature Genetics. 50, 1161–1170 (2018).
◦ Y Ding et al., A deep learning model to predict a diagnosis of Alzheimer disease by using
18F-FDG PET of the brain. Radiology. 290, 456–464 (2019).
◦ I Klyuzhin et al., Use of deep convolutional neural networks to predict Parkinson’s disease
progression from DaTscan SPECT images. Journal of Nuclear Medicine. 59, 29 (2018).
◦ KK Dey et al., Evaluating the informativeness of deep learning annotations for human
complex diseases. bioRxiv, 784439 (2019).
◦ A Romagnoni et al., Comparative performances of machine learning methods for
classifying Crohn Disease patients using genome-wide genotyping data. Scientific Reports. 9,
1–18 (2019).
◦ CAC Montañez et al., Deep Learning Classification of Polygenic Obesity using Genome
Wide Association Study SNPs. Proceedings of the International Joint Conference on Neural
Networks. 2018-July (2018), doi:10.1109/IJCNN.2018.8489048.
◦ Gene expression
◦ V Agarwal et al., Predicting mRNA Abundance Directly from Genomic Sequence Using
Deep Convolutional Neural Networks. Cell Reports. 31, 107663 (2020).
◦ X Li et al., The impact of rare variation on gene expression across tissues. Nature. 550,
239–243 (2017).
◦ JD Washburn et al., Evolutionarily informed deep learning methods for predicting relative
transcript abundance from DNA sequence. Proceedings of the National Academy of Sciences of
the United States of America. 116, 5542–5549 (2019).
◦ Y Zhang et al., Predicting Gene Expression from DNA Sequence using Residual Neural
Network. bioRxiv, in press, doi:10.1101/2020.06.21.163956.
◦ Epigenomics
◦ J Zhou et al., Predicting effects of noncoding variants with deep learning-based sequence
model. Nature Methods. 12, 931–934 (2015).
◦ GE Hoffman et al., Functional Interpretation of Genetic Variants Using Deep Learning
Predicts Impact on Epigenome. Nucleic Acids Research, 1–15 (2019).
◦ J Schreiber et al., Avocado: a multi-scale deep tensor factorization method learns a latent
representation of the human epigenome. Genome Biology. 21, 364976 (2018).
◦ Splicing
◦ K Jaganathan et al., Predicting Splicing from Primary Sequence with Deep Learning. Cell.
176, 535-548.e24 (2019).
◦ TF
◦ RC Brown et al., An equivariant Bayesian convolutional network predicts recombination
hotspots and accurately resolves binding motifs. Bioinformatics. 35, 2177–2184 (2019).
Transcriptomics
◦ G Eraslan et al., Single-cell RNA-seq denoising using a deep count
autoencoder. Nature Communications. 10, 1–14 (2019).
◦ M Lotfollahi et al., scGen predicts single-cell perturbation responses. Nature
Methods. 16, 715–721 (2019).
◦ M Colomé-Tatché et al., Statistical single cell multi-omics integration. Current
Opinion in Systems Biology. 7, 54–59 (2018).
◦ C Lin et al., Using neural networks for reducing the dimensions of single-cell
RNA-Seq data. Nucleic Acids Research. 45 (2017), doi:10.1093/nar/gkx681.
◦ GP Way et al., Bayesian deep learning for single-cell analysis. Nature Methods.
15, 1009–1010 (2018).
◦ R Lopez et al., Deep generative modeling for single-cell transcriptomics.
Nature Methods. 15, 1053–1058 (2018).
◦ J Wang et al., Data denoising with transfer learning in single-cell
transcriptomics. Nature Methods. 16 (2019), doi:10.1038/s41592-019-0537-1.
◦ OmicsMapNet: Transforming omics data to take advantage of Deep
Convolutional Neural Network for discovery.
Drug discovery
◦ R Gómez-Bombarelli et al., Automatic Chemical Design Using a Data-Driven Continuous
Representation of Molecules. ACS Central Science. 4, 268–276 (2018).
◦ CF Lipinski et al., Advances and Perspectives in Applying Deep Learning for Drug Design and
Discovery. 6, 1–6 (2019).
◦ L David et al., Applications of deep-learning in exploiting large-scale and heterogeneous
compound data in industrial pharmaceutical research. Frontiers in Pharmacology. 10, 1–16 (2019).
Imaging
◦ G Lee et al., Predicting Alzheimer’s disease progression using multi-modal
deep learning approach. Scientific Reports. 9, 1–12 (2019).
◦ H Chen et al., VoxResNet: Deep voxelwise residual networks for brain
segmentation from 3D MR images. NeuroImage. 170, 446–455 (2018).
◦ T Jo et al., Deep Learning in Alzheimer’s Disease: Diagnostic Classification
and Prognostic Prediction Using Neuroimaging Data. Frontiers in Aging
Neuroscience. 11 (2019), doi:10.3389/fnagi.2019.00220.
◦ A Iqbal et al., Developing a brain atlas through deep learning. Nature Machine
Intelligence. 1, 277–287 (2019).
◦ A Mahbod et al., Automatic brain segmentation using artificial neural
networks with shape context. Pattern Recognition Letters. 101, 74–79 (2018).
◦ P Kumar et al., U-SEGNET: Fully convolutional neural network based
automated brain tissue segmentation tool. arXiv (2018).

 

ML in genomics

  • 1. Introduction to machine learning in genomics. Brian Schilder, Bioinformatician II, Raj Lab, 08/21/2020. [1] Nash Family Department of Neuroscience & Friedman Brain Institute; [2] Ronald M. Loeb Center for Alzheimer’s Disease; [3] Department of Genetics and Genomic Sciences & Icahn Institute for Data Science and Genomic Technology; [4] Estelle and Daniel Maggin Department of Neurology
  • 2. Approaches to making predictions (L Breiman, Statistical modeling: The two cultures. Statistical Science. 16, 199–215 (2001).)
      Science in a nutshell:
      ◦ What’s the relationship between x and y?
      ◦ If you do something to x, what will happen to y?
      Explicit modeling (your brain learns x~y): “I will predict y from x by assuming relationships based on my knowledge/the literature.”
      ◦ Pros: Can utilize the prevailing wisdom. Highly interpretable models.
      ◦ Cons: Susceptible to bias/assumptions/arbitrary parameters. May not explain the variance very well.
      Machine learning (your computer learns x~y): “I will predict y from x by having an algorithm learn the relationships from data.”
      ◦ Pros: Less susceptible to (some forms of) human bias. Can make predictions from complex/multivariate data.
      ◦ Cons: Can be less interpretable. May not generalize to other data.
  • 3. What is machine learning?
      ◦ Artificial Intelligence: The automation of tasks that normally require human intelligence.
      ◦ Machine Learning: Automated optimization of some function by learning directly from data (as opposed to following explicit rules).
      Example of explicit rules:
      ◦ x > 4?
        ◦ If True: y + z < 2?
          ◦ If True = Go to Dr.
          ◦ If False = Go to ER
        ◦ If False: Stay home
      Image credit: vippng.com
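The contrast on slide 3 between following explicit rules and learning from data can be sketched in a few lines of Python. This is only an illustration: `triage` hard-codes the rule tree from the slide, while `learn_threshold` is a hypothetical one-feature learner (a decision stump) and the toy measurements and labels are invented for the example.

```python
def triage(x: float, y: float, z: float) -> str:
    """Explicit rules written by a person; nothing is learned from data."""
    if x > 4:
        if y + z < 2:
            return "Go to Dr."
        return "Go to ER"
    return "Stay home"


def learn_threshold(xs, labels):
    """Pick the cutoff on x that misclassifies the fewest training examples."""
    best_cut, best_errors = None, len(xs) + 1
    for cut in sorted(set(xs)):
        # Predict "True" whenever x exceeds the candidate cutoff.
        errors = sum((x > cut) != label for x, label in zip(xs, labels))
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


# Hypothetical labeled examples: did the patient need care (True/False)?
measurements = [1, 2, 3, 4, 5, 6, 7]
needed_care = [False, False, False, False, True, True, True]

learned_cut = learn_threshold(measurements, needed_care)
print("Explicit rule:", triage(5, 0.5, 1))
print("Learned cutoff on x:", learned_cut)
```

In the first function the cutoff of 4 comes from the rule's author; in the second it is recovered from the labeled examples.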
  • 4. General ML framework
      1. Training phase: Input training data → Output predictions → Evaluate accuracy against real answer → Adjust model
      2. Testing phase: Input testing data → Output predictions
      Supervised learning example: Input data (categorical or continuous) → Transform data → MODEL → Output prediction (categorical or continuous)
      ◦ Models: Logistic regression, Linear regression, GLMM, SVM, Neural network, Genetic algorithm, etc.
      ◦ Example: the input (or a gene expression vector…) is transformed and scored as Dog (.04), Cat (.96). Correct! +1
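The training and testing phases on slide 4 can be sketched as a small loop. This is a minimal illustration, assuming a one-feature logistic regression fitted by stochastic gradient descent; the data, learning rate, and epoch count are all made up for the example and do not come from the slides.

```python
import math


def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))


def train(xs, ys, lr=0.1, epochs=500):
    """Training phase: predict, compare with the real answer, adjust."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)   # output prediction
            error = p - y            # evaluate against the real answer
            w -= lr * error * x      # adjust model
            b -= lr * error
    return w, b


def accuracy(w, b, xs, ys):
    """Testing phase: predictions only, no further adjustment."""
    correct = sum((sigmoid(w * x + b) >= 0.5) == bool(y) for x, y in zip(xs, ys))
    return correct / len(xs)


# Hypothetical toy data: one continuous input, one binary label.
train_x = [-3.0, -2.0, -1.5, -1.0, 1.0, 1.5, 2.0, 3.0]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [-2.5, -0.8, 0.9, 2.5]
test_y = [0, 0, 1, 1]

w, b = train(train_x, train_y)
test_acc = accuracy(w, b, test_x, test_y)
print("Held-out accuracy:", test_acc)
```

Keeping the test examples out of `train` is what makes the final accuracy an estimate of how the model behaves on data it has not seen.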
  • 5. ML vs. statistics: an increasingly blurry line
      ◦ Math and statistics were developed well before the advent of computers.
      ◦ Modern computers enable rapid iterative processes (optimization, distribution simulation).
      ◦ Linear regression, PCA, and t-SNE are all technically AI/ML, but we often don’t think of them that way anymore.
      Figure panels: Linear regression, PCA, t-SNE.
      Sources: https://towardsdatascience.com/introduction-to-linear-regression-and-polynomial-regression-f8adc96f31cb https://ai.googleblog.com/2018/06/realtime-tsne-visualizations-with.html
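One way to see the blurry line described on slide 5 is to fit the same linear regression twice: once with the classical closed-form least-squares formula and once with the kind of iterative optimization that computers make cheap. The sketch below uses invented data and an arbitrary learning rate and step count; both routes should arrive at essentially the same line.

```python
def fit_closed_form(xs, ys):
    """Ordinary least squares via the textbook formula."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def fit_gradient_descent(xs, ys, lr=0.01, steps=5000):
    """The same model, fitted by repeatedly stepping down the MSE gradient."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        slope -= lr * (2 / n) * sum(r * x for r, x in zip(residuals, xs))
        intercept -= lr * (2 / n) * sum(residuals)
    return slope, intercept


# Hypothetical measurements.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.1, 4.9, 7.2, 8.8]

exact = fit_closed_form(xs, ys)
iterative = fit_gradient_descent(xs, ys)
print("closed form:      slope=%.4f intercept=%.4f" % exact)
print("gradient descent: slope=%.4f intercept=%.4f" % iterative)
```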
  • 6. Ways we use ML in biology
      ◦ Regression: DGE, GWAS, LDScore, Batch correction, …
      ◦ Dimensionality Reduction: PCA, MDS, t-SNE, UMAP, Manifold learning, Autoencoders, …
      ◦ Clustering: Centroid-based (K-means), Hierarchical (Agglomerative), Density-based (DBSCAN), Louvain, Distribution-based (Expectation-maximization)
      ◦ Networks: Co-expression, Multi-omics, Causality, Temporal graphs, Imputation, …
      …and much more!
  • 7. Which ML model do I use?
      ◦ Complex relationships?
        ◦ Yes: Need high interpretability?
          ◦ Yes: Simpler model
          ◦ No: Lots of data?
            ◦ Yes: More complex model
            ◦ No: Simpler model
        ◦ No: Simpler model
      In practice, you try multiple models of varying complexity and compare performances.
      Source: https://towardsdatascience.com/the-balance-accuracy-vs-interpretability-1b3861408062
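The closing advice on slide 7, to try several models of varying complexity and compare them, might look like the sketch below. The three candidate models (a constant mean, a straight line, and a 1-nearest-neighbor lookup), the toy data, and the choice of held-out mean squared error as the yardstick are all assumptions made for illustration.

```python
def fit_mean(xs, ys):
    """Simplest candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Least-squares straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return lambda x: my + slope * (x - mx)


def fit_nearest(xs, ys):
    """Most flexible candidate: copy the label of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: a linear trend plus small fixed perturbations.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]
test_x = [0.5, 2.5, 4.5, 6.5]
test_y = [2 * x + 1 for x in test_x]

candidates = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
results = {
    name: mse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(results, key=results.get)
for name, score in results.items():
    print(f"{name:8s} held-out MSE = {score:.3f}")
print("Best on held-out data:", best)
```

On this toy data the most flexible candidate does not win, because it also reproduces the perturbations in the training labels.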
  • 8. Deep learning in genomics
  • 9. What is deep learning? (Eraslan et al. 2019, Nature Genetics Review)
      Example questions:
      ◦ Single task: Given their sequences (input), how probable is it that these regions are binding motifs for TF A (output)?
      ◦ Multiple outputs: Given their sequences (input), how probable is it that these regions are binding motifs for TF A (output 1) or TF B (output 2)?
      ◦ Multiple inputs: Given their sequences (input 1) and chromatin accessibility profiles (input 2), how probable is it that these regions are binding motifs for TF A (output)?
      [Diagram: Layer 1 (input): nodes 1-1, 1-2; Layer 2 (hidden): nodes 2-1, 2-2, 2-3; Layer 3 (output): nodes 3-1, 3-2]
      Pros
      ◦ Extremely flexible framework.
      ◦ Highly parallelizable (GPUs).
      ◦ Able to learn complex, non-linear features.
      Cons
      ◦ Challenging to interpret.
      ◦ Can require lots of compute.
      ◦ [Usually] requires lots of data.
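The layered diagram on slide 9 (two input nodes, three hidden nodes, two output nodes) corresponds to a forward pass like the one sketched below. The weights and biases are arbitrary placeholder numbers, not trained values, and the choice of ReLU in the hidden layer and a sigmoid per output is an assumption made for the illustration.

```python
import math


def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs plus a bias per node."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


# Placeholder parameters (hypothetical, untrained).
W1 = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]   # layer 1 (2 nodes) -> layer 2 (3 nodes)
b1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.5, 0.2], [-0.4, 0.6, 0.9]]     # layer 2 (3 nodes) -> layer 3 (2 nodes)
b2 = [0.0, 0.0]


def forward(inputs):
    hidden = relu(dense(inputs, W1, b1))
    # One independent probability per output, e.g. "TF A" and "TF B".
    return sigmoid(dense(hidden, W2, b2))


outputs = forward([1.0, 0.5])
print("Output probabilities:", outputs)
```

Training would consist of adjusting `W1`, `b1`, `W2`, and `b2` so that these outputs match known labels.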
  • 11. Deep learning in genomics
  • 12. So what exactly can you do with deep learning?
  • 13. Predict [x] from DNA sequence
      ◦ Disease risk: primateAI, Deep Structured Phenotype Network (DSPN)
      ◦ Gene expression: xpresso
      ◦ Splicing: spliceAI
      ◦ TF motifs: Equivariant networks
      ◦ Epigenomic impact: DeepSEA, DeepFIGV, Avocado
      Notes:
      ◦ In many cases, deep learning models performed far better than other approaches (e.g. heuristics, SVM).
      ◦ That said, rigorous testing on datasets sufficiently different from those used in training is key (but often difficult).
  • 14. Why are ANN so useful for sequences? (Eraslan et al. 2019, Nature Genetics Review)
      ◦ DNA sequences are really hard for humans to understand.
        ◦ Especially true when considering long sequences, or multi-scale non-linear interactions.
      ◦ Artificial neural networks (ANN) excel at complex feature learning (e.g. image recognition).
        ◦ CNNs are great for learning hierarchical features
        ◦ nostril < nose < face < cat
      ◦ Humans can then interrogate and interpret these features.
      [Diagram: Encoded input → ANN model → Prediction]
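The "encoded input" step on slide 14 is commonly done with one-hot encoding, where each base becomes a four-element vector. The slides do not say which encoding was used, so the sketch below is an assumption; mapping unknown bases such as N to an all-zero vector is likewise one convention among several.

```python
BASES = "ACGT"


def one_hot(sequence: str):
    """Encode a DNA string as a list of 4-element rows, ordered A, C, G, T.

    Bases outside A/C/G/T (e.g. N) become all zeros; this is an assumed
    convention, not something stated in the slides.
    """
    return [
        [1 if base == b else 0 for b in BASES]
        for base in sequence.upper()
    ]


encoded = one_hot("ACGTN")
for base, row in zip("ACGTN", encoded):
    print(base, row)
```

A convolutional network would then slide its filters along the rows of this matrix, which is how it can pick up short motifs regardless of their position.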
  • 16. Denoise noisy data (e.g. scRNA-seq)
      ◦ Deep count autoencoder (DCA): Eraslan et al. (2019), Nature Communications
      ◦ Learning with AuToEncoder: Badsha et al. (2020), Quantitative Biology
      ◦ SAVER-X: Wang et al. (preprint), bioRxiv
  • 18. Realistic inference
      Latent space interpolation:
      ◦ [Conditional] variational autoencoders (VAE)
      ◦ [style transfer] Generative adversarial networks (GAN)
      Examples:
      ◦ Transcriptomes: scGen (Lotfollahi, Wolf, & Theis, 2019). Stimulated (e.g. IFN-β) peripheral blood mononuclear cell (PBMC): e.g. T/B/NK cells, monocytes
      ◦ Drugs: (Gómez-Bombarelli et al., 2018)
      ◦ Faces: (Pieters & Wiering, 2018, biorxiv)
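Latent space interpolation, as named on slide 18, amounts to walking in a straight line between two points in a model's latent space and decoding each intermediate point. The sketch below shows only the interpolation step on plain lists; the encoder and decoder of a VAE or GAN are omitted, and the two latent vectors are hypothetical.

```python
def interpolate(start, end, steps):
    """Return `steps` evenly spaced points from `start` to `end`, inclusive."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the start, 1.0 at the end
        path.append([a + t * (b - a) for a, b in zip(start, end)])
    return path


# Hypothetical latent codes, e.g. for an unstimulated and a stimulated cell.
z_control = [0.0, 0.0]
z_stimulated = [2.0, 4.0]

for point in interpolate(z_control, z_stimulated, steps=3):
    print(point)
```

In a real model each interpolated point would be passed through the decoder to produce a transcriptome, molecule, or image.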
  • 19. Disease prediction
      “…we developed an interpretable deep-learning framework, the Deep Structured Phenotype Network (DSPN) (21). This model combines a Deep Boltzmann Machine architecture with conditional and lateral connections derived from the regulatory network (50).”
      Improvement over baseline (50%):
      ◦ Logistic predictor: 2.4-fold
      ◦ DSPN: 6-fold
      ◦ Captures non-linear interactions
  • 20. When does deep learning fail?
      ◦ When there’s not enough training/testing data. Can contribute to overfitting; model can’t translate to other datasets.
      ◦ When the data hasn’t been preprocessed properly, or has some other uncorrected confound. e.g. White label on bottom of image from disease-specialty hospital.
      ◦ When a high degree of interpretability and explainability is required. e.g. Clinical decision support.
      ◦ When a simpler model can do just as well for less compute. Always compare performance to other methods.
      ◦ When you’re asking the wrong question, or the fitness function is misspecified. Requires domain knowledge.
  • 21. Deep learning references
      Reviews
      ◦ J Zou et al., A primer on deep learning in genomics. Nature Genetics (2018), doi:10.1038/s41588-018-0295-5.
      ◦ G Eraslan et al., Deep learning: new computational modelling techniques for genomics. Nature Reviews Genetics (2019), doi:10.1038/s41576-019-0122-6.
      ◦ TJ Cleophas et al., Machine Learning in Medicine. Circulation. 132, 1920–1930 (2015).
      ◦ P Baldi, Deep Learning in Biomedical Data Science. Annual Review of Biomedical Data Science. 1, 181–205 (2018).
      ◦ R Miotto et al., Deep learning for healthcare: review, opportunities and challenges. Briefings in Bioinformatics. 19, 1236–1246 (2017).
      ◦ VI Jurtz et al., An introduction to deep learning on biological sequence data: Examples and solutions. Bioinformatics. 33, 3685–3690 (2017).
      ◦ MKK Leung et al., Machine Learning in Genomic Medicine: A Review of Computational Problems and Data Sets. Proceedings of the IEEE. 104, 176–197 (2016).
      ◦ DSW Ho et al., Machine learning SNP based prediction for precision medicine. Frontiers in Genetics. 10, 1–10 (2019).
      ◦ A Taylor-Weiner et al., Scaling computational genomics to millions of individuals with GPUs. Genome Biology. 20, 1–5 (2019).
      ◦ L Breiman, Statistical modeling: The two cultures. Statistical Science. 16, 199–215 (2001).
      ◦ BS Ullman, Using neuroscience to develop artificial intelligence. 363, 692–694 (2019).
      ◦ A Marblestone et al., Towards an integration of deep learning and neuroscience. 10, 1–41 (2016).
      ◦ Y Bengio et al., Towards Biologically Plausible Deep Learning (2015), doi:10.1007/s13398-014-0173-7.2.
      ◦ KM Chen et al., Selene: a PyTorch-based deep learning library for sequence-level data. Nature Methods. 16, 315–318 (2019).
      Genomics
      ◦ Disease risk
        ◦ D Wang et al., Comprehensive functional genomic resource and integrative model for the adult brain. Science, 1266 (2018).
        ◦ L Sundaram et al., Predicting the clinical impact of human mutation with deep neural networks. Nature Genetics. 50, 1161–1170 (2018).
        ◦ Y Ding et al., A deep learning model to predict a diagnosis of Alzheimer disease by using 18F-FDG PET of the brain. Radiology. 290, 456–464 (2019).
        ◦ I Klyuzhin et al., Use of deep convolutional neural networks to predict Parkinson’s disease progression from DaTscan SPECT images. Journal of Nuclear Medicine. 59, 29 (2018).
        ◦ KK Dey et al., Evaluating the informativeness of deep learning annotations for human complex diseases. bioRxiv, 784439 (2019).
        ◦ A Romagnoni et al., Comparative performances of machine learning methods for classifying Crohn Disease patients using genome-wide genotyping data. Scientific Reports. 9, 1–18 (2019).
        ◦ CAC Montañez et al., Deep Learning Classification of Polygenic Obesity using Genome Wide Association Study SNPs. Proceedings of the International Joint Conference on Neural Networks. 2018-July (2018), doi:10.1109/IJCNN.2018.8489048.
      ◦ Gene expression
        ◦ V Agarwal et al., Predicting mRNA Abundance Directly from Genomic Sequence Using Deep Convolutional Neural Networks. Cell Reports. 31, 107663 (2020).
        ◦ X Li et al., The impact of rare variation on gene expression across tissues. Nature. 550, 239–243 (2017).
        ◦ JD Washburn et al., Evolutionarily informed deep learning methods for predicting relative transcript abundance from DNA sequence. Proceedings of the National Academy of Sciences of the United States of America. 116, 5542–5549 (2019).
        ◦ Y Zhang et al., Predicting Gene Expression from DNA Sequence using Residual Neural Network. bioRxiv, in press, doi:10.1101/2020.06.21.163956.
      ◦ Epigenomics
        ◦ J Zhou et al., Predicting effects of noncoding variants with deep learning-based sequence model. Nature Methods. 12, 931–934 (2015).
        ◦ GE Hoffman et al., Functional Interpretation of Genetic Variants Using Deep Learning Predicts Impact on Epigenome. Nucleic Acids Research, 1–15 (2019).
        ◦ J Schreiber et al., Avocado: a multi-scale deep tensor factorization method learns a latent representation of the human epigenome. Genome Biology. 21, 364976 (2018).
      ◦ Splicing
        ◦ K Jaganathan et al., Predicting Splicing from Primary Sequence with Deep Learning. Cell. 0, 535-548.e24 (2019).
      ◦ TF
        ◦ RC Brown et al., An equivariant Bayesian convolutional network predicts recombination hotspots and accurately resolves binding motifs. Bioinformatics. 35, 2177–2184 (2019).
      Transcriptomics
      ◦ G Eraslan et al., Single-cell RNA-seq denoising using a deep count autoencoder. Nature Communications. 10, 1–14 (2019).
      ◦ M Lotfollahi et al., scGen predicts single-cell perturbation responses. Nature Methods. 16, 715–721 (2019).
      ◦ M Colomé-Tatché et al., Statistical single cell multi-omics integration. Current Opinion in Systems Biology. 7, 54–59 (2018).
      ◦ C Lin et al., Using neural networks for reducing the dimensions of single-cell RNA-Seq data. Nucleic Acids Research. 45 (2017), doi:10.1093/nar/gkx681.
      ◦ GP Way et al., Bayesian deep learning for single-cell analysis. Nature Methods. 15, 1009–1010 (2018).
      ◦ R Lopez et al., Deep generative modeling for single-cell transcriptomics. Nature Methods. 15, 1053–1058 (2018).
      ◦ J Wang et al., Data denoising with transfer learning in single-cell transcriptomics. Nature Methods. 16 (2019), doi:10.1038/s41592-019-0537-1.
      ◦ OmicsMapNet: Transforming omics data to take advantage of Deep Convolutional Neural Network for discovery.
      Drug discovery
      ◦ R Gómez-Bombarelli et al., Automatic Chemical Design Using a Data-Driven Continuous Representation of Molecules. ACS Central Science. 4, 268–276 (2018).
      ◦ CF Lipinski et al., Advances and Perspectives in Applying Deep Learning for Drug Design and Discovery. 6, 1–6 (2019).
      ◦ L David et al., Applications of deep-learning in exploiting large-scale and heterogeneous compound data in industrial pharmaceutical research. Frontiers in Pharmacology. 10, 1–16 (2019).
      Imaging
      ◦ G Lee et al., Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Scientific Reports. 9, 1–12 (2019).
      ◦ H Chen et al., VoxResNet: Deep voxelwise residual networks for brain segmentation from 3D MR images. NeuroImage. 170, 446–455 (2018).
      ◦ T Jo et al., Deep Learning in Alzheimer’s Disease: Diagnostic Classification and Prognostic Prediction Using Neuroimaging Data. Frontiers in Aging Neuroscience. 11 (2019), doi:10.3389/fnagi.2019.00220.
      ◦ A Iqbal et al., Developing a brain atlas through deep learning. Nature Machine Intelligence. 1, 277–287 (2019).
      ◦ A Mahbod et al., Automatic brain segmentation using artificial neural networks with shape context. Pattern Recognition Letters. 101, 74–79 (2018).
      ◦ P Kumar et al., U-SEGNET: Fully convolutional neural network based automated brain tissue segmentation tool. arXiv (2018).

Editor's notes

  1. Predictive modeling. Algorithmic modeling ~ machine learning. In reality, there’s a lot of overlap between these approaches. For example, you can assume a normal distribution within a machine learning model.
  2. Pseudotime on DCA bottleneck coordinates was highly correlated with pseudotime on PCA coordinates (recommended usage), suggesting that DCA is capturing the correct continuous feature (cellular differentiation).