SlideShare una empresa de Scribd logo
1 de 88
Descargar para leer sin conexión
fast and flexible probabilistic modeling in python
jmschreiber91
@jmschrei
@jmschreiber91
Jacob Schreiber
PhD student, Paul G. Allen School of
Computer Science, University of
Washington, Seattle WA
Maxwell Libbrecht
Postdoc, Department of Genome Sciences,
University of Washington, Seattle WA
Assistant Professor, Simon Fraser University,
Vancouver BC
noble.gs.washington.edu/~maxwl/
fast and flexible probabilistic modeling in python
jmschreiber91
@jmschrei
@jmschreiber91
Jacob Schreiber
PhD student, Paul G. Allen School of
Computer Science, University of
Washington, Seattle WA
Maxwell Libbrecht
Postdoc, Department of Genome Sciences,
University of Washington, Seattle WA
Assistant Professor, Simon Fraser University,
Vancouver BC
noble.gs.washington.edu/~maxwl/
Follow along with the Jupyter tutorial
3
goo.gl/cbE7rs
What is probabilistic modeling?
4goo.gl/cbE7rs
Tasks:
• Sample from a distribution.
• Learn parameters of a
distribution from data.
• Predict which distribution
generated a data example.
When should you use pomegranate?
5
complexity of probabilistic modeling task
simple complexz }| {
• logistic regression
• sample from a Gaussian
distribution
• speed is not important
Use scipy or
scikit-learn
z }| {
• distributions not implemented in
pomegranate
• problem-specific inference
algorithms
Use custom
packages (in R or
stand-alone)
z }| {
Use
pomegranate
goo.gl/cbE7rs
When should you use pomegranate?
6
z }| {
Use pomegranate
probabilistic modeling tasks
orthogonaltasks
• Scientific computing: Use numpy, scipy
• Non-probabilistic supervised learning (SVMs,
neural networks): Use scikit-learn
• Visualizing probabilistic models: Use matplotlib
• Bayesian statistics: Check out PyMC3
goo.gl/cbE7rs
Overview: this talk
7
Overview
Major Models/Model Stacks
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers
Finale: Train a mixture of HMMs in parallel
The API is common to all models
8
All models have these methods!
All models composed of
distributions (like GMM, HMM...)
have these methods too!
model.log_probability(X) / model.probability(X)
model.sample()
model.fit(X, weights, inertia)
model.summarize(X, weights)
model.from_summaries(inertia)
model.predict(X)
model.predict_proba(X)
model.predict_log_proba(X)
Model.from_samples(X, weights)
goo.gl/cbE7rs
Overview: supported models
Six Main Models:
1. Probability Distributions
2. General Mixture Models
3. Markov Chains
4. Hidden Markov Models
5. Bayes Classifiers / Naive Bayes
6. Bayesian Networks
9
Two Helper Models:
1. k-means++/kmeans||
2. Factor Graphs
pomegranate supports many distributions
10
Univariate Distributions
1. UniformDistribution
2. BernoulliDistribution
3. NormalDistribution
4. LogNormalDistribution
5. ExponentialDistribution
6. BetaDistribution
7. GammaDistribution
8. DiscreteDistribution
9. PoissonDistribution
Kernel Densities
1. GaussianKernelDensity
2. UniformKernelDensity
3. TriangleKernelDensity
Multivariate Distributions
1. IndependentComponentsDistributio
n
2. MultivariateGaussianDistribution
3. DirichletDistribution
4. ConditionalProbabilityTable
5. JointProbabilityTable
11
mu,	sig	=	0,	2	
a	=	NormalDistribution(mu,	sig)
X	=	[0,	1,	1,	2,	1.5,	6,	7,	8,	7]	
a	=	GaussianKernelDensity(X)
Models can be created from known values
12
Models can be learned from data
X	=	numpy.random.normal(0,	1,	100)	
a	=	NormalDistribution.from_samples(X)
Overview: model stacking in pomegranate
13
Distributions
Bayes Classifiers
Markov Chains
General Mixture Models
Hidden Markov Models
Bayesian Networks
D BC MC GMM HMM BN
Sub-model
Wrapper model
Overview: model stacking in pomegranate
14
Distributions
Bayes Classifiers
Markov Chains
General Mixture Models
Hidden Markov Models
Bayesian Networks
D BC MC GMM HMM BN
Sub-model
Wrapper model
Overview: model stacking in pomegranate
15
Distributions
Bayes Classifiers
Markov Chains
General Mixture Models
Hidden Markov Models
Bayesian Networks
D BC MC GMM HMM BN
Sub-model
Wrapper model
Overview: model stacking in pomegranate
16
Distributions
Bayes Classifiers
Markov Chains
General Mixture Models
Hidden Markov Models
Bayesian Networks
D BC MC GMM HMM BN
17
pomegranate can be faster than numpy
Fitting a Normal Distribution to 1,000 samples
18
pomegranate can be faster than numpy
Fitting Multivariate Gaussian to 10,000,000 samples of 10
dimensions
19
pomegranate uses BLAS internally
20
pomegranate just merged GPU support
21
pomegranate uses additive summarization
pomegranate reduces data to sufficient statistics for updates
and so only has to go datasets once (for all models).
Here is an example of the Normal Distribution sufficient
statistics
22
pomegranate supports out-of-core learning
Batches from a dataset can be reduced to additive summary
statistics, enabling exact updates from data that can’t fit in memory.
23
Parallelization exploits additive summaries
Extract summaries
+
+
+
+
New Parameters
24
pomegranate supports semisupervised learning
Summary statistics from supervised models can be added to
summary statistics from unsupervised models to train a single model
on a mixture of labeled and unlabeled data.
25
pomegranate supports semisupervised learning
Supervised Accuracy: 0.93 Semisupervised Accuracy: 0.96
26
pomegranate can be faster than scipy
27
pomegranate uses aggressive caching
29
30
Example ‘blast’ from Gossip Girl
Spotted: Lonely Boy. Can't believe the love of his life has
returned. If only she knew who he was. But everyone knows
Serena. And everyone is talking. Wonder what Blair Waldorf
thinks. Sure, they're BFF's, but we always thought Blair's
boyfriend Nate had a thing for Serena.
31
How do we encode these ‘blasts’?
Better lock it down with Nate, B. Clock's ticking.
+1 Nate
-1 Blair
32
How do we encode these ‘blasts’?
This just in: S and B committing a crime of fashion. Who
doesn't love a five-finger discount. Especially if it's the middle
one.
-1 Blair
-1 Serena
33
Simple summations don’t work well
34
Beta distributions can model uncertainty
35
Beta distributions can model uncertainty
36
Beta distributions can model uncertainty
Overview: this talk
37
Overview
Major Models/Model Stacks
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers
Finale: Train a mixture of HMMs in parallel
GMMs can model complex distributions
38
GMMs can model complex distributions
39
model = GeneralMixtureModel.from_samples(NormalDistribution, 2, X)
GMMs can model complex distributions
40
An exponential distribution is not right
41
model = ExponentialDistribution.from_samples(X)
A mixture of exponentials is better
42
model = GeneralMixtureModel.from_samples(ExponentialDistribution, 2, X)
Heterogeneous mixtures natively supported
43
model = GeneralMixtureModel.from_samples([ExponentialDistribution, UniformDistribution], 2, X)
GMMs faster than sklearn
44
Overview: this talk
45
Overview
Major Models/Model Stacks
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers
Finale: Train a mixture of HMMs in parallel
CG enrichment detection HMM
46
GACTACGACTCGCGCTCGCACGTCGCTCGACATCATCGACA
CG enrichment detection HMM
GACTACGACTCGCGCTCGCACGTCGCTCGACATCATCGACA
47
pomegranate HMMs are feature rich
48
GMM-HMM easy to define
49
HMMs are faster than hmmlearn
50
Overview: this talk
51
Overview
Major Models/Model Stacks
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers
Finale: Train a mixture of HMMs in parallel
Bayesian networks
52
Bayesian networks are powerful inference tools which define a
dependency structure between variables.
Sprinkler
Wet Grass
Rain
Bayesian networks
53
Sprinkler
Wet Grass
Rain
Two main difficult tasks:
(1) Inference given incomplete information
(2) Learning the dependency structure from data
Bayesian network inference
54
Inference is done using Belief Bropagation on a factor graph.
Messages are sent from one side to the other until convergence.
Rain
Sprinkler
Grass
Wet
R/S/GW
Factor
Grass
Wet
Rain
Sprinkler
Marginals Factors
Inference in a medical diagnostic network
55
BRCA 2 BRCA 1 LCT
BLOATLE LOA VOM AC
PREGLIOC
Inference in a medical diagnostic network
56
BRCA 2 BRCA 1 LCT
BLOATLE LOA VOM AC
PREGLIOC
Bayesian network structure learning
57
???
Three primary ways:
● “Search and score” / Exact
● “Constraint Learning” / PC
● Heuristics
Bayesian network structure learning
58
???
pomegranate supports:
● “Search and score” / Exact
● “Constraint Learning” / PC
● Heuristics
Exact structure learning is intractable
???
59
Chow-Liu trees are fast approximations
60
pomegranate supports four algorithms
61
Constraint graphs merge data + knowledge
62
BRCA 2 BRCA 1 LCT
BLOATLE LOA VOM AC
PREGLIOC
genetic conditions
conditions
symptoms
Constraint graphs merge data + knowledge
63
genetic conditions
diseases
symptoms
Modeling the global stock market
64
Constraint graph published in PeerJ CS
65
Overview: this talk
66
Overview
Major Models/Model Stacks
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers
Finale: Train a mixture of HMMs in parallel
Bayes classifiers rely on Bayes’ rule
67
68
Let’s build a classifier on this data
69
Likelihood function alone is not good
70
Priors can model class imbalance
71
The posterior is a good model of the data
Naive Bayes assumes independent features
72
Naive Bayes produces ellipsoid boundaries
73model = NaiveBayes.from_samples(NormalDistribution, X, y)

Naive Bayes can be heterogeneous
74
Data can fall under different distributions
75
Using appropriate distributions is better
76
model = NaiveBayes.from_samples(NormalDistribution, X_train, y_train)
print "Gaussian Naive Bayes: ", (model.predict(X_test) == y_test).mean()

clf = GaussianNB().fit(X_train, y_train)

print "sklearn Gaussian Naive Bayes: ", (clf.predict(X_test) == y_test).mean()

model = NaiveBayes.from_samples([NormalDistribution, LogNormalDistribution,
ExponentialDistribution], X_train, y_train

print "Heterogeneous Naive Bayes: ", (model.predict(X_test) == y_test).mean()
Gaussian Naive Bayes: 0.798

sklearn Gaussian Naive Bayes: 0.798

Heterogeneous Naive Bayes: 0.844
This additional flexibility is just as fast
77
Bayes classifiers don’t require independence
78
naive accuracy: 0.929 bayes classifier accuracy: 0.966
Gaussian mixture model Bayes classifier
79
Creating complex Bayes classifiers is easy
80
gmm_a = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 2, X[y == 0])

gmm_b = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 2, X[y == 1])

model_b = BayesClassifier([gmm_a, gmm_b], weights=numpy.array([1-y.mean(), y.mean()]))

Creating complex Bayes classifiers is easy
81
mc_a = MarkovChain.from_samples(X[y == 0])
mc_b = MarkovChain.from_samples(X[y == 1])

model_b = BayesClassifier([mc_a, mc_b], weights=numpy.array([1-y.mean(), y.mean()]))
hmm_a = HiddenMarkovModel.from_samples(X[y == 0])
hmm_b = HiddenMarkovModel.from_samples(X[y == 1])

model_b = BayesClassifier([hmm_a, hmm_b], weights=numpy.array([1-y.mean(), y.mean()]))
bn_a = BayesianNetwork.from_samples(X[y == 0])
bn_b = BayesianNetwork.from_samples(X[y == 1])

model_b = BayesClassifier([bn_a, bn_b], weights=numpy.array([1-y.mean(), y.mean()]))



Overview: this talk
82
Overview
Major Models/Model Stacks
1. General Mixture Models
2. Hidden Markov Models
3. Bayesian Networks
4. Bayes Classifiers
Finale: Train a mixture of HMMs in parallel
Training a mixture of HMMs in parallel
83
Creating a mixture of HMMs is just as simple as passing the
HMMs into a GMM as if it were any other distribution
Training a mixture of HMMs in parallel
84fit(model, X, n_jobs=n)
Documentation available at Readthedocs
85
Tutorials available on github
86
https://github.com/jmschrei/pomegranate/tree/master/tutorials
Acknowledgements
87
fast and flexible probabilistic modeling in python
jmschreiber91
@jmschrei
@jmschreiber91
Jacob Schreiber
PhD student, Paul G. Allen School of
Computer Science, University of
Washington, Seattle WA
Maxwell Libbrecht
Postdoc, Department of Genome Sciences,
University of Washington, Seattle WA
Assistant Professor, Simon Fraser University,
Vancouver BC
noble.gs.washington.edu/~maxwl/

Más contenido relacionado

La actualidad más candente

Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hakky St
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural NetworksNatan Katz
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithmsswapnac12
 
PAC Learning and The VC Dimension
PAC Learning and The VC DimensionPAC Learning and The VC Dimension
PAC Learning and The VC Dimensionbutest
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentationBushra Jbawi
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction TechniquesVishal Patel
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsMd. Main Uddin Rony
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixAndrew Ferlitsch
 
Ml2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regressionMl2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regressionankit_ppt
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learningAkhilesh Joshi
 
Aprendizado de máquina
Aprendizado de máquinaAprendizado de máquina
Aprendizado de máquinaparasite
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Yan Xu
 

La actualidad más candente (20)

3D 딥러닝 동향
3D 딥러닝 동향3D 딥러닝 동향
3D 딥러닝 동향
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
Hands-On Machine Learning with Scikit-Learn and TensorFlow - Chapter8
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural Networks
 
Genetic algorithms
Genetic algorithmsGenetic algorithms
Genetic algorithms
 
PAC Learning and The VC Dimension
PAC Learning and The VC DimensionPAC Learning and The VC Dimension
PAC Learning and The VC Dimension
 
Machine learning
Machine learningMachine learning
Machine learning
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Transfer learning-presentation
Transfer learning-presentationTransfer learning-presentation
Transfer learning-presentation
 
Feature Reduction Techniques
Feature Reduction TechniquesFeature Reduction Techniques
Feature Reduction Techniques
 
Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
 
Gradient descent method
Gradient descent methodGradient descent method
Gradient descent method
 
Machine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion MatrixMachine Learning - Accuracy and Confusion Matrix
Machine Learning - Accuracy and Confusion Matrix
 
Ml2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regressionMl2 train test-splits_validation_linear_regression
Ml2 train test-splits_validation_linear_regression
 
CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)CSC446: Pattern Recognition (LN6)
CSC446: Pattern Recognition (LN6)
 
PCA and LDA in machine learning
PCA and LDA in machine learningPCA and LDA in machine learning
PCA and LDA in machine learning
 
Aprendizado de máquina
Aprendizado de máquinaAprendizado de máquina
Aprendizado de máquina
 
Em Algorithm | Statistics
Em Algorithm | StatisticsEm Algorithm | Statistics
Em Algorithm | Statistics
 
Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering Mean shift and Hierarchical clustering
Mean shift and Hierarchical clustering
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 

Similar a Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling in python

Distance-based bias in model-directed optimization of additively decomposable...
Distance-based bias in model-directed optimization of additively decomposable...Distance-based bias in model-directed optimization of additively decomposable...
Distance-based bias in model-directed optimization of additively decomposable...Martin Pelikan
 
Dependency patterns for latent variable discovery
Dependency patterns for latent variable discovery Dependency patterns for latent variable discovery
Dependency patterns for latent variable discovery Bayesian Intelligence
 
Data driven model optimization [autosaved]
Data driven model optimization [autosaved]Data driven model optimization [autosaved]
Data driven model optimization [autosaved]Russell Jarvis
 
Neural Nets Deconstructed
Neural Nets DeconstructedNeural Nets Deconstructed
Neural Nets DeconstructedPaul Sterk
 
Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExAustin Benson
 
The application of artificial intelligence
The application of artificial intelligenceThe application of artificial intelligence
The application of artificial intelligencePallavi Vashistha
 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief netszukun
 
Mining Maximally Banded Matrices in Binary Data
 Mining Maximally Banded Matrices in Binary Data Mining Maximally Banded Matrices in Binary Data
Mining Maximally Banded Matrices in Binary DataFaris Alqadah
 
Cornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 NetsCornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 NetsMark Gerstein
 
Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Austin Benson
 
Composing graphical models with neural networks for structured representatio...
Composing graphical models with  neural networks for structured representatio...Composing graphical models with  neural networks for structured representatio...
Composing graphical models with neural networks for structured representatio...Jeongmin Cha
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomicsUSC
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AIFlorian Wilhelm
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIinovex GmbH
 
PPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at UberPPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at UberJisang Yoon
 
Dynamic Programming Algorithm for the Prediction for Gene Structure
Dynamic Programming Algorithm for the Prediction for Gene StructureDynamic Programming Algorithm for the Prediction for Gene Structure
Dynamic Programming Algorithm for the Prediction for Gene StructureMarilyn Arceo
 
Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian Aurisano
 

Similar a Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling in python (20)

Distance-based bias in model-directed optimization of additively decomposable...
Distance-based bias in model-directed optimization of additively decomposable...Distance-based bias in model-directed optimization of additively decomposable...
Distance-based bias in model-directed optimization of additively decomposable...
 
Dependency patterns for latent variable discovery
Dependency patterns for latent variable discovery Dependency patterns for latent variable discovery
Dependency patterns for latent variable discovery
 
Data driven model optimization [autosaved]
Data driven model optimization [autosaved]Data driven model optimization [autosaved]
Data driven model optimization [autosaved]
 
Neural Nets Deconstructed
Neural Nets DeconstructedNeural Nets Deconstructed
Neural Nets Deconstructed
 
Higher-order Link Prediction GraphEx
Higher-order Link Prediction GraphExHigher-order Link Prediction GraphEx
Higher-order Link Prediction GraphEx
 
The application of artificial intelligence
The application of artificial intelligenceThe application of artificial intelligence
The application of artificial intelligence
 
NIPS2007: deep belief nets
NIPS2007: deep belief netsNIPS2007: deep belief nets
NIPS2007: deep belief nets
 
Mining Maximally Banded Matrices in Binary Data
 Mining Maximally Banded Matrices in Binary Data Mining Maximally Banded Matrices in Binary Data
Mining Maximally Banded Matrices in Binary Data
 
Cornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 NetsCornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 Nets
 
Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)Simplicial closure and higher-order link prediction (SIAMNS18)
Simplicial closure and higher-order link prediction (SIAMNS18)
 
Composing graphical models with neural networks for structured representatio...
Composing graphical models with  neural networks for structured representatio...Composing graphical models with  neural networks for structured representatio...
Composing graphical models with neural networks for structured representatio...
 
OpenCL applications in genomics
OpenCL applications in genomicsOpenCL applications in genomics
OpenCL applications in genomics
 
Magpie
MagpieMagpie
Magpie
 
Alignment Approaches II: Long Reads
Alignment Approaches II: Long ReadsAlignment Approaches II: Long Reads
Alignment Approaches II: Long Reads
 
Uncertainty Quantification in AI
Uncertainty Quantification in AIUncertainty Quantification in AI
Uncertainty Quantification in AI
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AI
 
Biological Network Inference via Gaussian Graphical Models
Biological Network Inference via Gaussian Graphical ModelsBiological Network Inference via Gaussian Graphical Models
Biological Network Inference via Gaussian Graphical Models
 
PPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at UberPPT - Deep and Confident Prediction For Time Series at Uber
PPT - Deep and Confident Prediction For Time Series at Uber
 
Dynamic Programming Algorithm for the Prediction for Gene Structure
Dynamic Programming Algorithm for the Prediction for Gene StructureDynamic Programming Algorithm for the Prediction for Gene Structure
Dynamic Programming Algorithm for the Prediction for Gene Structure
 
Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3Jillian ms defense-4-14-14-ja-novid3
Jillian ms defense-4-14-14-ja-novid3
 

Más de PyData

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...PyData
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshPyData
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiPyData
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...PyData
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerPyData
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaPyData
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...PyData
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroPyData
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...PyData
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottPyData
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroPyData
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...PyData
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPyData
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...PyData
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydPyData
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverPyData
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldPyData
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...PyData
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardPyData
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...PyData
 

Más de PyData (20)

Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
Michal Mucha: Build and Deploy an End-to-end Streaming NLP Insight System | P...
 
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif WalshUnit testing data with marbles - Jane Stewart Adams, Leif Walsh
Unit testing data with marbles - Jane Stewart Adams, Leif Walsh
 
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake BolewskiThe TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
The TileDB Array Data Storage Manager - Stavros Papadopoulos, Jake Bolewski
 
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...Using Embeddings to Understand the Variance and Evolution of Data Science... ...
Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Último

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Último (20)

A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

Maxwell W Libbrecht - pomegranate: fast and flexible probabilistic modeling in python

  • 1. fast and flexible probabilistic modeling in python jmschreiber91 @jmschrei @jmschreiber91 Jacob Schreiber PhD student, Paul G. Allen School of Computer Science, University of Washington, Seattle WA Maxwell Libbrecht Postdoc, Department of Genome Sciences, University of Washington, Seattle WA Assistant Professor, Simon Fraser University, Vancouver BC noble.gs.washington.edu/~maxwl/
  • 2. fast and flexible probabilistic modeling in python jmschreiber91 @jmschrei @jmschreiber91 Jacob Schreiber PhD student, Paul G. Allen School of Computer Science, University of Washington, Seattle WA Maxwell Libbrecht Postdoc, Department of Genome Sciences, University of Washington, Seattle WA Assistant Professor, Simon Fraser University, Vancouver BC noble.gs.washington.edu/~maxwl/
  • 3. Follow along with the Jupyter tutorial 3 goo.gl/cbE7rs
  • 4. What is probabilistic modeling? 4goo.gl/cbE7rs Tasks: • Sample from a distribution. • Learn parameters of a distribution from data. • Predict which distribution generated a data example.
  • 5. When should you use pomegranate? 5 complexity of probabilistic modeling task simple complexz }| { • logistic regression • sample from a Gaussian distribution • speed is not important Use scipy or scikit-learn z }| { • distributions not implemented in pomegranate • problem-specific inference algorithms Use custom packages (in R or stand-alone) z }| { Use pomegranate goo.gl/cbE7rs
  • 6. When should you use pomegranate? 6 z }| { Use pomegranate probabilistic modeling tasks orthogonaltasks • Scientific computing: Use numpy, scipy • Non-probabilistic supervised learning (SVMs, neural networks): Use scikit-learn • Visualizing probabilistic models: Use matplotlib • Bayesian statistics: Check out PyMC3 goo.gl/cbE7rs
  • 7. Overview: this talk 7 Overview Major Models/Model Stacks 1. General Mixture Models 2. Hidden Markov Models 3. Bayesian Networks 4. Bayes Classifiers Finale: Train a mixture of HMMs in parallel
  • 8. The API is common to all models 8 All models have these methods! All models composed of distributions (like GMM, HMM...) have these methods too! model.log_probability(X) / model.probability(X) model.sample() model.fit(X, weights, inertia) model.summarize(X, weights) model.from_summaries(inertia) model.predict(X) model.predict_proba(X) model.predict_log_proba(X) Model.from_samples(X, weights) goo.gl/cbE7rs
  • 9. Overview: supported models Six Main Models: 1. Probability Distributions 2. General Mixture Models 3. Markov Chains 4. Hidden Markov Models 5. Bayes Classifiers / Naive Bayes 6. Bayesian Networks 9 Two Helper Models: 1. k-means++/kmeans|| 2. Factor Graphs
  • 10. pomegranate supports many distributions 10 Univariate Distributions 1. UniformDistribution 2. BernoulliDistribution 3. NormalDistribution 4. LogNormalDistribution 5. ExponentialDistribution 6. BetaDistribution 7. GammaDistribution 8. DiscreteDistribution 9. PoissonDistribution Kernel Densities 1. GaussianKernelDensity 2. UniformKernelDensity 3. TriangleKernelDensity Multivariate Distributions 1. IndependentComponentsDistributio n 2. MultivariateGaussianDistribution 3. DirichletDistribution 4. ConditionalProbabilityTable 5. JointProbabilityTable
  • 12. 12 Models can be learned from data X = numpy.random.normal(0, 1, 100) a = NormalDistribution.from_samples(X)
  • 13. Overview: model stacking in pomegranate 13 Distributions Bayes Classifiers Markov Chains General Mixture Models Hidden Markov Models Bayesian Networks D BC MC GMM HMM BN Sub-model Wrapper model
  • 14. Overview: model stacking in pomegranate 14 Distributions Bayes Classifiers Markov Chains General Mixture Models Hidden Markov Models Bayesian Networks D BC MC GMM HMM BN Sub-model Wrapper model
  • 15. Overview: model stacking in pomegranate 15 Distributions Bayes Classifiers Markov Chains General Mixture Models Hidden Markov Models Bayesian Networks D BC MC GMM HMM BN Sub-model Wrapper model
  • 16. Overview: model stacking in pomegranate 16 Distributions Bayes Classifiers Markov Chains General Mixture Models Hidden Markov Models Bayesian Networks D BC MC GMM HMM BN
  • 17. 17 pomegranate can be faster than numpy Fitting a Normal Distribution to 1,000 samples
  • 18. 18 pomegranate can be faster than numpy Fitting Multivariate Gaussian to 10,000,000 samples of 10 dimensions
  • 21. 21 pomegranate uses additive summarization pomegranate reduces data to sufficient statistics for updates and so only has to go datasets once (for all models). Here is an example of the Normal Distribution sufficient statistics
  • 22. 22 pomegranate supports out-of-core learning Batches from a dataset can be reduced to additive summary statistics, enabling exact updates from data that can’t fit in memory.
  • 23. 23 Parallelization exploits additive summaries Extract summaries + + + + New Parameters
  • 24. 24 pomegranate supports semisupervised learning Summary statistics from supervised models can be added to summary statistics from unsupervised models to train a single model on a mixture of labeled and unlabeled data.
  • 25. 25 pomegranate supports semisupervised learning Supervised Accuracy: 0.93 Semisupervised Accuracy: 0.96
  • 26. 26 pomegranate can be faster than scipy
  • 28.
  • 29. 29
  • 30. 30 Example ‘blast’ from Gossip Girl Spotted: Lonely Boy. Can't believe the love of his life has returned. If only she knew who he was. But everyone knows Serena. And everyone is talking. Wonder what Blair Waldorf thinks. Sure, they're BFF's, but we always thought Blair's boyfriend Nate had a thing for Serena.
  • 31. 31 How do we encode these ‘blasts’? Better lock it down with Nate, B. Clock's ticking. +1 Nate -1 Blair
  • 32. 32 How do we encode these ‘blasts’? This just in: S and B committing a crime of fashion. Who doesn't love a five-finger discount. Especially if it's the middle one. -1 Blair -1 Serena
  • 34. 34 Beta distributions can model uncertainty
  • 35. 35 Beta distributions can model uncertainty
  • 36. 36 Beta distributions can model uncertainty
  • 37. Overview: this talk 37 Overview Major Models/Model Stacks 1. General Mixture Models 2. Hidden Markov Models 3. Bayesian Networks 4. Bayes Classifiers Finale: Train a mixture of HMMs in parallel
  • 38. GMMs can model complex distributions 38
  • 39. GMMs can model complex distributions 39 model = GeneralMixtureModel.from_samples(NormalDistribution, 2, X)
  • 40. GMMs can model complex distributions 40
  • 41. An exponential distribution is not right 41 model = ExponentialDistribution.from_samples(X)
  • 42. A mixture of exponentials is better 42 model = GeneralMixtureModel.from_samples(ExponentialDistribution, 2, X)
  • 43. Heterogeneous mixtures natively supported 43 model = GeneralMixtureModel.from_samples([ExponentialDistribution, UniformDistribution], 2, X)
  • 44. GMMs faster than sklearn 44
  • 45. Overview: this talk 45 Overview Major Models/Model Stacks 1. General Mixture Models 2. Hidden Markov Models 3. Bayesian Networks 4. Bayes Classifiers Finale: Train a mixture of HMMs in parallel
  • 46. CG enrichment detection HMM 46 GACTACGACTCGCGCTCGCACGTCGCTCGACATCATCGACA
  • 47. CG enrichment detection HMM GACTACGACTCGCGCTCGCACGTCGCTCGACATCATCGACA 47
  • 48. pomegranate HMMs are feature rich 48
  • 49. GMM-HMM easy to define 49
  • 50. HMMs are faster than hmmlearn 50
  • 51. Overview: this talk 51 Overview Major Models/Model Stacks 1. General Mixture Models 2. Hidden Markov Models 3. Bayesian Networks 4. Bayes Classifiers Finale: Train a mixture of HMMs in parallel
  • 52. Bayesian networks 52 Bayesian networks are powerful inference tools which define a dependency structure between variables. Sprinkler Wet Grass Rain
  • 53. Bayesian networks 53 Sprinkler Wet Grass Rain Two main difficult tasks: (1) Inference given incomplete information (2) Learning the dependency structure from data
  • 54. Bayesian network inference 54 Inference is done using Belief Bropagation on a factor graph. Messages are sent from one side to the other until convergence. Rain Sprinkler Grass Wet R/S/GW Factor Grass Wet Rain Sprinkler Marginals Factors
  • 55. Inference in a medical diagnostic network 55 BRCA 2 BRCA 1 LCT BLOATLE LOA VOM AC PREGLIOC
  • 56. Inference in a medical diagnostic network 56 BRCA 2 BRCA 1 LCT BLOATLE LOA VOM AC PREGLIOC
  • 57. Bayesian network structure learning 57 ??? Three primary ways: ● “Search and score” / Exact ● “Constraint Learning” / PC ● Heuristics
  • 58. Bayesian network structure learning 58 ??? pomegranate supports: ● “Search and score” / Exact ● “Constraint Learning” / PC ● Heuristics
  • 59. Exact structure learning is intractable ??? 59
  • 60. Chow-Liu trees are fast approximations 60
  • 61. pomegranate supports four algorithms 61
  • 62. Constraint graphs merge data + knowledge 62 BRCA 2 BRCA 1 LCT BLOATLE LOA VOM AC PREGLIOC genetic conditions conditions symptoms
  • 63. Constraint graphs merge data + knowledge 63 genetic conditions diseases symptoms
  • 64. Modeling the global stock market 64
  • 65. Constraint graph published in PeerJ CS 65
  • 66. Overview: this talk 66 Overview Major Models/Model Stacks 1. General Mixture Models 2. Hidden Markov Models 3. Bayesian Networks 4. Bayes Classifiers Finale: Train a mixture of HMMs in parallel
  • 67. Bayes classifiers rely on Bayes’ rule 67
  • 68. 68 Let’s build a classifier on this data
  • 70. 70 Priors can model class imbalance
  • 71. 71 The posterior is a good model of the data
  • 72. Naive Bayes assumes independent features 72
  • 73. Naive Bayes produces ellipsoid boundaries 73model = NaiveBayes.from_samples(NormalDistribution, X, y)

  • 74. Naive Bayes can be heterogeneous 74
  • 75. Data can fall under different distributions 75
  • 76. Using appropriate distributions is better 76 model = NaiveBayes.from_samples(NormalDistribution, X_train, y_train) print "Gaussian Naive Bayes: ", (model.predict(X_test) == y_test).mean()
 clf = GaussianNB().fit(X_train, y_train)
 print "sklearn Gaussian Naive Bayes: ", (clf.predict(X_test) == y_test).mean()
 model = NaiveBayes.from_samples([NormalDistribution, LogNormalDistribution, ExponentialDistribution], X_train, y_train
 print "Heterogeneous Naive Bayes: ", (model.predict(X_test) == y_test).mean() Gaussian Naive Bayes: 0.798
 sklearn Gaussian Naive Bayes: 0.798
 Heterogeneous Naive Bayes: 0.844
  • 77. This additional flexibility is just as fast 77
  • 78. Bayes classifiers don’t require independence 78 naive accuracy: 0.929 bayes classifier accuracy: 0.966
  • 79. Gaussian mixture model Bayes classifier 79
  • 80. Creating complex Bayes classifiers is easy 80 gmm_a = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 2, X[y == 0])
 gmm_b = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 2, X[y == 1])
 model_b = BayesClassifier([gmm_a, gmm_b], weights=numpy.array([1-y.mean(), y.mean()]))

  • 81. Creating complex Bayes classifiers is easy 81 mc_a = MarkovChain.from_samples(X[y == 0]) mc_b = MarkovChain.from_samples(X[y == 1])
 model_b = BayesClassifier([mc_a, mc_b], weights=numpy.array([1-y.mean(), y.mean()])) hmm_a = HiddenMarkovModel.from_samples(X[y == 0]) hmm_b = HiddenMarkovModel.from_samples(X[y == 1])
 model_b = BayesClassifier([hmm_a, hmm_b], weights=numpy.array([1-y.mean(), y.mean()])) bn_a = BayesianNetwork.from_samples(X[y == 0]) bn_b = BayesianNetwork.from_samples(X[y == 1])
 model_b = BayesClassifier([bn_a, bn_b], weights=numpy.array([1-y.mean(), y.mean()]))
 

  • 82. Overview: this talk 82 Overview Major Models/Model Stacks 1. General Mixture Models 2. Hidden Markov Models 3. Bayesian Networks 4. Bayes Classifiers Finale: Train a mixture of HMMs in parallel
  • 83. Training a mixture of HMMs in parallel 83 Creating a mixture of HMMs is just as simple as passing the HMMs into a GMM as if it were any other distribution
  • 84. Training a mixture of HMMs in parallel 84fit(model, X, n_jobs=n)
  • 85. Documentation available at Readthedocs 85
  • 86. Tutorials available on github 86 https://github.com/jmschrei/pomegranate/tree/master/tutorials
  • 88. fast and flexible probabilistic modeling in python jmschreiber91 @jmschrei @jmschreiber91 Jacob Schreiber PhD student, Paul G. Allen School of Computer Science, University of Washington, Seattle WA Maxwell Libbrecht Postdoc, Department of Genome Sciences, University of Washington, Seattle WA Assistant Professor, Simon Fraser University, Vancouver BC noble.gs.washington.edu/~maxwl/