6. Hybrid Models for Heterogeneous and Multimodal Data
Motivation
Aim: build Hybrid Systems that Integrate multimodal, multisource data and learn models that aid users Interpret the data.
7. Hybrid Models for Heterogeneous and Multimodal Data
Motivation
Hybrid Systems: Integrate, Interpret
§ VIPR: Visualizations for Informative Projection Recovery
§ DNDF: Deep Neural Decision Forests
§ ShortFuse: Learning Representations from Time Series and Structured Information
8. Hybrid Models for Heterogeneous and Multimodal Data
Motivation
Hybrid Systems: Integrate, Interpret
§ Weak Supervision for Cardiac MRI Classification
§ Future Research Directions
10. Application: Alert Classification
Informative Projection Recovery
Health alerts: some are artifacts, not true alerts.
Alert conditions:
§ Heart Rate < 40 or > 140
§ Respiratory Rate < 8 or > 36
§ Systolic Blood Pressure < 80 or > 200
§ Diastolic Blood Pressure > 110
§ SPO2 < 85%
[Figure: vital sign time series annotated with the window of 4 minutes preceding alert onset and the alert duration]
Features computed from time series include common statistics of each vital sign (VS): mean, stdev, min, max, range of values, duty cycle ...
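The alert rules and window statistics above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: the names THRESHOLDS, is_alert and window_features are hypothetical, the sample standard deviation is an assumed choice, and duty cycle is left out because the slide does not define it.

```python
from statistics import mean, stdev

# Alert thresholds from the slide as (low, high); None means no bound on that side.
# The dictionary keys are hypothetical short names for the vital signs.
THRESHOLDS = {
    "HR": (40, 140),
    "RR": (8, 36),
    "SBP": (80, 200),
    "DBP": (None, 110),
    "SPO2": (85, None),
}


def is_alert(vital, value):
    """Return True if the value falls outside the allowed range for this vital sign."""
    low, high = THRESHOLDS[vital]
    too_low = low is not None and value < low
    too_high = high is not None and value > high
    return too_low or too_high


def window_features(samples):
    """Common statistics of one vital sign over a window (needs at least 2 samples)."""
    return {
        "mean": mean(samples),
        "stdev": stdev(samples),  # sample standard deviation (assumed choice)
        "min": min(samples),
        "max": max(samples),
        "range": max(samples) - min(samples),
    }


# Example: heart-rate samples from a (made-up) window preceding alert onset.
hr_window = [60, 62, 64]
hr_features = window_features(hr_window)
```

In a classifier such as the one described on the following slides, one such feature dictionary per vital sign and per window would form the input vector.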
11. Defining interpretability
[Figure, panel "Imperfect separation": value-SPO2-mean vs. value-HR-mean. Panel "Clear separation": value-SPO2-data--den vs. value-HR-data--den. Classes: artifact, true alert.]
Features shown: Heart Rate Density*, Oxygen Saturation Density, Respiratory Rate, Respiratory Rate Increase (label: INFORMATIVE PROJECTION)
*Density = Average / Typical Values
Related work on structured sparsity:
Guillaume Obozinski, Ben Taskar, and Michael I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, April 2010.
23. Alert Classification with VIPR
[Figure: informative projection with axes Min Respiratory Rate and Heart Rate Data Density. Classes: artifact, true alert.]
§ 2 Informative Projections
§ Test point handled by one of them
§ Accuracy: 0.91, Precision: 0.93, Recall: 0.945
§ Better accuracy than Random Forests and SVM (<0.9)
Fiterau M, Dubrawski A, Chen L, Hravnak M, Clermont G, Pinsky MR. Automatic identification of artifacts in monitoring critically ill patients. Annual Congress of the European Society of Intensive Care Medicine 2014.
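The idea that a model consists of a few 2-D projections, with each test point handled by one of them, can be illustrated with a toy sketch. This is not the VIPR algorithm: it assumes leave-one-out 1-nearest-neighbour as the local classifier, a greedy coverage rule for choosing projections, and routing of a test point to the projection where its nearest neighbour is closest. All function names and the data are hypothetical.

```python
from itertools import combinations


def nearest(point, X, dims, skip=None):
    """Index and squared distance of the nearest training point within projection `dims`."""
    best_i, best_d = None, float("inf")
    for i, x in enumerate(X):
        if i == skip:
            continue
        d = sum((point[k] - x[k]) ** 2 for k in dims)
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d


def select_projections(X, y, k=2):
    """Greedily pick up to k feature pairs that together classify the most training points."""
    candidates = list(combinations(range(len(X[0])), 2))
    # correct[p]: training points that leave-one-out 1-NN gets right in projection p
    correct = {}
    for dims in candidates:
        correct[dims] = {
            i for i, x in enumerate(X) if y[nearest(x, X, dims, skip=i)[0]] == y[i]
        }
    selected, covered = [], set()
    for _ in range(k):
        best = max(candidates, key=lambda p: len(correct[p] - covered))
        if not correct[best] - covered:
            break  # no projection adds newly covered points
        selected.append(best)
        covered |= correct[best]
    return selected


def predict(point, X, y, selected):
    """Route the point to one selected projection and return (label, projection used)."""
    best = None
    for dims in selected:
        j, d = nearest(point, X, dims)
        if best is None or d < best[0]:
            best = (d, y[j], dims)
    return best[1], best[2]


# Made-up data: features 0 and 1 separate the classes, feature 2 is noise.
X = [[0, 0, 3], [1, 0, 9], [0, 1, 1], [5, 5, 8], [6, 5, 2], [5, 6, 7]]
y = [0, 0, 0, 1, 1, 1]
selected = select_projections(X, y, k=2)
label, used = predict([5.5, 5.2, 0], X, y, selected)
```

Because every prediction comes from a single 2-D projection, the evidence for it can be shown to a domain expert as one scatter plot, which is the interpretability argument made on the slides.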
25. More Research on Informative Projections
§ Informative projection retrieval for regression and clustering
§ Finding informative projections with active learning
§ Studies on usability by domain experts
§ Theoretical guarantees
§ Related work on interpretability:
Madalina Fiterau and Artur Dubrawski. Informative projection recovery for semi-supervised classification, clustering
and regression. In International Conference on Machine Learning and Applications, volume 12, ICMLA 2013.
Madalina Fiterau and Artur Dubrawski. Active learning for Informative Projection Recovery. In the Conference of the
Association for the Advancement of Artificial Intelligence, volume 29, AAAI 2015.
Fiterau M, Wang J, Dubrawski A, Clermont G, Hravnak M, Pinsky MR. Using expert review to calibrate semi-automated adjudication of
vital sign alerts in step-down units. Society of Critical Care Medicine Annual Congress 2016. Star Research Award.
Fiterau M, Dubrawski A, Chen L, Hravnak M, Bose E, Gilles C, Pinsky MR. Archetyping artifacts in monitored
noninvasive vital signs data. Society of Critical Care Medicine Annual Congress 2015. Oral Presentation.
PhD Thesis, Ch. 2.5 (VC dimension and Risk consistency); Under review: Compression scheme + Sample complexity
Bing Liu, Minqing Hu, and Wynne Hsu. Intuitive representation of decision trees using general rules and exceptions.
In Proceedings of Seventeenth National Conference on Artificial Intelligence (AAAI-2000).
NOW: Lipton, Zachary C. "The mythos of model interpretability." arXiv preprint arXiv:1606.03490 (2016).
Interpretable ML Symposium - NIPS 2017.
26. Deep Neural Decision Forests
This research was partially completed during an internship at MSR Cambridge, UK.
Collaborators:
Peter Kontschieder, Microsoft Research
Antonio Criminisi, Microsoft Research
Samuel Rota-Bulò, Fondazione Bruno Kessler
27. Hybrid Models
[Diagram labels: Dataset (tabular); Feature Engineering; Classifier (Random Forests); Hybrid Model]
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper
with convolutions. CVPR 2015
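31. Modeling Node Splits
[Figure: decision tree with split nodes d1 to d7, each a sigmoid function, and leaf ℓ4 highlighted. Image by Samuel Rota-Bulò]
§ Hierarchical routing along path $\Phi_\ell$ to leaf $\ell$:
$$\mu_\ell(x;\Theta) = \prod_{n \in \Phi_\ell} d_n(x;\Theta)^{\mathbb{1}_{\ell \leftarrow n}} \left(1 - d_n(x;\Theta)\right)^{\mathbb{1}_{n \rightarrow \ell}}$$
where $\mathbb{1}_{\ell \leftarrow n} = 1$ if $\ell$ belongs to the left subtree of $n$, and $\mathbb{1}_{n \rightarrow \ell} = 1$ if $\ell$ belongs to the right subtree of $n$.
Example: $\Phi_{\ell_4} = \{n_1, n_2, n_5\}$
$$\mu_{\ell_4}(x;\Theta) = \sigma(\theta_1^T x)\,(1 - \sigma(\theta_2^T x))\,(1 - \sigma(\theta_5^T x))$$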
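The routing rule above can be checked numerically with a short sketch. It assumes a full binary tree of depth 3 with split nodes numbered breadth-first (matching d1 to d7 in the figure) and linear-sigmoid splits d_n = σ(θ_nᵀx) as in the ℓ4 example. The function name and the parameter values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def routing_probabilities(x, thetas, depth=3):
    """Return [mu_l1, ..., mu_lK] for all leaves of a full binary tree.

    thetas[n] is the weight vector of split node n, with nodes numbered
    breadth-first from 1 (node n has children 2n and 2n + 1).
    """
    n_split = 2 ** depth - 1
    reach = {1: 1.0}  # probability of reaching each node; the root is always reached
    for n in range(1, n_split + 1):
        d_n = sigmoid(sum(t * xi for t, xi in zip(thetas[n], x)))
        reach[2 * n] = reach[n] * d_n            # routed left with probability d_n
        reach[2 * n + 1] = reach[n] * (1 - d_n)  # routed right with probability 1 - d_n
    return [reach[leaf] for leaf in range(n_split + 1, 2 * n_split + 2)]


# Made-up parameters and input for illustration.
thetas = {n: [0.1 * n, -0.2 * n] for n in range(1, 8)}
x = [1.0, 2.0]
mu = routing_probabilities(x, thetas)

# Leaf l4: left at n1, right at n2, right at n5.
d = {n: sigmoid(sum(t * xi for t, xi in zip(thetas[n], x))) for n in thetas}
mu_l4_expected = d[1] * (1 - d[2]) * (1 - d[5])
```

Since each split sends the full incoming probability mass to its two children, the leaf probabilities sum to 1, which is what lets the leaves be combined as a mixture.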
32. Merging Decision Forests to Networks
§ Each output of the DeepNet becomes a feature for the Backpropagation Forest
[Figure: two decision trees (split nodes d1 to d7 with leaves π1 to π8, and split nodes d8 to d14 with leaves π9 to π16) whose split functions f1 to f14 are fed by the fully connected (FC) output layer of a deep CNN with parameters Θ. Image credit: Samuel Rota-Bulò]
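A minimal sketch of how the FC outputs could drive the forest, under stated assumptions: each FC output f_n is passed through a sigmoid to give the split decision d_n, a tree's prediction is its leaf distributions π weighted by the routing probabilities, and trees are averaged. These assumptions follow the formulation of the cited ICCV 2015 paper as understood here. The function names, the breadth-first node ordering and the values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def tree_predict(f, pi):
    """f: 7 network outputs, one per split node (breadth-first); pi: 8 leaf class distributions."""
    mu = [1.0]  # reach probabilities of the current tree level, starting at the root
    idx = 0
    while idx < len(f):
        next_level = []
        for m in mu:
            d = sigmoid(f[idx])
            idx += 1
            next_level += [m * d, m * (1 - d)]  # left child, right child
        mu = next_level
    n_classes = len(pi[0])
    return [sum(m * p[c] for m, p in zip(mu, pi)) for c in range(n_classes)]


def forest_predict(fc_outputs, leaf_dists, nodes_per_tree=7):
    """Split the FC outputs into per-tree chunks and average the tree predictions."""
    chunks = [
        fc_outputs[i:i + nodes_per_tree]
        for i in range(0, len(fc_outputs), nodes_per_tree)
    ]
    preds = [tree_predict(f, pi) for f, pi in zip(chunks, leaf_dists)]
    return [sum(p[c] for p in preds) / len(preds) for c in range(len(preds[0]))]


# Made-up example: 14 FC outputs feed two depth-3 trees over 2 classes.
fc = [0.3, -1.2, 0.8, 0.0, 2.1, -0.5, 0.7, 1.5, -0.4, 0.2, 0.9, -2.0, 0.1, 0.6]
leaves_tree1 = [[0.9, 0.1]] * 4 + [[0.2, 0.8]] * 4
leaves_tree2 = [[0.6, 0.4]] * 8
probs = forest_predict(fc, [leaves_tree1, leaves_tree2])
```

Every operation here is differentiable in the FC outputs, which is what allows the forest to be trained jointly with the network by backpropagation.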
33. ImageNet Experiment
§ Millions of images
§ 1000 synsets (classes)
§ Modified GoogLeNet*, replaced Softmax layers with BPF
* C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR 2015.
Description            Top 5 Error
GoogLeNet              10.07%
1 model, 1 crop        7.84%
1 model, 10 crops      7.08%
7 models, 1 crop       6.38%
Can now introduce other covariates in the model via the BPF.
Peter Kontschieder, Madalina Fiterau, Antonio Criminisi and Samuel Rota-Bulò. Deep Neural Decision Forests, International Conference on Computer Vision, ICCV 2015.
34. ShortFuse: Learning Time Series Representations in the Presence of Structured Information
This work was supported in part by the Mobilize Center, a National Institutes of Health Big Data to Knowledge (BD2K) Center of Excellence supported through Grant U54EB020405.
Collaborators:
Suvrat Bhooshan, Stanford CS
Jason Fries, Stanford CS
Charles Bournhonesque, Stanford ICME
Jennifer Hicks, Stanford Bioengineering
Eni Halilaj, Stanford Bioengineering
Chris Ré, Stanford CS
Scott Delp, Stanford Bioengineering
36. Osteoarthritis Progression
§ Knee osteoarthritis causes cartilage degeneration
§ Activity influences progression, as do other factors
§ Can we predict osteoarthritis progression?
[Figure: Joint Space Narrowing (source: Wikipedia) and activity counts, with other factors: Gender, Nutrition, Age, Physical exam, Symptoms]
40. Hybrid CNN
Covariates introduced in the representation learning process.
X = n sequences, t time points
S = vector of d covariates (example: Age 12, Gender M, Height 154, Weight 77)
[Diagram: convolution kernel applied to the sequences X alongside the covariates S]
41. Hybrid CNN
Covariates introduced in the representation learning process.
X = n sequences, t time points
S = vector of d covariates (example: Age 12, Gender M, Height 154, Weight 77)
[Diagram: kernel and covariates combined through + and ⊗ operations, then passed to a Deep Network. Annotation: "contains terms of the type" (expression not preserved)]
42. Hybrid CNN
§ CNN used for the biomedical applications
§ Convolutional layers replaced with hybrid convolutions
§ Equivalent modification for LSTM
• Added parameters corresponding to the covariates
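One possible reading of "added parameters corresponding to the covariates" is sketched below. This is an assumption for illustration, not the ShortFuse layer: each filter output receives an extra learned term v·s that depends on the covariates, on top of the ordinary 1-D convolution. The function name, the weight vector v and the numeric encoding of the covariates are all hypothetical.

```python
def hybrid_conv1d(x, s, w, v, b=0.0):
    """Toy 1-D convolution (valid mode, no kernel flip) with a covariate-dependent term.

    x: time series of length t
    s: numerically encoded covariates of length d (e.g. Gender M encoded as 1.0)
    w: convolution kernel of length k
    v: added weights, one per covariate (assumed form of the extra parameters)
    b: bias
    """
    k = len(w)
    cov_term = sum(vj * sj for vj, sj in zip(v, s))  # identical at every time step
    return [
        sum(w[i] * x[t + i] for i in range(k)) + cov_term + b
        for t in range(len(x) - k + 1)
    ]


# Made-up example: a short waveform and two covariates (age, encoded gender).
x = [1.0, 2.0, 3.0, 4.0]
s = [12.0, 1.0]
w = [1.0, -1.0]
with_covariates = hybrid_conv1d(x, s, w, v=[0.5, 0.0])
plain = hybrid_conv1d(x, s, w, v=[0.0, 0.0])
```

With v set to zero the layer reduces to an ordinary convolution, so in this reading the covariate weights are strictly additional parameters learned alongside the kernel.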
[Figure: hybrid CNN architecture. Covariates (Age 12, Gender M, Height 154, Mass 77) and joint motion waveforms feed two Convolution and Pooling stages, followed by a Fully Connected layer and the Output.]
Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré and Scott Delp. ShortFuse:
Biomedical Time Series Representations in the Presence of Structured Information. 3rd Conference on Machine Learning for Healthcare, MLHC 2017
43. Osteoarthritis Progression Results
Osteoarthritis Initiative Dataset (OAI) – 1926 subjects.
The OAI is a public-private partnership comprised of five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the National Institutes of Health (NIH).
Task: Predict whether subjects are at risk for OA progression.
Output: Joint space narrowing (JSN) > 0.7mm.
Covariates: 650, out of which we selected 50, from these categories:
§ Joint symptoms/function
§ Medical history
§ Nutrition
§ Physical exam, measurements
§ Subject characteristics, risk factors
Activity counts: accelerometer data, 7-day activity counts.
44. Osteoarthritis Progression Results
Binary classification: fast/slow progression
State of the art (engineered features, appended covariates): 67%
Best representation learning without covariates: 71%
Best representation learning with appended covariates: 72%
ShortFuse: 74% accuracy
48. Cerebral Palsy Treatment
Binary classification: good/bad surgical outcome
State of the art (engineered features, appended covariates): 78%
Best representation learning without covariates: 74%
Best representation learning with appended covariates: 76%
ShortFuse: 78% accuracy
49. Weak Supervision for the Classification of Aortic Valve Malformations from Cardiac MRIs
To appear in Nature Communications. We acknowledge support from the NIH (U54 EB020405), DARPA under No. FA87501720095 (D3M), ONR under No. N000141712266 and No. N000141410102.
Collaborators (roles shown on the slide: Principal Investigator; Paper lead author):
Jason Fries, Stanford CS
James Priest, Stanford Med
Paroma Varma, Stanford CS
Other Collaborators:
Jared Dunmon, Stanford CS
Ke Xiao, Stanford Medicine
Helio Tejeda, Stanford Medicine
Scott Delp, Stanford BioX
Chris Ré, Stanford CS
50. Bicuspid Aortic Valve (BAV) Disease
§ Congenital malformation
§ Incidence: 0.5-2%
§ Associated with poor health outcomes
§ Diagnosed following cardiovascular issues
§ May require surgical replacement of valve
§ Need: link genetic information to cardiac morphology
§ Limitations: variable data of diagnosis; absence of large imaging datasets specifically targeting subjects with BAV
[Image source: www.umcvc.org]
51. UK Biobank
§ > 500,000 subjects total
§ For 100,000:
• Medical imaging
• Genotyping
§ Phase-contrast MRI
• Initial release
• 14,328 subjects
• Measure blood flow
• Multi-view
• ‘Sliced’ view
• 4-D tensors, 3 planes
§ No (BAV) labels
52. Gold Standard Labels
§ 412 patients; 12,360 individual MRI frames
• development set: 100 controls and 6 BAV patients
  - selected via chart review of disease codes related to BAV
  - annotated by one cardiologist
• validation set: 208 controls and 8 BAV patients
  - random uniform sampling
  - captures class distribution expected at test
  - annotated by one cardiologist
• held-out test set: 88 controls and 3 BAV patients
  - random uniform sampling
  - annotated by 3 cardiologists + vote
  - agreement kappa = 0.354
  - only used for the final evaluation
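For readers unfamiliar with agreement statistics, the sketch below shows how a kappa for several annotators can be computed. It assumes Fleiss' kappa, which is a common choice for three raters. The slide does not say which kappa variant was used, and the ratings below are made up, so this does not reproduce the 0.354 figure.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a fixed number of raters per subject.

    counts[i][j] = number of raters who assigned subject i to category j.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    # Observed agreement: fraction of agreeing rater pairs, averaged over subjects.
    p_obs = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ) / n_subjects
    # Chance agreement from the overall category proportions.
    p_cat = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_exp = sum(p * p for p in p_cat)
    if p_exp == 1.0:
        return float("nan")  # undefined when every rating falls in one category
    return (p_obs - p_exp) / (1 - p_exp)


# Made-up ratings: 4 subjects, 3 raters, categories (control, BAV).
ratings = [[3, 0], [0, 3], [2, 1], [1, 2]]
kappa = fleiss_kappa(ratings)
```

A value of 1 means perfect agreement and 0 means agreement no better than chance, so a moderate kappa on a rare condition motivates combining the annotators by vote, as the slide describes.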
53. Weak Supervision for MRI Classification
Pipeline: MRI Sequences → Preprocessing → Processed Segments → Domain Heuristics → Weak Labels (λ1 to λ5) → Generative Model (over λ1 to λ5 and y) → Probabilistic labels → Train Deep Net → Final MRI Labels
Data programming paradigm in Chris Ré’s group: Snorkel, Coral.
56. Generative Model
[Diagram: generative model relating the labeling functions λ1 to λ5 to the latent label y. Output: probabilistic training labels]
Research on data programming:
[SNORKEL] Ratner, A. J., De Sa, C. M., Wu, S., Selsam, D. & Ré, C. Data programming: Creating large training sets, quickly. NIPS 2016.
[GENERATIVE MODEL] Bach, S. H., He, B., Ratner, A. & Ré, C. Learning the structure of generative models without labeled data. ICML 2017.
[CORAL] Varma, P. et al. Inferring generative model structure with static analysis. NIPS 2017.
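How labeling-function votes could turn into a probabilistic training label is sketched below with a deliberately simplified model. It assumes the labeling functions are conditionally independent given y, that they abstain independently of y, and that their accuracies are known. Snorkel instead learns accuracies (and, in the cited work, dependency structure) without labels, and this code does not use its API. The function name and the accuracy values are hypothetical.

```python
import math


def probabilistic_label(votes, accuracies, prior=0.5):
    """Return P(y = +1 | votes) under a naive independent-accuracy model.

    votes: one entry per labeling function, +1 (positive), -1 (negative) or 0 (abstain)
    accuracies: assumed probability that each labeling function is right when it votes
    prior: assumed P(y = +1) before seeing any votes
    """
    log_odds = math.log(prior / (1 - prior))
    for vote, acc in zip(votes, accuracies):
        if vote != 0:  # abstentions carry no evidence in this model
            log_odds += vote * math.log(acc / (1 - acc))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Made-up votes from five labeling functions λ1..λ5 on one MRI sequence.
votes = [1, 1, -1, 0, 1]
accuracies = [0.8, 0.7, 0.6, 0.9, 0.75]
p_positive = probabilistic_label(votes, accuracies)
```

The resulting value in (0, 1) is used as a soft target when training the deep network, in place of a hand-assigned hard label.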
64. Starting up at UMass Amherst
§ Fusion of Multi-resolution Irregularly Sampled Time Series
• students: Iman Deznabi, Bhanu Pratap Singh
§ Multimodal Deep Learning to Forecast Disease Progression
• use MRIs, X-rays for OA progression
• combine DL with feature engineering
• students: Joie Wu, Surya Teja
§ Transfer Learning across Thermal Imaging Datasets
• person detection, face segmentation, body temp. estimation
• students: Debasmita Ghose, Sneha Bhattacharya, Shasvat Desai
§ Incorporating Domain Knowledge in Bayesian Deep Learning
• student: Aritra Gosh
§ Deep Causality
• student: Purva Purty