Madalina Fiterau - Hybrid Machine Learning Methods for the Interpretation and Integration of Heterogeneous Multimodal Data

Hybrid Machine Learning Methods for
the Interpretation and Integration of
Heterogeneous Multimodal Data
Madalina Fiterau, University of Massachusetts Amherst
Advisors:
Artur Dubrawski, CMU, Auton Lab
Christopher Ré, Stanford CS
Scott Delp, Stanford Bioengineering
mfiterau e-mail: mfiterau@cs.umass.edu
New York, March 29th 2019

Hybrid Models for Heterogeneous and Multimodal Data
Motivation
Vital Signs
Gait Kinematics
Longitudinal data
Accelerometer
X-rays
MRIs
Stereo Recordings (video)
Structured Information
Notes

Motivation
Aim: build Hybrid Systems that Integrate multimodal, multisource data and learn models that aid users Interpret the data.

Motivation
Integrate
Interpret
Hybrid Systems
VIPR
Visualizations for
Informative Projection
Recovery
DNDF
Deep Neural Decision
Forests
ShortFuse
Learning Representations
from Time Series and
Structured Information

Motivation
Weak Supervision
for Cardiac MRI
Classification
Future Research
Directions
Interpret
Hybrid Systems
Integrate

VIPR: Visualizations for
Informative Projection Recovery
Collaborators:
Artur Dubrawski, CMU SCS
Donghan (Jarod) Wang, CMU, Auton Lab
Dr. Gilles Clermont, University of Pittsburgh
Dr. Marilyn Hravnak, University of Pittsburgh
Dr. Michael R. Pinsky, University of Pittsburgh
Github: https://github.com/inafiterau/VIPR

Application: Alert Classification
Health alerts: some are artifacts, not true alerts
Alerts are triggered by:
§ Heart Rate < 40 or > 140
§ Respiratory Rate < 8 or > 36
§ Systolic Blood Pressure < 80 or > 200
§ Diastolic Blood Pressure > 110
§ SpO2 < 85%
Features are computed over a window of 4 minutes preceding alert onset and over the alert duration.
Features computed from time series include common statistics of each vital sign (VS): mean, stdev, min, max, range of values, duty cycle ...
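A minimal sketch of how such per-window statistics could be computed for one vital sign. The function name `window_features` and the sample values are hypothetical, and the slide does not define "duty cycle"; the code assumes it means the fraction of the window with a valid reading.

```python
from statistics import mean, pstdev

def window_features(values):
    """Summary statistics for one vital-sign window.

    values: list of samples; None marks a missing sample.
    """
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("window has no valid samples")
    return {
        "mean": mean(present),
        "stdev": pstdev(present),          # population standard deviation
        "min": min(present),
        "max": max(present),
        "range": max(present) - min(present),
        # Assumption: duty cycle = fraction of the window with a valid reading.
        "duty_cycle": len(present) / len(values),
    }

# Hypothetical heart-rate window with two dropped samples.
hr_window = [38, 39, None, 41, 37, None, 36, 39]
features = window_features(hr_window)
```

One such feature vector per vital sign, concatenated, would give the tabular input on which projections are later selected.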

Defining interpretability: Informative Projection Recovery
[Figure: two scatter plots of artifacts vs. true alerts. Left: value-HR-mean vs. value-SPO2-mean, imperfect separation. Right: Heart Rate Density* (value-HR-data--den) vs. Oxygen Saturation Density (value-SPO2-data--den), clear separation. Other candidate features shown: Respiratory Rate, Respiratory Rate Increase. The selected feature pair is the INFORMATIVE PROJECTION.]
*Density = Average / Typical Values
Related work on structured sparsity:
Guillaume Obozinski, Ben Taskar, and Michael I. Jordan. Joint covariate selection and joint subspace selection for multiple classification problems. Statistics and Computing, April 2010.

Feature Selection, with a Twist
[Figure: scatter plot of Heart Rate Density (value-HR-data--den) vs. Respiratory Rate Density (value-RR-data--den). Annotations: noisy samples, handled differently; Blood Pressure Density.]

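Sparse Predictive Structures
[Figure: data plotted over features X and Y.]

VIPR – a quick overview
[Figure: data over features X, Y, and Z, with regions labeled "split on X", "split on Y", and "split on X, Y".]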

Selecting Informative Projections
[Figure: Loss Matrix (L), data points × projections 1–7 (column $c_j$); cells shaded as minimal loss, low loss, or high loss.]
Projections: axis-aligned, 1D, 2D, 3D

Selecting Informative Projections
[Figure: Loss Matrix (L), data points × projections 1–7 (column $c_j$); cells shaded as minimal loss, low loss, or high loss.]
Penalty: limits # of projections

The Combinatorial Problem
[Figure: Selection Matrix (B), data points × projections 1–7.]
Penalty: limits # of projections
§ B binary selection matrix
§ $b_{ij}$ is
§ 1, if projection j is to be used to solve point i and
§ 0, otherwise

The Combinatorial Problem
[Figure: Selection Matrix (B), data points × projections 1–7; some points use suboptimal projections.]
§ Learning B is NP-hard

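Integer Linear Program
[Figure: selection matrix, data points × projections 1–7.]
§ ILP minimizes loss
§ Row constraints: sum to 1
§ Column constraints: up to k non-0
maximize $-\sum_{i=1}^{n} b_i^{T} \ell_i$, where $b_i$ and $\ell_i$ are the i-th rows of B and L
subject to $0 \le b_{ij} \le p_j \le 1$, integer
$\sum_{j=1}^{m} b_{ij} = 1, \quad \forall i \in \{1 \ldots n\}$
$\sum_{j=1}^{m} p_j \le k$
$M_k^{*} = \min_{M_k \in \{(C, H, g_{\min}) \text{ s.t. } |H| < k\}} L(M_k, X)$
§ Best k sub-models for training data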
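To make the selection problem concrete, here is a minimal brute-force sketch, not the ILP solver from the talk. It enumerates every subset of at most k projections and assigns each point to its lowest-loss projection within the subset, which satisfies the same row and column constraints. The function name `best_k_projections` and the loss values are hypothetical.

```python
from itertools import combinations

def best_k_projections(loss, k):
    """Exhaustively pick at most k projections (columns) minimizing total loss.

    loss[i][j] is the loss of projection j on data point i.
    Returns (total_loss, chosen_columns, assignment), where assignment[i]
    is the projection used for point i (the single 1 in row i of B).
    """
    n, m = len(loss), len(loss[0])
    best = None
    for size in range(1, k + 1):
        for subset in combinations(range(m), size):
            # Each point uses its lowest-loss projection among the selected ones.
            assignment = [min(subset, key=lambda j: loss[i][j]) for i in range(n)]
            total = sum(loss[i][j] for i, j in enumerate(assignment))
            if best is None or total < best[0]:
                best = (total, subset, assignment)
    return best

# Hypothetical 4 points x 3 projections loss matrix.
loss = [
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.6],
    [0.9, 0.1, 0.4],
    [0.7, 0.3, 0.2],
]
total, chosen, assignment = best_k_projections(loss, k=2)
```

Enumeration grows combinatorially with the number of projections, which is why a relaxation is attractive for realistic problem sizes.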

Iterative Convex Procedure
[Figure: Loss Matrix (L) and Target Loss (T), data points × projections 1–7.]
Target loss: $T_i = \min_j L_{ij}$
Convex Program:
$\min_B \lVert T - (L * B)\,\mathbf{1} \rVert + \lambda\,\mathrm{penalty}(B)$, where $(L * B)_{ij} = L_{ij} B_{ij}$
Madalina Fiterau and Artur Dubrawski. Projection Retrieval for Classification. In Advances in Neural Information Processing Systems 25, pages 3032–3040, NIPS 2012.

Alert Classification with VIPR
[Figure: scatter plot of artifacts vs. true alerts, Min Respiratory Rate vs. Heart Rate Data Density.]
§ 2 Informative Projections
§ Test point handled by one of them
§ Accuracy: 0.91, Precision: 0.93, Recall: 0.945
§ Better accuracy than Random Forests and SVM (<0.9)
Fiterau M, Dubrawski A, Chen L, Hravnak M, Clermont G, Pinsky MR. Automatic identification of artifacts in monitoring critically ill patients. Annual Congress of the European Society of Intensive Care Medicine 2014.

Alert Classification with VIPR
[Figure: scatter plot of artifacts vs. true alerts, Heart Rate Density vs. Oxygen Saturation Density; sensors shown: finger plethysmograph, noninvasive ECG.]
Low density values indicate probe fell off
Interpretability and performance are NOT at odds.

More Research on Informative Projections
§ Informative projection retrieval for regression and clustering
Madalina Fiterau and Artur Dubrawski. Informative projection recovery for semi-supervised classification, clustering and regression. In International Conference on Machine Learning and Applications, volume 12, ICMLA 2013.
§ Finding informative projections with active learning
Madalina Fiterau and Artur Dubrawski. Active learning for Informative Projection Recovery. In the Conference of the Association for the Advancement of Artificial Intelligence, volume 29, AAAI 2015.
§ Studies on usability by domain experts
Fiterau M, Wang J, Dubrawski A, Clermont G, Hravnak M, Pinsky MR. Using expert review to calibrate semi-automated adjudication of vital sign alerts in step-down units. Society of Critical Care Medicine Annual Congress 2016. Star Research Award.
Fiterau M, Dubrawski A, Chen L, Hravnak M, Bose E, Clermont G, Pinsky MR. Archetyping artifacts in monitored noninvasive vital signs data. Society of Critical Care Medicine Annual Congress 2015. Oral Presentation.
§ Theoretical guarantees
PhD Thesis, Ch. 2.5 (VC dimension and Risk consistency); Under review: Compression scheme + Sample complexity
§ Related work on interpretability:
Bing Liu, Minqing Hu, and Wynne Hsu. Intuitive representation of decision trees using general rules and exceptions. In Proceedings of Seventeenth National Conference on Artificial Intelligence (AAAI-2000).
NOW: Lipton, Zachary C. "The mythos of model interpretability." arXiv preprint arXiv:1606.03490 (2016).
Interpretable ML Symposium - NIPS 2017.

Deep Neural Decision Forests
This research was partially completed during an internship at MSR Cambridge, UK.
Collaborators:
Peter Kontschieder, Microsoft Research
Antonio Criminisi, Microsoft Research
Samuel Rota-Bulò, Fondazione Bruno Kessler

Hybrid Models
Dataset
(tabular)
Classifier
(Random Forests)
Feature
Engineering
Hybrid Model
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper
with convolutions. CVPR 2015

Deep Learning + Accurate Classifier
§ End-to-end deep learning architecture
§ Challenge: need differentiable objective
Decision tree ‘layers’

Back-propagation Trees
§ RF structure adapted to allow back propagation
θ
Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient based learning applied to document recognition. In
Proceedings of the IEEE, pages 2278–2324, 1998

Back-propagation Trees
§ Soft routing of samples: at split node n, left with probability $d_n(x;\Theta)$, right with probability $1 - d_n(x;\Theta)$
§ Class distributions in leaf nodes: $\pi_{\ell 1}, \ldots, \pi_{\ell c}$
• optimal given a routing
§ Likelihood term
• weighted sum over set of all leaves L
§ Objective

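Modeling Node Splits
§ Sigmoid function at each split node
[Figure: binary tree with split nodes $d_1, \ldots, d_7$ and leaf $\ell_4$ highlighted. Image by Samuel Rota-Bulò]
§ Hierarchical routing along path $\Phi_\ell$ to leaf $\ell$:
$\mu_\ell(x;\Theta) = \prod_{n \in \Phi_\ell} d_n(x;\Theta)^{\mathbb{1}_{\ell \leftarrow n}} \, (1 - d_n(x;\Theta))^{\mathbb{1}_{n \rightarrow \ell}}$
$\mathbb{1}_{\ell \leftarrow n}$ = 1 if $\ell$ belongs to left subtree of n
$\mathbb{1}_{n \rightarrow \ell}$ = 1 if $\ell$ belongs to right subtree of n
§ Example: $\Phi_{\ell_4} = \{n_1, n_2, n_5\}$
$\mu_{\ell_4}(x;\Theta) = \sigma(\theta_1^{T} x)\,(1 - \sigma(\theta_2^{T} x))\,(1 - \sigma(\theta_5^{T} x))$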
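The routing product can be checked numerically with a short sketch. It assumes a full binary tree stored in heap order (children of node n at indices 2n+1 and 2n+2, so list index 0 is $d_1$) and linear split functions $\sigma(\theta_n^{T} x)$ as in the $\ell_4$ example; the function name `leaf_probabilities` and all parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def leaf_probabilities(x, thetas, depth):
    """Routing probability mu for every leaf of a full binary tree.

    thetas[n] is the weight vector of split node n in heap order.
    A sample goes left with probability d_n and right with 1 - d_n.
    """
    mus = []
    for leaf in range(2 ** depth):
        node, mu = 0, 1.0
        for level in range(depth):
            d = sigmoid(sum(t * xi for t, xi in zip(thetas[node], x)))
            # Bits of the leaf index, most significant first, encode left (0) / right (1).
            go_right = (leaf >> (depth - 1 - level)) & 1
            mu *= (1.0 - d) if go_right else d
            node = 2 * node + 1 + go_right
        mus.append(mu)
    return mus

# Hypothetical parameters for the 7 split nodes d1..d7 and a 2-D input.
thetas = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.3, 0.3],
          [-0.1, 0.6], [0.2, -0.5], [0.0, 0.1]]
x = [1.0, -2.0]
mu = leaf_probabilities(x, thetas, depth=3)
```

Here `mu[3]` corresponds to leaf $\ell_4$ (left at $d_1$, right at $d_2$, right at $d_5$), and the eight leaf probabilities sum to 1, so they form a valid weighting for the leaf class distributions.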

Merging Decision Forests to Networks
§ Each output of the DeepNet becomes a feature for the Backpropagation Forest
[Figure: two trees with split nodes $d_1$–$d_7$ and $d_8$–$d_{14}$ and leaf distributions $\pi_1$–$\pi_{16}$; the split nodes are driven by outputs $f_1$–$f_{14}$ of the fully connected (FC) layer of a deep CNN with parameters $\Theta$. Image credit: Samuel Rota-Bulò]

ImageNet Experiment
§ Millions of images
§ 1000 synsets (classes)
§ Modified GoogLeNet*, replaced Softmax layers with BPF
* C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR 2015
Description          Top-5 Error
GoogLeNet            10.07%
1 model, 1 crop      7.84%
1 model, 10 crops    7.08%
7 models, 1 crop     6.38%
Can now introduce
other covariates in the
model via the BPF.
Peter Kontschieder, Madalina Fiterau, Antonio Criminisi and Samuel Rota-Bulò. Deep
Neural Decision Forests, International Conference on Computer Vision, ICCV 2015.

ShortFuse: Learning Time Series
Representations in the Presence of
Structured Information
This work was supported in part by the Mobilize Center, a National Institutes of Health Big Data
to Knowledge (BD2K) Center of Excellence supported through Grant U54EB020405
Collaborators:
Suvrat Bhooshan, Stanford CS
Jason Fries, Stanford CS
Charles Bournhonesque, Stanford ICME
Jennifer Hicks, Stanford Bioengineering
Eni Halilaj, Stanford Bioengineering
Chris Ré, Stanford CS
Scott Delp, Stanford Bioengineering

Biomedical Time Series Representations
in the Presence of Structured Information
[Figure: time series and structured information (demographics, clinical tests, medical history) are fed to ShortFuse, which learns representations used for prediction.]
Related work:
N. Razavian and D. Sontag. Temporal convolutional neural networks for diagnosis from lab tests. 2015
A. Borovykh, S. Bohte, and C. W. Oosterlee. Conditional time series forecasting with CNNs. 2017
Z. Cui, W. Chen, and Y. Chen. Multi-scale convolutional neural networks for time series classification. 2016.

Osteoarthritis Progression
§ Knee osteoarthritis causes cartilage degeneration
§ Activity influences progression, as do other factors
§ Can we predict osteoarthritis progression?
Joint Space
Narrowing
Activity counts
Source: Wikipedia
Gender
Nutrition
Age
Physical exam
Symptoms

Effect of Structured Information
[Figure: activity counts pass through a deep net to predict osteoarthritis progression. For an obese subject, the filter $f_{obese}$ responds to peak intensity; for a normal-weight subject, the filter $f_{normal}$ responds to the mean.]

Hybrid CNN
§ Hybrid convolutions
§ Each filter uses a different set of covariates
Covariates introduced in the representation learning process.
X = n sequences × t time points
S = vector of d covariates, for example:
Age  Gender  Height  Weight
12   M       154     77

Hybrid CNN
[Figure: the covariate vector (12, M, 154, 77) is combined with the convolution kernel through + and ⊗ operations, applied to the input of n sequences × t time points, and passed on to the deep network.]
contains terms of the type

Hybrid CNN
§ CNN used for the biomedical applications
§ Convolutional layers replaced with hybrid convolutions
§ Equivalent modification for LSTM
• Added parameters corresponding to the covariates
[Figure: joint motion waveforms and covariates (Age 12, Gender M, Height 154, Mass 77) → Convolution → Pooling → ... → Convolution → Pooling → Fully Connected → Output]
Madalina Fiterau, Suvrat Bhooshan, Jason Fries, Charles Bournhonesque, Jennifer Hicks, Eni Halilaj, Christopher Ré and Scott Delp. ShortFuse:
Biomedical Time Series Representations in the Presence of Structured Information. 3rd Conference on Machine Learning for Healthcare, MLHC 2017
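The slides do not spell out the form of a hybrid convolution, so the following is only one possible reading, shown as a plain-Python sketch: a 1-D filter whose response at every time step is shifted by a learned linear function of the covariates. The function name `hybrid_conv1d`, the additive form, the covariate encoding, and all numbers are assumptions for illustration, not the ShortFuse layer itself.

```python
def hybrid_conv1d(series, kernel, covariates, cov_weights, bias=0.0):
    """Valid 1-D cross-correlation plus an additive covariate term.

    Assumed form: out[t] = sum_i kernel[i] * series[t + i]
                           + sum_j cov_weights[j] * covariates[j] + bias
    """
    # The covariate contribution is shared by every time step of the sequence.
    cov_term = sum(v * s for v, s in zip(cov_weights, covariates))
    width = len(kernel)
    return [
        sum(w * series[t + i] for i, w in enumerate(kernel)) + cov_term + bias
        for t in range(len(series) - width + 1)
    ]

series = [1.0, 2.0, 3.0, 4.0]
kernel = [0.5, -0.5]
covariates = [12.0, 1.0]    # e.g. age and an encoded gender flag (hypothetical encoding)
cov_weights = [0.1, 0.2]    # per-filter covariate parameters (hypothetical values)
out = hybrid_conv1d(series, kernel, covariates, cov_weights)
```

Because `cov_weights` belongs to the filter, different filters can weight different covariates, which matches the slide's point that each filter uses its own set of covariates.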

Osteoarthritis Progression Results
Osteoarthritis Initiative Dataset (OAI) – 1926 subjects.
The OAI is a public-private partnership comprised of five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded
by the National Institutes of Health (NIH).
Task: Predict whether subjects are at risk for OA progression.
Output: Joint space narrowing (JSN) > 0.7mm.
Structured information: 650 covariates, out of which we selected 50.
• Joint symptoms/function
• Medical history
• Nutrition
• Physical exam, measurements
• Subject characteristics, risk factors
Time series: accelerometer data, 7-day activity counts.

Osteoarthritis Progression Results
Binary classification: fast/slow progression
State of the art (engineered features, appended covariates): 67%
Best representation learning without covariates: 71%
Best representation learning with appended covariates: 72%
ShortFuse: 74% accuracy

Cerebral Palsy
Birth-acquired condition which affects mobility.

Gait Kinematics
§ Time series: Joint angles obtained during the subject's gait cycle from motion capture using markers
[Figure: hip, knee, and ankle flexion angle curves plotted over the gait cycle (x-axis 0 to 100), summarized by the Gait Deviation Index. Source: Gillette Children’s Specialty Care]

Cerebral Palsy Treatment
Surgical treatment (skeletal, muscular) is invasive.
Results vary greatly, making treatment planning difficult.
§ Psoas lengthening surgery
§ Positive outcome:
• post-surgical Gait Deviation Index (GDI) > 90
• > 5 points improvement in Pelvis and Hip Dev. Index (PHiDI)
[Figure labels: psoas major, iliacus, iliopsoas]

Cerebral Palsy Treatment
Binary classification: good/bad surgical outcome
State of the art (engineered features, appended covariates): 78%
Best representation learning without covariates: 74%
Best representation learning with appended covariates: 76%
ShortFuse: 78% accuracy

Weak Supervision for the Classification
of Aortic Valve Malformations
from Cardiac MRIs
To appear in Nature Communications. We acknowledge support from the NIH (U54 EB020405),
DARPA under No. FA87501720095 (D3M), ONR under No. N000141712266 and No. N000141410102.
Collaborators:
Jason Fries, Stanford CS (paper lead author)
James Priest, Stanford Med (principal investigator)
Paroma Varma, Stanford CS
Other Collaborators:
Jared Dunnmon, Stanford CS
Ke Xiao, Stanford Medicine
Helio Tejeda, Stanford Medicine
Scott Delp, Stanford BioX
Chris Ré, Stanford CS

Bicuspid Aortic Valve (BAV) Disease
§ Congenital malformation
§ Incidence: 0.5-2%
§ Associated with poor health outcomes
§ Diagnosed following cardiovascular issues
§ May require surgical replacement of valve
§ Need: link genetic information to cardiac morphology
§ Limitations: variable data of diagnosis; absence of large imaging datasets specifically targeting subjects with BAV
Source: www.umcvc.org

UK Biobank
§ > 500,000 subjects total
§ For 100,000:
• Medical imaging
• Genotyping
§ Phase-contrast MRI
• Initial release
• 14,328 subjects
• Measure blood flow
• Multi-view
• ‘Sliced’ view
• 4-D tensors, 3 planes
§ No (BAV) labels

Gold Standard Labels
§ 412 patients; 12,360 individual MRI frames
• development set: 100 controls and 6 BAV patients
    ◦ selected via chart review of disease codes related to BAV
    ◦ annotated by one cardiologist
• validation set: 208 controls and 8 BAV patients
    ◦ random uniform sampling
    ◦ captures class distribution expected at test
    ◦ annotated by one cardiologist
• held-out test set: 88 controls and 3 BAV patients
    ◦ random uniform sampling
    ◦ annotated by 3 cardiologists + vote
    ◦ agreement kappa = 0.354
    ◦ only used for the final evaluation

Weak Supervision for MRI Classification
[Figure: pipeline. MRI Sequences → Preprocessing → Processed Segments → Domain Heuristics (labeling functions λ1 ... λ5) → Weak Labels → Generative Model (over λ1 ... λ5 and y) → Probabilistic labels → Train Deep Net → Final MRI Labels]
Data programming paradigm in Chris Ré’s group: Snorkel, Coral.

MRI Preprocessing
Image credit: Jason Fries

Labeling Heuristics
[Images: example BAV and TAV valve frames]
Primitive      Observation          LF
Area           A_BAV > A_TAV        λ1
Eccentricity   E_BAV > E_TAV        λ2
Perimeter      P_BAV > P_TAV        λ3
Intensity      I_BAV < I_TAV        λ4
-              A/P^2 differs        λ5
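A minimal sketch of how heuristics like these could be written as labeling functions that vote BAV, TAV, or abstain. Every threshold and name below is a hypothetical placeholder, λ5 is left out because the table does not give its direction, and the majority vote is a simple stand-in for the generative model used in the actual pipeline.

```python
BAV, TAV, ABSTAIN = 1, -1, 0

def make_threshold_lf(key, threshold, bav_if_greater=True):
    """Build a labeling function that compares one primitive to a threshold."""
    def lf(primitives):
        value = primitives.get(key)
        if value is None:
            return ABSTAIN          # primitive unavailable: no vote
        above = value > threshold
        return BAV if above == bav_if_greater else TAV
    return lf

# All thresholds are hypothetical placeholders, not values from the study.
labeling_functions = [
    make_threshold_lf("area", 200.0),                            # lambda_1
    make_threshold_lf("eccentricity", 0.6),                      # lambda_2
    make_threshold_lf("perimeter", 60.0),                        # lambda_3
    make_threshold_lf("intensity", 0.5, bav_if_greater=False),   # lambda_4
]

def majority_vote(primitives):
    """Combine the weak votes; a stand-in for the learned generative model."""
    score = sum(lf(primitives) for lf in labeling_functions)
    return BAV if score > 0 else TAV if score < 0 else ABSTAIN

frame = {"area": 250.0, "eccentricity": 0.7, "perimeter": 50.0}  # intensity missing
label = majority_vote(frame)
```

Unlike this unweighted vote, a generative model would estimate each labeling function's accuracy and output a probabilistic label rather than a hard one.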

Generative Model
[Figure: generative model over the labeling functions λ1 ... λ5 and the true label y, producing probabilistic training labels.]
Research on data programming:
[SNORKEL] Ratner, A. J., De Sa, C. M., Wu, S., Selsam, D. & Re, C. Data programming: Creating large training sets, quickly. NIPS 2016.
[GENERATIVE MODEL] Bach, S. H., He, B., Ratner, A. & Re, C. Learning the structure of generative models without labeled data, ICML 2017.
[CORAL] Varma, P. et al. Inferring generative model structure with static analysis. NIPS 2017.

Discriminative Model
[Figure: MAG aortic valve box + probabilistic labels → frame encoder (DenseNet 40-12) → sequence encoder (Attention BiLSTM) → BAV/TAV]
§ DenseNet40-12 outperformed VGG16 and ResNet-50
§ Data augmentation: crops, affine transformations

Classification Performance
Credit: Jason Fries

Survival Analysis
Credit: Jason Fries
Major Adverse Cardiac Event (from ICD-9, ICD-10, OPCS-4)
N = 9,230

Future Research
Research articles, notes
Domain insights
Related multimodal datasets
Hybrid System
Analysis +
transferable
models

Integrating Specialized Tools in Hybrid Systems
§ Use video for gait lab patients
§ For osteoarthritis study: use the MRIs and X-rays as well.
§ Text mining approaches
Image sources: Gillette Children’s Specialty Care, Delp Lab

Weakly-supervised Transfer of
Models and Representations
Model trained on
healthy adults
Weakly supervised
adaptation
Model specialized
for children.
Model specialized
for injured subjects.
Image sources: Delp Lab, Gillette Children’s Specialty Care, CAMERA project
Image Source: MedicalExpo

Online Adaptive Policies for Feature
Selection and Representation Learning
[Figure: ... → Convolution → Pooling → ... → Convolution → Pooling → Fully Connected → Output]
Image sources: BioPac, Medical Express, Research Gate, Journal of Circulation
Optimize data collection: sources, sensor arrays.
Cost: Acquisition, Invasiveness.
Leverage user-engineered features in the
representation learning pipeline.

Starting up at UMass Amherst
§ Fusion of Multi-resolution Irregularly Sampled Time Series
• students: Iman Deznabi, Bhanu Pratap Singh
§ Multimodal Deep Learning to Forecast Disease Progression
• use MRIs, X-rays for OA progression
• combine DL with feature engineering
• students: Joie Wu, Surya Teja
§ Transfer Learning across Thermal Imaging Datasets
• person detection, face segmentation, body temp. estimation
• students: Debasmita Ghose, Sneha Bhattacharya, Shasvat Desai
§ Incorporating Domain Knowledge in Bayesian Deep Learning
• student: Aritra Gosh
§ Deep Causality
• student: Purva Purty

Conclusion
VIPR
Visualizations for
Informative Projection
Recovery
DNDF
Deep Neural Decision
Forests
ShortFuse
Learning Representations
from Time Series and
Structured Information
Weak Supervision
for Cardiac MRI
Classification
Future directions:
Optimize feature selection and learning
Weakly supervised transfer
Incorporating data-specific techniques

Thanks!
New York, March 29th 2019
Madalina Fiterau, University of Massachusetts Amherst
mfiterau e-mail: mfiterau@cs.umass.edu
