Statistical Analysis of Imaging Trials: Multivariate Methods and Prediction, Probing Cancer with MR II: From Animal Models to Clinical Assessment, 17th Annual Conference of the International Society for Magnetic Resonance in Medicine, Honolulu, Hawai'i, April 19-24
2. Declaration of Conflict of Interest or Relationship
Speaker Name: Brandon Whitcher
I have the following conflict of interest to disclose with regard to
the subject matter of this presentation:
Company name: GlaxoSmithKline
Type of relationship: Employment
3. Outline
Motivation
– Univariate vs. multivariate data
Supervised Learning
– Linear methods
Regression
Classification
– Separating hyperplanes
– Support vector machine (SVM)
Examples
– Tuning
– Cross-validation
– Visualization
– Receiver operating characteristics (ROC)
Conclusions
4. Motivation
Imaging trials rarely produce a single measurement.
– Demographic
– Questionnaire
– Genetic
– Serum biomarkers
– Structural and functional imaging biomarkers
Imaging biomarkers
– Multiple measurements occur within or between modalities
MRI, PET, CT, etc.
– Functional imaging:
Diffusion-weighted imaging (DWI)
Dynamic contrast-enhanced MRI (DCE-MRI)
Dynamic susceptibility contrast-enhanced MRI (DSC-MRI)
Blood oxygenation level dependent MRI (BOLD-MRI)
MR spectroscopy (MRS)
How can we combine these disparate sources of information?
What new questions can be addressed?
5. Neuroscience Example
Fig. 1. Voxel-based-morphometry (VBM) analysis showing an additive effect of the APOE ε4
allele (APOE4) on grey matter volume (GMV).
Filippini et al. NeuroImage 2009
7. What is Supervised Learning?
[Diagram: supervised learning workflow]
Inputs: T1, T2, DWI, DCE-MRI, MRS, Genetics
Methods: Regression, LDA, SVM, NN
Step 1: Training Data → Supervised Learning → Model
Step 2: Test Data → Model → Results
Outcomes: Benign, malignant
8. Linear Regression
Given a set of inputs $X = (X_1, X_2, \ldots, X_p)$, we want to predict an output $Y$.
– Linear regression model: $f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$
– Minimize the residual sum of squares: $\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - f(x_i))^2$
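As a minimal sketch of the least-squares idea, the single-input case (p = 1) has a closed-form solution. The function names and the data below are illustrative assumptions, not taken from the presentation.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares fit of f(x) = beta0 + beta1 * x for one input variable."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Closed-form solution that minimizes the residual sum of squares.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def rss(x, y, beta0, beta1):
    """Residual sum of squares for the fitted line."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


# Made-up illustrative data (hypothetical, not from the presentation).
x_demo = [1.0, 2.0, 3.0, 4.0]
y_demo = [3.1, 4.9, 7.2, 8.8]
beta0_hat, beta1_hat = fit_simple_linear_regression(x_demo, y_demo)
rss_hat = rss(x_demo, y_demo, beta0_hat, beta1_hat)
```

With several inputs the same criterion is minimized by solving the normal equations, which is normally left to a statistics package.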
9. Linear Methods for Classification
Linear Discriminant Analysis (LDA)
– Procedure:
Estimate mean vectors and covariance matrix
Calculate linear decision boundaries
Classify points using linear decision boundaries
Logistic regression is another popular method
– Binary outcome with qualitative/quantitative predictors
– Maximize likelihood via iteratively re-weighted least squares
Neither method was designed to explicitly separate data.
– LDA is optimal when the class mean vectors and the common covariance matrix are known
– Logistic regression is mainly used to understand the role of the input variables
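The three-step LDA procedure can be sketched for two classes and two measurements. This is an illustration under stated assumptions: a pooled covariance matrix, class priors estimated from class sizes, and made-up data. All function names are hypothetical.

```python
import math


def mean_vector(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def pooled_covariance(class0, class1, mu0, mu1):
    """Pooled within-class 2x2 covariance matrix."""
    sxx = sxy = syy = 0.0
    for points, mu in ((class0, mu0), (class1, mu1)):
        for p in points:
            dx, dy = p[0] - mu[0], p[1] - mu[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    dof = len(class0) + len(class1) - 2
    return [[sxx / dof, sxy / dof], [sxy / dof, syy / dof]]


def fit_lda(class0, class1):
    """Return (w, threshold) of the linear decision boundary w.x = threshold."""
    # Step 1: estimate mean vectors and the common covariance matrix.
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s = pooled_covariance(class0, class1, mu0, mu1)
    # Step 2: linear decision boundary, w = S^{-1} (mu1 - mu0).
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    d = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    w = [inv[0][0] * d[0] + inv[0][1] * d[1], inv[1][0] * d[0] + inv[1][1] * d[1]]
    mid = [(mu0[0] + mu1[0]) / 2, (mu0[1] + mu1[1]) / 2]
    threshold = w[0] * mid[0] + w[1] * mid[1] - math.log(len(class1) / len(class0))
    return w, threshold


def predict_lda(w, threshold, x):
    """Step 3: classify a point with the linear decision boundary."""
    return 1 if w[0] * x[0] + w[1] * x[1] > threshold else 0


# Made-up illustrative data (hypothetical, not from the presentation).
class0_demo = [(1.0, 2.0), (2.0, 1.0), (2.0, 3.0), (3.0, 2.0)]
class1_demo = [(5.0, 6.0), (6.0, 5.0), (6.0, 7.0), (7.0, 6.0)]
w_demo, threshold_demo = fit_lda(class0_demo, class1_demo)
```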
10. LDA w/ Two Classes: Step-by-Step
[Figure: scatter plot of two classes; axes: Measurement #1 and Measurement #2]
11. LDA w/ Three Classes: Step-by-Step
[Figure: scatter plot of three classes; axes: Measurement #1 and Measurement #2]
12. Separating Hyperplanes
Rosenblatt’s Perceptron Learning Algorithm (1958)
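– Minimizes the distance of misclassified points to the decision boundary:
$\min_{\beta,\beta_0} D(\beta,\beta_0) = -\sum_{i \in \mathcal{M}} y_i (x_i^T\beta + \beta_0)$, $y_i = \pm 1$, where $\mathcal{M}$ is the set of misclassified points
– Converges in a “finite” number of steps when the classes are linearly separable.
Problems (Ripley, 1996)
1. Separable data admit many solutions; which one is found depends on the initial conditions.
2. Convergence can be slow: the smaller the gap between classes, the longer it takes.
3. With nonseparable data the algorithm will not converge!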
Optimal separating hyperplanes (Vapnik and Chervonenkis, 1963)
– Forms the foundation for support vector machines.
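A minimal sketch of the perceptron update may help make the convergence problems concrete. It assumes labels coded as ±1, a fixed learning rate, and an epoch cap so the loop stops on nonseparable data. The function name, parameters, and data are illustrative assumptions.

```python
def train_perceptron(points, labels, learning_rate=1.0, max_epochs=1000):
    """Perceptron learning: update (beta, beta0) on each misclassified point.

    Returns (beta, beta0, converged). The epoch cap is needed because the
    algorithm does not converge when the classes are not separable.
    """
    beta = [0.0] * len(points[0])
    beta0 = 0.0
    for _ in range(max_epochs):
        n_updates = 0
        for x, y in zip(points, labels):
            activation = sum(b * xj for b, xj in zip(beta, x)) + beta0
            if y * activation <= 0:  # misclassified (or on the boundary)
                beta = [b + learning_rate * y * xj for b, xj in zip(beta, x)]
                beta0 += learning_rate * y
                n_updates += 1
        if n_updates == 0:  # a full pass without errors: separating hyperplane found
            return beta, beta0, True
    return beta, beta0, False


# Made-up, linearly separable data (hypothetical, not from the presentation).
points_demo = [(2.0, 3.0), (3.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels_demo = [1, 1, -1, -1]
beta_demo, beta0_demo, converged_demo = train_perceptron(points_demo, labels_demo)
```

Starting from different initial values of beta and beta0 generally yields a different separating hyperplane, which is the first problem listed above.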
14. Support Vector Machines (Vapnik 1996)
Separates two classes and maximizes the distance to the closest point from either class:
$\max_{\beta,\beta_0,\|\beta\|=1} C$ subject to $y_i (x_i^T\beta + \beta_0) \ge C$, $i = 1, \ldots, N$; $y_i = \pm 1$
Extends “optimal separating hyperplanes”
– Nonseparable case and nonlinear boundaries
– Contains a “cost” parameter that may be optimized
– May be used in the regression setting
Basis expansions
– Enlarge the feature space
– The enlarged space is allowed to get very large or infinite-dimensional
– Example kernels include
Gaussian radial basis function (RBF) kernel: $k(x,x') = \exp(-\gamma \|x - x'\|^2)$, $\gamma > 0$
Polynomial kernel
ANOVA radial basis kernel
– Contain a “scaling factor” that may be optimized
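To illustrate the Gaussian RBF kernel, the sketch below evaluates k(x, x′) for pairs of points and builds the kernel matrix a kernel method would work with. The function names, the default value of gamma, and the data are assumptions for illustration only.

```python
import math


def rbf_kernel(x, x_prime, gamma=1.0):
    """Gaussian RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2), gamma > 0."""
    if gamma <= 0:
        raise ValueError("gamma must be positive")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * squared_distance)


def kernel_matrix(points, gamma=1.0):
    """Matrix of kernel values between every pair of points."""
    return [[rbf_kernel(p, q, gamma) for q in points] for p in points]


# Made-up illustrative points (hypothetical, not from the presentation).
points_rbf_demo = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
gram_demo = kernel_matrix(points_rbf_demo, gamma=0.5)
```

Larger values of gamma make the kernel decay faster with distance, so this parameter is normally tuned together with the cost parameter.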
15. Support Vector Classifiers: separable case
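[Figure: support vector classifier in the separable case, showing the decision boundary $x^T\beta + \beta_0 = 0$, the margin, and a support point]
Adapted from Hastie, Tibshirani and Friedman (2001)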
16. Support Vector Classifiers: nonseparable case
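[Figure: support vector classifier in the nonseparable case, showing the decision boundary $x^T\beta + \beta_0 = 0$, the margin, and five numbered points (1 to 5)]
Adapted from Hastie, Tibshirani and Friedman (2001)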
26. Example: Prostate Specific Antigen (PSA)
Stamey et al. (1989); used in Hastie, Tibshirani and Friedman (2001).
Correlation between the level of PSA and various clinical measures (N = 97)
– log cancer volume,
– log prostate weight,
– log of BPH amount,
– seminal vesicle invasion,
– log of capsular penetration,
– Gleason score, and
– percent of Gleason scores 4 or 5.
Regression problem since outcome measure is quantitative.
Training data = 67, Test data = 30.
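A training/test partition of this size can be sketched as follows. The slide does not say how the 67/30 split was made, so the seeded random shuffle here is an assumption, and the function name is hypothetical.

```python
import random


def split_indices(n_total, n_train, seed=0):
    """Randomly partition observation indices into training and test sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    indices = list(range(n_total))
    rng.shuffle(indices)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# N = 97 observations: 67 for training, the remaining 30 for testing.
train_idx, test_idx = split_indices(97, 67)
```

The model is fitted on the training indices only, and prediction error is then reported on the held-out test indices.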
32. Conclusions
Multivariate data are being collected from imaging studies.
In order to utilize this information:
– Use the “right” statistical method
– Collaborate with quantitative scientists
– Paradigm shift in the analysis of imaging studies
Embrace the richness of multi-functional imaging data
– Quantitative
– Raw (avoid summaries)
Design of imaging studies requires
– A priori knowledge
– Few and focused scientific questions
– Well-defined methodology
34. Bibliography
Filippini N, Rao A, et al. Anatomically-distinct genetic associations of APOE ε4 allele
load with regional cortical atrophy in Alzheimer's disease. NeuroImage 2009, 44:724-728.
Freer TW, Ulissey MJ. Screening Mammography with Computer-aided Detection:
Prospective Study of 12,860 Patients in a Community Breast Center. Radiology 2001,
220:781-786.
Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, Springer,
2001.
McDonough KL. Breast Cancer Stage Cost Analysis in a Managed Care Population.
American Journal of Managed Care 1999, 5(6):S377-S382.
R Development Team. R: A Language and Environment for Statistical Computing. R
Foundation for Statistical Computing, Vienna, Austria.
– www.R-project.org
– R package e1071
– R package mlbench
Ripley BD. Pattern Recognition and Neural Networks, Cambridge University Press,
1996.
Vos PC, Hambrock T, et al. Computerized analysis of prostate lesions in the peripheral
zone using dynamic contrast enhanced MRI. Medical Physics 2008, 35(3):888-899.
Wolberg WH, Mangasarian OL. Multisurface method of pattern separation for medical
diagnosis applied to breast cytology. PNAS 1990, 87(23):9193-9196.