Statistical Techniques for
Multi-functional Imaging Trials

Brandon Whitcher, PhD
Image Analysis & Mathematical Biology
Clinical Imaging Centre, GlaxoSmithKline
Declaration of Conflict of Interest or Relationship

 Speaker Name: Brandon Whitcher
 I have the following conflict of interest to disclose with regard to
 the subject matter of this presentation:
 Company name: GlaxoSmithKline
 Type of relationship: Employment
Outline

 Motivation
  – Univariate vs. multivariate data
 Supervised Learning
  – Linear methods
         Regression
         Classification
  – Separating hyperplanes
  – Support vector machine (SVM)
 Examples
  – Tuning
  – Cross-validation
  – Visualization
  – Receiver operating characteristics (ROC)
 Conclusions
Motivation

 Imaging trials rarely produce a single measurement.
   – Demographic
   – Questionnaire
   – Genetic
   – Serum biomarkers
   – Structural and functional imaging biomarkers
 Imaging biomarkers
   – Multiple measurements occur within or between modalities
         MRI, PET, CT, etc.
  – Functional imaging:
         Diffusion-weighted imaging                       DWI
         Dynamic contrast-enhanced MRI                    DCE-MRI
         Dynamic susceptibility contrast-enhanced MRI     DSC-MRI
         Blood oxygenation level dependent MRI            BOLD-MRI
         MR spectroscopy                                  MRS
 How can we combine these disparate sources of information?
 What new questions can be addressed?
Neuroscience Example




   Fig. 1. Voxel-based-morphometry (VBM) analysis showing an additive effect of the APOE ε4
                         allele (APOE4) on grey matter volume (GMV).

Filippini et al. NeuroImage 2009
Motivation (cont.)

 Univariate statistical methods
  – One method → one measurement → answer one question
  – One method → multiple measurements
        Measurement #1 → answer question #1
        Measurement #2 → answer question #1
        …
 Multivariate statistical methods
  – Method #1 → one measurement
  – Method #2 → multiple measurements
  – Method #3 → multiple measurements
  – … (all combined to answer one question)
 Goal = Prediction (e.g., computer-aided diagnosis)
  – Supervised learning procedures
What is Supervised Learning?

[Flow diagram: training data (T1, T2, DWI, DCE-MRI, MRS, genetics) with known labels (benign, malignant) enter a supervised learning procedure (regression, LDA, SVM, NN) to produce a model (Step 1); the model is then applied to test data to generate results (Step 2).]
Linear Regression

 Given a set of inputs X = (X1, X2, …, Xp), want to predict Y

  – Linear regression model:                    f(X) = β0 + ∑j Xj βj

  – Minimize the residual sum of squares:       RSS(β) = ∑i (yi − f(xi))²
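The RSS criterion above has the familiar closed-form least-squares solution. A minimal Python/numpy sketch, purely illustrative (the presentation's own examples use R):

```python
import numpy as np

def fit_linear(X, y):
    """Coefficients (intercept first) minimizing RSS(beta) = sum_i (y_i - f(x_i))^2."""
    Xa = np.column_stack([np.ones(len(X)), X])      # prepend a column of 1s for beta0
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)   # least-squares solve
    return beta

def rss(X, y, beta):
    """Residual sum of squares for a fitted beta."""
    Xa = np.column_stack([np.ones(len(X)), X])
    resid = y - Xa @ beta
    return float(resid @ resid)

# noiseless toy data with y = 1 + 2x, so the fit recovers beta exactly
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
beta = fit_linear(X, y)   # approximately [1.0, 2.0]
```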
Linear Methods for Classification

 Linear Discriminant Analysis (LDA)




  – Procedure:
        Estimate mean vectors and covariance matrix
        Calculate linear decision boundaries
        Classify points using linear decision boundaries
 Logistic regression is another popular method
  – Binary outcome with qualitative/quantitative predictors
  – Maximize likelihood via iteratively re-weighted least squares
 Neither method was explicitly designed to separate data.
  – LDA is optimal when the mean vectors and covariance are known
  – Logistic regression aims to understand the role of the input variables
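The LDA procedure above (estimate means and a pooled covariance, then classify with linear boundaries) can be sketched for two classes as follows. This is an illustrative Python/numpy sketch, not the code behind the slides:

```python
import numpy as np

def lda_fit(X0, X1):
    """Two-class LDA: class means, pooled covariance, then a linear rule."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    n0, n1 = len(X0), len(X1)
    # pooled (within-class) covariance estimate
    S = ((X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)) / (n0 + n1 - 2)
    w = np.linalg.solve(S, mu1 - mu0)                 # direction normal to the boundary
    b = -0.5 * (mu0 + mu1) @ w + np.log(n1 / n0)      # offset, including class priors
    return w, b

def lda_predict(x, w, b):
    """Classify as class 1 when the discriminant is positive."""
    return int(x @ w + b > 0)

# two well-separated toy clusters
X0 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X1 = X0 + 4.0
w, b = lda_fit(X0, X1)
```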
LDA w/ Two Classes: Step-by-Step

[Figure: step-by-step construction of the LDA decision boundary on a scatterplot of Measurement #1 vs. Measurement #2.]
LDA w/ Three Classes: Step-by-Step

[Figure: step-by-step construction of the LDA decision boundaries for three classes on a scatterplot of Measurement #1 vs. Measurement #2.]
Separating Hyperplanes

 Rosenblatt’s Perceptron Learning Algorithm (1958)
 – Minimizes the distance of misclassified points to the decision
    boundary:
                   min D(β, β0) = −∑i∈M yi(xiᵀβ + β0);  yi = ±1

 – Converges in a “finite” number of steps.
 Problems (Ripley, 1996)
 1. Separable data admit many solutions, depending on the initial conditions.
 2. Convergence can be slow: the smaller the gap, the longer it takes.
 3. Nonseparable data imply the algorithm will not converge!
 Optimal separating hyperplanes (Vapnik and Chervonenkis, 1963)
 – Forms the foundation for support vector machines.
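Rosenblatt's update rule described above can be sketched as follows. This is a Python illustration under the stated yi = ±1 convention, not the original implementation:

```python
import numpy as np

def perceptron(X, y, lr=1.0, max_epochs=100):
    """Cycle through the data; nudge (beta, beta0) toward each misclassified
    point. For separable data this stops after a finite number of passes."""
    beta, beta0 = np.zeros(X.shape[1]), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):                   # labels yi are +/-1
            if yi * (xi @ beta + beta0) <= 0:      # misclassified (or on the boundary)
                beta = beta + lr * yi * xi
                beta0 = beta0 + lr * yi
                mistakes += 1
        if mistakes == 0:                          # a clean pass: converged
            break
    return beta, beta0                             # nonseparable data: no convergence

# separable toy data
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])
beta, beta0 = perceptron(X, y)
```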
Separating Hyperplanes: separable case

[Figure: several hyperplanes that separate the two classes, with the optimal separating hyperplane highlighted.]
Support Vector Machines (Vapnik 1996)

 Separates two classes and maximizes the distance to the closest point
 from either class:
                    max C subject to yi(xiᵀβ + β0) ≥ C;  yi = ±1

 Extends “optimal separating hyperplanes”
  – Handles the nonseparable case and nonlinear boundaries
  – Contains a “cost” parameter that may be optimized
  – May be used in the regression setting
 Basis expansions
  – Enlarge the feature space
  – The feature space may grow very large, even infinite
  – Examples include                        k(x,x′) = exp(−γ‖x−x′‖²); γ > 0
         Gaussian radial basis function (RBF) kernel
         Polynomial kernel
         ANOVA radial basis kernel
  – Each contains a “scaling factor” that may be optimized
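The RBF kernel quoted above can be written down directly; the Gram matrix it induces is what the SVM effectively works with in the enlarged feature space. A small Python sketch (the presentation's examples use the R package e1071; this is illustrative only):

```python
import numpy as np

def rbf_kernel(x, xp, gamma=1.0):
    """Gaussian RBF kernel from the slide: k(x, x') = exp(-gamma * ||x - x'||^2)."""
    d = np.asarray(x, dtype=float) - np.asarray(xp, dtype=float)
    return float(np.exp(-gamma * (d @ d)))

def gram(X, gamma=1.0):
    """Gram matrix K[i, j] = k(x_i, x_j) over a set of inputs."""
    return np.array([[rbf_kernel(a, c, gamma) for c in X] for a in X])

K = gram(np.array([[0.0], [1.0], [3.0]]), gamma=0.5)
# K is symmetric, has 1s on the diagonal, and decays with distance
```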
Support Vector Classifiers: separable case

[Figure, adapted from Hastie, Tibshirani and Friedman (2001): two separable classes with decision boundary xᵀβ + β0 = 0, a margin of width C = 1/‖β‖ on each side, and support points lying on the margin.]
Support Vector Classifiers: nonseparable case

[Figure, adapted from Hastie, Tibshirani and Friedman (2001): the nonseparable case, with decision boundary xᵀβ + β0 = 0, margin C = 1/‖β‖, and slack variables ξ1, …, ξ5 for points on the wrong side of their margin.]
Support Vector Machine: Spiral Example
Support Vector Machine: Spiral Example
Receiver Operating Characteristic (ROC)

 Graphical plot of sensitivity vs. (1 – specificity)
  – Binary classifier system as discrimination threshold varies

   2×2 contingency table:

                            actual value
                         p                 n          total
   prediction  p′   True Positive    False Positive    P′
   outcome     n′   False Negative   True Negative     N′
               total     P                 N


 Sensitivity = True Positive Rate = TP / (TP + FN)
 Specificity = 1 – False Positive Rate = 1 – FP / (FP + TN)
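The two formulas above can be turned into a small helper; the counts used here are hypothetical, purely to illustrate the arithmetic:

```python
def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from 2x2 counts, per the slide's formulas."""
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = 1 - fp / (fp + tn)       # equivalently tn / (tn + fp)
    return sensitivity, specificity

# hypothetical counts, for illustration only
sens, spec = sens_spec(tp=90, fp=5, fn=10, tn=95)   # 0.90 and 0.95
```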
Example: Breast Cytology

                               699 samples
                                – 9 measurements (ordinal)
                                       Clump thickness
                                       Cell size uniformity
                                       Cell shape uniformity
                                       Marginal adhesion
                                       Single epithelial cell size
                                       Bare nuclei
                                       Bland chromatin
                                       Normal nucleoli
                                       Mitoses
                                – 2 classes
                                       Benign
                                       Malignant
                               Classification problem since
                               outcome measure is binary.
                               Train = 550, Test = 133.
Wolberg & Mangasarian (1990)
Example: Breast Cytology
Example: Breast Cytology




          Diagnostic plot from SVM procedure.
Example: Breast Cytology




          Response surface to SVM parameters.
Example: Breast Cytology


             Logistic Regression
                  Benign   Malignant
   Benign           84         5        sensitivity = 95.5%
   Malignant         4        40        specificity = 88.9%

             Linear Discriminant Analysis
                  Benign   Malignant
   Benign           90         6        sensitivity = 98.9%
   Malignant         1        36        specificity = 85.7%

             Naïve Support Vector Machine
                  Benign   Malignant
   Benign           89         2        sensitivity = 97.8%
   Malignant         2        40        specificity = 95.2%

             Tuned Support Vector Machine
                  Benign   Malignant
   Benign           89         1        sensitivity = 97.8%
   Malignant         2        41        specificity = 97.6%
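As a check on the arithmetic, the tuned-SVM figures can be reproduced from the table's counts. Reading benign as the positive class (an assumption; the slide does not label which class is "positive"):

```python
# counts from the tuned-SVM table (benign taken as the positive class, an assumption)
tp, fn = 89, 2    # actual benign: classified benign / malignant
tn, fp = 41, 1    # actual malignant: classified malignant / benign

sensitivity = 100 * tp / (tp + fn)    # 97.8% after rounding to one decimal
specificity = 100 * tn / (tn + fp)    # 97.6% after rounding to one decimal
```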
Example: Breast Cytology




           Sensitivity




                         1 - Specificity


        Receiver operating characteristic (ROC) plot.
Example: Prostate Specific Antigen (PSA)




 Stamey et al. (1989); used in Hastie, Tibshirani and Friedman (2001).
 Correlation between the level of PSA and various clinical measures (N = 97)
  – log cancer volume,
  – log prostate weight,
  – log of BPH amount,
  – seminal vesicle invasion,
  – log of capsular penetration,
  – Gleason score, and
  – percent of Gleason scores 4 or 5.
 Regression problem since outcome measure is quantitative.
 Training data = 67, Test data = 30.
Example: Prostate Specific Antigen (PSA)
Example: Prostate Specific Antigen (PSA)




       Best subset selection for linear regression model.
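Best-subset selection, as used for this linear regression model, simply fits every subset of predictors and keeps the best one of each size. A Python sketch on synthetic data (the variable layout is illustrative, not the actual prostate data):

```python
from itertools import combinations
import numpy as np

def fit_rss(X, y, cols):
    """Least-squares fit on the chosen columns; return the residual sum of squares."""
    Xa = np.column_stack([np.ones(len(y)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    r = y - Xa @ beta
    return float(r @ r)

def best_subset(X, y, size):
    """Among all predictor subsets of the given size, keep the lowest-RSS one."""
    p = X.shape[1]
    return min(combinations(range(p), size), key=lambda c: fit_rss(X, y, c))

# synthetic data: y depends only on columns 0 and 2, plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = 3 * X[:, 0] + 2 * X[:, 2] + rng.normal(scale=0.1, size=50)
best = best_subset(X, y, 2)   # should recover columns (0, 2)
```

In practice one compares the best model of each size on held-out data (as the slide's figure does), since RSS alone always favors larger subsets.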
Example: Prostate Specific Antigen (PSA)




          Linear regression model (lcavol, lweight).
Example: Prostate Specific Antigen (PSA)




          Response surface to SVM parameters.
Example: Prostate Specific Antigen (PSA)




             Prediction errors for test data.
Conclusions

 Multivariate data are being collected from imaging studies.
 In order to utilize this information:
   – Use the “right” statistical method
   – Collaborate with quantitative scientists
   – Accept a paradigm shift in the analysis of imaging studies
 Embrace the richness of multi-functional imaging data
   – Quantitative
   – Raw (avoid summaries)
 Design of imaging studies requires
   – A priori knowledge
   – Few and focused scientific questions
   – Well-defined methodology
Acknowledgments

Anwar Padhani
Roberto Alonzi
Claire Allen
Mark Emberton
Henkjan Huisman
Giulio Gambarota
Bibliography

 Filippini N, Rao A, et al. Anatomically-distinct genetic associations of APOE ε4 allele
 load with regional cortical atrophy in Alzheimer's disease. NeuroImage 2009, 44:724-728.
 Freer TW, Ulissey MJ. Screening Mammography with Computer-aided Detection:
 Prospective Study of 12,860 Patients in a Community Breast Center. Radiology 2001,
 220:781-786.
 Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning, Springer, 2001.
 McDonough KL. Breast Cancer Stage Cost Analysis in a Managed Care Population.
 American Journal of Managed Care 1999, 5(6):S377-S382.
 R Development Core Team. R: A Language and Environment for Statistical Computing. R
 Foundation for Statistical Computing, Vienna, Austria.
   – www.R-project.org
   – R package e1071
   – R package mlbench
 Ripley BD. Pattern Recognition and Neural Networks, Cambridge University Press, 1996.
 Vos PC, Hambrock T, et al. Computerized analysis of prostate lesions in the peripheral
 zone using dynamic contrast enhanced MRI. Medical Physics 2008, 35(3):888-899.
 Wolberg WH, Mangasarian OL. Multisurface method of pattern separation for medical
 diagnosis applied to breast cytology. PNAS 1990, 87(23):9193-9196.
