Towards Better than Human Capability in
         Diagnosing Prostate Cancer
    Using Infrared Spectroscopic Imaging

       Xavier Llorà1, Rohith Reddy2,3, Brian Matesic2, Rohit Bhargava2,3

1 National    Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory
                                 2 Department   of Bioengineering
                    3 Beckman   Institute for Advanced Science and Technology
                           University of Illinois at Urbana-Champaign




                        Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199
                     DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA
GECCO 2007 HUMIES                                                                           1
Motivation
• The American Cancer Society estimated 234,460 new cases of
  prostate cancer in 2006.
• Screening test:
     – Digital rectal examination
     – Prostate specific antigen (PSA) level
• Patients with suspicious results undergo a biopsy
• 1 million people undergo biopsies in the US alone per year
• Pathologists diagnose
     – Crucial for the therapy
     – Human accuracy ( error < 5% )
     – Costs


Current Diagnosis Procedure
• Biopsy-staining-microscopy-manual recognition has been the diagnosis
  procedure for the last 150 years.




Advances on Fourier Transform IR Imaging
• Infrared spectroscopy is a classical technique for
   measuring chemical composition of specimens.
• At specific frequencies, the vibrational modes of
   molecules are resonant with the frequency of infrared
   light.
• Microscopes have developed to the point that their resolution
   matches a pixel with a cell (and keeps improving).
• It allows starting from the same data (stained tissue)
• Generates large volumes of data


Advances on Fourier Transform IR Imaging




Spectrum Analysis
• The microscope generates a lot of data
• Per spot, the spectral signature requires GBs of storage
• Bhargava et al. (2005): feature extraction for tissue identification




• More than 200 potential features per spectrum (cell/pixel)
• First methodology that allowed tissue identification


Human Activity
• As mentioned earlier: Area of exclusive human activity
• Two key tasks:
     – Using the spectra, identify the tissue type
     – Using the filtered tissue, diagnose samples
• Both tasks:
     – Require learning
     – Can be modeled as supervised learning problems
• Challenges:
     – Very large volumes of information
     – Scalability and efficiency are priorities
     – Interpretability of the models


Genetics-Based Machine Learning
• GA-driven learning mechanisms
• Mainly rule based models
• Pittsburgh approach
• Inherently parallel process
• GBML is a good candidate for very large problems
• Rule matching is known to be the governing factor in
   the execution time (Llorà & Sastry, 2006)




Current Off-the-Shelf Systems
• There is a wide variety of GBML/LCS implementations
• Most of them:
      – Oriented to running experiments on a single processor
      – Have large memory footprints
      – Typical problem = tens of attributes + thousands of
        records
      – Little attention to efficient implementation and
        acceleration techniques (Llorà & Sastry, 2006)
• Cancer diagnosis overwhelms them:
      – Hundreds of features
      – Millions of records
NAX Specs
• Affordable memory footprints
• Squeeze out all the computation you have
• Efficient implementations:
      – Hardware acceleration
      – Massive parallelism




NAX Mechanics
• The basic procedure:
      1. Create an empty decision list
      2. GA evolves a maximally accurate and maximally
         general rule using the available instances
      3. Add the evolved rule to the decision list
      4. Remove all the instances covered by the rule
      5. If there are uncovered instances go to step 2
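
The five steps above can be sketched as a sequential-covering loop. This is only an illustration, not the NAX implementation: the GA of step 2 is replaced by a hypothetical stand-in, `evolve_rule`, that exhaustively searches single-threshold rules and prefers accuracy first and coverage (generality) second. The rule format, the helper names, and the toy data are all assumptions.

```python
def covers(rule, x):
    """A rule is (feature, op, threshold, label); test its condition on x."""
    f, op, t = rule[:3]
    return x[f] <= t if op == "<=" else x[f] >= t

def evolve_rule(instances):
    """Stand-in for the GA (assumption): pick the single-threshold rule
    with the highest accuracy, breaking ties by coverage."""
    best, best_key = None, None
    labels = sorted({y for _, y in instances})
    for f in range(len(instances[0][0])):
        for t in sorted({x[f] for x, _ in instances}):
            for op in ("<=", ">="):
                covered = [y for x, y in instances if covers((f, op, t), x)]
                for label in labels:
                    correct = sum(1 for y in covered if y == label)
                    key = (correct / len(covered), len(covered))
                    if best_key is None or key > best_key:
                        best, best_key = (f, op, t, label), key
    return best

def learn_decision_list(instances):
    decision_list = []                       # step 1: empty decision list
    remaining = list(instances)
    while remaining:                         # step 5: repeat while uncovered
        rule = evolve_rule(remaining)        # step 2: find the best rule
        decision_list.append(rule)           # step 3: add it to the list
        remaining = [(x, y) for x, y in remaining
                     if not covers(rule, x)] # step 4: drop covered instances
    return decision_list

def classify(decision_list, x, default=None):
    """Return the label of the first rule that covers x."""
    for rule in decision_list:
        if covers(rule, x):
            return rule[3]
    return default

# Toy data (made up for illustration): two features per instance.
data = [((0.1, 0.9), "benign"), ((0.2, 0.8), "benign"),
        ((0.5, 0.5), "benign"), ((0.8, 0.3), "malignant"),
        ((0.9, 0.1), "malignant")]
rules = learn_decision_list(data)
```

Each threshold is taken from an instance value, so every rule covers at least one instance and the loop always terminates.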




A Little Story about Hardware
• SIMD (Single Instruction Multiple Data) architectures were hot
  in the ‘80s supercomputing scene
• SIMD was widely used to perform binary operations on
  two vector operands (Cray)
• Those processors were very expensive
• Consumer products took another path, the scalar one
      – No SIMD support in hardware (left to the software)
      – The massive, widespread demand for CPUs made them cheaper
        and cheaper
• Side effect:
      – In the ‘90s, the hot trend in supercomputing became building
        machines with large numbers of “cheap” processors


The Consumer Market Strikes Back
• Computer games and multimedia applications
      – Use a particular type of matrix operations
      – Graphics heavily use 4x4 matrix operations
      – Digital signal processing applications also take advantage of it
• In the late ‘90s Intel introduced SIMD instructions on Pentium chips via
  MMX
      – Multimedia oriented instructions
      – Vector operations for fixed-size blocks
      – Goal: accelerate via hardware multimedia apps
• Nowadays most vendors provide “multimedia” vector instruction
  sets
      – Intel: MMX, SSE, SSE2, SSE3
      – AMD: 3DNow!, 3DNow!+ (also supports Intel’s MMX, SSE, SSE2)
      – IBM/Motorola: AltiVec

A Simple Example (I/II)
• Match = a simple aligned ‘and’ and ‘equal’

                         Matching instance       Non-matching instance
   Instance              0101 = 10 01 10 01      1001 = 01 10 10 01
   Condition             01## = 10 01 11 11      01## = 10 01 11 11
   Temp (Inst & Cond)    10 01 10 01             00 00 10 01
   Temp == Instance      11 11 11 11 (Matched)   10 01 11 11 (Not Matched)


• Vector operations allow different manipulations
• 4 floats can be manipulated at once (spectra features)
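
The match above can be sketched with plain integers. This is a minimal illustration, assuming the two-bit encoding read off the example (0 as 10, 1 as 01, # as 11); the names `ENCODING`, `pack`, and `matches` are hypothetical, and ordinary Python integers stand in for SIMD registers.

```python
# Two bits per attribute, as in the example: 0 -> 10, 1 -> 01, # -> 11.
ENCODING = {"0": 0b10, "1": 0b01, "#": 0b11}

def pack(symbols):
    """Pack a string over {0, 1, #} into one integer, two bits per symbol."""
    word = 0
    for s in symbols:
        word = (word << 2) | ENCODING[s]
    return word

def matches(condition, instance):
    """Aligned 'and' followed by 'equal': the rule matches when masking
    the instance with the condition leaves the instance unchanged."""
    temp = instance & condition
    return temp == instance

condition = pack("01##")
matched = matches(condition, pack("0101"))      # first instance in the example
not_matched = matches(condition, pack("1001"))  # second instance in the example
```

The example above ends with a bitwise equality mask (all ones means a match); the sketch collapses that last step into a single whole-word comparison.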

A Simple Example (II/II)
   Scalar: one operation per element

       OP1   OP2   Res
        1     1     1
        2     2     4
        3     3     9
        4     4    16

   Vector: one operation for all four elements

       vecOP1   1   2   3   4
       vecOP2   1   2   3   4
       vecRes   1   4   9  16

Exploiting the Inherent Parallelism
• Rule matching dominates the overall execution time
• Fitness calculation takes > 99% of it
• The parallelization method focused on reducing
  communication cost
• The idea
      – Most of the time is spent evaluating
      – Evaluate the evaluation
      – No master/slave
      – All processors run the same GA seeded in the same manner
      – Each processor only evaluates a chunk of the population (N/p)
      – Broadcast the fitness of the chunk to the other processors
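
The scheme above can be sketched as a single-process simulation. It is an illustration under assumptions, not the NAX code: the p "processors" are simulated in one loop rather than with a real message-passing library, the fitness function is a placeholder (count of ones), and all function names are hypothetical.

```python
import random

def make_population(seed, size, length):
    """Every processor uses the same seed, so all build the same population."""
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]

def fitness(individual):
    """Placeholder fitness (assumption): number of ones."""
    return sum(individual)

def chunk_bounds(rank, n, p):
    """Half-open index range [start, stop) of the chunk owned by `rank`."""
    base, extra = divmod(n, p)
    start = rank * base + min(rank, extra)
    return start, start + base + (1 if rank < extra else 0)

def evaluate_chunk(rank, population, p):
    """Evaluate only this processor's share (about N/p individuals)."""
    start, stop = chunk_bounds(rank, len(population), p)
    return start, [fitness(ind) for ind in population[start:stop]]

def simulate_generation(seed=42, n=10, p=4, length=8):
    # Each simulated processor builds the identical population locally.
    populations = [make_population(seed, n, length) for _ in range(p)]
    # Each processor evaluates its own chunk only.
    messages = [evaluate_chunk(r, populations[r], p) for r in range(p)]
    # "Broadcast": every processor assembles the full fitness vector.
    full = [None] * n
    for start, values in messages:
        full[start:start + len(values)] = values
    return populations[0], full

population, fitnesses = simulate_generation()
```

Because the populations are identical everywhere, only fitness values need to travel between processors, never individuals, which is what keeps the communication cost low.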



NAX: Stretching GBML




Prostate Cancer Data
1. Tissue identification
      – Modeled as a supervised learning problem
      – (Features, tissue type)
      – The goal: Accurately retrieve epithelial tissue
2. Diagnosis
      – Modeled as a supervised learning problem
      – (Features, diagnosis)
      – The goal: Accurately diagnose each cell (pixel) and
        aggregate those diagnoses to generate a spot
        (patient) diagnosis
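
The aggregation from pixel diagnoses to a spot diagnosis could look like the sketch below. The slides do not say which aggregation rule is used, so the fraction-of-malignant-pixels threshold, its default value of 0.5, and the function name are all assumptions made for illustration only.

```python
def diagnose_spot(pixel_labels, threshold=0.5):
    """Aggregate per-pixel diagnoses into one spot (patient) diagnosis.

    Assumption: a spot is called malignant when the fraction of pixels
    diagnosed as malignant exceeds `threshold`.
    """
    if not pixel_labels:
        raise ValueError("a spot needs at least one diagnosed pixel")
    malignant = sum(1 for label in pixel_labels if label == "malignant")
    fraction = malignant / len(pixel_labels)
    return "malignant" if fraction > threshold else "benign"

# Made-up pixel labels for two hypothetical spots.
spot_a = diagnose_spot(["malignant"] * 7 + ["benign"] * 3)
spot_b = diagnose_spot(["malignant"] * 2 + ["benign"] * 8)
```

An aggregation of this kind tolerates some pixel-level mistakes, since a spot is only misdiagnosed when enough of its pixels are wrong.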


GBML Identifies Tissue Types Accurately

                    Original




GBML Identifies Tissue Types Accurately
[Figure legend: OK, Misclassified]

• Accuracy >96%
• Mistakes on minority classes (not targeted) and boundaries
Filtered Tissue is Accurately Diagnosed


                    Original




Filtered Tissue is Accurately Diagnosed


                    Diagnosed




Filtered Tissue is Accurately Diagnosed
• Pixel cross-validation accuracy: 87.34%
• Spot accuracy
      – 68 of 69 malignant spots
      – 70 of 71 benign spots

• Human-competitive computer-aided diagnosis system
    is possible
• First published results that fall in the range of
    human error (<5%)
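
The spot counts above can be turned into an overall error rate with a few lines of arithmetic. The variable names are illustrative, and treating malignant as the positive class for sensitivity and specificity is an assumption; the counts themselves come from the slide.

```python
# Spot-level counts reported above.
malignant_correct, malignant_total = 68, 69
benign_correct, benign_total = 70, 71

# Overall spot accuracy: 138 correct out of 140 spots.
spot_accuracy = (malignant_correct + benign_correct) / (malignant_total + benign_total)
spot_error = 1 - spot_accuracy

# Per-class rates, with malignant taken as the positive class (assumption).
sensitivity = malignant_correct / malignant_total
specificity = benign_correct / benign_total

print(f"spot accuracy: {spot_accuracy:.2%}, spot error: {spot_error:.2%}")
```

With 138 of 140 spots correct, the spot error is about 1.43%, which is inside the <5% range quoted for human error.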



Breakthrough
• Current best published results, examples from
    different fields
      – Image analysis: 77% accuracy [1] (cancer/no cancer)
      – Raman spectroscopy: 86% accuracy [2]
      – Genomic analysis: 76% [3] (low grade/high grade cancer)




        1. R. Stotzka et al., Anal. Quant. Cytol. Histol. 17, 204-218 (1995).
        2. P. Crow et al., Urol. 65, 1126-1130 (2005).
        3. L. True et al., Proc. Natl. Acad. Sci. USA 103(29), 10991-10996 (2006).



Conclusions
• Humans are the ultimate and only source of diagnosis
• The FTIR imaging provides information about chemical
   signatures and structure
• Large volumes of data forced efficient GBML design
• Diagnosis requires two steps
• The results on prostate cancer are human competitive
• No previous method has been able to match
   pathologist accuracy




"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 

Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging

  • 1. Towards Better than Human Capability in Diagnosing Prostate Cancer Using Infrared Spectroscopic Imaging
      • Xavier Llorà (1), Rohith Reddy (2,3), Brian Matesic (2), Rohit Bhargava (2,3)
      • (1) National Center for Supercomputing Applications & Illinois Genetic Algorithms Laboratory
      • (2) Department of Bioengineering
      • (3) Beckman Institute for Advanced Science and Technology
      • University of Illinois at Urbana-Champaign
      • Supported by AFOSR FA9550-06-1-0370, NSF at ISS-02-09199, DoD W81XWH-07-PRCP-NIA and the Faculty Fellows program at NCSA
      • GECCO 2007 HUMIES
  • 2. Motivation
      • The American Cancer Society estimated 234,460 new cases of prostate cancer in 2006.
      • Screening tests:
          – Digital rectal examination
          – Prostate-specific antigen (PSA) level
      • Suspicious patients undergo a biopsy
      • 1 million people undergo biopsies per year in the US alone
      • Pathologists diagnose:
          – Crucial for the therapy
          – Human accuracy (error < 5%)
          – Costs
  • 3. Current Diagnosis Procedure
      • Biopsy, staining, microscopy and manual recognition has been the diagnosis procedure for the last 150 years.
  • 4. Advances in Fourier Transform IR Imaging
      • Infrared spectroscopy is a classical technique for measuring the chemical composition of specimens.
      • At specific frequencies, the vibrational modes of molecules are resonant with the frequency of infrared light.
      • Microscopes have developed to the point that the resolution matches a pixel with a cell (and keeps improving).
      • It allows starting from the same data (stained tissue)
      • Generates large volumes of data
  • 5. Advances in Fourier Transform IR Imaging
  • 6. Spectrum Analysis
      • Microscopes generate a lot of data
      • Per spot, the spectral signature requires GBs of storage
      • Bhargava et al. (2005): feature extraction for tissue identification
      • More than 200 potential features per spectrum (cell/pixel)
      • First methodology that allowed tissue identification
  • 7. Human Activity
      • As mentioned earlier: an area of exclusive human activity
      • Two key tasks:
          – Using the spectra, identify the tissue type
          – Using the filtered tissue, diagnose samples
      • Both tasks:
          – Require learning
          – Can be modeled as supervised learning problems
      • Challenges:
          – Very large volumes of information
          – Scalability and efficiency are a priority
          – Interpretability of the models
  • 8. Genetics-Based Machine Learning
      • GA-driven learning mechanisms
      • Mainly rule-based models
      • Pittsburgh approach
      • Inherently parallel process
      • GBML is a good candidate for very large problems
      • Rule matching is known to be the governing factor in the execution time (Llorà & Sastry, 2006)
  • 9. Current Off-the-Shelf Systems
      • There is a wide variety of GBML/LCS implementations
      • Most of them:
          – Are oriented to running experiments on a single processor
          – Have large memory footprints
          – Typical problem = tens of attributes + thousands of records
          – Pay little attention to efficient implementation and acceleration techniques (Llorà & Sastry, 2006)
      • Cancer diagnosis overwhelms them:
          – Hundreds of features
          – Millions of records
  • 10. NAX Specs
      • Affordable memory footprints
      • Squeeze out all the computation you have
      • Efficient implementations:
          – Hardware acceleration
          – Massive parallelism
  • 11. NAX Mechanics
      • The basic procedure:
          1. Create an empty decision list
          2. The GA evolves a maximally accurate and maximally general rule using the available instances
          3. Add the evolved rule to the decision list
          4. Remove all the instances covered by the rule
          5. If there are uncovered instances, go to step 2
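The five steps of the basic procedure amount to a sequential-covering loop. The Python sketch below illustrates that loop under loudly stated assumptions: instances are binary strings paired with a class label, rules are ternary conditions over {0, 1, #}, and the GA of step 2 is replaced by a plain random search (`evolve_rule`, a hypothetical stand-in, not the GA that NAX uses). All function names are illustrative and do not come from the slides.

```python
import random


def covers(condition, instance):
    """A ternary condition covers an instance if every non-# symbol agrees."""
    return all(c == "#" or c == x for c, x in zip(condition, instance))


def score(condition, label, instances):
    """(accuracy, coverage) of a rule on the instances it covers."""
    covered = [y for x, y in instances if covers(condition, x)]
    return (sum(y == label for y in covered) / len(covered), len(covered))


def evolve_rule(instances, rng, tries=200):
    """ASSUMPTION: random search standing in for the GA of step 2."""
    seed, label = instances[0]
    best = (seed, label)                      # fully specific fallback rule
    best_score = score(seed, label, instances)
    for _ in range(tries):
        seed, label = rng.choice(instances)
        # Generalize the seed instance by masking random positions with '#'.
        cond = "".join(c if rng.random() < 0.5 else "#" for c in seed)
        s = score(cond, label, instances)
        if s > best_score:                    # prefer accuracy, then generality
            best, best_score = (cond, label), s
    return best


def learn_decision_list(instances, seed=0):
    rng = random.Random(seed)
    remaining = list(instances)
    decision_list = []                                    # step 1
    while remaining:                                      # step 5
        cond, label = evolve_rule(remaining, rng)         # step 2
        decision_list.append((cond, label))               # step 3
        remaining = [(x, y) for x, y in remaining
                     if not covers(cond, x)]              # step 4
    return decision_list


def classify(decision_list, instance, default=None):
    """Return the label of the first rule that covers the instance."""
    for cond, label in decision_list:
        if covers(cond, instance):
            return label
    return default


# Toy data (hypothetical): class "A" when the first bit is 1, else "B".
data = [(format(i, "04b"), "A" if i >= 8 else "B") for i in range(16)]
rules = learn_decision_list(data)
```

Because every candidate rule is derived from a remaining instance, each pass removes at least one instance and the loop terminates.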
  • 12. A Little Story about Hardware
      • SIMD (Single Instruction Multiple Data) architectures were hot in the ’80s supercomputing scene
      • SIMD was widely used to perform binary operations between two vector operands (Cray)
      • Those processors were very expensive
      • Consumer products took another path, the scalar one
          – No SIMD support in hardware (left to the software)
          – The massive, widespread need for CPUs made them cheaper and cheaper
      • Side effect:
          – What became hot in the supercomputing scene in the ’90s was building machines with large numbers of “cheap” processors
  • 13. The Consumer Market Strikes Back
      • Computer games and multimedia applications
          – Use a particular type of matrix operations
          – Graphics heavily use 4x4 matrix operations
          – Digital signal processing applications also take advantage of them
      • In the late ’90s Intel introduced SIMD instructions on Pentium chips via MMX
          – Multimedia-oriented instructions
          – Vector operations for fixed-size blocks
          – Goal: accelerate multimedia apps via hardware
      • Nowadays most vendors provide “multimedia” vector instruction sets
          – Intel: MMX, SSE, SSE2, SSE3
          – AMD: 3DNow!, 3DNow!+ (also supports Intel’s MMX, SSE, SSE2)
          – IBM/Motorola: AltiVec
  • 14. A Simple Example (I/II)
      • Match = a simple aligned ‘and’ and ‘equal’
      • [Figure: two instances matched against the condition 01##, encoded as 10 01 11 11]
          – Instance 0101, encoded as 10 01 10 01: Temp = Instance & Condition = 10 01 10 01; Temp == Instance gives 11 11 11 11 (Matched)
          – Instance 1001, encoded as 01 10 10 01: Temp = Instance & Condition = 00 00 10 01; Temp == Instance gives 10 01 11 11 (Not Matched)
      • Vector operations allow different manipulations
      • 4 floats can be manipulated at once (spectra features)
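The ‘and’ plus ‘equal’ match can be reproduced with ordinary integer bit operations. The sketch below assumes the two-bit encoding that the slide's numbers suggest (0 as 10, 1 as 01, # as 11) and uses a Python integer as a stand-in for a hardware vector register. The names `ENCODE`, `pack` and `matches` are illustrative, not taken from NAX.

```python
# ASSUMPTION: two-bit encoding inferred from the slide's example values.
ENCODE = {"0": 0b10, "1": 0b01, "#": 0b11}


def pack(symbols):
    """Pack a string over {0, 1, #} into one integer, two bits per symbol."""
    word = 0
    for s in symbols:
        word = (word << 2) | ENCODE[s]
    return word


def matches(instance, condition):
    """Aligned 'and' followed by 'equal': the instance must survive the mask."""
    inst = pack(instance)
    temp = inst & pack(condition)   # '#' (11) keeps either bit pattern
    return temp == inst             # any cleared bit means a mismatch


print(matches("0101", "01##"))  # True
print(matches("1001", "01##"))  # False
```

A mismatching position clears both bits of that pair (10 & 01 = 00), so the comparison against the original instance fails. A wider register lets the same two operations test more attributes at once.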
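  • 15. A Simple Example (II/II)
      • [Figure: scalar versus vector element-wise product]
          – Scalar: four separate operations on OP1 and OP2 (1 × 1, 2 × 2, 3 × 3, 4 × 4) give Res = 1, 4, 9, 16
          – Vector: one operation on vecOP1 = (1, 2, 3, 4) and vecOP2 = (1, 2, 3, 4) gives vecRes = (1, 4, 9, 16)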
  • 16. Exploiting the Inherent Parallelism
      • Rule matching rules the overall execution time
      • Fitness calculation takes > 99% of the execution time
      • The parallelization method focuses on reducing communication cost
      • The idea
          – Most of the time is spent evaluating
          – Evaluate the evaluation
          – No master/slave
          – All processors run the same GA seeded in the same manner
          – Each processor only evaluates a chunk of the population (N/p)
          – Broadcast the fitness of the chunk to the other processors
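The chunked evaluation scheme can be sketched in a few lines. The Python below simulates the p processors inside a single process, purely to show the bookkeeping: each rank evaluates its own slice of roughly N/p individuals, and the slices are then concatenated as a stand-in for the broadcast. The slides do not name a communication library, and `chunk_bounds` and `evaluate_in_parallel` are hypothetical names.

```python
def chunk_bounds(n, p, rank):
    """Half-open index range [lo, hi) of the slice owned by processor `rank`."""
    base, extra = divmod(n, p)
    lo = rank * base + min(rank, extra)       # first `extra` ranks get one more
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def evaluate_in_parallel(population, fitness, p):
    """Simulate p identical GA processes sharing one fitness evaluation."""
    chunks = []
    for rank in range(p):                     # on real hardware these run at once
        lo, hi = chunk_bounds(len(population), p, rank)
        chunks.append([fitness(ind) for ind in population[lo:hi]])
    # Stand-in for the broadcast: every rank ends up with all chunk results,
    # in population order, and can continue the identically seeded GA.
    return [f for chunk in chunks for f in chunk]


population = list(range(10))                  # toy individuals (hypothetical)
fitnesses = evaluate_in_parallel(population, lambda x: x * x, p=3)
```

Since every processor runs the same GA from the same seed, only fitness values cross the network, never individuals, which is what keeps the communication cost low.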
  • 17. NAX: Stretching GBML
  • 18. Prostate Cancer Data
      1. Tissue identification
          – Modeled as a supervised learning problem
          – (Features, tissue type)
          – The goal: accurately retrieve epithelial tissue
      2. Tissue diagnosis
          – Modeled as a supervised learning problem
          – (Features, diagnosis)
          – The goal: accurately diagnose each cell (pixel) and aggregate those diagnoses to generate a spot (patient) diagnosis
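The slides do not say how per-pixel diagnoses are aggregated into a spot diagnosis. Purely as an illustration, the sketch below assumes a thresholded vote: a spot is called malignant when the fraction of its pixels diagnosed as malignant reaches a cutoff. The function name, the label strings and the 0.5 threshold are all hypothetical.

```python
def diagnose_spot(pixel_diagnoses, threshold=0.5):
    """ASSUMPTION: thresholded vote over pixel labels, not the authors' rule."""
    if not pixel_diagnoses:
        raise ValueError("a spot needs at least one diagnosed pixel")
    malignant = sum(1 for d in pixel_diagnoses if d == "malignant")
    fraction = malignant / len(pixel_diagnoses)
    return "malignant" if fraction >= threshold else "benign"


print(diagnose_spot(["malignant", "malignant", "malignant", "benign"]))
print(diagnose_spot(["benign", "benign", "benign", "malignant"]))
```

Some aggregation of this kind would explain how a spot-level result can be more accurate than the pixel-level one: isolated pixel errors are outvoted by their neighbors.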
  • 19. GBML Identifies Tissue Types Accurately
      • [Figure: Original]
  • 20. GBML Identifies Tissue Types Accurately
      • [Figure legend: OK, Misclassified]
      • Accuracy > 96%
      • Mistakes on minority classes (not targeted) and boundaries
  • 21. Filtered Tissue is Accurately Diagnosed
      • [Figure: Original]
  • 22. Filtered Tissue is Accurately Diagnosed
      • [Figure: Diagnosed]
  • 23. Filtered Tissue is Accurately Diagnosed
      • Pixel cross-validation accuracy (87.34%)
      • Spot accuracy
          – 68 of 69 malignant spots
          – 70 of 71 benign spots
      • A human-competitive computer-aided diagnosis system is possible
      • First published results that fall in the range of human error (< 5%)
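The spot counts can be checked against the 5% human error figure with a little arithmetic. The snippet assumes that "68 of 69" and "70 of 71" count correctly diagnosed spots. The variable and function names are illustrative.

```python
def error_rate(correct, total):
    """Fraction of cases that were not diagnosed correctly."""
    return 1 - correct / total


malignant_correct, malignant_total = 68, 69   # from the slide
benign_correct, benign_total = 70, 71         # from the slide

correct = malignant_correct + benign_correct  # 138
total = malignant_total + benign_total        # 140
spot_error = error_rate(correct, total)       # 2 misdiagnosed spots out of 140

print(f"overall spot error: {100 * spot_error:.2f}%")
print("within human error range:", spot_error < 0.05)
```

Two misdiagnosed spots out of 140 give an overall spot error of about 1.43%, which is inside the < 5% range quoted for pathologists.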
  • 24. Breakthrough
      • Current best published results, examples from different fields
          – Image analysis: 77% accuracy [1] (cancer/no cancer)
          – Raman spectroscopy: 86% accuracy [2]
          – Genomic analysis: 76% [3] (low grade/high grade cancer)
      • [1] R. Stotzka et al. Anal. Quant. Cytol. Histol. 17, 204-218 (1995).
      • [2] P. Crow et al. Urol. 65, 1126-1130 (2005).
      • [3] L. True et al. Proc. Natl. Acad. Sci. U.S.A. 103(29), 10991-10996 (2006).
  • 25. Conclusions
      • Humans are the ultimate and only source of diagnosis
      • FTIR imaging provides information about chemical signatures and structure
      • Large volumes of data forced an efficient GBML design
      • Diagnosis requires two steps
      • The results on prostate cancer are human-competitive
      • No previous method has been able to match pathologist accuracy