Integrative Multi-Scale Analyses



Joel Saltz MD, PhD
Emory University
February 2013
a.k.a “Big Data”
Center for Comprehensive Informatics




                                       • Integrative Spatio-Temporal Analytics
                                       • Deep Integrative Biomedical Research
                                       • High End Computing/“Big Data” Computers,
                                         Systems Software
                                       • Analysis of Patient Populations
Application Targets




                                       • Multi-dimensional spatial-temporal datasets
                                          – Radiology and Microscopy Image Analyses
                                          – Oil Reservoir Simulation/Carbon
                                            Sequestration/Groundwater Pollution Remediation
                                          – Biomass monitoring and disaster surveillance using
                                            multiple types of satellite imagery
                                          – Weather prediction using satellite and ground sensor
                                            data
                                          – Analysis of Results from Large Scale Simulations
                                       • Correlative and cooperative analysis of data from
                                         multiple sensor modalities and sources
                                       • What-if scenarios and multiple design choices or
                                         initial conditions
Core Transformations




                                       • Data Cleaning and Low Level Transformations
                                       • Data Subsetting, Filtering, Subsampling
                                       • Spatio-temporal Mapping and Registration
                                       • Object Segmentation
                                       • Feature Extraction, Object Classification
                                       • Spatio-temporal Aggregation
                                       • Change Detection, Comparison, and Quantification
Emory In Silico Center for Brain Tumor
Research (PI = Dan Brat, PD = Joel Saltz)
National Science Foundation Grand Challenge
           in Land Cover Dynamics
                                         • Remote sensing analysis of
                                           high resolution satellite
                                           images.
                                         • Databases of land cover
                                           dynamics are essential for
                                           global carbon models,
                                           biogeochemical cycling,
                                           hydrological modeling and
                                           ecosystem response
                                           modeling
                                         • Maps of the world's tropical
                                           rain forest during the past
                                           three decades.
 Larry Davis, Rama Chellappa, Joel Saltz, Alan Sussman, John Townshend
Analysis of Computational Data; Uncertainty
                                       Quantification, Comparisons with Experimental Results




                                            Dimitri Mavriplis, Raja Das, Joel Saltz
a.k.a “Big Data”




                                       • Integrative Spatio-Temporal Analytics
                                       • Deep Integrative Biomedical Research
                                       • High End Computing/“Big Data” Computers,
                                         Systems Software
                                       • Analysis of Patient Populations
                                       Whole Slide Imaging: Scale
Pathology Computer Assisted Diagnosis




                                               Shimada, Gurcan, Kong, Saltz
Computerized Classification System for Grading Neuroblastoma
• Background Identification
• Image Decomposition (Multi-resolution levels)
• Image Segmentation (EMLDA)
• Feature Construction (2nd order statistics, Tonal Features)
• Feature Extraction (LDA) + Classification (Bayesian)
• Multi-resolution Layer Controller (Confidence Region)
[Flowchart, TRAINING: Training Tiles, Down-sampling, Segmentation, Feature Construction, Feature Extraction, Classifier Training]
[Flowchart, TESTING: Image Tile, Initialization (I = L), Background? (Yes: Label; No: Create Image I(L)), Segmentation, Feature Construction, Feature Extraction, Classification, Within Confidence Region? (Yes: Label; No: I > 1? (Yes: I = I - 1 and repeat))]
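The testing loop of the multi-resolution layer controller can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Classifier` signature, the 1-D `downsample` stand-in for the image pyramid, the `threshold` value, and the choice to accept the label once the finest level is reached are all assumptions. Background identification is omitted.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier signature: takes a tile at one resolution level and
# returns (label, confidence in [0, 1]).
Classifier = Callable[[Sequence[float]], Tuple[str, float]]


def downsample(tile: Sequence[float], factor: int) -> List[float]:
    """Average consecutive groups of `factor` values (1-D stand-in for an image pyramid)."""
    groups = [tile[i:i + factor] for i in range(0, len(tile), factor)]
    return [sum(g) / len(g) for g in groups]


def classify_multires(tile, classifiers, threshold=0.9):
    """Start at the coarsest level L and refine until the decision is confident.

    classifiers[level - 1] handles resolution level `level`; level 1 is full
    resolution. Returns (label, level at which the decision was accepted).
    """
    level = len(classifiers)                              # I = L
    while True:
        image = downsample(tile, 2 ** (level - 1))        # create image I(level)
        label, confidence = classifiers[level - 1](image)
        # Accept when inside the confidence region; at level 1 there is no
        # finer level left, so the label is accepted as-is (an assumption).
        if confidence >= threshold or level == 1:
            return label, level
        level -= 1                                        # I = I - 1
```

Most tiles would ideally be decided at a coarse level, so the expensive full-resolution pass is reserved for ambiguous tiles.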
Using TCGA Data to Study
 Glioblastoma

Diagnostic Improvement

Molecular Classification

Predictors of Progression
TCGA Network



               Digital Pathology




               Neuroimaging
Morphological Tissue Classification



                                           Whole Slide Imaging               Cellular Features




                                           Nuclei Segmentation




                                                                                     Lee Cooper,
                                                                                     Jun Kong
Can we use image analysis of TCGA GBMs to inform
 diagnostic criteria based on molecular or clinical
 endpoints?

                    Nuclear Qualities




Oligodendroglioma                         Astrocytoma


 Application: Oligodendroglioma Component in GBM
Millions of Nuclei Defined by n Features

• Bottom-up analysis: let features define
  and drive the analysis

• Top-down analysis: use the features
  with existing diagnostic constructs
TCGA Whole Slide Images
    Step 1: Nuclei Segmentation
                 • Identify individual nuclei and their boundaries




                Jun Kong
Nuclear Analysis Workflow
    Step 1: Nuclei Segmentation
    Step 2: Feature Extraction




• Describe individual nuclei in terms of size,
  shape, and texture
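As a rough illustration of describing a nucleus by size, shape, and texture, the sketch below computes area, eccentricity, and simple intensity statistics from a binary mask. It is not the feature set used in the study. The function name, the list-of-lists image layout, and the choice of descriptors are assumptions.

```python
import math


def nuclear_features(mask, intensity):
    """Size, shape, and texture descriptors for one segmented nucleus.

    mask: 2-D list of 0/1 values marking the nucleus pixels.
    intensity: 2-D list of gray values with the same dimensions.
    """
    pixels = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(pixels)                                   # size
    if area == 0:
        raise ValueError("empty mask")
    mean_r = sum(r for r, _ in pixels) / area
    mean_c = sum(c for _, c in pixels) / area
    # Second central moments of the pixel coordinates.
    m_rr = sum((r - mean_r) ** 2 for r, _ in pixels) / area
    m_cc = sum((c - mean_c) ** 2 for _, c in pixels) / area
    m_rc = sum((r - mean_r) * (c - mean_c) for r, c in pixels) / area
    # Eigenvalues of the 2x2 moment matrix: spread along major and minor axes.
    half_trace = (m_rr + m_cc) / 2
    spread = math.sqrt(((m_rr - m_cc) / 2) ** 2 + m_rc ** 2)
    major, minor = half_trace + spread, half_trace - spread
    eccentricity = math.sqrt(max(0.0, 1 - minor / major)) if major > 0 else 0.0  # shape
    values = [intensity[r][c] for r, c in pixels]
    mean_i = sum(values) / area
    std_i = math.sqrt(sum((v - mean_i) ** 2 for v in values) / area)              # texture
    return {"area": area, "eccentricity": eccentricity,
            "mean_intensity": mean_i, "intensity_std": std_i}
```

A round nucleus gives an eccentricity near 0 and an elongated one approaches 1.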
Step 3: Nuclei Classification
Nuclear Qualities
[Figure: scale from 1 (Oligodendroglioma) to 10 (Astrocytoma)]
Representative Nuclei




Oligo               Astro
Comparison of Machine-based Classification
        to Human Based Classification




Separation of GBM, Oligo1, Oligo2 as Designated by Neuropathologists
Separation of GBM, Oligo1 and Oligo2 as Designated by Machine
Survival Analysis




Human            Machine
Gene Expression Correlates of High Oligo-Astro
    Ratio on Machine-based Classification

                               Oligo Related Genes

                               Myelin Basic Protein
                               Proteolipoprotein
                               HoxD1



                               Nuclear features most
                               Associated with Oligo
                               Signature Genes:

                               Circularity (high)
                               Eccentricity (low)
Millions of Nuclei Defined by n Features

• Bottom-up analysis: let nuclear features
  define and drive the analysis

• Top-down analysis: analyze features in
  context of existing diagnostic constructs
Direct Study of Relationship Between
                                                 vs




                                                                              Lee Cooper,
                                                                              Carlos Moreno
Nuclear Features Used to Classify GBMs
[Figure: Silhouette Area versus # Clusters (2 to 7); consensus clustering matrix (axis indices 20 to 160); Silhouette Value (0 to 1) for Clusters 1 to 3]




                                                           Consensus clustering of morphological signatures

                                                                    Study includes 200 million nuclei taken from 480
                                                                    slides corresponding to 167 distinct patients

                                                                    Each possibility evaluated using 2000 iterations of
                                                                    K-means to quantify co-clustering
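One way to quantify co-clustering over repeated K-means runs is a consensus matrix: the fraction of runs in which two samples land in the same cluster. The sketch below illustrates the idea under assumptions. The subsampling fraction, run count, plain Lloyd's K-means, and all function names are illustrative and not taken from the study.

```python
import random


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's algorithm on a list of tuples; returns a cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(points, k, runs=50, frac=0.8, seed=0):
    """Entry [i][j] is the fraction of runs containing both i and j in which they co-cluster."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(runs):
        idx = rng.sample(range(n), max(k, int(frac * n)))   # random subsample
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]
```

Repeating this for each candidate number of clusters and comparing how cleanly the matrix separates into blocks is one way to choose the number of clusters.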
Clustering identifies three morphological groups



                                       • Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides)
                                       • Named for functions of associated genes:
                                         Cell Cycle (CC), Chromatin Modification (CM),
                                         Protein Biosynthesis (PB)
                                       • Prognostically-significant (logrank p=4.5e-4)


[Figure: heatmap of Feature Indices (10 to 50) by group (CC, CM, PB); survival curves for CC, CM, PB, Survival (0 to 1) versus Days (0 to 3000)]
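For readers unfamiliar with the log-rank test quoted above, here is a minimal sketch of the two-group version. The study compares three groups, which needs the multi-group form of the statistic, so this is a simplification. The function name and input layout are assumptions.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-group log-rank test.

    events are 1 for an observed death and 0 for censoring.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    observed_a = expected_a = variance = 0.0
    for t in sorted({t for t, e, _ in data if e}):       # distinct death times
        at_risk = [g for (u, _, g) in data if u >= t]
        n, n_a = len(at_risk), at_risk.count(0)
        deaths = [g for (u, e, g) in data if u == t and e]
        d = len(deaths)
        observed_a += deaths.count(0)
        expected_a += d * n_a / n                        # expected deaths in group A
        if n > 1:                                        # hypergeometric variance
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))             # chi-square tail, 1 d.o.f.
    return chi2, p_value
```

A small p-value indicates that the survival curves of the two groups differ by more than chance would explain.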
                                       Associations
Molecular Correlates of MR Features Using TCGA Data




 MRIs of TCGA GBMs reviewed by 3-6 neuroradiologists using VASARI feature set and In
 Vivo Imaging tools

 MR Features compared to TCGA Transcriptional Classes and Genetic Alterations




                                                                 David Gutman
Capturing structured annotations
 and markups/ AIM Data Service
VASARI
      Feature Set




Scott Hwang
Chad Holder
Adam Flanders
Prognostic Significance of VASARI Features
 Tests Between Groups:   0-33% vs. 34-95% Proportion enhancing




            Test     ChiSquare    DF        P-Value
          Log-Rank    12.4775     3         0.0059*
          Wilcoxon    10.0802     3         0.0179*
a.k.a “Big Data”




                                       • Integrative Spatio-Temporal Analytics
                                       • Deep Integrative Biomedical Research
                                       • High End Computing/“Big Data” Computers,
                                         Systems Software
                                       • Analysis of Patient Populations
Titan – Peak Speed 30,000,000,000,000,000
                                       floating point operations per second!
Extreme DataCutter Prototype




                                       DataCutter
                                         Pipeline of filters connected through logical streams
                                         In transit processing
                                         Flow control between filters and streams
                                         Developed 1990s-2000s; led to IBM System S
                                       Extreme DataCutter
                                         Two level hierarchical pipeline framework
                                         In transit processing
                                         Coarse grained components coordinated by a Manager that
                                         schedules work on pipeline stages across nodes
                                         Fine grained pipeline operations managed at the node level
                                         Both levels employ filter/stream paradigm
                                         Bottom line – everything ends up as DAGs
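The filter/stream paradigm can be illustrated with a small single-node sketch: each filter runs in its own thread, and bounded queues act as streams that provide flow control. This is not the DataCutter API. `run_filter`, `run_pipeline`, the sentinel, and the `capacity` parameter are hypothetical names chosen for illustration.

```python
import queue
import threading

_END = object()  # sentinel marking end of stream


def run_filter(func, inbox, outbox):
    """Apply `func` to each item read from `inbox` and forward the result to `outbox`."""
    while True:
        item = inbox.get()
        if item is _END:
            outbox.put(_END)      # propagate end-of-stream downstream
            return
        outbox.put(func(item))    # blocks while the downstream stream is full


def run_pipeline(source, filters, capacity=4):
    """Connect filters with bounded queues (streams) and run each in its own thread."""
    streams = [queue.Queue(maxsize=capacity) for _ in range(len(filters) + 1)]
    threads = [threading.Thread(target=run_filter, args=(f, streams[i], streams[i + 1]))
               for i, f in enumerate(filters)]
    for t in threads:
        t.start()

    def feed():
        for item in source:
            streams[0].put(item)  # blocks when the first stream is full (flow control)
        streams[0].put(_END)

    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = streams[-1].get()
        if item is _END:
            break
        results.append(item)
    feeder.join()
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline(range(10), [lambda x: x + 1, lambda x: x * 2])` pushes ten items through two filters while data is still arriving, which is the in-transit processing idea in miniature.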
Extreme DataCutter – Two Level Model
                                       Node Level Work Scheduling
Brain Tumor Pipeline Scaling on Keeneland
                                       (100 Nodes)
Challenge: Structured/Unstructured Grid
                                       Calculations with Unpredictable Runtime
                                       Dependencies




                                        • Key Kernel in Distance Transform,
                                          Morphological Reconstruction, Delaunay
                                          Triangulation
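To show why such kernels have unpredictable runtime dependencies, here is a minimal queue-based sketch of grayscale morphological reconstruction by dilation. How far each value propagates depends on the data, so the amount of work per pixel is not known in advance. The function name and the 4-connected neighborhood are assumptions, and this is a sequential illustration rather than the CPU/GPU implementation discussed in the talk.

```python
from collections import deque


def reconstruct_by_dilation(marker, mask):
    """Grayscale morphological reconstruction of `marker` under `mask` (4-connectivity).

    Both arguments are 2-D lists of equal size. Marker values spread to
    neighbors but are never allowed to exceed the mask.
    """
    rows, cols = len(mask), len(mask[0])
    out = [[min(marker[r][c], mask[r][c]) for c in range(cols)] for r in range(rows)]
    pending = deque((r, c) for r in range(rows) for c in range(cols))
    while pending:
        r, c = pending.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                candidate = min(out[r][c], mask[nr][nc])
                if candidate > out[nr][nc]:
                    out[nr][nc] = candidate
                    pending.append((nr, nc))  # neighbor changed, so revisit its neighbors
    return out
```

The wavefront stored in `pending` grows and shrinks irregularly, which is what makes load balancing across cores or GPU threads difficult.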
“Speedup” relative to single CPU core
Large Scale Data Management




                                        • Represented by a complex data model capturing
                                          multi-faceted information including markups,
                                          annotations, algorithm provenance, specimen, etc.
                                        • Support for complex relationships and spatial
                                          query: multi-level granularities, relationships
                                          between markups and annotations, spatial and
                                          nested relationships
                                        • Highly optimized spatial query and analyses
                                        • Implemented in a variety of ways including
                                          optimized CPU/GPU, Hadoop/HDFS and IBM DB2
Spatial Centric – Pathology Imaging “GIS”
• Point query: human marked point inside a nucleus
• Window query: return markups contained in a rectangle
• Containment query: nuclear feature aggregation in tumor regions
• Spatial join query: algorithm validation/comparison
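The query types above can be illustrated with a few plain functions over polygon markups. This is a sketch only. The function names are hypothetical, and the brute-force, bounding-box based join is an assumption that stands in for the optimized spatial database queries the talk describes.

```python
def point_in_polygon(point, polygon):
    """Point query via ray casting; polygon is a list of (x, y) vertices."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 > y) != (y2 > y):                       # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def bounding_box(polygon):
    xs, ys = zip(*polygon)
    return min(xs), min(ys), max(xs), max(ys)


def window_query(polygons, window):
    """Return polygons whose bounding box lies inside window = (xmin, ymin, xmax, ymax)."""
    wx0, wy0, wx1, wy1 = window
    hits = []
    for poly in polygons:
        x0, y0, x1, y1 = bounding_box(poly)
        if wx0 <= x0 and wy0 <= y0 and x1 <= wx1 and y1 <= wy1:
            hits.append(poly)
    return hits


def spatial_join(set_a, set_b):
    """Pair markups from two result sets whose bounding boxes overlap (index pairs)."""
    pairs = []
    for i, a in enumerate(set_a):
        ax0, ay0, ax1, ay1 = bounding_box(a)
        for j, b in enumerate(set_b):
            bx0, by0, bx1, by1 = bounding_box(b)
            if ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1:
                pairs.append((i, j))
    return pairs
```

A production system would replace the nested loop in `spatial_join` with a spatial index and exact polygon intersection. The bounding-box test here only finds candidate pairs.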
Algorithm Validation: Intersection
between Two Result Sets (Spatial Join)
            PAIS: Example Queries




VLDB 2012




                                       Change Detection, Comparison, and Quantification
Approach to Integrated Sensor Data Analysis
                                       Framework




                                                              • Abstract templates specify
                                                                dataset geometry
                                                              • Templates describe
                                                                collections of space-time
                                                                regions
                                                              • Mapping to memory
                                                                hierarchies provided by user
                                                                defined mapping functions
                                                              • Leverages Parashar’s
                                                                DataSpaces
a.k.a “Big Data”




                                       • Integrative Spatio-Temporal Analytics
                                       • Deep Integrative Biomedical Research
                                       • High End Computing/“Big Data” Computers,
                                         Systems Software
                                       • Analysis of Patient Populations
Clinical Phenotype Characterization and the Emory
                                       Analytic Information Warehouse




                                       • Example Project: Find hot spots in readmissions within 30 days
                                          – What fraction of patients with a given principal diagnosis will be
                                            readmitted within 30 days?
                                          – What fraction of patients with a given set of diseases will be readmitted
                                            within 30 days?
                                          – How do severity and time course of co-morbidities affect
                                            readmissions?
                                          – Geographic analyses

                                       • Compare and contrast with UHC Clinical Data Base
                                          – Repeat analyses across all UHC hospitals
                                          – Are we performing the same?
                                          – How are UHC-curated groupings of patients (e.g., product lines) useful?

                                       • Need a repeatable process that we can apply identically to both
                                         local and UHC data


                                         Andrew Post, Sharath Cholleti, Doris Gao, Michel Monsour, Himanshu Rathod
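The first question in the example project (what fraction of patients with a given principal diagnosis are readmitted within 30 days) can be sketched as below. The record layout (`patient_id`, `admit`, `discharge`, `principal_dx`) and the rule of counting only the next encounter of the same patient are assumptions for illustration, not the warehouse's actual schema or definition.

```python
from collections import defaultdict
from datetime import date


def readmission_rates(encounters, window_days=30):
    """Fraction of encounters per principal diagnosis followed by a readmission.

    encounters: list of dicts with keys patient_id, admit, discharge (dates),
    and principal_dx. An encounter counts as readmitted when the same patient's
    next admission starts within `window_days` of its discharge.
    """
    by_patient = defaultdict(list)
    for enc in encounters:
        by_patient[enc["patient_id"]].append(enc)
    totals, readmits = defaultdict(int), defaultdict(int)
    for visits in by_patient.values():
        visits.sort(key=lambda e: e["admit"])
        for current, following in zip(visits, visits[1:] + [None]):
            dx = current["principal_dx"]
            totals[dx] += 1
            if following is not None:
                gap = (following["admit"] - current["discharge"]).days
                if 0 <= gap <= window_days:
                    readmits[dx] += 1
    return {dx: readmits[dx] / totals[dx] for dx in totals}
```

Because the function only depends on the record layout, the same code could be run unchanged on local and consortium data, which is the kind of repeatable process the slide calls for.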
Overall System
[Diagram components: Investigator, Query tools, I2b2 Web Server, I2b2 Database, Study-specific Database, Metadata Repository, Metadata Manager, Data Modeler, Data Processing, Query Specification, Database Mapper, Data Analyst, Source data (three sources)]
5-year Datasets from Emory and
                                       University HealthSystem Consortium




                                       • EUH, EUHM and WW (inpatient encounters)
                                       • Removed encounter pairs with chemotherapy and radiation
                                         therapy readmit encounters (CDW data)

                                       •   Encounter location (down to unit for Emory)
                                       •   Providers (Emory only)
                                       •   Discharge disposition
                                       •   Primary and secondary ICD9 codes
                                       •   Procedure codes
                                       •   DRGs
                                       •   Medication orders (Emory only)
                                       •   Labs (Emory only)
                                       •   Vitals (Emory only)
                                       •   Geographic information (CDW only + US Census and American
                                           Community Survey)
Using Emory & UHC Data to Find
                                       Associations With 30-day Readmits




                                       • Problem: “Raw” clinical and administrative variables
                                         are difficult to use for associative data mining
                                          – Too many diagnosis codes, procedure codes
                                          – Continuous variables (e.g., labs) require interpretation
                                          – Temporal relationships between variables are implicit
                                       • Solution: Transform the data into a much smaller set
                                         of variables using heuristic knowledge
                                          – Categorize diagnosis and procedure codes using code
                                            hierarchies
                                          – Classify continuous variables using standard
                                            interpretations (e.g., high, normal, low)
                                          – Identify temporal patterns (e.g., frequency, duration,
                                            sequence)
                                          – Apply standard data mining techniques
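The three transformations in the solution can be sketched as small helper functions. The ICD-9 chapter ranges listed are standard, but only a few chapters are included. The lab reference range passed to `classify_lab` and all function names are illustrative assumptions, not the warehouse's actual rules.

```python
# A few ICD-9 chapters, keyed by the range of three-digit category numbers.
ICD9_CHAPTERS = [
    (240, 279, "endocrine/metabolic"),
    (390, 459, "circulatory"),
    (460, 519, "respiratory"),
    (580, 629, "genitourinary"),
]


def categorize_icd9(code):
    """Map a numeric ICD-9 diagnosis code such as '428.0' to its chapter."""
    try:
        category = int(code.split(".")[0])
    except ValueError:
        return "other"            # V and E codes are not handled in this sketch
    for low, high, name in ICD9_CHAPTERS:
        if low <= category <= high:
            return name
    return "other"


def classify_lab(value, low, high):
    """Turn a continuous lab value into low/normal/high given a reference range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def max_events_in_window(event_days, window_days):
    """Frequency pattern: the largest number of events within any `window_days` span."""
    days = sorted(event_days)
    best = start = 0
    for end, day in enumerate(days):
        while day - days[start] > window_days:
            start += 1
        best = max(best, end - start + 1)
    return best
```

After these steps each encounter is described by a handful of categorical variables instead of thousands of raw codes and values, which is what makes standard data mining techniques applicable.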

Derived Variables



                                 •     30-day readmit
                                 •     The 9 Emory Enhanced Risk Assessment Tool diagnosis categories
                                 •     UHC product lines
                                 •     Variables derived from a combination of codes and/or laboratory test results
                                        – Obesity
                                        – Diabetes/uncontrolled diabetes
                                        – End-stage renal disease (ESRD)
                                        – Pressure ulcer
                                        – Sickle cell disease/sickle cell crisis
                                 •     Temporal variables derived over multiple encounters
                                        – Multiple MI
                                        – Multiple 30-day readmissions
                                        – Chemotherapy within 180 (or 365) days before surgery
                                        – Previous encounter within the last 90 (or 180) days
30-Day Readmission Rates for Derived
                                       Variables




                                       Emory Health Care
Geographic Analyses
                                       UHC Medicine General Product Line (#15)




                                                         Analytic Information Warehouse
Predictive Modeling for Readmission




                                       • Random forests (ensemble of decision trees)
                                         – Create a decision tree using a random subset of the
                                           variables in the dataset
                                         – Generate a large number of such trees
                                         – All trees vote to classify each test example in a
                                           training dataset
                                         – Generate a patient-specific readmission risk for each
                                           encounter
                                       • Rank the encounters by risk for a subsequent 30-
                                         day readmission


                                                    Sharath Cholleti
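The voting and ranking steps can be illustrated with a deliberately simplified ensemble. Each "tree" here is a one-split decision stump trained on a bootstrap sample and a random subset of variables, which is a simplification of a full random forest and not the model used in the study. All function names and parameters are assumptions.

```python
import random


def train_stump(rows, labels, features):
    """Best single-threshold split over `features`, minimizing misclassifications."""
    best = None
    for f in features:
        for threshold in sorted({row[f] for row in rows}):
            for sign in (1, -1):
                preds = [1 if sign * (row[f] - threshold) > 0 else 0 for row in rows]
                errors = sum(p != y for p, y in zip(preds, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, sign)
    return best[1:]                                       # (feature, threshold, sign)


def train_forest(rows, labels, n_trees=50, n_features=2, seed=0):
    """Train stumps on bootstrap samples, each seeing a random subset of variables."""
    rng = random.Random(seed)
    n, all_features = len(rows), list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n) for _ in range(n)]     # bootstrap sample
        feats = rng.sample(all_features, n_features)      # random variable subset
        forest.append(train_stump([rows[i] for i in sample],
                                  [labels[i] for i in sample], feats))
    return forest


def risk(forest, row):
    """Fraction of trees voting 'readmitted' (label 1) for one encounter."""
    votes = sum(1 if sign * (row[f] - t) > 0 else 0 for f, t, sign in forest)
    return votes / len(forest)


def rank_by_risk(forest, rows):
    """Indices of encounters ordered from highest to lowest predicted risk."""
    return sorted(range(len(rows)), key=lambda i: risk(forest, rows[i]), reverse=True)
```

The vote fraction serves as the patient-specific risk score, and sorting by it yields the ranked list of encounters described in the last bullet.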
Emory Readmission Rates for High and
                                       Low Risk Groups Generated with
                                       Random Forest
Status of Clinical Phenotype
                                       Characterization




                                       • Integrative dataset analysis can leverage patient
                                         information gathered over many encounters
                                       • Temporal analyses can generate derived variables that
                                         appear to correlate with readmissions
                                       • Predictive modeling has promise of providing decision
                                         support
                                       • Data Analytics arm of the Emory New Care Model
                                         Initiative led by Greg Esper
                                       • Ongoing analyses involve characterization of clinical
                                         phenotype in GWAS, biomarker and quality
                                         improvement efforts
                                       • Co-lead (with Bill Hersh) of CTSA CER Informatics
                                         taskforce dedicated to this issue
a.k.a “Big Data”




                                       • Integrative Spatio-Temporal Analytics
                                       • Deep Integrative Biomedical Research
                                       • High End Computing/“Big Data” Computers,
                                         Systems Software
                                       • Analysis of Patient Populations
Thanks to:
                                       •   In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish
                                           Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti,
                                           Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom
                                           Mikkelsen, Adam Flanders, Joel Saltz (Director)
                                       •   Digital Pathology R01(s): Foran and Saltz; Jun Kong, Sharath
                                           Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma,
                                           David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang,
                                           David J. Foran (Rutgers)
                                       •   Analytic Warehouse team: Andrew Post, Sharath Cholleti, Doris
                                           Gao, Michel Monsour, Himanshu Rathod
                                       •   In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz
                                       •   NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich
                                           Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max
                                           Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John
                                           Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl
                                           Jaffe
                                       •   ACTSI Biomedical Informatics Program: Marc Overcash, Tim
                                           Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis,
                                           Sharon Mason, Andrew Post, Alfredo Tirado-Ramos
                                       •   ORNL HPC collaboration: Scott Klasky, David Pugmire (ORNL)
Thanks to

                                       • National Cancer Institute
                                       • National Library of Medicine
                                       • National Science Foundation
                                       • Cardiovascular Research Grid (NHLBI)
                                       • Minority Health Grid (ARRA)
                                       • Emory Health Care
                                       • Kaiser Health Care
                                       • Winship Cancer Institute
                                       • Oak Ridge National Laboratory
                                       • Woodruff Health Sciences
Thanks!

Integrative Multi-Scale Analyses

  • 1. Integrative Multi-Scale Analyses Joel Saltz MD, PhD Emory University February 2013
  • 2. a.k.a “Big Data” • Integrative Spatio-Temporal Analytics • Deep Integrative Biomedical Research • High End Computing/“Big Data” Computers, Systems Software • Analysis of Patient Populations
  • 3. Application Targets Center for Comprehensive Informatics • Multi-dimensional spatial-temporal datasets – Radiology and Microscopy Image Analyses – Oil Reservoir Simulation/Carbon Sequestration/Groundwater Pollution Remediation – Biomass monitoring and disaster surveillance using multiple types of satellite imagery – Weather prediction using satellite and ground sensor data – Analysis of Results from Large Scale Simulations • Correlative and cooperative analysis of data from multiple sensor modalities and sources • What-if scenarios and multiple design choices or initial conditions
  • 4. Core Transformations Center for Comprehensive Informatics • Data Cleaning and Low Level Transformations • Data Subsetting, Filtering, Subsampling • Spatio-temporal Mapping and Registration • Object Segmentation • Feature Extraction, Object Classification • Spatio-temporal Aggregation • Change Detection, Comparison, and Quantification
  • 5. Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD= Joel Saltz)
  • 6.
  • 7.
  • 8.
  • 9. National Science Foundation Grand Challenge in Land Cover Dynamics • Remote sensing analysis of high resolution satellite images. • Databases of land cover dynamics are essential for global carbon models, biogeochemical cycling, hydrological modeling and ecosystem response modeling • Maps of the world's tropical rain forest during the past three decades. Larry Davis , Rama Chellappa , Joel Saltz , Alan Sussman , John Townshend
  • 10. Analysis of Computational Data; Uncertainty Quantification, Comparisons with Experimental Results Center for Comprehensive Informatics Dimitri Mavriplis, Raja Das, Joel Saltz
  • 11. a.k.a “Big Data” • Integrative Spatio-Temporal Analytics • Deep Integrative Biomedical Research • High End Computing/“Big Data” Computers, Systems Software • Analysis of Patient Populations
  • 12. Center for Comprehensive Informatics Whole Slide Imaging: Scale
  • 13. Pathology Computer Assisted Diagnosis Center for Comprehensive Informatics Shimada, Gurcan, Kong, Saltz
  • 14. Computerized Classification System for Grading Neuroblastoma • Background Identification • Image Decomposition (Multi-resolution levels) • Image Segmentation (EMLDA) • Feature Construction (2nd order statistics, Tonal Features) • Feature Extraction (LDA) + Classification (Bayesian) • Multi-resolution Layer Controller (Confidence Region) [Flowchart: training and testing paths through background check, down-sampling, segmentation, feature construction, feature extraction, classification, and confidence-region test]
  • 15. Using TCGA Data to Study Glioblastoma Diagnostic Improvement Molecular Classification Predictors of Progression
  • 16. TCGA Network Digital Pathology Neuroimaging
  • 17. Morphological Tissue Classification Center for Comprehensive Informatics Whole Slide Imaging Cellular Features Nuclei Segmentation Lee Cooper, Jun Kong
  • 18. Can we use image analysis of TCGA GBMs TO INFORM diagnostic criteria based on molecular or clinical endpoints? Nuclear Qualities Oligodendroglioma Astrocytoma Application: Oligodendroglioma Component in GBM
  • 19. Millions of Nuclei Defined by n Features • Bottom-up analysis: let features define and drive the analysis • Top-down analysis: use the features with existing diagnostic constructs
  • 20. TCGA Whole Slide Images Step 1: Nuclei Segmentation • Identify individual nuclei and their boundaries Jun Kong
  • 21. Nuclear Analysis Workflow Step 1: Nuclei Segmentation Step 2: Feature Extraction • Describe individual nuclei in terms of size, shape, and texture
  • 22. Step 3: Nuclei Classification • Nuclear Qualities scale: 1 (Oligodendroglioma) to 10 (Astrocytoma)
  • 24. Comparison of Machine-based Classification to Human Based Classification • Separation of GBM, Oligo1, Oligo2 as Designated by Machine • Separation of GBM, Oligo1 and Oligo2 as Designated by Neuropathologists
  • 26. Gene Expression Correlates of High Oligo-Astro Ratio on Machine-based Classification Oligo Related Genes Myelin Basic Protein Proteolipoprotein HoxD1 Nuclear features most Associated with Oligo Signature Genes: Circularity (high) Eccentricity (low)
  • 27. Millions of Nuclei Defined by n Features • Bottom-up analysis: let nuclear features define and drive the analysis • Top-down analysis: analyze features in context of existing diagnostic constructs
  • 28. Direct Study of Relationship Between vs Center for Comprehensive Informatics Lee Cooper, Carlos Moreno
  • 29. Nuclear Features Used to Classify GBMs • Consensus clustering of morphological signatures • Study includes 200 million nuclei taken from 480 slides corresponding to 167 distinct patients • Each possibility evaluated using 2000 iterations of K-means to quantify co-clustering [Plots: Silhouette Area vs. # Clusters; Silhouette Value by Cluster]
  • 30. Clustering identifies three morphological groups • Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides) • Named for functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB) • Prognostically significant (logrank p=4.5e-4) [Plots: Feature Indices by group (CC, CM, PB); Survival vs. Days for CC, CM, PB]
  • 31. Center for Comprehensive Informatics Associations
  • 32. Molecular Correlates of MR Features Using TCGA Data MRIs of TCGA GBMs reviewed by 3-6 neuroradiologists using VASARI feature set and In Vivo Imaging tools MR Features compared to TCGA Transcriptional Classes and Genetic Alterations David Gutman
  • 33. Capturing structured annotations and markups/ AIM Data Service
  • 34. VASARI Feature Set Scott Hwang Chad Holder Adam Flanders
  • 35. Prognostic Significance of VASARI Features • Proportion enhancing, tests between groups (0-33% vs. 34-95%): Log-Rank ChiSquare 12.4775, DF 3, P-Value 0.0059*; Wilcoxon ChiSquare 10.0802, DF 3, P-Value 0.0179*
  • 36. a.k.a “Big Data” • Integrative Spatio-Temporal Analytics • Deep Integrative Biomedical Research • High End Computing/“Big Data” Computers, Systems Software • Analysis of Patient Populations
  • 37. Titan – Peak Speed 30,000,000,000,000,000 floating point operations per second! Center for Comprehensive Informatics
  • 39. Extreme DataCutter Prototype • DataCutter: pipeline of filters connected through logical streams; in transit processing; flow control between filters and streams; developed 1990s-2000s; led to IBM System S • Extreme DataCutter: two-level hierarchical pipeline framework; in transit processing; coarse grained components coordinated by a Manager that coordinates work on pipeline stages between nodes; fine grained pipeline operations managed at the node level; both levels employ the filter/stream paradigm • Bottom line: everything ends up as DAGs
  • 40. Extreme DataCutter – Two Level Model Center for Comprehensive Informatics
  • 41. Center for Comprehensive Informatics Node Level Work Scheduling
  • 42. Brain Tumor Pipeline Scaling on Keeneland (100 Nodes) Center for Comprehensive Informatics
  • 43. Challenge: Structured/Unstructured Grid Calculations with Unpredictable Runtime Dependencies • Key Kernel in Distance Transform, Morphological Reconstruction, Delaunay Triangulation
  • 44. “Speedup” relative to single CPU core Center for Comprehensive Informatics
  • 45. Large Scale Data Management • Represented by a complex data model capturing multi-faceted information including markups, annotations, algorithm provenance, specimen, etc. • Support for complex relationships and spatial query: multi-level granularities, relationships between markups and annotations, spatial and nested relationships • Highly optimized spatial query and analyses • Implemented in a variety of ways including optimized CPU/GPU, Hadoop/HDFS and IBM DB2
  • 46. Spatial Centric – Pathology Imaging “GIS” • Point query: human marked point inside a nucleus • Window query: return markups contained in a rectangle • Containment query: nuclear feature aggregation in tumor regions • Spatial join query: algorithm validation/comparison
  • 47. Algorithm Validation: Intersection between Two Result Sets (Spatial Join) PAIS: Example Queries
  • 48. VLDB 2012 Center for Comprehensive Informatics Change Detection, Comparison, and Quantification
  • 49. Approach to Integrated Sensor Data Analysis Framework Center for Comprehensive Informatics • Abstract templates specify dataset geometry • Templates describe collections of space-time regions • Mapping to memory hierarchies provided by user defined mapping functions • Leverages Parashar’s DataSpaces
  • 50. a.k.a “Big Data” • Integrative Spatio-Temporal Analytics • Deep Integrative Biomedical Research • High End Computing/“Big Data” Computers, Systems Software • Analysis of Patient Populations
  • 51. Clinical Phenotype Characterization and the Emory Analytic Information Warehouse Center for Comprehensive Informatics • Example Project: Find hot spots in readmissions within 30 days – What fraction of patients with a given principal diagnosis will be readmitted within 30 days? – What fraction of patients with a given set of diseases will be readmitted within 30 days? – How does severity and time course of co-morbidities affect readmissions? – Geographic analyses • Compare and contrast with UHC Clinical Data Base – Repeat analyses across all UHC hospitals – Are we performing the same? – How are UHC-curated groupings of patients (e.g., product lines) useful? • Need a repeatable process that we can apply identically to both local and UHC data Andrew Post, Sharath Cholleti, Doris Gao, Michel Monsour, Himanshu Rathod
  • 52. Overall System [Architecture diagram. Recoverable labels: Metadata Repository, Metadata Manager, Data Modeler, i2b2 Web, i2b2 Server Database, Data Processing, Query Specification, Database Mapper, Study-specific Database, Query tools, Source data (three sources). Roles shown: Investigator, Data Analyst]
  • 53. 5-year Datasets from Emory and University HealthSystem Consortium (UHC) • EUH, EUHM and WW (inpatient encounters) • Removed encounter pairs with chemotherapy and radiation therapy readmit encounters (CDW data) • Encounter location (down to unit for Emory) • Providers (Emory only) • Discharge disposition • Primary and secondary ICD-9 codes • Procedure codes • DRGs • Medication orders (Emory only) • Labs (Emory only) • Vitals (Emory only) • Geographic information (CDW only + US Census and American Community Survey)
  • 54. Using Emory & UHC Data to Find Associations With 30-day Readmits • Problem: “Raw” clinical and administrative variables are difficult to use for associative data mining – Too many diagnosis codes, procedure codes – Continuous variables (e.g., labs) require interpretation – Temporal relationships between variables are implicit • Solution: Transform the data into a much smaller set of variables using heuristic knowledge – Categorize diagnosis and procedure codes using code hierarchies – Classify continuous variables using standard interpretations (e.g., high, normal, low) – Identify temporal patterns (e.g., frequency, duration, sequence) – Apply standard data mining techniques
  • 55. Derived Variables • 30-day readmit • The 9 Emory Enhanced Risk Assessment Tool diagnosis categories • UHC product lines • Variables derived from a combination of codes and/or laboratory test results – Obesity – Diabetes/uncontrolled diabetes – End-stage renal disease (ESRD) – Pressure ulcer – Sickle cell disease/sickle cell crisis • Temporal variables derived over multiple encounters – Multiple MI – Multiple 30-day readmissions – Chemotherapy within 180 (or 365) days before surgery – Previous encounter within the last 90 (or 180) days
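
The transformations on slides 54 and 55 can be sketched as follows. This is a minimal illustration that assumes each patient's encounters are available as (admit date, discharge date) pairs. The function names and the numeric thresholds in the usage lines are hypothetical, and the sketch omits the exclusions the slides mention (for example, chemotherapy and radiation therapy readmit encounters).

```python
from datetime import date
from typing import Dict, List, Tuple


def classify_lab(value: float, low: float, high: float) -> str:
    """Map a continuous lab value to low/normal/high given a reference range.

    The reference range is supplied by the caller; no clinical values are assumed here.
    """
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def derive_encounter_flags(
    encounters: List[Tuple[date, date]],
    readmit_days: int = 30,
    lookback_days: int = 90,
) -> List[Dict[str, object]]:
    """Derive temporal variables for one patient's encounters.

    readmit_30d: the next admission starts within `readmit_days` of this discharge.
    prior_encounter_90d: the previous discharge was within `lookback_days` of this admission.
    """
    ordered = sorted(encounters)
    rows = []
    for i, (admit, discharge) in enumerate(ordered):
        next_admit = ordered[i + 1][0] if i + 1 < len(ordered) else None
        prev_discharge = ordered[i - 1][1] if i > 0 else None
        rows.append({
            "admit": admit,
            "readmit_30d": next_admit is not None
            and (next_admit - discharge).days <= readmit_days,
            "prior_encounter_90d": prev_discharge is not None
            and (admit - prev_discharge).days <= lookback_days,
        })
    return rows


# Hypothetical patient with three inpatient encounters.
history = [
    (date(2012, 1, 1), date(2012, 1, 5)),
    (date(2012, 1, 20), date(2012, 1, 25)),
    (date(2012, 6, 1), date(2012, 6, 3)),
]
flags = derive_encounter_flags(history)

# Hypothetical reference range, for illustration only.
label = classify_lab(3.1, low=3.5, high=5.0)
```

Each encounter ends up with a small set of categorical and boolean variables, which is the form the slides describe as suitable for standard data mining techniques.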
  • 56. 30-Day Readmission Rates for Derived Variables (Emory Health Care)
  • 57. Geographic Analyses: UHC Medicine General Product Line (#15)
  • 58. Predictive Modeling for Readmission • Random forests (ensemble of decision trees) – Create a decision tree using a random subset of the variables in the dataset – Generate a large number of such trees – All trees vote to classify each test example – Generate a patient-specific readmission risk for each encounter • Rank the encounters by risk for a subsequent 30-day readmission • Sharath Cholleti
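
The voting scheme on slide 58 can be sketched in a few lines. This is a deliberately simplified stand-in, not the model used in the talk: each "tree" is a one-level decision stump over binary variables, trained on a bootstrap sample with a random subset of the variables. The feature columns, the toy data and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

Stump = Tuple[int, Dict[int, int]]  # (feature index, {feature value: predicted label})


def train_stump(rows: List[List[int]], labels: List[int], features: List[int]) -> Stump:
    """Choose the binary feature whose split misclassifies the fewest training rows."""
    best = None
    for f in features:
        sides: Dict[int, List[int]] = {0: [], 1: []}
        for row, y in zip(rows, labels):
            sides[row[f]].append(y)
        # Majority label on each side of the split (ties and empty sides handled explicitly).
        pred = {v: int(sum(ys) * 2 >= len(ys)) if ys else 0 for v, ys in sides.items()}
        errors = sum(pred[row[f]] != y for row, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, f, pred)
    return best[1], best[2]


def train_forest(rows, labels, n_trees=25, n_features=2, seed=0) -> List[Stump]:
    """Train stumps on bootstrap samples, each seeing a random subset of the variables."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feats = rng.sample(all_features, n_features)    # random subset of variables
        forest.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def readmission_risk(forest: List[Stump], row: List[int]) -> float:
    """Fraction of trees voting 'readmitted': a per-encounter risk score in [0, 1]."""
    return sum(pred[row[f]] for f, pred in forest) / len(forest)


# Toy encounters; columns: [prior_encounter_90d, diabetes, esrd] (hypothetical data).
rows = [[1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 1]]
labels = [1, 1, 0, 0, 1, 0]  # 1 = readmitted within 30 days

forest = train_forest(rows, labels)
risks = [readmission_risk(forest, r) for r in rows]
# Rank encounters from highest to lowest estimated risk.
ranked = sorted(range(len(rows)), key=lambda i: risks[i], reverse=True)
```

A production random forest would grow full trees and typically draw a fresh variable subset at every split; the stump version is kept only because it makes the vote-then-rank step easy to follow.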
  • 59. Emory Readmission Rates for High and Low Risk Groups Generated with Random Forest
  • 60. Status of Clinical Phenotype Characterization • Integrative dataset analysis can leverage patient information gathered over many encounters • Temporal analyses can generate derived variables that appear to correlate with readmissions • Predictive modeling has promise of providing decision support • Data Analytics arm of the Emory New Care Model Initiative led by Greg Esper • Ongoing analyses involve characterization of clinical phenotype in GWAS, biomarker and quality improvement efforts • Co-lead (with Bill Hersh) of CTSA CER Informatics taskforce dedicated to this issue
  • 61. a.k.a. “Big Data” • Integrative Spatio-Temporal Analytics • Deep Integrative Biomedical Research • High End Computing/“Big Data” Computers, Systems Software • Analysis of Patient Populations
  • 62. Thanks to: • In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director) • Digital Pathology R01(s): Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers) • Analytic Warehouse team: Andrew Post, Sharath Cholleti, Doris Gao, Michel Monsour, Himanshu Rathod • In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz • NIH/in silico TCGA Imaging Group: Scott Hwang, Bob Clifford, Erich Huang, Dima Hammoud, Manal Jilwan, Prashant Raghavan, Max Wintermark, David Gutman, Carlos Moreno, Lee Cooper, John Freymann, Justin Kirby, Arun Krishnan, Seena Dehkharghani, Carl Jaffe • ACTSI Biomedical Informatics Program: Marc Overcash, Tim Morris, Tahsin Kurc, Alexander Quarshie, Circe Tsui, Adam Davis, Sharon Mason, Andrew Post, Alfredo Tirado-Ramos • ORNL HPC collaboration: Scott Klasky, David Pugmire (ORNL)
  • 63. Thanks to • National Cancer Institute • National Library of Medicine • National Science Foundation • Cardiovascular Research Grid (NHLBI) • Minority Health Grid (ARRA) • Emory Health Care • Kaiser Health Care • Winship Cancer Institute • Oak Ridge National Laboratory • Woodruff Health Sciences

Editor's notes

  1. Combine with next slide. Graphical representation