Joel Saltz MD, PhD discusses data and computational challenges in integrative biomedical informatics. His research center analyzes complex patient data like medical images, pathology slides, and "omic" data to characterize diseases at multiple scales. Machine learning is used to automatically segment and classify features in images and identify patterns across different data types that can improve disease classification, predict outcomes, and uncover new biology. Large computing resources are required to handle and analyze huge biomedical datasets.
Data and Computational Challenges in Integrative Biomedical Informatics
1. Data and Computational Challenges in Integrative Biomedical Informatics
Joel Saltz MD, PhD
Chair, Department of Biomedical Informatics
Director, Center for Comprehensive Informatics
Emory University
Adjunct Professor, CSE, CS
College of Computing, Georgia Tech
3. Integrative Biomedical Informatics Analytics
Center for Comprehensive Informatics
• Anatomic/functional characterization at the fine level (Pathology) and gross level (Radiology)
• High-throughput, multi-scale image segmentation, feature extraction, and analysis of features
• Integration of anatomic/functional characterization with multiple types of "omic" information
[Diagram: Radiology Imaging, Pathologic Features, and "Omic" Data linked to Patient Outcome]
5. Quantitative Feature Analysis in Pathology: Emory In Silico Center for Brain Tumor Research (PI = Dan Brat, PD = Joel Saltz)
6. Using TCGA Data to Study Glioblastoma
Diagnostic Improvement
Molecular Classification
Predictors of Progression
7. Millions of Nuclei Defined by n Features
• Top-down analysis: use the features with existing diagnostic constructs
• Bottom-up analysis: let features define and drive the analysis
8. TCGA Whole Slide Images
Step 1: Nuclei Segmentation
• Identify individual nuclei and their boundaries
Jun Kong
9. Nuclear Analysis Workflow
Step 1: Nuclei Segmentation → Step 2: Feature Extraction
• Describe individual nuclei in terms of size, shape, and texture
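The two-step workflow (segment nuclei, then extract per-nucleus features) can be sketched as follows. This is a minimal illustration using simple thresholding and connected components; the actual TCGA pipeline uses far more sophisticated segmentation, and all thresholds, feature definitions, and data here are illustrative assumptions.

```python
import numpy as np
from scipy import ndimage

def segment_and_extract(image, threshold=0.5):
    """Step 1: segment nuclei (crude thresholding + connected components).
    Step 2: extract size, shape, and texture features per nucleus."""
    mask = image > threshold                    # foreground mask (assumed threshold)
    labels, n = ndimage.label(mask)             # each connected component = one nucleus
    features = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        area = int(ys.size)                                  # size
        h = int(ys.max() - ys.min() + 1)                     # bounding-box height
        w = int(xs.max() - xs.min() + 1)                     # bounding-box width
        elongation = max(h, w) / min(h, w)                   # crude shape descriptor
        texture = float(image[ys, xs].std())                 # intensity variation proxy
        features.append({"area": area, "elongation": elongation, "texture": texture})
    return features

# toy image with two bright "nuclei"
img = np.zeros((20, 20))
img[2:6, 2:6] = 1.0      # 4x4 blob
img[10:14, 10:18] = 1.0  # 4x8 blob
feats = segment_and_extract(img)  # two nuclei with area/shape/texture features
```

Real whole-slide analysis runs this kind of kernel over millions of tiles, which is what motivates the supercomputing resources discussed later.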
11. Comparison of Machine-based Classification to Human-based Classification
Separation of GBM, Oligo1, and Oligo2 as designated by neuropathologists vs. as designated by machine
13. Gene Expression Correlates of High Oligo-Astro Ratio on Machine-based Classification
Oligo-related genes: Myelin Basic Protein, Proteolipoprotein, HoxD1
Nuclear features most associated with Oligo signature genes: circularity (high), eccentricity (low)
14. Millions of Nuclei Defined by n Features
• Top-down analysis: analyze features in context of existing diagnostic constructs
• Bottom-up analysis: let nuclear features define and drive the analysis
15. Direct Study of Relationship Between
vs
Lee Cooper, Carlos Moreno
16. Clustering identifies three morphological groups
• Analyzed 200 million nuclei from 162 TCGA GBMs (462 slides)
• Groups named for functions of associated genes: Cell Cycle (CC), Chromatin Modification (CM), Protein Biosynthesis (PB)
• Prognostically significant (logrank p = 4.5e-4)
[Figure: heatmap of feature indices for the CC, CM, and PB groups, and survival curves by group over 0–3000 days]
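The bottom-up grouping described above can be sketched as unsupervised clustering of per-tumor nuclear feature profiles. The slide does not specify the algorithm used, so k-means stands in here; the data, feature counts, and cluster separations are all synthetic. A survival comparison (e.g., a logrank test, as reported on the slide) would then be run on the resulting group labels.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in: each row is one tumor's aggregated nuclear feature profile
# (e.g., mean circularity, eccentricity, texture); three well-separated groups.
X = np.vstack([
    rng.normal(0.0, 0.1, (50, 5)),
    rng.normal(1.0, 0.1, (50, 5)),
    rng.normal(2.0, 0.1, (50, 5)),
])

# Cluster tumors into three morphological groups (analogous to CC / CM / PB)
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
groups = km.labels_  # group assignment per tumor, to be tested against survival
```

In practice the group labels would be joined to clinical outcome data and compared with a logrank test to assess prognostic significance.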
19. Clinical Phenotype Characterization and the Emory Analytic Information Warehouse
• Example Project: Find hot spots in readmissions within 30 days
– What fraction of patients with a given principal diagnosis will be readmitted within 30 days?
– What fraction of patients with a given set of diseases will be readmitted within 30 days?
– How do the severity and time course of co-morbidities affect readmissions?
– Geographic analyses
• Compare and contrast with the UHC Clinical Data Base
– Repeat analyses across all UHC hospitals
– Are we performing the same?
– How are UHC-curated groupings of patients (e.g., product lines) useful?
• Need a repeatable process that we can apply identically to both local and UHC data
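The first hot-spot question above reduces to a grouped rate computation. A minimal sketch, assuming encounters are records with a principal diagnosis and a 30-day readmission flag (field names are illustrative, not from the actual warehouse schema):

```python
from collections import defaultdict

def readmit_fraction_by_diagnosis(encounters):
    """For each principal diagnosis, compute the fraction of encounters
    followed by a readmission within 30 days."""
    counts = defaultdict(lambda: [0, 0])  # dx -> [readmit count, total count]
    for e in encounters:
        c = counts[e["principal_dx"]]
        c[0] += int(e["readmit_30d"])
        c[1] += 1
    return {dx: r / total for dx, (r, total) in counts.items()}

# toy example (hypothetical diagnoses and flags)
encounters = [
    {"principal_dx": "CHF", "readmit_30d": True},
    {"principal_dx": "CHF", "readmit_30d": False},
    {"principal_dx": "COPD", "readmit_30d": False},
]
rates = readmit_fraction_by_diagnosis(encounters)  # {'CHF': 0.5, 'COPD': 0.0}
```

The same computation, keyed on disease sets or geography instead of a single diagnosis, covers the other questions on the slide; the repeatability requirement is what pushes this into a scripted warehouse process rather than ad hoc queries.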
20. Overall System
[System diagram: components include an i2b2 web server and i2b2 database, a metadata repository maintained by a metadata manager, query specifications from data modelers, data processing and database mapping by data analysts, query tools for investigators, and study-specific databases fed by multiple source data systems]
21. 5-year Datasets from Emory and University Healthcare Consortium
• EUH, EUHM and WW (inpatient encounters)
• Removed encounter pairs with chemotherapy and radiation therapy readmit encounters (CDW data)
• Encounter location (down to unit for Emory)
• Providers (Emory only)
• Discharge disposition
• Primary and secondary ICD9 codes
• Procedure codes
• DRGs
• Medication orders (Emory only)
• Labs (Emory only)
• Vitals (Emory only)
• Geographic information (CDW only, plus US Census and American Community Survey)
22. Using Emory & UHC Data to Find Associations With 30-day Readmits
• Problem: "raw" clinical and administrative variables are difficult to use for associative data mining
– Too many diagnosis codes and procedure codes
– Continuous variables (e.g., labs) require interpretation
– Temporal relationships between variables are implicit
• Solution: transform the data into a much smaller set of variables using heuristic knowledge
– Categorize diagnosis and procedure codes using code hierarchies
– Classify continuous variables using standard interpretations (e.g., high, normal, low)
– Identify temporal patterns (e.g., frequency, duration, sequence)
– Apply standard data mining techniques
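The first two transformations above can be sketched with two small helpers: one bins a continuous lab value into a standard interpretation, and one rolls a detailed ICD-9 code up to its 3-digit category. The reference ranges and the rollup rule here are illustrative assumptions, not the warehouse's actual heuristics.

```python
def lab_category(value, low, high):
    """Classify a continuous lab value against an assumed reference range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

def icd9_category(code):
    """Collapse a detailed ICD-9 code to its 3-digit category
    (a simple stand-in for hierarchy-based grouping)."""
    return code.split(".")[0]

k = lab_category(3.2, low=3.5, high=5.0)  # "low" for a potassium-style range
cat = icd9_category("428.21")             # "428", the heart failure category
```

Reducing thousands of raw codes and continuous values to a few hundred categorical variables like these is what makes the standard data mining step tractable.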
23. Derived Variables
• 30-day readmit
• The 9 Emory Enhanced Risk Assessment Tool diagnosis categories
• UHC product lines
• Variables derived from a combination of codes and/or laboratory test results
– Obesity
– Diabetes/uncontrolled diabetes
– End-stage renal disease (ESRD)
– Pressure ulcer
– Sickle cell disease/sickle cell crisis
• Temporal variables derived over multiple encounters
– Multiple MI
– Multiple 30-day readmissions
– Chemotherapy within 180 (or 365) days before surgery
– Previous encounter within the last 90 (or 180) days
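A temporal derived variable like "previous encounter within the last 90 (or 180) days" can be computed as a simple window check over a patient's encounter history. A minimal sketch (the function and field shapes are illustrative):

```python
from datetime import date, timedelta

def had_recent_encounter(admit_date, prior_discharge_dates, window_days=90):
    """Derived temporal flag: True if any prior discharge falls within
    `window_days` before this admission."""
    cutoff = admit_date - timedelta(days=window_days)
    return any(cutoff <= d < admit_date for d in prior_discharge_dates)

# toy example: one discharge 42 days before admission, one over a year before
flag = had_recent_encounter(date(2012, 6, 1), [date(2012, 4, 20), date(2011, 1, 3)])
```

The other temporal variables on the slide (multiple MIs, chemotherapy before surgery) follow the same pattern with different event types and window sizes.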
24. 30-Day Readmission Rates for Derived Variables
Emory Health Care
25. Geographic Analyses
UHC Medicine General Product Line (#15)
26. Predictive Modeling for Readmission
• Random forests (ensemble of decision trees)
– Create a decision tree using a random subset of the variables in the dataset
– Generate a large number of such trees
– All trees vote to classify each test example
– Generate a patient-specific readmission risk for each encounter
• Rank the encounters by risk for a subsequent 30-day readmission
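The steps above (grow many trees on random variable subsets, vote, then rank encounters by risk) can be sketched with scikit-learn's random forest as a stand-in for the actual model; the feature matrix and labels here are entirely synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in: rows = encounters, cols = derived variables (slide 23)
X = rng.random((200, 6))
# Synthetic 30-day readmit label, loosely driven by the first variable
y = (X[:, 0] + 0.3 * rng.random(200) > 0.8).astype(int)

# Each tree sees a random subset of variables at each split; trees vote
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Vote fraction for the readmit class = per-encounter readmission risk
risk = forest.predict_proba(X)[:, 1]
ranked = np.argsort(risk)[::-1]           # encounters ranked by descending risk
top_decile = ranked[: len(ranked) // 10]  # flag the highest-risk 10%
```

Ranking by the forest's predicted probability, rather than the hard class vote, is what yields the high/low risk groups compared on the following slides.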
27. Emory Readmission Rates for High and Low Risk Groups Generated with Random Forest
28. Predictive Modeling Applied to 180 UHC Hospitals
[Chart: readmission fraction of the top 10% high-risk patients at each hospital (y-axis 0 to 0.9), comparing an all-hospital model against individual hospital models]
29. Status of Healthcare Data Analytics
• Integrative dataset analysis can leverage patient information gathered over many encounters
• Temporal analyses can generate derived variables that appear to correlate with readmissions
• Predictive modeling holds promise for providing decision support
• Data Analytics arm of the Emory New Care Model Initiative led by Greg Esper
• Ongoing analyses involve characterization of clinical phenotype in GWAS, biomarker, and quality improvement efforts
• Co-lead (with Bill Hersh) of the CTSA CER Informatics taskforce dedicated to this issue
31. Supercomputing – Collaboration with ORNL: Titan
Peak speed: 30,000,000,000,000,000 floating-point operations per second!
33. Core Transformations for Multi-scale Pipelines
• Data Cleaning and Low-Level Transformations
• Data Subsetting, Filtering, Subsampling
• Spatio-temporal Mapping and Registration
• Object Segmentation
• Feature Extraction, Object Classification
• Spatio-temporal Aggregation
• Change Detection, Comparison, and Quantification
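These core transformations compose into pipelines: each stage consumes the previous stage's output. A minimal sketch of that composition pattern, with trivial toy stages standing in for the real cleaning, subsetting, and feature-extraction operators:

```python
from functools import reduce

def make_pipeline(*stages):
    """Compose transformation stages left-to-right into one callable."""
    return lambda data: reduce(lambda d, stage: stage(d), stages, data)

# toy stages standing in for cleaning -> filtering -> feature extraction
clean = lambda xs: [x for x in xs if x is not None]        # data cleaning
keep_positive = lambda xs: [x for x in xs if x > 0]        # subsetting/filtering
extract = lambda xs: [{"value": x, "sq": x * x} for x in xs]  # feature extraction

run = make_pipeline(clean, keep_positive, extract)
out = run([3, None, -1, 2])  # [{'value': 3, 'sq': 9}, {'value': 2, 'sq': 4}]
```

At supercomputer scale each stage would operate on distributed image tiles or spatio-temporal chunks rather than an in-memory list, but the stage-composition structure is the same.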