Visualizing Patient Cohorts: Integrating Data Types, Relationships, and Time
1. Visualizing Patient Cohorts
Integrating Data Types, Relationships, and Time
Nils Gehlenborg, PhD
Department of Biomedical Informatics
Harvard Medical School
http://gehlenborglab.org | nils@hms.harvard.edu | @ngehlenborg
4. Typical Characteristics
Dozens to thousands of patients
One or more samples per patient: tumor &
normal tissue, primary tumor & metastatic
tumor(s), multiple time points, etc.
Many attributes per sample: omics data,
clinical measurements, outcomes, etc.
5. StratomeX
Discovering Subtypes in Tumor Cohorts
Marc Streit, Alexander Lex, Samuel Gratzl, Christian Partl, Dieter Schmalstieg, Hanspeter Pfister,
Peter Park, Nils Gehlenborg
Guided Visual Exploration of Genomic Stratifications in Cancer
Nature Methods, 11, 884–885, 2014
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
7. microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
mRNA expression
The Cancer Genome Atlas
10,000+ patients
20+ tumor types
16. Tumor Subtypes
PROBLEM 1
Visualize overlap of patient sets across two or more stratifications.
PROBLEM 2
Visualize characteristics of patient sets within a stratification of interest.
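For Problem 1, the overlap between two stratifications can be reduced to a table of counts, one per pair of patient sets. This is a minimal sketch, not the StratomeX implementation. The function name `overlap_counts` and the patient and cluster labels are hypothetical, and each stratification is assumed to be a plain mapping from patient ID to set label.

```python
from collections import Counter


def overlap_counts(strat_a, strat_b):
    """Count patients shared by each pair of sets from two stratifications.

    strat_a, strat_b: dicts mapping patient ID -> set label (assumed format).
    Patients present in only one stratification are ignored.
    """
    shared_patients = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared_patients)


# Hypothetical example data: an mRNA clustering and a mutation status.
mrna = {"p1": "c1", "p2": "c1", "p3": "c2", "p4": "c2"}
mutation = {"p1": "mutated", "p2": "wt", "p3": "wt", "p4": "wt", "p5": "mutated"}

for (a, b), n in sorted(overlap_counts(mrna, mutation).items()):
    print(f"{a} / {b}: {n} patient(s)")
```

Each count could serve as the width of a connection drawn between two patient sets.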
19. Tumor Subtypes
PROBLEM 3
Identify relevant stratifications, pathways, and clinical variables.
20. Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
Query
Stratifications
Clinical Params
Pathways
Guided
Exploration
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, P Park, N Gehlenborg, Nature Methods (2014)
22. Individual sets with large overlap: Jaccard Index
Overall similarity of stratifications: Adjusted Rand Index
Survival: Log Rank Score (one vs rest)
Queries to retrieve stratifications
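The two overlap scores named above can be sketched in a few lines. This is an illustration under assumptions, not the code used in StratomeX: the function names are hypothetical, patient sets are plain Python sets, and each stratification is a list of labels in the same patient order. The Adjusted Rand Index follows the pair-counting formulation of Hubert & Arabie.

```python
from collections import Counter
from math import comb


def jaccard_index(set_a, set_b):
    """Size of the intersection divided by size of the union of two patient sets."""
    set_a, set_b = set(set_a), set(set_b)
    union = set_a | set_b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(set_a & set_b) / len(union)


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same patients."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same patients")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two patients: trivially identical
    # Pairs of patients grouped together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # e.g. both labelings put everyone in one group
    return (together_both - expected) / (maximum - expected)


print(jaccard_index({"p1", "p2", "p3"}, {"p2", "p3", "p4"}))
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The Jaccard Index compares one patient set against another, while the Adjusted Rand Index compares two whole stratifications and ignores how the groups are named.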
Gene Set Enrichment Score: original GSEA or Parametric
Analysis of Gene Set Enrichment (PAGE) (one vs rest)
Queries to retrieve pathways
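As a rough illustration of the parametric score, PAGE (Kim & Volsky) compares the mean score of the genes in a set against the mean of all genes, scaled by the set size and the overall standard deviation: Z = (S_m - mu) * sqrt(m) / sigma. The sketch below assumes that per-gene scores (for example a fold change of one cluster vs the rest) are already computed. The function name, the gene names, and the use of the population standard deviation are assumptions made for this example.

```python
from math import sqrt
from statistics import mean, pstdev


def page_z_score(gene_scores, gene_set):
    """PAGE-style Z score for one gene set.

    gene_scores: dict mapping gene -> score (e.g. fold change, one vs rest).
    gene_set: iterable of gene names; genes without a score are skipped.
    """
    all_scores = list(gene_scores.values())
    mu = mean(all_scores)
    sigma = pstdev(all_scores)  # assumption: population standard deviation
    members = [gene_scores[g] for g in gene_set if g in gene_scores]
    if not members:
        raise ValueError("no gene of the set has a score")
    if sigma == 0:
        raise ValueError("all gene scores are identical")
    return (mean(members) - mu) * sqrt(len(members)) / sigma


# Hypothetical scores for four genes and a two-gene pathway.
scores = {"G1": 2.0, "G2": 2.0, "G3": 0.0, "G4": 0.0}
print(page_z_score(scores, {"G1", "G2"}))
```

A large positive or negative Z suggests that the pathway as a whole is shifted in the cluster of interest relative to the remaining patients.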
L Hubert & P Arabie, Journal of Classification (1985)
A Subramanian et al., PNAS (2005)
S-Y Kim & DJ Volsky, BMC Bioinformatics (2005)
36. Cluster Refinement
Adjust cluster (i.e. subtype) membership based on within- and between-cluster
metrics in the context of other data
M Kern, A Lex, N Gehlenborg, C Johnson, BMC Bioinformatics (2017)
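One way to picture a within- vs between-cluster metric is the silhouette score, used here purely as a stand-in: the slides do not state which metrics the tool computes. The sketch assumes each sample is a numeric tuple and uses Euclidean distance; the function name and the data are hypothetical. Samples with a negative score sit closer to another cluster than to their own and are candidates for reassignment.

```python
from math import dist


def silhouette_scores(points, labels):
    """Per-sample silhouette: (b - a) / max(a, b).

    a = mean distance to the other members of the sample's own cluster,
    b = smallest mean distance to the members of any other cluster.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # undefined case, scored as neutral here
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores


# Hypothetical samples: the last one is labeled "A" but lies next to cluster "B".
points = [(0, 0), (0, 1), (10, 0), (10, 1), (9, 0)]
labels = ["A", "A", "B", "B", "A"]
for point, label, score in zip(points, labels, silhouette_scores(points, labels)):
    print(point, label, round(score, 2))
```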
44. Motivation
1. StratomeX is limited to a rigid columnar layout
2. StratomeX only shows connections on a block level, not for individual samples
3. StratomeX only supports exploration along the sample/patient dimension
52. OncoThreads
Incorporating Longitudinal Information
Theresa Harbig, Sabrina Nusrat, Alex Thomson, Hans Bitter, Tali Mazor, Ethan Cerami, Nils Gehlenborg
Visualization of Longitudinal Cancer Genomics Data
Work in Progress
Sabrina Nusrat
Harvard
Theresa Harbig
Harvard
53. Motivation
1. Cohorts of patients with longitudinal sample information
2. Information about what happened between samples critical for interpretation
3. Application to longitudinal cancer cohorts or clinical trials
56. Take Aways
Despite highly heterogeneous data, the “block and ribbon” approaches are able
to integrate a wide range of data types
Integration of auxiliary visualization types (pathways, Kaplan-Meier plots, box
plots, etc.) extends the possibilities
Ability to aggregate data is critical to these approaches
57. Next Steps
Better integration of specialized visualizations with support for faceting and
aggregation
Scale to hundreds of thousands or millions of individuals (UK Biobank, All of Us, etc.)
Integration with data management systems (e.g. i2b2 TranSMART, cBioPortal)
- Challenge: generally not designed to support visualization, e.g. aggregation
- Opportunity: easier to deploy visualizations in real-world settings
Integration with analytical backends (e.g. Jupyter Notebooks or pipelines)
58. Broadening Impact
Consider role of academic visualization research in real-world settings
- novel visualization techniques are informing future work (cf. Domino)
- can fill niche not addressed by or not viable for commercial products
Collaborations with industry beginning on day 1 are ideal (cf. OncoThreads)
- generally true for any visualization project and any project partner!
Better education about the strengths and weaknesses of visualization, so that effort
is invested in the right places and disappointment and frustration are avoided