Interpreting data from cohort studies is challenging when clinical and molecular data from hundreds to thousands of patient samples, potentially spanning multiple time points, need to be integrated. In this presentation, I will discuss how data visualization can drive or support this process, using tools that apply the concept of “divide and conquer” to visual exploration. I will present our early work on StratomeX and illustrate how this approach led to techniques such as Domino and LineUp. I will also introduce OncoThreads and Lineage, tools that we designed for visualizing cohorts with temporal and genealogical information, respectively.
1. Patients, Genomes, Time:
Visualizing Disease Cohorts
Nils Gehlenborg, PhD
Department of Biomedical Informatics
Harvard Medical School
http://gehlenborglab.org | nils@hms.harvard.edu | @ngehlenborg
Disease Cohorts
5. Characteristics
Dozens to thousands of patients
One or more samples per patient: tumor &
normal tissue, primary tumor & metastatic
tumor(s), multiple time points, etc.
Many attributes per sample: omics data,
clinical measurements, outcomes, etc.
6. StratomeX
Discovering Subtypes in Tumor Cohorts
Marc Streit, Alexander Lex, Samuel Gratzl, Christian Partl, Dieter Schmalstieg, Hanspeter Pfister,
Peter Park, Nils Gehlenborg
Guided Visual Exploration of Genomic Stratifications in Cancer
Nature Methods, 11, 884–885, 2014
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
Funded by
NIH TCGA
8. microRNA expression
DNA methylation
protein expression
copy number variants
mutation calls
clinical parameters
mRNA expression
The Cancer Genome Atlas
10,000+ patients
20+ tumor types
14. Tumor Subtypes
PROBLEM 1
Visualize overlap of patient sets across two or more stratifications.
PROBLEM 2
Visualize characteristics of patient sets within a stratification of interest.
19. Tumor Subtypes
PROBLEM 3
Identify relevant stratifications, pathways, and clinical variables.
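Problem 1 can be made concrete with a small sketch. The Python below counts how many patients each pair of groups from two stratifications has in common, which is the kind of quantity an overlap view between two columns would need to encode. The function name `overlap_table`, the patient IDs, and the group labels are hypothetical and chosen only for illustration; this is not taken from the StratomeX implementation.

```python
from collections import Counter


def overlap_table(strat_a, strat_b):
    """Count patients shared by each pair of groups from two stratifications.

    Each stratification maps a patient ID to a group label. Patients that
    appear in only one of the two stratifications are ignored.
    """
    shared_patients = strat_a.keys() & strat_b.keys()
    return Counter((strat_a[p], strat_b[p]) for p in shared_patients)


# Hypothetical toy data: an mRNA clustering and a mutation status.
mrna_cluster = {"P1": "C1", "P2": "C1", "P3": "C2", "P4": "C2", "P5": "C2"}
mutation = {
    "P1": "mutated",
    "P2": "wild type",
    "P3": "mutated",
    "P4": "mutated",
    "P6": "mutated",
}

table = overlap_table(mrna_cluster, mutation)
for (group_a, group_b), count in sorted(table.items()):
    print(f"{group_a} & {group_b}: {count} patient(s)")
```

Only patients present in both stratifications are counted here; a real tool would also have to decide how to display patients with missing data.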
20. Is there a mutation that overlaps with this mRNA cluster?
Is there a CNV that affects survival?
Is there a pathway that is enriched in this cluster?
Is there a mutually exclusive mutation?
Query
Rank
Visualize
Stratifications
Clinical Params
Pathways
Guided
Exploration
M Streit, A Lex, S Gratzl, C Partl, D Schmalstieg, H Pfister, P Park, N Gehlenborg, Nature Methods (2014)
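The query, rank, visualize loop for a question such as "Is there a mutation that overlaps with this mRNA cluster?" can be sketched as follows. The sketch assumes the Jaccard index as the overlap score, which is one plausible choice among several; the function names, gene names, and patient IDs are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two patient sets: |A and B| / |A or B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_by_overlap(query, candidates):
    """Rank candidate patient sets by their overlap with the query set.

    `candidates` maps a descriptive name to a set of patient IDs.
    Returns (name, score) pairs, best match first.
    """
    scored = [(name, jaccard(query, members)) for name, members in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical query: the patients in one mRNA cluster.
query_cluster = {"P1", "P2", "P3", "P4"}

# Hypothetical candidates: patient sets defined by other data types.
candidate_sets = {
    "gene A mutated": {"P1", "P2", "P3"},
    "gene B mutated": {"P4", "P5", "P6", "P7"},
    "gene C amplified": {"P1", "P5"},
}

ranking = rank_by_overlap(query_cluster, candidate_sets)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

The top-ranked candidates would then be the ones worth visualizing next to the query cluster. Questions about survival or pathway enrichment would need a different score in place of `jaccard`.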
22. StratomeX+
Interactive visual exploration and refinement of
cluster assignments
Michael Kern, Alexander Lex, Nils Gehlenborg, Christopher R Johnson
Interactive visual exploration and refinement of cluster assignments
BMC Bioinformatics 18:406 (2017)
Alexander Lex
University of Utah
Funded by
NIH BD2K
23. Cluster Refinement
Adjust cluster (i.e. subtype) membership based on within- and between-cluster
metrics in the context of other data
M Kern, A Lex, N Gehlenborg, C Johnson, BMC Bioinformatics (2017)
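As a rough illustration of within- and between-cluster metrics, the sketch below compares each sample's distance to its own cluster centroid with its distance to the nearest other centroid, and flags samples that sit closer to another cluster. Centroid distance is an assumption made for this example and is not necessarily the metric used in StratomeX+; all names and data are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def reassignment_candidates(samples, labels):
    """Flag samples that are closer to another cluster's centroid.

    `samples` maps sample ID -> coordinate tuple, `labels` maps
    sample ID -> cluster label. Requires at least two clusters.
    Returns a dict of sample ID -> suggested cluster label.
    """
    clusters = {}
    for sample_id, label in labels.items():
        clusters.setdefault(label, []).append(samples[sample_id])
    centroids = {label: centroid(points) for label, points in clusters.items()}

    flagged = {}
    for sample_id, label in labels.items():
        # Within-cluster metric: distance to the sample's own centroid.
        within = math.dist(samples[sample_id], centroids[label])
        # Between-cluster metric: distance to the nearest other centroid.
        other, between = min(
            ((l, math.dist(samples[sample_id], c)) for l, c in centroids.items() if l != label),
            key=lambda item: item[1],
        )
        if between < within:
            flagged[sample_id] = other
    return flagged


# Hypothetical toy data: S5 is labeled "A" but lies next to cluster "B".
samples = {
    "S1": (0.0, 0.0),
    "S2": (0.0, 1.0),
    "S3": (5.0, 5.0),
    "S4": (5.0, 6.0),
    "S5": (4.5, 5.0),
}
labels = {"S1": "A", "S2": "A", "S5": "A", "S3": "B", "S4": "B"}

flagged = reassignment_candidates(samples, labels)
print(flagged)
```

In an interactive setting the flagged samples would be suggestions for the analyst to review alongside other data, not automatic reassignments.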
27. Vistories
From Visual Exploration to
Storytelling and Back Again
Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Nicola Cosgrove, Marc Streit
From Visual Exploration to Storytelling and Back Again
Computer Graphics Forum (EuroVis ’16) 35:491 (2016)
Samuel Gratzl
datavisyn
Marc Streit
JKU Linz
Alexander Lex
University of Utah
Funded by
NIH BD2K
28. Reproducible Visual Exploration
Current Model
Exploration → finding → Authoring → figure/video → Presentation
S Gratzl, A Lex, N Gehlenborg, N Cosgrove, M Streit, Computer Graphics Forum (2016), http://vistories.org
29. Reproducible Visual Exploration
Current Model
Exploration (Visualization Tool) → finding → Authoring (e.g. Illustrator) → figure/video → Presentation (e.g. PDF Viewer)
S Gratzl, A Lex, N Gehlenborg, N Cosgrove, M Streit, Computer Graphics Forum (2016), http://vistories.org
33. Motivation
1. StratomeX is limited to a rigid columnar layout
2. StratomeX only shows connections on a block level, not for individual samples
3. StratomeX only supports exploration along the sample/patient dimension
39. OncoThreads
Incorporating Longitudinal Information
Theresa Harbig, Sabrina Nusrat, Alex Thomson, Hans Bitter, Tali Mazor, Ethan Cerami, Nils Gehlenborg
Visualization of Longitudinal Cancer Genomics Data
Work in Progress
Sabrina Nusrat
Harvard
Theresa Harbig
Harvard
Funded by
NIH/NHGRI
40. Motivation
1. Cohorts of patients with longitudinal sample information
2. Events between sample collections are critical for interpretation
3. Application to longitudinal cancer cohorts or clinical trials
42. Talk about OncoThreads tomorrow
at 5:40 pm in the BioVis COSI
or visit us at poster B763!
49. Take Aways
Despite highly heterogeneous data, the “block and ribbon” approaches are able to
integrate a wide range of data types
Integration of auxiliary visualization types (pathways, Kaplan-Meier plots, box plots,
etc.) extends the possibilities
Ability to aggregate data is critical to these approaches
50. Next Steps
Better integration of specialized visualizations with support for faceting and
aggregation
Scale to hundreds of thousands or millions of individuals (UK Biobank, All of Us, etc.)
Integration with data management systems (e.g. i2b2 TranSMART, cBioPortal)
- Challenge: generally not designed to support visualization, e.g. aggregation
- Opportunity: easier to deploy visualizations in real-world settings
Integration with analytical backends (e.g. Jupyter Notebooks or pipelines)