This document discusses visual analytics in omics data. It begins by noting the shift from hypothesis-driven to data-driven research due to large datasets. Visual analytics can help explore these data by opening the "black box" of algorithms and enabling researchers to develop hypotheses. Effective visualization leverages human perception through techniques like preattentive vision and Gestalt laws. Challenges to visual analytics include scalability issues for large datasets and identifying interesting patterns for further analysis. Examples demonstrate data exploration, filtering, and user-guided analysis in genomic applications.
1. Visual Analytics in omics - why, what, how?
Prof Jan Aerts
STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium
Data Visualization Lab
jan.aerts@esat.kuleuven.be
jan@datavislab.org
creativecommons.org/licenses/by-nc/3.0/
2. • What problem are we trying to solve?
• What is Visual Analytics and how can it help?
• How do we actually do this?
• Some examples
• Challenges
4. hypothesis-driven -> data-driven
Scientific Research Paradigms (Jim Gray, Microsoft)
• 1st - 1,000s of years ago - empirical
• 2nd - 100s of years ago - theoretical
• 3rd - last few decades - computational
• 4th - today - data exploration
hypothesis-driven: I have a hypothesis -> I need to generate data to (dis)prove it.
data-driven: I have data -> I need to find hypotheses that I can test.
5. What does this mean?
• immense re-use of existing datasets
• biologically interesting signals may be too poorly understood to be analyzed in an automated fashion
• much of the initial analysis is exploratory in nature => what's my hypothesis? => searching for unknown unknowns
• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)
14. Why do we visualize data?
• record information
  • blueprints, photographs, seismographs, ...
• analyze data to support reasoning
  • develop & assess hypotheses
  • discover errors in data
  • expand memory
  • find patterns (see Snow's cholera map)
• communicate information
  • share & persuade
  • collaborate & revise
15. Sedlmair et al., IEEE Transactions on Visualization and Computer Graphics, 2012
18. Stevens' psychophysical law
= a proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
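In standard textbook notation (the formula itself does not appear on the slide), Stevens' power law reads $\psi(I) = k\,I^{a}$: $I$ is the stimulus magnitude, $\psi(I)$ the perceived intensity, $k$ a scaling constant, and $a$ a modality-dependent exponent ($a < 1$ for compressive channels such as brightness, $a > 1$ for expansive ones such as electric shock).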
19. Accuracy of quantitative perceptual tasks (Mackinlay)
• how much (quantitative) vs. what/where (qualitative)
• the "power of the plane": position in the plane is the most accurately read channel for quantitative data
22. Pre-attentive vision
= the ability of the low-level human visual system to rapidly identify certain basic visual properties
• some features "pop out"
• used for:
  • target detection
  • boundary detection
  • counting/estimation
  • ...
• the visual system takes over => all cognitive power remains available for interpreting the figure, rather than part of it being spent on processing the figure
25. Limitations of pre-attentive vision
1. Combining pre-attentive features does not always work => the viewer must resort to "serial search" (true for most channel pairs and for all channel triplets), e.g. "is there a red square in this picture?" when the distractors share either its colour or its shape (see the sketch below)
2. Speed depends on the channel used (pick one that works well for categorical data)
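A minimal matplotlib sketch of both situations (layout, colours, and counts are my own illustration, not from the slides): the left panel is a single-channel pop-out search, the right a conjunction search in which no single channel isolates the target.

# Sketch: pop-out (single channel) vs. conjunction search (two channels).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 60
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

# Left: one red dot among blue dots -> colour pops out pre-attentively.
xy = rng.uniform(0, 1, size=(n, 2))
ax1.scatter(xy[1:, 0], xy[1:, 1], c="steelblue", marker="o")
ax1.scatter(xy[0, 0], xy[0, 1], c="red", marker="o")
ax1.set_title("pop-out: find the red dot")

# Right: one red square among red circles and blue squares; the target shares
# colour with half the distractors and shape with the rest => serial search.
xy = rng.uniform(0, 1, size=(n, 2))
half = n // 2
ax2.scatter(xy[1:half, 0], xy[1:half, 1], c="red", marker="o")
ax2.scatter(xy[half:, 0], xy[half:, 1], c="steelblue", marker="s")
ax2.scatter(xy[0, 0], xy[0, 1], c="red", marker="s")
ax2.set_title("conjunction: find the red square")

for ax in (ax1, ax2):
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()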
27. Gestalt laws - the interplay between parts and the whole
• simplicity
• familiarity
• proximity
• symmetry
• similarity
• connectedness
• good continuation
• common fate
58. Many challenges remain
• scalability (data processing + perception), uncertainty, "interestingness", interaction, evaluation
• infrastructure & architecture
• fast, imprecise answers with progressive refinement (see the sketch below)
• incremental re-computation
• steering computation towards data regions of interest
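One concrete reading of "fast, imprecise answers with progressive refinement" and "incremental re-computation", as a minimal sketch (the chunked running mean is my own illustration, not an implementation from the talk): running totals are kept across chunks, so each new estimate reuses all earlier work and can be shown to the user immediately.

# Sketch: progressive refinement of a summary statistic.
# Each yielded value is the current best estimate; a UI can display it at
# once and update it as further chunks arrive, instead of blocking until
# the full pass over the data is finished.
import numpy as np

def progressive_mean(values, chunk_size=100_000):
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += chunk.sum()          # incremental: old chunks are not revisited
        count += len(chunk)
        yield total / count

data = np.random.default_rng(1).normal(loc=5.0, size=1_000_000)
for i, estimate in enumerate(progressive_mean(data)):
    print(f"after chunk {i}: mean ~ {estimate:.4f}")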
59. Computational scalability
• speed
  • preprocessing big data: MapReduce is batch-oriented
  • interactivity: at most ~0.3 s of lag!
• size
  • multiple data resolutions => data size increases
  • not all resolutions are necessary for all data regions: steer computation towards regions of interest (see the sketch below)
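The multiple-resolution idea amounts to an aggregation pyramid, much like the precomputed zoom levels of a genome-browser coverage track; a minimal sketch (the pairwise-mean aggregator and the level count are illustrative assumptions):

# Sketch: precompute a pyramid of resolutions for a signal track.
# Each level halves the number of bins by averaging pairs; the viewer picks
# the level matching the current zoom, and finer levels can be built lazily,
# only for the regions the user actually inspects.
import numpy as np

def aggregation_pyramid(signal, levels=4):
    pyramid = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        s = pyramid[-1]
        if len(s) % 2:                          # pad to an even length
            s = np.append(s, s[-1])
        pyramid.append(s.reshape(-1, 2).mean(axis=1))
    return pyramid

coverage = np.random.default_rng(2).poisson(30, size=1_000_000)
for level, track in enumerate(aggregation_pyramid(coverage)):
    print(f"level {level}: {len(track)} bins")
# Keeping all levels costs roughly 2x the base track: the "data size increase".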
60. Options:
• distribute the visualization calculations over a cluster
• distributed Scala/Spark or another "real-time" MapReduce paradigm
• a functional programming paradigm?
• lazy evaluation and smart preprocessing: only calculate what is needed (see the sketch below)
=> a generic framework
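Lazy evaluation here simply means deferring work until the view demands it; a minimal sketch with Python generators (the tab-separated file signal.tsv and the viewport helper are hypothetical):

# Sketch: a lazy pipeline that parses only the rows a view actually requests.
from itertools import islice

def lines(path):
    with open(path) as fh:
        yield from fh                 # lazy: lines are read only when consumed

def parse(line):
    name, value = line.rstrip("\n").split("\t")
    return name, float(value)

def viewport(path, start, stop):
    # Lines before the viewport are streamed past unparsed; only the
    # requested slice is split and converted.
    return [parse(line) for line in islice(lines(path), start, stop)]

# Hypothetical usage: parse only rows 1000..1050 of a large track file.
# view = viewport("signal.tsv", 1000, 1050)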
61. Perceptual scalability
• "overview first, then zoom and filter, details on demand": breaks down with very big datasets
• "analyze first, show results, then zoom and filter, details on demand" => need to identify regions of interest and "interestingness" features
• identify higher-level structure in the data (e.g. clustering, dimensionality reduction) -> use it to guide the user (see the sketch below)
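An "analyze first" flow sketched with scikit-learn (the KMeans settings and the variance-based interestingness score are illustrative assumptions, not the lab's pipeline): cluster the data up front, rank the clusters, and open the view on those summaries rather than on the raw points.

# Sketch of "analyze first, show results": cluster, rank clusters by a toy
# interestingness score, and present the summaries before any raw points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=c, scale=s, size=(200, 2))
               for c, s in [(0, 0.3), (4, 1.5), (8, 0.5)]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

for label in range(3):
    members = X[km.labels_ == label]
    spread = members.var(axis=0).mean()       # toy "interestingness": spread
    print(f"cluster {label}: n={len(members)}, "
          f"centre={km.cluster_centers_[label].round(2)}, spread={spread:.2f}")
# The user then drills into a chosen cluster ("details on demand").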
62. Thank you
• Georgios Pavlopoulos
• Ryo Sakai
• Thomas Boogaerts
• Toni Verbeiren
• Data Visualization Lab (datavislab.org)
• Erik Duval
• Andrew Vande Moere