This document discusses visual analytics in omics data. It begins by noting the shift from hypothesis-driven to data-driven research due to large datasets. Visual analytics can help explore these data by opening the "black box" of algorithms and enabling researchers to develop hypotheses. Effective visualization leverages human perception through techniques like preattentive vision and Gestalt laws. Challenges to visual analytics include scalability issues for large datasets and identifying interesting patterns for further analysis. Examples demonstrate data exploration, filtering, and user-guided analysis in genomic applications.
1. Visual Analytics in omics - why, what, how?
Prof Jan Aerts
STADIUS - ESAT, Faculty of Engineering, University of Leuven, Belgium
Data Visualization Lab
jan.aerts@esat.kuleuven.be
jan@datavislab.org
creativecommons.org/licenses/by-nc/3.0/
2. • What problem are we trying to solve?
• What is Visual Analytics and how can it help?
• How do we actually do this?
• Some examples
• Challenges
4. hypothesis-driven -> data-driven
Scientific Research Paradigms (Jim Gray, Microsoft)
• 1st - 1,000s of years ago - empirical
• 2nd - 100s of years ago - theoretical
• 3rd - last few decades - computational
• 4th - today - data exploration
hypothesis-driven: I have a hypothesis -> I need to generate data to (dis)prove it.
data-driven: I have data -> I need to find hypotheses that I can test.
5. What does this mean?
• immense re-use of existing datasets
• biologically interesting signals may be too poorly understood to be analyzed in an automated fashion
• much of the initial analysis is exploratory in nature => what's my hypothesis? => searching for unknown unknowns
• automated algorithms often act as black boxes => biologists must have blind faith in the bioinformatician (and the bioinformatician in his/her own skills)
14. Why do we visualize data?
• record information
  • blueprints, photographs, seismographs, ...
• analyze data to support reasoning
  • develop & assess hypotheses
  • discover errors in data
  • expand memory
  • find patterns (see Snow's cholera map)
• communicate information
  • share & persuade
  • collaborate & revise
15. Sedlmair et al., IEEE Transactions on Visualization and Computer Graphics, 2012
18. Stevens' psychophysical law
= a proposed relationship between the magnitude of a physical stimulus and its perceived intensity or strength
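In standard textbook notation (the formula itself does not appear on the slide), Stevens' power law reads $\psi(I) = k\,I^{a}$: $I$ is the stimulus magnitude, $\psi(I)$ the perceived intensity, $k$ a scaling constant, and $a$ a modality-dependent exponent ($a < 1$ for compressive channels such as brightness, $a > 1$ for expansive ones such as electric shock).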
19. Accuracy of quantitative perceptual tasks (Mackinlay)
• how much (quantitative) vs. what/where (qualitative)
• the "power of the plane": position in the plane is the most accurately read channel for quantitative data
22. Pre-attentive vision
= the ability of the low-level human visual system to rapidly identify certain basic visual properties
• some features "pop out"
• used for:
  • target detection
  • boundary detection
  • counting/estimation
  • ...
• the visual system takes over => all cognitive power remains available for interpreting the figure, rather than part of it being spent on processing the figure
25. Limitations of pre-attentive vision
1. Combining pre-attentive features does not always work => the viewer must resort to "serial search" (true for most channel pairs and for all channel triplets), e.g. "is there a red square in this picture?" when the distractors share either its colour or its shape (see the sketch below)
2. Speed depends on the channel used (pick one that works well for categorical data)
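A minimal matplotlib sketch of both situations (layout, colours, and counts are my own illustration, not from the slides): the left panel is a single-channel pop-out search, the right a conjunction search in which no single channel isolates the target.

# Sketch: pop-out (single channel) vs. conjunction search (two channels).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 60
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 5))

# Left: one red dot among blue dots -> colour pops out pre-attentively.
xy = rng.uniform(0, 1, size=(n, 2))
ax1.scatter(xy[1:, 0], xy[1:, 1], c="steelblue", marker="o")
ax1.scatter(xy[0, 0], xy[0, 1], c="red", marker="o")
ax1.set_title("pop-out: find the red dot")

# Right: one red square among red circles and blue squares; the target shares
# colour with half the distractors and shape with the rest => serial search.
xy = rng.uniform(0, 1, size=(n, 2))
half = n // 2
ax2.scatter(xy[1:half, 0], xy[1:half, 1], c="red", marker="o")
ax2.scatter(xy[half:, 0], xy[half:, 1], c="steelblue", marker="s")
ax2.scatter(xy[0, 0], xy[0, 1], c="red", marker="s")
ax2.set_title("conjunction: find the red square")

for ax in (ax1, ax2):
    ax.set_xticks([])
    ax.set_yticks([])
plt.tight_layout()
plt.show()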
27. Gestalt laws - the interplay between parts and the whole
• simplicity
• familiarity
• proximity
• symmetry
• similarity
• connectedness
• good continuation
• common fate
58. Many challenges remain
• scalability (data processing + perception), uncertainty, "interestingness", interaction, evaluation
• infrastructure & architecture
• fast, imprecise answers with progressive refinement (see the sketch below)
• incremental re-computation
• steering computation towards data regions of interest
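One concrete reading of "fast, imprecise answers with progressive refinement" and "incremental re-computation", as a minimal sketch (the chunked running mean is my own illustration, not an implementation from the talk): running totals are kept across chunks, so each new estimate reuses all earlier work and can be shown to the user immediately.

# Sketch: progressive refinement of a summary statistic.
# Each yielded value is the current best estimate; a UI can display it at
# once and update it as further chunks arrive, instead of blocking until
# the full pass over the data is finished.
import numpy as np

def progressive_mean(values, chunk_size=100_000):
    total, count = 0.0, 0
    for start in range(0, len(values), chunk_size):
        chunk = values[start:start + chunk_size]
        total += chunk.sum()          # incremental: old chunks are not revisited
        count += len(chunk)
        yield total / count

data = np.random.default_rng(1).normal(loc=5.0, size=1_000_000)
for i, estimate in enumerate(progressive_mean(data)):
    print(f"after chunk {i}: mean ~ {estimate:.4f}")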
59. Computational scalability
• speed
  • preprocessing big data: MapReduce is batch-oriented
  • interactivity: at most ~0.3 s of lag!
• size
  • multiple data resolutions => data size increases
  • not all resolutions are necessary for all data regions: steer computation towards regions of interest (see the sketch below)
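The multiple-resolution idea amounts to an aggregation pyramid, much like the precomputed zoom levels of a genome-browser coverage track; a minimal sketch (the pairwise-mean aggregator and the level count are illustrative assumptions):

# Sketch: precompute a pyramid of resolutions for a signal track.
# Each level halves the number of bins by averaging pairs; the viewer picks
# the level matching the current zoom, and finer levels can be built lazily,
# only for the regions the user actually inspects.
import numpy as np

def aggregation_pyramid(signal, levels=4):
    pyramid = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        s = pyramid[-1]
        if len(s) % 2:                          # pad to an even length
            s = np.append(s, s[-1])
        pyramid.append(s.reshape(-1, 2).mean(axis=1))
    return pyramid

coverage = np.random.default_rng(2).poisson(30, size=1_000_000)
for level, track in enumerate(aggregation_pyramid(coverage)):
    print(f"level {level}: {len(track)} bins")
# Keeping all levels costs roughly 2x the base track: the "data size increase".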
60. Options:
• distribute the visualization calculations over a cluster
• distributed Scala/Spark or another "real-time" MapReduce paradigm
• a functional programming paradigm?
• lazy evaluation and smart preprocessing: only calculate what is needed (see the sketch below)
=> a generic framework
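Lazy evaluation here simply means deferring work until the view demands it; a minimal sketch with Python generators (the tab-separated file signal.tsv and the viewport helper are hypothetical):

# Sketch: a lazy pipeline that parses only the rows a view actually requests.
from itertools import islice

def lines(path):
    with open(path) as fh:
        yield from fh                 # lazy: lines are read only when consumed

def parse(line):
    name, value = line.rstrip("\n").split("\t")
    return name, float(value)

def viewport(path, start, stop):
    # Lines before the viewport are streamed past unparsed; only the
    # requested slice is split and converted.
    return [parse(line) for line in islice(lines(path), start, stop)]

# Hypothetical usage: parse only rows 1000..1050 of a large track file.
# view = viewport("signal.tsv", 1000, 1050)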
61. Perceptual scalability
• "overview first, then zoom and filter, details on demand": breaks down with very big datasets
• "analyze first, show results, then zoom and filter, details on demand" => need to identify regions of interest and "interestingness" features
• identify higher-level structure in the data (e.g. clustering, dimensionality reduction) -> use it to guide the user (see the sketch below)
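An "analyze first" flow sketched with scikit-learn (the KMeans settings and the variance-based interestingness score are illustrative assumptions, not the lab's pipeline): cluster the data up front, rank the clusters, and open the view on those summaries rather than on the raw points.

# Sketch of "analyze first, show results": cluster, rank clusters by a toy
# interestingness score, and present the summaries before any raw points.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=c, scale=s, size=(200, 2))
               for c, s in [(0, 0.3), (4, 1.5), (8, 0.5)]])

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

for label in range(3):
    members = X[km.labels_ == label]
    spread = members.var(axis=0).mean()       # toy "interestingness": spread
    print(f"cluster {label}: n={len(members)}, "
          f"centre={km.cluster_centers_[label].round(2)}, spread={spread:.2f}")
# The user then drills into a chosen cluster ("details on demand").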
62. Thank you
• Georgios Pavlopoulos
• Ryo Sakai
• Thomas Boogaerts
• Toni Verbeiren
• Data Visualization Lab (datavislab.org)
• Erik Duval
• Andrew Vande Moere