Humanizing data analysis
Jan Aerts
Bioinformatics, ESAT/SCD, University of Leuven
Future Health Department, iMinds
        I                II               III               IV
    x       y        x       y        x       y        x       y
  10.0    8.04     10.0    9.14     10.0    7.46      8.0    6.58
   8.0    6.95      8.0    8.14      8.0    6.77      8.0    5.76
  13.0    7.58     13.0    8.74     13.0   12.74      8.0    7.71
   9.0    8.81      9.0    8.77      9.0    7.11      8.0    8.84
  11.0    8.33     11.0    9.26     11.0    7.81      8.0    8.47
  14.0    9.96     14.0    8.10     14.0    8.84      8.0    7.04
   6.0    7.24      6.0    6.13      6.0    6.08      8.0    5.25
   4.0    4.26      4.0    3.10      4.0    5.39     19.0   12.50
  12.0   10.84     12.0    9.13     12.0    8.15      8.0    5.56
   7.0    4.82      7.0    7.26      7.0    6.42      8.0    7.91
   5.0    5.68      5.0    4.74      5.0    5.73      8.0    6.89

For each of the four datasets:
n = 11
mean x = 9.0     variance x = 11.0
mean y = 7.5     variance y = 4.12
correlation x & y = 0.816
regression line: y = 3 + 0.5x
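
The claim that all four datasets share the same summary statistics can be checked in a few lines. The sketch below is an illustration added to the slide text, not part of the talk. It uses the data from Anscombe's published quartet (where the final y value of set IV is 6.89) and only the Python standard library; the names `QUARTET`, `summarize` and `SUMMARIES` are made up for this example.

```python
from statistics import mean, variance

# Anscombe's quartet; sets I to III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return the summary statistics listed on the slide for one dataset."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx  # least-squares slope of y on x
    return {
        "mean_x": mx,
        "mean_y": my,
        "var_x": variance(xs),  # sample variance (n - 1 in the denominator)
        "var_y": variance(ys),
        "corr": sxy / (sxx * syy) ** 0.5,  # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}
for name, stats in SUMMARIES.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

The four printed rows agree to about two decimal places, which is exactly why the numbers alone cannot distinguish the datasets; plotting them does.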
visual analytics
cognitive task => perceptive task
Opening the black box
[Figure: input passing through filter 1, filter 2 and filter 3, giving outputs A, B and C]
[Figure, three panels: sets A, B and C]
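
As the speaker notes for this part of the talk explain, different filter settings let different elements through, and the usual practice is to keep only the intersection of the outputs. The sketch below is a toy illustration of that idea and not the pipeline used in the talk: the candidates, the attributes `depth`, `qual` and `freq`, and the three settings A, B and C are all invented for the example.

```python
# Hypothetical candidate calls; "depth", "qual" and "freq" are invented attributes.
CANDIDATES = {
    "v1": {"depth": 40, "qual": 70, "freq": 0.30},
    "v2": {"depth": 12, "qual": 65, "freq": 0.25},
    "v3": {"depth": 30, "qual": 35, "freq": 0.40},
    "v4": {"depth": 25, "qual": 55, "freq": 0.04},
    "v5": {"depth": 8, "qual": 20, "freq": 0.02},
}

# Three hypothetical parameter choices (minimum values) for the same three filters.
SETTINGS = {
    "A": {"depth": 10, "qual": 50, "freq": 0.05},
    "B": {"depth": 20, "qual": 30, "freq": 0.05},
    "C": {"depth": 20, "qual": 50, "freq": 0.01},
}


def run_filters(candidates, thresholds):
    """Return the names of candidates that meet every minimum threshold."""
    return {
        name
        for name, attrs in candidates.items()
        if all(attrs[key] >= minimum for key, minimum in thresholds.items())
    }


OUTPUTS = {label: run_filters(CANDIDATES, t) for label, t in SETTINGS.items()}
CONSENSUS = set.intersection(*OUTPUTS.values())  # kept by every setting
EVER_KEPT = set.union(*OUTPUTS.values())         # kept by at least one setting

# Per-candidate membership: the information the plain intersection throws away.
for name in sorted(EVER_KEPT):
    kept_by = [label for label, out in OUTPUTS.items() if name in out]
    print(name, "kept by", ",".join(kept_by))
print("intersection:", sorted(CONSENSUS))
```

In this toy data the intersection keeps a single candidate, while three others pass under exactly one setting each. Whether those are false positives or missed true positives cannot be read from the intersection alone, which is the gap that visualizing the data streams is meant to close.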
Generating hypotheses
wallpaperweb.org
put the human back in the loop!


Editor's notes

  1. First, a question for you. The next slide shows 4 datasets of x-y coordinates. The question is: are they drawn from the same distribution?
  2. All summary statistics are the same => the datasets are drawn from the same distribution, right?
  3. It's only when we draw the data that we see that the 4 datasets are in fact vastly different. We can see this in a fraction of a second. This is known as Anscombe's quartet; many of you are probably acquainted with it.
  4. Why do I show you this? It represents my epiphany. 2009: the genomics world was abuzz because the 1kG (1000 Genomes) data had become available for analysis. 8 institutes at the forefront of genomics research set out to identify a specific type of variation in the human genome (within the 1kG project). 1 year later there were results, but with very little overlap even though the input data were the same: the overlap ranged from 60% down to 1%. Did the institutes make errors? No, but they used different assumptions about the data and different parameters. We needed data visualization to find out what was going on; automated algorithms couldn't.
  5. Wikipedia: "analytical reasoning facilitated by interactive visual interfaces". Used in terrorism informatics, network security, ... Visual analytics integrates core human strengths into data analysis: pattern detection, intuition, prediction and context. Next slide: an example of pattern detection. I will show you a flash of blue dots. Is there a red one?
  6. Pre-attentive vision => 50 milliseconds is enough, while initiating an eye movement alone already takes 200 milliseconds. This converts a cognitive task into a perceptive one. This talk: illustrate 2 strengths of visual analytics where visualization adds real value to automated analysis.
  7. We are heading towards a data infarction: an increasing distance between the domain expert and the output of automated analysis (e.g. bioinformatics). The causes: the two sides use different languages, and the algorithms are too opaque and/or advanced to relate output directly to input => the expert needs to trust the information, but this is blind trust => black box.
  8. Example: data filtering when no gold standard is available. Given an input dataset, how do we find the combination of filters that maximizes the number of true positives while minimizing the false positives and false negatives? This was the problem faced by the genomics community in 2009.
  9. Different combinations of filters and their parameters => different elements pass all thresholds.
  10. State of the art: run different combinations of filter settings => take the intersection.
  11. But this is what we should have found => visualizing the data streams can shed light into that black box.
  12. Second strength of visual analytics. A couple of years ago, Jim Gray (Microsoft) described in an article the different paradigms of doing scientific research through the ages.
  13. Thousands of years ago: the first paradigm, concerned with describing natural phenomena.
  14. Hundreds of years ago: the second paradigm. Kepler & Newton: a theoretical approach that defines laws and generalizations.
  15. Last couple of decades: the third paradigm, modeling and simulation ("computational biology").
  16. Now: big data. Key difference: data first, hypothesis later.
  17. Given 2 interaction networks (a gene network vs. the network of functions in the Linux operating system): which is which? How do they differ? We can calculate connectivity, average vertex degree, global and local complexity, ... But where should we start to look? What are the hypotheses to test?
  18. Martin Krzywinski: if we constrain the nodes to 3 axes, based only on whether a node is a source and/or a target of links => we start to see patterns. Why is the proportion of nodes on the green axis much bigger in one network?
  19. If we normalize these axes to 100%, some additional patterns become clear => look for things that we can investigate. Left: a small number of nodes on the yellow axis are linked to many nodes on the green axis. Right: the other way around => what is special about this small set of nodes?
  20. I have illustrated only 2 use cases of visual analytics, and hopefully whetted your appetite. Call to action: put the human back in the loop. It's by combining human and algorithm strengths that we can tackle the onslaught of data effectively and efficiently. What we need to do is detect the expected and discover the unexpected.
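
The axis assignment described in note 18 can be sketched in code. This is a minimal illustration under stated assumptions, not Martin Krzywinski's implementation: it only decides which of three axes each node of a directed network belongs to (source only, target only, or both) and reports the share of nodes per axis. The function names, axis labels and the example edge list are hypothetical, and the mapping of these axes to the colours mentioned in the notes is not known from the source.

```python
from collections import Counter


def assign_axes(edges):
    """Map each node to an axis based on whether it is a source and/or target."""
    sources = {source for source, _ in edges}
    targets = {target for _, target in edges}
    axes = {}
    for node in sources | targets:
        if node in sources and node in targets:
            axes[node] = "source and target"
        elif node in sources:
            axes[node] = "source only"
        else:
            axes[node] = "target only"
    return axes


def axis_proportions(axes):
    """Return the fraction of all nodes that sits on each axis."""
    counts = Counter(axes.values())
    total = sum(counts.values())
    return {axis: count / total for axis, count in counts.items()}


# Hypothetical directed edge list (source, target).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]
AXES = assign_axes(EDGES)
PROPORTIONS = axis_proportions(AXES)

for node in sorted(AXES):
    print(node, "->", AXES[node])
print(PROPORTIONS)
```

Comparing these proportions between two networks gives the kind of first question raised in note 18: why does one network have a much larger share of its nodes on one axis?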