A Methodic Approach to
Good Data Visualization
Luca Candela - @luckymethod
Luca Candela
DataPad Inc. // UX Eye // @luckymethod
Men of great rank, or active business, can only
pay attention to particulars of use […] it is hoped
that with the assistance of these Charts,
information will be got, without the fatigue and
trouble of studying the particulars [...]
William Playfair - Commercial and Political Atlas, 1786
Data visualization is the art of
*reducing information in a data set while
preserving the knowledge contained in it.
*we can talk about what “reducing information” means in this case...
Conceptual data analysis workflow:
Data Preparation -> Data Visualization -> Discovery of knowledge
Hadley Wickham popularized a concept called
split-apply-combine
as a way of thinking about data querying.
http://www.jstatsoft.org/v40/i01/paper
For the four most revenue generating
countries, what are the top three most
revenue generating categories?
Country         Venue Type    Sum Revenue
United States   Fast Food     $16
                Street        $10
                Restaurant    $9
France          Cafe          $18
                Pub           $12
                Restaurant    $2
Canada          Cafe          $10
                Fast Food     $4
                Street        $3
Japan           Street        $5
                Fast Food     $4
                Pub           $1
The basics of split-apply-combine

[Figure: the data is split by country, Sum Revenue is computed for each group, and the groups are sorted and limited.]

split by country, apply: Sum Revenue
  Canada          $36
  United States   $83
  Germany         $8
  France          $42
  Japan           $18

combine: sort descending by Sum Revenue, limit 4
  Country         Sum Revenue
  United States   $83
  France          $42
  Canada          $36
  Japan           $18
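The three steps can be sketched in plain Python. This is only an illustration, not the tooling used for the slides: the function names are made up, the rows are the per-venue figures from the question's table, and the Germany row is hypothetical (only its $8 total appears on the slide). Because only those rows are included, the totals come out smaller than the ones on the slide.

```python
from collections import defaultdict

# (country, venue_type, revenue) rows taken from the question's table.
# The Germany row is hypothetical: the slide only shows its $8 total.
ROWS = [
    ("United States", "Fast Food", 16), ("United States", "Street", 10),
    ("United States", "Restaurant", 9),
    ("France", "Cafe", 18), ("France", "Pub", 12), ("France", "Restaurant", 2),
    ("Canada", "Cafe", 10), ("Canada", "Fast Food", 4), ("Canada", "Street", 3),
    ("Japan", "Street", 5), ("Japan", "Fast Food", 4), ("Japan", "Pub", 1),
    ("Germany", "Pub", 8),  # hypothetical venue type
]

def split(rows, key_index):
    """Split: group rows by the value found in one column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row)
    return groups

def apply_sum(groups, value_index):
    """Apply: reduce every group to a single number."""
    return {key: sum(row[value_index] for row in members)
            for key, members in groups.items()}

def combine(totals, limit):
    """Combine: sort descending by the aggregate and keep the first `limit`."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:limit]

top_countries = combine(apply_sum(split(ROWS, 0), 2), limit=4)
for country, revenue in top_countries:
    print(f"{country:<15} ${revenue}")
```

Keeping the three steps as separate functions mirrors the slide: swapping the sum for another aggregate only touches the apply step.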
The basics of split-apply-combine

[Figure: the data is split by country (Canada, United States, Germany, France, Japan), each country is split again by venue type (bus stop, fastfood, park, restaurant, hair salon, pub, street, cafe, ...), and Sum Revenue is computed for each venue type.]

Country         Venue type    Sum Revenue
United States   fastfood      $16
                street        $10
                restaurant    $9
France          cafe          $18
                pub           $12
                restaurant    $2
Canada          cafe          $10
                fastfood      $4
                park          $3
Japan           street        $5
                fastfood      $4
                pub           $1
...
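One way to answer the original question ("top three categories for the top four countries") is to nest one split-apply-combine inside another. The sketch below is an assumption about how that could look in plain Python; the helper names are invented, and the rows marked hypothetical are not on the slides (they are there so that the limits actually drop something).

```python
from collections import defaultdict

# (country, venue_type, revenue); rows marked "hypothetical" are not on the slides.
ROWS = [
    ("United States", "Fast Food", 16), ("United States", "Street", 10),
    ("United States", "Restaurant", 9),
    ("United States", "Park", 1),  # hypothetical
    ("France", "Cafe", 18), ("France", "Pub", 12), ("France", "Restaurant", 2),
    ("Canada", "Cafe", 10), ("Canada", "Fast Food", 4), ("Canada", "Street", 3),
    ("Japan", "Street", 5), ("Japan", "Fast Food", 4), ("Japan", "Pub", 1),
    ("Germany", "Pub", 8),  # hypothetical
]

def top_n_by_sum(rows, key_index, n):
    """Split on one column, sum revenue per group, sort descending, keep n."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]

def top_categories_per_top_country(rows, n_countries=4, n_categories=3):
    """Outer pass picks the countries; inner pass ranks venue types within each."""
    report = {}
    for country, _total in top_n_by_sum(rows, 0, n_countries):
        country_rows = [row for row in rows if row[0] == country]
        report[country] = top_n_by_sum(country_rows, 1, n_categories)
    return report

report = top_categories_per_top_country(ROWS)
for country, categories in report.items():
    print(country, categories)
```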
Split-apply-combine thinking translates to visualizations

[Figure: horizontal bar chart of Sum Revenue by Country (United States, France, Canada, Japan), annotated with the two notes below.]

split by country, combine by sorting desc. on Sum Revenue, map to the vertical axis using an ordinal scale, add labels.

apply: sum revenue, call it Sum Revenue, plot rectangles and map length to the horizontal axis using a linear scale. Color with #45808E. Use `Country` as label.
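The mapping described on this slide can be sketched as two small scale functions that turn the combined totals into rectangles. The totals and the color come from the slides; the pixel sizes, function names, and the SVG output are assumptions made for the example.

```python
# Totals from the slide, already sorted descending (the combine step).
TOTALS = [("United States", 83), ("France", 42), ("Canada", 36), ("Japan", 18)]

BAR_COLOR = "#45808E"  # color named on the slide
PLOT_WIDTH = 400       # hypothetical width of the plotting area, in pixels
LABEL_WIDTH = 110      # hypothetical space reserved for the country labels
BAND_HEIGHT = 30       # hypothetical vertical space per country
BAR_HEIGHT = 20

def linear_scale(value, domain_max, range_max):
    """Map a value in [0, domain_max] proportionally onto [0, range_max]."""
    return value / domain_max * range_max

def ordinal_scale(index, band):
    """Map the i-th category to the top edge of the i-th band."""
    return index * band

def bars(totals):
    """One rectangle per country: ordinal y position, linear width."""
    domain_max = max(value for _name, value in totals)
    return [
        {"label": name, "y": ordinal_scale(i, BAND_HEIGHT),
         "width": linear_scale(value, domain_max, PLOT_WIDTH),
         "height": BAR_HEIGHT, "fill": BAR_COLOR}
        for i, (name, value) in enumerate(totals)
    ]

def to_svg(rects):
    """Render the rectangles and their labels as a minimal SVG document."""
    height = len(rects) * BAND_HEIGHT
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{LABEL_WIDTH + PLOT_WIDTH}" height="{height}">']
    for r in rects:
        parts.append(f'<text x="0" y="{r["y"] + 15}" font-size="12">{r["label"]}</text>')
        parts.append(f'<rect x="{LABEL_WIDTH}" y="{r["y"]}" width="{r["width"]:.1f}" '
                     f'height="{r["height"]}" fill="{r["fill"]}"/>')
    parts.append("</svg>")
    return "\n".join(parts)

rects = bars(TOTALS)
svg = to_svg(rects)
print(svg)
```

Starting the linear scale at zero keeps bar length proportional to revenue, which is what lets the reader compare countries by eye.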
Nested split-apply-combine underpins more complex visualizations
1. split on state
   apply: sum population
   combine: sort desc. by population; limit 6
2. split on age (bin by 5 years)
   apply: sum population
   combine: sort by age
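The two nested passes can be sketched as follows. The records are entirely hypothetical (the slide's data is not available), the limit is 2 instead of 6 to keep the example small, and the helper names are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (state, age, population) records; not from the slides.
RECORDS = [
    ("CA", 3, 500), ("CA", 7, 450), ("CA", 12, 400), ("CA", 14, 380),
    ("TX", 2, 300), ("TX", 8, 320), ("TX", 11, 310),
    ("WY", 4, 10),
]

def age_bin(age, width=5):
    """Lower edge of the bin that contains `age` (7 -> 5 for 5-year bins)."""
    return age // width * width

def top_states(records, limit):
    """Pass 1: split on state, sum population, sort descending, limit."""
    totals = defaultdict(int)
    for state, _age, population in records:
        totals[state] += population
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [state for state, _total in ranked[:limit]]

def population_by_age_bin(records, state):
    """Pass 2: within one state, split on age bin, sum population, sort by age."""
    bins = defaultdict(int)
    for record_state, age, population in records:
        if record_state == state:
            bins[age_bin(age)] += population
    return sorted(bins.items())

panels = {state: population_by_age_bin(RECORDS, state)
          for state in top_states(RECORDS, limit=2)}
for state, bins in panels.items():
    print(state, bins)
```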
Data visualization can be thought of as a
visual mapping function applied
during the *Apply and Combine steps.
*although it can also be thought of as applied exclusively during the combine step…
Reduce information, preserve knowledge...

Name    Operation   Lines
Vadim   Added       100
Luca    Removed     34
Vadim   Added       65
Vadim   Removed     5
Luca    Added       24
Vadim   Removed     71
Luca    Removed     45
Vadim   Added       7
...     ...         ...

“plot”

[Figure: bar chart of Additions and Deletions for Luca and Vadim; y axis from -2k to 2k; bar labels -960, 1531, -321, 739.]
Question: Mapping of what, to what?
Types of data

ID       Timestamp            Location       Name   Operation  Lines  Pass Test?
0000001  11-05-2013 10.45 am  San Francisco  Vadim  Added      100    Yes
0000002  11-05-2013 11.12 am  San Bruno      Luca   Removed    34     Yes
0000003  11-05-2013 11.30 am  San Francisco  Vadim  Added      65     Yes
0000004  11-05-2013 11.34 am  San Francisco  Vadim  Removed    5      Yes
0000005  11-05-2013 11.43 am  San Bruno      Luca   Added      24     No
0000006  11-05-2013 11.45 am  San Francisco  Vadim  Removed    71     Yes
0000007  11-05-2013 12.51 pm  San Francisco  Luca   Removed    45     Yes
0000008  11-05-2013 12.55 pm  San Francisco  Vadim  Added      7      No
...      ...                  ...            ...    ...        ...    ...

Column type labels on the slide: Categorical, # Discrete, # Continuous, Boolean
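A rough way to make this classification concrete is to guess a column's type from its values. The heuristic below is an assumption for illustration, not the slide's method: it only looks at Python types, the `classify` name is invented, and the "Review hours" column is hypothetical (the other three columns use the first five rows of the table).

```python
def classify(values):
    """Guess a column's data type from its Python values (a rough heuristic)."""
    # bool is a subclass of int in Python, so it has to be checked first.
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return "discrete"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "continuous"
    return "categorical"

COLUMNS = {
    "Name": ["Vadim", "Luca", "Vadim", "Vadim", "Luca"],
    "Lines": [100, 34, 65, 5, 24],
    "Pass Test?": [True, True, True, True, False],
    "Review hours": [0.5, 1.25, 2.0, 0.75, 1.0],  # hypothetical column
}

types = {name: classify(values) for name, values in COLUMNS.items()}
for name, kind in types.items():
    print(f"{name:<14}{kind}")
```

A type-based guess like this is only a starting point: an integer ID column, for example, would come out as "discrete" even though it behaves like a category.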
There are other ways to classify data,
but this one will get you very far.
pick up a good statistics book and just start reading...
Types of variables
1. Independent
a. A variable that isn't changed by the other variables you are trying to measure. It usually goes on the x axis.
2. Dependent
a. A variable that changes depending on other variable(s). It usually goes on the y axis.
[Figure: the Additions/Deletions bar chart for Luca and Vadim, annotated with the labels "Dependent Variable" and "Independent Variable".]
Variables of a visualization
1. Position (x,y)
2. Size (big, small…)
3. Value (bright, dark…)
4. Texture (hatched, dotted…)
5. Color (blue, red…)
6. Orientation (degree)
7. Shape (triangle, circle…)
Optimal mappings by type
[Figure: four x/y panels, one per data type: # Discrete, # Continuous, Categorical, Boolean.]
[Figure: the Added/Removed bar chart for Luca and Vadim, built from the Name / Operation / Lines table and annotated with the steps below.]

Split on Name
Split on Operation
Apply Sum(Added)
Apply Sum(Removed)
Combine: -Removed map to Red, value to size
Combine: Added map to Green, value to size
Combine: Name map to x axis
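These steps can be sketched as an aggregation followed by a mapping to marks. Only the eight rows visible in the table are used, so the sums differ from the totals drawn on the slide's chart; the function names and the dictionary layout of a "mark" are assumptions for the example.

```python
from collections import defaultdict

# The eight rows visible on the slide; the "..." rows are not reproduced.
LOG = [
    ("Vadim", "Added", 100), ("Luca", "Removed", 34), ("Vadim", "Added", 65),
    ("Vadim", "Removed", 5), ("Luca", "Added", 24), ("Vadim", "Removed", 71),
    ("Luca", "Removed", 45), ("Vadim", "Added", 7),
]

COLORS = {"Added": "green", "Removed": "red"}

def summarize(log):
    """Split on (Name, Operation) and apply a sum of Lines to each group."""
    totals = defaultdict(int)
    for name, operation, lines in log:
        totals[(name, operation)] += lines
    return totals

def to_marks(totals):
    """Combine: Name goes to x, the signed sum to bar size, Operation to color."""
    marks = []
    for (name, operation), lines in sorted(totals.items()):
        signed = lines if operation == "Added" else -lines
        marks.append({"x": name, "value": signed, "color": COLORS[operation]})
    return marks

marks = to_marks(summarize(LOG))
for mark in marks:
    print(f"{mark['x']:<6}{mark['value']:>6}  {mark['color']}")
```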
Apply the minimum number of mappings
that illustrates the underlying question
you are trying to answer.
Choosing the right viz...
Golden rules - Part 1
1. Label your axes
2. Include measurement units
3. Explain your encodings (add a legend)
4. Remove redundant information
5. Don’t distort the axis, especially with time series
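A quick bit of arithmetic shows why rule 5 matters. The numbers are hypothetical: two bars with values 100 and 98, drawn once on an axis that starts at zero and once on an axis that starts at 95.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the bar lengths a reader sees when the axis starts at `baseline`."""
    return (a - baseline) / (b - baseline)

honest = drawn_ratio(100, 98)          # axis starts at zero
truncated = drawn_ratio(100, 98, 95)   # axis starts at 95

print(f"axis from 0:  first bar looks {honest:.2f}x the second")
print(f"axis from 95: first bar looks {truncated:.2f}x the second")
```

The data differs by about 2%, but on the truncated axis the first bar is drawn roughly two thirds longer than the second.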
Golden rules - Part 2
1. If you are trying to visualize rate of change, then do it
2. Remove outliers, but know they are there
3. Tools have their own biases and quirks, know them.
4. The solution to 80% of your problems is bar charts and
histograms
5. Data Tables are visualizations too
...there are thousands of good rules, but the best one is still “keep it simple”
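Rule 2 ("remove outliers, but know they are there") can be sketched by separating outliers instead of silently dropping them. The 1.5 x IQR fence used here is one common convention and an assumption on my part, not something the slides prescribe; the sample values are hypothetical.

```python
import statistics

def split_outliers(values, k=1.5):
    """Return (kept, outliers) using fences at k * IQR beyond the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if not (low <= v <= high)]
    return kept, outliers

SAMPLE = [10, 12, 11, 13, 12, 11, 95]  # hypothetical measurements
kept, outliers = split_outliers(SAMPLE)

print("plot these:", kept)
print(f"left out {len(outliers)} outlier(s):", outliers)
```

Returning the outliers alongside the kept values makes it easy to mention them in a caption or footnote rather than losing them.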
Some examples
this is going to be fun...
Example 1
[Figure: two panels, "Simple bar chart" and "Linear scale", each annotated "Missing bucket (4.8 - 4.9)".]
Example 2
Example 2 - better
No - Human
Yes - Robot
Example 3
Example 4
Example 5
OK, this is comically bad, I was just going for a good collective giggle...
Books you should read
everybody knows about Tufte, so please don’t bring it up
The Semiology of Graphics, 1967
Jacques Bertin
The Elements of Graphing Data, 1985
&
Visualizing Data, 1993
William S. Cleveland
www.datapad.io
Thank you!
for questions, tweet me at @luckymethod

Visualize data using the split-apply-combine approach
