
Science Online 2013: Data Visualization Using R

  1. Data Visualization using R: How to get, manage, and present data to tell a compelling science story. William Gunn (@mrgunn), Head of Academic Outreach, Mendeley. Access point: NRC Visitor
  2. (1) A short history of graphical presentation of data; (2) Introduction to R; (3) Finding, cleaning, and presenting data; (4) Reproducibility and data sharing
  3. Data viz has a long history: John Snow’s cholera map helped communicate the idea that cholera was a water-borne disease.
  4. Florence Nightingale used dataviz
  5. Modernization of dataviz
  6. Chart junk: good, bad, and ugly. Which presentation is better?
  7. It can be elegant…
  8. Tufte
  9. Tufte
  10. How our eyes and brain perceive: It takes 200 ms to initiate an eye movement, but the red dot can be found in 100 ms or less. This is due to pre-attentive processing.
  11. Shape is a little slower than color!
  12. Pre-attentive processing fails!
  13. There are many “primitive” properties which we perceive
      • Length
      • Width
      • Size
      • Density
      • Hue
      • Color intensity
      • Depth
      • 3-D orientation
  14. Length
  15. Width
  16. Density
  17. Hue
  18. Color Intensity
  19. Depth
  20. 3D orientation
  21. Types of color schemes
      • Sequential – suited for ordered data that progress from low to high. Use light colors for low values and dark colors for higher.
      • Diverging – uses hue to show the breakpoint and intensity to show divergent extremes.
      • Qualitative – uses different colors to represent different categories. Beware of using hue/saturation to highlight unimportant categories.
  22. Sequential http://colorbrewer2.org/
  23. Diverging
  24. Qualitative
  25. Tips for maps
      • Keep it to 5-7 data classes
      • ~8% of men are red-green colorblind
      • Diverging schemes don’t do well when printed or photocopied
      • Colors will often render differently on different screens, especially low-end LCD screens
      • http://colorbrewer2.org
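The three scheme types from slide 21 can be tried out without leaving R. The sketch below is only an illustration: it assumes R 3.6 or later and uses base R's `hcl.colors()` palettes (`"Blues"`, `"Blue-Red"`, `"Set 2"`), which are stand-ins chosen here and not the ColorBrewer palettes themselves.

```r
# Number of data classes; the slides suggest staying within 5-7.
n_classes <- 5

# Sequential: one hue, varying lightness. rev = TRUE reverses the palette's
# default order; look at the swatches to confirm light colors map to low values.
sequential <- hcl.colors(n_classes, palette = "Blues", rev = TRUE)

# Diverging: two hues meeting at a neutral midpoint.
diverging <- hcl.colors(n_classes, palette = "Blue-Red")

# Qualitative: distinct hues for unordered categories.
qualitative <- hcl.colors(n_classes, palette = "Set 2")

# Uncomment to view one palette as colored bars:
# barplot(rep(1, n_classes), col = sequential, axes = FALSE)
```

Each call returns a character vector of hex color codes that can be handed to any plotting function accepting a `col` argument.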
  26. Part 2: Introduction to R
  27. Why R?
      • Open source tool
      • Huge variety of packages for any kind of analysis
      • Saves time repeating data processing steps
      • Allows working with more diverse types of data and much larger datasets than Excel
      • Processing is much faster than Excel
      • Scripts are easily shareable, promoting reproducible work
  28. .csv and .xls / xlsx
      • Excel files are designed to hold the appearance of the spreadsheet in addition to the data.
      • R just wants the data, so always save as .csv if you have tabular data
  29. data structures
      • x <- c(1,2,3,4,5,6,7,8,9,10)
      • x
      • length(x)
      • x[1]
      • x[2]
      • x <- c(1:10)
      • x
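As a rough annotated version of the commands on this slide, the sketch below stores each result in a variable (the variable names are invented here for illustration) so the two ways of building the vector can be compared.

```r
# Build a vector by listing every element.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

n_elements <- length(x)    # how many elements the vector holds
first_item <- x[1]         # R indexes from 1, not 0
class_listed <- class(x)   # literals such as 1 are stored as doubles ("numeric")

# The colon operator builds the same sequence; wrapping it in c() is optional.
x_seq <- 1:10
class_seq <- class(x_seq)  # a colon sequence of whole numbers is "integer"

# Same values, even though the storage type differs.
same_values <- all(x == x_seq)
```

The difference between `"numeric"` and `"integer"` rarely matters for plotting, but it is worth knowing when a function checks types strictly.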
  30. types of data
      • y <- c("abc", "def", "g", "h", "i")
      • y
      • class(y)
      • y[2]
      • length(y)
      • data can be integer (1,2,3,…), numeric (1.0, 2.3, …), character (a, b, c,…), logical (TRUE, FALSE) or other things
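A small sketch of the four basic types named on this slide. Note that the slide's quotes must be typed as straight quotes (`"`) for R to accept them. The variable names are illustrative only.

```r
# A character vector, as on the slide.
y <- c("abc", "def", "g", "h", "i")
second_item <- y[2]
n_items <- length(y)

# class() reports the type of each kind of value.
# The L suffix asks R for an integer instead of a double.
classes <- c(
  integer   = class(3L),
  numeric   = class(2.3),
  character = class("a"),
  logical   = class(TRUE)
)

# Text that looks like a number stays character until it is converted.
as_number <- as.numeric("2.3")
```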
  31. Vectors
      • R can hold data organized a few different ways
      • vectors (1,2,3,4) but not (1,2,3,x,y,z)
      • lists – can hold heterogeneous data
        – 1
        – 2
        – a
      • x
      • arrays – multi-dimensional
      • dataframes – lists of vectors – like spreadsheets
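The four containers on this slide can be sketched as follows. One detail worth hedging: R does not reject a mixed vector outright; `c()` silently converts every element to a common type, which is usually not what was intended. All object names below are made up for the example.

```r
# Vector: one type only. Mixing numbers and text coerces everything to character.
mixed_vector <- c(1, 2, 3, "x")

# List: each element keeps its own type.
mixed_list <- list(1, 2, "a")

# Array: a vector with dimensions (here 2 rows x 3 columns x 4 layers).
cube <- array(1:24, dim = c(2, 3, 4))

# Data frame: a list of equal-length vectors, one per column, like a spreadsheet.
sheet <- data.frame(
  id    = 1:3,
  label = c("a", "b", "c"),
  stringsAsFactors = FALSE
)
```

Because a data frame is a list underneath, list tools such as `$` work on its columns.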
  32. Vector operations
      • x + 1
      • x
      • sum(x)
      • mean(x)
      • mean(x+1)
      • x[2] <- x[2]+1
      • x
      • x + c(2:3)
      • x[2:10] + c(2:3)
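The last two commands on this slide rely on R's recycling rule: when two vectors differ in length, the shorter one is repeated to match the longer. A minimal sketch, starting from a fresh `x` (so the values differ from a session in which `x[2]` was already modified):

```r
x <- 1:10

plus_one <- x + 1   # arithmetic applies to every element; x itself is unchanged
total    <- sum(x)
average  <- mean(x)

# Lengths 10 and 2: c(2, 3) is repeated five times, no warning.
recycled <- x + c(2, 3)

# Lengths 9 and 2: R still recycles, but warns that the longer length is not
# a multiple of the shorter. The warning is suppressed here only to keep the
# example quiet; in real work it usually signals a mistake.
partial <- suppressWarnings(x[2:10] + c(2, 3))
```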
  33. working with lists
      • y <- list(name = "Bob", age = 24)
      • y
      • y$name
      • y[1]
      • y[[1]]
      • class(y[1])
      • class(y[[1]])
      • y <- list(y$name, "Sue")
      • y$name
      • y$age[2] <- list(33)
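The commands on this slide highlight two things: single brackets return a smaller list while double brackets return the element itself, and rebuilding a list without names loses access by `$`. The sketch below illustrates both; the `people` list at the end is one possible way to hold a second person and is an assumption of this example, not necessarily what the talk demonstrated.

```r
y <- list(name = "Bob", age = 24)

as_sublist <- y[1]     # still a list, of length 1
as_element <- y[[1]]   # the character value itself

# Rebuilding the list without names: $name no longer finds anything.
unnamed <- list(y$name, "Sue")
name_lookup <- unnamed$name   # NULL, because the elements have no names

# Hypothetical alternative: keep the names and store a vector in each slot.
people <- list(name = c("Bob", "Sue"), age = c(24, 33))
sue_age <- people$age[2]
```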
  34. Loading data
      • data <- read.csv("C:/Users/William Gunn/Desktop/Dropbox/Scripting/Data/traffic_accidents/accidents2010_all.csv", header = TRUE, stringsAsFactors = FALSE)
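Since the file on the slide lives on the presenter's machine, here is a self-contained sketch of the same `read.csv()` call. It writes a tiny CSV to a temporary file first; the column names and values are invented for illustration and are not taken from the real accidents dataset.

```r
# Create a small stand-in CSV file (hypothetical columns and values).
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("accident_id,severity,road_type",
    "1,slight,urban",
    "2,serious,rural",
    "3,slight,urban"),
  csv_path
)

# header = TRUE: the first row holds column names.
# stringsAsFactors = FALSE: keep text columns as character, not factor.
accidents <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# str() gives a compact summary of columns and types.
str(accidents)
```

Assigning to a name like `accidents` instead of `data` also avoids shadowing R's built-in `data()` function.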
  35. Selecting subsets of data
      • "["
      • "$"
      • which
      • grep and grepl
      • subset
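Each of the five tools on this slide can be sketched on a small data frame. The data below are invented for the example; only the functions themselves come from the slide.

```r
accidents <- data.frame(
  accident_id = 1:5,
  severity    = c("slight", "serious", "slight", "fatal", "serious"),
  road_type   = c("urban", "rural", "urban", "rural", "urban"),
  stringsAsFactors = FALSE
)

# "$" pulls out one column as a vector.
severities <- accidents$severity

# "[" selects rows and columns; a logical test picks the matching rows.
serious_rows <- accidents[accidents$severity == "serious", ]

# which() turns a logical test into row positions.
urban_positions <- which(accidents$road_type == "urban")

# grepl() returns TRUE/FALSE per element; grep() returns matching positions.
starts_with_s <- grepl("^s", accidents$severity)
s_positions   <- grep("^s", accidents$severity)

# subset() filters rows and selects columns using bare column names.
rural_not_slight <- subset(
  accidents,
  road_type == "rural" & severity != "slight",
  select = c(accident_id, severity)
)
```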
  36. PLOTS
      • ggplot2 – an implementation of the “grammar of graphics” in R
      • a set of graph types and a way of mapping variables to graph features
      • graph types are called “geoms”
      • mappings are “aesthetics”
      • graphs are built up by layering geoms
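The vocabulary on this slide maps directly onto code: `aes()` declares the aesthetics, each `geom_*()` call adds a layer, and `+` stacks the layers. A minimal sketch, assuming the ggplot2 package is installed and using simulated data with made-up column names:

```r
library(ggplot2)

# Simulated data: a roughly linear dose-response relationship.
set.seed(1)
trial <- data.frame(dose = 1:20)
trial$response <- 2 * trial$dose + rnorm(20)

# aes() maps variables to graph features (here, the x and y positions).
# Each geom is a layer: points first, then a fitted straight line on top.
p <- ggplot(trial, aes(x = dose, y = response)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# print(p) draws the plot in an interactive session.
```

Building the plot as an object and printing it later makes it easy to add or swap layers while exploring.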
  37. Types of geoms
      • point – dotplot – takes x,y coords of points
      • abline – line layer – takes slope, intercept
      • line – connect points with a line
      • smooth – fit a curve
      • bar – aka histogram – takes vector of data
      • boxplot – box and whiskers
      • density – to show relative distributions
      • errorbar – what it says on the tin
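A few of these geoms side by side, as a sketch that assumes ggplot2 is installed and uses invented data. One point to be aware of: in current ggplot2 releases `geom_bar()` counts rows per category, while `geom_histogram()` is the geom that bins a continuous variable, so the two are shown separately here.

```r
library(ggplot2)

# Invented measurements for two groups of ten.
measurements <- data.frame(
  group = rep(c("control", "treated"), each = 10),
  value = c(1:10, 6:15)
)

# boxplot: box and whiskers per group.
p_box <- ggplot(measurements, aes(x = group, y = value)) + geom_boxplot()

# histogram: bins a continuous variable.
p_hist <- ggplot(measurements, aes(x = value)) + geom_histogram(binwidth = 2)

# bar: counts the rows in each category.
p_bar <- ggplot(measurements, aes(x = group)) + geom_bar()

# density: overlapping smoothed distributions, one per group.
p_density <- ggplot(measurements, aes(x = value, fill = group)) +
  geom_density(alpha = 0.5)

# print(p_box), print(p_hist), etc. draw each plot.
```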
