SlideShare una empresa de Scribd logo
1 de 49
Descargar para leer sin conexión
Today we will use the reshape2 and xtables
packages, and the movies.csv.bz2 dataset.
install.packages(c("reshape2", "xtables"))
Garrett Grolemund
Phd Student / Rice University
Department of Statistics
Tables
1. Melt and Cast (reshape)
2. Making tables with R (xtables)
3. Advanced styling (booktabs)
Melt and
Cast
reshape2
Improved version of reshape - much
faster. melt() works the same as in
reshape.
Recall
Sometimes a single variable is strewn
across multiple columns. We can
recombine it with a 3 step process:
1. protect the good columns
2. melt the rest
3. subset out false rows
tips1 <- melt(tips, id = c("customer_ID",
"total_bill", "tip"))
names(tips1)[4] <- "gender"
tips1 <- subset(tips1, value == 1)[ , 1:4]
customer_id total_bill tip female male
1 20.53 4.00 0 1
2 21.70 4.30 0 1
3 19.65 3.00 1 0
customer_id total_bill tip gender
1 20.53 4.00 male
2 21.70 4.30 male
3 19.65 3.00 female
Your turn
Use melt() to combine the genre columns
of movies.csv into one column that
displays each movie’s genre.
movies <- read.csv("movies.csv.bz2",
stringsAsFactors = FALSE)
library(reshape)
m.movies <-melt(movies,
id = names(movies)[1:17])
names(movies)[18] <- "genre"
movies <- subset(m.movies, value == 1)[ , 1:18]
Cast
subject age weight height
John Smith 33 90 1.87
Mary Smith 1.54
subject variable value
John Smith age 33
John Smith weight 90
John Smith height 1.87
Mary Smith height 1.54
variable John Smith Mary Smith
age 33
weight 90
height 1.87 1.54
What is the
difference
between
these data
frames?
subject age weight height
John Smith 33 90 1.87
Mary Smith 1.54
subject variable value
John Smith age 33
John Smith weight 90
John Smith height 1.87
Mary Smith height 1.54
variable John Smith Mary Smith
age 33
weight 90
height 1.87 1.54
Same
measurements,
different
arrangements
dcast (or cast in reshape) transforms a
melted dataframe into whatever
arrangement you like. Dataframe must
contain a column named “value.”
smiths
m.smiths <- melt(smiths[ , -2])
dcast(m.smiths, subject ~ variable)
dcast(m.smiths, variable ~ subject)
General strategy
Think of dcast() as making a table
dcast(m.smiths, subject ~ variable)
melted
dataset
dcast(m.smiths, subject ~ variable)
what IDs to put in
left hand column
dcast(m.smiths, subject ~ variable)
what IDs to put
along the top
dcast(m.smiths, subject ~ variable)
Whatever was in the value column will get
spread across the cells of the table
dcast(m.smith, subject + gender ~ variable)
left hand
column 1
1 2
left hand
column 2
m.smiths$gender <- rep(c(“male”, “female”), 2)
Multiple variables
dcast(m.smith, subject ~ gender + variable)
Multiple variables
?
dcast(m.smith, subject ~ gender + variable)
still only one header row: names
get combined
Multiple variables
Your turn
Use reshape2 and the smiths data set to
create the following tables
variable John Smith Mary Smith
1 age 33.00 NA
2 weight 90.00 NA
3 height 1.87 1.54
subject variable 1.54 1.87
1 John Smith time NA 1
2 John Smith age NA 33
3 John Smith weight NA 90
4 Mary Smith time 1 NA
5 Mary Smith age NA NA
6 Mary Smith weight NA NA
Q: Often reshaping a data set will lead to
two values being put into the same cell.
How do we combine them?
For example,
Aggregating
game player variable value
1 Yao Ming shot attempts 16
2 Yao Ming shot attempts 12
2 Jordan Hill shot attempts 9
player shot attempts
Jordan Hill 9
Yao Ming ?
A: However you want.
Aggregating
player shot attempts
Jordan Hill 9
Yao Ming 28
dcast(data, left ~ top, agg.method)
R function to use to
combine the values
df <- data.frame(game = c(1,2,2),
player = c("Yao Ming", "Yao Ming",
"Jordan Hill"), variable = c("shot attempts",
"shot attempts", "shot attempts"),
value = c(16, 12, 9))
dcast(df, player ~ variable, sum)
Useful aggregating
functions
length
(default) number of times the
combination appeared
sum total number of all values in that cell
mean
average number of all values in that
cell
* remember to use na.rm = T
Your turn
Use the movie data set to create a table that
shows:
1. The total number of movies in each genre
by year. (hint: years in the rows, genres in the
columns)
2. The average rating, length, and budget of
movies from each year (hint: years in rows)
dcast(movies, year ~ genre, length)
movies1 <- melt(movies[ , c(2:5)], id = "year")
movies1 <- dcast(movies1, year ~ variable, mean,
na.rm = T)
qplot(year, length, data = movies1, geom = "line")
Its often useful to see the total for each
row and column. These totals are known
as margins or marginal distributions.
Margins
To add a margin column:
dcast(data, left ~ top, margins = "name of last variable on left side
of ~")
To add a margin row:
dcast(data, left ~ top, margins = "name of last variable on right
side of ~")
To add both:
dcast(data, left ~ top, margins = c("name1", "name2"))
Margins
Your turn
Add a margin row and column to your
table of genres by year.
Making
tables with R
Data
analysis
done in R
Problem
Tables
published
in latex?you +
hand entry
Data
analysis
done in R
solution
Tables
published
in latex?xtable
library(xtable)
xtable converts objects in R to a table in
latex code with xtable() and print()
head(smiths)
print(xtable(smiths))
print(xtable(smiths), floating = FALSE)
Captions
print(xtable(smiths, caption = "John is
taller than Mary."))
Centering
print(xtable(smiths), latex.environment =
"center")
Column Alignment
smith.table <- xtable(smiths)
align(smith.table) <- "|cc|cccc|"
print(smith.table)
Labels
print(xtable(smiths, label = "tab:smith"))
Suppress row names
print(xtable(smiths), include.rownames =
FALSE)
more
Remember that variable names like
“year_2010” won’t print out in latex. You’ll
have to change them to “year_2010”
first.
For more examples of xtables, visit:
cran.r-project.org/web/packages/xtable/
vignettes/xtableGallery.pd
Create latex code for the genre by year
table. Give the table an appropriate
caption and label and remove row names.
Your turn
genres <- dcast(movies, year ~ genre, length)
print(xtable(genres, label = "tab:genres",
caption = "Table of genres by year. Action
movies win."), include.rownames = FALSE)
Advanced
table styling
6 rules for pretty tables
From http://www.ctan.org/tex-archive/macros/latex/contrib/booktabs/booktabs.pdf
1. Never use vertical lines
2. Never use double lines
3. Put units in the column heading, not in
body of table
4. Always precede a decimal point by a digit
5. Never use “ditto” signs to repeat a
previous value
6. Use signficant digits consistently within
columns
booktabs package
booktabs is a latex package that provides
tools for making table even better.
Specifically it provides toprule and
bottomrule which are thicker and provide
better spacing than using hline to begin
and end your tables. midrule is thinner
and can be used within the table.
booktabs package
To use toprule, bottomrule, and midrule
first declare usepackage{booktabs} at
the top of your tex document.
This work is licensed under the Creative
Commons Attribution-Noncommercial 3.0 United
States License. To view a copy of this license,
visit http://creativecommons.org/licenses/by-nc/
3.0/us/ or send a letter to Creative Commons,
171 Second Street, Suite 300, San Francisco,
California, 94105, USA.

Más contenido relacionado

La actualidad más candente

Advanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part IIAdvanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part IIDr. Volkan OBAN
 
Derivatives vinnie
Derivatives vinnieDerivatives vinnie
Derivatives vinniecanalculus
 
Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)Theodore Grammatikopoulos
 
Q plot tutorial
Q plot tutorialQ plot tutorial
Q plot tutorialAbhik Seal
 
CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction
CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction
CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction Arulalan T
 

La actualidad más candente (7)

How big-is-your-graph
How big-is-your-graphHow big-is-your-graph
How big-is-your-graph
 
Advanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part IIAdvanced Data Visualization Examples with R-Part II
Advanced Data Visualization Examples with R-Part II
 
09 Simulation
09 Simulation09 Simulation
09 Simulation
 
Derivatives vinnie
Derivatives vinnieDerivatives vinnie
Derivatives vinnie
 
Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)Tree-Based Methods (Article 8 - Practical Exercises)
Tree-Based Methods (Article 8 - Practical Exercises)
 
Q plot tutorial
Q plot tutorialQ plot tutorial
Q plot tutorial
 
CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction
CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction
CDAT - cdms2, maskes, cdscan, cdutil, genutil - Introduction
 

Destacado (20)

A Research Approach to Develop Measurable Competencies and Skills: Videogames...
A Research Approach to Develop Measurable Competencies and Skills: Videogames...A Research Approach to Develop Measurable Competencies and Skills: Videogames...
A Research Approach to Develop Measurable Competencies and Skills: Videogames...
 
Price Patterns and Fibonacci Ratios
Price Patterns and Fibonacci RatiosPrice Patterns and Fibonacci Ratios
Price Patterns and Fibonacci Ratios
 
10 simulation
10 simulation10 simulation
10 simulation
 
17 polishing
17 polishing17 polishing
17 polishing
 
20 Polishing
20 Polishing20 Polishing
20 Polishing
 
15 Time Space
15 Time Space15 Time Space
15 Time Space
 
14 Ddply
14 Ddply14 Ddply
14 Ddply
 
02 large
02 large02 large
02 large
 
12 adv-manip
12 adv-manip12 adv-manip
12 adv-manip
 
16 critique
16 critique16 critique
16 critique
 
01 Intro
01 Intro01 Intro
01 Intro
 
20 date-times
20 date-times20 date-times
20 date-times
 
21 spam
21 spam21 spam
21 spam
 
04 Wrapup
04 Wrapup04 Wrapup
04 Wrapup
 
Grammar Of Graphics: past, present, future
Grammar Of Graphics: past, present, futureGrammar Of Graphics: past, present, future
Grammar Of Graphics: past, present, future
 
13 case-study
13 case-study13 case-study
13 case-study
 
01 intro
01 intro01 intro
01 intro
 
15 time-space
15 time-space15 time-space
15 time-space
 
19 Critique
19 Critique19 Critique
19 Critique
 
11 Simulation
11 Simulation11 Simulation
11 Simulation
 

Similar a 19 tables

R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementDr. Volkan OBAN
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 
Stata cheat sheet: data transformation
Stata  cheat sheet: data transformationStata  cheat sheet: data transformation
Stata cheat sheet: data transformationTim Essam
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to RAngshuman Saha
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on rAbhik Seal
 
Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Laura Hughes
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data ManipulationChu An
 
Stata cheat sheet: data processing
Stata cheat sheet: data processingStata cheat sheet: data processing
Stata cheat sheet: data processingTim Essam
 
Cheat Sheet for Stata v15.00 PDF Complete
Cheat Sheet for Stata v15.00 PDF CompleteCheat Sheet for Stata v15.00 PDF Complete
Cheat Sheet for Stata v15.00 PDF CompleteTsamaraLuthfia1
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.Dr. Volkan OBAN
 
Practical data science_public
Practical data science_publicPractical data science_public
Practical data science_publicLong Nguyen
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In RRsquared Academy
 

Similar a 19 tables (20)

Data import-cheatsheet
Data import-cheatsheetData import-cheatsheet
Data import-cheatsheet
 
R Cheat Sheet – Data Management
R Cheat Sheet – Data ManagementR Cheat Sheet – Data Management
R Cheat Sheet – Data Management
 
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Introduction to r
Introduction to rIntroduction to r
Introduction to r
 
Stata cheat sheet: data transformation
Stata  cheat sheet: data transformationStata  cheat sheet: data transformation
Stata cheat sheet: data transformation
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to R
 
R
RR
R
 
Data manipulation on r
Data manipulation on rData manipulation on r
Data manipulation on r
 
Stata Cheat Sheets (all)
Stata Cheat Sheets (all)Stata Cheat Sheets (all)
Stata Cheat Sheets (all)
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
3 Data Structure in R
3 Data Structure in R3 Data Structure in R
3 Data Structure in R
 
Stata cheat sheet: data processing
Stata cheat sheet: data processingStata cheat sheet: data processing
Stata cheat sheet: data processing
 
Cheat Sheet for Stata v15.00 PDF Complete
Cheat Sheet for Stata v15.00 PDF CompleteCheat Sheet for Stata v15.00 PDF Complete
Cheat Sheet for Stata v15.00 PDF Complete
 
R Programming Homework Help
R Programming Homework HelpR Programming Homework Help
R Programming Homework Help
 
R language introduction
R language introductionR language introduction
R language introduction
 
Table of Useful R commands.
Table of Useful R commands.Table of Useful R commands.
Table of Useful R commands.
 
Practical data science_public
Practical data science_publicPractical data science_public
Practical data science_public
 
R Programming: Export/Output Data In R
R Programming: Export/Output Data In RR Programming: Export/Output Data In R
R Programming: Export/Output Data In R
 
tidyr.pdf
tidyr.pdftidyr.pdf
tidyr.pdf
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 

Más de Hadley Wickham (20)

27 development
27 development27 development
27 development
 
27 development
27 development27 development
27 development
 
24 modelling
24 modelling24 modelling
24 modelling
 
23 data-structures
23 data-structures23 data-structures
23 data-structures
 
Graphical inference
Graphical inferenceGraphical inference
Graphical inference
 
R packages
R packagesR packages
R packages
 
22 spam
22 spam22 spam
22 spam
 
18 cleaning
18 cleaning18 cleaning
18 cleaning
 
14 case-study
14 case-study14 case-study
14 case-study
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
11 adv-manip
11 adv-manip11 adv-manip
11 adv-manip
 
10 simulation
10 simulation10 simulation
10 simulation
 
09 bootstrapping
09 bootstrapping09 bootstrapping
09 bootstrapping
 
08 functions
08 functions08 functions
08 functions
 
07 problem-solving
07 problem-solving07 problem-solving
07 problem-solving
 
06 data
06 data06 data
06 data
 
05 subsetting
05 subsetting05 subsetting
05 subsetting
 
04 reports
04 reports04 reports
04 reports
 
03 extensions
03 extensions03 extensions
03 extensions
 
25 fin
25 fin25 fin
25 fin
 

19 tables

  • 1. Today we will use the reshape2 and xtables packages, and the movies.csv.bz2 dataset. install.packages(c("reshape2", "xtables"))
  • 2. Garrett Grolemund Phd Student / Rice University Department of Statistics Tables
  • 3. 1. Melt and Cast (reshape) 2. Making tables with R (xtables) 3. Advanced styling (booktabs)
  • 5. reshape2 Improved version of reshape - much faster. melt() works the same as in reshape.
  • 6. Recall Sometimes a single variable is strewn across multiple columns. We can recombine it with a 3 step process: 1. protect the good columns 2. melt the rest 3. subset out false rows
  • 7. tips1 <- melt(tips, id = c("customer_ID", "total_bill", "tip")) names(tips1)[4] <- "gender" tips1 <- subset(tips1, value == 1)[ , 1:4] customer_id total_bill tip female male 1 20.53 4.00 0 1 2 21.70 4.30 0 1 3 19.65 3.00 1 0 customer_id total_bill tip gender 1 20.53 4.00 male 2 21.70 4.30 male 3 19.65 3.00 female
  • 8. Your turn Use melt() to combine the genre columns of movies.csv into one column that displays each movie’s genre.
  • 9. movies <- read.csv("movies.csv.bz2", stringsAsFactors = FALSE) library(reshape) m.movies <-melt(movies, id = names(movies)[1:17]) names(movies)[18] <- "genre" movies <- subset(m.movies, value == 1)[ , 1:18]
  • 10. Cast
  • 11. subject age weight height John Smith 33 90 1.87 Mary Smith 1.54 subject variable value John Smith age 33 John Smith weight 90 John Smith height 1.87 Mary Smith height 1.54 variable John Smith Mary Smith age 33 weight 90 height 1.87 1.54 What is the difference between these data frames?
  • 12. subject age weight height John Smith 33 90 1.87 Mary Smith 1.54 subject variable value John Smith age 33 John Smith weight 90 John Smith height 1.87 Mary Smith height 1.54 variable John Smith Mary Smith age 33 weight 90 height 1.87 1.54 Same measurements, different arrangements
  • 13. dcast (or cast in reshape) transforms a melted dataframe into whatever arrangement you like. Dataframe must contain a column named “value.” smiths m.smiths <- melt(smiths[ , -2]) dcast(m.smiths, subject ~ variable) dcast(m.smiths, variable ~ subject)
  • 14. General strategy Think of dcast() as making a table
  • 15. dcast(m.smiths, subject ~ variable) melted dataset
  • 16. dcast(m.smiths, subject ~ variable) what IDs to put in left hand column
  • 17. dcast(m.smiths, subject ~ variable) what IDs to put along the top
  • 18. dcast(m.smiths, subject ~ variable) Whatever was in the value column will get spread across the cells of the table
  • 19. dcast(m.smith, subject + gender ~ variable) left hand column 1 1 2 left hand column 2 m.smiths$gender <- rep(c(“male”, “female”), 2) Multiple variables
  • 20. dcast(m.smith, subject ~ gender + variable) Multiple variables ?
  • 21. dcast(m.smith, subject ~ gender + variable) still only one header row: names get combined Multiple variables
  • 22. Your turn Use reshape2 and the smiths data set to create the following tables variable John Smith Mary Smith 1 age 33.00 NA 2 weight 90.00 NA 3 height 1.87 1.54 subject variable 1.54 1.87 1 John Smith time NA 1 2 John Smith age NA 33 3 John Smith weight NA 90 4 Mary Smith time 1 NA 5 Mary Smith age NA NA 6 Mary Smith weight NA NA
  • 23. Q: Often reshaping a data set will lead to two values being put into the same cell. How do we combine them? For example, Aggregating game player variable value 1 Yao Ming shot attempts 16 2 Yao Ming shot attempts 12 2 Jordan Hill shot attempts 9 player shot attempts Jordan Hill 9 Yao Ming ?
  • 24. A: However you want. Aggregating player shot attempts Jordan Hill 9 Yao Ming 28 dcast(data, left ~ top, agg.method) R function to use to combine the values
  • 25. df <- data.frame(game = c(1,2,2), player = c("Yao Ming", "Yao Ming", "Jordan Hill"), variable = c("shot attempts", "shot attempts", "shot attempts"), value = c(16, 12, 9)) dcast(df, player ~ variable, sum)
  • 26. Useful aggregating functions length (default) number of times the combination appeared sum total number of all values in that cell mean average number of all values in that cell * remember to use na.rm = T
  • 27. Your turn Use the movie data set to create a table that shows: 1. The total number of movies in each genre by year. (hint: years in the rows, genres in the columns) 2. The average rating, length, and budget of movies from each year (hint: years in rows)
  • 28. dcast(movies, year ~ genre, length) movies1 <- melt(movies[ , c(2:5)], id = "year") movies1 <- dcast(movies1, year ~ variable, mean, na.rm = T) qplot(year, length, data = movies1, geom = "line")
  • 29. Its often useful to see the total for each row and column. These totals are known as margins or marginal distributions. Margins
  • 30. To add a margin column: dcast(data, left ~ top, margins = "name of last variable on left side of ~") To add a margin row: dcast(data, left ~ top, margins = "name of last variable on right side of ~") To add both: dcast(data, left ~ top, margins = c("name1", "name2")) Margins
  • 31. Your turn Add a margin row and column to your table of genres by year.
  • 35. library(xtable) xtable converts objects in R to a table in latex code with xtable() and print() head(smiths) print(xtable(smiths)) print(xtable(smiths), floating = FALSE)
  • 36. Captions print(xtable(smiths, caption = "John is taller than Mary."))
  • 38. Column Alignment smith.table <- xtable(smiths) align(smith.table) <- "|cc|cccc|" print(smith.table)
  • 40. Suppress row names print(xtable(smiths), include.rownames = FALSE)
  • 41. more Remember that variable names like “year_2010” won’t print out in latex. You’ll have to change them to “year_2010” first. For more examples of xtables, visit: cran.r-project.org/web/packages/xtable/ vignettes/xtableGallery.pd
  • 42. Create latex code for the genre by year table. Give the table an appropriate caption and label and remove row names. Your turn
  • 43. genres <- dcast(movies, year ~ genre, length) print(xtable(genres, label = "tab:genres", caption = "Table of genres by year. Action movies win."), include.rownames = FALSE)
  • 45. 6 rules for pretty tables From http://www.ctan.org/tex-archive/macros/latex/contrib/booktabs/booktabs.pdf 1. Never use vertical lines 2. Never use double lines 3. Put units in the column heading, not in body of table 4. Always precede a decimal point by a digit 5. Never use “ditto” signs to repeat a previous value 6. Use signficant digits consistently within columns
  • 46. booktabs package booktabs is a latex package that provides tools for making table even better. Specifically it provides toprule and bottomrule which are thicker and provide better spacing than using hline to begin and end your tables. midrule is thinner and can be used within the table.
  • 47. booktabs package To use toprule, bottomrule, and midrule first declare usepackage{booktabs} at the top of your tex document.
  • 48.
  • 49. This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/ 3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.