19 tables

Today we will use the reshape2 and xtable
packages, and the movies.csv.bz2 dataset.
install.packages(c("reshape2", "xtable"))

Garrett Grolemund
PhD Student / Rice University
Department of Statistics
Tables

1. Melt and Cast (reshape2)
2. Making tables with R (xtable)
3. Advanced styling (booktabs)

reshape2
Improved version of reshape - much
faster. melt() works the same as in
reshape.

Recall
Sometimes a single variable is strewn
across multiple columns. We can
recombine it with a 3 step process:
1. protect the good columns
2. melt the rest
3. subset out false rows

tips1 <- melt(tips, id = c("customer_id",
  "total_bill", "tip"))
names(tips1)[4] <- "gender"
tips1 <- subset(tips1, value == 1)[ , 1:4]
Before:
customer_id  total_bill  tip   female  male
1            20.53       4.00  0       1
2            21.70       4.30  0       1
3            19.65       3.00  1       0

After:
customer_id  total_bill  tip   gender
1            20.53       4.00  male
2            21.70       4.30  male
3            19.65       3.00  female
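
A self-contained sketch of the three steps. It assumes a toy tips data frame built from the three rows shown on the slide; the data frame is constructed here only for illustration, and the final sort is added just to make the printed order match the table.

```r
library(reshape2)

# Toy data copied from the slide; a 1 marks the customer's gender.
tips <- data.frame(
  customer_id = 1:3,
  total_bill  = c(20.53, 21.70, 19.65),
  tip         = c(4.00, 4.30, 3.00),
  female      = c(0, 0, 1),
  male        = c(1, 1, 0)
)

# Steps 1 and 2: protect the good columns as ids and melt the rest.
tips1 <- melt(tips, id = c("customer_id", "total_bill", "tip"))

# The melted column names (female, male) land in "variable"; rename it.
names(tips1)[4] <- "gender"

# Step 3: keep only the rows where the indicator was 1 and drop "value".
tips1 <- subset(tips1, value == 1)[, 1:4]

# Sort by customer so the rows line up with the original table.
tips1 <- tips1[order(tips1$customer_id), ]
print(tips1)
```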

Your turn
Use melt() to combine the genre columns
of movies.csv into one column that
displays each movie’s genre.

movies <- read.csv("movies.csv.bz2",
  stringsAsFactors = FALSE)
library(reshape2)
m.movies <- melt(movies,
  id = names(movies)[1:17])
names(m.movies)[18] <- "genre"
movies <- subset(m.movies, value == 1)[ , 1:18]

What is the difference between these data frames?

subject     age  weight  height
John Smith  33   90      1.87
Mary Smith               1.54

subject     variable  value
John Smith  age       33
John Smith  weight    90
John Smith  height    1.87
Mary Smith  height    1.54

variable  John Smith  Mary Smith
age       33
weight    90
height    1.87        1.54

Same measurements, different arrangements.

dcast() (or cast() in reshape) transforms a
melted data frame into whatever
arrangement you like. The data frame must
contain a column named “value”.
smiths
m.smiths <- melt(smiths[ , -2])
dcast(m.smiths, subject ~ variable)
dcast(m.smiths, variable ~ subject)

General strategy
Think of dcast() as making a table:

dcast(data, left ~ top)

data: the melted dataset
left: what IDs to put in the left hand column
top: what IDs to put along the top

Whatever was in the value column will get
spread across the cells of the table.
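
A small sketch of the strategy, assuming the smiths data set that ships with reshape2. The names by_subject and by_variable are made up for this example.

```r
library(reshape2)

# Drop the time column (column 2) and melt, keeping subject as the id.
m.smiths <- melt(smiths[, -2], id = "subject")

# subject goes down the left hand column, variable along the top.
by_subject <- dcast(m.smiths, subject ~ variable)

# Swapping the two sides of ~ flips the table.
by_variable <- dcast(m.smiths, variable ~ subject)

print(by_subject)
print(by_variable)
```

In both results the cells are filled from the value column; only the arrangement changes.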

Multiple variables
m.smiths$gender <- ifelse(m.smiths$subject == "John Smith", "male", "female")

dcast(m.smiths, subject + gender ~ variable)
Each variable on the left of ~ gets its own left hand column:
subject is left hand column 1, gender is left hand column 2.

dcast(m.smiths, subject ~ gender + variable)
Still only one header row: the names get combined.
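
A runnable sketch of both arrangements, again assuming the smiths data set from reshape2. The gender column is a hypothetical addition used only to have a second id variable, and two_left and two_top are names made up for this example.

```r
library(reshape2)

m.smiths <- melt(smiths[, -2], id = "subject")

# Hypothetical second id variable, added only for illustration.
m.smiths$gender <- ifelse(m.smiths$subject == "John Smith", "male", "female")

# Two id variables on the left: one left hand column for each.
two_left <- dcast(m.smiths, subject + gender ~ variable)

# Two id variables on top: a single header row with combined names.
two_top <- dcast(m.smiths, subject ~ gender + variable)

print(names(two_left))
print(names(two_top))
```

Printing the names shows how the top variables are pasted together into one header per column, rather than stacked into two header rows.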

Your turn
Use reshape2 and the smiths data set to
create the following tables:

  variable John Smith Mary Smith
1      age      33.00         NA
2   weight      90.00         NA
3   height       1.87       1.54

     subject variable 1.54 1.87
1 John Smith     time   NA    1
2 John Smith      age   NA   33
3 John Smith   weight   NA   90
4 Mary Smith     time    1   NA
5 Mary Smith      age   NA   NA
6 Mary Smith   weight   NA   NA

Aggregating
Q: Often reshaping a data set will lead to
two values being put into the same cell.
How do we combine them?
For example,

game  player       variable       value
1     Yao Ming     shot attempts  16
2     Yao Ming     shot attempts  12
2     Jordan Hill  shot attempts  9

player       shot attempts
Jordan Hill  9
Yao Ming     ?
Aggregating
A: However you want.

player       shot attempts
Jordan Hill  9
Yao Ming     28

dcast(data, left ~ top, agg.method)
agg.method: the R function to use to combine the values
(this is dcast's fun.aggregate argument)
df <- data.frame(game = c(1, 2, 2),
  player = c("Yao Ming", "Yao Ming", "Jordan Hill"),
  variable = c("shot attempts", "shot attempts", "shot attempts"),
  value = c(16, 12, 9))
dcast(df, player ~ variable, sum)

Useful aggregating functions
length: (default) number of times the combination appeared
sum: sum of all values in that cell
mean: average of all values in that cell
* remember to use na.rm = TRUE
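
A short sketch comparing the three functions on the shot attempts data from the earlier slide. The result names counts, totals, and averages are made up for this example.

```r
library(reshape2)

df <- data.frame(
  game     = c(1, 2, 2),
  player   = c("Yao Ming", "Yao Ming", "Jordan Hill"),
  variable = "shot attempts",
  value    = c(16, 12, 9)
)

# length: how many values fell into each cell.
counts <- dcast(df, player ~ variable, length)

# sum: the total of the values in each cell.
totals <- dcast(df, player ~ variable, sum)

# mean: the average of the values; na.rm = TRUE is passed on to mean().
averages <- dcast(df, player ~ variable, mean, na.rm = TRUE)

print(counts)
print(totals)
print(averages)
```

Yao Ming has two rows in the data, so his cell is where the three functions give different answers.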

Your turn
Use the movie data set to create a table that
shows:
1. The total number of movies in each genre
by year. (hint: years in the rows, genres in the
columns)
2. The average rating, length, and budget of
movies from each year (hint: years in rows)

dcast(movies, year ~ genre, length)
movies1 <- melt(movies[ , c(2:5)], id = "year")
movies1 <- dcast(movies1, year ~ variable, mean,
  na.rm = TRUE)
library(ggplot2)  # qplot() comes from ggplot2
qplot(year, length, data = movies1, geom = "line")

Margins
It's often useful to see the total for each
row and column. These totals are known
as margins or marginal distributions.

Margins
To add a margin column:
dcast(data, left ~ top, margins = "name of last variable on left side of ~")
To add a margin row:
dcast(data, left ~ top, margins = "name of last variable on right side of ~")
To add both:
dcast(data, left ~ top, margins = c("name1", "name2"))
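
A minimal sketch of margins on the shot attempts data. It uses margins = TRUE, which asks dcast() for all margins at once instead of naming variables; with_margins is a name made up for this example.

```r
library(reshape2)

df <- data.frame(
  game     = c(1, 2, 2),
  player   = c("Yao Ming", "Yao Ming", "Jordan Hill"),
  variable = "shot attempts",
  value    = c(16, 12, 9)
)

# Players down the side, games along the top, cells summed.
# margins = TRUE adds a totals row and a totals column, both labelled "(all)".
with_margins <- dcast(df, player ~ game, sum, margins = TRUE)
print(with_margins)
```

The bottom right cell is the grand total of every value in the table.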

Your turn
Add a margin row and column to your
table of genres by year.

Problem
Data analysis done in R -> ? -> Tables published in latex
? = you + hand entry

Solution
Data analysis done in R -> ? -> Tables published in latex
? = xtable

library(xtable)
xtable converts R objects into latex table
code with xtable() and print().
head(smiths)
print(xtable(smiths))
print(xtable(smiths), floating = FALSE)

Captions
print(xtable(smiths, caption = "John is taller than Mary."))

Centering
print(xtable(smiths), latex.environments = "center")

Column Alignment
smith.table <- xtable(smiths)
align(smith.table) <- "|cc|cccc|"
print(smith.table)

Labels
print(xtable(smiths, label = "tab:smith"))

Suppress row names
print(xtable(smiths), include.rownames = FALSE)
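
A sketch that combines a caption, a label, and suppressed row names in one call. The heights data frame is a toy example using the two heights from the smiths table, and capture.output() is used here only so the generated latex lines can be inspected as an R character vector.

```r
library(xtable)

# Toy data frame, built only for illustration.
heights <- data.frame(
  subject = c("John Smith", "Mary Smith"),
  height  = c(1.87, 1.54)
)

tab <- xtable(heights,
              caption = "John is taller than Mary.",
              label   = "tab:smiths")

# print() writes the latex code; capture.output() collects it line by line.
latex <- capture.output(print(tab, include.rownames = FALSE))
cat(latex, sep = "\n")
```

The printed code can be pasted into a tex document, and the label lets the text refer to the table with \ref{tab:smiths}.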

More
Remember that an underscore is a special
character in latex, so variable names like
“year_2010” may not print out correctly.
You may have to change them first (for example,
by escaping the underscore as “year\_2010”).
For more examples of xtable, visit:
cran.r-project.org/web/packages/xtable/vignettes/xtableGallery.pd

Your turn
Create latex code for the genre by year
table. Give the table an appropriate
caption and label and remove row names.

genres <- dcast(movies, year ~ genre, length)
print(xtable(genres, label = "tab:genres",
  caption = "Table of genres by year. Action movies win."),
  include.rownames = FALSE)

6 rules for pretty tables
From http://www.ctan.org/tex-archive/macros/latex/contrib/booktabs/booktabs.pdf
1. Never use vertical lines
2. Never use double lines
3. Put units in the column heading, not in
body of table
4. Always precede a decimal point by a digit
5. Never use “ditto” signs to repeat a
previous value
6. Use significant digits consistently within
columns

booktabs package
booktabs is a latex package that provides
tools for making tables even better.
Specifically, it provides \toprule and
\bottomrule, which are thicker and provide
better spacing than using \hline to begin
and end your tables. \midrule is thinner
and can be used within the table.

booktabs package
To use \toprule, \bottomrule, and \midrule,
first declare \usepackage{booktabs} at
the top of your tex document.
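
One way to get these rules without editing the latex by hand, sketched under the assumption that the installed version of xtable supports the booktabs argument of its print method. The heights data frame is a toy example built for illustration.

```r
library(xtable)

# Toy data frame, built only for illustration.
heights <- data.frame(
  subject = c("John Smith", "Mary Smith"),
  height  = c(1.87, 1.54)
)

# booktabs = TRUE asks print() to emit the booktabs rules
# in place of the default \hline commands.
latex <- capture.output(
  print(xtable(heights), booktabs = TRUE, include.rownames = FALSE)
)
cat(latex, sep = "\n")
```

The tex document still needs \usepackage{booktabs} in its preamble for the generated code to compile.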

This work is licensed under the Creative
Commons Attribution-Noncommercial 3.0 United
States License. To view a copy of this license,
visit http://creativecommons.org/licenses/by-nc/3.0/us/
or send a letter to Creative Commons,
171 Second Street, Suite 300, San Francisco,
California, 94105, USA.
