AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
Introduction to Programming in R: Extracting and Replacing Object Elements
1. Introduction To Programming In R
2nd & 3rd Floor, 5/3 BEML Layout,
Varathur Road, Thubarahalli,
Kundalahalli Gate, Bangalore 66
Landmark – Behind Kundalahalli Gate bus stop,
Opposite to SKR Convention Mall,
Next to AXIS Bank.
Introduction T o Programming In R
2. 1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 2 /
71Introduction T o Programming In R
3. W orkshop overview and materials
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
3 /
71Introduction T o Programming In R
4. W orkshop overview and materials
This is an intermediate/advanced R course
Appropriate for those with basic knowledge of R
Learning objectives:
Index data objects by position, name or logical condition
Understand looping and branching
Write your own simple functions
Debug functions
Understand and use the S3 object system
Last updated November 20, 2013 4 /
71Introduction T o Programming In R
5. W orkshop overview and materials
Throughout this workshop we will return to a running example that involves
calculating descriptive statistics for every column of a data.frame. We will
often use the built-in iris data set. You can load the iris data by evaluating
d a t a ( i r i s ) at the R prompt.
Our main example today consists of writing a statistical summary function
that calculates the min, mean, median, max, sd, and n for all numeric
columns in a data.frame, the correlations among these variables, and the
counts and proportions for all categorical columns. Typically I will describe a
topic and give some generic examples, then ask you to use the technique to
start building the summary.
Last updated November 20, 2013 5 /
71Introduction T o Programming In R
6. W orkshop overview and materials
Lab computer users:
USERNAME dataclass
PASSWORD dataclass
Download materials from
h t t p : / / p r o j e c t s . i q . h ar v a r d . e d u / r t c / r - prog
Scroll to the bottom of the page and download the
r-programming.zip file
Move it to your desktop and extract
Last updated November 20, 2013 6 /
71Introduction T o Programming In R
7. Data types
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 7 /
71Introduction T o Programming In R
8. Data types
Values can be combined into vectors using the c ( ) function
> num.var <- c (1, 2, 3, 4) # numeric vector
> char. var <- c ( "1", "2", "3", "4") # character vector
> log. var <- c (TRUE, TRUE, FALSE, TRUE) # logica l vector
> char.var2 <- c(num.var, char. var) # numbers coverted to character
>
Vectors have a class which determines how functions treat them
> class(num.var)
[1] "numeric"
> mean(num.var) # take the mean of a numeric vector
[1] 2.5
> class(cha r. v ar)
[1] " character"
> mean(char. var) # cannot average characters
[1] NA
> class(char. var2)
[1] " character"
Last updated November 20, 2013 8 /
71Introduction T o Programming In R
9. Data types
Vectors can be converted from one class to another
> class(char. var2)
[1] " character"
> num.var2 <- as.numeric(char. var2) # convert to numeric
> class(num.var2)
[1] "numeric"
> mean( as.numeric( c h a r . var2)) # now we can calcul a t e the mean
[1] 2.5
> as.numeric( c ( "a", "b", "c") ) # cannot convert l e t t e r s to numeric
[1] NA NA NA
In addition to class, you can examine the length()
and s t r ( ) ucture of vectors
> l s ( ) # l i s t objects in our workspace
[1] " char. var" "char.var2" " log. var" "num.var" "num.var2"
> length(char. var) # how many elements in char. var?
[1] 4
> s t r (num.var2) # what i s the structure of num.var2?
num [1:8] 1 2 3 4 1 2 3 4
Last updated November 20, 2013 9 /
71Introduction T o Programming In R
10. Data types
Factors are stored as numbers, but have character labels. Factors are useful
for
Modeling (automatically contrast coded)
Sorting/presenting values in arbitrary order
Most of the time we can treat factors as though they were character vectors
Last updated November 20, 2013 10 /
71Introduction T o Programming In R
11. Data types
A data.frame is a list of vectors, each of the same length
A list is a collection of objects each of which can be almost anything
> DF <- data.frame(x=1: 5, y = l e t t e r s [ 1: 5] )
> DF # d a t a . frame with two columns and 5 rows
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
>
> # DF <- data.frame(x=1:10, y=1:7) # i l l e g a l becase lengths d i f f e r
> L <- l i s t ( x=1: 5, y=1: 3, z = DF)
> L # l i s t s a r e much more f l e x i b l e !
$x
[1] 1 2 3 4 5
$y
[1] 1 2 3
$z
x y
1 1 a Last updated November 20, 2013 11 /
71Introduction T o Programming In R
12. Data types
Key points:
vector classes include numeric, logical, character, and factors
vectors can be combined into lists or data.frames
a data.frame can almost always be thought of as a list of vectors of
equal length
a list is a collection of objects, each of which can by of almost any type
Functions introduced in this section:
c combine elements
as.numeric convert an object (e.g., a character verctor) to numeric
data.frame combine oject into a data.frame
ls list the objects in the workspace
class get the class of an object
str get the structure of an object
length get the number of elements in an object
mean calculate the mean of a vector
Last updated November 20, 2013 12 /
71Introduction T o Programming In R
13. Data types
Create a new vector called "test" containing five numbers of your choice
[ c ( ) , <- ]
Create a second vector called "students" containing five common names
of your choice [ c ( ) , <- ]
Determine the class of "students" and "test" [ c l a s s ( ) or s t r ( ) ]
Create a data frame containing two columns, "students" and "tests" as
defined above [ data.frame ]
Convert "test" to character class, and confirm that you were successful [
as.numeric(), <-, s t r ( ) ]
Last updated November 20, 2013 13 /
71Introduction T o Programming In R
14. Data types
Create a new vector called "test" containing five numbers of your choice
t e s t <- c ( 1, 2, 3, 4, 5)
a Create a second vector called "students" containing five common
names of your choice
students <- c ( "Mary", "Joan", "Steve", "Alex", "Suzy")
Determine the class of "students" and "test"
c l a s s ( s t ude nt s )
c l a s s ( t e s t )
Create a data frame containing two columns, "students" and "tests" as
defined above
testScores <- data.frame( s t u de n t s , t e s t s )
Convert "test" to character class, and confirm that you were successful
t e s t <- as.charact e r ( t e s t )
c l a s s ( t e s t )
Last updated November 20, 2013 14 /
71Introduction T o Programming In R
15. Extracting and replacing object elements
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 15 /
71Introduction T o Programming In R
16. Extracting and replacing object elements
Parts of vectors, matricies, data.frames, and lists can be extracted or replaced
based on position or name
> ## indexing vectors by position
> x <- 101: 110 # Creat a vector of integer s from 101 to 110
> x[c( 4, 5) ] # extract the fourth and f i f t h values of x
[1] 104 105
> x[4] <- 1 # change the 4th value to 1
> x # p r i n t x
[1] 101 102 103 1 105 106 107 108 109 110
>
> ## indexing vectors by name
> names(x) <- l e t t e r s [ 1: 10] # give x names
> p r i n t (x) #print x
a b c d e f g h i j
101 102 103 1 105 106 107 108 109 110
> x[c( "a", " f " ) ] # extract the values of a and f from x
a f
101 106
Last updated November 20, 2013 16 /
71Introduction T o Programming In R
17. Extracting and replacing object elements
Elements can also be selected or replaced based on logical (TRUE/FALSE)
vectors.
> x > 106 # shows which elements of x a r e > 106
a b c d e f g h i j
FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
> x[x > 106] # s e l e c t s elements of x where x > 106
g h i j
107 108 109 110
Additional operators useful for logical indexing:
= = equal to
!= not equal to
> greater than
< less than
> = greater than or equal to
< = less than or equal to
%in% is included in
& and
| or
> x[x > 106 & x <= 108]
g h
Last updated November 20, 2013 17 /
71Introduction T o Programming In R
18. Extracting and replacing object elements
Extraction on matrices operate in two dimensions: first dimension refers to
rows, second dimension refers to columns
> ## indexing matricies
> # create a matrix
> (M <- cbind(x = 1: 5, y = -1: -5, z = c (6, 3, 4, 2, 8) ) )
x y
z [ 1 , ] 1 -1
6
[ 2 , ] 2 -2 3
[ 3 , ] 3 -3 4
[ 4 , ] 4 -4 2
[ 5 , ] 5 -5 8
> M[1: 3, ] #extract rows 1 through 3 , a l l columns
x y z
[ 1 , ] 1 -1 6
[ 2 , ] 2 -2 3
[ 3 , ] 3 -3 4
> M[c(5, 3, 1) , 2: 3] # rows 5 , 3 and 1 , columns 2 and 3
y z
[ 1 , ] -5 8
[ 2 , ] -3 4
[ 3 , ] -1 6
> M[M[, 1] %in% 4: 2, 2] # second column where f i r s t column <=4 & >= 2
[1] -2 -3 -4 Last updated November 20, 2013 18 /
71Introduction T o Programming In R
19. Extracting and replacing object elements
Lists can be indexed in the same way as vectors, with the following extension:
> # Lists can be indexed with single br a c k e t s , similar to vector indexing
> L[c(1, 2) ] # the f i r s t two elements of L
$x
[1] 1 2 3 4 5
$y
[1] 1 2 3
> L[1] # a l i s t with one element
$x
[1] 1 2 3 4 5
> ## double brackets s e l e c t the content of a single selected element
> ## effectiv e l y taking i t out of the l i s t .
> L[[1] ] # a vector
[1] 1 2 3 4 5
Last updated November 20, 2013 19 /
71Introduction T o Programming In R
20. Extracting and replacing object elements
A data.frame can be indexed in the same ways as a matrix, and also the same
ways as a list:
> DF[c(3, 1, 2) , c (1, 2) ] # rows 3 , 1 , and 2 , columns 1 and 2
x y
3 3 c
1 1 a
2 2 b
> DF[[1] ] # column 1 as a vector
[1] 1 2 3 4 5
There is a subtle but important difference between [ , n] and [ n] when
indexing data.frames: the first form returns a vector, the second returns a
data.frame with one column.
> s t r (DF[1] ) # a data.frame with one column
’data.frame’: 5 ob s . of 1 variable :
$ x: i n t 1 2 3 4 5
> s t r (DF[ , 1] ) # a vector
i n t [1:5] 1 2 3 4 5
Last updated November 20, 2013 20 /
71Introduction T o Programming In R
21. Extracting and replacing object elements
Key points:
elements of objects can be extracted or replaced using the [ operator
objects can be indexed by position, name, or logical (TRUE/FALSE)
vectors
vectors and lists have only one dimension, and hence only one index is
used
matricies and data.frames have two dimensions, and extraction methods
for these objects use two indices
Functions introduced in this section:
[ extraction operator, used to extract/replace object elements
names get the names of an object, usually a vector, list, or data.frame
print print an object
Last updated November 20, 2013 21 /
71Introduction T o Programming In R
22. Extracting and replacing object elements
Select just the Sepal.Length and Species columns from the iris data set
(built-in, will be available in your workspace automatically) and save the
result to a new data.frame named iris.2
Calculate the mean of the Sepal.Length column in iris.2
BONUS (optional): Calculate the mean of Sepal.Length, but only for
the setosa species
BONUS (optional): Calculate the number of sepal lengths that are more
than one standard deviation below the average sepal length
Last updated November 20, 2013 22 /
71Introduction T o Programming In R
23. Extracting and replacing object elements
Select just the Sepal.Length and Species columns from the iris data set
(built-in, will be available in your workspace automatically) and save the
result to a new data.frame named iris.2
data( i r i s )
i r i s 2 <- i r i s [ c ( "Sepal.Length", "Species") ]
s t r ( i r i s 2 )
Calculate the mean of the Sepal.Length column in iris.2
mean( i r i s 2 [ , "Sepal.Length"] )
3[@3] BONUS (optional): Calculate the mean of Sepal.Length, but only for
the setosa species
mean( i r i s 2 [ i r i s 2 [ [ "Species"] ] == "setosa", "Sepal.Length"] )
## s hor t c ut :
with( i r i s 2 , {
p r i n t ( mean(Sepal.Length[Species == "setosa"] ) )
})
4[@4] BONUS (optional): Calculate the number of sepal lengths that are
more than one standard deviation below the average sepal length
m.mi nus . s d <- mean( i r i s 2 [ [ "Sepal.Length"] ] ) - sd( i r i s 2 [ [ "Sepal.Length"] ] )
length( i r i s 2 [ i r i s 2 [ [ "Sepal.Length"] ] < m.minus.sd, "Sepal.Length"] )
Last updated November 20, 2013 23 /
71Introduction T o Programming In R
24. Applying functions to list elements
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 24 /
71Introduction T o Programming In R
25. Applying functions to list elements
The apply function is used to apply a function to the rows or columns of a
matrix
> M <- matrix( 1: 20, ncol=4)
> apply(M, 2, mean) ## average across the rows
[1] 3 8 13 18
> apply(M, 2, sum) ## sum the columns
[1] 15 40 65 90
Last updated November 20, 2013 25 /
71Introduction T o Programming In R
26. Applying functions to list elements
It is often useful to apply a function to each element of a vector, list, or
data.frame; use the sapply function for this
> sapply(DF, class ) # get the class of each column in the DF data.frame
x y
" integer" " factor "
> sapply(L, length) # get the length of each element in the L l i s t
x y z
5 3 2
> sapply(DF, i s . n umeric) # check each column of DF to see i f i t i s numeric
x y
TRUE FALSE
Last updated November 20, 2013 26 /
71Introduction T o Programming In R
27. Applying functions to list elements
The sapply function can be used in combination with indexing to extract
elements that meet certain criteria
Recall that we can index using logical vectors:
> DF[, c (TRUE, FALSE) ] # s e l e c t the f i r s t column of DF, but not the second
[1] 1 2 3 4 5
> ## r e c a l l t h a t we can index using logica l vectors :
> DF[, c(TRUE, FALSE) ] # s e l e c t the f i r s t column of DF, but not the second
[1] 1 2 3 4 5
sapply() can be used to generate the logical vector
> (DF.which.num <- sapply(DF, is. numeric))# check which columns of DF a r e numer
x y
TRUE FALSE
> DF[DF.which.num] # s e l e c t the numeric columns
x
1 1
2 2
3 3
4 4
5 5 Last updated November 20, 2013 27 /
71Introduction T o Programming In R
28. Applying functions to list elements
Key points:
R has convenient methods for applying functions to matricies, lists, and
data.frames
other apply-style functions exist, e.g., lapply, tapply, and mapply (see
documentation of these functions for details
Functions introduced in this section:
matrix create a matrix (vector with two dimensions)
apply apply a function to the rows or columns of a matrix
sapply apply a function to the elements of a list
is.numeric returns TRUE or FALSE, depending on the type of object
Last updated November 20, 2013 28 /
71Introduction T o Programming In R
29. W riting functions
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 29 /
71Introduction T o Programming In R
30. W riting functions
A function is a collection of commands that takes input(s) and returns
output.
If you have a specific analysis or transformation you want to do on
different data, use a function
Functions are defined using the function() function
Functions can be defined with any number of named arguments
Arguments can be of any type (e.g., vectors, data.frames, lists . . . )
Last updated November 20, 2013 30 /
71Introduction T o Programming In R
31. W riting functions
The return value of a function can be:
The last object stored in the body of the function
Objects explicitly returned with the return() function
Other function output can come from:
Calls to print() , message() or cat() in function body
Error messages
Assignment inside the body of a function takes place in a local
environment
Example:
> f <- function( ) { # define function f
+ p r i n t ( " s e t t i n g x to 1") # p r i n t a t e xt s t r i n g
+ x <- 1} # s e t x to 1
>
> y <- f ( ) # assign y the value returned by f
[1] " s e t t i n g x to 1"
>
> y # p r i n t y
[1] 1
> x # x in the global i s not 1!
a b c d e f g h i j
101 102 103 1 105 106 107 108 109 110
Last updated November 20, 2013 31 /
71Introduction T o Programming In R
32. W riting functions
Goal: write a function that returns the square of it’sargument
> square <- function (x) { # define function named "square" with argument x
+ return(x*x) # multiple the x argument by i t s e l f
+ } # end the function defini t io n
>
> # check to see t h a t the function works
> square(x = 2) # square the value 2
[1] 4
> square( 10) # square the value 10
[1] 100
> square( 1: 5) # square integer s 1 through 5
[1] 1 4 9 16 25
Last updated November 20, 2013 32 /
71Introduction T o Programming In R
33. W riting functions
Stepping througth functions and setting breakpoints
Use traceback() to see what went wrong after the fact
Last updated November 20, 2013 33 /
71Introduction T o Programming In R
34. W riting functions
Key points:
writing new functions is easy!
most functions will have a return value, but functions can also print
things, write things to file etc.
functions can be stepped through to facilitate debugging
Functions introduced in this section
function defines a new function
return used inside a function definition to set the return value
browser sets a break point
debug turns on the debugging flag of a function so you can step
through it
undebug turns off the debugging flag
traceback shows the error stack (call after an error to see what went
wrong)
Last updated November 20, 2013 34 /
71Introduction T o Programming In R
35. W riting functions
Write a function that takes a data.frame as an argument and returns the
mean of each numeric column in the data frame. Test your function
using the iris data.
Modify your function so that it returns a list, the first element if which is
the means of the numeric variables, the second of which is the counts of
the levels of each categorical variable.
Last updated November 20, 2013 35 /
71Introduction T o Programming In R
36. W riting functions
Write a function that takes a data.frame as an argument and returns the
mean of each numeric column in the data frame. Test your function
using the iris data.
statsum <- function( d f ) {
c l a s s e s <- sapply( d f , c l a s s )
means <- sappl y ( df [ c l a s s e s == "numeric"] , mean)
return (means)}
statsum( i r i s )
Modify your function so that it returns a list, the first element if which is
the means of the numeric variables, the second of which is the counts of
the levels of each categorical variable.
statsum <- f unc t i on( d f ) {
c l a s s e s <- s apply( d f , c l a s s )
means <- sapply( df [ c l a s s e s == "numeric"] , mean)
counts <- sapply( df [ c l a s s e s == " f a c t o r " ] , t a bl e )
r e t ur n ( l i s t (means, counts))}
statsum( i r i s )
Last updated November 20, 2013 36 /
71Introduction T o Programming In R
37. Control flow
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 37 /
71Introduction T o Programming In R
38. Control flow
Basic idea: if some condition is true, do one thing. If false, do
something else
Carried out in R using i f ( ) and e l s e ( ) statements, which can be
nested if necessary
Especially useful for checking function arguments and performing
different operations depending on function input
Last updated November 20, 2013 38 /
71Introduction T o Programming In R
39. Control flow
Goal: write a function that tells us if a number is positive or negative
> ## use branching to return d i f f e r e n t resul t depending on the sign of the inpu
> isPosit i ve <- function(x) { # define function " isPosit i ve "
+ i f (x > 0) { # i f x i s greater than zero, then
+ c a t( x , " i s positi ve n" ) } # say so!
+ else { # otherwise
+ c a t( x , " i s negative n" )} # say x i s negative
+ } # end function definit i o n
>
> ## t e s t the i s P o s i t i v e ( ) function
> isPosit i ve ( 10)
10 i s positi v e
> isPosit i ve ( -1)
-1 i s negative
> isPosit i ve ( 0)
0 i s negative
Need to do something different if x equals zero!
Last updated November 20, 2013 39 /
71Introduction T o Programming In R
40. Control flow
Add a condition to handle x = 0
> ## add condition to handle the case t h a t x i s zero
> i s P o s i t i ve <- function(x) { # define function " isPosit i ve "
+ i f (x > 0) { # i f x i s greater than zero, then
+ c a t( x , " i s positi ve n" ) } # say so!
+ else i f (x == 0) { # otherwise i f x i s zero
+ c a t( x , " i s zero n" )} # say so!
+ else { #otherwise
+ c a t( x , " i s negative n" )} # say x i s negative
+ } # end function defini t i on
>
Test the new function
> isPosit i ve ( 0) # t e s t the i s P o s i t i v e ( ) function
0 i s zero
> isPosit i ve ( "a") #oops, t h a t will not work! a i s
positi ve
We fixed the problem when x = 0, but now we need to make sure x is
numeric of length one (unless we agree with R that a is positive!)
Last updated November 20, 2013 40 /
71Introduction T o Programming In R
41. Control flow
Do something reasonable if x is not numeric
> ## add condition to handle the case t h a t x i s zero
> i s P o s i t i ve <- function(x) { # define function " isPosit i ve "
+ i f ( ! is.numeric(x) | length(x) > 1) {
+ c a t( "x must be a numeric vector of length one! n" )}
+ else i f (x > 0) { # i f x i s greater than zero, then
+ c a t( x , " i s positi ve n" ) } # say so!
+ else i f (x == 0) { # otherwise i f x i s z e ro
+ c a t( x , " i s zero n" )} # say so!
+ else { #otherwise
+ c a t( x , " i s negative n" )} # say x i s negative
+ } # end function defini t i on
>
> isPosit i ve ( "a") # t e s t the i s P o s i t i v e ( ) function on character
x must be a numeric vector of length one!
Last updated November 20, 2013 41 /
71Introduction T o Programming In R
42. Control flow
Key points:
code can be conditionally executed
conditions can be nested
conditional execution is often used for argument checking, among other
things
Functions1 introduced in this section
cat Concatenates and prints R objects
if execute code only if condition is met
else used with if; code to execute if condition is not met
1Technically i f and else are not functions, but this need not concern us at the moment
Last updated November 20, 2013 42 /
71Introduction T o Programming In R
43. Control flow
Add argument checking code to return an error if the argument to your
function is not a data.frame
Insert a break point with browser() and step through your function
Last updated November 20, 2013 43 /
71Introduction T o Programming In R
44. Control flow
Add argument checking code to return an error if the argument to your
function is not a data.frame
statsum <- function( d f ) {
i f ( c l a s s ( d f ) != "data.frame") stop( "df must be a data.frame!")
c l a s s e s <- sapply( d f , c l a s s )
means <- sappl y ( df [ c l a s s e s == "numeric"] , mean)
counts <- sapply( df [ c l a s s e s == " f a c t o r " ] , tabl e )
r e t ur n ( l i s t (means, counts))}
statsum( 1: 10)
statsum( i r i s )
Insert a break point with browser() and step through your function
statsum <- f unc t i on( d f ) {
i f ( c l a s s ( d f ) != "data.frame") stop( "df must be a data.frame!")
browser( )
c l a s s e s <- s apply( d f , c l a s s )
means <- sapply( df [ c l a s s e s == "numeric"] , mean)
counts <- sapply( df [ c l a s s e s == " f a c t o r " ] , t a bl e )
r e t ur n ( l i s t (means, counts))
}
statsum( i r i s )
Last updated November 20, 2013 44 /
71Introduction T o Programming In R
45. The S3 object class system
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 45 /
71Introduction T o Programming In R
46. The S3 object class system
R has two major object systems:
Relatively informal "S3" classes
Stricter, more formal "S4" classes
We will cover only the S3 system, not the S4 system
Basic idea: functions have different methods for different types of objects
Last updated November 20, 2013 46 /
71Introduction T o Programming In R
47. The S3 object class system
The class of an object can be retrieved and modified using the c l a s s ( )
function:
> x <- 1: 10
> class(x)
[1] " integer"
> class(x) <- "foo"
> class(x)
[1] "foo"
Objects are not limited to a single class, and can have many classes:
> class(x) <- c ( "A", "B")
> class(x)
[1] "A" "B"
Last updated November 20, 2013 47 /
71Introduction T o Programming In R
48. The S3 object class system
Functions can have many methods, allowing us to have (e.g.) one plot()
function that does different things depending on what is being plotted()
Methods can only be defined for generic functions: plot, print, summary,
mean, and several others are already generic
> # see what methods have been defined for the mean function
> methods(mean)
[1] mean.Date mean.default mean.difftim e mean.POSIXct
[5] mean.POSIXlt
> # which functions have methods for d a t a . f rames?
> methods(class=" d a t a . f rame") [ 1: 9]
[1] " aggregate. data. fra me" "anyDuplicated.data.frame"
[3] " as. data.frame.data .frame" " a s . l i s t . d a t a . f r a m e "
[5]
" a s . m a t r i x . d a t a . f r a me"
[7] "cbind.data.frame"
[9] " [. data. frame"
"by.data.frame"
"[<-.data.frame"
Last updated November 20, 2013 48 /
71Introduction T o Programming In R
49. The S3 object class system
To create a new method for a function that is already generic all you have to
do is name your function function.class
> # create a mean() method for objects of class " foo" :
> mean.foo <- function(x) { # mean method for "foo" class
+ i f ( i s . n u meric( x ) ) {
+ c a t( "The average i s " , mean.defaul t ( x ) )
+ return( i n v i s i b l e ( mean.default( x ) ) ) #use mean.default for numeric
+ } el s e
+ c a t( "x i s not numeric n" ) } # otherwise say x not numeric
>
> x <- 1: 10
> mean(x)
[1] 5.5
> class (x) <- "foo"
> mean(x)
The average i s 5.5>
> x <- as. character (x)
> class (x) <- "foo"
> mean(x)
x i s not nume r i c
Last updated November 20, 2013 49 /
71Introduction T o Programming In R
50. The S3 object class system
S3 generics are most often used for print, summary, and plot methods, but
sometimes you may want to create a new generic function
> # create a generic disp() function
> disp <- function( x , . . . ) {
+ UseMethod( "disp")
+ }
>
> # create a disp method for class "mat r i x "
> disp.matrix <- function(x) {
+ p r i n t ( round( x , digits= 2) )
+ }
>
> # t e s t i t out
> disp( matrix( runif ( 10) , ncol=2) )
[ , 1 ] [ , 2 ]
[ 1 , ] 0.78 0.21
[ 2 , ] 0.85 0.45
[ 3 , ] 0.31 0.34
[ 4 , ] 0.47 0.80
[ 5 , ] 0.51 0.66
Last updated November 20, 2013 50 /
71Introduction T o Programming In R
51. The S3 object class system
Key points:
there are several class systems in R, of which S3 is the oldest and
simplest
objects have class and functions have corresponding methods
the class of an object can be set by simple assignment
S3 generic functions all contain UseMethod("x") in the body, where x
is the name of the function
new methods for existing generic functions can be written simply by
defining a new function with a special naming scheme: the name of the
function followed by dot followed by the name of the class
Functions introduced in this section
plot creates a graphical display, the type of which depends on the
class of the object being plotted
methods lists the methods defined for a function or class
UseMethod the body of a generic function
invisible returns an object but does not print it
Last updated November 20, 2013 51 /
71Introduction T o Programming In R
52. The S3 object class system
Modify your function so that it also returns the standard deviations of
the numeric variables
Modify your function so that it returns a list of class "statsum"
Write a print method for the statsum class
Last updated November 20, 2013 52 /
71Introduction T o Programming In R
53. The S3 object class system
Modify your function so that it also returns the standard deviations of
the numeric variables
statsum <- function( d f ) {
i f ( c l a s s ( d f ) != "data.frame") stop( "df must be a data.frame!")
c l a s s e s <- sapply( d f , c l a s s )
means <- sappl y ( df [ c l a s s e s == "numeric"] , mean)
sds <- sapply( df [ c l a s s e s == "numeric"] , mean)
counts <- sapply( df [ c l a s s e s == " f a c t o r " ] , tabl e )
r e t ur n ( l i s t ( cbind(means , s d s ) , counts))}
statsum( i r i s )
Modify your function so that it returns a list of class "statsum"
statsum <- f unc t i on( d f ) {
i f ( c l a s s ( d f ) != "data.frame") stop( "df must be a data.frame!")
c l a s s e s <- s apply( d f , c l a s s )
means <- sapply( df [ c l a s s e s == "numeric"] , mean)
sds <- sapply( df [ c l a s s e s == "numeric"] , mean)
counts <- sapply( df [ c l a s s e s == " f a c t o r " ] , t a bl e )
R <- l i s t ( cbind(means, s d s ) , counts)
c l a s s (R) <- c ( "statsum", c l a s s (R))
r e t ur n (R)}
s t r ( statsum( i r i s ) )
3 [@3] Write a print method for the statsum class
pr i nt . s t a t s um <- function( x) {
p r i n t ( x [ [ 1] ] ,
c a t ( "Numeric va r i able de s c r i pt i ve s t a t i s t i c s : n " )
Introduction T o Programming In R
Last updated November 20, 2013 53 /
71
54. Things that may surprise you
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 54 /
71Introduction T o Programming In R
55. Things that may surprise you
There are an unfortunately large number of surprises in R programming
Some of these "gotcha’s" are common problems in other languages,
many are unique to R
We will only cover a few –for a more comprehensive discussion please
see http://www.burns-s t a t .com/pages/Tutor/R_inferno.pdf
Last updated November 20, 2013 55 /
71Introduction T o Programming In R
56. Things that may surprise you
Floating point arithmetic is not exact:
> . 1 == . 3 / 3
[1] FALSE
Solution: use a l l . e q u a l ( ):
> a l l . e q u a l ( . 1 , . 3/ 3)
[1] TRUE
Last updated November 20, 2013 56 /
71Introduction T o Programming In R
57. Things that may surprise you
R does not exclude missing values by default –a single missing value in a
vector means that many thing are unknown:
> x <- c (1: 10, NA, 12: 20)
> c (mean( x ) , sd(x) , median( x ) , min( x ) , sd( x ) )
[1] NA NA NA NA NA
NA is not equal to anything, not even NA
> NA == NA
[1] NA
Solutions: use na.rm = TRUE option when calculating, and is.na to test for
missing
Last updated November 20, 2013 57 /
71Introduction T o Programming In R
58. Things that may surprise you
Automatic type conversion happens a lot which is often useful, but makes it
easy to miss mistakes
> # combining values coereces them to the most general type
> (x <- c (TRUE, FALSE, 1, 2, "a", "b") )
[1] "TRUE" "FALSE" "1" "2" "a" "b"
> s t r (x)
chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
>
> # comparisons convert arguments to most general type
> 1 > "a"
[1] FALSE
Maybe this is what you expect. . . I would like to at least get a warning!
Last updated November 20, 2013 58 /
71Introduction T o Programming In R
59. Things that may surprise you
Functions you might expect to work similarly don’t always:
> mean( 1, 2, 3, 4, 5)*5
[1] 5
> sum( 1, 2, 3, 4, 5)
[1] 15
Why are these different?!?
> args(mean)
function ( x , . . . )
NULL
> args(sum)
function ( . . . , na.rm = FALSE)
NULL
Ouch. That is not nice at all!
Last updated November 20, 2013 59 /
71Introduction T o Programming In R
60. Things that may surprise you
Factors sometimes behave as numbers, and sometimes as characters, which
can be confusing!
> (x <- f a c t or( c ( 5, 5, 6, 6) , levels = c (6, 5) ) )
[1] 5 5 6 6
Levels: 6 5
>
> s t r (x)
Factor w/ 2 levels " 6" , " 5" : 2 2 1 1
>
> as. character (x)
[1] "5" "5" "6" "6"
> # here i s where people sometimes get l o s t . . .
> as.numeric(x)
[1] 2 2 1 1
> # you probably want
> as.numeric( as. charact e r ( x ) )
[1] 5 5 6 6
Last updated November 20, 2013 60 /
71Introduction T o Programming In R
61. Additional resources
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 61 /
71Introduction T o Programming In R
62. Additional resources
S3 system overview:
https://github.com/ hadley/devtools/wiki/S3
S4 system overview:
https://github.com/ hadley/devtools/wiki/S4
R documentation: h t t p : / / c r a n . r - project.org/manuals.html
Collection of R tutorials:
h t t p : / / c r a n . r - projec t . o r g / o t h e r - docs.html
R for Programmers (by Norman Matloff, UC–Davis)
h t t p : / / h e a t h e r . c s . u c d avis.edu/~matloff/R/RProg.pdf
Calling C and Fortran from R (by Charles Geyer, UMinn)
http://www.stat.umn.e d u/ ~c ha rlie /rc/
State of the Art in Parallel Computing with R (Schmidberger et al.)
http://www.jstatso|.o rg/v31/i01/paper
Institute for Quantitative Social Science: http://iq. ha r var d .e d u
Research technology consulting:
h t t p : / / p r o j e c t s . i q . harvard.edu/rtc
Last updated November 20, 2013 62 /
71Introduction T o Programming In R
63. Additional resources
Help Us Make This Workshop Better!
Please take a moment to fill out a very short feedback form
These workshops exist for you –tell us what you need!
h t t p : / / t i n y u r l . c o m / RprogrammingFeedback
Last updated November 20, 2013 63 /
71Introduction T o Programming In R
64. Loops (supplimental)
1 Workshop overview and materials
2 Data types
Extracting and replacing object elements
4 Applying functions to list elements
3
5
Writing functions
Control flow
6
7
The S3 object class system
Things that may surprise you
9 Additional resources
8
10 Loops (supplimental)
Last updated November 20, 2013 64 /
71Introduction T o Programming In R
65. Loops (supplimental)
A loop is a collection of commands that are run over and over again.
A for loop runs the code a fixed number of times, or on a fixed set of
objects.
A while loop runs the code until a condition is met.
If you’re typing the same commands over and over again, you might
want to use a loop!
Last updated November 20, 2013 65 /
71Introduction T o Programming In R
66. Loops (supplimental)
For each value in a vector, print the number and its square
> # For-loop example
> for (num i n seq( -5, 5) ) {# for each number in [ - 5 , 5]
+ c a t(num, "squared i s " , num^2, " n" ) # p r i n t the number
+ }
-5 squared i s 25
-4 squared i s 16
-3 squared i s 9
-2 squared i s 4
-1 squared i s 1
0 squared i s 0
1 squared i s 1
2 squared i s 4
3 squared i s 9
4 squared i s 16
5 squared i s 25
Introduction T o Programming In R
67. Loops (supplimental)
Goal: simulate rolling two dice until we roll two sixes
> ## While-loop example: r o l l i n g dice
> set. seed ( 15) # allows repoducible sample() resu l t s
> dice <- seq( 1, 6) # s e t dice = [1 2 3 4 5 6]
> r o l l <- 0 # s e t r o l l = 0
> while ( r o l l < 12) {
+ r o l l <- sample( d i c e , 1) + sample( d i c e , 1) # calculate sum of two r o l l s
+ c a t( "We rolled a " , r o l l , " n" ) # p r i n t the r e s u l t
+ } # end the loop
We rolled a 6
We rolled a 10
We rolled a 9
We rolled a 7
We rolled a 10
We rolled a 5
We rolled a 9
We rolled a 12
Last updated November 20, 2013
Introduction T o Programming In R
68. Loops (supplimental)
Often you will want to store the results from a loop. You can create an object
to hold the results generated in the loop and fill in the values using indexing
> ## save calculat i o ns done in a loop
> Result <- l i s t ( ) # create an object to store the r e s u l t s
> for ( i in 1: 5) {# for each i in [ 1 , 5]
+ R e s u l t [ [ i ] ] <- 1: i ## a s s i gn the sequence 1 to i to Result
+ }
> Result # pr i n t Result
[ [ 1 ] ]
[1] 1
[ [ 2 ] ]
[1] 1 2
[ [ 3 ] ]
[1] 1 2 3
[ [ 4 ] ]
[1] 1 2 3 4
[ [ 5 ] ]
[1] 1 2 3 4 5
Last updated November 20, 2013
Introduction T o Programming In R
69. Loops (supplimental)
Most operations in R are vectorized –This makes loops unnecessary in many
cases
Use vector arithmatic instead of loops:
> x <- c ( ) # create vector x
> for ( i in 1: 5) x [ i ] <- i + i # double a vector using a loop
> p r i n t (x) # p r i n t the r e s u l t
[1] 2 4 6 8 10
>
> 1: 5 + 1: 5 #double a vector without a loop
[1] 2 4 6 8 10
> 1: 5 + 5 # shorter vectors are recycled
[1] 6 7 8 9 10
Use paste instead of loops:
> ## Earlier we said
> ## for (num in seq( -5, 5)) {# for each number in [ - 5 , 5]
>
>
## cat(num, "squared i s " , num^2, " n " ) # p r i n t the number
## }
> ## a bette r way:
> paste( 1: 5, "squared = " , ( 1: 5)^ 2)
[1] "1 squared = 1" "2 squared = 4" "3 squared =
9"[4] "4 squared = 16" "5 squared = 25"
Introduction T o Programming In R
Last updated November 20, 2013
70. Loops (supplimental)
use a loop to get the c l a s s ( ) of each column in the iris data set
use the results from step 1 to select the numeric columns
use a loop to calculate the mean of each numeric column in the iris data
Introduction T o Programming In R
71. Loops (supplimental)
use a loop to get the c l a s s ( ) of each column in the iris data set
> classes <- c ( )
> for (name in names( i r i s ) ) {
+
+ }
>
classes[name] <- class(ir is[ [n am e]] )
> classes
Sepal.Length Sepal.Width Petal.Length
Petal.Width
Species
" factor""numeric" "numeric" "numeric" "numeric"
use the results from step 1 to select the numeric columns
> iris.num <- i r i s [ , names(classes)[classes== "numeric"] ]
> head(ir is. nu m, 2)
Sepal.Length Sepal.Width Petal.Length Petal.Width
1
2
5.1
4.9
3.5
3.0
1.4
1.4
0.2
0.2
use a loop to calculate the mean of each numeric column in the iris data
> iris. m e ans <- c ( )
> for (var in names(iris. num) ) {
+ iris. means [[ var ]] <- mean( i r i s [ [ v a r ] ] )
+ }
Last updated November 20, 2013 /
71Introduction T o Programming In R
72. Training in Bangalore
Second Floor and Third Floor,
5/3 BEML Layout,
Varathur Road, Thubarahalli,
Kundalahalli Gate, Bangalore 66
Landmark – Behind Kundalahalli Gate bus stop,
Opposite to SKR Convention Mall,
Next to AXIS Bank.