
R Programming Tutorial for Beginners - TIB Academy


  1. 1. Introduction To Programming In R. 2nd & 3rd Floor, 5/3 BEML Layout, Varathur Road, Thubarahalli, Kundalahalli Gate, Bangalore 66. Landmark: behind Kundalahalli Gate bus stop, opposite SKR Convention Mall, next to AXIS Bank.
  2. 2. Outline: 1 Workshop overview and materials; 2 Data types; 3 Extracting and replacing object elements; 4 Applying functions to list elements; 5 Writing functions; 6 Control flow; 7 The S3 object class system; 8 Things that may surprise you; 9 Additional resources; 10 Loops (supplemental). Last updated November 20, 2013.
  3. 3. Workshop overview and materials
  4. 4. Workshop overview and materials
     This is an intermediate/advanced R course, appropriate for those with basic knowledge of R.
     Learning objectives:
       Index data objects by position, name or logical condition
       Understand looping and branching
       Write your own simple functions
       Debug functions
       Understand and use the S3 object system
  5. 5. Workshop overview and materials
     Throughout this workshop we will return to a running example that involves calculating descriptive statistics for every column of a data.frame. We will often use the built-in iris data set. You can load the iris data by evaluating data(iris) at the R prompt.
     Our main example today consists of writing a statistical summary function that calculates the min, mean, median, max, sd, and n for all numeric columns in a data.frame, the correlations among these variables, and the counts and proportions for all categorical columns.
     Typically I will describe a topic and give some generic examples, then ask you to use the technique to start building the summary.
  6. 6. Workshop overview and materials
     Lab computer users: USERNAME dataclass, PASSWORD dataclass.
     Download materials from http://projects.iq.harvard.edu/rtc/r-prog (scroll to the bottom of the page and download the r-programming.zip file), then move the file to your desktop and extract it.
  7. 7. Data types
  8. 8. Data types
     Values can be combined into vectors using the c() function:
         > num.var <- c(1, 2, 3, 4) # numeric vector
         > char.var <- c("1", "2", "3", "4") # character vector
         > log.var <- c(TRUE, TRUE, FALSE, TRUE) # logical vector
         > char.var2 <- c(num.var, char.var) # numbers converted to character
     Vectors have a class which determines how functions treat them:
         > class(num.var)
         [1] "numeric"
         > mean(num.var) # take the mean of a numeric vector
         [1] 2.5
         > class(char.var)
         [1] "character"
         > mean(char.var) # cannot average characters (returns NA with a warning)
         [1] NA
         > class(char.var2)
         [1] "character"
  9. 9. Data types
     Vectors can be converted from one class to another:
         > class(char.var2)
         [1] "character"
         > num.var2 <- as.numeric(char.var2) # convert to numeric
         > class(num.var2)
         [1] "numeric"
         > mean(as.numeric(char.var2)) # now we can calculate the mean
         [1] 2.5
         > as.numeric(c("a", "b", "c")) # cannot convert letters to numeric
         [1] NA NA NA
     In addition to class, you can examine the length() and str()ucture of vectors:
         > ls() # list objects in our workspace
         [1] "char.var"  "char.var2" "log.var"   "num.var"   "num.var2"
         > length(char.var) # how many elements in char.var?
         [1] 4
         > str(num.var2) # what is the structure of num.var2?
          num [1:8] 1 2 3 4 1 2 3 4
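
The conversion behavior described on this slide can be handled defensively. The following is a minimal sketch, not part of the original workshop: the vector `mixed` and the variable names are hypothetical, and it assumes that silently dropping unparseable entries is acceptable for the analysis at hand.

```r
# Hypothetical input: numeric-looking strings plus one entry that cannot be parsed
mixed <- c("1", "2", "three", "4")

# as.numeric() yields NA (and a warning) for entries it cannot convert;
# the warning is suppressed here because the NAs are inspected explicitly
converted <- suppressWarnings(as.numeric(mixed))

# is.na() identifies which original entries failed to convert
failed <- mixed[is.na(converted)]

# Average only the values that converted cleanly
clean.mean <- mean(converted, na.rm = TRUE)
```

Inspecting `failed` before discarding anything makes it easier to notice data-entry problems that a blanket `na.rm = TRUE` would hide.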
  10. 10. Data types
      Factors are stored as numbers, but have character labels. Factors are useful for:
        Modeling (automatically contrast coded)
        Sorting/presenting values in arbitrary order
      Most of the time we can treat factors as though they were character vectors.
  11. 11. Data types
      A data.frame is a list of vectors, each of the same length. A list is a collection of objects, each of which can be almost anything.
          > DF <- data.frame(x=1:5, y=letters[1:5])
          > DF # data.frame with two columns and 5 rows
            x y
          1 1 a
          2 2 b
          3 3 c
          4 4 d
          5 5 e
          >
          > # DF <- data.frame(x=1:10, y=1:7) # illegal because lengths differ
          > L <- list(x=1:5, y=1:3, z=DF)
          > L # lists are much more flexible!
          $x
          [1] 1 2 3 4 5

          $y
          [1] 1 2 3

          $z
            x y
          1 1 a
          2 2 b
          3 3 c
          4 4 d
          5 5 e
  12. 12. Data types
      Key points:
        vector classes include numeric, logical, character, and factors
        vectors can be combined into lists or data.frames
        a data.frame can almost always be thought of as a list of vectors of equal length
        a list is a collection of objects, each of which can be of almost any type
      Functions introduced in this section:
        c: combine elements
        as.numeric: convert an object (e.g., a character vector) to numeric
        data.frame: combine objects into a data.frame
        ls: list the objects in the workspace
        class: get the class of an object
        str: get the structure of an object
        length: get the number of elements in an object
        mean: calculate the mean of a vector
  13. 13. Data types
      Create a new vector called "test" containing five numbers of your choice [ c(), <- ]
      Create a second vector called "students" containing five common names of your choice [ c(), <- ]
      Determine the class of "students" and "test" [ class() or str() ]
      Create a data frame containing two columns, "students" and "test" as defined above [ data.frame ]
      Convert "test" to character class, and confirm that you were successful [ as.character(), <-, str() ]
  14. 14. Data types
      Create a new vector called "test" containing five numbers of your choice:
          test <- c(1, 2, 3, 4, 5)
      Create a second vector called "students" containing five common names of your choice:
          students <- c("Mary", "Joan", "Steve", "Alex", "Suzy")
      Determine the class of "students" and "test":
          class(students)
          class(test)
      Create a data frame containing two columns, "students" and "test" as defined above:
          testScores <- data.frame(students, test)
      Convert "test" to character class, and confirm that you were successful:
          test <- as.character(test)
          class(test)
  15. 15. Extracting and replacing object elements
  16. 16. Extracting and replacing object elements
      Parts of vectors, matrices, data.frames, and lists can be extracted or replaced based on position or name.
          > ## indexing vectors by position
          > x <- 101:110 # create a vector of integers from 101 to 110
          > x[c(4, 5)] # extract the fourth and fifth values of x
          [1] 104 105
          > x[4] <- 1 # change the 4th value to 1
          > x # print x
           [1] 101 102 103   1 105 106 107 108 109 110
          >
          > ## indexing vectors by name
          > names(x) <- letters[1:10] # give x names
          > print(x) # print x
            a   b   c   d   e   f   g   h   i   j
          101 102 103   1 105 106 107 108 109 110
          > x[c("a", "f")] # extract the values of a and f from x
            a   f
          101 106
  17. 17. Extracting and replacing object elements
      Elements can also be selected or replaced based on logical (TRUE/FALSE) vectors.
          > x > 106 # shows which elements of x are > 106
              a     b     c     d     e     f     g     h     i     j
          FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
          > x[x > 106] # selects elements of x where x > 106
            g   h   i   j
          107 108 109 110
      Additional operators useful for logical indexing:
        ==    equal to
        !=    not equal to
        >     greater than
        <     less than
        >=    greater than or equal to
        <=    less than or equal to
        %in%  is included in
        &     and
        |     or
          > x[x > 106 & x <= 108]
            g   h
          107 108
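
The operators listed on this slide that have no example there (`%in%`, `|`, `!=`) can be sketched as follows. This is an illustrative example rather than workshop material; it rebuilds a fresh `x` (without the replaced fourth value) so that it runs on its own, and the result names are hypothetical.

```r
# Fresh named vector, similar to the one used on the slides
x <- 101:110
names(x) <- letters[1:10]

# %in% keeps the elements whose value appears in the given set;
# 999 is not in x and is simply ignored
in.set <- x[x %in% c(102, 105, 999)]

# | selects elements satisfying either condition
extremes <- x[x < 103 | x > 108]

# != works on names as well as on values
not.first <- x[names(x) != "a"]
```

Each condition evaluates to a logical vector of the same length as `x`, which is then used as the index.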
  18. 18. Extracting and replacing object elements
      Extraction on matrices operates in two dimensions: the first dimension refers to rows, the second to columns.
          > ## indexing matrices
          > # create a matrix
          > (M <- cbind(x = 1:5, y = -1:-5, z = c(6, 3, 4, 2, 8)))
               x  y z
          [1,] 1 -1 6
          [2,] 2 -2 3
          [3,] 3 -3 4
          [4,] 4 -4 2
          [5,] 5 -5 8
          > M[1:3, ] # extract rows 1 through 3, all columns
               x  y z
          [1,] 1 -1 6
          [2,] 2 -2 3
          [3,] 3 -3 4
          > M[c(5, 3, 1), 2:3] # rows 5, 3 and 1, columns 2 and 3
                y z
          [1,] -5 8
          [2,] -3 4
          [3,] -1 6
          > M[M[, 1] %in% 4:2, 2] # second column where first column <= 4 & >= 2
          [1] -2 -3 -4
  19. 19. Extracting and replacing object elements
      Lists can be indexed in the same way as vectors, with the following extension:
          > # Lists can be indexed with single brackets, similar to vector indexing
          > L[c(1, 2)] # the first two elements of L
          $x
          [1] 1 2 3 4 5

          $y
          [1] 1 2 3

          > L[1] # a list with one element
          $x
          [1] 1 2 3 4 5

          > ## double brackets select the content of a single selected element,
          > ## effectively taking it out of the list.
          > L[[1]] # a vector
          [1] 1 2 3 4 5
  20. 20. Extracting and replacing object elements
      A data.frame can be indexed in the same ways as a matrix, and also in the same ways as a list:
          > DF[c(3, 1, 2), c(1, 2)] # rows 3, 1, and 2, columns 1 and 2
            x y
          3 3 c
          1 1 a
          2 2 b
          > DF[[1]] # column 1 as a vector
          [1] 1 2 3 4 5
      There is a subtle but important difference between [ , n] and [n] when indexing data.frames: the first form returns a vector, the second returns a data.frame with one column.
          > str(DF[1]) # a data.frame with one column
          'data.frame': 5 obs. of 1 variable:
           $ x: int 1 2 3 4 5
          > str(DF[, 1]) # a vector
           int [1:5] 1 2 3 4 5
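
When the matrix-style form is preferred but a data.frame result is still wanted, the `drop` argument of `[` can be used. The snippet below is a small sketch that recreates `DF` so it is self-contained; the result names are hypothetical.

```r
# Recreate the example data.frame from the slides
DF <- data.frame(x = 1:5, y = letters[1:5])

# Matrix-style indexing of a single column simplifies to a vector
col.vec <- DF[, 1]

# drop = FALSE keeps the result as a one-column data.frame
col.df <- DF[, 1, drop = FALSE]

# List-style extraction by name also returns the column as a vector
col.name <- DF[["x"]]
```

Using `drop = FALSE` is mostly useful inside functions, where the number of selected columns is not known in advance and a consistent return type matters.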
  21. 21. Extracting and replacing object elements
      Key points:
        elements of objects can be extracted or replaced using the [ operator
        objects can be indexed by position, name, or logical (TRUE/FALSE) vectors
        vectors and lists have only one dimension, and hence only one index is used
        matrices and data.frames have two dimensions, and extraction methods for these objects use two indices
      Functions introduced in this section:
        [: extraction operator, used to extract/replace object elements
        names: get the names of an object, usually a vector, list, or data.frame
        print: print an object
  22. 22. Extracting and replacing object elements
      Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2.
      Calculate the mean of the Sepal.Length column in iris2.
      BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species.
      BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length.
  23. 23. Extracting and replacing object elements
      Select just the Sepal.Length and Species columns from the iris data set (built-in, will be available in your workspace automatically) and save the result to a new data.frame named iris2:
          data(iris)
          iris2 <- iris[c("Sepal.Length", "Species")]
          str(iris2)
      Calculate the mean of the Sepal.Length column in iris2:
          mean(iris2[, "Sepal.Length"])
      BONUS (optional): Calculate the mean of Sepal.Length, but only for the setosa species:
          mean(iris2[iris2[["Species"]] == "setosa", "Sepal.Length"])
          ## shortcut:
          with(iris2, {
            print(mean(Sepal.Length[Species == "setosa"]))
          })
      BONUS (optional): Calculate the number of sepal lengths that are more than one standard deviation below the average sepal length:
          m.minus.sd <- mean(iris2[["Sepal.Length"]]) - sd(iris2[["Sepal.Length"]])
          length(iris2[iris2[["Sepal.Length"]] < m.minus.sd, "Sepal.Length"])
  24. 24. Applying functions to list elements
  25. 25. Applying functions to list elements
      The apply function is used to apply a function to the rows or columns of a matrix. The second argument selects the margin: 1 for rows, 2 for columns.
          > M <- matrix(1:20, ncol=4)
          > apply(M, 2, mean) ## average each column
          [1]  3  8 13 18
          > apply(M, 2, sum) ## sum each column
          [1] 15 40 65 90
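
The slide only applies functions over columns. As a brief illustrative sketch (not from the original deck), the same matrix can be summarized over rows by changing the margin argument; `rowMeans()` and `colMeans()` are base R shortcuts for these two common cases.

```r
M <- matrix(1:20, ncol = 4)

# Margin 1 applies the function to each row (5 rows, so 5 results)
row.means <- apply(M, 1, mean)

# Margin 2 applies the function to each column (4 columns, so 4 results)
col.means <- apply(M, 2, mean)

# Base R shortcuts that give the same values
same.rows <- rowMeans(M)
same.cols <- colMeans(M)
```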
  26. 26. Applying functions to list elements
      It is often useful to apply a function to each element of a vector, list, or data.frame; use the sapply function for this.
          > sapply(DF, class) # get the class of each column in the DF data.frame
                  x         y
          "integer"  "factor"
          > sapply(L, length) # get the length of each element in the L list
          x y z
          5 3 2
          > sapply(DF, is.numeric) # check each column of DF to see if it is numeric
              x     y
           TRUE FALSE
  27. 27. Applying functions to list elements
      The sapply function can be used in combination with indexing to extract elements that meet certain criteria.
      Recall that we can index using logical vectors:
          > DF[, c(TRUE, FALSE)] # select the first column of DF, but not the second
          [1] 1 2 3 4 5
      sapply() can be used to generate the logical vector:
          > (DF.which.num <- sapply(DF, is.numeric)) # check which columns of DF are numeric
              x     y
           TRUE FALSE
          > DF[DF.which.num] # select the numeric columns
            x
          1 1
          2 2
          3 3
          4 4
          5 5
  28. 28. Applying functions to list elements
      Key points:
        R has convenient methods for applying functions to matrices, lists, and data.frames
        other apply-style functions exist, e.g., lapply, tapply, and mapply (see documentation of these functions for details)
      Functions introduced in this section:
        matrix: create a matrix (vector with two dimensions)
        apply: apply a function to the rows or columns of a matrix
        sapply: apply a function to the elements of a list
        is.numeric: returns TRUE or FALSE, depending on the type of object
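
The other apply-style functions named in the key points are not demonstrated in the slides. The following minimal sketch shows one plausible use of each; the list `L` is recreated without its data.frame element, and all result names are hypothetical.

```r
L <- list(x = 1:5, y = 1:3)

# lapply always returns a list, one element per input element
len.list <- lapply(L, length)

# sapply simplifies the same result to a named vector where possible
len.vec <- sapply(L, length)

# tapply applies a function to groups defined by a factor:
# here, mean sepal length for each iris species
species.means <- tapply(iris$Sepal.Length, iris$Species, mean)

# mapply iterates over several vectors in parallel
sums <- mapply(function(a, b) a + b, 1:3, 4:6)
```

A rough rule of thumb is to reach for `lapply` when the results have different shapes, `sapply` when a simple vector is expected, and `tapply` for grouped summaries.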
  29. 29. Writing functions
  30. 30. Writing functions
      A function is a collection of commands that takes input(s) and returns output.
      If you have a specific analysis or transformation you want to do on different data, use a function.
      Functions are defined using the function() function.
      Functions can be defined with any number of named arguments.
      Arguments can be of any type (e.g., vectors, data.frames, lists ...).
  31. 31. Writing functions
      The return value of a function can be:
        The value of the last expression evaluated in the body of the function
        Objects explicitly returned with the return() function
      Other function output can come from:
        Calls to print(), message() or cat() in the function body
        Error messages
      Assignment inside the body of a function takes place in a local environment. Example:
          > f <- function() { # define function f
          +   print("setting x to 1") # print a text string
          +   x <- 1} # set x to 1
          >
          > y <- f() # assign y the value returned by f
          [1] "setting x to 1"
          >
          > y # print y
          [1] 1
          > x # x in the global environment is not 1!
            a   b   c   d   e   f   g   h   i   j
          101 102 103   1 105 106 107 108 109 110
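
The local-assignment behavior on this slide can be checked in isolation. This is a small sketch with hypothetical names, independent of the objects created earlier in the workshop.

```r
# A variable in the calling environment
x <- "global"

# Assigning to x inside the function creates a separate local x;
# the last evaluated expression (x) becomes the return value
f <- function() {
  x <- "local"
  x
}

res <- f()

# res holds the local value, while the outer x is unchanged
```

Returning values explicitly, rather than relying on side effects in the calling environment, keeps functions easier to test and reuse.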
  32. 32. Writing functions
      Goal: write a function that returns the square of its argument.
          > square <- function(x) { # define function named "square" with argument x
          +   return(x*x) # multiply the x argument by itself
          + } # end the function definition
          >
          > # check to see that the function works
          > square(x = 2) # square the value 2
          [1] 4
          > square(10) # square the value 10
          [1] 100
          > square(1:5) # square integers 1 through 5
          [1]  1  4  9 16 25
  33. 33. Writing functions
      Stepping through functions and setting breakpoints.
      Use traceback() to see what went wrong after the fact.
  34. 34. Writing functions
      Key points:
        writing new functions is easy!
        most functions will have a return value, but functions can also print things, write things to file, etc.
        functions can be stepped through to facilitate debugging
      Functions introduced in this section:
        function: defines a new function
        return: used inside a function definition to set the return value
        browser: sets a break point
        debug: turns on the debugging flag of a function so you can step through it
        undebug: turns off the debugging flag
        traceback: shows the error stack (call after an error to see what went wrong)
  35. 35. Writing functions
      Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
      Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
  36. 36. Writing functions
      Write a function that takes a data.frame as an argument and returns the mean of each numeric column in the data frame. Test your function using the iris data.
          statsum <- function(df) {
            classes <- sapply(df, class)
            means <- sapply(df[classes == "numeric"], mean)
            return(means)}
          statsum(iris)
      Modify your function so that it returns a list, the first element of which is the means of the numeric variables, the second of which is the counts of the levels of each categorical variable.
          statsum <- function(df) {
            classes <- sapply(df, class)
            means <- sapply(df[classes == "numeric"], mean)
            counts <- sapply(df[classes == "factor"], table)
            return(list(means, counts))}
          statsum(iris)
  37. 37. Control flow
  38. 38. Control flow
      Basic idea: if some condition is true, do one thing. If false, do something else.
      Carried out in R using if () and else statements, which can be nested if necessary.
      Especially useful for checking function arguments and performing different operations depending on function input.
  39. 39. Control flow
      Goal: write a function that tells us if a number is positive or negative.
          > ## use branching to return a different result depending on the sign of the input
          > isPositive <- function(x) { # define function "isPositive"
          +   if (x > 0) { # if x is greater than zero, then
          +     cat(x, "is positive \n") } # say so!
          +   else { # otherwise
          +     cat(x, "is negative \n")} # say x is negative
          + } # end function definition
          >
          > ## test the isPositive() function
          > isPositive(10)
          10 is positive
          > isPositive(-1)
          -1 is negative
          > isPositive(0)
          0 is negative
      Need to do something different if x equals zero!
  40. 40. Control flow
      Add a condition to handle x = 0:
          > ## add condition to handle the case that x is zero
          > isPositive <- function(x) { # define function "isPositive"
          +   if (x > 0) { # if x is greater than zero, then
          +     cat(x, "is positive \n") } # say so!
          +   else if (x == 0) { # otherwise if x is zero
          +     cat(x, "is zero \n")} # say so!
          +   else { # otherwise
          +     cat(x, "is negative \n")} # say x is negative
          + } # end function definition
          >
      Test the new function:
          > isPositive(0) # test the isPositive() function
          0 is zero
          > isPositive("a") # oops, that will not work!
          a is positive
      We fixed the problem when x = 0, but now we need to make sure x is numeric of length one (unless we agree with R that a is positive!).
  41. 41. Control flow
      Do something reasonable if x is not numeric:
          > ## add condition to handle the case that x is not a single number
          > isPositive <- function(x) { # define function "isPositive"
          +   if (!is.numeric(x) || length(x) > 1) {
          +     cat("x must be a numeric vector of length one! \n")}
          +   else if (x > 0) { # if x is greater than zero, then
          +     cat(x, "is positive \n") } # say so!
          +   else if (x == 0) { # otherwise if x is zero
          +     cat(x, "is zero \n")} # say so!
          +   else { # otherwise
          +     cat(x, "is negative \n")} # say x is negative
          + } # end function definition
          >
          > isPositive("a") # test the isPositive() function on character
          x must be a numeric vector of length one!
  42. 42. Control flow
      Key points:
        code can be conditionally executed
        conditions can be nested
        conditional execution is often used for argument checking, among other things
      Functions [1] introduced in this section:
        cat: concatenates and prints R objects
        if: execute code only if condition is met
        else: used with if; code to execute if condition is not met
      [1] Technically if and else are not functions, but this need not concern us at the moment.
  43. 43. Control flow
      Add argument checking code to return an error if the argument to your function is not a data.frame.
      Insert a break point with browser() and step through your function.
  44. 44. Control flow
      Add argument checking code to return an error if the argument to your function is not a data.frame:
          statsum <- function(df) {
            if (class(df) != "data.frame") stop("df must be a data.frame!")
            classes <- sapply(df, class)
            means <- sapply(df[classes == "numeric"], mean)
            counts <- sapply(df[classes == "factor"], table)
            return(list(means, counts))}
          statsum(1:10)
          statsum(iris)
      Insert a break point with browser() and step through your function:
          statsum <- function(df) {
            if (class(df) != "data.frame") stop("df must be a data.frame!")
            browser()
            classes <- sapply(df, class)
            means <- sapply(df[classes == "numeric"], mean)
            counts <- sapply(df[classes == "factor"], table)
            return(list(means, counts))
          }
          statsum(iris)
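
The check `class(df) != "data.frame"` works for plain data.frames, but `class()` can return more than one value, and since R 4.2.0 a condition of length greater than one inside `if` is an error. A more robust variant is sketched below; the helper name `check.df` is hypothetical and is not part of the workshop code.

```r
# Hypothetical helper: stop with a message unless the argument is a data.frame
check.df <- function(df) {
  # is.data.frame() returns a single TRUE/FALSE even when the object
  # carries several classes
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  invisible(TRUE)
}

# Passes silently for a data.frame
ok <- check.df(iris)

# tryCatch captures the error message instead of halting the script
err <- tryCatch(check.df(1:10), error = function(e) conditionMessage(e))
```

`inherits(df, "data.frame")` would serve equally well here; both avoid comparing the full class vector against a single string.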
  45. 45. The S3 object class system
  46. 46. The S3 object class system
      R has two major object systems:
        Relatively informal "S3" classes
        Stricter, more formal "S4" classes
      We will cover only the S3 system, not the S4 system.
      Basic idea: functions have different methods for different types of objects.
  47. 47. The S3 object class system
      The class of an object can be retrieved and modified using the class() function:
          > x <- 1:10
          > class(x)
          [1] "integer"
          > class(x) <- "foo"
          > class(x)
          [1] "foo"
      Objects are not limited to a single class, and can have many classes:
          > class(x) <- c("A", "B")
          > class(x)
          [1] "A" "B"
  48. 48. The S3 object class system
      Functions can have many methods, allowing us to have (e.g.) one plot() function that does different things depending on what is being plotted.
      Methods can only be defined for generic functions: plot, print, summary, mean, and several others are already generic.
          > # see what methods have been defined for the mean function
          > methods(mean)
          [1] mean.Date     mean.default  mean.difftime mean.POSIXct
          [5] mean.POSIXlt
          > # which functions have methods for data.frames?
          > methods(class="data.frame")[1:9]
          [1] "aggregate.data.frame"     "anyDuplicated.data.frame"
          [3] "as.data.frame.data.frame" "as.list.data.frame"
          [5] "as.matrix.data.frame"     "by.data.frame"
          [7] "cbind.data.frame"         "[<-.data.frame"
          [9] "[.data.frame"
  49. 49. The S3 object class system
      To create a new method for a function that is already generic, all you have to do is name your function function.class:
          > # create a mean() method for objects of class "foo":
          > mean.foo <- function(x, ...) { # mean method for "foo" class
          +   if (is.numeric(x)) {
          +     cat("The average is", mean.default(x), "\n")
          +     return(invisible(mean.default(x))) # use mean.default for numeric
          +   } else
          +     cat("x is not numeric \n")} # otherwise say x not numeric
          >
          > x <- 1:10
          > mean(x)
          [1] 5.5
          > class(x) <- "foo"
          > mean(x)
          The average is 5.5
          >
          > x <- as.character(x)
          > class(x) <- "foo"
          > mean(x)
          x is not numeric
  50. 50. The S3 object class system
      S3 generics are most often used for print, summary, and plot methods, but sometimes you may want to create a new generic function.
          > # create a generic disp() function
          > disp <- function(x, ...) {
          +   UseMethod("disp")
          + }
          >
          > # create a disp method for class "matrix"
          > disp.matrix <- function(x, ...) {
          +   print(round(x, digits=2))
          + }
          >
          > # test it out
          > disp(matrix(runif(10), ncol=2))
               [,1] [,2]
          [1,] 0.78 0.21
          [2,] 0.85 0.45
          [3,] 0.31 0.34
          [4,] 0.47 0.80
          [5,] 0.51 0.66
  51. 51. The S3 object class system
      Key points:
        there are several class systems in R, of which S3 is the oldest and simplest
        objects have class and functions have corresponding methods
        the class of an object can be set by simple assignment
        S3 generic functions all contain UseMethod("x") in the body, where x is the name of the function
        new methods for existing generic functions can be written simply by defining a new function with a special naming scheme: the name of the function followed by a dot followed by the name of the class
      Functions introduced in this section:
        plot: creates a graphical display, the type of which depends on the class of the object being plotted
        methods: lists the methods defined for a function or class
        UseMethod: the body of a generic function
        invisible: returns an object but does not print it
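
The key points above can be tied together in one short sketch. Everything here is illustrative: the generic `describe`, the class name "temperature", and the data are hypothetical and do not appear in the workshop materials.

```r
# A new generic: the body only dispatches on the class of x
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists
describe.default <- function(x, ...) "an object"

# Method for the hypothetical class "temperature",
# named generic.class as the naming scheme requires
describe.temperature <- function(x, ...) {
  paste(length(x), "temperature readings")
}

# Set the class by simple assignment via structure()
temps <- structure(c(20.5, 22.1, 19.8), class = "temperature")

res.temps <- describe(temps)  # dispatches to describe.temperature
res.other <- describe(1:3)    # falls back to describe.default
```

Providing a default method is optional, but without one the generic stops with an error for any object whose class has no method.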
  52. 52. The S3 object class system
      Modify your function so that it also returns the standard deviations of the numeric variables.
      Modify your function so that it returns a list of class "statsum".
      Write a print method for the statsum class.
  53. 53. The S3 object class system
      Modify your function so that it also returns the standard deviations of the numeric variables:
          statsum <- function(df) {
            if (class(df) != "data.frame") stop("df must be a data.frame!")
            classes <- sapply(df, class)
            means <- sapply(df[classes == "numeric"], mean)
            sds <- sapply(df[classes == "numeric"], sd)
            counts <- sapply(df[classes == "factor"], table)
            return(list(cbind(means, sds), counts))}
          statsum(iris)
      Modify your function so that it returns a list of class "statsum":
          statsum <- function(df) {
            if (class(df) != "data.frame") stop("df must be a data.frame!")
            classes <- sapply(df, class)
            means <- sapply(df[classes == "numeric"], mean)
            sds <- sapply(df[classes == "numeric"], sd)
            counts <- sapply(df[classes == "factor"], table)
            R <- list(cbind(means, sds), counts)
            class(R) <- c("statsum", class(R))
            return(R)}
          str(statsum(iris))
      Write a print method for the statsum class:
          print.statsum <- function(x) {
            cat("Numeric variable descriptive statistics:\n")
            print(x[[1]],
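
The print method on this slide is cut off in the source. The following is one possible way to complete the exercise, offered as a sketch rather than the workshop's own solution: the `digits = 2` setting, the second heading, the use of `is.numeric`/`is.factor` and `lapply`, and the `invisible(x)` return are all assumptions.

```r
# Summary function returning an object of class "statsum"
statsum <- function(df) {
  if (!is.data.frame(df)) stop("df must be a data.frame!")
  is.num <- sapply(df, is.numeric)
  is.fac <- sapply(df, is.factor)
  means <- sapply(df[is.num], mean)
  sds <- sapply(df[is.num], sd)          # standard deviations via sd()
  counts <- lapply(df[is.fac], table)    # one table per factor column
  R <- list(cbind(means, sds), counts)
  class(R) <- c("statsum", class(R))
  R
}

# Print method: called automatically when a statsum object is printed
print.statsum <- function(x, ...) {
  cat("Numeric variable descriptive statistics:\n")
  print(x[[1]], digits = 2)
  cat("\nFactor variable counts:\n")
  print(x[[2]])
  invisible(x)  # print methods conventionally return their argument invisibly
}

res <- statsum(iris)
print(res)
```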
  54. 54. Things that may surprise you
  55. 55. Things that may surprise you
      There are an unfortunately large number of surprises in R programming.
      Some of these "gotchas" are common problems in other languages, many are unique to R.
      We will only cover a few; for a more comprehensive discussion please see http://www.burns-stat.com/pages/Tutor/R_inferno.pdf
  56. 56. Things that may surprise you
      Floating point arithmetic is not exact:
          > .1 == .3/3
          [1] FALSE
      Solution: use all.equal():
          > all.equal(.1, .3/3)
          [1] TRUE
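One detail worth sketching: when the values differ, `all.equal()` returns a character description of the difference rather than `FALSE`, so inside `if()` it is usually wrapped in `isTRUE()`. The variable names below are illustrative only.

```r
a <- 0.1
b <- 0.3 / 3

a == b                     # exact comparison of the stored doubles
isTRUE(all.equal(a, b))    # comparison within a small numerical tolerance

# For values that really differ, all.equal() returns a description string,
# so isTRUE() is what turns the result into a plain TRUE/FALSE.
isTRUE(all.equal(1, 1.1))
```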
  57. 57. Things that may surprise you
      R does not exclude missing values by default. A single missing value in a vector means that many things are unknown:
          > x <- c(1:10, NA, 12:20)
          > c(mean(x), sd(x), median(x), min(x), sd(x))
          [1] NA NA NA NA NA
      NA is not equal to anything, not even NA:
          > NA == NA
          [1] NA
      Solutions: use the na.rm = TRUE option when calculating, and is.na() to test for missing values.
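A minimal sketch of both solutions, using the same vector as the slide (`x_complete` is an illustrative name):

```r
x <- c(1:10, NA, 12:20)

mean(x, na.rm = TRUE)   # drop the NA before averaging
sd(x, na.rm = TRUE)

sum(is.na(x))           # how many values are missing
which(is.na(x))         # where they are

x_complete <- x[!is.na(x)]  # keep only the observed values
length(x_complete)
```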
  58. 58. Things that may surprise you
      Automatic type conversion happens a lot. This is often useful, but it makes mistakes easy to miss:
          > # combining values coerces them to the most general type
          > (x <- c(TRUE, FALSE, 1, 2, "a", "b"))
          [1] "TRUE"  "FALSE" "1"     "2"     "a"     "b"
          > str(x)
           chr [1:6] "TRUE" "FALSE" "1" "2" "a" "b"
          >
          > # comparisons convert arguments to the most general type
          > 1 > "a"
          [1] FALSE
      Maybe this is what you expect... I would like to at least get a warning!
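One way to get the missing warning is to convert or check types explicitly. The sketch below is only an illustration; `safe_greater` is a hypothetical helper, not a base R function.

```r
# Explicit conversion makes the problem visible: as.numeric("a") warns and
# returns NA (the warning is suppressed here only to keep the output clean).
value <- "a"
converted <- suppressWarnings(as.numeric(value))
is.na(converted)

# Hypothetical helper that refuses to compare mixed types
safe_greater <- function(a, b) {
  stopifnot(is.numeric(a), is.numeric(b))
  a > b
}

safe_greater(1, 0.5)
# safe_greater(1, "a") stops with an error instead of silently returning FALSE
```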
  59. 59. Things that may surprise you
      Functions you might expect to work similarly don't always:
          > mean(1, 2, 3, 4, 5)*5
          [1] 5
          > sum(1, 2, 3, 4, 5)
          [1] 15
      Why are these different?!?
          > args(mean)
          function (x, ...)
          NULL
          > args(sum)
          function (..., na.rm = FALSE)
          NULL
      Ouch. That is not nice at all!
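The argument lists explain the difference: `mean()` averages only its first argument `x` and passes everything else on through `...`, while `sum()` adds up everything in `...`. A short sketch of the fix, which is to combine the values into one vector first (`values` is an illustrative name):

```r
values <- c(1, 2, 3, 4, 5)

mean(values)                    # averages all five values
mean(values) * length(values)   # agrees with sum(values)
sum(values)

mean(1, 2, 3, 4, 5)             # only the first argument, 1, is averaged
```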
  60. 60. Things that may surprise you
      Factors sometimes behave as numbers, and sometimes as characters, which can be confusing!
          > (x <- factor(c(5, 5, 6, 6), levels = c(6, 5)))
          [1] 5 5 6 6
          Levels: 6 5
          >
          > str(x)
           Factor w/ 2 levels "6","5": 2 2 1 1
          >
          > as.character(x)
          [1] "5" "5" "6" "6"
          > # here is where people sometimes get lost...
          > as.numeric(x)
          [1] 2 2 1 1
          > # you probably want
          > as.numeric(as.character(x))
          [1] 5 5 6 6
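For comparison, the `?factor` help page also describes an equivalent idiom that converts the levels once and then indexes them by the factor's integer codes. A small sketch using the same factor:

```r
x <- factor(c(5, 5, 6, 6), levels = c(6, 5))

as.numeric(x)                 # the internal integer codes, not the values
as.numeric(as.character(x))   # the values, via the labels
as.numeric(levels(x))[x]      # the values, via the levels indexed by code
```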
  61. 61. Additional resources
  62. 62. Additional resources
      S3 system overview: https://github.com/hadley/devtools/wiki/S3
      S4 system overview: https://github.com/hadley/devtools/wiki/S4
      R documentation: http://cran.r-project.org/manuals.html
      Collection of R tutorials: http://cran.r-project.org/other-docs.html
      R for Programmers (by Norman Matloff, UC Davis): http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf
      Calling C and Fortran from R (by Charles Geyer, UMinn): http://www.stat.umn.edu/~charlie/rc/
      State of the Art in Parallel Computing with R (Schmidberger et al.): http://www.jstatsoft.org/v31/i01/paper
      Institute for Quantitative Social Science: http://iq.harvard.edu
      Research technology consulting: http://projects.iq.harvard.edu/rtc
  63. 63. Additional resources
      Help Us Make This Workshop Better!
      Please take a moment to fill out a very short feedback form.
      These workshops exist for you. Tell us what you need!
      http://tinyurl.com/RprogrammingFeedback
  64. 64. Loops (supplemental)
  65. 65. Loops (supplemental)
      A loop is a collection of commands that are run over and over again.
      A for loop runs the code a fixed number of times, or on a fixed set of objects.
      A while loop runs the code for as long as a condition remains true.
      If you're typing the same commands over and over again, you might want to use a loop!
  66. 66. Loops (supplemental)
      For each value in a vector, print the number and its square:
          > # For-loop example
          > for (num in seq(-5, 5)) {  # for each number in [-5, 5]
          +   cat(num, "squared is", num^2, "\n")  # print the number
          + }
          -5 squared is 25
          -4 squared is 16
          -3 squared is 9
          -2 squared is 4
          -1 squared is 1
          0 squared is 0
          1 squared is 1
          2 squared is 4
          3 squared is 9
          4 squared is 16
          5 squared is 25
  67. 67. Loops (supplemental)
      Goal: simulate rolling two dice until we roll two sixes:
          > ## While-loop example: rolling dice
          > set.seed(15)  # allows reproducible sample() results
          > dice <- seq(1, 6)  # set dice = [1 2 3 4 5 6]
          > roll <- 0  # set roll = 0
          > while (roll < 12) {
          +   roll <- sample(dice, 1) + sample(dice, 1)  # calculate sum of two rolls
          +   cat("We rolled a", roll, "\n")  # print the result
          + }  # end the loop
          We rolled a 6
          We rolled a 10
          We rolled a 9
          We rolled a 7
          We rolled a 10
          We rolled a 5
          We rolled a 9
          We rolled a 12
  68. 68. Loops (supplemental)
      Often you will want to store the results from a loop. You can create an object to hold the results generated in the loop and fill in the values using indexing:
          > ## save calculations done in a loop
          > Result <- list()  # create an object to store the results
          > for (i in 1:5) {  # for each i in [1, 5]
          +   Result[[i]] <- 1:i  ## assign the sequence 1 to i to Result
          + }
          > Result  # print Result
          [[1]]
          [1] 1

          [[2]]
          [1] 1 2

          [[3]]
          [1] 1 2 3

          [[4]]
          [1] 1 2 3 4

          [[5]]
          [1] 1 2 3 4 5
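When the number of iterations is known in advance, the result object can be created at its final size before the loop starts. The sketch below is a variation on the slide's example, not part of the original workshop; `n` and `Result2` are illustrative names.

```r
n <- 5
Result <- vector("list", n)   # preallocate a list with n empty slots
for (i in seq_len(n)) {       # seq_len(n) is also safe when n is 0
  Result[[i]] <- 1:i
}

# lapply() builds the same list without an explicit loop
Result2 <- lapply(seq_len(n), function(i) 1:i)
identical(Result, Result2)
```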
  69. 69. Loops (supplemental)
      Most operations in R are vectorized, which makes loops unnecessary in many cases.
      Use vector arithmetic instead of loops:
          > x <- c()  # create vector x
          > for (i in 1:5) x[i] <- i + i  # double a vector using a loop
          > print(x)  # print the result
          [1]  2  4  6  8 10
          >
          > 1:5 + 1:5  # double a vector without a loop
          [1]  2  4  6  8 10
          > 1:5 + 5  # shorter vectors are recycled
          [1]  6  7  8  9 10
      Use paste instead of loops:
          > ## Earlier we said
          > ## for (num in seq(-5, 5)) {  # for each number in [-5, 5]
          > ##   cat(num, "squared is", num^2, "\n")  # print the number
          > ## }
          > ## a better way:
          > paste(1:5, "squared =", (1:5)^2)
          [1] "1 squared = 1"  "2 squared = 4"  "3 squared = 9"
          [4] "4 squared = 16" "5 squared = 25"
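As a sketch of the same idea applied to the earlier for-loop example, the messages for -5 to 5 can be built in one vectorized call and printed one per line (`nums` and `msgs` are illustrative names):

```r
nums <- seq(-5, 5)
msgs <- paste(nums, "squared is", nums^2)  # one string per element of nums
cat(msgs, sep = "\n")                      # print each string on its own line
```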
  70. 70. Loops (supplemental)
      1. Use a loop to get the class() of each column in the iris data set.
      2. Use the results from step 1 to select the numeric columns.
      3. Use a loop to calculate the mean of each numeric column in the iris data.
  71. 71. Loops (supplemental)
      1. Use a loop to get the class() of each column in the iris data set:
          > classes <- c()
          > for (name in names(iris)) {
          +   classes[name] <- class(iris[[name]])
          + }
          > classes
          Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
             "numeric"    "numeric"    "numeric"    "numeric"     "factor"
      2. Use the results from step 1 to select the numeric columns:
          > iris.num <- iris[, names(classes)[classes == "numeric"]]
          > head(iris.num, 2)
            Sepal.Length Sepal.Width Petal.Length Petal.Width
          1          5.1         3.5          1.4         0.2
          2          4.9         3.0          1.4         0.2
      3. Use a loop to calculate the mean of each numeric column in the iris data:
          > iris.means <- c()
          > for (var in names(iris.num)) {
          +   iris.means[[var]] <- mean(iris[[var]])
          + }
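For comparison, the same three steps can be sketched without explicit loops by using `sapply()`, which applies a function to each column of a data.frame. This is an alternative to the loop solution, not a replacement for the exercise.

```r
data(iris)

classes <- sapply(iris, class)           # step 1: class of each column
iris.num <- iris[classes == "numeric"]   # step 2: keep the numeric columns
iris.means <- sapply(iris.num, mean)     # step 3: mean of each numeric column
iris.means
```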