R Basics

  1. 1. Dr. E. N. SATHISHKUMAR, Guest Lecturer, Department of Computer Science, Periyar University, Salem -11.
  2. 2. Introduction  R (the language) was created in the early 1990s by Ross Ihaka and Robert Gentleman.  It is based on the S language, which was developed at Bell Laboratories in the 1970s.  It is a high-level language, like C# or Java.  R is an interpreted language (sometimes called a scripting language), which means that your code does not need to be compiled before you run it.  R supports a mixture of programming paradigms: at its core it is an imperative language, but it also supports object-oriented and functional programming.
  3. 3. Getting started  Where to get R? The newest version of R and its documentation can be downloaded from http://www.r-project.org.  Download, Packages: Select CRAN  Set your Mirror: India (Indian Institute of Technology Madras) Select http://ftp.iitm.ac.in/cran/  Select Download R for Windows  Select base.  Select Download R 3.4.2 for Windows  Execute the R-3.4.2-win.exe with administrator privileges. Once the program is installed, run the R program by clicking on its icon
  4. 4. Choosing an IDE  If you use R under Windows or Mac OS X, then a graphical user interface (GUI) is available to you.  Some of the best GUIs are:  Emacs + ESS  Eclipse/Architect  RStudio  Revolution-R  Live-R  Tinn-R
  5. 5. A Scientific Calculator  R is at heart a supercharged scientific calculator, so you can type commands directly into the R Console. > 5+5 [1] 10 > 4-7 [1] -3 > 7*3 [1] 21 > 16/31 [1] 0.516129 > log2(32) [1] 5
  6. 6. Variable Assignment  We assign values to variables with the assignment operator "=".  An assignment prints nothing; typing the variable by itself at the prompt prints its value.  Another form of the assignment operator, "<-", is also in use. > X = 2 > X [1] 2 > X <- 5 > X [1] 5 > X * X [1] 25
  7. 7. Comments  All text after the pound sign "#" on the same line is treated as a comment. > X = 2 # this is a comment > X [1] 2 > # 5 is assigned to variable X > X <- 5 > X [1] 5
  8. 8. Getting Help  R provides extensive documentation. To get help on a particular topic, call help() with the topic name.  For example,  > help("if")  starting httpd help server ... done  The help content opens immediately in the web browser.
  9. 9. Basic Data Types  There are several basic R data types that are of frequent occurrence in routine R calculations.  Numeric  Integer  Complex  Logical  Character  Factor
  10. 10. Numeric  Decimal values are called numerics in R. It is the default computational data type.  If we assign a decimal value to a variable x as follows, x will be of numeric type. > x = 10.5 # assign a decimal value > x # print the value of x [1] 10.5 > class(x) # print the class name of x [1] "numeric"
  11. 11. Numeric  Furthermore, even if we assign an integer to a variable k, it is still being saved as a numeric value. > k = 1 > k # print the value of k [1] 1 > class(k) # print the class name of k [1] "numeric"  The fact that k is not an integer can be confirmed with the is.integer function. > is.integer(k) # is k an integer? [1] FALSE
  12. 12. Integer  In order to create an integer variable in R, we invoke the as.integer function.  For example, > y = as.integer(3) > y # print the value of y [1] 3 > class(y) # print the class name of y [1] "integer" > is.integer(y) # is y an integer? [1] TRUE
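As a small sketch of the integer type described above: besides as.integer(), R accepts an `L` suffix on a literal. The suffix is standard R syntax that the slides do not mention, and the variable name `k` here is only illustrative.

```r
y <- as.integer(3)     # explicit conversion, as on the slide
k <- 3L                # the L suffix creates an integer literal directly
class(k)               # "integer"
identical(y, k)        # TRUE: same type and same value
trunc_val <- as.integer(3.99)  # conversion truncates toward zero, it does not round
trunc_val              # 3
```

Note that as.integer() drops the fractional part, so it should not be used as a rounding function.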
  13. 13. Complex  A complex value in R is defined via the pure imaginary value i.  For example, > z = 1 + 2i # create a complex number > z # print the value of z [1] 1+2i > class(z) # print the class name of z [1] "complex"  The following returns NaN with a warning, because -1 is a numeric value and not a complex one. > sqrt(-1) # square root of -1 [1] NaN  Warning message: In sqrt(-1) : NaNs produced
  14. 14. Complex  Instead, we have to use the complex value -1 + 0i.  For example, > sqrt(-1+0i) # square root of -1+0i [1] 0+1i  An alternative is to coerce -1 into a complex value. > sqrt(as.complex(-1)) [1] 0+1i
  15. 15. Logical  A logical value is often created via comparison between variables.  For example, > x = 1; y = 2 # sample values > z = x > y # is x larger than y? > z # print the logical value [1] FALSE > class(z) # print the class name of z [1] "logical"
  16. 16. Logical  The standard logical operators are "&", "|" and "!".  For example, > u = TRUE; v = FALSE > u & v # u AND v [1] FALSE > u | v # u OR v [1] TRUE > !u # negation of u [1] FALSE
  17. 17. Character  A character object is used to represent string values in R. We convert objects into character values with as.character(). For example, > x = as.character(3.14) > x # print the character string [1] "3.14" > class(x) # print the class name of x [1] "character" > x = as.character("hai") > x # print the character string [1] "hai" > class(x) # print the class name of x [1] "character"
  18. 18. Factor  The factor data type is used to represent categorical data. (i.e. data of which the value range is a collection of codes).  For example, to create a vector of length five of type factor do the following: >sex <- c("male","male","female","male","female")  The object sex is a character object. You need to transform it to factor. >sex <- factor(sex) >sex [1] male male female male female Levels: female male  Use the function levels to see the different levels a factor variable has.
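The factor slide above can be sketched as follows. This is a minimal illustration using the same `sex` vector; the calls to as.integer() and table() are additions that show how the levels are stored and counted.

```r
sex <- factor(c("male", "male", "female", "male", "female"))
lv <- levels(sex)         # "female" "male": levels are sorted alphabetically by default
codes <- as.integer(sex)  # 2 2 1 2 1: each value is stored as an index into the levels
counts <- table(sex)      # number of observations per level
counts
```

The integer codes explain why the first level printed is "female" even though "male" appears first in the data.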
  19. 19. Data structures  Before you can perform statistical analysis in R, your data has to be structured in some coherent way. To store your data R has the following structures:  Vector  Matrix  Array  Data frame  Time-series  List
  20. 20. Vectors  A vector is a sequence of data elements of the same basic type.  Members in a vector are officially called components.  For example, Here is a vector containing three numeric values 2, 3, 5. > c(2, 3, 5) [1] 2 3 5  Here is a vector of logical values. > c(TRUE, FALSE, TRUE, FALSE, FALSE) [1] TRUE FALSE TRUE FALSE FALSE
  21. 21. Combining Vectors  Vectors can be combined via the function c.  For example, here a numeric vector n and a character vector s are combined; the numeric values are coerced to character strings so that all members have the same type. > n = c(2, 3, 5) > s = c("aa", "bb", "cc", "dd", "ee") > c(n, s) [1] "2" "3" "5" "aa" "bb" "cc" "dd" "ee"
  22. 22. Vector Arithmetics  Arithmetic operations on vectors are performed member-by-member.  For example, here are two numeric vectors a and b. > a = c(1, 3, 5, 7) > b = c(1, 2, 4, 8)  If we add a and b together, the sum is a vector whose members are the sums of the corresponding members of a and b. > a + b [1] 2 5 9 15  Similarly for subtraction, multiplication and division, we get new vectors via member-wise operations.
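Combining vectors of different types forces a common type. The sketch below illustrates the usual coercion order (logical, then integer, then numeric, then character); the variable names are illustrative only.

```r
t1 <- class(c(TRUE, 2L))      # "integer": logical is promoted to integer
t2 <- class(c(2L, 3.5))       # "numeric": integer is promoted to double
t3 <- class(c(3.5, "aa"))     # "character": numbers become strings
mixed <- c(TRUE, 2, "aa")     # every member is converted to a string
mixed                         # "TRUE" "2" "aa"
```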
  23. 23. Vector Recycling Rule  If two vectors are of unequal length, the shorter one will be recycled in order to match the longer vector.  For example, sum is computed by recycling values of the shorter vector. > u = c(10, 20, 30) > v = c(1, 2, 3, 4, 5, 6, 7, 8, 9) > u + v [1] 11 22 33 14 25 36 17 28 39
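In the recycling example above, the longer length (9) is a multiple of the shorter one (3). The sketch below shows, under the assumption of a hypothetical four-element vector `w`, what happens when it is not: R still recycles, but issues a warning.

```r
u <- c(10, 20, 30)
w <- c(1, 2, 3, 4)                # hypothetical vector whose length is not a multiple of 3
# u is recycled as 10 20 30 10; R warns that the lengths do not divide evenly
res <- suppressWarnings(u + w)
res                               # 11 22 33 14
doubled <- u * 2                  # the scalar 2 is a length-one vector recycled three times
doubled                           # 20 40 60
```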
  24. 24. Vector Index  We retrieve values in a vector by declaring an index inside a single square bracket "[ ]" operator.  For example, > s = c("aa", "bb", "cc", "dd", "ee") > s[3] [1] "cc"
  25. 25. Vector Negative Index  If the index is negative, it would strip the member whose position has the same absolute value as the negative index.  For example, > s = c("aa", "bb", "cc", "dd", "ee") > s[-3] [1] "aa" "bb" "dd" "ee" Out-of-Range Index  If an index is out-of-range, a missing value will be reported via the symbol NA. >s[10] [1] NA
  26. 26. Numeric Index Vector  A new vector can be sliced from a given vector with a numeric index vector, which consists of member positions of the original vector to be retrieved.  For example, > s = c("aa", "bb", "cc", "dd", "ee") > s[c(2, 3)] [1] "bb" "cc"
  27. 27. Vector Duplicate Indexes  The index vector allows duplicate values. Hence the following retrieves a member twice in one operation.  For example, > s = c("aa", "bb", "cc", "dd", "ee") > s[c(2, 3, 3)] [1] "bb" "cc" "cc"
  28. 28. Vector Out-of-Order Indexes  The index vector can even be out-of-order. Here is a vector slice with the order of first and second members reversed.  For example, > s = c("aa", "bb", "cc", "dd", "ee") > s[c(2, 1, 3)] [1] "bb" "aa" "cc"
  29. 29. Vector Range Index  To produce a vector slice between two indexes, we can use the colon operator ":".  For example, > s = c("aa", "bb", "cc", "dd", "ee") > s[2:4] [1] "bb" "cc" "dd"
  30. 30. Named Vector Members  We can assign names to vector members.  For example, the following variable v is a character string vector with two members. > v = c("Mary", "Sue") > v [1] "Mary" "Sue"  We now name the first member as First, and the second as Last. > names(v) = c("First", "Last") > v First Last "Mary" "Sue"
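Once members are named, they can be retrieved by name as well as by position. A minimal sketch using the same vector `v`:

```r
v <- c("Mary", "Sue")
names(v) <- c("First", "Last")
first <- v["First"]               # single member, keeps its name
swapped <- v[c("Last", "First")]  # a character index vector can reorder members
bare <- unname(v["Last"])         # drop the name and keep only the value
bare                              # "Sue"
```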
  32. 32. Matrices  A matrix is a collection of data elements arranged in a row-column layout.  A matrix can be regarded as a generalization of a vector.  As with vectors, all the elements of a matrix must be of the same data type.  A matrix can be generated in several ways.  Use the function dim  Use the function matrix
  33. 33. Matrices  Use the function dim: > x <- 1:8 > dim(x) <- c(2,4) > x [,1] [,2] [,3] [,4] [1,] 1 3 5 7 [2,] 2 4 6 8  Use the function matrix: > A = matrix(c(2, 4, 3, 1, 5, 7), nrow=2, ncol=3, byrow = TRUE) > A [,1] [,2] [,3] [1,] 2 4 3 [2,] 1 5 7  The same matrix with positional arguments: > A <- matrix(c(2, 4, 3, 1, 5, 7), 2, 3, byrow=TRUE)
  34. 34. Accessing Matrices  An element at the mth row, nth column of A can be accessed by the expression A[m, n]. > A[2, 3] [1] 7  The entire mth row of A can be extracted as A[m, ]. > A[2, ] [1] 1 5 7  We can also extract more than one row or column at a time. > A[ ,c(1,3)] [,1] [,2] [1,] 2 3 [2,] 1 7
  35. 35. Calculations on matrices  We construct the transpose of a matrix by interchanging its columns and rows with the function t . > t(A) # transpose of A [,1] [,2] [1,] 2 1 [2,] 4 5 [3,] 3 7  We can deconstruct a matrix by applying the c function, which combines all column vectors into one, column by column. > c(A) [1] 2 1 4 5 3 7
  36. 36. Arrays  In R, arrays are generalizations of vectors and matrices.  A vector is a one-dimensional array and a matrix is a two-dimensional array.  As with vectors and matrices, all the elements of an array must be of the same data type.  A one-dimensional array with two elements may be constructed as follows. > x = array(c(T,F),dim=c(2)) > print(x) [1] TRUE FALSE
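R stores matrices column by column, which determines the order in which c() flattens them. The sketch below uses the matrix A from the earlier slides; the matrix product at the end is an extra illustration of the %*% operator.

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, ncol = 3, byrow = TRUE)
flat_cols <- c(A)       # 2 1 4 5 3 7: columns are read top to bottom, left to right
flat_rows <- c(t(A))    # 2 4 3 1 5 7: transposing first recovers the row order
P <- A %*% t(A)         # 2 x 2 matrix product of A with its transpose
P                       # rows: 29 43 / 43 75
```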
  37. 37. Arrays  A three dimensional array - 3 by 3 by 3 - may be created as follows. > z = array(1:27,dim=c(3,3,3)) > dim(z) [1] 3 3 3 > print(z) , , 1 [,1] [,2] [,3] [1,] 1 4 7 [2,] 2 5 8 [3,] 3 6 9 , , 2 [,1] [,2] [,3] [1,] 10 13 16 [2,] 11 14 17 [3,] 12 15 18 , , 3 [,1] [,2] [,3] [1,] 19 22 25 [2,] 20 23 26 [3,] 21 24 27
  38. 38. Accessing Arrays  R arrays are accessed in a manner similar to arrays in other languages: by integer index, starting at 1 (not 0).  For example, the third slice along the third dimension is a 3 by 3 matrix. > z[,,3] [,1] [,2] [,3] [1,] 19 22 25 [2,] 20 23 26 [3,] 21 24 27  Specifying two of the three indexes returns a one-dimensional vector. > z[,3,3] [1] 25 26 27
  39. 39. Accessing Arrays  Specifying all three indexes returns a single element of the 3 by 3 by 3 array. > z[3,3,3] [1] 27  More complex subsets of an array can also be extracted. > z[,c(2,3),c(2,3)] , , 1 [,1] [,2] [1,] 13 16 [2,] 14 17 [3,] 15 18 , , 2 [,1] [,2] [1,] 22 25 [2,] 23 26 [3,] 24 27
  40. 40. Lists  A list is a collection of R objects.  list() creates a list. unlist() transforms a list into a vector.  The objects in a list do not have to be of the same type or length. > x <- c(1:4) > y <- FALSE > z <- matrix(c(1:4),nrow=2,ncol=2) > myList <- list(x,y,z) > myList [[1]] [1] 1 2 3 4 [[2]] [1] FALSE [[3]] [,1] [,2] [1,] 1 3 [2,] 2 4
  41. 41. Data Frame  A data frame is used for storing data in a spreadsheet-like table.  It is a list of vectors of equal length.  Most statistical modeling routines in R require a data frame as input.  For example, > weight = c(150, 135, 210, 140) > height = c(65, 61, 70, 65) > gender = c("Fe","Fe","Ma","Fe") > study = data.frame(weight,height,gender) # make the data frame > study weight height gender 1 150 65 Fe 2 135 61 Fe 3 210 70 Ma 4 140 65 Fe
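When an index selects a single position along a dimension, R drops that dimension from the result. A small sketch with the same array `z`; the `drop = FALSE` argument is standard R but is not covered on the slides.

```r
z <- array(1:27, dim = c(3, 3, 3))
elem <- z[2, 3, 1]                     # row 2, column 3 of the first slice
elem                                   # 8
d_dropped <- dim(z[, , 3])             # 3 3: the third dimension is dropped
d_kept <- dim(z[, , 3, drop = FALSE])  # 3 3 1: all three dimensions are kept
```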
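List members are extracted with double brackets, or with `$` when they are named. The sketch below rebuilds the list from the slide with names `x`, `y`, `z` attached (the names are an assumption made here for illustration; the slide's list is unnamed).

```r
myList <- list(x = 1:4, y = FALSE, z = matrix(1:4, nrow = 2, ncol = 2))
first <- myList[[1]]         # the member itself: 1 2 3 4
flag <- myList$y             # access by name: FALSE
sub_class <- class(myList[1])  # single brackets return a shorter list, not the member
flat <- unlist(myList)       # all values in one vector (FALSE becomes 0)
length(flat)                 # 9
```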
  42. 42. Creating a data frame  The dataframe may be created directly using data.frame().  For example, the dataframe is created - naming each vector composing the dataframe as part of the argument list. > patientID <- c(1, 2, 3, 4) > age <- c(25, 34, 28, 52) > diabetes <- c("Type1", "Type2", "Type1", "Type1") > status <- c("Poor", "Improved", "Excellent", "Poor") > patientdata <- data.frame(patientID, age, diabetes, status) > patientdata patientID age diabetes status 1 1 25 Type1 Poor 2 2 34 Type2 Improved 3 3 28 Type1 Excellent 4 4 52 Type1 Poor
  43. 43. Accessing data frame elements  Use subscript notation or column names to identify the elements of the patientdata data frame. >patientdata[1:2] patientID age 1 1 25 2 2 34 3 3 28 4 4 52 >patientdata[c("diabetes", "status")] diabetes status 1 Type1 Poor 2 Type2 Improved 3 Type1 Excellent 4 Type1 Poor >patientdata$age [1] 25 34 28 52 >table(patientdata$diabetes, patientdata$status) Excellent Improved Poor Type1 1 0 2 Type2 0 1 0
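Rows can be selected as well as columns. The sketch below rebuilds the patientdata frame from the previous slide and filters it with a logical condition; the threshold of 30 is an arbitrary value chosen for illustration.

```r
patientdata <- data.frame(
  patientID = c(1, 2, 3, 4),
  age       = c(25, 34, 28, 52),
  diabetes  = c("Type1", "Type2", "Type1", "Type1"),
  status    = c("Poor", "Improved", "Excellent", "Poor")
)
older <- patientdata[patientdata$age > 30, ]   # keep rows where the condition is TRUE
older$patientID                                # 2 4
cell <- as.character(patientdata[2, "status"]) # one cell by row number and column name
avg_age <- mean(patientdata$age)               # 34.75
```

The as.character() call is there because R versions before 4.0 store string columns as factors by default.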
  44. 44. Functions  Most tasks are performed by calling a function in R. All R functions have three parts:  the body(), the code inside the function.  the formals(), the list of arguments which controls how you can call the function.  the environment(), the “map” of the location of the function’s variables.  The general form of a function is given by: functionname <- function(arg1, arg2,...) { Body of function: a collection of valid statements }
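The three parts of a function listed above can be inspected directly. A minimal sketch, assuming a two-argument function like the f1 used on the next slide; the default value `y = 1` is an addition made here to show default arguments.

```r
f1 <- function(x, y = 1) {   # y = 1 is a hypothetical default value
  x + y
}
arg_names <- names(formals(f1))  # "x" "y": the argument list
body(f1)                         # the code inside the function
environment(f1)                  # where the function looks up its variables
f1(3)                            # 4, using the default for y
f1(3, 4)                         # 7
```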
  45. 45. Functions  Example 1: Creating a function, called f1, which adds a pair of numbers. f1 <- function(x, y) { x+y } f1( 3, 4) [1] 7
  46. 46. Functions  Example 2: Creating a function, called readinteger. readinteger <- function() { n <- readline(prompt="Enter an integer: ") return(as.integer(n)) } print(readinteger()) Enter an integer: 55 [1] 55
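  47. 47. Functions  Example 3: Creating a function, called my.plot, which passes its arguments on to plot() through "..." and changes the default plotting symbol. x <- rnorm(100) y <- x + rnorm(100) plot(x, y) my.plot <- function(..., pch.new=15) { plot(..., pch=pch.new) } my.plot(x, y)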
  48. 48. Control flow  A list of constructions to perform testing and looping in R.  These allow you to control the flow of execution of a script typically inside of a function. Common ones include:  if, else  switch  for  while  repeat  break  next  return
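Of the constructs listed above, switch has no slide of its own. The following is a small sketch of how it selects a value by name; the function `describe` and its labels are hypothetical and chosen only for illustration.

```r
describe <- function(type) {
  switch(type,
    mean   = "location statistic",
    median = "location statistic",
    var    = "scale statistic",
    "unknown"                      # the unnamed last argument is the default
  )
}
describe("var")     # "scale statistic"
describe("foo")     # "unknown"
```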
  49. 49. Simple if  Syntax: if (test_expression) {statement}  Example: x <- 5 if(x > 0) { print("Positive number") }  Output: [1] "Positive number"  Example: x <- 4 == 3 if (x) { "4 equals 3" }  Output: (none; x is FALSE, so the statement is skipped)
  50. 50. if...else  Syntax: if (test_expression) { statement1 } else { statement2 }  Note that else must be on the same line as the closing brace of the if statement.  Example: x <- -5 if(x >= 0) { print("Non-negative number") } else { print("Negative number") }  Output: [1] "Negative number"
  51. 51. Nested if...else  Syntax: if ( test_expression1) { statement1 } else if ( test_expression2) { statement2 } else if ( test_expression3) { statement3 } else statement4  Only one statement will get executed depending upon the test_expressions.  Example: x <- 0 if (x < 0) { print("Negative number") } else if (x > 0) { print("Positive number") } else print("Zero")  Output: [1] "Zero"
  52. 52. ifelse()  There is a vector equivalent form of the if...else statement in R, the ifelse() function.  Syntax: ifelse(test_expression, x, y)  Example: > a = c(5,7,2,9) > ifelse(a %% 2 == 0,"even","odd")  Output: [1] "odd" "odd" "even" "odd"
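To make the "vector equivalent" claim concrete, the sketch below compares ifelse() with an explicit loop that applies if...else to one member at a time. The variable names `labels` and `out` are illustrative.

```r
a <- c(5, 7, 2, 9)
labels <- ifelse(a %% 2 == 0, "even", "odd")   # tests every member at once

# The same result, computed one member at a time
out <- character(length(a))
for (i in seq_along(a)) {
  out[i] <- if (a[i] %% 2 == 0) "even" else "odd"
}
identical(labels, out)   # TRUE
```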
  53. 53. for  A for loop is used to iterate over a vector in R.  Syntax: for (val in sequence) {statement}  Example: v <- c("this", "is", "the", "R", "for", "loop") for(i in v) { print(i) }  Output: [1] "this" [1] "is" [1] "the" [1] "R" [1] "for" [1] "loop"
  54. 54. Nested for loops  We can use a for loop within another for loop to iterate over two things at once (e.g., rows and columns of a matrix).  Example: for(i in 1:3) { for(j in 1:3) { print(paste(i,j)) } }  Output: [1] "1 1" [1] "1 2" [1] "1 3" [1] "2 1" [1] "2 2" [1] "2 3" [1] "3 1" [1] "3 2" [1] "3 3"
  55. 55. while  A while loop repeats its body as long as a specific condition is true.  Syntax: while (test_expression) { statement }  Example: i <- 1 while (i < 6) { print(i) i = i+1 }  Output: [1] 1 [1] 2 [1] 3 [1] 4 [1] 5
  56. 56. repeat  The easiest loop to master in R is repeat.  All it does is execute the same code over and over until you tell it to stop.  Syntax: repeat {statement}  Example: x <- 1 repeat { print(x) x = x+1 if (x == 6){ break } }  Output: [1] 1 [1] 2 [1] 3 [1] 4 [1] 5
  57. 57. break  A break statement is used inside a loop to stop the iterations and flow the control outside of the loop.  Example: x <- 1:5 for (val in x) { if (val == 3){ break } print(val) }  Output: [1] 1 [1] 2
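The control-flow list also names next, which skips only the current iteration rather than leaving the loop. A minimal sketch mirroring the break example; the vector `kept` is an illustrative accumulator.

```r
kept <- c()
for (val in 1:5) {
  if (val == 3) next     # skip the rest of this iteration only
  kept <- c(kept, val)
}
kept   # 1 2 4 5
```

With break in place of next, the loop would stop at 3 and `kept` would hold only 1 and 2.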
  58. 58. Replication  The rep() repeats its input several times.  Another related function, replicate() calls an expression several times.  rep will repeat the same random number several times, but replicate gives a different number each time  Example: >rep(runif(1), 5) [1] 0.04573 0.04573 0.04573 0.04573 0.04573 >replicate(5, runif(1)) [1] 0.5839 0.3689 0.1601 0.9176 0.5388
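The difference between rep() and replicate() can be checked by counting distinct values. This sketch fixes the random seed with set.seed() so that runs are repeatable; the seed value 42 is arbitrary.

```r
set.seed(42)                        # arbitrary seed, only for repeatability
r1 <- rep(runif(1), 3)              # runif is evaluated once, then the value is repeated
r2 <- replicate(3, runif(1))        # runif is evaluated three times
length(unique(r1))                  # 1
times_form <- rep(c("a", "b"), times = 2)   # a b a b
each_form  <- rep(c("a", "b"), each = 2)    # a a b b
```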
  59. 59. Packages  Packages are collections of R functions, compiled code, data, documentation, and tests, in a well-defined format.  The directory where packages are stored is called the library.  R comes with a standard set of packages.  Others are available for download and installation.  Once installed, they have to be loaded into the session to be used. >.libPaths() # get library location >library() # see all packages installed >search() # see packages currently loaded
  60. 60. Adding Packages  You can expand the types of analyses you do by adding other packages.  To add a package, download and install it.
  61. 61. Loading Packages  To load a package that is already installed on your machine, call the library function with the name of the package you want to load.  For example, the lattice package should be installed, but it won't be loaded automatically. We can load it with library() or require(). >library(lattice)  Other forms: >library(eda) # load package "eda" >require(eda) # the same >library() # list all available packages >library(lib = .Library) # list all packages in the default library >library(help = eda) # documentation on package "eda"
  62. 62. Importing and Exporting Data There are many ways to get data in and out. Most programs (e.g. Excel), as well as humans, know how to deal with rectangular tables in the form of tab-delimited text files. Normally, you would start your R session by reading in some data to be analysed. This can be done with the read.table function. Download the sample data to your local directory... >x <- read.table("sample.txt", header = TRUE) Also: read.delim, read.csv, scan >write.csv(x, file = "samplenew.csv") Also: write.matrix (in the MASS package), write.table, write
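A script can check whether a package is installed before attaching it. The sketch below uses the lattice package named above; install.packages() is left commented out because it needs network access, and the variable names are illustrative.

```r
pkg <- "lattice"                    # example package named on the slide
# install.packages(pkg)             # one-time download from CRAN; needs network access
available <- requireNamespace(pkg, quietly = TRUE)  # TRUE if installed, no error if not
if (available) {
  library(pkg, character.only = TRUE)   # attach the package to the search path
}
attached <- paste0("package:", pkg) %in% search()
attached
```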
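The export and import steps can be tried without any sample file by writing a small data frame to a temporary file and reading it back. The data frame and its column names below are made up for illustration.

```r
x <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))  # hypothetical sample data
path <- tempfile(fileext = ".csv")                   # temporary file, removed at the end
write.csv(x, file = path, row.names = FALSE)         # export without the row-name column
y <- read.csv(path)                                  # import; the header row is read by default
same <- isTRUE(all.equal(x, y))
same                                                 # TRUE
unlink(path)
```

Passing row.names = FALSE avoids an extra unnamed column appearing when the file is read back.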
  63. 63. Frequently used Operators  <- : assign; + : sum; - : difference; * : multiplication; / : division; ^ : exponent; %% : modulo; %*% : matrix product; %/% : integer division; %in% : membership test; | : or; & : and; < : less than; > : greater than; <= : less than or equal; >= : greater than or equal; ! : not; != : not equal; == : is equal
  64. 64. Frequently used Functions  c: concatenate; cbind, rbind: bind vectors together as columns or rows; min: minimum; max: maximum; length: number of values; dim: number of rows and columns; floor: largest integer not greater than the value; which: indices of TRUE values; table: counts; summary: generic statistics; sort, order, rank: sort, order, or rank a vector; print: show value; cat: print as characters; paste: concatenate as character strings; round: round; apply: repeat a function over rows or columns
  65. 65. Statistical Functions  rnorm, dnorm, pnorm, qnorm: normal distribution random sample, density, CDF and quantiles; lm, glm, anova: model fitting; loess, lowess: smooth curve fitting; sample: resampling (bootstrap, permutation); .Random.seed: random number generation; mean, median: location statistics; var, cor, cov, mad, range: scale statistics; svd, qr, chol, eigen: linear algebra
  66. 66. Graphical Functions  plot: generic plot (e.g. scatter plot); points: add points; lines, abline: add lines; text, mtext: add text; legend: add a legend; axis: add axes; box: add a box around all axes; par: plotting parameters (lots!); colors, palette: use colors
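A few of the functions listed above, tried on small inputs. This is only a sketch; the matrix A is the one from the matrix slides and the other inputs are arbitrary.

```r
A <- matrix(c(2, 4, 3, 1, 5, 7), nrow = 2, byrow = TRUE)
row_sums <- apply(A, 1, sum)             # margin 1 = rows: 9 13
col_max  <- apply(A, 2, max)             # margin 2 = columns: 2 5 7
fl  <- floor(2.7)                        # 2
idx <- which(c(FALSE, TRUE, TRUE))       # 2 3: positions of the TRUE values
lab <- paste("a", 1:2, sep = "-")        # "a-1" "a-2"
```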
  67. 67. Thank you! Queries ???
