R programming language

  1. Alberto Minetti
  2. What is R?
     • Functional programming language
     • Matrix-based
     • Interpreted (written in C and Fortran)
     • Environment for statistical computing and graphics
     • Open source and GPL license
     • 6000+ packages in CRAN
  3. Why use R?
     • Matrix calculation
     • Data visualization (interactive too)
     • Statistical analysis (regression, time series, geo-spatial)
     • Data mining, classification, clustering
     • Analysis of genomic data
     • Machine learning
  4. Who uses R?
     • Oracle integrates R in its Big Data Appliance
     • IBM offers support for in-Hadoop execution of R
     • Data analysts for Google and Apple
     • 12th in the TIOBE popularity index
  5. How to use R?
     • Command-line interface, autonomous script or graphical front-ends
     • Connection to any data source
     • Data analysis
     • Modeling and computation
     • Data visualization
     • Fitting models or displaying data
  6. RStudio IDE
     • AGPL 3 license
     • Scripts
     • Workspace
     • Console
     • Images
  7. Reading and writing data
     • From/to plain text files
     • From/to Excel files
     • From/to databases
     • From the web

         > heisenberg <- read.csv(file="simple.csv", header=TRUE, sep=",")
         > write.csv(x=data, file="simple.csv")

         > library(gdata)
         > mydata = read.xls("mydata.xls")
         > write.xlsx(x=data, file="simple.xlsx")   # write.xlsx is not part of gdata; the xlsx package, for example, provides it

         > library(XLConnect)
         > wk = loadWorkbook("mydata.xls")
         > df = readWorksheet(wk, sheet="Sheet1")

         > library(RPostgreSQL)
         > con <- dbConnect(dbDriver("PostgreSQL"), dbname="abc", user="postgres")
         > q <- dbGetQuery(con, "SELECT * FROM prices WHERE x > 0")
         > dbSendQuery(con, "INSERT INTO forecasts VALUES (10)")

         > fpe <- read.table("http://data.princeton.edu/wws509/datasets/effort.dat")
  8. Programming features
     • Flow control statements
       • while, repeat, break, next, if, return
     • Exceptions, using tryCatch blocks
     • Functions
       • Default parameters
       • Positional or named arguments
       • Generic
       • Anonymous

         fibonacci <- function(n) {
           if (n <= 2) return(1)        # return() is a function call in R
           fib <- numeric(n)
           fib[1:2] <- 1
           for (i in 3:n) {
             fib[i] <- fib[i-1] + fib[i-2]
           }
           return(fib[n])
         }

         arr <- function(a = 1, b = 2) {
           c(a, b)
         }
         > arr(b=6)
         [1] 1 6

         f3 <- function(f) {
           f(3)
         }
         f3(function(x) {x*7})

         `%my%` <- function(a, b) {
           return(2*a + 2*b)
         }
  9. Correlation

                               mpg  hp cyl
         Mazda RX4            21.0 110   6
         Mazda RX4 Wag        21.0 110   6
         Datsun 710           22.8  93   4
         Hornet 4 Drive       21.4 110   6
         Hornet Sportabout    18.7 175   8
         Valiant              18.1 105   6
         Duster 360           14.3 245   8
         Merc 240D            24.4  62   4
         Merc 230             22.8  95   4
         Merc 280             19.2 123   6
         Merc 280C            17.8 123   6
         Merc 450SE           16.4 180   8
         Merc 450SL           17.3 180   8
         Merc 450SLC          15.2 180   8
         Cadillac Fleetwood   10.4 205   8
         Lincoln Continental  10.4 215   8
         Chrysler Imperial    14.7 230   8
         Fiat 128             32.4  66   4
         Honda Civic          30.4  52   4
         Toyota Corolla       33.9  65   4
         Toyota Corona        21.5  97   4
         Dodge Challenger     15.5 150   8
         AMC Javelin          15.2 150   8
         Camaro Z28           13.3 245   8
         Pontiac Firebird     19.2 175   8
         Fiat X1-9            27.3  66   4
         Porsche 914-2        26.0  91   4
         Lotus Europa         30.4 113   4
         Ford Pantera L       15.8 264   8
         Ferrari Dino         19.7 175   6
         Maserati Bora        15.0 335   8
         Volvo 142E           21.4 109   4

         > mtcars2 <- subset(mtcars, select=c("mpg", "hp", "cyl"))
         > pairs(mtcars2)
  10. Auto-correlation

          > A <- read.table("http://cdiac.ornl.gov/ftp/trends/co2/maunaloa.co2")
          > X = t(A[1,])
          > ts.plot(X)
          > acf(X)
  11. Plotting

          > x <- seq(-1.57, 1.57, by=.001)
          > y <- (sqrt(abs(cos(x))) * cos(200*x) + sqrt(abs(x)) - 0.7) * (4 - x*x)^0.01
          > plot(0, 0, type='n', xlim=c(-2,+2), ylim=c(-1.6,+1.1))
          > lines(x, y, col='pink')
          > spread <- seq(1, length(x), length.out=length(x)/10)
          > cols <- c('yellow','red','orange','purple')
          > text(x[spread], y[spread], label='love',
                 col=sample(rep(cols, length.out=length(spread))), cex=1)
  12. Regression

          > library("MASS")
          > str(cats)
          'data.frame': 144 obs. of 3 variables:
           $ Sex: Factor w/ 2 levels "F","M": 1 1 1 1 1 1 1 1 1 1 ...
           $ Bwt: num 2 2 2 2.1 2.1 2.1 2.1 2.1 2.1 2.1 ...
           $ Hwt: num 7 7.4 9.5 7.2 7.3 7.6 8.1 8.2 8.3 8.5 ...
          > attach(cats)
          > lm.out <- lm(Hwt ~ Bwt)
          > lm.out                       # print the fitted model

          Call:
          lm(formula = Hwt ~ Bwt)

          Coefficients:
          (Intercept)          Bwt
              -0.3567       4.0341

          > plot(Hwt ~ Bwt, main="Kitty Cat Plot")
          > abline(lm.out, col="red")
  13. Data manipulation: discretisation

          > clinical.trial <- data.frame(patient = 1:100,
                                         age = rnorm(100, mean = 60, sd = 8),
                                         year.enroll = sample(paste("19", 85:99, sep = ""), 100, replace = TRUE))
          > c1 <- cut(clinical.trial$age, breaks = 4)
          > table(c1)
            (41.1,50]   (50,58.8] (58.8,67.6] (67.6,76.4]
                    9          34          41          16
          > hist(clinical.trial$age, breaks=seq(40,100, by=10))
  14. Plots from my MSc thesis
      • Prices of energy in the Italian Power Exchange spot market
      • Forecast using a SARIMA model
  15. Performance
      • Good performance with built-in math functions
      • Possibility to monitor the memory usage
      • Possibility to offload data to an external DB to speed up large operations
      • Functions for big data sets
      • Parallel computation
  16. Credits
      • http://adv-r.had.co.nz/
      • http://cran.r-project.org/
      • http://simplystatistics.org/2013/02/15/interview-with-nick-chamandy-statistician-at-google/
      • https://kaosktrl.wordpress.com/2010/02/04/r-lanalisi-delle-serie-storiche-partendo-da-copenaghen/
  17. Vector part 1

          > x <- c(2,5,9.5,-3)     # create a vector
          > x[2]                   # select the second element
          [1] 5
          > x[c(2,4)]              # select the elements in positions 2 and 4
          [1]  5 -3
          > x[-c(1,3)]             # drop the elements in positions 1 and 3
          [1]  5 -3
          > x[x>0]                 # select only the positive elements
          [1] 2.0 5.0 9.5
          > x[!(x<=0)]             # drop the elements that are not strictly positive
          [1] 2.0 5.0 9.5
          > x[x>0]-1
          [1] 1.0 4.0 8.5
          > x[x>0]+c(1,2,3)        # element-wise sum
          [1]  3.0  7.0 12.5
          > x[x>0][2]
          [1] 5
  18. Vector part 2

          > which(x>0)             # indexes that match the condition
          [1] 1 2 3
          > which.max(x)
          [1] 3
          > which.min(x)
          [1] 4
          > length(x)
          [1] 4
          > x <- 1:10
          > x
           [1]  1  2  3  4  5  6  7  8  9 10
          > paste(1:5, c("A","B"), sep="")
          [1] "1A" "2B" "3A" "4B" "5A"
          > x1 <- seq(1, 1000, length=10)   # 10 equally spaced values from 1 to 1000 (step 111)
          > x1
           [1]    1  112  223  334  445  556  667  778  889 1000
          > x2 <- rep(2, times=10)          # repeat 2 ten times
          > x2
           [1] 2 2 2 2 2 2 2 2 2 2
          > rep(c(1,3), times=4)            # repeat (1,3) 4 times
          [1] 1 3 1 3 1 3 1 3
          > rep(c(1,9), c(3,1))             # repeat 1 three times and 9 once
          [1] 1 1 1 9
          > length(c(x,x1,x2,3))
          [1] 31
          # see also sort, order, eigen
  19. Matrix part 1

          > x <- matrix(1:10, ncol=5)       # create
          > x
               [,1] [,2] [,3] [,4] [,5]
          [1,]    1    3    5    7    9
          [2,]    2    4    6    8   10
          > x[,1]                           # select the first column
          [1] 1 2
          > x[,4:5]                         # select columns 4 and 5
               [,1] [,2]
          [1,]    7    9
          [2,]    8   10
          > x[2,]                           # select the second row
          [1]  2  4  6  8 10
          > x[,-c(2,4)]                     # select columns 1, 3 and 5
               [,1] [,2] [,3]
          [1,]    1    5    9
          [2,]    2    6   10
          > cbind(1:2, c(1,-2), c(0,9))     # combine vectors by columns (rbind combines by rows)
               [,1] [,2] [,3]
          [1,]    1    1    0
          [2,]    2   -2    9
          > x[2,] <- rep(2,5)               # overwrite the second row
          > x
               [,1] [,2] [,3] [,4] [,5]
          [1,]    1    3    5    7    9
          [2,]    2    2    2    2    2
  20. Matrix part 2

          > X <- diag(1:3)
          > X
               [,1] [,2] [,3]
          [1,]    1    0    0
          [2,]    0    2    0
          [3,]    0    0    3
          > solve(X)                        # the inverse of X
               [,1] [,2]      [,3]
          [1,]    1  0.0 0.0000000
          [2,]    0  0.5 0.0000000
          [3,]    0  0.0 0.3333333
          > X %*% solve(X)                  # ...verify
               [,1] [,2] [,3]
          [1,]    1    0    0
          [2,]    0    1    0
          [3,]    0    0    1
  21. List, can contain different object types

          > lista <- list(matrix(1:9,nrow=3), rep(0,3), c('good','bad'))
          > length(lista)
          [1] 3
          > lista[[3]]                      # third element
          [1] "good" "bad"
          > length(lista[[3]])
          [1] 2
          > lista[[2]]+2                    # sum on the second item
          [1] 2 2 2
          > lista[[1]][2,2]
          [1] 5
          > names(lista) <- c('first', 'second', 'third')   # names for the elements
          > lista$second                    # or lista[["second"]]; returns a vector
          [1] 0 0 0
          > lista["second"]                 # returns a one-element list
          $second
          [1] 0 0 0
  22. Multidimensional Array and named indexes

          > a <- array(1:24, dim=c(3,4,2))
          > dim(a)                          # show dimensions
          [1] 3 4 2
          > a[,,2]
               [,1] [,2] [,3] [,4]
          [1,]   13   16   19   22
          [2,]   14   17   20   23
          [3,]   15   18   21   24
          > a[1,,]
               [,1] [,2]
          [1,]    1   13
          [2,]    4   16
          [3,]    7   19
          [4,]   10   22
          > a[1,2,1]
          [1] 4
          > x <- matrix(1:10, ncol=5)
          > dimnames(x) <- list(c("X","Y"), NULL)
          > x
            [,1] [,2] [,3] [,4] [,5]
          X    1    3    5    7    9
          Y    2    4    6    8   10
          > dimnames(x)[[2]] <- c("g","h","j","j","k")
          > x
            g h j j  k
          X 1 3 5 7  9
          Y 2 4 6 8 10

      Summary of Data Structures

                         Linear    Rectangular
          Homogeneous    Vectors   Matrices
          Heterogeneous  Lists     Data frames
  23. Data frame

          > X <- data.frame(id=1:4, sex=c("M","F","F","M"),
                            stringsAsFactors=TRUE)   # since R 4.0 strings stay character by default; summary() below shows a factor
          > X
            id sex
          1  1   M
          2  2   F
          3  3   F
          4  4   M
          > X$age <- c(2.5,3,5,6.2)
          > X
            id sex age
          1  1   M 2.5
          2  2   F 3.0
          3  3   F 5.0
          4  4   M 6.2
          > subset(X, subset=(age<3 | age>5), select=-age)   # same as X[X$age<3 | X$age>5, c("id","sex")]
            id sex
          1  1   M
          4  4   M
          # see also merge, attach
          > summary(X)
                 id       sex        age
           Min.   :1.00   F:2   Min.   :2.500
           1st Qu.:1.75   M:2   1st Qu.:2.875
           Median :2.50         Median :4.000
           Mean   :2.50         Mean   :4.175
           3rd Qu.:3.25         3rd Qu.:5.300
           Max.   :4.00         Max.   :6.200
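
The flow-control statements and exception handling listed under "Programming features" appear in the slides without code. The following is a minimal sketch in base R only. The helper `safe_as_number` and the variable names are hypothetical and not taken from the slides.

```r
# Hypothetical helper (not from the slides): convert text to a number,
# returning NA when the conversion raises a warning or an error.
safe_as_number <- function(text) {
  tryCatch(
    as.numeric(text),
    warning = function(w) NA_real_,  # e.g. "NAs introduced by coercion"
    error   = function(e) NA_real_
  )
}

safe_as_number("3.14")
safe_as_number("abc")

# stop() signals an error; the error handler receives the condition object.
result <- tryCatch(
  stop("something went wrong"),
  error = function(e) paste("caught:", conditionMessage(e))
)
result

# repeat loops until break; next skips to the following iteration
# (R has no 'continue' keyword).
i <- 0
odds <- c()
repeat {
  i <- i + 1
  if (i > 10) break
  if (i %% 2 == 0) next
  odds <- c(odds, i)
}
odds
```

Returning `NA` from the handlers is just one possible policy. A handler could equally log the condition or re-signal it with `stop()`.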

Editor's notes

  • R's data structures include vectors, matrices, multidimensional arrays, lists and data frames (similar to tables in a relational database). A scalar is represented as a vector of length one. R is interpreted, and its packages are mainly written in R, C and Fortran. R is freely available under the GPL, and pre-compiled binary versions are provided for various operating systems. The R community is very active in producing packages for specific functions or specific areas of study.
  • R can act as a matrix-calculation toolbox with performance comparable to GNU Octave or MATLAB.

    Another strength of R is static graphics, which can produce publication-quality graphs, including mathematical symbols. Dynamic and interactive graphics are available through additional packages.

    R's system includes objects for: regression models, time-series and geo-spatial coordinates, techniques for linear and nonlinear modeling, classical statistical tests, classification, clustering, and others.

    R is easily extensible through functions and extensions.
  • Polls and surveys of data miners show that R's popularity has increased substantially in recent years.
  • R is an interpreted language; users typically access it through a command-line interpreter; there are also several graphical front-ends for it.



  • The IDE I used is very similar to MATLAB's, with the following four sections: one for the scripts, one for the current workspace where the objects and the matrices are easily accessible, one for the console to run analyses on the fly, and one for the generated images.
  • R has the same capabilities as common procedural languages: to control the flow you can use statements such as while, repeat and if, as well as functions.
    R lets you handle exceptions using tryCatch blocks.
    Functions can declare default parameter values in their definition, and you can call a function using positional or named arguments.

    A generic function acts differently depending on the type of arguments passed to it. So, the generic function dispatches the implementation specific to that type of object. For example, R has a generic print function that can print almost every type of object in R with a simple print(objectname) syntax.

  • One line methods for: correlation, plotting, regression,
  • The syntax to calculate the regression
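
The note on generic functions above can be illustrated with a small S3 sketch. The generic `describe` and its methods are hypothetical names invented for this example. Only `UseMethod` and the class-based dispatch it performs are base R.

```r
# Hypothetical generic (illustration only): UseMethod() picks the method
# whose suffix matches the class of the first argument.
describe <- function(x, ...) UseMethod("describe")

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) {
  paste("an object of class", class(x)[1])
}

describe.numeric <- function(x, ...) {
  paste("a numeric vector of length", length(x))
}

describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows and", ncol(x), "columns")
}

describe(c(2, 5, 9.5))           # dispatches to describe.numeric
describe(data.frame(id = 1:4))   # dispatches to describe.data.frame
describe("text")                 # falls back to describe.default
```

Base R's `print` and `summary` work the same way: calling `summary` on a data frame or on a fitted `lm` object reaches a different method through a single generic.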