Overview
Introduction to R
Why use it?
Setting up R Environment
Data Types
File Handling
Plotting and Graphics Features
Packages
What is R?
“R is a freely available language and
environment for statistical computing and
graphics”
Much like & , but better!
What R is and what it is not
R is
a programming language
a statistical package
an interpreter
Open Source
R is not
a database
a collection of “black boxes”
a spreadsheet software package
commercially supported
Why use R?
SPSS and Excel users are limited in their
ability to change their environment. The way
they approach a problem is constrained by
how Excel & SPSS were programmed to
approach it
The users have to pay money to use the
software
R users can rely on functions that have been
developed for them by statistical researchers
or create their own
They don’t have to pay money to use them
Once experienced enough they are almost
unlimited in their ability to change their
environment
Installing R
Go to R homepage:
http://www.r-project.org/
Choose a server
And just follow the installation instructions…
Getting started
To obtain and install R on your computer
Go to http://cran.r-project.org/mirrors.html to choose a
mirror near you
Click on your favorite operating system (Linux, Mac,
or Windows)
Download and install the “base”
To install additional packages
Start R on your computer
Choose the appropriate item from the “Packages”
menu
Installing RStudio
“RStudio is a new integrated development
environment (IDE) for R”
Install the “desktop edition” from this link:
http://www.rstudio.org/download/
Naming Convention
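must start with a letter (A-Z or a-z), or with a period “.” that is not followed by a digit
can contain letters, digits (0-9), periods “.”, and underscores “_”
case-sensitive
mydata different from MyData
underscore “_” is valid in current versions of R; old versions treated it as an assignment operator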
To quit R, use >q()
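The naming rules above can be tried out with a short sketch. The variable names are made up for illustration; make.names() is a base R function that turns a string into a syntactically valid name.

```r
# Names are case-sensitive: these are two separate variables.
mydata <- 1
MyData <- 2
same.value <- identical(mydata, MyData)   # FALSE

# A period is allowed inside a name.
my.data <- c(1, 2, 3)

# make.names() shows what R accepts as a name:
# a string starting with a digit gets an "X" prepended.
fixed <- make.names("2cool")              # "X2cool"
print(same.value)
print(fixed)
```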
Assignment
“<-” used to indicate assignment
x<-c(1,2,3,4,5,6,7)
x<-c(1:7)
x<-1:4
note: as of version 1.4, “=” is also a valid
assignment operator
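A small sketch of the two assignment operators, assuming they are used at the top level of a script. It also shows a detail that is easy to miss: c(1, 2, ...) and 1:7 hold the same values but are stored as different types.

```r
x <- c(1, 2, 3, 4, 5, 6, 7)   # assignment with "<-"
y = 1:7                        # "=" does the same at the top level
print(all(x == y))             # TRUE: same values
print(class(x))                # "numeric": c(1, 2, ...) creates doubles
print(class(y))                # "integer": 1:7 creates integers
```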
R as a calculator
> 5 + (6 + 7) * pi^2
[1] 133.3049
> log(exp(1))
[1] 1
> log(1000, 10)
[1] 3
> sin(pi/3)^2 + cos(pi/3)^2
[1] 1
> Sin(pi/3)^2 + cos(pi/3)^2
Error in Sin(pi/3) : could not find function "Sin"
Variables
A variable is a symbolic name
given to stored information
Variables are assigned using
either “=” or “<-”
> x<-12.6
> x
[1] 12.6
Missing values
R is designed to handle statistical data
and therefore predestined to deal with
missing values
Numbers that are “not available”
> x <- c(1, 2, 3, NA)
> x + 3
[1] 4 5 6 NA
“Not a number”
> log(c(0, 1, 2))
[1] -Inf 0.0000000 0.6931472
> 0/0
[1] NaN
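The following sketch shows how missing values are usually detected and skipped. It only uses base R functions (is.na(), is.nan(), and the na.rm argument of mean()).

```r
x <- c(1, 2, 3, NA)
print(is.na(x))                 # FALSE FALSE FALSE TRUE
print(mean(x))                  # NA: the missing value propagates
print(mean(x, na.rm = TRUE))    # 2: NA is dropped before averaging

# NaN counts as missing for is.na(), but NA is not NaN.
print(is.na(0/0))               # TRUE
print(is.nan(NA))               # FALSE
```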
Basic (atomic) data types
Logical
> x <- T; y <- F
> x; y
[1] TRUE
[1] FALSE
Numerical
> a <- 5; b <- sqrt(2)
> a; b
[1] 5
[1] 1.414214
Character
> a <- "1"; b <- 1
> a; b
[1] "1"
[1] 1
> a <- "character"
> b <- "a"; c <- a
> a; b; c
[1] "character"
[1] "a"
[1] "character"
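To check which atomic type a value has, class() can be used, as in this small sketch. The variable names are illustrative only.

```r
flag <- TRUE
num <- sqrt(2)
txt <- "1"
print(class(flag))           # "logical"
print(class(num))            # "numeric"
print(class(txt))            # "character"

# "1" is text, so it must be converted before doing arithmetic.
print(as.numeric(txt) + 1)   # 2
```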
R Program to Take Input From User
Use the readline() function to take input from the user (terminal).
This function returns a single-element character vector.
Example
my.name <- readline(prompt="Enter name: ")
my.age <- readline(prompt="Enter age: ")
# convert character into integer
my.age <- as.integer(my.age)
print(paste("Hi,", my.name, "next year you will be",
my.age+1, "years old."))
The character vector is converted into an integer using the function as.integer().
The prompt argument is printed in front of the user input. It
usually ends with ": ".
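One possible way to make the example above robust is sketched below. The helper greet() and the fallback values "Anna" and "30" are hypothetical and not part of the original slides. Since readline() only waits for input in an interactive session, the sketch falls back to fixed values when run as a script.

```r
# Hypothetical helper: builds the greeting from the raw text the user typed.
greet <- function(name, age.text) {
  age <- suppressWarnings(as.integer(age.text))  # NA if the text is not a number
  if (is.na(age)) {
    return(paste0("Sorry, ", name, ", '", age.text, "' is not a valid age."))
  }
  paste("Hi,", name, "next year you will be", age + 1, "years old.")
}

# readline() only waits for input in an interactive session,
# so fixed example values are used otherwise.
if (interactive()) {
  my.name <- readline(prompt = "Enter name: ")
  my.age <- readline(prompt = "Enter age: ")
} else {
  my.name <- "Anna"
  my.age <- "30"
}
print(greet(my.name, my.age))
```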
Variables - Numeric Vectors
A vector is an ordered collection of values of the same type. A numeric
vector is composed of numbers
It may be created:
Using the c() function (concatenate):
x = c(3,7,9,11)
> x
[1] 3 7 9 11
Using the rep(what,how_many_times) function
(replicate):
x = rep(10,3)
Using the “:” operator, signifying a series
of integers
x=4:15
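The three ways of creating a numeric vector can be compared side by side, as in this sketch. seq() is added here as a related base R function for sequences with a step other than 1.

```r
a <- c(3, 7, 9, 11)         # explicit values
b <- rep(10, 3)             # 10 10 10
d <- 4:15                   # integers from 4 to 15
e <- seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
print(length(d))            # 12
print(b)
print(e)
```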
Variables - Character Vectors
Character strings are written in quotes (single or
double); R prints them double quoted
Vectors made of character strings:
> x=c("I","want","to","go","home")
> x
[1] "I" "want" "to" "go" "home"
Using rep():
> rep("bye",2)
[1] "bye" "bye"
Notice the difference using paste()
(1 element):
> paste("I","want","to","go","home")
[1] "I want to go home"
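The difference between a character vector with several elements and a single pasted string can be sketched as follows. The collapse and sep arguments are standard arguments of paste().

```r
words <- c("I", "want", "to", "go", "home")
print(length(words))                       # 5 elements
sentence <- paste(words, collapse = " ")   # joins the elements into one string
print(sentence)                            # "I want to go home"
print(length(sentence))                    # 1 element
print(paste("x", 1:3, sep = ""))           # "x1" "x2" "x3"
```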
Manipulation of Vectors
Our vector: x=c(100,101,102,103)
[] are used to access elements in x
Extract the 2nd element in x
> x[2]
[1] 101
Extract the 3rd and 4th elements in x
> x[3:4] # or x[c(3,4)]
[1] 102 103
> x
[1] 100 101 102 103
Add 1 to all elements in x:
> x+1
[1] 101 102 103 104
Multiply all elements in x by 2:
> x*2
[1] 200 202 204 206
Manipulation of Vectors –
Cont.
> x <- c(5.2, 1.7, 6.3)
> log(x)
[1] 1.6486586 0.5306283 1.8405496
> y <- 1:5
> z <- seq(1, 1.4, by = 0.1)
> y + z
[1] 2.0 3.1 4.2 5.3 6.4
> length(y)
[1] 5
> mean(y + z)
[1] 4.2
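Arithmetic on vectors works element by element, as the session above shows. A minimal sketch of what happens when the two vectors differ in length (R recycles the shorter one) might look like this; the names short and long are illustrative.

```r
y <- 1:5
z <- seq(1, 1.4, by = 0.1)
print(y + z)                  # element by element: y[1]+z[1], y[2]+z[2], ...

# A shorter vector is recycled to the length of the longer one.
short <- c(10, 20)
long <- c(1, 2, 3, 4)
print(long + short)           # 11 22 13 24
print(sum(y))                 # 15
```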
> Mydata <- c(2,3.5,-0.2) # vector (c = “concatenate”)
> Colors <- c("Red","Green","Red") # character vector
> x1 <- 25:30 # number sequence
> x1
[1] 25 26 27 28 29 30
> Colors[2] # one element
[1] "Green"
> x1[3:5] # several elements
[1] 27 28 29
> Mydata
[1]  2.0  3.5 -0.2
> Mydata > 0 # test on the elements
[1] TRUE TRUE FALSE
> Mydata[Mydata>0] # extract the positive elements
[1] 2.0 3.5
> Mydata[-c(1,3)] # remove elements
[1] 3.5
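A logical test returns one TRUE or FALSE per element, and that result can be used to find, count, or replace elements. The sketch below uses only base R; the name keep is illustrative.

```r
Mydata <- c(2, 3.5, -0.2)
keep <- Mydata > 0             # TRUE TRUE FALSE
print(which(keep))             # 1 2: positions of the TRUE values
print(sum(keep))               # 2: each TRUE counts as 1
print(Mydata[-2])              # drops the 2nd element
Mydata[Mydata < 0] <- 0        # replaces negative values with 0
print(Mydata)
```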
More Operators
Comparison operators:
Equal                          ==
Not equal                      !=
Less / greater than            < / >
Less / greater than or equal   <= / >=
Logical (Boolean) operators, working on FALSE and TRUE:
And   &
Or    |
Not   !
Our vector: x=100:150
Elements of x higher than 145
> x[x>145]
[1] 146 147 148 149 150
Elements of x higher than 135 and
lower than 140
> x[ x>135 & x<140 ]
[1] 136 137 138 139
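The “or” and “not” operators combine conditions in the same way as “&”. A short sketch, with any() and all() added as base R helpers that summarise a logical vector:

```r
x <- 100:150
print(x[x < 102 | x > 148])     # 100 101 149 150
print(x[!(x > 101)])            # 100 101
print(any(x == 125))            # TRUE
print(all(x >= 100))            # TRUE
```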
Our vector:
> x=c("I","want","to","go","home")
Elements of x that do not equal “want”:
> x[x != "want"]
[1] "I" "to" "go" "home"
Elements of x that equal “want” or “home”:
> x[x %in% c("want","home")]
[1] "want" "home"
Note: use “==” for 1 element and “%in%” for several elements
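Why “==” is the wrong tool for several values can be seen in this sketch. With “==” the two values are recycled along x, so each element of x is compared with only one of them.

```r
x <- c("I", "want", "to", "go", "home")
print(x %in% c("want", "home"))   # FALSE TRUE FALSE FALSE TRUE

# With "==" the pair is recycled as want, home, want, home, want, so no
# position lines up with its own value and every match is missed here.
# R also warns because 5 is not a multiple of 2.
by.equal <- suppressWarnings(x == c("want", "home"))
print(by.equal)                   # all FALSE
```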
Bar plot
marks = c(70, 95, 80, 74)
barplot(marks, main = "Comparing marks of 4 subjects",
xlab = "Subject",
ylab = "Marks",
names.arg = c("English", "Science", "Math.", "Hist."),
col = "darkred", horiz = FALSE)
1. Write an R program to take input from the user
(name and age) and display the values.
2. Write an R script to initialize your roll no., name
and branch, then display all the details.
3. Write an R script to initialize two variables, then
find their sum, product, difference and
quotient.
4. Write an R script to enter a 3-digit number
from the keyboard, then find the sum of its
three digits.
5. Write an R script to enter the radius of a circle,
then calculate the area and circumference of
the circle.
6. Write an R program to create a sequence of numbers
from 20 to 50 and find the mean of numbers from 20 to
60 and sum of numbers from 51 to 91.
7. Write an R program to create a vector which contains 10
random integer values between -50 and +50.
8. Write an R program to find the maximum and the
minimum value of a given vector.
9. Write an R program to create three vectors: numeric data,
character data and logical data. Display the content of
the vectors and their type.
10. Write an R program to compute the sum, mean and product
of the elements of a given vector.
Matrices
Matrix: A two dimensional rectangular
data set. It can be created using a
vector input to a matrix function.
The basic syntax for creating a matrix in R is:
matrix(data, nrow, ncol, byrow, dimnames)
C <- c(1,2,3,4,5,6,7,8,9,10) # input vector for a 5x2 matrix
M1 <- matrix(C, 5, 2, byrow = TRUE) # filled by row
M2 <- matrix(C, 5, 2, byrow = FALSE) # filled by column (the default)
data is the input vector which becomes the data elements of the matrix.
nrow is the number of rows to be created.
ncol is the number of columns to be created.
byrow is a logical value. If TRUE then the input vector elements are
arranged by row; if FALSE (the default) they are arranged by column.
dimnames is a list with the names assigned to the rows and columns.
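The arguments described above can be seen together in this sketch. The object names and the row and column labels are made up for illustration.

```r
values <- 1:10
by.row <- matrix(values, nrow = 5, ncol = 2, byrow = TRUE)
by.col <- matrix(values, nrow = 5, ncol = 2)   # byrow defaults to FALSE
print(by.row[1, ])    # 1 2
print(by.col[1, ])    # 1 6

# dimnames takes a list: row names first, then column names.
named <- matrix(values, nrow = 5, ncol = 2,
                dimnames = list(paste0("r", 1:5), c("a", "b")))
print(named["r2", "b"])   # 7
```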
Variables – Matrices
A matrix is a table in which all elements
must be of the same class
(e.g. numeric, character, etc.)
The number of elements in each
row must be identical
Accessing elements in
matrices:
x[row,column]
The ‘Height’ column:
> x[,"Height"] # or:
> x[,2]
Note: you cannot use “$” on a matrix
> x$Weight # error: $ operator is invalid for atomic vectors
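The matrix x used above is not shown in the slides, so the sketch below builds a small stand-in. The Weight and Height values are invented for illustration; only the column names and the column order follow the text.

```r
# Made-up data; Height is the 2nd column, as in the text above.
x <- cbind(Weight = c(60, 72, 57), Height = c(1.75, 1.80, 1.65))
print(x[, "Height"])     # the whole Height column, selected by name
print(x[, 2])            # the same column, selected by position
print(x[2, "Weight"])    # row 2 of the Weight column: 72
print(is.matrix(x))      # TRUE
```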
Another way of creating a matrix is by using
functions cbind() and rbind() as in column bind and row
bind.
> cbind(c(1,2,3),c(4,5,6))
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
> rbind(c(1,2,3),c(4,5,6))
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
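The two results above are transposes of each other, which a quick check with dim() and t() can confirm. The object names are illustrative.

```r
by.cols <- cbind(c(1, 2, 3), c(4, 5, 6))
by.rows <- rbind(c(1, 2, 3), c(4, 5, 6))
print(dim(by.cols))                   # 3 2
print(dim(by.rows))                   # 2 3
print(all(by.cols == t(by.rows)))     # TRUE
```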
Useful Functions
length(object)              # number of elements or components
str(object)                 # structure of an object
class(object)               # class or type of an object
names(object)               # names
c(object, object, ...)      # combine objects into a vector
cbind(object, object, ...)  # combine objects as columns
rbind(object, object, ...)  # combine objects as rows
ls()                        # list current objects
rm(object)                  # delete an object
newobject <- edit(object)   # edit a copy and save it as newobject
fix(object)                 # edit in place
6. Create a matrix taking a vector of numbers as input:
c(3:14), nrows = 4
Perform the following operations:
i) Elements are arranged sequentially by row.
ii) Elements are arranged sequentially by column.
iii) Define the column and row names.
iv) Access the element at 3rd column and 1st row.
v) Access the element at 2nd column and 4th row.
vi) Access only the 2nd row.
vii) Access only the 3rd column.
7. Create two 2x3 matrices.
Perform the following operations:
Add the matrices.
Subtract the matrices
Multiply the matrices.
Divide the matrices
Lists
vector: an ordered collection of data of the same
type.
> a = c(7,5,1)
> a[2]
[1] 5
list: an ordered collection of data of arbitrary
types.
> doe = list(name="john",age=28,married=F)
> doe$name
[1] "john"
> doe$age
[1] 28
Typically, vector elements are accessed by their index
(an integer), list elements by their name (a character
string). But both types support both access methods.
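That both vectors and lists support access by index and by name can be sketched as follows. The element names first, second, and third are added here for illustration.

```r
a <- c(first = 7, second = 5, third = 1)
print(a[2])            # vector element by index
print(a["second"])     # vector element by name

doe <- list(name = "john", age = 28, married = FALSE)
print(doe$age)         # list component by name: 28
print(doe[[2]])        # list component by index: 28
print(doe[["age"]])    # list component by name with [[ ]]: 28
```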
Lists 1
A list is an object consisting of objects
called components.
The components of a list don’t need to be
of the same mode or type and they can be a
numeric vector, a logical value and a
function and so on.
A component of a list can be referred to as
aa[[i]] or aa$times, where aa is the name
of the list, i is the index of the component
and times is the name of a component of aa.
Lists 2
With $, the names of components may be abbreviated
down to the minimum number of letters
needed to identify them uniquely.
aa[[1]] is the first component of aa,
while aa[1] is the sublist consisting of
the first component of aa only.
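The difference between aa[[1]] and aa[1] shows up clearly when the class of each result is printed. In this sketch the list aa and its components times and label are made up to match the names used in the text.

```r
aa <- list(times = c(1.5, 2.5), label = "run")
print(class(aa[[1]]))     # "numeric": the component itself
print(class(aa[1]))       # "list": a sublist of length 1
print(aa$t)               # partial matching: same as aa$times
print(sum(aa[[1]]))       # 4
```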
There are functions whose return value is
a List. We have seen some of them, eigen,
svd, …
Lists: Session
Empl <- list(employee="Anna", spouse="Fred",
children=3, child.ages=c(4,7,9))
Empl[[4]]
Empl$child.a # abbreviated name, same as Empl$child.ages
Empl[4] # a sublist consisting of the 4th component of Empl
names(Empl) <- letters[1:4]
Empl <- c(Empl, service=8)
unlist(Empl) # converts it to a vector.
# Mixed types will be converted to character,
# giving a character vector.
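What unlist() returns can be inspected with class() and names(), as in this sketch. It rebuilds the list from the session so that it runs on its own; the name flat is illustrative.

```r
Empl <- list(employee = "Anna", spouse = "Fred",
             children = 3, child.ages = c(4, 7, 9))
flat <- unlist(Empl)
print(class(flat))      # "character": the numbers were converted to text
print(names(flat))      # child.ages becomes child.ages1, child.ages2, child.ages3
print(length(flat))     # 6

# When all components are numeric, no conversion to character happens.
print(class(unlist(Empl[3:4])))   # "numeric"
```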