SlideShare a Scribd company logo
1 of 13
Download to read offline
R Code for Data
Manipulation
Avjinder Singh Kaler
Table Content
1. Creating, Recoding and Renaming variables
2. Operators
3. Built-in Functions
4. Sorting, Merging and Aggregating Data
5. Reshaping Data
6. Sub-setting Data
Introduction
Data manipulation in R includes creating new
(including recoding and renaming existingvariables
variables), sorting and merging datasets, aggregating
data, reshaping data, and subsetting datasets
(including selecting observations that meet criteria,
randomly sampling observeration, and dropping or
keeping variables).
Suppose you have a dataset (mydata) with three variables
x1, x2 and x3.
Old
Variables
New
Variables
Sorting and Merge
Aggregating
Recode and Reshaping
Transform
Subsetting
Data
Frame
1. Creating, Recoding and Renaming variables
1. Create new variable:
For example, adding and getting average of two variables x1 and x2
to create new variable
mydata$sum <- mydata$x1 + mydata$x2
mydata$mean <- (mydata$x1 + mydata$x2)/2
Second option is that one can attach the data with this code
attach(mydata)
mydata$sum <- x1 + x2
mydata$mean <- (x1 + x2)/2
mydata <- transform( mydata,sum = x1 + x2,mean = (x1 + x2)/2 )
2. Recoding within variable
mydata$agecat <- ifelse(mydata$age > 70, c("older"), c("younger"))
Here, ifelse function can recode the variable. Meaning of this function in this code is that if one
is interested in age variable from mydata to convert into older and young.
mydata$age > 70------ condition if variable have more than 70 , then -----c("older"), --------
otherwise -------- c("younger"))
Second option of recoding is:
attach(mydata)
mydata$agecat[age > 75] <- "Elder"
mydata$agecat[age > 45 & age <= 75] <- "Middle Aged"
mydata$agecat[age <= 45] <- "Young"
Note: ifelse function can be used to change recode variable.
ifelse--- (condtion, “then”, “otherwise”)
3. Renaming the variables
Two ways to rename variables, interactively and programmatically
# rename interactively
fix(mydata) # results are saved on close
# rename programmatically
library(reshape)
mydata <- rename(mydata, c(oldname="newname"))
2. Operators
Operator can be arithmetic (+, -, /,) and logical (less or more than).
Arithmetic Operators
Operator Description
+ ----- addition
- ----- subtraction
* ------ multiplication
/ ------- division
^ or ** -----exponentiation
x %% y ------ modulus (x mod y) 5%%2 is 1
x %/% y -------- integer division 5%/%2 is 2Logical Operators
Logical Operators
Operator Description
< ---- less than
<= ----- less than or equal to
> ------ greater than
>= ------ greater than or equal to
== ------ exactly equal to
!= --- not equal to
!x -------Not x
x | y ------ x OR y
x & y --------- x AND y
isTRUE(x) ------- test if X is TRUE
# An example
x <- c(1:10)
x[(x>8) | (x<5)]
# yields 1 2 3 4 9 10
# How it works
x <- c(1:10)
x
1 2 3 4 5 6 7 8 9 10
x > 8
F F F F F F F F T T
x < 5
T T T T F F F F F F
x > 8 | x < 5
T T T T F F F F T T
3. Built-in Functions
Almost everything in R is done through functions. Here I'm only refering to numeric and
character functions that are commonly used in creating or recoding variables.
1. Numeric Functions
Function Description
abs(x) absolute value
sqrt(x) square root
ceiling(x) ceiling(3.475) is 4
floor(x) floor(3.475) is 3
trunc(x) trunc(5.99) is 5
round(x, digits=n) round(3.475, digits=2) is 3.48
signif(x, digits=n) signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x) also acos(x), cosh(x), acosh(x), etc.
log(x) natural logarithm
log10(x) common logarithm
exp(x) e^x
2. Character Functions
Function Description
substr(x, start=n1, stop=n2) Extract or replace substrings in a character vector.
x <- "abcdef"
substr(x, 2, 4) is "bcd"
substr(x, 2, 4) <- "22222" is "a222ef"
grep(pattern, x ,
ignore.case=FALSE, fixed=FALSE)
Search for pattern in x. If fixed =FALSE then pattern is
a regular expression. If fixed=TRUE then pattern is a
text string. Returns matching indices.
grep("A", c("b","A","c"), fixed=TRUE) returns 2
sub(pattern, replacement,x,
ignore.case =FALSE,
fixed=FALSE)
Find pattern in x and replace with replacement text. If
fixed=FALSE then pattern is a regular expression.
If fixed = T then pattern is a text string.
sub("s",".","Hello There") returns "Hello.There"
strsplit(x, split) Split the elements of character vector x at split.
strsplit("abc", "") returns 3 element vector "a","b","c"
paste(..., sep="") Concatenate strings after using sep string to seperate
them.
paste("x",1:3,sep="") returns c("x1","x2" "x3")
paste("x",1:3,sep="M") returns c("xM1","xM2" "xM3")
paste("Today is", date())
toupper(x) Uppercase
tolower(x) Lowercase
3. Statistical Probability Functions
Function Description
dnorm(x) normal density function (by default m=0 sd=1)
# plot standard normal curve
x <- pretty(c(-3,3), 30)
y <- dnorm(x)
plot(x, y, type='l', xlab="Normal Deviate", ylab="Density", yaxs="i")
pnorm(q) cumulative normal probability for q
(area under the normal curve to the left of q)
pnorm(1.96) is 0.975
qnorm(p) normal quantile.
value at the p percentile of normal distribution
qnorm(.9) is 1.28 # 90th percentile
rnorm(n, m=0,sd=1) n random normal deviates with mean m
and standard deviation sd.
#50 random normal variates with mean=50, sd=10
x <- rnorm(50, m=50, sd=10)
dbinom(x, size, prob)
pbinom(q, size, prob)
qbinom(p, size, prob)
rbinom(n, size, prob)
binomial distribution where size is the sample size
and prob is the probability of a heads (pi)
# prob of 0 to 5 heads of fair coin out of 10 flips
dbinom(0:5, 10, .5)
# prob of 5 or less heads of fair coin out of 10 flips
pbinom(5, 10, .5)
dpois(x, lamda)
ppois(q, lamda)
qpois(p, lamda)
rpois(n, lamda)
poisson distribution with m=std=lamda
#probability of 0,1, or 2 events with lamda=4
dpois(0:2, 4)
# probability of at least 3 events with lamda=4
1- ppois(2,4)
dunif(x, min=0, max=1)
punif(q, min=0, max=1)
qunif(p, min=0, max=1)
runif(n, min=0, max=1)
uniform distribution, follows the same pattern
as the normal distribution above.
#10 uniform random variates
x <- runif(10)
4. Other Statistical Functions
Function Description
mean(x, trim=0,
na.rm=FALSE)
mean of object x
# trimmed mean, removing any missing values and
# 5 percent of highest and lowest scores
mx <- mean(x,trim=.05,na.rm=TRUE)
sd(x) standard deviation of object(x). also look at var(x) for variance and
mad(x) for median absolute deviation.
median(x) median
quantile(x, probs) quantiles where x is the numeric vector whose quantiles are desired
and probs is a numeric vector with probabilities in [0,1].
# 30th and 84th percentiles of x
y <- quantile(x, c(.3,.84))
range(x) range
sum(x) sum
diff(x, lag=1) lagged differences, with lag indicating which lag to use
min(x) minimum
max(x) maximum
scale(x, center=TRUE,
scale=TRUE)
column center or standardize a matrix.
4. Sorting, Merging and Aggregating Data
1. Sorting
# sorting examples using the mydata dataset
attach(mtdata)
# sort by x1
newdata <- mydata[order(x1),]
# sort by x1 and x2
newdata <- mydata[order(x1, x2),]
#sort by x1 (ascending) and x2(decending)
newdata <- mydata[order(x1, -x2),]
2. Merging
Adding columns
# merge two data frames by ID
total <- merge(data frameA,data frameB,by="ID")
# merge two data frames by ID and Country
total <- merge(data frameA,data frameB,by=c("ID","Country"))
Adding Row
total <- rbind(data frameA, data frameB
Aggregating Data
# aggregate data frame mtcars by cyl and vs, returning means
# for numeric variables
attach(mtcars)
aggdata <-aggregate(mtcars, by=list(cyl,vs), FUN=mean, na.rm=TRUE)
print(aggdata)
detach(mtcars)
5. Reshaping Data
1. Transpose
mydata
t(mydata)
2. Reshape package
mydata
id time x1 x2
1 1 5 6
1 2 3 5
2 1 6 1
2 2 2 4
library(reshape)
newdata <- melt(mydata, id=c("id","time"))
newdata
id time variablevalue
1 1 x1 5
1 2 x1 3
2 1 x1 6
2 2 x1 2
1 1 x2 6
1 2 x2 5
2 1 x2 1
2 2 x2 4
Can use other function with this
# cast the melted data
# cast(data, formula, function)
subjmeans <- cast(mdata, id~variable, mean)
timemeans <- cast(mdata, time~variable, mean)
6. Subsetting Data
1. Selecting (Keeping) Variables
# select variables v1, v2, v3
myvars <- c("v1", "v2", "v3")
newdata <- mydata[myvars]
# another method
myvars <- paste("v", 1:3, sep="")
newdata <- mydata[myvars]
# select 1st and 5th thru 10th variables
newdata <- mydata[c(1,5:10)]
2. Excluding (DROPPING) Variables
# exclude variables v1, v2, v3
myvars <- names(mydata) %in% c("v1", "v2", "v3")
newdata <- mydata[!myvars]
# exclude 3rd and 5th variable
newdata <- mydata[c(-3,-5)]
# delete variables v3 and v5
mydata$v3 <- mydata$v5 <- NULL
3. Selecting Observations
# first 5 observations
newdata <- mydata[1:5,]
# based on variable values
newdata <- mydata[ which(mydata$gender=='F'
& mydata$age > 65), ]
# or
attach(newdata)
newdata <- mydata[ which(gender=='F' & age > 65),]
detach(newdata)
4. Selection using the Subset Function
# using subset function
newdata <- subset(mydata, age >= 20 | age < 10,
select=c(ID, Weight))
# using subset function (part 2)
newdata <- subset(mydata, sex=="m" & age > 25,
select=weight:income)

More Related Content

What's hot

Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using RVictoria López
 
8. R Graphics with R
8. R Graphics with R8. R Graphics with R
8. R Graphics with RFAO
 
R programming presentation
R programming presentationR programming presentation
R programming presentationAkshat Sharma
 
R basics
R basicsR basics
R basicsFAO
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programmingAlberto Labarga
 
Data Types and Structures in R
Data Types and Structures in RData Types and Structures in R
Data Types and Structures in RRupak Roy
 
R programming slides
R  programming slidesR  programming slides
R programming slidesPankaj Saini
 
SAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsSAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsAvjinder (Avi) Kaler
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to RAngshuman Saha
 
Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlabBilawalBaloch1
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2izahn
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data framekrishna singh
 

What's hot (20)

Set theory: by Pratima Nayak
Set theory: by Pratima NayakSet theory: by Pratima Nayak
Set theory: by Pratima Nayak
 
Introduction to data analysis using R
Introduction to data analysis using RIntroduction to data analysis using R
Introduction to data analysis using R
 
8. R Graphics with R
8. R Graphics with R8. R Graphics with R
8. R Graphics with R
 
R programming presentation
R programming presentationR programming presentation
R programming presentation
 
Data Management in R
Data Management in RData Management in R
Data Management in R
 
An Intoduction to R
An Intoduction to RAn Intoduction to R
An Intoduction to R
 
Machine Learning in R
Machine Learning in RMachine Learning in R
Machine Learning in R
 
R basics
R basicsR basics
R basics
 
Introduction to R programming
Introduction to R programmingIntroduction to R programming
Introduction to R programming
 
Data Types and Structures in R
Data Types and Structures in RData Types and Structures in R
Data Types and Structures in R
 
3 Data Structure in R
3 Data Structure in R3 Data Structure in R
3 Data Structure in R
 
R programming slides
R  programming slidesR  programming slides
R programming slides
 
SAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsSAS and R Code for Basic Statistics
SAS and R Code for Basic Statistics
 
A quick introduction to R
A quick introduction to RA quick introduction to R
A quick introduction to R
 
Introduction to matlab
Introduction to matlabIntroduction to matlab
Introduction to matlab
 
Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2Introduction to R Graphics with ggplot2
Introduction to R Graphics with ggplot2
 
Machine Learning with R
Machine Learning with RMachine Learning with R
Machine Learning with R
 
Data frame operations
Data frame operationsData frame operations
Data frame operations
 
3. R- list and data frame
3. R- list and data frame3. R- list and data frame
3. R- list and data frame
 
R Language Introduction
R Language IntroductionR Language Introduction
R Language Introduction
 

Viewers also liked

Nutrient availability response to sulfur amendment in histosols having variab...
Nutrient availability response to sulfur amendment in histosols having variab...Nutrient availability response to sulfur amendment in histosols having variab...
Nutrient availability response to sulfur amendment in histosols having variab...Avjinder (Avi) Kaler
 
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...Avjinder (Avi) Kaler
 
Genomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using RGenomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using RAvjinder (Avi) Kaler
 
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...Avjinder (Avi) Kaler
 
SAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsSAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsAvjinder (Avi) Kaler
 
Genome-wide association mapping of canopy wilting in diverse soybean genotypes
Genome-wide association mapping of canopy wilting in diverse soybean genotypesGenome-wide association mapping of canopy wilting in diverse soybean genotypes
Genome-wide association mapping of canopy wilting in diverse soybean genotypesAvjinder (Avi) Kaler
 
R code descriptive statistics of phenotypic data by Avjinder Kaler
R code descriptive statistics of phenotypic data by Avjinder KalerR code descriptive statistics of phenotypic data by Avjinder Kaler
R code descriptive statistics of phenotypic data by Avjinder KalerAvjinder (Avi) Kaler
 
Seed rate calculation for experiment
Seed rate calculation for experimentSeed rate calculation for experiment
Seed rate calculation for experimentAvjinder (Avi) Kaler
 
Basic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerBasic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerAvjinder (Avi) Kaler
 
Tutorial for Circular and Rectangular Manhattan plots
Tutorial for Circular and Rectangular Manhattan plotsTutorial for Circular and Rectangular Manhattan plots
Tutorial for Circular and Rectangular Manhattan plotsAvjinder (Avi) Kaler
 
Tutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RTutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RAvjinder (Avi) Kaler
 

Viewers also liked (13)

Nutrient availability response to sulfur amendment in histosols having variab...
Nutrient availability response to sulfur amendment in histosols having variab...Nutrient availability response to sulfur amendment in histosols having variab...
Nutrient availability response to sulfur amendment in histosols having variab...
 
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
Sugarcane yield and plant nutrient response to sulfur amended everglades hist...
 
Genomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using RGenomic Selection with Bayesian Generalized Linear Regression model using R
Genomic Selection with Bayesian Generalized Linear Regression model using R
 
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
Genome-Wide Association Mapping of Carbon Isotope and Oxygen Isotope Ratios i...
 
SAS and R Code for Basic Statistics
SAS and R Code for Basic StatisticsSAS and R Code for Basic Statistics
SAS and R Code for Basic Statistics
 
Genome-wide association mapping of canopy wilting in diverse soybean genotypes
Genome-wide association mapping of canopy wilting in diverse soybean genotypesGenome-wide association mapping of canopy wilting in diverse soybean genotypes
Genome-wide association mapping of canopy wilting in diverse soybean genotypes
 
R code for data manipulation
R code for data manipulationR code for data manipulation
R code for data manipulation
 
R code descriptive statistics of phenotypic data by Avjinder Kaler
R code descriptive statistics of phenotypic data by Avjinder KalerR code descriptive statistics of phenotypic data by Avjinder Kaler
R code descriptive statistics of phenotypic data by Avjinder Kaler
 
Seed rate calculation for experiment
Seed rate calculation for experimentSeed rate calculation for experiment
Seed rate calculation for experiment
 
R Code for EM Algorithm
R Code for EM AlgorithmR Code for EM Algorithm
R Code for EM Algorithm
 
Basic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder KalerBasic Tutorial of Association Mapping by Avjinder Kaler
Basic Tutorial of Association Mapping by Avjinder Kaler
 
Tutorial for Circular and Rectangular Manhattan plots
Tutorial for Circular and Rectangular Manhattan plotsTutorial for Circular and Rectangular Manhattan plots
Tutorial for Circular and Rectangular Manhattan plots
 
Tutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using RTutorial for Estimating Broad and Narrow Sense Heritability using R
Tutorial for Estimating Broad and Narrow Sense Heritability using R
 

Similar to R code for data manipulation

Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavVyacheslav Arbuzov
 
Rcommands-for those who interested in R.
Rcommands-for those who interested in R.Rcommands-for those who interested in R.
Rcommands-for those who interested in R.Dr. Volkan OBAN
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data ManipulationChu An
 
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Dr. Volkan OBAN
 
R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examplesDennis
 
R is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdfR is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdfannikasarees
 
Idea for ineractive programming language
Idea for ineractive programming languageIdea for ineractive programming language
Idea for ineractive programming languageLincoln Hannah
 
R Cheat Sheet for Data Analysts and Statisticians.pdf
R Cheat Sheet for Data Analysts and Statisticians.pdfR Cheat Sheet for Data Analysts and Statisticians.pdf
R Cheat Sheet for Data Analysts and Statisticians.pdfTimothy McBush Hiele
 
R Workshop for Beginners
R Workshop for BeginnersR Workshop for Beginners
R Workshop for BeginnersMetamarkets
 
A brief introduction to apply functions
A brief introduction to apply functionsA brief introduction to apply functions
A brief introduction to apply functionsNIKET CHAURASIA
 
The secrets of inverse brogramming
The secrets of inverse brogrammingThe secrets of inverse brogramming
The secrets of inverse brogrammingRichie Cotton
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxcarliotwaycave
 

Similar to R code for data manipulation (20)

Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov VyacheslavSeminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
Seminar PSU 09.04.2013 - 10.04.2013 MiFIT, Arbuzov Vyacheslav
 
Rcommands-for those who interested in R.
Rcommands-for those who interested in R.Rcommands-for those who interested in R.
Rcommands-for those who interested in R.
 
Basic R Data Manipulation
Basic R Data ManipulationBasic R Data Manipulation
Basic R Data Manipulation
 
Seminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mmeSeminar PSU 10.10.2014 mme
Seminar PSU 10.10.2014 mme
 
bobok
bobokbobok
bobok
 
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
Optimization and Mathematical Programming in R and ROI - R Optimization Infra...
 
R programming intro with examples
R programming intro with examplesR programming intro with examples
R programming intro with examples
 
proj1v2
proj1v2proj1v2
proj1v2
 
R is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdfR is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdf
 
R programming
R programmingR programming
R programming
 
Idea for ineractive programming language
Idea for ineractive programming languageIdea for ineractive programming language
Idea for ineractive programming language
 
R Cheat Sheet for Data Analysts and Statisticians.pdf
R Cheat Sheet for Data Analysts and Statisticians.pdfR Cheat Sheet for Data Analysts and Statisticians.pdf
R Cheat Sheet for Data Analysts and Statisticians.pdf
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
R Workshop for Beginners
R Workshop for BeginnersR Workshop for Beginners
R Workshop for Beginners
 
Spark workshop
Spark workshopSpark workshop
Spark workshop
 
Learn Matlab
Learn MatlabLearn Matlab
Learn Matlab
 
A brief introduction to apply functions
A brief introduction to apply functionsA brief introduction to apply functions
A brief introduction to apply functions
 
R Get Started II
R Get Started IIR Get Started II
R Get Started II
 
The secrets of inverse brogramming
The secrets of inverse brogrammingThe secrets of inverse brogramming
The secrets of inverse brogramming
 
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docxINFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
INFORMATIVE ESSAYThe purpose of the Informative Essay assignme.docx
 

More from Avjinder (Avi) Kaler

Unleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
Unleashing Real-World Simulations: A Python Tutorial by Avjinder KalerUnleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
Unleashing Real-World Simulations: A Python Tutorial by Avjinder KalerAvjinder (Avi) Kaler
 
Tutorial for Deep Learning Project with Keras
Tutorial for Deep Learning Project  with KerasTutorial for Deep Learning Project  with Keras
Tutorial for Deep Learning Project with KerasAvjinder (Avi) Kaler
 
Tutorial for DBSCAN Clustering in Machine Learning
Tutorial for DBSCAN Clustering in Machine LearningTutorial for DBSCAN Clustering in Machine Learning
Tutorial for DBSCAN Clustering in Machine LearningAvjinder (Avi) Kaler
 
Python Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdfPython Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdfAvjinder (Avi) Kaler
 
Sql tutorial for select, where, order by, null, insert functions
Sql tutorial for select, where, order by, null, insert functionsSql tutorial for select, where, order by, null, insert functions
Sql tutorial for select, where, order by, null, insert functionsAvjinder (Avi) Kaler
 
Association mapping identifies loci for canopy coverage in diverse soybean ge...
Association mapping identifies loci for canopy coverage in diverse soybean ge...Association mapping identifies loci for canopy coverage in diverse soybean ge...
Association mapping identifies loci for canopy coverage in diverse soybean ge...Avjinder (Avi) Kaler
 
Normal and standard normal distribution
Normal and standard normal distributionNormal and standard normal distribution
Normal and standard normal distributionAvjinder (Avi) Kaler
 

More from Avjinder (Avi) Kaler (19)

Unleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
Unleashing Real-World Simulations: A Python Tutorial by Avjinder KalerUnleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
Unleashing Real-World Simulations: A Python Tutorial by Avjinder Kaler
 
Tutorial for Deep Learning Project with Keras
Tutorial for Deep Learning Project  with KerasTutorial for Deep Learning Project  with Keras
Tutorial for Deep Learning Project with Keras
 
Tutorial for DBSCAN Clustering in Machine Learning
Tutorial for DBSCAN Clustering in Machine LearningTutorial for DBSCAN Clustering in Machine Learning
Tutorial for DBSCAN Clustering in Machine Learning
 
Python Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdfPython Code for Classification Supervised Machine Learning.pdf
Python Code for Classification Supervised Machine Learning.pdf
 
Sql tutorial for select, where, order by, null, insert functions
Sql tutorial for select, where, order by, null, insert functionsSql tutorial for select, where, order by, null, insert functions
Sql tutorial for select, where, order by, null, insert functions
 
Kaler et al 2018 euphytica
Kaler et al 2018 euphyticaKaler et al 2018 euphytica
Kaler et al 2018 euphytica
 
Association mapping identifies loci for canopy coverage in diverse soybean ge...
Association mapping identifies loci for canopy coverage in diverse soybean ge...Association mapping identifies loci for canopy coverage in diverse soybean ge...
Association mapping identifies loci for canopy coverage in diverse soybean ge...
 
Genome wide association mapping
Genome wide association mappingGenome wide association mapping
Genome wide association mapping
 
Population genetics
Population geneticsPopulation genetics
Population genetics
 
Quantitative genetics
Quantitative geneticsQuantitative genetics
Quantitative genetics
 
Abiotic stresses in plant
Abiotic stresses in plantAbiotic stresses in plant
Abiotic stresses in plant
 
Multiple linear regression
Multiple linear regressionMultiple linear regression
Multiple linear regression
 
Correlation in Statistics
Correlation in StatisticsCorrelation in Statistics
Correlation in Statistics
 
Simple linear regression
Simple linear regressionSimple linear regression
Simple linear regression
 
Analysis of Variance (ANOVA)
Analysis of Variance (ANOVA)Analysis of Variance (ANOVA)
Analysis of Variance (ANOVA)
 
Population and sample mean
Population and sample meanPopulation and sample mean
Population and sample mean
 
Descriptive statistics and graphs
Descriptive statistics and graphsDescriptive statistics and graphs
Descriptive statistics and graphs
 
Hypothesis and Test
Hypothesis and TestHypothesis and Test
Hypothesis and Test
 
Normal and standard normal distribution
Normal and standard normal distributionNormal and standard normal distribution
Normal and standard normal distribution
 

Recently uploaded

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...Florian Roscheck
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 

Recently uploaded (20)

办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...From idea to production in a day – Leveraging Azure ML and Streamlit to build...
From idea to production in a day – Leveraging Azure ML and Streamlit to build...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 

R code for data manipulation

  • 1. R Code for Data Manipulation Avjinder Singh Kaler
  • 2. Table Content 1. Creating, Recoding and Renaming variables 2. Operators 3. Built-in Functions 4. Sorting, Merging and Aggregating Data 5. Reshaping Data 6. Sub-setting Data
  • 3. Introduction Data manipulation in R includes creating new (including recoding and renaming existingvariables variables), sorting and merging datasets, aggregating data, reshaping data, and subsetting datasets (including selecting observations that meet criteria, randomly sampling observeration, and dropping or keeping variables). Suppose you have a dataset (mydata) with three variables x1, x2 and x3. Old Variables New Variables Sorting and Merge Aggregating Recode and Reshaping Transform Subsetting Data Frame
  • 4. 1. Creating, Recoding and Renaming variables 1. Create new variable: For example, adding and getting average of two variables x1 and x2 to create new variable mydata$sum <- mydata$x1 + mydata$x2 mydata$mean <- (mydata$x1 + mydata$x2)/2 Second option is that one can attach the data with this code attach(mydata) mydata$sum <- x1 + x2 mydata$mean <- (x1 + x2)/2 mydata <- transform( mydata,sum = x1 + x2,mean = (x1 + x2)/2 ) 2. Recoding within variable mydata$agecat <- ifelse(mydata$age > 70, c("older"), c("younger")) Here, ifelse function can recode the variable. Meaning of this function in this code is that if one is interested in age variable from mydata to convert into older and young. mydata$age > 70------ condition if variable have more than 70 , then -----c("older"), -------- otherwise -------- c("younger")) Second option of recoding is: attach(mydata) mydata$agecat[age > 75] <- "Elder" mydata$agecat[age > 45 & age <= 75] <- "Middle Aged" mydata$agecat[age <= 45] <- "Young" Note: ifelse function can be used to change recode variable. ifelse--- (condtion, “then”, “otherwise”)
  • 5. 3. Renaming the variables Two ways to rename variables, interactively and programmatically # rename interactively fix(mydata) # results are saved on close # rename programmatically library(reshape) mydata <- rename(mydata, c(oldname="newname"))
  • 6. 2. Operators Operator can be arithmetic (+, -, /,) and logical (less or more than). Arithmetic Operators Operator Description + ----- addition - ----- subtraction * ------ multiplication / ------- division ^ or ** -----exponentiation x %% y ------ modulus (x mod y) 5%%2 is 1 x %/% y -------- integer division 5%/%2 is 2Logical Operators Logical Operators Operator Description < ---- less than <= ----- less than or equal to > ------ greater than >= ------ greater than or equal to == ------ exactly equal to != --- not equal to !x -------Not x x | y ------ x OR y x & y --------- x AND y isTRUE(x) ------- test if X is TRUE # An example x <- c(1:10) x[(x>8) | (x<5)] # yields 1 2 3 4 9 10 # How it works x <- c(1:10) x 1 2 3 4 5 6 7 8 9 10 x > 8 F F F F F F F F T T x < 5 T T T T F F F F F F x > 8 | x < 5 T T T T F F F F T T
  • 7. 3. Built-in Functions Almost everything in R is done through functions. Here I'm only refering to numeric and character functions that are commonly used in creating or recoding variables. 1. Numeric Functions Function Description abs(x) absolute value sqrt(x) square root ceiling(x) ceiling(3.475) is 4 floor(x) floor(3.475) is 3 trunc(x) trunc(5.99) is 5 round(x, digits=n) round(3.475, digits=2) is 3.48 signif(x, digits=n) signif(3.475, digits=2) is 3.5 cos(x), sin(x), tan(x) also acos(x), cosh(x), acosh(x), etc. log(x) natural logarithm log10(x) common logarithm exp(x) e^x 2. Character Functions Function Description substr(x, start=n1, stop=n2) Extract or replace substrings in a character vector. x <- "abcdef" substr(x, 2, 4) is "bcd" substr(x, 2, 4) <- "22222" is "a222ef" grep(pattern, x , ignore.case=FALSE, fixed=FALSE) Search for pattern in x. If fixed =FALSE then pattern is a regular expression. If fixed=TRUE then pattern is a text string. Returns matching indices. grep("A", c("b","A","c"), fixed=TRUE) returns 2 sub(pattern, replacement,x, ignore.case =FALSE, fixed=FALSE) Find pattern in x and replace with replacement text. If fixed=FALSE then pattern is a regular expression. If fixed = T then pattern is a text string. sub("s",".","Hello There") returns "Hello.There" strsplit(x, split) Split the elements of character vector x at split. strsplit("abc", "") returns 3 element vector "a","b","c" paste(..., sep="") Concatenate strings after using sep string to seperate
  • 8. them. paste("x",1:3,sep="") returns c("x1","x2" "x3") paste("x",1:3,sep="M") returns c("xM1","xM2" "xM3") paste("Today is", date()) toupper(x) Uppercase tolower(x) Lowercase 3. Statistical Probability Functions Function Description dnorm(x) normal density function (by default m=0 sd=1) # plot standard normal curve x <- pretty(c(-3,3), 30) y <- dnorm(x) plot(x, y, type='l', xlab="Normal Deviate", ylab="Density", yaxs="i") pnorm(q) cumulative normal probability for q (area under the normal curve to the left of q) pnorm(1.96) is 0.975 qnorm(p) normal quantile. value at the p percentile of normal distribution qnorm(.9) is 1.28 # 90th percentile rnorm(n, m=0,sd=1) n random normal deviates with mean m and standard deviation sd. #50 random normal variates with mean=50, sd=10 x <- rnorm(50, m=50, sd=10) dbinom(x, size, prob) pbinom(q, size, prob) qbinom(p, size, prob) rbinom(n, size, prob) binomial distribution where size is the sample size and prob is the probability of a heads (pi) # prob of 0 to 5 heads of fair coin out of 10 flips dbinom(0:5, 10, .5) # prob of 5 or less heads of fair coin out of 10 flips pbinom(5, 10, .5) dpois(x, lamda) ppois(q, lamda) qpois(p, lamda) rpois(n, lamda) poisson distribution with m=std=lamda #probability of 0,1, or 2 events with lamda=4 dpois(0:2, 4) # probability of at least 3 events with lamda=4 1- ppois(2,4) dunif(x, min=0, max=1) punif(q, min=0, max=1) qunif(p, min=0, max=1) runif(n, min=0, max=1) uniform distribution, follows the same pattern as the normal distribution above. #10 uniform random variates x <- runif(10)
  • 9. 4. Other Statistical Functions Function Description mean(x, trim=0, na.rm=FALSE) mean of object x # trimmed mean, removing any missing values and # 5 percent of highest and lowest scores mx <- mean(x,trim=.05,na.rm=TRUE) sd(x) standard deviation of object(x). also look at var(x) for variance and mad(x) for median absolute deviation. median(x) median quantile(x, probs) quantiles where x is the numeric vector whose quantiles are desired and probs is a numeric vector with probabilities in [0,1]. # 30th and 84th percentiles of x y <- quantile(x, c(.3,.84)) range(x) range sum(x) sum diff(x, lag=1) lagged differences, with lag indicating which lag to use min(x) minimum max(x) maximum scale(x, center=TRUE, scale=TRUE) column center or standardize a matrix.
  • 10. 4. Sorting, Merging and Aggregating Data 1. Sorting # sorting examples using the mydata dataset attach(mtdata) # sort by x1 newdata <- mydata[order(x1),] # sort by x1 and x2 newdata <- mydata[order(x1, x2),] #sort by x1 (ascending) and x2(decending) newdata <- mydata[order(x1, -x2),] 2. Merging Adding columns # merge two data frames by ID total <- merge(data frameA,data frameB,by="ID") # merge two data frames by ID and Country total <- merge(data frameA,data frameB,by=c("ID","Country")) Adding Row total <- rbind(data frameA, data frameB Aggregating Data # aggregate data frame mtcars by cyl and vs, returning means # for numeric variables attach(mtcars) aggdata <-aggregate(mtcars, by=list(cyl,vs), FUN=mean, na.rm=TRUE) print(aggdata) detach(mtcars)
  • 11. 5. Reshaping Data 1. Transpose mydata t(mydata) 2. Reshape package mydata id time x1 x2 1 1 5 6 1 2 3 5 2 1 6 1 2 2 2 4 library(reshape) newdata <- melt(mydata, id=c("id","time")) newdata id time variablevalue 1 1 x1 5 1 2 x1 3 2 1 x1 6 2 2 x1 2 1 1 x2 6 1 2 x2 5 2 1 x2 1 2 2 x2 4 Can use other function with this # cast the melted data # cast(data, formula, function) subjmeans <- cast(mdata, id~variable, mean) timemeans <- cast(mdata, time~variable, mean)
  • 12. 6. Subsetting Data 1. Selecting (Keeping) Variables # select variables v1, v2, v3 myvars <- c("v1", "v2", "v3") newdata <- mydata[myvars] # another method myvars <- paste("v", 1:3, sep="") newdata <- mydata[myvars] # select 1st and 5th thru 10th variables newdata <- mydata[c(1,5:10)] 2. Excluding (DROPPING) Variables # exclude variables v1, v2, v3 myvars <- names(mydata) %in% c("v1", "v2", "v3") newdata <- mydata[!myvars] # exclude 3rd and 5th variable newdata <- mydata[c(-3,-5)] # delete variables v3 and v5 mydata$v3 <- mydata$v5 <- NULL 3. Selecting Observations # first 5 observations newdata <- mydata[1:5,] # based on variable values newdata <- mydata[ which(mydata$gender=='F' & mydata$age > 65), ] # or attach(newdata) newdata <- mydata[ which(gender=='F' & age > 65),] detach(newdata)
  • 13. 4. Selection using the Subset Function # using subset function newdata <- subset(mydata, age >= 20 | age < 10, select=c(ID, Weight)) # using subset function (part 2) newdata <- subset(mydata, sex=="m" & age > 25, select=weight:income)