Presentation R basic teaching module
1. Introduction to R
Basic Teaching module
EMBL International PhD Program
13-10-2010
Sander Timmer & Myrto Kostadima
2. Overview
What is R
Quick overview of data types, input/output and plots
Some biological examples
I’m not a particularly good teacher, so please
ask when you’re lost!
3. What is this R thing?
R is a powerful, general purpose language
and software environment for statistical
computing and graphics
Runs on Linux, OS X and for the unlucky few
also on Windows
R is open source and free!
6. Vectors
Many ways of generating a vector with a range of numbers:
x <- 1:10
assign("x", 1:10)
x <- c(1,2,3,4,5,6,7,8,9,10)
x <- seq(1,10, by=1)
x <- seq(length = 10, from=1,by=1)
x
[1] 1 2 3 4 5 6 7 8 9 10
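As a quick sanity check, the sketch below compares the different ways of building the same range. The variable names a, b, d and e are made up for this illustration. Note that 1:10 produces integers while c(1,2,...) produces doubles, so the comparison is done on values rather than with identical().

```r
# Four ways of building the numbers 1 to 10
a <- 1:10
b <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
d <- seq(1, 10, by = 1)
e <- seq(from = 1, by = 1, length.out = 10)

# Compare element by element; all() collapses the results to one TRUE/FALSE
same_values <- all(a == b, b == d, d == e)
print(same_values)

# The storage type differs even though the values are equal
print(typeof(a))  # "integer"
print(typeof(b))  # "double"
```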
7. Vectors
Common way to store multiple values
x <- c(1,2,4,5,10,12,15)
length(x)
mean(x)
summary(x)
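A minimal sketch of what these calls return for the vector above, plus two common ways of picking elements out of a vector. The names first_three and above_mean are only for illustration.

```r
x <- c(1, 2, 4, 5, 10, 12, 15)

n <- length(x)   # number of elements: 7
m <- mean(x)     # arithmetic mean: 49 / 7 = 7
print(summary(x))  # minimum, quartiles, mean and maximum in one call

# Indexing starts at 1 in R
first_three <- x[1:3]          # 1 2 4

# A logical condition inside [] keeps only the matching elements
above_mean <- x[x > mean(x)]   # 10 12 15
print(above_mean)
```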
9. Matrices
Common form of storing 2 dimensional data
Think about having an Excel sheet
m = matrix(1:10,2,5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10
summary(m)
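A short sketch of how to look inside a matrix. It shows that matrix() fills column by column unless byrow = TRUE is given; m_byrow and row_means are names chosen for this example.

```r
# 2 rows, 5 columns, filled column by column (the default)
m <- matrix(1:10, nrow = 2, ncol = 5)

print(dim(m))     # 2 5
print(m[2, 3])    # row 2, column 3: 6
print(m[, 1])     # the whole first column: 1 2

# Mean of every row
row_means <- rowMeans(m)   # 5 6

# Same numbers, but filled row by row
m_byrow <- matrix(1:10, nrow = 2, byrow = TRUE)
print(m_byrow[1, ])        # 1 2 3 4 5
```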
10. Factors
Factors are vectors with a discrete number of levels:
x <- factor(c("Cancer", "Cancer", "Normal", "Normal"))
levels(x)
[1] "Cancer" "Normal"
table(x)
Cancer Normal
     2      2
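Under the hood a factor stores integer codes plus a set of labels. The sketch below makes that visible and shows how to change the reference level; the names status and status2 are made up for this example.

```r
status <- factor(c("Cancer", "Cancer", "Normal", "Normal"))

print(nlevels(status))      # 2
print(as.integer(status))   # internal codes: 1 1 2 2

# Levels are sorted alphabetically by default; relevel() moves one to the front
status2 <- relevel(status, ref = "Normal")
print(levels(status2))      # "Normal" "Cancer"

# Count how many samples fall in each level
counts <- table(status)
print(counts)
```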
11. Lists
A list can contain “anything”
Useful for storing several vectors
list(gene="gene 1", expression=c(5,2,3))
$gene
[1] "gene 1"
$expression
[1] 5 2 3
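A small sketch of how to get things back out of a list. The chromosome element is a hypothetical addition, only there to show that a list can grow after it has been created.

```r
g <- list(gene = "gene 1", expression = c(5, 2, 3))

print(g$gene)               # access by name with $
print(g[["expression"]])    # access by name with [[ ]]
print(mean(g$expression))   # elements behave like ordinary vectors

# Add a new element after creation (hypothetical value)
g$chromosome <- "chr1"
print(names(g))             # "gene" "expression" "chromosome"
```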
12. If-else statements
Essential for any programming language
if condition then do x, else do y
if(p < 0.01){
  print("Significant gene")
}else{
  print("Insignificant gene")
}
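To run the example, p needs a value first. The sketch below uses made-up p-values and also shows ifelse(), which applies the same test to a whole vector at once.

```r
p <- 0.004   # made-up p-value for one gene

if (p < 0.01) {
  label <- "Significant gene"
} else {
  label <- "Insignificant gene"
}
print(label)

# if() looks at a single value; ifelse() handles a whole vector
pvals <- c(0.004, 0.2, 0.0001)   # made-up p-values
labels <- ifelse(pvals < 0.01, "Significant gene", "Insignificant gene")
print(labels)
```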
13. Repetition
You want to apply one function to every
element of a list
for(element in list){ ....do something.... }
For loops are easy but tend to be slow
The apply family is the idiomatic, often faster way
of getting things done in R:
sapply(List, mean)
apply(m, 1, mean) does the same for every row of a matrix m
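A sketch comparing the two styles on a small made-up list. Note that apply() itself expects a matrix or array, so for a list the sketch uses sapply(); apply() is shown on the rows of a matrix.

```r
values <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)   # made-up data

# Version 1: explicit for loop, filling a result vector
means_loop <- numeric(length(values))
for (i in seq_along(values)) {
  means_loop[i] <- mean(values[[i]])
}

# Version 2: sapply() applies mean() to every list element
means_sapply <- sapply(values, mean)
print(means_sapply)   # named vector: a = 2, b = 15, c = 5

# apply() works on a matrix; 1 means "by row", 2 would mean "by column"
m <- matrix(1:10, 2, 5)
row_means <- apply(m, 1, mean)
print(row_means)      # 5 6
```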
14. Data input
R has countless ways of importing data:
CSV
Excel
Flat text file
15. Data input
Simplest, the CSV file (first row as column names, first column as row names):
read.csv("mydata.csv", header=TRUE, row.names=1)
Load a tab-separated file:
read.table("mytable.txt", sep="\t")
Load an Rdata file:
load("mydata.Rdata")
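The files named on this slide are placeholders, so the sketch below first writes a tiny made-up table to temporary files and then reads it back. All object and column names are invented for this example.

```r
# A tiny made-up expression table with gene names as row names
df <- data.frame(sample1 = c(5.1, 2.3),
                 sample2 = c(4.8, 2.9),
                 row.names = c("geneA", "geneB"))

# CSV round trip: the first column of the file holds the row names
csv_file <- tempfile(fileext = ".csv")
write.csv(df, file = csv_file)
back <- read.csv(csv_file, header = TRUE, row.names = 1)
print(back)

# Tab-separated round trip: note the escaped tab "\t"
tab_file <- tempfile(fileext = ".txt")
write.table(df, file = tab_file, sep = "\t", quote = FALSE, col.names = NA)
back_tab <- read.table(tab_file, sep = "\t", header = TRUE, row.names = 1)
print(back_tab)
```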
16. Data input
Also for more specific data sources:
Excel
Database connections
MySQL, e.g. Ensembl
Affy
Affymetrix chip data
HapMap
.........
17. Data output
Simplest, the CSV file:
write.csv(x, file="myx.csv")
Save an Rdata file:
save(x, file="myx.Rdata")
Save the whole R session:
save.image(file="mysession.Rdata")
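A minimal sketch of saving an object and getting it back, using temporary files so nothing is left in your working directory. save() needs the objects to be named; to store the whole workspace, base R provides save.image().

```r
x <- c(1, 2, 4, 5, 10, 12, 15)

# Save one object, remove it, and restore it
rdata_file <- tempfile(fileext = ".Rdata")
save(x, file = rdata_file)
rm(x)
loaded <- load(rdata_file)   # load() returns the names of the restored objects
print(loaded)                # "x"
print(x)

# Save everything in the global workspace
session_file <- tempfile(fileext = ".Rdata")
save.image(file = session_file)
```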
18. Graphics
A quick way to study your data is to plot it
The function "plot" in R can plot almost
anything out of the box (even if the result doesn’t
make sense!)
22. Basic graphics
With R you can plot almost any object
Multidimensional variables like matrices
can be plotted with matplot()
Other often used plot functions are:
boxplot(), hist(), levelplot(), heatmap()
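A sketch of some of these plot functions on simulated data (random numbers, not a real experiment). The plots are written to a temporary PDF so the code also runs without a screen; in an interactive session you can leave out pdf() and dev.off(). The object names are made up.

```r
set.seed(1)
# Simulated "expression" matrix: 10 probes x 4 samples
expr <- matrix(rnorm(40, mean = 8), nrow = 10, ncol = 4,
               dimnames = list(NULL, paste0("sample", 1:4)))

plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)   # open a PDF device; every plot becomes one page

plot(expr[, 1], expr[, 2], xlab = "sample1", ylab = "sample2")  # scatter plot
hist(as.vector(expr), main = "All values")                      # distribution
boxplot(expr, main = "Per sample")                              # one box per column
matplot(expr, type = "l", ylab = "value")                       # one line per column

dev.off()        # close the device so the file is written
print(plot_file)
```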
26. Before the example
Help pages for functions in R can be opened with:
?plot, ?hist, ?vector
Examples for most functions can be run with:
example(plot)
A text search across the help pages can be done with:
??plot
27. Example
An example Affymetrix dataset to play
with
Checking distribution of data
Plotting data
Clustering data
Correlating data
30. Summary
Checking what we got
summary(dil)
mva.pairs(dil)
Or:
boxplot(log(dil.ex))
Or:
hist(dil.ex, xlim=c(0,500), breaks=1000)
31. We need to normalise first
For almost all experiments you have to apply
some sort of normalisation
dil.norm = maffy.normalize(dil, subset=1:nrow(dil))
colnames(dil.norm) = colnames(dil)
mva.pairs(dil.norm)
32. Most similar samples
Applying Euclidean distance to detect the most
similar samples
dil.norm.dist = dist(t(dil.norm))
dil.norm.dist.hc = hclust(dil.norm.dist)
plot(dil.norm.dist.hc)
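The same three steps can be tried without the Affymetrix data. The sketch below uses a simulated matrix (random numbers, not the dil dataset) in which samples A1/A2 and B1/B2 are built to resemble each other; all names are invented for this example.

```r
set.seed(42)
base_a <- rnorm(100)   # shared signal for the A samples
base_b <- rnorm(100)   # shared signal for the B samples

# 100 probes x 4 samples; each sample is its group signal plus a little noise
expr <- cbind(A1 = base_a + rnorm(100, sd = 0.1),
              A2 = base_a + rnorm(100, sd = 0.1),
              B1 = base_b + rnorm(100, sd = 0.1),
              B2 = base_b + rnorm(100, sd = 0.1))

# dist() compares rows, so transpose to compare samples (columns)
d <- dist(t(expr))      # Euclidean distance by default
hc <- hclust(d)         # hierarchical clustering
# plot(hc) would draw the dendrogram

# Cut the tree into two groups and see which samples end up together
groups <- cutree(hc, k = 2)
print(groups)
```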
Do the same for the non-normalised dataset
33. Checking expression
Heatmap representation of expression levels
for different probes
heatmap(dil.ex.norm[1:50,])
You could, for example, apply a t-test to rank
the probes and plot only the most significant ones
35. Checking expression
You could, for example, apply a t-test to rank
the probes and plot only the most significant ones
library(genefilter)
f = factor(c(1,1,2,2))
dil.exp.norm.t = rowttests(dil.exp.norm, fac=f)
# order by p-value so the most significant probes come first
heatmap(dil.exp.norm[order(dil.exp.norm.t$p.value)[1:10],])
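If genefilter is not installed, the ranking idea can be sketched with base R only. This is a swapped-in approach, not the rowttests() call from the slide: it runs t.test() on every row of a simulated matrix (random numbers with a few artificially shifted probes). All names are made up, and with only two samples per group the p-values are rough at best.

```r
set.seed(7)
# Simulated matrix: 200 probes x 4 samples
expr <- matrix(rnorm(200 * 4, mean = 8), nrow = 200,
               dimnames = list(paste0("probe", 1:200), paste0("s", 1:4)))
f <- factor(c(1, 1, 2, 2))   # two samples per group

# Artificially shift the first 5 probes in group 2
expr[1:5, f == 2] <- expr[1:5, f == 2] + 10

# One two-sample t-test per row; keep only the p-value
pvals <- apply(expr, 1, function(row) {
  t.test(row[f == 1], row[f == 2], var.equal = TRUE)$p.value
})

# Indices of the 10 probes with the smallest p-values
top10 <- order(pvals)[1:10]
print(rownames(expr)[top10])

# Heatmap of those probes, written to a temporary PDF
heat_file <- tempfile(fileext = ".pdf")
pdf(heat_file)
heatmap(expr[top10, ])
dev.off()
```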
36. Want to know more?
Using R will benefit all PhD students in this room
Learning by doing
Loads of basic examples at:
http://addictedtor.free.fr/graphiques/
http://www.mayin.org/ajayshah/KB/R/index.html
http://www.r-project.org/