Introduction to R
     Basic Teaching module
 EMBL International PhD Program
           13-10-2010
Sander Timmer & Myrto Kostadima
Overview

What is R

Quick overview of data types, input/output and
plots

Some biological examples

I’m not a particularly good teacher, so please
ask when you’re lost!
What is this R thing?

R is a powerful, general purpose language
and software environment for statistical
computing and graphics

Runs on Linux, OS X and for the unlucky few
also on Windows

R is open source and free!
Start your R interface
Variables


x <- 2

x <- x^2

x

[1] 4
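A minimal sketch of the basic data types behind the variables above (the names `n`, `s` and `flag` are illustrative, not from the slides); `class()` and `typeof()` report what R stores:

```r
x <- 2                # a plain number is stored as a double ("numeric")
n <- 2L               # the L suffix makes an integer
s <- "gene 1"         # a character string
flag <- x > 1         # comparisons give logicals (TRUE/FALSE)

class(x)      # "numeric"
typeof(n)     # "integer"
class(s)      # "character"
class(flag)   # "logical"

x <- x^2      # reassigning overwrites the old value
x             # 4
```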
Vectors
Many ways of generating a vector with a range of numbers:

   x <- 1:10

   assign("x", 1:10)

   x <- c(1,2,3,4,5,6,7,8,9,10)

   x <- seq(1, 10, by=1)

   x <- seq(length.out=10, from=1, by=1)

x
[1] 1 2 3 4 5 6 7 8 9 10
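All of these generators produce the values 1 to 10, but not always with the same storage type. A small check, with illustrative variable names:

```r
x_colon <- 1:10                        # colon operator: integer vector
x_c     <- c(1,2,3,4,5,6,7,8,9,10)     # c() of plain numbers: double vector
x_seq   <- seq(1, 10, by = 1)          # seq() with a double 'from': double vector
x_len   <- seq_len(10)                 # seq_len(): integer vector 1..n

typeof(x_colon)               # "integer"
typeof(x_c)                   # "double"
all.equal(x_colon, x_c)       # TRUE: same values
identical(x_colon, x_c)       # FALSE: different storage types
identical(x_colon, x_len)     # TRUE
```

In everyday use the difference rarely matters, but `identical()` is strict about it, so prefer `all.equal()` when comparing values.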
Vectors

Common way to store multiple values

x <- c(1,2,4,5,10,12,15)

length(x)

mean(x)

summary(x)
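For the vector above, the summary statistics can be worked out by hand; this sketch spells out what each call returns (quartiles use R's default `quantile()` method, type 7):

```r
x <- c(1, 2, 4, 5, 10, 12, 15)

length(x)                     # 7
sum(x)                        # 49
mean(x)                       # 7, i.e. 49 / 7
median(x)                     # 5, the middle (4th) of the 7 sorted values
quantile(x, c(0.25, 0.75))    # 3 and 11: halfway between 2 and 4, and between 10 and 12
summary(x)                    # Min 1, 1st Qu. 3, Median 5, Mean 7, 3rd Qu. 11, Max 15
```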
Vectors

Vectors are indexed (starting at 1); a negative index drops elements

x <- 1:10

x[5] + x[10]
[1] 15

x[-c(5,10)]
[1] 1 2 3 4 6 7 8 9
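Besides numeric positions, vectors can be indexed with a logical vector of the same length, which is how most filtering is done in R. A short sketch:

```r
x <- 1:10

x > 5                # a logical vector, one TRUE/FALSE per element
x[x > 5]             # keep elements where the condition is TRUE: 6 7 8 9 10
x[x %% 2 == 0]       # even numbers: 2 4 6 8 10
which(x > 8)         # positions (not values) that match: 9 10
x[c(1, 1, 2)]        # indices may repeat: 1 1 2
```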
Matrices

Common form of storing two-dimensional data

  Think about having an Excel sheet

m = matrix(1:10,2,5)
m
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8   10

summary(m)
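Matrices are indexed with `[row, column]`; leaving one side empty selects a whole row or column. A sketch using the matrix above (`m_rows` is an illustrative name):

```r
m <- matrix(1:10, nrow = 2, ncol = 5)  # filled column by column

dim(m)        # 2 5
m[2, 3]       # row 2, column 3: 6
m[, 3]        # whole third column: 5 6
m[1, ]        # whole first row: 1 3 5 7 9
t(m)          # transpose: 5 rows, 2 columns

m_rows <- matrix(1:10, nrow = 2, byrow = TRUE)  # fill row by row instead
m_rows[1, ]   # 1 2 3 4 5
```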
Factors
Factors are vectors with a fixed set of discrete
levels (categories):

x <- factor(c("Cancer", "Cancer", "Normal",
"Normal"))

levels(x)
[1] "Cancer" "Normal"

table(x)
x
Cancer Normal
     2      2
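Internally a factor stores integer codes plus a vector of level labels, and the level order (alphabetical by default) decides which group comes first, e.g. the reference group in a model. A sketch:

```r
x <- factor(c("Cancer", "Cancer", "Normal", "Normal"))

as.integer(x)   # internal codes: 1 1 2 2 (levels sorted alphabetically)
nlevels(x)      # 2

# Choose the level order explicitly, e.g. to make "Normal" the reference group
y <- factor(c("Cancer", "Cancer", "Normal", "Normal"),
            levels = c("Normal", "Cancer"))
levels(y)       # "Normal" "Cancer"
as.integer(y)   # 2 2 1 1

relevel(x, ref = "Normal")  # same level order, starting from an existing factor
```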
Lists

A list can contain “anything”

Useful for storing several vectors of different types and lengths together

list(gene="gene 1", expression=c(5,2,3))
$gene
[1] "gene 1"

$expression
[1] 5 2 3
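Elements of a list are reached by name with `$` or `[[ ]]`; single brackets return a smaller list instead of the element itself. A sketch (the `tissue` element is hypothetical):

```r
g <- list(gene = "gene 1", expression = c(5, 2, 3))

g$gene                 # "gene 1"
g[["expression"]]      # 5 2 3, the vector itself
g["expression"]        # a list of length 1 that contains the vector
mean(g$expression)     # 3.333333
g$tissue <- "liver"    # hypothetical new element; lists grow by assignment
names(g)               # "gene" "expression" "tissue"
```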
If-else statements

Essential for any programming language

if condition then do x, else do y

p <- 0.003  # hypothetical p-value for one gene
if(p < 0.01){
    print("Significant gene")
}else{
    print("Insignificant gene")
}
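`if` tests a single condition; for a whole vector of p-values the element-wise `ifelse()` is needed (in recent R versions a condition of length greater than one in `if` is an error). A sketch with hypothetical p-values:

```r
p <- c(0.001, 0.2, 0.04)   # hypothetical p-values for three genes

# ifelse() evaluates the test element by element and picks from the two answers
status <- ifelse(p < 0.01, "Significant gene", "Insignificant gene")
status          # "Significant gene" "Insignificant gene" "Insignificant gene"

sum(p < 0.01)   # number of significant genes: 1
```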
Repetition
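You want to apply one function to every
element of a list

for(element in my_list){ ....do something.... }

For loops are easy to write, though they tend to be slow on large data

The apply family is the idiomatic, more concise way of getting things done
in R (built-in vectorised functions such as rowMeans() are faster still):

apply(my_matrix, 1, mean)   # over the rows of a matrix
sapply(my_list, mean)       # over the elements of a list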
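Three ways to compute the row means of a small matrix, plus the list version; `m` and `my_list` are illustrative. As a rule of thumb, a built-in vectorised function such as `rowMeans()` is the fastest, and the apply family mainly buys conciseness:

```r
m <- matrix(1:6, nrow = 2)               # rows: 1 3 5 and 2 4 6
my_list <- list(a = 1:3, b = c(10, 20))

# 1. explicit for loop over the rows, with the result preallocated
row_means_loop <- numeric(nrow(m))
for (i in seq_len(nrow(m))) {
  row_means_loop[i] <- mean(m[i, ])
}

# 2. apply() over the rows (MARGIN = 1); MARGIN = 2 would work on columns
row_means_apply <- apply(m, 1, mean)

# 3. the built-in vectorised function
row_means_vec <- rowMeans(m)

# For lists, sapply() simplifies the result to a vector; lapply() always returns a list
list_means <- sapply(my_list, mean)      # named vector: a = 2, b = 15
```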
Data input


R has countless ways of importing data:

  CSV

  Excel

  Flat text file
Data input
Simplest, the CSV file (first row as column names, first column as row names):

  read.csv("mydata.csv", header=TRUE, row.names=1)

Load a tab-separated file

  read.table("mytable.txt", sep="\t")

Load an Rdata file

  load("mydata.Rdata")
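`read.table()` can also parse text directly through its `text` argument, which makes it easy to try the options out without a file. The table content below is hypothetical:

```r
# Hypothetical tab-separated content standing in for "mytable.txt"
txt <- "probe\tsample1\tsample2
p1\t5.1\t4.8
p2\t2.3\t2.6"

# header = TRUE: the first line holds column names
# row.names = 1: the first column holds row names
tab <- read.table(text = txt, sep = "\t", header = TRUE, row.names = 1)

dim(tab)                # 2 rows, 2 columns
tab["p1", "sample1"]    # 5.1
str(tab)                # a data.frame with two numeric columns
```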
Data input
Also for more specific data sources:

Excel

Database connections

            MySQL (e.g. Ensembl)

Affy

       Affymetrix chip data

HapMap

.........
Data output
Simplest, the CSV file:

  write.csv(x, file="myx.csv")

Save an Rdata file:

  save(x, file="myx.Rdata")

Save the whole R session (workspace):

  save.image(file="mysession.Rdata")
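A round trip through both formats, using temporary files so nothing is left behind (the data frame is hypothetical). CSV keeps only the values, whereas an Rdata file stores the R object itself:

```r
# Hypothetical data frame to save
x <- data.frame(gene = c("g1", "g2"), expression = c(5.1, 2.3))

csv_file   <- tempfile(fileext = ".csv")
rdata_file <- tempfile(fileext = ".Rdata")

write.csv(x, file = csv_file, row.names = FALSE)  # omit the 1, 2, ... row names
x_csv <- read.csv(csv_file)                         # comes back as a data.frame

save(x, file = rdata_file)          # stores the object together with its name
restored <- new.env()
load(rdata_file, envir = restored)  # recreates 'x' inside 'restored'
identical(restored$x, x)            # TRUE: Rdata keeps the object exactly

# save.image(file = ...) would store every object in the workspace instead
```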
Graphics


A quick way to study your data is to plot it

The function “plot” in R can plot almost
anything out of the box (even if this doesn’t
make sense!)
plot(1:5,5:1)
plot(1:5,5:1, col="red", type="l")
plot(1:5,5:1, col="red", type="l",
    main="Title of this plot",
    xlab="x axis", ylab="y axis")
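When running R from a script, or to keep a plot, the same calls can be sent to a file device instead of the screen. A sketch writing a PDF to a temporary file; the extra `points()`, `abline()` and `legend()` calls are additions for illustration:

```r
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)                 # open a PDF device instead of a screen window

plot(1:5, 5:1, col = "red", type = "l",
     main = "Title of this plot", xlab = "x axis", ylab = "y axis")
points(1:5, 5:1, pch = 19)     # add filled points on top of the line
abline(h = 3, lty = 2)         # dashed horizontal reference line at y = 3
legend("topright", legend = c("line", "points"),
       col = c("red", "black"), lty = c(1, NA), pch = c(NA, 19))

dev.off()                      # close the device so the file is written
```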
Basic graphics

With R you can plot almost any object

  Multidimensional variables like matrices
  can be plotted with matplot()

Other often used plot functions are:

  boxplot(), hist(), heatmap() and levelplot() (the latter from the lattice package)
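A sketch exercising the base functions listed above on simulated data (the matrix is hypothetical; `levelplot()` is left out because it needs the lattice package):

```r
set.seed(1)
# Hypothetical data: 20 measurements for each of 4 samples, with shifted means
m <- matrix(rnorm(80, mean = rep(c(0, 1, 2, 3), each = 20)), ncol = 4,
            dimnames = list(NULL, paste0("s", 1:4)))

pdf(tempfile(fileext = ".pdf"))
matplot(m, type = "l", lty = 1)    # one line per column
boxplot(m)                         # one box per column
h <- hist(m[, 1], breaks = 10)     # histogram; the returned object holds the counts
heatmap(m)                         # heatmap with rows and columns clustered
dev.off()
```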
Advanced plotting
Before the example
The help page for a function can be opened with:

  ?plot, ?hist, ?vector

Examples for most functions can be run with:

  example(plot)

A text search of the help pages can be done with:

  ??plot
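The `?` and `??` forms are meant for interactive use; a couple of base R helpers do similar look-ups from a script:

```r
apropos("^read\\.")   # objects on the search path whose name starts with "read."
exists("read.csv")    # TRUE: the function is available
args(read.csv)        # print the argument list without opening the help page
```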
Example

An example Affymetrix dataset to play
with

  Checking the distribution of the data

  Plotting the data

  Clustering the data

  Correlating the data
Read file


library(affy)

library(affydata)

data(Dilution)

print(Dilution)
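The packages above come from Bioconductor rather than CRAN. A minimal sketch that checks for them before running the example and, if any are missing, prints the usual BiocManager installation commands instead of installing anything:

```r
pkgs <- c("affy", "affydata", "genefilter")   # Bioconductor packages used in this example

# requireNamespace() returns FALSE (instead of failing) when a package is missing
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "), "\n",
          "Install them from Bioconductor, e.g.:\n",
          "  install.packages(\"BiocManager\")\n",
          "  BiocManager::install(c(",
          paste0("\"", missing_pkgs, "\"", collapse = ", "), "))")
}
```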
Read file


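dil = pm(Dilution)[1:2000,]         # perfect-match (PM) probe intensities, first 2000 rows

dil.ex = exprs(Dilution)[1:2000,]   # raw probe-level intensities, first 2000 rows

rownames(dil.ex) = row.names(probes(Dilution))[1:2000]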
Summary
Checking what we got

summary(dil)

mva.pairs(dil)

Or:

boxplot(log(dil.ex))

Or:

hist(dil.ex, xlim=c(0,500), breaks=1000)
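Raw probe intensities are typically strongly right-skewed, which is why the slides plot them on a log scale. Since the Dilution data needs Bioconductor, this sketch uses simulated log-normal values as a stand-in (an assumption, not the real data):

```r
set.seed(123)
# Hypothetical stand-in for raw probe intensities: right-skewed, log-normal values
raw <- rlnorm(2000, meanlog = 6, sdlog = 1)

mean(raw) > median(raw)    # TRUE: the long right tail pulls the mean up
log_raw <- log2(raw)       # on the log scale the values are roughly symmetric
c(mean = mean(log_raw), median = median(log_raw))

pdf(tempfile(fileext = ".pdf"))
par(mfrow = c(1, 2))       # two panels side by side
hist(raw, breaks = 100, main = "raw")
hist(log_raw, breaks = 100, main = "log2")
dev.off()
```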
We need to normalise first
For almost all experiments you have to apply
some sort of normalisation

dil.norm = maffy.normalize(dil, subset=1:nrow(dil))

colnames(dil.norm) = colnames(dil)

mva.pairs(dil.norm)
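`maffy.normalize()` comes from the affy package. As a dependency-free illustration of what normalisation achieves, here is a sketch of a different, widely used technique, quantile normalisation, on simulated log2 intensities (the data and sample names are hypothetical):

```r
# Quantile normalisation: give every column (array) the same distribution
quantile_normalise <- function(mat) {
  ranks <- apply(mat, 2, rank, ties.method = "first")  # rank of each value within its column
  reference <- rowMeans(apply(mat, 2, sort))           # mean of the k-th smallest values
  out <- apply(ranks, 2, function(r) reference[r])     # replace each value by its rank's mean
  dimnames(out) <- dimnames(mat)
  out
}

set.seed(7)
# Hypothetical log2 intensities: 4 arrays, the last two systematically brighter
log_int <- matrix(rnorm(2000 * 4, mean = 8), ncol = 4,
                  dimnames = list(NULL, c("A1", "A2", "B1", "B2")))
log_int[, 3:4] <- log_int[, 3:4] + 1

apply(log_int, 2, median)        # medians differ by about 1
normalised <- quantile_normalise(log_int)
apply(normalised, 2, median)     # medians are now identical
```

Quantile normalisation forces identical distributions, which is a strong assumption; it suits arrays where most probes are not expected to change.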
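Most similar samples

Applying Euclidean distance and hierarchical
clustering to find the most similar samples

dil.norm.dist = dist(t(dil.norm))   # dist() compares rows, so transpose to put arrays in rows

dil.norm.dist.hc = hclust(dil.norm.dist)

plot(dil.norm.dist.hc)

Do the same for the non-normalised dataset and compare the trees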
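The clustering steps can be tried on simulated data where the answer is known in advance: two pairs of arrays that differ on half of the probes (all values and names are hypothetical). `cor()` gives a correlation-based view of the same question:

```r
set.seed(11)
# Hypothetical data: 200 probes x 4 arrays; B1/B2 differ from A1/A2 on the first 100 probes
base  <- rnorm(200, mean = 8)
shift <- c(rep(3, 100), rep(0, 100))
expr <- cbind(A1 = base + rnorm(200, sd = 0.1),
              A2 = base + rnorm(200, sd = 0.1),
              B1 = base + shift + rnorm(200, sd = 0.1),
              B2 = base + shift + rnorm(200, sd = 0.1))

d  <- dist(t(expr))            # Euclidean distances between arrays (rows after t())
hc <- hclust(d)                # complete linkage by default
groups <- cutree(hc, k = 2)    # cut the tree into two clusters
groups                         # A1 and A2 share one label, B1 and B2 the other

round(cor(expr), 2)            # pairwise Pearson correlations between arrays

pdf(tempfile(fileext = ".pdf"))
plot(hc)
dev.off()
```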
Checking expression

Heatmap representation of expression levels
for different probes

heatmap(dil.norm[1:50,])

You could, for example, rank the probes with a t-test
and plot only the most significant ones
Checking expression
Ranking the probes with a t-test and plotting only
the most significant ones:

library(genefilter)

f = factor(c(1,1,2,2))   # two groups of two arrays

dil.norm.t = rowttests(dil.norm, fac=f)

heatmap(dil.norm[order(dil.norm.t$p.value)[1:10],])
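`rowttests()` is part of genefilter. A slower, base-R-only sketch of the same idea runs one equal-variance t-test per probe with `t.test()` and keeps the ten smallest p-values; the data are simulated and hypothetical:

```r
set.seed(3)
# Hypothetical normalised data: 100 probes x 4 arrays in two groups of two
expr <- matrix(rnorm(400, mean = 8), nrow = 100,
               dimnames = list(paste0("probe", 1:100), c("A1", "A2", "B1", "B2")))
expr[1:5, 3:4] <- expr[1:5, 3:4] + 4     # probes 1-5 are higher in group B
f <- factor(c(1, 1, 2, 2))

# One equal-variance two-sample t-test per row (probe)
p_values <- apply(expr, 1, function(row)
  t.test(row[f == 1], row[f == 2], var.equal = TRUE)$p.value)

top10 <- order(p_values)[1:10]           # row indices of the 10 smallest p-values
rownames(expr)[top10]

pdf(tempfile(fileext = ".pdf"))
heatmap(expr[top10, ])
dev.off()
```

With thousands of probes many small p-values arise by chance, so in practice they are usually adjusted first, e.g. with `p.adjust(p_values, method = "BH")`.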
Want to know more?
Using R will benefit every PhD student in this room

Learning by doing

Loads of basic examples at:

  http://addictedtor.free.fr/graphiques/

  http://www.mayin.org/ajayshah/KB/R/index.html

  http://www.r-project.org/
Just keep in mind......
Questions?


Contact me:

swtimmer@ebi.ac.uk

http://www.ebi.ac.uk/~swtimmer/ for slides
or http://www.slideshare.net/swtimmer
