THE WRIGHT LAB COMPUTATION LUNCHES	


An introduction to R	

Gene expression from DEVA to differential expression	

Handling massively parallel sequencing data
R: data frames, plots and
linear models
Martin Johnsson
statistical environment
free open source
scripting language
great packages (limma for
microarrays, R/qtl for QTL
mapping etc)
You need to write code!
(Why scripting?
harder the first time
easier the next 20 times …
necessity for moderately large data)
R-project.org	


Rstudio.com
Interface is immaterial

ssh into a server, RStudio,
alternatives
very few platform-dependent
elements that the end user needs to
worry about
Scripting

write your code down in a .R file
run it with source ( )
## comments
if it runs without intervention
from start to finish, you’re
actually programming
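
A minimal sketch of this workflow. The script contents and the temporary file are stand-ins for a .R file you would normally write in an editor.

```r
# Write a tiny script to a temporary .R file, then run it start to finish.
script.file <- tempfile(fileext = ".R")
writeLines(c(
  "## comments start with a hash sign",
  "x <- c(1, 2, 3)",
  "x.mean <- mean(x)"
), script.file)

source(script.file)  # runs every line; x and x.mean now exist in the workspace
print(x.mean)
```

If the script stops with an error halfway through, source( ) stops there too, which makes it a quick check that the whole file runs without intervention.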
Help!

within R: ? and tab
your favourite search engine
ask (Stack Exchange, R-help mailing list,
package mailing lists)
First task: make a new script that
imports a data set and takes a subset
of the data
Reading in data

Excel sheet to data.frame
one sheet at a time
clear formatting
short succinct column names
export text

read.table
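
A sketch of the import step, assuming the sheet was exported as tab-delimited text. The file contents and column names below are made up for illustration.

```r
# Stand-in for a sheet exported from Excel as tab-delimited text.
export.file <- tempfile(fileext = ".txt")
writeLines(c(
  "id\tcolour\tweight",
  "1\tgreen\t310.5",
  "2\tpink\t287.0",
  "3\tgreen\t402.2"
), export.file)

# header=TRUE: the first line holds the column names
# sep="\t": fields are separated by tabs
imported <- read.table(export.file, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
str(imported)  # check that each column got the type you expect
```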
Subsetting

logical operators ==, !=, >, <, !
subset(data, column1==1)
subset(data, column1==1 & column2>2)
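
A small worked example of these operators, using a hypothetical data frame whose column names follow the slide.

```r
# Hypothetical data frame, for illustration only
example.data <- data.frame(column1 = c(1, 1, 2, 1),
                           column2 = c(5, 1, 7, 3))

# Rows where column1 equals 1
ones <- subset(example.data, column1 == 1)

# Rows where column1 equals 1 AND column2 is greater than 2
ones.and.large <- subset(example.data, column1 == 1 & column2 > 2)

# ! negates a condition: rows where column1 is NOT 1
not.ones <- subset(example.data, !(column1 == 1))

print(ones.and.large)
```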
Indexing with [ ]

first three rows: data[c(1,2,3),]
two columns: data[,c(2,4)]
ranges: data[,1:3]
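
The same indexing patterns on a hypothetical four-column data frame. Rows go before the comma, columns after it.

```r
# Hypothetical data frame, for illustration only
example.data <- data.frame(a = 1:5, b = 6:10, c = 11:15, d = 16:20)

first.three.rows <- example.data[c(1, 2, 3), ]  # rows before the comma
two.columns      <- example.data[, c(2, 4)]     # columns after the comma
column.range     <- example.data[, 1:3]         # 1:3 is shorthand for c(1, 2, 3)

# Row and column indices can be combined, and columns can be named
corner <- example.data[1:2, c("a", "d")]
print(corner)
```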
Variables

columns in expressions
data$column1 + data$column2
log10(data$column2)

assignment arrow
data$new.column <- log10(data$column2)
new.data <- subset(data, column1==10)
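
A runnable version of the lines above, on a hypothetical data frame so that the results can be inspected.

```r
# Hypothetical data; column names follow the slide
example.data <- data.frame(column1 = c(10, 20, 10),
                           column2 = c(1, 100, 1000))

# A column can be used like any vector in an expression
total <- example.data$column1 + example.data$column2

# The arrow stores a result: here as a new column ...
example.data$new.column <- log10(example.data$column2)

# ... and here as a new data frame
new.data <- subset(example.data, column1 == 10)
print(new.data)
```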
Exercise: Start a new script and save
it as unicorn_analysis.R.
Import unicorn_data.csv.
Take a subset that only includes
green unicorns.
RStudio: File > New > R Script
data <- read.csv("unicorn_data.csv")
green.unicorns <- subset(data, colour=="green")
Anatomy of a function call

function.name(parameters)
mean(data$column)
mean(x=data$column)
mean(x=data$column, na.rm=TRUE)
?mean
mean(exp(log(x)))
programming in R ==
stringing together functions
and writing new ones
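
A sketch of both ideas: calling a function with named arguments, and writing a new one by combining existing functions. The name geometric.mean is made up here and is not a built-in R function.

```r
# Hypothetical measurements with one missing value
values <- c(2, 4, NA, 6)

mean(values)                    # NA: missing values propagate by default
mean(values, na.rm = TRUE)      # named argument: drop NAs first
mean(x = values, na.rm = TRUE)  # same call with every argument named

# A new function built by stringing existing ones together
geometric.mean <- function(x, na.rm = TRUE) {
  exp(mean(log(x), na.rm = na.rm))
}

geometric.mean(c(1, 10, 100))
```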
Using a package (ggplot2) to make
statistical graphics.
install.packages("ggplot2")
library(ggplot2)
qplot(x=x.var, y=y.var, data=data)
only x: histogram
x and y numeric: scatterplot

or set geometry (geom) yourself
geoms:
point
line
boxplot
jitter – scattered points
tile – heatmap
and many more
Exercise: Make a scatterplot of weight
and horn length in green unicorns.
Write all code in the
unicorn_analysis.R script.
Save the plots as variables so you can
refer back to them.
[Figure: scatterplot of horn.length (y) against weight (x) for green unicorns]

green.scatterplot <- qplot(x=weight,
y=horn.length, data=green.unicorns)
Exercise: Make a boxplot of horn
length versus diet.
[Figure: boxplot of horn.length (y) by diet (x): candy, flowers]

qplot(x=diet, y=horn.length,
data=data, geom="boxplot")
Small multiples

split the plot into multiple subplots
useful for looking at patterns
qplot(x=x.var, y=y.var, data=data,
facets=~variable)
facets=variable1~variable2
Exercise: Again, make a boxplot of
diet and horn length, but separated
into small multiples by colour.
[Figure: boxplots of horn.length (y) by diet (x), in two panels by colour: green, pink]

qplot(x=diet, y=horn.length, data=data,
geom="boxplot", facets=~colour)
Comparing means with linear models
Wilkinson–Rogers notation

one predictor: y ~ x
additive model: y ~ x1 + x2
interactions: y ~ x1 + x2 + x1:x2
or y ~ x1 * x2
factors: y ~ factor(x)
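
One way to see what a formula expands to is to look at the columns of its design matrix. The data frame below is hypothetical, with one numeric and one categorical predictor.

```r
# Hypothetical predictors: x1 numeric, x2 a factor with levels a and b
toy <- data.frame(y  = c(1.2, 2.3, 3.1, 4.8, 5.0, 6.7),
                  x1 = c(1, 2, 3, 4, 5, 6),
                  x2 = factor(c("a", "b", "a", "b", "a", "b")))

# Columns of the design matrix show which terms a formula contains
additive  <- colnames(model.matrix(y ~ x1 + x2, data = toy))
explicit  <- colnames(model.matrix(y ~ x1 + x2 + x1:x2, data = toy))
shorthand <- colnames(model.matrix(y ~ x1 * x2, data = toy))

print(additive)
print(explicit)
identical(explicit, shorthand)  # x1 * x2 expands to x1 + x2 + x1:x2
```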
Student’s t-test
t.test(variable ~ grouping, data=data)
alternative: two.sided, less, greater
var.equal
paired

geoms: boxplot, jitter
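
A sketch of the options above on made-up measurements for two groups (not the unicorn data).

```r
# Made-up measurements for two groups
toy <- data.frame(
  variable = c(5.1, 4.8, 5.6, 5.0, 5.3, 6.2, 6.0, 6.8, 5.9, 6.5),
  grouping = rep(c("a", "b"), each = 5)
)

# Default: two-sided Welch test (var.equal = FALSE)
welch <- t.test(variable ~ grouping, data = toy)

# Classical Student's t-test, assuming equal variances
student <- t.test(variable ~ grouping, data = toy, var.equal = TRUE)

# One-sided alternative: is the mean of group "a" less than that of "b"?
one.sided <- t.test(variable ~ grouping, data = toy, alternative = "less")

print(student)
```

Note that t.test( ) does not assume equal variances unless var.equal=TRUE is set, so the default output is Welch's test rather than Student's.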
Carl Friedrich Gauss
least squares estimation
[Figure: scatterplot of horn.length (y) against weight (x)]
linear model
y = a + b x + e, e ~ N(0, sigma)
lm(y ~ x, data=some.data)
formula
data
summary( ) function
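
To see what lm( ) recovers, one can simulate data from the model equation and fit it. The parameter values and sample size below are arbitrary choices for illustration, not estimates from the unicorn data.

```r
set.seed(1)  # reproducible random numbers

# Simulate from y = a + b x + e with made-up parameter values
a <- 20
b <- 0.03
x <- runif(200, min = 250, max = 450)
e <- rnorm(200, mean = 0, sd = 3)
some.data <- data.frame(x = x, y = a + b * x + e)

# formula first, then the data frame holding the variables
fit <- lm(y ~ x, data = some.data)
summary(fit)  # estimates should land near a and b, up to sampling noise
```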
Exercise: Make a regression of horn
length and body weight.
model <- lm(horn.length ~ weight, data=data)
summary(model)
Call:
lm(formula = horn.length ~ weight, data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-6.5280 -2.0230 -0.1902  2.5459  7.3620

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.41236    3.85774   5.291 1.76e-05 ***
weight       0.03153    0.01093   2.886  0.00793 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.447 on 25 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared: 0.2499, Adjusted R-squared: 0.2199
F-statistic: 8.327 on 1 and 25 DF, p-value: 0.007932
What have we actually fitted?
model.matrix(horn.length ~ weight, data)
What were the results?
coef(model)
How uncertain?
confint(model)
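
The same three calls on cars, a data set that ships with R (speed and stopping distance). It stands in for the unicorn data here, so the sketch runs without any file.

```r
# cars is built into R; dist is regressed on speed
fit <- lm(dist ~ speed, data = cars)

# What was fitted: a column of ones (the intercept) and the predictor
head(model.matrix(dist ~ speed, cars))

# The estimates, as a named vector
coef(fit)

# 95% confidence intervals by default; change with the level argument
confint(fit)
confint(fit, level = 0.99)
```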
Plotting the model

regression equation y = a + b x
a is the intercept
b is the slope of the line

pull out coefficients with coef( )
a plot with two layers: scatterplot
with added geom_abline( )
[Figure: scatterplot of horn.length (y) against weight (x) with the fitted regression line]

scatterplot <- qplot(x=weight, y=horn.length, data=data)
a <- coef(model)[1]
b <- coef(model)[2]
scatterplot + geom_abline(intercept=a, slope=b)
A regression diagnostic

the linear model makes several
assumptions, particularly linearity and
equal error variance
the residuals vs fitted plot can help spot
gross deviations
[Figure: residuals(model) (y) against fitted(model) (x)]

qplot(x=fitted(model), y=residuals(model))
Photo: Peter (anemoneprojectors), CC:BY-SA-2.0	

http://www.flickr.com/people/anemoneprojectors/
analysis of variance
aov(formula, data=some.data)
drop1(aov.object, test="F")
F-tests (Type II SS)

post-hoc tests
pairwise.t.test
TukeyHSD
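
A sketch of why drop1( ) is used for the F-tests, run on mtcars (a data set that ships with R) rather than the unicorn data. Treating cyl and am as factors is a choice made for this illustration.

```r
# mtcars is built into R; cyl (cylinders) and am (transmission) are turned
# into factors. The group sizes are unequal, i.e. the design is unbalanced.
cars.data <- transform(mtcars, cyl = factor(cyl), am = factor(am))

fit.1 <- aov(mpg ~ cyl + am, data = cars.data)
fit.2 <- aov(mpg ~ am + cyl, data = cars.data)

# summary() uses sequential (Type I) sums of squares: each term is adjusted
# only for the terms before it, so the two orderings give different tables
summary(fit.1)
summary(fit.2)

# drop1() adjusts each term for all the others, so the order does not matter
drop1(fit.1, test = "F")
drop1(fit.2, test = "F")

# Post-hoc comparisons between cylinder groups in a one-way model
one.way <- aov(mpg ~ cyl, data = cars.data)
TukeyHSD(one.way)
pairwise.t.test(cars.data$mpg, cars.data$cyl)  # Holm adjustment by default
```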
Exercise: Perform analysis of variance
on weight and the effects of diet while
controlling for colour.
We want a two-way anova with an F-test for diet.
model.int <- aov(weight ~ diet * colour, data=data)
drop1(model.int, test="F")
model.add <- aov(weight ~ diet + colour, data=data)
drop1(model.add, test="F")
Single term deletions

Model:
weight ~ diet + colour
       Df Sum of Sq   RSS    AIC F value  Pr(>F)
<none>              85781 223.72
diet    1     471.1 86252 221.87  0.1318 0.71975
colour  1   13479.7 99260 225.66  3.7714 0.06396 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Black magic

none of this is really R’s fault, but things
that come up along the way
missing values: na.rm=TRUE, na.exclude( )
type I, II and III sums of squares in anova
floating-point arithmetic, e.g. sin(pi)
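
A short sketch of the missing-value and floating-point surprises. The vectors and the small data frame are made up for illustration.

```r
# Missing values: NA propagates unless told otherwise
x <- c(1.5, NA, 3.5)
mean(x)                 # NA
mean(x, na.rm = TRUE)   # 2.5

# na.exclude keeps residuals aligned with the original rows (NA where a
# row was dropped); the default na.omit simply drops those rows
toy <- data.frame(y = c(1, 2, NA, 4, 5), x = c(1, 2, 3, 4, 5))
length(residuals(lm(y ~ x, data = toy)))                          # 4
length(residuals(lm(y ~ x, data = toy, na.action = na.exclude)))  # 5

# Floating-point arithmetic: results are close to, not exactly, the
# mathematical value
sin(pi) == 0                   # FALSE
isTRUE(all.equal(sin(pi), 0))  # TRUE: comparison with a tolerance
```

Comparing with all.equal( ) instead of == is the usual way around the last problem.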
Reading

Dalgaard, Introductory statistics with R,
electronic resource at the library
Faraway, Linear models with R
Gelman & Hill, Data analysis using regression
and multilevel/hierarchical models
Wickham, ggplot2 book
tons of tutorials online, for instance
http://martinsbioblogg.wordpress.com/a-slightlydifferent-introduction-to-r/
Exercise

More (and some of the same) analysis of the
unicorn data set.
Use the R documentation and google.
I will post solutions.
