Linear Regression with R
1: Prepare data/specify model/read results

          2012-12-07 @HSPH
         Kazuki Yoshida, M.D.
           MPH-CLE student

FREEDOM TO KNOW
Group Website is at:
http://rpubs.com/kaz_yos/useR_at_HSPH
Previously in this group
n   Introduction               n   Graphics

n   Reading Data into R (1)    n   Groupwise, continuous

n   Reading Data into R (2)    n


n   Descriptive, continuous

n   Descriptive, categorical

n   Deducer
Menu


n   Linear regression
Ingredients
Statistics
- Data preparation
- Model formula

Programming
- within()
- factor(), relevel()
- lm()
- formula = Y ~ X1 + X2
- summary()
- anova(), car::Anova()
Open RStudio
Create a new script and save it.
http://www.umass.edu/statdata/statdata/data/
We will use the lowbwt dataset used in BIO213




             lowbwt.dat
http://www.umass.edu/statdata/statdata/data/lowbwt.txt
http://www.umass.edu/statdata/statdata/data/lowbwt.dat
Load dataset from web


lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat",
                  header = TRUE, skip = 4)

header = TRUE: pick up variable names from the header row
skip = 4: skip the first 4 rows
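The interplay of header and skip can be tried without a network connection. This is a minimal sketch on made-up text (the contents of raw are hypothetical and are not the real lowbwt.dat): four note lines come first, the fifth line holds the variable names.

```r
## Hypothetical file contents: 4 note lines, then a header row, then data
raw <- "Note line 1
Note line 2
Note line 3
Note line 4
ID LOW BWT
1 0 2523
2 0 2551
3 1 2100"

## text = lets read.table() parse a string instead of a file or URL
toy <- read.table(text = raw, header = TRUE, skip = 4)

names(toy)   # variable names picked up from the 5th line
nrow(toy)    # number of data rows
```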
“Fix” dataset


        lbw[c(10,39), "BWT"] <- c(2655, 3035)

lbw[c(10,39), "BWT"]: 10th and 39th rows of the BWT column
c(2655, 3035): replace data points to make the dataset identical to the BIO213 dataset
Lower case variable names


    names(lbw) <- tolower(names(lbw))

tolower(names(lbw)): convert variable names to lower case
names(lbw) <- : put them back into the variable names
See overview
library(gpairs)
gpairs(lbw)
Recoding
Changing and creating variables
 dataset <- within(dataset, {
     _variable manipulations_
 })

- dataset <- : name of the newly created dataset (here replacing the original)
- within(dataset, ...): take the dataset
- { ... }: perform variable manipulation. You can specify variables by name only; there is no need for dataset$var_name.
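The within() pattern can be tried on a tiny data frame first. This is a sketch only: toy, height_cm, weight_kg, height_m and bmi are hypothetical names invented for illustration and are not part of the lowbwt dataset.

```r
## Hypothetical toy data (not the lowbwt dataset)
toy <- data.frame(height_cm = c(150, 160, 170),
                  weight_kg = c(50, 60, 80))

toy <- within(toy, {
    ## Variables are referred to by name only inside within()
    height_m <- height_cm / 100
    bmi      <- weight_kg / height_m^2
})

toy$bmi   # new column created inside within()
```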
lbw <- within(lbw, {

     ## Relabel race
     race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

     ## Categorize ftv (frequency of visit)
     ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
     ftv.cat <- relevel(ftv.cat, ref = "Normal")

     ## Dichotomize ptl
     preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

})
Numeric to categorical: element by element

     ## Relabel race
     race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

Categorize race and label: 1 to White, 2 to Black, 3 to Other.
The 1st level (White) will be the reference.
Explained more in depth
factor() to create categorical variable
  lbw <- within(lbw, {

       ## Relabel race
       race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))

  })

- race.cat <- : create a new variable named race.cat
- factor(race, ...): take the race variable
- levels = 1:3: order levels 1, 2, 3 and make 1 the reference level
- labels = c("White","Black","Other"): label levels 1, 2, 3 as White, Black, Other
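The mapping from numeric codes to labels can be checked on a short vector. The values in race below are made up for illustration and are not taken from the dataset.

```r
## Hypothetical numeric codes (not the real race column)
race <- c(1, 3, 2, 1, 1)

race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)   # order of levels; the first one is the reference
table(race.cat)    # counts per label
```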
Numeric to categorical: range to element

     ## Categorize ftv (frequency of visit)
     ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))

How breaks work:
(-Inf, 0]    ftv = 0             None
(0, 2]       ftv = 1, 2          Normal
(2, Inf]     ftv = 3, 4, 5, 6    Many
The 1st level (None) will be the reference.
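To see which values land in which interval, cut() can be applied to the counts 0 to 6. This is a small sketch; the ftv vector here is made up for illustration.

```r
ftv <- 0:6   # hypothetical visit counts

## Intervals are (-Inf, 0], (0, 2], (2, Inf]: open on the left, closed on the right
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))

data.frame(ftv, ftv.cat)   # side-by-side view of values and categories
```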
Reset reference level

     ## Categorize ftv (frequency of visit)
     ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
     ftv.cat <- relevel(ftv.cat, ref = "Normal")

Change the reference level of the ftv.cat variable from None to Normal.
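The effect of relevel() shows up in the order of the levels and in the dummy variables a model would create. A minimal sketch on a made-up factor (the four values are hypothetical):

```r
ftv.cat <- factor(c("None", "Normal", "Many", "Normal"),
                  levels = c("None", "Normal", "Many"))
levels(ftv.cat)                        # "None" is first, so it is the reference

ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat)                        # "Normal" moved to the front

## Dummy variables are created for the non-reference levels only
colnames(model.matrix(~ ftv.cat))
```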
Numeric to Boolean to Category

     ## Dichotomize ptl
     preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))

- ptl >= 1: a TRUE/FALSE vector is created here
- levels and labels: ptl < 1 goes to FALSE, then to “0”; ptl >= 1 goes to TRUE, then to “1+”
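The two-step conversion can be followed on a short vector. The ptl values below are made up for illustration.

```r
ptl <- c(0, 1, 0, 3)   # hypothetical values

ptl >= 1               # intermediate TRUE/FALSE vector

## FALSE is listed first, so "0" becomes the reference level
preterm <- factor(ptl >= 1, levels = c(FALSE, TRUE), labels = c("0", "1+"))
preterm
```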
Binary 0,1 to No,Yes

## One-by-one method
lbw <- within(lbw, {

     ## Categorize smoke ht ui
     smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
     ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
     ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
})

## Alternative to above: loop method
## (run one method or the other, not both: applying factor() again
##  to the already converted columns yields NA)
lbw[,c("smoke","ht","ui")] <-
  lapply(lbw[,c("smoke","ht","ui")],
       function(var) {
          factor(var, levels = 0:1, labels = c("No","Yes"))
       })
model formula
formula

 outcome ~ predictor1 + predictor2 + predictor3




               SAS equivalent:
model outcome = predictor1 predictor2 predictor3;
In the case of t-test

          age ~ zyg

- age: continuous variable to be compared (variable to be explained)
- zyg: grouping variable to separate groups (variable used to explain)
linear sum



Y ~ X1 + X2
n   . All variables except for the outcome

n   + X2 Add X2 term

n   - 1 Remove intercept

n   X1:X2 Interaction term between X1 and X2

n   X1*X2 Main effects and interaction term
Interaction term



Y ~ X1 + X2 + X1:X2
     Main effects   Interaction
Interaction term



Y ~ X1 * X2
   Main effects & interaction
On-the-fly variable manipulation
  Y ~ X1 + I(X2 * X3)

- I(): inhibit formula interpretation, for math manipulation
- X2 * X3: new variable (X2 times X3) created on-the-fly and used
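One way to convince yourself of what I() does is to compare it with a product column computed in advance. This is a sketch on simulated data; d, X23, fit.fly and fit.pre are hypothetical names chosen for illustration.

```r
set.seed(1)
d <- data.frame(X1 = rnorm(20), X2 = rnorm(20), X3 = rnorm(20))
d$Y <- 1 + d$X1 + 0.5 * d$X2 * d$X3 + rnorm(20)

## Product computed on the fly inside the formula
fit.fly <- lm(Y ~ X1 + I(X2 * X3), data = d)

## Same model with the product stored as a column first
d$X23 <- d$X2 * d$X3
fit.pre <- lm(Y ~ X1 + X23, data = d)

## Without I(), X2 * X3 means main effects plus interaction (more coefficients)
length(coef(lm(Y ~ X1 + X2 * X3, data = d)))
```

Both fits should give the same coefficients; only the coefficient name differs.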
Fit a model


lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui +
              ftv.cat + race.cat + preterm ,
             data = lbw)
See model object



   lm.full
Output:
- Call: command repeated
- Coefficient for each variable
See summary



summary(lm.full)
Output:
- Call: command repeated
- Residual distribution
- Coefficient table: Coef/SE = t; dummy variables are created for categorical variables
- R^2 and adjusted R^2
- Model F-test
ftv.catNone: people with no 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level)
ftv.catMany: people with many 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level)
race.catBlack: Black people compared to White people (reference level)
race.catOther: Other people compared to White people (reference level)
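The coefficient table can also be pulled out as a matrix, which makes the dummy-variable names and the Coef/SE = t relationship easy to check. A sketch on simulated data; d, x, g, y and fit are hypothetical and unrelated to lm.full.

```r
set.seed(2)
## Hypothetical data: one continuous and one 3-level categorical predictor
d <- data.frame(x = rnorm(30),
                g = factor(rep(c("White", "Black", "Other"), 10),
                           levels = c("White", "Black", "Other")))
d$y <- 3000 + 100 * d$x - 200 * (d$g == "Black") + rnorm(30, sd = 50)

fit <- lm(y ~ x + g, data = d)
tab <- coef(summary(fit))   # coefficient table as a matrix

rownames(tab)               # dummy variables gBlack and gOther; White is the reference
tab[, "Estimate"] / tab[, "Std. Error"]   # reproduces the "t value" column
```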
Confidence intervals

confint(lm.full)

Output columns: lower boundary and upper boundary of the 95% confidence interval (the default level) for each coefficient.
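For a linear model, the interval boundaries can be reproduced by hand from the coefficient table. A sketch on simulated data; d, fit, half and by.hand are hypothetical names.

```r
set.seed(3)
d <- data.frame(x = rnorm(25))
d$y <- 2 + 3 * d$x + rnorm(25)
fit <- lm(y ~ x, data = d)

ci  <- confint(fit)          # 95% by default
tab <- coef(summary(fit))

## By hand: estimate +/- t quantile * standard error
half    <- qt(0.975, df = df.residual(fit)) * tab[, "Std. Error"]
by.hand <- cbind(tab[, "Estimate"] - half, tab[, "Estimate"] + half)

ci
by.hand
```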
ANOVA table (type I)



anova(lm.full)
Output columns: degree of freedom, sequential SS, mean SS (= SS/DF).
F = Mean SS / Mean SS of residual
Type I = Sequential SS
(Venn diagram: three overlapping circles for 1 age, 2 lwt, 3 smoke)
- 1st (age) gets all in type I
- 2nd (lwt) gets all but the overlap with the 1st in type I
- Last (smoke) gets only the remaining part in type I
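The order dependence of sequential SS can be demonstrated with two correlated predictors entered in both orders. A sketch on simulated data; a, b, y and the helper ss() are hypothetical names.

```r
set.seed(4)
a <- rnorm(40)
b <- 0.7 * a + rnorm(40)        # correlated with a, so the two overlap
y <- 1 + a + b + rnorm(40)

## Helper: sum-of-squares column of the sequential ANOVA table
ss <- function(fit) anova(fit)[["Sum Sq"]]

ss.ab <- ss(lm(y ~ a + b))      # rows: a, b, Residuals
ss.ba <- ss(lm(y ~ b + a))      # rows: b, a, Residuals

## The first term gets everything it can explain on its own
first.alone <- deviance(lm(y ~ 1)) - deviance(lm(y ~ a))

ss.ab
ss.ba
```

The SS credited to a changes with its position, while the residual SS and the total stay the same.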
ANOVA table (type III)


library(car)
Anova(lm.full, type = 3)
Output columns: marginal SS, degree of freedom.
Multi-category variables are tested as one.
F = Mean SS / Mean SS of residual
Type III = Marginal SS
(Venn diagram: three overlapping circles for 1 age, 2 lwt, 3 smoke)
- 1st (age) gets margin only in type III
- 2nd (lwt) gets margin only in type III
- Last (smoke) gets margin only in type III
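The "margin only" idea can be sketched in base R with drop1(), which reports what is lost when each term is removed from the full model. This is an assumption-laden stand-in, not the car::Anova() call used in the slides: for a model without interaction terms the two should report the same marginal SS, but that equivalence is assumed here rather than checked against car. The data and names (a, b, y, m.ab, m.ba) are hypothetical.

```r
set.seed(5)
a <- rnorm(40)
b <- 0.7 * a + rnorm(40)        # correlated predictors
y <- 1 + a + b + rnorm(40)

## Marginal (drop-one-term) tests, with the predictors in both orders
m.ab <- drop1(lm(y ~ a + b), test = "F")
m.ba <- drop1(lm(y ~ b + a), test = "F")

## Marginal SS for b: what is lost when b is dropped from the full model
b.margin <- deviance(lm(y ~ a)) - deviance(lm(y ~ a + b))

m.ab
```

Unlike the sequential table, the result does not depend on the order in which the predictors are written.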
Comparison

Type I            Type III
Effect plot

library(effects)
plot(allEffects(lm.full), ylim = c(2000,4000))

ylim = c(2000,4000): fix Y-axis values for all plots.
Each plot shows the effect of a variable with the other covariates set at their average.
Interaction
This model is for demonstration purposes.

  lm.full.int <- lm(bwt ~ age*lwt + smoke +
    ht + ui + age*ftv.cat + race.cat*preterm,
    data = lbw)

- age*lwt: Continuous * Continuous
- age*ftv.cat: Continuous * Categorical
- race.cat*preterm: Categorical * Categorical
Anova(lm.full.int, type = 3)
Output columns: marginal SS, degree of freedom. The interaction terms are listed in the table.
F = Mean SS / Mean SS of residual
plot(effect("age:lwt", lm.full.int))
Continuous * Continuous: one panel per lwt level
plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)
 Continuous * Categorical
plot(effect(c("race.cat*preterm"), lm.full.int),
x.var = "preterm", z.var = "race.cat", multiline = TRUE)
 Categorical * Categorical
A Critique of the Proposed National Education Policy Reform
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 

Linear regression with R 1

  • 1. Linear Regression with R 1: Prepare data/specify model/read results. 2012-12-07 @HSPH. Kazuki Yoshida, M.D., MPH-CLE student. FREEDOM TO KNOW
  • 2. Group Website is at: http://rpubs.com/kaz_yos/useR_at_HSPH
  • 3. Previously in this group: Introduction; Reading Data into R (1); Reading Data into R (2); Descriptive, continuous; Descriptive, categorical; Deducer; Graphics; Groupwise, continuous
  • 4. Menu: Linear regression
  • 5. Ingredients. Statistics: data preparation; model formula. Programming: within(); factor(), relevel(); lm(); formula = Y ~ X1 + X2; summary(); anova(), car::Anova()
  • 7. Create a new script and save it.
  • 9. We will use the lowbwt dataset used in BIO213: lowbwt.dat. http://www.umass.edu/statdata/statdata/data/lowbwt.txt http://www.umass.edu/statdata/statdata/data/lowbwt.dat
  • 10. Load dataset from web: lbw <- read.table("http://www.umass.edu/statdata/statdata/data/lowbwt.dat", header = TRUE, skip = 4). header = TRUE picks up the variable names; skip = 4 skips the first 4 rows.
  • 11. “Fix” dataset: lbw[c(10,39), "BWT"] <- c(2655, 3035). Replaces the data points in the 10th and 39th rows of the BWT column to make the dataset identical to the BIO213 dataset.
  • 12. Lower case variable names: names(lbw) <- tolower(names(lbw)). tolower() converts the variable names to lower case; the assignment puts them back into the variable names.
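The loading steps on slides 10 and 12 can be tried without network access. The sketch below is only an illustration: the file contents are made up (a hypothetical 4-line preamble, then a header row, then data) to mimic the layout that `skip = 4` and `header = TRUE` are meant to handle.

```r
# Hypothetical stand-in for a data file: 4 preamble lines, a header row, data rows.
raw <- c(
  "NAME: toy file (made up for illustration)",
  "SIZE: 3 observations, 3 variables",
  "SOURCE: not real data",
  "NOTE: the header row starts on line 5",
  "ID LOW BWT",
  "1 0 2500",
  "2 0 3100",
  "3 1 2300"
)

# skip = 4 drops the preamble; header = TRUE uses the next line as variable names
toy <- read.table(text = raw, header = TRUE, skip = 4)

# Convert the variable names to lower case and put them back
names(toy) <- tolower(names(toy))
str(toy)
```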
  • 17. dataset <- within(dataset, { _variable manipulations_ }). Left-hand side: name of the newly created dataset (here replacing the original). within() takes the dataset and performs the variable manipulations. Inside the braces you can specify variables by name only; no need for dataset$var_name.
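A minimal sketch of the within() pattern from slide 17, using a made-up data frame (the column names `weight_kg`, `height_m`, `bmi` and `heavy` are hypothetical and not part of the lowbwt dataset):

```r
# Toy data frame, invented for illustration
df <- data.frame(weight_kg = c(60, 72, 85),
                 height_m  = c(1.60, 1.75, 1.80))

# Variables are referred to by name only; no df$ prefix is needed inside the braces
df <- within(df, {
  bmi   <- weight_kg / height_m^2
  heavy <- weight_kg > 70
})

df
```

Assigning the result back to `df` is what replaces the original dataset; without the assignment the new variables are discarded.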
  • 18. lbw <- within(lbw, {
            ## Relabel race
            race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other"))
            ## Categorize ftv (frequency of visit)
            ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many"))
            ftv.cat <- relevel(ftv.cat, ref = "Normal")
            ## Dichotomize ptl
            preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+"))
        })
  • 19. Numeric to categorical: element by element (same within() block as slide 18). race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other")) categorizes race and labels it: 1 to White, 2 to Black, 3 to Other. The 1st level will be the reference.
  • 20. Explained more in depth: factor() to create a categorical variable. lbw <- within(lbw, { ## Relabel race; race.cat <- factor(race, levels = 1:3, labels = c("White","Black","Other")) }). Create a new variable named race.cat; take the race variable; order levels 1, 2, 3 (making 1 the reference level); label levels 1, 2, 3 as White, Black, Other.
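The factor() call on slides 19 and 20 can be checked on a short vector. The codes below are hypothetical values, not rows from the dataset:

```r
# Made-up race codes
race <- c(1, 3, 2, 1, 3)

# levels gives the order of the codes; labels renames them in that order
race.cat <- factor(race, levels = 1:3, labels = c("White", "Black", "Other"))

levels(race.cat)        # the first level is the one lm() uses as reference
table(race, race.cat)   # cross-check: each code maps to exactly one label
```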
  • 21. Numeric to categorical: range to element (same within() block as slide 18). ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf), labels = c("None","Normal","Many")). How breaks work: (-Inf, 0] is None; (0, 2], i.e. 1 and 2, is Normal; (2, Inf], i.e. 3, 4, 5, 6, ..., is Many. The 1st level will be the reference.
  • 22. Reset reference level (same within() block as slide 18). ftv.cat <- relevel(ftv.cat, ref = "Normal") changes the reference level of the ftv.cat variable from None to Normal.
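To see how the breaks on slide 21 and the relevel() on slide 22 behave, here is a small sketch with made-up visit counts (the vector `ftv` below is hypothetical):

```r
# Made-up numbers of visits
ftv <- c(0, 1, 2, 3, 6, 0)

# Intervals are right-closed by default: (-Inf, 0], (0, 2], (2, Inf]
ftv.cat <- cut(ftv, breaks = c(-Inf, 0, 2, Inf),
               labels = c("None", "Normal", "Many"))
data.frame(ftv, ftv.cat)

# Move "Normal" to the front so that it becomes the reference level
ftv.cat <- relevel(ftv.cat, ref = "Normal")
levels(ftv.cat)
```

relevel() only changes the order of the levels; the category assigned to each observation stays the same.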
  • 23. Numeric to Boolean to Category (same within() block as slide 18). preterm <- factor(ptl >= 1, levels = c(FALSE,TRUE), labels = c("0","1+")). A TRUE/FALSE vector is created by ptl >= 1. Levels then labels: ptl < 1 goes to FALSE, then to “0”; ptl >= 1 goes to TRUE, then to “1+”.
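The two-step conversion on slide 23 can be made visible by keeping the intermediate logical vector. The counts in `ptl` below are made up, and `is.preterm` is a hypothetical helper name:

```r
# Made-up counts of previous premature labours
ptl <- c(0, 1, 0, 3)

# Step 1: numeric to Boolean
is.preterm <- ptl >= 1

# Step 2: Boolean to category; FALSE is listed first, so "0" is the reference
preterm <- factor(is.preterm, levels = c(FALSE, TRUE), labels = c("0", "1+"))

data.frame(ptl, is.preterm, preterm)
```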
  • 24. Binary 0,1 to No,Yes (use one method or the other, not both)
        ## One-by-one method
        lbw <- within(lbw, {
            ## Categorize smoke ht ui
            smoke <- factor(smoke, levels = 0:1, labels = c("No","Yes"))
            ht    <- factor(ht,    levels = 0:1, labels = c("No","Yes"))
            ui    <- factor(ui,    levels = 0:1, labels = c("No","Yes"))
        })
        ## Alternative to above: loop method
        lbw[,c("smoke","ht","ui")] <- lapply(lbw[,c("smoke","ht","ui")], function(var) {
            factor(var, levels = 0:1, labels = c("No","Yes"))
        })
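A self-contained sketch of the loop method from slide 24, run on a made-up data frame (the values are invented; `vars` is a hypothetical helper name). It also shows that columns not named in the selection are left alone:

```r
# Toy data with three 0/1 columns and one untouched numeric column
toy <- data.frame(smoke = c(0, 1, 1),
                  ht    = c(0, 0, 1),
                  ui    = c(1, 0, 0),
                  age   = c(19, 33, 20))

vars <- c("smoke", "ht", "ui")

# lapply() applies the same recoding to each selected column
toy[vars] <- lapply(toy[vars], function(var) {
  factor(var, levels = 0:1, labels = c("No", "Yes"))
})

str(toy)
```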
  • 26. formula outcome ~ predictor1 + predictor2 + predictor3 SAS equivalent: model outcome = predictor1 predictor2 predictor3;
  • 27. In the case of t-test: age ~ zyg. Left side (age): continuous variable to be compared, the variable to be explained. Right side (zyg): grouping variable to separate groups, the variable used to explain.
  • 28. linear sum Y ~ X1 + X2
  • 29. n . All variables except for the outcome n + X2 Add X2 term n - 1 Remove intercept n X1:X2 Interaction term between X1 and X2 n X1*X2 Main effects and interaction term
  • 30. Interaction term Y ~ X1 + X2 + X1:X2 Main effects Interaction
  • 31. Interaction term Y ~ X1 * X2 Main effects & interaction
  • 32. On-the-fly variable manipulation: Y ~ X1 + I(X2 * X3). I() inhibits formula interpretation, for math manipulation. A new variable (X2 times X3) is created on-the-fly and used.
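One way to see what each formula symbol on slides 28 to 32 does is to look at the design-matrix columns it produces. This is a sketch on simulated data (the data frame `toy` and the `cols.*` names are hypothetical); model.matrix() is used here only as an inspection tool.

```r
# Simulated data, for illustration only
set.seed(1)
toy <- data.frame(Y = rnorm(6), X1 = rnorm(6), X2 = rnorm(6), X3 = rnorm(6))

cols.sum   <- colnames(model.matrix(Y ~ X1 + X2, toy))          # linear sum
cols.dot   <- colnames(model.matrix(Y ~ ., toy))                # all variables except Y
cols.noint <- colnames(model.matrix(Y ~ X1 + X2 - 1, toy))      # intercept removed
cols.star  <- colnames(model.matrix(Y ~ X1 * X2, toy))          # main effects + interaction
cols.long  <- colnames(model.matrix(Y ~ X1 + X2 + X1:X2, toy))  # same, written out
cols.I     <- colnames(model.matrix(Y ~ X1 + I(X2 * X3), toy))  # product computed on the fly

cols.star
cols.I
```

Without I(), X2 * X3 would be read as formula syntax (main effects plus interaction) rather than as arithmetic.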
  • 33. Fit a model: lm.full <- lm(bwt ~ age + lwt + smoke + ht + ui + ftv.cat + race.cat + preterm, data = lbw)
  • 34. See model object lm.full
  • 35. Annotated output: Call: (command repeated); coefficient for each variable
  • 37. Annotated output: Call: (command repeated); residual distribution; Coef/SE = t; dummy variables created; model R^2 and adjusted R^2; F-test
  • 38. ftv.catNone: people with no 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level). ftv.catMany: people with many 1st trimester visits compared to people with a normal number of 1st trimester visits (reference level).
  • 39. race.catBlack: Black people compared to White people (reference level). race.catOther: Other people compared to White people (reference level).
  • 41. Confidence intervals: lower boundary, upper boundary
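The "Coef/SE = t" annotation and the dummy-variable naming can be reproduced on any fitted model. The sketch below uses R's built-in mtcars data rather than the lowbwt dataset, so the variables (`mpg`, `wt`, `cyl`) are stand-ins chosen for illustration:

```r
# Built-in data; cyl is turned into a 3-level categorical predictor (4, 6, 8)
fit <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# Coefficient table from summary(): Estimate, Std. Error, t value, Pr(>|t|)
ct <- coef(summary(fit))

# t is the estimate divided by its standard error
t.manual <- ct[, "Estimate"] / ct[, "Std. Error"]
all.equal(t.manual, ct[, "t value"])

# One dummy variable per non-reference level; the reference level (4) has no row
rownames(ct)
```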
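For a linear model, the lower and upper boundaries are the estimate minus and plus a t quantile times the standard error. A minimal sketch with confint(), again on built-in mtcars data as a stand-in for the lowbwt model:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# 95% confidence intervals: one row per coefficient, lower and upper boundary
ci <- confint(fit)
ci

# The same boundaries computed by hand
est  <- coef(fit)
se   <- sqrt(diag(vcov(fit)))
crit <- qt(0.975, df = df.residual(fit))
manual <- cbind(lower = est - crit * se, upper = est + crit * se)
```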
  • 42. ANOVA table (type I) anova(lm.full)
  • 43. ANOVA table (type I) columns: degree of freedom; sequential SS; Mean SS = SS/DF; F = Mean SS / Mean SS of residual
  • 44. Type I = Sequential SS [Venn diagram of 1 age, 2 lwt, 3 smoke] 1st gets all in type I; 2nd gets all but its overlap with the 1st in type I; last gets only the remaining part in type I
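Because type I SS are sequential, the SS credited to a variable depends on where it appears in the formula. A sketch on built-in mtcars data (stand-in variables, not the lowbwt model; the `ss.*` names are hypothetical):

```r
# Same two predictors, entered in opposite orders
tab.a <- anova(lm(mpg ~ wt + hp, data = mtcars))
tab.b <- anova(lm(mpg ~ hp + wt, data = mtcars))

ss.wt.first <- tab.a["wt", "Sum Sq"]   # wt entered 1st: gets all of its SS
ss.wt.last  <- tab.b["wt", "Sum Sq"]   # wt entered last: gets only what remains
c(first = ss.wt.first, last = ss.wt.last)

# Entered 1st, wt gets the same SS as in a model containing wt alone
ss.wt.alone <- anova(lm(mpg ~ wt, data = mtcars))["wt", "Sum Sq"]

# The total explained SS does not depend on the order
total.a <- sum(tab.a[c("wt", "hp"), "Sum Sq"])
total.b <- sum(tab.b[c("wt", "hp"), "Sum Sq"])
```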
  • 45. ANOVA table (type III) library(car) Anova(lm.full, type = 3)
  • 46. ANOVA table (type III) columns: marginal SS; degree of freedom; F = Mean SS / Mean SS of residual. Multi-category variables are tested as one.
  • 47. Type III = Marginal SS [Venn diagram of 1 age, 2 lwt, 3 smoke] 1st gets only margin in type III; 2nd gets only margin in type III; last gets only margin in type III
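The marginal idea can be checked without the car package. The sketch below swaps in base R's drop1(), which tests each term given all the others; for a model with main effects only this is the same marginal comparison, but it is not the car::Anova() call from slide 45. Data are built-in mtcars, used as a stand-in.

```r
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# Marginal SS: each term is tested as if it had been entered last
marg <- drop1(fit, test = "F")
marg

# Sequential (type I) table for comparison
sequ <- anova(fit)

# For the term entered last, sequential and marginal SS coincide
ss.last.marg <- marg["factor(cyl)", "Sum of Sq"]
ss.last.sequ <- sequ["factor(cyl)", "Sum Sq"]

# A multi-category variable is tested as one term with 2 degrees of freedom
df.cyl <- marg["factor(cyl)", "Df"]
```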
  • 48. Comparison Type I Type III
  • 49. Effect plot: library(effects); plot(allEffects(lm.full), ylim = c(2000,4000)). ylim fixes the Y-axis values for all plots.
  • 50. Effect of a variable with the other covariates set at their averages
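The idea on slide 50 can be approximated by hand with predict(): vary one predictor over a grid and hold the other at its mean. This is a rough analogue under simplifying assumptions (numeric covariates only, built-in mtcars data, hypothetical name `grid`), not a reimplementation of the effects package.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Vary wt across its observed range; fix hp at its average
grid <- data.frame(
  wt = seq(min(mtcars$wt), max(mtcars$wt), length.out = 5),
  hp = mean(mtcars$hp)
)
grid$fit <- predict(fit, newdata = grid)
grid

# With hp held fixed, the fitted line has slope equal to the wt coefficient
slopes <- unname(diff(grid$fit) / diff(grid$wt))
```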
  • 52. This model is for demonstration purposes. lm.full.int <- lm(bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm, data = lbw). age*lwt: Continuous * Continuous; age*ftv.cat: Continuous * Categorical; race.cat*preterm: Categorical * Categorical
  • 54. Annotated output columns: marginal SS; degree of freedom; F = Mean SS / Mean SS of residual. Interaction terms are listed.
  • 55. Continuous * Continuous: plot(effect("age:lwt", lm.full.int)). Annotation on the plot: lwt level
  • 56. Continuous * Categorical: plot(effect("age:ftv.cat", lm.full.int), multiline = TRUE)
  • 57. Categorical * Categorical: plot(effect(c("race.cat*preterm"), lm.full.int), x.var = "preterm", z.var = "race.cat", multiline = TRUE)
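To see which terms the * shorthand in the slide 52 formula expands to, the formula can be inspected with terms() before fitting anything. No data are needed for this check; the names `f` and `labs` are hypothetical.

```r
# The formula from slide 52, used here without any data
f <- bwt ~ age*lwt + smoke + ht + ui + age*ftv.cat + race.cat*preterm

# Each a*b expands to a + b + a:b; repeated main effects are kept only once
labs <- attr(terms(f), "term.labels")
labs
```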