6. What is Regression Analysis?
regression analysis is a statistical process for estimating the
relationships among variables. It includes many techniques for
modeling and analyzing several variables, when the focus is on the
relationship between a dependent variable and one or
more independent variables.
http://en.wikipedia.org/wiki/Regression_analysis
7. Fitting a line
plot(galton$child, galton$parent, pch=19,col="blue")
lm1 <- lm(child ~ parent, data=galton)
lines(galton$parent,lm1$fitted,col="red", lwd=3)
The line
width
12. P-value
Most Common Measure of Statistical Significance
Idea: Suppose nothing is going on - how unusual is it to see the estimate we got?
Some typical values (single test)
P < 0.05 (significant)
P < 0.01 (strongly significant)
P < 0.001 (very significant)
13. Confidence intervals
A confidence interval is a type of interval estimate of a population
parameter and is used to indicate the reliability of an estimate
confint(lm1,level=0.95)
http://en.wikipedia.org/wiki/Confidence_interval
14. 2
R
R2 : the proportion of response variation "explained" by the
regressors in the model.
R2= 1 :the fitted model explains all variability in
R2 = 0 indicates no 'linear' relationship (for straight line
regression, this means that the straight line model is a constant line
(slope=0, intercept=bar{y}) between the response variable and
regressors).
http://en.wikipedia.org/wiki/Coefficient_of_determination
15. Adjusted R2
The use of an adjusted R2 (often written as bar R^2 and pronounced
"R bar squared") is an attempt to take account of the phenomenon
of the R2 automatically and spuriously increasing when extra
explanatory variables are added to the model.
http://en.wikipedia.org/wiki/Coefficient_of_determination
16. Predicting with Linear Regression
coef(lm1)[1] + coef(lm1)[2]*80
newdata <- data.frame(parent=80)
predict(lm1,newdata)
17. Multivariate Linear Regression
WHO childhood hunger data
Dataset:
http://apps.who.int/gho/athena/data/GHO/WHOSIS_000008.csv?pr
ofile=text&filter=COUNTRY:*
hunger <- read.csv("./hunger.csv")
hunger <- hunger[hunger$Sex!="Both sexes", ]
18. Multivariate Linear Regression
– cont.
lmBoth <- lm(hunger$Numeric ~ hunger$Year + hunger$Sex)
lmBoth2 <- lm(hunger$Numeric ~ hunger$Year + hunger$Sex +
hunger$Sex*hunger$Year)
Same slopes
Different
slopes
20. Regression with Factor Variables
Outcome is still quantitative
Covariate(s) are factor variables
Fitting lines = fitting means
Want to evaluate contribution of all factor levels at once
21. Regression with Factor Variables – cont.
Dataset: http://www.rossmanchance.com/iscam2/data/movies03RT.txt
movies <- read.table("./movies.txt",sep="t",header=T,quote="")
head(movies)