Statistical Analysis Software
Lecture 4: Bivariate and Multivariate Regression Analysis

Academic Department of Marketing
Caucasus School of Business
Caucasus University
2011
Problems of Test 1
 • Formulating null and alternative
   hypotheses incorrectly
 • Ignoring the question “why”
 • Ignoring the necessity to comment on
   the scale used
 • Mixing up the Wilcoxon test and the
   paired samples t test
 • Widely ignoring the necessity to check
   the equality of variances (Levene’s test)
 • Kolmogorov-Smirnov test
Homework 1
 • Three or four homework assignments will be
   given throughout the course. You will be
   told how many points each assignment is worth.
 • The first homework assignment includes
   two problems. The first is the ANOVA
   problem from Test 1; each of you will
   have an individual database. The second problem
   is about using Pearson’s chi-square
   statistic in cross-tabulations; for this one you
   will have to come up with your own example and
   your own fictional database.
 • The assignment is worth 2 points and is due
Important Note (Homework)
 • EVEN IF ALL THE INTERPRETATION IS
   CORRECT, YOU WILL GET ZERO POINTS
   IF YOU SUBMIT THE WRONG OUTPUT,
   WHETHER BECAUSE YOU RAN THE WRONG
   TEST OR BECAUSE YOU USED
   SOMEBODY ELSE’S DATASET.
Warming Up – Linear Equations
 • What does a linear relationship imply?
 • What does a linear relationship look like
   (mathematically)?
 • What are the variables in this equation
   and what are the parameters?
 • How are the parameters interpreted?
Scatterplot (1)
 • Scatterplot – a collection of points (x, y) in a
   coordinate system. Each point on a scatterplot
   depicts a single case with a specific X value
   and a specific Y value, which you can read off the
   X and Y axes.
Scatterplot (2)
 • As we see, there is a positive relationship
   between income and saving: the higher
   the income, the higher the saving.
 • But are we interested only in the
   direction? Not really. It is important to
   measure by how much saving increases as
   income increases by, say, 1 Lari.
 • By saying this we imply that there is a
   linear relationship between income and
   saving (which is not necessarily true, but
   let’s ignore this for now).
Scatterplot (3)
 • Going back to our scatterplot, we need
   to find a line (i.e., determine the
   intercept and the slope) that best
   describes the relationship between the two
   variables (in this case saving and
   income).
 • This is exactly where regression comes
   into play – it helps to identify such a line
   by using the sample information.
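A minimal sketch of how such a line can be found: ordinary least squares (OLS), the usual method behind linear regression output, picks the intercept and slope that minimize the sum of squared vertical distances between the points and the line. With a single X the solution has a closed form. The income and saving numbers below are made up for illustration; they are not the lecture's dataset, and the function name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals (OLS)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b1 = s_xy / s_xx
    # The OLS line always passes through the point of means (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical sample: income and saving, both in Lari.
income = [400, 600, 800, 1000, 1200]
saving = [150, 230, 240, 310, 330]
b0, b1 = fit_line(income, saving)
print(f"intercept = {b0:.2f}, slope = {b1:.4f}")
```

For these five invented points the line is Ŷ = 76 + 0.22*X.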
Bivariate Regression Model
 • In theory, the relationship between saving
   and income already exists and is
   somewhere out there – we can’t really
   determine it in practice. Why? Because we
   would need to collect information about
   everybody’s income and everybody’s
   saving (i.e. we would need information
   about the whole population).
 • If we could, the bivariate regression model
   would look like this:
   Y = β0 + β1*X, where Y is saving and X is
   income.
Error Term
 • Note that even in the ideal case, where we
   have information about the population, we are
   still unable to exactly predict the level of saving
   by the level of income. Why? Because income
   is not the only factor that determines saving.
   There are other factors that aren’t accounted
   for in our bivariate regression model.
 • All the other factors not explicitly accounted
   for in the regression model fall into the so-called
   error term, denoted by ε.
 • Therefore, the population regression model
   looks like this:
                     Y = β0 + β1*X + ε
Linear Regression Analysis (Bivariate)
 • Identifying the line that depicts the
   relationship between X and Y boils down to
   estimating β0 and β1.
 • Regression provides us with estimates
   (regression coefficients) of β0 and β1,
   denoted by b0 and b1.
 • The estimated regression model looks like
   this:
                   Ŷ = b0 + b1*X
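To see the difference between the population parameters β0, β1 and their estimates b0, b1, here is a small simulation sketch. It pretends to know the population model Y = β0 + β1*X + ε (the values β0 = 100, β1 = 0.15 and a normally distributed error with standard deviation 50 are invented for illustration), draws a random sample, and estimates the line from the sample alone. The estimates land close to, but not exactly on, the true values, and a different sample would give slightly different b0 and b1.

```python
import random


def simulate_sample(n, beta0, beta1, sigma, rng):
    """Draw n cases from the population model Y = beta0 + beta1*X + eps."""
    xs = [rng.uniform(0, 1000) for _ in range(n)]
    # eps stands for everything besides X that affects Y; here modeled as N(0, sigma^2).
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys


def ols(xs, ys):
    """Bivariate OLS estimates (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


rng = random.Random(42)  # fixed seed so the run is reproducible
BETA0, BETA1, SIGMA = 100.0, 0.15, 50.0  # hypothetical "true" population values
xs, ys = simulate_sample(5000, BETA0, BETA1, SIGMA, rng)
b0, b1 = ols(xs, ys)
print(f"true:      beta0 = {BETA0:.1f}, beta1 = {BETA1:.3f}")
print(f"estimated: b0    = {b0:.1f}, b1    = {b1:.3f}")
```

In real research the first two lines of output are unknowable; only b0 and b1 are available.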
Interpreting Regression Coefficients
 • Ŷ = b0 + b1*X
 • Ŷ – the predicted value of Y for a given
   value of X.
 • b0 – intercept: the predicted value of Y
   when X = 0.
 • b1 – slope estimate: by how much the
   predicted value of Y changes as X changes by 1
   unit.
Residual
 • The residual is the difference between the
   actual value of Y and the predicted value of
   Y, and is denoted by e.
 • e = Y – Ŷ
 • Do not mix up the residual and the error term.
   They are NOT the same. We never observe
   the error term, but we can easily
   compute the residual. The residual is an
   estimate of the error term.
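Residuals are straightforward to compute once b0 and b1 are known. The sketch below uses a made-up sample; b0 = 76 and b1 = 0.22 are the OLS estimates for exactly these five points. A useful sanity check is that, when the model includes an intercept, OLS residuals always sum to (numerically) zero.

```python
def residuals(xs, ys, b0, b1):
    """e_i = Y_i - Yhat_i, where Yhat_i = b0 + b1 * X_i."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


# Hypothetical data; b0 = 76 and b1 = 0.22 are the OLS estimates for these points.
income = [400, 600, 800, 1000, 1200]
saving = [150, 230, 240, 310, 330]
e = residuals(income, saving, b0=76.0, b1=0.22)
print([round(v, 2) for v in e])
print(sum(e))  # essentially zero, up to floating-point rounding
```

Here the residuals are -14, 22, -12, 14 and -10 Lari: the first family saves 14 Lari less than the line predicts, the second 22 Lari more, and so on.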
Linear Regression – How to Do It in SPSS
Linear Regression – Output
 • Thus, if income is 0, the predicted saving is
   124.842 Lari, and if income increases by
   1 Lari, predicted saving increases by 0.147 Lari.
 • Is this model appropriate for predicting the level
   of saving? Not really. Saving is also
   determined by other factors, such as family
   size and the education level, age and gender
   of the household head. (Of course there may be
   other determinants as well, but let’s focus on
   these for now.)
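The two estimated coefficients from the bivariate output (b0 = 124.842, b1 = 0.147) are all that is needed to produce predictions. A minimal sketch; the income value of 1000 Lari is just an example input, and the function name is hypothetical.

```python
B0 = 124.842  # intercept from the bivariate output
B1 = 0.147    # slope from the bivariate output


def predicted_saving(income):
    """Predicted saving (Lari) for a given income (Lari): Yhat = b0 + b1 * X."""
    return B0 + B1 * income


print(round(predicted_saving(0), 3))     # 124.842, the intercept itself
print(round(predicted_saving(1000), 3))  # 271.842
# Raising income by 1 Lari raises predicted saving by exactly b1.
print(round(predicted_saving(1001) - predicted_saving(1000), 3))  # 0.147
```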
Multiple Regression Analysis
 • Multiple regression includes more
   than one independent variable in the
   regression model:
   Y = β0 + β1*X1 + β2*X2 + β3*X3 + … + βk*Xk + ε
 • In this case we need to estimate k+1
   parameters, β0, β1, …, βk; their estimates
   are denoted b0, b1, …, bk.
 • Interpretation of slope coefficients: b1
   shows by how much predicted Y changes as
   X1 changes by 1 unit, holding all other X-s constant.
 • Interpretation of the intercept: b0 is the predicted
   value of Y when all the X-s are equal to
   zero.
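With several X-s there is no longer a one-line formula per coefficient; OLS instead solves the k+1 "normal equations" (X'X)b = X'y, where X is the data matrix with a leading column of ones for the intercept. The pure-Python sketch below (Gaussian elimination, no external libraries, hypothetical function name) is for illustration only; statistical packages typically use more numerically stable matrix decompositions. The data are invented so that Y = 1 + 2*X1 + 3*X2 holds exactly, which makes the correct answer easy to check.

```python
def ols_multiple(rows, ys):
    """OLS for Y = b0 + b1*X1 + ... + bk*Xk via the normal equations.

    rows: list of [x1, ..., xk] per case; ys: list of Y values.
    Returns [b0, b1, ..., bk].
    """
    X = [[1.0] + list(map(float, r)) for r in rows]  # prepend intercept column
    p = len(X[0])  # number of parameters, k + 1
    n = len(X)
    # Build the augmented system [X'X | X'y].
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * ys[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: some X is a linear combination of others")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for r in range(p - 1, -1, -1):
        b[r] = (A[r][p] - sum(A[r][c] * b[c] for c in range(r + 1, p))) / A[r][r]
    return b


# Hypothetical data generated exactly from Y = 1 + 2*X1 + 3*X2.
rows = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 5]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b = ols_multiple(rows, ys)
print([round(v, 6) for v in b])
```

Because the invented data contain no error term, the estimates recover 1, 2 and 3 exactly (up to rounding); with real data they would only approximate the population parameters.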
Multiple Regression - Output
Major Goals of Conducting Regression Analysis
 • Goal 1. Measuring partial effects: by how
   much does Y change when X1 changes by 1
   unit, holding all other X-s constant?
 • Goal 2. Forecasting the values of the
   dependent variable: what is the predicted
   saving (in Lari) of a family with an income
   of 1000 Lari and 5 members, whose household
   head studied for 15 years and is 47 years old?
 • Regression provides answers to both
   questions.
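Both goals reduce to simple arithmetic once the coefficients are estimated. The lecture's multiple regression output is not reproduced in this text, so the coefficient values below are purely hypothetical placeholders; substitute the b-values from your own SPSS output. The family profile is the one from Goal 2.

```python
# HYPOTHETICAL coefficients for illustration only, not the lecture's estimates.
COEFS = {
    "intercept": 50.0,
    "income": 0.12,        # Lari of saving per Lari of income
    "family_size": -15.0,  # Lari per additional family member
    "education": 4.0,      # Lari per year of schooling of the household head
    "age": 0.5,            # Lari per year of age of the household head
}


def predict(profile, coefs=COEFS):
    """Yhat = b0 + sum of b_j * X_j over the explanatory variables."""
    return coefs["intercept"] + sum(coefs[name] * value for name, value in profile.items())


family = {"income": 1000, "family_size": 5, "education": 15, "age": 47}
# Goal 2: forecast saving for this particular family.
print(round(predict(family), 2))

# Goal 1: partial effect of income, holding the other X-s constant.
richer = dict(family, income=family["income"] + 1)
print(round(predict(richer) - predict(family), 4))  # equals the income coefficient
```

With these made-up coefficients the forecast is 178.5 Lari, and raising income by 1 Lari (everything else fixed) moves the prediction by exactly the income coefficient, 0.12 Lari.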
Predictive Power of a Model
 • To know how good our model is for
   forecasting, we need to measure its
   predictive power. In other words, we want
   to know how well the independent variables
   explain the dependent variable.
 • The coefficient of determination (R-squared)
   is widely used for this purpose.
Coefficient of Determination – R-Squared (1)
 • The coefficient of determination (R-squared)
   measures the share of the variation in Y
   explained by the variation in the X-s; in other
   words, how much of the variation in the
   dependent variable is explained by the
   independent variables.
 • It is also called goodness-of-fit.
 • R-squared ranges from 0 to 1 and shows how well
   the regression line describes the data cloud that
   you see on the scatterplot.
 • The closer the data are clustered around the
   regression line, the closer R-squared is to 1.
   R² = 1 is a perfect fit (it practically never occurs
   with real data). The closer R-squared is to 0,
   the worse the fit.
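In formula terms, R² = 1 − SSE/SST, where SST = Σ(Y − Ȳ)² is the total variation of Y around its mean and SSE = Σ(Y − Ŷ)² is the part left unexplained (the sum of squared residuals). A short sketch; the numbers are a small made-up sample together with the OLS line fitted to it (b0 = 76, b1 = 0.22). The 0-to-1 range holds for OLS models that include an intercept.

```python
def r_squared(ys, fitted):
    """R^2 = 1 - SSE/SST for observed ys and model predictions `fitted`."""
    y_bar = sum(ys) / len(ys)
    sst = sum((y - y_bar) ** 2 for y in ys)              # total variation
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
    return 1 - sse / sst


income = [400, 600, 800, 1000, 1200]
saving = [150, 230, 240, 310, 330]
fitted = [76 + 0.22 * x for x in income]  # OLS line for these hypothetical points
print(round(r_squared(saving, fitted), 4))
```

Here SST = 20480 and SSE = 1120, so R² ≈ 0.945: about 94.5% of the variation in saving in this toy sample is explained by income.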
Coefficient of Determination – R-Squared (2)
 • For example, if R-squared is equal to
   0.045, the independent variables explain
   only 4.5% of the variation in the dependent
   variable.
 • This is an example of low predictive
   power.
 • The higher the R-squared, the better the
   predictive power of your model.
Testing Significance of Regression Coefficients (1)
 • As mentioned above, the other goal of
   regression analysis is to determine
   partial effects.
 • Partial effects measure the pure
   effect of each independent
   variable on the dependent variable.
 • What we want to know is whether these
   pure effects are statistically significant,
   i.e., distinguishable from zero. How can we
   find this out?
 • By testing the significance of
   the regression coefficients.
Testing Significance of Regression Coefficients (2)
 • Suppose we want to test whether the age
   of the household head (X4) has a significant
   effect on saving once all the other factors
   (household size, income, education of the
   household head) are controlled for.
 • The null hypothesis is that β4 = 0 (i.e., as X4
   changes by 1 unit, nothing happens to Y:
   there is no effect on Y).
 • The alternative hypothesis is that β4 is
   different from 0 (two-tailed test).
Testing Significance of Regression Coefficients (3)
 • It can be shown that if we divide the estimate of
   β4 (b4) by the standard error of b4 (the estimated
   standard deviation of the sampling distribution
   of b4), the resulting statistic follows, under the
   null hypothesis and the standard regression
   assumptions, a t distribution with n − k − 1
   degrees of freedom.
 • Thus, we can either calculate the t statistic and
   compare it to the critical t value at the 5%
   significance level, or we can simply look at the
   p-value (Sig.) of the regression coefficient. If the
   latter is less than 0.05, we conclude that the
   regression coefficient is significantly different
   from zero (or, for short, significant). In other
   words, the partial effect of this variable is
   statistically significant.
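The t statistic itself is simple to compute. The sketch below does it for the slope in a bivariate regression, where the standard error has the closed form se(b1) = sqrt( [SSE/(n − 2)] / Σ(X − X̄)² ). In multiple regression the standard error comes from the full model (SPSS reports it in the "Std. Error" column), but the test works the same way, with n − k − 1 degrees of freedom. The data are made up, and the critical value 3.182 (two-tailed, 5%, 3 degrees of freedom) is taken from a standard t table.

```python
import math


def slope_t_test(xs, ys):
    """Return (b1, se_b1, t) for the slope in Y = b0 + b1*X, testing H0: beta1 = 0."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)  # n - 2 df: two parameters estimated
    return b1, se_b1, b1 / se_b1


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
b1, se, t = slope_t_test(xs, ys)
T_CRIT = 3.182  # two-tailed 5% critical value for df = n - 2 = 3 (from a t table)
print(f"b1 = {b1:.3f}, se = {se:.4f}, t = {t:.3f}")
print("significant at 5%" if abs(t) > T_CRIT else "not significant at 5%")
```

Here t ≈ 2.121 is below 3.182, so with only five observations the slope of 0.6 is not significantly different from zero at the 5% level.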
Testing Significance of Regression Coefficients – Example
 • Going back to our multiple regression
   example, no single independent variable
   appears to be statistically significant: all
   the p-values are greater than 0.05.
 • However, even though these variables
   are individually insignificant, they may
   still be jointly significant.
 • This hypothesis is tested by the joint F test.
Joint F Test
 • Null Hypothesis: β1 = β2 = β3 = β4 = 0
 • Alternative Hypothesis: at least one of them is different
   from zero.
 • This is equivalent to testing whether the population R² equals 0.
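Because the joint F test is equivalent to testing R² = 0, the F statistic can be written directly in terms of R²: F = (R²/k) / ((1 − R²)/(n − k − 1)), with k and n − k − 1 degrees of freedom, where k is the number of independent variables and n the number of cases. A sketch with a hypothetical function name; the inputs in the usage line are hypothetical (R² = 0.045 as in the earlier R-squared example, with an assumed n = 100), and the critical value must come from an F table. In practice one simply reads the Sig. value of the F statistic in the ANOVA table of the SPSS regression output.

```python
def f_from_r2(r2, n, k):
    """Overall F statistic for H0: beta1 = ... = betak = 0.

    r2: R-squared of the model; n: number of cases; k: number of X variables.
    """
    if not 0 <= r2 < 1:
        raise ValueError("R-squared must be in [0, 1)")
    df_model, df_resid = k, n - k - 1
    if df_resid <= 0:
        raise ValueError("need more cases than estimated parameters")
    return (r2 / df_model) / ((1 - r2) / df_resid)


# Hypothetical inputs: R^2 = 0.045 from a model with k = 4 predictors and n = 100 cases.
F = f_from_r2(0.045, n=100, k=4)
print(round(F, 3))
```

The result, F ≈ 1.119 with 4 and 95 degrees of freedom, is then compared with the critical F value at the chosen significance level.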
Important Note
 • It can happen that all the coefficients
   are individually insignificant but jointly
   significant, although in our example
   they are jointly insignificant as well at the
   5% significance level.
 • It can also happen that regression
   coefficients are individually significant
   but jointly insignificant. WHEN?
