4. A researcher is interested in the level of ideological consistency among Democrats and Republicans.
She creates a measure of ideological consistency that ranges from 0 (total lack of consistency) to 10
(absolute consistency). What kind of statistical test should the researcher employ?
A. Chi-Square
B. Guessing
C. Differences of Means Test
D. Correlation
5. Regression is appropriate when our dependent variable is measured at what level of measurement?
A. Interval
B. Ordinal
C. Nominal
D. Dummy
6. A type one error occurs…
A. When we incorrectly fail to reject the null hypothesis even though it is false
B. When we have measurement error in one of our variables
C. When we incorrectly reject the null hypothesis even though it is true
D. When the results of our analysis do not support our alternative hypothesis
7. Below are four different hypotheses. Which of the four should be tested using a one-tailed test?
A. Democrats and Republicans will differ in their support for tax cuts
B. Republicans will be more supportive of tax cuts than Democrats
C. Republicans and Democrats will not differ in their support for tax cuts
D. Support for tax cuts will differ by party
8. In Chi Square testing our expected frequencies are…
A. The frequencies we would expect to observe if the null hypothesis was true
B. The frequencies we actually observe
C. The frequencies we would expect to observe if the alternative hypothesis was true
D. The frequencies we would expect to observe if the null hypothesis was false
9. A researcher is interested in testing whether males and females differ in their level of political
knowledge. To test this the researcher administers a political knowledge test to a sample of 10 males
and 10 females. Tests are scored out of 100 points. What statistical test should the researcher use to
test her hypothesis that males and females will differ in their level of political knowledge? (Hint: think
about what test is appropriate for the level of measurement of these variables.)
A. Correlation
B. Chi Square
C. Difference of Means Test
D. Standard Deviation
10. Outliers are a particular problem for which statistical test?
A. Correlation
B. Regression
C. Difference of Means
D. Chi Square
11. In regression our constant (Y intercept) is equal to:
A. The predicted value of Y when all of the X’s in our model = 0
B. The expected change in Y associated with a one unit change in X
C. The predicted value of X when all the Y’s in our model = 0
D. The expected change in X associated with a one unit change in Y
12. If we decrease our probability of making a Type 1 error we…
A. Decrease our probability of making a Type 2 error
B. Increase our probability of making a Type 2 error
C. Have the same probability of making a Type 2 error
D. Have 0 probability of making a Type 2 error
13. Correlation and regression analysis are appropriate for which type of relationship?
A. A curvilinear relationship
B. An exponential relationship
C. A linear relationship
D. A non-linear relationship
14. When we are testing for a difference in means and we fail to reject the null hypothesis this indicates that…
A. The means of the two samples are exactly identical
B. The means of the two samples are significantly different
C. Any difference between the means of the two samples is most likely a result of true population differences
D. Any difference between the means of the two samples is most likely a result of sampling error
15. If there is no relationship between two variables their correlation would be equal to…
A. -1
B. 0
C. .5
D. 1
Part 2 Computational Questions: Write in the correct answer (70 points)
16. A researcher theorizes that Democrats and Republicans will differ in their support for gun control.
He surveys 25 Democrats and 15 Republicans about their support for gun control legislation and records
their answers on a standardized measure where higher scores represent greater support for gun control.
He finds that the mean level of support among Democrats is 6.0, with a standard deviation (S) of 1.2.
The mean level of support for Republicans is 4.9, with a standard deviation (S) of 1.4. (8 points)
A. Using a difference of means test, test the null hypothesis that there is no difference between
Democrats and Republicans in their support for gun control legislation.
B. Do the results of this test support the researcher’s hypothesis?
17. A researcher is interested in determining whether reading an article against the death penalty
makes people less supportive of the death penalty. To test this hypothesis, she conducts an experiment
where she gives 5 participants an article to read that is critical of the death penalty. 5 other participants
do not get to read the article. The researcher then asks all 10 participants how supportive they are of
the death penalty to see whether there are any differences in support for the death penalty between
the two groups. She records their answers on a 10 point scale with higher scores indicating more
support for the death penalty. Her results are reported in the table below. (10 points)
A. Using these results and a difference of means test, test the null hypothesis that participants will be
equally supportive of the death penalty after reading the article.
Did not read article   Read article
2                      1
5                      5
6                      6
7                      4
7                      4
B. Do the results of this test support the researcher’s hypothesis?
18. A researcher is interested in determining whether favorite ice cream flavors are equally distributed
among the population. He asks 20 randomly selected people to report their favorite ice cream flavor.
The results are reported below. (8 points)
A. Using a Chi Square test, test the null hypothesis that favorite ice cream flavors are equally distributed
among the population
Ice Cream Flavor   Observed Frequency
Chocolate          11
Vanilla            6
Strawberry         3
Pistachio          0
Total              20
B. What can you conclude about the distribution of favorite ice cream flavors in the population?
19. Below are the results from a cross tabulation of gender and political ideology. (8 points)
         Liberal   Moderate   Conservative   Total
Male     30        25         50             105
Female   40        35         20             95
Total    70        60         70             200
A. Fill in the table below with the expected frequency for each cell.
         Liberal   Moderate   Conservative
Male
Female
B. Calculate the Chi Square test statistic
Cell           Observed Frequency
Male Lib.      30
Male Mod.      25
Male Cons.     50
Female Lib.    40
Female Mod.    35
Female Cons.   20
Total          200
C. Using your Chi Square statistic from part B, test the null hypothesis that there is no difference in
political ideology by gender.
D. Based on the results of this test what can you conclude about the relationship between gender and
ideology?
20. A researcher theorizes that people who read more books will do better on a vocabulary quiz. To test this
hypothesis the researcher surveyed 4 people and asked them the number of books they had read in the
past year. He also gave them a short vocabulary quiz, which was scored out of 10 points. The results are
reported below. (6 points)
A. Calculate the correlation between number of books read and scores on the vocabulary quiz.
# of Books Read   Quiz Score
5                 8
4                 6
6                 9
1                 5
B. What does the correlation say about the size and strength of the relationship between the number of
books read and vocabulary quiz scores?
C. Is the correlation significant? Why or why not?
21. Below is a correlation matrix between X, Y, and Z. (4 points)
     X      Y      Z
X    1.00   .77    .96
Y    .77    1.00   .73
Z    .96    .73    1.00
A. Find the partial correlation between X and Y controlling for Z
22. Use the table below to regress Y on X (10 points)
X   Y
5   6
9   7
6   5
8   6
2   2
A. Find the slope coefficient (b) for X and explain what it means.
B. Find the constant (a) and explain what it means.
C. Find R² for the regression and explain what it means.
D. Write the regression equation and plot the line on the scatterplot below.
E. What is the predicted value of Y when X = 4?
[Blank scatterplot grid: X axis from 0 to 10, Y axis from 0 to 7]
23. A researcher theorizes that the more educated a state is the higher voter turnout in the state will be.
To test this theory the researcher measures the percentage of the population in each state that has a
college degree and also the state’s voter turnout in the last election. The researcher then regresses the
state’s voter turnout on the percent of the state with a college degree. The results from the regression
are reported below. (8 points)
     turnout |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
     college |   .4498426   .2153555     2.09   0.042     .0168413    .8828438
       _cons |   47.74297   5.649103     8.45   0.000     36.38469    59.10125
A. What is the constant and how should it be interpreted in the terms of this example?
B. What is the slope coefficient for the college variable and how should it be interpreted in terms of this
example?
C. Is the slope coefficient on the college variable significant? Why or why not?
D. Do the results of this regression support the researcher’s hypothesis? Why or why not?
Regression?
                                             Categorical/Ordinal Dependent Variable   Interval Dependent Variable
Categorical/Ordinal Independent Variable     Chi Square                               Difference of Means
Interval Independent Variable                Not Covered!                             Correlation OR Regression (bivariate)
• Similar to correlation analysis in many ways
• Requires interval level dependent variables
• Similar calculations
• More precisely specifies the relationship between X and Y
• The independent variable X is used to predict values of the dependent variable Y
• The coefficient on X tells us the effect of a one unit change in X on our predicted value of Y
Regression Intuition
• Y Intercept, or a
• Slope, or b
Regression Analysis
Y = a + bX
Y = 1.30 + .59(X)
Regression Analysis
Y = 14 + 3X
Calculating the Regression Line
Calculating b: Specific Steps
Calculating a: Specific Steps
Drawing the Regression Line
Using the Regression Line for Prediction
Regression Analysis- Example
X   Y   X²   XY
1   2   1
3   3   9
3   5   9
4   6   16
5   8   25
5   4   25
7   7   49
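The slide steps for calculating b and a did not survive extraction, so the following is only a minimal sketch. It assumes the standard computational formulas b = (NΣXY − ΣXΣY) / (NΣX² − (ΣX)²) and a = mean(Y) − b·mean(X), and applies them to the seven X and Y values in the example table. The function name `fit_line` is hypothetical.

```python
# Example data from the table above (X and Y columns).
xs = [1, 3, 3, 4, 5, 5, 7]
ys = [2, 3, 5, 6, 8, 4, 7]


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX.

    Assumes the standard computational formulas; the slide's own
    steps were lost, so this is an illustration, not the course method.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)
    return a, b


a, b = fit_line(xs, ys)
print(f"Y = {a:.2f} + {b:.2f}(X)")
```

Under these assumptions the slope and constant agree with the x1 and _cons coefficients in the Stata output for y1 shown later in these notes.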
Regression Exercise
• Suppose we hypothesize that people who miss class more often perform worse academically. To test the hypothesis we ask four people how many classes they missed and their GPA. Use regression analysis to calculate the effect of missing class on a person’s expected GPA. Do the results support our hypothesis?
# of Absences   GPA
0               4.0
2               3.0
3               3.0
7               2.0
Review- Regression
Regression Analysis- Example
X – Hours of FoxNews watched
Y – Political Knowledge Score
[Data table with columns X, Y, XY lost in extraction; surviving entries of the XY column: 27, 42, 48, 60, 227]
Regression Analysis- Example
Y = 11 + (-.5)(X)
Points on the line: (0,11) and (6,8)
Review- Regression Analysis Example
Review- Regression Analysis Example Cont.
X Y XY
1 2
3 3
3 5
4 6
5 8
5 4
7 7
(0,11)
(6,8)
Interpreting Regression Output
      Source |       SS       df       MS              Number of obs =       7
             |                                         F(  1,     5) =    7.08
       Model |  16.4090909     1  16.4090909           Prob > F      =  0.0449
    Residual |  11.5909091     5  2.31818182           R-squared     =  0.5860
             |                                         Adj R-squared =  0.5032
       Total |          28     6  4.66666667           Root MSE      =  1.5226

          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          x1 |   .8636364   .3246104     2.66   0.045     .0291988    1.698074
       _cons |   1.545455   1.420253     1.09   0.326    -2.105423    5.196332
Annotated parts of the output: b (the coefficient on x1), a (the constant, _cons), T values, P values, Confidence Intervals, N (Number of obs)
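To see how the pieces of the output relate to one another, here is a minimal sketch that recomputes three of the reported statistics from other numbers in the same y1 output. It assumes the usual definitions: t = coefficient / standard error, R-squared = Model SS / Total SS, and F = Model MS / Residual MS. The variable names are illustrative only.

```python
# Values copied from the y1 regression output above.
coef, std_err = 0.8636364, 0.3246104          # x1 row
ss_model, ss_residual, ss_total = 16.4090909, 11.5909091, 28.0
df_model, df_residual = 1, 5

# t value for the slope: coefficient divided by its standard error.
t_value = coef / std_err

# R-squared: share of the total sum of squares explained by the model.
r_squared = ss_model / ss_total

# F statistic: model mean square over residual mean square.
f_value = (ss_model / df_model) / (ss_residual / df_residual)

print(round(t_value, 2), round(r_squared, 4), round(f_value, 2))
```

The recomputed values should line up with the t, R-squared, and F entries reported in the output, up to rounding.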
Interpreting Regression Output
      Source |       SS       df       MS              Number of obs =       5
             |                                         F(  1,     3) =    5.57
       Model |         6.5     1         6.5           Prob > F      =  0.0994
    Residual |         3.5     3  1.16666667           R-squared     =  0.6500
             |                                         Adj R-squared =  0.5333
       Total |          10     4         2.5           Root MSE      =  1.0801

          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          x2 |        -.5   .2118296    -2.36   0.099    -1.174136    .1741364
       _cons |         11   1.359676     8.09   0.004     6.672905    15.32709
Interpreting Regression Output
. reg obamatherm partyid7
      Source |       SS       df       MS              Number of obs =    1290
             |                                         F(  1,  1288) = 1305.27
       Model |  801679.413     1  801679.413           Prob > F      =  0.0000
    Residual |  791073.363  1288  614.187394           R-squared     =  0.5033
             |                                         Adj R-squared =  0.5029
       Total |  1592752.78  1289  1235.64994           Root MSE      =  24.783

  obamatherm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    partyid7 |   11.81617   .3270595    36.13   0.000     11.17454     12.4578
       _cons |   .1500793   1.503553     0.10   0.921    -2.799602    3.099761
Interpreting Regression Output
      Source |       SS       df       MS              Number of obs =    1288
             |                                         F(  1,  1286) =    4.87
       Model |  6001.48622     1  6001.48622           Prob > F      =  0.0276
    Residual |  1586277.95  1286  1233.49763           R-squared     =  0.0038
             |                                         Adj R-squared =  0.0030
       Total |  1592279.44  1287  1237.20236           Root MSE      =  35.121

  obamatherm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
poliinterest |  -2.110292   .9567143    -2.21   0.028    -3.987184   -.2333998
       _cons |   52.86189   2.239573    23.60   0.000     48.46827     57.2555
Requirements for using Regression
• Both variables are measured at the interval level
• There is some flexibility with this for independent variables
• Linear relationship
• Random sample
• Characteristics are normally distributed OR N > 30
Review-The Correlation Coefficient (r)
• A coefficient that tells us about the strength and direction of a relationship
• Always ranges from -1 to 1
• Direction:
• Positive numbers indicate a positive relationship
• Negative numbers indicate a negative relationship
• Strength:
-1     Perfect Neg. Correlation
-.6    Strong Neg. Correlation
-.3    Moderate Neg. Correlation
-.1    Weak Neg. Correlation
0      No Correlation
.1     Weak Pos. Correlation
.3     Moderate Pos. Correlation
.6     Strong Pos. Correlation
1      Perfect Pos. Correlation
Review- The Computational Formula for r
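The formula itself did not survive on this slide. As a minimal sketch, assuming the standard computational formula r = (NΣXY − ΣXΣY) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)], the calculation can be written as below. The data are borrowed from the seven-case regression example elsewhere in these notes, and `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the computational (raw-score) formula.

    Assumed standard formula; the slide's own version was lost.
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# X and Y values from the regression example (N = 7).
r = pearson_r([1, 3, 3, 4, 5, 5, 7], [2, 3, 5, 6, 8, 4, 7])
print(round(r, 3), round(r ** 2, 3))
```

Squaring r for these data gives a value close to the R-squared reported in the Stata output for y1, which is what one would expect in a bivariate regression.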
The Potential Problem of Correlation
• Correlation measures the strength and the direction of a relationship between TWO variables
• But what about other variables that may affect this relationship?
• Considering another variable may change the observed strength and/or direction of the relationship between the original two variables
The Potential Problems of Correlation- Example
Possibilities
• Genuine Relationship
• Conditional Relationship
• Spurious Relationship
• Changed Relationship
Partial Correlation
Find the Partial Correlation Coefficient- Specific Steps
• Calculate the degrees of freedom
• N – 3
• Look up the critical value in Table H
• alpha = .05
• Compare the calculated correlation to the critical value
• If the calculated value is > critical value we reject the null hypothesis
• If the calculated value is < critical value we fail to reject the null hypothesis
Partial Correlation- Example
• Suppose we have the following correlation matrix between X, Y, and Z:
• What is the partial correlation between X and Y controlling for Z?
X Y Z
X 1.00 .60 .20
Y .60 1.00 .30
Z .20 .30 1.00
Partial Correlation- Example
• Suppose we have the following correlation matrix between X, Y, and Z:
• What is the partial correlation between X and Y controlling for Z?
X Y Z
X 1.00 .846 .821
Y .846 1.00 .988
Z .821 .988 1.00
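The slide steps for the coefficient itself were lost in extraction. A minimal sketch, assuming the standard first-order partial correlation formula r_XY.Z = (r_XY − r_XZ·r_YZ) / √[(1 − r_XZ²)(1 − r_YZ²)], applied to the two example matrices above (`partial_r` is a hypothetical helper name):

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """Partial correlation of X and Y controlling for Z.

    Assumes the standard first-order formula.
    """
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# First example matrix: r_XY = .60, r_XZ = .20, r_YZ = .30
first = partial_r(0.60, 0.20, 0.30)

# Second example matrix: r_XY = .846, r_XZ = .821, r_YZ = .988
second = partial_r(0.846, 0.821, 0.988)

print(round(first, 3), round(second, 3))
```

In the first matrix the partial correlation stays close to the original .60, while in the second it drops well below the original .846, which illustrates how controlling for Z can change the observed relationship.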
Requirements (Assumptions) for Using Correlation
• Linear relationship between X and Y
• Can check for this with a scatterplot!
• Interval level data
• Random sampling
• Characteristics normally distributed OR sample size is over 30
Correlation does NOT equal Causation
Next Class
• Introduction to Regression
Hypothesis Testing-Terminology
• The Null Hypothesis:
• The two samples are from the same population
• $\mu_1 = \mu_2$
• The hypothesis that (in most cases) we wish to reject
• The Alternative Hypothesis:
• The two samples are not drawn from the same population
• $\mu_1 \neq \mu_2$
• We can never accept this hypothesis but we can find it more likely by rejecting the null hypothesis
The Null versus Alternative Hypothesis- Example
• Do Democrats and Republicans differ in their support for gun control?
• $H_0$: There is no difference between Democrats and Republicans in their support for gun control
• $H_A$: Democrats and Republicans do differ in their support for gun control
• Do males and females differ in their preferences for the Democratic party over the Republican Party?
• $H_0$: There is no difference between males and females in their preference for the Democrats over the Republicans
• $H_A$: Males and females do differ in their preference for the Democratic party over the Republican Party
Hypothesis Testing- Terminology cont.
• Sampling Distribution of Differences between Means- a distribution of a large number of differences between sample means
Hypothesis Testing- More Terminology
• Alpha (α)- the level of significance we set
• Rejection Region- the area of the sampling distribution in which we reject the null hypothesis; we reject if our Z or T score falls beyond the critical value
Alpha   Z Value   Rejection Region
.05     1.96      beyond ±1.96
.01     2.58      beyond ±2.58
Hypothesis Testing- More Terminology cont.
• Type 1 Error- when we reject the null hypothesis when it is in fact true
• Probability of a type 1 error = α
• Type 2 Error- when we fail to reject the null hypothesis when it is in fact false
• Probability of a type 2 error = β (this is not 1 - α; it depends on α, the sample size, and the size of the true difference)
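The link between α and the Type 1 error rate can be illustrated with a small simulation. This is only a sketch under stated assumptions: both samples are drawn from the same normal population (so the null hypothesis is true by construction), the population standard deviation is treated as known and equal to 1, and a two-tailed Z test at α = .05 (critical value 1.96) is used. The function name, sample size, seed, and number of trials are all hypothetical choices.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable


def rejects_null(n=30, critical_z=1.96):
    """Draw two samples from the SAME population and run a Z test.

    Any rejection here is a Type 1 error, because the null is true.
    """
    group_a = [random.gauss(0, 1) for _ in range(n)]
    group_b = [random.gauss(0, 1) for _ in range(n)]
    difference = statistics.mean(group_a) - statistics.mean(group_b)
    # Standard error of the difference with known sigma = 1 in both groups.
    std_error = math.sqrt(1 / n + 1 / n)
    return abs(difference / std_error) > critical_z


trials = 2000
type1_rate = sum(rejects_null() for _ in range(trials)) / trials
print(type1_rate)  # should land near alpha = .05
```

Lowering α (for example using 2.58 instead of 1.96) makes these false rejections rarer, at the cost of making it harder to detect a real difference when one exists.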
Hypothesis Testing General Procedure
• Establish hypotheses
• Collect sample data
• Calculate statistics to evaluate how likely the sample results are given the hypothesis
• Decide on the basis of the statistics whether to reject or fail to reject the null hypothesis
Difference of Means Hypothesis Testing- General Intuition
• Assume that the null hypothesis is correct (i.e. the difference between population means is 0)
• $\mu_1 - \mu_2 = 0$
• Calculate the observed difference of means between the samples
• $\bar{X}_1 - \bar{X}_2$
• Calculate the probability that we would obtain a difference as extreme as the one we found if the null hypothesis is true
• Reject the null hypothesis if this probability is small enough
• P < .05
Difference of Means Hypothesis Testing- General Intuition
[Figure: sampling distribution of the difference of means statistic $\bar{X}_1 - \bar{X}_2$ (assuming the null hypothesis is correct), with the probability P marked in each tail]
Difference of Means Testing- Overview
• Calculate the sample means $\bar{X}_1$ and $\bar{X}_2$
• Calculate the difference between sample means
• $\bar{X}_1 - \bar{X}_2$
• Translate the mean difference into a T score
• $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
• $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\dfrac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}$
• Compare the observed T value to the table T value
• If observed T > Table T, p will be < .05 and you can reject the null
• If observed T < Table T, p will not be < .05 and you fail to reject the null
Differences in Means Testing
• Do our sample means differ significantly from each other or are they from the same distribution?
[Figure: density curves plotted against x, with x from -6 to 4 and density from 0 to .4]
Step by Step Procedure
1) Define Significance Level, e.g. α = 0.05
2) Specify Null and Alternative Hypothesis:
H0: $\mu_1 = \mu_2$
HA: $\mu_1 \neq \mu_2$
> We are comparing the means of two samples with each other! Are they significantly different?
Step by Step Procedure
3) Find the critical value
> t-score given by e.g. α = 0.05 and d.f. = n1+n2–2
4) Calculate the test statistic (t)
5) Compare test statistic to the critical value
if |t| > critical value, reject H0, else retain H0
Example
• We ask a random sample of people about their attitude towards gun control on a feeling thermometer from 0-100 with the following results:
        Liberals   Conservatives
n       25         35
mean    60         49
s.d.    12         14
• Are the two groups different?
Example
1) Define Significance Level: α = 0.05
2) Specify Null and Alternative Hypothesis:
H0: $\mu_1 = \mu_2$
HA: $\mu_1 \neq \mu_2$
3) Find the critical value (α = 0.05 and d.f. = n1+n2–2)
Example
4) Calculate the test statistic (t)
5) Compare test statistic to the critical value: since the test statistic exceeds the critical value, we can reject H0!
> Liberals show greater support for gun control than conservatives.
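The calculation for this example can be sketched as follows, using the difference of means formula given in the overview earlier in these notes (the pooled standard error of the difference with N1 + N2 − 2 in the denominator). The function name is hypothetical, and the critical value of roughly 2.00 for 58 degrees of freedom at α = .05 (two-tailed) is an assumption taken from a standard t table rather than from the slides.

```python
import math


def difference_of_means_t(n1, mean1, sd1, n2, mean2, sd2):
    """T score for the difference between two sample means.

    Uses the standard error of the difference from the overview slide:
    sqrt( ((N1*s1^2 + N2*s2^2) / (N1 + N2 - 2)) * ((N1 + N2) / (N1*N2)) )
    """
    pooled = (n1 * sd1 ** 2 + n2 * sd2 ** 2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / std_error


# Liberals: n = 25, mean = 60, s.d. = 12; Conservatives: n = 35, mean = 49, s.d. = 14
t = difference_of_means_t(25, 60, 12, 35, 49, 14)
df = 25 + 35 - 2

critical_t = 2.00  # assumed table value for df = 58, alpha = .05, two-tailed
print(round(t, 2), df, abs(t) > critical_t)
```

With these inputs the observed t is a little above 3, which exceeds the assumed critical value, consistent with the slide's conclusion that H0 can be rejected.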
Agenda
• Introduction to Correlation
• Review Scatterplots and their connection to correlations
• The correlation coefficient
Why Correlation?
• Allows us to examine the strength and direction of a relationship
• Allows us to examine the relationship between an interval level dependent variable and an interval level independent variable
                                             Categorical/Ordinal Dependent Variable   Interval Dependent Variable
Categorical/Ordinal Independent Variable     Chi Square                               Difference of Means
Interval Independent Variable                Not Covered!                             Correlation OR Regression (bivariate)
Review Scatterplot
[Scatterplot figure: dependent variable on the Y axis, independent variable on the X axis. Each point represents one data point, both its X and Y values. The highlighted point corresponds to Y = 8 and X = 6.]
The Importance of Scatterplots to Correlation
• Identifying if the relationship is linear
• Correlation coefficient only appropriate for linear relationships!
• Researchers should always look at a scatterplot of their variables before calculating a correlation coefficient to:
• Get an idea about the strength and direction of the relationship
• Make sure the relationship is linear
• If the relationship is not linear calculating a correlation coefficient is not appropriate and the researcher should not proceed
• Make sure there are no problematic outliers
• If problematic outliers are identified the researcher might want to think about removing them
The Correlation Coefficient (r)
• A coefficient that tells us about the strength and direction of a relationship
• Always ranges from -1 to 1
• Direction:
[Remainder of this slide lost in extraction; surviving scale entry: 0 = No Correlation]
The Correlation Coefficient (r)
The Computational Formula for r
Testing the Significance of r
Correlation – Specific Steps
Correlation – Specific Steps cont.
• Find the critical r in the table
• Calculate the degrees of freedom
• N - 2
• α = .05
• Compare calculated r to critical r
• If our calculated r is > critical r we reject the null hypothesis
• If our calculated r is < critical r we fail to reject the null hypothesis
[Worked example table lost in extraction; surviving values: 78, 1695, N = 7, .25, 4624, 34]
Correlation – Exercise
• Find the correlation coefficient and determine whether or not it is significant for the following set of data.
X Y
1 5
2 6
3 4
4 2
Next Class
• Partial Correlation
• 3rd Homework Due
Agenda
• Introduction to Nonparametric Tests of Significance
• Chi Square Test
• One way chi square
• Two way chi square
Parametric versus Nonparametric Tests
• Parametric Test- A test that makes assumptions about the population parameters and the data
• Nonparametric Test- Tests that make fewer assumptions about the population parameters and the data
The One Way Chi Square Test – Terminology
• One way Chi-Square test: Testing for differences between the distribution of one variable among its categories.
• Two way Chi-Square Test: Testing for differences between the distribution of one variable by another variable.
• Observed frequencies: Frequencies that actually occur based on data we collected.
• Expected frequencies: Frequencies that we would expect IF the null hypothesis was true.
The One Way Chi Square Test – Intuition
• Expected frequencies are the frequencies that would occur if the null hypothesis was true
• We want to test the null hypothesis
• Can test it by seeing if the observed frequencies are significantly different from the expected frequencies
• If differences are small they might be sampling error, and we fail to reject H0
• If the differences are large enough, however, they probably represent true population differences
The One Way Chi Square Test – Intuition
• Is the coin fair?
Coin    Observed Frequency   Expected Frequency
Heads   80                   50
Tails   20                   50
The One Way Chi Square Test – Specific Steps
• Calculate expected frequencies for each category.
• Divide total number of observations by K, the number of categories.
• Calculate Chi-Square statistic
• Consult textbook page 323
The One Way Chi Square Test – Specific Steps
• Look up the critical Chi Square value in the table (p.557)
• Degrees of Freedom = K-1
• Compare critical Chi Square to calculated Chi square
• If your calculated Chi Square is > your critical Chi square you reject the null hypothesis
• If your calculated Chi Square is < your critical Chi Square you fail to reject the null hypothesis
The One Way Chi Square Test – Example
• The following table summarizes the self-reported ideology of a sample of students from Stony Brook. Test the null hypothesis that ideological self-identification is distributed equally through the student body.
Ideology       Observed   Expected   O − E   (O − E)²   (O − E)²/E
Liberal        30         50         -20     400        8
Moderate       75         50         25      625        12.5
Conservative   45         50         -5      25         .5
Total          150
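The steps for the one way test can be sketched in a few lines. This assumes the usual statistic, the sum over categories of (observed − expected)² / expected, with each expected frequency equal to N divided by K. The function name is hypothetical, and the critical values in the comments (5.991 for 2 degrees of freedom and 3.841 for 1 degree of freedom at α = .05) are standard table values, not taken from the slides.

```python
def one_way_chi_square(observed):
    """Chi square statistic for equal expected frequencies (N / K each)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Ideology example: Liberal 30, Moderate 75, Conservative 45 (N = 150, K = 3)
ideology_chi_sq = one_way_chi_square([30, 75, 45])
ideology_df = 3 - 1   # assumed critical value at alpha = .05: 5.991

# Coin example: Heads 80, Tails 20 (N = 100, K = 2)
coin_chi_sq = one_way_chi_square([80, 20])
coin_df = 2 - 1       # assumed critical value at alpha = .05: 3.841

print(ideology_chi_sq, coin_chi_sq)
```

The ideology statistic is the sum of the last column of the example table (8 + 12.5 + .5). Both statistics exceed their assumed critical values, so in each case the null hypothesis of an equal distribution would be rejected.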
The One Way Chi Square Test – Exercise
• A researcher is interested in studying the voting turnout of university students. She takes a sample of 50 students and asks them whether they voted or not in the last election. She finds that only 15 students reported voting while 35 reported not voting. Test the null hypothesis that voters and non-voters are distributed equally among university students.
The Two Way Chi Square Test
• Are the differences in a cross tabulation statistically significant?
• Remember a cross tabulation looks at the distribution of one variable by categories of another variable
• The null hypothesis is that frequency of one variable does not differ by categories of the other variable
• Procedure is largely the same as the One Way Chi Square Test
• Only major difference is how you calculate the expected frequencies and the degrees of freedom
The Two Way Chi Square Test- Specific Steps
The Two Way Chi Square Test – Example
         Democrat   Republican   Total
Obama    20         10           30
Romney   5          65           70
Total    25         75           100
Row marginal totals: the Total column. Column marginal totals: the Total row. N: the grand total (100).
The Two Way Chi Square Test – Example
Observed Frequencies
         Dem.   Rep.   Total
Obama    20     10     30
Romney   5      65     70
Total    25     75     100
Expected Frequencies
         Dem.   Rep.   Total
Obama    7.5    22.5   30
Romney   17.5   52.5   70
Total    25     75     100
The Two Way Chi Square Test – Example
Cell              Observed   Expected   O − E
Dems for Obama    20         7.5        12.5
Dems for Romney   5          17.5       -12.5
Reps for Obama    10         22.5       -12.5
Reps for Romney   65         52.5       12.5
Total             100        100
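The specific steps for the two way test did not survive extraction, so the following is a minimal sketch under the usual assumptions: each expected frequency is (row marginal total × column marginal total) / N, the statistic is the sum of (observed − expected)² / expected over all cells, and the degrees of freedom are (rows − 1)(columns − 1). The function name is hypothetical.

```python
def two_way_chi_square(table):
    """Return (expected frequencies, chi square, degrees of freedom).

    `table` is a list of rows of observed frequencies, without totals.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected cell frequency = row total * column total / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_sq = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return expected, chi_sq, df


# Observed frequencies: rows are Obama, Romney; columns are Dem., Rep.
expected, chi_sq, df = two_way_chi_square([[20, 10], [5, 65]])
print(expected, round(chi_sq, 2), df)
```

The expected frequencies produced this way match the Expected Frequencies table in the example, which suggests the assumed formula is the one the slides used.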
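                No Course   Course   Total
Above Average   15          19       34
Average         25          23       48
Below Average   10          8        18
Total           50          50       100
The Two Way Chi Square Test- Example
Cell           Observed   Expected
No Course AA   15         17
No Course A    25         24
No Course BA   10         9
Course AA      19         17
Course A       23         24
Course BA      8          9
[Remaining columns lost in extraction; surviving entries: -2, 1, .24, 2]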
• The samples are random
• The expected frequency of any one cell should not be too small
• Can use Yates Correction if this occurs
Next Class
• Introduction to Correlation
• Homework will be on Blackboard after class
• Deadline: April 20th, 5pm.
Review: Confidence Intervals
• What was their purpose again?
• How did we calculate them again?
Review: Confidence Intervals
• Our sample mean ($\bar{X}$) is our best estimate of the true population mean (μ)
• We know however that because of sampling error $\bar{X}$ is likely to deviate from μ
• The standard error of the mean tells us on average how far off $\bar{X}$ is likely to be from μ
• We can use this to construct a range of mean values around our sample mean $\bar{X}$ that contains μ with some level of probability
• This is called a confidence interval
Review: Confidence Intervals
               Standard Error of the Mean                        Estimated Standard Error of the Mean
Formula        $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$     $s_{\bar{X}} = \dfrac{s}{\sqrt{N - 1}}$
Distribution   Z Distribution                                    T Distribution (small samples); Z Distribution (large samples)
Review: The 95% Confidence Interval
• An interval around our sample mean ($\bar{X}$) that contains our population mean (μ) 95% of the time
• Formula:
$95\%\ \text{confidence interval} = \bar{X} \pm (1.96)(\sigma_{\bar{X}})$
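As a worked illustration of the 95% confidence interval, here is a minimal sketch that assumes the standard error of the mean is σ / √N. The sample mean, σ, and N used below are hypothetical numbers chosen only to make the arithmetic easy to follow; they do not come from the course materials.

```python
import math


def confidence_interval_95(sample_mean, sigma, n):
    """95% confidence interval: sample mean plus or minus 1.96 standard errors."""
    std_error = sigma / math.sqrt(n)   # standard error of the mean
    margin = 1.96 * std_error
    return sample_mean - margin, sample_mean + margin


# Hypothetical inputs: mean of 50, population s.d. of 10, sample of 100.
low, high = confidence_interval_95(50, 10, 100)
print(round(low, 2), round(high, 2))
```

With these hypothetical inputs the standard error is 1, so the interval runs 1.96 points either side of the sample mean.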
Exercises
• See textbook page 194-195 for example
• See textbook page 212, question 21 and 22 for practice
Review: Confidence Intervals with the T Distribution
• We use the T distribution if:
• We estimate the standard error of the mean using the $s_{\bar{X}} = \dfrac{s}{\sqrt{N - 1}}$ formula
• AND if our sample size is small
• There is actually a family of T distributions
• The number of degrees of freedom you have will tell you which distribution is applicable
• Formula:
$\text{degrees of freedom } (df) = N - 1$
Review: Using the T Table
• Alpha (α)- the area in the tails of the distribution
• Formula:
$\alpha = 1 - \text{level of confidence}$
• Degrees of freedom (df)- the number of free observations
• Formula:
$df = N - 1$
Review: Calculating Confidence Intervals with the T table
• Confidence intervals based on the T distribution
• Formula:
$\text{confidence interval} = \bar{X} \pm t\,(s_{\bar{X}})$
Exercises
• See textbook page 201-203 for example
• See textbook page 212, question 26 and 27 for practice
Review: Difference of Means Hypothesis Testing
• Assume that the null hypothesis is correct (i.e. the difference between population means is 0)
• $\mu_1 - \mu_2 = 0$
• Calculate the observed difference of means between the samples
• $\bar{X}_1 - \bar{X}_2$
• Calculate the probability that we would obtain a difference as extreme as the one we found if the null hypothesis is true
• Reject the null hypothesis if this probability is small enough
• P < .05
Difference of Means Testing- Overview
• Calculate the sample means $\bar{X}_1$ and $\bar{X}_2$
• Calculate the difference between sample means
• $\bar{X}_1 - \bar{X}_2$
• Translate the mean difference into a T score
• $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$
• $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\left(\dfrac{N_1 s_1^2 + N_2 s_2^2}{N_1 + N_2 - 2}\right)\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}$
• Compare the observed T value to the table T value whereby df = N1 + N2 – 2 and alpha = 0.05
• If observed T > Table T, you can reject the null
• If observed T < Table T, you fail to reject the null
Exercises
• See textbook pages 239-242 for an example
• See textbook page 265, questions 17, 18, and 19 for practice
Review: The One Way Chi Square Test
• Calculate expected frequencies for each category.
• Divide total number of observations by K, the number of categories.
• Calculate Chi-Square statistic
• Consult textbook page 323
• Look up the critical Chi Square value in the table (p. 557)
• Degrees of Freedom = K - 1
• Compare critical Chi Square to calculated Chi Square
• If your calculated Chi Square is > your critical Chi Square you reject the null hypothesis
• If your calculated Chi Square is < your critical Chi Square you fail to reject the null hypothesis
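A minimal sketch of the one way Chi Square steps follows. The observed counts are hypothetical, and the critical value 5.991 is the standard table value for df = 2 at alpha = .05, entered by hand. The statistic is the usual sum of (observed − expected)² / expected across categories.

```python
def one_way_chi_square(observed):
    """Return (chi_square, df) with equal expected frequencies N / K."""
    k = len(observed)
    expected = sum(observed) / k            # total observations divided by K
    chi_square = sum((fo - expected) ** 2 / expected for fo in observed)
    return chi_square, k - 1


observed_counts = [30, 50, 40]   # hypothetical counts for K = 3 categories
CRITICAL_DF2 = 5.991             # df = K - 1 = 2, alpha = .05

chi_sq, chi_df = one_way_chi_square(observed_counts)
reject = chi_sq > CRITICAL_DF2
print(f"chi square = {chi_sq:.2f}, df = {chi_df}, reject null: {reject}")
```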
Exercises
• See textbook pages 323-325 for an example
• See textbook pages 350-351, questions 10 and 11 for practice
Review: The Two Way Chi Square Test
The Two Way Chi Square Test – Example
            Democrat   Republican   Total
Obama          20          10         30
Romney          5          65         70
Total          25          75        100
• Row marginal totals: the Total column (30 and 70)
• Column marginal totals: the Total row (25 and 75)
• N: the grand total (100)
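Using the counts from the example table, the two way Chi Square calculation can be sketched as below. The sketch assumes the usual expected frequency rule (row marginal total times column marginal total, divided by N) and df = (rows − 1)(columns − 1). It applies no continuity correction, which the textbook procedure may or may not call for.

```python
def two_way_chi_square(table):
    """Return (chi_square, df) for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, fo in enumerate(row):
            fe = row_totals[i] * col_totals[j] / n   # expected frequency under the null
            chi_square += (fo - fe) ** 2 / fe
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi_square, df


# Rows: Obama, Romney. Columns: Democrat, Republican.
vote_by_party = [[20, 10],
                 [5, 65]]

chi_sq_2way, df_2way = two_way_chi_square(vote_by_party)
print(f"chi square = {chi_sq_2way:.2f}, df = {df_2way}")
```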
Exercises
• See textbook pages 331-333 for an example
• See textbook page 357, questions 31 and 32 for practice
The Correlation Coefficient (r)
• A coefficient that tells us about the strength and direction of a
[Figure: scale of r values; the recoverable labels are No Correlation (0), Weak Pos. Correlation (.1), and Weak Neg. Correlation]
The Computational Formula for r
Testing the Significance of r
Correlation – Specific Steps
Correlation – Specific Steps cont.
• Find the critical r in the table
• Calculate the degrees of freedom
• N - 2
• α = .05
• Compare calculated r to critical r
• If our calculated r is > critical r we reject the null hypothesis
• If our calculated r is < critical r we fail to reject the null hypothesis
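The correlation steps can be sketched as follows. The formula on the original slide did not survive, so this sketch assumes one common computational form of Pearson's r, (ΣXY − N·X̄·Ȳ) divided by the square root of (ΣX² − N·X̄²)(ΣY² − N·Ȳ²). The data are hypothetical, and the critical r of .878 is the standard two-tailed table value for df = 3 at alpha = .05, entered by hand.

```python
import math


def pearson_r(x, y):
    """Return (r, df) using a computational form of Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = sum_xy - n * mean_x * mean_y
    denominator = math.sqrt((sum_x2 - n * mean_x ** 2) * (sum_y2 - n * mean_y ** 2))
    return numerator / denominator, n - 2


x_scores = [1, 2, 3, 4, 5]   # hypothetical
y_scores = [2, 4, 5, 4, 5]   # hypothetical
CRITICAL_R_DF3 = 0.878       # df = N - 2 = 3, alpha = .05 (two-tailed)

r_value, r_df = pearson_r(x_scores, y_scores)
significant = abs(r_value) > CRITICAL_R_DF3
print(f"r = {r_value:.3f}, df = {r_df}, reject null: {significant}")
```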
Exercises
• See textbook pages 377-379 for an example
• See textbook page 393, questions 16, 17, and 18 for practice
Partial Correlation
Find the Partial Correlation Coefficient- Specific Steps
• Calculate the degrees of freedom
• N – 3
• Look up the critical value in Table H
• alpha = .05
• Compare the calculated correlation to the critical value
• If the calculated value is > critical value we reject the null hypothesis
• If the calculated value is < critical value we fail to reject the null hypothesis
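The slide with the partial correlation formula did not survive, so the sketch below assumes the standard first-order formula: the correlation between X and Y controlling for Z is (r_xy − r_xz·r_yz) divided by the square root of (1 − r_xz²)(1 − r_yz²). The three input correlations are hypothetical.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y holding Z constant (first-order partial)."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator


# Hypothetical zero-order correlations.
r_partial = partial_correlation(r_xy=0.5, r_xz=0.4, r_yz=0.3)
print(f"partial r = {r_partial:.3f}")
```

The result is then compared to the critical value at df = N – 3, as in the steps above.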
Exercises
• See textbook pages 385-386 for an example
• See textbook page 397, questions 30 and 31 for practice
Review- Regression
Review- Regression Analysis Example
Regression Analysis- Example Cont.
• Regression line: Y = 1.56 + .86X
• Points on the line: (0, 1.56) and (4, 5)
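As a quick check of the example line, the sketch below plugs X values into Y = 1.56 + .86X and recovers the two plotted points. The function name is hypothetical.

```python
INTERCEPT = 1.56   # a: predicted Y when X = 0
SLOPE = 0.86       # b: change in Y for a one-unit change in X


def predict_y(x):
    """Predicted Y from the example regression line."""
    return INTERCEPT + SLOPE * x


for x_value in (0, 4):
    print(f"X = {x_value}: predicted Y = {predict_y(x_value):.2f}")
```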
Exercises
• See textbook pages 403-406 for an example
• See textbook page 435, questions 6 and 7 for practice (a-d only)
Interpreting Regression Output
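      Source |       SS       df       MS              Number of obs =       7
-------------+------------------------------           F(  1,     5) =    7.08
       Model |  16.4090909     1  16.4090909           Prob > F      =  0.0449
    Residual |  11.5909091     5  2.31818182           R-squared     =  0.5860
-------------+------------------------------           Adj R-squared =  0.5032
       Total |          28     6  4.66666667           Root MSE      =  1.5226

------------------------------------------------------------------------------
          y1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .8636364   .3246104     2.66   0.045     .0291988    1.698074
       _cons |   1.545455   1.420253     1.09   0.326    -2.105423    5.196332
------------------------------------------------------------------------------

Labels on the slide: a, b, T values, P Values, Confidence Intervals, N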
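Several numbers in the y1 output can be recomputed from others in the same table, which may help when reading it. The sketch below copies values from that output and uses standard relationships (t = coefficient / standard error, R-squared = Model SS / Total SS, F = Model MS / Residual MS). The variable names are hypothetical.

```python
import math

# Values copied from the y1 regression output in this section.
coef_x1, se_x1 = 0.8636364, 0.3246104
ss_model, ss_residual, ss_total = 16.4090909, 11.5909091, 28.0
df_model, df_residual = 1, 5
n_obs = 7

t_x1 = coef_x1 / se_x1                         # T value for x1
r_squared = ss_model / ss_total                # share of variation explained
ms_model = ss_model / df_model
ms_residual = ss_residual / df_residual
f_stat = ms_model / ms_residual                # overall F
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
root_mse = math.sqrt(ms_residual)

print(f"t = {t_x1:.2f}, R-squared = {r_squared:.4f}, F = {f_stat:.2f}")
print(f"Adj R-squared = {adj_r_squared:.4f}, Root MSE = {root_mse:.4f}")
```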
Interpreting Regression Output
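      Source |       SS       df       MS              Number of obs =       5
-------------+------------------------------           F(  1,     3) =    5.57
       Model |         6.5     1         6.5           Prob > F      =  0.0994
    Residual |         3.5     3  1.16666667           R-squared     =  0.6500
-------------+------------------------------           Adj R-squared =  0.5333
       Total |          10     4         2.5           Root MSE      =  1.0801

------------------------------------------------------------------------------
          y2 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x2 |        -.5   .2118296    -2.36   0.099    -1.174136    .1741364
       _cons |         11   1.359676     8.09   0.004     6.672905    15.32709
------------------------------------------------------------------------------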
Interpreting Regression Output
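. reg obamatherm partyid7

      Source |       SS       df       MS              Number of obs =    1290
-------------+------------------------------           F(  1,  1288) = 1305.27
       Model |  801679.413     1  801679.413           Prob > F      =  0.0000
    Residual |  791073.363  1288  614.187394           R-squared     =  0.5033
-------------+------------------------------           Adj R-squared =  0.5029
       Total |  1592752.78  1289  1235.64994           Root MSE      =  24.783

------------------------------------------------------------------------------
  obamatherm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    partyid7 |   11.81617   .3270595    36.13   0.000     11.17454     12.4578
       _cons |   .1500793   1.503553     0.10   0.921    -2.799602    3.099761
------------------------------------------------------------------------------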
Interpreting Regression Output
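      Source |       SS       df       MS              Number of obs =    1288
-------------+------------------------------           F(  1,  1286) =    4.87
       Model |  6001.48622     1  6001.48622           Prob > F      =  0.0276
    Residual |  1586277.95  1286  1233.49763           R-squared     =  0.0038
-------------+------------------------------           Adj R-squared =  0.0030
       Total |  1592279.44  1287  1237.20236           Root MSE      =  35.121

------------------------------------------------------------------------------
  obamatherm |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
poliinterest |  -2.110292   .9567143    -2.21   0.028    -3.987184   -.2333998
       _cons |   52.86189   2.239573    23.60   0.000     48.46827     57.2555
------------------------------------------------------------------------------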
                                            Categorical/Ordinal    Interval
                                            Dependent Variable     Dependent Variable
Categorical/Ordinal Independent Variable    Chi Square             Difference of Means
Interval Independent Variable               Not Covered!           Correlation OR Regression (bivariate)
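The decision table can also be written as a small lookup, which makes the rule explicit: identify the level of measurement of the independent and dependent variables, then read off the test. The function and dictionary names below are hypothetical.

```python
# Keys are (independent variable level, dependent variable level).
TEST_BY_LEVELS = {
    ("categorical/ordinal", "categorical/ordinal"): "Chi Square",
    ("categorical/ordinal", "interval"): "Difference of Means",
    ("interval", "categorical/ordinal"): "Not covered",
    ("interval", "interval"): "Correlation or Regression (bivariate)",
}


def choose_test(independent_level, dependent_level):
    """Look up the test for a pair of measurement levels."""
    return TEST_BY_LEVELS[(independent_level, dependent_level)]


print(choose_test("categorical/ordinal", "interval"))
```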
Which method to use?
• You want to know whether there is a relationship between gender (male-female) and the vote (voted-did not vote). What test do you use?
• You want to know whether growing up in Long Island makes you more likely to vote Republican (0 – not likely at all to 10 – very likely). What test do you use?
• You want to know whether there is a relationship between years of education and perceived competence of President Obama (0 – not competent at all to 10 – very competent). What test do you use?