3. Descriptive Procedures
Basics of Statistical Inference
Contents
1 Descriptive Procedures
Summary Statistics
Graphs
2 Basics of Statistical Inference
Analysis of customer satisfaction
T-Tests
ANOVA
Non-parametric Statistical Tests
Linear by Linear Association
Regression
Taddesse Kassahun Statistical Procedures 3 / 90
Preliminary Inspection
Before starting on the detailed analysis of a data set,
make a preliminary inspection of all the data.
Categorical variables (sex, marital status, car models,
etc.) are summarized by tables, frequencies and diagrams.
Quantitative variables (age, salary, weight, etc.) are
described by summary statistics and graphs.
For nominal data, the mode is usually the sensible measure of
central tendency; for ordinal data, the median can also be used.
For interval or ratio data with many distinct values, it is not
sensible to produce contingency (frequency) tables or bar and pie
charts unless the values are first grouped.
Preliminary Inspection
If a variable has too many distinct values, it is sometimes
useful to recode it into a new variable with a smaller number of
categories, keeping the original variable intact for safety.
Use the Crosstabs procedure to depict in tabular form
the association between two nominal or ordinal variables.
A clustered or stacked bar-chart would be a good choice
for graphing association between two nominal or ordinal
variables.
To depict in tabular form the association between two
interval or ratio variables, use the bivariate Correlate or
Regression procedures.
A General Guideline for Analyze > Descriptive Statistics
Task                                      SPSS Function
Count                                     Frequencies; Crosstabs
Average and measures of spread            Frequencies with Statistics option; Descriptives
Comparing sets of data                    Explore; Crosstabs
Checking whether a given sample           P-P Plots; Q-Q Plots
resembles a theoretical distribution
Descriptive Statistics—Frequencies
Frequency distributions are tabular presentations of data
that show each category for a variable and the frequency
of the category’s occurrence in the data set.
Use Analyze > Descriptive Statistics > Frequencies
Frequencies are used when you want to know how many
of something you have.
Additional statistics are also available via the Statistics
button.
The Charts button is particularly useful to automatically
produce charts.
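As a cross-check outside SPSS, a frequency distribution can be sketched in a few lines of Python. The variable name jobcat and its values are made up for illustration; the function only counts each category and converts the counts to percentages.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent) rows, one per distinct value."""
    counts = Counter(values)
    n = len(values)
    return [(cat, cnt, 100.0 * cnt / n) for cat, cnt in sorted(counts.items())]


# Hypothetical data: job category of eight employees.
jobcat = ["clerical", "manager", "clerical", "custodial",
          "clerical", "manager", "clerical", "clerical"]

for cat, cnt, pct in frequency_table(jobcat):
    print(f"{cat:<10} {cnt:>3} {pct:6.1f}%")
```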
Descriptive Statistics—Frequencies
The Statistics button
brings up a dialogue box
for selecting additional
statistics.
The Charts button brings
up a dialogue box for
selecting the chart type.
Descriptive Statistics—Descriptives
The Descriptives
procedure is used to
provide summary statistics
for quantitative variables,
especially where a variable
has a large number of
values, e.g. age, income.
The results are similar to
those from Frequencies, but
without a frequency table
or chart.
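The kind of output Descriptives produces can be approximated with Python's standard statistics module. This is only a sketch: the ages list is invented, and the set of statistics shown (n, minimum, maximum, mean, sample standard deviation) is an assumption about a typical default selection.

```python
import statistics


def descriptives(values):
    """Summary statistics for one quantitative variable."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical data: ages of eight respondents.
ages = [23, 35, 41, 29, 52, 38, 47, 31]

for name, value in descriptives(ages).items():
    print(f"{name:<8} {value}")
```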
Descriptive Statistics—Explore
To produce statistics about
the dispersion and distribution
of a scale variable separately
for each level of a categorical
variable, use the Explore
procedure.
For example, Explore can
compare the beginning salaries
of males and females.
Use Analyze > Descriptive
Statistics > Explore.
Descriptive Statistics—Explore
Under Display, select which
output you want: Statistics,
Plots or Both.
The Statistics button opens
the statistics options.
The Plots button opens the
plot options.
Descriptive Statistics—Crosstabs
These provide a summary
of the number of cases
which have particular
combinations of values for
two or more variables.
Usually used when the
variables are nominal or
ordinal.
From the Analyze menu,
select Descriptive
Statistics >
Crosstabs. . .
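A cross-tabulation is simply a count of cases for every combination of row and column category. The sketch below builds such a table in Python; the gender and jobcat values are invented for illustration.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count cases for each (row category, column category) combination."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


# Hypothetical data: gender and job category of eight employees.
gender = ["f", "m", "m", "f", "m", "f", "m", "m"]
jobcat = ["clerical", "manager", "clerical", "clerical",
          "manager", "clerical", "custodial", "clerical"]

row_levels, col_levels, table = crosstab(gender, jobcat)
print("      " + "  ".join(col_levels))
for level, counts_row in zip(row_levels, table):
    print(level, counts_row)
```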
Descriptive Statistics—Ratio
The Ratio Statistics procedure
provides a comprehensive list of
summary statistics for describing
the ratio between two scale
variables.
It is helpful to calculate range,
average absolute deviation,
median-centered coefficient of
variation and mean-centered
coefficient of variation.
Use Analyze > Descriptive
Statistics > Ratio...
Descriptive Statistics—Q-Q & P-P Plots
These plots are useful to
check whether a variable
under consideration
follows a certain
distribution, e.g., normal.
Use Analyze >
Descriptive Statistics >
P-P plots
Use Analyze >
Descriptive Statistics >
Q-Q plots
Custom Tables
This procedure can be used to
tabulate data of more than one
variable.
Use Analyze > Tables >
Custom Tables
Hit the Reset button to reset
the tabs in the window.
Click on the option N%
Summary Statistics to include
the summary measures and then
press OK
Graphs
SPSS provides a wide variety of
charts (graphs) to choose from,
such as bar charts, pie charts,
histograms, boxplots and
scatter plots.
Graphs should convey a
message.
There are three options for
producing graphs: Chart
Builder, Graphboard
Template Chooser and
Legacy Dialogs.
Chart Builder
Use Graphs > Chart Builder >
OK
Choose the chart type from the
Choose from: option.
Double click on the chart/graph
in the box on the right side of
Choose from: option.
Move the variable into the
X-axis.
Click on OK.
Legacy Dialogs
Use Graphs > Legacy
Dialogs >
Bar/Pie/Line, etc.
If, for example, Bar is
selected, choose the bar
type (simple, clustered or
stacked).
Hit the Define button >
Move the variable into
the Category Axis box.
Click on OK.
Graphboard Template Chooser
This is the third option for
drawing a graph.
Use Graphs >
Graphboard Template
Chooser.
Select the variable from
the Visualization of:
box.
Choose the chart type on
the right side of the
Visualization of: box.
Click on OK.
A Clustered Bar Chart
Consider the Employee.sav
data and compare the
educational levels of males and
females.
Use Graphs > Chart Builder >
Reset.
Drag the second bar chart
option from the gallery.
Drag educ into the X-axis box.
Drag gender into the Cluster on
X box.
A Stacked Bar Chart
Consider the Employee.sav
data and compare the
educational levels of males and
females.
Use Graphs > Legacy Dialogs
> Bar... > Stacked >
Summaries for groups of cases
> Define, then move educ into the
Category Axis box and gender
into the Define Stacks by: box.
A Panel Bar Chart
In panel plots, subgroups of the
data are plotted on separate
axes alongside or above and
below each other.
Consider again the
Employee.sav data.
Use Graphs > Legacy Dialogs
> Bar... > Simple.
Put jobcat in the Category Axis
box and gender in the Panel by
Rows: box.
Bar for More than One Variable
In this case, each bar represents a different variable.
Suppose that we would like to have the graph of % of
respondents who
like to work hard;
like to think for one self;
want to be popular.
Respondents having a given characteristic are coded 1 in the
corresponding column, while all others are coded 2 or higher.
Use Graphs > Legacy Dialogs > Bar... > Simple.
Select the option Summaries of Separate Variables
and then Define.
Move each of the variables into the Bars Represent box.
Bar for More than One Variable
By default, the Mean is
shown.
Click on Change
Statistics to change the
Mean to %.
We ask SPSS to calculate
% of entries less than 2.
Choose percentage
below and type 2 in the
Value box.
Click on Continue and
then OK.
Histogram
A histogram shows the measured
values, grouped into intervals,
on the x-axis and the number of
observations in each interval on
the y-axis.
Use Graphs > Chart Builder
> Histogram.
Drag the first option into the
Preview Area.
Drag the variable into X-axis
and then click on OK.
The histogram produced in this
way may not be well formatted.
Histogram
Double click on the histogram to
bring up the Chart Editor.
Double click on the bars in the
histogram for the Properties
box.
Under the Binning tab there is
an option of selecting:
the number of bins;
the interval width.
Select Custom > Number of
intervals > write the number.
Click on Apply.
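To see what choosing the number of intervals does to a histogram, the following Python sketch splits the data range into equal-width bins and counts the observations in each. The salary figures are invented, and the convention that the largest value falls in the last bin is an assumption; SPSS may place bin boundaries differently.

```python
def bin_counts(values, n_bins):
    """Split [min, max] into n_bins equal-width intervals and count values.

    Assumes the values are not all identical (otherwise the width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value is placed in the last bin rather than a new one.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Hypothetical data: salaries in thousands.
salaries = [20, 22, 25, 31, 34, 38, 40, 47, 55, 60]

edges, counts = bin_counts(salaries, 4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}: {'*' * count}")
```

Changing the second argument of bin_counts plays the same role as the Number of intervals setting: fewer bins smooth the picture, more bins show more detail.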
Boxplots
Boxplots are useful for checking
variability, spotting outliers, etc.
They are also a useful way of
comparing two or more datasets.
For a single variable: Graphs >
Legacy Dialogs > Boxplot.
Click on Summaries of
separate variables > Define
> move the variable to Boxes
Represent > OK.
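The numbers behind a boxplot can be sketched as follows. Two assumptions are made here: quartiles are computed with the "inclusive" method of Python's statistics.quantiles, which may differ slightly from the hinges SPSS uses, and outliers are flagged with the common rule of 1.5 box lengths beyond the quartiles. The data are invented.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, interquartile range and values outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}


# Hypothetical data with one unusually large value.
data = [12, 15, 14, 10, 18, 16, 13, 40, 17]

print(boxplot_summary(data))
```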
Boxplots
Boxplots for comparing two or
more groups.
Use Graphs > Legacy
Dialogs > Boxplot > Simple
> Summaries for group of
cases.
Define > move the variable
to the Variable: box and the
categorizing variable to the
Category Axis: box and click
on OK.
Boxplots
Boxplots for two or more
variables separated by one
variable:
Use Graphs > Legacy
Dialogs > Boxplot >
Clustered > Summaries of
separate variables.
Define > move the variables
to Boxes Represent: and the
categorizing variable to the
Category Axis: box and click
on OK.
Inferential Statistics
Statistical inference refers to making generalizations
about a population based on samples of this population.
Inferential procedures can be either parametric or
non-parametric.
A parametric test is a test whose model specifies certain
conditions about the parameters of the population from
which the research sample was drawn.
Conditions include:
The observations must be independent.
The populations must be normal with constant variances.
The variables involved must have been measured on at least
an interval scale.
Inferential Statistics ...
A non-parametric test is a test whose model does NOT
specify conditions about the parameters of the population
from which the sample was drawn.
Non-parametric tests do not require measurement as strong as
that required for parametric tests.
Most non-parametric tests apply to data on an ordinal
scale, and some apply to data on a nominal scale.
When the data under analysis meet the assumptions
of parametric tests, we should choose parametric tests
because they are more powerful than non-parametric
tests.
Analysis of customer satisfaction
Perform reliability analysis to determine to what degree
you were successful in constructing questions that
measure a person’s opinion.
Reliability is the property of a measurement instrument
that causes it to give similar results for similar inputs.
Cronbach’s alpha (Cronbach, 1951) is the most
commonly used measure of reliability.
Alpha is a lower bound for the true reliability of the
survey.
The computation of Cronbach’s alpha is based on the
number of items on the survey and the ratio of the
average inter-item covariance to the average item
variance.
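A minimal sketch of the computation, using the algebraically equivalent textbook form alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The three items and five respondents are invented, and the function name is hypothetical.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha.

    items: list of k lists, one per survey item, each holding one score
    per respondent (all lists the same length).
    """
    k = len(items)
    item_variances = [statistics.variance(scores) for scores in items]
    totals = [sum(respondent) for respondent in zip(*items)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical survey: three items answered by five respondents (1 to 5 scale).
q1 = [4, 3, 5, 2, 4]
q2 = [5, 3, 4, 2, 4]
q3 = [4, 2, 5, 3, 5]

alpha = cronbach_alpha([q1, q2, q3])
print(f"Cronbach's alpha = {alpha:.3f}")
```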
Analysis of customer satisfaction
Data can be dichotomous, ordinal, or interval, but the
data should be coded numerically.
Assumptions. Observations should be independent, and
errors should be uncorrelated between items. Each pair of
items should have a bivariate normal distribution. Scales
should be additive, so that each item is linearly related to
the total score.
To Obtain a Reliability Analysis
Analyze > Scale > Reliability Analysis...
Select two or more variables as potential components of
an additive scale.
Analysis of customer satisfaction
The closer the value of
Cronbach’s alpha is to 1,
the more reliable the
instrument is.
Analysis of customer satisfaction
You may want to test whether each of the shops provides
a similar and adequate level of customer service.
A variable on overall satisfaction is required.
Use the Crosstabs procedure to test the hypothesis that
the levels of service satisfaction are constant across shops.
Cross tabulate shops versus convenience.
If each shop has a similar level of convenience, the
pattern of responses should be similar across shops.
The chi-square test measures the discrepancy between the
observed cell counts and what you would expect if the
rows and columns were unrelated.
Analysis of customer satisfaction
While the chi-square test tells you whether there is a relationship,
it doesn’t tell you the strength of the relationship.
Use symmetric (Phi, Cramer’s V, etc.) and directional
(Lambda, uncertainty coeff., etc.) measures to quantify
this.
Phi is the most optimistic of the symmetric measures,
and it is not bounded above by 1 when
both variables have more than two categories.
Cramer’s V is a rescaling of phi so that its maximum
possible value is always 1.
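The relationship between the chi-square statistic, phi and Cramer's V can be sketched in Python. The formulas used are the standard ones (phi = sqrt(chi2 / n), V = sqrt(chi2 / (n * (min(rows, columns) - 1)))); the 3 x 3 table of shops by satisfaction level is invented.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n


def phi_and_cramers_v(table):
    """Phi and its rescaled version, Cramer's V."""
    chi2, n = chi_square(table)
    k = min(len(table), len(table[0]))
    phi = math.sqrt(chi2 / n)
    v = math.sqrt(chi2 / (n * (k - 1)))
    return phi, v


# Hypothetical counts: three shops (rows) by three satisfaction levels (columns).
observed = [[20, 10, 0],
            [10, 20, 10],
            [0, 10, 20]]

chi2, n = chi_square(observed)
phi, v = phi_and_cramers_v(observed)
print(f"chi-square = {chi2:.3f}, phi = {phi:.3f}, Cramer's V = {v:.3f}")
```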
Analysis of customer satisfaction
Directional measures quantify the reduction in the error
of predicting the row variable value when you know the
column variable value, or vice versa.
Lambda defines error as the misclassification of cases,
and cases are classified according to the modal (most
frequent) category.
Tau defines error as the misclassification of a case, and
cases are classified into category j with probability =
observed frequency of category j.
Uncertainty coefficient defines error as
$\Pr(\text{category}_j) \cdot \ln \Pr(\text{category}_j)$ summed over the
categories of the variable.
Analysis of customer satisfaction
Symmetric and directional measures of ordinal
association are based on the idea of concordance versus
discordance.
Concordant: a pair of cases in which the case with the larger value
on the row variable also has the larger value on the column variable.
Discordant: a pair of cases in which the case with the larger value
on the row variable has the smaller value on the column variable.
The larger the value of symmetric and directional
measures the stronger the relationship is.
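Counting concordant and discordant pairs is straightforward to sketch: compare every pair of cases and look at the sign of the product of the two differences. The ordinal scores below are invented; pairs tied on either variable are counted separately.

```python
from itertools import combinations


def concordance(x, y):
    """Count concordant, discordant and tied pairs of cases."""
    concordant = discordant = tied = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # both variables order the pair the same way
        elif product < 0:
            discordant += 1   # the variables order the pair in opposite ways
        else:
            tied += 1         # the pair is tied on at least one variable
    return concordant, discordant, tied


# Hypothetical ordinal ratings for five customers.
service = [1, 2, 3, 4, 5]
overall = [1, 3, 2, 5, 4]

print(concordance(service, overall))
```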
Analysis of customer satisfaction
To obtain symmetric and directional measures:
Analyze > Descriptive Statistics > Crosstabs... and
then click on Statistics.
Compare Means > Means...
This procedure gives more
summary statistics such as
kurtosis, skewness, harmonic
mean, geometric mean, etc.
Use Analyze > Compare
Means > Means...
Move the variable for which you
would like summary
statistics into the Dependent
List: box.
Click on the Options button for
summary statistics.
One-Sample T Test
Tests $H_0: \mu = \mu_0$ vs.
$H_1: \mu \neq \mu_0$.
Assumption: The data are
normally distributed.
Use Analyze > Compare
Means > One-Sample T Test.
Select the variable to be tested.
Set the test value (0 by
default).
Click on the Options button to
set the confidence level.
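The test statistic behind this procedure is t = (sample mean - test value) / (s / sqrt(n)) with n - 1 degrees of freedom. A small Python sketch with invented data follows; it computes only t and the degrees of freedom, since the p-value requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def one_sample_t(values, mu0):
    """Return the t statistic and degrees of freedom for H0: mu = mu0."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1


# Hypothetical sample and test value.
sample = [52, 48, 55, 51, 49, 53, 50, 54, 47]
t, df = one_sample_t(sample, mu0=50)
print(f"t = {t:.3f} on {df} degrees of freedom")
```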
Independent-Samples T Test
It compares means for two groups of cases.
The subjects should be randomly assigned to 2 groups.
Assumptions: The observations should be independent,
random samples from normal distributions (normality matters
less when n is large).
The hypothesis to be tested:
$H_0: \mu_1 = \mu_2$ vs. $H_1: \mu_1 \neq \mu_2$
Take the data from the two samples and enter them
in one column, then create one grouping variable (with
value 0 for data from Sample 1 and 1 for data from Sample 2).
Use Analyze > Compare Means >
Independent-Samples T-test. . .
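The data layout described above (one value column plus a 0/1 grouping column) and the two usual versions of the statistic can be sketched in Python. The salary figures are invented. The pooled version assumes equal variances; the Welch version does not and uses the Welch-Satterthwaite degrees of freedom.

```python
import math
import statistics


def independent_t(values, groups):
    """Two-sample t statistics from a value column and a 0/1 grouping column."""
    s1 = [v for v, g in zip(values, groups) if g == 0]
    s2 = [v for v, g in zip(values, groups) if g == 1]
    n1, n2 = len(s1), len(s2)
    diff = statistics.mean(s1) - statistics.mean(s2)
    v1, v2 = statistics.variance(s1), statistics.variance(s2)

    # Equal variances assumed: pool the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))

    # Equal variances not assumed (Welch).
    a, b = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(a + b)
    df_welch = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

    return {"t_pooled": t_pooled, "df_pooled": n1 + n2 - 2,
            "t_welch": t_welch, "df_welch": df_welch}


# Hypothetical data: salaries in one column, group code in another.
salary = [30, 34, 28, 32, 36, 25, 27, 24, 29, 25]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

result = independent_t(salary, group)
for name, value in result.items():
    print(f"{name:<10} {value:.4f}")
```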
Independent-Samples T Test
Question: Is there a difference
in the population mean salaries
of males and females?
Research Hypothesis: There is a difference in
the population mean salaries of
males and females.
Null Hypothesis: There is no difference in the
population mean salaries of
males and females.
Move salary to Test
Variable(s): and gender to
Grouping Variable: box.
Paired-Samples T Test
This compares the differences between pairs of readings
for two related samples. e.g., two pulse readings from the
same patient.
Assumptions:
Observations for each pair should be made under the
same conditions.
The differences between pairs should be normally distributed.
Research Hypothesis: There is a difference in the
population means of the first and second pulse rates of
each patient.
Null Hypothesis: There is no difference in the
population means of the first and second pulse rates of
each patient.
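A paired-samples t test is a one-sample t test on the within-pair differences. The sketch below makes this explicit; the pulse readings are invented, and only the t statistic and degrees of freedom are computed.

```python
import math
import statistics


def paired_t(first, second):
    """t statistic and degrees of freedom for the mean within-pair difference."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    standard_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / standard_error, n - 1


# Hypothetical data: two pulse readings from each of six patients.
pulse1 = [72, 75, 68, 80, 77, 70]
pulse2 = [70, 74, 69, 76, 75, 68]

t, df = paired_t(pulse1, pulse2)
print(f"t = {t:.3f} on {df} degrees of freedom")
```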
Paired-Samples T Test
Use Analyze > Compare
Means >
Paired-Samples T
Test...
Select a pair of variables.
Click the arrow button to
move the pair into the
Paired Variables: list.
Click on OK.
Comparison of More than Two Population Means
One may compare more than two group means using
t-tests. However, the Type I error rate is inflated when
multiple t-tests are conducted.
Instead, ANOVA is used to conduct such tests.
The one-way ANOVA requires one dependent variable and
a single independent variable with two or more levels.
ANOVA compares the variation among groups with the
variation within groups.
Research Hypothesis: There is a difference in the
population means of the highest year of school completed
for each region.
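The comparison of variation among groups with variation within groups can be sketched as the F ratio of the two mean squares. The three regions and their years of schooling are invented; the p-value is not computed because the F distribution is not in the Python standard library.

```python
import statistics


def one_way_anova(groups):
    """Return F, between-groups df and within-groups df."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation among groups: group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within groups: observations around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical data: highest year of school completed in three regions.
region_a = [12, 14, 11, 13]
region_b = [15, 16, 14, 15]
region_c = [10, 12, 11, 11]

f, df1, df2 = one_way_anova([region_a, region_b, region_c])
print(f"F({df1}, {df2}) = {f:.3f}")
```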
Comparison of More than Two Population Means
Null Hypothesis: There is no difference in the
population means of the highest year of school completed
for each region.
If the null hypothesis of ANOVA is rejected, then we may
want to know which means differ.
The following tests for comparing means can be used.
A priori contrasts: A test is set up before running the
experiment.
Post hoc: Tests are run after the experiment has been
conducted.
The dependent variable should be quantitative (interval
level of measurement).
Comparison of More than Two Population Means
Assumptions.
Each group is an independent
random sample from a normal
population.
The groups should come from
normal populations with equal
variances.
Stack the response data in one
column.
Use Analyze > Compare
Means > One-Way
ANOVA...
One-Way ANOVA Contrasts
Polynomial: Partition the between-groups sums of
squares into trend components.
You can test for a trend of the dependent variable across
the ordered levels of the factor variable.
Degree: choose a 1st, 2nd, 3rd, 4th, or 5th degree
polynomial.
Coefficients: User-specified a priori contrasts to be
tested by the t statistic.
Enter a coefficient for each group (category) of the factor
variable and click Add after each entry.
Each new value is added to the bottom of the coefficient
list.
One-Way ANOVA Contrasts
To specify
additional sets of
contrasts, click
Next.
The order of the
coefficients
corresponds to the
ascending order of
the category values
of the factor
variable.
One-Way ANOVA Post Hoc Tests
Two options: equal
variance assumed and
equal variance not
assumed.
Test for equality of
variances using Levene’s
Test of Homogeneity of
Variance.
Hypothesis:
$H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_k^2$ vs.
$H_1$: not $H_0$
Click on Options button.
One-Way ANOVA Post Hoc Tests
Select the box
Homogeneity of
variance test and
click on Continue.
Click on the Post
Hoc... button.
Choose either the
equal or not-equal
variance assumed
option based on
your test result
above.
One-Way ANOVA Means plot
Displays a chart
that plots the
subgroup means
(the means for each
group defined by
values of the factor
variable).
To obtain the means plot,
tick Means plot in
the Options dialogue
box.
Non-parametric Statistical Tests
These procedures are recommended if:
the population involved is not normally distributed;
data are measured on nominal and/or ordinal scales.
The sign test and the Wilcoxon signed rank test are the
non-parametric alternatives to the one-sample t test.
Research Hypothesis: The population median salary of
employees is different from M0.
Null Hypothesis: The population median salary of
employees is equal to M0.
Use Analyze > Nonparametric Tests > One Sample
Click on Settings > Customize tests
Write the hypothesized median value (M0)
Click on Run
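The sign test is simple enough to compute exactly by hand or in a few lines of Python: count the observations above and below M0, drop those equal to M0, and refer the smaller count to a Binomial(n, 0.5) distribution. The salaries and the hypothesized median below are invented; the exact two-sided p-value is computed, which may differ from a normal approximation used by software for large samples.

```python
import math


def sign_test(values, m0):
    """Exact two-sided sign test of H0: population median = m0."""
    above = sum(1 for v in values if v > m0)
    below = sum(1 for v in values if v < m0)
    n = above + below              # observations equal to m0 are dropped
    k = min(above, below)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return above, below, min(1.0, 2 * tail)


# Hypothetical data: salaries in thousands, hypothesized median M0 = 26.
salaries = [21, 25, 27, 30, 32, 35, 36, 40, 42, 45]

above, below, p_value = sign_test(salaries, 26)
print(f"above = {above}, below = {below}, p = {p_value:.4f}")
```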
One-Sample Non-parametric Tests
Mann-Whitney U Test
This is the non-parametric alternative to the t-test for
independent samples.
Research Hypothesis: There is a difference in the
population median salaries of males and females.
Null Hypothesis: There is no difference in the
population median salaries of males and females.
Keep both samples in a single numeric column and create
another grouping column.
Use Analyze > Nonparametric Tests > 2
Independent Samples > Compare medians across
groups > click on Fields tab
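The U statistic itself can be sketched from the same layout (one numeric column, one grouping column): rank all values together, sum the ranks of one group, and subtract the smallest possible rank sum. The data are invented, tied values receive the average of their ranks, and no p-value is computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1      # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def mann_whitney_u(values, groups):
    """Return (U, U1, U2), where U is the smaller of U1 and U2."""
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("exactly two groups are required")
    ranks = average_ranks(values)
    n1 = groups.count(labels[0])
    n2 = len(groups) - n1
    rank_sum_1 = sum(r for r, g in zip(ranks, groups) if g == labels[0])
    u1 = rank_sum_1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    return min(u1, u2), u1, u2


# Hypothetical data: salaries in one column, group code in another.
salary = [30, 34, 28, 32, 36, 25, 27, 24, 29, 25]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

print(mann_whitney_u(salary, group))
```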
Kruskal-Wallis Test
A non-parametric alternative to the one-way ANOVA.
Research Hypothesis: There is a difference in the
population medians of the highest year of school
completed for each region.
Null Hypothesis: There is no difference in the
population medians of the highest year of school
completed for each region.
Keep all samples in a single column and their levels in
another column.
Use Analyze > Nonparametric Tests > Legacy
Dialogs > K Independent Samples
Click Define Range button & type the values.
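The H statistic can be sketched with the same two-column layout: rank all values together, then compare the rank sums of the groups. The data are invented and chosen without ties; this version applies no tie correction, so with tied values it can differ slightly from software that applies one. H is referred to a chi-square distribution with (number of groups - 1) degrees of freedom.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1      # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def kruskal_wallis_h(values, groups):
    """Return H (without tie correction) and its degrees of freedom."""
    ranks = average_ranks(values)
    n = len(values)
    rank_sums, sizes = {}, {}
    for r, g in zip(ranks, groups):
        rank_sums[g] = rank_sums.get(g, 0.0) + r
        sizes[g] = sizes.get(g, 0) + 1
    weighted = sum(rank_sums[g] ** 2 / sizes[g] for g in rank_sums)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h, len(rank_sums) - 1


# Hypothetical data: years of schooling in one column, region in another.
years = [12, 14, 9, 13, 15, 16, 18, 17, 10, 8, 11, 7]
region = ["A"] * 4 + ["B"] * 4 + ["C"] * 4

h, df = kruskal_wallis_h(years, region)
print(f"H = {h:.3f} on {df} degrees of freedom")
```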
Correlation
Correlations measure how variables or rank orders are
related.
Before calculating a correlation coefficient, screen
your data for outliers, which can cause misleading results.
Bivariate Correlations Data Considerations
Use quantitative variables for Pearson’s correlation
coefficient.
Use quantitative variables or variables with ordered categories
for Spearman’s rho and Kendall’s tau-b.
Correlation coefficient (r) is a number between -1 and 1
inclusive.
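Pearson's r and Spearman's rho can be sketched together, since rho is Pearson's r applied to the ranks. The advertising and sales figures are invented, and the ranking helper assumes there are no tied values.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simple_ranks(values):
    """Ranks 1..n; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson_r(simple_ranks(x), simple_ranks(y))


# Hypothetical data: advertising spend and sales for five periods.
advertising = [1, 2, 3, 4, 5]
sales = [3, 2, 6, 5, 9]

r = pearson_r(advertising, sales)
rho = spearman_rho(advertising, sales)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```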
Correlation
Research
hypothesis: There
is a linear
relationship between
spending on
advertising & sales.
Null hypothesis:
There is no linear
relationship between
spending on
advertising & sales.
Use Analyze > Correlate >
Bivariate...
Partial Correlation
Partial correlation describes
the linear relationship between two
variables while controlling
for the effects of one or
more additional variables.
Use Analyze > Correlate
> Partial.
Note that correlation does
not mean CAUSATION.
Regression
We usually try to summarize data collected from a given
study area and then look for patterns.
We also attempt to see the possible relationship which
may exist between variables.
The possible questions we might ask about the variables
are:
Are the variables related?
What sort of relationship is there?
Can we predict one from the other(s)?
All such questions can be answered by a statistical
technique called regression.
Linear Regression
Linear Regression estimates the coefficients of the linear
equation, involving one or more independent
variables, that best predict the value of the dependent
variable.
Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \varepsilon_i$,
where $Y_i$ is a quantitative dependent variable and the β’s are
parameters to be estimated.
$\varepsilon_i$ is the error term, normally distributed with mean zero and
variance $\sigma^2$.
Example. Prediction of total yearly sales from variables
such as age, education, and years of experience.
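To make the estimation step concrete, the sketch below obtains least-squares coefficients by solving the normal equations X'X b = X'y with plain Gaussian elimination. It is only an illustration of the idea, not how SPSS computes its estimates; the data and the helper names solve and ols are assumptions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols(rows, y):
    """Least-squares estimates [b0, b1, ..., bk] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: (age, years of experience) and total yearly sales.
predictors = [(25, 2), (30, 5), (35, 6), (40, 12), (45, 15), (50, 20)]
yearly_sales = [31, 40, 44, 60, 68, 83]

b0, b_age, b_exp = ols(predictors, yearly_sales)
print(f"sales = {b0:.2f} + {b_age:.2f} * age + {b_exp:.2f} * experience")
```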
Linear Regression
Linear Regression Data Considerations
The dependent variable should be quantitative and, for each
combination of predictor values, approximately normally distributed.
The independent variable(s) can be categorical and/or
quantitative variables.
Categorical variables need to be recoded to dummy
variables.
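Dummy coding replaces a categorical variable with one 0/1 indicator per non-reference category. A minimal sketch, assuming a hypothetical education variable with "primary" chosen as the reference level:

```python
def dummy_code(values, reference):
    """Return (levels, rows): one 0/1 indicator column per non-reference level."""
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical categorical predictor.
education = ["primary", "secondary", "tertiary", "secondary", "primary"]
levels, rows = dummy_code(education, reference="primary")

print(levels)
for value, row in zip(education, rows):
    print(value, row)
```

A variable with m categories yields m - 1 dummy variables; cases in the reference category have zeros in every column.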
Assumptions
The number of observations (n) must be greater than
the number of parameters to be estimated.
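Linearity: the relationship between the dependent variable
and each independent variable is linear.
The mean of the random disturbance term, $E(\varepsilon_i)$, is
zero.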
Linear Regression
Assumptions . . .
The random disturbance term should be normally
distributed.
The variance of the random disturbance term should be
constant.
No autocorrelation between the random disturbance
terms.
There is no multicollinearity. That is, there are no strong
linear relationships among the explanatory variables.
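Two commonly used numerical checks for the last two assumptions are the Durbin-Watson statistic (autocorrelation) and the variance inflation factor (multicollinearity). Neither is named on the slide, so the sketch below is offered only as one possible way to quantify them; the residuals and predictor values are made up, and the VIF shortcut shown holds for the two-predictor case only.

```python
import math

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1.0 / (1.0 - pearson_r(x1, x2) ** 2)

# Hypothetical residuals (in time order) and two hypothetical predictors.
residuals = [0.5, -0.3, 0.2, -0.6, 0.4, -0.1, 0.3, -0.4]
age = [25, 30, 35, 40, 45, 50]
experience = [2, 5, 6, 12, 15, 20]

print(f"Durbin-Watson = {durbin_watson(residuals):.2f}")
print(f"VIF = {vif_two_predictors(age, experience):.2f}")
```

Values of the Durbin-Watson statistic well below 2 point to positive autocorrelation and values well above 2 to negative autocorrelation; a large VIF signals a strong linear relationship among the predictors.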
Linear Regression using SPSS
Use Analyze > Regression > Linear...
Place the dependent variable in the Dependent: box and
the independent variables in the Independent(s): box.
Linear Regression Plots
Plots can aid in the validation of the assumptions of
normality, linearity, and equality of variances.
Plots are also useful for detecting outliers, unusual
observations, and influential cases.
Save predicted values & residuals as new variables to
display in the Data Editor for constructing plots with the
independent variables.
Scatterplots: Plot the standardized residuals against the
standardized predicted values to check for linearity and
equality of variances.
Histograms of standardized residuals and normal
probability plots to check normality.
Linear Regression Plots . . .
Select the option "ZPRED" (standardized predicted values)
and move it into the box "Y."
Select the option "ZRESID" (standardized regression residuals)
and move it into the box "X."
A systematic pattern in that plot, such as a funnel shape,
suggests the presence of heteroskedasticity.
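The two quantities plotted here can be computed by hand. The sketch below fits a one-predictor regression to made-up data and derives standardized predicted values and standardized residuals, assuming the residuals are scaled by the standard error of estimate. It prints the pairs rather than drawing the plot.

```python
import math
import statistics

def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope

# Hypothetical data.
x = [10, 12, 15, 17, 20, 22, 25, 30]
y = [40, 44, 52, 55, 61, 68, 70, 85]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * a for a in x]
residuals = [b - p for b, p in zip(y, predicted)]

# Standardized predicted values: centre and scale the fitted values.
mean_pred = statistics.mean(predicted)
sd_pred = statistics.stdev(predicted)
zpred = [(p - mean_pred) / sd_pred for p in predicted]

# Standardized residuals: residual divided by the standard error of estimate
# (square root of the residual mean square, n - 2 degrees of freedom here).
std_error = math.sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))
zresid = [e / std_error for e in residuals]

for zp, zr in zip(zpred, zresid):
    print(f"ZPRED = {zp:6.2f}   ZRESID = {zr:6.2f}")
```

If the spread of the standardized residuals grows or shrinks as the standardized predicted values increase, the constant-variance assumption is doubtful.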
Linear Regression Plots . . .
To check the normality of residuals:
Click on the Plots button.
Select Histogram and Normal probability plot under
Standardized Residual Plots.
Click on Continue.
Trend Analysis
You need time series data recorded at regular intervals
(daily, weekly, etc.).
Create a sequence chart of the number of items sold
in month x and in month y, with specific codes.
Trend Analysis
Analyze > Forecasting > Sequence Charts.
Move the two variables (number of items sold in month x and in month y)
into the Variables: box and the day variable into the Time Axis
Labels: box.
Trend Analysis
To create a linear trend model for the number of items
sold: Analyze > Regression > Curve Estimation.
Move the variable for the most recent month into the Dependent(s): box and
select Time as the independent variable.
You can also predict the number of items to be sold over a
given number of future days:
Analyze > Regression > Curve Estimation, then click on the
Save button.
Example. Consider the broadband1.sav dataset and
predict the total number of subscribers for the first six
months of 2004.
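A linear trend model regresses the series on a time index 1, 2, ..., n and extends the fitted line beyond n. The sketch below illustrates this on a made-up monthly series; the numbers are not taken from broadband1.sav, and the function names are assumptions.

```python
def fit_trend(series):
    """Least-squares intercept and slope of the series against time 1..n."""
    n = len(series)
    times = range(1, n + 1)
    mean_t = (n + 1) / 2
    mean_y = sum(series) / n
    slope = (sum((t - mean_t) * (y - mean_y) for t, y in zip(times, series))
             / sum((t - mean_t) ** 2 for t in times))
    return mean_y - slope * mean_t, slope

def forecast(series, steps):
    """Extend the fitted linear trend for the next `steps` periods."""
    intercept, slope = fit_trend(series)
    n = len(series)
    return [intercept + slope * (n + h) for h in range(1, steps + 1)]

# Hypothetical monthly subscriber counts (thousands) for one year.
subscribers = [110, 118, 123, 131, 138, 142, 151, 157, 163, 171, 176, 184]

for month, value in enumerate(forecast(subscribers, 6), start=1):
    print(f"forecast month {month}: {value:.1f}")
```

A straight-line extrapolation is only sensible if the sequence chart shows a roughly linear trend; Curve Estimation offers other model forms when it does not.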
Logistic Regression
Logistic regression is useful to predict the
presence/absence of a characteristic or outcome based
on a set of predictor variables.
It is suited to modelling a dependent variable that is
categorical (having two or more possible categories).
Regression coefficients can be used to estimate odds
ratios for each of the independent variables in the model.
Independent variables can be quantitative or categorical:
if categorical, they should be dummy or indicator coded
(there is an option in the procedure to recode categorical
variables automatically).
Logistic Regression
Logistic function:
$$\mathrm{Prob}(\text{event}) = \pi(X) = \frac{\exp(\beta_0 + \beta_1 X_1)}{1 + \exp(\beta_0 + \beta_1 X_1)}$$
The odds of the event are given by
$$\text{Odds} = \frac{\mathrm{Prob}(\text{event})}{1 - \mathrm{Prob}(\text{event})}$$
Taking the log of the odds gives:
$$\log\left[\frac{\Pr(\text{event})}{1 - \Pr(\text{event})}\right] = \mathrm{Logit}(\pi(X)) = \beta_0 + \beta_1 X_1$$
In binary logistic regression, the outcome variable follows
a binomial distribution with probability of success = π(X).
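The three expressions above can be checked numerically. The sketch below uses hypothetical coefficients (β0 = -1.5, β1 = 0.8) and confirms that the log of the odds returns the linear predictor.

```python
import math

def logistic(b0, b1, x):
    """Probability of the event under the logistic model with one predictor."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1 - p)

def logit(p):
    """Natural log of the odds."""
    return math.log(odds(p))

# Hypothetical coefficients, for illustration only.
b0, b1 = -1.5, 0.8

for x in [0.0, 1.0, 2.0, 3.0]:
    p = logistic(b0, b1, x)
    print(f"x = {x:.0f}  prob = {p:.3f}  odds = {odds(p):.3f}  logit = {logit(p):.2f}")
```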
Logistic Regression
The Wald test is used to test the significance of the
coefficients in logistic regression.
The odds ratio (OR) is given by: $e^{\beta}$
The OR is the factor by which the odds of the outcome change
for a unit increase in the value of a quantitative explanatory
variable; it approximates how much more likely (or unlikely)
the outcome is to be present.
For a categorical explanatory variable, the OR compares the odds
of the outcome among those in category j with the odds in the
reference category.
The Hosmer and Lemeshow Test is applied to test
goodness-of-fit of the model.
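The link between a coefficient and its odds ratio can be verified directly: the ratio of the odds at x + 1 to the odds at x equals exp(β1) for any x. The sketch below uses a hypothetical coefficient and standard error, and also computes the Wald chi-square as the squared ratio of the coefficient to its standard error.

```python
import math

def logistic(b0, b1, x):
    """Probability of the event under the logistic model with one predictor."""
    z = b0 + b1 * x
    return math.exp(z) / (1 + math.exp(z))

def odds(p):
    """Odds of an event that has probability p."""
    return p / (1 - p)

# Hypothetical estimates, for illustration only.
b0, b1, se_b1 = -1.5, 0.8, 0.25

odds_ratio = math.exp(b1)
ratio_from_odds = odds(logistic(b0, b1, 3.0)) / odds(logistic(b0, b1, 2.0))

# Wald chi-square with 1 degree of freedom; 3.84 is the 5% critical value.
wald = (b1 / se_b1) ** 2

print(f"OR = exp(b1) = {odds_ratio:.3f}")
print(f"odds(x=3) / odds(x=2) = {ratio_from_odds:.3f}")
print(f"Wald chi-square = {wald:.2f}, significant at 5%: {wald > 3.84}")
```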
Logistic Regression using SPSS
Use Analyze > Regression > Binary Logistic...
Select the dependent variable and move it into
the Dependent: box.
Select the independent variables and move them
into the Covariates: box.
Click on OK.
Multiple Logistic Regression
An outcome variable that has three or more nominal
categories can be modelled using multinomial logistic
regression.
A three-level response variable may be coded as 0, 1, and
2.
There is no inherent order in the outcome variable. The
codes 0, 1, and 2 are arbitrary.
Outcome categories: 0, 1, 2. Take any one of these
categories as a reference (e.g., 0).
Then compare: 1 versus 0, and 2 versus 0. There will be
two analogous expressions.
Multiple Logistic Regression
The first is the natural log of the probability that the
outcome is in category 1 divided by the probability that
the outcome is in category 0.
The second is the natural log of the probability that the
outcome is in category 2 divided by the probability that
the outcome is in category 0.
These are the natural logs of "odds-like" expressions, not true odds:
Pr(Outcome = 0) + Pr(Outcome = 1) + Pr(Outcome = 2) = 1,
but in general Pr(Outcome = 1) + Pr(Outcome = 0) ≠ 1 and
Pr(Outcome = 2) + Pr(Outcome = 0) ≠ 1.
Multiple Logistic Regression
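Logistic regression model with three outcome categories
and one predictor (X1):
$$\ln\left[\frac{\Pr(\text{Outcome} = 1 \mid X_1)}{\Pr(\text{Outcome} = 0 \mid X_1)}\right] = \alpha_1 + \beta_{11} X_1$$
$$\ln\left[\frac{\Pr(\text{Outcome} = 2 \mid X_1)}{\Pr(\text{Outcome} = 0 \mid X_1)}\right] = \alpha_2 + \beta_{21} X_1$$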
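Consider the special case in which the only independent
variable (X1) is coded as 0 and 1.
Two "odds-like" ratios compare X1 = 1 with X1 = 0, one for each logit:
$$\mathrm{OR}_1 = e^{\beta_{11}}, \qquad \mathrm{OR}_2 = e^{\beta_{21}}$$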
Multiple Logistic Regression
Interpretation of ORs for X1 = 1 versus X1 = 0:
If the value of OR1 is greater than 1.00, then subjects in
X1 = 1 category relative to subjects in X1 = 0 category
are more likely to have their response categorized as 1
than as 0.
If the value of OR2 is greater than 1.00, then subjects in
X1 = 1 category relative to subjects in X1 = 0 category
are more likely to have their response categorized as 2
than as 0.
We can use a likelihood ratio test to assess the
significance of the independent variable in the model.
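With category 0 as the reference, the two logits determine all three outcome probabilities, and for a 0/1 predictor the ratio of the odds-like expressions at X1 = 1 and X1 = 0 equals the exponentiated slope. The sketch below uses hypothetical intercepts and slopes to show both points; the function name is an assumption.

```python
import math

def multinomial_probs(alphas, betas, x):
    """Probabilities [Pr(0), Pr(1), Pr(2), ...] with category 0 as reference.

    alphas[j] and betas[j] are the intercept and slope list of the logit
    comparing category j + 1 with category 0; x is the list of predictor values.
    """
    etas = [a + sum(b * xi for b, xi in zip(bs, x)) for a, bs in zip(alphas, betas)]
    denom = 1 + sum(math.exp(e) for e in etas)
    return [1 / denom] + [math.exp(e) / denom for e in etas]

# Hypothetical parameters: one binary predictor X1.
alphas = [-0.5, -1.0]      # alpha_1, alpha_2
betas = [[0.7], [1.2]]     # beta_11, beta_21

p_x0 = multinomial_probs(alphas, betas, [0])
p_x1 = multinomial_probs(alphas, betas, [1])

# Ratios of the odds-like expressions for X1 = 1 versus X1 = 0.
or1 = (p_x1[1] / p_x1[0]) / (p_x0[1] / p_x0[0])
or2 = (p_x1[2] / p_x1[0]) / (p_x0[2] / p_x0[0])

print("Pr at X1 = 0:", [round(p, 3) for p in p_x0])
print("Pr at X1 = 1:", [round(p, 3) for p in p_x1])
print(f"OR1 = {or1:.3f} (exp(beta_11) = {math.exp(0.7):.3f})")
print(f"OR2 = {or2:.3f} (exp(beta_21) = {math.exp(1.2):.3f})")
```

Both odds ratios exceed 1.00 here, so subjects with X1 = 1 are more likely than those with X1 = 0 to fall in category 1, or in category 2, rather than in category 0.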
Multiple Logistic Regression
We can also expand the above model by adding more
independent variables.
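The log odds comparing category 1 to category 0:
$$\ln\left[\frac{\Pr(\text{Outcome} = 1 \mid X)}{\Pr(\text{Outcome} = 0 \mid X)}\right] = \alpha_1 + \sum_{i=1}^{p} \beta_{1i} X_i$$
The log odds comparing category 2 to category 0:
$$\ln\left[\frac{\Pr(\text{Outcome} = 2 \mid X)}{\Pr(\text{Outcome} = 0 \mid X)}\right] = \alpha_2 + \sum_{i=1}^{p} \beta_{2i} X_i$$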
Multiple Logistic Regression using SPSS
Analyze > Regression > Multinomial Logistic...
Move the dependent variable into the Dependent: box.
Move the explanatory variables into the Factors: or
Covariates: box.
Factors can be either numeric or categorical.
Covariates must be numeric if specified.