Quantitative Data Analysis:
1
Data analysis
Descriptive/Frequency
- Demographics (Number and/or percentage)
- Cross-tabulation (Number and/or percentage)
Goodness of measures: measurement validity and reliability.
Reliability: The degree to which measures are free from random error and therefore yield
consistent results.
Inferential/Hypothesis testing
- t-test or ANOVA
- Correlation
- Regression
The Right Technique in Data Analysis?
What is the purpose of the analysis?
- Descriptive, compare group, relationship
What is the level of measurement?
- Parametric and Non-parametric
How many variables are involved?
- Univariate, bivariate, multivariate
What kind of test?
- Descriptive or inferential
If inferential, set the significance level
Descriptive Analysis
Purpose: To describe the distribution of the demographic
variable
Frequency distribution – if 1 ordinal or nominal variable
Cross-tabulation – if 2 ordinal or nominal variables
Means – if 1 interval or ratio variable
Means of subgroups – if 1 interval or ratio variable by subgroup
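For readers working outside SPSS, here is a minimal pandas sketch of these four descriptive summaries; the DataFrame and column names below are invented for illustration only.

```python
import pandas as pd

# Invented survey data: two nominal variables and one ratio variable
df = pd.DataFrame({
    "gender":   ["M", "F", "F", "M", "F", "M"],
    "agegroup": ["young", "young", "middle", "older", "middle", "young"],
    "income":   [2500, 3100, 4200, 3900, 3600, 2800],
})

print(df["gender"].value_counts())                 # frequency distribution (1 nominal variable)
print(pd.crosstab(df["gender"], df["agegroup"]))   # cross-tabulation (2 nominal variables)
print(df["income"].mean())                         # mean (1 ratio variable)
print(df.groupby("gender")["income"].mean())       # means by subgroup
```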
GOODNESS OF MEASURE
Validity (criterion)
- Factor analysis
Reliability
- Cronbach alpha
FACTOR ANALYSIS
• Go to Analyze – Dimension Reduction – Factor
- Enter the items of the IV or DV into the dialogue box
- Tick Descriptives – Initial solution – Coefficients – Significance levels – Determinant – KMO and Bartlett's test of sphericity – Inverse – Reproduced – Anti-image
- Tick Extraction – Principal components
- Tick Rotation – Varimax – Rotated solution – Loading plot(s)
- Tick Scores – Display factor score coefficient matrix
- Tick Options – Sorted by size
- Click OK
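For comparison, a hedged sketch of the same analysis outside SPSS, using the third-party factor_analyzer package (an assumption; it is not part of these slides). The item DataFrame is invented for illustration.

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo, calculate_bartlett_sphericity

rng = np.random.default_rng(0)
items = pd.DataFrame(rng.normal(size=(200, 8)),
                     columns=[f"item{i}" for i in range(1, 9)])  # hypothetical item scores

chi2, p = calculate_bartlett_sphericity(items)    # Bartlett's test of sphericity
kmo_per_item, kmo_total = calculate_kmo(items)    # KMO measure of sampling adequacy

fa = FactorAnalyzer(n_factors=2, rotation="varimax", method="principal")
fa.fit(items)
print(fa.get_eigenvalues()[0])   # eigenvalues (Kaiser criterion: keep those > 1)
print(fa.loadings_)              # rotated factor loadings, used to name the factors
```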
FACTOR ANALYSIS … CONT.
To conduct a Factor Analysis, start from the
“Analyze” menu. This procedure is intended to
reduce the complexity in a set of data, so we
choose “Dimension Reduction” from the menu.
And the choice in this category is “Factor,” for
factor analysis.
This dataset gives children’s scores on subtests
of the Wechsler Intelligence Scale for Children
(WISC-III). The Wechsler scales are scored to
give you a “verbal” and a “performance” IQ.
The question is whether we can reproduce the
verbal vs. nonverbal distinction, with the
appropriate subtests grouping into each
category, using factor analysis.
FACTOR ANALYSIS … CONT.
Factor analysis has no IVs and DVs, so
everything you want to get factors for just goes
into the list labeled “variables.” In this case, it’s
all the variables. In some datasets, there is also
a dummy “subject number” variable included. Be
sure that you don’t include subject number as
one of the variables for your factor analysis!
FACTOR ANALYSIS … CONT.
In this dialog box, you can make a number of selections. First, I want you
to un-check the box labeled “Unrotated factor solution.” This is a
default setting for your printout, but it just gives you information that you
don’t need, and that may distract you from the real answers. So,
always go into the Extraction sub-dialog and un-check this box.
Second, check the box for a “scree plot.” This will give you a scree
diagram, which is one way to decide how many factors to extract.
Third, look at the section labeled “Extract.” As you can see, the default
setting is for SPSS to use the Kaiser stopping criterion (i.e., all factors
with eigenvalues greater than 1) to decide how
many factors to extract.
You can set a more conservative stopping criterion by requiring each
factor to have a higher eigenvalue.
Or, if you already know exactly how many factors you think
there will be, you can set the extraction method to a specific “Number of
factors,” and then put
the number into this box.
FACTOR ANALYSIS … CONT.
This dialog allows you to choose a “rotation method” for your
factor analysis.
A rotation method produces factors that are as different from each
other as possible, and helps you interpret the factors by loading
each variable primarily on one of the factors.
However, you still need to decide whether you want an
"orthogonal" solution (factors are uncorrelated with
each other), or an "oblique" solution (factors are allowed to
correlate with one another).
If you want an oblique solution, the only choice SPSS gives
you is “Direct Oblimin.”
All of the others are orthogonal solutions—the one that you’ll
use most often from these choices is the default value,
“Varimax.” Most of the factor analyses you will see in
published articles use a Varimax rotation.
Make sure that the check-box for a "rotated solution" is on.
The rotated solution gives you the factor loadings for each
individual variable in your dataset, which are
what you use to interpret the meaning of (i.e., make up names
for) the different factors.
FACTOR ANALYSIS … CONT.
This table shows you the actual factors that were
extracted. If you look at the section labeled
“Rotation Sums of Squared Loadings,” it shows you
only those factors that met your cut-off
criterion (extraction method). In this case, there were
three factors with eigenvalues greater than
1. SPSS always extracts as many factors initially as
there are variables in the dataset, but the rest
of these didn’t make the grade. The “% of variance”
column tells you how much of the total
variability (in all of the variables together) can be
accounted for by each of these summary scales
or factors. Factor 1 accounts for 27.485% of the
variability in all 11 variables, and so on.
FACTOR ANALYSIS … CONT.
Finally, the Rotated Component Matrix shows you the
factor loadings for each variable. I went
across each row, and highlighted the factor that each
variable loaded most strongly on. Based on
these factor loadings, I think the factors represent:
--The first 5 subtests loaded strongly on Factor 1, which
I’ll call “Verbal IQ”
--Picture Completion through Object Assembly all loaded
strongly on Factor 2, which I'll call
"Performance IQ"
--Coding loaded strongly on Factor 3 (and Digit Span
loaded fairly strongly on Factor 3,
although it also loaded on Factor 1). Probably Factor 3 is
“Freedom from Distraction,” because
these are concentration-intensive tasks.
FACTOR ANALYSIS … CONT.
KMO and Bartlett's Test
Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .619
Bartlett's Test of Sphericity: Approx. Chi-Square = 327.667, df = 91, Sig. = .000

Total Variance Explained (each triple = Total, % of Variance, Cumulative %)
Component   Initial Eigenvalues       Extraction Sums of Sq. Loadings   Rotation Sums of Sq. Loadings
1           2.672  19.087   19.087    2.672  19.087  19.087             1.941  13.864  13.864
2           2.116  15.111   34.198    2.116  15.111  34.198             1.911  13.648  27.512
3           1.314   9.385   43.583    1.314   9.385  43.583             1.521  10.866  38.378
4           1.129   8.065   51.648    1.129   8.065  51.648             1.489  10.635  49.012
5           1.024   7.316   58.964    1.024   7.316  58.964             1.393   9.952  58.964
6            .915   6.538   65.502
7            .908   6.485   71.987
8            .820   5.860   77.848
9            .729   5.209   83.056
10           .628   4.484   87.540
11           .541   3.865   91.405
12           .471   3.365   94.771
13           .403   2.876   97.647
14           .329   2.353  100.000
Extraction Method: Principal Component Analysis.
Output of Factor Analysis
Rotated Component Matrix
         Component 1   Component 2   Component 3   Component 4   Component 5
BO12        .742          .070          .084          .291          .013
BO7         .724         -.055         -.214         -.086          .146
BO6         .722         -.011          .243          .136          .129
BO8         .063          .801          .149         -.044         -.041
BO14       -.303          .772          .183          .025          .170
BO13       -.197         -.627          .137          .127          .333
BO11       -.061          .083         -.802          .062          .153
BO4         .035          .318          .599          .099         -.074
BO9        -.013         -.016         -.016          .820          .054
BO1        -.236          .230          .359         -.556          .143
BO2         .303          .022          .190          .539          .223
BO3         .196          .114         -.035         -.099          .797
BO10       -.127          .169          .212         -.214         -.577
BO5         .049          .260          .349         -.136         -.379
Extraction Method: Principal Component Analysis. Rotation Method: Varimax with
Kaiser Normalization. a. Rotation converged in 6 iterations.
Output of Factor Analysis … cont.
RELIABILITY
• Go to Analyze – Scale – Reliability Analysis
- Enter the items to be analyzed
- Tick Statistics – Descriptives for: Item, Scale, Scale if item deleted
• Verify the output
- If the scale's Cronbach's alpha > .70, the reliability
of the variable is achieved (Nunnally, 1978)
- If not, check the Cronbach's Alpha if Item Deleted column
to detect possible improvement.
- Drop the item indicated in that column and run the
reliability analysis again.
- Compute a summated scale to form the variable.
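Outside SPSS, Cronbach's alpha can be computed directly as alpha = k/(k-1) * (1 - sum of item variances / variance of the summated scale). A minimal sketch follows; the data are invented, and the column names only echo the items in the output shown after this block.

```python
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    k = items.shape[1]                           # number of items
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summated scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented responses on three items (names are illustrative only)
items = pd.DataFrame({
    "BO12": [4, 5, 3, 4, 2, 5, 4, 3],
    "BO6":  [3, 4, 3, 4, 2, 5, 3, 3],
    "BO7":  [3, 3, 2, 4, 2, 4, 3, 2],
})
print(round(cronbach_alpha(items), 4))
```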
RELIABILITY ANALYSIS - SCALE (ALPHA)

            Mean     Std Dev   Cases
1. BO12     3.9580   1.0269    143.0
2. BO6      3.4825    .9704    143.0
3. BO7      2.9650    .8914    143.0

Statistics for   Mean      Variance   Std Dev   N of Variables
SCALE            10.4056   4.9048     2.2147    3

Item-total Statistics
        Scale Mean        Scale Variance    Corrected Item-     Alpha if
        if Item Deleted   if Item Deleted   Total Correlation   Item Deleted
BO12    6.4476            2.3194            .4894               .5030
BO6     6.9231            2.4518            .4974               .4916
BO7     7.4406            2.9243            .3890               .6348

Reliability Coefficients
N of Cases = 143.0    N of Items = 3
Alpha = .6465
1 Dependent Variable, 1 Independent Variable:
Dependent variable   Independent variable   Test
Binary               Metric                 Logistic regression
Binary               Non-metric             Chi-square test
Non-metric           Metric                 Logistic regression
Non-metric           Binary                 Mann-Whitney test
Metric               Binary                 t-test
Metric               Metric                 Regression analysis
Metric               Nominal                Analysis of variance
When do we need which test?
18
1 Dependent Variable, 2 or more Independent Variables:
Dependent variable   Independent variables   Test
Non-metric           Metric                  Logistic regression
Non-metric           Non-metric              Loglinear analysis
Metric               Metric                  Multiple regression
Metric               Non-metric              Analysis of variance
When do we need which test?
19
A chi-squared test (also chi-square or χ2 test) is a statistical hypothesis test used in the analysis of contingency
tables when the sample sizes are large. Two common chi-square tests involve checking whether observed frequencies
in one or more categories match expected frequencies.
A contingency table is a tool used to summarize and analyze the relationship between two categorical variables.
The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is
either ordinal or continuous, but not normally distributed. For example, you could use the Mann-Whitney U test to understand
whether attitudes towards pay discrimination, where attitudes are measured on an ordinal scale, differ based on gender (i.e., your
dependent variable would be "attitudes towards pay discrimination" and your independent variable would be "gender", which has
two groups: "male" and "female").
A t test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine
whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one
another.
Log-linear analysis is a statistical test used to determine whether the proportions of categories in two or more group variables
differ significantly from each other. To use this test, you should have two or more group variables, each with two or more
categories.
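The first three tests described above can also be run with SciPy; this is a hedged sketch on invented data (the contingency table and group scores are illustrative only).

```python
import numpy as np
from scipy import stats

# Chi-square test of independence on a 2x2 contingency table
table = np.array([[30, 10], [20, 25]])
chi2, p_chi, dof, expected = stats.chi2_contingency(table)

# Two small illustrative samples for the group-comparison tests
group_a = np.array([3, 4, 2, 5, 4, 3, 5])
group_b = np.array([2, 1, 3, 2, 2, 1, 3])

u_stat, p_mw = stats.mannwhitneyu(group_a, group_b)   # ordinal / non-normal DV, two groups
t_stat, p_t = stats.ttest_ind(group_a, group_b)       # metric DV, two group means
print(p_chi, p_mw, p_t)
```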
Correlation
H1: Autonomy and innovative orientation among Bumiputera
SMEs in northern Malaysia are related significantly
Correlations
                                   Autonomy   Innovative
Autonomy     Pearson Correlation   1          .072
             Sig. (2-tailed)       .          .297
             N                     210        210
Innovative   Pearson Correlation   .072       1
             Sig. (2-tailed)       .297       .
             N                     210        210
Interpretation:
(r = .072, p = .297). If the significance level is set at p < .05, then
there is no statistically significant correlation between autonomy
and innovativeness. Therefore, H1 is rejected (not supported).
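A minimal SciPy sketch of the same kind of Pearson correlation test; the values below are invented and are not the study's data.

```python
import numpy as np
from scipy import stats

autonomy = np.array([3.2, 4.1, 2.8, 3.9, 4.4, 3.1, 2.9, 3.7])     # invented scores
innovative = np.array([3.0, 3.8, 3.1, 2.9, 4.0, 3.3, 3.5, 3.6])   # invented scores

r, p = stats.pearsonr(autonomy, innovative)
print(f"r = {r:.3f}, p = {p:.3f}")   # at the 5% level, H1 is not supported if p >= .05
```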
 The purpose of regression models is to learn more about
the relationship between several independent or
predictor variables and a dependent or criterion
variable.
 The computational problem that needs to be solved in
regression analysis is to fit a straight line to a number
of points.
 Y = b0 + b1x1 + b2x2 + … + bnxn + e
Regression models
23
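A hedged sketch of fitting the regression model above (Y = b0 + b1x1 + b2x2 + e) with statsmodels; the data are generated purely for illustration and the variable names are assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 0.8 * x1 - 0.5 * x2 + rng.normal(scale=0.5, size=50)   # invented data

X = sm.add_constant(np.column_stack([x1, x2]))   # adds b0 (the intercept)
model = sm.OLS(y, X).fit()
print(model.summary())   # F, R-squared, unstandardized b coefficients, t, p, confidence intervals
```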
 Linear regression
 1 dependent variable: continuous/scale
 One or more independent variables: continuous/scale
 Hierarchical regression
 1 dependent variable: continuous/scale
 Multiple blocks of independent variables: continuous/scale
 Logistic regression
 1 dependent variable: binary
 One or more independent variables: continuous/scale
Types of Regression Models
24
Output of SPSS Regression Analyses
25
Output of SPSS Regression Analyses
26
The F-test can assess
the equality of variances.
Output of SPSS
Regression Analyses
27
Confidence interval = sample mean ± margin of error.
To obtain the confidence interval, add and subtract the
margin of error from the sample mean; the results are
the upper and lower limits of the confidence interval.
MULTIPLE REGRESSION ANALYSIS…CONT.
Consider some multiple regression assumptions:
1. Normality – verify skewness < 2.0 or inspect the histogram (skewness is a measure of the
asymmetry or distortion of a symmetrical distribution in a data set).
2. Linearity – verify the P-P plot of standardized regression residuals.
3. Homoscedasticity – an assumption of equal or similar variances in the different groups being compared.
4. Independence of error terms – Durbin-Watson statistic between 1.5 and 2.5.
5. Freedom from multicollinearity – correlations between predictors < .70.
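A hypothetical Python sketch of checking these assumptions; the data and model are invented (refit here so the snippet stands alone), and the thresholds follow the slide rather than any library default.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy.stats import skew
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
X = pd.DataFrame({"x1": rng.normal(size=80), "x2": rng.normal(size=80)})   # invented IVs
y = 1.0 + 0.8 * X["x1"] - 0.5 * X["x2"] + rng.normal(scale=0.5, size=80)   # invented DV

model = sm.OLS(y, sm.add_constant(X)).fit()
resid = model.resid

print("skewness of residuals:", skew(resid))     # normality: |skewness| < 2.0
print("Durbin-Watson:", durbin_watson(resid))    # independence of errors: between 1.5 and 2.5
print(X.corr())                                  # multicollinearity: pairwise |r| < .70
```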
1. Describe Descriptive Statistics (means, st. dev.) of all variables
2. Report on testing of assumptions – especially if assumptions are violated and what
was done about it.
3. Report on model fit statistics (F, df1, df2, R2).
4. Report parameter estimates – for constant and IV
1. Standardized Beta
2. T-value and significance
3. (Confidence intervals)
Reporting Regression Analyses
29
 Type of regression models where
 The dependent variable is binary
 [or ordinal: ordered logistic regression (e.g. 3 categories: low, medium, high)]
 Checks whether we can predict in which category we will land based on the
values of the IV.
 Essentially compares a model with predictors (BLOCK 1) against a model
without predictors (BLOCK 0):
 is a prediction with our variables better than random chance?
Example: http://eprints.qut.edu.au/31606/
Logistic Regression Analysis
30
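A minimal statsmodels sketch of a binary logistic regression on invented data; the likelihood-ratio (LLR) p-value in the summary plays the role of the Block 0 versus Block 1 comparison described above.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=100)                                       # one continuous predictor
y = (x + rng.normal(scale=1.0, size=100) > 0).astype(int)      # invented binary outcome

X = sm.add_constant(x)                                         # intercept-only part = Block 0 analogue
model = sm.Logit(y, X).fit()
print(model.summary())   # coefficients plus the LLR p-value vs. the constant-only model
```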
Logistic Regression
Analysis: Output
31
Logistic Regression
Analysis: Output
32
Reporting Logistic
Regression Analyses
33
 a statistical method used to test differences between two or more means.
 Inferences about means are made by analyzing variance.
 Think of it as an extension of t-tests
 To two or more groups
 To means + variance rather than only means.
 In a typical ANOVA, the null hypothesis is that all groups are random samples of the same population.
 For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that
all treatments have the same effect (perhaps none).
 Rejecting the null hypothesis would imply that different treatments result in altered effects.
 Often used in experimental research, to study effects of treatments.
Analysis of Variance Models
34
 One-way ANOVA
 used to test for differences among two or more independent groups (means).
 Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be
covered by a t-test (when there are only two means to compare, the t-test and the ANOVA F-test are equivalent).
 Factorial ANOVA
 used when the experimenter wants to study the interaction effects among the treatments.
 Repeated measures ANOVA
 used when the same subjects are used for each treatment (e.g., in a longitudinal study).
 Multivariate analysis of variance (MANOVA)
 used when there is more than one dependent variable.
 Analysis of covariance (ANCOVA)
 blends ANOVA and regression: evaluates whether population means of a DV are equal across levels of a categorical IV [treatment],
while statistically controlling for the effects of other continuous variables that are not of primary interest [covariates].
Types of Analysis of Variance Models
35
When can we use ANOVA?
• The t-test is used to compare the means of two groups.
• One-way ANOVA is used to compare the means of two or more
groups.
• We can use one-way ANOVA whenever the dependent variable (DV)
is numerical and the independent variable (IV) is categorical.
• The independent variable in ANOVA is also called a factor.
36
Examples
The following are situations where we can use ANOVA:
• Testing the differences in blood pressure among different groups
of people (DV is blood pressure and the group is the IV).
• Testing which type of social media affects hours of sleep (type of
social media used is the IV and hours of sleep is the DV).
37
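A small SciPy sketch of a one-way ANOVA on invented data for three groups (for example, blood pressure measured in three groups of people).

```python
import numpy as np
from scipy import stats

group1 = np.array([120, 125, 130, 128, 122])   # invented DV values, group 1
group2 = np.array([131, 135, 129, 138, 133])   # group 2
group3 = np.array([140, 142, 139, 145, 141])   # group 3

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")  # p < .05 suggests at least one group mean differs
```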
 The type of ANOVA model is highly dependent on your research design and
theory; in particular:
 What are between-subject factors? How many?
 What are within-subject factors? How many?
 What are treatments? How many?
 Which factors are theoretically relevant, which are mere controls?
ANOVA and Research Designs
38
 Independence, normality and homogeneity of the variances of the
residuals
 Note there are no necessary assumptions for ANOVA in its full generality,
but the F-test used for ANOVA hypothesis testing has assumptions and
practical limitations.
ANOVA Assumptions
39
 One-way
= one-way between groups model
 E.g., school performance between boys versus girls
 Two-way
= two one-ways for each factor PLUS
interaction between two factors
 E.g., school performance between boys versus girls and locals versus
internationals
 Three-way
 You get the idea…
One-way and two-way ANOVA
40
 Injuries sustained by kids wearing superhero
costumes
 Does it depend on which costume they wear?
 Superman, Spiderman, Hulk, Ninja Turtle?
 Adopted from
http://www.statisticshell.com/docs/onewayanova.pdf
Illustration: Analysis of Variance
41
 Are injuries sustained random or significantly dependent on wearing superhero
costumes?
 Is there any order of injuries sustained by type of costume?
What ANOVA could tell us
42
What ANOVA could tell us
Variance in injury severity explained by different costumes:
- Contrast 1: flying superheroes (Superman, Spiderman) versus non-flying superheroes (Hulk, Ninja Turtle)
- Contrast 2: Superman versus Spiderman
- Contrast 3: Hulk versus Ninja Turtle
Assumptions of ANOVA
• The observations in each group are normally distributed.
This can be tested by plotting the numerical variable separately for
each group and checking that they all have a bell shape.
Alternatively, you could use the Shapiro-Wilk test for normality.
44
Assumptions
• The groups have equal variances (i.e., homogeneity of variance).
You can plot each group separately and check that they exhibit similar variability.
Alternatively, you can use Levene’s test for homogeneity.
• The observations in each group are independent.
This could be assessed by common sense looking at the study design.
For example, if there is a participant in more than one group, your observations are
not independent.
45
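A small SciPy sketch of checking these two assumptions on invented group data: Shapiro-Wilk per group for normality, then Levene's test across groups for homogeneity of variance.

```python
import numpy as np
from scipy import stats

group1 = np.array([120, 125, 130, 128, 122, 127])   # invented data
group2 = np.array([131, 135, 129, 138, 133, 130])
group3 = np.array([140, 142, 139, 145, 141, 137])

for g in (group1, group2, group3):
    w, p_norm = stats.shapiro(g)          # Shapiro-Wilk test of normality per group
    print(round(p_norm, 3))               # p > .05: no evidence against normality

lev, p_var = stats.levene(group1, group2, group3)    # Levene's test of equal variances
print(round(p_var, 3))                                # p > .05: homogeneity assumption holds
```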
Hypothesis Testing
ANOVA tests the null hypothesis:
H0 : The groups have equal means versus the alternative
hypothesis:
H1 : At least one group mean is different from the other
group means.
46
F-Test
ANOVA in SPSS
47
Example:
Is there a difference in optimism scores for young,
middle-aged and old participants?
Categorical IV - Age with 3 levels:
• 29 and younger
• Between 30 and 44
• 45 or above
Continuous DV – Optimism scores
ANOVA in SPSS
48
Interpreting the output:
1. Check that the groups have equal variances using Levene’s test for
homogeneity.
• Check the significance value (Sig.) for Levene’s test Based on Mean.
• If this number is greater than .05, you have not violated the assumption of
homogeneity of variance.
ANOVA in SPSS
49
Interpreting the output:
2. Check the significance of the ANOVA.
• If the Sig. value is less than or equal to .05, there is a significant difference
somewhere among the mean scores on your dependent variable for the three
groups.
• However, this does not tell us which group is different from which other
group.
ANOVA in SPSS
50
Interpreting the output:
3. ONLY if the ANOVA is significant, check the significance of the
differences between each pair of groups in the table labelled
Multiple Comparisons.
ANOVA in SPSS
51
Calculating effect size:
• In an ANOVA, effect size will tell us how large the difference between
groups is.
• We will calculate eta squared, which is one of the most common
effect size statistics.
Eta squared = (sum of squares between groups) / (total sum of squares)
ANOVA in SPSS
52
Calculating effect size:
Eta squared = 179.07 / 8513.02 = .02
According to Cohen (1988):
Small effect: .01
Medium effect: .06
Large effect: .14
ANOVA in SPSS
53
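A hypothetical sketch of obtaining eta squared from a fitted one-way ANOVA with statsmodels; the data and the column names "optimism" and "agegroup" are invented for illustration.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

df = pd.DataFrame({
    "optimism": [20, 22, 21, 19, 23, 22, 24, 23, 21, 25, 23, 25, 24, 22, 26],  # invented DV
    "agegroup": ["young"] * 5 + ["middle"] * 5 + ["older"] * 5,                # invented IV
})

model = ols("optimism ~ C(agegroup)", data=df).fit()
anova = sm.stats.anova_lm(model, typ=2)               # table of sums of squares
ss_between = anova.loc["C(agegroup)", "sum_sq"]       # between-groups sum of squares
ss_total = anova["sum_sq"].sum()                      # total sum of squares
print("eta squared =", round(ss_between / ss_total, 2))
```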
Example results write-up:
A one way between-groups analysis of variance was conducted to explore the impact of
age on levels of optimism. Participants were divided into three groups according to their
age (Group 1: 29yrs or less; Group 2: 30 to 44yrs; Group 3: 45yrs and above). There was a
statistically significant difference at the p < .05 level in optimism scores for the three age
groups: F (2, 432) = 4.6, p = .01. Despite reaching statistical significance, the actual
difference in mean scores between the groups was quite small. The effect size, calculated
using eta squared, was .02. Post-hoc comparisons using the Tukey HSD test indicated that
the mean score for Group 1 (M = 21.36, SD = 4.55) was significantly different
from Group 3 (M = 22.96, SD = 4.49).
ANOVA in SPSS
54
Note: Results are usually rounded to two decimal places
Descriptive Statistics-Numeric Data
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → DESCRIPTIVES
• Choose any variables to be analyzed and place them in box on right
• Options include summary statistics such as the mean, standard deviation, minimum, and maximum
Descriptive Statistics-General Data
• After Importing your dataset, and providing names to variables,
click on:
• ANALYZE → DESCRIPTIVE STATISTICS → FREQUENCIES
• Choose any variables to be analyzed and place them in box on right
• Options include (For Categorical Variables):
• Frequency Tables
• Pie Charts, Bar Charts
• Options include (For Numeric Variables)
• Frequency Tables (Useful for discrete data)
• Measures of Central Tendency, Dispersion, Percentiles
• Pie Charts, Histograms
Example 1.4 - Smoking Status
[SPSS frequency table for smoking status omitted]
Vertical Bar Charts and Pie Charts
• After Importing your dataset, and providing names to variables, click on:
• GRAPHS → BAR… → SIMPLE (Summaries for Groups of Cases) → DEFINE
• Bars Represent N of Cases (or % of Cases)
• Put the variable of interest as the CATEGORY AXIS
• GRAPHS → PIE… (Summaries for Groups of Cases) → DEFINE
• Slices Represent N of Cases (or % of Cases)
• Put the variable of interest as the DEFINE SLICES BY
Example 1.5 - Antibiotic Study
[Bar chart of Count (0-80) by OUTCOME category (1-5) omitted]
Histograms
• After Importing your dataset, and providing names to
variables, click on:
• GRAPHS → HISTOGRAM
• Select Variable to be plotted
• Click on DISPLAY NORMAL CURVE if you want a normal curve
superimposed (see Chapter 3).
Example 1.6 - Drug Approval Times
[Histogram of MONTHS (drug approval times) omitted: Mean = 32.1, Std. Dev = 20.97, N = 175.00]
Side-by-Side Bar Charts
• After Importing your dataset, and providing
names to variables, click on:
• GRAPHS → BAR… → Clustered (Summaries for Groups of Cases) → DEFINE
• Bars Represent N of Cases (or % of Cases)
• CATEGORY AXIS: Variable that represents groups to be
compared (independent variable)
• DEFINE CLUSTERS BY: Variable that represents outcomes of
interest (dependent variable)
Example 1.7 - Streptomycin Study
[Clustered bar chart of Count by TRT (1, 2), with clusters defined by OUTCOME (1-6), omitted]
Scatterplots
• After Importing your dataset, and providing
names to variables, click on:
• GRAPHS → SCATTER → SIMPLE → DEFINE
• For Y-AXIS, choose the Dependent (Response) Variable
• For X-AXIS, choose the Independent (Explanatory) Variable
Example 1.8 - Theophylline
Clearance
[Scatterplot of THCLRNCE (y-axis) versus DRUG (x-axis) omitted]
Scatterplots with 2 Independent
Variables
• After Importing your dataset, and providing names to variables,
click on:
• GRAPHS → SCATTER → SIMPLE → DEFINE
• For Y-AXIS, choose the Dependent Variable
• For X-AXIS, choose the Independent Variable with the most levels
• For SET MARKERS BY, choose the Independent Variable with the fewest
levels
Example 1.8 - Theophylline Clearance
[Scatterplot of THCLRNCE versus SUBJECT, with markers set by DRUG (Tagamet, Pepcid, Placebo), omitted]
Contingency Tables for Conditional Probabilities
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, select the variable you are conditioning on
(Independent Variable)
• For COLUMNS, select the variable you are finding the conditional
probability of (Dependent Variable)
• Click on CELLS
• Click on ROW Percentages
Example 1.10 - Alcohol & Mortality
[Crosstab with row percentages for the alcohol and mortality data omitted]
Independent Sample t-Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → COMPARE MEANS → INDEPENDENT SAMPLES T-TEST
• For TEST VARIABLE, Select the dependent (response) variable(s)
• For GROUPING VARIABLE, Select the independent variable. Then
define the names of the 2 levels to be compared (this can be used
even when the full dataset has more than 2 levels for independent
variable).
Example 3.5 - Levocabastine in Renal
Patients
[Independent-samples t-test output omitted: group statistics, Levene's test for equality of variances, t, df, Sig. (2-tailed), mean difference, and confidence interval of the difference]
Paired t-test
• After Importing your dataset, and providing
names to variables, click on:
• ANALYZE → COMPARE MEANS → PAIRED SAMPLES T-TEST
• For PAIRED VARIABLES, Select the two dependent
(response) variables (the analysis will be based on first
variable minus second variable)
Example 3.7 - Cmax in SRC&IRC Codeine
[Paired-samples t-test output omitted: paired sample statistics, correlation, mean difference, t, df, and Sig. (2-tailed)]
Chi-Square Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED, EXPECTED, ROW PERCENTAGES, and
ADJUSTED STANDARDIZED RESIDUALS
• NOTE: Large ADJUSTED STANDARDIZED RESIDUALS (in absolute
value) show which cells are inconsistent with the null hypothesis of
independence. A common rule of thumb is to flag cells with values
greater than 3 in absolute value.
Example 5.8 - Marital Status &
Cancer
[Crosstab of marital status by cancer status omitted: observed and expected counts, row percentages, adjusted residuals, and chi-square test statistics (Pearson chi-square, likelihood ratio, df, Asymp. Sig.)]
Fisher’s Exact Test
• After Importing your dataset, and providing names to variables, click
on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on CHI-SQUARE
• Under CELLS, Click on OBSERVED and ROW PERCENTAGES
• NOTE: You will want to code the data so that the outcome present (Success)
category has the lower value (e.g. 1) and the outcome absent (Failure) category
has the higher value (e.g. 2). Similar for Exposure present category (e.g. 1) and
exposure absent (e.g. 2). Use Value Labels to keep output straight.
Example 5.5 - Antiseptic Experiment
[Crosstab with row percentages and chi-square / Fisher's exact test output for the antiseptic experiment omitted]
McNemar’s Test
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the outcome for condition/time 1
• For COLUMNS, Select the outcome for condition/time 2
• Under STATISTICS, Click on MCNEMAR
• Under CELLS, Click on OBSERVED and TOTAL PERCENTAGES
• NOTE: You will want to code the data so that the outcome present
(Success) category has the lower value (e.g. 1) and the outcome
absent (Failure) category has the higher value (e.g. 2). Similar for
Exposure present category (e.g. 1) and exposure absent (e.g. 2).
Use Value Labels to keep output straight.
Example 5.6 - Report of Implant Leak
[Crosstab with total percentages and McNemar's test output (exact p-value) omitted]
Relative Risks and Odds Ratios
• After Importing your dataset, and providing names to
variables, click on:
• ANALYZE → DESCRIPTIVE STATISTICS → CROSSTABS
• For ROWS, Select the Independent Variable
• For COLUMNS, Select the Dependent Variable
• Under STATISTICS, Click on RISK
• Under CELLS, Click on OBSERVED and ROW PERCENTAGES
• NOTE: You will want to code the data so that the outcome present
(Success) category has the lower value (e.g. 1) and the outcome
absent (Failure) category has the higher value (e.g. 2). Similar for
Exposure present category (e.g. 1) and exposure absent (e.g. 2).
Use Value Labels to keep output straight.
Example 5.1 - Pamidronate Study
[Crosstab with row percentages and risk estimate output (odds ratio, cohort relative risks, 95% confidence intervals) omitted]
Example 5.2 - Lip Cancer
[Crosstab with row percentages and risk estimate output (odds ratio, cohort relative risks, 95% confidence intervals) omitted]
Correlation
After Importing your dataset, and providing names
to variables, click on:
ANALYZE → CORRELATE → BIVARIATE
Select the VARIABLES
Select the PEARSON CORRELATION
Select the Two tailed test of significance
Select Flag significant correlations
Linear Regression
• After Importing your dataset, and providing
names to variables, click on:
• ANALYZE → REGRESSION → LINEAR
• Select the DEPENDENT VARIABLE
• Select the INDEPENDENT VARIABLE(S)
• Click on STATISTICS, then ESTIMATES, CONFIDENCE
INTERVALS, MODEL FIT
Examples 7.1-7.6 - Gemfibrozil Clearance
[Regression coefficients output omitted: unstandardized B, standard error, standardized Beta, t, Sig., and confidence intervals for B]
Examples 7.1-7.6 - Gemfibrozil Clearance
[Regression ANOVA table (sum of squares, df, mean square, F, Sig.) and model summary (R, R Square, Adjusted R Square, Std. Error of the Estimate) omitted]
Linear Regression
• We will introduce simple linear regression, in
particular we will:
• Learn when we can use simple linear regression
• Learn the basic workings involved in simple linear
regression
• Linear Regression in SPSS
• This presentation is intended for students in initial
stages of Statistics. No previous knowledge is
required.
90
Linear Regression
• Regression is used to study the relationship
between two variables: how a change in one variable
(e.g., someone's exercise habits) can predict the outcome
of another variable (e.g., general health).
• We can use simple regression if both the
dependent variable (DV) and the independent
variable (IV) are numerical.
• If the DV is numerical but the IV is categorical, it is
best to use ANOVA. 91
Examples
The following are situations where we can use
regression:
• Testing if IQ affects income (IQ is the IV and income
is the DV).
• Testing if study time affects grades (hours of study
time is the IV and average grade is the DV).
• Testing if exercise affects blood pressure (hours of
exercise is the IV and blood pressure is the DV).
92
Displaying the data
When both the DV and IV are numerical, we can
represent data in the form of a scatterplot.
93
Displaying the data
It is important to perform a scatterplot because it
helps us to see if the relationship is linear.
In this example, the
relationship between
body fat % and chance
of heart failure is not
linear and hence it is
not sensible to use
linear regression.
95
• Straight line prediction model.
• As an independent variable
changes, what happens to the
dependent variable? I.e., as an
independent variable goes up
and down, does the dependent
variable go up and down?
• They could either move in the
same direction (positive
relationship) or opposite
direction (negative relationship)
Linear Regression
96
Linear Regression
98
Linear Regression
y = B0 + B1 * X + E
[Figure omitted: grades (y) plotted against study time (X), with intercept B0 and slope B1 annotated]
99
Linear Regression
y = B0 - B1 * X + E
Assumptions of regression
• The errors E are normally distributed.
This can be tested by plotting a histogram of the residuals of
the regression and checking that they all have a bell shape.
Alternatively, you could use the Shapiro-Wilk test for
normality.
100
Assumptions of regression
• There are no clear outliers
This can be checked by inspecting the scatterplot; clear outliers can simply
be removed from the analysis.
101
Hypothesis testing
Regression tests the null hypothesis:
H0 : There is no effect of X on Y.
versus the alternative hypothesis:
H1 : There is an effect of X on Y.
If the null hypothesis is rejected, we reject the hypothesis that
there is no relationship and hence we conclude that there is a
significant relationship between X and Y. 102
How do we know if we should reject the null
hypothesis?
We perform regression in SPSS and look at the p-value
of the coefficient b.
If the p-value is less than 0.05, we reject the null
hypothesis (the variable is significant), otherwise, we
do not reject the null hypothesis (the variable is not
significant).
103
Hypothesis testing
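A short statsmodels sketch of this decision rule on invented data: fit Y on X and inspect the p-value of the slope coefficient b.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=60)                                    # invented IV
y = 2.0 + 0.6 * x + rng.normal(scale=1.0, size=60)         # invented DV

fit = sm.OLS(y, sm.add_constant(x)).fit()
p_slope = fit.pvalues[1]                 # p-value of the slope coefficient b
print("reject H0" if p_slope < 0.05 else "do not reject H0")
```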
Interpreting the output:
1. The first table that we’re interested in is the Model Summary.
• The R value represents the simple correlation. This indicates a strong degree of correlation
between our two variables.
• The R2 value indicates how much of the total variation in the dependent variable (perceived stress)
can be explained by the independent variable (mastery). In this case, 37.3% can be explained.
104
Regression in SPSS
https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
Interpreting the output:
2. The next table is the ANOVA table, which shows us how well the
regression equation fits the data (i.e., predicts the dependent
variable).
• The regression predicts the dependent variable significantly well (p < .001).
105
Regression in SPSS
https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
Interpreting the output:
3. The Coefficients table gives us the information that we need to
predict stress from mastery, as well as determine whether mastery
contributes statistically significantly to the model.
106
Regression in SPSS
Y = B0 + B1 * X
Total perceived stress = 46.32 + (-.9*Total Mastery)
Example results write-up:
A simple linear regression was carried out to test if total mastery significantly predicted total
perceived stress. The results of the regression indicated that the model explained 37.3% of the
variance and that the model was significant F (1, 431) = 257.63, p < .001. It was found that total
mastery significantly predicted total perceived stress (B1 = -.9, p < .001). The final predictive model
was:
total perceived stress = 46.32 + (-.9 * total mastery)
107
Regression in SPSS
108
Regression in SPSS
Results are
usually
rounded to two
decimal places
Understanding Factor Analysis
 Regardless of purpose, factor analysis is used for:
 the determination of a small number of factors from a larger
number of inter-related quantitative variables.
 Unlike variables directly measured such as speed,
height, weight, etc., some variables such as egoism,
creativity, happiness, religiosity, comfort are not a
single measurable entity.
 They are constructs that are derived from the
measurement of other, directly observable variables .
Understanding Factor Analysis
110
 Constructs are usually defined as unobservable latent variables. E.g.:
 motivation/love/hate/care/altruism/anxiety/worry/stress/product
quality/physical aptitude/democracy /reliability/power.
 Example: the construct of teaching effectiveness. Several variables are
used to allow the measurement of such construct (usually several scale
items are used) because the construct may include several dimensions.
 Factor analysis measures not directly observable constructs by
measuring several of its underlying dimensions.
 The identification of such underlying dimensions (factors) simplifies the
understanding and description of complex constructs.
Understanding Factor Analysis
111
• Generally, the number of factors is much smaller than the
number of measures.
• Therefore, the expectation is that a factor represents a set of
measures.
• From this angle, factor analysis is viewed as a data-reduction
technique as it reduces a large number of overlapping variables
to a smaller set of factors that reflect construct(s) or different
dimensions of construct(s).
Understanding Factor Analysis
112
 The assumption of factor analysis is that underlying
dimensions (factors) can be used to explain complex
phenomena.
 Observed correlations between variables result from their
sharing of factors.
 Example: Correlations between a person’s test scores might be
linked to shared factors such as general intelligence, critical
thinking and reasoning skills, reading comprehension etc.
Ingredients of a Good Factor Analysis Solution
113
• A major goal of factor analysis is to represent
relationships among sets of variables parsimoniously
yet keeping factors meaningful.
• A good factor solution is both simple and interpretable.
• When factors can be interpreted, new insights are
possible.
Application of Factor Analysis
114
 Defining indicators of constructs:
 Ideally 4 or more measures should be chosen to represent each construct
of interest.
 The choice of measures should, as much as possible, be guided by theory,
previous research, and logic.
Application of Factor Analysis
115
 Defining dimensions for an existing measure:
In this case the variables to be analyzed are chosen by the
initial researcher and not the person conducting the analysis.
Factor analysis is performed on a predetermined set of
items/scales.
Results of factor analysis may not always be satisfactory:
The items or scales may be poor indicators of the construct or
constructs.
There may be too few items or scales to represent each underlying
dimension.
Application of Factor Analysis
116
 Selecting items or scales to be included in a measure.
Factor analysis may be conducted to determine what items or
scales should be included and excluded from a measure.
Results of the analysis should not be used alone in making
decisions of inclusions or exclusions. Decisions should be taken
in conjunction with the theory and what is known about the
construct(s) that the items or scales assess.
Steps in Factor Analysis
117
• Factor analysis usually proceeds in four steps:
• 1st Step: the correlation matrix for all variables is computed
• 2nd Step: Factor extraction
• 3rd Step: Factor rotation
• 4th Step: Make final decisions about the number of
underlying factors
Steps in Factor Analysis:
The Correlation Matrix
118
• 1st Step: the correlation matrix
• Generate a correlation matrix for all variables
• Identify variables not related to other variables
• If the correlations between variables are small, it is unlikely
that they share common factors (variables must be related to
each other for the factor model to be appropriate).
• Think of correlations in absolute value.
• Correlation coefficients greater than 0.3 in absolute value are
indicative of acceptable correlations.
• Examine visually the appropriateness of the factor model.
Steps in Factor Analysis:
The Correlation Matrix
• Bartlett Test of Sphericity:
 used to test the hypothesis the correlation matrix is an identity matrix (all
diagonal terms are 1 and all off-diagonal terms are 0).
 If the value of the test statistic for sphericity is large and the associated
significance level is small, it is unlikely that the population correlation matrix
is an identity.
• If the hypothesis that the population correlation matrix is an identity
cannot be rejected because the observed significance level is large, the
use of the factor model should be reconsidered.
119
Steps in Factor Analysis:
The Correlation Matrix
• The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:
 is an index for comparing the magnitude of the observed correlation
coefficients to the magnitude of the partial correlation coefficients.
 The closer the KMO measure is to 1, the better the sampling adequacy (.8
and higher is great, .7 is acceptable, .6 is mediocre, and less than .5 is
unacceptable).
 Reasonably large values are needed for a good factor analysis. Small KMO
values indicate that a factor analysis of the variables may not be a good idea.
120
Steps in Factor Analysis:
Factor Extraction
121
 2nd Step: Factor extraction
 The primary objective of this stage is to determine the factors.
 Initial decisions can be made here about the number of factors underlying
a set of measured variables.
 Estimates of initial factors are obtained using Principal components
analysis.
 The principal components analysis is the most commonly used extraction
method . Other factor extraction methods include:
 Maximum likelihood method
 Principal axis factoring
 Alpha method
 Unweighted least squares method
 Generalized least square method
 Image factoring.
Steps in Factor Analysis:
Factor Extraction
122
 In principal components analysis, linear combinations of the
observed variables are formed.
 The 1st principal component is the combination that accounts
for the largest amount of variance in the sample (1st extracted
factor).
 The 2nd principal component accounts for the next largest
amount of variance and is uncorrelated with the first (2nd
extracted factor).
 Successive components explain progressively smaller portions of
the total sample variance, and all are uncorrelated with each
other.
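A small scikit-learn sketch mirroring this extraction step on invented, standardized data; on standardized variables, components with eigenvalues (explained variances) greater than 1 would typically be retained.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))                  # invented data: 10 measured variables

Xz = StandardScaler().fit_transform(X)          # each variable scaled to unit variance
pca = PCA().fit(Xz)
print(pca.explained_variance_)                  # eigenvalues, for the Kaiser criterion / scree plot
print(pca.explained_variance_ratio_.cumsum())   # cumulative proportion of variance explained
```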
Steps in Factor Analysis:Factor Extraction
123
 To decide on how many factors
we need to represent the data,
we use 2 statistical criteria:
 Eigen Values, and
 The Scree Plot.
 The determination of the
number of factors is usually
done by considering only factors
with Eigen values greater than 1.
 Factors with a variance less than
1 are no better than a single
variable, since each variable is
expected to have a variance of 1.
Total Variance Explained (each triple = Total, % of Variance, Cumulative %)
Component   Initial Eigenvalues      Extraction Sums of Squared Loadings
1           3.046  30.465   30.465   3.046  30.465  30.465
2           1.801  18.011   48.476   1.801  18.011  48.476
3           1.009  10.091   58.566   1.009  10.091  58.566
4            .934   9.336   67.902
5            .840   8.404   76.307
6            .711   7.107   83.414
7            .574   5.737   89.151
8            .440   4.396   93.547
9            .337   3.368   96.915
10           .308   3.085  100.000
Extraction Method: Principal Component Analysis.
Steps in Factor Analysis:
Factor Extraction
 The examination of the Scree plot provides a
visual of the total variance associated with each
factor.
 The steep slope shows the large factors.
 The gradual trailing off (scree) shows the rest of
the factors usually lower than an Eigen value of 1.
 In choosing the number of factors, in addition to
the statistical criteria, one should make initial
decisions based on conceptual and theoretical
grounds.
 At this stage, the decision about the number of
factors is not final.
124
Steps in Factor Analysis:
Factor Extraction
125
Component Matrixa
Component
1 2 3
I discussed my frustrations and feelings with person(s) in school .771 -.271 .121
I tried to develop a step-by-step plan of action to remedy the problems .545 .530 .264
I expressed my emotions to my family and close friends .580 -.311 .265
I read, attended workshops, or sought some other educational approach to correct the
problem
.398 .356 -.374
I tried to be emotionally honest with my self about the problems .436 .441 -.368
I sought advice from others on how I should solve the problems .705 -.362 .117
I explored the emotions caused by the problems .594 .184 -.537
I took direct action to try to correct the problems .074 .640 .443
I told someone I could trust about how I felt about the problems .752 -.351 .081
I put aside other activities so that I could work to solve the problems .225 .576 .272
Extraction Method: Principal Component Analysis.
a. 3 components extracted.
Component Matrix using Principle Component Analysis
Steps in Factor Analysis:
Factor Rotation
126
 3rd Step: Factor rotation.
 In this step, factors are rotated.
 Un-rotated factors are typically not very interpretable (most factors
are correlated with many variables).
 Factors are rotated to make them more meaningful and easier to
interpret (each variable is associated with a minimal number of
factors).
 Different rotation methods may result in the identification of
somewhat different factors.
Steps in Factor Analysis:
Factor Rotation
 The most popular rotational method is Varimax rotation.
 Varimax is an orthogonal rotation, yielding uncorrelated factors/components.
 Varimax attempts to minimize the number of variables that have high
loadings on a factor. This enhances the interpretability of the factors.
127
Steps in Factor Analysis:
Factor Rotation
• Other common rotational method used include Oblique rotations which yield
correlated factors.
• Oblique rotations are less frequently used because their results are more
difficult to summarize.
• Other rotational methods include:
 Quartimax (Orthogonal)
 Equamax (Orthogonal)
 Promax (oblique)
128
Steps in Factor Analysis:
Factor Rotation
© Dr. Maher Khelifa 129
• A factor is interpreted or named by examining the largest values linking the factor to
the measured variables in the rotated factor matrix.
Rotated Component Matrixa
Component
1 2 3
I discussed my frustrations and feelings with person(s) in school .803 .186 .050
I tried to develop a step-by-step plan of action to remedy the problems .270 .304 .694
I expressed my emotions to my family and close friends .706 -.036 .059
I read, attended workshops, or sought some other educational approach to
correct the problem
.050 .633 .145
I tried to be emotionally honest with my self about the problems .042 .685 .222
I sought advice from others on how I should solve the problems .792 .117 -.038
I explored the emotions caused by the problems .248 .782 -.037
I took direct action to try to correct the problems -.120 -.023 .772
I told someone I could trust about how I felt about the problems .815 .172 -.040
I put aside other activities so that I could work to solve the problems -.014 .155 .657
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 5 iterations.
Steps in Factor Analysis:
Making Final Decisions
130
• 4th Step: Making final decisions
• The final decision about the number of factors to choose is the number of factors
for the rotated solution that is most interpretable.
• To identify factors, group variables that have large loadings for the same factor.
• Plots of loadings provide a visual for variable clusters.
• Interpret factors according to the meaning of the variables
• This decision should be guided by:
• A priori conceptual beliefs about the number of factors from past research or
theory
• Eigen values computed in step 2.
• The relative interpretability of rotated solutions computed in step 3.
Assumptions Underlying Factor Analysis
131
• Assumption underlying factor analysis include.
• The measured variables are linearly related to the factors + errors.
• This assumption is likely to be violated if items have limited response scales (e.g., two-
point response scales like True/False or Right/Wrong items).
• The data should have a bi-variate normal distribution for each pair of
variables.
• Observations are independent.
• The factor analysis model assumes that variables are determined by
common factors and unique factors. All unique factors are assumed to be
uncorrelated with each other and with the common factors.
Obtaining a Factor Analysis
• Click:
• Analyze and select
• Dimension Reduction
• Factor
• A factor Analysis Box will appear
© Dr. Maher Khelifa 132
Obtaining a Factor Analysis
• Move
variables/scale
items to Variable
box
133
Obtaining a Factor Analysis
• Factor extraction
• When variables
are in variable
box, select:
• Extraction
© Dr. Maher Khelifa 134
Obtaining a Factor Analysis
• When the factor extraction
Box appears, select:
• Scree Plot
• keep all default selections
including:
• Principal Component Analysis
• Based on Eigen Value of 1, and
• Un-rotated factor solution
© Dr. Maher Khelifa 135
Obtaining a Factor Analysis
• During factor extraction
keep factor rotation default
of:
• None
• Press continue
© Dr. Maher Khelifa 136
Obtaining a Factor Analysis
• During Factor Rotation:
• Decide on the number of factors
based on the factor extraction phase and
enter the desired number of factors
by choosing:
• Fixed number of factors and
entering the desired number of
factors to extract.
• Under Rotation Choose Varimax
• Press continue
• Then OK
© Dr. Maher Khelifa 137
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Stages in the normal growth curve
Stages in the normal growth curveStages in the normal growth curve
Stages in the normal growth curve
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 

analysis part 02.pptx

  • 10. FACTOR ANALYSIS … CONT. This dialog allows you to choose a “rotation method” for your factor analysis. A rotation method produces factors that are as different from each other as possible, and helps you interpret the factors by putting each variable primarily on one of the factors. However, you still need to decide whether you want an “orthogonal” solution (factors are uncorrelated with each other) or an “oblique” solution (factors are allowed to correlate with one another). If you want an oblique solution, SPSS offers “Direct Oblimin” (and Promax). The other choices are orthogonal solutions; the one that you’ll use most often from these choices is the default value, “Varimax.” Most of the factor analyses you will see in published articles use a Varimax rotation. Make sure that the check-box for a “rotated solution” is on. The rotated solution gives you the factor loadings for each individual variable in your dataset, which are what you use to interpret the meaning of (i.e., make up names for) the different factors.
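For readers who want to see what a varimax rotation does outside of SPSS, the criterion itself is short enough to code. The Python sketch below (numpy only) applies the standard iterative varimax algorithm to a loading matrix; the function is an illustration of the idea under the usual textbook formulation, not SPSS's own implementation, and `loadings` is a placeholder for whatever unrotated loading matrix you have.

    import numpy as np

    def varimax(loadings, max_iter=100, tol=1e-6):
        # Orthogonal varimax rotation (Kaiser, 1958) of a (variables x factors) loading matrix.
        p, k = loadings.shape
        rotation = np.eye(k)
        objective = 0.0
        for _ in range(max_iter):
            rotated = loadings @ rotation
            u, s, vt = np.linalg.svd(
                loadings.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p)
            )
            rotation = u @ vt
            if s.sum() < objective * (1 + tol):   # stop when the criterion no longer improves
                break
            objective = s.sum()
        return loadings @ rotation

Applying this to an unrotated component matrix and then reading off the largest loading in each row mirrors, in spirit, the Rotated Component Matrix that SPSS prints.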
  • 11. FACTOR ANALYSIS … CONT. This table shows you the actual factors that were extracted. If you look at the section labeled “Rotation Sums of Squared Loadings,” it shows you only those factors that met your cut-off criterion (extraction method). In this case, there were three factors with eigenvalues greater than 1. SPSS always extracts as many factors initially as there are variables in the dataset, but the rest of these didn’t make the grade. The “% of variance” column tells you how much of the total variability (in all of the variables together) can be accounted for by each of these summary scales or factors. Factor 1 accounts for 27.485% of the variability in all 11 variables, and so on.
  • 12. FACTOR ANALYSIS … CONT. Finally, the Rotated Component Matrix shows you the factor loadings for each variable. I went across each row, and highlighted the factor that each variable loaded most strongly on. Based on these factor loadings, I think the factors represent:
--The first 5 subtests loaded strongly on Factor 1, which I’ll call “Verbal IQ”
--Picture Completion through Object Assembly all loaded strongly on Factor 2, which I’ll call “Performance IQ”
--Coding loaded strongly on Factor 3 (and Digit Span loaded fairly strongly on Factor 3, although it also loaded on Factor 1). Probably Factor 3 is “Freedom from Distraction,” because these are concentration-intensive tasks.
  • 14. Output of Factor Analysis
KMO and Bartlett's Test:
- Kaiser-Meyer-Olkin Measure of Sampling Adequacy = .619
- Bartlett's Test of Sphericity: Approx. Chi-Square = 327.667, df = 91, Sig. = .000
Total Variance Explained (Extraction Method: Principal Component Analysis)
Component: Initial Eigenvalues (Total, % of Variance, Cumulative %) | Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %) | Rotation Sums of Squared Loadings (Total, % of Variance, Cumulative %)
1: 2.672, 19.087, 19.087 | 2.672, 19.087, 19.087 | 1.941, 13.864, 13.864
2: 2.116, 15.111, 34.198 | 2.116, 15.111, 34.198 | 1.911, 13.648, 27.512
3: 1.314, 9.385, 43.583 | 1.314, 9.385, 43.583 | 1.521, 10.866, 38.378
4: 1.129, 8.065, 51.648 | 1.129, 8.065, 51.648 | 1.489, 10.635, 49.012
5: 1.024, 7.316, 58.964 | 1.024, 7.316, 58.964 | 1.393, 9.952, 58.964
6: .915, 6.538, 65.502
7: .908, 6.485, 71.987
8: .820, 5.860, 77.848
9: .729, 5.209, 83.056
10: .628, 4.484, 87.540
11: .541, 3.865, 91.405
12: .471, 3.365, 94.771
13: .403, 2.876, 97.647
14: .329, 2.353, 100.000
  • 15. Output of Factor Analysis … cont.
Rotated Component Matrix (Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser Normalization; rotation converged in 6 iterations)
Item: loadings on Components 1-5
BO12: .742, 6.966E-02, 8.411E-02, .291, 1.308E-02
BO7: .724, -5.468E-02, -.214, -8.570E-02, .146
BO6: .722, -1.121E-02, .243, .136, .129
BO8: 6.309E-02, .801, .149, -4.440E-02, -4.116E-02
BO14: -.303, .772, .183, 2.543E-02, .170
BO13: -.197, -.627, .137, .127, .333
BO11: -6.129E-02, 8.341E-02, -.802, 6.165E-02, .153
BO4: 3.467E-02, .318, .599, 9.943E-02, -7.414E-02
BO9: -1.251E-02, -1.622E-02, -1.564E-02, .820, 5.378E-02
BO1: -.236, .230, .359, -.556, .143
BO2: .303, 2.235E-02, .190, .539, .223
BO3: .196, .114, -3.516E-02, -9.854E-02, .797
BO10: -.127, .169, .212, -.214, -.577
BO5: 4.868E-02, .260, .349, -.136, -.379
  • 16. RELIABILITY • Go to Analyze – Scale – Reliability Analysis - Enter the items to be analyzed - Tick Statistics – Descriptives for: Item – Scale – Scale if item deleted. • Verify the output - If the scale's Cronbach alpha > .70, the reliability of the variable is achieved (Nunnally, 1978) - If not, check the Alpha if Item Deleted column to see whether dropping an item would improve reliability. - Drop the item indicated and run the reliability analysis again. - Compute a summated scale to form the variable.
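As a rough cross-check outside SPSS, Cronbach's alpha can be computed directly from its formula. The Python sketch below assumes the scale items (for example BO12, BO6 and BO7 from the output that follows) are columns of a pandas DataFrame with one row per respondent; the numbers in the example frame are purely illustrative.

    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        # items: one row per respondent, one column per scale item
        k = items.shape[1]
        item_variances = items.var(axis=0, ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    # Illustrative three-item scale for four respondents
    scores = pd.DataFrame({"BO12": [4, 5, 3, 4], "BO6": [3, 4, 3, 4], "BO7": [3, 3, 2, 3]})
    print(cronbach_alpha(scores))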
  • 17. R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L P H A)
Item statistics (Mean, Std Dev, Cases):
1. BO12: 3.9580, 1.0269, 143.0
2. BO6: 3.4825, .9704, 143.0
3. BO7: 2.9650, .8914, 143.0
Statistics for SCALE: Mean = 10.4056, Variance = 4.9048, Std Dev = 2.2147, N of Variables = 3
Item-total statistics (Scale Mean if Item Deleted, Scale Variance if Item Deleted, Corrected Item-Total Correlation, Alpha if Item Deleted):
BO12: 6.4476, 2.3194, .4894, .5030
BO6: 6.9231, 2.4518, .4974, .4916
BO7: 7.4406, 2.9243, .3890, .6348
Reliability Coefficients: N of Cases = 143.0, N of Items = 3, Alpha = .6465
  • 18. When do we need which test? (1 Dependent Variable, 1 Independent Variable)
Dependent variable | Independent variable | Test
Binary | Metric | Logistic regression
Binary | Non-metric | Chi-square test
Non-metric | Metric | Logistic regression
Non-metric | Binary | Mann-Whitney test
Metric | Binary | t-test
Metric | Metric | Regression analysis
Metric | Nominal | Analysis of variance
  • 19. When do we need which test? (1 Dependent Variable, 2 or more Independent Variables)
Dependent variable | Independent variables | Test
Non-metric | Metric | Logistic regression
Non-metric | Non-metric | Loglinear analysis
Metric | Metric | Multiple regression
Metric | Non-metric | Analysis of variance
  • 20. A chi-squared test (also chi-square or χ2 test) is a statistical hypothesis test used in the analysis of contingency tables when the sample sizes are large. A chi-square test is a hypothesis testing method. Two common chi-square tests involve checking if observed frequencies in one or more categories match expected frequencies. A contingency table is a tool used to summarize and analyze the relationship between two categorical variables. The Mann-Whitney U test is used to compare differences between two independent groups when the dependent variable is either ordinal or continuous, but not normally distributed. For example, you could use the Mann-Whitney U test to understand whether attitudes towards pay discrimination, where attitudes are measured on an ordinal scale, differ based on gender (i.e., your dependent variable would be "attitudes towards pay discrimination" and your independent variable would be "gender", which has two groups: "male" and "female"). A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another. Log-linear analysis is a statistical test used to determine if the proportions of categories in two or more group variables significantly differ from each other. To use this test, you should have two or more group variables with two or more options in each group variable.
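For readers not using SPSS, two of the tests described above have direct scipy equivalents. The counts and scores below are invented purely for illustration; a t-test example appears further down with the independent-samples t-test slide.

    import numpy as np
    from scipy import stats

    # Chi-square test of independence on a 2x2 contingency table (illustrative counts)
    observed = np.array([[30, 20],
                         [15, 35]])
    chi2, p, dof, expected = stats.chi2_contingency(observed)

    # Mann-Whitney U test: ordinal attitude scores compared between two independent groups
    group_a = [3, 4, 2, 5, 4, 3, 5, 4]
    group_b = [2, 1, 3, 2, 1, 2, 3, 2]
    u_stat, p_mw = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(chi2, p, u_stat, p_mw)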
  • 21. Correlation H1: Autonomy and innovative orientation among Bumiputera SMEs in northern Malaysia are significantly related.
Correlations (N = 210):
Autonomy with Innovative: Pearson Correlation = .072, Sig. (2-tailed) = .297
Interpretation: (r = .072, p = .297) if the significance level is set at p < .05, there is no statistically significant correlation between autonomy and innovativeness. Therefore, H1 is not supported.
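The same Pearson correlation can be obtained in Python with scipy; the two arrays below are simulated stand-ins for the 210 autonomy and innovativeness scores, so the r and p values will not match the SPSS output above.

    import numpy as np
    from scipy.stats import pearsonr

    rng = np.random.default_rng(0)
    autonomy = rng.normal(3.5, 0.8, size=210)     # placeholder scale scores for 210 respondents
    innovative = rng.normal(3.8, 0.7, size=210)

    r, p = pearsonr(autonomy, innovative)
    print(f"r = {r:.3f}, p = {p:.3f}")            # compare p against the chosen significance level (.05)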
  • 22. Regression models  The purpose of regression models is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.  The computational problem that needs to be solved in regression analysis is to fit a straight line to a number of points.  Y = b0 + b1x1 + b2x2 + … + bnxn + e
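A minimal sketch of fitting this equation outside SPSS, using statsmodels and simulated data with known coefficients (everything in the example is made up for illustration):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    x1, x2 = rng.normal(size=100), rng.normal(size=100)
    y = 2.0 + 1.5 * x1 - 0.8 * x2 + rng.normal(size=100)    # simulated data with known b0, b1, b2

    X = sm.add_constant(np.column_stack([x1, x2]))           # adds the intercept term b0
    fit = sm.OLS(y, X).fit()
    print(fit.params)                                        # estimates of b0, b1, b2
    print(fit.rsquared, fit.fvalue, fit.f_pvalue)            # model fit statistics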
  • 23.  Linear regression  1 dependent variable: continuous/scale  One or more independent variables: continuous/scale  Hierarchical regression  1 dependent variable: continuous/scale  Multiple blocks of independent variables: continuous/scale  Logistic regression  1 dependent variable: binary  One or more independent variables: continuous/scale Types of Regression Models 24
  • 24. Output of SPSS Regression Analyses 25
  • 25. Output of SPSS Regression Analyses 26 The F-test in the ANOVA table assesses whether the regression model as a whole explains a significant amount of variance in the dependent variable.
  • 26. Output of SPSS Regression Analyses 27 Confidence interval = sample mean ± margin of error To obtain this confidence interval, add and subtract the margin of error from the sample mean. This result is the upper limit and the lower limit of the confidence interval.
  • 27. MULTIPLE REGRESSION ANALYSIS…CONT. Consider Some Multiple Regression Assumptions: 1. Normality – Verify skewness < 2.0 or inspect the histogram (skewness is a measure of the asymmetry, or distortion of the symmetrical distribution, in a data set). 2. Linearity – Verify the P-P plot of standardized regression residuals. 3. Homoscedasticity – an assumption of equal or similar variances in different groups being compared. 4. Independence of error terms (no autocorrelation) – Durbin-Watson statistic between 1.5 and 2.5. 5. Free from multicollinearity – correlations between predictors < .70.
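A hedged sketch of screening some of these assumptions in Python, using scipy and statsmodels; the predictors and residuals here are random placeholders standing in for your own fitted model's data.

    import numpy as np
    import pandas as pd
    from scipy.stats import skew
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(2)
    predictors = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])  # illustrative IVs
    residuals = rng.normal(size=100)               # stand-in for residuals from a fitted regression

    print(predictors.apply(skew))                  # normality screen: |skewness| < 2 for each variable
    print(durbin_watson(residuals))                # independence of errors: roughly 1.5 - 2.5 is acceptable
    print(predictors.corr().abs())                 # multicollinearity screen: pairwise |r| < .70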
  • 28. Reporting Regression Analyses 1. Describe descriptive statistics (means, st. dev.) of all variables. 2. Report on testing of assumptions – especially if assumptions are violated and what was done about it. 3. Report on model fit statistics (F, df1, df2, R2). 4. Report parameter estimates – for the constant and each IV: standardized Beta, t-value and significance, (confidence intervals).
  • 29.  Type of regression models where  The dependent variable is binary  [or ordinal: ordered logistic regression (e.g. 3 categories: low, medium, high)]  Checks whether we can predict in which category we will land based on the values of the IV.  Essentially compares a model with predictors (BLOCK 1) against a model without predictors (BLOCK 0):  is a prediction with our variables better than random chance? Example: http://eprints.qut.edu.au/31606/ Logistic Regression Analysis 30
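A minimal logistic-regression sketch in Python (scikit-learn), with simulated data; comparing the model's accuracy against the majority-class baseline plays roughly the same role as SPSS's Block 0 versus Block 1 comparison. All names and values here are illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 2))                                           # two illustrative predictors
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200) > 0).astype(int)    # binary dependent variable

    model = LogisticRegression().fit(X, y)
    print(model.coef_, model.intercept_)
    baseline = max(y.mean(), 1 - y.mean())        # accuracy of always predicting the larger category
    print(model.score(X, y), baseline)            # is prediction with our variables better than chance?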
  • 33.  a statistical method used to test differences between two or more means.  Inferences about means are made by analyzing variance.  Think of it as an extension of t-tests  To two or more groups  To means + variance rather than only means.  In a typical ANOVA, the null hypothesis is that all groups are random samples of the same population.  For example, when studying the effect of different treatments on similar samples of patients, the null hypothesis would be that all treatments have the same effect (perhaps none).  Rejecting the null hypothesis would imply that different treatments result in altered effects.  Often used in experimental research, to study effects of treatments. Analysis of Variance Models 34
  • 34.  One-way ANOVA  used to test for differences among two or more independent groups (means).  Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test (when there are only two means to compare, the t-test and the ANOVA F-test are equivalent).  Factorial ANOVA  used when the experimenter wants to study the interaction effects among the treatments.  Repeated measures ANOVA  used when the same subjects are used for each treatment (e.g., in a longitudinal study).  Multivariate analysis of variance (MANOVA)  used when there is more than one dependent variable.  Analysis of covariance (ANCOVA)  blends ANOVA and regression: evaluates whether population means of a DV are equal across levels of a categorical IV [treatment], while statistically controlling for the effects of other continuous variables that are not of primary interest [covariates]. Types of Analysis of Variance Models 35
  • 35. When can we use ANOVA? • The t-test is used to compare the means of two groups. • One-way ANOVA is used to compare the means of two or more groups. • We can use one-way ANOVA whenever the dependent variable (DV) is numerical and the independent variable (IV) is categorical. • The independent variable in ANOVA is also called a factor. 36
  • 36. Examples The following are situations where we can use ANOVA: • Testing the differences in blood pressure among different groups of people (DV is blood pressure and the group is the IV). • Testing which type of social media affects hours of sleep (type of social media used is the IV and hours of sleep is the DV). 37
  • 37.  The type of ANOVA model is highly dependent on your research design and theory; in particular:  What are between-subject factors? How many?  What are within-subject factors? How many?  What are treatments? How many?  Which factors are theoretically relevant, which are mere controls? ANOVA and Research Designs 38
  • 38.  Independence, normality and homogeneity of the variances of the residuals  Note there are no necessary assumptions for ANOVA in its full generality, but the F-test used for ANOVA hypothesis testing has assumptions and practical limitations. ANOVA Assumptions 39
  • 39.  One-way = one-way between groups model  E.g., school performance between boys versus girls  Two-way = two one-ways for each factor PLUS interaction between two factors  E.g., school performance between boys versus girls and locals versus internationals  Three-way  You get the idea… One-way and two-way ANOVA 40
  • 40.  Injuries sustained by kids wearing superhero costumes  Does it depend on which costume they wear?  Superman, Spiderman, Hulk, Ninja Turtle?  Adopted from http://www.statisticshell.com/docs/onewayanova.pdf Illustration: Analysis of Variance 41
  • 41.  Are injuries sustained random or significantly dependent on wearing superhero costumes?  Is there any order of injuries sustained by type of costume? What ANOVA could tell us 42
  • 42. What ANOVA could tell us Variance in injury severity explained by different costumes. [Diagram: the costumes split into flying superheroes (Superman, Spiderman) and non-flying superheroes (Hulk, Ninja Turtle); Contrast 1 compares flying versus non-flying costumes, and Contrasts 2 and 3 compare the costumes within each group.]
  • 43. Assumptions of ANOVA • The observations in each group are normally distributed. This can be tested by plotting the numerical variable separately for each group and checking that they all have a bell shape. Alternatively, you could use the Shapiro-Wilk test for normality. 44
  • 44. Assumptions • The groups have equal variances (i.e., homogeneity of variance). You can plot each group separately and check that they exhibit similar variability. Alternatively, you can use Levene’s test for homogeneity. • The observations in each group are independent. This could be assessed by common sense looking at the study design. For example, if there is a participant in more than one group, your observations are not independent. 45
  • 45. Hypothesis Testing ANOVA tests the null hypothesis: H0 : The groups have equal means versus the alternative hypothesis: H1 : At least one group mean is different from the other group means. 46 F-Test
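The F-test for this null hypothesis is available in scipy as a one-way ANOVA. The sketch below simulates three groups loosely modelled on the young/middle-aged/old optimism example discussed on the following slides; the group sizes and means are placeholders, so the output will not match the SPSS results reported later.

    import numpy as np
    from scipy.stats import f_oneway

    rng = np.random.default_rng(4)
    young = rng.normal(21.4, 4.5, size=140)       # simulated optimism scores for three age groups
    middle = rng.normal(22.1, 4.5, size=150)
    older = rng.normal(23.0, 4.5, size=145)

    f_stat, p_value = f_oneway(young, middle, older)
    print(f_stat, p_value)                         # reject H0 of equal group means if p <= .05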
  • 46. ANOVA in SPSS 47 Example: Is there a difference in optimism scores for young, middle-aged and old participants? Categorical IV - Age with 3 levels: • 29 and younger • Between 30 and 44 • 45 or above Continuous DV – Optimism scores
  • 47. ANOVA in SPSS 48 Interpreting the output: 1. Check that the groups have equal variances using Levene’s test for homogeneity. • Check the significance value (Sig.) for Levene’s test Based on Mean. • If this number is greater than .05 you have not violated the assumption of homogeneity of variance.
  • 48. ANOVA in SPSS 49 Interpreting the output: 2. Check the significance of the ANOVA. • If the Sig. value is less than or equal to .05, there is a significant difference somewhere among the mean scores on your dependent variable for the three groups. • However, this does not tell us which group is different from which other group.
  • 49. ANOVA in SPSS 50 Interpreting the output: 3. ONLY if the ANOVA is significant, check the significance of the differences between each pair of groups in the table labelled Multiple Comparisons.
  • 50. ANOVA in SPSS 51 Calculating effect size: • In an ANOVA, effect size will tell us how large the difference between groups is. • We will calculate eta squared, which is one of the most common effect size statistics. Eta squared = Sum of squares between groups Total sum of squares
  • 51. ANOVA in SPSS 52 Calculating effect size: Eta squared = 179.07 / 8513.02 = .02 According to Cohen (1988): Small effect: .01 Medium effect: .06 Large effect: .14
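The same eta-squared calculation, written out in Python using the sums of squares reported on the slide above:

    ss_between = 179.07           # sum of squares between groups, from the SPSS ANOVA table
    ss_total = 8513.02            # total sum of squares
    eta_squared = ss_between / ss_total
    print(round(eta_squared, 2))  # 0.02 -> a small effect by Cohen's (1988) benchmarks (.01 / .06 / .14)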
  • 52. ANOVA in SPSS 53 Example results write-up: A one way between-groups analysis of variance was conducted to explore the impact of age on levels of optimism. Participants were divided into three groups according to their age (Group 1: 29yrs or less; Group 2: 30 to 44yrs; Group 3: 45yrs and above). There was a statistically significant difference at the p < .05 level in optimism scores for the three age groups: F (2, 432) = 4.6, p = .01. Despite reaching statistical significance, the actual difference in mean scores between the groups was quite small. The effect size, calculated using eta squared, was .02. Post-hoc comparisons using the Tukey HSD test indicated that the mean score for Group 1 (M = 21.36, SD = 4.55) was significantly different from Group 3 (M = 22.96, SD = 4.49).
  • 53. ANOVA in SPSS 54 Note: Results are usually rounded to two decimal places
  • 54. Descriptive Statistics-Numeric Data • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS DESCRIPTIVES • Choose any variables to be analyzed and place them in box on right • Options include:
  • 55. [SPSS Descriptives output table for the numeric variables; the text extracted from this slide is garbled and the values are not recoverable.]
  • 56. Descriptive Statistics-General Data • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS FREQUENCIES • Choose any variables to be analyzed and place them in box on right • Options include (For Categorical Variables): • Frequency Tables • Pie Charts, Bar Charts • Options include (For Numeric Variables) • Frequency Tables (Useful for discrete data) • Measures of Central Tendency, Dispersion, Percentiles • Pie Charts, Histograms
  • 57. Example 1.4 - Smoking Status [SPSS frequency table output; the extracted text is garbled and the values are not recoverable.]
  • 58. Vertical Bar Charts and Pie Charts • After Importing your dataset, and providing names to variables, click on: • GRAPHS  BAR…  SIMPLE (Summaries for Groups of Cases)  DEFINE • Bars Represent N of Cases (or % of Cases) • Put the variable of interest as the CATEGORY AXIS • GRAPHS  PIE… (Summaries for Groups of Cases)  DEFINE • Slices Represent N of Cases (or % of Cases) • Put the variable of interest as the DEFINE SLICES BY
  • 59. Example 1.5 - Antibiotic Study [Bar chart of Count by OUTCOME category (1-5); only the axis labels survived extraction.]
  • 60. Histograms • After Importing your dataset, and providing names to variables, click on: • GRAPHS  HISTOGRAM • Select Variable to be plotted • Click on DISPLAY NORMAL CURVE if you want a normal curve superimposed (see Chapter 3).
  • 61. Example 1.6 - Drug Approval Times [Histogram of MONTHS with a superimposed normal curve; Mean = 32.1, Std. Dev = 20.97, N = 175.]
  • 62. Side-by-Side Bar Charts • After Importing your dataset, and providing names to variables, click on: • GRAPHS  BAR…  Clustered (Summaries for Groups of Cases)  DEFINE • Bars Represent N of Cases (or % of Cases) • CATEGORY AXIS: Variable that represents groups to be compared (independent variable) • DEFINE CLUSTERS BY: Variable that represents outcomes of interest (dependent variable)
  • 63. Example 1.7 - Streptomycin Study [Clustered bar chart of Count by TRT (1, 2), clustered by OUTCOME (1-6); only the axis labels survived extraction.]
  • 64. Scatterplots • After Importing your dataset, and providing names to variables, click on: • GRAPHS  SCATTER  SIMPLE  DEFINE • For Y-AXIS, choose the Dependent (Response) Variable • For X-AXIS, choose the Independent (Explanatory) Variable
  • 65. Example 1.8 - Theophylline Clearance [Scatterplot of THCLRNCE (y-axis) against DRUG (x-axis); only the axis labels survived extraction.]
  • 66. Scatterplots with 2 Independent Variables • After Importing your dataset, and providing names to variables, click on: • GRAPHS  SCATTER  SIMPLE  DEFINE • For Y-AXIS, choose the Dependent Variable • For X-AXIS, choose the Independent Variable with the most levels • For SET MARKERS BY, choose the Independent Variable with the fewest levels
  • 67. Example 1.8 - Theophylline Clearance [Scatterplot of THCLRNCE (y-axis) against SUBJECT (x-axis), with markers set by DRUG: Tagamet, Pepcid, Placebo.]
  • 68. Contingency Tables for Conditional Probabilities • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, select the variable you are conditioning on (Independent Variable) • For COLUMNS, select the variable you are finding the conditional probability of (Dependent Variable) • Click on CELLS • Click on ROW Percentages
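The same row-percentage (conditional probability) table can be produced in Python with pandas; the tiny dataset below is invented for illustration, with the conditioning variable in rows and the outcome in columns.

    import pandas as pd

    # One row per case: conditioning (independent) variable and outcome (dependent) variable
    cases = pd.DataFrame({
        "exposure": ["yes", "yes", "no", "no", "yes", "no", "no", "yes", "no", "yes"],
        "outcome":  ["dead", "alive", "alive", "alive", "dead", "alive", "dead", "alive", "alive", "alive"],
    })
    # normalize="index" gives row percentages, i.e. the conditional distribution of the outcome
    print(pd.crosstab(cases["exposure"], cases["outcome"], normalize="index"))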
  • 69. Example 1.10 - Alcohol & Mortality [Crosstab output with row percentages; the extracted text is garbled and the values are not recoverable.]
  • 70. Independent Sample t-Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  COMPARE MEANS  INDEPENDENT SAMPLES T-TEST • For TEST VARIABLE, Select the dependent (response) variable(s) • For GROUPING VARIABLE, Select the independent variable. Then define the names of the 2 levels to be compared (this can be used even when the full dataset has more than 2 levels for independent variable).
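A scipy sketch of the same analysis; SPSS prints both "equal variances assumed" and "not assumed" rows alongside Levene's test, and the example below picks between them based on the Levene p-value. The group data are simulated placeholders.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    group1 = rng.normal(60, 10, size=30)      # illustrative response values for the two groups
    group2 = rng.normal(55, 12, size=30)

    lev_stat, lev_p = stats.levene(group1, group2)                       # Levene's test for equal variances
    t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=lev_p > .05)
    print(t_stat, p_value)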
  • 71. Example 3.5 - Levocabastine in Renal Patients [SPSS independent-samples t-test output, including Levene's test and the equal/unequal variances rows; the extracted values are not recoverable.]
  • 72. Paired t-test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  COMPARE MEANS  PAIRED SAMPLES T-TEST • For PAIRED VARIABLES, Select the two dependent (response) variables (the analysis will be based on first variable minus second variable)
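The scipy equivalent of the paired-samples t-test, with simulated paired measurements (values are illustrative only):

    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(6)
    first = rng.normal(100, 15, size=20)             # illustrative paired measurements
    second = first + rng.normal(-5, 10, size=20)

    t_stat, p_value = ttest_rel(first, second)       # based on first variable minus second variable
    print(t_stat, p_value)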
  • 73. Example 3.7 - Cmax in SRC&IRC Codeine [SPSS paired-samples t-test output; the extracted values are not recoverable.]
  • 74. Chi-Square Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the Independent Variable • For COLUMNS, Select the Dependent Variable • Under STATISTICS, Click on CHI-SQUARE • Under CELLS, Click on OBSERVED, EXPECTED, ROW PERCENTAGES, and ADJUSTED STANDARDIZED RESIDUALS • NOTE: Large ADJUSTED STANDARDIZED RESIDUALS (in absolute value) show which cells are inconsistent with null hypothesis of independence. A common rule of thumb is seeing which if any cells have values >3 in absolute value
  • 75. Example 5.8 - Marital Status & Cancer [Crosstab of marital status by cancer status with observed and expected counts, row percentages, adjusted residuals, and the Pearson chi-square test; the extracted values are not recoverable.]
  • 76. Fisher’s Exact Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the Independent Variable • For COLUMNS, Select the Dependent Variable • Under STATISTICS, Click on CHI-SQUARE • Under CELLS, Click on OBSERVED and ROW PERCENTAGES • NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similar for Exposure present category (e.g. 1) and exposure absent (e.g. 2). Use Value Labels to keep output straight.
  • 77. Example 5.5 - Antiseptic Experiment [Crosstab with chi-square and Fisher's exact test output; the extracted values are not recoverable.]
  • 78. McNemar’s Test • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the outcome for condition/time 1 • For COLUMNS, Select the outcome for condition/time 2 • Under STATISTICS, Click on MCNEMAR • Under CELLS, Click on OBSERVED and TOTAL PERCENTAGES • NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similar for Exposure present category (e.g. 1) and exposure absent (e.g. 2). Use Value Labels to keep output straight.
  • 79. Example 5.6 - Report of Implant Leak [Paired crosstab with the McNemar test P-value; the extracted values are not recoverable.]
  • 80. Relative Risks and Odds Ratios • After Importing your dataset, and providing names to variables, click on: • ANALYZE  DESCRIPTIVE STATISTICS  CROSSTABS • For ROWS, Select the Independent Variable • For COLUMNS, Select the Dependent Variable • Under STATISTICS, Click on RISK • Under CELLS, Click on OBSERVED and ROW PERCENTAGES • NOTE: You will want to code the data so that the outcome present (Success) category has the lower value (e.g. 1) and the outcome absent (Failure) category has the higher value (e.g. 2). Similar for Exposure present category (e.g. 1) and exposure absent (e.g. 2). Use Value Labels to keep output straight.
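Both quantities follow directly from the 2x2 counts, so they are easy to verify by hand or in a couple of lines of Python; the counts below are illustrative, with exposure in rows and the outcome-present column first, as suggested above.

    # 2x2 table: a, b = exposed (outcome present, absent); c, d = unexposed (outcome present, absent)
    a, b = 15, 85          # illustrative counts
    c, d = 5, 95

    relative_risk = (a / (a + b)) / (c / (c + d))
    odds_ratio = (a * d) / (b * c)
    print(relative_risk, odds_ratio)   # SPSS's RISK output reports these together with confidence intervals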
  • 81. Example 5.1 - Pamidronate Study [Crosstab with risk estimate (odds ratio / relative risk) output; the extracted values are not recoverable.]
  • 82. Example 5.2 - Lip Cancer [Crosstab with risk estimate output; the extracted values are not recoverable.]
  • 83. Correlation After Importing your dataset, and providing names to variables, click on: ANALYZE  CORRELATE BIVARIATE Select the VARIABLES Select the PEARSON CORRELATION Select the Two tailed test of significance Select Flag significant correlations
  • 86. Linear Regression • After Importing your dataset, and providing names to variables, click on: • ANALYZE  REGRESSION  LINEAR • Select the DEPENDENT VARIABLE • Select the INDEPENDENT VARIABLE(S) • Click on STATISTICS, then ESTIMATES, CONFIDENCE INTERVALS, MODEL FIT
  • 87. Examples 7.1-7.6 - Gemfibrozil Clearance [SPSS regression Coefficients table with estimates and confidence intervals; the extracted values are not recoverable.]
  • 88. Examples 7.1-7.6 - Gemfibrozil Clearance [SPSS regression Model Summary and ANOVA tables; the extracted values are not recoverable.]
  • 89. Linear Regression • We will introduce simple linear regression, in particular we will: • Learn when we can use simple linear regression • Learn the basic workings involved in simple linear regression • Linear Regression in SPSS • This presentation is intended for students in initial stages of Statistics. No previous knowledge is required. 90
  • 90. Linear Regression • Regression is used to study the relationship between two variables. • How a change in one variable (e.g., someone’s exercise habits) can predict the outcome of another variable (e.g., general health). • We can use simple regression if both the dependent variable (DV) and the independent variable (IV) are numerical. • If the DV is numerical but the IV is categorical, it is best to use ANOVA. 91
  • 91. Examples The following are situations where we can use regression: • Testing if IQ affects income (IQ is the IV and income is the DV). • Testing if study time affects grades (hours of study time is the IV and average grade is the DV). • Testing if exercise affects blood pressure (hours of exercise is the IV and blood pressure is the DV). 92
  • 92. Displaying the data When both the DV and IV are numerical, we can represent data in the form of a scatterplot. 93
  • 93. Displaying the data It is important to examine a scatterplot because it helps us to see if the relationship is linear. In this example, the relationship between body fat % and chance of heart failure is not linear and hence it is not sensible to use linear regression.
  • 94. 95 • Straight line prediction model. • As an independent variable changes, what happens to the dependent variable? I.e., as an independent variable goes up and down, does the dependent variable go up and down? • They could either move in the same direction (positive relationship) or opposite direction (negative relationship) Linear Regression
  • 97. Linear Regression y = B0 + B1 * X + E [Diagram: a regression line of grades (y) against study time (X), with the intercept B0 and slope B1 marked.]
  • 98. 99 Linear Regression y = B0 - B1 * X + E
  • 99. Assumptions of regression • The errors E are normally distributed. This can be tested by plotting a histogram of the residuals of the regression and checking that it has a bell shape. Alternatively, you could use the Shapiro-Wilk test for normality. 100
  • 100. Assumptions of regression • There are no clear outliers. This can be checked by examining the scatterplot. Outliers (circled in red in the figure) can simply be removed from the analysis. 101
  • 101. Hypothesis testing Regression tests the null hypothesis: H0 : There is no effect of X on Y. versus the alternative hypothesis: H1 : There is an effect of X on Y. If the null hypothesis is rejected, we reject the hypothesis that there is no relationship and hence we conclude that there is a significant relationship between X and Y. 102
  • 102. How do we know if we should reject the null hypothesis? We perform regression in SPSS and look at the p-value of the coefficient b. If the p-value is less than 0.05, we reject the null hypothesis (the variable is significant), otherwise, we do not reject the null hypothesis (the variable is not significant). 103 Hypothesis testing
  • 103. Interpreting the output: 1. The first table that we’re interested in is the Model Summary. • The R value represents the simple correlation. This indicates a strong degree of correlation between our two variables. • The R2 value indicates how much of the total variation in the dependent variable (perceived stress) can be explained by the independent variable (mastery). In this case, 37.3% can be explained. 104 Regression in SPSS https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
  • 104. Interpreting the output: 2. The next table is the ANOVA table, which shows us how well the regression equation fits the data (i.e., predicts the dependent variable). • The regression predicts the dependent variable significantly well (p < .001). 105 Regression in SPSS https://statistics.laerd.com/spss-tutorials/linear-regression-using-spss-statistics.php
  • 105. Interpreting the output: 3. The Coefficients table gives us the information that we need to predict stress from mastery, as well as determine whether mastery contributes statistically significantly to the model. 106 Regression in SPSS Y = B0 + B1 * X Total perceived stress = 46.32 + (-.9*Total Mastery)
  • 106. Example results write-up: A simple linear regression was carried out to test if total mastery significantly predicted total perceived stress. The results of the regression indicated that the model explained 37.3% of the variance and that the model was significant, F (1, 431) = 257.63, p < .001. It was found that total mastery significantly predicted total perceived stress (B1 = -.9, p < .001). The final predictive model was: total perceived stress = 46.32 + (-.9*total mastery) 107 Regression in SPSS
  • 107. 108 Regression in SPSS Results are usually rounded to two decimal places
  • 108. Understanding Factor Analysis  Regardless of purpose, factor analysis is used for the determination of a small number of factors from a larger number of inter-related quantitative variables.  Unlike directly measured variables such as speed, height, weight, etc., some variables such as egoism, creativity, happiness, religiosity, and comfort are not a single measurable entity.  They are constructs that are derived from the measurement of other, directly observable variables.
  • 109. Understanding Factor Analysis 110  Constructs are usually defined as unobservable latent variables. E.g.:  motivation/love/hate/care/altruism/anxiety/worry/stress/product quality/physical aptitude/democracy /reliability/power.  Example: the construct of teaching effectiveness. Several variables are used to allow the measurement of such construct (usually several scale items are used) because the construct may include several dimensions.  Factor analysis measures not directly observable constructs by measuring several of its underlying dimensions.  The identification of such underlying dimensions (factors) simplifies the understanding and description of complex constructs.
  • 110. Understanding Factor Analysis 111 • Generally, the number of factors is much smaller than the number of measures. • Therefore, the expectation is that a factor represents a set of measures. • From this angle, factor analysis is viewed as a data-reduction technique as it reduces a large number of overlapping variables to a smaller set of factors that reflect construct(s) or different dimensions of construct(s).
  • 111. Understanding Factor Analysis 112  The assumption of factor analysis is that underlying dimensions (factors) can be used to explain complex phenomena.  Observed correlations between variables result from their sharing of factors.  Example: Correlations between a person’s test scores might be linked to shared factors such as general intelligence, critical thinking and reasoning skills, reading comprehension etc.
  • 112. Ingredients of a Good Factor Analysis Solution 113 • A major goal of factor analysis is to represent relationships among sets of variables parsimoniously yet keeping factors meaningful. • A good factor solution is both simple and interpretable. • When factors can be interpreted, new insights are possible.
  • 113. Application of Factor Analysis 114  Defining indicators of constructs:  Ideally 4 or more measures should be chosen to represent each construct of interest.  The choice of measures should, as much as possible, be guided by theory, previous research, and logic.
  • 114. Application of Factor Analysis 115  Defining dimensions for an existing measure: In this case the variables to be analyzed are chosen by the initial researcher and not the person conducting the analysis. Factor analysis is performed on a predetermined set of items/scales. Results of factor analysis may not always be satisfactory: The items or scales may be poor indicators of the construct or constructs. There may be too few items or scales to represent each underlying dimension.
  • 115. Application of Factor Analysis 116  Selecting items or scales to be included in a measure. Factor analysis may be conducted to determine what items or scales should be included and excluded from a measure. Results of the analysis should not be used alone in making decisions of inclusions or exclusions. Decisions should be taken in conjunction with the theory and what is known about the construct(s) that the items or scales assess.
  • 116. Steps in Factor Analysis 117 • Factor analysis usually proceeds in four steps: • 1st Step: the correlation matrix for all variables is computed • 2nd Step: Factor extraction • 3rd Step: Factor rotation • 4th Step: Make final decisions about the number of underlying factors
  • 117. Steps in Factor Analysis: The Correlation Matrix 118 • 1st Step: the correlation matrix • Generate a correlation matrix for all variables • Identify variables not related to other variables • If the correlations between variables are small, it is unlikely that they share common factors (variables must be related to each other for the factor model to be appropriate). • Think of correlations in absolute value. • Correlation coefficients greater than 0.3 in absolute value are indicative of acceptable correlations. • Examine visually the appropriateness of the factor model.
  • 118. Steps in Factor Analysis: The Correlation Matrix • Bartlett Test of Sphericity:  used to test the hypothesis that the correlation matrix is an identity matrix (all diagonal terms are 1 and all off-diagonal terms are 0).  If the value of the test statistic for sphericity is large and the associated significance level is small, it is unlikely that the population correlation matrix is an identity. • If the hypothesis that the population correlation matrix is an identity cannot be rejected because the observed significance level is large, the use of the factor model should be reconsidered. 119
  • 119. Steps in Factor Analysis: The Correlation Matrix • The Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy:  is an index for comparing the magnitude of the observed correlation coefficients to the magnitude of the partial correlation coefficients.  The closer the KMO measure is to 1, the better the sampling adequacy (.8 and higher are great, .7 is acceptable, .6 is mediocre, less than .5 is unacceptable).  Reasonably large values are needed for a good factor analysis. Small KMO values indicate that a factor analysis of the variables may not be a good idea. 120
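Both screening statistics can be computed by hand from the correlation matrix; the Python sketch below follows the textbook formulas (Bartlett's chi-square from the determinant of the correlation matrix, and the overall KMO from observed versus partial correlations) and uses random placeholder data, so it is a cross-check of the idea rather than a reproduction of SPSS output.

    import numpy as np
    from scipy.stats import chi2

    def bartlett_sphericity(data):
        # Tests H0 that the population correlation matrix is an identity matrix.
        n, p = data.shape
        corr = np.corrcoef(data, rowvar=False)
        statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(corr))
        dof = p * (p - 1) / 2
        return statistic, dof, chi2.sf(statistic, dof)

    def kmo(data):
        # Overall Kaiser-Meyer-Olkin measure: observed vs. partial (anti-image) correlations.
        corr = np.corrcoef(data, rowvar=False)
        inv = np.linalg.inv(corr)
        partial = -inv / np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
        np.fill_diagonal(corr, 0)
        np.fill_diagonal(partial, 0)
        return (corr ** 2).sum() / ((corr ** 2).sum() + (partial ** 2).sum())

    data = np.random.default_rng(7).normal(size=(150, 10))   # placeholder item responses
    print(bartlett_sphericity(data))
    print(kmo(data))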
  • 120. Steps in Factor Analysis: Factor Extraction 121  2nd Step: Factor extraction  The primary objective of this stage is to determine the factors.  Initial decisions can be made here about the number of factors underlying a set of measured variables.  Estimates of initial factors are obtained using principal components analysis.  Principal components analysis is the most commonly used extraction method. Other factor extraction methods include:  Maximum likelihood method  Principal axis factoring  Alpha method  Unweighted least squares method  Generalized least squares method  Image factoring.
  • 121. Steps in Factor Analysis: Factor Extraction 122  In principal components analysis, linear combinations of the observed variables are formed.  The 1st principal component is the combination that accounts for the largest amount of variance in the sample (1st extracted factor).  The 2nd principal component accounts for the next largest amount of variance and is uncorrelated with the first (2nd extracted factor).  Successive components explain progressively smaller portions of the total sample variance, and all are uncorrelated with each other.
  • 122. Steps in Factor Analysis: Factor Extraction  To decide on how many factors we need to represent the data, we use 2 statistical criteria:  Eigen values, and  The scree plot.  The determination of the number of factors is usually done by considering only factors with Eigen values greater than 1.  Factors with a variance less than 1 are no better than a single variable, since each variable is expected to have a variance of 1.
Total Variance Explained (Extraction Method: Principal Component Analysis)
Component: Initial Eigenvalues (Total, % of Variance, Cumulative %) | Extraction Sums of Squared Loadings (Total, % of Variance, Cumulative %)
1: 3.046, 30.465, 30.465 | 3.046, 30.465, 30.465
2: 1.801, 18.011, 48.476 | 1.801, 18.011, 48.476
3: 1.009, 10.091, 58.566 | 1.009, 10.091, 58.566
4: .934, 9.336, 67.902
5: .840, 8.404, 76.307
6: .711, 7.107, 83.414
7: .574, 5.737, 89.151
8: .440, 4.396, 93.547
9: .337, 3.368, 96.915
10: .308, 3.085, 100.000
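The eigenvalues behind a table like this can be reproduced in Python with scikit-learn's PCA, provided the items are standardized so the analysis runs on the correlation matrix; the data below are random placeholders, so the numbers will differ from the slide. A scree plot is simply these eigenvalues plotted against component number.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(8)
    items = rng.normal(size=(200, 10))                        # placeholder: 200 respondents x 10 items

    z = (items - items.mean(axis=0)) / items.std(axis=0, ddof=1)   # standardize -> correlation-matrix PCA
    pca = PCA().fit(z)
    eigenvalues = pca.explained_variance_
    print(eigenvalues)
    print(int((eigenvalues > 1).sum()))                       # Kaiser criterion: components with eigenvalue > 1
    print(pca.explained_variance_ratio_ * 100)                # % of variance, as in Total Variance Explained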
  • 123. Steps in Factor Analysis: Factor Extraction  The examination of the Scree plot provides a visual of the total variance associated with each factor.  The steep slope shows the large factors.  The gradual trailing off (scree) shows the rest of the factors usually lower than an Eigen value of 1.  In choosing the number of factors, in addition to the statistical criteria, one should make initial decisions based on conceptual and theoretical grounds.  At this stage, the decision about the number of factors is not final. 124
  • 124. Steps in Factor Analysis: Factor Extraction
Component Matrix using Principal Component Analysis (Extraction Method: Principal Component Analysis; 3 components extracted)
Item: loadings on Components 1-3
I discussed my frustrations and feelings with person(s) in school: .771, -.271, .121
I tried to develop a step-by-step plan of action to remedy the problems: .545, .530, .264
I expressed my emotions to my family and close friends: .580, -.311, .265
I read, attended workshops, or sought some other educational approach to correct the problem: .398, .356, -.374
I tried to be emotionally honest with myself about the problems: .436, .441, -.368
I sought advice from others on how I should solve the problems: .705, -.362, .117
I explored the emotions caused by the problems: .594, .184, -.537
I took direct action to try to correct the problems: .074, .640, .443
I told someone I could trust about how I felt about the problems: .752, -.351, .081
I put aside other activities so that I could work to solve the problems: .225, .576, .272
  • 125. Steps in Factor Analysis: Factor Rotation 126  3rd Step: Factor rotation.  In this step, factors are rotated.  Un-rotated factors are typically not very interpretable (most factors are correlated with many variables).  Factors are rotated to make them more meaningful and easier to interpret (each variable is associated with a minimal number of factors).  Different rotation methods may result in the identification of somewhat different factors.
  • 126. Steps in Factor Analysis: Factor Rotation  The most popular rotational method is the Varimax rotation.  Varimax uses orthogonal rotations, yielding uncorrelated factors/components.  Varimax attempts to minimize the number of variables that have high loadings on a factor. This enhances the interpretability of the factors. 127
  • 127. Steps in Factor Analysis: Factor Rotation • Other common rotational methods include oblique rotations, which yield correlated factors. • Oblique rotations are less frequently used because their results are more difficult to summarize. • Other rotational methods include:  Quartimax (orthogonal)  Equamax (orthogonal)  Promax (oblique) 128
  • 128. Steps in Factor Analysis: Factor Rotation • A factor is interpreted or named by examining the largest values linking the factor to the measured variables in the rotated factor matrix.
Rotated Component Matrix (Extraction Method: Principal Component Analysis; Rotation Method: Varimax with Kaiser Normalization; rotation converged in 5 iterations)
Item: loadings on Components 1-3
I discussed my frustrations and feelings with person(s) in school: .803, .186, .050
I tried to develop a step-by-step plan of action to remedy the problems: .270, .304, .694
I expressed my emotions to my family and close friends: .706, -.036, .059
I read, attended workshops, or sought some other educational approach to correct the problem: .050, .633, .145
I tried to be emotionally honest with myself about the problems: .042, .685, .222
I sought advice from others on how I should solve the problems: .792, .117, -.038
I explored the emotions caused by the problems: .248, .782, -.037
I took direct action to try to correct the problems: -.120, -.023, .772
I told someone I could trust about how I felt about the problems: .815, .172, -.040
I put aside other activities so that I could work to solve the problems: -.014, .155, .657
  • 129. Steps in Factor Analysis: Making Final Decisions 130 • 4th Step: Making final decisions • The final decision about the number of factors to choose is the number of factors for the rotated solution that is most interpretable. • To identify factors, group variables that have large loadings for the same factor. • Plots of loadings provide a visual for variable clusters. • Interpret factors according to the meaning of the variables • This decision should be guided by: • A priori conceptual beliefs about the number of factors from past research or theory • Eigen values computed in step 2. • The relative interpretability of rotated solutions computed in step 3.
  • 130. Assumptions Underlying Factor Analysis 131 • Assumptions underlying factor analysis include: • The measured variables are linearly related to the factors + errors. • This assumption is likely to be violated if items have limited response scales (two-point response scales like True/False or Right/Wrong items). • The data should have a bi-variate normal distribution for each pair of variables. • Observations are independent. • The factor analysis model assumes that variables are determined by common factors and unique factors. All unique factors are assumed to be uncorrelated with each other and with the common factors.
  • 131. Obtaining a Factor Analysis • Click: • Analyze and select • Dimension Reduction • Factor • A factor Analysis Box will appear © Dr. Maher Khelifa 132
  • 132. Obtaining a Factor Analysis • Move variables/scale items to Variable box 133
  • 133. Obtaining a Factor Analysis • Factor extraction • When variables are in variable box, select: • Extraction © Dr. Maher Khelifa 134
  • 134. Obtaining a Factor Analysis • When the factor extraction box appears, select: • Scree Plot • Keep all default selections including: • Principal Component Analysis • Based on Eigen value of 1, and • Un-rotated factor solution © Dr. Maher Khelifa 135
  • 135. Obtaining a Factor Analysis • During factor extraction keep factor rotation default of: • None • Press continue © Dr. Maher Khelifa 136
  • 136. Obtaining a Factor Analysis • During Factor Rotation: • Decide on the number of factors based on the factor extraction phase and enter the desired number of factors by choosing: • Fixed number of factors and entering the desired number of factors to extract. • Under Rotation choose Varimax • Press continue • Then OK © Dr. Maher Khelifa 137
  • 137. Bibliographical References
 Almar, E.C. (2000). Statistical tricks and traps. Los Angeles, CA: Pyrczak Publishing.
 Bluman, A.G. (2008). Elementary statistics (6th Ed.). New York, NY: McGraw Hill.
 Chatterjee, S., Hadi, A., & Price, B. (2000). Regression analysis by example. New York: Wiley.
 Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the behavioral sciences (2nd Ed.). Hillsdale, NJ: Lawrence Erlbaum.
 Darlington, R.B. (1990). Regression and linear models. New York: McGraw-Hill.
 Einspruch, E.L. (2005). An introductory guide to SPSS for Windows (2nd Ed.). Thousand Oaks, CA: Sage Publications.
 Fox, J. (1997). Applied regression analysis, linear models, and related methods. Thousand Oaks, CA: Sage Publications.
 Glassnapp, D. R. (1984). Change scores and regression suppressor conditions. Educational and Psychological Measurement (44), 851-867.
 Glassnapp, D. R., & Poggio, J. (1985). Essentials of statistical analysis for the behavioral sciences. Columbus, OH: Charles E. Merrill Publishing.
 Grimm, L.G., & Yarnold, P.R. (2000). Reading and understanding multivariate statistics. Washington DC: American Psychological Association.
 Hamilton, L.C. (1992). Regression with graphics. Belmont, CA: Wadsworth.
 Hochberg, Y., & Tamhane, A.C. (1987). Multiple comparisons procedures. New York: John Wiley.
 Jaeger, R. M. Statistics: A spectator sport (2nd Ed.). Newbury Park, London: Sage Publications.
© Dr. Maher Khelifa 138
  • 138. Bibliographical References
• Keppel, G. (1991). Design and analysis: A researcher's handbook (3rd Ed.). Englewood Cliffs, NJ: Prentice Hall.
• Maracuilo, L.A., & Serlin, R.C. (1988). Statistical methods for the social and behavioral sciences. New York: Freeman and Company.
• Maxwell, S.E., & Delaney, H.D. (2000). Designing experiments and analyzing data: A model comparison perspective. Mahwah, NJ: Lawrence Erlbaum.
• Norusis, J. M. (1993). SPSS for Windows base system user's guide, Release 6.0. Chicago, IL: SPSS Inc.
• Norusis, J. M. (1993). SPSS for Windows advanced statistics, Release 6.0. Chicago, IL: SPSS Inc.
• Norusis, J. M. (2006). SPSS Statistics 15.0 guide to data analysis. Upper Saddle River, NJ: Prentice Hall.
• Norusis, J. M. (2008). SPSS Statistics 17.0 guide to data analysis. Upper Saddle River, NJ: Prentice Hall.
• Norusis, J. M. (2008). SPSS Statistics 17.0 statistical procedures companion. Upper Saddle River, NJ: Prentice Hall.
• Norusis, J. M. (2008). SPSS Statistics 17.0 advanced statistical procedures companion. Upper Saddle River, NJ: Prentice Hall.
• Pedhazur, E.J. (1997). Multiple regression in behavioral research (3rd Ed.). New York: Harcourt Brace College Publishers.
© Dr. Maher Khelifa 139