Factor Analysis (FA)
• Factor analysis is an interdependence technique whose primary
purpose is to define the underlying structure among the
variables in the analysis.
• The purpose of FA is to condense the information contained in
a number of original variables into a smaller set of new
composite dimensions or variates (factors) with a minimum loss
of information.
Factor analysis decision process
Stage 1: Objectives of factor analysis
• Key issues:
• Specifying the unit of analysis
 R factor analysis- Correlation matrix of the variables, used to summarize their
characteristics.
 Q factor analysis- Correlation matrix of the individual respondents
based on their characteristics; it condenses a large number of people into
distinctly different groups.
• Achieving data summarization vs. data reduction
 Data summarization- It is the definition of structure. Viewing the set of
variables at various levels of generalization, ranging from the most
detailed level to the more generalized level. The linear composite of
variables is called variate or factor.
 Data reduction- Creating an entirely new set of variables to partially or completely
replace the original variables with empirical values (factor scores).
• Variable selection
 The researcher should always consider the conceptual underpinnings of
the variables and use judgment as to the appropriateness of the variables
for factor analysis.
• Using factor analysis with other multivariate techniques
 Factor scores as representatives of variables will be used for further
analysis.
• Stage 2: Designing a factor analysis
• It involves three basic decisions:
 Correlations among variables or respondents (Q type vs. R type)
 Variable selection and measurement issues- Mostly performed on metric
variables. For nonmetric variables, define dummy variables (0-1) and
include in the set of metric variables.
 Sample size- The sample must have more observations than variables.
The minimum sample size should be fifty observations. A minimum of 5, and
preferably at least 10, observations per variable is desirable.
• Stage 3: Assumptions in factor analysis
– The assumptions are more conceptual than statistical.
 Conceptual issues- 1) Appropriate selection of variables 2)
Homogeneous sample.
 Statistical issues- Ensuring the variables are sufficiently intercorrelated
to produce representative factors.
Measures of intercorrelation:
Visual inspection of the correlation matrix: if a substantial number of
correlations are greater than .30, factor analysis is appropriate.
If partial correlations are high, indicating no underlying factors,
then factor analysis is inappropriate.
Bartlett test of sphericity- A test for the presence of correlation
among the variables. A statistically significant Bartlett’s test of
sphericity (sig. < .05) indicates that sufficient correlations exist
among the variables to proceed.
 Measure of sampling adequacy (MSA)- This index ranges from
0 to 1, reaching 1 when each variable is perfectly predicted
without error by the other variables. The measure can be
interpreted with the following guidelines:
– Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy
– in the .90s marvelous
– in the .80s meritorious
– in the .70s middling
– in the .60s mediocre
– in the .50s miserable
– below .50 unacceptable
• MSA values must exceed .50 for both the overall test and each
individual variable
• Variables with values less than .50 should be omitted from the
factor analysis (a minimal computational sketch of Bartlett’s test and
the KMO/MSA follows below).
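Both of these checks can be computed directly from a correlation matrix. The following is a minimal numpy/scipy sketch (not the SPSS procedure itself) using the standard formulas: Bartlett’s chi-square from the determinant of the correlation matrix, and the KMO/MSA as the ratio of squared correlations to squared correlations plus squared partial correlations. The data and variable setup are simulated and purely illustrative.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 is that the variables are uncorrelated (R = identity)."""
    p = R.shape[0]
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, df, chi2.sf(statistic, df)

def kmo(R):
    """Kaiser-Meyer-Olkin measure: overall MSA and per-variable MSA."""
    inv_R = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv_R))
    partial = -inv_R / np.outer(d, d)          # partial (anti-image) correlations
    np.fill_diagonal(partial, 0.0)
    r2 = R.copy()
    np.fill_diagonal(r2, 0.0)
    r2, p2 = r2 ** 2, partial ** 2
    overall = r2.sum() / (r2.sum() + p2.sum())
    per_variable = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return overall, per_variable

# Illustrative data: 200 observations on 6 variables with some built-in correlation.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 1] += X[:, 0]
X[:, 3] += X[:, 2]
R = np.corrcoef(X, rowvar=False)

stat, df, p = bartlett_sphericity(R, n=X.shape[0])
overall_msa, msa = kmo(R)
print(f"Bartlett chi2({int(df)}) = {stat:.2f}, p = {p:.4f}")
print(f"Overall KMO/MSA = {overall_msa:.3f}")
print("Variables with MSA < .50 (candidates to drop):", np.where(msa < 0.50)[0])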
• Stage 4: Deriving factors and assessing overall fit
• Apply factor analysis to identify the underlying structure of
relationships.
• Two decisions are important:
– Selecting the factor extraction method
• Common factor analysis
• Principal component analysis
• Concept of Partitioning the variance of a variable
– Common variance- Variance in a variable that is shared with all other
variables in the analysis; it is based on the variable’s correlations
with the other variables. A variable’s communality is an estimate of its
common variance.
– Specific variance- AKA unique variance. The portion of a variable’s variance that cannot
be explained by its correlations with the other variables but is associated
uniquely with that single variable.
– Error variance- It is due to unreliability in the data-gathering process,
measurement error, or a random component in the measured
phenomenon.
• Component factor analysis- AKA principal components analysis.
Considers the total variance and derives factors that contain
small proportions of unique variance and in some instances
error variance.
• Common factor analysis- Considers only the common or shared
variance, assuming that both the unique and error variance are
not of interest in defining the structure of the variables.
Diagonal value used in the correlation matrix determines the variance analyzed:
– Unity (component analysis): total variance is analyzed (common, specific, and error),
partitioned into variance extracted and variance excluded.
– Communality (common factor analysis): only the common (shared) variance is analyzed.
• Suitability of factor extraction method
– Component factor analysis is appropriate when data reduction is the primary
concern.
– Common factor analysis is appropriate when the primary objective is to
identify the latent dimensions or constructs represented in the original
variables.
• Criteria for the number of factors to extract
– Latent root criterion
• It applies to both extraction methods.
• This criterion assumes that any individual factor should account for the
variance of at least a single variable if it is to be retained for interpretation.
• In component analysis each variable contributes a value of 1 to the latent
roots or eigenvalues.
• So, factors having eigenvalues greater than 1 are considered significant and
are selected.
• Eigenvalue- It represents the amount of variance accounted
for by a factor. It is the column sum of squared loadings for that
factor.
– Scree test criterion
• The latent roots (eigenvalues) are plotted against the number of
factors in their order of extraction.
• The shape of the resulting curve is used to evaluate the
cutoff point.
• The point at which the curve begins to straighten out is
considered to indicate the maximum number of factors
to extract.
• As a general rule, the scree test results in at least one
and sometimes two or three more factors being
considered for inclusion than does the latent root
criterion.
[Figure: scree plot of eigenvalues after factor extraction, plotting eigenvalues (y-axis, 0–5)
against factor number (x-axis, 0–10), with the scree criterion cutoff marked.
A minimal code sketch of the latent root and scree criteria follows below.]
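As a rough illustration of the two criteria, the sketch below (numpy only, with simulated data built from two underlying factors) computes the latent roots of a correlation matrix, counts how many exceed 1, and prints the successive drops one would inspect on a scree plot.

```python
import numpy as np

# Illustrative data: 300 respondents, 8 variables generated from two underlying factors.
rng = np.random.default_rng(1)
f = rng.normal(size=(300, 2))
loadings = np.array([[.8, .0], [.7, .1], [.6, .0], [.7, .2],
                     [.1, .8], [.0, .7], [.2, .6], [.0, .7]])
X = f @ loadings.T + rng.normal(scale=0.5, size=(300, 8))

R = np.corrcoef(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]   # latent roots, largest first

# Latent root (Kaiser) criterion: retain factors with eigenvalue > 1.
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Factors retained by the latent root criterion:", int(np.sum(eigenvalues > 1)))

# Scree inspection: look for where the drop between successive eigenvalues flattens out.
print("Successive drops:", np.round(-np.diff(eigenvalues), 2))
```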
• Stage 5: Interpreting the factors
• Three processes of factor interpretation
– Estimate the factor matrix
• Initial unrotated factor matrix is computed.
• It contains factor loadings for each variable on each factor.
• Factor loadings are the correlation of each variable on each factor.
• Higher loadings make the variable more representative of the factor.
– Factor rotation
• Rotational method is employed to achieve simpler and theoretically
more meaningful factor solutions.
• The reference axes of the factors are turned about the origin until
some other position has been reached.
• There are two types of rotation:
• Orthogonal factor rotation
• Oblique factor rotation.
Rotating Factors
Unrotated loadings:
Variable   Factor 1   Factor 2
x1           0.5        0.5
x2           0.8        0.8
x3          -0.7        0.7
x4          -0.5       -0.5
Rotated loadings:
Variable   Factor 1   Factor 2
x1           0          0.6
x2           0          0.9
x3          -0.9        0
x4           0         -0.9
[Figure: the same four variables plotted on the factor axes before and after rotation,
shown for Orthogonal Rotation and Oblique Rotation. A varimax sketch on the unrotated
loadings follows below.]
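For readers who want to see rotation mechanically, here is a small sketch of the classic Kaiser varimax algorithm (orthogonal rotation only) applied to the unrotated toy loadings above. The helper function varimax is written for this example rather than taken from a library, and it will not reproduce the exact rotated values shown, but after rotation each variable should load mainly on one factor.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Kaiser varimax: an orthogonal rotation that simplifies the columns of a loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_objective = np.sum(s)
        if objective != 0 and new_objective / objective < 1 + tol:
            break
        objective = new_objective
    return loadings @ rotation

# Toy unrotated loadings from the table above (x1..x4 on two factors).
A = np.array([[ 0.5,  0.5],
              [ 0.8,  0.8],
              [-0.7,  0.7],
              [-0.5, -0.5]])
print(np.round(varimax(A), 2))   # after rotation each variable loads mainly on one factor
```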
When to use Factor Analysis?
• Data Reduction
• Identification of underlying latent structures
- Clusters of correlated variables are termed factors
– Example:
– Factor analysis could potentially be used to identify
the characteristics (out of a large number of
characteristics) that make a person popular.
Candidate characteristics: Level of social skills, selfishness, how
interesting a person is to others, the amount of time they spend
talking about themselves (Talk 2) versus the other person (Talk
1), their propensity to lie about themselves.
The R-Matrix
Meaningful clusters of large correlation
coefficients between subsets of variables
suggest that these variables are measuring
aspects of the same underlying
dimension.
Factor 1:
The better your social skills,
the more interesting and
talkative you tend to be.
Factor 2:
Selfish people are likely to lie
and talk about themselves.
What is a Factor?
• Factors can be viewed as classification axes along
which the individual variables can be plotted.
• The greater the loading of variables on a factor,
the more the factor explains relationships among
those variables.
• Ideally, variables should be strongly related to (or
load on) only one factor.
Graphical Representation of a
factor plot
Note that each variable
loads primarily on only
one factor.
Factor loadings tell us about
the relative contribution that a
variable makes to a factor.
Mathematical Representation
of a factor plot
Yi = b1X1i + b2X2i + … + bnXni + εi
Factori = b1Variable1i + b2Variable2i + … + bnVariableni + εi
• The equation describing a linear model can be
applied to the description of a factor.
• The b’s in the equation represent the factor
loadings observed in the factor plot.
Note: there is no intercept in the equation, since the lines intersect at zero and hence
the intercept is also zero.
Mathematical Representation
of a factor plot
Sociabilityi = b1Talk 1i + b2Social Skillsi + b3Interesti
+ b4Talk 2i + b5Selfishi + b6Liari + εi
There are two factors underlying the popularity construct: general
sociability and consideration.
We can construct equations that describe each factor in terms of the
variables that have been measured.
Considerationi = b1Talk 1i + b2Social Skillsi +
b3Interesti + b4Talk 2i + b5Selfishi + b6Liari + εi
Mathematical Representation
of a factor plot
Sociabilityi = 0.87Talk 1i + 0.96Social Skillsi + 0.92Interesti + 0.00Talk 2i -
0.10Selfishi + 0.09Liari + εi
The values of the “b’s” in the two equations differ, depending on
the relative importance of each variable to a particular factor.
Considerationi = 0.01Talk 1i - 0.03Social Skillsi + 0.04Interesti + 0.82Talk 2i +
0.75Selfishi + 0.70Liari + εi
Ideally, variables should have very high b-values for one factor and very low
b-values for all other factors.
Replace values of b with the co-ordinate of each variable on the graph.
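As a quick worked example, the sketch below plugs standardized scores for a hypothetical respondent into the two equations above. The error terms are dropped, and the respondent’s z-scores are made up purely for illustration.

```python
import numpy as np

# b values from the two equations above (order: Talk 1, Social Skills, Interest, Talk 2, Selfish, Liar).
sociability_b   = np.array([0.87, 0.96, 0.92, 0.00, -0.10, 0.09])
consideration_b = np.array([0.01, -0.03, 0.04, 0.82, 0.75, 0.70])

# Hypothetical respondent: made-up standardized (z) scores on the six measured variables.
z = np.array([1.2, 0.8, 1.0, -0.5, -0.3, 0.1])

print("Sociability  :", round(float(sociability_b @ z), 2))
print("Consideration:", round(float(consideration_b @ z), 2))
```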
Factor Loadings
• The b values represent the weights of a variable on a factor and are
termed Factor Loadings.
• These values are stored in a Factor pattern matrix (A).
• Columns display the factors (underlying constructs) and rows
display how each variable loads onto each factor.
Factor pattern matrix A (rows = variables, columns = factors):
Variable        Sociability   Consideration
Talk 1             0.87           0.01
Social Skills      0.96          -0.03
Interest           0.92           0.04
Talk 2             0.00           0.82
Selfish           -0.10           0.75
Liar               0.09           0.70
Factor Scores
• Once factors are derived, we can estimate each
person’s Factor Scores (based on their scores for each
factor’s constituent variables).
• Potential uses for Factor Scores.
- Estimate a person’s score on one or more factors.
- Answer questions of scientific or practical interest (e.g., are females
more sociable than males, based on the factor scores for sociability?).
• Methods of Determining Factor Scores
- Weighted Average (simplest, but scale dependent)
- Regression Method (easiest to understand; most typically used)
- Bartlett Method (produces scores that are unbiased and correlate only with their
own factor).
- Anderson-Rubin Method (produces scores that are uncorrelated and
standardized)
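Here is a minimal sketch of the regression method mentioned above, assuming standardized data Z, its correlation matrix R, and a loading matrix A (obtained here from a two-component principal-component extraction): the weight matrix is W = R⁻¹A and the factor scores are ZW. The data are simulated and purely illustrative.

```python
import numpy as np

# Illustrative data: 100 respondents on 6 variables with some built-in correlation.
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
X[:, 1] += 0.8 * X[:, 0]
X[:, 4] += 0.8 * X[:, 3]
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)     # standardized data
R = np.corrcoef(Z, rowvar=False)

# Loading matrix A from a two-component principal-component extraction
# (eigenvectors scaled by the square roots of their eigenvalues).
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]
A = eigvecs[:, order] * np.sqrt(eigvals[order])

# Regression method: weight matrix W = R^-1 A, factor scores = Z W.
W = np.linalg.solve(R, A)
scores = Z @ W

print("Factor score matrix shape:", scores.shape)        # (100 respondents, 2 factors)
print("First respondent's scores:", np.round(scores[0], 2))
```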
Approaches to Factor Analysis
• Exploratory
– Reduce a number of measurements to a smaller number of indices or
factors (e.g., Principal Components Analysis or PCA).
– Goal: Identify factors based on the data and to maximize the amount
of variance explained.
• Confirmatory
– Test hypothetical relationships between measures and more abstract
constructs.
– Goal: The researcher must hypothesize, in advance, the number of
factors, whether or not these factors are correlated, and which items
load onto and reflect particular factors. In contrast to EFA, where all
loadings are free to vary, CFA allows for the explicit constraint of
certain loadings to be zero.
Communality
• Understanding variance in an R-matrix
– Total variance for a particular variable has two
components:
• Common Variance – variance shared with other variables.
• Unique Variance – variance specific to that variable (including
error or random variance).
• Communality
– The proportion of common (or shared) variance present in a
variable is known as the communality.
– A variable that has no unique variance has a communality of 1;
one that shares none of its variance with any other variable has
a communality of 0.
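Given a loading matrix, each variable’s communality is simply the sum of its squared loadings across the retained factors, and uniqueness is one minus that. A small sketch, reusing the Sociability/Consideration pattern matrix shown earlier:

```python
import numpy as np

# Factor pattern matrix A from the earlier slide (rows = variables, columns = factors).
names = ["Talk 1", "Social Skills", "Interest", "Talk 2", "Selfish", "Liar"]
A = np.array([[ 0.87,  0.01],
              [ 0.96, -0.03],
              [ 0.92,  0.04],
              [ 0.00,  0.82],
              [-0.10,  0.75],
              [ 0.09,  0.70]])

communalities = (A ** 2).sum(axis=1)   # shared variance per variable
uniqueness = 1 - communalities         # unique (specific + error) variance

for name, h2, u2 in zip(names, communalities, uniqueness):
    print(f"{name:13s} communality = {h2:.2f}, uniqueness = {u2:.2f}")
```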
Factor Extraction: PCA vs. Factor Analysis
– Principal Component Analysis. A data reduction technique that represents
a set of variables by a smaller number of variables called principal components.
They are uncorrelated, and therefore, measure different, unrelated aspects or
dimensions of the data.
– Principal Components are chosen such that the first one accounts for as much of
the variation in the data as possible, the second one for as much of the
remaining variance as possible, and so on.
– Useful for combining many variables into a smaller number of subsets.
– Factor Analysis. Derives a mathematical model from which factors are
estimated.
– Factors are linear combinations that maximize the shared portion of the
variance underlying latent constructs.
– May be used to identify the structure underlying such variables and to estimate
scores to measure latent factors themselves.
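The contrast can be seen in a brief scikit-learn sketch on simulated data. Note that sklearn’s FactorAnalysis fits a latent-variable model with per-variable noise terms rather than reproducing SPSS’s extraction options, so this is only meant to show that PCA models total variance while factor analysis separates shared loadings from unique (noise) variance.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

# Illustrative data: 500 cases, 6 variables generated from two underlying dimensions.
rng = np.random.default_rng(3)
f = rng.normal(size=(500, 2))
loadings = np.array([[.8, 0], [.7, 0], [.6, .1], [0, .8], [.1, .7], [0, .6]])
X = f @ loadings.T + rng.normal(scale=0.4, size=(500, 6))

pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2).fit(X)

print("PCA explained variance ratio:", np.round(pca.explained_variance_ratio_, 2))
print("PCA components (rows):\n", np.round(pca.components_, 2))
print("FA loadings (rows):\n", np.round(fa.components_, 2))
print("FA unique (noise) variances per variable:", np.round(fa.noise_variance_, 2))
```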
Factor Extraction: Eigenvalues & Scree Plot
• Eigenvalues
– Measure the amount of variation accounted for by each factor.
– Number of principal components is less than or equal to the number of
original variables. The first principal component accounts for as much of
the variability in the data as possible. Each succeeding component has the
highest variance possible under the constraint that it be orthogonal to
(i.e., uncorrelated with) the preceding components.
• Scree Plots
– Plots a graph of each eigenvalue (Y-axis) against the factor with
which it is associated (X-axis).
– By graphing the eigenvalues, the relative importance of each factor
becomes apparent.
Factor Retention Based on Scree Plots
[Figure: example scree plots.]
Factor Retention: Kaiser’s Criterion
Kaiser (1960) recommends retaining all factors with
eigenvalues greater than 1.
- Based on the idea that eigenvalues represent the amount
of variance explained by a factor and that an eigenvalue
of 1 represents a substantial amount of variation.
- Kaiser’s criterion tends to overestimate the number of
factors to be retained.
Doing Factor Analysis: An Example
• Students often become stressed about statistics
and the use of computers and/or SPSS to
analyze data.
• Suppose we develop a questionnaire, the SPSS Anxiety
Questionnaire (SAQ), to measure this propensity (see sample
items on the following slides; the data can be found in SAQ.sav).
• Does the questionnaire measure a single construct?
Or is it possible that there are multiple aspects
comprising students’ anxiety toward SPSS?
Doing Factor Analysis: Some
Considerations
• Sample size is important! A sample of 300 or more
will likely provide a stable factor solution, but this
also depends on the number of variables and factors
identified.
• Factors that have four or more loadings greater than
0.6 are likely to be reliable regardless of sample
size.
• Correlations among the items should not be too low
(less than .3) or too high (greater than .8), but the
pattern is what is important.
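A quick way to screen the inter-item correlations for those two problems is sketched below (pandas/numpy, simulated items; the .3 and .8 thresholds follow the rule of thumb above).

```python
import numpy as np
import pandas as pd

# Illustrative item data: 300 respondents on 5 simulated items; item2 is nearly a duplicate of item1.
rng = np.random.default_rng(4)
df = pd.DataFrame(rng.normal(size=(300, 5)), columns=[f"item{i}" for i in range(1, 6)])
df["item2"] = df["item2"] + 1.5 * df["item1"]

R = df.corr().to_numpy()
off_diag = np.abs(R[~np.eye(len(R), dtype=bool)])

print("Share of inter-item correlations below .3:", round(float(np.mean(off_diag < 0.3)), 2))
print("Any inter-item correlations above .8:", bool(np.any(off_diag > 0.8)))
```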
Factor Extraction
[SPSS output: Total Variance Explained table for the 23 SAQ items, listing each
component’s eigenvalue, % of variance, and cumulative %.]
Scree Plot for the SAQ Data
[Figure: scree plot of eigenvalues for the 23 SAQ items.]
Table of Communalities Before and After Extraction
[SPSS output: initial and extraction communalities for questions Q01–Q23.]
Component Matrix Before Rotation
(loadings of each variable onto each factor)
[SPSS output: unrotated component loadings for the 23 items on the four extracted components.]
Note: Loadings less than 0.4 have been omitted.
Factor Rotation
• To aid interpretation it is possible to maximize the
loading of a variable on one factor while
minimizing its loading on all other factors.
• This is known as Factor Rotation.
• Two types:
– Orthogonal (factors are uncorrelated)
– Oblique (factors intercorrelate)
[Figure: factor axes under Orthogonal Rotation vs. Oblique Rotation.]
Orthogonal Rotation (varimax): Rotated Component Matrix
(loadings less than 0.4 omitted)

Component 1 (Fear of Computers)
I have little experience of computers (.800)
SPSS always crashes when I try to use it (.684)
I worry that I will cause irreparable damage because of my incompetence with computers (.647)
All computers hate me (.638)
Computers have minds of their own and deliberately go wrong whenever I use them (.579)
Computers are useful only for playing games (.550)
Computers are out to get me (.459)

Component 2 (Fear of Statistics)
I can't sleep for thoughts of eigenvectors (.677)
I wake up under my duvet thinking that I am trapped under a normal distribution (.661)
Standard deviations excite me (-.567)
People try to tell you that SPSS makes statistics easier to understand but it doesn't (.473 on Component 1, .523 on Component 2)
I dream that Pearson is attacking me with correlation coefficients (.516)
I weep openly at the mention of central tendency (.514)
Statistics makes me cry (.496)
I don't understand statistics (.429)

Component 3 (Fear of Math)
I have never been good at mathematics (.833)
I slip into a coma whenever I see an equation (.747)
I did badly at mathematics at school (.747)

Component 4 (Peer Evaluation)
My friends are better at statistics than me (.648)
My friends are better at SPSS than I am (.645)
If I'm good at statistics my friends will think I'm a nerd (.586)
My friends will think I'm stupid for not being able to cope with SPSS (.543)
Everybody looks at me when I use SPSS (.427)

Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
Rotation converged in 9 iterations.
Note: Varimax rotation is the
most commonly used
rotation. Its goal is to
minimize the complexity of
the components by making
the large loadings larger and
the small loadings smaller
within each component.
Quartimax rotation makes
large loadings larger and
small loadings smaller within
each variable. Equamax
rotation is a compromise that
attempts to simplify both
components and variables.
These are all orthogonal
rotations, that is, the axes
remain perpendicular, so the
components are not
correlated.
Oblique Rotation (oblimin): Pattern Matrix
(loadings less than 0.4 omitted)

Fear of Statistics: I can't sleep for thoughts of eigenvectors; I wake up under my duvet thinking that I am trapped under a normal distribution; Standard deviations excite me; I dream that Pearson is attacking me with correlation coefficients; I weep openly at the mention of central tendency; Statistics makes me cry; I don't understand statistics.

Peer Evaluation: My friends are better at SPSS than I am; My friends are better at statistics than me; If I'm good at statistics my friends will think I'm a nerd; My friends will think I'm stupid for not being able to cope with SPSS; Everybody looks at me when I use SPSS.

Fear of Computers: I have little experience of computers; SPSS always crashes when I try to use it; All computers hate me; I worry that I will cause irreparable damage because of my incompetence with computers; Computers have minds of their own and deliberately go wrong whenever I use them; Computers are useful only for playing games; People try to tell you that SPSS makes statistics easier to understand but it doesn't; Computers are out to get me.

Fear of Math: I have never been good at mathematics; I slip into a coma whenever I see an equation; I did badly at mathematics at school.

Extraction Method: Principal Component Analysis.
Rotation Method: Oblimin with Kaiser Normalization.
Rotation converged in 29 iterations.
Reliability:
A measure should consistently reflect the construct it is measuring
• Test-Retest Method
– What about practice effects/mood states?
• Alternate Form Method
– Expensive and Impractical
• Split-Half Method
– Splits the questionnaire into two random halves,
calculates scores and correlates them.
• Cronbach’s Alpha
– Splits the questionnaire (or sub-scales of a questionnaire)
into all possible halves, calculates the scores, correlates
them and averages the correlation for all splits.
– Ranges from 0 (no reliability) to 1 (complete reliability)
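Cronbach’s alpha can also be computed directly from item-level data with the standard formula α = k/(k-1) · (1 - Σ item variances / variance of the total score). Below is a minimal numpy sketch on simulated items (not the SAQ data); the helper function is written for this example.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n respondents x k items) array."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Illustrative subscale: 200 respondents answering 4 items that share a common "true score".
rng = np.random.default_rng(5)
true_score = rng.normal(size=(200, 1))
items = true_score + rng.normal(scale=0.8, size=(200, 4))
print("Cronbach's alpha:", round(float(cronbach_alpha(items)), 2))
```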
Reliability: Fear of Computers Subscale
Reliability: Fear of Statistics Subscale
Reliability: Fear of Math Subscale
Reliability: Peer Evaluation Subscale
[SPSS reliability (Cronbach’s alpha) output for each subscale.]
Reporting the Results
A principal component analysis (PCA) was conducted on the 23 items with
orthogonal rotation (varimax). Bartlett’s test of sphericity, χ²(253) = 19334.49,
p < .001, indicated that correlations between items were sufficiently large for
PCA. An initial analysis was run to obtain eigenvalues for each component in
the data. Four components had eigenvalues over Kaiser’s criterion of 1 and
in combination explained 50.32% of the variance. The scree plot was slightly
ambiguous and showed inflexions that would justify retaining either 2 or 4
factors.
Given the large sample size, and the convergence of the scree plot and
Kaiser’s criterion on four components, four components were retained in the
final analysis. Component 1 represents a fear of computers, component 2 a
fear of statistics, component 3 a fear of math, and component 4 peer
evaluation concerns.
The fear of computers, fear of statistics, and fear of math subscales of the
SAQ all had high reliabilities, all Cronbach’s α = .82. However, the fear of
negative peer evaluation subscale had a relatively low reliability, Cronbach’s
α = .57.
Step 1: Select Factor Analysis
Step 2: Add all variables to be included
Step 3: Get descriptive statistics & correlations
Step 4: Ask for Scree Plot and set extraction options
Step 5: Handle missing values and sort coefficients by
size
Step 6: Select rotation type and set rotation
iterations
Step 7: Save Factor Scores
Communalities
Variance Explained
Scree Plot
Rotated Component Matrix: Component 1
Rotated Component Matrix: Component 2
Component 1: Factor Score
Component (Factor): Score Values
Rename Components According to
Interpretation