J. Rovan: Multivariate Analysis
9 Principal Component Analysis
A principal component analysis (PCA) is concerned with explaining the variance-covariance
structure of a set of variables through a few linear combinations of these variables, called
principal components. Its general objectives are:
• data reduction and
• interpretation.
In general, p principal components are required to reproduce the total system variability of
the original data set (n measurements on p variables). Fortunately, much of this variability can
often be accounted for by a small number k of principal components. If so, there is (almost)
as much information in the first k components as there is in the original p variables. The first k
principal components can then replace the initial p variables, and the original n × p data set is
reduced to an n × k data set consisting of n measurements on k principal components.
An analysis of principal components often reveals relationships that were not previously
suspected and thereby allows interpretations that would not ordinarily result.
Principal components also frequently serve as intermediate steps in much larger investigations,
e.g. as inputs to a multiple regression, cluster analysis, etc.
Example
Suppose one would like to investigate the level of socio-economic development of some
European countries in the year 1981. The investigation takes into account the following set of
economic, demographic, health, social security and level-of-living indicators:
• Per capita gross domestic product in $
• Share of agriculture in gross domestic product (%)
• Share of service activities in gross domestic product (%)
• Export/import ratio
• Per capita fuel consumption in kilograms of coal
• Natural change of population (rates per 1000 inhabitants)
• Share of urban population (%)
• Infant mortality per 1000 live birth
• Number of students per 1000 inhabitants
• Number of TV sets per 1000 inhabitants
9.1 Geometry of Principal Component Analysis
Example
Suppose we have a data set of 12 measurements on two variables X_1 and X_2 for 12 randomly
selected units (Sharma, 1996, p. 59). Let us calculate their mean-corrected values.
Table 1

x_{i1}   x_{i2}   x_{i1,c}   x_{i2,c}
  16       8         8          5
  12      10         4          7
  13       6         5          3
  11       2         3         -1
  10       8         2          5
   9      -1         1         -4
   8       4         0          1
   7       6        -1          3
   5      -3        -3         -6
   3      -1        -5         -4
   2      -3        -6         -6
   0       0        -8         -3
The positions of the units can be presented as points in two-dimensional space. The
coordinates of the points are the values of the mean-corrected variables X_{1,c} and X_{2,c}:
[Figure: scatter plot of the 12 units, labelled 1 to 12, in the (X_{1,c}, X_{2,c}) plane; both axes run from -10 to 10.]
9.1.1 Identification of Alternative Axes and Forming New Variables
Let X*_{1,c} be any axis in the two-dimensional space that goes through the origin of the two
rectangular axes X_{1,c} and X_{2,c}.¹ Axis X*_{1,c} makes an angle of θ degrees with X_{1,c}. The
perpendicular projections of the units (observations) onto X*_{1,c} give the coordinates of the
observations with respect to X*_{1,c}. These new coordinates are linear combinations of the
coordinates of the points with respect to the original set of axes X_{1,c} and X_{2,c}:

X*_{1,c} = cos θ · X_{1,c} + sin θ · X_{2,c}

There is one and only one new axis ξ_{1,c} that results in a new variable accounting for the
maximum variance in the data. In our case this axis makes an angle of 43,261° with X_{1,c}. The
corresponding equation for computing the values of ξ_{1,c} is

ξ_{1,c} = cos 43,261° · X_{1,c} + sin 43,261° · X_{2,c} = 0,728 X_{1,c} + 0,685 X_{2,c},

while its values are

ξ_{i1,c} = 0,728 x_{i1,c} + 0,685 x_{i2,c},   i = 1, 2, …, n.

¹ The origin (x̄_{1,c}, x̄_{2,c})′ = (0, 0)′, i.e. the centroid, is always part of the optimal subspace in the sense of
least squares.
Of course, the one-dimensional space represented by the new axis ξ_{1,c} does not (in general)
account for all the variance of the investigated phenomenon that was originally presented by
the values of the two variables X_{1,c} and X_{2,c} in a two-dimensional space. Therefore, it is
possible to identify a second axis ξ_{2,c} such that the corresponding new variable accounts for the
maximum of the variance that is not accounted for by ξ_{1,c}. Let ξ_{2,c} be the second new axis,
orthogonal to ξ_{1,c}. Thus, if the angle between ξ_{1,c} and X_{1,c} is θ, then the angle between ξ_{2,c}
and X_{2,c} will also be θ.

The equation for computing the values of ξ_{2,c} is

ξ_{2,c} = -sin 43,261° · X_{1,c} + cos 43,261° · X_{2,c} = -0,685 X_{1,c} + 0,728 X_{2,c},

while its values are

ξ_{i2,c} = -0,685 x_{i1,c} + 0,728 x_{i2,c},   i = 1, 2, …, n.
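As an illustration, a minimal NumPy sketch can recover this angle numerically from the mean-corrected data of Table 1 by a brute-force search over rotation angles; the variance of the projected scores peaks near 43,26°, with weights of about 0,728 and 0,685.

```python
import numpy as np

# Mean-corrected data from Table 1 (columns x_{i1,c} and x_{i2,c})
X = np.array([[8, 5], [4, 7], [5, 3], [3, -1], [2, 5], [1, -4],
              [0, 1], [-1, 3], [-3, -6], [-5, -4], [-6, -6], [-8, -3]], dtype=float)

def score_variance(theta):
    """Variance of the scores obtained by projecting the points onto an axis at angle theta."""
    w = np.array([np.cos(theta), np.sin(theta)])   # unit-length weight vector
    return (X @ w).var(ddof=1)

# Brute-force search over angles between 0 and 180 degrees (0.01-degree steps)
thetas = np.radians(np.linspace(0.0, 180.0, 18001))
best = thetas[np.argmax([score_variance(t) for t in thetas])]

print("best angle (degrees):", round(np.degrees(best), 3))          # ~43.26
print("weights:", round(np.cos(best), 3), round(np.sin(best), 3))   # ~0.728, 0.685
print("variance of the first component:", round(score_variance(best), 3))
```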
The following conclusions can be drawn from the above figure and the statistical measures:
• The perpendicular projections of the points onto the original axes give the values of the
original variables X_{1,c} and X_{2,c}, and the perpendicular projections of the points onto the
new axes give the values of the new variables ξ_{1,c} and ξ_{2,c}. The new axes and the
corresponding variables are called principal components, and the values of the new variables
are called principal component scores. Each of the new variables is a linear combination of
the original variables and remains mean-corrected.
• The total variance of the principal components is the same as the total variance of the
original variables. The variance accounted for by the first principal component is greater than
the variance accounted for by any one of the original variables.
The geometrical illustration of principal component analysis can easily be extended to more
than two variables. An n × p data set now consists of p variables, and each unit (observation)
can be represented as a point in a p-dimensional space with respect to the p new axes, the principal
components. The projections of the points onto the principal components are called principal component
scores.

If a substantial amount of the total variance in the data set is accounted for by the first few
principal components, then we can use these principal components for further analysis or for
interpretation instead of the original variables. This results in a substantial data reduction:
an n × k data set (k < p) of principal component scores is sufficient for further analysis.
Hence, principal component analysis is commonly referred to as a data-reduction technique.
9.2 Analytical Approach
Let us form the following p linear combinations:
ξ_1 = w_{11} X_1 + w_{12} X_2 + … + w_{1p} X_p
ξ_2 = w_{21} X_1 + w_{22} X_2 + … + w_{2p} X_p
  ⋮
ξ_p = w_{p1} X_1 + w_{p2} X_2 + … + w_{pp} X_p

where ξ_1, ξ_2, …, ξ_p are the p principal components and w_{jk} (j, k = 1, 2, …, p) is the weight of the
k-th variable for the j-th principal component.
The principal component weights are estimated in such a way that:
1. The first principal component, ξ_1, accounts for the maximum variance in the data; the
second principal component, ξ_2, accounts for the maximum variance that has not been
accounted for by the first principal component; and so on.

2. For each principal component, the sum of squares of its weights equals 1:

∑_{k=1}^{p} w_{jk}² = 1,   j = 1, 2, …, p.

3. The sum of the products of the corresponding weights of any two principal components equals 0:

∑_{k=1}^{p} w_{jk} w_{j′k} = 0,   j ≠ j′.

The last condition ensures that the principal components are orthogonal to each other.

How do we obtain weights that satisfy the conditions listed above? We are dealing with an
optimization problem, usually based on the covariance or correlation matrix: we need to calculate
its eigenvectors, which define the principal component weights, and its eigenvalues, which
represent the variances of the principal components.
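A minimal NumPy sketch of this eigenvalue problem, applied to the covariance matrix of the Table 1 data; it also checks the two weight conditions listed above:

```python
import numpy as np

# Mean-corrected data from Table 1
X = np.array([[8, 5], [4, 7], [5, 3], [3, -1], [2, 5], [1, -4],
              [0, 1], [-1, 3], [-3, -6], [-5, -4], [-6, -6], [-8, -3]], dtype=float)

S = np.cov(X, rowvar=False)                      # covariance matrix of the variables
eigval, eigvec = np.linalg.eigh(S)               # eigh handles symmetric matrices
order = np.argsort(eigval)[::-1]                 # components in order of decreasing variance
eigval, W = eigval[order], eigvec[:, order]      # columns of W are the weight vectors

print("eigenvalues (component variances):", np.round(eigval, 3))
print("weights of the first component   :", np.round(W[:, 0], 3))   # ~ (0.728, 0.685), sign arbitrary

# Condition 2: each weight vector has unit length; condition 3: weight vectors are orthogonal
print("W'W =\n", np.round(W.T @ W, 10))          # identity matrix

# The variances of the principal component scores equal the eigenvalues
scores = X @ W
print("score variances:", np.round(scores.var(axis=0, ddof=1), 3))
```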
9.3 Issues Relating to the Use of Principal Component Analysis
9.3.1 Effect of Type of Data on Principal Component Analysis
Principal component analysis can either be done on raw or mean-corrected data on one hand or
on standardised data on the other. Each data set could give a different solution depending upon
the extent to which the variances of the variables differ.
In the case of raw or mean-corrected data, the basis for principal component analysis is the covariance
matrix. The influence of an individual variable on the principal components is determined by the
magnitude of its variance: the higher the variance of a variable, the stronger its effect on the
principal components.

In the case of standardised data, the basis for principal component analysis is the correlation matrix. All
the variances are equal to 1 and therefore all variables have the same influence on the principal
components.

When there is reason to believe that the variances of the variables do indicate the
importance of the given variables and the units of measure are commensurable, the raw or
mean-corrected data should be used. In all other cases standardised data are the preferable alternative.
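The effect of this choice can be illustrated with a small NumPy sketch on hypothetical data in which one variable has a much larger variance than the other (the variables and numbers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: two correlated variables measured on very different scales
income_eur = rng.normal(20000, 5000, size=200)                     # large variance
tv_per_1000 = 250 + 0.005 * income_eur + rng.normal(0, 20, size=200)
X = np.column_stack([income_eur, tv_per_1000])

def pca_weights(M):
    """Eigenvalues (descending) and weight matrix of a symmetric matrix M."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# Covariance-based PCA: the high-variance variable dominates the first component
cov_vals, cov_w = pca_weights(np.cov(X, rowvar=False))
# Correlation-based PCA: both variables contribute equally
cor_vals, cor_w = pca_weights(np.corrcoef(X, rowvar=False))

print("covariance-based first component weights :", np.round(cov_w[:, 0], 3))  # signs arbitrary
print("correlation-based first component weights:", np.round(cor_w[:, 0], 3))
```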
9.3.2 Is Principal Component Analysis the Appropriate Technique?
The use of principal component analysis is appropriate in at least two cases:
• if the principal components have a meaningful interpretation, which is particularly important for
their further use in other statistical techniques, and/or
• if the objective is to reduce the number of variables in the data set to a few principal
components without a substantial loss of information.
Principal component analysis is most appropriate if the variables are interrelated, for only then
will it be possible to reduce the number of variables to a manageable few without much loss of
information.
Many statistical tests are available for determining whether the variables are significantly correlated
among themselves. For standardised data we can use Bartlett's test, but we should keep in mind
that it is very sensitive to the sample size:

H_0: P = I,   H_1: P ≠ I

χ² = -[(n - 1) - (2p + 5)/6] · ln|R|,   with m = (p² - p)/2 degrees of freedom.
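A short sketch of this statistic in Python (assuming SciPy is available for the chi-square p-value; the small 3-variable correlation matrix is invented for illustration):

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 is that the population correlation matrix is the identity."""
    p = R.shape[0]
    chi_sq = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = (p * p - p) // 2
    return chi_sq, df, chi2.sf(chi_sq, df)   # statistic, degrees of freedom, p-value

# Hypothetical 3-variable correlation matrix for illustration
R = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.5],
              [0.4, 0.5, 1.0]])
print(bartlett_sphericity(R, n=23))

# With the 10 x 10 correlation matrix of the country data (n = 23, determinant 2,926E-05),
# the statistic is about 186,2 with 45 degrees of freedom, as in the SPSS output below.
```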
9.3.3 Number of Principal Components to Extract
We suggest the use of the following two empirical rules:

1. Kaiser's rule
In the case of standardised data, retain only those components whose eigenvalues (variances)
are greater than 1:

λ_j = σ²(ξ_{j,s}) ≥ 1,

where ξ_{j,s} is the j-th principal component of the standardised data.
The rationale for this rule is that for standardised data the amount of variance extracted by
each retained component should, at a minimum, be equal to the variance of at least one variable.

2. Scree plot (Cattell, 1966)
Plot the percentage of variance (or the eigenvalue) accounted for by each of the principal
components (on the vertical axis) against the ordinal number of the component (on the horizontal
axis) and look for an elbow.

However, no one rule is best under all circumstances. One should take into consideration the
purpose of the study, the type of data, and the trade-off between parsimony and the amount of
variation in the data that the researcher is willing to sacrifice in order to achieve parsimony.
Lastly, and most importantly, one should consider the interpretability of the principal
components when deciding how many principal components should be retained (Sharma,
1996, p. 79).
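Both rules are straightforward to apply once the eigenvalues are known; a short sketch using the eigenvalues from the Total Variance Explained table of the example below (matplotlib assumed for the scree plot):

```python
import numpy as np
import matplotlib.pyplot as plt

# Eigenvalues of the correlation matrix (Total Variance Explained table of the country example)
eigenvalues = np.array([5.879, 1.751, 0.830, 0.437, 0.399, 0.260, 0.224, 0.106, 0.061, 0.052])

# Kaiser's rule: keep components whose eigenvalue exceeds 1
k_kaiser = int(np.sum(eigenvalues > 1))
print("Kaiser's rule retains", k_kaiser, "components")      # 2

# Scree plot: eigenvalue against component number; look for the elbow
plt.plot(np.arange(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1.0, linestyle="--")     # Kaiser threshold shown for reference
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```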
9.3.4 Interpreting Principal Components
Since principal components are linear combinations of the original variables, one can use
loadings (simple correlations between the original variables and principal components) for
interpreting the principal components. The higher the loading of a variable, the more influence it
has in the formation of the principal component score and vice versa. Traditionally, a loading of
0.5 or above is used as the cutoff point.
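For PCA based on the correlation matrix, a loading can be computed as the weight multiplied by the square root of the component's eigenvalue; a short NumPy sketch, illustrated on the standardised Table 1 data:

```python
import numpy as np

def pca_loadings(Z):
    """Loadings (correlations between the variables and the components) for standardised data Z."""
    R = np.corrcoef(Z, rowvar=False)        # correlation matrix of the variables
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs * np.sqrt(vals)       # loading_kj = weight_kj * sqrt(lambda_j)

# Illustration on the two standardised variables of Table 1
X = np.array([[8, 5], [4, 7], [5, 3], [3, -1], [2, 5], [1, -4],
              [0, 1], [-1, 3], [-3, -6], [-5, -4], [-6, -6], [-8, -3]], dtype=float)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

eigenvalues, loadings = pca_loadings(Z)
print(np.round(loadings, 3))
# Variables whose absolute loading on the first component reaches the 0.5 cutoff
print(np.where(np.abs(loadings[:, 0]) >= 0.5)[0])
```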
9.3.5 Use of Principal Component Scores
The principal component scores can be plotted for further interpreting the results. Based on
visual examination of the plot, clusters can be defined.
Principal component scores can also be used as input variables for further analysing the data
using other multivariate techniques such as cluster analysis, multiple regression, and
discriminant analysis. The advantage of using principal component scores is that they are not
correlated and the problem of multicollinearity is avoided. Unfortunately, a new problem can
arise due to the inability to meaningfully interpret the principal components.
Example (the level of the socio-economic development of some European countries - continued)
GET
FILE='F:Predmeti EFMagistrski studijMultivariate Analysis (IMB)Priprava prosojnic6_predavanjePCA.sav'.
EXECUTE .
LIST .
List
country gdp agric service expimp energy growth urban infmort student tv
Austria 8725 4,4 55,7 ,753 4160,00 ,0 54 14,7 15,7 290
Belgium 9702 2,1 61,2 ,896 6037,00 ,1 72 11,1 20,3 296
Bulgaria 4150 16,9 25,4 1,136 5678,00 ,7 64 19,8 13,2 200
Czechslovakia 5820 8,4 16,9 ,983 6482,00 ,7 63 15,8 12,5 252
Denmark 10874 4,8 66,7 ,898 5225,00 ,3 84 8,8 20,5 361
Finland 10028 8,2 56,2 ,987 5135,00 ,3 62 7,6 17,4 318
France 12214 4,2 60,1 ,840 4351,00 ,4 78 10,1 19,0 299
Greece 3887 15,5 56,7 ,477 2137,00 1,1 62 18,7 12,4 151
Italy 6085 6,4 50,7 ,826 3318,00 ,5 69 15,3 19,1 232
Yugoslavia 2620 13,3 34,8 ,694 2049,00 ,9 42 34,0 20,0 195
Hungary 4180 14,3 26,8 ,954 3850,00 ,4 54 23,7 9,9 251
GDR (East Germany) 7180 9,1 22,1 ,893 7408,00 -,2 77 13,0 23,0 344
Netherlands 9760 4,0 63,0 1,040 6183,00 ,7 76 8,7 23,2 298
Norway 13522 4,5 55,1 1,150 6434,00 ,4 53 8,8 18,5 294
Poland 3900 15,3 20,6 ,856 5590,00 ,9 57 21,3 16,9 218
Portugal 2370 13,0 41,0 ,423 1097,00 1,1 31 39,0 8,6 126
Romania 1904 11,0 25,0 ,904 4593,00 1,0 50 31,6 8,6 166
Spain 5678 8,0 55,0 ,632 2530,00 1,1 74 15,0 17,7 267
Sweden 13326 3,1 65,5 ,991 5296,00 ,3 87 7,5 23,9 375
Switzerland 15069 6,1 55,0 ,881 3708,00 -,3 58 10,0 12,6 320
United Kingdom 9358 1,9 63,5 1,003 4835,00 ,0 91 12,8 13,6 336
Sowiet Union 4550 15,1 23,5 1,115 5598,00 ,9 62 25,6 19,1 307
FRG (West Germany) 11135 2,2 49,9 1,074 5727,00 -,2 85 13,5 18,0 343
Number of cases read: 23 Number of cases listed: 23
FACTOR
/VARIABLES gdp agric service expimp energy growth urban infmort student tv
/MISSING LISTWISE /ANALYSIS gdp agric service expimp energy growth urban infmort student tv
/PRINT UNIVARIATE INITIAL CORRELATION SIG DET KMO EXTRACTION FSCORE
/PLOT EIGEN
/CRITERIA FACTORS(10) ITERATE(25)
/EXTRACTION PC
/ROTATION NOROTATE
/SAVE REG(ALL)
/METHOD=CORRELATION .
- - - - - - - - - - - - F A C T O R A N A L Y S I S - - - - - - - - - - - -
Factor Analysis
F:Predmeti EFMagistrski studijMultivariate Analysis (IMB)Priprava prosojnic6_predavanjePCA.sav
Descriptive Statistics

                                                             Mean        Std. Deviation   Analysis N
Per capita gross domestic product in $                       7653,78     3941,192         23
Share of agriculture in gross domestic product (%)           8,339       4,9609           23
Share of services activities in gross domestic product (%)   45,670      17,0293          23
Export/import ratio                                          ,88722      ,190763          23
Per capita fuel consumption in kilograms of coal             4670,4783   1613,28267       23
Natural change of population (rates per 1000 inhabitants)    ,483        ,4448            23
Share of urban population (%)                                65,43       14,981           23
Infant mortality per 1000 live birth                         16,800      8,8066           23
Number of students per 1000 inhabitants                      16,683      4,5156           23
Number of TV sets per 1000 inhabitants                       271,26      69,072           23
Correlation Matrix(a)

Correlations (variables labelled by their SPSS names; full labels as in the Descriptive Statistics table):

           gdp    agric  service expimp energy growth urban  infmort student tv
gdp       1,000  -,801   ,686   ,410   ,389  -,728   ,542  -,842   ,460   ,799
agric     -,801  1,000  -,723  -,261  -,314   ,655  -,611   ,704  -,447  -,704
service    ,686  -,723  1,000  -,103  -,151  -,332   ,465  -,606   ,359   ,438
expimp     ,410  -,261  -,103  1,000   ,817  -,409   ,406  -,445   ,290   ,573
energy     ,389  -,314  -,151   ,817  1,000  -,449   ,479  -,544   ,437   ,595
growth    -,728   ,655  -,332  -,409  -,449  1,000  -,476   ,622  -,278  -,751
urban      ,542  -,611   ,465   ,406   ,479  -,476  1,000  -,735   ,554   ,744
infmort   -,842   ,704  -,606  -,445  -,544   ,622  -,735  1,000  -,549  -,784
student    ,460  -,447   ,359   ,290   ,437  -,278   ,554  -,549  1,000   ,635
tv         ,799  -,704   ,438   ,573   ,595  -,751   ,744  -,784   ,635  1,000

Sig. (1-tailed):

           gdp    agric  service expimp energy growth urban  infmort student tv
gdp          .    ,000    ,000   ,026   ,033   ,000   ,004   ,000   ,014   ,000
agric      ,000     .     ,000   ,115   ,072   ,000   ,001   ,000   ,016   ,000
service    ,000   ,000      .    ,319   ,245   ,061   ,013   ,001   ,046   ,018
expimp     ,026   ,115    ,319     .    ,000   ,026   ,027   ,017   ,090   ,002
energy     ,033   ,072    ,245   ,000     .    ,016   ,010   ,004   ,019   ,001
growth     ,000   ,000    ,061   ,026   ,016     .    ,011   ,001   ,100   ,000
urban      ,004   ,001    ,013   ,027   ,010   ,011     .    ,000   ,003   ,000
infmort    ,000   ,000    ,001   ,017   ,004   ,001   ,000     .    ,003   ,000
student    ,014   ,016    ,046   ,090   ,019   ,100   ,003   ,003     .    ,001
tv         ,000   ,000    ,018   ,002   ,001   ,000   ,000   ,000   ,001     .

a. Determinant = 2,926E-05
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy          ,769
Bartlett's Test of Sphericity   Approx. Chi-Square       186,166
                                df                       45
                                Sig.                     ,000
Communalities

Initial and Extraction communalities equal 1,000 for all ten variables (all 10 components were extracted).
Extraction Method: Principal Component Analysis.
Total Variance Explained

Component   Total       % of Variance   Cumulative %
1           5,879       58,787          58,787
2           1,751       17,514          76,302
3           ,830        8,305           84,607
4           ,437        4,367           88,973
5           ,399        3,995           92,968
6           ,260        2,603           95,570
7           ,224        2,237           97,808
8           ,106        1,062           98,870
9           6,090E-02   ,609            99,479
10          5,207E-02   ,521            100,000

Initial Eigenvalues; the Extraction Sums of Squared Loadings are identical because all 10 components were extracted.
Extraction Method: Principal Component Analysis.
Component Matrix(a)

                           Component
           1      2      3      4      5      6      7      8      9      10
gdp       ,890  -,220  -,226  -,074   ,226  -,118   ,047   ,094  -,091   ,136
agric    -,831   ,341   ,131  -,007  -,024  -,374   ,161  -,061   ,053   ,056
service   ,589  -,745   ,060   ,135   ,186   ,027   ,076  -,124   ,137   ,024
expimp    ,572   ,705  -,117   ,153   ,242   ,113   ,236  -,094  -,040  -,044
energy    ,623   ,719   ,031   ,029   ,075   ,031  -,259   ,003   ,112   ,077
growth   -,764  -,024   ,486   ,261   ,290   ,049   ,006   ,160  -,002   ,000
urban     ,795   ,005   ,296   ,367  -,368   ,018   ,051  -,019  -,050   ,065
infmort  -,909   ,077  -,035  -,159  -,098   ,297   ,166   ,007   ,039   ,122
student   ,651   ,035   ,647  -,382   ,052   ,022   ,012  -,073  -,042  -,001
tv        ,931   ,098   ,005  -,122  -,133  -,029   ,193   ,195   ,106  -,054

Extraction Method: Principal Component Analysis.
a. 10 components extracted.
Component Score Coefficient Matrix

                           Component
           1      2      3      4      5      6      7      8      9      10
gdp       ,151  -,126  -,272  -,170   ,565  -,454   ,211   ,889 -1,492  2,603
agric    -,141   ,195   ,158  -,015  -,059 -1,436   ,718  -,574   ,873  1,082
service   ,100  -,425   ,073   ,309   ,465   ,104   ,338 -1,170  2,243   ,459
expimp    ,097   ,402  -,141   ,350   ,605   ,432  1,054  -,887  -,660  -,849
energy    ,106   ,411   ,038   ,066   ,187   ,120 -1,157   ,030  1,833  1,475
growth   -,130  -,014   ,585   ,597   ,726   ,187   ,025  1,505  -,032   ,007
urban     ,135   ,003   ,356   ,840  -,921   ,069   ,228  -,179  -,826  1,256
infmort  -,155   ,044  -,042  -,365  -,246  1,142   ,740   ,065   ,633  2,343
student   ,111   ,020   ,779  -,875   ,131   ,085   ,053  -,687  -,692  -,025
tv        ,158   ,056   ,006  -,279  -,334  -,111   ,862  1,834  1,743 -1,041

Extraction Method: Principal Component Analysis.
Component Scores.

Component Score Covariance Matrix

The 10 × 10 component score covariance matrix is the identity matrix (diagonal entries 1,000, all off-diagonal entries ,000): the component scores are uncorrelated and have unit variance.
Extraction Method: Principal Component Analysis.
Component Scores.
COMPUTE pc1 = fac1_1*2.4246 .
EXECUTE .
COMPUTE pc2 = fac2_1*1.3234 .
EXECUTE .
LIST VARIABLES=country fac1_1 fac2_1 pc1 pc2 .

(The regression factor scores fac1_1 and fac2_1 are standardised to unit variance; multiplying them by 2,4246 ≈ √5,879 and 1,3234 ≈ √1,751 rescales them to principal component scores whose variances equal the corresponding eigenvalues, as the Descriptives output below confirms.)
List
F:Predmeti EFMagistrski studijMultivariate Analysis (IMB)Priprava prosojnic6_predavanjePCA.sav
country FAC1_1 FAC2_1 pc1 pc2
Austria ,20385 -,83959 ,49 -1,11
Belgium ,85869 -,31095 2,08 -,41
Bulgaria -,68262 1,67025 -1,66 2,21
Czechslovakia -,28816 1,39670 -,70 1,85
Denmark 1,05110 -,54403 2,55 -,72
Finland ,54720 -,01549 1,33 -,02
France ,70860 -,84492 1,72 -1,12
Greece -1,28526 -1,51094 -3,12 -2,00
Italy -,07291 -,65342 -,18 -,86
Yugoslavia -1,39868 -,42705 -3,39 -,57
Hungary -,84721 ,73613 -2,05 ,97
GDR (East Germany) ,69647 1,43444 1,69 1,90
Netherlands ,87914 ,04262 2,13 ,06
Norway ,78937 ,41629 1,91 ,55
Poland -,83951 1,15341 -2,04 1,53
Portugal -2,24731 -1,48964 -5,45 -1,97
Romania -1,40470 ,75353 -3,41 1,00
Spain -,33848 -1,29162 -,82 -1,71
Sweden 1,40417 -,42425 3,40 -,56
Switzerland ,62967 -,80592 1,53 -1,07
United Kingdom ,93875 -,42776 2,28 -,57
Sowiet Union -,43132 1,70467 -1,05 2,26
FRG (West Germany) 1,12915 ,27755 2,74 ,37
Number of cases read: 23 Number of cases listed: 23
DESCRIPTIVES
VARIABLES=fac1_1 fac2_1 pc1 pc2
/STATISTICS=MEAN STDDEV MIN MAX .
Descriptives
F:Predmeti EFMagistrski studijMultivariate Analysis (IMB)Priprava prosojnic6_predavanjePCA.sav
Descriptive Statistics

                                      N    Minimum    Maximum   Mean       Std. Deviation
REGR factor score 1 for analysis 1    23   -2,24731   1,40417   ,0000000   1,00000000
REGR factor score 2 for analysis 1    23   -1,51094   1,70467   ,0000000   1,00000000
PC1                                   23   -5,45      3,40      ,0000      2,42460
PC2                                   23   -2,00      2,26      ,0000      1,32340
Valid N (listwise)                    23
GRAPH
/SCATTERPLOT(BIVAR)=pc1 WITH pc2 BY country (NAME)
/MISSING=LISTWISE .
Graph
F:Predmeti EFMagistrski studijMultivariate Analysis (IMB)Priprava prosojnic6_predavanjePCA.sav
[Figure: scatter plot of the principal component scores pc1 and pc2, with countries labelled by name; plot not reproduced.]
Más contenido relacionado

La actualidad más candente

Forecasting day ahead power prices in germany using fixed size least squares ...
Forecasting day ahead power prices in germany using fixed size least squares ...Forecasting day ahead power prices in germany using fixed size least squares ...
Forecasting day ahead power prices in germany using fixed size least squares ...Niklas Ignell
 
Mining group correlations over data streams
Mining group correlations over data streamsMining group correlations over data streams
Mining group correlations over data streamsyuanchung
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component AnalysisMason Ziemer
 
Bayes estimators for the shape parameter of pareto type i
Bayes estimators for the shape parameter of pareto type iBayes estimators for the shape parameter of pareto type i
Bayes estimators for the shape parameter of pareto type iAlexander Decker
 
Paper_Sound-LineConstraints_CompositePanel
Paper_Sound-LineConstraints_CompositePanelPaper_Sound-LineConstraints_CompositePanel
Paper_Sound-LineConstraints_CompositePanelRam Mohan
 
Algorithmic Thermodynamics
Algorithmic ThermodynamicsAlgorithmic Thermodynamics
Algorithmic ThermodynamicsSunny Kr
 
KNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing ValuesKNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing Valuesijeei-iaes
 
Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...ANIRBANMAJUMDAR18
 
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear EquationsNumerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear Equationsinventionjournals
 
Arthur B. Weglein, Hong Liang, and Chao Ma M-OSRP/Physics Dept./University o...
Arthur B. Weglein, Hong Liang, and Chao Ma  M-OSRP/Physics Dept./University o...Arthur B. Weglein, Hong Liang, and Chao Ma  M-OSRP/Physics Dept./University o...
Arthur B. Weglein, Hong Liang, and Chao Ma M-OSRP/Physics Dept./University o...Arthur Weglein
 
Method of weighted residuals
Method of weighted residualsMethod of weighted residuals
Method of weighted residualsJasim Almuhandis
 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural NetworkAshwani Jha
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegressionDaniel K
 
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...TELKOMNIKA JOURNAL
 

La actualidad más candente (19)

Forecasting day ahead power prices in germany using fixed size least squares ...
Forecasting day ahead power prices in germany using fixed size least squares ...Forecasting day ahead power prices in germany using fixed size least squares ...
Forecasting day ahead power prices in germany using fixed size least squares ...
 
Mining group correlations over data streams
Mining group correlations over data streamsMining group correlations over data streams
Mining group correlations over data streams
 
1 s2.0-0378381289800731-main
1 s2.0-0378381289800731-main1 s2.0-0378381289800731-main
1 s2.0-0378381289800731-main
 
Principal Component Analysis
Principal Component AnalysisPrincipal Component Analysis
Principal Component Analysis
 
Bayes estimators for the shape parameter of pareto type i
Bayes estimators for the shape parameter of pareto type iBayes estimators for the shape parameter of pareto type i
Bayes estimators for the shape parameter of pareto type i
 
ECE611 Mini Project2
ECE611 Mini Project2ECE611 Mini Project2
ECE611 Mini Project2
 
Paper_Sound-LineConstraints_CompositePanel
Paper_Sound-LineConstraints_CompositePanelPaper_Sound-LineConstraints_CompositePanel
Paper_Sound-LineConstraints_CompositePanel
 
Algorithmic Thermodynamics
Algorithmic ThermodynamicsAlgorithmic Thermodynamics
Algorithmic Thermodynamics
 
KNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing ValuesKNN and ARL Based Imputation to Estimate Missing Values
KNN and ARL Based Imputation to Estimate Missing Values
 
Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...Linear regression [Theory and Application (In physics point of view) using py...
Linear regression [Theory and Application (In physics point of view) using py...
 
Powder
PowderPowder
Powder
 
Pca ppt
Pca pptPca ppt
Pca ppt
 
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear EquationsNumerical Study of Some Iterative Methods for Solving Nonlinear Equations
Numerical Study of Some Iterative Methods for Solving Nonlinear Equations
 
Arthur B. Weglein, Hong Liang, and Chao Ma M-OSRP/Physics Dept./University o...
Arthur B. Weglein, Hong Liang, and Chao Ma  M-OSRP/Physics Dept./University o...Arthur B. Weglein, Hong Liang, and Chao Ma  M-OSRP/Physics Dept./University o...
Arthur B. Weglein, Hong Liang, and Chao Ma M-OSRP/Physics Dept./University o...
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Generalized Probabilis...
 
Method of weighted residuals
Method of weighted residualsMethod of weighted residuals
Method of weighted residuals
 
Advanced Support Vector Machine for classification in Neural Network
Advanced Support Vector Machine for classification  in Neural NetworkAdvanced Support Vector Machine for classification  in Neural Network
Advanced Support Vector Machine for classification in Neural Network
 
SupportVectorRegression
SupportVectorRegressionSupportVectorRegression
SupportVectorRegression
 
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
Projective and hybrid projective synchronization of 4-D hyperchaotic system v...
 

Destacado

Gp3 pho-11-1-xfp-cpl-interop-public-20110502
Gp3 pho-11-1-xfp-cpl-interop-public-20110502Gp3 pho-11-1-xfp-cpl-interop-public-20110502
Gp3 pho-11-1-xfp-cpl-interop-public-20110502P Palai
 
広告がうざい
広告がうざい広告がうざい
広告がうざいGen Ito
 
Becoming Part of the Conversation - Australian Institute of Company Directors...
Becoming Part of the Conversation - Australian Institute of Company Directors...Becoming Part of the Conversation - Australian Institute of Company Directors...
Becoming Part of the Conversation - Australian Institute of Company Directors...The Executive Connection (TEC)
 
20060214185511218 bp5 01124-j-cas-en[1]
20060214185511218 bp5 01124-j-cas-en[1]20060214185511218 bp5 01124-j-cas-en[1]
20060214185511218 bp5 01124-j-cas-en[1]P Palai
 
Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...
Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...
Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...The Executive Connection (TEC)
 
Identifying Partisan Slant in News Articles and Twitter during Political Crises
Identifying Partisan Slant in News Articles and Twitter during Political CrisesIdentifying Partisan Slant in News Articles and Twitter during Political Crises
Identifying Partisan Slant in News Articles and Twitter during Political CrisesDima Karamshuk
 
Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...
Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...
Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...Dima Karamshuk
 

Destacado (14)

Gp3 pho-11-1-xfp-cpl-interop-public-20110502
Gp3 pho-11-1-xfp-cpl-interop-public-20110502Gp3 pho-11-1-xfp-cpl-interop-public-20110502
Gp3 pho-11-1-xfp-cpl-interop-public-20110502
 
TakeON! productivity brochure
TakeON! productivity brochureTakeON! productivity brochure
TakeON! productivity brochure
 
TakeON! change brochure
TakeON! change brochureTakeON! change brochure
TakeON! change brochure
 
広告がうざい
広告がうざい広告がうざい
広告がうざい
 
TakeON! Business Matters
TakeON! Business MattersTakeON! Business Matters
TakeON! Business Matters
 
Becoming Part of the Conversation - Australian Institute of Company Directors...
Becoming Part of the Conversation - Australian Institute of Company Directors...Becoming Part of the Conversation - Australian Institute of Company Directors...
Becoming Part of the Conversation - Australian Institute of Company Directors...
 
20060214185511218 bp5 01124-j-cas-en[1]
20060214185511218 bp5 01124-j-cas-en[1]20060214185511218 bp5 01124-j-cas-en[1]
20060214185511218 bp5 01124-j-cas-en[1]
 
The leadership Sphere background information
The leadership Sphere background informationThe leadership Sphere background information
The leadership Sphere background information
 
Our Leadership Declaration
Our Leadership Declaration Our Leadership Declaration
Our Leadership Declaration
 
Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...
Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...
Team Coaching in a Chaotic World - Australian Institute of Management - Keyno...
 
TakeON! Introduction
TakeON! IntroductionTakeON! Introduction
TakeON! Introduction
 
Otl
OtlOtl
Otl
 
Identifying Partisan Slant in News Articles and Twitter during Political Crises
Identifying Partisan Slant in News Articles and Twitter during Political CrisesIdentifying Partisan Slant in News Articles and Twitter during Political Crises
Identifying Partisan Slant in News Articles and Twitter during Political Crises
 
Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...
Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...
Take-away TV: Recharging Work Commutes with Greedy and Predictive Preloading ...
 

Similar a PCA Explained: Data Reduction and Interpretation Using Principal Component Analysis

Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureIJRES Journal
 
Count-Distinct Problem
Count-Distinct ProblemCount-Distinct Problem
Count-Distinct ProblemKai Zhang
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetAlaaZ
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca ldanozomuhamada
 
A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4Mintu246
 
A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4Mintu246
 
A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...Alexander Decker
 
Overview and Implementation of Principal Component Analysis
Overview and Implementation of Principal Component Analysis Overview and Implementation of Principal Component Analysis
Overview and Implementation of Principal Component Analysis Taweh Beysolow II
 
Forecasting With An Adaptive Control Algorithm
Forecasting With An Adaptive Control AlgorithmForecasting With An Adaptive Control Algorithm
Forecasting With An Adaptive Control Algorithmshwetakarsh
 
Numerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportNumerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportShikhar Agarwal
 
Kinetic bands versus Bollinger Bands
Kinetic bands versus Bollinger  BandsKinetic bands versus Bollinger  Bands
Kinetic bands versus Bollinger BandsAlexandru Daia
 
Nonparametric approach to multiple regression
Nonparametric approach to multiple regressionNonparametric approach to multiple regression
Nonparametric approach to multiple regressionAlexander Decker
 

Similar a PCA Explained: Data Reduction and Interpretation Using Principal Component Analysis (20)

Class9_PCA_final.ppt
Class9_PCA_final.pptClass9_PCA_final.ppt
Class9_PCA_final.ppt
 
07 analisis komponen utama
07 analisis komponen utama07 analisis komponen utama
07 analisis komponen utama
 
PCA.ppt
PCA.pptPCA.ppt
PCA.ppt
 
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity MeasureRobust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
Robust Fuzzy Data Clustering In An Ordinal Scale Based On A Similarity Measure
 
Count-Distinct Problem
Count-Distinct ProblemCount-Distinct Problem
Count-Distinct Problem
 
Enhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial DatasetEnhance The K Means Algorithm On Spatial Dataset
Enhance The K Means Algorithm On Spatial Dataset
 
2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda2012 mdsp pr09 pca lda
2012 mdsp pr09 pca lda
 
A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4
 
A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4A comparative analysis of predictve data mining techniques4
A comparative analysis of predictve data mining techniques4
 
A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...A note on estimation of population mean in sample survey using auxiliary info...
A note on estimation of population mean in sample survey using auxiliary info...
 
Overview and Implementation of Principal Component Analysis
Overview and Implementation of Principal Component Analysis Overview and Implementation of Principal Component Analysis
Overview and Implementation of Principal Component Analysis
 
4 MEDA.pdf
4 MEDA.pdf4 MEDA.pdf
4 MEDA.pdf
 
Forecasting With An Adaptive Control Algorithm
Forecasting With An Adaptive Control AlgorithmForecasting With An Adaptive Control Algorithm
Forecasting With An Adaptive Control Algorithm
 
Estimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample SetsEstimating Space-Time Covariance from Finite Sample Sets
Estimating Space-Time Covariance from Finite Sample Sets
 
Lecture 8.pptx
Lecture 8.pptxLecture 8.pptx
Lecture 8.pptx
 
Numerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project ReportNumerical Solutions of Burgers' Equation Project Report
Numerical Solutions of Burgers' Equation Project Report
 
Kinetic bands versus Bollinger Bands
Kinetic bands versus Bollinger  BandsKinetic bands versus Bollinger  Bands
Kinetic bands versus Bollinger Bands
 
Core Training Presentations- 3 Estimating an Ag Database using CE Methods
Core Training Presentations- 3 Estimating an Ag Database using CE MethodsCore Training Presentations- 3 Estimating an Ag Database using CE Methods
Core Training Presentations- 3 Estimating an Ag Database using CE Methods
 
Nonparametric approach to multiple regression
Nonparametric approach to multiple regressionNonparametric approach to multiple regression
Nonparametric approach to multiple regression
 
7.pdf
7.pdf7.pdf
7.pdf
 

Último

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Último (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 

PCA Explained: Data Reduction and Interpretation Using Principal Component Analysis

  • 1. J. Rovan: Multivariate Analysis 9 Principal Component Analysis 1 9 Principal Component Analysis A principal component analysis (PCA) is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables, called principal components. Its general objectives are: • data reduction and • interpretation. In general, p principal components are required to reproduce the total system of variability of the original data set (n measurements on p variables). Fortunatelly, much of this variability can often be accounted for by a small number of k of principal components. If so, there is (almost) as much information in the first k components as there is in the original p variables. The k first principal components can then replace the initial p variables, and the original n p× data set is reduced to n k× data set consisting of n measurements on k principal components. An analysis of principal components often reveals relationships that were not previously suspected and thereby allows interpretations that would not ordinarily result. Principal components also frequently serve as intermediate steps in much larger investigations, e.g. as inputs to a multiple regression, cluster analysis, etc. J. Rovan: Multivariate Analysis 9 Principal Component Analysis 2 Example Suppose one would like to investigate the level of the socio-economic development of some European countries in the year 1981. An investigation will take into account the following set of economic, demographic, health, social security and level of living indicators: • Per capita gross domestic product in $ • Share of agriculture in gross domestic product (%) • Share of service activities in gross domestic product (%) • Export/import ratio • Per capita fuel consumption in kilograms of coal • Natural change of population (rates per 1000 inhabitants) • Share of urban population (%) • Infant mortality per 1000 live birth • Number of students per 1000 inhabitants • Number of TV sets per 1000 inhabitants
  • 2. J. Rovan: Multivariate Analysis 9 Principal Component Analysis 3 9.1 Geometry of Principal Component Analysis Example Suppose we have a data set of 12 measurements on 2 variables 1X and 2X for 12 randomly selected units (Sharma, 1966, p. 59). Let us calculate their mean-corrected values. Table 1 1ix 2ix 1,cix 2,cix 16 8 8 5 12 10 4 7 13 6 5 3 11 2 3 -1 10 8 2 5 9 -1 1 -4 8 4 0 1 7 6 -1 3 5 -3 -3 -6 3 -1 -5 -4 2 -3 -6 -6 0 0 -8 -3 J. Rovan: Multivariate Analysis 9 Principal Component Analysis 4 The position of units can be presented with points in the two-dimensional space. The coordinates of the points are the values of mean-corrected variables 1,cX and 2,cX : X1C 1086420-2-4-6-8-10 X2C 10 8 6 4 2 0 -2 -4 -6 -8 -10 12 11 10 9 8 7 6 5 4 3 2 1
  • 3. J. Rovan: Multivariate Analysis 9 Principal Component Analysis 5 9.1.1 Identification of Alternative Axes and Forming New Variables Let * 1,cX be any axis in the two dimensional space that goes through the origin of the two rectangular axes 1,cX and 2,cX 1. Axis * 1,cX is making an angle of θ degrees with 1,cX . The perpendicular projections of the units (observations) onto * 1,cX will give the coordinates of the observations with respect to * 1,cX . These new coordinates are linear combinations of the coordinates of the points with respect to the original set of axes 1,cX and 2,cX : * 1,c 1,c 2,ccos sinX X Xθ θ= ⋅ + ⋅ There is one and only one new axis 1, cξ that results in a new variable accounting for the maximum variance in the data. In our case this axis makes an angle of o 43,261 with 1,cX . The corresponding equation for computing the values of 1,cξ is o o 1,c 1,c 2,c 1,c 2,ccos43,261 sin 43,261 0,728 0,685X X X Xξ = ⋅ + ⋅ = + , while its values are 1,c 1,c 2,c0,728 0,685i i ix xξ = + , 1,2, ,i n= … . 1 The origin 1, 2,( , ) (0,0)c cx x ′ ′= , i.e. the centroid, is always part of the optimal subspace in the sence of least squares. J. Rovan: Multivariate Analysis 9 Principal Component Analysis 6 Of course, a one-dimensional space represented by the new axis 1, cξ (in general) does not account for all the variance of the investigated phenomena, that has been originally presented by the values of the two variables 1,cX and 2,cX in a two-dimensional space. Therefore, it is possible to identify a second axis 2, cξ such that the corresponding new variable accounts for the maximum of the variance that is not accounted for by 1, cξ . Let 2, cξ be the second new axis that is orthogonal to 1, cξ . Thus, if the angle between 1, cξ and 1,cX is θ then the angle between 2, cξ and 2,cX will also be θ. The equation for computing the values of 2,cξ is o o 2,c 1,c 2,c 1,c 2,csin 43,261 cos43,261 0,685 0,728X X X Xξ = − ⋅ + ⋅ = − + , while its values are 2,c 1,c 2,c0,685 0,728i i ix xξ = − + , 1,2, ,i n= … .
  • 4. J. Rovan: Multivariate Analysis 9 Principal Component Analysis 7 The following conclusions can be made from the above figure and the statistical measures: • the perpendicular projections of the points onto the original axes give the values of the original variables 1,cX and 2,cX , and the perpendicular projections of the points onto the new axes give the values for the new variables 1, cξ and 2, cξ . The new axes and the corresponding variables are called principal components and the values of the new variables are called principal component scores. Each of the new variables are linear combinations of the original variables and remain mean-corrected. • The total variance of the principal components is the same as the total variance of the original variables.The variance accounted for by the first principal component is greater than the variance accounted for by any one of the original variables. J. Rovan: Multivariate Analysis 9 Principal Component Analysis 8 The geometrical illustration of principal component analysis can be easily extended to more than two variables. An n p× data set now consists of p variables and each unit (observation) can be represented as a point in a p-dimensional space with respect to the p new axes – principal components. The projections of points on principal components are called principal component scores. If a substantial amount of the total variance in the data set is accounted for by a few first principal components, than we can use these principal components for further analysis or for interpretations instead of the original variables. This would result in a substantial data reduction – an n k× data set ( k p ) of principal component scores is sufficient for further analysis. Hence, principal component analysis is commonly referred to as a data-reduction technique. 9.2 Analytical Approach Let us form the following p linear combinations: 1 11 1 12 2 1 2 21 1 22 2 2 1 1 2 2 p p p p p p p pp p w X w X w X w X w X w X w X w X w X ξ ξ ξ = + + + = + + + = + + + … … … where 1 2, , , pξ ξ ξ… are the p principal components and jkw ( , 1,2, , )j k p= … is the weight of the k-th variable for the j-th principal component.
The principal component weights are estimated in such a way that:

1. The first principal component, $\xi_1$, accounts for the maximum variance in the data, the second principal component, $\xi_2$, accounts for the maximum variance that has not been accounted for by the first principal component, and so on.
2. For each principal component, the sum of squares of its weights equals 1:
$$\sum_{k=1}^{p} w_{jk}^2 = 1, \qquad j = 1, 2, \ldots, p.$$
3. The sum of the products of the corresponding weights of any two principal components equals 0:
$$\sum_{k=1}^{p} w_{jk} w_{j'k} = 0, \qquad j \neq j'.$$

The last condition ensures that the principal components are orthogonal to each other.

How do we obtain weights that satisfy the conditions listed above? We are dealing with an optimization problem, usually based on the covariance or the correlation matrix. We need to calculate the eigenvectors, which define the principal component weights, and the eigenvalues, which represent the variances of the principal components.

9.3 Issues Relating to the Use of Principal Component Analysis

9.3.1 Effect of Type of Data on Principal Component Analysis

Principal component analysis can be done either on raw or mean-corrected data on one hand, or on standardised data on the other. Each data set can give a different solution, depending upon the extent to which the variances of the variables differ.

In the case of raw or mean-corrected data, the basis for principal component analysis is the covariance matrix. The influence of an individual variable on the principal components is determined by the magnitude of its variance: the higher the variance of a variable, the stronger its effect on the principal components.

In the case of standardized data, the basis for principal component analysis is the correlation matrix. All the variances are equal to 1, so all the variables have the same influence on the principal components.

In cases for which there is reason to believe that the variances of the variables do indicate the importance of a given variable and the units of measure are commensurable, the raw or the mean-corrected data should be used. In all other cases, standardised data are the preferable alternative (see the sketch below).
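The following sketch (generic simulated data, hypothetical function and variable names, not the textbook example) obtains the weights by eigendecomposition, verifies conditions 2 and 3, and illustrates how the covariance-based and correlation-based solutions differ when one variable has a much larger variance than the others:

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical raw data whose three variables have very different variances.
X = rng.normal(size=(40, 3)) * np.array([100.0, 5.0, 0.5])

def pca_weights(X, standardise):
    # Weights (one component per row) from the covariance or the correlation matrix,
    # ordered by decreasing eigenvalue.
    M = np.corrcoef(X, rowvar=False) if standardise else np.cov(X, rowvar=False)
    eigval, eigvec = np.linalg.eigh(M)
    order = np.argsort(eigval)[::-1]
    return eigval[order], eigvec[:, order].T

lam_cov, W_cov = pca_weights(X, standardise=False)
lam_cor, W_cor = pca_weights(X, standardise=True)

# Conditions 2 and 3: unit-length, mutually orthogonal weight vectors (W W' = I).
print(np.allclose(W_cov @ W_cov.T, np.eye(3)))

# Covariance-based weights are dominated by the high-variance variable;
# correlation-based weights treat all variables on an equal footing.
print(np.round(W_cov[0], 3))
print(np.round(W_cor[0], 3))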
9.3.2 Is Principal Component Analysis the Appropriate Technique?

The use of principal component analysis is appropriate in at least two cases:
• if the principal components have a meaningful interpretation, which is particularly important for their further use in other statistical techniques, and/or
• if the objective is to reduce the number of variables in the data set to a few principal components without a substantial loss of information.

Principal component analysis is most appropriate if the variables are interrelated, for only then will it be possible to reduce the number of variables to a manageable few without much loss of information. Many statistical tests are available for determining whether the variables are significantly correlated among themselves. For standardised data we can use Bartlett's test of sphericity, keeping in mind that it is very sensitive to sample size:

$$H_0: \mathbf{P} = \mathbf{I}, \qquad H_1: \mathbf{P} \neq \mathbf{I}$$

$$\chi^2 = -\left[(n - 1) - \tfrac{1}{6}(2p + 5)\right] \ln |\mathbf{R}|, \qquad m = (p^2 - p)/2 \text{ degrees of freedom.}$$

9.3.3 Number of Principal Components to Extract

We suggest the use of the following two empirical rules:

1. Kaiser's rule. In the case of standardised data, retain only those components whose eigenvalues (variances) are greater than 1:
$$\lambda_j = \sigma^2_{\xi_{j,s}} \geq 1.$$
The rationale for this rule is that, for standardised data, the amount of variance extracted by each retained component should, at a minimum, equal the variance of at least one variable.

2. Scree plot (Cattell, 1966). Plot the percentage of variance (or the eigenvalue) accounted for by each principal component (on the vertical axis) against the ordinal number of the component (on the horizontal axis) and look for an elbow.

However, no single rule is best under all circumstances. One should take into consideration the purpose of the study, the type of data, and the trade-off between parsimony and the amount of variation in the data that the researcher is willing to sacrifice in order to achieve parsimony. Lastly, and most importantly, one should consider the interpretability of the principal components when deciding how many of them to retain (Sharma, 1996, p. 79).
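A short Python sketch of both diagnostics, assuming scipy is available for the chi-square tail probability (the function and variable names are mine, not SPSS's). Plugging in the values reported in the SPSS output further below (n = 23, p = 10, |R| = 2,926E-05) reproduces a chi-square of roughly 186.2 on 45 degrees of freedom.

import math
import numpy as np
from scipy import stats

def bartlett_sphericity(R, n):
    # Bartlett's test of H0: P = I for a p x p correlation matrix R and sample size n.
    p = R.shape[0]
    chi2 = -((n - 1) - (2 * p + 5) / 6.0) * math.log(np.linalg.det(R))
    df = (p * p - p) // 2
    return chi2, df, stats.chi2.sf(chi2, df)

def kaiser_and_scree(eigenvalues):
    # Kaiser's rule plus the shares of variance one would draw in a scree plot.
    lam = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    n_keep = int((lam > 1.0).sum())        # eigenvalues of R greater than 1
    shares = 100.0 * lam / lam.sum()       # % of variance per component
    return n_keep, shares

# Direct check against the determinant reported in the example below:
n, p, detR = 23, 10, 2.926e-05
print(-((n - 1) - (2 * p + 5) / 6.0) * math.log(detR), (p * p - p) // 2)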
9.3.4 Interpreting Principal Components

Since principal components are linear combinations of the original variables, one can use loadings (the simple correlations between the original variables and the principal components) for interpreting the principal components. The higher the loading of a variable, the more influence it has in the formation of the principal component score, and vice versa. Traditionally, a loading of 0.5 or above is used as the cutoff point.

9.3.5 Use of Principal Component Scores

The principal component scores can be plotted to aid interpretation of the results; based on a visual examination of the plot, clusters of units can be identified. Principal component scores can also be used as input variables for further analysis of the data with other multivariate techniques such as cluster analysis, multiple regression, and discriminant analysis. The advantage of using principal component scores is that they are uncorrelated, so the problem of multicollinearity is avoided. Unfortunately, a new problem can arise if the principal components cannot be meaningfully interpreted.

Example (the level of the socio-economic development of some European countries – continued)

GET FILE='F:Predmeti EFMagistrski studijMultivariate Analysis (IMB)Priprava prosojnic6_predavanjePCA.sav'.
EXECUTE .
LIST .

List

country                 gdp   agric  service  expimp   energy  growth  urban  infmort  student   tv
Austria                8725     4,4     55,7    ,753  4160,00      ,0     54     14,7     15,7   290
Belgium                9702     2,1     61,2    ,896  6037,00      ,1     72     11,1     20,3   296
Bulgaria               4150    16,9     25,4   1,136  5678,00      ,7     64     19,8     13,2   200
Czechslovakia          5820     8,4     16,9    ,983  6482,00      ,7     63     15,8     12,5   252
Denmark               10874     4,8     66,7    ,898  5225,00      ,3     84      8,8     20,5   361
Finland               10028     8,2     56,2    ,987  5135,00      ,3     62      7,6     17,4   318
France                12214     4,2     60,1    ,840  4351,00      ,4     78     10,1     19,0   299
Greece                 3887    15,5     56,7    ,477  2137,00     1,1     62     18,7     12,4   151
Italy                  6085     6,4     50,7    ,826  3318,00      ,5     69     15,3     19,1   232
Yugoslavia             2620    13,3     34,8    ,694  2049,00      ,9     42     34,0     20,0   195
Hungary                4180    14,3     26,8    ,954  3850,00      ,4     54     23,7      9,9   251
GDR (East Germany)     7180     9,1     22,1    ,893  7408,00     -,2     77     13,0     23,0   344
Netherlands            9760     4,0     63,0   1,040  6183,00      ,7     76      8,7     23,2   298
Norway                13522     4,5     55,1   1,150  6434,00      ,4     53      8,8     18,5   294
Poland                 3900    15,3     20,6    ,856  5590,00      ,9     57     21,3     16,9   218
Portugal               2370    13,0     41,0    ,423  1097,00     1,1     31     39,0      8,6   126
Romania                1904    11,0     25,0    ,904  4593,00     1,0     50     31,6      8,6   166
Spain                  5678     8,0     55,0    ,632  2530,00     1,1     74     15,0     17,7   267
Sweden                13326     3,1     65,5    ,991  5296,00      ,3     87      7,5     23,9   375
Switzerland           15069     6,1     55,0    ,881  3708,00     -,3     58     10,0     12,6   320
United Kingdom         9358     1,9     63,5   1,003  4835,00      ,0     91     12,8     13,6   336
Sowiet Union           4550    15,1     23,5   1,115  5598,00      ,9     62     25,6     19,1   307
FRG (West Germany)    11135     2,2     49,9   1,074  5727,00     -,2     85     13,5     18,0   343

Number of cases read: 23   Number of cases listed: 23
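Before turning to the SPSS output, a small numpy sketch (generic simulated data, hypothetical names) shows how the loadings used in Section 9.3.4 arise: for a correlation-based analysis, the loading of variable k on component j is the eigenvector element multiplied by the square root of the eigenvalue, and it equals the simple correlation between the variable and the component score.

import numpy as np

rng = np.random.default_rng(2)
Z = rng.normal(size=(30, 4))
Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)     # standardised data

R = np.corrcoef(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
lam, V = eigval[order], eigvec[:, order]

loadings = V * np.sqrt(lam)        # p x p loading matrix
xi = Z @ V                         # principal component scores

# Check: loadings are exactly the correlations between variables and components.
corr = np.array([[np.corrcoef(Z[:, k], xi[:, j])[0, 1] for j in range(4)]
                 for k in range(4)])
print(np.allclose(loadings, corr))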
FACTOR
 /VARIABLES gdp agric service expimp energy growth urban infmort student tv
 /MISSING LISTWISE
 /ANALYSIS gdp agric service expimp energy growth urban infmort student tv
 /PRINT UNIVARIATE INITIAL CORRELATION SIG DET KMO EXTRACTION FSCORE
 /PLOT EIGEN
 /CRITERIA FACTORS(10) ITERATE(25)
 /EXTRACTION PC
 /ROTATION NOROTATE
 /SAVE REG(ALL)
 /METHOD=CORRELATION .

Factor Analysis

Descriptive Statistics

Variable                                                           Mean      Std. Deviation   Analysis N
Per capita gross domestic product in $                          7653,78          3941,192          23
Share of agriculture in gross domestic product (%)                 8,339            4,9609          23
Share of service activities in gross domestic product (%)         45,670           17,0293          23
Export/import ratio                                                 ,88722           ,190763         23
Per capita fuel consumption in kilograms of coal                4670,4783        1613,28267          23
Natural change of population (rates per 1000 inhabitants)           ,483             ,4448           23
Share of urban population (%)                                      65,43            14,981           23
Infant mortality per 1000 live birth                               16,800            8,8066          23
Number of students per 1000 inhabitants                            16,683            4,5156          23
Number of TV sets per 1000 inhabitants                            271,26            69,072           23

Correlation Matrix (variables listed in the order of the FACTOR command: gdp, agric, service, expimp, energy, growth, urban, infmort, student, tv)

Correlations:
            gdp    agric  service  expimp  energy  growth   urban  infmort  student    tv
gdp       1,000   -,801    ,686    ,410    ,389   -,728    ,542   -,842    ,460    ,799
agric     -,801   1,000   -,723   -,261   -,314    ,655   -,611    ,704   -,447   -,704
service    ,686   -,723   1,000   -,103   -,151   -,332    ,465   -,606    ,359    ,438
expimp     ,410   -,261   -,103   1,000    ,817   -,409    ,406   -,445    ,290    ,573
energy     ,389   -,314   -,151    ,817   1,000   -,449    ,479   -,544    ,437    ,595
growth    -,728    ,655   -,332   -,409   -,449   1,000   -,476    ,622   -,278   -,751
urban      ,542   -,611    ,465    ,406    ,479   -,476   1,000   -,735    ,554    ,744
infmort   -,842    ,704   -,606   -,445   -,544    ,622   -,735   1,000   -,549   -,784
student    ,460   -,447    ,359    ,290    ,437   -,278    ,554   -,549   1,000    ,635
tv         ,799   -,704    ,438    ,573    ,595   -,751    ,744   -,784    ,635   1,000

Sig. (1-tailed):
            gdp    agric  service  expimp  energy  growth   urban  infmort  student    tv
gdp          .     ,000    ,000    ,026    ,033    ,000    ,004    ,000    ,014    ,000
agric      ,000      .     ,000    ,115    ,072    ,000    ,001    ,000    ,016    ,000
service    ,000    ,000      .     ,319    ,245    ,061    ,013    ,001    ,046    ,018
expimp     ,026    ,115    ,319      .     ,000    ,026    ,027    ,017    ,090    ,002
energy     ,033    ,072    ,245    ,000      .     ,016    ,010    ,004    ,019    ,001
growth     ,000    ,000    ,061    ,026    ,016      .     ,011    ,001    ,100    ,000
urban      ,004    ,001    ,013    ,027    ,010    ,011      .     ,000    ,003    ,000
infmort    ,000    ,000    ,001    ,017    ,004    ,001    ,000      .     ,003    ,000
student    ,014    ,016    ,046    ,090    ,019    ,100    ,003    ,003      .     ,001
tv         ,000    ,000    ,018    ,002    ,001    ,000    ,000    ,000    ,001      .

Determinant = 2,926E-05

KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy            ,769
Bartlett's Test of Sphericity     Approx. Chi-Square    186,166
                                  df                         45
                                  Sig.                     ,000
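The determinant, the Bartlett chi-square and the KMO measure can all be recomputed from the correlation matrix R. Below is a sketch of the usual overall KMO formula (squared correlations against squared partial correlations, the latter obtained from the inverse of R); the function name is mine, not SPSS's. For the matrix above, SPSS reports KMO = 0,769.

import numpy as np

def kmo_overall(R):
    # Overall Kaiser-Meyer-Olkin measure of sampling adequacy for a p x p
    # correlation matrix R: sum of squared correlations divided by the sum of
    # squared correlations plus squared partial correlations.
    Q = np.linalg.inv(R)
    d = np.sqrt(np.outer(np.diag(Q), np.diag(Q)))
    partial = -Q / d                           # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)      # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    p2 = (partial[off] ** 2).sum()
    return r2 / (r2 + p2)

# Usage, given the correlation matrix R as a numpy array:
#   print(np.linalg.det(R))   # should come out close to 2.926e-05
#   print(kmo_overall(R))     # compare with the reported 0.769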
Communalities

All ten variables have initial and extraction communalities equal to 1,000, since all 10 components were extracted.
Extraction Method: Principal Component Analysis.

Total Variance Explained

                  Initial Eigenvalues                       Extraction Sums of Squared Loadings
Component     Total     % of Variance   Cumulative %     Total     % of Variance   Cumulative %
    1         5,879        58,787          58,787        5,879        58,787          58,787
    2         1,751        17,514          76,302        1,751        17,514          76,302
    3          ,830         8,305          84,607         ,830         8,305          84,607
    4          ,437         4,367          88,973         ,437         4,367          88,973
    5          ,399         3,995          92,968         ,399         3,995          92,968
    6          ,260         2,603          95,570         ,260         2,603          95,570
    7          ,224         2,237          97,808         ,224         2,237          97,808
    8          ,106         1,062          98,870         ,106         1,062          98,870
    9        6,090E-02       ,609          99,479       6,090E-02       ,609          99,479
   10        5,207E-02       ,521         100,000       5,207E-02       ,521         100,000
Extraction Method: Principal Component Analysis.

Component Matrix (loadings; variables in rows, components 1 to 10 in columns; variable names as in the FACTOR command)

             1      2      3      4      5      6      7      8      9     10
gdp        ,890  -,220  -,226  -,074   ,226  -,118   ,047   ,094  -,091   ,136
agric     -,831   ,341   ,131  -,007  -,024  -,374   ,161  -,061   ,053   ,056
service    ,589  -,745   ,060   ,135   ,186   ,027   ,076  -,124   ,137   ,024
expimp     ,572   ,705  -,117   ,153   ,242   ,113   ,236  -,094  -,040  -,044
energy     ,623   ,719   ,031   ,029   ,075   ,031  -,259   ,003   ,112   ,077
growth    -,764  -,024   ,486   ,261   ,290   ,049   ,006   ,160  -,002   ,000
urban      ,795   ,005   ,296   ,367  -,368   ,018   ,051  -,019  -,050   ,065
infmort   -,909   ,077  -,035  -,159  -,098   ,297   ,166   ,007   ,039   ,122
student    ,651   ,035   ,647  -,382   ,052   ,022   ,012  -,073  -,042  -,001
tv         ,931   ,098   ,005  -,122  -,133  -,029   ,193   ,195   ,106  -,054
Extraction Method: Principal Component Analysis. 10 components extracted.
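Since all 10 components are extracted from the correlation matrix, the full loading matrix reproduces R exactly and every communality equals 1, while each "% of Variance" entry is simply the eigenvalue divided by p = 10. A generic numpy sketch of these identities (simulated data, not the country data set):

import numpy as np

rng = np.random.default_rng(3)
Z = rng.normal(size=(25, 5))
R = np.corrcoef(Z, rowvar=False)

eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
lam = eigval[order]
L = eigvec[:, order] * np.sqrt(lam)              # full p x p loading matrix

print(np.allclose(L @ L.T, R))                   # R is reproduced exactly: R = L L'
print(np.allclose((L ** 2).sum(axis=1), 1.0))    # communalities all equal 1
print(100.0 * lam / lam.sum())                   # the "% of Variance" column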
Component Score Coefficient Matrix (variables in rows, components 1 to 10 in columns)

             1      2      3      4      5      6      7      8      9     10
gdp        ,151  -,126  -,272  -,170   ,565  -,454   ,211   ,889 -1,492  2,603
agric     -,141   ,195   ,158  -,015  -,059 -1,436   ,718  -,574   ,873  1,082
service    ,100  -,425   ,073   ,309   ,465   ,104   ,338 -1,170  2,243   ,459
expimp     ,097   ,402  -,141   ,350   ,605   ,432  1,054  -,887  -,660  -,849
energy     ,106   ,411   ,038   ,066   ,187   ,120 -1,157   ,030  1,833  1,475
growth    -,130  -,014   ,585   ,597   ,726   ,187   ,025  1,505  -,032   ,007
urban      ,135   ,003   ,356   ,840  -,921   ,069   ,228  -,179  -,826  1,256
infmort   -,155   ,044  -,042  -,365  -,246  1,142   ,740   ,065   ,633  2,343
student    ,111   ,020   ,779  -,875   ,131   ,085   ,053  -,687  -,692  -,025
tv         ,158   ,056   ,006  -,279  -,334  -,111   ,862  1,834  1,743 -1,041
Extraction Method: Principal Component Analysis. Component Scores.

Component Score Covariance Matrix

The 10 × 10 covariance matrix of the component scores is the identity matrix: each component score has variance 1,000 and all covariances are ,000.
Extraction Method: Principal Component Analysis. Component Scores.

COMPUTE pc1 = fac1_1*2.4246 .
EXECUTE .
COMPUTE pc2 = fac2_1*1.3234 .
EXECUTE .
LIST VARIABLES=country fac1_1 fac2_1 pc1 pc2 .

List

country                FAC1_1     FAC2_1     pc1     pc2
Austria                ,20385    -,83959     ,49   -1,11
Belgium                ,85869    -,31095    2,08    -,41
Bulgaria              -,68262    1,67025   -1,66    2,21
Czechslovakia         -,28816    1,39670    -,70    1,85
Denmark               1,05110    -,54403    2,55    -,72
Finland                ,54720    -,01549    1,33    -,02
France                 ,70860    -,84492    1,72   -1,12
Greece               -1,28526   -1,51094   -3,12   -2,00
Italy                 -,07291    -,65342    -,18    -,86
Yugoslavia           -1,39868    -,42705   -3,39    -,57
Hungary               -,84721     ,73613   -2,05     ,97
GDR (East Germany)     ,69647    1,43444    1,69    1,90
Netherlands            ,87914     ,04262    2,13     ,06
Norway                 ,78937     ,41629    1,91     ,55
Poland                -,83951    1,15341   -2,04    1,53
Portugal             -2,24731   -1,48964   -5,45   -1,97
Romania              -1,40470     ,75353   -3,41    1,00
Spain                 -,33848   -1,29162    -,82   -1,71
Sweden                1,40417    -,42425    3,40    -,56
Switzerland            ,62967    -,80592    1,53   -1,07
United Kingdom         ,93875    -,42776    2,28    -,57
Sowiet Union          -,43132    1,70467   -1,05    2,26
FRG (West Germany)    1,12915     ,27755    2,74     ,37

Number of cases read: 23   Number of cases listed: 23
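The multipliers in the two COMPUTE statements appear to be the square roots of the first two eigenvalues (5,8787 and 1,7514 before rounding in the Total Variance Explained table): the saved regression factor scores FAC1_1 and FAC2_1 have unit variance, and multiplying by the square root of the eigenvalue rescales them to principal component scores whose variance equals the variance of the corresponding component. A two-line check:

import math

# Square roots of the first two eigenvalues (58,787 % and 17,514 % of p = 10 variables).
print(round(math.sqrt(5.8787), 4))   # 2.4246 -> multiplier used for pc1
print(round(math.sqrt(1.7514), 4))   # 1.3234 -> multiplier used for pc2

The DESCRIPTIVES output below is consistent with this: pc1 and pc2 have standard deviations of 2,42460 and 1,32340, i.e. exactly these square roots.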
DESCRIPTIVES VARIABLES=fac1_1 fac2_1 pc1 pc2
 /STATISTICS=MEAN STDDEV MIN MAX .

Descriptives

Descriptive Statistics

                                      N    Minimum    Maximum       Mean    Std. Deviation
REGR factor score 1 for analysis 1   23   -2,24731    1,40417   ,0000000      1,00000000
REGR factor score 2 for analysis 1   23   -1,51094    1,70467   ,0000000      1,00000000
PC1                                  23      -5,45       3,40      ,0000         2,42460
PC2                                  23      -2,00       2,26      ,0000         1,32340
Valid N (listwise)                   23

GRAPH
 /SCATTERPLOT(BIVAR)=pc1 WITH pc2 BY country (NAME) /MISSING=LISTWISE .

Graph

[Scatterplot of the principal component scores pc1 and pc2, each of the 23 countries labelled by name; the figure itself is not reproduced in this transcript.]
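For readers working outside SPSS, an equivalent labelled scatterplot can be produced with matplotlib; the small data frame below is only an illustration and re-uses a few of the score values listed above (extend it with the full set of 23 countries as needed).

import matplotlib.pyplot as plt
import pandas as pd

# A few of the (country, pc1, pc2) values from the listing above.
scores = pd.DataFrame({
    "country": ["Austria", "Portugal", "Sweden", "Bulgaria"],
    "pc1": [0.49, -5.45, 3.40, -1.66],
    "pc2": [-1.11, -1.97, -0.56, 2.21],
})

fig, ax = plt.subplots()
ax.scatter(scores["pc1"], scores["pc2"])
for _, row in scores.iterrows():
    ax.annotate(row["country"], (row["pc1"], row["pc2"]))
ax.axhline(0.0, linewidth=0.5)
ax.axvline(0.0, linewidth=0.5)
ax.set_xlabel("pc1")
ax.set_ylabel("pc2")
plt.show()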