4. Scatterplot
- The “Lego” (basic building block) of analytic data visualization
- Reflecting the third variable
quakes: longitude (= x), latitude (= y), depth (= z)
2013.11.29
Health Info & Stat
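One simple way to reflect a third variable is to map it to marker size (color is the usual alternative). A minimal sketch, assuming linear scaling into a fixed size range; the helper `scale_to_range`, its default range, and the depth values are hypothetical and not taken from the slides.

```python
def scale_to_range(values, lo=10.0, hi=100.0):
    """Linearly rescale a third variable (e.g. quake depth) to marker sizes in [lo, hi]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:
        # A constant variable carries no information: use the midpoint size.
        return [(lo + hi) / 2.0 for _ in values]
    return [lo + (v - vmin) / (vmax - vmin) * (hi - lo) for v in values]


# Hypothetical depths (km) for three quakes.
depths = [40, 300, 650]
sizes = scale_to_range(depths)
print(sizes)
# With matplotlib installed, the sizes would be passed as
# plt.scatter(longitude, latitude, s=sizes).
```

Deeper quakes get larger markers; passing the depths to `c=` instead of `s=` encodes them as color.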
5. Scatterplot
- For large n (n ≧ …), over-plotting can seriously distort the picture:
points hide one another, so dense regions look no denser than sparse ones.
Skin Segmentation Data: red (R) vs. green (G)
6. Scatterplot
- For large n (n ≧ …), the alpha channel (transparency) can be used to reveal point density.
Skin Segmentation Data: red (R) vs. green (G)
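The effect of the alpha channel can be quantified: under standard alpha blending, k overlapping points of the same color, each with opacity a, give opacity 1 - (1 - a)^k. A minimal sketch, assuming one wants k stacked points to reach a chosen target opacity; the function names and the target of 0.95 are hypothetical.

```python
def composite_opacity(alpha, k):
    """Opacity of a pixel covered by k overlapping points, each drawn with opacity alpha."""
    return 1.0 - (1.0 - alpha) ** k


def alpha_for_saturation(k, target=0.95):
    """Per-point alpha at which k stacked points reach the target opacity."""
    return 1.0 - (1.0 - target) ** (1.0 / k)


a = alpha_for_saturation(100)
print(round(a, 4))
for k in (1, 10, 100, 1000):
    # Regions with fewer than ~100 overlapping points stay visibly lighter.
    print(k, round(composite_opacity(a, k), 3))
```

Choosing a small alpha keeps sparse and dense regions distinguishable, which is exactly what over-plotting with opaque points destroys.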
7. Scatterplot
- lowess: a nonparametric (locally weighted) regression smoother for bivariate data
cars data: distance vs. speed
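The idea of lowess can be sketched as locally weighted linear regression with tricube weights. This is a simplified single-pass version under stated assumptions: it omits the robustness iterations of R's `lowess()` (default `iter = 3`) and its `delta` speed-up, so it will not reproduce R's output exactly; `frac` plays the role of R's span `f` (default 2/3).

```python
import numpy as np


def lowess_fit(x, y, frac=2 / 3):
    """Single-pass lowess: local linear fits with tricube weights (no robustness steps)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    r = int(np.ceil(frac * n))               # number of neighbours in each local fit
    A = np.column_stack([np.ones(n), x])     # design matrix for y ~ a + b*x
    fitted = np.empty(n)
    for i in range(n):
        dist = np.abs(x - x[i])
        h = max(np.sort(dist)[r - 1], 1e-12)  # bandwidth: distance to r-th nearest point
        u = np.clip(dist / h, 0.0, 1.0)
        w = (1.0 - u ** 3) ** 3              # tricube weights, zero at and beyond h
        sw = np.sqrt(w)
        # Weighted least squares via row scaling by sqrt(weights).
        coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
        fitted[i] = coef[0] + coef[1] * x[i]
    return fitted


# Hypothetical noisy data; for the cars data one would pass speed and dist.
rng = np.random.default_rng(0)
speed = np.sort(rng.uniform(4, 25, 50))
dist = 0.1 * speed ** 2 + rng.normal(0, 5, 50)
smooth = lowess_fit(speed, dist)
```

Because each local fit is a weighted straight line, data lying exactly on a line are reproduced exactly; curvature is captured only through the moving window.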
8. Scatterplot
- 3D Rotation for three variables
Skin Segmentation Data: red (R), green (G), blue (B)
- ggobi: 3D rotation for four or more variables
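A 3D rotation amounts to rotating the centered three-variable cloud and displaying two of the rotated coordinates, frame by frame. A minimal sketch, assuming rotation about the vertical axis only; the function names are hypothetical and the random values stand in for the (red, green, blue) data.

```python
import numpy as np


def rotation_y(theta):
    """3 x 3 rotation about the second (vertical) axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])


def rotated_view(X, theta):
    """2D view (first two rotated coordinates) of centered 3-variable data."""
    Xc = X - X.mean(axis=0)
    return (Xc @ rotation_y(theta).T)[:, :2]


rng = np.random.default_rng(0)
X = rng.uniform(0, 255, size=(100, 3))       # stand-in for (red, green, blue)
frames = [rotated_view(X, t) for t in np.linspace(0, np.pi, 30)]
```

Shown in sequence, the frames give the rotation; tools such as ggobi generalize this idea to tours through 2D projections of four or more variables.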
9. Biplot of Observations and Variables,
Gabriel (1971)
- The biplot is a graph that displays observations (as points) and variables (as arrows) simultaneously.
Protein data (row: 25 nations, column: 9 protein sources)
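The biplot can be computed from the singular value decomposition of the centered data matrix, $X_c = UDV'$. A minimal sketch following the usual rank-2 construction: points $G = U_2 D_2^{\alpha}$ and arrows $H = V_2 D_2^{1-\alpha}$, so that $GH'$ is the best rank-2 approximation of $X_c$. The function name and the choice to center (but not scale) the columns are assumptions.

```python
import numpy as np


def biplot_coords(X, alpha=1.0):
    """Rank-2 biplot coordinates: G (n x 2) for observations, H (p x 2) for variables."""
    Xc = X - X.mean(axis=0)                          # center each column
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    G = U[:, :2] * d[:2] ** alpha                    # observation points
    H = Vt[:2].T * d[:2] ** (1 - alpha)              # variable arrows
    return G, H


# Hypothetical 25 x 9 table, shaped like the protein data.
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 9))
G, H = biplot_coords(X)
```

With `alpha=1` this is the principal-component biplot; any `alpha` in [0, 1] gives the same product $GH'$ and differs only in how the singular values are split between points and arrows.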
10. Biplot of Observations and Variables,
Gabriel (1971)
- Idea: Linear projection
Protein data: variable cereal
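The linear-projection idea can be checked numerically: projecting the observation points onto the arrow of one variable (for instance cereal) recovers, up to the rank-2 approximation, that variable's centered values. A minimal sketch, assuming the principal-component biplot scaling ($G = U_2 D_2$, $H = V_2$); the helper name is hypothetical.

```python
import numpy as np


def axis_readings(X, j):
    """Inner products g_i'h_j of the biplot points with the arrow of variable j.

    g_i'h_j is the rank-2 approximation of the centered value x_ij, so projecting
    the points on the line through arrow h_j orders the observations by variable j
    (each projection length equals g_i'h_j / ||h_j||).
    """
    Xc = X - X.mean(axis=0)
    U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
    G = U[:, :2] * d[:2]                 # observation points
    h = Vt[:2, j]                        # arrow of variable j
    return G @ h


# Hypothetical data; column 0 plays the role of "cereal".
rng = np.random.default_rng(0)
X = rng.normal(size=(25, 9))
readings = axis_readings(X, 0)
```

When the first two principal components explain most of the variation, the order of the points along the cereal arrow closely matches the order of the actual cereal values.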
11. Regression Biplot,
Huh and Lee (2013)
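- Regression biplot is a graph of observations on $x_1, \cdots, x_p$,
arranged by the predicted $y$.
- Assume that the model fit is determined by a function of a linear
combination of $x_1, \cdots, x_p$. For instance,
$\hat{y} = b_0 + b_1 x_1 + \cdots + b_p x_p$,
or
$\log\{\hat{p}/(1-\hat{p})\} = b_0 + b_1 x_1 + \cdots + b_p x_p$.
- Set the vertical dimension by the direction of the regression coefficients
$b = (b_1, \cdots, b_p)'$, or $b/\|b\|$.
- Set the horizontal dimension by the direction of the principal axis of
$x_1^{\perp}, \cdots, x_n^{\perp}$,
where $x_i^{\perp}$ denotes the orthogonal component generated from the
projection of $x_i$ on $b$.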
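The regression biplot axes can be sketched as follows, assuming the columns of X are centered, the coefficient vector b excludes the intercept, and $x_i^{\perp} = x_i - (x_i'b/\|b\|^2)\,b$. Function and variable names are hypothetical; the least-squares fit in the usage lines uses plain NumPy rather than any particular regression package.

```python
import numpy as np


def regression_biplot(X, b):
    """Coordinates [horizontal, vertical] and axes (p x 2) of a regression biplot."""
    Xc = X - X.mean(axis=0)                  # assumption: columns centered
    u = b / np.linalg.norm(b)                # vertical direction b / ||b||
    vert = Xc @ u
    X_perp = Xc - np.outer(vert, u)          # x_i^perp: component orthogonal to b
    _, _, Vt = np.linalg.svd(X_perp, full_matrices=False)
    h = Vt[0]                                # principal axis of the x_i^perp
    coords = np.column_stack([X_perp @ h, vert])
    return coords, np.column_stack([h, u])


# Hypothetical data with n = 21, p = 3 and an ordinary least-squares fit.
rng = np.random.default_rng(1)
X = rng.normal(size=(21, 3))
y = X @ np.array([0.7, 1.3, -0.2]) + rng.normal(scale=0.1, size=21)
A = np.column_stack([np.ones(21), X])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
coords, axes = regression_biplot(X, coef[1:])
```

The vertical coordinates equal $(\hat{y}_i - \bar{\hat{y}})/\|b\|$, so the observations are arranged by predicted y; for the logistic case the same function applies with b taken from the logistic fit.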
12. Regression Biplot,
Huh and Lee (2013)
Example 1. Stack Loss Data (response: loss of ammonia)
13. Regression Biplot,
Huh and Lee (2013)
Example 2. Magazine Data (response: subscription (0, 1))
14. Kernel PCA,
Scholkopf et al. (1998)
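- For observations $x_1, \cdots, x_n$ ($p \times 1$), consider the nonlinear mapping
$\phi: x_i \mapsto \phi(x_i)$, $i = 1, \cdots, n$,
to a Hilbert space, in which $\phi(x_i)'\phi(x_j) = k(x_i, x_j)$.
- Denoting $K = [k(x_i, x_j)]$ ($n \times n$), Kernel PCA is obtained from
eigen-decomposing the centered kernel matrix
$\tilde{K} = (I - \mathbf{1}\mathbf{1}'/n)\, K\, (I - \mathbf{1}\mathbf{1}'/n)$.
- Kernel PCA yields a plot of observations by projecting $\phi(x_1), \cdots, \phi(x_n)$
on the axes $v_k = \sum_{i=1}^{n} \alpha_{ki}\, \tilde{\phi}(x_i)$, $k = 1, 2$,
where $\tilde{\phi}(x_i) = \phi(x_i) - \frac{1}{n}\sum_{l=1}^{n} \phi(x_l)$
and $\alpha_k = (\alpha_{k1}, \cdots, \alpha_{kn})'$ is an eigenvector of $\tilde{K}$ (scaled so that $\|v_k\| = 1$).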
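A minimal NumPy sketch of kernel PCA, assuming an RBF kernel of the form $\exp(-\sigma\|a-b\|^2)$ (the parametrization used by kernlab's `rbfdot`; treat this as an assumption about how the slides' sigma is defined). The eigenvectors $\alpha_k$ of $\tilde{K}$ are rescaled so that $\lambda_k \alpha_k'\alpha_k = 1$, which makes each axis $v_k$ unit length. Function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma=0.1):
    """k(a, b) = exp(-sigma * ||a - b||^2) for all rows of A against rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sigma * np.maximum(sq, 0.0))


def kernel_pca(K, n_components=2):
    """Kernel PCA scores of the n observations from an n x n kernel matrix K."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centering matrix I - 11'/n
    Kt = J @ K @ J                             # centered kernel matrix
    lam, vecs = np.linalg.eigh(Kt)             # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:n_components]
    alpha = vecs[:, idx] / np.sqrt(lam[idx])   # lambda_k * alpha_k'alpha_k = 1
    return Kt @ alpha, alpha


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))                   # hypothetical data
scores, alpha = kernel_pca(rbf_kernel(X, X, sigma=0.1))
```

With the linear kernel $k(a, b) = a'b$ the scores coincide, up to sign, with ordinary principal component scores, which is a convenient sanity check.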
15. Kernel PCA Diagram (or Kernel Biplot),
Huh (2013)
- Aim: represent the variables in the kernel PC plot of observations.
- Proposed Procedure:
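1) For each $i = 1, \cdots, n$, map $x_i$ and $x_i + \delta e_j$ on the plane, $j = 1, \cdots, p$,
where $\delta$ is a constant and $e_j = (0, \cdots, 0, 1, 0, \cdots, 0)'$ is the $j$-th unit vector.
The projection of a point $x''$ on the $k$-th axis is given by
$\sum_{i=1}^{n} \alpha_{ki}\, \tilde{k}(x_i, x'')$, where
$\tilde{k}(x', x'') = k(x', x'') - \frac{1}{n}\sum_{l=1}^{n} k(x_l, x'') - \frac{1}{n}\sum_{l=1}^{n} k(x', x_l) + \frac{1}{n^2}\sum_{l=1}^{n}\sum_{m=1}^{n} k(x_l, x_m)$.
2) For each $i$, link the projection points of $x_i$ and $x_i + \delta e_j$ by an arrow.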
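The arrow-diagram procedure can be sketched for one variable j: project each $x_i$ and $x_i + \delta e_j$ with the out-of-sample kernel PCA formula (centered kernel $\tilde{k}$) and return the arrow start and end points. A minimal sketch under assumptions: the helper names, the RBF parametrization $\exp(-\sigma\|a-b\|^2)$, and the default $\delta = 0.1$ are hypothetical, not values from the paper.

```python
import numpy as np


def rbf_kernel(A, B, sigma=0.1):
    """k(a, b) = exp(-sigma * ||a - b||^2) for all rows of A against rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sigma * np.maximum(sq, 0.0))


def kpca_fit(K, n_components=2):
    """Eigenvectors alpha_k of the centered kernel, scaled so lambda_k * alpha_k'alpha_k = 1."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    lam, vecs = np.linalg.eigh(J @ K @ J)
    idx = np.argsort(lam)[::-1][:n_components]
    return vecs[:, idx] / np.sqrt(lam[idx])


def kpca_project(K, K_new, alpha):
    """Scores sum_i alpha_ki * k~(x_i, x'') for new points; K_new[m, i] = k(x''_m, x_i)."""
    Kt_new = (K_new - K_new.mean(axis=1, keepdims=True)   # (1/n) sum_l k(x'', x_l)
              - K.mean(axis=0)[None, :]                  # (1/n) sum_l k(x_l, x_i)
              + K.mean())                                # (1/n^2) sum_l sum_m k(x_l, x_m)
    return Kt_new @ alpha


def arrow_diagram(X, kernel, j, delta=0.1):
    """Start (x_i) and end (x_i + delta * e_j) points of the arrows for variable j."""
    K = kernel(X, X)
    alpha = kpca_fit(K)
    Xs = np.array(X, dtype=float)
    Xs[:, j] += delta
    start = kpca_project(K, kernel(X, X), alpha)
    end = kpca_project(K, kernel(Xs, X), alpha)
    return start, end


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 4))                 # hypothetical stand-in for the iris measurements
start, end = arrow_diagram(X, rbf_kernel, j=0)
```

With a linear kernel every arrow is the same vector, $\delta$ times the j-th PCA loadings; with the RBF kernel the arrows vary from point to point, which is what the arrow diagrams display.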
16. Example 1. Arrow diagrams for kernel PCA of the iris data with RBF kernel
17. Example 1. Arrow diagrams for kernel PCA of the iris data with RBF kernel
18. Example 2. Arrow diagrams for kernel PCA of the spam data
19. SVM-Guided Biplot as an extension of Regression Biplot
- Idea: combine the linear/logistic regression biplot with kernel PCA.
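- Classification/Regression Part:
SVM classifier: with $y_i = -1$ or $1$ for $i = 1, \cdots, n$, an observation $x$ is
classified as $\mathrm{sign}\{f(x)\}$, where
$f(x) = \sum_{i=1}^{n} \alpha_i y_i\, k(x_i, x) + b$, $\alpha_i \geq 0$.
Vertical dimension is set to the direction of $w = \sum_{i=1}^{n} \alpha_i y_i\, \phi(x_i)$,
so that the vertical coordinate of $x$ is $w'\phi(x)/\|w\| = \{f(x) - b\}/\|w\|$.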
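The vertical coordinate needs only the kernel and the products $\beta_i = \alpha_i y_i$, since $w'\phi(x) = \sum_i \beta_i k(x_i, x)$ and $\|w\|^2 = \beta' K \beta$. A minimal sketch, assuming $\beta$ comes from an already-fitted SVM (for example, scikit-learn's `SVC.dual_coef_` stores $y_i\alpha_i$ for the support vectors, in which case K is restricted to the support vectors); the function name is hypothetical.

```python
import numpy as np


def svm_vertical(K, K_new, beta):
    """Vertical coordinates w'phi(x) / ||w|| with w = sum_i beta_i phi(x_i).

    K[i, l] = k(x_i, x_l) over training points, K_new[m, i] = k(x_m, x_i),
    beta_i = alpha_i * y_i. Note that w'phi(x) = f(x) - b.
    """
    norm_w = np.sqrt(beta @ K @ beta)          # ||w||^2 = beta' K beta
    return (K_new @ beta) / norm_w


# Sanity-check setting with a linear kernel, where w = X' beta is explicit.
rng = np.random.default_rng(0)
X = rng.normal(size=(15, 3))
beta = rng.normal(size=15)                     # hypothetical dual coefficients
Z = rng.normal(size=(4, 3))
v = svm_vertical(X @ X.T, Z @ X.T, beta)
```

Working through the kernel avoids ever forming $w$, which for the RBF kernel lives in an infinite-dimensional space.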
20. SVM-Guided Biplot: Classification
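- Kernel PCA Part:
Let $w = \sum_{l=1}^{n} \beta_l\, \phi(x_l)$ with $\beta_l = \alpha_l y_l$ be the SVM (vertical) direction,
and remove from each $\phi(x_i)$ its component along $w$:
$\phi^{\perp}(x_i) = \phi(x_i) - \{w'\phi(x_i)/w'w\}\, w$.
Since $w'\phi(x_i) = (K\beta)_i$ and $w'w = \beta' K \beta$,
$\phi^{\perp}(x_i)'\phi^{\perp}(x_{i'}) = k(x_i, x_{i'}) - (K\beta)_i (K\beta)_{i'} / (\beta' K \beta)$.
Hence
$K \rightarrow K^{\perp} = K - K\beta\beta'K/(\beta'K\beta)$.
Horizontal dimension is determined by eigen-decomposing the centered $K^{\perp}$.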
- Perturbation Scheme for Arrow Diagrams:
Define a perturbed data matrix ($n \times p$) by adding to the observations a perturbation
whose magnitude is controlled by a constant $\delta$. Then, project the perturbed observations
on the first (vertical) and the second (horizontal) dimensions.
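The kernel PCA part and the perturbation scheme can be sketched together: deflate the kernel matrix by the SVM direction, take the leading kernel principal axis of the deflated matrix as the horizontal dimension, and place each $x_i$ and its perturbed copy $x_i + \delta e_j$ on both dimensions. A minimal sketch, assuming the deflation $K^{\perp} = K - K\beta\beta'K/(\beta'K\beta)$ (obtained by projecting out $w$), a perturbation of a single variable j, and hypothetical function names; it is not the author's code.

```python
import numpy as np


def deflate(K, beta):
    """Gram matrix of phi(x_i) with the component along w = sum_l beta_l phi(x_l) removed."""
    Kb = K @ beta
    return K - np.outer(Kb, Kb) / (beta @ Kb)


def cross_deflate(K, K_new, beta):
    """Deflated inner products between new points (rows) and training points (columns)."""
    return K_new - np.outer(K_new @ beta, K @ beta) / (beta @ K @ beta)


def horizontal_axis(Kp):
    """Leading eigenvector of the centered deflated kernel, scaled so lambda * a'a = 1."""
    n = Kp.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    lam, vecs = np.linalg.eigh(J @ Kp @ J)   # eigenvalues in ascending order
    return vecs[:, -1] / np.sqrt(lam[-1])


def horizontal_scores(Kp, Kp_new, a):
    """Kernel PCA scores of new points on the horizontal axis (centered kernel)."""
    Kt_new = (Kp_new - Kp_new.mean(axis=1, keepdims=True)
              - Kp.mean(axis=0)[None, :] + Kp.mean())
    return Kt_new @ a


def perturbation_arrows(X, kernel, beta, j, delta=0.1):
    """(horizontal, vertical) positions of x_i and of x_i + delta * e_j."""
    K = kernel(X, X)
    Kp = deflate(K, beta)
    a = horizontal_axis(Kp)
    norm_w = np.sqrt(beta @ K @ beta)

    def place(Z):
        K_new = kernel(Z, X)                  # K_new[m, i] = k(z_m, x_i)
        vert = K_new @ beta / norm_w          # w'phi(z) / ||w||
        horiz = horizontal_scores(Kp, cross_deflate(K, K_new, beta), a)
        return np.column_stack([horiz, vert])

    Xs = np.array(X, dtype=float)
    Xs[:, j] += delta
    return place(X), place(Xs)


# Hypothetical data and dual coefficients, with a linear kernel for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
beta = rng.normal(size=20)
lin = lambda A, B: A @ B.T
start, end = perturbation_arrows(X, lin, beta, j=2)
```

Because $K^{\perp}$ contains no component along $w$, the horizontal axis is orthogonal to the vertical one in feature space, so the two dimensions carry separate information.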
21. Example 1. Iris Data: Versicolor vs. Virginica [sigma = 0.1, C = 1]
22. Importance of Variables (in the case of large p)
- It is necessary to select a small number of variables in determining
the first and second dimensions.
- Measures of importance (definition): length of arrows
1) in the vertical direction,
2) in the horizontal direction.
- Plot arrow diagrams for the important variables only.
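The importance measures can be sketched as average arrow lengths per direction, followed by keeping the top-ranked variables. A minimal sketch, assuming arrow length is averaged as the mean absolute displacement over observations and that the selected set is the union of the top k variables in each direction; both choices and the function names are hypothetical.

```python
import numpy as np


def arrow_importance(start, end):
    """Vertical and horizontal importance of each variable.

    start, end: arrays of shape (p, n, 2) with the (horizontal, vertical) positions
    of x_i and of the perturbed x_i + delta * e_j, for each variable j.
    """
    disp = np.abs(end - start)               # arrow components, shape (p, n, 2)
    horiz = disp[:, :, 0].mean(axis=1)       # mean horizontal arrow length
    vert = disp[:, :, 1].mean(axis=1)        # mean vertical arrow length
    return vert, horiz


def select_variables(vert, horiz, k=5):
    """Indices of the k most important variables in either direction."""
    top_v = np.argsort(vert)[::-1][:k]
    top_h = np.argsort(horiz)[::-1][:k]
    return sorted(set(top_v.tolist()) | set(top_h.tolist()))


# Hypothetical arrows for p = 3 variables and n = 4 observations.
start = np.zeros((3, 4, 2))
end = np.zeros((3, 4, 2))
end[0, :, 1] = 2.0                           # variable 0 moves points vertically
end[1, :, 0] = 3.0                           # variable 1 moves points horizontally
end[2] = 0.1                                 # variable 2 barely moves them
vert, horiz = arrow_importance(start, end)
print(select_variables(vert, horiz, k=1))
```

Ranking the two directions separately keeps variables that drive the SVM (vertical) dimension even if they contribute little to the horizontal one, and vice versa.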
23. Example 2. Spam Data [sigma = 0.1, C = 10]
24. SVM-Guided Biplot: Regression
- The same method can be applied to SVM regression.
- Example 3. Aerobic Fitness data (response: oxygen uptake)
with RBF kernel (sigma = 0.1, C = 10, epsilon = 0.1)
25. Concluding Remarks
- The biplot can be extended to suit linear regression or
classification (logistic regression).
- The biplot can be extended to allow nonlinear mapping of
observations and variables by fully utilizing the kernel trick.
http://blog.naver.com/huh4200
금붕어 어항 ("Goldfish Bowl", on the iPad)
26. References
Gabriel, K.R. (1971). “The biplot graphic display of matrices with application to
principal component analysis”. Biometrika, 58, 453–467.
Huh, M.H. (2013). “Arrow diagrams for kernel principal component analysis”.
Communications for Statistical Applications and Methods, 20, 175–184.
Huh, M.H. (2013). “SVM-guided biplot of observations and variables”.
Communications for Statistical Applications and Methods. (to appear)
Huh, M.H. and Lee, Y.G. (2013). “Biplots of multivariate data guided by linear
and/or logistic regression”. Communications for Statistical Applications and
Methods, 20, 129–136.
Schölkopf, B., Smola, A. and Müller, K.R. (1998). “Nonlinear component analysis as
a kernel eigenvalue problem”. Neural Computation, 10, 1299–1319.