A fundamental theory in statistics, possibly applicable to data mining, machine learning and epistemology. My own "principia mathematica", 2nd version.
Ellipsoidal Representations about correlations (2011-11, Tsukuba, Kakenhi-Symposium)
1. Ellipsoidal representations about correlations
(Towards general correlation theory)
Toshiyuki Shimono
tshimono@05.alumni.u-tokyo.ac.jp
KAKENHI* Symposium
*Grant-in-Aid for Scientific Research
University of Tsukuba
2011-11-8
2. My profile
• My work mainly consists of building algorithms on
large amounts of data, such as:
o web access logs
o newspaper articles
o POS (Point of Sale) data
o tags of millions of pictures
o links among billions of pages
o psychological test results from a human-resources company
o data used for recommendation engines
o data produced by an original search engine
• This presentation touches on those above.
3. Background
1. Paradoxes of real-world data:
o even an elaborate regression analysis mostly gives ρ < 0.7
(this holds when the observations are not very accurate; the value 0.7 is arbitrary)
-> so how should we deal with such data?
o data accuracy seems to matter little for ρ when ρ < 0.7
-> details are shown later.
2. My tentative answer:
o The correlations are very important,
so we need methods to interpret them.
o The ellipsoids will give you insights.
3. Then we will:
o understand the real world, which is dominated by weak correlations.
o hopefully find new rules and findings across the sciences.
4. Main contents
§1. What is ρ?
o Shape of ellipse/ellipsoid
o Mysterious robustness
§2. Geometry of regression
o Similarity ratio of ellipses/ellipsoids
o Graduated rulers
o Linear scalar fields
5. §1. What is ρ ?
(ρ : the correlation coefficient)
It was developed by Karl Pearson from a similar but slightly
different idea introduced by Francis Galton in the 1880s.
(quoted from en.wikipedia.org)
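For reference, Pearson's ρ is the covariance of X and Y divided by the product of their standard deviations. A minimal pure-Python sketch of that definition follows; the function name `pearson_rho` is our own choice, not a library API.

```python
import math


def pearson_rho(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    # rho = cov(X, Y) / (sd(X) * sd(Y)); the common 1/n factors cancel
    return sxy / math.sqrt(sxx * syy)


print(pearson_rho([1, 2, 3, 4], [2, 4, 5, 9]))  # 11 / sqrt(130), about 0.965
```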
6. The shapes of correlation ellipses (1)
Each panel of the figure on the left shows a 2-dimensional
Gaussian distribution, with ρ changing from -1 to +1
in steps of 0.1 (5000 points are plotted for each).
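One way to generate panels like these, sketched under our own assumptions (the slides do not say how the points were drawn): take independent standard normals u and v and set x = u, y = ρu + √(1 − ρ²)v, which gives a standardized pair with correlation ρ. The helper `sample_gaussian_pair` is hypothetical, and `pearson_rho` is repeated so that the block runs on its own.

```python
import math
import random


def sample_gaussian_pair(rho, n, rng):
    """Draw n points from a standardized 2-dim Gaussian with correlation rho.

    Uses x = u, y = rho*u + sqrt(1 - rho^2)*v with u, v independent N(0, 1).
    """
    s = math.sqrt(1.0 - rho * rho)
    pts = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        pts.append((u, rho * u + s * v))
    return pts


def pearson_rho(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
# rho from -1 to +1 in steps of 0.1, 5000 points each, as in the figure
for k in range(-10, 11):
    rho = k / 10
    xs, ys = zip(*sample_gaussian_pair(rho, 5000, rng))
    print(f"rho = {rho:+.1f}  sample rho = {pearson_rho(xs, ys):+.3f}")
```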
7. The shapes of correlation ellipses (2)
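The density function of the standardized 2-dim Gaussian distribution is
$f(x,y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left(-\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)}\right)$,
and its correlation ellipse is $x^2 - 2\rho xy + y^2 = 1 - \rho^2$.
The ellipse inscribes the unit square at 4 points,
±(1, ρ) and ±(ρ, 1).
(For higher dimensions, see the correlation ellipsoid in slide 11.)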
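The tangency with the unit square can be checked directly on the ellipse x² − 2ρxy + y² = 1 − ρ². Setting x = 1 leaves a perfect square in y, i.e. a double root, so the edge x = 1 touches the ellipse at exactly one point:

```latex
x^2 - 2\rho x y + y^2 = 1 - \rho^2
\;\xrightarrow{\;x = 1\;}\;
y^2 - 2\rho y + \rho^2 = 0
\;\Longleftrightarrow\;
(y - \rho)^2 = 0
\;\Longleftrightarrow\;
y = \rho
```

The symmetry (x, y) → (−x, −y) gives the point (−1, −ρ), and swapping x and y gives ±(ρ, 1).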
8. The shapes of correlation ellipses (3)
• Displacement and axial rescaling are allowed
(they leave ρ unchanged).
(Rotation, or rescaling along any other direction, is not allowed.)
To draw the ellipses above:
1. draw an ellipse with half-width √(1+ρ) and half-height √(1−ρ),
2. rotate it by 45 degrees,
3. apply a parallel shift and axial rescaling.
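The three drawing steps can be sketched as follows. This is a minimal illustration; `correlation_ellipse` and its shift/scale parameters `cx`, `cy`, `sx`, `sy` are our own hypothetical names. With no shift or rescaling, every generated point satisfies x² − 2ρxy + y² = 1 − ρ², which the loop prints as a residual.

```python
import math


def correlation_ellipse(rho, n=360, cx=0.0, cy=0.0, sx=1.0, sy=1.0):
    """Points on the correlation ellipse, built by the three steps on the slide."""
    a = math.sqrt(1.0 + rho)   # step 1: half-width
    b = math.sqrt(1.0 - rho)   # step 1: half-height
    c = s = math.sqrt(0.5)     # cos 45 deg and sin 45 deg
    pts = []
    for k in range(n):
        t = 2.0 * math.pi * k / n
        x0, y0 = a * math.cos(t), b * math.sin(t)
        x1, y1 = c * x0 - s * y0, s * x0 + c * y0   # step 2: rotate by 45 deg
        pts.append((cx + sx * x1, cy + sy * y1))    # step 3: shift and axial rescale
    return pts


rho = 0.6
for x, y in correlation_ellipse(rho, n=8):
    residual = x * x - 2 * rho * x * y + y * y - (1 - rho * rho)
    print(f"({x:+.3f}, {y:+.3f})  residual = {residual:+.1e}")
```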
9. The shapes of correlation ellipses (4)
[Baseball example] The 6 teams of the Central League played 130 games in
each of the past 31 years. Each dot below corresponds to one team in one year
(N = 186 = 6 × 31).
• x: total score gained (G), y: −rank; ρ = 0.419
• x: total score lost (L), y: −rank; ρ = −0.471
• x: total score gained, y: total score lost; ρ = 0.423
• x: −rank prediction from both G & L, y: −rank; ρ = −0.828
(The prediction is made through multiple regression analysis.)
11. Correlation ellipsoid (higher dimension)
[Figure: the probability ellipsoid inside the unit cube (axes x, y, z), touching it at (1, 0.3, 0.5), (0.3, 1, 0.7), (0.5, 0.7, 1) and their negatives.]
The ρ-matrix herein is
$$\begin{pmatrix} 1 & 0.3 & 0.5 \\ 0.3 & 1 & 0.7 \\ 0.5 & 0.7 & 1 \end{pmatrix}$$
For the 3-dim case, the probability ellipsoid touches the unit cube
at 6 points $\pm(\rho_{i1}, \rho_{i2}, \rho_{i3})$, where $i = 1, 2, 3$.
(For k dimensions, the hyper-ellipsoid touches the unit hypercube
at 2k points $\pm(\rho_{i1}, \rho_{i2}, \dots, \rho_{ik})$, where $i = 1, 2, \dots, k$.)
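A numerical check of the touching points, as a pure-Python sketch (the `solve` helper is ours). It relies on the ellipsoid being {x : x^T R^{-1} x = 1} for the ρ-matrix R: the i-th column p of R satisfies p^T R^{-1} p = R_ii = 1, and the normal there is proportional to R^{-1} p = e_i, i.e. perpendicular to the facet x_i = 1.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(a)
    m = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


# The rho-matrix used on this slide
R = [[1.0, 0.3, 0.5],
     [0.3, 1.0, 0.7],
     [0.5, 0.7, 1.0]]

for i in range(3):
    p = [R[j][i] for j in range(3)]  # candidate touching point: i-th column of R
    w = solve(R, p)                  # R^{-1} p, proportional to the surface normal at p
    on_surface = sum(pj * wj for pj, wj in zip(p, w))  # p^T R^{-1} p, expected to be 1
    print(p, round(on_surface, 12), [round(v, 12) for v in w])
```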
12. The mysterious robustness (1)
ρ[X:Y] and ρ[f(X):g(Y)] seem to differ only slightly from each
other
• when f and g are both increasing functions,
• unless X, Y, f(X) or g(Y) contain `outlier(s)'.
(Sampling fluctuations of ρ are much larger than the effects
caused by non-linearity and by the error ε.)
* A function f(・) is increasing iff f(x) ≦ f(y) holds for any x ≦ y.
13. The mysterious robustness (2)
Data: (x, y) = (u, 0.5u + 0.707v) with (u, v) drawn from a uniform square.
• ρ[X:Y] = 0.557 (original)
• ρ[X²:Y] = 0.519 (X squared)
• ρ[X:Y²] = 0.536 (Y squared)
• ρ[X:log(Y)] = 0.539 (logarithm of Y)
• ρ[Xrank:Yrank] = 0.537 (X and Y replaced by their ranks)
• ρ[X(7):Y(7)] = 0.524 (X and Y discretized into 7 values)
• ρ[X(5):Y(5)] = 0.507 (X and Y discretized into 5 values)
• The deformations have only a small effect on ρ: the × marks from the experiments are rather concentrated.
• Even N = 200 ≫ 1 causes rather large fluctuations of the sample ρ.
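A sketch that re-runs this kind of experiment. The slides do not state the range of the uniform square, how the 7- and 5-valued versions were binned, or the random seed, so these are assumptions: (u, v) uniform on (0, 1] (which keeps X² increasing and log(Y) defined), equal-count bins via the hypothetical `discretize` helper, and an arbitrary seed. The printed values will therefore not reproduce the numbers above exactly.

```python
import math
import random


def pearson_rho(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank(vals):
    """Ranks 1..n in increasing order (ties are not expected for continuous data)."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    ranks = [0] * len(vals)
    for pos, i in enumerate(order):
        ranks[i] = pos + 1
    return ranks


def discretize(vals, levels):
    """Hypothetical binning rule: map values to `levels` equal-count bins 0..levels-1."""
    n = len(vals)
    return [(r - 1) * levels // n for r in rank(vals)]


rng = random.Random(2011)  # arbitrary seed
N = 200
# Assumption: (u, v) uniform on (0, 1], so X**2 is increasing and log(Y) is defined.
pts = [(1.0 - rng.random(), 1.0 - rng.random()) for _ in range(N)]
X = [u for u, _ in pts]
Y = [0.5 * u + 0.707 * v for u, v in pts]

results = {
    "rho[X:Y]": pearson_rho(X, Y),
    "rho[X^2:Y]": pearson_rho([x * x for x in X], Y),
    "rho[X:Y^2]": pearson_rho(X, [y * y for y in Y]),
    "rho[X:log(Y)]": pearson_rho(X, [math.log(y) for y in Y]),
    "rho[Xrank:Yrank]": pearson_rho(rank(X), rank(Y)),
    "rho[X(7):Y(7)]": pearson_rho(discretize(X, 7), discretize(Y, 7)),
    "rho[X(5):Y(5)]": pearson_rho(discretize(X, 5), discretize(Y, 5)),
}
for name, value in results.items():
    print(f"{name:18s} {value:.3f}")
```

The rank version is Spearman's rank correlation; the population value of ρ[X:Y] in this assumed setup is about 0.577.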
14. The mysterious robustness (3)
The sampled ρ fluctuate according to the sample size,
N = 30 (blue) or N = 300 (red). The deformation effect of f( ) is smaller than these fluctuations.
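This comparison can be approximated by repeating the experiment of slide 13 many times: the standard deviation of the sample ρ[X:Y] measures the sampling fluctuation, and the mean of |ρ[X²:Y] − ρ[X:Y]| measures the deformation effect. A sketch under the same assumed setup ((u, v) uniform on (0, 1], x = u, y = 0.5u + 0.707v); `one_trial` and `summarize` are hypothetical names, and only the X² deformation is used.

```python
import math
import random
import statistics


def pearson_rho(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def one_trial(n, rng):
    """One sample of size n from the assumed setup; returns (rho[X:Y], rho[X^2:Y])."""
    pts = [(1.0 - rng.random(), 1.0 - rng.random()) for _ in range(n)]
    X = [u for u, _ in pts]
    Y = [0.5 * u + 0.707 * v for u, v in pts]
    return pearson_rho(X, Y), pearson_rho([x * x for x in X], Y)


def summarize(n, trials=500, seed=0):
    """Spread of the sample rho versus the average size of the deformation effect."""
    rng = random.Random(seed)
    pairs = [one_trial(n, rng) for _ in range(trials)]
    spread = statistics.stdev(r for r, _ in pairs)         # sampling fluctuation
    shift = statistics.mean(abs(d - r) for r, d in pairs)  # deformation effect
    return spread, shift


for n in (30, 300):
    spread, shift = summarize(n)
    print(f"N={n:3d}: sd of sample rho = {spread:.3f}, "
          f"mean |rho[X^2:Y] - rho[X:Y]| = {shift:.3f}")
```

With these settings one would expect the spread at N = 30 to be clearly larger than at N = 300, and the deformation shift to stay below the N = 30 spread.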
15. Where does the champion come from?
The champion of a game is often not the true champion.
[Figure axis: potential ability]
If ρ of the game is not close to 1, the true champion often cannot win.
On average, the winner is approximately ρ times as strong as the true champion
(assuming that results and abilities form a 0-centered 2-dim Gaussian distribution).
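A small simulation of this claim, under the slide's assumption that ability and game result form a 0-centered, standardized 2-dim Gaussian with correlation ρ. The function name `champion_ratio` and the numbers of players and tournaments are our own choices. Because E[ability | result] = ρ·result, and the best result and the best ability have the same distribution, the ratio of the winner's average ability to the true champion's average ability should come out close to ρ.

```python
import math
import random


def champion_ratio(rho, players=100, tournaments=2000, seed=0):
    """Average ability of each game's winner divided by the average ability of
    the truly strongest player."""
    rng = random.Random(seed)
    s = math.sqrt(1.0 - rho * rho)
    winner_sum = best_sum = 0.0
    for _ in range(tournaments):
        abilities = [rng.gauss(0.0, 1.0) for _ in range(players)]
        # result = rho * ability + independent noise, so corr(ability, result) = rho
        results = [rho * a + s * rng.gauss(0.0, 1.0) for a in abilities]
        winner = max(range(players), key=results.__getitem__)
        winner_sum += abilities[winner]
        best_sum += max(abilities)
    return winner_sum / best_sum


for rho in (0.3, 0.5, 0.8):
    print(f"rho = {rho}: winner / true champion ability = {champion_ratio(rho):.2f}")
```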
16. Summary of `§1. What is ρ?'
• ρ is recognizable as an ellipse.
• ρ-matrix is recognizable as an ellipsoid.
• ρ seems robust against axial deformations unless outliers exist.
• ρ of a game can be guessed from its champions.
17. §2. Geometry of Regression
The figures here show the
region in which
(x, y, z) = (ρ[Y:Z], ρ[Z:X], ρ[X:Y])
can exist.
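Algebraically, this region is the set of triples for which the 3×3 correlation matrix is positive semidefinite: all three values lie in [−1, 1] and the determinant 1 + 2xyz − x² − y² − z² is non-negative. A sketch of the membership test (the function name is ours):

```python
def is_valid_correlation_triple(x, y, z):
    """True if (x, y, z) = (rho[Y:Z], rho[Z:X], rho[X:Y]) can come from a
    valid (positive semidefinite) 3x3 correlation matrix."""
    if any(abs(t) > 1 for t in (x, y, z)):
        return False  # 2x2 principal minors 1 - t^2 must be non-negative
    # determinant of [[1, z, y], [z, 1, x], [y, x, 1]]
    det = 1 + 2 * x * y * z - x * x - y * y - z * z
    return det >= 0


print(is_valid_correlation_triple(0.7, 0.5, 0.3))   # True: the rho-matrix of slide 11
print(is_valid_correlation_triple(0.9, 0.9, -0.9))  # False: determinant is negative
```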
18. Multiple-ρ is the similarity ratio of ellipses
[Figure panels: Formulation of MRA; Multiple-ρ]
The multiple-ρ (≦ 1) is the
similarity ratio of the two ellipses: the correlation ellipse of the
explanatory variables, and the similar, concentric ellipse passing through the inner point.
(When X・ is k-dimensional, the hyper-ellipsoid is determined by the k×k matrix
whose elements are ρ[Xi:Xj], and the inner point is the k-dimensional vector
whose elements are ρ[Xi:Y].)
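For two explanatory variables the similarity ratio has a closed form. With r1 = ρ[X1:Y], r2 = ρ[X2:Y] and r12 = ρ[X1:X2], the squared multiple-ρ is (r1² − 2·r12·r1·r2 + r2²)/(1 − r12²), so the point (r1, r2) lies on the correlation ellipse x² − 2·r12·xy + y² = 1 − r12² scaled about its centre by exactly that factor. A sketch (function name ours), applied to the rounded correlations of the baseball example:

```python
import math


def multiple_rho_2(r1, r2, r12):
    """Multiple correlation of Y on (X1, X2): the factor by which the ellipse
    x^2 - 2*r12*x*y + y^2 = 1 - r12^2 must be scaled to pass through (r1, r2)."""
    q = (r1 * r1 - 2 * r12 * r1 * r2 + r2 * r2) / (1 - r12 * r12)
    return math.sqrt(q)


print(round(multiple_rho_2(0.419, -0.471, 0.423), 3))  # 0.829
```

The result agrees with the 0.828 reported for the baseball data up to the rounding of the three inputs.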
19. Examples: Multiple-ρ from the ellipses
Many interesting phenomena could be explained
systematically.
20. Partial-ρ is read by a ruler in the ellipse
The partial correlation r1' comes from the idea of the
correlation between X1 and Y when X2 is held fixed
(here r1 = ρ[X1:Y] and r2 = ρ[X2:Y]).
The red ruler, which is
• parallel to the corresponding axis,
• passing through (r1, r2),
• stretched across the full width of the ellipse,
• graduated linearly from −1 to +1,
reads the partial-ρ.
r1' = 0.75 for this case.
r2' is read likewise by turning the ruler vertical.
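The ruler reading can be checked against the textbook formula for the partial correlation. The horizontal chord of the ellipse x² − 2·r12·xy + y² = 1 − r12² at height r2 is centred at r12·r2 with half-length √((1 − r12²)(1 − r2²)); reading r1 on a linear scale from −1 to +1 along this chord gives exactly (r1 − r12·r2)/√((1 − r2²)(1 − r12²)). A sketch with our own function names:

```python
import math


def partial_rho_formula(r1, r2, r12):
    """Textbook partial correlation of X1 and Y with X2 held fixed."""
    return (r1 - r12 * r2) / math.sqrt((1 - r2 * r2) * (1 - r12 * r12))


def partial_rho_ruler(r1, r2, r12):
    """Reading on the horizontal chord through (r1, r2) of the ellipse
    x^2 - 2*r12*x*y + y^2 = 1 - r12^2, graduated from -1 to +1."""
    centre = r12 * r2                                   # midpoint of the chord
    half = math.sqrt((1 - r12 * r12) * (1 - r2 * r2))  # half-length of the chord
    return (r1 - centre) / half


args = (0.419, -0.471, 0.423)  # rho[X1:Y], rho[X2:Y], rho[X1:X2] (baseball example)
print(partial_rho_formula(*args), partial_rho_ruler(*args))
```

The reading is zero exactly when r1 = r12·r2, i.e. when the point sits at the midpoint of the ruler.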
21. Standardized partial regression coefficients
• The ai (in Y = a1X1 + a2X2 + ε) are called the partial regression coefficients.
• Assume X1, X2, Y are standardized.
Make a scalar field inside the ellipse:
• assign 1 at the plus-side boundary point of the k-th axis (where the ellipse touches the unit square),
• assign 0 at the boundary points of the other axis,
• interpolate the assigned values linearly.
Then ak is read as the value of the field at (r1, r2).
Note:
• Extension to higher dimensions is easy.
• Each facet has a single boundary (touching) point.
• This pictorialization may be useful for SEM
(Structural Equation Modeling).
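The scalar-field construction can be compared with the usual solution a = R^{-1} r of the normal equations for standardized variables. The sketch builds, for each axis k, the linear field c·x that equals 1 at the plus-side touching point of axis k ((1, r12) or (r12, 1)) and 0 at the touching point of the other axis, then evaluates it at (r1, r2). The helper names are hypothetical.

```python
def solve2(a, b):
    """Solve the 2x2 linear system a x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def coefficients_direct(r1, r2, r12):
    """Standardized partial regression coefficients a = R^{-1} r."""
    return solve2([[1.0, r12], [r12, 1.0]], [r1, r2])


def coefficients_by_field(r1, r2, r12):
    """Read each a_k as the value at (r1, r2) of the linear scalar field."""
    touch = [(1.0, r12), (r12, 1.0)]  # plus-side touching points of axes 1 and 2
    coeffs = []
    for k in range(2):
        # Field f(x) = c . x with f = 1 at touch[k] and f = 0 at the other axis'
        # touching point; by linearity f = -1 and 0 at the minus-side points.
        targets = [1.0 if j == k else 0.0 for j in range(2)]
        c = solve2([list(p) for p in touch], targets)
        coeffs.append(c[0] * r1 + c[1] * r2)
    return coeffs


args = (0.419, -0.471, 0.423)  # rho[X1:Y], rho[X2:Y], rho[X1:X2] (baseball example)
print(coefficients_direct(*args))
print(coefficients_by_field(*args))
```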
22. The elliptical depiction for the baseball example
This page was added after the symposium.
Red: the multiple-ρ (0.828)
Blue: the two partial-ρ
Magenta: the partial regression coefficients
Each value corresponds to the length ratio of the
bold part to the whole line section of the same color.
X1: annual total score gained
X2: annual total score lost
Y: −(annual ranking)
( ρ[Y:X1] , ρ[Y:X2] ) = (0.419,-0.471) is plotted
inside the ellipse slanted with ρ[X1:X2]=0.423.
-> The meaning of numbers becomes clearer.
23. Summary and findings
of §2 Geometry of regression
• Multiple-ρ is the similarity ratio of two ellipses/ellipsoids.
• Partial-ρ is read by a graduated ruler in the ellipse/ellipsoids.
• Each regression coefficient is given by the scalar field.
So far, the numbers derived from MRA (Multiple Regression Analysis)
have often been said to be hard to interpret. This situation can
now change.
24. Summary as a whole
[ Main results ]
Using the ellipse or hyper-ellipsoid,
• any correlation matrix is wholly pictorialized.
• multiple regression is translated into geometric quotients.
[ Sub results ]
• ρ seems quite robust against axial deformations unless outliers exist.
• (Spherical trigonometry may also give insights; not covered today.)
[ Next steps ]
• treat the parameter/sampling perturbations
• systematize interesting statistical phenomena
• produce new theories further on
• give new twists to other research areas
• make useful applications to the real world cases
• organize a new logic system for this ambiguous world.
25. Refs
1. Encyclopedic Dictionary of Mathematics (岩波数学辞典), The Mathematical Society of Japan
2. R, http://www.r-project.org/
3. Covariance Structure Analysis: Case Studies (共分散構造分析 [事例編])
The author sincerely welcomes any related literature.
26. Background of this presentation SKIP
1. In daily and social life we make judgements based on related
things, but the real world is noisy and full of exceptions.
e.g. "Do better posture and mental
concentration lead to better performance?"
2. Real-world data cause paradoxes:
o even an elaborate regression analysis mostly gives ρ < 0.7; how should we deal with this?
o data accuracy matters little when ρ < 0.7; details are shown later.
o why does subjective judgement work in the real world?
3. Geometric interpretations of multiple regression analysis may be useful:
o one that takes in any correlation matrix as a whole,
o one that is geometric, using ellipsoids,
for observing and analyzing the underlying phenomena in detail.
4. Then we will understand the weak correlations that dominate our world.
27. A primitive question SKIP
Question
Why (and how) is data analysis important?
My Answer
It gives you inspiration and
updates your understanding of the real world.
Knowing the numbers μ, σ, ρ, ranking, VaR *
of the phenomena you have encountered
is crucially important for deciding your next action
in daily, social or business life!!
* average, standard deviation, correlation coefficient, rank order, Value at Risk
And so, an interpretation of these numbers is necessary.
(And today I provide one for ρ!)
28. Main ideas in more detail SKIP
Using the ellipse or hyper-ellipsoid,
• second-order moments can be fully visualized in a picture.
• the numbers from multiple regression can also be visualized.
1. (Pearson's) Correlation Coefficient
• basic to statistics (as you know)
• may change greatly when outliers are present
• however, changes only slightly under a `monotone' map
• depicted as a `correlation ellipse'
2. Multiple Regression Analysis
• (Spherical Surface Interpretation)
• Ellipse Interpretation
30. Main ideas SKIP
1. What is the correlation coefficient after all?
2. Geometric interpretations of Multiple Regression
Analysis.
31. The mysterious robustness (3) SKIP
Front figures: x = original sample correlation; y = correlation calculated after
converting the data to 3 values. Back figures: samples of size 100.
33. Summary of `§1. What is ρ?' (REDUNDANT)
• A correlation ρ is recognizable as an ellipse.
• A correlation matrix is also recognizable as an ellipsoid.
• ρ seems robust against axial deformations unless outliers exist.
• You can guess `ρ' of a game from its champion.
35. When partial-ρ is zero. (SKIP)
The condition partial-ρ = 0 ⇔
• The inner angle of the spherical triangle is 90 degrees.
• The two `hyper-planes' cross at 90 degrees at the `hyper-axis'.
The axis corresponds to the fixed variables, and each plane
contains one of the two variables.
• On the ellipse/ellipsoid, the characteristic point lies at the
midpoint of the ruler.
36. Multiple-ρ is the similarity ratio of ellipses
[REDUNDANT]
[Figure panels: Formulation of MRA; Multiple-ρ]
The multiple-ρ (≦ 1) is the
similarity ratio of the ellipses.
(When X・ is k-dimensional, the hyper-ellipsoid is determined by the k×k matrix
whose elements are ρ[Xi:Xj], and the inner point is the k-dimensional vector
whose elements are ρ[Xi:Y].)
For an arbitrary number of variables, calculate: the inverse of the correlation
matrix → the reciprocal of each of the diagonal elements → 1 minus each of them
→ the square root of each. Each result is the multiple-ρ of the corresponding
variable from the rest of the variables:
$\rho_{i \cdot \mathrm{rest}} = \sqrt{1 - 1/(R^{-1})_{ii}}$, where R is the correlation matrix.
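The recipe on this slide (invert the correlation matrix, take reciprocals of the diagonal, subtract each from 1, take square roots) can be sketched in pure Python; the `inverse` and `multiple_rhos` helpers are ours. It relies on the identity 1/(R^{-1})_ii = 1 − ρ²_{i·rest}. Applied to the baseball correlations ordered (X1, X2, Y), the last entry is the multiple-ρ of Y on X1 and X2.

```python
import math


def inverse(a):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def multiple_rhos(corr):
    """Multiple-rho of each variable on all the others, following the recipe."""
    inv = inverse(corr)
    # max() guards against tiny negative values caused by rounding
    return [math.sqrt(max(0.0, 1.0 - 1.0 / inv[i][i])) for i in range(len(corr))]


corr = [[1.0, 0.423, 0.419],    # X1: annual total score gained
        [0.423, 1.0, -0.471],   # X2: annual total score lost
        [0.419, -0.471, 1.0]]   # Y: -(annual ranking)
print([round(v, 3) for v in multiple_rhos(corr)])
```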
37. Summary and findings
of §2 Geometry of regression (REDUNDANT)
• Multiple-ρ is the similarity ratio of two ellipses/ellipsoids.
• Partial-ρ is read by a graduated ruler in the ellipse/ellipsoids.
• Each regression coefficient is given by the scalar field.
• (Spherical trigonometry)
So far, the numbers derived from MRA have often been said to be hard
to interpret. This situation can now change.
38. Introduction
This page was added after the symposium.
This page may need intensive proofreading by the author.
There is a Japanese word, `kaizen', which means improvement.
The real world, however, is so ambiguous that it is often hard to
know whether a kaizen action will have a positive effect or not.
Sometimes your action may have a negative or zero effect on average,
even if you believe it is a good one. Assume a situation in which you
can control a variable to affect an outcome variable (the number of
control variables will increase later).
The author's hypothesis is that the correlation coefficient indeed
plays an important role. One reason is that when the correlation is
positive, your rational action is simply to increase the value of the
control variable. It also seems very reasonable to select a control
variable that is strongly correlated with the outcome variable.
The problems that still exist today are as follows:
- The meaning of a correlation value is not yet well understood.
- The meaning of multiple regression analysis is also not yet well
understood (although, when correlations are weak, the reasonable choice
of analysis is multiple regression analysis or its elaborate derivatives).
The author found that correlation is very robust against `axial
deformations' unless the variables contain outliers; rather, the sample
correlation coefficient fluctuates much more in many cases when N is
less than 1000. The author also found the geometric background of the
correlations in multiple regression analysis (perhaps R. A. Fisher already
knew it, but nobody around the author did), which is producing many insights.
(The robustness is not yet well analyzed: there are only some pieces of
analysis and numerical examples. The geometric background has been analyzed
in its basic points, and the author is considering investigating parameter
perturbations further.)