Hedibert Lopes' talk at BigMC
1. Parsimonious Bayesian factor analysis when the number of factors is unknown

Hedibert Freitas Lopes
Booth School of Business, University of Chicago

Séminaire BIG'MC: Méthodes de Monte Carlo en grande dimension
Institut Henri Poincaré
March 11th 2011

Joint work with Sylvia Frühwirth-Schnatter.
2. Outline

1 Example 1
2 Normal factor model
    Variance structure
    Identification issues
    Number of parameters
    Ordering of the variables
3 Example 2
    Reduced-rank loading matrix
4 Structured loadings
5 Parsimonious BFA
    New identification conditions
    Theorem
6 Example 3
7 Example 4: simulating large data
8 Example 1: revisited
9 Conclusion
3. Example 1. Early origins of health

Conti, Heckman, Lopes and Piatek (2011) Constructing economically justified aggregates: an application to the early origins of health. Journal of Econometrics (to appear).

Here we focus on a subset of the British Cohort Study started in 1970:

• 7 continuous cognitive tests (Picture Language Comprehension, Friendly Math, Reading, Matrices, Recall Digits, Similarities, Word Definition),
• 10 continuous noncognitive measurements (Child Developmental Scale).

A total of m = 17 measurements on T = 2397 ten-year-old individuals.
5. Normal factor model

For any specified positive integer k ≤ m, the standard k-factor model relates each y_t to an underlying k-vector of random variables f_t, the common factors, via

    y_t | f_t ∼ N(βf_t, Σ)

where

    f_t ∼ N(0, I_k)

and

    Σ = diag(σ_1², …, σ_m²).

Unconditional covariance matrix:

    V(y_t | β, Σ) = Ω = ββ′ + Σ.
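The model above is easy to simulate, which gives a quick numerical check that the implied covariance is Ω = ββ′ + Σ. A minimal sketch in NumPy, with illustrative (hypothetical) values for m, k, β and Σ:

```python
import numpy as np

rng = np.random.default_rng(0)
m, k, T = 6, 2, 100_000

# Hypothetical loadings and idiosyncratic variances (illustrative values only).
beta = rng.normal(size=(m, k))
Sigma = np.diag(rng.uniform(0.1, 1.0, size=m))

# y_t = beta f_t + eps_t with f_t ~ N(0, I_k) and eps_t ~ N(0, Sigma)
f = rng.normal(size=(T, k))
eps = rng.multivariate_normal(np.zeros(m), Sigma, size=T)
y = f @ beta.T + eps

Omega = beta @ beta.T + Sigma          # implied covariance
Omega_hat = np.cov(y, rowvar=False)    # sample covariance

print(np.max(np.abs(Omega - Omega_hat)))  # small for large T
```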
6. Variance structure

    Conditional                      Unconditional
    var(y_it | f) = σ_i²             var(y_it) = Σ_{l=1}^k β_il² + σ_i²
    cov(y_it, y_jt | f) = 0          cov(y_it, y_jt) = Σ_{l=1}^k β_il β_jl

Common factors explain all the dependence structure among the m variables.
7. Identification issues

A rather trivial non-identifiability problem is sign-switching.

A more serious problem is factor rotation: invariance under any transformation of the form

    β* = βP′ and f_t* = P f_t,

where P is any orthogonal k × k matrix.
8. Solutions

1. β′Σ⁻¹β = I. Pretty hard to impose!

2. β is block lower triangular²:

    β = [ β11        0          0          ⋯   0
          β21        β22        0          ⋯   0
          β31        β32        β33        ⋯   0
           ⋮          ⋮          ⋮         ⋱   ⋮
          βk1        βk2        βk3        ⋯   βkk
          β(k+1)1    β(k+1)2    β(k+1)3    ⋯   β(k+1)k
           ⋮          ⋮          ⋮             ⋮
          βm1        βm2        βm3        ⋯   βmk ]

Somewhat restrictive, but useful for estimation.

² Geweke and Zhou (1996) and Lopes and West (2004).
9. Number of parameters

The resulting factor form of Ω has

    m(k + 1) − k(k − 1)/2

parameters, compared with the total

    m(m + 1)/2

in an unconstrained (or k = m) model, leading to the constraint that

    k ≤ m + 3/2 − √(2m + 9/4).

For example,
• m = 6 implies k ≤ 3,
• m = 7 implies k ≤ 4,
• m = 10 implies k ≤ 6,
• m = 17 implies k ≤ 12.
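The bound can be evaluated directly; a small sketch (implementing exactly the displayed formula) reproduces the four examples above:

```python
import math

def max_factors(m):
    # Largest integer k satisfying k <= m + 3/2 - sqrt(2m + 9/4).
    return math.floor(m + 1.5 - math.sqrt(2 * m + 2.25))

for m in (6, 7, 10, 17):
    print(f"m = {m:2d}  ->  k <= {max_factors(m)}")
```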
10. Ordering of the variables

Alternative orderings are trivially produced via

    y_t* = A y_t

for some switching matrix A.

The new rotation has the same latent factors but transformed loadings matrix Aβ:

    y_t* = Aβ f_t + ε_t = β* f_t + ε_t.

Problem: β* is not necessarily block lower triangular.
11. Solution

We can always find an orthonormal matrix P such that

    β̃ = β* P′ = AβP′

is block lower triangular and the common factors

    f̃_t = P f_t

are still N(0, I_k) (Lopes and West, 2004).

The order of the variables in y_t is immaterial when k is properly chosen, i.e. when β is full-rank.
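In practice such a P can be obtained from an LQ decomposition of the reordered loadings (equivalently, a QR decomposition of their transpose). A sketch with a hypothetical permutation A; the sign convention on the diagonal is our choice, not prescribed by the slide:

```python
import numpy as np

rng = np.random.default_rng(1)
m, k = 6, 3

beta = np.tril(rng.normal(size=(m, k)))   # block lower triangular loadings
A = np.eye(m)[rng.permutation(m)]         # hypothetical reordering matrix
beta_star = A @ beta                      # reordered loadings: no longer triangular

# LQ decomposition beta_star = L P (P orthogonal, L lower triangular),
# computed from the QR factorisation of beta_star'.
Q, R = np.linalg.qr(beta_star.T)
P, L = Q.T, R.T
s = np.sign(np.diag(L[:k]))               # enforce a positive diagonal
L, P = L * s, P * s[:, None]

assert np.allclose(L @ P, beta_star)      # beta_star = L P, i.e. beta_tilde = beta_star P'
assert np.allclose(np.triu(L[:k], 1), 0)  # top k x k block is lower triangular
```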
12. Example 2. Exchange rate data

• Monthly international exchange rates.
• The data span the period from 1/1975 to 12/1986 inclusive.
• Time series are the exchange rates in British pounds of
    • US dollar (US)
    • Canadian dollar (CAN)
    • Japanese yen (JAP)
    • French franc (FRA)
    • Italian lira (ITA)
    • (West) German (Deutsch)mark (GER)
• Example taken from Lopes and West (2004).
13. Posterior inference of (β, Σ)

1st ordering — posterior means E(β|y) and E(Σ|y) = diag:

           β_i1   β_i2   σ_i²
    US     0.99   0.00   0.05
    CAN    0.95   0.05   0.13
    JAP    0.46   0.42   0.62
    FRA    0.39   0.91   0.04
    ITA    0.41   0.77   0.25
    GER    0.40   0.77   0.28

2nd ordering — posterior means E(β|y) and E(Σ|y) = diag:

           β_i1   β_i2   σ_i²
    US     0.98   0.00   0.06
    JAP    0.45   0.42   0.62
    CAN    0.95   0.03   0.12
    FRA    0.39   0.91   0.04
    ITA    0.41   0.77   0.25
    GER    0.40   0.77   0.26
14. Posterior inference of f_t

(Figure: posterior draws of the common factors f_t; plot not recoverable from the extraction.)
15. Reduced-rank loading matrix

Geweke and Singleton (1980) show that, if β has rank r < k, then there exists a matrix Q such that

    βQ = 0 and Q′Q = I

and, for any orthogonal matrix M,

    ββ′ + Σ = (β + MQ′)(β + MQ′)′ + (Σ − MM′).

This translation invariance of Ω under the factor model implies lack of identification and, in application, induces symmetries and potential multimodalities in resulting likelihood functions.

This issue relates intimately to the question of uncertainty of the number of factors.
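The translation invariance is easy to verify numerically: build a rank-deficient β, take Q as an orthonormal basis of its null space, and check that Ω is unchanged. A sketch with illustrative dimensions and an arbitrary small M:

```python
import numpy as np

rng = np.random.default_rng(2)
m, k, r = 5, 3, 2

# Rank-deficient loadings: column 3 is the sum of columns 1 and 2.
B = rng.normal(size=(m, r))
beta = np.hstack([B, B[:, :1] + B[:, 1:2]])
Sigma = np.diag(rng.uniform(0.5, 1.0, size=m))

# Orthonormal basis Q of the null space of beta: beta Q = 0, Q'Q = I.
_, _, Vt = np.linalg.svd(beta)
Q = Vt[r:].T                              # k x (k - r)

M = 0.1 * rng.normal(size=(m, k - r))     # arbitrary (small) M
Omega1 = beta @ beta.T + Sigma
Omega2 = (beta + M @ Q.T) @ (beta + M @ Q.T).T + (Sigma - M @ M.T)

assert np.allclose(beta @ Q, 0)
assert np.allclose(Omega1, Omega2)        # Omega is unchanged
```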
16. Example 2. Multimodality

(Figure: multimodal likelihood surfaces for the exchange rate data; plot not recoverable from the extraction.)
17. Structured loadings

Proposed by Kaiser (1958), the varimax method of orthogonal rotation aims at providing axes with as few large loadings and as many near-zero loadings as possible.

The resulting factors are also known as dedicated factors.

Notice that the method can be applied to classical or Bayesian estimates in order to obtain more interpretable factors.
18. Modern structured factor analysis

• Time-varying factor loadings: dynamic correlations
  Lopes and Carvalho (2007)
• Spatial dynamic factor analysis
  Lopes, Salazar and Gamerman (2008)
• Spatial hierarchical factors: ranking vulnerability
  Lopes, Schmidt, Salazar, Gomez and Achkar (2010)
• Sparse factor models
  Frühwirth-Schnatter and Lopes (2010)
19. Example 1: ML estimates

    i   Measurement         β̂_i1    β̂_i2    σ̂_i²
    1   Picture Language     0.32    0.46    0.68
    2   Friendly Math        0.57    0.59    0.34
    3   Reading              0.56    0.62    0.30
    4   Matrices             0.41    0.51    0.57
    5   Recall digits        0.31    0.29    0.82
    6   Similarities         0.39    0.53    0.57
    7   Word definition      0.43    0.55    0.52
    8   Child Dev 2         -0.67    0.18    0.52
    9   Child Dev 17        -0.69   -0.11    0.51
    10  Child Dev 19        -0.74    0.32    0.35
    11  Child Dev 20        -0.45    0.33    0.69
    12  Child Dev 23        -0.75    0.42    0.26
    13  Child Dev 30        -0.52    0.28    0.66
    14  Child Dev 31        -0.45    0.25    0.73
    15  Child Dev 33        -0.42    0.31    0.73
    16  Child Dev 52         0.41   -0.28    0.76
    17  Child Dev 53        -0.69    0.41    0.36
20. Example 1: varimax rotation

    i   Measurement         β̂_i1    β̂_i2    σ̂_i²
    1   Picture Language     0.00    0.56    0.68
    2   Friendly Math        0.12    0.81    0.34
    3   Reading              0.10    0.83    0.30
    4   Matrices             0.04    0.66    0.57
    5   Recall digits        0.09    0.41    0.82
    6   Similarities         0.01    0.66    0.57
    7   Word definition      0.03    0.69    0.52
    8   Child Dev 2         -0.65   -0.25    0.52
    9   Child Dev 17        -0.50   -0.49    0.51
    10  Child Dev 19        -0.79   -0.17    0.35
    11  Child Dev 20        -0.56    0.01    0.69
    12  Child Dev 23        -0.85   -0.09    0.26
    13  Child Dev 30        -0.58   -0.07    0.66
    14  Child Dev 31        -0.51   -0.05    0.73
    15  Child Dev 33        -0.52    0.02    0.73
    16  Child Dev 52         0.49    0.01    0.76
    17  Child Dev 53        -0.80   -0.06    0.36
21. Example 1: Parsimonious BFA

    i   Measurement         β̃_i1    β̃_i2    σ̃_i²
    1   Picture Language     0       0.57    0.68
    2   Friendly Math        0       0.81    0.34
    3   Reading              0       0.83    0.31
    4   Matrices             0       0.66    0.56
    5   Recall digits        0       0.42    0.83
    6   Similarities         0       0.66    0.56
    7   Word definition      0       0.70    0.51
    8   Child Dev 2         -0.68    0       0.54
    9   Child Dev 17        -0.56    0       0.68
    10  Child Dev 19        -0.81    0       0.35
    11  Child Dev 20        -0.55    0       0.70
    12  Child Dev 23        -0.86    0       0.27
    13  Child Dev 30        -0.59    0       0.66
    14  Child Dev 31        -0.51    0       0.74
    15  Child Dev 33        -0.51    0       0.74
    16  Child Dev 52         0.49    0       0.76
    17  Child Dev 53        -0.80    0       0.36
22. Parsimonious BFA

We introduce a more general set of identifiability conditions for the basic model

    y_t = Λf_t + ε_t,   ε_t ∼ N_m(0, Σ_0),

which handles the ordering problem in a more flexible way:

C1. Λ has full column-rank, i.e. r = rank(Λ).
C2. Λ is a generalized lower triangular matrix, i.e. l_1 < … < l_r, where l_j denotes, for j = 1, …, r, the row index of the top non-zero entry in the jth column of Λ, i.e. Λ_{l_j,j} ≠ 0 and Λ_{ij} = 0 for all i < l_j.
C3. Λ does not contain any column j in which Λ_{l_j,j} is the only non-zero element.
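Conditions C1-C3 are mechanical to check for a given loading matrix. A sketch (the helper `check_conditions` is hypothetical, not from the paper):

```python
import numpy as np

def check_conditions(Lam, tol=1e-12):
    """Check identifiability conditions C1-C3 for a loading matrix Lam."""
    m, r = Lam.shape
    nz = np.abs(Lam) > tol
    if np.linalg.matrix_rank(Lam) != r:                 # C1: full column rank
        return False
    # Row index of the top non-zero entry in each column (m if the column is empty).
    tops = [int(np.argmax(col)) if col.any() else m for col in nz.T]
    if not all(a < b for a, b in zip(tops, tops[1:])):  # C2: l_1 < ... < l_r
        return False
    if any(nz[:, j].sum() == 1 for j in range(r)):      # C3: no single-entry column
        return False
    return True

Lam_ok = np.array([[0.8, 0.0],
                   [0.5, 0.7],
                   [0.3, 0.4],
                   [0.0, 0.6]])
Lam_bad = Lam_ok[:, ::-1]   # top entries in rows 2 then 1: violates C2

print(check_conditions(Lam_ok), check_conditions(Lam_bad))
```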
23. The regression-type representation

Assume that data y = {y_1, …, y_T} were generated by the previous model and that the number of factors r, as well as Λ and Σ_0, should be estimated.

The usual procedure is to fit a model with k factors,

    y_t = βf_t + ε_t,   ε_t ∼ N_m(0, Σ),

where β is an m × k coefficient matrix with elements β_ij and Σ is a diagonal matrix.
24. Theorem

Theorem. Assume that the data were generated by a basic factor model obeying conditions C1-C3 and that a regression-type representation with k ≥ r potential factors is fitted. Assume that the following condition holds for β:

B1. The row indices of the top non-zero entry in each non-zero column of β are different.

Then (r, Λ, Σ_0) are related in the following way to (β, Σ):

(a) r columns of β are identical to the r columns of Λ.
(b) If rank(β) = r, then the remaining k − r columns of β are zero columns. The number of factors is equal to r and Σ_0 = Σ.
(c) If rank(β) > r, then only k − rank(β) of the remaining k − r columns are zero columns, while s = rank(β) − r columns with column indices j_1, …, j_s differ from a zero column by a single element lying in s different rows r_1, …, r_s. The number of factors is equal to r = rank(β) − s, while Σ_0 = Σ + D, where D is an m × m diagonal matrix of rank s with non-zero diagonal elements D_{r_l,r_l} = β²_{r_l,j_l} for l = 1, …, s.
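Case (c) of the theorem can be illustrated numerically: append a spurious single-element column to Λ and absorb its square into the idiosyncratic variances. A sketch with illustrative (hypothetical) values:

```python
import numpy as np

rng = np.random.default_rng(3)
m, r = 5, 2

# A loading matrix satisfying C1-C3 (illustrative values).
Lam = np.array([[0.9, 0.0],
                [0.6, 0.8],
                [0.4, 0.5],
                [0.0, 0.7],
                [0.3, 0.2]])
Sigma0 = np.diag(rng.uniform(0.5, 1.0, size=m))

# Regression-type fit with k = 3: the true columns plus one spurious column
# whose only non-zero entry (0.4) sits in row 3 -- case (c) with s = 1.
spurious = np.zeros((m, 1)); spurious[3, 0] = 0.4
beta = np.hstack([Lam, spurious])
D = np.zeros((m, m)); D[3, 3] = 0.4 ** 2
Sigma = Sigma0 - D                        # so that Sigma0 = Sigma + D

# Both parameterisations imply the same covariance of y_t.
assert np.allclose(beta @ beta.T + Sigma, Lam @ Lam.T + Sigma0)
```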
25. Indicator matrix

Since the identification of r and Λ from β by Theorem 1 relies on identifying zero and non-zero elements in β, we follow previous work on parsimonious factor modeling and consider the selection of the elements of β as a variable selection problem.

We introduce for each element β_ij a binary indicator δ_ij, define β_ij to be 0 if δ_ij = 0, and leave β_ij unconstrained otherwise.

In this way, we obtain an indicator matrix δ of the same dimension as β.
26. Number of factors

Theorem 1 allows the identification of the true number of factors r directly from the indicator matrix δ:

    r = Σ_{j=1}^k I{ Σ_{i=1}^m δ_ij > 1 },

where I{·} is the indicator function, by taking spurious factors into account.

This expression is invariant to permuting the columns of δ, which is helpful for MCMC-based inference with respect to r.

Our approach provides a principled way for inference on r, as opposed to previous work based on rather heuristic procedures to infer this quantity (Carvalho et al., 2008; Bhattacharya and Dunson, 2009).
27. Model size

The model size d is defined as the number of non-zero elements in Λ, i.e.

    d = Σ_{j=1}^k d_j I{ d_j > 1 },   d_j = Σ_{i=1}^m δ_ij.

Model size could be estimated by the posterior mode d̃ or the posterior mean E(d|y) of p(d|y).
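Both r and d are simple functions of the indicator matrix δ. A sketch on a hypothetical 5 × 4 indicator matrix with one empty column and one spurious (single-entry) column, neither of which counts towards the number of factors:

```python
import numpy as np

# Hypothetical indicator matrix delta (m = 5 measurements, k = 4 columns).
delta = np.array([[1, 0, 0, 0],
                  [1, 1, 0, 0],
                  [0, 1, 0, 1],   # column 4 is spurious (single entry)
                  [1, 1, 0, 0],   # column 3 is empty
                  [1, 0, 0, 0]])

d_j = delta.sum(axis=0)              # non-zero loadings per column
r = int(np.sum(d_j > 1))             # number of factors: columns with > 1 entry
d = int(np.sum(d_j * (d_j > 1)))     # model size: entries in non-spurious columns

print(r, d)  # 2 7
```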
28. The Prior of the Indicator Matrix

To define p(δ), we use a hierarchical prior which allows different occurrence probabilities τ = (τ_1, …, τ_k) of non-zero elements in the different columns of β, and assume that all indicators are independent a priori given τ:

    Pr(δ_ij = 1 | τ_j) = τ_j,   τ_j ∼ B(a_0, b_0).

A priori, r may be represented as r = Σ_{j=1}^k I{X_j > 1}, where X_1, …, X_k are independent random variables and X_j follows a Beta-binomial distribution with parameters N_j = m − j + 1, a_0 and b_0.

We recommend tuning the hyperparameters a_0 and b_0, for a given data set with known values m and k, in such a way that p(r | a_0, b_0, m, k) is in accordance with prior expectations concerning the number of factors.
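The implied prior p(r | a_0, b_0, m, k) is easy to approximate by Monte Carlo, drawing τ_j ∼ B(a_0, b_0) and then X_j ∼ Binomial(N_j, τ_j); this gives a practical way to tune (a_0, b_0). A sketch (the function name and the illustrative m, k, a_0, b_0 are ours):

```python
import numpy as np

def prior_r(m, k, a0, b0, n_draws=20_000, seed=0):
    """Monte Carlo approximation of p(r | a0, b0, m, k)."""
    rng = np.random.default_rng(seed)
    Nj = m - np.arange(1, k + 1) + 1           # N_j = m - j + 1
    tau = rng.beta(a0, b0, size=(n_draws, k))  # column occurrence probabilities
    X = rng.binomial(Nj, tau)                  # X_j ~ Beta-binomial(N_j, a0, b0)
    r = (X > 1).sum(axis=1)                    # columns that count as real factors
    return np.bincount(r, minlength=k + 1) / n_draws

# e.g. m = 10, k = 5: adjust (a0, b0) until this matches prior beliefs about r
print(prior_r(10, 5, a0=1, b0=1).round(3))
```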
29. The Prior on the Idiosyncratic Variances

Heywood problems typically occur if the constraint

    1/σ_i² ≥ (Ω⁻¹)_ii  ⇔  σ_i² ≤ 1/(Ω⁻¹)_ii

is violated, where the matrix Ω is the covariance matrix of y_t.

Improper priors on the idiosyncratic variances, such as

    p(σ_i²) ∝ 1/σ_i²,

are not able to prevent Heywood problems.
30. We assume instead a proper inverted Gamma prior

    σ_i² ∼ G⁻¹(c_0, C_i0)

for each of the idiosyncratic variances σ_i².

c_0 is large enough to bound the prior away from 0, typically c_0 = 2.5.

C_i0 is chosen by controlling the probability of a Heywood problem,

    Pr(X ≤ C_i0 (Ω⁻¹)_ii),

where X ∼ G(c_0, 1). So C_i0 is the largest value for which

    C_i0/(c_0 − 1) ≤ 1/(S_y⁻¹)_ii,

or

    σ_i² ∼ G⁻¹(c_0, (c_0 − 1)/(S_y⁻¹)_ii).
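Computing C_i0 is a one-liner once the sample covariance S_y stands in for Ω. A sketch with simulated (hypothetical) data; the check confirms that the prior mean C_i0/(c_0 − 1) sits exactly at the Heywood bound:

```python
import numpy as np

rng = np.random.default_rng(4)
c0 = 2.5                                     # bounds the prior away from zero
y = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))  # correlated mock data
Sy = np.cov(y, rowvar=False)                 # sample covariance of y_t

# Largest Ci0 with Ci0/(c0 - 1) <= 1/(Sy^-1)_ii, one value per variable.
Ci0 = (c0 - 1) / np.diag(np.linalg.inv(Sy))

# Prior mean of sigma_i^2 under G^-1(c0, Ci0) equals the Heywood bound.
assert np.allclose(Ci0 / (c0 - 1), 1 / np.diag(np.linalg.inv(Sy)))
```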
31. The Prior on the Factor Loadings

We assume that the rows of the coefficient matrix β are independent a priori given the factors f_1, …, f_T.

Let β_i·^δ be the vector of unconstrained elements in the ith row of β corresponding to δ. For each i = 1, …, m, we assume that

    β_i·^δ | σ_i² ∼ N(b_i0^δ, B_i0^δ σ_i²).

The prior moments are either chosen as in Lopes and West (2004) or as in Ghosh and Dunson (2009), who considered a "unit scale" prior where b_i0^δ = 0 and B_i0^δ = I.
32. Fractional prior

We use a fractional prior (O'Hagan, 1995), which has been applied by several authors for variable selection in latent variable models³.

The fractional prior can be interpreted as the posterior of a non-informative prior and a fraction b of the data.

We consider a conditionally fractional prior for the "regression model"

    ỹ_i = X_i^δ β_i·^δ + ε̃_i,

where ỹ_i = (y_i1 ⋯ y_iT)′ and ε̃_i = (ε_i1 ⋯ ε_iT)′. X_i^δ is a regressor matrix for β_i·^δ constructed from the latent factor matrix F = (f_1 ⋯ f_T)′.

³ Smith & Kohn, 2002; Frühwirth-Schnatter & Tüchler, 2008; Tüchler, 2008.
33. Using

    p(β_i·^δ | σ_i²) ∝ p(ỹ_i | β_i·^δ, σ_i²)^b,

we obtain from the regression model:

    β_i·^δ | σ_i² ∼ N(b_iT, B_iT σ_i²/b),

where b_iT and B_iT are the posterior moments under a non-informative prior:

    B_iT = ((X_i^δ)′ X_i^δ)⁻¹,   b_iT = B_iT (X_i^δ)′ ỹ_i.

It is not entirely clear how to choose the fraction b for a factor model. If the regressors f_1, …, f_T were observed, then we would deal with m independent regression models, for each of which T observations are available, and the choice b = 1/T would be appropriate.
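Conditional on the factors, the fractional-prior moments are ordinary least-squares quantities. A sketch treating a hypothetical factor matrix as observed (all values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
T, q = 200, 2                        # T observations, q free loadings in row i
b = 1.0 / T                          # fraction of the data (observed-regressor case)

F = rng.normal(size=(T, q))          # latent factors, treated here as known
beta_true = np.array([0.8, -0.5])    # hypothetical loadings for row i
yi = F @ beta_true + 0.3 * rng.normal(size=T)

# Posterior moments under a non-informative prior ...
BiT = np.linalg.inv(F.T @ F)
biT = BiT @ F.T @ yi
# ... give the fractional prior beta_i | sigma_i^2 ~ N(biT, BiT sigma_i^2 / b):
prior_cov_scale = BiT / b            # covariance scale inflated by 1/b

print(biT.round(2))                  # close to beta_true
```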
34. The factors, however, are latent and are estimated together with the other parameters. This ties the m regression models together.

If we consider the multivariate regression model as a whole, then the total number N = mT of observations has to be taken into account, which motivates choosing b_N = 1/(Tm).

In cases where the number of regressors d is of the same magnitude as the number of observations, Ley & Steel (2009) recommend choosing instead the risk inflation criterion b_R = 1/d² suggested by Foster & George (1994), because b_N implies a fairly small penalty for model size and may lead to overfitting models.

In the present context this implies choosing b_R = 1/d(k, m)², where d(k, m) = km − k(k − 1)/2 is the number of free elements in the coefficient matrix β.
35. Most visited configuration

Let l* = (l_1, …, l_{r_M}) be the most visited configuration.

We may then estimate, for each indicator, the marginal inclusion probability

    Pr(δ_ij^Λ = 1 | y, l*)

under l* as the average over the sampled indicator matrices δ^Λ.

Note that Pr(δ_{l_j,j}^Λ = 1 | y, l*) = 1 for j = 1, …, r_M.

Following Scott and Berger (2006), we determine the median probability model (MPM) by setting the indicators δ_ij^Λ in δ_M^Λ to 1 iff Pr(δ_ij^Λ = 1 | y, l*) ≥ 0.5.
36. The number of non-zero top elements r_M in the identifiability constraint l* is a third estimator of the number of factors, while the number d_M of non-zero elements in the MPM is yet another estimator of model size.

A discrepancy between the various estimators of the number of factors r is often a sign of a weakly informative likelihood, and it might help to use a more informative prior p(r) by choosing the hyperparameters a_0 and b_0 accordingly.

Also, the structure of the indicator matrices δ_H^Λ and δ_M^Λ corresponding, respectively, to the HPM and the MPM may differ, in particular if the frequency p_H is small and some of the inclusion probabilities Pr(δ_ij^Λ = 1 | y, l*) are close to 0.5.
37. MCMC

Highly efficient MCMC scheme:

(a) Sample from p(δ | f_1, …, f_T, τ, y):
    (a-1) Try to turn all non-zero columns of δ into zero columns.
    (a-2) Update the indicators jointly for each of the remaining non-zero columns j of δ:
        (a-2-1) try to move the top non-zero element l_j;
        (a-2-2) update the indicators in the rows l_j + 1, …, m.
    (a-3) Try to turn all (or at least some) zero columns of δ into a non-zero column.
(b) Sample from p(β, σ_1², …, σ_m² | δ, f_1, …, f_T, y).
(c) Sample from p(f_1, …, f_T | β, σ_1², …, σ_m², y).
(d) Perform an acceleration step (expanded factor model).
(e) For each j = 1, …, k, perform a random sign switch: substitute the draws of {f_jt}_{t=1}^T and {β_ij}_{i=j}^m with probability 0.5 by {−f_jt}_{t=1}^T and {−β_ij}_{i=j}^m; otherwise leave these draws unchanged.
(f) Sample τ_j for j = 1, …, k from τ_j | δ ∼ B(a_0 + d_j, b_0 + p_j − d_j), where p_j is the number of free elements and d_j = Σ_{i=1}^m δ_ij is the number of non-zero elements in column j.

To generate sensible values for the latent factors in the initial model specification, we found it useful to run the first few steps without variable selection.
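Step (e) is cheap to implement. A simplified sketch that flips entire columns of β (rather than only rows j, …, m of the triangular parameterisation) together with the corresponding factor draws, leaving the likelihood unchanged since βf_t = (−β)(−f_t):

```python
import numpy as np

def random_sign_switch(f, beta, rng):
    """For each factor j, flip the signs of the factor draws and the
    corresponding loadings with probability 1/2 (simplified version)."""
    k = beta.shape[1]
    s = rng.choice([-1.0, 1.0], size=k)   # one random sign per factor
    return f * s, beta * s

rng = np.random.default_rng(6)
f = rng.normal(size=(100, 3))             # T x k factor draws
beta = rng.normal(size=(8, 3))            # m x k loadings draw
f2, beta2 = random_sign_switch(f, beta, rng)

# beta f_t is invariant under the joint switch
assert np.allclose(f2 @ beta2.T, f @ beta.T)
```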
38. Adding a new factor

    X 0 0         X 0 0 0
    X 0 0         X 0 0 0
    X 0 X         X 0 X 0
    X X X   =⇒   X X X 0
    X X X         X X X X
    X X X         X X X 0
    X X X         X X X X
39. Removing a factor

    X 0 0 0       X 0 0 0       X 0 0
    X 0 0 0       X 0 0 0       X 0 0
    X 0 X 0       X 0 X 0       X 0 X
    X X X 0  ⇒   X X X 0  ⇒   X X X
    X X X X       X X X X       X X X
    X X X 0       X X X 0       X X X
    X X X X       X X X 0       X X X
40. Example 3: Maxwell's Data

Scores on 10 tests for a sample of T = 148 children attending a psychiatric clinic, as well as a sample of T = 810 normal children.

The first five tests are cognitive tests: (1) verbal ability, (2) spatial ability, (3) reasoning, (4) numerical ability and (5) verbal fluency. The remaining tests are inventories for assessing orectic tendencies, namely (6) neurotic questionnaire, (7) ways to be different, (8) worries and anxieties, (9) interests and (10) annoyances.

While psychological theory suggests that a 2-factor model is sufficient to account for the variation between the test scores, the significance test considered in Maxwell (1961) suggested fitting a 3-factor model to the first and a 4-factor model to the second data set.
41. Table: Maxwell's Children Data - neurotic children; posterior distribution p(r|y) of the number r of factors (bold number corresponding to the posterior mode r̃) and highest posterior identifiability constraint l* with corresponding posterior probability p(l*|y) for various priors; number of visited models N_v; frequency p_H, number of factors r_H, and model size d_H of the HPM; number of factors r_M and model size d_M of the MPM corresponding to l*; inefficiency factor τ_d of the posterior draws of the model size d.

                        p(r|y)
    Prior               2      3      4      5-6    l*          p(l*|y)
    b = 10⁻³            0.755  0.231  0.014  0      (1,6)       0.532
    b_N = 6.8·10⁻⁴      0.828  0.160  0.006  0      (1,6)       0.623
    b_R = 4.9·10⁻⁴      0.871  0.127  0.001  0      (1,6)       0.589
    b = 10⁻⁴            0.897  0.098  0.005  0      (1,6)       0.802
    GD                  0.269  0.482  0.246  0.003  (1,2,3)     0.174
    LW                  0.027  0.199  0.752  0.023  (1,2,3,6)   0.249

    Prior               N_v    p_H    r_H    d_H    r_M    d_M    τ_d
    b = 10⁻³            1472   0.20   2      12     2      12     30.9
    b_N = 6.8·10⁻⁴      976    0.27   2      12     2      12     27.5
    b_R = 4.9·10⁻⁴      768    0.34   2      12     2      12     22.6
    b = 10⁻⁴            421    0.45   2      12     2      12     18.1
    GD                  4694   0.06   2      15     3      19     40.6
    LW                  7253   0.01   4      24     4      24     32.5
42. Table: Maxwell's Children Data - normal children; posterior distribution p(r|y) of the number r of factors (bold number corresponding to the posterior mode r̃) and highest posterior identifiability constraint l* with corresponding posterior probability p(l*|y) for various priors; number of visited models N_v; frequency p_H, number of factors r_H, and model size d_H of the HPM; number of factors r_M and model size d_M of the MPM corresponding to l*; inefficiency factor τ_d of the posterior draws of the model size d.

                        p(r|y)
    Prior               3      4      5      6      l*            p(l*|y)
    b = 10⁻³            0      0.391  0.604  0.005  (1,2,4,5,6)   0.254
    b_N = 1.2·10⁻⁴      0      0.884  0.116  0      (1,2,4,5)     0.366
    b = 10⁻⁴            0      0.891  0.104  0.005  (1,2,4,5)     0.484
    GD                  0      0.396  0.594  0      (1,2,4,5,6)   0.229
    LW                  0      0.262  0.727  0.011  (1,2,4,5,6)   0.259

    Prior               N_v    p_H    r_H    d_H    r_M    d_M    τ_d
    b = 10⁻³            4045   1.79   5      26     5      27     30.1
    b_N = 1.2·10⁻⁴      1272   11.41  4      23     4      23     28.5
    b = 10⁻⁴            1296   12.17  4      23     4      23     29.1
    GD                  4568   1.46   5      29     5      28     32.7
    LW                  5387   1.56   5      28     5      28     30.7
46. Example 1: revisited

Outcome system:
• Education: A-level or above (i.e., Diploma or Degree)
• Health (age 30): self-reported poor health, obesity, daily smoking
• Log hourly wage (age 30)

Measurement system (age 10):
• Cognitive tests: 7 scales
• Personality traits: 5 sub-scales
=⇒ see next slide

Control variables: mother's age and education at birth, father's high social class at birth, total gross family income and number of siblings at age 10, broken family, number of previous livebirths.

Exclusions in choice equation: deviation of gender-specific county-level unemployment rate and of gross weekly wage from their long-run average wages, for the years 1987-88.
47. Measurement system: 126 items (age 10)

Cognitive tests (continuous scales):
• Picture Language Comprehension Test
• Friendly Math Test
• Shortened Edinburgh Reading Test
• British Ability Scales (BAS):
    • BAS - Matrices
    • BAS - Recall Digits
    • BAS - Similarities
    • BAS - Word Definition

Personality trait scales:
• Rutter Parental 'A' Scale of Behavioral Disorder: 19 continuous items
• Conners Hyperactivity Scale: 19 continuous items
• Child Developmental Scale [CDS]: 53 continuous items
• Locus of Control Scale [LoC]: 16 binary items
• Self-Esteem Scale [S-E]: 12 binary items
48. Model ingredients

(Figure: diagram of the model ingredients; not recoverable from the extraction.)
52. Conclusion

• Summary
    • New take on factor model identifiability
    • Generalized triangular parametrization
    • Customized MCMC scheme
    • Highly efficient model search strategy
    • Posterior distribution for the number of common factors

• Extensions
    • Nonlinear (discrete choice) factor models
    • Non-normal (mixture of) factor models
    • Dynamic factor models
53. Thank you!