3. Introduction
• Parameter uncertainty and learning are ubiquitous in finance.
• Parameters (or certain states) of financial models are never known with certainty.
• Participants in financial markets acquire substantial information over time and
learn about the true parameters or states of the economy.
• Furthermore, they may want to use non-sample information in their decision
making process.
• Bayesian portfolio analysis allows the investor to combine proprietary beliefs,
parameter uncertainty and learning in a dynamic decision-making process.
• Whereas dynamic decision making under conditions of risk is by now quite well
understood, the impact of parameter uncertainty and learning is not.
• Bayesian portfolio analysis is extended to a multi-period dynamic setting, and
conditions for learning in dynamic models are established.
• The Bayesian learning process for conditionally normal linear models is shown to
be the Bayesian version of the well known Kalman filter.
• The learning process for the unobservable regime of a regime-switching model is
shown to be the Bayesian version of Hamilton′s basic filter.
4. Bayesian Portfolio Analysis (1)
• Bayesian portfolio analysis has a long tradition in finance.
• The literature includes:
– Uninformative prior approach.1
– Informative prior approach.2
– Shrinkage models (such as James-Stein estimators and Bayes-Stein).3
– Mixed estimation and the Black-Litterman Model.4
– Prior beliefs in an Asset Pricing Theory.5
– Prior beliefs in no-predictability in forecasting models.6
– Model uncertainty and Bayesian model selection and averaging.7
• All these approaches are deeply rooted in the theory of Bayesian analysis.
• Significant simplification can be achieved for a wide class of conditionally normal
linear models by applying conjugate informative priors or diffuse (uninformative)
Jeffreys′ priors.
5. Bayesian Portfolio Analysis (2)
• The classical portfolio selection problem:8
\[
\max_{\omega} E_T\left[U(W_{T+1})\right] = \max_{\omega} \int_{\Omega} U(W_{T+1})\, p(r_{T+1} \mid \theta)\, dr_{T+1}, \tag{1}
\]
where $\Omega$ is the sample space, $U(W_{T+1})$ is a utility function, $W_{T+1}$ is the wealth
at time $T+1$, $\theta$ is a set of parameters, $\omega$ are the portfolio weights, and $p(r_{T+1} \mid \theta)$ is
the sample density of returns.
• Bayesian portfolio selection problem:9
\[
\max_{\omega} E_T\left[U(W_{T+1})\right] = \max_{\omega} \int_{\Omega} U(W_{T+1})\, p(r_{T+1} \mid \Phi_T)\, dr_{T+1}, \tag{2}
\]
where $\Phi_T$ is the information available up to time $T$ and $p(r_{T+1} \mid \Phi_T)$ is the Bayesian
predictive distribution (density) of asset returns.
6. Bayesian Portfolio Analysis (3)
• Bayesian decomposition, Bayes′ rule and Fubini′s theorem:10
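\begin{align}
E_T\left[U(W_{T+1})\right] &= \int_{\Omega} U(W_{T+1})\, p(r_{T+1} \mid \Phi_T)\, dr_{T+1} \tag{3} \\
&= \int_{\Omega \times \Theta} U(W_{T+1})\, p(r_{T+1}, \theta \mid \Phi_T)\, d(r_{T+1}, \theta) \notag \\
&= \int_{\Omega} \int_{\Theta} U(W_{T+1})\, p(r_{T+1} \mid \theta)\, p(\theta \mid \Phi_T)\, d\theta\, dr_{T+1} \notag \\
&\propto \int_{\Omega} U(W_{T+1}) \left[ \int_{\Theta} p(r_{T+1} \mid \theta)\, p(\Phi_T \mid \theta)\, p(\theta)\, d\theta \right] dr_{T+1}, \notag
\end{align}
where $\Theta$ is the parameter space, $p(r_{T+1}, \theta \mid \Phi_T)$ is the joint density of parameters
and realizations, $p(\theta \mid \Phi_T)$ is the posterior density, $p(\Phi_T \mid \theta)$ is the conditional
likelihood, and $p(\theta)$ is the prior density of the parameters. The last line follows from
Bayes′ rule, $p(\theta \mid \Phi_T) \propto p(\Phi_T \mid \theta)\, p(\theta)$, which holds up to the normalizing
constant $1/p(\Phi_T)$.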
• There are many ways to derive the predictive distribution.
7. Bayesian Portfolio Analysis (4)
• Uninformative prior approach
The model:11
\[ r_t = \mu + u_t \tag{4} \]
\[ u_t \sim \text{i.i.d. } N(0\iota, \Sigma) \tag{5} \]
where $r_t$ is the m × 1 vector of asset returns at time t, µ is the m × 1 vector of
unknown means, $u_t$ is the m × 1 vector of disturbances, ι is an m × 1 vector of ones, and
Σ is a positive definite symmetric (PDS) covariance matrix. The model has sample density:
\[ p(r_t \mid \mu, \Sigma) = N(\mu, \Sigma). \tag{6} \]
With an uninformative prior density $p(\mu) = \iota c$ and assuming Σ known, the posterior
is given by12
\[ p(\mu \mid \Phi_T, \Sigma) = N\left(\hat{\mu}, \tfrac{1}{T}\Sigma\right), \tag{7} \]
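where $\Phi_T$ is the information history and
\[ \hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} r_t. \tag{8} \]
The predictive density of the one period ahead returns is13
\[ p(r_{T+1} \mid \Phi_T, \Sigma) = N\left(\hat{\mu}, \Sigma + \tfrac{1}{T}\Sigma\right). \tag{9} \]
Following the variance decomposition, the k-period ahead predictive density is14
\[ p(r_{T,T+k} \mid \Phi_T, \Sigma) = N\left(k\hat{\mu}, k\Sigma + \tfrac{k^2}{T}\Sigma\right) = N\left(k\hat{\mu}, \tfrac{k(T+k)}{T}\Sigma\right). \tag{10} \]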
Parameter uncertainty has a large impact on the predictive density at long horizons,
as noted by Barberis (2000): the predictive covariance in (10) grows with k(T + k)/T
rather than linearly in k.
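The horizon effect in (10) can be illustrated numerically. The following is a minimal sketch, assuming Σ is known and the diffuse prior above; the function name `k_period_predictive` and the simulated inputs are hypothetical and not part of the original material.

```python
import numpy as np

def k_period_predictive(returns, sigma, k):
    """Mean and covariance of the k-period predictive density in (10)."""
    returns = np.asarray(returns, dtype=float)    # shape (T, m)
    sigma = np.asarray(sigma, dtype=float)        # known covariance, shape (m, m)
    T = returns.shape[0]
    mu_hat = returns.mean(axis=0)                 # sample mean, eq. (8)
    mean = k * mu_hat
    cov = (k * (T + k) / T) * sigma               # k*Sigma + (k^2/T)*Sigma
    return mean, cov

# Hypothetical inputs: 60 simulated observations on two assets.
rng = np.random.default_rng(0)
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
sample = rng.multivariate_normal([0.05, 0.08], sigma, size=60)

mean_1, cov_1 = k_period_predictive(sample, sigma, k=1)
mean_10, cov_10 = k_period_predictive(sample, sigma, k=10)

# Predictive variance relative to the known-parameter benchmark k*Sigma.
inflation_10 = cov_10[0, 0] / (10 * sigma[0, 0])
print(inflation_10)
```

The ratio equals 1 + k/T, so for a fixed sample size the contribution of parameter uncertainty grows with the investment horizon.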
9. Bayesian Portfolio Analysis (5)
• Informative prior approach
The model is the same as before, but here the prior is a multivariate normal
density $p(\mu) = N(m_0, \Lambda_0)$, where $m_0$ is the m × 1 vector of prior means
and $\Lambda_0$ is an m × m matrix of prior uncertainty. Σ is again assumed to be known.
Then, the posterior of µ is15
\[ p(\mu \mid \Phi_T, \Sigma) = N(m_T, \Lambda_T) \tag{11} \]
\[ m_T = \left(\Lambda_0^{-1} + T\Sigma^{-1}\right)^{-1}\left(\Lambda_0^{-1} m_0 + T\Sigma^{-1}\hat{\mu}\right) \tag{12} \]
\[ \Lambda_T = \left(\Lambda_0^{-1} + T\Sigma^{-1}\right)^{-1}. \tag{13} \]
The predictive density of one period ahead asset returns is given by
\[ p(r_{T+1} \mid \Phi_T, \Sigma) = N(m_T, \Sigma + \Lambda_T). \tag{14} \]
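A minimal numerical sketch of (12) to (14), assuming Σ is known. The helper name `posterior_known_cov` and all input values are illustrative assumptions.

```python
import numpy as np

def posterior_known_cov(mu_hat, T, sigma, m0, lam0):
    """Posterior mean and covariance of mu under the conjugate normal prior."""
    lam0_inv = np.linalg.inv(lam0)
    sigma_inv = np.linalg.inv(sigma)
    lam_T = np.linalg.inv(lam0_inv + T * sigma_inv)              # eq. (13)
    m_T = lam_T @ (lam0_inv @ m0 + T * sigma_inv @ mu_hat)       # eq. (12)
    return m_T, lam_T

# Hypothetical inputs for two assets and 24 observations.
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
m0 = np.array([0.03, 0.03])          # prior means
lam0 = 0.01 * np.eye(2)              # prior uncertainty
mu_hat = np.array([0.06, 0.10])      # sample means

m_T, lam_T = posterior_known_cov(mu_hat, 24, sigma, m0, lam0)
pred_mean, pred_cov = m_T, sigma + lam_T                         # eq. (14)
```

With a very diffuse prior (large Λ0) the posterior covariance approaches Σ/T, which recovers the uninformative-prior result in (7).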
10. Bayesian Portfolio Analysis (6)
• Posterior Shrinkage16
The mean of the posterior in (12) can be written in shrinkage form:
\[ m_T = \delta m_0 + (I - \delta)\hat{\mu}, \tag{15} \]
where I is an m × m identity matrix with principal diagonal elements of one and
zeros elsewhere. δ is called the posterior shrinkage factor. It can be obtained
using matrix algebra and can be shown to be17
\begin{align}
\delta &= \left(\Lambda_0^{-1} + T\Sigma^{-1}\right)^{-1}\Lambda_0^{-1} \tag{16} \\
&= \left([\text{prior covariance}]^{-1} + [\text{conditional covariance}]^{-1}\right)^{-1}[\text{prior covariance}]^{-1} \notag \\
&= [\text{posterior covariance}][\text{prior covariance}]^{-1}. \notag
\end{align}
The shrinkage target $m_0$ and $\Lambda_0$ can be obtained from a minimum variance
portfolio, reverse optimization, or any other economically reasonable information the
investor might have prior to seeing the data.
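The equivalence between the shrinkage form (15) and the posterior mean (12) can be checked numerically. This is a sketch with hypothetical inputs; the variable names are assumptions for illustration.

```python
import numpy as np

# Hypothetical inputs for two assets.
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
lam0 = np.array([[0.02, 0.00], [0.00, 0.01]])
m0 = np.array([0.03, 0.04])
mu_hat = np.array([0.06, 0.10])
T = 24

lam0_inv = np.linalg.inv(lam0)
sigma_inv = np.linalg.inv(sigma)
lam_T = np.linalg.inv(lam0_inv + T * sigma_inv)          # posterior covariance

delta = lam_T @ lam0_inv                                 # shrinkage factor, eq. (16)
m_shrink = delta @ m0 + (np.eye(2) - delta) @ mu_hat     # eq. (15)
m_direct = lam_T @ (lam0_inv @ m0 + T * sigma_inv @ mu_hat)   # eq. (12)
```

Note that δ is a matrix here, so each posterior mean is in general a combination of all prior means and all sample means, not only its own.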
11. Bayesian Portfolio Analysis (7)
• Bayes-Stein estimator
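The mean of the posterior is the same as in posterior shrinkage above. The
Bayes-Stein estimator introduced by Jorion (1986) uses the minimum variance
portfolio as a shrinkage target:
\[ p(\mu) = N\left(\mu_0\iota, \tfrac{1}{\kappa}\Sigma\right) \tag{17} \]
\[ \mu_0 = \frac{\iota'\Sigma^{-1}}{\iota'\Sigma^{-1}\iota}\,\hat{\mu}. \]
Jorion (1986) obtains a posterior of the form:
\[ p(\mu \mid \Phi_T, \Sigma) = N(m_T, \Lambda_T) \tag{18} \]
\[ m_T = \left(\kappa\Sigma^{-1} + T\Sigma^{-1}\right)^{-1}\left(\kappa\Sigma^{-1}\mu_0\iota + T\Sigma^{-1}\hat{\mu}\right) \]
\[ \Lambda_T = \left(\kappa\Sigma^{-1} + T\Sigma^{-1}\right)^{-1}. \]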
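and in shrinkage form, this is
\[ m_T = \delta\mu_0\iota + (1 - \delta)\hat{\mu} \tag{19} \]
\[ \delta = \frac{\kappa}{\kappa + T}. \tag{20} \]
The predictive density for one period ahead returns is
\begin{align}
p(r_{T+1} \mid \Phi_T, \Sigma) &= N(m_T, \Sigma + \Lambda_T) \tag{21} \\
&= N\left(m_T, \left(1 + \frac{1}{\kappa + T}\right)\Sigma\right). \tag{22}
\end{align}
Because the shrinkage target $\mu_0$ has minimum variance $\iota\iota'/(\iota'\Sigma^{-1}\iota)$, Jorion also finds
that
\[ p(r_{T+1} \mid \Phi_T, \Sigma) = N\left(m_T, \left(1 + \frac{1}{\kappa + T}\right)\Sigma + \frac{\kappa}{T(T + 1 + \kappa)}\,\frac{\iota\iota'}{\iota'\Sigma^{-1}\iota}\right). \tag{23} \]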
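The Bayes-Stein quantities in (17) to (23) can be sketched as follows. Here κ is passed in as a user-supplied prior precision, which is an assumption of this sketch (it says nothing about how κ should be chosen), and the function name `bayes_stein` is hypothetical.

```python
import numpy as np

def bayes_stein(mu_hat, sigma, T, kappa):
    """Shrinkage target, factor, posterior mean and predictive covariance."""
    iota = np.ones(len(mu_hat))
    sigma_inv = np.linalg.inv(sigma)
    denom = iota @ sigma_inv @ iota
    mu0 = (iota @ sigma_inv @ mu_hat) / denom            # target below eq. (17)
    delta = kappa / (kappa + T)                          # eq. (20)
    m_T = delta * mu0 * iota + (1.0 - delta) * mu_hat    # eq. (19)
    pred_cov = ((1.0 + 1.0 / (kappa + T)) * sigma
                + kappa / (T * (T + 1 + kappa)) * np.outer(iota, iota) / denom)  # eq. (23)
    return mu0, delta, m_T, pred_cov

# Hypothetical inputs.
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
mu_hat = np.array([0.06, 0.10])
mu0, delta, m_T, pred_cov = bayes_stein(mu_hat, sigma, T=48, kappa=12.0)
```

Because the prior covariance is proportional to Σ, the matrix shrinkage factor of (16) collapses to the scalar δ = κ/(κ + T).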
13. Bayesian Portfolio Analysis (8)
• Optimal shrinkage approach (James-Stein estimator)
James & Stein (1961) define a loss function based on the estimate $\hat{\mu}_s$ such that
(Jorion, 1986, p. 283)
\[ L(\mu, \hat{\mu}_s) = (\mu - \hat{\mu}_s)'\Sigma^{-1}(\mu - \hat{\mu}_s). \tag{24} \]
They find that for a shrinkage target $\mu_0$ and prior variance $\lambda_0^2$, such that the prior
is $p(\mu) = N(\mu_0\iota, \lambda_0^2 I)$, the posterior is given by
\[ p(\hat{\mu}_s \mid \delta^*, \Sigma) = N(m_s, \Lambda_s) \tag{25} \]
\[ m_s = \delta^*\mu_0\iota + (1 - \delta^*)\hat{\mu} \]
\[ \Lambda_s = \delta^*(\lambda_0^2 I), \]
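where $\delta^*$ is called the optimal shrinkage factor
\[ \delta^* = \min\left\{1, \frac{(m - 2)/T}{(\hat{\mu} - \mu_0\iota)'\Sigma^{-1}(\hat{\mu} - \mu_0\iota)}\right\}, \tag{26} \]
and
\[ \hat{\mu} = \frac{1}{T}\sum_{t=1}^{T} r_t. \tag{27} \]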
Although the James-Stein estimator is usually used as a point estimate, it can be
shown that the predictive density of one period ahead asset returns is
\[ p(r_{T+1} \mid \delta^*, \Sigma) = N(m_s, \Sigma + \Lambda_s). \tag{28} \]
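The optimal shrinkage factor in (26) and the shrunk mean in (25) can be computed as in the sketch below. The function name `james_stein` and the inputs are hypothetical; the guard for a zero distance is an added assumption to avoid division by zero.

```python
import numpy as np

def james_stein(mu_hat, sigma, T, mu0):
    """Optimal shrinkage factor (26) and shrunk mean from (25)."""
    m = len(mu_hat)
    iota = np.ones(m)
    d = mu_hat - mu0 * iota
    dist = d @ np.linalg.inv(sigma) @ d
    # If the sample mean coincides with the target, shrink fully.
    delta_star = 1.0 if dist == 0.0 else min(1.0, ((m - 2) / T) / dist)
    m_s = delta_star * mu0 * iota + (1.0 - delta_star) * mu_hat
    return delta_star, m_s

# Hypothetical inputs: three assets, 60 observations, target 0.08.
sigma = 0.04 * np.eye(3)
mu_hat = np.array([0.05, 0.08, 0.11])
delta_star, m_s = james_stein(mu_hat, sigma, T=60, mu0=0.08)
```

The factor is positive only for m ≥ 3, and it shrinks more strongly when the sample means lie close to the target relative to their sampling variability.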
15. Bayesian Portfolio Analysis (9)
• Mixed estimation18
Let the sample density of returns be multivariate normal,
\[ p(r_t \mid \mu, \Sigma) = N(\mu, \Sigma), \tag{29} \]
and let the prior density for the m × 1 vector µ also be multivariate normal,
\[ p(\mu) = N(m_0, \Lambda_0). \tag{30} \]
The investor expresses views about µ by imposing
\[ p(v \mid \mu) = N(P\mu, \Omega), \tag{31} \]
where P is an m × m design matrix that selects and combines returns into
portfolios about which the investor is able to express views, v is an m × 1
vector of views, and Ω expresses the uncertainty of those views. It emerges that
the posterior of µ updated by the views is
\[ p(\mu \mid v) = N(m_v, \Lambda_v) \tag{32} \]
\[ m_v = \left(\Lambda_0^{-1} + P'\Omega^{-1}P\right)^{-1}\left(\Lambda_0^{-1}m_0 + P'\Omega^{-1}v\right) \]
\[ \Lambda_v = \left(\Lambda_0^{-1} + P'\Omega^{-1}P\right)^{-1}. \]
Then, the predictive density of one period ahead returns is obtained by integrating
over the unknown parameter µ
\[ p(r_{T+1} \mid v, \Sigma) = \int_{\Theta} p(r_{T+1} \mid \mu, \Sigma)\, p(\mu \mid v)\, d\mu, \tag{33} \]
which can be shown to result in
\[ p(r_{T+1} \mid \Sigma, v) = N(m_v, \Sigma + \Lambda_v). \tag{34} \]
17. Bayesian Portfolio Analysis (10)
• Black-Litterman model
Black & Litterman (1992) suggest using the market equilibrium model as a prior
\[ \mu_{equ} = \gamma\Sigma\omega^*_{mkt}, \tag{35} \]
where γ is the risk aversion of a power utility investor and $\omega^*_{mkt}$ is the vector of
market capitalization weights. Black & Litterman assume a natural conjugate prior for
the vector of means such that
\[ p(\mu) = N(\mu_{equ}, \lambda_0\Sigma). \tag{36} \]
The investor expresses views about µ by imposing
\[ p(v \mid \mu) = N(P\mu, \Omega). \tag{37} \]
It follows that the posterior of µ updated by the views is
\[ p(\mu \mid v) = N(m_v, \Lambda_v) \tag{38} \]
\[ m_v = \left((\lambda_0\Sigma)^{-1} + P'\Omega^{-1}P\right)^{-1}\left((\lambda_0\Sigma)^{-1}\mu_{equ} + P'\Omega^{-1}v\right) \]
\[ \Lambda_v = \left((\lambda_0\Sigma)^{-1} + P'\Omega^{-1}P\right)^{-1}. \]
Then, the predictive density of one period ahead returns is again
\[ p(r_{T+1} \mid \Sigma, v) = N(m_v, \Sigma + \Lambda_v). \tag{39} \]
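A compact sketch of (35) to (39). The function name `black_litterman` and all numbers are hypothetical. As an assumption of this sketch, P is allowed to have fewer rows than assets (a single relative view here), which the formulas accommodate without change.

```python
import numpy as np

def black_litterman(sigma, w_mkt, gamma, lam0, P, v, omega):
    """Equilibrium prior mean and posterior moments of mu given views."""
    mu_equ = gamma * sigma @ w_mkt                       # eq. (35)
    prior_prec = np.linalg.inv(lam0 * sigma)             # inverse prior covariance, eq. (36)
    omega_inv = np.linalg.inv(omega)
    lam_v = np.linalg.inv(prior_prec + P.T @ omega_inv @ P)
    m_v = lam_v @ (prior_prec @ mu_equ + P.T @ omega_inv @ v)
    return mu_equ, m_v, lam_v

# Hypothetical inputs: two assets, one view "asset 1 outperforms asset 2 by 2%".
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
w_mkt = np.array([0.6, 0.4])
P = np.array([[1.0, -1.0]])
v = np.array([0.02])
omega = np.array([[0.001]])

mu_equ, m_v, lam_v = black_litterman(sigma, w_mkt, 3.0, 0.05, P, v, omega)
pred_cov = sigma + lam_v                                 # eq. (39)
```

If the view coincides with what the equilibrium already implies (v = Pµ_equ), the posterior mean stays at µ_equ and only the posterior covariance tightens.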
19. Bayesian Portfolio Analysis (11)
• Bayesian Asset Pricing Model
Pástor (2000) formulated a Bayesian Asset Pricing Model
\[ r_t = x_t'\theta + \varepsilon_t \tag{40} \]
\[ \varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2), \tag{41} \]
where $x_t = [1, z_t']'$ denotes a (k + 1) × 1 vector with $z_t$ containing a k × 1 vector
of observable factors. Then, θ is a (k + 1) × 1 vector with $\theta = [\alpha, \beta']'$, where α
is the intercept and β is a k × 1 vector containing the sensitivities (betas) of the
assets to the factors $z_t$. Pástor (2000) imposes a prior on θ such that
\[ p(\theta) = N(m_0, \Lambda_0), \tag{42} \]
where $m_0$ is a (k + 1) × 1 vector of prior means and $\Lambda_0$ is a (k + 1) × (k + 1) uncertainty matrix.
In a single-factor model such as the CAPM, the benchmark portfolio $z_t$ is the
market portfolio, and market efficiency implies α = 0. Therefore he assumes that
$m_0 = 0\iota$ and sets the first element of the diagonal matrix $\Lambda_0$ equal to $\sigma_\alpha^2$ and
the remaining elements to a high but finite value, indicating an uninformative prior for β.
He then obtains the posterior of θ by updating the prior with observations such
that19
\[ p(\theta \mid y, X, \sigma^2) = N(m_T, \Lambda_T) \tag{43} \]
\[ m_T = \left(\Lambda_0^{-1} + \left[\sigma^2(X'X)^{-1}\right]^{-1}\right)^{-1}\left(\Lambda_0^{-1}m_0 + \left[\sigma^2(X'X)^{-1}\right]^{-1}\hat{\theta}\right) \]
\[ \Lambda_T = \left(\Lambda_0^{-1} + \left[\sigma^2(X'X)^{-1}\right]^{-1}\right)^{-1}, \]
where $X = [x_1, x_2, \ldots, x_T]'$, $y = [r_1, r_2, \ldots, r_T]'$ and
\[ \hat{\theta} = (X'X)^{-1}X'y. \tag{44} \]
The predictive density of one period ahead asset returns is then
\[ p(r_{T+1} \mid y, X, \sigma^2) = N\left(x_{T+1}'m_T,\; \sigma^2 + x_{T+1}'\Lambda_T x_{T+1}\right). \tag{45} \]
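The posterior (43) and predictive moments (45) for the regression model can be sketched as follows, assuming σ² is known. Function names, the simulated data and the prior settings are hypothetical; the tight prior on α and the loose prior on β imitate the prior structure described above.

```python
import numpy as np

def regression_posterior(X, y, sigma2, m0, lam0):
    """Posterior moments of theta under a normal prior and known sigma^2."""
    xtx = X.T @ X
    theta_hat = np.linalg.solve(xtx, X.T @ y)       # OLS estimate, eq. (44)
    prior_prec = np.linalg.inv(lam0)
    data_prec = xtx / sigma2                        # inverse of sigma^2 (X'X)^{-1}
    lam_T = np.linalg.inv(prior_prec + data_prec)
    m_T = lam_T @ (prior_prec @ m0 + data_prec @ theta_hat)
    return theta_hat, m_T, lam_T

def predictive_moments(x_next, m_T, lam_T, sigma2):
    """Mean and variance of the one-period-ahead predictive density, eq. (45)."""
    return x_next @ m_T, sigma2 + x_next @ lam_T @ x_next

# Hypothetical single-factor data.
rng = np.random.default_rng(1)
T, sigma2 = 120, 0.05 ** 2
z = rng.normal(0.005, 0.04, size=T)
X = np.column_stack([np.ones(T), z])
y = 0.002 + 1.1 * z + rng.normal(0.0, np.sqrt(sigma2), size=T)

m0 = np.zeros(2)
lam0 = np.diag([1e-8, 1e6])      # tight prior on alpha, nearly flat prior on beta
theta_hat, m_T, lam_T = regression_posterior(X, y, sigma2, m0, lam0)
mean_next, var_next = predictive_moments(np.array([1.0, 0.01]), m_T, lam_T, sigma2)
```

With this prior the posterior intercept is pulled almost entirely to zero, while the slope is determined essentially by the data.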
21. Bayesian Portfolio Analysis (12)
• Bayesian return forecasting with a belief in no-predictability
Kandel & Stambaugh (1996) formulate the following predictive regression model:
\[ r_t = x_{t-1}'\theta + \varepsilon_t \tag{46} \]
\[ \varepsilon_t \sim \text{i.i.d. } N(0, \sigma^2), \tag{47} \]
where $x_{t-1} = [1, z_{t-1}']'$ denotes a (k + 1) × 1 vector with $z_{t-1}$ containing a k × 1
vector of exogenous explanatory variables. Then, θ is a (k + 1) × 1 vector with
$\theta = [\alpha, \beta']'$, where α is the intercept, which is scalar, and β is a k × 1 vector
containing the regression coefficients of the explanatory variables $z_{t-1}$. Kandel &
Stambaugh (1996) impose a prior on θ such that
\[ p(\theta) = N(m_0, \Lambda_0), \tag{48} \]
where $m_0$ is a (k + 1) × 1 vector of prior means and $\Lambda_0$ is a (k + 1) × (k + 1) uncertainty
matrix. Kandel & Stambaugh (1996) and Connor (1997) recommend imposing
an informative prior centered on the economic notion of (weak form) market
efficiency, which implies that the slope coefficients should be zero. Specifically,
they use the prior $p(\beta) = N(0, \sigma_\beta^2 I)$ and an uninformative prior for α. This choice
of prior translates into $m_0 = 0\iota$, where ι is a (k + 1) × 1 vector of ones; the first
element of the diagonal matrix $\Lambda_0$ has a high but finite value, and the remaining
diagonal elements are assigned according to $\sigma_\beta^2$.
The posterior density is then:
\[ p(\theta \mid y, X, \sigma^2) = N(m_T, \Lambda_T), \tag{49} \]
\[ m_T = \left(\Lambda_0^{-1} + \left[\sigma^2(X'X)^{-1}\right]^{-1}\right)^{-1}\left(\Lambda_0^{-1}m_0 + \left[\sigma^2(X'X)^{-1}\right]^{-1}\hat{\theta}\right) \]
\[ \Lambda_T = \left(\Lambda_0^{-1} + \left[\sigma^2(X'X)^{-1}\right]^{-1}\right)^{-1}, \]
where $X = [x_0, x_1, \ldots, x_{T-1}]'$, $y = [r_1, r_2, \ldots, r_T]'$ and
\[ \hat{\theta} = (X'X)^{-1}X'y. \tag{50} \]
The predictive density of one period ahead asset returns is then
\[ p(r_{T+1} \mid y, X, \sigma^2) = N\left(x_T'm_T,\; \sigma^2 + x_T'\Lambda_T x_T\right). \tag{51} \]
24. Bayesian Portfolio Analysis (13)
• Bayesian Model Uncertainty20
Each model $M_j \in \{M_1, \ldots, M_J\}$ is given a sample density $p(r_t \mid M_j, \theta_j)$. Each
model then has a conditional likelihood $p(\Phi_T \mid M_j, \theta_j)$, where $\Phi_T$ denotes the observations
acquired up to time T and $\theta_j$ is the vector of parameters for model $j \in \{1, \ldots, J\}$.
Then the posterior model probability is given by
\[ \Pr(M_j \mid \Phi_T) = \frac{p(\Phi_T \mid M_j)\Pr(M_j)}{\sum_{i=1}^{J} p(\Phi_T \mid M_i)\Pr(M_i)}, \tag{52} \]
where
\[ p(\Phi_T \mid M_j) = \int_{\Theta_{\theta_j}} p(\Phi_T \mid M_j, \theta_j)\, p(\theta_j \mid M_j)\, d\theta_j. \tag{53} \]
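Given the marginal likelihoods in (53), the posterior model probabilities in (52) are a normalization. The sketch below assumes the marginal likelihoods are available in logs (which is how they are usually computed, since they can be extremely small); the function name and the numbers are hypothetical.

```python
import numpy as np

def posterior_model_probs(log_marginal_liks, prior_probs):
    """Posterior model probabilities, eq. (52), computed stably in logs."""
    log_post = (np.asarray(log_marginal_liks, dtype=float)
                + np.log(np.asarray(prior_probs, dtype=float)))
    log_post -= log_post.max()        # guard against underflow before exponentiating
    weights = np.exp(log_post)
    return weights / weights.sum()

# Hypothetical example: model 2 has three times the marginal likelihood of model 1.
log_ml = np.array([-1000.0, -1000.0 + np.log(3.0)])
probs = posterior_model_probs(log_ml, [0.5, 0.5])
```

Subtracting the maximum before exponentiating leaves the ratios unchanged, so the result is identical to applying (52) directly whenever the direct calculation does not underflow.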
The predictive return density for one period ahead returns is given by predictive
26. Multi-period Bayesian Asset Allocation (1)
• Learning is a result of parameter (or state) uncertainty in dynamic decision
problems.
• Generally, Bayesian learning leads to a non-Markovian dynamic decision problem.
• However, under certain conditions an equivalent (re-)representation of the state
variables can be found that recaptures the Markovian structure.
• Conditions for the existence of the state (re-)representation are presented.
• Whenever the posterior density exhibits a finite number of sufficient statistics,
these sufficient statistics define a compact filter on observations and become state
variables in the dynamic decision problem.
• These state variables can be updated sequentially and exhibit a Markovian
structure.
27. Multi-period Bayesian Asset Allocation (2)
• Terminal wealth problem:21
\[ V_t(W_t, Z_t) = \max_{\{\omega_s\}_{s=t}^{T-1}} E_t\left[U(W_T) \mid \mathcal{F}_t^I\right] \tag{56} \]
\[ \text{s.t.}\quad W_{s+1} = W_s\left[\omega_s'\exp(r_{s+1} + r_s^f) + (1 - \iota'\omega_s)\exp(r_s^f)\right], \tag{57} \]
for all $s \in \{t, \ldots, T-1\}$, where $\omega_s$ is the allocation chosen at time s and $\mathcal{F}_t^I$ is the
information filtration.
• The finite horizon Bellman equation:
\[ V_t(W_t, Z_t) = \max_{\omega_t} E_t\left[V_{t+1}(W_{t+1}, Z_{t+1}) \mid \mathcal{F}_t^I\right], \tag{58} \]
with boundary condition $V_T(W_T, Z_T) = U(W_T)$.
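The budget constraint (57) can be made concrete with a small sketch. It assumes that $r_{s+1}$ denotes log returns in excess of the risk-free rate $r_s^f$ (which is how the constraint reads) and that the weights are given rather than optimized; the function names and the numbers are hypothetical.

```python
import numpy as np

def next_wealth(wealth, omega, excess_log_ret, rf):
    """One step of the wealth dynamics in (57)."""
    omega = np.asarray(omega, dtype=float)
    risky_growth = np.exp(np.asarray(excess_log_ret, dtype=float) + rf)
    cash_growth = np.exp(rf)
    return wealth * (omega @ risky_growth + (1.0 - omega.sum()) * cash_growth)

def terminal_wealth(w0, weights_path, excess_path, rf_path):
    """Roll the budget constraint forward for a given sequence of weights."""
    wealth = w0
    for omega, r_next, rf in zip(weights_path, excess_path, rf_path):
        wealth = next_wealth(wealth, omega, r_next, rf)
    return wealth

# Hypothetical two-period path with two risky assets.
w_T = terminal_wealth(
    1.0,
    weights_path=[[0.5, 0.2], [0.4, 0.3]],
    excess_path=[[0.02, -0.01], [0.00, 0.03]],
    rf_path=[0.001, 0.001],
)
```

A full solution of (56) would wrap this recursion in the maximization over the weights at every date; the sketch only evaluates the constraint.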
28. Multi-period Bayesian Asset Allocation (3)
• Bayesian decomposition, Bayes′ rule and Fubini′s theorem:22
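\begin{align}
V_t(W_t, Z_t) &= \max_{\omega_t} \int_{\Omega \times \Theta} V_{t+1}(W_{t+1}, Z_{t+1})\, p(r_{t+1}, \theta \mid \Phi_t)\, d(r_{t+1}, \theta) \tag{59} \\
&= \max_{\omega_t} \int_{\Omega}\int_{\Theta} V_{t+1}(W_{t+1}, Z_{t+1})\, p(r_{t+1} \mid \theta, Z_t)\, p(\theta \mid \Phi_t)\, d\theta\, dr_{t+1} \notag \\
&= \max_{\omega_t} \int_{\Omega} V_{t+1}(W_{t+1}, Z_{t+1}) \left[\int_{\Theta} p(r_{t+1} \mid \theta, Z_t)\, p(\theta \mid \Phi_t)\, d\theta\right] dr_{t+1} \notag \\
&= \max_{\omega_t} \int_{\Omega} V_{t+1}(W_{t+1}, Z_{t+1})\, p(r_{t+1} \mid \Phi_t)\, dr_{t+1}, \notag
\end{align}
with boundary condition $V_T(W_T, Z_T) = U(W_T)$.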
29. Multi-period Bayesian Asset Allocation (4)
• The predictive distribution breaks the Markovian structure of the multi-period
dynamic allocation problem.
• State (re-)representation theorem:23
A mere change in variables representing the state-space does not affect the
solution, provided that both sets of variables convey equivalent information.
The condition for the theorem to hold is that the two sets of variables span
equivalent σ-fields. Whenever a set of sufficient statistics exists, the information
necessary to characterize p(rt+1|Φt) can be captured by these sufficient statistics.
Including these sufficient statistics in the state representation $\tilde{Z}_t$, the Markovian
representation of $p(r_{t+1} \mid \Phi_t)$ is $p(r_{t+1} \mid \tilde{Z}_t)$. In other words, $p(r_{t+1} \mid \tilde{Z}_t)$ and
$p(r_{t+1} \mid \Phi_t)$ convey equivalent information and therefore span equivalent σ-fields.
Then, the last line of (59) can be written as
\[ V_t(W_t, \tilde{Z}_t) = \max_{\omega_t} \int_{\Omega} V_{t+1}(W_{t+1}, \tilde{Z}_{t+1})\, p(r_{t+1} \mid \tilde{Z}_t)\, dr_{t+1}. \tag{60} \]
30. Multi-period Bayesian Asset Allocation (5)
• Implications of the state (re-)representation theorem:24
– Two representations of the state-space convey equivalent information if a
set of sufficient statistics exists such that no other statistic calculated from
observations is needed to uniquely determine the predictive distribution. This
eliminates the need to condition on the full observation history.
– Should such sufficient statistics exist, we might sequentially update the statistics.
The resulting path is described by a process that exhibits a Markovian structure.
– Provided it does not depend upon unobservable states, the objective function
is irrelevant in determining the predictive distribution. Therefore, the predictive
distribution and the process of learning can be derived outside the decision
context.
– The state (re-)representation does not depend on the form of the probability
distribution of the state variables.25 However, the form of the probability
distribution and the prior determine the number of sufficient statistics.
– The sufficient statistics become state variables in the multi-period decision
problem.
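The sequential-updating idea can be illustrated with the simplest case in this deck: learning the mean µ with known Σ, where the posterior mean and covariance are the sufficient statistics. The sketch below is an illustration under that assumption, with hypothetical names and simulated data. It updates the pair one observation at a time and compares the result with the batch formulas (12) and (13).

```python
import numpy as np

def sequential_update(m_prev, lam_prev, r_t, sigma_inv):
    """One-observation update of the sufficient statistics (m_t, Lambda_t)."""
    lam_prev_inv = np.linalg.inv(lam_prev)
    lam_t = np.linalg.inv(lam_prev_inv + sigma_inv)
    m_t = lam_t @ (lam_prev_inv @ m_prev + sigma_inv @ r_t)
    return m_t, lam_t

# Hypothetical data: 36 observations on two assets.
rng = np.random.default_rng(2)
sigma = np.array([[0.04, 0.01], [0.01, 0.09]])
sigma_inv = np.linalg.inv(sigma)
returns = rng.multivariate_normal([0.05, 0.08], sigma, size=36)

m0 = np.array([0.03, 0.03])
lam0 = 0.01 * np.eye(2)

# Markovian recursion: the next state depends only on the current state and r_t.
m_t, lam_t = m0, lam0
for r_t in returns:
    m_t, lam_t = sequential_update(m_t, lam_t, r_t, sigma_inv)

# Batch posterior from the full history, eqs. (12)-(13).
T = len(returns)
lam_batch = np.linalg.inv(np.linalg.inv(lam0) + T * sigma_inv)
m_batch = lam_batch @ (np.linalg.inv(lam0) @ m0 + T * sigma_inv @ returns.mean(axis=0))
```

Both routes give the same posterior, which is the sense in which the pair of sufficient statistics can replace the observation history as a state variable.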
31. Multi-period Bayesian Asset Allocation (6)
• Conditions for learning:26
– The learning process must be compact, meaning that additional information
does not change the form of the posterior distribution. Instead, the additional
information modifies the values of the sufficient statistics that determine the
predictive distribution.
– The observation process must be described by a parametric statistical model,
implying a likelihood function of parametric form.
– We must use an informative prior of the conjugate family or an uninformative
Jeffreys′ prior. Only this family of priors guarantees that the posterior
distribution of the observations will preserve the distributional form.
– Alternatively, it is possible to use uninformative (diffuse) priors to represent the
absence of prior knowledge.
– Combining the prior with the likelihood function must result in a joint posterior
distribution for which a set of sufficient statistics can summarize all necessary
information to determine the predictive density.
32. Bayesian Learning (1)
• A wide class of conditionally normal linear models is discussed, and the
methodology of Bayesian learning is presented.
• The following models are analyzed: (1) learning the risk premium under
uncertainty; (2) learning in a forecasting model; (3) learning the parameters
of a vector-autoregressive model; (4) learning the parameters of a generalized
state-space model; (5) learning in a regime-switching model under uncertainty.
• The Bayesian learning process of the first four models is shown to be the Bayesian
version of the well known Kalman filter.
• The Bayesian learning process of the regime-switching model under uncertainty
can be seen as the Bayesian equivalent of the well known Hamilton filter.
• The presented models fulfil the conditions for the state (re-)representation such
that the multi-period dynamic decision problem remains solvable.
33. Bayesian Learning (2)
• Learning in a generalized state-space model27
– The model is rich enough to encompass models (1)-(4); the presentation is
therefore restricted to this class.
– The Bayesian learning process is the Bayesian equivalent of the well known
Kalman filter.
Measurement equation:
\[ y_t = F_{t-1}\theta_t + v_t \tag{61} \]
\[ v_t \sim \text{i.i.d. } N(0\iota, V), \tag{62} \]
where $y_t$ is an m × 1 vector of observable random variables, $\theta_t$ is a k × 1 vector of
unobservable time-varying parameters and $F_{t-1}$ is an m × k design matrix relating
the parameters to observations and ensuring that the product $F_{t-1}\theta_t$ is an m × 1
vector. V is an m × m PDS covariance matrix.
Transition equation:
\[ \theta_t = G_{t-1}\theta_{t-1} + w_t \tag{63} \]
\[ w_t \sim \text{i.i.d. } N(0\iota, W), \tag{64} \]
where $G_{t-1}$ is a k × k design matrix relating the parameters of the last period to
the parameters of the current period, ensuring that the product $G_{t-1}\theta_{t-1}$ is a k × 1
vector. W is a k × k PDS covariance matrix.
Let $\Phi_t$ contain the information acquired up to time t. Imposing the usual
conjugate prior on $\theta_0$ such that $p(\theta_0) = N(m_0, \Lambda_0)$, the posterior density is given
by28
\[ p(\theta_t \mid \Phi_t) = N(m_t, \Lambda_t) \tag{65} \]
\[ R_{t-1} = W + G_{t-1}\Lambda_{t-1}G_{t-1}' \]
\[ m_t = G_{t-1}m_{t-1} + R_{t-1}F_{t-1}'\left(V + F_{t-1}R_{t-1}F_{t-1}'\right)^{-1}\left(y_t - F_{t-1}G_{t-1}m_{t-1}\right) \]
\[ \Lambda_t = R_{t-1} - R_{t-1}F_{t-1}'\left(V + F_{t-1}R_{t-1}F_{t-1}'\right)^{-1}F_{t-1}R_{t-1}. \]
The predictive return density for one period ahead returns is29
\[ p(y_{t+1} \mid \Phi_t) = N\left(F_tG_tm_t,\; V + F_tR_tF_t'\right). \tag{66} \]
The state (re-)representation is therefore
\[ \tilde{Z}_t = \{r_t^f, m_t, \text{unique}(\Lambda_t), \text{unique}(F_t), \text{unique}(G_t)\}, \tag{67} \]
where the operator unique(·) determines all unique elements that are not constant.
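The recursion in (65) and the predictive moments in (66) translate almost line by line into code. The following is a minimal sketch with hypothetical function names, using time-invariant F and G in the example purely for brevity.

```python
import numpy as np

def kalman_update(m_prev, lam_prev, y, F, G, V, W):
    """One step of the posterior recursion in (65)."""
    R = W + G @ lam_prev @ G.T                     # prior covariance of theta_t
    S = V + F @ R @ F.T                            # covariance of the forecast error
    gain = R @ F.T @ np.linalg.inv(S)
    prior_mean = G @ m_prev
    m = prior_mean + gain @ (y - F @ prior_mean)
    lam = R - gain @ F @ R
    return m, lam

def one_step_predictive(m, lam, F, G, V, W):
    """Mean and covariance of the predictive density (66)."""
    R = W + G @ lam @ G.T
    return F @ G @ m, V + F @ R @ F.T

# Hypothetical scalar example: constant parameter (W = 0), unit noise variance.
F = np.array([[1.0]])
G = np.array([[1.0]])
V = np.array([[1.0]])
W = np.array([[0.0]])
m0 = np.array([0.0])
lam0 = np.array([[1.0]])

m1, lam1 = kalman_update(m0, lam0, np.array([2.0]), F, G, V, W)
pred_mean, pred_cov = one_step_predictive(m1, lam1, F, G, V, W)
```

In this scalar case with W = 0 the step reduces to the conjugate normal update of a mean with known variance, which is why models (1) to (4) can be treated as special cases.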
36. Bayesian Learning (3)
• Learning in a regime-switching model with uncertainty30
– The investor faces a generalized state-space model with multiple parameter
sets $\theta_j$, one for each regime $j \in \{1, \ldots, S\}$.
– The actual regime $s_t$ is unobservable to the investor, while the other parameters
are assumed to be known.
– The regime-switching model can be seen as an application of Bayesian model
uncertainty.31
The model is governed by the following measurement equation
\[ y_t = F_{t-1}\theta_{s_t} + v_t \tag{68} \]
\[ v_t \sim \text{i.i.d. } N(0\iota, V_{s_t}). \tag{69} \]
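The regimes are governed by the following transition matrix:
\[ \pi_{ij} = \Pr(s_{t+1} = j \mid s_t = i), \qquad \Pi = \begin{pmatrix} \pi_{11} & \cdots & \pi_{1S} \\ \vdots & \ddots & \vdots \\ \pi_{S1} & \cdots & \pi_{SS} \end{pmatrix}. \tag{70} \]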
Prediction: Let $\Phi_{t-1}$ contain all information acquired up to time t − 1. At the
end of the (t − 1)-th iteration, the investor is given the posterior state probability
$\Pr(s_{t-1} = j \mid \Phi_{t-1})$, for all $j \in \{1, \ldots, S\}$. Based on the posterior, the investor
must predict the state probability of the next period by
\[ \Pr(s_t = j \mid \Phi_{t-1}) = \sum_{i=1}^{S} \Pr(s_t = j \mid s_{t-1} = i)\Pr(s_{t-1} = i \mid \Phi_{t-1}), \tag{71} \]
where $\Pr(s_t = j \mid s_{t-1} = i)$, for all $i, j \in \{1, \ldots, S\}$, are the
transition probabilities $\pi_{ij}$ in the transition matrix Π.
Update: For each date of the sample, the optimal inference and forecast on the
active regime can be found by iterating on the equations
\[ \Pr(s_t = j \mid \Phi_t) = \frac{p(y_t \mid s_t = j, \Phi_{t-1})\Pr(s_t = j \mid \Phi_{t-1})}{\sum_{i=1}^{S} p(y_t \mid s_t = i, \Phi_{t-1})\Pr(s_t = i \mid \Phi_{t-1})}. \tag{72} \]
Initialization: The initial state probabilities $\Pr(s_1 = j \mid \Phi_0)$ must sum to
one and satisfy $\Pr(s_1 = j \mid \Phi_0) \geq 0$. Reasonable choices include the steady
state probability or attributing the same probability to all states. The procedure
starts with the initial state probability followed by iterating on the prediction and
updating steps.
The predictive density for one period ahead returns is then given by
\[ p(y_{t+1} \mid \Phi_t) = \sum_{i=1}^{S} p(y_{t+1} \mid s_{t+1} = i, \Phi_t)\Pr(s_{t+1} = i \mid \Phi_t). \tag{73} \]
The state (re-)representation of the Bayesian multi-period investor is
\[ \tilde{Z}_t = \{r_t^f, \Pr(s_t = j \mid \Phi_t), \text{unique}(F_t)\}, \quad \forall j \in \{1, \ldots, S\}. \tag{74} \]
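The prediction and update steps (71) and (72) can be sketched as below. For illustration only, the regime-specific densities are taken to be univariate normals with known regime means and variances, which is a simplification of the measurement equation (68); the function names and all numbers are hypothetical.

```python
import numpy as np

def normal_pdf(y, mean, var):
    """Univariate normal density, evaluated elementwise over regimes."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def hamilton_step(post_prev, Pi, lik):
    """Prediction (71) followed by the Bayes update (72)."""
    pred = Pi.T @ post_prev            # pred[j] = sum_i pi_ij * post_prev[i]
    joint = lik * pred
    post = joint / joint.sum()
    return pred, post

# Hypothetical two-regime setup: calm regime 0, turbulent regime 1.
Pi = np.array([[0.9, 0.1],
               [0.2, 0.8]])            # Pi[i, j] = Pr(s_{t+1} = j | s_t = i)
means = np.array([0.01, -0.02])
variances = np.array([0.02 ** 2, 0.06 ** 2])

post = np.array([0.5, 0.5])            # equal initial state probabilities
for y in [0.015, -0.08, -0.05]:
    lik = normal_pdf(y, means, variances)
    pred, post = hamilton_step(post, Pi, lik)

next_pred = Pi.T @ post                # Pr(s_{t+1} = i | Phi_t), the weights in (73)
```

The vector of posterior regime probabilities is all that needs to be carried forward from one date to the next, which is why it appears as a state variable in (74).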
39. Conclusion (1)
• Classical portfolio selection assumes that parameters are known and constant over
time and that there is no other source of information useful to the investor than
sample statistics.
• Bayesian portfolio analysis assumes that parameters (or states) are unknown to
the investor, that the investor learns more about quantities of interest as new
observations become available and that there are sources of information other
than sample evidence that are useful in his decision making process.
• Parameters (or certain states) of economic models are never known with certainty.
In fact, these variables are often unknown, partly known or may vary randomly.
• The aim of the work is to extend Bayesian portfolio analysis to a multi-period
dynamic setting.
40. Conclusion (2)
• The Bayesian paradigm allows the investor to treat parameters (or certain states)
as uncertain, to include prior beliefs or views in the decision making process and
to learn the quantities of interest as new information becomes available.
• Given that the observation history is rarely a good predictor of the future, it is
evident that information other than the sample statistics of past observations may
be very useful in a portfolio selection context.
• Furthermore, portfolio choices are by nature subjective decisions and not objective
inference problems as the mainstream literature on portfolio choice might suggest.
Therefore, there is no need to facilitate comparison.32
• Given that the overall approach is feasible, it remains to show by empirical studies
that the Bayesian approach to parameter uncertainty and learning in fact leads to
better allocations in multi-period dynamic decision problems.
41. Footnotes
1 See, e.g., Jeffreys (1961), Zellner (1996), Barry (1974), Klein & Bawa (1976), and Bawa, Brown & Klein (1979).
2 See, e.g., Brandt (2010, p. 312), Hamilton (1994, p. 355), Raiffa & Schlaifer (1968).
3 Shrinkage estimation refers to posterior shrinkage, James-Stein optimal shrinkage estimation, or Jorion’s
(1986) Bayes-Stein estimation.
4 Mixed estimation is attributed to Theil & Goldberger (1961); the Black-Litterman model refers to the work of Black
& Litterman (1992).
5 Prior beliefs in an asset pricing theory were formulated by Pástor (2000).
6 Prior beliefs in no-predictability in a forecasting model were introduced by Kandel & Stambaugh (1996).
7 Bayesian model uncertainty, model selection and model averaging in financial applications can be found in Avramov
(2002), Brandt (2010, p. 319), Cremers (2002), Carlin & Louis (2009, pp. 203-204), and Koop, Poirier & Tobias (2007,
Ch. 16), among others.
8 See, e.g., Campbell & Viceira (2003, p. 22), Barberis (2000).
9 See, e.g., Barberis (2000), Kandel & Stambaugh (1996, p. 388), Rachev et al. (2008, p. 96), Wachter (2007, p. 14).
10 See Barberis (2000), Brandt (2010, p. 308), Brown (1976, 1978), Kandel & Stambaugh (1996, p. 388), Klein & Bawa
(1976), Pástor (2000), Skoulakis (2007, p. 7), and Zellner & Chetty (1965).
11 See, e.g., Barberis (2000), Brandt (2010, p. 310), Hamilton (1994, p. 748), Hoff (2009, p. 108), Rachev et al. (2008,
p. 95), Zellner (1996, p. 379).
12 See, e.g., Brandt (2010, p. 310), Zellner (1996), Hamilton (1994, p. 353).
13 Brandt (2010, p. 310).
14 The variance decomposition and the effect of parameter uncertainty in the long-run predictive density are
discussed in Barberis (2000), Pástor & Veronesi (2009, p. 12), and Pástor & Stambaugh (2009).
15 See, e.g., Brennan & Xia (2001, p. 918), Hoff (2009, p. 108), Koop, Poirier & Tobias (2007, p. 26).
16 Posterior shrinkage is a generalization of the Bayes-Stein estimator (Jorion, 1986) and is a direct result of
reformulating the posterior obtained from an informative prior in shrinkage form.
17 See, e.g., Greene (2008, p. 607), Hoff (2009, p. 108), Koop, Poirier & Tobias (2007, p. 26).
18 Mixed estimation is attributed to the work of Theil & Goldberger (1961). It is also presented in Brandt (2010, p. 313),
Satchell & Scowcroft (2000), Scowcroft & Sefton (2003). As a special case of mixed estimation, Black & Litterman
(1992) present the Black-Litterman model.
19 See, e.g., Greene (2008, p. 607), Hamilton (1994, p. 356), Hoff (2009, p. 155).
20 Bayesian model uncertainty, model selection and model averaging in financial applications can be found in Avramov
(2002), Brandt (2010, p. 319), Cremers (2002), Carlin & Louis (2009, pp. 203-204), and Koop, Poirier & Tobias (2007,
Ch. 16), among others.
21 Instead of optimizing expected utility over final wealth, the investor could have other objectives, such as maximizing
expected utility over interim consumption. The terminal wealth problem has been discussed in Barberis (2000, p. 251),
Brandt (2010, p. 274), Brandt et al. (2005, p. 836), and Brennan & Xia (2001, p. 918). Consumption-based models have
been presented in Rubinstein (1976), Lucas (1978), Breeden (1979), Campbell & Viceira (2003, p. 122), Cochrane (1989,
p. 322), and Mehra & Prescott (1985, 2003) and in other classical papers, such as those of Samuelson (1969, 1970) and
Merton (1969, 1971).
22 See, e.g., Barberis (2000, p. 255), Brandt (2010, p. 309), Bauwens, Lubrano & Richard (1999, p. 5). Fubini′s theorem
is presented in Billingsley (1986, p. 236), Chung (1974, p. 59), Capiński & Kopp (2004, p. 171), Duffie (1996, p. 282).
23 See Feldman (2007, p. 127).
24 See Feldman (2007, p. 127).
25 See Feldman (2007, p. 127).
26 See Feldman (2007, p. 124).
27 The presentation is mainly based on the work of Harvey (1993), Kim & Nelson (1999, pp. 22-57, pp. 189-236), West
& Harrison (1997), Meinhold & Singpurwalla (1983), and the references therein.
28 The derivation can be found in West & Harrison (1997, p. 583, p. 639), and Meinhold & Singpurwalla (1983, p. 125).
29 See, e.g., Carlin & Louis (2009, p. 26), West & Harrison (1997, p. 584).
30 The derivation is based on Hamilton (1994, p. 692), Guidolin & Timmermann (2007), Harvey (1993, p. 289), and
Kim & Nelson (1999, p. 63).
31 Guidolin & Timmermann (2007).
32 See Brandt (2010, p. 311).