1. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Perspectives of Predictive Modeling
Arthur Charpentier
charpentier.arthur@uqam.ca
http ://freakonometrics.hypotheses.org/
(SOA Webcast, November 2013)
1
2. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Agenda
•
◦
◦
•
◦
◦
•
•
◦
◦
◦
•
Introduction to Predictive Modeling
Prediction, best estimate, expected value and confidence interval
Parametric versus nonparametric models
Linear Models and (Ordinary) Least Squares
From least squares to the Gaussian model
Smoothing continuous covariates
From Linear Models to G.L.M.
Modeling a TRUE-FALSE variable
The logistic regression
R.O.C. curve
Classification tree (and random forests)
From individual to functional data
2
3. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
0.015
Prediction ? Best estimate ?
E.g. predicting someone’s weight (Y )
−→
q
qqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqq qqqq q
qqqqqq qq qqqq
q
qq
qqq
q
0.010
Consider a sample {y1 , · · · , yn }
Density
Model : Yi = β0 + εi
with E(ε) = 0 and Var(ε) = σ 2
0.005
ε is some unpredictable noise
0.000
Y = y is our ‘best guess’...
100
150
200
250
Weight (lbs.)
Predicting means estimating E(Y ).
Recall that E(Y ) = argmin{ Y − y
y∈R
ε
L2 }
= argmin{E [Y − y]2 }
y∈R
least squares
3
4. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Best estimate with some confidence
0.015
E.g. predicting someone’s weight (Y )
Give an interval [y− , y+ ] such that
q
qq
qqq
q
we can estimate the distribution of Y
dF (x)
F (y) = P(Y ≤ y) or f (y) =
dx x=y
0.000
Confidence intervals can be derived if
0.005
Density
0.010
P(Y ∈ [y− , y+ ]) = 1 − α
qqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqq qqqq q
qqqqqq qq qqqq
q
100
150
200
250
Weight (lbs.)
(related to the idea of “quantifying uncertainty” in our prediction...)
4
5. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Parametric inference
0.015
E.g. predicting someone’s weight (Y )
Assume that F ∈ F = {Fθ , θ ∈ Θ}
0.010
1. Provide an estimate θ
Density
2. Compute bound estimates
y+ =
θ
F −1 (1
θ
− α/2)
0.005
y− = F −1 (α/2)
Standard estimation technique :
0.000
−→ maximum likelihood techniques
90%
100
150
200
250
Weight (lbs.)
n
θ = argmax
θ∈Θ
log fθ (yi )
i=1
explicit (analytical) expression for θ
numerical optimization (Newton Raphson)
log likelihood
5
6. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Non-parametric inference
natural estimator for P(Y ≤ y)
qq
qqq
q
0.010
qqq qqqqqqqqqqqqqqqqqqqqqqqqq qqqqqqqqqqqqq qqqq q
qqqqqq qq qqqq
q
0.005
#{i such that yi ≤y}
q
Density
1. Empirical distribution function
n
1
1(yi ≤ y)
F (y) =
n i=1
0.015
E.g. predicting someone’s weight (Y )
2. Compute bound estimates
90%
(α/2)
y+ = F −1 (1 − α/2)
0.000
y− = F
−1
100
150
200
250
Weight (lbs.)
6
7. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Prediction using some covariates
βM
1(X1,i = F) + εi
0.010
0.005
β1
Male
βF −βM
with E(ε) = 0 and Var(ε) = σ 2
0.000
or Yi = β0 +
Female
Density
based on his/her sex (X1 )
β + ε if X = F
F
i
1,i
Model : Yi =
βH + εi if X1,i = M
0.015
E.g. predicting someone’s weight (Y )
100
150
200
250
Weight (lbs.)
7
8. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Prediction using some (categorical) covariates
E.g. predicting someone’s weight (Y )
0.015
based on his/her sex (X1 )
Conditional parametric model
0.005
Male
0.000
−→ our prediction will be
Female
Density
F if X = F
θF
1,i
i.e. Yi ∼
Fθ if X1,i = M
M
0.010
assume that Y |X1 = x1 ∼ Fθ(x1 )
100
conditional on the covariate
150
200
250
Weight (lbs.)
8
9. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Prediction using some (categorical) covariates
0.010
0.005
Density
0.005
Density
0.010
0.015
Prediction of Y when X1 = M
0.015
Prediction of Y when X1 = F
0.000
90%
0.000
90%
100
150
200
Weight (lbs.)
250
100
150
200
250
Weight (lbs.)
9
10. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Linear Models, and Ordinary Least Squares
E.g. predicting someone’s weight (Y )
250
q
based on his/her height (X2 )
q
q
q
Linear Model : Yi = β0 + β2 X2,i + εi
200
2
Conditional parametric model
q
q
q
150
q
q
100
q
q
q
q
q
q
q
q q
q q q q
q
q
q
q q
q
q
q q
q q
q
q
q
q
q
q
q
q
q q
q
q
q
q
E.g. Gaussian Linear Model
β0 +β2 x2
q
q
q
Y |X2 = x2 ∼ N ( µ(x2 ) , σ 2 (x2 ))
q
q
q q
q
q
q q
q
q
q
q
assume that Y |X2 = x2 ∼ Fθ(x2 )
q
q
q
q
Weight (lbs.)
with E(ε) = 0 and Var(ε) = σ
q
q
q
q
q
q
q q q
q
q
q q
q
q
q
q
q
q q
q q
q q
q
q q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q q
q
q q
q
q
q q
q
q q
q
q
q
q
q
q
q
q
q
q q
q
q
q
q
q
q q
q
q
q
q
q
σ2
5.0
5.5
6.0
6.5
Height (in.)
n
[Yi − X T β]2
i
−→ ordinary least squares, β = argmin
i=1
β is also the M.L. estimator of β
10
15. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Improving our prediction ?
sex as
covariate
(Empirical) residuals, εi = Yi − X T β
i
height as
covariate
no covariates
Yi
R2 or log-likelihood
−50
0
50
100
−50
0
50
100
parsimony principle ?
−→ penalizing the likelihood with
the number of covariates
Akaike (AIC) or
Schwarz (BIC) criteria
q
q
q
q
qqq
q
qq
q
15
35. Arthur CHARPENTIER - Predictive Modeling - SoA Webinar, 2013
Visualizing classification trees (CART)
Y < 124.561
|
Y < 124.561
|
X < 5.39698
0.3429
X < 5.69226
0.1500
X < 5.69226
0.2727
X < 5.52822
0.4444
0.5231
0.3175
Y < 162.04
0.6207
0.1111
0.4722
35