Nonlinear Discrete-time Hazard Models for Entry into Marriage
1. Nonlinear Discrete-time Hazard Models for
Entry into Marriage
Heather Turner, Andy Batchelor, David Firth
Department of Statistics
University of Warwick, UK
8th March 2010
2. Motivating Application: The LII Survey
The Living in Ireland (LII) Surveys were conducted from 1994 to 2001
For five 5-year cohorts of women born between 1950 and
1975 we have the following data:
year of (first) marriage
year and month of birth
social class
highest level of education attained
year highest level of education was attained
3. When do women get married?
We can use methods from survival analysis to model the
timing of marriage
Measuring time from the legal age of marriage, the
survival time T is the time until a person marries
The time of marriage is recorded to the nearest year, so
we will use a discrete-time analysis
4. Discrete-time Hazard Models
In discrete time, the hazard of marriage occurring at time
t is defined as
$$h(t) = P(T = t \mid T \ge t)$$
We are interested in the shape of the hazard over the life
course and how the hazard is affected by covariates
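As a small worked example of the definition, the sketch below computes h(t) from a life table. The counts are invented for illustration (they are not LII data) and no censoring is assumed.

```r
# Hypothetical counts (not from the LII survey): number of women
# marrying at each discrete time t = 0, ..., 4, out of 100 followed.
events  <- c(5, 10, 20, 15, 10)
n_total <- 100

# Number still unmarried (at risk) at each time point, i.e. with T >= t.
at_risk <- n_total - c(0, cumsum(events)[-length(events)])

# Discrete-time hazard h(t) = P(T = t | T >= t).
hazard <- events / at_risk

# Survivor function S(t) = P(T > t) = product over s <= t of (1 - h(s)).
survivor <- cumprod(1 - hazard)

print(data.frame(t = 0:4, at_risk, events,
                 hazard = round(hazard, 3),
                 survivor = round(survivor, 3)))
```

Note that the hazard is a conditional probability: 20 marriages at t = 2 give h(2) = 20/85, not 20/100, because only 85 women are still at risk.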
5. Cox Proportional Odds Model
A popular choice is the proportional odds model proposed
by Cox (JRSSB, 1972):
$$\frac{h(t \mid x_{it})}{1 - h(t \mid x_{it})} = \frac{h_0(t)}{1 - h_0(t)} \exp(x_{it}\beta)$$
where h_0(t) is the baseline hazard
Taking logs we obtain
$$\operatorname{logit}(h(t \mid x_{it})) = \operatorname{logit}(h_0(t)) + x_{it}\beta = l_t + x_{it}\beta$$
semi-parametric: makes no assumption about the shape
of the hazard function
6. Episode-splitting
A simple way to estimate the proportional odds model is
to generate an event history for each observation
Pseudo observations are created at each time point from
time 0 up to marriage or censoring - this is known as
episode-splitting
The parameters in the proportional odds model can then
be estimated by fitting a logistic regression model to a
binary indicator of marriage at each time point (married
= 1, unmarried = 0)
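The steps above can be sketched in base R as follows. This is a minimal illustration on simulated data, not the authors' code: the covariate x, the baseline values l_t, the coefficient beta and the censoring time are all invented for the example.

```r
set.seed(1)

# Simulate hypothetical marriage times from a proportional odds model.
n     <- 2000
x     <- rbinom(n, 1, 0.5)                  # hypothetical binary covariate
l_t   <- c(-3, -2, -1.5, -1.5, -2, -2.5)    # baseline logits for t = 0..5
beta  <- -1                                 # true covariate effect
max_t <- length(l_t) - 1                    # censor after the last time point

time    <- rep(max_t, n)   # time of marriage or censoring
married <- integer(n)      # 1 if married, 0 if censored
for (i in seq_len(n)) {
  for (t in 0:max_t) {
    if (runif(1) < plogis(l_t[t + 1] + beta * x[i])) {
      time[i]    <- t
      married[i] <- 1L
      break
    }
  }
}

# Episode-splitting: one pseudo-observation per woman per time point,
# from time 0 up to and including her marriage or censoring time.
pp <- data.frame(id = rep(seq_len(n), times = time + 1),
                 t  = sequence(time + 1) - 1)
pp$x     <- x[pp$id]
# The indicator is 1 only at the final time point of a spell that
# ended in marriage.
pp$event <- as.integer(married[pp$id] == 1 & pp$t == time[pp$id])

# Logistic regression on the binary indicator: factor(t) estimates the
# unrestricted baseline l_t, and x estimates beta.
fit <- glm(event ~ 0 + factor(t) + x, family = binomial, data = pp)
print(round(coef(fit), 2))
```

Using factor(t) keeps the baseline semi-parametric; a parametric baseline replaces it with functions of t.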
7. Cox Proportional Odds Model
[Figure: Probability of Marriage (0.00 to 0.08) against Age (years), 15 to 43]
8. Sidenote: interval-censored data
A similar model can be obtained by assuming that the
data are interval-censored observations of a
continuous-time proportional hazards model
The coefficients in the model
$$\operatorname{cloglog}(h(t \mid x_{it})) = l_t + x_{it}\beta$$
are then the coefficients of the proportional hazards model
This relationship breaks down, however, if l_t is replaced by
a parametric function
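A short derivation shows where the complementary log-log link comes from. It is a sketch under stated assumptions: the underlying continuous time is written T*, the continuous-time hazard is \lambda(u \mid x) = \lambda_0(u)\exp(x\beta), and the covariates are constant within each interval [t, t+1). The symbols T* and \lambda_0 are introduced here for illustration.

```latex
h(t \mid x_{it}) = P(t \le T^* < t+1 \mid T^* \ge t)
                 = 1 - \exp\left\{ -\exp(x_{it}\beta) \int_t^{t+1} \lambda_0(u)\,du \right\}

\log\{-\log(1 - h(t \mid x_{it}))\}
                 = \log \int_t^{t+1} \lambda_0(u)\,du + x_{it}\beta
                 = l_t + x_{it}\beta
```

Here l_t is the log of the integrated baseline hazard over the interval, so \beta is the same vector as in the continuous-time model.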
9. Blossfeld and Huinink Model
Blossfeld and Huinink (Am. J. Sociol., 1991) propose the
following parametric baseline
$$\operatorname{logit}(h_0(t \mid \mathrm{age}_{it})) = l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - 15) + \beta_r \log(45 - \mathrm{age}_{it})$$
describes the nature of the time dependence
fixes the support of the hazard to be 15 to 45 years
10. BH Model
[Figure: Probability of Marriage (0.00 to 0.08) against Age (years), 10 to 50, shown as plotted points]
11. Effect of Endpoints
[Figure: Probability of Marriage (0.00 to 0.12) against Age (years), 10 to 50, for two choices of hazard support: 15−45 years and 12−75 years]
12. Nonlinear Discrete-time Hazard Model
An obvious extension of the BH model is to treat the
endpoints as parameters
$$l(\mathrm{age}_{it}) = c + \beta_l \log(\mathrm{age}_{it} - \alpha_l) + \beta_r \log(\alpha_r - \mathrm{age}_{it})$$
nonlinear - need to extend available software
near-aliasing between parameters - need to
reparameterise
13. Developing the Nonlinear Model
First analyse using the BH model as a reference
Then analyse using the extended model and illustrate
near-aliasing
Finally analyse using a re-parameterised nonlinear discrete
model
compare to BH model
refine model for the LII data
14. BH Models
The BH models can be fitted using the glm function in R.
Following the model building strategy of Blossfeld &
Huinink (1991), we select
a cohort factor
a time-varying indicator of educational status (in/out)
For the 1970-1974 cohort the conditional odds of
marriage are 24% of those for the 1950-1954 cohort
For women in education the conditional odds of marriage
are 11% of those for women not in education
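Percentages such as these are exponentiated logistic regression coefficients. The sketch below fits a BH-type baseline with glm to simulated grouped data and reads off an odds ratio. The data, the two cohort labels, the constant risk-set size and all parameter values are invented for illustration; they are not the LII data or the authors' fitted model.

```r
set.seed(42)

# Hypothetical grouped data: one row per age-by-cohort cell.
d <- expand.grid(age = 16:44, cohort = c("early", "late"))
d$lives <- 2000   # invented number of women at risk in each cell

# Invented "true" model: BH baseline with fixed endpoints 15 and 45,
# plus a cohort effect of -1.4 on the logit scale.
eta <- -18.5 + 2.5 * log(d$age - 15) + 3.5 * log(45 - d$age) -
  1.4 * (d$cohort == "late")
d$marriages <- rbinom(nrow(d), d$lives, plogis(eta))

# With the endpoints fixed, the BH model is linear in its parameters,
# so an ordinary binomial glm with logit link is sufficient.
fit <- glm(cbind(marriages, lives - marriages) ~
             log(age - 15) + log(45 - age) + cohort,
           family = binomial, data = d)

# exp(coefficient) is the ratio of conditional odds of marriage for
# the "late" cohort relative to the "early" cohort.
odds_ratio <- exp(coef(fit)[["cohortlate"]])
print(round(odds_ratio, 2))
```

An odds ratio of about 0.25 here would be reported as "the conditional odds are about 25% of those for the reference cohort".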
15. Selected BH Model
[Figure: Probability of Marriage (0.00 to 0.15) against Age (years), 15 to 45, one curve per cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974]]
Deviance = 12073 Residual d.f. = 31001
16. Nonlinear Discrete-time Hazard Models
The nonlinear discrete-time hazard model is an example of
a generalised nonlinear model, which can be fitted using
the gnm package in R (Turner and Firth, R News, 2007)
parameters estimated by a modified IWLS algorithm
certain nonlinear terms inbuilt e.g. Mult, Exp
our terms cannot be expressed in terms of these
functions, so need to write custom "nonlin" function
17. Custom "nonlin" Function
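LogExcess <- function(age, side = "left") {
    call <- sys.call()
    # Bound just outside the observed ages, so that the argument of
    # log() stays positive for any value of the alpha predictor
    constraint <- ifelse(side == "left",
                         min(age) - 1e-5, max(age) + 1e-5)
    list(predictors = list(beta = ~1, alpha = ~1),
         variables = list(substitute(age)),
         # Build the term as a string:
         #   left:  beta * log(age - constraint + exp(alpha))
         #   right: beta * log(constraint - age + exp(alpha))
         term = function(predLabels, varLabels) {
             paste(predLabels[1], " * log(",
                   " -"[side == "right"], varLabels[1], " + ",
                   " -"[side == "left"], constraint,
                   " + exp(", predLabels[2], "))")
         },
         call = as.expression(call))
}
class(LogExcess) <- "nonlin"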
18. Summary of Baseline Model
Call:
gnm(formula = marriages/lives ~ LogExcess(age, side = "left") +
    LogExcess(age, side = "right"), family = binomial, data = fulldata,
    weights = lives, start = c(-20, 3, 0, 3, 0))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.8098  -0.4441  -0.3224  -0.1528   4.0483

Coefficients:
                                     Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)                         -118.5395    201.6387   -0.588   0.55661
LogExcess(age, side = "left")beta      3.6928      1.1913    3.100   0.00194
LogExcess(age, side = "left")alpha    -0.1432      0.8935   -0.160   0.87267
LogExcess(age, side = "right")beta    24.8623     38.5743    0.645   0.51923
LogExcess(age, side = "right")alpha    4.0247      1.7376    2.316   0.02054

Std. Error is NA where coefficient has been constrained or is unidentified

Residual deviance: 12553 on 31004 degrees of freedom
AIC: 12748
Number of iterations: 76
19. Parameter Correlations
           c        βl       αl       βr       αr
c    1.00000
βl  -0.92563   1.00000
αl  -0.80861   0.95844  1.00000
βr  -0.99999   0.92688  0.80989  1.00000
αr  -0.99833   0.90319  0.77910  0.99808  1.00000
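One way to see how correlations as extreme as −0.99999 (between c and βr) can arise is that very different values of c, βr and αr give almost the same right-hand term over the observed ages. The sketch below compares two such parameter sets. The values are invented for illustration and are not estimates from the LII data.

```r
age <- 16:44

# Intercept plus right-hand term: c + beta_r * log(alpha_r - age).
right_term <- function(c0, beta_r, alpha_r) c0 + beta_r * log(alpha_r - age)

# Hypothetical parameter set A.
curve_a <- right_term(c0 = 0, beta_r = 25, alpha_r = 100)

# Hypothetical parameter set B: the right endpoint is doubled, beta_r is
# rescaled so that the slope at age 30 matches A, and c0 is shifted so
# that the value at age 30 matches A.
beta_b  <- 25 * (200 - 30) / (100 - 30)
c0_b    <- 25 * log(100 - 30) - beta_b * log(200 - 30)
curve_b <- right_term(c0 = c0_b, beta_r = beta_b, alpha_r = 200)

# Largest difference between the two curves on the logit scale,
# in absolute terms and relative to the range of curve A.
max_gap <- max(abs(curve_a - curve_b))
rel_gap <- max_gap / diff(range(curve_a))
print(c(c0_b = c0_b, beta_b = beta_b, max_gap = max_gap, rel_gap = rel_gap))
```

The two parameter sets differ by hundreds of units in c and by a factor of two in αr, yet the curves differ by only a few percent of their range, so the data can barely distinguish them.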
20. Example ’Recoil’ Plot
[Figure: Probability of Marriage (0.00 to 0.12) against Age, 10 to 50]
21. Example ’Recoil’ Plot
[Figure: Probability of Marriage (0.00 to 0.12) against Age, 10 to 50]
22. Example ’Recoil’ Plot
[Figure: Probability of Marriage (0.00 to 0.12) against Age, 10 to 50, with plotted points added]
23. Is Near-aliasing a Problem?
Extended model can still be used as baseline hazard
$$\operatorname{logit}(h(t \mid x_{it})) = l(\mathrm{age}_{it}) + x_{it}\beta$$
Near-aliasing will make models harder to fit - particularly
with several covariates
Not all parameters are interpretable
24. Re-parameterizing the Nonlinear Model
The nonlinear hazard model can be re-parameterized as
follows:
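$$l(\mathrm{age}_{it}) = \gamma - \delta(\nu - \alpha_l)\log\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} - \delta(\alpha_r - \nu)\log\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}_{it}}$$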
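The reparameterisation can be checked numerically. The sketch below assumes it corresponds to setting βl = δ(ν − αl) and βr = δ(αr − ν) in the endpoint model l(age) = c + βl log(age − αl) + βr log(αr − age), so that both logarithmic terms are subtracted from γ. The parameter values are invented for illustration.

```r
# Reparameterised baseline on the logit scale.
baseline <- function(age, gamma, nu, delta, alpha_l, alpha_r) {
  gamma -
    delta * (nu - alpha_l) * log((nu - alpha_l) / (age - alpha_l)) -
    delta * (alpha_r - nu) * log((alpha_r - nu) / (alpha_r - age))
}

# Hypothetical parameter values, chosen only for illustration.
gamma <- -2; nu <- 25; delta <- 0.3; alpha_l <- 14; alpha_r <- 60

age <- seq(15, 50, by = 0.01)
l   <- baseline(age, gamma, nu, delta, alpha_l, alpha_r)

# The curve peaks at age nu, with height gamma on the logit scale,
# i.e. a peak hazard of expit(gamma).
peak_age    <- age[which.max(l)]
peak_hazard <- plogis(max(l))

# The same curve in the original (c, beta_l, beta_r) parameterisation.
beta_l <- delta * (nu - alpha_l)
beta_r <- delta * (alpha_r - nu)
c0     <- gamma - beta_l * log(nu - alpha_l) - beta_r * log(alpha_r - nu)
l_old  <- c0 + beta_l * log(age - alpha_l) + beta_r * log(alpha_r - age)

print(c(peak_age = peak_age, peak_hazard = peak_hazard))
```

The two parameterisations describe the same family of curves; only the coordinates used to index them change.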
25. Interpretation of Parameters
The parameters of the new parameterisation have a more
useful interpretation than before:
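[Figure: hazard curve of Probability of Marriage against Age (years), with the endpoints αL and αR and the peak location ν marked on the age axis and the peak height expit(γ) marked on the probability axis]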
26. New Parameter Correlations
           γ         ν        δ       αl       αr
γ    1.00000
ν    0.12956   1.00000
δ    0.21943  -0.69849  1.00000
αl   0.27236  -0.42848  0.91425  1.00000
αr   0.03231  -0.75428  0.93696  0.77910  1.00000
Table: Correlations between the estimated parameters of the
reparameterized baseline model
27. Recoil Plots for Reparameterised Model
[Figure: five panels, one per parameter, each showing Probability of Marriage (0.00 to 0.12) against Age (10 to 50) for the Original Model, the Perturbed Model and the Re-fitted Model (points)]
Parameter change shown in each panel:
peak height (γ): −2.09 → −1.95
peak location (ν): 25.39 → 28
fall off (δ): 0.34 → 0.15
left endpoint (αL): 14.17 → 15.04
right endpoint (αR): 100.66 → 47.68
28. Analysis with the Reparameterised Model
We can now repeat the previous analysis using the
nonlinear baseline hazard instead of the BH hazard
function
The model selection is qualitatively unchanged
The residual deviance is reduced by about 20 at the
expense of 2 d.f.
There is a lot of uncertainty about the right end-point -
in the final model it is estimated as 400 years with a
large standard error.
29. Infinite Right End-point
It seems more appropriate to use a baseline hazard in
which the right end-point tends to infinity:
$$l(\mathrm{age}_{it}) = \gamma - \delta\left\{(\nu - \alpha_l)\log\frac{\nu - \alpha_l}{\mathrm{age}_{it} - \alpha_l} + \mathrm{age}_{it} - \nu\right\}$$
Re-fitting the final model with this baseline increases the
deviance by a negligible amount
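The limiting form can be checked with a short expansion. It is a sketch that assumes the right-hand term of the finite-endpoint baseline is −δ(αr − ν) log{(αr − ν)/(αr − age)}, i.e. the term that forces the curve to peak at ν:

```latex
(\alpha_r - \nu)\log\frac{\alpha_r - \nu}{\alpha_r - \mathrm{age}_{it}}
  = -(\alpha_r - \nu)\log\left(1 - \frac{\mathrm{age}_{it} - \nu}{\alpha_r - \nu}\right)
  = (\mathrm{age}_{it} - \nu) + \frac{(\mathrm{age}_{it} - \nu)^2}{2(\alpha_r - \nu)} + \cdots
  \longrightarrow \mathrm{age}_{it} - \nu \quad \text{as } \alpha_r \to \infty
```

So the right-hand logarithmic term is replaced by a term linear in age with slope −δ, and the maximum of the curve stays at age ν with height γ.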
30. Comparing Models
[Figure: two panels of Probability of Marriage (0.00 to 0.15) against Age (years), 15 to 45, each with one curve per cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974]]
Left panel: Deviance = 12073 Residual d.f. = 31001
Right panel: Deviance = 12051 Residual d.f. = 31000
31. Refining the Model
The model building strategy so far has been similar to
Blossfeld and Huinink (1991) for comparison
Careful consideration of the fit of the model suggests that
improvements can be made
32. Final Model with New Baseline
[Figure: Probability of Marriage (0.00 to 0.15) against Age (years), 15 to 45, one curve per cohort: (1949,1954], (1954,1959], (1959,1964], (1964,1969], (1969,1974]]
Deviance = 12051 Residual d.f. = 31000
33. Cohort Effect
We can investigate the cohort effect further by replacing
the cohort factor by a year-of-birth factor and plotting the
resultant effects
[Figure: Year-of-birth Effect (about −2.5 to 0.0) against Year of Birth (1955 to 1970), shown as plotted points]
34. Year-of-birth Effect
The plot suggests a more appropriate model
$$\theta \exp(\lambda(\mathrm{yrb}_i - 1950))$$
Replacing the year-of-birth factor with this nonlinear term
reduces the deviance by 19 whilst gaining 2 d.f.
35. Checking the Fit
The new year-of-birth term takes account of the effect of
this factor on the magnitude of the hazard
To check for other effects on the hazard, we can group
the data by year of age and cohort then plot the
corresponding observed and fitted proportions
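The grouping step can be sketched in base R as follows. This is an illustration on simulated person-period data with an invented model; the column names, cohort labels and coefficients are assumptions, not the authors' data or final model.

```r
set.seed(7)

# Hypothetical person-period data: one row per woman-year at risk.
n_rows <- 5000
pp <- data.frame(age    = sample(16:30, n_rows, replace = TRUE),
                 cohort = sample(c("1950s", "1960s"), n_rows, replace = TRUE))
pp$event <- rbinom(n_rows, 1,
                   plogis(-3 + 0.1 * (pp$age - 16) -
                            0.5 * (pp$cohort == "1960s")))

# Any fitted hazard model with a fitted() method could be used here;
# a simple binomial glm stands in for the model being checked.
fit <- glm(event ~ log(age - 15) + cohort, family = binomial, data = pp)
pp$fitted <- fitted(fit)

# Observed and fitted proportions married within each age-by-cohort cell.
check <- aggregate(cbind(observed = event, fitted) ~ age + cohort,
                   data = pp, FUN = mean)
print(head(check))
```

Plotting the observed and fitted columns against age, one panel per cohort, shows where the fitted hazard departs systematically from the data.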
38. Linear Dependence of Peak Location
Quantifying the education level by a dynamic measure of
years in education ed, we incorporate a linear dependence
of peak location on ed:
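$$l(x_{it}) = \gamma - \delta\left\{(\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l)\log\frac{\nu_0 + \nu_1 \mathrm{ed}_i - \alpha_l}{\mathrm{age}_{it} - \alpha_l} + \mathrm{age}_{it} - \nu_0 - \nu_1 \mathrm{ed}_i\right\}$$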
This results in a non-proportional hazards model
39. Years Post-Education
Checking the fit against years post-education:
[Figure: Proportion married (0.00 to 0.15) against Years post education (−10 to 30), shown as plotted points, with three annotations: lower rate of increase in first 3 years post-education; sharp change at 7 years post-education; outlying points]
40. Early Career Effect
The lower rate of increase during the first 3 years
post-education may be explained by an early career effect
This can be incorporated in the model by including an
appropriate indicator variable, significantly reducing the
deviance
The deviance does not significantly increase when the left
endpoint is constrained to 15 years
41. Effect of Education
Peak location varies from 20.78 years (primary education)
to 26.89 years (university graduates)
[Figure: Probability of marriage (0.00 to 0.20) against Age (years), 10 to 50, one curve per education level: Primary, Lower sec., Upper sec., PLC, IT, University]
42. Effect of Year-of-birth
Peak hazard varies from 0.17 (b. 1950) through 0.15 (b.
1960) to 0.07 (b. 1970)
[Figure: Probability of marriage (0.00 to 0.20) against Age (years), 10 to 50, one curve per year of birth: 1950, 1960, 1970]
43. Summary
Estimating the support of the hazard function improves fit
Near-aliasing can occur in nonlinear models, but can be
overcome by re-parameterisation
Our proposed model has more interpretable parameters,
particularly location and magnitude of the maximum
hazard
can investigate effect of covariates on these features
The parametric form does impose some restrictions on
the shape of the hazard curve
44. References
A comprehensive manual is distributed with the package
at http://www.cran.r-project.org/package=gnm
A working paper on the marriage application is available
at www.warwick.ac.uk/go/crism/research/2007