Jan Picek, Martin Schindler, Jan Kyselý, Romana Beranová: Statistical aspects of the regression quantiles methodology in the POT analysis
1. Statistical aspects of the regression quantiles methodology in the POT analysis
Jan Picek, Martin Schindler
Technical University of Liberec, Czech Republic
Department of Applied Mathematics
Jan Kyselý, Romana Beranová
Institute of Atmospheric Physics, Czech Republic
Workshop Non-stationary extreme value modelling in climatology Liberec, February 16–17, 2012
2. Motivation
Development of extreme value models with time-dependent parameters in
order to estimate (time-dependent) high quantiles of maximum daily air
temperatures over Europe in climate change simulations (1961-2100).
Kyselý, Picek, Beranová (2010): Global and Planetary Change, 72, 55-68
3. Data motivation
Differences between 20-yr return values of TMAX estimated using the
non-stationary POT model for years 2100 and 2071. Large (small) crosses
mark gridpoints in which the estimated 90% (80%) CIs do not overlap.
4. Theoretical models
Fisher-Tippett theorem: "If suitably normalized maxima converge in
distribution to a non-degenerate limit, then the limit distribution must be
an extreme value distribution."
⇒ Block maxima method: we collect data on block maxima and fit the
three-parameter form of the GEV distribution. This requires a lot of raw
data, so that we can form sufficiently many, sufficiently large blocks.
Threshold view: it is reasonable to use all values exceeding a given
high threshold u. Pickands (1975) showed that the limiting distribution of
normalized excesses over a threshold u, as the threshold approaches the
end-point $u_{end}$ of the variable of interest, is the Generalized Pareto
Distribution.
5. Theoretical models
It is usual to fit the Generalized Pareto Distribution to excesses over a
(high enough) threshold. Thus we suppose that the asymptotic result is
(approximately) true for the threshold of interest.
The method is known as peaks-over-threshold (POT) and leads to the
Poisson process model for threshold exceedances and the Generalized
Pareto (GP) distribution for their magnitudes.
The block maxima and POT methods assume stationarity of the underlying
process, which is often violated in climatology by the presence of a
trend or long-term variability in the data.
6. Theoretical models
If we describe the variable of primary interest using covariate information
(time index, variables based on atmospheric circulation ...)
⇒
we can use an approach based on the theory of point processes developed by
Smith (1989) and Coles (2001).
The method leads to a likelihood function that can be treated in the usual
way to obtain maximum likelihood estimates, standard errors and confidence
intervals of the model parameters. One of its main advantages is that it
enables a straightforward incorporation of time-dependency of the parameters
of the extreme value distribution.
BUT
the threshold may also depend on the covariates.
7. Theoretical models
When a significant trend is present in the data, no fixed threshold in the
POT models is suitable over longer periods of time: there are either too
few (or no) exceedances over the threshold in an earlier part of records or
too many exceedances towards the end of the examined period.
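The imbalance can be reproduced on synthetic data. The following minimal sketch is not taken from the slides: it assumes a hypothetical linear trend (`slope`) with Gaussian noise, uses the empirical 95% quantile of the whole record as the fixed threshold and, for comparison, a threshold that moves with the (here known) trend. It only illustrates the counting argument, not the estimation of the threshold.

```python
import random

def upper_quantile(values, alpha):
    """Empirical alpha-quantile, taken as an order statistic of the sample."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, int(alpha * len(ordered)))
    return ordered[index]

def count_by_half(series, thresholds):
    """Number of exceedances in the first and in the second half of the record."""
    half = len(series) // 2
    early = sum(y > u for y, u in zip(series[:half], thresholds[:half]))
    late = sum(y > u for y, u in zip(series[half:], thresholds[half:]))
    return early, late

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 2000                 # hypothetical record length
slope = 0.002            # hypothetical trend per time step (assumption)
series = [slope * t + rng.gauss(0.0, 1.0) for t in range(n)]

# Fixed threshold: 95% quantile of the whole record.
fixed_u = upper_quantile(series, 0.95)
fixed_counts = count_by_half(series, [fixed_u] * n)

# Moving threshold: the known trend plus the 95% quantile of the detrended values.
detrended = [y - slope * t for t, y in enumerate(series)]
detrended_u = upper_quantile(detrended, 0.95)
moving_counts = count_by_half(series, [slope * t + detrended_u for t in range(n)])

print("fixed threshold  (early, late):", fixed_counts)
print("moving threshold (early, late):", moving_counts)
```

With the fixed threshold nearly all exceedances fall into the second half of the record, while the moving threshold spreads them over both halves.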
8. Regression quantiles
We use a time-dependent threshold based on the quantile regression
methodology.
Consider the linear regression model
$$Y = X\beta + E, \tag{1}$$
where Y is an (n × 1) vector of observations, X is an (n × (p + 1)) matrix,
β is the ((p + 1) × 1) unknown parameter (p ≥ 1), and E is an (n × 1)
vector of i.i.d. errors.
We assume that the first column of X is $1_n$, i.e. the first component of
β is an intercept.
9. Regression quantiles
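R. Koenker and G. Bassett (1978) defined the α-regression quantile $\hat{\beta}(\alpha)$
(0 < α < 1) for the model (1) as any solution of the minimization
$$\sum_{i=1}^{n} \rho_\alpha\left(Y_i - x_i^\top t\right) := \min, \qquad t \in \mathbb{R}^{p+1}, \tag{2}$$
where $x_i^\top$ denotes the i-th row of X and
$$\rho_\alpha(x) = x\,\psi_\alpha(x), \quad x \in \mathbb{R}^1, \qquad \psi_\alpha(x) = \alpha - I[x<0], \quad x \in \mathbb{R}^1. \tag{3}$$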
10. Regression quantiles
[Figure: plot of y against x showing the 30% and 70% regression quantiles.]
The advantage of this approach is that many aspects of usual quantiles
and order statistics are generalized naturally to the linear model.
11. Regression quantiles
Mean annual number of exceedances above the threshold (averaged over
gridpoints) for the 95% regression quantile and the 95% quantile.
12. Regression quantiles
Computation: the α-regression quantile $\hat{\beta}(\alpha)$ can be characterized
as the component β of the optimal solution $(\beta, r^+, r^-)$ of the linear
program
$$\alpha\, 1_n^\top r^+ + (1-\alpha)\, 1_n^\top r^- := \min$$
$$X\beta + r^+ - r^- = Y \tag{4}$$
$$\beta \in \mathbb{R}^{p+1}, \quad r^+, r^- \in \mathbb{R}^n_+, \quad 0 < \alpha < 1,$$
where $1_n = (1, \dots, 1)^\top \in \mathbb{R}^n$.
R – package quantreg
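A linear program attains its optimum at a vertex, and for model (1) with a single covariate a vertex of (4) corresponds to a line passing through two observations. The sketch below uses this fact to solve (4) by brute force for a straight line. It is a didactic illustration that assumes at least two distinct covariate values; it is not how a dedicated solver works and is only practical for small samples. All names and data are hypothetical.

```python
from itertools import combinations

def check_loss(alpha, residual):
    """rho_alpha applied to a single residual."""
    return residual * (alpha - (1.0 if residual < 0 else 0.0))

def line_quantile(alpha, xs, ys):
    """Return (intercept, slope) of the alpha-regression quantile line.

    Every line through two observations is a candidate; the candidate
    with the smallest summed check loss is returned.
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # a vertical line is not of the form y = b0 + b1 * x
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        loss = sum(check_loss(alpha, y - intercept - slope * x)
                   for x, y in zip(xs, ys))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]

# Toy data: six points on y = 2x and two points on y = 2x + 10.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 18.0]

median_fit = line_quantile(0.5, xs, ys)
upper_fit = line_quantile(0.9, xs, ys)
print(median_fit, upper_fit)
```

With these toy data the 50% line follows the six lower points and the 90% line passes through the two upper points, so the same slope is obtained with two different intercepts.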
13. Regression quantiles
Theory (POT): Let $X_1, X_2, \dots$ be i.i.d. random variables with
distribution function F. The behavior of extreme events (all values
exceeding a given high threshold u) is given by the conditional probability
$P(X_i > y \mid X_i > u)$ and
$$P(X_i < y \mid X_i > u) \to H(y), \qquad u \to u_{end},$$
with
$$H(y) = \begin{cases} 1 - \left(1 + \gamma\,\dfrac{y-\mu}{\sigma}\right)^{-1/\gamma}, & \gamma \neq 0, \\[2ex] 1 - e^{-\frac{y-\mu}{\sigma}}, & \gamma = 0, \end{cases}$$
where $1 + \gamma\,\frac{y-\mu}{\sigma} > 0$ and $u_{end}$ is the right end-point of the variable $X_i$.
Dienstbier and Picek (2011) showed that also the limiting distribution of
normalized excesses of a regression quantile threshold is the Generalized
Pareto.
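The two branches of H(y) can be written as one small function. This is a sketch rather than a library routine: the function name is hypothetical, and returning 0 below the location µ and 1 beyond a finite upper end-point (γ < 0) is a convention assumed here for values outside the support.

```python
import math

def gpd_cdf(y, mu, sigma, gamma):
    """Generalized Pareto distribution function H(y).

    mu: location (threshold), sigma > 0: scale, gamma: shape.
    """
    z = (y - mu) / sigma
    if z <= 0.0:
        return 0.0                      # at or below the threshold
    if gamma == 0.0:
        return 1.0 - math.exp(-z)       # exponential branch
    base = 1.0 + gamma * z
    if base <= 0.0:
        return 1.0                      # beyond the upper end-point (gamma < 0)
    return 1.0 - base ** (-1.0 / gamma)

print(gpd_cdf(1.0, 0.0, 1.0, 1.0))    # heavy tail, gamma = 1
print(gpd_cdf(1.0, 0.0, 1.0, 0.0))    # exponential tail, gamma = 0
print(gpd_cdf(1.0, 0.0, 1.0, -0.5))   # bounded tail, gamma = -0.5
```

The γ = 0 branch is the limit of the γ ≠ 0 branch, so the function is continuous in the shape parameter.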
14. Regression quantiles
The formal dual program to (4) can be written in the form
$$Y^\top \hat{a} := \max$$
$$X^\top \hat{a} = (1-\alpha)\, X^\top 1_n \tag{5}$$
$$\hat{a} \in [0,1]^n, \quad 0 < \alpha < 1.$$
The components of the optimal solution $\hat{a}(\alpha) = (\hat{a}_1(\alpha), \dots, \hat{a}_n(\alpha))^\top$ are
called the regression rank scores (Gutenbrunner and Jurečková, 1992).
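The link between (5) and ranks is easiest to see in a special case. The sketch below assumes an intercept-only design X = 1_n, which is outside the p ≥ 1 setting of the slides and is used here only for illustration. Program (5) then asks to maximize the sum of Y_i a_i subject to the scores summing to (1 − α)n with each score in [0, 1], which a greedy pass over the sorted observations solves. The function name and data are hypothetical, and the observations are assumed to be distinct.

```python
def location_rank_scores(alpha, ys):
    """Solve the dual program (5) for the intercept-only design X = 1_n."""
    n = len(ys)
    budget = (1.0 - alpha) * n   # right-hand side of the equality constraint
    scores = [0.0] * n
    # Visit observations from the largest to the smallest and give each
    # the largest admissible score until the budget is used up.
    for i in sorted(range(n), key=lambda k: ys[k], reverse=True):
        scores[i] = min(1.0, budget)
        budget -= scores[i]
        if budget <= 0.0:
            break
    return scores

ys = [3.1, 0.4, 2.2, 5.0, 1.7]   # toy observations
scores = location_rank_scores(0.5, ys)
print(scores)                    # [1.0, 0.0, 0.5, 1.0, 0.0]
```

The scores depend on the observations only through their ranks: the largest values get score 1, the smallest get 0, and at most one observation gets a fractional score.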
15. Tests
Hájek (1965) extended the Kolmogorov-Smirnov test to verify the hypothesis
of randomness against the regression alternative. He considered the
rank-scores process and showed that not only the Kolmogorov-Smirnov test
but many other rank tests can be expressed as functionals of the
rank-scores process.
A general class of tests based on regression rank scores, parallel to classical
rank tests such as the Wilcoxon, normal scores and median tests, was
constructed in Gutenbrunner et al. (1993), ...
16. Tests
Typically, the test based on regression rank scores applies to the model
$$Y = X_1\beta + X_2\gamma + E, \tag{6}$$
where β and γ are p- and q-dimensional parameters, $X_1$ and $X_2$ are matrices
of order (n × p) and (n × q), respectively, and one verifies the hypothesis
$$H_0 : \gamma = 0, \quad \beta \text{ unspecified.}$$
17. Tests
Results of the tests on parameters of the linear and quadratic terms of
the 95% regression quantiles in individual GCM scenarios.
Percentage of gridpoints in which the examined parameter is significantly
different from zero at p=0.05

GCM     Scenario   Linear (%)   Quadratic (%)
CM2.0   A2         100.0        90.3
        A1B         98.1        43.5
        B1          98.1        43.1
CM2.1   A2          98.9        77.5
        A1B         99.4        38.7
        B1          98.9        54.6
        A1FI        99.8        47.4
18. Tests
Threshold Selection
"Remove the trend and apply to the residuals:"
• Threshold Choice plot
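Let $X \sim GPD(\mu_0, \sigma_0, \gamma_0)$. Let $\mu_1 > \mu_0$ be another threshold.
The r.v. $X \mid X > \mu_1$ is also GPD with updated parameters
$\sigma_1 = \sigma_0 + \gamma_0(\mu_1 - \mu_0)$ and $\gamma_1 = \gamma_0$. Let $\sigma^* = \sigma_1 - \gamma_1\mu_1$.
$\sigma^*$ and $\gamma_1$ are constant for all $\mu_1 > \mu_0$ if $\mu_0$ is a suitable threshold.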
• Mean Residual Life Plot
• L-Moments plot
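The threshold stability property behind the threshold choice plot can be checked numerically. The sketch below uses hypothetical parameter values and assumes γ ≠ 0. For several higher thresholds µ1 it compares the conditional survival probability P(X > x | X > µ1) under the original GPD with the survival probability of the GPD with updated scale, and records the reparametrized scale σ1 − γ1µ1.

```python
def gpd_survival(x, mu, sigma, gamma):
    """P(X > x) for a GPD with gamma != 0 and x inside the support."""
    return (1.0 + gamma * (x - mu) / sigma) ** (-1.0 / gamma)

mu0, sigma0, gamma0 = 0.0, 2.0, 0.25   # hypothetical GPD parameters

results = []
for mu1 in (1.0, 2.0, 4.0):            # candidate higher thresholds
    sigma1 = sigma0 + gamma0 * (mu1 - mu0)   # updated scale
    sigma_star = sigma1 - gamma0 * mu1       # reparametrized scale
    x = mu1 + 3.0                            # an arbitrary point above mu1
    # Conditional survival computed from the original distribution ...
    conditional = (gpd_survival(x, mu0, sigma0, gamma0)
                   / gpd_survival(mu1, mu0, sigma0, gamma0))
    # ... and survival of the GPD with threshold mu1 and scale sigma1.
    direct = gpd_survival(x, mu1, sigma1, gamma0)
    results.append((mu1, sigma_star, conditional, direct))

for row in results:
    print(row)
```

The two probabilities agree for every µ1 and the reparametrized scale stays the same, which is why estimates of these quantities plotted against the threshold should look flat above a suitable threshold.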
20. Conclusions
• The proposed non-stationary peaks-over-threshold method with
time-dependent thresholds estimated using regression quantiles is
computationally straightforward.
• The limiting distribution of normalized excesses of a regression
quantile threshold is the Generalized Pareto.
• The choice of regression model is based on the "rank" tests
corresponding to regression quantiles.
• We can use the usual tools to select a suitable threshold.