Air pollution prediction using conformal kriging and machine learning

Conformal prediction of air pollution concentrations for
the Barcelona Metropolitan Region
PhD Thesis summary

Olga Ivina

University of Girona
GRECS research group
CIBER de Epidemiología y la Salud Pública

November 22, 2012


Outline
Introduction
  Air pollution and its effects
  Air pollution exposure assessment
  Conformal predictors for air pollution problem
Objectives
Methods and data
  Kriging
  Conformal predictors
  Computing
  Data
Results
  Ordinary kriging and RRCM models in default setting
  Kernelisation: a Gaussian kernel
  Kernelisation: other kernels
  Comparison of models
Discussion
Conclusion
  Conformal predictors and geostatistics
  Future research

Air pollution and its effects
Introduction

Air pollution is a problem of growing concern all over the world.
There is a large body of scientific evidence of the hazardous effects of air
pollution on people's health and well-being, as well as on the general
ecological condition of our planet.
In people: association with adverse health outcomes, both in adults and
in children. Children are especially susceptible to pollution. They are
affected from the very first stages of their lives onward. Linked outcomes
(to name a few):
- preterm birth and low birth weight
- asthma aggravation, cough and bronchitis
- allergies: hay fever, rhinitis, ...
- excess risk of mortality


Air pollution and its effects - 2
Introduction

Adults are affected by pollution as well. In adults, pollution is linked to
both long-term and short-term health effects (to name a few):
- respiratory: COPD, asthma, chronic bronchitis
- lung cancer
- cardiovascular morbidity
- mortality: cancer, all-cause, cardiopulmonary, non-accidental, ...

Special factors of impact: a person's socioeconomic status (SES) and
geographical location.


Introduction

Global air pollution map produced by Envisat’s SCIAMACHY.
Authors: S. Beirle, U. Platt and T. Wagner, University of Heidelberg’s Institute for Environmental Physics.


Introduction

The main contributor to air pollution in urban areas is traffic. Two
traffic-related "criteria" air pollutants are considered in this study:
- nitrogen dioxide (NO2)
- particulate matter (PM10)

NO2 effects:
- short-term: respiratory effects and asthma aggravation
- long-term: risk of coronary heart disease and fatal events

PM10 effects:
- short-term: aggravation of respiratory and cardiovascular diseases,
  premature death, ...
- long-term: development of heart and lung diseases, premature
  death, ...

Air pollution exposure assessment
Introduction

Problem: direct measurements of pollution are not always available.
There exists a large number of models that aim to predict pollution at a
given location. The main classes are:
- proximity models
- geostatistical models
- land use regression (LUR) models
- dispersion models
- integrated meteorological emission (IME) models
- hybrid models


Conformal predictors for air pollution problem
Introduction

Problem: existing methods for air pollution exposure assessment may lack
a measure of confidence in their predictions.
To tackle this problem, this research proposes using a recently developed
approach: conformal predictors. A conformal predictor is a "confidence
predictor" for which the level of confidence is chosen in advance by the
user. Its predictions are always valid: validity follows from the
definition of a conformal predictor.


Conformal predictors for air pollution problem - 2
Introduction

A conformal predictor is defined by some nonconformity measure, and it
has two major desiderata:
- validity of predictions
- efficiency of predictions

Conformal predictors are flexible: they can be based on almost any
underlying statistical algorithm.
In air pollution modeling, if a regression-based algorithm such as LUR or
kriging is used, the regression residuals serve as a nonconformity measure.


Objectives

This dissertation has two major objectives:
1. To demonstrate the capacity of conformal predictors as a method for
   spatial environmental modeling.
2. To provide valid estimates of nitrogen dioxide and fine particulate
   matter for the Barcelona Metropolitan Region.


Kriging
Methods and data

Kriging is a spatial interpolation method. It provides a prediction of a
variable of interest at an unobserved point on the basis of a set of
observed points. It also provides an estimate of the error variance
(called the "kriging variance").
It was first introduced in 1951 by the South African engineer D. G. Krige
in his master's thesis on the estimation of a mineral ore body. The method
has been developed further: nowadays the term "kriging" stands for a set of
methods such as ordinary kriging, simple kriging, co-kriging, Bayesian
kriging, etc.
In its simplest form, a kriging estimate of the data at an unobserved
location is a linear combination of the observed data. The coefficients of
the equation depend on the spatial structure of the data and on the spatial
covariance.


Kriging - 2
Methods and data

The most common form of kriging is ordinary kriging. It is used when the
mean of the second-order stationary process is unknown. It is based on the
geostatistical concept of the variogram, and on a covariance function used
to approximate it.
Let there be $n$ neighboring observed locations, $x_1, \dots, x_n$, and an
unobserved location $x_0$, on a spatial domain $D$. Let $\{Z(x) : x \in D\}$
denote the process, and let it have a variogram $\gamma(h)$. Then the
ordinary kriging estimate $Z^*_{OK}(x_0)$ at the unobserved point $x_0$ takes
the following analytical form:

$$Z^*_{OK}(x_0) = \sum_{\alpha=1}^{n} \omega_\alpha Z(x_\alpha), \tag{1}$$

where $\omega_\alpha$ are the kriging weights. Ordinary kriging provides best
linear unbiased (BLUE) estimates of a random field, together with an error
variance estimate (the kriging variance).
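The slide does not spell out how the weights in (1) are obtained. As a hedged sketch following the standard textbook formulation (e.g. Wackernagel, 2003), and not necessarily the exact form used in the thesis, the ordinary kriging weights solve a linear system built from the variogram. The Lagrange multiplier $\mu$ and the symbol $\sigma^2_{OK}$ are introduced here for illustration:

```latex
\begin{aligned}
\sum_{\beta=1}^{n} \omega_\beta \, \gamma(x_\alpha - x_\beta) + \mu
  &= \gamma(x_\alpha - x_0), \qquad \alpha = 1, \dots, n, \\
\sum_{\beta=1}^{n} \omega_\beta &= 1, \\
\sigma^2_{OK}(x_0) &= \mu + \sum_{\alpha=1}^{n} \omega_\alpha \, \gamma(x_\alpha - x_0).
\end{aligned}
```

The unit-sum constraint is what keeps the estimate unbiased when the mean is unknown, and $\sigma^2_{OK}(x_0)$ is the kriging variance.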


New methods. Conformal predictors
Methods and data

How does it work? Given: pairs of observations $(x_i, y_i)$, where $x_i$ is
an object and $y_i$ is a label. Then

$$Z := X \times Y \tag{2}$$

denotes the example space; $Z$ is a measurable space. Given an incomplete
data sequence $(x_1, y_1), (x_2, y_2), \dots, (x_{n-1}, y_{n-1}) \in Z^*$, the
aim is to predict the label $y_n$ for an object $x_n$. An operator

$$D : Z^* \times X \to Y \tag{3}$$

is then called a simple predictor (e.g., an ordinary kriging predictor).


New methods. Conformal predictors - 2
Methods and data

The prediction can be described as:

$$\hat{y}_n = D(x_1, y_1, x_2, y_2, \dots, x_{n-1}, y_{n-1}; x_n), \quad \hat{y}_n \in Y. \tag{4}$$

Let us allow the predictor to output prediction sets $Y_n \subseteq Y$ large
enough to provide confidence in the prediction. This means that the true
value of $y_n$ will fall in $Y_n$ with a given level of confidence, which is
chosen in advance and supplied to the predictor.
A conformal predictor is a confidence predictor defined by some
nonconformity measure. Given the measure, a conformal predictor outputs
the set of labels for which the new example conforms sufficiently with the
observed ones.
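To make the idea concrete, here is a minimal Python sketch of a conformal predictor that works on labels only (objects are ignored). The nonconformity measure (distance of a label from the mean of the bag), the function name `conformal_set`, the candidate grid and the toy concentrations are all assumptions made for illustration; this is not the RRCM used in the thesis.

```python
def conformal_set(labels, candidates, epsilon):
    """Return the candidate labels that are not rejected at significance level epsilon."""
    kept = []
    for y in candidates:
        bag = list(labels) + [y]               # tentatively complete the data with label y
        mean = sum(bag) / len(bag)
        scores = [abs(v - mean) for v in bag]  # nonconformity: distance to the bag mean
        # p-value: share of examples at least as nonconforming as the new one
        p_value = sum(s >= scores[-1] for s in scores) / len(bag)
        if p_value > epsilon:
            kept.append(y)
    return kept


# Hypothetical annual mean concentrations (ug/m3) at nine stations.
observed = [40.0, 42.0, 45.0, 47.0, 50.0, 52.0, 55.0, 58.0, 60.0]
grid = [float(v) for v in range(0, 101)]

set_80 = conformal_set(observed, grid, epsilon=0.2)    # 80% confidence
set_95 = conformal_set(observed, grid, epsilon=0.05)   # 95% confidence
print(min(set_80), max(set_80), len(set_95))
```

With only ten examples in the bag the smallest possible p-value is 1/10, so at 95% confidence no candidate can be rejected and the set is the whole grid. This is the kind of efficiency loss that a small data set causes.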


Methods and data

Ridge regression confidence machine (RRCM) is a regression-based
conformal predictor. It uses the ridge regression procedure (Hoerl and
Kennard, 1970) as its underlying algorithm.
Suppose $X_n$ is the $n \times p$ matrix of objects (independent variables),
and $Y_n$ is the vector of labels (dependent variable). Then the ridge
regression estimate of the parameters $\omega$ takes the form:

$$\omega = (X_n^\top X_n + a I_p)^{-1} X_n^\top Y_n, \tag{5}$$

where $a$ is the ridge factor; $a = 0$ yields the standard least squares
estimate. The nonconformity scores for this predictor are the absolute
regression residuals: $|e_i| := |y_i - \hat{y}_i|$.
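As an illustration of equation (5), the following Python sketch computes the ridge estimate for a design matrix with exactly two columns, so that the 2 x 2 inverse can be written out by hand. The function names `ridge_2col` and `abs_residuals` and the toy data are assumptions for this example. Note that the intercept column is penalised like any other column here, which a production implementation might avoid.

```python
def ridge_2col(X, y, a):
    """Solve (X'X + a*I) w = X'y for a design matrix X with two columns."""
    s11 = sum(r[0] * r[0] for r in X) + a
    s12 = sum(r[0] * r[1] for r in X)
    s22 = sum(r[1] * r[1] for r in X) + a
    t1 = sum(r[0] * v for r, v in zip(X, y))
    t2 = sum(r[1] * v for r, v in zip(X, y))
    det = s11 * s22 - s12 * s12
    # Closed-form inverse of the symmetric 2 x 2 matrix, applied to X'y.
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


def abs_residuals(X, y, w):
    """Nonconformity scores |y_i - yhat_i| for the fitted parameters w."""
    return [abs(v - (w[0] * r[0] + w[1] * r[1])) for r, v in zip(X, y)]


# Toy data lying exactly on y = 1 + 2x; the first column is the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]

w_ols = ridge_2col(X, y, a=0.0)    # a = 0: ordinary least squares
w_ridge = ridge_2col(X, y, a=1.0)  # a > 0: coefficients shrink towards zero
print(w_ols, w_ridge, abs_residuals(X, y, w_ridge))
```

On this toy data the least squares fit has zero residuals, while the ridge fit trades a small bias for shrunken coefficients, so its residuals (the nonconformity scores) are no longer zero.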


Methods and data

Given a significance level for the prediction (roughly, the probability of
error that must not be exceeded), an RRCM predictor outputs a set of
candidate labels $y$ for $y_n$, built from the sets

$$S_i := \{y : \alpha_i(y) \ge \alpha_n(y)\} = \{y : |a_i + b_i y| \ge |a_n + b_n y|\}, \tag{6}$$

where $a_i$ and $b_i$ are the components of the vectors $A$ and $B$.
RRCM outputs prediction sets rather than point predictions (which is what
kriging produces). These sets can take the form of a point, an interval, a
ray, a union of two rays, the whole real line, or the empty set. Usually,
the set is an interval.
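The vectors $A$ and $B$ are not defined on the slide. The following is a sketch of how they arise in the RRCM construction of Vovk, Gammerman and Shafer (2005), reconstructed here rather than taken from the thesis. The symbols $C$, $p(y)$, $\Gamma^{\varepsilon}$ and $\varepsilon$ are introduced for this note. Writing the label vector with a candidate value $y$ in the last position, the ridge residuals are linear in $y$:

```latex
\begin{aligned}
C &= I_n - X_n (X_n^\top X_n + a I_p)^{-1} X_n^\top, \\
A &= C \, (y_1, \dots, y_{n-1}, 0)^\top, \qquad B = C \, (0, \dots, 0, 1)^\top, \\
\alpha_i(y) &= |a_i + b_i y|, \qquad i = 1, \dots, n, \\
p(y) &= \frac{|\{\, i : y \in S_i \,\}|}{n}, \qquad
\Gamma^{\varepsilon} = \{\, y : p(y) > \varepsilon \,\}.
\end{aligned}
```

Each $S_i$ is determined by the points where $|a_i + b_i y| = |a_n + b_n y|$, so $p(y)$ is piecewise constant and the prediction set $\Gamma^{\varepsilon}$ at significance level $\varepsilon$ can be found by checking finitely many breakpoints.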


Methods and data

When the number of parameters $p$ is large, computation is hard. The "kernel
trick" is a method that helps deal with high-dimensional data. It also makes
it possible to account for nonlinearity in RRCM.
A kernel is a similarity measure that operates in a feature space. Given
an input space $X$ and an operator $\Phi$ that maps $X$ to a feature space
$H$ equipped with a dot product:

$$\Phi : X \to H, \qquad x \mapsto \mathbf{x} := \Phi(x),$$

a kernel is defined as follows. For $x_\alpha, x_\beta \in X$:

$$k(x_\alpha, x_\beta) = \langle \Phi(x_\alpha), \Phi(x_\beta) \rangle. \tag{7}$$
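A small numerical check can make definition (7) tangible. The Python sketch below uses a second-degree inhomogeneous polynomial kernel on two-dimensional inputs together with an explicit feature map for it. The function names `phi` and `poly2_kernel`, the offset of 1 and the test points are assumptions chosen for illustration.

```python
import math


def phi(x):
    """Explicit feature map whose dot product equals (1 + <x, z>)**2 in 2-D."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Kernel evaluated in the input space, without ever building phi."""
    return (1.0 + dot(x, z)) ** 2


x, z = (0.5, -1.0), (2.0, 3.0)
explicit = dot(phi(x), phi(z))   # dot product in the 6-dimensional feature space
implicit = poly2_kernel(x, z)    # same value from a 2-dimensional dot product
print(explicit, implicit)
```

Both routes give the same number, but the kernel never forms the six-dimensional vectors. This is why kernels keep computation manageable when the feature space is large.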


Methods and data

Any conventional covariance function for kriging can be used as
a kernel for RRCM. This research uses three (positive definite) kernels:

- a dot product kernel (default)
- a radial basis Gaussian kernel
- an inhomogeneous polynomial kernel of the second degree
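The slide names the kernels without giving formulas. In their usual textbook form (e.g. Schölkopf and Smola, 2002) they read as follows. The bandwidth $\sigma$ and the offset $c$ are generic parameters introduced here; the values used in the thesis are not stated on the slides.

```latex
\begin{aligned}
k_{\mathrm{dot}}(x, x') &= \langle x, x' \rangle, \\
k_{\mathrm{RBF}}(x, x') &= \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2\sigma^2} \right),
  \qquad \sigma > 0, \\
k_{\mathrm{poly}}(x, x') &= \left( \langle x, x' \rangle + c \right)^2, \qquad c > 0.
\end{aligned}
```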


Computing
Methods and data

All computational work was done in R.
- Kriging: geoR package, function krige.conv.
- RRCM: PredictiveRegression package, function iidpred.
- "Kernel trick": self-developed functions (on the basis of the
  PredictiveRegression package) for RRCM in "dual form" and for
  implementing the kernels.
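The self-developed R functions are not shown in the slides. As a rough illustration of what ridge regression in "dual form" means, here is a Python sketch (Python rather than R, so that it needs no packages). Every name in it (`linear`, `gaussian`, `solve`, `kernel_ridge_predict`) and the toy data are assumptions. It covers only the point prediction, not the conformal prediction sets.

```python
import math


def linear(u, v):
    """Dot product kernel."""
    return sum(a * b for a, b in zip(u, v))


def gaussian(u, v, sigma=1.0):
    """Gaussian radial basis kernel with bandwidth sigma."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def kernel_ridge_predict(X, y, x_new, kernel, a):
    """Dual-form ridge prediction: k(x_new)' (K + a*I)^{-1} y."""
    K = [[kernel(u, v) for v in X] for u in X]  # n x n Gram matrix
    for i in range(len(X)):
        K[i][i] += a                            # add the ridge factor to the diagonal
    alpha = solve(K, y)                         # dual coefficients, one per observation
    return sum(c * kernel(u, x_new) for c, u in zip(alpha, X))


X = [(1.0,), (2.0,), (3.0,)]
y = [2.0, 4.0, 6.0]
pred_linear = kernel_ridge_predict(X, y, (4.0,), linear, a=1.0)
pred_rbf = kernel_ridge_predict(X, y, (2.0,), gaussian, a=0.01)
print(pred_linear, pred_rbf)
```

The dual form solves an n x n system in the number of observations rather than a p x p system in the number of features. With the dot product kernel it reproduces the primal ridge prediction, and swapping in another kernel changes the model without changing the solver.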


Data
Methods and data

The data for this study were kindly provided by XVPCA (Network for
Monitoring and Forecasting of Air Pollution) of the Generalitat de
Catalunya.
Mean annual concentrations of two criteria pollutants, NO2 and PM10, are
provided for the Barcelona Metropolitan Region, together with the
geographical coordinates of the monitoring stations (Mercator, UTM 31).
Time frames:
- NO2: 1998-2009, excluding 2003
- PM10: 2001-2009, excluding 2003


Data - 2
Methods and data

There are 49 monitoring stations over the area in total.
The Barcelona Metropolitan Region (BMR) has a territory of about 3200 km2
and over 5 million inhabitants.
In the BMR, about 107 million trips are made weekly, 54.1% of them by
motorized transport.


Data - 3
Methods and data

Table 1. Data on mean annual nitrogen dioxide concentrations:
available observations for each year
Year          1998  1999  2000  2001  2002  2004  2005  2006  2007  2008  2009
Observations    24    25    25    25    25    24    22    24    25    25    24

Table 2. Data on mean annual particulate matter concentrations:
available observations for each year
Year          2001  2002  2004  2005  2006  2007  2008  2009
Observations    22    24    28    28    29    30    33    36


Data - 4
Methods and data

Two major drawbacks, or limiting factors, of the data set:
- Size: there is a small number of observations for each year and
  pollutant.
- Distribution: the monitoring stations are situated quite far apart
  from one another, and they are distributed unevenly over the
  geographic region.

Also, the data are annual means; more frequent observations were
unavailable for this study.


Ordinary kriging and RRCM modeling results
Results


Ordinary kriging and RRCM modeling results - 2
Results


Ordinary kriging and RRCM modeling results - 3
Results


Kernelisation: a Gaussian kernel
Results


Kernelisation: a Gaussian kernel - 2
Results


Kernelisation: a Gaussian kernel - 3
Results


Comparison of the RRCM models
Results


Comparison of the RRCM models - 2
Results


Results

Table: Comparison of models for different ridge factors (µg/m3)
        linear iid              RBF                     polynomial
ridge   0.01    1       2       0.01    1       2       0.01    1       2
2001    64.46   64.44   67.13   71.08   63.11   66.06   71.95   74.63   77.24
2002    43.43   42.46   45.54   47.41   42.91   45.05   50.44   53.17   55.82
2004    47.26   39.17   34.59   51.48   39.29   35.19   34.66   37.00   39.51
2005    39.65   45.14   49.28   35.50   47.60   51.91   51.44   54.76   57.76
2006    47.68   45.40   48.63   55.51   46.09   48.86   52.48   55.27   57.86
2007    91.43   94.02   96.45   85.40   94.09   96.65   99.83   102.11  104.29
2008    49.48   50.90   52.58   45.42   55.27   58.21   55.60   57.26   58.91
2009    28.42   27.32   29.01   29.16   26.11   27.79   32.26   33.67   35.09


Results


Results


Results

Table: Comparison of models for different ridge factors (µg/m3)
        linear iid              RBF                     polynomial
ridge   0.01    1       2       0.01    1       2       0.01    1       2
1998    76.08   72.33   68.27   65.81   72.37   68.37   65.27   64.71   65.99
1999    66.31   60.11   61.44   67.68   60.57   60.39   65.32   68.20   70.87
2000    51.69   55.27   57.89   50.91   52.90   55.63   61.89   64.19   66.38
2001    36.25   41.30   44.90   35.32   38.65   42.36   49.54   52.34   54.95
2002    52.12   46.57   49.51   47.78   51.44   57.38   54.51   56.99   59.37
2004    53.65   59.11   62.46   53.89   56.95   60.41   67.06   69.36   71.60
2005    78.75   84.77   88.57   79.44   82.18   86.14   94.41   96.94   99.43
2006    61.79   66.39   69.78   61.24   63.82   67.38   74.90   77.36   79.76
2007    47.01   49.35   53.13   48.15   47.11   51.04   57.15   59.91   62.48
2008    46.96   50.15   53.58   47.45   48.04   51.55   57.63   60.21   62.63
2009    55.59   55.17   53.89   48.38   54.35   52.68   52.79   55.19   57.57


Efficiency of predictions
Discussion

Kriging predictions are smooth and vary little; they are also made for
mean annual data. Error estimates, however, are large for nitrogen dioxide
and small for airborne particles, owing to the properties of the
substances: NO2 is known to have a generally larger variability than PM10.
Kriging intervals can be derived by assuming that the data are Gaussian.
This assumption is common, but not always correct. RRCM makes no
assumption about the data distribution, apart from the data being iid.
Two factors help boost the efficiency of RRCM prediction: the kernel and
the ridge factor. The latter is chosen by brute force (the method of
successive approximations).
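As a small illustration of how a kriging interval is obtained under the Gaussian assumption, the Python sketch below turns a kriging prediction and its kriging variance into a symmetric interval. The function name `gaussian_interval` and the numbers are hypothetical, and the stated coverage holds only if the Gaussian assumption does.

```python
from statistics import NormalDist


def gaussian_interval(prediction, kriging_variance, confidence=0.95):
    """Symmetric interval prediction +/- z * sd under a Gaussian assumption."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    half_width = z * kriging_variance ** 0.5
    return prediction - half_width, prediction + half_width


# Hypothetical kriging output: 50 ug/m3 with kriging variance 25 (ug/m3)^2.
low, high = gaussian_interval(50.0, 25.0)
low80, high80 = gaussian_interval(50.0, 25.0, confidence=0.80)
print(round(low, 2), round(high, 2), round(low80, 2), round(high80, 2))
```

Unlike an RRCM prediction set, the width of this interval is driven entirely by the kriging variance and the assumed distribution, not by how well the observed data conform.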


Conformal predictors and geostatistics
Conclusion

Table: Comparison of OK and RRCM
OK                              RRCM
point predictions               prediction sets (usually intervals)
regression algorithm            regression algorithm
Gaussianity assumption          iid assumption
estimates error variance        -
uses variogram and covariance   uses any appropriate kernel
function to approach it
-                               ridge factor
may lack confidence             confidence level is chosen and guaranteed

Future research
Conclusion

- Extend the existing data set for BMR
- Provide additional validation for the methods
- Test these models on the data for other cities
- Develop conformal predictors on the basis of other popular air
  pollution exposure modeling algorithms (land use regression,
  dispersion models, etc.)


Selected references

V. Vovk, A. Gammerman, G. Shafer, Algorithmic learning in a random
world, Springer (2005).
V. Vovk, I. Nouretdinov, A. Gammerman, On-line predictive linear
regression, The Annals of Statistics (2009).
H. Wackernagel, Multivariate geostatistics: an introduction with
applications, Springer (2003).
B. Schölkopf, A. J. Smola, Learning with kernels: support vector
machines, regularization, optimization, and beyond, MIT Press
(2002).
A. Lertxundi-Manterola, M. Saez, Modelling of nitrogen dioxide (NO2)
and fine particulate matter (PM10) air pollution in the metropolitan
areas of Barcelona and Bilbao, Spain, Environmetrics (2009).


Selected references - 2

A. Hoerl, R. Kennard, Ridge regression: biased estimation for
nonorthogonal problems, Technometrics 12(1) (1970).
P. Diggle, P. Ribeiro Jr., Model-based geostatistics, Springer (2007).
P. Ribeiro Jr., P. Diggle, geoR: a package for geostatistical analysis,
R News 1(2) (2001).
N. Cressie, Statistics for spatial data, Wiley (1993).
M. Jerrett et al., A review and evaluation of intraurban air pollution
exposure models, Journal of Exposure Analysis and Environmental
Epidemiology (2005).

