Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
Assessing the impact of a health intervention via user-generated Internet content
1. ECML PKDD 2015, Porto, Portugal
Assessing the impact of a health intervention via user-generated Internet data
Data Mining and Knowledge Discovery 29(5), pp. 1434–1457, 2015
Vasileios Lampos, Elad Yom-Tov, Richard Pebody and Ingemar J. Cox
2. ๏ Background and motivation
๏ Nowcasting disease rates from online text
๏ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
Assessing the impact of a health intervention via online content
3. Online, user-generated data
+ Social media, blogs, search engine query logs
+ Proxy of real-world (online+offline) behaviour
+ Complementary information sensors to more ‘traditional’ crowdsourcing efforts
+ Can answer questions difficult to resolve otherwise
+ Strong predictive power
4. Online, user-generated data: Applications
+ Politics
• voting intention (Lampos, Preotiuc-Pietro & Cohn, 2013)
• result of an election (Tumasjan et al., 2010)
+ Finance
• financial indices (Bollen, Mao & Zeng, 2011)
• tourism patterns (Choi & Varian, 2012)
+ User profiling
• age (Rao et al., 2010)
• gender (Burger et al., 2011)
• occupation (Preotiuc-Pietro, Lampos & Aletras, 2015)
5. Online, user-generated data for health
Traditional disease surveillance
- does not cover the entire population
- not present everywhere (cities / countries)
- not always timely
Digital disease surveillance
+ different or better population coverage
+ better geographical granularity
+ useful in underdeveloped parts of the world
+ almost instant
- noisy, unstructured information
e.g. (Lampos & Cristianini, 2010 & 2012), (Lamb, Paul & Dredze, 2013), (Lampos et al., 2015)
7. What this work is all about
Health intervention → disease rates: impact?
(Lampos, Yom-Tov, Pebody & Cox, 2015)
8. ✓ Background and motivation
๏ Estimating disease rates from online text
๏ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
9. Estimating disease rates from online text
Variables
$N$: time intervals
$M$: n-grams
$X \in \mathbb{R}^{N \times M}$: frequency of n-grams during the time intervals
$y \in \mathbb{R}^{N}$: disease rates during the time intervals
Ridge regression (Hoerl & Kennard, 1970)
$$\operatorname*{argmin}_{w,\beta} \left( \sum_{i=1}^{N} (x_i w + \beta - y_i)^2 + \lambda \sum_{j=1}^{M} w_j^2 \right)$$
Elastic net (Zou & Hastie, 2005)
$$\operatorname*{argmin}_{w,\beta} \left( \sum_{i=1}^{N} (x_i w + \beta - y_i)^2 + \lambda_1 \sum_{j=1}^{M} |w_j| + \lambda_2 \sum_{j=1}^{M} w_j^2 \right)$$
11. Estimating disease rates from online text
Gaussian Process
Given the observation matrix $X$, we want to learn a function $f: \mathbb{R}^{M} \to \mathbb{R}$ drawn from a GP prior
$$f(x) \sim \mathcal{GP}\left(\mu(x) = 0,\; k(x, x')\right)$$
Squared exponential (SE) covariance function
$$k_{\mathrm{SE}}(x, x') = \sigma^2 \exp\left(-\frac{\|x - x'\|_2^2}{2\ell^2}\right)$$
where $\sigma^2$ describes the overall level of variance and $\ell$ is referred to as the characteristic length-scale parameter.
Rational Quadratic covariance function (kernel): an infinite sum of squared exponential (RBF) kernels with different length-scales
$$k_{\mathrm{RQ}}(x, x') = \sigma^2 \left(1 + \frac{\|x - x'\|_2^2}{2\alpha\ell^2}\right)^{-\alpha}$$
$\alpha$ is a parameter that determines the relative weighting between small- and large-scale variations of input pairs. The RQ kernel can be used to model functions that are expected to vary smoothly across many length-scales.
One kernel per n-gram category (varied usage patterns, increasing semantic value)
$$k(x, x') = \left(\sum_{n=1}^{C} k_{\mathrm{RQ}}(g_n, g_n')\right) + k_{\mathrm{N}}(x, x')$$
where $g_n$ is used to express the features of each n-gram category.
(Rasmussen & Williams, 2006); see also (Lampos et al., 2015)
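The kernels above can be written down directly. The following is a minimal sketch, assuming that each n-gram category is a group of feature columns and that $k_{\mathrm{N}}$ is a white-noise term; both assumptions, as well as all names and hyperparameter values, are illustrative and not confirmed by the slides.

```python
import math


def sq_dist(x, x2):
    """Squared Euclidean distance ||x - x'||^2."""
    return sum((a - b) ** 2 for a, b in zip(x, x2))


def k_se(x, x2, sigma, ell):
    """Squared exponential (RBF) kernel."""
    return sigma ** 2 * math.exp(-sq_dist(x, x2) / (2.0 * ell ** 2))


def k_rq(x, x2, sigma, ell, alpha):
    """Rational Quadratic kernel."""
    return sigma ** 2 * (1.0 + sq_dist(x, x2) / (2.0 * alpha * ell ** 2)) ** (-alpha)


def k_noise(x, x2, sigma_n):
    """Assumed form of k_N: sigma_n^2 if the inputs are identical, else 0."""
    return sigma_n ** 2 if list(x) == list(x2) else 0.0


def k_composite(x, x2, categories, params, sigma_n):
    """Sum of one RQ kernel per n-gram category plus the noise term.

    categories: list of index lists; categories[n] selects the features g_n
    params:     list of (sigma, ell, alpha) tuples, one per category
    """
    total = 0.0
    for idx, (sigma, ell, alpha) in zip(categories, params):
        g = [x[i] for i in idx]
        g2 = [x2[i] for i in idx]
        total += k_rq(g, g2, sigma, ell, alpha)
    return total + k_noise(x, x2, sigma_n)


# Hypothetical example: 3 features, 1-grams in columns 0-1, 2-grams in column 2.
cats = [[0, 1], [2]]
hyper = [(1.0, 1.0, 2.0), (0.5, 2.0, 1.0)]
xa, xb = [0.1, 0.0, 0.3], [0.0, 0.2, 0.3]
k_ab = k_composite(xa, xb, cats, hyper, sigma_n=0.1)
k_aa = k_composite(xa, xa, cats, hyper, sigma_n=0.1)
```

For a very large `alpha` the RQ kernel approaches the SE kernel with the same `sigma` and `ell`, which is a quick sanity check on the two formulas.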
12. Estimating influenza-like illness (ILI) rates: Data
User-generated data, geolocated in England
• Twitter: May 2011 to April 2014 (308 million tweets)
• Bing: end of December 2012 to April 2014
ILI rates from Public Health England (PHE)
[Figure: ILI rate per 100 people, 2012 to 2014; labels: ILI rates (PHE), Bing]
13. Estimating ILI rates: Feature extraction
• Start with a manually crafted list of 36 textual markers, e.g. flu, headache, doctor, cough
• Extract frequent co-occurring n-grams from a corpus of 30 million UK tweets (February & March, 2014) after removing stop-words
• Set of markers expanded to 205 n-grams (n ≤ 4), e.g. #flu, #cough, annoying cough, worst sore throat
• Relatively small set of features motivated by previous work (Culotta, 2013)
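The expansion step can be sketched as follows. This is only an illustration under assumed details: the seed list, the stop-word list, the tokenisation, the document-level notion of co-occurrence and the `min_count` threshold are all hypothetical, since the slides do not specify them.

```python
from collections import Counter

# Hypothetical seed markers and stop-word list; the slides use 36 markers.
SEEDS = {"flu", "cough", "headache", "doctor"}
STOP_WORDS = {"a", "an", "and", "i", "is", "my", "of", "the", "this", "to"}


def tokenize(text):
    """Lower-case, split on whitespace, strip punctuation, drop stop-words."""
    tokens = (t.strip(".,!?;:\"'()") for t in text.lower().split())
    return [t for t in tokens if t and t not in STOP_WORDS]


def ngrams(tokens, max_n=4):
    """Yield all n-grams with n <= max_n as space-joined strings."""
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def expand_markers(corpus, seeds=SEEDS, max_n=4, min_count=2):
    """Count n-grams in documents that mention a seed; keep the frequent ones."""
    counts = Counter()
    for doc in corpus:
        tokens = tokenize(doc)
        if not any(t.lstrip("#") in seeds for t in tokens):
            continue  # document does not co-occur with any seed marker
        counts.update(set(ngrams(tokens, max_n)))  # one count per document
    return {g: c for g, c in counts.items() if c >= min_count}


corpus = [
    "This annoying cough is the worst",
    "Annoying cough and a headache #flu",
    "Off to the doctor, worst sore throat",
    "Lovely weather in Porto",
]
features = expand_markers(corpus)
```

On this toy corpus the n-grams kept are `annoying`, `cough`, `annoying cough` and `worst`, each seen in two seed-bearing documents.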
14. Estimating ILI rates: Experimental setup
Two time intervals based on the different temporal coverage of Twitter and Bing data
• Dt1: 154 weeks (May 2011 to April 2014)
• Dt2: 67 weeks (December 2012 to April 2014)
Stratified 10-fold cross validation
Error metrics
• Pearson correlation (r)
• Mean Absolute Error (MAE)
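For reference, the two metrics can be computed as in the sketch below; the function names and the toy rates are assumptions made for the example.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def mae(truth, pred):
    """Mean Absolute Error between true and predicted rates."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


# Hypothetical weekly ILI rates and model estimates.
truth = [0.010, 0.020, 0.030, 0.015]
pred = [0.012, 0.018, 0.033, 0.015]
r = pearson_r(truth, pred)
err = mae(truth, pred)
```

Pearson correlation rewards estimates that follow the shape of the true signal, while MAE measures how far they are from it on the original scale, so the two are usually reported together.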
17. ✓ Background and motivation
✓ Estimating disease rates from online text
๏ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
18. Estimating the impact of a health intervention
1. Disease intervention launched (to a set of areas)
2. Define a distinct set of control areas
3. Estimate disease rates in all areas
4. Identify pairs of areas with strong historical correlation in their disease rates
5. Use this relationship during and slightly after the intervention to infer disease rates in the affected areas had the intervention not taken place
19. Estimating the impact of a health intervention
$\tau = \{t_1, \dots, t_N\}$: time interval(s) before the intervention
$v$: location(s) where the intervention took place
$c$: control location(s)
$q_v^{\tau}$: disease rate(s) in affected location before intervention
$q_c^{\tau}$: disease rate(s) in control location before intervention
$r(q_v^{\tau}, q_c^{\tau})$: high
$f(w, \beta): \mathbb{R} \to \mathbb{R}$ such that
$$\operatorname*{argmin}_{w,\beta} \sum_{i=1}^{N} \left(q_c^{t_i} w + \beta - q_v^{t_i}\right)^2$$
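A minimal sketch of this step, assuming a single vaccinated area and a dictionary of candidate control areas (names and rates are hypothetical): the control with the highest pre-intervention Pearson correlation is selected, and a least-squares line maps its rates to those of the vaccinated area.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def fit_line(q_c, q_v):
    """Least-squares w, beta minimising sum_i (q_c[i]*w + beta - q_v[i])^2."""
    n = len(q_c)
    mc, mv = sum(q_c) / n, sum(q_v) / n
    sxx = sum((c - mc) ** 2 for c in q_c)
    sxy = sum((c - mc) * (v - mv) for c, v in zip(q_c, q_v))
    w = sxy / sxx
    return w, mv - w * mc


def best_control(q_v_pre, controls_pre):
    """Return (name, r, w, beta) for the control with the highest r."""
    name = max(controls_pre, key=lambda k: pearson_r(q_v_pre, controls_pre[k]))
    r = pearson_r(q_v_pre, controls_pre[name])
    w, beta = fit_line(controls_pre[name], q_v_pre)
    return name, r, w, beta


# Hypothetical pre-intervention weekly rates.
q_v_pre = [1.0, 2.0, 3.0, 4.0]
controls_pre = {"A": [2.0, 4.0, 6.0, 8.0], "B": [4.0, 1.0, 3.0, 2.0]}
name, r, w, beta = best_control(q_v_pre, controls_pre)
```

Here control "A" is chosen because it tracks the vaccinated area perfectly before the intervention, and the fitted mapping is `q_v = 0.5 * q_c + 0`.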
21. Estimating the impact of a health intervention
$f(w, \beta): \mathbb{R} \to \mathbb{R}$ such that
$$\operatorname*{argmin}_{w,\beta} \sum_{i=1}^{N} \left(q_c^{t_i} w + \beta - q_v^{t_i}\right)^2$$
$q_v$: disease rate(s) in affected location during/after intervention
$q_c$: disease rate(s) in control location during/after intervention
$q_v^{*} = q_c w + b$: estimate of projected rate(s) in affected location during/after intervention, with $w$ the fitted weight and $b$ the fitted intercept $\beta$
$\delta_v = q_v - q_v^{*}$: absolute difference
$\theta_v = \dfrac{q_v - q_v^{*}}{q_v^{*}}$: relative difference (impact) (Lambert & Pregibon, 2008)
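The projection and the two impact measures can be sketched as below. Averaging the rates over the during/after period before taking the differences is an assumption of this example, as are the function names and the toy numbers.

```python
def project(q_c, w, b):
    """Projected rates in the affected area: q_v* = q_c * w + b."""
    return [w * q + b for q in q_c]


def impact(q_v, q_v_star):
    """Return (delta, theta) from mean observed and mean projected rates.

    delta is the absolute difference, theta the relative difference (impact).
    """
    mean_obs = sum(q_v) / len(q_v)
    mean_proj = sum(q_v_star) / len(q_v_star)
    delta = mean_obs - mean_proj
    theta = delta / mean_proj
    return delta, theta


# Hypothetical rates during/after the intervention.
q_c_post = [0.010, 0.020]  # control area
q_v_post = [0.008, 0.016]  # vaccinated area, as estimated from online text
q_v_star = project(q_c_post, w=1.0, b=0.0)
delta, theta = impact(q_v_post, q_v_star)
```

A negative `theta` means the rates estimated in the vaccinated area are lower than the rates projected from the control area; in this toy case the relative difference is -20%.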
22. ✓ Background and motivation
✓ Estimating disease rates from online text
✓ Estimating the impact of a health intervention
๏ Case study: influenza vaccination impact
๏ Conclusions & future work
23. Live Attenuated Influenza Vaccine (LAIV) campaign
• LAIV programme for children (4 to 11 years) in pilot areas of England during the 2013/14 flu season
• Vaccination period (blue): Sept. 2013 to Jan. 2014
• Post-vaccination period (green): Feb. to April 2014
[Figure: ILI rate per 100 people (PHE/RCGP), 2012 to 2014, with the LAIV and post-LAIV periods ($\Delta t_v$) highlighted]
24. Target (vaccinated) & control areas
Vaccinated areas
Bury • Cumbria • Gateshead • Leicester • East Leicestershire and Rutland • South-East Essex • Havering (London) • Newham (London)
Control areas
Brighton • Bristol • Cambridge • Exeter • Leeds • Liverpool • Norwich • Nottingham • Plymouth • Sheffield • Southampton • York
25. Applying the impact estimation framework
Target vs. control areas
• Use previous flu season only to establish relationships
• Find the best correlated areas or supersets of them
Confidence intervals
• Bootstrap sampling of the regression residuals (mapping function of control to vaccinated areas)
• Bootstrap sampling of data prior to the application of the bootstrapped regressor
• $10^5$ bootstraps; use the .025 and .975 quantiles
Statistical significance assessment
• Impact estimate (abs.) > 2σ of the bootstrap estimates
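One possible reading of the bootstrap procedure above is sketched here. It combines a residual bootstrap of the control-to-vaccinated regression with a resampling of the during/after weeks; how exactly the two steps are combined, the quantile indexing and all names are assumptions of this example, and the default `n_boot` is far smaller than the $10^5$ used in the slides.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares slope and intercept of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    w = sxy / sxx
    return w, my - w * mx


def bootstrap_theta(q_c_pre, q_v_pre, q_c_post, q_v_post, n_boot=1000, seed=0):
    """Return (lower, upper, sd) of the bootstrapped relative impact theta."""
    rng = random.Random(seed)
    w, b = fit_line(q_c_pre, q_v_pre)
    fitted = [w * c + b for c in q_c_pre]
    resid = [v - f for v, f in zip(q_v_pre, fitted)]
    n_post = len(q_c_post)
    thetas = []
    for _ in range(n_boot):
        # 1) Residual bootstrap: refit the mapping on perturbed targets.
        y_star = [f + rng.choice(resid) for f in fitted]
        w_s, b_s = fit_line(q_c_pre, y_star)
        # 2) Resample the during/after weeks before applying the regressor.
        idx = [rng.randrange(n_post) for _ in range(n_post)]
        mean_obs = sum(q_v_post[i] for i in idx) / n_post
        mean_proj = sum(w_s * q_c_post[i] + b_s for i in idx) / n_post
        thetas.append((mean_obs - mean_proj) / mean_proj)
    thetas.sort()
    lower = thetas[int(0.025 * (n_boot - 1))]
    upper = thetas[int(0.975 * (n_boot - 1))]
    return lower, upper, statistics.pstdev(thetas)


def is_significant(theta_hat, sd):
    """Significance rule from the slides: |impact| > 2 sigma of the bootstraps."""
    return abs(theta_hat) > 2.0 * sd
```

Calling `bootstrap_theta` with the pre-intervention and during/after series of a vaccinated/control pair gives an interval for `theta`; `is_significant` then applies the two-sigma rule to the point estimate.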
26. Relationship between vaccinated & control areas
[Figure: scatter plots of ILI rates in vaccinated areas against ILI rates in control areas, for the pre-vaccination period and during/after LAIV; axes normalised from 0 to 1]
• Twitter, all areas: r = .86
• Bing, all areas: r = .87
27. Relationship between vaccinated & control areas
[Figure: scatter plots of ILI rates in vaccinated areas against ILI rates in control areas, for the pre-vaccination period and during/after LAIV; axes normalised from 0 to 1]
• Twitter, London areas: r = .74
• Bing, London areas: r = .85
28. Impact estimation results (strongly correlated controls)
Source   Target        r     δ × 10³             θ (%)
Twitter  All areas     .861  -2.5 (-4.1, -1.0)   -32.8 (-47.4, -15.6)
Bing     All areas     .866  -1.9 (-3.2, -0.7)   -21.7 (-32.1, -9.10)
Twitter  London areas  .738  -1.7 (-2.5, -0.9)   -30.5 (-41.8, -17.5)
Bing     London areas  .848  -2.8 (-4.1, -1.6)   -28.4 (-36.7, -17.9)
32. Impact estimation results (stat. sig.)
[Figure: bar chart of -θ (%) for Twitter and Bing across the targets All areas, London areas, Newham, Cumbria and Gateshead; bar labels: 30.2, 28.7, 21.7, 21.1, 30.4, 30.5, 32.8]
33. Projected vs. inferred ILI rates in vaccinated locations
[Figure: inferred and projected ILI rates per 100 people over the weeks during and after the vaccination programme (Oct to Apr); panels: Twitter, all areas and Bing, all areas]
34. Projected vs. inferred ILI rates in vaccinated locations
[Figure: inferred and projected ILI rates per 100 people over the weeks during and after the vaccination programme (Oct to Apr); panels: Twitter, London areas and Bing, London areas]
35. Sensitivity of impact estimates to variable controls
• Repeat the impact estimation for the N controls (up to 100) with r ≥ 95% of the best r → μ(δ) and μ(θ) (%)
• Measure % of difference, Δ(θ), between θ and μ(θ)
Source   Target        N    μ(r)  μ(δ) × 10³  μ(θ) (%)     Δθ (%)
Twitter  All areas     100  0.84  -2.5 (0.2)  -32.7 (2.1)  0.10
Bing     All areas     46   0.85  -1.4 (0.4)  -16.4 (3.6)  24.4
Twitter  London areas  79   0.70  -1.5 (0.1)  -27.9 (2.0)  8.32
Bing     London areas  100  0.84  -1.4 (0.2)  -16.9 (1.8)  40.4
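The selection rule and the difference measure can be sketched as follows. The formula used for Δ(θ), the relative distance of μ(θ) from θ, is an assumption: applied to the rounded table values it gives 24.4 for Bing / All areas but only approximates the other rows. Names and toy correlations are hypothetical.

```python
def select_controls(r_by_control, max_n=100, frac=0.95):
    """Keep up to max_n controls whose r is at least frac times the best r."""
    best = max(r_by_control.values())
    keep = [(r, name) for name, r in r_by_control.items() if r >= frac * best]
    keep.sort(reverse=True)  # strongest correlation first
    return [name for _, name in keep[:max_n]]


def pct_difference(theta, mean_theta):
    """Assumed definition of Delta(theta): |theta - mu(theta)| / |theta| in %."""
    return abs(theta - mean_theta) / abs(theta) * 100.0


# Hypothetical correlations of candidate controls with a vaccinated area.
r_by_control = {"a": 0.86, "b": 0.83, "c": 0.80, "d": 0.50}
selected = select_controls(r_by_control)

# Rounded Bing / All areas values from the two results tables.
d_theta = pct_difference(-21.7, -16.4)
```

A small Δ(θ) indicates that the impact estimate changes little when the single best control is replaced by the whole set of near-best controls.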
38. ✓ Background and motivation
✓ Estimating disease rates from online text
✓ Estimating the impact of a health intervention
✓ Case study: influenza vaccination impact
๏ Conclusions & future work
39. Conclusions & points for discussion
• Framework for estimating the impact of a health intervention based on online content
• Access to different & larger parts of the population
Evaluation is hard, however:
• PHE’s impact estimates: -66% based on sentinel surveillance, -24% laboratory confirmed (Pebody et al., 2014)
• Correlation between actual vaccination uptake and our study’s estimated impacts
Why are Bing and Twitter estimations different?
• Different user demographics (?); this can be useful
• Different temporal resolution
40. Potential future work directions
• Improve supervised learning models
- better natural language processing / machine learning modelling
- combination of different data sources
• Work on unsupervised techniques
- inferring / understanding the demographics of the online medium will be essential
• More rigorous evaluation
41. Collaborators, acknowledgements & material
Elad Yom-Tov, Microsoft Research
Richard Pebody, Public Health England
Ingemar J. Cox, UCL & University of Copenhagen
Jens Geyti, UCL (Software Engineer)
Simon de Lusignan, University of Surrey & RCGP
Slides: ow.ly/RN7MZ
Paper: ow.ly/RN9J2
i-sense.org.uk
42. References
Bollen, Mao & Zeng. Twitter mood predicts the stock market. J Comp Science, 2011.
Burger, Henderson, Kim & Zarrella. Discriminating Gender on Twitter. EMNLP, 2011.
Choi & Varian. Predicting the Present with Google Trends. Economic Record, 2012.
Culotta. Lightweight methods to estimate influenza rates and alcohol sales volume from Twitter messages. Lang Resour Eval, 2013.
Hoerl & Kennard. Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 1970.
Lamb, Paul & Dredze. Separating Fact from Fear: Tracking Flu Infections on Twitter. NAACL, 2013.
Lambert & Pregibon. Online effects of offline ads. Data Mining & Audience Intelligence for Advertising, 2008.
Lampos & Cristianini. Tracking the flu pandemic by monitoring the Social Web. CIP, 2010.
Lampos & Cristianini. Nowcasting Events from the Social Web with Statistical Learning. ACM TIST, 2012.
Lampos, Miller, Crossan & Stefansen. Advances in nowcasting influenza-like illness rates using search query logs. Sci Rep, 2015.
Lampos, Yom-Tov, Pebody & Cox. Assessing the impact of a health intervention via user-generated Internet content. DMKD, 2015.
Pebody et al. Uptake and impact of a new live attenuated influenza vaccine programme in England: early results of a pilot in primary school-age children, 2013/14 influenza season. Eurosurveillance, 2014.
Preotiuc-Pietro, Lampos & Aletras. An analysis of the user occupational class through Twitter content. ACL, 2015.
Rao, Yarowsky, Shreevats & Gupta. Classifying Latent User Attributes in Twitter. SMUC, 2010.
Rasmussen & Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
Tumasjan, Sprenger, Sandner & Welpe. Predicting Elections with Twitter: What 140 characters Reveal about Political Sentiment. ICWSM, 2010.
Zou & Hastie. Regularization and variable selection via the elastic net. J R Stat Soc Series B Stat Methodol, 2005.