BIG DATA: An actuarial perspective
Information Paper
November 2015
Table of Contents

1 INTRODUCTION
2 INTRODUCTION TO BIG DATA
2.1 INTRODUCTION AND CHARACTERISTICS
2.2 BIG DATA TECHNIQUES AND TOOLS
2.3 BIG DATA APPLICATIONS
2.4 DATA DRIVEN BUSINESS
3 BIG DATA IN INSURANCE VALUE CHAIN
3.1 INSURANCE UNDERWRITING
3.2 INSURANCE PRICING
3.3 INSURANCE RESERVING
3.4 CLAIMS MANAGEMENT
4 LEGAL ASPECTS OF BIG DATA
4.1 INTRODUCTION
4.2 DATA PROCESSING
4.3 DISCRIMINATION
5 NEW FRONTIERS
5.1 RISK POOLING VS. PERSONALIZATION
5.2 PERSONALISED PREMIUM
5.3 FROM INSURANCE TO PREVENTION
5.4 THE ALL-SEEING INSURER
5.5 CHANGE IN INSURANCE BUSINESS
6 ACTUARIAL SCIENCES AND THE ROLE OF ACTUARIES
6.1 WHAT IS BIG DATA BRINGING FOR THE ACTUARY?
6.2 WHAT IS THE ACTUARY BRINGING TO BIG DATA?
7 CONCLUSIONS
8 REFERENCES
1 Introduction

The Internet started in 1984, linking 1,000 university and corporate labs. By 1998 it had grown to 50 million users, and in 2015 it reached 3.2 billion people (44% of the global population). This enormous user growth was combined with an explosion of the data that we all produce. Every day we create around 2.5 quintillion bytes of data, information coming from various sources including social media sites, gadgets, smartphones, intelligent homes and cars, and industrial sensors, to name a few. Any company that can combine various datasets and apply effective data analytics will be able to become more profitable and successful. According to a recent report1, 400 large companies that adopted Big Data analytics "have gained a significant lead over the rest of the corporate world."

Big Data offers big business gains, but it also has hidden costs and complexity that companies will have to struggle with. Semi-structured and unstructured big data require new skills, and there is a shortage of people who have mastered data science and can handle mathematics and statistics, programming, and substantive domain knowledge.

What will be the impact on the insurance sector and the actuarial profession? The concepts of Big Data and predictive modelling are not new to insurers, who have already been storing and analysing large quantities of data to achieve deeper insights into customers' behaviour or to set insurance premiums. Moreover, actuaries are the data scientists of insurance: they have the statistical training and analytical thinking to understand the complexity of data, combined with business insight.

We look closely at the insurance value chain and assess the impact of Big Data on underwriting, pricing and claims reserving. We examine the ethics of Big Data, including data privacy, customer identification, data ownership and the legal aspects. We also discuss new frontiers for insurance and their impact on the actuarial profession. Will actuaries be able to leverage Big Data, create sophisticated risk models and more personalized insurance offers, and bring a new wave of innovation to the market?
2 Introduction to Big Data

2.1 Introduction and characteristics

Big Data broadly refers to data sets so large and complex that they cannot be handled by traditional data processing software. It can be defined by the following attributes:

a. Volume: in 2012 it was estimated that 2.5 × 10^18 bytes of data were created worldwide every day - this is equivalent to a stack of books from the Sun to Pluto and back again. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, software logs and GPS signals from mobile devices, among others.

b. Variety and Variability: the challenges of Big Data do not only arise from the sheer volume of data but also from the fact that data is generated in multiple forms, as a mix of unstructured and structured data, and as a mix of data at rest and data in motion (i.e. static and real-time data). Furthermore, the meaning of data can change over time or depend on the context. Structured data is organized in a way that both computers and humans can read, for example information stored in traditional databases. Unstructured data refers to data types such as images, audio, video, social media and other information that are not organized or easily interpreted by traditional databases. It includes data generated by machines such as sensors, web feeds, networks or service platforms.

c. Visualization: the insights gained by a company from analysing data must be shared in a way that is efficient and understandable to the company's stakeholders.

d. Velocity: data is created, saved, analysed and visualized at an increasing speed, making it possible to analyse and visualize high volumes of data in real time.

e. Veracity: it is essential that the data is accurate in order to generate value.

f. Value: the insights gleaned from Big Data can help organizations deepen customer engagement, optimize operations, prevent threats and fraud, and capitalize on new sources of revenue.
1 http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx
2.2 Big Data techniques and tools

The Big Data industry has been supported by the following technologies:

a. The Apache Hadoop software library, initially released in December 2011, is an open-source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from one to thousands of machines, each one being a computational and storage unit. The software library is designed under the fundamental assumption that hardware failures are common: the library itself automatically detects and handles hardware failures in order to guarantee that the services provided by a computer cluster will stay available even when the cluster is affected by hardware failures. A wide variety of companies and organizations use Hadoop for both research and production: web-based companies that own some of the world's biggest data warehouses (Amazon, Facebook, Google, Twitter, Yahoo!, ...), media groups and universities, among others. A list of Hadoop users and systems is available at http://wiki.apache.org/hadoop/PoweredBy.

b. Non-relational databases have existed since the late 1960s but resurfaced in 2009 (under the moniker of Not Only SQL - NoSQL) as it became clear they are especially well suited to handle the Big Data challenges of volume and variety, and as they neatly fit within the Apache Hadoop framework.

c. Cloud Computing is a kind of internet-based computing, where shared resources and information are provided to computers and other devices on demand (Wikipedia). A service provider offers computing resources for a fixed price, available online and in general with a high degree of flexibility and reliability. These technologies were created by major online actors (Amazon, Google), followed by other technology providers (IBM, Microsoft, RedHat). There is a wide variety of architectures (Public, Private and Hybrid Cloud), all with the objective of making computing infrastructure a commodity asset with the best quality/total cost of ownership ratio. Having a nearly infinite amount of computing power at hand with high flexibility is a key factor for the success of Big Data initiatives.

d. Mining Massive Datasets is a set of methods, algorithms and techniques that can be used to deal with Big Data problems, and in particular with volume, variety and velocity issues. PageRank can be seen as a major step (see http://infolab.stanford.edu/pub/papers/google.pdf) and its evolution to a Map-Reduce (https://en.wikipedia.org/wiki/MapReduce) approach is definitely a breakthrough; a minimal sketch of the Map-Reduce idea is given after this list. Social Network Analysis is becoming an area of research in itself that aims to extract useful information from the massive amount of data the Social Networks are providing. These methods are very well suited to run on software such as Hadoop in a Cloud Computing environment.

e. Social Networks are one source of Big Data that provides a stream of data with huge value for almost all economic (and even non-economic) actors. For most companies, it is the very first time in history they are capable of interacting directly with their customers. Many applications of Big Data make use of these data to provide enhanced services and products and to increase customer satisfaction.
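To make the Map-Reduce idea referenced above concrete, here is a minimal, illustrative word-count sketch in plain Python. It only mimics the map, shuffle and reduce phases in a single process; a real Hadoop job would distribute these phases across a cluster, and all names here are our own.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (key, value) pair for every word in the document.
    for word in document.split():
        yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework would between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: aggregate the values for each key (here: sum the counts).
    return {key: sum(values) for key, values in grouped.items()}

documents = ["big data in insurance", "big data analytics for insurance pricing"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(reduce_phase(shuffle_phase(pairs)))  # {'big': 2, 'data': 2, ...}
```

The point of the split is that the map and reduce phases are embarrassingly parallel, which is what lets frameworks like Hadoop scale them across thousands of machines.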
2.3 Big Data Applications

Big Data has the potential to change the way academic institutions, corporations and organizations conduct business, and to change our daily life. Great examples of Big Data applications include:

a. Healthcare: Big Data technologies will have a major impact in healthcare. IBM estimates that 80% of medical data is unstructured and is clinically relevant. Furthermore, medical data resides in multiple places like individual medical files, lab and imaging systems, physician notes, medical correspondence, etc. Big Data technologies allow healthcare organizations to bring all the information about an individual together to get insights on how to manage care coordination, outcomes-based reimbursement models, patient engagement and outreach programs.

b. Retail: Retailers can get insights for personalizing marketing and improving the effectiveness of marketing campaigns, for optimizing assortment and merchandising decisions, and for removing inefficiencies in distribution and operations. For instance, several retailers now incorporate Twitter streams into their analysis of loyalty-program data. The gained insights make it possible to plan for surges in demand for certain items and to create mobile marketing campaigns targeting specific customers with offers at the times of day they would be most receptive to them.2

c. Politics: Big Data technologies will improve efficiency and effectiveness across the broad range of government responsibilities. A great example of Big Data use in politics was Barack Obama's analytics- and metrics-driven 2012 presidential campaign [1]. Other examples include:

i. Threat and crime prediction and prevention. For instance, the Detroit Crime Commission has turned to Big Data in its effort to assist the government and citizens of southeast Michigan in the prevention, investigation and prosecution of neighbourhood crime;3
ii. Detection of fraud, waste and errors in social programs;
iii. Detection of tax fraud and abuse.

d. Cyber risk prevention: companies can analyse data traffic in their computer networks in real time to detect anomalies that may indicate the early stages of a cyber attack. Research firm Gartner estimates that by 2016 more than 25% of global firms will adopt big data analytics for at least one security and fraud detection use case, up from 8% in 2014.4

e. Insurance fraud detection: insurance companies can determine a score for each claim in order to target for fraud investigation the claims with the highest scores, i.e. the ones that are most likely to be fraudulent. Fraud detection is treated in paragraph 3.4.

f. Usage-Based Insurance: an insurance scheme where car insurance premiums are calculated based on dynamic causal data, including actual usage and driving behaviour. Telematics data transmitted from a vehicle, combined with Big Data analytics, enables insurers to distinguish cautious drivers from aggressive drivers and match the insurance rate to the actual risk incurred.
2.4 Data driven business

The quantity of data in the world is steeply increasing month after month. Some argue it is time to organize and use this information: data must now be viewed as a corporate asset. In order to respond to this emerging transformation of business culture, two specific C-level roles have appeared in the past years in the banking and insurance industries.

2.4.1 The Chief Data Officer

The Chief Data Officer (abbreviated to CDO) is the first architect of this "data-driven business". Thanks to his role of coordinator, the CDO will be in charge of the data that drive the company, by:

• defining and setting up a strategy to guarantee their quality, their reliability and their coherency;
• organizing and classifying them;
• making them accessible to the right person at the right moment, for the pertinent need and in the right format.

Thus, the Chief Data Officer needs a strong business background to understand how the business runs. The following question will then emerge: to whom should the CDO report? In some firms, the CDO is considered part of IT and reports to the CTO (Chief Technology Officer); in others, he holds more of a business role, reporting to the CEO. It is therefore up to the company to decide, as no two companies are exactly alike from a structural point of view.

Which companies already have a CDO? Generali Group appointed someone to this newly created position in June 2015. Other companies such as HSBC, Wells Fargo and QBE had already appointed a person to this position in 2013 or 2014. Even Barack Obama appointed a Chief Data Officer/Scientist during his 2012 campaign, and the metrics-driven decision-making campaign played a big role in Obama's re-election.

2 http://asmarterplanet.com/blog/2015/03/surprising-insights-ibmtwitter-alliance.html#more-33140
3 http://www.datameer.com/company/news/press-releases/detroit-crime-commission-combats-crime-with-datameer-big-data-analytics.html
4 http://www.gartner.com/newsroom/id/2663015
In the beginning, most of the professionals holding the actual job title "Chief Data Officer" were located in the United States. After a while, Europe followed the move. Also, lots of people did the job in their day-to-day work, but didn't necessarily hold the title. Many analysts in the financial sector believe that yet more insurance and banking companies will have to make the move in the following years if they want to stay attractive.

2.4.2 The Chief Analytics Officer

Another C-level position has arisen in the past months: the Chief Analytics Officer (abbreviated to CAO). Are there differences between a CAO and a CDO? Theoretically a CDO focuses on tactical data management, while the CAO concentrates on the strategic deployment of analytics. The latter's focus is on data analysis to find hidden, but valuable, patterns. These will result in operational decisions that will make the company more competitive, more efficient and more attractive to its potential and current clients. Therefore, the CAO is a natural extension of the data-driven business: the more analytics are embedded in the organization, the more you need an executive-level person to manage that function and communicate the results in an understandable way. The CAO usually reports to the CEO.

In practice, some companies fold the CAO responsibilities into the CDO tasks, while others distinguish both positions. Currently, it is quite rare to find an explicit "Chief Analytics Officer" position in the banking and insurance sector, because of this overlap. But in other fields, the distinction is often made.
3 Big Data in insurance value chain

Big Data provides new insights from social networks, telematics sensors and other new information channels. As a result, it allows insurers to understand customer preferences better, enables new business approaches and products, and enhances existing internal models, processes and services. With the rise of Big Data the insurance world could fundamentally change, and the entire insurance value chain could be impacted, from underwriting to claims management.

3.1 Insurance underwriting

3.1.1 Introduction

In traditional insurance underwriting and actuarial analyses, we have for years been observing a never-ending search for more meaningful insight into individual policyholder risk characteristics, to distinguish good risks from bad and to accurately price each risk accordingly. The analytics performed by actuaries, based on advanced mathematical and financial theories, have always been critically important to an insurer's profitability. Over the last decade, however, revolutionary advances in computing technology and the explosion of new digital data sources have expanded and reinvented the core disciplines of insurers. Today's advanced analytics in insurance go much further than traditional underwriting and actuarial science. Data mining and predictive modelling are today the way forward for insurers to improve pricing and segmentation and to increase profitability.
3.1.2 What is predictive modelling?

Predictive modelling can be defined as the analysis of large historical data sets to identify correlations and interactions, and the use of this knowledge to predict future events. For actuaries, the concepts of predictive modelling are not new to the profession. The use of mortality tables to price life insurance products is an example of predictive modelling. The Belgian MK, FK and MR, FR tables showed the relationship between death probability and the explanatory variables of age, sex and product type (in this case life insurance or annuity).

Predictive models have been around a long time in sales and marketing environments, for example to predict the probability that a customer will buy a new product. Bringing together expertise from both the actuarial profession and marketing analytics can lead to innovative initiatives where predictive models guide expert decisions in areas such as claims management, fraud detection and underwriting.
3.1.3 From small over medium to Big Data

Insurers collect a wealth of information on their customers. In the first place during the underwriting process: by asking about the claims history of a customer for car and home insurance, for example. Another source is the history of the relationship the customer has with the insurance company. While in the past the data was kept in silos by product, the key challenge now lies in gathering all this information into one place where the customer dimension is central. The transversal approach to the database also reflects the recent evolution in marketing: going from the 4P's (product, price, place, promotion) to the 4C's5 (customer, costs, convenience, communication).

On top of unleashing the value of internal data, new data sources are becoming available, such as wearables and social networks, to name a few. Because Big Data can be overwhelming to start with, medium data should be considered at first. In Belgium, the strong bancassurance tradition offers interesting opportunities to combine insurance and bank data to create powerful predictive models.
3.1.4 Examples of predictive modelling for underwriting

1° Use the 360° view of the customer and predictive models to maximize profitability and gain more business

By thoroughly analysing data from different sources and applying analytics to gain insight, insurance companies should strive to develop a comprehensive 360-degree customer view. The gains of this complete and accurate view of the customer are twofold:

• Maximizing the profitability of the current customer portfolio through:
o detecting cross-sell and up-sell opportunities;
o customer satisfaction and loyalty actions;
o effective targeting of products and services (e.g. customers that are most likely to be in good health or those customers that are less likely to have a car accident).

• Acquiring more profitable new customers at a reduced marketing cost: modelling the existing customers will lead to useful information to focus marketing campaigns on the most interesting prospects.

By combining data mining and analytics, insurance companies can better understand which customers are most likely to buy, discover who their most profitable customers are, and learn how to attract or retain more of them. Another use case can be the evaluation of the underwriting process to improve the customer experience during this on-boarding process.

2° Predictive underwriting for life insurance6

Using predictive models, it is in theory possible to predict the death probability of a customer. However, the low frequency of life insurance claims presents a challenge to modellers. While for car insurance the probability of a customer having a claim can be around 10%, for life insurance it is around 0.1% for the first year. Not only does this mean that a significant in-force book is needed to have confidence in the results, but also that sufficient history should be present to be able to show mortality experience over time. For this reason, using the underwriting decision as the variable to predict is a more common choice. All life insurance companies hold historical data on medical underwriting decisions that can be leveraged to build models that predict underwriting decisions.

Depending on how the model is used, the outcome can be a reduction of costs for medical examinations, more customer-friendly processes that avoid asking numerous invasive personal questions, or a reduction in the time needed to assess the risks by automatically approving good risks and focusing underwriting efforts on more complex cases. For example, if the predictive model tells you that a new customer has a high degree of similarity to customers that passed the medical examination, the medical examination could be waived for this customer. If this sounds scary for risk professionals, a softer approach can be tested first, for instance by improving marketing actions by targeting only those individuals that have a high likelihood of being in good health. This not only decreases the cost of the campaign, but also avoids the disappointment of a potential customer who is refused during the medical screening process.
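As an illustration of this approach, the sketch below trains a classifier on historical medical underwriting decisions using scikit-learn. The feature names, the synthetic data and the choice of a logistic regression are our own assumptions; the paper does not prescribe a specific algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical applicant features: age, BMI, smoker flag (0/1).
X = np.column_stack([
    rng.integers(20, 65, 5000),   # age
    rng.normal(25, 4, 5000),      # BMI
    rng.integers(0, 2, 5000),     # smoker
])
# Hypothetical historical underwriting decision: 1 = accepted at standard rates.
logit = -0.03 * (X[:, 0] - 40) - 0.15 * (X[:, 1] - 25) - 1.2 * X[:, 2] + 1.5
y = (rng.random(5000) < 1 / (1 + np.exp(-logit))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

# Applicants with a predicted acceptance probability above a chosen threshold
# could skip the medical examination; the rest go to a human underwriter.
probs = model.predict_proba(X_test)[:, 1]
print("share auto-approved at 90% threshold:", (probs > 0.9).mean())
```

The threshold is the business lever here: raising it keeps more residual risk out of the automatically approved segment, at the cost of sending more applicants through the full medical process.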
5 http://www.customfitonline.com/news/2012/10/19/4-cs-versus-the-4-ps-of-marketing/
6 Predictive modeling for life insurance, April 2010, Deloitte.
3.1.5 Challenges of predictive modelling in underwriting7

Predictive models can only be as good as the input used to calibrate them. The first challenge in every predictive modelling project is to collect relevant, high-quality data for which a history is present. As many insurers are currently replacing legacy systems to reduce maintenance costs, this can come at the expense of the history. Actuaries are uniquely placed to prevent the history being lost, as a portfolio's history should be kept for adequate risk management. The trend of moving all policies from several legacy systems into one modern policy administration system is an opportunity that must be seized, so that in the future data collection will be easier.

Once the necessary data are collected, some legal or compliance concerns need to be addressed, as there might be boundaries to using certain variables in the underwriting process. In Europe, if the model will influence the price of the insurance, gender is no longer allowed as an explanatory variable. And this is only one example. It is important that the purpose of the model and the possible inputs are discussed with the legal department prior to starting the modelling.

Once the model is built, it is important that the users realize that no model is perfect. This means that residual risks will be present, and this should be weighed against the gains that the use of the model can bring. And finally, once a predictive model has been set up, a continuous reviewing cycle must be put in place that collects feedback from the underwriting and sales teams and collects data to improve and refine the model. Building a predictive model is a continuous improvement process, not a one-off project.
3.2 Insurance pricing

3.2.1 Overview of existing pricing techniques

The first rate-making techniques were based on rudimentary methods such as univariate analysis, and later on iterative standardized univariate methods such as the minimum bias procedure. They look at how changes in one characteristic result in differences in loss frequency or severity. Later on, insurance companies moved to multivariate methods; this, however, depended on further developments in computing power and data capabilities. These techniques are now being adopted by more and more insurers and are becoming part of everyday business practice. Multivariate analytical techniques focus on individual-level data and take into account the effects (interactions) that many different characteristics of a risk have on one another.

As explained in the previous section, many companies use predictive modelling (a form of multivariate analysis) to create measures of the likelihood that a customer will purchase a particular product. Banks use these tools to create measures (e.g. credit scores) of whether a client will be able to meet lending obligations for a loan or mortgage. Similarly, P&C insurers can use predictive models to predict claim behaviour. Multivariate methods provide valuable diagnostics that aid in understanding the certainty and reasonableness of results.

Generalized Linear Models (GLMs) are essentially a generalized form of linear models. This family encompasses normal-error linear regression models and the nonlinear exponential, logistic and Poisson regression models, as well as many other models, such as log-linear models for categorical data. Generalized linear models have become the standard for classification rate-making in most developed insurance markets, particularly because of the benefit of transparency. Understanding the mathematical underpinnings is an important responsibility of the rate-making actuary who intends to use such a method. Linear models are a good place to start, as GLMs are essentially a generalized form of such a model. As with many techniques, visualizing the GLM results is an intuitive way to connect the theory with the practical use.
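As a minimal sketch of what classification rate-making with a GLM looks like in practice, the example below fits a Poisson claim-frequency model with statsmodels. The rating factors, the synthetic portfolio and the log link with an exposure offset are standard choices, but everything concrete here (names, data) is our own illustration rather than the paper's.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 10000

# Hypothetical motor portfolio: two rating factors and an exposure in years.
df = pd.DataFrame({
    "age_band": rng.choice(["18-25", "26-60", "60+"], n, p=[0.2, 0.6, 0.2]),
    "zone": rng.choice(["urban", "rural"], n),
    "exposure": rng.uniform(0.5, 1.0, n),
})
base = 0.08 * df["exposure"]
rel = np.where(df["age_band"] == "18-25", 2.0, 1.0) * np.where(df["zone"] == "urban", 1.3, 1.0)
df["claims"] = rng.poisson(base * rel)

# Poisson GLM with log link; exposure enters as an offset, so the
# fitted coefficients are multiplicative relativities on the base rate.
model = smf.glm(
    "claims ~ age_band + zone",
    data=df,
    family=sm.families.Poisson(),
    offset=np.log(df["exposure"]),
).fit()
print(np.exp(model.params))  # relativities per rating-factor level
```

The transparency benefit mentioned above is visible in the output: exponentiating the coefficients gives one interpretable multiplicative relativity per rating-factor level.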
GLMs do not stand alone as the only multivariate classification method. Other methods such as CART, factor analysis and neural networks are often used to augment GLM analysis. In general, the data mining techniques listed above can enhance a rate-making exercise by:

• whittling down a long list of potential explanatory variables to a more manageable list for use within a GLM;
• providing guidance on how to categorize discrete variables;
• reducing the dimension of multi-level discrete variables (i.e., condensing 100 levels, many of which have few or no claims, into 20 homogeneous levels);
• identifying candidates for interaction variables within GLMs by detecting patterns of interdependency between variables.

7 Predictive modelling in insurance: key issues to consider throughout the lifecycle of a model
3.2.2 Old versus new modelling techniques

The adoption of GLMs resulted in many companies seeking external data sources to augment what had already been collected and analysed about their own policies. This includes, but is not limited to, information about geo-demographics, sensor data, social media information, weather, property characteristics, and information about insured individuals or businesses. This additional data helps actuaries further improve the granularity and accuracy of classification rate-making.

Unfortunately, this new data is very often unstructured and massive, and hence traditional generalized linear model (GLM) techniques become useless. With so many unique new variables in play, it can become a very difficult task to identify and take advantage of the most meaningful correlations. In many cases, GLM techniques are simply unable to penetrate deeply into these giant stores. Even in the cases when they can, the time required to uncover the critical correlations tends to be onerous: days, weeks, and even months of analysis.

Only with advanced techniques, and specifically machine learning, can companies generate predictive models that take advantage of all the data they are capturing. Machine learning is the modern science of finding patterns and making predictions from data, based on work in multivariate statistics, data mining, pattern recognition, and advanced/predictive analytics. Machine learning methods are particularly effective in situations where deep and predictive insights need to be uncovered from data sets that are large, diverse and fast-changing: Big Data. Across these types of data, machine learning easily outperforms traditional methods on accuracy, scale, and speed.
3.2.3 Personalized and Real-time pricing - Motor Insurance

In order to price risk more accurately, insurance companies are now combining analytical applications - e.g. behavioural models based on customer profile data - with a continuous stream of real-time data - e.g. satellite data, weather reports, vehicle sensors - to create a detailed and personalized assessment of risk.

Usage-based insurance (UBI) has been around for a while. It began with Pay-As-You-Drive programs that gave drivers discounts on their insurance premiums for driving under a set number of miles. These soon developed into Pay-How-You-Drive programs, which track your driving habits and give you discounts for 'safe' driving. UBI allows a firm to snap a picture of an individual's specific risk profile, based on that individual's actual driving habits. UBI condenses the period of time under inspection to a few months, guaranteeing a much more relevant pool of information.

With all this data available, the pricing scheme for UBI deviates greatly from that of traditional auto insurance. Traditional auto insurance relies on actuarial studies of aggregated historical data to produce rating factors that include driving record, credit-based insurance score, personal characteristics (age, gender, and marital status), vehicle type, living location, vehicle use, previous claims, liability limits, and deductibles. Policyholders tend to think of traditional auto insurance as a fixed cost, assessed annually and usually paid for in lump sums on an annual, semi-annual, or quarterly basis. However, studies show that there is a strong correlation between claim and loss costs and mileage driven, particularly within existing price rating factors (such as class and territory). For this reason, many UBI programs seek to convert the fixed costs associated with mileage driven into variable costs that can be used in conjunction with other rating factors in the premium calculation. UBI has the advantage of utilizing individual and current driving behaviours, rather than relying on aggregated statistics and driving records that are based on past trends and events, making premium pricing more individualized and precise.
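To make the idea of converting mileage into a variable premium component concrete, here is a toy illustration. The base premium, the per-mile rate and the behaviour discount are invented numbers, not figures from the paper.

```python
def ubi_premium(base_premium, miles_driven, harsh_brakes_per_100mi):
    """Toy usage-based premium: a fixed component plus a variable
    mileage component, adjusted by a driving-behaviour factor."""
    per_mile_rate = 0.03                 # EUR per mile (assumed)
    mileage_component = per_mile_rate * miles_driven
    # Smooth drivers (few harsh brakes) earn up to a 20% discount;
    # aggressive drivers pay up to a 20% surcharge.
    behaviour_factor = min(max(0.8 + 0.04 * harsh_brakes_per_100mi, 0.8), 1.2)
    return (base_premium + mileage_component) * behaviour_factor

print(ubi_premium(200.0, 6000, 1))   # cautious, low-mileage driver
print(ubi_premium(200.0, 15000, 9))  # aggressive, high-mileage driver
```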
3.2.4 Advantages

UBI programs offer many advantages to insurers, consumers and society. Linking insurance premiums more closely to actual individual vehicle or fleet performance allows insurers to price premiums more accurately. This increases affordability for lower-risk drivers, many of whom are also lower-income drivers. It also gives consumers the ability to control their premium costs by encouraging them to reduce miles driven and adopt safer driving habits. The use of telematics helps insurers to more accurately estimate accident damages and reduce fraud by enabling them to analyse the driving data (such as hard braking, speed, and time) during an accident. This additional data can also be used by insurers to refine or differentiate UBI products.
3.2.5 Shortcomings/challenges

3.2.5.1 Organization and resources

Taking advantage of the potential of Big Data requires somewhat different approaches to organization, resources, and technology. As with many new technologies that offer promise, there are challenges to successful implementation and the production of meaningful business results. The number one organizational challenge is determining the business value, with financing as a close second. Talent is the other big issue: identifying the business and technology experts inside the enterprise, recruiting new employees, training and mentoring individuals, and partnering with outside resources is clearly a critical success factor for Big Data. Implementing the new technology and organizing the data are listed as lesser challenges by insurers, although there are still areas that require attention.

3.2.5.2 Technology challenges

The biggest technology challenge in the Big Data world is framed in the context of the different Big Data "V" characteristics. These include the standard three V's of volume, velocity, and variety, plus two more: veracity and value. The variety and veracity of the data present the biggest challenges. As insurers venture beyond analysis of structured transaction data to incorporate external data and unstructured data of all sorts, the ability to combine and input the data into an analysis may be complicated. On one hand, the variety expresses the promise of Big Data, but on the other hand, the technical challenges are significant. The veracity of the data is also deemed a challenge. It is true that some Big Data analyses do not require the data to be as cleaned and organized as in traditional approaches. However, the data must still reflect the underlying truth of the domain.

3.2.5.3 Technology Approaches

Technology should not be the first focus area for evaluating the potential of Big Data in an organization. However, choosing the best technology platform for your organization and business problems does become an important consideration for success. Cloud computing will play a very important role in Big Data. Although there are challenges and new approaches required for Big Data, there is a growing body of experience, expertise, and best practices to assist in successful Big Data implementations.
3.3 Insurance Reserving

Loss reserving is a classic actuarial problem encountered extensively in motor, property and casualty as well as in health insurance. It is a consequence of the fact that insurers need to set reserves to cover future liabilities related to the book of contracts. In other words, the insurer has to hold funds aside to meet future liabilities attached to incurred claims. In non-life insurance, most policies run for a period of 12 months. However, the claims payment process can take years or even decades. In particular, losses arising from casualty insurance can take a long time to settle, and even when the claims are acknowledged, it may take time to establish the extent of the claims settlement costs. A well-known and costly example is provided by the claims from asbestos liabilities. Thus it is not a surprise that the biggest item on the liabilities side of an insurer's balance sheet is often the provision of reserves for future claims payments. It is the job of the reserving actuary to predict, with maximum accuracy, the total amount necessary to pay the claims that the insurer has legally committed to cover.

Historically, reserving was based on deterministic calculations with pen and paper, combined with expert judgement. Since the 1980s, the arrival of personal computers and spreadsheet software packages induced a real change for reserving actuaries. The use of spreadsheets does not only result in a gain of calculation time but also allows testing different scenarios and the sensitivity of the forecasts. The first simple models used by actuaries started to evolve towards more developed ideas with the evolution of IT resources. Moreover, recent changes in regulatory requirements, such as Solvency II in Europe, have shown the need for stochastic models and more precise statistical techniques.
3.3.1 Classical methods

There are many different frameworks and models used by reserving actuaries to compute the technical provisions, and it is not the goal of this paper to review them exhaustively, but rather to show that they share the central notion of the triangle. A triangle is a way of presenting data in a triangular structure showing the development of claims over time for each origin period. An origin period can be the year the policy was written or earned, or the loss occurrence period.
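As an illustration of how a triangle is used, the sketch below applies the classical chain-ladder method to a tiny cumulative paid-claims triangle. The triangle values are invented; the paper does not work through a numeric example.

```python
import numpy as np

# Hypothetical cumulative paid claims; rows = origin years, columns =
# development years. NaN marks the future (unobserved) part of the triangle.
tri = np.array([
    [100.0, 160.0, 180.0],
    [110.0, 170.0, np.nan],
    [120.0, np.nan, np.nan],
])

# Chain-ladder development factors: ratio of column sums over the rows
# where both development years are observed.
n = tri.shape[1]
factors = []
for j in range(n - 1):
    obs = ~np.isnan(tri[:, j + 1])
    factors.append(tri[obs, j + 1].sum() / tri[obs, j].sum())

# Project the lower-right part of the triangle with the factors.
full = tri.copy()
for j in range(n - 1):
    missing = np.isnan(full[:, j + 1])
    full[missing, j + 1] = full[missing, j] * factors[j]

# Reserve = projected ultimates minus the latest observed diagonal.
reserve = full[:, -1].sum() - np.nansum(tri[np.arange(3), [2, 1, 0]])
print("development factors:", factors)
print("estimated outstanding reserve:", round(reserve, 1))
```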
After having used deterministic models, reserving generally switched to stochastic models. These models allow for quantifying reserve risk. The use of models based on aggregated data used to be convenient in the past, when IT resources were limited, but is more and more questionable nowadays, when we have huge computational power at hand at an affordable price. Therefore there is a need to move to models that fully use the data available in the insurers' data warehouses.
3.3.2 Micro-level reserving methods

Unlike aggregate models (or macro-level models), micro-level reserving methods (also called individual claim level models) use individual claims data as inputs and estimate outstanding liabilities for each individual claim. Unlike the models detailed in the previous section, they model very precisely the lifetime development process of each individual claim, including events such as claim occurrence, reporting, payments and settlement. Moreover, they can include micro-level covariates such as information about the policy, the policyholder, the claim, the claimant and the transactions. When well specified, such models are expected to generate reliable reserve estimates. Indeed, the ability to model the claims development at the individual level and to incorporate micro-level covariate information allows micro-level models to handle heterogeneity in claims data efficiently. Moreover, the large amount of data used in modelling can help to avoid issues of over-parameterization and lack of robustness. As a consequence, micro-level models are especially valuable under changing environments, as these changes can be signalled by appropriate covariates.
3.4 Claims Management

Big Data can play a tremendous role in the improvement of claims management. It provides access to data that was not available before, and makes claims processing faster. It thereby enables improved risk management, reduces loss adjustment expenses and enhances quality of service, resulting in increased customer retention. Below we present details of how Big Data analytics improves the fraud detection process.

3.4.1 Fraud detection

It is estimated that a typical organization loses 5% of its revenues to fraud each year8. The total cost of insurance fraud (non-health insurance) in the US is estimated to be more than $40 billion per year9. The advent of Big Data & Analytics has provided new and powerful tools to fight fraud.
3.4.2 What are the current challenges in fraud detection?

The first challenge is finding the right data. Analytical models need data, and in a fraud detection setting this is not always that evident. Collected fraud data are often very skewed, with typically less than 1% fraudsters, which seriously complicates the detection task. Also, the asymmetric costs of missing fraud versus harassing non-fraudulent customers represent important modelling difficulties. Furthermore, fraudsters constantly try to outsmart the analytical models, so these models should be permanently monitored and re-configured on an ongoing basis.
3.4.3 What analytical approaches are being used to tackle fraud?

Most of the fraud detection models in use nowadays are expert-based models. When data becomes available, one can start doing analytics. A first approach is supervised learning, which analyses a labelled data set of historically observed fraud behaviour. It can be used both to predict fraud and to predict the amount thereof. Unsupervised learning starts from an unlabelled data set and performs anomaly detection. Finally, social network learning analyses fraud behaviour in networks of linked entities. Research has found this last approach to be superior to all others (see paragraph 3.4.5).
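For the unsupervised route mentioned above, a common off-the-shelf choice is an isolation forest, which flags claims that look anomalous relative to the bulk of the portfolio. The sketch below is our own minimal illustration with invented claim features; the paper does not name a specific algorithm.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)

# Hypothetical claim features: claim amount and days between policy
# inception and the claim. Most claims are ordinary; a few are odd.
normal = np.column_stack([rng.normal(2000, 500, 980), rng.uniform(30, 365, 980)])
odd = np.column_stack([rng.normal(9000, 800, 20), rng.uniform(0, 10, 20)])
X = np.vstack([normal, odd])

# Isolation forest scores points by how easy they are to isolate;
# contamination is our assumed share of anomalies to flag.
clf = IsolationForest(contamination=0.02, random_state=0).fit(X)
flags = clf.predict(X)  # -1 = anomaly, 1 = normal
print("claims flagged for investigation:", int((flags == -1).sum()))
```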
8 www.acfe.com
9 www.fbi.gov
3.4.4 What are the key characteristics of successful analytical models for fraud detection?

Successful fraud analytical models should satisfy various requirements. First, they should achieve good statistical performance in terms of recall or hit rate, which is the percentage of fraudsters labelled by the analytical model as suspicious, and precision, which is the percentage of fraudsters amongst the ones labelled as suspicious. Next, the analytical models should not be based on complex mathematical formulas (such as neural networks, support vector machines, ...) but should provide clear insight into the fraud mechanisms adopted. This is particularly important since the insights gained will be used to develop new fraud prevention strategies. Also, the operational efficiency of the fraud analytical model needs to be evaluated. This refers to the amount of resources needed to calculate the fraud score and adequately act upon it. E.g., in a credit card fraud environment, a decision needs to be made within a few seconds after the transaction was initiated.
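To pin down the two performance measures just defined, here is a small worked example computing recall and precision from a confusion matrix; the counts are invented.

```python
# Suppose the model flags 50 claims as suspicious out of 1000, of which
# 30 are truly fraudulent; the portfolio contains 40 fraud cases in total.
true_positives = 30   # fraudsters correctly labelled as suspicious
flagged = 50          # all claims labelled as suspicious
actual_fraud = 40     # all fraudulent claims in the data

recall = true_positives / actual_fraud   # share of fraudsters caught
precision = true_positives / flagged     # share of flags that are fraud
print(f"recall (hit rate): {recall:.0%}, precision: {precision:.0%}")
# recall (hit rate): 75%, precision: 60%
```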
3.4.5 Use of social network analytics to detect fraud10

Research has shown that network models significantly outperform non-network models in terms of accuracy, precision and recall. Network analytics can help improve fraud detection techniques.

Fraud is present in many critical human processes such as credit card transactions, insurance claim fraud, opinion fraud and social security fraud. Fraud can be defined by the following five characteristics: it is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime, which appears in many types and forms. Before applying fraud detection techniques, these five issues should be resolved or counterbalanced.

Fraud is an uncommon crime, which means that fraud data have an extremely skewed class distribution. Rebalancing techniques such as SMOTE can be used to counterbalance this effect. SMOTE consists in undersampling the majority class of data (reducing the number of legitimate cases) and oversampling the minority class of data (duplicating fraud cases or creating artificial fraud cases).
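A minimal sketch of such rebalancing with the imbalanced-learn library is shown below; the synthetic data and the sampling ratios are our own assumptions.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline

rng = np.random.default_rng(3)

# Hypothetical data set: 990 legitimate claims, 10 fraudulent ones.
X = rng.normal(size=(1000, 4))
y = np.array([0] * 990 + [1] * 10)

# First create synthetic fraud cases up to 10% of the majority class,
# then undersample legitimate cases down to a 2:1 ratio.
resampler = Pipeline([
    ("smote", SMOTE(sampling_strategy=0.1, k_neighbors=5, random_state=0)),
    ("under", RandomUnderSampler(sampling_strategy=0.5, random_state=0)),
])
X_res, y_res = resampler.fit_resample(X, y)
print("class counts after rebalancing:", np.bincount(y_res))
```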
Complex fraud structures are well-considered; this implies that there will be changes in behaviour over time, so not every time period will have the same importance. A temporal weighting adjustment should put an emphasis on the more important periods (more recent data periods) that could be explanatory of the fraudulent behaviour.

Fraud is imperceptibly concealed, meaning that it is difficult to identify. One could leverage expert knowledge to create features and help identify fraud.

Fraud is time-evolving. The period of study should be selected carefully, taking into consideration that fraud evolves over time. How much of previous time periods could explain or affect the present? The model should incorporate these changes over time. Another question to raise is in what time window the model should be able to detect fraud: short, medium or long term.

The last characteristic of fraud is that it is most of the time carefully organized. Fraud is often not an individual phenomenon; in fact there are many interactions between fraudsters. Often there are fraud sub-networks developing within a bigger network. Social network analysis could be used to detect these networks.

Social network analysis helps derive useful patterns and insights by exploiting the relational structure between objects. A network consists of two sets of elements: the objects of the network, which are called nodes, and the relationships between nodes, which are called links. The links connect two or more nodes. A weight could be assigned to the nodes and links to measure the magnitude of the crime or the intensity of the relationship. When constructing such networks, focus will be put on the neighbourhood of a node, which is a subgraph of the network around the node of interest (fraudster).

Once a network has been constructed, how could this network be used as an indicator of fraudulent activities? Fraud could be detected by answering the following question: does the network contain statistically significant patterns of homophily? Detection of fraud relies on a concept often used in sociology called homophily. Homophily in networks means that people have a strong tendency to associate with others whom they perceive as being similar to themselves in some way. This concept can be translated to fraud networks: fraudulent people are more likely to be connected to other fraudulent people. Clustering techniques could be used to detect significant patterns of homophily and thus to spot fraudsters.

Given a homophilic network with evidence of fraud clusters, it is possible to extract features from the network around the node(s) of interest (fraud activity), which is also called the neighbourhood of the node. This process is called the featurization process: extracting features for each network object based on its neighbourhood. Focus will be put on the first-order neighbourhood (first-degree links), also known as the "egonet" (ego: the node of interest, surrounded by its direct associates known as alters). Feature extraction happens at two levels: egonet generic features (how many fraudulent resources are associated to that company, are there relationships between resources, ...) and alter-specific features (how similar is the alter to the ego, is the alter involved in many fraud cases or not). Once these first-order neighbourhood features for each subject of interest (companies) have been extracted, such as the degree of fraudulent resources and the weight of the fraudulent resources, it is then easy to derive the propagation effect of these fraudulent influences through the network.
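A minimal sketch of egonet featurization with the networkx library is given below. The toy graph and the single feature (share of fraudulent direct neighbours) are our own illustration of the idea, not Van Vlasselaer's actual feature set.

```python
import networkx as nx

# Toy fraud network: nodes are companies/resources, links are shared
# resources; known_fraud marks the nodes with confirmed fraud.
G = nx.Graph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")])
known_fraud = {"B", "C"}

def egonet_fraud_share(graph, node):
    """Egonet feature: fraction of a node's direct neighbours (alters)
    that are known fraudsters - a simple homophily-based score."""
    alters = list(graph.neighbors(node))
    if not alters:
        return 0.0
    return sum(alter in known_fraud for alter in alters) / len(alters)

for node in G.nodes:
    print(node, round(egonet_fraud_share(G, node), 2))
# Nodes surrounded by fraudsters (like A) get a high score and would be
# prioritized for investigation.
```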
To conclude, network models always outperform non-network models, as they are better able to distinguish fraudsters from non-fraudsters. They are also more precise, generating smaller lists of high-risk companies while detecting more fraudulent corporates.

10 Based on the research of Véronique Van Vlasselaer (KULeuven)
3.4.6 Fraud detection in motor insurance - Usage-Based Insurance example

In 2014, the Coalition Against Insurance Fraud11, with the assistance of business analytics company SAS, published a report in which it stresses that technology plays a growing role in fighting fraud. "Insurers are investing in different technologies to combat fraud, but a common component to all these solutions is data," said Stuart Rose, Global Insurance Marketing Principal at SAS. "The ability to aggregate and easily visualize data is essential to identify specific fraud patterns." "Technology is playing a larger and more trusted role with insurers in countering growing fraud threats. Software tools provide the efficiency insurers need to thwart more scams and impose downward pressure on premiums for policyholders," said Dennis Jay, the Coalition's executive director.

In motor insurance, a good example is Usage-Based Insurance (UBI), where insurers can benefit from the superior fraud detection that telematics can provide. It equips an insurer with driving behaviour and driving exposure patterns, including information about speeding, driving dynamics, driving trips, day and night driving patterns, garaging address and mileage. In some sense UBI can become a "lie detector" and can help companies to detect falsification of the garaging address, annual mileage or driving behaviour. Thanks to the recording of the vehicle's geographical location and the detection of sharp braking and harsh acceleration during an accident, an insurer can analyse accident details and estimate accident damages. The telematics devices used in UBI can provide first notice of loss (FNOL) services, giving insurers very valuable information. Analytics performed on this data provides additional evidence to consider when investigating a claim, and can help to reduce fraud and claims disputes.
4 Legal aspects of Big Data

4.1 Introduction

Data processing lies at the very heart of insurance activities. Insurers and intermediaries collect and process vast amounts of personal data about their customers. At the same time they are dealing with a particular type of 'discrimination' among their insureds. Like all businesses operating in Europe, insurers are subject to European and national data protection laws and anti-discrimination rules. Fast technological evolution and globalization have triggered a comprehensive reform of the current data protection laws. The EU hopes to complete a new General Data Protection Regulation at the end of this year. Insurers are concerned that this new Regulation could introduce unintended consequences for the insurance industry.

11 http://www.insurancefraud.org/about-us.htm
4.2 Data processing

4.2.1 Legislation: an overview

Insurers collect and process data to analyse the risks that individuals wish to cover, to tailor products accordingly, to evaluate and pay claims and benefits, and to detect and prevent insurance fraud. The rise of Big Data presents opportunities to offer more creative, competitive pricing and, importantly, to predict customers' behavioural activity. As insurers continue to explore this relatively untapped resource, evolutions in data processing legislation need to be followed very closely.

The protection of personal data was - as a separate right granted to an individual - for the first time guaranteed in the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108). It was adopted by the Council of Europe in 1981. The current, principal EU legal instrument establishing rules for fair personal data processing is the Data Protection Directive (95/46/EC) of 1995, which regulates the protection of individuals with regard to the processing of personal data and the free movement of such data. As a framework law, the Directive had to be implemented in EU Member States through national laws. This Directive has set a standard for the legal definition of personal data and for regulatory responses to the use of personal data. Its provisions include principles related to data quality, criteria for making data processing legitimate, and the essential right not to be subject to automated individual decisions.

The Data Protection Directive was complemented by other legal instruments, such as the E-Privacy Directive (2002/58/EC), part of a package of 5 new Directives that aim to reform the legal and regulatory framework of electronic communications services in the EU. Personal data and individuals' fundamental right to privacy need to be protected, but at the same time the legislator must take into account the legitimate interests of governments and businesses. One of the innovative provisions of this Directive was the introduction of a legal framework for the use of devices for storing or retrieving information, such as cookies. Companies must also inform customers of the data processing to which their data will be subject and obtain subscriber consent before using traffic data for marketing or before offering value-added services based on traffic or location data. The EU Cookie Directive (2009/136/EC), an amendment of the E-Privacy Directive, aims to increase consumer protection and requires websites to obtain informed consent from visitors before they store information on a computer or any web-connected device.

In 2006 the EU Data Retention Directive (2006/24/EC) was adopted as an anti-terrorism measure after the terrorist attacks in Madrid and London. However, on 8 April 2014 the European Court of Justice declared this Directive invalid. The Court took the view that the Directive does not meet the principle of proportionality and should have provided more safeguards to protect the fundamental rights with respect to private life and to the protection of personal data.

Belgium established a Privacy Act, or Data Protection Act, in 1992. Since the introduction of the EU Data Protection Directive (1995), the principles of that directive have been transposed into Belgian law. The Privacy Act consequently underwent significant changes introduced by the Act of 11 December 1998. Further modifications have been made in the meantime, including those of the Act of 26 February 2006. The Belgian Privacy Commission is part of a European task force, which includes data protection authorities from the Netherlands, Belgium, Germany, France and Spain. In October 2014, a new Privacy Bill was introduced in the Belgian Federal Parliament. The Bill mainly aims at providing the Belgian Data Protection Authority (DPA) with stronger enforcement capabilities and ensuring that Belgian citizens regain control over their personal data. To achieve this, certain new measures are proposed for inclusion in the existing legislation, adopted already in 1992, as inspired by the proposed European data protection Regulation.

At this moment the current data processing legislation needs an urgent update. Rapid technological developments, the increasingly globalized nature of data flows and the arrival of cloud computing pose new challenges for data protection authorities. In order to ensure continuity of a high level of data protection, the rules need to be brought in line with technological developments. The Directive of 1995 has also not prevented fragmentation in the way data protection is implemented across the Union. In 2012 the European Commission proposed a comprehensive, pan-European reform of the data protection rules to strengthen online privacy rights and boost Europe's digital economy. On 15 June 2015, the Council reached a 'general approach' on a General Data Protection Regulation (GDPR) that establishes rules adapted to the digital era. The European Commission is pushing for a complete agreement between Council and European Parliament before the end of this year. The twofold aim of the Regulation is to enhance the data protection rights of individuals and to improve business opportunities by facilitating the free flow of personal data in the digital single market. The Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals and allow companies to preserve innovation and competitiveness. In parallel with the proposal for a GDPR, the Commission adopted a Directive on data processing for law enforcement purposes (5833/12).
4.2.2 Some concerns of the insurance industry

The European insurance and reinsurance federation, Insurance Europe, is concerned that the proposed Regulation could introduce unintended consequences for the insurance industry and its policyholders. The new legislation must correctly balance an individual's right to privacy against the needs of businesses. The way insurers process data must be taken into account appropriately, so that they can perform their contractual obligations, assess consumers' needs and risks, innovate, and also combat fraud. There is also a clear tension between Big Data, the privacy of the insured's personal data and its availability to business and the State.

An important concern is that the proposed rules concerning profiling do not take into consideration the way that insurance works. The Directive of 1995 contains rules on 'automated processing', but there is not a single mention of 'profiling' in the text. The new GDPR aims to provide more legal certainty and more protection for individuals with respect to data processing in the context of profiling. Insurers need to profile potential policyholders to measure risk; any restrictions on profiling could therefore translate not only into higher insurance prices and less insurance coverage, but also into an inability to provide consumers with appropriate insurance. Insurance Europe recommends that the new EU Regulation should allow insurance-related profiling at the pre-contractual stage and during the performance of the contract. There is also still some confusion in defining profiling: in the Council approach profiling means solely automated processing, while Article 20(5) proposed by the European Parliament could, according to Insurance Europe, be interpreted as prohibiting fully automated processing, requesting human intervention for every single insurance contract offered to consumers.

The proposal of the EU Council (June 2015) stipulates that the controller should use adequate mathematical or statistical procedures for profiling. He must secure personal data in a way which takes account of the potential risks involved for the interests and rights of the data subject and which prevents, inter alia, discriminatory effects against individuals on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, genetic or health status, or sexual orientation, or that result in measures having such effect. Automated decision-making and profiling based on special categories of personal data should only be allowed under specific conditions.

According to the Article 29 Working Party12, the proposals of the Council regarding profiling are still unclear and do not foresee sufficient safeguards to be put in place. In June 2015 it renewed its call for provisions giving the data subject a maximum of control and autonomy when personal data are processed for profiling. The provisions should clearly define the purposes for which profiles may be created and used, including specific obligations on controllers to inform the data subject, in particular on his or her right to object to the creation and the use of profiles. The academic research group IRISS remarks that the GDPR does not clarify whether or not there is an obligation on data controllers to disclose information about the algorithms involved in profiling practices, and suggests clarification on this point.

Insurance Europe also requests that the GDPR should explicitly recognise insurers' need to process and share data for fraud prevention and detection. According to the Council and the Article 29 Working Party, fraud prevention may fall under the non-exhaustive list of 'legitimate interests' in Article 6(1)(f), which would provide the necessary legal basis to allow processing for combatting insurance fraud.

The new Regulation also proposes a new right to data portability, enabling easier transmission of personal data from one service provider to another. This would allow policyholders to obtain a copy of any of their data being processed by an insurer, and insurers could be forced to disclose confidential and

12 The Article 29 Working Party is an independent advisory body on data protection and privacy, set up under the Data Protection Directive of 1995. It is composed of representatives from the national data protection authorities of the EU Member States, the European Data Protection Supervisor and the European Commission.