 
	
  
	
  
	
  
	
  
BIG DATA: An actuarial perspective

Information Paper

November 2015
  
	
  
 
	
  
Table of Contents

1 INTRODUCTION
2 INTRODUCTION TO BIG DATA
2.1 INTRODUCTION AND CHARACTERISTICS
2.2 BIG DATA TECHNIQUES AND TOOLS
2.3 BIG DATA APPLICATIONS
2.4 DATA DRIVEN BUSINESS
3 BIG DATA IN THE INSURANCE VALUE CHAIN
3.1 INSURANCE UNDERWRITING
3.2 INSURANCE PRICING
3.3 INSURANCE RESERVING
3.4 CLAIMS MANAGEMENT
4 LEGAL ASPECTS OF BIG DATA
4.1 INTRODUCTION
4.2 DATA PROCESSING
4.3 DISCRIMINATION
5 NEW FRONTIERS
5.1 RISK POOLING VS. PERSONALIZATION
5.2 PERSONALISED PREMIUM
5.3 FROM INSURANCE TO PREVENTION
5.4 THE ALL-SEEING INSURER
5.5 CHANGE IN INSURANCE BUSINESS
6 ACTUARIAL SCIENCES AND THE ROLE OF ACTUARIES
6.1 WHAT IS BIG DATA BRINGING FOR THE ACTUARY?
6.2 WHAT IS THE ACTUARY BRINGING TO BIG DATA?
7 CONCLUSIONS
8 REFERENCES
1 Introduction

The Internet started in 1984, linking 1,000 university and corporate labs. By 1998 it had grown to 50 million users, and in 2015 it reached 3.2 billion people (44% of the global population). This enormous user growth was combined with an explosion of the data that we all produce. Every day we create around 2.5 quintillion bytes of data, with information coming from various sources including social media sites, gadgets, smartphones, intelligent homes and cars, and industrial sensors, to name a few. Any company that can combine various datasets and apply effective data analytics will be able to become more profitable and successful. According to a recent report1, 400 large companies that adopted Big Data analytics "have gained a significant lead over the rest of the corporate world." Big Data offers big business gains, but it also has hidden costs and complexity that companies will have to struggle with. Semi-structured and unstructured big data require new skills, and there is a shortage of people who have mastered data science and can handle mathematics, statistics and programming while also possessing substantive domain knowledge.

What will be the impact on the insurance sector and the actuarial profession? The concepts of Big Data and predictive modelling are not new to insurers, who have already been storing and analysing large quantities of data to achieve deeper insights into customers' behaviour or to set insurance premiums. Moreover, actuaries are the data scientists of insurance: they have the statistical training and analytical thinking needed to understand the complexity of data, combined with business insight. We look closely at the insurance value chain and assess the impact of Big Data on underwriting, pricing and claims reserving. We examine the ethics of Big Data, including data privacy, customer identification, data ownership and the legal aspects. We also discuss new frontiers for insurance and their impact on the actuarial profession. Will actuaries be able to leverage Big Data, create sophisticated risk models and more personalized insurance offers, and bring a new wave of innovation to the market?
  	
  
	
  
2 Introduction to Big Data

2.1 Introduction and characteristics
Big Data broadly refers to data sets so large and complex that they cannot be handled by traditional data processing software. It can be defined by the following attributes:
a. Volume: in 2012 it was estimated that 2.5 x 10^18 bytes of data were created worldwide every day - this is equivalent to a stack of books from the Sun to Pluto and back again. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, software logs, GPS signals from mobile devices, among others.
b. Variety and Variability: the challenges of Big Data do not only arise from the sheer volume of data but also from the fact that data is generated in multiple forms, as a mix of unstructured and structured data and as a mix of data at rest and data in motion (i.e. static and real-time data). Furthermore, the meaning of data can change over time or depend on the context. Structured data is organized in a way that both computers and humans can read, for example information stored in traditional databases. Unstructured data refers to data types such as images, audio, video, social media and other information that are not organized in or easily interpreted by traditional databases. It includes data generated by machines such as sensors, web feeds, networks or service platforms.
c. Visualization: the insights gained by a company from analysing data must be shared in a way that is efficient and understandable to the company's stakeholders.
d. Velocity: data is created, saved, analysed and visualized at an increasing speed, making it possible to analyse and visualize high volumes of data in real time.
e. Veracity: it is essential that the data is accurate in order to generate value.
f. Value: the insights gleaned from Big Data can help organizations deepen customer engagement, optimize operations, prevent threats and fraud, and capitalize on new sources of revenue.
1 http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx
2.2 Big Data techniques and tools
The Big Data industry has been supported by the following technologies:
a. The Apache Hadoop software library was initially released in December 2011 and is an open source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from one to thousands of machines, each one being a computational and storage unit. The software library is designed under the fundamental assumption that hardware failures are common: the library itself automatically detects and handles hardware failures in order to guarantee that the services provided by a computer cluster will stay available even when the cluster is affected by hardware failures. A wide variety of companies and organizations use Hadoop for both research and production: web-based companies that own some of the world's biggest data warehouses (Amazon, Facebook, Google, Twitter, Yahoo!, ...), media groups and universities, among others. A list of Hadoop users and systems is available at http://wiki.apache.org/hadoop/PoweredBy.
b. Non-relational databases have existed since the late 1960s but resurfaced in 2009 (under the moniker of Not Only SQL, or NoSQL) as it became clear that they are especially well suited to handle the Big Data challenges of volume and variety, and that they fit neatly within the Apache Hadoop framework.
c. Cloud Computing is a kind of internet-based computing, where shared resources and information are provided to computers and other devices on demand (Wikipedia). A service provider offers computing resources for a fixed price, available online and in general with a high degree of flexibility and reliability. These technologies were created by major online actors (Amazon, Google), followed by other technology providers (IBM, Microsoft, RedHat). There is a wide variety of architectures (Public, Private and Hybrid Cloud), all with the objective of making computing infrastructure a commodity asset with the best quality/total cost of ownership ratio. Having a nearly infinite amount of computing power at hand with high flexibility is a key factor for the success of Big Data initiatives.
d. Mining Massive Datasets is a set of methods, algorithms and techniques that can be used to deal with Big Data problems, and in particular with volume, variety and velocity issues. PageRank can be seen as a major step (see http://infolab.stanford.edu/pub/papers/google.pdf) and its evolution to a Map-Reduce (https://en.wikipedia.org/wiki/MapReduce) approach is definitely a breakthrough; a minimal sketch of the map-reduce pattern is given after this list. Social Network Analysis is becoming an area of research in itself that aims to extract useful information from the massive amount of data that social networks provide. These methods are very well suited to run on software such as Hadoop in a Cloud Computing environment.
e. Social Networks are one source of Big Data that provides a stream of data with huge value for almost all economic (and even non-economic) actors. For most companies, it is the very first time in history that they are capable of interacting directly with their customers. Many applications of Big Data make use of these data to provide enhanced services and products and to increase customer satisfaction.
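To make the map-reduce pattern referred to under d. concrete, the following minimal Python sketch counts word frequencies by mapping each record to (word, 1) pairs, shuffling them by key and reducing each group with a sum. It is an illustrative toy only, not tied to any particular Hadoop API; record contents and names are invented.

    from collections import defaultdict

    def map_phase(record):
        # Emit (key, value) pairs: here, one pair per word in the input line.
        for word in record.split():
            yield word.lower(), 1

    def shuffle(pairs):
        # Group all values by key, as the framework would do between map and reduce.
        grouped = defaultdict(list)
        for key, value in pairs:
            grouped[key].append(value)
        return grouped

    def reduce_phase(key, values):
        # Combine the values of one key into a single result.
        return key, sum(values)

    records = ["big data offers big gains", "big data has hidden costs"]
    pairs = (pair for record in records for pair in map_phase(record))
    counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
    print(counts)  # e.g. {'big': 3, 'data': 2, ...}

In a real cluster the map and reduce phases run in parallel on many machines; the single-process sketch only shows the data flow.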
  
2.3 Big Data Applications
Big Data has the potential to change the way academic institutions, corporations and organizations conduct business, and to change our daily life. Great examples of Big Data applications include:
a. Healthcare: Big Data technologies will have a major impact in healthcare. IBM estimates that 80% of medical data is unstructured and is clinically relevant. Furthermore, medical data resides in multiple places such as individual medical files, lab and imaging systems, physician notes, medical correspondence, etc. Big Data technologies allow healthcare organizations to bring all the information about an individual together to get insights on how to manage care coordination, outcomes-based reimbursement models, patient engagement and outreach programs.
b. Retail: retailers can get insights for personalizing marketing and improving the effectiveness of marketing campaigns, for optimizing assortment and merchandising decisions, and for removing inefficiencies in distribution and operations. For instance, several retailers now incorporate Twitter streams into their analysis of loyalty-program data. The insights gained make it possible to plan for surges in demand for certain items and to create mobile marketing campaigns targeting specific customers with offers at the times of day they would be most receptive to them.2
c. Politics: Big Data technologies will improve efficiency and effectiveness across the broad range of government responsibilities. A great example of Big Data use in politics was Barack Obama's analytics- and metrics-driven 2012 presidential campaign [1]. Other examples include:
i. Threat and crime prediction and prevention. For instance, the Detroit Crime Commission has turned to Big Data in its effort to assist the government and citizens of southeast Michigan in the prevention, investigation and prosecution of neighbourhood crime;3
ii. Detection of fraud, waste and errors in social programs;
iii. Detection of tax fraud and abuse.
d. Cyber risk prevention: companies can analyse data traffic in their computer networks in real time to detect anomalies that may indicate the early stages of a cyber attack. Research firm Gartner estimates that by 2016 more than 25% of global firms will adopt big data analytics for at least one security and fraud detection use case, up from 8% as at 2014.4
e. Insurance fraud detection: insurance companies can determine a score for each claim in order to target for fraud investigation the claims with the highest scores, i.e. the ones that are most likely to be fraudulent. Fraud detection is treated in paragraph 3.4.
f. Usage-Based Insurance: an insurance scheme where car insurance premiums are calculated based on dynamic causal data, including actual usage and driving behaviour. Telematics data transmitted from a vehicle, combined with Big Data analytics, enables insurers to distinguish cautious drivers from aggressive drivers and to match the insurance rate with the actual risk incurred.

2 http://asmarterplanet.com/blog/2015/03/surprising-insights-ibmtwitter-alliance.html#more-33140
3 http://www.datameer.com/company/news/press-releases/detroit-crime-commission-combats-crime-with-datameer-big-data-analytics.html
4 http://www.gartner.com/newsroom/id/2663015
  
2.4 Data driven business
The quantity of data in the world is increasing steeply month after month. Some argue it is time to organize and use this information: data must now be viewed as a corporate asset. In order to respond to this emerging transformation of business culture, two specific C-level roles have appeared in the past few years, one in the banking and the other in the insurance industry.
  
2.4.1 The Chief Data Officer
The Chief Data Officer (abbreviated to CDO) is the first architect of this "data-driven business". Thanks to his role of coordinator, the CDO will be in charge of the data that drive the company, by:
• defining and setting up a strategy to guarantee their quality, their reliability and their coherency;
• organizing and classifying them;
• making them accessible to the right person at the right moment, for the pertinent need and in the right format.
Thus, the Chief Data Officer needs a strong business background to understand how the business runs. The following question will then emerge: to whom should the CDO report? In some firms, the CDO is considered part of IT and reports to the CTO (Chief Technology Officer); in others, he holds more of a business role, reporting to the CEO. It is therefore up to the company to decide, as no two companies are exactly similar from a structural point of view.
Which companies already have a CDO? Generali Group appointed someone to this newly created position in June 2015. Other companies such as HSBC, Wells Fargo and QBE had already appointed a person to this position in 2013 or 2014. Even Barack Obama appointed a Chief Data Officer/Scientist during his 2012 campaign, and the metrics-driven decision-making played a big role in Obama's re-election. In the beginning, most of the professionals holding the actual job title "Chief Data Officer" were located in the United States. After a while, Europe followed the move. Also, many people did the job in their day-to-day work but did not necessarily hold the title. Many analysts in the financial sector believe that yet more insurance and banking companies will have to make the move in the following years if they want to stay attractive.
  
2.4.2 The Chief Analytics Officer
Another C-level position arose in the past months: the Chief Analytics Officer (abbreviated to CAO). Are there differences between a CAO and a CDO? Theoretically, a CDO focuses on tactical data management, while the CAO concentrates on the strategic deployment of analytics. The latter's focus is on data analysis to find hidden but valuable patterns. These will result in operational decisions that will make the company more competitive, more efficient and more attractive to its potential and current clients. The CAO is therefore a natural prolongation of the data-driven business: the more analytics are embedded in the organization, the more you need an executive-level person to manage that function and communicate the results in an understandable way. The CAO usually reports to the CEO.
In practice, some companies fold the CAO responsibilities into the CDO tasks, while others distinguish both positions. Currently it is quite rare to find an explicit "Chief Analytics Officer" position in the banking and insurance sector, because of this overlap. But in other fields, the distinction is often made.
  
3 Big Data in the insurance value chain
Big Data provides new insights from social networks, telematics sensors and other new information channels. As a result, it allows insurers to understand customer preferences better, enables new business approaches and products, and enhances existing internal models, processes and services. With the rise of Big Data the insurance world could fundamentally change, and the entire insurance value chain could be impacted, from underwriting to claims management.
  	
  
	
  
3.1 Insurance underwriting
3.1.1 Introduction
In traditional insurance underwriting and actuarial analyses, we have for years been observing a never-ending search for more meaningful insight into individual policyholder risk characteristics, to distinguish good risks from bad and to accurately price each risk accordingly. The analytics performed by actuaries, based on advanced mathematical and financial theories, have always been critically important to an insurer's profitability. Over the last decade, however, revolutionary advances in computing technology and the explosion of new digital data sources have expanded and reinvented the core disciplines of insurers. Today's advanced analytics in insurance go much further than traditional underwriting and actuarial science. Data mining and predictive modelling are today the way forward for insurers to improve pricing and segmentation and to increase profitability.
  
3.1.2 What is predictive modelling?
Predictive modelling can be defined as the analysis of large historical data sets to identify correlations and interactions, and the use of this knowledge to predict future events. For actuaries, the concepts of predictive modelling are not new to the profession. The use of mortality tables to price life insurance products is an example of predictive modelling. The Belgian MK, FK and MR, FR tables showed the relationship between death probability and the explanatory variables of age, sex and product type (in this case life insurance or annuity).
Predictive models have been around for a long time in sales and marketing environments, for example to predict the probability that a customer will buy a new product. Bringing together expertise from both the actuarial profession and marketing analytics can lead to new and innovative initiatives where predictive models guide expert decisions in areas such as claims management, fraud detection and underwriting.
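As a deliberately simplified illustration of the kind of propensity model mentioned above, the sketch below fits a logistic regression that scores the probability that a customer buys a new product from two explanatory variables. The variables, data and use of scikit-learn are illustrative assumptions, not a description of any insurer's actual model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical history: [age, number of existing policies] per customer,
    # and whether the customer bought the new product (1) or not (0).
    X = np.array([[25, 1], [34, 2], [46, 1], [52, 3], [29, 1], [61, 2], [38, 2], [45, 1]])
    y = np.array([0, 1, 0, 1, 0, 1, 1, 0])

    model = LogisticRegression()
    model.fit(X, y)

    # Score a prospect: predicted probability of buying the new product.
    prospect = np.array([[40, 2]])
    print(model.predict_proba(prospect)[0, 1])

In practice the same pattern scales to many more explanatory variables and customers; the point is simply that a historical data set plus a fitted model yields a score for each new case.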
  
3.1.3 From small over medium to Big Data
Insurers collect a wealth of information on their customers. In the first place during the underwriting process, by asking about the claims history of a customer for car and home insurance, for example. Another source is the history of the relationship the customer has with the insurance company. While in the past the data was kept in silos by product, the key challenge now lies in gathering all this information into one place where the customer dimension is central. This transversal approach to the database also reflects the recent evolution in marketing: going from the 4P's (product, price, place, promotion) to the 4C's5 (customer, costs, convenience, communication).
On top of unleashing the value of internal data, new data sources are becoming available, for instance wearables and social networks, to name a few. Because Big Data can be overwhelming to start with, medium data should be considered first. In Belgium, the strong bancassurance tradition offers interesting opportunities for combining insurance and bank data to create powerful predictive models.
  
3.1.4 Examples of predictive modelling for underwriting
1° Use the 360° view of the customer and predictive models to maximize profitability and gain more business.
By thoroughly analysing data from different sources and applying analytics to gain insight, insurance companies should strive to develop a comprehensive 360-degree customer view. The gains of this complete and accurate view of the customer are twofold:
• Maximizing the profitability of the current customer portfolio through:
o detecting cross-sell and up-sell opportunities;
o customer satisfaction and loyalty actions;
o effective targeting of products and services (e.g. customers that are most likely to be in good health, or those customers that are less likely to have a car accident).
• Acquiring more profitable new customers at a reduced marketing cost: modelling the existing customers will lead to useful information to focus marketing campaigns on the most interesting prospects.
By combining data mining and analytics, insurance companies can better understand which customers are most likely to buy, discover who their most profitable customers are and learn how to attract or retain more of them. Another use case can be the evaluation of the underwriting process to improve the customer experience during this on-boarding process.
2° Predictive underwriting for life insurance6
Using predictive models, it is in theory possible to predict the death probability of a customer. However, the low frequency of life insurance claims presents a challenge to modellers. While for car insurance the probability of a customer having a claim can be around 10%, for life insurance it is around 0.1% for the first year. Not only does this mean that a significant in-force book is needed to have confidence in the results, but also that sufficient history should be present to be able to show mortality experience over time. For this reason, using the underwriting decision as the variable to predict is a more common choice.
All life insurance companies hold historical data on medical underwriting decisions that can be leveraged to build predictive models that predict underwriting decisions. Depending on how the model is used, the outcome can be a reduction of costs for medical examinations, more customer-friendly processes that avoid asking numerous invasive personal questions, or a reduction in the time needed to assess the risks by automatically approving good risks and focusing underwriting efforts on more complex cases. For example, if the predictive model tells you that a new customer has a high degree of similarity to customers that passed the medical examination, the medical examination could be waived for this customer.
If this sounds scary for risk professionals, a softer approach can be tested first, for instance by improving marketing actions by targeting only those individuals that have a high likelihood of being in good health. This not only decreases the cost of the campaign, but also avoids the disappointment of a potential customer who is refused during the medical screening process.
5 http://www.customfitonline.com/news/2012/10/19/4-cs-versus-the-4-ps-of-marketing/
6 Predictive modeling for life insurance, April 2010, Deloitte
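To make the predictive underwriting idea in 2° above concrete, the following minimal sketch trains a classifier on hypothetical historical medical underwriting decisions and uses it to decide whether a medical examination could be waived for a new applicant. The features, data, the 0.9 cut-off and the choice of a random forest are illustrative assumptions only.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Hypothetical history: [age, BMI, smoker (0/1), sum assured in 100k EUR]
    # and the past underwriting decision: 1 = accepted at standard rates, 0 = referred.
    X = np.array([[30, 22, 0, 1], [45, 31, 1, 3], [38, 26, 0, 2], [52, 29, 1, 5],
                  [27, 24, 0, 1], [60, 33, 1, 4], [35, 23, 0, 2], [48, 27, 0, 3]])
    y = np.array([1, 0, 1, 0, 1, 0, 1, 1])

    model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    applicant = np.array([[33, 24, 0, 2]])
    p_standard = model.predict_proba(applicant)[0, 1]
    # Waive the medical examination only when the model is very confident;
    # the 0.9 threshold is an arbitrary illustrative choice.
    print("waive exam" if p_standard > 0.9 else "refer to underwriter")

As the paper notes, how such an output is used (full waiver, triage, or marketing only) is a business and compliance decision, not a modelling one.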
3.1.5 Challenges of predictive modelling in underwriting7
Predictive models can only be as good as the input used to calibrate them. The first challenge in every predictive modelling project is to collect relevant, high-quality data for which a history is present. As many insurers are currently replacing legacy systems to reduce maintenance costs, this can be at the expense of the history. Actuaries are uniquely placed to prevent the history being lost, as a portfolio's history should in any case be kept for adequate risk management. The trend of moving all policies from several legacy systems into one modern, single policy administration system is an opportunity that must be seized so that data collection will be easier in the future.
Once the necessary data are collected, some legal or compliance concerns need to be addressed, as there might be boundaries to using certain variables in the underwriting process. In Europe, if the model will influence the price of the insurance, gender is no longer allowed as an explanatory variable, and this is only one example. It is important that the purpose of the model and the possible inputs are discussed with the legal department prior to starting the modelling.
Once the model is built, it is important that the users realize that no model is perfect. This means that residual risks will be present, and these should be weighed against the gains that the use of the model can bring.
And finally, once a predictive model has been set up, a continuous reviewing cycle must be put in place that collects feedback from the underwriting and sales teams and collects data to improve and refine the model. Building a predictive model is a continuous improvement process, not a one-off project.
  
3.2 Insurance pricing
3.2.1 Overview of existing pricing techniques
The first rate-making techniques were based on rudimentary methods such as univariate analysis and, later, iterative standardized univariate methods such as the minimum bias procedure. They look at how changes in one characteristic result in differences in loss frequency or severity.
Later on, insurance companies moved to multivariate methods. This was associated with the further development of computing power and data capabilities. These techniques are now being adopted by more and more insurers and are becoming part of everyday business practice. Multivariate analytical techniques focus on individual-level data and take into account the effects (interactions) that many different characteristics of a risk have on one another. As explained in the previous section, many companies use predictive modelling (a form of multivariate analysis) to create measures of the likelihood that a customer will purchase a particular product. Banks use these tools to create measures (e.g. credit scores) of whether a client will be able to meet lending obligations for a loan or mortgage. Similarly, P&C insurers can use predictive models to predict claim behaviour. Multivariate methods provide valuable diagnostics that aid in understanding the certainty and reasonableness of results.
Generalized Linear Models (GLMs) are essentially a generalized form of linear models. This family encompasses normal error linear regression models and the nonlinear exponential, logistic and Poisson regression models, as well as many other models, such as log-linear models for categorical data. Generalized linear models have become the standard for classification rate-making in most developed insurance markets, particularly because of the benefit of transparency. Understanding the mathematical underpinnings is an important responsibility of the rate-making actuary who intends to use such a method. Linear models are a good place to start, as GLMs are essentially a generalized form of such a model. As with many techniques, visualizing the GLM results is an intuitive way to connect the theory with the practical use.
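As an illustration of the GLM approach described above, the sketch below fits a Poisson frequency model with a log link and an exposure offset, the typical classification rate-making set-up. The rating factors and data are hypothetical, and statsmodels is just one library that could be used.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical policy-level data: observed claim counts, exposure in years,
    # and two rating factors (age band and vehicle group).
    df = pd.DataFrame({
        "claims":   [0, 1, 0, 2, 0, 1, 0, 0, 1, 3],
        "exposure": [1.0, 0.5, 1.0, 1.0, 0.8, 1.0, 0.3, 1.0, 0.9, 1.0],
        "age_band": ["18-25", "18-25", "26-40", "26-40", "41+", "41+", "18-25", "26-40", "41+", "18-25"],
        "vehicle":  ["small", "sport", "small", "sport", "small", "small", "sport", "small", "sport", "sport"],
    })

    # Poisson GLM with log link; log(exposure) enters as an offset so the
    # linear predictor models the claim frequency per unit of exposure.
    model = smf.glm("claims ~ C(age_band) + C(vehicle)",
                    data=df,
                    family=sm.families.Poisson(),
                    offset=np.log(df["exposure"])).fit()

    print(model.summary())
    # The exponentiated coefficients act as multiplicative relativities per rating factor.
    print(np.exp(model.params))

The transparency benefit mentioned above comes from exactly this structure: each rating factor contributes an explicit multiplicative relativity that can be inspected and challenged.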
  
GLMs do not stand alone as the only multivariate classification method. Other methods such as CART, factor analysis and neural networks are often used to augment GLM analysis.
  	
  
In general, the data mining techniques listed above can enhance a rate-making exercise by:
• whittling down a long list of potential explanatory variables to a more manageable list for use within a GLM;
• providing guidance on how to categorize discrete variables;
• reducing the dimension of multi-level discrete variables (i.e., condensing 100 levels, many of which have few or no claims, into 20 homogeneous levels);
• identifying candidates for interaction variables within GLMs by detecting patterns of interdependency between variables.

7 Predictive modelling in insurance: key issues to consider throughout the lifecycle of a model
  
	
  
3.2.2 Old versus new modelling techniques
The adoption of GLMs resulted in many companies seeking external data sources to augment what had already been collected and analysed about their own policies. This includes, but is not limited to, information about geo-demographics, sensor data, social media information, weather, property characteristics and information about insured individuals or businesses. This additional data helps actuaries further improve the granularity and accuracy of classification rate-making. Unfortunately, this new data is very often unstructured and massive, and hence the traditional generalized linear model (GLM) techniques become useless.
With so many unique new variables in play, it can become a very difficult task to identify and take advantage of the most meaningful correlations. In many cases, GLM techniques are simply unable to penetrate deeply into these giant data stores. Even in the cases when they can, the time required to uncover the critical correlations tends to be onerous, requiring days, weeks and even months of analysis. Only with advanced techniques, and specifically machine learning, can companies generate predictive models that take advantage of all the data they are capturing.
Machine learning is the modern science of finding patterns and making predictions from data, based on work in multivariate statistics, data mining, pattern recognition and advanced/predictive analytics. Machine learning methods are particularly effective in situations where deep and predictive insights need to be uncovered from data sets that are large, diverse and fast changing: Big Data. Across these types of data, machine learning easily outperforms traditional methods on accuracy, scale and speed.
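As a hedged illustration of how a machine-learning model can sit next to a GLM, the sketch below fits a gradient-boosted regression to the same kind of frequency data a GLM would use, weighting policies by exposure. Gradient boosting is only one of many possible techniques, and the data and features are again hypothetical.

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    # Hypothetical policy data: [driver age, vehicle power, urban (0/1)],
    # observed claim counts and exposures in years.
    X = np.array([[22, 90, 1], [45, 60, 0], [31, 120, 1], [57, 75, 0],
                  [26, 110, 1], [63, 65, 0], [39, 95, 1], [48, 80, 0]])
    claims = np.array([1, 0, 2, 0, 1, 0, 1, 0])
    exposure = np.array([0.8, 1.0, 1.0, 1.0, 0.5, 1.0, 0.9, 1.0])

    # Model the claim frequency (claims per year of exposure), weighting each
    # policy by its exposure so short-lived policies carry less weight.
    freq = claims / exposure
    model = GradientBoostingRegressor(n_estimators=100, max_depth=2, random_state=0)
    model.fit(X, freq, sample_weight=exposure)

    # Predicted annual claim frequency for a new risk profile.
    print(model.predict(np.array([[30, 100, 1]])))

Unlike the GLM, the boosted model captures non-linearities and interactions automatically, at the price of the transparency discussed in 3.2.1.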
  
3.2.3 Personalized and Real-time pricing – Motor Insurance
In order to price risk more accurately, insurance companies are now combining analytical applications – e.g. behavioural models based on customer profile data – with a continuous stream of real-time data – e.g. satellite data, weather reports, vehicle sensors – to create a detailed and personalized assessment of risk.
Usage-based insurance (UBI) has been around for a while. It began with Pay-As-You-Drive programs that gave drivers discounts on their insurance premiums for driving under a set number of miles. These soon developed into Pay-How-You-Drive programs, which track your driving habits and give you discounts for 'safe' driving.
UBI allows a firm to snap a picture of an individual's specific risk profile, based on that individual's actual driving habits. UBI condenses the period of time under inspection to a few months, guaranteeing a much more relevant pool of information. With all this data available, the pricing scheme for UBI deviates greatly from that of traditional auto insurance. Traditional auto insurance relies on actuarial studies of aggregated historical data to produce rating factors that include driving record, credit-based insurance score, personal characteristics (age, gender and marital status), vehicle type, living location, vehicle use, previous claims, liability limits and deductibles.
Policyholders tend to think of traditional auto insurance as a fixed cost, assessed annually and usually paid for in lump sums on an annual, semi-annual or quarterly basis. However, studies show that there is a strong correlation between claim and loss costs and mileage driven, particularly within existing price rating factors (such as class and territory). For this reason, many UBI programs seek to convert the fixed costs associated with mileage driven into variable costs that can be used in conjunction with other rating factors in the premium calculation. UBI has the advantage of utilizing individual and current driving behaviours, rather than relying on aggregated statistics and driving records that are based on past trends and events, making premium pricing more individualized and precise.
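A minimal sketch of how telematics data could feed a usage-based premium is shown below: a base premium is scaled by a mileage factor and by simple behaviour surcharges derived from trip records. All factor values, thresholds and field names are purely illustrative assumptions, not an actual rating plan.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Trip:
        km: float
        hard_brakes: int   # count of hard-braking events detected by the device
        night: bool        # trip driven between 23:00 and 05:00

    def ubi_premium(base_premium: float, trips: List[Trip]) -> float:
        """Scale a base premium by mileage and driving-behaviour surcharges (illustrative)."""
        total_km = sum(t.km for t in trips)
        brake_rate = sum(t.hard_brakes for t in trips) / max(total_km, 1.0)
        night_share = sum(t.km for t in trips if t.night) / max(total_km, 1.0)

        mileage_factor = 0.7 + 0.3 * min(total_km / 15000, 1.5)        # pay-as-you-drive part
        behaviour_factor = 1.0 + 2.0 * brake_rate + 0.2 * night_share  # pay-how-you-drive part
        return base_premium * mileage_factor * behaviour_factor

    trips = [Trip(km=30, hard_brakes=1, night=False), Trip(km=12, hard_brakes=0, night=True)]
    print(round(ubi_premium(500.0, trips), 2))

In a production setting the factors themselves would of course be calibrated with the statistical models described earlier rather than hard-coded.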
  
3.2.4 Advantages
UBI programs offer many advantages to insurers, consumers and society. Linking insurance premiums more closely to actual individual vehicle or fleet performance allows insurers to price premiums more accurately. This increases affordability for lower-risk drivers, many of whom are also lower-income drivers. It also gives consumers the ability to control their premium costs by encouraging them to reduce the miles they drive and to adopt safer driving habits. The use of telematics helps insurers to estimate accident damages more accurately and to reduce fraud, by enabling them to analyse the driving data (such as hard braking, speed and time) during an accident. This additional data can also be used by insurers to refine or differentiate UBI products.
  	
  
3.2.5 Shortcomings/challenges
3.2.5.1 Organization and resources
Taking advantage of the potential of Big Data requires somewhat different approaches to organization, resources and technology. As with many new technologies that offer promise, there are challenges to successful implementation and to the production of meaningful business results. The number one organizational challenge is determining the business value, with financing a close second. Talent is the other big issue: identifying the business and technology experts inside the enterprise, recruiting new employees, training and mentoring individuals, and partnering with outside resources is clearly a critical success factor for Big Data. Implementing the new technology and organizing the data are listed as lesser challenges by insurers, although there are still areas that require attention.
  attention.	
  
3.2.5.2 Technology challenges
The biggest technology challenge in the Big Data world is framed in the context of the different Big Data "V" characteristics. These include the standard three V's of volume, velocity and variety, plus two more: veracity and value. The variety and veracity of the data present the biggest challenges. As insurers venture beyond the analysis of structured transaction data to incorporate external data and unstructured data of all sorts, the ability to combine the data and feed it into an analysis may be complicated. On the one hand, variety expresses the promise of Big Data, but on the other hand the technical challenges are significant. The veracity of the data is also deemed a challenge. It is true that some Big Data analyses do not require the data to be as cleaned and organized as in traditional approaches. However, the data must still reflect the underlying truth and reality of the domain.
  
3.2.5.3 Technology Approaches
Technology should not be the first focus area when evaluating the potential of Big Data in an organization. However, choosing the best technology platform for your organization and business problems does become an important consideration for success. Cloud computing will play a very important role in Big Data. Although there are challenges and new approaches required for Big Data, there is a growing body of experience, expertise and best practices to assist in successful Big Data implementations.
  
3.3 Insurance Reserving
Loss reserving is a classic actuarial problem encountered extensively in motor, property and casualty as well as in health insurance. It is a consequence of the fact that insurers need to set reserves to cover future liabilities related to their book of contracts. In other words, the insurer has to hold funds aside to meet future liabilities attached to incurred claims.

In non-life insurance, most policies run for a period of 12 months. However, the claims payment process can take years or even decades. In particular, losses arising from casualty insurance can take a long time to settle, and even when the claims are acknowledged it may take time to establish the extent of the claims settlement costs. A well-known and costly example is provided by the claims from asbestos liabilities. Thus it is not a surprise that the biggest item on the liabilities side of an insurer's balance sheet is often the provision of reserves for future claims payments. It is the job of the reserving actuary to predict, with maximum accuracy, the total amount necessary to pay the claims that the insurer has legally committed to cover.

Historically, reserving was based on deterministic calculations with pen and paper, combined with expert judgement. Since the 1980s, the arrival of personal computers and spreadsheet software packages induced a real change for reserving actuaries. The use of spreadsheets not only results in a gain of calculation time but also allows testing different scenarios and the sensitivity of the forecasts. The first simple models used by actuaries started to evolve towards more developed ideas thanks to the evolution of IT resources. Moreover, the recent changes in regulatory requirements, such as Solvency II in Europe, have shown the need for stochastic models and more precise statistical techniques.
3.3.1 Classical methods
There are many different frameworks and models used by reserving actuaries to compute the technical provisions, and it is not the goal of this paper to review them in an exhaustive way, but rather to show that they share the central notion of the triangle. A triangle is a way of presenting data in the form of a triangular structure showing the development of claims over time for each origin period. An origin period can be the year the policy was written or earned, or the loss occurrence period.
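To illustrate the triangle idea, the sketch below builds a small cumulative run-off triangle and applies classical chain-ladder development factors to project ultimate claims and the implied reserve. The figures are invented for illustration; real reserving involves far more data and judgement.

    import numpy as np

    # Hypothetical cumulative paid-claims triangle: rows = origin years,
    # columns = development years; NaN marks future, not-yet-observed cells.
    triangle = np.array([
        [100.0, 160.0, 190.0, 200.0],
        [110.0, 170.0, 205.0, np.nan],
        [120.0, 185.0, np.nan, np.nan],
        [130.0, np.nan, np.nan, np.nan],
    ])

    n = triangle.shape[1]
    factors = []
    for j in range(n - 1):
        # Chain-ladder development factor: ratio of column sums over the rows
        # where both development years j and j+1 are observed.
        mask = ~np.isnan(triangle[:, j + 1])
        factors.append(triangle[mask, j + 1].sum() / triangle[mask, j].sum())

    # Project each origin year to ultimate by applying the remaining factors
    # to its latest observed (diagonal) value.
    ultimates, latest = [], []
    for i in range(triangle.shape[0]):
        last_obs = int(np.where(~np.isnan(triangle[i]))[0].max())
        value = triangle[i, last_obs]
        latest.append(value)
        for j in range(last_obs, n - 1):
            value *= factors[j]
        ultimates.append(value)

    reserve = sum(ultimates) - sum(latest)
    print("development factors:", [round(f, 3) for f in factors])
    print("estimated reserve:", round(reserve, 1))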
  	
  
	
  
After having used deterministic models, reserving has generally switched to stochastic models. These models allow the reserve risk to be quantified.

The use of models based on aggregated data used to be convenient in the past, when IT resources were limited, but it is more and more questionable nowadays, when we have huge computational power at hand at an affordable price. Therefore there is a need to move to models that fully use the data available in the insurers' data warehouses.
  
	
  
3.3.2 Micro-­‐level	
  reserving	
  methods	
  
Unlike aggregate models (or macro-level models), micro-level reserving methods (also called individual claim level models) use individual claims data as inputs and estimate outstanding liabilities for each individual claim. Unlike the models detailed in the previous section, they model very precisely the lifetime development process of each individual claim, including events such as claim occurrence, reporting, payments and settlement. Moreover, they can include micro-level covariates such as information about the policy, the policyholder, the claim, the claimant and the transactions.
  
	
  
When well specified, such models are expected to generate reliable reserve estimates. Indeed, the ability to model claims development at the individual level and to incorporate micro-level covariate information allows micro-level models to handle heterogeneity in claims data efficiently. Moreover, the large amount of data used in modelling can help avoid issues of over-parameterization and lack of robustness. As a consequence, micro-level models are especially valuable in changing environments, as these changes can be captured by appropriate covariates.
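
As an illustration of the micro-level idea, the sketch below (a simplified example, not a production reserving model) trains a standard regression learner on settled claims, using claim-level covariates, and then predicts the outstanding amount for each open claim; the reserve is simply the sum of the individual predictions. The column names and the use of scikit-learn's GradientBoostingRegressor are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative individual claims data; column names are assumptions for the example.
closed = pd.DataFrame({
    "paid_to_date":  [500, 800, 1200, 300, 950, 700],
    "report_delay":  [10, 30, 5, 60, 15, 25],   # days between occurrence and reporting
    "injury":        [0, 1, 1, 0, 1, 0],        # simple claim-level covariate
    "ultimate_cost": [900, 2500, 3100, 450, 2800, 1100],
})
open_claims = pd.DataFrame({
    "paid_to_date": [400, 1500],
    "report_delay": [20, 45],
    "injury":       [0, 1],
})

features = ["paid_to_date", "report_delay", "injury"]
model = GradientBoostingRegressor(random_state=0)
model.fit(closed[features], closed["ultimate_cost"])     # learn from settled claims

predicted_ultimate = model.predict(open_claims[features])
outstanding = np.maximum(predicted_ultimate - open_claims["paid_to_date"], 0)
print("Estimated outstanding per open claim:", np.round(outstanding, 0).tolist())
print("Micro-level reserve estimate:", round(outstanding.sum(), 0))
```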
  
	
  
3.4 Claims Management
  
Big Data can play a tremendous role in the improvement of claims management. It provides access to data that was not available before and makes claims processing faster. It therefore enables improved risk management, reduces loss adjustment expenses and enhances quality of service, resulting in increased customer retention. Below we present details of how Big Data analytics improves the fraud detection process.
  
3.4.1 Fraud detection
  
It is estimated that a typical organization loses 5% of its revenues to fraud each year8. The total cost of insurance fraud (non-health insurance) in the US is estimated to be more than $40 billion per year9. The advent of Big Data & Analytics has provided new and powerful tools to fight fraud.
  	
  	
  
3.4.2 What are the current challenges in fraud detection?
  
The first challenge is finding the right data. Analytical models need data, and in a fraud detection setting this is not always evident. Collected fraud data are often very skewed, with typically less than 1% fraudsters, which seriously complicates the detection task. The asymmetric costs of missing fraud versus harassing non-fraudulent customers also represent important modelling difficulties. Furthermore, fraudsters constantly try to outsmart the analytical models, so these models should be permanently monitored and re-configured on an ongoing basis.
  	
  	
  
3.4.3 What analytical approaches are being used to tackle fraud?
  
Most of the fraud detection models in use nowadays are expert-based models. When data becomes available, one can start doing analytics. A first approach is supervised learning, which analyses a labelled data set of historically observed fraud behaviour. It can be used to predict both the occurrence of fraud and the amount involved. Unsupervised learning starts from an unlabelled data set and performs anomaly detection. Finally, social network learning analyses fraud behaviour in networks of linked entities; the research discussed in section 3.4.5 found this network-based approach to outperform the others.
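
A minimal sketch of the first two approaches, under the assumption of a small tabular claims data set with invented figures: a supervised classifier is trained on historically labelled fraud cases, while an unsupervised anomaly detector (here scikit-learn's IsolationForest, used purely as an example) flags outlying claims without using any labels.

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)

# Illustrative claim features: [claim amount, days between policy start and claim].
X = rng.normal(loc=[1000, 180], scale=[300, 60], size=(200, 2))
y = np.zeros(200, dtype=int)
X[:3] = [[9000, 2], [8500, 5], [9500, 1]]   # a few extreme, labelled fraud cases
y[:3] = 1

# Supervised learning: learn from labelled historical fraud behaviour.
clf = RandomForestClassifier(random_state=0).fit(X, y)
new_claims = np.array([[8800.0, 3.0], [950.0, 170.0]])
print("Supervised fraud probability:", clf.predict_proba(new_claims)[:, 1])

# Unsupervised learning: anomaly detection on an unlabelled data set.
iso = IsolationForest(random_state=0).fit(X)
print("Anomaly flag (-1 = outlier):", iso.predict(new_claims))
```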
8 www.acfe.com
9 www.fbi.gov
  
 
	
  
3.4.4 What are the key characteristics of successful analytical models for fraud detection?
  	
  
Successful fraud analytical models should satisfy various requirements. First, they should achieve good statistical performance in terms of recall (hit rate), i.e. the percentage of actual fraudsters that the analytical model labels as suspicious, and precision, i.e. the percentage of true fraudsters amongst those labelled as suspicious. Next, the analytical models should not be based on complex mathematical formulas (such as neural networks, support vector machines, ...) but should provide clear insight into the fraud mechanisms adopted. This is particularly important since the insights gained will be used to develop new fraud prevention strategies. The operational efficiency of the fraud analytical model also needs to be evaluated. This refers to the amount of resources needed to calculate the fraud score and to act upon it adequately. For example, in a credit card fraud environment, a decision needs to be made within a few seconds after the transaction is initiated.
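
As a small worked example of these two measures (figures invented for illustration): suppose a portfolio contains 40 true fraudsters, the model flags 50 claims as suspicious and 30 of those are indeed fraudulent; then recall = 30/40 = 75% and precision = 30/50 = 60%. The same computation in code:

```python
# Worked example with invented figures: confusion-matrix counts for a fraud model.
true_positives = 30    # fraudsters correctly flagged as suspicious
false_negatives = 10   # fraudsters the model missed
false_positives = 20   # legitimate claims flagged as suspicious

recall = true_positives / (true_positives + false_negatives)      # hit rate
precision = true_positives / (true_positives + false_positives)

print(f"Recall (hit rate): {recall:.0%}")   # 75%
print(f"Precision: {precision:.0%}")        # 60%
```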
  	
  	
  
3.4.5 Use of social network analytics to detect fraud10
  
Research has shown that network models significantly outperform non-network models in terms of accuracy, precision and recall. Network analytics can therefore help improve fraud detection techniques. Fraud is present in many critical human processes such as credit card transactions, insurance claim fraud, opinion fraud and social security fraud. Fraud can be defined by the following five characteristics: it is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime, which appears in many types and forms. Before applying fraud detection techniques, these five issues should be resolved or counterbalanced.
  	
  
	
  
Fraud is an uncommon crime, which means that the class distribution is extremely skewed. Rebalancing techniques such as SMOTE (Synthetic Minority Oversampling Technique) can be used to counterbalance this effect. In practice this means undersampling the majority class (reducing the number of legitimate cases) and oversampling the minority class (duplicating fraud cases or, as SMOTE does, creating artificial fraud cases).
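
A minimal sketch of such a rebalancing step, written directly in numpy so that no particular library is assumed (SMOTE itself, as implemented for instance in the imbalanced-learn package, would generate synthetic minority cases rather than plain duplicates):

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.array([0] * 990 + [1] * 10)          # ~1% fraud: extremely skewed class distribution
X = rng.normal(size=(1000, 4))

fraud_idx = np.where(y == 1)[0]
legit_idx = np.where(y == 0)[0]

# Undersample the majority class and oversample (duplicate) the minority class
# so that the training set is roughly balanced.
keep_legit = rng.choice(legit_idx, size=200, replace=False)
boost_fraud = rng.choice(fraud_idx, size=200, replace=True)

idx = np.concatenate([keep_legit, boost_fraud])
X_balanced, y_balanced = X[idx], y[idx]
print("Class counts after rebalancing:", np.bincount(y_balanced))
```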
  	
  	
  
Complex fraud structures are well-considered; this implies that behaviour will change over time, so not every time period has the same importance. A temporal weighting adjustment should put emphasis on the more important periods (typically the more recent data periods) that are most explanatory of the fraudulent behaviour.
  
Fraud is imperceptibly concealed, meaning that it is difficult to identify. Expert knowledge can be leveraged to create features that help identify fraud.
  	
  
Fraud is time-evolving. The period of study should be selected carefully, taking into consideration that fraud evolves over time: how much of the previous time periods can explain or affect the present? The model should incorporate these changes over time. Another question to raise is in what time window the model should be able to detect fraud: short, medium or long term.
  
The last characteristic of fraud is that it is most of the time carefully organized. Fraud is often not an individual phenomenon; in fact there are many interactions between fraudsters, and fraud sub-networks often develop within a bigger network. Social network analysis can be used to detect these networks.
  	
  
Social network analysis helps derive useful patterns and insights by exploiting the relational structure between objects.
  
A network consists of two sets of elements: the objects of the network, which are called nodes, and the relationships between nodes, which are called links. A link connects two or more nodes. A weight can be assigned to the nodes and links to measure the magnitude of the crime or the intensity of the relationship. When constructing such networks, the focus is put on the neighbourhood of a node, which is the subgraph of the network around the node of interest (the fraudster).
  	
  
Once a network has been constructed, how can it be used as an indicator of fraudulent activity? Fraud can be detected by answering the following question: does the network contain statistically significant patterns of homophily? The detection of fraud relies on a concept often used in sociology, called homophily: in networks, people have a strong tendency to associate with others whom they perceive as being similar to themselves in some way.

10 Based on the research of Véronique Van Vlasselaer (KULeuven).
This concept can be translated to fraud networks: fraudulent people are more likely to be connected to other fraudulent people. Clustering techniques can be used to detect significant patterns of homophily and thus to spot fraudsters.
  	
  
Given a homophilic network with evidence of fraud clusters, it is possible to extract features from the part of the network around the node(s) of interest (the fraud activity), also called the neighbourhood of the node. This is the featurization process: extracting features for each network object based on its neighbourhood. The focus is put on the first-order neighbourhood (first-degree links), also known as the “egonet” (the ego is the node of interest, surrounded by its direct associates known as alters). Feature extraction happens at two levels: egonet-generic features (how many fraudulent resources are associated with that company, are there relationships between the resources, ...) and alter-specific features (how similar are the alters to the ego, is the alter involved in many fraud cases or not).
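
The following sketch illustrates the featurization idea on a toy network, using a plain adjacency dictionary (entity names and the choice of features are purely illustrative): for each node of interest, the egonet is its set of direct neighbours (alters), and the extracted features are the number and the proportion of known fraudulent alters.

```python
# Toy network: adjacency lists between companies and resources (illustrative names).
links = {
    "company_A": ["resource_1", "resource_2"],
    "company_B": ["resource_2", "resource_3"],
    "company_C": ["resource_4"],
}
known_fraudulent = {"resource_2", "resource_3"}

def egonet_features(ego, adjacency, fraud_set):
    """Extract first-order neighbourhood (egonet) features for one node of interest."""
    alters = adjacency.get(ego, [])
    n_fraud = sum(1 for alter in alters if alter in fraud_set)
    return {
        "n_alters": len(alters),
        "n_fraudulent_alters": n_fraud,
        "fraud_ratio": n_fraud / len(alters) if alters else 0.0,
    }

for company in links:
    print(company, egonet_features(company, links, known_fraudulent))
```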
  	
  
Once these first-order neighbourhood features have been extracted for each subject of interest (e.g. companies), such as the number of fraudulent resources and the weight of those fraudulent resources, it is then easy to derive how these fraudulent influences propagate through the network.
  	
  
To conclude, in this research the network models consistently outperformed the non-network models, as they are better able to distinguish fraudsters from non-fraudsters. They also generate smaller and more precise lists of high-risk companies and detect more fraudulent companies.
  
3.4.6 Fraud detection in motor insurance – Usage-Based Insurance example
  
In 2014, the Coalition Against Insurance Fraud11, with the assistance of business analytics company SAS, published a report stressing that technology plays a growing role in fighting fraud. “Insurers are investing in different technologies to combat fraud, but a common component to all these solutions is data,” said Stuart Rose, Global Insurance Marketing Principal at SAS. “The ability to aggregate and easily visualize data is essential to identify specific fraud patterns.” “Technology is playing a larger and more trusted role with insurers in countering growing fraud threats. Software tools provide the efficiency insurers need to thwart more scams and impose downward pressure on premiums for policyholders,” said Dennis Jay, the Coalition’s executive director.
  
In motor insurance, a good example is Usage-Based Insurance (UBI), where insurers can benefit from the superior fraud detection that telematics can provide. Telematics equips an insurer with driving behaviour and driving exposure patterns, including information about speeding, driving dynamics, trips, day and night driving patterns, garaging address and mileage. In some sense UBI can become a “lie detector” and can help companies detect falsification of the garaging address, the annual mileage or the driving behaviour. By recording the vehicle’s geographical location and detecting sharp braking and harsh acceleration during an accident, an insurer can analyse accident details and estimate accident damages. The telematics devices used in UBI can also provide first notice of loss (FNOL) services, delivering very valuable information for insurers. Analytics performed on this data provides additional evidence to consider when investigating a claim, and can help to reduce fraud and claims disputes.
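
As a simple illustration of this “lie detector” idea (thresholds and field names are assumptions, not an actual insurer's rules), the declared annual mileage can be compared with the mileage observed through the telematics device, and policies with a large discrepancy can be flagged for further investigation:

```python
import pandas as pd

# Illustrative portfolio: declared vs telematics-observed annual mileage (km).
policies = pd.DataFrame({
    "policy_id":        ["P1", "P2", "P3"],
    "declared_mileage": [8000, 10000, 12000],
    "observed_mileage": [8500, 21000, 11500],
})

# Flag policies where the observed mileage exceeds the declaration by more than 25%
# (an assumed tolerance); these are candidates for review, not proven fraud.
policies["ratio"] = policies["observed_mileage"] / policies["declared_mileage"]
policies["review"] = policies["ratio"] > 1.25
print(policies[["policy_id", "ratio", "review"]])
```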
  
4 Legal aspects of Big Data
  
4.1 Introduction	
  
Data processing lies at the very heart of insurance activities. Insurers and intermediaries collect and process vast amounts of personal data about their customers. At the same time they are dealing with a particular type of ‘discrimination’ among their insureds. Like all businesses operating in Europe, insurers are subject to European and national data protection laws and anti-discrimination rules. The fast technological evolution and globalization have triggered a comprehensive reform of the current data protection laws. The EU hopes to complete a new General Data Protection Regulation by the end of this year. Insurers are concerned that this new Regulation could introduce unintended consequences for the insurance industry.
  
	
  
	
  
	
  
	
  
	
11 http://www.insurancefraud.org/about-us.htm
  
 
	
  
4.2 Data processing
  
4.2.1 Legislation: an overview
  
Insurers collect and process data to analyse the risks that individuals wish to cover, to tailor products accordingly, to evaluate and pay claims and benefits, and to detect and prevent insurance fraud. The rise of Big Data presents opportunities to offer more creative, competitive pricing and, importantly, to predict customers’ behavioural activity. As insurers continue to explore this relatively untapped resource, evolutions in data processing legislation need to be followed very closely.
  	
  	
  
	
  
The protection of personal data was, as a separate right granted to an individual, guaranteed for the first time in the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108), adopted by the Council of Europe in 1981.
  
The current, principal EU legal instrument establishing rules for fair personal data processing is the Data Protection Directive (95/46/EC) of 1995, which regulates the protection of individuals with regard to the processing of personal data and the free movement of such data. As a framework law, the Directive had to be implemented in the EU Member States through national laws. This Directive has set a standard for the legal definition of personal data and for regulatory responses to the use of personal data. Its provisions include principles related to data quality, criteria for making data processing legitimate and the essential right not to be subject to automated individual decisions.
  
The Data Protection Directive was complemented by other legal instruments, such as the E-Privacy Directive (2002/58/EC), part of a package of five new Directives aiming to reform the legal and regulatory framework of electronic communications services in the EU. Personal data and individuals’ fundamental right to privacy need to be protected, but at the same time the legislator must take into account the legitimate interests of governments and businesses. One of the innovative provisions of this Directive was the introduction of a legal framework for the use of devices for storing or retrieving information, such as cookies. Companies must also inform customers of the data processing to which their data will be subject and obtain subscriber consent before using traffic data for marketing or before offering value-added services based on traffic or location data. The EU Cookie Directive (2009/136/EC), an amendment of the E-Privacy Directive, aims to increase consumer protection and requires websites to obtain informed consent from visitors before they store information on a computer or any web-connected device.
  
In 2006 the EU Data Retention Directive (2006/24/EC) was adopted as an anti-terrorism measure after the terrorist attacks in Madrid and London. However, on 8 April 2014 the European Court of Justice declared this Directive invalid. The Court took the view that the Directive does not meet the principle of proportionality and should have provided more safeguards to protect the fundamental rights to respect for private life and to the protection of personal data.
  
Belgium established a Privacy Act (or Data Protection Act) in 1992. Since the introduction of the EU Data Protection Directive (1995), the principles of that directive have been transposed into Belgian law. The Privacy Act consequently underwent significant changes introduced by the Act of 11 December 1998. Further modifications have been made since, including those of the Act of 26 February 2006. The Belgian Privacy Commission is part of a European task force which includes the data protection authorities of the Netherlands, Belgium, Germany, France and Spain. In October 2014 a new Privacy Bill was introduced in the Belgian Federal Parliament. The Bill mainly aims at providing the Belgian Data Protection Authority (DPA) with stronger enforcement capabilities and at ensuring that Belgian citizens regain control over their personal data. To achieve this, certain new measures, inspired by the proposed European data protection Regulation, are proposed for inclusion in the existing legislation adopted in 1992.
  
At this moment the current data processing legislation needs an urgent update. Rapid technological developments, the increasingly globalized nature of data flows and the arrival of cloud computing pose new challenges for data protection authorities. In order to ensure continuity of a high level of data protection, the rules need to be brought in line with technological developments. The Directive of 1995 has also not prevented fragmentation in the way data protection is implemented across the Union.
  
In 2012 the European Commission proposed a comprehensive, pan-European reform of the data protection rules to strengthen online privacy rights and boost Europe's digital economy. On 15 June 2015 the Council reached a ‘general approach’ on a General Data Protection Regulation (GDPR) that establishes rules adapted to the digital era. The European Commission is pushing for a complete agreement between the Council and the European Parliament before the end of this year. The twofold aim of the Regulation is to enhance the data protection rights of individuals and to improve business opportunities by facilitating the free flow of personal data in the digital single market. The Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals while allowing companies to preserve innovation and competitiveness. In parallel with the proposal for a GDPR, the Commission adopted a Directive on data processing for law enforcement purposes (5833/12).
  	
  
4.2.2 Some concerns of the insurance industry
  
The European insurance and reinsurance federation, Insurance Europe, is concerned that the proposed Regulation could introduce unintended consequences for the insurance industry and its policyholders. The new legislation must correctly balance an individual's right to privacy against the needs of businesses. The way insurers process data must be taken into account appropriately, so that they can perform their contractual obligations, assess consumers' needs and risks, innovate, and also combat fraud. There is also a clear tension between Big Data, the privacy of the insured's personal data and its availability to business and the State.
  
An important concern is that the proposed rules concerning profiling do not take into consideration the way insurance works. The Directive of 1995 contains rules on 'automated processing', but there is not a single mention of 'profiling' in the text. The new GDPR aims to provide more legal certainty and more protection for individuals with respect to data processing in the context of profiling. Insurers need to profile potential policyholders to measure risk; any restrictions on profiling could therefore translate not only into higher insurance prices and less insurance coverage, but also into an inability to provide consumers with appropriate insurance. Insurance Europe recommends that the new EU Regulation should allow insurance-related profiling at the pre-contractual stage and during the performance of the contract. There is also still some confusion in defining profiling: in the Council approach profiling means solely automated processing, while Article 20(5) as proposed by the European Parliament could, according to Insurance Europe, be interpreted as prohibiting fully automated processing, requiring human intervention for every single insurance contract offered to consumers.
  	
  
The proposal of the EU Council (June 2015) stipulates that the controller should use adequate mathematical or statistical procedures for the profiling. The controller must secure personal data in a way which takes account of the potential risks for the interests and rights of the data subject and which prevents, inter alia, discriminatory effects against individuals on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, genetic or health status, or sexual orientation, or measures having such an effect. Automated decision-making and profiling based on special categories of personal data should only be allowed under specific conditions.
  	
  
According to the Article 29 Working Party12, the Council's proposals regarding profiling are still unclear and do not foresee sufficient safeguards. In June 2015 it renewed its call for provisions giving the data subject a maximum of control and autonomy when personal data are processed for profiling. The provisions should clearly define the purposes for which profiles may be created and used, including specific obligations on controllers to inform the data subject, in particular on his or her right to object to the creation and the use of profiles. The academic research group IRISS remarks that the GDPR does not clarify whether or not there is an obligation on data controllers to disclose information about the algorithm involved in profiling practices, and suggests clarification on this point.
  
Insurance Europe also requests that the GDPR explicitly recognise insurers' need to process and share data for fraud prevention and detection. According to the Council and the Article 29 Working Party, fraud prevention may fall under the non-exhaustive list of 'legitimate interests' in Article 6(1)(f), which would provide the necessary legal basis to allow data processing for combating insurance fraud.
  
The new Regulation also proposes a new right to data portability, enabling easier transmission of personal data from one service provider to another. This would allow policyholders to obtain a copy of any of their data being processed by an insurer, and insurers could be forced to disclose confidential and
  
12 The Article 29 Working Party is an independent advisory body on data protection and privacy, set up under the Data Protection Directive of 1995. It is composed of representatives from the national data protection authorities of the EU Member States, the European Data Protection Supervisor and the European Commission.
  
IABE Big Data information paper - An actuarial perspective
IABE Big Data information paper - An actuarial perspective
IABE Big Data information paper - An actuarial perspective
IABE Big Data information paper - An actuarial perspective
IABE Big Data information paper - An actuarial perspective
IABE Big Data information paper - An actuarial perspective
IABE Big Data information paper - An actuarial perspective
IABE Big Data information paper - An actuarial perspective

Más contenido relacionado

La actualidad más candente

Big Data Information Architecture PowerPoint Presentation Slide
Big Data Information Architecture PowerPoint Presentation SlideBig Data Information Architecture PowerPoint Presentation Slide
Big Data Information Architecture PowerPoint Presentation SlideSlideTeam
 
ebook.driving decision-making, security
ebook.driving decision-making, securityebook.driving decision-making, security
ebook.driving decision-making, securityRoman Chanclor
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Geoffrey Fox
 
Intuit 2020 Report: The New Data Democracy
Intuit 2020 Report: The New Data DemocracyIntuit 2020 Report: The New Data Democracy
Intuit 2020 Report: The New Data DemocracyIntuit Inc.
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...IT Support Engineer
 
Data Science Courses - BigData VS Data Science
Data Science Courses - BigData VS Data ScienceData Science Courses - BigData VS Data Science
Data Science Courses - BigData VS Data ScienceDataMites
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)Shahbaz Anjam
 
Forecast of Big Data Trends
Forecast of Big Data TrendsForecast of Big Data Trends
Forecast of Big Data TrendsIMC Institute
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big DataeXascale Infolab
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data PresentationMatthew Urdan
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 
BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013Brian Crotty
 

La actualidad más candente (20)

Big Data
Big DataBig Data
Big Data
 
1
11
1
 
Big Data Information Architecture PowerPoint Presentation Slide
Big Data Information Architecture PowerPoint Presentation SlideBig Data Information Architecture PowerPoint Presentation Slide
Big Data Information Architecture PowerPoint Presentation Slide
 
ebook.driving decision-making, security
ebook.driving decision-making, securityebook.driving decision-making, security
ebook.driving decision-making, security
 
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
Big Data Applications & Analytics Motivation: Big Data and the Cloud; Centerp...
 
Intuit 2020 Report: The New Data Democracy
Intuit 2020 Report: The New Data DemocracyIntuit 2020 Report: The New Data Democracy
Intuit 2020 Report: The New Data Democracy
 
Applications of Big Data
Applications of Big DataApplications of Big Data
Applications of Big Data
 
Fraud and Risk in Big Data
Fraud and Risk in Big DataFraud and Risk in Big Data
Fraud and Risk in Big Data
 
Big data
Big dataBig data
Big data
 
Presentation on Big Data
Presentation on Big DataPresentation on Big Data
Presentation on Big Data
 
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...Nuestar "Big Data Cloud" Major Data Center Technology  nuestarmobilemarketing...
Nuestar "Big Data Cloud" Major Data Center Technology nuestarmobilemarketing...
 
Data Science Courses - BigData VS Data Science
Data Science Courses - BigData VS Data ScienceData Science Courses - BigData VS Data Science
Data Science Courses - BigData VS Data Science
 
Big data-ppt
Big data-pptBig data-ppt
Big data-ppt
 
Big data (word file)
Big data  (word file)Big data  (word file)
Big data (word file)
 
Forecast of Big Data Trends
Forecast of Big Data TrendsForecast of Big Data Trends
Forecast of Big Data Trends
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
 
Team 2 Big Data Presentation
Team 2 Big Data PresentationTeam 2 Big Data Presentation
Team 2 Big Data Presentation
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)
 
BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013
 
IT FUTURE- Big data
IT FUTURE- Big dataIT FUTURE- Big data
IT FUTURE- Big data
 

Similar a IABE Big Data information paper - An actuarial perspective

UNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdfUNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdfvvpadhu
 
Analysis on big data concepts and applications
Analysis on big data concepts and applicationsAnalysis on big data concepts and applications
Analysis on big data concepts and applicationsIJARIIT
 
Analysis of Big Data
Analysis of Big DataAnalysis of Big Data
Analysis of Big DataIRJET Journal
 
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Taniya Fansupkar
 
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxstilliegeorgiana
 
What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsShilpaKrishna6
 
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...IJERDJOURNAL
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAudrey Britton
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)Sonu Gupta
 
Big Data
Big DataBig Data
Big DataBBDO
 
141900791 big-data
141900791 big-data141900791 big-data
141900791 big-dataglittaz
 
Smart Data Module 6 d drive the future
Smart Data Module 6 d drive the futureSmart Data Module 6 d drive the future
Smart Data Module 6 d drive the futurecaniceconsulting
 
Idc big data whitepaper_final
Idc big data whitepaper_finalIdc big data whitepaper_final
Idc big data whitepaper_finalOsman Circi
 
Data foundation for analytics excellence
Data foundation for analytics excellenceData foundation for analytics excellence
Data foundation for analytics excellenceMudit Mangal
 

Similar a IABE Big Data information paper - An actuarial perspective (20)

UNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdfUNIT 1 -BIG DATA ANALYTICS Full.pdf
UNIT 1 -BIG DATA ANALYTICS Full.pdf
 
Analysis on big data concepts and applications
Analysis on big data concepts and applicationsAnalysis on big data concepts and applications
Analysis on big data concepts and applications
 
Business with Big data
Business with Big dataBusiness with Big data
Business with Big data
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
 
Unit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdfUnit-1 introduction to Big data.pdf
Unit-1 introduction to Big data.pdf
 
Analysis of Big Data
Analysis of Big DataAnalysis of Big Data
Analysis of Big Data
 
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
Big data document (basic concepts,3vs,Bigdata vs Smalldata,importance,storage...
 
130214 copy
130214   copy130214   copy
130214 copy
 
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docxProject 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
Project 3 – Hollywood and IT· Find 10 incidents of Hollywood p.docx
 
What is big data ? | Big Data Applications
What is big data ? | Big Data ApplicationsWhat is big data ? | Big Data Applications
What is big data ? | Big Data Applications
 
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
An Investigation on Scalable and Efficient Privacy Preserving Challenges for ...
 
Big data assignment
Big data assignmentBig data assignment
Big data assignment
 
An Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data AnalyticsAn Encyclopedic Overview Of Big Data Analytics
An Encyclopedic Overview Of Big Data Analytics
 
Big data - a review (2013 4)
Big data - a review (2013 4)Big data - a review (2013 4)
Big data - a review (2013 4)
 
Big Data
Big DataBig Data
Big Data
 
141900791 big-data
141900791 big-data141900791 big-data
141900791 big-data
 
Big Data: 8 facts and 8 fictions
Big Data: 8 facts and 8 fictionsBig Data: 8 facts and 8 fictions
Big Data: 8 facts and 8 fictions
 
Smart Data Module 6 d drive the future
Smart Data Module 6 d drive the futureSmart Data Module 6 d drive the future
Smart Data Module 6 d drive the future
 
Idc big data whitepaper_final
Idc big data whitepaper_finalIdc big data whitepaper_final
Idc big data whitepaper_final
 
Data foundation for analytics excellence
Data foundation for analytics excellenceData foundation for analytics excellence
Data foundation for analytics excellence
 

Más de Mateusz Maj

Meetup startup your insurance business
Meetup startup your insurance businessMeetup startup your insurance business
Meetup startup your insurance businessMateusz Maj
 
VivaDrive telematics for customer interaction
VivaDrive telematics for customer interactionVivaDrive telematics for customer interaction
VivaDrive telematics for customer interactionMateusz Maj
 
VivaDrive - UBI - report from the battlefield
VivaDrive - UBI - report from the battlefieldVivaDrive - UBI - report from the battlefield
VivaDrive - UBI - report from the battlefieldMateusz Maj
 
Will Usage Based Insurance (UBI) disrupt the insurance industry?
Will Usage Based Insurance (UBI) disrupt the insurance industry?Will Usage Based Insurance (UBI) disrupt the insurance industry?
Will Usage Based Insurance (UBI) disrupt the insurance industry?Mateusz Maj
 
Innovation in Insurance - necessity or luxury?
Innovation in Insurance - necessity or luxury?Innovation in Insurance - necessity or luxury?
Innovation in Insurance - necessity or luxury?Mateusz Maj
 
Road Vikings - join the driving revolution
Road Vikings - join the driving revolutionRoad Vikings - join the driving revolution
Road Vikings - join the driving revolutionMateusz Maj
 
Innovation and Big Data in Insurance
Innovation and Big Data in InsuranceInnovation and Big Data in Insurance
Innovation and Big Data in InsuranceMateusz Maj
 
Big Data Forum by Institute of Actuaries in Belgium (IABE)
Big Data Forum by Institute of Actuaries in Belgium (IABE)Big Data Forum by Institute of Actuaries in Belgium (IABE)
Big Data Forum by Institute of Actuaries in Belgium (IABE)Mateusz Maj
 
Digital Driving Pass - community-driven telematics
Digital Driving Pass - community-driven telematicsDigital Driving Pass - community-driven telematics
Digital Driving Pass - community-driven telematicsMateusz Maj
 
Big Data - an actuarial perspective
Big Data - an actuarial perspectiveBig Data - an actuarial perspective
Big Data - an actuarial perspectiveMateusz Maj
 

Más de Mateusz Maj (10)

Meetup startup your insurance business
Meetup startup your insurance businessMeetup startup your insurance business
Meetup startup your insurance business
 
VivaDrive telematics for customer interaction
VivaDrive telematics for customer interactionVivaDrive telematics for customer interaction
VivaDrive telematics for customer interaction
 
VivaDrive - UBI - report from the battlefield
VivaDrive - UBI - report from the battlefieldVivaDrive - UBI - report from the battlefield
VivaDrive - UBI - report from the battlefield
 
Will Usage Based Insurance (UBI) disrupt the insurance industry?
Will Usage Based Insurance (UBI) disrupt the insurance industry?Will Usage Based Insurance (UBI) disrupt the insurance industry?
Will Usage Based Insurance (UBI) disrupt the insurance industry?
 
Innovation in Insurance - necessity or luxury?
Innovation in Insurance - necessity or luxury?Innovation in Insurance - necessity or luxury?
Innovation in Insurance - necessity or luxury?
 
Road Vikings - join the driving revolution
Road Vikings - join the driving revolutionRoad Vikings - join the driving revolution
Road Vikings - join the driving revolution
 
Innovation and Big Data in Insurance
Innovation and Big Data in InsuranceInnovation and Big Data in Insurance
Innovation and Big Data in Insurance
 
Big Data Forum by Institute of Actuaries in Belgium (IABE)
Big Data Forum by Institute of Actuaries in Belgium (IABE)Big Data Forum by Institute of Actuaries in Belgium (IABE)
Big Data Forum by Institute of Actuaries in Belgium (IABE)
 
Digital Driving Pass - community-driven telematics
Digital Driving Pass - community-driven telematicsDigital Driving Pass - community-driven telematics
Digital Driving Pass - community-driven telematics
 
Big Data - an actuarial perspective
Big Data - an actuarial perspectiveBig Data - an actuarial perspective
Big Data - an actuarial perspective
 

Último

Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachBoston Institute of Analytics
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...gajnagarg
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...gajnagarg
 
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night StandCall Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Shivaji Nagar ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...
Just Call Vip call girls Mysore Escorts ☎️9352988975 Two shot with one girl (...gajnagarg
 
of data to achieve deeper insights into customers' behaviour or to set insurance premiums. Moreover, actuaries are the data scientists of insurance: they have the statistical training and analytical thinking needed to understand the complexity of data, combined with business insight. We look closely at the insurance value chain and assess the impact of Big Data on underwriting, pricing and claims reserving. We examine the ethics of Big Data, including data privacy, customer identification, data ownership and the legal aspects. We also discuss new frontiers for insurance and the impact of Big Data on the actuarial profession. Will actuaries be able to leverage Big Data, create sophisticated risk models and more personalized insurance offers, and bring a new wave of innovation to the market?

2 Introduction to Big Data

2.1 Introduction and characteristics
Big Data broadly refers to data sets so large and complex that they cannot be handled by traditional data processing software. It can be defined by the following attributes:
a. Volume: in 2012 it was estimated that 2.5 x 10^18 bytes of data were created worldwide every day - this is equivalent to a stack of books from the Sun to Pluto and back again. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, software logs and GPS signals from mobile devices, among others.
b. Variety and Variability: the challenges of Big Data do not only arise from the sheer volume of data but also from the fact that data is generated in multiple forms, as a mix of unstructured and structured data and as a mix of data at rest and data in motion (i.e. static and real-time data). Furthermore, the meaning of data can change over time or depend on the context. Structured data is organized in a way that both computers and humans can read, for example information stored in traditional databases. Unstructured data refers to data types such as images, audio, video, social media and other information that are not organized or easily interpreted by traditional databases. It includes data generated by machines such as sensors, web feeds, networks or service platforms.
c. Visualization: the insights gained by a company from analysing data must be shared in a way that is efficient and understandable to the company's stakeholders.
d. Velocity: data is created, saved, analysed and visualized at an increasing speed, making it possible to analyse and visualize high volumes of data in real time.
e. Veracity: it is essential that the data is accurate in order to generate value.
f. Value: the insights gleaned from Big Data can help organizations deepen customer engagement, optimize operations, prevent threats and fraud, and capitalize on new sources of revenue.

1 http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx
2.2 Big Data techniques and tools
The Big Data industry has been supported by the following technologies:
a. The Apache Hadoop software library was initially released in December 2011 and is an open source framework that allows for the distributed processing of large data sets across clusters of computers using simple algorithms. It is designed to scale up from one to thousands of machines, each one being a computational and storage unit. The software library is designed under the fundamental assumption that hardware failures are common: the library itself automatically detects and handles hardware failures in order to guarantee that the services provided by a computer cluster stay available even when the cluster is affected by hardware failures. A wide variety of companies and organizations use Hadoop for both research and production: web-based companies that own some of the world's biggest data warehouses (Amazon, Facebook, Google, Twitter, Yahoo!, ...), media groups and universities, among others. A list of Hadoop users and systems is available at http://wiki.apache.org/hadoop/PoweredBy.
b. Non-relational databases have existed since the late 1960s but resurfaced in 2009 (under the moniker of Not Only SQL - NoSQL) as it became clear that they are especially well suited to handle the Big Data challenges of volume and variety and that they neatly fit within the Apache Hadoop framework.
c. Cloud Computing is a kind of internet-based computing, where shared resources and information are provided to computers and other devices on demand (Wikipedia). A service provider offers computing resources for a fixed price, available online and in general with a high degree of flexibility and reliability. These technologies were created by major online actors (Amazon, Google), followed by other technology providers (IBM, Microsoft, RedHat). There is a wide variety of architectures - Public, Private and Hybrid Cloud - all with the objective of making computing infrastructure a commodity asset with the best quality/total cost of ownership ratio. Having a nearly infinite amount of computing power at hand with high flexibility is a key factor for the success of Big Data initiatives.
d. Mining Massive Datasets is a set of methods, algorithms and techniques that can be used to deal with Big Data problems, and in particular with volume, variety and velocity issues. PageRank can be seen as a major step (see http://infolab.stanford.edu/pub/papers/google.pdf) and its evolution to a Map-Reduce (https://en.wikipedia.org/wiki/MapReduce) approach is definitely a breakthrough; a minimal sketch of this programming model is given after this list. Social Network Analysis is becoming an area of research in itself that aims to extract useful information from the massive amount of data the Social Networks are providing. These methods are very well suited to run on software such as Hadoop in a Cloud Computing environment.
e. Social Networks are one source of Big Data that provides a stream of data with a huge value for almost all economic (and even non-economic) actors. For most companies, it is the very first time in history that they are capable of interacting directly with their customers. Many applications of Big Data make use of these data to provide enhanced services and products and to increase customer satisfaction.
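To make the Map-Reduce idea mentioned under d. more concrete, the toy sketch below counts words in a set of documents in plain Python. It only illustrates the programming model (map, shuffle, reduce) and is not actual Hadoop code; production Hadoop jobs are typically written in Java or through higher-level tools, and the document list here is invented.

from collections import defaultdict

def map_phase(document):
    # "Map" step: emit a (key, value) pair for every word in one input record.
    for word in document.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # "Reduce" step: aggregate all values emitted for the same key.
    return word, sum(counts)

documents = ["big data in insurance", "big data and actuaries"]

# "Shuffle" step: group the intermediate pairs by key. On a Hadoop cluster the map,
# shuffle and reduce steps are distributed over many machines; here everything runs
# in a single Python process.
groups = defaultdict(list)
for doc in documents:
    for word, count in map_phase(doc):
        groups[word].append(count)

word_counts = dict(reduce_phase(w, c) for w, c in groups.items())
print(word_counts)  # e.g. {'big': 2, 'data': 2, 'in': 1, 'insurance': 1, ...}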
2.3 Big Data Applications
Big Data has the potential to change the way academic institutions, corporations and organizations conduct business, and to change our daily life. Great examples of Big Data applications include:
a. Healthcare: Big Data technologies will have a major impact in healthcare. IBM estimates that 80% of medical data is unstructured and is clinically relevant. Furthermore, medical data resides in multiple places such as individual medical files, lab and imaging systems, physician notes and medical correspondence. Big Data technologies allow healthcare organizations to bring all the information about an individual together to get insights on how to manage care coordination, outcomes-based reimbursement models, patient engagement and outreach programs.
b. Retail: retailers can get insights for personalizing marketing and improving the effectiveness of marketing campaigns, for optimizing assortment and merchandising decisions, and for removing inefficiencies in distribution and operations. For instance, several retailers now incorporate Twitter streams into their analysis of loyalty-program data. The insights gained make it possible to plan for surges in demand for certain items and to create mobile marketing campaigns targeting specific customers with offers at the times of day they would be most receptive to them.2
c. Politics: Big Data technologies will improve efficiency and effectiveness across the broad range of government responsibilities. A great example of Big Data use in politics was the analytics- and metrics-driven 2012 presidential campaign of Barack Obama [1]. Other examples include:
i. Threat and crime prediction and prevention. For instance, the Detroit Crime Commission has turned to Big Data in its effort to assist the government and citizens of southeast Michigan in the prevention, investigation and prosecution of neighbourhood crime;3
ii. Detection of fraud, waste and errors in social programs;
iii. Detection of tax fraud and abuse.
d. Cyber risk prevention: companies can analyse data traffic in their computer networks in real time to detect anomalies that may indicate the early stages of a cyber attack. Research firm Gartner estimates that by 2016 more than 25% of global firms will adopt Big Data analytics for at least one security and fraud detection use case, up from 8% in 2014.4
e. Insurance fraud detection: insurance companies can determine a score for each claim in order to target for fraud investigation the claims with the highest scores, i.e. the ones that are most likely to be fraudulent. Fraud detection is treated in paragraph 3.4.
f. Usage-Based Insurance: an insurance scheme where car insurance premiums are calculated based on dynamic causal data, including actual usage and driving behaviour. Telematics data transmitted from a vehicle, combined with Big Data analytics, enables insurers to distinguish cautious drivers from aggressive drivers and to match the insurance rate with the actual risk incurred.

2 http://asmarterplanet.com/blog/2015/03/surprising-insights-ibmtwitter-alliance.html#more-33140
3 http://www.datameer.com/company/news/press-releases/detroit-crime-commission-combats-crime-with-datameer-big-data-analytics.html
4 http://www.gartner.com/newsroom/id/2663015

2.4 Data driven business
The quantity of data in the world is increasing steeply month after month. Some argue it is time to organize and use this information: data must now be viewed as a corporate asset. In order to respond to this transformation of business culture, two specific C-level roles have appeared in the past years in the banking and insurance industries.

2.4.1 The Chief Data Officer
The Chief Data Officer (abbreviated to CDO) is the first architect of this "data-driven business". Thanks to his role of coordinator, the CDO is in charge of the data that drive the company, by:
• defining and setting up a strategy to guarantee their quality, their reliability and their coherency;
• organizing and classifying them;
• making them accessible to the right person at the right moment, for the pertinent need and in the right format.
Thus, the Chief Data Officer needs a strong business background to understand how the business runs. The following question then emerges: to whom should the CDO report?
In some firms, the CDO is considered part of IT and reports to the CTO (Chief Technology Officer); in others, he holds more of a business role, reporting to the CEO. It is therefore up to the company to decide, as no two companies are exactly alike from a structural point of view.

Which companies already have a CDO? Generali Group appointed someone to this newly created position in June 2015. Other companies such as HSBC, Wells Fargo and QBE had already appointed a person to this position in 2013 or 2014. Even Barack Obama appointed a Chief Data Officer/Scientist during his 2012 campaign, and metrics-driven decision-making played a big role in Obama's re-election.
In the beginning, most of the professionals holding the actual job title "Chief Data Officer" were located in the United States. After a while, Europe followed the move. Also, many people did the job in their day-to-day work but did not necessarily hold the title. Many analysts in the financial sector believe that yet more insurance and banking companies will have to make the move in the coming years if they want to stay attractive.

2.4.2 The Chief Analytics Officer
Another C-level position has arisen in the past months: the Chief Analytics Officer (abbreviated to CAO). Are there differences between a CAO and a CDO? Theoretically a CDO focuses on tactical data management, while the CAO concentrates on the strategic deployment of analytics. The latter's focus is on data analysis to find hidden, but valuable, patterns. These result in operational decisions that make the company more competitive, more efficient and more attractive to its potential and current clients. Therefore, the CAO is a natural prolongation of the data-driven business: the more analytics are embedded in the organization, the more you need an executive-level person to manage that function and communicate the results in an understandable way. The CAO usually reports to the CEO.

In practice, some companies fold the CAO responsibilities into the CDO tasks, while others distinguish both positions. Currently, it is quite rare to find an explicit "Chief Analytics Officer" position in the banking and insurance sector, because of this overlap. But in other fields, the distinction is often made.

3 Big Data in insurance value chain
Big Data provides new insights from social networks, telematics sensors and other new information channels. As a result it allows insurers to understand customer preferences better, enables new business approaches and products, and enhances existing internal models, processes and services. With the rise of Big Data the insurance world could fundamentally change, and the entire insurance value chain could be impacted, from underwriting to claims management.

3.1 Insurance underwriting

3.1.1 Introduction
In traditional insurance underwriting and actuarial analyses, we have for years been observing a never-ending search for more meaningful insight into individual policyholder risk characteristics, to distinguish good risks from bad and to price each risk accurately. The analytics performed by actuaries, based on advanced mathematical and financial theories, have always been critically important to an insurer's profitability. Over the last decade, however, revolutionary advances in computing technology and the explosion of new digital data sources have expanded and reinvented the core disciplines of insurers. Today's advanced analytics in insurance go much further than traditional underwriting and actuarial science. Data mining and predictive modelling are today the way forward for insurers to improve pricing and segmentation and to increase profitability.
3.1.2 What is predictive modelling?
Predictive modelling can be defined as the analysis of large historical data sets to identify correlations and interactions, and the use of this knowledge to predict future events. For actuaries, the concepts of predictive modelling are not new to the profession. The use of mortality tables to price life insurance products is an example of predictive modelling: the Belgian MK, FK and MR, FR tables showed the relationship between death probability and the explanatory variables of age, sex and product type (in this case life insurance or annuity).

Predictive models have been around for a long time in sales and marketing environments, for example to predict the probability that a customer will buy a new product. Bringing together expertise from both the actuarial profession and marketing analytics can lead to new innovative initiatives where predictive models guide expert decisions in areas such as claims management, fraud detection and underwriting.

3.1.3 From small over medium to Big Data
Insurers collect a wealth of information on their customers. In the first place during the underwriting process, by asking about the claims history of a customer for car and home insurance, for example. Another source is the history of the relationship the customer has with the insurance company. While in the past the data was kept in silos by product, the key challenge now lies in gathering all this information into one place where the customer dimension is central.
This transversal approach to the database also reflects the recent evolution in marketing: going from the 4P's (product, price, place, promotion) to the 4C's5 (customer, costs, convenience, communication). On top of unleashing the value of internal data, new data sources are becoming available, for instance wearables and social networks, to name a few. Because Big Data can be overwhelming to start with, medium data should be considered first. In Belgium, the strong bancassurance tradition offers interesting opportunities to combine insurance and bank data to create powerful predictive models.

5 http://www.customfitonline.com/news/2012/10/19/4-cs-versus-the-4-ps-of-marketing/
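A minimal sketch of what "gathering all this information into one place where the customer dimension is central" can look like in practice is given below. The table and column names (customers.csv, motor_policies.csv, home_policies.csv, claims.csv, customer_id, and so on) are purely illustrative assumptions, and the join logic is deliberately simplified (at most one policy per customer per product).

import pandas as pd

# Hypothetical product silos and claim history, all keyed on a customer identifier.
customers = pd.read_csv("customers.csv")      # one row per customer
motor = pd.read_csv("motor_policies.csv")     # product silo 1
home = pd.read_csv("home_policies.csv")       # product silo 2
claims = pd.read_csv("claims.csv")            # claim history across products

# Aggregate the claim history per customer.
claim_stats = (claims.groupby("customer_id")
                     .agg(n_claims=("claim_id", "count"),
                          total_paid=("paid_amount", "sum"))
                     .reset_index())

# Build a single, customer-centric view by joining the silos on customer_id.
view_360 = (customers
            .merge(motor.add_prefix("motor_"),
                   left_on="customer_id", right_on="motor_customer_id", how="left")
            .merge(home.add_prefix("home_"),
                   left_on="customer_id", right_on="home_customer_id", how="left")
            .merge(claim_stats, on="customer_id", how="left"))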
3.1.4 Examples of predictive modelling for underwriting
1° Use the 360° view on the customer and predictive models to maximize profitability and gain more business.
By thoroughly analysing data from different sources and applying analytics to gain insight, insurance companies should strive to develop a comprehensive 360-degree customer view. The gains of this complete and accurate view of the customer are twofold:
• Maximizing the profitability of the current customer portfolio through:
o detecting cross-sell and up-sell opportunities;
o customer satisfaction and loyalty actions;
o effective targeting of products and services (e.g. customers that are most likely to be in good health or customers that are less likely to have a car accident).
• Acquiring more profitable new customers at a reduced marketing cost: modelling the existing customers will lead to useful information to focus marketing campaigns on the most interesting prospects.
By combining data mining and analytics, insurance companies can better understand which customers are most likely to buy, and discover who their most profitable customers are and how to attract or retain more of them. Another use case can be the evaluation of the underwriting process to improve the customer experience during on-boarding.

2° Predictive underwriting for life insurance6
Using predictive models, in theory it is possible to predict the death probability of a customer. However, the low frequency of life insurance claims presents a challenge to modellers. While for car insurance the probability of a customer having a claim can be around 10%, for life insurance it is around 0.1% for the first year. Not only does this mean that a significant in-force book is needed to have confidence in the results, but also that sufficient history should be present to be able to show mortality experience over time. For this reason, using the underwriting decision as the variable to predict is a more common choice. All life insurance companies hold historical data on medical underwriting decisions that can be leveraged to build predictive models that predict underwriting decisions. Depending on how the model is used, the outcome can be a reduction of costs for medical examinations, more customer-friendly processes that avoid asking numerous invasive personal questions, or a reduction in the time needed to assess the risks by automatically approving good risks and focusing underwriting efforts on more complex cases. For example, if the predictive model tells you that a new customer has a high degree of similarity to customers that passed the medical examination, the medical examination could be waived for this customer.

If this sounds scary to risk professionals, a softer approach can be tested first, for instance by improving marketing actions by targeting only those individuals that have a high likelihood of being in good health. This not only decreases the cost of the campaign, but also avoids the disappointment of a potential customer who is refused during the medical screening process.

6 Predictive modeling for life insurance, April 2010, Deloitte
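The sketch below illustrates the idea described in 2° above: fitting a model on historical medical underwriting decisions and using it to identify applications that could be fast-tracked. It assumes the pandas and scikit-learn libraries; the file names, feature names (assumed to be numerically encoded) and the 0.9 threshold are invented for illustration and do not come from the paper.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical history of applications with the past underwriting outcome:
# standard_acceptance = 1 if the applicant was accepted without extra requirements.
history = pd.read_csv("past_applications.csv")
features = ["age", "bmi", "smoker", "sum_assured"]   # illustrative, numerically encoded
X, y = history[features], history["standard_acceptance"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC on held-out data:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Applicants with a very high predicted probability of standard acceptance could be
# candidates for waiving the medical examination; all others follow the usual process.
new_apps = pd.read_csv("new_applications.csv")
fast_track = model.predict_proba(new_apps[features])[:, 1] > 0.9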
3.1.5 Challenges of predictive modelling in underwriting7
Predictive models can only be as good as the input used to calibrate them. The first challenge in every predictive modelling project is to collect relevant, high-quality data for which a history is present. As many insurers are currently replacing legacy systems to reduce maintenance costs, this can be at the expense of the history. Actuaries are uniquely placed to prevent the history being lost, since for adequate risk management a portfolio's history should be kept. The trend of moving all policies from several legacy systems into one modern policy administration system is an opportunity that must be seized so that data collection becomes easier in the future.

Once the necessary data are collected, some legal or compliance concerns need to be addressed, as there might be boundaries to using certain variables in the underwriting process. In Europe, if the model will influence the price of the insurance, gender is no longer allowed as an explanatory variable. And this is only one example. It is important that the purpose of the model and the possible inputs are discussed with the legal department prior to starting the modelling.

Once the model is built, it is important that the users realize that no model is perfect. This means that residual risks will be present, and this should be weighed against the gains that the use of the model can bring.

And finally, once a predictive model has been set up, a continuous reviewing cycle must be put in place that collects feedback from the underwriting and sales teams and collects data to improve and refine the model. Building a predictive model is a continuous improvement process, not a one-off project.

7 Predictive modelling in insurance: key issues to consider throughout the lifecycle of a model

3.2 Insurance pricing

3.2.1 Overview of existing pricing techniques
The first rate-making techniques were based on rudimentary methods such as univariate analysis and, later, iterative standardized univariate methods such as the minimum bias procedure. They look at how changes in one characteristic result in differences in loss frequency or severity.

Later on, insurance companies moved to multivariate methods. This move was associated with a further development of computing power and data capabilities. These techniques are now being adopted by more and more insurers and are becoming part of everyday business practices. Multivariate analytical techniques focus on individual-level data and take into account the effects (interactions) that many different characteristics of a risk have on one another. As explained in the previous section, many companies use predictive modelling (a form of multivariate analysis) to create measures of the likelihood that a customer will purchase a particular product. Banks use these tools to create measures (e.g. credit scores) of whether a client will be able to meet lending obligations for a loan or mortgage.
Similarly, P&C insurers can use predictive models to predict claim behaviour. Multivariate methods also provide valuable diagnostics that aid in understanding the certainty and reasonableness of results.

Generalized Linear Models (GLMs) are essentially a generalized form of linear models. This family encompasses normal error linear regression models and the nonlinear exponential, logistic and Poisson regression models, as well as many other models, such as log-linear models for categorical data. Generalized linear models have become the standard for classification rate-making in most developed insurance markets, particularly because of the benefit of transparency. Understanding the mathematical underpinnings is an important responsibility of the rate-making actuary who intends to use such a method. Linear models are a good place to start, as GLMs are essentially a generalized form of such a model. As with many techniques, visualizing the GLM results is an intuitive way to connect the theory with the practical use. GLMs do not stand alone as the only multivariate classification method: other methods such as CART, factor analysis and neural networks are often used to augment GLM analysis.
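As a minimal illustration of the GLM approach to classification rate-making described above, the sketch below fits a Poisson claim-frequency model with a log link, using exposure as an offset. It assumes the pandas and statsmodels libraries; the file name and rating factors (age_band, vehicle_group, region) are illustrative assumptions rather than a prescribed tariff structure.

import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical policy-level extract with claim counts, exposure and rating factors.
policies = pd.read_csv("motor_policies.csv")

freq_model = smf.glm(
    "claim_count ~ C(age_band) + C(vehicle_group) + C(region)",
    data=policies,
    family=sm.families.Poisson(),
    offset=np.log(policies["exposure"]),   # exposure measured in policy-years
).fit()

print(freq_model.summary())

# exp(coefficient) gives the multiplicative relativity of each rating factor level,
# which is one reason GLM-based rating plans are regarded as transparent.
relativities = np.exp(freq_model.params)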
In general, the data mining techniques listed above can enhance a rate-making exercise by:
• whittling down a long list of potential explanatory variables to a more manageable list for use within a GLM;
• providing guidance on how to categorize discrete variables;
• reducing the dimension of multi-level discrete variables (i.e. condensing 100 levels, many of which have few or no claims, into 20 homogeneous levels);
• identifying candidates for interaction variables within GLMs by detecting patterns of interdependency between variables.

3.2.2 Old versus new modelling techniques
The adoption of GLMs resulted in many companies seeking external data sources to augment what had already been collected and analysed about their own policies. This includes, but is not limited to, information about geo-demographics, sensor data, social media information, weather and property characteristics, and information about insured individuals or businesses. This additional data helps actuaries further improve the granularity and accuracy of classification rate-making. Unfortunately this new data is very often unstructured and massive, and hence the traditional generalized linear model (GLM) techniques become useless.

With so many unique new variables in play, it can become a very difficult task to identify and take advantage of the most meaningful correlations. In many cases, GLM techniques are simply unable to penetrate deeply into these giant stores. Even in the cases when they can, the time required to uncover the critical correlations tends to be onerous, requiring days, weeks and even months of analysis. Only with advanced techniques, and specifically machine learning, can companies generate predictive models that take advantage of all the data they are capturing.

Machine learning is the modern science of finding patterns and making predictions from data, building on work in multivariate statistics, data mining, pattern recognition and advanced/predictive analytics. Machine learning methods are particularly effective in situations where deep and predictive insights need to be uncovered from data sets that are large, diverse and fast changing, i.e. Big Data. Across these types of data, machine learning easily outperforms traditional methods on accuracy, scale and speed.
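To make the contrast with the GLM of the previous section concrete, the sketch below fits a gradient boosting model to the same kind of claim-frequency problem; such models can pick up non-linearities and interactions without them being specified in advance. It assumes scikit-learn and pandas; the file, the features (assumed numerically encoded) and the parameter choices are illustrative only and are not taken from the paper.

import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Same hypothetical policy-level extract as in the GLM sketch above.
policies = pd.read_csv("motor_policies.csv")
features = ["driver_age", "vehicle_age", "annual_km", "region_code"]  # numerically encoded
frequency = policies["claim_count"] / policies["exposure"]

X_train, X_test, y_train, y_test = train_test_split(
    policies[features], frequency, test_size=0.3, random_state=0)

# Poisson loss mirrors the frequency GLM; exposure enters as a case weight.
gbm = HistGradientBoostingRegressor(loss="poisson", max_iter=300, random_state=0)
gbm.fit(X_train, y_train, sample_weight=policies.loc[X_train.index, "exposure"])
print("Held-out R^2:", gbm.score(X_test, y_test))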
3.2.3 Personalized and Real-time pricing – Motor Insurance
In order to price risk more accurately, insurance companies are now combining analytical applications (e.g. behavioural models based on customer profile data) with a continuous stream of real-time data (e.g. satellite data, weather reports, vehicle sensors) to create a detailed and personalized assessment of risk.

Usage-based insurance (UBI) has been around for a while. It began with Pay-As-You-Drive programs that gave drivers discounts on their insurance premiums for driving under a set number of miles. These soon developed into Pay-How-You-Drive programs, which track driving habits and give discounts for 'safe' driving. UBI allows a firm to snap a picture of an individual's specific risk profile, based on that individual's actual driving habits. UBI condenses the period of time under inspection to a few months, guaranteeing a much more relevant pool of information. With all this data available, the pricing scheme for UBI deviates greatly from that of traditional auto insurance. Traditional auto insurance relies on actuarial studies of aggregated historical data to produce rating factors that include driving record, credit-based insurance score, personal characteristics (age, gender and marital status), vehicle type, living location, vehicle use, previous claims, liability limits and deductibles.

Policyholders tend to think of traditional auto insurance as a fixed cost, assessed annually and usually paid for in lump sums on an annual, semi-annual or quarterly basis. However, studies show that there is a strong correlation between claim and loss costs and mileage driven, particularly within existing price rating factors (such as class and territory). For this reason, many UBI programs seek to convert the fixed costs associated with mileage driven into variable costs that can be used in conjunction with other rating factors in the premium calculation. UBI has the advantage of utilizing individual and current driving behaviours, rather than relying on aggregated statistics and driving records that are based on past trends and events, making premium pricing more individualized and precise.
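Purely as a toy illustration of how such telematics indicators might feed into a usage-based premium, the sketch below combines a mileage-proportional component with a behavioural adjustment. The indicators, weights, bounds and the 60/40 split are invented for this illustration; a real UBI tariff would be calibrated on claims experience and would be subject to the pricing and legal constraints discussed elsewhere in this paper.

from dataclasses import dataclass

@dataclass
class MonthlyTelematicsSummary:
    km_driven: float
    night_km_share: float          # share of km driven between 23:00 and 05:00
    harsh_brakes_per_100km: float
    speeding_share: float          # share of km driven above the speed limit

def usage_based_premium(base_monthly_premium: float, t: MonthlyTelematicsSummary) -> float:
    # Fixed component plus a component proportional to mileage (per 1,000 km).
    premium = 0.6 * base_monthly_premium + 0.4 * base_monthly_premium * (t.km_driven / 1000.0)
    # Behavioural loadings for risky driving patterns (illustrative weights).
    behaviour_factor = (1.0
                        + 0.30 * t.night_km_share
                        + 0.02 * t.harsh_brakes_per_100km
                        + 0.25 * t.speeding_share)
    # Keep the adjustment within a corridor so a single month cannot move the premium too far.
    behaviour_factor = min(max(behaviour_factor, 0.9), 1.5)
    return round(premium * behaviour_factor, 2)

print(usage_based_premium(40.0, MonthlyTelematicsSummary(650, 0.05, 1.2, 0.03)))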
3.2.4 Advantages
UBI programs offer many advantages to insurers, consumers and society. Linking insurance premiums more closely to actual individual vehicle or fleet performance allows insurers to price premiums more accurately. This increases affordability for lower-risk drivers, many of whom are also lower-income drivers. It also gives consumers the ability to control their premium costs by encouraging them to reduce miles driven and adopt safer driving habits. The use of telematics helps insurers to more accurately estimate accident damages and reduce fraud, by enabling them to analyse the driving data (such as hard braking, speed and time) during an accident. This additional data can also be used by insurers to refine or differentiate UBI products.

3.2.5 Shortcomings/challenges

3.2.5.1 Organization and resources
Taking advantage of the potential of Big Data requires somewhat different approaches to organization, resources and technology. As with many new technologies that offer promise, there are challenges to successful implementation and the production of meaningful business results. The number one organizational challenge is determining the business value, with financing as a close second. Talent is the other big issue: identifying the business and technology experts inside the enterprise, recruiting new employees, training and mentoring individuals, and partnering with outside resources is clearly a critical success factor for Big Data. Implementing the new technology and organizing the data are listed as lesser challenges by insurers, although there are still areas that require attention.

3.2.5.2 Technology challenges
The biggest technology challenge in the Big Data world is framed in the context of the different Big Data "V" characteristics. These include the standard three V's of volume, velocity and variety, plus two more: veracity and value. The variety and veracity of the data present the biggest challenges. As insurers venture beyond the analysis of structured transaction data to incorporate external data and unstructured data of all sorts, the ability to combine the data and feed it into an analysis may be complicated. On the one hand, the variety expresses the promise of Big Data, but on the other hand, the technical challenges are significant. The veracity of the data is also deemed a challenge. It is true that some Big Data analyses do not require the data to be as cleaned and organized as in traditional approaches. However, the data must still reflect the underlying truth/reality of the domain.

3.2.5.3 Technology Approaches
Technology should not be the first focus area for evaluating the potential of Big Data in an organization. However, choosing the best technology platform for your organization and business problems does become an important consideration for success. Cloud computing will play a very important role in Big Data. Although there are challenges and new approaches required for Big Data, there is a growing body of experience, expertise and best practices to assist in successful Big Data implementations.

3.3 Insurance Reserving
Loss reserving is a classic actuarial problem, encountered extensively in motor, property and casualty as well as in health insurance. It is a consequence of the fact that insurers need to set reserves to cover future liabilities related to the book of contracts. In other words, the insurer has to hold funds aside to meet future liabilities attached to incurred claims.

In non-life insurance, most policies run for a period of 12 months. However, the claims payment process can take years or even decades. In particular, losses arising from casualty insurance can take a long time to settle, and even when the claims are acknowledged, it may take time to establish the extent of the claims settlement costs. A well-known and costly example is provided by the claims from asbestos liabilities. Thus it is not a surprise that the biggest item on the liabilities side of an insurer's balance sheet is often the provision of reserves for future claims payments. It is the job of the reserving actuary to predict, with maximum accuracy, the total amount necessary to pay the claims that the insurer has legally committed to cover.

Historically, reserving was based on deterministic calculations with pen and paper, combined with expert judgement. Since the 1980s, the arrival of personal computers and spreadsheet software packages induced a real change for reserving actuaries. The use of spreadsheets not only results in a gain of calculation time but also allows testing different scenarios and the sensitivity of the forecasts. The first simple models used by actuaries started to evolve towards more developed ideas through the evolution of IT resources. Moreover, recent changes in regulatory requirements, such as Solvency II in Europe, have shown the need for stochastic models and more precise statistical techniques.
3.3.1 Classical methods
There are many different frameworks and models used by reserving actuaries to compute the technical provisions, and it is not the goal of this paper to review them in an exhaustive way, but rather to show that they share the central notion of the triangle. A triangle is a way of presenting data in a triangular structure showing the development of claims over time for each origin period. An origin period can be the year the policy was written or earned, or the loss occurrence period.

After having used deterministic models, reserving generally switches to stochastic models. These models allow reserve risk to be quantified.

The use of models based on aggregated data used to be convenient in the past, when IT resources were limited, but is more and more questionable nowadays, when we have huge computational power at hand at an affordable price. Therefore there is a need to move to models that fully use the data available in insurers' data warehouses.
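As a minimal illustration of the triangle-based reasoning behind these classical methods, the sketch below applies a basic chain-ladder calculation to a small cumulative paid-claims triangle. The figures are invented, and the chain ladder is shown only as one representative of the macro-level family; the paper itself does not prescribe a particular method.

import numpy as np

# Cumulative paid claims; rows are origin years, columns are development years,
# and np.nan marks development periods that have not been observed yet.
triangle = np.array([
    [1000., 1800., 2100., 2200.],
    [1100., 2000., 2350., np.nan],
    [1250., 2150., np.nan, np.nan],
    [1400., np.nan, np.nan, np.nan],
])
n_origin, n_dev = triangle.shape

# Volume-weighted development factors f_k = sum_i C[i, k+1] / sum_i C[i, k],
# taken over the origin years for which both columns are observed.
factors = []
for k in range(n_dev - 1):
    observed = ~np.isnan(triangle[:, k + 1])
    factors.append(triangle[observed, k + 1].sum() / triangle[observed, k].sum())

# Complete the triangle by rolling each origin year forward with the factors.
completed = triangle.copy()
for i in range(n_origin):
    for k in range(n_dev - 1):
        if np.isnan(completed[i, k + 1]):
            completed[i, k + 1] = completed[i, k] * factors[k]

latest = np.array([row[~np.isnan(row)][-1] for row in triangle])
reserve = completed[:, -1] - latest
print("Development factors:", np.round(factors, 3))
print("Estimated reserve by origin year:", np.round(reserve, 0))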
3.3.2 Micro-level reserving methods
Unlike aggregate models (or macro-level models), micro-level reserving methods (also called individual claim level models) use individual claims data as inputs and estimate outstanding liabilities for each individual claim. Unlike the models described in the previous section, they model very precisely the lifetime development process of each individual claim, including events such as claim occurrence, reporting, payments and settlement. Moreover, they can include micro-level covariates such as information about the policy, the policyholder, the claim, the claimant and the transactions.

When well specified, such models are expected to generate reliable reserve estimates. Indeed, the ability to model the claims development at the individual level and to incorporate micro-level covariate information allows micro-level models to handle heterogeneities in claims data efficiently. Moreover, the large amount of data used in modelling can help to avoid issues of over-parameterization and lack of robustness. As a consequence, micro-level models are especially valuable under changing environments, as these changes can be indicated by appropriate covariates.

3.4 Claims Management
Big Data can play a tremendous role in the improvement of claims management. It provides access to data that was not available before, and makes claims processing faster. It thereby enables improved risk management, reduces loss adjustment expenses and enhances the quality of service, resulting in increased customer retention. Below we present details of how Big Data analytics improves the fraud detection process.

3.4.1 Fraud detection
It is estimated that a typical organization loses 5% of its revenues to fraud each year8. The total cost of insurance fraud (non-health insurance) in the US is estimated to be more than $40 billion per year9. The advent of Big Data & Analytics has provided new and powerful tools to fight fraud.

3.4.2 What are the current challenges in fraud detection?
The first challenge is finding the right data. Analytical models need data, and in a fraud detection setting this is not always evident. Collected fraud data are often very skewed, with typically less than 1% fraudsters, which seriously complicates the detection task. Also, the asymmetric costs of missing fraud versus harassing non-fraudulent customers represent important modelling difficulties. Furthermore, fraudsters constantly try to outperform the analytical models, so these models should be permanently monitored and re-configured on an ongoing basis.

3.4.3 What analytical approaches are being used to tackle fraud?
Most of the fraud detection models in use nowadays are expert-based models. When data becomes available, one can start doing analytics. A first approach is supervised learning, which analyses a labelled data set of historically observed fraud behaviour. It can be used both to predict fraud and to predict the amount thereof. Unsupervised learning starts from an unlabelled data set and performs anomaly detection. Finally, social network learning analyses fraud behaviour in networks of linked entities. Throughout our research, it has been found that this last approach is superior to the others.

8 www.acfe.com
9 www.fbi.gov
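As a small illustration of the unsupervised, anomaly-detection route mentioned above, the sketch below scores claims with an isolation forest and routes the most unusual ones to investigators. It assumes scikit-learn and pandas; the file, the feature names and the 2% contamination level are illustrative assumptions, not figures from the paper.

import pandas as pd
from sklearn.ensemble import IsolationForest

claims = pd.read_csv("claims.csv")                        # hypothetical extract
features = ["claim_amount", "days_since_policy_start",
            "prior_claims", "report_delay_days"]          # illustrative, numeric variables

model = IsolationForest(contamination=0.02, random_state=0)
model.fit(claims[features])

# Lower scores are more anomalous; send the most anomalous claims to investigators
# rather than rejecting them outright.
claims["anomaly_score"] = model.score_samples(claims[features])
to_investigate = claims.sort_values("anomaly_score").head(50)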
3.4.4 What are the key characteristics of successful analytical models for fraud detection?
Successful fraud analytical models should satisfy various requirements. First, they should achieve good statistical performance in terms of recall or hit rate, which is the percentage of fraudsters labelled by the analytical model as suspicious, and precision, which is the percentage of fraudsters amongst the ones labelled as suspicious. Next, the analytical models should not be based on complex mathematical formulas (such as neural networks, support vector machines, ...) but should provide clear insight into the fraud mechanisms adopted. This is particularly important since the insights gained will be used to develop new fraud prevention strategies. Also, the operational efficiency of the fraud analytical model needs to be evaluated. This refers to the amount of resources needed to calculate the fraud score and to act upon it adequately. For example, in a credit card fraud environment, a decision needs to be made within a few seconds after the transaction is initiated.

3.4.5 Use of social network analytics to detect fraud10
Research has shown that network models significantly outperform non-network models in terms of accuracy, precision and recall. Network analytics can help improve fraud detection techniques. Fraud is present in many critical human processes such as credit card transactions, insurance claim fraud, opinion fraud and social security fraud. Fraud can be defined by the following five characteristics: fraud is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime, which appears in many types and forms. Before applying fraud detection techniques, these five issues should be resolved or counterbalanced.

Fraud is an uncommon crime, which means that the class distribution is extremely skewed. Rebalancing techniques such as SMOTE could be used to counterbalance this effect, by oversampling the minority class (duplicating fraud cases or creating artificial fraud cases) and possibly undersampling the majority class (reducing the number of legitimate cases).

10 Based on the research of Véronique Van Vlasselaer (KU Leuven)
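A minimal sketch of this rebalancing step before supervised learning is shown below. It assumes the imbalanced-learn and scikit-learn packages (an assumption of this illustration; the paper does not name a specific tool), and the file name, label column and 10% target ratio are invented.

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import RandomForestClassifier

# Hypothetical labelled claims data (numeric features) with roughly 1% confirmed fraud.
data = pd.read_csv("labelled_claims.csv")
X = data.drop(columns=["is_fraud"])
y = data["is_fraud"]

# Create synthetic fraud cases until fraud makes up about 10% of the training data.
X_res, y_res = SMOTE(sampling_strategy=0.1, random_state=0).fit_resample(X, y)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_res, y_res)
# Evaluation should still be done on held-out data with the original, skewed class
# distribution, reporting the recall (hit rate) and precision discussed in 3.4.4.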
Complex fraud structures are well-considered; this implies that there will be changes in behaviour over time, so not every time period has the same importance. A temporal weighting adjustment should put the emphasis on the more important periods (more recent data periods) that could be explanatory of the fraudulent behaviour.

Fraud is imperceptibly concealed, meaning that it is difficult to identify. One can leverage expert knowledge to create features that help identify fraud.

Fraud is time-evolving. The period of study should be selected carefully, taking into consideration that fraud evolves over time. How much of previous time periods could explain or affect the present? The model should incorporate these changes over time. Another question to raise is in what time window the model should be able to detect fraud: short, medium or long term.

The last characteristic of fraud is that it is most of the time carefully organized. Fraud is often not an individual phenomenon; in fact there are many interactions between fraudsters, and fraud sub-networks often develop within a bigger network. Social network analysis can be used to detect these networks.

Social network analysis helps to derive useful patterns and insights by exploiting the relational structure between objects. A network consists of two sets of elements: the objects of the network, which are called nodes, and the relationships between nodes, which are called links. The links connect two or more nodes. A weight can be assigned to the nodes and links to measure the magnitude of the crime or the intensity of the relationship. When constructing such networks, the focus is put on the neighbourhood of a node, which is a subgraph of the network around the node of interest (the fraudster).

Once a network has been constructed, how can this network be used as an indicator of fraudulent activities? Fraud can be detected by answering the following question: does the network contain statistically significant patterns of homophily? Detection of fraud relies on a concept often used in sociology, called homophily.
Given a homophilic network with evidence of fraud clusters, it is possible to extract features from the network around the node(s) of interest (the fraudulent activity), also called the neighbourhood of the node. This is the featurization process: extracting features for each network object based on its neighbourhood. The focus is on the first-order neighbourhood (first-degree links), also known as the "egonet": the ego is the node of interest, surrounded by its direct associates, known as alters. Feature extraction happens at two levels: egonet-generic features (how many fraudulent resources are associated with the company, are there relationships between the resources, ...) and alter-specific features (how similar is the alter to the ego, is the alter involved in many fraud cases or not).

Once these first-order neighbourhood features, such as the number of fraudulent resources and the weight of those fraudulent resources, have been extracted for each subject of interest (for example, companies), it is straightforward to derive how these fraudulent influences propagate through the network.

To conclude, network models consistently outperform non-network models because they are better able to distinguish fraudsters from non-fraudsters. They also produce shorter, more precise lists of high-risk companies and detect more fraudulent corporates.
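A minimal sketch of this featurization step is given below; it assumes the networkx library, and the node names and attribute names ("fraud", "weight") are illustrative, not a prescribed schema.

```python
import networkx as nx

def egonet_features(G, ego):
    """Simple first-order (egonet) features for a node of interest.

    Assumes each node carries a boolean 'fraud' attribute and each link a
    numeric 'weight'; both names are illustrative assumptions.
    """
    alters = list(G.neighbors(ego))
    fraud_alters = [a for a in alters if G.nodes[a].get("fraud", False)]
    return {
        "degree": len(alters),
        "n_fraud_alters": len(fraud_alters),
        "share_fraud_alters": len(fraud_alters) / len(alters) if alters else 0.0,
        "fraud_weight": sum(G[ego][a].get("weight", 1.0) for a in fraud_alters),
    }

# Toy egonet: a company linked to resources, some already known as fraudulent.
G = nx.Graph()
G.add_node("company_x", fraud=False)
for resource, is_fraud, w in [("address_1", True, 2.0),
                              ("phone_1", False, 1.0),
                              ("garage_1", True, 3.0)]:
    G.add_node(resource, fraud=is_fraud)
    G.add_edge("company_x", resource, weight=w)

print(egonet_features(G, "company_x"))
# -> degree 3, two fraudulent alters (share ~0.67), fraudulent weight 5.0
```

Feature vectors of this kind, computed for every subject of interest, can then be fed into the analytical models discussed in section 3.4.4.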
3.4.6 Fraud detection in motor insurance – Usage-Based Insurance example

In 2014, the Coalition Against Insurance Fraud11, with the assistance of business analytics company SAS, published a report stressing that technology plays a growing role in fighting fraud. "Insurers are investing in different technologies to combat fraud, but a common component to all these solutions is data," said Stuart Rose, Global Insurance Marketing Principal at SAS. "The ability to aggregate and easily visualize data is essential to identify specific fraud patterns." "Technology is playing a larger and more trusted role with insurers in countering growing fraud threats. Software tools provide the efficiency insurers need to thwart more scams and impose downward pressure on premiums for policyholders," said Dennis Jay, the Coalition's executive director.

11 http://www.insurancefraud.org/about-us.htm

In motor insurance, a good example is Usage-Based Insurance (UBI), where insurers can benefit from the superior fraud detection that telematics can provide. Telematics equips an insurer with driving behaviour and driving exposure patterns, including information about speeding, driving dynamics, trips, day and night driving patterns, garaging address and mileage. In some sense UBI can become a "lie detector" that helps companies detect falsification of the garaging address, the annual mileage or the declared driving behaviour. By recording the vehicle's geographical location and detecting sharp braking and harsh acceleration during an accident, an insurer can analyse accident details and estimate accident damages. The telematics devices used in UBI can also support first notice of loss (FNOL) services, providing very valuable information for insurers. Analytics performed on these data provide additional evidence to consider when investigating a claim and can help to reduce fraud and claims disputes.
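As a simple illustration of how telematics data could flag a possible misdeclaration, the sketch below annualises the recorded mileage and compares it with the declared figure; the function name, the field names, the 25% tolerance and the figures are hypothetical, for illustration only, and do not reflect any actual insurer rule.

```python
from datetime import date

def mileage_red_flag(declared_annual_km: float, telematics_km: float,
                     period_start: date, period_end: date,
                     tolerance: float = 0.25) -> bool:
    """Flag a policy when the annualised telematics mileage exceeds the
    declared annual mileage by more than the tolerance (illustrative rule)."""
    days = (period_end - period_start).days
    if days <= 0:
        raise ValueError("period_end must be after period_start")
    annualised_km = telematics_km * 365.0 / days
    return annualised_km > declared_annual_km * (1.0 + tolerance)

# Declared 10,000 km per year, but the device recorded 4,000 km in 90 days
# (roughly 16,200 km on an annual basis) -> flagged for further review.
print(mileage_red_flag(10_000, 4_000, date(2015, 1, 1), date(2015, 4, 1)))
```

A flag of this kind would not prove fraud on its own, but it gives the claims handler an additional, data-based reason to investigate further.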
4 Legal aspects of Big Data

4.1 Introduction

Data processing lies at the very heart of insurance activities. Insurers and intermediaries collect and process vast amounts of personal data about their customers. At the same time they apply a particular type of 'discrimination' among their insureds. Like all businesses operating in Europe, insurers are subject to European and national data protection laws and anti-discrimination rules. Fast technological evolution and globalisation have triggered a comprehensive reform of the current data protection laws. The EU hopes to complete a new General Data Protection Regulation by the end of this year. Insurers are concerned that this new Regulation could introduce unintended consequences for the insurance industry.

4.2 Data processing

4.2.1 Legislation: an overview

Insurers collect and process data to analyse the risks that individuals wish to cover, to tailor products accordingly, to evaluate and pay claims and benefits, and to detect and prevent insurance fraud. The rise of Big Data presents opportunities to offer more creative, competitive pricing and, importantly, to predict customers' behaviour. As insurers continue to explore this relatively untapped resource, evolutions in data processing legislation need to be followed very closely.

The protection of personal data was first guaranteed as a separate right granted to the individual in the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108), adopted by the Council of Europe in 1981.

The current, principal EU legal instrument establishing rules for fair personal data processing is the Data Protection Directive (95/46/EC) of 1995, which regulates the protection of individuals with regard to the processing of personal data and the free movement of such data. As a framework law, the Directive had to be implemented in the EU Member States through national laws. This Directive has set a standard for the legal definition of personal data and for regulatory responses to the use of personal data. Its provisions include principles related to data quality, criteria for making data processing legitimate and the essential right not to be subject to automated individual decisions.

The Data Protection Directive was complemented by other legal instruments, such as the E-Privacy Directive (2002/58/EC), part of a package of five new Directives aiming to reform the legal and regulatory framework for electronic communications services in the EU. Personal data and individuals' fundamental right to privacy need to be protected, but at the same time the legislator must take into account the legitimate interests of governments and businesses. One of the innovative provisions of this Directive was the introduction of a legal framework for the use of devices that store or retrieve information, such as cookies. Companies must also inform customers of the data processing to which their data will be subject and obtain subscriber consent before using traffic data for marketing or before offering value-added services based on traffic or location data. The EU Cookie Directive (2009/136/EC), an amendment of the E-Privacy Directive, aims to increase consumer protection and requires websites to obtain informed consent from visitors before storing information on a computer or any web-connected device.

In 2006 the EU Data Retention Directive (2006/24/EC) was adopted as an anti-terrorism measure after the terrorist attacks in Madrid and London. However, on 8 April 2014 the European Court of Justice declared this Directive invalid.
The Court took the view that the Directive does not meet the principle of proportionality and should have provided more safeguards to protect the fundamental rights to respect for private life and to the protection of personal data.

Belgium established a Privacy Act (Data Protection Act) in 1992. Since the introduction of the EU Data Protection Directive (1995), the principles of that Directive have been transposed into Belgian law. The Privacy Act consequently underwent significant changes introduced by the Act of 11 December 1998. Further modifications have been made since, including those of the Act of 26 February 2006. The Belgian Privacy Commission is part of a European task force that includes data protection authorities from the Netherlands, Belgium, Germany, France and Spain. In October 2014, a new Privacy Bill was introduced in the Belgian Federal Parliament. The Bill mainly aims at providing the Belgian Data Protection Authority (DPA) with stronger enforcement capabilities and at ensuring that Belgian citizens regain control over their personal data. To achieve this, certain new measures, inspired by the proposed European data protection Regulation, are proposed for inclusion in the existing legislation adopted in 1992.

The current data processing legislation needs an urgent update. Rapid technological developments, the increasingly globalised nature of data flows and the arrival of cloud computing pose new challenges for data protection authorities. In order to ensure continuity of a high level of data protection, the rules need to be brought in line with technological developments. The Directive of 1995 has also not prevented fragmentation in the way data protection is implemented across the Union.

In 2012 the European Commission proposed a comprehensive, pan-European reform of the data protection rules to strengthen online privacy rights and boost Europe's digital economy. On 15 June 2015, the Council reached a 'general approach' on a General Data Protection Regulation (GDPR) that establishes rules adapted to the digital era.
The European Commission is pushing for a complete agreement between the Council and the European Parliament before the end of this year. The twofold aim of the Regulation is to enhance the data protection rights of individuals and to improve business opportunities by facilitating the free flow of personal data in the digital single market. The Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals while allowing companies to preserve innovation and competitiveness. In parallel with the proposal for a GDPR, the Commission adopted a Directive on data processing for law enforcement purposes (5833/12).

4.2.2 Some concerns of the insurance industry

The European insurance and reinsurance federation, Insurance Europe, is concerned that the proposed Regulation could introduce unintended consequences for the insurance industry and its policyholders. The new legislation must correctly balance an individual's right to privacy against the needs of businesses. The way insurers process data must be taken into account appropriately so that they can perform their contractual obligations, assess consumers' needs and risks, innovate, and combat fraud. There is also a clear tension between Big Data, the privacy of the insured's personal data and its availability to business and the State.

An important concern is that the proposed rules on profiling do not take into consideration the way insurance works. The Directive of 1995 contains rules on 'automated processing', but there is not a single mention of 'profiling' in the text. The new GDPR aims to provide more legal certainty and more protection for individuals with respect to data processing in the context of profiling. Insurers need to profile potential policyholders to measure risk; any restrictions on profiling could therefore translate not only into higher insurance prices and less insurance coverage, but also into an inability to provide consumers with appropriate insurance. Insurance Europe recommends that the new EU Regulation should allow insurance-related profiling at the pre-contractual stage and during the performance of the contract. There is also still some confusion in defining profiling: in the Council approach, profiling means solely automated processing, while Article 20(5) as proposed by the European Parliament could, according to Insurance Europe, be interpreted as prohibiting fully automated processing, requiring human intervention for every single insurance contract offered to consumers.

The proposal of the EU Council (June 2015) stipulates that the controller should use adequate mathematical or statistical procedures for the profiling.
The controller must secure personal data in a way that takes account of the potential risks involved for the interests and rights of the data subject and that prevents, inter alia, discriminatory effects against individuals on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, genetic or health status, or sexual orientation, or measures having such an effect. Automated decision-making and profiling based on special categories of personal data should only be allowed under specific conditions.

According to the Article 29 Working Party12, the Council's proposals regarding profiling remain unclear and do not foresee sufficient safeguards. In June 2015 it renewed its call for provisions giving the data subject a maximum of control and autonomy when personal data are processed for profiling. The provisions should clearly define the purposes for which profiles may be created and used, including specific obligations on controllers to inform the data subject, in particular of his or her right to object to the creation and use of profiles. The academic research group IRISS remarks that the GDPR does not clarify whether or not data controllers are obliged to disclose information about the algorithm involved in profiling practices, and suggests clarification on this point.

Insurance Europe also requests that the GDPR explicitly recognise insurers' need to process and share data for fraud prevention and detection. According to the Council and the Article 29 Working Party, fraud prevention may fall under the non-exhaustive list of 'legitimate interests' in Article 6(1)(f), which would provide the necessary legal basis for processing aimed at combatting insurance fraud.

The new Regulation also proposes a new right to data portability, enabling easier transmission of personal data from one service provider to another. This would allow policyholders to obtain a copy of any of their data being processed by an insurer, and insurers could be forced to disclose confidential and

12 The Article 29 Working Party is an independent advisory body on data protection and privacy, set up under the Data Protection Directive of 1995. It is composed of representatives of the national data protection authorities of the EU Member States, the European Data Protection Supervisor and the European Commission.