IABE Big Data information paper - An actuarial perspective

We look closely at the insurance value chain and assess the impact of Big Data on underwriting, pricing and claims reserving. We examine the ethics of Big Data, including data privacy, customer identification, data ownership and the legal aspects. We also discuss new frontiers for insurance and the impact of Big Data on the actuarial profession. Will actuaries be able to leverage Big Data, create sophisticated risk models and more personalized insurance offers, and bring a new wave of innovation to the market?


BIG DATA: An actuarial perspective

Information Paper

November 2015
Table of Contents

1 INTRODUCTION
2 INTRODUCTION TO BIG DATA
2.1 INTRODUCTION AND CHARACTERISTICS
2.2 BIG DATA TECHNIQUES AND TOOLS
2.3 BIG DATA APPLICATIONS
2.4 DATA DRIVEN BUSINESS
3 BIG DATA IN THE INSURANCE VALUE CHAIN
3.1 INSURANCE UNDERWRITING
3.2 INSURANCE PRICING
3.3 INSURANCE RESERVING
3.4 CLAIMS MANAGEMENT
4 LEGAL ASPECTS OF BIG DATA
4.1 INTRODUCTION
4.2 DATA PROCESSING
4.3 DISCRIMINATION
5 NEW FRONTIERS
5.1 RISK POOLING VS. PERSONALIZATION
5.2 PERSONALISED PREMIUM
5.3 FROM INSURANCE TO PREVENTION
5.4 THE ALL-SEEING INSURER
5.5 CHANGE IN INSURANCE BUSINESS
6 ACTUARIAL SCIENCES AND THE ROLE OF ACTUARIES
6.1 WHAT IS BIG DATA BRINGING FOR THE ACTUARY?
6.2 WHAT IS THE ACTUARY BRINGING TO BIG DATA?
7 CONCLUSIONS
8 REFERENCES
1 Introduction

The Internet started in 1984, linking 1,000 university and corporate labs. By 1998 it had grown to 50 million users, and in 2015 it reached 3.2 billion people (44% of the global population). This enormous user growth was combined with an explosion of the data that we all produce. Every day we create around 2.5 quintillion bytes of data, information coming from various sources including social media sites, gadgets, smartphones, intelligent homes and cars, and industrial sensors, to name a few. Any company that can combine various datasets and apply effective data analytics will be able to become more profitable and successful. According to a recent report¹, 400 large companies that adopted Big Data analytics "have gained a significant lead over the rest of the corporate world." Big Data offers big business gains, but it also has hidden costs and complexity that companies will have to struggle with. Semi-structured and unstructured Big Data require new skills, and there is a shortage of people who have mastered data science and can handle mathematics, statistics and programming while also possessing substantive domain knowledge.

What will be the impact on the insurance sector and the actuarial profession? The concepts of Big Data and predictive modelling are not new to insurers, who have already been storing and analysing large quantities of data to achieve deeper insights into customers' behaviour or to set insurance premiums. Moreover, actuaries are the data scientists of insurance: they have the statistical training and analytical thinking needed to understand the complexity of data, combined with business insight. We look closely at the insurance value chain and assess the impact of Big Data on underwriting, pricing and claims reserving. We examine the ethics of Big Data, including data privacy, customer identification, data ownership and the legal aspects. We also discuss new frontiers for insurance and the impact on the actuarial profession. Will actuaries be able to leverage Big Data, create sophisticated risk models and more personalized insurance offers, and bring a new wave of innovation to the market?

2 Introduction to Big Data

2.1 Introduction and characteristics

Big Data broadly refers to data sets so large and complex that they cannot be handled by traditional data processing software. It can be defined by the following attributes:

a. Volume: in 2012 it was estimated that 2.5 × 10¹⁸ bytes of data were created worldwide every day; this is equivalent to a stack of books from the Sun to Pluto and back again. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, software logs, GPS signals from mobile devices, among others.
b. Variety and Variability: the challenges of Big Data arise not only from the sheer volume of data but also from the fact that data is generated in multiple forms, as a mix of unstructured and structured data, and as a mix of data at rest and data in motion (i.e. static and real-time data). Furthermore, the meaning of data can change over time or depend on the context. Structured data is organized in a way that both computers and humans can read, for example information stored in traditional databases. Unstructured data refers to data types such as images, audio, video, social media and other information that is not organized or easily interpreted by traditional databases. It includes data generated by machines such as sensors, web feeds, networks or service platforms.
c. Visualization: the insights a company gains from analysing data must be shared in a way that is efficient and understandable to the company's stakeholders.
d. Velocity: data is created, saved, analysed and visualized at an increasing speed, making it possible to analyse and visualize high volumes of data in real time.
e. Veracity: it is essential that the data is accurate in order to generate value.
f. Value: the insights gleaned from Big Data can help organizations deepen customer engagement, optimize operations, prevent threats and fraud, and capitalize on new sources of revenue.

¹ http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx
2.2 Big Data techniques and tools

The Big Data industry has been supported by the following technologies:

a. The Apache Hadoop software library was initially released in December 2011 and is an open source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from one to thousands of machines, each one being a computational and storage unit. The software library is designed under the fundamental assumption that hardware failures are common: the library itself automatically detects and handles hardware failures in order to guarantee that the services provided by a computer cluster stay available even when the cluster is affected by hardware failures. A wide variety of companies and organizations use Hadoop for both research and production: web-based companies that own some of the world's biggest data warehouses (Amazon, Facebook, Google, Twitter, Yahoo!, ...), media groups and universities, among others. A list of Hadoop users and systems is available at http://wiki.apache.org/hadoop/PoweredBy.
b. Non-relational databases have existed since the late 1960s but resurfaced in 2009 (under the moniker NoSQL, "Not Only SQL") as it became clear that they are especially well suited to handle the Big Data challenges of volume and variety, and that they fit neatly within the Apache Hadoop framework.
c. Cloud Computing is a kind of internet-based computing, where shared resources and information are provided to computers and other devices on demand (Wikipedia). A service provider offers computing resources for a fixed price, available online and in general with a high degree of flexibility and reliability. These technologies have been created by major online actors (Amazon, Google), followed by other technology providers (IBM, Microsoft, RedHat). There is a wide variety of architectures (Public, Private and Hybrid Cloud), all with the objective of making computing infrastructure a commodity asset with the best quality/total cost of ownership ratio. Having a nearly infinite amount of computing power at hand with high flexibility is a key factor for the success of Big Data initiatives.
d. Mining Massive Datasets is a set of methods, algorithms and techniques that can be used to deal with Big Data problems, in particular with volume, variety and velocity issues. PageRank can be seen as a major step (see http://infolab.stanford.edu/pub/papers/google.pdf) and its evolution towards a Map-Reduce approach (https://en.wikipedia.org/wiki/MapReduce) is definitely a breakthrough; a small illustration of this programming model follows at the end of this list. Social Network Analysis is becoming an area of research in itself that aims to extract useful information from the massive amount of data the social networks are providing. These methods are very well suited to run on software such as Hadoop in a Cloud Computing environment.
e. Social Networks are a source of Big Data that provides a stream of data with huge value for almost all economic (and even non-economic) actors. For most companies, it is the very first time in history that they are capable of interacting directly with their customers. Many applications of Big Data make use of these data to provide enhanced services and products and to increase customer satisfaction.
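As a small, self-contained illustration of the Map-Reduce programming model mentioned under point d (and not of Hadoop's actual Java API), the sketch below counts words by emulating the map, shuffle and reduce phases in plain Python; the input lines are of course fictitious.

# Minimal emulation of the Map-Reduce phases (map -> shuffle -> reduce) for a word count.
# It only illustrates the programming model; a real Hadoop job distributes these phases
# over the machines of a cluster.
from collections import defaultdict

documents = [
    "big data offers big business gains",
    "big data requires new skills",
]

# Map phase: emit (key, value) pairs
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group the emitted values by key
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# Reduce phase: aggregate the values per key
counts = {key: sum(values) for key, values in grouped.items()}
print(counts)   # {'big': 3, 'data': 2, 'offers': 1, ...}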
2.3 Big Data Applications

Big Data has the potential to change the way academic institutions, corporations and organizations conduct business, and to change our daily life. Great examples of Big Data applications include:

a. Healthcare: Big Data technologies will have a major impact in healthcare. IBM estimates that 80% of medical data is unstructured and is clinically relevant. Furthermore, medical data resides in multiple places such as individual medical files, lab and imaging systems, physician notes, medical correspondence, etc. Big Data technologies allow healthcare organizations to bring all the information about an individual together to gain insights on how to manage care coordination, outcomes-based reimbursement models, patient engagement and outreach programs.
b. Retail: Retailers can get insights for personalizing marketing and improving the effectiveness of marketing campaigns, for optimizing assortment and merchandising decisions, and for removing inefficiencies in distribution and operations. For instance, several retailers now incorporate Twitter streams into their analysis of loyalty-program data. The insights gained make it possible to plan for surges in demand for certain items and to create mobile marketing campaigns targeting specific customers with offers at the times of day when they would be most receptive to them.²
c. Politics: Big Data technologies will improve efficiency and effectiveness across the broad range of government responsibilities. A great example of Big Data use in politics was Barack Obama's analytics- and metrics-driven 2012 presidential campaign [1]. Other examples include:
   i. Threat and crime prediction and prevention. For instance, the Detroit Crime Commission has turned to Big Data in its effort to assist the government and citizens of southeast Michigan in the prevention, investigation and prosecution of neighbourhood crime;³
   ii. Detection of fraud, waste and errors in social programs;
   iii. Detection of tax fraud and abuse.
d. Cyber risk prevention: companies can analyse data traffic in their computer networks in real time to detect anomalies that may indicate the early stages of a cyber attack. Research firm Gartner estimates that by 2016 more than 25% of global firms will adopt Big Data analytics for at least one security and fraud detection use case, up from 8% in 2014.⁴
e. Insurance fraud detection: insurance companies can determine a score for each claim in order to target for fraud investigation the claims with the highest scores, i.e. the ones that are most likely to be fraudulent. Fraud detection is treated in paragraph 3.4.
f. Usage-Based Insurance: an insurance scheme where car insurance premiums are calculated based on dynamic causal data, including actual usage and driving behaviour. Telematics data transmitted from a vehicle, combined with Big Data analytics, enables insurers to distinguish cautious drivers from aggressive drivers and match the insurance rate with the actual risk incurred.

² http://asmarterplanet.com/blog/2015/03/surprising-insights-ibmtwitter-alliance.html#more-33140
³ http://www.datameer.com/company/news/press-releases/detroit-crime-commission-combats-crime-with-datameer-big-data-analytics.html
⁴ http://www.gartner.com/newsroom/id/2663015

2.4 Data driven business

The quantity of data in the world is increasing steeply month after month. Some argue it is time to organize and use this information: data must now be viewed as a corporate asset. In order to respond to this transformation of business culture, two specific C-level roles have appeared in the banking and insurance industries in recent years.

2.4.1 The Chief Data Officer

The Chief Data Officer (abbreviated to CDO) is the first architect of this "data-driven business". In his role of coordinator, the CDO is in charge of the data that drive the company, by:

• defining and setting up a strategy to guarantee their quality, their reliability and their coherence;
• organizing and classifying them;
• making them accessible to the right person at the right moment, for the pertinent need and in the right format.

Thus, the Chief Data Officer needs a strong business background to understand how the business runs. The following question then emerges: to whom should the CDO report?
In some firms, the CDO is considered part of IT and reports to the CTO (Chief Technology Officer); in others, he holds more of a business role, reporting to the CEO. It is therefore up to the company to decide, as no two companies are exactly alike from a structural point of view.

Which companies already have a CDO? Generali Group appointed someone to this newly created position in June 2015. Other companies such as HSBC, Wells Fargo and QBE had already appointed a person to this position in 2013 or 2014. Even Barack Obama appointed a Chief Data Officer/Scientist during his 2012 campaign, and that metrics-driven decision-making played a big role in Obama's
re-election. In the beginning, most of the professionals holding the actual job title "Chief Data Officer" were located in the United States. After a while, Europe followed the move. Also, many people did the job in their day-to-day work but didn't necessarily hold the title. Many analysts in the financial sector believe that more insurance and banking companies will have to make the move in the coming years if they want to stay attractive.

2.4.2 The Chief Analytics Officer

Another C-level position has arisen in recent months: the Chief Analytics Officer (abbreviated to CAO). Are there differences between a CAO and a CDO? In theory, a CDO focuses on tactical data management, while the CAO concentrates on the strategic deployment of analytics. The latter's focus is on data analysis to find hidden but valuable patterns. These will result in operational decisions that make the company more competitive, more efficient and more attractive to its potential and current clients. The CAO is therefore a natural prolongation of the data-driven business: the more analytics are embedded in the organization, the more you need an executive-level person to manage that function and communicate the results in an understandable way. The CAO usually reports to the CEO.

In practice, some companies fold the CAO responsibilities into the CDO's tasks, while others distinguish the two positions. Currently it is quite rare to find an explicit "Chief Analytics Officer" position in the banking and insurance sector, because of this overlap. But in other fields the distinction is often made.

3 Big Data in the insurance value chain

Big Data provides new insights from social networks, telematics sensors and other new information channels. As a result, it allows a better understanding of customer preferences, enables new business approaches and products, and enhances existing internal models, processes and services. With the rise of Big Data the insurance world could fundamentally change, and the entire insurance value chain could be impacted, from underwriting to claims management.

3.1 Insurance underwriting

3.1.1 Introduction

In traditional insurance underwriting and actuarial analysis, we have for years observed a never-ending search for more meaningful insight into individual policyholder risk characteristics, to distinguish good risks from bad ones and to price each risk accurately. The analytics performed by actuaries, based on advanced mathematical and financial theories, have always been critically important to an insurer's profitability. Over the last decade, however, revolutionary advances in computing technology and the explosion of new digital data sources have expanded and reinvented the core disciplines of insurers. Today's advanced analytics in insurance go much further than traditional underwriting and actuarial science.
Data mining and predictive modelling are today the way forward for insurers to improve pricing and segmentation and to increase profitability.

3.1.2 What is predictive modelling?

Predictive modelling can be defined as the analysis of large historical data sets to identify correlations and interactions, and the use of this knowledge to predict future events. For actuaries, the concepts of predictive modelling are not new to the profession. The use of mortality tables to price life insurance products is an example of predictive modelling. The Belgian MK, FK and MR, FR tables showed the relationship between death probability and the explanatory variables of age, sex and product type (in this case life insurance or annuity).

Predictive models have been around for a long time in sales and marketing environments, for example to predict the probability that a customer will buy a new product. Bringing together expertise from both the actuarial profession and marketing analytics can lead to innovative initiatives where predictive models guide expert decisions in areas such as claims management, fraud detection and underwriting.

3.1.3 From small over medium to Big Data

Insurers collect a wealth of information on their customers, in the first place during the underwriting process, for example by asking about the claims history of a customer for car and home insurance. Another source is the history of the relationship the customer has with the insurance company. While in the past the data was kept in silos by product, the key challenge now lies in gathering all this information in one place where the customer dimension is central. The transversal approach to the database also
reflects the recent evolution in marketing: going from the 4 P's (product, price, place, promotion) to the 4 C's⁵ (customer, costs, convenience, communication).

On top of unleashing the value of internal data, new data sources are becoming available, such as wearables and social networks, to name a few. Because Big Data can be overwhelming to start with, medium data should be considered first. In Belgium, the strong bancassurance tradition offers interesting opportunities to combine insurance and bank data to create powerful predictive models.

3.1.4 Examples of predictive modelling for underwriting

1° Use the 360° view of the customer and predictive models to maximize profitability and gain more business

By thoroughly analysing data from different sources and applying analytics to gain insight, insurance companies should strive to develop a comprehensive 360-degree customer view. The gains of this complete and accurate view of the customer are twofold:

• Maximizing the profitability of the current customer portfolio through:
   o detecting cross-sell and up-sell opportunities;
   o customer satisfaction and loyalty actions;
   o effective targeting of products and services (e.g. customers that are most likely to be in good health, or those customers that are less likely to have a car accident).
• Acquiring more profitable new customers at a reduced marketing cost: modelling the existing customers will yield useful information to focus marketing campaigns on the most interesting prospects.

By combining data mining and analytics, insurance companies can better understand which customers are most likely to buy, and discover who their most profitable customers are and how to attract or retain more of them. Another use case is the evaluation of the underwriting process, in order to improve the customer experience during on-boarding.

2° Predictive underwriting for life insurance⁶

Using predictive models, it is in theory possible to predict the death probability of a customer. However, the low frequency of life insurance claims presents a challenge to modellers. While for car insurance the probability of a customer having a claim can be around 10%, for life insurance it is around 0.1% for the first year. Not only does this mean that a significant in-force book is needed to have confidence in the results, but also that sufficient history should be present to be able to show mortality experience over time. For this reason, using the underwriting decision as the variable to predict is a more common choice. All life insurance companies hold historical data on medical underwriting decisions that can be leveraged to build predictive models that predict underwriting decisions.
Depending on how the model is used, the outcome can be a reduction in the cost of medical examinations, more customer-friendly processes that avoid asking numerous invasive personal questions, or a reduction in the time needed to assess the risks, by automatically approving good risks and focusing underwriting efforts on more complex cases. For example, if the predictive model tells you that a new customer has a high degree of similarity to customers that passed the medical examination, the medical examination could be waived for this customer.

If this sounds scary for risk professionals, a softer approach can be tested first, for instance by improving marketing actions by targeting only those individuals that have a high likelihood of being in good health. This not only decreases the cost of the campaign, but also avoids the disappointment of a potential customer who is refused during the medical screening process.

⁵ http://www.customfitonline.com/news/2012/10/19/4-cs-versus-the-4-ps-of-marketing/
⁶ Predictive modeling for life insurance, Deloitte, April 2010
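As a minimal, purely illustrative sketch of the predictive-underwriting idea described above, the following fits a logistic regression to a simulated set of historical medical underwriting decisions. The applicant features (age, bmi, smoker), the simulated decision rule and the use of scikit-learn are assumptions for illustration; they are not taken from the paper or from any insurer's practice.

# Sketch: predict the probability that an applicant is referred to a medical exam,
# using a logistic regression fitted on simulated historical underwriting decisions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
age = rng.integers(20, 65, n)
bmi = rng.normal(26, 4, n)
smoker = rng.integers(0, 2, n)

# Fictitious historical decisions: older smokers with a higher BMI were referred more often
logit = -4.0 + 0.05 * (age - 40) + 0.15 * (bmi - 26) + 1.0 * smoker
referred = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))   # True = referred to a medical exam

X = np.column_stack([age, bmi, smoker])
X_train, X_test, y_train, y_test = train_test_split(X, referred, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC on held-out decisions:",
      round(roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]), 3))

# Applicants with a low predicted referral probability could be fast-tracked without
# a medical examination; where to put the threshold is a business and risk decision.
new_applicant = np.array([[30, 24.0, 0]])
print("Predicted referral probability:", round(model.predict_proba(new_applicant)[0, 1], 3))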
3.1.5 Challenges of predictive modelling in underwriting⁷

Predictive models can only be as good as the input used to calibrate them. The first challenge in every predictive modelling project is to collect relevant, high-quality data for which a history is present. As many insurers are currently replacing legacy systems to reduce maintenance costs, this can come at the expense of the history. Actuaries are uniquely placed to prevent the history being lost, since for adequate risk management a portfolio's history should be kept. The trend of moving all policies from several legacy systems into one modern policy administration system is an opportunity that must be seized, so that data collection will be easier in the future.

Once the necessary data are collected, some legal or compliance concerns need to be addressed, as there may be boundaries to using certain variables in the underwriting process. In Europe, if the model will influence the price of the insurance, gender is no longer allowed as an explanatory variable, and this is only one example. It is important that the purpose of the model and the possible inputs are discussed with the legal department prior to starting the modelling.

Once the model is built, it is important that the users realize that no model is perfect. This means that residual risks will be present, and these should be weighed against the gains that the use of the model can bring.

And finally, once a predictive model has been set up, a continuous review cycle must be put in place that collects feedback from the underwriting and sales teams and gathers data to improve and refine the model. Building a predictive model is a continuous improvement process, not a one-off project.

3.2 Insurance pricing

3.2.1 Overview of existing pricing techniques

The first rate-making techniques were based on rudimentary methods such as univariate analysis and, later, iterative standardized univariate methods such as the minimum bias procedure. They look at how changes in one characteristic result in differences in loss frequency or severity.

Later on, insurance companies moved to multivariate methods, a move that went hand in hand with further developments in computing power and data capabilities. These techniques are now being adopted by more and more insurers and are becoming part of everyday business practice. Multivariate analytical techniques focus on individual-level data and take into account the effects (interactions) that many different characteristics of a risk have on one another. As explained in the previous section, many companies use predictive modelling (a form of multivariate analysis) to create measures of the likelihood that a customer will purchase a particular product. Banks use these tools to create measures (e.g. credit scores) of whether a client will be able to meet lending obligations for a loan or mortgage.
Similarly, P&C insurers can use predictive models to predict claims behaviour. Multivariate methods provide valuable diagnostics that aid in understanding the certainty and reasonableness of the results.

Generalized Linear Models are essentially a generalized form of linear models. This family encompasses normal-error linear regression models and the nonlinear exponential, logistic and Poisson regression models, as well as many other models, such as log-linear models for categorical data. Generalized linear models have become the standard for classification rate-making in most developed insurance markets, particularly because of the benefit of transparency. Understanding the mathematical underpinnings is an important responsibility of the rate-making actuary who intends to use such a method. Linear models are a good place to start, as GLMs are essentially a generalized form of such a model. As with many techniques, visualizing the GLM results is an intuitive way to connect the theory with the practical use. GLMs do not stand alone as the only multivariate classification method. Other methods such as CART, factor analysis and neural networks are often used to augment GLM analysis.

In general, the data mining techniques listed above can enhance a rate-making exercise by:
• whittling down a long list of potential explanatory variables to a more manageable list for use within a GLM;
• providing guidance on how to categorize discrete variables;
• reducing the dimension of multi-level discrete variables (i.e. condensing 100 levels, many of which have few or no claims, into 20 homogeneous levels);
• identifying candidates for interaction variables within GLMs by detecting patterns of interdependency between variables.

⁷ Predictive modelling in insurance: key issues to consider throughout the lifecycle of a model

3.2.2 Old versus new modelling techniques

The adoption of GLMs resulted in many companies seeking external data sources to augment what had already been collected and analysed about their own policies. This includes, but is not limited to, information about geo-demographics, sensor data, social media information, weather and property characteristics, and information about insured individuals or businesses. This additional data helps actuaries further improve the granularity and accuracy of classification rate-making. Unfortunately this new data is very often unstructured and massive, and hence the traditional generalized linear model (GLM) techniques become useless.

With so many unique new variables in play, it can become a very difficult task to identify and take advantage of the most meaningful correlations. In many cases, GLM techniques are simply unable to penetrate deeply into these giant stores. Even in the cases when they can, the time required to uncover the critical correlations tends to be onerous, requiring days, weeks and even months of analysis. Only with advanced techniques, and specifically machine learning, can companies generate predictive models that take advantage of all the data they are capturing.

Machine learning is the modern science of finding patterns and making predictions from data, building on work in multivariate statistics, data mining, pattern recognition and advanced/predictive analytics. Machine learning methods are particularly effective in situations where deep and predictive insights need to be uncovered from data sets that are large, diverse and fast-changing, in other words Big Data. Across these types of data, machine learning easily outperforms traditional methods on accuracy, scale and speed.
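To make the contrast between the two families of techniques concrete, here is a minimal sketch on simulated data: a classical Poisson GLM with a log link and an exposure offset for claim frequency, followed by a gradient-boosting model on the same rating factors. The portfolio, the rating factors and the choice of the statsmodels and scikit-learn libraries are illustrative assumptions, not a description of any insurer's actual models.

# Toy claim-frequency example: Poisson GLM with exposure offset vs. gradient boosting.
# All data is simulated; variable names are illustrative only.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)
n = 20000
df = pd.DataFrame({
    "driver_age": rng.integers(18, 80, n),
    "vehicle_power": rng.integers(4, 12, n),
    "urban": rng.integers(0, 2, n),
    "exposure": rng.uniform(0.2, 1.0, n),   # in policy-years
})
true_rate = 0.08 * np.exp(0.015 * (40 - df["driver_age"]).clip(0)
                          + 0.05 * (df["vehicle_power"] - 6) + 0.3 * df["urban"])
df["claims"] = rng.poisson(true_rate * df["exposure"])

# Classical approach: Poisson GLM, log link, log(exposure) as offset
X = sm.add_constant(df[["driver_age", "vehicle_power", "urban"]])
glm = sm.GLM(df["claims"], X, family=sm.families.Poisson(),
             offset=np.log(df["exposure"])).fit()
print(glm.summary().tables[1])

# Machine-learning approach: gradient boosting on the observed frequency,
# weighted by exposure (a very rough stand-in for a full ML pricing pipeline)
gbm = GradientBoostingRegressor(random_state=0)
gbm.fit(df[["driver_age", "vehicle_power", "urban"]],
        df["claims"] / df["exposure"], sample_weight=df["exposure"])
print("GBM feature importances:",
      dict(zip(["driver_age", "vehicle_power", "urban"], gbm.feature_importances_.round(3))))

On a small, clean data set like this one the two approaches give similar signals; the point of the machine-learning route is the larger and messier variable sets described above.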
3.2.3 Personalized and Real-time pricing – Motor Insurance

In order to price risk more accurately, insurance companies are now combining analytical applications (e.g. behavioural models based on customer profile data) with a continuous stream of real-time data (e.g. satellite data, weather reports, vehicle sensors) to create a detailed and personalized assessment of risk. Usage-based insurance (UBI) has been around for a while: it began with Pay-As-You-Drive programs that gave drivers discounts on their insurance premiums for driving under a set number of miles. These soon developed into Pay-How-You-Drive programs, which track driving habits and give discounts for 'safe' driving. UBI allows a firm to snap a picture of an individual's specific risk profile, based on that individual's actual driving habits. UBI condenses the period of time under inspection to a few months, guaranteeing a much more relevant pool of information. With all this data available, the pricing scheme for UBI deviates greatly from that of traditional auto insurance. Traditional auto insurance relies on actuarial studies of aggregated historical data to produce rating factors that include driving record, credit-based insurance score, personal characteristics (age, gender and marital status), vehicle type, living location, vehicle use, previous claims, liability limits and deductibles.

Policyholders tend to think of traditional auto insurance as a fixed cost, assessed annually and usually paid in lump sums on an annual, semi-annual or quarterly basis. However, studies show that there is a strong correlation between claim and loss costs and mileage driven, particularly within existing price rating factors (such as class and territory). For this reason, many UBI programs seek to convert the fixed costs associated with mileage driven into variable costs that can be used in conjunction with other rating factors in the premium calculation. UBI has the advantage of utilizing individual and current driving behaviour, rather than relying on aggregated statistics and driving records that are based on past trends and events, making premium pricing more individualized and precise.

3.2.4 Advantages

UBI programs offer many advantages to insurers, consumers and society. Linking insurance premiums more closely to actual individual vehicle or fleet performance allows insurers to price premiums more accurately. This increases affordability for lower-risk drivers, many of whom are also lower-income drivers. It also gives consumers the ability to control their premium costs by encouraging them to reduce miles driven and adopt safer driving habits. The use of telematics helps insurers to more accurately estimate accident damages and reduce fraud by enabling them to analyse driving data (such as hard braking, speed and time) during an accident. This additional data can also be used by insurers to refine or differentiate UBI products.

3.2.5 Shortcomings/challenges

3.2.5.1 Organization and resources

Taking advantage of the potential of Big Data requires somewhat different approaches to organization, resources and technology. As with many new technologies that offer promise, there are challenges to successful implementation and the production of meaningful business results. The number one organizational challenge is determining the business value, with financing a close second. Talent is the other big issue: identifying the business and technology experts inside the enterprise, recruiting new employees, training and mentoring individuals, and partnering with outside resources is clearly a critical success factor for Big Data. Implementing the new technology and organizing the data are listed as lesser challenges by insurers, although these are still areas that require attention.

3.2.5.2 Technology challenges

The biggest technology challenge in the Big Data world is framed in the context of the different Big Data 'V' characteristics. These include the standard three V's of volume, velocity and variety, plus two more: veracity and value. The variety and veracity of the data present the biggest challenges. As insurers venture beyond the analysis of structured transaction data to incorporate external data and unstructured data of all sorts, the ability to combine and feed the data into an analysis may be complicated. On the one hand the variety expresses the promise of Big Data, but on the other hand the technical challenges are significant. The veracity of the data is also deemed a challenge. It is true that some Big Data analyses do not require the data to be as clean and organized as in traditional approaches. However, the data must still reflect the underlying truth of the domain.

3.2.5.3 Technology approaches

Technology should not be the first focus area for evaluating the potential of Big Data in an organization. However, choosing the best technology platform for your organization and business problems does become an important consideration for success. Cloud computing will play a very important role in Big Data. Although there are challenges and new approaches required for Big Data, there is a growing body of experience, expertise and best practices to assist in successful Big Data implementations.

3.3 Insurance Reserving

Loss reserving is a classic actuarial problem encountered extensively in motor, property and casualty as well as in health insurance.
It is a consequence of the fact that insurers need to set reserves to cover future liabilities related to the book of contracts. In other words, the insurer has to hold funds aside to meet future liabilities attached to incurred claims.

In non-life insurance, most policies run for a period of 12 months, but the claims payment process can take years or even decades. In particular, losses arising from casualty insurance can take a long time to settle, and even when the claims are acknowledged, it may take time to establish the extent of the claims settlement costs. A well-known and costly example is provided by the claims from asbestos liabilities. It is thus no surprise that the biggest item on the liabilities side of an insurer's balance sheet is often the provision of reserves for future claims payments. It is the job of the reserving actuary to predict, with maximum accuracy, the total amount necessary to pay the claims that the insurer has legally committed to cover.

Historically, reserving was based on deterministic calculations with pen and paper, combined with expert judgement. Since the 1980s, the arrival of personal computers and spreadsheet software has induced a real change for reserving actuaries. The use of spreadsheets does not only bring gains in calculation time but also allows testing different scenarios and the sensitivity of the forecasts. The first simple models used by actuaries started to evolve towards more developed ideas as IT resources evolved. Moreover, recent changes in regulatory requirements, such as Solvency II in Europe, have shown the need for stochastic models and more precise statistical techniques.
3.3.1 Classical methods

There are many different frameworks and models used by reserving actuaries to compute the technical provisions, and it is not the goal of this paper to review them exhaustively, but rather to show that they share the central notion of the triangle. A triangle is a way of presenting data in a triangular structure showing the development of claims over time for each origin period. An origin period can be the year the policy was written or earned, or the loss occurrence period.

After having used deterministic models, reserving generally switches to stochastic models. These models allow reserve risk to be quantified.

The use of models based on aggregated data used to be convenient in the past, when IT resources were limited, but is more and more questionable nowadays, when we have huge computational power at hand at an affordable price. There is therefore a need to move to models that make full use of the data available in insurers' data warehouses.
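As a minimal sketch of the triangle-based approach described above, the following computes volume-weighted development factors and chain-ladder ultimates for a small, made-up cumulative paid triangle. The figures are purely illustrative; dedicated packages (for example the R ChainLadder or the Python chainladder library) provide full, including stochastic, implementations.

# Chain ladder on a small, made-up cumulative paid triangle (origin years x dev periods).
# NaN marks future, as yet unobserved cells; figures are purely illustrative.
import numpy as np

triangle = np.array([
    [1000., 1800., 2100., 2200.],
    [1100., 2000., 2300., np.nan],
    [1200., 2150., np.nan, np.nan],
    [1300., np.nan, np.nan, np.nan],
])

n_dev = triangle.shape[1]
factors = []
for j in range(n_dev - 1):
    known = ~np.isnan(triangle[:, j + 1])
    # volume-weighted development factor from dev period j to j + 1
    factors.append(triangle[known, j + 1].sum() / triangle[known, j].sum())

ultimates, latest = [], []
for row in triangle:
    last = np.where(~np.isnan(row))[0].max()   # latest known development period
    ult = row[last]
    for f in factors[last:]:
        ult *= f
    ultimates.append(ult)
    latest.append(row[last])

reserves = np.array(ultimates) - np.array(latest)
print("Development factors:", np.round(factors, 3))
print("Chain-ladder reserves by origin year:", np.round(reserves, 1))

The micro-level methods discussed next replace this aggregated triangle with models fitted on the individual claim histories.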
3.3.2 Micro-level reserving methods

Unlike aggregate models (or macro-level models), micro-level reserving methods (also called individual claim level models) use individual claims data as inputs and estimate outstanding liabilities for each individual claim. Unlike the models discussed in the previous section, they model very precisely the lifetime development process of each individual claim, including events such as claim occurrence, reporting, payments and settlement. Moreover, they can include micro-level covariates such as information about the policy, the policyholder, the claim, the claimant and the transactions.

When well specified, such models are expected to generate reliable reserve estimates. Indeed, the ability to model claims development at the individual level and to incorporate micro-level covariate information allows micro-level models to handle heterogeneity in claims data efficiently. Moreover, the large amount of data used in modelling can help to avoid issues of over-parameterization and lack of robustness. As a consequence, micro-level models are especially valuable under changing environments, as these changes can be captured by appropriate covariates.

3.4 Claims Management

Big Data can play a tremendous role in the improvement of claims management. It provides access to data that was not available before and makes claims processing faster. It thereby enables improved risk management, reduces loss adjustment expenses and enhances quality of service, resulting in increased customer retention. Below we present details of how Big Data analytics improves the fraud detection process.

3.4.1 Fraud detection

It is estimated that a typical organization loses 5% of its revenues to fraud each year⁸. The total cost of insurance fraud (non-health insurance) in the US is estimated to be more than $40 billion per year⁹. The advent of Big Data & Analytics has provided new and powerful tools to fight fraud.

3.4.2 What are the current challenges in fraud detection?

The first challenge is finding the right data. Analytical models need data, and in a fraud detection setting this is not always evident. Collected fraud data are often very skewed, with typically less than 1% fraudsters, which seriously complicates the detection task. The asymmetric costs of missing fraud versus harassing non-fraudulent customers also create important modelling difficulties. Furthermore, fraudsters constantly try to outsmart the analytical models, so these models should be permanently monitored and re-configured on an ongoing basis.

3.4.3 What analytical approaches are being used to tackle fraud?

Most of the fraud detection models in use nowadays are expert-based models. When data becomes available, one can start doing analytics. A first approach is supervised learning, which analyses a labelled data set of historically observed fraud behaviour. It can be used to predict both fraud and the amount thereof. Unsupervised learning starts from an unlabelled data set and performs anomaly detection. Finally, social network learning analyses fraud behaviour in networks of linked entities. Throughout our research, it has been found that this approach is superior to all others.

⁸ www.acfe.com
⁹ www.fbi.gov
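As a small illustration of the unsupervised, anomaly-detection route mentioned above, the sketch below scores simulated claims with an isolation forest and surfaces the most unusual ones for investigation. The claim features, the injected anomalies and the use of scikit-learn are assumptions for illustration only; a production system would combine such scores with expert rules, supervised models and the network techniques discussed below.

# Unsupervised anomaly detection on simulated claim features with an isolation forest.
# Data and feature names are fictitious; the scores are only a triage aid for investigators.
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(2)
n = 10000
claims = pd.DataFrame({
    "claim_amount": rng.lognormal(mean=7.5, sigma=0.6, size=n),
    "days_to_report": rng.exponential(scale=5.0, size=n),
    "prior_claims_3y": rng.poisson(0.4, size=n),
})
# Inject a handful of unusual claims (large, late-reported, frequent claimants)
claims.loc[:24, "claim_amount"] = 60000.0
claims.loc[:24, "days_to_report"] = 90.0
claims.loc[:24, "prior_claims_3y"] = 5

model = IsolationForest(contamination=0.01, random_state=0).fit(claims)
claims["anomaly_score"] = -model.score_samples(claims)   # higher = more anomalous

# Hand the highest-scoring claims to the special investigation unit first
print(claims.sort_values("anomaly_score", ascending=False).head())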
3.4.4 What are the key characteristics of successful analytical models for fraud detection?

Successful fraud analytical models should satisfy various requirements. First, they should achieve good statistical performance in terms of recall or hit rate, which is the percentage of fraudsters labelled by the analytical model as suspicious, and precision, which is the percentage of fraudsters amongst the ones labelled as suspicious. Next, the analytical models should not be based on complex mathematical formulas (such as neural networks, support vector machines, ...) but should provide clear insight into the fraud mechanisms adopted. This is particularly important since the insights gained will be used to develop new fraud prevention strategies. The operational efficiency of the fraud analytical model also needs to be evaluated. This refers to the amount of resources needed to calculate the fraud score and to act upon it adequately. For example, in a credit card fraud environment, a decision needs to be made within a few seconds after the transaction is initiated.

3.4.5 Use of social network analytics to detect fraud¹⁰

Research has shown that network models significantly outperform non-network models in terms of accuracy, precision and recall. Network analytics can therefore help improve fraud detection techniques. Fraud is present in many critical human processes such as credit card transactions, insurance claims, opinion fraud and social security fraud. Fraud can be defined by the following five characteristics: it is an uncommon, well-considered, imperceptibly concealed, time-evolving and often carefully organized crime, which appears in many types and forms. Before applying fraud detection techniques, these five issues should be resolved or counterbalanced.

Fraud is an uncommon crime, which means that it has an extremely skewed class distribution. Rebalancing techniques such as SMOTE can be used to counterbalance this effect: under-sampling the majority class (reducing the number of legitimate cases) and over-sampling the minority class (duplicating fraud cases or creating artificial fraud cases).

Complex fraud structures are well-considered; this implies that there will be changes in behaviour over time, so not every time period will have the same importance. A temporal weighting adjustment should put the emphasis on the more important (more recent) data periods that could be explanatory of the fraudulent behaviour.

Fraud is imperceptibly concealed, meaning that it is difficult to identify. One could leverage expert knowledge to create features that help identify fraud.

Fraud is time-evolving. The period of study should be selected carefully, taking into consideration that fraud evolves over time. How much of previous time periods could explain or affect the present? The model should incorporate these changes over time.
Another question to raise is in what time window the model should be able to detect fraud: short, medium or long term.

The last characteristic of fraud is that it is most of the time carefully organized. Fraud is often not an individual phenomenon; in fact there are many interactions between fraudsters. Often fraud sub-networks develop within a bigger network. Social network analysis can be used to detect these networks.

Social network analysis helps derive useful patterns and insights by exploiting the relational structure between objects. A network consists of two sets of elements: the objects of the network, which are called nodes, and the relationships between nodes, which are called links. A link connects two or more nodes. A weight can be assigned to nodes and links to measure the magnitude of the crime or the intensity of the relationship. When constructing such networks, the focus is put on the neighbourhood of a node, which is a subgraph of the network around the node of interest (the fraudster).

Once a network has been constructed, how can it be used as an indicator of fraudulent activities? Fraud can be detected by answering the following question: does the network contain statistically significant patterns of homophily? Detection of fraud relies on a concept often used in sociology called homophily. Homophily in networks means that people have a strong tendency to
associate with others whom they perceive as being similar to themselves in some way. This concept can be translated to fraud networks: fraudulent people are more likely to be connected to other fraudulent people. Clustering techniques can be used to detect significant patterns of homophily and thus to spot fraudsters.

Given a homophilic network with evidence of fraud clusters, it is possible to extract features from the network around the node(s) of interest (fraud activity), also called the neighbourhood of the node. This is the featurization process: extracting features for each network object based on its neighbourhood. The focus is put on the first-order neighbourhood (first-degree links), also known as the "egonet" (ego: the node of interest surrounded by its direct associates, known as alters). Feature extraction happens at two levels: egonet generic features (how many fraudulent resources are associated with that company, are there relationships between resources, ...) and alter-specific features (how similar is the alter to the ego, is the alter involved in many fraud cases or not).

Once these first-order neighbourhood features have been extracted for each subject of interest (companies), such as the degree of fraudulent resources and the weight of the fraudulent resources, it is then easy to derive the propagation effect of these fraudulent influences through the network.

To conclude, network models always outperform non-network models, as they are able to better distinguish fraudsters from non-fraudsters. They are also more precise, generating smaller lists of high-risk companies while detecting more fraudulent corporates.

¹⁰ Based on the research of Véronique Van Vlasselaer (KU Leuven)
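To make the egonet featurization above concrete, here is a minimal sketch using the networkx library on a tiny, made-up claims network; the node names and the set of known fraudulent entities are fictitious assumptions, and a real application would of course work on far larger graphs.

# Egonet featurization sketch: for each node, summarize its first-degree neighbourhood
# in terms of known fraudulent neighbours. The toy graph and labels are fictitious.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("claim_1", "garage_A"), ("claim_2", "garage_A"), ("claim_4", "garage_A"),
    ("claim_3", "garage_B"), ("claim_1", "claimant_X"), ("claim_2", "claimant_X"),
])
known_fraud = {"claim_1", "claimant_X"}

features = {}
for node in G.nodes:
    ego = nx.ego_graph(G, node, radius=1)          # the egonet: node of interest plus its alters
    alters = set(ego.nodes) - {node}
    n_fraud = len(alters & known_fraud)
    features[node] = {
        "degree": G.degree(node),
        "fraud_neighbours": n_fraud,
        "fraud_ratio": n_fraud / len(alters) if alters else 0.0,
    }

# Entities strongly connected to known fraud bubble up for investigation first
for node, feats in sorted(features.items(), key=lambda kv: -kv[1]["fraud_ratio"]):
    print(node, feats)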
To conclude, in this research the network models consistently outperformed the non-network models: they were better able to distinguish fraudsters from non-fraudsters, they produced shorter and more precise lists of high-risk companies, and they detected more fraudulent corporates.

3.4.6 Fraud detection in motor insurance – Usage-Based Insurance example
In 2014, the Coalition Against Insurance Fraud11, with the assistance of the business analytics company SAS, published a report stressing that technology plays a growing role in fighting fraud. "Insurers are investing in different technologies to combat fraud, but a common component to all these solutions is data," said Stuart Rose, Global Insurance Marketing Principal at SAS. "The ability to aggregate and easily visualize data is essential to identify specific fraud patterns." "Technology is playing a larger and more trusted role with insurers in countering growing fraud threats. Software tools provide the efficiency insurers need to thwart more scams and impose downward pressure on premiums for policyholders," said Dennis Jay, the Coalition's executive director.
In motor insurance, a good example is Usage-Based Insurance (UBI), where insurers can benefit from the superior fraud detection that telematics can provide. Telematics equips an insurer with driving behaviour and driving exposure patterns, including information about speeding, driving dynamics, driving trips, day and night driving patterns, garaging address and mileage. In this sense UBI can act as a "lie detector" and can help companies detect falsification of the garaging address, the annual mileage or the driving behaviour. By recording the vehicle's geographical location and detecting sharp braking and harsh acceleration during an accident, an insurer can analyse accident details and estimate accident damages. The telematics devices used in UBI can also support first notice of loss (FNOL) services, providing very valuable information for insurers. Analytics performed on this data provide additional evidence to consider when investigating a claim and can help to reduce fraud and claims disputes.

11 http://www.insurancefraud.org/about-us.htm
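A rough sketch of how such telematics checks could look in practice is given below: declared policy details are compared with trip summaries from a telematics feed. All column names, figures and thresholds are invented for the example and would be calibrated very differently in a real UBI programme.

```python
# Hypothetical cross-check of declared policy data against telematics trip data.
import pandas as pd

declared = {"annual_mileage_km": 8000, "garaging_postcode": "1000"}

# One row per recorded trip over a 7-day observation window (made-up data).
trips = pd.DataFrame({
    "distance_km":        [32.0, 55.5, 12.3, 78.0, 41.2],
    "start_postcode":     ["1050", "1050", "1000", "1050", "1050"],
    "harsh_brake_events": [0, 2, 0, 3, 1],
})
observation_days = 7

# Project the observed driving onto a full year.
projected_annual_km = trips["distance_km"].sum() / observation_days * 365

flags = {
    # Projected mileage well above the declared figure (the 25% margin is arbitrary).
    "mileage_understated": projected_annual_km > 1.25 * declared["annual_mileage_km"],
    # Most trips start away from the declared garaging address.
    "garaging_mismatch": (trips["start_postcode"] != declared["garaging_postcode"]).mean() > 0.5,
    # Driving style inconsistent with a low-risk declared profile.
    "harsh_driving": trips["harsh_brake_events"].mean() > 1.0,
}

print(f"projected annual mileage: {projected_annual_km:.0f} km")
print({name: bool(value) for name, value in flags.items()})
```

Flags of this kind would not prove fraud on their own; they would typically be combined with other evidence and passed to a special investigation unit.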
4 Legal aspects of Big Data

4.1 Introduction
Data processing lies at the very heart of insurance activities. Insurers and intermediaries collect and process vast amounts of personal data about their customers. At the same time, they apply a particular type of 'discrimination' among their insureds. Like all businesses operating in Europe, insurers are subject to European and national data protection laws and anti-discrimination rules. Rapid technological evolution and globalization have triggered a comprehensive reform of the current data protection laws. The EU hopes to complete a new General Data Protection Regulation by the end of this year. Insurers are concerned that this new Regulation could introduce unintended consequences for the insurance industry.

4.2 Data processing

4.2.1 Legislation: an overview
Insurers collect and process data to analyse the risks that individuals wish to cover, to tailor products accordingly, to evaluate and pay claims and benefits, and to detect and prevent insurance fraud. The rise of Big Data presents opportunities to offer more creative, competitive pricing and, importantly, to predict customers' behaviour. As insurers continue to explore this relatively untapped resource, evolutions in data processing legislation need to be followed very closely.

The protection of personal data was, as a separate right granted to an individual, first guaranteed in the Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data (Convention 108), adopted by the Council of Europe in 1981.

The current principal EU legal instrument establishing rules for fair personal data processing is the Data Protection Directive (95/46/EC) of 1995, which regulates the protection of individuals with regard to the processing of personal data and the free movement of such data. As a framework law, the Directive had to be implemented in the EU Member States through national laws. This Directive has set the standard for the legal definition of personal data and for regulatory responses to the use of personal data. Its provisions include principles related to data quality, criteria for making data processing legitimate and the essential right not to be subject to automated individual decisions.

The Data Protection Directive was complemented by other legal instruments, such as the E-Privacy Directive (2002/58/EC), part of a package of five new Directives that aimed to reform the legal and regulatory framework of electronic communications services in the EU. Personal data and individuals' fundamental right to privacy need to be protected, but at the same time the legislator must take into account the legitimate interests of governments and businesses. One of the innovative provisions of this Directive was the introduction of a legal framework for the use of devices for storing or retrieving information, such as cookies. Companies must also inform customers of the data processing to which their data will be subject and obtain subscriber consent before using traffic data for marketing or before offering value-added services based on traffic or location data. The EU Cookie Directive (2009/136/EC), an amendment of the E-Privacy Directive, aims to increase consumer protection and requires websites to obtain informed consent from visitors before they store information on a computer or any web-connected device.

In 2006 the EU Data Retention Directive (2006/24/EC) was adopted as an anti-terrorism measure after the terrorist attacks in Madrid and London. However, on 8 April 2014 the European Court of Justice declared this Directive invalid.
The Court took the view that the Directive did not meet the principle of proportionality and should have provided more safeguards to protect the fundamental rights to respect for private life and to the protection of personal data.

Belgium established a Privacy Act (or Data Protection Act) in 1992. Since the introduction of the EU Data Protection Directive (1995), the principles of that Directive have been transposed into Belgian law. The Privacy Act consequently underwent significant changes introduced by the Act of 11 December 1998, and further modifications have been made since, including those of the Act of 26 February 2006. The Belgian Privacy Commission is part of a European task force that includes the data protection authorities of the Netherlands, Belgium, Germany, France and Spain. In October 2014, a new Privacy Bill was introduced in the Belgian Federal Parliament. The Bill mainly aims at providing the Belgian Data Protection Authority (DPA) with stronger enforcement capabilities and at ensuring that Belgian citizens regain control over their personal data. To achieve this, certain new measures, inspired by the proposed European data protection Regulation, are proposed for inclusion in the existing legislation adopted in 1992.

The current data processing legislation urgently needs an update. Rapid technological developments, the increasingly globalized nature of data flows and the arrival of cloud computing pose new challenges for data protection authorities. In order to ensure continuity of a high level of data protection, the rules need to be brought in line with technological developments. The Directive of 1995 has also not prevented fragmentation in the way data protection is implemented across the Union.

In 2012 the European Commission proposed a comprehensive, pan-European reform of the data protection rules to strengthen online privacy rights and boost Europe's digital economy. On 15 June 2015, the Council reached a 'general approach' on a General Data Protection Regulation (GDPR) that establishes rules adapted to the digital era.
The European Commission is pushing for a complete agreement between the Council and the European Parliament before the end of this year. The twofold aim of the Regulation is to enhance the data protection rights of individuals and to improve business opportunities by facilitating the free flow of personal data in the digital single market. The Regulation must be appropriately balanced in order to guarantee a high level of protection of individuals while allowing companies to preserve innovation and competitiveness. In parallel with the proposal for a GDPR, the Commission adopted a Directive on data processing for law enforcement purposes (5833/12).

4.2.2 Some concerns of the insurance industry
The European insurance and reinsurance federation, Insurance Europe, is concerned that the proposed Regulation could introduce unintended consequences for the insurance industry and its policyholders. The new legislation must correctly balance an individual's right to privacy against the needs of businesses. The way insurers process data must be taken into account appropriately, so that they can perform their contractual obligations, assess consumers' needs and risks, innovate, and also combat fraud. There is also a clear tension between Big Data, the privacy of the insured's personal data and its availability to business and the State.

An important concern is that the proposed rules on profiling do not take into consideration the way that insurance works. The Directive of 1995 contains rules on 'automated processing', but there is not a single mention of 'profiling' in the text. The new GDPR aims to provide more legal certainty and more protection for individuals with respect to data processing in the context of profiling. Insurers need to profile potential policyholders to measure risk; any restriction on profiling could therefore translate not only into higher insurance prices and less insurance coverage, but also into an inability to provide consumers with appropriate insurance. Insurance Europe recommends that the new EU Regulation should allow insurance-related profiling at the pre-contractual stage and during the performance of the contract. There is also still some confusion in defining profiling: in the Council approach profiling means solely automated processing, while Article 20(5) as proposed by the European Parliament could, according to Insurance Europe, be interpreted as prohibiting fully automated processing and requiring human intervention for every single insurance contract offered to consumers.

The proposal of the EU Council (June 2015) stipulates that the controller should use adequate mathematical or statistical procedures for the profiling.
The controller must secure personal data in a way that takes account of the potential risks involved for the interests and rights of the data subject and that prevents, inter alia, discriminatory effects against individuals on the basis of race or ethnic origin, political opinions, religion or beliefs, trade union membership, genetic or health status or sexual orientation, or measures having such an effect. Automated decision-making and profiling based on special categories of personal data should only be allowed under specific conditions.

According to the Article 29 Working Party12, the Council's proposals on profiling are still unclear and do not foresee sufficient safeguards. In June 2015 it renewed its call for provisions giving the data subject a maximum of control and autonomy when personal data are processed for profiling. The provisions should clearly define the purposes for which profiles may be created and used, including specific obligations on controllers to inform the data subject, in particular about his or her right to object to the creation and use of profiles. The academic research group IRISS remarks that the GDPR does not clarify whether or not there is an obligation on data controllers to disclose information about the algorithms involved in profiling practices, and suggests clarification on this point.

Insurance Europe also requests that the GDPR explicitly recognise insurers' need to process and share data for fraud prevention and detection. According to the Council and the Article 29 Working Party, fraud prevention may fall under the non-exhaustive list of 'legitimate interests' in Article 6(1)(f), which would provide the necessary legal basis to allow processing for combatting insurance fraud.

The new Regulation also proposes a new right to data portability, enabling easier transmission of personal data from one service provider to another. This would allow policyholders to obtain a copy of any of their data being processed by an insurer, and insurers could be forced to disclose confidential and

12 Article 29 Working Party is an independent advisory body on data protection and privacy, set up under the Data Protection Directive of 1995. It is composed of representatives of the national data protection authorities of the EU Member States, the European Data Protection Supervisor and the European Commission.
