
Ensemble Contextual Bandits for Personalized Recommendation

The cold-start problem has attracted extensive attention among online services that provide personalized recommendation. Many online vendors employ contextual bandit strategies to tackle the so-called exploration/exploitation dilemma rooted in the cold-start problem. However, due to high-dimensional user/item features and the underlying characteristics of bandit policies, it is often difficult for service providers to obtain and deploy an appropriate algorithm that achieves acceptable and robust economic profit.

In this paper, we explore ensemble strategies for contextual bandit algorithms to obtain robust predicted click-through rates (CTR) of web objects. The ensemble is acquired by aggregating the different pulling policies of the bandit algorithms, rather than by forcing agreement among prediction results or learning a unified predictive model. To this end, we employ a meta-bandit paradigm that places a hyper bandit over the base bandits to explicitly explore/exploit the relative importance of the base bandits based on user feedback. Extensive empirical experiments on two real-world data sets (news recommendation and online advertising) demonstrate the effectiveness of our proposed approach in terms of CTR.


  1. Ensemble Contextual Bandits for Personalized Recommendation
     Liang Tang, Yexi Jiang, Lei Li, Tao Li
     Florida International University
     ACM RecSys 2014, 10/7/14
  2. Cold Start Problem for Learning-based Recommendation
     • Issue: we do not have enough appropriate data.
       – Historical user log data is biased.
       – User interest may change over time.
       – New items (or users) are added.
     • Approach: exploitation and exploration
       – Contextual Multi-Arm Bandit Algorithm
     The contextual information consists of item features and user features.
  3. Contextual Bandit Algorithms for Personalized Recommendation
     • Contextual Bandit
       – Let a1, …, am be a set of arms.
       – Given a context xt, the model decides which arm to pull.
       – After each pull, you receive a random reward, which is determined by the pulled arm and xt.
       – Goal: maximize the total received reward.
     • Online Recommendation
       – Arm → Item; Pull → Recommend
       – Context → User feature
       – Reward → Click
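The arm/context/reward mapping above can be sketched as a minimal policy interface. This is an illustrative sketch, not code from the paper; the class and method names are assumptions.

```python
import random

class ContextualBanditPolicy:
    """Minimal contextual bandit interface: given a context, pick an arm
    (recommend an item); then learn from the observed reward (click)."""

    def __init__(self, arms):
        self.arms = list(arms)  # items available for recommendation

    def select_arm(self, context):
        # Placeholder strategy: uniform random exploration.
        # A real policy (e.g., LinUCB) would score arms using the context.
        return random.choice(self.arms)

    def update(self, arm, context, reward):
        # A real policy would update its model of reward given (arm, context).
        pass

policy = ContextualBanditPolicy(arms=["item_a", "item_b", "item_c"])
arm = policy.select_arm(context={"age": 30})   # "pull" = recommend an item
policy.update(arm, {"age": 30}, reward=1)      # reward = click (1) or not (0)
```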
  4. Problem Statement
     • Problem setting: we have many different recommendation models (or policies):
       – Different CTR prediction algorithms.
       – Different exploration-exploitation algorithms.
       – Different parameter choices.
     • No data to do model validation.
     • Problem statement: how to build an ensemble model that is close to the best model in the cold-start situation?
  5. How to Ensemble?
     • Classifier ensemble methods do not work in this setting:
       – The recommendation decision is NOT purely based on the predicted CTR.
     • Each individual model only tells us:
       – Which item to recommend.
  6. Ensemble Method
     • Our method:
       – Allocate recommendation chances to the individual models.
     • Problem:
       – Better models should get more chances.
       – We do not know in advance which model is good or bad.
       – Ideal solution: allocate all chances to the best one.
  7. Current Practice: Online Evaluation (or A/B Testing)
     Let π1, π2, …, πm be the individual models.
     1. Deploy π1, π2, …, πm into the online system at the same time.
     2. Dispatch a small percentage of user traffic to each model.
     3. After a period, choose the model with the best CTR as the production model.
  8. Current Practice: Online Evaluation (or A/B Testing) (cont.)
     If we have too many models, this will hurt the performance of the online system.
  9. Our Idea 1 (HyperTS)
     • The CTR of model πi is an unknown random variable, Ri.
     • Goal: maximize (1/N) Σ_{t=1}^{N} rt (the CTR of our ensemble model), where rt is a random number drawn from Rs(t), s(t) = 1, 2, …, or m. For each t = 1, …, N, we decide s(t).
     • Solution:
       – Bernoulli Thompson Sampling (flat prior: Beta(1,1)).
       – π1, π2, …, πm are the bandit arms.
       – No tricky parameters.
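The hyper-bandit layer described above can be sketched as Bernoulli Thompson Sampling with a Beta(1,1) prior on each base policy's CTR. A minimal sketch; the class name and method signatures are illustrative, not from the paper.

```python
import random

class HyperTS:
    """Hyper bandit: Bernoulli Thompson Sampling over base policies pi_1..pi_m.
    Each base policy's unknown CTR R_i gets a Beta(1,1) (uniform) prior."""

    def __init__(self, n_policies):
        self.alpha = [1.0] * n_policies  # 1 + observed clicks for each policy
        self.beta = [1.0] * n_policies   # 1 + observed non-clicks for each policy

    def select_policy(self):
        # Sample a plausible CTR for each base policy from its Beta posterior,
        # then hand the recommendation chance to the best sample.
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, k, clicked):
        # Only the selected policy pi_k receives this feedback (cf. HyperTSFB).
        if clicked:
            self.alpha[k] += 1.0
        else:
            self.beta[k] += 1.0
```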
  10. An Example of HyperTS
     In memory, we keep the estimated CTRs R1, R2, …, Rk, …, Rm for π1, π2, …, πm.
  11. An Example of HyperTS (cont.)
     A user visits; HyperTS selects a candidate model, πk, using the estimated CTRs.
  12. An Example of HyperTS (cont.)
     πk recommends item A to the user (xt: context features).
  13. An Example of HyperTS (cont.)
     The user feedback rt (click or not) is observed; HyperTS updates the estimate of Rk based on rt.
  14. Two-Layer Decision
     Layer 1: Bernoulli Thompson Sampling selects a policy πk from π1, π2, …, πm.
     Layer 2: πk chooses an item (Item A, Item B, Item C, …) to recommend.
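The two-layer loop of slides 10–14 can be sketched end to end as follows. This is a self-contained sketch under assumed interfaces (base policies as callables, a stream of (context, feedback) pairs), not the paper's implementation.

```python
import random

def hyper_ts_serve(base_policies, user_stream):
    """Two-layer decision sketch: a Bernoulli Thompson Sampling layer picks
    which base policy handles each user visit; that policy picks the item.

    base_policies: list of callables, context -> recommended item.
    user_stream: iterable of (context, get_click) where get_click(item)
                 returns the observed click (1) or non-click (0).
    Returns the final Beta parameters (alpha, beta) per base policy."""
    m = len(base_policies)
    alpha, beta = [1.0] * m, [1.0] * m  # Beta(1,1) prior on each policy's CTR
    for context, get_click in user_stream:
        # Layer 1: Thompson sampling over base policies.
        k = max(range(m), key=lambda i: random.betavariate(alpha[i], beta[i]))
        # Layer 2: the chosen base policy recommends an item.
        item = base_policies[k](context)
        clicked = get_click(item)        # observe the user's feedback r_t
        alpha[k] += clicked              # update only pi_k's CTR estimate
        beta[k] += 1 - clicked
    return alpha, beta
```

Over time, policies whose recommendations get clicked accumulate larger alpha values and are sampled more often, which approximates allocating all chances to the best model.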
  15. Our Idea 2 (HyperTSFB)
     • Limitation of the previous idea:
       – For each recommendation, the user feedback is used by only one individual model (e.g., πk).
     • Motivation:
       – Can we update all of R1, R2, …, Rm with every user feedback? (Share every user feedback with every individual model.)
  16. Our Idea 2 (HyperTSFB)
     • Assume each model can output the probability of recommending any item given xt.
       – E.g., for a deterministic recommendation, it is 1 or 0.
     • For a user visit xt:
       1. πk is selected to perform the recommendation (k = 1, 2, …, or m).
       2. Item A is recommended by πk given xt.
       3. Receive the user feedback (click or not click), rt.
       4. Ask every model π1, π2, …, πm: what is the probability of recommending A given xt?
  17. Our Idea 2 (HyperTSFB) (cont.)
     These probabilities let us estimate the CTR of π1, π2, …, πm via importance sampling.
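One way to realize the feedback-sharing idea is a probability-weighted incremental CTR estimate: every policy that could have recommended the shown item gets the feedback, weighted by how likely it was to recommend it. This is a sketch of the importance-sampling idea, not the paper's exact estimator; the function name and argument layout are assumptions.

```python
def hypertsfb_update(estimates, counts, probs, clicked):
    """Share one observed feedback with every base policy.

    estimates[i]: current weighted-mean CTR estimate for policy pi_i.
    counts[i]:    accumulated importance weight for pi_i.
    probs[i]:     probability that pi_i would recommend the shown item
                  given x_t (1 or 0 for deterministic policies).
    clicked:      observed feedback r_t (1 = click, 0 = no click)."""
    for i, p_i in enumerate(probs):
        if p_i > 0:
            # pi_i could have made this recommendation: credit it with the
            # feedback, weighted by p_i (incremental weighted mean update).
            counts[i] += p_i
            estimates[i] += p_i * (clicked - estimates[i]) / counts[i]
    return estimates, counts
```

Under this scheme, even policies that were not selected at step t still refine their CTR estimates from the shared observation.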
  18. Experimental Setup
     • Experimental data
       – Yahoo! Today News data logs (randomly displayed).
       – KDD Cup 2012 online advertising data set.
     • Evaluation methods
       – Yahoo! Today News: replay (see Lihong Li et al.'s WSDM 2011 paper).
       – KDD Cup 2012 data: simulation by a logistic regression model.
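The replay method mentioned above can be sketched in a few lines: because the logged items were displayed uniformly at random, keeping only the events where the evaluated policy agrees with the log yields an unbiased CTR estimate. A minimal sketch with assumed argument shapes, not the evaluator used in the paper.

```python
def replay_evaluate(policy, logged_events):
    """Offline replay evaluation (in the style of Li et al., WSDM 2011).

    Only the logged events where the policy's choice matches the item that
    was actually shown are kept; this is unbiased when the log comes from
    uniformly random display, as with the Yahoo! Today News logs.

    policy: callable, context -> recommended item.
    logged_events: iterable of (context, shown_item, clicked)."""
    matched, clicks = 0, 0
    for context, shown_item, clicked in logged_events:
        if policy(context) == shown_item:  # policy agrees with the log
            matched += 1
            clicks += clicked
    return clicks / matched if matched else 0.0  # estimated CTR
```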
  19. Comparative Methods
     • CTR prediction algorithm
       – Logistic regression
     • Exploitation-exploration algorithms
       – Random, ε-greedy, LinUCB, Softmax, Epoch-greedy, Thompson sampling
     • HyperTS and HyperTSFB
  20. Results for Yahoo! News Data
     • Every 100,000 impressions are aggregated into a bucket.
  21. Results for Yahoo! News Data (cont.)
  22. Conclusions from the Experimental Results
     1. The performance of the baseline exploitation-exploration algorithms is very sensitive to the parameter setting.
        – In a cold-start situation, there is not enough data to tune parameters.
     2. HyperTS and HyperTSFB can come close to the optimal baseline algorithm (with no guarantee of beating the optimal one), even when some bad individual models are included.
     3. For contextual Thompson sampling, performance depends on the choice of prior distribution for the logistic regression.
        – For online Bayesian learning, the posterior distribution approximation is not accurate (we cannot store the past data).
  23. Questions & Thank You
     • Thank you!
     • Questions?
