RecSys 2015: Large-scale real-time product recommendation at Criteo

The challenges of recommending products in real time at Criteo: 1B users, 2B banners a day, and a total of 4B products, optimized for the best ROI.

Published in: Internet

  1. Copyright © 2015 Criteo. Large-Scale Real-Time Product Recommendation at Criteo. Romain Lerallut, Diane Gasselin. RecSys, Vienna, Sept 18, 2015.
  3. "The largest internet company you've never heard of" • Founded in 2005, in the adtech business since 2008 • Recommendation was our first product • Disruptive business models • 1,700 people worldwide (more than 50% with less than a year at the company) • 300+ engineers • 26 offices • Live in 130 countries • 1B unique users
  4. A technology company first and foremost. We buy: • Inventory (ad spaces) • Billions of times a day • All over the Internet • For 95% of the population => Funding the Web. We sell: • Clicks • (that convert) • (that convert a lot) => Delighting our clients. We take the risk: you pay only for what you get.
  5. Learn on huge volumes of data: 10,000 displays
  6. Learn on huge volumes of data: 10,000 displays lead to 50 clicks
  7. Learn on huge volumes of data: 10,000 displays lead to 50 clicks, which lead to 1 sale
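
The funnel on slides 5 to 7 implies very small rates. A quick arithmetic sketch; only the three counts come from the slides, and the variable names are illustrative:

```python
# Counts taken from the slides.
displays, clicks, sales = 10_000, 50, 1

ctr = clicks / displays                 # click-through rate
conversion_rate = sales / clicks        # sales per click
sales_per_display = sales / displays    # sales per banner shown

print(f"CTR: {ctr:.2%}")                              # CTR: 0.50%
print(f"Conversion rate: {conversion_rate:.2%}")      # Conversion rate: 2.00%
print(f"Sales per display: {sales_per_display:.2%}")  # Sales per display: 0.01%
```

With one positive label (a sale) per 10,000 displays, a model needs a very large number of displays before it sees enough sales to learn from, which is the point the slide makes.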
  8. Traffic: 800k HTTP requests / sec (peak activity) • 29,000 impressions / sec (peak activity)
  9. Traffic: 800k HTTP requests / sec (peak activity) • 29,000 impressions / sec (peak activity) • <10 ms to process an RTB request • <100 ms to process a reco request
  10. Physical infrastructure: 7 in-house data centers on 3 continents. Traffic: 800k HTTP requests / sec (peak activity) • 29,000 impressions / sec (peak activity) • <10 ms to process an RTB request • <100 ms to process a reco request
  11. Physical infrastructure: 7 in-house data centers on 3 continents • Big Data: ~15,000 servers, largest Hadoop cluster in Europe • More than 35 PB of storage. Traffic: 800k HTTP requests / sec (peak activity) • 29,000 impressions / sec (peak activity) • <10 ms to process an RTB request • <100 ms to process a reco request
  12. (Big) Data Sources • Ad display data: 20B events / day • User behavior data: 2B events / day • Catalog data: 1M+ products / client, 10k clients
  13. How do we do it?
  14. Recommend products for a user • What we want: reco(user) = products • But 1B users x 3B products! • And we need to scale and keep it fresh • What we can do: • Pre-select products offline (sources) • Refine the recommendation online
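
To see why scoring every user-product pair is off the table, here is a back-of-the-envelope sketch. The user and product counts come from the slide; the cost of one microsecond per scored pair is purely an assumption for illustration:

```python
USERS = 1_000_000_000       # from the slide
PRODUCTS = 3_000_000_000    # from the slide
SECONDS_PER_PAIR = 1e-6     # ASSUMPTION: 1 microsecond to score one pair

pairs = USERS * PRODUCTS
seconds = pairs * SECONDS_PER_PAIR
years = seconds / (365 * 24 * 3600)

print(f"{pairs:.1e} pairs, about {years:,.0f} years on a single core")
```

Even with massive parallelism, the result would be stale long before it finished, hence the split into offline pre-selection of a few candidates per key and online refinement.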
  15. Offline: prepare sources. Diagram labels: Advertiser events • Co-events (Item View – Item View, Item Sale – Item Sale) • Best-of, Best-of by category • Similarities, Complementarities, Top N. Figures shown: 350M keys / 12B values • 50B • 50M keys / 1B values
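
As an illustration of an "Item View – Item View" co-event source, here is a minimal in-memory sketch that counts how often two products are viewed by the same user and keeps the top N neighbours per product. The function name, the input shape, and the raw-count similarity are assumptions; the slides only say that sources are computed by Map-Reduce jobs and do not give the similarity formula.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_top_n(sessions, n=2):
    """Map each product to its n most co-viewed products.

    sessions: iterable of lists of product ids, one list per user (hypothetical input).
    """
    pair_counts = defaultdict(Counter)
    for viewed in sessions:
        # Count each unordered pair once per user, in both directions.
        for a, b in combinations(sorted(set(viewed)), 2):
            pair_counts[a][b] += 1
            pair_counts[b][a] += 1
    return {
        item: [other for other, _ in
               sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for item, counts in pair_counts.items()
    }


sessions = [
    ["shoe_orange", "shoe_red", "sock"],
    ["shoe_orange", "shoe_red"],
    ["shoe_orange", "sock"],
    ["hat"],
]
similar = co_view_top_n(sessions)
print(similar["shoe_red"])  # ['shoe_orange', 'sock']
```

The output is a key-value table (product to candidate list), which matches the "keys / values" vocabulary on the slide; replacing views with sales would give the "Item Sale – Item Sale" variant.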
  16. Offline: prepare sources. User X saw orange shoes. Some candidate products for user X: • Historical • Similar • Complementary • Best-of (other users: most viewed products on the client website)
  17. Reco overview. Diagram labels: Advertiser events • Display, Click, Sale logs • OFFLINE: Source computation (Map-Reduce jobs), Prediction models • Sources • Catalog • Recommendation Service. Figures shown: 12h, 4h, 6h • 4.5B • 500M • 100K qps • 50B
  18. ML model • Logistic regression models because: • They scale • They are fast • They can handle lots of features (with a bit of magic) • Feature families: Product-specific, User-specific, User-product interactions, Display-specific
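
The slide does not say what the "bit of magic" is. One common way to let logistic regression handle very many sparse features is the hashing trick, so the sketch below assumes that technique; it is not confirmed by the slides. The feature names, bucket count, and learning rate are all hypothetical.

```python
import math
import zlib

NUM_BUCKETS = 2 ** 20  # ASSUMPTION: size of the hashed weight space


def hashed_indices(features):
    """Map each 'name=value' feature string to a bucket index."""
    return [zlib.crc32(f"{name}={value}".encode()) % NUM_BUCKETS
            for name, value in features.items()]


def predict(weights, features):
    """Logistic regression prediction over hashed binary features."""
    z = sum(weights.get(i, 0.0) for i in hashed_indices(features))
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, features, label, lr=0.1):
    """One stochastic gradient step on the log loss; returns the prior prediction."""
    p = predict(weights, features)
    for i in hashed_indices(features):
        weights[i] = weights.get(i, 0.0) - lr * (p - label)
    return p


# One feature per family named on the slide (values are made up).
example = {
    "product_category": "shoes",                # product-specific
    "user_country": "FR",                       # user-specific
    "user_country x product_category": "FR x shoes",  # user-product interaction
    "banner_size": "300x250",                   # display-specific
}
weights = {}  # sparse weight vector: bucket index -> weight
for _ in range(20):
    sgd_step(weights, example, label=1)
print(round(predict(weights, example), 3))
```

Crossed features such as the user-product interaction are just additional strings to hash, which keeps the model linear and the prediction cost proportional to the number of active features.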
  19. Online: sources (Similarities, Most viewed, Most bought)
  20. Online: merge of products (Similarities, Most viewed, Most bought)
  21. Online: scoring (Similarities, Most viewed, Most bought). Scores: 0.02, 0.12, 0.06, 0.18, 0.03, 0.05, 0.01, 0.005, 0.011, 0.013, 0.004, 0.007
  22. Online: scoring (Similarities, Most viewed, Most bought). Scores, sorted: 0.18, 0.12, 0.06, 0.05, 0.03, 0.02, 0.013, 0.011, 0.01, 0.007, 0.005, 0.004
  23. Online: candidates. Scores, sorted: 0.18, 0.12, 0.06, 0.05, 0.03, 0.02, 0.013, 0.011, 0.01, 0.007, 0.005, 0.004. Banner mock-up: SHOP x4, -50%
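
The online steps on slides 19 to 23 (fetch sources, merge, score, sort, keep the best) can be sketched as follows. The twelve scores are the ones shown on the slides; the product ids, their assignment to sources, and the banner size of four are hypothetical.

```python
def recommend(sources, score, k=4):
    """Merge candidates from all sources, score each once, return the top k."""
    candidates = set()
    for products in sources.values():
        candidates.update(products)  # the set removes duplicates across sources
    ranked = sorted(candidates, key=lambda p: (-score(p), p))
    return ranked[:k]


# Hypothetical candidates per source; p2 appears twice to show the merge.
sources = {
    "similarities": ["p1", "p2", "p3", "p4"],
    "most_viewed": ["p5", "p6", "p7", "p8", "p2"],
    "most_bought": ["p9", "p10", "p11", "p12"],
}
# Scores from the slide, attached to made-up product ids.
scores = dict(zip(
    [f"p{i}" for i in range(1, 13)],
    [0.02, 0.12, 0.06, 0.18, 0.03, 0.05, 0.01, 0.005, 0.011, 0.013, 0.004, 0.007],
))

print(recommend(sources, scores.get))  # ['p4', 'p2', 'p3', 'p6']
```

In the real system the score would come from the prediction model rather than a lookup table, and all of this has to fit in the sub-100 ms budget quoted earlier.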
  24. Evaluation
  25. The basics: online A/B testing • It is the only truth we have • 50% of users on model A • 50% of users on model B. Banner mock-ups: "My company BUY! BUY! BUY!" (x2)
  26. The basics: online A/B testing • It is the only truth we have • 50% of users on model A • 50% of users on model B • But it is onerous • If the model is not good, we lose money, fast! • Tests are long (~2 weeks needed to get good confidence intervals) • Code has to be production-ready (no bugs, good performance), since we run 24/7 • Can be heavy on the infrastructure • And it does not take long-term effects into account. Banner mock-ups: "My company BUY! BUY! BUY!" (x2)
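
A minimal sketch of the two ingredients named on the slide: a stable 50/50 split of users and a confidence interval on the difference between the two populations. Hash-based assignment and the normal-approximation interval are assumptions chosen for illustration; the slides do not describe Criteo's actual method, and all names and counts below are made up.

```python
import hashlib
import math


def assign_variant(user_id, test_name="model_test"):
    """Deterministically place a user in population A or B (about 50/50)."""
    digest = hashlib.sha256(f"{test_name}:{user_id}".encode()).digest()
    return "A" if digest[0] % 2 == 0 else "B"


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, z=1.96):
    """Approximate 95% interval for rate(B) - rate(A), normal approximation."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Same 0.5% vs 0.6% rates, at two made-up sample sizes.
small = diff_confidence_interval(50, 10_000, 60, 10_000)
large = diff_confidence_interval(5_000, 1_000_000, 6_000, 1_000_000)
print(small)  # interval contains 0: not conclusive yet
print(large)  # interval excludes 0: B is better
```

With rates this low, the interval only stops straddling zero after a great many events, which is one reason a test needs to run for a long time before its result can be trusted.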
  27. The test framework for prediction • ALTERNATIVE: a framework that replays production logs (offline) • 30,000 tests / year • Replay ~x100 • BUT: we only have data on products we display (exploration is costly) • SO: we can only make sure we are not completely mistaken
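
A minimal sketch of what replaying logs can look like: run a candidate model over logged displays and compare a prediction metric against a baseline. The log format, the feature name, and the choice of log loss as the metric are assumptions; the slides do not specify them.

```python
import math


def log_loss(model, logged_displays):
    """Average negative log-likelihood of the logged click outcomes."""
    total = 0.0
    for features, clicked in logged_displays:
        # Clamp to avoid log(0) on over-confident predictions.
        p = min(max(model(features), 1e-12), 1 - 1e-12)
        total += -math.log(p) if clicked else -math.log(1 - p)
    return total / len(logged_displays)


# Hypothetical logged displays: (features, clicked).
logs = [
    ({"position": 1}, 1),
    ({"position": 1}, 1),
    ({"position": 2}, 0),
    ({"position": 1}, 0),
]


def baseline(features):
    return 0.5


def candidate(features):
    return 0.8 if features["position"] == 1 else 0.2


print(log_loss(baseline, logs), log_loss(candidate, logs))
```

As the slide warns, the logs only contain products that were actually displayed, so a better offline metric shows the model predicts logged outcomes well, not that its own recommendations would perform better online.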
  28. Ultimate solution: offline A/B testing • Find the best offline predictor for online performance • "Counterfactual Reasoning and Learning Systems": Léon Bottou (Microsoft Research, Redmond, WA), Jonas Peters (Max Planck Institute, Tübingen), Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson • But we haven't succeeded in making it precisely match reality...
  29. Ultimate solution: offline A/B testing • Find the best offline predictor for online performance • "Counterfactual Reasoning and Learning Systems": Léon Bottou (Microsoft Research, Redmond, WA), Jonas Peters (Max Planck Institute, Tübingen), Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson • But we haven't succeeded in making it precisely match reality... YET
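
One basic estimator in the counterfactual family is inverse propensity scoring: reweight each logged reward by how much more (or less) likely the new policy is to take the logged action. The slides do not say which estimator Criteo tried, so this is only an illustrative sketch; the log format and both policies are made up.

```python
def ips_estimate(logs, new_policy_prob):
    """Estimate the average reward of a new policy from logged data.

    logs: list of (context, action, reward, logging_prob), where logging_prob
          is the probability with which the logging policy chose the action.
    new_policy_prob(context, action): probability under the policy to evaluate.
    """
    total = 0.0
    for context, action, reward, logging_prob in logs:
        weight = new_policy_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


# Hypothetical logs from a policy that picks "a" or "b" uniformly at random.
logs = [
    ("u1", "a", 1.0, 0.5),
    ("u2", "b", 0.0, 0.5),
    ("u3", "a", 1.0, 0.5),
    ("u4", "b", 0.0, 0.5),
]


def always_a(context, action):
    return 1.0 if action == "a" else 0.0


def uniform(context, action):
    return 0.5


print(ips_estimate(logs, always_a))  # 1.0
print(ips_estimate(logs, uniform))   # 0.5
```

The estimator needs the logging policy to have given every relevant action a non-zero probability, which ties back to the earlier remark that data exists only for displayed products and that exploration is costly.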
  30. What's next?
  31. What's next for us: upcoming challenges • Long(er)-term user profiles
  32. What's next for us: upcoming challenges • Long(er)-term user profiles • More and better product information (images, semantics, NLP)
  33. What's next for us: upcoming challenges • Long(er)-term user profiles • More and better product information (images, semantics, NLP) • Instant update of similarities • (because batch computation is soooo last year)
  34. What's next for us: upcoming challenges • Long(er)-term user profiles • More and better product information (images, semantics, NLP) • Instant update of similarities • (because batch computation is soooo last year) • Joint product scoring • (score the full banner rather than each product independently)
  35. What's next for you: fancy a try? With us: http://labs.criteo.com/jobs/ • On your own: we published datasets for click prediction • 4 GB of display-click data: Kaggle challenge in 2014, http://bit.ly/1vgw2XC • 1 TB of display-click data (industry's largest dataset): http://bit.ly/1PyH4Vq • 4 billion observations • 156 billion feature-values • Available on Microsoft Azure • Used by edX (UC Berkeley) • We would be happy to share Recocentric data!
  36. Questions?
  37. Thank you! r.lerallut@criteo.com @Rlerallut • d.gasselin@criteo.com @recsysfr
