Reward Innovation for long-term member satisfaction

Gary Tang, Jiangwei Pan, Henry Wang and Justin Basilico
Goal
Create a personalized homepage to help
members find content to watch and enjoy
that
maximizes long-term member satisfaction.
Long-term satisfaction for Netflix
Member enjoys watching Netflix,
so continues the subscription
and tells their friends about it
Batch learning from bandit feedback
Production policy
● show recommendations to the member
Member gives feedback on recommendations
● immediate: skip/play a show
● long-term: cancel/renew subscription
Goal
● train a policy to maximize the long-term reward
Challenges with long-term retention
● Noisy signal: influenced by external factors
● Insensitive signal: only sensitive for “borderline” members
● Delayed signal: need to wait a long time, and hard to attribute
Proxy reward
Train policy to optimize proxy reward
● highly correlated with long-term
satisfaction
● sensitive to individual
recommendations
Immediate feedback as proxy
● e.g. starting to play a show
Delayed feedback could be more aligned
● e.g. completing a show
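The two kinds of proxy reward above can be sketched as follows. This is only an illustration: the `Impression` schema and both function names are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """One logged recommendation and the member's response (hypothetical schema)."""
    started_play: bool    # immediate feedback, known right after the impression
    completed_show: bool  # delayed feedback, may only be known much later


def immediate_proxy_reward(imp: Impression) -> float:
    # Available quickly, but only loosely tied to long-term satisfaction.
    return 1.0 if imp.started_play else 0.0


def delayed_proxy_reward(imp: Impression) -> float:
    # Possibly better aligned with enjoyment, but slow to observe.
    return 1.0 if imp.completed_show else 0.0
```

A member who starts a show and abandons it gets an immediate reward of 1.0 but a delayed reward of 0.0, which is the gap the delayed proxy is meant to capture.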
Delayed proxy reward
Need to wait a long time to observe the
delayed reward
● cannot be used in training immediately
● can hurt cold-starting of the model
Don’t want to wait? Predict the delayed reward
● use all user actions up to policy training
time
Train policy to maximize predicted reward
Note: predicted reward cannot be used
online as it uses post-recommendation user
actions
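A minimal sketch of predicting a delayed reward at training time, assuming a very simple predictor: completion rates per "progress bucket", estimated from older impressions whose outcome is already known. The bucket feature, the function names, and the count-based model are all assumptions for illustration; the talk does not describe the actual predictor.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fit_completion_rates(history: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Estimate P(completed | progress bucket) from fully observed past impressions."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [completions, total]
    for bucket, completed in history:
        counts[bucket][0] += int(completed)
        counts[bucket][1] += 1
    return {bucket: done / total for bucket, (done, total) in counts.items()}


def predicted_delayed_reward(
    already_completed: bool,
    progress_bucket: str,
    rates: Dict[str, float],
    prior: float = 0.0,
) -> float:
    """Reward used for training when the true delayed outcome is not yet known."""
    if already_completed:
        return 1.0  # outcome already observed, no prediction needed
    # progress_bucket summarizes the member's actions AFTER the recommendation,
    # so this value exists only at training time, never at serving time.
    return rates.get(progress_bucket, prior)
```

Because the input features are post-recommendation actions, the prediction serves as a training label rather than as an online model input.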
The ideas are not new
Related work:
● Long-term optimization in recommenders
● Reward shaping in reinforcement learning
● Online reward optimization
We focus on reward innovation as an important
product development workstream at Netflix
Integrating reward in bandit
Reward component
provides the objective
of bandit policy
training data
Reward innovation
● Ideation: what aspect of long-term satisfaction hasn’t been
captured as a reward?
○ Requires balancing perspectives of ML, business, and
psychology. Not easy!
● Development: how do we compute this new reward for every
recommended item?
○ Collecting immediate feedback, predicting delayed feedback
● Evaluation: is this a good reward for the bandit
policy to maximize?
Reward infrastructure
Common development patterns
● Computing immediate proxy
rewards
● Predicting delayed proxy rewards
● Combining multiple rewards
● Sharing rewards across multiple
recommender components
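One simple way to combine multiple rewards is a weighted sum, sketched below. The linear form, the reward names, and the weights are assumptions for illustration; the talk does not say how rewards are combined.

```python
from typing import Dict


def combine_rewards(rewards: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of named proxy rewards; raises KeyError if a weight is missing."""
    return sum(weights[name] * value for name, value in rewards.items())


# Hypothetical example: the member started the show but did not complete it.
example = combine_rewards(
    {"play": 1.0, "complete": 0.0},
    {"play": 0.3, "complete": 0.7},
)
```

Keeping rewards as named components makes it easier to share the same definitions across several recommender components and to re-weight them per use case.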
Reward evaluation
● Online testing is expensive (time, resources)
● Use offline evaluation to determine promising
reward candidates for online testing
● Challenge: hard to compare policies trained with
different rewards
Offline reward evaluation
● compare policies along multiple
reward axes using OPE
● choose a small number of candidate
policies on the Pareto front
● compare candidate policies online
using long-term user satisfaction
metrics
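The first two steps can be sketched as below, assuming inverse propensity scoring (IPS) as the off-policy estimator. IPS is one common OPE choice; the slides do not say which estimator is used, and the log schema and function names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def ips_estimate(logs: List[dict], target_prob: Callable, reward_key: str) -> float:
    """IPS estimate of a target policy's average reward from logged bandit feedback.

    Each log row needs 'context', 'action', 'logging_prob' (> 0), and the reward field.
    """
    total = 0.0
    for row in logs:
        weight = target_prob(row["context"], row["action"]) / row["logging_prob"]
        total += weight * row[reward_key]
    return total / len(logs)


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Policies not dominated on any reward axis (higher is better on every axis)."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return sorted(
        name
        for name, score in scores.items()
        if not any(
            dominates(other, score)
            for other_name, other in scores.items()
            if other_name != name
        )
    )
```

Each candidate policy is scored once per reward axis with `ips_estimate`, and only the policies returned by `pareto_front` would go on to online testing.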
Practical learnings
● Reward normalization: dynamic range of reward
can affect SGD training dynamics
● Reward features: pairing a reward with a correlated
feature tends to improve the model’s ability to
optimize that reward
● Reward alignment: make different parts of the
overall recommender system “point in the same
direction”
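As an illustration of reward normalization, the sketch below standardizes rewards to zero mean and unit variance before training. Standardization is an assumed choice; the talk only notes that the reward's dynamic range matters and does not name a scheme.

```python
import statistics
from typing import List


def normalize_rewards(rewards: List[float]) -> List[float]:
    """Standardize rewards so their scale does not dominate gradient magnitudes."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All rewards are identical: no signal to rescale.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

In practice the mean and standard deviation would be computed on training data and reused unchanged, so that rewards from different batches stay on the same scale.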
Summary
● Proxy rewards: we can train bandit policies using
proxy rewards to optimize long-term member
satisfaction
● Art and science: coming up with good reward
hypotheses takes both
● Supporting infrastructure: we develop
infrastructure to help iterate on new hypotheses
quickly
Open challenges
● Proxy rewards: how can we identify proxy rewards that are
aligned with long-term satisfaction in a more principled
way?
● Reinforcement learning: can we use reinforcement
learning to optimize long-term reward directly for
recommendations?
Thank you!
Questions?
Jiangwei Pan
jpan@netflix.com