SlideShare una empresa de Scribd logo
1 de 24
Descargar para leer sin conexión
ICML’11 Tutorial:
Recommender Problems for
       Web Applications
Deepak Agarwal and Bee-Chung Chen
                   Yahoo! Research
Other Significant Y! Labs Contributors

• Content Targeting
    –   Pradheep Elango
    –   Rajiv Khanna
    –   Raghu Ramakrishnan
    –   Xuanhui Wang
    –   Liang Zhang


• Ad Targeting
    – Nagaraj Kota




  Deepak Agarwal & Bee-Chung Chen @ ICML’11   2
Agenda

• Topic of Interest
    – Recommender problems for dynamic, time-sensitive applications
                                 dynamic
       • Content Optimization (main focus today), Online Advertising,
         Movie recommendation, shopping,…
• Introduction (20 min, Deepak)
• Offline components (40 min, Deepak)
    – Regression, Collaborative filtering (CF), …
• Online components + initialization (70 min, Bee-Chung)
    – Time-series, online/incremental methods, explore/exploit (bandit)
• Evaluation methods + Multi-Objective (10-15 min, Deepak)
• Challenges (5-10 min, Deepak)


  Deepak Agarwal & Bee-Chung Chen @ ICML’11                           3
Three components we will focus on today

• Defining the problem
    – Formulate objectives whose optimization achieves some long-
      term goals for the recommender system
         • E.g. How to serve content to optimize audience reach and engagement,
           optimize some combination of engagement and revenue ?

• Modeling (to estimate some critical inputs)
    – Predict rates of some positive user interaction(s) with items based
      on data obtained from historical user-item interactions
       • E.g. Click rates, average time-spent on page, etc
       • Could be explicit feedback like ratings
• Experimentation
    – Create experiments to collect data proactively to improve models,
      helps in converging to the best choice(s) cheaply and rapidly.
       • Explore and Exploit (continuous experimentation)
       • DOE (testing hypotheses by avoiding bias inherent in data)
  Deepak Agarwal & Bee-Chung Chen @ ICML’11                                       4
Modern Recommendation Systems

• Goal
    – Serve the right item to a user in a given context to optimize long-
      term business objectives
• A scientific discipline that involves
    – Large scale Machine Learning & Statistics
         • Offline Models (capture global & stable characteristics)
         • Online Models (incorporates dynamic components)
         • Explore/Exploit (active and adaptive experimentation)
    – Multi-Objective Optimization
         • Click-rates (CTR), Engagement, advertising revenue, diversity, etc
    – Inferring user interest
         • Constructing User Profiles
    – Natural Language Processing to understand content
         • Topics, “aboutness”, entities, follow-up of something, breaking news,…


  Deepak Agarwal & Bee-Chung Chen @ ICML’11                                     5
Some examples from content optimization

• Simple version
    – I have a content module on my page, content inventory is obtained
      from a third party source which is further refined through editorial
      oversight. Can I algorithmically recommend content on this
      module? I want to improve overall click-rate (CTR) on this module
• More advanced
    – I got X% lift in CTR. But I have additional information on other
      downstream utilities (e.g. advertising revenue). Can I increase
      downstream utility without losing too many clicks?
• Highly advanced
    – There are multiple modules running on my webpage. How do I
      perform a simultaneous optimization?




  Deepak Agarwal & Bee-Chung Chen @ ICML’11                              6
Recommend search queries

                                              Recommend packages:
                                                Image
                                                Title, summary
                                                Links to other pages

                                              Pick 4 out of a pool of K
                                                K = 20 ~ 40
                                                Dynamic

                                              Routes traffic other pages

                     Recommend applications   Recommend news article
Deepak Agarwal & Bee-Chung Chen @ ICML’11                                  7
Problems in this example

• Optimize CTR on multiple modules
    – Today Module, Trending Now, Personal Assistant, News
    – Simple solution: Treat modules as independent, optimize
      separately. May not be the best when there are strong correlations.



• For any single module
    – Optimize some combination of CTR, downstream engagement,
      and perhaps advertising revenue.




  Deepak Agarwal & Bee-Chung Chen @ ICML’11                           8
Online Advertising

                            Response rates
                            (click, conversion, ad-view)
                                                                            Bids
     conversion
                          ML /Statistical                  Auction
                          model




                                                                                        Advertisers
                                                Select argmax f(bid,response rates)
                                Click
                                              Recommend
                             Ads
                                                Best ad(s)
                                                                Ad Network
                            Page
   User                                                           •Examples:
                                                             Yahoo, Google, MSN, …
                                                            Ad exchanges (RightMedia,
                                                                 DoubleClick, …)

                             Publisher
  Deepak Agarwal & Bee-Chung Chen @ ICML’11                                                  9
Recommender problems in general

  • Example applications
      • Search: Web, Vertical
      • Online Advertising                           Item Inventory
      • Content                                      Articles, web page,
      •…..                                                  ads, …

                   Context
                      query, page, …
                                               Use an automated algorithm
                                                 to select item(s) to show

                                             Get feedback (click, time spent,..)
        USER
                                                    Refine the models

                                              Repeat (large number of times)
                                               Optimize metric(s) of interest
                                              (Total clicks, Total revenue,…)


 Deepak Agarwal & Bee-Chung Chen @ ICML’11                                         10
Important Factors

  • Items: Articles, ads, modules, movies, users, updates, etc.

  • Context: query keywords, pages, mobile, social media, etc.

  • Metric to optimize (e.g., relevance score, CTR, revenue, engagement)
      – Currently, most applications are single-objective
      – Could be multi-objective optimization (maximize X subject to Y, Z,..)


  • Properties of the item pool
      – Size (e.g., all web pages vs. 40 stories)
      – Quality of the pool (e.g., anything vs. editorially selected)
      – Lifetime (e.g., mostly old items vs. mostly new items)




   Deepak Agarwal & Bee-Chung Chen @ ICML’11                                    11
Factors affecting Solution (continued)

• Properties of the context
    – Pull: Specified by explicit, user-driven query (e.g., keywords, a form)
    – Push: Specified by implicit context (e.g., a page, a user, a session)
       • Most applications are somewhere on continuum of pull and push

• Properties of the feedback on the matches made
    – Types and semantics of feedback (e.g., click, vote)
    – Latency (e.g., available in 5 minutes vs. 1 day)
    – Volume (e.g., 100K per day vs. 300M per day)

• Constraints specifying legitimate matches
    – e.g., business rules, diversity rules, editorial Voice
    – Multiple objectives
• Available Metadata (e.g., link graph, various user/item attributes)


  Deepak Agarwal & Bee-Chung Chen @ ICML’11                                     12
Predicting User-Item Interactions (e.g. CTR)

• Myth: We have so much data on the web, if we can only
  process it the problem is solved
    – Number of things to learn increases with sample size
       • Rate of increase is not slow
    – Dynamic nature of systems make things worse
    – We want to learn things quickly and react fast


• Data is sparse in web recommender problems
    – We lack enough data to learn all we want to learn and as quickly
      as we would like to learn
    – Several Power laws interacting with each other
       • E.g. User visits power law, items served power law
               – Bivariate Zipf: Owen & Dyer, 2011


  Deepak Agarwal & Bee-Chung Chen @ ICML’11                              13
Can Machine Learning help?

• Fortunately, there are group behaviors that generalize to
  individuals & they are relatively stable
    – E.g. Users in San Francisco tend to read more baseball news


• Key issue: Estimating such groups
    – Coarse group : more stable but does not generalize that well.
    – Granular group: less stable with few individuals
    – Getting a good grouping structure is to hit the “sweet spot”


• Another big advantage on the web
    – Intervene and run small experiments on a small population to
      collect data that helps rapid convergence to the best choices(s)
         • We don’t need to learn all user-item interactions, only those that are good.


  Deepak Agarwal & Bee-Chung Chen @ ICML’11                                               14
Predicting user-item interaction rates

                                         Feature construction
                               Content: IR, clustering, taxonomy, entity,..
                             User profiles: clicks, views, social, community,..




             Offline                                                      Online
( Captures stable characteristics       Initialize                   (Finer resolution
     at coarse resolutions)                                             Corrections)
    (Logistic, Boosting,….)                                          (item, user level)
                                                                     (Quick updates)



                                        Explore/Exploit
                                     (Adaptive sampling)
                                  (helps rapid convergence
                                       to best choices)

    Deepak Agarwal & Bee-Chung Chen @ ICML’11                                             15
Post-click: An example in Content Optimization

Recommender                •
                           EDITORIAL
                                                AD SERVER

               Clicks on FP links influence       DISPLAY
               downstream supply distribution
content                                         ADVERTISING Revenue




                                                  Downstream
                                                  engagement

                                                  (Time spent)




   Deepak Agarwal & Bee-Chung Chen @ ICML’11                     16
Serving Content on Front Page: Click Shaping

• What do we want to optimize?
•   Current: Maximize clicks (maximize downstream supply from FP)
•   But consider the following
      – Article 1: CTR=5%, utility per click = 5
      – Article 2: CTR=4.9%, utility per click=10
         • By promoting 2, we lose 1 click/100 visits, gain 5 utils
•   If we do this for a large number of visits --- lose some clicks but obtain
    significant gains in utility?
      – E.g. lose 5% relative CTR, gain 40% in utility (revenue, engagement, etc)




    Deepak Agarwal & Bee-Chung Chen @ ICML’11                                   17
Example Application:
      Today Module on Yahoo! Homepage

Currently in production, powered by some methods
                           discussed in this tutorial
Recommend packages:
                                                      Image
                                                      Title, summary
                      1        2            3   4     Links to other pages

                                                    Pick 4 out of a pool of K
                                                      K = 20 ~ 40
                                                      Dynamic

                                                    Routes traffic other pages



Deepak Agarwal & Bee-Chung Chen @ ICML’11                                        19
Problem definition

• Display “best” articles for each user visit
• Best - Maximize User Satisfaction, Engagement
    – BUT Hard to obtain quick feedback to measure these


• Approximation
    – Maximize utility based on immediate feedback (click rate) subject
      to constraints (relevance, freshness, diversity)
• Inventory of articles?
    – Created by human editors
    – Small pool (30-50 articles) but refreshes periodically




  Deepak Agarwal & Bee-Chung Chen @ ICML’11                           20
Where are we today?

• Before this research
    – Articles created and selected for display by editors
• After this research
    – Article placement done through statistical models
• How successful ?
 "Just look at our homepage, for example. Since we began pairing our content
   optimization technology with editorial expertise, we've seen click-through rates
   in the Today module more than double. ----- Carol Bartz, CEO Yahoo! Inc (Q4,
   2009)




  Deepak Agarwal & Bee-Chung Chen @ ICML’11                                     21
Main Goals

• Methods to select most popular articles
    – This was done by editors before


• Provide personalized article selection
    – Based on user covariates
    – Based on per user behavior


• Scalability: Methods to generalize in small traffic scenarios
    – Today module part of most Y! portals around the world
    – Also syndicated to sources like Y! Mail, Y! IM etc




  Deepak Agarwal & Bee-Chung Chen @ ICML’11                   22
Similar applications

•   Goal: Use same methods for selecting most popular, personalization
    across different applications at Y!
•   Good news! Methods generalize, already in use




    Deepak Agarwal & Bee-Chung Chen @ ICML’11                        23
Next few hours



                                   Most Popular          Personalized
                                   Recommendation        Recommendation

  Offline Models                                         Collaborative filtering
                                                         (cold-start problem)

  Online Models                    Time-series models    Incremental CF,
                                                         online regression

  Intelligent Initialization       Prior estimation      Prior estimation,
                                                         dimension reduction

  Explore/Exploit                  Multi-armed bandits   Bandits with covariates




  Deepak Agarwal & Bee-Chung Chen @ ICML’11                                        24

Más contenido relacionado

Similar a Recommender Systems Tutorial (Part 1) -- Introduction

Recommendation engines matching items to users
Recommendation engines matching items to usersRecommendation engines matching items to users
Recommendation engines matching items to usersFlytxt
 
Scalable advertising recommender systems
Scalable advertising recommender systemsScalable advertising recommender systems
Scalable advertising recommender systemsJoaquin Delgado PhD.
 
Aiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionAiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionDeepak Agarwal
 
Preference Elicitation Interface
Preference Elicitation InterfacePreference Elicitation Interface
Preference Elicitation Interface晓愚 孟
 
Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedBetclic Everest Group Tech Team
 
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...Christian Posse
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesPhilipp Klöckner
 
L'Oreal Tech Talk
L'Oreal Tech TalkL'Oreal Tech Talk
L'Oreal Tech TalkDoug Chang
 
Utilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerceUtilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerceLiangjie Hong
 
Design the Search Experience
Design the Search ExperienceDesign the Search Experience
Design the Search ExperienceMarianne Sweeny
 
Data Science for Digital Commerce
Data Science for Digital CommerceData Science for Digital Commerce
Data Science for Digital CommerceManish Gupta, Ph.D.
 
Sept 20 2012 ona show me the numbers
Sept 20 2012 ona  show me the numbersSept 20 2012 ona  show me the numbers
Sept 20 2012 ona show me the numbersHack the Hood
 
Conversion Rate Optmization
Conversion Rate OptmizationConversion Rate Optmization
Conversion Rate OptmizationEdureka!
 
Narrative Mind Week 6 H4D Stanford 2016
Narrative Mind Week 6 H4D Stanford 2016Narrative Mind Week 6 H4D Stanford 2016
Narrative Mind Week 6 H4D Stanford 2016Stanford University
 
Business model innovation in the cloud v1
Business model innovation in the cloud v1Business model innovation in the cloud v1
Business model innovation in the cloud v1Michael Netzley, Ph.D.
 
Social Media Use Cases - Webinar
Social Media Use Cases - Webinar Social Media Use Cases - Webinar
Social Media Use Cases - Webinar Mechanical Turk
 
Digital analytics: Optimization (Lecture 10)
Digital analytics: Optimization (Lecture 10)Digital analytics: Optimization (Lecture 10)
Digital analytics: Optimization (Lecture 10)Joni Salminen
 
Pubcon Presentation on How to Resolve Search Engine Reputation Problems
Pubcon Presentation on How to Resolve Search Engine Reputation ProblemsPubcon Presentation on How to Resolve Search Engine Reputation Problems
Pubcon Presentation on How to Resolve Search Engine Reputation ProblemsOnline Reputation Management
 

Similar a Recommender Systems Tutorial (Part 1) -- Introduction (20)

Recommendation engines matching items to users
Recommendation engines matching items to usersRecommendation engines matching items to users
Recommendation engines matching items to users
 
kdd2015
kdd2015kdd2015
kdd2015
 
Scalable advertising recommender systems
Scalable advertising recommender systemsScalable advertising recommender systems
Scalable advertising recommender systems
 
Aiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversionAiinpractice2017deepaklongversion
Aiinpractice2017deepaklongversion
 
Preference Elicitation Interface
Preference Elicitation InterfacePreference Elicitation Interface
Preference Elicitation Interface
 
Mini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation DemystifiedMini-training: Personalization & Recommendation Demystified
Mini-training: Personalization & Recommendation Demystified
 
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw... Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
Key Lessons Learned Building Recommender Systems for Large-Scale Social Netw...
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge Websites
 
L'Oreal Tech Talk
L'Oreal Tech TalkL'Oreal Tech Talk
L'Oreal Tech Talk
 
Utilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerceUtilizing Marginal Net Utility for Recommendation in E-commerce
Utilizing Marginal Net Utility for Recommendation in E-commerce
 
Design the Search Experience
Design the Search ExperienceDesign the Search Experience
Design the Search Experience
 
Data Science for Digital Commerce
Data Science for Digital CommerceData Science for Digital Commerce
Data Science for Digital Commerce
 
Sept 20 2012 ona show me the numbers
Sept 20 2012 ona  show me the numbersSept 20 2012 ona  show me the numbers
Sept 20 2012 ona show me the numbers
 
SEO + MTurk
SEO + MTurk SEO + MTurk
SEO + MTurk
 
Conversion Rate Optmization
Conversion Rate OptmizationConversion Rate Optmization
Conversion Rate Optmization
 
Narrative Mind Week 6 H4D Stanford 2016
Narrative Mind Week 6 H4D Stanford 2016Narrative Mind Week 6 H4D Stanford 2016
Narrative Mind Week 6 H4D Stanford 2016
 
Business model innovation in the cloud v1
Business model innovation in the cloud v1Business model innovation in the cloud v1
Business model innovation in the cloud v1
 
Social Media Use Cases - Webinar
Social Media Use Cases - Webinar Social Media Use Cases - Webinar
Social Media Use Cases - Webinar
 
Digital analytics: Optimization (Lecture 10)
Digital analytics: Optimization (Lecture 10)Digital analytics: Optimization (Lecture 10)
Digital analytics: Optimization (Lecture 10)
 
Pubcon Presentation on How to Resolve Search Engine Reputation Problems
Pubcon Presentation on How to Resolve Search Engine Reputation ProblemsPubcon Presentation on How to Resolve Search Engine Reputation Problems
Pubcon Presentation on How to Resolve Search Engine Reputation Problems
 

Último

Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024SkyPlanner
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPathCommunity
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesMd Hossain Ali
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfDianaGray10
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxUdaiappa Ramachandran
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDELiveplex
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-pyJamie (Taka) Wang
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfDaniel Santiago Silva Capera
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024D Cloud Solutions
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureEric D. Schabell
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1DianaGray10
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Brian Pichman
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7DianaGray10
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXTarek Kalaji
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxMatsuo Lab
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdfPedro Manuel
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 

Último (20)

Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024Salesforce Miami User Group Event - 1st Quarter 2024
Salesforce Miami User Group Event - 1st Quarter 2024
 
UiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation DevelopersUiPath Community: AI for UiPath Automation Developers
UiPath Community: AI for UiPath Automation Developers
 
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just MinutesAI Fame Rush Review – Virtual Influencer Creation In Just Minutes
AI Fame Rush Review – Virtual Influencer Creation In Just Minutes
 
20230104 - machine vision
20230104 - machine vision20230104 - machine vision
20230104 - machine vision
 
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdfUiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
UiPath Solutions Management Preview - Northern CA Chapter - March 22.pdf
 
Building AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptxBuilding AI-Driven Apps Using Semantic Kernel.pptx
Building AI-Driven Apps Using Semantic Kernel.pptx
 
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDEADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
ADOPTING WEB 3 FOR YOUR BUSINESS: A STEP-BY-STEP GUIDE
 
20230202 - Introduction to tis-py
20230202 - Introduction to tis-py20230202 - Introduction to tis-py
20230202 - Introduction to tis-py
 
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdfIaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
IaC & GitOps in a Nutshell - a FridayInANuthshell Episode.pdf
 
Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024Artificial Intelligence & SEO Trends for 2024
Artificial Intelligence & SEO Trends for 2024
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
OpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability AdventureOpenShift Commons Paris - Choose Your Own Observability Adventure
OpenShift Commons Paris - Choose Your Own Observability Adventure
 
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1UiPath Platform: The Backend Engine Powering Your Automation - Session 1
UiPath Platform: The Backend Engine Powering Your Automation - Session 1
 
Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )Building Your Own AI Instance (TBLC AI )
Building Your Own AI Instance (TBLC AI )
 
UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7UiPath Studio Web workshop series - Day 7
UiPath Studio Web workshop series - Day 7
 
VoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBXVoIP Service and Marketing using Odoo and Asterisk PBX
VoIP Service and Marketing using Odoo and Asterisk PBX
 
Introduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptxIntroduction to Matsuo Laboratory (ENG).pptx
Introduction to Matsuo Laboratory (ENG).pptx
 
Nanopower In Semiconductor Industry.pdf
Nanopower  In Semiconductor Industry.pdfNanopower  In Semiconductor Industry.pdf
Nanopower In Semiconductor Industry.pdf
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 

Recommender Systems Tutorial (Part 1) -- Introduction

  • 1. ICML’11 Tutorial: Recommender Problems for Web Applications Deepak Agarwal and Bee-Chung Chen Yahoo! Research
  • 2. Other Significant Y! Labs Contributors • Content Targeting – Pradheep Elango – Rajiv Khanna – Raghu Ramakrishnan – Xuanhui Wang – Liang Zhang • Ad Targeting – Nagaraj Kota Deepak Agarwal & Bee-Chung Chen @ ICML’11 2
  • 3. Agenda • Topic of Interest – Recommender problems for dynamic, time-sensitive applications dynamic • Content Optimization (main focus today), Online Advertising, Movie recommendation, shopping,… • Introduction (20 min, Deepak) • Offline components (40 min, Deepak) – Regression, Collaborative filtering (CF), … • Online components + initialization (70 min, Bee-Chung) – Time-series, online/incremental methods, explore/exploit (bandit) • Evaluation methods + Multi-Objective (10-15 min, Deepak) • Challenges (5-10 min, Deepak) Deepak Agarwal & Bee-Chung Chen @ ICML’11 3
  • 4. Three components we will focus on today • Defining the problem – Formulate objectives whose optimization achieves some long- term goals for the recommender system • E.g. How to serve content to optimize audience reach and engagement, optimize some combination of engagement and revenue ? • Modeling (to estimate some critical inputs) – Predict rates of some positive user interaction(s) with items based on data obtained from historical user-item interactions • E.g. Click rates, average time-spent on page, etc • Could be explicit feedback like ratings • Experimentation – Create experiments to collect data proactively to improve models, helps in converging to the best choice(s) cheaply and rapidly. • Explore and Exploit (continuous experimentation) • DOE (testing hypotheses by avoiding bias inherent in data) Deepak Agarwal & Bee-Chung Chen @ ICML’11 4
  • 5. Modern Recommendation Systems • Goal – Serve the right item to a user in a given context to optimize long- term business objectives • A scientific discipline that involves – Large scale Machine Learning & Statistics • Offline Models (capture global & stable characteristics) • Online Models (incorporates dynamic components) • Explore/Exploit (active and adaptive experimentation) – Multi-Objective Optimization • Click-rates (CTR), Engagement, advertising revenue, diversity, etc – Inferring user interest • Constructing User Profiles – Natural Language Processing to understand content • Topics, “aboutness”, entities, follow-up of something, breaking news,… Deepak Agarwal & Bee-Chung Chen @ ICML’11 5
  • 6. Some examples from content optimization • Simple version – I have a content module on my page, content inventory is obtained from a third party source which is further refined through editorial oversight. Can I algorithmically recommend content on this module? I want to improve overall click-rate (CTR) on this module • More advanced – I got X% lift in CTR. But I have additional information on other downstream utilities (e.g. advertising revenue). Can I increase downstream utility without losing too many clicks? • Highly advanced – There are multiple modules running on my webpage. How do I perform a simultaneous optimization? Deepak Agarwal & Bee-Chung Chen @ ICML’11 6
  • 7. Recommend search queries Recommend packages: Image Title, summary Links to other pages Pick 4 out of a pool of K K = 20 ~ 40 Dynamic Routes traffic other pages Recommend applications Recommend news article Deepak Agarwal & Bee-Chung Chen @ ICML’11 7
  • 8. Problems in this example • Optimize CTR on multiple modules – Today Module, Trending Now, Personal Assistant, News – Simple solution: Treat modules as independent, optimize separately. May not be the best when there are strong correlations. • For any single module – Optimize some combination of CTR, downstream engagement, and perhaps advertising revenue. Deepak Agarwal & Bee-Chung Chen @ ICML’11 8
  • 9. Online Advertising Response rates (click, conversion, ad-view) Bids conversion ML /Statistical Auction model Advertisers Select argmax f(bid,response rates) Click Recommend Ads Best ad(s) Ad Network Page User •Examples: Yahoo, Google, MSN, … Ad exchanges (RightMedia, DoubleClick, …) Publisher Deepak Agarwal & Bee-Chung Chen @ ICML’11 9
  • 10. Recommender problems in general • Example applications • Search: Web, Vertical • Online Advertising Item Inventory • Content Articles, web page, •….. ads, … Context query, page, … Use an automated algorithm to select item(s) to show Get feedback (click, time spent,..) USER Refine the models Repeat (large number of times) Optimize metric(s) of interest (Total clicks, Total revenue,…) Deepak Agarwal & Bee-Chung Chen @ ICML’11 10
  • 11. Important Factors • Items: Articles, ads, modules, movies, users, updates, etc. • Context: query keywords, pages, mobile, social media, etc. • Metric to optimize (e.g., relevance score, CTR, revenue, engagement) – Currently, most applications are single-objective – Could be multi-objective optimization (maximize X subject to Y, Z,..) • Properties of the item pool – Size (e.g., all web pages vs. 40 stories) – Quality of the pool (e.g., anything vs. editorially selected) – Lifetime (e.g., mostly old items vs. mostly new items) Deepak Agarwal & Bee-Chung Chen @ ICML’11 11
  • 12. Factors affecting Solution (continued) • Properties of the context – Pull: Specified by explicit, user-driven query (e.g., keywords, a form) – Push: Specified by implicit context (e.g., a page, a user, a session) • Most applications are somewhere on continuum of pull and push • Properties of the feedback on the matches made – Types and semantics of feedback (e.g., click, vote) – Latency (e.g., available in 5 minutes vs. 1 day) – Volume (e.g., 100K per day vs. 300M per day) • Constraints specifying legitimate matches – e.g., business rules, diversity rules, editorial Voice – Multiple objectives • Available Metadata (e.g., link graph, various user/item attributes) Deepak Agarwal & Bee-Chung Chen @ ICML’11 12
  • 13. Predicting User-Item Interactions (e.g. CTR) • Myth: We have so much data on the web, if we can only process it the problem is solved – Number of things to learn increases with sample size • Rate of increase is not slow – Dynamic nature of systems make things worse – We want to learn things quickly and react fast • Data is sparse in web recommender problems – We lack enough data to learn all we want to learn and as quickly as we would like to learn – Several Power laws interacting with each other • E.g. User visits power law, items served power law – Bivariate Zipf: Owen & Dyer, 2011 Deepak Agarwal & Bee-Chung Chen @ ICML’11 13
  • 14. Can Machine Learning help? • Fortunately, there are group behaviors that generalize to individuals & they are relatively stable – E.g. Users in San Francisco tend to read more baseball news • Key issue: Estimating such groups – Coarse group : more stable but does not generalize that well. – Granular group: less stable with few individuals – Getting a good grouping structure is to hit the “sweet spot” • Another big advantage on the web – Intervene and run small experiments on a small population to collect data that helps rapid convergence to the best choices(s) • We don’t need to learn all user-item interactions, only those that are good. Deepak Agarwal & Bee-Chung Chen @ ICML’11 14
  • 15. Predicting user-item interaction rates Feature construction Content: IR, clustering, taxonomy, entity,.. User profiles: clicks, views, social, community,.. Offline Online ( Captures stable characteristics Initialize (Finer resolution at coarse resolutions) Corrections) (Logistic, Boosting,….) (item, user level) (Quick updates) Explore/Exploit (Adaptive sampling) (helps rapid convergence to best choices) Deepak Agarwal & Bee-Chung Chen @ ICML’11 15
  • 16. Post-click: An example in Content Optimization Recommender • EDITORIAL AD SERVER Clicks on FP links influence DISPLAY downstream supply distribution content ADVERTISING Revenue Downstream engagement (Time spent) Deepak Agarwal & Bee-Chung Chen @ ICML’11 16
  • 17. Serving Content on Front Page: Click Shaping • What do we want to optimize? • Current: Maximize clicks (maximize downstream supply from FP) • But consider the following – Article 1: CTR=5%, utility per click = 5 – Article 2: CTR=4.9%, utility per click=10 • By promoting 2, we lose 1 click/100 visits, gain 5 utils • If we do this for a large number of visits --- lose some clicks but obtain significant gains in utility? – E.g. lose 5% relative CTR, gain 40% in utility (revenue, engagement, etc) Deepak Agarwal & Bee-Chung Chen @ ICML’11 17
  • 18. Example Application: Today Module on Yahoo! Homepage Currently in production, powered by some methods discussed in this tutorial
  • 19. Recommend packages: Image Title, summary 1 2 3 4 Links to other pages Pick 4 out of a pool of K K = 20 ~ 40 Dynamic Routes traffic other pages Deepak Agarwal & Bee-Chung Chen @ ICML’11 19
  • 20. Problem definition • Display “best” articles for each user visit • Best - Maximize User Satisfaction, Engagement – BUT Hard to obtain quick feedback to measure these • Approximation – Maximize utility based on immediate feedback (click rate) subject to constraints (relevance, freshness, diversity) • Inventory of articles? – Created by human editors – Small pool (30-50 articles) but refreshes periodically Deepak Agarwal & Bee-Chung Chen @ ICML’11 20
  • 21. Where are we today? • Before this research – Articles created and selected for display by editors • After this research – Article placement done through statistical models • How successful ? "Just look at our homepage, for example. Since we began pairing our content optimization technology with editorial expertise, we've seen click-through rates in the Today module more than double. ----- Carol Bartz, CEO Yahoo! Inc (Q4, 2009) Deepak Agarwal & Bee-Chung Chen @ ICML’11 21
  • 22. Main Goals • Methods to select most popular articles – This was done by editors before • Provide personalized article selection – Based on user covariates – Based on per user behavior • Scalability: Methods to generalize in small traffic scenarios – Today module part of most Y! portals around the world – Also syndicated to sources like Y! Mail, Y! IM etc Deepak Agarwal & Bee-Chung Chen @ ICML’11 22
  • 23. Similar applications • Goal: Use same methods for selecting most popular, personalization across different applications at Y! • Good news! Methods generalize, already in use Deepak Agarwal & Bee-Chung Chen @ ICML’11 23
  • 24. Next few hours Most Popular Personalized Recommendation Recommendation Offline Models Collaborative filtering (cold-start problem) Online Models Time-series models Incremental CF, online regression Intelligent Initialization Prior estimation Prior estimation, dimension reduction Explore/Exploit Multi-armed bandits Bandits with covariates Deepak Agarwal & Bee-Chung Chen @ ICML’11 24

Notas del editor

  1. This only shows one scenario; that of content match. Let’s add Sponsored Search (Replace Content with Query) and Have a new slide for display advertising. This also does not provide info for the revenue model (shall we add it here or later).