When Relevance is not Enough:
Promoting Diversity and Freshness in
Personalized Question
Recommendation
IDAN SZPEKTOR, YOELLE MAAREK, DAN PELLEG

YAHOO! RESEARCH
ABSTRACT
A good question recommendation system should be:
1. Designed around answerers, rather than exclusively for askers
2. Able to scale to many questions and users, and fast enough
3. Relevant to each user's interests
4. Diverse
INTRODUCTION
Common approach: route questions only to the best possible answerers (“experts”)
This work: recommend to all potential answerers
INTRODUCTION
relevance: to what degree the question matches the user’s tastes
diversity and freshness needs
Three requirements:
1. questions need to be recommended for all types of users
2. questions have to be diverse
3. recommendations need to be fresh and be served fast
a) serve questions as recommendations immediately
b) instantly adapting to users’ changes in taste
RELATED WORK
Limitations of prior work:
real-time ranking
The needs of new users with very little historical data are not addressed well.
Focus only on relevance
Framework
Question profile:
1. LDA model
2. Lexical model
3. Category model

User profile:
Question recommendation
Matching question and user profiles
Proactive diversification
Recommendation merging
QUESTION PROFILE
Split it according to the 26 top categories in Yahoo! Answers
Two advantages:
1. represent disjoint users’ interests
2. word sense disambiguation

1. question textual content (title and body)
2. category
QUESTION PROFILE
Build profile, which is represented by three vectors:
1. a Latent Dirichlet Allocation (LDA) topic vector
2. a lexical vector
3. a category vector
LDA Model
1. Initial training: a random sample of up to 2 million resolved questions
2. Incremental learning: a random sample of up to half a million questions per top category
3. Inference: at least 10% of the probability mass
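The inference step can be illustrated with a minimal sketch. It assumes the 10% figure means that only topics holding at least 10% of a question's inferred probability mass are kept; the renormalization step and the name `keep_dominant_topics` are illustrative assumptions, not taken from the slides.

```python
def keep_dominant_topics(topic_probs, min_mass=0.10):
    """Keep topics with probability >= min_mass, then renormalize.

    topic_probs: dict mapping topic id -> inferred probability.
    The renormalization is an assumption made for this sketch.
    """
    kept = {t: p for t, p in topic_probs.items() if p >= min_mass}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {t: p / total for t, p in kept.items()}


topics = keep_dominant_topics({"t1": 0.60, "t2": 0.35, "t3": 0.05})
```

Here `t3` is dropped because it holds less than 10% of the mass, and the remaining two topics are rescaled to sum to 1.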
Lexical Model
a unigram bag-of-words representation of a question
tf·idf scores, L1-normalized into a probability distribution
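A possible sketch of the lexical vector: unigram tf·idf scores divided by their sum so they form a probability distribution. The smoothed idf formula, the whitespace-free token input, and the names `lexical_vector`, `doc_freq`, and `num_docs` are assumptions for illustration; the slides do not specify them.

```python
import math
from collections import Counter


def lexical_vector(question_tokens, doc_freq, num_docs):
    """Return an L1-normalized tf-idf vector for one question.

    question_tokens: list of unigram tokens from the title and body.
    doc_freq: dict mapping word -> number of questions containing it.
    num_docs: total number of questions in the collection.
    """
    term_counts = Counter(question_tokens)
    scores = {}
    for word, count in term_counts.items():
        # Smoothed idf; this exact formula is an assumption of the sketch.
        idf = math.log((1 + num_docs) / (1 + doc_freq.get(word, 0)))
        scores[word] = count * idf
    total = sum(scores.values())
    if total == 0:
        return {}
    # L1 normalization: the weights now sum to 1.
    return {word: s / total for word, s in scores.items()}


vec = lexical_vector(["python", "list", "python"],
                     {"python": 10, "list": 100}, num_docs=1000)
```

The rarer and more frequent term in the question ("python") ends up with most of the probability mass.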

Category Model
assigns a probability of 1 to the category in which the question was posted
USER PROFILE
the questions answered in the past
the user representation is generated by aggregating signals over these questions
user profile: a probability tree
1. Aggregate the profiles of the questions the user answered
2. Update
the first and third tree levels:
apply a decaying factor on past questions

the second level:
1. Measure the similarity between the feature distribution of each model in the question and the corresponding feature distribution in the user profile
2. Normalize to a probability distribution
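The decaying update on the first and third tree levels could look roughly like the sketch below. It covers only the decay part, not the similarity-based second level. The exact update rule, the decay value, and the name `update_distribution` are assumptions; the slides only state that a decaying factor is applied to past questions.

```python
def update_distribution(user_dist, question_dist, decay=0.9):
    """Fold a newly answered question into one node of the user tree.

    user_dist: dict feature -> probability accumulated from past questions.
    question_dist: dict feature -> probability in the new question profile.
    decay: weight applied to past evidence (hypothetical value).
    """
    # Down-weight what was learned from older questions.
    merged = {f: decay * p for f, p in user_dist.items()}
    # Add the new question's distribution at full weight.
    for feature, p in question_dist.items():
        merged[feature] = merged.get(feature, 0.0) + p
    total = sum(merged.values())
    if total == 0:
        return {}
    # Renormalize so the node remains a probability distribution.
    return {f: p / total for f, p in merged.items()}


updated = update_distribution({"Sports": 1.0}, {"Music": 1.0}, decay=0.5)
```

With `decay=0.5`, the single past category keeps one third of the mass and the newly answered category takes two thirds, which is how recent activity can shift the profile quickly.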
QUESTION RECOMMENDATION
Matching Question and User Profiles
A list of open questions ranked by a relevance score, which is calculated for the pair {question profile, user profile}

For question profiles:
1. Turn the three vectors forming the question profile into a single vector; multiply the probability of each feature by 1/3 before storing it in the index
2. Index every question vector and build an inverted index
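The two indexing steps can be sketched as follows. The feature key layout `(model, feature)`, the in-memory dictionary index, and the function names are assumptions made for illustration; the slides do not describe the actual index implementation.

```python
from collections import defaultdict


def combine_profile(lda, lexical, category):
    """Merge the three question vectors into one, scaling each by 1/3."""
    combined = {}
    for model, vec in (("lda", lda), ("lexical", lexical), ("category", category)):
        for feature, prob in vec.items():
            # Prefix with the model name so features from different
            # models never collide (an assumption of this sketch).
            combined[(model, feature)] = prob / 3.0
    return combined


def build_inverted_index(question_vectors):
    """Map each feature to the list of (question id, weight) postings."""
    index = defaultdict(list)
    for qid, vec in question_vectors.items():
        for feature, weight in vec.items():
            index[feature].append((qid, weight))
    return index


q1 = combine_profile({"t7": 1.0}, {"dog": 0.5, "leash": 0.5}, {"Pets": 1.0})
index = build_inverted_index({"q1": q1})
```

Because each of the three vectors sums to 1 and is scaled by 1/3, the combined vector is itself a probability distribution.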
QUESTION RECOMMENDATION
For user profile:
associate with each user feature a score that consists of the product of each probability score on the tree path that led to this feature

Ranking:
Similarity: a simple dot-product
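A minimal sketch of the user-side scoring and the ranking function. It assumes a three-level tree (top category, then model, then feature) stored as nested tuples and dictionaries; this layout and the names `flatten_user_tree` and `relevance` are hypothetical.

```python
def flatten_user_tree(tree):
    """Turn the probability tree into a flat query vector.

    tree: {category: (p_category, {model: (p_model, {feature: p_feature})})}
    Each leaf score is the product of the probabilities on its path.
    """
    flat = {}
    for category, (p_cat, models) in tree.items():
        for model, (p_model, features) in models.items():
            for feature, p_feat in features.items():
                flat[(category, model, feature)] = p_cat * p_model * p_feat
    return flat


def relevance(user_vec, question_vec):
    """Dot product over the features the two sparse vectors share."""
    return sum(w * question_vec.get(k, 0.0) for k, w in user_vec.items())


tree = {
    "Pets": (1.0, {
        "lexical": (0.5, {"dog": 0.4, "cat": 0.6}),
        "lda": (0.5, {"t7": 1.0}),
    })
}
user_vec = flatten_user_tree(tree)
score = relevance(user_vec, {("Pets", "lexical", "dog"): 0.5})
```

Since both vectors are sparse, only the shared features contribute, which is what makes retrieval through an inverted index efficient.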
QUESTION RECOMMENDATION
Proactive Diversification
thematic sampling:
1. For each user vector u, we generate N query vectors u_1, u_2, …, u_N
2. N ranked lists, one per query vector
3. Blending them together results in a final diverse list

Two types of thematic constraints:

Specific top category: randomly select top categories as constraints by sampling without repetition, based on their distribution in the root node of the user’s probability tree
Specific LDA topic: randomly sample LDA topics without repetition from the user profile by traversing the probability tree
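Sampling thematic constraints without repetition can be sketched as repeated weighted draws, removing each chosen theme before the next draw. The function name, the dictionary input, and the sequential-draw scheme are assumptions; the slides state only that sampling follows the user's distribution and does not repeat.

```python
import random


def sample_without_repetition(distribution, n, rng=random):
    """Draw up to n distinct themes, weighted by their probability.

    distribution: dict theme -> probability (e.g. the root node of the
    user's probability tree for top categories).
    """
    # Zero-weight themes can never be drawn, so drop them up front.
    remaining = {k: v for k, v in distribution.items() if v > 0}
    chosen = []
    while remaining and len(chosen) < n:
        themes = list(remaining)
        weights = [remaining[t] for t in themes]
        pick = rng.choices(themes, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # guarantees no repetition
    return chosen


constraints = sample_without_repetition(
    {"Pets": 0.5, "Sports": 0.3, "Travel": 0.2}, n=2, rng=random.Random(0))
```

Each sampled theme would then constrain one of the N query vectors, so less dominant interests still get a chance to surface.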
QUESTION RECOMMENDATION
Recommendation Merging
blending algorithm
1. Each list is associated with a probability score
2. Sample an intermediate list, based on the assigned probabilities
3. Remove one recommendation from the sampled list and add it to the end of the final list
4. Repeat
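The blending steps above can be sketched as follows. Two details are assumptions not stated on the slide: the recommendation removed is the top-ranked one remaining in the sampled list, and a question already in the final list is skipped. The name `blend` and its parameters are hypothetical.

```python
import random


def blend(ranked_lists, probabilities, k, rng=random):
    """Merge several ranked lists into one list of at most k items."""
    lists = [list(lst) for lst in ranked_lists]  # copy; inputs stay intact
    final, seen = [], set()
    while len(final) < k:
        # Only lists that still have items and a positive probability.
        active = [i for i, lst in enumerate(lists)
                  if lst and probabilities[i] > 0]
        if not active:
            break
        weights = [probabilities[i] for i in active]
        i = rng.choices(active, weights=weights, k=1)[0]
        item = lists[i].pop(0)  # assumed: take the top remaining item
        if item not in seen:    # assumed: skip duplicates across lists
            seen.add(item)
            final.append(item)
    return final


merged = blend([["q1", "q2", "q3"], ["q2", "q9"]], [0.7, 0.3], k=4,
               rng=random.Random(1))
```

Items from the same source list keep their relative order in the output, while the probabilities control how often each list contributes.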
QUESTION RECOMMENDATION
Non-Thematic LDA Topics
116 topics, 23 top categories
34% non-thematic topics
A logistic regression classifier
EXPERIMENTS
Offline Experiment
8 different top categories
Active users: at least 21 questions as of January 2011
New users: at least two questions as of January 2011
EXPERIMENTS
Online Experiment
A/B test
Control bucket, CTL (n = 25093)
Relevance bucket, R (n = 5359)
Freshness bucket, F (n = 46228): 50% recent; 20% thematic sampling
Diversity bucket, D (n = 42041): 20% recent; 50% thematic sampling
CONCLUSIONS
Recommendation driven not only by relevance, but also by freshness and diversity
Several relevance models
“question retrieval engine”
Diversity: thematic sampling
Content: different factors/models/levels

Writing: clearly layered, progressive structure
