Search plays an important role in online social networks as it provides an essential mechanism for discovering members and content on the network. Related search recommendation is one of several mechanisms used for improving members’ search experience in finding relevant results to their queries. This paper describes the design, implementation, and deployment of Metaphor, the related search recommendation system on LinkedIn, a professional social networking site with over 175 million members worldwide. Metaphor builds on a number of signals and filters that capture several dimensions of relatedness across member search activity.
The system, which has been in live operation for over a year, has gone through multiple iterations and evaluation cycles. This paper makes three contributions. First, we provide a discussion of a large-scale related search recommendation system. Second, we describe a mechanism for effectively combining several signals in building a unified dataset for related search recommendations. Third, we introduce a query length model for capturing bias in recommendation
click behavior. We also discuss some of the practical concerns in deploying related search recommendations.
Metaphor: A system for related searches recommendations
1. Metaphor: A System for Related
Searches Recommendations
Mitul Tiwari
Joint work with Azarias Reda, Yubin Park, Christian Posse, and
Sam Shah
LinkedIn
3. Outline
• Related Searches at LinkedIn
• Metaphor: A System for Related Searches
Recommendations
• Design
• Implementation
• Evaluation
4. LinkedIn by the numbers
• 175M+ members
• 2+ new user registrations per second
• 4.2 Billion people searches in 2011
• 9.3 Billion page views in Q2 2012
• 100+ million monthly active users in Q2 2012
5. Related Searches at LinkedIn
• Millions of searches everyday
• Goal: Build related searches system at LinkedIn
• To help users to explore and refine their queries
Azarias Reda, Yubin Park, Mitul Tiwari, Christian Posse, and Sam Shah
10. Design: Collaborative Filtering
• Searches correlated by time
• Searches done in the same session by the same user
• Collaborative filtering: implicit feedback
• TFIDF scoring to take care of popular queries (e.g. `Obama’)
Q1 Q2 Q3
Time
14. Design: Length Bias
• Insight: clicks on suggestions one term longer
• Corresponds to refining the initial query
• Statistical biasing model to score a longer query higher
15. Design: Ensemble Approach
• Need to generate unified recommendation dataset
• Analysis to figure out engagement of each signal
• Attempted ML approach
• Minimal overlap across different signals
16. Design: Ensemble Approach
• Step-wise unionization
• Importance based on individual signal performance
• First, collaborative filtering
• Second, queries correlated by query-result clicks
• Third, queries overlapping terms
17. Design: Practical Considerations
• System designed for public consumption
• Strong profanity filters
• Need to deal with misspellings
• Languages
• Remove spammy search queries
29. Selected References
• R. Baeza-Yates. Graphs from search engine queries. LNCS, 4362:1–8, 2007.
• P. Boldi, F. Bonchi, C. Castillo, D. Donato, and S. Vigna. Query suggestions using query-
flow graphs. In Proceedings of the WSDM, 2009.
• M. A. Hasan, N. Parikh, B. Singh, and N. Sundaresan. Query suggestion for E-commerce
sites. In Proceedings of the WSDM, 2011.
• Q. Mei, D. Zhou, and K. Church. Query suggestion using hitting time. In Proceedings of
the CIKM, 2008.
• Z. Zhang and O. Nasraoui. Mining search engine query logs for query recommendation.
In Proceedings of the WWW, 2006.
first context of related searches at LinkedIn
then design, implementation and evaluation of our related searches system
Slow down
Searches per second: 130, min: 8000, hour: 480000, day: 11.5M
Cut down
discovery, exploration, refine
a screenshot of search result page
explore more
candidates and scoring
For example, web developer -> HTML
why collaborative filtering
elaborate
session replace to within a time window
Elaborate - across individual
put a real example
importance of each click
query fanout, popular result
mechanical engineer
across individual
highlight - next
show evaluation/analysis result about clicks on queries one term longer
skip the second equation
describe signals
mention evaluation later
high level design
Kafka, Voldemort citations, url to Azkaban
high level design
Kafka, Voldemort citations, url to Azkaban
scientific and easily repeatable
fast, iterative way for tuning parameters and performance
P-decreases with the size of window, R-increases with time window K
P/R low: predicting future searches, conservative measure; judging whether a signal can predict future behavior
CF has advantage in this measure
top-10 recommendations