Fast, Lenient, and Accurate
Building Personalized Instant Search Experience at LinkedIn
Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha
LinkedIn
Agenda
● LinkedIn
● LinkedIn Search
○ Navigational vs Exploratory searches
○ Typeahead vs SERP
● Big picture and problem statement
● Instant search – Search-as-you-type
○ Query autocomplete
○ Entity-aware suggestions
○ Instant results
● Conclusions & Future work
LinkedIn – Professional Identity
LinkedIn – Professional Graph
LinkedIn – Jobs
LinkedIn – And much more...
Companies
Skills
Professional Content
LinkedIn – Massive Scale
LinkedIn Search
Navigational Search
Looking for someone specific
by name.
Query has a single correct
result.
Exploratory Search
Finding people that match a
given set of criteria.
Multiple results match the
user’s query.
Instant Search – Search-as-you-type
Satisfy navigational searches:
Show instant search results.
Help frame exploratory searches:
Complete the user’s query and show search suggestions.
Big Picture
Partial query
Instant results
Autocomplete
Search suggestions
Manually entered query
Query tagger
Full-text search
Search results
Focus today:
● Autocomplete
● Search suggestions
● Instant results
Problem Statement
How can we build an instant search experience that scales to 450+ million members, and is
fast, lenient, and accurate?
● Instant search = Query autocomplete + search suggestions + instant results
● Fast = Search-as-you-type latencies
● Lenient = Handle spelling errors and common variations
● Accurate = Highly relevant and personalized results
Query Tagging
Example tags: PERSON, TITLE (ID=126), COMPANY (ID=1337)
Entity types identified:
Person name, job title, company, school, skills, locations.
Key part of query processing!
Impacts: autocomplete, spelling correction, search suggestions,
query rewriting, ranking.
Sequential prediction model
(CRF – Conditional Random Fields)
Training data:
● Standardized dictionaries (people names,
companies, schools, titles, skills, locations)
● Query logs
● Clickthrough (CTR) data
● Crowdsourced labels
Query Autocomplete
● Fast
● Relevant and contextual
● Resilient to spelling errors
Query Autocomplete – Offline processing
Query logs:
linkedin software engineer
software engineer
big data
data scientist
data engineer
expert systems
...
Entities:
[linkedin] [software engineer]
Index:
FST – Finite State Transducers
Compact + fast retrieval + fuzzy match (via Levenshtein Automata)
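The lookup behaviour described above (prefix completion plus tolerance for a small number of typos) can be sketched in a few lines. This is only an illustration under stated assumptions: a sorted list with binary search stands in for the FST, and a plain dynamic-programming edit distance stands in for a Levenshtein automaton. The class and method names (CompletionIndex, complete, fuzzy_complete) are hypothetical and do not come from the talk.

```python
from bisect import bisect_left


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


class CompletionIndex:
    """Sorted list of logged queries; a toy stand-in for an FST."""

    def __init__(self, queries):
        self.queries = sorted(set(queries))

    def complete(self, prefix: str):
        """Exact prefix lookup: binary search, then scan while the prefix matches."""
        start = bisect_left(self.queries, prefix)
        out = []
        for q in self.queries[start:]:
            if not q.startswith(prefix):
                break
            out.append(q)
        return out

    def fuzzy_complete(self, prefix: str, max_edits: int = 1):
        """Queries whose leading characters are within max_edits of the typed prefix."""
        lo = max(0, len(prefix) - max_edits)
        hi = len(prefix) + max_edits
        out = []
        for q in self.queries:
            # Try heads of slightly different lengths so that inserted or
            # dropped characters in the typed prefix are tolerated.
            if any(edit_distance(prefix, q[:n]) <= max_edits for n in range(lo, hi + 1)):
                out.append(q)
        return out
```

The fuzzy lookup here scans every entry, which a real FST with a Levenshtein automaton avoids; the sketch only shows the matching behaviour, not the performance characteristics.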
Query Autocomplete – Online processing
Two step process:
1. Retrieval (Candidate generation)
User’s query: [big data e]
Candidates = C(big data e) U C(data e) U C(e)
= big data engineer,
big data expert systems,
big data entry,
...
Query logs:
linkedin software engineer
software engineer
big data
data scientist
data engineer
expert systems
...
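The candidate-generation step above, Candidates = C(big data e) U C(data e) U C(e), can be sketched as follows. This is a minimal illustration, assuming that C(x) means "logged queries that start with x" and that each completion is re-attached to the words typed before the suffix. The function name candidates is hypothetical, and a linear scan replaces the FST lookup.

```python
def candidates(query: str, logged_queries):
    """Union of completions over every word-level suffix of the partial query."""
    words = query.split()
    seen, out = set(), []
    for i in range(len(words)):
        suffix = " ".join(words[i:])   # e.g. "big data e", "data e", "e"
        head = " ".join(words[:i])     # words already typed before the suffix
        for logged in logged_queries:
            if logged.startswith(suffix):
                full = f"{head} {logged}".strip()
                if full not in seen:   # keep the union free of duplicates
                    seen.add(full)
                    out.append(full)
    return out
```

With the six logged queries shown on the slide, the partial query "big data e" yields "big data engineer" (from C(data e)) and "big data expert systems" (from C(e)); "big data entry" would appear only if a matching query such as "data entry" were in the logs.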
Query Autocomplete – Online processing
Two step process:
2. Scoring (Ranking)
User’s query: [big data e]
Candidate completions: “big data engineer”, “big data expert”, “big data entry”
Score(“big data engineer”):
P(s1, s2, s3, …) ≈ P(s1)·P(s2|s1)·P(s3|s2)… // Bigram language model
Use entities : P([engineer] | [big data])
Fall back to words : P(engineer | data)·P(data | big)
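The scoring rule above can be sketched with a pair of lookup tables. The probabilities below are made-up placeholders (in practice they would be estimated from query logs), the FLOOR smoothing constant is an assumption, and, as in the slide's fallback formula, only the conditional terms are multiplied. Scores are kept in log space to avoid underflow.

```python
import math

# Hypothetical probabilities, for illustration only.
ENTITY_BIGRAMS = {("big data", "engineer"): 0.20, ("big data", "entry"): 0.01}
WORD_BIGRAMS = {
    ("big", "data"): 0.30,
    ("data", "engineer"): 0.10,
    ("data", "expert"): 0.02,
    ("data", "entry"): 0.05,
}
FLOOR = 1e-6  # assumed smoothing value for unseen word pairs


def word_score(words):
    """Word-level bigram score: sum of log P(w_i | w_{i-1})."""
    return sum(math.log(WORD_BIGRAMS.get(pair, FLOOR))
               for pair in zip(words, words[1:]))


def score(entities, words):
    """Use entity bigrams when every adjacent entity pair is known, else fall back to words."""
    pairs = list(zip(entities, entities[1:]))
    if pairs and all(pair in ENTITY_BIGRAMS for pair in pairs):
        return sum(math.log(ENTITY_BIGRAMS[pair]) for pair in pairs)
    return word_score(words)
```

For example, score(["big data", "engineer"], ["big", "data", "engineer"]) uses P([engineer] | [big data]), while a completion whose entity pair is unknown is scored as P(expert | data)·P(data | big).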
Query Suggestions – Autocomplete + query tagger
“linke” ⇒ “Linkedin” ⇒ COMPANY
“had” ⇒ “Hadoop” ⇒ SKILL
Instant Results
● Fast retrieval over 450+ million members
● Highly personalized
● Balance personalization & popularity
● Resilient to spelling variations
Instant Results – Indexing
● Prefix-based tokenization (DOCID 4):
NAME: richard
PREFIX: r, ri, ric, rich, richa, ...
NAME: branson
PREFIX: b, br, bra, bran, brans, ...
● Inverted Index (maps each token to the list of docs that contain that token, i.e. its posting list):
NAME:richard => [1, 4, 10, 15, …] // Everyone named “richard”
PREFIX:ri => [1, 2, 4, 7, 10, 15, …] // Everyone whose name starts with “ri”
…
● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b
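The indexing and retrieval steps on this slide can be sketched as a small in-memory inverted index. This is a toy illustration, not the production system: the function names (tokenize, build_index, rewrite, search) and the sample documents are hypothetical, and the sketch assumes that every word except the last one in the query is complete.

```python
from collections import defaultdict


def tokenize(name: str):
    """Emit a NAME token per word plus a PREFIX token for every leading substring."""
    tokens = []
    for word in name.lower().split():
        tokens.append(f"NAME:{word}")
        tokens.extend(f"PREFIX:{word[:n]}" for n in range(1, len(word) + 1))
    return tokens


def build_index(docs):
    """docs: {doc_id: full name}. Returns token -> sorted posting list of doc ids."""
    postings = defaultdict(set)
    for doc_id, name in docs.items():
        for token in tokenize(name):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


def rewrite(query: str):
    """'richard b' -> ['NAME:richard', 'PREFIX:b']; the last word is still being typed."""
    words = query.lower().split()
    if not words:
        return []
    return [f"NAME:{w}" for w in words[:-1]] + [f"PREFIX:{words[-1]}"]


def search(index, query: str):
    """Intersect the posting lists of all required (+) terms."""
    lists = [set(index.get(term, ())) for term in rewrite(query)]
    return sorted(set.intersection(*lists)) if lists else []
```

Indexing every prefix trades index size for speed: a partial word becomes a single exact-term lookup instead of a scan over the term dictionary.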
Instant Results – Indexing
● Connections Index (DOCID 4):
CONN: 1, 10, 15
● Inverted Index
CONN:4 => [1, 10, 15] // Everyone connected to Richard Branson
CONN:1 => [4, ...]
CONN:10 => [4, ...]
...
● Retrieval approach
User’s query – richard b
Rewritten query – +NAME:richard +PREFIX:b +CONN:1
(Everyone named richard b… and connected to User:1)
Instant Results – Indexing
Early Termination
Problem: A query like [PREFIX:ri] might retrieve too many candidate documents.
How can we retrieve the most promising documents first so that we don’t need to score all of them?
Static Rank: Order documents based on their prior (query independent) likelihood of relevance:
A combination of:
● Profile views
● Spam and security related scores
● Editorial rules (Celebrities, influencers, …)
numToScore: The number of documents to retrieve and score for any query
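Early termination as described above can be sketched in two steps: order each posting list by static rank at index time, then score only the first numToScore entries at query time. The function names and the static-rank values used in the example are hypothetical; how the static rank is actually combined from profile views, spam scores, and editorial rules is not specified in the slides.

```python
from itertools import islice


def order_by_static_rank(doc_ids, static_rank):
    """Index-time step: store a posting list in descending static-rank order."""
    return sorted(doc_ids, key=lambda d: static_rank[d], reverse=True)


def retrieve(posting_list, num_to_score, score_fn, top_k=3):
    """Query-time step: score only the first num_to_score postings, then rank them."""
    head = list(islice(posting_list, num_to_score))  # early termination
    return sorted(head, key=score_fn, reverse=True)[:top_k]
```

Because the list is already in static-rank order, the documents that are never scored are the ones with the lowest prior likelihood of relevance, which is what makes cutting the scan short acceptable.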
Instant Results – Retrieval
Balancing Popularity and Personalization
Query: richard b…
Are you looking for Richard Branson, or a colleague named Richard Burton?
(Assume searcher’s ID = 1)
Custom search operator: “Weighted OR”
Rewritten Query:
● +NAME:richard +PREFIX:b +CONN:1 // Too restrictive. Only finds the searcher’s connections.
● +NAME:richard +PREFIX:b ?CONN:1[50%] // Try to retrieve 50% of results from the searcher’s connections
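The slides do not spell out how the "Weighted OR" operator works internally, so the following is only one plausible reading of ?CONN:1[50%]: reserve roughly half of the retrieval budget for documents that also match the optional clause, and fill the remainder with the best other matches. The function name weighted_or and its parameters are hypothetical.

```python
def weighted_or(required_matches, optional_matches, num_to_score, weight=0.5):
    """Pick up to num_to_score docs, aiming for `weight` of them to match the optional clause.

    required_matches: doc ids matching all + terms, in static-rank order.
    optional_matches: set of doc ids matching the ? term (e.g. CONN:1).
    """
    quota = int(num_to_score * weight)
    preferred = [d for d in required_matches if d in optional_matches][:quota]
    chosen = set(preferred)
    remaining = num_to_score - len(preferred)
    rest = [d for d in required_matches if d not in chosen][:remaining]
    return preferred + rest
```

Under this reading, a searcher with few matching connections still gets a full candidate set, since any unused part of the quota is filled with globally popular matches.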
Instant Results – Spelling Variations
weiner ⇔ wiener
catherine ⇔ kathryn
dipak ⇔ deepak
Name Clusters
Offline process to cluster together similar sounding or similarly spelt names.
Two step process:
1. Coarse clustering (optimized for broad coverage)
Normalization: repeated chars, accented chars, common phonetic variations (c ⇔ k, ph ⇔ f)
Combination of edit distance & double metaphone (sound)
E.g. (dipak = deepak), (wiener = weiner), (catherine = kathryn), (jeff = joff)
2. Fine-grained clustering (optimized for precision)
Split up clusters based on more sophisticated rules
Position and character-aware edit distance
Query reformulation data (q1 → q2 → click)
E.g. (jeff ≠ joff)
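The coarse clustering step can be sketched with normalization plus an edit-distance threshold. This is a simplified illustration: double metaphone is omitted (it is not in the Python standard library), the normalization rules are deliberately crude, the greedy grouping strategy and the threshold max_edits=2 are assumptions, and the fine-grained splitting step is not shown. All function names are hypothetical.

```python
import re
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, strip accents, map a few phonetic variants, collapse repeated letters."""
    text = unicodedata.normalize("NFKD", name.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.replace("ph", "f").replace("c", "k")  # crude: also rewrites "ch"
    return re.sub(r"(.)\1+", r"\1", text)


def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def coarse_clusters(names, max_edits=2):
    """Greedy grouping: join the first cluster whose first member is close enough."""
    clusters = []
    for name in names:
        key = normalize(name)
        for cluster in clusters:
            if edit_distance(key, normalize(cluster[0])) <= max_edits:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

With this threshold, (dipak, deepak), (weiner, wiener), and (jeff, joff) end up together, which mirrors the broad-coverage intent of the coarse step. Pairs such as catherine and kathryn are further apart in edit distance after normalization and would rely on the phonetic signal that this sketch leaves out.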
Instant Results – Spelling Variations
Indexing time:
NAME: kathryn
CLUSTER: katharine
Query time:
Potential queries:
katherine
kathryn
katharine
catharine
Rewritten queries:
?NAME:katherine ?CLUSTER:katharine
?NAME:kathryn ?CLUSTER:katharine
?NAME:katharine ?CLUSTER:katharine
?NAME:catharine ?CLUSTER:katharine
Either match original query term or match the name cluster
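The index-time and query-time halves of this scheme can be sketched together. The NAME_TO_CLUSTER table below stands in for the output of the offline clustering job and is hypothetical, as are the function names; the matching rule follows the slide: a document matches if it satisfies either the literal NAME clause or the CLUSTER clause.

```python
# Hypothetical output of the offline name-clustering job: name -> cluster id.
NAME_TO_CLUSTER = {
    "katherine": "katharine",
    "kathryn": "katharine",
    "katharine": "katharine",
    "catharine": "katharine",
}


def index_tokens(first_name: str):
    """Indexing time: store the literal name and, if known, its cluster."""
    name = first_name.lower()
    tokens = {f"NAME:{name}"}
    if name in NAME_TO_CLUSTER:
        tokens.add(f"CLUSTER:{NAME_TO_CLUSTER[name]}")
    return tokens


def rewrite(term: str):
    """Query time: optional (?) clauses for the typed name and for its cluster."""
    term = term.lower()
    clauses = [f"NAME:{term}"]
    if term in NAME_TO_CLUSTER:
        clauses.append(f"CLUSTER:{NAME_TO_CLUSTER[term]}")
    return clauses


def matches(doc_tokens, clauses) -> bool:
    """A document matches if it satisfies at least one of the optional clauses."""
    return any(clause in doc_tokens for clause in clauses)
```

A profile indexed as "kathryn" is then reachable from any of the four spellings, because they all rewrite to the same CLUSTER clause.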
Instant Results – Scoring
Learning to Rank (Machine-learned ranking)
Training data
● Click data from previous typeahead sessions
● <searcher, query, doc> ⇒ positive/negative
○ Clicked result treated as positive (+).
○ All other shown results treated as negative (–).
○ Since this is navigational search, we assume there’s only 1 correct result => low presentation bias.
Features / signals
● Textual match against various fields
● Network distance, number of shared connections
● Global popularity
● Compound features
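The labelling rule for training data can be sketched as a small generator over logged typeahead sessions. The session format (a dict with searcher, query, shown, and clicked keys) is hypothetical, and skipping sessions without a click is an assumption that the slides do not state.

```python
def training_examples(sessions):
    """Yield (searcher, query, doc, label) tuples from typeahead sessions.

    The clicked result is labelled 1; every other shown result is labelled 0.
    """
    for s in sessions:
        if s["clicked"] not in s["shown"]:
            continue  # assumed: sessions without a click give no positive label
        for doc in s["shown"]:
            label = 1 if doc == s["clicked"] else 0
            yield (s["searcher"], s["query"], doc, label)
```

Each tuple would then be expanded into the feature vector listed above (textual match, network distance, global popularity, and so on) before being passed to a ranking model.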
Conclusions
● Instant search experience
○ Directly satisfy navigational search uses in typeahead via Instant Results
○ Help the user frame exploratory search queries via Query Autocomplete & Search
Suggestions
● Combination of techniques
○ Query tagger for entity extraction – “Things not Strings”
○ FST-based query completion
○ Inverted index-based instant results + Early termination + Weighted OR
○ Name clusters for fuzzy name matching
Future Work
● Personalized query completions
○ m ⇒ machine learning
○ m ⇒ machinist
● Multi-entity query suggestions
○ Now : [linkedin] ⇒ “Find people who work at LinkedIn”
○ Future : [linkedin data scientist] ⇒ “Find data scientists at LinkedIn”
● Better blending
○ Autocomplete + query suggestions + instant results
○ Query features – what does the query mean?
○ Results features – what results come back from each system?
Thank You!
LinkedIn – The Economic Graph
LinkedIn Search – SERP (Jobs)
LinkedIn Search – Typeahead
LinkedIn Search – SERP

  • 6. LinkedIn – And much more... Companies Skills Professional Content
  • 9. Navigational Search Looking for someone specific by name. Query has a single correct result.
  • 10. Exploratory Search Finding people that match a given set of criteria. Multiple results match the user’s query.
  • 11. Instant Search – Search-as-you-type Satisfy navigational searches: Show instant search results. Help frame exploratory searches: Complete the user’s query and show search suggestions.
  • 12. Big Picture Partial query Instant results Autocomplete Search suggestions Query tagger Full-text search Search results Manually entered query
  • 13. Big Picture Partial query Instant results Autocomplete Search suggestions Query tagger Full-text search Search results Manually entered query Focus today: ● Autocomplete ● Search suggestions ● Instant results
  • 14. Problem Statement Partial query Instant results Autocomplete Search suggestions Query tagger Full-text search Search results Manually entered query Focus today: ● Autocomplete ● Search suggestions ● Instant results How can we build an instant search experience that scales to 450+ million members, and is fast, lenient, and accurate? ● Instant search = Query autocomplete + search suggestions + instant results ● Fast = Search-as-you-type latencies ● Lenient = Handle spelling errors and common variations ● Accurate = Highly relevant and personalized results
  • 15. Query Tagging PERSON TITLE (ID=126) COMPANY (ID=1337) Entity types identified: Person name, job title, company, school, skills, locations. Key part of query processing! Impacts: autocomplete, spelling correction, search suggestions, query rewriting, ranking. Sequential prediction model (CRF – Conditional Random Fields) Training data: ● Standardized dictionaries (people names, companies, schools, titles, skills, locations) ● Query logs ● Clickthrough (CTR) data ● Crowdsourced labels
  • 16. Query Autocomplete ● Fast ● Relevant and contextual ● Resilient to spelling errors
  • 17. Query Autocomplete – Offline processing linkedin software engineer software engineer big data data scientist data engineer expert systems . . [linkedin] [software engineer] Query logs Entities Index FST – Finite State Transducers Compact + fast retrieval + fuzzy match (via Levenshtein Automata)
  • 18. Query Autocomplete – Online processing Two step process: 1. Retrieval (Candidate generation) User’s query: [big data e] Candidates = C(big data e) U C(data e) U C(e) = big data engineer, big data expert systems, big data entry, ... linkedin software engineer software engineer big data data scientist data engineer expert systems . . Query logs
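The candidate-generation step on slide 18, C(big data e) U C(data e) U C(e), can be sketched as follows. This is a minimal illustration and not the production system: a sorted list with binary search stands in for the FST, and the names `COMPLETIONS`, `complete`, and `candidates` and the toy dictionary are assumptions made for the example.

```python
from bisect import bisect_left

# Toy completion dictionary (assumption); the slides build an FST from
# query logs and entities instead of a sorted list.
COMPLETIONS = sorted([
    "big data", "data engineer", "data entry", "data scientist",
    "engineer", "expert systems", "linkedin", "software engineer",
])


def complete(prefix):
    """Return every dictionary entry starting with `prefix`, i.e. C(prefix)."""
    start = bisect_left(COMPLETIONS, prefix)
    matches = []
    for entry in COMPLETIONS[start:]:
        if not entry.startswith(prefix):
            break
        matches.append(entry)
    return matches


def candidates(query):
    """Union of completions over every word-aligned suffix of the query."""
    words = query.split()
    results = []
    for i in range(len(words)):
        head = " ".join(words[:i])      # words kept as typed
        suffix = " ".join(words[i:])    # part that gets completed
        for entry in complete(suffix):
            full = (head + " " + entry).strip()
            if full not in results:
                results.append(full)
    return results


print(candidates("big data e"))
```

With this toy dictionary the full query has no direct completion, so every candidate comes from the shorter suffixes "data e" and "e", which is the point of taking the union.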
  • 19. Query Autocomplete – Online processing Two step process: 2. Scoring (Ranking) User’s query: [big data e] Candidate completions: “big data engineer”, “big data expert”, “big data entry” Score(“big data engineer”): P(s1 , s2 , s3 …) ≈ P(s1 )·P(s2 |s1 )·P(s3 |s2 ).. // Bigram language model Use entities : P([engineer] | [big data]) Fall back to words : P(engineer | data)·P(data | big)
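The bigram scoring rule on slide 19, P(s1)·P(s2|s1)·P(s3|s2)…, can be sketched with plain counts. The tiny log below is invented for illustration, entity tokens such as "big data" are assumed to be pre-segmented by the query tagger, and the sketch applies no smoothing and omits the word-level fallback mentioned on the slide.

```python
from collections import Counter

# Hypothetical tokenized query log (assumption). Each query is a list of
# tokens; an entity such as "big data" counts as one token.
LOG = [
    ["big data", "engineer"],
    ["big data", "engineer"],
    ["big data", "expert"],
    ["software engineer"],
]

unigrams = Counter(tok for query in LOG for tok in query)
bigrams = Counter(pair for query in LOG for pair in zip(query, query[1:]))
total = sum(unigrams.values())


def score(tokens):
    """P(s1) * P(s2|s1) * ... estimated from raw counts (no smoothing)."""
    if not tokens or unigrams[tokens[0]] == 0:
        return 0.0
    p = unigrams[tokens[0]] / total
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0
        p *= bigrams[(prev, cur)] / unigrams[prev]
    return p


ranked = sorted(
    [["big data", "expert"], ["big data", "engineer"]],
    key=score,
    reverse=True,
)
print(ranked[0])
```

A real system would need smoothing or a back-off so that unseen entity bigrams do not score zero; the slide's fallback to word-level probabilities serves that purpose.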
  • 20. Query Suggestions – Autocomplete + query tagger “linke” ⇒ “Linkedin” ⇒ COMPANY “had” ⇒ “Hadoop” ⇒ SKILL
  • 21. Instant Results ● Fast retrieval over 450+ million members ● Highly personalized ● Balance personalization & popularity ● Resilient to spelling variations
  • 22. Instant Results – Indexing ● Prefix-based tokenization (example: DOCID 4): NAME: richard PREFIX: r, ri, ric, rich, richa, ... NAME: branson PREFIX: b, br, bra, bran, brans, ... ● Inverted Index (maps each token to the posting list of docs that contain that token): NAME:richard => [1, 4, 10, 15, …] // Everyone named “richard” PREFIX:ri => [1, 2, 4, 7, 10, 15, …] // Everyone whose name starts with “ri” … ● Retrieval approach User’s query – richard b Rewritten query – +NAME:richard +PREFIX:b
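The prefix tokenization, inverted index, and query rewrite on slide 22 can be sketched in a few lines. The member names and IDs below are made up, the function names are hypothetical, and treating only the last query word as a prefix is an assumption read off the slide's "richard b" example.

```python
from collections import defaultdict


def tokens_for(full_name):
    """Emit a NAME token per word plus a PREFIX token for each of its prefixes."""
    tokens = set()
    for word in full_name.lower().split():
        tokens.add("NAME:" + word)
        for end in range(1, len(word) + 1):
            tokens.add("PREFIX:" + word[:end])
    return tokens


def build_index(docs):
    """Inverted index: token -> sorted posting list of doc IDs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for token in tokens_for(docs[doc_id]):
            index[token].append(doc_id)
    return index


def rewrite(query):
    """Completed words become NAME terms; the word being typed becomes a PREFIX term."""
    words = query.lower().split()
    if not words:
        return []
    return ["NAME:" + w for w in words[:-1]] + ["PREFIX:" + words[-1]]


def search(index, query):
    """All terms are required (the '+' operator), so intersect the posting lists."""
    postings = [set(index.get(term, ())) for term in rewrite(query)]
    return sorted(set.intersection(*postings)) if postings else []


DOCS = {1: "Richard Burton", 4: "Richard Branson", 7: "Rita Brown"}  # toy data
INDEX = build_index(DOCS)
print(search(INDEX, "richard b"))
```

Indexing every prefix trades index size for lookup speed: a prefix query becomes a single posting-list read rather than a scan over the term dictionary.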
  • 23. Instant Results – Indexing ● Connections Index (example: DOCID 4): CONN: 1, 10, 15 ● Inverted Index CONN:4 => [1, 10, 15] // Everyone connected to Richard Branson CONN:1 => [4, ...] CONN:10 => [4, ...] ... ● Retrieval approach User’s query – richard b Rewritten query – +NAME:richard +PREFIX:b +CONN:1 (Everyone named richard b… and connected to User:1)
  • 24. Instant Results – Indexing Early Termination Problem: A query like [PREFIX:ri] might retrieve too many candidate documents. How can we retrieve the most promising documents first so that we don’t need to score all of them? Static Rank: Order documents based on their prior (query independent) likelihood of relevance: A combination of: ● Profile views ● Spam and security related scores ● Editorial rules (Celebrities, influencers, …) numToScore: The number of documents to retrieve and score for any query
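Early termination with a static rank, as described on slide 24, might look like the sketch below. It assumes posting lists are ordered by descending static rank at indexing time, so scanning can stop once `num_to_score` matching documents have been scored. The function names, scores, and ranks are illustrative assumptions.

```python
def build_posting_list(doc_ids, static_rank):
    """Order a posting list by descending static (query-independent) rank."""
    return sorted(doc_ids, key=lambda d: static_rank[d], reverse=True)


def retrieve_and_score(posting_list, matches, score, num_to_score, k):
    """Score only the first `num_to_score` matching docs, then return the top k."""
    scored = []
    for doc_id in posting_list:
        if len(scored) >= num_to_score:
            break  # early termination: the rest of the list is never touched
        if matches(doc_id):
            scored.append((score(doc_id), doc_id))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Toy data (assumption): static ranks and query-dependent scores per doc.
STATIC_RANK = {1: 0.2, 2: 0.9, 3: 0.5, 4: 0.7}
QUERY_SCORE = {1: 10.0, 2: 1.0, 3: 5.0, 4: 2.0}

posting = build_posting_list(STATIC_RANK.keys(), STATIC_RANK)
top = retrieve_and_score(
    posting,
    matches=lambda d: True,
    score=lambda d: QUERY_SCORE[d],
    num_to_score=3,
    k=2,
)
print(posting, top)
```

In this toy run document 1 has the best query-dependent score but the lowest static rank, so it is never scored. That is the cost of early termination, and it is why the static rank needs to be a good prior for relevance.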
  • 25. Instant Results – Retrieval: Balancing Popularity and Personalization Query: richard b… Are you looking for Richard Branson, or a colleague named Richard Burton? (Assume searcher’s ID = 1) Rewritten Query: ● +NAME:richard +PREFIX:b +CONN:1 // Too restrictive. Only finds the searcher’s connections. ● +NAME:richard +PREFIX:b ?CONN:1[50%] // Custom search operator “Weighted OR”: try to retrieve 50% of results from the searcher’s connections
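The slides do not spell out the exact semantics of the "Weighted OR" operator, so the following is only one plausible reading of `?CONN:1[50%]`: reserve up to half of the scoring budget for candidates that match the optional clause, then fill the remainder in static-rank order. The function `weighted_or` and its parameters are hypothetical.

```python
def weighted_or(candidates, matches_optional, num_to_score, weight=0.5):
    """Pick docs to score: up to `weight` of the budget from docs matching the
    optional clause, the rest from the remaining candidates in list order.

    `candidates` is assumed to be already filtered by the required clauses
    and ordered by static rank.
    """
    quota = int(num_to_score * weight)
    preferred = [d for d in candidates if matches_optional(d)][:quota]
    chosen = set(preferred)
    remaining = [d for d in candidates if d not in chosen]
    return preferred + remaining[: num_to_score - len(preferred)]


# Toy data (assumption): docs matching "+NAME:richard +PREFIX:b" in
# static-rank order, and the searcher's connections among them.
CANDIDATES = [4, 9, 12, 15, 20, 33]
CONNECTIONS = {15, 33}

picked = weighted_or(CANDIDATES, lambda d: d in CONNECTIONS, num_to_score=4)
print(picked)
```

Under this reading, popular non-connections (4 and 9) and less prominent connections (15 and 33) both reach the scoring stage, and the ranker decides between them. If the searcher has fewer matching connections than the quota, the unused slots go back to the popular candidates.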
  • 26. Instant Results – Spelling Variations weiner ⇔ wiener catherine ⇔ kathryn dipak ⇔ deepak
  • 27. Instant Results – Spelling Variations: Name Clusters Offline process to cluster together similar sounding or similarly spelt names. Two step process: 1. Coarse clustering (optimized for broad coverage) Normalization: repeated chars, accented chars, common phonetic variations (c ⇔ k, ph ⇔ f) Combination of edit distance & double metaphone (sound) E.g. (dipak = deepak), (wiener = weiner), (catherine = kathryn), (jeff = joff) 2. Fine-grained clustering (optimized for precision) Split up clusters based on more sophisticated rules Position and character-aware edit distance Query reformulation data (q1 → q2 → click) E.g. (jeff ≠ joff)
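The coarse clustering step on slide 27 can be approximated with normalization plus plain Levenshtein distance. This sketch deliberately leaves out double metaphone (it is not in the Python standard library) and the fine-grained second step. The threshold of 2 and the function names are assumptions, not values from the talk.

```python
import re
import unicodedata


def normalize(name):
    """Lowercase, strip accents, apply ph->f and c->k, collapse repeated chars."""
    s = unicodedata.normalize("NFKD", name.lower())
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    s = s.replace("ph", "f").replace("c", "k")
    return re.sub(r"(.)\1+", r"\1", s)


def edit_distance(a, b):
    """Classic Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def coarse_match(a, b, max_dist=2):
    """Broad-coverage test: normalized names within `max_dist` edits."""
    return edit_distance(normalize(a), normalize(b)) <= max_dist


for pair in [("dipak", "deepak"), ("wiener", "weiner"), ("jeff", "joff")]:
    print(pair, coarse_match(*pair))
```

As on the slide, the coarse pass merges jeff and joff. Separating them is the job of the fine-grained step, which uses signals such as query reformulation data that this sketch does not model. Pairs like catherine and kathryn are three edits apart after normalization, so matching them would need the phonetic component.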
  • 28. Instant Results – Spelling Variations Indexing time: NAME: kathryn CLUSTER: katharine Query time: Potential queries: katherine, kathryn, katharine, catharine Rewritten queries: ?NAME:katherine ?CLUSTER:katharine; ?NAME:kathryn ?CLUSTER:katharine; ?NAME:katharine ?CLUSTER:katharine; ?NAME:catharine ?CLUSTER:katharine Either match original query term or match the name cluster
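The indexing-time and query-time halves of slide 28 can be sketched together. The cluster table below is hand-written to mirror the slide's example. In the described system it would come from the offline name-clustering process, and `index_tokens`, `rewrite`, and `matches` are hypothetical helper names.

```python
# Hand-written cluster table mirroring the slide (assumption); each spelling
# maps to the representative name of its cluster.
CLUSTER_OF = {
    "katherine": "katharine",
    "kathryn": "katharine",
    "katharine": "katharine",
    "catharine": "katharine",
}


def index_tokens(name):
    """Indexing time: store the literal name and, if known, its cluster."""
    tokens = ["NAME:" + name]
    if name in CLUSTER_OF:
        tokens.append("CLUSTER:" + CLUSTER_OF[name])
    return tokens


def rewrite(term):
    """Query time: optional ('?') clauses for the literal term and its cluster."""
    clauses = ["?NAME:" + term]
    if term in CLUSTER_OF:
        clauses.append("?CLUSTER:" + CLUSTER_OF[term])
    return clauses


def matches(doc_tokens, clauses):
    """A doc matches if it satisfies at least one of the optional clauses."""
    return any(clause[1:] in doc_tokens for clause in clauses)


doc = index_tokens("kathryn")
print(rewrite("catharine"), matches(doc, rewrite("catharine")))
```

Because the cluster is resolved once at each end, a query for any spelling in the cluster reaches a member indexed under any other spelling without expanding the query into every variant.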
  • 29. Instant Results – Scoring Learning to Rank (Machine-learned ranking) Training data ● Click data from previous typeahead sessions ● <searcher, query, doc> ⇒ positive/negative: clicked result treated as positive (+), all other shown results treated as negative (–). Since this is navigational search, we assume there’s only 1 correct result => low presentation bias. Features / signals ● Textual match against various fields ● Network distance, number of shared connections ● Global popularity ● Compound features
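The labeling rule on slide 29 (clicked result positive, every other shown result negative) is simple enough to sketch directly. The session fields and the function name `label_session` are assumptions for illustration; feature extraction and the ranking model itself are out of scope here.

```python
def label_session(searcher, query, shown, clicked):
    """Turn one typeahead session into <searcher, query, doc, label> rows.

    The clicked doc gets label 1; every other doc that was shown gets 0.
    Sessions with no click on a shown result produce no rows.
    """
    if clicked not in shown:
        return []
    return [(searcher, query, doc, 1 if doc == clicked else 0) for doc in shown]


# Toy session (assumption): searcher 1 typed "richard b", saw four results,
# and clicked doc 15.
rows = label_session(searcher=1, query="richard b", shown=[4, 15, 20, 33], clicked=15)
for row in rows:
    print(row)
```

Each session yields one positive and several negatives, matching the slide's assumption that a navigational query has a single correct result.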
  • 30. Conclusions ● Instant search experience ○ Directly satisfy navigational search uses in typeahead via Instant Results ○ Help the user frame exploratory search queries via Query Autocomplete & Search Suggestions ● Combination of techniques ○ Query tagger for entity extraction – “Things not Strings” ○ FST-based query completion ○ Inverted index-based instant results + Early termination + Weighted OR ○ Name clusters for fuzzy name matching
  • 31. Future Work ● Personalized query completions ○ m ⇒ machine learning ○ m ⇒ machinist ● Multi-entity query suggestions ○ Now : [linkedin] ⇒ “Find people who work at LinkedIn” ○ Future : [linkedin data scientist] ⇒ “Find data scientists at LinkedIn” ● Better blending ○ Autocomplete + query suggestions + instant results ○ Query features – what does the query mean? ○ Results features – what results come back from each system?
  • 33. LinkedIn – The Economic Graph
  • 34. LinkedIn Search – SERP (Jobs)
  • 35. LinkedIn Search – Typeahead