SlideShare una empresa de Scribd logo
1 de 25
Socializing Search. Professionally.
Sriram Sankar
Principal Staff Engineer
Recruiting Solutions

Daniel Tunkelang
Head, Query Understanding
Whether you’ve tried to find an Apache committer…
…or an Apache commander,

3
you’ve probably used LinkedIn Search.

4
Let’s talk about…

• Infrastructure

• Quality
5
LinkedIn Search leverages the economic graph.

6
Social means that relevance is highly personalized.

7
Machine-learned ranking, socially.
 Relevance models incorporate user features:
score = P (Document | Query, User)

 Our model: tree with logistic regression leaves.
X2=?

b0 + b1 T(x1 )+...+ bn xn

X10< 0.1234 ?

a0 + a1 P(x1 )+...+ anQ(xn )

g 0 + g1 R(x1 )+...+ g nQ(xn )
8
LinkedIn’s focus: entity-oriented search.

Company

Name
Search

Employees

Jobs

9
Query understanding can act as a relevance filter.

for i in [1..n]
s
w1 w2 … wi
if Pc(s) > 0
a
new Segment()
a.segs
{s}
a.prob
Pc(s)
B[i]
{a}
for j in [1..i-1]
for b in B[j]
s
wj wj+1 … wi
if Pc(s) > 0
a
new Segment()
a.segs
b.segs U {s}
a.prob
b.prob * Pc(s)
B[i]
B[i] U {a}
sort B[i] by prob
truncate B[i] to size k

10
Less is more.
warren buffett

11
Coming soon: entity-driven search assist.
link
Jobs at LinkedIn
People currently working at LinkedIn
People who used to work at LinkedIn

Search
Infrastructure

Lucene
 Map of terms to documents – the index
 Provides an API to add and remove documents to the
index
 Provides an API to query the index

13
1.

2.

BLAH BLAH BLAH

BLAH BLAH

Daniel

Daniel BLAH BLAH LinkedIn BLAH BLAH BLAH BLAH

Sriram

BLAH

LinkedIn BLAH BLAH BLAH BLAH BLAH BLAH BLAH

Sriram

LinkedIn

1
2
Inverted Index

Forward Index
14
A standard scoring capability is built in

15
 Extremely easy to build a search engine
 But difficult to get sophisticated

16
The LinkedIn Search Stack
Request
Live
Updates

Updates

Query Rewriter

Index Retrieval

Scorer
Offline
Data
Building

Data

Sorter/Blender

Response
17
Search Index Served by Lucene
 Inverted index
 Forward index
 Static rank based document ordering

18
Offline Data Builds on Hadoop
 Multi-stage map-reduce pipeline allows complex data
processing
 Produces sharded single segment Lucene index with
documents sorted by static rank
 Produces data models for use in query rewriting

19
Live Data Updates
 Feed based framework to support updates to offline data
builds
 Lucene enhanced with a partial index update capability

20
Query Rewriting (and Planning)
 Accepts raw query and user metadata
 Produces Lucene retrieval query and metadata for
scoring
 May use data models built offline

21
Index Retrieval
 Lucene query built by query rewriter is used to retrieve
documents from the Lucene index
 Documents are retrieved in static rank order (best
document first)
 Retrieval may be early-terminated – given that retrieval is
in static rank order
 No scoring is performed during retrieval

22
Scoring
 Scoring is performed after retrieval
 Its input is the retrieved document (i.e., includes the
forward index), a description of how the retrieval query
matched the document, and the scoring metadata
produced by the rewriter
 Costly features can be computed offline during the index
building process in Hadoop – e.g., tf/idf calculations

23
Summary
Quality
 LinkedIn Search leverages the economic graph.
 Social means that relevance is highly personalized.
 Less is more: query understanding is a relevance filter.
 Moving in the direction of suggesting structured queries.
System
 Powered by Lucene, but with additional components.
 Offline data builds on Hadoop, partial index updates.
 Index uses static ranking and early termination.
 Scoring performed outside of Lucene.

24
Sriram Sankar
ssankar@linkedin.com
https://linkedin.com/in/sriramxsankar

Daniel Tunkelang
dtunkelang@linkedin.com
https://linkedin.com/in/dtunkelang
25

Más contenido relacionado

Más de Daniel Tunkelang

Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal Journey
Daniel Tunkelang
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?
Daniel Tunkelang
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem
Daniel Tunkelang
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
Daniel Tunkelang
 
Design for Interaction
Design for InteractionDesign for Interaction
Design for Interaction
Daniel Tunkelang
 

Más de Daniel Tunkelang (20)

Query Understanding and Ecommerce
Query Understanding and EcommerceQuery Understanding and Ecommerce
Query Understanding and Ecommerce
 
Semantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce QueriesSemantic Equivalence of e-Commerce Queries
Semantic Equivalence of e-Commerce Queries
 
Helping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query UnderstandingHelping Searchers Satisfice through Query Understanding
Helping Searchers Satisfice through Query Understanding
 
MMM, Search!
MMM, Search!MMM, Search!
MMM, Search!
 
Where should you put your data scientists?
Where should you put your data scientists?Where should you put your data scientists?
Where should you put your data scientists?
 
Search as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal JourneySearch as Communication: Lessons from a Personal Journey
Search as Communication: Lessons from a Personal Journey
 
Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?Enterprise Search: How do we get there from here?
Enterprise Search: How do we get there from here?
 
Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem Big Data, We Have a Communication Problem
Big Data, We Have a Communication Problem
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Information, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of NeedsInformation, Attention, and Trust: A Hierarchy of Needs
Information, Attention, and Trust: A Hierarchy of Needs
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
Content, Connections, and Context
Content, Connections, and ContextContent, Connections, and Context
Content, Connections, and Context
 
Scale, Structure, and Semantics
Scale, Structure, and SemanticsScale, Structure, and Semantics
Scale, Structure, and Semantics
 
Strata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of MicroworkStrata 2012: Humans, Machines, and the Dimensions of Microwork
Strata 2012: Humans, Machines, and the Dimensions of Microwork
 
Recommendations as a Conversation with the User
Recommendations as a Conversation with the UserRecommendations as a Conversation with the User
Recommendations as a Conversation with the User
 
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedInKeeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
Keeping It Professional: Relevance, Recommendations, and Reputation at LinkedIn
 
The War on Attention Poverty: Measuring Twitter Authority
The War on Attention Poverty: Measuring Twitter AuthorityThe War on Attention Poverty: Measuring Twitter Authority
The War on Attention Poverty: Measuring Twitter Authority
 
Design for Interaction
Design for InteractionDesign for Interaction
Design for Interaction
 
Enabling Exploration Through Text Analytics
Enabling Exploration Through Text AnalyticsEnabling Exploration Through Text Analytics
Enabling Exploration Through Text Analytics
 
exploring semantic means
exploring semantic meansexploring semantic means
exploring semantic means
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Socializing Search. Professionally.

  • 1. Socializing Search. Professionally. Sriram Sankar Principal Staff Engineer Recruiting Solutions Daniel Tunkelang Head, Query Understanding
  • 2. Whether you’ve tried to find an Apache committer…
  • 3. …or an Apache commander, 3
  • 4. you’ve probably used LinkedIn Search. 4
  • 5. Let’s talk about… • Infrastructure • Quality 5
  • 6. LinkedIn Search leverages the economic graph. 6
  • 7. Social means that relevance is highly personalized. 7
  • 8. Machine-learned ranking, socially.  Relevance models incorporate user features: score = P (Document | Query, User)  Our model: tree with logistic regression leaves. X2=? b0 + b1 T(x1 )+...+ bn xn X10< 0.1234 ? a0 + a1 P(x1 )+...+ anQ(xn ) g 0 + g1 R(x1 )+...+ g nQ(xn ) 8
  • 9. LinkedIn’s focus: entity-oriented search. Company Name Search Employees Jobs 9
  • 10. Query understanding can act as a relevance filter. for i in [1..n] s w1 w2 … wi if Pc(s) > 0 a new Segment() a.segs {s} a.prob Pc(s) B[i] {a} for j in [1..i-1] for b in B[j] s wj wj+1 … wi if Pc(s) > 0 a new Segment() a.segs b.segs U {s} a.prob b.prob * Pc(s) B[i] B[i] U {a} sort B[i] by prob truncate B[i] to size k 10
  • 11. Less is more. warren buffett 11
  • 12. Coming soon: entity-driven search assist. link Jobs at LinkedIn People currently working at LinkedIn People who used to work at LinkedIn Search
  • 13. Infrastructure Lucene  Map of terms to documents – the index  Provides an API to add and remove documents to the index  Provides an API to query the index 13
  • 14. 1. 2. BLAH BLAH BLAH BLAH BLAH Daniel Daniel BLAH BLAH LinkedIn BLAH BLAH BLAH BLAH Sriram BLAH LinkedIn BLAH BLAH BLAH BLAH BLAH BLAH BLAH Sriram LinkedIn 1 2 Inverted Index Forward Index 14
  • 15. A standard scoring capability is built in 15
  • 16.  Extremely easy to build a search engine  But difficult to get sophisticated 16
  • 17. The LinkedIn Search Stack Request Live Updates Updates Query Rewriter Index Retrieval Scorer Offline Data Building Data Sorter/Blender Response 17
  • 18. Search Index Served by Lucene  Inverted index  Forward index  Static rank based document ordering 18
  • 19. Offline Data Builds on Hadoop  Multi-stage map-reduce pipeline allows complex data processing  Produces sharded single segment Lucene index with documents sorted by static rank  Produces data models for use in query rewriting 19
  • 20. Live Data Updates  Feed based framework to support updates to offline data builds  Lucene enhanced with a partial index update capability 20
  • 21. Query Rewriting (and Planning)  Accepts raw query and user metadata  Produces Lucene retrieval query and metadata for scoring  May use data models built offline 21
  • 22. Index Retrieval  Lucene query built by query rewriter is used to retrieve documents from the Lucene index  Documents are retrieved in static rank order (best document first)  Retrieval may be early-terminated – given that retrieval is in static rank order  No scoring is performed during retrieval 22
  • 23. Scoring  Scoring is performed after retrieval  Its input is the retrieved document (i.e., includes the forward index), a description of how the retrieval query matched the document, and the scoring metadata produced by the rewriter  Costly features can be computed offline during the index building process in Hadoop – e.g., tf/idf calculations 23
  • 24. Summary Quality  LinkedIn Search leverages the economic graph.  Social means that relevance is highly personalized.  Less is more: query understanding is a relevance filter.  Moving in the direction of suggesting structured queries. System  Powered by Lucene, but with additional components.  Offline data builds on Hadoop, partial index updates.  Index uses static ranking and early termination.  Scoring performed outside of Lucene. 24