[In]formation Retrieval: Search at LinkedIn

Shakti Sinha, Head, Search Relevance
Daniel Tunkelang, Head, Query Understanding

    Recruiting Solutions                               1
Why do 200M+ people use LinkedIn?




                                    2
People use LinkedIn because of other people.




                                          3
Search helps members find and be found.




                                          4
Rich collection of professional content.




                                           5
Every search is personalized.




                                6
Let’s talk a bit about how it all works.

§  Query Understanding

§  Search Spam

§  Unified Search

More at http://data.linkedin.com/search.



                                           7
Query Understanding




                      8
People are semi-structured objects.




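  # B[i] holds the k most probable segmentations of the first i words.
  for i in [1..n]
    B[i] ← {}
    # Case 1: the first i words form a single segment.
    s ← w[1] w[2] … w[i]
    if Pc(s) > 0
      a ← new Segment()
      a.segs ← {s}
      a.prob ← Pc(s)
      B[i] ← {a}
    # Case 2: extend a segmentation of the first j words with one more segment.
    for j in [1..i-1]
      for b in B[j]
        s ← w[j+1] … w[i]
        if Pc(s) > 0
          a ← new Segment()
          a.segs ← b.segs ∪ {s}
          a.prob ← b.prob * Pc(s)
          B[i] ← B[i] ∪ {a}
    sort B[i] by prob, descending
    truncate B[i] to size k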
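
A minimal Python sketch of the segmentation procedure on this slide, reading B[i] as the top-k segmentations of the first i words. The names `Segmentation`, `segment_query`, and `TOY_PC`, and the toy probabilities, are illustrative assumptions and not LinkedIn's implementation; `pc` stands in for the slide's Pc(s).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segmentation:
    segs: tuple   # segments in order, each a space-joined string
    prob: float   # product of the segment probabilities


def segment_query(words, pc, k=5):
    """Return up to k most probable segmentations of `words`, best first.

    `pc` maps a candidate segment (a string) to a probability; a value
    of 0 means the segment is not a known phrase.
    """
    n = len(words)
    # best[i] holds the top-k segmentations of the first i words.
    best = [[] for _ in range(n + 1)]
    best[0] = [Segmentation((), 1.0)]  # the empty prefix has one segmentation
    for i in range(1, n + 1):
        candidates = []
        for j in range(i):
            s = " ".join(words[j:i])   # candidate last segment
            p = pc(s)
            if p <= 0:
                continue
            for b in best[j]:
                candidates.append(Segmentation(b.segs + (s,), b.prob * p))
        candidates.sort(key=lambda c: c.prob, reverse=True)
        best[i] = candidates[:k]       # beam truncation to size k
    return best[n]


# Toy segment probabilities (assumed values, for illustration only).
TOY_PC = {"software": 0.2, "engineer": 0.3,
          "software engineer": 0.5, "google": 0.4}
result = segment_query("software engineer google".split(),
                       lambda s: TOY_PC.get(s, 0.0), k=2)
```

With these toy values the top segmentation groups "software engineer" as one segment and "google" as another. Truncating to k keeps the work per position bounded, at the cost of possibly discarding a prefix that would have led to the best full segmentation.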



                                       9
Word sense is contextual.




                            10
Understand queries as early as possible.




                                           11
Query structure has many applications.

§    Boost results that match query interpretation.
§    Bucket search log analysis by query classes.
§    Query rewriting specific to query classes.
§    …



      Query understanding focuses on set-level metrics.

                  Not just about best answer,
                  but getting to best question.


                                                          12
Search Spam




              13
Let’s look at a search spammer.




                                  14
Summary is verbose but legitimate.




                                     15
But then comes the keyword stuffing.




                                       16
How we train our search spam classifier.

§  Find the queries targeted by spammers.
   –  10,000 most common non-name queries.


§  Look at top results for a generic user.
   –  i.e., show unpersonalized search results.


§  Remove private profiles.
   –  Members first! Can’t sacrifice privacy to fight spammers.


§  Label data by crowdsourcing.
   –  Relevance is subjective, but spam is relatively objective.


                                                                   17
ROC curve for spam thresholding.

     [Figure: ROC curve annotated with spam score thresholds a and b, where 0 < a < b < 1; both axes run from 0 to 1.]




                                                                                      18
Integrate spamminess into relevance score.

§  Spam model yields a probability between 0 and 1.

§  Use spam score as piecewise linear factor:
      if score < spammin:
           # not a spammer
           relevance *= 1.0
      elif score > spammax:
           # spammer
           relevance *= 0.0
      else:
           # linear function of spamminess
           relevance *= (spammax - score) / (spammax - spammin)
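
The piecewise linear factor above can be wrapped in a small runnable function. This is a sketch: `spam_min` and `spam_max` correspond to the slide's `spammin` and `spammax`, and the default thresholds 0.4 and 0.8 are assumed values chosen only for illustration.

```python
def spam_factor(score: float, spam_min: float, spam_max: float) -> float:
    """Multiplier in [0, 1] applied to relevance, given a spam probability."""
    if not 0.0 <= spam_min < spam_max <= 1.0:
        raise ValueError("require 0 <= spam_min < spam_max <= 1")
    if score < spam_min:
        return 1.0   # not a spammer: relevance unchanged
    if score > spam_max:
        return 0.0   # spammer: relevance zeroed out
    # Linear ramp from 1.0 at spam_min down to 0.0 at spam_max.
    return (spam_max - score) / (spam_max - spam_min)


def adjusted_relevance(relevance: float, score: float,
                       spam_min: float = 0.4, spam_max: float = 0.8) -> float:
    # Default thresholds are illustrative assumptions, not values from the talk.
    return relevance * spam_factor(score, spam_min, spam_max)
```

The ramp is continuous at both thresholds, so a small change in spam score never causes a sudden jump in ranking.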


                                                                  19
Spam is an arms race.

§  We can’t reveal precisely which features we use for spam
    detection, or spammers will work around them.

§  Spammers will try to reverse-engineer us anyway.

§  Personalization benefits us and our legitimate users – it’s
    hard to spam your way to high personalized ranking.

§  Fighting spam is all about making the investment less
    profitable for the spammer.



                                                              20
Unified Search




                 21
Un-Unified Search




                    22
Introducing LinkedIn Unified Search!

Goal: make all of our content more discoverable.

Three new features:
§  Query Auto-Complete
§  Content Type Suggestions
§  Unified Search Result Page




                                                   23
Query Auto-Complete




                      24
Best completion not always the most popular.

§  In a heavy-tailed distribution, even the most popular
    queries account for only a small fraction of the distribution.

§  We don’t want to suggest generic queries that would
    produce useless results.
   –  e.g., c -> company, j -> jobs


§  Goal is not only to infer the user’s intent but also to suggest a
    search that yields relevant results across content types.
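
One way to picture "best is not always most popular" is to weight each candidate's popularity by an estimate of how useful its results are. The sketch below is hypothetical: the function `suggest`, the `query_stats` structure, the quality scores, and the product scoring rule are all assumptions for illustration, not the method described in the talk.

```python
def suggest(prefix, query_stats, limit=3):
    """Rank completions of `prefix` by frequency times result quality.

    `query_stats` maps query -> (frequency, result_quality), where
    result_quality in [0, 1] is an assumed estimate of how useful the
    query's results are. Generic queries get a low quality score.
    """
    scored = [(freq * quality, query)
              for query, (freq, quality) in query_stats.items()
              if query.startswith(prefix)]
    scored.sort(reverse=True)  # highest combined score first
    return [query for _, query in scored[:limit]]


# Toy data (assumed): "company" is popular but too generic to be useful.
stats = {
    "company": (1000, 0.01),
    "cisco": (300, 0.9),
    "cloud computing": (200, 0.8),
}
completions = suggest("c", stats)
```

Here the most frequent query, "company", ranks last because its results are assumed to be useless, which mirrors the "c -> company" example on the slide.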




                                                                25
Content Type Suggestions




                           26
How we compute content type suggestions.

§  Rank content types by likelihood of a successful search.
   –  Consider click-through behavior as well as downstream actions.


§  Bootstrap using what we know from pre-unified search
    behavior.
   –  Tricky part is compensating for findability bias.


§  Continuously evaluate and collect feedback through user
    behavior.
   –  E.g., members using the left rail to select a particular vertical.
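
A minimal sketch of ranking content types by likelihood of a successful search, assuming a log of (content type, succeeded) pairs where "succeeded" already combines click-through and downstream actions. The function name, the log format, and the smoothing prior are assumptions. The sketch does not attempt to correct for findability bias.

```python
from collections import defaultdict


def rank_content_types(log, prior_success=1.0, prior_total=2.0):
    """Return (content_type, estimated success rate) pairs, best first.

    `log` is an iterable of (content_type, succeeded) pairs. The prior
    acts as pseudo-counts so that content types with few observations
    are pulled toward a neutral rate (an assumed smoothing choice).
    """
    successes = defaultdict(float)
    totals = defaultdict(float)
    for content_type, succeeded in log:
        totals[content_type] += 1
        if succeeded:
            successes[content_type] += 1
    rates = {
        ct: (successes[ct] + prior_success) / (totals[ct] + prior_total)
        for ct in totals
    }
    return sorted(rates.items(), key=lambda kv: kv[1], reverse=True)


# Toy log (fabricated purely for illustration).
toy_log = [
    ("people", True), ("people", True), ("people", False),
    ("jobs", True), ("jobs", False), ("jobs", False),
    ("companies", False),
]
ranking = rank_content_types(toy_log)
```

Feeding new behavior into the same log is one simple way to realize the "continuously evaluate" step: the ranking shifts as observed success rates change.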




                                                                           27
Unified Search Result Page




                             28
Intent Detection and Page Construction

§  Relevance is now a two-part computation:

              P(Content Type | User, Query)
                             ×
          P(Document | User, Query, Content Type)

§  Intent detection comes first: inefficient to send all queries
    to all verticals.

§  Secondary components introduce diversity.
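
The two-part computation can be sketched as follows: intent detection assigns a probability to each content type, only sufficiently likely verticals are queried, and each document is scored by the product of the two probabilities. The names `build_page` and `search_vertical`, the threshold `min_type_prob`, and all toy numbers are assumptions for illustration. The diversity contributed by secondary components is not modeled.

```python
def build_page(type_probs, search_vertical, min_type_prob=0.1, per_type=3):
    """Score documents as P(type | user, query) * P(doc | user, query, type).

    `type_probs` maps content type -> P(type | user, query).
    `search_vertical(content_type)` returns a list of (doc, p_doc) pairs,
    best first, where p_doc stands for P(doc | user, query, type).
    """
    results = []
    for content_type, p_type in type_probs.items():
        if p_type < min_type_prob:
            continue  # intent detection first: skip unlikely verticals
        for doc, p_doc in search_vertical(content_type)[:per_type]:
            results.append((p_type * p_doc, content_type, doc))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Toy intent distribution and per-vertical results (assumed values).
toy_type_probs = {"people": 0.7, "jobs": 0.25, "groups": 0.05}
toy_verticals = {
    "people": [("Alice", 0.9), ("Bob", 0.5)],
    "jobs": [("Data Scientist", 0.8)],
    "groups": [("ML Group", 0.99)],
}
page = build_page(toy_type_probs, lambda ct: toy_verticals[ct])
```

In this toy run the "groups" vertical is never queried, even though its top document has a high within-vertical score, because the intent probability falls below the threshold.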


                                                                    29
Summary

§    Personalize every search and leverage structure.
§    Understand queries as early as possible.
§    Fight the spammers that be.
§    Unify and simplify the search experience.


             Goal: help LinkedIn’s 200M+
             members find and be found.




                                                         30
Thank you!




             31
Want to learn more?

§  Check out http://data.linkedin.com/search.

§  Contact us:
     –  Shakti: ssinha@linkedin.com
                http://linkedin.com/in/sdsinha

   –  Daniel: dtunkelang@linkedin.com
              http://linkedin.com/in/dtunkelang

§  Did we mention that we’re hiring?


                                                  32
