This document discusses using Apache Solr and Fusion to improve search relevance through learning to rank (LTR). It notes that traditional keyword search is not always sufficient and that LTR lets you select the features that matter and teach the machine how to rank items. Fusion captures signals, such as user clicks, that can serve as ground truth for training Solr’s LTR implementation. The document shows how defining features, deriving ground truth from signals, and using Solr LTR can improve search relevance beyond what textual features alone can achieve. A/B testing is recommended to evaluate any changes safely without degrading the user experience.
4. The Problem
• Improving search relevance is hard.
• TF-IDF and BM25 work well for text-keyword matching, but what about other models of relevance?
• Text matching is sometimes not the best solution.
• Users don’t always say what they mean.
6. The Solution: Learning to Rank Overview
• Learning to rank lets you pick “features” of a document that “matter” and teach the machine how to rank a set of items.
• One possible source of ordering is user behavior (e.g. the only clicks were on the speaker shaped like a rock).
• Solr provides a Learning to Rank implementation.
• Fusion provides a way of capturing user behavior through signals.
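As a concrete sketch, Solr’s LTR module declares features as JSON uploaded to a feature store (the store name and the specific features below are illustrative, not from the webinar; the feature classes are Solr’s own):

```json
[
  {
    "store": "webinarFeatureStore",
    "name": "originalScore",
    "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
    "params": {}
  },
  {
    "store": "webinarFeatureStore",
    "name": "documentRecency",
    "class": "org.apache.solr.ltr.feature.SolrFeature",
    "params": { "q": "{!func}recip(ms(NOW,last_modified),3.16e-11,1,1)" }
  },
  {
    "store": "webinarFeatureStore",
    "name": "price",
    "class": "org.apache.solr.ltr.feature.FieldValueFeature",
    "params": { "field": "price" }
  }
]
```

In a stock Solr install this JSON would be PUT to the collection’s `schema/feature-store` endpoint; a trained model referencing these feature names is then uploaded to `schema/model-store`.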
10. The Solution
• Define features (relevancy factors)
• Derive Ground Truth using Fusion’s signals
• Use Solr’s Learning to Rank implementation
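The three steps above can be sketched in code. This is an illustrative toy, not Fusion’s actual signal-aggregation pipeline: it derives graded relevance labels from raw click signals by each document’s share of clicks for a query (the grading scheme and data shapes are assumptions):

```python
from collections import defaultdict

def derive_labels(click_events, grades=4):
    """Aggregate raw (query, doc_id) click signals into graded
    relevance labels in the range 0..grades-1, scaled by each
    document's share of its query's clicks."""
    counts = defaultdict(int)   # clicks per (query, doc)
    totals = defaultdict(int)   # clicks per query
    for query, doc_id in click_events:
        counts[(query, doc_id)] += 1
        totals[query] += 1

    labels = {}
    for (query, doc_id), n in counts.items():
        share = n / totals[query]
        labels[(query, doc_id)] = round(share * (grades - 1))
    return labels

clicks = [
    ("rock speaker", "doc1"), ("rock speaker", "doc1"),
    ("rock speaker", "doc1"), ("rock speaker", "doc2"),
]
labels = derive_labels(clicks)
print(labels)  # doc1 gets a higher grade than doc2
```

Real pipelines would also correct for position bias and discard low-volume queries; this sketch only shows the shape of the transformation.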
11. Some notes
• Fusion’s normal click boosting is an alternative, and a good one.
• It is possible to use the two together, or to use one where the other doesn’t work.
• Do simpler things first; learning to rank without an adequate schema won’t accomplish much.
12. Some notes
• Using click signals for ground truth
• Pros:
• Voluminous
• Cheap
• Reflects a captive user’s intent (especially when supplemented with purchase and add-to-cart events)
• Tacit, implicitly labeled data is the key to an OOTB “self-learning” system
• Cons:
• Noisy
• Potential for reinforcing the existing ranking
15. …but is it better?
• Models compared:
• Solr out-of-the-box BM25 ranking using textual features only
• Logistic regression using all features except the signals feature
• Logistic regression using all features
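To show why the signals feature matters in a comparison like this, here is a minimal plain-Python logistic regression on toy data (the data, feature names, and training setup are all invented for illustration): the two relevant documents are separable by click rate but not by text score alone.

```python
import math

def train_logreg(X, y, lr=0.5, epochs=2000):
    """Logistic regression trained by stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid
            g = p - yi                        # gradient of log loss
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# toy docs: [bm25_text_score, click_signal_rate]; label 1 = relevant.
# Note the text scores alone do not separate the classes (0.9 vs 0.8),
# but the click-signal feature does.
X = [[0.9, 0.8], [0.8, 0.1], [0.2, 0.7], [0.1, 0.0]]
y = [1, 0, 1, 0]
w, b = train_logreg(X, y)
scores = [predict(w, b, x) for x in X]
print(["%.2f" % s for s in scores])
```

After training, the model ranks both relevant documents above both irrelevant ones, which the text-only score could not do for this data.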
16. Why is it better?
• Summary of Benefits:
• LTR offers automated relevancy tuning
• Using Fusion to implement LTR greatly reduces the time and complexity required to train and deploy LTR models in production
• Leveraging Fusion’s signals as features in an LTR model offers an easy way of boosting search relevance beyond what is possible using textual features alone
17. A/B and experiments
• Do this carefully.
• A/B testing is the safest way to evaluate a change without degrading the experience for your users.
• Stay tuned for a future webinar on Experiments and A/B testing
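One common way to read an A/B test on click-through rate is a two-proportion z-test; the sketch below (with made-up traffic numbers) shows the arithmetic, not a full experiment framework:

```python
import math

def ab_ctr_ztest(clicks_a, views_a, clicks_b, views_b):
    """Two-proportion z-test comparing the CTR of variant B against
    variant A; |z| > 1.96 is roughly significant at the 5% level."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    p = (clicks_a + clicks_b) / (views_a + views_b)  # pooled CTR
    se = math.sqrt(p * (1 - p) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se

# hypothetical traffic: control 4.0% CTR, new ranking 4.6% CTR
z = ab_ctr_ztest(clicks_a=400, views_a=10000, clicks_b=460, views_b=10000)
print(round(z, 2))
```

Even a 0.6-point CTR lift needs on the order of ten thousand impressions per arm before it clears the significance bar, which is why "do this carefully" is the right advice.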
18. Where to learn more?
• Grab the technical paper (with step-by-step instructions): https://lucidworks.com/ebook/learning-to-rank/
• Grab the code: https://github.com/lucidworks/fusion-ltr-webinar#fusionsolr-setup
20. Register by Sep 6 to save $200
SEPTEMBER 9-12, 2019, WASHINGTON DC
Check out the site here: https://activate-conf.com/
JOIN ANDY AND TREY AT ACTIVATE
• Productionizing Python ML Models Using Fusion 5, Sanket Shahane, Andy Liu
• Natural Language Search with Knowledge Graphs, Trey Grainger
• Closing Keynote: The Next Generation of AI-powered Search, Trey Grainger
AI, ML & DATA SCIENCE TRACK
• Supporting Query Tagging/Suggestion in Fusion 4.2, Uber
• Building a Health QA Chatbot with Solr, Healthwise Incorporated
• Tackling a “Small Data” Search Challenge at Airbnb Experiences, Airbnb
• Using Deep Learning and Customized Solr Components to Improve Search Relevancy at Target, Target
THE SEARCH AND AI CONFERENCE