Relevance Ranking Up Close
How result weightings in VuFind/Solr come about
and how they can be optimized
Oliver Goldschmidt
https://orcid.org/0000-0002-5468-401X
Universitätsbibliothek der TU Hamburg
28.09.2017
Oliver Goldschmidt
TUB Hamburg-Harburg
6th German VuFind-Meeting
University Library Hamburg, Germany
2017/09/28
https://creativecommons.org/licenses/by/4.0/

Brainstorming
Wishes for the relevance ranking

Wishes for the relevance ranking
● Weight more recent items higher
● Weight exact matches higher
● Weight media type higher (textbook boosting, e-book boosting, …)
● Phrase-match boosting

Goal of the workshop
● Look for ways to weave the publication year and exact matches more effectively into relevance scoring in VuFind
● Raise awareness of the problems in relevance weighting
● Discuss further suggestions and ideas for improving the relevance ranking

Example: Ad 2000
● Short search term containing a number
● Expectation: MSN-202 is ranked prominently
● Result: MSN-202 does not appear on the first results page at all

Example: Thermodynamik
● Everyday term ("Allerweltsbegriff") as the search term
● First hits: textbooks ("textbook boosting")
● Exact matches in the title are weighted lower

Example: InDesign
● Product name as the search term
● Result is poor: the first hit about the InDesign software is hit 8 in tub.find (in Beluga it is not within the first 20 pages)

Example: VDI-Berichte 2217
● Volume of a monographic series as the search term

Example: DIN-Taschenbuch 126
● Volume of a monographic series as the search term

Example: Nature/Science
● Journal title as the search term
● Similar to the series search
● Expectation: find the record for the journal as a whole (Gesamt-TA) near the top
● Results are poor
● Workaround: search by journal title

Example: Graphentheorie
● Example link
● Several editions are found
● The relevance scoring should weight the more recent edition higher
● Needs optimization
Score: 42355.82
Score: 42300.203
Score: 42200.203
Score: 40642.887
Score: 40569.08

DisMax boosting parameters
q Defines the raw input strings for the query.
q.alt Calls the standard query parser and defines query input strings, when the q parameter is not used.
qf Query Fields: specifies the fields in the index on which to perform the query. If absent, defaults to df.
mm Minimum "Should" Match: specifies a minimum number of clauses that must match in a query. If no 'mm' parameter is
specified in the query, or as a default in solrconfig.xml, the effective value of the q.op parameter (either in the query, as a
default in solrconfig.xml, or from the defaultOperator option in the Schema) is used to influence the behavior. If q.op is
effectively AND'ed, then mm=100%; if q.op is OR'ed, then mm=1. Users who want to force the legacy behavior should set a
default value for the 'mm' parameter in their solrconfig.xml file. Users should add this as a configured default for their request
handlers. This parameter tolerates miscellaneous white spaces in expressions (e.g., " 3 < -25% 10 < -3n", " n-25%n ", "
n3n ").
pf Phrase Fields: boosts the score of documents in cases where all of the terms in the q parameter appear in close
proximity.
ps Phrase Slop: specifies the number of positions two terms can be apart in order to match the specified phrase.
qs Query Phrase Slop: specifies the number of positions two terms can be apart in order to match the specified phrase.
Used specifically with the qf parameter.
tie Tie Breaker: specifies a float value (which should be something much less than 1) to use as tiebreaker in DisMax
queries. Default: 0.0
bq Boost Query: specifies a factor by which a term or phrase should be "boosted" in importance when considering a
match.
bf Boost Functions: specifies functions to be applied to boosts. (See for details about function queries.)
Source: https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser

Scoring basis in tub.find
pf Phrase Fields: boosts the score of documents in cases where all of the terms in the q parameter appear in close
proximity.
- [pf, title_short^5 title_full_unstemmed^400 title_full^5 title^5 title_alt^5
title_new^5 series^5 series2^5 author^10 author_fuller^10 topic_unstemmed^20
topic_title^20 topic^5 contents^0 allfields_unstemmed^0 fulltext_unstemmed^0
geographic genre description]
ps Phrase Slop: specifies the number of positions two terms can be apart in order to match the specified phrase.
- [ps, 10]
bf Boost Functions: specifies functions to be applied to boosts.
# Basic boosting for year of publication
- [bf, ord(publishDateSort)^10]
# More boosting on the year of publication, depending on the type of record
- [bf, "if(exists(query({!v=format:Journal})),ord(publishDateSort),0)^0"]
- [bf, "if(exists(query({!v=format:Book})),ord(publishDateSort),0)^10"]
- [bf, "if(exists(query({!v=format:eBook})),ord(publishDateSort),0)^12"]
- [bf, "if(exists(query({!v=collection:Website})),ord(publishDateSort),0)^0.1"]
- [bf, "if(exists(query({!v=collection:Weblog})),ord(publishDateSort),0)^0"]
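As a rough illustration of what a term like ord(publishDateSort)^10 contributes: according to the Solr function query documentation, ord returns the 0-based position of a document's indexed value within the sorted list of all distinct terms of that field. The Python sketch below emulates that on a made-up list of publication years. The sample values and the helper name ord_of are assumptions for illustration and are not taken from the tub.find index.

```python
from bisect import bisect_left

def ord_of(value, distinct_sorted_terms):
    """Emulate ord(): 0-based rank of value among the sorted distinct terms."""
    i = bisect_left(distinct_sorted_terms, value)
    if i == len(distinct_sorted_terms) or distinct_sorted_terms[i] != value:
        raise KeyError(value)
    return i

# Hypothetical index content: publication years stored as strings.
years_in_index = ["2015", "1996", "2010", "1996", "2001"]
terms = sorted(set(years_in_index))  # lexicographic term order

# A bf entry such as ord(publishDateSort)^10 adds 10 * ord to the score.
boosts = {year: 10 * ord_of(year, terms) for year in terms}
```

In this emulation the boost depends on how many distinct values sort below a given year, not on the year itself, so it shifts whenever the set of indexed values changes.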

Scoring basis in tub.find
bq Boost Query: specifies a factor by which a term or phrase should be "boosted" in importance when considering a match.
# Additional format boosting
- [bq, format:Book^30]
- [bq, format:eBook^50]
- [bq, format:Journal^50]
- [bq, format:eJournal^25]
- [bq, collection:Website^8]
# Additional boost for text book collection
- [bq, standort_iln_str_mv:"23:LBS"^150]
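To see how such [name, value] pairs can end up in a Solr request, here is a minimal Python sketch that turns a few of the pairs above into a URL query string. The function name to_query_string and the choice of defType=dismax are assumptions for illustration; VuFind's own request building is not shown in the slides.

```python
from urllib.parse import urlencode, parse_qs

# A few pairs copied from the slides, in configuration order.
dismax_params = [
    ("ps", "10"),
    ("bf", "ord(publishDateSort)^10"),
    ("bq", "format:Book^30"),
    ("bq", "format:eBook^50"),
    ("bq", 'standort_iln_str_mv:"23:LBS"^150'),
]

def to_query_string(user_query, pairs):
    """Build a query string; repeated names (bq, bf) become repeated parameters."""
    params = [("defType", "dismax"), ("q", user_query)] + list(pairs)
    return urlencode(params)

qs = to_query_string("algorithmische graphentheorie", dismax_params)
decoded = parse_qs(qs)  # decode again to inspect what would be sent
```

Passing a list of tuples rather than a dict keeps every bq entry, since a dict would retain only the last value per name.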

Example: Graphentheorie
● Explanation for hit 1:
– 7064.5713 title_full_unstemmed: algorithmische
– 7238.854 title_full_unstemmed: graphentheorie
– 14303.426 title_full_unstemmed: algorithmische graphentheorie
– 48.970158 format:book (30x boost of 1.6323386)
– 6850.0 publication date (10x boost of 685)
– 6850.0 additional book publication-year boost
Score: 42355.82

Example: Graphentheorie
● Explanation for hit 4:
– 6223.7314 title_full_unstemmed: algorithmische
– 6377.271 title_full_unstemmed: graphentheorie
– 12601.002 title_full_unstemmed: algorithmische graphentheorie
– 48.970158 format:book (30x boost of 1.6323386)
– 1311.9124 textbook boosting
– 7040.0 publication date (10x boost of 704)
– 7040.0 additional book publication-year boost
Score: 40642.887

Example: Graphentheorie
● Explanation for hit 5:
– 6392.8774 title_full_unstemmed: algorithmische
– 6284.0864 title_full_unstemmed: graphentheorie
– 12676.964 title_full_unstemmed: algorithmische graphentheorie
– 162.27933 format:ebook (50x boost of 3.2455864)
– 6850.0 publication date (10x boost of 685)
– 8220.0 additional eBook publication-year boost (12x boost of 685)
Score: 40569.08
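The three explanations can be cross-checked with simple arithmetic: if the score is the sum of the listed components, each list should add up to the reported score. The sketch below does this for hits (Treffer) 1, 4 and 5 using only the numbers from the slides; the helper name gap is an assumption for illustration.

```python
# Components as listed on the three explanation slides, with the reported score.
hits = {
    1: ([7064.5713, 7238.854, 14303.426, 48.970158, 6850.0, 6850.0], 42355.82),
    4: ([6223.7314, 6377.271, 12601.002, 48.970158, 1311.9124, 7040.0, 7040.0], 40642.887),
    5: ([6392.8774, 6284.0864, 12676.964, 162.27933, 6850.0, 8220.0], 40569.08),
}

def gap(components, reported):
    """Difference between the summed components and the reported score."""
    return sum(components) - reported

gaps = {hit: round(gap(parts, score), 3) for hit, (parts, score) in hits.items()}
```

For hits 1 and 4 the components match the reported score to within rounding. For hit 5 the listed components add up to roughly 17 more than the reported score; the slides do not explain that difference.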

Questions and conclusions
● Why are the publication years of hits 1 and 5 (1996 and 2010) treated the same (both have a score of 685)?
● Taking the publication year into account apparently has no effect at all?!
● Why do the title_full_unstemmed scores differ even though the title is identical in all three cases?

Example
Field                 Hit 1                                       Hit 4                                                 Hit 5
title_full_unstemmed  Algorithmische Graphentheorie Volker Turau  Algorithmische Graphentheorie Turau, Christoph Weyer  Algorithmische Graphentheorie Elektronische Ressource von Volker Turau
Textbook              no                                          yes                                                   no
Format boosting       Book                                        Book                                                  eBook
Publication year      1996                                        2015                                                  2010
● The title is not identical in the index!

Configuration in VuFind
● searchspecs.yaml for applying boost functions
● DisMax boosting parameters: https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
● Solr parameter debugQuery=true for analyzing the scoring
● Boost functions: https://wiki.apache.org/solr/FunctionQuery
– recip (does not work in the findex or sharding context)
– ord / rord
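The slides name recip without showing its formula. The Solr function query documentation defines recip(x,m,a,b) as a/(m*x+b), and a common pattern is to feed it a document's age in milliseconds so that newer records get a larger boost. The Python sketch below evaluates that formula; the helper age_boost and the chosen constants are assumptions for illustration and do not come from the tub.find configuration.

```python
def recip(x, m, a, b):
    """Reciprocal function as defined for Solr function queries: a / (m*x + b)."""
    return a / (m * x + b)

MS_PER_YEAR = 365.25 * 24 * 3600 * 1000  # roughly 3.16e10 milliseconds

def age_boost(age_in_years):
    # x is the age in milliseconds; with m = 1/MS_PER_YEAR the result
    # is 1 / (age_in_years + 1), i.e. 1.0 for a brand-new record.
    return recip(age_in_years * MS_PER_YEAR, 1 / MS_PER_YEAR, 1, 1)

boosts = [round(age_boost(years), 3) for years in (0, 1, 9)]
```

Unlike ord, the value here depends only on the record's own age, which decays smoothly instead of depending on which other values happen to be in the index.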

Relevance explanation from Solr
● Append debugQuery=true to the Solr request
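A minimal Python sketch of this step, assuming a JSON response format: it builds a request URL with debugQuery=true and pulls the per-document explanations out of an already parsed response. The host, the core name, the document id and the sample response are hypothetical; no request is actually sent.

```python
from urllib.parse import urlencode

def debug_url(base_url, query):
    """Return a select URL that asks Solr for scoring explanations as JSON."""
    params = {"q": query, "defType": "dismax", "debugQuery": "true", "wt": "json"}
    return base_url + "?" + urlencode(params)

def explanations(response):
    """Map document id -> explanation text from a parsed Solr JSON response."""
    return response.get("debug", {}).get("explain", {})

# Hypothetical host and core name.
url = debug_url("http://localhost:8983/solr/biblio/select",
                "algorithmische graphentheorie")

# Hand-written stand-in for a parsed response; a real one is much longer.
sample = {"debug": {"explain": {"doc-1": "explanation text for doc-1"}}}
explain_by_id = explanations(sample)
```

The explanation text for each document lists the individual score components, which is where breakdowns like the ones for the Graphentheorie example come from.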

Thank you
Good luck with optimizing
