Contextualisation has proven effective in tailoring search results towards users' information needs. While this is true for basic query search, the use of contextual session information during exploratory search, especially on the level of browsing, has so far been underexposed in research. In this paper, we present two approaches that contextualise browsing on the level of structured metadata in a Digital Library (DL): (1) one variant is based on document similarity and (2) one variant utilises implicit session information, such as queries and the document metadata encountered during a user's session. We evaluate our approaches in a living lab environment using a DL in the social sciences and compare them against a non-contextualised approach. Over a period of more than three months we analysed 47,444 unique retrieval sessions that contain search activities on the level of browsing. Our results show that contextualising browsing significantly outperforms our baseline in terms of the position of the first clicked item in the result set. The mean rank of the first clicked document (measured as Mean First Relevant, MFR) was 4.52 with the non-contextualised ranking, compared to 3.04 when re-ranking the result lists based on similarity to the previously viewed document. Furthermore, both contextual approaches show a noticeably higher clickthrough rate: contextualisation based on document similarity leads to almost twice as many document views as the non-contextualised ranking.
Contextualised Browsing in a Digital Library’s Living Lab
1. Contextualised Browsing in a Digital Library’s Living Lab
Zeljko Carevic, Sascha Schüller, Philipp Mayr, Norbert Fuhr
JCDL 2018
2. Introduction
Exploratory search (especially browsing/stratagem search) is one of the most frequent search activities in DLs [1-3]
DLs offer high-quality structured metadata that can be utilised for browsing, e.g.:
Keywords
Classifications
Journals
System support on this level is rather low, e.g. browsing a DL by keywords acts as a simple Boolean filter
6. Research Question
Can we improve the effectiveness of exploratory
search on the level of browsing by using contextual
ranking features in comparison to a non-contextual
ranking feature?
7. Approach A: Baseline
Default ranking that is based on a query expansion
including synonyms and translations.
Browsing is not contextualised.
Q = expanded query, e.g. Keyword:"sport"
D = set of documents
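The baseline's query expansion can be sketched as follows. This is a minimal illustration only: the synonym/translation table and the function name are assumptions, not the actual expansion tables used in Sowiport.

```python
# Hypothetical sketch of the baseline's expanded browsing query (Approach A):
# a keyword click is expanded into a Boolean OR over the keyword, its
# synonyms, and its translations.
def expand_keyword_query(keyword, expansions):
    """Build a Boolean OR query over a keyword and its expansions."""
    terms = [keyword] + expansions.get(keyword, [])
    return "Keyword:(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Assumed synonym/translation table for illustration.
EXPANSIONS = {"sport": ["sports", "Sport"]}

print(expand_keyword_query("sport", EXPANSIONS))
# -> Keyword:("sport" OR "sports" OR "Sport")
```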
8. Approach B: Document Similarity
Re-rank documents according to their similarity to
the seed document.
To measure the similarity between two documents we employ Solr's "More Like This" query parser.
Q = expanded query, e.g. Keyword:"sport"
D = set of documents
D_s = seed document
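A "More Like This" request against Solr could be built roughly as below. The core URL, field names, and parser parameters are assumptions for illustration (the editor notes state that similarity uses the keywords, journal, abstract, and author fields of the seed document).

```python
# Sketch of Approach B: rank documents by similarity to the seed document
# using Solr's {!mlt} query parser. Field names and the core URL are assumed.
SOLR_CORE = "http://localhost:8983/solr/documents"  # assumed core URL

def mlt_params(seed_doc_id, rows=10):
    """Build Solr 'select' parameters for a More Like This query."""
    return {
        "q": f"{{!mlt qf=keywords,journal,abstract,authors mintf=1 mindf=1}}{seed_doc_id}",
        "rows": rows,
        "fl": "id,title,score",
    }

# Usage (requires a running Solr instance):
#   import requests
#   docs = requests.get(f"{SOLR_CORE}/select",
#                       params=mlt_params("doc42")).json()["response"]["docs"]
```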
9. Approach C: Session Context
Re-rank documents based on previous search activities -> session context
Q = expanded query, e.g. Keyword:"sport"
D = set of documents
U_c = session context
10. Approach C: Session Context
The session context contains information about:
Submitted queries (e.g. "violence" and "violence and sports")
The set of keywords and classifications contained in seen documents and in documents within a result set
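The re-ranking in Approach C can be sketched as a simple overlap score between a document's metadata and the accumulated session context. The scoring scheme below (plain term overlap, equal weights) is an assumption; the slide does not spell out the exact weighting.

```python
# Illustrative sketch of Approach C: re-rank candidate documents by how
# many of their keywords/classifications also occur in the session context.
def rerank_by_session_context(documents, session_context):
    """Sort documents by metadata overlap with the session context.

    documents: list of dicts with 'keywords' and 'classifications' sets
    session_context: dict with 'queries', 'keywords', 'classifications'
    """
    context_terms = (set(session_context["keywords"])
                     | set(session_context["classifications"]))

    def score(doc):
        doc_terms = set(doc["keywords"]) | set(doc["classifications"])
        return len(doc_terms & context_terms)

    return sorted(documents, key=score, reverse=True)

docs = [
    {"id": 1, "keywords": {"sport"}, "classifications": {"10200"}},
    {"id": 2, "keywords": {"violence", "sport"}, "classifications": {"10200"}},
]
ctx = {"queries": ["violence", "violence and sports"],
       "keywords": {"violence"}, "classifications": {"10200"}}
print([d["id"] for d in rerank_by_session_context(docs, ctx)])  # -> [2, 1]
```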
11. Experiment
For a period of 3 months each Sowiport user
is assigned one approach at the beginning of
a session:
A: Baseline (non-contextualised)
B: Document similarity (contextualised)
C: Session context (contextualised)
Sowiport, a DL for the Social Sciences, as a living lab:
9.5 million documents
20,000 unique users per week
12. Methodology
Measure the effectiveness of our contextualised
ranking features on two levels:
Mean First Relevant (MFR): the mean rank of the first clicked document in a result set [4]
Usefulness [5]
Local usefulness: the immediate relevance of a document
Global usefulness: the total number of implicit relevance signals for the entire session, starting from stratagem usage
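The MFR measure above can be sketched in a few lines. The input format (result lists as sequences of clicked/unclicked entries) is an assumption for illustration.

```python
# Minimal sketch of Mean First Relevant (MFR): the mean 1-based rank of
# the first clicked document, averaged over result lists with a click.
def first_relevant_rank(result_list):
    """Return the 1-based rank of the first clicked document, or None."""
    for rank, doc in enumerate(result_list, start=1):
        if doc["clicked"]:
            return rank
    return None

def mean_first_relevant(result_lists):
    ranks = [r for r in (first_relevant_rank(rl) for rl in result_lists)
             if r is not None]
    return sum(ranks) / len(ranks)

lists = [[{"clicked": False}, {"clicked": True}],  # first click at rank 2
         [{"clicked": True}]]                      # first click at rank 1
print(mean_first_relevant(lists))  # -> 1.5
```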
13. Results
~600,000 sessions in total
Equal distribution of:
Total stratagem usage
Interactions per session
Dwell time
Document views from stratagem search notably
higher for the contextualised approaches
14. Results: Mean first relevant
Baseline significantly outperformed by both contextual re-ranking features
Document similarity performs best
As result sets might contain only few documents, we additionally measure MFR for result sets with ≥ 20 documents
MFR increases for all approaches for result sets ≥ 20
Bonferroni-corrected p* = 0.016
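The reported threshold p* = 0.016 is consistent with a standard alpha of 0.05 divided by three pairwise comparisons (A vs. B, A vs. C, B vs. C); the number of comparisons is an inference, not stated on the slide.

```python
# Bonferroni correction: the per-test significance threshold is the
# family-wise alpha divided by the number of comparisons.
def bonferroni_alpha(alpha, n_comparisons):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_comparisons

# 0.05 / 3 = 0.0166..., reported truncated as p* = 0.016 on the slide.
print(bonferroni_alpha(0.05, 3))
```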
15. Results: Mean first relevant with
different history sizes (HS)
MFR increases with growing HS
Effect most evident for the baseline
HS has the lowest effect on approach C: the better the session context, the better the re-ranking
Sample size rather low
Approach C highly depends on the number of interactions resulting in a more meaningful context -> cold-start problem
History size is defined by the number of interactions prior to stratagem search.
16. Results: Usefulness
Similar observations as for MFR:
Baseline outperformed by both contextualised approaches
Document similarity performs best
Global usefulness only marginally different
17. Results (Summary)
Document views from stratagem search notably
higher for the contextualised approaches
Both contextual ranking features outperform the
baseline in terms of MFR.
Document similarity performs best; esp. for short
sessions
Performance of the session context increases
with growing history sizes
In terms of usefulness the re-ranking based on
document similarity performs best.
Differences in session-related features like dwell time could not be found.
18. Strengths and Limitations
Pros
Real life environment with real users
Large sample of online users
Strong indication for a need for contextual ranking
features
Cons
No information about the relevance of the clicked
documents
User is not aware of the re-ranking and thus not
able to tune the results
19. Outlook
Evaluate contextualisation in a controlled
environment.
Gather information about the explicit relevance
of clicked documents
Introduce a transparent re-ranking interface that
enables users to tune the ranking (e.g. disable
contextualisation)
Implement more sophisticated re-ranking
approaches e.g.:
Mouse tracking
Collaborative contextualisation
20. Conclusion
Implemented two contextual re-ranking features
that rank documents according to:
Document Similarity
Session Context
Evaluation in a living lab for the Social Sciences
Contextual ranking significantly outperforms the non-
contextualised baseline.
Contextualisation has an immediate influence on
the local usefulness of search results.
21. References
[1] Zeljko Carevic, Maria Lusky, Wilko van Hoek, and Philipp
Mayr. 2017. Investigating exploratory search activities based on
the stratagem level in digital libraries. International Journal on
Digital Libraries (2017), 1–21.
[2] Zeljko Carevic and Philipp Mayr. 2016. Survey on High-level Search Activities based on the Stratagem Level in Digital Libraries. In Proceedings of TPDL 2016. Springer, 54–66.
[3] Philipp Mayr and Ameni Kacem. 2017. A Complete Year of User Retrieval Sessions in a Social Sciences Academic Search Engine. In Proceedings of TPDL 2017. Springer, 560–565.
[4] Norbert Fuhr. 2017. Some Common Mistakes In IR Evaluation, And How They Can Be Avoided. Technical Report. University of Duisburg-Essen, Germany.
[5] Daniel Hienert and Peter Mutschke. 2016. A usefulness-based approach for measuring the local and global effect of IIR services. In Proceedings of the 2016 ACM Conference on Human Information Interaction and Retrieval. ACM, 153–162.
Editor's notes
Entering a DL via a Google search, finding one good record, and quickly losing the context (because browsing is too simple)
Each of these interactions (7) leads to a new result list containing documents that share the same attribute with the seed document, which is also part of the result list (8). Our approach is to re-rank these result lists based on contextual information about the user's search sessions.
SR is an extension to the default ranking DR.
To compute the similarity of all documents to the seed document we use the keywords, journal information, the
abstract (in different languages if available), and the author names of the seed document.
MFR is an improvement over the reciprocal rank (RR) and mean reciprocal rank (MRR).