Federated search engines allow users to search across multiple databases from a single search interface. This document discusses federated search engines, including their definition, benefits for users like saving time, and challenges around issues like differing search capabilities across databases and accurately deduplicating similar results. The document is presented as a slideshow for a conference on federated search engines.
Federated search engines: An introduction
Paul.Nieuwenhuysen@vub.ac.be
Prepared to support a presentation at the 1-day conference
about “Federated search engines”, organized by VVBAD,
section School Libraries, in Brussels, Belgium, on May 5, 2009
These slides should be available from the WWW site
http://www.vub.ac.be/BIBLIO/nieuwenhuysen/presentations/
(note: BIBLIO and not biblio)
and also from the WWW site of the organisers of the
conference, VVBAD.
Contents / summary / structure / overview of this presentation
1. Introduction and definition
2. Problem statement
3. Federated search engines as a partial solution
4. Meaning and confusion
5. Advantages / benefits ☺
6. Difficulties / limitations
7. Implementation
8. Putting federated searching in a wider context
9. (Some good information sources about federated searching)
Federated searching
Introduction and definition
Introduction:
scattering of sources
• Users want to exploit information sources fast and
effectively.
• This is hindered by the fact that the digital, electronic
information sources that may contain relevant
information are created in scattered places, distributed
over numerous computers across an intranet or even over
the Internet and the WWW.
Introduction:
scattering of sources
• In other words:
integration / aggregation is still far from perfect.
Introduction:
scattering of sources: difficulties
• Various sources
»must be used one after the other, which requires many
decisions and actions and costs time
»offer different user interfaces in the retrieval phase,
which is confusing and time-consuming
»offer each retrieved information item in various formats
»display items in different ways on screen
Federated searching
Problem statement
Introduction:
problem statements
Which methods have been developed and applied to cope
with this reality?
Federated searching
Federated search engines as a partial solution
Method 1: Merging = aggregating
into a searchable database
[Diagram: users query a single search engine over an aggregated database, built by merging several databases or web sites.]
Method 2: Federated searching
through scattered databases
[Diagram: users send a query to a federated search engine, which passes it on to the separate search engines of several databases.]
Federated searching:
definition
An ideal federated search system
1. allows a user to formulate a query,
2. adapts/transforms this query,
so that it can be sent with the proper syntax to the search
engine of each database in a chosen set/group of disparate databases,
3. broadcasts this query to those databases,
4. collects the results from each database,
5. (perhaps) consolidates these results into 1 result set,
6. (perhaps) detects and removes duplicate items,
7. shows the final results to the user in a unified format,
8. allows the user to sort the results by various criteria.
☺
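The eight steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the design of any product mentioned in these slides: the `Target` and `Record` types, the two in-memory "databases" and their query syntaxes (`word AND word` versus `+word +word`) are all hypothetical, the targets are queried one after the other rather than in parallel, and step 6 here only removes exact title/year duplicates.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    title: str
    year: int
    source: str


@dataclass
class Target:
    """A hypothetical target database with its own query syntax."""
    name: str
    translate: Callable[[List[str]], str]   # step 2: adapt the query
    search: Callable[[str], List[Record]]   # steps 3-4: send it, get results


def federated_search(terms: List[str], targets: List[Target],
                     sort_key: str = "year") -> List[Record]:
    merged: List[Record] = []
    for target in targets:                          # step 3: broadcast
        native_query = target.translate(terms)      # step 2
        merged.extend(target.search(native_query))  # steps 4-5: collect into 1 set
    unique = {}
    for rec in merged:                              # step 6: drop exact duplicates
        unique.setdefault((rec.title.casefold(), rec.year), rec)
    # steps 7-8: one unified record type, sorted by the chosen criterion
    return sorted(unique.values(), key=lambda r: getattr(r, sort_key))


# Two toy in-memory "databases" with different query syntaxes (hypothetical).
CATALOG = [Record("Federated search basics", 2008, "catalog"),
           Record("Cataloguing rules", 2005, "catalog")]
ARTICLES = [Record("Federated Search Basics", 2008, "articles"),
            Record("Federated search in libraries", 2007, "articles")]

catalog = Target(
    "catalog",
    translate=lambda terms: " AND ".join(terms),
    search=lambda q: [r for r in CATALOG
                      if all(w in r.title.lower() for w in q.split(" AND "))],
)
articles = Target(
    "articles",
    translate=lambda terms: " ".join("+" + t for t in terms),
    search=lambda q: [r for r in ARTICLES
                      if all(w.lstrip("+") in r.title.lower() for w in q.split())],
)

results = federated_search(["federated", "search"], [catalog, articles])
for r in results:
    print(r.year, r.title, f"({r.source})")
```

The same article found in both toy databases appears only once, and the merged list is ordered by year regardless of which target returned each item.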
Federated searching:
approach
• This type of computer system helps to integrate access to
distributed databases into one search action, as far as
possible.
• The catalogue of local library holdings can be one of the
target databases.
Federated searching:
scheme
[Diagram: end users ☺ reach various information sources through a portal for meta-searching = federated searching = cross-database searching.]
Federated searching
through scattered databases: why?
• Applications:
»Finding flights to a particular destination offered by
various airline companies
»Finding the availability of rooms in various hotels
»Finding information in various databases related to a
particular museum
»Finding information in bibliographic databases!
Federated searching
through scattered databases: why?
The perfect trip:
☺
1. A cheap and nice flight
2. A cheap and nice hotel
3. A visit to a nice museum
4. Something nice to read (free via your library)
Example
Federated searching: application:
finding a suitable flight
Example
Federated searching: application:
finding a hotel room in some city
Example
Federated searching:
searching in a museum
Example
Federated searching:
searching in a library
Federated searching:
integrating access
[Diagram: a meta-searching system integrates access to the intranet, WWW search engines, articles, journals, publishers' database(s), catalogs of other libraries, databases (full-text or bibliographic) and the local library catalog database(s).]
Federated searching:
produce - distribute - implement
Producers = developers = creators
Intermediate sellers = distributors
Implementers = users (for instance a library)
Federated searching:
examples of commercial software
Producing company Distributing / selling company Product name
Ex Libris MetaLib
Infor (was GEAC) V-Spaces
MuseGlobal MuseSearch
MuseGlobal CSA
Serials Solutions 360 Search
Serials Solutions WebFeat
Groxis
Vivisimo
Infotrieve
…
Federated searching
Meaning and confusion
Federated searching:
terminology / vocabulary / synonyms
federated searching
= meta-searching = metasearching
= cross-database searching
= multi-database searching
= multi-threaded searching
= one-stop searching
= poly-searching = polysearching
= broadcast searching
= searching through a portal (but the term “portal” is
also used with other meanings)
“Federated searching”
meaning and confusion
Here and in many other contexts,
the term “federated searching” is used
as a synonym for “meta-searching”.
“Federated searching”
meaning and confusion
However, some use the terms “federated searching” and
“meta-searching” with DIFFERENT meanings.
»“Federated searching” as searching through a database
that results from merging several databases.
So this is certainly NOT equal to “meta-searching”.
»“Federated searching” as meta-searching that is followed
by merging (federating) the items retrieved from various
databases into only 1 set, ordered in one way or another.
This language problem creates confusion.
“Federated searching”
meaning and confusion
• Furthermore:
A federated search engine as a software product
is NOT the same as
a federated searching system implemented as a service
that can be available to everyone on the WWW, to search
»public WWW search engines
»bookshop databases
»library catalogs / holdings
»flight databases
»hotel databases
Federated searching
Advantages / benefits
☺
Federated searching:
benefits for the users
+ The system can help the user to select appropriate
sources.
☺
Federated searching:
benefits for the users
+ The system can help in the process of authentication and
authorization when this involves not only simple
recognition of the IP address of the user’s client computer,
but also user IDs and passwords.
☺
Federated searching:
benefits for the users
+ The need to know which particular database is suitable
for a particular search is reduced, because several
databases can be searched in one action.
☺
Federated searching:
benefits for the users
+ The users have to learn only 1 user interface for
searching and only 1 search syntax,
instead of a user interface and a search syntax for each
database!
☺
Federated searching:
benefits for the users
+ Saves users the time of executing queries on various servers!
☺
Federated searching:
benefits for the users
+ Can make users search and exploit databases that they
would otherwise never use, that is, without a federated
search system!
☺
Federated searching:
benefits for the users
+ Useful, relevant, interesting items/references can be
found/uncovered from unexpected, unknown, unfamiliar
databases!
This is mainly beneficial in the case of interdisciplinary
subjects/topics.
☺
Federated searching:
benefits for the users
+ Offers a consistent display of results in the output phase
(even though the original sources display results in
various ways).
☺
Federated searching:
benefits for the users
+ Some systems offer tools to refine display of the results;
for instance
»to dedupe very similar items in the result set,
»to sort the results,
»to rank the results,
»to search within the result set,
»…
☺
Federated searching:
benefits for the users
+ Some systems offer interesting links from a retrieval
result to various related sources or services
(such as the full text or a document delivery service),
using a link generator based on the OpenURL standard.
☺
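A link generator of this kind typically receives the citation of a retrieved item as an OpenURL. The sketch below builds an OpenURL 1.0 (ANSI/NISO Z39.88-2004) link in key/encoded-value form for a journal article, using one of the references at the end of these slides as sample data. The resolver base URL `https://resolver.example.org/openurl` is a placeholder, and a real link resolver may expect additional keys (for example an `rfr_id` that identifies the referring system).

```python
from urllib.parse import parse_qs, urlencode, urlsplit

RESOLVER_BASE = "https://resolver.example.org/openurl"  # placeholder, not a real resolver


def make_openurl(citation: dict, base: str = RESOLVER_BASE) -> str:
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Copy only the citation elements that are actually known.
    for key in ("atitle", "jtitle", "issn", "volume", "issue", "spage", "epage", "date"):
        if citation.get(key):
            params["rft." + key] = citation[key]
    return base + "?" + urlencode(params)


link = make_openurl({
    "atitle": "Federated searching: friend or foe?",
    "jtitle": "College & Research Libraries News",
    "date": "2004",
    "spage": "518",
    "epage": "519",
})
print(link)
print(parse_qs(urlsplit(link).query)["rft.jtitle"])  # the '&' survives URL encoding
```

Elements that are unknown (here the ISSN and volume) are simply left out rather than guessed; the resolver then works with what it receives.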
Federated searching:
benefits for the users
+ Some systems check, for each retrieved bibliographic
description, whether the corresponding full text is
available online and indicate this to the user
immediately, on the fly.
☺
Federated searching:
benefits for the users
+ Some systems further process the retrieved results and
display them in an interesting way that is not offered by
the searched original systems. For instance:
» Clustering of results according to
— subject
— age
— availability of full text
» Displaying the results in a graphical way
☺
Federated searching:
benefits for the users
So far, so good!
☺
Federated searching
Difficulties / challenges / problems / limitations
Federated searching:
difficulties / challenges / problems
- Portal software tries to cope
with several difficulties/challenges/problems/pitfalls
that hinder the application of the “good idea”.
The user does not notice most of these problems and
shortcomings,
because the federated search system merges the results
from the various databases.
Federated searching:
difficulties / challenges / problems
- Searching in a target database may be restricted by the
federated search engine to a particular field (for example,
to words occurring in the title, because this is the
default way of searching of that system), while this
restriction is not present in other target databases.
Furthermore, this is perhaps not explained in the user
interface.
This may lead to lower recall, which is of course NOT
desirable.
Even worse, the user is perhaps not aware of this.
Federated searching:
difficulties / challenges / problems
- How to deduplicate/dedupe/cluster
very similar entries/results/items
= near-duplicates,
from various target sources?
When is similar similar enough?
Which entry/result/item
to choose/select
as the representative of a cluster of similar entries?
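One simple way to decide when “similar is similar enough” is to compare the normalised words of two titles with the Jaccard coefficient (shared words divided by all distinct words) and to require the same publication year. In the sketch below the threshold of 0.8, the same-year rule and the choice of the most complete record as the representative of a cluster are all assumptions; a real system would probably also compare authors, identifiers or pages.

```python
import re
from typing import Dict, List, Set


def title_tokens(title: str) -> Set[str]:
    """Lower-case word tokens; punctuation and case differences are ignored."""
    return set(re.findall(r"\w+", title.casefold()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_near_duplicates(items: List[Dict], threshold: float = 0.8) -> List[List[Dict]]:
    """Greedy clustering: each item joins the first cluster whose first member
    has the same year and a sufficiently similar title."""
    clusters: List[List[Dict]] = []
    for item in items:
        tokens = title_tokens(item["title"])
        for cluster in clusters:
            first = cluster[0]
            if (first.get("year") == item.get("year")
                    and jaccard(tokens, title_tokens(first["title"])) >= threshold):
                cluster.append(item)
                break
        else:
            clusters.append([item])
    return clusters


def representative(cluster: List[Dict]) -> Dict:
    """Pick the most complete record: the one with the most non-empty fields."""
    return max(cluster, key=lambda rec: sum(1 for v in rec.values() if v))


items = [
    {"title": "Federated search 101", "year": 2008, "journal": "", "source": "A"},
    {"title": "Federated Search 101.", "year": 2008,
     "journal": "Library Journal Netconnect", "source": "B"},
    {"title": "Challenges for federated searching", "year": 2007, "journal": "", "source": "A"},
]
clusters = cluster_near_duplicates(items)
print(len(clusters), representative(clusters[0])["source"])  # 2 B
```

Raising the threshold merges fewer records (more near-duplicates remain visible); lowering it risks merging genuinely different items.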
Federated searching:
difficulties / challenges / problems
- How to provide useful relevance ranking of search
results/entries,
even when the target databases can be quite different in
type and quality, and
even though no index is created in advance, “just in case”,
well before the search action, as Google and other
Internet search engines do?
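Without an index built in advance, a federated system can still combine the orderings that each target returns. The sketch below uses reciprocal rank fusion (RRF), a generic rank-merging technique that these slides do not mention, swapped in here purely for illustration: each item scores the sum of 1/(k + rank) over the result lists in which it appears, with the commonly used constant k = 60. It assumes the same item carries the same identifier in every list (which in practice requires the deduplication step discussed earlier), and it ignores differences in the quality of the targets; a per-target weight would be one way to add that.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reciprocal_rank_fusion(result_lists: Dict[str, List[str]],
                           k: int = 60) -> List[Tuple[str, float]]:
    """Merge per-target rankings: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranked_ids in result_lists.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable display.
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Hypothetical rankings returned by three targets for the same query.
rankings = {
    "catalog": ["doc-x", "doc-y"],
    "articles": ["doc-y", "doc-z"],
    "web": ["doc-y", "doc-x"],
}
for doc_id, score in reciprocal_rank_fusion(rankings):
    print(f"{doc_id}: {score:.4f}")
```

An item ranked highly by several targets rises to the top even if no single target put it first, which is a reasonable default when the targets' own scores are not comparable.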
Federated searching:
difficulties / challenges / problems
- Powerful / sophisticated / refined forms of searching may
not be applicable in a federated search.
Example:
limiting to a particular type of document,
such as a therapy in medicine.
This may cause a LOSS of time, instead of saving time.
Federated searching:
difficulties / challenges / problems
- Differences among target sources in the Internet
application protocols that they use by default
for connection/communication and retrieval,
such as
»telnet
»HTTP
»proprietary, non-standard protocols
»Z39.50 (ISO 23950), SRU, and related protocols that are
developed for federated searching!
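SRU expresses a search as an ordinary HTTP GET request with standardised parameters, which is what makes it attractive for federated searching compared with proprietary protocols. The sketch below builds an SRU 1.2 searchRetrieve URL. The base URL is a placeholder, and the CQL index `dc.title` and the record schema `dc` are assumptions: each SRU server announces which indexes and schemas it actually supports in its explain response.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_search_url(base_url: str, cql_query: str, start: int = 1,
                   max_records: int = 10, record_schema: str = "dc") -> str:
    """Build an SRU 1.2 searchRetrieve request (HTTP GET)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,             # the query is written in CQL
        "startRecord": start,
        "maximumRecords": max_records,
        "recordSchema": record_schema,  # server-dependent, assumed here
    }
    return base_url + "?" + urlencode(params)


url = sru_search_url("https://sru.example.org/catalog",   # placeholder server
                     'dc.title = "federated search"', max_records=5)
print(url)
print(parse_qs(urlsplit(url).query)["query"])
```

Because every SRU target accepts the same parameter names, a federated engine needs only one request builder for all of them; the remaining differences lie in the indexes and record schemas each server offers.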
Federated searching:
difficulties / challenges / problems
- Even when the target is compatible with a suitable set of
protocols for standardised retrieval
(Z39.50 / ISO 23950, SRU…),
difficulties can still arise due to
»poor implementations
»incomplete implementations
(the target may lack features supported by the protocol and
by the software for federated searching)
»variations in implementations
Federated searching:
difficulties / challenges / problems
- When a suitable protocol can NOT be used and simple
HTTP must be used for connection to the target source,
and
when simple HTML is used by the target source to
present results,
then the capture and analysis of the results by the
federated search system is complicated and difficult,
and can be disrupted when the target changes the way
it presents its results over time.
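The fragility of this approach is easy to demonstrate. The sketch below extracts result titles from an HTML results page with Python's standard `html.parser` module; the markup it looks for (`<a class="result-title">`) is an assumption about one hypothetical target. After a redesign of that target's pages, the same parser silently returns nothing, without raising any error.

```python
from html.parser import HTMLParser
from typing import List


class ResultTitleParser(HTMLParser):
    """Collects the text of <a class="result-title"> links (assumed target markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self._in_title = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "result-title" in classes:
            self._in_title = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False


def scrape_titles(html: str) -> List[str]:
    parser = ResultTitleParser()
    parser.feed(html)
    parser.close()
    return parser.titles


# The layout the parser was written for ...
page_v1 = ('<ul><li><a class="result-title" href="/r/1">Federated search 101</a></li>'
           '<li><a class="result-title" href="/r/2">Portals op de <b>pijnbank</b></a></li></ul>')
# ... and the same result after a hypothetical redesign of the target's pages.
page_v2 = '<div class="hit"><h3>Federated search 101</h3></div>'

print(scrape_titles(page_v1))
print(scrape_titles(page_v2))  # no error, just no results
```

The second call is the dangerous case for a federated system: the target appears to have found nothing, so the user sees fewer results with no hint that something went wrong.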
Federated searching:
difficulties / challenges / problems
- Various search engines may act in different ways!
For instance:
Is truncation of a word in a search query possible?
Is limitation to a particular field possible?
How can a federated search engine take these differences
into account?
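One way to take such differences into account is to keep a capability profile per target and to adapt each query term accordingly, while reporting what had to be dropped, so that the user is told about it instead of silently losing recall. The profile fields, the `field=term` query syntax and the warning texts below are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass
class Capabilities:
    """Hypothetical description of what one target search engine supports."""
    truncation: bool = False                    # trailing '*' allowed?
    field_limits: FrozenSet[str] = frozenset()  # fields that can be searched separately


def adapt_term(term: str, field_name: Optional[str],
               caps: Capabilities) -> Tuple[str, List[str]]:
    """Rewrite one query term for a target; return it with warnings for the user."""
    warnings: List[str] = []
    if term.endswith("*") and not caps.truncation:
        term = term.rstrip("*")
        warnings.append(f"truncation not supported: searching '{term}' without it")
    if field_name and field_name not in caps.field_limits:
        warnings.append(f"cannot limit to field '{field_name}': searching all fields")
        field_name = None
    query = f"{field_name}={term}" if field_name else term
    return query, warnings


full = Capabilities(truncation=True, field_limits=frozenset({"title", "author"}))
basic = Capabilities()
print(adapt_term("librar*", "title", full))
print(adapt_term("librar*", "title", basic))
```

The second target receives a weaker query, which is exactly the “lowest common denominator” effect; the difference is that the warnings can be shown in the user interface.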
Federated searching:
difficulties / challenges / problems
- A query with several words and without explicit Boolean
operators can be interpreted in various ways
by the various database retrieval systems.
For instance, the retrieval software may apply the
Boolean operator AND to combine all the query words,
but it may also use OR.
If the federated search system does not handle this
well, this may lead to lower recall and
precision.
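A federated system can avoid depending on each target's default by always spelling out the intended operator in that target's own syntax. The per-target profiles below (the default operator for bare words, and how AND and OR are written) are hypothetical examples, not the behaviour of any particular product.

```python
from typing import Dict, List

# Hypothetical target profiles: default operator for bare words, and explicit syntax.
TARGETS: Dict[str, Dict[str, str]] = {
    "catalog": {"default": "AND", "AND": " AND ", "OR": " OR "},
    "articles": {"default": "OR", "AND": " & ", "OR": " | "},
}


def naive_meaning(words: List[str], target: str) -> str:
    """What a bare multi-word query silently means at this target."""
    return f" {TARGETS[target]['default']} ".join(words)


def explicit_query(words: List[str], operator: str, target: str) -> str:
    """Spell out the intended operator in the target's own syntax."""
    return TARGETS[target][operator].join(words)


words = ["federated", "search"]
for name in TARGETS:
    print(name, "| bare query means:", naive_meaning(words, name),
          "| explicit AND:", explicit_query(words, "AND", name))
```

Sent as bare words, the same two-word query would be an AND search at one target and an OR search at the other; the explicit form gives both targets the same meaning.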
Federated searching:
difficulties / challenges / problems
- When a specific target database offers special,
non-standard, dedicated retrieval software that lets
the user exploit the database better than a standard
retrieval interface,
then the federated search system can probably not
exploit that source as well.
Searches are reduced to the lowest common denominator.
Federated searching:
difficulties / challenges / problems
- Differences in response time among the target sources.
A slow response of a target source can hinder the final
analysis and presentation of the results to the user.
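One way to keep one slow target from holding up the whole result list is to query all targets in parallel and to stop waiting after a fixed time, showing what has arrived and reporting which targets did not answer in time. The sketch below uses Python's `concurrent.futures`; the two target functions and the timeout values are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

Search = Callable[[str], List[str]]


def broadcast(query: str, targets: Dict[str, Search],
              timeout_s: float = 2.0) -> Tuple[Dict[str, List[str]], List[str]]:
    """Query all (at least one) targets in parallel; return the results that
    arrived in time, plus the names of targets that were late or failed."""
    pool = ThreadPoolExecutor(max_workers=len(targets))
    futures = {pool.submit(search, query): name for name, search in targets.items()}
    done, not_done = wait(futures, timeout=timeout_s)
    results: Dict[str, List[str]] = {}
    problems = [futures[f] for f in not_done]    # too slow
    for f in done:
        if f.exception() is None:
            results[futures[f]] = f.result()
        else:
            problems.append(futures[f])          # failed with an error
    # Do not wait for the slow targets; their threads finish in the background.
    pool.shutdown(wait=False)
    return results, sorted(problems)


def fast_target(q: str) -> List[str]:
    return [f"fast hit for {q}"]


def slow_target(q: str) -> List[str]:
    time.sleep(0.5)                              # simulates a slow server
    return [f"slow hit for {q}"]


results, late = broadcast("federated search",
                          {"fast": fast_target, "slow": slow_target}, timeout_s=0.1)
print(results, "not in time:", late)
```

The timeout is a trade-off: a short one gives a fast display but may leave out relevant targets, so the names of the late targets should be shown to the user.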
Federated searching:
difficulties / challenges / problems
- Differences among target sources in the
formatting/structuring of their database records in fields
hinder
- searching limited to a field
(for instance the author field)
- displaying selected fields only
(such as the retrieved titles)
- sorting of the displayed records on the contents of a
particular selected field
(such as author or publication date)
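Before it can limit, display or sort on a field, a federated system has to map each target's own record structure onto one common set of fields. The mapping below is a hypothetical sketch: the catalog uses simplified MARC-style tags (100 for the personal author, 245 for the title, 260 for the imprint, with subfield c holding the date) and the article database uses two-letter tags; the sample data comes from the reading list at the end of these slides. Records without a recognisable year are sorted last.

```python
import re
from typing import Dict, List, Optional

# Hypothetical native field names per target, mapped to common field names.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "catalog": {"author": "100a", "title": "245a", "date": "260c"},  # MARC-style, simplified
    "articles": {"author": "AU", "title": "TI", "date": "PY"},
}


def to_year(date_text: str) -> Optional[int]:
    """Return the first run of four digits in a free-text date such as '2003.' or 'c2009'."""
    match = re.search(r"\d{4}", date_text)
    return int(match.group()) if match else None


def normalise(record: Dict[str, str], target: str) -> Dict:
    common = {name: record.get(native, "") for name, native in FIELD_MAPS[target].items()}
    common["year"] = to_year(common["date"])
    common["source"] = target
    return common


raw = [
    ({"100a": "Tennant, Roy", "245a": "The right solution: federated search tools",
      "260c": "2003."}, "catalog"),
    ({"AU": "Joint, Nicholas", "PY": "2009",
      "TI": "Managing the implementation of a federated search tool in an academic library"},
     "articles"),
    ({"AU": "Baer, William", "TI": "Federated searching: friend or foe?"}, "articles"),
]
records: List[Dict] = [normalise(rec, target) for rec, target in raw]
by_year = sorted(records, key=lambda rec: (rec["year"] is None, rec["year"] or 0))
for rec in by_year:
    print(rec["year"], rec["author"], "-", rec["title"])
```

Sorting only works on the common fields, so any target whose records cannot be mapped (here, the Baer record without a date) ends up in an arbitrary or last position.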
Federated searching:
difficulties / challenges / problems
- Differences among target sources in the applied metadata
schemes in the databases to improve retrieval, such as
»classifications
»taxonomies
»thesaurus systems
»ontologies
This hinders the exploitation of the added value of such
metadata.
Federated searching:
difficulties / challenges / problems
- A user of a federated search system may perhaps
incorrectly assume
that ALL relevant databases are covered simply in 1
action, or
that if a database is not included,
then it must not be relevant/important.
However, even a federated search system can only search
a limited number of databases, so that perhaps some
relevant databases are NOT covered.
Federated searching:
difficulties / challenges / problems
- Students who rely on a federated search system may
perhaps not learn about the important subject-specific
databases in their field,
so that when they have no access anymore to the same
federated search system, they still do not know which
database may help them in their research and how to use
it well.
Federated searching:
difficulties / challenges / problems
- Some databases are accessible only by a limited number of
concurrent/simultaneous users from one organisation, as
agreed in the licence and controlled by the authorization
software of the database.
If such a database were included automatically in
all or in many federated searches,
then some users who really require access to that
particular database might not be able to use that
database.
Federated searching:
difficulties / challenges / problems
- When a database is accessible by an unlimited number of
concurrent/simultaneous users from one organisation,
and if such a database were included
automatically in all or in many federated searches, from
many organisations (even when the searcher does not
have any particular interest in that database),
then the retrieval system of that database may be
overburdened.
This is mainly a concern for information vendors, who
must maintain servers with sufficient capacity.
Federated searching:
difficulties / challenges / problems
- Some databases can NOT be included as a target
database in a federated searching engine,
because their owners/producers do not allow this.
This is an important difficulty, because in this way
interesting / valuable databases are perhaps not exploited
by users who rely on federated searching.
Even worse: one of the databases in this category is one of
the biggest and most important databases for the discovery of
scientific documents AND is freely accessible through its
direct, normal user interface: Google Scholar.
Federated searching:
difficulties / challenges / problems
- Users may be less impressed by a federated searching
system than by the simple, common, familiar, famous
Internet / WWW search engines, because in most cases its
response time is less impressive, due to the following differences:
- The computer hardware used by the systems
- Slower distributed searching through several computer
systems, versus faster searching through a more centralised
computer database of a priori compiled records
Federated searching:
difficulties / challenges / problems
- The evaluation of the quality of each search result
from a federated search action
may be more difficult than when each database is
searched separately,
because the user may be less aware of the limitations,
strengths, selection criteria and aims of the individual,
separate databases that offer each result.
For instance, peer-reviewed articles from reputable scientific
journals may be mixed with more popular and more biased,
unscientific texts from trade literature.
Federated searching
Implementation
Federated searching:
local or remote hosting
• The federated searching system can be developed and
maintained
»on a local computer in-house, or
»hosted on a more distant, external, remote computer;
this service is offered by some vendors of software for
federated searching
(a form of partial outsourcing)
Federated searching:
local hosting: scheme
[Diagram: end users ☺ reach various information sources through an in-house portal for meta-searching = federated searching = cross-database searching.]
Federated searching:
remote hosting: scheme
[Diagram: end users ☺ reach various information sources through an externally hosted portal for meta-searching = federated searching = cross-database searching.]
Federated searching:
local versus remote hosting
• Remote hosting may require
»a smaller initial investment in computer hardware and
skilled personnel
»less time investment in installation and maintenance of
equipment and software
Federated searching:
tasks for the library
Some roles/functions/tasks of the library
related to federated searching:
»of course providing a computer system for meta-searching
»maintaining a list of target information sources that are
appropriate in the framework of the particular library:
—subjects covered by the target databases should be
relevant
—subscriptions must have been made by the library for
access to the targets
Federated searching:
tasks for the library
»grouping databases in groups that correspond to subject
fields and offer these as pre-selections in the user interface
of the federated search system
»showing the system and its features to potential users
»…
Federated searching
in a library WWW site?
- Searching for books
- Searching for articles
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library
Federated searching
in a library WWW site?
- Searching for books
  - Catalog of this library
  - Other catalogs
  - Other book databases
  - Electronic books
  - Federated searching for books
- Searching for articles
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library
Federated searching
in a library WWW site?
- Searching for books
- Searching for articles
  - Databases to find articles
  - Electronic journals
  - Collective catalog of periodicals
  - Repositories of articles on the Internet and WWW
  - Federated searching for articles
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library
Federated searching
in a library WWW site!
- Find the information that you need
  - The catalog
  - Databases
  - To a federated search engine
- Opening hours
- Library services
- Rules and regulations
- Organisation of the library
Federated searching:
conclusion
Federated searching
- is a continuous challenge
for developers of the sophisticated software and
for the implementers in libraries and information centers
- offers benefits for those end-users
who are not keen to work with separate target
source databases
- does not eliminate the need for access to individual
databases
Libraries and information centres
Putting federated searching
in a wider context
Federated searching
+ link generator
[Diagram: the user ☺ finds a reference through federated searching in the information sources; a context-sensitive hyperlink generator, using a “knowledgebase” database with information about the local situation, offers a menu with links to the appropriate target, for example the full-text document.]
Federated search system
and link resolver compared
Problem to be solved | Federated search system | Link resolver
How to bring a user to many information sources in 1 action? | ! | -
How to bring a user from some information to related information? | - | !
Putting the digital tools together
in a library system
[Diagram: the user ☺ reaches, through the library WWW site, the catalogue(s) of local holdings, federated searching, and a context-sensitive hyperlink generator that uses a “knowledgebase” database about the local situation.]
Access to information sources:
tools / methods / systems
In sequence of priority:
1. Online library catalogue
(for hard copy and digital documents)
2. Library web site
3. Link generator + “knowledgebase”
4. Federated search system
5. …
Good information sources
about federated searching
Some good information sources
about federated searching
Baer, William
Federated searching: friend or foe?
College & Research Libraries News, October 2004, pp. 518-519.
Hofstede, Marten
Portals op de pijnbank.
Informatie Professional, 2002, No. 10, pp. 34-39.
Jacso, Peter
Thoughts about federated searching.
Information Today, October 2004, pp. 17, 20.
Joint, Nicholas
Managing the implementation of a federated search tool in an academic library.
Library Review, Vol. 58, No. 1, 2009, pp. 11-16.
Linoski, Alexis and Walczyk, Tine
Federated search 101.
Library Journal Netconnect Summer 2008, pp. 2-5.
Lockwood, Charles and Mac Donald, Patricia
Implementation of a federated search system in the academic library: lessons learned.
Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 73-91 and in Federated search: solution or setback for online library services (edited by Christopher N. Cox) The Haworth Information Press, 2007, pp. 73-91. Available online from: http://irsq.haworthpress.com
McHale, Nina
Accidental federated searching: implementing federated searching in the smaller academic library.
Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 93-110 and in Federated search: solution or setback for online library services (edited by Christopher N. Cox) The Haworth Information Press, 2007, pp. 93-110. Available online from: http://irsq.haworthpress.com
Noerr, Peter
Scaling the digital divide: how interoperable systems are making information more accessible.
In proceedings of the International Conference on Digital Libraries 2004: knowledge creation, preservation, access, and management, ICDL 2004, in Habitat Centre, New Delhi, India, 24-27 February 2004, Volume 1, 517 pp. New Delhi : TERI, The Energy and Resources Institute, 2004, ISBN 81-7993-029-7, pp. 66-68.
Reiss, Kevin
SRU, Open Data and the future of metasearch
Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 369-386 and in Federated search: solution or setback for online library services (edited by Christopher N. Cox) The Haworth Information Press, 2007, pp. 369-386. Available online from: http://irsq.haworthpress.com
Sadeh, Tamar
To Google or not to Google: metasearch design in the quest for the ideal user experience. [online]
In: Proceedings of the ELAG 2004 Conference, May 2004. Available from: http://www.elag.org/ [cited 2004]
Sadeh, Tamar
Transforming the metasearch concept into a friendly user experience.
Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 1-25 and in Federated search: solution or setback for online library services (edited by Christopher N. Cox) The Haworth Information Press, 2007, pp. 1-25. Available online from: http://irsq.haworthpress.com
Tennant, Roy
The right solution: federated search tools.
Library Journal, June 15, 2003, p. 28.
Webster, Peter M.
Challenges for federated searching.
Co-published simultaneously in Internet Reference Service Quarterly, Vol. 12, No. 1-2, 2007, pp. 357-368 and in Federated search: solution or setback for online library services (edited by Christopher N. Cox) The Haworth Information Press, 2007, pp. 357-368. Available online from: http://irsq.haworthpress.com
• You are free to copy, distribute, display this work under
the following conditions:
»Attribution:
You must mention the author.
»Noncommercial:
You may not use this work for commercial purposes.
»No Derivative Works:
You may not change, modify, alter, transform, or build
upon this work.
• For any reuse or distribution, you must make clear to
others the license terms of this work.