4. Resources and achievements
Search engines
Databases for property owners in Europe & USA
List of Deputies of State Duma
Man-hours invested in manual search and exploration
Results: 500+ news items, 150 articles, 20 interviews and videos; Pekhtin resigned from the Committee of Ethics
5/24/2013 Sergey Chernov, Information Retrieval Basics
5. Outline for today
Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session
7. Information in numbers
Facebook – 900 mln users
Twitter – 500 mln users
Flickr – 50 mln users
Delicious – 5 mln users
Web – 1 trln pages
8. Information Retrieval
Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
9. Information Domains
Desktop: Disk, DVD, File Share, E-mail
Enterprise Web (Intranet): DB, CMS, People
Public Web (Internet): Web Sites, Online Libraries, Online Shops, Social Networks
10. Information Retrieval System
Crawler – downloads/collects the data
Indexer – processes the data and builds the inverted index
Ranker – evaluates user queries against the index and computes a list of (ranked) results
Display – organizes and displays the results to the user, facilitates navigation through the result set
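The crawl → index → rank → display pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not a real engine: a toy in-memory corpus stands in for the crawler's output, query terms are AND-matched, and ranking is by raw term frequency (a deliberate simplification of scoring functions such as TF-IDF or BM25).

```python
from collections import defaultdict

def build_index(docs):
    """Indexer: map each term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, docs, query):
    """Ranker: require all query terms (boolean AND), then order the
    matches by total term frequency, highest first."""
    terms = query.lower().split()
    if not terms:
        return []
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(candidates,
                  key=lambda d: sum(docs[d].lower().split().count(t) for t in terms),
                  reverse=True)

# Toy corpus standing in for the crawler's output.
docs = {1: "web search basics",
        2: "web crawler builds the index",
        3: "web search web index"}
index = build_index(docs)
print(search(index, docs, "web index"))  # → [3, 2]
```

Document 3 ranks first because it matches both query terms three times in total; document 1 is excluded because it lacks the term "index".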
11. User Needs
Need [Broder 2002; Rose and Levinson 2004]
Informational – want to learn about something ("low hemoglobin")
Navigational – want to go to that page ("United Airlines")
Transactional – want to do something (web-mediated):
  Access a service ("Seattle weather", "Mars surface images")
  Downloads
  Shop ("Canon S410")
Gray areas:
  Find a good hub ("car rental Brasil")
  Exploratory search – "see what's there"
Sec. 19.4.1
12. How far do people look for results?
(Source: iprospect.com, WhitePaper_2006_SearchEngineUserBehavior.pdf)
13. How to evaluate results? CRAAP
Currency – How old is the material? Does the age matter? For history, older information may be better; for medicine, you want fresh material.
Relevance – How well does it fit? Does it answer my question? Is it detailed enough?
Authority – Who wrote it? Is the author qualified to write on the topic? Is contact information provided?
Accuracy – Is it supported by evidence? Refereed? Verifiable? Unbiased? Clearly written?
Purpose – What can you infer about the authors' message? Is it fact, opinion, or propaganda?
http://www.csuchico.edu/lins/handouts/eval_websites.pdf
California State University, Chico
14. Where to search?
Web
Subject directories
Intranet and Desktop
Digital libraries
Social platforms
Databases and Hidden Web
Business analytics
Wikipedia
Photo stocks
Open datasets and Linked Data
Open Gov Data
27. Outline for today
Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session
5/24/2013 Sergey Chernov, Information Retrieval Basics
28. Search is a journey
Is that all?
http://www.flickr.com/photos/morville
33. Exploratory search
Lookup: question answering, fact retrieval, known-item search, navigational search. Lasts for seconds.
Exploratory search (learn, investigate): knowledge acquisition, comprehension, comparison, discovery, serendipity. Incremental search, driven by uncertainty, non-linear behavior, result analysis. Lasts for hours.
34. Exploratory behavior
Learn – about the search topic, about the collection
Reformulate query – broadening, narrowing, changing the focus
Socialize – looking for experts, collaborative search
37. Web search engine (2)
38. Web search engine (3)
Search for pages that link to a URL – "link:" operator
link:google.com/images
Search for pages that are similar to a URL – "related:" operator
related:nytimes.com
Search for results from a specific site – "site:" operator
site:strelkainstitute.com
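Such operators are just prefixes appended to the query string, so query scoping is easy to automate. The sketch below is a hypothetical helper (the function name and its keyword arguments are ours, not from the slides), following the Google-style operator syntax shown above; actual operator support varies by search engine.

```python
def scoped_query(terms, site=None, related=None, link=None):
    """Compose a web-search query string with optional scoping operators.

    Hypothetical helper: appends Google-style operators to plain query
    terms. Operator support varies by engine.
    """
    parts = [terms]
    if link:
        parts.append(f"link:{link}")
    if related:
        parts.append(f"related:{related}")
    if site:
        parts.append(f"site:{site}")
    return " ".join(p for p in parts if p)

print(scoped_query("lecture slides", site="strelkainstitute.com"))
# → lecture slides site:strelkainstitute.com
```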
39. Personalized search
Personalization models a user's preferences from previous interactions:
queries, click-through analysis, eye tracking, …
Personalized search is usually implemented as:
re-ranking and filtering of the search results
personalized query expansion
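A minimal sketch of the re-ranking approach: each result's base relevance score is boosted by the user's accumulated preference for the terms it contains. The term-weight profile here is a toy stand-in for what a real system would learn from queries and clicks; real personalization models are far richer.

```python
def rerank(results, profile):
    """Personalized re-ranking: boost each result's base relevance score
    by the user's accumulated preference for terms in its title."""
    def personalized_score(result):
        boost = sum(profile.get(term, 0.0)
                    for term in result["title"].lower().split())
        return result["score"] + boost
    return sorted(results, key=personalized_score, reverse=True)

# Toy profile, e.g. derived from past queries and click-through data.
profile = {"python": 0.5, "tutorial": 0.3}
results = [{"title": "Java tutorial",   "score": 1.0},
           {"title": "Python tutorial", "score": 0.9}]
print([r["title"] for r in rerank(results, profile)])
# → ['Python tutorial', 'Java tutorial']
```

The lower-scored result wins after personalization because its terms match the user's profile more strongly (0.9 + 0.8 = 1.7 versus 1.0 + 0.3 = 1.3).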
50. Outline for today
Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session
51. Case 1: finding a research paper
52. Case 2: planning a trip
53. Case 3: looking for an expert
54. Case 4: market analysis
55. Outline for today
Sources of Information
Search strategies and tools
Search Cases
Assignments and Q&A Session
56. Practical assignment
Construct 3 information needs relevant to your everyday experience (preparing for an interview, choosing a learning course, doing homework, etc.)
Search for the information using the maximum number of sources and tools
Share your experience
Editor's notes
Here is what a search environment for a company employee looks like
This slide is needed in case some people are not familiar with how an IR system works. This is a very simplified standard architecture; in different scenarios some of these components may be absent. Depending on the level of the participants, you may spend some time explaining how each component works.
Currency: the timeliness of the information. When was the information published or posted? Has the information been revised or updated? Is the information current or out-of-date for your topic? Are the links functional?
Relevance: the importance of the information for your needs. Does the information relate to your topic or answer your question? Who is the intended audience? Is the information at an appropriate level (i.e., not too elementary or advanced for your needs)? Have you looked at a variety of sources before determining this is one you will use? Would you be comfortable using this source for a research paper?
Authority: the source of the information. Who is the author/publisher/source/sponsor? Are the author's credentials or organizational affiliations given? What are the author's qualifications to write on the topic? Is there contact information, such as a publisher or e-mail address? Does the URL reveal anything about the author or source? Examples: .com (commercial), .edu (educational), .gov (U.S. government), .org (nonprofit organization), or .net (network).
Accuracy: the reliability, truthfulness, and correctness of the content. Where does the information come from? Is the information supported by evidence? Has the information been reviewed or refereed? Can you verify any of the information in another source or from personal knowledge? Does the language or tone seem unbiased and free of emotion? Are there spelling, grammar, or other typographical errors?
Purpose: the reason the information exists. What is the purpose of the information: to inform, teach, sell, entertain, persuade? Do the authors/sponsors make their intentions or purpose clear? Is the information fact, opinion, or propaganda? Does the point of view appear objective and impartial? Are there political, ideological, cultural, religious, institutional, or personal biases?
By scoring each category on a scale from 1 to 10 (1 = worst, 10 = best possible) you can give each site a grade on a 50-point scale for how high-quality it is: 45-50 Excellent | 40-44 Good | 35-39 Average | 30-34 Borderline Acceptable | below 30 Unacceptable.
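The 50-point grading scheme above is mechanical enough to express directly in code; this sketch simply sums the five 1-10 scores and maps the total to the grade bands listed in the notes.

```python
def craap_grade(currency, relevance, authority, accuracy, purpose):
    """Sum the five CRAAP scores (1-10 each) and map the 50-point total
    to the grade bands: 45-50 Excellent, 40-44 Good, 35-39 Average,
    30-34 Borderline Acceptable, below 30 Unacceptable."""
    total = currency + relevance + authority + accuracy + purpose
    if total >= 45:
        grade = "Excellent"
    elif total >= 40:
        grade = "Good"
    elif total >= 35:
        grade = "Average"
    elif total >= 30:
        grade = "Borderline Acceptable"
    else:
        grade = "Unacceptable"
    return total, grade

print(craap_grade(9, 8, 9, 8, 8))  # → (42, 'Good')
```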
Subject directories can help one find more in-depth information on a certain subject than just a plain search engine. Whether one is looking for medical or academic articles, or is just plain curious, one way to find information is a basic search engine; however, if one is searching for information on a specific topic and wants direct, to-the-point information, one needs a subject directory. Which ones to choose, and why, can be difficult, so here is a list of the most commonly used ones and a few hidden gems:
Librarians' Internet Index (LII) – over 20,000 articles compiled by public librarians, with completely reliable sources.
INFOMINE – over 250,000 articles compiled by academic librarians, all reliable sources. We are talking college-level information here; want an A or a raise, this is a great site for well-researched information, all written by experts.
About.com – with nearly 2 million articles, About.com is one of the leading subject directories. These articles are written by people with experience in the area in which they write.
Google Directory – with well over 5 million articles, this is by far the leader in subject directories. This is of course enhanced by the Google search engine, which means more results on the chosen topic of research.
Yahoo Directory – with just over 4 million articles, Yahoo offers up lots of useful information. The only drawback is that this subject directory really works best with popular topics, not vague ones.
Read more: http://webupon.com/search-engines/top-five-subject-directories-and-how-to-use-them/#ixzz2LHYbMsJ7
The Million Book Project (or the Universal Library) was a book digitization project led by Carnegie Mellon University School of Computer Science and University Libraries.[1] Working with government and research partners in India (Digital Library of India) and China, the project scanned books in many languages, using OCR to enable full-text searching and providing free-to-read access to the books on the web. As of 2007, they had completed the scanning of 1 million books and made the entire database accessible from http://www.ulib.org.
The Internet Archive is a non-profit digital library with the stated mission of "universal access to all knowledge."[2][3] It offers permanent storage of, and free public access to, collections of digitized materials, including websites, music, moving images, and nearly three million public-domain books; as of October 2012 it held over 10 petabytes of cultural material.[4]
CiteSeer was a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science. It became public in 1998 and had many new features unavailable in academic search engines at that time.
The arXiv (pronounced "archive", as if the "X" were the Greek letter chi, χ) is an archive for electronic preprints of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance, which can be accessed online. In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv. On October 3, 2008, arXiv.org passed the half-million-article milestone.[2] The preprint archive turned 20 years old on August 14, 2011.[3] By 2012 the submission rate had grown to more than 7000 per month.[4]
Web 2.0
The Deep Web (also called the Deepnet, the Invisible Web, the Undernet, or the hidden Web) is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines. http://www.makeuseof.com/tag/10-search-engines-explore-deep-invisible-web/ It should not be confused with the dark Internet, the computers that can no longer be reached via the Internet, or with the distributed file-sharing network Darknet, which could be classified as a smaller part of the Deep Web. Mike Bergman, founder of BrightPlanet and credited with coining the phrase,[1] said that searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed.[2] Most of the Web's information is buried far down on dynamically generated sites, and standard search engines do not find it. Traditional search engines cannot "see" or retrieve content in the deep Web; those pages do not exist until they are created dynamically as the result of a specific search. The deep Web is several orders of magnitude larger than the surface Web.[3]
Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
Unlinked content: pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
Private Web: sites that require registration and login (password-protected resources).
Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).
Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers which prohibit search engines from browsing them and creating cached copies).[8]
Scripted content: pages that are only accessible through links produced by JavaScript, as well as content dynamically downloaded from Web servers via Flash or Ajax solutions.
Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
Business analytics (BA) refers to the skills, technologies, applications, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.[1] Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. In contrast, business intelligence traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods. Business analytics makes extensive use of data, statistical and quantitative analysis, explanatory and predictive modeling,[2] and fact-based management to drive decision making. Analytics may be used as input for human decisions or may drive fully automated decisions. Business intelligence is querying, reporting, OLAP, and "alerts".
The Semantic Web is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C).[1] The standard promotes common data formats on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web, dominated by unstructured and semi-structured documents, into a "web of data". The Semantic Web stack builds on the W3C's Resource Description Framework (RDF).[2] According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."[2]
YAGO2s is a huge semantic knowledge base, derived from Wikipedia, WordNet, and GeoNames. Currently, YAGO2s has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.
In computing, linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.[1] Tim Berners-Lee, director of the World Wide Web Consortium, coined the term in a design note discussing issues around the Semantic Web project.[2] However, the idea is very old and is closely related to concepts including database network models, citations between scholarly articles, and controlled headings in library catalogs.[citation needed]
Tim Berners-Lee gave a presentation on linked data at the TED 2009 conference.[4] In it, he restated the linked data principles as three "extremely simple" rules:
1. All kinds of conceptual things now have names that start with HTTP.
2. I get important information back: data in a standard format, useful data that somebody might like to know about that thing, about that event.
3. The information I get back is not just somebody's height and weight and when they were born; it has relationships. And whenever it expresses a relationship, the other thing that it is related to is given one of those names that starts with HTTP.
FOAF (an acronym of Friend Of A Friend) is a machine-readable ontology describing persons, their activities, and their relations to other people and objects. Anyone can use FOAF to describe him- or herself. FOAF allows groups of people to describe social networks without the need for a centralised database. FOAF is a descriptive vocabulary expressed using the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Computers may use these FOAF profiles to find, for example, all people living in Europe, or to list all people both you and a friend of yours know.[1][2] This is accomplished by defining relationships between people. Each profile has a unique identifier (such as the person's e-mail address, a Jabber ID, or a URI of the person's homepage or weblog), which is used when defining these relationships.
The GeoNames geographical database is available for download free of charge under a Creative Commons attribution license. It contains over 10 million geographical names and consists of over 8 million unique features, whereof 2.8 million are populated places and 5.5 million are alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes. The data is accessible free of charge through a number of web services and a daily database export. GeoNames is already serving up to over 30 million web service requests per day.
Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents, or other mechanisms of control. The goals of the open data movement are similar to those of other "open" movements such as open source, open hardware, open content, and open access. The philosophy behind open data has long been established (for example, in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov.
Open data is often focused on non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience, and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents, and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.
Data.gov is a U.S. government website launched in late May 2009 by the then Federal Chief Information Officer (CIO) of the United States, Vivek Kundra. According to its website, "The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government."[1]
Open Data Commons is the home of a set of legal tools to help you provide and use open data.
D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3's emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
Recommender systems or recommendation systems (sometimes replacing "system" with a synonym such as platform or engine) are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' a user would give to an item (such as music, books, or movies) or social element (e.g., people or groups) they have not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches).[1][2]
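The collaborative filtering approach mentioned above can be sketched in a few lines: predict a user's rating for an unseen item as a similarity-weighted average of other users' ratings. This is a toy user-based variant with cosine similarity and made-up data; production systems use far larger matrices and techniques such as matrix factorization.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    overlap = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in overlap)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict(ratings, user, item):
    """Predict user's rating for item as a similarity-weighted average of
    other users' ratings (user-based collaborative filtering)."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other != user and item in theirs:
            sim = cosine(ratings[user], theirs)
            num += sim * theirs[item]
            den += abs(sim)
    return num / den if den else 0.0

# Toy rating data: ann's tastes track bob's, so bob's rating dominates.
ratings = {"ann": {"matrix": 5, "dune": 4},
           "bob": {"matrix": 5, "dune": 4, "brazil": 5},
           "eve": {"matrix": 1, "brazil": 2}}
print(round(predict(ratings, "ann", "brazil"), 2))
```

The prediction for ann lands close to bob's rating of 5 rather than eve's 2, because ann's rating vector is far more similar to bob's.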