Center for the Study of New Media and Society
www.newmediacenter.ru
Information Retrieval Basics
Sergey Chernov
Information search in action…
5/24/2013 Sergey Chernov, Information Retrieval Basics
• Vladimir Pekhtin
• Alexey Navalny
• Doct_z
Public data
Resources and achievements
• Search engines
• Property-owner databases for Europe and the USA
• List of Deputies of the State Duma
• Man-hours invested in manual search and exploration
Results: 500+ news items, 150 articles, 20 interviews and videos; Pekhtin resigned from the Committee of Ethics
Outline for today
• Sources of Information
• Search strategies and tools
• Search Cases
• Assignments and Q&A Session
Information in numbers
• Facebook – 900 million users
• Twitter – 500 million
• Flickr – 50 million
• Delicious – 5 million
• Web – 1 trillion
Information Retrieval
• Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
Information Domains
[Diagram: three information domains, Desktop, Enterprise Web (Intranet), and Public Web (Internet), with sources labeled DVD, Disk, FShare, DB, Web, CMS, E-mail, People, Web Sites, Online Libraries, Online Shops, Social Networks]
Information Retrieval System
• Crawler: downloads/collects the data
• Indexer: processes the data and builds the inverted index
• Ranker: evaluates user queries against the index and computes a (ranked) list of results
• Display: organizes and displays the results to the user, and facilitates navigation through the result set
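The indexer and ranker stages can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular engine: the sample documents stand in for crawled data, and the names `tokenize`, `build_index`, and `search`, as well as the plain term-count scoring, are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical "crawled" collection: document id -> text.
DOCS = {
    1: "information retrieval finds documents in large collections",
    2: "a crawler downloads documents from the web",
    3: "the ranker orders documents for a user query",
}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Indexer: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Ranker: score documents by the summed frequency of query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(DOCS)
results = search(index, "crawler documents")
print(results)
```

Real systems replace the raw term counts with weighting schemes and add the display stage on top; the sketch only shows why an inverted index makes query evaluation a lookup rather than a scan of every document.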
User Needs
• Need [Broder 2002, Rose and Levinson 2004]
  - Informational – want to learn about something (example: Low hemoglobin)
  - Navigational – want to go to that page (example: United Airlines)
  - Transactional – want to do something (web-mediated)
    - Access a service (example: Seattle weather)
    - Downloads (example: Mars surface images)
    - Shop (example: Canon S410)
  - Gray areas
    - Find a good hub (example: Car rental Brasil)
    - Exploratory search “see what’s there”
Sec. 19.4.1
How far do people look for results?
(Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf)
How to evaluate results? CRAAP
• Currency: How old is the material? Does the age matter? For history, older information may be better; for medicine, fresher information.
• Relevance: How well does it fit? Does it answer my question? Is it detailed enough?
• Authority: Who wrote it? Is the author qualified to write on the topic? Is contact information provided?
• Accuracy: Is it supported by evidence? Refereed? Verifiable? Unbiased? Clearly written?
• Purpose: What can you infer about the author's message? Is it fact, opinion, or propaganda?
Source: California State University, Chico, http://www.csuchico.edu/lins/handouts/eval_websites.pdf
Where to search?
• Web
• Subject directories
• Intranet and Desktop
• Digital libraries
• Social platforms
• Databases and Hidden Web
• Business analytics
• Wikipedia
• Photo stocks
• Open datasets and Linked Data
• Open Gov Data
Web
Subject directories
http://webupon.com/search-engines/top-five-subject-directories-and-how-to-use-them/
Intranet
Desktop
Digital libraries
Social platforms
Databases and Hidden Web
Business Analytics
Wikipedia
Photo stocks
Linked Data
Open Data
Search is a journey
Is that all?
http://www.flickr.com/photos/morville
Exploratory search
Lookup
• Question answering
• Fact retrieval
• Known-item search
• Navigational search
• Lasts for seconds
Exploratory search (Investigate, Learn)
• Knowledge acquisition
• Comprehension
• Comparison
• Discovery
• Serendipity
• Incremental search
• Driven by uncertainty
• Non-linear behavior
• Result analysis
• Lasts for hours
Exploratory behavior
• Learn
  - About the search topic
  - About the collection
• Reformulate query
  - Broadening
  - Narrowing
  - Changing the focus
• Socialize
  - Looking for experts
  - Collaborative search
Search tools
• Web search engines
• Personalized search
• Faceted search
• Review services
• Geo-services
• Question answering
• Scientific search
• Domain-specific search
• Recommender systems
Web search engine
Query suggestions
Snippets
Web search engine (2)
Web search engine (3)
• Search for pages that link to a URL – “link:” operator
  link:google.com/images
• Search for pages that are similar to a URL – “related:”
  related:nytimes.com
• Search for results from specific sites – “site:”
  site:strelkainstitute.com
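A small helper shows how these operators combine with ordinary keywords. It only builds the query string and does not contact any search engine; the function name `build_query` and its parameters are made up for this sketch, and operator support varies between search engines and over time.

```python
def build_query(keywords="", site=None, related=None, link=None):
    """Build a query string with optional search operators.

    Assumes Google-style syntax, where the operator is followed
    directly by its value with no space after the colon.
    """
    parts = []
    if keywords:
        parts.append(keywords.strip())
    for operator, value in (("site", site), ("related", related), ("link", link)):
        if value:
            parts.append(f"{operator}:{value.strip()}")
    return " ".join(parts)


print(build_query("information retrieval", site="strelkainstitute.com"))
print(build_query(related="nytimes.com"))
print(build_query(link="google.com/images"))
```

Combining keywords with `site:` is a common way to narrow a query to one source when that site's own search is weak.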
Personalized search
• Personalization is the modeling of a user’s preferences from previous interactions
  - Queries, click-through analysis, eye tracking …
• Personalized search is usually implemented as:
  - Re-ranking and filtering of the search results
  - Personalized query expansion
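Re-ranking can be illustrated with a toy example. This is a sketch under stated assumptions, not how any production engine works: the user profile is a bag of terms taken from earlier queries, and the mixing weight `alpha`, the result tuples, and the function names are invented for the illustration.

```python
from collections import Counter


def build_profile(history):
    """Model user preferences as term counts from past queries."""
    profile = Counter()
    for text in history:
        profile.update(text.lower().split())
    return profile


def rerank(results, profile, alpha=0.5):
    """Re-rank (title, base_score) pairs by mixing in a profile-overlap score.

    alpha is a hypothetical weight: 0 keeps the original order,
    larger values favor titles that share terms with the profile.
    """
    total = sum(profile.values()) or 1

    def personalized(item):
        title, base_score = item
        overlap = sum(profile[t] for t in title.lower().split()) / total
        return base_score + alpha * overlap

    return sorted(results, key=personalized, reverse=True)


history = ["python programming tutorial", "python list comprehension", "learn programming"]
results = [
    ("Python snakes of Asia", 0.90),
    ("Python programming guide", 0.85),
    ("Monty Python sketches", 0.80),
]
reranked = rerank(results, build_profile(history))
for title, _ in reranked:
    print(title)
```

With this history the programming guide moves ahead of the other two results, because its title overlaps most with the terms the user searched for before. Personalized query expansion would instead add profile terms to the query before it reaches the ranker.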
Faceted search
It’s about Result Analysis!
facet
facet values
Faceted search (2)
It’s about Query Reformulation!
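The two roles of facets named on these slides, result analysis (counting facet values) and query reformulation (filtering by a chosen value), can be sketched as follows. The item records, the facet names `brand` and `type`, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical result set; each item carries facet fields.
ITEMS = [
    {"title": "Canon S410", "brand": "Canon", "type": "compact"},
    {"title": "Canon EOS 5D", "brand": "Canon", "type": "dslr"},
    {"title": "Nikon D90", "brand": "Nikon", "type": "dslr"},
]


def facet_counts(items, facet):
    """Result analysis: count how many items fall under each facet value."""
    return Counter(item[facet] for item in items)


def refine(items, **selected):
    """Query reformulation: keep items matching every selected facet value."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]


print(facet_counts(ITEMS, "brand"))
narrowed = refine(ITEMS, brand="Canon", type="dslr")
print([item["title"] for item in narrowed])
```

The counts tell the user what the result set contains before reading any item; selecting a value then narrows the query without the user having to type new keywords.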
Review services
Geo-services
Question answering
Scientific search
Scientific Search (2)
Domain-specific search
Recommender systems
Case 1: finding a research paper
Case 2: planning a trip
Case 3: looking for an expert
Case 4: market analysis
Practical assignment
• Construct 3 information needs relevant to your everyday experience (preparing for an interview, choosing a learning course, doing homework, etc.)
• Search for the information, using as many sources and tools as possible
• Share your experience


Information retrieval basics_v1.0

  • 1. Center for the Study of New Media and Society www.newmediacenter.ru Information Retrieval Basics Sergey Chernov
  • 2. Information search in action… 5/24/2013 Sergey Chernov, Information Retrieval Basics  Vladimir Pekhtin  Alexey Navalny  Doct_z
  • 3. Public data 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 4. Resources and achievements  Search engines  Databases for property owners in Europe & USA  List of Deputies of State Duma  Man-hours invested in manual search and exploration Results: 500+ news, 150 articles, 20 interviews and videos, Pekhtin resigned from Committee of Ethics 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 5. Outline for today  Sources of Information  Search strategies and tools  Search Cases  Assignments and Q&A Session 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 6. Outline for today  Sources of Information  Search strategies and tools  Search Cases  Assignments and Q&A Session 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 7. Information in numbers  Facebook – 900 mln users  Twitter – 500 mln  Flickr – 50 mln  Delicious – 5 mln  Web – 1 trln 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 8. Information Retrieval  Information Retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). 8
  • 9. Information Domains Desktop Enterprise Web (Intranet) Public Web (Internet) DVD Disk FShare DB Web CMS E-mail People Web SitesOnline Libraries Online Shops Social Networks
  • 10. Information Retrieval System Downloads/collects the data Processes the data and builds Inverted Index Evaluates user queries against the index and computes a list of (ranked) results Organizes and displays the results to the user, facilitates navigation through the result set Crawler Indexer Ranker Display
  • 11. User Needs  Need [Broder 2002, Rose and Levinson 2004]  Informational – want to learn about something  Navigational – want to go to that page  Transactional – want to do something (web-mediated)  Access a service  Downloads  Shop  Gray areas  Find a good hub  Exploratory search “see what’s there” Low hemoglobin United Airlines Seattle weather Mars surface images Canon S410 Car rental Brasil Sec. 19.4.1 11
  • 12. How far do people look for results? (Source: iprospect.com WhitePaper_2006_SearchEngineUserBehavior.pdf) 12
  • 13. How to evaluate results? CRAAP  Currency  Relevance  Authority  Accuracy  Purpose 5/24/2013 Sergey Chernov, Information Retrieval Basics http://www.csuchico.edu/lins/handouts/eval_websites.pdf  How old is the material? Does the age matter? History – better old info, medicine –fresh stuff.  How well does it fit? Does it answer my question? Detailed enough?  Who wrote it? Is the author is qualified to write? What about contact information?  Is it supported by evidence? Refereed? Verifiable? Unbiased? Clearly written?  What can you infer about authors‘ message? Is it fact, opinion or propaganda? California State University, Chico
  • 14. Where to search?  Web  Subject directories  Intranet and Desktop  Digital libraries  Social platforms  Databases and Hidden Web  Business analytics  Wikipedia  Photo stocks  Open datasets and Linked Data  Open Gov Data 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 15. Web 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 16. Subject directories 5/24/2013 Sergey Chernov, Information Retrieval Basics http://webupon.com/search- engines/top-five-subject-directories-and- how-to-use-them/
  • 17. Intranet 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 18. Desktop 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 19. Digital libraries 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 20. Social platforms 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 21. Databases and Hidden Web 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 22. Business Analytics 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 23. Wikipedia 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 24. Photo stocks 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 25. Linked Data 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 26. Open Data 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 27. Outline for today  Sources of Information  Search strategies and tools  Search Cases  Assignments and Q&A Session 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 28. Search is a journey Is that all? http://www.flickr.com/photos/morville
  • 29. Search is a journey http://www.flickr.com/photos/morville
  • 30. Search is a journey http://www.flickr.com/photos/morville
  • 31. Search is a journey http://www.flickr.com/photos/morville
  • 32. Search is a journey http://www.flickr.com/photos/morville
  • 33. Exploratory search Lookup Question answering Fact retrieval Known-item search Navigational search Lasts for seconds Exploratory search InvestigateLearn Knowledge acquisition Comprehension Comparison Discovery Serendipity Incremental search Driven by uncertainty Non-linear behavior Result analysis Lasts for hours
  • 34. Exploratory behavior  Learn  About the search topic  About the collection  Reformulate query  Broadening  Narrowing  Changing the focus  Socialize  Looking for experts  Collaborative search
  • 35. Search tools  Web search engines  Personalized search  Faceted search  Review services  Geo-services  Question answering  Scientific search  Domain-specific search  Recommender systems 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 36. Web search engine 5/24/2013 Sergey Chernov, Information Retrieval Basics Query suggestions Snippets
  • 37. Web search engine (2) 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 38. Web search engine (3)  Search for pages that link to a URL – “link:” operator link: google.com/images  Search for pages that similar to a URL – “related:” related: nytimes.com  Search for results from specific sites – “site:” site: strelkainstitute.com 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 39. Personalized search 5/24/2013 Sergey Chernov, Information Retrieval Basics  Personalization is a modeling of user’s preferences from previous interactions  Queries, click-through analysis, eye tracking …  Personalized Search usually implemented as:  Re-ranking and filtering of the search results  Personalized query expansion
  • 40. 5/24/2013 Sergey Chernov, Information Retrieval Basics
  • 41. Faceted search It’s about Result Analysis! facet facet values
  • 42. Faceted search (2) It’s about Query Reformulation!
  • 43. Review services
  • 44. Geo-services
  • 45. Question answering
  • 46. Scientific search
  • 47. Scientific Search (2)
  • 48. Domain-specific search
  • 49. Recommender systems
  • 50. Outline for today  Sources of Information  Search strategies and tools  Search Cases  Assignments and Q&A Session
  • 51. Case 1: finding a research paper
  • 52. Case 2: planning a trip
  • 53. Case 3: looking for an expert
  • 54. Case 4: market analysis
  • 55. Outline for today  Sources of Information  Search strategies and tools  Search Cases  Assignments and Q&A Session
  • 56. Practical assignment  Construct 3 information needs relevant to your everyday experience (preparing for an interview, choosing a learning course, doing homework, etc.)  Search for the information, using the maximum number of sources and tools  Share your experience
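
The operators listed on slide 38 can also be combined programmatically when many similar queries are needed. The following is only a minimal sketch: the helper names `build_query` and `search_url` and the base URL are assumptions made for illustration, not part of the original slides, and operator support differs between search engines and changes over time.

```python
from urllib.parse import urlencode


def build_query(terms="", site=None, related=None, link=None):
    """Compose a query string from free-text terms and the slide-38 operators.

    Hypothetical helper: each operator is written without a space after the
    colon, e.g. site:strelkainstitute.com.
    """
    parts = []
    if terms:
        parts.append(terms)
    if site:
        parts.append(f"site:{site}")
    if related:
        parts.append(f"related:{related}")
    if link:
        parts.append(f"link:{link}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Return a URL with the query percent-encoded in the `q` parameter.

    The base URL is an assumption; substitute the engine you actually use.
    """
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    q = build_query("urban design", site="strelkainstitute.com")
    print(q)
    print(search_url(q))
```

Keeping query construction separate from URL encoding makes it easy to try the same information need against several engines by changing only `base`.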

Editor's notes

  1. Here is what a search environment for a company employee looks like
  2. Need this slide in case some people are not familiar with how an IR system works. This is a very simplified standard architecture. In different scenarios some of these components may be absent. Depending on the level of the participants, one may spend some time explaining how each component works.
  3. Currency: the timeliness of the information. When was the information published or posted? Has the information been revised or updated? Is the information current or out-of-date for your topic? Are the links functional?
     Relevance: the importance of the information for your needs. Does the information relate to your topic or answer your question? Who is the intended audience? Is the information at an appropriate level (i.e. not too elementary or advanced for your needs)? Have you looked at a variety of sources before determining this is one you will use? Would you be comfortable using this source for a research paper?
     Authority: the source of the information. Who is the author/publisher/source/sponsor? Are the author's credentials or organizational affiliations given? What are the author's qualifications to write on the topic? Is there contact information, such as a publisher or e-mail address? Does the URL reveal anything about the author or source? Examples: .com (commercial), .edu (educational), .gov (U.S. government), .org (nonprofit organization), or .net (network).
     Accuracy: the reliability, truthfulness, and correctness of the content. Where does the information come from? Is the information supported by evidence? Has the information been reviewed or refereed? Can you verify any of the information in another source or from personal knowledge? Does the language or tone seem unbiased and free of emotion? Are there spelling, grammar, or other typographical errors?
     Purpose: the reason the information exists. What is the purpose of the information? To inform? Teach? Sell? Entertain? Persuade? Do the authors/sponsors make their intentions or purpose clear? Is the information fact? Opinion? Propaganda? Does the point of view appear objective and impartial? Are there political, ideological, cultural, religious, institutional, or personal biases?
     By scoring each of the five categories on a scale from 1 to 10 (1 = worst, 10 = best possible) you can give each site a grade on a 50-point scale for how high-quality it is: 45 - 50 Excellent | 40 - 44 Good | 35 - 39 Average | 30 - 34 Borderline Acceptable | Below 30 - Unacceptable
  4. Subject directories can help one find more in-depth information on a certain subject than a plain search engine. Whether one is looking for articles for medical or academic purposes or out of plain curiosity, one way to find information is by using a basic search engine; however, if one is searching for information on a specific topic and wants direct, to-the-point information, one needs to use a subject directory. Which ones to choose and why can be difficult, so I compiled a list of the most commonly used ones and a few hidden gems I found on the internet.
     Librarians’ Internet Index (LII) – Over 20,000 articles compiled by public librarians with completely reliable sources.
     INFOMINE (Infomine.) – Over 250,000 articles compiled by academic librarians, all reliable sources. We are talking college-level information here. Want an A or a raise? This is a great site for well-researched information, all written by experts.
     About.com (About.) – With nearly 2 million articles, About.com is one of the leading subject directories. These articles are written by people with experience in the area in which they write.
     Google Directory (Google Directory) – With well over 5 million articles, this is by far the leader in subject directories. This is of course enhanced by the Google search engine, which means more results on the chosen topic of research.
     Yahoo Directory (Yahoo Directory.) – With just over 4 million articles, Yahoo offers up lots of useful information. The only drawback is that this subject directory really works best with popular topics, not vague ones.
     Read more: http://webupon.com/search-engines/top-five-subject-directories-and-how-to-use-them/#ixzz2LHYbMsJ7
  5. The Million Book Project (or the Universal Library) was a book digitization project led by Carnegie Mellon University School of Computer Science and University Libraries.[1] Working with government and research partners in India (Digital Library of India) and China, the project scanned books in many languages, using OCR to enable full text searching, and providing free-to-read access to the books on the web. As of 2007, they have completed the scanning of 1 million books and have made accessible the entire database from http://www.ulib.org.
     The Internet Archive is a non-profit digital library with the stated mission of "universal access to all knowledge."[2][3] It offers permanent storage of and free public access to collections of digitized materials, including websites, music, moving images, and nearly three million public-domain books; as of October 2012 it held over 10 petabytes in cultural material.[4]
     CiteSeer was a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science. It became public in 1998 and had many new features unavailable in academic search engines at that time.
     The arXiv (pronounced "archive", as if the "X" were the Greek letter Chi, χ) is an archive for electronic preprints of scientific papers in the fields of mathematics, physics, astronomy, computer science, quantitative biology, statistics, and quantitative finance which can be accessed online. In many fields of mathematics and physics, almost all scientific papers are self-archived on the arXiv. On October 3, 2008, arXiv.org passed the half-million article milestone.[2] The preprint archive turned 20 years old on August 14, 2011.[3] By 2012 the submission rate had grown to more than 7000 per month.[4]
  6. Web 2.0
  7. The Deep Web (also called the Deepnet, the Invisible Web, the Undernet or the hidden Web) is World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines. http://www.makeuseof.com/tag/10-search-engines-explore-deep-invisible-web/ It should not be confused with the dark Internet, the computers that can no longer be reached via Internet, or with the distributed filesharing network Darknet, which could be classified as a smaller part of the Deep Web.
     Mike Bergman, founder of BrightPlanet and credited with coining the phrase,[1] said that searching on the Internet today can be compared to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed.[2] Most of the Web's information is buried far down on dynamically generated sites, and standard search engines do not find it. Traditional search engines cannot "see" or retrieve content in the deep Web; those pages do not exist until they are created dynamically as the result of a specific search. The deep Web is several orders of magnitude larger than the surface Web.[3]
     Deep Web content falls into the following categories:
     Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
     Unlinked content: pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
     Private Web: sites that require registration and login (password-protected resources).
     Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).
     Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers which prohibit search engines from browsing them and creating cached copies.[8])
     Scripted content: pages that are only accessible through links produced by JavaScript as well as content dynamically downloaded from Web servers via Flash or Ajax solutions.
     Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
  8. Business analytics (BA) refers to the skills, technologies, applications and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.[1] Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods. In contrast, business intelligence traditionally focuses on using a consistent set of metrics to both measure past performance and guide business planning, which is also based on data and statistical methods.
     Business analytics makes extensive use of data, statistical and quantitative analysis, explanatory and predictive modeling,[2] and fact-based management to drive decision making. Analytics may be used as input for human decisions or may drive fully automated decisions. Business intelligence is querying, reporting, OLAP, and "alerts."
  9. The Semantic Web is a collaborative movement led by the international standards body, the World Wide Web Consortium (W3C).[1] The standard promotes common data formats on the World Wide Web. By encouraging the inclusion of semantic content in web pages, the Semantic Web aims at converting the current web dominated by unstructured and semi-structured documents into a "web of data". The Semantic Web stack builds on the W3C's Resource Description Framework (RDF).[2] According to the W3C, "The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries."[2]
     YAGO2s is a huge semantic knowledge base, derived from Wikipedia, WordNet and GeoNames. Currently, YAGO2s has knowledge of more than 10 million entities (like persons, organizations, cities, etc.) and contains more than 120 million facts about these entities.
  10. http://www.budgetstockphoto.com/free_stock_photos.html
  11. In computing, linked data describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as HTTP and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.[1]
     Tim Berners-Lee, director of the World Wide Web Consortium, coined the term in a design note discussing issues around the Semantic Web project.[2] However, the idea is very old and is closely related to concepts including database network models, citations between scholarly articles, and controlled headings in library catalogs.
     Tim Berners-Lee gave a presentation on linked data at the TED 2009 conference.[4] In it, he restated the linked data principles as three "extremely simple" rules:
     (1) All kinds of conceptual things, they have names now that start with HTTP.
     (2) I get important information back. I will get back some data in a standard format which is kind of useful data that somebody might like to know about that thing, about that event.
     (3) I get back that information it's not just got somebody's height and weight and when they were born, it's got relationships. And when it has relationships, whenever it expresses a relationship then the other thing that it's related to is given one of those names that starts with HTTP.
     FOAF (an acronym of Friend of a friend) is a machine-readable ontology describing persons, their activities and their relations to other people and objects. Anyone can use FOAF to describe him or herself. FOAF allows groups of people to describe social networks without the need for a centralised database. FOAF is a descriptive vocabulary expressed using the Resource Description Framework (RDF) and the Web Ontology Language (OWL). Computers may use these FOAF profiles to find, for example, all people living in Europe, or to list all people both you and a friend of yours know.[1][2] This is accomplished by defining relationships between people. Each profile has a unique identifier (such as the person's e-mail addresses, a Jabber ID, or a URI of the homepage or weblog of the person), which is used when defining these relationships.
     The GeoNames geographical database is available for download free of charge under a Creative Commons attribution license. It contains over 10 million geographical names and consists of over 8 million unique features, of which 2.8 million are populated places, and 5.5 million alternate names. All features are categorized into one of nine feature classes and further subcategorized into one of 645 feature codes. The data is accessible free of charge through a number of web services and a daily database export. GeoNames is already serving up to over 30 million web service requests per day.
  12. Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other "Open" movements such as open source, open hardware, open content, and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov.
     Open data is often focused on non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the communal good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by license.
     Data.gov is a U.S. government website launched in late May 2009 by the then Federal Chief Information Officer (CIO) of the United States, Vivek Kundra. According to its website, "The purpose of Data.gov is to increase public access to high value, machine readable datasets generated by the Executive Branch of the Federal Government."[1]
     Open Data Commons is the home of a set of legal tools to help you provide and use Open Data.
     D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG and CSS. D3’s emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to DOM manipulation.
  13. Recommender systems or recommendation systems (sometimes replacing "system" with a synonym such as platform or engine) are a subclass of information filtering system that seek to predict the 'rating' or 'preference' that a user would give to an item (such as music, books, or movies) or social element (e.g. people or groups) they had not yet considered, using a model built from the characteristics of an item (content-based approaches) or the user's social environment (collaborative filtering approaches).[1][2]
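
The collaborative filtering approach mentioned in the note above can be illustrated with a very small user-based predictor. This is only a sketch under stated assumptions: the ratings, the user and item names, and the choice of cosine similarity with a similarity-weighted average are all invented for illustration and are not taken from the slides; production recommenders use far larger data and more careful normalization.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ann": {"book_a": 5, "book_b": 3, "book_c": 4},
    "bob": {"book_a": 4, "book_b": 3, "book_d": 5},
    "eve": {"book_b": 1, "book_c": 2, "book_d": 2},
}


def cosine(u, v):
    """Cosine similarity of two rating dicts; missing ratings count as 0."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict(user, item, ratings):
    """Predict a rating as the similarity-weighted mean of other users' ratings.

    Returns None when no other user with positive similarity rated the item.
    """
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += sim
    return num / den if den else None


if __name__ == "__main__":
    # Estimate how "ann" might rate an item she has not yet considered.
    print(predict("ann", "book_d", ratings))
```

A content-based approach would instead compare item characteristics (for example, genre or keywords) with the items a user already rated highly; the similarity function stays the same, only the vectors change.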