Course Overview:
An Introduction to Information
 Retrieval and Applications


           J. H. Wang
          Feb. 22, 2012
Instructor & TA
• Instructor
    –   J. H. Wang ( 王正豪 )
    –   Assistant Professor, CSIE, NTUT
    –   Office: R1534, Technology Building
    –   E-mail: jhwang@csie.ntut.edu.tw
    –   Tel: ext. 4238
    –   Office Hours: 9:00 am-12:00 pm, every Tuesday and
        Wednesday
• TA
    – Mr. Liu ( 劉瀚之 )
    – R1424, Technology Building
IR, Spring 2012      NTUT CSIE                2
Course Description
• Course Web Page
    – http://www.ntut.edu.tw/~jhwang/IR/
• Time: 9:10 am-12:00 pm, Thu.
• Classroom: R1322, Technology Building
• Textbook:
    – Christopher D. Manning, Prabhakar Raghavan and Hinrich
      Schuetze, Introduction to Information Retrieval, Cambridge
      University Press, 2008.
        • Available online
        • International Student Edition, imported by Kai-Fa ( 開發 )
          Publishing
• Prerequisites:
    – Basic knowledge of data structures and algorithms, linear
      algebra, and probability theory
    – Programming experience is *required* for homework and
      projects
Additional References
• References:
    – Ricardo Baeza-Yates and Berthier Ribeiro-Neto,
      Modern Information Retrieval: The Concepts and
      Technology behind Search, Addison-Wesley, 2011.
        • This is the second edition of their 1999 book Modern
          Information Retrieval. ( 華通 )
    – Stefan Buettcher, Charles L.A. Clarke, and Gordon V.
      Cormack, Information Retrieval: Implementing and
      Evaluating Search Engines, MIT Press, 2010.
    – Bruce Croft, Donald Metzler, and Trevor Strohman,
      Search Engines: Information Retrieval in Practice,
      Addison-Wesley, 2010. ( 全華 )
More Books on IR
• Gerard Salton, Automatic information organization and
  retrieval, McGraw-Hill, 1968.
• Gerard Salton and M.J. McGill, Introduction to modern
  information retrieval, McGraw-Hill, 1983.
    – Two classics, but out-of-print.
• C. J. van Rijsbergen, Information Retrieval, Butterworths,
  1979.
    – The classic. More than 30 years old, but still worth reading.
• K. Sparck Jones, P. Willett, Readings in Information
  Retrieval, Morgan Kaufmann, 1997.
    – A collection of classical IR papers. (out of print)
• I.H. Witten, A. Moffat, T.C. Bell, Managing Gigabytes,
  2nd edition, Morgan Kaufmann, 1999.
    – The authority on index construction and compression.
Grading Policy
• Homework assignments and
  programming exercises: 40%
• Mid-term exam: 25%
• Term project: 35%
    – Including the proposal and final report
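As a worked example of how these weights combine, here is a minimal sketch. The function name and the sample scores are hypothetical, and it assumes each component is graded on a 0-100 scale, which the slide does not state.

```python
# Percentage weights taken from the grading policy on this slide.
WEIGHTS = {"homework": 40, "midterm": 25, "project": 35}

def final_score(scores):
    """Weighted average of component scores (each assumed to be 0-100)."""
    total = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return total / 100

# Hypothetical student: 80 on homework, 70 on the midterm, 90 on the project.
print(final_score({"homework": 80, "midterm": 70, "project": 90}))  # prints 81.0
```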




Programming Exercises and Term Project
• About 3 programming exercises
    – Team-based (at most 2 persons per team)
    – You can either write your own code or reuse existing
      open source code
• The term project
    – Either team-based system development (the same as
      programming exercises)
    – Or academic paper presentation
        • Only one person per team allowed
    – A proposal is required before midterm (Apr. 12, 2012)

About the Term Project
• The score you get depends on the difficulty and
  quality of your project
    – For system development:
        • System functions and correctness
    – For academic paper presentation
        • Quality of the paper and of your presentation
        • Major methods/experimental results *must* be presented
        • Papers from top conferences are strongly suggested
            – E.g. SIGIR, WWW, CIKM, WSDM, JCDL, ICMR, …
        • Proposals are *required* for each team, and will be counted in
          the score

Online Submission
• Submission instructions
    – Programs, project proposals, and project
      reports in electronic files must be submitted to
      the TA online at:
        • http://140.124.183.39/ir/
    – Before submission:
        • User name: Your student ID
        • Please change your default password at your first
          login

What this Course is NOT about
• This course will NOT tell you
    – The tips and tricks of using search engines,
      although power users might have better ideas on how
      to improve them
        • There are plenty of books and websites on that…
    – How to find books in libraries,
      although it’s somewhat related to the basic IR
      concepts
    – How to make money on the Web,
      although the currently largest search engine did it


What’s Information Retrieval

On Wikipedia

On Google Images

On Google Video Search

On Google News (TW)

On Google News (US)

On Blogs

On Google Translate…

Or More Related Keywords
• NBA
• New York Knicks
• Linsanity
• …

What if We Search in Chinese

And More…
• 紐約尼克 (New York Knicks)
• 哈佛 (Harvard)
• 台裔球員 (player of Taiwanese descent)
• …
• And other languages…
• And other search engines…
• And social websites…

In Google Trends

And More…

And Other Keywords…

And Other Keywords…

Palanteer – TW Election

What Is Information Retrieval?
• “Information retrieval is a field concerned
  with the structure, analysis, organization,
  storage, searching, and retrieval of
  information.” (Salton, 1968)




Goal
• Information retrieval (IR): a research field
  that aims to search information in text and
  multimedia documents effectively and
  efficiently
• In this course, we will introduce the basic
  text and query models in IR, retrieval
  evaluation, indexing and searching, and
  applications for IR

A Big Picture
[Figure: the retrieval process. Components: User Interface, Text Operations, Query Expansion, Indexing, Retrieval, Ranking, Inverted Index, Document Collection. Flow labels: user need, text, logical view, doc representation, user feedback, query, inverted file, retrieved docs, ranked docs.]
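The stages named on this slide (text operations, indexing, retrieval, ranking) can be sketched in a few lines. This is only an illustration under simplifying assumptions: whitespace tokenization stands in for text operations, retrieval is a Boolean AND over an in-memory inverted index, and ranking uses raw query-term counts. The function names and sample documents are hypothetical and do not come from the course material.

```python
from collections import defaultdict

def tokenize(text):
    """Text operations: lowercase and split on whitespace (deliberately crude)."""
    return text.lower().split()

def build_index(docs):
    """Indexing: map each term to the sorted list of IDs of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def retrieve(index, query):
    """Retrieval: Boolean AND, i.e. intersect the postings of all query terms."""
    postings = [set(index.get(term, [])) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

def rank(docs, doc_ids, query):
    """Ranking: order docs by total query-term frequency, ties broken by doc ID."""
    terms = tokenize(query)

    def score(doc_id):
        tokens = tokenize(docs[doc_id])
        return sum(tokens.count(term) for term in terms)

    return sorted(doc_ids, key=lambda doc_id: (-score(doc_id), doc_id))

# Hypothetical document collection.
docs = {
    1: "Jeremy Lin leads the New York Knicks",
    2: "Knicks win again as Lin scores",
    3: "Lin plays for the Knicks and Lin went to Harvard",
    4: "Search engines index the web",
}
index = build_index(docs)
hits = retrieve(index, "lin knicks")
print(rank(docs, hits, "lin knicks"))  # prints [3, 1, 2]
```

Document 3 ranks first because "lin" occurs twice in it; real systems replace this raw count with the weighting schemes covered later in the course.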
Topics
• Text IR
    – Indexing and searching
    – Query languages and operations
• Retrieval evaluation
• Modeling
    – Boolean model
    – Vector space model
    – Probabilistic model
• Applications for IR
    – Multimedia IR
    – Web search
    – Digital libraries
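To make the vector space model listed above concrete, here is a minimal sketch that ranks documents by cosine similarity between tf-idf vectors. It assumes one common weighting (raw term frequency times log(N/df)); other variants exist, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one sparse tf-idf vector (a dict) per document, plus the idf table."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def search(query, docs):
    """Return (score, doc position) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    query_tf = Counter(query.lower().split())
    # Query terms that never occur in the collection are ignored.
    query_vec = {t: tf * idf[t] for t, tf in query_tf.items() if t in idf}
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, key=lambda pair: (-pair[0], pair[1]))

# Hypothetical collection of three short documents.
docs = [
    "boolean retrieval with inverted index",
    "vector space model with term weighting",
    "probabilistic retrieval model",
]
for score, position in search("vector space model", docs):
    print(position, round(score, 3))
```

Unlike the Boolean model, every document receives a graded score, so a document that matches only some of the query terms is still returned, just lower in the list.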

Organization of the Textbook
• Basics in IR (focus)
    – Inverted indexes for Boolean queries (Ch. 1-5)
    – Term weighting and vector space model (Ch. 6-7)
    – Evaluation in IR (Ch. 8)
• Advanced Topics
    –   Relevance feedback (Ch. 9)
    –   XML retrieval (Ch. 10)
    –   Probabilistic IR (Ch. 11)
    –   Language models (Ch. 12)
• Machine learning in IR (useful)
    – Text classification (Ch. 13-15)
    – Document clustering (Ch. 16-18)
• Web Search
    – Web crawling and indexes (Ch. 19-20)
    – Link analysis (Ch. 21)
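Evaluation in IR usually starts from precision and recall over a set of retrieved documents. The following is a minimal sketch of those two measures and their harmonic mean (F1); the function names and the document IDs in the example are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved, recall = hits / relevant (set-based)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical run: the system returned docs 1-4, but only 2, 4, and 5 are relevant.
p, r = precision_recall([1, 2, 3, 4], [2, 4, 5])
print(p, round(r, 3), round(f1(p, r), 3))
```

Here two of the four retrieved documents are relevant (precision 0.5), and two of the three relevant documents were found (recall about 0.667).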


Pointers to Other Topics
•   Cross-language IR
•   Image, video, and multimedia IR
•   Speech retrieval
•   Music retrieval
•   User interfaces
•   Parallel, distributed, and P2P IR
•   Digital libraries
•   Information science perspective
•   Logic-based approaches to IR
•   Natural language processing techniques

Tentative Schedule
• Before midterm
    –   Boolean retrieval (1 wk)
    –   Indexing (2 wks)
    –   Vector space model and evaluation (2 wks)
    –   Relevance feedback (1 wk)
    –   Probabilistic IR (2 wks)
• After midterm
    –   Text classification (1-2 wks)
    –   Document clustering (1-2 wks)
    –   Web search (2 wks)
    –   Advanced topics: CLIR, IE, … (2 wks)
    –   Term Project Presentation (3 wks)
Generic Resources
• Wikipedia page on Information Retrieval:
  http://en.wikipedia.org/wiki/Information_re
• Information Retrieval Resources:
  http://www-csli.stanford.edu/~hinrich/information-retrieval.html
Academic Resources
• Journals
    –   ACM TOIS: Transactions on Information Systems
    –   JASIST: Journal of the American Society for Information Science and Technology
    –   IP&M: Information Processing and Management
    –   IEEE TKDE: Transactions on Knowledge and Data Engineering
• Conferences
    – ACM SIGIR: International Conference on Research and Development in
      Information Retrieval
    – WWW: World Wide Web Conference
    – ACM CIKM: Conference on Information and Knowledge
      Management
    – JCDL: ACM/IEEE Joint Conference on Digital Libraries
    – ACM WSDM: International Conference on Web Search and Data
      Mining
    – TREC: Text Retrieval Conference

Thanks for Your Attention!




