Lemur Toolkit Tutorial
Introductions
Installation
Overview
Overview (part 2)
Overview
Language Modeling for IR
Estimate a multinomial probability distribution from the text.
Smooth the distribution with one estimated from the entire collection:
$P(w \mid \theta_D) = (1 - \lambda)\, P(w \mid D) + \lambda\, P(w \mid C)$
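As a rough illustration of the interpolation on this slide, the following C++ sketch mixes a document's maximum-likelihood estimate P(w|D) with the collection estimate P(w|C). The function name `smoothedProb`, its parameters, and the handling of empty documents are assumptions made for this example; they are not part of the Lemur or Indri API.

```cpp
#include <cassert>
#include <cstddef>

// Linear-interpolation smoothing of a document language model (illustrative only).
// termCountInDoc / docLength is the maximum-likelihood estimate P(w|D),
// collectionProb is P(w|C), and lambda is the weight given to the collection model.
double smoothedProb(std::size_t termCountInDoc, std::size_t docLength,
                    double collectionProb, double lambda) {
    assert(lambda >= 0.0 && lambda <= 1.0);
    // An empty document has no maximum-likelihood estimate; treat P(w|D) as 0 here.
    const double docProb = docLength == 0
        ? 0.0
        : static_cast<double>(termCountInDoc) / static_cast<double>(docLength);
    return (1.0 - lambda) * docProb + lambda * collectionProb;
}
```

For a term that occurs 2 times in a 10-term document, with P(w|C) = 0.01 and λ = 0.4, the sketch computes 0.6 × 0.2 + 0.4 × 0.01 = 0.124.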
Query Likelihood
$P(Q \mid \theta_D) = \prod_{q \in Q} P(q \mid \theta_D)$
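The query likelihood is a product of per-term probabilities, which underflows quickly for long queries, so a common approach is to sum logarithms instead. The sketch below assumes the caller has already computed a smoothed probability for each query term; `logQueryLikelihood` is a hypothetical helper, not a toolkit function.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Log query likelihood: the sum over query terms of log P(q | theta_D).
// termProbs holds the already-smoothed probability of each query term occurrence.
double logQueryLikelihood(const std::vector<double>& termProbs) {
    double logLikelihood = 0.0;
    for (double p : termProbs) {
        // Smoothing should keep every term probability strictly positive.
        assert(p > 0.0 && p <= 1.0);
        logLikelihood += std::log(p);
    }
    return logLikelihood;
}
```

Because the logarithm is monotonic, ranking documents by this sum gives the same order as ranking by the product itself.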
Kullback-Leibler Divergence
$KL(\theta_Q \,\|\, \theta_D) = \sum_{w} P(w \mid \theta_Q) \log \frac{P(w \mid \theta_Q)}{P(w \mid \theta_D)}$
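A minimal sketch of the divergence between a query model and a document model, assuming both are given as probability vectors over the same vocabulary (index i refers to the same word in both). The function name and the vector representation are assumptions for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// KL divergence between a query model and a document model over a shared vocabulary.
double klDivergence(const std::vector<double>& queryModel,
                    const std::vector<double>& docModel) {
    assert(queryModel.size() == docModel.size());
    double kl = 0.0;
    for (std::size_t i = 0; i < queryModel.size(); ++i) {
        if (queryModel[i] == 0.0) continue;  // a term with zero query probability contributes 0
        assert(docModel[i] > 0.0);           // the smoothed document model must be non-zero here
        kl += queryModel[i] * std::log(queryModel[i] / docModel[i]);
    }
    return kl;
}
```

A smaller divergence means the document model is closer to the query model, so a ranking built on this value would sort documents in ascending order of divergence.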
Inference Networks
[Figure: inference network diagram with nodes d1, d2, d3, ..., di; q1, q2, q3, ..., qn; and I]
Summary of Ranking
Other Techniques
Overview
Indexing
Two Index Formats
Indexing – Document Preparation
(*) Note: Microsoft Word and Microsoft PowerPoint documents can only be indexed on a Windows-based machine, and Office must be installed.
Indexing – Document Preparation
Indexing – Parameters
Indexing – Parameters
Indexing – Parameters
Indexing – Parameters
Indexing – Parameters
Indexing – Parameters
Indexing – Parameters
Indexing anchor text
Retrieval
Retrieval – Parameters
Retrieval – Parameters
Retrieval – Parameters
Retrieval – Parameters
Retrieval – Parameters
Retrieval – Query Formatting
Retrieval – Parameters
Retrieval – Parameters
Retrieval – Parameters
Retrieval – Parameters
Retrieval – Parameters
Outputting INEX Result Format
Retrieval – Interpreting Results
Retrieval – Interpreting Results
Retrieval – Evaluation
Smoothing
Use RetEval for TF.IDF
Use RetEval for TF.IDF
Overview
Indri Query Language
Term Operations
| name | example | behavior |
| --- | --- | --- |
| term | dog | occurrences of dog (Indri will stem and stop) |
| "term" | "dog" | occurrences of dog (Indri will not stem or stop) |
| ordered window | #odN(blue car) | blue N words or less before car |
| unordered window | #uwN(blue car) | blue within N words of car |
| synonym list | #syn(car automobile) | occurrences of car or automobile |
| weighted synonym | #wsyn(1.0 car 0.5 automobile) | like synonym, but counts each occurrence of automobile as 0.5 of an occurrence |
| any operator | #any:person | all occurrences of the person field |
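To make the ordered and unordered window rows concrete, here is a simplified two-term sketch that follows the wording of the table ("N words or less before", "within N words of"). It assumes each term's occurrences are available as a list of token positions; the function names are hypothetical, and Indri's own matching rules (for example, how overlapping matches are counted) may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Ordered window: counts occurrences of the first term that are followed by the
// second term within n positions (0 < q - p <= n).
std::size_t countOrdered(const std::vector<int>& first,
                         const std::vector<int>& second, int n) {
    assert(n >= 1);
    std::size_t matches = 0;
    for (int p : first) {
        for (int q : second) {
            if (q > p && q - p <= n) { ++matches; break; }  // count each first-term occurrence once
        }
    }
    return matches;
}

// Unordered window: same test with the order ignored (0 < |q - p| <= n).
std::size_t countUnordered(const std::vector<int>& first,
                           const std::vector<int>& second, int n) {
    assert(n >= 1);
    std::size_t matches = 0;
    for (int p : first) {
        for (int q : second) {
            if (q != p && std::abs(q - p) <= n) { ++matches; break; }
        }
    }
    return matches;
}
```

With blue at positions {0, 5} and car at {1, 9}, a window of 1 matches only the adjacent pair at positions 0 and 1, while a window of 4 also matches positions 5 and 9.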
Field Restriction/Evaluation
| name | example | behavior |
| --- | --- | --- |
| restriction | dog.title | counts only occurrences of dog in title field |
| restriction | dog.title,header | counts occurrences of dog in title or header |
| evaluation | dog.(title) | builds belief b(dog) using title language model |
| evaluation | dog.(title,header) | b(dog) estimated using language model from concatenation of all title and header fields |
| restriction and evaluation | #od1(trevor strohman).person(title) | builds a model from all title text for b(#od1(trevor strohman).person); only counts "trevor strohman" occurrences in person fields |
Numeric Operators
| name | example | behavior |
| --- | --- | --- |
| less | #less(year 2000) | occurrences of year field < 2000 |
| greater | #greater(year 2000) | year field > 2000 |
| between | #between(year 1990 2000) | 1990 < year field < 2000 |
| equals | #equals(year 2000) | year field = 2000 |
Belief Operations
| name | example | behavior |
| --- | --- | --- |
| combine | #combine(dog train) | 0.5 log(b(dog)) + 0.5 log(b(train)) |
| weight, wand | #weight(1.0 dog 0.5 train) | 0.67 log(b(dog)) + 0.33 log(b(train)) |
| wsum | #wsum(1.0 dog 0.5 dog.(title)) | log(0.67 b(dog) + 0.33 b(dog.(title))) |
| not | #not(dog) | log(1 - b(dog)) |
| max | #max(dog train) | returns maximum of b(dog) and b(train) |
| or | #or(dog cat) | log(1 - (1 - b(dog)) * (1 - b(cat))) |
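The belief formulas in this table can be checked numerically with a short sketch. It assumes each belief b is a value strictly between 0 and 1 and that weights are normalized to sum to 1, as the editor's notes state for Indri. The function names (`combineOp`, `weightOp`, and so on) are invented for this example and are not Indri API calls.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// #combine: every child belief gets the same weight 1/n, applied in log space.
double combineOp(const std::vector<double>& beliefs) {
    assert(!beliefs.empty());
    double score = 0.0;
    for (double b : beliefs) score += std::log(b);
    return score / static_cast<double>(beliefs.size());
}

// #weight: weights are normalized to sum to 1, then applied to the log beliefs.
double weightOp(const std::vector<double>& weights, const std::vector<double>& beliefs) {
    assert(weights.size() == beliefs.size() && !weights.empty());
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(total > 0.0);
    double score = 0.0;
    for (std::size_t i = 0; i < beliefs.size(); ++i)
        score += (weights[i] / total) * std::log(beliefs[i]);
    return score;
}

// #wsum: normalized weighted sum of the beliefs, with the log taken afterwards.
double wsumOp(const std::vector<double>& weights, const std::vector<double>& beliefs) {
    assert(weights.size() == beliefs.size() && !weights.empty());
    const double total = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(total > 0.0);
    double mix = 0.0;
    for (std::size_t i = 0; i < beliefs.size(); ++i)
        mix += (weights[i] / total) * beliefs[i];
    return std::log(mix);
}

// #not: log of the complement of the belief.
double notOp(double belief) { return std::log(1.0 - belief); }

// #or: log of the probability that at least one child belief holds.
double orOp(const std::vector<double>& beliefs) {
    double noneHold = 1.0;
    for (double b : beliefs) noneHold *= (1.0 - b);
    return std::log(1.0 - noneHold);
}
```

For weights 1.0 and 0.5, the normalized weights are 2/3 and 1/3, which is where the rounded 0.67 and 0.33 in the table come from.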
Field/Passage Retrieval
| name | example | behavior |
| --- | --- | --- |
| field retrieval | #combine[title](query) | return only title fields, ranked according to #combine(query); beliefs are estimated on each title's language model; may use any belief node |
| passage retrieval | #combine[passage200:100](query) | dynamically created passages of length 200, created every 100 words, are ranked by #combine(query) |
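The passage200:100 notation describes overlapping windows of 200 words that start every 100 words. The sketch below enumerates such windows as half-open token ranges. The `Extent` struct, the function name, and the choice to stop once a passage reaches the end of the document are assumptions for illustration; Indri's handling of the final, shorter passage may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// A passage as a half-open range of token positions: [begin, end).
struct Extent {
    std::size_t begin;
    std::size_t end;
};

// Enumerates passages of `length` tokens that start every `step` tokens.
std::vector<Extent> passageExtents(std::size_t docLength, std::size_t length,
                                   std::size_t step) {
    assert(length > 0 && step > 0);
    std::vector<Extent> passages;
    for (std::size_t begin = 0; begin < docLength; begin += step) {
        const std::size_t end = std::min(begin + length, docLength);
        passages.push_back({begin, end});
        if (end == docLength) break;  // this passage already reaches the end of the document
    }
    return passages;
}
```

For a 350-token document, `passageExtents(350, 200, 100)` yields the ranges [0, 200), [100, 300), and [200, 350).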
More Field/Passage Retrieval
| example | behavior |
| --- | --- |
| #combine[section](bootstrap #combine[./title](methodology)) | rank sections matching bootstrap where the section's title also matches methodology |
Filter Operations
| name | example | behavior |
| --- | --- | --- |
| filter require | #filreq(elvis #combine(blue shoes)) | rank documents that contain elvis by #combine(blue shoes) |
| filter reject | #filrej(shopping #combine(blue shoes)) | rank documents that do not contain shopping by #combine(blue shoes) |
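Both filter operators separate the question of which documents are eligible from the question of how they are ranked. The sketch below models that split with a hypothetical `ScoredDoc` record whose `score` stands in for the value of the inner #combine query; none of these names come from the Indri API.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// A document with its terms and a precomputed ranking score (illustrative stand-in).
struct ScoredDoc {
    int id;
    std::vector<std::string> terms;
    double score;
};

bool containsTerm(const ScoredDoc& doc, const std::string& term) {
    return std::find(doc.terms.begin(), doc.terms.end(), term) != doc.terms.end();
}

// require == true keeps documents containing `term` (filter require);
// require == false keeps documents that do not contain it (filter reject).
// The surviving documents are sorted by descending score.
std::vector<ScoredDoc> filterAndRank(std::vector<ScoredDoc> docs,
                                     const std::string& term, bool require) {
    assert(!term.empty());
    docs.erase(std::remove_if(docs.begin(), docs.end(),
                              [&](const ScoredDoc& d) {
                                  return containsTerm(d, term) != require;
                              }),
               docs.end());
    std::sort(docs.begin(), docs.end(),
              [](const ScoredDoc& a, const ScoredDoc& b) { return a.score > b.score; });
    return docs;
}
```

Note that the filter term decides only eligibility in this sketch; it does not contribute to the score used for ordering.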
Document Priors
| name | example |
| --- | --- |
| prior | #combine(#prior(RECENT) global warming) |
Ad Hoc Retrieval
Query Expansion
Known Entity Search
Overview
Overview (part 2)
Indexing Your Data
Indexing Text Corpora
Indexing: Offset Annotation
docno type id name start length value parent
Indexing Text Corpora
Indexing Text Corpora
Overview (part 2)
ParsedDocument
struct ParsedDocument {
  const char* text;
  size_t textLength;
  indri::utility::greedy_vector<char*> terms;
  indri::utility::greedy_vector<indri::parse::TagExtent*> tags;
  indri::utility::greedy_vector<indri::parse::TermExtent> positions;
  indri::utility::greedy_vector<indri::parse::MetadataPair> metadata;
};
ParsedDocument: Text
ParsedDocument: Content
ParsedDocument: Terms
ParsedDocument: Terms
ParsedDocument: Tags
ParsedDocument: Tags
ParsedDocument: Tags
ParsedDocument: Tags
<doc>
  <par>
    <sent> My dog still has fleas. </sent>
    <sent> My cat does not have fleas. </sent>
  </par>
</doc>
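One way to picture how nested tags relate to term positions is to walk the example document and record, for each tag, the range of terms it covers. The sketch below uses simplified stand-in types (`SimpleTagExtent`, `SimpleParse`) and a whitespace tokenizer, all invented for this illustration. It assumes well-nested tags separated from the text by spaces and is not Indri's parser or its actual `TagExtent` layout.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// A tag name with the half-open range of term positions it covers: [begin, end).
struct SimpleTagExtent {
    std::string name;
    std::size_t begin;
    std::size_t end;
};

struct SimpleParse {
    std::vector<std::string> terms;
    std::vector<SimpleTagExtent> tags;  // in order of opening tag
};

SimpleParse parseTagged(const std::string& text) {
    SimpleParse result;
    std::vector<std::size_t> openTags;  // indexes into result.tags of tags not yet closed
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        if (token.size() > 2 && token[0] == '<' && token[1] == '/') {
            // Closing tag: the innermost open tag ends at the current term count.
            assert(!openTags.empty());
            result.tags[openTags.back()].end = result.terms.size();
            openTags.pop_back();
        } else if (token.size() > 1 && token[0] == '<') {
            // Opening tag: strip the angle brackets and start an extent here.
            openTags.push_back(result.tags.size());
            result.tags.push_back({token.substr(1, token.size() - 2),
                                   result.terms.size(), result.terms.size()});
        } else {
            result.terms.push_back(token);
        }
    }
    assert(openTags.empty());  // every opened tag was closed
    return result;
}
```

Applied to the example document, this produces 11 terms and four extents: doc and par both covering [0, 11), the first sent covering [0, 5), and the second sent covering [5, 11).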
ParsedDocument: Tags
ParsedDocument: Tags
ParsedDocument: Metadata
greedy_vector<indri::parse::MetadataPair> metadata
Overview (part 2)
Tag Conflation
Indexing Fields
Indexing Fields
Overview (part 2)
dumpindex
dumpindex
word term_count doc_count
dumpindex
document, count, positions
term, stem, count, total_count
dumpindex
dumpindex
document, weight, begin, end
Overview (part 2)
Introducing the API
Indri: IndexEnvironment
Indri: IndexEnvironment
Indri: QueryEnvironment
QueryEnvironment: Opening
QueryEnvironment: Running
QueryEnvironment: Retrieving
Lemur “Classic” API
Lemur Index Browsing
Lemur Index Browsing
Lemur Index Browsing
Lemur Index Browsing
Lemur Index Browsing
Lemur: DocInfoList
Lemur: TermInfoList
Lemur Retrieval
| Class Name | Description |
| --- | --- |
| TFIDFRetMethod | BM25 |
| SimpleKLRetMethod | KL-Divergence |
| InQueryRetMethod | Simplified InQuery |
| CosSimRetMethod | Cosine |
| CORIRetMethod | CORI |
| OkapiRetMethod | Okapi |
| IndriRetMethod | Indri (wraps QueryEnvironment) |
Lemur Retrieval
Lemur: Other tasks
Getting Help
Concluding: In Review
Concluding: In Review
Questions
Ask us questions!
  • What is the best way to do x?
  • How do I get started with my particular task?
  • Does the toolkit have the x feature?
  • How can I modify the toolkit to do x?
  • When do we get coffee?

Lemur Tutorial at SIGIR 2006


Editor's notes

  1. More information can be found at: http://www.lemurproject.org/tutorials/begin_indexing-1.html
  2. Blue text indicates settings specific to Indri applications. Refer to online documentation for other Lemur application parameters.
  3. Could also work with plain text documents
  4. Put link into web page for document classes
  5. Run a simple query now from the command line
  6. Command-line option: -runID=runName
  7. Run canned queries now! Evaluate using trec_eval
  8. <parameters> <docFormat>web</docFormat> <outputFile>queries.reteval</outputFile> <stemmer>krovetz</stemmer> </parameters>
  9. <parameters> <index>index</index> <retModel>0</retModel> <textQuery>queries.reteval</textQuery> <resultCount>1000</resultCount> <resultFile>tfidf.res</resultFile> </parameters>
  10. Indri normalizes weights to sum to 1.
  11. Needs a better example; mention the field hierarchy at index time.