Techniques of information retrieval

Tariq Hassan, SCO at Board of Revenue, Punjab
Techniques of Information Retrieval
Tariq Hassan & Sabahat
Road Map:
• What is IR?
• Why & how does it work?
• Evaluation Techniques
• Global & Local Methods
1. Relevance Feedback
2. Probabilistic Relevance Feedback
3. Indirect Relevance Feedback
4. Rocchio Algorithm
5. Linear Classifiers
6. Naïve Bayes Text Classification
Question & Discussion
What is IR? Why & How?
• Finding the information needed to satisfy a user.
• Why?
Because data comes in different formats.
• How?
Stop list
Stemming
Inverse document frequency
Word counts
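The four building blocks listed above can be sketched in a few lines of Python. This is only an illustration: the stop list, the suffix-stripping stemmer, and the sample documents are all hypothetical stand-ins, and a real system would use a full stop list and a proper stemmer such as Porter's.

```python
import math
from collections import Counter

# Hypothetical stop list; real systems use a much larger one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def crude_stem(word):
    """Strip a few common English suffixes (toy stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split on whitespace, drop stop words, then stem."""
    words = [w.strip(".,;:!?") for w in text.lower().split()]
    return [crude_stem(w) for w in words if w and w not in STOP_WORDS]

def word_counts(text):
    """Count how often each processed term occurs in one document."""
    return Counter(tokenize(text))

def inverse_document_frequency(docs):
    """idf(t) = log(N / df(t)), where df(t) counts documents containing t."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log(n / count) for term, count in df.items()}

# Hypothetical sample documents.
docs = [
    "The cats are chasing the mouse",
    "A mouse is hiding in the house",
    "Indexing of documents",
]
counts = word_counts(docs[0])
idf = inverse_document_frequency(docs)
```

Terms that occur in few documents (here the stem of "cats") get a higher IDF than terms that occur in many (here "mouse"), which is why IDF is used to down-weight common words.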
What is IR? Why & How?
IR is generally used in three scenarios:
1. Web search
2. Personal IR (text classification)
3. Enterprise level
Evaluation Techniques
• Why?
• How?
Relevant & non-relevant documents
Precision and recall
P = #(relevant items retrieved) / #(retrieved items)
R = #(relevant items retrieved) / #(relevant items)
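As a worked illustration of precision and recall, here is a minimal sketch that treats the retrieved and relevant documents as sets of IDs. The document IDs are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result list and relevance judgments.
retrieved_ids = ["d1", "d2", "d3", "d4"]
relevant_ids = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved_ids, relevant_ids)
```

Here two of the four retrieved documents are relevant (precision 2/4), and two of the five relevant documents were found (recall 2/5).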
Methods:
1. Global Methods
Query reformulation
2. Local Methods
Relative to the initial results returned for a query
Local Methods
1. Relevance Feedback
Feedback given by the user about the relevance of the
documents in the initial set of results.
2. Probabilistic Relevance Feedback
PRF is implemented by building a classifier.
3. Indirect Relevance Feedback
Works without user intervention:
1. By using user actions.
2. By using user histories or logs.
Conclusion:
Relevance Feedback
Assumption:
The user has initial knowledge
Issues:
Misspelling
Cross-language retrieval
Vocabulary mismatch
Rocchio Algorithm
Incorporates the relevance feedback
mechanism into the vector space model.
Also uses:
Cosine similarity function
Euclidean distance
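The Rocchio update moves the query vector toward the centroid of the documents judged relevant and away from the centroid of those judged non-relevant: q_new = alpha * q + beta * centroid(relevant) - gamma * centroid(non-relevant). The sketch below is one way to write this. The weights (alpha = 1.0, beta = 0.75, gamma = 0.15) are commonly quoted defaults, not values from these slides, and the three-term vectors are hypothetical.

```python
import math

def centroid(vectors, dim):
    """Component-wise mean of a list of vectors (zero vector if empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query; negative term weights are clipped to 0."""
    dim = len(query)
    c_rel = centroid(relevant, dim)
    c_non = centroid(non_relevant, dim)
    updated = [alpha * q + beta * r - gamma * n
               for q, r, n in zip(query, c_rel, c_non)]
    return [max(0.0, w) for w in updated]

def cosine_similarity(a, b):
    """Cosine of the angle between two term vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical vectors over a three-term vocabulary.
query = [1.0, 0.0, 0.0]
relevant_docs = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
non_relevant_docs = [[0.0, 0.0, 1.0]]
new_query = rocchio(query, relevant_docs, non_relevant_docs)
```

After the update the query gains weight on the second term, which appears only in the relevant documents, so its cosine similarity to those documents increases.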
Example
Outcome
• Relevance feedback plays an important
role in understanding the user's requirements.
• The Rocchio algorithm is not the best-performing
method, but it is a practical choice because of its
simplicity and good results.
• These techniques are of significant importance for
content-based systems.
Classification Problems
• Given:
– A document d
– A fixed set of categories:
Sports, Informatics, Literature, Medical, Entertainment
– A training set of documents each labeled
with its class
• Determine:
– A learning method or algorithm which will
enable us to learn a classifier
– For a test document dT we have to determine
its category
Classification Techniques
• Manual (a.k.a. Knowledge Engineering)
– typically, rule-based expert systems
• Machine Learning
– Naïve Bayesian (Probabilistic)
– Decision Trees (Decision Structures)
– Support Vector Machines (Linear Classification)
Document Representation
• Binary Representation
• Frequency Representation
• TF*IDF Representation
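The three representations differ only in what value is stored for each vocabulary term. Here is a minimal sketch, assuming a tiny hypothetical corpus of pre-tokenized documents and the plain idf(t) = log(N / df(t)) weighting (other TF*IDF variants exist).

```python
import math
from collections import Counter

# Hypothetical corpus: each document is already a list of tokens.
docs = [
    ["ball", "goal", "ball"],
    ["code", "bug"],
    ["goal", "code"],
]
vocab = sorted({term for doc in docs for term in doc})

def binary_vector(doc):
    """1 if the term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]

def frequency_vector(doc):
    """Raw count of each term in the document."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]

def tfidf_vector(doc):
    """Term frequency times log(N / document frequency)."""
    n = len(docs)
    counts = Counter(doc)
    return [
        counts[term] * math.log(n / sum(1 for d in docs if term in d))
        for term in vocab
    ]

first_binary = binary_vector(docs[0])
first_frequency = frequency_vector(docs[0])
first_tfidf = tfidf_vector(docs[0])
```

With the vocabulary sorted as `['ball', 'bug', 'code', 'goal']`, the first document becomes `[1, 0, 0, 1]` in binary form and `[2, 0, 0, 1]` in frequency form. In the TF*IDF form "ball" outweighs "goal" further because "goal" also appears in another document.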
Naïve Bayes document classification example
• Probabilistic
– Prior vs. posterior
• Bernoulli Model
– Feature vector with binary elements
• Multinomial Model
– Integers representing the frequency of words
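To make the multinomial model concrete, the sketch below trains on word counts and picks the class with the highest posterior score, computed as log prior plus the sum of log word likelihoods. Add-one (Laplace) smoothing is an assumption added here so that unseen words do not zero out a class. The training documents are hypothetical; the category names are taken from the list of categories in this deck.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(labeled_docs):
    """Estimate log priors and add-one-smoothed log likelihoods."""
    class_doc_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_doc_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    total_docs = sum(class_doc_counts.values())
    log_prior = {c: math.log(n / total_docs) for c, n in class_doc_counts.items()}
    log_likelihood = {}
    for c in class_doc_counts:
        denom = sum(term_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / denom) for t in vocab
        }
    return log_prior, log_likelihood, vocab

def classify(tokens, model):
    """Return the class with the highest posterior score; unknown words are skipped."""
    log_prior, log_likelihood, vocab = model
    scores = {
        c: log_prior[c] + sum(log_likelihood[c][t] for t in tokens if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)

# Hypothetical labeled training set.
training = [
    (["goal", "match", "team"], "Sports"),
    (["team", "win", "goal"], "Sports"),
    (["code", "compiler", "bug"], "Informatics"),
]
model = train_multinomial_nb(training)
predicted = classify(["goal", "team", "goal"], model)
```

A Bernoulli variant would instead use binary presence/absence features and also account for the words that are absent from the document.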
Classify the document
Naïve Bayes classification
• Very fast learning and testing
– Why?
• Low storage requirements
• Very good in domains with many
equally important features
• More robust to irrelevant features
than many learning methods
Linear Classification
• Documents as labeled vectors
• Documents in the same class form a
contiguous region of space
• Documents from different classes don’t
overlap (much)
• Learning a classifier: build surfaces to
delineate classes in the space
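A linear classifier assigns a class from the sign of a weighted sum, w · x + b. The sketch below learns such a surface with the perceptron rule. The perceptron is chosen here only as a simple illustration and is not a method named in these slides; the two-dimensional training points are hypothetical.

```python
def predict(weights, bias, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1

def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Adjust the hyperplane whenever a training vector is misclassified."""
    dim = len(samples[0][0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in samples:
            if predict(weights, bias, x) != y:
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
    return weights, bias

# Hypothetical labeled document vectors (label +1 or -1).
samples = [
    ([2.0, 1.0], 1),
    ([3.0, 2.0], 1),
    ([-1.0, -2.0], -1),
    ([-2.0, -1.0], -1),
]
weights, bias = train_perceptron(samples)
```

On linearly separable data like this the perceptron finds some separating hyperplane, but not necessarily the one with the widest margin.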
Support Vector Machines
• Find a linear hyperplane (decision boundary) that
will separate the data
• One possible solution: B1
• Another possible solution: B2
• Other possible solutions
• Which one is better? B1 or B2?
• How do you define better?
• Find the hyperplane that maximizes the margin
[Figure: boundaries B1 and B2 with their margin boundaries b11, b12 and b21, b22; the training points lying on the margin boundaries are the support vectors]
For boundary B1 with margin boundaries b11 and b12:
• Decision boundary: $\vec{w} \cdot \vec{x} + b = 0$
• Margin boundaries: $\vec{w} \cdot \vec{x} + b = 1$ and $\vec{w} \cdot \vec{x} + b = -1$
• Classifier: $f(\vec{x}) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} + b \ge 1 \\ -1 & \text{if } \vec{w} \cdot \vec{x} + b \le -1 \end{cases}$
• Margin: $\dfrac{2}{\|\vec{w}\|}$
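The SVM decision rule and margin can be checked numerically. The sketch below assumes the canonical form in which the margin boundaries satisfy w · x + b = ±1, so the distance between them is 2 / ||w||. The weight vector, offset, and points are hypothetical, and no training is performed.

```python
import math

def decision_value(w, b, x):
    """Signed score w . x + b of a point relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def svm_classify(w, b, x):
    """Return 1 if w.x + b >= 1, -1 if w.x + b <= -1, else None (inside the margin)."""
    value = decision_value(w, b, x)
    if value >= 1:
        return 1
    if value <= -1:
        return -1
    return None

def margin_width(w):
    """Distance between the two margin boundaries: 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical hyperplane x1 = 2 and a few points around it.
w, b = [1.0, 0.0], -2.0
points = [[3.0, 0.0], [1.0, 5.0], [5.0, 1.0]]
labels = [svm_classify(w, b, p) for p in points]

# Points whose score is exactly +1 or -1 lie on the margin boundaries.
support_vectors = [p for p in points if abs(decision_value(w, b, p)) == 1.0]
```

A larger ||w|| gives a narrower margin, so maximizing the margin amounts to minimizing ||w|| while keeping every training point on the correct side of its margin boundary.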
Bottom Line
• Which classifier do I use for a given document
classification problem?
– Answer: It depends
– How much training data is available?
– How simple/complex is the problem?
– How noisy is the data?
– How stable is the problem over time?
– For an unstable problem, it's better to use a
simple and robust classifier.