Information Retrieval as Statistical Translation
ADAM BERGER & JOHN LAFFERTY 1999
Bhavesh Singh
2010cs50281
OUTLINE
• INTRODUCTION
• MODEL OF QUERY GENERATION
• PREVIOUS WORK USING 2-POISSON MODEL
• STATISTICAL TRANSLATION
• MODELS OF DOCUMENT-QUERY TRANSLATION
• WORD-FOR-WORD TRANSLATION
• EXPERIMENTAL RESULTS
• CRITIQUE
INTRODUCTION
• Information Retrieval (IR): Obtaining information resources relevant to an information need from a
collection of information resources (documents).
• Predicting relevance is the central goal of IR.
• The paper proposes a new probabilistic approach to IR based on the ideas and methods of statistical machine translation.
• Model: Medium between data and understanding.
• Ultimately, document retrieval systems must be sophisticated enough to handle polysemy (one word with several meanings) and synonymy (several words with the same meaning).
INTRODUCTION (…cont.)
SOME BASIC TERMINOLOGIES
PRECISION is the fraction of the documents retrieved that are relevant to the user's information
need.

RECALL is the fraction of the documents that are relevant to the query that are successfully
retrieved.

There is typically an inverse relation between precision and recall: improving one tends to lower the other.
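
The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function name and the document ids are hypothetical and do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents in the whole collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d7", "d8", "d9"],
)
```

Retrieving more documents can only keep recall the same or raise it, but it usually admits more non-relevant documents and so lowers precision.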
MODEL OF QUERY GENERATION
• The user ‘U’ has an information need ‘I’.
• From this need, the user generates an ideal document ‘d’.
• Ideal document: a perfect fit for the user, but almost certainly not present in the retrieval system’s collection of documents.
• The user selects a set of key terms from ‘d’ and generates a query ‘q’ from this set.

In this setting, the task of a retrieval system is to find those documents most similar to ‘d’.
The Retrieval System’s task
The task is to find the most likely documents given the query; that is, those ‘d’ for which p(d | q, U) is highest. By Bayes’ law:

p(d | q, U) = p(q | d, U) p(d | U) / p(q | U)

Since the denominator p(q | U) is fixed for a given query and user, we can ignore it for the purpose of ranking documents, and define the relevance of a document to a query as the numerator:

p(q | d, U) p(d | U)
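
As a rough sketch of this ranking rule, the snippet below orders documents by the product p(q | d, U) p(d | U), leaving out the denominator p(q | U) because it is the same for every document. The function name and all probability values are hypothetical; they are assumed to be strictly positive so that logarithms can be taken.

```python
import math


def rank_documents(query_likelihood, prior):
    """Rank document ids by p(q | d, U) * p(d | U), highest first.

    query_likelihood: dict doc_id -> p(q | d, U)  (hypothetical values)
    prior:            dict doc_id -> p(d | U)     (hypothetical values)
    """
    # Work in log space to avoid underflow when probabilities are tiny.
    scores = {
        d: math.log(query_likelihood[d]) + math.log(prior[d])
        for d in query_likelihood
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_documents(
    {"d1": 0.02, "d2": 0.05, "d3": 0.01},
    {"d1": 0.5, "d2": 0.1, "d3": 0.4},
)
```

In this toy example d2 has the highest query likelihood, yet d1 ranks first because its prior is larger.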
2-POISSON MODEL (PREVIOUS WORK)
The 2-Poisson model is a mixture, that is, a linear combination, of two Poisson distributions:

Here Et is the elite set of term t: the few documents in which t occurs more densely and non-randomly.
In the context of IR, the 2-Poisson model describes the probability distribution of the frequency X of a term in a collection of documents.
The effectiveness of the 2-Poisson model for document retrieval was never tested, for two reasons. First, learning the three parameters for each term (the mixing weight and the two Poisson means) with the Expectation Maximization (EM) algorithm is expensive, and large collections generally contain millions of terms. Second, the model does not take document size into account, so it would have to be extended to normalize for different document lengths.
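
A minimal sketch of such a mixture, assuming the standard form P(X = x) = pi * Poisson(x; lam_elite) + (1 - pi) * Poisson(x; lam_rest). The parameter names pi, lam_elite and lam_rest are chosen here for illustration and are not the slide's notation; the numeric values are made up.

```python
import math


def poisson_pmf(x, lam):
    """Probability of observing count x under a Poisson with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def two_poisson_pmf(x, pi, lam_elite, lam_rest):
    """Mixture of two Poissons.

    pi:        assumed probability that a document is in the elite set
    lam_elite: assumed mean term frequency in elite documents
    lam_rest:  assumed mean term frequency in all other documents
    """
    return pi * poisson_pmf(x, lam_elite) + (1.0 - pi) * poisson_pmf(x, lam_rest)


# Sanity check: the mixture is still a probability distribution.
total = sum(two_poisson_pmf(x, 0.2, 5.0, 0.3) for x in range(60))
```

The three numbers pi, lam_elite and lam_rest are the three per-term parameters that would have to be learned.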
STATISTICAL MACHINE TRANSLATION
Automatic translation by computer was first contemplated by Warren Weaver when modern
computers were in their infancy.
The central problem of statistical MT is to build a system that automatically learns how to
translate text, by inspecting a large set of sentences in one language along with their
translations into another language.
Let t(f | e) denote the translation probability that an English word ‘e’ translates to a French word ‘f’.
STATISTICAL MT (..cont.)
The probability that an English sentence e = {e_1, e_2, …} translates to a French sentence f = {f_1, f_2, …} is calculated as

p(f | e) = γ Σ_a Π_j t(f_j | e_{a_j})

where γ is a normalizing factor. The hidden variable in this model is the alignment a between the French and English words: a_j = k means that the kth English word translates to the jth French word.
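
The sum over alignments can be illustrated with a small sketch. It computes the sum over all alignments of the product of word translation probabilities in two ways: by brute-force enumeration, and with the usual factorisation (sum over alignments of a product equals a product of per-word sums). The normalizing factor is left out, and the translation table is a hypothetical toy example.

```python
from itertools import product
from math import prod


def alignment_sum_brute_force(f_words, e_words, t):
    """Sum over every alignment a of prod_j t(f_j | e_{a_j})."""
    total = 0.0
    # An alignment picks one English position for each French word.
    for a in product(range(len(e_words)), repeat=len(f_words)):
        total += prod(t.get((f, e_words[k]), 0.0) for f, k in zip(f_words, a))
    return total


def alignment_sum_factored(f_words, e_words, t):
    """Same quantity, computed as prod_j sum_k t(f_j | e_k)."""
    return prod(sum(t.get((f, e), 0.0) for e in e_words) for f in f_words)


# Hypothetical toy table: t[(f, e)] stands for t(f | e).
t = {
    ("la", "the"): 0.7, ("maison", "the"): 0.1,
    ("la", "house"): 0.2, ("maison", "house"): 0.8,
}
e_sent = ["the", "house"]
f_sent = ["la", "maison"]
brute = alignment_sum_brute_force(f_sent, e_sent, t)
fast = alignment_sum_factored(f_sent, e_sent, t)
```

The factored form avoids enumerating the exponentially many alignments, which is what makes this kind of model practical to evaluate.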
MODEL OF DOCUMENT-QUERY TRANSLATION
First, a word ‘w’ is chosen at random from the document d according to a distribution l(w | d), which we call the document language model.
Next, ‘w’ is translated into the word or phrase ‘q’ according to a translation model with parameters t(q | w).
Thus, the probability of choosing q as a representative of the document d is:

p(q | d) = Σ_w l(w | d) t(q | w)

Now assume that the sample size model φ(n | d) is a Poisson distribution with mean λ(d):

φ(n | d) = e^(−λ(d)) λ(d)^n / n!
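
The two-step process (pick a document word, then translate it) can be sketched as follows, taking the probability of a query term to be the sum over document words w of l(w | d) t(q | w). As a simplifying assumption, l(w | d) is estimated here as the unsmoothed relative frequency of w in the document. The function names and the translation table are hypothetical.

```python
from collections import Counter


def language_model(doc_tokens):
    """Assumed estimate of l(w | d): relative frequency of w in the document."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    return {w: c / n for w, c in counts.items()}


def term_probability(q, lm, t):
    """p(q | d) = sum over w of l(w | d) * t(q | w)."""
    return sum(p_w * t.get((q, w), 0.0) for w, p_w in lm.items())


# Hypothetical toy data: t[(q, w)] stands for t(q | w).
doc = ["car", "car", "engine", "road"]
t = {
    ("car", "car"): 0.6,
    ("automobile", "car"): 0.3,
    ("automobile", "engine"): 0.1,
}
lm = language_model(doc)
p_auto = term_probability("automobile", lm, t)
```

Note that "automobile" never occurs in the document, yet it receives non-zero probability through the translation table; this is how the model can account for synonymy.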
MODEL OF DOCUMENT-QUERY TRANSLATION (…cont.)
Under the assumption that the number of samples ‘n’ follows a Poisson distribution, the probability that a particular query q = q1, q2, …, qm is generated is given by:

This is Model 1 of document-query translation. It was inspired by IBM's statistical translation model.
The translation probabilities of Model 1 are fit with the Expectation Maximization (EM) algorithm.
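
To give a feel for what the EM fit involves, here is a minimal sketch in the style of IBM Model 1 training, applied to (document, query) pairs. It is not the paper's exact training procedure: the function name, the uniform initialisation, the fixed iteration count and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate t(q | w) by EM from (document_words, query_words) pairs.

    Returns a dict-like table where t[(q, w)] approximates t(q | w).
    """
    q_vocab = {q for _, qs in pairs for q in qs}
    # Start from a uniform distribution over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(q_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected number of times w produced q
        total = defaultdict(float)  # expected number of times w was used
        for ws, qs in pairs:
            for q in qs:
                # E-step: posterior over which document word produced q.
                norm = sum(t[(q, w)] for w in ws)
                for w in ws:
                    frac = t[(q, w)] / norm
                    count[(q, w)] += frac
                    total[w] += frac
        # M-step: renormalise the expected counts for each document word.
        t = defaultdict(
            float, {(q, w): c / total[w] for (q, w), c in count.items()}
        )
    return t


# Hypothetical toy training data: (document words, query words).
pairs = [
    (["car", "engine"], ["automobile"]),
    (["car", "road"], ["automobile", "street"]),
    (["road"], ["street"]),
]
t = train_model1(pairs)
```

Each iteration touches every (query word, document word) pair in the training data, which is why the cost of EM grows quickly with collection size.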
MODEL 0 – THE SIMPLEST CASE: WORD-FOR-WORD TRANSLATION
The simplest version of Model 1, which we will distinguish as Model 0, is one where each word ‘w’ can be translated only as itself; that is, the translation probabilities are “diagonal”:

t(q | w) = 1 if q = w, and 0 otherwise

Under this model, the query terms are chosen simply according to their frequency of occurrence
in the document.
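
A small sketch of this special case: when each word translates only to itself, the probability of a query term reduces to the document language model evaluated at that term. The sketch assumes an unsmoothed relative-frequency estimate; the function name and the toy document are hypothetical.

```python
from collections import Counter


def model0_term_probability(q, doc_tokens):
    """p(q | d) when words translate only to themselves.

    Assumes the document language model is the unsmoothed relative
    frequency of each word in the document.
    """
    return Counter(doc_tokens)[q] / len(doc_tokens)


doc = ["car", "car", "engine", "road"]
p_car = model0_term_probability("car", doc)
p_auto = model0_term_probability("automobile", doc)
```

Here a query term that does not literally occur in the document gets probability zero, which is the limitation that the translation probabilities of Model 1 are meant to address.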
The probability of a query in this case is simply:
EXPERIMENTAL RESULTS

Precision-Recall plots. The left plot compares Model 1 to Model 0 on the SDR data. The right
plot compares the same language model scored according to Model 0, demonstrating that the
approximations are very good.
CRITIQUE
The 2-Poisson model was never tested, in part because learning three parameters for each term is expensive: the Expectation Maximization algorithm needs several iterations to converge.
According to the paper, the translation probabilities of Model 1 are also fit with the EM algorithm, so this is an expensive operation as well. The efficiency of EM in Model 1 is not discussed in enough detail and deserves more elaboration.
REFERENCES
[1] “Information Retrieval as Statistical Translation” by Adam Berger and John Lafferty, 1999.

[2] “Two Poisson model” by Giambattista Amati, Fondazione ugo Bordoni.
[3] Information Retrieval as Statistical Translation by Robert Barbey.
[4] Wikipedia article on “Information Retrieval”.
THANK YOU

Information retrieval as statistical translation

  • 1. Information Retrieval as Statistical Translation ADAM BERGER & JOHN LAFFERTY 1999 Bhavesh Singh 2010cs50281
  • 2. OUTLINE • • • • • • • • INTRODUCTION MODEL OF QUERY GENERATION PREVIOUS WORK USING 2-POISSON MODEL STATISTICAL TRANSLATION MODELS OF DOCUMENT-QUERY TRANSLATION WORD-FOR-WORD TRANSLATION EXPERIMENTAL RESULTS CRITIQUE
  • 3. INTRODUCTION • Information Retrieval (IR): Obtaining information resources relevant to an information need from a collection of information resources (documents). • Predicting relevance is the central goal of IR. • A new probabilistic approach to IR based upon the ideas and methods of statistical machine translation. • Model: Medium between data and understanding. • Ultimately, document retrieval systems must be sophisticated enough to handle polysemy and synonymy.
  • 4. INTRODUCTION (…cont.) SOME BASIC TERMINOLOGIES PRECISION is the fraction of the documents retrieved that are relevant to the user's information need. RECALL is the fraction of the documents that are relevant to the query that are successfully retrieved. There is a inverse relation between precision and recall.
  • 5. MODEL OF QUERY GENERATION • The user ‘U’ has an information need ‘I’. • From this need, he generates an ideal document ‘d’. • Ideal document: a perfect fit for the user, but almost certainly not present in the retrieval system’s collection of documents. • He selects a set of key terms from ‘d’ and generates a query ‘q’ from this set. In this setting, the task of a retrieval system is to find those documents most similar to ‘d’.
  • 6.
  • 7. The Retrieval System’s task To find the most likely documents given the query; that is, those ‘d’ for which p(d | q, U) is highest. By Bayes’ law: $p(d \mid q, U) = \frac{p(q \mid d, U)\, p(d \mid U)}{p(q \mid U)}$. Because the denominator p(q | U) is fixed for a given query and user, we can ignore it for the purpose of ranking documents, and define the relevance of a document to a query as the numerator, $p(q \mid d, U)\, p(d \mid U)$.
  • 8. 2-POISSON MODEL (PREVIOUS WORK) The 2-Poisson model is a mixture, that is, a linear combination, of two Poisson distributions: $P(X = x) = \alpha \frac{e^{-\lambda_1} \lambda_1^{x}}{x!} + (1 - \alpha) \frac{e^{-\lambda_2} \lambda_2^{x}}{x!}$, where E_t is the elite set of term t (the few documents in which t occurs more densely and non-randomly), α is the probability that a document belongs to E_t, and λ_1 and λ_2 are the mean term frequencies inside and outside E_t. In the context of IR, the 2-Poisson model is used to describe the probability distribution of the frequency X of a term in a collection of documents. The effectiveness of the 2-Poisson model for document retrieval was never tested, for two reasons. First, learning the three parameters for each term with the Expectation Maximization (EM) algorithm is expensive, and large collections generally contain millions of terms. Second, the model does not take document size into account, so it would have to be extended to normalize for different document lengths.
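As a rough illustration of the mixture described on this slide, the sketch below evaluates a two-component Poisson mixture over term frequencies. The parameter names (`alpha` for the weight of the elite component, `mean_elite` and `mean_non_elite` for the two Poisson means) and the numeric values are assumptions made for the example, not values from the presentation.

```python
import math


def poisson_pmf(x, mean):
    """Probability of observing count x under a Poisson with the given mean."""
    return math.exp(-mean) * mean ** x / math.factorial(x)


def two_poisson_pmf(x, alpha, mean_elite, mean_non_elite):
    """Mixture of two Poissons: weight alpha on the elite component."""
    return (alpha * poisson_pmf(x, mean_elite)
            + (1 - alpha) * poisson_pmf(x, mean_non_elite))


# Hypothetical term: elite in 10% of documents, where it occurs about 5 times;
# elsewhere it occurs about 0.2 times on average.
probs = [two_poisson_pmf(x, 0.1, 5.0, 0.2) for x in range(60)]
```

The three values `alpha`, `mean_elite`, and `mean_non_elite` are the per-term parameters that would have to be estimated (for example with EM), which is the cost the slide refers to.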
  • 9. STATISTICAL MACHINE TRANSLATION Automatic translation by computer was first contemplated by Warren Weaver when modern computers were in their infancy. The central problem of statistical MT is to build a system that automatically learns how to translate text by inspecting a large set of sentences in one language along with their translations into another language. Let the translation probability of each English word ‘e’ translating to each French word ‘f’ be given by t(f | e).
  • 10. STATISTICAL MT (..cont.) The probability that an English sentence e = {e1, e2,…} translates to a French sentence f = {f1,f2,…} is calculated as where Gamma is a normalizing factor. The hidden variable in this model is the alignment a between the French and English words: aj = k means that the kth English word translates to the jth French word.
  • 11. MODEL OF DOCUMENT-QUERY TRANSLATION First, a word ‘w’ is chosen at random from the document d according to a distribution l(w | d) that we call the document language model. Next, ‘w’ is translated into the word or phrase ‘q’ according to a translation model with parameters t(q | w). Thus, the probability of choosing q as a representative of the document d is $p(q \mid d) = \sum_{w} l(w \mid d)\, t(q \mid w)$. Now assume that the sample size model φ(n | d) is a Poisson distribution with mean λ(d): $\phi(n \mid d) = \frac{e^{-\lambda(d)}\, \lambda(d)^{n}}{n!}$.
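The two-step process on this slide (pick a document word from l(w | d), then translate it with t(q | w)) can be sketched as a sum over document words. This is a minimal illustration, assuming a maximum-likelihood language model and a hand-written translation table; the helper names and all probabilities below are invented for the example.

```python
from collections import Counter


def document_language_model(tokens):
    """Maximum-likelihood l(w | d): relative frequency of each word in d."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def query_word_probability(q, doc_lm, translation):
    """p(q | d) = sum over w of l(w | d) * t(q | w)."""
    return sum(p_w * translation.get(w, {}).get(q, 0.0)
               for w, p_w in doc_lm.items())


doc = "the pontiff addressed the crowd".split()

# Hypothetical translation table: translation[w][q] stands for t(q | w).
translation = {
    "the": {"the": 1.0},
    "pontiff": {"pontiff": 0.6, "pope": 0.4},
    "addressed": {"addressed": 1.0},
    "crowd": {"crowd": 0.5, "audience": 0.5},
}

lm = document_language_model(doc)
p_pope = query_word_probability("pope", lm, translation)
```

Although "pope" never appears in the document, it receives non-zero probability through "pontiff", which is how the translation view addresses synonymy.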
  • 12. MODEL OF DOCUMENT-QUERY TRANSLATION (…cont.) Under the assumption that the number of samples ‘n’ follows a Poisson distribution, the probability that a particular query q = q1, q2, …, qm is generated is given by – This is Model 1 of document-query translation. It was inspired by the IBM statistical translation models. To fit the translation probabilities in Model 1, the Expectation Maximization (EM) algorithm is used.
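To give a feel for what fitting translation probabilities with EM involves, here is a minimal sketch in the style of IBM Model 1 training. It is an illustration under stated assumptions, not the authors' training procedure: the function name, the toy (query, document) pairs, the uniform initialisation, and the fixed iteration count are all choices made for this example.

```python
from collections import Counter


def train_translation_model(pairs, iterations=10):
    """Estimate t(q | w) from (query_tokens, doc_tokens) pairs with EM.

    Returns a nested dict t where t[w][q] approximates t(q | w).
    """
    query_vocab = {q for query, _ in pairs for q in query}
    uniform = 1.0 / len(query_vocab)

    # Initialise every co-occurring (w, q) pair with the same value.
    t = {}
    for query, doc in pairs:
        for w in set(doc):
            for q in query:
                t.setdefault(w, {})[q] = uniform

    for _ in range(iterations):
        counts = {}
        for query, doc in pairs:
            doc_counts = Counter(doc)
            n = len(doc)
            for q in query:
                # E-step: posterior that q was generated from each word w,
                # proportional to l(w | d) * t(q | w).
                weights = {w: t[w][q] * c / n for w, c in doc_counts.items()}
                z = sum(weights.values())
                for w, weight in weights.items():
                    row = counts.setdefault(w, {})
                    row[q] = row.get(q, 0.0) + weight / z
        # M-step: renormalise expected counts so each row sums to one.
        t = {w: {q: c / sum(row.values()) for q, c in row.items()}
             for w, row in counts.items()}
    return t


# Made-up training pairs: (query words, document words).
toy_pairs = [
    (["pope"], ["pontiff", "vatican"]),
    (["pope", "rome"], ["pontiff", "rome"]),
    (["rome"], ["rome", "city"]),
]
t_hat = train_translation_model(toy_pairs)
```

Each iteration touches every (query word, document word) pair in the training data, which gives a sense of why the per-iteration cost grows quickly on large collections.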
  • 13. MODEL 0 – THE SIMPLEST CASE: WORD-FOR-WORD TRANSLATION The simplest version of Model 1, which we will distinguish as Model 0, is one where each word ‘w’ can be translated only as itself; that is, the translation probabilities are “diagonal”: $t(q \mid w) = 1$ if $q = w$, and $0$ otherwise. Under this model, the query terms are chosen simply according to their frequency of occurrence in the document. The probability of the query in this case is simply -
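With diagonal translation probabilities, each query word is scored by its relative frequency in the document. The sketch below ranks documents by the sum of log relative frequencies of the query words. It is a simplification under stated assumptions: it leaves out the sample-size term and the document prior, applies no smoothing (so a document missing any query word scores negative infinity), and uses invented function names and documents.

```python
import math
from collections import Counter


def model0_log_score(query, doc_tokens):
    """Sum of log l(q_j | d), with l the relative frequency of q_j in d."""
    counts = Counter(doc_tokens)
    n = len(doc_tokens)
    score = 0.0
    for q in query:
        if counts[q] == 0:
            return float("-inf")  # unsmoothed: an unseen word has probability 0
        score += math.log(counts[q] / n)
    return score


# Made-up two-document collection.
docs = {
    "d1": "the pope visited rome".split(),
    "d2": "rome rome is a city".split(),
}

ranking = sorted(docs,
                 key=lambda name: model0_log_score(["rome"], docs[name]),
                 reverse=True)
```

For the one-word query "rome", the second document ranks first because the word makes up a larger share of its tokens. A query containing "pope" would eliminate that document entirely, which is the kind of vocabulary mismatch that the translation probabilities of Model 1 are meant to soften.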
  • 14. EXPERIMENTAL RESULTS Precision-Recall plots. The left plot compares Model 1 to Model 0 on the SDR data. The right plot compares the same language model scored according to Model 0, demonstrating that the approximations are very good.
  • 15. CRITIQUE One reason the 2-Poisson model was never tested is that learning three parameters for each term is expensive, since the Expectation Maximization (EM) algorithm needs several iterations to converge. According to this paper, the EM algorithm is also used to fit the translation probabilities of Model 1, so this too is an expensive operation. The efficiency of EM in Model 1 is not discussed in enough detail and deserves further elaboration.
  • 16. REFERENCES [1] Adam Berger and John Lafferty, “Information Retrieval as Statistical Translation”, 1999. [2] Giambattista Amati, “Two Poisson model”, Fondazione Ugo Bordoni. [3] Robert Barbey, “Information Retrieval as Statistical Translation”. [4] Wikipedia article on “Information Retrieval”.