Introduction The Classical Approach The Language Modeling Approach Smoothing Techniques Relation with Classical Approach
Language Modeling for Information Retrieval
Manoj Kumar Chinnakotla
KReSIT IIT Bombay
Language Technologies for the Web
Mar 2006
Outline
1 Introduction
2 The Classical Approach
3 The Language Modeling Approach
4 Smoothing Techniques
5 Relation with Classical Approach
Probabilistic Models
The Central Problem in IR
Is this Document Relevant?
Probabilistic Models
Model the uncertainties in the problem explicitly.
Example:
Is this term relevant?
Is this document relevant?
The Random Variables
Relevance: $R \in \{0, 1\}$
Documents: $D \in \{D_1, D_2, \ldots, D_N\}$
Query: $Q \in \{\text{all possible queries}\}$
A Term: $A_i \in \{0, 1\}$
The Ranking Function
Rank documents based on the posterior probability of relevance:
$$\mathrm{Score}(D, Q) = P(R = 1 \mid D, Q) \quad (1)$$
Ranking using the following log-odds ratio is equivalent:
$$\mathrm{Score}(D, Q) = \log \frac{P(R = 1 \mid D, Q)}{P(R = 0 \mid D, Q)} \quad (2)$$
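The equivalence of (1) and (2) holds because the log-odds transform is strictly increasing. A minimal sketch, using made-up posterior values for three hypothetical documents:

```python
import math

def log_odds(p_relevant):
    # Log-odds transform of the posterior P(R=1|D,Q); strictly
    # increasing in its argument, so it preserves the ranking of Eq. (1).
    return math.log(p_relevant / (1.0 - p_relevant))

# Hypothetical posteriors P(R=1|D,Q) for three documents.
posteriors = {"d1": 0.9, "d2": 0.5, "d3": 0.2}

rank_by_posterior = sorted(posteriors, key=posteriors.get, reverse=True)
rank_by_log_odds = sorted(posteriors, key=lambda d: log_odds(posteriors[d]), reverse=True)
# The two orderings agree for any posteriors strictly between 0 and 1.
```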
Probabilistic Ranking Principle
Due to Robertson [6]
The central theorem of probabilistic IR.
Theorem
Ranking documents using the log-odds ratio of the posterior probability of relevance is optimal with respect to various retrieval measures (such as Average Precision).
The Classical Approach
Due to Robertson and Sparck Jones [7]
A generative model of relevance.
Rank documents based on the following log-odds ratio:
$$\mathrm{Score}(D, Q) = \log \frac{P(D \mid Q, R = 1)}{P(D \mid Q, R = 0)} \quad (3)$$
For a query Q, most of the collection C is irrelevant, hence
$$P(\cdot \mid Q, R = 0) \approx P(\cdot \mid Q, C) \quad (4)$$
Binary Independence Retrieval Model
Assuming term independence, we have
$$\mathrm{Score}(D, Q) = \sum_{A_i \in D} \log \frac{P(A_i \mid Q, R = 1)}{P(A_i \mid Q, C)} \quad (5)$$
Popularly known as the “Binary Independence Retrieval (BIR)” Model
Need to estimate the relevance distribution $P(\cdot \mid Q, R = 1)$
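A minimal sketch of the BIR score of Eq. (5); the probability estimates below are invented for illustration, not derived from any collection:

```python
import math

def bir_score(doc_terms, p_rel, p_coll):
    # Eq. (5): sum over query terms A_i present in the document of
    # log P(A_i|Q,R=1) / P(A_i|Q,C); non-query terms are skipped.
    return sum(math.log(p_rel[t] / p_coll[t]) for t in doc_terms if t in p_rel)

# Hypothetical estimates for the query terms "language" and "model".
p_rel = {"language": 0.6, "model": 0.5}     # P(A_i | Q, R=1)
p_coll = {"language": 0.05, "model": 0.02}  # P(A_i | Q, C)

score = bir_score({"language", "model", "the"}, p_rel, p_coll)  # "the" is ignored
```

Terms that are much more likely under the relevance distribution than under the collection contribute large positive weights, matching the intuition behind (5).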
Are we back to the Original Problem?
Estimating $P(\cdot \mid Q, R = 1)$ is equivalent to solving the original problem!
Challenge: no sample relevant documents are available initially.
Current Approaches
Choose some initial estimate for $P(w \mid Q, R = 1)$
Iteratively assume the top k retrieved documents are relevant and update the estimates
Accuracy depends on the initial separation achieved
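The iterative scheme above is the idea behind pseudo-relevance feedback. A simplified sketch: the add-one-smoothed `estimate_from` helper, the BIR-style scoring inside the loop, and the toy collection are all illustrative assumptions, not part of the original model:

```python
import math
from collections import Counter

def estimate_from(docs, vocab):
    # Add-one-smoothed MLE of P(w|Q,R=1) from the documents assumed relevant.
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def prf(docs, query, p_coll, k=2, rounds=2):
    # Start from a uniform estimate of P(w|Q,R=1); repeatedly assume the
    # top-k documents under a BIR-style score are relevant and re-estimate.
    vocab = set(p_coll)
    p_rel = {w: 1 / len(vocab) for w in vocab}
    for _ in range(rounds):
        ranked = sorted(
            docs,
            key=lambda d: sum(
                math.log(p_rel[w] / p_coll[w]) for w in set(d) & set(query)
            ),
            reverse=True,
        )
        p_rel = estimate_from(ranked[:k], vocab)
    return p_rel

# Toy collection: two on-topic documents and one off-topic one.
docs = [["language", "model"], ["language", "retrieval"], ["cooking", "recipes"]]
query = ["language", "model"]
p_coll = {w: 0.2 for w in ["language", "model", "retrieval", "cooking", "recipes"]}
p_rel = prf(docs, query, p_coll)  # query-related words end up with higher mass
```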
The Language Modeling Approach
Basic Idea (Ponte and Croft [5])
Assuming document D is relevant, what is the likelihood of the user choosing the current query Q to retrieve D?
Model the language of each document as a distribution over words (a unigram model)
The individual document distributions $P(w \mid D)$ are called “Language Models”
Rank documents based on the posterior probability of the document given the query:
$$P(D \mid Q) \propto P(Q \mid D) \underbrace{P(D)}_{\text{Document Prior}} \quad (6)$$
A Shift in Paradigm
Some Immediate Benefits
Allows integration of document importance through the Document Prior $P(D)$
The Document Prior could be estimated from link-analysis algorithms (PageRank, HITS)
Ease of estimation: a document is usually much larger than a query
Document Language Models $P(w \mid D)$ can be pre-computed at index time
Assuming uniform document priors, the Query Likelihood ranking function is given by
$$\mathrm{Score}(D, Q) = \prod_{w \in Q} P(w \mid D) \quad (7)$$
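Eq. (7) is usually computed in log space. A minimal sketch using the unsmoothed MLE for $P(w \mid D)$ over a toy document, which also exposes the zero-probability problem that motivates smoothing:

```python
import math
from collections import Counter

def query_log_likelihood(query, doc):
    # Eq. (7) in log space: sum_{w in Q} log P(w|D), with the
    # unsmoothed MLE P(w|D) = c(w,D) / |D|.
    counts = Counter(doc)
    score = 0.0
    for w in query:
        p = counts[w] / len(doc)
        if p == 0.0:
            return float("-inf")  # a single unseen query word zeroes the product
        score += math.log(p)
    return score

doc = "language models for information retrieval".split()
ok = query_log_likelihood(["language", "retrieval"], doc)   # finite score
bad = query_log_likelihood(["language", "ranking"], doc)    # -inf: "ranking" unseen
```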
Smoothing Techniques
Motivation
The Maximum Likelihood Estimate (MLE) of $P(w \mid D)$ is given by
$$P_{ml}(w \mid D) = \frac{c(w, D)}{\sum_{w' \in D} c(w', D)} \quad (8)$$
Since document length is limited, the MLE $P_{ml}$:
Assigns zero probability to words not observed in D
Has high variance
Solution: smooth the MLE using the collection model $P(w \mid C)$
Example: Jelinek-Mercer smoothing, which linearly interpolates the two models:
$$P_\lambda(w \mid D) = (1 - \lambda)\, P_{ml}(w \mid D) + \lambda\, P(w \mid C) \quad (9)$$
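A minimal sketch of Jelinek-Mercer smoothing, $(1 - \lambda)\,P_{ml}(w \mid D) + \lambda\,P(w \mid C)$; the toy document, collection, and $\lambda = 0.5$ are illustrative choices:

```python
from collections import Counter

def jelinek_mercer(w, doc, collection, lam=0.5):
    # Linear interpolation of the document MLE with the collection model:
    # P_lam(w|D) = (1 - lam) * P_ml(w|D) + lam * P(w|C).
    p_ml = Counter(doc)[w] / len(doc)
    p_coll = Counter(collection)[w] / len(collection)
    return (1 - lam) * p_ml + lam * p_coll

doc = ["language", "model", "language"]
collection = doc + ["retrieval", "retrieval", "model", "smoothing"]
p_unseen = jelinek_mercer("retrieval", doc, collection)  # > 0 despite being unseen in doc
```

Every word in the collection now receives non-zero probability under every document model, so the query-likelihood product of (7) no longer collapses to zero.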
Modeling of Relevance
Relation with Classical Approach
Figure: Two different factorizations of the same joint $P(D, Q \mid R)$
The two approaches are equivalent
The LM approach makes additional assumptions, which require justification
Where is Relevance?
The notion of relevance is assumed implicitly in the model
This is a problem when handling “Relevance Feedback”
Solution: Query Models or Relevance Models [3, 4]
A Relevance Model (or Query Model) is a distribution $\theta_R$ encoding the information need
Assume the query Q is a sample from the Relevance Model $\theta_R$
New divergence-based ranking function:
$$\mathrm{Score}(D) = KL(D \,\|\, \theta_R) = \sum_{w} P(w \mid D) \log \frac{P(w \mid D)}{P(w \mid \theta_R)} \quad (10)$$
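A minimal sketch of the KL-divergence score of Eq. (10), over a made-up three-word vocabulary with hypothetical smoothed distributions:

```python
import math

def kl_score(p_doc, p_rel):
    # Eq. (10): KL(D || theta_R) = sum_w P(w|D) log( P(w|D) / P(w|theta_R) ).
    # Smaller divergence = document model closer to the relevance model.
    return sum(p * math.log(p / p_rel[w]) for w, p in p_doc.items() if p > 0)

# Hypothetical smoothed distributions over a tiny vocabulary.
p_doc = {"language": 0.5, "model": 0.3, "retrieval": 0.2}
p_rel = {"language": 0.4, "model": 0.4, "retrieval": 0.2}

divergence = kl_score(p_doc, p_rel)  # non-negative; zero iff the models coincide
```

Under this scoring function, documents are ranked in increasing order of divergence from the Relevance Model.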
Implications for the LM Approach
The problem of retrieval reduces to estimating two distributions:
The Relevance Model $P(w \mid \theta_R)$
The Document Language Models $P(w \mid D)$
Offers a natural way to incorporate “Relevance Feedback”: given relevant documents, update the Relevance Model $\theta_R$
References I
Croft, W. B., and Lafferty, J.
Language Modeling for Information Retrieval.
Kluwer Academic Publishers, 2003.
Lafferty, J., and Zhai, C.
Probabilistic Relevance Models Based on Document and Query Generation.
In Language Modeling for Information Retrieval (2003), vol. 13, Kluwer International Series on IR, pp. 1–10.
Lafferty, J., and Zhai, C.
Document Language Models, Query Models, and Risk Minimization for Information Retrieval.
In SIGIR ’01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2001), ACM Press, pp. 111–119.
Lavrenko, V., and Croft, W. B.
Relevance-Based Language Models.
In SIGIR ’01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2001), ACM Press, pp. 120–127.
Ponte, J. M., and Croft, W. B.
A Language Modeling Approach to Information Retrieval.
In SIGIR ’98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998), pp. 275–281.
References II
Robertson, S. E.
The Probability Ranking Principle in IR.
In Readings in Information Retrieval (1997), pp. 281–286.
Robertson, S. E., and Sparck Jones, K.
Relevance Weighting of Search Terms.
Journal of the American Society for Information Science 27 (1976), 129–146.
Baeza-Yates, R., and Ribeiro-Neto, B.
Modern Information Retrieval.
Pearson Education, 2005.