Introduction The Classical Approach The Language Modeling Approach Smoothing Techniques Relation with Classical Approach
Language Modeling for Information Retrieval
Manoj Kumar Chinnakotla
KReSIT IIT Bombay
Language Technologies for the Web
Mar 2006
Outline
1 Introduction
2 The Classical Approach
3 The Language Modeling Approach
4 Smoothing Techniques
5 Relation with Classical Approach
Probabilistic Models
The Central Problem in IR
Is this Document Relevant?
Probabilistic Models
Model the uncertainties in the problem explicitly.
Example:
Is this term relevant?
Is this document relevant?
The Random Variables
Relevance: $R \in \{0, 1\}$
Documents: $D \in \{D_1, D_2, \ldots, D_N\}$
Query: $Q \in \{\text{all possible queries}\}$
A Term: $A_i \in \{0, 1\}$
The Ranking Function
Rank documents based on the posterior probability of relevance:
$$\mathrm{Score}(D, Q) = P(R = 1 \mid D, Q) \quad (1)$$
Ranking using the following log-odds ratio is equivalent:
$$\mathrm{Score}(D, Q) = \log \frac{P(R = 1 \mid D, Q)}{P(R = 0 \mid D, Q)} \quad (2)$$
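The equivalence of (1) and (2) holds because the log-odds transform is strictly increasing. A minimal sketch, using made-up posterior values for three hypothetical documents:

```python
import math

def log_odds(p_relevant):
    # Log-odds transform of the posterior P(R=1|D,Q); strictly
    # increasing in its argument, so it preserves the ranking of Eq. (1).
    return math.log(p_relevant / (1.0 - p_relevant))

# Hypothetical posteriors P(R=1|D,Q) for three documents.
posteriors = {"d1": 0.9, "d2": 0.5, "d3": 0.2}

rank_by_posterior = sorted(posteriors, key=posteriors.get, reverse=True)
rank_by_log_odds = sorted(posteriors, key=lambda d: log_odds(posteriors[d]), reverse=True)
# The two orderings agree for any posteriors strictly between 0 and 1.
```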
Probabilistic Ranking Principle
Due to Robertson [6]
The central theorem of probabilistic IR.
Theorem
Ranking documents using the log-odds ratio of the posterior probability of relevance is optimal with respect to various retrieval measures (such as Average Precision).
The Classical Approach
Due to Robertson and Sparck Jones [7]
A generative model of relevance.
Rank documents based on the following log-odds ratio:
$$\mathrm{Score}(D, Q) = \log \frac{P(D \mid Q, R = 1)}{P(D \mid Q, R = 0)} \quad (3)$$
For a query Q, most of the collection C is irrelevant, hence
$$P(\cdot \mid Q, R = 0) \approx P(\cdot \mid Q, C) \quad (4)$$
Binary Independence Retrieval Model
Assuming term independence, we have
$$\mathrm{Score}(D, Q) = \sum_{A_i \in D} \log \frac{P(A_i \mid Q, R = 1)}{P(A_i \mid Q, C)} \quad (5)$$
Popularly known as the “Binary Independence Retrieval (BIR)” Model
Need to estimate the relevance distribution $P(\cdot \mid Q, R = 1)$
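A minimal sketch of the BIR score of Eq. (5); the probability estimates below are invented for illustration, not derived from any collection:

```python
import math

def bir_score(doc_terms, p_rel, p_coll):
    # Eq. (5): sum over query terms A_i present in the document of
    # log P(A_i|Q,R=1) / P(A_i|Q,C); non-query terms are skipped.
    return sum(math.log(p_rel[t] / p_coll[t]) for t in doc_terms if t in p_rel)

# Hypothetical estimates for the query terms "language" and "model".
p_rel = {"language": 0.6, "model": 0.5}     # P(A_i | Q, R=1)
p_coll = {"language": 0.05, "model": 0.02}  # P(A_i | Q, C)

score = bir_score({"language", "model", "the"}, p_rel, p_coll)  # "the" is ignored
```

Terms that are much more likely under the relevance distribution than under the collection contribute large positive weights, matching the intuition behind (5).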
Are we back to the Original Problem?
Estimating $P(\cdot \mid Q, R = 1)$ is equivalent to solving the original problem!
Challenge: no sample relevant documents are available initially.
Current Approaches
Choose some initial estimate for $P(w \mid Q, R = 1)$
Iteratively assume the top k retrieved documents are relevant and update the estimates
Accuracy depends on the initial separation achieved
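The iterative scheme above is the idea behind pseudo-relevance feedback. A simplified sketch: the add-one-smoothed `estimate_from` helper, the BIR-style scoring inside the loop, and the toy collection are all illustrative assumptions, not part of the original model:

```python
import math
from collections import Counter

def estimate_from(docs, vocab):
    # Add-one-smoothed MLE of P(w|Q,R=1) from the documents assumed relevant.
    counts = Counter(w for d in docs for w in d)
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}

def prf(docs, query, p_coll, k=2, rounds=2):
    # Start from a uniform estimate of P(w|Q,R=1); repeatedly assume the
    # top-k documents under a BIR-style score are relevant and re-estimate.
    vocab = set(p_coll)
    p_rel = {w: 1 / len(vocab) for w in vocab}
    for _ in range(rounds):
        ranked = sorted(
            docs,
            key=lambda d: sum(
                math.log(p_rel[w] / p_coll[w]) for w in set(d) & set(query)
            ),
            reverse=True,
        )
        p_rel = estimate_from(ranked[:k], vocab)
    return p_rel

# Toy collection: two on-topic documents and one off-topic one.
docs = [["language", "model"], ["language", "retrieval"], ["cooking", "recipes"]]
query = ["language", "model"]
p_coll = {w: 0.2 for w in ["language", "model", "retrieval", "cooking", "recipes"]}
p_rel = prf(docs, query, p_coll)  # query-related words end up with higher mass
```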
The Language Modeling Approach
Basic Idea (Ponte and Croft [5])
Assuming document D is relevant, what is the likelihood of the user choosing the current query Q to retrieve D?
Model the language of each document as a distribution over words (a unigram model)
The individual document distributions $P(w \mid D)$ are called “Language Models”
Rank documents based on the posterior probability of the document given the query:
$$P(D \mid Q) \propto P(Q \mid D) \underbrace{P(D)}_{\text{Document Prior}} \quad (6)$$
A Shift in Paradigm
Some Immediate Benefits
Allows integration of document importance through the Document Prior $P(D)$
The Document Prior could be estimated from link-analysis algorithms (PageRank, HITS)
Ease of estimation: a document is usually much larger than a query
Document Language Models $P(w \mid D)$ can be pre-computed at index time
Assuming uniform document priors, the Query Likelihood ranking function is given by
$$\mathrm{Score}(D, Q) = \prod_{w \in Q} P(w \mid D) \quad (7)$$
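Eq. (7) is usually computed in log space. A minimal sketch using the unsmoothed MLE for $P(w \mid D)$ over a toy document, which also exposes the zero-probability problem that motivates smoothing:

```python
import math
from collections import Counter

def query_log_likelihood(query, doc):
    # Eq. (7) in log space: sum_{w in Q} log P(w|D), with the
    # unsmoothed MLE P(w|D) = c(w,D) / |D|.
    counts = Counter(doc)
    score = 0.0
    for w in query:
        p = counts[w] / len(doc)
        if p == 0.0:
            return float("-inf")  # a single unseen query word zeroes the product
        score += math.log(p)
    return score

doc = "language models for information retrieval".split()
ok = query_log_likelihood(["language", "retrieval"], doc)   # finite score
bad = query_log_likelihood(["language", "ranking"], doc)    # -inf: "ranking" unseen
```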
Smoothing Techniques
Motivation
The Maximum Likelihood Estimate (MLE) of $P(w \mid D)$ is given by
$$P_{ml}(w \mid D) = \frac{c(w, D)}{\sum_{w' \in D} c(w', D)} \quad (8)$$
Since document length is limited, the MLE $P_{ml}$:
Assigns zero probability to words not observed in D
Has high variance
Solution: smooth the MLE using the collection model $P(w \mid C)$
Example: Jelinek-Mercer smoothing, which linearly interpolates the two models:
$$P_\lambda(w \mid D) = (1 - \lambda)\, P_{ml}(w \mid D) + \lambda\, P(w \mid C) \quad (9)$$
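A minimal sketch of Jelinek-Mercer smoothing, $(1 - \lambda)\,P_{ml}(w \mid D) + \lambda\,P(w \mid C)$; the toy document, collection, and $\lambda = 0.5$ are illustrative choices:

```python
from collections import Counter

def jelinek_mercer(w, doc, collection, lam=0.5):
    # Linear interpolation of the document MLE with the collection model:
    # P_lam(w|D) = (1 - lam) * P_ml(w|D) + lam * P(w|C).
    p_ml = Counter(doc)[w] / len(doc)
    p_coll = Counter(collection)[w] / len(collection)
    return (1 - lam) * p_ml + lam * p_coll

doc = ["language", "model", "language"]
collection = doc + ["retrieval", "retrieval", "model", "smoothing"]
p_unseen = jelinek_mercer("retrieval", doc, collection)  # > 0 despite being unseen in doc
```

Every word in the collection now receives non-zero probability under every document model, so the query-likelihood product of (7) no longer collapses to zero.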
Modeling of Relevance
Relation with Classical Approach
Figure: Two different factorizations of the same joint $P(D, Q \mid R)$
The two approaches are equivalent
The LM approach makes additional assumptions, which require justification
Where is Relevance?
The notion of relevance is assumed implicitly in the model
This is a problem when handling “Relevance Feedback”
Solution: Query Models or Relevance Models [3, 4]
A Relevance Model (or Query Model) is a distribution $\theta_R$ encoding the information need
Assume the query Q is a sample from the Relevance Model $\theta_R$
New divergence-based ranking function:
$$\mathrm{Score}(D) = KL(D \,\|\, \theta_R) = \sum_{w} P(w \mid D) \log \frac{P(w \mid D)}{P(w \mid \theta_R)} \quad (10)$$
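A minimal sketch of the KL-divergence score of Eq. (10), over a made-up three-word vocabulary with hypothetical smoothed distributions:

```python
import math

def kl_score(p_doc, p_rel):
    # Eq. (10): KL(D || theta_R) = sum_w P(w|D) log( P(w|D) / P(w|theta_R) ).
    # Smaller divergence = document model closer to the relevance model.
    return sum(p * math.log(p / p_rel[w]) for w, p in p_doc.items() if p > 0)

# Hypothetical smoothed distributions over a tiny vocabulary.
p_doc = {"language": 0.5, "model": 0.3, "retrieval": 0.2}
p_rel = {"language": 0.4, "model": 0.4, "retrieval": 0.2}

divergence = kl_score(p_doc, p_rel)  # non-negative; zero iff the models coincide
```

Under this scoring function, documents are ranked in increasing order of divergence from the Relevance Model.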
Implications for the LM Approach
The problem of retrieval reduces to estimating two distributions:
The Relevance Model $P(w \mid \theta_R)$
The Document Language Models $P(w \mid D)$
Offers a natural way to incorporate “Relevance Feedback”: given relevant documents, update the Relevance Model $\theta_R$
References I
Croft, W. B., and Lafferty, J.
Language Modeling for Information Retrieval.
Kluwer Academic Publishers, 2003.
Lafferty, J., and Zhai, C.
Probabilistic Relevance Models Based on Document and Query Generation.
In Language Modeling for Information Retrieval (2003), vol. 13, Kluwer International Series on IR, pp. 1–10.
Lafferty, J., and Zhai, C.
Document Language Models, Query Models, and Risk Minimization for Information Retrieval.
In SIGIR ’01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2001), ACM Press, pp. 111–119.
Lavrenko, V., and Croft, W. B.
Relevance-Based Language Models.
In SIGIR ’01: Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2001), ACM Press, pp. 120–127.
Ponte, J. M., and Croft, W. B.
A Language Modeling Approach to Information Retrieval.
In SIGIR ’98: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1998), pp. 275–281.
References II
Robertson, S. E.
The Probability Ranking Principle in IR.
In Readings in Information Retrieval (1997), pp. 281–286.
Robertson, S. E., and Sparck Jones, K.
Relevance Weighting of Search Terms.
Journal of the American Society for Information Science 27 (1976), 129–146.
Baeza-Yates, R., and Ribeiro-Neto, B.
Modern Information Retrieval.
Pearson Education, 2005.