Fundamentals of Information Theory

(1)

CS 621 Artificial Intelligence Lecture 12 – 30/08/05

Prof. Pushpak Bhattacharyya

Fundamentals of Information Theory

(2)

Weather (O)   Temp (T)   Humidity (H)   Windy (W)   Decision (D)
Sunny         High       High           F           N
Sunny         High       High           T           N
Cloudy        High       High           F           Y
Rain          Med        High           F           Y
Rain          Cold       Low            F           Y
Rain          Cold       Low            T           N

(3)

Weather (O)   Temp (T)   Humidity (H)   Windy (W)   Decision (D)
Sunny         Med        High           F           N
Sunny         Cold       Low            F           Y
Rain          Med        Low            F           Y
Sunny         Med        Low            T           Y
Cloudy        Med        High           T           Y

(4)

Decision tree (figure):

Outlook = Cloudy  ->  Yes
Outlook = Rain    ->  test Windy (T / F)
Outlook = Sunny   ->  test Humidity (High / Low)

(5)

Rule Base

R1: If outlook is sunny and if humidity is high then Decision is No.

R2: If outlook is sunny and if humidity is low then Decision is Yes.

R3: If outlook is cloudy then Decision is Yes.
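
As an illustration only (not from the lecture), the rule base can be applied directly in code; the Python function and the attribute/value names below are my own assumptions:

    # Apply rules R1-R3 to a single record; attribute and value names are assumed.
    def classify(record):
        if record["outlook"] == "sunny" and record["humidity"] == "high":
            return "No"      # R1
        if record["outlook"] == "sunny" and record["humidity"] == "low":
            return "Yes"     # R2
        if record["outlook"] == "cloudy":
            return "Yes"     # R3
        return None          # e.g. outlook = rain is not covered by R1-R3

    print(classify({"outlook": "sunny", "humidity": "high"}))   # -> No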

(6)

Making Sense of Information

• Classification

• Clustering

• Giving a short and simple description

(7)

Short Description

Occam's Razor principle

(Shortest/simplest description is the best for generalization)

(8)

Representation Language

• Decision tree.

• Neural network.

• Rule base.

• Boolean expression.

(9)

The example data, presented as rows of attribute values with labels, carries little order/structure compared to the succinct descriptions (the decision tree and the rule base).

Define “information”.

Measure the lack of structure in information by “entropy”.

Information & Entropy

(10)

Define Entropy of S (Labeled data)

E(S) = - ( P+ log2 P+  +  P- log2 P- )

P+ = proportion of positively labeled data.

P- = proportion of negatively labeled data.

(11)

Example

P+ = 9/14 ,  P- = 5/14

E(S) = - 9/14 log2(9/14) - 5/14 log2(5/14) ≈ 0.940
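
A small Python sketch (my own, standard library only) of this entropy calculation:

    import math

    def entropy(p_pos, p_neg):
        # Binary entropy in bits; a zero proportion contributes 0 by convention.
        return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

    print(entropy(9/14, 5/14))   # ~0.940 bits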

(12)

Partitioning the Data

Take “Windy” as the attribute; its values are Windy = [T, F].

Windy = T and Windy = F partition the data.

(13)

Partitioning by focusing on a particular attribute produces “information gain”, i.e., a “reduction in entropy”.

Partitioning the Data (Contd)

(14)

Information gain when we choose Windy = [T, F]:

Windy = T :  P+ = 6 , P- = 2
Windy = F :  P+ = 3 , P- = 3

Partitioning the Data (Contd)

(15)

Split on Windy (figure):

Windy = T  ->  6 positive, 2 negative
Windy = F  ->  3 positive, 3 negative

Partitioning the Data (Contd)

(16)

Gain(S, A) = E(S) - ∑ ( |Sv| / |S| ) E(Sv),   summing over v ∈ values of A

E(S) = 0.940

For A = Windy:

E(Windy=T) = - 6/8 log2(6/8) - 2/8 log2(2/8) ≈ 0.811

Partitioning the Data (Contd)

(17)

E(Windy=F)

= - 3/6 log2(3/6) - 3/6 log2(3/6)

= 1.0

Partitioning the Data (Contd)

(18)

Gain(S, Windy) = E(S) - ( 8/14 * E(Windy=T) + 6/14 * E(Windy=F) )

= 0.940 - ( 8/14 * 0.811 + 6/14 * 1.0 )

≈ 0.048

Exercise: Find the information gain for each attribute: Outlook, Temp, Humidity, and Windy.

Partitioning the Data (Contd)
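
A Python sketch (my own helper names, not the lecture's code) of Gain(S, A), checked on the Windy split above:

    import math

    def entropy(pos, neg):
        total = pos + neg
        return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

    def gain(pos, neg, partitions):
        # partitions: one (pos_v, neg_v) pair per value v of the attribute A
        total = pos + neg
        remainder = sum((pv + nv) / total * entropy(pv, nv) for pv, nv in partitions)
        return entropy(pos, neg) - remainder

    # Whole data: 9 positive, 5 negative; Windy = T -> (6, 2), Windy = F -> (3, 3)
    print(gain(9, 5, [(6, 2), (3, 3)]))   # ~0.048 bits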

(19)

ID3 Algorithm

Calculating the gain for every attribute and choosing the one with maximum gain at each node, finally arriving at the decision tree, is the “ID3” algorithm for building a classifier.
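
A compact ID3 sketch in Python; the data layout (one dict per example with a 'label' key) and the helper names are my own assumptions, not the lecture's:

    import math
    from collections import Counter

    def entropy(examples):
        counts = Counter(e["label"] for e in examples)
        n = len(examples)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def gain(examples, attr):
        n = len(examples)
        rem = sum(len(s) / n * entropy(s)
                  for v in set(e[attr] for e in examples)
                  for s in [[e for e in examples if e[attr] == v]])
        return entropy(examples) - rem

    def id3(examples, attributes):
        labels = set(e["label"] for e in examples)
        if len(labels) == 1:                 # pure node: make a leaf
            return labels.pop()
        if not attributes:                   # nothing left to split on: majority leaf
            return Counter(e["label"] for e in examples).most_common(1)[0][0]
        best = max(attributes, key=lambda a: gain(examples, a))
        return {best: {v: id3([e for e in examples if e[best] == v],
                              [a for a in attributes if a != best])
                       for v in set(e[best] for e in examples)}}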

(20)

Origin of Information Theory

1) C. E. Shannon, “A Mathematical Theory of Communication”, Bell System Technical Journal, 1948.

2) T. M. Cover and J. A. Thomas, “Elements of Information Theory”, 1991.

(21)

Motivation with the example of a horse race: 8 horses - h1, h2, ……, h8.

Person P would like to bet on one of the horses.

The horses have the following probabilities of winning:

Example

(22)

• h1 = 1/2 h5 = 1/64

• h2 = 1/4 h6 = 1/64

• h3 = 1/8 h7 = 1/64

• h4 = 1/16 h8 = 1/64

Example (Contd 1)

(23)

Send a message specifying the horse on which to bet.

If the situation is “unbiased”, i.e., all horses have equal probability of winning, then we need 3 binary digits (bits).

Example (Contd 2)

(24)

Compute the bias:

E(S) = - ∑ Pi log2 Pi ,   i = 1, …, 8

Pi = probability of hi winning

E(S) = - ( ½ log2 ½ + ¼ log2 ¼ + …… )

Example (Contd 3)

(25)

On average, we need no more than 2 bits to communicate the desired horse.

What is the actual length of the code?

Example (Contd 4)
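
A quick Python check (my own) of the horse-race numbers: the biased distribution needs 2 bits on average, versus 3 bits for eight equally likely horses:

    import math

    probs = [1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64]
    print(-sum(p * math.log2(p) for p in probs))   # entropy = 2.0 bits
    print(math.log2(len(probs)))                   # uniform case = 3.0 bits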

(26)

Example 2 ( Letter Guessing Game)

Letter:        p     t     k     a     i     u
Probability:   1/8   1/4   1/8   1/8   1/4   1/8

(a “20 Questions”-style game)

E(S) = - ∑ Pi log2 Pi ,   i ∈ { p, t, k, a, i, u }

(27)

On the average we need no more than 2.5 questions.

Design a code:

Letter:        p     t     k     a     i     u
Probability:   1/8   1/4   1/8   1/8   1/4   1/8
Code:          100   00    101   110   01    111

(28)

Q1) Is the letter t or i?   Q2) Is it a consonant?

Expected number of questions = ∑ Pi * Ni

where Ni = number of questions needed in situation i.
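
A short Python check (my own) that the code designed above has expected length 2.5 bits/questions, equal to the entropy of the letter distribution:

    import math

    letters = {"p": (1/8, "100"), "t": (1/4, "00"), "k": (1/8, "101"),
               "a": (1/8, "110"), "i": (1/4, "01"), "u": (1/8, "111")}

    entropy = -sum(p * math.log2(p) for p, _ in letters.values())
    expected_len = sum(p * len(code) for p, code in letters.values())
    print(entropy, expected_len)   # 2.5 2.5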

(29)

What has all this got to do with AI? Why entropy?

Why design codes?

Why communicate?

(30)

Bridge

Multiparty participation in intelligent information processing requires communication.

Information theory sets up theoretical limits on communicability.

(31)

Summary

• Haphazard presentation of data is not acceptable to the mind.

• Focusing attention on an attribute automatically leads to information gain.

• Defined entropy.

• In parallel, defined information gain.

• Related this to message communication.
