

Lecture 26b: Unsupervised Learning: Dimensionality Reduction, Embeddings, PCA, etc.

Instructor: Prof. Ganesh Ramakrishnan

October 28, 2016


Recall: Supervised Feature Selection based on Gain

S is a sample of training examples; $p_{C_i}$ is the proportion of examples with class $C_i$ in S.

Entropy measures the impurity of S:

$$H(S) \equiv \sum_{i=1}^{K} -p_{C_i} \log_2 p_{C_i}$$

Selecting the $R_{best}$ attributes: Let $R = \emptyset$, and let $Gain(S, \phi_i)$ be the expected gain due to the choice of $\phi_i$. E.g., gain based on entropy:

$$Gain(S, \phi_i) \equiv H(S) - \sum_{v \in Values(\phi_i)} \frac{|S_v|}{|S|} H(S_v)$$

Do:

1. $\phi = \operatorname{argmax}_{\phi_i \in V \setminus R} Gain(S, \phi_i)$

2. $R = R \cup \{\phi\}$

until $|R| = R_{best}$.
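A minimal Python sketch of this greedy loop, assuming discrete-valued features; the names `X`, `y`, and `r_best` are illustrative, not from the slides:

```python
import numpy as np

def entropy(y):
    """H(S) = sum_i -p_i log2 p_i over the class proportions p_i."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def gain(X, y, j):
    """Expected entropy reduction from splitting S on feature j."""
    g = entropy(y)
    for v in np.unique(X[:, j]):
        mask = X[:, j] == v
        g -= mask.mean() * entropy(y[mask])   # |S_v|/|S| * H(S_v)
    return g

def select_features(X, y, r_best):
    """Greedily add the highest-gain feature until |R| = r_best."""
    selected = []
    candidates = set(range(X.shape[1]))       # V \ R
    while len(selected) < r_best:
        phi = max(candidates, key=lambda j: gain(X, y, j))
        selected.append(phi)
        candidates.remove(phi)
    return selected
```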


Supervised Feature Subset Selection (Optional)

One can also optimally select a subset of features using Iterative Hard Thresholding [1].

Input: error function $E(w)$ with a gradient oracle to compute $\nabla_w E(w)$; sparsity level $s$; step size $\eta$.

$w^{(0)} = 0$, $t = 1$

while not converged do

1. $w^{(t+1)} = P_s\left( w^{(t)} - \eta \nabla_w E(w^{(t)}) \right)$

   // The projection $P_s(\cdot)$ keeps the $s$ highest-weighted entries of $w^{(t)} - \eta \nabla_w E(w^{(t)})$ and sets the rest to $0$.

2. $t = t + 1$

end while

Output: $w^{(t)}$

[1] https://arxiv.org/pdf/1410.5137v2.pdf
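A minimal NumPy sketch of this loop, assuming a least-squares error $E(w) = \frac{1}{2}\|Xw - y\|^2$ and a fixed iteration budget in place of a convergence test (both are illustrative assumptions, not from the slides):

```python
import numpy as np

def iht(X, y, s, eta=0.01, n_iters=500):
    """Iterative Hard Thresholding for E(w) = 0.5 * ||Xw - y||^2."""
    w = np.zeros(X.shape[1])                  # w(0) = 0
    for _ in range(n_iters):                  # "while not converged", simplified
        grad = X.T @ (X @ w - y)              # gradient oracle for this E
        u = w - eta * grad                    # gradient step
        w = np.zeros_like(u)                  # projection P_s: ...
        top = np.argsort(np.abs(u))[-s:]      # ...keep the s largest entries,
        w[top] = u[top]                       # ...set the rest to 0
    return w
```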


Recap: One Hot Encoding for Characters

With 3 characters in the vocabulary, a, b and c, what would be the best encoding to inform the network of each character occurrence?

One Hot Encoding: Give a unique key k to each character in alphanumeric order, and encode each character as a vector of vocabulary size, with a 1 for the kth element and 0 for all other elements.
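A minimal Python sketch of this scheme for the 3-character vocabulary above:

```python
def one_hot_encode(text, vocab=("a", "b", "c")):
    """Map each character to a vector with a 1 at its key position."""
    key = {ch: k for k, ch in enumerate(sorted(vocab))}  # unique key per character
    vectors = []
    for ch in text:
        v = [0] * len(vocab)
        v[key[ch]] = 1          # 1 for the kth element, 0 elsewhere
        vectors.append(v)
    return vectors

print(one_hot_encode("cab"))
# [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
```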



Encoding Words

How should we encode the words for the task of labeling drama reviews as "liked" or "not liked"?

Review 1: The drama was interesting; loved the way each scene was directed. I simply loved everything in the drama.

Review 2: I had three boring hours. Very boring to watch.

Review 3: I liked the role that was assigned to each super star. Especially loved the performance of the actor.

Review 4: Though I hate all the dramas of the director, this one was an exception with a lot of entertainment.


Encoding Words

How should we encode the words for the task of labeling drama reviews as "liked" or "not liked"? One Hot Encoding of Words.

Bag Of Words, similar to one hot encoding of characters:

Use the vocabulary of highly frequent words in the reviews.

Use the word frequency in each review instead of "1".
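A minimal Python sketch of this bag-of-words encoding, assuming naive whitespace tokenization; the vocabulary size and helper names are illustrative, not from the lecture:

```python
from collections import Counter

def build_vocab(reviews, max_vocab=10):
    """Vocabulary of the most frequent words across all reviews."""
    counts = Counter(w for r in reviews for w in r.lower().split())
    return [w for w, _ in counts.most_common(max_vocab)]

def bag_of_words(review, vocab):
    """Word-frequency vector over the vocabulary (counts instead of 1s)."""
    counts = Counter(review.lower().split())
    return [counts[w] for w in vocab]

reviews = ["I had three boring hours. Very boring to watch."]
vocab = build_vocab(reviews, max_vocab=5)
print(vocab, bag_of_words(reviews[0], vocab))
```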



(Word) Embedding: Motivation

Limitations of Bag of Words or One Hot Encoding for words:

High Dimension: In real-life scenarios, the vocabulary size can be huge.

Lacks Contextual Similarity: e.g., "liked" and "loved" are contextually similar words.


(Word) Embedding: Motivation

Dimensionality Reduction techniques.

Bag of Frequent Words: Contextual similarity is still lacking.

What happens if one passes a one-hot encoded word as both input and output to an NN?

NN Auto-encoder: The output has the same form as the input. We extract the encoded vector from the hidden layer.
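A minimal NumPy sketch of this idea: a linear one-hidden-layer auto-encoder trained by plain gradient descent to reproduce one-hot vectors, with the hidden layer serving as the embedding. The vocabulary size, embedding dimension, learning rate, and iteration count are made-up values, not from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10, 3                        # vocabulary size, embedding dimension
X = np.eye(V)                       # each one-hot word is both input and target

W_enc = rng.normal(0.0, 0.1, (V, d))   # input -> hidden (the embedding matrix)
W_dec = rng.normal(0.0, 0.1, (d, V))   # hidden -> output

eta = 0.5
for _ in range(2000):               # gradient descent on squared error
    H = X @ W_enc                   # hidden layer: the encoded vectors
    Y = H @ W_dec                   # output: reconstruction of the input
    err = (Y - X) / V
    grad_dec = H.T @ err
    grad_enc = X.T @ (err @ W_dec.T)
    W_dec -= eta * grad_dec
    W_enc -= eta * grad_enc

embeddings = X @ W_enc              # extract the hidden-layer vectors
print(embeddings.shape)             # (10, 3): a 3-d embedding per word
```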



(Word) Embedding: Motivation

After unsupervised training with a lot of online data, can a machine answer questions like: King - Man + Woman = ?

If France:Paris, then Japan:?


(Word) Embedding: Motivation

What would be the vector for Man?

King - Man + Woman = ?

If King:Man then Queen:?


(Word) Embedding: Motivation

What would be the vector for Man? [0.01, 0.98, 0.05, 0.6]'

King - Man + Woman = ? Queen (as vector subtraction and addition give nearly the same result as the vector for Queen)

If King:Man then Queen:? Woman (as the vector differences of both pairs give nearly the same results)
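A minimal NumPy sketch of answering such analogy questions by vector arithmetic plus a cosine-similarity nearest-neighbor search; the toy 4-d embedding values below are made up for illustration, not trained:

```python
import numpy as np

# Hypothetical embeddings; real ones would come from unsupervised training.
emb = {
    "king":  np.array([0.95, 0.90, 0.05, 0.60]),
    "man":   np.array([0.01, 0.98, 0.05, 0.60]),
    "woman": np.array([0.02, 0.03, 0.95, 0.61]),
    "queen": np.array([0.94, 0.02, 0.93, 0.60]),
}

def nearest(v, exclude=()):
    """Word whose embedding has the highest cosine similarity to v."""
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], v))

# King - Man + Woman = ?
v = emb["king"] - emb["man"] + emb["woman"]
print(nearest(v, exclude={"king", "man", "woman"}))   # -> "queen"
```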


(Word) Embedding

(Word) Embedding: Building a low-dimensional vector representation from a corpus of text which preserves contextual similarity.

In simple language, we want an efficient language of numbers which deep neural networks can understand, as close as possible to the way we understand words.

Training: Continuous Bag of Words Model.

Take words in one-hot encoded form. Take the top V frequent words to represent each word.

Consider the sentence "... I really liked the drama ...".

Take an N-word (say 5) window around each word and train the neural network with the context word set C as input and the central word w as output.

For the example above, use C = {"I", "really", "the", "drama"} as input and w = "liked" as output.
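A minimal Python sketch of generating such (context, target) training pairs with an N = 5 window; the tokenization and function name are illustrative:

```python
def cbow_pairs(tokens, n=5):
    """Yield (context set C, central word w) pairs for an n-word window."""
    half = n // 2
    for i, w in enumerate(tokens):
        context = tokens[max(0, i - half):i] + tokens[i + 1:i + 1 + half]
        yield context, w

tokens = "I really liked the drama".split()
for c, w in cbow_pairs(tokens):
    print(c, "->", w)
# e.g. ['I', 'really', 'the', 'drama'] -> liked
```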



(Word) Embedding: Unsupervised Training


What if we want Embeddings to be Orthogonal?

Let X be a random vector and Γ its covariance matrix.

Principal Component Analysis: Find a rotation of the original coordinate system and express X in that system, so that each new coordinate captures as much of the variability in X as can be expressed by a linear combination of the n entries of X. This has applications in data transformation, feature discovery, feature selection and so on.
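A minimal NumPy sketch of PCA via the eigendecomposition of the sample covariance matrix; the data and dimensions are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))              # 200 samples of a 5-d random vector

Xc = X - X.mean(axis=0)                    # center the data
Gamma = np.cov(Xc, rowvar=False)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(Gamma)   # symmetric eigendecomposition

order = np.argsort(eigvals)[::-1]          # sort by decreasing variance
E = eigvecs[:, order]                      # rotation: columns e_1, ..., e_n

Z = Xc @ E                                 # principal components e_i^T X per sample
print(np.round(np.cov(Z, rowvar=False), 2))  # ~diagonal: components are uncorrelated
```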


Embeddings as Generalization of PCA

Let X be a random vector and Γ its covariance matrix. Let $e_1, \ldots, e_n$ be the n (normalized) eigenvectors of Γ.

Then the principal components of X are said to be $e_1^T X, e_2^T X, \ldots, e_n^T X$.

1. Let $p(X_1) = \mathcal{N}(0,1)$ and $p(X_2) = \mathcal{N}(0,1)$ and $\mathrm{cov}(X_1, X_2) = \theta$. Find all the principal components of the random vector $X = [X_1, X_2]^T$. [Tutorial 10]

2. Now, let $Y = \mathcal{N}(0, \Sigma) \in \Re^p$ where $\Sigma = \lambda^2 I_{p \times p} + \alpha^2\,\mathrm{ones}(p,p)$ for any $\lambda, \alpha \in \Re$. Here, $I_{p \times p}$ is a $p \times p$ identity matrix while $\mathrm{ones}(p,p)$ is a $p \times p$ matrix of 1's. Find at least one principal component of Y. [Tutorial 10]
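As a hedged worked sketch of exercise 1 (the derivation itself is Tutorial 10 material), assuming $\theta \ge 0$: the covariance matrix and its normalized eigenpairs are

```latex
\Gamma = \begin{pmatrix} 1 & \theta \\ \theta & 1 \end{pmatrix}, \qquad
e_1 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \;(\lambda_1 = 1 + \theta), \qquad
e_2 = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix} \;(\lambda_2 = 1 - \theta),
```

so the principal components are $e_1^T X = (X_1 + X_2)/\sqrt{2}$ and $e_2^T X = (X_1 - X_2)/\sqrt{2}$; for $\theta < 0$ their order swaps.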
