(1)

CS623: Introduction to

Computing with Neural Nets (lecture-5)

Pushpak Bhattacharyya

Computer Science and Engineering Department

IIT Bombay

(2)

Backpropagation algorithm

• Fully connected feed forward network

• Pure FF network (no jumping of connections over layers)

[Figure: a fully connected layered network — an input layer with n input neurons, one or more hidden layers, and an output layer with m output neurons; w_ji is the weight on the connection from neuron i to neuron j of the next layer.]

(3)

Gradient Descent Equations

Δw_ji = −η ∂E/∂w_ji        (η = learning rate, 0 ≤ η ≤ 1)

∂E/∂w_ji = (∂E/∂net_j) × (∂net_j/∂w_ji)        (net_j = input at the jth layer)
         = (∂E/∂net_j) × o_i

Writing δ_j = −∂E/∂net_j, the update becomes

Δw_ji = η δ_j o_i

(4)

Backpropagation – for outermost layer

δ_j = −∂E/∂net_j        (net_j = input at the jth layer)
    = −(∂E/∂o_j) × (∂o_j/∂net_j)

For the outermost layer the error is

E = (1/2) Σ_p=1..m (t_p − o_p)²        (t_p = target output of the pth output neuron)

so ∂E/∂o_j = −(t_j − o_j), and for a sigmoid unit ∂o_j/∂net_j = o_j (1 − o_j). Hence,

δ_j = (t_j − o_j) o_j (1 − o_j)

Δw_ji = η δ_j o_i = η (t_j − o_j) o_j (1 − o_j) o_i
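As a small worked example (with hypothetical numbers, not from the slides): take t_j = 1, o_j = 0.8, o_i = 0.5 and η = 0.1. Then δ_j = (1 − 0.8) × 0.8 × (1 − 0.8) = 0.032, and the weight change is Δw_ji = 0.1 × 0.032 × 0.5 = 0.0016.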

(5)

Backpropagation for hidden layers

[Figure: the same layered network (input layer with n input neurons, hidden layers, output layer with m output neurons), highlighting a hidden neuron j and a neuron k in the next layer.]

δ_k is propagated backwards to find the value of δ_j.

(6)

Backpropagation – for hidden layers

Δw_ji = η δ_j o_i

δ_j = −∂E/∂net_j = −(∂E/∂o_j) × (∂o_j/∂net_j) = −(∂E/∂o_j) × o_j (1 − o_j)

For a hidden neuron j, E depends on o_j only through the neurons k of the next layer:

∂E/∂o_j = Σ_(k ∈ next layer) (∂E/∂net_k) × (∂net_k/∂o_j) = Σ_(k ∈ next layer) (−δ_k) × w_kj

Hence,

δ_j = o_j (1 − o_j) × Σ_(k ∈ next layer) δ_k w_kj

Δw_ji = η δ_j o_i = η o_j (1 − o_j) (Σ_(k ∈ next layer) δ_k w_kj) o_i
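As a small worked example (hypothetical numbers again): let o_j = 0.6 and let the next layer contain two neurons with δ_1 = 0.05, w_1j = 0.4 and δ_2 = −0.02, w_2j = 0.7. Then Σ δ_k w_kj = 0.05 × 0.4 + (−0.02) × 0.7 = 0.006, and δ_j = 0.6 × (1 − 0.6) × 0.006 = 0.00144; this δ_j then enters Δw_ji = η δ_j o_i exactly as before.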

(7)

General Backpropagation Rule

• General weight updating rule:

Δw_ji = η δ_j o_i

• Where

δ_j = (t_j − o_j) o_j (1 − o_j)        for the outermost layer

δ_j = o_j (1 − o_j) × Σ_(k ∈ next layer) δ_k w_kj        for hidden layers

(8)

How does it work?

• Input propagation forward and error propagation backward (e.g. XOR)

[Figure: a feedforward network computing XOR as x1x̄2 + x̄1x2. The output unit has threshold 0.5 and weights w1 = w2 = 1 from two hidden units detecting x1x̄2 and x̄1x2; the hidden units are shown with weights 1.5 and −1 on the connections from inputs x1 and x2.]
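The sketch below is an illustrative implementation (not from the lecture) of the general rule on the XOR example: a 2-2-1 network of sigmoid units, updated with the two δ formulas from slide (7). The network shape, learning rate, random initialization, and batching of the four patterns are assumptions made for the sketch.

```python
# Illustrative sketch: general backpropagation rule on XOR with a 2-2-1 sigmoid network.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # inputs x1, x2
T = np.array([[0], [1], [1], [0]], dtype=float)               # XOR targets t_j

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# weights (and biases, treated as extra weights) chosen randomly for the sketch
W1 = rng.normal(size=(2, 2)); b1 = np.zeros(2)
W2 = rng.normal(size=(2, 1)); b2 = np.zeros(1)
eta = 0.5                                                     # learning rate (assumed)

for epoch in range(20000):
    # forward pass: input propagation
    h = sigmoid(X @ W1 + b1)          # hidden outputs o_j
    y = sigmoid(h @ W2 + b2)          # outermost-layer outputs

    # backward pass: error propagation
    delta_out = (T - y) * y * (1 - y)               # (t_j - o_j) o_j (1 - o_j)
    delta_hid = h * (1 - h) * (delta_out @ W2.T)    # o_j (1 - o_j) * sum_k delta_k w_kj

    # weight updates: delta_w_ji = eta * delta_j * o_i (summed over the 4 patterns)
    W2 += eta * h.T @ delta_out;  b2 += eta * delta_out.sum(axis=0)
    W1 += eta * X.T @ delta_hid;  b1 += eta * delta_hid.sum(axis=0)

print(np.round(y, 2))   # usually close to [[0], [1], [1], [0]]; a run can also get
                        # stuck in a local minimum, as the following slides discuss
```

Per-pattern (online) updates, as in the rule stated on slide (7), work just as well here; the batched form is only a compact way to process the four XOR patterns at once.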

(9)

Issues in the training algorithm

• The algorithm is greedy: it always changes the weights so that E decreases.

• The algorithm may get stuck in a local minimum.

(10)

Local Minima

Due to the greedy nature of BP, it can get stuck in a local minimum m and will never be able to reach the global minimum g, as the error can only decrease by weight change.

[Figure: error surface with m = local minimum and g = global minimum, illustrating getting stuck in a local minimum.]

(11)

Reasons for no progress in training

1. Stuck in local minimum.

2. Network paralysis. (Highly negative or positive inputs make the neurons saturate.)

3. η (the learning rate) is too small.

(12)

Diagnostics in action (1)

1) If stuck in local minimum, try the following:

– Re-initialize the weight vector.

– Increase the learning rate.

– Introduce more neurons in the hidden layer.

(13)

Diagnostics in action (1) contd.

2) If it is network paralysis, then increase the number of neurons in the hidden layer.

Problem: How to configure the hidden layer?

Known: One hidden layer seems to be sufficient. [Kolmogorov (1960’s)]

(14)

Diagnostics in action (2)

Kolmogorov's statement:

More hidden layers reduce the size of individual layers.

(15)

Diagnostics in action (3)

3) Observe the outputs: If they are close to 0 or 1, try the following:

1. Scale the inputs or divide by a normalizing factor.

2. Change the shape and size of the sigmoid.

(16)

Answers to Quiz-1

• Q1: Show that of the 256 Boolean functions of 3 variables, only half are computable by a threshold perceptron

• Ans: The characteristic equation for 3 variables is W1X1 + W2X2 + W3X3 = θ … (E)

The 8 Boolean value combinations, when inserted in (E), produce 8 hyperplanes passing through the origin in the <W1, W2, W3, θ> space.

(17)

Q1 (contd)

The maximum number of functions computable by this perceptron is the number of regions produced by the intersection of these 8 hyperplanes in the 4-dimensional space. Using the recurrence R_m,n = R_(m−1),n + R_(m−1),(n−1),

R_8,4 = R_7,4 + R_7,3 … (1)

with boundary conditions R_1,n = 2 (for n = 1, …, 4) and R_m,2 = 2m.

(18)

[Figure: recursion tree for R_8,4. Each non-root node (m, n) has two children, (m−1, n) and (m−1, n−1), as per the recurrence relation, and the value of R_m,n is stored beside the node (m, n). The leaves take the boundary values R_1,n = 2 and R_m,2 = 2m; working upwards gives R_7,3 = 44 and R_7,4 = 84.]

Answer: R_8,4 = 84 + 44 = 128, i.e. exactly half of the 256 Boolean functions of 3 variables are computable.
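A short computational check (illustrative code, not part of the lecture) of the recurrence and boundary values above:

```python
# Evaluate R(m, n) = R(m-1, n) + R(m-1, n-1) with the boundary values used above.
from functools import lru_cache

@lru_cache(maxsize=None)
def R(m: int, n: int) -> int:
    if m == 1:            # a single hyperplane splits the space into 2 regions
        return 2
    if n == 2:            # m lines through the origin split the plane into 2m regions
        return 2 * m
    return R(m - 1, n) + R(m - 1, n - 1)

print(R(8, 4))            # 128 = 256 / 2
```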

(19)

Answer to Quiz1 (contd)

• Q2. Prove whether a perceptron with sin(x) as its input-output relation can compute X-OR.

• Ans:

[Figure: a single perceptron with input weights W1 and W2 and output y; an output y > yu is read as 1 and y < yl as 0.]

(20)

Q2 (contd)

With θ denoting the perceptron's threshold (bias) term:

Input <0,0>: y < yl ⇒ sin(θ) < yl … (1)

Input <0,1>: y > yu ⇒ sin(W1 + θ) > yu … (2)

Input <1,0>: y > yu ⇒ sin(W2 + θ) > yu … (3)

Input <1,1>: y < yl ⇒ sin(W1 + W2 + θ) < yl … (4)

(21)

Q2 (contd)

Taking

yl = 0.1, yu = 0.9, W1 = (π/2) = W2, θ = 0

we can see that the perceptron can compute X-OR.
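A quick numerical check (illustrative, not from the lecture) of conditions (1)–(4) with these values:

```python
# Verify conditions (1)-(4) for yl = 0.1, yu = 0.9, W1 = W2 = pi/2, theta = 0.
import math

yl, yu = 0.1, 0.9
W1 = W2 = math.pi / 2
theta = 0.0

print(math.sin(theta) < yl)               # (1) input <0,0>: sin(0)    = 0 < 0.1
print(math.sin(W1 + theta) > yu)          # (2) input <0,1>: sin(pi/2) = 1 > 0.9
print(math.sin(W2 + theta) > yu)          # (3) input <1,0>: sin(pi/2) = 1 > 0.9
print(math.sin(W1 + W2 + theta) < yl)     # (4) input <1,1>: sin(pi)   ~ 0 < 0.1
```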

(22)

Answer to Quiz-1 (contd)

• Q3: If, in the perceptron training algorithm, the failed vector is again chosen by the algorithm, will there be any problem?

• Ans:

In PTA,

W_n = W_(n−1) + X_fail

After this, X_fail is chosen again for testing and is added again if it fails. This continues until W_k · X_fail > 0. Will this terminate?

(23)

Q3 (contd)

It will, because:

W_n = W_(n−1) + X_fail
W_(n−1) = W_(n−2) + X_fail
. . .
W_n = W_0 + n · X_fail

Therefore, W_n · X_fail = W_0 · X_fail + n · (X_fail)²

The term n · (X_fail)² is positive and grows with n, so it will overtake the (possibly negative) term W_0 · X_fail after some iterations. Hence “no problem” is the answer.
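A tiny simulation (with made-up numbers) of this argument: repeatedly adding the same failed vector drives W · X_fail positive after finitely many steps.

```python
# W_n . x_fail = W_0 . x_fail + n * ||x_fail||^2, so the loop below must terminate.
import numpy as np

W = np.array([-5.0, 2.0, -3.0])       # an assumed initial weight vector W0
x_fail = np.array([1.0, 0.5, 1.0])    # an assumed failed vector: W0 . x_fail <= 0

n = 0
while W @ x_fail <= 0:
    W = W + x_fail                    # PTA update with the same failed vector
    n += 1

print(n, W @ x_fail)                  # here: n = 4, W . x_fail = 2.0 > 0
```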
