Training a Neural Network


(1)

Lecture 20 (contd.): Neural Network Training using Backpropagation, Convolutional and Recurrent Neural Networks

Instructor: Prof. Ganesh Ramakrishnan

(2)


Feed-forward Neural Nets

[Figure: a feed-forward network. Inputs $x_1, x_2, \ldots, x_n$ together with the constant input $1$ are combined through weights $w_{01}, w_{11}, w_{21}, \ldots, w_{n1}$ and $w_{02}, w_{12}, w_{22}, \ldots, w_{n2}$ into hidden units $z_1 = g(\sum)$ and $z_2 = g(\sum)$, which in turn feed the outputs $f_1 = g(\cdot)$ and $f_2 = g(\cdot)$.]


(3)

Training a Neural Network

STEP 0: Pick a network architecture

Number of input units: the dimension of the features $\phi(x^{(i)})$. Number of output units: the number of classes.

Reasonable default: 1 hidden layer; if more than 1 hidden layer, use the same number of hidden units in every layer.

Number of hidden units in each layer: a small constant factor (3 or 4) of the dimension of $x$.

We will use the smooth sigmoid function $g(s) = \frac{1}{1 + e^{-s}}$ instead of the non-smooth step function $g(s) = 1$ if $s \in [\theta, \infty)$ and $g(s) = 0$ otherwise. We have already learnt how to train a single-node sigmoidal (logistic regression) neural network.
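For concreteness, a minimal sketch of the two activations in Python (not from the slides; the threshold theta defaults to 0 here as an arbitrary example value):

import numpy as np

def g_step(s, theta=0.0):
    # Non-smooth step activation: 1 when s >= theta, 0 otherwise.
    return np.where(s >= theta, 1.0, 0.0)

def g_sigmoid(s):
    # Smooth sigmoid activation: differentiable, so gradients can flow through it.
    return 1.0 / (1.0 + np.exp(-s))

print(g_step(np.array([-1.0, 0.5])))     # [0. 1.]
print(g_sigmoid(np.array([-1.0, 0.5])))  # approximately [0.269 0.622]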

(4)


High Level Overview of Backpropagation Algorithm for Training NN

1. Randomly initialize weights $w^l_{ij}$ for $l = 1, \ldots, L$, $i = 1, \ldots, s_l$, $j = 1, \ldots, s_{l+1}$.

2. Implement forward propagation to get $f_w(x)$ for any $x \in \mathcal{D}$.

3. Execute backpropagation
   1. by computing the partial derivatives $\frac{\partial E(w)}{\partial w^l_{ij}}$ for $l = 1, \ldots, L$, $i = 1, \ldots, s_l$, $j = 1, \ldots, s_{l+1}$,
   2. and using gradient descent to try to minimize the (non-convex) $E(w)$ as a function of the parameters $w$:
      $w^l_{ij} = w^l_{ij} - \eta \, \frac{\partial E(w)}{\partial w^l_{ij}}$

4. Verify that the cost function $E(w)$ has indeed reduced; else resort to some random perturbation of the weights $w$.
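A minimal sketch of step 2 (forward propagation), assuming each weight matrix weights[l] has shape (s_l + 1, s_{l+1}) with the bias weights in row 0; the names and layout are illustrative assumptions, not part of the lecture:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward_propagation(x, weights):
    """Compute f_w(x) for one example x.

    weights[l] has shape (s_l + 1, s_{l+1}); row 0 holds the bias weights.
    Returns the activations of every layer (backpropagation needs all of them).
    """
    activations = [np.asarray(x, dtype=float)]
    for W in weights:
        with_bias = np.concatenate(([1.0], activations[-1]))  # prepend the constant 1
        activations.append(sigmoid(with_bias @ W))            # sum_j, then g(sum_j)
    return activations

# f_w(x) is activations[-1]; a class prediction is, e.g., np.argmax(activations[-1]).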


(5)

The Backpropagation Algorithm

(6)


Setting Notation for Backpropagation

[Figure: notation. Units $\sigma^{l-1}_1, \sigma^{l-1}_2, \ldots, \sigma^{l-1}_i, \ldots, \sigma^{l-1}_{s_{l-1}}$ of the $(l-1)$-th layer feed node $j$ of layer $l$ through the weights $w^l_{1j}, w^l_{2j}, \ldots, w^l_{ij}, \ldots, w^l_{s_{l-1} j}$ (and node $s_l$ through $w^l_{1 s_l}, \ldots, w^l_{s_{l-1} s_l}$). Node $j$ computes $\sigma^l_j = \sigma(\mathrm{sum}^l_j)$, node $s_l$ computes $\sigma^l_{s_l} = \sigma(\mathrm{sum}^l_{s_l})$, and the final layer $L$ produces the outputs $\sigma^L_1, \ldots, \sigma^L_K$.]


(7)

Gradient Computation

The Neural Network objective to be minimized:

E(w) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} \Big[ y^{(i)}_k \log\big(\sigma^L_k(x^{(i)})\big) + \big(1 - y^{(i)}_k\big) \log\big(1 - \sigma^L_k(x^{(i)})\big) \Big] + \frac{\lambda}{2m} \sum_{l=1}^{L} \sum_{i=1}^{s_{l-1}} \sum_{j=1}^{s_l} \big(w^l_{ij}\big)^2 \qquad (1)

where

\mathrm{sum}^l_j = \sum_{k=1}^{s_{l-1}} w^l_{kj} \sigma^{l-1}_k \quad \text{and} \quad \sigma^l_i = \frac{1}{1 + e^{-\mathrm{sum}^l_i}}

By the chain rule,

\frac{\partial E}{\partial w^l_{ij}} = \frac{\partial E}{\partial \sigma^l_j} \, \frac{\partial \sigma^l_j}{\partial \mathrm{sum}^l_j} \, \frac{\partial \mathrm{sum}^l_j}{\partial w^l_{ij}} + \frac{\lambda}{m} w^l_{ij}

\frac{\partial \sigma^l_j}{\partial \mathrm{sum}^l_j} = \left( \frac{1}{1 + e^{-\mathrm{sum}^l_j}} \right) \left( 1 - \frac{1}{1 + e^{-\mathrm{sum}^l_j}} \right) = \sigma^l_j \big(1 - \sigma^l_j\big)

\frac{\partial \mathrm{sum}^l_j}{\partial w^l_{ij}} = \frac{\partial}{\partial w^l_{ij}} \sum_{k=1}^{s_{l-1}} w^l_{kj} \sigma^{l-1}_k = \sigma^{l-1}_i
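A sketch of Equation (1) in code, assuming the outputs $\sigma^L_k(x^{(i)})$ have already been collected into an m x K array probs and the labels into a 0/1 array Y of the same shape; the helper name and the exclusion of bias weights from the regularizer are assumptions, not from the slides:

import numpy as np

def nn_cost(probs, Y, weights, lam):
    """Regularized cross-entropy cost E(w) of Equation (1).

    probs[i, k] = sigma^L_k(x^(i)) and Y[i, k] = y^(i)_k in {0, 1}.
    Bias weights (row 0 of each matrix) are left out of the regularizer.
    """
    m = Y.shape[0]
    data_term = -np.sum(Y * np.log(probs) + (1 - Y) * np.log(1 - probs)) / m
    reg_term = (lam / (2.0 * m)) * sum(np.sum(W[1:] ** 2) for W in weights)
    return data_term + reg_term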


(9)

For a single example $(x, y)$:

E(w) = -\sum_{k=1}^{K} \Big[ y_k \log\big(\sigma^L_k(x)\big) + (1 - y_k) \log\big(1 - \sigma^L_k(x)\big) \Big] + \frac{\lambda}{2m} \sum_{l=1}^{L} \sum_{i=1}^{s_{l-1}} \sum_{j=1}^{s_l} \big(w^l_{ij}\big)^2 \qquad (2)

\frac{\partial E}{\partial \sigma^l_j} = \sum_{p=1}^{s_{l+1}} \frac{\partial E}{\partial \mathrm{sum}^{l+1}_p} \, \frac{\partial \mathrm{sum}^{l+1}_p}{\partial \sigma^l_j} = \sum_{p=1}^{s_{l+1}} \frac{\partial E}{\partial \sigma^{l+1}_p} \, \frac{\partial \sigma^{l+1}_p}{\partial \mathrm{sum}^{l+1}_p} \, w^{l+1}_{jp} \quad \text{since } \frac{\partial \mathrm{sum}^{l+1}_p}{\partial \sigma^l_j} = w^{l+1}_{jp}

\frac{\partial E}{\partial \sigma^L_j} = -\frac{y_j}{\sigma^L_j} + \frac{1 - y_j}{1 - \sigma^L_j}
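The last identity follows by differentiating the single-example cost (2) with respect to $\sigma^L_j$ (the regularizer does not depend on $\sigma^L_j$); only the $k = j$ term survives:

\frac{\partial E}{\partial \sigma^L_j}
  = \frac{\partial}{\partial \sigma^L_j} \Big( -\big[ y_j \log \sigma^L_j + (1 - y_j)\log(1 - \sigma^L_j) \big] \Big)
  = -\frac{y_j}{\sigma^L_j} + \frac{1 - y_j}{1 - \sigma^L_j}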

(10)


Backpropagation in Action

[Figure: the $(l-1)$-th layer $\sigma^{l-1}_1, \sigma^{l-1}_2, \ldots, \sigma^{l-1}_i, \ldots, \sigma^{l-1}_{s_{l-1}}$, its weights $w^l_{1j}, \ldots, w^l_{ij}, \ldots, w^l_{s_{l-1} j}$ and $w^l_{1 s_l}, \ldots, w^l_{s_{l-1} s_l}$ into layer $l$, and the outputs $\sigma^L_1, \ldots, \sigma^L_K$, annotated with the derivatives being propagated backwards:]

\frac{\partial E}{\partial \sigma^l_j} = \sum_{p=1}^{s_{l+1}} \frac{\partial E}{\partial \sigma^{l+1}_p} \, \frac{\partial \sigma^{l+1}_p}{\partial \mathrm{sum}^{l+1}_p} \, w^{l+1}_{jp}

\frac{\partial E}{\partial \sigma^l_{s_l}} = \sum_{p=1}^{s_{l+1}} \frac{\partial E}{\partial \sigma^{l+1}_p} \, \frac{\partial \sigma^{l+1}_p}{\partial \mathrm{sum}^{l+1}_p} \, w^{l+1}_{s_l p}


(11)

Backpropagation in Action

[Figure: node $\sigma^{l-1}_i$ of the $(l-1)$-th layer feeding node $j$ of layer $l$, whose derivative $\frac{\partial E}{\partial \sigma^l_j}$ has already been propagated back from the outputs $\sigma^L_1, \ldots, \sigma^L_K$:]

\frac{\partial E}{\partial w^l_{ij}} = \frac{\partial E}{\partial \sigma^l_j} \, \frac{\partial \sigma^l_j}{\partial \mathrm{sum}^l_j} \, \frac{\partial \mathrm{sum}^l_j}{\partial w^l_{ij}}

(12)


Recall and Substitute

\mathrm{sum}^l_j = \sum_{k=1}^{s_{l-1}} w^l_{kj} \sigma^{l-1}_k \quad \text{and} \quad \sigma^l_i = \frac{1}{1 + e^{-\mathrm{sum}^l_i}}

\frac{\partial E}{\partial w^l_{ij}} = \frac{\partial E}{\partial \sigma^l_j} \, \frac{\partial \sigma^l_j}{\partial \mathrm{sum}^l_j} \, \frac{\partial \mathrm{sum}^l_j}{\partial w^l_{ij}} + \frac{\lambda}{m} w^l_{ij}

\frac{\partial \sigma^l_j}{\partial \mathrm{sum}^l_j} = \sigma^l_j \big(1 - \sigma^l_j\big)

\frac{\partial \mathrm{sum}^l_j}{\partial w^l_{ij}} = \sigma^{l-1}_i

\frac{\partial E}{\partial \sigma^l_j} = \sum_{p=1}^{s_{l+1}} \frac{\partial E}{\partial \sigma^{l+1}_p} \, \frac{\partial \sigma^{l+1}_p}{\partial \mathrm{sum}^{l+1}_p} \, w^{l+1}_{jp}

\frac{\partial E}{\partial \sigma^L_j} = -\frac{y_j}{\sigma^L_j} + \frac{1 - y_j}{1 - \sigma^L_j}


(13)

Backpropagation in Action

[Figure: the same layer, with $\frac{\partial E}{\partial \sigma^l_j}$ and $\sigma^l_j$ now available at node $j$; substituting the recalled pieces gives the complete gradient for the weight from $\sigma^{l-1}_i$:]

\frac{\partial E}{\partial w^l_{ij}} = \frac{\partial E}{\partial \sigma^l_j} \, \sigma^l_j \big(1 - \sigma^l_j\big) \, \sigma^{l-1}_i + \frac{\lambda}{m} w^l_{ij}

(14)


Backpropagation in Action

[Figure: with $\frac{\partial E}{\partial \sigma^l_j}, \ldots, \frac{\partial E}{\partial \sigma^l_{s_l}}$ computed for every node of layer $l$, each incoming weight is updated by a gradient-descent step:]

w^l_{ij} = w^l_{ij} - \eta \, \frac{\partial E}{\partial w^l_{ij}}


(15)

The Backpropagation Algorithm for Training NN

1. Randomly initialize weights $w^l_{ij}$ for $l = 1, \ldots, L$, $i = 1, \ldots, s_l$, $j = 1, \ldots, s_{l+1}$.

2. Implement forward propagation to get $f_w(x)$ for every $x \in \mathcal{D}$.

3. Execute backpropagation on any misclassified $x \in \mathcal{D}$ by performing gradient descent to minimize the (non-convex) $E(w)$ as a function of the parameters $w$:

4. $\frac{\partial E}{\partial \sigma^L_j} = -\frac{y_j}{\sigma^L_j} + \frac{1 - y_j}{1 - \sigma^L_j}$ for $j = 1$ to $s_L$.

5. For $l = L - 1$ down to $2$:
   1. $\frac{\partial E}{\partial \sigma^l_j} = \sum_{p=1}^{s_{l+1}} \frac{\partial E}{\partial \sigma^{l+1}_p} \, \sigma^{l+1}_p \big(1 - \sigma^{l+1}_p\big) \, w^{l+1}_{jp}$
   2. $\frac{\partial E}{\partial w^l_{ij}} = \frac{\partial E}{\partial \sigma^l_j} \, \sigma^l_j \big(1 - \sigma^l_j\big) \, \sigma^{l-1}_i + \frac{\lambda}{m} w^l_{ij}$
   3. $w^l_{ij} = w^l_{ij} - \eta \, \frac{\partial E}{\partial w^l_{ij}}$

6. Keep picking misclassified examples until the cost function $E(w)$ shows significant reduction.
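A compact sketch of the whole procedure for a single example, assuming sigmoid units everywhere, the cross-entropy cost above, and weight matrices with the bias weights in row 0; the array layout and helper names are assumptions, not the lecture's code:

import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def backprop_update(x, y, weights, eta, lam, m):
    """One gradient-descent step on a single example (x, y).

    weights[l] has shape (s_l + 1, s_{l+1}); row 0 holds the bias weight w_0j.
    """
    # Forward propagation: remember sigma for every layer.
    sigmas = [np.asarray(x, dtype=float)]
    for W in weights:
        sigmas.append(sigmoid(np.concatenate(([1.0], sigmas[-1])) @ W))

    # Output layer: dE/dsigma^L_j = -y_j / sigma^L_j + (1 - y_j) / (1 - sigma^L_j).
    y = np.asarray(y, dtype=float)
    dE_dsigma = -y / sigmas[-1] + (1 - y) / (1 - sigmas[-1])

    # Backward pass from the last weight matrix down to the first.
    for l in range(len(weights) - 1, -1, -1):
        sig_out = sigmas[l + 1]
        sig_in = np.concatenate(([1.0], sigmas[l]))
        delta = dE_dsigma * sig_out * (1 - sig_out)        # dE/dsum_j

        grad = np.outer(sig_in, delta)                     # dE/dw_ij = sigma_i * delta_j
        grad[1:] += (lam / m) * weights[l][1:]             # regularize non-bias weights only

        dE_dsigma = weights[l][1:] @ delta                 # dE/dsigma for the previous layer
        weights[l] -= eta * grad                           # gradient-descent update

The derivative with respect to the previous layer's activations is computed before the in-place weight update, so that updating layer l does not contaminate the gradient propagated to layer l - 1.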

(16)


Challenges with Neural Network Training


(17)

What Changed with Neural Networks?

Origin: the computational model of threshold logic by Warren McCulloch and Walter Pitts (1943).

Big leap: in the ImageNet challenge, AlexNet achieved 85% accuracy (NIPS 2012); the previous best was 75% (CVPR 2011).

The present best is 96.5% by MSRA (arXiv 2015), comparable to human-level accuracy.

The challenges involved were varied backgrounds, the same object in different colors (e.g. cats), varied sizes and postures of the same objects, and varied illumination conditions.

Tasks like OCR and speech recognition are now possible without segmenting the word image/signal into character images/signals.

(18)


LeNet (1989 and 1998) vs. AlexNet (2012)


(19)

Reasons for Big Leap

Why was LeNet not as successful as AlexNet, even though the algorithm was the same?

The right algorithm at the wrong time.

Modern features.

Advances in machine learning.

Realistic data collection at huge scale, driven by regular competitions, evaluation metrics, and challenging problem statements.

Advances in computational resources: GPUs, industrial-scale clusters.

Evolution of tasks: from classification of 10 objects, to 100 objects, to a "structure of classes".

(20)


Convolutional Neural Network

A variation of the multi-layer feed-forward neural network designed to use minimal preprocessing, with wide application in image recognition and natural language processing.

Traditional multilayer perceptron (MLP) models do not take the spatial structure of the data into account and suffer from the curse of dimensionality.

A convolutional neural network has a smaller number of parameters due to local connections and weight sharing, as the rough comparison below illustrates.
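A rough back-of-the-envelope comparison (the image size, unit counts, and kernel size are illustrative assumptions, not numbers from the lecture):

# Fully-connected layer vs. convolutional layer on a 32 x 32 x 3 input.
h, w, c = 32, 32, 3                     # input height, width, channels
n_hidden = 100                          # fully-connected hidden units
fc_params = (h * w * c) * n_hidden      # every input connected to every unit: 307,200 weights

k, n_maps = 5, 100                      # 5 x 5 kernels, 100 feature maps
conv_params = (k * k * c) * n_maps      # the same kernel is reused at every position: 7,500 weights

print(fc_params, conv_params)           # 307200 7500  (biases omitted in both counts)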

