IETE Journal of Research

Vol 44, Nos 4&5, July-October 1998, pp 219-225.

**On Making Neural Network Based Learning Systems Robust**

**ASHISH GHOSH**

Machine Intelligence Unit, Indian Statistical Institute, 203 B T Road, Calcutta 700 035, India.

ash@isical.ac.in

AND

**HIDEO TANAKA**

**Department of Industrial Engineering, College of Engineering, Osaka Prefecture University, 1-1 Gakuen-cho, Sakai, Osaka 593, Japan.**

**tanaka@ie.osakafu-u.ac.jp**

**A method for making neural network based learning systems robust with respect to component failure (damaging of nodes/links) is suggested in the present investigation. The method allows some of the components to fail at various instants of the entire learning process. The change in error value caused by this damage is adjusted while the other components learn their parameters during the rest of the learning. The damaging/component failure process has been modeled as a Poisson process. The instants or moments of damaging are chosen by statistical sampling. The components to be damaged are determined randomly. As an illustration, the model is implemented on the back-propagation learning algorithm.**

**Indexing terms: Neural networks, Robustness, Learning algorithms**


A neural network (NN) [1-5] based system consists of a large number of neurons with massive connectivity among them. Local connectivity among the neurons/nodes (computing elements) being very high and the storage of information being distributed, the approach is claimed to be highly robust and can be applied even when information is ill-defined and/or defective/partial or noisy. If some of the components fail to work completely or partially, the other components adjust themselves (during iterative learning) in such a manner that the output does not deteriorate much. This feature is exploited here to provide a model for designing robust NN based learning systems. As a NN based system consists of a large number of components, the possibility that some of its components (nodes and/or links) fail to work is very high. Here lies the necessity of designing learning systems that are robust under complete or partial failure of components. In this context we mention that several works [6-9] have been done to design optimum NN architectures by damaging some of the components.

In the present work an attempt is made to provide a model for designing robust neural network based learning systems. This is done by damaging (completely) some of the components of a NN based system during the process of learning its parameters (weights and biases) and studying the change in its performance. Since (some of) the components are damaged at different time instants of the entire learning process, the error values at the output nodes will be different than if those components had not failed; thus the other components will adjust their parameters so as to compensate for this damage. The performance will therefore not deteriorate much due to this damage and the system will be robust. The component failure process of a NN has already been shown to follow a Poisson distribution under certain assumptions [10]. Under this model, the components (nodes and/or links) to be damaged are chosen randomly. The time instants (i.e., when a damage occurs) are determined by drawing random samples from the appropriate probability distribution [11,12].

Though the proposed model is valid, in general, for any NN based learning system, the problem of multi-layer perceptron based classification using back-propagation learning is considered here for a demonstration of the validity of the model. It is implemented on IRIS data [13].

To demonstrate the utility of the proposed model, a few links are damaged completely during the learning phase of the classificatory system (this has the underlying assumption that the training/learning phase takes a large amount of time compared to the testing phase). In the testing phase, the performance of such a system is evaluated by measuring the percentage of correct classification. A network with the same configuration (as used in the previous experiment) is then allowed to learn with the same set of training samples (without any component damage). The same set of links which were damaged during the learning phase of the previous experiment are then damaged (before testing) and the classification accuracy is measured. A comparative study of the correct classification rates for these two experiments shows that the proposed model provides more robust performance (better accuracy even with damaging of components).

**MODELING AND SAMPLING OF FAILURES IN A NEURAL NETWORK**

**Modeling of failures**

Let us consider a neural network system with $N$ components, where a component could be a node (processor) or a link. (In a practical case one can also model the nodes and links separately [10].) During the operation of the network some of its components may fail. We make the following assumptions about the failure process:

(i) The system has $N$ identical components at the time instant $t = 0$ (when the operation of the network starts).

(ii) If a component fails, it fails for ever (no repair or replacement).

(iii) Failure of components occurs at an average rate of $\mu$ per unit time.

(iv) The probability of an event occurring between time $t$ and $t + h$ depends only on the length of $h$; i.e., the probability depends neither on the number of events that have occurred up to time $t$ nor on the specific value of $t$; i.e., the probability density function has stationary increments.

(v) The probability of a failure during a very small interval of time $h$ is positive but less than one, i.e., not certain.

(vi) At most one failure can occur during a very small interval of time $h$.

Let $p_n(t)$ be the probability that the system has $n$ components active at time instant $t$, i.e., $N - n$ failures during the time interval $[0, t]$. It can be shown that under the assumptions (i)-(vi), $p_n(t)$ is given by the formula

$$p_n(t) = \frac{e^{-\mu t}\,(\mu t)^{N-n}}{(N-n)!}, \qquad n = 1, 2, \ldots, N \tag{1}$$

and

$$p_0(t) = 1 - \sum_{n=1}^{N} p_n(t). \tag{2}$$

Thus we see that $p_n(t)$ is a truncated Poisson distribution with mean $\mu t$.

If $f(t)$ is the probability density function (pdf) of the inter-failure time (i.e., the time interval between two successive failures), then it can be shown that for the above Poisson failure process, $f(t)$ is given by

$$f(t) = \mu e^{-\mu t}, \quad t \geq 0; \qquad f(t) = 0, \quad t < 0. \tag{3}$$

Thus, when the failure process is governed by a Poisson distribution, the inter-failure time is described by the exponential distribution (3) with expected value (mean)

$$E(t) = \int_0^{\infty} t\, f(t)\, dt = \frac{1}{\mu}. \tag{4}$$

**Sampling**

In order to simulate the failure process, one needs to draw random samples from the exponential distribution (3). Before describing the exact algorithm, let us first consider the general strategy for sampling from any distribution.

Let $f(x)$ be the pdf of the random deviate $x$, and $F(x)$ be the cumulative distribution function (cdf) of $x$, i.e.,

$$F(x) = \int_{-\infty}^{x} f(t)\, dt. \tag{5}$$

It can be easily shown that the random variable $y = F(x)$ is uniformly distributed over $[0, 1]$, regardless of the distribution of $x$. Hence, if $R$ is a random number drawn from uniform $[0, 1]$, then $x = F^{-1}(R)$ is a random sample from the pdf $f(x)$. Therefore, sampling from any distribution can be done using the following simple two-step method.

Step 1: Generate a random number $R$ in $[0, 1]$ and assign it to $F(x)$.

Step 2: Solve for $x$ from $R = F(x)$.

The above sampling method is known as the method of inversion.

*Sampling from exponential distribution*

For an exponential distribution the pdf is

$$f(t) = \mu e^{-\mu t}, \quad \mu > 0,\ t \geq 0; \qquad f(t) = 0, \quad t < 0.$$

Then

$$F(t) = \int_0^{t} \mu e^{-\mu x}\, dx = 1 - e^{-\mu t}. \tag{6}$$

If the random number drawn is $R$, then

$$R = F(t) = 1 - e^{-\mu t},$$

or,

$$t = -\frac{1}{\mu}\ln(1 - R) = -\frac{1}{\mu}\ln R. \tag{7}$$

The last step is possible because if $R$ is a random number on $[0, 1]$ then so is $(1 - R)$, and we can replace $(1 - R)$ by $R$ for convenience.
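To make the inversion concrete, here is a minimal sketch (not from the paper) of drawing inter-failure times from the exponential distribution via eq (7); the function name `sample_exponential` and the use of Python's standard `random` module are illustrative assumptions.

```python
import math
import random

def sample_exponential(mu):
    """One inter-failure time by inversion, eq (7): t = -(1/mu) ln R."""
    R = random.random()                 # R drawn from uniform [0, 1)
    return -math.log(1.0 - R) / mu      # (1 - R) is also uniform; avoids log(0)

# Sanity check: the sample mean should approach 1/mu (eq (4))
mu = 2.0
samples = [sample_exponential(mu) for _ in range(100000)]
print(sum(samples) / len(samples))      # roughly 0.5
```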

It has been established above that if the failure process is described by a Poisson distribution, then the time between the occurrences of failures (the inter-failure time) must follow the corresponding exponential distribution. Thus, in order to simulate the component failure process described by the Poisson distribution with mean $\mu t$ over a time period $[0, T]$, all one has to do is to sample the corresponding exponential distribution with mean $1/\mu$ as many times as necessary until the sum of the corresponding exponential random samples generated exceeds $T$ for the first time. This can be further explained as follows.

Suppose $R_i$ is the $i$th random sample drawn from uniform $[0, 1]$; then

$$t_i = -\frac{1}{\mu}\ln R_i \tag{8}$$

is the $i$th sample from the exponential distribution (3). Therefore

$$T_i = \sum_{j=1}^{i} t_j \tag{9}$$

gives the time instant when the $i$th (component) failure occurs. The process is repeated for the maximum number of times ($K$, say) such that $T_K < T$.
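The accumulate-until-the-horizon procedure just described can be sketched as follows (an illustrative reading of eqs (8)-(9) reusing `sample_exponential` from the earlier sketch, not the authors' code):

```python
def failure_instants(mu, T):
    """Instants T_1 < T_2 < ... at which successive failures occur,
    generated until the running sum of inter-failure times first
    exceeds the horizon T (so K = len(result) satisfies T_K < T)."""
    instants, elapsed = [], 0.0
    while True:
        elapsed += sample_exponential(mu)   # t_i of eq (8)
        if elapsed > T:                     # sum exceeds T for the first time
            return instants
        instants.append(elapsed)            # T_i of eq (9)
```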

**MAKING THE LEARNING SYSTEMS ROBUST**

Though the methodology that is going to be developed for making neural network based learning systems robust holds for any NN based system, the present discussion considers a specific type of NN (the multi-layer perceptron). So, let us first briefly describe the architecture and working principles of the multi-layer perceptron.

In general, a multilayer perceptron (MLP) is made up of sets of nodes arranged in layers. Nodes of two different consecutive layers are connected by links or weights, but there is no connection among the elements of the same layer. The layer where the inputs are presented is known as the input layer. On the other hand, the output producing layer is called the output layer. The layers in between the input and the output layers are known as hidden layers. The output of nodes in one layer is transmitted to nodes in another layer via links that amplify or attenuate or inhibit such outputs through weighting factors. Except for the input layer nodes, the total input to each node is the sum of the weighted outputs of the nodes in the previous layer. Each node is activated in accordance with the input to the node and the activation function of the node. The total input ($I_i$) to the $i$th unit of any layer is

$$I_i = \sum_j w_{ij}\, o_j \tag{10}$$

with $o_j$ as the output of the $j$th neuron in the previous layer and $w_{ij}$ the connection weight between the $i$th node of one layer and the $j$th node of the previous layer. The output of a node $i$ is obtained as

$$o_i = f(I_i) \tag{11}$$

where $f$ is the activation function [1]. Mostly the activation function is sigmoidal, with the form

$$f(x) = \frac{1}{1 + e^{-(x-\theta)/\theta_0}}. \tag{12}$$

The function is symmetrical around $\theta$, and $\theta_0$ controls the steepness of the function. $\theta$ is known as the threshold/bias value.
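As a reading aid, eqs (10)-(12) amount to the following forward pass; the NumPy formulation, the 4-3-3 shapes and the parameter values here are illustrative assumptions, not part of the paper.

```python
import numpy as np

def sigmoid(x, theta=0.0, theta0=1.0):
    """Activation function of eq (12): 1 / (1 + exp(-(x - theta)/theta0))."""
    return 1.0 / (1.0 + np.exp(-(x - theta) / theta0))

def forward(x, weight_matrices):
    """Propagate an input vector through the MLP: each layer takes the
    weighted sum of previous-layer outputs (eq (10)) and applies f (eq (11))."""
    o = x
    for W in weight_matrices:
        o = sigmoid(W @ o)          # I = W o, then o = f(I)
    return o

# A 4-3-3 network (as used in the experiments below) with small random weights
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(3, 4)),   # input -> hidden
           rng.normal(scale=0.1, size=(3, 3))]   # hidden -> output
print(forward(np.ones(4), weights))
```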

**The back-propagation learning algorithm**

For the operation of the multi-layer perceptron, initially very small random values are assigned to the links/weights. In the learning phase (training) of such a network we present the pattern $X = (x_i)$, where $x_i$ is the $i$th component of the vector $X$, as input and ask the net to adjust its set of weights in the connecting links and also the thresholds in the nodes such that the desired output $\{t_i\}$ is obtained at the output nodes. After this, we present another pair of $X$ and $\{t_i\}$, and ask the net to learn that association also. In fact, we desire the net to find a simple set of weights and biases that will be able to discriminate among all the input/output pairs presented to it. This process can pose a very strenuous learning task and is not always readily accomplished. Here the desired output basically acts as a teacher which tries to minimize the error.

In general, the output $\{o_i\}$ will not be the same as the target or desired value $\{t_i\}$. For a pattern the error is

$$E = \frac{1}{2}\sum_i (t_i - o_i)^2 \tag{13}$$

where the factor of one half is inserted for mathematical convenience. The incremental change in weights for a particular pattern $p$ is given by [1]

$$\Delta w_{ji} = \eta\, \delta_j\, o_i \tag{14}$$

with

$$\delta_j = -\frac{\partial E}{\partial I_j}. \tag{15}$$

As $E$ can be directly calculated in the output layer, for the links connected to the output layer the change in weight is given by

$$\Delta w_{ji} = \eta \left(-\frac{\partial E}{\partial o_j}\right) f'(I_j)\, o_i. \tag{16}$$

For the weights which do not affect the output nodes directly,

$$-\frac{\partial E}{\partial o_j} = \sum_k \delta_k\, w_{kj}. \tag{17}$$

Hence

$$\Delta w_{ji} = \eta\, (t_j - o_j)\, f'(I_j)\, o_i \quad \text{and} \quad \Delta w_{ji} = \eta \Big(\sum_k \delta_k\, w_{kj}\Big) f'(I_j)\, o_i \tag{18}$$

for the output layer and other layers, respectively. In particular, with equations (11) and (12), if

$$o_j = f(I_j) = \frac{1}{1 + e^{-(I_j-\theta)}} \tag{19}$$

then

$$\frac{\partial o_j}{\partial I_j} = o_j\,(1 - o_j) \tag{20}$$

and thus we get

$$\Delta w_{ji} = \eta\, (t_j - o_j)\, o_j\,(1 - o_j)\, o_i \quad \text{and} \quad \Delta w_{ji} = \eta \Big(\sum_k \delta_k\, w_{kj}\Big)\, o_j\,(1 - o_j)\, o_i \tag{21}$$

for the output layer and other layers, respectively.

It may be mentioned here that a large value of $\eta$ corresponds to rapid learning but might result in oscillations. A momentum term of $\alpha\, \Delta w_{ji}(t)$ can be added to increase the learning rate, and thus expression (14) can be modified as

$$\Delta w_{ji}(t+1) = \eta\, \delta_j\, o_i + \alpha\, \Delta w_{ji}(t) \tag{22}$$

where the quantity $(t + 1)$ is used to indicate the $(t+1)$th time instant, and $\alpha$ is a proportionality constant. The second term is used to specify that the change in weight at the $(t+1)$th instant should be somewhat similar to the change undertaken at instant $t$.
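For a single-hidden-layer network, eqs (21)-(22) translate into the following update sketch (illustrative NumPy code under the assumption $\theta_0 = 1$, reusing `sigmoid` and `np` from the earlier sketch; `V1`, `V2` are hypothetical names for the momentum buffers):

```python
def backprop_step(x, target, W1, W2, V1, V2, eta=0.2, alpha=0.2):
    """One pattern presentation: forward pass, deltas of eq (21),
    and the momentum update of eq (22)."""
    h = sigmoid(W1 @ x)                        # hidden-layer outputs
    o = sigmoid(W2 @ h)                        # network outputs
    delta_o = (target - o) * o * (1 - o)       # output-layer delta, eq (21)
    delta_h = (W2.T @ delta_o) * h * (1 - h)   # hidden-layer delta, eq (21)
    V2 = eta * np.outer(delta_o, h) + alpha * V2   # eq (22), output weights
    V1 = eta * np.outer(delta_h, x) + alpha * V1   # eq (22), hidden weights
    return W1 + V1, W2 + V2, V1, V2
```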

**Making the learning systems robust**

In any system, components may fail with the passage of time. In the case of NN based systems the components are the neurons/nodes and links. So, in such systems some of the neurons or links or both may get damaged in course of time. NN based information processing systems are normally claimed to be robust under component failure, as the NN architectures involve massive processing elements and connectivity among them (mostly with redundant components). Thus even if we damage some of the components in the learning phase, the strengths of the other links and the biases of the other nodes will automatically get adjusted so as to compensate for this damage during the rest of the learning, resulting in higher classification accuracy during the testing phase. This basic property can be used to design robust learning systems.

Let $T$ be the total time required for $I$ iterations to learn the parameters of a NN on a monoprocessor system (which can roughly be estimated from previous experiments). Then the time required per iteration is

$$\tau = \frac{T}{I}. \tag{23}$$

Note that the time spent for testing is negligible compared to $T$ (the learning time). Also let $t$ be the time required for updating a single node (this includes collecting the input to it, transforming the input to output, and updating the links connected to it). In a practical case, $t$ may not be equal for all nodes. Suppose there is an $n$-layer network where the operation between layers is strictly sequential and the operations among the nodes in the same layer are parallel. Let there be $n_1$ nodes in the first layer, $n_2$ nodes in the second layer, $n_3$ nodes in the third layer, and so on. As no time is spent in the input layer,

$$\tau = t\,(n_2 + n_3 + \cdots + n_n). \tag{24}$$

So,

$$t = \frac{\tau}{n_2 + n_3 + \cdots + n_n} = \frac{T}{I\,(n_2 + n_3 + \cdots + n_n)}. \tag{25}$$

Since there are $(n - 1)$ layers which operate strictly sequentially, the time required per iteration for a parallel implementation is

$$(n-1)\,t = \frac{(n-1)\,\tau}{n_2 + n_3 + \cdots + n_n}. \tag{26}$$

Thus the total time (for $I$ iterations) required for the parallel implementation is

$$T_p = I\,(n-1)\,t = \frac{(n-1)\,T}{n_2 + n_3 + \cdots + n_n}. \tag{27}$$
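As a worked illustration (not a result stated in the paper), take the 4-3-3 network used later in the experiments: $n = 3$ and $n_2 + n_3 = 3 + 3 = 6$, so eq (27) gives

$$T_p = \frac{(3-1)\,T}{6} = \frac{T}{3},$$

i.e., the parallel implementation needs about a third of the monoprocessor learning time.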

If $D$ components are to be damaged during this process, then the parameter $\mu$ for the Poisson distribution (1) is estimated as

$$\mu = \frac{D}{T_p}. \tag{28}$$

Let the inter-damage time periods (samples) be $t_j$, $j = 1, 2, \ldots, L$ ($L = D$). Now for each of the $L$ time instants select a component to be damaged. In other words, select $L$ components and damage the $i$th selected component after $T_i$ seconds, where

$$T_i = \sum_{j=1}^{i} t_j. \tag{29}$$

Now if the $i$th component is to be damaged at a time $T_i$, and $T_i$ falls in the $k$th iteration, then for the $k$th and subsequent iterations the $i$th component is assumed to be damaged.
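Combining eqs (27)-(29), damage instants can be mapped to iteration indices as in this sketch (function and variable names are illustrative; it reuses `sample_exponential` from above and assumes each iteration takes $T_p/I$ seconds of parallel time):

```python
def damage_iterations(D, T_p, I):
    """Estimate mu = D / T_p (eq (28)), draw D inter-damage times, and
    return the iteration k in which each damage instant T_i (eq (29))
    falls; a component is treated as damaged from iteration k onwards."""
    mu = D / T_p
    per_iteration = T_p / I
    ks, elapsed = [], 0.0
    for _ in range(D):
        elapsed += sample_exponential(mu)       # inter-damage time t_j
        k = int(elapsed // per_iteration) + 1   # iteration containing T_i
        if k <= I:
            ks.append(k)
    return ks
```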

Thus we notice that the components get damaged at various moments of learning. Only one component may get damaged at the end of the learning phase. Since the learning process continues even after the damaging of some of the components, the adjustment of the other components will be such as to obtain the optimum performance (i.e., the error value is minimized with this configuration only), thereby compensating for this damage. Now if these components had failed after the completion of learning, the performance of the system would degrade a lot. Since in a NN based system failure of components is natural, it is better to incorporate this fact (by deliberately assuming some of the components as damaged) while learning the parameters, thereby achieving robust performance. Please note that even though there is no change in the learning algorithm, incorporation of damaging of components during learning will make the system robust.

A similar discussion can be made by treating the links as separate components and assuming that the total time is spent on updating the links.

We have conducted the present simulation study by damaging the links only. This is due to the fact that the number of nodes in the hidden layer is very small, and the usefulness of the proposed algorithm cannot be demonstrated rigorously with nodes. As there remains no redundancy in the input and output nodes, the present technique may not be useful to study the failure process of these nodes. A similar study can be done by damaging a combination of links and nodes.

**RESULTS AND DISCUSSION**

The proposed method is implemented and tested on the standard MLP based classification problem. Implementation is done on the IRIS data set (150 samples), which has 4 input features and 3 classes. The network architecture chosen is 4-3-3, i.e., it has 4 input nodes, one hidden layer with 3 nodes, and 3 nodes in the output layer. For different simulations 10%, 20% and 50% of the samples were taken randomly for training, and the whole set (of 150) was taken for testing. The percentages of correct classification (with different training sizes and parameter values) are depicted in Tables 1-4. Similar experiments were performed by damaging 2 links and 4 links (out of 4 × 3 + 3 × 3 = 21) while the network was in the training phase. The classification accuracies are also put in the same set of tables. The damaging of the links was performed according to the procedure described earlier. Another set of experiments was performed using the learned network by damaging the same set of links (i.e., those links which were damaged during the training phase of the formerly described set of experiments) in the testing phase, and evaluating the classification accuracy on this damaged architecture. The percentage scores obtained by damaging 2 and 4 links (of this settled network) are also included in Tables 1-4.

TABLE 1 Percentage of correct classification with learning rate η = 0.2 and momentum value α = 0.2

| Training size (in %) | Usual MLP | Damaged during learning (2 components) | Damaged during learning (4 components) | Damaged during testing (2 components) | Damaged during testing (4 components) |
|---|---|---|---|---|---|
| 10 | 96 | 96 | 66 | 96 | 66 |
| 20 | 96 | 96 | 96 | 96 | 92 |
| 50 | 98 | 66 | 64 | 66 | 33 |

TABLE 2 Percentage of correct classification with learning rate η = 0.2 and momentum value α = 0.5

| Training size (in %) | Usual MLP | Damaged during learning (2 components) | Damaged during learning (4 components) | Damaged during testing (2 components) | Damaged during testing (4 components) |
|---|---|---|---|---|---|
| 10 | 96 | 98 | 94 | 33 | 33 |
| 20 | 95 | 96 | 62 | 93 | 33 |
| 50 | 98 | 98 | 66 | 98 | 66 |

TABLE 3 Percentage of correct classification with learning rate η = 0.5 and momentum value α = 0.2

| Training size (in %) | Usual MLP | Damaged during learning (2 components) | Damaged during learning (4 components) | Damaged during testing (2 components) | Damaged during testing (4 components) |
|---|---|---|---|---|---|
| 10 | 98 | 96 | 96 | 97 | 95 |
| 20 | 98 | 94 | 94 | 33 | 33 |
| 50 | 97 | 97 | 97 | 33 | 33 |

TABLE 4 Percentage of correct classification with learning rate η = 0.5 and momentum value α = 0.5

| Training size (in %) | Usual MLP | Damaged during learning (2 components) | Damaged during learning (4 components) | Damaged during testing (2 components) | Damaged during testing (4 components) |
|---|---|---|---|---|---|
| 10 | 93 | 93 | 93 | 95 | 95 |
| 20 | 98 | 98 | 96 | 62 | 33 |
| 50 | 96 | 96 | 96 | 66 | 33 |


"From th e ta b les w e notice that in m ost o f the cases
(with the sam e n etw o rk architecture and sam e set o f
param eter v alu es) th e classification accuracy is m ore
when the links w ere dam aged during the training/learning
phase than they w ere dam aged during testing. For
exam ple let us co n sid e r the situation with 77 = 0.5 and
*a* = 0.2 (Table 3). F or 20% training sam ple, the
classification accuracy is 98% w ithout any dam age. T he
accuracy reduces to 94% with dam aging 2 and 4 links
during learning. B ut, if th e sam e links are d am aged w hile
testing (on the learned network) the classification
accuracy is drastically reduced to 33% . T h u s it is
advisable to co n sid er th e possibility o f com p o n en t failure
in NN w hile learning its param eters thereby m aking the
system more robust.

**CONCLUSIONS AND FURTHER SUGGESTIONS**

A method to make neural network based learning systems robust with respect to component failure (damaging of nodes/links) is suggested in the present investigation. The components are allowed to be damaged at different instants of the learning phase, thereby allowing time to the new architecture (with fewer components) to adjust its parameters so as to provide optimum performance. Thus the overall performance will not deteriorate much due to this damage. The damaging/component failure process has been modeled as a Poisson process. The instants or moments of damaging are chosen by statistical sampling. The components to be damaged are determined randomly. As an illustration, the proposed model is implemented and tested on the back-propagation learning based classification algorithm. A comparative study of the scores obtained by the proposed learning system and the standard back-propagation algorithm establishes the superiority of the proposed algorithm.

The work presented here shows a preliminary attempt at designing NN based robust learning systems. A number of problems related to this contribution remain to be investigated. The most important and natural extension of the present concept will be to develop algorithms which can handle systems with partially damaged components. A fuzzy set theoretic approach seems to be a viable alternative for this task. Further, generalization of this model so as to handle neuro-fuzzy learning systems, which deal with linguistic or fuzzy input vectors and provide output with multiple class labels and certainty factors, will constitute another important study.

**ACKNOWLEDGEMENT**

A part of this work was done when Dr Ashish Ghosh held a research fellowship of the Ministry of Education, Science, Sports and Culture, Govt. of Japan.

**REFERENCES**

1. D E Rumelhart, J L McClelland, *et al*, *Parallel Distributed Processing: Explorations in the Microstructure of Cognition*, vol 1, MIT Press, Cambridge, MA, 1986.

2. S Grossberg (ed), *Neural Networks and Natural Intelligence*, MIT Press, Cambridge, MA, 1988.

3. Y H Pao, *Adaptive Pattern Recognition and Neural Networks*, Addison-Wesley, Massachusetts, 1987.

4. T Kohonen, *Self-organization and Associative Memory*, Springer-Verlag, Berlin, 1989.

5. P D Wassermann, *Neural Computing: Theory and Practice*, Van Nostrand Reinhold, New York, 1990.

6. G E Hinton, Connectionist learning procedures, *Artificial Intelligence*, vol 40, pp 185-235, 1989.

7. S E Fahlman & C Lebiere, The cascade-correlation learning architecture, in *Advances in Neural Information Processing Systems 2*, D S Touretzky (ed), Morgan Kaufmann, 1990.

8. T C Lee, A M Peterson & J C Tsai, A multi-layer feed-forward neural network with dynamically adjustable structures, *Proceedings IEEE International Conference on Systems, Man, and Cybernetics*, pp 367-369, 1990.

9. A S Weigend, D E Rumelhart & B A Huberman, Generalization by weight elimination with application to forecasting, in *Advances in Neural Information Processing Systems 3*, R P Lippmann, J E Moody & D S Touretzky (eds), Morgan Kaufmann, CA, pp 875-882, 1991.

10. A Ghosh, N R Pal & S K Pal, Modeling of component failure in neural networks for robustness evaluation: an application to object extraction, *IEEE Transactions on Neural Networks*, vol 6, pp 648-656, 1995.

11. H A Taha, *Operations Research: An Introduction*, Macmillan Publishing Co, New York, 1982.

12. A Gupta, *Groundwork of Mathematical Probability and Statistics*, Academic Publishers, New Delhi, India, 1983.

13. R A Johnson & D W Wichern, *Applied Multivariate Statistical Analysis*, Prentice Hall, Inc, New Jersey, 1982.

**AUTHOR**

**Ashish Ghosh is a Lecturer at the Machine Intelligence Unit, Indian Statistical Institute, Calcutta. He received the BE degree in Electronics and Telecommunication from Jadavpur University, Calcutta in 1987, and the MTech and PhD degrees in Computer Science from the Indian Statistical Institute, Calcutta in 1989 and 1993, respectively. He received the prestigious and most coveted Young Scientists award in Engineering Sciences from the Indian National Science Academy in 1995, and in Computer Science from the Indian Science Congress Association in 1992. He was selected as an Associate of the Indian Academy of Sciences, Bangalore in 1997. He visited Osaka Prefecture University, Japan with a Post-doctoral fellowship during October 1995 to March 1997, and Hannan University, Japan as a visiting scholar during September-October 1997. His research interests include Evolutionary Computation, Neural Networks, Image Processing, Fuzzy Sets and Systems, and Pattern Recognition.**