

CERTAIN PATTERN RECOGNITION TASKS USING GENETIC PROGRAMMING

Durga Prasad Muni

Electronics and Communication Sciences Unit
Indian Statistical Institute
Kolkata - 700108, India.

A thesis submitted to the Indian Statistical Institute in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

April 2008


ACKNOWLEDGEMENTS

This work would not have been possible without the active co-operation of my supervisor Prof. J. Das. I would also like to extend my heartfelt thanks to Prof. Nikhil R. Pal, who has been not only my valued guide but also a source of inspiration. Like true mentors, they have guided me in my effort to carry out the research work and develop the thesis, and no words are enough to acknowledge their co-operation and help. I am grateful to them for the time they have spent in discussing our research and scrutinizing it critically and with painstaking care.

They have also been kind enough to allow our joint works to be included in this thesis.

I would also like to thank the members of the Research Fellow Advisory Committee of the Computer and Communication Sciences Division for their critical comments during my annual reviews, without which this herculean task would not have been possible.

I would also like to take this opportunity to thank the faculty members of the Electronics and Communication Sciences Unit for their constant encouragement to speed up the research work.

I would like to thank Mr. Dibyendu Banerjee and my wife Sagarika for giving me their valued support in times of need.

I also extend my sincere gratitude to Prof. P. K. Nanda, Mr. P. P. Mohanta and my seniors Dr. Debrup Chakraborty and Dr. Arijit Laha, who have extended to me their incessant and unconditional support to help me go ahead with my research work. I will never forget the company of my friends Lingaraj, Kanhu, Sarif, Sanjaya, BLN, Pradip, Bibhas, Sitansu, Arindam, Brojeshwar, and Samrat.

I am very much grateful to my family, especially to my parents, for providing me the opportunity to continue my research work unhampered and for being a source of continuous mental support and encouragement. Without them, this enormous work would not have seen the light of day.

The Institute's library staff and everyone in the Electronics and Communication Sciences Unit (ECSU) have always been co-operative and friendly, which helped a lot. The Indian Statistical Institute has been like a mother tree to me, helping a sapling like me to thrive and prosper into my present self by giving me support in every sphere.

ISI, Kolkata

April 2008. (Durga Prasad Muni)


Contents

1 Introduction and scope of the thesis
1.1 Introduction
1.2 Motivation
1.3 Scope of the thesis
1.3.1 An Introduction to Pattern Recognition and Evolutionary Algorithms
1.3.2 Classifier Design using GP
1.3.3 Simultaneous Feature Selection and Classifier Design using GP
1.3.4 Evolution of Fuzzy Classifier Rules using GP
1.3.5 Texture Generation and Classification using GP
1.3.6 Conclusion and Scope of the future works

2 An Introduction to Pattern Recognition and Evolutionary Algorithms
2.1 An Introduction to Pattern Recognition
2.1.1 Data Acquisition and/or Preprocessing
2.1.2 Feature Analysis
2.1.3 Classification and/or clustering
2.2 An Introduction to Evolutionary Algorithms
2.2.1 Genetic Algorithms
2.2.2 Evolutionary Programming
2.2.3 Evolutionary Strategies
2.2.4 Genetic Programming
2.3 Relevance of EA in Pattern Recognition

3 Classifier Design using Genetic Programming [A1, A7]
3.1 Introduction
3.2 Proposed Multi-tree GP based classifier
3.2.1 Initialization
3.2.2 Training and Fitness measure
3.2.3 Unfitness of trees
3.2.4 A modified Crossover operation
3.2.5 A modified point mutation operation
3.2.6 Termination of GP
3.2.7 Improving performance of classifier with OR-ing
3.2.8 Conflict Resolution
3.2.9 Validation of classifier
3.3 Experimental Results
3.3.1 Data Sets
3.3.2 Results
3.4 Conclusions

4 Simultaneous feature selection and Classifier Design using Genetic Programming [A2]
4.1 Introduction
4.2 Designing Classifiers without Feature Selection
4.2.1 Modified Fitness Function
4.3 Designing classifiers with on-line feature selection
4.3.1 Selection of a feature subset for each chromosome
4.3.2 Fitness Function
4.3.3 Crossover Operation
4.4 Experimental Results
4.4.1 Data Sets
4.4.2 GP Parameters
4.4.3 Results
4.4.4 Effect of noise features
4.5 Conclusions

5 Evolution of Fuzzy classifiers Using Genetic Programming [A3]
5.1 Introduction
5.2 Procedure
5.2.1 Initialization
5.2.2 Fitness Measure
5.2.3 Crossover
5.2.4 Mutation
5.2.5 Cleaning of poor Rules
5.2.6 Termination of GP and Validation
5.3 Experimental Results
5.3.1 GP parameters
5.3.2 Results
5.4 Conclusion

6 Texture Generation and Classification using Genetic Programming [A4, A5, A6]
6.1 Introduction
6.2 Proposed Texture Generation using GP
6.2.1 Initialization
6.2.2 Filtering of Textures
6.2.3 Computation of feature vector
6.2.4 Fitness
6.2.5 Performance of Texture Generation Method
6.3 Texture Classification using Genetic Programming
6.3.1 Preparation of Data Sets
6.3.2 Classification Results
6.4 Conclusion

7 Conclusion and Scope of further work
7.1 Conclusion
7.2 Scope of the further work

Bibliography

Publications of the Author Related to the Thesis

List of Figures

2.1 (a) Division by 0 and (b) square root of a negative number
2.2 A tree generated by the Full Method
2.3 A multi-tree representation of an individual
2.4 Roulette wheel representation for the selection scheme
2.5 Two parents for the Crossover Operation
2.6 Two Offspring after the Crossover Operation
2.7 A typical Mutation Operation
2.8 Flow Chart of Genetic Programming
3.1 (a) and (b) Chromosomes C1 and C2 before the Crossover operation; (c) and (d) Chromosomes C1 and C2 after the Crossover operation
3.2 Variation of training and test error rates for the vehicle data: (a) no post-processing is used, (b) only OR-ing is used, (c) OR-ing and heuristic rules are used, and (d) OR-ing, heuristic rules and the weighting scheme are used
3.3 (a) Performance of the eight trees for RS data up to 20 generations and (b) performance of the same eight trees from generations 20 to 520
4.1 Decrease of fitness value with increase in r for a_j = 0.1 and n = 10 in Eq. 6
4.2 Iris Data using 3rd and 4th feature
5.1 Representation of a fuzzy classifier
5.2 A typical tree representation of a fuzzy rule
5.3 Parent 1 for crossover
5.4 Parent 2 for crossover
5.5 Offspring 1 after crossover
5.6 Offspring 2 after crossover
5.7 Best fitness and Mean fitness over generation for IRIS: all FKM prototypes
5.8 Best fitness and Mean fitness over generation for IRIS: all Random prototypes
5.9 Best fitness and Mean fitness over generation for WBC: all FKM prototypes
5.10 Best fitness and Mean fitness over generation for WBC: all Random prototypes
6.1 Block diagram of the GP system
6.2 Evaluation of procedures on the basis of their produced image/texture
6.3 Generated Textures by our GP system
6.4 Generated Textures by our GP system
6.5 Generated Textures by our GP system
6.6 Textures produced by the given procedures 1 and 2
6.7 Textures produced by the given procedures 3 and 4
6.8 Textures for Experiment 1
6.9 Textures for Experiment 2
6.10 Textures for Experiment 3

List of Tables

3.1 Different Classes and their frequencies for RS data
3.2 The Five Data sets
3.3 Common Parameters for all Data sets
3.4 Different parameters used for different data sets
3.5 Performance Evaluation for IRIS, WBC and BUPA Data
3.6 Performance Evaluation for Vehicle and RS Data
3.7 Comparison of mean error rate $m_{err}$ with other approaches
3.8 Mean GP Run Time
3.9 Average number of heuristic rules obtained for each Data set
4.1 Data sets
4.2 Common Parameters for all Data sets
4.3 Population Size
4.4 Average Performance
4.5 Mean Run Time
4.6 Comparison with other methods for IRIS data
4.7 Comparison with other methods for WBC data
4.8 Comparison for Wine and Vehicle data
4.9 Weights of features for Iris, WBC and Wine data
4.10 Weights of features for Vehicle data
4.11 Comparison with other methods for Sonar data
4.12 Average Performance for noisy data
4.13 t-Statistic for the noisy data sets
5.1 Data sets
5.2 Common GP Parameters for all Data sets
5.3 Other GP Parameters
5.4 Test Accuracy with initial 10 Rules per class
5.5 Average number of Rules with initial 10 rules per class
5.6 Test Accuracy with initial 5 Rules per class
5.7 Average number of Rules with initial 5 rules per class
5.8 Test Error for WBC for comparison
5.9 Test Error for Diabetes for comparison
5.10 Reported test error in other GP work [116]

List of Abbreviations

ANN  Artificial Neural Networks
CI  Computational Intelligence
EAs  Evolutionary Algorithms
EC  Evolutionary Computation
EP  Evolutionary Programming
ES  Evolutionary Strategies
FKM  Fuzzy K-Means
FS  Feature Selection
GAs  Genetic Algorithms
GP  Genetic Programming


Chapter 1

Introduction and scope of the thesis

1.1 Introduction

In this thesis, we focus on developing methodologies to solve certain pattern recognition tasks using evolutionary algorithms. Before explaining our methodologies, we give a very brief introduction to pattern recognition and evolutionary algorithms for a better understanding of the thesis.

Modern man is flooded with a myriad of information, each item distinct and complex in its own nature. Extraction of useful information from such data often reduces to recognition of various patterns present in the data. Thus, one needs to do pattern recognition. But what is pattern recognition (PR)?

According to Duda and Hart [36], pattern recognition is a field concerned with machine recognition of meaningful regularities in noisy or complex environments. In [115], pattern recognition is defined as the categorization of input data into identifiable classes via extraction of significant features or attributes of the data from a background of irrelevant detail.

Recognition is a basic feature of every living organism, and a pattern is the "description" of an object that affords its recognition from amongst other objects. For example, when we spot a known person among a host of people or recognize a voice amid a cacophony, we are using our pattern recognition capability. We can read handwriting and recognize music. A human being is a very sophisticated information processing system, partly because he possesses a superior pattern recognition capability [115]. Though human beings have very good pattern recognition ability, the human sensory system has certain limitations. Some undetectable patterns, or patterns which are more complex in nature, become difficult for human beings to recognize unaided; for instance, detecting the sound of a submarine in the presence of other marine signals and noise. There are also monotonous, routine pattern recognition jobs; these tedious tasks require huge manpower. One such case is the detection of defects in objects at a factory outlet. Information can be the most valuable asset to a human being only if he can extract potential or valuable knowledge from it. Human analysts can no longer keep track of, or make sense of, the enormous volume of information in this era of rapidly growing and advanced technology. Thus, it becomes essential to process this plethora of information efficiently and automatically. Information may be in the form of text, images, voice and raw data. To store, manage and process oodles of information in real time and accurately, we need automatic techniques like automatic pattern recognition. Speech recognition, fingerprint identification, optical character recognition, DNA sequence identification, medical diagnosis, and remote sensing data classification are some examples of pattern recognition.

A typical Pattern Recognition System (PRS) performs three tasks, namely data acquisition and/or preprocessing, feature analysis, and classification and/or clustering. In the first step, data are collected using sensors or other means, and the raw data may then be preprocessed. Preprocessing may involve noise reduction, normalization and conversion of the raw data into a form suitable for pattern recognition. After obtaining the data, good features are extracted by mapping the data to another domain, or a subset of good features is selected from the available features. This process finds useful features to obtain an efficient and improved solution to a given problem; the success of pattern recognition depends on the features used. Finally, in the classification and/or clustering phase, the actual task of the PRS is performed. Classification involves assigning a class label to a given pattern, while clustering finds homogeneous subgroups in data. However, all these components may not be essential in a PRS, and consecutive phases may be combined together. The design of a PRS depends on the problem at hand. If data are available for pattern recognition, then we may require schemes for feature extraction/selection and for classification/clustering. The tool that performs classification is called a classifier. In this thesis, we have primarily focused on two important tasks: classifier design and feature selection.

Since the very beginning, statistical approaches [44] have been used for pattern recognition. However, the classical statistical methods are not always well suited for difficult pattern recognition problems. Many alternative approaches have been introduced which can address these complex pattern recognition problems. A collective approach called Computational Intelligence [70] is one of them. These methods are quite useful and popular for building "intelligent" systems.

Computational intelligence includes mainly three tools: Artificial Neural Networks (ANNs), fuzzy logic and Evolutionary Algorithms (EAs). ANNs [57] are the most popular among computational intelligence methods. They have good learning and generalization abilities but sometimes lack interpretability and may work as a black box.

It is desirable to have decision-making systems with reasoning ability like that of human beings. It would also be better if linguistic rules could be used to describe a decision-making system. Fuzzy logic [68, 12] can fulfill these requirements to a great extent. It can handle uncertainty and vagueness in real-world problems.

Evolutionary algorithms (EAs) [6, 46, 72] evolve a desired solution to a given problem using biologically inspired operations like crossover and mutation and the Darwinian principle of the survival of the fittest. At first, a "population" of representations of possible solutions is typically (randomly) generated. Each representation is called a chromosome. Then, variation (genetic operations) and selection are applied to the current population to create the next population, in the hope that the new generation will be better than the previous one. The process of evolution is continued until the desired solution is obtained or the termination criteria are satisfied. Computation using evolutionary algorithms is called Evolutionary Computation.
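To make the cycle just described concrete, the following minimal sketch shows the generic evaluate-select-vary loop of an EA. It is an illustration only, not any specific algorithm from this thesis: the init, fitness, crossover and mutate callables, the truncation selection and the parameter values are all placeholder assumptions.

```python
import random

def evolve(init, fitness, crossover, mutate, pop_size=50, generations=100):
    """Generic EA cycle: evaluate, select, vary, replace."""
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        if fitness(population[0]) >= 1.0:        # desired solution reached
            break
        parents = population[: pop_size // 2]    # survival of the fittest
        offspring = [population[0]]              # keep a copy of the best
        while len(offspring) < pop_size:
            a, b = random.sample(parents, 2)     # variation: crossover + mutation
            offspring.append(mutate(crossover(a, b)))
        population = offspring
    return max(population, key=fitness)
```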

Like pattern recognition, an urge to find an appropriate or optimal solution is a basic feature of human beings. Optimization is a process to find the optimal solution for a given problem; we can search the solution space to find the optimal or appropriate solution. Many pattern recognition tasks can be viewed as search and optimization problems. For example, in the case of classifier design, we search the feature space to obtain the appropriate classifier(s). In most cases, we assume a structure of the classifier and try to optimize the parameters involved in it. During the optimization process, we either minimize or maximize a performance index. So, when we design a classifier, we may attempt to minimize the classification error of the classifier over a set of known samples called the training set.

EAs are basically randomized search and optimization techniques [46, 18]. Most traditional search and optimization techniques are appropriate only for certain types of problems. For example, calculus-based optimization methods are suitable for problems having a smooth, continuous, unimodal and differentiable (search) space. Enumerative search techniques are used when the number of possible solutions is finite and small. EAs, by contrast, are robust, efficient and adaptive. They have the ability to find near-optimal solutions in acceptable time for a wide range of optimization problems, and they are in great demand where traditional methods fail. EAs do not require that the search surface be continuous, unimodal and differentiable, and they can be used for problems where the search space is vast. Moreover, unlike traditional methods, they are population based search techniques. This helps to reach, or get near to, the global optimum by avoiding local optima, which is a great advantage of EAs. Due to all these advantages, EAs are widely used to solve complex real-world optimization problems.

Genetic Algorithms (GAs) [46, 18] are the most popular EAs. In GAs, each solution is represented by a finite-length string. GAs are usually used to optimize the parameters of a model such as a classifier, or to find a configuration (subset) of a certain set of variables.

Genetic Programming (GP) [72, 7] is another EA and is a variation of GAs. In GP, each solution is typically represented by a tree or a program. This difference makes it a tool suitable for structural optimization in addition to the inherent parameter optimization of a model. Hence, we do not need to assume a particular structure of the model if we use GP: it can find both the (near-optimal) structure and the values of the parameters involved, and it provides the expression of the model. We can not only employ the model of the system to solve the problem but can also analyze its expression. For this reason, GP is rapidly becoming popular. This distinct advantage of GP also motivated us to adopt it for different pattern recognition tasks. Before describing the scope of the thesis, we describe the motivation for each of the proposed methods in the next section.


1.2 Motivation

Classifier design is an important pattern recognition task. GP can learn the hidden relationships in the data and can provide the desired classifier, including its expression. This helps to analyze the classifier in addition to employing it to classify unknown data points. This feature of GP has attracted many researchers to use GP for classifier design. Most available GP based classifier design methods deal with two-class classification problems. Only a few researchers have developed classifiers for multi-class problems using GP [81, 24, 84, 67, 123]. These methods are interesting but usually require more than one GP run to develop classifiers for multi-class problems. This motivated me to propose an algorithm that can develop a classifier in a single GP run.

The success of a pattern recognition system depends on the features we use. In the literature, very few works are available where GP has been used for feature selection/extraction. The suitability of a feature subset depends not only on the problem being solved but also on the solution (classifier) that is used to solve the problem. So, if we design a classifier and select features for that classifier simultaneously, then we can obtain a better classifier. As the available GP based feature selection methods do not select features and design classifiers at the same time, we propose an algorithm for it. We explain this point as follows. Typically, when GP is used to design classifiers, different classifiers may use different sets of features, and they may not use all features. Hence, one can argue that a GP based scheme for classifier design does an implicit feature selection. However, since such a design does not explicitly consider the task of feature selection, in the worst case some classifiers may even use all features, and some may involve derogatory features. Such a design process does not penalize the use of a larger feature set because it considers only the classification accuracy. For example, if a classifier using all features performs slightly better (either on the training data or on a validation set) than another classifier that uses a very small number of features, then we consider the former one better, as our objective is to improve the classification accuracy. However, a better objective would be to obtain good accuracy using a small number of features. Such a classifier can lead to better interpretability and, since such a system will have fewer degrees of freedom, it is likely to yield better generalization. So, classifier design should be formulated as a multi-objective task giving explicit importance to the cardinality of the used feature set. Our approach considers both the classification accuracy and the size of the used feature subset explicitly while evaluating a classifier.

While designing models, we may incorporate fuzzy logic concepts to address vagueness and uncertainty in data. So, instead of evolving crisp classifiers, we can evolve fuzzy classifier rules using GP. In this case too, only a few attempts have been made to generate fuzzy classifier rules using GP. We propose a simple but effective scheme to evolve fuzzy classifier rules for multi-class classification problems. It optimizes both the structure of the rules and the values of the parameters involved.

So far we have considered the development of general methodologies. Next we consider a specific application area. Art is considered the most creative and innovative discipline, and computers are being used in artistic applications too. Generation of interesting images and textures by computer for various purposes, including fashion/textile design and animation, is a recent trend. Texture [113] is a property of the surface or structure of an object. A texture produced by an algorithm or a procedure is called a procedural texture [37]. For each point of the procedural texture, the procedure gives the corresponding gray level or color value. Although the procedural representation is extremely compact compared to the image representation, it is difficult to find or write procedures that will generate textures for some target application. Evolutionary algorithms [46, 72] are a possible solution to this major problem. For evolution towards better solutions (textures), we need to evaluate each solution (texture) to indicate how good it is. Despite its wide use, texture has no precise definition, so automatic evaluation of textures is not an easy task. In comparison, a human being can easily identify and assess a texture. If we allow the computer to generate textures and the user to determine which textures are good according to his/her choice during the process of generation, then interesting textures can be created. Generation of textures based on this principle is called interactive texture generation. However, interactive texture generation can be a tedious process if the user needs to assign a fitness value to every generated texture. Consequently, some methods generate textures similar to a given reference texture (or a set of reference textures). GP based schemes of this type are presented in [120], which evolve procedures to produce textures similar to a given reference texture. This is an interesting approach, but it requires reference texture(s) and produces only textures similar to the given reference texture(s); indeed, the most optimal texture would be identical to the given reference texture.


Both of these approaches have their limitations. Hence, we need to devise a hybrid approach that combines the advantages of both. This motivated us to propose a new approach using GP to generate textures. GP can generate procedures that produce interesting textures. The proposed GP system acts as a sort of pattern recognition system.

Texture classification is another pattern recognition task. We have already devised GP classifier systems; hence, we were curious to apply our GP classifier schemes to texture classification.

1.3 Scope of the thesis

In this thesis, I address certain pattern recognition tasks using Genetic Programming. These tasks are classifier design, simultaneous feature selection and classifier design, fuzzy rule based classifier evolution, and texture generation and classification. In this context, we have proposed various methodologies. The proposed schemes are validated on a set of benchmark real data sets, and the performance of the methods has been compared with some existing ones.

Chapter 1 and Chapter 2 act as the prologue to the thesis, where we briefly introduce the reader to the scope and nature of the work. The following chapters, however, are the heart of the thesis and contain the detailed studies of our research and methodologies. The introductory sections of these chapters begin with a brief literature survey of the works that follow. The contents and contributions of the chapters are summarized in the subsequent sections.

1.3.1 An Introduction to Pattern Recognition and Evolutionary Algorithms

In Chapter 2, we provide an introduction to pattern recognition and Evolutionary Algorithms (EAs). We place special emphasis on Genetic Programming, the tool we use for solving the tasks. We also briefly describe how EAs excel at optimization.


1.3.2 Classifier Design using GP

In Chapter 3, we begin with a short survey of previous works on classifier design using GP, and then we present our proposed GP based scheme for classifier design for multi-category classification problems. The proposed approach takes an integrated view of all classes when the GP evolves. A multi-tree representation of chromosomes is used: for a c-class problem, a chromosome (possible classifier) consists of c trees, each representing a classifier for a particular class. In this context, we propose a modified crossover operation and a new mutation operation that reduce the destructive nature of conventional genetic operations. We use a new concept of unfitness of a tree to select trees for genetic operations; this gives unfit trees more opportunity to become fit. Further, a new concept of OR-ing chromosomes in the terminal population is introduced, which enables us to get a classifier with better performance. Finally, a weight based scheme and some heuristic rules characterizing typical ambiguous situations are used for conflict resolution. The classifier is capable of saying "don't know" when faced with unfamiliar examples. The effectiveness of our scheme is demonstrated on several real data sets.

1.3.3 Simultaneous Feature Selection and Classifier Design using GP

In Chapter 4, we propose a GP system that performs feature selection and classifier design simultaneously. Although one can argue that GP does an implicit feature analysis, as explained earlier, if we do not explicitly add a penalty to the fitness function when more features are used, then the classifiers can use more features than are required. Some features may be redundant or irrelevant, and some may even be derogatory. Therefore, we should consider both the classification accuracy and the number of features used while evaluating a classifier. The goal is to design a better classifier using a small number of features. Here we propose this multi-objective approach that can simultaneously select features and design a classifier.
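A toy sketch of such a multi-objective fitness is given below. This is not the fitness function actually used in Chapter 4 (given there as Eq. 6); the penalty form and the weight alpha are hypothetical, chosen only to illustrate how accuracy can be traded against the cardinality of the feature subset.

```python
def fitness(accuracy: float, n_used: int, n_total: int, alpha: float = 0.1) -> float:
    """Toy multi-objective fitness: reward accuracy, penalize the fraction
    of available features that the classifier uses (alpha is hypothetical)."""
    return accuracy - alpha * (n_used / n_total)

# A classifier at 95% accuracy using 2 of 10 features now scores higher
# than one at 96% accuracy using all 10 features:
print(fitness(0.95, 2, 10))   # 0.93
print(fitness(0.96, 10, 10))  # 0.86
```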

At the beginning of the chapter, we give a brief literature survey on feature selection. Our GP scheme selects a good subset of features and simultaneously constructs a classifier using the selected features. For a c-class problem, it provides a classifier having c trees. In this context, we introduce two new crossover operations to suit the feature selection process. As a byproduct, our algorithm produces a feature-ranking scheme. We tested our method on several data sets having dimensions varying from 4 to 7129. We compared the performance of our method with results available in the literature and found that the proposed method produces consistently good results. To demonstrate the robustness of the scheme, we studied its effectiveness on data sets with known (synthetically added) redundant/bad features.

1.3.4 Evolution of Fuzzy Classifier Rules using GP

In Chapter 5, we propose a Genetic Programming (GP) based approach to evolve fuzzy rule based classifiers. For a c-class problem, a classifier consists of c trees. Each tree, $T_i$, of the multi-tree classifier comprises a set of rules for class i. During the evolutionary process, the inaccurate/inactive rules of the initial set are removed by a cleaning scheme. This allows good rules to survive and eventually determines the number of rules. Instead of using all features in a rule, our GP scheme begins with only a randomly selected subset of features and then evolves the features to be used in each rule. The initial rules are constructed using prototypes, which are generated randomly as well as by the fuzzy K-means (FKM) algorithm. Experiments are conducted in three different ways: using only randomly generated rules, using a mixture of randomly generated rules and FKM prototype based rules, and using exclusively FKM prototype based rules. Contrary to expectation, randomly generated rules work better than FKM based rules in most cases, and this emphasizes the novelty of the proposed scheme. In this context, we propose a new mutation operation to alter the rule parameters. Hence, the GP scheme not only optimizes the structure of the rules but also optimizes the parameters involved. This results in good fuzzy rule based classifiers. Moreover, the resultant fuzzy rules can be analyzed. The performance of the proposed scheme is quite satisfactory.

1.3.5 Texture Generation and Classification using GP

So far, we have developed general methodologies. In Chapter 6, we consider a specific application area. We present a new method to generate textures using GP, and we also use our GP classifier schemes for texture classification. Genetic Programming can evolve suitable procedures or mathematical functions to solve a problem. This advantage has been utilized to generate procedures that can produce interesting/desired textures. Our GP based texture generation scheme acts as a sort of pattern recognition system.

The initialization process of GP generates tree representations of procedures. Each generated procedure $T_i$ is activated to produce an image $S_i$ (a 2-dimensional gray value matrix). We use the contrast of the generated images/textures to filter out poor textures. This phase resembles the data acquisition/preprocessing phase. If $S_i$ passes the filtering process, a vector of statistical features $\mathbf{v}_i$ is extracted from $S_i$ to represent it.

Then the pattern is passed through the classification/clustering phase. The similarity of the texture $S_i$ with respect to the already generated textures is computed using the feature vector $\mathbf{v}_i$. If the texture is similar to a cluster of already generated textures, then it is placed in that cluster and the fitness value of that cluster is assigned to the procedure $T_i$. Otherwise, if the generated texture is quite dissimilar from the existing textures, it is displayed to the user, who assigns a fitness value according to his/her choice by visually inspecting it. The texture is then placed in a new cluster, and the fitness value of the texture is assigned to that cluster.

After a fitness value is assigned to each procedure $T_i$, genetic operations are applied and the evolutionary process is carried on. This produces many interesting textures according to the user's choice while only occasionally seeking the user's judgment.
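As a rough illustration of this generate-filter-evaluate pipeline, the sketch below filters a generated image by contrast, represents it by a statistical feature vector, and assigns a fitness by similarity to existing clusters. The contrast measure, the particular features, the similarity function and both thresholds are assumptions made for illustration; the measures actually used are described in Chapter 6.

```python
import numpy as np

def contrast_ok(img, threshold=0.05):
    """Filtering step: discard near-flat textures. Contrast is taken here
    as the standard deviation of gray values (an assumed measure)."""
    return img.std() > threshold

def feature_vector(img):
    """Assumed statistical features representing a texture."""
    return np.array([img.mean(), img.std(),
                     np.abs(np.diff(img, axis=0)).mean(),
                     np.abs(np.diff(img, axis=1)).mean()])

def assign_fitness(v, clusters, ask_user, sim_threshold=0.9):
    """If v is close to an existing cluster centre, inherit that cluster's
    fitness; otherwise ask the user to rate the texture and open a new cluster."""
    for c in clusters:
        if 1.0 / (1.0 + np.linalg.norm(v - c['centre'])) > sim_threshold:
            return c['fitness']
    fit = ask_user()                  # user visually inspects the new texture
    clusters.append({'centre': v, 'fitness': fit})
    return fit
```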

For texture classification, we extract statistical features to represent the textures. These features are carefully chosen because the success of the process (pattern classification) depends on the features we use. After representing the textures by the corresponding statistical feature vectors, we use our GP classifier schemes to design a classifier for texture classification. We have experimented with the texture classification task on a set of natural textures.


1.3.6 Conclusion and Scope of the future works

We conclude the thesis in Chapter 7. The preceding detailed discussion is presented in a nutshell, with comments on its respective merits and drawbacks. The chapter also includes a discussion on the future scope of the research work.


Chapter 2

An Introduction to Pattern Recognition and Evolutionary Algorithms

2.1 A n Introduction to Pattern Recognition

According to Duda and Hart [36], pattern recognition is a field concerned with machine recognition of meaningful regularities in noisy or complex environments. As mentioned earlier, a typical pattern recognition system consists of three components, namely data acquisition and/or preprocessing, feature analysis, and classification and/or clustering. Before going into detail about different pattern recognition tasks, I give the following example to explain pattern recognition.

Suppose a girl has never seen a cow or a goat. She also does not know what cows and goats look like. Now a teacher takes the girl to a playground where a host of cows and goats are grazing. Next, the teacher points to each animal and tells her whether it is a cow or a goat. In this case, each appearance of a cow or goat is called a pattern, or more particularly a training sample. The girl tries to learn the characteristics of cows and goats and may make the following observations on each animal.

1. Approximate height
2. Approximate length
3. Approximate length of tail
4. Shape of the head
5. Number of legs
6. Number of eyes
7. Number of ears

Such observations are called attributes or features. A pattern is represented by a set of features or attributes. The collection of data by sensing or measuring the features is called data acquisition.

Now the girl has been told that she may have to recognize cows and goats later. The girl then analyzes the above mentioned observations and finds that:

(i) Typically the height of a cow is larger than the height of a goat.

(ii) The length of a cow is usually larger than the length of a goat.

(iii) The tail of a cow is longer than the tail of a goat.

(iv) The shapes of the heads of a cow and a goat are different; usually the head of a cow is bigger than that of a goat.

(v) Both cow and goat have the same number of legs.

(vi) Both cow and goat have the same number of eyes.

(vii) Both cow and goat have the same number of ears.

Now, the girl gets the idea that for discriminating between a cow and a goat, the first four observations are important, while the last three features are not useful. The above analysis of observations/features to find the good features (for classification) is called feature analysis. The selection of good features from a given set of features is called feature selection. In this case, the first four good features are selected from the set of seven features.

On the way back, the teacher notices an animal (cow/goat) and asks the girl to identify whether it is a cow or a goat. With her previous experience, the girl takes interest only in the above four important features. She then tries to estimate how close the given features of the shown animal are to the corresponding features of a cow and a goat. That means she may observe how big the animal is, the length of its tail, and the shape of its head. With her previous learning, she might be able to say whether the animal is a cow or a goat. Here, cow and goat are called the classes to which the pattern may belong. This task of categorization of a given pattern into a known class is called classification. It is also called supervised classification because there is a need to supervise or teach the learning algorithm (the girl) prior to classification of unknown patterns. If c stands for the number of classes, then classification can be defined as follows: classification is the partition of the feature space into c subsets, one for each class, and the mapping of a given unknown pattern (belonging to one of these classes) or point to one of the subsets of the feature space.

In the case of clustering, there is no need for a supervisor (or a teacher). To explain clustering, the above example can be modified as follows. A boy who has never seen a cow or a goat is taken to a playground where the cows and goats are grazing. The boy is asked to group the similar animals. He may observe the above mentioned features of the animals and may analyze the features to find good features to discriminate the two types of animals. Further, he may choose the above mentioned four good features. Using these four features of the animals, he may be able to cluster the cows into one group and the goats into another group. This task is called clustering. Clustering may be defined as the grouping of similar given patterns into groups; it is a partitioning of a given data set.

In the following subsections, different phases o f a typical pattern recognition system are described.

2.1.1 Data Acquisition and/or Preprocessing

In this phase, data are collected by a set of sensors or by other means. Data may be numerical, linguistic, pictorial, signal/waveform, or any combination of these. The raw data obtained are then preprocessed, which may involve noise reduction, normalization and conversion of the raw data into a form suitable for the task, i.e., for pattern recognition. A pattern is typically represented by a vector of features. This type of data representation is called object data, and it is the most common in pattern recognition. There is another type of data structure that is rarely used in pattern recognition. It is called relational data and consists of the pairwise relations (similarities or dissimilarities) between each pair of objects.

2.1.2 Feature Analysis

Feature analysis (FA) is a process to find useful features to obtain an efficient and improved solution to a given problem. Not all available features are useful for the task at hand. Some of the features may be redundant, while some others may even be bad. Such features may cause confusion during the process of model (e.g., classifier) development, and they unnecessarily increase the complexity of the feature space, which in turn demands more computation time to find a solution to the given problem.

FA is a process of finding a map $\Phi : \mathcal{R}^p \rightarrow \mathcal{R}^q$ using a criterion $J$ on the given (training) data set. Typically, $q < p$, and this is called dimensionality reduction.

Depending upon the type of process, FA may be categorized into two basic types: feature extraction (FE) and feature selection (FS). Feature extraction is a method to generate a q-dimensional feature vector from a given p-dimensional input vector. It is not necessary that $q < p$; however, for pattern recognition $q < p$ is preferred. In other words, the original features are projected into a different space of lower dimensionality by using some criteria. The extracted features are linear or nonlinear combinations of the given set of features and may not bear the meanings of the original ones. Principal Component Analysis (PCA) [44] is a popular feature extraction method for pattern recognition.

Feature selection (FS) selects a subset of good features from the set of available features. Ideally, the feature selection process should select an optimum subset of features from the set of available features which is necessary and sufficient for solving the problem.

2.1.3 Classification and/or clustering

Classification and/or clustering is the actual task of a pattern recognition system. In the pattern classification task, we assume that there exist c groups or classes, denoted by $\omega_1, \omega_2, \ldots, \omega_c$. For a given pattern $\mathbf{x}$, we assign a class label $i \in \{1, 2, \ldots, c\}$ denoting that $\mathbf{x}$ belongs to class $\omega_i$. The abstract space that represents all possible data points is called the feature space. Basically, for a c-class classification task, the feature space is partitioned into c partitions, each representing a particular class. The model that defines the partition is called a classifier. Unfortunately, in the real world we do not have all possible data points to obtain the exact partition or classifier. Instead, we have a finite and usually small set of data points that provides partial information for the optimal design of models such as classifiers and feature selectors/extractors. Hence, it is assumed that these data points are representative of (the distribution of) the classes. This data set is typically called the training set. Each data point of the training set is associated with its corresponding class label. On the basis of the training set, we extract/select features and design the classifier. The algorithm or methodology learns from the information available in the training set, including class membership, and provides the model. Here, the algorithm is supervised by providing the class membership of the training samples while learning. This type of learning is called supervised learning.

After the classifier is obtained, it is used to classify unknown data points, i.e., patterns whose class labels are not known. In practice, before classifying unknown patterns, the classifier is tested on a set of data points called the test set. Although a class label is associated with each data point of the test set, we do not use the class label while classifying a test point. After classification of a test point, the class label estimated by the classifier is compared with the actual class label. On the basis of this comparison, the error/accuracy of the designed classifier is estimated. If the performance is satisfactory, the classifier is then used to classify unknown data points.

A classifier can be defined as follows. A classifier $D$ is a mapping $D : \mathcal{R}^p \rightarrow N_{hc}$, where $\mathcal{R}^p$ is the p-dimensional real space and $N_{hc}$ is the set of label vectors for a c-class problem, defined as $N_{hc} = \{ \mathbf{y} \in \mathcal{R}^c : y_i \in \{0, 1\} \;\forall i, \; \sum_{i=1}^{c} y_i = 1 \}$. For any vector $\mathbf{x} \in \mathcal{R}^p$, $D(\mathbf{x})$ is a vector in c dimensions with only one component equal to 1 and all others 0. In other words, a classifier is a function which takes a feature vector in p dimensions as input and assigns a class label to it.
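As a small illustration of this definition, the sketch below builds such a mapping $D$ from c per-class discriminant functions and returns one-hot label vectors. The discriminant functions themselves are hypothetical and stand in for whatever model (e.g., an evolved tree per class) defines the partition.

```python
from typing import Callable, List, Sequence

def make_classifier(discriminants: List[Callable[[Sequence[float]], float]]):
    """Build D: R^p -> N_hc from c discriminant functions: the output is a
    label vector with exactly one 1, for the class with the highest score."""
    def D(x: Sequence[float]) -> List[int]:
        scores = [g(x) for g in discriminants]
        winner = scores.index(max(scores))
        return [1 if i == winner else 0 for i in range(len(discriminants))]
    return D

# Hypothetical 2-class example in R^2:
D = make_classifier([lambda x: x[0] - x[1], lambda x: x[1] - x[0]])
print(D([2.0, 1.0]))  # [1, 0], i.e., class 1
```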

In clustering, we group the given data points into homogeneous subgroups. The difference between classification and clustering is that in classification we partition the feature space, while in clustering we partition the given data points into groups. Unlike classification, we do not require the class labels of the data points in clustering, and hence the process is also called unsupervised learning.

These are the three basic components of a typical pattern recognition system. However, all of the above mentioned processes may not be required for a particular pattern recognition system. For example, we can directly provide the measurements or values of the required features to the classifier; here the user intuitively decides the features and hence acts as a feature selector. Also, some phases may be combined together. In the above example, data acquisition and feature analysis are considered together. A pattern recognition system may have both classification and clustering schemes. In some cases, feature analysis and classification/clustering are performed simultaneously. We have proposed an algorithm in which feature selection and classifier design are performed simultaneously.

With this introduction to pattern recognition, we now discuss the main tool, EA, that we have used in this thesis to solve different pattern recognition tasks.

2.2 An Introduction to Evolutionary Algorithms

Evolution is the natural developmental process by which different kinds of living organisms develop from earlier forms. From the very dawn of human civilization, man has been engaged incessantly in his onward quest for perfection by closely observing the various traits of nature and consequently analyzing and deducing the logic behind such natural occurrences. He observed the flight of birds and, with an urge to defy gravity like them, developed the flying machine; without being self-complacent, he has been constantly improving on it till now. Fascinated by the way brain neurons function, he developed artificial neural networks (ANNs). With a similar urge, he developed algorithms which mimic the principle of natural evolution. These algorithms are called evolutionary algorithms.

Evolutionary algorithms (EAs) evolve a desired solution to a given problem using biologically inspired operations like crossover, mutation and the Darwinian principle of the survival of the fittest. Typically, a "population" of possible solutions is taken initially. Then, variation (genetic operations) and selection are applied to the current population to create the next population, in the hope that the new generation will be better than the previous generation. Computation using evolutionary algorithms is called Evolutionary Computation. Although the origin of evolutionary computation can be traced back to the late 1950s, it slowly became popular during the 1970s with major contributions from pioneer researchers like Holland, Rechenberg, Schwefel and Fogel [5]. Now, with powerful computers, the use of EAs is increasing rapidly.


Genetic Algorithms [46], Genetic Programming [72], Evolutionary Programming [6] and Evolutionary Strategies [6] are the four major evolutionary algorithms. These algorithms are discussed in the following sections.

2.2.1 Genetic Algorithms

Genetic Algorithms (GAs) [46, 18], introduced by Holland and subsequently studied by researchers like De Jong and Goldberg, are the most popular evolutionary algorithms. GAs are mainly used for optimization problems.

In GAs, solutions are encoded as finite-length strings of alphabets of a certain cardinality. These strings are called chromosomes, the alphabets are referred to as genes, and the values of the genes are called alleles. In most cases, chromosomes are binary strings consisting of the alphabets "1" and "0". In a problem such as the traveling salesman problem, a chromosome represents a route, and a gene may represent a city.

Initially, a population (set) of solutions is randomly generated for evolution. In the process of evolution and natural selection, the good solutions survive and produce offspring while the bad solutions die out (are removed). To determine the goodness of a solution, each solution is evaluated using an objective function, and the evaluated value is assigned as the fitness value of that solution. Instead of using an objective function, a subjective evaluation may be used, where the user chooses better solutions over worse ones.

The following steps are carried out in a Genetic Algorithm.

1. Initialization: A population of solutions is randomly generated across the search space. However, if domain-specific knowledge is available, it can be incorporated.

2. Evaluation: Each solution of the population is evaluated, and the evaluated fitness value is assigned to the solution.

3. Based on the fitness values, better solutions are selected for genetic operations.

4. Usually two chromosomes are taken for crossover with probability $p_c$. That means a random number $r \in [0, 1]$ is generated; if $r < p_c$, then these two chromosomes undergo crossover. In the crossover operation, sub-strings of both chromosomes (parents) are randomly taken and swapped between them. Usually, $p_c$ is very high ($\approx 0.95$).

5. After the crossover operation, each alphabet of each chromosome is allowed to mutate with mutation probability $p_m$. Usually, $p_m$ is very low ($\approx 0.001$). In the mutation operation, the alphabet is altered: in a binary string, "1" is inverted to "0" and vice versa.

6. During evolution, there is a chance that the best chromosome of the GA population may be destroyed by the genetic operations. Hence, to avoid this, a copy of the best chromosome is always preserved during evolution.

7. Steps 2-6 are repeated until the termination criteria are satisfied.
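A minimal sketch of steps 1-7 for binary chromosomes follows. The objective (one-max) is a placeholder, a two-way tournament is used in step 3 as a common simplification of fitness-based selection, and the parameter values mirror the text.

```python
import random

def run_ga(fitness, n_bits=20, pop_size=30, pc=0.95, pm=0.001, generations=100):
    """Minimal binary GA: random init, selection, one-point crossover with
    probability pc, per-bit flip mutation with probability pm, and elitism."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        best = pop[0][:]                              # step 6: keep the best

        def select():                                 # step 3: pick the fitter
            a, b = random.sample(pop, 2)              # of two random solutions
            return (a if fitness(a) > fitness(b) else b)[:]

        nxt = [best]
        while len(nxt) < pop_size:
            p1, p2 = select(), select()
            if random.random() < pc:                  # step 4: crossover
                cut = random.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):                    # step 5: bit-flip mutation
                nxt.append([b ^ (random.random() < pm) for b in child])
        pop = nxt[:pop_size]
    return max(pop, key=fitness)

# One-max placeholder objective: maximize the number of 1s in the string.
print(sum(run_ga(sum)))
```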

2.2.2 Evolutionary Programming

Evolutionary Programming (EP) [5, 63] was introduced by Fogel in 1961. The purpose was to create artificial intelligence in the sense of developing the ability to predict changes in an environment. EP uses the concepts of natural evolution, selection and stochastic mutation. It does not use a crossover operation. EP was originally meant for evolving finite state machines (FSMs) to predict events on the basis of former observations. An FSM transforms a sequence of input symbols into a sequence of output symbols based on a finite set of states and state transition rules.

Nowadays, EP methodologies are applied to many discrete and continuous parameter optimization problems [6]. As in other EAs, to evolve the desired (optimum) solution, a population of $\mu$ possible solutions is generated. Depending on the problem domain, the representation of solutions varies. Each solution of the population is evaluated. The mutation operation is then applied to the solutions to produce $\mu$ offspring. In the case of FSMs, there are five possible mutation operators: change of an output symbol, change of a state transition, addition of a state, deletion of a state, and change of the initial state. After evaluating the $\mu$ offspring, a selection of the $\mu$ best out of parents and offspring, i.e., a $(\mu + \mu)$-selection, is performed. EP implements a probabilistic selection method to select individuals from the $(\mu + \mu)$ individuals for the next generation: each individual is compared with $q > 1$ other randomly chosen individuals from the union of parents and offspring. For each comparison, a "win" is assigned if the individual's score is better than or equal to that of its opponent. Then the $\mu$ individuals with the greatest number of wins are retained to be parents of the next generation [6].
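A sketch of this $(\mu + \mu)$ selection by q-fold comparison might look as follows. The fitness function is a placeholder, and sampling opponents from the full union (so an individual can occasionally meet itself) is a simplification.

```python
import random

def ep_select(parents, offspring, fitness, q=10):
    """(mu + mu) EP selection: each individual meets q randomly chosen
    opponents and scores a 'win' whenever its fitness is better than or
    equal to the opponent's; the mu individuals with the most wins survive."""
    union = parents + offspring

    def wins(individual):
        opponents = random.sample(union, q)
        return sum(fitness(individual) >= fitness(opp) for opp in opponents)

    return sorted(union, key=wins, reverse=True)[: len(parents)]
```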

2.2.3 Evolutionary Strategies

Evolutionary Strategies (ES) [98, 6, 63] were proposed by Rechenberg and Schwefel in the 1960s as a method to solve parameter optimization problems.

Initially, ES was based on a population consisting of only one individual and only one genetic operator: mutation. The individual was represented by a pair of float-valued vectors, i.e., $v = (\mathbf{x}, \sigma)$, where $\mathbf{x}$ represents a point in the search space and $\sigma$ is a vector of standard deviations. Mutation is implemented by replacing $\mathbf{x}$ by

$\mathbf{x}^{t+1} = \mathbf{x}^{t} + N(0, \sigma)$

where $N(0, \sigma)$ is a vector of independent random Gaussian numbers with mean zero and standard deviations $\sigma$. If the offspring (mutated individual) is better (has higher fitness) than the parent, it replaces the parent; otherwise, the parent is retained and the offspring is discarded.

Later on, population based (multi-membered) ESs were introduced. Here $\mu$ parents produce $\lambda$ offspring. In the $(\mu, \lambda)$-ES, $\mu$ individuals are selected from the $\lambda$ offspring for the next generation. This method is not elitist, and we may get worse individuals in the next generation. It has been noted in [6] that this may help to leave the region of attraction of a local optimum and reach a better optimum. In contrast, the $(\mu + \lambda)$-ES selects $\mu$ individuals from the combined $(\mu + \lambda)$ individuals for the next generation. This retains the best individuals of the previous generation, and hence the best individuals are not lost during the process of evolution (variation).

In these population based ESs, the strategy parameter $\sigma$ is no longer constant. Rather, it is incorporated in the structure of the individual itself and undergoes the evolutionary process. This facilitates the self-adaptation of the parameters. To produce an offspring, usually two parents are selected and a crossover operation is applied to them. Suppose the two selected parents are:

$(\mathbf{x}^1, \sigma^1) = ((x_1^1, \ldots, x_n^1), (\sigma_1^1, \ldots, \sigma_n^1))$ and $(\mathbf{x}^2, \sigma^2) = ((x_1^2, \ldots, x_n^2), (\sigma_1^2, \ldots, \sigma_n^2))$.

The crossover operation blends the two parents and may be of different types. In one type, components from the first or second parent are selected to produce the offspring $(\mathbf{x}, \sigma)$ as below:

$(\mathbf{x}, \sigma) = ((x_1^{q_1}, \ldots, x_n^{q_n}), (\sigma_1^{q_1}, \ldots, \sigma_n^{q_n}))$, where $q_i = 1$ or $2$.

In the other type, averages of the components of the parents are taken:

$(\mathbf{x}, \sigma) = (((x_1^1 + x_1^2)/2, \ldots, (x_n^1 + x_n^2)/2), \; ((\sigma_1^1 + \sigma_1^2)/2, \ldots, (\sigma_n^1 + \sigma_n^2)/2))$

After producing an offspring $(\mathbf{x}, \sigma)$ by the crossover operation, the mutation operation is applied to it. The resultant offspring is $(\mathbf{x}', \sigma')$, where

$\sigma' = \sigma \cdot e^{N(0, \Delta\sigma)}, \quad \mathbf{x}' = \mathbf{x} + N(0, \sigma')$

where $\Delta\sigma$ is a parameter of the method.
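The sketch below produces one ES offspring by combining the intermediate (averaging) crossover with the self-adaptive mutation given above. The parent values and the choice of $\Delta\sigma$ are illustrative only.

```python
import math
import random

def es_offspring(parent1, parent2, delta_sigma=0.1):
    """Intermediate crossover of two (x, sigma) pairs followed by
    self-adaptive mutation: sigma' = sigma * exp(N(0, delta_sigma)),
    x' = x + N(0, sigma')."""
    (x1, s1), (x2, s2) = parent1, parent2
    x = [(a + b) / 2 for a, b in zip(x1, x2)]   # average object variables
    s = [(a + b) / 2 for a, b in zip(s1, s2)]   # average step sizes
    s = [si * math.exp(random.gauss(0, delta_sigma)) for si in s]
    x = [xi + random.gauss(0, si) for xi, si in zip(x, s)]
    return x, s

# Hypothetical parents in R^2 with per-component step sizes:
child_x, child_s = es_offspring(([0.0, 1.0], [0.5, 0.5]),
                                ([1.0, 0.0], [0.3, 0.3]))
```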

2.2.4 Genetic Programming

In the 1980s, researchers like S. F. Smith (in 1980) and N. Cramer (in 1985) proposed variations of genetic algorithms that can evolve computer codes. However, with Koza's contributions in the 1990s, the approach rapidly became popular. This algorithm has been given the name Genetic Programming.

Since the thesis is mainly based on GP, we provide a detailed discussion on it.

Genetic Programming [72, 7, 73] evolves a population of computer programs, which are possible solutions to a given optimization problem, using the Darwinian principle of the survival of the fittest. It uses biologically inspired operations like reproduction, crossover and mutation. GP is a variation of the Genetic Algorithm (GA); the main difference between GP and GA is the representation. In GP, each solution is typically represented by a tree. In a few research papers, different structures such as linear structures and graph structures are also used. I focus on the tree representation of GP solutions for three reasons: it is the standard representation, it is the most popular, and I use this representation in my thesis. However, the concept of GP is the same for all representations.

I give a schematic overview of the GP algorithm in the next section and details of GP in subsequent sections.


Steps of GP:

A typical implementation of GP involves the following steps.

Step 1) Initialization: GP begins with a randomly generated population of solutions of size P.

Step 2) Termination: GP is terminated when the termination criteria are satisfied. Unlike GA, GP will not converge; so GP is terminated when a desired solution (for example, one with fitness value 1) is achieved. Otherwise, it is terminated after a predefined number of generations.

Step 3) Evaluation: A fitness value is assigned to each solution of the population.

Step 4) Next generation: The current population is replaced by a new population by applying genetic operations probabilistically.

Step 5) This completes one generation. Go to Step 2 and repeat if the termination criteria are not satisfied.

Initialization:

Each individual or solution in the GP population is generally represented by a hierarchically structured program or a tree composed of functions and data/terminals appropriate to the problem domain. The function set may contain

• standard arithmetic operators: +, -, *, /

• mathematical functions: Sin, Cos, Exp, Log, ...

• Boolean Operators: AND, OR, NOT, ...

• Conditional: If-Then-Else, ...

• Relations: <, <=, =, >=, >

• Iterations and Loops: Do-Until, While-Do, For-Do

• Control Transfer Statements: Go To, Call, Jump

• Domain specific functions: Move-Random, If-Food-Ahead, ...


Figure 2.1: (a) Division by 0 and (b) square root of a negative number

The terminal set usually consists of arguments and constants for the functions. The set of functions $\mathcal{F}$ and the set of terminals/inputs $\mathcal{S}$ must satisfy the closure and sufficiency properties.

Closure Property: The closure property demands that the function set be well defined and closed for any combination of arguments that it may encounter. As a tree (or expression) is generated randomly and afterwards is also randomly changed, a function (parent node) of a tree or expression may encounter different types of arguments (children nodes). For example, when we use the division function to generate a tree in a random manner, a division-by-0 situation may occur (e.g., Fig. 2.1 (a)). To cope with this type of undefined case, we have to define the division function properly. Similarly, the square root can encounter a negative argument (e.g., Fig. 2.1 (b)), and the logarithm function can encounter a non-positive argument (in a randomly generated expression or tree). In these cases, we must satisfy the closure property by using protected functions which can handle these types of situations.
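For instance, protected versions of these functions might be defined as below. The values returned in the undefined cases follow a common GP convention and are assumptions; the thesis does not fix them at this point.

```python
import math

def pdiv(a, b):
    """Protected division: defined for a zero denominator, so the closure
    property holds for randomly built trees."""
    return a / b if b != 0 else 1.0

def psqrt(a):
    """Protected square root: operates on the absolute value."""
    return math.sqrt(abs(a))

def plog(a):
    """Protected logarithm: guards against non-positive arguments."""
    return math.log(abs(a)) if a != 0 else 0.0
```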

Sufficiency Property: The sufficiency property requires that the set of functions in $\mathcal{F}$ and the set of terminals in $\mathcal{S}$ be able to express a solution to the problem [72]. For example, the function set $\mathcal{F}$ = {AND, OR, NOT} is sufficient to represent any Boolean expression. However, $\mathcal{F}$ = {AND, OR} is not sufficient to build all possible Boolean expressions.

After determining the function set $\mathcal{F}$ and the terminal set $\mathcal{S}$, a set of tree structures is (randomly) generated as the initial population. To prevent the generation of large tree structures, a restriction on the size of a tree is imposed; that is, the depth of a tree is restricted.
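To illustrate depth-restricted random initialization, here is a small sketch that grows an expression tree over an assumed function/terminal set and evaluates it. The sets, the depth limit and the early-stop probability are illustrative choices, not the settings used in the thesis.

```python
import random

FUNCTIONS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b,
             '*': lambda a, b: a * b,
             '/': lambda a, b: a / b if b != 0 else 1.0}  # protected division
TERMINALS = ['x1', 'x2', 1.0]

def grow(depth=0, max_depth=4):
    """Grow a random tree (nested tuples), honoring the depth restriction."""
    if depth >= max_depth or (depth > 0 and random.random() < 0.3):
        return random.choice(TERMINALS)               # stop with a terminal
    op = random.choice(list(FUNCTIONS))
    return (op, grow(depth + 1, max_depth), grow(depth + 1, max_depth))

def evaluate(node, env):
    """Evaluate a tree for an assignment of the terminal variables."""
    if isinstance(node, tuple):
        op, left, right = node
        return FUNCTIONS[op](evaluate(left, env), evaluate(right, env))
    return env.get(node, node)                        # variable or constant

tree = grow()
print(tree, evaluate(tree, {'x1': 2.0, 'x2': 3.0}))
```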
