
Indian Journal of Engineering & Materials Sciences Vol. 8, December 2001, pp. 327-340

Development of soft computing models for data mining

S N Sivanandam, A Shanmugam, S Sumathi & K Usha

Department of Electrical and Electronics Engineering, PSG College of Technology, Coimbatore 641 004, India

Received 28 June 2000; accepted 3 September 2001

The increasing amount and complexity of today's data available in science, business, industry and many other areas creates an urgent need to accelerate the discovery of knowledge in large databases. Such data can provide a rich resource for knowledge discovery and decision support. To understand, analyze and eventually use this data, a multidisciplinary approach called data mining has been proposed. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Pattern classification is one particular category of data mining, which enables the discovery of knowledge from very large databases (VLDB). In this paper, mining the database through pattern classification has been done by utilizing two important mining tools, the K-Nearest Neighbour algorithm and decision trees. The K-Nearest Neighbour (K-NN) algorithm is a popularly used conventional statistical approach for data mining. K-NN is a technique that classifies each record in a data set based on a combination of the classes of the K records most similar to it in a historical data set. The fuzzy version of K-NN, and crisp and fuzzy versions of nearest prototype classifiers, have also been proposed. The decision tree is one of the best machine learning approaches for data mining. A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Briefly, decision trees are tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a data set. Classification and Regression Tree (CART) and Iterative Dichotomiser 3 (ID3) are the two decision tree methods used in this paper. The classification rules have been extracted in the form of IF-THEN rules. A performance analysis of the K-NN methods and the tree-based classifiers has been done. The proposed methods have been tested on three applications: Landsat imagery, letter image recognition and optical recognition of handwritten digits data. The simulation algorithms have been implemented using C++ under the UNIX platform.

With the wide use of the advanced database technology developed during the past decades1-4, it is not difficult to efficiently store huge volumes of data in computers and retrieve them whenever needed. Although the stored data are a valuable asset of any application, sooner or later most users face the problem of being data rich but knowledge poor. This situation aroused the recent surge of research interest in the area of data mining. Pattern classification is a well-recognized data mining operation used extensively for decision-making. Classification is the process of finding the common properties among different entities and classifying them into classes.

The methods popularly used by researchers in the area of machine learning include neural networks and decision trees. Decision trees are predictive models that are extensively used for pattern classification and decision-making. They provide the easily understandable and accurate decision-making needed for real-life applications. The decision tree's intelligence is obtained in the form of production rules, which enables the tree to have lower classification time and better classification capability. The Nearest Neighbour classifier, a statistical approach that is a widely used conventional method for classification, has also been used. In this paper, the K-NN and nearest prototype classifiers, in crisp and fuzzy versions, are proposed for data mining.

The data mining process consists of three major steps:

(i) Data preparation: Data is selected, cleaned and preprocessed under the guidance and knowledge of domain experts, who capture and integrate both the internal and external data into a comprehensive view that encompasses the whole organization.

(ii) Data mining algorithm: A data mining algorithm is used to mine the integrated data and make it easy to identify any valuable information.

(iii) Data analysis phase: The data mining output is evaluated to see whether domain knowledge has been discovered, in the form of rules extracted from the network. The overall data mining process is shown in Fig. 1.

Fig. 1-Overall data mining process (data preparation, data mining algorithm, data analysis)


Data mining techniques

A variety of data mining techniques and algorithms are available, each with its own strengths and weaknesses. The data mining tools available are: (i) Statistical approaches, (ii) Machine learning approaches, (iii) Neural networks, (iv) Rule induction, (v) Database systems, (vi) Rough sets and (vii) Data visualization.

In this work, the K-NN method, a modern statistical approach, and decision trees, the most commonly used machine learning approach, are used as data mining tools.

Data mining categories3,4

The goal of data mining or knowledge discovery is to discover hidden relationships, patterns or correlations in the data stored in a database. There are five categories in the data mining process, viz., (i) Summarization, (ii) Classification, (iii) Clustering, (iv) Association and (v) Trend analysis.

In this paper, the classification category, one of the important problems in data mining, is dealt with.

Pattern Classification using K-Nearest Neighbour Methods5-7

Classification of objects is an important area of research and application in a variety of fields. The K-Nearest Neighbour (K-NN) decision rule has often been used in pattern recognition and classification problems. The basic idea behind the K-NN is very straightforward. For training, all input and output pairs in the training set are stored into a database.

When a classification is needed on a new input pattern, the answer is based on the K nearest training patterns in the database. It is the simplest method used for pattern classification, which classifies the unknown record based on the K nearest records from the historical database. K-NN requires no training time other than the time required for the preprocessing and storing of the entire training set. It is very memory intensive since the entire training set is stored. Classification is slow since the distance between the input pattern and all patterns in the training set must be computed. The flow chart (Fig. 2) shows the simple classification procedure used by the K-NN method.

Common training parameters for K-NN

(i) Number of Nearest Neighbours: This is the number of nearest Neighbours, (K), used to classify the input patterns.

(ii) Input compression: Since K-NN is very storage intensive, data patterns can be compressed as a preprocessing step before classification. Typically, using input compression will result in slightly worse performance (as resolution in the input data is lost). Sometimes using compression will improve performance because it performs automatic normalization of the data, which can equalize the effect of each input in the Euclidean distance measure. The K-NN algorithm should be used without any compression unless there is a memory problem.

The Fuzzy version of K-NN algorithm is also used to improve the classification performance. In addition, the Crisp and Fuzzy Nearest prototype classifier techniques are also proposed. In Nearest Prototype technique, a typical pattern of each class is chosen, and the unknown vector is assigned to the class of its closest prototype.

Crisp K-NN pattern classifier5,6

The crisp K-NN classification rule assigns an input sample vector y, which is of unknown classification, to the class of its nearest Neighbour. This idea has been extended to the K-nearest Neighbours with the vector y being assigned to the class that is represented by a majority amongst the K-nearest Neighbours. If a tie exists, the sample vector is assigned to the class, of those classes that tied, of which the sum of distances from the sample to each Neighbour in the class is a minimum. Due to space limitations the crisp K-NN algorithm5 is not listed here.
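Since the crisp K-NN algorithm itself is omitted here for space, a minimal sketch of the rule just described is given below for illustration. The data layout (dense numeric attribute vectors with integer class labels) and the function names are assumptions made for this sketch, not the authors' C++ implementation; ties are resolved naively rather than by the distance-sum rule described above.

```cpp
// Minimal crisp K-NN sketch (illustrative only).
#include <algorithm>
#include <cmath>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

struct Sample { std::vector<double> x; int label; };

double euclidean(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += (a[i] - b[i]) * (a[i] - b[i]);
    return std::sqrt(s);
}

// Classify y by a majority vote among its K nearest stored samples.
int classifyKNN(const std::vector<Sample>& train, const std::vector<double>& y, int K) {
    // Pair each training sample with its distance to y.
    std::vector<std::pair<double, int>> dist;            // (distance, label)
    for (const Sample& s : train) dist.push_back({euclidean(s.x, y), s.label});
    std::partial_sort(dist.begin(), dist.begin() + K, dist.end());

    // Count votes among the K nearest neighbours.
    std::map<int, int> votes;
    for (int j = 0; j < K; ++j) ++votes[dist[j].second];
    int best = dist[0].second, bestCount = 0;
    for (const auto& v : votes)
        if (v.second > bestCount) { best = v.first; bestCount = v.second; }
    return best;   // a tie could instead be broken by the distance-sum rule
}

int main() {
    std::vector<Sample> train = {{{0.0, 0.0}, 0}, {{0.1, 0.2}, 0}, {{1.0, 1.0}, 1}, {{0.9, 1.1}, 1}};
    std::cout << "class = " << classifyKNN(train, {0.2, 0.1}, 3) << '\n';   // prints 0
}
```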

Fuzzy K-NN pattern classifier5,6

The theory of fuzzy sets is introduced into the K-Nearest Neighbour technique to develop a fuzzy version of the algorithm. This decision rule provides a simple non-parametric procedure for the assignment of a class label to the input pattern, based on the membership values in each class given by the class labels of the K closest Neighbours of the vector.

Fig. 2-Flow chart for the K-Nearest Neighbour algorithm: (training) store all input/output pairs in the training set; (testing) for each pattern in the test set, search for the k nearest patterns to the input pattern using a Euclidean distance measure, compute the confidence for each class i as Ci/k (where Ci is the number of patterns among the k nearest patterns belonging to class i), and assign the input pattern to the class with the highest confidence



Reasons for introducing fuzziness into the K-NN algorithm

(i) Each of the sample vectors is considered equally important in the assignment of the class label to the input vector. This frequently causes difficulty in those places where the sample sets overlap.

(ii) Once input vector is assigned to a class, there is no indication of its strength of membership in that class.

It is these two problems in the K-NN algorithm that have led to the incorporation of fuzzy set theory into the K-NN algorithm.

A fuzzy K-NN algorithm is developed utilizing fuzzy memberships of the sample sets and thus producing a fuzzy classification rule. Two different methods of assigning fuzzy memberships for the training sets are used.

Classification using fuzzy K-NN

The fuzzy K-NN algorithm assigns class memberships to a sample vector rather than assigning the vector to a particular class. The advantage is that the algorithm makes no arbitrary assignments. In addition, the vector's membership values provide a level of assurance to accompany the resultant classification. The basis of the algorithm is to assign membership as a function of the vector's distance from its K nearest neighbours and of those neighbours' memberships in the possible classes. The fuzzy algorithm is similar to the crisp version in the sense that it must also search the labelled sample set for the K nearest neighbours. Two methods of assigning memberships to the labelled samples are:

Membership functions:

#1. \mu_i(x) = 1 if x \in class i; 0 if x \notin class i

#2. \mu_i(x) = x^2/(x^2 + 1) for x \ge 0; 0 for x < 0

The first membership function assigns the samples complete membership in their known class and non-membership in all other classes. The second assigns the samples memberships based on their distance from the class mean; the resulting memberships are then used in the classifier. The fuzzy K-NN algorithm5 is used here.

\mu_i(x) = \frac{\sum_{j=1}^{K} \mu_{ij}\left(1/\lVert x - x_j\rVert^{2/(m-1)}\right)}{\sum_{j=1}^{K} \left(1/\lVert x - x_j\rVert^{2/(m-1)}\right)}    ... (1)

As seen in Eq. (1), the assigned memberships of x are influenced by the inverse of the distances from the nearest neighbours and by their class memberships. The inverse distance serves to weight a vector's membership more if it is closer and less if it is farther from the vector under consideration. The labelled samples can be assigned class memberships in several ways.

Here the variable 'm' plays an important role in assigning membership values in each class for the unclassified pattern. The variable 'm' determines how heavily the distance is weighted when calculating each neighbour's contribution to the membership value. If 'm' is two, then the contribution of each neighbouring point is weighted by the reciprocal of its distance from the point being classified. As 'm' increases, the neighbours are more evenly weighted, and their relative distances from the point being classified have less effect. As 'm' approaches one, the closer neighbours are weighted far more heavily than those farther away, which has the effect of reducing the number of points that contribute to the membership value of the point being classified. The fuzzy algorithm dominates the crisp version in terms of lower error rates, and the resulting memberships give a confidence measure for the classification.
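As an illustration of how Eq. (1) is evaluated, the sketch below computes the class memberships of an unclassified pattern from the memberships and distances of its K nearest neighbours. The container layout and the small guard against zero distances are assumptions added for this sketch; it is not the authors' implementation.

```cpp
// Sketch of the fuzzy K-NN membership assignment of Eq. (1).
#include <algorithm>
#include <cmath>
#include <iostream>
#include <vector>

// memberships[j][i] = membership of the j-th nearest neighbour in class i,
// dists[j]          = distance of that neighbour from the pattern x.
std::vector<double> fuzzyKnnMemberships(const std::vector<std::vector<double>>& memberships,
                                        const std::vector<double>& dists, double m) {
    const std::size_t K = dists.size(), C = memberships[0].size();
    std::vector<double> mu(C, 0.0), w(K);
    double denom = 0.0;
    for (std::size_t j = 0; j < K; ++j) {
        // Inverse-distance weight 1 / ||x - x_j||^(2/(m-1)), guarded for d = 0.
        double d = std::max(dists[j], 1e-12);
        w[j] = 1.0 / std::pow(d, 2.0 / (m - 1.0));
        denom += w[j];
    }
    for (std::size_t i = 0; i < C; ++i) {
        double num = 0.0;
        for (std::size_t j = 0; j < K; ++j) num += memberships[j][i] * w[j];
        mu[i] = num / denom;                 // Eq. (1)
    }
    return mu;
}

int main() {
    // Three neighbours, two classes, crisp neighbour memberships (function #1).
    std::vector<std::vector<double>> memb = {{1, 0}, {1, 0}, {0, 1}};
    std::vector<double> mu = fuzzyKnnMemberships(memb, {0.5, 0.8, 1.5}, 2.0);
    std::cout << "class 0: " << mu[0] << ", class 1: " << mu[1] << '\n';
}
```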

Nearest prototype classifiers (NPC)5,7

These classifiers bear a marked resemblance to the one-nearest neighbour (1-NN) classifier. Actually, the only difference is that for the nearest prototype classifier the labelled samples are a set of class prototypes, whereas the nearest neighbour classifier uses a set of labelled samples that are not necessarily prototypical. The prototypes used for these routines are taken as the class means of the labelled sample set.

Crisp nearest prototype classifier (NPC)

Crisp NPC classifies the new pattern into the class of the nearest of the c class prototypes, by calculating the distance between the new pattern and each of the c prototypes and finding the prototype nearest to the pattern.


Due to space limitations the crisp NPC algorithm5 is not listed here.
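For illustration, a minimal sketch of the crisp nearest prototype classifier described above is given below, with the prototypes taken as the class means of the labelled set. The data structures and function names are assumptions made for this sketch only.

```cpp
// Minimal crisp nearest prototype classifier sketch: prototypes = class means.
#include <iostream>
#include <map>
#include <vector>

struct Sample { std::vector<double> x; int label; };

// Compute one prototype (mean vector) per class.
std::map<int, std::vector<double>> classMeans(const std::vector<Sample>& data) {
    std::map<int, std::vector<double>> sum;
    std::map<int, int> count;
    for (const Sample& s : data) {
        if (sum[s.label].empty()) sum[s.label].assign(s.x.size(), 0.0);
        for (std::size_t i = 0; i < s.x.size(); ++i) sum[s.label][i] += s.x[i];
        ++count[s.label];
    }
    for (auto& p : sum)
        for (double& v : p.second) v /= count[p.first];
    return sum;
}

// Assign y to the class of the nearest prototype (squared Euclidean distance).
int nearestPrototype(const std::map<int, std::vector<double>>& protos, const std::vector<double>& y) {
    int best = -1; double bestD = 1e300;
    for (const auto& p : protos) {
        double d = 0.0;
        for (std::size_t i = 0; i < y.size(); ++i) d += (y[i] - p.second[i]) * (y[i] - p.second[i]);
        if (d < bestD) { bestD = d; best = p.first; }
    }
    return best;
}

int main() {
    std::vector<Sample> data = {{{0.0, 0.0}, 0}, {{0.2, 0.2}, 0}, {{1.0, 1.0}, 1}, {{1.2, 0.8}, 1}};
    auto protos = classMeans(data);
    std::cout << "class = " << nearestPrototype(protos, {0.9, 0.9}) << '\n';   // prints 1
}
```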

Fuzzy nearest prototype classifier (NPC)

This is similar to the crisp nearest prototype classifier and the only difference is that here membership values are assigned to the prototypes.

Depending on the nearest prototype, the new input pattern is assigned membership values in all the classes and is classified into the class of maximum membership. The fuzzy NPC algorithm5 is used.

Membership function:

\mu_i(x) = \frac{\mu_{ij}\left(1/\lVert x - Z_j\rVert^{2/(m-1)}\right)}{\sum_{j=1}^{K} \left(1/\lVert x - Z_j\rVert^{2/(m-1)}\right)}    ... (2)

In both K-NN and nearest prototype techniques, the fuzzy version dominates its crisp counterpart in lower error rates and this statement has been proved experimentally in the forthcoming sections.

Induction of Decision Trees8-16

Decision trees are popular for pattern recognition because the models they produce are easier to understand. Decision tree based models have a simple top down tree structure where decisions are made at each node. These decisions generate rules for the classification of a data set. A decision tree model is built starting with a root node. Training data is partitioned to the children nodes using a splitting rule.

The nodes at the bottom of the resulting tree provide the final classification.

In classification, the training data contains sample vectors that have one or several measurement variables (or features) and one variable that determines the class of the sample. A splitting rule can be of the form: if A < c then s belongs to L, otherwise to R. Here A is the selected variable, c is a constant, s is the data sample, and L and R are the left and right branches of the node. Here splitting is done using one variable, and a node has two branches and thus two children. A node can also have more branches, and the splitting can be done based on several variables. The tree is constructed until the purity of the data in each leaf node is at a predefined level, or until leaf nodes contain a predefined minimum number of data samples. Each leaf node is then labelled with a class.

Usually the class of the node is determined by a majority rule: the node is labelled with the class to which the majority of the training data at that node belong.

Key issues for any decision tree

The criteria used to build the tree: This is the first step in the process of classification. Any algorithm seeks to create a tree that works as perfectly as possible on all the available data. This involves determining which variables to use and what splits to use at each point in the tree to divide the data available at a node. The best split for each node is searched for on the basis of a "purity" function calculated from the data. The most frequently used purity functions are entropy and the Gini index (a small sketch of both is given after these key issues). The data portion that falls into each child node is partitioned again in an effort to maximize the purity function.

The criteria for stopping the growth of the tree, i.e., when the branching at a given node stops: Construction of the tree can be stopped using some condition on the purity function, or a fully constructed tree can be pruned afterwards.

The criteria to prune the tree for maximum classification effectiveness, i.e., which branches of the tree should be removed: The branches that do not contribute much to classification are deleted from the full tree to reduce its size.
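The sketch below, referred to under the first key issue, shows how the two purity functions mentioned there (entropy and the Gini index) can be computed from the class counts at a node; the function names and the data layout are illustrative assumptions, not part of the paper's implementation.

```cpp
// Small sketch of the two purity functions computed from class counts at a node.
#include <cmath>
#include <iostream>
#include <vector>

double entropy(const std::vector<int>& counts) {
    double total = 0.0, h = 0.0;
    for (int c : counts) total += c;
    for (int c : counts)
        if (c > 0) { double p = c / total; h -= p * std::log2(p); }
    return h;                              // 0 for a pure node
}

double gini(const std::vector<int>& counts) {
    double total = 0.0, g = 1.0;
    for (int c : counts) total += c;
    for (int c : counts) { double p = c / total; g -= p * p; }
    return g;                              // 0 for a pure node
}

int main() {
    std::vector<int> node = {8, 2};        // 8 samples of class A, 2 of class B
    std::cout << "entropy = " << entropy(node) << ", gini = " << gini(node) << '\n';
}
```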

Binary decision trees

Binary decision trees are architectures for classification that draw decision region boundaries through constructing a binary tree. Classification of an input vector is done by traversing the tree, beginning at the root node and ending at a leaf node. Each node of the tree computes an inequality based on a single variable. If the inequality is satisfied, the left child is the next node traversed; otherwise the right child is the next. Each leaf is assigned to a particular class, and input vectors are classified as the class associated with the leaf at which the trajectory ends. The popular algorithm to construct binary decision trees is Classification and Regression Tree (CART).
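A minimal sketch of this traversal is given below; the node layout and the tiny hand-built tree are illustrative assumptions, not the CART implementation used in the paper.

```cpp
// Classifying a pattern with a binary decision tree: each internal node tests
// one variable against a threshold; left branch if the inequality holds.
#include <iostream>
#include <vector>

struct Node {
    int attribute;       // index of the tested variable (-1 for a leaf)
    double threshold;    // split value c in "if x[attribute] < c"
    int classLabel;      // class at a leaf (unused for internal nodes)
    const Node* left;
    const Node* right;
};

int classify(const Node* node, const std::vector<double>& x) {
    while (node->attribute >= 0)                       // descend until a leaf
        node = (x[node->attribute] < node->threshold) ? node->left : node->right;
    return node->classLabel;
}

int main() {
    // Tiny hand-built tree: root tests x[0] < 0.5, its right child tests x[1] < 2.0.
    Node leafA{-1, 0.0, 0, nullptr, nullptr};
    Node leafB{-1, 0.0, 1, nullptr, nullptr};
    Node leafC{-1, 0.0, 2, nullptr, nullptr};
    Node rightChild{1, 2.0, -1, &leafB, &leafC};
    Node root{0, 0.5, -1, &leafA, &rightChild};
    std::cout << "class = " << classify(&root, {0.8, 1.0}) << '\n';   // prints 1
}
```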

Classification and regression trees and Iterative Dichotomiser 3 (ID3)10-15

The top down induction of decision trees is a popular approach in which classification starts from a root node and proceeds to generate subtrees until leaf nodes are created. A decision tree is a representation of a decision procedure for determining the class of a given instance. This approach uses attribute-based descriptions, and the learned concepts are represented by decision trees.


It is possible to express conjunctive and disjunctive descriptions of concepts with 'if-then' rules, which can be lifted from the trees.

These rules often offer a more flexible representation than the decision trees themselves. Classification and Regression Tree (CART) is a data exploration and prediction algorithm with the structure of a simple binary tree. Classification and regression trees are binary prediction trees with a structure that is easy to understand, interpret and use. CART splits on a single variable at each node. Iterative Dichotomiser 3 (ID3) is an improved version over CART, which generates a more complex tree: the number of branches at each node equals the range of the particular attribute that seems best for splitting at that node. For both trees a pruning algorithm is used to prune the tree back, reducing its size while maintaining the classification accuracy. The tree building process of CART with its splitting criterion8,13 is used in this paper.

Iterative Dichotomiser 3 (ID3)14,15

Iterative Dichotomiser is a simple and widely used symbolic algorithm for learning from a set of training instances. ID3 is the basis of several commercial rule induction systems. ID3 has been augmented with techniques for handling numerically valued features, noisy data, and missing information. The ID3 algorithm creates, from a given data set, an efficient description of a classifier by means of a decision tree. At each step a new node is added to the decision tree by partitioning the training examples based on their value along a single most informative attribute. Each resulting partition is processed recursively, unless it contains examples of only a single category. The information gain criterion that determines the "splitting" attribute acts as a hill-climbing heuristic, which tends to minimize the size of the tree. When the data set is consistent (i.e. there are no contradictions between data objects), the resulting decision tree describes the data exactly. This means that if the original data set is examined via the resulting tree, the results from the tree will be exactly those expected. In addition, the tree can be used for the prediction of new data objects. The fundamental assumption is that the data set given to the algorithm is representative of the total data set.

The structure of the ID3 tree is basically that of a non-binary tree. The number of branches of the tree equals the range of values of the patterns for the particular attribute selected at that node. Essentially, a decision tree defines a set of paths from the root node to the leaf nodes. Which path to take is determined by the descriptors at the non-leaf nodes. The descriptor describes which branch to follow. When one reaches a leaf node, there are no more questions and the result is given. This is the output of the tree.

Training phase

The production of the decision tree from the data set is called the training phase. During this phase the original data set is examined, the best descriptor is found and then the set is divided into subsets. ID3 picks predictors and their splitting values on the basis of the gain in information that the split(s) provide.

The descriptor is then made into a node of the decision tree. The same procedure is performed on the remaining subsets (dividing the set even more and creating more descriptor nodes) until all the elements in the set have the same classifier value (unless no descriptor can tell the elements apart, i.e. the original data set has contradictions).
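The sketch below illustrates how such an information-gain based choice of descriptor can be computed for categorical attributes; the data layout (attributes stored as small integers) and the function names are assumptions made for this illustration and are not the authors' ID3 code.

```cpp
// Sketch of information-gain based attribute selection for categorical data.
#include <cmath>
#include <iostream>
#include <map>
#include <vector>

struct Example { std::vector<int> attr; int cls; };

double entropyOf(const std::vector<Example>& s) {
    std::map<int, int> counts;
    for (const Example& e : s) ++counts[e.cls];
    double h = 0.0;
    for (const auto& c : counts) {
        double p = double(c.second) / s.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)
double informationGain(const std::vector<Example>& s, int a) {
    std::map<int, std::vector<Example>> parts;
    for (const Example& e : s) parts[e.attr[a]].push_back(e);
    double weightedEntropy = 0.0;
    for (const auto& p : parts)
        weightedEntropy += double(p.second.size()) / s.size() * entropyOf(p.second);
    return entropyOf(s) - weightedEntropy;
}

int main() {
    // Two binary attributes; attribute 0 separates the classes perfectly.
    std::vector<Example> s = {{{0, 0}, 0}, {{0, 1}, 0}, {{1, 0}, 1}, {{1, 1}, 1}};
    std::cout << "gain(attr 0) = " << informationGain(s, 0)
              << ", gain(attr 1) = " << informationGain(s, 1) << '\n';   // 1 and 0
}
```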

Key concepts involved in building the tree

Two major concepts are involved within the ID3 method: (i) Entropy and (ii) Decision tree.

Entropy-The entropy concept is used to find the most significant parameter in characterizing the classifier. The concept of entropy is used to order the list of descriptors with respect to the data set and the classifier. Entropy provides a definition of the most significant descriptor and is one of the major concepts within the ID3 method.

Decision tree-The decision tree built using the entropy concept to select the best descriptor has a number of branches equal to the range of values of the descriptor selected at that particular node. The basic structure of ID3 is iterative: at every new node created, a descriptor is selected and branches are formed again, so the process is iterative and can build a large tree. A node is labelled as a leaf when all the patterns arriving at that node come under the same category; there the tree stops growing.

Reduced error pruning9,13

Obtaining the optimum size neural network is important for good generalization. If the network is very large, it tends to memorize the training data and thus generalize poorly. The size of the network depends on the number of training data and the degree of generalization required.


The reduced error pruning method is a simple and direct method. Assume a separate test set is used, and each case in it is classified by the original tree (T). For every non-leaf subtree S of T, the change in misclassifications over the test set that would occur if S were replaced by the best possible leaf is examined. If the new tree would give an equal or fewer number of errors and S contains no subtree with the same property, S is replaced by the leaf. The process continues until any further replacement would increase the number of errors over the test set. Its rationale is clear, since the final tree is the most accurate subtree of the original tree with respect to the test set and is the smallest tree with that accuracy. The disadvantage of this method is that parts of the original tree corresponding to rare special cases not represented in the test set may be excised. This method is used to prune both the CART and ID3 trees to get a right-sized tree.

ID3 has been improved to handle high-cardinality predictors. A high-cardinality predictor is one that has many different possible values and hence many different ways of performing a split. With the decision tree structure used by CART, the metric for choosing splits might also erroneously allow high-cardinality splits to occur. To avoid this, a pruning process is used to test the tree against held-aside data, so it is likely that an erroneous split would be eliminated during this phase.

Data Representation Schemes and Rule Extraction16-18

Data representation can also be called data preprocessing. Data preparation is done to minimize the influence of absolute scale: all inputs to a network should be scaled and normalized so that they correspond to roughly the same range of values. The goal of data preprocessing is to reduce the non-linearity when its character is known and to let the network resolve the hidden non-linearities that are not understood.

Data scaling

Data scaling is an analog preprocessing method. It has the advantage of mapping the desired range of a variable (between its minimum and maximum values) to the complete working range of the network input. This linear scaling follows the procedure below:

y = mx + c

where m is the slope and c is the y-intercept. If the scaled values are to lie between 0.1 and 0.9, then y = 0.1 when x = x_min and y = 0.9 when x = x_max. For the assumed range 0.1-0.9:

m = 0.8/\Delta;  c = 0.9 - 0.8 x_max/\Delta;  y = (0.8/\Delta) x + (0.9 - 0.8 x_max/\Delta),  where \Delta = x_max - x_min
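A small sketch of this linear scaling is given below; the use of attribute values 0-16 in the example is only an assumed illustration of the target range 0.1-0.9.

```cpp
// Sketch of the linear scaling above: map [x_min, x_max] onto [0.1, 0.9].
#include <iostream>

double scale(double x, double xMin, double xMax, double lo = 0.1, double hi = 0.9) {
    double m = (hi - lo) / (xMax - xMin);    // slope, 0.8/delta for the 0.1-0.9 range
    double c = hi - m * xMax;                // intercept, 0.9 - 0.8*x_max/delta
    return m * x + c;
}

int main() {
    // Attribute values 0..16 mapped to 0.1..0.9.
    std::cout << scale(0, 0, 16) << ' ' << scale(8, 0, 16) << ' ' << scale(16, 0, 16) << '\n';
    // prints 0.1 0.5 0.9
}
```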

Thermometer coding

Thermometer coding is used to code the data in digital, i.e., binary, form. The number of bits used for coding is one less than the range of the pattern values, and each value is coded by setting that many bits to one, starting from the right-hand side of the complete code vector.
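The sketch below illustrates thermometer coding as described above; the exact bit-width convention used here (a value range of 0..R coded with R bits) is an assumption made for the illustration.

```cpp
// Thermometer coding sketch: value v is coded by its v rightmost bits set to one.
#include <iostream>
#include <string>

std::string thermometer(int value, int maxValue) {
    std::string code(maxValue, '0');             // maxValue bits, all zero
    for (int i = 0; i < value; ++i)
        code[maxValue - 1 - i] = '1';            // fill ones from the right
    return code;
}

int main() {
    std::cout << thermometer(3, 5) << '\n';      // prints 00111
    std::cout << thermometer(0, 5) << '\n';      // prints 00000
}
```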

Rule generation procedure for K-NN method

Rule extraction from the K-Nearest Neighbour method is simple and generates rules which can then be tested for validation against the original data set.

Here, depending on the nearest neighbours found for a pattern and the value of each attribute for those nearest neighbours, the rules are given. For a particular class, the range of each attribute is computed using the patterns that are classified as that particular class.
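For illustration, the sketch below collects, for each class, the minimum and maximum of every attribute over the patterns assigned to that class and prints one IF-THEN rule per class; the data layout and output format are assumptions made here, not the authors' rule generator.

```cpp
// Per-class attribute ranges turned into simple IF-THEN rules.
#include <algorithm>
#include <iostream>
#include <map>
#include <utility>
#include <vector>

struct Pattern { std::vector<double> x; int predictedClass; };

// ranges[class][attribute] = {min, max} over the patterns assigned to that class.
std::map<int, std::vector<std::pair<double, double>>>
attributeRanges(const std::vector<Pattern>& classified) {
    std::map<int, std::vector<std::pair<double, double>>> ranges;
    for (const Pattern& p : classified) {
        auto& r = ranges[p.predictedClass];
        if (r.empty()) {
            for (double v : p.x) r.push_back({v, v});
        } else {
            for (std::size_t i = 0; i < p.x.size(); ++i) {
                r[i].first  = std::min(r[i].first,  p.x[i]);
                r[i].second = std::max(r[i].second, p.x[i]);
            }
        }
    }
    return ranges;
}

int main() {
    std::vector<Pattern> c = {{{1.0, 4.0}, 0}, {{2.0, 5.0}, 0}, {{7.0, 1.0}, 1}, {{9.0, 2.0}, 1}};
    for (const auto& cls : attributeRanges(c)) {
        std::cout << "IF ";
        for (std::size_t i = 0; i < cls.second.size(); ++i)
            std::cout << (i ? " AND " : "") << "attr" << i << " in ["
                      << cls.second[i].first << ", " << cls.second[i].second << "]";
        std::cout << " THEN class " << cls.first << '\n';
    }
}
```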

Rule generation from decision trees

Many inductive knowledge acquisition algorithms generate classifiers in the form of decision trees.

Transforming such trees into small sets of production rules is a common formalism for expressing knowledge in expert systems. The method makes use of the training set of cases from which the decision tree was generated, first to generalize and assess the reliability of individual rules extracted from the tree, and subsequently to refine the collection of rules as a whole. The decision tree is expressed as a succinct collection of production rules of the form: IF left-hand side THEN class.

Reasons for transforming the decision trees to production rules

• Production rules are a widely used and well-understood vehicle for representing knowledge in expert systems.

• A decision tree can be difficult for a human expert to understand and modify, whereas the extreme modularity of production rules makes them relatively transparent.


• Transforming the decision trees to production rules improves classification performance by eliminating tests in the decision tree attributable to peculiarities of the training set, and makes it possible to combine different decision trees for the same task.

Extracting individual rules

Classification of a new pattern using a decision tree is effected by following a path from the root of the tree to one of the leaves. This path from the root of the tree to a leaf establishes conditions, in terms of specified outcomes for the tests along the path, that must be satisfied by any case classified by that leaf. Every leaf of a decision tree thus corresponds to a primitive production rule of the form:

IF X_1 \wedge X_2 \wedge X_3 \wedge ... \wedge X_n THEN class C

where the X_i are conditions and C is the class of the leaf.
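The sketch below illustrates this extraction by walking every root-to-leaf path of a small binary tree and recording the test outcomes along the way; the tree representation is the same illustrative one used in the earlier traversal sketch, not CART or ID3 itself.

```cpp
// Extract one production rule per leaf by collecting the tests along each path.
#include <iostream>
#include <string>
#include <vector>

struct Node {
    int attribute;      // -1 for a leaf
    double threshold;
    int classLabel;     // class at a leaf
    const Node* left;
    const Node* right;
};

void extractRules(const Node* n, std::vector<std::string> conds, std::vector<std::string>& rules) {
    if (n->attribute < 0) {                               // leaf: emit IF ... THEN rule
        std::string lhs;
        for (std::size_t i = 0; i < conds.size(); ++i) lhs += (i ? " AND " : "") + conds[i];
        if (lhs.empty()) lhs = "TRUE";
        rules.push_back("IF " + lhs + " THEN class " + std::to_string(n->classLabel));
        return;
    }
    std::string attr = "x" + std::to_string(n->attribute);
    std::vector<std::string> leftConds = conds, rightConds = conds;
    leftConds.push_back(attr + " < " + std::to_string(n->threshold));
    rightConds.push_back(attr + " >= " + std::to_string(n->threshold));
    extractRules(n->left, leftConds, rules);
    extractRules(n->right, rightConds, rules);
}

int main() {
    Node leafA{-1, 0.0, 0, nullptr, nullptr}, leafB{-1, 0.0, 1, nullptr, nullptr};
    Node root{0, 0.5, -1, &leafA, &leafB};
    std::vector<std::string> rules;
    extractRules(&root, {}, rules);
    for (const std::string& r : rules) std::cout << r << '\n';
    // IF x0 < 0.500000 THEN class 0
    // IF x0 >= 0.500000 THEN class 1
}
```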

Simulation Results

The results obtained in testing, pruning and rule generation stages for the three applications considered are presented.

Dataset descriptions19: The data are collected from the UCI machine learning repository.

Title of database: Optical Recognition of Handwritten Digits Data19
Number of instances: Training: 3823; Testing: 1797
Number of attributes: 64 input + 1 class attribute

Title of database: Landsat MSS Imagery Data19
Number of examples: Training set: 4435; Test set: 2000
Number of classes: 6 decision classes
Number of attributes: 36 (= 4 spectral bands x 9 pixels in neighbourhood)

Title of database: Letter Image Recognition Data19
Number of instances: 20,000
Number of attributes: 17 (letter category and 16 numeric features)

K-Nearest Neighbour method results

The crisp and fuzzy versions of the K-Nearest Neighbour method and the nearest prototype classifiers have been tested using two different applications: the Optical Recognition of Handwritten Digits data and the Letter Image Recognition data.

Application 1 (Optical Recognition of Handwritten Digits Data) - The Optical Recognition of Handwritten Digits data contains 5620 samples in 10 classes. There are 64 attributes in total defining each sample. In order to evaluate the optimum performance of the classifier models, 35% of the data samples are used for training.

Results obtained for crisp K-NN

Table 1 shows the variation in the overall percentage of accuracy as the number of training samples increases.

Table 1-Train/test results

Train/test    Overall % accuracy    Testing time (s)
200/5420      89.50                 2764
500/5120      94.92                 2673
1000/4620     96.47                 2564
2000/3620     97.66                 1987
3000/2620     98.02                 1856
4000/1620     99.31                 525
5000/620      100.0                 272

Table 2-Results obtained for crisp K-NN for different input data (overall % accuracy)

K     Raw i/p    Analog i/p    Digital i/p
1     97.67      97.68         96.20
2     97.68      97.69         96.85
3     97.91      97.91         96.85
4     97.92      97.88         96.88
5     97.68      97.89         96.91
6     97.80      97.82         96.98
7     97.70      97.81         96.81
8     97.60      97.66         96.65
9     97.40      97.78         96.75
10    97.59      97.66         96.57
11    97.41      97.88         96.65
12    97.46      97.69         96.55

Fig. 3-Variation of number of training samples with % classification accuracy


For 30% of the training samples, the classification accuracy is 97.66% and the testing time is 1987 s. Fig. 3 shows the variation in the accuracy as the number of training samples increases. It can be observed that, in the case of K-NN, testing the new patterns without training is not possible, because in the case of K-NN training is nothing but loading the patterns in order to find the distances between the new patterns and the loaded patterns for classification. So at least a minimum number of samples are needed for training.

Table 2 shows the overall classification accuracy obtained for crisp K-NN using raw input data, analog input data and digital input data, for different values of K ranging from 1 to 12. Here analog data is nothing but the scaled data, and digital data is the raw data coded using thermometer coding.

Results obtained for fuzzy K-NN with membership function #1

Table 3 shows the classification accuracy for different types of input data and weighting factor values, viz., m = 2, 1.5 and 3, for different values of K (nearest neighbours) ranging from 1 to 12.

Results obtained using fuzzy membership function # 2

Table 4 shows the variation in the overall percentage of classification accuracy using membership function #2 for different types of input data and weighting factor values 'm' as the K value ranges from 1 to 12.

Table 3-Results obtained with membership function #1 for different 'm' values (overall % accuracy)

       Raw data                 Analog data              Digital data
K      m=2    m=1.5  m=3       m=2    m=1.5  m=3        m=2    m=1.5  m=3
1      98.34  87.70  87.79     98.36  87.71  87.79      97.88  85.98  86.23
2      98.41  88.10  87.98     98.36  88.07  87.79      98.87  86.23  86.13
3      98.10  87.71  88.10     98.10  87.70  88.07      97.66  86.09  85.98
4      98.27  87.70  87.98     98.28  87.90  87.98      97.66  85.98  86.13
5      98.29  87.75  87.79     98.28  87.75  87.98      98.22  86.45  85.78
6      98.40  87.88  87.96     98.29  87.71  87.79      98.01  86.23  86.09
7      98.30  87.96  87.92     98.40  87.96  87.96      97.99  86.23  85.98
8      98.23  88.07  87.78     98.35  88.10  87.92      97.87  86.13  86.13
9      98.23  88.10  87.65     98.44  88.10  87.65      97.40  86.13  85.78
10     98.41  88.10  87.79     98.42  88.07  87.65      97.54  85.78  86.09
11     98.40  87.71  87.65     97.25  87.96  87.79      97.41  85.98  86.23
12     98.35  87.70  87.23     97.95  87.71  87.23      97.46  85.78  86.09

Table 4-Results obtained with membership function #2 for different 'm' values (overall % accuracy)

       Raw data                 Analog data              Digital data
K      m=2    m=1.5  m=3       m=2    m=1.5  m=3        m=2    m=1.5  m=3
1      98.10  87.65  87.98     98.23  87.65  87.79      97.88  85.98  86.09
2      98.23  87.79  87.79     98.23  88.07  87.79      97.87  86.09  86.09
3      97.89  88.10  87.96     98.10  88.07  88.07      97.66  86.13  85.98
4      98.10  87.78  88.07     98.10  87.92  88.10      97.66  86.23  86.09
5      98.23  87.96  87.79     98.23  87.92  87.96      98.22  86.23  85.65
6      98.10  88.07  87.65     98.23  87.96  87.79      98.01  86.09  86.13
7      98.23  87.92  87.96     98.34  87.96  87.92      97.99  86.23  86.23
8      97.89  88.10  87.79     98.02  88.10  87.96      97.87  86.13  86.09
9      98.10  88.07  87.65     98.34  88.10  87.79      97.40  86.09  85.78
10     98.34  88.10  87.79     98.02  88.10  87.65      97.54  86.23  86.13
11     98.23  88.07  87.65     97.25  87.96  87.79      97.23  86.09  86.23
12     98.34  87.79  87.34     98.40  87.71  87.34      97.34  85.78  86.09


Fig. 4-Comparison between crisp and fuzzy methods for raw input (overall % accuracy vs. K)

Fig. 5-Comparison between crisp and fuzzy methods for digital input (overall % accuracy vs. K)


Figs 4-6 show the variation in overall classification accuracy for the different methods (crisp, fuzzy membership #1 and fuzzy membership #2) as K changes from 1 to 12 for raw, digital and analog input, respectively. Fig. 7 shows the variation in accuracy with 'm'.

Results obtained using nearest prototype classifier

Table 5 shows the results obtained for the nearest prototype classifier, where the prototypes are calculated considering both the complete vectors and the selected features of the vectors as the input patterns. The selected features for the Optical Recognition of Handwritten Digits data are: 2-8, 10-24, 26-39, 41-63.

Application 2 (Letter Image Recognition Data)

Results obtained using crisp K-NN

Table 6 shows the results obtained using the crisp K-NN method for different types of input data as the K value ranges from 1 to 12.

Table 6-Results obtained using crisp K-NN for different input data (overall % accuracy)

K     Raw i/p    Analog i/p    Digital i/p
1     93.34      93.12         93.23
2     93.23      93.20         93.01
3     93.33      93.27         93.34
4     93.34      93.43         93.23
5     92.96      92.83         93.23
6     92.90      92.90         92.96
7     92.38      92.46         92.89
8     92.34      92.31         92.25
9     91.83      91.83         91.96
10    91.96      91.84         91.83
11    91.83      91.40         91.45
12    91.83      91.36         91.45

Fig. 6-Comparison between crisp and fuzzy methods for analog input (overall % accuracy vs. K)

Fig. 7-Variation in % accuracy with weighting factor 'm'

Table 5-Variation in classification accuracy with complete vector and selected features of the vector

Type of input data    Crisp % accuracy      Fuzzy #1 % accuracy    Fuzzy #2 % accuracy
                      Com*      Sel**       Com*      Sel**        Com*      Sel**
Raw                   88.89     87.07       90.13     88.10        90.05     88.07
Analog                88.34     87.13       90.13     88.10        89.98     88.10
Digital               87.86     86.65       88.23     87.34        88.23     87.23

Com*-Complete vector; Sel**-Selected features of vector


Results obtained for fuzzy K-NN using fuzzy membership function #1

Table 7 shows the variation in overall percentage of classification accuracy using fuzzy membership function #1 with weighting factor m = 2, 1.5 and 3 for different types of input data.

Results obtained using fuzzy membership function #2

Table 8 shows the variation in overall percentage of classification accuracy using fuzzy membership function #2 with weighting factor m = 2, 1.5 and 3 for different types of input data.

Table 7-Results obtained with membership function #1 for different 'm' values (overall % accuracy)

       Raw data                 Analog data              Digital data
K      m=2    m=1.5  m=3       m=2    m=1.5  m=3        m=2    m=1.5  m=3
1      94.12  87.65  88.05     94.12  87.65  88.13      94.02  85.98  87.34
2      94.02  87.79  89.27     94.12  88.07  89.34      93.89  86.09  87.23
3      93.89  88.10  90.05     94.02  88.07  90.13      93.96  86.13  88.13
4      94.02  87.78  90.89     93.96  87.92  90.89      93.89  86.23  89.34
5      93.96  87.96  91.27     94.02  87.92  91.27      93.34  86.23  89.23
6      93.89  88.07  91.83     93.89  87.96  91.83      93.23  86.09  88.23
7      93.46  87.92  90.45     93.96  87.96  90.45      92.96  86.23  89.27
8      93.34  88.10  90.27     93.89  88.10  90.27      92.90  86.13  88.13
9      93.34  88.07  90.05     93.46  88.10  90.05      91.83  86.09  87.23
10     93.23  88.10  89.27     93.34  88.10  89.27      91.83  86.23  87.23
11     92.96  88.07  89.87     93.25  87.96  89.34      92.90  86.09  87.34
12     92.90  87.79  88.05     92.96  87.71  89.10      91.83  85.78  87.01

Table 8-Results obtained with membership function #2 for different 'm' values (overall % accuracy)

       Raw data                 Analog data              Digital data
K      m=2    m=1.5  m=3       m=2    m=1.5  m=3        m=2    m=1.5  m=3
1      94.02  87.23  89.27     94.12  87.89  88.05      93.89  87.13  87.23
2      94.12  87.89  88.13     94.02  88.99  89.27      94.02  87.23  87.13
3      93.89  88.99  90.05     93.96  88.90  90.13      93.34  88.13  88.05
4      94.02  90.13  90.13     94.12  90.27  90.05      93.23  88.34  88.13
5      93.89  91.13  91.27     94.02  91.59  90.45      92.90  88.23  89.34
6      94.12  91.59  91.34     93.89  90.65  91.83      92.96  88.13  88.23
7      93.46  90.05  91.83     93.34  90.05  90.45      92.90  89.27  88.05
8      93.34  89.90  90.05     93.89  90.33  90.27      92.96  88.23  88.05
9      93.23  89.27  90.27     93.46  90.05  90.05      91.83  88.13  87.13
10     93.34  89.34  89.34     93.23  89.27  89.27      91.83  87.13  87.23
11     92.90  87.34  90.05     93.23  88.23  89.87      91.78  87.13  87.34
12     92.90  88.23  88.05     92.96  87.89  89.10      91.83  87.05  87.98

Table 9-Variation in classification accuracy with complete vector and selected features of the vector

Type of input data    Crisp % accuracy      Fuzzy #1 % accuracy    Fuzzy #2 % accuracy
                      Com*      Sel**       Com*      Sel**        Com*      Sel**
Raw                   86.95     87.01       87.34     87.23        87.23     87.13
Analog                86.95     87.01       87.23     87.34        87.23     87.34
Digital               85.78     86.23       86.89     86.79        86.89     86.79

Com*-Complete vector; Sel**-Selected features of vector
