On fuzzy-rough sets approach to feature selection


Control Group, Department of Electrical Engineering, Indian Institute of Technology Delhi, Hauz Khas, New Delhi-110016, India

Received 8 September 2004

Abstract

In this paper, we have shown that the fuzzy-rough set attribute reduction algorithm [Jenson, R., Shen, Q., 2002. Fuzzy-rough sets for descriptive dimensionality reduction. In: Proceedings of IEEE International Conference on Fuzzy Systems, FUZZ-IEEE'02, May 12-17, pp. 29-34] is not convergent on many real datasets due to its poorly designed termination criteria, and that the computational complexity of the algorithm increases exponentially with the number of input variables and multiplicatively with the number of data patterns. Based on natural properties of the fuzzy t-norm and t-conorm, we have put forward the concept of fuzzy-rough sets on compact computational domain, which is then utilized to improve the computational efficiency of the FRSAR algorithm. Speed-up factors as high as 622 have been achieved with this concept, with improved accuracy. We also restructure the algorithm with efficient termination criteria to achieve convergence on all the datasets and to improve the reliability of the selected set of features.

Keywords: Computational complexity; Feature selection; Fuzzy sets; Rough sets

1. Introduction

One aspect common to pattern recognition problems is feature dimensionality reduction. Two different approaches to feature dimensionality reduction are feature selection and feature extraction. The former is to find the set of input variables, from a given set of input candidates, which really affect the output. The latter reduces the original set of features into a linearly or nonlinearly transformed set. Feature selection allows us to discard some of the irrelevant features, and better performance may be achieved by discarding such features (Fukunaga, 1972; Mucciardi and Gose, 1971; Steppe et al., 1996).

In their paper, Jenson and Shen (2002) presented an approach to dimensionality reduction by applying the concept of fuzzy-rough sets. Indeed the concept is a very interesting and systematic one, since by considering the degree of belongingness of real-valued data to fuzzy sets, the loss of important information may be avoided. The fuzzy-rough dependency measure is a natural extension of Pawlak's measure of dependency degree (Pawlak, 1991). Also, the feature selection stage is separated from the model building stage. This approach allows building a fuzzy model directly on the reduced dimension, such as given by (Basak et al., 1998; De et al., 1999), and once the model has been developed, it can be tuned further by neuro-fuzzy or genetic optimization techniques to improve classification/regression accuracy. Further, the algorithm does not require the user to supply any external parameter or termination criteria.

However, the algorithm has not been applied to various benchmark problems. The purpose here is to describe our experience in applying the FRSAR (fuzzy-rough set attribute reduction) algorithm, whose performance, we found, is severely compromised due to two flaws:

1. As per our investigation, this algorithm is not convergent (i.e., it may be trapped in an infinite search loop) on many real datasets due to its poorly designed termination criteria, and it may select unreliable features.

2. The computational complexity of the algorithm increases exponentially with the Cartesian product of the input frame of cognition (i.e., the Cartesian product of the collection of fuzzy sets on each input attribute) and multiplicatively with the number of data patterns. This makes the algorithm almost infeasible on high-dimensional problems.

The main contribution of this paper is to put forward the concept of fuzzy-rough sets on compact computational domain, based on the properties of fuzzy t-norm and t-conorm operators. This modified definition has been utilized to speed up the existing FRSAR algorithm. We also restructure the termination criteria of the algorithm to make it converge to a subset of features within finite time and to reduce the number of levels and nodes required to be evaluated in the search tree.

This paper is organized as follows: Section 2 discusses fuzzy-rough sets and FRSAR. In Section 3, we put forward the concept of fuzzy-rough sets on compact computational domain. Based on this definition, Section 4 builds an improved feature selection algorithm, which is then compared with FRSAR. Experimental results, comparisons with results available from the literature, and discussions are given in Section 5. Section 6 concludes the paper.

2. Theoretical foundations

A typical fuzzy classification/regression problem can be described as follows. A universe of objects or set of learning patterns U = {x_i | i = 1, …, n} is described by a collection of input fuzzy attributes {P_1, P_2, …, P_p}. Each attribute measures some important feature of an object and is limited to a usually small set of fuzzy linguistic terms A(P_i) = {F_ik | k = 1, …, C_i}. Each object x_i ∈ U is classified by a set of classes A(Q) = {F_l | l = 1, …, C_Q}. Each F_l ∈ A(Q) may be a crisp or fuzzy set, and Q is the decision attribute.

The set U/P = {F_ik | i = 1, …, p; k = 1, …, C_i} can be regarded as fuzzy partitions of U by a set of attributes P, using fuzzy similarity relations R on U (Radzikowska and Kerre, 2002).

The set U/P builds basic partitions of the domain of interest. Given any arbitrary fuzzy set A on the universe U, we can approximate it using the basic fuzzy partitions. However, due to the limited discernibility of objects, the fuzzy set approximation may not be perfect. Fuzzy-rough sets are a pair of approximation degrees ⟨μ̲, μ̄⟩ with which we can 'certainly' and 'possibly' approximate an arbitrary fuzzy set A using a single or composite fuzzy set from the basic fuzzy partitions.

A more formal definition of fuzzy-rough sets is given below.

Definition 1. Given an arbitrary fuzzy set μ_A(x): U → [0,1], ∀x ∈ U, and F_ik ∈ U/P. According to (Radzikowska and Kerre, 2002), the fuzzy-rough set is a tuple ⟨μ̲_A(F_ik), μ̄_A(F_ik)⟩, where the lower and upper approximation membership functions are defined by

    μ̲_A(F_ik) = inf_{x∈U} max{1 − μ_{F_ik}(x), μ_A(x)}
    μ̄_A(F_ik) = sup_{x∈U} min{μ_{F_ik}(x), μ_A(x)}            (1)

Set approximations can be performed for each class F_l ∈ A(Q), and approximations ⟨F̲_l, F̄_l⟩ can be generated by each F_ik ∈ A(P_i).

The positive region of a fuzzy set is the maximum membership degree with which a unique class can be classified by the fuzzy set F_ik, i.e.,

    μ_POS(F_ik) = sup_{F_l∈A(Q)} {μ̲_{F_l}(F_ik)}

The membership of each pattern x ∈ U to the fuzzy positive region is given by

    μ_POS(x) = sup_{F_ik∈U/P} min{μ_{F_ik}(x), μ_POS(F_ik)}

The dependency degree γ_P(Q) (0 ≤ γ_P(Q) ≤ 1) of Q on the set of attributes P is defined by

    γ_P(Q) = (Σ_{x∈U} μ_POS(x)) / |U|
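These three formulas are all that is needed to score an attribute subset. The following sketch, assuming memberships are stored as NumPy arrays (the array layout and helper names are our own choices, not from the paper), computes the lower approximations, the fuzzy positive region, and γ_P(Q):

    import numpy as np

    def lower_approx(mu_F, mu_A):
        """Eq. (1): inf over x of max(1 - mu_F(x), mu_A(x))."""
        return np.min(np.maximum(1.0 - mu_F, mu_A))

    def dependency_degree(partitions, classes):
        """gamma_P(Q): mean membership of the patterns to the fuzzy positive region.

        partitions: list of 1-D arrays mu_Fik(x), one per fuzzy set F_ik in U/P
        classes:    list of 1-D arrays mu_Fl(x), one per decision class F_l in A(Q)
        """
        n = len(partitions[0])
        mu_pos_x = np.zeros(n)
        for mu_F in partitions:
            # mu_POS(F_ik): best 'certain' classification over the decision classes
            mu_pos_F = max(lower_approx(mu_F, mu_A) for mu_A in classes)
            # mu_POS(x): fold in min(mu_Fik(x), mu_POS(F_ik)) via the outer sup
            mu_pos_x = np.maximum(mu_pos_x, np.minimum(mu_F, mu_pos_F))
        return mu_pos_x.sum() / n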

The FRSAR algorithm for descriptive dimensionality reduction uses the dependency degree of the decision attribute Q on the condition attribute(s) in a hierarchical manner to identify the set of relevant features.

The pseudocode of the algorithm is given in (Jenson and Shen, 2002). We reproduce it here to maintain the flow of reading.

The algorithm is a tree-structured combinatorial search procedure based on the evaluation of the dependency factor for S ⊆ P. The evaluation procedure is organized in a top-down approach. First, the one-element candidate sets are evaluated with the corresponding γ_S(Q) value. Then those two-element candidate sets are evaluated which contain the best candidate of the previous level, and so on. The candidate sets are structured in a tree whose root (level 0) is the set ∅, and level v contains nodes with candidate sets of cardinality v.

With this search, a maximum of p(p+1)/2 nodes are evaluated in a total of p levels, out of 2^p − 1 possible combinations; at the vth level, the number of nodes is p − v + 1. Here p is the cardinality of the input attribute set. For example, with p = 10, at most 10·11/2 = 55 nodes are evaluated over 10 levels, against 2^10 − 1 = 1023 possible subsets.

To find the dependency degree with more than one attribute, i.e., with P = {P_1, P_2, …}, the Cartesian product of the frames of cognition should be considered. Each fuzzy linguistic term will be of the form

    {(F_{1k_1}, F_{2k_2}, …) | 1 ≤ k_i ≤ C_i}.
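As an illustration of this multiplicative blow-up, a composite frame of cognition can be enumerated with a Cartesian product, combining memberships with the min t-norm; the toy membership arrays below are invented for the example:

    import numpy as np
    from itertools import product

    # Two attributes, two fuzzy sets each, over three patterns (toy values)
    frame_P1 = [np.array([0.9, 0.2, 0.0]), np.array([0.1, 0.8, 1.0])]
    frame_P2 = [np.array([1.0, 0.5, 0.3]), np.array([0.0, 0.5, 0.7])]

    # C_1 x C_2 composite terms (F_1k1, F_2k2), memberships combined by min
    composites = [np.minimum(f1, f2) for f1, f2 in product(frame_P1, frame_P2)]
    print(len(composites))  # 4 = 2 x 2; grows multiplicatively per attribute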

Symbols used in the FRSAR pseudocode are:

R stores the selected set of variables at each level;
T stores temporary variables at each level of search;
γ_best is the best dependency degree at any level; and
γ_prev is the best dependency degree of the previous level.

FRSAR (taken from Jenson and Shen, 2002):

    R ← ∅; γ_best ← 0; γ_prev ← 0
    do
      T ← R
      γ_prev ← γ_best
      ∀P_i ∈ (P − R)
        if γ_{R∪{P_i}}(Q) > γ_T(Q)
          T ← R ∪ {P_i}; γ_best ← γ_T(Q)
      R ← T
    until γ_best = γ_prev
    return R
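For concreteness, here is a minimal sketch of this greedy loop as we read the pseudocode above, assuming a callable gamma(S) that returns γ_S(Q); note that the stopping test compares dependency degrees for exact equality, which is precisely the criterion criticized in this paper:

    def frsar(P, gamma):
        """Greedy FRSAR search. P: set of attribute ids; gamma: subset -> degree."""
        R = set()
        gamma_best, gamma_prev = 0.0, -1.0
        while gamma_best != gamma_prev:      # the equality-based stopping test
            gamma_prev = gamma_best
            T = set(R)
            for Pi in P - R:
                if gamma(R | {Pi}) > gamma(T):
                    T = R | {Pi}
            R = T
            gamma_best = gamma(R)
        return R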

In this tree-structured search, the computational complexity of FRSAR increases with the Cartesian product of the input frame of cognition and multiplicatively with the number of data patterns (further details are given in Section 4).

3. Proposed fuzzy-rough sets on compact computational domain

Here we propose the concept of fuzzy-rough sets on compact computational domain by using properties of the fuzzy t-norm and t-conorm operators. Our proposed modification of the FRSAR algorithm is based on this new definition. In the next section, we propose three modifications to the FRSAR algorithm.


Fuzzy t-norm and t-conorm operators:

Definition 2 (Radzikowska and Kerre, 2002). Fuzzy t-norm (τ) and t-conorm (σ) operators are increasing, associative and commutative mappings τ: [0,1]² → [0,1] and σ: [0,1]² → [0,1].

The two most commonly used fuzzy t-norm and t-conorm are the min and max operators, respectively:

    min(x, 1) = x;  min(x, 0) = 0;
    max(x, 1) = 1;  max(x, 0) = x.

For this family of fuzzy t-norm operators, "one" is the neutral element, and for fuzzy t-conorm operators, "zero" is the neutral element.

Proposed definition of fuzzy-rough sets: Using the neutral-element properties of the fuzzy logic operators, we suggest new definitions of the lower and upper approximation membership functions.

Definition 3. Given an arbitrary fuzzy set μ_A(x): U → [0,1], ∀x ∈ U, and F_ik ∈ U/P. The fuzzy-rough set on compact computational domain is a tuple ⟨μ̲_A, μ̄_A⟩, where the lower and upper membership functions are defined by

    μ̲_A(F_ik) = inf_{x∈D̲_A(F_ik)} max{1 − μ_{F_ik}(x), μ_A(x)}, if D̲_A(F_ik) ≠ ∅;  μ̲_A(F_ik) = 1 otherwise;
    μ̄_A(F_ik) = sup_{x∈D̄_A(F_ik)} min{μ_{F_ik}(x), μ_A(x)},     if D̄_A(F_ik) ≠ ∅;  μ̄_A(F_ik) = 0 otherwise;

where D̲_A(F_ik) ⊆ U and D̄_A(F_ik) ⊆ U are compact computational domains for the lower and upper approximation membership functions, defined as

    D̲_A(F_ik) = {x ∈ U | μ_{F_ik}(x) > 0 ∧ μ_A(x) < 1}
    D̄_A(F_ik) = {x ∈ U | μ_{F_ik}(x) > 0 ∧ μ_A(x) > 0}            (2)

Here ∧ is the logical AND connective.

By Eq. (2),

    D̲_A(F_ik) = f_ik ∖ A_C
    D̄_A(F_ik) = f_ik ∩ A_S                                        (3)

where

    A_C = {x ∈ U | μ_A(x) = 1},  A_S = {x ∈ U | μ_A(x) > 0},  and  f_ik = {x ∈ U | μ_{F_ik}(x) > 0}.

It is termed fuzzy-rough sets on compact computational domain because it suffices to calculate the lower and upper approximation memberships only on the compact domains defined in Eq. (3), instead of on all x ∈ U.

The definition given above is consistent with that given in Eq. (1), but requires fewer computations for calculating the lower and upper approximations.
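A minimal sketch of Definition 3, again assuming NumPy arrays of memberships (names are ours): restricting the inf/sup to the compact domain is safe because every excluded pattern contributes only the neutral element.

    import numpy as np

    def compact_lower_approx(mu_F, mu_A):
        """Lower approximation restricted to D_A(F_ik) = f_ik minus A_C (Eq. (3))."""
        D = (mu_F > 0) & (mu_A < 1)       # compact computational domain
        if not D.any():                   # empty domain: approximation is 1
            return 1.0
        return float(np.min(np.maximum(1.0 - mu_F[D], mu_A[D])))

    def compact_upper_approx(mu_F, mu_A):
        """Upper approximation restricted to D_A(F_ik) = f_ik intersect A_S (Eq. (3))."""
        D = (mu_F > 0) & (mu_A > 0)
        if not D.any():                   # empty domain: approximation is 0
            return 0.0
        return float(np.max(np.minimum(mu_F[D], mu_A[D])))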

For fuzzy t-norm operators, "one" is the neutral element. The region where max{1 − μ_{F_ik}(x), μ_A(x)} = 1 has no impact on the formation of the lower approximation membership, due to the 'inf' operator. We therefore find the region where max{1 − μ_{F_ik}(x), μ_A(x)} ≠ 1, and compute the lower and upper approximation membership degrees only on this region. If this region is empty, we show that μ̲_A(F_ik) = 1.

Let

    max{1 − μ_{F_ik}(x), μ_A(x)} ≠ 1.

This holds exactly when μ_{F_ik}(x) > 0 and μ_A(x) < 1, which is the domain D̲_A(F_ik) of Eq. (2). It is obvious that if D̲_A(F_ik) = ∅, then there is no x ∈ U with max{1 − μ_{F_ik}(x), μ_A(x)} ≠ 1, i.e., all the patterns x ∈ f_ik belong to the core of the fuzzy set A. In this case, f_ik ⊆ A_C and μ̲_A(F_ik) = 1.

The consistency and compactness of the upper approximation can be proved in a similar way. If D̄_A(F_ik) = ∅, then F_ik and A do not have any pattern in common (i.e., f_ik ∩ A_S = ∅).

Further, if D̲_A(F_ik) = U, then μ̲_A(F_ik) never becomes 1. Let

    D̲_A(F_ik) = U.

For μ̲_A(F_ik) to be 1, we need max{1 − μ_{F_ik}(x), μ_A(x)} = 1, ∀x ∈ U. But μ_A(x) ≠ 1 (since A_C = ∅) and μ_{F_ik}(x) > 0 (since f_ik = U). Thus μ̲_A(F_ik) never becomes 1.


Qualitatively, f_ik = U ∧ A_C = ∅ indicates a situation where no element belongs to the core of the fuzzy set A while support(F_ik) = U. f_ik = U is possible, though rare, but A_C = ∅ never happens in practice. When the classification is crisp (i.e., A is a crisp set), each element either belongs to the set or not (i.e., crisp partitioning of the decision region); saying that the core of a certain class is empty is equivalent to saying that the class does not exist at all. When A is a fuzzy set, there will be some training data which belong to the core of the given fuzzy set. So for real-world applications, the compact computational domain is strictly a subset of U:

    D̲_A(F_ik) ⊂ U  and  |D̲_A(F_ik)| < n            (4)

This shows that the number of elements on which calculations are required by the proposed definition will always be less than the total number of training patterns.

Observation 1. The compact computational domain for a composite fuzzy set can be defined as

    D̲_A(F_1, …, F_j) = (f_1 ∩ ⋯ ∩ f_j) ∖ A_C = D̲_A(F_1) ∩ ⋯ ∩ D̲_A(F_j)
    D̄_A(F_1, …, F_j) = (f_1 ∩ ⋯ ∩ f_j) ∩ A_S = D̄_A(F_1) ∩ ⋯ ∩ D̄_A(F_j)

To prove this, we find the region where

    max{1 − min(μ_{F_1}(x), …, μ_{F_j}(x)), μ_A(x)} ≠ 1

and define the compact computational domain by its set-theoretic difference with U. The condition holds exactly when min(μ_{F_1}(x), …, μ_{F_j}(x)) > 0 and μ_A(x) < 1, i.e., when x ∈ f_1 ∩ ⋯ ∩ f_j and x ∉ A_C, and thus D̲_A(F_1, …, F_j) = D̲_A(F_1) ∩ ⋯ ∩ D̲_A(F_j). The proof for D̄_A(F_1, …, F_j) can be developed on similar lines.

Note that D̲_A(F_1, …, F_j) ⊆ D̲_A(F_1, …, F_{j−1}): the compact domains can only shrink as attributes are added.
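In an implementation, Observation 1 means the compact domains can be represented as Boolean masks and composed with logical AND, as in this toy fragment (data invented):

    import numpy as np

    # Single-attribute compact domains as Boolean masks over the patterns
    D1 = np.array([True, True, False, True])    # D_A(F_1)
    D2 = np.array([True, False, False, True])   # D_A(F_2)

    D12 = D1 & D2   # D_A(F_1, F_2): no recomputation from raw memberships needed
    print(D12)      # [ True False False  True]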

4. Proposed fuzzy-rough feature selection algorithm

The proposed new definition of fuzzy-rough sets can be utilized to make pattern recognition algorithms faster. We build here a significant-feature selection algorithm by utilizing the proposed definition of fuzzy-rough sets and then defining the dependency degree of the output on the set of input(s). At level 1, for S = {P_i}, ∀i = 1, …, p, γ_S(Q) can be calculated as follows:

1. Calculate, ∀k, ∀l, the compact domains D̲_{F_l}(F_ik) and the lower approximations μ̲_{F_l}(F_ik).

2. Calculate the fuzzy positive region for F_ik:

    μ_POS(F_ik) = sup_{F_l∈A(Q)} {μ̲_{F_l}(F_ik)};
    μ_POS(F_ik) = 1, if ∃F_l ∈ A(Q) such that D̲_{F_l}(F_ik) = ∅.

3. Calculate the membership of x ∈ U to the fuzzy positive region by

    μ_POS(x) = sup_{F_ik} min{μ_{F_ik}(x), μ_POS(F_ik)}.

4. By this, calculate

    γ_S(Q) = (Σ_{x∈U} μ_POS(x)) / |U|.

The compact computational domain is required to be calculated only at the first level of the search tree. At every subsequent level it can be calculated by iterative intersections (see Observation 1). This modification prevents the number of patterns processed from growing multiplicatively with the Cartesian product of the frame of cognition of the selected set of inputs.

We restructure the tree search algorithm by incorporating two other important steps.

1. If, compared to the best variable's performance at the vth level, the performance of its combination with any other variable at the (v+1)th level gets reduced, then the added variable can be permanently removed from the search tree.


2. If two nodes at any level v give the same best significance factor, then a problem arises which might be called multiple branching: from this node emerge two separate branches, and further search becomes more complex. This problem is common in the FRSAR algorithm. To this end, there are two possible solutions. One is to continue the search with the first best selected node and remove the other one; alternatively, significance factors can be calculated at the (v+1)th level with both nodes, keeping the node which results in higher dependency.

By incorporating these steps, we list below the pseudocode of our proposed feature selection algorithm.

Symbols used in the pseudocode are:

S stores the selected set of variables at each level;
C stores the rejected set of variables at each level;
T is temporary storage of variables at each level;
temp stores the remaining variables after selection and rejection at each level;
γ_best is the best dependency degree at any level; and
γ_prev is the best dependency degree of the previous level.

START
    ∀k, ∀i compute f_ik = {x ∈ U | μ_{F_ik}(x) > 0}
    ∀l compute f_l = {x ∈ U | μ_{F_l}(x) > 0}
    γ_prev ← 0, γ_best ← 0, v ← 0, S ← ∅, C ← ∅, temp ← P
    do
      γ_prev ← γ_best
      level v: ∀P_i ∈ temp {
        compute D̲_{F_l}(F_ik), ∀k = 1, …, C_i
        compute γ_{S∪{P_i}}
        if (γ_{S∪{P_i}} < γ_prev) OR (γ_{S∪{P_i}} < γ_best)
          then C ← C ∪ {P_i}
        if (γ_{S∪{P_i}} > γ_best)
          then 1. T ← S ∪ {P_i}
               2. γ_best ← γ_{S∪{P_i}}
      }
      end of level v; v ← v + 1
      S ← T
      temp ← P − S − C
    while (temp ≠ ∅)
    Return S
STOP
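The following sketch mirrors the pseudocode above in Python, assuming a callable gamma(S) that evaluates the dependency degree (with compact domains cached internally); the names and the tie-breaking guard at the end are our additions:

    def proposed_search(P, gamma):
        """Restructured search with a permanent rejection set C."""
        S, C = set(), set()
        gamma_best = 0.0
        temp = set(P)
        while temp:
            gamma_prev = gamma_best
            T = set(S)
            progressed = False
            for Pi in temp:
                g = gamma(S | {Pi})
                if g < gamma_prev or g < gamma_best:
                    C.add(Pi)                 # step 1: permanently rejected
                    progressed = True
                elif g > gamma_best:
                    T = S | {Pi}              # step 2: keep a single best branch
                    gamma_best = g
                    progressed = True
            S = T
            temp = set(P) - S - C
            if not progressed:                # guard against exact ties (ours)
                break
        return S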

FRSAR versus proposed algorithm: The number of computations at any level v of the search tree, for both algorithms, can be calculated as follows.

FRSAR:

    Σ_{i∈temp} (C_i × C_S + 2 × C_i × C_S × C_Q × n + n)

where S is the selected set of variables up to level v, C_S is the cardinality of the Cartesian product of the frames of cognition of the variables in S, and temp is P minus the selected set of variables up to level v (i.e., P − S). At the first level, S = ∅, C_S = 1, and temp = {1, …, p}.

Proposed algorithm: The algorithm computes the compact computational domain for each variable at level 1. At level 1, the number of computations is

    Σ_{i=1}^{p} (2 × C_i + C_Q + C_i × C_Q + domain_i + n)

At any subsequent level v,

    Σ_{i∈temp} (C_i × C_S + 2 × domain_i + n)

where S is the selected set of variables up to level v, and temp is P minus the selected and rejected sets of variables up to level v (i.e., P − S − C). At the first level, S = ∅, C_S = 1, and temp = {1, …, p}. domain_i is calculated on the Cartesian product of the ith input frame of cognition with the frames of cognition of the selected variables in S:

    domain_i = Σ_{l=1}^{C_Q} Σ_{k∈{1,…,C_i×C_S}} |D̲_{F_l}(F_ik)|

The computations at each level are summed together to get the total computational effort of the algorithm.
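The level counts above can be tabulated directly; the helpers below use illustrative values, since domain_i can only be measured at run time:

    def frsar_ops(temp_C, C_S, C_Q, n):
        """Operations of FRSAR at one level, per the expression above."""
        return sum(Ci * C_S + 2 * Ci * C_S * C_Q * n + n for Ci in temp_C)

    def proposed_ops(temp_C, C_S, domains, n):
        """Operations of the proposed algorithm at one subsequent level."""
        return sum(Ci * C_S + 2 * d + n for Ci, d in zip(temp_C, domains))

    temp_C = [4, 4, 4]            # C_i of the attributes still in temp (toy values)
    print(frsar_ops(temp_C, C_S=16, C_Q=3, n=300))
    print(proposed_ops(temp_C, C_S=16, domains=[900, 400, 0], n=300))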


Analysis by growth rate: Here we analyze both algorithms by their growth rate with increase in the level of the search tree, and prove that the proposed algorithm grows no faster than FRSAR.

Let

    N = {0, 1, 2, 3, …}
    R = the set of real numbers
    R⁺ = the set of positive reals
    R* = R⁺ ∪ {0}

Definition 4 (Sara, 1988). Let f: N → R*. O(f) is the set of functions g: N → R* such that for some c ∈ R⁺ and some n₀ ∈ N, g(n) ≤ c·f(n) for all n ≥ n₀.

O(f) is the set of functions which grow no faster than f.

Let C be the proposed algorithm and K be the FRSAR algorithm. At some arbitrary level v (v ≠ 1), we write the computation counts of both algorithms as functions of n: C_v(n) and K_v(n).

From the definition of the compact computational domain and Eq. (4),

    domain_i = Σ_{l=1}^{C_Q} Σ_{k∈{1,…,C_i×C_S}} |D̲_{F_l}(F_ik)| < Σ_{l=1}^{C_Q} Σ_{k∈{1,…,C_i×C_S}} n = C_i × C_S × C_Q × n

so that

    C_v(n) = Σ_{i∈temp} (C_i × C_S + 2 × domain_i + n) < Σ_{i∈temp} (C_i × C_S + 2 × C_i × C_S × C_Q × n + n) = K_v(n)

    ⟹ C_v(n) < K_v(n)

From Definition 4, algorithm C is of order O(K), i.e., it grows no faster than K; moreover, by the inequality above, its cost at every level v is strictly smaller than that of FRSAR.

For level 1, the relationship between the two algorithms is not rigorous. Depending on the fuzzy partitions and the number of data patterns, the computations needed by the proposed algorithm may be slightly higher or lower than FRSAR, due to the computation of the compact domains. However, this is compensated by the high computational gain of the proposed algorithm at subsequent levels.

In fact, the computational growth rate of the proposed algorithm is very slow compared to FRSAR, because compact computational domains at subsequent levels are merely intersections of compact domains at former levels. These intersections often result in empty sets, which require no computations as per the proposed definition.

Within a finite time, the proposed feature selection algorithm will converge to some S ⊆ P, without generating the problem of multiple branching and without getting trapped in an infinite search loop. After a certain level v, at the (v+1)th level, the algorithm (i) may not reject any variable and select one variable, (ii) may reject more than one variable and select one variable, or (iii) may reject all the variables without selecting a single variable. So at level v, if |temp|_v is the cardinality of temp, then |temp|_{v+1} < |temp|_v. The FRSAR algorithm lacks this property. FRSAR adds a new variable to the current subset of features if the dependency degree is higher than the best achieved up to the current level (i.e., γ_{R∪{P_i}}(Q) > γ_T(Q)). If this condition does not hold for at least one combination of variables, then the algorithm will not add any variable at the new search level of the tree. This results in an infinite search loop (see the computational experiments in Section 5). Further, in the FRSAR algorithm, if at a certain level v we reach the condition γ_best = γ_prev, the algorithm terminates, selecting a set of significant attributes S up to level v. However, it is quite possible that for other combinations, higher functional dependency could have been achieved.

5. Computational experiments

The behavior of the proposed algorithm is examined against FRSAR on 6 standard datasets, of which the first 3 are regression and the last 3 are classification problems. Their features are summarized in Table 1; the third column indicates the input/output triangular fuzzy partitions. Each selected dataset is first split into two parts: a training set composed of 50% randomly chosen patterns, and a testing set of the remaining 50% of the patterns.


Table 1
Summary of datasets used in experiments

Dataset               No. of attributes   Fuzzy sets    Samples
Operator              5                   6/6           70
Gas furnace           10                  4/4           296
Synthetic function    4                   9/9           100
Thyroid gland         5                   3, 2-class    215
Diabetes              8                   3, 2-class    768
Veteran lung cancer   7                   3, 2-class    137

The training set has been fuzzified using the FCM algorithm (Bezdek, 1981). The clustered raw data generated by the FCM algorithm have been approximated by triangular fuzzy sets using the convex hull of the clustered raw data, determined by the MATLAB function 'convhull'. For regression problems, both inputs and output are fuzzified, while for classification problems only the inputs have been fuzzified; the outputs are crisp classification sets. Feature selection is then performed on the fuzzified data using the proposed and FRSAR algorithms.

After selection of the significant features, a fuzzy model has been developed on the reduced set of features. Wang and Mendel's approach (Wang and Mendel, 1992) has been used for fuzzy rule extraction in the regression problems, and the fuzzy-rough rule extraction algorithm (Bhatt and Gopal, 2004) has been used for rule extraction in the classification problems. After extracting the fuzzy classification/regression rules, the parameters of the triangular membership functions have been optimized using a genetic algorithm to improve classification/regression accuracy.

Table 2
Features selected: a—FRSAR, b—proposed algorithm

Dataset               a                         b
Operator              —*                        [3]
Gas furnace           [1 10 9 8 3 4 7]**        [1 7 6]
Synthetic function    [2 3 4 1]                 [1 2]
Thyroid gland         [5 3 1 2 4]               [2 1 3]
Diabetes              [5 2 6 1 4 7 3 8]         [5 2 3 4]
Veteran lung cancer   [3 4 6 2 7 1 5]           [3 4 6 7]

* The algorithm did not converge and was trapped in an infinite search loop.
** The algorithm did not converge within 10 hours, so we terminated it early; the answers reported in this paper are with respect to that early termination.

Table 3
Comparison of the proposed algorithm with FRSAR on each dataset: computation time per level of the search tree, total convergence time, speed-up factor, and accuracy of the fuzzy model built on the selected features. [The body of this table and its footnotes did not survive text extraction; the recoverable fragments include speed-up factors as high as 622 and notes that FRSAR failed to converge on some datasets.]


A triangular fuzzy set A can be represented by three parameters (a, b, c), where the degree of membership of a pattern u to the triangular fuzzy set A is given by

    μ_A(u) = 0,                 u ≤ a
    μ_A(u) = (u − a)/(b − a),   a < u ≤ b
    μ_A(u) = (c − u)/(c − b),   b < u ≤ c
    μ_A(u) = 0,                 u > c

We have optimized these three parameters for each attribute's fuzzy set, within each fuzzy classification/regression rule, using a genetic algorithm.
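A direct transcription of this membership function (parameter names as above):

    def tri_membership(u, a, b, c):
        """Membership of pattern u in the triangular fuzzy set (a, b, c)."""
        if u <= a or u >= c:
            return 0.0
        if u <= b:
            return (u - a) / (b - a)
        return (c - u) / (c - b)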

The genetic algorithm operates by generating somewhat random solutions called chromosomes to form a set of possible solutions called a population. In each generation, or iteration of the algorithm, the genetic algorithm constructs a new population using genetic operators such as mutation and crossover. Each generation yields a solution that is the same as or better than that of the previous generation (Jang et al., 1997).

The methodology for the optimization of fuzzy classification/regression rules through tuning of membership functions has been adopted from (Janikow, 1996).

Parameters for the genetic algorithm are as follows:

Maximum number of generations: 3000
Population size: 30
Mutation probability: 0.5
Crossover probability: 0.5

Three parameters have been used for comparison of the two algorithms: (1) the time to perform each level's operations, to compare growth rates; (2) the total convergence time, to compare speed; and (3) the accuracy of the developed fuzzy model, to compare the reliability of the selected features. The speed-up factor measures the ratio of the total convergence time of FRSAR to the convergence time of the proposed algorithm.

Accuracy has been measured by applying the testing set to the developed fuzzy model. For classification problems, the performance index is the number of samples classified correctly divided by the total number of samples.

Table 4
Comparison of the proposed feature selection algorithm with some well-known results available from the literature for the diabetes dataset

                         Proposed (after model optimization)   FNN     SNR     NNFS    OFEI    FQI
# of selected features   4                                     2       1       2.03    2       2
Accuracy (%)             77.23                                 76.81   73.53   74.29   75.85   75.28

*FNN—feature selection with neural networks (Verikas and Bacauskiene, 2002); SNR—signal-to-noise ratios (Bauer et al., 2000); NNFS—neural network feature selector (Setiono and Liu, 1997); OFEI—overall feature evaluation index (De et al., 1997); FQI—feature quality index (De et al., 1997).

Table 5
Comparison of the proposed feature selection algorithm with some well-known results available from the literature for the operator, gas furnace and nonlinear synthetic function datasets; a lower accuracy value means better performance

                     Proposed (after model optimization)   Fuzzy-neural approach    SY model
                     # features   Accuracy                 # features   Accuracy    # features   Accuracy
Operator             1            0.0005                   —            0.002       NA           NA
Gas furnace          3            0.0014                   5            0.071       3            0.190
Synthetic function   2            0.0018                   2            0.0049      NA           NA

NA—not applicable. Fuzzy-neural approach (Lin and Cunningham, 1995); SY model (Sugeno and Yasukawa, 1993). (One feature-count entry in the fuzzy-neural column is illegible in the source and is marked '—'.)


For regression problems, performance is defined as (Lin and Cunningham, 1995)

    PI = (1/n) Σ_{i=1}^{n} (Q_i − Q̂_i)²

where Q_i is the actual output and Q̂_i is the model output.
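As a transcription of this index (assuming the 1/n normalization shown above, i.e., a mean squared error):

    def performance_index(actual, model):
        """PI: mean squared deviation between actual and model outputs."""
        n = len(actual)
        return sum((q - qm) ** 2 for q, qm in zip(actual, model)) / n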

The sources for the datasets are: operator and gas furnace (Sugeno and Yasukawa, 1993), synthetic function (Lin and Cunningham, 1995), thyroid gland and diabetes (Black and Merz, 1998), and Veteran lung cancer (http://lib.stat.cmu.edu/datasets/veteran).

The features selected (in order) by both algorithms for each dataset are given in Table 2.

Results are summarized in Table 3. The results show that the growth rate of the proposed algorithm between two consecutive levels is very low. In each case the algorithm requires evaluating fewer nodes and levels, and accuracy has been improved on the set of features selected by the proposed algorithm.

Tables 4 and 5 compare the results of the proposed feature selection algorithm with some of the well-known results available from the literature.

6. Conclusion

In this paper, we have reported a difficulty with the application of the descriptive dimensionality reduction algorithm FRSAR (Jenson and Shen, 2002). In applying this algorithm to the dimensionality reduction stage of various benchmark problems, we found that the computational complexity of the algorithm increases exponentially with the Cartesian product of the input frame of cognition and multiplicatively with the number of data patterns. Also, the algorithm often fails to converge on real datasets due to its poorly designed termination criteria. Some critical issues concerning the complexity and convergence of the feature selection algorithm have been discussed in detail. Based on natural properties of the fuzzy t-norm and t-conorm, the concept of fuzzy-rough sets on compact computational domain is put forward, which is then utilized to improve the existing algorithm. We have restructured the algorithm with efficient termination criteria. It has been proved that, in every case, the growth rate of the proposed algorithm is strictly less than that of FRSAR.

Our proposed algorithm has some similarities with the adaptive floating search algorithm (Somol et al., 1999). However, our proposed algorithm does not require a predefined, user-specified absolute generalization limit, adaptive calculation of the absolute generalization limit, or termination criteria. Further, the implementation of our proposed algorithm is very straightforward compared to ASFFS (Somol et al., 1999).

As further research, dynamic fuzzy-rough feature selection will be explored to extract the most stable feature subset by randomly sampling the data patterns and calculating the significant features from each sampled table.

Acknowledgments

The authors would like to thank the anonymous referees for their helpful and precious comments. The authors are also thankful to Prof. Zdislaw Pawlak, Polish Academy of Sciences, Warsaw, Poland, for sending us some of his research notes on rough set theory.

References

Basak, J., De, R.K., Pal, S.K., 1998. Unsupervised feature selection using neuro-fuzzy approach. Pattern Recognition Lett. 19, 997-1006.

Bauer, K.W., Alsing, S.G., Greene, K.A., 2000. Feature screening using signal-to-noise ratios. Neurocomputing 31, 29-44.

Bezdek, J.C., 1981. Pattern Recognition with Fuzzy Objective Function Algorithm. Plenum, New York.

Bhatt, R.B., Gopal, M., 2004. Induction of weighted and interpretable fuzzy classification rules for medical informatics. In: Proceedings of International Conference on Systemics, Cybernetics, and Informatics, ICSCI 2004, February 12-15, Hyderabad, India, pp. 371-376.

Black, C.L., Merz, C.J., 1998. UCI Repository of machine learning databases; http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.

De, R.K., Pal, N.R., Pal, S.K., 1997. Feature analysis: neural network and fuzzy set theoretic approaches. Pattern Recognition 30 (10), 1579-1590.

De, R.K., Basak, J., Pal, S.K., 1999. Unsupervised feature selection using neuro-fuzzy approach. Pattern Recognition Lett. 19, 997-1006.

Fukunaga, K., 1972. Introduction to Statistical Pattern Recognition. Academic Press, New York.

Jang, J.-S.R., Sun, C.-T., Mizutani, E., 1997. Neuro-fuzzy and Soft Computing. Prentice-Hall, Englewood Cliffs, NJ.

Janikow, C.Z., 1996. A genetic algorithm method for optimizing fuzzy decision trees. Inform. Sci. 89, 275-296.

Jenson, R., Shen, Q., 2002. Fuzzy-rough sets for descriptive dimensionality reduction. In: Proceedings of IEEE International Conference on Fuzzy Systems, FUZZ-IEEE'02, May 12-17, pp. 29-34.

Lin, Y., Cunningham, G.A., 1995. A new approach to fuzzy-neural system modeling. IEEE Trans. Fuzzy Syst. 3 (2), 190-198.

Mucciardi, A., Gose, E.E., 1971. A comparison of seven techniques for choosing subsets of pattern recognition properties. IEEE Trans. Comput. 20 (9), 1023-1031.

Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer, Norwell, MA.

Radzikowska, A.M., Kerre, E.E., 2002. A comparative study on fuzzy-rough sets. Fuzzy Sets Syst. 126, 137-155.

Sara, B., 1988. Computer Algorithms: Introduction to Design and Analysis. Addison-Wesley.

Setiono, R., Liu, H., 1997. Neural-network feature selector. IEEE Trans. Neural Networks 8 (3), 654-662.

Somol, P., Pudil, P., Novovikova, 1999. Adaptive floating search feature selection. Pattern Recognition Lett. 20 (11-13), 1157-1163.

Steppe, J.M., Bauer, K.W., Rogers, S.K., 1996. Integrated feature and architecture selection. IEEE Trans. Neural Networks 7 (4), 1007-1014.

Sugeno, M., Yasukawa, T., 1993. A fuzzy-logic-based approach to qualitative modeling. IEEE Trans. Fuzzy Syst. 1 (1), 7-31.

Verikas, A., Bacauskiene, M., 2002. Feature selection with neural networks. Pattern Recognition Lett. 23, 1323-1335.

Wang, L.-X., Mendel, J.M., 1992. Generating fuzzy rules by learning from examples. IEEE Trans. SMC 22 (6), 1414-1427.
