• No results found

Applications of information theory in chemical graph theory

N/A
N/A
Protected

Academic year: 2022

Share "Applications of information theory in chemical graph theory"

Copied!
14
0
0

Loading.... (view fulltext now)

Full text

(1)

Indian Journal of Chemistry Vol. 42A, June 2003, pp. 1227-1240

Applications of information theory in chemical graph theory

t

Elena V Konstantinova*, Vladimir A Skorobogatov, Maxim V Vidyuk Sobolev Institute of Mathematics. Russian Academy of Sciences.

Novosibirsk. 630090. Russia. e konsta@math.nsc.ru. skrbg@maitar.org.il

Budker Institute of Nuclear Physics. Russian Academy of Sciences. Novosibirsk. 630090. Russia. vidyuk@inp.nsk.su Received 30 December 2002

Information theory has been used in various branches of science. During recent years it is applied extensively in chemical graph theory for describing chemical structures and for providing good correlations between physico-chemical and structural properties. The application of information theory to the problem of characterizing molecular structures is presented in the paper. The information indices based on the distance in a graph are considered with respect to their correlating ability and discriminating power.

We briefly review here selected topics in chemical graph theory, in particular, those being concerned with the use of information theory to characterizing molecular structure. Chemical graph theory is interested in the nature of chemical structurel. All structural formulae of chemical compounds are molecular graphs where vertices represent atoms and edges represent chemical bonds. Figure 1 gives the schematic representation of the derivation of a molecular graph from an alkane molecule. The molecular graph is the hydrogen-suppressed one. That is the commonly used representation in chemical graph theory because hydrogen atoms are small and so add very little to the overall size of the molecule.

Using the molecular graph one can obtain, for example, the carbon-number index (the number of carbon atoms in the hydrocarbon molecule) that is known since 1844 as one of the first topological indices used in chemistry to characterize molecular structures. Basically topological index expresses in numerical form the topology of the chemical species it presents. Topological indices are designed by transforming a molecular graph into a number and possess the remarkable ability of being able to correlate and predict a very wide spectrum of properties for a vast range of molecular species. The carbon-number index is well known to provide an effective measure of the molecular volume: for the members of homologous series, the molecular volume is known to be directly proportional to the carbon- number index2.

tThis work was supported by the Russian fund of Fundamental investigations through Grant 03-01-00796.

The construction and investigation of topological indices that could uniquely characterize molecular topology is one of the main directions of chemical graph theor/. The isomer discrimination, structure- property relationships and structure-actIVIty correlations, and the design of compounds of desired properties are the most important trends in chemical graph theory where topological indices are commonly

A Ikanc molecule

Chemical formula

H H H

I I I

H - C - C - C - H

I I I

H H H

Siructural formula

Molecular hydrogen·suppressed graph

II

=

3

Topological index

Fig. I- Schematic representation of the derivation of the molecular graph and the carbon-number index from an alkane molecule

(2)

1228 INDIAN J CHEM, SEC. A, JUNE 2003

used. We concentrate our attention on the first and second trends. Moreover, we consider all the aspects with respect to the information theory application.

It is well-known that application of some ideas from one scientific field to another often gives a new view on the problems. Information, one of the most general ideas in contemporary science, should be expected to penetrate in various branches of science.

Indeed, applications of information theory to molecular graphs have produced results that are important in chemistry.

The science of information theory has grown mainly out of the pioneering studies of Shannon3,

Ashb/, Brilloui n5 and Kolmogorov6. There is more than one version of information theory. In Shannon's statistical information theory, information is measured as reduced uncertainty of the system. Ashb/

describes information as a measure of variety. In the algorithmic theory of Kolmogorov, the quantity of information is defined as the minimal length of a program that allows a one-to-one transformation of an object (set) into another one.

Applying information theory to material structures like atoms, molecules, crystals, etc., as well as to different mathematical structures like sets, groups, graphs, etc., the interpretation given by Mowshovitz7 in 1968 is more appropriate.

Let a given system [ having 11 elements be regarded according to a certain equivalence relation, into k equivalence classes with cardinalities Hi '

Considering all the n elements partitioned into k classes, we can define the probability Pi' i = 1, ... , k , for a randomly selected element of this system to be found in the i -th class. Therefore, a finite probability scheme may be associated with the following structure:

2, 3, ...

k

1

112 I1J ' " 11k

P2 p) Pk

where n= ~k n., p. =nln and ~k Pi =1.

L.J,=I I " L..J,=I

The information content of a system [ with n elements is defined by the relation5

k

[ = nlog2 n -

L

ni log211i .. , (1)

;=1

The logarithm is taken at basis 2 for measuring the information contents in bits. Another information

measure is the mean information content of one element of the system I defined by means of the total information content or by the Shannon relation3:

k

I =

1/11

= - L

Pi log2 Pi ... (2)

;=1

The application of information theory to different systems or structures is based on the possibility of constructing a finite probability scheme for every system. One can mention here, that the criterion for partitioning the elements of a given system is not unique. The number of information measures is equal to the number of ways in which a set of 11 elements may be partitioned into different subsets, that is, the number of Young diagrams for a given n. It is always possible to select for any system several information measures, each of them closely connected with certain properties of the system. They reflect the essence of the idea of information, given by Ashb/ as a measure of the variety in a given system. This idea was used in graph theory and in chemical graph theory for characterizing graphs as well as molecular graphs and molecular structures.

At first information theory was applied to graphs in 1955 by Rashevskl, who defined the so-called topological information content of the graph [top. His definition is based on the partitioning of the vertices of a given graph into classes of equivalent vertices having the same valencies. Trucc09 in 1956 made this definition more precise on the basis of an automorphism group of the graphs. In the latter case, two vertices are considered equivalent if they belong to the same orbit of the automorphism group, i.e., if they can interchange preserving the adjacency of the graph. Rashevsky 10 used the topological information in studying the possibility of self-generation of the life on earth. As for chemical structures, information theory has been successfully applied in the study of various molecular properties 1 1-13, in the field of

. 1415 h ' . 16-19

molecular dynamics ' and quantum c emlstry , in the description of the electronic structure of atoms20, in the interpretation of the Pauli principle and Hund rule21.

A molecular topology determines a large number of molecular properties. It was found in the last years that some biological activities of molecules, and even carcinogenecity, are closely related to a molecular topology. Thus, it is of a pertinent interest for chemistry (as well as for other natural sciences) to

(3)

KONSTANTINOVA et 01.: APPLICATIONS OF INFORMATION THEORY IN CHEMICAL GRAPH THEORY 1229

have some quantitative measure reflecting the essential features of a given topological structure. As it was mentioned above such measures are usually called topological indices in chemical graph theory. A lot of such indices have been suggested in the last 50 years22-36. They have usually correlated more or less satisfactorily with the molecular properties but could not discriminate well between structural isomers, often providing the same index for different isomers.

The first topological index reflecting the topological structure of a molecular graph was proposed by Harry Wiener22 in 1947. The Wiener number W was defined as the sum of all edges between all pairs of carbon atoms in hydrocarbons. It gives a good correlation with the thermodynamic properties of saturated hydrocarbon molecules but doesn't discriminate well among structural isomers.

Bonchev and Trinajstic37

applied information theory to the problem of characterizing molecular structures and molecular topology38-42 by means of information indices43 that are just the quantitative measure of a given topological structure. The advantage of such kind of indices is in that they may be used directly as simple numerical descriptors in a comparison with physical, chemical or biological parameters of molecules in quantitative structure- property relationships and in structure-actiVIty relationshi ps44-46. It can also be noted that information indices normally have greater discriminating power for isomers than the respective topological indices.

The reasons for this are that information indices are not restricted to integral values as topological indices frequently are and information indices are formed from a summation of different magnitudes which is usually greater in number than that for the topological indices.

We present here some results concerning information indices applications to characterizing molecular structures. The paper is organized in the following way. First of all, the information indices based on the distance in a graph are reviewed. Then

I 2 3 4 5

< > - - - - < > - - - <

the numerical results of discriminating tests of indices on structural isomers and graphs are presented. Lastly, the correlating ability of information indices is demonstrated on the classes of organometallic compound.

Information Indices Based on the Distance in a Graph

One can start looking for possible information indices among the graph invariants. Information indices are constructed for various matrices (an adjacency matrix, an incidence matrix, a distance matrix, a layer matrix) and also for some topological indices such as the Wiener number.

In 1977 Bonchev and Trinajstic37

introduced the information on distances to explain the molecular branching that is the critical parameter determining the relative magnitude of various molecular thermodynamic properties. Firstly they used the information indices defined by Rashevsky for graphs.

However, these indices are not suitable for describing branching properties of graphs since they cannot reflect the essence of branching. This may be exemplified by considering trees with five vertices presented in Fig. 2. The five vertices are partitioned in different orbits In the above three graphs:

7;(2,2,1), T2(2,1,1,1), T3(4,1). Using Eq.(l), the following values for the information content in bits are obtained: IT. =7.61,lT =9.61,lT =3.61. One can see

I 1 ]

that this index cannot reproduce the obvious fact that the branching increases from a chain, through a branched tree, to a star.

So another approach to find an appropriate information measure of branching was used. One of the graph invariants is the distance matrix. Let C be a connected graph with the set of vertices V(C), n

=1

V(C)

I.

The distance d(u, v) between vertices u and v in a graph C is the length of the shortest path that connects these vertices. The distance matrix D

=11

dij

II,

i, j

=

1, ... , n, contains the distances

+

T,: /1.51, /2,41.131 To: {4, 51, /11.121, 131 T J: II, 2, 3, 4 I. 151

Fig. 2- Trees with five vertices and their orbits

(4)

1230 INDIAN J CHEM, SEC. A, JUNE 2003

d;j = d(i, j) between the every p~ur of connected vertices. Branching is connected with the distance matrix in an obvious way, since with increasing branching the distances in the graph become smaller.

This can easily be seen from the distance matrices of the trees ~,T2,T, presented in Fig. 2:

0 I 2 3 4 0 I 2 3 3

I 0 I 2 3 I 0 I 2 2

O(~)= 2 I 0 I 2 O(T,) = 2 I 0 I I

3 2 I 0 I 3 2 I 0 2

4 3 2 0 3 2 I 2 0

0 2 2 2 2 0 2 2

0(7;)= 2 2 0 2 I

2 2 2 0 I 0

Wiener22 first made use of the connection between the distance matrix and branching defining the topological index

I "

W = -

I

dij

2 '.j=1

... (3)

However, the Wiener number often has the same value for different graphs. For reducing degeneracies, Bonchev and Trinajstic introduced the information I D

on distances in a graph, considering all the matrix elements of distance matrix d;j as elements of a finite probability scheme associated with the graph in question. Let the distance of a value i appears 2n;

times in the distance matrix, where I ~ i ~ d(C) and d(C) = max;,jEV(G) dCi, j) is the diameter of a graph.

Then n2 matrix elements d;j are partitioned into d(C) + 1 groups, and d(C) + 1 group contains II zeros that are the diagonal matrix elements. With each one of these d(C)

+

1 groups can be associated a certain probability for a randomly chosen distance d;j to be in the i -th group:

[ ~.

" 2n2 P2 2, 2nd(C) Pd(G) d (G)

j

where p;

=

21l/n2 and Po

=

I/In2

=

lin .

The information on distances of a given graph will then, according to Eqs (I), (2) be

dcm I = 112

log2 /12

-l1log2 11 -

L

21l; log2 21l;

;-:::1

- I I d(C) 211 211

I =--Ioa - -02 " -L.J 2 ' loa - ' 02 2 n n ;=1 n /1

... (4)

... (5)

Since 0 is a symmetric matrix, Bonchev and Trinajstic consider, for simplicity of discussion, only the upper triangular submatrix that does preserve all properties of the information measure. In that case, the following expressions for the mean and total information on distances are obtained:

( 1) ( 1) d(Ci)

E_ l l n - la l l n - _ " l a If) - 002 L.J n; 002 II;

2 2 ;=1

... (6)

- E d(G) 21l 217.

If)

=-L '

log2 "

;=1 n(1l -1) n(n -I) ... (7)

where 11(,;-1) is the total number of upper off-diagonal elements in the distance matrix D. These information indices correspond to the information on the distribution of distances in the gt'aph according to their equality or nonequality and depend on the partitioning of the total number of distances into classes.

Using Eq. (7), one can obtain the following information on distances in the three graphs with five vertices presented in Fig. 2:

2 3 4 2 3

~: 4 3 2 11;

T 2

4 4 2 l1i

4 3 2 I

10 10 10 10 Pi

4 4 2

Pi 10 10 III

T ;

(~) = 1.85

T ; (T2) =

1.52

2

T, : 4 6 ni

4 I>

Pi 10 10

T;

(T,) = 0.97

(5)

KONSTANTINOVA el 01.: APPLICATIONS OF INFORMATION THEORY IN CHEMICAL GRAPH THEORY 1231

One can see that I ~ as well as 1 ~

=

1/(/;-1)

T :

reproduces the branching properiies of trees ~, T2, Tl decreasing regularly with increased branching.

Moreover, Bonchev and Trinajstie have shown that

I Z

is a rather sensitive measure of branching having different values for all trees with 11 = 4,5,6,7,8 (the total number of trees is 45). The number of all possible distributions of d;j' i.e., number of different

Ig,

increases rapidly with the increase in the number of vertices in the graph. This makes I

g

an appropriate quantity for distinguishing structural isomers.

However, there is another possible information measure that can be defined on the basis of distances in the graph. Bonchev and Trinajstie introduced the information index

I;

as the information on the realized distances in a given graph that depends on the partitioning of the total distance. It is information on the partitioning the Wiener number (that is the total distance of the graph) into groups of distances of the same absolute values. Since the Wiener number is

. ~d(G) .

given by formula W = L.J;~I I It; and following Eqs (1) and (2), we obtain

diG)

l~ =

W iog2 W -

Lin;

iog2

i

(8)

i=1

dI G ) · .

- W

L

1 1

1 =-

n-iog-

D . 1

W

2

W

1=1

(9)

For the three five-vertices trees presented in Figure 2, the following values are obtained:

~: I; =62.93,

T;;

=3.15,

T2 : I; =57.55,

TD w

=3.20, T) : I; = 52.00, T~v = 3.25

It is easy to see that I;~ decreases with branching. It is a more sensitive quantity than the Wiener number since it can distinguish two graphs having the same Wiener number but different i and 12;. It was checked that ,,~v increases regularly with branching at lower values of 12, but at higher ones some irregularity occur and it cannot be used as a good measure of branching.

As for

1; ,

it is a sensitive measure of branching

having different values for all trees with n =4,5,6,7,8 (the total number of trees is 45). Figure 3 presents the pair of trees having the same value of the Wiener number and the different values of

I Z

and

I ; .

The values of information measures

Ig

and

I;

were inspected3? in comparison with several topological indices such as the Wiener number, the greatest eigenvalue of the characteristic polynomial XI' the sum of the polynomial coefficients (or Hosoya index24), the information on polynomial coefficients 1,,(., and Randie connectivity index25

XR '

The inspection of these values indicates the great sensitivity of the two information indices

I Z, I;

to

IV = 48

I :

= 41.0774

I;'

= 203.0586

IV = 46

I~ = 39.5676 I; = 195.5544

IV= 7 1~

I~ = 62.0606 I; = 328.0287

IV = 67

I~ = 55.3448 I; = J 12.3888

IV = 62

I~ = 52.3253 I; = 189.3857

Fig. J--The pairs of trees having the same value of the Wiener number and the different values of

If ;

and I~

(6)

1232 INDIAN J CHEM, SEC. A, JUNE 2003

all structural detai Is of the tree graphs. There are no two graphs among the 45 graphs examined that have the same information on the graph distances. All the other listed indices are not so speci fico and they often have the same value for different graphs. The same results were obtained for information indices 7;~ and

- IV

IJ) . Their values were tested on the set of 45 tree graphs. There are no two graphs having the same values of these indices.

Thus the information measures introduced on the basis of distance matrix appear to be very appropriate indices for discrimination of graphs. The number of different

I;

for the graphs having the same number of vertices is limited by the number of all the possible distributions n(1/ -1)/2 graph edges into k different groups. Since the number increases rapidly with increasing values of n, one may expect the information on graph distances to have a good ability of differentiation between structural isomers even for very large systems. It was one of the main results obtained by Bonchev and Trinajstic.

Later Konstantinova, Paleev and Diudea47-49 have confirmed that the information approach allows to design very sensitive information indices based on the distance in a graph. The information distance index of vertex i was introduced in Ref. 47 and defined as follows:

~ d d H f) (i) = -L.J _ 'J_Iog? _ IJ_,

j=l d(i) -d(i) ... (10)

where d(i) = '" L.J /I j=l d IJ is the distance of a vertex i.

Then the information distance index of graph vertices takes the form

H; = i Hf)(i) ... (11)

;=1

The same approach was applied to the layer matrix A=IIAij

II,

i=l, ... ,n,j=I, ... ,d(G), where \ is equal to the number of vertices located at a distance j from a vertex i. The information layer index of graph vertices is defined by the formula

... (12)

where e(i) = max ,,,,V(G) dei, v) IS the vertex eccentricity. It will be shown later that indices

H ;

and H~ have a great discriminating power among structural isomers.

One more information index based on the distance matrix was considered by Skorobogatov et al.50 in structure-activity correlations. The information index

H2 is defined by

_ Lk d (i)k; I d (i)k;

H - - - - o a - -

2 ;=1 2W 02 2W ' ... (13)

where k;, i = 1, ... , k, is the number of vertices having the distance d(i). This index gives the linear cOITelations with information mass-spectrum indices on the several classes of organic and organometallic compounds50-54

.

D'yachkov and Konstantinova55 consider the entropy HD , the marginal entropy H~ and the information ID based on the distance matrix as follows

/I /I d.. d ..

Ho -. =-~ L..J ~L.; _'J loa b2 _ 'J

;=1 j=l 2W 2W ... (14)

= - 1 [ 2W log2 2W - d(G)

L

2n; i log2 i

J

2W ;=1

H; == -~ d(i) 10 d(i) o

f:(

2W g2 2W

... (15)

1 ( /I '

= - 2W log2 2W -

L

dU)log2 d(i) )

2W ;=1 ,

(16)

where /1.; is the number of vertex pairs being at a distance i from each other and W =L~~IG) i 12;.

Let I; be the number of matrix elements equal to l.

The entropy H;., the margi nal entropies H

1

and H ~

and the information I). are defined by

"d(G)

A.. A. .

' " ' " IJ I 'J

H). == -L.J L.J

ogz ..

;=1 j=1 12(n -1) n(n -1)

= - -

1

_-

( n.(n -1) iogz n(n -1) -

II,

max i iogz i

J

n(n. 1) ;=1

... (17)

(7)

KONSTANTINOVA el al.: APPLICATIONS OF INFORMATlON THEORY IN CHEMICAL GRAPH THEORY 1233

tI(C) 211 . 2n.

H' == -

L

' 100 '

). j=i n(n -1) 02 n(n -I)

1 [ tI«(;j

= _ n(n -1) log2 n(n -1) -

L

2nj log2 2nj

n(n 1) j=i

... (18)

. n-l n-1

H~ == -n log2

=

log2 n

n(n -1) n(n -1) ... (19)

... (20) The information indices

I;,

H D and H

1

are based on the vector ni and the constants nand W that leads to their correlations. In particular, since

I~v = ~ (w

log2 W -

L ~~iG ) n

i i log2

i)

then followi ng Eq. (14) we immediately obtain H D =

I; +

I. So it is

enough to study the only index among them. The index ~w was investigated in discriminating tests as the most known one56,57.

Discriminating Tests

The discriminating power41.58 is one of the basic characteristics of a topological index [ and corresponds to a measure of its ability to distinguish among the nonisomorphic graphs (the structural isomers) by distinct numerical values of an index [.

The theoretical evaluation of the index sensitivity S on a fixed set M of non isomorphic graphs can be achieved by the formula

S = N-N1

N ... (21)

where N

=1

M

1

is the number of graphs in a set M and NI is the number of degeneracies of an index I within a set M. According to the definition, S

=

1 means that among the elements of the set considered, no two non isomorphic graphs have the same value of the index l.

Bonchev and Trinajstie7 investigated the discriminating power of information and topological indices between 45 alkane trees. Basak et al.59 have continued these investigations on the set of 45 alkane trees as well as on the set of 19 monocyclic graphs.

Razinger, Chretien and Dubois58 explicitly pointed out the fact that the discriminating power of the Wiener number is very low in alkane series. The discriminating tests among polycyclic graphs were done by Konstantinova and Paleev47 on the set of 1020 subgraphs of the regular hexagonal and square lattices. Later Konstantinova48 has tested information and topological indices for 2562 subgraphs of the regular hexagonal lattice. Graphs of this class represent the molecular structures of unbranched cata- condensed benzenoid hydrocarbons. The discriminating powers of topological and information indices as well as the Wiener polynomial derivatives were studied by Konstantinova and Diudea49 on 3006 subgraphs of the regular hexagonal lattice and on the set of 347 cycle-containing graphs with ten vertices and three to eight-membered cycle.

An exhaustive analysis of 12 information and topological indices based on the distance in a graph was performed by Konstantinova and Vidyuk56,57 on 1.443.032 polycyclic graphs and "3.473.141 trees. The information indices I D' H~,l)., H)., H;,H; ,H2

,!:;

presented in section 2 and the topological indices such as the Wiener number, the Schultz number, the Balaban number and the Randi6 number were examined in the discriminating tests. The formulas for topological indices are given below.

The Schultz molecular topological index60 is defined by

MTI

= L.

deg(v)· d(v)

+ L.

deg(v)2, ... (22)

"",V(G) VEV(G)

where deg(v) is the vertex degree. This index has found interesting applications in chemistri4 Its discriminating power was investigated by Dobrynin61 for cata-condensed benzenoid graphs.

The average distance sum connectivity was introduced by Balaban29 and defined by

m '" _1.

} =

L.J (d(u)· d(v» 2,

m - n

+

2 ",VEV(G)

... (23)

where m is the number of edges in a graph G.

The Randi6 index25

X

is based on the molecular connectivity and achieved by the formula

X=

L

(deg(u).deg(v)rt ... (24)

u,,,,,,V(G)

(8)

1234 INDIAN] CHEM, SEC. A, JUNE 2003

The numerical results of discriminating tests for considered indices are given below on the sets of polycyclic graphs and trees.

Polycyclic graphs

The polycyclic graphs embedded to the regular hexagonal, square and triangular lattices are tested.

The hexagonal graphs correspond to the structural formulas of planar polycyclic aromatic hydrocarbons62.63

. The values of 12 information and topological indices were calculated for 849.285 hexagonal, 298.382 square and 295.365 triangular graphs. The calculation accuracy for all indices is 10-13. The discriminating powers of indices were obtained in accordance with Eq. (21) and the results are given in Table I, where N is the number of graphs in the respective class.

The data show that the information indices give a much more discriminating power. The indices H;~,lD,H~,H2 have the best result (5=0.999) for hexagonal graphs. All topological indices, excepting J , could not discriminate between these graphs. The same situation is observed for square and triangular graphs. The degeneracy is high for W, MTI, and very low for H;,l D' The information index -w I v discriminates not bad among hexagonal graphs but it doesn't discriminate square and triangular graphs. The opposite situation is observed for the Randi6 index X. Its discriminating power is the lowest one on hexagonal graphs and the highest one on triangular graphs.

Trees

Similar results were obtained on the set of trees. A tree is a connected acyclic graph. The discriminating powers of indices were calculated on the set of

3.490.528 trees up to 21 vertices. The obtained data are given in Table 2. The highest sensitivity corresponds to the information index

H; .

There are no degeneracies of this index, i.e. 5 = 1, on the set of trees up to 17 vertices (the number of trees is

N =81.134). Two trees with 18 vertices and two trees having 19 vertices give the same values of

H ; .

There are only 14 degeneracies on the set of 20 vertices trees, N = 823.065, and only 12 degeneracies on the set of 21 vertices trees, N =2.144.505. All of trees having the same values of this index are presented in Fig. 4. Topological indices show a very low discriminating power.

So the data obtained on the sets of nonisomorphic polycyclic graphs and trees indicate that in common the information indices have greater discriminating power than the topological ones. Another basic characteristic of a topological index is its correlating ability with a molecular property. The information approach to the correlations between structure and reactive capability of molecules is considered in the next section.

Correlating Ability of Information Indices

One of the key problems of a modern chemistry is to find a relation between a structure and a reactive capability of a molecule. The reactive capability of a molecule can be characterized by its mass-'spectrum that contains the information on the ways of a molecule fragmentation and displays the "behavior"

of some molecular fragments that can be interpreted as subgraphs of a molecular graph.

Let us define the information index of the chemical mass-spectrum using the Shannon relation. From the information theory point of view, the mass-spectrum is the distribution of probabilities p ,

= --'-,

A A i

=

1, ... , k,

Table I- The discriminating power of indices on the sets of N hexagonal, square and triangular graphs

10 Hi H). H" H2 H:' -w J X MTI W

N 0 I). D A 10

849285 0.999 0.999 0.997 0.993 0.999 0.999 0.997 0.659 0.998 0.0001 0.004 0.0006

298382 0.997 0.995 0.954 0.811 0.998 0.994 0.906 0.133 0.993 0.005 0.002 0.0003

295365 0.984 0.982 0.844 0.466 0.992 0.981 0.585 0.021 0.986 0.407 0.0008 0.0001 Table 2- The discriminating power of indices on the set of N trees up to 21 vertices

N 10 Hi

D I). H). H" 0 H2 H" ).

T ;

J X MTI W

3490528 0.998 0.912 0.985 0.321 0.999 0.907 0.428 0.683 0.907 0.017 0.00004 0.00001

(9)

KONSTANTINOVA el al.: APPLICATIONS OF INFORMATION THEORY IN CHEMICAL GRAPH THEORY 1235

n

=

20. Contd.

/

H ~ = 7 1.067 195826536 I 8

H ;

= 82.23982070813867

H ; =

76.97820768153596

H ;

= 82.46161593939024

H ~ = 8 1.80354946997848

H;

= 82.74262835086996

/

H; =

82.07740940300192

H ; =

83.25510903225370

H ~ = 82.16707259857088

H ; =

87.30751562364020 Fig. 4 - The pairs of trees having the same values of the information index. H~ . (Conld)

(10)

1236 INDIAN J CHEM. SEC. A, JUNE 2003

n = 21. Contd.

~

\

- - ' r /

I

1<

\

H ; =

87.93993238547368

~-

H ; =

88.30758149768472

H~

=

88.38614500060488

H; =88.55999706571380

H ~ = 88.56334987995548 Fig. 4 - The pairs of trees having the same values of the information index H~.

k j

J -

2

0 - - - 0 - - - 6 - --I 0::'

'0

8

10

4,5 '7.8

I I

d(i)

2

I I

16 18 2U

12 14

Fig. 5--The distance vertex spectrum

of the ions formation, where A; is the mass-spectrum amplitude of the i -th ion, A =

2: ; = 1

A;, and k is the

number of peaks in the mass-spectrum. The amplitude information index HA is defined by

kA

A HA =-~ L.J -.100 02 ---1...

; = 1

A A ... (25)

On the other hand, a molecular graph that represents a structural formula of a molecule could be

used for defining specific structural features of a molecule by means of information indices based on the distance in a molecular graph. As it was mentioned in introduction the topological index is .designed by transforming a molecular graph into a number and it expresses in a numerical form the topology of the chemical species it presents.

Moreover, it was shown by Skorobogatov et aI.50 that some information indices have a "chemical" spectral interpretation. Let us consider the information index H 2 that is based on the vertex distance d (i) and the

(11)

KONSTANTINOVA el al.: APPLICATIONS OF INFORMATION THEORY IN CHEMICAL GRAPH THEORY 1237

I, 4

7

- I 2 3 6

7. 8 4,5

- 8

5

I 6 2 3

-

I I I aUlomelflclly classes

Fig. 6-The autometricity vertex spectrum

number k; of vertices having the distance d (i) , and let us define the pairs (d(i),k) as the points in Euclidean plane. Then the distance vertex spectrum can be pictured on the plane by the lines ((d (i), k;), (d (i). 0»), i

=

1, ... , k. Figure 5 shows the distance vertex spectrum for the tree. As one can see, there is even the visual correspondence between the chemical mass-spectra and the topological spectra of molecular graphs.

One more topological spectrum called the autometricity vertex spectrum was defined on the basis of a layer matrix. It could be happen that some rows of this matrix are the same. It means that the corresponding vertices belong to one and the same class of autometricity. By this way the vertex set is divided into the autometricity classes. The autometricity vertex spectrum is defined by the autometricity classes and by the number

Il;

of

vertices in the

i

-th class of autometricity.

Figure 6 shows the autometr;city vertex spectrum for the same tree. Its canonical layer matrix is the following one:

1 ')

...

3 7, 8 I I 3 2 I

): = 3 3 6

2 3 2 2

3 3 4, 5

4 3 3

The rows are ordered with respect to their lengths and then the rows are ordered lexicographically. The numeration of autometricity classes corresponds to the numeration of rows in the canonical layer matrix. Let us notice, that there is a finite probability scheme on the vertex set with respect to the autometricity ratio and one can define the information index

k I1A

,l

H a

=-"-'

L..J 10b2 0 - '

;=1 Il 11

... (26)

The information indices H2,H" and HA based on the topological as well as the chemical spectra provide the presentation of a molecular graph and a chemical structure in terms of the same quantitative scale because their values are expressed in information bits. These indices are suitable ones for finding structure-activity correlations.

At first this approach was applied to the class of ferrocene derivatives C"FeCsH.R, where R is a substituent5o. The linear correlations between the information indices H2 and HA, and H" and HA were found. It was shown that the initial set of molecular structures is divided into three subsets by the linear regression. In the cases considered the correlation ratio ranges from 0.94 to 0.975. The example of the autometricity vertex spectrum for a molecular structure of this class is given in Fig. 7.

Figure 8 shows the correlations between the information indices !fa and HA on the set of ferrocene derivatives. The correlation ratio r and the number of structures

n

for each subset are presented.

The similar results were obtained for infomlation indices H 2 and HA. The correlations between them were found as follows:

1 : HA=I .09+0.76H2 (r=0.950, 11=9) 2 : HA= 1.30+0.99H2 (r=0.975, 11= 17) 3 : HA=3.96+0.40H2 (r=0.950, 1l=6)

Nekrasov and co-workers51-54 have used this approach for finding correlation on the several classes ( - 20) of organic and organometallic compounds. In particular, they have found the correlations on the set

(12)

1238 [NDIAN J CHEM, SEC. A. JUNE 2003

~/ H H

H " ~ c C "-:

H H

Fe ---)

H?ct\H

H ~H

H

a) the structure fomlUla of methylferrocene

3 9

3 3

4 9 7

A= 4 3 9

I 3 9

10 10 3 4 12 7 4 9 10

7 3 9 7 3 7 10

20 14

3

19

9 18

b) the molecular graph of methylferrocene

12. 13,17,18,19,20,21:; X,

22,23,24 :; X,

1, 2, 7 , 8 , 9 ,10,II :;X,

15 :; ~

14,16 :; X5

6 :; X.

4 3, 5

:; X, :; Xs 22

23

c) the canonical layer matrix of the methylferrocene molecular

n, l-

7 - 6 - 5 - 4 - 3 - 2 - 1-

o 2

I

I 3 4

I I

I 1

5 6 7 8 1 autometnclty c1assesXi 9

d) the autometricity vertex spectrum

Fig. 7- The example of the autometricity vertex spectrum for the molecular graph of methylferrocene

of arylsilanes. There are three subsets of arylsilanes and each of them has specific structure peculiarities. Figure 9 shows the correlations between the information indices H 2 and HA on the set of arylsilanes, Line 1 corresponds to the set of phenylmethylsilanes, line 2 corresponds to the set of phenylallyl- and phenylvinylsilanes and line 3 corresponds to the set of vinylmethylsilanes. As one can see from the picture the correlation ratio for all the cases ranges from 0.856 to 0.982.

HA

o

4

H,

1.25 2.25 3.25 4.25 5.25

I: HA = 1.70 + 0.57 H, (r = 0.940.11 = 10) 2: HA = 1.24 + 0.98 H, (r = 0.975, n = 16) 3 : HA = 3.91 + 0.41 H, (r = 0.g40. '1 = 6)

Fig. 8- The correlations between the information indices H" and HA on the set of ferrocene derivatives

HA

4

2

1.25

PhMeSiR f

2.25

o /

ViIlMeSiR !

PhAll(Vin)SiR

3.25 4.25

I 5.25 I: HA = -5.9 + 2.9 HJ (r = 0.945, II = 12) 2: HA = -8.3 + 2,8 HJ (r = 0.982. II = 10) 3: HA = -8.5 + 3.4 Hl (r = 0.856, II = 5)

)

Fig. 9- The correlations between the information indices H 2 and HA on the set of arylsilanes

References

Related documents

For a connected graph with |V | &gt; 1 and exactly 2k odd vertices, the minimum number of trails that decompose it is max{k, 1}.... Application of

A graph is called Eulerian if it has a closed walk that contains all edges, and each edge occurs exactly once.. Such a walk is called an

• Proper ordering of nodes of a flow graph speeds up the iterative algorithms: depth- first ordering1. • “Normal” flow graphs have a surprising property --- reducibility ---

Providing cer- tainty that avoided deforestation credits will be recognized in future climate change mitigation policy will encourage the development of a pre-2012 market in

These gains in crop production are unprecedented which is why 5 million small farmers in India in 2008 elected to plant 7.6 million hectares of Bt cotton which

Of those who have used the internet to access information and advice about health, the most trustworthy sources are considered to be the NHS website (81 per cent), charity

The approach has since then been ut- ilized31,35-42to investigate (i) the state of aggrega- tion of the components of a mixture (and also the nature of molecular entities present

higher temperature than a secondary or tertiary amine of the same molecular weight... SEMWAL: CORRELATION OF PHYSICAL PROPERTIES USING GRAPH THEORY 145 The connectivity indices do