
Artificial neural network models for advanced oxidation of organics in water matrix–Comparison of applied methodologies

Tomislav Bolanča*, Šime Ukić, Igor Peternel, Hrvoje Kušić & Ana Lončarić Božić

Faculty of Chemical Engineering and Technology, University of Zagreb, Marulićev trg 19, 10000 Zagreb, Croatia

Received 24 July 2012; accepted 6 June 2013

This study focuses on the development, characterization and validation of an artificial neural network (ANN) model for the prediction of advanced oxidation of organics in a water matrix. Different ANNs, based on multilayer perceptron (MLP) and radial basis function (RBF) methodologies, have been applied for modeling the behavior of a complex system: zero-valent iron activated persulfate oxidation (Fe0/S2O8²⁻) of the reactive azo dye C.I. Reactive Red 45 (RR45) in aqueous solution. The input variables for ANN modeling correspond to Fe0/S2O8²⁻ process parameters such as pH, dosage of zero-valent iron and concentration of persulfate, while the system output is the mineralization extent of the aqueous RR45 solution after the treatment by Fe0/S2O8²⁻ at set conditions. The performance of the developed ANN models has been compared and evaluated with regard to the applied methodology, training algorithm, activation function and network topology. The results show that the MLP methodology needs a sinusoidal activation function to reveal its maximal capability. It is demonstrated that although the ANN model based on RBF methodology offers good predictive ability, its capability to extrapolate is limited.

The full potential of ANN modeling is reached using the MLP methodology and the scaled conjugate gradient training algorithm in combination with a sinusoidal activation function, 6 hidden layer neurons and 8 experimental data points. Based on an external validation set, it is demonstrated that the developed model is accurate, with an average relative error of 1.70%, and that there is no absolute or proportional systematic error.

Keywords: Advanced oxidation process, Artificial neural network, Modeling methodology

The organic pollution of natural resources, due to the uncontrolled discharge of a vast array of hazardous and presumably toxic chemicals originating from various industries, presents a serious threat to the environment and should be minimized at the source. Accordingly, various processes for the treatment of such wastewaters, either biological, physical or chemical, are available1,2. However, their applicability depends on various factors such as (i) the nature and concentration of the organic pollutants present in the wastewater, influencing its adequacy, (ii) the specific parameters of the applied process, influencing treatment efficiency, and (iii) economic aspects as well1,3. The effective treatment of recalcitrant and toxic pollutants can be achieved by means of advanced oxidation processes (AOPs), which are classified as destructive and low- or non-generative technologies3-6. They have been applied to a wide range of organic pollutants due to their simplicity and ease of implementation, running under mild conditions of temperature and pressure5,6.

Generally, the optimization of a treatment process leads to the improvement of its overall effectiveness. Optimization using an empirical approach based on single-factor-at-a-time experiments, which does not consider the impact of factor interactions, can be complicated and costly, and may often yield misleading information7. In order to reduce laboratory studies and save time and money, the application of modeling tools in combination with the experimental approach, such as artificial neural networking (ANN), mechanistic modeling (MM), structure-activity relationship modeling (SAR), or response surface modeling (RSM), is favorable8-11. AOPs are complex systems and their efficiency depends on various particular process parameters such as pH, catalyst/activator type and concentration, oxidant type and concentration, UV type and dose, and frequency and field of applied energy. Moreover, these process parameters may also interact with each other5,6. Hard modeling of AOPs includes the detailed chemical phenomena occurring in the system and accordingly requires impractical, complex and slow computing routines, which involve solving mass and energy transfer equations. Application of multivariate techniques like RSM can overcome the aforementioned disadvantages, considering the relative significance of several affecting factors even in the presence of complicated, multidimensional interactions11,12. However, this approach presumes linear behavior between the factors affecting the system properties in the developed RSM model. On the other hand, the application of nonlinear modeling techniques could increase the predictive ability, which is the major concern, particularly when the developed model is needed for up-scaling of complex systems.

——————

*Corresponding author.

E-mail: Tomislav.Bolanca@fkit.hr

Considering the above, ANN modeling seems to be a reasonable tool for the prediction of the behavior of complex systems such as AOPs. Among all available types of ANN, the most commonly applied methodology is the multilayer perceptron (MLP). There are many algorithms for training MLP networks13-17. The back-propagation (BP) algorithm is simple, but characterized by slow convergence18. Various algorithms, mostly based on second order information about the shape of the error surface, might be introduced to address this problem13-19. On the other hand, the radial basis function (RBF) methodology, using exponentially decaying localized nonlinearities (e.g. Gaussian functions), constructs local approximations to the nonlinear input-output mapping. Networks based on RBF are capable of fast learning. Moreover, they are less sensitive to the order of presentation of the training data in comparison to MLP networks. Recently, ANN applications in the modeling of AOPs have been reported8,20-22. However, in order to exploit the full potential of ANN by maximizing its predictivity, the model should be optimized according to methodology, training algorithm, activation function and topology.

The aim of this work includes the development and comparison of ANNs based on different methodologies applied for the prediction of the oxidative treatment of a model pollutant by an AOP. For this purpose, an aqueous solution of the reactive azo dye C.I. Reactive Red 45 has been treated with the zero-valent iron activated persulfate system (Fe0/S2O8²⁻). The MLP and RBF types of ANN methodologies are applied. The developed ANN models are optimized in terms of training algorithm, activation function and number of hidden layer neurons. In addition, the minimal number of experimental data points needed for training is determined and discussed. The evaluation of the developed models is performed using an external experimental data set.

Methodology

A well-known artificial neural network concept is the multilayer feed-forward network with error back-propagation training methodology, commonly called the multilayer perceptron (MLP) (Fig. 1)23. The input of the network is a vector x consisting of k input variables, while the network's hidden layer comprises one or more (m) chained sublayers. The input of the first sublayer, with n hidden neurons, is calculated by multiplying the input vector x with the k×n weight matrix W1. These hidden neurons are activated by the nonlinear activation function Φ1(x·W1). The values of the transfer functions multiplied by the next weight matrix (W2) become the input of the next layer, finally providing the network output vector y:

y = Φ_m[Φ_{m−1}{… Φ_1(x·W_1)·W_2 …}·W_m]·W_{m+1}    …(1)

The MLP activation function Φ is generally a hyperbolic tangent, sigmoid or sinusoidal function.
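As an illustration of Eq. (1), the following is a minimal NumPy sketch of the forward pass for a single hidden sublayer with a selectable activation function; the function and variable names are illustrative, not taken from the original work:

import numpy as np

# Hidden-layer activation functions discussed in the text
ACTIVATIONS = {
    "tanh": np.tanh,
    "sigmoid": lambda a: 1.0 / (1.0 + np.exp(-a)),
    "sin": np.sin,
}

def mlp_forward(x, W1, W2, activation="sin"):
    """Forward pass y = phi(x W1) W2 for one hidden sublayer (biases omitted).

    x  : (k,) input vector, e.g. pH, Fe0 dosage, persulfate concentration
    W1 : (k, n) input-to-hidden weight matrix
    W2 : (n, 1) hidden-to-output weight matrix (linear output layer)
    """
    phi = ACTIVATIONS[activation]
    hidden = phi(x @ W1)      # nonlinear hidden-layer activations
    return hidden @ W2        # linear combination gives the output y

# Example: 3 inputs, 6 hidden neurons, 1 output (mineralization extent)
rng = np.random.default_rng(0)
x = np.array([5.0, 3.0, 80.0])                # pH, [Fe0]/mM, [S2O8]/mM
y = mlp_forward(x, rng.normal(size=(3, 6)), rng.normal(size=(6, 1)))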

On the other hand, radial basis function (RBF) ANNs employ a Gaussian kernel as the activation function24-26. The kernel is centered at the point specified by the weight vector associated with the unit.

Fig. 1—Schematic diagram of multilayer perceptron

Gradient descent (GD) methods23-25,27-31 are commonly used as MLP training algorithms. They find a minimum of an error function F by adapting the weights w. For the simplest case, with only one hidden sublayer, the function F(w) can be represented by the following equation, where i denotes the training sample:

F(w) = Σ_i (y_i − x_i·W)²    …(2)

The descent gradient [the negative first derivative of Eq. (2), denoted as −∇F(w)] is used in the next iteration (t+1) as a new search direction for the position w in the weight space, as given below:

w_{t+1} = w_t − η·∇F(w_t)    …(3)

The factor η is referred to as the learning rate and determines the speed of convergence.
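As a minimal illustration of Eqs (2) and (3), the sketch below performs gradient descent for the simplest purely linear mapping; the helper name, data shapes and the simplified stopping test are illustrative assumptions:

import numpy as np

def gd_train(X, y, eta=1e-4, n_iter=10000, tol=1e-7):
    """Minimize F(w) = sum_i (y_i - x_i w)^2 by gradient descent, Eqs (2)-(3)."""
    w = np.zeros(X.shape[1])
    prev_rmse = np.inf
    for _ in range(n_iter):
        residual = y - X @ w            # y_i - x_i w for every sample i
        grad = -2.0 * X.T @ residual    # gradient of F(w)
        w = w - eta * grad              # Eq. (3): w_{t+1} = w_t - eta*grad
        rmse = np.sqrt(np.mean(residual ** 2))
        if rmse < tol or prev_rmse - rmse < tol:  # simplified early stopping
            break
        prev_rmse = rmse
    return w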

The weight adaptation can be performed sequentially using the following first order Taylor expansion:

F(w_{t+1}) = F(w_t + Δw_t) = F(w_t) + g_t·Δw_t    …(4)

where g_t represents the gradient ∇F(w_t) evaluated at the previous iteration t. In order to decrease the number of iteration steps, the following Newton method can be applied:

F(w_{t+1}) = F(w_t + Δw_t) = F(w_t) + ∇F(w_t)·Δw_t + ½·Δw_tᵀ·∇²F(w_t)·Δw_t    …(5)

Newton's weight update routine can be expressed as32,33:

w_{t+1} = w_t − A⁻¹·g_t    …(6)

where A⁻¹ represents the inverse Hessian matrix [∇²F(w_t)]⁻¹ (ref. 34).

Even though the Newton method can find the global minimum in a significantly lower number of iteration steps, which is a major improvement over the iterative approximators of the gradient descent method, computation of the inverse Hessian matrix should be avoided: it requires inversion and storage of the Hessian, which is computationally and memory intensive. An additional problem arises if the multilayered neural network produces a more complex error surface. This can be overcome by the Gauss-Newton based update known as the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm23, as shown below:

w_{t+1} = w_t − η·[δg_t·δg_tᵀ / (δw_tᵀ·δw_t)]⁻¹·g_t    …(7)

The BFGS update is accurate only in the vicinity of the global minimum, due to the approximation of the Hessian matrix. Moreover, it is sensitive to divergence when training starts far from the minimum. The inverse of the approximated Hessian can be stabilized by using the following Levenberg-Marquardt (LM)35 update:

w_{t+1} = w_t − η·[δg_t·δg_tᵀ / (δw_tᵀ·δw_t) + τ·I]⁻¹·g_t    …(8)

This is a combination of the known gradient descent and Gauss-Newton methods. The parameters η and τ control the behavior of the weight updates, while I represents the identity matrix. For large values of η and τ, gradient descent training is performed; for small values of η and τ, the update approaches Gauss-Newton behavior. LM, combining GD and Gauss-Newton training, has been applied successfully because the algorithm is stable and efficient. In contrast to the previous methods, conjugate GD does not compute or approximate the inverse Hessian, but explores the error surface in a similar way (i.e. using higher order information about the shape of the error hyperplane). The weight update is similar to that of GD [Eq. (3)]. The conjugate gradient j_t reveals more information about the error surface, since the current and previous gradients are evaluated simultaneously, as shown below:

j_t = −g_t + [g_tᵀ·(g_t − g_{t−1}) / ‖g_{t−1}‖²]·j_{t−1}    …(9)

A variation of the conjugate gradient method is the scaled conjugate gradient (SCG), which avoids the line search per learning iteration by using the LM approach to scale the step size. Application of this approach can lead to additional savings in the time needed for the training calculation.
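To make the second order updates above concrete, the following sketch implements one stabilized step in the spirit of Eq. (8) and the conjugate direction of Eq. (9), as reconstructed here; the outer-product reading of the Hessian approximation and all names are illustrative assumptions rather than the authors' implementation:

import numpy as np

def lm_step(w, g, w_prev, g_prev, eta=0.1, tau=1e-3):
    """One stabilized weight update in the spirit of Eq. (8).

    The Hessian is approximated from gradient differences and damped by
    tau*I: large tau pushes the step toward plain gradient descent,
    small tau toward Gauss-Newton behavior.
    """
    dw, dg = w - w_prev, g - g_prev
    H = np.outer(dg, dg) / max(dw @ dw, 1e-12) + tau * np.eye(len(w))
    return w - eta * np.linalg.solve(H, g)

def conjugate_direction(g, g_prev, j_prev):
    """Polak-Ribiere conjugate search direction, per Eq. (9)."""
    beta = g @ (g - g_prev) / max(g_prev @ g_prev, 1e-12)
    return -g + beta * j_prev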

In contrast, in the RBF training process both the positions and the widths of the Gaussian kernel functions must be learned from the training patterns. A number of kernels are positioned in the input space using one of the possible placement algorithms. The K-means (KM) algorithm tries to select an optimal set of points placed at the centroids of the clusters of training data23,24. Kernel positioning is then followed by systematic passes through the training data set to determine the kernel widths, usually using the K-nearest neighbor (KNN) algorithm23,36. Once the centers and deviations have been set, the output layer can be optimized using a standard linear optimization technique such as singular value decomposition (SVD).
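A compact sketch of this RBF construction (K-means placement, K-nearest-neighbor widths, least-squares output weights computed via SVD) is given below; it is a plain-NumPy illustration under the stated assumptions, with illustrative names throughout:

import numpy as np

def fit_rbf(X, y, n_kernels=6, k_neighbors=2, n_km_iter=50, seed=0):
    """Train an RBF network: KM centers, KNN widths, SVD output weights."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), n_kernels, replace=False)].astype(float)
    for _ in range(n_km_iter):                          # K-means placement
        labels = np.argmin(
            ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        for c in range(n_kernels):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    # Width of each kernel: mean distance to its k nearest fellow centers
    d = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
    widths = np.maximum(
        np.sort(d, axis=1)[:, 1:k_neighbors + 1].mean(axis=1), 1e-12)
    # Design matrix of Gaussian activations, then linear least squares
    G = np.exp(-((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
               / (2.0 * widths ** 2))
    w, *_ = np.linalg.lstsq(G, y, rcond=None)   # lstsq uses SVD internally
    return centers, widths, w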

RBF networks train faster than MLP networks, but they are not as versatile and are comparatively slower in use. The selection of the appropriate network type for a given problem depends strictly on the logic of the problem, and this selection is what the present study aims to perform.

Experimental Procedure

The chosen system for ANN modeling was zero-valent iron activated persulfate oxidation (Fe0/S2O8²⁻) of a reactive azo dye (C.I. Reactive Red 45, RR45, purchased from Ciba-Geigy) in aqueous solution. A series of experiments (Table 1) was conducted in order to obtain a representative data set used afterwards in the development, characterization and validation of the applied ANN models derived by different methodologies. The considered Fe0/S2O8²⁻ process parameters, i.e. initial pH, zero-valent iron activator dosage and concentration of persulfate as oxidant, correspond to the number of input layer neurons in the ANN. The mineralization extent was chosen as the studied output, characterizing the process effectiveness. The concentration of the RR45 model pollutant in aqueous solution was 80 mg L-1. All other chemicals used were purchased from Kemika, Croatia: zero-valent iron (Fe0) p.a.; potassium persulfate (K2S2O8) p.a.; potassium hydroxide (KOH) p.a.; and sulfuric acid (H2SO4) >96%. The influence of the initial pH was investigated within the range 3-7. The iron concentration was varied from 1 mM to 5 mM (corresponding to zero-valent iron dosages of 55.85-279.25 ppm, respectively), while the concentration of persulfate oxidant was varied within the range 10-150 mM. The experiments were performed in the following manner. First, the appropriate oxidant dosage was added into the aqueous dye solution, followed by adjustment of the pH to the desired value and addition of the required dosage of zero-valent iron, depending on the chosen process conditions within the abovementioned ranges of studied process parameters (Table 1). The reaction mixture (250 mL) was continuously stirred at room temperature in an open batch system with a magnetic stirring bar. After one hour of treatment, samples were taken from the reaction mixture and immediately analyzed. All experiments were performed in triplicate and averages are reported; the reproducibility of the experiments was >95%.

The total organic carbon (TOC) content of the aqueous dye solution was measured using a total organic carbon analyzer (TOC-VCPN, Shimadzu, Japan), while a Handylab pH/LF portable pH-meter (Schott Instruments GmbH, Germany) was used for the adjustment of the initial pH.

Modeling procedure

In this study, MLP and RBF were used as the artificial neural network methodologies. The input layer consisted of three neurons corresponding to pH, dosage of zero-valent iron and concentration of persulfate, while the output layer consisted of one neuron corresponding to the mineralization extent of the aqueous dye solution after the treatment by Fe0/S2O8²⁻. The number of neurons in the single hidden layer was varied from 2 to 12 (increment 2) and the number of experimental data points in the training set was varied from 4 to 20 (increment 4); the rest of the experimental data was used for validation purposes.
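The sweep over topology and training set size described above can be written as a simple grid search; the following sketch is purely illustrative, with train_and_validate a hypothetical stand-in for the actual ANN training and validation runs performed in Statistica:

import random

def train_and_validate(n_hidden, n_train):
    """Hypothetical stand-in for one ANN training/validation run;
    returns the average relative validation error in percent."""
    random.seed(hash((n_hidden, n_train)))
    return random.uniform(1.5, 6.0)

results = {}
for n_hidden in range(2, 13, 2):        # hidden layer neurons: 2 to 12
    for n_train in range(4, 21, 4):     # training data points: 4 to 20
        results[(n_hidden, n_train)] = train_and_validate(n_hidden, n_train)

best_topology = min(results, key=results.get)   # smallest validation error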

Table 1—Scope of experiments performed for development of ANN models predicting the treatment of reactive azo dye aqueous solution by Fe0/S2O8²⁻

Run #   pH     [Fe0], mM   [S2O8²⁻], mM   Mineralization, %
1       3      1           80             35.44
2       3      5           80             48.24
3       3      3           10             32.72
4       3      3           150            48.99
5       3      2.5         55             43.21
6       3      1.5         100            41.22
7       3.3    4           10             29.90
8       3.3    4           150            52.05
9       5      1           10             22.74
10      5      5           10             27.08
11      5      1           150            42.99
12      5      5           150            54.02
13      5      3           80             49.13
14      5      3           80             47.87
15      5      3           80             48.41
16      5      2.5         100            49.91
17      5      1.5         55             37.83
18      5      5           55             45.89
19      5.52   4.27        138.4          53.36
20      7      1           80             39.22
21      7      5           80             48.21
22      7      3           10             28.85
23      7      3           150            50.16
24      7      1.5         100            45.04
25      7      1.5         10             26.80
26      7      4           55             45.07


Besides, the training algorithm and the activation function connecting the input and hidden layers had to be optimized as well. A linear activation function connected the hidden and output layers in all cases. The training algorithms tested for MLP were GD, BFGS and SCG, in combination with hyperbolic tangent (tan), exponential (exp) and sinusoidal (sin) activation functions. The Gaussian kernel activation function was applied for RBF, in combination with the KM radial assignment and KNN radial spread algorithms.

In order to prevent overfitting, the early stopping conditions were defined as follows: (i) training stops when the root mean square error drops below 0.0000001, and (ii) training stops when the root mean square error fails to improve by 0.0000001 over 10 consecutive iteration steps.

If the training and testing sets are to be representative groups of data covering the whole design area, it is preferable that every experimental data point has an equal influence on the neural network model. For this reason, a random function was applied for the selection of the experimental data points assigned to the training, testing and validation data sets.

The input experimental data were scaled so that their mean values and standard deviations were 0 and 1, respectively. This was necessary because, although most neural networks can accept input values in any range, they are sensitive to inputs only within a far smaller range. All calculations were performed in Statistica 8.0 (StatSoft Inc., USA).
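The scaling step amounts to a standard z-score transform; a minimal sketch (not the Statistica routine used by the authors) is:

import numpy as np

def zscore(X):
    """Scale each input column to mean 0 and standard deviation 1."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Example with the three process inputs: pH, [Fe0] in mM, [S2O8] in mM
X = np.array([[3.0, 1.0, 80.0],
              [5.0, 3.0, 80.0],
              [7.0, 5.0, 150.0]])
X_scaled, mu, sigma = zscore(X)   # mu and sigma are reused for new inputs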

Results and Discussion

Figure 2 shows the application of the GD algorithm for the MLP methodology used in modeling the Fe0/S2O8²⁻ process, with different activation functions, numbers of hidden layer neurons and numbers of experimental data points in the training set. One can observe that the predictive ability is significantly affected by the selection of the activation function. A deeper insight reveals that the relative error of the MLP models decreases in the following order: exp > tan > sin. Most likely, the periodic nature of the sin activation function enables capturing of the multiple-extremes phenomena, which consequently increases the predictive ability of the model. Moreover, the same effect can be noticed when using the other training algorithms, BFGS (Fig. 3) and SCG (Fig. 4). These results suggest that the application of other modeling strategies, which are usually based on linear dependence between the investigated parameters (with or without interactions), yields only an approximation of the system behavior without capturing real process extremes, thus probably decreasing the predictive ability. This effect could be even more emphasized if the model is used for scale-up purposes. It should be pointed out that the capturing of multiple extremes by ANN application always raises the question of overfitting, which is discussed later on. To summarize, the sin activation function has been shown to be the optimal selection for modeling the Fe0/S2O8²⁻ process.

Fig. 2—Development of ANN model for prediction of studied system behavior using MLP methodology with GD training algorithm in combination with (A) hyperbolic tangent, (B) exponential, and (C) sinusoidal activation function

Figures 3 and 4 present the application of the second order training algorithms (BFGS and SCG) in the MLP methodology applied for modeling the Fe0/S2O8²⁻ process behavior. It can be seen that the error surfaces obtained using the GD (Fig. 2) and BFGS (Fig. 3) algorithms possess more local extremes than in the case of the SCG algorithm (Fig. 4). The flatter error surface obtained using SCG offers two significant advantages: (i) a more accurate training process, due to the lower possibility of trapping in a local minimum; and (ii) a faster training process, due to the higher gradients of the training process. This can be explained by considering the principles of the SCG training algorithm, which implements the advantages of GD and BFGS simultaneously; it introduces an additional weighting parameter in order to search more efficiently for the minimum on the error surface. This indicates that the SCG training algorithm is the most suitable among those applied for modeling the studied system behavior, i.e. the Fe0/S2O8²⁻ process for the treatment of the reactive azo dye in aqueous solution.

Fig. 3—Development of ANN model for prediction of studied system behavior using MLP methodology with BFGS training algorithm in combination with (A) hyperbolic tangent, (B) exponential, and (C) sinusoidal activation function

Fig. 4—Development of ANN model for prediction of studied system behavior using MLP methodology with SCG training algorithm in combination with (A) hyperbolic tangent, (B) exponential, and (C) sinusoidal activation function

The optimization of the applied MLP methodology regarding the number of hidden layer neurons is presented in Figs 2-4 for the GD, BFGS and SCG algorithms. Such optimization represents one of the emerging issues in overfitting prevention in ANN modeling. One can observe that the prediction error decreases with an increasing number of hidden layer neurons. However, there is no significant improvement in the MLP model performance characteristics when 6 or more hidden layer neurons are built into the model. Hence, in order to avoid any chance of overfitting by keeping the topology not too complex, 6 hidden layer neurons were selected as the optimal value. The applied MLP methodology, including all three studied algorithms (GD, BFGS and SCG), was optimized regarding the number of experimental data points needed for the training set as well (Figs 2-4). In principle, it is preferable to reduce the number of experimental data points used for training in order to reduce the experimental effort. The obtained results (Figs 2-4) demonstrate that the number of experimental data points used for the training procedure can be reduced to 8 without severe disturbance of the predictive ability. It should be pointed out that the selected 8 experimental data points do not represent the global minimum on the error surfaces, but rather minimize the experimental effort whilst keeping reasonable generalization properties.

Figure 5 presents the predictive ability of the RBF methodology used for modeling the Fe0/S2O8²⁻ process, with regard to the number of hidden layer neurons and the number of experimental data points used for training. The structure of the obtained error surface clearly indicates the optimal conditions; there is a global minimum at 6 hidden layer neurons and 8 experimental data points in the training set. The shape of the surface obtained by the RBF methodology is a consequence of the application of the radial basis activation function. This is reflected in a space of hidden units that are local with regard to the input space; typically, only a few hidden units will have significant activations for a given input vector. On the other hand, in a network trained using the MLP methodology and the three studied algorithms (Figs 2-4), the output functions represented by the hidden units, when linearly combined by the final layer of weights, must generate the correct outputs for the whole range of possible input values.

Fig. 5—Development of ANN model for prediction of studied system behavior using RBF training methodology

The relative difference between the performance characteristics of the ANN models using the MLP methodology with the SCG algorithm and the RBF methodology is 0.2%. Since an ANN using RBF is more sensitive to extrapolation and missing-data circumstances, the ANN using MLP with SCG is favored for modeling the studied system. The value of the relative error (1.70%) obtained for the ANN model using MLP with SCG, 6 hidden layer neurons and 8 experimental data points for training proves its very good predictive ability (Table 2). A more detailed error analysis and validation was performed to reveal and confirm the actual potential of the optimal ANN model for prediction of the studied system behavior.

Table 3 shows the validation results of the ANN model using MLP with SCG, 6 hidden layer neurons and 8 experimental data points for training, analyzing the relationship between the experimentally obtained mineralization extents (x) and those predicted by the model (y). If no modeling errors or random measurement errors were made, and if there were no bias, this would yield the relationship y = x. Otherwise, there are the two following possibilities: (i) if the intercept = 0 and the slope ≠ 1, there is a proportional systematic error; and (ii) if the intercept ≠ 0 and the slope = 1, there is an absolute systematic error. Furthermore, if there is no linearity, it is necessary to carry out the modeling over a shorter parameter range, if this is still compatible with the original aim.

Table 2—Comparison of optimized ANN models' predictive ability

Network type   Activation function   Training algorithm   Hidden layer nodes   Experimental data for training   Average of relative error, %
MLP            Sinusoidal            GD                   6                    8                                3.24
MLP            Sinusoidal            BFGS                 6                    8                                2.16
MLP            Sinusoidal            SCG                  6                    8                                1.70
RBF            Radial based          KM, KNN, SVD         6                    8                                1.92

Table 3—Validation characteristics of optimal ANN model

Parameter                         Value
Intercept, coefficient            -0.0157
Intercept, lower 95%              -0.1048
Intercept, upper 95%              0.0734
Slope, coefficient                1.0048
Slope, lower 95%                  0.9956
Slope, upper 95%                  1.0139
Coefficient of determination      0.9901
Average of relative error, %      1.70
Minimal relative error, %         0.03
Maximal relative error, %         2.59

From Table 3, one can observe that the values of the intercept and slope satisfy the above criteria, not being significantly different from 0 and 1, respectively, at the 95% confidence level. This indicates that there is no systematic error, neither absolute nor proportional, in the ANN model using MLP with SCG, 6 hidden layer neurons and 8 experimental data points for training. The correlation coefficient is a measure of the joint variation between two variables, presenting the strength of the proposed linear relationship between the predicted and measured mineralization extents. The obtained value of the coefficient of determination (0.9901) confirms a strong linear relationship between the predicted and measured values. In addition to the above results, the average relative error (1.70%), with the narrow interval between the minimal (0.03%) and maximal (2.59%) relative errors, proves that the developed ANN model using MLP with SCG, 6 hidden layer neurons and 8 experimental data points for training has good predictive power for the studied system.
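The systematic-error check above amounts to fitting y = a + b·x between measured and predicted mineralization extents and testing whether the 95% confidence intervals of the intercept and slope contain 0 and 1, respectively. A minimal sketch (the data arrays are hypothetical placeholders):

import numpy as np
from scipy import stats

def systematic_error_check(measured, predicted, alpha=0.05):
    """Fit predicted = a + b*measured; return 95% CIs for a and b, and R^2."""
    res = stats.linregress(measured, predicted)
    n = len(measured)
    t = stats.t.ppf(1 - alpha / 2, n - 2)
    intercept_ci = (res.intercept - t * res.intercept_stderr,
                    res.intercept + t * res.intercept_stderr)
    slope_ci = (res.slope - t * res.stderr, res.slope + t * res.stderr)
    # No absolute systematic error if the intercept CI contains 0;
    # no proportional systematic error if the slope CI contains 1.
    return intercept_ci, slope_ci, res.rvalue ** 2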

Conclusion

This work critically compared different artificial neural network methodologies for modeling the zero-valent iron activated persulfate oxidation of a reactive azo dye in aqueous solution. ANN modeling based on the MLP methodology, using the SCG training algorithm in combination with a sinusoidal activation function, 6 hidden layer neurons and 8 experimental data points, is found to be optimal for the studied system. The periodic nature of the sin activation function enables capturing of the multiple-extremes phenomena, consequently increasing the predictive ability of the model, while the SCG training algorithm shows the best performance, implementing the advantages of GD and BFGS simultaneously by introducing an additional weighting parameter in order to search more efficiently for the minimum on the error surface.

The developed model proved to be very accurate, with an average relative error of 1.70%, as demonstrated by validation results obtained using an external validation data set. It is shown that there is no absolute or proportional systematic error present in the developed model. Consequently, its performance characteristics suggest a great potential for up-scaling applications in complex systems such as AOPs.

Acknowledgement

The authors acknowledge the financial support of the University of Zagreb, Croatia (Project #110005).

References

1 Cunningham W P, Cunningham M A & Saigo B, Environmental Science, A Global Concern (McGraw-Hill Education, New York), 2005.

2 Sincero A P & Sincero G A, Physical-chemical Treatment of Water and Wastewater (CRC Press, IWA Publishing, New York), 2003.

3 Koprivanac N & Kusic H, Hazardous Organic Pollutants in Colored Wastewaters (Nova Science Publishers Inc., New York), 2009.

4 Forgacs E, Cserhati T & Oros G, Environ Int, 30 (2004) 953.

5 Gogate R & Pandit B, Adv Environ Res, 8 (2004) 501.

6 Parsons S, Advanced Oxidation Processes for Water and Wastewater Treatment (IWA Publishing, London), 2004.

7 Frigon N L & Mathews D, Practical Guide to Experimental Design (John Wiley and Sons, New York), 1997.

8 Aleboyeh A, Kasiri M B, Olya M E & Aleboyeh H, Dyes Pig, 77 (2008) 288.

9 Kralik P, Kusic H, Koprivanac N & Loncaric Bozic A, Chem Eng J, 158 (2010) 154.

10 Kusic H, Rasulev B, Leszczynska D, Leszczynski J & Koprivanac N, Chemosphere, 75 (8) (2009) 1128.

11 Dopar M, Kušić H & Koprivanac N, Chem Eng J, 173 (2011) 267.

12 Myers R H & Montgomery D C, Response Surface Methodology: Process and Product Optimization using Designed Experiments, 2nd edn. (John Wiley & Sons, New York), 2002.

13 Maren A, Harston C & Pap R, Handbook of Neural Computing Applications (Academic Press, London), 1990.

14 Pham D T & Sagiroglu S, J Eng Manuf, 210 (1996) 69.

15 Alpsan D, Towsey M, Ozdamar O, Tsoi A C & Ghista D N, Neural Networks, 8 (6) (1995) 945.

16 Tang K W, Pingle G & Srikant G, J Intell Syst, 7 (1997) 307.

17 Stager F & Agarwal M, Neural Networks, 10 (8) (1997) 1435.

18 Drake R & Packianather M S, Int J Adv Manuf Tech, 14 (1998) 280.

19 Devika P & Achenie L, J Intell Fuzzy Syst, 3 (1995) 287.

20 Yu R-F, Chen H-W, Liu K-Y, Cheng W-P & Hsieh P-H, J Chem Tech Biot, 85 (2) (2010) 267.

21 Saien J, Soleymani A R & Sun J H, Desalination, 279 (1-3) (2011) 298.

22 Hamzaoui Y E, Hernández J A, Silva-Martínez S, Bassam A, Álvarez A & Lizama-Bahena C, Desalination, 277 (1-3) (2011) 325.

23 Bishop C M, Neural Networks for Pattern Recognition (Clarendon Press, Oxford), 1995.

24 Tou J T & Gonzalez R C, Pattern Recognition Principles (Addison-Wesley, London), 1974.

25 Hill T & Lewicki P, Statistics: Methods and Applications, 1st edn. (StatSoft, Tulsa), 2006.

26 Gupta M M, Jin L & Homma N, Static and Dynamic Neural Networks: from Fundamentals to Advanced Theory (Wiley, Hoboken NJ), 2003.

(9)

27 Haykin S, Neural Networks: A Comprehensive Foundation (Macmillan, New York), 1994.

28 Zupan J & Gasteiger J, Neural Networks in Chemistry and Drug Design, 2nd edn. (Wiley-VCH, Weinheim), 1999.

29 Bolanča T, Cerjan Stefanović Š, Ukić Š, Luša M & Rogošić M, Chromatographia, 70 (2009) 15.

30 Bolanča T, Cerjan Stefanović Š, Ukić Š & Rogošić M, J Liq Chromatogr R T, 32 (2009) 2765.

31 Bolanča T, Šipušić J, Ukić Š, Šiljeg M & Ujević Bošnjak M, Fresen Environ Bull, 21 (2012) 76.

32 Tetteh J, Metcalfe E & Howells S, Chemometr Intell Lab, 32 (1996) 177.

33 Tibshirani R, Neural Comput, 8 (1996) 152.

34 Bronshtein I N, Semendyayev K A, Musiol G & Muehlig H, Handbook of Mathematics, 5th edn. (Springer Verlag, Berlin- Heidelberg), 2007.

35 Masters D, Advanced Algorithms for Neural Networks: A C++ Sourcebook (Wiley, New York), 1995.

36 Musavi M T, Ahmed W, Chan K H, Faris K B & Hummels D M, Neural Networks, 5 (1992) 595.
