The conflicting nature of the objective functions leads to a set of non-dominated solutions, called Pareto Optimal (PO) solutions, from which a single solution is selected using higher-order information, often provided by the decision maker [3]. The number of nodes in each layer and the number of layers in the network together form the architecture of the network. One common mistake in implementing ANNs is failing to design the architecture of the network optimally.
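The notion of a non-dominated set mentioned at the start of this passage can be illustrated with a minimal sketch. It assumes every objective is minimized; the function names and the candidate values are hypothetical and are not taken from the original work.

```python
def dominates(a, b):
    """True if objective vector a dominates b (minimization in every objective)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]


# Hypothetical two-objective values (both minimized), for illustration only.
candidates = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 2.0)]
front = non_dominated(candidates)
print(front)
```

Here (3.0, 4.0) is dominated by (2.0, 3.0) and (5.0, 2.0) by (4.0, 1.0), so three candidates remain. Choosing one of them is the step that requires higher-order information from the decision maker.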

The architecture of the network is usually obtained by trial and error, which often leads to a dead end. In addition, the sample size required for training also significantly affects the predictability of the network. Some prominent contributions in the literature are the Mixed Integer Nonlinear Programming (MINLP) approach [16], the Akaike Information Criterion (AIC) [17], etc.
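For reference, the AIC named above has a standard general form, and under the common assumption of independent Gaussian errors it reduces to a residual-based expression. The symbols $k$ (number of fitted parameters, here the network weights), $n$ (number of training samples), $\hat{L}$ (maximized likelihood), and $\mathrm{RSS}$ (residual sum of squares) are notation introduced here for illustration and are not taken from [17]:

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{AIC}_{\text{Gaussian}} = n\ln\!\left(\frac{\mathrm{RSS}}{n}\right) + 2k + \text{const}.
$$

A lower value favours architectures that fit the data well with fewer parameters, which is why the criterion can be used to compare candidate network sizes.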

2 clearly shows that the surrogate construction algorithm is governed by several parameters whose values are usually set based on some heuristic, inviting potential errors and credible variations in the predictability of the surrogates. The work presents a solid foundation and rationale for the need for a new parameter-free surrogate building algorithm, focusing in particular on the automated design of the configuration of ANNs along with the simultaneous determination of the sample size required to maximize prediction accuracy without overfitting networks. The individual effect of each of the parameters such as architecture, sample size, sampling plan, transfer function, etc.

The potential dangers that heuristic-based ANN design poses to the recognition of ANNs as capable surrogate models were also presented.

## Parameters in Surrogate Building Algorithm

- Accuracy of prediction
- Sampling plan or design of experiments (DoE)
- Sample size
- Architecture of the network
- Activation function

The sampling plan is at the heart of the surrogate construction algorithm as it directly affects the number of sample points, prediction accuracy, and network architecture. The sampling plan can be easily interpreted as a scheme of placing some arbitrary probes in an m-dimensional space to capture the behavior of the model (m is the number of inputs). The projection of the distribution of 200 sample points in 3-dimensional space obtained using the Sobol sampling plan is compared with the distribution of those obtained using the LHS sampling plan and is shown in Fig. 3.
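The idea of placing probes in an m-dimensional space can be sketched with a basic Latin hypercube sampler written against the Python standard library. This is a minimal illustration of plain LHS only; it does not implement Sobol sequences. The function names, the seed, and the minimum-distance check are assumptions made here, and that check is not the space-filling metric reported in the paper.

```python
import random


def latin_hypercube(n_points, n_dims, seed=0):
    """Draw n_points in [0, 1)^n_dims with one point per stratum along each axis."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each of the n_points equal-width strata.
        column = [(i + rng.random()) / n_points for i in range(n_points)]
        rng.shuffle(column)  # decouple the strata across dimensions
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points (larger means less clumping)."""
    return min(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
        for i, p in enumerate(points)
        for q in points[i + 1:]
    )


# 200 points in 3 dimensions, matching the sample count and dimension discussed above.
samples = latin_hypercube(n_points=200, n_dims=3)
print(len(samples), min_pairwise_distance(samples))
```

Each axis is split into 200 equal intervals and every interval receives exactly one point, which is the defining property of LHS. A low-discrepancy sequence such as Sobol instead aims for uniform coverage of the joint space.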

The lower the value of this metric, the better the space-filling ability of the sampling plan. It is clear from Table 1 that the Sobol sampling plan appears to be one of the best alternatives among the existing options. One significant contribution in the literature [24] presents a new algorithm for determining the sample size for a given network.

Their approach is based on the fact that the training error of the network is minimized by increasing the sample size. Thus, to ensure the parsimonious nature of the network, they combined the K-fold model evaluation technique [25, 26] (with K = 10) with a variant of LHS called the incremental-LHS (i-LHS) sampling plan for sample size determination. From the K available folds, one group is selected for validation and the remaining groups are used to train the network.

Thus, an average of these errors is taken into account and referred to as the cross-validation error of the current model (models are distinguished by sample size). In short, the essence of their algorithm is to find a minimum cross-validation error metric, which is a function of sample size. One of the major disadvantages of this algorithm is the large computation time of the K-fold based validation method.
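The cross-validation loop described above can be sketched as follows. To keep the example self-contained, a least-squares line stands in for the ANN and the data are synthetic; `fit_line`, `k_fold_cv_error`, the sample sizes, and the noise level are all hypothetical choices made for illustration, not the procedure of [24].

```python
import random


def fit_line(xs, ys):
    """Least-squares line y = a*x + b; a stand-in for training the ANN."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def k_fold_cv_error(xs, ys, k=10, seed=0):
    """Mean validation MSE over k folds: each fold validates once, the rest train."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a * xs[i] + b)) ** 2 for i in held_out) / len(held_out)
        errors.append(mse)
    return sum(errors) / k


# Hypothetical noisy data; each sample size defines one candidate model.
rng = random.Random(1)
cv_by_size = {}
for n in (20, 40, 80):
    xs = [rng.random() for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    cv_by_size[n] = k_fold_cv_error(xs, ys, k=10)

best_size = min(cv_by_size, key=cv_by_size.get)
print(cv_by_size, best_size)
```

Every candidate sample size needs K separate training runs, which illustrates why the computation time of this approach grows quickly when the model being trained is a neural network rather than a line.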

The architecture of the network is perhaps the most important input, influencing the ANN surrogate construction algorithm more than any other parameter. These activation functions are enabled in the architecture as the decision variables of the optimization formulation mentioned in the previous section. The author in this work has limited the variability of the activation function to the entire network and.

## Industrial Sintering

### Modeling of Sintering Process

Although this could be implemented with slight modification, it was deliberately avoided to meet the computation time constraint on the ANN design. To ensure a better sintering process, the consumption of coke must be minimal because coke consumption correlates directly with the carbon footprint of the plant. The lower the coke consumption, the higher the energy efficiency of the plant, which lowers both the carbon footprint and the cost of the operation.

The conventional sintering process starts with the raw materials being loaded onto a moving strand (30–60 cm thick), which then proceeds to sintering. The burning of the coke, needed to reach the desired temperature during the sintering process, starts at the top, where the charge is ignited. This is the area where cold air is forced in by the vacuum created by suction pressure.

On the other hand, the preheated air, moving away from the upper zone where coke is burned, creates a wide melting zone in the lower region, which is much higher than the optimum. The melting of the charge is therefore not uniform, because the different combustion conditions in the upper and lower regions of the charge produce different temperatures. To avoid this, the charging process is divided into two layers, where the burning of coke is different but uniform in each of the layers, and this ensures uniform melting of the sinter mixture.

The initial conditions to solve this ODE are provided at the inlet boundary, while zero gradient condition is applied at the outlet. The convection terms in the velocity expression (given below) are used to calculate the velocities of the solids. The details of the kinetic models and the parameters involved in reaction mechanisms can be obtained from the literature [28, 29].

### Optimization of Sintering Process

The combined weighted average of the coke consumed in both layers (Cw) is considered as the second objective. The predicted values of the outputs corresponding to the inputs in the validation set are sent as outputs of the ANN code together with the original outputs of the model sent as validation set. Weights of the trained neural network that will enable it to interpolate any new value.

The operation of the code according to the sequential flow of steps is described in the rest of the article. Normalization of training data: Training data should be normalized before it is used to train a given network. Declaration and initialization of weights: The number of weights is constantly changing depending on the network architecture.
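The two steps named above, normalization and weight declaration, can be sketched as follows. The source does not state which normalization scheme is used, so min-max scaling to [0, 1] is an assumption. The weight count likewise assumes a fully connected feedforward network with one bias per neuron. Function names and sample values are hypothetical.

```python
def min_max_normalize(rows):
    """Scale each column of rows to [0, 1]; return scaled rows and per-column (min, max)."""
    cols = list(zip(*rows))
    bounds = [(min(c), max(c)) for c in cols]
    scaled = [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds))
        for row in rows
    ]
    return scaled, bounds


def count_weights(layer_sizes):
    """Weights plus biases of a fully connected feedforward net, e.g. [3, 5, 2]."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical training rows with two features on very different scales.
scaled, bounds = min_max_normalize([(1.0, 10.0), (2.0, 30.0), (3.0, 20.0)])
print(scaled)                        # every column now spans [0, 1]
print(count_weights([3, 5, 2]))      # (3+1)*5 + (5+1)*2 = 32
print(count_weights([3, 4, 4, 2]))   # 16 + 20 + 10 = 46
```

Because the count depends on every layer size, the weight vector has to be re-declared whenever the optimizer proposes a new architecture.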

The initial values of the weights will affect the optimization routine used to determine the weights of the network. The results of the present work are reported below in the order of the simulations performed. The non-linear curves and the drastic intensity variations in these tile plots clearly indicate the complicated behavior of the sinter model.

This study justifies the fact that as the number of parameters of the network increases, the number of sample points required for training decreases. The evolution of the ANN surrogates with increment in sample size for output 1 is shown in Fig. The results in Tables 5 and 6 clearly indicate that as the sample size increases, the prediction accuracy of the architecture also increases.

The complete replacement of the original model with ANN surrogate in the optimization algorithm resulted in a saving of as much as 70% of the function evaluations, resulting in an almost 4 times faster optimization. This result justifies the elimination of the heuristic-based assumption of considering only some hidden layered architectures. Since the predictability and efficiency of the surrogate model play a dominant role in the success of surrogate-based optimization, the effect of various parameters on the ANN surrogate construction process has been studied.

Surrogate ANN models are used to simulate a complicated nonlinear sintering model used for successful blast furnace operation in steel plants. The results revealed that the surrogate-based optimization methods were 4 to 7 times faster than the conventional method. ANN surrogate-based optimization reduced function evaluations by 70%, paving the way for real-time optimization of the complex industrial sintering model.

Apply the proposed algorithm to build ANN models for an experimental setup and ensure the successful operation of the proposed ANN surrogate building algorithm with experimental setups.