One of the main motivations of this thesis is to perform a multi-scenario analysis of the experimentally available binned data, in order to obtain a data-based selection of a best case and a ranking and weighting of the remaining cases. For this purpose, different models can be compared with regard to their goodness of fit by computing a ∆χ^{2} test. This test allows one to decide whether a given model fits significantly better or worse than a competing model.

2.5.1 ∆χ^{2} test

A ∆χ^{2} test is useful when the competing models are nested. Two models are considered “nested” if one is a subset or extension of the other, i.e. one of the models can be obtained by fixing or eliminating parameters in the other model. When the model with the fewer free parameters (the null model, in many cases) is true, and when certain regularity conditions are satisfied, Wilks’ theorem [137] states that this difference (∆χ^{2}) should follow a χ^{2} distribution with the number of degrees of freedom equal to the difference in the number of free parameters between the two models. This lets one compute a p-value and compare it to a critical value to decide whether to reject the null model in favor of the alternative model.
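As an illustration, the p-value computation above can be sketched in a few lines. The χ^{2} values and parameter counts below are hypothetical placeholders, not results of this analysis; for simplicity the sketch restricts itself to an even number of degrees of freedom, where the χ^{2} survival function has a simple closed form.

```python
import math

def chi2_sf_even(x, dof):
    """Survival function P(chi2_dof > x) for an even number of dof:
    exp(-x/2) * sum_{j=0}^{dof/2 - 1} (x/2)^j / j!  (closed form)."""
    assert dof % 2 == 0 and dof > 0
    t = x / 2.0
    return math.exp(-t) * sum(t**j / math.factorial(j) for j in range(dof // 2))

# Hypothetical fit results for a null (restricted) and an alternative model
chi2_null, k_null = 112.4, 3
chi2_alt, k_alt = 101.9, 5

delta_chi2 = chi2_null - chi2_alt   # improvement gained by the extra parameters
dof = k_alt - k_null                # difference in free parameters (Wilks' theorem)
p_value = chi2_sf_even(delta_chi2, dof)

# Reject the null model at, e.g., the 5% level if p_value < 0.05
print(f"delta chi2 = {delta_chi2:.1f}, dof = {dof}, p = {p_value:.4f}")
```

In this invented example the improvement of 10.5 units of χ^{2} for 2 extra parameters yields a p-value well below 0.05, so the null model would be rejected.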

Despite the simplicity of this technique, we must remember that, unlike the second-order Akaike information criterion (AIC_{c}) or the Schwarz-Bayesian criterion (BIC) [138], which incorporate the concept of parsimony and can be applied to nested as well as non-nested models, the ∆χ^{2} test can only be applied to nested models.

One of the most powerful and reliable methods for model comparison is cross validation. The most straightforward (and also most expensive) flavor of cross validation is leave-one-out cross validation (LOOCV). It tests the predictive power of the model while simultaneously keeping bias and variance in check. In LOOCV, one data point is left out and the model is optimized on the rest of the sample (the training set). The resulting fit is then used to compute the predicted residual for the left-out data point. This process is repeated for all data points, and a mean squared error (MSE) is obtained. For model selection, this MSE is minimized. Unfortunately, LOOCV is computationally very expensive, so we have to find another reasonable method for model selection.
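To make the procedure concrete, here is a minimal pure-Python sketch of LOOCV. A straight-line least-squares model and invented data points stand in for whatever parametrization is actually fitted in practice.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def loocv_mse(xs, ys):
    """Leave-one-out cross validation: mean squared predicted residual."""
    sq_errs = []
    for i in range(len(xs)):
        tx, ty = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]  # training set without point i
        a, b = fit_line(tx, ty)
        sq_errs.append((ys[i] - (a + b * xs[i])) ** 2)      # predicted residual, squared
    return sum(sq_errs) / len(sq_errs)

# Invented data scattered around y = 1 + 2x
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]
print(f"LOOCV MSE = {loocv_mse(xs, ys):.4f}")
```

The n refits per candidate model are exactly what makes LOOCV expensive when each fit is itself a costly optimization.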

To that end, we make use of information-theoretic approaches, in particular the second-order Akaike information criterion (AIC_{c}), in the analysis of empirical data. It has been shown that LOOCV is asymptotically equivalent to minimizing the AIC [139].

2.5.2 Introduction to AIC_{c}

The ‘concept of parsimony’ [140] dictates that a model representing the truth should be obtained with “... the smallest possible number of parameters for adequate representation of the data.” In general, bias decreases and variance increases as the dimension of the model increases. Often, the number of parameters in a model is used as a measure of the degree of structure inferred from the data. The fit of any model can be improved by increasing the number of parameters. Parsimonious models achieve a proper trade-off between bias and variance. All model selection methods are based to some extent on the principle of parsimony [141].

In information theory, the Kullback-Leibler (K-L) information or measure I(f, g) denotes the information lost when g is used to approximate f. Here f is a notation for full reality or truth, and g denotes an approximating model in terms of a probability distribution. I(f, g) can also be defined between the ‘best’ approximating model and a competing one. Akaike, in his seminal paper [142], proposed the use of the K-L information as a fundamental basis for model selection. However, the K-L distance cannot be computed without full knowledge of both f (full reality) and the parameters (Θ) in each of the candidate models g_{i}(x|Θ) (a model g_{i} with parameter set Θ explaining data x). Akaike found a rigorous way to estimate the K-L information, based on the empirical log-likelihood function at its maximum point.

‘Akaike’s information criterion’ (AIC) with respect to our analysis can be defined as

AIC = χ^{2}_{min} + 2K (2.15)

where K is the number of estimable parameters. In application, one computes AIC for each of the candidate models and selects the model with the smallest value of AIC. It is this model that is estimated to be “closest” to the unknown reality that generated the data, from among the candidate models considered.
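This selection step can be sketched directly; the (χ^{2}_{min}, K) pairs below are invented stand-ins for real fit results.

```python
# Eq. (2.15): AIC = chi2_min + 2K, evaluated for hypothetical candidate models.
candidates = {
    "model A": (95.2, 3),   # (chi2_min, K) -- illustrative numbers only
    "model B": (92.8, 5),
    "model C": (99.1, 2),
}
aic = {name: chi2_min + 2 * k for name, (chi2_min, k) in candidates.items()}
best = min(aic, key=aic.get)  # smallest AIC -> estimated closest to reality
print(best, aic)
```

Note that model B has the lowest raw χ^{2}_{min}, yet loses to model A once the 2K penalty is applied; this is the parsimony trade-off at work.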

Although Akaike derived AIC as an estimator of the K-L information, it may perform poorly if there are too many parameters in relation to the size of the sample. Sugiura [143] derived a second-order variant of the AIC,

AIC_{c} = χ^{2}_{min} + 2K + 2K(K+1)/(n−K−1) (2.16)

where n is the sample size. As a rule of thumb, the use of AIC_{c} is preferred in the literature when n/K < 40. Various other such information criteria have been defined since, e.g. QAIC, QAIC_{c}, TIC, etc. In this analysis, we consistently use AIC_{c}.
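Eq. (2.16) translates directly into code; the numbers below are again illustrative. The correction term vanishes as n grows, recovering the plain AIC of Eq. (2.15).

```python
def aicc(chi2_min, k, n):
    """Second-order Akaike criterion, Eq. (2.16)."""
    return chi2_min + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Small sample: the correction matters (n/K = 5 here, well below 40)
small = aicc(100.0, 4, 20)
# Large sample: AICc approaches AIC = chi2_min + 2K
large = aicc(100.0, 4, 10**6)
print(f"n=20: {small:.3f}   n=10^6: {large:.5f}")
```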

Whereas AIC_{c} values are on a relative (or interval) scale and are strongly dependent on sample size, simple differences of AIC_{c} values (∆^{AIC}_{i} = AIC^{i}_{c} − AIC^{min}_{c}) allow estimates of the relative expected K-L differences between f and g_{i}(x|Θ). This allows a quick comparison and ranking of candidate models. The model estimated to be best has ∆^{AIC}_{i} ≡ ∆^{AIC}_{min} = 0. The larger ∆^{AIC}_{i} is, the less plausible it is that the fitted model g_{i}(x|Θ) is the K-L best model, given the data x. Table 2.1 lists rough rule-of-thumb values of ∆^{AIC}_{i} for the analysis of nested models.

∆^{AIC}_{i}   Level of Empirical Support for Model i

0−2           Substantial
4−7           Considerably Less
>10           Essentially None

Table 2.1: Rough rule-of-thumb values of ∆^{AIC}_{i} for the analysis of nested models.

While the ∆^{AIC}_{i} are useful in ranking the models, it is also possible to quantify the plausibility of each model being the actual K-L best model. This can be done by extending the concept of the likelihood of the parameters given both the data and the model, i.e. L(Θ|x, g_{i}), to the concept of the likelihood of the model given the data, hence L(g_{i}|x):

L(g_{i}|x) ∝ e^{−∆^{AIC}_{i}/2}. (2.17)

Such likelihoods represent the relative strength of evidence for each model [144].

To better interpret the relative likelihood of a model, given the data and the set of R models, we normalize the L(g_{i}|x) to a set of positive Akaike weights, w_{i}, adding up to 1:

w_{i} = e^{−∆^{AIC}_{i}/2} / Σ^{R}_{r=1} e^{−∆^{AIC}_{r}/2} (2.18)

A given w_{i} is considered the weight of evidence in favor of model i being the actual K-L best model for the situation at hand, given that one of the R models must be the K-L best model of that set. The w_{i} depend on the entire set; therefore, if a model is added or dropped during a post hoc analysis, the w_{i} must be recomputed for all the models in the newly defined set.
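The normalization of Eq. (2.18) can be sketched in a few lines; the AIC_{c} values below are made up purely for illustration.

```python
import math

def akaike_weights(aicc_values):
    """Eq. (2.18): normalize the relative likelihoods exp(-Delta_i/2) to sum to 1."""
    best = min(aicc_values)
    # Delta_i = AICc_i - AICc_min, then Eq. (2.17) for the relative likelihoods
    rel = [math.exp(-(a - best) / 2) for a in aicc_values]
    total = sum(rel)
    return [r / total for r in rel]

# Hypothetical AICc values for three candidate models
weights = akaike_weights([101.2, 102.8, 110.4])
print([f"{w:.3f}" for w in weights])
```

Because the weights are built from the ∆^{AIC}_{i}, adding or removing a model changes the denominator for every weight, which is exactly why the w_{i} must be recomputed when the model set changes.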