2.3 Interpretability Landscape
The thesis focuses on interpreting individual predictions with both model-agnostic posthoc methods and model-specific implicit methods.
5. Local Interpretability for Multiple Predictions: A group of predictions can also be explained with global and modular methods or with instance-based explanations.
Here, the group of instances is either treated as if it represents the complete dataset, or individual explanation methods are applied to each instance and the resulting explanations are aggregated for the group [112].
Evaluating Interpretability Techniques: Although measuring interpretability is an active research field, there is still no real consensus on what interpretability means in machine learning or on how it should be evaluated. However, a few evaluation approaches have been proposed in the literature [29, 125]:
1. Expert Validation: The technique, embedded in a product performing a real task, is validated by domain experts. For instance, an ECG classification system generates heatmaps highlighting the signal timestamps relevant to a diagnosis, and a cardiologist evaluates the model's explanations.
2. Dilettante Validation: The technique is validated by dilettantes/non-experts. The advantage is that no domain expert is required and the technique can be validated with a larger pool of users. For instance, users can be asked to choose the best explanation from several candidate explanations.
3. Proxy Model Validation: The technique adopts a proxy function/model that has already been validated by non-experts. If users are known to understand decision trees, for example, tree depth can serve as a proxy for explanation quality, with short trees receiving a high explainability score.
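To make the proxy idea concrete, the following is a minimal sketch assuming scikit-learn; the inverse-depth score used here is only an illustrative choice, not a metric proposed in the cited works.

    # Sketch: tree depth as a proxy for explainability (no human study required).
    # Assumes scikit-learn; the dataset and the 1/depth scoring rule are illustrative only.
    from sklearn.datasets import load_breast_cancer
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)

    for max_depth in (2, 5, None):  # shallow to unconstrained trees
        tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0).fit(X, y)
        depth = tree.get_depth()
        explainability_proxy = 1.0 / depth      # shorter tree -> higher proxy score
        accuracy = tree.score(X, y)             # training accuracy, for contrast
        print(f"depth={depth:2d}  proxy_score={explainability_proxy:.2f}  accuracy={accuracy:.3f}")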
Nature of Interpretability Techniques: Interpretability techniques can be applied during or after model building. They are intrinsic if they are part of the model architecture and posthoc if they are applied later to an already trained model. Intrinsic and posthoc techniques are the main focus of this thesis and are discussed in detail in Section 2.3.4.
2.3.4 Implicit vs Posthoc Techniques
The techniques that are embedded in the model architecture and reveal the model's internal working are known as intrinsic or implicit interpretations [118, 121, 126, 129, 131]. On the other hand, the techniques that focus on model outputs and highlight the influence of changes in the inputs are known as posthoc interpretations, as they provide reasons for an output rather than information about the model's internal working [115, 121, 126, 127, 130, 131]. Posthoc interpretations also provide insights into why a model may fail or generate undesired effects [129]. Although the interpretations vary with the posthoc technique used, these techniques can be applied to pretrained models, so retraining can be omitted. Thus, these interpretations have better comprehensiveness but less completeness [131]. Model interpretability has been classified into various categories in the literature [14, 20, 112, 114, 118, 125, 129, 135, 138], but the general consensus is that it can be categorized into implicit and posthoc approaches [14, 15, 112, 114, 117, 120, 125, 129, 138–140]. A general description and the various sub-categories of implicit and posthoc techniques are provided below.
Implicit Approaches: The intrinsic or implicit interpretability methods are considered interpretable due to their simple structure, which restricts the complexity of the ML model [125]. Implicit approaches are generally model-specific as they are embedded in the model architecture; they describe model behavior at local and global levels, where global-level interpretation is more common [14, 114, 125]. Decision trees and linear models are considered intrinsically interpretable due to their simple structure. The interpretation of regression weights in a linear model is an example of model-specific interpretation. Implicit approaches can be sub-divided into transparent approaches, hybrid approaches, and approaches generating prototypes as explanations [14, 118].
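Before describing these sub-categories, the regression-weight example above can be made concrete. The following is a minimal sketch assuming scikit-learn; the dataset is only for illustration, and the point is that the learned weights themselves serve as the explanation.

    # Sketch: the weights of a linear model read off directly as the explanation.
    # Assumes scikit-learn; the dataset choice is illustrative only.
    from sklearn.datasets import load_diabetes
    from sklearn.linear_model import LinearRegression

    data = load_diabetes()
    model = LinearRegression().fit(data.data, data.target)

    # Each weight states how much the prediction changes per unit change of a feature.
    for name, weight in sorted(zip(data.feature_names, model.coef_),
                               key=lambda pair: abs(pair[1]), reverse=True):
        print(f"{name:>4s}: {weight:+.1f}")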
1. Transparent Approaches: A family of models whose internal mechanisms are comprehensible [14, 112]. These include decision trees, Naive Bayes, and logistic and linear regression models. The models themselves serve as the explanation and are also termed intrinsically interpretable, transparent, or white-box models [20, 22, 114, 120, 141]. These models generate explanations with high completeness, but their comprehensiveness depends on model complexity, and their restricted structure can lead to inferior predictive performance, as in the case of the linear regression model.
2. Explainable Prototypes: These approaches are applied in a cascaded fashion to a black-box model and provide prototypes as explanations, improving the transparency of the model's internal mechanisms [118]. The trade-off between completeness and comprehensiveness varies with the approach. For instance, in [31, 134], the authors combined clustering and neural networks for classification and the generation of explainable prototypes.
3. Hybrid Approaches: Approaches that combine transparent and black-box models, thereby sacrificing comprehensiveness but achieving better performance [14].
In [14], the authors combined logistic regression and a support vector machine, and the resulting model was transparent and provided explanations. Attention-based models produce heatmaps that highlight relevant signal timestamps for ECG classification [142–149], as sketched below. However, such implicit models are not applicable to previously developed pretrained models.
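The following is a minimal sketch of the attention-based heatmap idea, assuming PyTorch; the layer sizes, the single-lead random input, and the additive attention variant are illustrative choices, not the architectures proposed in [142–149].

    # Sketch: attention over ECG timestamps; the attention weights double as a heatmap.
    # Assumes PyTorch; layer sizes and the random input are illustrative only.
    import torch
    import torch.nn as nn

    class AttentionECGClassifier(nn.Module):
        def __init__(self, hidden=64, n_classes=2):
            super().__init__()
            self.encoder = nn.Conv1d(1, hidden, kernel_size=7, padding=3)  # per-timestamp features
            self.score = nn.Linear(hidden, 1)                              # attention score per timestamp
            self.classifier = nn.Linear(hidden, n_classes)

        def forward(self, x):                                   # x: (batch, 1, timestamps)
            h = torch.relu(self.encoder(x)).transpose(1, 2)     # (batch, timestamps, hidden)
            alpha = torch.softmax(self.score(h).squeeze(-1), dim=1)  # (batch, timestamps)
            context = (alpha.unsqueeze(-1) * h).sum(dim=1)           # attention-weighted summary
            return self.classifier(context), alpha                   # logits and heatmap

    model = AttentionECGClassifier()
    ecg = torch.randn(1, 1, 1000)                # one synthetic single-lead record
    logits, heatmap = model(ecg)
    print(heatmap.shape)                         # torch.Size([1, 1000]); weights sum to 1 per record

The heatmap is produced during the forward pass itself, which is why such interpretations are intrinsic and cannot be retrofitted to a model trained without the attention layer.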
Posthoc Approaches: Posthoc approaches are applied to already trained models and are also applicable to intrinsically interpretable models [20, 112, 114, 125, 129]. Posthoc approaches are generally model-agnostic or model-independent. They can be applied to a pretrained, intrinsically interpretable model to analyze the input-output relationship and thereby describe the model's internal mechanism [14, 114, 125, 135, 139, 140]. The approaches work at global and local levels, where local-level interpretations are more common. The generated interpretations have low completeness but high comprehensiveness and are more generalized. The approaches allow customization for varying user needs, thereby enabling comparisons and model switching in an already deployed system [140]. Posthoc methods approximate the reasoning of a pretrained classifier rather than providing a cause-effect relationship [31]. They do not have access to model weights or architectural information. Moreover, their explanations change with the classifier, which can lead to multiple conflicting yet convincing explanations for the same prediction [31]. Posthoc approaches generate visualizations through partial dependence plots, rule extraction, and feature-influence methods such as sensitivity analysis [114, 125]. For instance, a feature ranking can be calculated for a decision tree, as sketched below.
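As one concrete instance of such a model-agnostic feature-influence analysis, permutation importance can rank the features of an already trained decision tree using only its inputs and outputs. A minimal sketch, assuming scikit-learn; the dataset and the tree itself are illustrative only.

    # Sketch: model-agnostic posthoc feature ranking via permutation importance.
    # Assumes scikit-learn; the dataset and the tree are illustrative only.
    from sklearn.datasets import load_breast_cancer
    from sklearn.inspection import permutation_importance
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)

    # Shuffle one feature at a time and record the drop in accuracy;
    # only the model's predictions are used, not its internal structure.
    result = permutation_importance(tree, X, y, n_repeats=10, random_state=0)
    ranking = result.importances_mean.argsort()[::-1]
    print("most influential feature indices:", ranking[:5])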
Posthoc techniques can be categorized into perturbation-based and backpropagation-based methods. Perturbation-based methods perturb the input signal with noise and observe the change in the predicted probability; perturbing the most contributing signal timestamps reduces the predicted probability by the largest amount [150].
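A minimal sketch of this perturbation idea for a one-dimensional signal is given below, assuming PyTorch; the untrained toy classifier, the window size, and zeroing as the choice of perturbation are illustrative assumptions only.

    # Sketch: perturbation-based saliency; occlude a window of timestamps and record the probability drop.
    # Assumes PyTorch; the untrained toy classifier and window size are illustrative only.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv1d(1, 8, 7, padding=3), nn.ReLU(),
                          nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2))
    model.eval()

    ecg = torch.randn(1, 1, 1000)                       # one synthetic single-lead record
    window, saliency = 20, torch.zeros(1000)
    with torch.no_grad():
        probs = model(ecg).softmax(dim=1)
        target = probs.argmax(dim=1)                    # predicted class
        base_prob = probs[0, target]
        for start in range(0, 1000, window):            # one extra forward pass per window
            perturbed = ecg.clone()
            perturbed[0, 0, start:start + window] = 0.0
            prob = model(perturbed).softmax(dim=1)[0, target]
            saliency[start:start + window] = base_prob - prob  # large drop -> influential timestamps
    print(saliency.abs().argmax())                      # index within the most influential window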
However, multiple forward passes are required for a single input, making the technique computationally expensive. The method is also vulnerable to perturbation artifacts, since a sudden perturbation of a normal-class signal might result in a diseased-class prediction, making the technique unreliable. Backpropagation-based posthoc techniques are computationally cheaper and measure the contribution of each input signal timestamp through a backward pass. They can be further categorized into relevance score backpropagation-based techniques and gradient backpropagation-based techniques.
1. Relevance score backpropagation methods: These techniques calculate the relevance of each input signal timestamp by backpropagating the probability score
instead of the gradient. Two relevance score-based techniques have been proposed in the literature, namely Layerwise Relevance Propagation (LRP) [151] and Deep Learning Important FeaTures (DeepLIFT) [152]. LRP propagates relevance scores backward and redistributes them in proportion to the activations of the preceding layers (see the sketch after this list). Since the redistribution is based on activation scores, the technique does not suffer from difficulties with non-linear activation layers. DeepLIFT explains the difference between the output prediction and the prediction on a baseline reference signal instead of explaining the output prediction directly [152].
2. Gradient backpropagation methods: These techniques backpropagate the output gradient to the input layer to describe the influence of each input timestamp on the predicted class. The methods describe the local behavior of the output at specific input timestamps. Only a single forward and backward pass is required, instead of the multiple forward passes needed by perturbation-based methods, making the technique computationally inexpensive and free of perturbation artifacts. Gradient backpropagation-based techniques include Guided Backpropagation (GBP) [36], Gradient-weighted Class Activation Mapping (Grad CAM) [37], and their hybrid, Guided Grad CAM [37], as sketched below.
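The relevance redistribution behind LRP can be sketched for a single dense layer as follows, assuming PyTorch and the epsilon-style rule; a complete implementation starts from the model's actual output score and repeats this step layer by layer down to the input, handling every layer type of the network.

    # Sketch: LRP-style relevance redistribution through one dense layer (epsilon rule).
    # Assumes PyTorch; weights, activations, and the stabilizer value are illustrative only.
    import torch

    torch.manual_seed(0)
    a = torch.rand(4)                         # activations entering the layer
    W = torch.randn(3, 4)                     # layer weights (3 output neurons)
    z = W @ a                                 # pre-activations of the layer
    relevance_out = torch.softmax(z, dim=0)   # start from a prediction score, not a gradient

    contrib = W * a                           # contribution of each input activation to each output
    denom = z + 1e-6 * torch.sign(z)          # stabilizer keeps the division well defined
    # Redistribute each output neuron's relevance in proportion to these contributions.
    relevance_in = (contrib / denom.unsqueeze(1)).T @ relevance_out
    print(relevance_in, relevance_in.sum(), relevance_out.sum())  # relevance is (approximately) conserved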
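For the gradient backpropagation family, the following is a minimal sketch of plain gradient saliency for a one-dimensional signal, again assuming PyTorch and an untrained toy classifier; GBP and Grad CAM are refinements of this backward pass (masking negative gradients at ReLUs, or weighting convolutional feature maps by pooled gradients) rather than raw input gradients.

    # Sketch: gradient backpropagation saliency; one forward and one backward pass per input.
    # Assumes PyTorch; the untrained toy classifier is illustrative only.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Conv1d(1, 8, 7, padding=3), nn.ReLU(),
                          nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(8, 2))
    model.eval()

    ecg = torch.randn(1, 1, 1000, requires_grad=True)   # one synthetic single-lead record
    logits = model(ecg)                                  # single forward pass
    logits[0, logits.argmax()].backward()                # single backward pass for the predicted class

    saliency = ecg.grad.abs().squeeze()                  # |d output / d input| per timestamp
    print(saliency.shape, saliency.argmax())             # (1000,), most influential timestamp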
This thesis focuses on gradient backpropagation-based posthoc techniques for ECG interpretation, namely GBP, Grad CAM, and Guided Grad CAM, as well as implicit approaches, namely the attention mechanism and explainable prototypes.