
Different methods for explainable AI by Bhusan Chettri

This tutorial from Bhusan Chettri provides an overview of different methods within the interpretable machine learning (IML), also known as explainable AI (xAI), framework. It is the third installment of the interpretable AI tutorial series by Dr Bhusan Chettri, a PhD graduate in AI and Voice Technology from Queen Mary University of London. The tutorial explains different approaches to explaining, or understanding, how data-driven machine learning models work.


Presentation Transcript


1. Interpretable Machine Learning methods

Methods for interpreting AI models usually fall into two categories: (1) designing inherently interpretable models that are fairly easy and straightforward to understand; and (2) devising specialised algorithms and methods to analyse, or unbox, a pre-trained black-box machine learning model (usually deep learning based). The second category is often referred to as post-hoc interpretability: it takes a pre-trained model as given rather than incorporating interpretability conditions during training, as in the first approach. Since the two topics are too broad to cover in a single tutorial, this installment focuses mainly on the first category; the follow-up tutorial will focus on post-hoc interpretability methods, the second category.

Before going further, it is worth briefly revisiting the previous installments of this tutorial series.

Part 1 provided an overview of AI, machine learning, data, big data and interpretability. It is well known that data has been the driving fuel behind the success of every machine learning and AI application. The first part discussed how vast amounts of data are produced every single minute from different sources (online transactions, sensors, surveillance and social media), and how today's fast-growing digital age, which generates such massive data (commonly referred to as Big Data), has been one of the key factors behind the apparent success of current AI systems across different sectors. It also highlighted how AI, machine learning and deep learning are inter-related: deep learning is a subset of machine learning, and machine learning is a subset of AI; in other words, AI is a general term that encompasses both machine learning and deep learning. The tutorial briefly explained back-propagation, the engine of neural networks, and finally provided a basic overview of IML, stressing its importance for understanding how a model arrives at a particular outcome. Please read the part 1 tutorial for more details.

Part 2 of the series provided insights on xAI and IML in safety-critical application domains such as medicine, finance and security, where deployment of ML or AI requires satisfying certain criteria (such as fairness, trustworthiness and reliability). It explained the need for interpretability in today's state-of-the-art ML models, which offer impressive results as measured by a single evaluation metric (e.g., accuracy). Wildlife monitoring and automated tuberculosis detection were used as two use cases to elaborate the need for xAI in detail. Furthermore, the tutorial discussed how dataset biases can impact the adoption of machine learning models in

2. real-world scenarios, and how crucial it is to understand the training data. Please read the part 2 tutorial for details.

Interpretability methods

This tutorial focuses on explaining different interpretability methods for understanding the behaviour of machine learning models. There has been a tremendous amount of research on IML, and researchers have proposed several methods to explain how ML models work. Different taxonomies of IML methods can be found in the literature, although they are not entirely consistent with one another. For simplicity, this tutorial summarises IML methods in two broad categories. The first involves designing ML models that are implicitly interpretable; this class comprises simple models, such as decision trees, that are easy to interpret in themselves. The second involves attempting to understand what a pre-trained model has learned from the underlying data in order to reach a particular outcome or decision. This is called post-hoc analysis; it takes a pre-trained model that is often black-box in nature, for example a deep neural network.

Towards designing interpretable models

In this approach, researchers aim to solve a given problem using ML models that do not require any post-hoc analysis once trained; instead, the focus is on building models that are easy to interpret in themselves. Although these methods offer a good degree of explainability, which is encoded into the model itself, they often suffer in terms of performance: the underlying simplicity of the model architecture frequently fails to capture a complex data distribution. This of course depends on, and varies across, problem domains. Nonetheless, such models are easy to understand, which is key in many safety-critical application domains, for example finance and medicine. During training, these models are conditioned to satisfy certain criteria (for example sparsity) in order to maintain interpretability; the conditions may take different forms depending on the nature of the problem. These models are often referred to as white boxes, intrinsically explainable models or transparent boxes. To understand how they work, one can inspect different model components directly, for example the nodes visited from the root to a leaf node in a decision tree. Such analysis provides enough insight into why and how the model made a certain decision.
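To make the decision-tree example above concrete, the sketch below dumps the learned if-else rules and traces the root-to-leaf path behind a single prediction. It is a minimal illustration only; scikit-learn and the Iris dataset are assumptions of this sketch, not choices made in the tutorial.

```python
# Minimal sketch: inspecting a trained decision tree directly.
# scikit-learn and the Iris data are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Human-readable dump of every if-else rule the model has learned.
print(export_text(tree, feature_names=load_iris().feature_names))

# Nodes visited by one example on its way from the root to a leaf,
# i.e. the exact rules behind this single prediction.
path = tree.decision_path(X[:1]).indices
print("nodes visited:", path.tolist(), "-> predicted class:", tree.predict(X[:1])[0])
```

Every split on that path corresponds to one human-readable condition, which is exactly the kind of direct inspection of model components described above.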

3. Approach 1: Rule-based models

The first category of methods applies a predefined set of rules, which are often mutually exclusive or dependent, while training the model. One well-known example of this model class is the decision tree, which comprises a set of if-else rules. Because of the simplicity of if-else rules, it is much easier to see how the model forms a particular prediction. Researchers have proposed an extension of the decision tree called decision lists, which comprise an ordered set of if-then-else statements; these models make a decision whenever a particular rule holds true.

Approach 2: Case-based reasoning and prototype selection

In this approach, prototype selection and case-based reasoning are applied to design interpretable ML models. Here, a prototype can mean different things in different applications, so it is application specific. For example, the average of N training examples from a particular class in the training dataset can be regarded as a prototype. Once trained, such a model performs inference (or prediction) by computing the similarity of a test example to every element in the prototype set. Researchers have also combined unsupervised clustering with prototype and subspace learning to build an interpretable Bayesian case model, where each subspace is defined as a subset of features characterising a prototype. Learning such prototypes and low-dimensional subspaces promotes interpretability and supports generating explanations from the learned model; a minimal sketch of prototype-based prediction follows this section.

Approach 3: Towards building inherently interpretable models

In this approach, researchers develop training algorithms, and often dedicated model architectures, that bring interpretability to black-box machine learning models (especially deep learning based ones). One common and quite popular method in the literature for promoting interpretability is the use of attention mechanisms during model training. Attention encodes some degree of explainability into the training process itself: it weighs feature components of the input (which can later be visualised), so one can understand which parts of the input the model relies on most heavily when forming a particular prediction, in contrast to other feature components (a minimal attention sketch also follows this section). On a related note, researchers have also embedded special layers within deep neural network (DNN) architectures to train models in an interpretable way for different machine learning tasks. The output of such a layer, which provides different information (for example, different parts of the input), can later be used at inference time to explain or understand the different class categories.
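As noted under Approach 2, the following minimal sketch shows prototype-based prediction, where each class prototype is the mean of that class's training examples and a test example is assigned to the most similar prototype. NumPy, the cosine-similarity measure and the toy data are illustrative assumptions, not methods prescribed in the tutorial.

```python
# Minimal sketch of prototype-based (case-based) prediction: one prototype
# per class, defined as the mean of that class's training examples.
import numpy as np

def build_prototypes(X_train, y_train):
    """Return {class_label: mean feature vector of that class}."""
    return {c: X_train[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def predict(x, prototypes):
    """Assign x to the class whose prototype it is most similar to.
    The per-class similarities double as a simple explanation."""
    sims = {c: cosine_similarity(x, p) for c, p in prototypes.items()}
    return max(sims, key=sims.get), sims

# Toy usage with random two-class data (purely illustrative).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
label, sims = predict(X[0], build_prototypes(X, y))
print("predicted class:", label, "similarities:", sims)
```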
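For Approach 3, the sketch below illustrates the attention idea in its simplest form: the model learns a weight for each input component (here, frames of a feature sequence), and those weights can be inspected to see which parts of the input drive a prediction. PyTorch and all dimensions are assumptions for illustration only.

```python
# Minimal sketch of attention-style feature weighting for interpretability.
# PyTorch, the frame/feature dimensions and the linear layers are assumptions.
import torch
import torch.nn as nn

class AttentionClassifier(nn.Module):
    def __init__(self, n_frames=50, frame_dim=13, n_classes=2):
        super().__init__()
        self.scorer = nn.Linear(frame_dim, 1)        # one relevance score per frame
        self.classifier = nn.Linear(frame_dim, n_classes)

    def forward(self, x):                            # x: (batch, n_frames, frame_dim)
        weights = torch.softmax(self.scorer(x).squeeze(-1), dim=1)  # (batch, n_frames)
        pooled = torch.einsum("bt,btd->bd", weights, x)             # weighted sum of frames
        return self.classifier(pooled), weights      # the weights are the explanation

model = AttentionClassifier()
logits, attn = model(torch.randn(4, 50, 13))
print("most-attended frame per example:", attn.argmax(dim=1).tolist())
```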

4. Furthermore, training tricks such as network regularisation have been used in the literature to make convolutional neural network models more interpretable. Such regularisation guides the training algorithm towards learning disentangled representations of the input, which helps the model learn weights (i.e. filters) that capture more meaningful features. Another line of work proposes self-explainable DNNs, whose architecture comprises an encoder module, a parameterizer module and an aggregation function module; a sketch of this idea appears after this section.

It should be noted, however, that designing interpretable models is not favourable in every situation. While it is true that they provide inherent explainability through their design choices, this approach has limitations and challenges. One challenge is the choice of input features: what if the input features themselves are hard for humans to understand? For example, Mel Frequency Cepstral Coefficients (MFCCs) are among the standard features used in automatic speech recognition systems, yet they are not easily interpretable. In that case, the explanations obtained from a trained interpretable model would still lack interpretability because of the choice of input features.

Thus, as highlighted earlier, there is always a trade-off between model complexity and model interpretability. The lower the model complexity, the higher the interpretability, but the lower the model performance; conversely, the higher the model complexity, the lower the interpretability (but generally the better the performance on a test dataset). In almost every application domain (audio, video, text, images, etc.), models that achieve high accuracy are complex in nature. It is hard to reach state-of-the-art performance on a given task with a simple interpretable model, for example linear regression, because its simplicity prevents it from learning the complex data distribution in the training set, and hence it performs poorly on a test set. Post-hoc methods have therefore been developed and explored by researchers across many domains to understand what complex machine learning models capture from the input data when making predictions. The next section provides a brief introduction to post-hoc interpretability methods.

Post-hoc interpretability methods

This class of interpretability methods works on a pre-trained machine learning model. Post-hoc methods aim to investigate the behaviour of a pre-trained model using specially devised algorithms. This means that they do not impose any interpretability-related conditions during model training; the models being investigated with post-hoc approaches are therefore usually complex deep learning models that are black-box in nature. These methods are broadly grouped into two parts.
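Returning to the self-explainable DNN architecture mentioned on this slide, the sketch below shows the general shape of such a model: an encoder produces concept activations, a parameterizer produces relevance scores, and an aggregation function combines the two into a prediction. PyTorch, the layer sizes and the linear aggregation are assumptions of this sketch rather than details given in the tutorial.

```python
# Minimal sketch of a self-explaining network: encoder -> concepts h(x),
# parameterizer -> relevances theta(x), aggregator -> prediction.
# PyTorch and all dimensions are illustrative assumptions.
import torch
import torch.nn as nn

class SelfExplainingNet(nn.Module):
    def __init__(self, in_dim=20, n_concepts=5, n_classes=3):
        super().__init__()
        # Encoder: maps the raw input to a small set of concept activations.
        self.encoder = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                     nn.Linear(32, n_concepts))
        # Parameterizer: one relevance score per (class, concept) pair.
        self.parameterizer = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(),
                                           nn.Linear(32, n_concepts * n_classes))
        self.n_concepts, self.n_classes = n_concepts, n_classes

    def forward(self, x):
        h = self.encoder(x)                                         # (B, C) concepts
        theta = self.parameterizer(x).view(-1, self.n_classes, self.n_concepts)
        logits = torch.einsum("bkc,bc->bk", theta, h)               # aggregation: sum_c theta * h
        return logits, h, theta                                     # theta * h is the explanation

model = SelfExplainingNet()
logits, concepts, relevances = model(torch.randn(4, 20))
print(logits.shape, concepts.shape, relevances.shape)
```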

5. The first class of methods aims at understanding the global, or overall, behaviour of machine learning models (deep learning models in particular). The second class focuses on understanding the local behaviour of the models, for example producing explanations of which features (among the N input features) contributed most to a particular prediction. It should also be noted that post-hoc methods can be applicable to any machine learning model (so-called model-agnostic methods) or designed specifically for a particular class of machine learning models (so-called model-specific methods); a minimal sketch of a model-agnostic local explanation follows below. In the next tutorial in this series, Bhusan Chettri will discuss post-hoc methods of model interpretability in more detail.
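As a preview of the local, model-agnostic explanations mentioned above, the sketch below perturbs one input feature at a time and measures how much the black-box model's predicted probability changes; the features causing the largest drops are the ones that contributed most to that particular prediction. The random-forest model, the dataset and the mean-replacement perturbation are illustrative assumptions, not the specific post-hoc methods the next tutorial will cover.

```python
# Minimal sketch of a model-agnostic, local explanation: replace each feature
# with its background mean and record the drop in predicted probability.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
X, y = data.data, data.target
black_box = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

def local_attribution(model, x, background):
    """Per-feature importance for one example x: drop in the probability of
    the predicted class when that feature is replaced by its background mean.
    (For this binary task the class labels 0/1 match the proba columns.)"""
    base_class = model.predict([x])[0]
    base_prob = model.predict_proba([x])[0, base_class]
    scores = []
    for j in range(len(x)):
        x_pert = x.copy()
        x_pert[j] = background[j]
        scores.append(base_prob - model.predict_proba([x_pert])[0, base_class])
    return np.array(scores)

scores = local_attribution(black_box, X[0], background=X.mean(axis=0))
for j in np.argsort(-np.abs(scores))[:3]:
    print(f"{data.feature_names[j]}: {scores[j]:+.3f}")
```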
