Explainable AI methods such as LIME make it possible to explain and interpret the predictions of machine learning models. They ease the trade-off between interpretability and performance, i.e. between complex models that can handle large and varied data sets and simpler models that are much easier to interpret but usually perform worse.
In short, they provide insight into the criteria behind predictions of machine learning models.
This lets you target a model's weak points and counteract them during training.
Many machine learning applications can only support a decision if users trust them. Otherwise their advice may simply be ignored. If one knows the reasons for a prediction thanks to explainable AI methods such as LIME, the decision to trust the application is easier.
The number of machine learning-based applications in everyday life is increasing rapidly. Whereas in the past statically programmed rules determined how a program behaved, dynamic models are now used more and more. Their output is not fixed from the beginning but depends heavily on the training data. When developing these models, it is therefore crucial not only to evaluate them with metrics such as the accuracy achieved on test data, but also to consider their interpretability. Interpretability in this context means understanding why a model makes certain predictions. Knowing these reasons increases the acceptance of and confidence in machine learning-based applications.
For optimal results with large and varied data sets, one often has to rely on complex models. However, users can usually interpret such models only poorly, so in some cases one has to fall back on a simpler model and accept a reduction in performance. To increase the interpretability of even complex deep learning models, toolkits have been developed that illustrate the causes of model predictions.
In medicine, for example, such a model can then help to detect cancer in CT or MRI images. Explainable Artificial Intelligence (XAI) tools quickly and easily highlight the areas of an image that are crucial for the classification, giving the attending doctor significant support in making decisions.
Solutions for better interpretability exist even for the complex models used in deep learning, e.g. for image classification.
The explainable AI method LIME (Local Interpretable Model-agnostic Explanations) helps to illuminate a machine learning model and to make its predictions individually comprehensible. The method explains the classifier for a specific single instance and is therefore suitable for local explanations.
Put simply, LIME manipulates the input data and creates a series of artificial data points that each contain only part of the original attributes. In the case of text data, for example, it creates different versions of the original text in which a certain number of randomly selected words are removed. The model then classifies each of these artificial data points. The presence or absence of certain keywords thus reveals their influence on the classification of the selected text.
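The perturbation idea described above can be sketched in a few lines of plain Python. This is a simplified illustration, not the LIME implementation: the function names, the `toy_classifier` and the example sentence are hypothetical, and the influence of a word is estimated by a simple difference of mean predictions, whereas LIME itself fits a weighted interpretable model to the perturbed samples.

```python
import random


def perturb_text(text, num_samples=5, seed=0):
    """Create variants of `text` by removing randomly selected words.

    Returns a list of (mask, variant) pairs, where mask[i] is 1 if the
    i-th word was kept and 0 if it was removed. Assumes at least two words.
    """
    rng = random.Random(seed)
    words = text.split()
    samples = []
    for _ in range(num_samples):
        # Remove at least one word, but never all of them.
        n_remove = rng.randint(1, len(words) - 1)
        removed = set(rng.sample(range(len(words)), n_remove))
        mask = [0 if i in removed else 1 for i in range(len(words))]
        variant = " ".join(w for w, keep in zip(words, mask) if keep)
        samples.append((mask, variant))
    return samples


def toy_classifier(text):
    """Hypothetical stand-in for a trained model: score for 'hockey'."""
    tokens = text.split()
    score = 0.1
    if "NHL" in tokens:
        score += 0.6
    if "puck" in tokens:
        score += 0.2
    return score


def word_influence(text, classifier, num_samples=200, seed=0):
    """Estimate each word's influence as the mean prediction of variants
    containing the word minus the mean prediction of variants without it.
    Assumes every word occurs only once in `text`."""
    words = text.split()
    samples = perturb_text(text, num_samples, seed)
    preds = [classifier(variant) for _, variant in samples]
    influence = {}
    for i, word in enumerate(words):
        present = [p for (mask, _), p in zip(samples, preds) if mask[i]]
        absent = [p for (mask, _), p in zip(samples, preds) if not mask[i]]
        if present and absent:
            influence[word] = sum(present) / len(present) - sum(absent) / len(absent)
    return influence


text = "the NHL game had a fast puck"
ranked = sorted(word_influence(text, toy_classifier).items(), key=lambda kv: -kv[1])
for word, value in ranked:
    print(f"{word:>5s} {value:+.2f}")
```

Words that the toy classifier ignores can still receive a small non-zero value here, because their presence is correlated with the presence of the relevant words in the random samples. This is one reason why a fitted surrogate model is preferable to a plain difference of means.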
In principle, the explainable AI method LIME is compatible with many different classifiers and can be used with text, image and tabular data. The same pattern therefore applies to image classification, where each artificial data point contains only some of the sections (pixels) of the original image instead of some of the original words.
The following example demonstrates this approach: we train a Natural Language Processing (NLP) model that assigns message texts to categories such as hockey, cars or baseball.
For the example we use the following categories from the public 20 newsgroups dataset that ships with the scikit-learn package:
An entry from the data looks like this:
To be able to classify the messages, we use a simple pipeline which first transforms the raw text data into vector form compatible with machine learning models via a tf-idf vectorizer. Then we use a multinomial Naive Bayes model to classify the transformed data.
Checking the classification accuracy on a held-out test set yields a good value of 91.9%.
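The pipeline and the accuracy check described above could look roughly like the following sketch. It assumes scikit-learn is installed and uses a tiny, hypothetical in-memory corpus instead of the newsgroup messages, so it runs without downloading anything but does not reproduce the 91.9% reported above. The vectorizer and classifier are left at their default settings, which may differ from the original setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the newsgroup messages.
train_texts = [
    "the goalie stopped the puck in the NHL playoff game",
    "the Oilers scored on the power play",
    "the pitcher threw a fastball in the ninth inning",
    "a home run won the baseball game",
    "the engine and brakes of my car need service",
    "which tires fit this sedan",
]
train_labels = ["hockey", "hockey", "baseball", "baseball", "cars", "cars"]

# Step 1: TF-IDF turns raw text into numeric vectors.
# Step 2: multinomial Naive Bayes classifies those vectors.
pipeline = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipeline.fit(train_texts, train_labels)

# Held-out texts that were not used for training.
test_texts = ["the puck hit the goalie", "new brakes for the car"]
test_labels = ["hockey", "cars"]

print(pipeline.predict(test_texts))
print("accuracy:", pipeline.score(test_texts, test_labels))
```

Wrapping both steps in one pipeline has a practical advantage for what follows: the pipeline accepts raw strings and exposes `predict_proba`, which is the kind of function an explainer can call on perturbed texts.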
To investigate whether the NLP model also uses meaningful words (features) for the classification, we use the LIME Text Explainer. To do so, we pass individual text instances to the explainer. The return value shows how strongly the individual features (words) contributed to assigning the tested text instance to a specific class. To keep the output clear and understandable, we limit it to the 6 features with the greatest influence on the classification.
In our case, the text shown above was assigned to the hockey category with a probability of 85%. The decisive features were the words Oilers (a Canadian ice hockey club) and NHL (National Hockey League), but also NBA (National Basketball Association). While the first two key features seem reasonable and correct, the choice of the word NBA for the hockey class already raises first doubts about the correctness of our model.
So why did the model learn the word NBA as one of the characteristic features for the hockey category? One possible explanation is that in our training data, NBA often appears in this category, but only sporadically or never in the other categories. Nevertheless, the question arises whether this is a suitable feature for the use case of the model. If this is not the case, targeted feature engineering and data preprocessing should be used to counteract this when training the model.
The need for optimization of our model becomes even more obvious when we consider the following text:
It is incorrectly assigned to the baseball category. The words highlighted in green contribute positively to the classification into the baseball category, whereas the words highlighted in red speak against it. Here it becomes clear that the chosen keywords and their weights are mostly unsuitable for assigning the text to the correct category. Instead, they merely happen to work on parts of the training data, and probably only there.
Considering these simple examples, it is already clear that despite the good performance on the test data set, our model does not necessarily use suitable features to make a prediction. After deployment in a production environment, this fact can quickly lead to questionable results.
With the help of the explainable AI method LIME, these errors can be recognized and corrected. For better usability, the LIME library used here also provides a method that outputs a representative sample with a configurable number of sample elements and associated features. This quickly gives a more global insight into the investigated model.
Besides LIME, there are other explainable AI tools such as IBM AIX 360, the What-If Tool and SHAP that can help to increase the interpretability and explainability of the data sets and machine learning models used. The information obtained in this way allows the development of more robust models and a targeted adaptation to new data. Apart from that, the insights gained can help to increase the acceptance of and confidence in machine learning applications.