The ability to explain and trust the outcome of an AI-driven business decision is now a crucial aspect of the data science journey. There are many tools in the marketplace that claim to provide transparency and interpretability around machine learning models, but how does one actually explain a model?
H2O Driverless AI provides robust interpretability of machine learning models to explain modeling results. With these built-in capabilities, everyone, from expert and junior data scientists to domain scientists and data engineers, can develop trusted machine learning models and explain them without much complexity.
H2O.ai’s very own Patrick Hall has put together a collection of resources to help guide Driverless AI users through the Machine Learning Interpretability (MLI) capabilities built into the platform. This blog will focus on the Machine Learning Interpretability walkthrough video for Driverless AI and the MLI cheat sheet that goes along with the video.
First, adjust the overall settings in Driverless AI to create a more interpretable model.
Set interpretability to >=7, which will result in:
Also, just like setting a random seed in your favorite modeling package, be sure to click the ‘Reproducible’ button to ensure repeatable and reproducible results.
During the Driverless AI training process, the system will attempt to create new features from the original features in the data set you provided. The final model is typically built from a combination of the original features and the features the system creates on its own. Because of the settings specified above, the system won’t create too many new features, won’t create extremely complex new features, and will keep the relationship between the original features, the new features, and the model predictions explainable. Once the model is trained, the interactive charts described below can be used to explore details about the model. Let’s start with global Shapley feature importance and continue from there.
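To make the idea of created features concrete, here is a minimal, hypothetical sketch in pandas of the kinds of transformations an automated system might build, such as a ratio between two numeric columns and a frequency encoding of a categorical column. The column names are invented for illustration, and this is not how Driverless AI’s own transformers are implemented.

```python
import pandas as pd

# Toy frame standing in for the original training data (hypothetical columns).
df = pd.DataFrame({
    "income": [52000, 61000, 43000, 75000],
    "debt":   [12000,  5000, 16000,  9000],
    "state":  ["CA", "TX", "CA", "NY"],
})

# Interaction-style feature: ratio of two original numeric columns.
df["debt_to_income"] = df["debt"] / df["income"]

# Frequency encoding of a categorical column.
df["state_freq"] = df["state"].map(df["state"].value_counts(normalize=True))

print(df)
```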
Global Shapley feature importance provides an overall view of the drivers of your model’s predictions. Global Shapley values are reported for original features and any feature the Driverless AI system creates on its own.
Global Shapley values:
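Driverless AI computes and charts these values inside the MLI view. As a rough open-source sketch of the underlying idea, not the Driverless AI implementation, the shap package can produce per-row Shapley values for a tree model, and averaging their magnitudes gives a global importance ranking. The toy dataset and GradientBoostingRegressor below are stand-ins for your data and the Driverless AI model.

```python
import numpy as np
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

# Stand-in for a Driverless AI model: any tree-based regressor on a toy dataset.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Per-row Shapley values; averaging their magnitudes gives a global view
# of which features drive the model's predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

global_importance = np.abs(shap_values).mean(axis=0)
for name, value in sorted(zip(X.columns, global_importance), key=lambda t: -t[1]):
    print(f"{name:>4}: {value:.2f}")
```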
Global original feature importance provides an approximate overall view of how your original features affect model predictions.
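Driverless AI derives this view internally from the model itself. A loose open-source analogue, again assuming a generic scikit-learn model as a stand-in, is permutation importance computed on the original features: shuffle one feature at a time and measure how much the score drops.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Score drop when each original feature is shuffled, averaged over repeats.
result = permutation_importance(model, X, y, n_repeats=10, random_state=42)
for name, drop in sorted(zip(X.columns, result.importances_mean), key=lambda t: -t[1]):
    print(f"{name:>4}: {drop:.3f}")
```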
Partial dependence shows the average Driverless AI model prediction and its standard deviation for different values of important original features. This helps you understand the average model behavior for the most important original features.
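Outside of Driverless AI, the same idea can be sketched with scikit-learn’s partial dependence utilities; the dataset, model, and the choice of the "bmi" feature below are illustrative stand-ins.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Average model prediction across a grid of values for one feature,
# with the other features left at their observed values.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi"], kind="average")
plt.show()
```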
The global surrogate decision tree provides an overall flowchart of the Driverless AI model’s decision-making process based on the original features.
The surrogate decision tree shows:
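A minimal sketch of the surrogate idea, assuming a generic scikit-learn model as a stand-in for the Driverless AI model: fit a shallow decision tree to the complex model’s predictions, print its rules, and check its fidelity (R² against those predictions) so you know how much to trust the flowchart.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Surrogate: a shallow tree fit to the complex model's predictions,
# giving an approximate flowchart of its decision logic.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=42)
surrogate.fit(X, model.predict(X))

print(export_text(surrogate, feature_names=list(X.columns)))
print("Fidelity (R^2 vs. model predictions):", surrogate.score(X, model.predict(X)))
```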
The global interpretable model is a linear model of the Driverless AI model predictions.
The interpretable, global linear model of the Driverless AI predictions shows:
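A minimal open-source sketch of a global linear surrogate, assuming a generic scikit-learn model as the stand-in: fit a plain linear model to the complex model’s predictions, read its coefficients as approximate global trends, and use R² against those predictions as a fidelity check.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Global linear surrogate: a plain linear model fit to the complex
# model's predictions; coefficients give approximate global trends.
linear_surrogate = LinearRegression().fit(X, model.predict(X))
for name, coef in zip(X.columns, linear_surrogate.coef_):
    print(f"{name:>4}: {coef:+.1f}")
print("Fidelity (R^2 vs. model predictions):", linear_surrogate.score(X, model.predict(X)))
```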
Local Shapley feature importance shows how each feature directly impacts each individual row’s prediction.
Local Shapley values:
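Again as a hypothetical open-source sketch rather than the Driverless AI implementation, shap can explain a single row: the signed contributions below, added to the base value (the average prediction), reproduce that row’s prediction.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Signed contribution of each feature to one row's prediction,
# relative to the average prediction (the base value).
explainer = shap.TreeExplainer(model)
row = X.iloc[[0]]
contributions = explainer.shap_values(row)[0]

print("Base value (average prediction):", explainer.expected_value)
for name, value in sorted(zip(X.columns, contributions), key=lambda t: -abs(t[1])):
    print(f"{name:>4}: {value:+.2f}")
print("Model prediction for this row:", model.predict(row)[0])
```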
Local linear explanations show the local linear trends around an individual row (they are derived using the LIME technique). They pair nicely with local Shapley values. The local Shapley values give a point estimate for how a feature impacts an individual row’s prediction, while local linear explanations tell us about the trends of each feature for the same row’s prediction.
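A small sketch of the LIME idea with the open-source lime package, again on a stand-in scikit-learn model: perturb one row, weight the perturbed samples by their proximity to it, and fit a small weighted linear model whose terms describe the local trends around that row.

```python
import lime.lime_tabular
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X.values, y)

# Local linear explanation for a single row via perturbation and
# proximity-weighted linear fitting.
explainer = lime.lime_tabular.LimeTabularExplainer(
    X.values, feature_names=list(X.columns), mode="regression"
)
explanation = explainer.explain_instance(X.values[0], model.predict, num_features=5)
print(explanation.as_list())  # local linear terms, e.g. [("bmi <= -0.03", -18.2), ...] (illustrative)
```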
The local surrogate decision tree path shows how the logic of the model is applied to any given individual row.
The decision path:
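Continuing the surrogate-tree sketch from above (same stand-in model and shallow surrogate), scikit-learn’s decision_path can trace the rules a single row follows on its way to a leaf.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# Shallow surrogate tree fit to the complex model's predictions.
surrogate = DecisionTreeRegressor(max_depth=3, random_state=42)
surrogate.fit(X, model.predict(X))

# Trace the rules that one row follows through the surrogate tree.
row = X.iloc[[0]]
tree = surrogate.tree_
for node in surrogate.decision_path(row).indices:
    if tree.children_left[node] == tree.children_right[node]:  # leaf node
        print(f"leaf: surrogate prediction {tree.value[node][0][0]:.1f}")
    else:
        name = X.columns[tree.feature[node]]
        threshold = tree.threshold[node]
        went_left = row.iloc[0, tree.feature[node]] <= threshold
        print(f"{name} {'<=' if went_left else '>'} {threshold:.3f}")
```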
Individual conditional expectation (ICE) shows how an individual prediction changes when the value of one input feature is changed. If the ICE values differ from partial dependence, this can also help confirm interactions spotted in the surrogate decision tree.
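A brief sketch with scikit-learn, again on a stand-in model: kind="both" overlays the per-row ICE curves on the average partial dependence curve, so rows whose curves diverge from the average hint at interactions.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

X, y = load_diabetes(return_X_y=True, as_frame=True)
model = GradientBoostingRegressor(random_state=42).fit(X, y)

# ICE curves (one per row) overlaid on the average partial dependence curve.
PartialDependenceDisplay.from_estimator(model, X, features=["bmi"], kind="both")
plt.show()
```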
Like the local Shapley values above, local original feature importance shows the original features that drive a prediction for an individual row (the local Shapley values show importance for both the new and the original features, not just the original features).
As you can see, the MLI module of H2O Driverless AI can tell you a lot about how your model is behaving. As this is a relatively new area of research and development, stay tuned for more features like these! Also, check out our Explainable AI page for more resources related to machine learning interpretability.