In the past two posts, we learned a little about interpretable machine learning in general. In this post, we will focus on how to accomplish interpretable machine learning using H2O Driverless AI. To review, the past two posts discussed the broad concepts behind interpretable, fair, and trustworthy machine learning.
Let’s start with exploratory data analysis (EDA). EDA enables you to understand your data and form reasonable expectations for the results of your machine learning project. AutoViz in Driverless AI automates many EDA processes, allowing users to find outliers, understand relationships between input variables, and identify potential data quality problems, all with just a few mouse clicks. To learn more about AutoViz, watch this recent talk by Leland Wilkinson, H2O.ai’s chief scientist, from H2O World San Francisco 2019. Below is an example of the visualization capabilities available within AutoViz.
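AutoViz performs these checks automatically in the GUI, but for a sense of the kinds of checks involved, here is a minimal sketch of similar EDA steps in plain Python using pandas (the file name and columns are hypothetical, not anything AutoViz produces):

```python
import pandas as pd

# Hypothetical training data; AutoViz runs equivalent checks automatically
df = pd.read_csv("train.csv")

# Summary statistics: ranges, means, and missing-value counts
print(df.describe(include="all"))
print(df.isna().sum())

# Simple outlier check: flag rows more than 1.5 IQRs outside the quartiles
num = df.select_dtypes("number")
q1, q3 = num.quantile(0.25), num.quantile(0.75)
iqr = q3 - q1
outliers = ((num < q1 - 1.5 * iqr) | (num > q3 + 1.5 * iqr)).any(axis=1)
print(f"{outliers.sum()} rows contain at least one outlying value")

# Relationships between input variables: pairwise correlations
print(num.corr())
```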
Several types of accurate and interpretable models can be trained automatically by H2O Driverless AI today, including traditional linear models for maximum interpretability. For users who want to try potentially more accurate, nonlinear, and still highly interpretable models, Driverless AI provides Jerome Friedman’s RuleFit approach and monotonically constrained gradient boosting machines. Have a look at the Driverless AI documentation to see all the modeling options available today.
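To make the RuleFit idea concrete, here is a rough conceptual sketch, not the Driverless AI implementation: terminal nodes of a tree ensemble become binary rule features, and a sparse linear model is then fit on those rules, so the surviving rules and their coefficients are directly readable. Everything below is illustrative scikit-learn code on synthetic data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import OneHotEncoder

X, y = make_regression(n_samples=500, n_features=5, random_state=0)

# Step 1: fit a small tree ensemble to generate candidate rules
gbm = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0)
gbm.fit(X, y)

# Step 2: encode each row by the leaf (i.e., rule) it lands in for each tree
leaves = gbm.apply(X).reshape(X.shape[0], -1)
rules = OneHotEncoder(handle_unknown="ignore").fit_transform(leaves)

# Step 3: fit a sparse (L1-penalized) linear model on the binary rule features
lasso = LassoCV(cv=3).fit(rules.toarray(), y)
print(f"{np.sum(lasso.coef_ != 0)} rules kept out of {rules.shape[1]}")
```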
For global and local explanations, H2O Driverless AI offers its cutting-edge machine learning interpretability (MLI) module. MLI enables you to see the overall drivers of model behavior across an entire dataset and to understand the logic behind any individual model prediction. Driverless AI MLI can even be run on models created by other software packages! See this recent video and accompanying cheat sheet (also shown below) to get a better idea of exactly how MLI makes your models explainable.
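MLI generates these explanations for you in the Driverless AI GUI. As a rough illustration of the global-versus-local distinction, here is a sketch using the open source shap package on a scikit-learn model with synthetic data; this shows the concept, not the MLI implementation itself.

```python
import numpy as np
import shap
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=6, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X, y)

explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)  # one contribution per feature per row

# Global explanation: mean absolute contribution of each feature
print("Overall drivers:", np.abs(shap_values).mean(axis=0).round(2))

# Local explanation: why the model scored row 0 the way it did
print("Row 0 contributions:", shap_values[0].round(2))
print("Baseline (expected value):", explainer.expected_value)
```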
Sensitivity (or “what-if”) analysis, which tests how your model will behave in future mission-critical scenarios, and disparate impact analysis, which tests for potential discrimination in model predictions, are conducted using the Driverless AI Python API today. Both are major roadmap items and should be available in the graphical user interface (GUI) soon. Jupyter notebook examples are available for both sensitivity analysis and disparate impact analysis.
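The linked notebooks use the Driverless AI Python client; the sketch below only illustrates the two underlying ideas on a generic scikit-learn model, with all data and column names hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical credit data: income, debt, and a demographic group flag
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "income": rng.normal(50, 15, 1000),
    "debt": rng.normal(20, 8, 1000),
    "group": rng.integers(0, 2, 1000),  # 0 = reference, 1 = protected
})
y = (df["income"] - df["debt"] + rng.normal(0, 5, 1000) > 30).astype(int)
model = LogisticRegression(max_iter=1000).fit(df[["income", "debt"]], y)

# Sensitivity ("what-if") analysis: perturb one input, watch predictions move
scenario = df[["income", "debt"]].copy()
scenario["income"] *= 0.9  # what if every applicant's income drops 10%?
base = model.predict_proba(df[["income", "debt"]])[:, 1]
what_if = model.predict_proba(scenario)[:, 1]
print(f"Mean change in approval probability: {(what_if - base).mean():+.3f}")

# Disparate impact analysis: compare approval rates across groups
approvals = pd.Series(model.predict(df[["income", "debt"]]), index=df.index)
rates = approvals.groupby(df["group"]).mean()
print(f"Adverse impact ratio: {rates[1] / rates[0]:.2f} (flag if below ~0.8)")
```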
Model documentation is yet another process that Driverless AI automates. After each experiment, pertinent information about the trained complex model, such as data dictionaries, modeling methodologies, and model assessments, is summarized in a single document for human review. Click here to download a basic sample report. Below is an example of the first two pages included in each report.
From our own internal experience over the past few years, we’ve found it’s very important to combine all of these techniques to create holistic, human-friendly solutions to real business problems. This series of blogs has introduced broad concepts in interpretable, fair, and trustworthy machine learning and highlighted H2O’s implementation progress toward them. To see what we have in store for future iterations of Driverless AI, check out my recent talk from H2O World San Francisco 2019.
P.S. If you are interested in interpretable machine learning for open source H2O, there are always good old linear models! We’ve also recently added monotonic constraints to the gradient boosting machine (a quick sketch follows below), provided this GitHub repo with lots of interpretability goodies for open source H2O, and we will be adding Shapley values for local explanation of tree-based models soon. Stay tuned!
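As a quick sketch, assuming a recent open source H2O release that supports the monotone_constraints argument and a hypothetical loans dataset, training a monotonically constrained GBM looks roughly like this:

```python
import h2o
from h2o.estimators import H2OGradientBoostingEstimator

h2o.init()

# Hypothetical frame: predict default risk from income and loan amount
frame = h2o.import_file("loans.csv")  # assumption: your own data
frame["default"] = frame["default"].asfactor()

# Force risk to decrease monotonically with income (+1 increasing, -1 decreasing)
gbm = H2OGradientBoostingEstimator(
    monotone_constraints={"income": -1},
    ntrees=100,
    seed=42,
)
gbm.train(x=["income", "loan_amount"], y="default", training_frame=frame)
print(gbm.model_performance())
```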
This is the third blog in a 3-part series. You can catch the first and second parts here.