February 19th, 2019

What is Your AI Thinking? Part 3

Category: Data Science, Explainable AI, Financial Services, H2O Driverless AI, Machine Learning Interpretability

In the past two posts we’ve learned a little about interpretable machine learning in general. In this post, we will focus on how to accomplish interpretable machine learning using H2O Driverless AI. To review, the past two posts discussed:

  • Exploratory data analysis (EDA)
  • Accurate and interpretable models
  • Global explanations
  • Local explanations
  • Model debugging and sensitivity analysis
  • Fairness and disparate impact analysis
  • Model documentation

Let’s start with exploratory data analysis (EDA). EDA enables you to understand your data and form reasonable expectations for the results of your machine learning project. AutoViz in Driverless AI automates many EDA processes. It lets users find outliers, understand relationships between input variables, and identify potential data quality problems, all with just a few mouse clicks. To learn more about AutoViz, watch this recent talk by Leland Wilkinson, H2O.ai’s chief scientist, from H2O World San Francisco 2019. Below is an example of the visualization capabilities available within AutoViz.
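This post doesn't describe AutoViz's internal algorithms. The kind of outlier check it automates can still be illustrated with a classic technique, Tukey's interquartile-range (IQR) fences. The short Python sketch below is a stand-in built on assumptions, not AutoViz code. The function name `iqr_outliers`, the conventional `k=1.5` multiplier, and the toy income data are all hypothetical.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(..., n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical column of incomes (in thousands) with one suspicious entry.
incomes = [42, 45, 47, 50, 51, 53, 55, 58, 60, 400]
print(iqr_outliers(incomes))  # [400]
```

Values far outside the middle 50% of the data are flagged for a closer look. Deciding whether a flagged value is a data entry error or a genuine extreme is still a human judgment call.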

H2O Driverless AI can already train several types of accurate and interpretable models automatically, including traditional linear models for maximum interpretability. For users who want to try models that are nonlinear, potentially more accurate, and still highly interpretable, Driverless AI provides Jerome Friedman’s RuleFit approach and monotonically constrained gradient boosting machines. Have a look at the Driverless AI documentation to see all the modeling options available today.
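A monotonic constraint guarantees that, all else being equal, a model's output moves in only one direction as a constrained input increases. That property makes the model's behavior easier to explain and defend. The sketch below checks this property for any prediction function by sweeping one feature over a grid while holding the other inputs fixed. It is a generic illustration in plain Python, not Driverless AI code. `is_monotone_increasing`, both toy models, and the feature names are hypothetical.

```python
def is_monotone_increasing(predict, row, feature, grid):
    """Check that predict() never decreases as `feature` sweeps over `grid`,
    holding every other feature in `row` fixed."""
    preds = []
    for value in sorted(grid):
        probe = dict(row)  # copy so the original row is left untouched
        probe[feature] = value
        preds.append(predict(probe))
    # Monotone increasing: each prediction is >= the one before it.
    return all(b >= a for a, b in zip(preds, preds[1:]))

def constrained_model(x):
    # Hypothetical step-shaped score that only rises with income,
    # as a monotonically constrained tree ensemble would behave.
    return 0.2 + 0.1 * min(x["income"] // 20, 5) - 0.05 * x["debt_ratio"]

def unconstrained_model(x):
    # Hypothetical score that dips for mid-range incomes,
    # which a monotonic constraint on income would forbid.
    bump = -0.3 if 40 <= x["income"] < 60 else 0.0
    return 0.2 + 0.005 * x["income"] + bump

applicant = {"income": 50, "debt_ratio": 0.4}
grid = range(0, 121, 10)
print(is_monotone_increasing(constrained_model, applicant, "income", grid))    # True
print(is_monotone_increasing(unconstrained_model, applicant, "income", grid))  # False
```

A grid check like this can only find violations at the points it probes. A constraint enforced during training, by contrast, holds by construction.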

For global and local explanations, H2O Driverless AI offers its cutting-edge machine learning interpretability (MLI) module. MLI enables you to see the overall drivers of model behavior for an entire dataset and to understand the logic behind any individual model prediction. Driverless AI MLI can even be run on models created by other software packages! See this recent video and accompanying cheat sheet (also shown below) to get a better idea of exactly how MLI makes your models explainable.
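This post doesn't spell out the techniques inside MLI. The sketch below shows one widely used global explanation, partial dependence. For each value on a grid, one feature is forced to that value in every row, and the model's predictions are averaged. This is plain Python rather than Driverless AI code. `partial_dependence`, `toy_model`, and the sample rows are hypothetical.

```python
from statistics import mean

def partial_dependence(predict, rows, feature, grid):
    """Average prediction when `feature` is set to each grid value
    in every row of the dataset (a global explanation)."""
    curve = {}
    for value in grid:
        # Override one feature in a copy of each row; keep the rest as observed.
        preds = [predict({**row, feature: value}) for row in rows]
        curve[value] = mean(preds)
    return curve

def toy_model(x):
    # Hypothetical linear scoring function used only for illustration.
    return 0.5 * x["age"] + 2.0 * x["tenure"]

data = [{"age": 30, "tenure": 1}, {"age": 50, "tenure": 3}, {"age": 40, "tenure": 2}]
print(partial_dependence(toy_model, data, "tenure", [0, 1, 2]))
# {0: 20.0, 1: 22.0, 2: 24.0}
```

Running the same loop for a single row, without averaging, gives an individual conditional expectation (ICE) curve, which is a local counterpart to partial dependence.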

Two analyses are currently conducted through the Driverless AI Python API:

  • Sensitivity (“what-if”) analysis, which tests how your model behaves in future mission-critical scenarios.
  • Disparate impact analysis, which tests for potential discrimination in model predictions.

Both are major roadmap items and should be available in the graphical user interface (GUI) soon. Jupyter notebook examples are available for both sensitivity analysis and disparate impact analysis.
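The notebooks aren't reproduced here, but the core ideas are simple enough to sketch without the Driverless AI Python API. Below, a hypothetical `what_if` helper compares a prediction before and after a scenario change. A second hypothetical helper, `adverse_impact_ratio`, compares favorable-outcome rates between two groups. The toy model, the data, and the use of the common four-fifths (0.8) rule of thumb as a threshold are illustrative assumptions.

```python
def what_if(predict, row, changes):
    """Sensitivity ("what-if") analysis: change in one prediction when selected
    inputs are set to hypothetical scenario values."""
    baseline = predict(row)
    scenario = predict({**row, **changes})  # copy of row with the changes applied
    return scenario - baseline

def adverse_impact_ratio(decisions, groups, protected, reference):
    """Disparate impact: favorable-outcome rate of the protected group divided
    by that of the reference group (1 = favorable, 0 = unfavorable)."""
    def favorable_rate(group):
        outcomes = [d for d, g in zip(decisions, groups) if g == group]
        return sum(outcomes) / len(outcomes)
    return favorable_rate(protected) / favorable_rate(reference)

def toy_default_model(x):
    # Hypothetical probability-of-default model used only for illustration.
    return 0.1 + 0.02 * x["interest_rate"] + 0.1 * x["late_payments"]

customer = {"interest_rate": 5, "late_payments": 1}
print(round(what_if(toy_default_model, customer, {"interest_rate": 8}), 2))  # 0.06

approved = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
group = ["a"] * 5 + ["b"] * 5
air = adverse_impact_ratio(approved, group, protected="a", reference="b")
print(round(air, 2))  # 0.75, below the four-fifths (0.8) rule of thumb
```

In practice, both checks would be repeated across many scenarios and group definitions rather than applied once to a single case.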

Model documentation is yet another process that Driverless AI automates. After each experiment, pertinent information about the trained complex model, such as data dictionaries, modeling methodologies, and model assessments, is summarized in a single document for human review. Click here to download a basic sample report. Below is an example of the first two pages included in each report.

Our internal experience over the past few years shows that it’s very important to combine all these techniques to create holistic, human-friendly solutions to real business problems. This series of blogs has introduced broad concepts in interpretable, fair, and trustworthy machine learning and highlighted H2O’s implementation progress toward them. To see what we have in store for future iterations of Driverless AI, check out my recent talk from H2O World San Francisco 2019.

P.S. If you are interested in interpretable machine learning for open source H2O, there are always good old linear models! We’ve also recently added monotonic constraints to the gradient boosting machine and provided this GitHub repo with lots of interpretability goodies for open source H2O. We will be adding Shapley values for local explanation of tree-based models soon. Stay tuned!
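This post doesn't describe how the upcoming Shapley support will be implemented. For tree ensembles, specialized algorithms such as Tree SHAP compute these values efficiently. The brute-force sketch below is only meant to show what a Shapley value is: each feature's average marginal contribution to one prediction over all coalitions of the other features. It assumes a feature is "absent" when replaced by a single baseline value. `shapley_values`, `toy_tree`, the row, and the baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, row, baseline):
    """Exact Shapley values for one prediction. A feature outside the coalition
    takes its baseline value. Cost is exponential in the number of features."""
    features = list(row)
    n = len(features)

    def value(coalition):
        # Coalition members keep their real values; all others use the baseline.
        return predict({f: row[f] if f in coalition else baseline[f] for f in features})

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi

def toy_tree(x):
    # Hypothetical two-level decision tree with an income/debt interaction.
    if x["income"] > 50:
        return 0.8 if x["debt_ratio"] < 0.5 else 0.5
    return 0.3

row = {"income": 70, "debt_ratio": 0.2}
baseline = {"income": 40, "debt_ratio": 0.6}
phi = shapley_values(toy_tree, row, baseline)
print({f: round(v, 4) for f, v in phi.items()})  # {'income': 0.35, 'debt_ratio': 0.15}
```

The contributions add up to the gap between this prediction and the baseline prediction (0.8 - 0.3 = 0.5). This additivity is one reason Shapley values are attractive for local explanations.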

This is the third blog in a 3-part series. You can catch the first and second parts here.
