May 30th, 2019

Building an Interpretable & Deployable Propensity AI/ML Model in 7 Steps…

Category: Beginners, Community, Data Science, Demos, Explainable AI, H2O Driverless AI

To start with, you may have a tabular data set with a combination of:

  • Dates/Timestamps
  • Categorical Values
  • Text strings
  • Numeric Values

A business sponsor wants to build a Propensity to Buy model from historical data.

How many Steps does it take?

Let’s find out. We are going to use an H2O Driverless AI instance with 1 GPU (the GPU is optional, BTW).

We pick the famous UCI-ML Portuguese Bank Marketing data as an example.

Citation: [Moro et al., 2014] S. Moro, P. Cortez, and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22–31, June 2014.

See also links here:

https://www.kaggle.com/henriqueyamahata/bank-marketing

https://archive.ics.uci.edu/ml/datasets/bank+marketing

Data

The data has a nice mix of customer profile, behavioral, and exogenous data. We want to build a model that predicts whether a future customer will open a term deposit (the output variable, y), learned from historical information.

Step #1: Data Sets Page

I just dragged and dropped the bank-additional-full.csv file into Driverless AI’s Data Set Page, which is the default screen when you log in.

Step #2 and #3: Data Splitter Wizard

Click next to the uploaded data set and select the Split option.

We name the training data set banktraining85 (85% of the data) and the test data set banktest15 (15% of the data). We choose y as the target column, so that the split preserves the class balance of the target, and set the split ratio to 0.85.
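For readers who want to see what an 85/15 split keyed on the target column amounts to, here is a minimal sketch in plain Python. It is not Driverless AI's implementation; the function name stratified_split, the toy rows, and the seed are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, ratio=0.85, seed=42):
    """Split a list of dict rows into train/test parts.

    Rows are grouped by the value of `target`, and each group is split
    with the same ratio, so both parts keep roughly the same class balance.
    (Hypothetical helper, not part of any H2O product.)
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[target]].append(row)

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)
        cut = round(len(group) * ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Toy data: 200 made-up rows, 10% of them labelled "yes".
rows = [{"id": i, "y": "yes" if i % 10 == 0 else "no"} for i in range(200)]
train, test = stratified_split(rows, target="y", ratio=0.85)

train_yes = sum(1 for r in train if r["y"] == "yes")
test_yes = sum(1 for r in test if r["y"] == "yes")
print(len(train), len(test), train_yes, test_yes)
```

Splitting each class separately is what keeps a rare "yes" class from ending up under-represented in the smaller test set.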

Step #4 and #5: Experiments Page

Now click on banktraining85 and select Predict from the menu, which takes us to the Experiments page.

In the screen below, we are going to drop the column duration, as it is a data leakage column. Here’s the blurb from the UCI ML webpage on why we are dropping it.

Important note: this attribute highly affects the output target (e.g., if duration=0 then y=’no’). Yet, the duration is not known before a call is performed. Also, after the end of the call y is obviously known. Thus, this input should only be included for benchmark purposes and should be discarded if the intention is to have a realistic predictive model.
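Outside the UI, the same idea can be sketched in a few lines of Python. The example below uses a tiny made-up sample in the semicolon-delimited layout of the UCI file; the rows and the helper name drop_columns are assumptions for illustration, not real records or a Driverless AI function.

```python
import csv
import io

# Made-up sample rows; only the column names follow the UCI data set.
sample = io.StringIO(
    "age;job;duration;euribor3m;y\n"
    "56;housemaid;261;4.857;no\n"
    "41;technician;0;4.857;no\n"
    "33;admin.;1042;1.299;yes\n"
)


def drop_columns(rows, columns):
    """Return copies of the rows without the named columns (hypothetical helper)."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


rows = list(csv.DictReader(sample, delimiter=";"))

# duration is only known after the call has ended, so it must not be a feature.
features = drop_columns(rows, {"duration"})
print(list(features[0].keys()))
```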

Choose banktest15 for TEST DATASET and y as the target column. Keep the default values and click LAUNCH EXPERIMENT.

After 5 minutes and 30 seconds, my single-GPU instance of Driverless AI created the model.

Step #6: Finished Experiment and Deployment Decisions …

We can now download and deploy either a Python or a Java scoring pipeline to production, whether to a REST server, a Java UDF, or whatever your scoring environment is!

We have an AUC of 0.8047 for training and a slightly higher AUC of 0.8052 for the holdout set!
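As a reminder of what the AUC measures: it is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it by brute force over all positive/negative pairs. The labels and scores are made up, and this is not how Driverless AI computes the metric internally.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    The pairwise loop is O(n^2), which is fine for a small illustration.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = opened a term deposit) and model scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so a value around 0.80 means the model ranks a buyer above a non-buyer about four times out of five.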

Step #7: Machine Learning Interpretability or MLI Dashboard

Click INTERPRET THE MODEL on the finished Experiments page above.

Once you are directed to the MLI page, click Dashboard on the left!

The feature importance clearly shows nr_employed and euribor3m as the top 2 predictors of whether someone opens a term deposit, so exogenous factors play a big role in this data set. These are followed by behavioral predictors such as pdays and poutcome.

There is a lot more digging that can be done with this dashboard, such as Partial Dependence Plots, Shapley values, and Local Interpretations.

What did we accomplish in roughly 7 Steps?

This experiment ran the GLM, LightGBM, and XGBoost algorithms with default settings, hyperparameter optimization, and automatic feature engineering! It also built a final ensemble of the winning GLM/LightGBM/XGBoost models. If you are a very experienced data scientist, you can always tweak the knobs or use Expert Settings to turn algorithms on or off, but those steps are optional. We also skipped advanced product features such as Automatic Visualization (AutoViz), Automatic Documentation, and the Python Client API (accessible from a Jupyter notebook) in this blog post; those are all available with extra clicks/steps.

The final model was chosen by Driverless AI using Automatic Machine Learning, with a level of accuracy that would usually take a data scientist a week or more of writing code to reach. We were also able to interpret the model quickly and find insights. It took us only about 10 minutes to do all of this, with just 7 steps in Driverless AI!

About the Author

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition.

Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. Over the years he has published 50+ blogs on “all things data science” on LinkedIn, Forbes, and Medium for a business audience, and he speaks at vendor data science conferences. He also holds multiple patents around desktop virtualization and ad networks, and was a co-founding member of two startups in Silicon Valley.
