May 3rd, 2019

AI/ML Projects — Don’t get stymied in the last mile

Category: Community, Data Journalism, Data Science, Demos, H2O Driverless AI

Data scientists build AI/ML models from data and then deploy them to production, in addition to a plethora of tasks around data insights, data cleansing, and so on. Part of the data scientist's job is making models transparent, auditable, and explainable, both for regulators and for internal business use.

While model monitoring, deployment, and data engineering fall on the infrastructure side and have challenges of their own, creating auditable, transparent, and explainable AI/ML models that also perform well has long been elusive for AI/ML projects.

There is also no easy way to build quality models consistently, because it takes a lot of talent working with many tools, tool integrations, a suite of algorithms, and repeated iterations. Each business problem and data set creates new challenges for data scientists. Data science experiments are great, but cranking out industrial-strength models day after day is another story. What makes AI/ML initiatives successful is a high rate of conversion of business problems into quality models, with a short turnaround time.

Common Challenges in the Last Mile of AI/ML Model Creation:

  • Algorithms to use: It is hard to determine ahead of time which algorithm, combination of algorithms, or parameter settings will be the best fit. A list of top leaderboard algorithms is always a good starting point, but finding the right fit is a challenge in itself, and so is building an ensemble of the top N algorithms by score.
  • Feature Engineering: Having data engineers do a lot of complex, combinatorial feature engineering ahead of time creates a ‘feature zoo’ that slows down AI/ML model building. Feature engineering includes converting categorical columns to numeric and vice versa, combining multiple columns, encoding, and so on. Data scientists rely heavily on it to create highly accurate models and often push the task to data engineers. Unfortunately, it is not easy to determine which features are important ahead of time; that requires iterating and testing. If the data changes over time, new features have to be discovered again, while the model in production lags in quality.
  • Model Documentation: Creating documentation on the deployed models and their winning features for auditability.
  • Model Explainability: Explaining how the current model in production arrives at its decisions. Questions about a production model such as “What is the marginal effect of this column on the final outcome?”, “What is the numeric cutoff point of this column after which churn drops?”, or “I need the reason codes for the model prediction for customer X” have to be answered.
  • Scoring Pipeline: Packaging the ‘scoring pipeline’ in a consistent way that is fully portable across different environments. What is the use of data science experiments if the output cannot be used by downstream applications? Also, when the data changes, the features and the model change, so scoring pipelines need to be regenerated, which can be impossible to keep up with when done manually.
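To make the feature engineering item above concrete, here is a minimal sketch of three of the transformations it names: converting a categorical column to numeric (frequency encoding and one-hot encoding) and combining two columns into a ratio. The column names and values are hypothetical toy data, and the helper functions are illustrative only; they are not how any particular product implements these steps.

```python
from collections import Counter


def frequency_encode(values):
    """Replace each category with its relative frequency in the column."""
    counts = Counter(values)
    total = len(values)
    return [counts[v] / total for v in values]


def one_hot_encode(values):
    """Return (sorted category names, one 0/1 indicator row per value)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def ratio_feature(numerators, denominators):
    """Combine two numeric columns into a ratio; None where the denominator is 0."""
    return [n / d if d != 0 else None for n, d in zip(numerators, denominators)]


# Hypothetical toy columns (not from the data set used in this post).
plan = ["basic", "premium", "basic", "basic"]
spend = [20.0, 90.0, 25.0, 0.0]
visits = [4, 3, 5, 0]

print(frequency_encode(plan))
print(one_hot_encode(plan))
print(ratio_feature(spend, visits))
```

Even with three small helpers, the number of candidate features grows quickly once every pair of columns and every encoding is tried, which is the ‘feature zoo’ problem described above.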

Even though 80% of enterprise data is tabular, bringing AI/ML projects to fruition remains a challenge for most enterprises. In fact, AI/ML projects stall because of the issues mentioned above, leaving businesses behind the AI maturity curve.

Industrial strength AI/ML model creation on your tabular data

To show how some of the above issues can be tackled, I use the Pima Indians Diabetes Data Set from the National Institute of Diabetes and Digestive and Kidney Diseases (also available on Kaggle).

The data set's description is available from its source. In essence, we are trying to build a model that predicts the outcome “Diabetes: Yes or No” from historical data. When new data arrives, the model can predict whether a patient has diabetes, based on what it learned from prior history. It’s not a big data set, but the process is no different when applied to millions of rows of data; it works exactly the same way. I chose this example because it’s a public data set and the outcome is easy to understand without being a domain expert.

Algorithms and Feature Engineering

I’m going to use H2O Driverless AI: I upload the data set on the Data Sets page and then split it 80/20 into diabetes_train and diabetes_test by right-clicking diabetes.csv and choosing “Split”.
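For readers who want to see what an 80/20 split amounts to outside the UI, here is a minimal sketch of a plain random split. It is an assumption that a simple shuffled split is a fair stand-in; the post does not say whether the product's “Split” option stratifies by the target or uses another scheme. The function name, the seed, and the stand-in rows are all hypothetical.

```python
import random


def train_test_split(rows, train_fraction=0.8, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test) parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for parsed CSV rows; a real run would read diabetes.csv instead.
rows = list(range(100))
train, test = train_test_split(rows)
print(len(train), len(test))
```

Fixing the seed matters in practice: it keeps the train and test sets identical across reruns, so score differences come from the model and not from a different split.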

I click on “Predict” next to the diabetes_train data set.

I then choose “Outcome” as my target variable and set the scorer to AUC (area under the ROC curve; higher is better, meaning the model ranks positives above negatives more reliably and so makes fewer false positives and false negatives). I also set the test data set to diabetes_test. I then click Launch Experiment.

The algorithms, the cross-validation scheme, and the feature engineering to be done are all decided by Driverless AI (using our Kaggle Grandmasters’ recipes). Models then start getting built, and the experiment finishes in 13 minutes. 🙂

A note on automatic feature engineering in Driverless AI: Unlike the common practice of building features ahead of time (before model tuning), Driverless AI creates features on the fly using an evolutionary technique, avoiding the exhaustive feature generation step. This results in better performance, better features, and less resource consumption.

The AUC on the test set is 0.85148, which is higher than the validation/ensemble score. This suggests that Driverless AI generalized well to data it has not seen.
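As a reference for what the AUC scorer measures, here is a minimal sketch that computes ROC AUC directly from its definition: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as half. The labels and scores are hypothetical, and this pairwise version is written for clarity, not for speed on large data sets.

```python
def roc_auc(labels, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = diabetes) and model scores.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, which is why a higher value is better.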

Model Documentation

Can I get the documentation on the winning model and features, please? Click on Experiment Summary and find report.docx, which is written for a data scientist.

Some more screenshots on what you can find inside.

Model Explainability

The screenshot below shows the Machine Learning Interpretability dashboard, which is derived from the final predictions. The explainability tool is model agnostic and uses K-LIME and LIME-SUP to build surrogate models and explain predictions with reason codes. It also shows the final global variable importance on the top right; Glucose and BMI (Body Mass Index) come out on top :). Shapley values are available as well. For individual predictions, the reason codes are local and may sometimes differ from the global importance, which helps explain to the business what the model found.
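The idea of reason codes from a linear surrogate can be sketched in a few lines. Assuming a linear surrogate has already been fitted on standardized features, each feature's contribution to one prediction is its coefficient times the feature value, and the reason codes are the contributions ranked by magnitude. The coefficients and the patient row below are hypothetical, and this is a simplification for illustration, not the K-LIME or LIME-SUP implementation.

```python
def reason_codes(coefficients, row, top_n=3):
    """Rank per-feature contributions (coefficient * value) by absolute size."""
    contributions = {name: coefficients[name] * row[name] for name in coefficients}
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


# Hypothetical surrogate coefficients on standardized features.
surrogate = {"Glucose": 0.5, "BMI": 0.25, "Age": 0.125}

# Hypothetical standardized values for one patient.
patient = {"Glucose": 1.5, "BMI": -1.0, "Age": 1.0}

for name, contribution in reason_codes(surrogate, patient):
    direction = "raises" if contribution > 0 else "lowers"
    print(f"{name} {direction} the prediction by {abs(contribution):.3f}")
```

Because the contributions depend on the individual row, a feature with a small global coefficient can still top the list for a specific patient, which is why local reason codes may differ from global importance.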

Scoring Pipeline

OK, we have a great model that I can explain to the business. How do we deploy it in production? You can simply download the Python or Java scoring pipeline from the finished experiment page. The scoring of the individual algorithms, the final ensemble scoring, and the feature engineering are all part of the code, which can be loaded into a CI/CD pipeline or an existing package distribution framework.
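To illustrate why bundling feature engineering together with the model matters for portability, here is a minimal sketch of a self-contained scoring artifact. It is not the Driverless AI scoring pipeline and does not use its API; the function names, the JSON layout, and all numbers are hypothetical. The point is only that the consumer needs the artifact and a raw row, nothing else.

```python
import json
import math


def export_pipeline(means, scales, weights, bias):
    """Serialize the feature transformation and the model into one artifact."""
    return json.dumps({"means": means, "scales": scales, "weights": weights, "bias": bias})


def load_scorer(artifact):
    """Rebuild a scoring function from the artifact alone."""
    spec = json.loads(artifact)

    def score(row):
        z = spec["bias"]
        for name, weight in spec["weights"].items():
            # Feature engineering step travels with the model: standardize raw input.
            standardized = (row[name] - spec["means"][name]) / spec["scales"][name]
            z += weight * standardized
        return 1.0 / (1.0 + math.exp(-z))  # logistic link gives a probability

    return score


# Hypothetical parameters, for illustration only.
artifact = export_pipeline(
    means={"Glucose": 120.0, "BMI": 32.0},
    scales={"Glucose": 30.0, "BMI": 8.0},
    weights={"Glucose": 1.0, "BMI": 0.5},
    bias=0.0,
)
score = load_scorer(artifact)
print(score({"Glucose": 120.0, "BMI": 32.0}))
print(score({"Glucose": 180.0, "BMI": 40.0}))
```

When the data or features change, regenerating the artifact replaces the transformation and the model in one step, which avoids the manual drift between the two that the scoring pipeline challenge above describes.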


We saw how to build a world-class, highly accurate model without writing a single line of code. The model should be equal to or better than what you can find in the blogs, and it was created in about 13 minutes. The tool did all the algorithm selection and feature engineering with just default settings! We were able to create a document describing the steps, generate code to go to production, and explain the model with a dashboard. This is what industrial-strength automatic machine learning looks like today. You can convert business requirements into production-ready models from the data you have already collected, and avoid time- and resource-consuming feature engineering and model tuning.

Driverless AI is available on-prem, on all the cloud providers and on partner hardware.

About the Author

Karthik Guruswamy

Karthik is a Principal Pre-sales Solutions Architect with H2O. In his role, Karthik works with customers to define, architect and deploy H2O’s AI solutions in production to bring AI/ML initiatives to fruition.

Karthik is a “business first” data scientist. His expertise and passion have always been around building game-changing solutions by using an eclectic combination of algorithms drawn from different domains. He has published 50+ blogs on “all things data science” on the LinkedIn, Forbes, and Medium publishing platforms over the years for a business audience, and he speaks at vendor data science conferences. He also holds multiple patents around desktop virtualization and ad networks and was a co-founding member of two startups in Silicon Valley.
