July 23rd, 2019

Custom Machine Learning Recipes: The ingredients for success

Category: AutoML, Data Science, H2O Driverless AI, Machine Learning
Last updated: 07/23/19

Machine learning is akin to cooking in several ways. A perfect dish originates from a tried-and-tested recipe, has the right combination of ingredients, and is baked at just the right temperature. Successful AI solutions work on the same principle. One needs fresh, good-quality ingredients in the form of data, a skilled chef in the form of a data scientist, and recipes in the form of algorithms. But just as a shortage of skilled professionals is affecting the fine-dining world, there is also a dearth of expert data scientists in the industry.

This is precisely the issue that H2O’s Driverless AI tries to address. Not only does it automate some of the most challenging and repetitive tasks in applied data science, but it also lets users bring their domain expertise to the platform in the form of custom recipes. This latest addition to the Driverless AI arsenal, called Bring Your Own Recipes (BYOR), further cements H2O.ai’s vision of democratizing AI for everyone.

Driverless AI (DAI)

Driverless AI in action: Performing sentiment analysis on Amazon reviews.

To say that machine learning has become indispensable for businesses would be an understatement. Companies big and small have realized the potential of AI applications to drive better customer experiences and increased profits. However, the persistent shortage of data scientists is a concern. Finding technical people who can develop production-ready AI models is no mean task.

Driverless AI, H2O.ai’s flagship product for automatic machine learning, tries to bridge this gap between supply and demand. With Driverless AI, everyone, including expert and junior data scientists, domain scientists, and data engineers, can develop trusted machine learning models, and do so faster.

The goal of DAI is not to replace data scientists but to empower them using automation and state-of-the-art tools. In other words, H2O Driverless AI enables AI to do AI.

The Key Capabilities of Driverless AI

Here is an end-to-end demo of H2O Driverless AI. It covers data visualization, an AI experiment, and machine learning interpretability via surrogate models, and gives an overview in just over 6 minutes. You can read more about DAI and its capabilities here.

Driverless AI + Your Recipes = A Truly Extensible AI Platform

H2O Driverless AI is being successfully used across many industries ranging from healthcare to financial institutions to telecom and marketing. Its performance can be leveraged even further by incorporating the domain knowledge and intuition of the users.

The latest version (1.7.0) of DAI implements a key feature called BYOR, which stands for Bring Your Own Recipes. This feature is designed to let data scientists and domain experts customize DAI to their business needs.

Recipes are customizations and extensions to the Driverless AI platform. They are Python code snippets that can be uploaded into Driverless AI at runtime, like plugins. A recipe can be any one, or a combination, of the following:

  • Custom machine learning models
  • Custom scorers (classification or regression)
  • Custom transformers
  • Custom datasets

Examples of recipes, along with the complete list, are available in the associated GitHub repository, which is fully open source: https://github.com/h2oai/driverlessai-recipes

How Driverless AI Recipes Work

During the training of a machine learning pipeline, Driverless AI can use these custom recipes as building blocks, either independently or in combination with the built-in code pieces. Recipes only need to be added once. After a recipe is added to an experiment, it is available for all future experiments. Users thereby gain a degree of control over the optimization choices that Driverless AI makes to best solve their specific machine learning problems.
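To make the plugin analogy concrete, here is a minimal sketch of how a Python program can, in general, load a .py file at runtime and collect the classes it defines. It is not Driverless AI's internal mechanism. The `load_recipe_classes` helper, the `MyRound` class, and the file name are hypothetical, and only the Python standard library is used.

```python
import importlib.util
import inspect
import tempfile
from pathlib import Path


def load_recipe_classes(path):
    """Import a .py file by path and return the classes it defines.

    Hypothetical helper for illustration only; not part of Driverless AI.
    """
    path = Path(path)
    spec = importlib.util.spec_from_file_location(path.stem, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # Keep only classes defined in the file itself, not ones it imported
    return {
        name: obj
        for name, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__module__ == module.__name__
    }


# A stand-in "recipe" written to a temporary file (hypothetical content)
RECIPE_SOURCE = '''
class MyRound:
    def transform(self, values):
        return [round(v, 1) for v in values]
'''

with tempfile.TemporaryDirectory() as tmp:
    recipe_path = Path(tmp) / "my_round_recipe.py"
    recipe_path.write_text(RECIPE_SOURCE)
    registry = load_recipe_classes(recipe_path)

# Once loaded, the class stays usable even though the file is gone
rounded = registry["MyRound"]().transform([1.26, 2.04])
print(sorted(registry), rounded)
```

The point of the sketch is only that a plugin needs to be loaded once and can then be looked up by name, which mirrors the "add once, reuse in later experiments" behavior described above.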

Demo using Credit Card Dataset

All one needs to create a recipe is a text editor. Simply create a .py file and write the source code. Users can either use one of their own recipes or the ones provided by H2O. H2O has built and open-sourced more than 80 recipes, which can be used as templates. The screenshot below gives a preview of the various types of recipes currently on offer.

Let’s use the famous Credit Card data from Kaggle to demonstrate how to include a recipe into an experiment. The data belongs to a Taiwanese bank and the goal is to predict who will default on the credit card payment.

  • Clone the repository containing the recipes (https://github.com/h2oai/driverlessai-recipes) and make modifications to existing recipes, if needed.
  • Start a Driverless AI instance. Upload the data and select default as the target column. Keep all the other parameters as default.

  • Next, go to the expert settings and upload a recipe from the cloned repository. Suppose we want a new transformer for numeric data. Navigate to driverlessai-recipes > transformers > numeric and select the round_transformer.py file. This recipe rounds numbers to 1, 2 or 3 decimal places. Click Save to save the settings.

  • Launch the experiment using the above custom recipe. If everything goes well, my_round transformer should show up on the screen.

  • We can also select specific transformers in the expert settings. Let’s select only my_round transformer and get rid of everything else.

This is it. Uploading a recipe is as simple as uploading a dataset, but the advantages are manifold.
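The rounding idea behind the recipe can be sketched in plain Python. This is not the contents of round_transformer.py and does not use the Driverless AI recipe base classes. `RoundSketch`, its method names, and the sample values are hypothetical stand-ins, and treating the precision as a constructor parameter is an assumption.

```python
class RoundSketch:
    """Round a numeric column to a fixed number of decimal places.

    Illustrative stand-in only; not the Driverless AI transformer API.
    """

    ALLOWED_DECIMALS = (1, 2, 3)

    def __init__(self, decimals=1):
        if decimals not in self.ALLOWED_DECIMALS:
            raise ValueError(f"decimals must be one of {self.ALLOWED_DECIMALS}")
        self.decimals = decimals

    def fit_transform(self, values, y=None):
        # Rounding is stateless, so there is nothing to learn during fitting
        return self.transform(values)

    def transform(self, values):
        # Missing values (None) pass through unchanged
        return [None if v is None else round(v, self.decimals) for v in values]


bill_amounts = [3913.456, None, 689.12345]  # hypothetical feature values
for d in RoundSketch.ALLOWED_DECIMALS:
    print(d, RoundSketch(decimals=d).transform(bill_amounts))
```

Rounding like this can act as a mild form of binning: values that differ only in noisy trailing digits collapse to the same number, which may help some models generalize.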

Advantages of BYOR

Creating custom recipes can be advantageous to organizations in several ways:

  • BYOR enables flexibility, extensibility, and customization of the DAI platform. As a result, the focus can be more on solving domain-specific problems.
  • The recipes provided by H2O are built by the data science community and curated by Kaggle Grandmasters.
  • The custom recipes are treated as first-class citizens and are extremely easy to upload to the workflow.

The screenshot below shows how including a custom recipe reduced the log loss in a sentiment analysis experiment from 0.6 to 0.48, an improvement that can translate into significant business value.
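For context on what that drop means, log loss is the average negative log of the probability a model assigns to the true class, so exp(-log loss) is the geometric mean of those probabilities. The short sketch below uses only the standard library. The function names and the toy labels and predictions are illustrative and are not taken from the experiment in the screenshot.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: mean negative log-probability of the true class."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -math.log(p if y == 1 else 1 - p)
    return total / len(y_true)


def typical_true_class_probability(loss):
    """Geometric mean of the probabilities given to the true class."""
    return math.exp(-loss)


# Toy data (made up): three of four predictions lean the right way
print(round(log_loss([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.7]), 3))

# Interpret the two log loss values quoted in the text
for loss in (0.6, 0.48):
    print(loss, round(typical_true_class_probability(loss), 3))
```

Read this way, a log loss of 0.6 corresponds to a geometric-mean probability of roughly 0.55 for the true class, and 0.48 to roughly 0.62.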

About the Author

Parul Pandey

Parul is a Data Science Evangelist at H2O.ai. She combines data science, evangelism, and community in her work. She is also a Kaggle Grandmaster in the notebooks category and was one of LinkedIn’s Top Voices in the Software Development category in 2019.
