June 21st, 2017

Scalable Automatic Machine Learning: Introducing H2O’s AutoML

Category: AutoML, Ensembles, H2O Release, Technical

Prepared by: Erin LeDell, Navdeep Gill & Ray Peck
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that both non-experts and experts can use. The first steps toward simplifying machine learning involved developing simple, unified interfaces to a variety of machine learning algorithms (e.g., H2O).
Although H2O has made it easy for non-experts to experiment with machine learning, producing high-performing machine learning models still requires a fair bit of data science knowledge and background. Deep Neural Networks in particular are notoriously difficult for a non-expert to tune properly. We have designed an easy-to-use interface that automates the process of training a large, diverse selection of candidate models and then training a stacked ensemble on the resulting models (which often leads to an even better model). H2O’s AutoML for Scalable Automatic Machine Learning makes its debut in the latest “Preview Release” of H2O, version 3.12.0.1 (aka “Vapnik”).
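The stacking step mentioned above can be sketched in a few lines of plain Python. This is a minimal, hypothetical illustration, not H2O’s Stacked Ensemble implementation: it assumes each base model has already produced cross-validated (out-of-fold) predictions, and it fits a simple least-squares linear metalearner with no intercept on those predictions. The function names `fit_metalearner` and `ensemble_predict` are made up for this example.

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations apply to a and b together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_metalearner(base_preds: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Find weights w minimizing sum_i (y_i - sum_k w_k * base_preds[k][i])^2.

    base_preds[k] holds the out-of-fold predictions of (hypothetical) base model k.
    """
    k = len(base_preds)
    # Normal equations (P^T P) w = P^T y, where column k of P is base_preds[k].
    ptp = [[sum(a * b for a, b in zip(base_preds[i], base_preds[j])) for j in range(k)]
           for i in range(k)]
    pty = [sum(a * b for a, b in zip(base_preds[i], y)) for i in range(k)]
    return solve_linear_system(ptp, pty)


def ensemble_predict(weights: Sequence[float], preds: Sequence[Sequence[float]]) -> List[float]:
    """Combine base-model predictions for new rows using the learned weights."""
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(len(preds[0]))]
```

Training the metalearner on out-of-fold predictions, rather than on predictions from models that already saw the same rows, lets it learn how much to trust each base model without simply rewarding the one that overfits most.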
H2O’s AutoML can automate a large part of the machine learning workflow, including the automatic training and tuning of many models within a user-specified time limit. Instead of a fixed time constraint, the user can also use a performance-metric-based stopping criterion for the AutoML process. Stacked Ensembles are automatically trained on the collection of individual models to produce a highly predictive ensemble model which, in most cases, will be the top-performing model on the AutoML Leaderboard.
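The two stopping modes described above (a time budget or a metric-based criterion) can be illustrated with a small, hypothetical Python loop. It keeps training candidate models until either the wall-clock budget runs out or none of the most recent models beats the best earlier score by a relative tolerance. `max_runtime_secs` matches the argument used in the examples in this post; `stopping_rounds`, `stopping_tolerance`, and the plateau rule itself are assumptions for illustration, and H2O’s actual stopping logic may differ.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def should_stop(history: List[float], rounds: int, tolerance: float) -> bool:
    """True when none of the last `rounds` scores beats the best earlier score
    by at least a relative `tolerance`. Assumes positive, higher-is-better
    scores such as AUC."""
    if len(history) <= rounds:
        return False  # Not enough models yet to judge a plateau.
    best_before = max(history[:-rounds])
    best_recent = max(history[-rounds:])
    return best_recent < best_before * (1 + tolerance)


def run_search(
    candidates: Iterable[Tuple[str, Callable[[], float]]],
    max_runtime_secs: Optional[float] = None,
    stopping_rounds: int = 3,
    stopping_tolerance: float = 0.001,
) -> List[Tuple[str, float]]:
    """Train candidates in order until the time budget runs out or scores plateau."""
    start = time.monotonic()
    results: List[Tuple[str, float]] = []
    history: List[float] = []
    for model_id, train_and_score in candidates:
        # Time-based stopping: do not start a new model once the budget is spent.
        if max_runtime_secs is not None and time.monotonic() - start >= max_runtime_secs:
            break
        score = train_and_score()
        results.append((model_id, score))
        history.append(score)
        # Metric-based stopping: quit once recent models stop improving.
        if should_stop(history, stopping_rounds, stopping_tolerance):
            break
    return results
```

This sketch only checks the time budget between models; a production system would typically also need to interrupt a model that is still training when the budget expires.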

AutoML Interface

We provide a simple function that performs a process that would typically require many lines of code. This frees users to focus on other data science pipeline tasks, such as data preprocessing, feature engineering, and model deployment.
R:

library(h2o)
aml <- h2o.automl(x = x, y = y, training_frame = train,
                  max_runtime_secs = 3600)

Python:

from h2o.automl import H2OAutoML
aml = H2OAutoML(max_runtime_secs=3600)
aml.train(x=x, y=y, training_frame=train)

Flow (H2O’s Web GUI):
Run AutoML

AutoML Leaderboard

Each AutoML run returns a “Leaderboard” of models, ranked by a default performance metric. Here is an example leaderboard for a binary classification task:
[Leaderboard image: Model Id and AUC columns]
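For a binary classification task, the example leaderboard is ranked by AUC. As a hedged illustration of how such a ranking could be computed (H2O builds the leaderboard for you), the plain-Python sketch below computes AUC from its probabilistic definition: the chance that a randomly chosen positive row scores higher than a randomly chosen negative row, with ties counted as one half. It then sorts models from best to worst. The model IDs and scores used with it are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. This is O(P * N), fine for a small illustration.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def leaderboard(y_true: Sequence[int],
                model_scores: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Return (model_id, auc) pairs sorted from best to worst AUC."""
    rows = [(model_id, auc(y_true, s)) for model_id, s in model_scores.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```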
More information and full R and Python code examples are available on the H2O 3.12.0.1 AutoML docs page in the H2O User Guide.
