June 21st, 2017

Scalable Automatic Machine Learning: Introducing H2O’s AutoML

RSS icon RSS Category: AutoML, Ensembles, H2O Release, Technical
Machine for peneteration

Prepared by: Erin LeDell, Navdeep Gill & Ray Peck
Machine for peneteration
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts, alike. The first steps toward simplifying machine learning involved developing simple, unified interfaces to a variety of machine learning algorithms (e.g. H2O).
Although H2O has made it easy for non-experts to experiment with machine learning, there is still a fair bit of knowledge and background in data science that is required to produce high-performing machine learning models. Deep Neural Networks in particular are notoriously difficult for a non-expert to tune properly. We have designed an easy-to-use interface which automates the process of training a large, diverse, selection of candidate models and training a stacked ensemble on the resulting models (which often leads to an even better model). Making it’s debut in the latest “Preview Release” of H2O, version 3.12.0.1 (aka “Vapnik”), we introduce H2O’s AutoML for Scalable Automatic Machine Learning.
H2O’s AutoML can be used for automating a large part of the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time-limit. The user can also use a performance metric-based stopping criterion for the AutoML process rather than a specific time constraint. Stacked Ensembles will be automatically trained on the collection individual models to produce a highly predictive ensemble model which, in most cases, will be the top performing model in the AutoML Leaderboard.

AutoML Interface

We provide a simple function that performs a process that would typically require many lines of code. This frees up users to focus on other aspects of the data science pipeline tasks such as data-preprocessing, feature engineering and model deployment.
R:

aml <- h2o.automl(x = x, y = y, training_frame = train,
                  max_runtime_secs = 3600)

Python:

aml = H2OAutoML(max_runtime_secs = 3600)
aml.train(x = x, y = y, training_frame = train)

Flow (H2O’s Web GUI):
Run AutoML

AutoML Leaderboard

Each AutoML run returns a “Leaderboard” of models, ranked by a default performance metric. Here is an example leaderboard for a binary classification task:
Model Id auc data
More information, and full R and Python code examples are available on the H2O 3.12.0.1 AutoML docs page in the H2O User Guide.

Leave a Reply

+
Developing and Retaining Data Science Talent

It’s been almost a decade since the Harvard Business Review proclaimed that “Data Scientist” is

May 12, 2022 - by Jon Farland
+
The H2O.ai Wildfire Challenge Winners Blog Series – Team Too Hot Encoder

Note: this is a community blog post by Team Too Hot Encoder - one of

May 10, 2022 - by H2O.ai Team
+
The H2O.ai Wildfire Challenge Winners Blog Series – Team HTB

Note: this is a community blog post by Team HTB - one of the H2O.ai

May 10, 2022 - by H2O.ai Team
+
Bias and Debiasing

An important aspect of practicing machine learning in a responsible manner is understanding how models

April 15, 2022 - by Kim Montgomery
+
Comprehensive Guide to Image Classification using H2O Hydrogen Torch

In this article, we will learn how to build state-of-the-art models in computer vision and

March 29, 2022 - by H2O.ai Team
+
H2O Wave Snippet Plugin for PyCharm

Note: this blog post by Shamil Dilshan Prematunga was first published on Medium. What is PyCham? PyCharm

March 24, 2022 - by Shamil Prematunga

Start Your Free Trial