June 21st, 2017

Scalable Automatic Machine Learning: Introducing H2O’s AutoML

Category: AutoML, Ensembles, H2O Release, Technical

Prepared by: Erin LeDell, Navdeep Gill & Ray Peck
In recent years, the demand for machine learning experts has outpaced the supply, despite the surge of people entering the field. To address this gap, there have been big strides in the development of user-friendly machine learning software that can be used by non-experts and experts alike. The first steps toward simplifying machine learning involved developing simple, unified interfaces to a variety of machine learning algorithms (e.g. H2O).
Although H2O has made it easy for non-experts to experiment with machine learning, a fair bit of data science knowledge and background is still required to produce high-performing machine learning models. Deep Neural Networks in particular are notoriously difficult for a non-expert to tune properly. We have designed an easy-to-use interface which automates the process of training a large, diverse selection of candidate models and then training a stacked ensemble on the resulting models (which often leads to an even better model). Making its debut in the latest “Preview Release” of H2O, version 3.12.0.1 (aka “Vapnik”), we introduce H2O’s AutoML for Scalable Automatic Machine Learning.
H2O’s AutoML can be used for automating a large part of the machine learning workflow, which includes automatic training and tuning of many models within a user-specified time limit. Instead of a specific time constraint, the user can also use a stopping criterion based on a performance metric to end the AutoML process. Stacked Ensembles will be automatically trained on the collection of individual models to produce a highly predictive ensemble model which, in most cases, will be the top-performing model in the AutoML Leaderboard.
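
To make the two stopping options concrete, here is a minimal pure-Python sketch of a budgeted model search. It is not H2O's implementation: the function `run_search`, its `stopping_rounds`, `stopping_tolerance` and `clock` arguments, and the scores themselves are all hypothetical, chosen only to illustrate a time limit alongside a metric-based stopping rule.

```python
import time


def run_search(candidates, max_runtime_secs=None, stopping_rounds=3,
               stopping_tolerance=0.001, clock=time.monotonic):
    """Train candidates in order until a stopping rule fires.

    candidates: list of (name, train_fn) pairs, where train_fn() returns
    a score and higher is better. All names here are illustrative.
    """
    start = clock()
    results = []
    best = float("-inf")
    rounds_without_gain = 0
    for name, train_fn in candidates:
        # Time-based rule: do not start a new model once the budget is spent.
        if max_runtime_secs is not None and clock() - start >= max_runtime_secs:
            break
        score = train_fn()
        results.append((name, score))
        # Metric-based rule: stop after several models in a row fail to
        # beat the best score so far by more than the tolerance.
        improved = score > best + stopping_tolerance
        best = max(best, score)
        if improved:
            rounds_without_gain = 0
        else:
            rounds_without_gain += 1
            if rounds_without_gain >= stopping_rounds:
                break
    # Return the trained models ranked best-first.
    return sorted(results, key=lambda row: row[1], reverse=True)


# Made-up scores standing in for real model training.
scores = iter([0.70, 0.78, 0.781, 0.7805, 0.7802, 0.90])
candidates = [(f"model_{i}", lambda: next(scores)) for i in range(6)]
ranked = run_search(candidates, stopping_rounds=3, stopping_tolerance=0.005)
# The search stops after model_4, so model_5 is never trained.
```

In this toy run the metric plateaus after the second model, so the search ends early even though no time limit was set. Passing `max_runtime_secs` instead bounds the run by wall-clock time.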

AutoML Interface

We provide a simple function that performs a process that would typically require many lines of code. This frees up users to focus on other tasks in the data science pipeline, such as data preprocessing, feature engineering and model deployment.
R:

aml <- h2o.automl(x = x, y = y, training_frame = train,
                  max_runtime_secs = 3600)

Python:

from h2o.automl import H2OAutoML

aml = H2OAutoML(max_runtime_secs = 3600)
aml.train(x = x, y = y, training_frame = train)

Flow (H2O’s Web GUI):
Run AutoML

AutoML Leaderboard

Each AutoML run returns a “Leaderboard” of models, ranked by a default performance metric. Here is an example leaderboard for a binary classification task:
[Example leaderboard table listing each model's ID and AUC]
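
As a rough illustration of how such a ranking can be produced for a binary classification task, the sketch below computes AUC for a few sets of predictions and sorts the models by it. The model IDs, labels and predicted scores are invented for the example, and `auc` and `build_leaderboard` are hypothetical helpers, not part of H2O.

```python
def auc(labels, scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def build_leaderboard(labels, predictions):
    """Return (model_id, auc) rows sorted from highest to lowest AUC."""
    rows = [(model_id, auc(labels, scores))
            for model_id, scores in predictions.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Invented holdout labels and per-model predicted probabilities.
labels = [1, 0, 1, 0, 1]
predictions = {
    "model_c": [0.5, 0.5, 0.5, 0.5, 0.5],
    "model_a": [0.9, 0.2, 0.8, 0.4, 0.7],
    "model_b": [0.6, 0.5, 0.3, 0.2, 0.9],
}
board = build_leaderboard(labels, predictions)
for model_id, score in board:
    print(f"{model_id}  auc={score:.4f}")
```

The pairwise AUC used here is quadratic in the number of rows, which is fine for a toy example but not something to run on large data.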
More information and full R and Python code examples are available on the H2O 3.12.0.1 AutoML docs page in the H2O User Guide.
