Python AutoML

What is Automated Machine Learning?

Traditional machine learning (ML) workflows require experts to carefully prepare data and then train, validate, and tune a model to get good results. Automated machine learning, or AutoML, is an approach that automates much of this work in order to make ML more accessible to non-expert users. An AutoML system automatically prepares data, selects ML models, and sets hyperparameters for a specific predictive modeling task. H2O AutoML provides an accessible interface for running this process and adjusting basic parameters.
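The core loop described above (try candidate models and hyperparameters, score each on held-out data, keep the best) can be sketched in a few lines. This is a toy illustration using only the standard library, not how H2O AutoML or any specific tool is implemented; the dataset, the two model families, and names such as `SEARCH_SPACE` and `auto_select` are assumptions made for the example.

```python
import random

def make_data(n, rng):
    """Toy 1-D dataset: the label is 1 when x > 0.6, with no noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, int(x > 0.6)) for x in xs]

def threshold_model(train, threshold):
    """Ignores the training data; predicts 1 above a fixed threshold."""
    return lambda x: int(x > threshold)

def knn_model(train, k):
    """Predicts the majority label among the k nearest training points."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return int(2 * votes > k)
    return predict

def accuracy(model, data):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)

# Hypothetical search space: each model family with candidate hyperparameters.
SEARCH_SPACE = [
    ("threshold", threshold_model, [{"threshold": t} for t in (0.2, 0.4, 0.5, 0.8)]),
    ("knn", knn_model, [{"k": k} for k in (1, 3, 5)]),
]

def auto_select(train, valid):
    """Try every (model, hyperparameters) pair; keep the best on validation."""
    best = None
    for name, builder, configs in SEARCH_SPACE:
        for config in configs:
            score = accuracy(builder(train, **config), valid)
            if best is None or score > best[0]:
                best = (score, name, config)
    return best

rng = random.Random(0)
data = make_data(200, rng)
train, valid = data[:150], data[150:]
best_score, best_name, best_config = auto_select(train, valid)
print(best_name, best_config, round(best_score, 3))
```

Real AutoML systems replace the exhaustive loop with smarter search strategies and add data preparation steps, but the select-evaluate-keep structure is the same.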

Python AutoML Libraries

Auto-Sklearn

Auto-Sklearn is an AutoML toolkit built on models from the well-known scikit-learn library. It uses methods such as Bayesian optimization to search over possible models and identify configurations that work well for a given task. It can also draw on configurations that performed well on similar datasets. Auto-Sklearn is a good way to get started with AutoML thanks to its simple interface and minimal need for user input.
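The idea of learning from similar datasets can be illustrated with a small nearest-neighbor lookup over dataset meta-features. This is a simplified sketch of the concept only, not Auto-Sklearn's code or API; the meta-features, the stored configurations in `PAST_RUNS`, and the function names are all hypothetical.

```python
import math

# Hypothetical record of past runs: dataset meta-features -> best configuration.
PAST_RUNS = [
    ({"n_rows": 1_000, "n_features": 10, "class_balance": 0.5},
     {"model": "random_forest", "n_estimators": 200}),
    ({"n_rows": 100_000, "n_features": 300, "class_balance": 0.1},
     {"model": "gradient_boosting", "learning_rate": 0.05}),
    ({"n_rows": 500, "n_features": 4, "class_balance": 0.4},
     {"model": "svm", "C": 1.0}),
]

def to_vector(meta):
    """Log-scale the size features so large datasets do not dominate the distance."""
    return (
        math.log10(meta["n_rows"]),
        math.log10(meta["n_features"]),
        meta["class_balance"],
    )

def warm_start_configs(new_meta, k=2):
    """Return the configurations of the k past datasets closest in meta-feature space."""
    target = to_vector(new_meta)
    ranked = sorted(PAST_RUNS, key=lambda run: math.dist(to_vector(run[0]), target))
    return [config for _, config in ranked[:k]]

new_dataset = {"n_rows": 800, "n_features": 8, "class_balance": 0.45}
suggestions = warm_start_configs(new_dataset)
print(suggestions)
```

The returned configurations would serve as the first candidates to evaluate, so the search starts from settings that are likely to be reasonable instead of from scratch.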

TPOT

The Tree-based Pipeline Optimization Tool (TPOT) is a Python library designed to automate many important ML tasks, including pipeline construction, data preparation, feature selection, preprocessing, model selection, and parameter optimization. It represents pipelines as trees and uses an evolutionary process to improve these trees over time, searching for the best-performing pipeline. It can then export the best pipeline it found as a Python code file.
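The evolutionary process can be sketched as a loop of selection, crossover, and mutation over candidate pipelines. The example below is a toy under strong simplifying assumptions: pipelines are flat dictionaries of step names rather than trees, and `STEP_SCORE` holds made-up numbers standing in for cross-validated accuracy. None of the names come from TPOT itself.

```python
import random

# Hypothetical building blocks for each pipeline stage.
STAGES = {
    "scaler": ["none", "standard", "minmax"],
    "selector": ["none", "variance", "k_best"],
    "model": ["tree", "logistic", "knn"],
}

# Made-up per-step scores standing in for cross-validated accuracy.
STEP_SCORE = {
    "none": 0.0, "standard": 0.05, "minmax": 0.03,
    "variance": 0.02, "k_best": 0.04,
    "tree": 0.70, "logistic": 0.75, "knn": 0.72,
}

def fitness(pipeline):
    """Score a pipeline; a real system would train and validate it instead."""
    return sum(STEP_SCORE[step] for step in pipeline.values())

def random_pipeline(rng):
    return {stage: rng.choice(options) for stage, options in STAGES.items()}

def mutate(pipeline, rng):
    """Replace the step at one randomly chosen stage."""
    child = dict(pipeline)
    stage = rng.choice(list(STAGES))
    child[stage] = rng.choice(STAGES[stage])
    return child

def crossover(a, b, rng):
    """Build a child by taking each stage from one of the two parents."""
    return {stage: rng.choice((a[stage], b[stage])) for stage in STAGES}

def evolve(generations=20, population_size=10, seed=0):
    rng = random.Random(seed)
    population = [random_pipeline(rng) for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: population_size // 2]  # keep the fitter half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

best = evolve()
print(best, round(fitness(best), 2))
```

Because the fitter half of each generation is carried over unchanged, the best score never decreases between generations. The price is that every candidate must be evaluated, which is why evolutionary search can take a long time on real data.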

HyperOpt

HyperOpt is a Python library for hyperparameter tuning that uses Bayesian optimization to search for the settings that give a model the best performance. It is designed for problems involving hundreds of parameters and large-scale data. HyperOpt’s scope is limited to optimizing the hyperparameters of machine learning pipelines; it does not assist with other steps of the model-building process. On its own, HyperOpt requires detailed setup, since the user must define both the search space and the objective to minimize. The companion library HyperOpt-sklearn builds on HyperOpt and applies it to the models available in scikit-learn.
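The setup work mentioned above comes down to two pieces: a search space describing how each hyperparameter is sampled, and an objective function that returns a loss to minimize. The sketch below shows that structure using only the standard library. It uses plain random search as a stand-in, not the Bayesian optimization HyperOpt performs, and it does not use HyperOpt's API; the hyperparameter names and the loss formula are invented for illustration.

```python
import math
import random

# Hypothetical search space: each entry samples one hyperparameter.
SPACE = {
    "learning_rate": lambda rng: 10 ** rng.uniform(-4, 0),  # log-uniform
    "max_depth": lambda rng: rng.randint(2, 12),            # integer range
    "booster": lambda rng: rng.choice(["tree", "linear"]),  # categorical
}

def objective(params):
    """Made-up loss standing in for a validation error; lower is better."""
    loss = (math.log10(params["learning_rate"]) + 2) ** 2  # best near 0.01
    loss += 0.01 * (params["max_depth"] - 6) ** 2          # best at depth 6
    loss += 0.0 if params["booster"] == "tree" else 0.5    # penalty for "linear"
    return loss

def random_search(objective, space, max_evals=200, seed=0):
    """Sample configurations independently and return the best one seen."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_evals):
        params = {name: sample(rng) for name, sample in space.items()}
        trials.append((objective(params), params))
    best = min(trials, key=lambda trial: trial[0])
    return best, trials

(best_loss, best_params), trials = random_search(objective, SPACE)
print(round(best_loss, 4), best_params)
```

A Bayesian optimizer keeps the same interface but uses the results of earlier trials to decide where to sample next, which usually reaches a good configuration in fewer evaluations than independent random sampling.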

AutoKeras

AutoKeras is a library built to assist with the considerably more difficult task of building neural networks for deep learning. It uses neural architecture search to identify architectures that suit the task. AutoKeras builds complex models and automates much of the preprocessing. It supports many types of data, including text, images, and structured data.

How to Choose a Library

The best library depends on the task and the needs of the user. Auto-Sklearn is simple to use and provides rapid results for simpler tasks. TPOT can produce highly accurate models and offers greater customizability, but can require long training times. HyperOpt can also reach high accuracy, but it focuses on hyperparameter optimization and leaves data preparation and preprocessing to the user. AutoKeras is only necessary when building deep models such as neural networks, or when the data comes in unstructured forms such as text or images. It takes a significant amount of time and computing power to train, but can provide powerful results.

H2O Resources 

H2O Wiki: Automated Machine Learning