Scalable AutoML in H2O-3 Open Source
AutoML, or Automatic Machine Learning, is the process of automating algorithm selection, feature generation, hyperparameter tuning, iterative modeling, and model assessment. By automating these repetitive tasks, AutoML makes it easier to train and evaluate machine learning models and lets people focus on the data and the business problems they are trying to solve.
H2O Open Source AutoML
- Train the best model in the least amount of time to save human hours.
- Reduce the need for expertise in machine learning by reducing the manual code-writing time.
- Improve the performance of machine learning models.
- Increase reproducibility and establish a baseline for scientific research or applications.
- Scale training on large data sets to clusters (Hadoop, Spark, Kubernetes).
Aspects of AutoML
- Imputation, one-hot encoding, standardization
- Feature selection and/or feature extraction (e.g. PCA)
- Count/Label/Target encoding of categorical features
- Cartesian grid search or random grid search
- Bayesian hyperparameter optimization
- Individual models can be tuned using a validation set
- Ensembles often outperform individual models
- Stacking/Super Learning (Wolpert, Breiman)
- Ensemble Selection (Caruana)
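
To make the last item in the list above concrete, here is a minimal sketch of greedy ensemble selection in the spirit of Caruana's method: starting from an empty ensemble, repeatedly add (with replacement) the model whose inclusion most reduces validation error of the averaged predictions. The function names, the toy predictions, and the choice of mean squared error are assumptions made for illustration. This is not presented as how H2O-3 builds its ensembles.

```python
from typing import Dict, List, Tuple


def mse(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def ensemble_selection(
    val_preds: Dict[str, List[float]],
    y_val: List[float],
    max_rounds: int = 10,
) -> Tuple[Dict[str, int], float]:
    """Greedy forward selection with replacement (illustrative sketch).

    Each round adds the model whose inclusion gives the lowest validation
    error for the averaged ensemble, and stops when no addition improves it.
    Returns (number of times each model was picked, final validation MSE).
    """
    running_sum = [0.0] * len(y_val)  # sum of predictions picked so far
    counts: Dict[str, int] = {}
    picked = 0
    best_err = float("inf")

    for _ in range(max_rounds):
        round_best_name, round_best_err = None, best_err
        for name in sorted(val_preds):  # sorted for deterministic tie-breaking
            preds = val_preds[name]
            avg = [(s + p) / (picked + 1) for s, p in zip(running_sum, preds)]
            err = mse(avg, y_val)
            if err < round_best_err:
                round_best_name, round_best_err = name, err
        if round_best_name is None:
            break  # no candidate improves the ensemble
        chosen = val_preds[round_best_name]
        running_sum = [s + p for s, p in zip(running_sum, chosen)]
        counts[round_best_name] = counts.get(round_best_name, 0) + 1
        picked += 1
        best_err = round_best_err

    return counts, best_err


# Hypothetical validation-set predictions from three already-trained models.
y_val = [1.0, 2.0, 3.0, 4.0]
val_preds = {
    "model_a": [1.5, 2.5, 3.5, 4.5],  # consistently 0.5 too high
    "model_b": [0.5, 1.5, 2.5, 3.5],  # consistently 0.5 too low
    "model_c": [3.0, 3.0, 3.0, 3.0],  # constant guess
}

counts, best_err = ensemble_selection(val_preds, y_val)
print(counts, best_err)
```

In this toy data the two biased models cancel each other out, so the selected ensemble beats every individual model on the validation set, which is the point of the "ensembles often outperform individual models" bullet. Selecting on the same validation set used for tuning can overfit, so a real workflow would typically hold out separate data for the final assessment.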