February 22nd, 2018

New features in H2O 3.18

Category: AutoML, Ensembles, H2O Release, XGBoost

Wolpert Release (H2O 3.18)

There’s a new major release of H2O and it’s packed with new features and fixes!
We named this release after David Wolpert, who is famous for inventing Stacking (aka Stacked Ensembles). Stacking is a central component in H2O AutoML, so we’re very grateful for his contributions to machine learning! He is also famous for the “No Free Lunch” theorem, which loosely states that no single algorithm will be the best in all cases. In other words, there’s no magic bullet. This is precisely why stacking is such a powerful and practical algorithm: you never know in advance whether a Deep Neural Network, GBM or Random Forest will be the best algorithm for your problem. When you combine all of these into a stacked ensemble, the metalearner can draw on the strengths of each algorithm, and in practice the ensemble usually performs at least as well as the best individual base learner. You can read more about Dr. Wolpert and his work here.

Distributed XGBoost

The central feature of this release is support for distributed XGBoost, as well as other XGBoost enhancements and bug fixes. We are bringing XGBoost support to more platforms (including older versions of CentOS/Ubuntu) and we now support multi-node XGBoost training (though this feature is still in “beta”).
There are a number of XGBoost bug fixes, such as the ability to use XGBoost models after they have been saved to disk and re-loaded into the H2O cluster, and fixes to the XGBoost MOJO. With all the improvements to H2O’s XGBoost, we are much closer to adding XGBoost to AutoML, and you can expect to see that in a future release. You can read more about the H2O XGBoost integration in the XGBoost User Guide.
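As a rough illustration of the save and re-load workflow mentioned above, here is a minimal sketch using the H2O Python client. The function name, the hyperparameter values in XGB_PARAMS, and the arguments are placeholders chosen for this example, not recommendations from the release notes. It assumes an H2O cluster is already running and that train is an H2OFrame.

```python
# Illustrative hyperparameters only; these values are assumptions, not tuned defaults.
XGB_PARAMS = {"ntrees": 50, "max_depth": 6, "learn_rate": 0.1, "seed": 1}


def train_save_reload_xgboost(train, x, y, model_dir):
    """Train an XGBoost model, save it to disk, and load it back into the cluster.

    Hypothetical helper: `train` is an H2OFrame, `x` a list of predictor
    column names, `y` the response column name, `model_dir` a directory path.
    """
    # Imported inside the function so the module can be loaded without h2o installed.
    import h2o
    from h2o.estimators.xgboost import H2OXGBoostEstimator

    # XGBoost is not shipped for every platform, so check before training.
    if not H2OXGBoostEstimator.available():
        raise RuntimeError("XGBoost is not available on this H2O cluster")

    model = H2OXGBoostEstimator(**XGB_PARAMS)
    model.train(x=x, y=y, training_frame=train)

    # save_model returns the full path of the saved binary model.
    saved_path = h2o.save_model(model, path=model_dir, force=True)
    reloaded = h2o.load_model(saved_path)
    return model, reloaded
```

Before calling the helper you would start or connect to a cluster with h2o.init() and load data with h2o.import_file(). The reloaded model can then be used for predict() like any other H2O model.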

AutoML & Stacked Ensembles

One big addition to H2O Automatic Machine Learning (AutoML) is the ability to turn off certain algorithms. By default, H2O AutoML will train Gradient Boosting Machines (GBM), Random Forests (RF), Generalized Linear Models (GLM), Deep Neural Networks (DNN) and Stacked Ensembles. However, sometimes it may be useful to turn off some of those algorithms. In particular, if you have sparse, wide data, you may choose to turn off the tree-based models (GBMs and RFs). Conversely, if tree-based models perform comparatively well on your data, then you may choose to turn off GLMs and DNNs. Keep in mind that Stacked Ensembles benefit from diversity of the set of base learners, so keeping “bad” models may still improve the overall performance of the Stacked Ensembles created by the AutoML run. The new argument is called exclude_algos and you can read more about it in the AutoML User Guide.
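The rules of thumb above could be sketched roughly as follows with the H2O Python client. The helper names (algos_to_exclude, run_automl) and the runtime budget are hypothetical, made up for this example. Only H2OAutoML and its exclude_algos argument come from H2O itself. It assumes a running cluster and an H2OFrame named train.

```python
# Algorithm names as exclude_algos expects them.
TREE_BASED = ["GBM", "DRF"]
NON_TREE = ["GLM", "DeepLearning"]


def algos_to_exclude(sparse_wide_data=False, trees_dominate=False):
    """Hypothetical helper that encodes the two heuristics from the text."""
    if sparse_wide_data and trees_dominate:
        # Applying both heuristics would leave no base learners to train.
        raise ValueError("choose at most one heuristic")
    if sparse_wide_data:
        return list(TREE_BASED)
    if trees_dominate:
        return list(NON_TREE)
    return []


def run_automl(train, x, y, exclude, max_runtime_secs=60):
    """Run AutoML with some algorithms switched off and return the leaderboard."""
    # Imported inside the function so the heuristic helper works without h2o installed.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(
        max_runtime_secs=max_runtime_secs,  # assumed budget for the example
        exclude_algos=exclude or None,      # None means "train everything"
        seed=1,
    )
    aml.train(x=x, y=y, training_frame=train)
    return aml.leaderboard
```

For example, run_automl(train, x, y, algos_to_exclude(sparse_wide_data=True)) would skip GBMs and Random Forests. Since Stacked Ensembles benefit from diverse base learners, it is worth comparing the leaderboard against a run with nothing excluded.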
There are several improvements to the Stacked Ensemble functionality in H2O 3.18. The big new feature is the ability to fully customize the metalearning algorithm. The default metalearner (a GLM with non-negative weights) usually does pretty well. However, you are encouraged to experiment with other algorithms (such as GBM) and various hyperparameter settings. In the next major release, we will add the ability to easily perform a grid search on the hyperparameters of the metalearner algorithm using the standard H2O Grid Search functionality.
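To make the metalearner option concrete, here is a minimal sketch of a two-model stack with a GBM metalearner, using the H2O Python client. The helper names, the choice of base learners, and the values for ntrees, nfolds and seed are assumptions for illustration. It assumes a running cluster and an H2OFrame named train (with the response converted to a factor for classification).

```python
def cv_settings(nfolds, seed):
    """Cross-validation settings the base models must share for stacking.

    Hypothetical helper: Stacked Ensembles need every base model to use the
    same folds and to keep its cross-validation predictions.
    """
    return {
        "nfolds": nfolds,
        "fold_assignment": "Modulo",
        "keep_cross_validation_predictions": True,
        "seed": seed,
    }


def train_stack(train, x, y, metalearner="gbm", nfolds=5, seed=1):
    """Train GBM and Random Forest base models, then stack them."""
    # Imported inside the function so cv_settings works without h2o installed.
    from h2o.estimators.gbm import H2OGradientBoostingEstimator
    from h2o.estimators.random_forest import H2ORandomForestEstimator
    from h2o.estimators.stackedensemble import H2OStackedEnsembleEstimator

    shared = cv_settings(nfolds, seed)

    gbm = H2OGradientBoostingEstimator(ntrees=50, **shared)
    gbm.train(x=x, y=y, training_frame=train)

    rf = H2ORandomForestEstimator(ntrees=50, **shared)
    rf.train(x=x, y=y, training_frame=train)

    # metalearner_algorithm replaces the default non-negative GLM metalearner.
    ensemble = H2OStackedEnsembleEstimator(
        base_models=[gbm, rf],
        metalearner_algorithm=metalearner,
        seed=seed,
    )
    ensemble.train(x=x, y=y, training_frame=train)
    return ensemble
```

Calling train_stack(train, x, y, metalearner="AUTO") would fall back to the default metalearner, which gives a baseline to compare the GBM metalearner against on a held-out frame.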

Highlights

Below is a list of some of the highlights from the 3.18 release. As usual, you can see a list of all the items that went into this release in the Changes.md file in the h2o-3 GitHub repository.
New Features:

  • PUBDEV-4652 – Added support for XGBoost multi-node training in H2O
  • PUBDEV-4980 – Users can now exclude certain algorithms during an AutoML run
  • PUBDEV-5086 – Stacked Ensemble now allows users to pass in a customized metalearner
  • PUBDEV-5224 – Users can now specify a seed parameter in Stacked Ensemble
  • PUBDEV-5204 – GLM: Users can now specify a list of interaction terms to include/exclude

Bugs:

  • PUBDEV-4585 – Fixed an issue that caused XGBoost binary save/load to fail
  • PUBDEV-4593 – Fixed an issue that caused a Levenshtein Distance Normalization Error
  • PUBDEV-5133 – In Flow, the scoring history plot is now available for GLM models
  • PUBDEV-5195 – Fixed an issue in XGBoost that caused MOJOs to fail to work without manually adding the Commons Logging dependency
  • PUBDEV-5215 – Users can now specify interactions when running GLM in Flow
  • PUBDEV-5315 – Fixed an issue that caused XGBoost OpenMP to fail on Ubuntu 14.04

Documentation:

  • PUBDEV-5311 – The H2O-3 download site now includes a link to the HTML version of the R documentation

Download here: http://h2o-release.s3.amazonaws.com/h2o/latest_stable.html
