

H2O Release 3.44


By Marek Novotny | October 20, 2023


We are excited to announce the release of H2O-3 3.44! We have added and improved many items. A few of our highlights are the implementation of AdaBoost, Shapley values support, Python 3.10 and 3.11 support, and custom metric support for Deep Learning, Uplift Distributed Random Forest (DRF), Stacked Ensemble, and AutoML. Please read on for more details.

AdaBoost (Adam Valenta)

We are proud to introduce AdaBoost, an algorithm known for its effectiveness in improving model performance. AdaBoost builds an ensemble of weak learners (typically decision trees) and refines it sequentially to improve predictive accuracy. In each iteration, it assigns higher weights to the data points that were misclassified, so the next weak learner concentrates on correcting those errors. The result is a more precise and robust predictive model. This adaptability and refinement make AdaBoost a valuable tool in various domains, where it supports better predictions and informed decision-making.
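The reweighting idea described above can be sketched in a few lines of plain Python. This is a minimal illustration of textbook discrete AdaBoost with one-dimensional threshold stumps as weak learners, not the H2O-3 implementation; the function names (`stump_predict`, `fit_adaboost`, `adaboost_predict`) and the toy data are assumptions made for this example.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` when x >= threshold, else the opposite label."""
    return polarity if x >= threshold else -polarity

def fit_adaboost(xs, ys, n_rounds=3):
    """Discrete AdaBoost on 1-D inputs with labels in {-1, +1} (illustrative only)."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform row weights
    ensemble = []                    # list of (alpha, threshold, polarity)
    candidates = sorted(set(xs))
    for _ in range(n_rounds):
        # Pick the stump with the lowest weighted error under the current weights.
        best = None
        for threshold in candidates:
            for polarity in (1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump_predict(x, threshold, polarity) != y)
                if best is None or err < best[0]:
                    best = (err, threshold, polarity)
        err, threshold, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified rows get heavier, correctly classified rows get lighter.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single threshold stump can separate.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1, 1, -1, -1, -1, 1, 1, 1]
model = fit_adaboost(xs, ys, n_rounds=3)
train_accuracy = sum(adaboost_predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)
```

On this toy data, no single stump classifies every row correctly, but the weighted vote of three stumps does, which is the effect of shifting weight toward the rows that earlier stumps got wrong.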

Shapley support for ensemble models (Tomáš Frýda)

Stacked Ensembles now supports SHapley Additive exPlanations (SHAP) estimation using the Generalized-DeepSHAP method. This is only supported for base models and metalearner models that support SHAP estimation with a background frame. Support for SHAP with a background frame was added for:

There are two variants of the newly implemented SHAP: baseline SHAP and marginal SHAP (the default when calling predict_contributions with a background dataset). Baseline SHAP returns contributions for each point from the background dataset. Marginal SHAP returns the average contribution across the whole background dataset. Both calculations can require a lot of memory because the baseline result has nrow(frame) * nrow(background_frame) rows. For marginal SHAP contributions in Stacked Ensembles, we optimized the calculation by going through the whole process (baseline SHAP, then averaging) several times, so the memory usage is smaller than (number of base models + 1) * nrow(frame) * nrow(background_frame) (unless the frame is very small).
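To make the difference between the two variants concrete, here is a small sketch that uses a linear model, where the contribution of feature j for a row x against a single reference row r is exactly w[j] * (x[j] - r[j]). It is not the Generalized-DeepSHAP code used in H2O-3; the helper names and the toy numbers are assumptions chosen only to show the output shapes.

```python
def baseline_shap_linear(weights, frame, background):
    """Baseline SHAP for a linear model f(x) = w . x + b.

    Returns one output row per (frame row, background row) pair, so the
    result has len(frame) * len(background) rows.
    """
    rows = []
    for i, x in enumerate(frame):
        for k, r in enumerate(background):
            contrib = [w * (xj - rj) for w, xj, rj in zip(weights, x, r)]
            rows.append((i, k, contrib))
    return rows

def marginal_shap(baseline_rows, n_frame, n_background, n_features):
    """Average the baseline contributions over all background rows."""
    sums = [[0.0] * n_features for _ in range(n_frame)]
    for i, _, contrib in baseline_rows:
        for j, c in enumerate(contrib):
            sums[i][j] += c
    return [[s / n_background for s in row] for row in sums]

weights = [2.0, -1.0]                                # hypothetical model coefficients
frame = [[1.0, 4.0], [3.0, 0.0]]                     # rows to explain
background = [[0.0, 0.0], [2.0, 2.0], [4.0, 1.0]]    # reference rows

baseline = baseline_shap_linear(weights, frame, background)
marginal = marginal_shap(baseline, len(frame), len(background), len(weights))

print(len(baseline))   # 2 frame rows * 3 background rows = 6
print(marginal)        # one row of contributions per frame row
```

The baseline result grows with the product of the two row counts, while the marginal result has only one row per explained row. That is why averaging in pieces keeps memory usage down.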


The new SHAP implementation requires you to choose your references, that is, the background dataset. This can be used to get new insights, as seen in Figure 3 of Explaining a series of models by propagating Shapley values. It can also be used to comply with regulations that require explanations with regard to some reference.


For example, according to the Consumer Financial Protection Bureau, for credit denials in the US, the regulatory commentary suggests to “identify the factors for which the applicant’s score fell furthest below the average score for each of those factors achieved by applicants whose total score was at or slightly above the minimum passing score.” This process can be done by using the applicants just above the cutoff to receive the credit product as the background dataset according to Hall et al. in their book Machine Learning for High-Risk Applications.
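As a rough sketch of that idea, the snippet below picks the applicants whose total score is at or slightly above the cutoff as the background, then ranks the factors by how far a denied applicant falls below the background average. The data layout, the `margin` parameter, and the function names are hypothetical; in practice the selected rows would be passed as the background dataset for SHAP rather than compared by hand.

```python
def select_background(applicants, cutoff, margin):
    """Keep applicants whose total score is at or slightly above the cutoff."""
    return [a for a in applicants if cutoff <= a["score"] <= cutoff + margin]

def largest_shortfalls(applicant, background, factors):
    """Rank factors by how far the applicant falls below the background average."""
    shortfalls = {}
    for f in factors:
        avg = sum(b["factor_scores"][f] for b in background) / len(background)
        shortfalls[f] = avg - applicant["factor_scores"][f]
    return sorted(shortfalls.items(), key=lambda kv: kv[1], reverse=True)

factors = ["income", "history", "utilization"]
applicants = [  # made-up scorecard data for illustration
    {"id": "A", "score": 605, "factor_scores": {"income": 200, "history": 210, "utilization": 195}},
    {"id": "B", "score": 615, "factor_scores": {"income": 220, "history": 190, "utilization": 205}},
    {"id": "C", "score": 700, "factor_scores": {"income": 250, "history": 230, "utilization": 220}},
    {"id": "D", "score": 550, "factor_scores": {"income": 150, "history": 205, "utilization": 195}},
]

background = select_background(applicants, cutoff=600, margin=20)
denied = applicants[3]
ranked = largest_shortfalls(denied, background, factors)
print([a["id"] for a in background])  # applicants just above the cutoff
print(ranked)                         # factors ordered by shortfall, largest first
```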

Fixed H2O-3 Vulnerabilities (Marek Novotný)

This release contains fixes for more than 30 CVE vulnerabilities in the standalone h2o.jar, Python package, R package, and the docker image for Kubernetes. These deployment artifacts don’t contain any critical or high CVE vulnerabilities at the time of writing this article.

Categorical feature support for Single Decision Tree (Yuliia Syzon)

We added support for categorical columns to the Single Decision Tree. You can now build a binary Single Decision Tree classifier with both numerical and categorical columns!


Categorical values are treated as non-sortable values. When splitting the dataset into nodes, a categorical binning approach is used. Note that the number of categories shouldn't be excessively large: this implementation works best with up to 10 categories.

Uplift DRF enhancements (Veronika Maurerova)

There have been several enhancements to the Uplift DRF algorithm.

New treatment effect metrics

Treatment effect metrics show how the uplift predictions look across the whole dataset (population). Scored data are used to calculate these metrics (uplift_predict column = individual treatment effect).

  • Average Treatment Effect (ATE): the average expected uplift prediction (treatment effect) over all records in the dataset.
  • Average Treatment Effect on the Treated (ATT): the average expected uplift prediction (treatment effect) of all records in the dataset belonging to the treatment group.
  • Average Treatment Effect on the Control (ATC): the average expected uplift prediction (treatment effect) of all records in the dataset belonging to the control group.
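The three metrics above reduce to simple averages over the scored data. The following sketch computes them from a list of per-row uplift predictions and a treatment indicator; the function name and the toy values are assumptions for illustration, and H2O-3 reports these metrics for you.

```python
def mean(values):
    return sum(values) / len(values)

def treatment_effects(uplift_predict, treatment):
    """uplift_predict: per-row predicted uplift; treatment: 1 = treated, 0 = control."""
    ate = mean(uplift_predict)
    att = mean([u for u, t in zip(uplift_predict, treatment) if t == 1])
    atc = mean([u for u, t in zip(uplift_predict, treatment) if t == 0])
    return {"ATE": ate, "ATT": att, "ATC": atc}

uplift_predict = [0.5, 0.25, -0.25, 0.0, 0.75, 0.25]   # made-up scored values
treatment      = [1,   1,    0,     0,   1,    0]

effects = treatment_effects(uplift_predict, treatment)
print(effects)  # {'ATE': 0.25, 'ATT': 0.5, 'ATC': 0.0}
```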

Custom metric functionality enabled

You can now specify your custom metric if you need a special metric calculation. Check out an example of the custom metric for Uplift DRF.

MOJO support introduced

You can import the Uplift DRF model as a MOJO and deploy it to your environment.

Prediction table renamed

Based on your feedback, we renamed the prediction table columns to be more precise: p_y1_ct1 is now p_y1_with_treatment and p_y1_ct0 is now p_y1_without_treatment.

Make metrics from a new dataset with custom AUUC thresholds

This new feature lets you pass custom AUUC thresholds when calculating the AUUC metric with the make_metrics method. If you don't specify custom thresholds, the default ones are used.

Deep Learning with custom metric

We have implemented custom metric support for the Deep Learning model. This option is not available for AutoEncoder Deep Learning models.
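For readers who have not written a custom metric before, the sketch below shows the general map/reduce/metric shape that H2O-3 custom metric classes follow, using mean absolute error for a regression model. The class name and the toy rows are assumptions. The loop at the bottom only simulates locally how the partial results are combined, and registering the class with a running H2O cluster is not shown here.

```python
from functools import reduce as fold

class CustomMaeFunc:
    """Mean absolute error expressed as map / reduce / metric steps."""

    def map(self, pred, act, w, o, model):
        # pred[0] is the predicted value and act[0] the actual value (regression case);
        # w is the row weight and o the offset (unused here).
        return [w * abs(act[0] - pred[0]), w]

    def reduce(self, left, right):
        # Combine two partial results: summed errors and summed weights.
        return [left[0] + right[0], left[1] + right[1]]

    def metric(self, last):
        # Final value: weighted error sum divided by total weight.
        return last[0] / last[1]

# Local simulation on made-up (prediction, actual) pairs.
rows = [([2.5], [3.0]), ([0.0], [1.0]), ([3.5], [2.0])]
func = CustomMaeFunc()
partials = [func.map(pred, act, 1.0, 0.0, None) for pred, act in rows]
mae = func.metric(fold(func.reduce, partials))
print(mae)  # 1.0
```

Keeping the per-row work in map and the aggregation in reduce is what allows the metric to be computed in a distributed fashion across the cluster.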


Marek Novotný, Wendy Wong, Adam Valenta, Tomáš Frýda, Veronika Maurerova, Bartosz Krasinski, Yuliia Syzon, Sebastien Poirier, Hannah Tillman


Marek Novotny

Marek is a software engineer focusing on development of the Sparkling Water project. He obtained a master's degree in computer science at Charles University in Prague. Before Marek joined the team, he spent several years in the financial industry developing scalable and fault-tolerant software systems. He is excited about learning new things and open-source software.


Wendy Wong

Wendy is a hacker who devises solutions for making systems smarter. Previously, she built intelligent applications on mobile devices to recognize user activities from sensor data and predict user app usage from user logs at Lab126. At Intel Labs and Aperto Networks, she was a system engineer/architect designing wireless communication systems for WiFi and cellular networks. Wendy obtained her bachelor's degree in electrical engineering from Purdue University and a master's degree and Ph.D. in electrical engineering from Cornell. She loves machine learning, swarm intelligence, mathematics, and wireless communication systems. She enjoys being involved with all phases of the product development cycle: requirements analysis (what are we building and how well does it need to perform), architecture/algorithm design, performance analysis and simulation, prototyping/implementation, integration, test, and verification. Her diverse interests and skills are reflected in her patents. In her spare time, Wendy loves learning new things, being active, reading, and scuba diving. She loves the ocean and wishes she could be an amphibian someday.