April 10th, 2019

H2O-3, Sparkling Water and Enterprise Steam Updates

Category: Community, Data Science, H2O Release, Technical

We are excited to announce new releases of H2O Core, Sparkling Water, and Enterprise Steam.

Below are some of the new features we have added:


Yates – 3/31/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html


Bug

  • [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local machine.
  • [PUBDEV-6189] – Fixed an issue in as_date that occurred when the column included NAs.
  • [PUBDEV-6208] – AutoML no longer fails if one of the Stacked Ensemble models is deleted.
  • [PUBDEV-6230] – Removed ellipses after the H2O server link when launching the Python client.
  • [PUBDEV-6231] – In Deep Learning, fixed an issue that occurred when running one-hot-encoding on categoricals.
  • [PUBDEV-6266] – In predictions, fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model.
  • [PUBDEV-6284] – The Python API no longer reverses the labels for positive and negative values in the standardized coefficients plot legend.
  • [PUBDEV-6346] – In R, fixed an issue that caused group_by mean to only calculate one column when multiple columns were specified.
  • [PUBDEV-6350] – Fixed an issue that caused the confusion_matrix method to return matrices for other metrics.
  • [PUBDEV-6357] – Fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model using Python.
  • [PUBDEV-6360] – Improved the error message that displays when a user attempts to modify an Enum/categorical column as if it were a string.
  • [PUBDEV-6367] – Rows that start with a # symbol are no longer dropped during the import process.
  • [PUBDEV-6368] – Fixed an SVM import failure.
  • [PUBDEV-6376] – Fixed an issue that caused the default StackedEnsemble prediction to fail when applied to a test dataset without a response column.
  • [PUBDEV-6379] – Fixed handling of BAD state in CategoricalWrapperVec.

New Feature

  • [PUBDEV-4680] – Added Blending mode to Stacked Ensembles, which can be specified with the `blending_frame` parameter. In Blending mode, the metalearner is not trained on cross-validation predictions. Instead, the base models are scored on a holdout set, and those predicted values are used to train the metalearner.
  • [PUBDEV-5801] – Model output now includes column names and types.
  • [PUBDEV-5809] – AutoML now includes a max_runtime_secs_per_model option.
  • [PUBDEV-5925] – In GLM, added support for negative binomial family.
  • [PUBDEV-6056] – For GBM and XGBoost models, users can now generate feature contributions (SHAP values).
  • [PUBDEV-6136] – Added support for Generic Models, which provide a means to use external, pretrained MOJO models in H2O for scoring. Currently only GBM, DRF, IF, and GLM MOJO models are supported.
  • [PUBDEV-6180] – Added the blending_frame parameter to Stacked Ensembles in Flow.
  • [PUBDEV-6196] – Added an include_algos parameter to AutoML in the R and Python APIs. Note that in Flow, users can specify exclude_algos only.
  • [PUBDEV-6339] – In the R and Python clients, added a function that calculates the chunk size based on raw size of the data, number of CPU cores, and number of nodes.
  • [PUBDEV-6344] – Added ability to import from Hive using metadata from Metastore.
  • [PUBDEV-6358] – Users can now choose the database where import_sql_select creates a temporary table.
  • [PUBDEV-6365] – Added support for monotonicity constraints for binomial GBMs.
  • [PUBDEV-6374] – Users can now define custom HTTP headers using an `-add_http_header` option.
  • [PUBDEV-6386] – XGBoost MOJO now uses Java predictor by default.
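
To make the Blending mode described in PUBDEV-4680 more concrete, here is a minimal conceptual sketch in plain Python. It is not H2O's implementation: the two toy base models, the single-weight metalearner, and all function names are assumptions made for illustration. The point it shows is that the base models are fit on the training data only, while the metalearner is fit on their predictions for a separate holdout (blending) set.

```python
from statistics import mean

def fit_mean_model(xs, ys):
    """Toy base model A: always predicts the mean of the training targets."""
    mu = mean(ys)
    return lambda x: mu

def fit_slope_model(xs, ys):
    """Toy base model B: least-squares line through the origin."""
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: slope * x

def fit_blend_weight(model_a, model_b, blend_xs, blend_ys):
    """Toy metalearner: find the weight w in [0, 1] that minimizes the squared
    error of w * A(x) + (1 - w) * B(x) on the holdout (blending) set."""
    pa = [model_a(x) for x in blend_xs]   # base model predictions on holdout
    pb = [model_b(x) for x in blend_xs]
    diff = [a - b for a, b in zip(pa, pb)]
    num = sum(d * (y - b) for d, y, b in zip(diff, blend_ys, pb))
    den = sum(d * d for d in diff)
    w = num / den if den else 0.5         # identical models: split evenly
    return min(1.0, max(0.0, w))          # keep the blend convex

def blend_predict(model_a, model_b, w, x):
    """Ensemble prediction: weighted combination of the base models."""
    return w * model_a(x) + (1 - w) * model_b(x)

# Base models see only the training data ...
train_xs, train_ys = [1, 2, 3, 4], [2, 4, 6, 8]
model_a = fit_mean_model(train_xs, train_ys)
model_b = fit_slope_model(train_xs, train_ys)

# ... and the metalearner sees only holdout predictions (no cross-validation).
blend_xs, blend_ys = [5, 6], [9, 11]
w = fit_blend_weight(model_a, model_b, blend_xs, blend_ys)
print(round(w, 3), round(blend_predict(model_a, model_b, w, 7), 3))
```

In H2O itself, the holdout data would be passed through the `blending_frame` parameter rather than handled by hand as above.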


  • [PUBDEV-4982] – Fixed an issue that caused Python tests to sometimes fail when run inside a Docker container.
  • [PUBDEV-5876] – Simplified and improved the GLM COD implementation.


  • [PUBDEV-5491] – SQLite support is available via any JDBC driver in streaming mode.
  • [PUBDEV-5993] – Updated Retrofit and okHttp dependencies.
  • [PUBDEV-6129] – Target Encoding is now available in the Python client.
  • [PUBDEV-6176] – Moved StackedEnsembleModel to hex.ensemble packages. In prior versions, this was in a root hex package.
  • [PUBDEV-6188] – Secret key ID and secret key are available for s3:// AWS protocol.
    • This can be done in the R client using: `h2o.setS3Credentials(accessKeyId, accessSecretKey)`
    • And in the Python client using: `from h2o.persist import set_s3_credentials`, followed by `set_s3_credentials(access_key_id, secret_access_key)`
  • [PUBDEV-6217] – Users can now specify AWS credentials at runtime.
  • [PUBDEV-6254] – The new blending_frame parameter is now available in AutoML.
  • [PUBDEV-6334] – Fixed an error in the Javadoc for the Frame.java sort function.
  • [PUBDEV-6363] – Fixed Hive delegation token generation.
  • [PUBDEV-6388] – Reordered the algorithms trained in AutoML and prioritized hardcoded XGBoost models.
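
As an illustration of the Python call mentioned under PUBDEV-6188, the sketch below sets S3 credentials at runtime. Reading the keys from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables and the two helper function names are assumptions for this example. Only `set_s3_credentials` from `h2o.persist` comes from the release notes above. The H2O imports are deferred so that the credential lookup can be exercised without H2O installed.

```python
import os

def load_aws_credentials(env=None):
    """Hypothetical helper: read the key pair from environment-style variables."""
    env = os.environ if env is None else env
    try:
        return env["AWS_ACCESS_KEY_ID"], env["AWS_SECRET_ACCESS_KEY"]
    except KeyError as missing:
        raise RuntimeError(f"Missing AWS credential variable: {missing}") from None

def configure_h2o_s3(env=None):
    """Hypothetical helper: pass the credentials to a running H2O cluster."""
    # Imported lazily; requires the h2o package and a reachable cluster.
    import h2o
    from h2o.persist import set_s3_credentials

    h2o.init()  # connect to (or start) a local H2O cluster
    access_key_id, secret_access_key = load_aws_credentials(env)
    set_s3_credentials(access_key_id, secret_access_key)
```

Keeping the keys out of the script itself is a design choice of this sketch, not something the release prescribes.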


Docs

  • [PUBDEV-4977] – Removed FAQ indicating that Java 9 was not yet supported.
  • [PUBDEV-6136] – Added a “Generic Models” chapter to the Algorithms section.
  • [PUBDEV-6179] – Added the blending_frame parameter to Stacked Ensembles documentation.
  • [PUBDEV-6280] – Added information about the Negative Binomial family to the GLM booklet and the user guide.
  • [PUBDEV-6289] – Improved the R and Python client documentation for the `sum` function.
  • [PUBDEV-6331] – Added include_algos, exclude_algos, max_models, and max_runtime_secs_per_model examples to the Parameters appendix.
  • [PUBDEV-6362] – In the User Guide and R and Python documentation, replaced references to “H2O Cloud” with “H2O Cluster”.
  • [PUBDEV-6375] – Added information about predict_contributions to the Performance and Prediction chapter.
  • [PUBDEV-6381] – In the GBM chapter, noted that monotone_constraints is available for Bernoulli distributions in addition to Gaussian distributions.
  • Improved the GBM Reproducibility FAQ.
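
The AutoML parameters named in this release (include_algos, exclude_algos, max_models, and max_runtime_secs_per_model) might be combined as in the following sketch for the Python client. The helper names `automl_settings` and `run_automl` are hypothetical, and the check that include_algos and exclude_algos are not both set reflects an assumption that the two options are mutually exclusive. The H2O import is deferred so the settings helper can run on its own.

```python
def automl_settings(include_algos=None, exclude_algos=None,
                    max_models=None, max_runtime_secs_per_model=None):
    """Hypothetical helper: collect AutoML keyword arguments, dropping unset ones."""
    if include_algos and exclude_algos:
        # Assumption: an allow-list and a deny-list should not be mixed.
        raise ValueError("Specify include_algos or exclude_algos, not both.")
    candidates = {
        "include_algos": include_algos,
        "exclude_algos": exclude_algos,
        "max_models": max_models,
        "max_runtime_secs_per_model": max_runtime_secs_per_model,
    }
    return {name: value for name, value in candidates.items() if value is not None}

def run_automl(train, y, **settings):
    """Hypothetical helper: train AutoML on an H2OFrame with the given settings."""
    # Imported lazily; requires the h2o package and a running cluster.
    from h2o.automl import H2OAutoML

    aml = H2OAutoML(**settings)
    aml.train(y=y, training_frame=train)
    return aml

# Example settings: only GBM and XGBoost, at most 10 models, 60 seconds each.
settings = automl_settings(include_algos=["GBM", "XGBoost"],
                           max_models=10, max_runtime_secs_per_model=60)
print(settings)
```

With a cluster running, `run_automl(train, "response", **settings)` would start the search, where `train` and the column name `"response"` stand in for your own data.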

Xu – 3/13/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/6/index.html


  • [PUBDEV-6335] – In GBM, added a check to ensure that monotonicity constraints can only be used when distribution="gaussian".
  • [PUBDEV-6342] – Fixed an issue that caused decreasing monotonic constraints to fail to work correctly. Min-Max bounds are now properly propagated to the subtrees.


  • [PUBDEV-6343] – Added internal validation of monotonicity of GBM trees.
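
The monotonicity items above concern checks inside the GBM trees themselves. As a rough outside-in complement, the sketch below tests whether a fitted model's predictions respect a monotonic constraint along one feature. It is not H2O's internal validation: the function name, the `predict` callable, and the grid of feature values are all assumptions for illustration.

```python
def is_monotone(predict, grid, direction=1, tol=1e-12):
    """Return True if predict(x) never moves against `direction` over the grid.

    direction=+1 checks for a non-decreasing response,
    direction=-1 checks for a non-increasing response.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    preds = [predict(x) for x in sorted(grid)]
    # Every consecutive step must go the required way (within tolerance).
    return all(direction * (b - a) >= -tol for a, b in zip(preds, preds[1:]))

# Stand-in models: a step function (tree-like) and a U-shaped response.
step_model = lambda x: 1.0 if x < 2 else 3.0
u_shaped_model = lambda x: (x - 2) ** 2

print(is_monotone(step_model, range(5)))                 # increasing constraint
print(is_monotone(u_shaped_model, range(5)))             # violates it
print(is_monotone(lambda x: -x, range(5), direction=-1)) # decreasing constraint
```

In practice, `predict` would wrap a call to the trained model with all other features held fixed.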


Sparkling Water:

v2.4.9 – 04/03/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/9/index.html


  • SW-1162 – Exception when there is a column with BOOLEAN type in dataset during H2OMOJOModel transformation
  • SW-1177 – In Pysparkling script, setting --driver-class-path influences the environment
  • SW-1178 – Upgrade to H2O
  • SW-1180 – Use specific metrics in grid search, in the same way as H2O Grid
  • SW-1181 – Document off heap memory configuration for Spark in Standalone mode/IBM conductor
  • SW-1182 – Fix random project name generation in H2OAutoML Spark Wrapper

New Feature

  • SW-1167 – Expose search_criteria for H2OGridSearch
  • SW-1174 – Expose H2OGridSearch models
  • SW-1183 – Add include algos to H2O AutoML pipeline stage & ability to ignore XGBoost


  • SW-1164 – Add Sparkling Water to Jupyter spark/pyspark kernels in EMR terraform template
  • SW-1171 – Upgrade build to Gradle 5.2.1
  • SW-1175 – Integrate with H2O native hive support

v2.3.26 – 03/15/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.3/26/index.html


  • SW-1163 – Expose missing variables in shared TF EMR SW template


  • SW-1145 – Start jupyter notebook with Scala & Python Spark in AWS EMR Terraform template
  • SW-1165 – Upgrade to H2O

v2.3.25 – 03/07/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.3/25/index.html


  • SW-1150 – hc.stop() shows ‘exit’ not defined error
  • SW-1152 – Fix RSparkling in case the jars are being fetched from maven
  • SW-1156 – H2OXgboost pipeline stage does not define updateH2OParams method
  • SW-1159 – Unique project name in automl to avoid sharing one leaderboard
  • SW-1161 – Fix grid search pipeline step on pyspark side


  • SW-1052 – Document Terraform scripts for AWS
  • SW-1089 – Document using Google Cloud Storage In Sparkling Water
  • SW-1135 – Speed up conversion between sparse spark vectors and h2o frames by using sparse new chunk
  • SW-1141 – Improve terraform templates for AWS EMR and make them part of the release process
  • SW-1147 – Integrate with Spark 2.3.3
  • SW-1149 – Allow login via ssh to created cluster using terraform
  • SW-1153 – Add H2OGridSearch pipeline stage to PySpark
  • SW-1155 – Test GBM Grid Search Scala pipeline step
  • SW-1158 – Generalize H2OGridSearch Pipeline step to support other available algos
  • SW-1160 – Upgrade H2O to

Enterprise Steam:

Version 1.4.7 – 04/03/2019

  • Fix Sparkling Water proxy issue with uppercase usernames
  • Improve uploading h2o-3 engines
  • Set SPARK_YARN_MODE correctly based on the Hadoop distribution

Version 1.4.6 – 04/01/2019

  • Added ability to choose H2O-3 Leader Node when starting a cluster
  • Added ability to control the number of clusters a user can spin up per cluster profile
  • Added option to select default Sparkling Water backend
  • Added automatic redirection back to login with an expired session cookie
  • Added ability to auto-assign Steam profiles according to SAML profiles
  • Docs: Add “Before you begin installation” section
  • Docs: Documented steam.yaml configuration options
  • Docs: Updated documentation
  • Fix an issue where Steam was hitting API endpoints of dead clusters
  • Fix an issue where hadoop-unjar files were not deleted from the temp directory
  • Fix issue with uppercase usernames and Sparkling Water on Hadoop

Version 1.4.5 – 03/22/2019

  • Added Configurable Steam Web UI timeout (STEAM_WEB_UI_TIMEOUT_MIN)

Version 1.4.4 – 02/20/2019

  • Make log file permissions configurable (STEAM_LOG_PERMISSIONS)
  • H2O: Communicate with cluster using leader node only
  • SW: Added support for Hive tables
  • SW: Disable Spark dynamic allocation for internal backend
  • SW: Bundle and distribute all pysparkling dependencies
  • LDAP group configuration is no longer mandatory
  • Bug fixes for Jupyterhub
  • Bug fixes for Sparkling Water params
  • Bug fixes for CDH5

Please see links below for additional details on H2O & Sparkling Water.

Release Notes:




H2O & Sparkling Water Documentation:



If you have any questions, please reach out to support@h2o.ai

Venkatesh Yadav

About the Author

Venkatesh Yadav

Venkatesh is a Software Engineering Leader at heart, with a focus on building great teams that deliver amazing products and customer happiness. He serves H2O as VP of Engineering Services. He joined the company from Adobe Systems, where he held a number of positions in the Software Engineering and Leadership space, including his latest role as Sr. Manager, Software Engineering and Product Management, with a primary focus on Master Data Management and Data Science. Venkatesh played an instrumental Engineering and Product Management leadership role as an “Entrepreneur in Residence” in various key strategic programs and initiatives such as Adobe@Adobe, Adobe.io and Adobe.Data. He has experience managing and working with teams across the globe, in the US, Canada, Switzerland, Romania, and India, with a focus on value creation. Prior to Adobe Systems, Venkatesh served in various engineering roles at technology companies such as Philips, HP and IBM. Venkatesh holds a Bachelor of Commerce degree from Mumbai University, India, and has completed the Product Management program at UC Berkeley and the General Business Administration and Management program at McGill University. Connect with Venkatesh (@venkateshai)
