Community, Data Science, H2O Release, Technical

H2O-3, Sparkling Water and Enterprise Steam Updates

Published: April 10, 2019

Written by: Venkatesh Yadav

We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam.

Below are some of the new features we have added:

H2O-3

Yates (3.24.0.1) – 3/31/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-yates/1/index.html

Bug

[PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local machine.
[PUBDEV-6189] – Fixed an issue in as_date that occurred when the column included NAs.
[PUBDEV-6208] – AutoML no longer fails if one of the Stacked Ensemble models is deleted.
[PUBDEV-6230] – Removed ellipses after the H2O server link when launching the Python client.
[PUBDEV-6231] – In Deep Learning, fixed an issue that occurred when running one-hot-encoding on categoricals.
[PUBDEV-6266] – In predictions, fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model.
[PUBDEV-6284] – The Python API no longer reverses the labels for positive and negative values in the standardized coefficients plot legend.
[PUBDEV-6346] – In R, fixed an issue that caused group_by mean to only calculate one column when multiple columns were specified.
[PUBDEV-6350] – Fixed an issue that caused the confusion_matrix method to return matrices for other metrics.
[PUBDEV-6357] – Fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model using Python.
[PUBDEV-6360] – Improved the error message that displays when a user attempts to modify an Enum/categorical column as if it were a string.
[PUBDEV-6367] – Rows that start with a # symbol are no longer dropped during the import process.
[PUBDEV-6368] – Fixed an SVM import failure.
[PUBDEV-6376] – Fixed an issue that caused the default StackedEnsemble prediction to fail when applied to a test dataset without a response column.
[PUBDEV-6379] – Fixed handling of BAD state in CategoricalWrapperVec.

New Feature

[PUBDEV-4680] – Added Blending mode to Stacked Ensembles, which can be specified with the `blending_frame` parameter. With Blending mode, you do not use cross-validation preds to train the metalearner. Instead you score the base models on a holdout set and use those predicted values.
[PUBDEV-5801] – Model output now includes column names and types.
[PUBDEV-5809] – AutoML now includes a max_runtime_secs_per_model option.
[PUBDEV-5925] – In GLM, added support for negative binomial family.
[PUBDEV-6056] – For GBM and XGBoost models, users can now generate feature contributions (SHAP values).
[PUBDEV-6136] – Added support for Generic Models, which provide a means to use external, pretrained MOJO models in H2O for scoring. Currently only GBM, DRF, IF, and GLM MOJO models are supported.
[PUBDEV-6180] – Added the blending_frame parameter to Stacked Ensembles in Flow.
[PUBDEV-6196] – Added an include_algos parameter to AutoML in the R and Python APIs. Note that in Flow, users can specify exclude_algos only.
[PUBDEV-6339] – In the R and Python clients, added a function that calculates the chunk size based on raw size of the data, number of CPU cores, and number of nodes.
[PUBDEV-6344] – Added ability to import from Hive using metadata from Metastore.
[PUBDEV-6358] – Users can now choose the database where import_sql_select creates a temporary table.
[PUBDEV-6365] – Added support for monotonicity constraints for binomial GBMs.
[PUBDEV-6374] – Users can now define custom HTTP headers using an `-add_http_header` option.
[PUBDEV-6386] – XGBoost MOJO now uses Java predictor by default.
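
The Blending mode described in PUBDEV-4680 can be illustrated without H2O itself. The following is a minimal pure-Python sketch, not H2O's implementation: it assumes two toy base models (a mean predictor and a one-variable least-squares line) and a metalearner that picks a single blending weight from the holdout predictions. All function names here are hypothetical. In H2O, the holdout data would be passed through the `blending_frame` parameter instead.

```python
# Conceptual sketch of blending. All names are hypothetical, not H2O API.

def fit_mean(xs, ys):
    """Base model 1: always predicts the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Base model 2: one-variable least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_blend_weight(p1, p2, ys):
    """Metalearner: weight w minimising sum((w*p1 + (1-w)*p2 - y)^2)."""
    num = sum((a - b) * (y - b) for a, b, y in zip(p1, p2, ys))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    return num / den if den else 0.5


def train_blended(train, holdout):
    """Fit base models on `train`, then fit the metalearner on `holdout`."""
    xs, ys = train
    base = [fit_mean(xs, ys), fit_line(xs, ys)]
    hx, hy = holdout
    # Score the base models on the holdout set (no cross-validation preds).
    p1 = [base[0](x) for x in hx]
    p2 = [base[1](x) for x in hx]
    w = fit_blend_weight(p1, p2, hy)
    return (lambda x: w * base[0](x) + (1 - w) * base[1](x)), w


train = ([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
holdout = ([5.0, 6.0], [10.0, 12.0])
ensemble, weight = train_blended(train, holdout)
```

Because the toy data is exactly linear, the metalearner puts all of its weight on the line model. The point of the sketch is only that the metalearner sees predictions on rows the base models were not trained on.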

Task

[PUBDEV-4982] – Fixed an issue that caused Python tests to sometimes fail when run inside a Docker container.
[PUBDEV-5876] – Simplified and improved the GLM COD implementation.

Improvement

[PUBDEV-5491] – SQLite support is available via any JDBC driver in streaming mode.
[PUBDEV-5993] – Updated Retrofit and okHttp dependencies.
[PUBDEV-6129] – Target Encoding is now available in the Python client.
[PUBDEV-6176] – Moved StackedEnsembleModel to hex.ensemble packages. In prior versions, this was in a root hex package.
[PUBDEV-6188] – Secret key ID and secret key are available for s3:// AWS protocol.
- This can be done in the R client using: h2o.setS3Credentials(accessKeyId, accesSecretKey)
- And in the Python client using: `from h2o.persist import set_s3_credentials` followed by `set_s3_credentials(access_key_id, secret_access_key)`
[PUBDEV-6217] – Users can now specify AWS credentials at runtime.
[PUBDEV-6254] – The new blending_frame parameter is now available in AutoML.
[PUBDEV-6334] – Fixed an error in the Javadoc for the Frame.java sort function.
[PUBDEV-6363] – Fixed Hive delegation token generation.
[PUBDEV-6388] – Reordered the algorithms trained in AutoML and prioritized hardcoded XGBoost models.

Docs

[PUBDEV-4977] – Removed FAQ indicating that Java 9 was not yet supported.
[PUBDEV-6136] – Added a “Generic Models” chapter to the Algorithms section.
[PUBDEV-6179] – Added the blending_frame parameter to Stacked Ensembles documentation.
[PUBDEV-6280] – Added information about the Negative Binomial family to the GLM booklet and the user guide.
[PUBDEV-6289] – Improved the R and Python client documentation for the `sum` function.
[PUBDEV-6331] – Added include_algos, exclude_algos, max_models, and max_runtime_secs_per_model examples to the Parameters appendix.
[PUBDEV-6362] – In the User Guide and R and Python documentation, replaced references to “H2O Cloud” with “H2O Cluster”.
[PUBDEV-6375] – Added information about predict_contributions to the Performance and Prediction chapter.
[PUBDEV-6381] – In the GBM chapter, noted that monotone_constraints is available for Bernoulli distributions in addition to Gaussian distributions.
Improved the GBM Reproducibility FAQ.
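
The feature contributions (SHAP values) mentioned in PUBDEV-6056 and PUBDEV-6375 have a useful property: the per-feature contributions plus a bias term add up to the model's prediction for that row. The sketch below illustrates this with exact Shapley values computed by brute force over feature orderings. It is an assumption-laden toy, not the algorithm H2O uses for trees: `toy_model`, the all-zero baseline row, and the function names are hypothetical.

```python
from itertools import permutations


def shapley_contributions(model, row, baseline):
    """Exact Shapley values for one row, averaged over all feature orderings.

    Features not yet "revealed" take their value from the baseline row.
    Returns (contributions, bias), where bias = model(baseline).
    """
    n = len(row)
    contribs = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = row[i]          # reveal feature i
            new = model(current)
            contribs[i] += new - prev    # marginal effect of feature i
            prev = new
    contribs = [c / len(orderings) for c in contribs]
    return contribs, model(list(baseline))


def toy_model(x):
    # Hypothetical model with an interaction between features 1 and 2.
    return 2.0 * x[0] + x[1] * x[2] + 1.0


row = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
contribs, bias = shapley_contributions(toy_model, row, baseline)
```

Here the interaction term is split evenly between the two features involved, and the contributions together with the bias reproduce `toy_model(row)`.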

Xu (3.22.1.6) – 3/13/2019

Download at: http://h2o-release.s3.amazonaws.com/h2o/rel-xu/6/index.html

Bug

[PUBDEV-6335] – In GBM, added a check to ensure that monotonicity constraints can only be used when `distribution="gaussian"`.
[PUBDEV-6342] – Fixed an issue that caused decreasing monotonic constraints to fail to work correctly. Min-Max bounds are now properly propagated to the subtrees.

Improvement

[PUBDEV-6343] – Added internal validation of monotonicity of GBM trees.
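
The release notes do not describe how the internal validation in PUBDEV-6343 works. As a rough illustration of what checking monotonicity of a tree can mean, the sketch below tests a sufficient condition on a toy tree stored as nested dictionaries: at every split on the constrained feature, no leaf on the left (smaller feature values) may exceed any leaf on the right. The tree layout, the function names, and the assumption that the left branch holds the smaller values are all hypothetical.

```python
def leaf_range(node):
    """Return (min, max) of the leaf values under a node."""
    if "value" in node:
        return node["value"], node["value"]
    lo_l, hi_l = leaf_range(node["left"])
    lo_r, hi_r = leaf_range(node["right"])
    return min(lo_l, lo_r), max(hi_l, hi_r)


def is_monotone(node, feature, direction=1):
    """Check a sufficient condition for monotonicity in `feature`.

    direction=1 checks non-decreasing, direction=-1 checks non-increasing.
    Assumes the left branch of a split holds the smaller feature values.
    """
    if "value" in node:
        return True
    if node["feature"] == feature:
        lo_l, hi_l = leaf_range(node["left"])
        lo_r, hi_r = leaf_range(node["right"])
        ok = hi_l <= lo_r if direction == 1 else lo_l >= hi_r
        if not ok:
            return False
    return (is_monotone(node["left"], feature, direction)
            and is_monotone(node["right"], feature, direction))


good_tree = {
    "feature": "age",
    "left": {"value": 1.0},
    "right": {
        "feature": "income",
        "left": {"value": 2.0},
        "right": {"value": 3.0},
    },
}

bad_tree = {
    "feature": "age",
    "left": {
        "feature": "income",
        "left": {"value": 1.0},
        "right": {"value": 5.0},
    },
    "right": {"value": 2.0},
}

good_increasing = is_monotone(good_tree, "age", 1)
bad_increasing = is_monotone(bad_tree, "age", 1)
```

In `bad_tree`, one leaf under the left branch of the `age` split is larger than the leaf on the right, so the check fails. Violations of this kind are what propagating min-max bounds into subtrees is meant to prevent.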

Docs

[PUBDEV-6337] – Updated the description of monotone_constraints for GBM. This option can only be used for gaussian distributions.
[PUBDEV-6347] – Improved documentation for the EC2 and S3 storage topic for AWS Standalone instances (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/cloud-integration/ec2-and-s3.html#aws-standalone-instance).

Sparkling Water:

v2.4.9 – 04/03/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.4/9/index.html

Bug

SW-1162 – Exception when there is a column with BOOLEAN type in dataset during H2OMOJOModel transformation
SW-1177 – In Pysparkling script, setting --driver-class-path influences the environment
SW-1178 – Upgrade to H2O 3.24.0.1
SW-1180 – Use specific metrics in grid search, in the same way as H2O Grid
SW-1181 – Document off heap memory configuration for Spark in Standalone mode/IBM conductor
SW-1182 – Fix random project name generation in H2OAutoML Spark Wrapper

New Feature

SW-1167 – Expose search_criteria for H2OGridSearch
SW-1174 – Expose H2OGridSearch models
SW-1183 – Add include Algos to H2O AutoML pipeline stage & ability to ignore XGBoost

Improvement

SW-1164 – Add Sparkling Water to Jupyter spark/pyspark kernels in EMR terraform template
SW-1171 – Upgrade build to Gradle 5.2.1
SW-1175 – Integrate with H2O native hive support

v2.3.26 – 03/15/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.3/26/index.html

Bug

SW-1163 – Expose missing variables in shared TF EMR SW template

Improvement

SW-1145 – Start jupyter notebook with Scala & Python Spark in AWS EMR Terraform template
SW-1165 – Upgrade to H2O 3.22.1.6

v2.3.25 – 03/07/2019

Download at: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.3/25/index.html

Bug

SW-1150 – hc.stop() shows ‘exit’ not defined error
SW-1152 – Fix RSparkling in case the jars are being fetched from maven
SW-1156 – H2OXgboost pipeline stage does not define updateH2OParams method
SW-1159 – Unique project name in automl to avoid sharing one leaderboard
SW-1161 – Fix grid search pipeline step on pyspark side

Improvement

SW-1052 – Document Terraform scripts for AWS
SW-1089 – Document using Google Cloud Storage In Sparkling Water
SW-1135 – Speed up conversion between sparse spark vectors and h2o frames by using sparse new chunk
SW-1141 – Improve terraform templates for AWS EMR and make them part of the release process
SW-1147 – Integrate with Spark 2.3.3
SW-1149 – Allow login via ssh to created cluster using terraform
SW-1153 – Add H2OGridSearch pipeline stage to PySpark
SW-1155 – Test GBM Grid Search Scala pipeline step
SW-1158 – Generalize H2OGridSearch Pipeline step to support other available algos
SW-1160 – Upgrade H2O to 3.22.1.5

Enterprise Steam:

Version 1.4.7 – 04/03/2019

Fix Sparkling Water proxy issue with uppercase usernames
Improve uploading h2o-3 engines
Set SPARK_YARN_MODE correctly based on the Hadoop distribution

Version 1.4.6 – 04/01/2019

Added ability to choose H2O-3 Leader Node when starting a cluster
Added ability to control the number of clusters a user can spin up per cluster profile
Added option to select default Sparkling Water backend
Added automatic redirection back to login with an expired session cookie
Added the ability to auto-assign Steam profiles according to SAML profiles
Docs: Add “Before you begin installation” section
Docs: Documented steam.yaml configuration options
Docs: Updated documentation
Fix an issue when Steam was hitting API endpoints of dead clusters
Fix an issue when hadoop-unjar files were not deleted from temp directory
Fix issue with uppercase usernames and Sparkling Water on Hadoop

Version 1.4.5 – 03/22/2019

Added Configurable Steam Web UI timeout (STEAM_WEB_UI_TIMEOUT_MIN)

Version 1.4.4 – 02/20/2019

Make log file permissions configurable (STEAM_LOG_PERMISSIONS)
H2O: Communicate with cluster using leader node only
SW: Added support for Hive tables
SW: Disable Spark dynamic allocation for internal backend
SW: Bundle and distribute all pysparkling dependencies
LDAP group configuration is no longer mandatory
Bug fixes for Jupyterhub
Bug fixes for Sparkling Water params
Bug fixes for CDH5

Please see links below for additional details on H2O & Sparkling Water.

Release Notes:

https://github.com/h2oai/h2o-3/blob/master/Changes.md

http://docs.h2o.ai/sparkling-water/2.3/latest-stable/doc/CHANGELOG.html

https://s3.amazonaws.com/steam-release/enterprise-steam/STEAM-1.4.3.82/docs/user-docs/_build/html/ReleaseNotes.html

H2O & Sparkling Water Documentation:

http://docs.h2o.ai/h2o/latest-stable/h2o-docs/index.html

http://docs.h2o.ai/sparkling-water/2.3/latest-stable/doc/index.html

If you have any questions, please reach out to support@h2o.ai

Thanks,
Venkatesh Yadav

Venkatesh Yadav

VP of Engineering

A software engineering leader at heart, with a focus on building great teams that deliver amazing products and customer happiness, Venkatesh serves H2O as VP of Engineering Services. He joined the company from Adobe Systems, where he held a number of software engineering and leadership positions, most recently Sr. Manager, Software Engineering and Product Management, with a primary focus on Master Data Management and Data Science. Venkatesh played an instrumental engineering and product management leadership role as an “Entrepreneur in Residence” in key strategic programs and initiatives such as Adobe@Adobe, Adobe.io and Adobe.Data. He has experience managing and working with teams across the globe, in the US, Canada, Switzerland, Romania and India, with a focus on value creation. Prior to Adobe Systems, Venkatesh held various engineering roles at technology companies including Philips, HP and IBM. Venkatesh holds a Bachelor of Commerce degree from Mumbai University, India, and has completed the Product Management program at UC Berkeley and the General Business Administration and Management program at McGill University. Connect with Venkatesh (@venkateshai)
