April 10th, 2019
H2O-3, Sparkling Water and Enterprise Steam Updates
Category: Community, Data Science, H2O Release, Technical
By: Venkatesh Yadav
We are excited to announce the new release of H2O Core, Sparkling Water and Enterprise Steam.
Below are some of the new features we have added:
H2O Core
Yates – 3/31/2019
- [PUBDEV-6159] – The AutoMLTest.java test suite now runs correctly on a local machine.
- [PUBDEV-6189] – Fixed an issue in as_date that occurred when the column included NAs.
- [PUBDEV-6208] – AutoML no longer fails if one of the Stacked Ensemble models is deleted.
- [PUBDEV-6230] – Removed ellipses after the H2O server link when launching the Python client.
- [PUBDEV-6231] – In Deep Learning, fixed an issue that occurred when running one-hot-encoding on categoricals.
- [PUBDEV-6266] – In predictions, fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model.
- [PUBDEV-6284] – The Python API no longer reverses the labels for positive and negative values in the standardized coefficients plot legend.
- [PUBDEV-6346] – In R, fixed an issue that caused group_by mean to only calculate one column when multiple columns were specified.
- [PUBDEV-6350] – Fixed an issue that caused the confusion_matrix method to return matrices for other metrics.
- [PUBDEV-6357] – Fixed an issue that resulted in a “Categorical value out of bounds error” when calling a model using Python.
- [PUBDEV-6360] – Improved the error message that displays when a user attempts to modify an Enum/categorical column as if it were a string.
- [PUBDEV-6367] – Rows that start with a # symbol are no longer dropped during the import process.
- [PUBDEV-6368] – Fixed an SVM import failure.
- [PUBDEV-6376] – Fixed an issue that caused the default StackedEnsemble prediction to fail when applied to a test dataset without a response column.
- [PUBDEV-6379] – Fixed handling of BAD state in CategoricalWrapperVec.
- [PUBDEV-4680] – Added Blending mode to Stacked Ensembles, which can be specified with the `blending_frame` parameter. In Blending mode, the metalearner is not trained on cross-validated predictions; instead, the base models are scored on a holdout frame, and those predicted values are used.
- [PUBDEV-5801] – Model output now includes column names and types.
- [PUBDEV-5809] – AutoML now includes a max_runtime_secs_per_model option.
- [PUBDEV-5925] – In GLM, added support for negative binomial family.
- [PUBDEV-6056] – For GBM and XGBoost models, users can now generate feature contributions (SHAP values).
- [PUBDEV-6136] – Added support for Generic Models, which provide a means to use external, pretrained MOJO models in H2O for scoring. Currently only GBM, DRF, IF, and GLM MOJO models are supported.
- [PUBDEV-6180] – Added the blending_frame parameter to Stacked Ensembles in Flow.
- [PUBDEV-6196] – Added an include_algos parameter to AutoML in the R and Python APIs. Note that in Flow, users can specify exclude_algos only.
- [PUBDEV-6339] – In the R and Python clients, added a function that calculates the chunk size based on raw size of the data, number of CPU cores, and number of nodes.
- [PUBDEV-6344] – Added ability to import from Hive using metadata from Metastore.
- [PUBDEV-6358] – Users can now choose the database where import_sql_select creates a temporary table.
- [PUBDEV-6365] – Added support for monotonicity constraints for binomial GBMs.
- [PUBDEV-6374] – Users can now define custom HTTP headers using an `-add_http_header` option.
- [PUBDEV-6386] – XGBoost MOJO now uses Java predictor by default.
- [PUBDEV-4982] – Fixed an issue that caused Python tests to sometimes fail when run inside a Docker container.
- [PUBDEV-5876] – Simplified and improved the GLM COD implementation.
- [PUBDEV-5491] – SQLite support is available via any JDBC driver in streaming mode.
- [PUBDEV-5993] – Updated Retrofit and okHttp dependencies.
- [PUBDEV-6129] – Target Encoding is now available in the Python client.
- [PUBDEV-6176] – Moved StackedEnsembleModel to the hex.ensemble package. In prior versions, this was in the root hex package.
- [PUBDEV-6188] – Secret key ID and secret key are now available for the s3:// AWS protocol.
- In the R client: `h2o.setS3Credentials(accessKeyId, accessSecretKey)`
- In the Python client: `from h2o.persist import set_s3_credentials; set_s3_credentials(access_key_id, secret_access_key)`
- [PUBDEV-6217] – Users can now specify AWS credentials at runtime.
- [PUBDEV-6254] – The new blending_frame parameter is now available in AutoML.
- [PUBDEV-6334] – Fixed an error in the Javadoc for the Frame.java sort function.
- [PUBDEV-6363] – Fixed Hive delegation token generation.
- [PUBDEV-6388] – Reordered the algorithms trained in AutoML and prioritized hardcoded XGBoost models.
- [PUBDEV-4977] – Removed FAQ indicating that Java 9 was not yet supported.
- [PUBDEV-6136] – Added a “Generic Models” chapter to the Algorithms section.
- [PUBDEV-6179] – Added the blending_frame parameter to Stacked Ensembles documentation.
- [PUBDEV-6280] – Added information about the Negative Binomial family to the GLM booklet and the user guide.
- [PUBDEV-6289] – Improved the R and Python client documentation for the `sum` function.
- [PUBDEV-6331] – Added include_algos, exclude_algos, max_models, and max_runtime_secs_per_model examples to the Parameters appendix.
- [PUBDEV-6362] – In the User Guide and R and Python documentation, replaced references to “H2O Cloud” with “H2O Cluster”.
- [PUBDEV-6375] – Added information about predict_contributions to the Performance and Prediction chapter.
- [PUBDEV-6381] – In the GBM chapter, noted that monotone_constraints is available for Bernoulli distributions in addition to Gaussian distributions.
- Improved the GBM Reproducibility FAQ.
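The Blending mode added in PUBDEV-4680 above can be sketched in plain Python. This is a hypothetical toy illustration of the idea only, not H2O's implementation: base models are fit on the training split, scored on a separate holdout (blending) frame, and the metalearner is fit on those holdout predictions rather than on cross-validated ones. The two toy base models and the least-squares metalearner here are invented for the illustration.

```python
# Hypothetical sketch of Stacked Ensemble "Blending mode" (not H2O's code):
# the metalearner trains on base-model predictions over a holdout frame
# rather than on cross-validated predictions.

def fit_mean_model(xs, ys):
    """Base model 1: predicts the training mean, ignoring features."""
    mean = sum(ys) / len(ys)
    return lambda x: mean

def fit_slope_model(xs, ys):
    """Base model 2: y = a*x through the origin (least squares)."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return lambda x: a * x

def fit_blended_ensemble(train, blend):
    """Train base models on `train`, then fit the metalearner
    (least-squares weights over base predictions) on `blend`."""
    xs, ys = zip(*train)
    base = [fit_mean_model(xs, ys), fit_slope_model(xs, ys)]
    # Score the base models on the holdout (blending) frame.
    bx, by = zip(*blend)
    preds = [[m(x) for x in bx] for m in base]
    # Metalearner: 2x2 normal equations for weights w over base preds.
    g = [[sum(pi * pj for pi, pj in zip(preds[i], preds[j]))
          for j in range(2)] for i in range(2)]
    r = [sum(p * y for p, y in zip(preds[i], by)) for i in range(2)]
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    w = [(r[0] * g[1][1] - r[1] * g[0][1]) / det,
         (r[1] * g[0][0] - r[0] * g[1][0]) / det]
    return lambda x: sum(wi * m(x) for wi, m in zip(w, base))

train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
blend = [(1.5, 3.0), (2.5, 5.1)]   # holdout frame, disjoint from train
ensemble = fit_blended_ensemble(train, blend)
```

With only two metalearner weights and two holdout rows, the metalearner interpolates the holdout exactly; in practice the blending frame would of course be much larger.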
Xu – 3/13/2019
- [PUBDEV-6335] – In GBM, added a check to ensure that monotonicity constraints can only be used when distribution=”gaussian”.
- [PUBDEV-6342] – Fixed an issue that caused decreasing monotonic constraints to fail to work correctly. Min-Max bounds are now properly propagated to the subtrees.
- [PUBDEV-6343] – Added internal validation of monotonicity of GBM trees.
- [PUBDEV-6337] – Updated the description of monotone_constraints for GBM. This option can only be used for gaussian distributions.
- [PUBDEV-6347] – Improved documentation for the EC2 and S3 storage topic for AWS Standalone instances (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/cloud-integration/ec2-and-s3.html#aws-standalone-instance).
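The monotonicity guarantees above (PUBDEV-6342/6343) can also be sanity-checked from the outside. Here is a hypothetical sketch, not H2O's internal tree validation: evaluate a fitted single-feature model on a sorted grid and verify predictions never decrease. The toy tree functions are invented for the illustration.

```python
# Hypothetical external check for an increasing monotonic constraint
# (illustrative only; H2O's internal GBM tree validation is not shown).

def is_monotone_increasing(predict, grid):
    """True if predictions never decrease as the feature increases."""
    preds = [predict(x) for x in sorted(grid)]
    return all(a <= b for a, b in zip(preds, preds[1:]))

# A piecewise-constant "tree" that respects the constraint ...
def good_tree(x):
    return 0.0 if x < 1.0 else (0.5 if x < 2.0 else 0.9)

# ... and one that violates it (the kind of split that min-max
# bound propagation, as in PUBDEV-6342, prevents).
def bad_tree(x):
    return 0.0 if x < 1.0 else (0.8 if x < 2.0 else 0.6)

grid = [i / 10.0 for i in range(30)]
```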
Sparkling Water
v2.4.9 – 04/03/2019
- SW-1162 – Fix an exception thrown when the dataset contains a BOOLEAN column during H2OMOJOModel transformation
- SW-1177 – Fix an issue where, in a PySparkling script, setting --driver-class-path influenced the environment
- SW-1178 – Upgrade the bundled H2O
- SW-1180 – Use specific metrics in grid search, in the same way as H2O Grid
- SW-1181 – Document off heap memory configuration for Spark in Standalone mode/IBM conductor
- SW-1182 – Fix random project name generation in H2OAutoML Spark Wrapper
- SW-1167 – Expose search_criteria for H2OGridSearch
- SW-1174 – Expose H2OGridSearch models
- SW-1183 – Add include_algos to the H2O AutoML pipeline stage and the ability to ignore XGBoost
- SW-1164 – Add Sparkling Water to Jupyter spark/pyspark kernels in EMR terraform template
- SW-1171 – Upgrade build to Gradle 5.2.1
- SW-1175 – Integrate with H2O native hive support
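SW-1167 above exposes search_criteria for H2OGridSearch. As a rough, hypothetical illustration of what a RandomDiscrete search strategy means (this sketch is not Sparkling Water's or H2O's code; the function name and grid are invented), hyperparameter combinations are sampled from the grid up to a model budget:

```python
import itertools
import random

def random_discrete_search(hyper_params, max_models, seed=42):
    """Sample up to `max_models` distinct combinations from the grid,
    mimicking a RandomDiscrete search_criteria strategy (illustrative)."""
    keys = sorted(hyper_params)
    combos = list(itertools.product(*(hyper_params[k] for k in keys)))
    rng = random.Random(seed)
    rng.shuffle(combos)  # random order; distinct by construction
    return [dict(zip(keys, c)) for c in combos[:max_models]]

grid = {"max_depth": [3, 5, 7], "learn_rate": [0.01, 0.1]}
picked = random_discrete_search(grid, max_models=4)
```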
v2.3.26 – 03/15/2019
- SW-1163 – Expose missing variables in shared TF EMR SW template
- SW-1145 – Start jupyter notebook with Scala & Python Spark in AWS EMR Terraform template
- SW-1165 – Upgrade the bundled H2O
v2.3.25 – 03/07/2019
- SW-1150 – hc.stop() shows ‘exit’ not defined error
- SW-1152 – Fix RSparkling in case the jars are being fetched from maven
- SW-1156 – H2OXgboost pipeline stage does not define updateH2OParams method
- SW-1159 – Use a unique project name in AutoML to avoid sharing one leaderboard
- SW-1161 – Fix grid search pipeline step on pyspark side
- SW-1052 – Document terraform scripts for AWS
- SW-1089 – Document using Google Cloud Storage in Sparkling Water
- SW-1135 – Speed up conversion between sparse Spark vectors and H2O frames by using a sparse new chunk
- SW-1141 – Improve terraform templates for AWS EMR and make them part of the release process
- SW-1147 – Integrate with Spark 2.3.3
- SW-1149 – Allow login via ssh to created cluster using terraform
- SW-1153 – Add H2OGridSearch pipeline stage to PySpark
- SW-1155 – Test GBM Grid Search Scala pipeline step
- SW-1158 – Generalize H2OGridSearch Pipeline step to support other available algos
- SW-1160 – Upgrade the bundled H2O
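SW-1135 above speeds up converting sparse Spark vectors into H2O frames by writing sparse chunks. A hypothetical plain-Python sketch of the underlying idea (helper names are invented; this is not the actual converter): only non-zero (index, value) pairs are stored instead of materializing every zero.

```python
def to_sparse_chunk(vector, eps=0.0):
    """Keep only (index, value) pairs for non-zero entries, the way a
    sparse chunk avoids materializing zeros (illustrative sketch)."""
    return len(vector), [(i, v) for i, v in enumerate(vector) if abs(v) > eps]

def to_dense(length, pairs):
    """Inverse operation: expand the sparse chunk back to a dense row."""
    row = [0.0] * length
    for i, v in pairs:
        row[i] = v
    return row

vec = [0.0, 0.0, 3.5, 0.0, 1.2, 0.0]
length, pairs = to_sparse_chunk(vec)
```

For mostly-zero vectors this stores far fewer values than the dense form, which is where the conversion speed-up comes from.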
Enterprise Steam
Version 1.4.7 – 04/03/2019
- Fix Sparkling Water proxy issue with uppercase usernames
- Improve uploading of H2O-3 engines
- Set SPARK_YARN_MODE correctly based on the Hadoop distribution
Version 1.4.6 – 04/01/2019
- Added ability to choose H2O-3 Leader Node when starting a cluster
- Added ability to control the number of clusters a user can spin up per cluster profile
- Added option to select default Sparkling Water backend
- Added automatic redirection back to login with an expired session cookie
- Added an ability to auto-assign Steam profiles according to SAML profiles
- Docs: Add “Before you begin installation” section
- Docs: Documented steam.yaml configuration options
- Docs: Updated documentation
- Fix an issue where Steam was hitting API endpoints of dead clusters
- Fix an issue where hadoop-unjar files were not deleted from the temp directory
- Fix an issue with uppercase usernames and Sparkling Water on Hadoop
Version 1.4.5 – 03/22/2019
- Added Configurable Steam Web UI timeout (STEAM_WEB_UI_TIMEOUT_MIN)
Version 1.4.4 – 02/20/2019
- Make log file permissions configurable (STEAM_LOG_PERMISSIONS)
- H2O: Communicate with cluster using leader node only
- SW: Added support for Hive tables
- SW: Disable Spark dynamic allocation for internal backend
- SW: Bundle and distribute all pysparkling dependencies
- LDAP group configuration is no longer mandatory
- Bug fixes for Jupyterhub
- Bug fixes for Sparkling Water params
- Bug fixes for CDH5
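Several Steam options above (STEAM_LOG_PERMISSIONS, STEAM_WEB_UI_TIMEOUT_MIN) are environment-driven. A hypothetical sketch of how an octal mode string like STEAM_LOG_PERMISSIONS could be parsed and applied (illustrative only; this is not Enterprise Steam's actual code, and the function name is invented):

```python
import os

def read_log_permissions(env=os.environ, default="0644"):
    """Parse an octal mode string such as "0600" from the
    STEAM_LOG_PERMISSIONS environment variable (hypothetical sketch
    of the configurable log-file permissions option)."""
    raw = env.get("STEAM_LOG_PERMISSIONS", default)
    try:
        return int(raw, 8)       # octal string -> mode bits
    except ValueError:
        return int(default, 8)   # fall back on an unparsable value

# The resulting mode could then be applied with os.chmod(path, mode).
mode = read_log_permissions({"STEAM_LOG_PERMISSIONS": "0600"})
```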
Please see the H2O and Sparkling Water documentation for additional details.
If you have any questions, please reach out to email@example.com