We are happy to announce that Sparkling Water now fully supports Spark 2.3 and is available from our download page.
If you are using an older version of Spark, that’s no problem. Although we recommend upgrading to the latest Spark version possible, we keep the Sparkling Water releases for Spark 2.2 and 2.1 in step with the latest release wherever the older Spark version does not limit us.
The latest release of Sparkling Water contains several important bug fixes. The three major ones are:
- Handle nulls properly in H2OMojoModel. In previous versions, running predictions on an H2OMojoModel with null values in the input would fail. We now treat null values as missing values, so predictions no longer fail.
- We marked the Spark dependencies in our Maven packages as provided. This means that we assume the Spark dependencies are always supplied by the runtime, which should always be true. This ensures a cleaner and more transparent Sparkling Water environment.
- In PySparkling, the method as_h2o_frame didn’t raise an error when it was given a wrong input type. This method accepts only Spark DataFrames and RDDs. However, some users passed other types, and the method ended silently. Now we fail if the user passes a wrong data type to this method.
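To illustrate the fail-fast behaviour described in the last fix, here is a minimal sketch of an input type check. It is not the PySparkling implementation: the `DataFrame` and `RDD` classes below are empty stand-ins for the real Spark types so the snippet runs without Spark, and `check_frame_input` is a hypothetical helper name chosen for this example.

```python
class DataFrame:
    """Empty stand-in for a Spark DataFrame (assumption, not the real class)."""


class RDD:
    """Empty stand-in for a Spark RDD (assumption, not the real class)."""


# The only input types the conversion is meant to accept.
ACCEPTED_TYPES = (DataFrame, RDD)


def check_frame_input(data):
    """Return `data` unchanged if its type is accepted, otherwise raise TypeError.

    Hypothetical helper: raising immediately replaces the old behaviour of
    returning silently when an unsupported type was passed in.
    """
    if not isinstance(data, ACCEPTED_TYPES):
        accepted = ", ".join(t.__name__ for t in ACCEPTED_TYPES)
        raise TypeError(
            "Expected one of ({}), got {}".format(accepted, type(data).__name__)
        )
    return data
```

With a check like this, passing a plain Python list raises a `TypeError` that names the offending type, instead of the call ending without any result or message.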
It is also important to mention that Spark 2.3 removed support for Scala 2.10. We’ve done the same in the release for Spark 2.3. Scala 2.10 is still supported in the older Spark versions.
The latest Sparkling Water versions are also integrated with H2O 220.127.116.11, which brings several important fixes. The full change log for H2O 18.104.22.168 is available here and the full Sparkling Water change log can be viewed here.
Senior Software Engineer, Sparkling Water Team