March 3rd, 2015

Sparkling Water Certified by Cloudera


Last month, before the H2O.ai team publicly announced Sparkling Water at Strata San Jose, we made sure that the product was backed and certified by major partners. This includes approval from Databricks itself as well as Cloudera.

Integration Testing for Cloudera

For Cloudera, testing was mainly geared toward deployment and sustainability of a Sparkling Water cluster. Some key points that came up during the certification process include:

  • Compatibility. Because Spark is bundled with each Cloudera release and the user can choose to install either Standalone Spark or Spark on YARN through Cloudera Manager, the test must use the Spark installation that comes bundled with Cloudera. Sparkling Water was tested on both Standalone Spark and Spark launched on YARN:

    # SPARK_HOME set to Cloudera bundled Spark
    export SPARK_HOME="/opt/cloudera/parcels/CDH-5.3.1-1.cdh5.3.1.p0.5/lib/spark"
    export MASTER="spark://mr-0xd1-precise1.0xdata.loc:7077"
    cd $SPARK_INSTALLATION
    # Run test script with standalone Sparkling Water
    bin/sparkling-shell --num-executors 8 --executor-memory 5g -i cloudera_cert.scala
    # Run test script with Sparkling Water launched on YARN
    bin/sparkling-shell --num-executors 8 --executor-memory 5g --master yarn -i cloudera_cert.scala

    The cloudera_cert.scala script tested importing and parsing data, moving data between Spark and H2O, and using Spark’s SQLContext and H2O’s algorithms.
  • Security. Sparkling Water had to be able to launch and run on a secure cluster. An authentication layer was added to the Hadoop servers using Kerberos, an authentication service, and tests that ran on the unsecured servers had to pass on Kerberized servers. To run Sparkling Shell on a Kerberized cluster, the user first obtains a Kerberos ticket with kinit and then runs the Scala script with Sparkling Shell:

    kinit 0xdata@CLOUDERA-CERT
    cd $SPARK_INSTALLATION
    bin/sparkling-shell --num-executors 8 --executor-memory 5g -i cloudera_cert.scala
  • Missing Class Paths. Sparkling Water had to be able to run with CDH 5.3. To accomplish this, the cluster had to be upgraded from the then-current Cloudera version, 5.3.0-1.cdh5.3.0.p0.30, to 5.3.1-1.cdh5.3.1.p0.5. Though it was only a minor release update, it fixed the class path issues we found in the older version for the Joda jar files:

    java.lang.NoSuchMethodError: org.joda.time.DateTime.now(Lorg/joda/time/DateTimeZone;)Lorg/joda/time/DateTime;
  • Port Collisions. At the moment, if there is a port collision when H2O is launched from Spark on YARN, an executor that can’t find an available port will kill itself and try to restart. H2O, however, cannot tolerate executor deaths, so when an executor dies, so does the H2O cluster. One way to avoid a port collision for now is to specify the base port for H2O by passing the configuration spark.ext.h2o.port.base during the launch process:

    bin/sparkling-shell --num-executors 8 --executor-memory 5g --conf spark.ext.h2o.port.base=63331
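
The security and port collision points above lend themselves to a small pre-launch check. The following is only a sketch, not part of Sparkling Water: the helper names have_kerberos_ticket and h2o_port_conf and the KLIST_CMD variable are hypothetical, and the accepted port range (1024 to 65535, i.e. unprivileged ports) is an assumption. It relies on klist -s, which reports through its exit status whether a valid ticket is cached.

```shell
#!/bin/sh
# Sketch only: helper names and KLIST_CMD are hypothetical, not part of Sparkling Water.

# Command used to inspect the Kerberos ticket cache (overridable, e.g. for testing).
KLIST_CMD="${KLIST_CMD:-klist}"

# Succeeds only if a valid Kerberos ticket is cached.
# `klist -s` prints nothing and signals the result through its exit status.
have_kerberos_ticket() {
  $KLIST_CMD -s
}

# Validates a candidate H2O base port and prints the matching --conf argument.
# Assumption: only unprivileged ports (1024-65535) are accepted.
h2o_port_conf() {
  port="$1"
  case "$port" in
    ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
  esac
  if [ "$port" -lt 1024 ] || [ "$port" -gt 65535 ]; then
    echo "port out of range: $port" >&2
    return 1
  fi
  printf '%s\n' "--conf spark.ext.h2o.port.base=$port"
}

# Possible usage, reusing the principal and port from the examples above:
#   have_kerberos_ticket || kinit 0xdata@CLOUDERA-CERT
#   bin/sparkling-shell --num-executors 8 --executor-memory 5g $(h2o_port_conf 63331)
```

Note that this only validates the port number and the ticket cache. It does not detect whether another process already occupies the port, so choosing a base port that is known to be free on the cluster nodes is still up to the user.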

Getting Started

To get started yourself, download Sparkling Water and try our GitHub examples.
