March 24th, 2016

Connecting to Spark & Sparkling Water from R & Rstudio

RSS icon RSS Category: Uncategorized [EN]
Spark commands

Sparkling Water offers the best of breed machine learning for Spark users. Sparkling Water brings all of H2O’s advanced algorithms and capabilities to Spark. This means that you can continue to use H2O from Rstudio or any other ide of your choice. This post will walk you through the steps to get running on plain R or R studio from Spark.
It works just the same the same way as regular H2O. You just need to call h2o.init() from R with the right parameters i.e. IP, PORT
For example: we start sparkling shell (bin/sparkling-shell) here and create an H2OContext:
Spark commands
Now H2OContext is running and H2O’s REST API is exposed on 172.162.223:54321
So we can open RStudio and call h2o.init() (make sure you have the right R H2O package installed):
Rstudio-start
Let’s now create a Spark DataFrame, then publish it as H2O frame and access it from R:
This is how you achieve that in sparkling-shell:
val df = sc.parallelize(1 to 100).toDF // creates Spark DataFrame
val hf = h2oContext.asH2OFrame(df) // publishes DataFrame as H2O's Frame

Scala val df code
You can see that the name of the published frame is frame_rdd_6. Now let us go to RStudio and list all the available frames via h2o.ls() function:
Alternatively you could also name the frame during the transformation from Spark to H2O as shown below:
h2oContext.asH2OFrame(df) -> val hf = h2oContext.asH2OFrame(df, "simple.frame")
Rstudio-frames
We can fetch the frame as well or invoke a R function on it:
Rstudio-rdd
Keep hacking!

Leave a Reply

AT&T panel: AI as a Service
+
AT&T panel: AI as a Service (AIaaS)

Mark Austin, Vice President of Data Science at AT&T joined us on stage at H2O

March 22, 2023 - by Liz Pratusevich
+
[Infographic] Healthcare providers: How to avoid AI “Pilot-Itis”

From increased clinician burnout and financial instability to delays in elective and preventative care, the

March 15, 2023 - by
+
Deploy a WAVE app on an AWS EC2 instance

This article was originally published by Greg Fousas and Michelle Tanco on Medium  and reviewed by

March 10, 2023 - by Michelle Tanco and Greg Fousas
+
How Horse Racing Predictions with H2O.ai Saved a Local Insurance Company $8M a Year

In this Technical Track session at H2O World Sydney 2022, SimplyAI's Chief Data Scientist Matthew

March 8, 2023 - by Liz Pratusevich
+
AI and Humans Combating Extinction Together with Dr. Tanya Berger-Wolf

Dr. Tanya Berger-Wolf, Co-Founder and Director of AI for conservation nonprofit Wild Me, takes the

March 1, 2023 - by Liz Pratusevich
+
Improving Search Query Accuracy: A Beginner’s Guide to Text Regression with H2O Hydrogen Torch

Although search engines are vital to our daily lives, they need help understanding complex user

February 28, 2023 - by

Request a Demo

Explore how to Make, Operate and Innovate with the H2O AI Cloud today

Learn More