June 25th, 2014

H2O – The Killer-App on Spark

Summary:
In-memory big data has come of age. The Spark platform, with its elegant API and architecture, has captured the hearts and minds of developers. Machine learning as an API for big data is just as real: R and predictive analytics on big data have become the center of the space. Over the same two years in which Spark won developers over, H2O has focused on scalable machine learning and established a leadership position there.
Sparkling Water brings together the best of both worlds!

Backdrop: Over the past few years, we watched Matei & Ion build Spark, a thriving open-source movement and a great development platform for in-memory big data. At the same time, H2O built a great open-source product, with a growing customer base, focused on scalable machine learning and interactive data science. Over the past couple of months, the Spark and H2O teams have been brainstorming how to combine the best of H2O’s machine learning with Spark’s platform. The result is Sparkling Water, which brings the power of H2O’s fast big data machine learning to Spark.
Sparkling Water
In a single invocation and process, users get the best of Spark (its elegant APIs, RDDs, simple context and multi-tenancy) together with H2O’s speed, columnar compression, in-memory scale and fully featured machine learning and deep learning algorithms.
The first step, easy single-process integration for end users that reads from and writes to Tachyon and RDDs, is available now. Data is parsed and exchanged between Spark and H2O via Tachyon, and a single Spark driver can set up the context and run both SQL and ML from the same process.
On the longer-term roadmap is H2ORDD, which brings H2O’s speed, compression and production-ready in-memory engineering to Spark’s core.
This makes H2O’s Deep Learning and advanced algorithms seamlessly available to Spark’s user community. As the killer machine learning application for the Spark platform, H2O will further empower application developers on Spark.
MLlib and H2O: MLlib is a library of efficient implementations of popular algorithms built directly on Spark. Our overarching goal is to see Spark succeed, so we believe customers should be able to choose the best tool for their needs in the context of Spark. That’s why we think it is fantastic that Mahout will be porting its algorithms to Spark, and why we’re thrilled that 0xData is bringing all the capabilities of H2O to Spark. Over time, H2O’s ML algorithms and its library of building blocks (“legos”) will accelerate efforts that have started in the community.
We think it is great that we’re moving towards a tighter integration where H2O can be used naturally with the rest of Spark’s capabilities.
What’s next? The Sparkling Water code is available here:
https://github.com/0xdata/h2o-sparkling
The steps to install it and to use Tachyon for interoperability are described in Installation and Test.
Demo Code

object AirlinesDemo extends Demo {
  override def run(conf: DemoConf): Unit = {
    // Prepare data
    // Dataset
    val dataset   = "data/allyears2k_headers.csv"
    // Row parser
    val rowParser = AirlinesParser
    // Table name for SQL
    val tableName = "airlines_table"
    // Select all flights with destination == SFO
    val query = """SELECT * FROM airlines_table WHERE dest="SFO" """
    // Connect to the Shark cluster, run the query over the airlines data and transfer the result into H2O
    val frame: Frame = executeSpark[Airlines](dataset, rowParser, conf.extractor, tableName, query, local = conf.local)
    Log.info("Extracted frame from Spark: ")
    Log.info(if (frame != null) frame.toString + "\nRows: " + frame.numRows() else "<nothing>")
    // Now make a blocking call of GBM directly via the Java API
    val model = gbm(frame, frame.vec("isDepDelayed"), 100, true)
    Log.info("Model built!")
  }
  override def name: String = "airlines"
}
