February 25th, 2015

Strata San Jose 2015

I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great seeing how many people are involved in the H2O community and I also really enjoyed drinking free margaritas at the booth crawl.
The H2O team met some really great people with lots of different use cases for our product and we hope to see all of you again at our First-Fridays Hackathons or other meetups.

Strata 2015 Presentation

The H2O.ai team presented on Thursday right after lunch. We had two presenters on stage: Cliff and Michal presented two new super cool features of H2O, the Python API and Sparkling Water. The presentation was legendary! Not only was the room packed, with all seats occupied and people standing along the walls, but we also received a lot of interesting questions and feedback regarding H2O, Python, and Sparkling Water.
The presentation included an introduction to H2O and its features, but a major part of the talk was devoted to a live product demo (a real demo running on Cliff’s and Michal’s laptops using the latest H2O release!). For the demo, we became CitiBike New York data scientists, predicting the number of bikes at individual bike-sharing stations at any given time based on historical data and weather data.
The demo used two publicly available datasets: CitiBike NY historical data from 2013 and 2014 (available here) and New York weather data, publicly available from the National Climatic Data Center. Both are also available in H2O’s S3 storage, which you can fetch by cloning the H2O repository and syncing the big data:

git clone https://github.com/h2oai/h2o-dev.git
cd h2o-dev
./gradlew syncBigdataLaptop
cd bigdata/laptop/citibike-nyc/

During the demo we demonstrated a real-life machine learning workflow involving the following steps:
– data loading
– data munging including feature generation and refinement
– filtering data
– joining data from both sources (i.e., joining weather and bikes tables)
– splitting data into three splits for model training, on-the-fly validation, and testing
– and finally, training models (in this case GBM and GLM) and evaluating their performance based on the R-squared score
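
To make the join, split, and scoring steps concrete, here is a minimal sketch in plain Python. It is not the H2O Python API and not the code from the talk. The helper names (`join_on_key`, `three_way_split`, `r_squared`), the 60/20/20 split ratios, and the toy rows are all assumptions made up for illustration.

```python
import random


def join_on_key(bikes, weather, key):
    """Inner-join two lists of dict rows on a shared key column."""
    weather_by_key = {row[key]: row for row in weather}
    joined = []
    for row in bikes:
        match = weather_by_key.get(row[key])
        if match is not None:
            # Combine the weather columns with the bike columns.
            joined.append({**match, **row})
    return joined


def three_way_split(rows, ratios=(0.6, 0.2), seed=42):
    """Shuffle rows and cut them into train / validation / test parts.

    The third part receives whatever remains after the first two.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_tot = sum((y - mean) ** 2 for y in actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical toy data standing in for the bike and weather tables.
bikes = [{"day": d, "station": "A", "bikes": 10 + d} for d in range(10)]
weather = [{"day": d, "temp_c": 5.0 + d} for d in range(10)]

rows = join_on_key(bikes, weather, "day")
train, valid, test = three_way_split(rows)

# A trivial baseline "model": always predict the mean of the training target.
train_target = [r["bikes"] for r in train]
baseline = sum(train_target) / len(train_target)
baseline_r2 = r_squared(train_target, [baseline] * len(train_target))
```

A model that always predicts the training mean scores an R-squared of 0 on the training data, which is the reference point that models such as GBM and GLM are expected to beat.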
Our overall goal for the talk was to demonstrate this data science workflow using the Python API and then perform the same workflow from Sparkling Water, combining the Scala, Spark, and H2O APIs.
Cliff showed the workflow using the Python API directly from an IPython notebook. The notebook source is available in H2O’s GitHub here and the raw Python code is here.
Michal (that’s me) demonstrated Sparkling Water using Sparkling Shell (a regular Spark shell with the additional Sparkling Water library) and went step by step through the workflow described by a script available in H2O’s GitHub. I also showed our new H2O UI and used it to explore the data.
The entire presentation was recorded by Strata and will be available soon in the Strata Proceedings. However, the presentation deck is already available here:
