February 25th, 2015

Strata San Jose 2015

I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great seeing how many people are involved in the H2O community and I also really enjoyed drinking free margaritas at the booth crawl.
The H2O team met some really great people with lots of different use cases for our product and we hope to see all of you again at our First-Fridays Hackathons or other meetups.

Strata 2015 Presentation

The H2O.ai team presented on Thursday right after lunch. We had two presenters on stage: Cliff and Michal presented two new super cool features of H2O, the Python API and Sparkling Water. The presentation was legendary! Not only was the room packed (all seats were occupied and people were standing along the walls), but we also received a lot of interesting questions and feedback regarding H2O, Python, and Sparkling Water.
The presentation included an introduction to H2O and its features, but a major part of the talk was devoted to a live product demo (a real live demo running on Cliff and Michal’s laptops using the latest H2O release!). For the demo, we became CitiBike New York data scientists, predicting the number of bikes at individual bike-sharing stations at any given time based on historical data and weather data.
The demo used two publicly available datasets: CitiBike NY historical data from 2013 and 2014 (available here) and New York weather data from the National Climatic Data Center. Both are also available in H2O’s S3 storage, which you can access by cloning the H2O repository and fetching the big data:

git clone https://github.com/h2oai/h2o-dev.git
cd h2o-dev
./gradlew syncBigdataLaptop
cd bigdata/laptop/citibike-nyc/

During the demo we demonstrated a real-life machine learning workflow involving the following steps:
– data loading
– data munging including feature generation and refinement
– filtering data
– joining data from both sources (i.e., joining weather and bikes tables)
– splitting data into three splits for model training, on-the-fly validation, and testing
– and finally, training models (in this case GBM and GLM) and evaluating their performance based on the R-squared score
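To make the munging, joining, and splitting steps a bit more concrete, here is a minimal sketch in plain Python. It is not the code from the demo and it does not use the H2O API: the tiny `trips` and `weather` records, the field names, the helper functions, and the 60/30/10 split ratios are all assumptions made up for illustration.

```python
import random
from collections import Counter

# Hypothetical miniature stand-ins for the two datasets.
# These are NOT the real CitiBike or NCDC schemas.
trips = [
    {"station": "A", "day": "2013-07-01"},
    {"station": "A", "day": "2013-07-01"},
    {"station": "B", "day": "2013-07-01"},
    {"station": "A", "day": "2013-07-02"},
    {"station": "B", "day": "2013-07-03"},
]
weather = {
    "2013-07-01": {"temp_c": 28.0, "rain_mm": 0.0},
    "2013-07-02": {"temp_c": 24.5, "rain_mm": 3.2},
    # no weather record for 2013-07-03
}


def bikes_per_station_day(trip_rows):
    """Munging: count trips for every (station, day) pair."""
    return Counter((t["station"], t["day"]) for t in trip_rows)


def join_with_weather(counts, weather_by_day):
    """Inner join on day: rows without a weather record are dropped."""
    rows = []
    for (station, day), n_bikes in sorted(counts.items()):
        if day in weather_by_day:
            rows.append(
                {"station": station, "day": day, "bikes": n_bikes,
                 **weather_by_day[day]}
            )
    return rows


def split_three_ways(rows, ratios=(0.6, 0.3), seed=42):
    """Shuffle, then cut into train / validation / test parts.

    The ratios are an assumption; the test part gets whatever is left.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_valid = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test
```

Calling `join_with_weather(bikes_per_station_day(trips), weather)` yields one row per station and day with the weather columns attached, and `split_three_ways` then divides those rows into the three parts used for training, on-the-fly validation, and testing.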
The overall goal of our talk was to demonstrate this data science workflow using the Python API and then perform the same workflow from Sparkling Water, combining Scala, Spark, and H2O APIs.
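For readers unfamiliar with the R-squared score used in the evaluation step, the sketch below computes it with the common coefficient-of-determination formula, 1 minus the ratio of the residual sum of squares to the total sum of squares. The `r_squared` helper is hypothetical and written in plain Python; H2O's own implementation may differ in its details.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration, not part of the H2O API.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    # Squared error of the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot
```

A score of 1 means the predictions match the observed bike counts exactly, while a score of 0 means the model does no better than always predicting the mean count.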
Cliff showed the workflow using the Python API directly from an IPython notebook. The notebook source is available in H2O’s GitHub here and the raw Python code is here.
Michal (that’s me) demonstrated Sparkling Water using the Sparkling Shell (a regular Spark shell with the additional Sparkling Water library) and went step-by-step through the workflow described by a script available in H2O’s GitHub. I also showed our new H2O UI and used it to explore the data.
The entire presentation was recorded by Strata and will be available soon at Strata Proceedings. However, the presentation deck is already available here:
