July 16th, 2015

useR! Aalborg 2015 conference

by Matt Dowle

The H2O team spent most of the useR! Aalborg 2015 conference at the booth giving demos and discussing H2O. Amy had a 16-node EC2 cluster running with 8 cores per node, for a total of 128 CPUs. The demo consisted of loading large files in parallel and then running our distributed machine learning algorithms in parallel.
At an R conference, most people wanted to script H2O from R, which is of course built-in (as is Python), but we also conveyed the benefits that our user interface, Flow, can provide in this space (even for programmers) by automating and accelerating common tasks. We enjoyed discussing future directions and bouncing ideas off the attendees. There is nothing like seeing people's first reaction to the product, live and in person! As an open source platform, H2O thrives on suggestions and contributions from our community.
All components of H2O are developed in the open on GitHub.

H2O contributed 3 talks:

Matt Dowle on Scalable Radix Sorting

Matt Dowle presented the details and benchmarks of the fast, stable radix sort implementation in data.table:::forderv. On 500 million random numerics (4 GB), base R takes approximately 22 minutes versus about 2 minutes for forder. He discussed the pros and cons of most-significant-digit (forwards) and least-significant-digit (backwards) approaches, as well as their application to all types: integer with large range (> 1e5), numeric, and character. We hope to find a sponsor on the R core team to help us include this method in base R, where it could benefit the community automatically. The work builds on articles by Terdiman (2000) and Herf (2001) and is joint work with Arun Srinivasan.
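The benchmark from the talk can be reproduced at a smaller scale. Note that forderv is an internal (unexported) data.table function, so this is an illustrative sketch rather than a supported API; the data size here is scaled down from the 500 million numerics used on stage.

```r
# Compare base R's order() with data.table's radix ordering on random numerics.
library(data.table)

set.seed(42)
x <- runif(1e6)  # 1 million numerics instead of the 500 million in the talk

t_base <- system.time(o_base <- order(x))["elapsed"]
t_dt   <- system.time(o_dt   <- data.table:::forderv(x))["elapsed"]

# Both methods must agree on the sorted result
stopifnot(identical(x[o_base], x[o_dt]))
cat("base order():", t_base, "s; data.table forderv():", t_dt, "s\n")
```

At this size the gap is smaller than in the talk, but the radix sort's advantage grows with the input.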
Slides: Fast, stable and scalable true radix sorting with Matt Dowle at useR! Aalborg

matt_dowle
Photo courtesy of flickr user Rhaen

Erin LeDell on h2oEnsemble

Erin presented an overview of scalable ensemble learning in R using the h2oEnsemble package. Practitioners may prefer ensemble algorithms when model performance is valued above other factors such as model complexity or training time. The package provides easy access to scalable ensemble learning from R: it implements the Super Learner (stacking) ensemble algorithm using distributed base learning algorithms from the open source H2O machine learning platform. The base learners currently supported in h2oEnsemble are generalized linear models with elastic net regularization, gradient boosting machines (GBM) with regression and classification trees, random forests, and deep learning (multi-layer feed-forward neural networks). Erin provided code examples and some simple benchmarks.
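A minimal sketch of fitting a Super Learner ensemble with h2oEnsemble, following the package's documented interface. The synthetic dataset and the particular learner selection below are illustrative assumptions, not the examples from the talk.

```r
library(h2o)
library(h2oEnsemble)

h2o.init(nthreads = -1)  # start (or connect to) a local H2O cluster

# Synthetic binary-classification data pushed into the cluster
df <- data.frame(x1 = rnorm(500), x2 = rnorm(500),
                 y  = factor(rbinom(500, 1, 0.5)))
train <- as.h2o(df)

# Base learners from the four supported families, with a GLM metalearner
learner <- c("h2o.glm.wrapper", "h2o.randomForest.wrapper",
             "h2o.gbm.wrapper", "h2o.deeplearning.wrapper")
metalearner <- "h2o.glm.wrapper"

fit <- h2o.ensemble(x = c("x1", "x2"), y = "y",
                    training_frame = train, family = "binomial",
                    learner = learner, metalearner = metalearner)

pred <- predict(fit, newdata = train)  # ensemble predictions
```

Swapping in different base learners is just a matter of editing the `learner` character vector; custom wrappers with non-default parameters can be defined the same way.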
Slides: h2oEnsemble with Erin LeDell at useR! Aalborg

erin_ledell
Photo courtesy of flickr user Rhaen

Amy Wang on H2O Architecture

Amy presented H2O during the useR! sponsor talk and went over the architecture of our product. Her live demo showed the speed and scale of H2O through an R interface. On top of reading in and aggregating columnar data at lightning speed, H2O also comes with a suite of sophisticated models, with all of their parameters exposed to the front end for ease of use. This attracted discussion at our booth even as the conference came to a close and we began packing up our banners. Many academics expressed interest in using H2O to teach students machine learning algorithms, while people in industry discussed partnerships and use cases. The emphasis of the talk was to encourage R users to try H2O and to build a community of users with interesting questions, ideas, and feedback who can ultimately help provide a better open source H2O experience for everyone.
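The R-interface workflow Amy demoed can be sketched as follows: start a cluster, load data, train a model, and evaluate it. The dataset here is an illustrative stand-in for the one used on stage.

```r
library(h2o)

h2o.init(nthreads = -1)          # start (or connect to) a local H2O cluster

hf <- as.h2o(iris)               # push an R data.frame into the cluster
parts <- h2o.splitFrame(hf, ratios = 0.8, seed = 1)
train <- parts[[1]]
test  <- parts[[2]]

# Train a gradient boosting model; the same parameters are exposed in Flow.
model <- h2o.gbm(x = 1:4, y = "Species", training_frame = train)

perf <- h2o.performance(model, newdata = test)
h2o.confusionMatrix(perf)
```

For larger data, `h2o.importFile()` parses files in parallel across the cluster instead of round-tripping through an R data.frame.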
Slides: H2O Overview with Amy Wang at useR! Aalborg

amy_wang
Photo courtesy of Matt Dowle

Matt also stopped by Copenhagen to give a talk at the R Summit. You can find his R Summit slides on our SlideShare.

Want to try one of the demos we ran at the useR! booth?

Check out our GitHub page for instructions, scripts, and datasets.
Click here for R demos
Special thanks to the useR! organizing committee and all the people who stopped by our booth!
