Compressing Zip Codes with Generalized Low Rank Models
by Team | December 07, 2015 GLRM, R

This tutorial introduces the Generalized Low Rank Model (GLRM) [1], a new machine learning approach for reconstructing missing values and identifying important features in heterogeneous data. It demonstrates how to build a GLRM in H2O that condenses categorical information into a numeric representation, which can then be used in other mo...

Databricks and H2O Make it Rain with Sparkling Water
by Team | December 01, 2015 Demos, Sparkling Water

This blog post was first posted on the Databricks blog here. Databricks provides a cloud-based integrated workspace on top of Apache Spark for developers and data scientists. H2O.ai has been an early adopter of Apache Spark and has developed Sparkling Water to seamlessly integrate H2O.ai’s machine learning library on top of Spark. In thi...

H2O World from an Attendee's Perspective
by Team | November 18, 2015 Community, Events, Guest Posts, H2O World

Data Science is like Rome, and all roads lead to Rome. H2O WORLD is the crossroads, pulling in a confluence of math, statistics, science, and computer science, and incorporating all avenues of business. From the academic, research-oriented models to the business and computer science analytics implementations of those ideas, H2O WORLD inform...

H2O.ai at ODSC SF 2015!
by Team | November 16, 2015 Events

As promised, we’re here reporting from the floor of the Open Data Science Conference (ODSC). It’s been another wild day for us, with an early start at 7:30am to set up ahead of the show. However, the long days are all worth it for a chance to see you all in the field. While we thought bringing two boxes of booklets woul...

H2O at ML Conf SF 2015
by Team | November 13, 2015 Community, Events

H2O is ubiquitous, and just like H2O, our team is everywhere! Today we attended the 2015 Machine Learning Conference in San Francisco. Located at the gorgeous Julia Morgan Ballroom, the ML Conference brought together some of the world’s foremost experts on machine learning, including the tireless Xavier Amatriain, VP of...

H2O World Third Day Wrap-Up
by Team | November 12, 2015 Events, H2O World

H2O fans, we know that distance and the twin holidays of Veterans Day and Diwali kept many of you from attending the grand finale of H2O World, but we want to at least give you a taste of all that went on at the Computer History Museum in Mountain View. Day 3 of H2O World got off to a strong start with a massive panel on creating a cultu...

H2O World Second Day Wrap-Up
by Team | November 11, 2015 Events, H2O World

H2O fans, we didn’t think that our second day could top our first, but somehow it did! Still, although we had record attendance, we know that a lot of you aren’t here. While we can’t hope to get across all that’s happened, we do want to share some of the highlights. The morning started off with CEO Sri Ambati welcoming attendees and givin...

H2O World First Day Wrap-Up
by Team | November 10, 2015 Events, H2O World

H2O fans, we wish that all of you were here, but we also know that our community is spread across the globe and not all of you could make it to H2O World. However, those of you not able to attend the conference are just as much a part of our community as those that are. While we can’t hope to convey the energy and excitement of H2O World,...

Pre-H2O World, Part 2
by Team | November 09, 2015 Community, Events, H2O World

H2O fans, we have a day of data delights in store for you tomorrow! The first day of H2O World is totally devoted to demos and walkthroughs designed to help YOU get the most out of your data. In fact, we have so many sessions planned that unless you have Hermione’s Time Turner, you won’t be able to attend them all. So choose wisely! A...

A Newbie's Guide to H2O in Python - Guest Post
by Team | November 09, 2015 Community, Guest Posts, Python

This blog was originally posted here. I created this guide to help fellow newbies get their feet wet with H2O, an open-source predictive analytics platform that is fast, powerful, and easy to use. Using a combination of extraordinary math and high-performance parallel processing, H2O allows you to quickly create models for big data. The st...

Pre-H2O World, Part 1
by Team | November 08, 2015 Community, Customers, Events, H2O World

H2O fans, the team is burning the midnight oil to get H2O World ready for you all. With an audience size twice that of last year’s event, we’re going to pack the house at the Computer History Museum! This year’s event will feature 70+ speakers spread out over 41 talks, 22 training sessions and eight panels during the course of the m...

How to Build a Machine Learning App Using Sparkling Water and Apache Spark
by Team | October 03, 2015

The Sparkling Water project is nearing its one-year anniversary, which means Michal Malohlava, our main contributor, has been very busy for the better part of this past year. The Sparkling Water project combines H2O machine-learning algorithms with the execution power of Apache Spark. This means that the project is heavily dependent on tw...

How I used H2O to crunch through a bank's customer data
by Team | September 20, 2015

This entry was originally posted here. Six months back I gingerly started exploring a few data science courses. After having successfully completed some of the courses, I was restless. I wanted to try my data hacking skills on some real data (read: Kaggle). I find that competing in hackathons helps you benchmark yourself against your fellow ...

Fast, Scalable Machine Learning: Now with New and Improved Python API
by Team | September 04, 2015

H2O now has a new Python API, based on valuable feedback provided by our community. Newest features include pandas-like dataframes, but for large, distributed computing; scikit-learn integration; and a machine learning pipeline API. Check out the tutorial below: ...

An Introduction to Data Science: Meetup Summary Guest Post by Zen Kishimoto
by Team | August 28, 2015

Originally posted on Tek-Tips forums by Zen here. I went to two meetups at H2O, which provides an open source predictive analytics platform. The second meetup was full of participants because its theme was an introduction to data science. Data science is a new buzzword, and I feel like everyone claims to be a data scientist or somethin...

The Definitive Performance Tuning Guide for H2O Deep Learning (scripts ported to H2O-3; results taken from February's blog)
by Team | August 28, 2015

Introduction: This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation: This effort was in part motivated b...

KMeans Diagnostics with H2O Cluster Models
by Team | August 05, 2015


Lending Club: Predict Bad Loans to Minimize Loss to Defaulted Accounts
by Team | August 03, 2015

As a sales engineer on the team, I get asked a lot about the value-add of H2O. How do you put a price tag on something that is open source? This typically revolves around the use cases; if a use case pertains to improving user experience or making apps that can improve internal operations, then there’s no straightforward way of monet...

Introduction to Data Science using H2O - Chicago
by Team | August 03, 2015

Thank you to Chicago for the great meetup on 29 July 2015. Slides have been posted on GitHub. The links to the sample scripts and data are contained in the slides. If you have any further questions about H2O, please join our Google Group or chat with us on Gitter. The slides are also available on the H2O Slideshare. Also, thank you t...

useR! Aalborg 2015 conference
by Team | July 16, 2015

The H2O team spent most of the useR! Aalborg 2015 conference at the booth giving demos and discussing H2O. Amy had a 16-node EC2 cluster running with 8 cores per node, making a total of 128 CPUs. The demo consisted of loading large files in parallel and then running our distributed machine learning algos in parallel. At an R conference, m...

KFold Cross Validation With H2O-3 and R
by Team | July 09, 2015

This blog also explains the solution to a Google Stream question we received. Note: KFold Cross Validation will be added to H2O-3 as an argument soon. This is a terse guide to building KFold cross-validated models with H2O using the R interface. There’s not very much R code needed to get up and running, but it’s by no means the one-magic-...

'Ask Craig': Determining Craigslist Job Categories with Sparkling Water, Part 2
by Team | July 02, 2015

This is the second blog in a two-blog series; it covers turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. In the last blog post we learned how to build a set of H2O and Spark models to predict categories for jobs posted on Craigslist using Spar...

Sparkling Water Tutorials Updated
by Team | July 01, 2015

This is an updated version of the Sparkling Water tutorials originally published by Amy Wang here. For the newest examples and updates, please visit the Sparkling Water GitHub page. The blog post introduces three tutorials: Running Sparkling Water Locally; Running Sparkling Water on Standalone Spark Cluster; and Running H2O Commands from Spark Shell ...

'Ask Craig': Determining Craigslist Job Categories with Sparkling Water
by Team | June 15, 2015

This is the first blog in a two-blog series. The second blog is on turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. One question we often get asked at meetups or conferences is: “How are you guys different than other open-source machine-learning toolkits?...

Scaling R with H2O
by Team | June 10, 2015

With the advent of H2O 3.0, it seems appropriately timed to reintroduce the R API for H2O to help users better understand the differences between R dataframes and H2OFrames. Typically, some of the first questions we get include: Does H2O support all R packages and functions? Is H2OFrame an extension of data.frame? Are H2O supported algo...

Using H2O for Kaggle: Guest Post by Gaston Besanson and Tim Kreienkamp
by Team | May 05, 2015

This post also appears on the GSE Data Science Blog. In this special H2O guest blog post, Gaston Besanson and Tim Kreienkamp talk about their experience using H2O for competitive data science. They are both students in the new Master of Data Science Program at the Barcelona Graduate School of Economics and used H2O in an in-class Kaggle...

PyData Dallas 2015
by Team | May 04, 2015

H2O was in attendance last week at PyData in Dallas, Texas. Our CTO, Cliff Click, spoke at PyData about driving H2O from Python to perform feature engineering, group-by, quantiles, and model building with H2O’s GBM, GLM, and Distributed Random Forest. We met a lot of great people and we are really excited to see the enthusiasm for H2O w...

Deep Learning for Public Safety
by Team | April 22, 2015

This article first appeared on KDnuggets. Contributors: Alex Tellez, Michal Malohlava, Prithvi Prabhu, Hank Roark, Amy Wang. Download full report. We’ve seen some incredible applications of Deep Learning with respect to image recognition and machine translation, but this particular use case has to do with public safety; in particular, how De...

by Team | April 11, 2015

---------- Forwarded message ----------
From: SriSatish Ambati
Date: Sun, Jun 1, 2014 at 12:29 PM
Subject: Re: jirassic hierarchy.
To: Kevin
Cc: Tom Kraljevic, engr, team

The best cultures are ones where it feels like there isn’t any. Not saying scrum won’t fit,...

Sparkling Water Certified by Cloudera
by Team | March 03, 2015

Last month, before the team publicly announced Sparkling Water at Strata San Jose, we made sure that the product was backed and certified by some major partners. This includes approval from Databricks itself as well as Cloudera. Integration Testing for Cloudera: For Cloudera, testing was mainly geared toward deployment and sustaina...

The Definitive Performance Tuning Guide for H2O Deep Learning
by Team | February 27, 2015

This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation: This effort was in part motivated by a Deep Learn...

Strata San Jose 2015
by Team | February 25, 2015

I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great seeing how many people are involved in the H2O community and I also really enjoyed drinking free margaritas at the booth crawl. The H2O team met some really great people with lots of dif...

How Does Java Both Optimize Hot Loops and Allow Debugging?
by Team | February 22, 2015

This blog came about because an old friend is trying to figure out how Java can do aggressive loop and inlining optimizations, while allowing the loading of new code and setting breakpoints… in that optimized code. On 2/21/2015 11:04 AM, IG wrote: Safepoint. I’m still confused because I don’t understand what is the required state at a s...

Introducing first-Fridays Hackathon with H2O
by Team | February 11, 2015

Greetings, fellow ML/AI enthusiasts! This blog post serves two purposes: 1) to introduce our First Fridays initiative and 2) to recap our first 12-hour Hackathon! WHAT: On the first Friday of each month, we will hold a Hack-A-Thon from 1pm to 10pm (yep, you read correctly!) whereby we invite ANYONE to come hack through a data problem with the H2...

Launching H2O with Docker
by Team | January 09, 2015

Hello world, again. H2O is already relatively easy to launch: all the user needs is a compatible Java version. Now that level of difficulty is reduced to nil. Jeff, our DevOps engineer, presented me with a Docker container for H2O, making it possible to ship H2O regardless of your environment setup. You can now launch H2O in an isolated e...
