H2O.ai Blog
Compressing Zip Codes with Generalized Low Rank Models
This tutorial introduces the Generalized Low Rank Model (GLRM) [1], a new machine learning approach for reconstructing missing values and identifying important features in heterogeneous data. It demonstrates how to build a GLRM in H2O that condenses categorical information into a numeric representation, which can then be used in other mo...
Databricks and H2O Make it Rain with Sparkling Water
This blog post was first posted on the Databricks blog here. Databricks provides a cloud-based integrated workspace on top of Apache Spark for developers and data scientists. H2O.ai has been an early adopter of Apache Spark and has developed Sparkling Water to seamlessly integrate H2O.ai’s machine learning library on top of Spark. In thi...
H2O World from an Attendee's Perspective
Data Science is like Rome, and all roads lead to Rome. H2O World is the crossroads, pulling in a confluence of math, statistics, science and computer science and incorporating all avenues of business. From the academic, research-oriented models to the business and computer science analytics implementations of those ideas, H2O World inform...
H2O.ai at ODSC SF 2015!
As promised, we’re here reporting from the floor of the (H2O.ai-sponsored) Open Data Science Conference (ODSC). It’s been another wild day for us, with an early start at 7:30am to set up ahead of the show. However, the long days are all worth it for a chance to see you all in the field. While we thought bringing two boxes of booklets woul...
H2O at ML Conf SF 2015
H2O is ubiquitous, and just like H2O, our team is everywhere! Today we attended the (H2O.ai-sponsored) 2015 Machine Learning Conference in San Francisco. Located at the gorgeous Julia Morgan Ballroom, the ML Conference brought together some of the world’s foremost experts on machine learning, including the tireless Xavier Amatriain, VP of...
H2O World Third Day Wrap-Up
H2O fans, we know that distance and the twin holidays of Veterans Day and Diwali kept many of you from attending the grand finale of H2O World, but we want to at least give you a taste of all that went on at the Computer History Museum in Mountain View. Day 3 of H2O World got off to a strong start with a massive panel on creating a cultu...
H2O World Second Day Wrap-Up
H2O fans, we didn’t think that our second day could top our first, but somehow it did! Still, although we had record attendance, we know that a lot of you aren’t here. While we can’t hope to get across all that’s happened, we do want to share some of the highlights. The morning started off with CEO Sri Ambati welcoming attendees and givin...
H2O World First Day Wrap-Up
H2O fans, we wish that all of you were here, but we also know that our community is spread across the globe and not all of you could make it to H2O World. However, those of you not able to attend the conference are just as much a part of our community as those that are. While we can’t hope to convey the energy and excitement of H2O World,...
Pre-H2O World, Part 2
H2O fans, we have a day of data delights in store for you tomorrow! The first day of H2O World is totally devoted to demos and walkthroughs designed to help YOU get the most out of your data. In fact, we have so many sessions planned that unless you have Hermione’s Time Turner, you won’t be able to attend them all. So choose wisely! A...
A Newbie's Guide to H2O in Python - Guest Post
This blog was originally posted here. I created this guide to help fellow newbies get their feet wet with H2O, an open-source predictive analytics platform that is fast, powerful, and easy to use. Using a combination of extraordinary math and high-performance parallel processing, H2O allows you to quickly create models for big data. The st...
Pre-H2O World, Part 1
H2O fans, the H2O.ai team is burning the midnight oil to get H2O World ready for you all. With an audience twice the size of last year’s event, we’re going to pack the house at the Computer History Museum! This year’s event will feature 70+ speakers spread out over 41 talks, 22 training sessions and eight panels during the course of the m...
How to Build a Machine Learning App Using Sparkling Water and Apache Spark
The Sparkling Water project is nearing its one-year anniversary, which means Michal Malohlava, our main contributor, has been very busy for the better part of this past year. The Sparkling Water project combines H2O machine-learning algorithms with the execution power of Apache Spark. This means that the project is heavily dependent on tw...
How I used H2O to crunch through a bank's customer data
This entry was originally posted here. Six months back, I gingerly started exploring a few data science courses. After successfully completing some of the courses, I was restless. I wanted to try my data hacking skills on some real data (read: Kaggle). I find that competing in hackathons helps you benchmark yourself against your fellow ...
Fast, Scalable Machine Learning - Now with New and Improved Python API
H2O now has a new Python API, based on valuable feedback provided by our community. Newest features include pandas-like dataframes for large, distributed computing; scikit-learn integration; and a machine learning pipeline API. Check out the tutorial below: ...
An Introduction to Data Science: Meetup Summary Guest Post by Zen Kishimoto
Originally posted on Tek-Tips forums by Zen here. I went to two meetups at H2O, which provides an open source predictive analytics platform. The second meetup was packed because its theme was an introduction to data science. Data science is a new buzzword, and I feel like everyone claims to be a data scientist or somethin...
The Definitive Performance Tuning Guide for H2O Deep Learning (Ported scripts to H2O-3, results are taken from February's blog)
Introduction: This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation: This effort was in part motivated b...
Lending Club: Predict Bad Loans to Minimize Loss to Defaulted Accounts
As a sales engineer on the H2O.ai team, I get asked a lot about the value add of H2O. How do you put a price tag on something that is open source? This typically revolves around the use cases; if a use case pertains to improving user experience or making apps that can improve internal operations, then there’s no straightforward way of monet...
Introduction to Data Science using H2O - Chicago
Thank you to Chicago for the great meetup on 29 July 2015. Slides have been posted on GitHub. The links to the sample scripts and data are contained in the slides. If you have any further questions about H2O, please join our Google Group or chat with us on Gitter. The slides are also available on the H2O Slideshare. Also, thank you t...
useR! Aalborg 2015 conference
The H2O team spent most of the useR! Aalborg 2015 conference at the booth giving demos and discussing H2O. Amy had a 16-node EC2 cluster running with 8 cores per node, making a total of 128 cores. The demo consisted of loading large files in parallel and then running our distributed machine learning algorithms in parallel. At an R conference, m...
KFold Cross Validation With H2O-3 and R
This blog also explains the solution to a Google Stream question we received. Note: KFold Cross Validation will be added to H2O-3 as an argument soon. This is a terse guide to building KFold cross-validated models with H2O using the R interface. There’s not very much R code needed to get up and running, but it’s by no means the one-magic-...
'Ask Craig' - Determining Craigslist Job Categories with Sparkling Water, Part 2
This is the second blog in a two-blog series; it covers turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. In the last blog post we learned how to build a set of H2O and Spark models to predict categories for jobs posted on Craigslist using Spar...
Sparkling Water Tutorials Updated
This is an updated version of the Sparkling Water tutorials originally published by Amy Wang here. For the newest examples and updates, please visit the Sparkling Water GitHub page. The blog post introduces 3 tutorials: Running Sparkling Water Locally, Running Sparkling Water on Standalone Spark Cluster, and Running H2O Commands from Spark Shell ...
'Ask Craig' - Determining Craigslist Job Categories with Sparkling Water
This is the first blog in a two-blog series. The second blog is on turning these models into a Spark streaming application. The presentation on this application can be downloaded and viewed at Slideshare. One question we often get asked at Meetups or conferences is: “How are you guys different than other open-source machine-learning toolkits?...
Scaling R with H2O
With the advent of H2O 3.0, it seems an appropriate time to reintroduce the R API for H2O to help users better understand the differences between R dataframes and H2OFrames. Typically some of the first questions we get include: Does H2O support all R packages and functions? Is H2OFrame an extension of data.frame? Are H2O supported algo...
Using H2O for Kaggle: Guest Post by Gaston Besanson and Tim Kreienkamp
This post also appears on the GSE Data Science Blog. In this special H2O guest blog post, Gaston Besanson and Tim Kreienkamp talk about their experience using H2O for competitive data science. They are both students in the new Master of Data Science Program at the Barcelona Graduate School of Economics and used H2O in an in-class Kaggle...
PyData Dallas 2015
H2O was in attendance last week at PyData in Dallas, Texas. Our CTO, Cliff Click, spoke at PyData about driving H2O from Python to perform feature engineering, group-bys, quantiles, and model building with H2O’s GBM, GLM, and Distributed Random Forest. We met a lot of great people and we are really excited to see the enthusiasm for H2O w...
Deep Learning for Public Safety
This article first appeared on KDnuggets. Contributors: Alex Tellez, Michal Malohlava, Prithvi Prabhu, Hank Roark, Amy Wang. Download full report. We’ve seen some incredible applications of Deep Learning in image recognition and machine translation, but this particular use case has to do with public safety; in particular, how De...
Culture
---------- Forwarded message ----------
From: SriSatish Ambati <srisatish@0xdata.com>
Date: Sun, Jun 1, 2014 at 12:29 PM
Subject: Re: jirassic hierarchy.
To: Kevin <kevin@0xdata.com>
Cc: Tom Kraljevic <tomk@0xdata.com>, engr <engr@0xdata.com>, team <team@0xdata.com>
The best cultures are ones where it feels like there isn’t any. Not saying scrum won’t fit,...
Sparkling Water Certified by Cloudera
Last month, before the H2O.ai team publicly announced Sparkling Water at Strata San Jose, we made sure that the product was backed and certified by some major partners. This includes approval from Databricks itself as well as Cloudera. Integration Testing for Cloudera: For Cloudera, testing was mainly geared toward deployment and sustaina...
The Definitive Performance Tuning Guide for H2O Deep Learning
This document gives guidelines for performance tuning of H2O Deep Learning, both in terms of speed and accuracy. It is intended for existing users of H2O Deep Learning (which is easy to change if you’re not), as it assumes some familiarity with the parameters and use cases. Motivation: This effort was in part motivated by a Deep Learn...
Strata San Jose 2015
I had a great time at Strata SJ 2015! I had a lot of fun answering questions and talking to enthusiastic and curious H2O users at our booth. It was great seeing how many people are involved in the H2O community, and I also really enjoyed drinking free margaritas at the booth crawl. The H2O team met some really great people with lots of dif...
How Does Java Both Optimize Hot Loops and Allow Debugging
This blog came about because an old friend is trying to figure out how Java can do aggressive loop and inlining optimizations, while allowing the loading of new code and setting breakpoints… in that optimized code. On 2/21/2015 11:04 AM, IG wrote: Safepoint. I’m still confused because I don’t understand what is the required state at a s...
Introducing First Fridays Hackathon with H2O
Greetings fellow ML/AI enthusiasts! This blog post serves two purposes: 1) to introduce our First Fridays initiative and 2) to recap our first 12-hour Hackathon! WHAT: The first Friday of each month, H2O.ai will hold a Hack-A-Thon from 1pm to 10pm (yep, you read that correctly!) whereby we invite ANYONE to come hack through a data problem with the H2...
Launching H2O with Docker
Hello world, again. H2O is already relatively easy to launch: all the user needs is a compatible Java version. Now that level of difficulty is reduced to nil. Jeff, our DevOps engineer, presented me with a Docker container for H2O, making it possible to ship H2O regardless of your environment setup. You can now launch H2O in an isolated e...