H2O GBM Tuning Tutorial for R
by Arno Candel June 16, 2016 GBM R Technical Tutorials

In this tutorial, we show how to build a well-tuned H2O GBM model for a supervised classification task. We specifically don’t focus on feature engineering and use a small dataset to allow you to reproduce these results in a few minutes on a laptop. This script can be directly transferred to datasets that are hundreds […]

Hyperparameter Optimization in H2O: Grid Search, Random Search and the Future
by Raymond Peck June 16, 2016 R-Bloggers Technical Tutorials

“Good, better, best. Never let it rest. ‘Til your good is better and your better is best.” – St. Jerome

tl;dr: H2O now has random hyperparameter search with time- and metric-based early stopping.

Bergstra and Bengio[1] write on p. 281: Compared with neural networks configured by a pure grid search, we find that random search […]

Spam Detection with Sparkling Water and Spark Machine Learning Pipelines
by Jakub Hava June 15, 2016 Sparkling Water Technical Tutorials

This short post presents the “ham or spam” demo, posted earlier by Michal Malohlava, using our new API in the latest Sparkling Water for Spark 1.6 and earlier versions, which unifies Spark and H2O Machine Learning pipelines. It shows how to create a simple Spark Machine Learning pipeline and a model based on […]

Red herring bites
by Matt Dowle May 6, 2016 Data Munging R-Bloggers Technical

At the Bay Area R User Group in February, I presented progress on big-join in H2O, which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested and can judge if it is relevant to them […]

Fast csv writing for R
by Matt Dowle April 24, 2016 Data Munging R R-Bloggers Technical

R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor experience (either hard to use, or very slow) they are less likely to progress. The data.table […]
