Using Python’s datatable library seamlessly on Kaggle
by h2oai February 3, 2021 Data Munging Data Science datatable

Managing large datasets on Kaggle without fearing out-of-memory errors

Speed up your Data Analysis with Python’s Datatable package
by h2oai February 5, 2020 Data Munging Data Science datatable H2O Driverless AI

A while ago, I did a write-up on Python’s datatable library. That article was an overview of the datatable package, whose focus is on big data support and high performance. It also compared datatable’s performance with that of the pandas library on certain parameters. This is the second article in the series, with a two-fold objective: In […]

Stacked Ensembles and Word2Vec now available in H2O!
by Erin LeDell February 8, 2017 Data Munging Ensembles H2O Release NLP Python R Technical

Prepared by: Erin LeDell and Navdeep Gill. Stacked Ensembles: H2O’s new Stacked Ensemble method is a supervised ensemble machine learning algorithm that finds the optimal combination of a collection of prediction algorithms using a process called stacking or “Super Learning.” This method currently supports regression and binary classification, and multiclass support is planned for a […]

Red herring bites
by Matt Dowle May 6, 2016 Data Munging R-Bloggers Technical

At the Bay Area R User Group in February, I presented progress on big-join in H2O, which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so that everyone understands what is being tested and can judge whether it is relevant to them […]

Fast csv writing for R
by Matt Dowle April 24, 2016 Data Munging R R-Bloggers Technical

R has traditionally been very slow at reading and writing CSV files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do, and if they have a poor experience (either hard to use or very slow), they are less likely to progress. The data.table […]
