Using Python’s datatable library seamlessly on Kaggle
February 3, 2021 Data Munging Data Science datatable
Managing large datasets on Kaggle without fear of out-of-memory errors
Speed up your Data Analysis with Python’s Datatable package
February 5, 2020 Data Munging Data Science datatable H2O Driverless AI
A while ago, I did a write-up on Python’s datatable library. The article was an overview of the datatable package, which focuses on big data support and high performance. It also compared datatable’s performance with that of the pandas library on certain parameters. This is the second article in the series, with a two-fold objective: In […]
Stacked Ensembles and Word2Vec now available in H2O!
February 8, 2017 Data Munging Ensembles H2O Release NLP Python R Technical
Prepared by: Erin LeDell and Navdeep Gill
Stacked Ensembles
H2O’s new Stacked Ensemble method is a supervised ensemble machine learning algorithm that finds the optimal combination of a collection of prediction algorithms using a process called stacking or “Super Learning.” This method currently supports regression and binary classification, and multiclass support is planned for a […]
Red herring bites
May 6, 2016 Data Munging R-Bloggers Technical
At the Bay Area R User Group in February, I presented progress on big-join in H2O, which is based on the algorithm in R’s data.table package. The presentation had two goals: i) describe one test in great detail so everyone understands what is being tested so they can judge if it is relevant to them […]
Fast csv writing for R
April 24, 2016 Data Munging R R-Bloggers Technical
R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do, and if they have a poor experience (either hard to use or very slow), they are less likely to progress. The data.table […]