For your data analysis pleasure, I give you a giant list of super cool publicly available data. If you’re looking at the data sets and wondering “now what?” – you can find this list AND tutorials on how to use H2O for analysis at the H2O docs page (here: http://docs.0xdata.com). You can also get a detailed […]
Been to long between blogs… “TCP Is Not Reliable” – what's THAT mean? Means: I can cause TCP to reliably fail in under 5 mins, on at least 2 different modern Linux variants and on modern hardware, both in our datacenter (no hypervisor) and on EC2. What does “fail” mean? Means the client will open […]
With the REST API, it's simple to run H2O operations from within R using similar syntax to all your favorite R functions. In this post, we'll walk through a simple demo of its capabilities. First, get H2O installed and running by following the tutorial here. Once you have the R package loaded, you can take […]
Our resident R users will demonstrate how to use the R package and invoke big data modeling entirely from R. In this session our resident R & Math hacker, Anqi Fu will demonstrate the R API for H2O. Early users, community and customers of H2O have been invoking GLM, Random Forest and K-means from an […]
This post discusses the performance of H2O’s Random Forest  algorithm. We compare different versions of H2O as well as the RF implementation by wise.io. We use wall-clock time to measure work flows that match up with the user experience. A link to the scripts used is available here .
You may have noticed that we have a ton of stuff going on at 0xdata, including several upcoming meetups that I expect will be very well attended. I was feeling a little curious about who exactly would be attending. What are the common areas of interest, are our members mostly software people or data scientists? […]
I recently came across an article by Shaw et al, in Decision Support Systems (1). The article discussed the importance of data mining and information management to good customer relationship management in increasingly competitive markets. A key point of the paper that I agree with is the importance of heuristics in data mining, particularly in […]
Big Data Science at Frontier Real Time Streaming Meetup. 250 Big Data enthusiasts have signed up for a saturday presentation! Looks like it's going to be quite interesting presentation and panel!