Public Data Sets
August 16, 2013 Uncategorized
For your data analysis pleasure, I give you a giant list of super cool publicly available data. If you’re looking at the data sets and wondering “now what?” – you can find this list AND tutorials on how to use H2O for analysis at the H2O docs page (here: http://docs.0xdata.com). You can also get a detailed […]
TCP Is Not Reliable
August 16, 2013 Uncategorized
Been too long between blogs… “TCP Is Not Reliable” – what's THAT mean? Means: I can cause TCP to reliably fail in under 5 mins, on at least 2 different modern Linux variants and on modern hardware, both in our datacenter (no hypervisor) and on EC2. What does “fail” mean? Means the client will open […]
Run H2O From Within R
August 13, 2013 Uncategorized
With the REST API, it's simple to run H2O operations from within R using similar syntax to all your favorite R functions. In this post, we'll walk through a simple demo of its capabilities. First, get H2O installed and running by following the tutorial here. Once you have the R package loaded, you can take […]
Use R to run Better Algorithms on Big Data
August 12, 2013 Uncategorized
Our resident R users will demonstrate how to use the R package and invoke big data modeling entirely from R. In this session, our resident R & Math hacker, Anqi Fu, will demonstrate the R API for H2O. Early users, community and customers of H2O have been invoking GLM, Random Forest and K-means from an […]
Random Forest Measurements for the MNIST Dataset
August 8, 2013 Uncategorized
This post discusses the performance of H2O’s Random Forest [5] algorithm. We compare different versions of H2O as well as the RF implementation by wise.io. We use wall-clock time to measure workflows that match the user experience. A link to the scripts used is available here [1].
We the people: Our meetup member introductions
August 5, 2013 Uncategorized
You may have noticed that we have a ton of stuff going on at 0xdata, including several upcoming meetups that I expect will be very well attended. I was feeling a little curious about who exactly would be attending. What are the common areas of interest? Are our members mostly software people or data scientists? […]
Hey good looking; Visualization and Data Mining 1
August 1, 2013 Uncategorized
I recently came across an article by Shaw et al. in Decision Support Systems (1). The article discussed the importance of data mining and information management to good customer relationship management in increasingly competitive markets. A key point of the paper that I agree with is the importance of heuristics in data mining, particularly in […]
Big Data Cloud Computing Streaming Systems & Infrastructures
July 27, 2013 Uncategorized
Big Data Science at Frontier Real Time Streaming Meetup. 250 Big Data enthusiasts have signed up for a Saturday presentation! Looks like it's going to be quite an interesting presentation and panel!