Big Data Science at Frontier Real Time Streaming Meetup. 250 Big Data enthusiasts have signed up for a saturday presentation! Looks like it's going to be quite interesting presentation and panel!
We will take a simple yet popular & powerful math algorithm such as Linear Regression and implement a distributed version in 2hrs. Pre-requisites: Knowledge of Java or R See: http://h2o.0xdata.com/ Warning: Only software programmers ignore Warnings! 🙂 That said, this seriously is a very hands on java-intense exercise. Extinguished engineers may not enjoy the proceedings. […]
Using the Million Songs Data we want to characterize a subset of the songs. To do this we’re going to run a binomial regression in H2O’s GLM. The approach to characterizing songs from the 90’s is the same method you can apply to your own data to characterize your customers relative to some larger group. In […]
In any field where data collection is dependent on what your clients, customers, public, whomever …. tell you, there’s the risk that people are big fat fibbers. This often happens because people respond they way they think they SHOULD rather than with their own personal truths. Social sciences and marketing people call this phenomenon social […]
Using the Million Songs Data Set I want to go from beginning to end through H2O's GLM tool. Note that the original data are large, so downloading and fiddling with the full data set can be quite painful if you just do it from your desktop, that said you can find it here. It’s a good […]
All in the day: Anqi Fu, our wickedly smart Math & Data Science hacker-intern from Stanford this summer, was characterizing GLMNet in R on sparse data and comparing with other tools. We were using a data sets predicting Two Bedroom median rent based on neighborhoods from huduser.org. DATA: http://www.huduser.org/portal/datasets/fmr/CensusRentData/index.html She found the analysis brisk and […]
Building A TB-Scale Math Platform Datasets have gotten to PB-scale, but the modeling you can do has been limited to a single-node (e.g. R, SAS) or stuck inside the database or takes hours on Hadoop-like technologies. We have built a simple clustering package, and are using it to do distributed analytics on the sum of […]
Experience a hands-on hack data session using H2O & R at BigDataBootCamp by GlobalBigDataConference. Every few months, Sridhar puts together a content-rich conference filled with highly engaged audience. This weekend Globalbigdataconference is doing a BigDataBootCamp – Tickets are on sale. Sri brings H2O & R to this audience, munging a couple of datasets for insight. […]