0xdata and Yelp – Machine Learning for Relevance and Serendipity/Distributed Gradient Boosting
October 31, 2013 Uncategorized [EN]Join us and Yelp for a chat on Machine Learning, and make sure not to miss Sri’s lightning talk on Distributed Gradient Boosting! Main Talk: Machine Learning for Relevance and Serendipity Speaker: Aria Haghighi (Prismatic) Abstract: Careful use of well-designed machine learning systems can transform products by providing highly personalized user experiences. Unlike hand-tuned or heuristic-based personalization […]
Our data, our math // our tools, our science!
October 30, 2013 Uncategorized [EN]Big data has always been with us. Our race's answer to the data explosion came through math & computation. Whether it was Newton's calculus, Einstein's Relativity or Shannon's Information Theory, each generation's answer to its big data problem arose from its best and brightest. Our generation's challenge is here. Our lives are mired in data. If […]
Building a Distributed GBM on H2O
October 29, 2013 Uncategorized [EN]At 0xdata we build state-of-the-art distributed algorithms – and recently we embarked on building GBM, an algorithm notorious for being impossible to parallelize, much less distribute. We built the algorithm shown in Elements of Statistical Learning II by Trevor Hastie, Robert Tibshirani, and Jerome Friedman, on page 387 (shown at the bottom of this post). Most […]
An API For Distributed Analytics
October 28, 2013 Uncategorized [EN]There are so many APIs to choose from… Features of the space: Lots of data – which I’ll qualify as “bigger than 1 machine” and thus needing parallel I/O, parallel memory, & parallel compute – and distributed algorithms. Ease of programming; hide details (but expose them when wanted). High level for ease-of-use, but “under the […]
Strata NYC & Hadoop World: How to Stop Worrying and Start Modeling Big Data with Better Algorithms and H2O
October 25, 2013 Uncategorized [EN]Srisatish Ambati (0xdata Inc), Cliff Click (0xdata Inc). 5:05pm Tuesday, 10/29/2013. Data Science. Beekman Parlor – Sutton North. Data modeling has been constrained by scale; sampling still rules the day for ad hoc analytics. Scale brings much needed change to the modeling […]
GBM on Ecology – Recreating a model made for R
October 22, 2013 Uncategorized [EN]In the last couple of weeks we’ve had two meetups on GBM (gradient boosted classification and regression), and hence a lot of excitement about running the algorithm as presented by Cliff, Earl and Dr. Hastie. You can find the hella cool videos of both presentations here: http://www.youtube.com/0xdata One of my favorite articles on GBM is a […]
NYC Big Data Meetup – Distributed Random Forest, GBM, GLM & API for Big Data Algos
October 22, 2013 Uncategorized [EN]Distributed Machine Learning has come of age. Just in time to meet the challenges of Big Data, we present an API for extending and rolling your own algorithms, or for using the powerful, contest-winning Gradient Boosting Machine, Generalized Linear Modeling and Random Forest at scale. Demo and fireworks using big datasets from within the familiar R interface […]
Join Us Tomorrow at Trulia – Distributed GBM!
October 16, 2013 Uncategorized [EN]Hi hackers! Just a quick reminder that we’ll be joining our friends at Trulia tomorrow for a machine learning meetup on Distributed GBM. GBM is one of the most popular machine learning algorithms used in data mining competitions. Most of us use GBM through its R implementation. However, we have recently written a distributed version for […]