
By: H2O.ai
An API for Distributed Computing
We have defined an API and built an open-source platform for working with in-memory distributed data. We’ve used it to build state-of-the-art predictive modeling and analytics (e.g. GLMNET, GBM, Random Forest) that’s 1000x faster than the disk-bound alternatives, and 100x faster than R (we love R, but it’s too slow on big data!). We’re building our newest algorithms in a few weeks, start to finish, because the platform makes Big Math easy. We routinely test on 100 GB datasets, have customers using 200 GB datasets, and have tested even larger ones in the lab.
This talk is about a coding style and API that lets us seamlessly handle datasets from 1 KB to 1 TB without changing a line of code. The same code runs on anything from your laptop to 50-server clusters with many terabytes of RAM and hundreds of CPUs.
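To make the "same code at any scale" idea concrete, here is a minimal sketch in Python. It is not the platform's actual API: the function names (`chunked`, `map_chunk`, `combine`, `distributed_mean`) are hypothetical, and a local thread pool stands in for cluster nodes. The point is only the coding style: per-chunk work plus an associative merge, so the calling code is identical whether there is one chunk or thousands.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def chunked(values, chunk_size):
    """Split a sequence into fixed-size chunks (stand-ins for data held on different nodes)."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def map_chunk(chunk):
    """Per-chunk work: compute a partial (count, sum) for one chunk."""
    return (len(chunk), sum(chunk))


def combine(a, b):
    """Merge two partial results. The merge is associative, so grouping does not matter."""
    return (a[0] + b[0], a[1] + b[1])


def distributed_mean(values, chunk_size=4, workers=2):
    """Mean of `values`, computed chunk by chunk and then reduced.

    Hypothetical illustration: a thread pool plays the role of the cluster.
    """
    chunks = chunked(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks))
    count, total = reduce(combine, partials, (0, 0))
    return total / count if count else float("nan")


if __name__ == "__main__":
    # Same call for a tiny dataset and a larger one; only the sizes change.
    print(distributed_mean([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
    print(distributed_mean(list(range(1000)), chunk_size=64, workers=8))
```

Because `combine` is associative, partial results can be merged in any grouping, which is what would let a real system reduce them across machines rather than threads.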
Talk objectives:
Learn about a platform and API for doing in-memory analytics on distributed data.
Target audience:
People who have data and want to do fast predictive modeling and analytics, or who just need a platform that lets them code against Big Data naturally.