H2O4GPU
H2O4GPU is an open-source collection of GPU solvers created by H2O.ai. It builds on the easy-to-use scikit-learn API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn, with GPU support for a select (and growing) set of algorithms. H2O4GPU inherits all the existing scikit-learn algorithms and falls back to the CPU algorithm when the GPU version does not support an important existing scikit-learn class option.
Today, select algorithms are GPU-enabled. These include Gradient Boosting Machines (GBMs), Generalized Linear Models (GLMs), and K-Means Clustering.
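The drop-in behavior described above comes down to a dispatch rule: use the GPU solver when it can honor every requested option, otherwise fall back to the CPU implementation. The Python sketch below illustrates that rule. It makes several assumptions. The `KMeans` dataclass, the `GPU_SUPPORTED` table, and the option values in it are hypothetical placeholders. They are not H2O4GPU's actual classes or its real list of supported options.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical table of option values a GPU backend can handle.
# These names are illustrative assumptions, not H2O4GPU's real option list.
GPU_SUPPORTED: Dict[str, Set[str]] = {
    "init": {"random", "k-means++"},
    "algorithm": {"lloyd"},
}


@dataclass
class KMeans:
    """Hypothetical drop-in estimator with a scikit-learn-like constructor."""
    n_clusters: int = 8
    init: str = "k-means++"
    algorithm: str = "lloyd"
    backend: str = field(init=False, default="gpu")
    unsupported: List[str] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        # Collect every requested option the GPU path cannot honor.
        requested = {"init": self.init, "algorithm": self.algorithm}
        self.unsupported = [
            name for name, value in requested.items()
            if value not in GPU_SUPPORTED.get(name, set())
        ]
        # Fall back to the CPU implementation rather than silently
        # ignoring an option the caller asked for.
        self.backend = "cpu" if self.unsupported else "gpu"


gpu_model = KMeans(n_clusters=3)
cpu_model = KMeans(n_clusters=3, algorithm="elkan")
print(gpu_model.backend, cpu_model.backend, cpu_model.unsupported)
# prints: gpu cpu ['algorithm']
```

The sketch keeps the list of unsupported options instead of a single boolean. That makes it easy to report why a given model ran on the CPU.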
H2O4GPU ROAD MAP
Currently Available:
- Algorithms:
  - GLM (POGS)
  - GBM
  - Random Forest
  - k-Means clustering
  - PCA
  - SVD
- API Support:
  - Python API for training and scoring
  - R API for training and scoring
  - scikit-learn API for compatibility
- Inference on GPU:
  - GLM
  - GBM
Coming Q2 2018
- k-Nearest Neighbors
- Matrix Factorization
- Factorization Machines
- Quantiles
- Kalman Filters
- Sort
- Aggregator
- API Support:
  - GOAI API support
  - Data.table
- Performance & Scalability:
  - Multi-machine
Coming Q4 2018
- Kernel Methods
- Recommendation Engines – Non-Negative Matrix Factorization
- Recommendation Engines – Bayesian Neural Nets
- MCMC Solver
- Time Series
- SVM
- Text Analysis – TF-IDF
- Text Analysis – Word2Vec
- Text Analysis – Doc2Vec
- Automatic K for K-means
- H2O GLM – Lasso
- Simulation Techniques
- Sampling Techniques
- Domain-Specific Algorithms:
  - Life Sciences
  - Financial Services Underwriting
SPECIFICATIONS
Software
- PC with Ubuntu 16.04+
- CUDA 8 or CUDA 9, installed with the bundled display drivers
Hardware
- NVIDIA GPU with Compute Capability >= 3.5
Generalized Linear Model (GLM)
- Framework utilizes the Proximal Graph Solver (POGS)
- Solvers include Lasso, Ridge Regression, Logistic Regression, and Elastic Net Regularization
- Improvements to the original implementation of POGS:
  - Full alpha search
  - Cross-validation
  - Early stopping
  - Added scikit-learn-like API
  - Supports multiple GPUs
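POGS solves the GLM objective with an ADMM-based proximal method. The Python sketch below does not reproduce POGS. Instead it swaps in plain proximal gradient descent (ISTA), a simpler proximal method, to show two ideas from the GLM list:

- the elastic-net proximal step;
- an alpha search over the L1/L2 mixing parameter, with a validation set and early stopping.

The sketch rests on these assumptions:

- It minimizes the glmnet-style objective `(1/2n)*||y - Xw||^2 + lam*(alpha*||w||_1 + (1 - alpha)/2*||w||_2^2)`.
- The data is centered, so there is no intercept.
- A single holdout split stands in for full k-fold cross-validation.
- All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
Vector = List[float]


def predict(X: Matrix, w: Sequence[float]) -> Vector:
    """Linear predictions X @ w (no intercept; data assumed centered)."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in X]


def objective(X: Matrix, y: Sequence[float], w: Sequence[float],
              lam: float, alpha: float) -> float:
    """(1/2n)||y - Xw||^2 + lam * (alpha*||w||_1 + (1-alpha)/2 * ||w||_2^2)."""
    n = len(X)
    loss = sum((p - t) ** 2 for p, t in zip(predict(X, w), y)) / (2 * n)
    l1 = sum(abs(v) for v in w)
    l2 = sum(v * v for v in w)
    return loss + lam * (alpha * l1 + (1 - alpha) / 2 * l2)


def soft_threshold(v: float, k: float) -> float:
    """Proximal operator of k*|w| for a single coordinate."""
    return math.copysign(max(abs(v) - k, 0.0), v)


def elastic_net_fit(X: Matrix, y: Sequence[float], lam: float, alpha: float,
                    max_iter: int = 1000, tol: float = 1e-10) -> Vector:
    """Proximal gradient descent (ISTA) with early stopping on the objective."""
    n, p = len(X), len(X[0])
    # Step 1/L, using ||X||_F^2 / n as a safe upper bound on the Lipschitz constant.
    step = n / sum(x * x for row in X for x in row)
    w = [0.0] * p
    prev = objective(X, y, w, lam, alpha)
    for _ in range(max_iter):
        resid = [pi - yi for pi, yi in zip(predict(X, w), y)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]
        # Elastic-net prox: soft-threshold for the L1 part, then shrink for L2.
        w = [soft_threshold(w[j] - step * grad[j], step * lam * alpha)
             / (1 + step * lam * (1 - alpha)) for j in range(p)]
        cur = objective(X, y, w, lam, alpha)
        if prev - cur < tol:  # early stopping: the objective has stalled
            break
        prev = cur
    return w


def alpha_search(X_tr: Matrix, y_tr: Sequence[float], X_val: Matrix,
                 y_val: Sequence[float], lam: float,
                 alphas: Sequence[float]) -> Tuple[float, float, Vector]:
    """Fit one model per alpha; keep the one with the lowest validation MSE."""
    best = None
    for alpha in alphas:
        w = elastic_net_fit(X_tr, y_tr, lam, alpha)
        errs = [p - t for p, t in zip(predict(X_val, w), y_val)]
        mse = sum(e * e for e in errs) / len(errs)
        if best is None or mse < best[1]:
            best = (alpha, mse, w)
    return best


# Toy data generated from y = 2*x0 (x1 is irrelevant).
X_train = [[1, 0], [0, 1], [2, 1], [-1, 2], [3, -1]]
y_train = [2, 0, 4, -2, 6]
X_valid = [[1, 1], [-2, 0]]
y_valid = [2, -4]
best_alpha, best_mse, weights = alpha_search(
    X_train, y_train, X_valid, y_valid, lam=0.01, alphas=[0.0, 0.5, 1.0])
print(best_alpha, [round(v, 3) for v in weights])
```

On this toy data the pure-L1 end of the grid should keep `x0` near 2 and drive the irrelevant `x1` weight to zero. The search is a brute-force loop over candidate alphas. A production solver would typically warm-start each fit from the previous solution.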
Gradient Boosting Machines
- Based on XGBoost
- Raw floating-point data is binned into quantiles
- Quantiles are stored in compressed form instead of as floats
- Compressed quantiles are transferred to the GPU efficiently
- Sparsity is handled directly, with high GPU efficiency
- Multi-GPU support shards rows across GPUs; partial results are combined with NVIDIA NCCL AllReduce
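The GBM list describes XGBoost-style histogram preparation. The Python sketch below illustrates it under these simplifying assumptions:

- Exact quantiles of a sorted sample stand in for XGBoost's weighted quantile sketch.
- Bin indices are bit-packed into bytes to show why compressed quantiles take less space than 32-bit floats.
- The multi-GPU step assumes, as in XGBoost's histogram method, that each device builds per-bin gradient sums over its own rows. The sketch splits rows into two shards and combines the shard results with `all_reduce_sum`, a hypothetical stand-in for NCCL AllReduce.

None of the function names are H2O4GPU or XGBoost APIs.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_cuts(values: Sequence[float], n_bins: int) -> List[float]:
    """Cut points at evenly spaced ranks of the sorted data (simplified quantiles)."""
    ordered = sorted(values)
    return [ordered[len(ordered) * i // n_bins] for i in range(1, n_bins)]


def to_bins(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Replace each raw float by the index of the quantile bin it falls into."""
    return [bisect_right(cuts, v) for v in values]


def pack(bins: Sequence[int], bits: int) -> bytes:
    """Bit-pack bin indices using `bits` bits each instead of a 32-bit float."""
    acc = 0
    for i, b in enumerate(bins):
        acc |= b << (i * bits)
    return acc.to_bytes((len(bins) * bits + 7) // 8, "little")


def unpack(blob: bytes, bits: int, count: int) -> List[int]:
    """Recover `count` bin indices from a packed byte string."""
    acc = int.from_bytes(blob, "little")
    mask = (1 << bits) - 1
    return [(acc >> (i * bits)) & mask for i in range(count)]


def gradient_histogram(bins: Sequence[int], grads: Sequence[float],
                       n_bins: int) -> List[float]:
    """Sum of gradients per bin over the rows one device owns."""
    hist = [0.0] * n_bins
    for b, g in zip(bins, grads):
        hist[b] += g
    return hist


def all_reduce_sum(hists: Sequence[Sequence[float]]) -> List[List[float]]:
    """Stand-in for an AllReduce: every shard receives the element-wise sum."""
    total = [sum(col) for col in zip(*hists)]
    return [list(total) for _ in hists]


values = [0.5, 2.0, 3.7, 1.2, 9.9, 4.4, 0.1, 7.3]
grads = [0.5, -1.0, 2.0, 0.25, -0.5, 1.0, 1.5, -2.0]
n_bins = 4
bits = (n_bins - 1).bit_length()  # 2 bits per index for 4 bins
cuts = quantile_cuts(values, n_bins)
bins = to_bins(values, cuts)
blob = pack(bins, bits)
print(len(blob), "bytes packed vs", 4 * len(values), "bytes as float32")
# prints: 2 bytes packed vs 32 bytes as float32

# Two hypothetical GPUs, each owning half of the rows.
local = [gradient_histogram(bins[:4], grads[:4], n_bins),
         gradient_histogram(bins[4:], grads[4:], n_bins)]
reduced = all_reduce_sum(local)
print(reduced[0])
# prints: [2.0, -0.75, 3.0, -2.5]
```

After the reduction, every shard holds the same histogram that a single device would have built from all rows. That shared histogram is what allows each device to evaluate the same split candidates.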
k-Means Clustering
- Based on an NVIDIA prototype of the k-Means algorithm in CUDA
- Improvements to the original implementation:
  - Significantly faster than the scikit-learn implementation (50x) and other GPU implementations (5-10x)
  - Supports multiple GPUs
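As an illustration only, the Python sketch below shows how k-means work can be split across multiple GPUs. It runs Lloyd's algorithm, a standard k-means method that is assumed here rather than confirmed as the one H2O4GPU uses. The steps are:

1. Rows are split round-robin into `n_shards` hypothetical shards.
2. Each shard computes per-cluster coordinate sums and counts, the way each GPU could work on its own rows.
3. A plain summation stands in for the cross-GPU reduction.

Plain Python replaces CUDA, and all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def nearest(point: Point, centroids: List[List[float]]) -> int:
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))


def shard_stats(shard: Sequence[Point], centroids: List[List[float]]
                ) -> Tuple[List[List[float]], List[int]]:
    """Work done on one (hypothetical) device: per-cluster sums and counts."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for point in shard:
        c = nearest(point, centroids)
        counts[c] += 1
        for j in range(d):
            sums[c][j] += point[j]
    return sums, counts


def sharded_kmeans(points: Sequence[Point], init: Sequence[Point],
                   n_shards: int = 2, max_iter: int = 100) -> List[List[float]]:
    """Lloyd's algorithm with rows split across shards and partial results summed."""
    shards = [points[i::n_shards] for i in range(n_shards)]
    centroids = [[float(v) for v in c] for c in init]
    k, d = len(centroids), len(centroids[0])
    for _ in range(max_iter):
        stats = [shard_stats(s, centroids) for s in shards]
        # Reduction step (stand-in for a cross-GPU sum): add up shard results.
        counts = [sum(cnt[c] for _, cnt in stats) for c in range(k)]
        updated = []
        for c in range(k):
            if counts[c] == 0:
                updated.append(centroids[c])  # empty cluster keeps its centroid
            else:
                updated.append([sum(s[c][j] for s, _ in stats) / counts[c]
                                for j in range(d)])
        if updated == centroids:  # no centroid moved: converged
            break
        centroids = updated
    return centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers = sharded_kmeans(points, init=[(0, 0), (10, 10)], n_shards=2)
print([[round(v, 3) for v in c] for c in centers])
# prints: [[0.333, 0.333], [10.333, 10.333]]
```

Only the small per-cluster sums and counts need to cross device boundaries, not the rows themselves. This is why the row-sharded layout scales well.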