H2O4GPU is an open-source collection of GPU solvers created by H2O.ai. It builds on the easy-to-use scikit-learn API and its well-tested CPU-based algorithms. It can be used as a drop-in replacement for scikit-learn, with GPU support for a selected and growing set of algorithms. H2O4GPU inherits all existing scikit-learn algorithms and falls back to the CPU implementation when the GPU algorithm does not support an important scikit-learn class option.
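The fallback behavior described above can be pictured with a small sketch. This is not H2O4GPU's code: the classes `GpuKMeans` and `CpuKMeans`, the factory `make_kmeans`, and the set of options the GPU class accepts are all hypothetical names and values chosen for illustration. The only point is the pattern: try the GPU implementation first, and hand the same options to the CPU implementation if the GPU one rejects them.

```python
# Hypothetical illustration only: none of these names come from H2O4GPU.


class CpuKMeans:
    """Stand-in for a CPU estimator that accepts every option."""

    backend = "cpu"

    def __init__(self, n_clusters=8, init="k-means++"):
        self.n_clusters = n_clusters
        self.init = init


class GpuKMeans:
    """Stand-in for a GPU estimator that accepts only some options."""

    backend = "gpu"
    SUPPORTED_INIT = {"random"}  # assumed subset, for illustration

    def __init__(self, n_clusters=8, init="random"):
        if init not in self.SUPPORTED_INIT:
            raise NotImplementedError(f"init={init!r} is not available on GPU")
        self.n_clusters = n_clusters
        self.init = init


def make_kmeans(**options):
    """Prefer the GPU class; fall back to the CPU class on an unsupported option."""
    try:
        return GpuKMeans(**options)
    except NotImplementedError:
        return CpuKMeans(**options)


fast = make_kmeans(n_clusters=3, init="random")     # stays on the GPU class
safe = make_kmeans(n_clusters=3, init="k-means++")  # falls back to the CPU class
print(fast.backend, safe.backend)
```

Because both classes take the same constructor arguments, calling code does not need to know which backend it received, which is what makes a drop-in replacement possible.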

Today, select algorithms are GPU-enabled. These include Gradient Boosting Machines (GBMs), Generalized Linear Models (GLMs), and K-Means Clustering.



Currently Available: 

  • GLM (POGS)

  • GBM

  • Random Forest

  • k-Means clustering

  • PCA

  • SVD

  • Python API for training and scoring

  • R API for training and scoring

  • scikit-learn API for compatibility

  • Inference on GPU (GLM)

  • Inference on GPU (GBM)

Coming Q2 2018

  • k-Nearest Neighbors 

  • Matrix Factorization 

  • Factorization Machines 

  • Quantiles 

  • Kalman Filters 

  • Sort 

  • Aggregator 

  • API Support: 
    • GOAI API support 

    • data.table

  • Performance & Scalability: 
    • Multi-machine

Q4 2018

  • Kernel Methods 

  • Recommendation Engines – Non-Negative Matrix Factorization

  • Recommendation Engines – Bayesian Neural Nets

  • MCMC Solver 

  • Time Series 

  • SVM 

  • Text Analysis – TF-IDF

  • Text Analysis – Word2Vec

  • Text Analysis – Doc2Vec

  • Automatic K for K-means 

  • H2O GLM – Lasso 

  • Simulation Techniques 

  • Sampling Techniques 

  • Domain Specific Algorithms: 
    • Life Sciences 

    • Financial Services Underwriting 


Requirements

  • PC with Ubuntu 16.04+

  • NVIDIA GPU with Compute Capability >= 3.5

  • CUDA 8 or CUDA 9, installed with the bundled display drivers


Generalized Linear Model (GLM)

  • Framework uses the Proximal Graph Solver (POGS)

  • Solvers include Lasso, Ridge Regression, Logistic Regression, and Elastic Net Regularization

  • Improvements to original implementation of POGS:
    • Full alpha search

    • Cross Validation

    • Early Stopping

    • Added scikit-learn-like API

    • Supports multiple GPUs
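To make the link between Lasso, Ridge, and Elastic Net concrete, here is a minimal sketch of the scalar proximal operator for an elastic net penalty, the kind of building block that proximal solvers evaluate repeatedly. It is not taken from POGS or H2O4GPU. The function names and the parameterization, `lam * (alpha * |x| + (1 - alpha) / 2 * x**2)`, are assumptions made for illustration. Under that assumed form, `alpha = 1` gives the Lasso penalty and `alpha = 0` gives Ridge, so an alpha search can be read as sweeping between the two.

```python
def soft_threshold(v, t):
    """Shrink v toward zero by t; values within [-t, t] become exactly 0."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def prox_elastic_net(v, lam, alpha):
    """Minimizer over x of lam*(alpha*|x| + (1-alpha)/2*x**2) + (x - v)**2 / 2."""
    return soft_threshold(v, lam * alpha) / (1.0 + lam * (1.0 - alpha))


# Sweep alpha from pure Ridge (0.0) to pure Lasso (1.0) for one input value.
for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(alpha, prox_elastic_net(3.0, lam=1.0, alpha=alpha))
```

The L1 part of the penalty is what produces exact zeros (sparse coefficients), while the L2 part only rescales the result.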


Gradient Boosting Machines 

  • Based on XGBoost

  • Raw floating-point data is binned into quantiles

  • Quantiles are stored in compressed form rather than as floats

  • Compressed quantiles are transferred efficiently to the GPU

  • Sparsity is handled directly, with high GPU efficiency

  • Multi-GPU enabled by sharding rows across GPUs using NVIDIA NCCL AllReduce
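The binning and multi-GPU steps listed above can be sketched in plain Python. This is a toy model under stated assumptions, not the XGBoost or H2O4GPU implementation: the helper names (`quantile_cuts`, `to_bins`, `histogram`, `all_reduce_sum`) are hypothetical, the cut points use a simple rank rule, one byte per value stands in for the real compression scheme, and two list slices stand in for two GPUs. `all_reduce_sum` only imitates the effect of an AllReduce (every participant ends up with the elementwise sum); it does not call NCCL.

```python
from bisect import bisect_right


def quantile_cuts(values, n_bins):
    """Return n_bins - 1 cut points taken at evenly spaced ranks."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def to_bins(values, cuts):
    """Map each float to the index of the bin it falls into."""
    return [bisect_right(cuts, v) for v in values]


def histogram(bins, gradients, n_bins):
    """Sum the gradient of every row into the bin that row belongs to."""
    hist = [0.0] * n_bins
    for b, g in zip(bins, gradients):
        hist[b] += g
    return hist


def all_reduce_sum(per_device):
    """Elementwise sum of the per-device histograms."""
    return [sum(column) for column in zip(*per_device)]


values = [0.1, 0.4, 0.35, 0.8, 0.95, 0.2, 0.6, 0.7]
gradients = [1.0] * len(values)  # placeholder gradients for the sketch
n_bins = 4

cuts = quantile_cuts(values, n_bins)
bins = to_bins(values, cuts)
packed = bytes(bins)  # one byte per value instead of an 8-byte float

# Shard the rows across two simulated devices, then combine the histograms.
shards = [(bins[:4], gradients[:4]), (bins[4:], gradients[4:])]
local = [histogram(b, g, n_bins) for b, g in shards]
combined = all_reduce_sum(local)
print(cuts, bins, len(packed), combined)
```

Each device only ever touches its own rows. The histograms exchanged between devices have one entry per bin, so their size does not grow with the number of rows.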

k-Means Clustering 

  • Based on NVIDIA prototype of k-Means algorithm in CUDA

  • Improvements to original implementation:
    • Significantly faster than the scikit-learn implementation (50x) and other GPU implementations (5-10x)

    • Supports multiple GPUs
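For readers unfamiliar with the algorithm itself, the following is a minimal CPU sketch of the standard k-means iteration (Lloyd's algorithm): assign each point to its nearest center, then move each center to the mean of its points. It is not the NVIDIA prototype or the H2O4GPU code, and the function names are hypothetical. Initial centers are passed in explicitly to keep the sketch deterministic.

```python
def assign(points, centers):
    """Return, for each point, the index of the nearest center."""
    def sq_dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))

    return [
        min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
        for p in points
    ]


def update(points, labels, centers):
    """Move each center to the mean of its members; keep it if it has none."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centers.append(old)
            continue
        new_centers.append(
            tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
        )
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update steps until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
centers, labels = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers, labels)
```

The assignment step is independent for every point, which is the part that lends itself to GPU parallelism.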