# Generalized Linear Modeling with H2O

May 2020: Seventh Edition

## Contents

| Section | Title |
|---|---|
| 1 | Introduction |
| 2 | What is H2O? |
| 3 | Installation |
| 3.1 | Installation in R |
| 3.2 | Installation in Python |
| 3.3 | Pointing to a Different H2O Cluster |
| 3.4 | Example Code |
| 3.5 | Citation |
| 4 | Generalized Linear Models |
| 4.1 | Model Components |
| 4.2 | GLM in H2O |
| 4.3 | Model Fitting |
| 4.4 | Model Validation |
| 4.5 | Regularization |
| 4.5.1 | Lasso and Ridge Regression |
| 4.5.2 | Elastic Net Penalty |
| 4.6 | GLM Model Families |
| 4.6.1 | Linear Regression (Gaussian Family) |
| 4.6.2 | Logistic Regression (Binomial Family) |
| 4.6.3 | Fractional Logit Model (Fraction Binomial) |
| 4.6.4 | Logistic Ordinal Regression (Ordinal Family) |
| 4.6.5 | Multi-class classification (Multinomial Family) |
| 4.6.6 | Poisson Models |
| 4.6.7 | Gamma Models |
| 4.6.8 | Tweedie Models |
| 4.6.9 | Negative Binomial Models |
| 4.7 | Hierarchical GLM |
| 4.7.1 | Gaussian Family and Random Family in HGLM |
| 4.7.2 | H2O Implementation |
| 4.7.3 | Fixed and Random Coefficients Estimation |
| 4.7.4 | Estimation of Fixed Effect Dispersion Parameter/Variance |
| 4.7.5 | Estimation of Random Effect Dispersion Parameter/Variance |
| 4.7.6 | Fitting Algorithm Overview |
| 4.7.7 | Linear Mixed Model with Correlated Random Effect |
| 4.7.8 | HGLM Model Metrics |
| 4.7.9 | Mapping of Fitting Algorithm to the H2O-3 Implementation |
| 5 | Building GLM Models in H2O |
| 5.1 | Classification and Regression |
| 5.2 | Training and Validation Frames |
| 5.3 | Predictor and Response Variables |
| 5.3.1 | Categorical Variables |
| 5.5 | Regularization Parameters |
| 5.5.1 | Alpha and Lambda |
| 5.5.2 | Lambda Search |
| 5.6 | Solver Selection |
| 5.6.1 | Solver Details |
| 5.6.2 | Stopping Criteria |
| 5.7.1 | Standardizing Data |
| 5.7.2 | Auto-remove collinear columns |
| 5.7.3 | P-Values |
| 5.7.4 | K-fold Cross-Validation |
| 5.7.5 | Grid Search Over Alpha |
| 5.7.6 | Grid Search Over Lambda |
| 5.7.7 | Offsets |
| 5.7.8 | Row Weights |
| 5.7.9 | Coefficient Constraints |
| 5.7.10 | Proximal Operators |
| 6 | GLM Model Output |
| 6.1 | Coefficients and Normalized Coefficients |
| 6.2 | Model Statistics |
| 6.3 | Confusion Matrix |
| 6.4 | Scoring History |
| 7 | Making Predictions |
| 7.1 | Batch In-H2O Predictions |
| 7.2 | Low-latency Predictions using POJOs |
| 8 | Best Practices |
| 8.1 | Verifying Model Results |
| 9 | Implementation Details |
| 9.1 | Categorical Variables |
| 9.1.1 | Largest Categorical Speed Optimization |
| 9.2 | Performance Characteristics |
| 9.2.1 | IRLSM Solver |
| 9.2.2 | L-BFGS solver |
| 9.3 | FAQ |
| 10 | Appendix: Parameters |
| 11 | Acknowledgments |
| 12 | References |
| 13 | Authors |