tl;dr: H2O and LiblineaR have nearly identical predictive performance.
In this blog, we examine the single-node implementations of L2-regularized logistic regression (LR) by H2O and LiblineaR.
Both LiblineaR and H2O are driven from the R console on the same hardware and evaluated on the same datasets. We compare the regression coefficients and the predictive behavior (AUC, Precision, Recall, F1) on held-out data. Before diving into the performance comparison, let’s discuss some of the differences between the two packages.
Whooa… there shouldn’t be any modeling differences, right? Well, no, but there can be subtle implementation differences! Here we explain a few of the implementation details of H2O’s GLM and LiblineaR’s.
While we don’t focus on the distributed aspects of H2O here, it should be acknowledged that H2O’s GLM returns results as if the model were built on a single machine, retaining the higher-quality single-machine answer. H2O’s GLM uses Stephen Boyd’s ADMM solver, allows any combination of L1 and L2 regularization, performs automatic factor expansion (easily handling factors with thousands of levels), supports cross-validation, and can optionally perform a grid search over the parameters. H2O’s GLM also reports a variety of model evaluation metrics: AUC, AIC, error, by-class error, and deviances.
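For reference, H2O’s GLM minimizes the penalized objective in the usual elastic-net form (our notation, writing the per-observation negative log-likelihood as $$L$$; this is the standard formulation, so check H2O’s documentation for the exact scaling):

$$\min_{\beta}\; \frac{1}{n}\sum_{i=1}^{n} L\!\left(y_i,\, x_i^{\top}\beta\right) \;+\; \lambda\left(\alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2\right)$$

Setting $$\alpha = 0$$, as in every model below, leaves a pure L2 (ridge) penalty.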
How does H2O distribute GLM?
H2O builds the Gram matrix in a parallel and distributed way. The algorithm is essentially a two-step iterative process: build a Gram matrix, solve for the betas, and repeat until the betas converge. In a distributed setting with N nodes, each node computes a Gram matrix over its own data. The per-node Grams are then reduced (summed) together, and the result is bit-for-bit identical to computing everything locally. If you want more, here are some slides on what we implemented: http://www.slideshare.net/mobile/0xdata/glm-talk-tomas, and here is the implementation in our git: https://github.com/0xdata/h2o/tree/master/src/main/java/hex/glm
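To make the map-reduce structure concrete, here is a minimal single-process R sketch (plain R, not H2O code) that mimics the per-node Gram computation followed by the summation reduce:

```r
# Simulate a design matrix and split its rows into 4 chunks,
# standing in for the data local to each of 4 nodes.
set.seed(42)
X <- matrix(rnorm(1000 * 5), ncol = 5)
chunk_idx <- split(seq_len(nrow(X)), rep(1:4, each = 250))

# Map: each "node" builds the Gram matrix (X'X) over its own rows.
local_grams <- lapply(chunk_idx, function(i) crossprod(X[i, , drop = FALSE]))

# Reduce: matrix addition combines the per-node Grams.
gram <- Reduce(`+`, local_grams)

# Numerically equal to the Gram computed over all rows at once
# (H2O's claim is stronger: a fixed reduction order makes it bit-for-bit identical).
all.equal(gram, crossprod(X))  # TRUE
```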
LiblineaR is an R interface to LIBLINEAR, an open-source C/C++ library for large-scale linear classification. It is discussed extensively elsewhere [pdf], but we point out that it too has grid search capabilities and cross-validation.
In order to make fair comparisons, we match the input parameters between H2O and LiblineaR. Note that the cost parameter in LiblineaR is inversely proportional to the lambda used in H2O, scaled by the number of features in the model:
$$C = \cfrac{1}{(\ell \times \lambda)}$$
where $$C$$ is the cost parameter in LiblineaR, $$\ell$$ is the number of features, and $$\lambda$$ is the shrinkage parameter.
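As a quick sanity check, the conversion is a one-liner in R (a trivial helper of our own, not part of either package):

```r
# Cost in LiblineaR corresponding to a given H2O lambda,
# per the formula above: C = 1 / (ell * lambda).
lambda_to_cost <- function(ell, lambda) 1 / (ell * lambda)

lambda_to_cost(ell = 3, lambda = 1 / 300)  # 100, the cost used for the airlines models
```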
All comparisons were performed on a single machine with the following attributes (from /proc/cpuinfo):
processor : 31
vendor_id : GenuineIntel
cpu family : 6
model : 45
model name : Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
stepping : 7
microcode : 0x710
cpu MHz : 1200.000
cache size : 20480 KB
physical id : 1
siblings : 16
core id : 7
cpu cores : 8
apicid : 47
initial apicid : 47
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid
bogomips : 5199.90
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
We used R version 3.0.2 “Frisbee Sailing” to interface with both LiblineaR (version 1.93) and H2O (build 1064).
Driving H2O from within R is easy! Check out this blog post: http://0xdata.com/blog/2013/08/run-h2o-from-within-r/, these slides from a recent meetup on the subject: http://0xdata.com/blog/2013/08/big-data-science-in-h2o-with-r/, and, of course, the documentation: http://docs.0xdata.com/Ruser/Rwrapper.html
We used 3 datasets: Prostate, Sample Airlines (years 1987 – 2008), and Full Airlines (years 1987 – 2013). These data are publicly available to download (links at the end of this post). The parameters and models built on these datasets are as follows:
| | Prostate | Sample Airlines (’87 – ’08) | Full Airlines (’87 – ’13) |
| --- | --- | --- | --- |
| Features in Model | 6 | 3 | 3 |
| Number of Training Instances | 306 | 24,442 | 128,654,471 |
| Number of Testing Instances | 76 | 2,692 | 14,290,947 |
Prostate parameters:

| H2O | LiblineaR |
| --- | --- |
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 1 / 700 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |
Sample Airlines parameters:

| H2O | LiblineaR |
| --- | --- |
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 0.0033333 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |
Full Airlines parameters:

| H2O | LiblineaR |
| --- | --- |
| family = binomial | type = 0 |
| link = logit | .. |
| lambda = 0.0033333 | cost = 100 |
| alpha = 0.0 | .. |
| beta_epsilon = 1E-4 | epsilon = 1E-4 |
| nfolds = 1 | cross = 0 |
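Put together, the matched calls look roughly like the sketch below. The LiblineaR call follows its documented interface (older releases name the response argument labels rather than target); the h2o.glm argument names have shifted across H2O versions, so treat them as an approximation and consult the documentation for your installed build. The file path is illustrative, and the column names are taken from the Prostate results below.

```r
library(LiblineaR)
library(h2o)

# --- LiblineaR: type = 0 is L2-regularized logistic regression ---
# x: numeric feature matrix; y: binary response vector (assumed already loaded).
ll_fit <- LiblineaR(data = x, target = y, type = 0,
                    cost = 100, epsilon = 1e-4, cross = 0)

# --- H2O: binomial GLM with a pure L2 penalty (alpha = 0) ---
h2o.init()
train_hex <- h2o.importFile(path = "prostate_train.csv")  # illustrative path
h2o_fit <- h2o.glm(x = c("AGE", "DPROS", "DCAPS", "PSA", "VOL", "GLEASON"),
                   y = "CAPSULE",             # response column in the prostate data
                   training_frame = train_hex,
                   family = "binomial", lambda = 1 / 700,
                   alpha = 0, beta_epsilon = 1e-4, nfolds = 1)
```

Predictions on the hold-out sets then come from predict(ll_fit, newx) and h2o.predict(h2o_fit, test_hex), respectively.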
Prostate results:

| Betas | AGE | DPROS | DCAPS | PSA | VOL | GLEASON | INTERCEPT |
| --- | --- | --- | --- | --- | --- | --- | --- |
| H2O | -0.06725409 | 0.5742158 | 0.1369673 | 0.4041241 | -0.2270453 | 1.170544 | -0.4930266 |
| LiblineaR | 0.06878511 | -0.582572 | -0.1335687 | -0.4056746 | 0.2309275 | -1.197098 | 0.4969579 |
Mean relative difference: 0.01601093 (note the flipped signs: LiblineaR evidently treated the opposite class as the positive one, since LIBLINEAR orders class labels by their first appearance in the training data, so its betas are the negation of H2O’s; the mean relative difference is computed after aligning signs).
| Test Evaluation | AUC | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| H2O | 0.6907796 | 0.7608696 | 0.7608696 | 0.7608696 |
| LiblineaR | 0.6907796 | 0.7608696 | 0.7608696 | 0.7608696 |
Sample Airlines results:

| Betas | DepTime | ArrTime | Distance | Intercept |
| --- | --- | --- | --- | --- |
| H2O | 0.29061806 | -0.027987806 | 0.1360023 | 0.19251044 |
| LiblineaR | 0.29585398 | -0.032675851 | 0.1373844 | 0.19258853 |
Mean relative difference: 0.01759207
| Test Evaluation | AUC | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| H2O | 0.57245362 | 0.48479869 | 0.54078827 | 0.51126516 |
| LiblineaR | 0.56406416 | 0.35743632 | 0.56274256 | 0.43718593 |
Full Airlines results:

| Betas | DepTime | ArrTime | Distance | Intercept |
| --- | --- | --- | --- | --- |
| H2O | 0.3736 | 0.0233 | 0.1317 | -0.3933 |
| LiblineaR | 0.377 | 0.0209 | 0.132 | -0.393 |
Mean relative difference: 0.006942185
| Test Evaluation | AUC | Precision | Recall | F1 Score |
| --- | --- | --- | --- | --- |
| H2O | 0.587 | 0.527 | 0.686 | 0.596 |
| LiblineaR | 0.552 | 0.841 | 0.625 | 0.717 |
We can see that H2O and LiblineaR do not vary much from one another: the coefficients have a small mean relative difference of roughly 1 – 2%. We would typically expect the objective functions being minimized to match exactly while allowing for small differences in the fitted coefficients (here the betas generally agree to within $$10^{-3}$$). What we emphasize is the similarity in predictive power: the AUCs above are all nearly identical.
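For reference, the threshold-based metrics in the tables above can be computed from predicted and true labels as follows (a generic sketch of the standard definitions; the actual evaluation code is in the R scripts linked below):

```r
# Precision, recall, and F1 for a binary classifier, given 0/1 vectors
# of true labels and predicted labels.
precision_recall_f1 <- function(truth, pred) {
  tp <- sum(pred == 1 & truth == 1)  # true positives
  fp <- sum(pred == 1 & truth == 0)  # false positives
  fn <- sum(pred == 0 & truth == 1)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  f1        <- 2 * precision * recall / (precision + recall)
  c(precision = precision, recall = recall, F1 = f1)
}

precision_recall_f1(truth = c(1, 0, 1, 1, 0), pred = c(1, 0, 0, 1, 1))
```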
It would be informative to add a third reference implementation (e.g., glmnet) to bolster the comparisons. As a first stab at comparing H2O and LiblineaR, this post is by no means complete; we will continue to add suitable datasets to the comparison and report runtime benchmarks as well.
Additionally, we have skipped over a couple of obvious things: no categorical features were used, and the models aren’t very good. For this comparison we stripped things down to the bare minimum and studied non-categorical data only (expanding categoricals for LiblineaR is something we will tackle in the future). All modeling was done by simply fixing the cost parameter at 100 and proceeding from there (there is nothing magic about $$C = 100$$).
The data are here: https://s3.amazonaws.com/h2o-bench/blog-2013-10-10
And the R scripts are here: https://github.com/0xdata/h2o/tree/master/R/tests