This post discusses the performance of H2O’s Random Forest algorithm. We compare different versions of H2O as well as the Random Forest implementation by wise.io. We use wall-clock time to measure workflows that match up with the user experience. The scripts used are available here.
The original MNIST data (60,000 28×28 images of hand-written digits) was expanded into 8.1 million instances by thickening, dilating, skewing, and contracting the original images as described here.
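The kinds of distortions described above can be sketched in a few lines of NumPy. The transforms below (a 4-neighborhood dilation/erosion and a row-wise shear) are illustrative stand-ins, not the exact pipeline used to build mnist8m:

```python
import numpy as np

def thicken(img):
    """Dilate strokes: each pixel takes the max over its 4-neighborhood."""
    p = np.pad(img, 1)
    return np.max([p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
                   p[1:-1, :-2], p[1:-1, 2:]], axis=0)

def contract(img):
    """Erode strokes: each pixel takes the min over its 4-neighborhood."""
    p = np.pad(img, 1, constant_values=int(img.max()))
    return np.min([p[1:-1, 1:-1], p[:-2, 1:-1], p[2:, 1:-1],
                   p[1:-1, :-2], p[1:-1, 2:]], axis=0)

def skew(img, factor=0.15):
    """Shear horizontally: shift each row in proportion to its index."""
    return np.array([np.roll(row, int(factor * i))
                     for i, row in enumerate(img)])

digit = np.zeros((28, 28), dtype=np.uint8)
digit[6:22, 13:16] = 255          # a crude vertical stroke
variants = [thicken(digit), contract(digit), skew(digit)]
```

Applying several such transforms to each of the 60,000 originals is how a dataset can grow by two orders of magnitude while keeping the original labels.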
This expanded MNIST dataset is available here. There are 784 features with values in the range 0–255. The data was split into testing and training sets following the methodology described here.
Dataset Name: mnist8m
Number of Features: 784
Number of Training Observations: 7,000,000
Number of Testing Observations: 1,100,000
Number of Classes: 10
The following parameters were used for comparing the H2O RF and WiseRF algorithms. These parameters were chosen using the methodology described here.
This methodology was previously shown to produce very low error rates on the test data (less than one-tenth of one percent in all cases).
Tests were performed on an Amazon EC2 instance. Since individual runs in EC2 can experience variability, each configuration was run 10 times. The graphs in the “Speed” section below show box plots for each configuration (each box represents 10 runs).
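A minimal harness for this kind of repeated measurement might look like the following; the lambda here is a stand-in workload, whereas the real runs launched full H2O or WiseRF jobs:

```python
import statistics
import time

def time_runs(workload, n=10):
    """Wall-clock the same workload n times and return the durations (seconds)."""
    durations = []
    for _ in range(n):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations

# Stand-in workload; each configuration in the benchmark was run 10 times.
runs = time_runs(lambda: sum(i * i for i in range(100_000)), n=10)
median_s = statistics.median(runs)
```

Summarizing each box with its median, as the graphs do, keeps one unusually slow EC2 run from distorting the comparison.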
H2O RF:
depth: 2147483647 (no limit)
bin limit: 1024

WiseRF:
max depth: 0 (no limit)
min node size: 1
feature type: uchar, float, double
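For reference, the settings above can be collected into plain mappings. The key names below are our own shorthand, not either tool's actual configuration syntax:

```python
# Shorthand for the benchmark settings; key names are illustrative,
# not either tool's real configuration vocabulary.
h2o_rf_params = {
    "depth": 2147483647,   # effectively unlimited tree depth
    "bin_limit": 1024,     # histogram bin limit per feature
}
wiserf_params = {
    "max_depth": 0,        # 0 means no depth limit
    "min_node_size": 1,    # grow trees to pure leaves
    "feature_type": ["uchar", "float", "double"],  # the three variants tested
}
```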
The following graphs show the measurements for H2O’s RF and WiseRF. All measurements are wall-clock times. The graphs are broken down into overall run time, the time to parse the training file, the time to train the model, and the time to parse and score a test file. Note: graphs are annotated with median times.
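One way to capture per-phase wall-clock times like these is a small context-manager timer. The phases below use stand-in workloads in place of the real parse/train/score steps:

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def phase(name):
    """Record the wall-clock duration of one workflow phase."""
    start = time.perf_counter()
    yield
    timings[name] = time.perf_counter() - start

# Stand-in workloads; the real phases were H2O/WiseRF operations.
with phase("parse_train"):
    data = [list(range(784)) for _ in range(100)]
with phase("train"):
    total = sum(sum(row) for row in data)
with phase("parse_and_score_test"):
    scores = [row[0] for row in data]

overall = sum(timings.values())
```

Breaking the overall time down this way shows whether a tool spends its time in data ingest or in model building.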
The graphs show that H2O’s RF performance improved significantly from Fourier-1 to Fourier-6. Overall, WiseRF performs roughly equivalently to H2O when the dataset values do not fit within a single byte. Also note that, once a model is built, H2O and WiseRF-uchar parse and score the test data in nearly the same amount of time.
The graph shows memory utilization (RSS) for each algorithm over time. These values were collected over one additional independent run for each algorithm. Note that none of the algorithms required swap to run.
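RSS can be sampled from within a Python process via the standard resource module. This is a rough sketch: ru_maxrss reports the peak (in kilobytes on Linux, bytes on macOS) rather than instantaneous RSS, which real monitoring would poll from /proc instead:

```python
import resource

def rss_peak_kb():
    """Peak resident set size of this process.

    ru_maxrss is kilobytes on Linux (bytes on macOS); instantaneous RSS,
    as plotted above, would come from polling /proc/<pid>/status.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

samples = []
buffers = []
for _ in range(5):
    buffers.append(bytearray(1_000_000))  # stand-in allocation
    samples.append(rss_peak_kb())
```

Because ru_maxrss is a high-water mark, the samples it yields can only increase over time.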
Scripts and steps to reproduce the data are available here; see the README for directions.
The data used to generate the charts above is available here, and also below in the tables (memory tables excluded).