October 22nd, 2013

GBM on Ecology – Recreating a model made for R


In the last couple of weeks we’ve had two meetups on GBM (gradient boosted classification and regression), and hence a lot of excitement about running the algorithm as presented by Cliff, Earl and Dr. Hastie. You can find the hella cool videos of both presentations here: http://www.youtube.com/0xdata
One of my favorite articles on GBM is a great case study from ecology, Elith, Leathwick & Hastie (2008). You can find the original article here: http://onlinelibrary.wiley.com/store/10.1111/j.1365-2656.2008.01390.x/asset/j.1365-2656.2008.01390.x.pdf;jsessionid=5B5FE919D24D8C3EA12FCB74BF352C62.f04t04?v=1&t=hn3iw9wm&s=29c201e8d1d94504ec9e07dcb12bfb2cb539fe7e
The authors kindly made their data and process in R publicly available, so you can get the data and try the model for yourself.
Here is the final model presented, carried out in H2O. Note that the data were originally split into training and testing data (called model and eval data respectively in their available download).
The model was originally specified on 14 variables and 1000 observations. The dependent variable is found in column 2, named “Angaus”, and about 80% of the values in that column are 0. In the original paper the family was specified as Bernoulli, with a tree complexity of 5 and a learning rate of 0.01.
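To make those settings a little more concrete, here is a minimal sketch in plain Python of what a Bernoulli GBM does with a learning rate of 0.01. It is not the paper's R code and not H2O's implementation. The data are synthetic (two made-up predictors, with roughly 80% zeros in the response), the trees are depth-1 stumps rather than tree complexity 5, and the threshold grid and all function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(ys, scores):
    # Mean Bernoulli negative log-likelihood, with scores on the log-odds scale.
    return sum(math.log1p(math.exp(f)) - y * f for y, f in zip(ys, scores)) / len(ys)


def fit_stump(splits, residuals, hessians):
    # Pick the split with the lowest squared error on the residuals.
    best = None
    for j, t, left, right in splits:
        sse = 0.0
        for idx in (left, right):
            mean = sum(residuals[i] for i in idx) / len(idx)
            sse += sum((residuals[i] - mean) ** 2 for i in idx)
        if best is None or sse < best[0]:
            best = (sse, j, t, left, right)
    _, j, t, left, right = best
    # Newton-style leaf values for the Bernoulli loss.
    g_left = sum(residuals[i] for i in left) / sum(hessians[i] for i in left)
    g_right = sum(residuals[i] for i in right) / sum(hessians[i] for i in right)
    return j, t, g_left, g_right


def fit_gbm(xs, ys, n_trees=650, learning_rate=0.01):
    # Candidate splits on an assumed fixed grid of thresholds (0.1 ... 0.9).
    splits = []
    for j in range(len(xs[0])):
        for t in [k / 10 for k in range(1, 10)]:
            left = [i for i, row in enumerate(xs) if row[j] <= t]
            right = [i for i, row in enumerate(xs) if row[j] > t]
            if left and right:
                splits.append((j, t, left, right))
    # Start from the log-odds of the overall positive rate.
    rate = sum(ys) / len(ys)
    f0 = math.log(rate / (1 - rate))
    scores = [f0] * len(ys)
    stumps = []
    for _ in range(n_trees):
        probs = [sigmoid(f) for f in scores]
        residuals = [y - p for y, p in zip(ys, probs)]
        hessians = [p * (1 - p) for p in probs]
        j, t, g_left, g_right = fit_stump(splits, residuals, hessians)
        # The learning rate shrinks each tree's contribution.
        stump = (j, t, learning_rate * g_left, learning_rate * g_right)
        stumps.append(stump)
        scores = [f + (stump[2] if row[j] <= t else stump[3])
                  for f, row in zip(scores, xs)]
    return f0, stumps


def predict_proba(model, row):
    f0, stumps = model
    score = f0 + sum(gl if row[j] <= t else gr for j, t, gl, gr in stumps)
    return sigmoid(score)


# Synthetic stand-in data: presence is likely only when the first predictor is high.
random.seed(0)
xs = [[random.random(), random.random()] for _ in range(200)]
ys = [1 if random.random() < (0.6 if row[0] > 0.7 else 0.05) else 0 for row in xs]

model = fit_gbm(xs, ys, n_trees=650, learning_rate=0.01)
f0, stumps = model
initial_loss = log_loss(ys, [f0] * len(ys))
final_loss = log_loss(ys, [math.log(p / (1 - p))
                           for p in (predict_proba(model, row) for row in xs)])
print("training loss before boosting:", round(initial_loss, 4))
print("training loss after 650 trees:", round(final_loss, 4))
```

Because each tree only moves the prediction by 1% of its fitted step, a small learning rate needs a lot of trees to get anywhere, which is why the tree counts in this kind of model run into the hundreds or thousands.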
We recreated the original model in H2O. The specification and the output are shown below. Note that the X variable field works by opt-out: you specify the variables to leave out of the model. Both the training and testing data sets are set on the model specification page, so the model is automatically applied to the testing data if you specify it – which is a feature I’m pretty fond of. Also notice that the model is specified as a classification because the dependent variable is binary.

[Screenshots: the Request GBM form and the Ntrees form data]
And here are the results. (I only requested 650 trees – which keeps with the model given in the paper, but it’s pretty trivial to request over 1000. I did it earlier with a 20 GB heap and it took about as long as making a cup of coffee.)
[Screenshot: GBM model view]
