October 22nd, 2013

GBM on Ecology – Recreating a model made for R


In the last couple of weeks we’ve had two meetups on GBM (gradient boosting machines, for classification and regression), and with them a lot of excitement about running the algorithm as presented by Cliff, Earl and Dr. Hastie. You can find the hella cool videos of both presentations here: http://www.youtube.com/0xdata
One of my favorite articles on GBM is a great case study from ecology, Elith, Leathwick & Hastie (2008). You can find the original article here: http://onlinelibrary.wiley.com/store/10.1111/j.1365-2656.2008.01390.x/asset/j.1365-2656.2008.01390.x.pdf;jsessionid=5B5FE919D24D8C3EA12FCB74BF352C62.f04t04?v=1&t=hn3iw9wm&s=29c201e8d1d94504ec9e07dcb12bfb2cb539fe7e
The authors kindly made their data and their R workflow publicly available, so you can get the data and try the model for yourself.
Here is the final model presented in the paper, carried out in H2O. Note that the data were originally split into training and testing sets (called model and eval data, respectively, in their available download).
The model was originally specified on 14 variables and 1000 observations. The dependent variable is found in column 2, named “Angaus”, and about 80% of the values in that column are 0. In the original paper the family was specified as Bernoulli, with a tree complexity of 5 and a learning rate of 0.01.
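To make those settings a bit more concrete, here is a minimal sketch of boosting with a Bernoulli loss in plain Python. It is not the authors' R code and not H2O's implementation. The toy data (20 points, 20% ones), the function names, and the use of one-split stumps on a single feature instead of trees with a complexity of 5 are all assumptions made to keep the example short. Leaf values are plain mean residuals; real implementations typically use a more refined leaf estimate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Least-squares stump on one feature: (threshold, left_value, right_value)."""
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1], best[2], best[3]


def boost(xs, ys, ntrees=650, learn_rate=0.01):
    """Gradient boosting on log-odds scores with a Bernoulli loss."""
    # Start from the log-odds of the overall positive rate.
    prior = sum(ys) / len(ys)
    f0 = math.log(prior / (1.0 - prior))
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(ntrees):
        # Negative gradient of the Bernoulli loss: observed minus predicted.
        residuals = [y - sigmoid(f) for y, f in zip(ys, scores)]
        thr, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((thr, left_value, right_value))
        # Shrink each stump's contribution by the learning rate.
        scores = [f + learn_rate * (left_value if x <= thr else right_value)
                  for x, f in zip(xs, scores)]
    return f0, stumps


def predict_proba(x, f0, stumps, learn_rate=0.01):
    score = f0 + learn_rate * sum(lv if x <= thr else rv
                                  for thr, lv, rv in stumps)
    return sigmoid(score)


# Toy data (assumption): one feature, 4 of 20 responses are 1.
xs = list(range(20))
ys = [1 if x >= 16 else 0 for x in xs]

f0, stumps = boost(xs, ys)
probs = [predict_proba(x, f0, stumps) for x in xs]
```

With a learning rate this small each tree only nudges the scores, which is why the number of trees and the learning rate have to be chosen together.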
We recreated the original model in H2O. The specification is shown below, along with the output. Note that the X variable field is opt-out: you list the variables you want to leave out of the model rather than the ones you want to include. Both the training and testing data sets are set on the model specification page, so your model is automatically applied to the testing data if you specify it (a feature I’m pretty fond of). Also notice that the model is specified as a classification because the dependent variable is binary.

[Screenshot: Request GBM form]
[Screenshot: Ntrees form data]
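If the opt-out style of the X variable field is unfamiliar, one way to think about it is sketched below in plain Python. The helper name and every column name except “Angaus” are hypothetical placeholders; this only illustrates the idea and is not H2O's code.

```python
def predictors_from_ignored(all_columns, response, ignored):
    """Columns left as predictors when the form takes an exclusion list."""
    unknown = set(ignored) - set(all_columns)
    if unknown:
        raise ValueError(f"unknown columns in ignore list: {sorted(unknown)}")
    excluded = set(ignored) | {response}
    # Keep the original column order for whatever is not excluded.
    return [c for c in all_columns if c not in excluded]


# Placeholder columns (assumption): an ID column, the response, three predictors.
columns = ["Site", "Angaus", "x1", "x2", "x3"]
predictors = predictors_from_ignored(columns, response="Angaus", ignored=["Site"])
```

Anything you do not list stays in the model, so it is worth double-checking that ID-like columns are on the ignore list.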
And here are the results. (I only requested 650 trees, which is in keeping with the model given in the paper, but it’s pretty trivial to request over 1000. I did it earlier with a 20 GB heap and it took about as long as making a cup of coffee.)
[Screenshot: GBM model view]
