October 22nd, 2013

GBM on Ecology – Recreating a model made for R


In the last couple of weeks we’ve had two meetups on GBM (gradient boosted classification and regression), and hence a lot of excitement about running the algorithm as presented by Cliff, Earl and Dr. Hastie. You can find the hella cool videos of both presentations here: http://www.youtube.com/0xdata
One of my favorite articles on GBM is a great case study from ecology, Elith, Leathwick & Hastie (2008). You can find the original article here: http://onlinelibrary.wiley.com/store/10.1111/j.1365-2656.2008.01390.x/asset/j.1365-2656.2008.01390.x.pdf;jsessionid=5B5FE919D24D8C3EA12FCB74BF352C62.f04t04?v=1&t=hn3iw9wm&s=29c201e8d1d94504ec9e07dcb12bfb2cb539fe7e
The authors kindly made their data and process in R publicly available, so you can get the data and try the model for yourself.
Here is the final model presented in the paper, carried out in H2O. Note that the data were originally split into training and testing data (called model and eval data respectively in their available download).
The model was originally specified on 14 variables and 1000 observations. The dependent variable is found in column 2, named “Angaus”, and about 80% of the values in that column are 0. In the original paper the family was specified as Bernoulli, with a tree complexity of 5 and a learning rate of 0.01.
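To make the Bernoulli family and the learning rate a little more concrete, here is a minimal pure-Python sketch of gradient boosting on a 0/1 response. It is not the code from the paper or from H2O. It assumes one-split stumps (tree complexity 1 rather than 5), plain gradient steps for the leaf values, and a made-up ten-row data set that only mimics the 80% share of zeros.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def deviance(ys, scores):
    """Bernoulli deviance: -2 * log-likelihood of 0/1 labels given log-odds scores."""
    return -2.0 * sum(
        y * math.log(sigmoid(s)) + (1 - y) * math.log(1.0 - sigmoid(s))
        for y, s in zip(ys, scores)
    )

def fit_stump(xs, residuals):
    """Least-squares regression stump: returns (threshold, left_mean, right_mean)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # always leaves at least one point on the right
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, n_trees, learn_rate):
    """Fit n_trees stumps to the Bernoulli gradient; returns (initial log-odds, stumps)."""
    base_rate = sum(ys) / len(ys)
    f0 = math.log(base_rate / (1.0 - base_rate))  # start from the overall log-odds
    scores = [f0] * len(xs)
    stumps = []
    for _ in range(n_trees):
        # Negative gradient of the deviance w.r.t. the score, up to a constant factor.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        threshold, left, right = fit_stump(xs, residuals)
        stumps.append((threshold, left, right))
        # Shrink each stump's contribution by the learning rate.
        scores = [s + learn_rate * (left if x <= threshold else right)
                  for x, s in zip(xs, scores)]
    return f0, stumps

def predict_score(x, f0, stumps, learn_rate):
    """Log-odds prediction: initial score plus the shrunken sum of all stumps."""
    return f0 + learn_rate * sum(l if x <= t else r for t, l, r in stumps)

# Toy data (made up for illustration): one predictor, 80% zeros.
xs = list(range(10))
ys = [0] * 8 + [1] * 2
f0, stumps = boost(xs, ys, n_trees=650, learn_rate=0.01)
start = deviance(ys, [f0] * len(xs))
final = deviance(ys, [predict_score(x, f0, stumps, 0.01) for x in xs])
print("deviance before boosting:", start)
print("deviance after 650 stumps:", final)
```

Each stump moves the predicted log-odds only a small step, which is why a learning rate of 0.01 goes together with several hundred trees.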
We recreated the original model in H2O. The specification is depicted below, as well as the output. Note that the X variable field works by exclusion: you specify the variables to leave out rather than the ones to include. Both the training and testing data sets are set on the model specification page, so the model is automatically applied to the testing data if you specify it (a feature I’m pretty fond of). Also notice that the model is specified as a classification because the dependent variable is binary.
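For readers who would rather script this than fill in the browser form, the same specification can be sketched with the H2O Python client. This is a sketch under assumptions, not what was run for this post: the file paths are placeholders, `max_depth=5` is only a rough stand-in for a tree complexity of 5, both files are assumed to use the same response column name, and the column names in the usage line are hypothetical.

```python
def predictors_from_ignored(columns, response, ignored):
    """Turn an opt-out (ignore) list into the opt-in predictor list."""
    skip = set(ignored) | {response}
    return [c for c in columns if c not in skip]

# Settings quoted in the post, written with H2O Python parameter names.
# Assumption: max_depth only approximates the paper's "tree complexity".
GBM_PARAMS = {
    "ntrees": 650,
    "max_depth": 5,
    "learn_rate": 0.01,
    "distribution": "bernoulli",
}

def train_gbm(train_path, test_path, response="Angaus", ignored=()):
    """Train on the model data and score the eval data in one call.

    Needs the h2o package and a Java runtime; imported here so the helper
    above stays usable without them.
    """
    import h2o
    from h2o.estimators import H2OGradientBoostingEstimator

    h2o.init()
    train = h2o.import_file(train_path)
    test = h2o.import_file(test_path)
    # A factor response makes H2O treat the problem as classification.
    train[response] = train[response].asfactor()
    test[response] = test[response].asfactor()

    x = predictors_from_ignored(train.columns, response, ignored)
    model = H2OGradientBoostingEstimator(**GBM_PARAMS)
    model.train(x=x, y=response, training_frame=train, validation_frame=test)
    return model

# Hypothetical column names, only to show the opt-out logic.
print(predictors_from_ignored(["Site", "Angaus", "x1", "x2"], "Angaus", ["Site"]))
```

Passing the testing frame as `validation_frame` mirrors the form's behavior of applying the model to the testing data automatically.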

[Screenshot: Request GBM]
[Screenshot: Ntrees form data]
And here are the results. I only requested 650 trees, which is in keeping with the model given in the paper, but it’s pretty trivial to request over 1000. I did it earlier with a 20 GB heap and it took about as long as making a cup of coffee.
