September 5th, 2013

Replay: Modeling MNIST With RF Hands-on Demo


Last week Spencer put together a great hands-on session on modeling data with H2O (http://www.meetup.com/H2Omeetup/). This post writes up the workflow for building a Random Forest (RF) model on MNIST data, for those of you who want to walk through the demo again or who missed the live version. I’m running H2O on one of our local servers, with 20 GB of memory allocated to it.
RF on MNIST data: Spencer used a data set of pre-GPUed MNIST data similar to the one Kaggle provides in a currently running competition. If you’re interested in other approaches to the MNIST data (including neural nets and k-nearest neighbors), I highly recommend taking a look at http://yann.lecun.com/exdb/mnist/.
Problem: The training data are 60,000 observations of 786 variables; the testing data are 10,000 observations. Each independent variable corresponds to one square pixel of an image, and its value indicates the level of saturation of that pixel. Results are given and discussed below. Here is the step-by-step process for generating them.

  1. From the Data drop-down menu, inhale and parse the data (both the testing and training sets).
  2. From the Model drop-down menu, choose **Random Forest**.
  3. Set Ntree = 50 and Features = 200. Leave all other options at their defaults. Note that H2O automatically ignores all constant columns, so you need not sort through the data summary by hand to find those variables.
  4. Submitting the form in step 3 generates a model; its confusion matrix appears in the results below.
  5. The model key is at the top of the RF results page; highlight and copy it. From the Score drop-down menu, select RF.
  6. On the specification page for scoring your RF model, enter the .hex key for your testing data, paste the model key, specify the dependent variable column, and submit.
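
Step 3 mentions that H2O skips constant columns for you. To make that idea concrete, here is a minimal plain-Python sketch of what "constant column" means. It is not H2O's implementation; the function name `constant_columns` and the toy data are made up for illustration.

```python
def constant_columns(rows):
    """Return the indices of columns whose value is identical in every row."""
    if not rows:
        return []
    first = rows[0]
    return [
        j for j in range(len(first))
        if all(row[j] == first[j] for row in rows)
    ]


# Hypothetical toy data: 3 observations of 4 pixel columns (not real MNIST).
toy_pixels = [
    [0, 12, 0, 255],
    [0, 40, 0, 255],
    [0, 7, 3, 255],
]

# Columns 0 and 3 never change, so they carry no information for the model.
print(constant_columns(toy_pixels))  # [0, 3]
```

In image data like MNIST, such columns are typically pixels that have the same value in every image, for example along the border.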

At this point you have built a model and verified that it works. In practice, the goal is usually to predict an outcome of interest, which you can now do with this same model by returning to the Score drop-down menu and selecting Predict. Feeding Predict a data set with the same predictors as your training set produces a column of predictions, one for each observation.
Results: With an RF model of 50 trees, Features set to 200, and all other options left at their defaults, H2O produces a confusion matrix for the training run.
[Figure: confusion matrix]
Testing the generated RF model on the test set produces a classification error of 3.28%.
[Figure: confusion matrix, full scoring]
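
For readers who want to see how a classification error relates to a confusion matrix, here is a small sketch in plain Python. It assumes rows and columns both index the classes, so correct predictions sit on the diagonal. The function name `classification_error` and the 3-class matrix are hypothetical; the numbers are not the MNIST results from the demo.

```python
def classification_error(confusion):
    """Fraction of observations that fall off the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no observations")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical 3-class confusion matrix, for illustration only.
toy_confusion = [
    [50, 2, 1],
    [3, 45, 2],
    [0, 4, 43],
]

# 150 observations, 138 on the diagonal, so 12 are misclassified.
error = classification_error(toy_confusion)
print(f"{error:.2%}")  # 8.00%
```

By the same arithmetic, an error of 3.28% on 10,000 test observations corresponds to 328 misclassified digits.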
So there you have it: a walkthrough of Spencer’s meetup presentation that you can follow step by step.
