January 9th, 2017

Start Off 2017 with Our Stanford Advisors

Category: Community, Technical

We were very excited to meet with our advisors (Prof. Stephen Boyd, Prof. Rob Tibshirani, and Prof. Trevor Hastie) at H2O.ai on January 6, 2017.


Our CEO, Sri Ambati, made two great observations at the start of the meeting:

  • First was the hardware trend: hardware companies like Intel, Nvidia, and AMD plan to build machine learning algorithms directly into hardware/GPUs.
  • Second was the data trend: more and more datasets consist of images, text, and audio rather than traditional transactional data.  Deep learning seems to be the go-to algorithm for these new kinds of data, but while it can work very well, it is very difficult to explain to business or regulatory professionals how and why it works.

There were several techniques to get around this problem and make machine learning solutions interpretable to our customers:

  • Patrick Hall pointed out that monotonicity, not linearity, is what determines interpretability.  He cited a credit scoring system built on a constrained neural network: when the input variables were monotonic with respect to the response variable, the system could automatically generate reason codes (see the sketch after this list).
  • One could run both deep learning and simpler algorithms (like GLM, random forest, etc.) on a dataset.  When their performance was similar, we chose the simpler model, since it tended to be more interpretable.
  • Another suggestion was to use a layered approach:
      • First, use deep learning to extract a small number of features from a high-dimensional dataset.
      • Next, train a simple model on these extracted features to perform the specific task.
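
To make the monotonicity point concrete, here is a minimal sketch using gradient boosting with monotone constraints (via XGBoost) as an illustrative stand-in for the constrained neural network Patrick cited; the feature names, synthetic data, and constraint directions below are ours, not from the actual credit scoring system.

```python
# Illustrative only: a monotonically constrained model on synthetic "credit"
# data. XGBoost stands in for the constrained neural network cited above.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
n = 5000
income = rng.uniform(20, 200, n)        # hypothetical: annual income (k$)
utilization = rng.uniform(0, 1, n)      # hypothetical: credit utilization
# Synthetic score that rises with income and falls with utilization.
y = 0.02 * income - 1.5 * utilization + rng.normal(0, 0.1, n)
X = np.column_stack([income, utilization])

# monotone_constraints: +1 forces a non-decreasing effect on the prediction,
# -1 a non-increasing one. Each constrained input then has a single, known
# direction of effect, which is what makes reason codes straightforward.
model = xgb.XGBRegressor(
    n_estimators=200,
    max_depth=3,
    learning_rate=0.1,
    monotone_constraints="(1,-1)",
)
model.fit(X, y)
```

With the constraint in place, a statement like "income was lower than that of typical approved applicants" is a defensible reason code, because the model's output can only move in one direction as that input moves.
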

This layered approach could provide a great speedup as well.  Imagine being able to reuse feature sets for images/text/speech that others have already derived from their data: all you would need to do is build your simple model on top of those feature sets to perform the task you want.  In this sense, deep learning is the non-linear equivalent of PCA for feature extraction.  Prof. Boyd also seemed to like GLRM (check out H2O GLRM) for feature extraction.
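
As a rough, hypothetical sketch of the layered approach with H2O's Python API (the file path, column names, and layer sizes below are made up): train an unsupervised deep learning model, pull out the activations of its narrow hidden layer as features, and fit a GLM on top of them.

```python
# Hypothetical sketch: deep learning as a non-linear feature extractor,
# with a simple GLM fit on the extracted features.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()
train = h2o.import_file("train.csv")          # hypothetical dataset
response = "label"                            # hypothetical response column
train[response] = train[response].asfactor()  # assume a classification target
predictors = [c for c in train.columns if c != response]

# Step 1: unsupervised autoencoder; the narrow middle layer compresses the
# high-dimensional inputs into a small number of learned features.
ae = H2ODeepLearningEstimator(autoencoder=True,
                              hidden=[200, 20, 200],
                              epochs=10)
ae.train(x=predictors, training_frame=train)

# Step 2: extract the middle layer's activations as new features
# (hidden-layer index, assumed 0-based here).
features = ae.deepfeatures(train, layer=1)
features = features.cbind(train[response])

# Step 3: fit a simple, interpretable model on the extracted features.
glm = H2OGeneralizedLinearEstimator(family="binomial")
glm.train(x=[c for c in features.columns if c != response],
          y=response, training_frame=features)
```

Prof. Boyd's GLRM suggestion would slot into the same place: swap the autoencoder for H2O's generalized low rank model and use its low-rank representation as the extracted features.
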

With this layered approach, there were more system parameters to tune.  Our auto-ML toolbox would be perfect for this!  Go team!
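
For a rough illustration of what handing that tuning over to AutoML could look like with H2O's Python API (the dataset path and response column are hypothetical):

```python
# Hypothetical sketch: let AutoML search over models and hyperparameters,
# then rank the candidates on cross-validated performance.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
train = h2o.import_file("train.csv")   # hypothetical dataset
y = "label"                            # hypothetical response column
x = [c for c in train.columns if c != y]

aml = H2OAutoML(max_models=20, seed=1)
aml.train(x=x, y=y, training_frame=train)
print(aml.leaderboard)                 # models ranked by validation metric
```
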

Subsequently, the conversation turned to visualization of datasets.  Patrick Hall brought up the approach of first using clustering to separate the dataset and then fitting a simple model for each cluster.  This is very similar to the hierarchical mixtures of experts algorithm described in our advisors' book, The Elements of Statistical Learning: essentially, you build a decision tree from your dataset and then fit linear models at the leaf nodes to perform specific tasks.
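
Here is a minimal, synthetic sketch of the cluster-then-fit idea using scikit-learn; the data, cluster count, and column roles are all made up for illustration.

```python
# Illustrative sketch: partition the data with k-means, then fit an
# interpretable linear model inside each cluster. Data are synthetic.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# Different linear behavior in different regions of the input space.
y = np.where(X[:, 0] > 0, 2.0 * X[:, 1], -3.0 * X[:, 2]) + rng.normal(0, 0.1, 1000)

# Step 1: cluster the inputs.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

# Step 2: fit a simple linear model per cluster; each cluster then has a
# small set of coefficients that can be inspected directly.
per_cluster_models = {}
for k in np.unique(clusters):
    mask = clusters == k
    per_cluster_models[k] = LinearRegression().fit(X[mask], y[mask])
    print(f"cluster {k}: coef = {per_cluster_models[k].coef_.round(2)}")
```
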
Our very own Dr. Wilkinson has built a dataset visualization tool that can summarize a big dataset while preserving the characteristics of the original data (such as outliers). Totally awesome!
Arno Candel brought up the issue of overfitting and how to detect it during the training process rather than only at the end of training using a held-out set.  Prof. Boyd mentioned that we should check out Bayesian trees/additive models.
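
On the overfitting point, here is a hedged sketch of how H2O lets you watch a validation metric during training and stop early when it stops improving; the file path and column names are hypothetical.

```python
# Hypothetical sketch: monitor a validation metric while training so that
# overfitting shows up during training, not only afterwards.
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator

h2o.init()
data = h2o.import_file("train.csv")            # hypothetical dataset
y = "label"                                    # hypothetical response column
x = [c for c in data.columns if c != y]
data[y] = data[y].asfactor()                   # assume a classification target
train, valid = data.split_frame(ratios=[0.8], seed=1)

gbm = H2OGradientBoostingEstimator(
    ntrees=500,
    stopping_rounds=5,           # stop when the validation metric has not
    stopping_metric="logloss",   # improved for 5 consecutive scoring rounds
    stopping_tolerance=1e-3,
    seed=1,
)
gbm.train(x=x, y=y, training_frame=train, validation_frame=valid)

# Training vs. validation metrics per scoring step; divergence between the
# two is the during-training signal of overfitting.
print(gbm.scoring_history())
```
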
Last Words of Wisdom from our esteemed advisors: deep learning is powerful, but other algorithms like random forest can beat it depending on the dataset.  Deep learning requires big datasets to train and works best on data with some internal organization, like spatial features (in images) or temporal trends (in speech/time series).  Random forest, on the other hand, works perfectly well on datasets without such organization.  These meetings were great learning opportunities for us.
