We were very excited to meet with our advisors (Prof. Stephen Boyd, Prof. Rob Tibshirani and Prof. Trevor Hastie) at H2O.ai on Jan 6, 2017.
Our CEO, Sri Ambati, made two great observations at the start of the meeting:
There were several techniques to get around this problem and make machine learning solutions interpretable to our customers:
This layered approach could provide a great speed-up as well. Imagine cases where feature sets for images, text, or speech that others had already derived could be applied to your own datasets: all you would need to do was build a simple model on top of those features to perform the task you wanted. In this sense, deep learning is the equivalent of PCA for non-linear features. Prof. Boyd also seemed to like GLRM (check out H2O GLRM) for feature extraction.
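To make the layered idea concrete, here is a minimal sketch in plain Python. It is not H2O code: `extract_features` is a hypothetical stand-in for a frozen feature extractor that someone else trained, and the nearest-centroid classifier is just one possible choice for the simple model on top. The toy data (two concentric rings) is also an assumption, picked because it cannot be separated by a straight line in the raw coordinates.

```python
import math


def extract_features(point):
    """Hypothetical frozen feature extractor.

    A real pipeline would call a network trained elsewhere; here a fixed
    non-linear map (distance from the origin) plays that role.
    """
    x, y = point
    return [math.hypot(x, y)]


def fit_nearest_centroid(features, labels):
    """Simple top-layer model: one mean feature vector per class."""
    sums, counts = {}, {}
    for feature, label in zip(features, labels):
        acc = sums.setdefault(label, [0.0] * len(feature))
        for i, value in enumerate(feature):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, feature):
    """Return the class whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: math.dist(centroids[label], feature))


def ring(radius, n):
    """n evenly spaced points on a circle of the given radius."""
    return [
        (radius * math.cos(2 * math.pi * k / n), radius * math.sin(2 * math.pi * k / n))
        for k in range(n)
    ]


# Toy data: two concentric rings, not linearly separable in (x, y).
points = ring(1.0, 20) + ring(3.0, 20)
labels = ["inner"] * 20 + ["outer"] * 20

# Layer 1: reuse the frozen features. Layer 2: train only the simple model.
features = [extract_features(p) for p in points]
centroids = fit_nearest_centroid(features, labels)
predictions = [predict(centroids, f) for f in features]
accuracy = sum(p == t for p, t in zip(predictions, labels)) / len(labels)
```

Only the second layer is trained here, which is where the speed-up would come from: the expensive non-linear feature learning is assumed to have been done once, by someone else.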
With this layered approach, there were more system parameters to tune. Our auto-ML toolbox would be perfect for this! Go team!
Subsequently, the conversation turned to visualization of datasets. Patrick Hall brought up the approach of first using clustering to separate a dataset and then applying a simple model to each cluster. This approach was very similar to the hierarchical mixture of experts algorithm described in our advisors' book, The Elements of Statistical Learning. Basically, you built a decision tree from your dataset, then fit linear models at the leaf nodes to perform specific tasks.
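The "tree first, linear models at the leaves" idea can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the algorithm from the book: it uses a single hard split on one input variable, chosen to minimize the combined squared error of the two leaf lines, whereas the hierarchical mixture of experts uses soft, probabilistic splits. The function names and the toy data are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, single input."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        return 0.0, mean_y  # constant leaf when all x values coincide
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / var_x
    return slope, mean_y - slope * mean_x


def squared_error(xs, ys, line):
    """Sum of squared residuals of a fitted line on the given points."""
    slope, intercept = line
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def fit_model_tree(xs, ys, min_leaf=3):
    """Depth-one tree with a linear model in each of its two leaves."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left_x, left_y = zip(*pairs[:i])
        right_x, right_y = zip(*pairs[i:])
        left_line = fit_line(left_x, left_y)
        right_line = fit_line(right_x, right_y)
        error = squared_error(left_x, left_y, left_line) + squared_error(
            right_x, right_y, right_line
        )
        if best is None or error < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (error, threshold, left_line, right_line)
    if best is None:
        raise ValueError("not enough distinct points to split")
    _, threshold, left_line, right_line = best
    return threshold, left_line, right_line


def predict(model, x):
    """Route x to a leaf by the threshold, then apply that leaf's line."""
    threshold, left_line, right_line = model
    slope, intercept = left_line if x <= threshold else right_line
    return slope * x + intercept


# Toy data with two regimes: y = 2x + 1 for x < 10, y = 40 - x afterwards.
xs = list(range(20))
ys = [2 * x + 1 if x < 10 else 40 - x for x in xs]
model = fit_model_tree(xs, ys)
```

On this toy data a single global line would fit poorly, while each leaf's line is simple and easy to read off, which is the interpretability argument for this kind of model.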
Our very own Dr. Wilkinson had built a dataset visualization tool that could summarize a big dataset while maintaining the characteristics of the original dataset (such as outliers). Totally awesome!
Arno Candel brought up the issue of overfitting and how to detect it during the training process rather than at the end of the training process using the held-out set. Prof. Boyd mentioned that we should check out Bayesian trees/additive models.
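One common way to watch for overfitting while training is still running is to track a validation loss after every epoch and flag the run once that loss has stopped improving. The sketch below illustrates that general idea only. It is an assumption about what such a check could look like, not the method discussed in the meeting and not an H2O API, and the function name and `patience` parameter are hypothetical.

```python
def detect_overfitting(val_losses, patience=3):
    """Scan per-epoch validation losses as they would arrive during training.

    Returns (flag_epoch, best_epoch). flag_epoch is the first epoch at which
    the validation loss has failed to improve for `patience` consecutive
    epochs, or None if that never happens.
    """
    best_loss = float("inf")
    best_epoch = None
    stale = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
        if stale >= patience:
            return epoch, best_epoch
    return None, best_epoch


# Illustrative numbers only: validation loss bottoms out at epoch 2.
val_losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.9]
flag_epoch, best_epoch = detect_overfitting(val_losses, patience=3)
```

In a real training loop the same bookkeeping would run once per epoch, so training could stop at the flagged epoch and fall back to the model saved at `best_epoch`.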
Last Words of Wisdom from our esteemed advisors: Deep learning was powerful, but other algorithms like random forest could beat deep learning depending on the dataset. Deep learning required big datasets to train. It worked best with datasets that had some kind of organization in them, like spatial features (in images) and temporal trends (in speech/time series). Random forest, on the other hand, worked perfectly well with datasets that had no such organization/features.