July 28th, 2020
Exploring the Next Frontier of Automatic Machine Learning with H2O Driverless AI
Category: AutoML, H2O Driverless AI
By: Jo-Fai Chow
At H2O.ai, our goal is to democratize AI by bridging the gap between state-of-the-art (SOTA) machine learning and a user-friendly, enterprise-ready platform. Since the very first release of Driverless AI, we have been working tirelessly to bring SOTA techniques from Kaggle competitions to our enterprise platform. The growing list of Driverless AI features, and our growing team of Kaggle Grandmasters and industry-expert data scientists, reflect our commitment to that goal.
Today, we are excited to announce the availability of Driverless AI 1.9, our latest release, which comes with many new features. This article, the first in the 1.9 release blog series, provides a quick overview of them. More blog posts about individual features will follow in the coming weeks, so watch this space. You should also check out this webinar by Arno Candel and Dan Darnell.
Without further ado, here is a list of the new features in 1.9:
- Automatic image recognition (webinar)
- State-of-the-Art Text Analytics with BERT (webinar)
- End-to-End Model Deployment and Operation with MLOps
- Improved GUI/UX for machine learning interpretability
- Shapley values for original features
- Automatic project leaderboard
- Uncertainty quantification for regression
- Custom visualizations
- Advanced time-series with epidemic SEIRD model
- Zero-inflated model
- Multi-layer hierarchical feature engineering
- Multi-node support for model training
That’s a lot to go through. I will try to keep it short and sweet.
Automatic Image Recognition
Yes, we heard you. Image recognition is one of the most common requests from our users. After months of hard work and rigorous testing, we can now present the first version of automatic image recognition in Driverless AI.
This is the brainchild of our Kaggle Grandmaster Yauhen Babakhin and his team. The idea is to mimic what Yauhen would do when facing a new image recognition challenge. To automate the most time-consuming tasks, the team implemented two key features:
- Pre-trained image transformers – transforming images into vectors.
- Automatic image model – automatic optimization of the model training strategy, hyperparameter tuning, image augmentation, and model inspection for sanity checks and debugging. The automatic image model also gives users more information about the best individual model in the Insights tab.
More importantly, we can turn the image transformer into a production-ready MOJO pipeline.
For more information, check out this webinar.
State-of-the-Art Text Analytics with BERT
Bidirectional Encoder Representations from Transformers (BERT) achieved SOTA results on a number of natural language processing (NLP) tasks. Our in-house NLP experts Sudalai Rajkumar (SRK), Maximilian Jeblick, and Trushant Kalyanpur have been working hard on the BERT implementation for the 1.9 release. As a result, users can now apply SOTA techniques based on a variety of BERT models and other transformer models out of the box.
The base BERT model can be further extended for domain-specific problems using recipes. Users can also productionize the models using either the C++ MOJO or the Python scoring pipeline.
For more information, check out this webinar.
End-to-End Model Deployment and Operation with MLOps
We are releasing MLOps to automate the end-to-end model life cycle with Driverless AI. The key capabilities of MLOps are:
- Model management – easy collaboration through a project workspace and a model store.
- Model deployment – deploy models to different environments (cloud/on-premises).
- Model monitoring – monitor specific metrics or parameters.
For more information, check out our product page.
Improved GUI/UX for Machine Learning Interpretability
Results from each individual analysis now appear on screen as soon as they become available, which should make for a better overall user experience.
Shapley Values for Original Features
We also added Kernel Shapley values for the original features. This should make it easier for our users to visualize and compare the contribution of each original feature.
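Kernel Shapley (Kernel SHAP) estimates Shapley values by sampling feature coalitions and fitting a weighted linear model. For a model with only a few features, the same quantities can be computed exactly by enumerating every coalition, which makes the definition concrete. The minimal sketch below does this for a single row, assuming that an "absent" feature is replaced by its value in one baseline row. `exact_shapley` and the toy model are illustrative names, not part of Driverless AI, and this is not how Driverless AI computes its values.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    model: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values for one row x; absent features take the baseline value."""
    n = len(x)

    def value(coalition: frozenset[int]) -> float:
        # Features in the coalition keep their actual values; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

For the toy model f(z) = 2·z0 + 3·z1 + z0·z1 with x = [1, 2] and a zero baseline, this returns [3.0, 7.0]: the interaction term is split evenly, and the values sum to f(x) − f(baseline) = 10. Enumeration costs 2^n model evaluations, which is why Kernel SHAP samples coalitions instead.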
Automatic Project Leaderboard
The new project leaderboard feature makes it easy to run multiple diverse experiments. This is useful for estimating the model complexity and accuracy trade-offs. It also keeps track of the expert settings and modeling constraints so our users can quickly generate different and diverse models within a defined search space.
Uncertainty Quantification for Regression
This is another common feature request. For regression problems, we added empirical confidence bands based on the model's actual behavior on holdout data.
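The exact method used in Driverless AI is not described here. One common way to build empirical bands is to take quantiles of the residuals on holdout data and add them to new predictions, which is similar in spirit to split conformal prediction. The minimal sketch below assumes that holdout residuals are representative of future errors; `empirical_bands` and `quantile` are hypothetical helper names.

```python
from typing import Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Quantile with linear interpolation between order statistics (NumPy's default rule)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def empirical_bands(
    holdout_y: Sequence[float],
    holdout_pred: Sequence[float],
    new_pred: Sequence[float],
    coverage: float = 0.9,
) -> list[tuple[float, float]]:
    """Prediction intervals built from the empirical distribution of holdout residuals."""
    residuals = [y - p for y, p in zip(holdout_y, holdout_pred)]
    alpha = 1.0 - coverage
    # Lower and upper residual quantiles, e.g. 5% and 95% for 90% coverage.
    lower_offset = quantile(residuals, alpha / 2)
    upper_offset = quantile(residuals, 1 - alpha / 2)
    return [(p + lower_offset, p + upper_offset) for p in new_pred]
```

For example, holdout targets [1, 2, 3, 4, 5] against constant predictions of 1 give residuals [0, 1, 2, 3, 4]; with `coverage=0.5`, a new prediction of 10.0 gets the band (11.0, 13.0). Because the offsets come from observed errors rather than a Gaussian assumption, the bands can be asymmetric.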
Custom Visualizations
We have extended the visualization interface to accept user input for custom visualizations.
Advanced Time-Series: Epidemic SEIRD Model
Over the last couple of months, some of our Kaggle Grandmasters took part in the Kaggle COVID-19 forecasting competitions and placed near the top of the rankings. They also published an article on backtesting COVID-19 forecasts.
The Susceptible-Exposed-Infected-Recovered-Dead (SEIRD) model is a useful compartmental model for epidemic forecasting. We have integrated it into Driverless AI to further enhance our time-series modeling capabilities.
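In the standard SEIRD formulation, a population of size N moves between compartments according to dS/dt = −βSI/N, dE/dt = βSI/N − σE, dI/dt = σE − γI − μI, dR/dt = γI and dD/dt = μI, where β is the transmission rate, σ the rate at which exposed people become infectious, γ the recovery rate and μ the death rate. The minimal sketch below integrates these equations with forward Euler steps for given parameters. It only illustrates the model itself: fitting β, σ, γ and μ to observed case counts, which a forecasting system has to do, is not shown, and the code does not reflect how Driverless AI integrates SEIRD.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SEIRDState:
    s: float  # susceptible
    e: float  # exposed (infected, not yet infectious)
    i: float  # infectious
    r: float  # recovered
    d: float  # dead


def simulate_seird(
    state: SEIRDState,
    beta: float,
    sigma: float,
    gamma: float,
    mu: float,
    days: int,
    steps_per_day: int = 10,
) -> list[SEIRDState]:
    """Integrate the SEIRD equations with forward Euler; returns one state per day."""
    n = state.s + state.e + state.i + state.r + state.d  # total population, conserved
    dt = 1.0 / steps_per_day
    history = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * state.s * state.i / n
            new_infectious = sigma * state.e
            new_recovered = gamma * state.i
            new_dead = mu * state.i
            state = SEIRDState(
                s=state.s - dt * new_exposed,
                e=state.e + dt * (new_exposed - new_infectious),
                i=state.i + dt * (new_infectious - new_recovered - new_dead),
                r=state.r + dt * new_recovered,
                d=state.d + dt * new_dead,
            )
        history.append(state)
    return history
```

Because every flow leaves one compartment and enters another, S + E + I + R + D stays equal to N, which is a quick sanity check for any implementation.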
Read more about our COVID-19 response here.
Zero-inflated Model
For regression problems where the target contains many zeros (such as loss amounts on loans, most of which never default), a zero-inflated model (a combination of binary classification and regression) can be a better solution than a standard model. Thanks to Ryan Chesler, this is now available out of the box in 1.9.
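The combination of classification and regression described above can be sketched as a two-part model (sometimes called a hurdle model): a classifier trained on all rows predicts P(y > 0 | x), a regressor trained only on the non-zero rows predicts E[y | y > 0, x], and the prediction is their product. This is a generic illustration, not Driverless AI's implementation. The `Trainer` interface, `ZeroInflatedRegressor` and the toy 1-nearest-neighbour trainer are hypothetical stand-ins for real models.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Features = Sequence[float]
Predictor = Callable[[Features], float]
# Hypothetical interface: a trainer takes rows and targets and returns a fitted predictor.
Trainer = Callable[[Sequence[Features], Sequence[float]], Predictor]


@dataclass
class ZeroInflatedRegressor:
    classifier: Predictor  # returns P(y > 0 | x)
    regressor: Predictor   # returns E[y | y > 0, x]

    @classmethod
    def fit(cls, X: Sequence[Features], y: Sequence[float],
            train_classifier: Trainer, train_regressor: Trainer) -> "ZeroInflatedRegressor":
        # Part 1: binary classifier on all rows, label = "target is non-zero".
        labels = [1.0 if target != 0 else 0.0 for target in y]
        classifier = train_classifier(X, labels)
        # Part 2: regressor on the non-zero rows only (needs at least one such row).
        nonzero = [(row, target) for row, target in zip(X, y) if target != 0]
        regressor = train_regressor([row for row, _ in nonzero], [t for _, t in nonzero])
        return cls(classifier, regressor)

    def predict(self, x: Features) -> float:
        # Expected value of the two-part model: P(non-zero) * E[y | non-zero].
        return self.classifier(x) * self.regressor(x)


def nearest_neighbor_trainer(X: Sequence[Features], y: Sequence[float]) -> Predictor:
    """Toy 1-nearest-neighbour 'model', used only to exercise the sketch."""
    rows = list(zip(X, y))

    def predict(x: Features) -> float:
        def sq_dist(row: tuple) -> float:
            return sum((a - b) ** 2 for a, b in zip(row[0], x))
        return min(rows, key=sq_dist)[1]

    return predict
```

Fitting on rows [0], [1], [2], [3] with targets [0, 0, 5, 7] predicts 0.0 near x = 0 (the classifier says "zero") and 7.0 near x = 3. In practice the two parts would be real models such as gradient-boosted trees, and the classifier would output a probability rather than a hard 0/1.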
Multi-layer Hierarchical Feature Engineering
Users can now configure feature engineering steps (transformers) in multiple stages (layers). An optional pre-processing layer can handle specific custom data cleanup or conversions, and subsequent layers can take the output of previous layers as input. This lets users build complex feature engineering pipelines in a flexible manner.
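The layered idea can be illustrated with plain functions: each layer is a list of transformers, every transformer maps a set of columns to new columns, and a layer's combined output becomes the next layer's input. This is a generic sketch, not Driverless AI's configuration format. `run_layers`, the `Transformer` interface and the example transformers are hypothetical. For brevity, a layer here sees only the immediately preceding layer's output, which is a simplification.

```python
import math
from typing import Callable, Dict, List

Columns = Dict[str, List[float]]
# Hypothetical transformer interface: takes a set of columns, returns new columns.
Transformer = Callable[[Columns], Columns]


def run_layers(data: Columns, layers: List[List[Transformer]]) -> Columns:
    """Apply transformer layers in order; each layer sees the previous layer's output."""
    current = data
    for layer in layers:
        output: Columns = {}
        for transformer in layer:
            output.update(transformer(current))
        current = output
    return current


# Layer 1 (pre-processing): replace missing values (NaN) with 0.0 in every column.
def fill_missing(cols: Columns) -> Columns:
    return {name: [0.0 if math.isnan(v) else v for v in values] for name, values in cols.items()}


# Layer 2 (feature engineering): derive new features from the cleaned columns.
def income_per_member(cols: Columns) -> Columns:
    return {"income_per_member": [i / m for i, m in zip(cols["income"], cols["members"])]}


def log_income(cols: Columns) -> Columns:
    return {"log_income": [math.log1p(v) for v in cols["income"]]}
```

For example, `run_layers({"income": [1000.0, float("nan"), 3000.0], "members": [2.0, 1.0, 3.0]}, [[fill_missing], [income_per_member, log_income]])` first cleans the missing income, then produces `income_per_member` = [500.0, 0.0, 1000.0] and a `log_income` column. Keeping cleanup in its own layer means the later transformers never have to handle missing values themselves.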
Multi-node Support for Model Training
Driverless AI can now be configured to run in a multi-node worker mode. This allows users to scale up the training process when they need to complete multiple experiments in a short amount of time.
Note: This new multi-node feature is in a preview (alpha) stage. If you are interested in using multi-node configurations, please contact support@h2o.ai. Because a single experiment runs entirely on one machine, using a large number of commodity-grade machines is not useful in a multi-node setup.
How to Get Started?
If you are new to Driverless AI, we recommend our risk-free, web-based test drive in H2O Aquarium Cloud. Each lab session lasts two hours, and you can keep trying our software for free; no license key is required. We also have self-paced tutorials to guide you through the journey. Note: We are in the process of updating these materials to Driverless AI 1.9, and the new tutorials should be available in the coming weeks.
For existing users with license keys, please download the latest version from our website. You can also find the links to different cloud marketplaces on the same page.
I hope you enjoyed this quick overview. Please give Driverless AI 1.9 a spin and share your experience with us.
Learning Resources
- Self-paced tutorials and instructor-led courses from our Learning Center.
- H2O Documentation.
- H2O Blog.
- Be part of our community, find a meetup group near you.
- H2O Events Overview
- Related webinars:
- July 9: What the Future of AI Looks Like with Arno Candel, CTO
- July 16: More Use Cases and More Value with Automated Computer Vision Modeling
- July 23: State of The Art NLP Models in H2O Driverless AI 1.9
- July 30: Further Exploration into Model Explainability with H2O Driverless AI 1.9
- July 30: Making to Production with Machine Learning
Acknowledgements
I would like to thank my colleagues for all the technical details and feedback. Driverless AI is the result of a continuous team effort led by Arno. To illustrate, let me just leave a screenshot of Arno’s GitHub page here.
Until next time,
Joe