At H2O.ai, it is our goal to democratize AI by bridging the gap between the State-of-the-Art (SOTA) in machine learning and a user-friendly, enterprise-ready platform. We have been working tirelessly to bring the SOTA from Kaggle competitions to our enterprise platform Driverless AI since its very first release. The growing list of Driverless AI features and our growing team of Kaggle Grandmasters and industry expert data scientists can be seen as our effort and commitment to achieve that goal.
Today, we are excited to announce the availability of Driverless AI 1.9, our latest release, which comes with tons of new features. This article is the first in the 1.9 release blog series and gives a quick overview of what is new. More posts about individual features will follow in the coming weeks, so watch this space. You should also check out this webinar by Arno Candel and Dan Darnell.
Without further ado, here is a list of the new features in 1.9:
That’s a lot to go through. I will try to keep it short and sweet.
Yes, we heard you. Image recognition is one of the most common questions/requests from our users. After months of hard work and rigorous testing, we can now present the first version of automatic image recognition in Driverless AI.
This is the brainchild of our Kaggle Grandmaster Yauhen Babakhin and team. The idea is to mimic what Yauhen would do when he faces a new image recognition challenge. In order to automate the most time-consuming tasks, our team implemented two key features:
More importantly, we can turn the image transformer into a production-ready MOJO pipeline.
For more information, check out this webinar.
Bidirectional Encoder Representations from Transformers (BERT) achieved SOTA results on a number of natural language processing (NLP) tasks. Our in-house NLP experts Sudalai Rajkumar (SRK), Maximilian Jeblick, and Trushant Kalyanpur have been working hard on the BERT implementation for the 1.9 release. It lets our users apply SOTA techniques based on a variety of BERT models and transformers out of the box in our latest Driverless AI release.
The base BERT model can be further extended for domain-specific problems using recipes. Users can also productionize the models using either a C++ MOJO or a Python pipeline.
For more information, check out this webinar.
We are releasing MLOps to automate the end-to-end model life cycle with Driverless AI. The key capabilities of MLOps are:
For more information, check out our product page.
Results from each individual analysis now appear on the screen as soon as they become available. This should give our users a better overall experience.
We also added Kernel Shapley values for the original features. This should make it easier for our users to visualize and compare the contribution of each original feature.
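To make the idea of a per-feature contribution concrete, here is a minimal sketch of exact Shapley values for a single row. It is not the Driverless AI implementation: Kernel Shapley approximates these values by sampling, while this toy version enumerates every feature subset, which only works for a handful of features. The names `shapley_values` and `toy_model`, and the choice of replacing "absent" features with a baseline row, are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one row; absent features take baseline values."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                without_i = [x[j] if j in present else baseline[j] for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of feature i to this coalition
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi


def toy_model(row):
    # Hypothetical model: two linear terms plus one interaction
    return 2.0 * row[0] + 3.0 * row[1] + row[0] * row[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The contributions add up to the difference between the prediction for the row and the prediction for the baseline, which is what makes them easy to compare across original features.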
The new project leaderboard feature makes it easy to run multiple diverse experiments. This is useful for estimating the model complexity and accuracy trade-offs. It also keeps track of the expert settings and modeling constraints so our users can quickly generate different and diverse models within a defined search space.
This is another common feature request. For regression problems, we added empirical confidence bands based on actual model behavior on holdout data.
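One common way to build such bands is to look at the residuals on holdout data and take their empirical quantiles. The sketch below follows that approach under stated assumptions: the function name `empirical_band`, the `coverage` parameter, and the use of symmetric residual quantiles are all illustrative and may differ from what Driverless AI does internally.

```python
def empirical_band(holdout_actual, holdout_pred, new_pred, coverage=0.9):
    """Return (lower, upper) bands for new predictions from holdout residuals."""
    residuals = sorted(a - p for a, p in zip(holdout_actual, holdout_pred))

    def quantile(q):
        # Linear interpolation between the two nearest sorted residuals
        pos = q * (len(residuals) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(residuals) - 1)
        frac = pos - lo
        return residuals[lo] * (1 - frac) + residuals[hi] * frac

    alpha = (1 - coverage) / 2
    lower_offset = quantile(alpha)
    upper_offset = quantile(1 - alpha)
    return [(p + lower_offset, p + upper_offset) for p in new_pred]


# Hypothetical holdout data and one new prediction
actual = [10, 12, 9, 11, 13]
predicted = [11, 11, 10, 11, 11]
bands = empirical_band(actual, predicted, [20.0], coverage=0.5)
print(bands)
```

Because the offsets come from observed errors rather than a distributional assumption, the band can be asymmetric around the prediction when the model tends to over- or under-predict.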
We have extended the visualization interface to allow users’ inputs for custom visualizations.
Over the last couple of months, some of our Kaggle Grandmasters participated in Kaggle COVID-19 forecasting competitions and achieved top rankings. They also published an article on backtesting COVID-19 forecasts.
Susceptible-Exposed-Infected-Recovered-Dead (SEIRD) is a useful model for epidemic forecasting use cases. We have integrated it with Driverless AI to further enhance our time-series modeling capabilities.
Read more about our COVID-19 response here.
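For readers new to compartmental models, the following is a minimal sketch of a textbook SEIRD simulation using simple Euler steps. It is not the Driverless AI integration: the function name `simulate_seird`, the parameter names, and the example rates are assumptions, and the population in the infection term is held constant as a simplification.

```python
def simulate_seird(population, exposed0, infected0,
                   beta, sigma, gamma, mu, days, steps_per_day=10):
    """Simulate S, E, I, R, D compartments; returns one tuple per day."""
    s = float(population - exposed0 - infected0)
    e, i, r, d = float(exposed0), float(infected0), 0.0, 0.0
    dt = 1.0 / steps_per_day
    history = [(s, e, i, r, d)]
    for _day in range(days):
        for _step in range(steps_per_day):
            new_exposed = beta * s * i / population * dt   # S -> E
            new_infected = sigma * e * dt                  # E -> I
            new_recovered = gamma * i * dt                 # I -> R
            new_dead = mu * i * dt                         # I -> D
            s -= new_exposed
            e += new_exposed - new_infected
            i += new_infected - new_recovered - new_dead
            r += new_recovered
            d += new_dead
        history.append((s, e, i, r, d))
    return history


# Hypothetical rates, chosen only to produce a plausible-looking curve
history = simulate_seird(population=1_000_000, exposed0=100, infected0=10,
                         beta=0.5, sigma=0.2, gamma=0.1, mu=0.005, days=100)
print(history[-1])
```

In a forecasting setting, the rates would be fitted to observed case and fatality counts rather than fixed by hand.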
For regression problems with many zeros in the dataset (such as loan default), a zero-inflated model (a combination of binary classification and regression) could be a better solution than the standard models. This is now available out of the box in 1.9, thanks to Ryan Chesler.
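The combination of classification and regression can be sketched as a two-part model: one part estimates the probability that the target is non-zero, the other estimates the expected value given that it is non-zero, and the prediction is their product. The example below uses simple per-group frequencies and means as stand-ins for a real classifier and regressor. The function names and the toy data are hypothetical, and this is not a description of how Driverless AI builds its zero-inflated models.

```python
from collections import defaultdict


def fit_two_part(groups, targets):
    """Fit P(y != 0) and mean(y | y != 0) separately for each group."""
    counts = defaultdict(int)
    nonzero_counts = defaultdict(int)
    nonzero_sums = defaultdict(float)
    for g, y in zip(groups, targets):
        counts[g] += 1
        if y != 0:
            nonzero_counts[g] += 1
            nonzero_sums[g] += y
    model = {}
    for g in counts:
        # "Classifier" part: share of non-zero targets in the group
        p_nonzero = nonzero_counts[g] / counts[g]
        # "Regressor" part: average of the non-zero targets only
        mean_nonzero = (nonzero_sums[g] / nonzero_counts[g]
                        if nonzero_counts[g] else 0.0)
        model[g] = (p_nonzero, mean_nonzero)
    return model


def predict_two_part(model, group):
    """Expected target = P(non-zero) * expected value when non-zero."""
    p_nonzero, mean_nonzero = model[group]
    return p_nonzero * mean_nonzero


# Hypothetical loss amounts per customer segment; most are zero
groups = ["a", "a", "a", "a", "b", "b"]
targets = [0, 0, 0, 100, 0, 50]
model = fit_two_part(groups, targets)
print(predict_two_part(model, "a"), predict_two_part(model, "b"))
```

Splitting the problem this way lets the regression part learn only from the non-zero rows instead of being dominated by the zeros.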
We added a feature for users to configure feature engineering steps (transformers) in multiple stages (layers). This feature allows an optional pre-processing layer for specific custom data cleanup/conversions. Subsequent layers can also take each previous layer’s output as input. Our users can now create complex feature engineering pipelines in a flexible manner.
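The layering idea can be illustrated with a few lines of plain Python. This is only a conceptual sketch: `run_layers`, the dictionary-of-functions layout, and the example columns are hypothetical and do not reflect the Driverless AI configuration syntax or recipe API.

```python
def run_layers(row, layers):
    """Apply transformer layers in order; each layer reads the previous output."""
    features = dict(row)
    for layer in layers:
        outputs = {}
        for name, transform in layer.items():
            outputs[name] = transform(features)
        features = outputs
    return features


layers = [
    {   # Layer 0: optional pre-processing (cleanup and type conversion)
        "income": lambda f: float(str(f["income"]).replace(",", "")),
        "debt": lambda f: float(f["debt"]),
    },
    {   # Layer 1: feature engineering on the cleaned values
        "debt_to_income": lambda f: f["debt"] / f["income"],
    },
]

result = run_layers({"income": "50,000", "debt": "12500"}, layers)
print(result)
```

Keeping cleanup in its own first layer means later transformers can assume well-typed inputs.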
Driverless AI can now be configured to run in a multi-node worker mode. This allows users to scale up the training process when they need to complete multiple experiments in a short amount of time.
Note: This new multi-node feature is in a preview (alpha) stage. If you are interested in using multi-node configurations, please contact email@example.com. A single experiment runs entirely on one machine. For this reason, using a large number of commodity-grade machines is not useful in the context of multi-node.
If you are new to Driverless AI, we recommend our risk-free, web-based test drive in H2O Aquarium Cloud. Each lab session lasts for two hours and you can keep trying our software for free. No license key is required. We also have self-paced tutorials to guide you through the journey. Note: We are in the process of updating the materials to Driverless AI 1.9. The new tutorials should be available in the coming weeks.
For existing users with license keys, please download the latest version from our website. You can also find the links to different cloud marketplaces on the same page.
I hope you enjoyed reading this quick overview. Please give Driverless AI 1.9 a spin and share your experience with us.
I would like to thank my colleagues for all the technical details and feedback. Driverless AI is the result of continuous team effort led by Arno. To illustrate, let me just leave a screenshot of Arno’s GitHub page here.
Until next time,