KNIME and H2O.ai, the two data science pioneers known for their open source platforms, have partnered to further democratize AI. Our approaches are about being open, transparent, and pushing the leading edge of AI. We believe strongly that AI is not for the select few but for everyone. We are taking another step in democratizing AI by integrating our award-winning H2O Driverless AI and KNIME Analytics Platform, to make it even easier, faster, and cheaper to deliver expert data science as a force multiplier for every enterprise.
KNIME and H2O.ai started collaborating in 2017 by integrating H2O-3 and Sparkling Water with a collection of KNIME nodes. By the way, if you want to learn more about this integration, check out the resources at the end of this blog.
Today, we are excited to announce that we have expanded our partnership and collaboration. You can now seamlessly use H2O Driverless AI in KNIME via a new KNIME Driverless AI extension available from the KNIME Hub. This new integration empowers data scientists and data analysts to work on machine learning projects faster and more efficiently, using automation and state-of-the-art computing power to complete in minutes or hours tasks that can take humans months.
KNIME users can leverage Driverless AI in a workflow to provide automatic feature engineering, model validation, model tuning, model selection, machine learning interpretability, time-series, NLP, computer vision, and automatic pipeline generation for model scoring. H2O Driverless AI provides companies with a data science platform that addresses the needs of various use cases for every enterprise in every industry.
We have been working with a few early adopters to get their feedback. The response has been overwhelmingly positive, with real excitement about the integration and the productivity gains it brings. Vision Banco has been a long-time user of H2O.ai and KNIME, and its data science team is looking forward to simpler and even more rapid development of data science projects. Below is a quote from Alejandro Lopez, Data Science Leader at Vision Banco, on how he thinks it will help them:
“We have been using KNIME and H2O Driverless AI for years, and we are very excited about this new integration and the automation and simplification that it will bring to our data science workflow.” (Alejandro Lopez, Data Science Leader at Vision Banco)
This post provides more details about the integration, how to get started, how various personas can leverage it, a sample workflow, and pointers to further resources.
If you are new to KNIME, you can learn more from the KNIME product page.
If you are new to H2O Driverless AI, explore the product page or tutorials.
To use H2O Driverless AI within KNIME Analytics Platform, all you need to do is install the H2O Driverless AI extension, and you’re ready to go. Check out this video if you do not know how to install a KNIME extension.
The integration of H2O Driverless AI in KNIME offers a set of nodes that encapsulate the functionality of the H2O Driverless AI automatic machine learning (AutoML) platform, making it easy to use Driverless AI's AutoML capabilities from a KNIME workflow without touching any code. Each H2O Driverless AI node looks and feels just like a normal KNIME node, but during execution the workflow reaches out to the high-performance libraries of H2O.
This new integration between H2O Driverless AI and KNIME helps various personas in the data science life cycle. Below is a short overview of the key personas and how the integration improves their workflow and productivity.
Data Engineers
For Data Engineers, this solution enables seamless data preprocessing that feeds directly into Driverless AI using the popular, easy-to-use, and free KNIME Analytics Platform. You can also use KNIME Server to provide additional deployment capabilities, automation, collaboration, cloud execution, and IT administration. With the new KNIME-to-H2O.ai connectors, customers can blend data from hundreds of data sources, including Salesforce, SharePoint, Oracle, SAP, SAP HANA, Snowflake, Spark, Databricks, Hadoop, TIBCO, Tableau, Power BI, AWS, Azure, and GCP.
Data Scientists
For data scientists and model operations teams, this solution provides additional flexibility by enabling a mix and match of automated and custom machine learning approaches. Data scientists can collaborate with business stakeholders, gaining valuable input to achieve the optimal result. Once a model has been created, they can streamline its path to production using KNIME Integrated Deployment together with the Driverless AI AutoML and MOJO deployment artifacts. With Driverless AI available natively within a KNIME workflow, data scientists get an integrated visual drag-and-drop way to build such a pipeline. They can leverage the industry-leading AutoML in Driverless AI to quickly train high-quality, explainable models that are production-ready in less time.
Deployment Teams
For Deployment Teams, there is now additional flexibility in how and where H2O Driverless AI models are automatically deployed as workflows: as visualizations, RESTful services, web applications, BI dashboards, or third-party tools, all with a no-code approach. Teams can automatically and continuously deploy and update models, including the workflows for automated data access, preparation, and preprocessing. This ensures that nothing is lost in translation between the creation and the deployment of a model, and that the right compute resources are used for ongoing deployment.
Data Science Team Leaders
For Leaders of Data Science teams, this solution enables you to make the best use of your people, time, and technology resources to meet the needs of both the team and the enterprise. It provides an environment that empowers your data science team to combine best-in-class AutoML with other best-in-class approaches and to collaborate on complex projects, with the granular permissions and logging needed for team and project management. The deployment and management functionality makes it easy to productionize data science applications and services in a way that is transparent, secure, and auditable and governable as needed, delivering usable, reliable, and reproducible insights for the business.
Line of Business Leaders
This solution gives Line of Business Leaders insight into the entire process and data lineage, from data access to deployment, so that you can understand how and why decisions are made and bring your domain expertise to bear on the process. This allows you to mitigate risks and ensure that the best results are delivered quickly and at scale to drive the desired business outcomes.
There are four steps to get started with the KNIME Analytics Platform and H2O Driverless AI integration. Below is a quick overview of each step.
1. Get the tools
Download and install KNIME Analytics Platform
Download H2O Driverless AI, get a trial license, and install it
If you are interested in trying the Driverless AI integration with KNIME Server, please fill out this form.
2. Get the Driverless AI KNIME Extension
Download and install the Driverless AI KNIME Extension from within KNIME Analytics Platform.
Or get it from the KNIME Hub.
3. Configure KNIME to connect to H2O Driverless AI
You are almost ready to start. Now you just need to enter the Driverless AI license key and configure KNIME to connect to H2O Driverless AI. Follow these instructions.
4. Start building your workflow
Once you have successfully installed the Driverless AI Extension, restart KNIME Analytics Platform, and you should see the H2O Driverless AI nodes in the node repository under KNIME Labs.
Get an overview of how to start building your workflow below, and follow the KNIME H2O Driverless AI Integration User Guide for details.
In this section, we will walk through an example of the major steps of an end-to-end data science workflow using KNIME Analytics Platform and Driverless AI.
Step 1: Import the Driverless AI license
To use the H2O Driverless AI nodes, you need to import an H2O Driverless AI license file into your KNIME preferences. The Driverless AI license key is typically found at the following path: /opt/h2oai/dai/home/.driverlessai/license.sig. Copy this file to where your KNIME Analytics Platform is installed, then import it into KNIME by navigating to File -> Preferences -> KNIME -> H2O Driverless AI.
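If you prefer to script the copy step instead of doing it by hand, the Python sketch below copies the license file from the typical path above into a folder of your choice. The `copy_license` helper and the default destination folder (`~/knime-licenses`) are hypothetical; the actual import into KNIME still happens in the Preferences dialog.

```python
import shutil
from pathlib import Path
from typing import Optional

# Typical Driverless AI license location; your installation may differ.
DEFAULT_LICENSE = Path("/opt/h2oai/dai/home/.driverlessai/license.sig")


def copy_license(src: Path = DEFAULT_LICENSE, dest_dir: Optional[Path] = None) -> Path:
    """Copy the license file into dest_dir (hypothetical default: ~/knime-licenses)."""
    if not src.is_file():
        raise FileNotFoundError(f"No Driverless AI license found at {src}")
    dest_dir = dest_dir or Path.home() / "knime-licenses"
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy2 also preserves file timestamps
    return dest
```

Failing loudly when the source file is missing avoids silently importing nothing into KNIME later.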
Step 2: Importing Data
KNIME supports a wide array of data sources. From flat files to dynamic Spark connections, KNIME makes it simple to read disparate data and make it work together for use in machine learning algorithms. For example, joining a CSV file, two database tables, and a KNIME table is a simple drag-and-drop process.
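In KNIME this join is built visually from reader and Joiner nodes, with no code. Purely to illustrate what such a join does, here is a minimal Python sketch using the standard-library `sqlite3` and `csv` modules. All table names, columns, and values are made up, and an in-memory dictionary stands in for the KNIME table.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export (stand-in for a CSV Reader node).
CSV_TEXT = "customer_id,region\n1,North\n2,South\n3,East\n"


def build_training_rows() -> list[tuple]:
    con = sqlite3.connect(":memory:")
    # Two hypothetical database tables (stand-ins for database reader nodes).
    con.execute("CREATE TABLE accounts (customer_id INTEGER, balance REAL)")
    con.execute("CREATE TABLE loans (customer_id INTEGER, defaulted INTEGER)")
    con.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 250.0), (2, 90.5), (3, 10.0)])
    con.executemany("INSERT INTO loans VALUES (?, ?)", [(1, 0), (2, 1)])
    # Load the CSV into a third table so it can be joined in SQL.
    con.execute("CREATE TABLE regions (customer_id INTEGER, region TEXT)")
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    con.executemany(
        "INSERT INTO regions VALUES (?, ?)",
        [(int(r["customer_id"]), r["region"]) for r in reader],
    )
    # Inner joins: only customers present in all three tables survive.
    rows = con.execute(
        """SELECT a.customer_id, r.region, a.balance, l.defaulted
           FROM accounts a
           JOIN regions r ON r.customer_id = a.customer_id
           JOIN loans l ON l.customer_id = a.customer_id
           ORDER BY a.customer_id"""
    ).fetchall()
    con.close()
    # Hypothetical in-memory table keyed by customer_id (stand-in for a KNIME table).
    segments = {1: "retail", 2: "sme", 3: "retail"}
    return [row + (segments[row[0]],) for row in rows]
```

Because these are inner joins, customer 3 (who has no row in `loans`) drops out of the result, the same idea as an inner join configured in a Joiner node.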
Step 3: Data Preparation
KNIME provides a rich set of data source connectors and data preparation nodes with a no-code drag-and-drop canvas to simplify data access and preparation. This empowers data analysts, data engineers, and data scientists to quickly build data preparation flows to prepare, wrangle, clean, join, and filter the data and get it ready for machine learning. Once the data is prepared, it can be connected to Driverless AI to build the machine learning models within the same drag-and-drop canvas.
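As a rough illustration of the clean, filter, and type-cast steps that these nodes perform visually, the sketch below prepares raw string records in plain Python. The column names (`region`, `balance`, `defaulted`) and the rules (drop rows without a label, drop negative balances, fill a blank region with "Unknown") are hypothetical examples, not KNIME defaults.

```python
from typing import Optional


def prepare(rows: list[dict]) -> list[dict]:
    """Clean, filter, and type-cast raw string records before modeling."""
    prepared = []
    for row in rows:
        target = (row.get("defaulted") or "").strip()
        if target == "":
            continue  # drop rows with a missing label
        balance_text = (row.get("balance") or "").strip()
        balance: Optional[float] = float(balance_text) if balance_text else None
        if balance is not None and balance < 0:
            continue  # filter out rows with an invalid (negative) balance
        region = (row.get("region") or "").strip().title() or "Unknown"
        prepared.append({"region": region, "balance": balance, "defaulted": int(target)})
    return prepared
```

Missing balances are kept as `None` rather than imputed here, leaving that decision to later steps.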
In order to send KNIME data tables to Driverless AI, connect your workflow to the “Send to Driverless AI” node. Right-click the node and select “Configure” from the context menu.
Before you push the data to Driverless AI you need to configure the connection.
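The connection settings are entered in the node's configuration dialog. The hypothetical Python dataclass below only sketches the kind of information such a connection needs (server address, username, password) together with a basic sanity check on the address. The class name and validation rules are illustrative assumptions, and the example address assumes a Driverless AI server on its usual default port, 12345.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass(frozen=True)
class DaiConnection:
    """Hypothetical container for Driverless AI connection settings."""
    address: str
    username: str
    password: str = field(repr=False)  # keep secrets out of printed/logged output

    def validate(self) -> str:
        """Return the base URL without a trailing slash, or raise ValueError."""
        parsed = urlparse(self.address)
        if parsed.scheme not in ("http", "https"):
            raise ValueError("address must start with http:// or https://")
        if not parsed.hostname:
            raise ValueError("address must include a host name")
        if not self.username:
            raise ValueError("username must not be empty")
        return self.address.rstrip("/")


# Example with a hypothetical local server and user.
conn = DaiConnection("http://localhost:12345/", "analyst", "secret")
```

Marking `password` with `repr=False` means it does not appear when the settings object is printed or logged.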
After you send the data to Driverless AI, you can right-click the “Send to Driverless AI” node and select “Interactive View: H2O Driverless AI Experiment View” to bring up the Driverless AI UI. Use this interface to build an experiment, view the AutoReport, and generate Machine Learning Interpretability (MLI) metrics and graphs.
KNIME can build production machine learning workflows that consume the trained models. H2O.ai provides production-ready, low-latency models and pipelines in the MOJO deployment artifact. A MOJO (Model Object, Optimized) is a standalone, low-latency model object designed to be easily embeddable in production environments. Add an H2O Driverless AI MOJO Predictor node to score data within a KNIME workflow via the drag-and-drop interface.
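In KNIME, the H2O Driverless AI MOJO Predictor node does the scoring. The Python sketch below only illustrates the table-in, table-out batch scoring pattern: each input row comes back with a prediction column appended. It does not load or execute a real MOJO; `ToyLogisticModel`, its weights, and the `prediction` column name are hypothetical stand-ins.

```python
import math
from typing import Protocol


class Scorer(Protocol):
    """Anything that maps one input row to a score."""
    def predict(self, row: dict) -> float: ...


class ToyLogisticModel:
    """Hypothetical stand-in for a trained pipeline; NOT a MOJO."""
    def __init__(self, weights: dict[str, float], bias: float) -> None:
        self.weights = weights
        self.bias = bias

    def predict(self, row: dict) -> float:
        z = self.bias + sum(w * float(row[name]) for name, w in self.weights.items())
        return 1.0 / (1.0 + math.exp(-z))  # logistic function -> probability


def score_table(rows: list[dict], model: Scorer, out_col: str = "prediction") -> list[dict]:
    """Return copies of rows with the model's score appended; inputs are untouched."""
    return [{**row, out_col: model.predict(row)} for row in rows]
```

Returning new rows instead of mutating the input mirrors how a KNIME node produces a new output table from its input table.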
The expanded integration between H2O.ai and KNIME brings together all-encompassing, intuitive, automated machine learning from H2O.ai with the guided analytics from KNIME. Customers of H2O.ai and KNIME can now:
Blogs
KNIME H2O.ai Extensions
Community
Docs
Partner Pages