Combining the power of KNIME and H2O.ai in a single integrated workflow

By Rafael Coss | October 14, 2020

KNIME and H2O.ai, two data science pioneers known for their open source platforms, have partnered to further democratize AI. Our approaches are about being open, transparent, and pushing the leading edge of AI. We believe strongly that AI is not for the select few but for everyone. We are taking another step in democratizing AI by integrating our award-winning H2O Driverless AI with KNIME Analytics Platform, to make it even easier, faster, and cheaper to deliver expert data science as a force multiplier for every enterprise.

KNIME and H2O.ai started collaborating in 2017 by integrating H2O-3 and Sparkling Water with a collection of KNIME nodes. By the way, if you want to learn more about this integration, check out the resources at the end of this blog.

Today, we are excited to announce that we have expanded our partnership and collaboration. You can now seamlessly use H2O Driverless AI in KNIME via a new KNIME Driverless AI extension available from the KNIME Hub. This new integration empowers data scientists and data analysts to work on machine learning projects faster and more efficiently, using automation and state-of-the-art computing power to accomplish in minutes or hours tasks that can take humans months. With it, you can:

  • Develop an integrated data science workflow in KNIME Analytics Platform, from data discovery and data preparation to production-ready predictive models
  • Deliver the power of automatic machine learning to business analysts, enabling more citizen data scientists with H2O Driverless AI
  • Reduce model deployment times, leveraging H2O Driverless AI and KNIME Server to reliably manage the production deployment process

KNIME users can leverage Driverless AI in a workflow to provide automatic feature engineering, model validation, model tuning, model selection, machine learning interpretability, time-series, NLP, computer vision, and automatic pipeline generation for model scoring. H2O Driverless AI provides companies with a data science platform that addresses the needs of various use cases for every enterprise in every industry.

We have been working with a few early adopters to get their feedback. The response has been overwhelmingly positive, with real excitement about the integration and its productivity gains. Vision Banco has been a long-term user of H2O.ai and KNIME. Its data science team is looking forward to the added simplification and even more rapid development of data science projects. Below is a quote from Alejandro Lopez, the Data Science Leader at Vision Banco, on how he thinks it will help them:

“We have been using KNIME and H2O Driverless AI for years, and we are very excited about this new integration and the automation and simplification that it will bring to our data science workflow.” (Alejandro Lopez, Data Science Leader at Vision Banco)

This blog provides more details about the integration, how to get started, how various personas can leverage it, a sample workflow, and pointers to further resources.

If you are new to KNIME, you can learn more from the KNIME product page.

If you are new to H2O Driverless AI, explore the product page or tutorials.

The KNIME H2O Driverless AI Extension

To use H2O Driverless AI within KNIME Analytics Platform, all you need to do is install the H2O Driverless AI extension, and you’re ready to go. If you do not know how to install a KNIME extension, check this video.

The integration of H2O Driverless AI in KNIME offers an extensive set of nodes that encapsulate the functionality of the H2O Driverless AI automatic machine learning (AutoML) platform. This makes it easy to use the Driverless AI AutoML capabilities from a KNIME workflow without touching any code. Each of the H2O Driverless AI nodes looks and feels just like a normal KNIME node, but the workflow reaches out to the high-performance libraries of H2O during execution.

Use Cases By Persona

This new integration between H2O Driverless AI and KNIME helps various personas in the data science life cycle. Below is a short overview of the key personas and how the integration improves their workflow and productivity.

Data Engineers 

For Data Engineers, this solution enables seamless data preprocessing connected to Driverless AI using the popular, easy-to-use, and free KNIME Analytics Platform. You can also use KNIME Server to provide additional deployment capabilities, automation, collaboration, cloud execution, and IT administration. With the new KNIME to H2O.ai connectors, customers can blend data from hundreds of data sources, including Salesforce, SharePoint, Oracle, SAP, SAP HANA, Snowflake, Spark, Databricks, Hadoop, Tibco, Tableau, Power BI, AWS, Azure, and GCP.

Data Scientists

For data scientists and model operation teams, this solution provides additional flexibility by enabling a mix and match of automated and custom machine learning approaches. Data scientists can now collaborate with business stakeholders, gaining valuable input to achieve the optimal result. Once the initial model is created, they can streamline its path to production using Integrated Deployment from KNIME together with the Driverless AI AutoML and MOJO deployment artifacts. Having Driverless AI natively within a KNIME workflow gives data scientists an integrated, visual drag-and-drop way to create such a pipeline. Data scientists can now leverage the industry-leading AutoML in Driverless AI to train high-quality, explainable, production-ready models in less time.

Deployment Teams 

For Deployment Teams, there is now additional flexibility in how and where the H2O Driverless AI trained models are automatically deployed as workflows: as visualizations, RESTful services, web applications, BI dashboards, or inside third-party tools, all with a no-code approach. Teams can automatically and continuously deploy and update models, including the automated data access, preparation, and pre-processing steps of the workflow. This ensures that nothing is lost in translation between the creation and deployment of the model and that suitable compute resources are used for ongoing deployment.

Data Science Team Leaders 

For Leaders of Data Science teams, this solution enables you to make the best use of your people, time, and technology resources in order to meet the needs of both the team and the enterprise. It provides an environment that empowers your data science team to use best-in-class AutoML with other best-in-class approaches and to collaborate on complex projects with the granular permissions and logging needed for team and project management. The deployment and management functionality makes it easy to productionize data science applications and services in a way that is transparent, secure, and able to be audited and governed as needed, and to deliver usable, reliable, and reproducible insights for the business.

Line of Business Leaders 

This solution gives Line of Business Leaders insight into the entire process and data lineage, so that you can understand how and why decisions are made, from data access to deployment, and bring your domain expertise to bear in the process. This allows you to mitigate risks and ensure the best results are delivered quickly and at scale to drive the desired business outcome.

4 Steps to Getting Started

The 4 Steps to get started with the KNIME Analytics Platform and H2O Driverless AI integration are:

  1. Get the tools
  2. Get the Driverless AI KNIME Extension
  3. Configure KNIME to connect to H2O Driverless AI
  4. Start building your workflow

Below we will provide a quick overview of each step.

1. Get the tools 

Download and install KNIME Analytics Platform.

Download H2O Driverless AI, get a trial license, and install it.

If you are interested in trying the Driverless AI integration with KNIME Server, please fill out this form.

2. Get Driverless AI KNIME Extension 

Download and install the Driverless AI KNIME Extension from within KNIME Analytics Platform.

Or get it from the KNIME Hub.

3. Configure KNIME to connect to H2O Driverless AI 

You are almost ready to start. Now you just need to enter the Driverless AI license key and configure KNIME to connect to H2O Driverless AI. Follow these instructions.

4. Start Building your workflow 

Once you have successfully installed the Driverless AI Extension, restart KNIME Analytics Platform and you should see the following nodes in the node repository under KNIME Labs:

Screenshot: H2O Driverless AI nodes in the KNIME node repository

Get an overview of how to start building your workflow below, and follow the KNIME H2O Driverless AI Integration User Guide.

Combining the power of KNIME and H2O in a single workflow example

In this section, we will walk through an example of the major steps of an end-to-end data science workflow using KNIME Analytics Platform and Driverless AI.

Step 1: Import the Driverless AI license 

To use the H2O Driverless AI nodes, you need to import an H2O Driverless AI license file into your KNIME preferences. You will typically find the Driverless AI license key under the following path: /opt/h2oai/dai/home/.driverlessai/license.sig. Copy this file to where KNIME Analytics Platform is installed. Then import it into KNIME by navigating to File -> Preferences -> KNIME -> H2O Driverless AI, as shown below:

Uploading Driverless AI license to KNIME
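
The copy step in Step 1 can also be scripted. The following is a minimal sketch in Python, assuming the typical license path given in the text; the function name `stage_license` and the target directory are hypothetical and chosen only for illustration. The import into the KNIME preferences is still done through the dialog.

```python
from pathlib import Path
import shutil

# Typical location named in the text; your installation may differ.
DEFAULT_LICENSE = Path("/opt/h2oai/dai/home/.driverlessai/license.sig")


def stage_license(source: Path, target_dir: Path) -> Path:
    """Copy the license file into target_dir and return the new path.

    `target_dir` is a hypothetical folder that the KNIME machine can read;
    it is not a location that KNIME itself requires.
    """
    if not source.is_file():
        raise FileNotFoundError(f"No license file found at {source}")
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 also preserves file metadata such as timestamps.
    return Path(shutil.copy2(source, target_dir / source.name))
```

A call such as `stage_license(DEFAULT_LICENSE, Path.home() / "knime-licenses")` would place a copy in a folder of your choice, which you can then select in the preferences dialog.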

Step 2: Importing Data 

KNIME supports a wide array of data types. From flat files to dynamic Spark connections, KNIME can make it simple to read disparate data types and make them work together for use in machine learning algorithms. In the below example, joining a CSV file, two database tables, and a KNIME table is a simple drag and drop process.

Screenshot: KNIME workflow joining a CSV file, two database tables, and a KNIME table
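
For readers who think in code, the blending performed by the nodes is conceptually a join across sources. Here is a minimal sketch in Python, assuming made-up table and column names and an in-memory SQLite database as a stand-in for a real data source. It only illustrates the idea and is not what KNIME runs internally.

```python
import csv
import io
import sqlite3

# Hypothetical CSV input: one row per customer.
CUSTOMERS_CSV = "customer_id,name\n1,Ana\n2,Luis\n"


def blend(csv_text: str, conn: sqlite3.Connection) -> list[dict]:
    """Inner-join the CSV rows to the `balances` table on customer_id."""
    # Load the database side into a lookup keyed by customer_id.
    balances = dict(conn.execute("SELECT customer_id, balance FROM balances"))
    rows = []
    for rec in csv.DictReader(io.StringIO(csv_text)):
        cid = int(rec["customer_id"])
        if cid in balances:  # keep only customers present in both sources
            rows.append(
                {"customer_id": cid, "name": rec["name"], "balance": balances[cid]}
            )
    return rows


# Hypothetical database input: an in-memory table of account balances.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE balances (customer_id INTEGER, balance REAL)")
conn.executemany("INSERT INTO balances VALUES (?, ?)", [(1, 250.0), (3, 90.5)])

blended = blend(CUSTOMERS_CSV, conn)
```

Only customer 1 appears in both sources, so `blended` holds a single joined row. In KNIME the same result comes from connecting reader nodes to a joiner node, with no code involved.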

Step 3: Data Preparation 

KNIME provides a rich set of data source connectors and data preparation nodes with a no-code drag and drop canvas to simplify data access and preparation. This empowers data analysts, data engineers, and data scientists to quickly build data preparation flows that wrangle, clean, join, and filter the data and get it ready for machine learning. Once the data is prepared, it can be connected to Driverless AI to build the machine learning models within the same drag and drop canvas.

Screenshot: data preparation workflow in KNIME

Step 4: Building Models with Driverless AI

In order to send KNIME data tables to Driverless AI, connect your workflow to the “Send to Driverless AI” node. Right-click the node and select “Configure” from the context menu.

Example workflow to push data from KNIME Analytics Platform to H2O Driverless AI

Before you push the data to Driverless AI you need to configure the connection.

Screenshot: configuration dialog of the “Send to Driverless AI” node

After you send the data to Driverless AI, you can right-click the “Send to Driverless AI” node and select “Interactive View: H2O Driverless AI Experiment View” to bring up the Driverless AI UI. Use this interface to build an experiment, view the AutoReport, and generate Machine Learning Interpretability (MLI) metrics and graphs.

Screenshot: launching the interactive view from the node's context menu

Screenshot: the Driverless AI UI as it appears within KNIME

Step 5: Deploy Model and Score New Data

KNIME can build machine learning production workflows that consume the trained models. H2O.ai provides production-ready, low-latency models and pipelines in the MOJO deployment artifact. A MOJO (which stands for Model Object, Optimized) is a standalone, low-latency model object designed to be easily embeddable in production environments. To score data within a KNIME workflow, add an H2O Driverless AI MOJO Predictor node via the drag and drop interface.

Screenshot: scoring data with the H2O Driverless AI MOJO Predictor node

Conclusion

The expanded integration between H2O.ai and KNIME brings together all-encompassing, intuitive, automated machine learning from H2O.ai with the guided analytics from KNIME. Customers of H2O.ai and KNIME can now:

  • Develop an integrated data science workflow in KNIME Analytics Platform and KNIME Server, from data discovery and data preparation to production-ready predictive models
  • Deliver the power of automatic machine learning to business analysts, enabling more citizen data scientists with H2O Driverless AI
  • Reduce model deployment times, leveraging H2O Driverless AI and KNIME Server to reliably manage the workflow, the model creation process, and production deployment

Additional Resources

Blogs

KNIME H2O.ai Extensions 

Community 

  • H2O Machine Learning with KNIME Analytics Platform – Christian Dietz – H2O AI World London (Slides)(Video)
  • Meetup: Leveraging H2O Machine Learning with KNIME Analytics Platform – Paolo Tamagnini, Marten Pfannenschmidt
  • H2O in KNIME: Integrating High Performance Machine Learning – Jo-Fai Chow (H2O.ai), Marten Pfannenschmidt (KNIME), Christian Dietz (KNIME)

Docs 

Partner Pages 

Rafael Coss

Rafael Coss is a Community and Partner Maker at H2O.ai. Prior to joining H2O.ai, he was technical marketing and community Director and a developer advocate at Hortonworks. He was also the DataWorks Summit Program Co-Chair for the past 3 years. Prior to Hortonworks he was a Senior Solution Architect and Manager of IBM’s WW Big Data Enablement team. At IBM he was responsible for the technical product enablement for BigInsights and Streams. Previously, he held several other positions in IBM, where he worked on tools, XML db, federated db and Object-Relational db.

Stefan Pacinda

Stefan Pacinda is a solution architect at H2O.ai. Located in Prague, Czech Republic, he is responsible for making sure H2O.ai prospects and customers adopt Machine Learning solutions and implement them within their IT infrastructure, whether on premises or in the cloud. Prior to joining H2O.ai, he worked at Hewlett Packard, HPE, and Micro Focus on the engineering team that built Service Virtualization.