November 22nd, 2021

Amazon Redshift Integration for H2O.ai Model Scoring

Category: Data Science, H2O AI Cloud

At H2O.ai, we consistently work with our partners on innovative ways to use models in production, and we are excited to demonstrate our Amazon Redshift integration for model scoring.

Amazon Redshift is a very popular data warehouse on AWS. We wanted to expand on the existing capability of using data from Redshift to train a model on the H2O AI Cloud, a comprehensive automated machine learning platform. Now, once a model is trained, Redshift can use the model for inferencing (scoring) using standard SQL.

Calling the model using SQL is a convenient way to create inferences that can be stored in Redshift. Because the model operates like an SQL function, it is easy to include the model in an SQL query that uses live data, rather than following the more common extract, score, and upload approach.
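To make the "model as an SQL function" idea concrete, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a local stand-in for Redshift, and a fixed rule (toy_score) as a stand-in for the model, so it does not call an H2O MOJO or SageMaker at all. The table name customers, its columns, and the two-argument signature are assumptions made for illustration; only the function name h2oscore_pipeline comes from this post.

```python
import sqlite3


def toy_score(tenure_months, monthly_charges):
    """Hypothetical stand-in for a model: a fixed rule, NOT an H2O MOJO."""
    return 1.0 if tenure_months < 12 and monthly_charges > 70.0 else 0.0


# SQLite in memory stands in for Redshift here (assumption for illustration).
conn = sqlite3.connect(":memory:")

# Register the Python function under the SQL name used in this post.
conn.create_function("h2oscore_pipeline", 2, toy_score)

# Hypothetical table and columns.
conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, "
    "tenure_months INTEGER, "
    "monthly_charges REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 3, 89.5), (2, 40, 55.0), (3, 8, 20.0)],
)

# The scoring function is called inline, on the current rows of the table.
rows = conn.execute(
    "SELECT customer_id, "
    "h2oscore_pipeline(tenure_months, monthly_charges) AS prediction "
    "FROM customers ORDER BY customer_id"
).fetchall()

print(rows)
```

The point of the sketch is the shape of the query: the prediction is just another column in the select list, computed from whatever data is in the table at query time.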

Any application that uses Redshift to access data can now get real-time predictions by leveraging current data at scoring time. Once a model is created using the H2O AI Hybrid Cloud, the following install steps will enable that model to be used in Redshift:

  1. Download the mojo.zip file (Download Scoring Pipeline > Download Mojo Scoring Pipeline)
  2. Unzip mojo.zip and, in the unzipped directory, find the file pipeline.mojo. This is the model and is the only file required.
  3. Download and follow the AWS SageMaker integration steps outlined here
  4. Generate the Redshift SQL for this model. With all the files in the same directory, use the jar downloaded in Step 3, specifying the model to use and the type of artifact you want to generate (Redshift-SQL). The command generates the Redshift function for this specific model and the SQL that can be used to call the model for inference. The result of the command is a file (pipeline.mojo.Redshift-sql).

Notice how the function name (h2oscore_pipeline) contains the name of the model. This is because each model could have a different number of parameters and types attributed to it. If we specified a model called churn.mojo on the generation step, the function name would be h2oscore_churn.
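The naming rule described above can be sketched in a few lines of Python. This is only an illustration based on the two examples in this post (pipeline.mojo and churn.mojo). The helper scoring_function_name is hypothetical and is not part of the H2O tooling, and how the real generator treats file names containing characters that are not valid in SQL identifiers is not covered here.

```python
from pathlib import Path


def scoring_function_name(mojo_filename: str) -> str:
    """Hypothetical helper: prefix the model file's base name with 'h2oscore_'."""
    # Path(...).stem drops the directory and the final extension (.mojo).
    return "h2oscore_" + Path(mojo_filename).stem


for name in ("pipeline.mojo", "churn.mojo"):
    print(name, "->", scoring_function_name(name))
```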

  5. Paste the generated SQL into an SQL tool and execute it.

The SageMaker and IAM_ROLE values in the generated SQL need to be set for your account. These would have been created in Step 3.

The last part of the output shows an example SQL select statement for the specific model. This shows how to call the model and which columns will be passed from the table. The placeholder <table-name> should be changed to the table in Redshift that you would like to use for inferencing.

Now you can execute the SQL with any SQL editor that can connect to Redshift. In this example, I used the AWS Query Editor.

One way to capture the results is to create a table with a select statement. This operation leaves the original table unchanged and writes the results of the scoring to a new table. Notice here that I included a key (customer id) so that I can use it in a join to reference the original row.
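The pattern of writing scores to a new table and joining back on a key can be sketched as follows. As before, this is a local illustration only: sqlite3 stands in for Redshift, toy_score stands in for the model, and the names customers, customer_scores, customer_id, tenure_months, and monthly_charges are assumptions rather than names from the actual example.

```python
import sqlite3


def toy_score(tenure_months, monthly_charges):
    """Hypothetical stand-in for a model: a fixed rule, NOT an H2O MOJO."""
    return 1.0 if tenure_months < 12 and monthly_charges > 70.0 else 0.0


conn = sqlite3.connect(":memory:")  # local stand-in for Redshift (assumption)
conn.create_function("h2oscore_pipeline", 2, toy_score)

conn.execute(
    "CREATE TABLE customers ("
    "customer_id INTEGER PRIMARY KEY, "
    "tenure_months INTEGER, "
    "monthly_charges REAL)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 3, 89.5), (2, 40, 55.0), (3, 8, 20.0)],
)

# Write the key plus the prediction to a new table; customers is not modified.
conn.execute(
    """
    CREATE TABLE customer_scores AS
    SELECT customer_id,
           h2oscore_pipeline(tenure_months, monthly_charges) AS prediction
    FROM customers
    """
)

# Join on the key to line each prediction up with its original row.
joined = conn.execute(
    """
    SELECT c.customer_id, c.tenure_months, s.prediction
    FROM customers AS c
    JOIN customer_scores AS s ON s.customer_id = c.customer_id
    ORDER BY c.customer_id
    """
).fetchall()

# Confirm the original table still has only its original columns.
original_columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]

print(joined)
print(original_columns)
```

Keeping the key in the scores table is what makes the join possible; without it, the predictions could not be matched back to the rows they were computed from.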

This new functionality enables scoring to be invoked from Redshift. This saves time for the operations team, because data does not have to be selected, exported for scoring, and then reloaded, which reduces the time it takes to operationalize the model.

Because the model can be called using SQL, any application that uses Redshift can now get predictions. This further increases the value of the model's output throughout the organization, as predictions can be based on current data rather than having been created days or weeks ago.

This new functionality for scoring opens the possibility of using Redshift for real-time scoring! If you aren’t a current H2O.ai user, you can sign up to try the H2O AI Cloud for free today!

About the Author

Eric Gudgion

Eric is a Senior Principal Solutions Architect who is passionate about performance and scalability. Eric’s role enables him to help customers adopt H2O within their enterprises.
