June 9th, 2021

H2O Integrates with Snowflake Snowpark/Java UDFs: How to better leverage the Snowflake Data Marketplace and deploy In-Database

RSS icon RSS Category: H2O AI Cloud, H2O Driverless AI, Partners, Snowflake, Technical
Snowflake on H2O.ai

One of the goals of machine learning is to find unknown predictive features, even hidden from subject matter experts, in datasets that might not be apparent before, and use those 3rd party features to increase the accuracy of the model.

A traditional way of doing this was to try and scrape and scour distributed, stagnant data sources on the web and cross your fingers you would see some performance gain when doing model validation. An example of this would be scraping stock prices or GDP prices to see if the economy had an effect on things. One potentially better solution was Snowflake’s Data Marketplace, which allowed people to more easily share up-to-date datasets, but the requirement to work in SQL made it troublesome.

Today, we are excited to announce integrations with Snowflake’s Snowpark and Java UDFs, Snowflake’s new developer experience. This essentially opens up access to Snowflake’s Data Marketplace to data scientists and developers, now allowing 3rd party unique datasets that can help improve model accuracy and deliver better business outcomes to be included in their often standard notebook based programming approaches.

Snowpark is a client-side library that pushes client-side logic into Snowflake. Within a Virtual Warehouse, Java User Defined Functions (UDFs) can be run that contains your choice of code.

Snowflake’s new feature Snowpark enables the data in Snowflake to be available as a dataframe using Scala, and the Java UDF feature allows Java code to be executed as a user defined function (UDF) within the Snowflake environment. Meaning once we trained our model on our data plus the 3rd party data we like in the Snowflake marketplace, we can import the model into where the data lives for an easier path to deployment. For the data scientist, this opens up an amazing opportunity to use familiar H2O.ai tools and models, but use the power and scaling of the Snowflake platform.

To show an example, we will discuss how to use the Snowflake Data Marketplace to access a dataset from a company, Infutor Data Solutions, that has shared demographic data, and we want to use that with our existing dataset (to predict loan defaults with LendingClub data) in Snowflake.

The first step is to create a table based on the Infutor dataset. Then, we will average some columns and group them, so that they can be used.

val DataMarketPlaceDF = session.sql("CREATE or REPLACE TABLE DataMarketPlacePropertyAverage AS SELECT STATE, ROUND(avg(PROP_VALCALC),0) as PROP_VALCALC FROM Infutor_AV10424_TOTAL_DEMOGRAPHIC_PROFILES_100 GROUP BY State;")

The Snowflake marketplace data can then be used as a reference table. We can use the Snowpark API to create this reference table:

val PropertyAverageDF = session.table("DataMarketPlacePropertyAverage")


println("Number of rows created at State level: %s",PropertyAverageDF.count())

Snowpark makes it simple to join the two tables:

// Create Dataframe with Lending Club Data

val LendingDF = session.table("LENDINGCLUB")


// Join Dataframes based on State

val enrichedDF =  LendingDF.join(PropertyAverageDF,
  LendingDF.col("ADDR_STATE") === PropertyAverageDF.col("STATE"))

The table will now contain the average property value for each row in the original table. All of these operations occur in Snowflake, so scaling and the time to complete this operation is very fast.

The next step is to save this new dataframe and invoke Driverless AI training:

// Save new Dataframe to table and start training


println("Calling Training with new Table")

val training = session.sql("select H2OTrain('Build --Table=LENDINGCLUBwithPropertyAverage --Target=BAD_LOAN --Clone=Yes --Modelname=riskmodelPropertyAverage.mojo')")


This operation uses another unique Snowflake feature, Snowflake External Functions, which enables a table to be used with Driverless AI. The result is a model that is built using the advanced feature engineering and modeling capabilities offered by Driverless AI.

Once the model is trained, it is automatically deployed. It is available within the Snowflake environment, and we are now able to use the model directly in Snowflake using the Snowpark and Java UDF features.

// call the model using SQL until Java UDF is auto deployed, then use session.table()



The model executes and makes predictions in Snowflake. The key theme here is: the data does not leave the Snowflake environment, and it automatically scales with the size of the warehouse you’ve selected.

What is nice is that the Snowpark Scala and Java UDF code is all auto-generated for the Data Scientist.

Using the Infutor data, we were able to improve the model’s accuracy and score at scale within the Snowflake environment.


About the Authors

Eric Gudgion

Eric is a Senior Principal Solutions Architect at H2O.ai and works with customers on deploying machine learning models into production environments.


Miles Adkins

Miles is a Partner Sales Engineer at Snowflake and leads the joint technical go-to-market for Snowflake’s machine learning and data science partners.

About the Author

Eric Gudgion
Eric Gudgion

Eric is a Senior Principal Solutions Architect, he is passionate about performance and scalability. Eric’s role enables him to help customers adopt h2o within their enterprises.

Leave a Reply

H2O World Dallas Customer Talks

After three long years of not having an #H2OWorld, we finally held our first one

November 24, 2022 - by Vinod Iyengar
New in Wave 0.24.0

Another Wave release has arrived with quite a few exciting new features. Let's quickly go

November 21, 2022 - by Martin Turoci
Fallback Featured Image
H2O.ai Raises $40 Million to Democratize Artificial Intelligence for the Enterprise

Series C round led by Wells Fargo and NVIDIA MOUNTAIN VIEW, CA – November 30, 2017

November 20, 2022 - by
H2O.ai Placed Furthest in Completeness of Vision in 2021 Gartner Data Science and Machine Learning Magic Quadrant in the Visionaries Quadrant. — Copy

At H2O.ai, our mission is to democratize AI, and we believe driving value from data

November 18, 2022 - by Read Maloney, SVP of Marketing
H2O.ai Expands Market Footprint in Healthcare AI by Signing Hackensack Meridian Health and Other Key Providers

We’re excited to attend the HLTH conference this week in Las Vegas, NV. This industry

November 14, 2022 - by Prashant Natarajan
An Introduction to H2O Wave Table

H2O Wave is a Python package for creating realtime ML/AI applications for a wide variety of

November 13, 2022 - by Rohan Rao

Start Your Free Trial