October 12th, 2020

Empowering Snowflake Users with AI using SQL

Category: Community, Machine Learning, Partners, Technical, Tutorials

At H2O.ai we work with many enterprise customers, from Fortune 500 giants to small startups. What we hear from these customers as they embark on their data science and machine learning journey is the need to capture and manage more data cost-effectively, and to share that data across their organization to make better business decisions. The cloud offers many benefits for building a data platform, but the danger of vendor lock-in always lurks around the corner. That’s why many customers are looking to Snowflake as their data platform, so they can use their choice of cloud provider for their data strategy. The same is true when customers select an automatic machine learning technology. Being able to choose the cloud infrastructure on which to run data science workloads lets customers use best-of-breed solutions that give them a competitive edge with cloud-neutral, innovative technology platforms.

Making AI Accessible to Snowflake SQL Users

The challenge for many companies is how to extract more value from the data they capture and store in the Snowflake Data Cloud. Data science and machine learning are a great way to derive predictive insights from data and make better business decisions. Companies are highly dependent on data scientists for extracting new predictive insights from the data they have. Implementing the entire process tends to be difficult and tedious, and it requires a number of differently skilled people. The data scientist is not the only key player in that process: data engineers and analysts, who are very familiar with SQL for querying data, are involved as well. Making AI and ML available to these users in their familiar SQL environment opens up a range of new possibilities to accelerate the adoption of AI. This is why H2O.ai worked closely with Snowflake to put the power of Driverless AI at the fingertips of Snowflake users.

Figure 1: Using SQL in Snowflake for machine learning

Removing Barriers to Deploying Models in Production

Organizations depend on data ops teams and data engineers to extract business value from the models that data scientists build. The whole idea behind integrating H2O Driverless AI with Snowflake is to streamline the end-to-end machine learning process, from developing machine learning models all the way to putting those models into production and scoring new data that is being captured about customers.

Figure 2: Streamlining the ML pipeline process

The question is how much of the model development process can be automated within the ML platform. Driverless AI is all about automating data science and machine learning tasks to speed up the creation of highly accurate models. Once a model is built, it needs to go into production, where it actually generates business value. The path from model development to model deployment introduces complicated tasks that bring in other roles in addition to data scientists. Data engineers or data ops people are responsible for taking those models and ensuring they can be operationalized in a production environment.

Using Driverless AI from Within Snowflake

Let’s first look at how model building and deployment commonly worked with data in a Snowflake environment. The data scientist would use the Driverless AI GUI to train a model with data imported using the Snowflake connector. That model would then be deployed in a scoring engine for production use. To make predictions on new data, you had to export that data into a .csv file (or another file format) and push it into the scoring engine. The predictions made in the scoring engine then had to be written back into the Snowflake environment. Although this might seem simple and straightforward, it is a tedious and cumbersome process to set up and manage. In addition, this batch process does not lend itself to real-time scoring on fresh data for AI-enabled applications that need in-the-moment predictions.

With Snowflake introducing external functions earlier this year, H2O.ai had an opportunity to make this whole process much more efficient. By using external functions we can make Driverless AI available as a remote service to users from within Snowflake. Driverless AI can be invoked from within Snowflake to train or retrain a model, automatically deploy it as a REST server, and make it available to score new data. All of this is done with familiar SQL statements from within Snowflake. With external functions, there is no longer any need to export data from Snowflake in order to score it. By calling the function in SQL from the Snowflake user interface, it is now possible to update tables with predictions directly in Snowflake.

Figure 3: Using external functions to make predictions in Snowflake
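
To make the mechanics a little more concrete, here is a minimal Python sketch of what the remote side of an external function might do. Snowflake sends rows to the remote service in batches as a JSON body of the form {"data": [[row_number, arg1, arg2, ...], ...]} and expects one result per row number in the response. Everything else below is an assumption for illustration only: the handler name, the two feature columns, and the toy_score function, which is a hypothetical stand-in for a call to a deployed model and is not the Driverless AI scoring API.

```python
import json
from typing import Callable, List, Sequence


def handle_external_function_request(
    body: str,
    score_row: Callable[[Sequence], float],
) -> str:
    """Turn an external-function request body into a response body.

    Each input row is [row_number, arg1, arg2, ...]; each output row is
    [row_number, result], so Snowflake can match results back to rows.
    """
    request = json.loads(body)
    results: List[list] = []
    for row in request["data"]:
        row_number, features = row[0], row[1:]
        results.append([row_number, score_row(features)])
    return json.dumps({"data": results})


def toy_score(features: Sequence) -> float:
    """Hypothetical stand-in for a deployed model's scoring call."""
    age, balance = features  # assumed feature columns, for illustration only
    return round(min(1.0, 0.01 * age + 0.0001 * balance), 4)


if __name__ == "__main__":
    # Two rows, as Snowflake would batch them for the remote service.
    request_body = json.dumps({"data": [[0, 40, 1000], [1, 25, 5000]]})
    print(handle_external_function_request(request_body, toy_score))
```

In a real deployment, a handler like this would sit behind the HTTPS endpoint that the external function points to, and the scoring call would go to the model's REST server. From the SQL side, the user only sees a function that takes column values and returns a prediction.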

The integration of H2O Driverless AI with Snowflake using external functions puts automatic machine learning at the fingertips of every Snowflake user, including data engineers and data analysts. They no longer need to learn a new technology platform to use the full power of ML to extract meaningful insights from their data. The result is a more efficient, flexible and cost-effective machine learning process that will accelerate the adoption of AI.

To learn more, visit our Snowflake page at: https://www.h2o.ai/partner/snowflake/

About the Authors

Vinod Iyengar, VP of Products

Vinod is VP of Products at H2O.ai. He leads all product marketing efforts, new product development and integrations with partners. Vinod comes with over 10 years of Marketing & Data Science experience in multiple startups. He was the founding employee for his previous startup, Activehours (Earnin), where he helped build the product and bootstrap the user acquisition with growth hacking. He has worked to grow the user base for his companies from almost nothing to millions of customers. He’s built models to score leads, reduce churn, increase conversion, prevent fraud and many more use cases. He brings a strong analytical side and a metrics driven approach to marketing. When he is not busy hacking, Vinod loves painting and reading. He is a huge foodie and will eat anything that doesn’t crawl, swim or move.

Yves Laurent

Yves has over 20 years of experience in building partner and channel go-to-market strategies for leading technology companies. He started his career at Cisco Systems, where he held various sales and marketing leadership positions across EMEA, APAC and the US. Before joining H2O he led partner marketing at Denodo and Hortonworks, where his focus was on ensuring partner success through partner programs that align with business objectives. In his spare time he enjoys the outdoors with his family and friends.
