October 28th, 2021

Announcing the H2O AI Feature Store

Category: H2O AI Cloud, Product Updates

We’re excited to announce the H2O AI Feature Store, the only intelligent feature store on the market. We have been working on it for many months with our co-development partner, AT&T, which enabled us to build a first-of-its-kind platform designed to be enterprise-grade from day one. It is built with best-of-breed technology that integrates with common enterprise machine learning pipelines. The Feature Store will be available to customers as part of the H2O AI Cloud.

What is a Feature?

Before we get to the feature store, let’s quickly review what a feature is. Most machine learning and AI applications do not use raw data directly; they transform it into ‘features’ designed to capture as much signal from the data as possible. Features are often simple transformations (such as logarithmic or exponential), aggregations (such as the sum of sales over a time period), or interactions between other features (such as a debt-to-income ratio).
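
To make the three kinds of features concrete, here is a minimal Python sketch. The record layout and field names (income, debt, daily_sales) are invented for illustration and are not taken from any H2O product.

```python
import math

# Hypothetical raw record; the field names are assumptions for this example.
raw = {
    "income": 60000.0,
    "debt": 15000.0,
    "daily_sales": [120.0, 80.0, 100.0],
}


def build_features(record):
    """Turn one raw record into model-ready features."""
    return {
        # Simple transformation: compress a skewed value with a logarithm.
        "log_income": math.log(record["income"]),
        # Aggregation: total sales over the period covered by the record.
        "sales_sum": sum(record["daily_sales"]),
        # Interaction: ratio of two raw fields.
        "debt_to_income": record["debt"] / record["income"],
    }


features = build_features(raw)
```

Each entry in the resulting dictionary is the kind of value a feature store would hold so that it does not need to be recomputed for every model.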

Key Challenges

As we deployed our AI platform at many large enterprises and put models into production, we kept hearing about two problems: redundant, time-consuming work to recreate features in production, and a lack of collaboration across data science teams. At some large companies, assembling the right set of features is probably the most significant part of a project.

One large financial services customer told us that it took nearly half a year to put together the data for a new model. Many of the challenges involved resolving data access, permissions, approvals for certain features, and model reviews by governance teams. They found that the same sets of features were often used by several models, so if they could reuse features from existing models, they could bypass the whole upfront process. The H2O AI Feature Store addresses this: it is a repository to store, update, retrieve, and share machine learning (ML) features.

Data scientists and domain experts often spend large amounts of time exploring and transforming raw data to create predictive features. Unfortunately, these valuable and often costly features are typically available only to the data scientists who created them. The H2O AI Feature Store makes it easy for organizations to organize, govern, share, and operationalize them. Just as important, the features are available for both batch and real-time use without having to be engineered again.

With the H2O AI Feature Store, organizations can increase their pace of innovation and deliver impactful AI outcomes faster.

How does it work?

The feature store consists of three main components:

  • Offline store of features for training and batch scoring
  • Online store for real-time scoring and streaming
  • Metadata Registry to enable search and collaboration

Data scientists and engineers can continue to build features in their existing environments or tools and bring them into the feature store through one of our many clients. We have native integrations with platforms like Databricks, Snowflake, Teradata, and more. Data scientists can also use our Scala or Python client to access the feature store directly.

Users can create new projects, register them, and then ingest data into the feature store.

Once the data is in the feature store, users can configure how frequently the features are updated, based on their use case and needs. The feature store keeps a mirrored copy of the data for both online and offline requirements. The offline store is typically used to access data for model training (using historical data) and batch scoring, and it is built to handle massive amounts of data. The online store, on the other hand, is typically used for real-time scoring and streaming use cases, so it is built to deliver features with sub-millisecond latency. Models deployed with H2O MLOps, or anywhere else, can query the feature store in real time in the middle of a transaction and use the returned features to score and return predictions.
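
The split between the offline and online stores can be illustrated with a toy in-memory sketch in Python. This is not the H2O AI Feature Store client API; the class ToyFeatureStore and all of its method names are hypothetical, and a real deployment would back each store with dedicated storage rather than Python lists and dictionaries.

```python
class ToyFeatureStore:
    """Hypothetical in-memory stand-in for a mirrored offline/online store."""

    def __init__(self):
        self._offline = []  # full history: (entity_id, timestamp, features)
        self._online = {}   # entity_id -> (timestamp, latest features)

    def ingest(self, entity_id, timestamp, features):
        # Every ingested row is appended to the offline history.
        self._offline.append((entity_id, timestamp, dict(features)))
        # The online copy keeps only the most recent row per entity.
        current = self._online.get(entity_id)
        if current is None or timestamp >= current[0]:
            self._online[entity_id] = (timestamp, dict(features))

    def get_training_rows(self, start, end):
        """Offline read: all rows with start <= timestamp < end."""
        return [row for row in self._offline if start <= row[1] < end]

    def get_online_features(self, entity_id):
        """Online read: latest features for one entity, for scoring time."""
        return self._online[entity_id][1]


store = ToyFeatureStore()
store.ingest("cust-1", 1, {"debt_to_income": 0.30})
store.ingest("cust-1", 2, {"debt_to_income": 0.25})
store.ingest("cust-2", 2, {"debt_to_income": 0.40})

training_rows = store.get_training_rows(0, 3)       # historical rows for training
latest = store.get_online_features("cust-1")        # single lookup for scoring
```

The offline read scans history over a time range, while the online read is a single key lookup, which is why the two stores can be optimized for throughput and latency respectively.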

Key Capabilities

The feature store also includes several capabilities that we are especially excited about:

  • Automatic Feature Recommendations

Automatically improve the features in your feature store. Data scientists select the feature sets they want to update and improve and request feature recommendations. H2O then recommends new features and feature updates that could improve AI model performance. Data scientists can review the proposed features and accept or discard them, retaining complete control. Feature recommendations can be set up to run automatically or on demand.

  • Automatic Feature Drift Detection

The feature store automatically checks both individual features and feature sets for drift over time and alerts users. Alerts can be used to trigger retraining or refitting to keep models accurate.
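
This post does not say which drift statistic the feature store uses. As one common way to check a single numeric feature, the sketch below computes a Population Stability Index (PSI) between a baseline sample and a recent sample. The function name, the bin count, and the alert threshold are illustrative assumptions, not H2O settings.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index of `actual` relative to `expected`."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the baseline range; fall back to 1.0 if it is flat.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # Floor at eps so empty bins do not produce log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


DRIFT_THRESHOLD = 0.2  # illustrative alert level, an assumption for this sketch

baseline = [float(v) for v in range(100)]
recent = [v + 50.0 for v in baseline]  # the same feature, shifted upward

drifted = psi(baseline, recent) > DRIFT_THRESHOLD
```

An alert raised when the score crosses the threshold is the kind of signal that could trigger retraining.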

  • Automatic Bias Identification

Automatically detect bias in your features. Data scientists select the set of features they’d like to analyze, and the H2O AI Feature Store analyzes them and reports whether bias was detected. This helps data scientists monitor features on an ongoing basis and continually remove bias. Data scientists keep complete control to review and act on features that may create bias.

  • Feature Rank

Automatically score features to indicate their popularity or value across different use cases. The score will be tied to the variable importance of models, to show which features are most valuable across use cases.
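
The exact scoring formula is not described here. One plausible way to combine popularity and variable importance, sketched below in Python, is to normalize each model's importances so they sum to 1 and then add them up per feature. The function rank_features and the sample model data are hypothetical.

```python
from collections import defaultdict


def rank_features(model_importances):
    """Return (feature, total_share, models_using_it), highest share first."""
    totals = defaultdict(float)
    usage = defaultdict(int)
    for importances in model_importances.values():
        # Normalize per model so models on different scales are comparable.
        norm = sum(importances.values()) or 1.0
        for feature, value in importances.items():
            totals[feature] += value / norm
            usage[feature] += 1
    return sorted(
        ((f, totals[f], usage[f]) for f in totals),
        key=lambda row: row[1],
        reverse=True,
    )


# Invented variable importances for two models that share one feature.
models = {
    "churn": {"tenure": 3.0, "debt_to_income": 1.0},
    "credit_risk": {"debt_to_income": 3.0, "log_income": 1.0},
}

ranking = rank_features(models)
```

In this sample, debt_to_income ranks first because it contributes to both models, which matches the idea of rewarding features that are valuable across use cases.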

  • Detailed Cataloging

Add over 40 metadata attributes, such as Description, Data Sources, and Data Sensitivity Categories. Metadata tags can also be added to further improve feature discoverability and exploration. The complete list of attributes is in the H2O AI Feature Store documentation.
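
As a rough picture of how cataloged metadata supports discovery, the Python sketch below stores a few of the attributes named above and searches them by text and tag. The classes FeatureMetadata and ToyRegistry, their field names, and the sample entries are hypothetical and do not reflect the actual attribute schema.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureMetadata:
    # A small, assumed subset of catalog attributes for illustration.
    name: str
    description: str = ""
    data_sources: list = field(default_factory=list)
    sensitivity: str = "unclassified"
    tags: set = field(default_factory=set)


class ToyRegistry:
    """Hypothetical in-memory metadata registry with simple search."""

    def __init__(self):
        self._entries = {}

    def register(self, meta):
        self._entries[meta.name] = meta

    def search(self, text="", tag=None):
        """Names whose name or description contains `text` and carry `tag`."""
        text = text.lower()
        return [
            m.name
            for m in self._entries.values()
            if text in (m.name + " " + m.description).lower()
            and (tag is None or tag in m.tags)
        ]


registry = ToyRegistry()
registry.register(FeatureMetadata(
    "debt_to_income", "Ratio of total debt to income",
    ["loans_db"], "confidential", {"credit", "ratio"},
))
registry.register(FeatureMetadata(
    "sales_sum_30d", "Sum of sales over the last 30 days",
    ["orders_db"], tags={"sales"},
))

credit_features = registry.search(tag="credit")
sales_features = registry.search("sales")
```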

Get started

To learn more about the feature store, check out the H2O AI Feature Store page and sign up for early access.

About the Author

Vinod Iyengar

Vinod Iyengar is the Vice President of Product at H2O.ai. He leads a team charged with product management and product development across the H2O.ai platform.

Vinod has worked for H2O.ai since 2015. In his time with the company, he has worked as the VP of marketing & technical alliances, and VP of customer success & product. Vinod received his bachelor’s degree in engineering from the University of Mumbai and his master’s degree in quantitative analysis from the University of Cincinnati College of Business.
