H2O.ai and AT&T co-created the H2O AI Feature Store to store, update, and share the features data scientists, developers, and engineers need to build AI models. Organizations spend large amounts of time exploring and transforming raw data to create predictive features. Unfortunately, these highly valuable and often costly features are typically only available to the data scientists who created them. H2O AI Feature Store makes it easy for organizations to organize, govern, share, and operationalize these valuable features. With H2O AI Feature Store, organizations can increase their pace of innovation and deliver impactful AI outcomes faster.
1. Data science and engineering teams engineer features using their tools of choice.
2. Popular feature engineering pipelines, such as Snowflake, Databricks, H2O Sparkling Water, Apache Spark, and more, have pre-built integrations with the H2O AI Feature Store. Additionally, any engineering pipeline can write features and associated metadata to the H2O AI Feature Store via the REST API.
3. When features are written to the H2O AI Feature Store, data scientists can specify over 40 metadata attributes, tags, and the set of features that need to be available for real-time applications. H2O AI Feature Store uses built-in AI to automatically recommend new features, identify bias, and create feature insights.
4. Data scientists can explore and search the feature store to find features to use in their models, helping them build more accurate and robust models faster. The online Feature Store is powered by Redis to enable sub-millisecond reads for inference. The offline Feature Store supports batch model training, analytical applications without real-time requirements, and exploratory data analysis.
Benefits and Capabilities
Automatic Feature Recommendations
Automatically improve the features in your feature store and generate new features with our state-of-the-art automated feature engineering. Data scientists can select the feature sets that they are looking to update and improve, and simply request feature recommendations. The H2O AI Feature Store will automatically recommend new features and feature updates that could improve AI model performance. Data scientists can review the proposed updated features and accept or discard them, retaining complete control. Users can set up feature recommendations to run automatically or on demand.
Automatic Feature Drift
Automatically checks both individual features and feature sets for drift over time and alerts users. Alerts can be used to trigger retraining or refitting to keep models accurate.
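As an illustration of what a drift check on a single numeric feature can look like, here is a minimal sketch using the Population Stability Index (PSI). This is not the H2O AI Feature Store's own drift algorithm, which is not described here. The function names `psi` and `drift_alert`, the bin count, and the 0.2 alert threshold (a commonly cited rule of thumb) are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def psi(baseline: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a current sample.

    Bins are equal-width and derived from the baseline range; current values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def proportions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        eps = 1e-6  # avoid log(0) and division by zero for empty bins
        return [max(c / len(values), eps) for c in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


def drift_alert(psi_value: float, threshold: float = 0.2) -> bool:
    """Hypothetical alert rule: flag drift when PSI exceeds the threshold."""
    return psi_value > threshold


# Example: a feature whose values shift upward by 3 units.
baseline = [i / 100 for i in range(1000)]
shifted = [v + 3 for v in baseline]
if drift_alert(psi(baseline, shifted)):
    print("Drift detected: consider retraining or refitting the model.")
```

A check like this could run on a schedule for each feature, with the alert used as the trigger for retraining.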
Automatic Bias Identification
Automatically detect bias in your features. Data scientists can simply select the set of features they’d like to analyze for bias, and the H2O AI Feature Store will analyze and report if bias was detected. This capability helps data scientists monitor features on an ongoing basis to continually remove bias. With our automatic bias identification feature, data scientists have complete control to review and take action on features that may create bias.
Automatically scores features to indicate popularity or value across different use cases.
Detailed Cataloging and Search
Add over 40 metadata attributes, such as Description, Data Sources, and Data Sensitivity Categories. Additionally, metadata tags can be added to further improve feature discoverability and exploration. The complete list of attributes is located in our H2O AI Feature Store documentation.
Use natural language search to find the best features using simple or advanced queries.
Automated Feature Metadata
Explore and find features based on automatically generated feature statistics such as mean, standard deviation, and frequencies.
Integration with AI Tools
Integration with Popular Feature Engineering Pipelines
Integrated with Snowflake, Databricks, H2O Sparkling Water, Apache Spark, Python, Java, and Scala feature engineering pipelines. Other data engineering pipelines can write features and metadata attributes to the H2O AI Feature Store via the REST API.
Integration with H2O AI Cloud
Integrated within H2O.ai’s state-of-the-art cloud platform, H2O AI Cloud. H2O AI Cloud supports data exploration, automated feature engineering, model building with H2O3 and H2O Driverless AI, model monitoring and operations, and a broad set of pre-built AI applications.
High Performance and Scale
Real-time Model Scoring
Features that are needed to support real-time applications are stored in-memory and served with sub-millisecond latency.
Scale to handle nearly any number of real-time or batch reads and writes. Kubernetes is used to make scaling easier for customers to manage.
Access Management and Governance
Single Sign On
Integrate with existing identity and access management tools.
Grant role-based and individual permissions to specific projects and features.
Store versions of features and their associated metadata to comply with regulations, backtest models, and learn from past features.
ML Time Travel
Automatically log changes to all feature values, allowing data scientists and ML governance organizations to understand how past models would have scored records at any given time.
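The idea behind this kind of time travel can be sketched as an append-only change log with an "as of" lookup. The sketch below is an assumption-laden illustration, not the product's storage design: the class `FeatureChangeLog`, its methods, and the use of plain numeric timestamps are hypothetical names and choices made for the example.

```python
import bisect
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (entity_id, feature_name)


class FeatureChangeLog:
    """Keeps every recorded value of a feature, ordered by timestamp."""

    def __init__(self) -> None:
        self._timestamps: Dict[Key, List[float]] = defaultdict(list)
        self._values: Dict[Key, List[Any]] = defaultdict(list)

    def record(self, entity_id: str, feature: str, timestamp: float, value: Any) -> None:
        """Log a new value; earlier values are never overwritten."""
        key = (entity_id, feature)
        idx = bisect.bisect_right(self._timestamps[key], timestamp)
        self._timestamps[key].insert(idx, timestamp)
        self._values[key].insert(idx, value)

    def as_of(self, entity_id: str, feature: str, timestamp: float) -> Optional[Any]:
        """Return the value that was current at `timestamp`, or None if none existed yet."""
        key = (entity_id, feature)
        idx = bisect.bisect_right(self._timestamps.get(key, []), timestamp)
        if idx == 0:
            return None
        return self._values[key][idx - 1]


# Example: reconstruct what a model would have seen at time 150.
log = FeatureChangeLog()
log.record("cust-1", "avg_spend", 100, 10.0)
log.record("cust-1", "avg_spend", 200, 12.5)
value_then = log.as_of("cust-1", "avg_spend", 150)
```

Looking up every input feature as of the same timestamp gives the feature vector a past model would have scored, which is what makes backtesting and audit questions answerable.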
New Insights from Features
Automated Feature Insights
Automatically compute feature statistics such as mean, median, standard deviation, and frequencies for quick insights on features.
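For readers who want a concrete picture of these statistics, the following sketch computes them with the Python standard library. It only illustrates the statistics named above; the helper names `numeric_summary` and `frequencies` are hypothetical and do not correspond to any H2O AI Feature Store API.

```python
import statistics
from collections import Counter
from typing import Dict, Hashable, Sequence


def numeric_summary(values: Sequence[float]) -> Dict[str, float]:
    """Mean, median, and sample standard deviation of a numeric feature.

    Requires at least two values, because the sample standard deviation
    is undefined for a single point.
    """
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }


def frequencies(values: Sequence[Hashable]) -> Dict[Hashable, int]:
    """Count how often each distinct value of a categorical feature occurs."""
    return dict(Counter(values))


# Example usage on two small, made-up feature columns.
spend_summary = numeric_summary([1, 2, 3, 4, 5])
plan_counts = frequencies(["basic", "premium", "basic"])
```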
Featured Success Story
AT&T carries more than 465 petabytes of data traffic across their global network on an average day, and turning data into actionable intelligence as quickly as possible is vital to their success. AT&T, co-creating with H2O.ai, is putting the final touches on the H2O AI Feature Store internally to reliably and securely handle large-scale and real-time production workloads.
“We’ve built our internal AI-as-a-Service (AIaaS) platform leveraging H2O’s AI Cloud services, and we believe the co-invention and co-development of H2O AI Feature Store is going to be one of the most impactful elements in our platform. With the H2O AI Feature Store, we’re building AI solutions that are much faster, more accurate and robust in a fraction of the time.”
Prince Paulraj, AVP of Data Science