This course provides a complete walkthrough of the AI lifecycle using the H2O.ai platform.
Designed for data science and AI professionals, it explains how to build, deploy, and manage both predictive machine learning models and generative AI applications within a secure enterprise environment.
You will see how the platform connects each stage of the data science workflow. The course covers everything from raw data preparation and automated feature engineering to model deployment, real-time monitoring, and strict regulatory compliance.
You will also learn how to integrate modern AI agents and Large Language Models (LLMs) into standard enterprise workflows.
What you'll learn
Data Preparation & Feature Engineering
Learn how to automate data transformations, generate synthetic data, and manage scalable offline and online feature stores.
Predictive Model Development
Understand how to train, track, and optimize machine learning models while ensuring explainability, bias testing, and alignment with business ROI.
Enterprise MLOps & Monitoring
See how to deploy models, manage artifact registries, optimize compute resources, and monitor production models for real-time data drift.
Generative AI & LLM Management
Learn to test prompts, apply instruction tuning (DPO), set up security guardrails, and evaluate LLM responses using LLM-as-a-judge metrics.
RAG & Agentic Workflows
Understand how to build multimodal RAG pipelines and orchestrate AI agents to automate data science tasks and multi-step workflows.
AI Governance & Compliance
Learn how to enforce role-based access controls, maintain data traceability, and automatically generate model documentation and audit trails.
Course Playlist on YouTube
Learn how enterprise teams move from data profiling and feature engineering through automated machine learning, explainability, and production deployment — all within a single, coordinated platform.
Rather than stitching together disconnected tools, H2O.ai is designed to support the full journey from experimentation to operations with shared governance, security controls, and lifecycle management built in. The course also covers how traditional predictive modeling connects with modern generative AI capabilities and agent-driven workflows.
What You Will Learn:
✦ Data Preparation & Feature Engineering: Automated profiling, transformations, and feature pipelines from raw data to model-ready inputs.
✦ Automated Machine Learning: Model training, explainability, and bias testing with H2O Driverless AI.
✦ MLOps & Production Deployment: Model registration, real-time serving, drift monitoring, and lifecycle management.
✦ Generative AI & Agentic Workflows: Connecting predictive models with LLMs and autonomous agents via Enterprise h2oGPTe.
✦ Enterprise Governance: Unified security controls, audit logging, and compliance across the entire AI lifecycle.
🔗 H2O.ai Platform Overview: https://h2o.ai/
🎓 H2O.ai University: https://h2o.ai/university/
📚 H2O.ai Documentation: https://docs.h2o.ai/
How H2O.ai automates data wrangling, profiling, and synthetic data generation within the ML training pipeline.
Before modeling begins, raw data must be cleaned, profiled, and transformed consistently. H2O Driverless AI handles this automatically—detecting missing values, analyzing distributions, and embedding all preprocessing logic directly into the scoring pipeline. This eliminates train-serve skew by ensuring identical transformations at training and inference time. Synthetic data can also be generated via natural language prompts using Enterprise h2oGPTe agents.
↪ Technical Capabilities & Resources
➤ Advanced Dataframe Processing: Multi-threaded, in-memory dataset processing using H2O-3.
🔗 https://docs.h2o.ai/h2o/latest-stable/h2o-py/docs/frame.html
➤ Automated Data Transformations: Categorical encoding, imputation, and feature transforms embedded into the scoring pipeline.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/transformations.html
➤ Data Augmentation: Automated lag features, temporal splits, and test set augmentation for time series.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/time-series.html#test-set-augmentation
➤ Synthetic Data Generation: Create structured datasets via natural language prompts using h2oGPTe agents.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/agents
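The core idea behind embedding preprocessing into the scoring pipeline can be sketched in plain Python. This is a concept illustration, not the Driverless AI API: parameters learned at training time (imputation values, encodings) travel with the pipeline, so scoring applies identical transformations.

```python
# Illustrative sketch (plain Python, not the actual Driverless AI API):
# transform parameters learned at training time are stored on the pipeline
# object, so inference replays the exact same logic. No train-serve skew.

class PreprocessingPipeline:
    """Learns transform parameters once, then replays them at scoring time."""

    def fit(self, rows, numeric_col, categorical_col):
        # Learn the mean for imputation from training data only.
        values = [r[numeric_col] for r in rows if r[numeric_col] is not None]
        self.impute_value = sum(values) / len(values)
        # Learn a stable category -> integer encoding.
        categories = sorted({r[categorical_col] for r in rows})
        self.encoding = {c: i for i, c in enumerate(categories)}
        self.numeric_col, self.categorical_col = numeric_col, categorical_col
        return self

    def transform(self, row):
        # Applied identically at training and inference time.
        x = row[self.numeric_col]
        return {
            self.numeric_col: self.impute_value if x is None else x,
            self.categorical_col: self.encoding.get(row[self.categorical_col], -1),
        }

train = [{"age": 30, "plan": "pro"}, {"age": None, "plan": "free"},
         {"age": 50, "plan": "pro"}]
pipe = PreprocessingPipeline().fit(train, "age", "plan")
print(pipe.transform({"age": None, "plan": "free"}))  # age imputed with training mean 40.0
```

In the real platform this logic is captured automatically in the exported MOJO scoring pipeline rather than written by hand.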
In enterprise AI, it’s critical to know where your data comes from, how it was transformed, and who has access to it. In this short lesson, we walk through how the H2O.ai platform supports data lineage, automated dataset profiling, and secure handling of sensitive data across the machine learning lifecycle.
You’ll see how every experiment automatically captures complete lineage metadata, including the dataset version used, feature engineering steps applied, model configuration, and the experiments that produced the model. This allows teams to trace predictions from raw data all the way to model output, which is essential for debugging, governance, and regulatory audits.
Within H2O Feature Store, feature sets maintain their full transformation history, making it possible to trace any feature back to its source datasets and derived logic.
Documentation:
https://docs.h2o.ai/featurestore/api/feature_set_api
The platform also helps identify data quality issues early. When datasets are ingested into Driverless AI, AutoViz automatically performs profiling such as missing value detection, distribution analysis, outlier visualization, correlation checks, and target imbalance identification. Driverless AI can also detect potential data leakage during experiment setup, helping teams avoid training on problematic datasets.
Documentation:
https://docs.h2o.ai/h2o-driverless-ai-tutorials/tutorials/core/tutorial-1a/task-4
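The kinds of checks AutoViz automates can be illustrated with a minimal profiler in plain Python; the column names and rows below are made-up example data, and this is not the AutoViz implementation.

```python
# Toy profiling pass illustrating the checks AutoViz automates: missing
# value counts, distribution summaries, and target imbalance detection.
from collections import Counter

def profile(rows, target_col):
    report = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        entry = {"missing": len(values) - len(present)}
        if all(isinstance(v, (int, float)) for v in present):
            entry.update(min=min(present), max=max(present),
                         mean=sum(present) / len(present))
        report[col] = entry
    # Target imbalance: share of rows belonging to the majority class.
    counts = Counter(r[target_col] for r in rows)
    report["target_majority_share"] = max(counts.values()) / len(rows)
    return report

rows = [{"amount": 10.0, "churn": 0}, {"amount": None, "churn": 0},
        {"amount": 30.0, "churn": 1}, {"amount": 20.0, "churn": 0}]
r = profile(rows, "churn")
print(r["amount"]["missing"], r["target_majority_share"])  # 1 0.75
```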
For sensitive data, the platform uses a defense-in-depth approach. Role-based access control ensures users only see data they are authorized to access, while workspace isolation and granular Feature Store permissions control access to specific datasets and features. Deployments can also support isolated VPC environments and air-gapped on-premise installations for highly regulated environments.
Feature Store permissions:
https://docs.h2o.ai/featurestore/api/permissions
For text and document workflows, H2O LLM DataStudio and Enterprise h2oGPTe provide options for PII detection, anonymization, and sanitization of sensitive information during dataset preparation and document ingestion.
Documentation:
https://docs.h2o.ai/h2o-llm-data-studio/tutorials/prepare/data-preparation/configuration#data-anonymization
These capabilities help data science teams move faster while maintaining governance, traceability, and security across the AI lifecycle.
How the H2O.ai Feature Store manages, versions, and serves ML features consistently across training and production.
Rebuilding features from scratch across projects wastes time and introduces inconsistency. The H2O Feature Store provides a self-contained system with offline and online engines handling feature registration, metadata, versioning, and low-latency serving. Teams can discover reusable features—like behavioral metrics or sentiment scores—through a searchable catalog, and synchronize those features between training and inference environments to eliminate skew.
↪ Technical Capabilities & Resources
➤ Native Feature Store (Offline & Online Engines): Full feature lifecycle management from creation to real-time serving.
🔗 https://docs.h2o.ai/featurestore/
➤ Feature Metadata & Versioning: Track transformation history, version features, and support controlled rollback.
🔗 https://docs.h2o.ai/featurestore/concepts#storage
➤ Driverless AI Integration (MOJO Pipelines): Automate feature generation and export pipelines directly into the Feature Store.
🔗 https://docs.h2o.ai/featurestore/examples/example_dai_mojo
➤ External Data Source Ingestion: Ingest features from external systems and proprietary data sources.
🔗 https://docs.h2o.ai/featurestore/supported_data_sources
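The versioning model described above can be pictured with a small sketch. This is a concept illustration only, not the Feature Store client API: a feature set keeps every version of its definition immutable, so consumers can pin a version or roll back.

```python
# Concept sketch (not the Feature Store client API): feature set versions
# are append-only, so older definitions remain retrievable for rollback.

class FeatureSet:
    def __init__(self, name):
        self.name = name
        self.versions = []          # immutable history, index 0 = v1

    def register(self, definition):
        self.versions.append(definition)
        return len(self.versions)   # new version number

    def get(self, version=None):
        # Default to the latest version; older ones stay retrievable.
        v = version or len(self.versions)
        return self.versions[v - 1]

fs = FeatureSet("customer_behavior")
fs.register({"avg_basket": "mean(order_total) over 30d"})
fs.register({"avg_basket": "mean(order_total) over 90d"})
print(fs.get(version=1))  # rollback read: the original 30d definition
```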
How Driverless AI automates feature engineering and promotes high-value features into a reusable enterprise Feature Store.
Manual feature engineering is time-consuming and difficult to reproduce at scale. Driverless AI automatically generates hundreds of candidate features—including interaction terms, polynomial features, time-based aggregations, and categorical encodings—then evaluates each one's predictive value, keeping only those that improve model performance. High-value features can then be promoted to the H2O Feature Store as versioned, reusable assets shared across teams.
Technical Capabilities & Resources
➤ Automated Feature Engineering & Selection: Generates and evaluates candidate features, retaining only those with measurable predictive impact.
🔗 https://docs.h2o.ai/featurestore/supported_derived_transformation
➤ Searchable Feature Catalog: Register, tag, and discover features and feature sets across projects.
🔗 https://docs.h2o.ai/featurestore/concepts#features
➤ Feature Transformation Pipelines: Consistent feature transformations orchestrated across offline training and online inference.
🔗 https://docs.h2o.ai/featurestore/get-started/architecture#feature-store-online-engine
➤ Feature Versioning & Rollback: Evolve feature definitions while preserving prior versions for full reproducibility.
🔗 https://docs.h2o.ai/featurestore/api/feature_set_new_version
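The generate-then-select loop described above can be demonstrated with a toy version in plain Python, assuming a simple |correlation|-with-target score in place of Driverless AI's actual model-based evaluation: candidate interaction features are created, scored, and kept only if they beat the best raw feature.

```python
# Toy generate-then-select feature engineering: build candidate interaction
# features, score each by |correlation| with the target, and keep only the
# candidates that add signal beyond the best raw feature.
import itertools, math

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_features(features, target):
    baseline = max(abs(corr(v, target)) for v in features.values())
    kept = {}
    for (na, va), (nb, vb) in itertools.combinations(features.items(), 2):
        candidate = [a * b for a, b in zip(va, vb)]
        score = abs(corr(candidate, target))
        if score > baseline:          # retain only features with added signal
            kept[f"{na}_x_{nb}"] = score
    return kept

features = {"x1": [1, 2, 3, 4], "x2": [4, 3, 2, 1]}
target = [4, 6, 6, 4]  # driven by the interaction x1*x2, not x1 or x2 alone
print(select_features(features, target))  # only x1_x_x2 survives selection
```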
How Driverless AI automates model explainability, fairness testing, and responsible AI documentation for regulated industries.
In regulated industries, a performant model alone is insufficient—teams must also explain and audit it. Driverless AI automatically generates SHAP values, K-LIME, and ICE plots to provide both global and individual-level transparency. Disparate Impact Analysis (DIA) enables fairness testing across demographic groups. All findings are compiled into AutoDoc reports automatically, and generative AI agents can translate complex explainability metrics into plain business language.
Technical Capabilities & Resources
➤ Automated Explainability (SHAP, K-LIME, ICE): Auto-generates interpretability visualizations for global and local model behavior.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/interpreting.html
➤ Disparate Impact Analysis (DIA): Compares aggregate outcomes across privileged and underprivileged demographic groups.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/interpret-understanding.html#dai-dia
➤ Eval Studio Fairness Evaluators: Systematic fairness metrics for LLM and predictive model assessments.
🔗 https://docs.h2o.ai/eval-studio-docs/get-started/core-features
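Disparate impact in its simplest form is the ratio of favorable-outcome rates between an unprivileged and a privileged group; the common "four-fifths rule" flags ratios below 0.8. The sketch below is a toy calculation with invented data, not the DIA implementation.

```python
# Minimal disparate impact ratio: favorable-outcome rate of the
# unprivileged group divided by that of the privileged group.

def disparate_impact(outcomes, groups, unprivileged, privileged):
    def rate(g):
        selected = [o for o, grp in zip(outcomes, groups) if grp == g]
        return sum(selected) / len(selected)
    return rate(unprivileged) / rate(privileged)

outcomes = [1, 0, 1, 1, 0, 1, 0, 0]        # 1 = favorable decision
groups   = ["a", "a", "a", "a", "b", "b", "b", "b"]
ratio = disparate_impact(outcomes, groups, "b", "a")
print(round(ratio, 3), "flag" if ratio < 0.8 else "ok")  # 0.333 flag
```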
How to register, deploy, A/B test, and monitor ML models in real time using H2O MLOps.
Taking a model from training to production requires version control, safe rollout strategies, and continuous observability. H2O MLOps handles model registration from Driverless AI, capturing all metadata, training metrics, and artifact history. From there, teams can configure REST endpoints or batch scoring jobs, run Champion/Challenger and A/B tests to de-risk updates, and deploy to external platforms like Snowflake or AWS SageMaker via eScorer—all with built-in monitoring from the moment a deployment goes live.
Technical Capabilities & Resources
➤ Deployment Templates & Scoring Runtimes: Centralized registration and repeatable deployments using Java, C++, Python, MOJO, and MLflow pipelines.
🔗 https://docs.h2o.ai/mlops/v0.65.1/deployments/scoring-runtimes
➤ Real-Time & Batch Deployments: Configure REST endpoints, batch scoring, and A/B testing from a single interface.
🔗 https://docs.h2o.ai/mlops/model-deployments/understand-deployments
➤ External Platform Deployment (eScorer): Deploy models to Snowflake and AWS SageMaker via H2O eScorer.
🔗 https://docs.h2o.ai/h2o-escorer/user-guide/aws-sagemaker-deployment
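The Champion/Challenger idea can be sketched as a traffic router. This is an illustrative pattern, not H2O MLOps internals: a deterministic hash of the request key sends a fixed fraction of traffic to the challenger, so any one caller always sees the same model.

```python
# Sketch of champion/challenger traffic splitting: a stable hash bucket
# per request id routes ~10% of callers to the challenger deployment.
import hashlib

def route(request_id, challenger_share=0.1):
    # Stable bucket in [0, 1) derived from the request/customer id.
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "challenger" if bucket < challenger_share else "champion"

counts = {"champion": 0, "challenger": 0}
for i in range(10_000):
    counts[route(f"customer-{i}")] += 1
print(counts)  # roughly a 90/10 split, deterministic per id
```

Deterministic routing matters for A/B integrity: a customer never flips between models mid-experiment.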
How H2O MLOps validates, containerizes, autoscales, and logs ML model deployments for enterprise production use.
Before any model goes live, H2O MLOps runs automated readiness checks to verify artifact completeness and runtime compatibility. Each deployment runs as an isolated containerized service, handling variable workloads through configurable replicas and vertical autoscaling—whether processing millions of batch records or serving real-time predictions. Full request and response logging feeds into integrated Apache Superset dashboards, enabling teams to build custom monitoring visualizations.
Technical Capabilities & Resources
➤ Model Readiness Validation: Pre-deployment checks verify artifact completeness and runtime compatibility automatically.
🔗 https://docs.h2o.ai/mlops/models/understand-models
➤ Batch Scoring Execution: Schedule and configure large-scale batch predictions with concurrency management.
🔗 https://docs.h2o.ai/mlops/batch-scoring
➤ Containerized Deployments & Autoscaling: Isolated container services with dynamic replica management and vertical pod autoscaling.
🔗 https://docs.h2o.ai/mlops/model-deployments/vertical-pod-autoscaler
➤ Request Logging & Superset Dashboards: Audit logs and custom analytics dashboards via integrated Apache Superset.
🔗 https://docs.h2o.ai/mlops/model-monitoring
How H2O MLOps centralizes model governance with a version-controlled registry supporting both native and third-party models.
Managing a growing portfolio of production models requires a structured, searchable registry with full version history. H2O MLOps provides exactly that—capturing training metrics, validation scores, feature importance, and metadata tags for every registered model. Importantly, the platform is not restricted to H2O-native models: teams can import MLflow models complete with package dependencies, enabling unified deployment, monitoring, and governance across all ML assets from one platform.
Technical Capabilities & Resources
➤ Internal Model Repository: Register Driverless AI models with complete version history, scoring artifacts, and custom taxonomy tags.
🔗 https://docs.h2o.ai/mlops/models/understand-models
➤ Third-Party & MLflow Integration: Import and manage MLflow and external framework models alongside native H2O models.
🔗 https://docs.h2o.ai/mlops/models/mlflow-model-support
➤ Supported Third-Party Models: Review the full list of supported external model frameworks.
🔗 https://docs.h2o.ai/mlops/models/mlflow-model-support#supported-third-party-models
How H2O MLOps links registered models to their experiments, datasets, artifacts, and managed scoring runtimes.
Models should never exist in isolation from their origin. H2O MLOps automatically links every registered model back to its exact Driverless AI experiment—including training configurations, comparison data, AutoDoc reports, feature analysis, and MOJO scoring pipelines. System administrators can pre-configure containerized scoring runtimes tailored to standard, GPU-enabled, or regulated environments, allowing data scientists to deploy securely without requiring infrastructure expertise.
Technical Capabilities & Resources
➤ Linked Model Metrics & Artifacts: Auto-link evaluation metrics, AutoDoc reports, and scoring pipelines to registered models for complete lineage.
🔗 https://docs.h2o.ai/mlops/models/understand-models
➤ Experiment Management via API: Programmatically query and manage experiments linked to registered models.
🔗 https://docs.h2o.ai/mlops/py-client/examples/manage-experiments
➤ Model Import & Export (MOJO & Python Pipelines): Import external models or export Driverless AI MOJO and Python scoring pipelines.
🔗 https://docs.h2o.ai/mlops/models/mlops-model-support#h2o-driverless-ai-mojo-pipeline--python-scoring-pipeline
➤ Managed Container Runtimes: Admin-configured runtimes for specific workloads, GPU requirements, and regulatory environments.
🔗 https://docs.h2o.ai/mlops/model-deployments/scoring-runtimes
How H2O Driverless AI and MLOps enable parallel experiment execution, real-time comparison, and full reproducibility.
Before any model reaches production, data science teams run and compare many experiments. Driverless AI supports parallel experiment execution with automated resource management, real-time leaderboard monitoring, and side-by-side metric comparisons. Every experiment automatically syncs its full configuration—datasets, targets, and parameters—to H2O MLOps workspaces. Teams can interact through the UI or query programmatically via the Python client to integrate tracking into CI/CD pipelines.
Technical Capabilities & Resources
➤ Parallel Experiment Execution: Run concurrent ML experiments with automated resource allocation and queue management.
🔗 https://docs.h2o.ai/mlops/py-client/examples/manage-experiments
➤ Real-Time Monitoring & Visualization: Live leaderboards, cross-validation tracking, and side-by-side experiment comparison.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/autoviz.html
➤ Complete Lineage & Reproducibility: Auto-sync experiment configurations, datasets, and metrics to H2O MLOps.
🔗 https://docs.h2o.ai/mlops/py-client/overview
➤ API-Based Tracking: Programmatically extract metrics and integrate experiment tracking into existing workflows.
🔗 https://docs.h2o.ai/mlops/py-client/examples/handle-artifacts
How H2O MLOps and Apache Superset track model performance and calculate data drift in production deployments.
Models degrade over time as real-world data distributions shift. H2O MLOps allows teams to configure baseline metrics and select specific columns for drift monitoring the moment a deployment begins scoring. For deeper analysis, integrated Apache Superset provides a visualization layer where teams can build custom dashboards using stored statistical aggregates—bin edges, counts, and sums—enabling granular drift calculations across both continuous and categorical features over time.
Technical Capabilities & Resources
➤ Real-Time Model Monitoring: Configure baseline metrics and track live predictions directly within the H2O MLOps interface.
🔗 https://docs.h2o.ai/mlops/model-monitoring
➤ Advanced Drift Detection & Custom Dashboards: Calculate feature distribution drift and build analytics dashboards using integrated Apache Superset.
🔗 https://docs.h2o.ai/mlops/model-monitoring
➤ Granular Metrics Capture: Monitor statistical aggregates including bin counts, edges, and categorical feature statistics for precise drift analysis.
🔗 https://docs.h2o.ai/mlops/model-monitoring
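One standard way to turn stored bin counts into a drift score is the Population Stability Index (PSI); values above roughly 0.2 are a common rule of thumb for significant drift. The sketch below works directly from histogram aggregates like those described above and is illustrative, not the platform's drift calculation.

```python
# Population Stability Index computed from stored histogram bin counts,
# the same kind of aggregate (bin edges, counts) the monitoring layer keeps.
import math

def psi(baseline_counts, live_counts, eps=1e-6):
    b_total, l_total = sum(baseline_counts), sum(live_counts)
    total = 0.0
    for b, l in zip(baseline_counts, live_counts):
        p = max(b / b_total, eps)   # baseline share of this bin
        q = max(l / l_total, eps)   # live share of this bin
        total += (q - p) * math.log(q / p)
    return total

baseline = [100, 300, 400, 200]     # training-time histogram of one feature
stable   = [55, 145, 210, 90]       # similar shape: low PSI
shifted  = [10, 90, 300, 600]       # mass moved right: high PSI
print(round(psi(baseline, stable), 4), round(psi(baseline, shifted), 4))
```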
How Enterprise h2oGPTe manages prompt templates, version control, and multilingual AI agent deployment at scale.
Bridging predictive models and end users requires well-engineered, maintainable prompts. h2oGPTe provides a centralized prompt library where teams can create, clone, version, and share templates across the organization. The H2O Super Agent connects natural language prompts directly to predictive scoring APIs—enabling real-world actions like addressing customer churn. Multilingual template support and UI localization allow consistent AI behavior to be deployed across global markets.
Technical Capabilities & Resources
➤ Prompt Templates & Libraries: Create, clone, and share prompt templates from a managed organizational catalog.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/prompts
➤ Prompt Version Control & Iteration: Define system behaviors, iterate on prompt designs, and manage template settings.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/prompts#create-a-prompt-template
➤ Template Sharing Across Teams: Distribute prompt templates for consistent AI behavior organization-wide.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/prompts#share-a-prompt-template
➤ Custom Multilingual Prompts: Configure language-specific templates for consistent, localized global AI deployment.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/prompts#create-a-prompt-template-for-a-specific-language
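The catalog workflow above can be pictured with a minimal sketch. This is a concept illustration, not the h2oGPTe prompt API, and the template text is invented: templates are cloned into new versions rather than mutated, so teams sharing a template never see its definition change underneath them.

```python
# Minimal versioned prompt-template catalog: create, clone, and render.

class PromptCatalog:
    def __init__(self):
        self._store = {}            # (name, version) -> template text

    def create(self, name, text):
        self._store[(name, 1)] = text
        return 1

    def clone(self, name, new_text):
        # New versions are appended; existing versions stay frozen.
        version = max(v for (n, v) in self._store if n == name) + 1
        self._store[(name, version)] = new_text
        return version

    def render(self, name, version, **vars):
        return self._store[(name, version)].format(**vars)

catalog = PromptCatalog()
catalog.create("support", "Answer politely: {question}")
v2 = catalog.clone("support", "Answer politely in {language}: {question}")
print(catalog.render("support", v2, language="French", question="refund policy?"))
```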
How to fine-tune domain-specific LLMs for tasks like text-to-SQL and multimodal QA using H2O Enterprise LLM Studio.
When prompt engineering alone is insufficient, fine-tuning a domain-specific model can reduce costs while improving accuracy. H2O Enterprise LLM Studio walks through the full instruction tuning process—leveraging LoRA adapters, built-in AutoML for hyperparameter optimization, and real-time training metrics like loss curves and validation perplexity. Models are evaluated for safety and quality, then exported directly to Hugging Face for distribution across the organization.
Technical Capabilities & Resources
➤ Multimodal Generative AI Tuning: Train models for domain-specific tasks including multi-modal causal language modeling and image/text classification.
🔗 https://docs.h2o.ai/h2o-enterprise-llm-studio/get-started/what-is-h2o-enterprise-llm-studio#use-cases
➤ Instruction Tuning & DPO Alignment: Fine-tune base models using labeled data, automated hyperparameter search, and preference optimization.
🔗 https://docs.h2o.ai/h2o-llmstudio/guide/experiments/supported-problem-types#dpo-modeling
➤ Augmentation for Fine-Tuning Datasets: Use LLM DataStudio to augment and prepare training data for downstream instruction tuning.
🔗 https://docs.h2o.ai/h2o-llm-data-studio/guide/augment/augmentation-datasets
How Enterprise h2oGPTe protects LLM applications from toxic content, PII leaks, and adversarial jailbreak attempts.
Even high-performing generative AI models require safeguards. h2oGPTe enforces multi-stage guardrails at the collection level—monitoring content during ingestion, at prompt submission, and before final response generation. Built-in toxic topic classifications and configurable custom guardrails keep AI strictly on-topic. PII detection uses a defense-in-depth approach combining regex, Presidio, and a fine-tuned ModernBERT model, while PromptGuard actively blocks adversarial jailbreak patterns and logs every violation.
Technical Capabilities & Resources
➤ Toxic Content & Custom Topic Filtering: Block harmful content and restrict AI to approved business topics using configurable guardrails.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/collections/create-a-collection#guardrails-and-pii-detection
➤ PII Detection & Redaction: Identify and redact sensitive data across prompts and responses using Regex, Presidio, and ModernBERT.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/collections/pii-sanitization#pii-detection-methods
➤ Adversarial Jailbreak Protection: PromptGuard detects and neutralizes adversarial prompt patterns before they reach the model.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/changelog/tags/v-1-5#guardrails
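The first layer of the defense-in-depth PII approach, regex matching, is simple enough to sketch directly. Real deployments layer Presidio and an ML model on top, as described above; the patterns below cover only easily structured identifiers and are illustrative.

```python
# Regex stage of a defense-in-depth PII filter: redact structured
# identifiers (email addresses, US SSN-style numbers) before they reach
# the model or appear in a response.
import re

PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text):
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
# Contact [EMAIL], SSN [SSN].
```

Regex alone misses unstructured PII (names, addresses), which is exactly why the platform adds statistical and model-based detectors on top.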
How h2oGPTe grounds LLMs in enterprise data using multimodal RAG, hybrid search, and autonomous agent workflows.
LLMs don't inherently know your internal products, policies, or customer history. h2oGPTe bridges this gap by ingesting 50+ file formats—including documents, audio, and video—using multi-engine OCR and native enterprise connectors for SharePoint, S3, and Azure. Hybrid retrieval combines semantic similarity and BM25 with Reciprocal Rank Fusion and cross-encoder reranking for precise, citation-backed answers. Agentic workflows extend RAG further by enabling autonomous retrieval, reasoning, and tool execution.
Technical Capabilities & Resources
➤ Document Ingestion & Transformation: 50+ format support with multi-engine OCR, table preservation, and enterprise connectors.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/collections/supported-file-types
➤ Built-in & External Vector Storage: Includes Vex embedded vector database plus integrations with external vector providers.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/architecture/vector-database
➤ Advanced Hybrid Search: Combines semantic similarity, BM25, Reciprocal Rank Fusion, and cross-encoder reranking.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/chats/chat-settings#generation-approach
➤ Agentic Workflows: Autonomous agents that iteratively retrieve, reason, and trigger tool execution for complex queries.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/agents
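Reciprocal Rank Fusion, the step that merges the semantic and BM25 result lists, is compact enough to show in full. Each ranker contributes 1/(k + rank) per document, so documents ranked well by both retrievers rise to the top; k = 60 is the constant from the original RRF paper. The document names are invented.

```python
# Reciprocal Rank Fusion: merge multiple ranked lists into one, rewarding
# documents that appear near the top of several rankers.

def rrf(rankings, k=60):
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["doc_a", "doc_b", "doc_c"]   # dense-vector similarity order
bm25     = ["doc_c", "doc_a", "doc_d"]   # keyword relevance order
print(rrf([semantic, bm25]))             # doc_a first: strong in both lists
```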
How H2O Eval Studio provides systematic evaluation of LLMs and RAG applications across accuracy, fairness, and hallucination metrics.
Deploying AI with confidence requires more than intuition—it requires structured, repeatable testing. H2O Eval Studio assesses LLM and RAG performance across metrics including answer correctness, context precision, context recall, and hallucination detection. Teams can configure unit test cases, define custom evaluators using their own prompts (BYOP), and use interactive dashboards to compare multiple models side-by-side—identifying issues like bias, looping, or token presence failures before deployment.
Technical Capabilities & Resources
➤ Comprehensive Evaluation Metrics: Assess answer correctness, context precision, context recall, and hallucinations using standard benchmarks and LLM-as-a-judge.
🔗 https://docs.h2o.ai/eval-studio-docs/guide/evaluations/evaluators
➤ Custom Evaluators (BYOP): Define parameterizable, domain-specific evaluation logic using your own prompts.
🔗 https://docs.h2o.ai/eval-studio-docs/guide/evaluations/evaluators#parameterizable-byop-evaluator
➤ Interactive Dashboards & Leaderboards: Compare LLMs and RAG systems side-by-side to pinpoint performance gaps and bias.
🔗 https://docs.h2o.ai/eval-studio-docs/guide/evaluations/view-evaluation
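Two of the simpler metric families can be sketched in a few lines. These are toy stand-ins for the spirit of Eval Studio evaluators, not their implementations: token presence checks whether required terms appear in an answer, and a crude context recall measures what share of ground-truth statements are supported by the retrieved chunks.

```python
# Toy evaluators: token presence and string-match context recall.

def token_presence(answer, required_tokens):
    answer_lower = answer.lower()
    return all(t.lower() in answer_lower for t in required_tokens)

def context_recall(ground_truth_statements, retrieved_chunks):
    joined = " ".join(retrieved_chunks).lower()
    hits = sum(1 for s in ground_truth_statements if s.lower() in joined)
    return hits / len(ground_truth_statements)

chunks = ["Refunds are issued within 14 days.", "Shipping is free over $50."]
print(token_presence("Refunds arrive within 14 days.", ["refund", "14 days"]))  # True
print(context_recall(["refunds are issued within 14 days"], chunks))            # 1.0
```

Production evaluators replace the string matching here with LLM-as-a-judge scoring, but the pass/fail framing is the same.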
How H2O.ai orchestrates enterprise AI workloads on Kubernetes with managed resource profiles and cost guardrails.
As AI programs scale, managing compute resources across teams and use cases becomes operationally critical. H2O.ai runs all workloads—Driverless AI experiments, Feature Store operations, MLOps deployments, and h2oGPTe agent executions—as managed Kubernetes workloads. Administrators define specialized resource profiles allocating the right CPU, GPU, and memory per task. Cost guardrails enforce idle timeouts, maximum run durations, and dynamic cluster autoscaling, keeping infrastructure spend under control without requiring Kubernetes expertise from data scientists.
Technical Capabilities & Resources
➤ Workload Orchestration & Resource Profiles: Schedule ML workloads using admin-managed profiles that allocate CPU, GPU, and memory automatically.
🔗 https://docs.h2o.ai/ai-engine-manager/user-guide/dai-engine/create-dai-engine/#step-4-configure-resources
➤ Cost Optimization & Infrastructure Guardrails: Control compute costs with resource constraints, idle timeouts, and dynamic cluster autoscaling.
🔗 https://docs.h2o.ai/mlops/model-deployments/create-a-deployment#advanced-settings
➤ H2O Engine Management: View and manage engine configuration and last-used resource profile information.
🔗 https://docs.h2o.ai/ai-engine-manager/user-guide/h2o-engine/manage-h2o-engine/
How H2O.ai's Python SDKs, REST APIs, and MCP tools enable full programmatic control over the enterprise AI platform.
No-code interfaces are valuable, but serious AI developers need deep programmatic flexibility. H2O.ai exposes Python SDKs, REST APIs, and hosted JupyterLab environments for scripting and automating every platform component—from triggering Driverless AI experiments to managing MLOps deployments within CI/CD pipelines. OpenAPI Swagger UIs allow developers to explore endpoints and generate client code in Python, JavaScript, or Go. The Model Context Protocol (MCP) server enables h2oGPTe agents to connect directly to external systems like Salesforce, MongoDB, and GitHub.
Technical Capabilities & Resources
➤ Comprehensive SDKs & Libraries: Automate Driverless AI, MLOps, h2oGPTe, and Eval Studio via Python, JavaScript, and Go clients.
🔗 https://docs.h2o.ai/mlops/py-client/overview
➤ OpenAPI Specifications: Interactive Swagger UIs for exploring endpoints, testing calls, and generating client code.
🔗 https://h2ogpte.cloud-dev.h2o.dev/swagger-ui/
➤ Agent Extensibility via MCP: Connect generative AI agents to external tools and proprietary data systems using the h2oGPTe MCP server.
🔗 https://pypi.org/project/h2ogpte-mcp-server/
How H2O.ai's Data Science Agent automates EDA, model training, and SHAP explainability across the ML lifecycle.
The H2O Data Science Agent connects directly to enterprise data sources like S3 to autonomously perform data profiling and generate visual analytics—distribution plots, correlation heatmaps—then synthesizes findings into a business narrative tailored to the audience. Beyond exploration, the agent integrates with Driverless AI to configure and monitor AutoML experiments, and extracts SHAP values for transparent feature importance analysis. This tight coupling between generative AI and predictive modeling accelerates the full data science workflow.
Technical Capabilities & Resources
➤ Automated Exploratory Data Analysis: Autonomous data profiling, visual analytics generation, and business-context narrative synthesis.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/chats/chat-settings#agent-type
➤ Driverless AI Integration via Agent: Configure, trigger, and monitor AutoML experiments directly through agent tool integration.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/chats/chat-settings#agent-type
➤ Integrated SHAP Explainability: Agent extracts SHAP values from trained models to provide transparent feature importance insights.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/chats/chat-settings#agent-type
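Global feature importance is typically derived from per-row SHAP output as the mean absolute SHAP value per feature. The per-row values below are made-up numbers standing in for a model's real attributions, illustrating the aggregation step only.

```python
# Aggregate per-row SHAP values into a global importance ranking:
# mean absolute SHAP value per feature, sorted descending.

def global_importance(shap_rows):
    features = shap_rows[0].keys()
    n = len(shap_rows)
    importance = {f: sum(abs(row[f]) for row in shap_rows) / n
                  for f in features}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)

shap_rows = [
    {"tenure": -0.40, "monthly_fee": 0.10, "support_calls": 0.05},
    {"tenure": 0.35, "monthly_fee": -0.20, "support_calls": 0.02},
]
print(global_importance(shap_rows))  # tenure dominates the ranking
```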
How h2oGPTe enables no-code Super Agent customization and auto-generated multi-agent Python code using CrewAI and LangGraph.
Enterprises often need agents tailored to specific workflows beyond built-in defaults. h2oGPTe supports two approaches: the Super Agent can be rapidly customized using system prompts, collections, and tool configurations without any code. For advanced requirements, the Agent Builder generates fully executable Python source code by accepting a natural language workflow description, selecting the right framework (CrewAI or LangGraph), and running an internal build-test-refine loop. Agents also generate standardized A2A protocol files for cross-framework interoperability.
Technical Capabilities & Resources
➤ Super Agent Customization: Configure task-specific agents using custom system prompts, collections, and tool orchestration—no code required.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/agents#how-it-works
➤ Agent Builder & Code Generation: Generate production-ready Python code for custom agents using CrewAI, LangGraph, or the OpenAI SDK.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/agents
➤ Multi-Agent Ecosystems (A2A Protocol): Automatically generate A2A communication files for agent interoperability across different frameworks.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/agents/agent-builder/a2a-protocol
How the H2O Super Agent acts as a natural language IDE to automate, execute, and orchestrate enterprise AI workflows conversationally.
The H2O Super Agent goes beyond question-answering—it translates natural language intent directly into execution by calling APIs, writing and running code, and modifying configurations autonomously. Users select from specialized agent types optimized for different use cases. The agent provides full transparency into its reasoning, code generation, and tool outputs for developer debugging. Role-based access controls ensure business users see only finalized outputs, while maintaining human-in-the-loop oversight throughout.
Technical Capabilities & Resources
➤ Natural Language IDE & Workflow Automation: Build and execute complex AI workflows through conversational prompts with transparent step-by-step reasoning.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/get-started/h2ogpte-flow
➤ Tool Calling & Execution: Agents autonomously call external APIs, write code, and orchestrate integrated tools beyond text generation.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/agents/tool-calling
➤ Agent Builder Overview: Understand the broader agent architecture enabling autonomous workflow orchestration.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/agents/agent-builder/overview
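The tool-calling loop described above has a small, general skeleton: the model emits a structured tool request, the runtime executes the matching function, and the result is fed back until a final answer is produced. The "model" below is a canned stub, not a real LLM call or the h2oGPTe runtime.

```python
# Skeleton of an agent tool-calling loop with a stubbed model.

TOOLS = {
    "add": lambda a, b: a + b,
}

def fake_model(messages):
    # Stub model: request a tool once, then answer using its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "add", "args": {"a": 2, "b": 3}}
    result = [m for m in messages if m["role"] == "tool"][-1]["content"]
    return {"answer": f"The sum is {result}."}

def run_agent(question, model):
    messages = [{"role": "user", "content": question}]
    while True:
        reply = model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]](**reply["args"])  # execute the tool
        messages.append({"role": "tool", "content": result})

print(run_agent("What is 2 + 3?", fake_model))  # The sum is 5.
```

A production runtime adds the transparency and access controls described above around exactly this loop: logging each reasoning step and gating which tools a given role may invoke.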
How H2O.ai bridges visual no-code ML pipelines and code-first Python execution for diverse data science working styles.
AI teams contain visual thinkers, coders, and everyone in between. Driverless AI supports intuitive wizards and visual pipeline diagrams for feature engineering and model tuning. MLOps allows switching between UI-based row scoring and command-line batch execution. h2oGPTe agents generate sandbox-tested Python code that can be exported, modified, and used in automated testing—enabling teams to fluidly transition from a visual interface to a fully scriptable environment without losing any work.
Technical Capabilities & Resources
➤ Visual Pipeline Composition: Visualize feature engineering, model selection, and ensembling steps as interactive diagrams in Driverless AI.
🔗 https://docs.h2oai.com/driverless-ai/latest-stable/docs/userguide/scoring_pipeline_visualize.html
➤ No-Code to Code Conversion: Export UI workflows from Driverless AI into reproducible, executable Python scripts.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/examples/autoviz_client_example/autoviz_client_example.html
➤ Custom Code Integration: Incorporate custom functions and recipes directly into Driverless AI workflows for granular control.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/custom_recipes.html
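To make the no-code-to-code handoff concrete, here is a minimal sketch of the shape a custom transformer recipe takes. Real Driverless AI recipes subclass `CustomTransformer` from `h2oaicore.transformer_utils`, a package that only exists inside the DAI runtime, so a stub base class stands in for it here; the method names mirror the recipe interface but the property values are illustrative.

```python
import numpy as np

# Stand-in for h2oaicore.transformer_utils.CustomTransformer, which is only
# importable inside the Driverless AI runtime.
class CustomTransformer:
    pass

class LogPlusOneTransformer(CustomTransformer):
    """Illustrative recipe: engineer log1p(x) as a new numeric feature."""

    @staticmethod
    def get_default_properties():
        # The real recipe API declares accepted column types and counts.
        return dict(col_type="numeric", min_cols=1, max_cols=1)

    def fit_transform(self, X, y=None):
        # Stateless transform, so fitting reduces to transforming.
        return self.transform(X)

    def transform(self, X):
        return np.log1p(np.asarray(X, dtype=float))

t = LogPlusOneTransformer()
print(t.transform([0.0, 9.0]))  # log1p of 0 and 9
```

Because a recipe is just a Python class, the same file can be unit-tested in a local editor and then uploaded to Driverless AI, which is the fluid UI-to-script transition the section describes.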
How H2O.ai enforces RBAC, model constraints, audit logging, and GenAI guardrails across the enterprise AI lifecycle.
As AI scales across organizations, governance must be embedded into the platform architecture—not added as an afterthought. H2O.ai manages role-based access controls across workspaces so business users and ML engineers see only what they need to. Monotonicity constraints embed regulatory logic directly into model training behavior. VPC and air-gapped deployment options enforce data residency requirements, while comprehensive audit logging and automated GenAI guardrails maintain full operational transparency.
Technical Capabilities & Resources
➤ Role-Based Access Control (RBAC) & Workspaces: Manage workspace-level permissions for secure collaboration across MLOps and GenAI workflows.
🔗 https://docs.h2o.ai/enterprise-h2ogpte/guide/system-dashboard/roles-and-permissions
➤ Model Constraints & Metadata Tagging: Embed monotonicity constraints into models and tag assets by risk level and sensitivity.
🔗 https://docs.h2oai.com/driverless-ai/latest-stable/docs/userguide/monotonicity-constraints.html
➤ Audit Logging & Compliance: Capture all governance events, approvals, and permission changes to support regulatory examinations.
🔗 https://docs.h2o.ai/haic-documentation/security-guarantees-model#audit-logging
➤ Model Monitoring & Alerts: Customizable alerts for performance degradation to maintain ongoing compliance.
🔗 https://docs.h2o.ai/mlops/model-monitoring
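One standard statistic behind drift alerts of the kind listed above is the Population Stability Index (PSI), which compares a live score distribution against the training baseline. The sketch below is a self-contained PSI implementation for illustration, not H2O MLOps code; the 0.2 alert threshold is a common industry convention, not a platform default.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a training (expected) and a
    live (actual) distribution; > 0.2 is a common drift alert level."""
    expected, actual = np.asarray(expected), np.asarray(actual)
    cuts = np.percentile(expected, np.linspace(0, 100, bins + 1))
    # Widen the outer edges so every live value lands in some bucket.
    cuts[0] = min(cuts[0], actual.min()) - 1e-9
    cuts[-1] = max(cuts[-1], actual.max()) + 1e-9
    e_pct = np.histogram(expected, cuts)[0] / expected.size
    a_pct = np.histogram(actual, cuts)[0] / actual.size
    e_pct = np.clip(e_pct, 1e-6, None)   # avoid log(0)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0, 1, 5000)
drift = psi(baseline, rng.normal(1, 1, 5000))
print(f"PSI under a 1-sigma mean shift: {drift:.2f}")  # well above 0.2
```

A monitoring service would compute this per feature and per score on a schedule, firing a customizable alert whenever the index crosses the configured threshold.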
How H2O.ai delivers audit-ready AI with centralized logging, automated model documentation, and traceable agent execution.
When regulators ask questions, teams need a complete, reproducible paper trail. H2O.ai builds auditability directly into the DSML lifecycle—centralized audit logs capture every user action, deployment event, and configuration change with precise timestamps and actor context. AutoDoc eliminates manual reporting by generating comprehensive model documentation automatically. For generative AI, every agent execution step is fully traceable, exposing tool calls, data access, and reasoning steps to explain exactly how a recommendation was reached.
Technical Capabilities & Resources
➤ Centralized Audit Logging: Complete history of user actions, administrative changes, and operational events with timestamps and actor context.
🔗 https://docs.h2o.ai/haic-documentation/security-guarantees-model#audit-logging
➤ Automated Model Documentation (AutoDoc): Generate reproducible reports covering model configurations, feature importance, and performance metrics automatically.
🔗 https://docs.h2o.ai/driverless-ai/latest-lts/docs/userguide/autodoc-using.html
➤ Traceable Agent Execution: Step-by-step breakdown of agent tool calls, data access, and reasoning for full decision transparency.
🔗 https://docs.h2oai.com/enterprise-h2ogpte/guide/agents#how-to-review-agent-behavior
➤ ML Interpretability & Retention: Support long-term compliance with interpretability tooling and configurable data retention policies.
🔗 https://docs.h2o.ai/driverless-ai/latest-lts/docs/userguide/mli.html
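The kind of record a centralized audit log keeps — who did what, to which asset, and exactly when — can be sketched as append-only JSON lines. The field names below are illustrative, not H2O.ai's actual log schema.

```python
import io
import json
from datetime import datetime, timezone

def log_event(stream, actor, action, asset):
    """Append one audit record with a UTC timestamp and actor context.
    Hypothetical schema for illustration only."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "action": action,
        "asset": asset,
    }
    stream.write(json.dumps(record) + "\n")  # append-only JSON lines
    return record

log = io.StringIO()  # stands in for a durable, tamper-evident store
log_event(log, "jdoe", "deploy_model", "churn-model-v3")
log_event(log, "asmith", "change_permission", "workspace/retail")
print(len(log.getvalue().splitlines()))  # 2
```

Structured records like these are what make the paper trail queryable during a regulatory examination: filtering by actor, asset, or time window is a one-liner over JSON lines.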
How H2O.ai aligns ML model development with business outcomes using custom scorers, ROI documentation, and enterprise tool integration.
AI projects only succeed when they deliver measurable business value. H2O.ai organizes technical work around strategic goals using structured workspaces and business metadata tagging. Custom scoring functions in Driverless AI allow teams to optimize models directly for profit functions—such as customer retention probability or intervention costs—rather than generic statistical metrics. The H2O Super Agent can autonomously draft ROI analyses, while API integrations with tools like Jira and ServiceNow synchronize the model lifecycle with existing enterprise workflows.
Technical Capabilities & Resources
➤ Goal-Oriented Workspaces: Organize AI projects around business strategy and expected outcomes using collaborative workspace descriptions.
🔗 https://docs.h2o.ai/haic-documentation/guide/general/create-manage-workspaces
➤ Custom Business Value Scoring: Optimize Driverless AI models directly for revenue or cost-based profit functions using custom scorers.
🔗 https://github.com/h2oai/driverlessai-recipes/blob/master/scorers/classification/binary/profit.py
➤ Automated Business Documentation: Use AutoDoc and the H2O Super Agent to generate business cases and ROI analyses from model performance data.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/autodoc.html
➤ Enterprise Tool Integration: Synchronize model lifecycle events with Jira, ServiceNow, or Azure DevOps via the Python API.
🔗 https://docs.h2o.ai/driverless-ai/latest-stable/docs/userguide/python_client.html
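The idea behind a custom business-value scorer, in the spirit of the profit recipe linked above, is to score predictions in dollars rather than AUC or log loss. The sketch below is plain Python for illustration; a real Driverless AI scorer would subclass the recipe base class, and the dollar values here are made-up assumptions.

```python
import numpy as np

# Assumed economics for the sketch: a retained customer (true positive)
# is worth $120, and each retention intervention costs $15.
GAIN_TP = 120.0
COST_INTERVENTION = 15.0

def profit(y_true, y_prob, threshold=0.5):
    """Total profit of intervening on every prediction above the threshold."""
    y_true = np.asarray(y_true)
    act = np.asarray(y_prob) >= threshold          # who we intervene on
    tp = int(np.sum(act & (y_true == 1)))          # interventions that pay off
    return float(tp * GAIN_TP - act.sum() * COST_INTERVENTION)

y_true = [1, 0, 1, 0, 1]
y_prob = [0.9, 0.8, 0.6, 0.2, 0.4]
print(profit(y_true, y_prob))  # 2 TPs * 120 - 3 interventions * 15 = 195.0
```

Optimizing model selection and the decision threshold against a function like this aligns the experiment leaderboard directly with the ROI analysis the business ultimately reviews.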
1. AI Platform for Data Science & Machine Learning | H2O.ai University (1:25)
2. Automated Data Prep & Synthetic Data with H2O Driverless AI | Part 1 (1:53)
3. Governing the AI Lifecycle: H2O.ai Data Traceability | Part 2 (1:43)
4. Scaling Enterprise ML with the H2O Feature Store | Part 3 (1:54)
5. Automated Feature Engineering in H2O Driverless AI | Part 4 (1:26)
6. Automated ML Explainability & Bias Testing in H2O.ai | Part 5 (2:03)
7. Enterprise MLOps: Model Deployment with H2O.ai | Part 6 (1:58)
8. Scalable ML Runtime Deployment with H2O MLOps | Part 7 (1:21)
9. H2O MLOps Enterprise Model Registry & Hugging Face Integration | Part 8 (1:32)
10. AI Artifact Management & Traceability via H2O MLOps | Part 9 (1:38)
11. ML Experiment Tracking in H2O Driverless AI | Part 10 (2:03)
12. Real-Time ML Drift Detection & Monitoring via H2O MLOps | Part 11 (1:52)
13. Enterprise Prompt Engineering & LLM Testing via h2oGPTe | Part 12 (1:49)
14. LLM Instruction Tuning & DPO via H2O Enterprise LLM Studio | Part 13 (2:02)
15. Securing Enterprise LLMs with h2oGPTe Guardrails | Part 14 (1:59)
16. Multimodal RAG & Agentic Workflows via Enterprise h2oGPTe | Part 15 (2:00)
17. LLM-as-a-Judge Evaluation Metrics via H2O Eval Studio | Part 16 (1:59)
18. Optimizing ML Compute & Orchestration with H2O MLOps | Part 17 (1:47)
19. Extending AI Workflows with H2O.ai APIs & Python SDKs | Part 18 (1:53)
20. Accelerating Data Science Workflows with H2O AI Agents in Enterprise h2oGPTe | Part 19 (1:47)
21. Multi-Agent AI Orchestration & MCP via Enterprise h2oGPTe | Part 20 (4:43)
22. Conversational ML Workflows via the H2O AI Super Agent in Enterprise h2oGPTe | Part 21 (1:51)
23. Building Visual ML Pipelines to Python with H2O Driverless AI | Part 22 (1:57)
24. Enforcing AI Governance & Compliance on the H2O.ai Platform | Part 23 (1:51)
25. Automated ML Audit Trails & AutoDoc in H2O Driverless AI | Part 24 (1:46)
26. Optimizing ML Models for Business ROI with H2O Driverless AI | Part 25 (1:45)
Michelle Tanco
Head of Product
As the Head of Product at H2O.ai, Michelle Tanco’s primary focus lies in delivering a seamless user experience across machine learning applications. With a strong dedication to math and computer science, she is enthusiastic about leveraging these disciplines to address real-world challenges. Before joining H2O, she served as a Senior Data Science Consultant at Teradata, working on high-impact analytics projects to tackle business issues across various industries. Michelle holds a B.A. in Mathematics and Computer Science from Ursinus College. In her downtime, she enjoys spending quality time with her family and expressing her creativity by playing the bass and ukulele.
Andreea Turcu
Head of Global Training
Andreea is a data scientist with over 7 years of experience in demystifying AI and Data Science concepts for anyone keen on working in this exciting field using cutting-edge technology. Having obtained a Master’s Degree in Quantitative Economics and Econometrics from Lumière Lyon 2 University, she enjoys integrating machine learning principles with real-world applications. Andreea’s passion lies in developing engaging training programs and ensuring an optimal customer education journey. As she frequently likes to remark, “AI is essentially Economics turbocharged by data, with a sprinkle of innovation.”
Audrey
Principal Data Scientist
Audrey is a Principal Data Scientist at H2O.ai, specializing in leading complex machine learning projects from ideation to production, with a keen interest in MLOps and a strong background in statistics.
Her expertise covers a broad range of industries such as insurance, energy, and services, enabling her to communicate effectively with both technical and non-technical stakeholders.
Holding a Master of Science in Mathematics and Statistics from Université Aix-Marseille II, Audrey has a proven track record of enhancing business strategies and objectives through data analysis and model development across various data science roles.
Learn Hands On
Follow structured learning paths designed to build real, production-ready AI skills. Learn at your own pace, practice on real environments, and validate your knowledge through certification.