H2O LLM Studio, Large Language Models, MLOps, Python

Deploying Fine-Tuned LLM Binary Classification Models with MLflow: A Complete Guide

Published: February 03, 2026 | Written by: H2O.ai Team

Large Language Models (LLMs) are increasingly used for classification tasks such as content moderation, risk scoring, and intent detection. While fine-tuning these models is becoming easier with tools like H2O LLM Studio, getting them into production with proper packaging, versioning, and deployment is still a common challenge.

This post walks through a production-ready pattern for packaging and deploying a fine-tuned binary classification LLM using MLflow in a way that integrates cleanly with H2O MLOps.

What You’ll Learn

By the end of this guide, you’ll know how to:

  • Package a fine-tuned LLM with a custom binary classification head
  • Build MLflow-compatible model artifacts ready for enterprise deployment
  • Implement a custom PyFunc wrapper that encapsulates your end-to-end inference flow

The Core Challenge

When you fine-tune an LLM for binary classification in OSS LLMStudio, the typical output artifacts include:

  • The fine-tuned base model (e.g., Hugging Face AutoModelForCausalLM)
  • Classification head weights
  • Configuration and metadata files (tokenizer, model config, training params, etc.)

This approach works well if you have your own deployment pipeline. However, if you want to deploy via H2O MLOps, the model must be packaged as an MLflow artifact. This is where MLflow’s PyFunc model abstraction becomes invaluable: it lets you bundle your model logic, artifacts, and dependencies into a reusable, portable package that can be deployed seamlessly on any MLflow-compatible platform, including H2O MLOps.
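At its core, a PyFunc model is just a Python class with two methods: load_context, which restores artifacts, and predict, which scores a pandas DataFrame. A minimal skeleton (names and keys here are purely illustrative; the full wrapper for this use case appears later in the post) looks like this:

import mlflow.pyfunc
import pandas as pd

class MyClassifier(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Restore model weights, tokenizers, etc. from the paths MLflow
        # provides at load time (keys match the artifacts dict used at save time)
        self.artifact_path = context.artifacts["model_dir"]

    def predict(self, context, model_input: pd.DataFrame) -> pd.DataFrame:
        # Run inference over the input DataFrame and return a DataFrame
        # (a constant placeholder here, just to show the shape of the contract)
        return pd.DataFrame({"probability": [0.5] * len(model_input)})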

End-to-End Deployment Workflow

Below is the end-to-end process you’ll follow to go from a trained LLMStudio model to a deployed binary classifier on the H2O MLOps platform.

 

Step 1: Train Your Binary Classification Model in LLMStudio

Start by running your binary classification experiment in LLMStudio:

  • Configure training with appropriate hyperparameters (learning rate, batch size, epochs, etc.)
  • Monitor training for convergence and performance on validation metrics
  • Wait for the run to complete and generate its artifacts

Once training finishes successfully, LLMStudio will produce a package that includes:

  • The fine-tuned base LLM
  • Classification head weights for binary classification
  • Config and metadata (tokenizer files, model config, training params)

 

Step 2: Download the Training Artifact

Next, pull the training artifact to your local environment:

  1. Go to your LLMStudio workspace
  2. Locate the completed training run of interest
  3. Download the generated artifact bundle (e.g., a .zip file)

Inside this artifact you’ll find, for example:

  • model_refined-swan/ – your fine-tuned LLM directory (with config.json, pytorch_model.bin, etc.)
  • classification_head.pth – the torch-saved classification head weights
  • Additional configuration and metadata files

 

Step 3: Extract and Inspect the Artifact

On your local machine:

unzip llmstudio_artifact.zip
cd extracted_artifact/

Inspect the directory structure and identify:

  • The fine-tuned model directory
    • Contains config.json, pytorch_model.bin, tokenizer files, etc.
  • The classification head file
    • Typically named something like classification_head.pth

You’ll reference these paths when building your MLflow model.
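As a quick sanity check before packaging (a minimal sketch; adjust the directory and file names to match your own artifact, which follows the layout used in the configuration later in this post), you can verify that the expected pieces are present:

import os

ARTIFACT_ROOT = "extracted_artifact"  # wherever you unzipped the bundle
MODEL_DIR = os.path.join(ARTIFACT_ROOT, "model_refined-swan")
HEAD_FILE = os.path.join(MODEL_DIR, "classification_head.pth")

# Confirm the fine-tuned model directory and classification head exist
assert os.path.isfile(os.path.join(MODEL_DIR, "config.json")), "missing config.json"
assert os.path.isfile(HEAD_FILE), "missing classification head weights"
print("Artifact layout looks good:", sorted(os.listdir(MODEL_DIR))[:10])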

 

Step 4: Configure Paths for Packaging

In your packaging script, configure the paths that point to the model and classification head:

# Path to local Hugging Face CausalLM directory
LOCAL_LLM_DIR = "model_refined-swan"

# Path to classification head weights (torch-saved)
HEAD_PATH = "model_refined-swan/classification_head.pth"

# Output MLflow model directory
MLFLOW_MODEL_DIR = "danube_classification_v1"

# Output zip basename (produces artifact_v5.zip)
ZIP_BASENAME = "artifact_v5"

 

These constants determine which artifacts are pulled into the MLflow PyFunc model, and where the final deployable bundle will be written.

Building the MLflow PyFunc Wrapper

The core of this deployment pattern is a custom MLflow PyFunc model. It:

  • Loads your fine-tuned LLM and classification head
  • Defines the inference pipeline (tokenization → forward pass → classification → probability)
  • Exposes a standard predict method that works with pandas DataFrames

 

Architecture Overview

Conceptually, the inference flow looks like this:

Input (DataFrame with a prompt column)
→ Tokenization (Hugging Face tokenizer)
→ LLM forward pass (CausalLM)
→ Last-token logits extraction
→ Classification head
→ Sigmoid activation (probability)
→ Output (DataFrame with logits and probability)

 

This design cleanly separates model loading from inference logic, and returns structured outputs suitable for downstream services or dashboards.
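In DataFrame terms, the contract looks like this (the values shown in the comment are purely illustrative):

import pandas as pd

# Input: one row per text to classify
batch = pd.DataFrame({"prompt": [
    "Customer asked to cancel their subscription immediately.",
    "Great service, will definitely order again!",
]})

# Output returned by the wrapper's predict method, e.g.:
#       logits     probability
# 0   [1.7342]     [0.8499]
# 1  [-2.1055]     [0.1086]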

 

Key Implementation Details

1. Context Loading

In load_context, you initialize your model using artifacts provided by MLflow at runtime:

def load_context(self, context):
    # Automatic device selection
    self.device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load artifacts from MLflow-provided paths
    model_dir = context.artifacts["model_dir"]
    head_path = context.artifacts["head_path"]

    # Initialize tokenizer and model
    self.tokenizer = AutoTokenizer.from_pretrained(
        model_dir, trust_remote_code=True
    )
    self.model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype="auto", trust_remote_code=True
    ).to(self.device).eval()

    # Load classification head
    head_weights = torch.load(head_path, map_location=self.device)
    self.head = torch.nn.Linear(1, 1, bias=False).to(self.device)
    self.head.weight.data = head_weights

 

This method ensures that:

  • The model automatically uses GPU when available
  • All paths are resolved via MLflow’s artifact system
  • The tokenizer, base LLM, and classification head are loaded and ready for inference
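If you are unsure what shape the saved head weights have, a quick offline inspection (a standalone sketch, assuming the same HEAD_PATH as in the configuration above) can save debugging time before wiring up the Linear layer:

import torch

HEAD_PATH = "model_refined-swan/classification_head.pth"

# The saved tensor's shape and dtype tell you what the head expects as input
head_weights = torch.load(HEAD_PATH, map_location="cpu")
print(type(head_weights))
print(getattr(head_weights, "shape", None), getattr(head_weights, "dtype", None))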

 

2. Inference Pipeline

The predict method handles batch inference over a pandas DataFrame:

def predict(self, context, model_input):
    # Validate input schema
    if "prompt" not in model_input.columns:
        raise ValueError("Input DataFrame must contain a 'prompt' column.")

    prompts = model_input["prompt"].astype(str).tolist()

    with torch.no_grad():
        for prompt in prompts:
            # Tokenize
            inputs = self.tokenizer(
                prompt, return_tensors="pt", add_special_tokens=False
            ).to(self.device)

            # Forward pass
            logits = self.model(**inputs).logits      # [1, seq, vocab]
            last = logits[:, -1]                      # [1, vocab]

            # Classification head
            cls_logits = self.head(last)

            # Convert to probabilities
            prob_val = torch.sigmoid(cls_logits)

 

From here, you can:

  • Aggregate logits and probabilities into a DataFrame
  • Optionally add thresholds or business rules for converting probabilities into class labels (see the sketch below)

This structure is flexible and can easily be extended to multi-class or multi-label setups.
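For instance, a simple thresholding step (0.5 here, purely illustrative; tune it to your precision/recall needs) could turn the returned probabilities into class labels:

import pandas as pd

def add_labels(predictions: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    # 'probability' holds a one-element list per row (sigmoid of the head output)
    predictions = predictions.copy()
    predictions["label"] = predictions["probability"].apply(
        lambda p: int(p[0] >= threshold)
    )
    return predictions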

Step 5: Package the Model with MLflow

With the PyFunc wrapper in place, you can now build an MLflow model that includes:

  • The Python logic (python_model)
  • The model artifacts (model_dir, head_path)
  • The input/output schema (signature)
  • The required Python dependencies

# Define input/output schema
input_schema = Schema([
    ColSpec(DataType.string, "prompt")
])
output_schema = Schema([
    ColSpec(DataType.string, "logits"),
    ColSpec(DataType.string, "probability"),
])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Save MLflow PyFunc model
mlflow.pyfunc.save_model(
    path=MLFLOW_MODEL_DIR,
    python_model=FTDanube(),
    artifacts={
        "model_dir": LOCAL_LLM_DIR,
        "head_path": HEAD_PATH,
    },
    signature=signature,
    pip_requirements=[
        "mlflow>=2.0.0",
        "torch",
        "transformers",
        "pandas",
        "sentencepiece>=0.1.99",
    ],
)

# Create deployment artifact
shutil.make_archive(ZIP_BASENAME, "zip", MLFLOW_MODEL_DIR)

 

Expected outputs:

  • danube_classification_v1/ – the MLflow model directory (PyFunc-compatible)
  • artifact_v5.zip – a deployment-ready artifact for your MLOps platform

This .zip file is what you’ll upload or register in your production environment.
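Before uploading, it is worth smoke-testing the packaged model locally with MLflow’s standard loader (the example prompt below is arbitrary):

import mlflow.pyfunc
import pandas as pd

# Load the saved model exactly as a serving environment would
loaded = mlflow.pyfunc.load_model("danube_classification_v1")

test_df = pd.DataFrame({"prompt": ["Please close my account and refund the last charge."]})
print(loaded.predict(test_df))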

Step 6: Deploy to Your MLOps Platform

Once you’ve created the zipped MLflow model, you can deploy it through the H2O MLOps platform.


Option A: GUI-Based Deployment

  1. Navigate to your platform’s Model Registry or Model Management UI
  2. Click “Upload Model Artifact” (or equivalent)
  3. Select and upload artifact_v5.zip
  4. Create a deployment or endpoint from the registered model version

This option is ideal for teams that prefer visual workflows.

 

Option B: Python Client Deployment

If you’re already using the MLflow Python client or automation scripts, you can register the model programmatically.
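One possible approach, sketched here with standard MLflow tracking APIs (the tracking URI and model name are placeholders; your MLOps platform’s own client may offer a more direct route), is to log and register the model against a tracking server:

import mlflow

mlflow.set_tracking_uri("https://your-mlflow-tracking-server")  # placeholder URI

with mlflow.start_run(run_name="danube-binary-classifier"):
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=FTDanube(),
        artifacts={"model_dir": LOCAL_LLM_DIR, "head_path": HEAD_PATH},
        signature=signature,
        pip_requirements=["mlflow>=2.0.0", "torch", "transformers", "pandas", "sentencepiece>=0.1.99"],
        registered_model_name="danube-binary-classifier",  # placeholder registry name
    )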

A Full MLflow PyFunc Reference for H2O MLOps

import os
import shutil

import mlflow
import mlflow.pyfunc
import pandas as pd
import torch
from mlflow.models import ModelSignature
from mlflow.types import Schema, ColSpec, DataType
from transformers import AutoModelForCausalLM, AutoTokenizer

# ============================================================
# ===================== USER CONFIG ==========================
# ============================================================

# Path to local Hugging Face CausalLM directory
LOCAL_LLM_DIR = "model_refined-swan"

# Path to classification head weights (torch-saved)
HEAD_PATH = "model_refined-swan/classification_head.pth"

# Output MLflow model directory
MLFLOW_MODEL_DIR = "danube_classification_v1"

# Output zip basename (produces artifact_v5.zip)
ZIP_BASENAME = "artifact_v5"

# ============================================================
# =================== MLflow PyFunc ==========================
# ============================================================

class FTDanube(mlflow.pyfunc.PythonModel):
    """
    MLflow PyFunc wrapper around:
      - A local Hugging Face CausalLM model directory
      - A torch-saved classification head applied to last-token logits

    Input:
      pandas.DataFrame with column:
        - 'prompt' (string)

    Output:
      pandas.DataFrame with columns:
        - 'logits' (list[float])
        - 'probability' (list[float], sigmoid(logits))
    """

    def load_context(self, context):
        # Select device automatically
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

        # MLflow-provided artifact paths
        model_dir = context.artifacts["model_dir"]
        head_path = context.artifacts["head_path"]

        # Load tokenizer and language model
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_dir,
            trust_remote_code=True,
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype="auto",
            trust_remote_code=True,
        ).to(self.device).eval()

        # Load classification head weights
        head_weights = torch.load(head_path, map_location=self.device)
        self.head = torch.nn.Linear(1, 1, bias=False).to(self.device)
        self.head.weight.data = head_weights
        self.head = self.head.to(self.model.dtype)

    def predict(self, context, model_input):
        # Input validation
        if not isinstance(model_input, pd.DataFrame):
            raise TypeError("Expected a pandas.DataFrame with a 'prompt' column.")
        if "prompt" not in model_input.columns:
            raise ValueError("Input DataFrame must contain a 'prompt' column.")

        prompts = model_input["prompt"].astype(str).tolist()

        logit_rows = []
        prob_rows = []

        # Disable gradients for inference
        with torch.no_grad():
            for prompt in prompts:
                # Tokenize prompt
                inputs = self.tokenizer(
                    prompt,
                    return_tensors="pt",
                    add_special_tokens=False,
                ).to(self.device)

                # Forward pass through LLM
                logits = self.model(**inputs).logits      # [1, seq, vocab]
                last = logits[:, -1]                      # [1, vocab]

                # Apply classification head
                cls_logits = self.head(last)

                # Convert logits to JSON-serializable list
                logit_val = cls_logits.squeeze(0).detach().cpu().float()
                logit_rows.append(logit_val.tolist())

                # Compute probability via sigmoid
                prob_val = torch.sigmoid(logit_val)
                prob_rows.append(prob_val.tolist())

        return pd.DataFrame({
            "logits": logit_rows,
            "probability": prob_rows,
        })

# ============================================================
# ===================== SAVE MODEL ===========================
# ============================================================

def main():
    # Validate user paths
    if not os.path.isdir(LOCAL_LLM_DIR):
        raise FileNotFoundError(f"LOCAL_LLM_DIR not found: {LOCAL_LLM_DIR}")
    if not os.path.isfile(HEAD_PATH):
        raise FileNotFoundError(f"HEAD_PATH not found: {HEAD_PATH}")

    # Define MLflow model signature
    input_schema = Schema([
        ColSpec(DataType.string, "prompt")
    ])
    output_schema = Schema([
        ColSpec(DataType.string, "logits"),
        ColSpec(DataType.string, "probability"),
    ])
    signature = ModelSignature(
        inputs=input_schema,
        outputs=output_schema,
    )

    # Save MLflow PyFunc model
    mlflow.pyfunc.save_model(
        path=MLFLOW_MODEL_DIR,
        python_model=FTDanube(),
        artifacts={
            "model_dir": LOCAL_LLM_DIR,
            "head_path": HEAD_PATH,
        },
        signature=signature,
        pip_requirements=[
            "mlflow>=2.0.0",
            "torch",
            "transformers",
            "pandas",
            "sentencepiece>=0.1.99",
        ],
    )

    # Zip the MLflow model directory
    shutil.make_archive(ZIP_BASENAME, "zip", MLFLOW_MODEL_DIR)
    print("Wrote:", os.path.abspath(ZIP_BASENAME + ".zip"))

if __name__ == "__main__":
    main()

 

Conclusion

By combining LLMStudio, Hugging Face Transformers, and MLflow PyFunc, you get a clean, repeatable path from fine-tuned LLM to production-grade binary classifier:

  1. Train your model in LLMStudio
  2. Download and inspect the artifact
  3. Configure paths for model and head weights
  4. Implement a robust PyFunc wrapper
  5. Package everything into an MLflow model
  6. Deploy to your H2O MLOps platform via UI or API

H2O.ai Team

At H2O.ai, democratizing AI isn’t just an idea. It’s a movement. And that means it requires action. We started out as a group of like-minded individuals in the open source community, collectively driven by the idea that there should be freedom around the creation and use of AI.

Today we have evolved into a global company built by people from a variety of different backgrounds and skill sets, all driven to be part of something greater than ourselves. Our partnerships now extend beyond the open-source community to include business customers, academia, and non-profit organizations.
