Large Language Models (LLMs) are increasingly used for classification tasks such as content moderation, risk scoring, and intent detection. While fine-tuning these models is becoming easier with tools like LLMStudio, getting them into production (with proper packaging, versioning, and deployment) is still a common challenge.
This post walks through a production-ready pattern for packaging and deploying a fine-tuned binary classification LLM using MLflow, in a way that integrates cleanly with H2O MLOps.
By the end of this guide, you’ll know how to:
- export the artifacts of a fine-tuned binary classification model from LLMStudio,
- wrap them in a custom MLflow PyFunc model,
- package everything into a single deployable .zip artifact, and
- deploy the result on H2O MLOps.
When you fine-tune an LLM for binary classification in OSS LLMStudio, the typical output artifacts include:
- the fine-tuned Hugging Face CausalLM weights and configuration,
- the tokenizer files, and
- a torch-saved classification head (for example, classification_head.pth).
Using these raw artifacts directly works well if you have your own deployment pipeline. However, if you want to deploy via H2O MLOps, the model must be packaged as an MLflow artifact. This is where MLflow’s PyFunc model abstraction becomes invaluable: it allows you to bundle your model logic, artifacts, and dependencies into a reusable, portable package that can be deployed seamlessly on any MLflow-compatible platform, including H2O MLOps.
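To make the PyFunc contract concrete, here is a minimal, purely illustrative wrapper (the class and its behavior are made up for demonstration, not part of this post’s classifier): every PyFunc model implements load_context to set up state and predict to score inputs.

import mlflow.pyfunc

# Minimal illustration of the PyFunc contract (not the classifier from this post).
class EchoModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Artifacts and configuration would normally be loaded here.
        self.prefix = "echo: "

    def predict(self, context, model_input):
        # model_input is typically a pandas DataFrame; "text" is an example column.
        return [self.prefix + str(x) for x in model_input["text"]]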
Below is the end-to-end process you’ll follow to go from a trained LLMStudio model to a deployed binary classifier on the H2O MLOps platform.
Start by running your binary classification experiment in LLMStudio: import your labeled dataset, configure the experiment for binary classification, and launch training.
Once training finishes successfully, LLMStudio will produce a package that includes:
- the fine-tuned model weights and configuration,
- the tokenizer files, and
- the classification head weights.
Next, pull the training artifact down to your local environment.
Inside this artifact you’ll find, for example:
- model_refined-swan/: the Hugging Face CausalLM directory (weights, config, and tokenizer files)
- model_refined-swan/classification_head.pth: the torch-saved classification head
On your local machine:
unzip llmstudio_artifact.zip
cd extracted_artifact/
Inspect the directory structure and identify:
- the Hugging Face CausalLM model directory, and
- the classification head weights file (.pth).
You’ll reference these paths when building your MLflow model.
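If you are unsure where things live, a quick walk of the extracted directory helps. The snippet below is a small convenience sketch: the directory name extracted_artifact/ follows the unzip step above, and the file extensions are just common candidates.

import os

# Walk the extracted artifact and print files that are likely model pieces.
for root, _, files in os.walk("extracted_artifact"):
    for name in files:
        if name.endswith((".pth", ".bin", ".safetensors", ".json", ".model")):
            print(os.path.join(root, name))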
In your packaging script, configure the paths that point to the model and classification head:
# Path to local Hugging Face CausalLM directory
LOCAL_LLM_DIR = "model_refined-swan"
# Path to classification head weights (torch-saved)
HEAD_PATH = "model_refined-swan/classification_head.pth"
# Output MLflow model directory
MLFLOW_MODEL_DIR = "danube_classification_v1"
# Output zip basename (produces artifact_v5.zip)
ZIP_BASENAME = "artifact_v5"
These constants determine which artifacts are pulled into the MLflow PyFunc model, and where the final deployable bundle will be written.
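Before packaging, it can be worth a quick sanity check that HEAD_PATH really contains the head weights you expect. This optional snippet simply loads the file and prints its type and shape, assuming the head was saved as a plain weight tensor (which is what the wrapper below expects).

import torch

# Optional sanity check: confirm the classification head weights look right
# before building the MLflow model. HEAD_PATH is defined above.
head_weights = torch.load(HEAD_PATH, map_location="cpu")
print(type(head_weights), getattr(head_weights, "shape", None))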
The core of this deployment pattern is a custom MLflow PyFunc model. It:
- loads the tokenizer, the CausalLM, and the classification head once in load_context,
- runs tokenization, a forward pass, last-token logit extraction, the classification head, and a sigmoid inside predict, and
- returns a pandas DataFrame with logits and probabilities for each prompt.
Conceptually, the inference flow looks like this:
Input (DataFrame with prompt column)
↓
Tokenization (Hugging Face Tokenizer)
↓
LLM Forward Pass (CausalLM)
↓
Last Token Logits Extraction
↓
Classification Head Application
↓
Sigmoid Activation (for probability)
↓
Output (DataFrame with logits and probability)
This design cleanly separates model loading from inference logic, and returns structured outputs suitable for downstream services or dashboards.
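For illustration, the input and output of this flow look roughly like the following (the numeric values are made up):

import pandas as pd

# Input: one prompt per row.
batch = pd.DataFrame({"prompt": ["Wire transfer request to a new beneficiary"]})

# Output of model.predict(batch) (illustrative values):
#        logits   probability
# 0    [1.3862]      [0.8000]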
In load_context, you initialize your model using artifacts provided by MLflow at runtime:
def load_context(self, context):
    # Automatic device selection
    self.device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load artifacts from MLflow-provided paths
    model_dir = context.artifacts["model_dir"]
    head_path = context.artifacts["head_path"]

    # Initialize tokenizer and model
    self.tokenizer = AutoTokenizer.from_pretrained(
        model_dir, trust_remote_code=True
    )
    self.model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype="auto", trust_remote_code=True
    ).to(self.device).eval()

    # Load classification head
    head_weights = torch.load(head_path, map_location=self.device)
    self.head = torch.nn.Linear(1, 1, bias=False).to(self.device)
    self.head.weight.data = head_weights
This method ensures that:
- artifact paths are resolved from the MLflow context at runtime rather than hard-coded,
- the model runs on GPU when one is available and falls back to CPU otherwise, and
- the model is switched to eval mode before any inference.
The predict method handles batch inference over a pandas DataFrame:
def predict(self, context, model_input):
    # Validate input schema
    if "prompt" not in model_input.columns:
        raise ValueError("Input DataFrame must contain a 'prompt' column.")

    prompts = model_input["prompt"].astype(str).tolist()

    with torch.no_grad():
        for prompt in prompts:
            # Tokenize
            inputs = self.tokenizer(
                prompt, return_tensors="pt", add_special_tokens=False
            ).to(self.device)

            # Forward pass
            logits = self.model(**inputs).logits  # [1, seq, vocab]
            last = logits[:, -1]                  # [1, vocab]

            # Classification head
            cls_logits = self.head(last)

            # Convert to probabilities
            prob_val = torch.sigmoid(cls_logits)
From here, you can:
- collect the per-prompt logits and probabilities into lists and return them as a pandas DataFrame (exactly what the full script below does), or
- threshold the probability to emit a hard class label for downstream consumers.
This structure is flexible and can easily be extended to multi-class or multi-label setups.
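For example, a multi-class variant would mainly swap the sigmoid for a softmax and let the head emit one logit per class. Below is a minimal sketch, assuming the saved head weights have shape [num_classes, hidden_or_vocab_dim]; it is not part of the shipped script.

import torch

def classify_multiclass(last_token_logits: torch.Tensor,
                        head: torch.nn.Linear) -> tuple[list[float], int]:
    """Apply a multi-class head to last-token logits and return
    (class probabilities, predicted class index)."""
    cls_logits = head(last_token_logits)        # [1, num_classes]
    probs = torch.softmax(cls_logits, dim=-1)   # normalize over classes
    return probs.squeeze(0).tolist(), int(probs.argmax(dim=-1))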
With the PyFunc wrapper in place, you can now build an MLflow model that includes:
- the FTDanube PyFunc wrapper,
- the model directory and classification head registered as MLflow artifacts,
- an explicit input/output signature, and
- pinned pip requirements for a reproducible serving environment.
# Define input/output schema
input_schema = Schema([
    ColSpec(DataType.string, "prompt")
])
output_schema = Schema([
    ColSpec(DataType.string, "logits"),
    ColSpec(DataType.string, "probability"),
])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Save MLflow PyFunc model
mlflow.pyfunc.save_model(
    path=MLFLOW_MODEL_DIR,
    python_model=FTDanube(),
    artifacts={
        "model_dir": LOCAL_LLM_DIR,
        "head_path": HEAD_PATH,
    },
    signature=signature,
    pip_requirements=[
        "mlflow>=2.0.0",
        "torch",
        "transformers",
        "pandas",
        "sentencepiece>=0.1.99",
    ],
)

# Create deployment artifact
shutil.make_archive(ZIP_BASENAME, "zip", MLFLOW_MODEL_DIR)
Expected outputs:
- danube_classification_v1/: the saved MLflow PyFunc model directory
- artifact_v5.zip: the zipped, deployable copy of that directory
This .zip file is what you’ll upload or register in your production environment.
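Before uploading, it is worth loading the saved model back through MLflow and scoring a tiny batch as a local smoke test. The prompt text below is illustrative.

import mlflow.pyfunc
import pandas as pd

# Load the PyFunc model from the directory saved above and score one prompt.
model = mlflow.pyfunc.load_model("danube_classification_v1")
sample = pd.DataFrame({"prompt": ["Customer requests a password reset via email"]})
print(model.predict(sample))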
Once you’ve created the zipped MLflow model, you can deploy it through your H2O MLOps platform.
Uploading the zipped model through the H2O MLOps UI is ideal for teams that prefer visual workflows.
If you’re already using the MLflow Python client or automation scripts, you can register the model programmatically.
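As a sketch of what that could look like (assuming a reachable MLflow tracking server, and reusing FTDanube, LOCAL_LLM_DIR, HEAD_PATH, and signature from the packaging script; the tracking URI, experiment name, and registered model name are placeholders):

import mlflow
import mlflow.pyfunc

mlflow.set_tracking_uri("http://your-mlflow-server:5000")  # placeholder
mlflow.set_experiment("danube-binary-classifier")          # placeholder

with mlflow.start_run() as run:
    # Log the same PyFunc model to the tracking server...
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=FTDanube(),
        artifacts={"model_dir": LOCAL_LLM_DIR, "head_path": HEAD_PATH},
        signature=signature,
    )
    # ...and register it in the model registry under a placeholder name.
    mlflow.register_model(
        model_uri=f"runs:/{run.info.run_id}/model",
        name="danube_binary_classifier",
    )

For reference, the complete packaging script used throughout this post is shown below.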
import os
import shutil
import mlflow
import mlflow.pyfunc
import torch
import pandas as pd
from mlflow.models import ModelSignature
from mlflow.types import Schema, ColSpec, DataType
from transformers import AutoModelForCausalLM, AutoTokenizer
# ============================================================
# ===================== USER CONFIG ==========================
# ============================================================

# Path to local Hugging Face CausalLM directory
LOCAL_LLM_DIR = "model_refined-swan"

# Path to classification head weights (torch-saved)
HEAD_PATH = "model_refined-swan/classification_head.pth"

# Output MLflow model directory
MLFLOW_MODEL_DIR = "danube_classification_v1"

# Output zip basename (produces artifact_v5.zip)
ZIP_BASENAME = "artifact_v5"

# ============================================================
# =================== MLflow PyFunc ==========================
# ============================================================
class FTDanube(mlflow.pyfunc.PythonModel):
    """
    MLflow PyFunc wrapper around:
      - A local Hugging Face CausalLM model directory
      - A torch-saved classification head applied to last-token logits

    Input:
        pandas.DataFrame with column:
          - 'prompt' (string)

    Output:
        pandas.DataFrame with columns:
          - 'logits' (list[float])
          - 'probability' (list[float], sigmoid(logits))
    """
    def load_context(self, context):
        # Select device automatically
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

        # MLflow-provided artifact paths
        model_dir = context.artifacts["model_dir"]
        head_path = context.artifacts["head_path"]

        # Load tokenizer and language model
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_dir,
            trust_remote_code=True,
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype="auto",
            trust_remote_code=True,
        ).to(self.device).eval()

        # Load classification head weights
        head_weights = torch.load(head_path, map_location=self.device)
        self.head = torch.nn.Linear(1, 1, bias=False).to(self.device)
        self.head.weight.data = head_weights
        self.head = self.head.to(self.model.dtype)
    def predict(self, context, model_input):
        # Input validation
        if not isinstance(model_input, pd.DataFrame):
            raise TypeError("Expected a pandas.DataFrame with a 'prompt' column.")
        if "prompt" not in model_input.columns:
            raise ValueError("Input DataFrame must contain a 'prompt' column.")

        prompts = model_input["prompt"].astype(str).tolist()
        logit_rows = []
        prob_rows = []

        # Disable gradients for inference
        with torch.no_grad():
            for prompt in prompts:
                # Tokenize prompt
                inputs = self.tokenizer(
                    prompt,
                    return_tensors="pt",
                    add_special_tokens=False,
                ).to(self.device)

                # Forward pass through LLM
                logits = self.model(**inputs).logits  # [1, seq, vocab]
                last = logits[:, -1]                  # [1, vocab]

                # Apply classification head
                cls_logits = self.head(last)

                # Convert logits to JSON-serializable list
                logit_val = cls_logits.squeeze(0).detach().cpu().float()
                logit_rows.append(logit_val.tolist())

                # Compute probability via sigmoid
                prob_val = torch.sigmoid(logit_val)
                prob_rows.append(prob_val.tolist())

        return pd.DataFrame({
            "logits": logit_rows,
            "probability": prob_rows,
        })
# ============================================================
# ===================== SAVE MODEL ===========================
# ============================================================

def main():
    # Validate user paths
    if not os.path.isdir(LOCAL_LLM_DIR):
        raise FileNotFoundError(f"LOCAL_LLM_DIR not found: {LOCAL_LLM_DIR}")
    if not os.path.isfile(HEAD_PATH):
        raise FileNotFoundError(f"HEAD_PATH not found: {HEAD_PATH}")

    # Define MLflow model signature
    input_schema = Schema([
        ColSpec(DataType.string, "prompt")
    ])
    output_schema = Schema([
        ColSpec(DataType.string, "logits"),
        ColSpec(DataType.string, "probability"),
    ])
    signature = ModelSignature(
        inputs=input_schema,
        outputs=output_schema,
    )

    # Save MLflow PyFunc model
    mlflow.pyfunc.save_model(
        path=MLFLOW_MODEL_DIR,
        python_model=FTDanube(),
        artifacts={
            "model_dir": LOCAL_LLM_DIR,
            "head_path": HEAD_PATH,
        },
        signature=signature,
        pip_requirements=[
            "mlflow>=2.0.0",
            "torch",
            "transformers",
            "pandas",
            "sentencepiece>=0.1.99",
        ],
    )

    # Zip the MLflow model directory
    shutil.make_archive(ZIP_BASENAME, "zip", MLFLOW_MODEL_DIR)
    print("Wrote:", os.path.abspath(ZIP_BASENAME + ".zip"))


if __name__ == "__main__":
    main()
By combining LLMStudio, Hugging Face Transformers, and MLflow PyFunc, you get a clean, repeatable path from fine-tuned LLM to production-grade binary classifier:
- LLMStudio handles fine-tuning and produces the model artifacts,
- a custom MLflow PyFunc model wraps the tokenizer, CausalLM, and classification head behind a simple DataFrame-in, DataFrame-out interface,
- mlflow.pyfunc.save_model plus a zip archive gives you a single, versionable deployment artifact, and
- H2O MLOps takes that artifact from upload through registration to deployment.