Large Language Models (LLMs) are increasingly used for classification tasks such as content moderation, risk scoring, and intent detection. While fine-tuning these models is becoming easier with tools like LLMStudio, getting them into production (with proper packaging, versioning, and deployment) is still a common challenge.
This post walks through a production-ready pattern for packaging and deploying a fine-tuned binary classification LLM using MLflow, in a way that integrates cleanly with H2O MLOps.
By the end of this guide, you’ll know how to:
- export the artifacts of a fine-tuned binary classification model from LLMStudio,
- wrap them in a custom MLflow PyFunc model,
- package everything into a single deployable .zip artifact, and
- deploy the result on H2O MLOps.
When you fine-tune an LLM for binary classification in OSS LLMStudio, the typical output artifacts include:
- the fine-tuned Hugging Face CausalLM weights and configuration,
- the tokenizer files, and
- a torch-saved classification head (for example, classification_head.pth).
Using these raw artifacts directly works well if you have your own deployment pipeline. However, if you want to deploy via H2O MLOps, the model must be packaged as an MLflow artifact. This is where MLflow’s PyFunc model abstraction becomes invaluable: it allows you to bundle your model logic, artifacts, and dependencies into a reusable, portable package that can be deployed seamlessly on any MLflow-compatible platform, including H2O MLOps.
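To make the PyFunc contract concrete, here is a minimal, purely illustrative wrapper (the class and its behavior are made up for demonstration, not part of this post’s classifier): every PyFunc model implements load_context to set up state and predict to score inputs.

import mlflow.pyfunc

# Minimal illustration of the PyFunc contract (not the classifier from this post).
class EchoModel(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        # Artifacts and configuration would normally be loaded here.
        self.prefix = "echo: "

    def predict(self, context, model_input):
        # model_input is typically a pandas DataFrame; "text" is an example column.
        return [self.prefix + str(x) for x in model_input["text"]]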
Below is the end-to-end process you’ll follow to go from a trained LLMStudio model to a deployed binary classifier on the H2O MLOps platform.
Start by running your binary classification experiment in LLMStudio: import your labeled dataset, configure the experiment for binary classification, and launch training.
Once training finishes successfully, LLMStudio will produce a package that includes:
- the fine-tuned model weights and configuration,
- the tokenizer files, and
- the classification head weights.
Next, pull the training artifact down to your local environment.
Inside this artifact you’ll find, for example:
- model_refined-swan/: the Hugging Face CausalLM directory (weights, config, and tokenizer files)
- model_refined-swan/classification_head.pth: the torch-saved classification head
On your local machine:
unzip llmstudio_artifact.zip
cd extracted_artifact/
Inspect the directory structure and identify:
- the Hugging Face CausalLM model directory, and
- the classification head weights file (.pth).
You’ll reference these paths when building your MLflow model.
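If you are unsure where things live, a quick walk of the extracted directory helps. The snippet below is a small convenience sketch: the directory name extracted_artifact/ follows the unzip step above, and the file extensions are just common candidates.

import os

# Walk the extracted artifact and print files that are likely model pieces.
for root, _, files in os.walk("extracted_artifact"):
    for name in files:
        if name.endswith((".pth", ".bin", ".safetensors", ".json", ".model")):
            print(os.path.join(root, name))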
In your packaging script, configure the paths that point to the model and classification head:
# Path to local Hugging Face CausalLM directory
LOCAL_LLM_DIR = "model_refined-swan"
# Path to classification head weights (torch-saved)
HEAD_PATH = "model_refined-swan/classification_head.pth"
# Output MLflow model directory
MLFLOW_MODEL_DIR = "danube_classification_v1"
# Output zip basename (produces artifact_v5.zip)
ZIP_BASENAME = "artifact_v5"
These constants determine which artifacts are pulled into the MLflow PyFunc model, and where the final deployable bundle will be written.
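Before packaging, it can be worth a quick sanity check that HEAD_PATH really contains the head weights you expect. This optional snippet simply loads the file and prints its type and shape, assuming the head was saved as a plain weight tensor (which is what the wrapper below expects).

import torch

# Optional sanity check: confirm the classification head weights look right
# before building the MLflow model. HEAD_PATH is defined above.
head_weights = torch.load(HEAD_PATH, map_location="cpu")
print(type(head_weights), getattr(head_weights, "shape", None))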
The core of this deployment pattern is a custom MLflow PyFunc model. It:
- loads the tokenizer, the CausalLM, and the classification head once in load_context,
- runs tokenization, a forward pass, last-token logit extraction, the classification head, and a sigmoid inside predict, and
- returns a pandas DataFrame with logits and probabilities for each prompt.
Conceptually, the inference flow looks like this:
Input (DataFrame with prompt column)
↓
Tokenization (Hugging Face Tokenizer)
↓
LLM Forward Pass (CausalLM)
↓
Last Token Logits Extraction
↓
Classification Head Application
↓
Sigmoid Activation (for probability)
↓
Output (DataFrame with logits and probability)
This design cleanly separates model loading from inference logic, and returns structured outputs suitable for downstream services or dashboards.
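For illustration, the input and output of this flow look roughly like the following (the numeric values are made up):

import pandas as pd

# Input: one prompt per row.
batch = pd.DataFrame({"prompt": ["Wire transfer request to a new beneficiary"]})

# Output of model.predict(batch) (illustrative values):
#        logits   probability
# 0    [1.3862]      [0.8000]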
In load_context, you initialize your model using artifacts provided by MLflow at runtime:
def load_context(self, context):
    # Automatic device selection
    self.device = "cuda" if torch.cuda.is_available() else "cpu"

    # Load artifacts from MLflow-provided paths
    model_dir = context.artifacts["model_dir"]
    head_path = context.artifacts["head_path"]

    # Initialize tokenizer and model
    self.tokenizer = AutoTokenizer.from_pretrained(
        model_dir, trust_remote_code=True
    )
    self.model = AutoModelForCausalLM.from_pretrained(
        model_dir, torch_dtype="auto", trust_remote_code=True
    ).to(self.device).eval()

    # Load classification head
    head_weights = torch.load(head_path, map_location=self.device)
    self.head = torch.nn.Linear(1, 1, bias=False).to(self.device)
    self.head.weight.data = head_weights
This method ensures that:
- artifact paths are resolved from the MLflow context at runtime rather than hard-coded,
- the model runs on GPU when one is available and falls back to CPU otherwise, and
- the model is switched to eval mode before any inference.
The predict method handles batch inference over a pandas DataFrame:
def predict(self, context, model_input):
    # Validate input schema
    if "prompt" not in model_input.columns:
        raise ValueError("Input DataFrame must contain a 'prompt' column.")

    prompts = model_input["prompt"].astype(str).tolist()

    with torch.no_grad():
        for prompt in prompts:
            # Tokenize
            inputs = self.tokenizer(
                prompt, return_tensors="pt", add_special_tokens=False
            ).to(self.device)

            # Forward pass
            logits = self.model(**inputs).logits  # [1, seq, vocab]
            last = logits[:, -1]                  # [1, vocab]

            # Classification head
            cls_logits = self.head(last)

            # Convert to probabilities
            prob_val = torch.sigmoid(cls_logits)
From here, you can:
- collect the per-prompt logits and probabilities into lists and return them as a pandas DataFrame (exactly what the full script below does), or
- threshold the probability to emit a hard class label for downstream consumers.
This structure is flexible and can easily be extended to multi-class or multi-label setups.
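For example, a multi-class variant would mainly swap the sigmoid for a softmax and let the head emit one logit per class. Below is a minimal sketch, assuming the saved head weights have shape [num_classes, hidden_or_vocab_dim]; it is not part of the shipped script.

import torch

def classify_multiclass(last_token_logits: torch.Tensor,
                        head: torch.nn.Linear) -> tuple[list[float], int]:
    """Apply a multi-class head to last-token logits and return
    (class probabilities, predicted class index)."""
    cls_logits = head(last_token_logits)        # [1, num_classes]
    probs = torch.softmax(cls_logits, dim=-1)   # normalize over classes
    return probs.squeeze(0).tolist(), int(probs.argmax(dim=-1))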
With the PyFunc wrapper in place, you can now build an MLflow model that includes:
- the FTDanube PyFunc wrapper,
- the model directory and classification head registered as MLflow artifacts,
- an explicit input/output signature, and
- pinned pip requirements for a reproducible serving environment.
# Define input/output schema
input_schema = Schema([
    ColSpec(DataType.string, "prompt")
])
output_schema = Schema([
    ColSpec(DataType.string, "logits"),
    ColSpec(DataType.string, "probability"),
])
signature = ModelSignature(inputs=input_schema, outputs=output_schema)

# Save MLflow PyFunc model
mlflow.pyfunc.save_model(
    path=MLFLOW_MODEL_DIR,
    python_model=FTDanube(),
    artifacts={
        "model_dir": LOCAL_LLM_DIR,
        "head_path": HEAD_PATH,
    },
    signature=signature,
    pip_requirements=[
        "mlflow>=2.0.0",
        "torch",
        "transformers",
        "pandas",
        "sentencepiece>=0.1.99",
    ],
)

# Create deployment artifact
shutil.make_archive(ZIP_BASENAME, "zip", MLFLOW_MODEL_DIR)
Expected outputs:
- danube_classification_v1/: the saved MLflow PyFunc model directory
- artifact_v5.zip: the zipped, deployable copy of that directory
This .zip file is what you’ll upload or register in your production environment.
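Before uploading, it is worth loading the saved model back through MLflow and scoring a tiny batch as a local smoke test. The prompt text below is illustrative.

import mlflow.pyfunc
import pandas as pd

# Load the PyFunc model from the directory saved above and score one prompt.
model = mlflow.pyfunc.load_model("danube_classification_v1")
sample = pd.DataFrame({"prompt": ["Customer requests a password reset via email"]})
print(model.predict(sample))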
Once you’ve created the zipped MLflow model, you can deploy it through your H2O MLOps platform.
Uploading the zipped model through the H2O MLOps UI is ideal for teams that prefer visual workflows.
If you’re already using the MLflow Python client or automation scripts, you can register the model programmatically.
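As a sketch of what that could look like (assuming a reachable MLflow tracking server, and reusing FTDanube, LOCAL_LLM_DIR, HEAD_PATH, and signature from the packaging script; the tracking URI, experiment name, and registered model name are placeholders):

import mlflow
import mlflow.pyfunc

mlflow.set_tracking_uri("http://your-mlflow-server:5000")  # placeholder
mlflow.set_experiment("danube-binary-classifier")          # placeholder

with mlflow.start_run() as run:
    # Log the same PyFunc model to the tracking server...
    mlflow.pyfunc.log_model(
        artifact_path="model",
        python_model=FTDanube(),
        artifacts={"model_dir": LOCAL_LLM_DIR, "head_path": HEAD_PATH},
        signature=signature,
    )
    # ...and register it in the model registry under a placeholder name.
    mlflow.register_model(
        model_uri=f"runs:/{run.info.run_id}/model",
        name="danube_binary_classifier",
    )

For reference, the complete packaging script used throughout this post is shown below.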
import os
import shutil
import mlflow
import mlflow.pyfunc
import torch
import pandas as pd
from mlflow.models import ModelSignature
from mlflow.types import Schema, ColSpec, DataType
from transformers import AutoModelForCausalLM, AutoTokenizer
# ============================================================
# ===================== USER CONFIG ==========================
# ============================================================

# Path to local Hugging Face CausalLM directory
LOCAL_LLM_DIR = "model_refined-swan"

# Path to classification head weights (torch-saved)
HEAD_PATH = "model_refined-swan/classification_head.pth"

# Output MLflow model directory
MLFLOW_MODEL_DIR = "danube_classification_v1"

# Output zip basename (produces artifact_v5.zip)
ZIP_BASENAME = "artifact_v5"

# ============================================================
# =================== MLflow PyFunc ==========================
# ============================================================
class FTDanube(mlflow.pyfunc.PythonModel):
    """
    MLflow PyFunc wrapper around:
      - A local Hugging Face CausalLM model directory
      - A torch-saved classification head applied to last-token logits

    Input:
        pandas.DataFrame with column:
          - 'prompt' (string)

    Output:
        pandas.DataFrame with columns:
          - 'logits' (list[float])
          - 'probability' (list[float], sigmoid(logits))
    """
    def load_context(self, context):
        # Select device automatically
        self.device = "cuda" if torch.cuda.is_available() else "cpu"

        # MLflow-provided artifact paths
        model_dir = context.artifacts["model_dir"]
        head_path = context.artifacts["head_path"]

        # Load tokenizer and language model
        self.tokenizer = AutoTokenizer.from_pretrained(
            model_dir,
            trust_remote_code=True,
        )
        self.model = AutoModelForCausalLM.from_pretrained(
            model_dir,
            torch_dtype="auto",
            trust_remote_code=True,
        ).to(self.device).eval()

        # Load classification head weights
        head_weights = torch.load(head_path, map_location=self.device)
        self.head = torch.nn.Linear(1, 1, bias=False).to(self.device)
        self.head.weight.data = head_weights
        self.head = self.head.to(self.model.dtype)
    def predict(self, context, model_input):
        # Input validation
        if not isinstance(model_input, pd.DataFrame):
            raise TypeError("Expected a pandas.DataFrame with a 'prompt' column.")
        if "prompt" not in model_input.columns:
            raise ValueError("Input DataFrame must contain a 'prompt' column.")

        prompts = model_input["prompt"].astype(str).tolist()
        logit_rows = []
        prob_rows = []

        # Disable gradients for inference
        with torch.no_grad():
            for prompt in prompts:
                # Tokenize prompt
                inputs = self.tokenizer(
                    prompt,
                    return_tensors="pt",
                    add_special_tokens=False,
                ).to(self.device)

                # Forward pass through LLM
                logits = self.model(**inputs).logits  # [1, seq, vocab]
                last = logits[:, -1]                  # [1, vocab]

                # Apply classification head
                cls_logits = self.head(last)

                # Convert logits to JSON-serializable list
                logit_val = cls_logits.squeeze(0).detach().cpu().float()
                logit_rows.append(logit_val.tolist())

                # Compute probability via sigmoid
                prob_val = torch.sigmoid(logit_val)
                prob_rows.append(prob_val.tolist())

        return pd.DataFrame({
            "logits": logit_rows,
            "probability": prob_rows,
        })
# ============================================================
# ===================== SAVE MODEL ===========================
# ============================================================

def main():
    # Validate user paths
    if not os.path.isdir(LOCAL_LLM_DIR):
        raise FileNotFoundError(f"LOCAL_LLM_DIR not found: {LOCAL_LLM_DIR}")
    if not os.path.isfile(HEAD_PATH):
        raise FileNotFoundError(f"HEAD_PATH not found: {HEAD_PATH}")

    # Define MLflow model signature
    input_schema = Schema([
        ColSpec(DataType.string, "prompt")
    ])
    output_schema = Schema([
        ColSpec(DataType.string, "logits"),
        ColSpec(DataType.string, "probability"),
    ])
    signature = ModelSignature(
        inputs=input_schema,
        outputs=output_schema,
    )

    # Save MLflow PyFunc model
    mlflow.pyfunc.save_model(
        path=MLFLOW_MODEL_DIR,
        python_model=FTDanube(),
        artifacts={
            "model_dir": LOCAL_LLM_DIR,
            "head_path": HEAD_PATH,
        },
        signature=signature,
        pip_requirements=[
            "mlflow>=2.0.0",
            "torch",
            "transformers",
            "pandas",
            "sentencepiece>=0.1.99",
        ],
    )

    # Zip the MLflow model directory
    shutil.make_archive(ZIP_BASENAME, "zip", MLFLOW_MODEL_DIR)
    print("Wrote:", os.path.abspath(ZIP_BASENAME + ".zip"))


if __name__ == "__main__":
    main()
By combining LLMStudio, Hugging Face Transformers, and MLflow PyFunc, you get a clean, repeatable path from fine-tuned LLM to production-grade binary classifier:
- LLMStudio handles fine-tuning and produces the model artifacts,
- a custom MLflow PyFunc model wraps the tokenizer, CausalLM, and classification head behind a simple DataFrame-in, DataFrame-out interface,
- mlflow.pyfunc.save_model plus a zip archive gives you a single, versionable deployment artifact, and
- H2O MLOps takes that artifact from upload through registration to deployment.