Source: oceanid/scripts/README-ml-backend.md

ML Backend Connection Script

Programmatically connect the Triton ML backend to Label Studio projects.

Recommended: Manual per-project connection

The recommended approach is to connect the ML backend from the Label Studio UI on a per‑project basis:

Project → Settings → Model → Connect model
URL: http://ls-triton-adapter.apps.svc.cluster.local:9090
Authentication: none

This avoids coupling infrastructure to a personal user token.

Quick Start

# 1. Get your Label Studio API key
# Navigate to: https://label.boathou.se → Account & Settings → Access Token

# 2. Set environment variables
export LABEL_STUDIO_URL=https://label.boathou.se
export LABEL_STUDIO_API_KEY='your_token_here'

# 3. Install dependencies (one-time)
pip install label-studio-sdk requests

# 4. Run the script
python3 scripts/connect-ml-backend.py

What This Script Does (optional)

Connects to Label Studio using your user access token
Finds your project by name (default: "SME 2025")
Adds ML backend URL to project settings
Enables pre-annotations for automatic labeling
Tests backend health to verify connectivity
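The "find your project by name" step can be sketched as an exact-title lookup over a list of project records. This is a minimal illustration, not the script's actual code: `find_project` is a hypothetical helper, and the sample records assume each project carries `id` and `title` fields, as in the project listing shown under Troubleshooting.

```python
from typing import Optional


def find_project(projects: list[dict], name: str) -> Optional[dict]:
    """Return the first project whose title matches `name` exactly, else None."""
    for project in projects:
        if project.get("title") == name:
            return project
    return None


# Hypothetical sample data shaped like a list of Label Studio projects.
sample = [
    {"id": 3, "title": "Maritime Data"},
    {"id": 7, "title": "SME 2025"},
]

match = find_project(sample, "SME 2025")
print(match["id"] if match else "not found")
```

The match is case-sensitive, which is why the Troubleshooting section suggests setting the exact project name.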

Configuration

Override defaults with environment variables:

# Custom project name
export PROJECT_NAME="My Project"

# Custom ML backend URL (if not using default cluster service)
export ML_BACKEND_URL="http://custom-backend.example.com:9090"

# Run script
python3 scripts/connect-ml-backend.py

Defaults

Label Studio URL: https://label.boathou.se
ML Backend URL: http://ls-triton-adapter.apps.svc.cluster.local:9090
Project Name: SME 2025
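One way the environment overrides and defaults above could be resolved is sketched below. The `Settings` class and `load_settings` function are hypothetical names used for illustration; the actual script may organize this differently.

```python
import os
from dataclasses import dataclass

# Defaults as listed in this README.
DEFAULT_LABEL_STUDIO_URL = "https://label.boathou.se"
DEFAULT_ML_BACKEND_URL = "http://ls-triton-adapter.apps.svc.cluster.local:9090"
DEFAULT_PROJECT_NAME = "SME 2025"


@dataclass(frozen=True)
class Settings:
    label_studio_url: str
    api_key: str
    ml_backend_url: str
    project_name: str


def load_settings(env=None) -> Settings:
    """Build settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("LABEL_STUDIO_API_KEY", "")
    if not api_key:
        # The API key has no default; fail early with a clear message.
        raise RuntimeError("LABEL_STUDIO_API_KEY environment variable not set")
    return Settings(
        label_studio_url=env.get("LABEL_STUDIO_URL", DEFAULT_LABEL_STUDIO_URL).rstrip("/"),
        api_key=api_key,
        ml_backend_url=env.get("ML_BACKEND_URL", DEFAULT_ML_BACKEND_URL).rstrip("/"),
        project_name=env.get("PROJECT_NAME", DEFAULT_PROJECT_NAME),
    )
```

Passing a plain dict instead of `os.environ` makes the resolution easy to check without touching the real environment.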

ML Backend Capabilities

The ls-triton-adapter backend provides:

Named Entity Recognition (NER)

Model: DistilBERT base uncased
Task Type: Text annotation
Entities: Configured via ESC (vessel_name, IMO, IRCS, etc.)
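For orientation, Label Studio pre-annotations for text spans are JSON objects with a `result` list of labeled regions. The sketch below shows how NER spans could be mapped into that shape. It is not the adapter's actual code: the `Span` class, the `"example"` model version, and the `from_name`/`to_name` values (`"label"` and `"text"`) are assumptions, and the latter two must match the names in the project's labeling configuration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    start: int    # character offset where the entity begins
    end: int      # character offset just past the entity
    label: str    # entity label, e.g. "vessel_name" or "IMO"
    score: float  # model confidence for this span


def to_prediction(text, spans, from_name="label", to_name="text"):
    """Convert NER spans into a Label Studio style prediction dict."""
    results = [
        {
            "from_name": from_name,
            "to_name": to_name,
            "type": "labels",
            "value": {
                "start": s.start,
                "end": s.end,
                "text": text[s.start:s.end],
                "labels": [s.label],
            },
            "score": s.score,
        }
        for s in spans
    ]
    # Overall score: mean span confidence, or 0.0 when nothing was found.
    overall = sum(s.score for s in spans) / len(spans) if spans else 0.0
    return {"model_version": "example", "score": overall, "result": results}


# Hypothetical input text and spans.
text = "Vessel OCEAN STAR, IMO 9074729"
spans = [Span(7, 17, "vessel_name", 0.75), Span(23, 30, "IMO", 0.5)]
prediction = to_prediction(text, spans)
```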

PDF Document Extraction

Model: Docling Granite 258M
Task Type: Document processing
Extracts: Tables (TEDS 0.97), formulas (F1 0.968), code (F1 0.988), text (F1 0.84)

Using Pre-Annotations

After connecting the backend:

1. Import tasks into your project
   - Text files for NER
   - PDF URLs for document extraction
2. Generate predictions
   - UI: Settings → Machine Learning → "Retrieve predictions"
   - API: Use /api/projects/{project_id}/predictions endpoint
3. Review and correct
   - Pre-labels appear automatically
   - Human annotators review and correct
   - Corrections feed back into training pipeline
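As an illustration of the import step, Label Studio accepts tasks as a JSON list in which each task wraps its fields in a `data` object. The sketch below builds such a list for text and PDF tasks. The `build_tasks` helper, the `text` and `pdf` field names, and the example URL are assumptions; the field names must match the variables used in the project's labeling configuration.

```python
import json


def build_tasks(texts=(), pdf_urls=(), text_key="text", pdf_key="pdf"):
    """Build a Label Studio import payload: one {"data": {...}} dict per task."""
    tasks = [{"data": {text_key: t}} for t in texts]
    tasks += [{"data": {pdf_key: u}} for u in pdf_urls]
    return tasks


# Hypothetical inputs for illustration only.
tasks = build_tasks(
    texts=["Vessel OCEAN STAR, IMO 9074729"],
    pdf_urls=["https://example.com/registry.pdf"],
)
print(json.dumps(tasks, indent=2))
```

The resulting JSON can be saved to a file and imported through the project's Import dialog.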

Troubleshooting

"LABEL_STUDIO_API_KEY environment variable not set"

Get your API key from Label Studio:

Navigate to https://label.boathou.se
Click your profile → Account & Settings
Go to Access Token section
Copy the token and export it: export LABEL_STUDIO_API_KEY='token_here'

"Project 'SME 2025' not found"

List available projects:

from label_studio_sdk.client import LabelStudio
ls = LabelStudio(base_url='...', api_key='...')
for p in ls.projects.list():
    print(f"{p.title} (ID: {p.id})")

Or specify exact project name:

export PROJECT_NAME="My Exact Project Name"

"Could not reach ML backend directly"

This is expected when running from outside the cluster. The backend is accessible to Label Studio pods via internal cluster networking. To test manually:

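# From a machine with kubectl access to the cluster, forward the service port locally
kubectl port-forward -n apps svc/ls-triton-adapter 9090:9090 &
sleep 2  # give the port-forward a moment to start
curl http://localhost:9090/health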
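The same check can be done from Python. This is a minimal sketch using only the standard library; `backend_is_healthy` is a hypothetical helper, and it assumes the `/health` path used in the curl command above returns a 2xx status when the backend is up.

```python
import http.client
import urllib.error
import urllib.request


def backend_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if GET <base_url>/health answers with a 2xx status."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a 4xx/5xx response.
        return False
```

With the port-forward running, `backend_is_healthy("http://localhost:9090")` performs the same request as the curl command. Outside the cluster without a port-forward it returns `False`, matching the expected "Could not reach ML backend directly" message.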

"403 Forbidden" or "401 Unauthorized"

Verify API key is correct
Check token hasn't expired
Ensure user has project access permissions

Advanced Usage

Connect Multiple Projects

for project in "SME 2025" "Maritime Data" "Vessel Registry"; do
  export PROJECT_NAME="$project"
  python3 scripts/connect-ml-backend.py
done

Custom Backend Configuration

Edit the script to customize these fields:

is_interactive: Enable/disable live predictions during annotation
title: Display name for backend
description: Backend description shown in UI
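These fields would typically travel in the JSON body of the `POST /api/ml` request. The sketch below shows one plausible shape for that body. `build_backend_payload` and the default title are hypothetical, and the script's real values may differ.

```python
def build_backend_payload(
    project_id: int,
    url: str,
    title: str = "Triton ML Backend",  # hypothetical default display name
    description: str = "",
    is_interactive: bool = False,
) -> dict:
    """Assemble the JSON body for registering an ML backend on a project."""
    return {
        "project": project_id,
        "url": url,
        "title": title,
        "description": description,
        "is_interactive": is_interactive,
    }


payload = build_backend_payload(
    project_id=7,  # hypothetical project ID
    url="http://ls-triton-adapter.apps.svc.cluster.local:9090",
    is_interactive=True,
)
```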

API Reference

The script uses these Label Studio API endpoints:

GET /api/projects - List projects
GET /api/ml?project={id} - List ML backends for project
POST /api/ml - Add ML backend
PATCH /api/projects/{id} - Update project settings

Full API docs: https://api.labelstud.io
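For readers who want to call these endpoints without the SDK, the sketch below builds authenticated requests with the standard library. The `api_request` and `send` helpers are hypothetical. The `Authorization: Token <key>` header is an assumption that fits Label Studio's legacy access tokens; other token types may require a different scheme, so check the API docs for your version.

```python
import json
import urllib.parse
import urllib.request


def api_request(base_url, api_key, method, path, params=None, body=None):
    """Build (but do not send) an authenticated Label Studio API request."""
    url = base_url.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Authorization": f"Token {api_key}"}  # assumed token scheme
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request, timeout=10.0):
    """Send a prepared request and decode the JSON response, if any."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        raw = resp.read()
    return json.loads(raw) if raw else None


# Building requests for two of the endpoints above; nothing is sent here.
# The host, key, and project ID are placeholders.
list_backends = api_request(
    "https://label.example", "my-key", "GET", "/api/ml", params={"project": 7}
)
add_backend = api_request(
    "https://label.example", "my-key", "POST", "/api/ml",
    body={"project": 7, "url": "http://ls-triton-adapter.apps.svc.cluster.local:9090"},
)
```

Separating request construction from sending keeps the URL and header logic inspectable before anything touches the network.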

Integration with Training Pipeline

Tasks enter system
ML backend generates predictions
Human annotators review/correct
Corrections stored in HuggingFace dataset
Nightly GitHub Action retrains model
Updated model deployed to Triton
Improved predictions for next batch
