# ML Backend Connection Script
Programmatically connect the Triton ML backend to Label Studio projects.
## Recommended: Manual per-project connection
The recommended approach is to connect the ML backend from the Label Studio UI on a per-project basis:

- Project → Settings → Model → Connect model
- URL: `http://ls-triton-adapter.apps.svc.cluster.local:9090`
- Authentication: none

This avoids coupling infrastructure to a personal user token.
## Quick Start
```bash
# 1. Get your Label Studio API key
# Navigate to: https://label.boathou.se → Account & Settings → Access Token

# 2. Set environment variables
export LABEL_STUDIO_URL=https://label.boathou.se
export LABEL_STUDIO_API_KEY='your_token_here'

# 3. Install dependencies (one-time)
pip install label-studio-sdk requests

# 4. Run the script
python3 scripts/connect-ml-backend.py
```
## What This Script Does (optional)
- Connects to Label Studio using your user access token
- Finds your project by name (default: "SME 2025")
- Adds ML backend URL to project settings
- Enables pre-annotations for automatic labeling
- Tests backend health to verify connectivity
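The flow above can be sketched with plain HTTP calls against the endpoints listed under "API Reference" below. This is a rough illustration only (the requests are built but never sent, and the payload fields beyond `project` and `url` are omitted), not the actual script:

```python
import json
import os
import urllib.request

# Base URL falls back to this deployment's default from the README.
BASE = os.environ.get("LABEL_STUDIO_URL", "https://label.boathou.se")

def auth_headers(api_key: str) -> dict:
    # Label Studio expects a "Token <key>" Authorization header
    return {"Authorization": f"Token {api_key}",
            "Content-Type": "application/json"}

def list_projects_request(api_key: str) -> urllib.request.Request:
    """GET /api/projects - used to find the project by name."""
    return urllib.request.Request(f"{BASE}/api/projects",
                                  headers=auth_headers(api_key))

def connect_backend_request(api_key: str, project_id: int,
                            ml_url: str) -> urllib.request.Request:
    """POST /api/ml - attaches the ML backend to the project."""
    body = json.dumps({"project": project_id, "url": ml_url}).encode()
    return urllib.request.Request(f"{BASE}/api/ml", data=body,
                                  headers=auth_headers(api_key),
                                  method="POST")
```

Sending these with `urllib.request.urlopen` (or `requests`, which the Quick Start installs) against a live instance requires a valid token.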
## Configuration
Override defaults with environment variables:
```bash
# Custom project name
export PROJECT_NAME="My Project"

# Custom ML backend URL (if not using default cluster service)
export ML_BACKEND_URL="http://custom-backend.example.com:9090"

# Run script
python3 scripts/connect-ml-backend.py
```
### Defaults

- Label Studio URL: `https://label.boathou.se`
- ML Backend URL: `http://ls-triton-adapter.apps.svc.cluster.local:9090`
- Project Name: `SME 2025`
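The override-with-fallback behaviour can be sketched as follows; the function name is illustrative, not taken from the script:

```python
import os

def resolve_config(env=None) -> dict:
    """Resolve settings from environment variables, falling back to the
    defaults listed above when a variable is unset."""
    env = dict(os.environ) if env is None else dict(env)
    return {
        "label_studio_url": env.get("LABEL_STUDIO_URL",
                                    "https://label.boathou.se"),
        "ml_backend_url": env.get(
            "ML_BACKEND_URL",
            "http://ls-triton-adapter.apps.svc.cluster.local:9090"),
        "project_name": env.get("PROJECT_NAME", "SME 2025"),
    }
```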
## ML Backend Capabilities
The ls-triton-adapter backend provides:
### Named Entity Recognition (NER)
- Model: DistilBERT base uncased
- Task Type: Text annotation
- Entities: Configured via ESC (vessel_name, IMO, IRCS, etc.)
### PDF Document Extraction
- Model: Docling Granite 258M
- Task Type: Document processing
- Extracts: Tables (TEDS 0.97), formulas (F1 0.968), code (F1 0.988), text (F1 0.84)
## Using Pre-Annotations
After connecting the backend:
1. **Import tasks** into your project
   - Text files for NER
   - PDF URLs for document extraction
2. **Generate predictions**
   - UI: Settings → Machine Learning → "Retrieve predictions"
   - API: use the `/api/projects/{project_id}/predictions` endpoint
3. **Review and correct**
   - Pre-labels appear automatically
   - Human annotators review and correct
   - Corrections feed back into the training pipeline
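A pre-annotated NER task follows Label Studio's predictions JSON format and can be sketched like this. The `from_name`/`to_name` values must match your labeling config, so the names used here (`label`, `text`) and the example entity are assumptions:

```python
def make_ner_task(text: str, start: int, end: int, label: str) -> dict:
    """Build a task dict carrying one NER pre-annotation in Label Studio's
    predictions format. Field names assume a labeling config with a Labels
    tag named "label" applied to a Text tag named "text"."""
    return {
        "data": {"text": text},
        "predictions": [{
            "model_version": "distilbert-base-uncased",
            "result": [{
                "from_name": "label",   # must match the labeling config
                "to_name": "text",
                "type": "labels",
                "value": {
                    "start": start,
                    "end": end,
                    "text": text[start:end],
                    "labels": [label],
                },
            }],
        }],
    }

# Example: pre-label a vessel name span in a line of text
task = make_ner_task("MV OCEANID, IMO 9876543", 0, 10, "vessel_name")
```

Importing a list of such tasks (via the UI or the import API) makes the pre-labels appear for annotators to review.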
## Troubleshooting
"LABEL_STUDIO_API_KEY environment variable not set"
Get your API key from Label Studio:

1. Navigate to https://label.boathou.se
2. Click your profile → Account & Settings
3. Go to the Access Token section
4. Copy the token:

```bash
export LABEL_STUDIO_API_KEY='token_here'
```
"Project 'SME 2025' not found"
List available projects:

```python
from label_studio_sdk.client import LabelStudio

ls = LabelStudio(base_url='...', api_key='...')
for p in ls.projects.list():
    print(f"{p.title} (ID: {p.id})")
```
Or specify the exact project name:

```bash
export PROJECT_NAME="My Exact Project Name"
```
"Could not reach ML backend directly"
This is expected when running from outside the cluster. The backend is accessible to Label Studio pods via internal cluster networking. To test manually:

```bash
# Tunnel into the cluster, then hit the health endpoint locally
kubectl port-forward -n apps svc/ls-triton-adapter 9090:9090 &
curl http://localhost:9090/health
```
"403 Forbidden" or "401 Unauthorized"
- Verify the API key is correct
- Check that the token hasn't expired
- Ensure the user has project access permissions
## Advanced Usage
### Connect Multiple Projects
for project in "SME 2025" "Maritime Data" "Vessel Registry"; do
export PROJECT_NAME="$project"
python3 scripts/connect-ml-backend.py
done
### Custom Backend Configuration
Edit the script to customize:

- `is_interactive`: Enable/disable live predictions during annotation
- `title`: Display name for the backend
- `description`: Backend description shown in the UI
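These fields end up in the body of the `POST /api/ml` call that registers the backend; a minimal sketch of the payload (the helper name and default values here are illustrative):

```python
def ml_backend_payload(project_id: int, url: str,
                       title: str = "ls-triton-adapter",
                       description: str = "NER + PDF extraction backend",
                       is_interactive: bool = False) -> dict:
    """Request body for POST /api/ml, exposing the three customizable
    fields described above alongside the required project and url."""
    return {
        "project": project_id,
        "url": url,
        "title": title,
        "description": description,
        "is_interactive": is_interactive,
    }
```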
## API Reference
The script uses these Label Studio API endpoints:
- `GET /api/projects` - List projects
- `GET /api/ml?project={id}` - List ML backends for a project
- `POST /api/ml` - Add an ML backend
- `PATCH /api/projects/{id}` - Update project settings
Full API docs: https://api.labelstud.io
## Integration with Training Pipeline
1. Tasks enter the system
2. ML backend generates predictions
3. Human annotators review/correct
4. Corrections stored in HuggingFace dataset
5. Nightly GitHub Action retrains model
6. Updated model deployed to Triton
7. Improved predictions for next batch
## See Also
- `cluster/src/components/lsTritonAdapter.ts` - Backend implementation
- `OPERATIONS.md` - Full infrastructure guide
- `SME_READINESS.md` - Onboarding guide