Source: oceanid/index.md

Oceanid Infrastructure

Pulumi-powered GitOps stack for operating the Oceanid K3s fleet behind Cloudflare Zero Trust.

Overview

Oceanid is the data-processing and ML pipeline layer that cleans and validates data before it is promoted to the @ebisu GlobalDB (the maritime intelligence platform).

Infrastructure Architecture

Project   | Manages                                                      | Runs Where                 | Triggered By
cloud/    | Cloudflare DNS/Access, CrunchyBridge PostgreSQL, ESC secrets | GitHub Actions (OIDC)      | Push to cloud/**
cluster/  | K3s bootstrap, Flux, PKO, Cloudflare tunnels                 | Local / Self-hosted runner | Manual pulumi up
clusters/ | Application workloads (Label Studio, etc.)                   | Flux CD in-cluster         | Push to clusters/**
policy/   | OPA security policies, TypeScript helpers                    | GitHub Actions CI          | All PRs

Key Principle: Cloud resources (DNS, DB) are automated via CI. Cluster bootstrap requires kubeconfig and runs locally. Applications deploy via GitOps.
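
The routing in the table above can be sketched as a small lookup. This is only an illustration: `ProjectRoute`, `ROUTES`, and `deploysTriggeredByPush` are hypothetical names, not helpers from the Oceanid repository, and the real path filters live in the CI workflows and Flux configuration. policy/ is left out because its checks run on PRs rather than on pushes.

```typescript
// Hypothetical sketch of the table above; none of these names come from the repo.
type RunsWhere = "github-actions" | "local" | "flux";

interface ProjectRoute {
  prefix: string;         // top-level project directory
  runsWhere: RunsWhere;   // where the deploy executes
  pushTriggered: boolean; // true when a push under the prefix starts a deploy
}

const ROUTES: ProjectRoute[] = [
  { prefix: "cloud/", runsWhere: "github-actions", pushTriggered: true },
  // cluster/ needs a kubeconfig, so it is only ever run by hand (`pulumi up`).
  { prefix: "cluster/", runsWhere: "local", pushTriggered: false },
  { prefix: "clusters/", runsWhere: "flux", pushTriggered: true },
];

// Return the project prefixes whose automation a push touching these paths would start.
function deploysTriggeredByPush(changedPaths: string[]): string[] {
  return ROUTES
    .filter((route) => route.pushTriggered)
    .filter((route) => changedPaths.some((p) => p.indexOf(route.prefix) === 0))
    .map((route) => route.prefix);
}
```

For example, a push that touches both `cloud/` and `cluster/` files would start only the cloud deploy in this sketch; the cluster change still waits for a manual run.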

Data Pipeline

Raw CSV/PDF  →  Docling-Granite  →  ML Cleaning      →  Human Review
     ↓                ↓                  ↓                   ↓
Label Studio     Structure          csv-repair-bert     Corrections
                 Extraction

                      ↓ Promotion (audited)

            @ebisu GlobalDB (Production)
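
One way to picture the flow above is as an ordered list of stages in which every transition leaves an audit record. The sketch below is a hypothetical model, not the staging database schema: the stage names are taken from the diagram, while `StagedDocument`, `AuditEntry`, and `advance` are invented for illustration.

```typescript
// Hypothetical model of the pipeline diagram; types and function names are illustrative.
const STAGES = ["raw", "docling-granite", "ml-cleaning", "human-review", "promoted"] as const;
type Stage = (typeof STAGES)[number];

interface AuditEntry {
  from: Stage;
  to: Stage;
  actor: string; // who or what moved the document forward
}

interface StagedDocument {
  id: string;
  stage: Stage;
  audit: AuditEntry[];
}

// Move a document to the next stage and append an audit entry.
// Stages cannot be skipped, so promotion is only reachable after human review.
function advance(doc: StagedDocument, actor: string): StagedDocument {
  const index = STAGES.indexOf(doc.stage);
  if (index === STAGES.length - 1) {
    throw new Error(`${doc.id} is already promoted`);
  }
  const next = STAGES[index + 1];
  const entry: AuditEntry = { from: doc.stage, to: next, actor };
  return { id: doc.id, stage: next, audit: doc.audit.concat([entry]) };
}
```

Because `advance` returns a new object instead of mutating its input, each intermediate version of a document can be kept, which mirrors the "document versions + cleaning audit" role described for the staging DB.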

Components

  • Triton Inference (Calypso GPU): ML models for NER and PDF extraction
  • Label Studio: Annotation + review UI
  • Staging DB: Document versions + cleaning audit
  • Ingestion Worker: Automated CSV processing

Documentation

  • Setup Guides: /docs/guides/setup/ - ESC, GitHub tokens, secrets management
  • Operations: /docs/operations/ - ML backend, SQL playground setup
  • SME Guides: /docs/guides/SME/ - Subject matter expert documentation
  • ADRs: /docs/adr/ - Architecture decision records