
Source: ebisu/docs/adr/0002-vessel-trust-scoring-mdm.md

ADR-0002: Vessel Trust Scoring and Master Data Management

Status

Implemented

Context

The Ebisu vessel database is effectively a Master Data Management (MDM) system for global fishing vessels. With 40+ data sources of varying quality and authority, we need a systematic approach to:

  1. Track vessel presence across ALL sources (not just first appearance)
  2. Assess data quality and trustworthiness for AI training
  3. Handle conflicting information between sources
  4. Build confidence in vessel identity and attributes

This is critical for AI readiness: models trained on poor-quality or conflicting data produce unreliable predictions.

Decision

We implemented a comprehensive trust scoring system with these components:

  1. Source Authority Levels

    • AUTHORITATIVE: RFMOs, government registries
    • VERIFIED: Certification bodies, verified civil society
    • REPUTABLE: Established NGOs, research institutions
    • UNVERIFIED: New sources, unverified data
    • BLACKLIST: IUU lists, sanctions (negative authority)
  2. Trust Score Calculation (0.0-1.0)

    • 35% Source authority (more authoritative sources = higher trust)
    • 25% Identifier strength (IMO > IRCS+MMSI > IRCS > MMSI > Name)
    • 20% Data completeness (more fields = better)
    • 10% Data consistency (fewer conflicts = better)
    • 10% Temporal relevance (recent data = better)
  3. Always Track Source Presence

    • Every import records vessel presence in that source
    • Even if the vessel already exists from another source
    • Tracks which fields each source provides
    • Enables "vessel appears in X authoritative sources" queries
  4. Conflict Detection & Resolution

    • Automatically detects when sources disagree
    • Records all conflicting values with source attribution
    • Auto-resolves using the most authoritative source
    • Tracks unresolved conflicts for manual review
  5. AI Training Suitability

    • A vessel is marked AI-suitable if:
      • Trust score >= 0.7
      • Data completeness >= 0.6
      • Not on any blacklist
    • Provides confidence scores for ML models
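The weighted sum in item 2 and the thresholds in item 5 can be sketched in Python as follows. This is a minimal illustration, not the logic of calculate_vessel_trust_score(): it assumes every component has already been normalized to the 0.0-1.0 range, and the VesselEvidence type and the numeric value assigned to each identifier type are hypothetical. Only the weights, the identifier ordering, and the suitability thresholds come from this ADR.

```python
from dataclasses import dataclass
from typing import Dict

# Component weights from the ADR; they sum to 1.0, so the total stays in
# [0.0, 1.0] whenever every component is itself in [0.0, 1.0].
WEIGHTS: Dict[str, float] = {
    "source_authority": 0.35,
    "identifier_strength": 0.25,
    "data_completeness": 0.20,
    "data_consistency": 0.10,
    "temporal_relevance": 0.10,
}

# Hypothetical values; only the ordering IMO > IRCS+MMSI > IRCS > MMSI > Name
# comes from the ADR.
IDENTIFIER_STRENGTH: Dict[str, float] = {
    "IMO": 1.0,
    "IRCS+MMSI": 0.8,
    "IRCS": 0.6,
    "MMSI": 0.4,
    "NAME": 0.2,
}


@dataclass
class VesselEvidence:
    """Hypothetical, pre-normalized inputs for one vessel."""
    source_authority: float       # 0-1, higher = more authoritative sources
    best_identifier: str          # key into IDENTIFIER_STRENGTH
    data_completeness: float      # 0-1 share of tracked fields populated
    data_consistency: float       # 0-1, 1.0 = no conflicts between sources
    temporal_relevance: float     # 0-1, 1.0 = very recent data
    blacklist_source_count: int = 0


def score_components(e: VesselEvidence) -> Dict[str, float]:
    """Weighted contribution of each component (cf. score_components JSONB)."""
    raw = {
        "source_authority": e.source_authority,
        "identifier_strength": IDENTIFIER_STRENGTH[e.best_identifier],
        "data_completeness": e.data_completeness,
        "data_consistency": e.data_consistency,
        "temporal_relevance": e.temporal_relevance,
    }
    return {name: WEIGHTS[name] * value for name, value in raw.items()}


def trust_score(e: VesselEvidence) -> float:
    return sum(score_components(e).values())


def ai_training_suitable(e: VesselEvidence) -> bool:
    # ADR thresholds: trust >= 0.7, completeness >= 0.6, on no blacklist.
    return (
        trust_score(e) >= 0.7
        and e.data_completeness >= 0.6
        and e.blacklist_source_count == 0
    )
```

For example, a vessel with an IMO number, source authority 0.9, completeness 0.8, no conflicts (consistency 1.0) and temporal relevance 0.5 scores about 0.875 and passes the suitability check; the same vessel with a single blacklist hit fails it regardless of score. Returning the weighted contributions separately is what makes the score explainable in the way the score_components breakdown intends.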

Implementation Details

Database Schema

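-- Trust scores table
-- (column types are illustrative; migration 0009_vessel_trust_scoring.sql is authoritative)
CREATE TABLE vessel_trust_scores (
    vessel_uuid                UUID PRIMARY KEY,
    trust_score                NUMERIC(4,3),  -- Overall 0-1 score
    source_count               INTEGER,       -- Total sources
    authoritative_source_count INTEGER,
    blacklist_source_count     INTEGER,
    data_completeness          NUMERIC(4,3),  -- Field coverage
    data_consistency           NUMERIC(4,3),  -- Conflict measure
    ai_training_suitable       BOOLEAN,       -- AI-training flag
    score_components           JSONB          -- Per-component breakdown
);

-- Conflict tracking
CREATE TABLE vessel_data_conflicts (
    vessel_uuid       UUID NOT NULL,
    field_name        TEXT NOT NULL,
    "values"          JSONB,  -- Array of {source, value, date}; quoted to avoid clashing with the VALUES keyword
    resolution_method TEXT,
    resolved_value    TEXT
);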
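Resolution by source authority, as described under Conflict Detection & Resolution, might look like the sketch below. The observation dicts mirror the {source, value, date} entries stored in the values column, plus an assumed authority key; the numeric ranks and the resolution_method strings are hypothetical. When sources at the highest authority level disagree among themselves, the sketch leaves the conflict unresolved for manual review.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical numeric ranks; only the ordering of the ADR's authority levels is
# taken from the text, with BLACKLIST ranked below everything ("negative authority").
AUTHORITY_RANK: Dict[str, int] = {
    "AUTHORITATIVE": 4,
    "VERIFIED": 3,
    "REPUTABLE": 2,
    "UNVERIFIED": 1,
    "BLACKLIST": -1,
}

Observation = Dict[str, str]  # {"source", "authority", "value", "date"}


def detect_conflict(observations: List[Observation]) -> bool:
    """True when the sources report more than one distinct value."""
    return len({o["value"] for o in observations}) > 1


def resolve_conflict(observations: List[Observation]) -> Tuple[Optional[str], str]:
    """Return (resolved_value, resolution_method) for one vessel field.

    The value from the highest authority level wins; if sources at that level
    disagree, resolved_value is None and the conflict is left for manual review.
    "date" is carried for attribution but not used in this sketch.
    """
    if not observations:
        raise ValueError("need at least one observation")
    if not detect_conflict(observations):
        return observations[0]["value"], "no_conflict"
    top = max(AUTHORITY_RANK[o["authority"]] for o in observations)
    top_values = {o["value"] for o in observations
                  if AUTHORITY_RANK[o["authority"]] == top}
    if len(top_values) == 1:
        return next(iter(top_values)), "most_authoritative_source"
    return None, "manual_review"
```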

Key Functions

  • calculate_vessel_trust_score() - Computes trust for a vessel
  • record_vessel_source_presence() - Always tracks source appearance
  • find_or_create_vessel_with_trust() - Enhanced matching with trust
  • detect_vessel_data_conflicts() - Identifies disagreements
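The always-record behavior of record_vessel_source_presence() and the "vessel appears in X authoritative sources" query can be illustrated with an in-memory registry. SourcePresenceRegistry and its methods are hypothetical stand-ins for the SQL function and its tables, not their actual interface.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class SourcePresenceRegistry:
    """Hypothetical in-memory stand-in for source presence tracking."""

    def __init__(self) -> None:
        # vessel_uuid -> source_id -> field names that source has provided
        self._presence: Dict[str, Dict[str, Set[str]]] = defaultdict(dict)
        self._authority: Dict[str, str] = {}  # source_id -> authority level

    def register_source(self, source_id: str, authority: str) -> None:
        self._authority[source_id] = authority

    def record(self, vessel_uuid: str, source_id: str, fields: Iterable[str]) -> None:
        # Record on every import, even when the vessel already exists from
        # another source; repeat imports from one source merge their fields.
        self._presence[vessel_uuid].setdefault(source_id, set()).update(fields)

    def sources_for(self, vessel_uuid: str) -> List[str]:
        return sorted(self._presence.get(vessel_uuid, {}))

    def fields_from(self, vessel_uuid: str, source_id: str) -> Set[str]:
        return set(self._presence.get(vessel_uuid, {}).get(source_id, set()))

    def authoritative_source_count(self, vessel_uuid: str) -> int:
        return sum(1 for s in self._presence.get(vessel_uuid, {})
                   if self._authority.get(s) == "AUTHORITATIVE")

    def vessels_in_authoritative_sources(self, minimum: int) -> List[str]:
        # Analogue of "vessel appears in X authoritative sources".
        return sorted(v for v in self._presence
                      if self.authoritative_source_count(v) >= minimum)
```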

Consequences

Positive

  • AI Training Quality: Can select high-trust vessels for training
  • Explainable Trust: Score components show why a vessel is trusted
  • Source Attribution: Know exactly which sources report a vessel
  • Conflict Visibility: See where sources disagree
  • Temporal Tracking: Understand vessel appearance/disappearance patterns

Negative

  • Processing Overhead: Trust calculation adds ~20% to import time
  • Storage Growth: More metadata stored per vessel
  • Complexity: Importers must use new trust-aware functions

For AI/ML

  • Training Data Selection: Use the ai_training_vessels view
  • Confidence Weighting: Use trust scores as sample weights
  • Feature Engineering: Source count, authority mix as features
  • Data Quality Metrics: Built-in completeness/consistency scores
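A rough sketch of how trust-score rows could feed a model: source count and authority mix become features, and trust_score becomes the per-sample weight (for example, the sample_weight argument that many scikit-learn estimators accept in fit()). The row dicts reuse the vessel_trust_scores column names; build_training_set() and its min_trust filter are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]  # keys follow the vessel_trust_scores columns


def build_training_set(rows: List[Row], min_trust: float = 0.7
                       ) -> Tuple[List[List[float]], List[float]]:
    """Return (features, sample_weights) for vessels that qualify for training.

    Rows not flagged ai_training_suitable, or below min_trust, are skipped.
    """
    features: List[List[float]] = []
    weights: List[float] = []
    for row in rows:
        if not row["ai_training_suitable"] or row["trust_score"] < min_trust:
            continue
        total = row["source_count"]
        # Authority mix as the share of sources that are authoritative.
        authoritative_share = row["authoritative_source_count"] / total if total else 0.0
        features.append([
            float(total),
            authoritative_share,
            float(row["data_completeness"]),
            float(row["data_consistency"]),
        ])
        weights.append(float(row["trust_score"]))  # confidence weighting
    return features, weights
```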

Example Usage

-- Get AI-ready vessels
SELECT * FROM ai_training_vessels
WHERE trust_score >= 0.8;

-- Check vessel trustworthiness
SELECT * FROM get_vessel_trust_summary('vessel-uuid');

-- Find vessels in multiple authoritative sources
SELECT v.*, vts.source_count, vts.trust_score
FROM vessels v
JOIN vessel_trust_scores vts ON v.vessel_uuid = vts.vessel_uuid
WHERE vts.authoritative_source_count >= 3
ORDER BY vts.trust_score DESC;

Migration Impact

All vessel importers must be updated to:

  1. Use find_or_create_vessel_with_trust() instead of basic matching
  2. Pass field presence arrays for completeness tracking
  3. Call trust scoring after import batches
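The three migration steps can be sketched as a batch importer. Everything here is hypothetical: BatchImporter, the record dicts, and the in-memory index stand in for find_or_create_vessel_with_trust() and the presence tables, and the matching rule (try identifiers in the ADR's IMO > IRCS+MMSI > IRCS > MMSI > Name order and reuse the first hit) is an assumption about what "enhanced matching" involves. The rescore callback represents calculate_vessel_trust_score().

```python
import uuid
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Identifier precedence from the ADR: IMO > IRCS+MMSI > IRCS > MMSI > Name.
MATCH_KEYS: List[Tuple[str, ...]] = [
    ("imo",), ("ircs", "mmsi"), ("ircs",), ("mmsi",), ("name",),
]

Key = Tuple[Tuple[str, ...], Tuple[str, ...]]


class BatchImporter:
    """Hypothetical importer following the three migration steps."""

    def __init__(self, rescore: Callable[[str], None]) -> None:
        self.rescore = rescore                       # stands in for trust scoring
        self.index: Dict[Key, str] = {}              # identifier key -> vessel_uuid
        self.presence: Dict[Tuple[str, str], Set[str]] = {}  # (vessel, source) -> fields

    def _keys(self, record: Dict[str, str]) -> Iterator[Key]:
        # Identifier keys the record actually carries, strongest first.
        for names in MATCH_KEYS:
            if all(record.get(n) for n in names):
                yield names, tuple(record[n] for n in names)

    def find_or_create(self, record: Dict[str, str]) -> str:
        # Step 1: reuse the vessel matched by the strongest known identifier.
        keys = list(self._keys(record))
        vessel_uuid = next((self.index[k] for k in keys if k in self.index), None)
        if vessel_uuid is None:
            vessel_uuid = str(uuid.uuid4())
        for k in keys:
            self.index.setdefault(k, vessel_uuid)  # learn new identifiers
        return vessel_uuid

    def import_batch(self, source_id: str, records: List[Dict[str, str]]) -> Set[str]:
        touched: Set[str] = set()
        for record in records:
            vessel_uuid = self.find_or_create(record)
            # Step 2: pass the populated fields along for completeness tracking.
            fields = {name for name, value in record.items() if value}
            self.presence.setdefault((vessel_uuid, source_id), set()).update(fields)
            touched.add(vessel_uuid)
        # Step 3: rescore once per touched vessel after the batch.
        for vessel_uuid in sorted(touched):
            self.rescore(vessel_uuid)
        return touched
```

Deferring the rescore to the end of the batch means a vessel that appears several times in one batch is scored once rather than once per record.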

Success Metrics

  • 80%+ of vessels have trust scores calculated
  • 60%+ of vessels marked AI-training suitable
  • <5% of vessels with unresolved conflicts
  • Average trust score > 0.7 across the database
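The four targets can be checked with a small helper. Two choices here are assumptions rather than definitions from this ADR: every share uses the total vessel count as its denominator, and the average trust score is taken over scored vessels only. success_metrics() and its inputs are hypothetical.

```python
from statistics import fmean
from typing import Any, Callable, Dict, List, Set, Tuple

# Targets from the ADR's Success Metrics list.
TARGETS: Dict[str, Callable[[float], bool]] = {
    "scored_share": lambda x: x >= 0.80,
    "ai_suitable_share": lambda x: x >= 0.60,
    "unresolved_conflict_share": lambda x: x < 0.05,
    "average_trust_score": lambda x: x > 0.70,
}


def success_metrics(total_vessels: int, score_rows: List[Dict[str, Any]],
                    unresolved_conflict_vessels: Set[str]
                    ) -> Tuple[Dict[str, float], Dict[str, bool]]:
    """Return (metric values, target met?) for the four success metrics.

    score_rows holds one dict per scored vessel with trust_score and
    ai_training_suitable keys.
    """
    if total_vessels <= 0:
        raise ValueError("total_vessels must be positive")
    suitable = sum(1 for r in score_rows if r["ai_training_suitable"])
    metrics = {
        "scored_share": len(score_rows) / total_vessels,
        "ai_suitable_share": suitable / total_vessels,
        "unresolved_conflict_share": len(unresolved_conflict_vessels) / total_vessels,
        "average_trust_score": (fmean(r["trust_score"] for r in score_rows)
                                if score_rows else 0.0),
    }
    met = {name: check(metrics[name]) for name, check in TARGETS.items()}
    return metrics, met
```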

References

  • Migration: 0009_vessel_trust_scoring.sql
  • Functions: vessel_trust_functions.sql
  • Related: ADR-0001 (Vessel Import Strategy)