Source: ebisu/docs/adr/0002-vessel-trust-scoring-mdm.md
ADR-0002: Vessel Trust Scoring and Master Data Management
Status
Implemented
Context
The Ebisu vessel database is effectively a Master Data Management (MDM) system for global fishing vessels. With 40+ data sources of varying quality and authority, we need a systematic approach to:
- Track vessel presence across ALL sources (not just first appearance)
- Assess data quality and trustworthiness for AI training
- Handle conflicting information between sources
- Build confidence in vessel identity and attributes
This is critical for AI readiness because training models on poor-quality or conflicting data leads to unreliable predictions.
Decision
We implemented a comprehensive trust scoring system with these components:
- Source Authority Levels
  - AUTHORITATIVE: RFMOs, government registries
  - VERIFIED: Certification bodies, verified civil society
  - REPUTABLE: Established NGOs, research institutions
  - UNVERIFIED: New sources, unverified data
  - BLACKLIST: IUU lists, sanctions (negative authority)
- Trust Score Calculation (0.0-1.0)
  - 35% Source authority (more authoritative sources = higher trust)
  - 25% Identifier strength (IMO > IRCS+MMSI > IRCS > MMSI > Name)
  - 20% Data completeness (more fields = better)
  - 10% Data consistency (fewer conflicts = better)
  - 10% Temporal relevance (recent data = better)
- Always Track Source Presence
  - Every import records vessel presence in that source, even if the vessel already exists from another source
  - Tracks which fields each source provides
  - Enables "vessel appears in X authoritative sources" queries
- Conflict Detection & Resolution
  - Automatically detects when sources disagree
  - Records all conflicting values with source attribution
  - Auto-resolves using most authoritative source
  - Tracks unresolved conflicts for manual review
- AI Training Suitability
  - Vessels are marked AI-suitable if:
    - Trust score >= 0.7
    - Data completeness >= 0.6
    - Not on any blacklist
  - Provides confidence scores for ML models
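The weighting and suitability rules above can be sketched in a few lines of Python. This is an illustration only, not the SQL in `calculate_vessel_trust_score()`: the function names, the assumption that each component is already normalized to 0.0-1.0, and the numeric identifier-strength values are all hypothetical (the ADR specifies only the ordering IMO > IRCS+MMSI > IRCS > MMSI > Name).

```python
# Component weights taken from the Trust Score Calculation list.
WEIGHTS = {
    "source_authority": 0.35,
    "identifier_strength": 0.25,
    "data_completeness": 0.20,
    "data_consistency": 0.10,
    "temporal_relevance": 0.10,
}

# ASSUMPTION: illustrative values only. The ADR gives the ordering
# IMO > IRCS+MMSI > IRCS > MMSI > Name, not the numbers.
IDENTIFIER_STRENGTH = {
    "IMO": 1.0,
    "IRCS+MMSI": 0.8,
    "IRCS": 0.6,
    "MMSI": 0.4,
    "NAME": 0.2,
}


def trust_score(components):
    """Weighted sum of the five component scores, each expected in [0.0, 1.0]."""
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(WEIGHTS[name] * components[name] for name in WEIGHTS)


def is_ai_training_suitable(score, completeness, blacklist_source_count):
    """Apply the three suitability rules: score, completeness, no blacklist."""
    return score >= 0.7 and completeness >= 0.6 and blacklist_source_count == 0


# A vessel with an IMO number from an authoritative source, but sparse fields.
components = {
    "source_authority": 1.0,
    "identifier_strength": IDENTIFIER_STRENGTH["IMO"],
    "data_completeness": 0.5,
    "data_consistency": 1.0,
    "temporal_relevance": 0.5,
}
score = trust_score(components)
suitable = is_ai_training_suitable(score, components["data_completeness"], 0)
```

In this example the trust score clears the 0.7 threshold, yet the vessel is still not AI-suitable because its completeness is below 0.6. The two thresholds are checked independently.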
Implementation Details
Database Schema
```sql
-- Trust scores table
vessel_trust_scores (
    vessel_uuid,
    trust_score,                 -- Overall 0-1 score
    source_count,                -- Total sources
    authoritative_source_count,
    blacklist_source_count,
    data_completeness,           -- Field coverage
    data_consistency,            -- Conflict measure
    ai_training_suitable,        -- Boolean flag
    score_components             -- JSONB breakdown
)

-- Conflict tracking
vessel_data_conflicts (
    vessel_uuid,
    field_name,
    values,                      -- JSONB array of {source, value, date}
    resolution_method,
    resolved_value
)
```
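To illustrate how a row of `vessel_data_conflicts` might be populated, here is a minimal Python sketch of "auto-resolve using the most authoritative source". It is not the implementation of `detect_vessel_data_conflicts()`. The function name, the source names, the `most_authoritative_source` label, the numeric ranks, and the choice to rank BLACKLIST lowest and leave ties at the top level unresolved are all assumptions.

```python
# ASSUMPTION: numeric ranks are illustrative; the ADR lists only the level names.
AUTHORITY_RANK = {
    "AUTHORITATIVE": 4,
    "VERIFIED": 3,
    "REPUTABLE": 2,
    "UNVERIFIED": 1,
    "BLACKLIST": 0,
}


def resolve_conflict(vessel_uuid, field_name, reports):
    """Build a conflict record from per-source reports of one field.

    Each report is a dict with keys: source, authority, value, date.
    Returns None when all sources agree.
    """
    if len({r["value"] for r in reports}) <= 1:
        return None  # no disagreement, nothing to record

    top_rank = max(AUTHORITY_RANK[r["authority"]] for r in reports)
    top_values = {
        r["value"] for r in reports if AUTHORITY_RANK[r["authority"]] == top_rank
    }
    # Resolve only if the most authoritative sources agree among themselves;
    # otherwise leave the conflict open for manual review.
    resolved = len(top_values) == 1

    return {
        "vessel_uuid": vessel_uuid,
        "field_name": field_name,
        "values": [
            {"source": r["source"], "value": r["value"], "date": r["date"]}
            for r in reports
        ],
        "resolution_method": "most_authoritative_source" if resolved else None,
        "resolved_value": next(iter(top_values)) if resolved else None,
    }


reports = [
    {"source": "rfmo_registry", "authority": "AUTHORITATIVE",
     "value": "ESP", "date": "2024-01-10"},
    {"source": "ngo_list", "authority": "REPUTABLE",
     "value": "PRT", "date": "2024-03-02"},
]
conflict = resolve_conflict("vessel-uuid", "flag", reports)
```

Every reported value is kept with its source attribution even after resolution, so a reviewer can see what was overridden and by which source.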
Key Functions
- `calculate_vessel_trust_score()`: Computes trust for a vessel
- `record_vessel_source_presence()`: Always tracks source appearance
- `find_or_create_vessel_with_trust()`: Enhanced matching with trust
- `detect_vessel_data_conflicts()`: Identifies disagreements
Consequences
Positive
- AI Training Quality: Can select high-trust vessels for training
- Explainable Trust: Score components show why a vessel is trusted
- Source Attribution: Know exactly which sources report a vessel
- Conflict Visibility: See where sources disagree
- Temporal Tracking: Understand vessel appearance/disappearance patterns
Negative
- Processing Overhead: Trust calculation adds ~20% to import time
- Storage Growth: More metadata stored per vessel
- Complexity: Importers must use new trust-aware functions
For AI/ML
- Training Data Selection: Use the `ai_training_vessels` view
- Confidence Weighting: Use trust scores as sample weights
- Feature Engineering: Source count, authority mix as features
- Data Quality Metrics: Built-in completeness/consistency scores
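As a sketch of the confidence-weighting and feature-engineering ideas above, the snippet below turns trust scores into per-sample weights and derives an authority-mix feature. The helper names and the mean-1 normalization are assumptions made for illustration; the column names come from `vessel_trust_scores`.

```python
def normalized_sample_weights(trust_scores):
    """Scale trust scores so the weights average to 1.0 across the batch."""
    total = sum(trust_scores)
    if total <= 0:
        raise ValueError("at least one trust score must be positive")
    n = len(trust_scores)
    return [score * n / total for score in trust_scores]


def weighted_mean_error(errors, weights):
    """Mean per-sample error, with each sample weighted by its trust."""
    return sum(e * w for e, w in zip(errors, weights)) / sum(weights)


def authority_mix(row):
    """Share of a vessel's sources that are authoritative (0.0 if none)."""
    if row["source_count"] == 0:
        return 0.0
    return row["authoritative_source_count"] / row["source_count"]


trust_scores = [0.9, 0.8, 0.7]
weights = normalized_sample_weights(trust_scores)

# Hypothetical per-vessel prediction errors from some model.
errors = [0.1, 0.2, 0.4]
loss = weighted_mean_error(errors, weights)

feature = authority_mix({"source_count": 4, "authoritative_source_count": 3})
```

Many ML libraries accept such per-sample weights directly. For example, most scikit-learn estimators take a `sample_weight` argument in `fit()`.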
Example Usage
```sql
-- Get AI-ready vessels
SELECT * FROM ai_training_vessels
WHERE trust_score >= 0.8;

-- Check vessel trustworthiness
SELECT * FROM get_vessel_trust_summary('vessel-uuid');

-- Find vessels in multiple authoritative sources
SELECT v.*, vts.source_count, vts.trust_score
FROM vessels v
JOIN vessel_trust_scores vts ON v.vessel_uuid = vts.vessel_uuid
WHERE vts.authoritative_source_count >= 3
ORDER BY vts.trust_score DESC;
```
Migration Impact
All vessel importers must be updated to:
- Use `find_or_create_vessel_with_trust()` instead of basic matching
- Pass field presence arrays for completeness tracking
- Call trust scoring after import batches
Success Metrics
- 80%+ of vessels have trust scores calculated
- 60%+ of vessels marked AI-training suitable
- <5% of vessels with unresolved conflicts
- Average trust score > 0.7 across the database
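One way these targets could be checked against exported rows is sketched below. The row shape is an assumption: `trust_score` is `None` when no score has been calculated, and `unresolved_conflicts` is a hypothetical per-vessel count that is not part of the schema shown earlier. The average is taken over scored vessels only, which is also an assumption.

```python
def success_metrics(vessels):
    """Evaluate the four success targets over a list of vessel rows."""
    total = len(vessels)
    if total == 0:
        raise ValueError("no vessels to evaluate")

    scored = [v for v in vessels if v["trust_score"] is not None]
    pct_scored = len(scored) / total
    pct_suitable = sum(1 for v in vessels if v["ai_training_suitable"]) / total
    pct_unresolved = sum(1 for v in vessels if v["unresolved_conflicts"] > 0) / total
    # ASSUMPTION: unscored vessels are excluded from the average.
    avg_trust = sum(v["trust_score"] for v in scored) / len(scored) if scored else 0.0

    return {
        "scored_ok": pct_scored >= 0.80,
        "suitable_ok": pct_suitable >= 0.60,
        "conflicts_ok": pct_unresolved < 0.05,
        "avg_trust_ok": avg_trust > 0.7,
    }


vessels = [
    {"trust_score": 0.90, "ai_training_suitable": True, "unresolved_conflicts": 0},
    {"trust_score": 0.80, "ai_training_suitable": True, "unresolved_conflicts": 0},
    {"trust_score": 0.75, "ai_training_suitable": True, "unresolved_conflicts": 0},
    {"trust_score": None, "ai_training_suitable": False, "unresolved_conflicts": 1},
]
report = success_metrics(vessels)
```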
References
- Migration: `0009_vessel_trust_scoring.sql`
- Functions: `vessel_trust_functions.sql`
- Related: ADR-0001 (Vessel Import Strategy)