Source:
ebisu/docs/adr/0056-staged-intelligence-import.md
ADR-056: Staged Intelligence Import Architecture
Status
Proposed
Context
The current RFMO (Regional Fisheries Management Organisation) import system has a fundamental flaw: it attempts real-time vessel matching during each individual RFMO import. This creates several critical problems:
- Vessel Identity Confusion: Distinct vessels that share a name are incorrectly merged, within a single RFMO as well as across RFMOs (e.g., the "Salwa" vessels in ICCAT)
- Import Order Dependency: The first RFMO imported becomes the "truth" for subsequent imports
- Massive Data Loss: 33.7% of cleaned data is lost during import due to matching failures
- Intelligence Loss: Temporal and cross-source patterns are destroyed by premature matching
Example of Current Problem
ICCAT has vessel "Salwa" with:
- ICCAT Serial AT0046355, Owner: Ridha dridi
- ICCAT Serial AT0046525, Owner: mohamed grafi
The current system treats these as the same vessel because name and flag match, losing critical intelligence about potentially different vessels or ownership changes.
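The collapse can be reproduced in a few lines. This is a minimal sketch, not the current importer's code: `group_by` is a hypothetical helper, and the flag value `"XX"` is a placeholder because the example above does not state the flag.

```python
from collections import defaultdict

# The two ICCAT "Salwa" records from the example above.
# "XX" is a placeholder flag: the source does not state the real one.
records = [
    {"name": "Salwa", "flag": "XX", "iccat_serial": "AT0046355", "owner": "Ridha dridi"},
    {"name": "Salwa", "flag": "XX", "iccat_serial": "AT0046525", "owner": "mohamed grafi"},
]


def group_by(rows, key_fields):
    """Group rows by the tuple of values found in key_fields."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in key_fields)].append(row)
    return dict(groups)


# Matching on name+flag merges both records into one "vessel".
merged = group_by(records, ("name", "flag"))
# Keeping the source-specific serial in the key preserves both records.
separate = group_by(records, ("name", "flag", "iccat_serial"))

print(len(merged))    # 1
print(len(separate))  # 2
```

Once the two rows share a key, the second owner is either dropped or overwrites the first, which is the intelligence loss described above.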
Decision
Implement a Staged Intelligence Import Architecture that mirrors intelligence community best practices:
Stage 1: Raw Intelligence Collection
- Import ALL RFMO data as separate intelligence reports
- NO vessel matching during import
- Each report gets a unique intelligence_report_id
- Preserve every data point exactly as reported
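Stage 1 can be sketched as a loader that wraps every source row in its own report and does nothing else. The class and function names below are hypothetical, a plain string stands in for the source UUID, and the report date is made up for illustration.

```python
import copy
import uuid
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IntelligenceReport:
    report_id: uuid.UUID
    source_id: str      # assumption: a plain string stands in for the source UUID
    report_date: date
    raw_data: dict      # exact data as reported, never normalized at this stage


def collect_reports(source_id, report_date, rows):
    """Wrap every source row in its own report: no matching, no deduplication."""
    return [
        IntelligenceReport(
            report_id=uuid.uuid4(),
            source_id=source_id,
            report_date=report_date,
            raw_data=copy.deepcopy(row),  # keep an untouched copy of the row
        )
        for row in rows
    ]


rows = [
    {"Vessel Name": "Salwa", "ICCAT Serial": "AT0046355", "Owner": "Ridha dridi"},
    {"Vessel Name": "Salwa", "ICCAT Serial": "AT0046525", "Owner": "mohamed grafi"},
]
# The date is illustrative only.
reports = collect_reports("ICCAT", date(2024, 1, 1), rows)
```

Because the loader never compares rows with each other, import order cannot influence the result.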
Stage 2: Cross-Source Identity Resolution
- Analyze ALL collected intelligence to identify vessel entities
- Use hierarchical matching: IMO > IRCS > MMSI > Name+Flag+Context
- Generate confidence scores for each identity resolution
- Preserve conflicts as intelligence indicators
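One possible reading of the hierarchy is that the strongest identifier present on both records decides the outcome. The sketch below follows that reading. The confidence values, the `source` field, and the rule that a differing source-specific ID within one source blocks a name+flag match are all assumptions made for illustration, not part of this ADR.

```python
from typing import Optional, Tuple

# Identifier tiers in priority order. The confidence values are illustrative.
TIERS = [("imo", 0.99), ("ircs", 0.90), ("mmsi", 0.80)]
NAME_FLAG_CONFIDENCE = 0.50


def norm(value: Optional[str]) -> Optional[str]:
    """Normalize for comparison only; the stored data stays untouched."""
    return value.strip().upper() if value else None


def match(a: dict, b: dict) -> Optional[Tuple[str, float]]:
    """Return (method, confidence) for a pair of records, or None if no match."""
    for field, confidence in TIERS:
        va, vb = norm(a.get(field)), norm(b.get(field))
        if va and vb:
            # The strongest identifier both records carry decides the outcome;
            # a disagreement here is never overridden by a weaker tier.
            return (field.upper(), confidence) if va == vb else None

    name_a, name_b = norm(a.get("vessel_name")), norm(b.get("vessel_name"))
    flag_a, flag_b = norm(a.get("flag_code")), norm(b.get("flag_code"))
    if not (name_a and name_a == name_b and flag_a and flag_a == flag_b):
        return None

    # Context check (assumption): one source listing two different
    # source-specific IDs is treating the records as two vessels.
    same_source = a.get("source") is not None and a.get("source") == b.get("source")
    if same_source and a.get("rfmo_vessel_id") != b.get("rfmo_vessel_id"):
        return None
    return ("NAME_FLAG_CONTEXT", NAME_FLAG_CONFIDENCE)


salwa_1 = {"vessel_name": "Salwa", "flag_code": "XX", "source": "ICCAT",
           "rfmo_vessel_id": "AT0046355"}
salwa_2 = {"vessel_name": "Salwa", "flag_code": "XX", "source": "ICCAT",
           "rfmo_vessel_id": "AT0046525"}
print(match(salwa_1, salwa_2))  # None
```

Under these assumptions the two ICCAT "Salwa" records stay separate, while the same name and flag reported by two different sources would still produce a low-confidence candidate match.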
Stage 3: Trust Scoring and Pattern Analysis
- Calculate trust scores based on source agreement
- Identify deception patterns (conflicting identities)
- Flag potential IUU (illegal, unreported, and unregulated) fishing indicators (rapid flag changes, conflicting data)
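The ADR does not define the scoring formula. As a minimal sketch, trust could be the share of reporting sources that agree on the most common value, and a "rapid" flag change could mean several changes inside a time window. Both definitions, the thresholds, and all sample values below are assumptions.

```python
from collections import Counter
from datetime import date


def trust_score(values_by_source: dict) -> float:
    """Share of reporting sources that agree with the most common value."""
    reported = [v for v in values_by_source.values() if v is not None]
    if not reported:
        return 0.0
    top_count = Counter(reported).most_common(1)[0][1]
    return round(top_count / len(reported), 2)


def flag_changes(history):
    """history: list of (date, flag). Return the dates on which the flag changed."""
    ordered = sorted(history)
    return [d for (_, prev), (d, cur) in zip(ordered, ordered[1:]) if cur != prev]


def has_rapid_flag_changes(history, window_days=365, min_changes=2):
    """True if at least min_changes flag changes fall within window_days."""
    changes = flag_changes(history)
    for i in range(len(changes) - min_changes + 1):
        if (changes[i + min_changes - 1] - changes[i]).days <= window_days:
            return True
    return False


# Hypothetical sources and flag codes, for illustration only.
print(trust_score({"src_a": "AA", "src_b": "AA", "src_c": "BB"}))  # 0.67

history = [(date(2023, 1, 1), "AA"), (date(2023, 4, 1), "BB"), (date(2023, 9, 1), "CC")]
print(has_rapid_flag_changes(history))  # True
```

A low trust score or a positive rapid-change check is a prompt for analyst review, not proof of deception.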
Implementation Plan
Phase 1: Raw Intelligence Tables
-- Raw intelligence reports (no vessel matching)
CREATE TABLE intelligence_reports (
    report_id UUID PRIMARY KEY,
    source_id UUID REFERENCES original_sources_vessels,
    report_date DATE,
    raw_data JSONB, -- Exact data as reported
    created_at TIMESTAMP DEFAULT NOW()
);
-- Vessel intelligence extracted from reports
CREATE TABLE vessel_intelligence (
    intelligence_id UUID PRIMARY KEY,
    report_id UUID REFERENCES intelligence_reports,
    vessel_name TEXT,
    imo TEXT,
    ircs TEXT,
    mmsi TEXT,
    flag_code TEXT,
    rfmo_vessel_id TEXT, -- Source-specific ID
    additional_data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);
Phase 2: Identity Resolution Engine
-- Cross-source vessel identity resolution
CREATE TABLE vessel_identity_clusters (
    cluster_id UUID PRIMARY KEY,
    master_vessel_uuid UUID, -- Final resolved vessel
    confidence_score DECIMAL(3,2),
    resolution_method TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
-- Links intelligence to resolved identities
CREATE TABLE intelligence_to_identity (
    intelligence_id UUID REFERENCES vessel_intelligence,
    cluster_id UUID REFERENCES vessel_identity_clusters,
    match_confidence DECIMAL(3,2),
    match_reason TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
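To illustrate how rows for the two tables above might be produced, the sketch below groups intelligence items that share any strong identifier using a union-find structure. This is one possible clustering technique, not the engine this ADR specifies. It merges transitively and ignores tier precedence, which a real engine would have to reconcile. The function name and the sample identifiers are made up.

```python
import uuid


def cluster_by_identifiers(items, fields=("imo", "ircs", "mmsi")):
    """Return one intelligence_to_identity-style link per item."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    # Items sharing a (field, value) pair end up in the same set.
    first_seen = {}
    for idx, item in enumerate(items):
        for field in fields:
            value = item.get(field)
            if value:
                key = (field, value)
                if key in first_seen:
                    union(first_seen[key], idx)
                else:
                    first_seen[key] = idx

    # Assign one cluster_id per root and emit the link rows.
    cluster_ids = {}
    links = []
    for idx, item in enumerate(items):
        root = find(idx)
        if root not in cluster_ids:
            cluster_ids[root] = str(uuid.uuid4())
        links.append({"intelligence_id": item["intelligence_id"],
                      "cluster_id": cluster_ids[root]})
    return links


# Made-up identifiers for illustration.
items = [
    {"intelligence_id": "a", "imo": "1234567"},
    {"intelligence_id": "b", "imo": "1234567", "mmsi": "999000111"},
    {"intelligence_id": "c", "mmsi": "999000111"},
    {"intelligence_id": "d", "ircs": "ZZZZ"},
]
links = cluster_by_identifiers(items)
```

Items a, b, and c land in one cluster because b bridges the IMO and MMSI values, and d gets a cluster of its own. Since the raw intelligence is never modified, the links can be discarded and regenerated whenever the matching rules change.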
Phase 3: Conflict and Pattern Analysis
-- Identity conflicts for intelligence analysis
CREATE TABLE identity_conflicts (
    conflict_id UUID PRIMARY KEY,
    cluster_id UUID REFERENCES vessel_identity_clusters,
    conflict_type TEXT, -- 'FLAG_CHANGE', 'NAME_CONFLICT', 'OWNER_CHANGE'
    conflicting_intelligence JSONB[],
    risk_indicators TEXT[],
    analyst_notes TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
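As a rough sketch of how rows for identity_conflicts could be derived, the function below compares the values reported within one cluster and emits a conflict whenever a field carries more than one distinct value. The field-to-type mapping is an assumption, as is the idea that the owner lives in additional_data (the schema does not say where it is stored). The flag `"XX"` and the cluster ID are placeholders.

```python
import uuid

# Maps a vessel_intelligence field to the conflict_type it raises.
# Assumption: "owner" is stored inside additional_data.
FIELD_TO_CONFLICT = {
    "flag_code": "FLAG_CHANGE",
    "vessel_name": "NAME_CONFLICT",
    "owner": "OWNER_CHANGE",
}


def get_field(item, field):
    """Read a field from the top level, falling back to additional_data."""
    if field in item:
        return item[field]
    return (item.get("additional_data") or {}).get(field)


def find_conflicts(cluster_id, items):
    """Return one conflict row per field with more than one distinct value."""
    conflicts = []
    for field, conflict_type in FIELD_TO_CONFLICT.items():
        seen = {}
        for item in items:
            value = get_field(item, field)
            if value is not None:
                seen.setdefault(value, []).append(item)
        if len(seen) > 1:
            conflicts.append({
                "conflict_id": str(uuid.uuid4()),
                "cluster_id": cluster_id,
                "conflict_type": conflict_type,
                "conflicting_intelligence": [i for group in seen.values() for i in group],
            })
    return conflicts


# The two "Salwa" records, as they would look if resolution had placed them
# in one cluster. "XX" is a placeholder flag.
cluster_items = [
    {"vessel_name": "Salwa", "flag_code": "XX", "rfmo_vessel_id": "AT0046355",
     "additional_data": {"owner": "Ridha dridi"}},
    {"vessel_name": "Salwa", "flag_code": "XX", "rfmo_vessel_id": "AT0046525",
     "additional_data": {"owner": "mohamed grafi"}},
]
conflicts = find_conflicts("cluster-1", cluster_items)
print([c["conflict_type"] for c in conflicts])  # ['OWNER_CHANGE']
```

The conflict row keeps both source records, so an analyst can still see each owner and decide whether this is an ownership change or two distinct vessels.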
Benefits
- Complete Data Preservation: No data loss during import
- True Intelligence Analysis: Cross-source pattern detection
- Temporal Intelligence: Track changes over time across all sources
- Deception Detection: Identify vessels trying to hide identity
- Audit Trail: Complete provenance of all intelligence
- Flexible Matching: Can re-run identity resolution with improved algorithms
Risks and Mitigations
Risk: More complex architecture
Mitigation: Phased implementation, starting with raw collection
Risk: Larger data storage requirements
Mitigation: Intelligence compression, archival strategies
Risk: Longer processing time
Mitigation: Batch processing, parallel analysis
Implementation Steps
- Create staged import tables
- Convert RFMO loaders to raw intelligence collection
- Implement identity resolution engine
- Build conflict analysis tools
- Migrate existing data to new structure
Success Metrics
- 0% data loss during import (vs current 33.7%)
- Cross-RFMO vessel identification accuracy >95%
- Deception pattern detection capability
- Complete audit trail for all intelligence
References
- Intelligence Community Analytic Standards
- Analysis of Competing Hypotheses (ACH) methodology
- Maritime Domain Awareness best practices