Source: ebisu/docs/adr/0056-staged-intelligence-import.md

ADR-056: Staged Intelligence Import Architecture

Status

Proposed

Context

The current RFMO import system has a fundamental flaw: it attempts real-time vessel matching during individual RFMO imports. This creates several critical problems:

  1. Vessel Identity Confusion: Distinct vessels that share a name get incorrectly merged, within a single RFMO as well as across RFMOs (e.g., the "Salwa" vessels in ICCAT)
  2. Import Order Dependency: The first RFMO imported becomes the "truth" for subsequent imports
  3. Massive Data Loss: 33.7% of cleaned data is lost during import due to matching failures
  4. Intelligence Loss: Temporal and cross-source patterns are destroyed by premature matching

Example of Current Problem

ICCAT has vessel "Salwa" with:

  • ICCAT Serial AT0046355, Owner: Ridha dridi
  • ICCAT Serial AT0046525, Owner: mohamed grafi

The current system treats these as the same vessel because their name and flag match. That discards critical intelligence: the two records may describe different vessels, or one vessel whose ownership changed.
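The collapse can be reproduced in a few lines. This is a minimal sketch, not the actual loader: the record layout, the function names, and the placeholder flag code "XX" are assumptions made for illustration (the ADR does not state the vessels' flag).

```python
# The two ICCAT "Salwa" records from the example above.
# "XX" is a placeholder flag code; the ADR does not state the actual flag.
records = [
    {"name": "Salwa", "flag": "XX", "iccat_serial": "AT0046355", "owner": "Ridha dridi"},
    {"name": "Salwa", "flag": "XX", "iccat_serial": "AT0046525", "owner": "mohamed grafi"},
]


def merge_by_name_flag(rows):
    """Mimic import-time matching: the first record for a name+flag key wins."""
    merged = {}
    for row in rows:
        key = (row["name"].strip().upper(), row["flag"])
        merged.setdefault(key, row)  # later records with the same key are dropped
    return merged


def keep_by_source_id(rows):
    """Keep every record, keyed by its source-specific identifier."""
    return {row["iccat_serial"]: row for row in rows}


merged = merge_by_name_flag(records)
preserved = keep_by_source_id(records)
print(len(merged), len(preserved))  # 1 2
```

Keying on name and flag silently drops the second owner, while keying on the source-specific serial keeps both records available for later analysis.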

Decision

Implement a Staged Intelligence Import Architecture that mirrors intelligence community best practices:

Stage 1: Raw Intelligence Collection

  • Import ALL RFMO data as separate intelligence reports
  • NO vessel matching during import
  • Each report gets unique intelligence_report_id
  • Preserve every data point exactly as reported
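A sketch of what Stage 1 collection might look like, assuming each source row arrives as a dictionary. The `IntelligenceReport` class, the `collect_raw` function, the source column name, and the example date are hypothetical; only `intelligence_report_id` comes from the text above.

```python
import copy
import uuid
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class IntelligenceReport:
    intelligence_report_id: uuid.UUID
    source: str        # e.g. "ICCAT"
    report_date: date
    raw_data: dict     # the row exactly as the source reported it


def collect_raw(source, report_date, rows):
    """Wrap every source row in its own report; no matching, no deduplication."""
    return [
        IntelligenceReport(uuid.uuid4(), source, report_date, copy.deepcopy(row))
        for row in rows
    ]


# Two rows that look identical still become two separate reports.
rows = [{"Vessel Name": "Salwa"}, {"Vessel Name": "Salwa"}]
reports = collect_raw("ICCAT", date(2024, 1, 1), rows)
print(len(reports))  # 2
```

Copying each row keeps the stored report independent of any later mutation of the loader's input, which matters if the same rows are also fed to a cleaning step.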

Stage 2: Cross-Source Identity Resolution

  • Analyze ALL collected intelligence to identify vessel entities
  • Use hierarchical matching: IMO > IRCS > MMSI > Name+Flag+Context
  • Generate confidence scores for each identity resolution
  • Preserve conflicts as intelligence indicators
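The hierarchy above could be sketched as a pairwise comparison that stops at the strongest identifier both records carry. The confidence values, the function names, and the rule that a disagreeing shared identifier blocks the match are assumptions for illustration. The "Context" part of the last tier is left out because the ADR does not define it.

```python
from typing import Optional, Tuple

# Strongest identifier first. The confidence values are illustrative assumptions;
# the ADR fixes only the ordering IMO > IRCS > MMSI > Name+Flag+Context.
IDENTIFIER_TIERS = [("imo", 0.99), ("ircs", 0.90), ("mmsi", 0.80)]
NAME_FLAG_CONFIDENCE = 0.50


def _norm(value) -> Optional[str]:
    """Trim and upper-case a value; treat missing or blank values as None."""
    text = str(value).strip().upper() if value is not None else ""
    return text or None


def _same(a: dict, b: dict, field: str) -> bool:
    left, right = _norm(a.get(field)), _norm(b.get(field))
    return left is not None and left == right


def match_records(a: dict, b: dict) -> Optional[Tuple[str, float]]:
    """Return (resolution_method, confidence), or None when the records do not match."""
    for field, confidence in IDENTIFIER_TIERS:
        if _norm(a.get(field)) and _norm(b.get(field)):
            # Both records carry this identifier, so it decides the outcome.
            return (field, confidence) if _same(a, b, field) else None
    if _same(a, b, "vessel_name") and _same(a, b, "flag_code"):
        return ("name_flag", NAME_FLAG_CONFIDENCE)
    return None


a = {"imo": "1234567", "vessel_name": "Salwa", "flag_code": "XX"}
b = {"imo": "7654321", "vessel_name": "Salwa", "flag_code": "XX"}
print(match_records(a, b))  # None: the IMO numbers disagree despite name+flag agreeing
```

A rejected pair like this one is a candidate for being recorded as a conflict rather than discarded, in line with the last bullet above.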

Stage 3: Trust Scoring and Pattern Analysis

  • Calculate trust scores based on source agreement
  • Identify deception patterns (conflicting identities)
  • Flag potential IUU indicators (rapid flag changes, conflicting data)
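One possible reading of "source agreement" and "rapid flag changes" is sketched below. The scoring formula, the 365-day window, and the threshold of more than one change are assumptions chosen for illustration, not values taken from the ADR.

```python
from collections import Counter
from datetime import date


def agreement_score(values):
    """Share of reporting sources that agree with the most common value (0.0 to 1.0)."""
    reported = [v for v in values if v]
    if not reported:
        return 0.0
    top_count = Counter(reported).most_common(1)[0][1]
    return top_count / len(reported)


def rapid_flag_changes(history, window_days=365, max_changes=1):
    """history: list of (report_date, flag_code) pairs.

    Returns True when more than max_changes flag changes fall within any
    window of window_days.
    """
    ordered = sorted(history)
    change_dates = [
        cur[0] for prev, cur in zip(ordered, ordered[1:]) if cur[1] != prev[1]
    ]
    for i, start in enumerate(change_dates):
        in_window = [d for d in change_dates[i:] if (d - start).days <= window_days]
        if len(in_window) > max_changes:
            return True
    return False


# Placeholder flag codes: three sources, two of which agree.
print(agreement_score(["AA", "AA", "BB"]))
history = [(date(2020, 1, 1), "AA"), (date(2020, 6, 1), "BB"), (date(2020, 9, 1), "CC")]
print(rapid_flag_changes(history))  # True: two changes within one year
```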

Implementation Plan

Phase 1: Raw Intelligence Tables

-- Raw intelligence reports (no vessel matching)
CREATE TABLE intelligence_reports (
    report_id UUID PRIMARY KEY,
    source_id UUID REFERENCES original_sources_vessels,
    report_date DATE,
    raw_data JSONB, -- Exact data as reported
    created_at TIMESTAMP DEFAULT NOW()
);

-- Vessel intelligence extracted from reports
CREATE TABLE vessel_intelligence (
    intelligence_id UUID PRIMARY KEY,
    report_id UUID REFERENCES intelligence_reports,
    vessel_name TEXT,
    imo TEXT,
    ircs TEXT,
    mmsi TEXT,
    flag_code TEXT,
    rfmo_vessel_id TEXT, -- Source-specific ID
    additional_data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);
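To show how a row of `vessel_intelligence` might be derived from a report's `raw_data`, here is a minimal sketch. The source field names in `EXAMPLE_FIELD_MAP` and the function name are hypothetical. Each RFMO would presumably need its own mapping.

```python
import uuid

# Hypothetical mapping from vessel_intelligence columns to one source's field names.
EXAMPLE_FIELD_MAP = {
    "vessel_name": "Vessel Name",
    "imo": "IMO",
    "ircs": "IRCS",
    "mmsi": "MMSI",
    "flag_code": "Flag",
    "rfmo_vessel_id": "Serial No",
}


def extract_vessel_intelligence(report_id, raw_data, field_map):
    """Build one vessel_intelligence row (as a dict) from a raw report."""
    row = {"intelligence_id": uuid.uuid4(), "report_id": report_id}
    for column, source_key in field_map.items():
        value = raw_data.get(source_key)
        text = str(value).strip() if value is not None else ""
        row[column] = text or None  # identifiers stay text; blanks become NULL
    mapped_keys = set(field_map.values())
    # Everything the mapping does not cover is kept rather than dropped.
    row["additional_data"] = {k: v for k, v in raw_data.items() if k not in mapped_keys}
    return row


raw = {"Vessel Name": " Salwa ", "Serial No": "AT0046355", "IMO": "", "Owner": "Ridha dridi"}
row = extract_vessel_intelligence(uuid.uuid4(), raw, EXAMPLE_FIELD_MAP)
print(row["vessel_name"], row["rfmo_vessel_id"], row["additional_data"])
```

The untouched original stays in `intelligence_reports.raw_data`, so the trimming done here never loses what the source actually reported.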

Phase 2: Identity Resolution Engine

-- Cross-source vessel identity resolution
CREATE TABLE vessel_identity_clusters (
    cluster_id UUID PRIMARY KEY,
    master_vessel_uuid UUID, -- Final resolved vessel
    confidence_score DECIMAL(3,2),
    resolution_method TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Links intelligence to resolved identities
CREATE TABLE intelligence_to_identity (
    intelligence_id UUID REFERENCES vessel_intelligence,
    cluster_id UUID REFERENCES vessel_identity_clusters,
    match_confidence DECIMAL(3,2),
    match_reason TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
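One way to populate these two tables is to group pairwise matches into clusters with a union-find structure, sketched below. The ADR does not prescribe a clustering algorithm, so union-find is a swapped-in technique. The function name, the input tuple layout, and the "singleton" reason with confidence 1.0 are also assumptions.

```python
import uuid


def build_identity_links(intelligence_ids, matches):
    """matches: list of (id_a, id_b, confidence, reason) tuples.

    Returns one intelligence_to_identity-style dict per intelligence id.
    """
    parent = {i: i for i in intelligence_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    best = {}  # strongest match seen for each id
    for a, b, confidence, reason in matches:
        parent[find(a)] = find(b)  # union the two clusters
        for i in (a, b):
            if confidence > best.get(i, (0.0, None))[0]:
                best[i] = (confidence, reason)

    cluster_ids = {}
    links = []
    for i in intelligence_ids:
        root = find(i)
        if root not in cluster_ids:
            cluster_ids[root] = uuid.uuid4()
        confidence, reason = best.get(i, (1.0, "singleton"))
        links.append({
            "intelligence_id": i,
            "cluster_id": cluster_ids[root],
            "match_confidence": confidence,
            "match_reason": reason,
        })
    return links


links = build_identity_links(
    ["a", "b", "c", "d"],
    [("a", "b", 0.99, "imo"), ("b", "c", 0.50, "name_flag")],
)
print(len({link["cluster_id"] for link in links}))  # 2
```

Transitive merging means a chain of weak matches can pull unrelated records into one cluster, so a real engine would probably apply a minimum confidence before the union step.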

Phase 3: Conflict and Pattern Analysis

-- Identity conflicts for intelligence analysis
CREATE TABLE identity_conflicts (
    conflict_id UUID PRIMARY KEY,
    cluster_id UUID REFERENCES vessel_identity_clusters,
    conflict_type TEXT, -- 'FLAG_CHANGE', 'NAME_CONFLICT', 'OWNER_CHANGE'
    conflicting_intelligence JSONB[],
    risk_indicators TEXT[],
    analyst_notes TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
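Rows for `identity_conflicts` could be produced by comparing the records inside one cluster field by field, as in this sketch. The field-to-type mapping, the function name, and the wording of the risk indicator are assumptions. An `owner` value is assumed to be available on each record, although the schema above would store it in `additional_data`.

```python
import uuid

# Which record field feeds which conflict_type (types taken from the DDL comment).
CONFLICT_FIELDS = {
    "flag_code": "FLAG_CHANGE",
    "vessel_name": "NAME_CONFLICT",
    "owner": "OWNER_CHANGE",
}


def detect_conflicts(cluster_id, records):
    """Return one conflict dict per field whose values disagree within the cluster."""
    conflicts = []
    for field, conflict_type in CONFLICT_FIELDS.items():
        reporting = [r for r in records if r.get(field)]
        distinct = {str(r[field]).strip().upper() for r in reporting}
        if len(distinct) > 1:
            conflicts.append({
                "conflict_id": uuid.uuid4(),
                "cluster_id": cluster_id,
                "conflict_type": conflict_type,
                "conflicting_intelligence": reporting,
                "risk_indicators": [f"{field}: {len(distinct)} distinct values"],
            })
    return conflicts


# The two "Salwa" records, assuming for illustration that they share a cluster.
salwa = [
    {"vessel_name": "Salwa", "rfmo_vessel_id": "AT0046355", "owner": "Ridha dridi"},
    {"vessel_name": "Salwa", "rfmo_vessel_id": "AT0046525", "owner": "mohamed grafi"},
]
found = detect_conflicts(uuid.uuid4(), salwa)
print([c["conflict_type"] for c in found])  # ['OWNER_CHANGE']
```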

Benefits

  1. Complete Data Preservation: No data loss during import
  2. True Intelligence Analysis: Cross-source pattern detection
  3. Temporal Intelligence: Track changes over time across all sources
  4. Deception Detection: Identify vessels trying to hide identity
  5. Audit Trail: Complete provenance of all intelligence
  6. Flexible Matching: Can re-run identity resolution with improved algorithms

Risks and Mitigations

Risk: More complex architecture
Mitigation: Phased implementation, starting with raw collection

Risk: Larger data storage requirements
Mitigation: Intelligence compression, archival strategies

Risk: Longer processing time
Mitigation: Batch processing, parallel analysis

Implementation Steps

  1. Create staged import tables
  2. Convert RFMO loaders to raw intelligence collection
  3. Implement identity resolution engine
  4. Build conflict analysis tools
  5. Migrate existing data to new structure

Success Metrics

  • 0% data loss during import (vs current 33.7%)
  • Cross-RFMO vessel identification accuracy >95%
  • Deception pattern detection capability
  • Complete audit trail for all intelligence

References

  • Intelligence Community Analytic Standards
  • Analysis of Competing Hypotheses (ACH) methodology
  • Maritime Domain Awareness best practices