Source: ebisu/docs/adr/0056-staged-intelligence-import.md

ADR-056: Staged Intelligence Import Architecture

Status

Proposed

Context

The current RFMO import system has a fundamental flaw: it attempts real-time vessel matching during individual RFMO imports. This creates several critical problems:

Vessel Identity Confusion: Vessels that share a name are incorrectly merged, both across RFMOs and within a single RFMO (e.g., the "Salwa" vessels in ICCAT)
Import Order Dependency: The first RFMO imported becomes the "truth" for subsequent imports
Massive Data Loss: 33.7% of cleaned data is lost during import due to matching failures
Intelligence Loss: Temporal and cross-source patterns are destroyed by premature matching

Example of Current Problem

ICCAT has vessel "Salwa" with:

ICCAT Serial AT0046355, Owner: Ridha dridi
ICCAT Serial AT0046525, Owner: mohamed grafi

The current system treats these as the same vessel because their name and flag match. That discards the evidence (two distinct ICCAT serials and two owners) that they may be different vessels, or one vessel whose ownership changed.
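To make the failure concrete, the sketch below groups the two ICCAT records first by name and flag, then by name, flag, and ICCAT serial. The dict keys and the `FLAG_A` placeholder are assumptions for illustration (the example above does not state the flag), and `group_by` is a hypothetical helper, not the current importer's code.

```python
from collections import defaultdict

# The two ICCAT records from the example. "FLAG_A" is a placeholder because
# the ADR does not state the flag; the dict keys are hypothetical field names.
records = [
    {"name": "Salwa", "flag": "FLAG_A", "iccat_serial": "AT0046355", "owner": "Ridha dridi"},
    {"name": "Salwa", "flag": "FLAG_A", "iccat_serial": "AT0046525", "owner": "mohamed grafi"},
]


def group_by(records, key_fields):
    """Group records by the tuple of values found in key_fields."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)
    return dict(groups)


merged = group_by(records, ["name", "flag"])
separate = group_by(records, ["name", "flag", "iccat_serial"])

print(len(merged))    # 1: both records collapse into one "vessel"
print(len(separate))  # 2: the source-specific serial keeps them apart
```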

Decision

Implement a Staged Intelligence Import Architecture that mirrors intelligence community best practices:

Stage 1: Raw Intelligence Collection

Import ALL RFMO data as separate intelligence reports
NO vessel matching during import
Each report gets unique intelligence_report_id
Preserve every data point exactly as reported
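A minimal sketch of Stage 1, assuming each source row arrives as a dict: every row is wrapped in its own report with a fresh ID, and nothing is compared or merged. The function name `collect_report`, the raw column names, and the placeholder source ID and date are hypothetical.

```python
import copy
import uuid
from datetime import date


def collect_report(source_id, report_date, raw_row):
    """Wrap one source row as an intelligence report, with no vessel matching."""
    return {
        "report_id": str(uuid.uuid4()),       # unique per report
        "source_id": source_id,
        "report_date": report_date.isoformat(),
        "raw_data": copy.deepcopy(raw_row),   # stored exactly as reported
    }


# Hypothetical raw rows; the column names are assumed, not taken from ICCAT.
rows = [
    {"VesselName": "Salwa", "ICCATSerialNo": "AT0046355", "Owner": "Ridha dridi"},
    {"VesselName": "Salwa", "ICCATSerialNo": "AT0046525", "Owner": "mohamed grafi"},
]

source_id = str(uuid.uuid4())  # placeholder for a row in original_sources_vessels
reports = [collect_report(source_id, date(2024, 1, 1), row) for row in rows]

print(len(reports))  # 2: one report per row, regardless of matching names
```

Copying the row rather than referencing it means later normalization cannot alter what was recorded at collection time.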

Stage 2: Cross-Source Identity Resolution

Analyze ALL collected intelligence to identify vessel entities
Use hierarchical matching: IMO > IRCS > MMSI > Name+Flag+Context
Generate confidence scores for each identity resolution
Preserve conflicts as intelligence indicators
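The hierarchy above can be sketched as a function that picks the strongest identifier a record carries and attaches a confidence to it. The confidence values are illustrative assumptions, not figures from this ADR, and the "Context" part of the last tier is left out because the ADR does not define it.

```python
# Identifier tiers in priority order: (method, fields, confidence).
# The confidence numbers are assumed for illustration only.
TIERS = [
    ("imo", ("imo",), 0.99),
    ("ircs", ("ircs",), 0.90),
    ("mmsi", ("mmsi",), 0.85),
    ("name_flag", ("vessel_name", "flag_code"), 0.50),
]


def normalize(value):
    """Trim and upper-case a string; treat blanks and non-strings as missing."""
    if isinstance(value, str) and value.strip():
        return value.strip().upper()
    return None


def match_key(record):
    """Return (method, key, confidence) for the strongest identifier present."""
    for method, fields, confidence in TIERS:
        values = tuple(normalize(record.get(f)) for f in fields)
        if all(values):
            return method, values, confidence
    return None  # nothing usable: leave the record unresolved


# Made-up sample records; "1234567" and "XX" are not real identifiers.
with_imo = {"vessel_name": "Salwa", "flag_code": "XX", "imo": "1234567"}
name_only = {"vessel_name": " salwa ", "flag_code": "xx"}

print(match_key(with_imo))   # ('imo', ('1234567',), 0.99)
print(match_key(name_only))  # ('name_flag', ('SALWA', 'XX'), 0.5)
```

Records that fall through to the name and flag tier get a low score on purpose, so they can be reviewed rather than merged silently.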

Stage 3: Trust Scoring and Pattern Analysis

Calculate trust scores based on source agreement
Identify deception patterns (conflicting identities)
Flag potential IUU indicators (rapid flag changes, conflicting data)
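One simple reading of "trust based on source agreement" is the share of reporting sources that agree with the most common value of an attribute. The sketch below assumes that definition. The ADR does not specify a formula, so `trust_score` and its inputs are hypothetical.

```python
from collections import Counter


def trust_score(values_by_source):
    """Fraction of reporting sources that agree with the most common value."""
    reported = [v for v in values_by_source.values() if v is not None]
    if not reported:
        return None  # no source reported this attribute
    _, top_count = Counter(reported).most_common(1)[0]
    return top_count / len(reported)


# Hypothetical flag codes reported for one resolved vessel by three RFMOs.
flags = {"ICCAT": "XX", "IOTC": "XX", "WCPFC": "YY"}

print(round(trust_score(flags), 2))  # 0.67: two of three sources agree
```

A score below 1.0 is not evidence of deception on its own. It only marks the attribute as one where the sources disagree and the conflict should be preserved for analysis.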

Implementation Plan

Phase 1: Raw Intelligence Tables

-- Raw intelligence reports (no vessel matching)
CREATE TABLE intelligence_reports (
    report_id UUID PRIMARY KEY,
    source_id UUID REFERENCES original_sources_vessels,
    report_date DATE,
    raw_data JSONB,  -- Exact data as reported
    created_at TIMESTAMP DEFAULT NOW()
);

-- Vessel intelligence extracted from reports
CREATE TABLE vessel_intelligence (
    intelligence_id UUID PRIMARY KEY,
    report_id UUID REFERENCES intelligence_reports,
    vessel_name TEXT,
    imo TEXT,
    ircs TEXT,
    mmsi TEXT,
    flag_code TEXT,
    rfmo_vessel_id TEXT,  -- Source-specific ID
    additional_data JSONB,
    created_at TIMESTAMP DEFAULT NOW()
);

Phase 2: Identity Resolution Engine

-- Cross-source vessel identity resolution
CREATE TABLE vessel_identity_clusters (
    cluster_id UUID PRIMARY KEY,
    master_vessel_uuid UUID,  -- Final resolved vessel
    confidence_score DECIMAL(3,2),
    resolution_method TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Links intelligence to resolved identities
CREATE TABLE intelligence_to_identity (
    intelligence_id UUID REFERENCES vessel_intelligence,
    cluster_id UUID REFERENCES vessel_identity_clusters,
    match_confidence DECIMAL(3,2),  -- 0.00 to 1.00
    match_reason TEXT,
    created_at TIMESTAMP DEFAULT NOW(),
    -- Assumed constraint: at most one link per intelligence/cluster pair
    PRIMARY KEY (intelligence_id, cluster_id)
);
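As a rough sketch of how rows for `intelligence_to_identity` might be produced, the function below groups records that share any strong identifier, using union-find so that links are transitive. The function name, the choice of `imo`, `ircs`, and `mmsi` as linking fields, and the sample values are assumptions. A real engine would also need the confidence scoring and the name and flag fallback described in Stage 2.

```python
import uuid


def cluster_by_shared_ids(records, id_fields=("imo", "ircs", "mmsi")):
    """Group records that share any strong identifier, using union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # (field, value) -> index of first record carrying it
    for i, rec in enumerate(records):
        for field in id_fields:
            value = rec.get(field)
            if not value:
                continue
            j = first_seen.setdefault((field, value), i)
            parent[find(i)] = find(j)  # merge the two groups

    cluster_ids = {}  # union-find root -> generated cluster_id
    links = []
    for i, rec in enumerate(records):
        cluster_id = cluster_ids.setdefault(find(i), str(uuid.uuid4()))
        links.append({"intelligence_id": rec["intelligence_id"], "cluster_id": cluster_id})
    return links


# Made-up records: a and b share an IMO, b and c share a call sign, d has neither.
sample = [
    {"intelligence_id": "a", "imo": "1111111"},
    {"intelligence_id": "b", "imo": "1111111", "ircs": "ABCD"},
    {"intelligence_id": "c", "ircs": "ABCD", "mmsi": "123456789"},
    {"intelligence_id": "d", "vessel_name": "Salwa"},
]
links = cluster_by_shared_ids(sample)
print(len({link["cluster_id"] for link in links}))  # 2 clusters: {a, b, c} and {d}
```

Transitive linking can over-merge if an identifier is reused or misreported, which is one reason to keep the raw reports and re-run resolution rather than treat clusters as final.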

Phase 3: Conflict and Pattern Analysis

-- Identity conflicts for intelligence analysis
CREATE TABLE identity_conflicts (
    conflict_id UUID PRIMARY KEY,
    cluster_id UUID REFERENCES vessel_identity_clusters,
    conflict_type TEXT,  -- 'FLAG_CHANGE', 'NAME_CONFLICT', 'OWNER_CHANGE'
    conflicting_intelligence JSONB[],
    risk_indicators TEXT[],
    analyst_notes TEXT,
    created_at TIMESTAMP DEFAULT NOW()
);
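A minimal sketch of how rows for `identity_conflicts` could be derived from one resolved cluster: for each attribute of interest, a conflict is recorded when the cluster's records carry more than one distinct value. The `owner` key is an assumption (ownership is not a column of `vessel_intelligence` and would presumably live in `additional_data`), as are the function name and the `FLAG_A` placeholder.

```python
# Maps the conflict types named in the schema comment to the record field checked.
CHECKS = {
    "FLAG_CHANGE": "flag_code",
    "NAME_CONFLICT": "vessel_name",
    "OWNER_CHANGE": "owner",  # assumed field, see lead-in
}


def detect_conflicts(cluster_id, records):
    """Emit one conflict row per attribute with more than one distinct value."""
    conflicts = []
    for conflict_type, field in CHECKS.items():
        reporting = [r for r in records if r.get(field)]
        distinct = sorted({r[field] for r in reporting})
        if len(distinct) > 1:
            conflicts.append({
                "cluster_id": cluster_id,
                "conflict_type": conflict_type,
                "conflicting_intelligence": reporting,
                "distinct_values": distinct,
            })
    return conflicts


# The two "Salwa" records, assuming they were resolved into one cluster.
cluster_records = [
    {"vessel_name": "Salwa", "flag_code": "FLAG_A", "owner": "Ridha dridi"},
    {"vessel_name": "Salwa", "flag_code": "FLAG_A", "owner": "mohamed grafi"},
]
found = detect_conflicts("cluster-1", cluster_records)
print([c["conflict_type"] for c in found])  # ['OWNER_CHANGE']
```

The check cannot tell a genuine ownership change from two different vessels or a data-entry error. It only records the disagreement for an analyst to interpret.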

Benefits

Complete Data Preservation: No data loss during import
True Intelligence Analysis: Cross-source pattern detection
Temporal Intelligence: Track changes over time across all sources
Deception Detection: Identify vessels trying to hide identity
Audit Trail: Complete provenance of all intelligence
Flexible Matching: Can re-run identity resolution with improved algorithms

Risks and Mitigations

Risk: More complex architecture
Mitigation: Phased implementation, starting with raw collection

Risk: Larger data storage requirements
Mitigation: Intelligence compression, archival strategies

Risk: Longer processing time
Mitigation: Batch processing, parallel analysis

Implementation Steps

Create staged import tables
Convert RFMO loaders to raw intelligence collection
Implement identity resolution engine
Build conflict analysis tools
Migrate existing data to new structure

Success Metrics

0% data loss during import (vs current 33.7%)
Cross-RFMO vessel identification accuracy >95%
Deception pattern detection capability
Complete audit trail for all intelligence

References

Intelligence Community Analytic Standards
Analysis of Competing Hypotheses (ACH) methodology
Maritime Domain Awareness best practices
