Source: ebisu/docs/INTELLIGENCE_ARCHITECTURE.md

Ebisu Intelligence Architecture

Overview

Ebisu implements a staged intelligence import architecture that treats multiple reports as confirmations, not duplicates. This document explains the critical architectural decisions and data flow.

Core Principles

1. There Are No Duplicates in Intelligence

  • Multiple reports of the same vessel = CONFIRMATIONS
  • More sources reporting = HIGHER confidence
  • Every report is preserved in its original form
  • Cross-source validation increases trust

2. Complete Data Isolation

  • Each source maintains separate records
  • No cross-contamination between sources
  • Original data always preserved
  • Analysis layers are read-only

3. Temporal Intelligence

  • Track changes over time
  • Detect vessel behavior patterns
  • Monitor flag changes and ownership transfers
  • Build historical intelligence picture

Architecture Stages

Phase 1: Raw Intelligence Collection (Current)

Source Files → Import System → intelligence_reports → vessel_intelligence → Analysis Layer
                   ↓                    ↓                      ↓
                Git LFS            data_lineage     vessel_identity_confirmations
                Storage          import_batches       intelligence_change_log

The import system provides:

  • Organized import scripts in import-sources/ directory
  • Git LFS integration for large data files
  • Security-first approach with no hardcoded credentials
  • Unified interface via docker-import.sh
  • Complete audit trail through data_lineage tracking

Phase 2: Cross-Source Identity Resolution (Future)

vessel_intelligence → Identity Graph → Resolved Vessels → Intelligence Products
                           ↓
                  Confidence Scoring
                   Network Analysis

Data Model

Core Tables

intelligence_reports

  • Raw intelligence from each source
  • One current record per vessel per source
  • Complete JSONB preservation
  • Temporal tracking (valid_from/valid_to)
CREATE TABLE intelligence_reports (
    report_id        UUID PRIMARY KEY,
    source_id        UUID NOT NULL,      -- Which RFMO/source
    rfmo_shortname   TEXT NOT NULL,      -- Quick reference
    raw_vessel_data  JSONB NOT NULL,     -- Complete original data
    data_hash        TEXT,               -- Full data fingerprint
    vessel_key_hash  TEXT GENERATED,     -- Cross-source matching
    valid_from       DATE DEFAULT CURRENT_DATE,
    valid_to         DATE,
    is_current       BOOLEAN DEFAULT TRUE
);
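As a sketch of how the two hashes might be derived: `data_hash` fingerprints the complete raw record, while `vessel_key_hash` normalizes the matching key. The canonicalization details (sorted JSON keys, upper-casing, the `|` separator) are assumptions; the source only specifies that the key combines IMO, name, and flag.

```python
import hashlib
import json

def data_hash(raw_vessel_data: dict) -> str:
    """Fingerprint of the complete raw record (canonical JSON, SHA-256)."""
    canonical = json.dumps(raw_vessel_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def vessel_key_hash(imo: str, name: str, flag: str) -> str:
    """Stable cross-source matching key from IMO + name + flag."""
    key = "|".join(part.strip().upper() for part in (imo, name, flag))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

record = {"vessel_name": "EXAMPLE MARU", "imo": "1234567", "flag": "JPN"}
print(data_hash(record))
print(vessel_key_hash("1234567", "Example Maru", "jpn"))
```

Normalizing before hashing means two sources that differ only in casing or whitespace still produce the same matching key.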

vessel_intelligence

  • Structured extraction from reports
  • 1:1 relationship with intelligence_reports
  • Normalized fields for analysis
  • No cross-source merging

vessel_identity_confirmations

  • Cross-source confirmation tracking
  • Shows which sources report same vessel
  • Calculates confidence scores
  • Never modifies source data
CREATE TABLE vessel_identity_confirmations (
    vessel_key_hash         TEXT PRIMARY KEY,  -- IMO + name + flag hash
    confirming_sources      UUID[],            -- Array of source IDs
    confirming_source_names TEXT[],            -- Human readable
    confirmation_count      INTEGER,           -- Number of sources
    confidence_score        NUMERIC            -- Based on confirmations
);
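The aggregation this table captures can be sketched in Python. The source IDs and RFMO names below are illustrative; in the real system the rows come from intelligence_reports, and a re-import from the same source does not add a new confirmation.

```python
from collections import defaultdict

# Reports as (vessel_key_hash, source_id, source_name) tuples (toy data).
reports = [
    ("abc123", "s1", "ICCAT"),
    ("abc123", "s2", "IOTC"),
    ("abc123", "s2", "IOTC"),   # same source again: not a new confirmation
    ("def456", "s1", "ICCAT"),
]

# Sets deduplicate per-source repeats, so counts reflect distinct sources.
confirmations = defaultdict(lambda: {"sources": set(), "names": set()})
for key, source_id, source_name in reports:
    confirmations[key]["sources"].add(source_id)
    confirmations[key]["names"].add(source_name)

for key, c in confirmations.items():
    print(key, len(c["sources"]), sorted(c["names"]))
```

Note that the source rows themselves are never modified; the confirmation view is derived from them.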

Key Concepts

File Deduplication vs Intelligence Confirmations

File Deduplication (Infrastructure)

  • Prevents same file from being imported twice
  • Uses SHA-256 file hash
  • Protects against processing errors
  • Tracked in data_lineage table

Intelligence Confirmations (Analysis)

  • Multiple sources reporting same vessel
  • Increases confidence in data
  • Tracked in vessel_identity_confirmations
  • Core value of intelligence platform

Confirmation Scoring

confidence_score =
    base_score (by source count)
    + imo_confirmation_bonus
    + consistent_naming_bonus
    + temporal_consistency_bonus
  • 5+ sources: Very High Confidence (0.5+ score)
  • 3+ sources: High Confidence (0.3+ score)
  • 2 sources: Moderate Confidence (0.1+ score)
  • 1 source: Single Source (0.0 score)
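A minimal sketch of this scoring. The base tiers follow the thresholds listed above; the bonus weights (0.15 / 0.10 / 0.05) are invented for illustration, since the source does not specify them.

```python
def confidence_score(source_count: int,
                     imo_confirmed: bool = False,
                     consistent_naming: bool = False,
                     temporally_consistent: bool = False) -> float:
    """Illustrative scoring: base tier by source count plus assumed bonuses."""
    if source_count >= 5:
        base = 0.5   # Very High Confidence
    elif source_count >= 3:
        base = 0.3   # High Confidence
    elif source_count == 2:
        base = 0.1   # Moderate Confidence
    else:
        base = 0.0   # Single Source
    bonus = (0.15 * imo_confirmed
             + 0.10 * consistent_naming
             + 0.05 * temporally_consistent)
    return round(base + bonus, 2)

print(confidence_score(5, imo_confirmed=True))
print(confidence_score(1))
```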

Change Detection

Tracks vessel changes between imports:

  • NEW: Vessel first appearance
  • UPDATED: Vessel data changed
  • REMOVED: Vessel no longer reported
  • UNCHANGED: Vessel data consistent
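The four statuses fall out of comparing the vessel_key_hash → data_hash maps of two consecutive batches; a minimal sketch:

```python
def classify_changes(previous: dict, current: dict) -> dict:
    """Compare vessel_key_hash -> data_hash maps from two import batches."""
    statuses = {}
    for key, h in current.items():
        if key not in previous:
            statuses[key] = "NEW"          # first appearance
        elif previous[key] != h:
            statuses[key] = "UPDATED"      # data changed
        else:
            statuses[key] = "UNCHANGED"    # data consistent
    for key in previous.keys() - current.keys():
        statuses[key] = "REMOVED"          # no longer reported
    return statuses

prev = {"v0": "h0", "v1": "h1", "v2": "h2"}
curr = {"v1": "h1", "v2": "h9", "v3": "h3"}
print(classify_changes(prev, curr))
```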

Risk scoring for changes:

  • Flag changes: High risk
  • Ownership changes: High risk
  • Name changes: Medium risk
  • Technical changes: Low risk
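A hedged sketch of mapping changed fields to a risk level. The specific field names are assumptions; anything not explicitly classified is treated as a low-risk technical change, and the overall level is the highest of the individual risks.

```python
# Assumed field-to-risk mapping; the source only fixes the ordering
# (flag/ownership high, name medium, technical low).
CHANGE_RISK = {
    "reported_flag": "high",
    "reported_owner_name": "high",
    "vessel_name": "medium",
}

def change_risk(changed_fields: list) -> str:
    """Overall risk is the highest risk among the changed fields."""
    levels = {CHANGE_RISK.get(f, "low") for f in changed_fields}
    for level in ("high", "medium", "low"):
        if level in levels:
            return level
    return "low"  # no changes at all

print(change_risk(["vessel_name"]))
print(change_risk(["engine_power_kw"]))
print(change_risk(["reported_flag", "vessel_name"]))
```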

Import Process

1. Pre-Import Validation

# Check whether this file has already been imported
FILE_HASH=$(sha256sum "$INPUT_FILE" | awk '{print $1}')
# Look up FILE_HASH in data_lineage; skip the import if it already exists
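In Python, the same pre-import check might look like this; `known_hashes` stands in for a lookup against the data_lineage table, and the temp file is just a stand-in for a real source file.

```python
import hashlib
import tempfile
from pathlib import Path

def file_sha256(path: Path) -> str:
    """Stream the file so large Git LFS payloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def already_imported(path: Path, known_hashes: set) -> bool:
    """known_hashes stands in for SELECT file_hash FROM data_lineage."""
    return file_sha256(path) in known_hashes

with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"vessel_name,imo\nEXAMPLE MARU,1234567\n")
path = Path(tmp.name)

print(already_imported(path, set()))                 # first import
print(already_imported(path, {file_sha256(path)}))   # re-import, skip
```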

2. Batch Creation

INSERT INTO intelligence_import_batches
Track: source, date, file info, previous batch

3. Raw Import

INSERT INTO intelligence_reports
- Preserve complete raw data
- Generate vessel_key_hash for matching
- Mark previous reports as not current

4. Intelligence Extraction

INSERT INTO vessel_intelligence
- Extract structured fields
- Calculate completeness scores
- Maintain source relationship

5. Confirmation Updates

UPDATE vessel_identity_confirmations
- Find matching vessels across sources
- Update confirmation counts
- Recalculate confidence scores

6. Change Detection

INSERT INTO intelligence_change_log
- Compare with previous batch
- Identify changes and risks
- Track temporal patterns

Query Patterns

Find Multi-Source Vessels

SELECT
    vessel_name,
    vessel_imo,
    confirmation_count,
    confirming_source_names,
    confidence_score
FROM vessel_identity_confirmations
WHERE confirmation_count > 1
ORDER BY confirmation_count DESC;

Track Vessel History

SELECT
    ir.report_date,
    ir.rfmo_shortname,
    vi.reported_flag,
    vi.reported_owner_name,
    vi.authorization_status
FROM vessel_intelligence vi
JOIN intelligence_reports ir ON vi.report_id = ir.report_id
WHERE vi.reported_imo = '1234567'
ORDER BY ir.report_date DESC;

Analyze Source Overlaps

WITH source_pairs AS (
    SELECT source_1, source_2, COUNT(*) AS shared_vessels
    FROM (/* cross-source analysis subquery, elided */) pairs
    GROUP BY source_1, source_2
)
SELECT * FROM source_pairs
WHERE shared_vessels > 100
ORDER BY shared_vessels DESC;
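Pending the real cross-source subquery, the overlap analysis can be prototyped from a vessel_key_hash → sources mapping (the vessel keys and RFMO names below are toy data):

```python
from collections import Counter
from itertools import combinations

# vessel_key_hash -> set of sources reporting that vessel (toy data)
vessel_sources = {
    "v1": {"ICCAT", "IOTC"},
    "v2": {"ICCAT", "IOTC", "WCPFC"},
    "v3": {"WCPFC"},
}

# Count shared vessels for every unordered source pair.
overlap = Counter()
for sources in vessel_sources.values():
    for a, b in combinations(sorted(sources), 2):
        overlap[(a, b)] += 1

print(overlap.most_common())
```

Sorting before pairing keeps each pair in a canonical order so (A, B) and (B, A) are counted together.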

Best Practices

1. Never Delete Intelligence

  • Use temporal tracking (valid_to dates)
  • Maintain complete audit trail
  • Archive instead of delete
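The temporal-tracking practice can be sketched as follows: superseding a report closes it out via valid_to rather than deleting it. The field names follow the intelligence_reports table; the Report class and supersede helper are illustrative, not part of the system.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Report:
    report_id: str
    valid_from: date
    valid_to: Optional[date] = None
    is_current: bool = True

def supersede(history, new):
    """Close out the current report instead of deleting it."""
    closed = [
        replace(r, valid_to=new.valid_from, is_current=False) if r.is_current else r
        for r in history
    ]
    return closed + [new]

history = [Report("r1", date(2024, 1, 1))]
history = supersede(history, Report("r2", date(2024, 6, 1)))
print(len(history))  # 2: the old report is archived, not deleted
```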

2. Preserve Original Data

  • Store complete raw data in JSONB
  • Extract to structured fields
  • Never modify originals

3. Track Everything

  • File lineage (source to import)
  • Data lineage (transformations)
  • Temporal changes (vessel history)
  • Cross-source confirmations

4. Build Trust Through Confirmations

  • More sources = higher confidence
  • Track confirmation patterns
  • Identify reliable sources
  • Flag anomalies

Future Enhancements (Phase 2)

  1. Identity Resolution

    • Build vessel identity graph
    • Resolve name variations
    • Track vessel networks
  2. Risk Scoring

    • Behavioral analysis
    • Network risk propagation
    • Predictive indicators
  3. Intelligence Products

    • Automated alerts
    • Trend analysis
    • Relationship mapping

References

  • ADR-0056: Staged Intelligence Import Architecture
  • ADR-0059: PostgreSQL 17 Native Architecture
  • ADR-0061: Intelligence Confirmations Are Not Duplicates
  • ADR-0062: Phase 1 Data Isolation Architecture