Ebisu Intelligence Architecture
Overview
Ebisu implements a staged intelligence import architecture that treats multiple reports of the same vessel as confirmations, not duplicates. This document explains the key architectural decisions and the data flow.
Core Principles
1. There Are No Duplicates in Intelligence
- Multiple reports of the same vessel = CONFIRMATIONS
- More sources reporting = HIGHER confidence
- Every report is preserved in its original form
- Cross-source validation increases trust
2. Complete Data Isolation
- Each source maintains separate records
- No cross-contamination between sources
- Original data always preserved
- Analysis layers are read-only
3. Temporal Intelligence
- Track changes over time
- Detect vessel behavior patterns
- Monitor flag changes and ownership transfers
- Build historical intelligence picture
Architecture Stages
Phase 1: Raw Intelligence Collection (Current)
Source Files → Import System → intelligence_reports → vessel_intelligence → Analysis Layer
     ↓               ↓                                                            ↓
  Git LFS      data_lineage                                         vessel_identity_confirmations
  Storage      import_batches                                       intelligence_change_log
The import system provides:
- Organized import scripts in the import-sources/ directory
- Git LFS integration for large data files
- Security-first approach with no hardcoded credentials
- Unified interface via docker-import.sh
- Complete audit trail through data_lineage tracking
Phase 2: Cross-Source Identity Resolution (Future)
vessel_intelligence → Identity Graph → Resolved Vessels → Intelligence Products
                            ↓
                    Confidence Scoring
                     Network Analysis
Data Model
Core Tables
intelligence_reports
- Raw intelligence from each source
- One current record per vessel per source (earlier versions are kept and marked as not current)
- Complete JSONB preservation
- Temporal tracking (valid_from/valid_to)
CREATE TABLE intelligence_reports (
    report_id        UUID PRIMARY KEY,
    source_id        UUID NOT NULL,               -- Which RFMO/source
    rfmo_shortname   TEXT NOT NULL,               -- Quick reference
    raw_vessel_data  JSONB NOT NULL,              -- Complete original data
    data_hash        TEXT,                        -- Full data fingerprint
    vessel_key_hash  TEXT GENERATED ALWAYS AS (...) STORED, -- Cross-source matching; hash expression omitted here
    valid_from       DATE DEFAULT CURRENT_DATE,
    valid_to         DATE,
    is_current       BOOLEAN DEFAULT TRUE
);
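To illustrate the difference between the two hashes, here is a minimal Python sketch. It assumes SHA-256, a canonical JSON serialization, and hypothetical field names (`imo`, `vessel_name`, `flag`); the actual hash expression and normalization used by Ebisu are not given in this document.

```python
import hashlib
import json


def data_hash(raw_vessel_data: dict) -> str:
    """Fingerprint the complete record; key order does not affect the result."""
    canonical = json.dumps(raw_vessel_data, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def vessel_key_hash(imo, name, flag) -> str:
    """Hash only the identifying fields so the same vessel matches across sources.

    Trimming and upper-casing are assumptions made for this sketch.
    """
    parts = [str(p or "").strip().upper() for p in (imo, name, flag)]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


# Two illustrative reports of the same vessel with different details.
a = {"imo": "1234567", "vessel_name": "Example Maru", "flag": "AA", "gear": "longline"}
b = {"flag": "AA", "gear": "purse seine", "imo": "1234567", "vessel_name": "example maru "}

print(data_hash(a) == data_hash(b))  # False: the full records differ
print(vessel_key_hash(a["imo"], a["vessel_name"], a["flag"])
      == vessel_key_hash(b["imo"], b["vessel_name"], b["flag"]))  # True: same identity
```

The full fingerprint answers "has this record changed?", while the key hash answers "is this the same vessel?", which is why the two are stored separately.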
vessel_intelligence
- Structured extraction from reports
- 1:1 relationship with intelligence_reports
- Normalized fields for analysis
- No cross-source merging
vessel_identity_confirmations
- Cross-source confirmation tracking
- Shows which sources report same vessel
- Calculates confidence scores
- Never modifies source data
CREATE TABLE vessel_identity_confirmations (
    vessel_key_hash          TEXT PRIMARY KEY,  -- IMO + name + flag hash
    confirming_sources       UUID[],            -- Array of source IDs
    confirming_source_names  TEXT[],            -- Human readable
    confirmation_count       INTEGER,           -- Number of sources
    confidence_score         NUMERIC            -- Based on confirmations
);
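As a rough illustration of how confirmations could be derived from reports without touching the source data, the following Python sketch groups reports by `vessel_key_hash` and counts distinct sources. The in-memory list of dictionaries and the source names are illustrative assumptions; the real system performs this in PostgreSQL.

```python
from collections import defaultdict


def build_confirmations(reports):
    """Group reports by vessel_key_hash and count the distinct sources per vessel."""
    sources_by_key = defaultdict(set)
    for report in reports:
        sources_by_key[report["vessel_key_hash"]].add(report["rfmo_shortname"])
    return {
        key: {
            "confirming_source_names": sorted(names),
            "confirmation_count": len(names),
        }
        for key, names in sources_by_key.items()
    }


# Illustrative input: two sources report vessel "abc", one source reports "def".
reports = [
    {"vessel_key_hash": "abc", "rfmo_shortname": "SOURCE_A"},
    {"vessel_key_hash": "abc", "rfmo_shortname": "SOURCE_B"},
    {"vessel_key_hash": "def", "rfmo_shortname": "SOURCE_A"},
]
confirmations = build_confirmations(reports)
print(confirmations["abc"]["confirmation_count"])  # 2
```

The input list is only read, never modified, which mirrors the rule that the confirmation layer never changes source data.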
Key Concepts
File Deduplication vs Intelligence Confirmations
File Deduplication (Infrastructure)
- Prevents same file from being imported twice
- Uses SHA-256 file hash
- Protects against processing errors
- Tracked in the data_lineage table
Intelligence Confirmations (Analysis)
- Multiple sources reporting same vessel
- Increases confidence in data
- Tracked in vessel_identity_confirmations
- Core value of intelligence platform
Confirmation Scoring
confidence_score =
base_score (by source count) +
imo_confirmation_bonus +
consistent_naming_bonus +
temporal_consistency_bonus
- 5+ sources: Very High Confidence (0.5+ score)
- 3-4 sources: High Confidence (0.3+ score)
- 2 sources: Moderate Confidence (0.1+ score)
- 1 source: Single Source (0.0 score)
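The scoring formula can be sketched in Python as follows. The base tiers come from the list above; the size of each bonus (0.1 here) and the cap at 1.0 are assumptions for illustration only, since this document does not state them.

```python
def base_score(source_count: int) -> float:
    """Base score tiers as listed in this document."""
    if source_count >= 5:
        return 0.5
    if source_count >= 3:
        return 0.3
    if source_count == 2:
        return 0.1
    return 0.0


def confidence_score(source_count, imo_confirmed, consistent_naming,
                     temporally_consistent, bonus=0.1):
    """Sum the base score and the three bonuses.

    The bonus value and the 1.0 cap are placeholders, not Ebisu's actual weights.
    """
    score = base_score(source_count)
    score += bonus if imo_confirmed else 0.0
    score += bonus if consistent_naming else 0.0
    score += bonus if temporally_consistent else 0.0
    return round(min(score, 1.0), 2)


# Three sources, matching IMO and consistent naming, no temporal bonus.
print(confidence_score(3, True, True, False))  # 0.5
```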
Change Detection
Tracks vessel changes between imports:
- NEW: Vessel first appearance
- UPDATED: Vessel data changed
- REMOVED: Vessel no longer reported
- UNCHANGED: Vessel data consistent
Risk scoring for changes:
- Flag changes: High risk
- Ownership changes: High risk
- Name changes: Medium risk
- Technical changes: Low risk
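A minimal Python sketch of this classification is shown below. It assumes each vessel is represented as a dictionary of fields in the previous and current batch, and that the field names `flag`, `owner`, and `name` are hypothetical; any other field is treated as a technical change.

```python
def classify_change(previous, current):
    """Return NEW, UPDATED, REMOVED, or UNCHANGED for one vessel between two batches."""
    if previous is None and current is None:
        raise ValueError("vessel must appear in at least one batch")
    if previous is None:
        return "NEW"
    if current is None:
        return "REMOVED"
    return "UNCHANGED" if previous == current else "UPDATED"


# Risk levels from the list above; unlisted fields count as technical (LOW).
RISK_BY_FIELD = {"flag": "HIGH", "owner": "HIGH", "name": "MEDIUM"}
RISK_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def change_risk(previous, current):
    """Return the highest risk among changed fields, or None if nothing changed."""
    changed = [k for k in set(previous) | set(current)
               if previous.get(k) != current.get(k)]
    risks = [RISK_BY_FIELD.get(k, "LOW") for k in changed]
    return max(risks, key=RISK_ORDER.__getitem__) if risks else None


before = {"name": "EXAMPLE MARU", "flag": "AA", "tonnage": 100}
after = {"name": "EXAMPLE MARU", "flag": "BB", "tonnage": 120}
print(classify_change(before, after), change_risk(before, after))  # UPDATED HIGH
```

Taking the maximum risk across changed fields is one possible design choice; a real implementation might instead log one change record per field.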
Import Process
1. Pre-Import Validation
# Check if file already imported
FILE_HASH=$(sha256sum "$INPUT_FILE" | awk '{print $1}')  # keep only the hash, not the file name
# Then check data_lineage for an existing row with this hash
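The same pre-import check can be sketched in Python. Here the set `known_hashes` is a stand-in for the hashes already recorded in data_lineage; how that lookup is actually performed is not described in this document.

```python
import hashlib


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def already_imported(path, known_hashes):
    """Return True if this exact file content has been imported before.

    known_hashes is a hypothetical in-memory stand-in for the data_lineage table.
    """
    return sha256_file(path) in known_hashes
```

A caller would skip the import when `already_imported(...)` returns True and otherwise record the new hash before proceeding to batch creation.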
2. Batch Creation
INSERT INTO intelligence_import_batches
- Track source, date, file info, and previous batch
3. Raw Import
INSERT INTO intelligence_reports
- Preserve complete raw data
- Generate vessel_key_hash for matching
- Mark previous reports as not current
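The "mark previous reports as not current" step can be pictured with a small Python sketch. It assumes reports are held as a list of dictionaries keyed by `source_id` and `vessel_key_hash`, and that the previous report's `valid_to` is set to the import date; the actual SQL and date semantics are not specified here.

```python
from datetime import date


def import_report(reports, source_id, vessel_key_hash, raw_vessel_data, today=None):
    """Append a new current report and close out the previous one for this vessel and source."""
    today = today or date.today()
    for report in reports:
        if (report["source_id"] == source_id
                and report["vessel_key_hash"] == vessel_key_hash
                and report["is_current"]):
            # Nothing is deleted: the old report is kept and only marked as superseded.
            report["is_current"] = False
            report["valid_to"] = today
    reports.append({
        "source_id": source_id,
        "vessel_key_hash": vessel_key_hash,
        "raw_vessel_data": raw_vessel_data,
        "valid_from": today,
        "valid_to": None,
        "is_current": True,
    })
    return reports


# Illustrative dates and values only.
reports = []
import_report(reports, "src-1", "abc", {"flag": "AA"}, today=date(2024, 1, 1))
import_report(reports, "src-1", "abc", {"flag": "BB"}, today=date(2024, 6, 1))
print([r["is_current"] for r in reports])  # [False, True]
```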
4. Intelligence Extraction
INSERT INTO vessel_intelligence
- Extract structured fields
- Calculate completeness scores
- Maintain source relationship
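One simple way to compute a completeness score is the fraction of expected fields that are populated. The sketch below assumes that definition and uses the vessel_intelligence column names that appear in this document's queries; the real scoring rule and field list may differ.

```python
# Field names taken from the queries in this document; the full list is an assumption.
EXPECTED_FIELDS = (
    "reported_imo",
    "reported_flag",
    "reported_owner_name",
    "authorization_status",
)


def completeness_score(record, fields=EXPECTED_FIELDS):
    """Return the share of expected fields that hold a non-empty value."""
    filled = sum(1 for field in fields if record.get(field) not in (None, ""))
    return filled / len(fields)


record = {"reported_imo": "1234567", "reported_flag": "AA", "reported_owner_name": ""}
print(completeness_score(record))  # 0.5
```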
5. Confirmation Updates
UPDATE vessel_identity_confirmations
- Find matching vessels across sources
- Update confirmation counts
- Recalculate confidence scores
6. Change Detection
INSERT INTO intelligence_change_log
- Compare with previous batch
- Identify changes and risks
- Track temporal patterns
Query Patterns
Find Multi-Source Vessels
SELECT
    vessel_name,
    vessel_imo,
    confirmation_count,
    confirming_source_names,
    confidence_score
FROM vessel_identity_confirmations
WHERE confirmation_count > 1
ORDER BY confirmation_count DESC;
Track Vessel History
SELECT
    ir.report_date,
    ir.rfmo_shortname,
    vi.reported_flag,
    vi.reported_owner_name,
    vi.authorization_status
FROM vessel_intelligence vi
JOIN intelligence_reports ir ON vi.report_id = ir.report_id
WHERE vi.reported_imo = '1234567'
ORDER BY ir.report_date DESC;
Analyze Source Overlaps
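-- One possible cross-source analysis: pair up the sources confirming each vessel.
WITH source_pairs AS (
    SELECT s1.source_name AS source_1,
           s2.source_name AS source_2,
           COUNT(*) AS shared_vessels
    FROM vessel_identity_confirmations vic
    CROSS JOIN LATERAL unnest(vic.confirming_source_names) AS s1(source_name)
    CROSS JOIN LATERAL unnest(vic.confirming_source_names) AS s2(source_name)
    WHERE s1.source_name < s2.source_name  -- count each pair once
    GROUP BY s1.source_name, s2.source_name
)
SELECT * FROM source_pairs
WHERE shared_vessels > 100
ORDER BY shared_vessels DESC;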
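The pairwise overlap count can also be sketched outside the database. This Python example assumes a mapping from `vessel_key_hash` to the list of confirming source names, which is a simplified stand-in for vessel_identity_confirmations; the source names are illustrative.

```python
from collections import Counter
from itertools import combinations


def shared_vessel_counts(confirmations):
    """Count, for every pair of sources, how many vessels both of them report."""
    pair_counts = Counter()
    for source_names in confirmations.values():
        # Sorting makes (A, B) and (B, A) the same pair; set() ignores repeats.
        for pair in combinations(sorted(set(source_names)), 2):
            pair_counts[pair] += 1
    return pair_counts


confirmations = {
    "abc": ["SOURCE_A", "SOURCE_B", "SOURCE_C"],
    "def": ["SOURCE_B", "SOURCE_A"],
    "ghi": ["SOURCE_C"],
}
counts = shared_vessel_counts(confirmations)
print(counts[("SOURCE_A", "SOURCE_B")])  # 2
```

Vessels reported by a single source contribute no pairs, so they do not appear in the result.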
Best Practices
1. Never Delete Intelligence
- Use temporal tracking (valid_to dates)
- Maintain complete audit trail
- Archive instead of delete
2. Preserve Original Data
- Store complete raw data in JSONB
- Extract to structured fields
- Never modify originals
3. Track Everything
- File lineage (source to import)
- Data lineage (transformations)
- Temporal changes (vessel history)
- Cross-source confirmations
4. Build Trust Through Confirmations
- More sources = higher confidence
- Track confirmation patterns
- Identify reliable sources
- Flag anomalies
Future Enhancements (Phase 2)
- Identity Resolution
  - Build vessel identity graph
  - Resolve name variations
  - Track vessel networks
- Risk Scoring
  - Behavioral analysis
  - Network risk propagation
  - Predictive indicators
- Intelligence Products
  - Automated alerts
  - Trend analysis
  - Relationship mapping
References
- ADR-0056: Staged Intelligence Import Architecture
- ADR-0059: PostgreSQL 17 Native Architecture
- ADR-0061: Intelligence Confirmations Are Not Duplicates
- ADR-0062: Phase 1 Data Isolation Architecture