Source: ebisu/docs/adr/0061-intelligence-confirmations-not-duplicates.md

ADR-0061: Intelligence Confirmations Are Not Duplicates

Status: Accepted
Date: 2025-01-11
Stakeholders: Development team, Intelligence analysts

Context

During Phase 1 implementation, there was potential confusion between:

File deduplication - Preventing the same file from being imported twice
Intelligence confirmations - Multiple sources reporting the same vessel

This distinction is CRITICAL for an intelligence platform, where multiple reports of the same entity increase confidence rather than create redundancy.

Decision

We explicitly separate these two concepts:

1. File-Level Deduplication (Good)

Prevent reimporting the exact same file (same SHA-256 hash)
Tracked in data_lineage table
Purpose: Avoid processing waste and data corruption
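
The file-level check can be sketched in Python as follows, assuming the hash is computed over the raw file bytes. The function names and the in-memory imported_hashes set are hypothetical; in the real system the lookup would go against data_lineage.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large import files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_already_imported(path: Path, imported_hashes: set[str]) -> bool:
    """True when a byte-identical file has been imported before."""
    # imported_hashes is a stand-in for the hashes stored in data_lineage.
    return file_sha256(path) in imported_hashes
```

Only byte-identical files match under this check, so a corrected or updated file from the same source would still be treated as a new import.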

2. Intelligence Confirmations (Critical)

Multiple sources reporting the same vessel are CONFIRMATIONS
Each source maintains its own records in intelligence_reports
Cross-source confirmations tracked in vessel_identity_confirmations
More sources = higher confidence score

Architecture

-- Each source has its own data
intelligence_reports
├── source_id (RFMO/source identifier)
├── raw_vessel_data (complete original data)
├── vessel_key_hash (for cross-source matching)
└── data_hash (includes source-specific fields)

-- Confirmation tracking
vessel_identity_confirmations
├── vessel_key_hash (IMO + name + flag)
├── confirming_sources[] (array of source IDs)
├── confirmation_count (number of sources)
└── confidence_score (based on confirmations)
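
The two hashes serve different purposes: vessel_key_hash should be identical across sources for the same vessel, while data_hash should differ whenever the source or its raw data differs. A minimal Python sketch of that difference follows. The normalisation (trim and uppercase), the "|" delimiter, the JSON serialisation, and both function signatures are assumptions made for illustration; this ADR does not specify them.

```python
import hashlib
import json


def vessel_key_hash(imo: str, name: str, flag: str) -> str:
    """Source-independent identity hash over IMO + name + flag."""
    # Assumed normalisation: trim whitespace and uppercase each field.
    parts = [imo.strip().upper(), name.strip().upper(), flag.strip().upper()]
    return hashlib.sha256("|".join(parts).encode("utf-8")).hexdigest()


def data_hash(source_id: str, raw_vessel_data: dict) -> str:
    """Source-specific hash: the same vessel from two sources hashes differently."""
    # sort_keys gives a stable serialisation regardless of dict ordering.
    payload = json.dumps(
        {"source_id": source_id, "data": raw_vessel_data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With these definitions, a NAFO report and a NEAFC report of the same vessel would share one vessel_key_hash but carry two distinct data_hash values, which is what lets both reports coexist while still being matched to each other.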

Implementation

File Deduplication

# Check whether this exact file was already imported.
# Assumes FILE_HASH holds the file's SHA-256 digest and that execute_sql
# prints nothing when no row matches.
EXISTING_LINEAGE=$(execute_sql "
    SELECT lineage_id
    FROM data_lineage
    WHERE source_file_hash = '$FILE_HASH'
")

if [[ -n "$EXISTING_LINEAGE" ]]; then
    log_warning "This exact file was already imported"
    # Optionally exit or continue with new version
fi

Intelligence Confirmations

-- Multiple reports = confirmations
SELECT 
    vessel_name,
    confirmation_count,
    confirming_source_names,
    confidence_score
FROM vessel_identity_confirmations
WHERE confirmation_count > 1
ORDER BY confirmation_count DESC;
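
One way such confirmation counts could be derived is by grouping intelligence_reports on vessel_key_hash and counting distinct sources. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The two-column table, the sample rows, and the function name confirmed_vessels are illustrative assumptions, not the project's actual schema or code.

```python
import sqlite3


def confirmed_vessels(rows):
    """Return (vessel_key_hash, confirmation_count) for vessels seen by 2+ sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE intelligence_reports (source_id TEXT, vessel_key_hash TEXT)"
    )
    conn.executemany("INSERT INTO intelligence_reports VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT vessel_key_hash,
               COUNT(DISTINCT source_id) AS confirmation_count
        FROM intelligence_reports
        GROUP BY vessel_key_hash
        HAVING COUNT(DISTINCT source_id) > 1
        ORDER BY confirmation_count DESC, vessel_key_hash
        """
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample: two sources report vessel "k1", one source reports "k2".
sample = [("NAFO", "k1"), ("NEAFC", "k1"), ("NAFO", "k2")]
print(confirmed_vessels(sample))  # [('k1', 2)]
```

Counting distinct source_id values, rather than rows, keeps a source that reports the same vessel twice from inflating the count, in line with confirmation_count being defined as the number of sources.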

Consequences

Positive

No intelligence data is ever lost
Cross-source validation increases data confidence
Can track how many sources confirm each vessel
Proper audit trail of all reports

Negative

More storage required (an accepted cost, since every report must be retained)
Must carefully distinguish file-level deduplication from record-level deduplication, which must never be applied to intelligence reports

Neutral

Requires clear documentation and training
Import scripts must handle both concepts

Examples

Correct: Multiple Sources Confirming

NAFO reports vessel "OCEAN STAR" IMO 1234567
NEAFC reports vessel "OCEAN STAR" IMO 1234567
→ 2 intelligence_reports records
→ 1 vessel_identity_confirmations record with confirmation_count = 2
→ Higher confidence score
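
The scenario above can be traced with a small in-memory model. This is a sketch only: the IntelligenceStore class and its methods are hypothetical, a plain (IMO, name) tuple stands in for vessel_key_hash, and the confidence score is omitted because this ADR does not define its formula.

```python
from dataclasses import dataclass, field


@dataclass
class IntelligenceStore:
    reports: list = field(default_factory=list)        # one entry per report, never skipped
    confirmations: dict = field(default_factory=dict)  # vessel key -> set of source IDs

    def record_report(self, source_id: str, name: str, imo: str) -> None:
        # Always keep the report, even if another source already covered the vessel.
        self.reports.append({"source_id": source_id, "name": name, "imo": imo})
        # Hypothetical key: a plain tuple stands in for vessel_key_hash.
        self.confirmations.setdefault((imo, name), set()).add(source_id)

    def confirmation_count(self, name: str, imo: str) -> int:
        # Number of distinct sources that have reported this vessel.
        return len(self.confirmations.get((imo, name), set()))


store = IntelligenceStore()
store.record_report("NAFO", "OCEAN STAR", "1234567")
store.record_report("NEAFC", "OCEAN STAR", "1234567")
print(len(store.reports))                                 # 2
print(len(store.confirmations))                           # 1
print(store.confirmation_count("OCEAN STAR", "1234567"))  # 2
```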

Correct: Preventing File Re-import

Import NAFO_vessels_2024-12.csv (hash: abc123...)
Re-run same import
→ System detects same file hash
→ Prevents duplicate import
→ Protects data integrity

Incorrect: Treating Confirmations as Duplicates

❌ NEVER DO THIS:
"NAFO already reported this vessel, skip NEAFC's report"
→ This loses valuable confirmation data

References

ADR-0056: Intelligence Platform Principles
ADR-0059: PostgreSQL 17 Native Architecture
CLAUDE.md: Critical principle that intelligence reports are never treated as duplicates
