Skip to main content

Source: ebisu/docs/adr/0053-enhanced-modular-entrypoint.md | ✏️ Edit on GitHub

ADR-001: Enhanced Modular entrypoint.sh Architecture

Status

Accepted - Implemented and operational with vessel data expansion

Context

Problem

The original entrypoint.sh script had grown to over 500 lines with complex inline logic for:

  • Database import orchestration
  • Species data processing (ASFIS, WoRMS, ITIS)
  • Harmonization workflows
  • MSC fisheries integration
  • Comprehensive validation and reporting

With the vessel data expansion requiring 40+ datasets across 6 major categories (RFMO, BADDIE, COUNTRY, COUNTRY_EU, CIVIL_SOCIETY, INTERGOV), the monolithic approach became completely unmanageable.

This created critical issues:

  • Maintainability: Impossible to debug specific import phases with 40+ vessel sources
  • Testing: Cannot test individual vessel categories or sources in isolation
  • Scalability: Adding vessel data would create a 2000+ line script
  • Recovery: Failures in one vessel source required rerunning entire 6-hour import process
  • Parallel Development: Multiple developers couldn't work on different vessel categories
  • Code Reuse: Import logic couldn't be reused across vessel source types

Requirements

  • Maintain 100% existing functionality and output
  • Enable independent testing of import phases and vessel categories
  • Support incremental execution and recovery for 40+ vessel datasets
  • Enable parallel development across 6 vessel categories
  • Preserve detailed logging and validation
  • Support Emily's active development workflow with progress tracking
  • Maintain backward compatibility with existing deployment

Decision

Architecture: Hierarchical Phase-Based Modular System

Enhanced Core Structure:

/app/scripts/
├── 🎯 entrypoint.sh # Lightweight phase orchestrator
├── manual_import.sh
├── run-migrations.sh
├── 🔧 core/ # REUSABLE UTILITIES
│ ├── logging.sh # Centralized logging functions
│ ├── database.sh # DB connection & validation
│ ├── phase-orchestrator.sh # Phase execution framework
│ └── reporting.sh # Modular reporting system
├── 📋 phases/ # 🆕 SELF-CONTAINED IMPORT PHASES
│ ├── 03-foundation-data.sh # Foundation tables (reference, country, gear)
│ ├── 04-species-data.sh # Species systems (ASFIS, WoRMS, ITIS)
│ ├── 05-harmonization.sh # WoRMS-ASFIS direct FK harmonization
│ ├── 06-msc-fisheries.sh # MSC certified fisheries
│ ├── 07-final-reporting.sh # Final status report & validation for Phases 03-06
│ ├── 08-rfmo-vessels.sh # RFMO vessel scripts
│ ├── 09-baddie-vessels.sh # BADDIE vessel scripts (SDN, WRO, IUU, Outlaw Ocean)
│ ├── 10-country-vessels.sh # Pending # Non-EU country scripts
│ ├── 11-country-eu-vessels.sh # Pending # EU country scripts
│ ├── 12-civil-society-vessels.sh # Pending # Civil Society scripts (ISSF, AP2HI, MSC)
│ ├── 13-intergov-vessels.sh # Pending # Intergovernmental scripts (PNA_TUNA, PNA_FSMA)
│ ├── 14-final-vessels-reporting.sh # Pending # Final status report & validation for Phases 08-13
│ ├── 📁 rfmo-specific/ # Modularizes RFMO vessel scripts
│ │ ├── 08-npfc-vessels.sh # ✅ DONE
│ │ ├── 08-neafc-vessels.sh # ✅ DONE
│ │ ├── 08-sprfmo-vessels.sh # 🌶️ IN PROGRESS
│ │ ├── 08-ccsbt-vessels.sh # Pending
│ │ ├── 08-ccamlr-vessels.sh # Pending
│ │ ├── 08-ffa-vessels.sh # Pending
│ │ ├── 08-gfcm-vessels.sh # Pending
│ │ ├── 08-iattc-vessels.sh # Pending
│ │ ├── 08-iccat-vessels.sh # Pending
│ │ ├── 08-iotc-vessels.sh # Pending
│ │ ├── 08-nafo-vessels.sh # Pending
│ │ ├── 08-seafo-vessels.sh # Pending
│ │ └── 08-wcpfc-vessels.sh # Pending
│ ├── 📁 baddie-specific/ # Modularizes BADDIE vessel scripts
│ │ ├── 09-sdn-glomag-vessels.sh # ✅ DONE
│ │ ├── 09-wro-vessels.sh # ✅ DONE
│ │ ├── 09-outlaw-ocean-vessels.sh # Pending
│ │ ├── 09-iuu-ccsbt-vessels.sh # Pending
│ │ ├── 09-iuu-ccamlr-vessels.sh # Pending
│ │ ├── 09-iuu-ffa-vessels.sh # Pending
│ │ ├── 09-iuu-gfcm-vessels.sh # Pending
│ │ ├── 09-iuu-iattc-vessels.sh # Pending
│ │ ├── 09-iuu-iccat-vessels.sh # Pending
│ │ ├── 09-iuu-iotc-vessels.sh # Pending
│ │ ├── 09-iuu-nafo-vessels.sh # Pending
│ │ ├── 09-iuu-neafc-vessels.sh # Pending
│ │ ├── 09-iuu-npfc-vessels.sh # Pending
│ │ ├── 09-iuu-seafo-vessels.sh # Pending
│ │ ├── 09-iuu-siofa-vessels.sh # Pending
│ │ ├── 09-iuu-sprfmo-vessels.sh # Pending
│ │ └── 09-iuu-wcpfc-vessels.sh # Pending
│ ├── 📁 country-specific/ # Modularizes Non-EU Country vessel scripts
│ │ ├── 10-usa-adfg-vessels.sh # Pending
│ │ ├── 10-fro-vessels.sh # Pending
│ │ ├── 10-mdv-vessels.sh # Pending
│ │ ├── 10-nor-vessels.sh # Pending
│ │ ├── 10-pan-vessels.sh # Pending
│ │ ├── 10-twn-siofa-vessels.sh # Pending
│ │ ├── 10-twn-pac-vessels.sh # Pending
│ │ ├── 10-twn-siofa-car-vessels.sh # Pending
│ │ ├── 10-uk-large-vessels.sh # Pending
│ │ ├── 10-uk-small-vessels.sh # Pending
│ │ ├── 10-mex-large-vessels.sh # Pending
│ │ ├── 10-mex-small-vessels.sh # Pending
│ │ ├── 10-rus-vessels.sh # Pending
│ │ └── 10-per-vessels.sh # Pending
│ ├── 📁 country-eu-specific/ # Modularizes EU Country vessel scripts
│ │ ├── 11-bel-vessels.sh # Pending (21 EU countries total)
│ │ ├── 11-bgr-vessels.sh # Pending
│ │ └── ... (19 more EU country scripts) # Pending
│ ├── 📁 civil-society-specific/ # Modularizes Civil Society vessel scripts
│ │ ├── 12-issf-uvi-vessels.sh # Pending
│ │ ├── 12-issf-pvr-vessels.sh # Pending
│ │ ├── 12-issf-ps-vessels.sh # Pending
│ │ ├── 12-issf-vosi-vessels.sh # Pending
│ │ ├── 12-ap2hi-vessels.sh # Pending
│ │ └── 12-msc-vessels.sh # Pending
│ └── 📁 intergov-specific/ # Modularizes Intergovernmental vessel scripts
│ ├── 13-pna-tuna-vessels.sh # Pending
│ └── 13-pna-fsma-vessels.sh # Pending
├── 📥 import/ # Existing import scripts
│ └── 🚢 vessels/ # Vessel-specific import scripts
│ ├── create_sources_vessels.sql
│ ├── load_sources_vessels.sh
│ └── data/
│ ├── BADDIE/ # Baddie + Blacklist vessel data
│ │ ├── clean_sdn_glomag.sh # ✅ DONE
│ │ ├── clean_wro.sh # ✅ DONE
│ │ ├── load_sdn_glomag.sh # ✅ DONE
│ │ ├── load_wro.sh # ✅ DONE
│ │ ├── setup_sdn_glomag_loading.sql # ✅ DONE
│ │ └── setup_wro_loading.sql # ✅ DONE
│ ├── CIVIL_SOCIETY/ # Civil Society Organizations vessel data
│ ├── COUNTRY/ # Non-EU Country vessel data
│ ├── COUNTRY_EU/ # EU Country vessel data
│ ├── INTERGOV/ # Intergovernmental Organizations vessel data
│ └── RFMO/ # Regional Fisheries Management Organization vessel data
│ ├── clean_npfc_vessels.sh # ✅ DONE
│ ├── clean_neafc_vessels.sh # ✅ DONE
│ ├── clean_sprfmo_vessels.sh # 🌶️ IN PROGRESS
│ ├── load_npfc_vessels.sh # ✅ DONE
│ ├── load_neafc_vessels.sh # ✅ DONE
│ ├── load_sprfmo_vessels.sh # 🌶️ IN PROGRESS
│ ├── setup_npfc_loading.sql # ✅ DONE
│ ├── setup_neafc_loading.sql # ✅ DONE
│ └── setup_sprfmo_loading.sql# 🌶️ IN PROGRESS
└── 🔄 preprocessing/ # Data preprocessing scripts
├── clean_asfis_data.py # Special cleaning for asfis.ts
└── msc_fisheries_preprocessing.py # Special cleaning for msc fisheries

Emily's Progress Tracking System:

  • 🌶️ = IN PROGRESS, NOT AVAILABLE YET. EM ACTIVE (Emily is currently working on it)
  • ✅ = DONE
  • (No emoji) = Pending (not yet started)

Enhanced Design Principles

  1. Hierarchical Phase Independence: Each major phase (08-13) and sub-phase can be executed independently
  2. Category-Based Organization: Vessel data organized by logical categories (RFMO, BADDIE, etc.)
  3. Utility Sharing: Common functions extracted to /core/ directory with vessel-specific extensions
  4. Progress Tracking: Built-in status tracking for Emily's active development workflow
  5. Fallback Safety: Main entrypoint has fallback to original monolithic script
  6. Configuration-Driven Expansion: Reporting system automatically discovers new vessel tables/phases
  7. Backward Compatibility: Identical user experience and output format
  8. Parallel Development: Multiple developers can work on different vessel categories simultaneously

Enhanced Phase Execution Flow

Vessel Data Architecture

Database Schema Integration

The vessel system includes comprehensive schema definitions supporting:

Core Vessel Identifiers:

  • IMO, IRCS, MMSI, National Registry, EU CFR
  • Foreign key relationships to country_iso for flag states
  • Support for 40+ vessel source types across 6 categories

Source Type Enums:

-- Vessel source types with comprehensive RFMO and organizational coverage
vessel_source_type_enum: 'RFMO', 'RFMO_IUU', 'COUNTRY', 'COUNTRY_IUU',
'INTERGOVERNMENTAL', 'ORGANIZATION', 'CERTIFICATION_BODY',
'CIVIL_SOCIETY', 'OUTLAW_OCEAN', 'WRO', 'SDN_GLOMAG'

-- External identifier types covering all RFMOs
external_identifier_type_enum: 'RFMO_CCAMLR', 'RFMO_CCSBT', 'RFMO_FFA',
'RFMO_GFCM', 'RFMO_IATTC', 'RFMO_ICCAT', 'RFMO_IOTC', 'RFMO_NAFO',
'RFMO_NEAFC', 'RFMO_NPFC', 'RFMO_SEAFO', 'RFMO_SIOFA', 'RFMO_SPRFMO',
'RFMO_WCPFC', 'HULL_ID', 'AP2HI_ID', 'ISSF_TUVI', 'ADFG_NO'

Enhanced Original Sources Table:

CREATE TABLE original_sources_vessels (
source_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_shortname TEXT UNIQUE NOT NULL,
source_fullname TEXT NOT NULL,
source_types TEXT[] NOT NULL, -- Array for multi-type sources
rfmo_id UUID, -- FK to rfmos(id) for RFMO-specific sources
country_id UUID, -- FK to country_iso(id) for country-specific sources
source_urls JSONB, -- Combined URLs as JSON array
-- ... additional metadata fields
);

Implementation Details

Enhanced Phase Execution Pattern:

# Major phase execution with optional flag for vessel phases
if execute_phase "08-rfmo-vessels" "true"; then
log_success "RFMO vessel phase completed via modular script"
else
log_warning "RFMO vessel phase had issues - continuing with container startup"
log_warning "Note: RFMO vessel data is optional - check if data files are available"
fi

# Sub-phase execution within major phases
execute_rfmo_subphase "npfc" "✅" # Completed
execute_rfmo_subphase "neafc" "🌶️" # In progress
execute_rfmo_subphase "sprfmo" "🌶️" # In progress

Enhanced Utility Function Pattern:

# Each phase script sources required utilities plus vessel-specific functions
source /app/scripts/core/logging.sh
source /app/scripts/core/database.sh
source /app/scripts/core/vessel-utilities.sh # New vessel-specific utilities

# Vessel-specific helper functions
get_vessel_sources_by_type() { ... }
vessel_source_has_type() { ... }
mark_vessel_source_loaded() { ... }

Enhanced Error Handling Strategy:

  • Phase-level validation with detailed logging
  • Category-level recovery (can skip entire vessel categories)
  • Source-level granularity (can skip individual vessel sources)
  • Progress preservation across development sessions
  • Emily's active development status tracking

Enhanced Consequences

Positive

Maintainability:

  • Individual vessel categories can be developed and debugged independently
  • 40+ vessel sources organized into logical, manageable groups
  • Emily can work on specific categories without affecting others
  • Clear separation of concerns reduces cognitive complexity for large vessel system
  • Reusable utilities eliminate code duplication across vessel categories

Development Velocity:

  • Parallel development across 6 vessel categories by multiple developers
  • Individual vessel sources can be tested: bash /app/scripts/phases/rfmo-specific/08-npfc-vessels.sh
  • Progress tracking prevents duplicate work and shows system status
  • Easier onboarding for new team members working on specific vessel categories
  • Reduced merge conflicts with hierarchical organization

Testing & Quality:

  • Category-level testing: bash /app/scripts/phases/08-rfmo-vessels.sh
  • Source-level testing: bash /app/scripts/phases/rfmo-specific/08-npfc-vessels.sh
  • Unit testing possible for vessel-specific utility functions
  • Integration testing maintains full-system validation across 40+ sources
  • Progressive validation as Emily completes each vessel source

Scalability:

  • Ready for vessel data expansion beyond 40+ current datasets
  • Configuration-driven approach for future vessel categories
  • Reporting system automatically adapts to new vessel phases and sources
  • Hierarchical organization supports unlimited vessel source expansion

Operations:

  • Failed vessel sources can be re-run individually without affecting others
  • Category-level recovery (skip problematic vessel category, continue others)
  • Incremental deployment and rollback capabilities for vessel phases
  • Better error localization across complex vessel data system
  • Production-ready with vessel sources as optional components

Negative

Complexity:

  • More files to manage (60+ scripts vs 1 monolithic for vessel system)
  • Hierarchical dependency management across vessel categories
  • Requires understanding of both phase orchestration and vessel category organization
  • Emily's progress tracking adds coordination overhead

Debugging:

  • Stack traces may span multiple files across vessel category hierarchy
  • Vessel phase interdependencies must be carefully managed
  • Slightly more complex failure analysis across vessel categories
  • Progress tracking requires coordination during parallel development

Risks and Mitigations

RiskImpactMitigation
Vessel phase dependency failuresHighBuilt-in dependency checking and category-level validation
File system permissions across vessel directoriesMediumExplicit chmod in deployment process for all vessel subdirectories
Core vessel utility corruptionHighFallback to original monolithic script + vessel-specific fallbacks
Vessel phase ordering issuesMediumClear naming convention and documentation for vessel categories
Emily's progress conflictsMediumBuilt-in status checking and conflict resolution in progress tracking
Vessel source data inconsistenciesHighComprehensive validation at source, category, and system levels

Enhanced Migration Strategy

Completed Phases ✅

  1. Phase 1: Extract utility functions (logging, database) - COMPLETED
  2. Phase 2: Extract foundation data phase - COMPLETED
  3. Phase 3: Extract species data phase - COMPLETED
  4. Phase 4: Extract harmonization phase - COMPLETED
  5. Phase 5: Extract MSC fisheries phase - COMPLETED
  6. Phase 6: Extract final reporting - COMPLETED
  7. Phase 7: Create vessel sources table and loading - COMPLETED

Current Implementation ✅🌶️

  1. Phase 8: RFMO vessel category - PARTIALLY COMPLETED

    • ✅ NPFC vessels (complete)
    • 🌶️ NEAFC vessels (Emily actively developing)
    • 🌶️ SPRFMO vessels (Emily actively developing)
    • Pending: 9 additional RFMO sources
  2. Phase 9: BADDIE vessel category - PARTIALLY COMPLETED

    • ✅ SDN GLOMAG vessels (complete)
    • 🌶️ WRO vessels (Emily actively developing)
    • 🌶️ Outlaw Ocean vessels (Emily actively developing)
    • 🌶️ 14 IUU sources across all RFMOs (Emily actively developing)

Planned Phases (Emily's Pipeline)

  1. Phase 10: Country vessel category - PLANNED

    • 14 non-EU country sources
  2. Phase 11: EU Country vessel category - PLANNED

    • 21 EU member state sources
  3. Phase 12: Civil Society vessel category - PLANNED

    • 6 civil society organization sources (ISSF variants, AP2HI, MSC)
  4. Phase 13: Intergovernmental vessel category - PLANNED

    • 2 Pacific island organization sources (PNA)
  5. Phase 14: Final vessel reporting - PLANNED

    • Comprehensive validation across all vessel categories

Enhanced Success Metrics

System Level ✅

  • Functionality: 100% preservation of existing import behavior ✅
  • Performance: No degradation in import times ✅
  • Reliability: Maintained comprehensive validation ✅
  • Maintainability: Individual phase testing capability ✅
  • Scalability: Architecture supports vessel data expansion ✅

Vessel System Level 🌶️✅

  • Vessel Categories: 6 major categories properly organized ✅
  • Source Management: 40+ vessel sources systematically organized ✅
  • Progress Tracking: Emily's development status clearly visible ✅
  • Parallel Development: Multiple developers can work independently ✅
  • Category Independence: Vessel categories can be developed/tested separately ✅

Development Workflow ✅🌶️

  • Active Development: Emily can continue vessel development seamlessly 🌶️
  • Status Visibility: Clear understanding of what's done/in-progress/pending ✅
  • Individual Source Testing: Each vessel source can be tested independently ✅
  • Recovery Options: Failed vessel sources don't break entire system ✅

Enhanced Implementation Notes

Compatibility

  • Original entrypoint.sh.original maintained as backup
  • Fallback mechanism built into new entrypoint
  • Identical log output and validation messages preserved
  • Vessel phases are optional - system works without vessel data

Current Development Status (Emily's Key)

  • 🌶️ = Emily is actively working on it
  • ✅ = Completed and operational
  • (No emoji) = Planned but not yet started

Emily's Active Work Areas

  • NEAFC vessels (🌶️): RFMO category development
  • SPRFMO vessels (🌶️): RFMO category development
  • WRO vessels (🌶️): BADDIE category development
  • Multiple IUU sources (🌶️): BADDIE category expansion

Vessel Source Integration

  • RFMO Integration: Automatic mapping from source names to rfmos table UUIDs
  • Country Integration: Automatic mapping from alpha3 codes to country_iso table UUIDs
  • Source Types: Enum validation for vessel source types with comprehensive coverage
  • Foreign Keys: All vessel data references original_sources_vessels via UUID foreign keys

Future Considerations

  • Vessel categories can be extended beyond current 6 categories
  • Source discovery mechanism for automatic detection of new vessel datasets
  • Parallel execution potential for independent vessel categories
  • Consider configuration files (YAML/JSON) for vessel source definitions
  • International expansion beyond current geographic coverage

Documentation Requirements

  • Update deployment procedures to include vessel phase script permissions
  • Document vessel category execution order and dependencies
  • Create troubleshooting guide for vessel-specific issues
  • Maintain Emily's progress tracking documentation
  • Document vessel source onboarding process for new datasets

Industry Best Practices and Enterprise Alignment

"Golden Rule" Database Architecture Alignment

Our modular approach aligns with the "Golden Rule" of enterprise database architecture: "Each system component should have a single, well-defined responsibility and be independently testable, deployable, and maintainable."

How Our Architecture Implements the Golden Rule:

1. Single Responsibility Principle

  • Phase-Level: Each phase (08-rfmo, 09-baddie) handles one vessel category
  • Source-Level: Each script (08-npfc-vessels.sh) handles one data source
  • Utility-Level: Core utilities have single purposes (logging, database operations)
  • Schema-Level: Each table serves one domain (associates, authorizations, core vessel data)

2. Independent Testing and Deployment

  • Individual phases can be tested: bash /app/scripts/phases/08-rfmo-vessels.sh
  • Individual sources can be tested: bash /app/scripts/phases/rfmo-specific/08-npfc-vessels.sh
  • Failed components don't break entire system
  • Rollback possible at phase or source level

3. Separation of Concerns

  • Data Pipeline: Raw → Cleaned → Staged → Database (clear boundaries)
  • Processing Logic: Clean scripts vs Load scripts vs Validation scripts
  • Error Handling: Phase-level vs Source-level vs System-level
  • Configuration: Source metadata vs Processing logic vs Database schema

Enterprise Database Best Practices

Google/Facebook-Style Data Pipeline Architecture

  • Idempotent Operations: Each phase can be re-run safely without data corruption
  • Lineage Tracking: Every data element traced to original source via original_sources_vessels
  • Horizontal Scaling: Vessel categories can be processed independently/in parallel
  • Graceful Degradation: System remains functional even if vessel categories fail

Netflix/Uber-Style Microservices Patterns

  • Domain Boundaries: Vessel data isolated from species/reference data
  • Service Independence: Vessel phases don't depend on completion of other phases
  • Configuration-Driven: New vessel sources added via configuration, not code changes
  • Health Checks: Built-in validation and reporting at each level

Amazon/Microsoft-Style Enterprise Data Management

  • Data Catalog: original_sources_vessels serves as comprehensive source catalog
  • Schema Evolution: Vessel schemas can evolve without breaking existing data
  • Audit Trail: Complete processing history maintained via logging and status tracking
  • Recovery Mechanisms: Multiple levels of rollback and retry capabilities

GitLab/GitHub-Style CI/CD Integration

  • Pipeline-as-Code: All phases defined as executable scripts in version control
  • Progressive Deployment: Phases can be rolled out incrementally
  • Feature Flags: Emily's progress tracking (🌶️) serves as feature flag system
  • Rollback Strategy: Fallback mechanisms at every level

Comparison with Industry Standards

Architecture PatternIndustry ExampleOur Implementation
Domain-Driven DesignSpotify's BackendVessel domain isolated with clear boundaries
Event-Driven ArchitectureLinkedIn's Data PipelinePhase completion triggers next phase execution
Circuit Breaker PatternNetflix's ResilienceFailed phases don't cascade to other phases
Bulkhead PatternDocker ContainerizationVessel categories isolated from each other
Saga PatternUber's Distributed TransactionsMulti-phase vessel import with compensation
CQRSStack OverflowSeparate read (reporting) and write (import) concerns

Data Governance and Compliance Alignment

GDPR/Privacy-by-Design

  • Data Minimization: Only vessel-relevant data collected and processed
  • Source Attribution: Every data element traceable to original source
  • Right to Deletion: Source-level data removal capabilities
  • Processing Transparency: Complete audit trail of all data transformations

SOX/Audit Compliance

  • Change Control: All processing logic under version control
  • Audit Trail: Complete lineage from raw data to final database state
  • Segregation of Duties: Different phases handled by different systems/people
  • Validation Controls: Built-in data quality checks at every level

Vessel Associates and Authorization Architecture

Advanced Relationship Management System

Our vessel associates architecture implements enterprise-grade entity resolution and relationship tracking, addressing the complex reality of vessel ownership and operations in the maritime industry.

Master Entity Deduplication Pattern

Problem Solved: Maritime entities (companies, individuals) appear across multiple sources with different spellings, addresses, and identifiers. Without deduplication, we'd have thousands of duplicate entities.

Solution: Two-tier architecture with master entity resolution:

-- Master entities table (deduplicated canonical records)
vessel_associates_master (
associate_master_uuid UUID PRIMARY KEY,
canonical_name VARCHAR(200), -- Normalized name for deduplication
name_variations JSONB, -- All spelling variations seen
primary_address TEXT, -- Best-quality address
headquarters_country_id UUID, -- Primary business location
business_type VARCHAR(100), -- Company/Individual/Organization
incorporation_date DATE, -- When available from sources
regulatory_identifiers JSONB -- Tax IDs, business registrations
);

-- Individual relationships (source-specific instances)
vessel_associates (
associate_uuid UUID PRIMARY KEY,
vessel_uuid UUID, -- FK to vessels
associate_master_uuid UUID, -- FK to master entity
source_id UUID, -- Which source reported this relationship
associate_type associate_type_enum, -- Role in relationship
original_name VARCHAR(200), -- Exact name as reported by source
relationship_start_date DATE, -- When relationship began
relationship_end_date DATE, -- When relationship ended (if known)
ownership_percentage DECIMAL, -- For beneficial ownership
reg_number VARCHAR(150), -- License/registration numbers
sdn_uuid UUID, -- Link to SDN sanctions data
wro_uuid UUID -- Link to WRO enforcement data
);

Associate Type Taxonomy

Our comprehensive associate type system covers the full spectrum of vessel relationships found across 40+ data sources:

associate_type_enum: [
'BENEFICIAL_OWNER', -- Ultimate economic interest holders
'CHARTERER', -- Short-term vessel leasers
'FISH_MASTER', -- Fishing operations supervisors
'FISH_PRODUCER_ORGANIZATION', -- Fishing cooperatives/organizations
'OPERATING_COMPANY', -- Day-to-day vessel operators
'OPERATOR', -- Individual vessel operators
'OTHER_BENEFICIARY', -- Other economic interest holders
'OWNER', -- Legal vessel owners
'OWNING_COMPANY', -- Corporate vessel owners
'SDN_LINKED_ENTITY', -- Sanctions-connected entities
'VESSEL_MASTER', -- Ship captains/masters
'WRO_COMPANY' -- World-Check Risk intelligence entities
]

Enterprise-Grade Authorization System

Comprehensive Fishing Authorization Tracking: Our authorization system captures the complex regulatory landscape governing commercial fishing vessels worldwide.

vessel_authorizations (
authorization_uuid UUID PRIMARY KEY,
vessel_uuid UUID, -- FK to vessels
source_id UUID, -- Reporting authority
authorization_type authorization_type_enum,
license_number VARCHAR(100), -- Official license identifier
start_date DATE, -- Authorization validity start
end_date DATE, -- Authorization expiration
rfmo_id UUID, -- Regional authority (ICCAT, WCPFC, etc.)
region_description TEXT, -- Geographic scope description
fao_area_ids JSONB, -- Array of FAO area UUIDs
species_ids JSONB, -- Array of authorized species UUIDs
additional_data JSONB -- Quotas, conditions, restrictions
);

Authorization Type Coverage:

authorization_type_enum: [
'FISHING_AUTHORIZATION', -- General fishing permissions
'FISHING_LICENSE', -- Formal fishing licenses
'TRANSSHIPMENT_AUTHORIZATION', -- At-sea transfer permissions
'CARRIER_AUTHORIZATION', -- Fish transport permissions
'OBSERVER_AUTHORIZATION', -- Scientific observer requirements
'TUNA_AUTHORIZATION', -- Tuna-specific permissions
'EXEMPT_VESSEL', -- Regulatory exemptions
'SUPPORT_VESSEL_AUTHORIZATION', -- Support/supply vessel permissions
'OTHER_AUTHORIZATION' -- Miscellaneous regulatory permissions
]

Advanced Querying Capabilities

Complex Ownership Analysis

-- Find all vessels with common beneficial owners
SELECT
v1.vessel_name as vessel_1,
v2.vessel_name as vessel_2,
am.canonical_name as common_owner
FROM vessels v1
JOIN vessel_associates va1 ON v1.vessel_uuid = va1.vessel_uuid
JOIN vessel_associates_master am ON va1.associate_master_uuid = am.associate_master_uuid
JOIN vessel_associates va2 ON am.associate_master_uuid = va2.associate_master_uuid
JOIN vessels v2 ON va2.vessel_uuid = v2.vessel_uuid
WHERE va1.associate_type = 'BENEFICIAL_OWNER'
AND va2.associate_type = 'BENEFICIAL_OWNER'
AND v1.vessel_uuid != v2.vessel_uuid;

Regulatory Compliance Analysis

-- Find vessels with expired authorizations in specific regions
SELECT
v.vessel_name,
v.imo,
va.license_number,
va.end_date,
r.rfmo_name
FROM vessels v
JOIN vessel_authorizations va ON v.vessel_uuid = va.vessel_uuid
JOIN rfmos r ON va.rfmo_id = r.id
WHERE va.end_date < CURRENT_DATE
AND va.authorization_type = 'FISHING_AUTHORIZATION'
AND r.rfmo_acronym IN ('ICCAT', 'WCPFC', 'IOTC');

Sanctions and Risk Intelligence Integration

-- Find vessels with sanctions-linked associates
SELECT
v.vessel_name,
v.imo,
am.canonical_name as sanctioned_entity,
va.associate_type,
sdn.sdn_type
FROM vessels v
JOIN vessel_associates va ON v.vessel_uuid = va.vessel_uuid
JOIN vessel_associates_master am ON va.associate_master_uuid = am.associate_master_uuid
JOIN vessels_sdn_simple sdn ON va.sdn_uuid = sdn.sdn_uuid
WHERE sdn.sdn_type IN ('SDN', 'BLOCKED_PERSON', 'DENIED_PERSON');

Data Quality and Validation

Cross-Source Relationship Validation

  • Consistency Checks: Same vessel-entity relationships across multiple sources
  • Temporal Validation: Relationship dates align with vessel operational periods
  • Hierarchical Validation: Corporate ownership structures logically consistent
  • Regulatory Validation: Authorizations align with flag state and operational areas

Master Entity Quality Metrics

  • Name Standardization Score: Measure of canonical name quality
  • Address Completeness: Percentage of complete address information
  • Source Coverage: How many sources report this entity
  • Relationship Density: Number of vessel connections per entity

Integration with Global Compliance Systems

OFAC/Sanctions Integration

  • Real-time Screening: New associates checked against sanctions lists
  • Historical Analysis: Track sanctions status changes over time
  • Network Analysis: Identify indirect sanctions exposure through relationships
  • Alert System: Automated notifications for sanctions-linked vessels

RFMO Compliance Tracking

  • Multi-RFMO Vessels: Track vessels authorized across multiple regions
  • Quota Management: Link authorizations to catch quotas and limits
  • Compliance History: Track authorization violations and penalties
  • Cross-Reference Validation: Verify RFMO data consistency across sources

This associates and authorization architecture positions our system as a comprehensive vessel intelligence platform, capable of supporting complex maritime compliance, risk assessment, and regulatory analysis use cases found in enterprise-grade systems used by government agencies, financial institutions, and maritime industry leaders.


Decision Date: August 2025
Stakeholders: Database Engineering Team, DevOps Team, Emily (Lead Vessel Data Developer)
Review Date: December 2025 (post vessel system completion)

Current Priority: Support Emily's active vessel development while maintaining system stability and enabling parallel development across vessel categories.