Source:
ebisu/docs/adr/0053-enhanced-modular-entrypoint.md| ✏️ Edit on GitHub
ADR-001: Enhanced Modular entrypoint.sh Architecture
Status
Accepted - Implemented and operational with vessel data expansion
Context
Problem
The original entrypoint.sh script had grown to over 500 lines with complex inline logic for:
- Database import orchestration
- Species data processing (ASFIS, WoRMS, ITIS)
- Harmonization workflows
- MSC fisheries integration
- Comprehensive validation and reporting
With the vessel data expansion requiring 40+ datasets across 6 major categories (RFMO, BADDIE, COUNTRY, COUNTRY_EU, CIVIL_SOCIETY, INTERGOV), the monolithic approach became completely unmanageable.
This created critical issues:
- Maintainability: Impossible to debug specific import phases with 40+ vessel sources
- Testing: Cannot test individual vessel categories or sources in isolation
- Scalability: Adding vessel data would create a 2000+ line script
- Recovery: Failures in one vessel source required rerunning entire 6-hour import process
- Parallel Development: Multiple developers couldn't work on different vessel categories
- Code Reuse: Import logic couldn't be reused across vessel source types
Requirements
- Maintain 100% existing functionality and output
- Enable independent testing of import phases and vessel categories
- Support incremental execution and recovery for 40+ vessel datasets
- Enable parallel development across 6 vessel categories
- Preserve detailed logging and validation
- Support Emily's active development workflow with progress tracking
- Maintain backward compatibility with existing deployment
Decision
Architecture: Hierarchical Phase-Based Modular System
Enhanced Core Structure:
/app/scripts/
├── 🎯 entrypoint.sh # Lightweight phase orchestrator
├── manual_import.sh
├── run-migrations.sh
├── 🔧 core/ # REUSABLE UTILITIES
│ ├── logging.sh # Centralized logging functions
│ ├── database.sh # DB connection & validation
│ ├── phase-orchestrator.sh # Phase execution framework
│ └── reporting.sh # Modular reporting system
├── 📋 phases/ # 🆕 SELF-CONTAINED IMPORT PHASES
│ ├── 03-foundation-data.sh # Foundation tables (reference, country, gear)
│ ├── 04-species-data.sh # Species systems (ASFIS, WoRMS, ITIS)
│ ├── 05-harmonization.sh # WoRMS-ASFIS direct FK harmonization
│ ├── 06-msc-fisheries.sh # MSC certified fisheries
│ ├── 07-final-reporting.sh # Final status report & validation for Phases 03-06
│ ├── 08-rfmo-vessels.sh # RFMO vessel scripts
│ ├── 09-baddie-vessels.sh # BADDIE vessel scripts (SDN, WRO, IUU, Outlaw Ocean)
│ ├── 10-country-vessels.sh # Pending # Non-EU country scripts
│ ├── 11-country-eu-vessels.sh # Pending # EU country scripts
│ ├── 12-civil-society-vessels.sh # Pending # Civil Society scripts (ISSF, AP2HI, MSC)
│ ├── 13-intergov-vessels.sh # Pending # Intergovernmental scripts (PNA_TUNA, PNA_FSMA)
│ ├── 14-final-vessels-reporting.sh # Pending # Final status report & validation for Phases 08-13
│ ├── 📁 rfmo-specific/ # Modularizes RFMO vessel scripts
│ │ ├── 08-npfc-vessels.sh # ✅ DONE
│ │ ├── 08-neafc-vessels.sh # ✅ DONE
│ │ ├── 08-sprfmo-vessels.sh # 🌶️ IN PROGRESS
│ │ ├── 08-ccsbt-vessels.sh # Pending
│ │ ├── 08-ccamlr-vessels.sh # Pending
│ │ ├── 08-ffa-vessels.sh # Pending
│ │ ├── 08-gfcm-vessels.sh # Pending
│ │ ├── 08-iattc-vessels.sh # Pending
│ │ ├── 08-iccat-vessels.sh # Pending
│ │ ├── 08-iotc-vessels.sh # Pending
│ │ ├── 08-nafo-vessels.sh # Pending
│ │ ├── 08-seafo-vessels.sh # Pending
│ │ └── 08-wcpfc-vessels.sh # Pending
│ ├── 📁 baddie-specific/ # Modularizes BADDIE vessel scripts
│ │ ├── 09-sdn-glomag-vessels.sh # ✅ DONE
│ │ ├── 09-wro-vessels.sh # ✅ DONE
│ │ ├── 09-outlaw-ocean-vessels.sh # Pending
│ │ ├── 09-iuu-ccsbt-vessels.sh # Pending
│ │ ├── 09-iuu-ccamlr-vessels.sh # Pending
│ │ ├── 09-iuu-ffa-vessels.sh # Pending
│ │ ├── 09-iuu-gfcm-vessels.sh # Pending
│ │ ├── 09-iuu-iattc-vessels.sh # Pending
│ │ ├── 09-iuu-iccat-vessels.sh # Pending
│ │ ├── 09-iuu-iotc-vessels.sh # Pending
│ │ ├── 09-iuu-nafo-vessels.sh # Pending
│ │ ├── 09-iuu-neafc-vessels.sh # Pending
│ │ ├── 09-iuu-npfc-vessels.sh # Pending
│ │ ├── 09-iuu-seafo-vessels.sh # Pending
│ │ ├── 09-iuu-siofa-vessels.sh # Pending
│ │ ├── 09-iuu-sprfmo-vessels.sh # Pending
│ │ └── 09-iuu-wcpfc-vessels.sh # Pending
│ ├── 📁 country-specific/ # Modularizes Non-EU Country vessel scripts
│ │ ├── 10-usa-adfg-vessels.sh # Pending
│ │ ├── 10-fro-vessels.sh # Pending
│ │ ├── 10-mdv-vessels.sh # Pending
│ │ ├── 10-nor-vessels.sh # Pending
│ │ ├── 10-pan-vessels.sh # Pending
│ │ ├── 10-twn-siofa-vessels.sh # Pending
│ │ ├── 10-twn-pac-vessels.sh # Pending
│ │ ├── 10-twn-siofa-car-vessels.sh # Pending
│ │ ├── 10-uk-large-vessels.sh # Pending
│ │ ├── 10-uk-small-vessels.sh # Pending
│ │ ├── 10-mex-large-vessels.sh # Pending
│ │ ├── 10-mex-small-vessels.sh # Pending
│ │ ├── 10-rus-vessels.sh # Pending
│ │ └── 10-per-vessels.sh # Pending
│ ├── 📁 country-eu-specific/ # Modularizes EU Country vessel scripts
│ │ ├── 11-bel-vessels.sh # Pending (21 EU countries total)
│ │ ├── 11-bgr-vessels.sh # Pending
│ │ └── ... (19 more EU country scripts) # Pending
│ ├── 📁 civil-society-specific/ # Modularizes Civil Society vessel scripts
│ │ ├── 12-issf-uvi-vessels.sh # Pending
│ │ ├── 12-issf-pvr-vessels.sh # Pending
│ │ ├── 12-issf-ps-vessels.sh # Pending
│ │ ├── 12-issf-vosi-vessels.sh # Pending
│ │ ├── 12-ap2hi-vessels.sh # Pending
│ │ └── 12-msc-vessels.sh # Pending
│ └── 📁 intergov-specific/ # Modularizes Intergovernmental vessel scripts
│ ├── 13-pna-tuna-vessels.sh # Pending
│ └── 13-pna-fsma-vessels.sh # Pending
├── 📥 import/ # Existing import scripts
│ └── 🚢 vessels/ # Vessel-specific import scripts
│ ├── create_sources_vessels.sql
│ ├── load_sources_vessels.sh
│ └── data/
│ ├── BADDIE/ # Baddie + Blacklist vessel data
│ │ ├── clean_sdn_glomag.sh # ✅ DONE
│ │ ├── clean_wro.sh # ✅ DONE
│ │ ├── load_sdn_glomag.sh # ✅ DONE
│ │ ├── load_wro.sh # ✅ DONE
│ │ ├── setup_sdn_glomag_loading.sql # ✅ DONE
│ │ └── setup_wro_loading.sql # ✅ DONE
│ ├── CIVIL_SOCIETY/ # Civil Society Organizations vessel data
│ ├── COUNTRY/ # Non-EU Country vessel data
│ ├── COUNTRY_EU/ # EU Country vessel data
│ ├── INTERGOV/ # Intergovernmental Organizations vessel data
│ └── RFMO/ # Regional Fisheries Management Organization vessel data
│ ├── clean_npfc_vessels.sh # ✅ DONE
│ ├── clean_neafc_vessels.sh # ✅ DONE
│ ├── clean_sprfmo_vessels.sh # 🌶️ IN PROGRESS
│ ├── load_npfc_vessels.sh # ✅ DONE
│ ├── load_neafc_vessels.sh # ✅ DONE
│ ├── load_sprfmo_vessels.sh # 🌶️ IN PROGRESS
│ ├── setup_npfc_loading.sql # ✅ DONE
│ ├── setup_neafc_loading.sql # ✅ DONE
│ └── setup_sprfmo_loading.sql# 🌶️ IN PROGRESS
└── 🔄 preprocessing/ # Data preprocessing scripts
├── clean_asfis_data.py # Special cleaning for asfis.ts
└── msc_fisheries_preprocessing.py # Special cleaning for msc fisheries
Emily's Progress Tracking System:
- 🌶️ = IN PROGRESS, NOT AVAILABLE YET. EM ACTIVE (Emily is currently working on it)
- ✅ = DONE
- (No emoji) = Pending (not yet started)
Enhanced Design Principles
- Hierarchical Phase Independence: Each major phase (08-13) and sub-phase can be executed independently
- Category-Based Organization: Vessel data organized by logical categories (RFMO, BADDIE, etc.)
- Utility Sharing: Common functions extracted to
/core/directory with vessel-specific extensions - Progress Tracking: Built-in status tracking for Emily's active development workflow
- Fallback Safety: Main entrypoint has fallback to original monolithic script
- Configuration-Driven Expansion: Reporting system automatically discovers new vessel tables/phases
- Backward Compatibility: Identical user experience and output format
- Parallel Development: Multiple developers can work on different vessel categories simultaneously
Enhanced Phase Execution Flow
Vessel Data Architecture
Database Schema Integration
The vessel system includes comprehensive schema definitions supporting:
Core Vessel Identifiers:
- IMO, IRCS, MMSI, National Registry, EU CFR
- Foreign key relationships to
country_isofor flag states - Support for 40+ vessel source types across 6 categories
Source Type Enums:
-- Vessel source types with comprehensive RFMO and organizational coverage
vessel_source_type_enum: 'RFMO', 'RFMO_IUU', 'COUNTRY', 'COUNTRY_IUU',
'INTERGOVERNMENTAL', 'ORGANIZATION', 'CERTIFICATION_BODY',
'CIVIL_SOCIETY', 'OUTLAW_OCEAN', 'WRO', 'SDN_GLOMAG'
-- External identifier types covering all RFMOs
external_identifier_type_enum: 'RFMO_CCAMLR', 'RFMO_CCSBT', 'RFMO_FFA',
'RFMO_GFCM', 'RFMO_IATTC', 'RFMO_ICCAT', 'RFMO_IOTC', 'RFMO_NAFO',
'RFMO_NEAFC', 'RFMO_NPFC', 'RFMO_SEAFO', 'RFMO_SIOFA', 'RFMO_SPRFMO',
'RFMO_WCPFC', 'HULL_ID', 'AP2HI_ID', 'ISSF_TUVI', 'ADFG_NO'
Enhanced Original Sources Table:
CREATE TABLE original_sources_vessels (
source_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_shortname TEXT UNIQUE NOT NULL,
source_fullname TEXT NOT NULL,
source_types TEXT[] NOT NULL, -- Array for multi-type sources
rfmo_id UUID, -- FK to rfmos(id) for RFMO-specific sources
country_id UUID, -- FK to country_iso(id) for country-specific sources
source_urls JSONB, -- Combined URLs as JSON array
-- ... additional metadata fields
);
Implementation Details
Enhanced Phase Execution Pattern:
# Major phase execution with optional flag for vessel phases
if execute_phase "08-rfmo-vessels" "true"; then
log_success "RFMO vessel phase completed via modular script"
else
log_warning "RFMO vessel phase had issues - continuing with container startup"
log_warning "Note: RFMO vessel data is optional - check if data files are available"
fi
# Sub-phase execution within major phases
execute_rfmo_subphase "npfc" "✅" # Completed
execute_rfmo_subphase "neafc" "🌶️" # In progress
execute_rfmo_subphase "sprfmo" "🌶️" # In progress
Enhanced Utility Function Pattern:
# Each phase script sources required utilities plus vessel-specific functions
source /app/scripts/core/logging.sh
source /app/scripts/core/database.sh
source /app/scripts/core/vessel-utilities.sh # New vessel-specific utilities
# Vessel-specific helper functions
get_vessel_sources_by_type() { ... }
vessel_source_has_type() { ... }
mark_vessel_source_loaded() { ... }
Enhanced Error Handling Strategy:
- Phase-level validation with detailed logging
- Category-level recovery (can skip entire vessel categories)
- Source-level granularity (can skip individual vessel sources)
- Progress preservation across development sessions
- Emily's active development status tracking
Enhanced Consequences
Positive
Maintainability:
- Individual vessel categories can be developed and debugged independently
- 40+ vessel sources organized into logical, manageable groups
- Emily can work on specific categories without affecting others
- Clear separation of concerns reduces cognitive complexity for large vessel system
- Reusable utilities eliminate code duplication across vessel categories
Development Velocity:
- Parallel development across 6 vessel categories by multiple developers
- Individual vessel sources can be tested:
bash /app/scripts/phases/rfmo-specific/08-npfc-vessels.sh - Progress tracking prevents duplicate work and shows system status
- Easier onboarding for new team members working on specific vessel categories
- Reduced merge conflicts with hierarchical organization
Testing & Quality:
- Category-level testing:
bash /app/scripts/phases/08-rfmo-vessels.sh - Source-level testing:
bash /app/scripts/phases/rfmo-specific/08-npfc-vessels.sh - Unit testing possible for vessel-specific utility functions
- Integration testing maintains full-system validation across 40+ sources
- Progressive validation as Emily completes each vessel source
Scalability:
- Ready for vessel data expansion beyond 40+ current datasets
- Configuration-driven approach for future vessel categories
- Reporting system automatically adapts to new vessel phases and sources
- Hierarchical organization supports unlimited vessel source expansion
Operations:
- Failed vessel sources can be re-run individually without affecting others
- Category-level recovery (skip problematic vessel category, continue others)
- Incremental deployment and rollback capabilities for vessel phases
- Better error localization across complex vessel data system
- Production-ready with vessel sources as optional components
Negative
Complexity:
- More files to manage (60+ scripts vs 1 monolithic for vessel system)
- Hierarchical dependency management across vessel categories
- Requires understanding of both phase orchestration and vessel category organization
- Emily's progress tracking adds coordination overhead
Debugging:
- Stack traces may span multiple files across vessel category hierarchy
- Vessel phase interdependencies must be carefully managed
- Slightly more complex failure analysis across vessel categories
- Progress tracking requires coordination during parallel development
Risks and Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Vessel phase dependency failures | High | Built-in dependency checking and category-level validation |
| File system permissions across vessel directories | Medium | Explicit chmod in deployment process for all vessel subdirectories |
| Core vessel utility corruption | High | Fallback to original monolithic script + vessel-specific fallbacks |
| Vessel phase ordering issues | Medium | Clear naming convention and documentation for vessel categories |
| Emily's progress conflicts | Medium | Built-in status checking and conflict resolution in progress tracking |
| Vessel source data inconsistencies | High | Comprehensive validation at source, category, and system levels |
Enhanced Migration Strategy
Completed Phases ✅
- Phase 1: Extract utility functions (logging, database) - COMPLETED
- Phase 2: Extract foundation data phase - COMPLETED
- Phase 3: Extract species data phase - COMPLETED
- Phase 4: Extract harmonization phase - COMPLETED
- Phase 5: Extract MSC fisheries phase - COMPLETED
- Phase 6: Extract final reporting - COMPLETED
- Phase 7: Create vessel sources table and loading - COMPLETED
Current Implementation ✅🌶️
-
Phase 8: RFMO vessel category - PARTIALLY COMPLETED
- ✅ NPFC vessels (complete)
- 🌶️ NEAFC vessels (Emily actively developing)
- 🌶️ SPRFMO vessels (Emily actively developing)
- Pending: 9 additional RFMO sources
-
Phase 9: BADDIE vessel category - PARTIALLY COMPLETED
- ✅ SDN GLOMAG vessels (complete)
- 🌶️ WRO vessels (Emily actively developing)
- 🌶️ Outlaw Ocean vessels (Emily actively developing)
- 🌶️ 14 IUU sources across all RFMOs (Emily actively developing)
Planned Phases (Emily's Pipeline)
-
Phase 10: Country vessel category - PLANNED
- 14 non-EU country sources
-
Phase 11: EU Country vessel category - PLANNED
- 21 EU member state sources
-
Phase 12: Civil Society vessel category - PLANNED
- 6 civil society organization sources (ISSF variants, AP2HI, MSC)
-
Phase 13: Intergovernmental vessel category - PLANNED
- 2 Pacific island organization sources (PNA)
-
Phase 14: Final vessel reporting - PLANNED
- Comprehensive validation across all vessel categories
Enhanced Success Metrics
System Level ✅
- Functionality: 100% preservation of existing import behavior ✅
- Performance: No degradation in import times ✅
- Reliability: Maintained comprehensive validation ✅
- Maintainability: Individual phase testing capability ✅
- Scalability: Architecture supports vessel data expansion ✅
Vessel System Level 🌶️✅
- Vessel Categories: 6 major categories properly organized ✅
- Source Management: 40+ vessel sources systematically organized ✅
- Progress Tracking: Emily's development status clearly visible ✅
- Parallel Development: Multiple developers can work independently ✅
- Category Independence: Vessel categories can be developed/tested separately ✅
Development Workflow ✅🌶️
- Active Development: Emily can continue vessel development seamlessly 🌶️
- Status Visibility: Clear understanding of what's done/in-progress/pending ✅
- Individual Source Testing: Each vessel source can be tested independently ✅
- Recovery Options: Failed vessel sources don't break entire system ✅
Enhanced Implementation Notes
Compatibility
- Original
entrypoint.sh.originalmaintained as backup - Fallback mechanism built into new entrypoint
- Identical log output and validation messages preserved
- Vessel phases are optional - system works without vessel data
Current Development Status (Emily's Key)
- 🌶️ = Emily is actively working on it
- ✅ = Completed and operational
- (No emoji) = Planned but not yet started
Emily's Active Work Areas
- NEAFC vessels (🌶️): RFMO category development
- SPRFMO vessels (🌶️): RFMO category development
- WRO vessels (🌶️): BADDIE category development
- Multiple IUU sources (🌶️): BADDIE category expansion
Vessel Source Integration
- RFMO Integration: Automatic mapping from source names to
rfmostable UUIDs - Country Integration: Automatic mapping from alpha3 codes to
country_isotable UUIDs - Source Types: Enum validation for vessel source types with comprehensive coverage
- Foreign Keys: All vessel data references
original_sources_vesselsvia UUID foreign keys
Future Considerations
- Vessel categories can be extended beyond current 6 categories
- Source discovery mechanism for automatic detection of new vessel datasets
- Parallel execution potential for independent vessel categories
- Consider configuration files (YAML/JSON) for vessel source definitions
- International expansion beyond current geographic coverage
Documentation Requirements
- Update deployment procedures to include vessel phase script permissions
- Document vessel category execution order and dependencies
- Create troubleshooting guide for vessel-specific issues
- Maintain Emily's progress tracking documentation
- Document vessel source onboarding process for new datasets
Industry Best Practices and Enterprise Alignment
"Golden Rule" Database Architecture Alignment
Our modular approach aligns with the "Golden Rule" of enterprise database architecture: "Each system component should have a single, well-defined responsibility and be independently testable, deployable, and maintainable."
How Our Architecture Implements the Golden Rule:
1. Single Responsibility Principle
- Phase-Level: Each phase (08-rfmo, 09-baddie) handles one vessel category
- Source-Level: Each script (08-npfc-vessels.sh) handles one data source
- Utility-Level: Core utilities have single purposes (logging, database operations)
- Schema-Level: Each table serves one domain (associates, authorizations, core vessel data)
2. Independent Testing and Deployment
- Individual phases can be tested:
bash /app/scripts/phases/08-rfmo-vessels.sh - Individual sources can be tested:
bash /app/scripts/phases/rfmo-specific/08-npfc-vessels.sh - Failed components don't break entire system
- Rollback possible at phase or source level
3. Separation of Concerns
- Data Pipeline: Raw → Cleaned → Staged → Database (clear boundaries)
- Processing Logic: Clean scripts vs Load scripts vs Validation scripts
- Error Handling: Phase-level vs Source-level vs System-level
- Configuration: Source metadata vs Processing logic vs Database schema
Enterprise Database Best Practices
Google/Facebook-Style Data Pipeline Architecture
- Idempotent Operations: Each phase can be re-run safely without data corruption
- Lineage Tracking: Every data element traced to original source via
original_sources_vessels - Horizontal Scaling: Vessel categories can be processed independently/in parallel
- Graceful Degradation: System remains functional even if vessel categories fail
Netflix/Uber-Style Microservices Patterns
- Domain Boundaries: Vessel data isolated from species/reference data
- Service Independence: Vessel phases don't depend on completion of other phases
- Configuration-Driven: New vessel sources added via configuration, not code changes
- Health Checks: Built-in validation and reporting at each level
Amazon/Microsoft-Style Enterprise Data Management
- Data Catalog:
original_sources_vesselsserves as comprehensive source catalog - Schema Evolution: Vessel schemas can evolve without breaking existing data
- Audit Trail: Complete processing history maintained via logging and status tracking
- Recovery Mechanisms: Multiple levels of rollback and retry capabilities
GitLab/GitHub-Style CI/CD Integration
- Pipeline-as-Code: All phases defined as executable scripts in version control
- Progressive Deployment: Phases can be rolled out incrementally
- Feature Flags: Emily's progress tracking (🌶️) serves as feature flag system
- Rollback Strategy: Fallback mechanisms at every level
Comparison with Industry Standards
| Architecture Pattern | Industry Example | Our Implementation |
|---|---|---|
| Domain-Driven Design | Spotify's Backend | Vessel domain isolated with clear boundaries |
| Event-Driven Architecture | LinkedIn's Data Pipeline | Phase completion triggers next phase execution |
| Circuit Breaker Pattern | Netflix's Resilience | Failed phases don't cascade to other phases |
| Bulkhead Pattern | Docker Containerization | Vessel categories isolated from each other |
| Saga Pattern | Uber's Distributed Transactions | Multi-phase vessel import with compensation |
| CQRS | Stack Overflow | Separate read (reporting) and write (import) concerns |
Data Governance and Compliance Alignment
GDPR/Privacy-by-Design
- Data Minimization: Only vessel-relevant data collected and processed
- Source Attribution: Every data element traceable to original source
- Right to Deletion: Source-level data removal capabilities
- Processing Transparency: Complete audit trail of all data transformations
SOX/Audit Compliance
- Change Control: All processing logic under version control
- Audit Trail: Complete lineage from raw data to final database state
- Segregation of Duties: Different phases handled by different systems/people
- Validation Controls: Built-in data quality checks at every level
Vessel Associates and Authorization Architecture
Advanced Relationship Management System
Our vessel associates architecture implements enterprise-grade entity resolution and relationship tracking, addressing the complex reality of vessel ownership and operations in the maritime industry.
Master Entity Deduplication Pattern
Problem Solved: Maritime entities (companies, individuals) appear across multiple sources with different spellings, addresses, and identifiers. Without deduplication, we'd have thousands of duplicate entities.
Solution: Two-tier architecture with master entity resolution:
-- Master entities table (deduplicated canonical records)
vessel_associates_master (
associate_master_uuid UUID PRIMARY KEY,
canonical_name VARCHAR(200), -- Normalized name for deduplication
name_variations JSONB, -- All spelling variations seen
primary_address TEXT, -- Best-quality address
headquarters_country_id UUID, -- Primary business location
business_type VARCHAR(100), -- Company/Individual/Organization
incorporation_date DATE, -- When available from sources
regulatory_identifiers JSONB -- Tax IDs, business registrations
);
-- Individual relationships (source-specific instances)
vessel_associates (
associate_uuid UUID PRIMARY KEY,
vessel_uuid UUID, -- FK to vessels
associate_master_uuid UUID, -- FK to master entity
source_id UUID, -- Which source reported this relationship
associate_type associate_type_enum, -- Role in relationship
original_name VARCHAR(200), -- Exact name as reported by source
relationship_start_date DATE, -- When relationship began
relationship_end_date DATE, -- When relationship ended (if known)
ownership_percentage DECIMAL, -- For beneficial ownership
reg_number VARCHAR(150), -- License/registration numbers
sdn_uuid UUID, -- Link to SDN sanctions data
wro_uuid UUID -- Link to WRO enforcement data
);
Associate Type Taxonomy
Our comprehensive associate type system covers the full spectrum of vessel relationships found across 40+ data sources:
associate_type_enum: [
'BENEFICIAL_OWNER', -- Ultimate economic interest holders
'CHARTERER', -- Short-term vessel leasers
'FISH_MASTER', -- Fishing operations supervisors
'FISH_PRODUCER_ORGANIZATION', -- Fishing cooperatives/organizations
'OPERATING_COMPANY', -- Day-to-day vessel operators
'OPERATOR', -- Individual vessel operators
'OTHER_BENEFICIARY', -- Other economic interest holders
'OWNER', -- Legal vessel owners
'OWNING_COMPANY', -- Corporate vessel owners
'SDN_LINKED_ENTITY', -- Sanctions-connected entities
'VESSEL_MASTER', -- Ship captains/masters
'WRO_COMPANY' -- World-Check Risk intelligence entities
]
Enterprise-Grade Authorization System
Comprehensive Fishing Authorization Tracking: Our authorization system captures the complex regulatory landscape governing commercial fishing vessels worldwide.
vessel_authorizations (
authorization_uuid UUID PRIMARY KEY,
vessel_uuid UUID, -- FK to vessels
source_id UUID, -- Reporting authority
authorization_type authorization_type_enum,
license_number VARCHAR(100), -- Official license identifier
start_date DATE, -- Authorization validity start
end_date DATE, -- Authorization expiration
rfmo_id UUID, -- Regional authority (ICCAT, WCPFC, etc.)
region_description TEXT, -- Geographic scope description
fao_area_ids JSONB, -- Array of FAO area UUIDs
species_ids JSONB, -- Array of authorized species UUIDs
additional_data JSONB -- Quotas, conditions, restrictions
);
Authorization Type Coverage:
authorization_type_enum: [
'FISHING_AUTHORIZATION', -- General fishing permissions
'FISHING_LICENSE', -- Formal fishing licenses
'TRANSSHIPMENT_AUTHORIZATION', -- At-sea transfer permissions
'CARRIER_AUTHORIZATION', -- Fish transport permissions
'OBSERVER_AUTHORIZATION', -- Scientific observer requirements
'TUNA_AUTHORIZATION', -- Tuna-specific permissions
'EXEMPT_VESSEL', -- Regulatory exemptions
'SUPPORT_VESSEL_AUTHORIZATION', -- Support/supply vessel permissions
'OTHER_AUTHORIZATION' -- Miscellaneous regulatory permissions
]
Advanced Querying Capabilities
Complex Ownership Analysis
-- Find all vessels with common beneficial owners
SELECT
v1.vessel_name as vessel_1,
v2.vessel_name as vessel_2,
am.canonical_name as common_owner
FROM vessels v1
JOIN vessel_associates va1 ON v1.vessel_uuid = va1.vessel_uuid
JOIN vessel_associates_master am ON va1.associate_master_uuid = am.associate_master_uuid
JOIN vessel_associates va2 ON am.associate_master_uuid = va2.associate_master_uuid
JOIN vessels v2 ON va2.vessel_uuid = v2.vessel_uuid
WHERE va1.associate_type = 'BENEFICIAL_OWNER'
AND va2.associate_type = 'BENEFICIAL_OWNER'
AND v1.vessel_uuid != v2.vessel_uuid;
Regulatory Compliance Analysis
-- Find vessels with expired authorizations in specific regions
SELECT
v.vessel_name,
v.imo,
va.license_number,
va.end_date,
r.rfmo_name
FROM vessels v
JOIN vessel_authorizations va ON v.vessel_uuid = va.vessel_uuid
JOIN rfmos r ON va.rfmo_id = r.id
WHERE va.end_date < CURRENT_DATE
AND va.authorization_type = 'FISHING_AUTHORIZATION'
AND r.rfmo_acronym IN ('ICCAT', 'WCPFC', 'IOTC');
Sanctions and Risk Intelligence Integration
-- Find vessels with sanctions-linked associates
SELECT
v.vessel_name,
v.imo,
am.canonical_name as sanctioned_entity,
va.associate_type,
sdn.sdn_type
FROM vessels v
JOIN vessel_associates va ON v.vessel_uuid = va.vessel_uuid
JOIN vessel_associates_master am ON va.associate_master_uuid = am.associate_master_uuid
JOIN vessels_sdn_simple sdn ON va.sdn_uuid = sdn.sdn_uuid
WHERE sdn.sdn_type IN ('SDN', 'BLOCKED_PERSON', 'DENIED_PERSON');
Data Quality and Validation
Cross-Source Relationship Validation
- Consistency Checks: Same vessel-entity relationships across multiple sources
- Temporal Validation: Relationship dates align with vessel operational periods
- Hierarchical Validation: Corporate ownership structures logically consistent
- Regulatory Validation: Authorizations align with flag state and operational areas
Master Entity Quality Metrics
- Name Standardization Score: Measure of canonical name quality
- Address Completeness: Percentage of complete address information
- Source Coverage: How many sources report this entity
- Relationship Density: Number of vessel connections per entity
Integration with Global Compliance Systems
OFAC/Sanctions Integration
- Real-time Screening: New associates checked against sanctions lists
- Historical Analysis: Track sanctions status changes over time
- Network Analysis: Identify indirect sanctions exposure through relationships
- Alert System: Automated notifications for sanctions-linked vessels
RFMO Compliance Tracking
- Multi-RFMO Vessels: Track vessels authorized across multiple regions
- Quota Management: Link authorizations to catch quotas and limits
- Compliance History: Track authorization violations and penalties
- Cross-Reference Validation: Verify RFMO data consistency across sources
This associates and authorization architecture positions our system as a comprehensive vessel intelligence platform, capable of supporting complex maritime compliance, risk assessment, and regulatory analysis use cases found in enterprise-grade systems used by government agencies, financial institutions, and maritime industry leaders.
Decision Date: August 2025
Stakeholders: Database Engineering Team, DevOps Team, Emily (Lead Vessel Data Developer)
Review Date: December 2025 (post vessel system completion)
Current Priority: Support Emily's active vessel development while maintaining system stability and enabling parallel development across vessel categories.