Source: ebisu/docs/adr/0064-vessel-import-directory-organization.md
ADR-0064: Vessel Import Directory Organization
Status: Accepted
Date: 2025-01-11
Stakeholders: Development team, Data team
Context
The vessel import system was becoming difficult to manage, with data sources updating at different frequencies. We needed a standardized approach to:
- Organize data files by source type and individual source
- Handle updates independently for each source
- Maintain version history
- Prevent accidental reimports
- Track data lineage
Decision
Implement a hierarchical directory structure with standardized import scripts for each data source.
Directory Structure
import/vessels/vessel_data/
├── RFMO/ # Regional Fisheries Management Organizations
├── COUNTRY/ # National vessel registries
├── INTERGOV/ # Inter-governmental organizations
├── BADDIE/ # Sanctions and IUU lists
└── CIVIL_SOCIETY/ # NGO and civil society sources
Each source has:
SOURCE_NAME/
├── raw/ # Original files as received
├── cleaned/ # Processed CSV files ready for import
└── archive/ # Historical versions
Import Script Pattern
Each data source has its own import script (sketched after this list) that:
- Automatically finds the latest file in /raw/
- Checks the file hash to prevent accidental reimports
- Creates import batch and lineage records
- Loads to staging tables
- Converts to intelligence reports
- Updates cross-source confirmations
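A minimal sketch of this pattern, assuming a PostgreSQL backend reachable via psql and a hypothetical import_batches table with a file_hash column (not the project's actual schema); the real scripts additionally perform the staging load, report conversion, and confirmation steps:

```bash
#!/usr/bin/env bash
# Illustrative per-source import pattern. Table and column names are
# assumptions; psql connection settings come from the environment.
set -euo pipefail

SOURCE_DIR="import/vessels/vessel_data/INTERGOV/PNA_TUNA"

# 1. Find the most recent file in raw/ (dated filenames sort chronologically).
latest=$(ls "$SOURCE_DIR/raw" | sort | tail -n 1)
[ -n "$latest" ] || { echo "No files in $SOURCE_DIR/raw"; exit 1; }
file="$SOURCE_DIR/raw/$latest"

# 2. Hash the file and skip the import if this exact file was seen before.
hash=$(sha256sum "$file" | awk '{print $1}')
seen=$(psql -tA -c "SELECT count(*) FROM import_batches WHERE file_hash = '$hash'")
if [ "$seen" -gt 0 ]; then
    echo "Already imported: $latest (hash $hash)"
    exit 0
fi

# 3. Record the batch for lineage, then hand off to the loader steps
#    (staging tables, intelligence reports, cross-source confirmations).
psql -c "INSERT INTO import_batches (source, file_name, file_hash, imported_at)
         VALUES ('PNA_TUNA', '$latest', '$hash', now())"
echo "Imported $latest"
```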
File Naming Convention
- Include the date: SOURCE_NAME_YYYY-MM-DD.ext
- Preserve the original filename structure where possible
- Clearly indicate the data collection date (example below)
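For example, stamping a freshly downloaded list with today's date before it enters /raw/ (the downloaded filename here is hypothetical; date +%F emits YYYY-MM-DD):

```bash
cp pna_tuna_download.csv "PNA_TUNA_$(date +%F).csv"
```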
Implementation
1. Directory Creation
mkdir -p vessel_data/{TYPE}/{SOURCE}/{raw,cleaned,archive}
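With brace expansion, one invocation creates all three standard subdirectories for a source; for example, for the PNA_TUNA source used later in this document:

```bash
mkdir -p vessel_data/INTERGOV/PNA_TUNA/{raw,cleaned,archive}
# Expands to three directories:
#   vessel_data/INTERGOV/PNA_TUNA/raw
#   vessel_data/INTERGOV/PNA_TUNA/cleaned
#   vessel_data/INTERGOV/PNA_TUNA/archive
```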
2. Import Script Template
/scripts/import/vessels/data/{TYPE}/load_{source}.sh
3. Status Monitoring
/scripts/import/vessels/check_import_status.sh
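Assuming the same ebisu-importer container and /app prefix used in the workflow example below, the status check can be invoked as:

```bash
docker exec -i ebisu-importer /app/scripts/import/vessels/check_import_status.sh
```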
Consequences
Positive
- Independent Updates: Each source can be updated without affecting others
- Clear Organization: Easy to find data for any source
- Version Control: Archive directory preserves history
- Automated Processing: Scripts handle most import logic
- Audit Trail: Complete lineage tracking
Negative
- More Scripts: One script per source (maintenance overhead)
- Directory Sprawl: Many subdirectories to manage
- Storage: Keeping raw and cleaned versions uses more space
Neutral
- Learning Curve: Team needs to understand new structure
- Migration Work: Existing data needs reorganization
Example Workflow
Adding New Data
- Download the latest vessel list from the source
- Place it in /raw/ with the date in the filename
- Run the import script
- Move the previous version to /archive/
Monthly Update Example
# 1. Add new PNA TUNA file
cp PNA_TUNA_2025-02-08.csv import/vessels/vessel_data/INTERGOV/PNA_TUNA/raw/
# 2. Run import
docker exec -i ebisu-importer /app/scripts/import/vessels/data/INTERGOV/load_pna_tuna.sh
# 3. Archive old file
mv import/vessels/vessel_data/INTERGOV/PNA_TUNA/raw/PNA_TUNA_2025-01-08.csv \
import/vessels/vessel_data/INTERGOV/PNA_TUNA/archive/
Security Considerations
- No Credentials: Never store API keys or passwords in import directories
- File Validation: Always verify file integrity before import (see the sketch after this list)
- Access Control: Limit write access to import directories
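A minimal sketch of the validation step, assuming the source publishes a SHA-256 checksum alongside the file (the .sha256 companion file is an assumption; sources without published checksums need at least a manual hash comparison):

```bash
# Verify the published checksum before the file enters raw/
sha256sum -c PNA_TUNA_2025-02-08.csv.sha256 \
  && mv PNA_TUNA_2025-02-08.csv import/vessels/vessel_data/INTERGOV/PNA_TUNA/raw/
```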
Monitoring
The check_import_status.sh script provides the following (a query sketch follows the list):
- Recent import history
- Source coverage statistics
- Cross-source confirmation metrics
- Data quality indicators
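These metrics can be derived from the lineage records written at import time. A rough sketch of the first one, again against the hypothetical import_batches table from the script sketch above (not the project's actual schema):

```bash
# Last import and batch count per source (illustrative only)
psql -c "SELECT source, max(imported_at) AS last_import, count(*) AS batches
         FROM import_batches
         GROUP BY source
         ORDER BY last_import DESC"
```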
References
- ADR-0061: Intelligence Confirmations Are Not Duplicates
- ADR-0062: Phase 1 Data Isolation Architecture
- ADR-0063: Missing Vessel Registries Import Strategy
- IMPORT_GUIDE.md: Detailed import instructions