# ADR-0065: Vessel Data Import System Reorganization

## Status

Accepted
## Context

The Ebisu vessel data import system had become disorganized:

- 60+ import scripts scattered across multiple directories
- 152 data files (61 MB+) stored directly in Git, causing repository bloat
- Inconsistent patterns and error handling across importers
- Poor discoverability of available sources
- Hardcoded credentials in scripts, creating security risks
## Decision

We reorganized the import system with the following changes:

- **Structured Import Scripts**: Created an `import-sources/` directory with clear organization by source type (`rfmo/`, `country/`, etc.)
- **Git LFS for Data Files**: Use Git Large File Storage for vessel data files to avoid repository bloat while maintaining version control
- **Security First**: Removed all hardcoded credentials from scripts, requiring environment variables with no defaults
- **Unified Interface**: Single entry point for all imports via `docker-import.sh`
- **Local Staging Option**: Support for staging new data files locally before committing to Git LFS
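The unified entry point's dispatch step can be pictured as a mapping from a source id such as `country/usa-alaska` onto the `import-sources/{type}/{source}/` layout. This is a minimal sketch, not the actual contents of `docker-import.sh`; the `resolve_importer` helper is a name invented here for illustration.

```shell
#!/usr/bin/env sh
# Minimal dispatch sketch (assumption: not the real docker-import.sh).
# A source id like "country/usa-alaska" maps directly onto the
# import-sources/{type}/{source}/import.sh layout.
resolve_importer() {
  importer="import-sources/$1/import.sh"
  if [ ! -f "$importer" ]; then
    echo "unknown source: $1" >&2
    return 1
  fi
  printf '%s\n' "$importer"
}

# Example: resolve_importer country/usa-alaska
```

Failing fast on an unknown source id keeps typos from silently importing nothing.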
## Consequences

### Positive

- **Clean Git Repository**: Large data files tracked via Git LFS, preventing repository bloat
- **Better Security**: No credentials in code; all secrets via environment variables
- **Improved Organization**: Clear directory structure makes finding importers easy
- **Consistent Interface**: Same import process for all data sources
- **Version Control**: Full Git history for data files without performance impact

### Negative

- **Git LFS Dependency**: Requires Git LFS installation and configuration
- **Migration Effort**: Existing data files need to be migrated to Git LFS
- **Learning Curve**: Team needs to understand the Git LFS workflow

### Neutral

- **Dual Storage**: Supports both Git LFS and local staging directories
- **Environment Setup**: Requires proper environment variable configuration
## Implementation

### Directory Structure

```text
import-sources/
├── docker-import.sh            # Unified import runner
├── manage-data.sh              # Local data management
└── {type}/{source}/            # Organized by source type
    └── import.sh               # Source-specific importer

import/vessels/vessel_data/     # Git LFS tracked data
├── COUNTRY/
└── RFMO/
```
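One payoff of the fixed layout is discoverability: the available sources can be enumerated by walking the tree. A sketch under the assumption that every importer lives at `import-sources/{type}/{source}/import.sh`; `list_importers` is a hypothetical helper, not a script in the repo.

```shell
#!/usr/bin/env sh
# Hypothetical helper: enumerate available sources by walking the
# import-sources/{type}/{source}/import.sh layout shown above.
list_importers() {
  for importer in import-sources/*/*/import.sh; do
    [ -f "$importer" ] || continue           # skip if the glob matched nothing
    dir=${importer%/import.sh}               # drop the trailing /import.sh
    printf '%s\n' "${dir#import-sources/}"   # e.g. country/usa-alaska
  done
}

list_importers
```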
### Security Configuration

```bash
# .env file (never committed)
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_secure_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5433
POSTGRES_DB=ebisu
```
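Because the scripts ship with no default credentials, a run has to fail fast when a setting is absent. One way to sketch that check; the `require_env` helper is invented here, and only the variable names follow the `.env` example above.

```shell
#!/usr/bin/env sh
# Fail-fast check for required settings (sketch; require_env is a
# hypothetical helper, not part of the actual scripts). There are
# deliberately no fallback defaults.
require_env() {
  for name in "$@"; do
    eval "val=\${$name:-}"                 # indirect lookup, empty if unset
    if [ -z "$val" ]; then
      echo "missing required environment variable: $name" >&2
      return 1
    fi
  done
}

if ! require_env POSTGRES_USER POSTGRES_PASSWORD POSTGRES_HOST POSTGRES_PORT POSTGRES_DB; then
  echo "refusing to run imports without full database settings" >&2
fi
```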
### Git LFS Setup

```bash
# One-time setup
./setup-git-lfs.sh
```

Track patterns in `.gitattributes`:

```text
import/vessels/vessel_data/**/*.csv filter=lfs diff=lfs merge=lfs -text
import/vessels/vessel_data/**/*.xlsx filter=lfs diff=lfs merge=lfs -text
```
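Before committing new data it is worth confirming that a pattern is actually routed through LFS, since otherwise the binary lands in plain Git history. A small sketch; `lfs_tracked` is a name invented here, not part of `setup-git-lfs.sh`.

```shell
#!/usr/bin/env sh
# Sketch: check that .gitattributes routes a pattern through Git LFS
# before committing new data files. lfs_tracked is a hypothetical
# helper, not a script from the repo.
lfs_tracked() {
  # fixed-string match on the pattern, then confirm the lfs filter
  grep -F "$1" .gitattributes 2>/dev/null | grep -q 'filter=lfs'
}

if lfs_tracked 'import/vessels/vessel_data/**/*.csv'; then
  echo "csv vessel data is tracked via LFS"
fi
```

In a real checkout, `git check-attr filter -- <file>` gives the authoritative answer for a specific path.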
### Usage Examples

```bash
# Import from Git LFS data
./import-sources/docker-import.sh country/usa-alaska

# Stage new data locally first
./import-sources/manage-data.sh add "COUNTRY/USA_AK" new_data.csv
./import-sources/docker-import.sh country/usa-alaska
```
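The staging step can be pictured as a copy into the LFS-tracked tree. The following is a sketch of what `manage-data.sh add` might reduce to; the `stage_data_file` helper and its behavior are assumptions, not the script's actual contents.

```shell
#!/usr/bin/env sh
# Local staging sketch (assumption: not the real manage-data.sh).
# Copies a new file into the LFS-tracked tree under the given
# category, ready to be committed via Git LFS.
stage_data_file() {
  category=$1
  file=$2
  dest="import/vessels/vessel_data/${category}"
  mkdir -p "$dest"
  cp "$file" "$dest/"
  echo "staged ${file} under ${dest}"
}

# Example: stage_data_file "COUNTRY/USA_AK" new_data.csv
```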
## References

- [Git LFS Documentation](https://git-lfs.github.com/)
- [Docker Compose Environment Variables](https://docs.docker.com/compose/environment-variables/)
- [PostgreSQL Environment Variables](https://www.postgresql.org/docs/current/libpq-envars.html)