Source: ebisu/docs/adr/0065-vessel-data-import-reorganization.md

ADR-0065: Vessel Data Import System Reorganization

Status

Accepted

Context

The Ebisu vessel data import system had accumulated several problems:

  • 60+ import scripts scattered across multiple directories
  • 152 data files (61MB+) stored directly in Git, causing repository bloat
  • Inconsistent patterns and error handling across importers
  • Poor discoverability of available sources
  • Hardcoded credentials in scripts creating security risks

Decision

We reorganized the import system with the following changes:

  1. Structured Import Scripts: Created import-sources/ directory with clear organization by source type (rfmo/, country/, etc.)

  2. Git LFS for Data Files: Use Git Large File Storage for vessel data files to avoid repository bloat while maintaining version control

  3. Security First: Removed all hardcoded credentials from scripts, requiring environment variables with no defaults

  4. Unified Interface: Single entry point for all imports via docker-import.sh (a dispatch sketch follows this list)

  5. Local Staging Option: Support for staging new data files locally before committing to Git LFS
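
A minimal sketch of what the unified entry point might look like; the real docker-import.sh is not reproduced here, so the argument handling, paths, and delegation below are illustrative assumptions only:

#!/usr/bin/env bash
# Hypothetical dispatcher: resolves <type>/<source> to its importer and runs it.
set -euo pipefail

SOURCE="${1:?usage: docker-import.sh <type>/<source>, e.g. country/usa-alaska}"
IMPORTER="$(cd "$(dirname "$0")" && pwd)/${SOURCE}/import.sh"

if [ ! -x "$IMPORTER" ]; then
  echo "Unknown import source: ${SOURCE}" >&2
  exit 1
fi

# Delegate to the source-specific importer; database credentials come
# from environment variables (see Security Configuration below).
exec "$IMPORTER"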

Consequences

Positive

  • Clean Git Repository: Large data files tracked via Git LFS, preventing repository bloat
  • Better Security: No credentials in code, all secrets via environment variables
  • Improved Organization: Clear directory structure makes finding importers easy
  • Consistent Interface: Same import process for all data sources
  • Version Control: Full Git history for data files without performance impact

Negative

  • Git LFS Dependency: Requires Git LFS installation and configuration
  • Migration Effort: Existing data files need to be migrated to Git LFS (one possible approach is sketched after this list)
  • Learning Curve: Team needs to understand Git LFS workflow
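
One possible migration path for files already committed directly to Git, using stock git-lfs tooling; note that git lfs migrate import rewrites history, and whether this or a simple re-add was used for Ebisu is not recorded here:

# Rewrite history so matching files become LFS pointers (coordinate with the team first)
git lfs migrate import --include="import/vessels/vessel_data/**/*.csv,import/vessels/vessel_data/**/*.xlsx"
git push --force-with-lease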

Neutral

  • Dual Storage: Supports both Git LFS and local staging directories
  • Environment Setup: Requires proper environment variable configuration

Implementation

Directory Structure

import-sources/
├── docker-import.sh         # Unified import runner
├── manage-data.sh           # Local data management
└── {type}/{source}/         # Organized by source type
    └── import.sh            # Source-specific importer

import/vessels/vessel_data/ # Git LFS tracked data
├── COUNTRY/
└── RFMO/
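
For illustration, a source-specific import.sh might follow the pattern below; the staging table name and psql invocation are assumptions, not the actual importer contents:

#!/usr/bin/env bash
# Hypothetical import-sources/country/usa-alaska/import.sh
set -euo pipefail

DATA_DIR="import/vessels/vessel_data/COUNTRY/USA_AK"

# Load each CSV from the Git LFS tracked data directory into a staging table.
for f in "$DATA_DIR"/*.csv; do
  PGPASSWORD="$POSTGRES_PASSWORD" psql \
    -h "$POSTGRES_HOST" -p "$POSTGRES_PORT" \
    -U "$POSTGRES_USER" -d "$POSTGRES_DB" \
    -c "\copy staging_vessels FROM '$f' WITH (FORMAT csv, HEADER true)"
done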

Security Configuration

# .env file (never committed)
POSTGRES_USER=your_user
POSTGRES_PASSWORD=your_secure_password
POSTGRES_HOST=localhost
POSTGRES_PORT=5433
POSTGRES_DB=ebisu
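
A common bash idiom for enforcing the no-defaults rule is shown below; this is an assumed pattern, not a quote from the actual importers:

# Abort with a clear error if a required variable is unset or empty.
: "${POSTGRES_USER:?POSTGRES_USER must be set (no default)}"
: "${POSTGRES_PASSWORD:?POSTGRES_PASSWORD must be set (no default)}"
: "${POSTGRES_DB:?POSTGRES_DB must be set (no default)}"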

Git LFS Setup

# One-time setup
./setup-git-lfs.sh

# Track patterns in .gitattributes
import/vessels/vessel_data/**/*.csv filter=lfs diff=lfs merge=lfs -text
import/vessels/vessel_data/**/*.xlsx filter=lfs diff=lfs merge=lfs -text
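
Once tracking is in place, standard git-lfs commands can be used to verify and fetch data; these are stock commands rather than project-specific scripts:

# List files currently stored via LFS
git lfs ls-files

# Download LFS content after cloning or pulling
git lfs pull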

Usage Examples

# Import from Git LFS data
./import-sources/docker-import.sh country/usa-alaska

# Stage new data locally first
./import-sources/manage-data.sh add "COUNTRY/USA_AK" new_data.csv
./import-sources/docker-import.sh country/usa-alaska
