Source: ebisu/backend/import-sources/README.md

Ebisu Import System

Overview

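The reorganized import system separates code from data: import scripts live in import-sources/, while raw datasets live under data/raw/, which Git ignores. This keeps each source's files easy to find and keeps large files out of the Git repository.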

Directory Structure

ebisu/
├── import-sources/ # Import scripts and tools
│ ├── manage-data.sh # Data file management
│ ├── docker-import.sh # Docker-based import runner
│ └── {source-type}/{name}/ # Organized import scripts

├── data/
│ ├── raw/ # Raw datasets synced from Google Drive (ignored by Git)
│ │ ├── vessels/vessel_data/
│ │ │ ├── COUNTRY/USA_AK/raw/
│ │ │ └── RFMO/ICCAT/raw/
│ │ ├── WoRMS_download_2025-07-01/
│ │ └── ...
│ ├── processed/ # Derived outputs generated locally
│ └── archive/ # Optional long-term storage for superseded snapshots

└── (optional) external storage, e.g. `$EBISU_DATA_ROOT` pointing at a Google Drive sync folder
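
The layout above implies that a script has to decide where the data directory lives before it can find a source's raw files. A minimal sketch of that lookup, assuming `$EBISU_DATA_ROOT` (when set) replaces the in-repo data/ directory and that every vessel source follows the vessels/vessel_data/<SOURCE>/raw/ pattern shown in the tree. `resolve_data_root` and `raw_dir_for` are hypothetical helpers for illustration, not functions from the actual scripts.

```shell
#!/bin/sh
# Hypothetical helpers; the real scripts may resolve paths differently.

# Print the data root: $EBISU_DATA_ROOT if set and non-empty (assumption),
# otherwise the data/ directory under the given repository root.
resolve_data_root() {
    repo_root="${1:-.}"
    if [ -n "${EBISU_DATA_ROOT:-}" ]; then
        printf '%s\n' "$EBISU_DATA_ROOT"
    else
        printf '%s\n' "$repo_root/data"
    fi
}

# Print the raw-data directory for a source ID such as "COUNTRY/USA_AK",
# following the raw/vessels/vessel_data/<SOURCE>/raw layout from the tree.
raw_dir_for() {
    source_id="$1"
    repo_root="${2:-.}"
    printf '%s/raw/vessels/vessel_data/%s/raw\n' \
        "$(resolve_data_root "$repo_root")" "$source_id"
}
```

For example, `raw_dir_for "RFMO/ICCAT" /path/to/ebisu` prints the ICCAT raw directory under the repository unless `$EBISU_DATA_ROOT` points elsewhere.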

Quick Start

1. Add New Data File

# Add a new data file for import
./import-sources/manage-data.sh add "COUNTRY/USA_AK" ~/Downloads/Alaska_vessels_2025-02.csv

2. List Available Data

# List all data files
./import-sources/manage-data.sh list

# List for specific source
./import-sources/manage-data.sh list "RFMO/ICCAT"

3. Run Import

# Import USA Alaska data
./import-sources/docker-import.sh country/usa-alaska

# Import EU fleet register data for one country (Spain)
./import-sources/docker-import.sh country/eu-fleet ESP

# Import RFMO data
./import-sources/docker-import.sh rfmo/iccat

4. Archive Old Data

# Archive files older than 30 days
./import-sources/manage-data.sh archive "COUNTRY/USA_AK"
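
To preview which files would fall under the 30-day rule before archiving, a dry-run listing can help. This is only a sketch: it assumes "older than 30 days" refers to file modification time, which this README does not confirm, and `list_archive_candidates` is a hypothetical helper rather than part of manage-data.sh.

```shell
#!/bin/sh
# Hypothetical sketch; manage-data.sh may select files by other criteria.

# List regular files under a directory that were last modified more than
# N days ago (default 30). find rounds file age down to whole days.
# Nothing is moved or deleted.
list_archive_candidates() {
    dir="$1"
    days="${2:-30}"
    # A missing directory simply has no candidates.
    [ -d "$dir" ] || return 0
    find "$dir" -type f -mtime "+$days" -print
}
```

Usage, with the raw directory from the tree above: `list_archive_candidates data/raw/vessels/vessel_data/COUNTRY/USA_AK/raw`.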

Available Sources

Country Registries

  • country/usa-alaska - Alaska vessel registry
  • country/chile-ltp-pep - Chile fishing licenses/permits
  • country/eu-fleet <CODE> - EU fleet register (requires country code)
    • Codes: BEL, BGR, CYP, DEU, DNK, ESP, EST, FIN, FRA, GRC, HRV, IRL, ITA, LTU, LVA, MLT, NLD, POL, PRT, ROU, SVN, SWE
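
Because country/eu-fleet requires a country code, checking the code before starting an import gives a faster failure than a mistyped argument. A minimal sketch, with the code list copied from the bullet above; `is_eu_fleet_code` is a hypothetical helper, and docker-import.sh may already perform its own validation.

```shell
#!/bin/sh
# Codes copied from the list above.
EU_FLEET_CODES="BEL BGR CYP DEU DNK ESP EST FIN FRA GRC HRV IRL ITA LTU LVA MLT NLD POL PRT ROU SVN SWE"

# Return 0 if the argument is one of the listed codes, 1 otherwise.
is_eu_fleet_code() {
    # Reject anything that is not exactly three letters.
    case "$1" in
        [A-Z][A-Z][A-Z]) ;;
        *) return 1 ;;
    esac
    # Match the code as a whole word within the space-padded list.
    case " $EU_FLEET_CODES " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}
```

A wrapper could then run `is_eu_fleet_code "$code" && ./import-sources/docker-import.sh country/eu-fleet "$code"`.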

RFMO Registries

  • rfmo/iccat - International Commission for the Conservation of Atlantic Tunas
  • rfmo/iotc - Indian Ocean Tuna Commission
  • rfmo/wcpfc - Western and Central Pacific Fisheries Commission
  • rfmo/iattc - Inter-American Tropical Tuna Commission
  • rfmo/ccsbt - Commission for the Conservation of Southern Bluefin Tuna
  • rfmo/nafo - Northwest Atlantic Fisheries Organization
  • rfmo/neafc - North East Atlantic Fisheries Commission
  • rfmo/npfc - North Pacific Fisheries Commission
  • rfmo/sprfmo - South Pacific Regional Fisheries Management Organisation
  • rfmo/ffa - Pacific Islands Forum Fisheries Agency

Benefits

  1. Git-Light Workflow: Large files live outside the repository, avoiding push/pull bottlenecks
  2. Better Organization: Clear structure for each data source under data/raw/
  3. Easy Discovery: Simple commands to list and manage data drops
  4. Consistent Interface: Same workflow for all sources
  5. Audit Friendly: Metadata JSON and dated folders document the lineage of each snapshot
  6. Shared Storage Ready: Designed to mirror a Google Drive folder so teams can collaborate without reconfiguring scripts (see docs/data/google_drive_sync.md)

Migration from Old System

If you have existing data files in the Git repository:

# Run migration script
./scripts/migrate_data_files.sh

# Remove from Git tracking
git rm -r --cached import/vessels/vessel_data/*/*/raw/*.csv
git rm -r --cached import/vessels/vessel_data/*/*/raw/*.xlsx
git commit -m "chore: remove vessel data files from Git"

Adding New Sources

To add a new data source:

  1. Create directory: mkdir -p import-sources/{type}/{name}
  2. Copy an existing import script into the new directory as a template
  3. Update the source name and configuration in the copied script
  4. Document the new source in this README under Available Sources
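
The first two steps above can be sketched as a small helper. This assumes each source is a directory whose contents can be copied wholesale; `scaffold_source` is a hypothetical function, and steps 3 and 4 still have to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: scaffold_source TYPE NAME TEMPLATE_DIR [BASE_DIR]
# Creates BASE_DIR/TYPE/NAME and copies the template directory's contents
# into it. BASE_DIR defaults to import-sources.
scaffold_source() {
    src_type="$1"
    src_name="$2"
    template_dir="$3"
    base_dir="${4:-import-sources}"
    target="$base_dir/$src_type/$src_name"

    if [ ! -d "$template_dir" ]; then
        echo "template not found: $template_dir" >&2
        return 1
    fi
    # Never overwrite a source that already exists.
    if [ -e "$target" ]; then
        echo "refusing to overwrite existing source: $target" >&2
        return 1
    fi

    mkdir -p "$target" || return 1
    # "dir/." copies the directory's contents, including dotfiles.
    cp -R "$template_dir/." "$target/" || return 1
    printf '%s\n' "$target"
}
```

For example, `scaffold_source rfmo example import-sources/rfmo/iccat` would create import-sources/rfmo/example from the ICCAT scripts; the name `example` is a placeholder.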

Troubleshooting

"No data file found"

  • Check that the data exists: ./import-sources/manage-data.sh list
  • Add new data if it is missing: ./import-sources/manage-data.sh add "SOURCE/NAME" file.csv

"Import script not found"

  • Verify source name matches exactly
  • Check available sources with: ls import-sources/*/
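
To compare a source name against what is actually on disk, the directory listing can be turned into the same type/name form that docker-import.sh takes. A minimal sketch, assuming every source is a directory exactly two levels below import-sources/ as the tree suggests; `list_sources` and `source_exists` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helpers; they assume sources live at BASE_DIR/TYPE/NAME.

# Print each source as "type/name", one per line.
list_sources() {
    base_dir="${1:-import-sources}"
    for dir in "$base_dir"/*/*/; do
        # Skip the unexpanded pattern when nothing matches.
        [ -d "$dir" ] || continue
        rel="${dir#"$base_dir"/}"
        printf '%s\n' "${rel%/}"
    done
}

# Return 0 if the named source directory exists, non-zero otherwise.
source_exists() {
    [ -d "${2:-import-sources}/$1" ]
}
```

For example, `source_exists country/usa-alaska || list_sources` prints the known names when the requested one is missing.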

Docker permissions

  • Ensure the Docker daemon is running
  • Check that the containers are up: docker ps | grep ebisu
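
The two Docker checks above can be combined into one preflight function with distinct exit codes, so a wrapper script can tell the failure modes apart. This is a sketch: `docker_ready` is a hypothetical helper, and the assumption that container names contain "ebisu" is taken from the grep above.

```shell
#!/bin/sh
# Hypothetical preflight check.
# Exit codes: 0 ready, 1 docker CLI missing, 2 daemon unreachable,
# 3 no running container with "ebisu" in its name.
docker_ready() {
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found on PATH" >&2
        return 1
    fi
    # docker info exits non-zero when it cannot reach the daemon.
    if ! docker info >/dev/null 2>&1; then
        echo "Docker daemon is not reachable" >&2
        return 2
    fi
    # List running container names only and look for an ebisu container.
    if ! docker ps --format '{{.Names}}' | grep -q ebisu; then
        echo "no running container with 'ebisu' in its name" >&2
        return 3
    fi
    return 0
}
```

A wrapper could run `docker_ready || exit $?` before calling docker-import.sh.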