Ebisu Import System
Overview
The reorganized import system separates code from data, providing better organization and avoiding Git repository bloat.
Directory Structure
ebisu/
├── import-sources/ # Import scripts and tools
│ ├── manage-data.sh # Data file management
│ ├── docker-import.sh # Docker-based import runner
│ └── {source-type}/{name}/ # Organized import scripts
│
├── data/
│ ├── raw/ # Raw datasets synced from Google Drive (ignored by Git)
│ │ ├── vessels/vessel_data/
│ │ │ ├── COUNTRY/USA_AK/raw/
│ │ │ └── RFMO/ICCAT/raw/
│ │ ├── WoRMS_download_2025-07-01/
│ │ └── ...
│ ├── processed/ # Derived outputs generated locally
│ └── archive/ # Optional long-term storage for superseded snapshots
│
└── (optional) external storage, e.g. `$EBISU_DATA_ROOT` pointing at a Google Drive sync folder
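The tree above mentions `$EBISU_DATA_ROOT` as an optional external storage location. As a minimal sketch of how a script might resolve a raw-data directory, assuming the variable (when set) stands in for the local `data/` directory, consider the following. The function name `raw_dir_for` is hypothetical and is not part of the shipped scripts, which may resolve paths differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of manage-data.sh): build the raw-data path
# for a source such as "COUNTRY/USA_AK".
# Assumption: EBISU_DATA_ROOT, when set, replaces the local data/ directory.
raw_dir_for() {
  local source_id="$1"
  local data_root="${EBISU_DATA_ROOT:-data}"
  printf '%s/raw/vessels/vessel_data/%s/raw\n' "$data_root" "$source_id"
}
```

With the variable unset, `raw_dir_for "COUNTRY/USA_AK"` yields a path under the local `data/` directory that matches the layout shown in the tree.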
Quick Start
1. Add New Data File
# Add a new data file for import
./import-sources/manage-data.sh add "COUNTRY/USA_AK" ~/Downloads/Alaska_vessels_2025-02.csv
2. List Available Data
# List all data files
./import-sources/manage-data.sh list
# List for specific source
./import-sources/manage-data.sh list "RFMO/ICCAT"
3. Run Import
# Import USA Alaska data
./import-sources/docker-import.sh country/usa-alaska
# Import EU country (Spain)
./import-sources/docker-import.sh country/eu-fleet ESP
# Import RFMO data
./import-sources/docker-import.sh rfmo/iccat
4. Archive Old Data
# Archive files older than 30 days
./import-sources/manage-data.sh archive "COUNTRY/USA_AK"
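To make the 30-day rule concrete, here is a rough sketch of how age-based archiving can be done with `find`. This is an illustration only, not the actual logic of `manage-data.sh`; the function name `archive_old_files` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: move files that have not been modified for more than
# N days from a raw directory into an archive directory.
# This is NOT the real manage-data.sh implementation.
archive_old_files() {
  local raw_dir="$1" archive_dir="$2" max_age_days="${3:-30}"
  mkdir -p "$archive_dir"
  # -mtime +N matches files whose last modification is more than N days old
  find "$raw_dir" -maxdepth 1 -type f -mtime +"$max_age_days" \
    -exec mv -- {} "$archive_dir"/ \;
}
```

A call such as `archive_old_files path/to/raw path/to/archive 30` leaves recent files in place and moves only the older ones.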
Available Sources
Country Registries
- `country/usa-alaska` - Alaska vessel registry
- `country/chile-ltp-pep` - Chile fishing licenses/permits
- `country/eu-fleet <CODE>` - EU fleet register (requires country code)
  - Codes: BEL, BGR, CYP, DEU, DNK, ESP, EST, FIN, FRA, GRC, HRV, IRL, ITA, LTU, LVA, MLT, NLD, POL, PRT, ROU, SVN, SWE
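Because `country/eu-fleet` requires one of the codes listed above, a wrapper could check the code before starting an import. The sketch below is one way to do that; `is_eu_fleet_code` is a hypothetical helper, and whether `docker-import.sh` performs its own validation is not stated here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeed only when the argument is one of the
# country codes this README lists for country/eu-fleet (case-sensitive).
is_eu_fleet_code() {
  case "$1" in
    BEL|BGR|CYP|DEU|DNK|ESP|EST|FIN|FRA|GRC|HRV|IRL|ITA|LTU|LVA|MLT|NLD|POL|PRT|ROU|SVN|SWE)
      return 0 ;;
    *)
      return 1 ;;
  esac
}
```

For example, `is_eu_fleet_code ESP && ./import-sources/docker-import.sh country/eu-fleet ESP` would run the import only for a recognized code.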
RFMO Registries
- `rfmo/iccat` - International Commission for Conservation of Atlantic Tunas
- `rfmo/iotc` - Indian Ocean Tuna Commission
- `rfmo/wcpfc` - Western & Central Pacific Fisheries Commission
- `rfmo/iattc` - Inter-American Tropical Tuna Commission
- `rfmo/ccsbt` - Commission for Conservation of Southern Bluefin Tuna
- `rfmo/nafo` - Northwest Atlantic Fisheries Organization
- `rfmo/neafc` - North East Atlantic Fisheries Commission
- `rfmo/npfc` - North Pacific Fisheries Commission
- `rfmo/sprfmo` - South Pacific Regional Fisheries Management Organisation
- `rfmo/ffa` - Pacific Islands Forum Fisheries Agency
Benefits
- Git-light workflow: Large files live outside the repository, avoiding push/pull bottlenecks
- Better Organization: Clear structure for each data source under `data/raw/`
- Easy Discovery: Simple commands to list and manage data drops
- Consistent Interface: Same workflow for all sources
- Audit Friendly: Metadata JSON and dated folders document the lineage of each snapshot
- Shared storage ready: Designed to mirror a Google Drive folder so teams can collaborate without reconfiguring scripts (see `docs/data/google_drive_sync.md`).
Migration from Old System
If you have existing data files in the Git repository:
# Run migration script
./scripts/migrate_data_files.sh
# Remove from Git tracking
git rm -r --cached import/vessels/vessel_data/*/*/raw/*.csv
git rm -r --cached import/vessels/vessel_data/*/*/raw/*.xlsx
git commit -m "chore: remove vessel data files from Git"
Adding New Sources
To add a new data source:
- Create directory: `mkdir -p import-sources/{type}/{name}`
- Copy an existing import script as template
- Update source name and configuration
- Document in this README
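The first two steps above can be sketched as a small helper. This is only an illustration: `scaffold_source` is a hypothetical name, and the assumption that each source is driven by a single script file that can be copied as-is is not confirmed by this README.

```shell
#!/usr/bin/env bash
# Hypothetical scaffold helper: create import-sources/{type}/{name} and copy
# an existing import script into it as a starting point.
# Assumption: a source can be bootstrapped from one template script file.
scaffold_source() {
  local type="$1" name="$2" template="$3"
  local target_dir="import-sources/$type/$name"
  if [ ! -f "$template" ]; then
    echo "template not found: $template" >&2
    return 1
  fi
  if [ -e "$target_dir" ]; then
    echo "refusing to overwrite existing $target_dir" >&2
    return 1
  fi
  mkdir -p "$target_dir"
  cp -- "$template" "$target_dir/"
  echo "$target_dir"
}
```

The copied script still needs its source name and configuration updated by hand, and the new source still needs an entry in this README.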
Troubleshooting
"No data file found"
- Check data exists: `./import-sources/manage-data.sh list`
- Add new data: `./import-sources/manage-data.sh add "SOURCE/NAME" file.csv`
"Import script not found"
- Verify source name matches exactly
- Check available sources with: `ls import-sources/*/`
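Since the source name must match exactly, it can help to print the available sources in the same `{source-type}/{name}` form that `docker-import.sh` takes. A minimal sketch, assuming every source lives two directory levels below `import-sources/` as in the directory structure; `list_sources` is a hypothetical helper, not an existing command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every {source-type}/{name} directory found under
# a base directory (default: import-sources), one per line.
list_sources() {
  local base="${1:-import-sources}" dir
  for dir in "$base"/*/*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded glob when nothing matches
    dir="${dir%/}"              # drop the trailing slash
    echo "${dir#"$base"/}"      # strip the base prefix, leaving type/name
  done
}
```

For instance, `list_sources | grep -x "rfmo/iccat"` succeeds only when a directory with exactly that name exists.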
Docker permissions
- Ensure Docker daemon is running
- Check containers are up: `docker ps | grep ebisu`