Source: ebisu/docs/guides/import/import-guide.md
Vessel Import Guide
This guide explains how to import vessel data from various sources into the Ebisu intelligence platform.
Quick Start
1. Place Data File
Put your data file in the appropriate directory:
/import/vessels/vessel_data/[TYPE]/[SOURCE]/raw/filename_YYYY-MM-DD.ext
Example:
/import/vessels/vessel_data/INTERGOV/PNA_TUNA/raw/PNA_TUNA_2025-10-15.csv
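The naming convention above can be captured in a small helper. This is a minimal sketch, not part of the Ebisu codebase: the function name `raw_file_path` and its parameters are hypothetical, and it assumes the file stem is the source name followed by the ISO date, as in the PNA_TUNA example.

```python
from datetime import date
from pathlib import PurePosixPath

# Root of the import tree, as shown in the guide.
IMPORT_ROOT = PurePosixPath("/import/vessels/vessel_data")


def raw_file_path(source_type: str, source: str, file_date: date, ext: str) -> PurePosixPath:
    """Build the expected raw-file path for one source and date (hypothetical helper)."""
    # Assumption: the file name is <SOURCE>_<YYYY-MM-DD>.<ext>, matching the example above.
    filename = f"{source}_{file_date.isoformat()}.{ext.lstrip('.')}"
    return IMPORT_ROOT / source_type / source / "raw" / filename


print(raw_file_path("INTERGOV", "PNA_TUNA", date(2025, 10, 15), "csv"))
```

Using `date.isoformat()` guarantees the zero-padded YYYY-MM-DD form, so file names sort chronologically.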
2. Run Import Script
docker exec -i ebisu-importer bash -c "
export POSTGRES_HOST=ebisu-db &&
export POSTGRES_PORT=5432 &&
export POSTGRES_DB=ebisu &&
export POSTGRES_USER=ebisu_user &&
export POSTGRES_PASSWORD=ebisu_password &&
/app/scripts/import/vessels/data/[TYPE]/load_[source].sh"
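When the import is launched from a scheduler or another tool rather than typed by hand, the same invocation can be assembled programmatically. The sketch below only builds the argument list and does not run it. The helper name `import_command` is hypothetical, and it passes the variables with `docker exec -e` instead of `export` inside `bash -c`, which is an alternative to the form shown above rather than the project's documented method.

```python
import os
import shlex


def import_command(script: str, env: dict, container: str = "ebisu-importer") -> list:
    """Build a `docker exec` argv that runs one load script (hypothetical helper)."""
    argv = ["docker", "exec", "-i"]
    # `docker exec -e KEY=VALUE` sets an environment variable for the executed command.
    for key, value in env.items():
        argv += ["-e", f"{key}={value}"]
    return argv + [container, script]


env = {
    "POSTGRES_HOST": "ebisu-db",
    "POSTGRES_PORT": "5432",
    "POSTGRES_DB": "ebisu",
    "POSTGRES_USER": "ebisu_user",
    # Assumption: the password is read from the caller's environment rather than hard-coded.
    "POSTGRES_PASSWORD": os.environ.get("POSTGRES_PASSWORD", ""),
}
argv = import_command("/app/scripts/import/vessels/data/INTERGOV/load_pna_tuna.sh", env)
print(shlex.join(argv))  # pass argv to subprocess.run(argv, check=True) to execute it
```

Keeping the command as a list avoids shell quoting problems if a value ever contains spaces or special characters.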
Import Scripts by Category
INTERGOV (Inter-governmental Organizations)
# PNA TUNA Registry
/app/scripts/import/vessels/data/INTERGOV/load_pna_tuna.sh
# PNA FSMA Registry
/app/scripts/import/vessels/data/INTERGOV/load_pna_fsma.sh
COUNTRY (National Registries)
# Peru National Registry
/app/scripts/import/vessels/data/COUNTRY/load_per_vessels.sh
# Chile Regional Registries (I-XVI)
/app/scripts/import/vessels/data/COUNTRY/load_chl_rpa.sh REGION=I
/app/scripts/import/vessels/data/COUNTRY/load_chl_rpa.sh REGION=II
# ... etc for each region
# Chile LTP-PEP Registry
/app/scripts/import/vessels/data/COUNTRY/load_chl_ltp_pep.sh
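The per-region Chile invocations can be generated instead of written out sixteen times. This is a sketch under two assumptions: regions are numbered consecutively I through XVI, and `load_chl_rpa.sh` accepts `REGION=<numeral>` exactly as shown above. The `to_roman` helper is illustrative.

```python
# Value/symbol pairs, largest first; sufficient for the numerals 1..39.
ROMAN = [(10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n: int) -> str:
    """Convert a small positive integer to a Roman numeral."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


SCRIPT = "/app/scripts/import/vessels/data/COUNTRY/load_chl_rpa.sh"

# One command line per region, I through XVI.
commands = [f"{SCRIPT} REGION={to_roman(i)}" for i in range(1, 17)]
for command in commands:
    print(command)
```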
RFMO (Regional Fisheries Management Organizations)
# Various RFMOs (already implemented)
/app/scripts/import/vessels/data/RFMO/load_[rfmo]_vessels.sh
CIVIL_SOCIETY
# ISSF Registries
/app/scripts/import/vessels/data/CIVIL_SOCIETY/load_issf_ps.sh
/app/scripts/import/vessels/data/CIVIL_SOCIETY/load_issf_pvr.sh
/app/scripts/import/vessels/data/CIVIL_SOCIETY/load_issf_uvi.sh
/app/scripts/import/vessels/data/CIVIL_SOCIETY/load_issf_vosi.sh
Data Preparation
CSV Files
- Headers in the first row
- UTF-8 encoding
- Comma-delimited
- No cleaning needed if the file is already properly formatted
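These requirements can be checked before a file is placed in the raw directory. The following is a minimal pre-flight sketch using only the standard library. The function name `check_csv` and the specific checks are assumptions for illustration, not the validation the import scripts perform.

```python
import csv
import io


def check_csv(raw: bytes) -> list:
    """Return a list of problems found in a CSV payload; an empty list means none were found."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc.reason}"]

    rows = list(csv.reader(io.StringIO(text, newline="")))
    if not rows or not any(cell.strip() for cell in rows[0]):
        return ["missing header row"]

    problems = []
    header = rows[0]
    # A single-column header containing another separator suggests a non-comma delimiter.
    if len(header) == 1 and any(sep in header[0] for sep in ";\t|"):
        problems.append("header has one column; delimiter may not be a comma")
    # Blank lines parse as empty rows and are skipped here.
    for number, row in enumerate(rows[1:], start=2):
        if row and len(row) != len(header):
            problems.append(f"record {number}: expected {len(header)} fields, got {len(row)}")
    return problems


print(check_csv(b"vessel_name,imo\nEXAMPLE,1234567\n"))
```

In practice the bytes would come from `Path(...).read_bytes()` on the file about to be imported.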
Excel Files (.xlsx, .xls)
Need conversion to CSV first:
# Convert Excel to CSV
docker exec -i ebisu-importer python3 /app/scripts/import/vessels/convert_country_registries.py --country CHL
PDF Files
- Manual extraction may be required
- Consider using tabula-py or similar tools
- Contact data team for assistance
Import Features
Automatic Features
- Duplicate Prevention: File hash tracking prevents reimporting the same file
- Change Detection: Compares with previous imports
- Confirmation Tracking: Updates cross-source vessel confirmations
- Data Lineage: Complete audit trail
- Temporal History: Tracks changes over time
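To illustrate the duplicate-prevention idea, the sketch below registers files by content hash and rejects a second file with identical bytes, even under a different name. The guide does not state which hash algorithm or storage the importer uses, so SHA-256 and the in-memory `ImportLedger` class are assumptions made for this example.

```python
import hashlib


def file_sha256(data: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


class ImportLedger:
    """In-memory stand-in for the importer's hash-tracking store (hypothetical)."""

    def __init__(self):
        self._seen = {}  # digest -> first file name registered with that content

    def register(self, filename: str, data: bytes) -> bool:
        """Return True if the content is new, False if it was already imported."""
        digest = file_sha256(data)
        if digest in self._seen:
            return False
        self._seen[digest] = filename
        return True


ledger = ImportLedger()
payload = b"vessel_name,imo\nEXAMPLE,1234567\n"
print(ledger.register("PNA_TUNA_2025-10-15.csv", payload))  # first time: accepted
print(ledger.register("PNA_TUNA_copy.csv", payload))        # same bytes: rejected
```

Hashing the content rather than the file name means a renamed copy is still recognized, while a corrected file with even one changed byte is treated as new.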
Manual Steps
- Source Creation: New sources need to be added to the database first
- Schema Mapping: New formats may need custom staging tables
- Data Cleaning: Non-CSV formats need conversion to CSV
Monitoring Imports
Check Import Status
-- Recent imports
SELECT
rfmo_shortname as source,
import_date,
raw_records_count as records,
processing_completed_at
FROM intelligence_import_batches
ORDER BY import_date DESC
LIMIT 10;
Check Confirmations
-- Vessels confirmed by multiple sources
SELECT
vessel_name,
vessel_imo,
confirmation_count,
confirming_source_names
FROM vessel_identity_confirmations
WHERE confirmation_count > 1
ORDER BY confirmation_count DESC
LIMIT 20;
Check Data Quality
-- Import quality metrics
SELECT
osv.source_shortname,
COUNT(DISTINCT vi.intelligence_id) as vessels,
AVG(vi.data_completeness_score) as avg_completeness,
SUM(CASE WHEN vi.reported_imo IS NOT NULL THEN 1 ELSE 0 END)::float / COUNT(*) as imo_coverage
FROM vessel_intelligence vi
JOIN intelligence_reports ir ON vi.report_id = ir.report_id
JOIN original_sources_vessels osv ON ir.source_id = osv.source_id
GROUP BY osv.source_shortname
ORDER BY vessels DESC;
Troubleshooting
Common Errors
"Source not found"
# Add missing sources
./scripts/run_sql.sh scripts/import/vessels/create_missing_country_sources.sql
"No CSV files found"
- Check that the file is in the correct /raw/ directory
- Verify file extension and permissions
"Data loss detected"
- Check for empty lines in CSV
- Verify encoding (should be UTF-8)
- Look for embedded newlines in fields
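The checks above can be partly automated. This sketch compares the number of physical lines in a file with the number of records the CSV parser sees, and counts blank lines and fields containing embedded newlines. It assumes that "data loss" is reported when the loaded row count differs from the line count; the guide does not document the exact rule, and `diagnose_row_mismatch` is a hypothetical helper.

```python
import csv
import io


def diagnose_row_mismatch(text: str) -> dict:
    """Summarize why line count and parsed record count might differ."""
    physical_lines = text.splitlines()
    blank_lines = sum(1 for line in physical_lines if not line.strip())
    # newline="" lets the csv module handle newlines inside quoted fields itself.
    records = [row for row in csv.reader(io.StringIO(text, newline="")) if row]
    multiline_fields = sum(
        1 for row in records for cell in row if "\n" in cell or "\r" in cell
    )
    return {
        "physical_lines": len(physical_lines),
        "blank_lines": blank_lines,
        "records": len(records),  # includes the header row
        "multiline_fields": multiline_fields,
    }


sample = 'name,note\nA,"line1\nline2"\n\nB,ok\n'
print(diagnose_row_mismatch(sample))
```

For the sample, five physical lines yield only three records: one line is blank and one quoted field spans two lines.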
"File already imported"
- This is correct behavior - prevents duplicates
- To force a reimport, delete the file's lineage record first
Debug Commands
# Check what files are visible
docker exec -i ebisu-importer ls -la /import/vessels/vessel_data/[TYPE]/[SOURCE]/raw/
# Test database connection
docker exec -i ebisu-importer bash -c "
export POSTGRES_HOST=ebisu-db &&
export POSTGRES_PORT=5432 &&
/app/scripts/core/test_connection.sh"
# Check import logs
docker logs ebisu-importer --tail 50
Best Practices
- File Naming: Always include date (YYYY-MM-DD)
- Regular Updates: Schedule imports based on source update frequency
- Archive Old Files: Move processed files to the /archive/ subdirectory
- Monitor Quality: Check data completeness after each import
- Document Issues: Note any data quality problems for future reference
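The archiving practice can be scripted. The sketch below moves a processed file from a source's raw directory into the sibling archive directory, following the {raw,cleaned,archive} layout used elsewhere in this guide. The helper name `archive_processed` is hypothetical, and refusing to overwrite an existing archived file is a design choice made here, not documented behavior.

```python
import shutil
from pathlib import Path


def archive_processed(raw_file: Path) -> Path:
    """Move a processed file from .../raw/ to the sibling .../archive/ directory."""
    if raw_file.parent.name != "raw":
        raise ValueError(f"expected a file inside a raw/ directory: {raw_file}")
    archive_dir = raw_file.parent.parent / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / raw_file.name
    # Refuse to overwrite so an earlier archived copy is never silently lost.
    if target.exists():
        raise FileExistsError(f"already archived: {target}")
    shutil.move(str(raw_file), str(target))
    return target
```

Because file names carry the date, archived files from successive imports do not collide.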
Adding New Sources
- Create source in database:
INSERT INTO original_sources_vessels (
source_shortname,
source_fullname,
source_types,
authority_level,
data_quality_score
) VALUES (
'NEW_SOURCE',
'Full Name of New Source',
ARRAY['COUNTRY']::text[],
'AUTHORITATIVE',
0.80
);
- Create directory structure:
mkdir -p /import/vessels/vessel_data/[TYPE]/[SOURCE]/{raw,cleaned,archive}
- Create import script based on template:
cp /app/scripts/import/vessels/data/template_import.sh \
/app/scripts/import/vessels/data/[TYPE]/load_[source].sh
- Customize staging table and field mappings
Support
For assistance:
- Check existing import scripts for examples
- Review error logs in container
- Consult ADR-0063 for architecture decisions
- Contact data team for complex conversions