Source:
ebisu/docs/guides/import/vessel-sources-readme.md| ✏️ Edit on GitHub
Vessel Sources Import Scripts
This directory contains scripts to create and populate the original_sources_vessels table, which tracks data sources for vessel information including RFMO registries, national registries, and other vessel databases.
Directory Structure
scripts/import/vessels/
├── create_sources_vessels.sql # SQL schema creation
├── load_sources_vessels.sh # Data import script
├── README_vessel_sources_import.md # This documentation
└── vessel_data/ # 40+ dataset processing organized by category
├── RFMO/ # RFMO vessel datasets
│ ├── clean_iccat_vessels.sh
│ ├── load_iccat_vessels.sh
│ ├── clean_iotc_vessels.sh
│ ├── load_iotc_vessels.sh
│ └── ... [other RFMO datasets]
├── COUNTRY/ # National vessel registries
│ ├── clean_usa_national_registry.sh
│ ├── load_usa_national_registry.sh
│ ├── clean_canada_vessel_registry.sh
│ ├── load_canada_vessel_registry.sh
│ └── ... [other country datasets]
├── OTHER/ # Organizations, MSC, etc.
│ ├── clean_msc_certified_vessels.sh
│ ├── load_msc_certified_vessels.sh
│ ├── clean_noaa_fisheries.sh
│ ├── load_noaa_fisheries.sh
│ └── ... [other organization datasets]
└── BADDIE/ # IUU and enforcement datasets
├── clean_global_iuu_list.sh
├── load_global_iuu_list.sh
├── clean_rfmo_iuu_combined.sh
├── load_rfmo_iuu_combined.sh
└── ... [other enforcement datasets]
Note: Main vessel table schemas are handled by the existing migrations/0002_vessel_tables_migration.sql file.
1. create_sources_vessels.sql
Creates the PostgreSQL table and associated functions for vessel source tracking.
Features:
- UUID primary keys for unique source identification
- Multi-type source support using PostgreSQL arrays
- RFMO integration with
rfmo_idfield - Comprehensive indexing for performance
- Helper functions for source management
- Views for source analysis
2. load_sources_vessels.sh
Bash script that processes the CSV file and loads data into the database.
Features:
- Python-based CSV processing with data cleaning
- Multi-type source conversion (single type → array)
- URL combination (source_url[0] + source_url[1] → JSON array)
- RFMO ID handling
- Data validation and error handling
- Progress reporting and verification
3. original_sources_vessels_rfmos.csv
Input CSV file containing vessel source metadata.
3. original_sources_vessels.csv
Input CSV file containing vessel source metadata.
Expected Columns:
source_shortname- Short identifier (e.g., "ICCAT_VESSELS", "USA_NATIONAL_REGISTRY")source_fullname- Full descriptive namesource_type- Source type from enum: RFMO, RFMO-IUU, COUNTRY, COUNTRY-IUU, INTERGOVERNMENTAL, ORGANIZATION, CERTIFICATION BODY, CIVIL SOCIETY, MSCrefresh_date- When data was last refreshedsource_url[0]- Primary URLsource_url[1]- Secondary URL (optional)update_frequency- How often data updates (MONTHLY, ANNUALLY, etc.)size_approx- Approximate number of recordsstatus- Load status (will be set to PENDING initially)rfmo_id- RFMO identifier (will be mapped to UUID automatically)country_id- Country alpha3 code (e.g., "USA") for country sources (will be mapped to UUID)
Usage
Prerequisites
- PostgreSQL database with write access
- Python 3 with csv, json modules
- Environment variables set:
POSTGRES_HOSTPOSTGRES_USERPOSTGRES_PASSWORDPOSTGRES_DB
Step 1: Create the Table Structure
# Run the SQL script to create table and functions
psql -h $POSTGRES_HOST -U $POSTGRES_USER -d $POSTGRES_DB -f create_enhanced_original_sources_vessels.sql
Step 2: Load the Data
# Ensure CSV file is in /import/ directory
cp original_sources_vessels_rfmos.csv /import/
# Run the load script
./load_original_sources_vessels.sh
Database Schema
CREATE TYPE vessel_source_type_enum AS ENUM (
'RFMO', 'RFMO-IUU', 'COUNTRY', 'COUNTRY-IUU',
'INTERGOVERNMENTAL', 'ORGANIZATION', 'CERTIFICATION BODY',
'CIVIL SOCIETY', 'MSC'
);
CREATE TABLE original_sources_vessels (
source_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
source_shortname TEXT UNIQUE NOT NULL,
source_fullname TEXT NOT NULL,
version_year INTEGER,
source_types vessel_source_type_enum[] NOT NULL, -- Enum array for type safety
refresh_date DATE,
source_urls JSONB, -- Combined URLs as JSON array
update_frequency vessel_update_frequency_enum,
size_approx INTEGER,
status vessel_source_status_enum DEFAULT 'PENDING',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
rfmo_id UUID REFERENCES rfmos(id), -- FK to rfmos table
country_id UUID REFERENCES country_iso(id), -- FK to country_iso table
metadata JSONB -- Additional flexible metadata
);
RFMO & Country Integration
The system automatically maps vessel sources to reference tables based on source type:
Automatic RFMO Mapping
- When: Only when
source_typecontains "RFMO" - How: Extracts RFMO acronym from
source_shortname(e.g., "ICCAT_VESSELS" → "ICCAT") - Lookup: Matches
rfmo_acronyminrfmostable to get UUID - Result: Sets
rfmo_idforeign key to link vessel source with RFMO organization
Automatic Country Mapping
- When: Only when
source_typecontains "COUNTRY" - How: Uses
country_idcolumn value as alpha3 code (e.g., "USA") - Lookup: Matches
alpha3_codeincountry_isotable to get UUID - Result: Sets
country_idforeign key to link vessel source with country
Integration Examples
ICCAT_VESSELS+ source_type="RFMO" → Links to ICCAT in rfmos tableUSA_NATIONAL_REGISTRY+ source_type="COUNTRY" + country_id="USA" → Links to USA in country_iso tableNOAA_FISHERIES+ source_type="ORGANIZATION" → No automatic mappingMSC_CERTIFIED+ source_type="MSC" → No automatic mapping
Vessel Source Type Enum
The system supports 9 standardized source types via enum:
RFMO- Regional Fisheries Management Organization vessel listsRFMO-IUU- RFMO Illegal, Unreported, Unregulated fishing listsCOUNTRY- National vessel registries and databasesCOUNTRY-IUU- National IUU vessel listsINTERGOVERNMENTAL- International organization databasesORGANIZATION- Non-governmental organization databasesCERTIFICATION BODY- Certification scheme vessel listsCIVIL SOCIETY- NGO and advocacy group databasesMSC- Marine Stewardship Council certified vessel lists
Helper Functions
The scripts create several utility functions:
Source Management
mark_vessel_source_loaded(shortname, record_count, refresh_date)- Mark source as successfully loadedupdate_vessel_source_record_count(shortname, table_name)- Update record count from actual table
Source Queries
get_vessel_sources_by_type(source_type)- Get all sources of specific enum typeget_all_vessel_source_types()- Get breakdown of all source types with countsvessel_source_has_type(shortname, type)- Check if source has specific enum type
Mapping Functions
get_rfmo_id_from_source(shortname)- Extract RFMO acronym and lookup UUIDget_country_id_from_alpha3(alpha3_code)- Map country alpha3 code to UUID
Views
active_vessel_sources_summary
Summary of active sources by type:
SELECT * FROM active_vessel_sources_summary;
vessel_source_details
Detailed information with freshness status:
SELECT * FROM vessel_source_details;
Example Queries
Find all RFMO vessel sources with organization details
SELECT ovs.source_shortname, ovs.source_fullname, r.rfmo_acronym, r.rfmo_name
FROM original_sources_vessels ovs
JOIN rfmos r ON ovs.rfmo_id = r.id
WHERE 'RFMO'::vessel_source_type_enum = ANY(ovs.source_types);
Find all country vessel sources with country details
SELECT ovs.source_shortname, ovs.source_fullname, c.alpha3_code, c.short_name_en
FROM original_sources_vessels ovs
JOIN country_iso c ON ovs.country_id = c.id
WHERE 'COUNTRY'::vessel_source_type_enum = ANY(ovs.source_types);
Find sources with multiple types
SELECT source_shortname, source_types, array_length(source_types, 1) as type_count
FROM original_sources_vessels
WHERE array_length(source_types, 1) > 1;
Get sources by RFMO organization
SELECT r.rfmo_acronym, COUNT(*) as source_count,
string_agg(ovs.source_shortname, ', ') as sources
FROM original_sources_vessels ovs
JOIN rfmos r ON ovs.rfmo_id = r.id
GROUP BY r.rfmo_acronym
ORDER BY r.rfmo_acronym;
Get sources by country
SELECT c.alpha3_code, c.short_name_en, COUNT(*) as source_count,
string_agg(ovs.source_shortname, ', ') as sources
FROM original_sources_vessels ovs
JOIN country_iso c ON ovs.country_id = c.id
GROUP BY c.alpha3_code, c.short_name_en
ORDER BY c.alpha3_code;
Get sources by freshness
SELECT source_shortname, freshness_status, refresh_date
FROM vessel_source_details
ORDER BY refresh_date DESC;
Troubleshooting
Common Issues
-
CSV Format Errors
- Ensure CSV uses proper comma separation
- Check for special characters in text fields
- Verify date format is YYYY-MM-DD or MM/DD/YYYY
-
Missing Required Fields
source_shortnameandsource_fullnameare requiredsource_typemust not be empty- Check for NULL/NaN values in required columns
-
URL Processing Issues
- URLs with special characters may need escaping
- Both source_url[0] and source_url[1] are optional
- Invalid URLs are logged but don't stop processing
-
Enum Validation Issues
- Source types must be one of the 9 valid enum values
- Invalid source types are logged and skipped during processing
- Check CSV for typos in source_type values
- Enum values are case-sensitive and must be uppercase
-
RFMO Mapping Issues
- RFMO mapping only occurs when
source_typecontains "RFMO" - RFMO acronym is extracted from the first part of
source_shortname - If RFMO acronym not found in
rfmostable,rfmo_idremains NULL - Check that
rfmostable is populated before running vessel sources import - Verify RFMO acronyms match exactly (case-sensitive)
- RFMO mapping only occurs when
-
Country Mapping Issues
- Country mapping only occurs when
source_typecontains "COUNTRY" - Country alpha3 code is read directly from
country_idcolumn in CSV - Alpha3 codes must be exactly 3 alphabetic characters (e.g., "USA", "CAN")
- If alpha3 code not found in
country_isotable,country_idremains NULL - Check that
country_isotable is populated with proper alpha3_code values - Verify alpha3 codes match exactly (case-sensitive)
- Country mapping only occurs when
Log Analysis
The script provides detailed logging:
- Row processing status
- Data cleaning results
- Import verification
- Query capability testing
Check the console output for warnings about skipped rows or data cleaning issues.
Integration
After loading vessel sources, they can be referenced by:
- Vessel import scripts
- Data quality checks
- Source tracking in vessel tables
- RFMO-specific processing workflows
The source_id UUID can be used as foreign key in vessel-related tables to track data provenance.