Source:
ebisu/docs/adr/0063-missing-vessel-registries-import.md| ✏️ Edit on GitHub
ADR-0063: Missing Vessel Registries Import Strategy
Status: Accepted
Date: 2025-01-11
Stakeholders: Development team, Data team
Context
We have received additional vessel registry data from various countries and organizations that need to be imported into the Ebisu intelligence platform. This data was collected on 2025-09-08 and includes:
Country Registries
- EU Countries (already have sources): Belgium, Bulgaria, Cyprus, Germany, Denmark, Spain, Estonia, Finland, France, UK, Greece, Croatia, Ireland, Italy, Lithuania, Latvia, Malta, Netherlands, Poland, Portugal, Romania, Slovenia, Sweden
- Chile: 16 regional RPA files (xlsx) + LTP-PEP file
- Peru: Single vessels file (xls)
- Others: Faroe Islands (pdf), USA Alaska (already have source)
Organization Registries
- ISSF: PS, PVR, UVI, VOSI (already have sources)
- PNA: FSMA, TUNA (new sources needed)
Decision
Implement a phased approach to import this data:
Phase 1: Create Missing Sources
Add sources for data not yet in the system:
- Chile regional registries (CHL_RPA_I through CHL_RPA_XVI)
- Chile LTP-PEP registry (CHL_LTP_PEP)
- Peru vessel registry (PER_VESSELS)
- PNA FSMA registry (PNA_FSMA)
- PNA Tuna registry (PNA_TUNA)
Phase 2: Data Conversion
Convert non-CSV formats to standardized CSV:
- Excel files (Chile, Peru, UK) → CSV
- PDF files (Faroe Islands, Taiwan) → Structured data (may require manual processing)
Phase 3: Enhanced Import Scripts
Create import scripts using the Phase 1 enhanced template that includes:
- Temporal tracking
- Change detection
- Confirmation tracking
- Data lineage
Architecture
Raw Files → Conversion → Cleaned CSV → Staging → Intelligence Reports
↓ ↓ ↓ ↓ ↓
PDF/Excel Python Validation PostgreSQL Confirmation
Scripts Tables Tracking
Implementation Plan
1. Source Creation Script
-- Add Chile regional sources
INSERT INTO original_sources_vessels (
source_shortname,
source_fullname,
source_types,
country_id,
authority_level,
data_quality_score
) VALUES
('CHL_RPA_I', 'Chile Region I (Tarapacá) Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'CHL'),
'AUTHORITATIVE', 0.80),
('CHL_RPA_II', 'Chile Region II (Antofagasta) Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'CHL'),
'AUTHORITATIVE', 0.80),
-- ... continue for all regions
('CHL_LTP_PEP', 'Chile LTP-PEP Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'CHL'),
'AUTHORITATIVE', 0.80),
('PER_VESSELS', 'Peru National Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'PER'),
'AUTHORITATIVE', 0.80),
('PNA_FSMA', 'PNA Vessel Registry (FSMA)', '{INTERGOV}',
NULL, 'AUTHORITATIVE', 0.85),
('PNA_TUNA', 'PNA Tuna Vessel Registry', '{INTERGOV}',
NULL, 'AUTHORITATIVE', 0.85);
2. Data Conversion Tools
- Use Python pandas for Excel → CSV conversion
- Implement validation for each source's specific format
- Handle character encoding issues (especially Spanish/Portuguese names)
- Preserve all original data fields
3. Import Scripts
Based on the enhanced template, create specific importers for:
- Each Chile region (handle regional differences)
- Peru vessels (handle Spanish field names)
- PNA registries (handle Pacific island specifics)
Data Quality Considerations
Field Mapping Challenges
- CFR (EU): Community Fleet Register number
- ADFG Number (USA Alaska): Alaska Department of Fish & Game ID
- Registration variations: Each country has unique identifiers
Standardization Approach
- Map country-specific IDs to
vessel_external_identifiers - Extract IMO, IRCS, MMSI where available
- Store country-specific fields in JSONB
- Calculate data completeness scores
Consequences
Positive
- Adds ~50,000+ vessels from national registries
- Increases cross-source confirmations
- Improves coverage of artisanal fleets (Chile, Peru)
- Enhances Pacific island coverage (PNA)
Negative
- Complex data conversion required
- PDF extraction may need manual work
- Regional variations need careful handling
Neutral
- Will trigger many confirmation updates
- Increases storage requirements
- May reveal data conflicts with RFMO data
Validation Strategy
After import, validate:
- Data completeness: Check required fields
- Cross-source confirmations: Look for vessels in multiple sources
- Geographic consistency: Vessels flagged to country should appear in registry
- Temporal consistency: Registration dates should be logical
References
- ADR-0061: Intelligence Confirmations Are Not Duplicates
- Import Architecture documentation
- Enhanced import template:
template_enhanced_staged_import.sh