Skip to main content

Source: ebisu/docs/adr/0063-missing-vessel-registries-import.md | ✏️ Edit on GitHub

ADR-0063: Missing Vessel Registries Import Strategy

Status: Accepted
Date: 2025-01-11
Stakeholders: Development team, Data team

Context

We have received additional vessel registry data from various countries and organizations that need to be imported into the Ebisu intelligence platform. This data was collected on 2025-09-08 and includes:

Country Registries

  • EU Countries (already have sources): Belgium, Bulgaria, Cyprus, Germany, Denmark, Spain, Estonia, Finland, France, UK, Greece, Croatia, Ireland, Italy, Lithuania, Latvia, Malta, Netherlands, Poland, Portugal, Romania, Slovenia, Sweden
  • Chile: 16 regional RPA files (xlsx) + LTP-PEP file
  • Peru: Single vessels file (xls)
  • Others: Faroe Islands (pdf), USA Alaska (already have source)

Organization Registries

  • ISSF: PS, PVR, UVI, VOSI (already have sources)
  • PNA: FSMA, TUNA (new sources needed)

Decision

Implement a phased approach to import this data:

Phase 1: Create Missing Sources

Add sources for data not yet in the system:

  • Chile regional registries (CHL_RPA_I through CHL_RPA_XVI)
  • Chile LTP-PEP registry (CHL_LTP_PEP)
  • Peru vessel registry (PER_VESSELS)
  • PNA FSMA registry (PNA_FSMA)
  • PNA Tuna registry (PNA_TUNA)

Phase 2: Data Conversion

Convert non-CSV formats to standardized CSV:

  • Excel files (Chile, Peru, UK) → CSV
  • PDF files (Faroe Islands, Taiwan) → Structured data (may require manual processing)

Phase 3: Enhanced Import Scripts

Create import scripts using the Phase 1 enhanced template that includes:

  • Temporal tracking
  • Change detection
  • Confirmation tracking
  • Data lineage

Architecture

Raw Files → Conversion → Cleaned CSV → Staging → Intelligence Reports
↓ ↓ ↓ ↓ ↓
PDF/Excel Python Validation PostgreSQL Confirmation
Scripts Tables Tracking

Implementation Plan

1. Source Creation Script

-- Add Chile regional sources
INSERT INTO original_sources_vessels (
source_shortname,
source_fullname,
source_types,
country_id,
authority_level,
data_quality_score
) VALUES
('CHL_RPA_I', 'Chile Region I (Tarapacá) Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'CHL'),
'AUTHORITATIVE', 0.80),
('CHL_RPA_II', 'Chile Region II (Antofagasta) Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'CHL'),
'AUTHORITATIVE', 0.80),
-- ... continue for all regions
('CHL_LTP_PEP', 'Chile LTP-PEP Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'CHL'),
'AUTHORITATIVE', 0.80),
('PER_VESSELS', 'Peru National Vessel Registry', '{COUNTRY}',
(SELECT country_id FROM countries WHERE iso_alpha3 = 'PER'),
'AUTHORITATIVE', 0.80),
('PNA_FSMA', 'PNA Vessel Registry (FSMA)', '{INTERGOV}',
NULL, 'AUTHORITATIVE', 0.85),
('PNA_TUNA', 'PNA Tuna Vessel Registry', '{INTERGOV}',
NULL, 'AUTHORITATIVE', 0.85);

2. Data Conversion Tools

  • Use Python pandas for Excel → CSV conversion
  • Implement validation for each source's specific format
  • Handle character encoding issues (especially Spanish/Portuguese names)
  • Preserve all original data fields

3. Import Scripts

Based on the enhanced template, create specific importers for:

  • Each Chile region (handle regional differences)
  • Peru vessels (handle Spanish field names)
  • PNA registries (handle Pacific island specifics)

Data Quality Considerations

Field Mapping Challenges

  • CFR (EU): Community Fleet Register number
  • ADFG Number (USA Alaska): Alaska Department of Fish & Game ID
  • Registration variations: Each country has unique identifiers

Standardization Approach

  1. Map country-specific IDs to vessel_external_identifiers
  2. Extract IMO, IRCS, MMSI where available
  3. Store country-specific fields in JSONB
  4. Calculate data completeness scores

Consequences

Positive

  • Adds ~50,000+ vessels from national registries
  • Increases cross-source confirmations
  • Improves coverage of artisanal fleets (Chile, Peru)
  • Enhances Pacific island coverage (PNA)

Negative

  • Complex data conversion required
  • PDF extraction may need manual work
  • Regional variations need careful handling

Neutral

  • Will trigger many confirmation updates
  • Increases storage requirements
  • May reveal data conflicts with RFMO data

Validation Strategy

After import, validate:

  1. Data completeness: Check required fields
  2. Cross-source confirmations: Look for vessels in multiple sources
  3. Geographic consistency: Vessels flagged to country should appear in registry
  4. Temporal consistency: Registration dates should be logical

References

  • ADR-0061: Intelligence Confirmations Are Not Duplicates
  • Import Architecture documentation
  • Enhanced import template: template_enhanced_staged_import.sh