Skip to main content

Source: ebisu/docs/reference/schemas/asfis_species/asfis_annual_update_strategy.md | ✏️ Edit on GitHub

Annual Update Strategy - Complete Implementation Guide

📋 Executive Summary

CURRENT PROBLEM: The existing "nuke and rebuild" approach for annual updates is unsustainable:

  • ⏱️ 30+ minute downtime
  • 🚨 Risk of data corruption
  • ❌ No rollback capability
  • 💥 All-or-nothing approach

PROPOSED SOLUTION: Incremental update strategy with proper change tracking:

  • ⚡ 5-10 minute updates
  • 🔄 Zero downtime
  • 🛡️ Full backup and rollback capability
  • 📊 Complete audit trail
  • ✅ Only new trade codes need WoRMS/ITIS mapping

🚨 Critical Issues Found & Fixed

Issue 1: ASFIS Data Duplication (RESOLVED)

-- ❌ PROBLEM: Double \COPY commands in import_asfis.sql
\COPY asfis_species (...) FROM '/import/ASFIS_sp_2025_cleaned.csv' ...
\COPY asfis_species (...) FROM '/import/ASFIS_sp_2025_cleaned.csv' ... -- DUPLICATE!

-- ✅ SOLUTION: Single import command (provided in fixed script)
\COPY asfis_species (...) FROM '/import/ASFIS_sp_2025_cleaned.csv' ...

Result: Record count should be ~13,733, not 27,466

Issue 2: Broken PostgreSQL Function Syntax (RESOLVED)

-- ❌ PROBLEM: Invalid dollar quoting
) AS $ -- Missing function body terminator
$ LANGUAGE plpgsql; -- Missing function body terminator

-- ✅ SOLUTION: Proper PostgreSQL function syntax (provided in fixed script)
) AS $$
BEGIN
-- function body
END;
$$ LANGUAGE plpgsql;

Issue 3: Preprocessing Creates Duplicate Keys (IDENTIFIED)

# ❌ PROBLEM: Same Taxonomic_Code for multiple species breaks UPSERT logic
# "Alosa alosa, A. fallax" becomes:
# Row 1: Taxonomic_Code "1750100101", scientificName "Alosa alosa"
# Row 2: Taxonomic_Code "1750100101", scientificName "Alosa fallax" # Same key!

# ✅ SOLUTION: Use composite keys for UPSERT operations
composite_key = f"{taxonomic_code}::{scientific_name}"

🎯 Annual Update Strategy - Three Implementation Options

Approach: Minimal changes to current setup, use composite unique keys

Pros:

  • ✅ Works with existing preprocessing
  • ✅ Minimal code changes required
  • ✅ Can implement immediately

Cons:

  • ❌ More complex UPSERT logic
  • ❌ Not the cleanest architecture

Implementation:

-- Add composite unique constraint for incremental updates
CREATE UNIQUE INDEX asfis_species_composite_key
ON asfis_species ("Taxonomic_Code", "scientificName");

-- Use composite key for UPSERT operations
ON CONFLICT ("Taxonomic_Code", "scientificName") DO UPDATE SET ...

Approach: Separate trade codes from species mappings

Pros:

  • ✅ Clean separation of concerns
  • ✅ Easy incremental updates
  • ✅ Proper normalization
  • ✅ Future-proof architecture

Cons:

  • ❌ Requires preprocessing changes
  • ❌ Schema migration needed
  • ❌ More complex initial setup

Implementation:

-- Master table: One record per trade code
CREATE TABLE asfis_trade_codes (
asfis_trade_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
"Taxonomic_Code" TEXT UNIQUE NOT NULL, -- Now truly unique
"Alpha3_Code" TEXT,
-- Trade-level attributes
data_year INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Detail table: Multiple species per trade code
CREATE TABLE asfis_species_mapping (
mapping_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
asfis_trade_id UUID REFERENCES asfis_trade_codes(asfis_trade_id),
"scientificName" TEXT NOT NULL,
-- Species-specific attributes
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(asfis_trade_id, "scientificName")
);

Option C: Current Setup with Composite Keys (Compromise - 2-3 weeks)

Approach: Keep current table structure, modify preprocessing for stable composite keys

Implementation provided in: Modified ASFIS Preprocessing for Incremental Updates artifact


🔧 Implementation Roadmap

Phase 1: Immediate Fixes (This Week)

Step 1.1: Fix ASFIS Duplication Bug

# Replace import_asfis.sql with the fixed version provided
# Test the fix
docker compose down && docker compose up

# Verify record count
docker exec -it pg-schema-viewer-db psql -U myuser -d speciestest -c "SELECT COUNT(*) FROM asfis_species;"
# Should show ~13,733, not 27,466

Step 1.2: Add Historical Tracking Infrastructure

-- Run these SQL commands to prepare for incremental updates
CREATE TABLE asfis_species_historical (
-- Copy all columns from main table
LIKE asfis_species INCLUDING ALL,

-- Add historical tracking columns
archived_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
archived_reason TEXT DEFAULT 'REMOVED_FROM_SOURCE',
data_year INTEGER,
original_asfis_id UUID
);

-- Add version tracking to main table
ALTER TABLE asfis_species ADD COLUMN IF NOT EXISTS data_year INTEGER DEFAULT 2025;
ALTER TABLE asfis_species ADD COLUMN IF NOT EXISTS last_seen_year INTEGER DEFAULT 2025;

-- Create indexes for historical queries
CREATE INDEX idx_asfis_historical_taxonomic_code ON asfis_species_historical("Taxonomic_Code");
CREATE INDEX idx_asfis_historical_alpha3_code ON asfis_species_historical("Alpha3_Code");
CREATE INDEX idx_asfis_historical_archived_at ON asfis_species_historical(archived_at);
CREATE INDEX idx_asfis_historical_data_year ON asfis_species_historical(data_year);

Phase 2: Choose Architecture Path (This Month)

Decision Point: Choose Option A, B, or C based on your requirements:

  • Option A for fastest implementation
  • Option B for cleanest long-term architecture
  • Option C for balanced approach

Phase 3: Implement Incremental Updates (Next Month)

Step 3.1: Deploy Incremental Update Functions

  • Use the SQL functions provided in the Incremental Annual Update Implementation artifact
  • Test with staging data first
  • Validate backup and restore procedures

Step 3.2: Create Annual Update Script

  • Use the bash script provided in the Annual Update Orchestration Script artifact
  • Test the complete workflow with 2025 data
  • Verify rollback procedures work

Step 3.3: Integrate with WoRMS/ITIS Mapping

-- After incremental update, identify new mappings needed
SELECT
a."Alpha3_Code",
a."scientificName",
CASE
WHEN w.aphia_id IS NOT NULL THEN 'WoRMS_MAPPED'
WHEN i.tsn IS NOT NULL THEN 'ITIS_MAPPED'
ELSE 'NEEDS_MAPPING'
END as mapping_status
FROM asfis_species a
LEFT JOIN worms_taxonomic_units w ON a."scientificName" = w."scientificName"
LEFT JOIN itis_taxonomic_units i ON a."scientificName" = i.complete_name
WHERE a.data_year = 2025 -- Only check new/updated records
AND mapping_status = 'NEEDS_MAPPING';

📅 Annual Update Workflow (Future State)

Current Workflow (BROKEN):

# ❌ Current approach - unsustainable
docker compose down # Downtime starts
docker compose up # 30+ minutes, rebuild everything
# Risk of failure, no rollback, loses all user data

New Workflow (OPTIMIZED):

# ✅ New incremental approach
# 1. Download new ASFIS data
wget "https://fao.org/asfis/ASFIS_sp_2026.csv" -O /import/ASFIS_sp_2026.csv

# 2. Run incremental update (zero downtime)
./annual_update.sh 2026

# 3. Review what changed
psql -c "SELECT * FROM preview_asfis_annual_update('/import/ASFIS_sp_2026_cleaned.csv');"

# 4. Execute update (5-10 minutes)
psql -c "SELECT * FROM perform_asfis_annual_update(2026, '/import/ASFIS_sp_2026_cleaned.csv');"

# 5. Map only new codes to WoRMS/ITIS (separate process)
./map_new_codes_to_taxonomies.sh 2026

# 6. Done! Database never went offline ✅

🎯 Benefits Analysis

Performance Improvements:

MetricCurrent ApproachNew ApproachImprovement
Downtime30+ minutes0 minutes100% elimination
Update Time30+ minutes5-10 minutes75% reduction
Data ProcessingFull rebuildOnly changes90% reduction
Risk LevelHighLowSignificant reduction

Operational Benefits:

  • Existing records keep their UUIDs (API stability)
  • New records get current timestamps (proper versioning)
  • Complete audit trail (know exactly what changed when)
  • Rollback capability (can undo if needed)
  • Historical data preserved (nothing lost)
  • Only new codes need mapping (focused effort)

Data Integrity Benefits:

  • 🛡️ Backup before every update (safety first)
  • 📊 Change detection (know what actually changed)
  • 🔍 Validation checks (verify update success)
  • 📋 Status tracking (monitor update progress)
  • ↩️ Recovery procedures (handle failures gracefully)

🚀 Trade Data Implementation Status

✅ FEASIBLE - Trade Data Patterns Supported:

Pattern 1: Single Species per Trade Code (1:1)

-- Example: YFT (Yellowfin Tuna)
SELECT * FROM asfis_species WHERE "Alpha3_Code" = 'YFT';
-- Returns: Only Thunnus albacares

Pattern 2: Multiple Species per Trade Code (1:Many)

-- Example: HKC (Cape Hakes)
SELECT * FROM asfis_species WHERE "Alpha3_Code" = 'HKC';
-- Returns: Merluccius capensis AND Merluccius paradoxus

Pattern 3: Higher Taxonomic Level with Cascading (1:Hierarchy)

-- Example: CAX (Ariidae - requires cascading to all family members)
SELECT * FROM asfis_species WHERE "Alpha3_Code" = 'CAX';
-- Returns: Ariidae (family level)
-- Future: Should cascade to ALL species within Ariidae family via WoRMS/ITIS

Cascading Implementation (Future):

-- Family-level cascading example
WITH family_codes AS (
SELECT "Alpha3_Code", "scientificName" as family_name
FROM asfis_species WHERE "taxonRank" = 'Family'
)
SELECT
w."scientificName",
fc."Alpha3_Code" as inherited_trade_code,
'Family-level inheritance' as assignment_type
FROM worms_taxonomic_units w
JOIN family_codes fc ON w."family" = fc.family_name;

📋 Implementation Checklist

Immediate Actions (This Week):

  • Replace import_asfis.sql with fixed version (remove duplicate \COPY)
  • Test fix: docker compose down && docker compose up
  • Verify ASFIS record count is ~13,733 (not 27,466)
  • Add historical tracking tables to database schema

Short Term (This Month):

  • Choose architecture path (Option A, B, or C)
  • Implement chosen approach
  • Create staging environment for testing updates
  • Develop incremental update functions

Medium Term (Next Month):

  • Deploy incremental update orchestration script
  • Test complete annual update workflow with 2025 data
  • Validate backup and rollback procedures
  • Document new update process for team

Long Term (Ongoing):

  • Map new trade codes to WoRMS/ITIS as they appear
  • Monitor update performance and optimization opportunities
  • Implement automated notifications for update completion
  • Create data quality monitoring dashboards

🎉 Expected Outcomes

After Implementation:

  • 🚀 Annual updates become routine 10-minute operations
  • 🛡️ Zero risk of losing data during updates
  • 📊 Complete visibility into what changes between years
  • New trade codes can be mapped incrementally as needed
  • 🔄 System stays online during all update operations
  • 📋 Full audit trail for compliance and debugging

Trade Data Ready:

  • 1:1 species mappings (direct trade codes)
  • 1:Many species mappings (shared trade codes)
  • 1:Hierarchy mappings (taxonomic cascading ready)
  • WoRMS/ITIS integration (new codes mapped as needed)

This incremental update strategy transforms annual updates from a high-risk, high-downtime operation into a routine, safe, fast process that preserves data integrity and provides complete auditability.