Source:
ebisu/docs/reference/schemas/asfis_species/asfis_annual_update_strategy.md| ✏️ Edit on GitHub
Annual Update Strategy - Complete Implementation Guide
📋 Executive Summary
CURRENT PROBLEM: The existing "nuke and rebuild" approach for annual updates is unsustainable:
- ⏱️ 30+ minute downtime
- 🚨 Risk of data corruption
- ❌ No rollback capability
- 💥 All-or-nothing approach
PROPOSED SOLUTION: Incremental update strategy with proper change tracking:
- ⚡ 5-10 minute updates
- 🔄 Zero downtime
- 🛡️ Full backup and rollback capability
- 📊 Complete audit trail
- ✅ Only new trade codes need WoRMS/ITIS mapping
🚨 Critical Issues Found & Fixed
Issue 1: ASFIS Data Duplication (RESOLVED)
-- ❌ PROBLEM: Double \COPY commands in import_asfis.sql
\COPY asfis_species (...) FROM '/import/ASFIS_sp_2025_cleaned.csv' ...
\COPY asfis_species (...) FROM '/import/ASFIS_sp_2025_cleaned.csv' ... -- DUPLICATE!
-- ✅ SOLUTION: Single import command (provided in fixed script)
\COPY asfis_species (...) FROM '/import/ASFIS_sp_2025_cleaned.csv' ...
Result: Record count should be ~13,733, not 27,466
Issue 2: Broken PostgreSQL Function Syntax (RESOLVED)
-- ❌ PROBLEM: Invalid dollar quoting
) AS $ -- Missing function body terminator
$ LANGUAGE plpgsql; -- Missing function body terminator
-- ✅ SOLUTION: Proper PostgreSQL function syntax (provided in fixed script)
) AS $$
BEGIN
-- function body
END;
$$ LANGUAGE plpgsql;
Issue 3: Preprocessing Creates Duplicate Keys (IDENTIFIED)
# ❌ PROBLEM: Same Taxonomic_Code for multiple species breaks UPSERT logic
# "Alosa alosa, A. fallax" becomes:
# Row 1: Taxonomic_Code "1750100101", scientificName "Alosa alosa"
# Row 2: Taxonomic_Code "1750100101", scientificName "Alosa fallax" # Same key!
# ✅ SOLUTION: Use composite keys for UPSERT operations
composite_key = f"{taxonomic_code}::{scientific_name}"
🎯 Annual Update Strategy - Three Implementation Options
Option A: Quick Fix (Recommended for MVP - 1-2 weeks)
Approach: Minimal changes to current setup, use composite unique keys
Pros:
- ✅ Works with existing preprocessing
- ✅ Minimal code changes required
- ✅ Can implement immediately
Cons:
- ❌ More complex UPSERT logic
- ❌ Not the cleanest architecture
Implementation:
-- Add composite unique constraint for incremental updates
CREATE UNIQUE INDEX asfis_species_composite_key
ON asfis_species ("Taxonomic_Code", "scientificName");
-- Use composite key for UPSERT operations
ON CONFLICT ("Taxonomic_Code", "scientificName") DO UPDATE SET ...
Option B: Master-Detail Architecture (Recommended for Production - 3-4 weeks)
Approach: Separate trade codes from species mappings
Pros:
- ✅ Clean separation of concerns
- ✅ Easy incremental updates
- ✅ Proper normalization
- ✅ Future-proof architecture
Cons:
- ❌ Requires preprocessing changes
- ❌ Schema migration needed
- ❌ More complex initial setup
Implementation:
-- Master table: One record per trade code
CREATE TABLE asfis_trade_codes (
asfis_trade_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
"Taxonomic_Code" TEXT UNIQUE NOT NULL, -- Now truly unique
"Alpha3_Code" TEXT,
-- Trade-level attributes
data_year INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Detail table: Multiple species per trade code
CREATE TABLE asfis_species_mapping (
mapping_id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
asfis_trade_id UUID REFERENCES asfis_trade_codes(asfis_trade_id),
"scientificName" TEXT NOT NULL,
-- Species-specific attributes
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
UNIQUE(asfis_trade_id, "scientificName")
);
Option C: Current Setup with Composite Keys (Compromise - 2-3 weeks)
Approach: Keep current table structure, modify preprocessing for stable composite keys
Implementation provided in: Modified ASFIS Preprocessing for Incremental Updates artifact
🔧 Implementation Roadmap
Phase 1: Immediate Fixes (This Week)
Step 1.1: Fix ASFIS Duplication Bug
# Replace import_asfis.sql with the fixed version provided
# Test the fix
docker compose down && docker compose up
# Verify record count
docker exec -it pg-schema-viewer-db psql -U myuser -d speciestest -c "SELECT COUNT(*) FROM asfis_species;"
# Should show ~13,733, not 27,466
Step 1.2: Add Historical Tracking Infrastructure
-- Run these SQL commands to prepare for incremental updates
CREATE TABLE asfis_species_historical (
-- Copy all columns from main table
LIKE asfis_species INCLUDING ALL,
-- Add historical tracking columns
archived_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
archived_reason TEXT DEFAULT 'REMOVED_FROM_SOURCE',
data_year INTEGER,
original_asfis_id UUID
);
-- Add version tracking to main table
ALTER TABLE asfis_species ADD COLUMN IF NOT EXISTS data_year INTEGER DEFAULT 2025;
ALTER TABLE asfis_species ADD COLUMN IF NOT EXISTS last_seen_year INTEGER DEFAULT 2025;
-- Create indexes for historical queries
CREATE INDEX idx_asfis_historical_taxonomic_code ON asfis_species_historical("Taxonomic_Code");
CREATE INDEX idx_asfis_historical_alpha3_code ON asfis_species_historical("Alpha3_Code");
CREATE INDEX idx_asfis_historical_archived_at ON asfis_species_historical(archived_at);
CREATE INDEX idx_asfis_historical_data_year ON asfis_species_historical(data_year);
Phase 2: Choose Architecture Path (This Month)
Decision Point: Choose Option A, B, or C based on your requirements:
- Option A for fastest implementation
- Option B for cleanest long-term architecture
- Option C for balanced approach
Phase 3: Implement Incremental Updates (Next Month)
Step 3.1: Deploy Incremental Update Functions
- Use the SQL functions provided in the
Incremental Annual Update Implementationartifact - Test with staging data first
- Validate backup and restore procedures
Step 3.2: Create Annual Update Script
- Use the bash script provided in the
Annual Update Orchestration Scriptartifact - Test the complete workflow with 2025 data
- Verify rollback procedures work
Step 3.3: Integrate with WoRMS/ITIS Mapping
-- After incremental update, identify new mappings needed
SELECT
a."Alpha3_Code",
a."scientificName",
CASE
WHEN w.aphia_id IS NOT NULL THEN 'WoRMS_MAPPED'
WHEN i.tsn IS NOT NULL THEN 'ITIS_MAPPED'
ELSE 'NEEDS_MAPPING'
END as mapping_status
FROM asfis_species a
LEFT JOIN worms_taxonomic_units w ON a."scientificName" = w."scientificName"
LEFT JOIN itis_taxonomic_units i ON a."scientificName" = i.complete_name
WHERE a.data_year = 2025 -- Only check new/updated records
AND mapping_status = 'NEEDS_MAPPING';
📅 Annual Update Workflow (Future State)
Current Workflow (BROKEN):
# ❌ Current approach - unsustainable
docker compose down # Downtime starts
docker compose up # 30+ minutes, rebuild everything
# Risk of failure, no rollback, loses all user data
New Workflow (OPTIMIZED):
# ✅ New incremental approach
# 1. Download new ASFIS data
wget "https://fao.org/asfis/ASFIS_sp_2026.csv" -O /import/ASFIS_sp_2026.csv
# 2. Run incremental update (zero downtime)
./annual_update.sh 2026
# 3. Review what changed
psql -c "SELECT * FROM preview_asfis_annual_update('/import/ASFIS_sp_2026_cleaned.csv');"
# 4. Execute update (5-10 minutes)
psql -c "SELECT * FROM perform_asfis_annual_update(2026, '/import/ASFIS_sp_2026_cleaned.csv');"
# 5. Map only new codes to WoRMS/ITIS (separate process)
./map_new_codes_to_taxonomies.sh 2026
# 6. Done! Database never went offline ✅
🎯 Benefits Analysis
Performance Improvements:
| Metric | Current Approach | New Approach | Improvement |
|---|---|---|---|
| Downtime | 30+ minutes | 0 minutes | 100% elimination |
| Update Time | 30+ minutes | 5-10 minutes | 75% reduction |
| Data Processing | Full rebuild | Only changes | 90% reduction |
| Risk Level | High | Low | Significant reduction |
Operational Benefits:
- ✅ Existing records keep their UUIDs (API stability)
- ✅ New records get current timestamps (proper versioning)
- ✅ Complete audit trail (know exactly what changed when)
- ✅ Rollback capability (can undo if needed)
- ✅ Historical data preserved (nothing lost)
- ✅ Only new codes need mapping (focused effort)
Data Integrity Benefits:
- 🛡️ Backup before every update (safety first)
- 📊 Change detection (know what actually changed)
- 🔍 Validation checks (verify update success)
- 📋 Status tracking (monitor update progress)
- ↩️ Recovery procedures (handle failures gracefully)
🚀 Trade Data Implementation Status
✅ FEASIBLE - Trade Data Patterns Supported:
Pattern 1: Single Species per Trade Code (1:1)
-- Example: YFT (Yellowfin Tuna)
SELECT * FROM asfis_species WHERE "Alpha3_Code" = 'YFT';
-- Returns: Only Thunnus albacares
Pattern 2: Multiple Species per Trade Code (1:Many)
-- Example: HKC (Cape Hakes)
SELECT * FROM asfis_species WHERE "Alpha3_Code" = 'HKC';
-- Returns: Merluccius capensis AND Merluccius paradoxus
Pattern 3: Higher Taxonomic Level with Cascading (1:Hierarchy)
-- Example: CAX (Ariidae - requires cascading to all family members)
SELECT * FROM asfis_species WHERE "Alpha3_Code" = 'CAX';
-- Returns: Ariidae (family level)
-- Future: Should cascade to ALL species within Ariidae family via WoRMS/ITIS
Cascading Implementation (Future):
-- Family-level cascading example
WITH family_codes AS (
SELECT "Alpha3_Code", "scientificName" as family_name
FROM asfis_species WHERE "taxonRank" = 'Family'
)
SELECT
w."scientificName",
fc."Alpha3_Code" as inherited_trade_code,
'Family-level inheritance' as assignment_type
FROM worms_taxonomic_units w
JOIN family_codes fc ON w."family" = fc.family_name;
📋 Implementation Checklist
Immediate Actions (This Week):
- Replace
import_asfis.sqlwith fixed version (remove duplicate \COPY) - Test fix:
docker compose down && docker compose up - Verify ASFIS record count is ~13,733 (not 27,466)
- Add historical tracking tables to database schema
Short Term (This Month):
- Choose architecture path (Option A, B, or C)
- Implement chosen approach
- Create staging environment for testing updates
- Develop incremental update functions
Medium Term (Next Month):
- Deploy incremental update orchestration script
- Test complete annual update workflow with 2025 data
- Validate backup and rollback procedures
- Document new update process for team
Long Term (Ongoing):
- Map new trade codes to WoRMS/ITIS as they appear
- Monitor update performance and optimization opportunities
- Implement automated notifications for update completion
- Create data quality monitoring dashboards
🎉 Expected Outcomes
After Implementation:
- 🚀 Annual updates become routine 10-minute operations
- 🛡️ Zero risk of losing data during updates
- 📊 Complete visibility into what changes between years
- ⚡ New trade codes can be mapped incrementally as needed
- 🔄 System stays online during all update operations
- 📋 Full audit trail for compliance and debugging
Trade Data Ready:
- ✅ 1:1 species mappings (direct trade codes)
- ✅ 1:Many species mappings (shared trade codes)
- ✅ 1:Hierarchy mappings (taxonomic cascading ready)
- ✅ WoRMS/ITIS integration (new codes mapped as needed)
This incremental update strategy transforms annual updates from a high-risk, high-downtime operation into a routine, safe, fast process that preserves data integrity and provides complete auditability.