
Source: ocean/docs/REPLICATION_TROUBLESHOOTING.md

Logical Replication Troubleshooting Guide

This guide helps diagnose and fix common issues with the Ebisu to Neon logical replication setup.

Testing Scripts Overview

Run these scripts in order to diagnose issues:

  1. Test Crunchy Bridge CLI: ./scripts/test-crunchy-bridge-connection.sh
  2. Test Local Setup: ./scripts/test-replication-locally.sh
  3. Test Replication Connection: ./scripts/test-replication-connection.sh

Common Issues and Solutions

1. Crunchy Bridge CLI Issues

Problem: "cb: command not found"

# Install Crunchy Bridge CLI
brew install CrunchyData/brew/cb

Problem: "Not logged in to Crunchy Bridge"

# Login to Crunchy Bridge
cb login

Problem: "No Ebisu cluster found"

  • Check that the cluster name contains "ebisu" (case-insensitive)
  • Or manually select cluster ID when prompted
  • Verify cluster exists: cb list

2. Connection Issues

Problem: "Connection refused" or "timeout"

  1. Check firewall rules:

    cb firewall-rule list --network-id YOUR_NETWORK_ID
  2. Add the Neon egress IP (or CIDR range) if missing:

    cb firewall-rule create \
    --network-id YOUR_NETWORK_ID \
    --name "neon-replication" \
    --rule "NEON_IP_RANGE/32"

Problem: "SSL connection required"

  • Ensure certificate is exported: cb cert CLUSTER_ID
  • Add certificate to environment: export EBISU_CERT_PEM="$(cat cert.pem)"
  • Verify the connection string's SSL mode uses the certificate (e.g. sslmode=verify-full with sslrootcert)
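
For reference, a libpq-style connection string that pins the certificate looks like this (host, password, database, and paths are placeholders):

```
postgresql://replicator:PASSWORD@EBISU_HOST:5432/DATABASE?sslmode=verify-full&sslrootcert=/path/to/cert.pem
```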

3. Authentication Issues

Problem: "password authentication failed for user replicator"

  1. Verify password is correct

  2. Reset the replicator password if needed:

    ALTER USER replicator WITH PASSWORD 'new-secure-password';

Problem: "role replicator does not exist"

Run the setup script: ./scripts/configure-crunchy-bridge.sh

4. Replication Configuration Issues

Problem: "wal_level is not logical"

  1. Check current setting:

    SHOW wal_level;
  2. Update in Crunchy Bridge dashboard:

    • Go to cluster settings
    • Set wal_level to logical
    • Restart cluster (causes brief downtime)

Problem: "publication does not exist"

Create the publication:

CREATE PUBLICATION ebisu_master_data FOR TABLE
original_sources, countries, fao_major_areas, rfmos,
vessel_types, hull_materials, gear_types_fao, gear_types_cbp,
gear_types_msc, gear_relationships_msc_fao, gear_relationships_fao_cbp,
asfis_species, worms_taxon, worms_speciesprofile, worms_vernacularname,
worms_identifier, itis_species, msc_fishery, msc_gear_data, country_profile;

Problem: "permission denied for table"

Grant permissions to replicator:

GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;
GRANT USAGE ON SCHEMA public TO replicator;
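
Note that GRANT ... ON ALL TABLES only covers tables that already exist. If new tables may be added to the publication later, default privileges can cover them as well (a sketch):

```sql
-- Grant SELECT on tables created in public in the future to replicator
ALTER DEFAULT PRIVILEGES IN SCHEMA public
  GRANT SELECT ON TABLES TO replicator;
```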

5. Edge Function Issues

Problem: "Ebisu configuration missing"

Add these secrets to Supabase Edge Functions:

  • EBISU_HOST
  • EBISU_PORT
  • EBISU_DATABASE
  • EBISU_REPLICATOR_PASSWORD
  • EBISU_CERT_PEM
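
With the Supabase CLI, these can be set in a single call; the values below are placeholders for illustration:

```shell
supabase secrets set \
  EBISU_HOST=your-cluster-host.example.com \
  EBISU_PORT=5432 \
  EBISU_DATABASE=postgres \
  EBISU_REPLICATOR_PASSWORD='your-password' \
  EBISU_CERT_PEM="$(cat cert.pem)"
```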

Problem: "Invalid tenant ID format"

  • Ensure tenant ID is a valid UUID
  • Format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
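
The format check above can be scripted before calling the sync function; a minimal sketch (the `is_uuid` helper name is ours):

```shell
# Hypothetical helper: verify a tenant ID is UUID-shaped before invoking the Edge Function
is_uuid() {
  printf '%s\n' "$1" | grep -Eiq '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

is_uuid "123e4567-e89b-12d3-a456-426614174000" && echo "tenant ID looks valid"
```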

Problem: "vault.decrypt failed"

  • Check VAULT_KEY_ID is set in Edge Functions
  • Verify vault is properly configured in Supabase

6. Data Synchronization Issues

Problem: "pg_dump: command not found" in Edge Functions

This is a known limitation. Edge Functions don't have pg_dump. Solutions:

  1. Use a different approach for initial seeding
  2. Create a separate service for data seeding
  3. Use COPY commands instead
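
For option 3, psql's \copy can stream a table between the two databases without pg_dump; a sketch, assuming $EBISU_URL and $NEON_URL hold the two connection strings:

```shell
# Stream one published table from Ebisu to Neon as CSV
psql "$EBISU_URL" -c "\copy countries TO STDOUT (FORMAT csv)" \
  | psql "$NEON_URL" -c "\copy countries FROM STDIN (FORMAT csv)"
```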

Problem: "subscription already exists"

The subscription code handles this gracefully; to fix it manually:

DROP SUBSCRIPTION IF EXISTS tenant_${tenantId}_sync;

Problem: Replication lag

  1. Check lag:

    SELECT * FROM pg_stat_subscription;
  2. Common causes:

    • Network latency between regions
    • Large transactions on publisher
    • Insufficient resources on subscriber
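
On the publisher side, lag can also be measured in bytes of unreplayed WAL:

```sql
-- Run on Ebisu (publisher): WAL not yet replayed by each subscriber
SELECT application_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn)) AS replay_lag
FROM pg_stat_replication;
```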

7. Monitoring Issues

Problem: No alerts for failures

  1. Verify Slack webhook URL is set

  2. Check monitor function logs:

    supabase functions logs monitor-replication

Problem: Monitor shows all tenants as failed

  • Check if monitor can decrypt connection strings
  • Verify network connectivity from Edge Functions to Neon

Debug Commands

Check Replication Status on Publisher (Crunchy Bridge)

-- Active replication connections
SELECT * FROM pg_stat_replication;

-- Replication slots
SELECT * FROM pg_replication_slots;

-- Publication details
SELECT * FROM pg_publication_tables WHERE pubname = 'ebisu_master_data';

Check Subscription Status on Subscriber (Neon)

-- Subscription status
SELECT * FROM pg_subscription;

-- Subscription statistics
SELECT * FROM pg_stat_subscription;

-- Per-subscription error counts (PostgreSQL 15+)
SELECT * FROM pg_stat_subscription_stats;

View Edge Function Logs

# Sync function logs
supabase functions logs sync-tenant-data --tail

# Monitor function logs
supabase functions logs monitor-replication --tail

# Provisioning logs
supabase functions logs provision-tenant-resources --tail

Emergency Procedures

Reset Failed Subscription

curl -X POST https://your-project.supabase.co/functions/v1/sync-tenant-data \
-H "Authorization: Bearer $SERVICE_ROLE_KEY" \
-H "Content-Type: application/json" \
-d '{"tenantId": "UUID-HERE", "action": "reset"}'

Manually Remove Subscription

-- On subscriber (Neon)
DROP SUBSCRIPTION IF EXISTS tenant_UUID_sync;

-- Update metadata
UPDATE tenant_metadata
SET replication_status = 'pending',
    replication_error = NULL
WHERE tenant_id = 'UUID-HERE';

Check All Tenants Health

curl https://your-project.supabase.co/functions/v1/monitor-replication \
-H "Authorization: Bearer $SERVICE_ROLE_KEY"

Performance Optimization

For Large Initial Data Sets

  1. Consider parallel table copies
  2. Increase work_mem temporarily
  3. Use COPY instead of INSERT

For Ongoing Replication

  1. Monitor replication slot size
  2. Set appropriate max_replication_slots
  3. Consider regional placement of databases
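
Slot size from item 1 can be checked on the publisher with a query like this; WAL retained by an inactive slot grows until the slot is dropped:

```sql
-- WAL retained for each replication slot
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;
```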

Getting Help

  1. Check logs first (database, edge functions)
  2. Run test scripts to isolate the issue
  3. Check ADR-0048 for known limitations
  4. Create an issue with:
    • Error messages
    • Test script outputs
    • Relevant logs