Logical Replication Troubleshooting Guide
This guide helps diagnose and fix common issues with the Ebisu-to-Neon logical replication setup.
Testing Scripts Overview
Run these scripts in order to diagnose issues:
- Test Crunchy Bridge CLI: ./scripts/test-crunchy-bridge-connection.sh
- Test Local Setup: ./scripts/test-replication-locally.sh
- Test Replication Connection: ./scripts/test-replication-connection.sh
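Running the scripts in the listed order lets each check build on the previous one. A minimal sketch of a wrapper that stops at the first failure, assuming each script exits with a non-zero status when its check fails (this guide does not confirm that); run_in_order is a hypothetical helper name:

```bash
# Run each command in order and stop at the first one that fails.
# Assumption: every diagnostic script signals failure via a non-zero exit code.
run_in_order() {
  for script in "$@"; do
    echo "==> $script"
    if ! "$script"; then
      echo "FAILED: $script" >&2
      return 1
    fi
  done
  echo "All checks passed"
}

# Example invocation with the script paths from this guide:
# run_in_order \
#   ./scripts/test-crunchy-bridge-connection.sh \
#   ./scripts/test-replication-locally.sh \
#   ./scripts/test-replication-connection.sh
```

The first script named in a FAILED line is the place to start in the sections that follow.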
Common Issues and Solutions
1. Crunchy Bridge CLI Issues
Problem: "cb: command not found"
# Install Crunchy Bridge CLI
brew install CrunchyData/brew/cb
Problem: "Not logged in to Crunchy Bridge"
# Login to Crunchy Bridge
cb login
Problem: "No Ebisu cluster found"
- Check cluster name contains "ebisu" (case-insensitive)
- Or manually select cluster ID when prompted
- Verify cluster exists:
cb list
2. Connection Issues
Problem: "Connection refused" or "timeout"
- Check firewall rules:
cb firewall-rule list --network-id YOUR_NETWORK_ID
- Add Neon IP range if missing:
cb firewall-rule create \
--network-id YOUR_NETWORK_ID \
--name "neon-replication" \
--rule "NEON_IP_RANGE/32"
Problem: "SSL connection required"
- Ensure certificate is exported:
cb cert CLUSTER_ID
- Add certificate to environment:
export EBISU_CERT_PEM="$(cat cert.pem)"
- Verify SSL mode in connection string includes certificate path
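As an illustration of a connection string that carries both the SSL mode and the certificate path, the sketch below builds a libpq URI. It is not the project's actual code: build_conn_uri is a hypothetical helper, sslmode=verify-full is an assumed choice, and the certificate path is whichever file the exported certificate was saved to. sslmode and sslrootcert are standard libpq connection parameters.

```bash
# Build a libpq connection URI that verifies the server certificate.
# Arguments: host, port, database, path to the CA certificate file.
# Assumption: the replication user is "replicator", as elsewhere in this guide.
build_conn_uri() {
  host=$1
  port=$2
  db=$3
  cert_path=$4
  printf 'postgresql://replicator@%s:%s/%s?sslmode=verify-full&sslrootcert=%s\n' \
    "$host" "$port" "$db" "$cert_path"
}

# Example (the password is passed via PGPASSWORD rather than embedded in the URI):
# PGPASSWORD="$EBISU_REPLICATOR_PASSWORD" \
#   psql "$(build_conn_uri "$EBISU_HOST" "$EBISU_PORT" "$EBISU_DATABASE" ./cert.pem)" -c 'SELECT 1'
```

A certificate path containing characters that are special in URIs would need percent-encoding, which this sketch does not do.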
3. Authentication Issues
Problem: "password authentication failed for user replicator"
- Verify password is correct
- Reset the password if needed:
ALTER USER replicator WITH PASSWORD 'new-secure-password';
Problem: "role replicator does not exist"
Run the setup script: ./scripts/configure-crunchy-bridge.sh
4. Replication Configuration Issues
Problem: "wal_level is not logical"
- Check current setting:
SHOW wal_level;
- Update in Crunchy Bridge dashboard:
  - Go to cluster settings
  - Set wal_level to logical
  - Restart cluster (causes brief downtime)
Problem: "publication does not exist"
Create the publication:
CREATE PUBLICATION ebisu_master_data FOR TABLE
original_sources, countries, fao_major_areas, rfmos,
vessel_types, hull_materials, gear_types_fao, gear_types_cbp,
gear_types_msc, gear_relationships_msc_fao, gear_relationships_fao_cbp,
asfis_species, worms_taxon, worms_speciesprofile, worms_vernacularname,
worms_identifier, itis_species, msc_fishery, msc_gear_data, country_profile;
Problem: "permission denied for table"
Grant permissions to replicator:
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;
GRANT USAGE ON SCHEMA public TO replicator;
5. Edge Function Issues
Problem: "Ebisu configuration missing"
Add these secrets to Supabase Edge Functions:
- EBISU_HOST
- EBISU_PORT
- EBISU_DATABASE
- EBISU_REPLICATOR_PASSWORD
- EBISU_CERT_PEM
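Before uploading the secrets, it can help to confirm that none of the five values is empty in the local environment. The sketch below assumes the values are exported as shell variables under the same names; missing_ebisu_vars is a hypothetical helper, not part of the repository.

```bash
# Print the names of required variables that are unset or empty.
# Returns 0 when all five are present, 1 otherwise.
missing_ebisu_vars() {
  missing=""
  for name in EBISU_HOST EBISU_PORT EBISU_DATABASE \
              EBISU_REPLICATOR_PASSWORD EBISU_CERT_PEM; do
    # Indirect lookup of the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  [ -z "$missing" ] && return 0
  # Strip the leading space before printing
  echo "${missing# }"
  return 1
}

# Example:
# missing_ebisu_vars || echo "Set the variables listed above before uploading secrets" >&2
```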
Problem: "Invalid tenant ID format"
- Ensure tenant ID is a valid UUID
- Format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
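A tenant ID can be checked against this format before calling the Edge Function. The sketch below only checks the 8-4-4-4-12 hexadecimal shape; whether the Edge Function also enforces a particular UUID version is not stated in this guide, and is_uuid is a hypothetical helper.

```bash
# Succeed if the argument has the 8-4-4-4-12 hexadecimal UUID shape.
is_uuid() {
  # A UUID in this form is always exactly 36 characters long
  [ "${#1}" -eq 36 ] || return 1
  printf '%s\n' "$1" \
    | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Example:
# is_uuid "$TENANT_ID" || echo "Invalid tenant ID format" >&2
```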
Problem: "vault.decrypt failed"
- Check VAULT_KEY_ID is set in Edge Functions
- Verify vault is properly configured in Supabase
6. Data Synchronization Issues
Problem: "pg_dump: command not found" in Edge Functions
This is a known limitation. Edge Functions don't have pg_dump. Solutions:
- Use a different approach for initial seeding
- Create a separate service for data seeding
- Use COPY commands instead
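As one illustration of the COPY approach, a separate seeding service that has psql installed could stream a table from the publisher to the subscriber. This is a sketch under stated assumptions: copy_table is a hypothetical helper, the target table already exists with the same columns and is empty, and both connection strings are supplied by the caller.

```bash
# Stream one table from a source database to a destination database.
# Arguments: source connection string, destination connection string, table name.
# Assumption: the table name is a plain identifier that needs no quoting.
copy_table() {
  src_url=$1
  dst_url=$2
  table=$3
  # COPY ... TO STDOUT writes rows to psql's stdout;
  # COPY ... FROM STDIN reads them from psql's stdin on the other side of the pipe.
  psql "$src_url" -c "COPY $table TO STDOUT" \
    | psql "$dst_url" -c "COPY $table FROM STDIN"
}

# Example:
# copy_table "$EBISU_URL" "$TENANT_URL" countries
```

Tables with foreign keys would need to be copied in dependency order, which this sketch leaves to the caller.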
Problem: "subscription already exists"
The code handles this case gracefully. To fix it manually, drop the subscription on the subscriber (Neon):
DROP SUBSCRIPTION IF EXISTS tenant_${tenantId}_sync;
Problem: Replication lag
- Check lag:
SELECT * FROM pg_stat_subscription;
- Common causes:
  - Network latency between regions
  - Large transactions on publisher
  - Insufficient resources on subscriber
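pg_stat_subscription reports positions as LSNs (for example received_lsn and latest_end_lsn), which are hard to compare by eye. The sketch below converts LSNs of the form X/Y to byte offsets so that the subscriber's position can be subtracted from the publisher's pg_current_wal_lsn(). The helper names are hypothetical; on the server, the built-in pg_wal_lsn_diff() performs the same subtraction.

```bash
# Convert an LSN such as 0/16B3748 to a byte offset.
# The part before the slash is the high 32 bits, the part after is the low 32 bits.
# Assumption: the input is a well-formed LSN; no validation is done here.
lsn_to_bytes() {
  hi=${1%%/*}
  lo=${1##*/}
  echo $(( (0x$hi << 32) + 0x$lo ))
}

# Print how many bytes the second LSN is behind the first.
lsn_lag_bytes() {
  echo $(( $(lsn_to_bytes "$1") - $(lsn_to_bytes "$2") ))
}

# Example: publisher position first, subscriber position second
# lsn_lag_bytes "$PUBLISHER_LSN" "$SUBSCRIBER_LSN"
```

A lag that keeps growing across repeated checks points to one of the causes listed above rather than a momentary burst of writes.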
7. Monitoring Issues
Problem: No alerts for failures
- Verify Slack webhook URL is set
- Check monitor function logs:
supabase functions logs monitor-replication
Problem: Monitor shows all tenants as failed
- Check if monitor can decrypt connection strings
- Verify network connectivity from Edge Functions to Neon
Debug Commands
Check Replication Status on Publisher (Crunchy Bridge)
-- Active replication connections
SELECT * FROM pg_stat_replication;
-- Replication slots
SELECT * FROM pg_replication_slots;
-- Publication details
SELECT * FROM pg_publication_tables WHERE pubname = 'ebisu_master_data';
Check Subscription Status on Subscriber (Neon)
-- Subscription status
SELECT * FROM pg_subscription;
-- Subscription statistics
SELECT * FROM pg_stat_subscription;
-- Check for errors (pg_stat_subscription_stats is available in PostgreSQL 15 and later)
SELECT * FROM pg_stat_subscription_stats;
View Edge Function Logs
# Sync function logs
supabase functions logs sync-tenant-data --tail
# Monitor function logs
supabase functions logs monitor-replication --tail
# Provisioning logs
supabase functions logs provision-tenant-resources --tail
Emergency Procedures
Reset Failed Subscription
curl -X POST https://your-project.supabase.co/functions/v1/sync-tenant-data \
-H "Authorization: Bearer $SERVICE_ROLE_KEY" \
-H "Content-Type: application/json" \
-d '{"tenantId": "UUID-HERE", "action": "reset"}'
Manually Remove Subscription
-- On subscriber (Neon)
DROP SUBSCRIPTION IF EXISTS tenant_UUID_sync;
-- Update metadata
UPDATE tenant_metadata
SET replication_status = 'pending',
replication_error = NULL
WHERE tenant_id = 'UUID-HERE';
Check All Tenants Health
curl https://your-project.supabase.co/functions/v1/monitor-replication \
-H "Authorization: Bearer $SERVICE_ROLE_KEY"
Performance Optimization
For Large Initial Data Sets
- Consider parallel table copies
- Increase work_mem temporarily
- Use COPY instead of INSERT
For Ongoing Replication
- Monitor replication slot size
- Set appropriate max_replication_slots
- Consider regional placement of databases
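To make slot monitoring concrete, the following sketch prints a query that reports how much WAL each replication slot is holding back on the publisher. slot_retention_sql and PUBLISHER_URL are hypothetical names; the query uses the built-in pg_wal_lsn_diff(), pg_current_wal_lsn(), and pg_size_pretty() functions (PostgreSQL 10 or later).

```bash
# Print a query listing each replication slot and the WAL it retains.
# Inactive slots with large retained_wal values are the usual cause of disk growth.
slot_retention_sql() {
  cat <<'SQL'
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots
ORDER BY pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) DESC;
SQL
}

# Example (run against the publisher):
# psql "$PUBLISHER_URL" -c "$(slot_retention_sql)"
```

The number of rows returned can be compared with the output of SHOW max_replication_slots; to see how close the publisher is to its slot limit.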
Getting Help
- Check logs first (database, edge functions)
- Run test scripts to isolate the issue
- Check ADR-0048 for known limitations
- Create an issue with:
  - Error messages
  - Test script outputs
  - Relevant logs