Source: ocean/docs/OPERATIONS_RUNBOOK.md
Operations Runbook
Quick Reference: Debug guide for common issues, deployment procedures, and emergency responses.
Common Issues & Solutions
Authentication Issues
"Database error saving new user"
Debug steps:
# 1. Check trigger function
psql postgresql://postgres:postgres@localhost:54322/postgres
\df handle_new_user
# 2. Test user creation manually
pnpm run debug:user "test@example.com" "Test Org"
# 3. Check provisioning events (run in psql)
SELECT * FROM provisioning_events
WHERE organization_id = 'org-id'
ORDER BY created_at DESC;
# 4. Check Supabase logs (Postgres logs to stderr, so redirect it into the pipe)
docker logs supabase_db_ocean --tail 100 2>&1 | grep ERROR
JWT Token Invalid
# Check token expiry (jwt-cli)
jwt decode <token>
# Force refresh session (supabase-js, from app code or the browser console)
const { data, error } = await supabase.auth.refreshSession()
# Verify Supabase keys match
echo $VITE_SUPABASE_PUBLISHABLE_KEY
cat .env.local | grep PUBLISHABLE
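If jwt-cli is not installed, you can read the expiry by decoding the token's payload segment yourself. The TypeScript sketch below assumes Node 16+ for the `base64url` Buffer encoding. It does not verify the signature, so it is only for inspecting `exp` and `sub` while debugging and never for authorization. The helper name `inspectJwt` is hypothetical.

```typescript
// Hypothetical debugging helper: decode a JWT payload and report expiry.
// It does NOT verify the signature.
interface JwtInfo {
  sub?: string;
  exp?: number; // seconds since the Unix epoch
  expired: boolean;
  secondsLeft: number | null; // null when the token has no exp claim
}

function inspectJwt(token: string, nowMs: number = Date.now()): JwtInfo {
  const parts = token.split(".");
  if (parts.length !== 3) {
    throw new Error(`Expected 3 JWT segments, got ${parts.length}`);
  }
  // The payload is the middle segment: base64url-encoded JSON
  const payload = JSON.parse(Buffer.from(parts[1], "base64url").toString("utf8"));
  const exp = typeof payload.exp === "number" ? payload.exp : undefined;
  const secondsLeft = exp === undefined ? null : Math.floor(exp - nowMs / 1000);
  return {
    sub: payload.sub,
    exp,
    expired: secondsLeft !== null && secondsLeft <= 0,
    secondsLeft,
  };
}
```

Pass it the access token from a failing request's `Authorization: Bearer` header. A negative or zero `secondsLeft` means the session needs the refresh shown above.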
RLS Policy Blocking Access
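-- Check current user (NULL in a plain psql session, since no JWT claims are set)
SELECT auth.uid();
-- Test RLS policies as a given user (SET LOCAL only lasts until the transaction ends)
BEGIN;
SET LOCAL role TO 'authenticated';
SET LOCAL request.jwt.claim.sub TO 'user-uuid';
SELECT auth.uid(); -- should now return user-uuid
SELECT * FROM organizations;
ROLLBACK;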
-- Debug specific policy
SELECT * FROM pg_policies WHERE tablename = 'organizations';
Database Issues
Migration Conflicts
# Fix duplicate timestamps
./scripts/fix-migration-names.sh
# Check migration status
supabase migration list
# Reset the local database and reapply all migrations (destroys local data)
supabase db reset --no-seed
Connection Pool Exhausted
-- Check connections by state (compare the total with SHOW max_connections;)
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;
-- Kill idle connections
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = 'idle'
AND state_change < NOW() - INTERVAL '5 minutes';
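The same selection rule can be written as a pure function. That lets you dry-run it against rows exported from `pg_stat_activity` (with `state_change` parsed into a `Date`) before terminating anything. The sketch below assumes rows with `pid`, `usename`, `state` and `state_change`. The `protectedUsers` list is a hypothetical safety net for roles you never want to kill; it is not a Supabase-documented set.

```typescript
// Row shape from: SELECT pid, usename, state, state_change FROM pg_stat_activity
interface ActivityRow {
  pid: number;
  usename: string | null;
  state: string | null; // 'active', 'idle', 'idle in transaction', ...
  state_change: Date | null; // when `state` last changed
}

// Return pids idle for longer than maxIdleMs, skipping our own backend
// and any (hypothetical) protected roles.
function selectIdleToTerminate(
  rows: ActivityRow[],
  opts: { now: Date; maxIdleMs: number; ownPid: number; protectedUsers?: string[] },
): number[] {
  const protectedUsers = new Set(opts.protectedUsers ?? []);
  return rows
    .filter((r) => r.state === "idle")
    .filter((r) => r.pid !== opts.ownPid)
    .filter((r) => r.usename === null || !protectedUsers.has(r.usename))
    .filter(
      (r) =>
        r.state_change !== null &&
        opts.now.getTime() - r.state_change.getTime() > opts.maxIdleMs,
    )
    .map((r) => r.pid);
}
```

Review the returned pids before passing them to `pg_terminate_backend`.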
Slow Queries
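-- Enable query logging (applies to new sessions; very verbose, so reset when done)
ALTER DATABASE postgres SET log_statement = 'all';
ALTER DATABASE postgres SET log_duration = on;
-- Disable again afterwards
ALTER DATABASE postgres RESET log_statement;
ALTER DATABASE postgres RESET log_duration;
-- Find slow queries (requires the pg_stat_statements extension)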
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Explain query plan
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM organizations WHERE owner_id = 'uuid';
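Sorting by `mean_exec_time` finds individually slow statements. A fast query called very often can still cost more in aggregate. `pg_stat_statements` also has a `total_exec_time` column (mean × calls) for that view. The sketch below ranks exported rows both ways so the two lists can be compared. The row shape mirrors the query above, and the function name is hypothetical.

```typescript
// Row shape from: SELECT query, mean_exec_time, calls FROM pg_stat_statements
interface StatRow {
  query: string;
  mean_exec_time: number; // milliseconds per call
  calls: number;
}

interface Ranked {
  slowestPerCall: string[]; // highest mean_exec_time first
  mostTotalTime: string[]; // highest mean_exec_time * calls first
}

function rankQueries(rows: StatRow[], limit = 10): Ranked {
  const total = (r: StatRow) => r.mean_exec_time * r.calls;
  // Copy before sorting so the caller's array is left untouched
  const byMean = [...rows].sort((a, b) => b.mean_exec_time - a.mean_exec_time);
  const byTotal = [...rows].sort((a, b) => total(b) - total(a));
  return {
    slowestPerCall: byMean.slice(0, limit).map((r) => r.query),
    mostTotalTime: byTotal.slice(0, limit).map((r) => r.query),
  };
}
```

In SQL, the second ranking is simply `ORDER BY total_exec_time DESC`.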
Stripe Integration Issues
Webhook Signature Verification Failed
# Verify webhook secret
echo $STRIPE_WEBHOOK_SECRET
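# Test webhook locally (stripe listen prints its own whsec_ signing secret; use that value locally, not the dashboard secret)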
stripe listen --forward-to localhost:54321/functions/v1/handle-stripe-webhook
# Check webhook logs in Stripe dashboard
# https://dashboard.stripe.com/test/webhooks
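Signature failures usually have one of two causes. Either the code verifies something other than the exact raw request body (for example, re-serialized JSON), or it uses the dashboard secret while `stripe listen` signs with its own. Stripe signs `<timestamp>.<raw body>` with HMAC-SHA256 and sends the timestamp as `t=` and the signature as `v1=` in the `Stripe-Signature` header.

The sketch below re-implements that check with `node:crypto` only to show what has to match. In the function itself, prefer the SDK's `stripe.webhooks.constructEvent` (or `constructEventAsync` under Deno). The 300-second tolerance mirrors the SDK default. Rejecting future timestamps is an extra assumption of this sketch.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Illustration of Stripe's signing scheme; production code should use the SDK.
// header looks like "t=<unix seconds>,v1=<hex>[,v1=<hex>...]"
function verifyStripeSignature(
  rawBody: string,
  header: string,
  secret: string,
  toleranceSec = 300,
  nowSec = Math.floor(Date.now() / 1000),
): boolean {
  let timestamp: string | undefined;
  const signatures: string[] = [];
  for (const part of header.split(",")) {
    const [key, value] = part.trim().split("=", 2);
    if (key === "t") timestamp = value;
    else if (key === "v1" && value) signatures.push(value);
  }
  if (!timestamp || signatures.length === 0) return false;

  // Reject stale (or far-future) events to limit replay attacks
  const ts = Number(timestamp);
  if (!Number.isFinite(ts) || Math.abs(nowSec - ts) > toleranceSec) return false;

  // The signed payload is "<timestamp>.<exact raw body>"
  const expected = createHmac("sha256", secret)
    .update(`${timestamp}.${rawBody}`, "utf8")
    .digest();

  return signatures.some((sig) => {
    const given = Buffer.from(sig, "hex");
    // timingSafeEqual throws on length mismatch, so compare lengths first
    return given.length === expected.length && timingSafeEqual(given, expected);
  });
}
```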
Subscription Creation Failed
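// Debug Stripe API calls
import Stripe from 'stripe'
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!, {
  apiVersion: '2024-11-20.acacia', // newer API versions carry a release-name suffix
  maxNetworkRetries: 2,
  telemetry: false,
})
// Enable debug logging for each request and response
stripe.on('request', (event) => {
  console.log('Stripe Request:', event.method, event.path)
})
stripe.on('response', (event) => {
  console.log('Stripe Response:', event.status, event.request_id, `${event.elapsed}ms`)
})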
Payment Method Attachment Failed
# Check customer exists
stripe customers retrieve cus_xxx
# List payment methods
stripe payment_methods list --customer cus_xxx
# Manually attach
stripe payment_methods attach pm_xxx --customer cus_xxx
Neon Database Issues
Tenant Database Connection Failed
Debug commands:
# Test connection string
psql "postgresql://user:pass@xxx.neon.tech/neondb?sslmode=require"
# Check provisioning status (run in psql)
SELECT * FROM organization_databases
WHERE organization_id = 'org-id';
# Verify Neon project exists
curl -H "Authorization: Bearer $NEON_API_KEY" \
https://console.neon.tech/api/v2/projects
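Before debugging further, check that the stored connection string is well-formed and requests TLS. The sketch below uses the WHATWG `URL` parser built into Node. The checks (a `*.neon.tech` host, an `sslmode` of `require`, `verify-ca` or `verify-full`) follow the example string above rather than a Neon-published validation rule. The helper name is hypothetical. It also masks the password so the result can be pasted into logs or issues.

```typescript
interface ConnCheck {
  ok: boolean;
  problems: string[];
  masked: string; // connection string with the password hidden
}

// Hypothetical pre-flight check for a tenant connection string.
function checkNeonConnectionString(conn: string): ConnCheck {
  let url: URL;
  try {
    url = new URL(conn);
  } catch {
    return { ok: false, problems: ["not a valid URL"], masked: "<unparseable>" };
  }
  const problems: string[] = [];
  if (url.protocol !== "postgresql:" && url.protocol !== "postgres:") {
    problems.push(`unexpected scheme ${url.protocol}`);
  }
  if (!url.hostname.endsWith(".neon.tech")) {
    problems.push(`host ${url.hostname} is not a *.neon.tech endpoint`);
  }
  const sslmode = url.searchParams.get("sslmode");
  if (!sslmode || !["require", "verify-ca", "verify-full"].includes(sslmode)) {
    problems.push("sslmode must be require, verify-ca or verify-full");
  }
  if (!url.username || !url.password) {
    problems.push("username or password is missing");
  }
  if (url.pathname.length <= 1) {
    problems.push("database name is missing");
  }
  // Mask the password on a copy so the original URL object is unchanged
  const masked = new URL(url.toString());
  if (masked.password) masked.password = "****";
  return { ok: problems.length === 0, problems, masked: masked.toString() };
}
```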
Performance Issues
High API Response Times
# Check Edge Function logs
supabase functions logs graphql-v2 --limit 100
# Monitor cold starts
grep -c "cold start" logs.txt
# Check bundle size
pnpm run analyze:bundle
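# Profile specific endpoints (connect time, time to first byte, total time)
curl -w "connect: %{time_connect}s  ttfb: %{time_starttransfer}s  total: %{time_total}s\n" \
-o /dev/null -s https://ocean.goldfish.io/api/graphql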
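A single curl timing is noisy, especially when cold starts are involved. Repeating the request and looking at the median and 95th percentile gives a steadier picture. The sketch below applies the nearest-rank percentile method to a list of `%{time_total}` samples in seconds. Collecting the samples, for example by running the curl command in a loop, is left to the caller.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length); // 1-based rank
  return sorted[rank - 1];
}

function summarizeLatency(samples: number[]) {
  return {
    count: samples.length,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    max: Math.max(...samples),
  };
}
```

A p95 far above the p50 usually points at intermittent cold starts rather than uniformly slow code.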
Frontend Performance Degradation
# Run Lighthouse audit
npx lighthouse https://ocean.goldfish.io
# Check bundle size regression
pnpm run perf:check
# Analyze specific chunks
npx source-map-explorer dist/assets/*.js
Deployment Procedures
Standard Production Deploy
Pre-deployment checklist:
- All tests passing locally
- TypeScript compilation successful
- No ESLint errors
- Bundle size within limits
- Migrations tested locally
Emergency Hotfix Deploy
# 1. Create hotfix branch
git checkout -b hotfix/critical-issue
# 2. Make fix and test
# ... make changes ...
pnpm run validate
pnpm test
# 3. Push the branch, then open a PR and merge it with admin override
git push origin hotfix/critical-issue
# 4. Monitor deployment
pnpm run health:check
Database Migration Deploy
# 1. Test migration locally
supabase migration new fix_issue
# ... write migration ...
supabase db reset --no-seed
# 2. Review migration
cat supabase/migrations/xxx_fix_issue.sql
# 3. Deploy to staging first
git push origin feature/migration
# 4. Apply to production (link the project, preview with --dry-run, then push)
supabase link --project-ref prod-ref
supabase db push --dry-run
supabase db push
Rollback Procedures
Application Rollback
# Interactive rollback with safety checks
pnpm run rollback:prod
# Manual rollback via Vercel
vercel ls ocean-platform --prod
vercel promote [deployment-id]
# Verify rollback
curl https://ocean.goldfish.io/health
Database Rollback
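-- Wrap risky changes in a transaction; a savepoint only lasts until the
-- transaction ends and is not a persistent restore point
BEGIN;
SAVEPOINT before_changes;
-- Make changes
ALTER TABLE organizations ADD COLUMN risky_field TEXT;
-- If issues, undo everything since the savepoint (the transaction stays open)
ROLLBACK TO SAVEPOINT before_changes;
-- Either way, end the transaction; COMMIT keeps only changes not rolled back
COMMIT;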
Stripe Configuration Rollback
# Cannot rollback Stripe changes directly
# Must manually revert via dashboard or API
# Revert subscription price (pass the existing item ID, otherwise Stripe adds a second item)
stripe subscriptions update sub_xxx \
-d "items[0][id]=si_xxx" \
-d "items[0][price]=old_price_id" \
-d proration_behavior=none
Monitoring & Alerts
Health Check Endpoints
# Frontend health
curl https://ocean.goldfish.io/health
# API health
curl https://ocean.goldfish.io/api/health
# GraphQL health
curl -X POST https://ocean.goldfish.io/api/graphql \
-H "Content-Type: application/json" \
-d '{"query": "{ system_status { healthy } }"}'
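After a deploy or rollback, a single curl can hit a cold start or a stale instance. The sketch below polls a health endpoint until it returns a 2xx status or a deadline passes. It assumes Node 18+ for the global `fetch`. The fetch function is injectable so the sketch can be tested without network access. The interval and timeout defaults are illustrative, not project settings.

```typescript
type Fetcher = (url: string) => Promise<{ ok: boolean; status: number }>;

interface PollResult {
  healthy: boolean;
  attempts: number;
  lastStatus: number | null; // null if every attempt threw (DNS, connection refused, ...)
}

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `url` until it answers 2xx or `timeoutMs` elapses.
async function waitForHealthy(
  url: string,
  { intervalMs = 5_000, timeoutMs = 120_000, fetcher = fetch as Fetcher } = {},
): Promise<PollResult> {
  const deadline = Date.now() + timeoutMs;
  let attempts = 0;
  let lastStatus: number | null = null;
  while (true) {
    attempts++;
    try {
      const res = await fetcher(url);
      lastStatus = res.status;
      if (res.ok) return { healthy: true, attempts, lastStatus };
    } catch {
      // Network errors count as an unhealthy attempt; keep polling
    }
    // Stop if waiting another interval would overshoot the deadline
    if (Date.now() + intervalMs > deadline) {
      return { healthy: false, attempts, lastStatus };
    }
    await sleep(intervalMs);
  }
}
```

For example, `waitForHealthy("https://ocean.goldfish.io/health")` resolves with `healthy: false` if the endpoint has not recovered within two minutes.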
Log Locations
Quick log access:
# Vercel logs
vercel logs --prod --follow
# Supabase Edge Function logs
supabase functions logs graphql-v2 --project-ref prod-ref
# Local database logs
docker logs supabase_db_ocean --follow
Alert Response Procedures
High Error Rate Alert
- Check Sentry for error spike details
- Identify affected endpoints/components
- Check recent deployments
- Rollback if deployment-related
- Apply hotfix if code issue
- Scale resources if load-related
Database Connection Alert
- Check Supabase status page
- Verify connection pool settings
- Check for long-running queries
- Kill idle connections
- Restart connection pool if needed
- Contact Supabase support if persistent
Payment Processing Alert
- Check Stripe dashboard for failures
- Verify webhook processing
- Check payment method issues
- Review recent Stripe changes
- Contact Stripe support if needed
Security Incident Response
Suspected API Key Leak
# 1. Rotate immediately
# Supabase: Dashboard > Settings > API
# Stripe: Dashboard > Developers > API keys
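# 2. Update environment variables (env add does not overwrite an existing value, so remove it first)
vercel env rm SUPABASE_SERVICE_ROLE_KEY production
vercel env add SUPABASE_SERVICE_ROLE_KEY production
# New values only apply to new deployments: redeploy production, then refresh the local env
vercel env pull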
# 3. Audit usage
# Check Supabase logs for unusual activity
# Check Stripe logs for unexpected charges
# 4. Update monitoring
pnpm run security:check
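When auditing where a leaked key may have ended up (logs, committed files, pasted snippets), a simple pattern scan helps. The sketch below flags strings shaped like secret keys.

- The `sk_`/`rk_` (live/test) and `whsec_` prefixes are Stripe's key formats.
- `sb_secret_` is the prefix of Supabase's newer secret API keys.
- Legacy JWT-based service-role keys are not matched.
- The minimum-length bound is a heuristic assumption.

The helper is hypothetical and unrelated to the project's own scripts.

```typescript
// Hypothetical scanner for secret-shaped strings. Lengths are a loose heuristic.
const SECRET_PATTERNS: Record<string, RegExp> = {
  stripeSecretKey: /\b[sr]k_(live|test)_[A-Za-z0-9]{10,}\b/g,
  stripeWebhookSecret: /\bwhsec_[A-Za-z0-9]{10,}\b/g,
  supabaseSecretKey: /\bsb_secret_[A-Za-z0-9_-]{10,}/g,
};

interface Finding {
  kind: string;
  line: number; // 1-based line number in the scanned text
  preview: string; // first characters only, so the report does not re-leak the key
}

function scanForSecrets(text: string): Finding[] {
  const findings: Finding[] = [];
  text.split("\n").forEach((lineText, i) => {
    for (const [kind, pattern] of Object.entries(SECRET_PATTERNS)) {
      // matchAll requires the /g flag and iterates every match on the line
      for (const match of lineText.matchAll(pattern)) {
        findings.push({ kind, line: i + 1, preview: `${match[0].slice(0, 12)}…` });
      }
    }
  });
  return findings;
}
```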
SQL Injection Attempt
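-- Check for suspicious queries. pg_stat_statements replaces literal values
-- with $1, $2, ..., so only injected SQL structure (e.g. UNION, pg_sleep) shows up
SELECT query, calls
FROM pg_stat_statements
WHERE query ILIKE '%UNION%'
OR query ILIKE '%pg_sleep%'
ORDER BY calls DESC;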
-- Review RLS policies
SELECT * FROM pg_policies;
-- Ensure parameterized queries everywhere
-- Never use string concatenation for queries
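The point of parameterization is that values travel separately from the SQL text, so they can never change the statement's structure. The sketch below is a tiny tagged-template helper. It turns interpolations into `$1, $2, ...` placeholders and returns the `{ text, values }` shape that node-postgres' `client.query` accepts. The `sql` helper itself is hypothetical, and identifiers such as table or column names cannot be parameterized this way.

```typescript
interface QueryConfig {
  text: string;
  values: unknown[];
}

// sql`SELECT * FROM organizations WHERE owner_id = ${id}`
// becomes { text: "SELECT * FROM organizations WHERE owner_id = $1", values: [id] }
function sql(strings: TemplateStringsArray, ...values: unknown[]): QueryConfig {
  let text = strings[0];
  for (let i = 0; i < values.length; i++) {
    // Emit a positional placeholder instead of the value itself
    text += `$${i + 1}` + strings[i + 1];
  }
  return { text, values };
}
```

With node-postgres this becomes `await client.query(sql\`... ${value}\`)`. A hostile value such as `x' OR '1'='1` is then compared as a plain string.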
DDoS Attack
Debugging Tools
Database Inspection
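# Connect to production, then make the session read-only
psql "$PRODUCTION_DATABASE_URL"
SET default_transaction_read_only = on;
-- Useful queries (psql comments go on their own line)
-- List tables
\dt
-- Table details
\d+ organizations
-- List functions
\df
-- Check indexes
SELECT * FROM pg_indexes WHERE schemaname = 'public';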
API Testing
# GraphQL introspection
curl -X POST http://localhost:54321/functions/v1/graphql-v2 \
-H "Content-Type: application/json" \
-d '{"query": "{ __schema { types { name } } }"}'
# Test with authentication
curl -X POST http://localhost:54321/functions/v1/graphql-v2 \
-H "Authorization: Bearer $JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"query": "{ my_organizations { id name } }"}'
Performance Profiling
# Bundle analysis
pnpm run analyze:bundle
-- Database query analysis (in psql)
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
// Edge Function timing (in function code)
console.time('operation')
// ... code ...
console.timeEnd('operation')
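`console.time` only prints to the log. When the duration needs to go into a structured log line or be compared against a budget, a small wrapper around `performance.now()` can return it instead; `performance.now()` is available globally in Node 16+ and Deno. The sketch below shows such a wrapper. The `timed` name and the JSON log format are hypothetical.

```typescript
// Run an async operation and return its result together with the elapsed time.
async function timed<T>(
  label: string,
  fn: () => Promise<T>,
): Promise<{ result: T; ms: number }> {
  const start = performance.now();
  try {
    const result = await fn();
    return { result, ms: performance.now() - start };
  } finally {
    // Logged even if fn throws, so failures still show their duration
    console.log(JSON.stringify({ label, ms: Math.round(performance.now() - start) }));
  }
}
```

Usage: `const { result, ms } = await timed("load organizations", () => loadOrganizations())`, where `loadOrganizations` stands in for any async call.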
Disaster Recovery
Full Database Corruption
- Stop all writes immediately
- Back up the current state (even if corrupted)
- Restore from Supabase backups:
  # Contact Supabase support for point-in-time restore
  # Backups available: 7 days (Pro plan)
- Verify data integrity after restore
- Run the test suite before reopening
Complete Service Outage
- Update status page immediately
- Check all dependencies:
- Vercel status
- Supabase status
- Stripe status
- Neon status
- Implement read-only mode if partial service possible
- Communicate with users via email/social
- Post-mortem after resolution
Critical Data Loss
- Identify scope of data loss
- Check all backup sources:
- Supabase backups
- Neon backups
- Stripe data (source of truth for billing)
- Restore from most recent backup
- Reconcile any gaps manually
- Implement additional backup strategy
Contact Information
Service Providers
- Supabase Support: supabase.com/support
- Vercel Support: vercel.com/support
- Stripe Support: support.stripe.com
- Neon Support: neon.tech/support
Internal Escalation
Since this is a two-person operation:
- Check documented solutions first
- Search error in GitHub issues
- Check service provider status pages
- Contact service provider support
- Post in relevant Discord/Slack communities