Source: ocean/docs/sentry-alert-configuration.md

Sentry Alert Configuration for Ocean

This document contains the recommended Sentry alert rules for Ocean, including pg_net monitoring and Stripe payment alerts.

Prerequisites

  1. Sentry Slack integration must be configured
  2. Slack channels created:
    • #alerts-critical - Immediate action required
    • #alerts-errors - Application errors
    • #alerts-performance - Performance issues
    • #alerts-payments - Payment/billing issues

Alert Rules to Configure

1. 🚨 pg_net Critical Backlog

Purpose: Alert when the pg_net queue exceeds safe thresholds (this rule would have caught the earlier 664k backlog incident)

Type: Issue Alert
Name: '🚨 pg_net Critical Backlog'

Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's message contains "pg_net backlog"
- The event's tags match pg_net_status equals "critical"

THEN perform these actions:
- Send a Slack notification to #alerts-critical
- Create a Jira ticket (if configured)

Rate limiting:
- At most once every 5 minutes
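
For the conditions in rules 1 and 2 to match, the monitoring job has to send Sentry an event whose message contains "pg_net backlog" and whose pg_net_status tag is set. A minimal sketch of such a check is below; the queue table name, thresholds, and environment variables are assumptions to adapt to your existing monitoring setup:

// Scheduled job that reports pg_net backlog to Sentry (thresholds and table name are assumptions)
const { Client } = require('pg')
const Sentry = require('@sentry/node')

Sentry.init({ dsn: process.env.SENTRY_DSN })

async function checkPgNetBacklog() {
  const client = new Client({ connectionString: process.env.DATABASE_URL })
  await client.connect()
  try {
    // net.http_request_queue holds pg_net's pending requests (name may vary by pg_net version)
    const { rows } = await client.query(
      'SELECT count(*)::int AS queue_size FROM net.http_request_queue'
    )
    const queueSize = rows[0].queue_size
    // Assumed thresholds: warning above 1,000 queued requests, critical above 10,000
    const status = queueSize > 10000 ? 'critical' : queueSize > 1000 ? 'warning' : 'ok'

    if (status !== 'ok') {
      // The message and tag below are exactly what alert rules 1 and 2 filter on
      Sentry.captureMessage(`pg_net backlog: ${queueSize} queued requests`, {
        level: status === 'critical' ? 'error' : 'warning',
        tags: { pg_net_status: status },
        extra: { queue_size: queueSize },
      })
    }
  } finally {
    await client.end()
  }
}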

2. ⚠️ pg_net Warning

Type: Issue Alert
Name: '⚠️ pg_net Queue Warning'

Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's message contains "pg_net backlog"
- The event's tags match pg_net_status equals "warning"

THEN:
- Send a Slack notification to #alerts-performance

Rate limiting:
- At most once every 15 minutes

3. 💳 Stripe Payment Failed

Purpose: Critical alert for payment processing failures

Type: Issue Alert
Name: '💳 Payment Processing Failed'

Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's level is "error"
- Any of these conditions is met:
  - The event's message contains "stripe"
  - The event's message contains "payment failed"
  - The event's tags match payment_error equals "true"
- The environment is "production"

THEN:
- Send a Slack notification to #alerts-payments
- Send a Slack notification to #alerts-critical
- Page on-call engineer (if PagerDuty configured)

Rate limiting:
- At most once every 1 minute (payments are critical)

4. 🔄 Stripe Webhook Failures

Type: Issue Alert
Name: '🔄 Stripe Webhook Processing Failed'

Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's message contains "webhook"
- The event's message contains "stripe"
- The event's level is "error"

THEN:
- Send a Slack notification to #alerts-payments

Rate limiting:
- At most once every 10 minutes

5. 📊 High Payment Error Rate

Type: Metric Alert
Name: '📊 High Payment Error Rate'

Alert when:
- error_rate(payment.failed)
- is above 5%
- for 10 minutes

Filter:
- Environment equals "production"
- Transaction contains "payment" OR "stripe"

THEN:
- Send a Slack notification to #alerts-payments
- Send a Slack notification to #alerts-critical

6. 🚫 Subscription Cancellation Spike

Type: Metric Alert
Name: '🚫 Subscription Cancellation Spike'

Alert when:
- count() of events matching "subscription_cancelled"
- is above 5
- in 1 hour
- compared to same time 1 week ago, increased by 200%

THEN:
- Send a Slack notification to #alerts-payments
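
The count() in this rule only has data to aggregate if the application records an event whenever a subscription is cancelled. A minimal sketch, assuming a cancellation handler (the function name is illustrative; the subscription fields follow Stripe's subscription object):

const Sentry = require('@sentry/node')

// Illustrative cancellation handler called when a subscription is cancelled
function onSubscriptionCancelled(subscription) {
  // The message matches the "subscription_cancelled" filter used by rule 6
  Sentry.captureMessage('subscription_cancelled', {
    level: 'info',
    tags: {
      subscription_id: subscription.id,
      customer_id: subscription.customer,
    },
  })
}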

7. 🐌 Slow Database Queries

Type: Issue Alert
Name: '🐌 Slow Database Query Detected'

Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's tags match db_slow_query equals "true"
- The event's measurements.duration is greater than 5000ms

THEN:
- Send a Slack notification to #alerts-performance

Rate limiting:
- At most once every 10 minutes per unique query
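
The db_slow_query tag has to be set by whatever wraps your database calls. Below is a minimal sketch of a timing wrapper; it reports the duration in the message and as extra data rather than as a transaction measurement, so adjust either the rule or the wrapper to match how you already instrument queries. The wrapper name and db.query call are assumptions; the 5000ms threshold mirrors the rule:

const Sentry = require('@sentry/node')

// Illustrative wrapper around a query client; db.query and the threshold are assumptions
async function timedQuery(db, sql, params) {
  const started = Date.now()
  try {
    return await db.query(sql, params)
  } finally {
    const durationMs = Date.now() - started
    if (durationMs > 5000) {
      // Tag matches alert rule 7; the duration is attached as extra data
      Sentry.captureMessage(`Slow database query (${durationMs}ms)`, {
        level: 'warning',
        tags: { db_slow_query: 'true' },
        extra: { duration_ms: durationMs, sql },
      })
    }
  }
}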

8. 🔐 Authentication Failures Spike

Type: Metric Alert
Name: '🔐 Authentication Failures Spike'

Alert when:
- count() of events matching "Auth failed"
- is above 20
- in 10 minutes

THEN:
- Send a Slack notification to #alerts-critical
- Trigger the security incident response process
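
For the count() filter to work, failed logins need to reach Sentry with a message containing "Auth failed". A minimal sketch, assuming a helper called from your login path (the name and tag are illustrative; keep emails and other PII out of the message):

const Sentry = require('@sentry/node')

// Illustrative helper called wherever a login attempt is rejected
function reportAuthFailure(reason) {
  // Message matches rule 8's "Auth failed" filter; user identifiers stay out of the message
  Sentry.captureMessage('Auth failed', {
    level: 'warning',
    tags: { auth_failure_reason: reason },
  })
}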

9. 💰 Revenue Impact Alert

Type: Issue Alert
Name: '💰 Revenue Impacting Error'

Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's level is "error" or "fatal"
- Any of these conditions is met:
  - The event's tags match revenue_impact equals "true"
  - The event's message contains "billing"
  - The event's message contains "subscription"
  - The event's message contains "checkout"

THEN:
- Send a Slack notification to #alerts-critical
- Send a Slack notification to #alerts-payments

Rate limiting:
- No rate limiting (revenue is critical)
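
The revenue_impact tag is not set automatically; billing and checkout error handlers need to add it. A minimal sketch, assuming a completeCheckout function (both function names are illustrative):

const Sentry = require('@sentry/node')

// Illustrative checkout handler that marks failures as revenue-impacting
async function handleCheckout(cart) {
  try {
    return await completeCheckout(cart) // placeholder for your checkout logic
  } catch (error) {
    Sentry.captureException(error, {
      level: 'error',
      tags: { revenue_impact: 'true' },
    })
    throw error
  }
}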

10. 🏥 Service Health Check Failed

Type: Cron Monitor
Name: '🏥 Service Health Check'

Monitor slug: ocean-health-check
Schedule: Every 5 minutes

If check-in is missed:
- Send a Slack notification to #alerts-critical
- Page on-call
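
If the health check runs as a Node job, recent versions of @sentry/node expose a captureCheckIn helper for cron monitors. A minimal sketch is below; the HEALTH_CHECK_URL endpoint is an assumption:

const Sentry = require('@sentry/node')

// Sketch of a job run every 5 minutes that checks in against the ocean-health-check monitor
async function healthCheckJob() {
  const checkInId = Sentry.captureCheckIn({
    monitorSlug: 'ocean-health-check',
    status: 'in_progress',
  })
  try {
    const res = await fetch(process.env.HEALTH_CHECK_URL) // assumed health endpoint
    if (!res.ok) throw new Error(`Health check returned ${res.status}`)
    Sentry.captureCheckIn({ checkInId, monitorSlug: 'ocean-health-check', status: 'ok' })
  } catch (error) {
    Sentry.captureCheckIn({ checkInId, monitorSlug: 'ocean-health-check', status: 'error' })
    throw error
  }
}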

Stripe Best Practices for Sentry

1. Tag Payment Events Properly

// In your payment processing code
try {
  const paymentIntent = await stripe.paymentIntents.create({...})
} catch (error) {
  Sentry.captureException(error, {
    tags: {
      payment_error: true,
      payment_method: paymentMethod,
      amount: amount,
      currency: currency,
      customer_id: customerId,
    },
    contexts: {
      stripe: {
        payment_intent_id: paymentIntentId,
        customer_id: customerId,
        amount: amount,
        error_code: error.code,
        error_type: error.type,
      }
    },
    level: 'error',
    fingerprint: ['stripe', error.code], // Group by error code
  })
}

2. Track Webhook Events

// In your webhook handler
Sentry.addBreadcrumb({
  category: 'stripe.webhook',
  message: `Processing ${event.type}`,
  level: 'info',
  data: {
    event_id: event.id,
    event_type: event.type,
    livemode: event.livemode,
  },
})

// On webhook failure
Sentry.captureException(new Error(`Webhook processing failed: ${event.type}`), {
  tags: {
    webhook_error: true,
    webhook_type: event.type,
    stripe_event_id: event.id,
  },
  level: 'error',
})

3. Monitor Subscription Lifecycle

// Track subscription changes
Sentry.addBreadcrumb({
  category: 'subscription',
  message: `Subscription ${status}`,
  level: 'info',
  data: {
    subscription_id: subscription.id,
    customer_id: subscription.customer,
    status: subscription.status,
    plan: subscription.items.data[0].price.id,
  },
})

Testing Your Alerts

Test pg_net Alert

-- Run in Supabase SQL editor
INSERT INTO monitoring.pg_net_health (
  queue_size, response_size, status
) VALUES (
  15000, 60000, 'critical'
);

Test Stripe Alert

// Add to your app temporarily
Sentry.captureException(new Error('Test Stripe payment failed'), {
  tags: {
    payment_error: true,
    test: true,
  },
  level: 'error',
})

Alert Response Playbook

When pg_net Critical Alert Fires

  1. Check the Supabase dashboard for queue size
  2. Run cleanup if needed:

     SELECT monitoring.cleanup_pg_net_safely(1, 10000);

  3. Identify the root cause (usually a failing webhook)
  4. Fix or disable the problematic webhook

When Payment Alert Fires

  1. Check Stripe dashboard for failures
  2. Verify webhook endpoints are responding
  3. Check for API key issues
  4. Review recent deployments
  5. Contact Stripe support if needed

Monitoring Dashboard

Create a bookmark in each Slack channel linking to:

  • Sentry Issues: https://sentry.io/organizations/YOUR_ORG/issues/?project=YOUR_PROJECT
  • Supabase Logs: https://app.supabase.com/project/YOUR_PROJECT/logs/explorer
  • Stripe Dashboard: https://dashboard.stripe.com/events

Rate Limiting Guidelines

  • Critical/Revenue: No rate limiting or 1 minute max
  • Errors: 5-10 minute rate limits
  • Warnings: 15-30 minute rate limits
  • Info: 30-60 minute rate limits

Remember: It's better to get alerted and investigate than to miss a critical issue!