Sentry Alert Configuration for Ocean
This document contains the recommended Sentry alert rules for Ocean, including pg_net monitoring and Stripe payment alerts.
Prerequisites
- Sentry Slack integration must be configured
- Slack channels created:
  - #alerts-critical - Immediate action required
  - #alerts-errors - Application errors
  - #alerts-performance - Performance issues
  - #alerts-payments - Payment/billing issues
Alert Rules to Configure
1. 🚨 pg_net Critical Backlog
Purpose: Alert when the pg_net queue exceeds safe thresholds (this rule would have caught the earlier 664k backlog incident). A sketch of how matching events can be emitted follows this rule.
Type: Issue Alert
Name: '🚨 pg_net Critical Backlog'
Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's message contains "pg_net backlog"
- The event's tags match pg_net_status equals "critical"
THEN perform these actions:
- Send a Slack notification to #alerts-critical
- Create a Jira ticket (if configured)
Rate limiting:
- At most once every 5 minutes
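For this rule (and the warning rule below) to fire, something has to send Sentry an event whose message contains "pg_net backlog" and that carries a pg_net_status tag. A minimal sketch of a scheduled reporter follows; it assumes a supabase-js v2 client, the monitoring.pg_net_health table used later in this doc (with a hypothetical checked_at column), and illustrative thresholds of 5,000/10,000 queued requests.
// Scheduled check (cron job or edge function) that reports pg_net backlog to Sentry.
// Sketch only: assumes supabase-js v2 with the monitoring schema exposed to the API.
const { data, error } = await supabase
  .schema('monitoring')
  .from('pg_net_health')
  .select('queue_size, status')
  .order('checked_at', { ascending: false }) // checked_at is a hypothetical timestamp column
  .limit(1)
  .single()

if (!error && data.queue_size > 5000) {
  const status = data.queue_size > 10000 ? 'critical' : 'warning'
  Sentry.captureMessage(`pg_net backlog: ${data.queue_size} queued requests`, {
    level: status === 'critical' ? 'error' : 'warning',
    tags: { pg_net_status: status }, // matched by the critical and warning rules
  })
}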
2. ⚠️ pg_net Warning
Type: Issue Alert
Name: '⚠️ pg_net Queue Warning'
Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's message contains "pg_net backlog"
- The event's tags match pg_net_status equals "warning"
THEN:
- Send a Slack notification to #alerts-performance
Rate limiting:
- At most once every 15 minutes
3. 💳 Stripe Payment Failed
Purpose: Critical alert for payment processing failures
Type: Issue Alert
Name: '💳 Payment Processing Failed'
Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's level is "error"
- Any of these conditions:
  - The event's message contains "stripe"
  - The event's message contains "payment failed"
  - The event's tags match payment_error equals "true"
- The environment is "production"
THEN:
- Send a Slack notification to #alerts-payments
- Send a Slack notification to #alerts-critical
- Page on-call engineer (if PagerDuty configured)
Rate limiting:
- At most once every 1 minute (payments are critical)
4. 🔄 Stripe Webhook Failures
Type: Issue Alert
Name: '🔄 Stripe Webhook Processing Failed'
Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's message contains "webhook"
- The event's message contains "stripe"
- The event's level is "error"
THEN:
- Send a Slack notification to #alerts-payments
Rate limiting:
- At most once every 10 minutes
5. 📊 High Payment Error Rate
Type: Metric Alert
Name: '📊 High Payment Error Rate'
Alert when:
- error_rate(payment.failed)
- is above 5%
- for 10 minutes
Filter:
- Environment equals "production"
- Transaction contains "payment" OR "stripe"
THEN:
- Send a Slack notification to #alerts-payments
- Send a Slack notification to #alerts-critical
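The transaction filter above only matches if payment flows produce transactions with "payment" or "stripe" in their names. A rough sketch of naming such a transaction is below, assuming the SDK v7 tracing API and a hypothetical processPayment() helper; adjust for your SDK version and instrumentation.
// Give payment flows a transaction name the metric alert's filter can match.
// Sketch only: startTransaction is the v7 tracing API; processPayment() is a placeholder.
const transaction = Sentry.startTransaction({ name: 'payment.process', op: 'payment' })
try {
  await processPayment(order)
} finally {
  transaction.finish()
}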
6. 🚫 Subscription Cancellation Spike
Type: Metric Alert
Name: '🚫 Subscription Cancellation Spike'
Alert when:
- count() of events matching "subscription_cancelled"
- is above 5
- in 1 hour
- compared to same time 1 week ago, increased by 200%
THEN:
- Send a Slack notification to #alerts-payments
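This alert counts Sentry events whose message matches "subscription_cancelled", so the app has to emit one per cancellation. A minimal sketch from a Stripe webhook handler (event parsing and signature verification omitted; tag names are illustrative):
// In the Stripe webhook handler, after signature verification.
// Emits the "subscription_cancelled" events this metric alert counts.
if (stripeEvent.type === 'customer.subscription.deleted') {
  const subscription = stripeEvent.data.object
  Sentry.captureMessage('subscription_cancelled', {
    level: 'info',
    tags: {
      customer_id: subscription.customer,
      plan: subscription.items.data[0].price.id,
    },
  })
}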
7. 🐌 Slow Database Queries
Type: Issue Alert
Name: '🐌 Slow Database Query Detected'
Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's tags match db_slow_query equals "true"
- The event's measurements.duration is greater than 5000ms
THEN:
- Send a Slack notification to #alerts-performance
Rate limiting:
- At most once every 10 minutes per unique query
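This rule only has data if slow queries are actually reported with the db_slow_query tag. A rough sketch of a timing wrapper is below; runQuery() is a hypothetical helper, the 5,000 ms constant mirrors the rule, and the measurements.duration condition assumes the duration is also recorded on the active transaction (shown here with the v7 tracing API).
// Wrap database calls and report anything slower than the alert threshold.
// Sketch only: runQuery() is a hypothetical helper around your database client.
const SLOW_QUERY_MS = 5000

async function timedQuery(name, runQuery) {
  const start = Date.now()
  try {
    return await runQuery()
  } finally {
    const duration = Date.now() - start
    if (duration > SLOW_QUERY_MS) {
      // If tracing is enabled (v7 API), record the duration so measurements.duration is populated
      Sentry.getCurrentHub().getScope().getTransaction()?.setMeasurement('duration', duration, 'millisecond')
      Sentry.captureMessage(`Slow database query: ${name} took ${duration}ms`, {
        level: 'warning',
        tags: { db_slow_query: 'true' }, // matched by the rule above
        extra: { query_name: name, duration_ms: duration },
      })
    }
  }
}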
8. 🔐 Authentication Failures Spike
Type: Metric Alert
Name: '🔐 Authentication Failures Spike'
Alert when:
- count() of events matching "Auth failed"
- is above 20
- in 10 minutes
THEN:
- Send a Slack notification to #alerts-critical
- Trigger the security incident response process
9. 💰 Revenue Impact Alert
Type: Issue Alert
Name: '💰 Revenue Impacting Error'
Conditions:
WHEN all of these conditions are met:
- An event is seen
- The event's level is "error" or "fatal"
- Any of these conditions:
  - The event's tags match revenue_impact equals "true"
  - The event's message contains "billing"
  - The event's message contains "subscription"
  - The event's message contains "checkout"
THEN:
- Send a Slack notification to #alerts-critical
- Send a Slack notification to #alerts-payments
Rate limiting:
- No rate limiting (revenue is critical)
10. 🏥 Service Health Check Failed
Type: Cron Monitor
Name: '🏥 Service Health Check'
Monitor slug: ocean-health-check
Schedule: Every 5 minutes
If check-in is missed:
- Send a Slack notification to #alerts-critical
- Page on-call
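The monitor only knows a run was missed if the service sends check-ins. A minimal sketch of the reporting side, assuming the @sentry/node captureCheckIn API and a hypothetical runHealthCheck() probe:
// Report check-ins for the ocean-health-check monitor on each scheduled run.
// Sketch only: runHealthCheck() is a placeholder for your actual health probe.
const checkInId = Sentry.captureCheckIn({
  monitorSlug: 'ocean-health-check',
  status: 'in_progress',
})

try {
  await runHealthCheck()
  Sentry.captureCheckIn({ checkInId, monitorSlug: 'ocean-health-check', status: 'ok' })
} catch (err) {
  Sentry.captureCheckIn({ checkInId, monitorSlug: 'ocean-health-check', status: 'error' })
  throw err
}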
Stripe Best Practices for Sentry
1. Tag Payment Events Properly
// In your payment processing code
try {
const paymentIntent = await stripe.paymentIntents.create({...})
} catch (error) {
Sentry.captureException(error, {
tags: {
payment_error: true,
payment_method: paymentMethod,
amount: amount,
currency: currency,
customer_id: customerId,
},
contexts: {
stripe: {
payment_intent_id: error.payment_intent?.id, // set by Stripe on card errors; otherwise undefined
customer_id: customerId,
amount: amount,
error_code: error.code,
error_type: error.type,
}
},
level: 'error',
fingerprint: ['stripe', error.code], // Group by error code
})
}
2. Track Webhook Events
// In your webhook handler
Sentry.addBreadcrumb({
category: 'stripe.webhook',
message: `Processing ${event.type}`,
level: 'info',
data: {
event_id: event.id,
event_type: event.type,
livemode: event.livemode,
},
})
// On webhook failure ("Stripe" + "webhook" in the message so the webhook alert rule matches)
Sentry.captureException(new Error(`Stripe webhook processing failed: ${event.type}`), {
tags: {
webhook_error: true,
webhook_type: event.type,
stripe_event_id: event.id,
},
level: 'error',
})
3. Monitor Subscription Lifecycle
// Track subscription changes
Sentry.addBreadcrumb({
category: 'subscription',
message: `Subscription ${status}`,
level: 'info',
data: {
subscription_id: subscription.id,
customer_id: subscription.customer,
status: subscription.status,
plan: subscription.items.data[0].price.id,
},
})
Testing Your Alerts
Test pg_net Alert
-- Run in Supabase SQL editor
INSERT INTO monitoring.pg_net_health (
queue_size, response_size, status
) VALUES (
15000, 60000, 'critical'
);
Test Stripe Alert
// Add to your app temporarily
Sentry.captureException(new Error('Test Stripe payment failed'), {
tags: {
payment_error: true,
test: true,
},
level: 'error',
})
Alert Response Playbook
When pg_net Critical Alert Fires
- Check the Supabase dashboard for the current queue size
- Run cleanup if needed:
  SELECT monitoring.cleanup_pg_net_safely(1, 10000);
- Identify the root cause (usually a failing webhook)
- Fix or disable the problematic webhook
When Payment Alert Fires
- Check Stripe dashboard for failures
- Verify webhook endpoints are responding
- Check for API key issues
- Review recent deployments
- Contact Stripe support if needed
Monitoring Dashboard
Create a bookmark in each Slack channel linking to:
- Sentry Issues: https://sentry.io/organizations/YOUR_ORG/issues/?project=YOUR_PROJECT
- Supabase Logs: https://app.supabase.com/project/YOUR_PROJECT/logs/explorer
- Stripe Dashboard: https://dashboard.stripe.com/events
Rate Limiting Guidelines
- Critical/Revenue: No rate limiting or 1 minute max
- Errors: 5-10 minute rate limits
- Warnings: 15-30 minute rate limits
- Info: 30-60 minute rate limits
Remember: It's better to get alerted and investigate than to miss a critical issue!