Source: ocean/docs/adr/0037-comprehensive-sentry-integration.md
ADR-003: Comprehensive Sentry Integration for Edge Functions
Status
Accepted
Date
2025-01-31
Context
We had limited error visibility and debugging capability across our Edge Functions. Sentry was technically integrated but had reported 0 issues, which pointed to a gap between our monitoring setup and actual error detection rather than an absence of errors. This lack of observability posed risks to:
- Customer Experience: Errors going undetected until customer reports
- Business Operations: Payment, authentication, and database failures not being tracked
- Development Velocity: Debugging issues without proper error context
- Security Monitoring: Failed authentication attempts and security violations not being captured
- Performance Optimization: No visibility into slow operations or bottlenecks
The existing setup had Sentry configured but lacked:
- Distributed tracing between frontend and backend
- Comprehensive error context (user, organization, operation details)
- Security event monitoring
- Business event tracking
- Performance monitoring for critical operations
Decision
We decided to implement comprehensive Sentry integration across all critical Edge Functions with the following approach:
1. Distributed Tracing Architecture
- Implement end-to-end tracing from frontend through GraphQL to database operations
- Use Sentry v9 APIs (startSpan, getCurrentScope, getActiveSpan)
- Connect frontend errors to backend operations with trace propagation
- Track operation performance and identify bottlenecks
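As a rough illustration of what trace propagation carries between frontend and backend, the sketch below parses and serializes a `sentry-trace` header value (trace ID, parent span ID, and an optional sampling flag). The helper names `parseSentryTrace` and `formatSentryTrace` are hypothetical and not part of the Sentry SDK or this codebase; in practice the SDK performs this step internally.

```typescript
// Hypothetical helpers, for illustration only; the Sentry SDK handles this internally.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  parentSpanId: string; // 16 lowercase hex characters
  sampled?: boolean; // undefined when the upstream service deferred the decision
}

// Matches "<traceId>-<spanId>" with an optional "-0" or "-1" sampling flag.
const SENTRY_TRACE_PATTERN = /^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$/;

function parseSentryTrace(header: string | null | undefined): TraceContext | undefined {
  if (!header) return undefined;
  const match = SENTRY_TRACE_PATTERN.exec(header.trim().toLowerCase());
  if (!match) return undefined;

  const [, traceId, parentSpanId, flag] = match;
  const context: TraceContext = { traceId, parentSpanId };
  if (flag !== undefined) context.sampled = flag === "1";
  return context;
}

function formatSentryTrace(context: TraceContext): string {
  const base = `${context.traceId}-${context.parentSpanId}`;
  // Omit the flag entirely when no sampling decision has been made yet.
  return context.sampled === undefined ? base : `${base}-${context.sampled ? "1" : "0"}`;
}
```

A backend function that receives a valid header can continue the same trace, which is what links a frontend error to the backend operation that caused it.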
2. Complete Function Coverage (15/15 functions - 100%)
High Priority (All Completed):
- graphql-v2 - Main API with distributed tracing
- auth-hook - Authentication events with security monitoring
- database-hook - Database changes with comprehensive tracking
- stripe-billing - Billing details with error tracking
- stripe-subscription - Subscription management with Stripe operation tracking
Medium Priority (All Completed):
- feature-flags - Feature flag evaluation with security checks
- stripe-portal - Customer portal with session tracking
- stripe-setup-intent - Payment method setup tracking
- stripe-products - Product catalog management with API tracking
- sync-stripe-data - Data synchronization with comprehensive error tracking
- sync-user-to-stripe - User profile sync with customer creation tracking
Existing Functions (Already Had Sentry):
- handle-stripe-webhook - Webhook processing
- provision-tenant-resources - Resource provisioning
- provision-user-resources - User setup
- check-tenant-health - Health monitoring
3. Security Monitoring
- Track failed authentication attempts with IP and user agent
- Monitor invalid webhook signatures and access violations
- Capture cross-user access attempts and permission violations
- Alert on suspicious patterns and security events
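One way to turn failed authentication attempts into an alertable signal is a per-IP sliding window, sketched below. Everything here is an assumption made for illustration: the `createFailedAuthTracker` helper, the tag keys, and the window and threshold values are hypothetical and do not describe the actual auth-hook implementation.

```typescript
// Hypothetical sketch: flags an IP after repeated failed logins within a time window.
// Names, thresholds, and tag keys are illustrative assumptions.
interface FailedAuthAttempt {
  ip: string;
  userAgent: string;
  timestampMs: number;
}

interface SecurityEvent {
  level: "warning" | "error";
  recentFailures: number;
  tags: { "security.event": string; "client.ip": string; "client.user_agent": string };
}

function createFailedAuthTracker(windowMs: number, threshold: number) {
  const failuresByIp = new Map<string, number[]>();

  return function record(attempt: FailedAuthAttempt): SecurityEvent {
    // Keep only the failures that still fall inside the sliding window.
    const cutoff = attempt.timestampMs - windowMs;
    const recent = (failuresByIp.get(attempt.ip) ?? []).filter((t) => t > cutoff);
    recent.push(attempt.timestampMs);
    failuresByIp.set(attempt.ip, recent);

    return {
      // Escalate once the IP reaches the threshold within the window.
      level: recent.length >= threshold ? "error" : "warning",
      recentFailures: recent.length,
      tags: {
        "security.event": "auth_failed",
        "client.ip": attempt.ip,
        "client.user_agent": attempt.userAgent,
      },
    };
  };
}
```

The returned tags and level could then be attached to a Sentry scope before capturing a message. Note that in-memory state like this may not survive across Edge Function invocations, so a real detector would likely need a shared store or rely on Sentry's own alert rules.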
4. Business Event Tracking
- Organization creation and plan changes
- Subscription lifecycle events
- Payment flow completion and failures
- Feature flag evaluations and rollouts
- User onboarding and engagement events
5. Error Context Enhancement
- User ID and organization ID in all error reports
- Request details (method, URL, headers)
- Operation-specific context (Stripe customer ID, subscription ID, etc.)
- Database operation details (table, operation type, duration)
- Feature flag evaluation context
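The context fields listed above could be assembled in one place before an error is captured, roughly as in the sketch below. The `buildErrorContext` and `redactHeaders` helpers, the context key names, and the list of sensitive headers are all assumptions for illustration; they are not taken from the shared modules.

```typescript
// Hypothetical sketch: assembles user, organization, request, and operation context.
interface RequestDetails {
  method: string;
  url: string;
  headers: Record<string, string>;
}

interface ErrorContextInput {
  userId: string;
  organizationId: string;
  request: RequestDetails;
  operation: Record<string, string | number>;
}

// Assumed list of headers that should never reach an error report.
const SENSITIVE_HEADERS = ["authorization", "cookie", "apikey", "stripe-signature"];

function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const safe: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    // Header names are case-insensitive, so compare in lowercase.
    const isSensitive = SENSITIVE_HEADERS.includes(name.toLowerCase());
    safe[name] = isSensitive ? "[redacted]" : headers[name];
  }
  return safe;
}

function buildErrorContext(input: ErrorContextInput) {
  return {
    user: { id: input.userId },
    contexts: {
      organization: { id: input.organizationId },
      request: {
        method: input.request.method,
        url: input.request.url,
        headers: redactHeaders(input.request.headers),
      },
      operation: input.operation,
    },
  };
}
```

Each entry under `contexts` maps naturally onto a `scope.setContext(name, value)` call, and `user` onto `Sentry.setUser`, as in the standard implementation pattern.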
6. Performance Monitoring
- Database operation timing with trackSupabaseOperation
- External API call tracking with tracedFetch
- Stripe operation performance monitoring
- GraphQL operation timing and complexity tracking
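To show the general shape of an operation-timing wrapper, here is a minimal sketch that measures how long an async operation takes and reports the result whether it succeeds or throws. It is not the implementation of trackSupabaseOperation; the `trackOperation` name, the `report` callback, and the injectable clock are assumptions for illustration. A real version would presumably open a Sentry span instead of calling a callback.

```typescript
// Hypothetical sketch of a timing wrapper; not the real trackSupabaseOperation.
interface OperationTiming {
  operation: string;
  table: string;
  durationMs: number;
  ok: boolean;
}

async function trackOperation<T>(
  operation: string,
  table: string,
  fn: () => Promise<T>,
  report: (timing: OperationTiming) => void,
  now: () => number = Date.now, // injectable clock, mainly useful for tests
): Promise<T> {
  const start = now();
  let ok = false;
  try {
    const result = await fn();
    ok = true;
    return result;
  } finally {
    // Runs on success and on failure, so slow failures are recorded too.
    report({ operation, table, durationMs: now() - start, ok });
  }
}
```

Because the timing is reported in `finally`, an error thrown by `fn` still propagates to the caller, where it can be captured with the usual error context.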
7. Cost Optimization
- 10% transaction sampling in production (within Team plan limits)
- Error filtering for non-critical events (CORS, validation errors)
- Sensitive data removal from error reports
- Health check endpoint exclusion
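The sampling and filtering rules above might be expressed as two small pure functions, sketched below, which could be called from the `tracesSampler` and `beforeSend` options of `Sentry.init`. The function names, the `/health` path suffix, the 100% rate outside production, and the message patterns are all assumptions for illustration.

```typescript
// Hypothetical sketch of sampling and noise-filtering rules.
const PRODUCTION_SAMPLE_RATE = 0.1; // 10% transaction sampling in production

// Assumed patterns for non-critical errors that should not consume error quota.
const NON_CRITICAL_PATTERNS = [/cors/i, /validation/i];

// Returns the trace sample rate for a request path in a given environment.
function sampleRateFor(path: string, environment: string): number {
  // Never trace health checks (assumes health endpoints end in "/health").
  if (path.endsWith("/health")) return 0;
  // Assumption: sample everything outside production to ease debugging.
  return environment === "production" ? PRODUCTION_SAMPLE_RATE : 1;
}

// Returns true when an error message matches a non-critical pattern.
function shouldDropEvent(message: string): boolean {
  return NON_CRITICAL_PATTERNS.some((pattern) => pattern.test(message));
}
```

In a `beforeSend` callback, returning `null` when `shouldDropEvent` is true discards the event before it counts against the plan quota.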
Implementation Details
Standard Implementation Pattern
import {
  initSentryWithTracing,
  withTracing,
  trackSupabaseOperation,
  tracedFetch, // assumed to be exported by the same shared module
} from '../_shared/sentry-tracing.ts'
import { Logger } from '../_shared/observability.ts'
import * as Sentry from 'https://deno.land/x/sentry/index.mjs'

// Initialize Sentry
initSentryWithTracing('function-name')
const logger = new Logger({ functionName: 'function-name' })

// Wrap function with distributed tracing.
// `serve`, `supabase`, `user`, and `organizationId` come from the function's own setup (not shown).
serve(
  withTracing('function-name', async (req) => {
    try {
      // Set user and organization context
      Sentry.setUser({ id: user.id })
      Sentry.setContext('organization', { id: organizationId })

      // Track database operations
      const { data, error } = await trackSupabaseOperation('select', 'table', async () =>
        supabase.from('table').select('*')
      )
      // The Supabase client returns errors instead of throwing, so rethrow to capture them below
      if (error) throw error

      // Track external API calls
      const response = await tracedFetch('https://api.stripe.com/endpoint', {
        spanName: 'Stripe API Call',
      })

      // Placeholder response; each function returns its own payload
      return new Response(JSON.stringify({ data }), { status: 200 })
    } catch (error) {
      // Capture with rich context
      Sentry.withScope((scope) => {
        scope.setTag('function.name', 'function-name')
        scope.setTag('error.type', 'operation_failed')
        scope.setContext('operation_context', {
          /* relevant details */
        })
        Sentry.captureException(error)
      })
      // Placeholder error response
      return new Response('Internal Server Error', { status: 500 })
    }
  })
)
Key Features Implemented
- Distributed Tracing: Full request flow visibility
- Security Monitoring: Authentication failures, access violations
- Business Event Tracking: Subscriptions, organizations, feature flags
- Performance Monitoring: Database and API operation timing
- Error Context: User, organization, and operation details
- Cost Optimization: Sampling and filtering for Team plan limits
Consequences
Positive
- Complete Visibility: 100% of Edge Functions (15/15) now have comprehensive monitoring
- Faster Debugging: Rich error context reduces time to resolution
- Proactive Monitoring: Catch errors before customer reports
- Security Insights: Track authentication failures and suspicious activity
- Performance Optimization: Identify slow operations and bottlenecks
- Business Intelligence: Track subscription changes, feature adoption
- Distributed Tracing: End-to-end request flow visibility
- Team Plan Optimized: Stays within 50K errors, 100K performance units/month
Negative
- Increased Complexity: More monitoring code in each function
- Performance Overhead: Small latency increase from tracking operations
- Cost Monitoring Required: Need to watch Sentry usage against Team plan limits
- Maintenance Overhead: Sentry integration needs updates with new functions
Risks Mitigated
- Customer Impact: Proactive error detection vs reactive customer reports
- Security Vulnerabilities: Authentication and access monitoring
- Payment Failures: Complete Stripe operation tracking
- Database Issues: Comprehensive database operation monitoring
- Feature Rollout Issues: Feature flag evaluation tracking
Monitoring and Maintenance
Key Metrics to Track
- Error Rate Trends: Monitor for increasing error patterns
- Authentication Failures: Track failed login attempts and patterns
- Payment Flow Health: Monitor Stripe operation success rates
- Database Performance: Track slow queries and operation failures
- Feature Flag Adoption: Monitor feature rollout success
Team Plan Usage Monitoring
- Monthly Error Budget: 50,000 errors/month
- Performance Budget: 100,000 performance units/month
- Current Sampling: 10% transaction sampling in production
- Cost Optimization: Filter non-critical errors, exclude health checks
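As a back-of-the-envelope check on these budgets, the sketch below estimates how much traffic the performance budget can absorb at a given sampling rate. It assumes each sampled transaction consumes exactly one performance unit and that a month has 30 days; both are simplifying assumptions, and `maxMonthlyTransactions` is a hypothetical helper.

```typescript
// Hypothetical helper: assumes one sampled transaction consumes one performance unit.
function maxMonthlyTransactions(performanceBudget: number, samplePercent: number): number {
  if (samplePercent <= 0 || samplePercent > 100) {
    throw new RangeError("samplePercent must be in the range (0, 100]");
  }
  // Integer percent avoids floating-point rounding surprises with rates like 0.1.
  return Math.floor((performanceBudget * 100) / samplePercent);
}

const monthlyCapacity = maxMonthlyTransactions(100000, 10);
const dailyCapacity = Math.floor(monthlyCapacity / 30); // assumes a 30-day month
```

Under these assumptions, 10% sampling leaves room for about 1,000,000 transactions per month, or roughly 33,333 per day, before the performance budget is exhausted.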
Maintenance Tasks
- Quarterly Review: Assess error patterns and optimization opportunities
- Alert Configuration: Set up notifications for critical error thresholds
- Performance Analysis: Review slow operations and optimization opportunities
- Security Review: Analyze authentication failure patterns
Success Criteria
Immediate (Achieved)
- ✅ 100% of Edge Functions have comprehensive Sentry integration
- ✅ All business operations are monitored (authentication, payments, data sync)
- ✅ Distributed tracing implemented from frontend to database
- ✅ Security monitoring for authentication and access violations
- ✅ Rich error context for faster debugging
- ✅ Complete product catalog and data synchronization monitoring
Short-term (1-3 months)
- Establish error rate baselines and alert thresholds
- Create Sentry dashboards for business and technical metrics
- Configure automated alerts for critical error patterns
- Analyze data synchronization patterns and optimize performance
Long-term (3-6 months)
- Demonstrate reduced time-to-resolution for production issues
- Show proactive issue detection vs customer-reported issues
- Optimize performance based on distributed tracing insights
- Enhance security posture based on monitoring insights
Alternatives Considered
Alternative 1: Basic Error Logging
Rejected: console.error() logging provides no aggregation, alerting, or context tracking.
Alternative 2: Multiple Monitoring Tools
Rejected: Using separate tools for errors, performance, and business events would increase complexity and cost.
Alternative 3: Gradual Implementation
Rejected: Partial coverage would leave critical gaps in monitoring during high-risk operations.
Alternative 4: Frontend-Only Monitoring
Rejected: Backend errors often occur without frontend visibility, especially in webhook processing and background jobs.