Source: ocean/docs/adr/0037-comprehensive-sentry-integration.md

ADR-0037: Comprehensive Sentry Integration for Edge Functions

Status

Accepted

Date

2025-01-31

Context

Our Edge Function architecture had limited error visibility and debugging capability. Sentry was technically integrated but significantly underutilized: it showed 0 reported issues, which pointed to a gap between our monitoring setup and actual error detection rather than to an absence of errors. This lack of observability posed risks to:

  • Customer Experience: Errors going undetected until customer reports
  • Business Operations: Payment, authentication, and database failures not being tracked
  • Development Velocity: Debugging issues without proper error context
  • Security Monitoring: Failed authentication attempts and security violations not being captured
  • Performance Optimization: No visibility into slow operations or bottlenecks

The existing setup had Sentry configured but lacked:

  • Distributed tracing between frontend and backend
  • Comprehensive error context (user, organization, operation details)
  • Security event monitoring
  • Business event tracking
  • Performance monitoring for critical operations

Decision

We decided to implement comprehensive Sentry integration across all critical Edge Functions with the following approach:

1. Distributed Tracing Architecture

  • Implement end-to-end tracing from frontend through GraphQL to database operations
  • Use Sentry v9 APIs (startSpan, getCurrentScope, getActiveSpan)
  • Connect frontend errors to backend operations with trace propagation
  • Track operation performance and identify bottlenecks
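
Trace propagation depends on the frontend sending trace identifiers that the Edge Function then continues. As a rough illustration of what travels on the wire, the sketch below parses a `sentry-trace` header value of the form `<trace_id>-<span_id>-<sampled>`. The Sentry SDK normally handles this itself; `parseSentryTrace` and `TraceParent` are hypothetical names written only to show the format and are not part of `_shared/sentry-tracing.ts`.

```typescript
// Hypothetical helper for illustration only; real functions should rely on
// the SDK's own propagation code.
interface TraceParent {
  traceId: string; // 32 lowercase hex characters
  parentSpanId: string; // 16 lowercase hex characters
  sampled: boolean | undefined; // undefined when the sender deferred the decision
}

// Format: <trace_id>-<span_id>[-<sampled>], where sampled is "0" or "1".
const SENTRY_TRACE_PATTERN = /^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$/;

function parseSentryTrace(header: string | null): TraceParent | undefined {
  if (!header) return undefined;
  const match = SENTRY_TRACE_PATTERN.exec(header.trim());
  if (!match) return undefined; // malformed header: start a fresh trace instead
  const sampledFlag: string | undefined = match[3];
  return {
    traceId: match[1],
    parentSpanId: match[2],
    sampled: sampledFlag === undefined ? undefined : sampledFlag === "1",
  };
}
```

A handler could read the header with `req.headers.get('sentry-trace')` and pass the value to this parser. The pattern here is deliberately strict, so anything that does not match is treated as "no incoming trace".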

2. Complete Function Coverage (15/15 functions - 100%)

High Priority (All Completed):

  • graphql-v2 - Main API with distributed tracing
  • auth-hook - Authentication events with security monitoring
  • database-hook - Database changes with comprehensive tracking
  • stripe-billing - Billing details with error tracking
  • stripe-subscription - Subscription management with Stripe operation tracking

Medium Priority (All Completed):

  • feature-flags - Feature flag evaluation with security checks
  • stripe-portal - Customer portal with session tracking
  • stripe-setup-intent - Payment method setup tracking
  • stripe-products - Product catalog management with API tracking
  • sync-stripe-data - Data synchronization with comprehensive error tracking
  • sync-user-to-stripe - User profile sync with customer creation tracking

Existing Functions (Already Had Sentry):

  • handle-stripe-webhook - Webhook processing
  • provision-tenant-resources - Resource provisioning
  • provision-user-resources - User setup
  • check-tenant-health - Health monitoring

3. Security Monitoring

  • Track failed authentication attempts with IP and user agent
  • Monitor invalid webhook signatures and access violations
  • Capture cross-user access attempts and permission violations
  • Alert on suspicious patterns and security events
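
To make the security bullets concrete, here is a minimal sketch of how a security event could be assembled with IP and user agent before being reported. The names `buildSecurityEvent`, `SecurityEventKind`, and `HeaderReader` are hypothetical, and reading the client address from `x-forwarded-for` is an assumption about the hosting platform.

```typescript
// Hypothetical shapes for illustration; not the project's actual helpers.
interface HeaderReader {
  get(name: string): string | null; // satisfied by the standard Headers class
}

type SecurityEventKind =
  | "auth_failed"
  | "invalid_webhook_signature"
  | "cross_user_access";

interface SecurityEvent {
  message: string;
  tags: Record<string, string>;
  extra: { ip: string; userAgent: string };
}

function buildSecurityEvent(kind: SecurityEventKind, headers: HeaderReader): SecurityEvent {
  // Assumption: the platform forwards the client address in x-forwarded-for,
  // with the original client first in a comma-separated list.
  const forwardedFor = headers.get("x-forwarded-for") ?? "";
  const ip = forwardedFor.split(",")[0].trim() || "unknown";
  const userAgent = headers.get("user-agent") ?? "unknown";
  return {
    message: `security:${kind}`,
    tags: { "security.event": kind },
    extra: { ip, userAgent },
  };
}
```

The resulting object could then be handed to something like `Sentry.captureMessage(event.message, { level: 'warning', tags: event.tags, extra: event.extra })`. Keeping the builder free of SDK calls makes it easy to unit test.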

4. Business Event Tracking

  • Organization creation and plan changes
  • Subscription lifecycle events
  • Payment flow completion and failures
  • Feature flag evaluations and rollouts
  • User onboarding and engagement events

5. Error Context Enhancement

  • User ID and organization ID in all error reports
  • Request details (method, URL, headers)
  • Operation-specific context (Stripe customer ID, subscription ID, etc.)
  • Database operation details (table, operation type, duration)
  • Feature flag evaluation context
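
The context fields listed above could be collected in one place before they are attached to a Sentry scope. The sketch below is one possible shape, not the project's implementation: `buildErrorContext`, `ErrorContextInput`, and the contents of `SENSITIVE_HEADERS` are assumptions made for illustration.

```typescript
// Hypothetical helper; the header names in SENSITIVE_HEADERS are an assumption.
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "apikey", "stripe-signature"]);

interface ErrorContextInput {
  userId: string;
  organizationId: string;
  method: string;
  url: string;
  headers: Record<string, string>;
  operation?: Record<string, unknown>; // e.g. Stripe customer ID, table name
}

function buildErrorContext(input: ErrorContextInput) {
  // Copy headers with lowercased names, masking values that must not leave the function.
  const headers: Record<string, string> = {};
  for (const [headerName, value] of Object.entries(input.headers)) {
    const key = headerName.toLowerCase();
    headers[key] = SENSITIVE_HEADERS.has(key) ? "[redacted]" : value;
  }
  return {
    user: { id: input.userId },
    contexts: {
      organization: { id: input.organizationId },
      request: { method: input.method, url: input.url, headers },
      operation: input.operation ?? {},
    },
  };
}
```

Each entry under `contexts` maps naturally onto a `scope.setContext(name, value)` call, and `user` onto `Sentry.setUser`, as in the standard implementation pattern.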

6. Performance Monitoring

  • Database operation timing with trackSupabaseOperation
  • External API call tracking with tracedFetch
  • Stripe operation performance monitoring
  • GraphQL operation timing and complexity tracking
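
This ADR does not show how `trackSupabaseOperation` is implemented. The sketch below illustrates the general idea of timing an operation under a different, hypothetical name (`timeOperation`), with the reporting step injected as a callback so that no Sentry API is assumed.

```typescript
// Hypothetical stand-in for a helper such as trackSupabaseOperation; the real
// implementation in _shared/sentry-tracing.ts is not shown in this ADR.
interface OperationTiming {
  operation: string;
  table: string;
  durationMs: number;
  ok: boolean; // false only when the operation threw
}

async function timeOperation<T>(
  operation: string,
  table: string,
  run: () => Promise<T>,
  record: (timing: OperationTiming) => void,
): Promise<T> {
  const startedAt = Date.now();
  let ok = false;
  try {
    const result = await run();
    ok = true;
    return result;
  } finally {
    // Runs on success and on failure, so failed operations are timed too.
    record({ operation, table, durationMs: Date.now() - startedAt, ok });
  }
}
```

Note that Supabase clients usually return an `error` field instead of throwing, so a production helper would probably also inspect the result before marking the operation as successful.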

7. Cost Optimization

  • 10% transaction sampling in production (within Team plan limits)
  • Error filtering for non-critical events (CORS, validation errors)
  • Sensitive data removal from error reports
  • Health check endpoint exclusion
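
The sampling and filtering rules above can be expressed as two small pure functions. This is a sketch under stated assumptions: the `/health` path, the `cors` and `validation` values for the `error.type` tag, the 100% rate outside production, and the names `sampleRateFor` and `filterEvent` are all illustrative. The shapes are loosely modeled on Sentry's `tracesSampler` and `beforeSend` options, whose exact callback signatures should be checked against the SDK version in use.

```typescript
const PRODUCTION_SAMPLE_RATE = 0.1; // 10% transaction sampling, per this ADR

function sampleRateFor(transactionName: string, environment: string): number {
  // Assumption: health checks are served from a path ending in /health.
  if (transactionName.endsWith("/health")) return 0;
  // Assumption: non-production environments trace everything.
  return environment === "production" ? PRODUCTION_SAMPLE_RATE : 1;
}

// Minimal event shape for illustration; real Sentry events carry many more fields.
interface MinimalEvent {
  message?: string;
  tags?: Record<string, string>;
  request?: { headers?: Record<string, string> };
}

const NON_CRITICAL_ERROR_TYPES = new Set(["cors", "validation"]);
const SCRUBBED_HEADERS = new Set(["authorization", "cookie"]);

// Returns null to drop the event, otherwise a copy without sensitive headers.
function filterEvent(event: MinimalEvent): MinimalEvent | null {
  const errorType = event.tags?.["error.type"];
  if (errorType !== undefined && NON_CRITICAL_ERROR_TYPES.has(errorType)) return null;
  const originalHeaders = event.request?.headers;
  if (!originalHeaders) return event;
  const headers: Record<string, string> = {};
  for (const [key, value] of Object.entries(originalHeaders)) {
    if (!SCRUBBED_HEADERS.has(key.toLowerCase())) headers[key] = value;
  }
  return { ...event, request: { ...event.request, headers } };
}
```

Dropping an event by returning `null` mirrors how Sentry's `beforeSend` hook discards events, which is what keeps filtered errors from counting against the monthly budget.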

Implementation Details

Standard Implementation Pattern

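```typescript
// Assumption: `serve` comes from the Deno standard library; pin the version your project uses.
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import {
  initSentryWithTracing,
  withTracing,
  trackSupabaseOperation,
  tracedFetch, // assumed to be exported by the same shared module
} from '../_shared/sentry-tracing.ts'
import { Logger } from '../_shared/observability.ts'
import * as Sentry from 'https://deno.land/x/sentry/index.mjs'

// Initialize Sentry
initSentryWithTracing('function-name')
const logger = new Logger({ functionName: 'function-name' })

// Wrap function with distributed tracing
serve(
  withTracing('function-name', async (req) => {
    try {
      // `user`, `organizationId`, and `supabase` are placeholders: they are
      // assumed to be resolved from the request earlier (setup omitted here).

      // Set user and organization context
      Sentry.setUser({ id: user.id })
      Sentry.setContext('organization', { id: organizationId })

      // Track database operations
      const { data, error } = await trackSupabaseOperation('select', 'table', async () =>
        supabase.from('table').select('*')
      )
      // Supabase reports failures in `error` instead of throwing
      if (error) throw error

      // Track external API calls
      const response = await tracedFetch('https://api.stripe.com/endpoint', {
        spanName: 'Stripe API Call',
      })

      return new Response(JSON.stringify({ data }), {
        headers: { 'Content-Type': 'application/json' },
      })
    } catch (error) {
      // Capture with rich context
      Sentry.withScope((scope) => {
        scope.setTag('function.name', 'function-name')
        scope.setTag('error.type', 'operation_failed')
        scope.setContext('operation_context', {
          /* relevant details */
        })
        Sentry.captureException(error)
      })
      // Always answer the caller, even when the operation failed
      return new Response(JSON.stringify({ error: 'Internal error' }), { status: 500 })
    }
  })
)
```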

Key Features Implemented

  • Distributed Tracing: Full request flow visibility
  • Security Monitoring: Authentication failures, access violations
  • Business Event Tracking: Subscriptions, organizations, feature flags
  • Performance Monitoring: Database and API operation timing
  • Error Context: User, organization, and operation details
  • Cost Optimization: Sampling and filtering for Team plan limits

Consequences

Positive

  • Complete Visibility: 100% of Edge Functions (15/15) now have comprehensive monitoring
  • Faster Debugging: Rich error context reduces time to resolution
  • Proactive Monitoring: Catch errors before customer reports
  • Security Insights: Track authentication failures and suspicious activity
  • Performance Optimization: Identify slow operations and bottlenecks
  • Business Intelligence: Track subscription changes, feature adoption
  • Distributed Tracing: End-to-end request flow visibility
  • Team Plan Optimized: Stays within 50K errors and 100K performance units per month

Negative

  • Increased Complexity: More monitoring code in each function
  • Performance Overhead: Small latency increase from tracking operations
  • Cost Monitoring Required: Need to watch Sentry usage against Team plan limits
  • Maintenance Overhead: Each new Edge Function must be wired into the Sentry integration

Risks Mitigated

  • Customer Impact: Proactive error detection vs reactive customer reports
  • Security Vulnerabilities: Authentication and access monitoring
  • Payment Failures: Complete Stripe operation tracking
  • Database Issues: Comprehensive database operation monitoring
  • Feature Rollout Issues: Feature flag evaluation tracking

Monitoring and Maintenance

Key Metrics to Track

  • Error Rate Trends: Monitor for increasing error patterns
  • Authentication Failures: Track failed login attempts and patterns
  • Payment Flow Health: Monitor Stripe operation success rates
  • Database Performance: Track slow queries and operation failures
  • Feature Flag Adoption: Monitor feature rollout success

Team Plan Usage Monitoring

  • Monthly Error Budget: 50,000 errors/month
  • Performance Budget: 100,000 performance units/month
  • Current Sampling: 10% transaction sampling in production
  • Cost Optimization: Filter non-critical errors, exclude health checks
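
As a worked example of the budget arithmetic, the sketch below projects monthly usage from traffic. It assumes that one sampled transaction consumes one performance unit, which should be verified against the actual Sentry plan; the traffic and error figures are made up for illustration.

```typescript
// Figures from this ADR: Team plan budgets and the production sample rate.
const MONTHLY_ERROR_BUDGET = 50_000;
const MONTHLY_PERFORMANCE_BUDGET = 100_000;
const TRANSACTION_SAMPLE_RATE = 0.1;

// Assumption: one sampled transaction consumes one performance unit.
function projectedPerformanceUnits(monthlyRequests: number): number {
  return Math.round(monthlyRequests * TRANSACTION_SAMPLE_RATE);
}

// Fraction of a budget consumed; 1.0 means the budget is exhausted.
function budgetUsage(used: number, budget: number): number {
  return used / budget;
}

// Made-up traffic figure: 400,000 requests per month.
const projectedUnits = projectedPerformanceUnits(400_000);
const performanceUsage = budgetUsage(projectedUnits, MONTHLY_PERFORMANCE_BUDGET);

// Made-up error count: 12,500 captured errors in a month.
const errorUsage = budgetUsage(12_500, MONTHLY_ERROR_BUDGET);

console.log({ projectedUnits, performanceUsage, errorUsage });
```

Under the same assumption, 10% sampling exhausts the performance budget at roughly 1,000,000 requests per month (100,000 / 0.1), which gives a simple threshold for revisiting the sample rate.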

Maintenance Tasks

  • Quarterly Review: Assess error patterns and optimization opportunities
  • Alert Configuration: Set up notifications for critical error thresholds
  • Performance Analysis: Review slow operations and optimization opportunities
  • Security Review: Analyze authentication failure patterns

Success Criteria

Immediate (Achieved)

  • ✅ 100% of Edge Functions have comprehensive Sentry integration
  • ✅ All business operations are monitored (authentication, payments, data sync)
  • ✅ Distributed tracing implemented from frontend to database
  • ✅ Security monitoring for authentication and access violations
  • ✅ Rich error context for faster debugging
  • ✅ Complete product catalog and data synchronization monitoring

Short-term (1-3 months)

  • Establish error rate baselines and alert thresholds
  • Create Sentry dashboards for business and technical metrics
  • Configure automated alerts for critical error patterns
  • Analyze data synchronization patterns and optimize performance

Long-term (3-6 months)

  • Demonstrate reduced time-to-resolution for production issues
  • Show that more issues are detected proactively than are reported by customers
  • Optimize performance based on distributed tracing insights
  • Enhance security posture based on monitoring insights

Alternatives Considered

Alternative 1: Basic Error Logging

Rejected: console.error() logging provides no aggregation, alerting, or context tracking.

Alternative 2: Multiple Monitoring Tools

Rejected: Using separate tools for errors, performance, and business events would increase complexity and cost.

Alternative 3: Gradual Implementation

Rejected: Partial coverage would leave critical gaps in monitoring during high-risk operations.

Alternative 4: Frontend-Only Monitoring

Rejected: Backend errors often occur without frontend visibility, especially in webhook processing and background jobs.

References