Source: ocean/docs/adr/0037-comprehensive-sentry-integration.md

ADR-003: Comprehensive Sentry Integration for Edge Functions

Status

Accepted

Date

2025-01-31

Context

Our Edge Function architecture had limited error visibility and debugging support. Sentry was technically integrated but had reported 0 issues, which pointed to a gap between the monitoring setup and actual error detection rather than to an absence of errors. This lack of observability posed risks to:

Customer Experience: Errors going undetected until customer reports
Business Operations: Payment, authentication, and database failures not being tracked
Development Velocity: Debugging issues without proper error context
Security Monitoring: Failed authentication attempts and security violations not being captured
Performance Optimization: No visibility into slow operations or bottlenecks

The existing setup had Sentry configured but lacked:

Distributed tracing between frontend and backend
Comprehensive error context (user, organization, operation details)
Security event monitoring
Business event tracking
Performance monitoring for critical operations

Decision

We decided to implement comprehensive Sentry integration across all critical Edge Functions with the following approach:

1. Distributed Tracing Architecture

Implement end-to-end tracing from frontend through GraphQL to database operations
Use Sentry v9 APIs (startSpan, getCurrentScope, getActiveSpan)
Connect frontend errors to backend operations with trace propagation
Track operation performance and identify bottlenecks
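
As a rough illustration of trace propagation, the sketch below parses a `sentry-trace` header value of the form `traceId-spanId-sampled` so that a backend span can continue a trace started in the frontend. The `parseSentryTrace` helper and the `TraceContext` type are hypothetical names introduced for this example; in practice the Sentry SDK and the shared `withTracing` wrapper would be expected to handle this step.

```typescript
// Hypothetical helper (not part of the shared module shown in this ADR):
// parse a `sentry-trace` header of the form "<traceId>-<spanId>[-<sampled>]".
interface TraceContext {
  traceId: string // 32 lowercase hex characters
  parentSpanId: string // 16 lowercase hex characters
  sampled?: boolean // absent when the sender deferred the sampling decision
}

const SENTRY_TRACE_PATTERN = /^([0-9a-f]{32})-([0-9a-f]{16})(?:-([01]))?$/

function parseSentryTrace(header: string | null): TraceContext | null {
  if (!header) return null
  const match = SENTRY_TRACE_PATTERN.exec(header.trim().toLowerCase())
  if (!match) return null

  const traceId = match[1]
  const parentSpanId = match[2]
  if (traceId === undefined || parentSpanId === undefined) return null

  const context: TraceContext = { traceId, parentSpanId }
  const sampledFlag = match[3]
  if (sampledFlag !== undefined) {
    // "1" means the upstream service sampled this trace, "0" means it did not
    context.sampled = sampledFlag === '1'
  }
  return context
}
```

A malformed or missing header yields `null`, in which case the function would simply start a new trace instead of continuing one.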

2. Complete Function Coverage (15/15 functions - 100%)

High Priority (All Completed):

graphql-v2 - Main API with distributed tracing
auth-hook - Authentication events with security monitoring
database-hook - Database changes with comprehensive tracking
stripe-billing - Billing details with error tracking
stripe-subscription - Subscription management with Stripe operation tracking

Medium Priority (All Completed):

feature-flags - Feature flag evaluation with security checks
stripe-portal - Customer portal with session tracking
stripe-setup-intent - Payment method setup tracking
stripe-products - Product catalog management with API tracking
sync-stripe-data - Data synchronization with comprehensive error tracking
sync-user-to-stripe - User profile sync with customer creation tracking

Existing Functions (Already Had Sentry):

handle-stripe-webhook - Webhook processing
provision-tenant-resources - Resource provisioning
provision-user-resources - User setup
check-tenant-health - Health monitoring

3. Security Monitoring

Track failed authentication attempts with IP and user agent
Monitor invalid webhook signatures and access violations
Capture cross-user access attempts and permission violations
Alert on suspicious patterns and security events
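
One way to capture the IP and user agent for a security event is sketched below. The `SecurityEvent` shape, the `buildSecurityEvent` helper, and the use of the `x-forwarded-for` header are assumptions made for illustration, not the implementation used in the functions listed above.

```typescript
// Hypothetical shape for a security event attached to a Sentry report.
interface SecurityEvent {
  type: 'auth_failure' | 'invalid_webhook_signature' | 'cross_user_access'
  ip: string
  userAgent: string
  path: string
}

// Minimal read-only view of request headers; the standard `Headers` class satisfies it.
interface HeaderReader {
  get(name: string): string | null
}

function buildSecurityEvent(
  type: SecurityEvent['type'],
  headers: HeaderReader,
  path: string,
): SecurityEvent {
  // Assumption: the platform forwards the client IP in `x-forwarded-for`
  // as a comma-separated list with the original client first.
  const forwarded = headers.get('x-forwarded-for') ?? ''
  const ip = forwarded.split(',')[0]?.trim() || 'unknown'
  return {
    type,
    ip,
    userAgent: headers.get('user-agent') ?? 'unknown',
    path,
  }
}
```

The resulting object could then be attached with `scope.setContext('security', event)` inside `Sentry.withScope` before calling `Sentry.captureMessage`, so that alerts can group on the event type.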

4. Business Event Tracking

Organization creation and plan changes
Subscription lifecycle events
Payment flow completion and failures
Feature flag evaluations and rollouts
User onboarding and engagement events

5. Error Context Enhancement

User ID and organization ID in all error reports
Request details (method, URL, headers)
Operation-specific context (Stripe customer ID, subscription ID, etc.)
Database operation details (table, operation type, duration)
Feature flag evaluation context
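
Because request headers often carry credentials, request details are best redacted before they are attached to an error report. The following is a minimal sketch under that assumption; `buildRequestContext` and the `SENSITIVE_HEADERS` list are hypothetical and would need to match the headers each function actually receives.

```typescript
// Hypothetical list of headers that should never reach an error report.
const SENSITIVE_HEADERS = new Set(['authorization', 'cookie', 'apikey', 'stripe-signature'])

interface RequestContext {
  method: string
  url: string
  headers: Record<string, string>
}

// Build the request portion of the error context, masking sensitive header values.
function buildRequestContext(
  method: string,
  url: string,
  headers: Iterable<[string, string]>,
): RequestContext {
  const safeHeaders: Record<string, string> = {}
  for (const [name, value] of headers) {
    const key = name.toLowerCase()
    safeHeaders[key] = SENSITIVE_HEADERS.has(key) ? '[redacted]' : value
  }
  return { method, url, headers: safeHeaders }
}
```

With a standard `Request`, this could be called as `buildRequestContext(req.method, req.url, req.headers)` and the result passed to `Sentry.setContext('request_details', ...)`; the context key is illustrative.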

6. Performance Monitoring

Database operation timing with trackSupabaseOperation
External API call tracking with tracedFetch
Stripe operation performance monitoring
GraphQL operation timing and complexity tracking
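
The source does not show how `trackSupabaseOperation` or `tracedFetch` are implemented. As a sketch of the general idea, the hypothetical `timeOperation` wrapper below measures an async operation and reports its duration and outcome through a callback, whether the operation succeeds or throws.

```typescript
// Hypothetical timing wrapper; the real helpers in `_shared/sentry-tracing.ts` may differ.
type TimingReporter = (name: string, durationMs: number, ok: boolean) => void

async function timeOperation<T>(
  name: string,
  operation: () => Promise<T>,
  report: TimingReporter,
): Promise<T> {
  const start = Date.now()
  let ok = false
  try {
    const result = await operation()
    ok = true
    return result
  } finally {
    // Runs on both success and failure, so slow failures are measured too
    report(name, Date.now() - start, ok)
  }
}
```

In a real function the reporter might record the duration on a span created with `Sentry.startSpan`, rather than calling a plain callback.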

7. Cost Optimization

10% transaction sampling in production (within Team plan limits)
Error filtering for non-critical events (CORS, validation errors)
Sensitive data removal from error reports
Health check endpoint exclusion
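
The sampling and filtering rules above can be sketched as two pure functions. The names `chooseSampleRate` and `filterEvent`, the `/health` path, the full sampling rate outside production, and the key and pattern lists are all assumptions for illustration. The functions mirror the contracts of the Sentry SDK's `tracesSampler` and `beforeSend` options (return a rate, or return `null` to drop an event) without depending on the SDK.

```typescript
interface SamplingInput {
  environment: string
  path: string
}

// Decide the trace sample rate for a request.
function chooseSampleRate({ environment, path }: SamplingInput): number {
  if (path.endsWith('/health')) return 0 // assumed health check path: never traced
  return environment === 'production' ? 0.1 : 1.0 // non-production rate is an assumption
}

interface ErrorEventLike {
  type: string
  message: string
  extra?: Record<string, unknown>
}

const NON_CRITICAL_PATTERNS = [/cors/i, /validation/i]
const SENSITIVE_KEYS = new Set(['password', 'token', 'authorization', 'card_number'])

// Drop non-critical events and redact sensitive fields from the rest.
function filterEvent(event: ErrorEventLike): ErrorEventLike | null {
  const nonCritical = NON_CRITICAL_PATTERNS.some(
    (pattern) => pattern.test(event.type) || pattern.test(event.message),
  )
  if (nonCritical) return null
  if (!event.extra) return event

  const extra: Record<string, unknown> = {}
  for (const [key, value] of Object.entries(event.extra)) {
    extra[key] = SENSITIVE_KEYS.has(key.toLowerCase()) ? '[redacted]' : value
  }
  return { ...event, extra }
}
```

Keeping these rules as pure functions makes them easy to unit test independently of the Sentry client.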

Implementation Details

Standard Implementation Pattern

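// `serve` was missing from the original snippet; the version pin is an assumption
import { serve } from 'https://deno.land/std@0.168.0/http/server.ts'
import {
  initSentryWithTracing,
  withTracing,
  trackSupabaseOperation,
  tracedFetch, // assumed to be exported by the same shared module
} from '../_shared/sentry-tracing.ts'
import { Logger } from '../_shared/observability.ts'
import * as Sentry from 'https://deno.land/x/sentry/index.mjs'

// Initialize Sentry and the logger once per function instance
initSentryWithTracing('function-name')
const logger = new Logger({ functionName: 'function-name' })

// Wrap the handler so every request runs inside a distributed trace
serve(
  withTracing('function-name', async (req) => {
    try {
      // `user`, `organizationId`, and `supabase` are assumed to be resolved
      // from the request beforehand (auth and client setup omitted here)
      Sentry.setUser({ id: user.id })
      Sentry.setContext('organization', { id: organizationId })

      // Track database operations
      const { data, error } = await trackSupabaseOperation('select', 'table', async () =>
        supabase.from('table').select('*')
      )
      if (error) throw error // surface Supabase errors to the catch block

      // Track external API calls
      const response = await tracedFetch('https://api.stripe.com/endpoint', {
        spanName: 'Stripe API Call',
      })

      return new Response(JSON.stringify(data), {
        headers: { 'Content-Type': 'application/json' },
      })
    } catch (error) {
      // Capture with rich context
      Sentry.withScope((scope) => {
        scope.setTag('function.name', 'function-name')
        scope.setTag('error.type', 'operation_failed')
        scope.setContext('operation_context', {
          /* relevant details */
        })
        Sentry.captureException(error)
      })
      // Always return a response so the caller does not hang on failure
      return new Response(JSON.stringify({ error: 'Internal error' }), {
        status: 500,
        headers: { 'Content-Type': 'application/json' },
      })
    }
  })
)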

Key Features Implemented

Distributed Tracing: Full request flow visibility
Security Monitoring: Authentication failures, access violations
Business Event Tracking: Subscriptions, organizations, feature flags
Performance Monitoring: Database and API operation timing
Error Context: User, organization, and operation details
Cost Optimization: Sampling and filtering for Team plan limits

Consequences

Positive

Complete Visibility: 100% of Edge Functions (15/15) now have comprehensive monitoring
Faster Debugging: Rich error context reduces time to resolution
Proactive Monitoring: Catch errors before customer reports
Security Insights: Track authentication failures and suspicious activity
Performance Optimization: Identify slow operations and bottlenecks
Business Intelligence: Track subscription changes, feature adoption
Distributed Tracing: End-to-end request flow visibility
Team Plan Optimized: Stays within 50K errors, 100K performance units/month

Negative

Increased Complexity: More monitoring code in each function
Performance Overhead: Small latency increase from tracking operations
Cost Monitoring Required: Need to watch Sentry usage against Team plan limits
Maintenance Overhead: Each new Edge Function must be wired into the Sentry integration

Risks Mitigated

Customer Impact: Proactive error detection vs reactive customer reports
Security Vulnerabilities: Authentication and access monitoring
Payment Failures: Complete Stripe operation tracking
Database Issues: Comprehensive database operation monitoring
Feature Rollout Issues: Feature flag evaluation tracking

Monitoring and Maintenance

Key Metrics to Track

Error Rate Trends: Monitor for increasing error patterns
Authentication Failures: Track failed login attempts and patterns
Payment Flow Health: Monitor Stripe operation success rates
Database Performance: Track slow queries and operation failures
Feature Flag Adoption: Monitor feature rollout success

Team Plan Usage Monitoring

Monthly Error Budget: 50,000 errors/month
Performance Budget: 100,000 performance units/month
Current Sampling: 10% transaction sampling in production
Cost Optimization: Filter non-critical errors, exclude health checks
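
To make the budget arithmetic concrete, the sketch below projects monthly usage at a given sampling percentage. It assumes that one sampled transaction consumes one performance unit, which should be checked against the current Sentry plan definition; the function names are illustrative.

```typescript
// Budgets stated in this ADR for the Team plan
const MONTHLY_ERROR_BUDGET = 50_000
const MONTHLY_PERFORMANCE_BUDGET = 100_000

// Sampling is expressed as an integer percentage to keep the arithmetic exact.
function projectedTransactions(monthlyRequests: number, samplePercent: number): number {
  return Math.round((monthlyRequests * samplePercent) / 100)
}

// Largest monthly request volume that stays within the budget at this sampling rate.
function maxRequestsWithinBudget(budget: number, samplePercent: number): number {
  return Math.floor((budget * 100) / samplePercent)
}

// Fraction of a budget consumed so far; values above 1 mean the budget is exceeded.
function budgetUsage(used: number, budget: number): number {
  return used / budget
}
```

Under these assumptions, 10% sampling fits up to 1,000,000 requests per month inside the 100,000-unit performance budget. A month with 2,500,000 requests would produce 250,000 sampled transactions, so the sampling rate would need to drop to 4% or lower.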

Maintenance Tasks

Quarterly Review: Assess error patterns and optimization opportunities
Alert Configuration: Set up notifications for critical error thresholds
Performance Analysis: Review slow operations and optimization opportunities
Security Review: Analyze authentication failure patterns

Success Criteria

Immediate (Achieved)

✅ 100% of Edge Functions have comprehensive Sentry integration
✅ All business operations are monitored (authentication, payments, data sync)
✅ Distributed tracing implemented from frontend to database
✅ Security monitoring for authentication and access violations
✅ Rich error context for faster debugging
✅ Complete product catalog and data synchronization monitoring

Short-term (1-3 months)

Establish error rate baselines and alert thresholds
Create Sentry dashboards for business and technical metrics
Configure automated alerts for critical error patterns
Analyze data synchronization patterns and optimize performance

Long-term (3-6 months)

Demonstrate reduced time-to-resolution for production issues
Show proactive issue detection vs customer-reported issues
Optimize performance based on distributed tracing insights
Enhance security posture based on monitoring insights

Alternatives Considered

Alternative 1: Basic Error Logging

Rejected: console.error() logging provides no aggregation, alerting, or context tracking.

Alternative 2: Multiple Monitoring Tools

Rejected: Using separate tools for errors, performance, and business events would increase complexity and cost.

Alternative 3: Gradual Implementation

Rejected: Partial coverage would leave critical gaps in monitoring during high-risk operations.

Alternative 4: Frontend-Only Monitoring

Rejected: Backend errors often occur without frontend visibility, especially in webhook processing and background jobs.
