Source: ocean/docs/adr/0015-user-provisioning-hardening.md

ADR 0018: User Provisioning Hardening

Status

Accepted

Context

The user provisioning system had several security and reliability issues:

  1. Partial Provisioning: External services (Stripe, Neon) could fail independently, leaving users with incomplete resources
  2. Security Concerns:
    • CORS used a wildcard (*), allowing requests from any origin
    • No rate limiting on provisioning attempts
    • Missing authorization checks
  3. Idempotency: Multiple provisioning attempts could create duplicate resources
  4. Secret Management: Connection strings were stored in plaintext in database tables
  5. Region Mapping: Inconsistent mapping between UI regions and Neon regions

Decision

Implement a hardened provisioning system with the following changes:

1. Atomic Provisioning

  • All-or-nothing resource creation
  • Pre-flight checks to verify services are available
  • Rollback mechanism for failed provisioning attempts
  • Circuit breakers for external services (Stripe, Neon)
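
To make the circuit-breaker idea concrete, here is a minimal sketch of one possible shape. The class name, the failure threshold, and the cool-down period are illustrative assumptions, not the values used by the provisioning service; the injected clock exists only to make the behavior easy to test.

```typescript
// Minimal circuit breaker sketch. All names and thresholds here are
// assumptions for illustration, not the project's actual implementation.
type BreakerState = 'closed' | 'open' | 'half-open'

class CircuitBreaker {
  private failures = 0
  private openedAt = 0
  private state: BreakerState = 'closed'

  constructor(
    private readonly failureThreshold: number,
    private readonly resetAfterMs: number,
    private readonly now: () => number = () => Date.now()
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetAfterMs) {
        // Fail fast without contacting the external service.
        throw new Error('circuit open: call rejected')
      }
      // Cool-down elapsed: let one trial call through.
      this.state = 'half-open'
    }
    try {
      const result = await fn()
      this.failures = 0
      this.state = 'closed'
      return result
    } catch (err) {
      this.failures += 1
      // A failed trial call, or too many consecutive failures, opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'
        this.openedAt = this.now()
      }
      throw err
    }
  }

  getState(): BreakerState {
    return this.state
  }
}
```

One breaker instance per external service (one for Stripe, one for Neon) would keep an outage in one service from blocking calls to the other.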

2. Security Enhancements

  • CORS Allowlist: Configure allowed origins via ALLOWED_ORIGINS environment variable
  • Rate Limiting: At most 5 provisioning attempts per email per minute, enforced with Upstash Redis
  • User Recency Check: Only provision for users created within the last 10 minutes
  • Authorization: Verify organization ownership before allowing retry provisioning
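
The rate limit and the recency check can be sketched as two small predicates. This is an in-memory stand-in under stated assumptions: the real limiter is backed by Upstash Redis, and the function names, the sliding-window approach, and the email normalization below are hypothetical.

```typescript
// In-memory stand-in for the Redis-backed limiter. Illustrative only:
// function names and the sliding-window strategy are assumptions.
const WINDOW_MS = 60 * 1000 // one minute
const MAX_ATTEMPTS = 5 // attempts per email per window
const MAX_USER_AGE_MS = 10 * 60 * 1000 // ten minutes

const attemptsByEmail = new Map<string, number[]>()

// Returns true if this attempt is allowed, false if the email is over the limit.
function allowProvisioningAttempt(email: string, nowMs: number): boolean {
  const key = email.trim().toLowerCase()
  // Keep only the attempts that still fall inside the window.
  const recent = (attemptsByEmail.get(key) || []).filter((t) => nowMs - t < WINDOW_MS)
  if (recent.length >= MAX_ATTEMPTS) {
    attemptsByEmail.set(key, recent)
    return false
  }
  recent.push(nowMs)
  attemptsByEmail.set(key, recent)
  return true
}

// Returns true only for users created within the last ten minutes.
function isRecentlyCreated(createdAtIso: string, nowMs: number): boolean {
  const createdMs = Date.parse(createdAtIso)
  if (Number.isNaN(createdMs)) return false
  const ageMs = nowMs - createdMs
  return ageMs >= 0 && ageMs <= MAX_USER_AGE_MS
}
```

A process-local map like this would not be shared across function instances, which is why a shared store such as Redis is needed in production.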

3. Idempotency

  • Check for existing Stripe customer ID before creation
  • Check for existing Neon database before creation
  • Use idempotency keys for Stripe API calls
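
A minimal sketch of the check-then-create pattern combined with a deterministic idempotency key follows. The helper names, the key format, and the `OrgResources` shape are hypothetical; `createCustomer` stands in for the real Stripe call.

```typescript
// Hypothetical helper: the same organization and operation always yield the
// same key, so a retried request cannot create a second resource.
function stripeIdempotencyKey(orgId: string, operation: string): string {
  return `provision:${orgId}:${operation}`
}

// Hypothetical shape of the resource IDs already stored for an organization.
interface OrgResources {
  stripeCustomerId?: string
  neonProjectId?: string
}

// Reuses an existing customer ID if present; otherwise creates one.
// createCustomer is a stand-in for the actual Stripe API call.
async function ensureStripeCustomer(
  orgId: string,
  existing: OrgResources,
  createCustomer: (idempotencyKey: string) => Promise<string>
): Promise<string> {
  if (existing.stripeCustomerId) {
    return existing.stripeCustomerId
  }
  const customerId = await createCustomer(stripeIdempotencyKey(orgId, 'create-customer'))
  existing.stripeCustomerId = customerId
  return customerId
}
```

With the stripe-node client, the key would be passed through the `idempotencyKey` request option. The existence check covers retries after a successful creation, while the key covers retries of a request whose response was lost.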

4. Secret Management

  • Store Neon connection strings in Supabase Vault
  • Remove plaintext secrets from database tables
  • Reference secrets via vault keys only
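
The idea of referencing secrets by key only can be illustrated with a small sketch. The `SecretStore` interface and its in-memory implementation are hypothetical stand-ins for Supabase Vault, and the key format and record shape are assumptions; the point is only that the database record carries a key, never the connection string itself.

```typescript
// Hypothetical interface standing in for a vault client.
interface SecretStore {
  put(key: string, value: string): void
  get(key: string): string | undefined
}

// In-memory implementation used only for illustration.
class InMemorySecretStore implements SecretStore {
  private readonly secrets = new Map<string, string>()
  put(key: string, value: string): void {
    this.secrets.set(key, value)
  }
  get(key: string): string | undefined {
    return this.secrets.get(key)
  }
}

// Hypothetical database row: holds a reference to the secret, not the secret.
interface DatabaseRecord {
  orgId: string
  connectionSecretKey: string
}

// Stores the connection string in the secret store and returns a row
// that contains only the key needed to look it up later.
function storeConnectionString(
  store: SecretStore,
  orgId: string,
  connectionString: string
): DatabaseRecord {
  const key = `neon/connection/${orgId}`
  store.put(key, connectionString)
  return { orgId, connectionSecretKey: key }
}
```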

5. Unified Region Mapping

  • Centralized mapDataRegionToNeon() function
  • Support for all major AWS regions
  • Consistent mapping across the codebase
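
A centralized mapping function could look like the sketch below. The function name comes from this ADR, but the signature, the input format, and the table contents are assumptions: the table is a partial example, and the Neon region IDs should be verified against Neon's current region list.

```typescript
// Partial, illustrative lookup table. The input keys (plain AWS region names)
// and the listed Neon region IDs are assumptions to verify before use.
const NEON_REGION_BY_DATA_REGION = new Map<string, string>([
  ['us-east-1', 'aws-us-east-1'],
  ['us-east-2', 'aws-us-east-2'],
  ['us-west-2', 'aws-us-west-2'],
  ['eu-central-1', 'aws-eu-central-1'],
  ['ap-southeast-1', 'aws-ap-southeast-1'],
])

// Maps a UI data region to a Neon region ID, failing loudly on unknown input
// rather than silently falling back to a default region.
function mapDataRegionToNeon(dataRegion: string): string {
  const neonRegion = NEON_REGION_BY_DATA_REGION.get(dataRegion.trim().toLowerCase())
  if (neonRegion === undefined) {
    throw new Error(`Unsupported data region: ${dataRegion}`)
  }
  return neonRegion
}
```

Throwing on an unknown region is a design choice in this sketch; a silent default could place a customer's data in a region they did not select.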

Implementation Details

GraphQL Schema Changes

type Mutation {
  # Pre-OTP provisioning - no userId required
  provisionUserResources(email: String!, metadata: JSON): ProvisioningResult!

  # Retry provisioning with authorization
  retryProvisioning(organizationId: ID!): ProvisioningResult!
}

Atomic Provisioning Service

Located in supabase/functions/graphql-v2/services/atomic-provisioning.ts:

export async function atomicProvisionResources(
  orgId: string,
  user: any,
  supabase: SupabaseClient
): Promise<{ success: boolean; error?: string }>

Key features:

  • Pre-flight checks before provisioning
  • Sequential resource creation with tracking
  • Automatic rollback on failure
  • Comprehensive event logging
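
The sequential-creation-with-rollback behavior described above can be sketched as follows. This is not the body of `atomicProvisionResources`; the `ProvisioningStep` interface and `runAtomically` function are hypothetical and only mirror its result shape.

```typescript
// Hypothetical step definition: each step knows how to create its resource
// and how to undo that creation.
interface ProvisioningStep {
  name: string
  create: () => Promise<string> // resolves to the created resource ID
  rollback: (resourceId: string) => Promise<void>
}

// Runs steps in order and tracks what was created. If any step fails,
// the completed steps are rolled back in reverse order.
async function runAtomically(
  steps: ProvisioningStep[]
): Promise<{ success: boolean; error?: string }> {
  const completed: Array<{ step: ProvisioningStep; resourceId: string }> = []
  try {
    for (const step of steps) {
      const resourceId = await step.create()
      completed.push({ step, resourceId })
    }
    return { success: true }
  } catch (err) {
    for (const { step, resourceId } of completed.reverse()) {
      try {
        await step.rollback(resourceId)
      } catch (_rollbackError) {
        // A real implementation would record this failure for manual cleanup.
      }
    }
    return { success: false, error: err instanceof Error ? err.message : String(err) }
  }
}
```

Rolling back in reverse order means a resource that depends on an earlier one is removed first.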

Provisioning Events

Enhanced provisioning_events table tracks:

  • event_type: Type of provisioning event
  • resource_type: Type of resource being provisioned
  • resource_id: ID of the created resource
  • status: Event status (in_progress, completed, failed)
  • metadata: Additional context
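
For reference, the tracked columns could be typed along these lines. The column names and status values come from the list above; the TypeScript types, nullability, the `buildProvisioningEvent` helper, and the example values are assumptions.

```typescript
// Status values as listed for the provisioning_events table.
type ProvisioningStatus = 'in_progress' | 'completed' | 'failed'

// Assumed row shape: column names from the ADR, types are guesses.
interface ProvisioningEvent {
  event_type: string
  resource_type: string
  resource_id: string | null // null until the resource actually exists
  status: ProvisioningStatus
  metadata: Record<string, unknown>
}

// Hypothetical helper that fills in defaults for optional fields.
function buildProvisioningEvent(
  eventType: string,
  resourceType: string,
  status: ProvisioningStatus,
  resourceId: string | null = null,
  metadata: Record<string, unknown> = {}
): ProvisioningEvent {
  return {
    event_type: eventType,
    resource_type: resourceType,
    resource_id: resourceId,
    status,
    metadata,
  }
}
```

A failed Neon creation might then be recorded as `buildProvisioningEvent('resource_create', 'neon_database', 'failed', null, { reason: 'timeout' })`, where the event and resource type strings are illustrative.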

CORS Configuration

// Environment variable: ALLOWED_ORIGINS=https://ocean.goldfish.io,https://ocean.staging.goldfish.io
const allowedOrigins = (Deno.env.get('ALLOWED_ORIGINS') || '')
  .split(',')
  .map((s) => s.trim())
  .filter(Boolean)
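
Once the allowlist is parsed, it still has to be applied to each request. The sketch below shows one way that check might look; the function names are hypothetical, and the raw value is passed in as a parameter so the example does not depend on the Deno runtime.

```typescript
// Same parsing as above, with the raw value passed in explicitly.
function parseAllowedOrigins(raw: string | undefined): string[] {
  return (raw || '')
    .split(',')
    .map((s) => s.trim())
    .filter(Boolean)
}

// Hypothetical helper: echoes the request origin back only when it is on the
// allowlist. Unknown origins get no CORS headers, so browsers block the response.
function corsHeadersFor(
  requestOrigin: string | null,
  allowed: string[]
): Record<string, string> {
  if (requestOrigin !== null && allowed.includes(requestOrigin)) {
    return {
      'Access-Control-Allow-Origin': requestOrigin,
      // The response varies by Origin, so caches must not reuse it across origins.
      Vary: 'Origin',
    }
  }
  return {}
}
```

An empty or unset ALLOWED_ORIGINS yields an empty allowlist, so the default is to deny all cross-origin requests rather than fall back to the wildcard.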

Consequences

Positive

  • Reliability: No more partial provisioning states
  • Security: Proper CORS, rate limiting, and authorization
  • Observability: Complete audit trail of provisioning attempts
  • Maintainability: Centralized provisioning logic
  • User Experience: Resources ready immediately after OTP verification

Negative

  • Complexity: More complex provisioning flow with rollback logic
  • Latency: Pre-flight checks add slight overhead
  • Dependencies: Requires Redis for rate limiting

Trade-offs

  • Chose atomic provisioning over eventual consistency for better UX
  • Added complexity for rollback to ensure data consistency
  • Rate limiting may affect legitimate rapid testing
