Source: ocean/docs/adr/0015-user-provisioning-hardening.md
ADR 0018: User Provisioning Hardening
Status
Accepted
Context
The user provisioning system had several security and reliability issues:
- Partial Provisioning: External services (Stripe, Neon) could fail independently, leaving users with incomplete resources
- Security Concerns:
  - CORS used a wildcard (*), allowing any origin
  - No rate limiting on provisioning attempts
  - Missing authorization checks
- Idempotency: Multiple provisioning attempts could create duplicate resources
- Secret Management: Connection strings were being stored in plain text in database tables
- Region Mapping: Inconsistent mapping between UI regions and Neon regions
Decision
Implement a hardened provisioning system with the following changes:
1. Atomic Provisioning
- All-or-nothing resource creation
- Pre-flight checks to verify services are available
- Rollback mechanism for failed provisioning attempts
- Circuit breakers for external services (Stripe, Neon)
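The circuit-breaker item above can be illustrated with a small state machine. This is a minimal sketch, not the project's implementation: the class name, the threshold, and the cooldown are all assumptions, and the clock is passed in explicitly so the behavior is easy to follow.

```typescript
// Minimal circuit breaker sketch. All names and thresholds are assumptions.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number; // consecutive failures before opening
  private readonly cooldownMs: number; // time to stay open before allowing a trial call

  constructor(failureThreshold: number, cooldownMs: number) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Returns true if a call to the external service may be attempted at time `now`.
  canRequest(now: number): boolean {
    if (this.state === "open" && now - this.openedAt >= this.cooldownMs) {
      // Cooldown elapsed: allow trial calls until one of them reports back.
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = "closed";
  }

  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    // A failed trial call, or too many consecutive failures, opens the circuit.
    if (this.state === "half-open" || this.consecutiveFailures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = now;
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

A pre-flight check could consult one breaker per external service (one for Stripe, one for Neon) and refuse to start provisioning while either is open.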
2. Security Enhancements
- CORS Allowlist: Configure allowed origins via ALLOWED_ORIGINS environment variable
- Rate Limiting: 5 provisioning attempts per email per minute using Upstash Redis
- User Recency Check: Only provision for users created within the last 10 minutes
- Authorization: Verify organization ownership before allowing retry provisioning
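The rate limit and the recency check above can be sketched as follows. The limits (5 attempts per minute, 10 minutes) come from this ADR; everything else is an assumption: the fixed-window strategy, the email normalization, the function names, and the in-memory Map, which stands in for Upstash Redis so the example is self-contained.

```typescript
// Fixed-window rate limiter sketch. An in-memory Map stands in for Redis here.
const MAX_ATTEMPTS = 5; // from the ADR: 5 attempts...
const WINDOW_MS = 60_000; // ...per email per minute
const MAX_USER_AGE_MS = 10 * 60_000; // from the ADR: created within the last 10 minutes

interface WindowCounter {
  windowStart: number;
  count: number;
}

const attemptsByEmail = new Map<string, WindowCounter>();

// Returns true if this attempt is allowed, and records it.
function allowProvisioningAttempt(email: string, now: number): boolean {
  const key = email.trim().toLowerCase(); // normalization is an assumption
  const current = attemptsByEmail.get(key);
  if (!current || now - current.windowStart >= WINDOW_MS) {
    // No counter yet, or the previous window has expired: start a new window.
    attemptsByEmail.set(key, { windowStart: now, count: 1 });
    return true;
  }
  if (current.count >= MAX_ATTEMPTS) {
    return false;
  }
  current.count += 1;
  return true;
}

// Returns true if the user was created recently enough to be provisioned.
function isRecentlyCreated(createdAt: Date, now: Date): boolean {
  const ageMs = now.getTime() - createdAt.getTime();
  return ageMs >= 0 && ageMs <= MAX_USER_AGE_MS;
}
```

In a deployed edge function the counter would need to live in shared storage such as Redis, since an in-process Map is not shared between instances.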
3. Idempotency
- Check for existing Stripe customer ID before creation
- Check for existing Neon database before creation
- Use idempotency keys for Stripe API calls
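One way to read the three idempotency rules above is as a planning step that runs before any external call. The sketch below is illustrative only: the field names, step names, and key format are assumptions, not taken from the codebase.

```typescript
// Sketch of idempotent step planning. Field names and key format are assumptions.
interface OrgProvisioningState {
  id: string;
  stripeCustomerId: string | null;
  neonProjectId: string | null;
}

type ProvisionedResource = "stripe_customer" | "neon_database";

interface PlannedStep {
  resource: ProvisionedResource;
  idempotencyKey: string;
}

// Deterministic key: a retry for the same organization and resource reuses the
// same key, so a repeated request can be recognized as a replay.
function idempotencyKeyFor(orgId: string, resource: ProvisionedResource): string {
  return `provision:${orgId}:${resource}`;
}

// Plan creation only for resources that do not exist yet.
function planProvisioning(org: OrgProvisioningState): PlannedStep[] {
  const steps: PlannedStep[] = [];
  if (!org.stripeCustomerId) {
    steps.push({
      resource: "stripe_customer",
      idempotencyKey: idempotencyKeyFor(org.id, "stripe_customer"),
    });
  }
  if (!org.neonProjectId) {
    steps.push({
      resource: "neon_database",
      idempotencyKey: idempotencyKeyFor(org.id, "neon_database"),
    });
  }
  return steps;
}
```

For Stripe, the key would be sent in the Idempotency-Key request header. The existence checks cover the case where an earlier attempt succeeded, and the key covers the case where a request is repeated before its result was recorded.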
4. Secret Management
- Store Neon connection strings in Supabase Vault
- Remove plaintext secrets from database tables
- Reference secrets via vault keys only
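The "reference by key only" rule can be sketched as below. Note that `SecretStore` is a hypothetical interface introduced for illustration, not the Supabase Vault API, and the in-memory class exists only to make the example runnable. The point is the shape of the data: the table row carries a key, never the connection string.

```typescript
// Sketch only: `SecretStore` is a hypothetical interface, not the Supabase Vault API.
interface SecretStore {
  put(name: string, value: string): Promise<string>; // resolves to an opaque secret key
  get(key: string): Promise<string | undefined>;
}

// In-memory stand-in used for illustration only.
class InMemorySecretStore implements SecretStore {
  private secrets = new Map<string, string>();
  private nextId = 1;

  async put(name: string, value: string): Promise<string> {
    const key = `${name}#${this.nextId++}`;
    this.secrets.set(key, value);
    return key;
  }

  async get(key: string): Promise<string | undefined> {
    return this.secrets.get(key);
  }
}

// Shape of the row kept in an ordinary table: a reference, never the secret itself.
interface DatabaseRecord {
  organizationId: string;
  connectionSecretKey: string;
}

async function saveConnectionString(
  store: SecretStore,
  organizationId: string,
  connectionString: string,
): Promise<DatabaseRecord> {
  const connectionSecretKey = await store.put(
    `neon-connection-${organizationId}`,
    connectionString,
  );
  return { organizationId, connectionSecretKey };
}
```

Code that needs the connection string would resolve it through the store at use time, so a dump of the table alone does not expose credentials.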
5. Unified Region Mapping
- Centralized mapDataRegionToNeon() function
- Support for all major AWS regions
- Consistent mapping across the codebase
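A centralized mapping of this kind might look like the sketch below. Only the function name comes from this ADR. The signature, the input keys (plain AWS region names), the specific table entries, and the decision to throw on unknown regions are assumptions, and the Neon region IDs should be checked against Neon's current region list.

```typescript
// Sketch of a centralized region map. Input keys and table contents are assumptions.
const NEON_REGION_BY_DATA_REGION = new Map<string, string>([
  ["us-east-1", "aws-us-east-1"],
  ["us-east-2", "aws-us-east-2"],
  ["us-west-2", "aws-us-west-2"],
  ["eu-central-1", "aws-eu-central-1"],
  ["ap-southeast-1", "aws-ap-southeast-1"],
  ["ap-southeast-2", "aws-ap-southeast-2"],
]);

// Maps a UI data region to a Neon region ID, failing loudly on unknown input
// instead of silently falling back to a default region.
function mapDataRegionToNeon(dataRegion: string): string {
  const key = dataRegion.trim().toLowerCase();
  const neonRegion = NEON_REGION_BY_DATA_REGION.get(key);
  if (neonRegion === undefined) {
    throw new Error(`Unsupported data region: ${dataRegion}`);
  }
  return neonRegion;
}
```

Keeping the table in one module means a new region is added in one place, which is what removes the inconsistency described in the Context section.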
Implementation Details
GraphQL Schema Changes
type Mutation {
  # Pre-OTP provisioning - no userId required
  provisionUserResources(email: String!, metadata: JSON): ProvisioningResult!
  # Retry provisioning with authorization
  retryProvisioning(organizationId: ID!): ProvisioningResult!
}
Atomic Provisioning Service
Located in supabase/functions/graphql-v2/services/atomic-provisioning.ts:
export async function atomicProvisionResources(
  orgId: string,
  user: any,
  supabase: SupabaseClient
): Promise<{ success: boolean; error?: string }>
Key features:
- Pre-flight checks before provisioning
- Sequential resource creation with tracking
- Automatic rollback on failure
- Comprehensive event logging
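The sequential-creation-with-rollback behavior listed above can be sketched generically. This is not the body of atomicProvisionResources, which is not shown in this ADR. The step interface and the function name are assumptions; only the result shape mirrors the signature above.

```typescript
// Sketch of sequential creation with rollback. Step shape and names are assumptions.
interface ProvisioningStep {
  name: string;
  create: () => Promise<string>; // resolves to the created resource ID
  rollback: (resourceId: string) => Promise<void>;
}

async function runStepsAtomically(
  steps: ProvisioningStep[],
): Promise<{ success: boolean; error?: string }> {
  const completed: { step: ProvisioningStep; resourceId: string }[] = [];
  let currentStep = "";
  try {
    for (const step of steps) {
      currentStep = step.name;
      const resourceId = await step.create();
      completed.push({ step, resourceId }); // track what must be undone on failure
    }
    return { success: true };
  } catch (err) {
    // Undo in reverse order so later resources are removed before earlier ones.
    for (const { step, resourceId } of completed.reverse()) {
      try {
        await step.rollback(resourceId);
      } catch {
        // Swallowed in this sketch; a real system should record the failed
        // rollback so the orphaned resource can be cleaned up later.
      }
    }
    const message = err instanceof Error ? err.message : String(err);
    return { success: false, error: `${currentStep} failed: ${message}` };
  }
}
```

Rollback against external services is best-effort, so "all-or-nothing" here means that a failed attempt tries to remove everything it created and reports failure either way.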
Provisioning Events
The enhanced provisioning_events table tracks:
- event_type: Type of provisioning event
- resource_type: Type of resource being provisioned
- resource_id: ID of the created resource
- status: Event status (in_progress, completed, failed)
- metadata: Additional context
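A typed view of an event row might look like the sketch below. The column names and the three status values come from the list above. The TypeScript types, the nullable resource_id, and the helper functions are assumptions.

```typescript
// Sketch of a typed provisioning event row. Column types and helpers are assumptions.
type ProvisioningStatus = "in_progress" | "completed" | "failed";

interface ProvisioningEvent {
  event_type: string;
  resource_type: string;
  resource_id: string | null; // unknown until the resource has been created
  status: ProvisioningStatus;
  metadata: Record<string, unknown>;
}

// Type guard for status values read from untyped input such as a database row.
function isProvisioningStatus(value: string): value is ProvisioningStatus {
  return value === "in_progress" || value === "completed" || value === "failed";
}

// Builds the follow-up event for a finished step without mutating the original.
function completeEvent(event: ProvisioningEvent, resourceId: string): ProvisioningEvent {
  return { ...event, resource_id: resourceId, status: "completed" };
}
```

Writing a new row per state change, rather than updating one row in place, is one way to keep the full audit trail mentioned under Consequences.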
CORS Configuration
// Environment variable: ALLOWED_ORIGINS=https://ocean.goldfish.io,https://ocean.staging.goldfish.io
const allowedOrigins = (Deno.env.get('ALLOWED_ORIGINS') || '')
  .split(',')
  .map((s) => s.trim())
  .filter(Boolean);
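Once the allowlist is parsed, each request's Origin header can be checked against it. The sketch below shows one way to do that. The function names are assumptions, and the environment lookup is replaced by a plain string parameter so the example runs outside Deno.

```typescript
// Sketch of allowlist-based CORS headers. Function names are assumptions.
function parseAllowedOrigins(raw: string | undefined): string[] {
  return (raw ?? "")
    .split(",")
    .map((s) => s.trim())
    .filter(Boolean);
}

function corsHeadersFor(
  requestOrigin: string | null,
  allowedOrigins: string[],
): Record<string, string> {
  // "Vary: Origin" tells caches that the response depends on the request's Origin.
  const headers: Record<string, string> = { Vary: "Origin" };
  if (requestOrigin !== null && allowedOrigins.includes(requestOrigin)) {
    // Echo the specific origin back; the wildcard "*" is never used.
    headers["Access-Control-Allow-Origin"] = requestOrigin;
  }
  return headers;
}
```

A request from an origin outside the list gets no Access-Control-Allow-Origin header, so the browser blocks the cross-origin response.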
Consequences
Positive
- Reliability: No more partial provisioning states
- Security: Proper CORS, rate limiting, and authorization
- Observability: Complete audit trail of provisioning attempts
- Maintainability: Centralized provisioning logic
- User Experience: Resources ready immediately after OTP verification
Negative
- Complexity: More complex provisioning flow with rollback logic
- Latency: Pre-flight checks add slight overhead
- Dependencies: Requires Redis for rate limiting
Trade-offs
- Chose atomic provisioning over eventual consistency for better UX
- Added complexity for rollback to ensure data consistency
- Rate limiting may affect legitimate rapid testing