Source: ocean/docs/user-provisioning.md
User Provisioning Hardening Work Plan
Objective
- Strengthen security and authorization of the provisioning flow
- Ensure idempotency for Stripe and Neon to prevent double-creation and billing drift
- Consolidate to a single, maintainable backend provisioning path
- Eliminate plaintext secret storage; use Supabase Secrets (Vault) exclusively
- Unify region mapping and align with UI values
- Improve observability using OpenTelemetry-style spans and Prometheus-style metrics
Scope
- Supabase Edge Functions: supabase/functions/graphql-v2/**
- Frontend: src/services/auth.ts, src/services/provisioning.ts, signup flow
- Database: secret handling and provisioning records
- CORS and rate limiting for GraphQL endpoint
- No ORM changes: continue using Drizzle for Neon and CrunchyBridge, Supabase SQL/PL/pgSQL for Supabase, and Deno for Edge Functions
End-to-end flow summary (pre-OTP provisioning)
- User submits signup with mandatory fields: name, organization, industry, and data region.
- Frontend calls supabase.auth.signInWithOtp({ email, options: { shouldCreateUser: true, data: metadata } }) to create the user and send the OTP.
- Immediately (without waiting for OTP verification), the frontend calls the GraphQL mutation provisionUserResources(email, metadata).
- GraphQL (service role):
- Looks up the new user by email and verifies recency.
- Ensures there is an owner organization for the user from metadata.
- In parallel: creates Stripe customer (idempotent) and Neon project in the selected region (idempotent).
- Persists results: organizations.stripe_customer_id; organization_databases row for Neon; Neon connection secret stored in Supabase Secrets (Vault) and referenced by a vault_key.
- While the user is receiving the OTP, provisioning is in progress.
- When the user enters the OTP, the app routes to the dashboard with resources already provisioned.
- On first dashboard load, a background validateOrganizationReady call can idempotently heal partial provisioning if any step failed.
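The parallel step in the flow above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `ProvisioningResult` shape, the `ProvisioningSteps` interface, and the `provisionInParallel` name are all assumptions, and the Stripe and Neon calls are injected as plain functions so the sketch does not depend on either SDK.

```typescript
// Hypothetical result shape; the real ProvisioningResult is defined in schema.ts.
interface ProvisioningResult {
  success: boolean;
  stripeCustomerId: string | null;
  neonProjectId: string | null;
  errors: string[];
}

// Both steps are injected, so this sketch makes no claims about SDK signatures.
interface ProvisioningSteps {
  ensureStripeCustomer: () => Promise<string>;
  ensureNeonProject: () => Promise<string>;
}

async function provisionInParallel(steps: ProvisioningSteps): Promise<ProvisioningResult> {
  // allSettled waits for both steps, so one failure does not hide the other outcome.
  const [stripe, neon] = await Promise.allSettled([
    steps.ensureStripeCustomer(),
    steps.ensureNeonProject(),
  ]);

  const errors: string[] = [];
  if (stripe.status === "rejected") errors.push(`stripe: ${errorMessage(stripe.reason)}`);
  if (neon.status === "rejected") errors.push(`neon: ${errorMessage(neon.reason)}`);

  return {
    success: errors.length === 0,
    stripeCustomerId: stripe.status === "fulfilled" ? stripe.value : null,
    neonProjectId: neon.status === "fulfilled" ? neon.value : null,
    errors,
  };
}

// Reduces an unknown rejection reason to a loggable string (never log secrets here).
function errorMessage(reason: unknown): string {
  return reason instanceof Error ? reason.message : String(reason);
}
```

Because each step is expected to be idempotent, a later validateOrganizationReady call can rerun the same function to fill in whichever ID is still null.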
Phase 1: Secure the API contract and authorization
- Pre-OTP provisioning via email + metadata (primary path)
- Client flow: After signInWithOtp returns success, immediately call the GraphQL mutation with email and metadata (name, organization, industry, data region). No userId is sent from the client.
- Schema: Change provisionUserResources(userId: ID!, email: String!, metadata: JSON) to provisionUserResources(email: String!, metadata: JSON): ProvisioningResult! in supabase/functions/graphql-v2/schema.ts.
- Resolver: In supabase/functions/graphql-v2/resolvers/provisioning.ts, use the service-role Supabase client to:
- Look up the newly created auth.users record by email and verify it was created very recently (e.g., within the last few minutes) to prevent abuse.
- Derive userId and find or lazily create the owner organization from metadata.
- Enforce rate limiting (Upstash) and implement idempotency checks before creating Stripe/Neon resources.
- If an auth token is present (magic-link flow), allow the authenticated path as well, but the primary path is pre-OTP email-based.
- Provisioning must start immediately after signInWithOtp returns success; do not wait for OTP verification.
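The recency check in the resolver step could look like the sketch below. The five-minute window, the 30-second clock-skew tolerance, and the function name are assumptions chosen for illustration; the plan only specifies "within the last few minutes".

```typescript
// Hypothetical limits; tune them to the actual OTP delivery window.
const MAX_ACCOUNT_AGE_MS = 5 * 60 * 1000;
const CLOCK_SKEW_TOLERANCE_MS = 30 * 1000;

/**
 * Returns true when the auth.users row was created within maxAgeMs of `now`.
 * Unparseable timestamps and timestamps too far in the future are rejected.
 */
function isRecentlyCreated(
  createdAtIso: string,
  now: Date = new Date(),
  maxAgeMs: number = MAX_ACCOUNT_AGE_MS,
): boolean {
  const createdAtMs = Date.parse(createdAtIso);
  if (Number.isNaN(createdAtMs)) return false;

  const ageMs = now.getTime() - createdAtMs;
  // A small negative age is tolerated because database and function clocks can differ.
  return ageMs >= -CLOCK_SKEW_TOLERANCE_MS && ageMs <= maxAgeMs;
}
```

The resolver would pass the created_at value of the looked-up user and refuse to provision when the function returns false.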
- Guard resolver access
- Verify organization ownership/membership before provisioning (owner/admin).
- Reject cross-user attempts.
- Restrict CORS
- Edit supabase/functions/graphql-v2/index.ts to allowlist app origins instead of *.
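One way to express the allowlist is a small helper that only echoes back a known origin. The origins, the header list, and the `corsHeaders` name below are placeholders, not values taken from the project.

```typescript
// Hypothetical origins; replace with the app's real domains.
const ALLOWED_ORIGINS: ReadonlySet<string> = new Set([
  "https://app.example.com",
  "http://localhost:5173",
]);

/**
 * Builds CORS response headers for a request Origin.
 * Access-Control-Allow-Origin is set only for allowlisted origins, so browsers
 * block responses to any other origin.
 */
function corsHeaders(origin: string | null): Record<string, string> {
  const headers: Record<string, string> = {
    // Vary tells caches that the response depends on the request Origin.
    "Vary": "Origin",
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    "Access-Control-Allow-Headers": "authorization, content-type, apikey, x-client-info",
  };
  if (origin !== null && ALLOWED_ORIGINS.has(origin)) {
    headers["Access-Control-Allow-Origin"] = origin;
  }
  return headers;
}
```

In the Edge Function handler this would typically be called with `req.headers.get("Origin")` for both the OPTIONS preflight and the actual POST response.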
- Add basic rate limiting
- Use Upstash Redis in the graphql-v2 resolver to throttle provisioning attempts by email/IP (e.g., 5/min). Return HTTP 429 when the limit is exceeded and log the event to Sentry.
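A fixed-window limiter keyed by email and IP is one simple way to get the 5/min behaviour. The sketch below hides Redis behind a hypothetical `CounterStore` interface (with Upstash this would usually be an INCR followed by an EXPIRE) and ships an in-memory stand-in for tests; none of these names come from the codebase.

```typescript
// Hypothetical store contract: increment a counter and start a TTL on first use.
interface CounterStore {
  incrementWithTtl(key: string, ttlSeconds: number): Promise<number>;
}

// In-memory stand-in for tests only; Edge Function isolates do not share this state.
class InMemoryCounterStore implements CounterStore {
  private counters = new Map<string, { count: number; expiresAt: number }>();

  async incrementWithTtl(key: string, ttlSeconds: number): Promise<number> {
    const now = Date.now();
    const entry = this.counters.get(key);
    if (entry === undefined || entry.expiresAt <= now) {
      this.counters.set(key, { count: 1, expiresAt: now + ttlSeconds * 1000 });
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

interface RateLimitDecision {
  allowed: boolean;
  status: 200 | 429;
}

async function checkProvisioningRateLimit(
  store: CounterStore,
  email: string,
  ip: string,
  limit = 5,
  windowSeconds = 60,
): Promise<RateLimitDecision> {
  // Normalizing the email keeps "A@x.com" and "a@x.com" in the same bucket.
  const keys = [
    `ratelimit:provisioning:email:${email.trim().toLowerCase()}`,
    `ratelimit:provisioning:ip:${ip}`,
  ];
  const counts = await Promise.all(keys.map((key) => store.incrementWithTtl(key, windowSeconds)));
  // The request is allowed only if both the email and the IP are under the limit.
  const allowed = counts.every((count) => count <= limit);
  return { allowed, status: allowed ? 200 : 429 };
}
```

A fixed window can admit a short burst across a window boundary; a sliding-window algorithm would be a stricter alternative if that matters here.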
Phase 2: Idempotency and consolidation
- Stripe idempotency and pre-checks
- Before creating customer, check organizations.stripe_customer_id; skip if present.
- When creating, include Idempotency-Key header, e.g. provisioning:org:${orgId}:stripe-customer.
- Neon idempotency and pre-checks
- Before creating project, check organization_databases for existing active record; skip if present.
- Optionally verify with Neon during health checks, not during hot path.
- Consolidate provisioning logic
- Canonicalize on supabase/functions/graphql-v2/services/provisioning.ts for external calls and storage.
- Update resolvers/provisioning.ts to delegate to the service and return normalized results only.
- Deprecate supabase/functions/provision-user-resources/index.ts (if it exists).
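The Stripe pre-check and idempotency key from this phase might be combined as in the sketch below. The `Organization` shape, the injected `createCustomer` and `saveCustomerId` callbacks, and the function names are assumptions; the actual Stripe call and database write would live behind those callbacks in services/provisioning.ts.

```typescript
// Minimal view of the organizations row needed for the pre-check.
interface Organization {
  id: string;
  stripe_customer_id: string | null;
}

// Key format taken from the plan: provisioning:org:${orgId}:stripe-customer.
function stripeCustomerIdempotencyKey(orgId: string): string {
  return `provisioning:org:${orgId}:stripe-customer`;
}

// Hypothetical callbacks wrapping the Stripe API call and the database update.
type CreateCustomer = (args: { orgId: string; idempotencyKey: string }) => Promise<string>;
type SaveCustomerId = (orgId: string, customerId: string) => Promise<void>;

async function ensureStripeCustomer(
  org: Organization,
  createCustomer: CreateCustomer,
  saveCustomerId: SaveCustomerId,
): Promise<{ customerId: string; created: boolean }> {
  // Pre-check: a stored ID means provisioning already happened, so skip the API call.
  if (org.stripe_customer_id) {
    return { customerId: org.stripe_customer_id, created: false };
  }
  // The idempotency key makes a concurrent or retried create return the same customer.
  const customerId = await createCustomer({
    orgId: org.id,
    idempotencyKey: stripeCustomerIdempotencyKey(org.id),
  });
  await saveCustomerId(org.id, customerId);
  return { customerId, created: true };
}
```

Stripe may discard idempotency keys once they are at least 24 hours old, so the database pre-check remains the long-term guard and the key mainly covers retries and races. The same pre-check pattern applies to Neon via organization_databases.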
Phase 3: Secrets handling and schema hygiene
- Eliminate plaintext secrets
- Remove writes of neon_role_password and connection_string to tables from all paths.
- Store connection URIs and credentials only via Supabase Secrets (Vault) using vault_store_secret and keep a vault_key reference in the table.
- Schema cleanup
- If columns (neon_role_password, connection_string) exist, set to NULL and stop using them; plan a follow-up migration to drop after sanitization.
- Use organization_databases as canonical inventory; keep minimal flags/foreign keys on organizations.
- Guardrails
- Add a repo check (AST-based) to block writes of known secret fields to DB tables.
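As a complement to the AST-based repo check (which this sketch does not implement), a runtime guard in the storage layer can refuse rows that carry secret values. The function names are hypothetical; the field list comes from the plan.

```typescript
// Columns that must never receive a plaintext value, per Phase 3.
const FORBIDDEN_SECRET_FIELDS = ["neon_role_password", "connection_string"] as const;

// Returns the forbidden fields that carry a value; NULL is allowed so rows can be sanitized.
function findSecretFields(row: Record<string, unknown>): string[] {
  return FORBIDDEN_SECRET_FIELDS.filter(
    (field) => row[field] !== undefined && row[field] !== null,
  );
}

// Throws before a write if the payload contains a secret; the message names fields only.
function assertNoSecretFields(table: string, row: Record<string, unknown>): void {
  const found = findSecretFields(row);
  if (found.length > 0) {
    throw new Error(`Refusing to write secret field(s) ${found.join(", ")} to ${table}`);
  }
}
```

Calling `assertNoSecretFields("organization_databases", row)` immediately before each insert or update keeps the rule enforced even if a code path slips past static checks.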
Phase 4: Region mapping unification
- Single mapping function
- Implement mapDataRegionToNeon(region: 'us-east-1' | ...) in graphql-v2/services/provisioning.ts.
- Replace all ad hoc mappings with this function.
- UI alignment and validation
- Confirm signup UI emits AWS-style values (us-east-1, eu-west-2, ap-southeast-1).
- Validate and reject unsupported regions server-side with a clear error.
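A table-driven mapping keeps the UI values and the Neon region IDs in one place and makes the server-side rejection trivial. The sketch assumes Neon region IDs take the form `aws-<region>`; verify the exact IDs against the Neon API before relying on them. Only the three UI values named above are included.

```typescript
// Assumed Neon region IDs ("aws-" prefix); confirm against the Neon API region list.
const NEON_REGION_BY_DATA_REGION = {
  "us-east-1": "aws-us-east-1",
  "eu-west-2": "aws-eu-west-2",
  "ap-southeast-1": "aws-ap-southeast-1",
} as const;

type DataRegion = keyof typeof NEON_REGION_BY_DATA_REGION;

// Type guard used to validate untrusted input from the signup metadata.
function isDataRegion(value: string): value is DataRegion {
  return Object.prototype.hasOwnProperty.call(NEON_REGION_BY_DATA_REGION, value);
}

// Maps a UI data region to a Neon region ID, rejecting anything unsupported.
function mapDataRegionToNeon(region: string): string {
  if (!isDataRegion(region)) {
    const supported = Object.keys(NEON_REGION_BY_DATA_REGION).join(", ");
    throw new Error(`Unsupported data region "${region}". Supported regions: ${supported}`);
  }
  return NEON_REGION_BY_DATA_REGION[region];
}
```

Exporting the `DataRegion` type to the frontend would let the signup form and the resolver share a single source of truth for valid values.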
Phase 5: Observability and reliability
- OpenTelemetry spans
- Wrap Stripe and Neon calls with spans (attributes: organization.id, user.id, region, operation).
- Prometheus-style metrics
- Counters: provisioning_attempt_total, provisioning_success_total, provisioning_error_total with service={stripe|neon} and reason labels.
- Histograms: provisioning_external_call_duration_ms for Stripe/Neon requests.
- Error handling
- On partial failure, log to Sentry with context and return success=false and errors[]. Never log secrets.
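The counters and the duration histogram listed in this phase can be recorded by a single wrapper around each external call. The `Metrics` interface below is a hypothetical abstraction rather than a real metrics client API, and using the error's `name` as the `reason` label is an assumption.

```typescript
type ExternalService = "stripe" | "neon";

// Hypothetical sink; an adapter would forward these to the real metrics backend.
interface Metrics {
  increment(name: string, labels: Record<string, string>): void;
  observe(name: string, valueMs: number, labels: Record<string, string>): void;
}

/**
 * Runs an external call and records attempt, success or error, and duration.
 * The original error is rethrown so the caller can build errors[] and report to Sentry.
 */
async function instrumentedCall<T>(
  metrics: Metrics,
  service: ExternalService,
  call: () => Promise<T>,
): Promise<T> {
  const labels = { service };
  metrics.increment("provisioning_attempt_total", labels);
  const startedAt = Date.now();
  try {
    const result = await call();
    metrics.increment("provisioning_success_total", labels);
    return result;
  } catch (error) {
    // Only the error class name is used as a label; messages may contain sensitive data.
    const reason = error instanceof Error ? error.name : "unknown";
    metrics.increment("provisioning_error_total", { ...labels, reason });
    throw error;
  } finally {
    // Duration is recorded for both successful and failed calls.
    metrics.observe("provisioning_external_call_duration_ms", Date.now() - startedAt, labels);
  }
}
```

A span could be opened and closed in the same wrapper, with organization.id, user.id, region, and operation passed in as attributes.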
Phase 6: Frontend changes
- Remove reliance on getUser() immediately after signInWithOtp
- Edit src/services/auth.ts: do not expect a user session until the OTP has been verified.
- Update provisioning client
- Edit src/services/provisioning.ts to call mutation without userId; pass email and include a token when available.
- Prefer server-triggered provisioning or rely on first authenticated organization query to idempotently ensure resources (validateOrganizationReady).
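On the client, the resulting signup flow might look like the sketch below. The `SignupDeps` callbacks are hypothetical wrappers: `sendOtp` would call supabase.auth.signInWithOtp and `provision` would send the provisionUserResources mutation. The point of the sketch is that provisioning is started without being awaited and without calling getUser().

```typescript
interface SignupMetadata {
  name: string;
  organization: string;
  industry: string;
  dataRegion: string;
}

// Hypothetical dependencies, injected so the flow can be tested without Supabase.
interface SignupDeps {
  sendOtp: (email: string, metadata: SignupMetadata) => Promise<{ error: Error | null }>;
  provision: (email: string, metadata: SignupMetadata) => Promise<unknown>;
  onProvisioningError: (error: unknown) => void;
}

async function startSignup(
  deps: SignupDeps,
  email: string,
  metadata: SignupMetadata,
): Promise<{ otpSent: boolean }> {
  const { error } = await deps.sendOtp(email, metadata);
  if (error) return { otpSent: false };

  // Fire and forget: provisioning runs while the user waits for the OTP.
  // No userId is sent and no session is expected at this point.
  void deps.provision(email, metadata).catch(deps.onProvisioningError);

  return { otpSent: true };
}
```

Failures are routed to `onProvisioningError` instead of blocking the OTP screen, on the assumption that validateOrganizationReady heals any gap on first dashboard load.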
Phase 7: Tests (cloud-first)
- Integration tests (using test keys)
- Idempotency: call provisioning twice; assert one Stripe customer and one Neon project persisted.
- Region validation: reject invalid region.
- Partial failure: simulate Neon failure; ensure Stripe succeeds and errors reported.
- Security tests
- Unauthenticated provisioning is rejected.
- Cross-user attempts denied.
- CORS denies unknown origins.
- Secret hygiene tests
- Automated check that no code writes neon_role_password/connection_string to tables.
Phase 8: RLS and permissions
- RLS enforcement
- Verify RLS policies on organizations and organization_databases disallow user writes to provisioning fields.
- Service role usage
- Ensure GraphQL function uses service role for provisioning writes; never expose privileged writes to browser.
Rollout and validation
- Deploy GraphQL with updated schema and resolver guards.
- Validate CORS allowlist.
- Exercise provisioning twice for a test org; confirm idempotency and metrics.
- Confirm Supabase Secrets (Vault) holds Neon connection secret and tables contain no plaintext secrets.
- Validate region mapping end-to-end from UI to Neon.
- Update ADR 017-synchronous-user-provisioning.md to reflect API contract changes (no userId in mutation) and idempotency policy.
- Confirm dashboard loads with ready resources immediately after OTP verification in typical network conditions.
Acceptance criteria
- Provisioning mutation rejects unauthenticated and cross-user attempts
- Double invocation does not create duplicate Stripe customers or Neon projects
- No plaintext Neon credentials in tables or logs; Supabase Secrets (Vault) used exclusively
- Region mapping consistent and validated against UI values
- CORS locked down and rate limiting active
- Spans and metrics present for Stripe/Neon; Sentry captures failures with context
- Provisioning starts immediately after signInWithOtp success and completes within the OTP delivery window so the dashboard is ready on first load