Source: ocean/docs/adr/0041-posthog-cdp-supabase-integration.md

ADR-004: PostHog CDP Integration with Supabase for Privacy-First Analytics

Status

Accepted

Date

2025-07-31

Context

We needed a comprehensive analytics solution that could:

  1. Track user behavior and feature usage across our application
  2. Provide business intelligence on organization growth and subscription metrics
  3. Maintain strict privacy compliance (GDPR, data anonymization)
  4. Integrate seamlessly with our existing Supabase infrastructure
  5. Minimize performance impact on our application

Previous Approach

  • Manual PostHog event tracking via client-side and Edge Function calls
  • Limited data correlation between analytics events and database state
  • Risk of PII exposure through direct user data tracking
  • Maintenance overhead for tracking code across the application

Requirements

  • Privacy: No exposure of PII (emails, real names, addresses)
  • Performance: Minimal impact on application performance
  • Completeness: Capture all relevant user and organization data
  • Security: Secure database access with proper authentication
  • Maintainability: Reduce manual tracking code and maintenance

Decision

We will implement PostHog's Customer Data Platform (CDP) integration with Supabase to automatically sync database tables for analytics, using privacy-protected database views.

Architecture Components

1. Privacy-Protected Database Views

Create secure views that expose analytics data without PII:

  • posthog_users: User data with hashed emails, industry, and region
  • posthog_user_profiles: Combined user/organization data with anonymized IDs
  • posthog_organization_metrics: Aggregated organization statistics

2. Database Connection Configuration

  • Connection Method: Supabase Session Pooler (aws-0-us-east-2.pooler.supabase.com:5432)
  • Authentication: Project-specific postgres user (postgres.{project-ref})
  • Network Security: PostHog IP whitelist (US region IPs)
  • SSL/TLS: Required for secure data transmission

3. Data Synchronization Strategy

  • Incremental Sync: For mutable data (organizations, profiles, user views)
  • Full Table Sync: For aggregated metrics views
  • Append-Only Sync: For event/audit data (feature_flag_evaluations)
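
As a rough illustration of how the three modes differ, the following Python sketch applies each one to in-memory rows. The function names and the `updated_at` and `id` cursor columns are assumptions made for illustration; PostHog's actual sync implementation is not described in this ADR.

```python
from datetime import datetime, timezone


def incremental_sync(rows, cursor):
    """Return rows modified after `cursor`, plus the advanced cursor."""
    changed = [r for r in rows if r["updated_at"] > cursor]
    new_cursor = max((r["updated_at"] for r in changed), default=cursor)
    return changed, new_cursor


def full_table_sync(rows):
    """Re-read every row; suited to small aggregated views."""
    return list(rows)


def append_only_sync(rows, last_id):
    """Return only rows appended after `last_id` (rows are never updated)."""
    new_rows = [r for r in rows if r["id"] > last_id]
    new_last_id = max((r["id"] for r in new_rows), default=last_id)
    return new_rows, new_last_id


if __name__ == "__main__":
    def day(d):
        return datetime(2025, 7, d, tzinfo=timezone.utc)

    orgs = [{"id": 1, "updated_at": day(1)}, {"id": 2, "updated_at": day(20)}]
    changed, cursor = incremental_sync(orgs, cursor=day(10))
    print(len(changed), cursor.isoformat())
```

Incremental sync transfers only rows past the cursor, which is why it is preferred for mutable tables, while full-table sync stays simple for views that have no reliable modification timestamp.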

4. Privacy Protection Measures

  • Email Hashing: SHA-256 hashing of email addresses for user identification
  • ID Anonymization: Hash organization and user IDs in analytics views
  • Data Filtering: Exclude internal/test users and sensitive metadata
  • Domain-Only Exposure: Only email domains exposed for organizational tracking
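
The measures above can be sketched in Python to show what an analytics row would contain after transformation. This is a minimal sketch, not the production logic (which lives in the SQL views): `to_analytics_row`, `is_internal`, and the `INTERNAL_DOMAINS` set are hypothetical names, and the input dictionary shape is assumed.

```python
import hashlib

# Hypothetical list of domains treated as internal/test accounts.
INTERNAL_DOMAINS = {"example.com"}


def sha256_hex(value: str) -> str:
    """Hex-encoded SHA-256 of the UTF-8 bytes of `value`."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def email_domain(email: str) -> str:
    """Second '@'-separated field, like split_part(email, '@', 2) in SQL."""
    parts = email.split("@")
    return parts[1] if len(parts) > 1 else ""


def is_internal(email: str) -> bool:
    """True when the address belongs to an internal/test domain."""
    return email_domain(email).lower() in INTERNAL_DOMAINS


def to_analytics_row(user: dict) -> dict:
    """Keep only hashed identifiers and non-PII attributes."""
    email = user["email"]  # not normalized, to match a plain SQL digest()
    return {
        "user_hash": sha256_hex(str(user["id"])),
        "email_hash": sha256_hex(email),
        "email_domain": email_domain(email),
        "industry": user.get("industry"),
    }


if __name__ == "__main__":
    users = [
        {"id": 1, "email": "dana@acme.io", "industry": "retail"},
        {"id": 2, "email": "qa@example.com", "industry": None},
    ]
    rows = [to_analytics_row(u) for u in users if not is_internal(u["email"])]
    print(rows)
```

Note that an unsalted hash of an email address can be matched by anyone who already knows the address, so the hash works as a stable pseudonymous identifier rather than as irreversible anonymization.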

Implementation Details

Database Views Created

-- Example: Privacy-protected user view
-- Note: digest() is provided by the pgcrypto extension, which must be enabled.
CREATE OR REPLACE VIEW public.posthog_users AS
SELECT
    id,  -- raw user ID; hash it too if ID anonymization should apply to this view
    encode(digest(email, 'sha256'), 'hex') AS email_hash,
    split_part(email, '@', 2) AS email_domain,
    created_at,
    updated_at,
    raw_user_meta_data->>'industry' AS industry,
    raw_user_meta_data->>'hosting_region' AS hosting_region,
    -- Always true here, because the WHERE clause keeps only confirmed users
    (confirmed_at IS NOT NULL) AS email_confirmed
FROM auth.users
WHERE confirmed_at IS NOT NULL;

PostHog Connection Configuration

Host: aws-0-us-east-2.pooler.supabase.com
Port: 5432
Database: postgres
User: postgres.fldiayolmgphysdwgsuk
Schema: public
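
For local verification of these settings, a connection URI can be assembled as in the sketch below. It assumes the standard libpq URI form with `sslmode=require`; `build_dsn` is a hypothetical helper, and the project ref and password shown are placeholders rather than real credentials.

```python
from urllib.parse import quote

POOLER_HOST = "aws-0-us-east-2.pooler.supabase.com"


def build_dsn(project_ref: str, password: str,
              host: str = POOLER_HOST, port: int = 5432,
              database: str = "postgres") -> str:
    """Build a libpq-style URI for the session pooler with TLS required."""
    # Percent-encode credentials so characters like '@' or '/' stay valid.
    user = quote(f"postgres.{project_ref}", safe="")
    secret = quote(password, safe="")
    return f"postgresql://{user}:{secret}@{host}:{port}/{database}?sslmode=require"


if __name__ == "__main__":
    # Placeholder values only; never hard-code a real password.
    print(build_dsn("your-project-ref", "p@ss/word"))
```

In practice the password would come from a secret store or environment variable rather than from source code.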

Sync Schedule

  • High-frequency tables: Every 1 hour (user activity, organization changes)
  • Reference data: Every 24 hours (plans, feature flags)
  • Metrics views: Every 6 hours (aggregated statistics)
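
The schedule can be expressed as a small lookup, sketched below in Python. The category keys and the `is_sync_due` helper are hypothetical names chosen for illustration; the actual schedule is configured inside PostHog.

```python
from datetime import datetime, timedelta, timezone

# Intervals taken from the schedule above; the keys are illustrative labels.
SYNC_INTERVALS = {
    "high_frequency": timedelta(hours=1),
    "reference": timedelta(hours=24),
    "metrics": timedelta(hours=6),
}


def is_sync_due(category: str, last_sync: datetime, now: datetime) -> bool:
    """True when at least one full interval has elapsed since `last_sync`."""
    return now - last_sync >= SYNC_INTERVALS[category]


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_sync_due("metrics", now - timedelta(hours=7), now))
```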

Benefits

1. Enhanced Analytics Capabilities

  • Complete user journey tracking from signup to conversion
  • Cohort analysis by industry, region, and plan type
  • Feature adoption metrics with organization context
  • Revenue analytics combined with usage patterns

2. Privacy Compliance

  • Zero PII exposure in analytics system
  • GDPR compliant with user data anonymization
  • Audit trail of data access and usage
  • Right-to-be-forgotten compliance through view-level filtering

3. Operational Benefits

  • Reduced client-side tracking code (better performance)
  • Automatic data correlation (no manual event tracking)
  • Historical data analysis (immediate access to existing data)
  • Simplified analytics implementation for new features

4. Security Improvements

  • Database-level access controls
  • Encrypted data transmission
  • IP-restricted database access
  • Row and column filtering through database views

Drawbacks and Mitigations

1. Data Freshness

  • Issue: CDP sync introduces latency (not real-time)
  • Mitigation: Maintain critical real-time events via direct PostHog API for immediate feedback

2. Database Load

  • Issue: Regular full-table syncs on large datasets add load to the production database
  • Mitigation: Use incremental sync where possible, schedule during low-usage periods

3. Complex Data Relationships

  • Issue: Analytics queries may be complex due to hashed relationships
  • Mitigation: Pre-calculated metrics views and PostHog's SQL query capabilities

4. Vendor Lock-in

  • Issue: Increased dependency on PostHog's CDP infrastructure
  • Mitigation: Views are portable and can be adapted for other analytics platforms

Alternative Approaches Considered

1. Manual Event Tracking (Current)

Rejected: High maintenance overhead, incomplete data coverage, PII exposure risk

2. Data Warehouse ETL (Airbyte/Fivetran)

Rejected: Additional infrastructure complexity, higher cost, overkill for current scale

3. Custom Analytics Pipeline

Rejected: Significant development and maintenance overhead, reinventing the analytics wheel

4. Direct Database Access for PostHog

Rejected: Security risk, PII exposure, no data transformation layer

Success Metrics

Technical Metrics

  • Data sync reliability: >99.9% success rate
  • Sync latency: <1 hour for incremental updates
  • Query performance: <5s for standard analytics queries
  • Zero PII exposure incidents
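
One way to evaluate the sync reliability and latency targets is sketched below. The `SyncRun` record and both helper functions are assumptions for illustration; real run data would come from PostHog's sync logs, whose format is not covered here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyncRun:
    succeeded: bool
    latency_minutes: float  # time from source change to availability


def success_rate(runs: List[SyncRun]) -> float:
    """Fraction of runs that succeeded; 0.0 when there are no runs."""
    if not runs:
        return 0.0
    return sum(r.succeeded for r in runs) / len(runs)


def meets_technical_targets(runs: List[SyncRun]) -> bool:
    """Check >99.9% success and <1 hour latency on successful runs."""
    rate_ok = success_rate(runs) > 0.999
    latency_ok = all(r.latency_minutes < 60 for r in runs if r.succeeded)
    return rate_ok and latency_ok


if __name__ == "__main__":
    history = [SyncRun(True, 12.0)] * 1999 + [SyncRun(False, 0.0)]
    print(success_rate(history), meets_technical_targets(history))
```

A target above 99.9% allows at most one failure per roughly 1,000 runs, so at an hourly cadence a single failed sync consumes most of a month's error budget.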

Business Metrics

  • Reduced analytics implementation time: >50% reduction for new features
  • Improved data completeness: 100% coverage of user journey
  • Enhanced decision-making: Weekly business metrics automated
  • Compliance readiness: 100% GDPR compliance score

Implementation Plan

Phase 1: Infrastructure Setup ✅

  • Create privacy-protected database views
  • Configure PostHog CDP connection
  • Set up network security and authentication
  • Test data sync for core tables

Phase 2: Analytics Migration

  • Migrate existing analytics queries to use CDP data
  • Create PostHog dashboards for key business metrics
  • Implement automated alerting for critical metrics
  • Validate data accuracy against existing tracking

Phase 3: Advanced Analytics

  • Implement cohort analysis and retention metrics
  • Create funnel analysis for user onboarding
  • Set up revenue attribution and feature ROI tracking
  • Build predictive analytics for churn prevention

Phase 4: Optimization

  • Fine-tune sync schedules based on usage patterns
  • Optimize database views for query performance
  • Implement data archiving for historical analysis
  • Scale analytics infrastructure for growth

Security Considerations

Data Protection

  • All sensitive data (emails, names) are hashed using SHA-256
  • Database views filter out internal/test accounts
  • PostHog read-only user has only the minimum required permissions
  • Network access restricted to PostHog IP addresses

Access Control

  • Database connection uses a dedicated read-only user
  • Row and column filtering enforced through database views
  • Audit logging enabled for all database access
  • Regular security reviews of exposed data fields

Compliance

  • GDPR Article 25: Privacy by design implemented through secure views
  • Data minimization: Only analytics-relevant fields exposed
  • Anonymization: Personal identifiers hashed or removed
  • Retention policies: Configurable data retention periods
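
A simple automated guard for the anonymization goal is to scan sampled view output for values that still look like email addresses. The sketch below assumes rows are available as dictionaries; `find_email_like_values` is a hypothetical helper, and the regular expression is a deliberately loose heuristic rather than a full address validator.

```python
import re
from typing import Dict, List, Tuple

# Loose heuristic: something@something.tld with no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def find_email_like_values(rows: List[Dict[str, object]]) -> List[Tuple[int, str]]:
    """Return (row_index, column) pairs whose string value resembles an email."""
    hits = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if isinstance(value, str) and EMAIL_RE.search(value):
                hits.append((index, column))
    return hits


if __name__ == "__main__":
    sample = [
        {"email_hash": "ab" * 32, "email_domain": "acme.io"},
        {"email_hash": "cd" * 32, "note": "contact dana@acme.io"},
    ]
    print(find_email_like_values(sample))
```

A check like this only catches one class of leak; names and addresses would need separate review of the exposed columns.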

Monitoring and Maintenance

Operational Monitoring

  • Sync job success/failure alerts
  • Data quality validation checks
  • Query performance monitoring
  • Cost tracking for PostHog usage

Data Quality Assurance

  • Automated tests for view data accuracy
  • Reconciliation checks between database and PostHog
  • Schema change impact assessment
  • Data drift detection and alerting
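
A reconciliation check can be as simple as comparing per-table row counts on both sides. The sketch below assumes the counts have already been collected into dictionaries; `reconcile_counts` and its `tolerance` parameter are hypothetical and not part of any PostHog or Supabase API.

```python
from typing import Dict, Tuple


def reconcile_counts(source: Dict[str, int], synced: Dict[str, int],
                     tolerance: float = 0.0) -> Dict[str, Tuple[int, int]]:
    """Return {table: (expected, actual)} where counts differ beyond tolerance.

    `tolerance` is a fraction of the source count, which allows for rows
    written between the last sync and the moment of the check.
    """
    mismatches = {}
    for table, expected in source.items():
        actual = synced.get(table, 0)  # a missing table counts as zero rows
        if abs(expected - actual) > expected * tolerance:
            mismatches[table] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    db = {"posthog_users": 1000, "posthog_organization_metrics": 40}
    posthog = {"posthog_users": 995, "posthog_organization_metrics": 40}
    print(reconcile_counts(db, posthog, tolerance=0.01))
```

Setting a small non-zero tolerance for incrementally synced tables avoids false alarms caused purely by sync latency.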

Regular Reviews

  • Quarterly security audit of exposed data
  • Annual privacy compliance review
  • Performance optimization assessment
  • Cost-benefit analysis of analytics ROI

Conclusion

The PostHog CDP integration with privacy-protected Supabase views provides a scalable, secure, and maintainable analytics solution that meets our current needs while positioning us for future growth. The approach balances comprehensive analytics capabilities with strict privacy requirements, reducing technical debt while improving data-driven decision making.

This architecture supports our long-term vision of becoming a data-driven organization while maintaining the highest standards of user privacy and data protection.


Decision Made By: System Architecture Team
Next Review Date: 2025-10-31
Related ADRs: ADR-001 (Supabase Selection), ADR-002 (Authentication Strategy), ADR-003 (Sentry Monitoring)