Source: ocean/docs/adr/0041-posthog-cdp-supabase-integration.md
ADR-004: PostHog CDP Integration with Supabase for Privacy-First Analytics
Status
Accepted
Date
2025-07-31
Context
We needed a comprehensive analytics solution that could:
- Track user behavior and feature usage across our application
- Provide business intelligence on organization growth and subscription metrics
- Maintain strict privacy compliance (GDPR, data anonymization)
- Integrate seamlessly with our existing Supabase infrastructure
- Minimize performance impact on our application
Previous Approach
- Manual PostHog event tracking via client-side and Edge Function calls
- Limited data correlation between analytics events and database state
- Risk of PII exposure through direct user data tracking
- Maintenance overhead for tracking code across the application
Requirements
- Privacy: No exposure of PII (emails, real names, addresses)
- Performance: Minimal impact on application performance
- Completeness: Capture all relevant user and organization data
- Security: Secure database access with proper authentication
- Maintainability: Reduce manual tracking code and maintenance
Decision
We will implement PostHog's Customer Data Platform (CDP) integration with Supabase to automatically sync database tables for analytics, using privacy-protected database views.
Architecture Components
1. Privacy-Protected Database Views
Create secure views that expose analytics data without PII:
- posthog_users: User data with hashed emails, industry, and region
- posthog_user_profiles: Combined user/organization data with anonymized IDs
- posthog_organization_metrics: Aggregated organization statistics
2. Database Connection Configuration
- Connection Method: Supabase Session Pooler (aws-0-us-east-2.pooler.supabase.com:5432)
- Authentication: Project-specific postgres user (postgres.{project-ref})
- Network Security: PostHog IP whitelist (US region IPs)
- SSL/TLS: Required for secure data transmission
3. Data Synchronization Strategy
- Incremental Sync: For mutable data (organizations, profiles, user views)
- Full Table Sync: For aggregated metrics views
- Append-Only Sync: For event/audit data (feature_flag_evaluations)
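The three sync modes differ in which rows a run has to read. A minimal Python sketch of that difference, assuming `updated_at` is the cursor for incremental syncs and `created_at` for append-only syncs (both column choices and the function name `select_rows` are illustrative assumptions; this ADR does not specify how PostHog tracks its cursor):

```python
from datetime import datetime


def select_rows(rows, mode, cursor=None):
    """Return the rows a single sync run would transfer.

    mode:
      "full"        - every row, every run
      "incremental" - rows whose updated_at is newer than the cursor
      "append"      - rows whose created_at is newer than the cursor
    cursor is the high-water mark left by the previous run (None on first run).
    """
    if mode == "full" or cursor is None:
        return list(rows)
    # Hypothetical cursor columns; the real ones are chosen per table.
    column = {"incremental": "updated_at", "append": "created_at"}[mode]
    return [row for row in rows if row[column] > cursor]


rows = [
    {"id": 1, "created_at": datetime(2025, 7, 1), "updated_at": datetime(2025, 7, 30)},
    {"id": 2, "created_at": datetime(2025, 7, 20), "updated_at": datetime(2025, 7, 20)},
]
last_run = datetime(2025, 7, 25)
changed = select_rows(rows, "incremental", last_run)  # row 1 was edited after last_run
```

Under these assumptions a row edited after the last run is picked up by an incremental sync but not by an append-only sync, which is why append-only is reserved for data that is never updated in place.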
4. Privacy Protection Measures
- Email Hashing: SHA-256 hashing of email addresses for user identification
- ID Anonymization: Hash organization and user IDs in analytics views
- Data Filtering: Exclude internal/test users and sensitive metadata
- Domain-Only Exposure: Only email domains exposed for organizational tracking
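The hashing and domain-only measures can be sketched outside the database as well, for example to produce a matching identifier in application code. This is a minimal Python sketch, assuming emails are UTF-8 encoded and hashed without a salt; the function names are hypothetical:

```python
import hashlib


def email_hash(email: str) -> str:
    """Hex-encoded SHA-256 of the email (unsalted, UTF-8 bytes assumed)."""
    return hashlib.sha256(email.encode("utf-8")).hexdigest()


def email_domain(email: str) -> str:
    """Second '@'-separated field of the email, or '' if there is no '@'."""
    parts = email.split("@")
    return parts[1] if len(parts) > 1 else ""


def analytics_record(user_id: str, email: str) -> dict:
    """Build the privacy-protected fields; the raw email and ID are not returned."""
    return {
        "user_hash": hashlib.sha256(user_id.encode("utf-8")).hexdigest(),
        "email_hash": email_hash(email),
        "email_domain": email_domain(email),
    }


record = analytics_record("user-1", "alice@example.com")
```

Because the hash is unsalted, the same email always maps to the same value, which is what allows records to be joined across tables. It also means anyone holding a candidate email can recompute the hash, so the hashed column should still be treated as sensitive.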
Implementation Details
Database Views Created
-- Example: privacy-protected user view.
-- digest() comes from the pgcrypto extension, which must be enabled.
CREATE OR REPLACE VIEW public.posthog_users AS
SELECT
    id,
    encode(digest(email, 'sha256'), 'hex') AS email_hash,  -- hex-encoded SHA-256 of the email
    split_part(email, '@', 2) AS email_domain,             -- domain only, never the local part
    created_at,
    updated_at,
    raw_user_meta_data->>'industry' AS industry,
    raw_user_meta_data->>'hosting_region' AS hosting_region,
    (confirmed_at IS NOT NULL) AS email_confirmed          -- always true given the filter below
FROM auth.users
WHERE confirmed_at IS NOT NULL;  -- only users with a confirmed email
PostHog Connection Configuration
Host: aws-0-us-east-2.pooler.supabase.com
Port: 5432
Database: postgres
User: postgres.fldiayolmgphysdwgsuk
Schema: public
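For local testing of the same connection, the parameters above can be assembled into a standard PostgreSQL connection URI. A minimal sketch, assuming a placeholder project ref and password (both hypothetical) and using the libpq `sslmode=require` parameter to match the SSL/TLS requirement:

```python
from urllib.parse import quote


def pooler_dsn(project_ref: str, password: str,
               host: str = "aws-0-us-east-2.pooler.supabase.com",
               port: int = 5432, database: str = "postgres") -> str:
    """Build a PostgreSQL URI for the session pooler with TLS required."""
    user = f"postgres.{project_ref}"
    # Percent-encode credentials so characters such as '@' or '/' cannot break the URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}?sslmode=require"
    )


dsn = pooler_dsn("your-project-ref", "p@ss/word")  # placeholder values only
```

The password should come from a secret store rather than source control; the literal above exists only to show the encoding.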
Sync Schedule
- High-frequency tables: Every hour (user activity, organization changes)
- Reference data: Every 24 hours (plans, feature flags)
- Metrics views: Every 6 hours (aggregated statistics)
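The schedule can be expressed as a small lookup, which is handy when checking whether a table is overdue. A minimal Python sketch; the category keys are hypothetical labels for the three tiers above, not PostHog configuration names:

```python
from datetime import datetime, timedelta

# Intervals taken from the schedule above; keys are illustrative labels.
SYNC_INTERVALS = {
    "high_frequency": timedelta(hours=1),
    "metrics": timedelta(hours=6),
    "reference": timedelta(hours=24),
}


def is_due(category: str, last_synced: datetime, now: datetime) -> bool:
    """True once at least one full interval has passed since the last sync."""
    return now - last_synced >= SYNC_INTERVALS[category]


def syncs_per_day(category: str) -> int:
    """Number of scheduled runs in a 24-hour period."""
    return timedelta(days=1) // SYNC_INTERVALS[category]


runs = {name: syncs_per_day(name) for name in SYNC_INTERVALS}
```

With these intervals a high-frequency table is read 24 times a day and a metrics view 4 times, which gives a rough basis for estimating the database load each tier adds.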
Benefits
1. Enhanced Analytics Capabilities
- Complete user journey tracking from signup to conversion
- Cohort analysis by industry, region, and plan type
- Feature adoption metrics with organization context
- Revenue analytics combined with usage patterns
2. Privacy Compliance
- Zero PII exposure in analytics system
- GDPR-aligned: user identifiers are hashed (pseudonymized) before they reach the analytics system
- Audit trail of data access and usage
- Right-to-be-forgotten compliance through view-level filtering
3. Operational Benefits
- Reduced client-side tracking code (better performance)
- Automatic data correlation (no manual event tracking)
- Historical data analysis (immediate access to existing data)
- Simplified analytics implementation for new features
4. Security Improvements
- Database-level access controls
- Encrypted data transmission
- IP-restricted database access
- Row and column filtering through database views
Drawbacks and Mitigations
1. Data Freshness
- Issue: CDP sync introduces latency (not real-time)
- Mitigation: Maintain critical real-time events via direct PostHog API for immediate feedback
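Real-time events sent directly to PostHog should follow the same privacy rules as the synced views. The sketch below only builds the event payload and does not send it. The field names (`api_key`, `event`, `distinct_id`, `properties`, `timestamp`) are assumed from PostHog's public capture endpoint and should be checked against current PostHog documentation; using the email hash as `distinct_id` is also an assumption, chosen so events can be joined to `email_hash` in the views:

```python
import hashlib
from datetime import datetime, timezone


def capture_payload(api_key, event, email, properties=None, now=None):
    """Build a PII-free event payload keyed by the hashed email."""
    distinct_id = hashlib.sha256(email.encode("utf-8")).hexdigest()
    return {
        "api_key": api_key,
        "event": event,
        "distinct_id": distinct_id,  # hashed, never the raw email
        "properties": dict(properties or {}),
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
    }


payload = capture_payload(
    "phc_placeholder",  # hypothetical project key
    "subscription_upgraded",
    "alice@example.com",
    properties={"plan": "pro"},
    now=datetime(2025, 7, 31, tzinfo=timezone.utc),
)
```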
2. Database Load
- Issue: Regular full-table syncs on large datasets add read load to the database
- Mitigation: Use incremental sync where possible, schedule during low-usage periods
3. Complex Data Relationships
- Issue: Analytics queries may be complex due to hashed relationships
- Mitigation: Pre-calculated metrics views and PostHog's SQL query capabilities
4. Vendor Lock-in
- Issue: Increased dependency on PostHog's CDP infrastructure
- Mitigation: Views are portable, can be adapted for other analytics platforms
Alternative Approaches Considered
1. Manual Event Tracking (Current)
Rejected: High maintenance overhead, incomplete data coverage, PII exposure risk
2. Data Warehouse ETL (Airbyte/Fivetran)
Rejected: Additional infrastructure complexity, higher cost, overkill for current scale
3. Custom Analytics Pipeline
Rejected: Significant development and maintenance overhead, reinventing the analytics wheel
4. Direct Database Access for PostHog
Rejected: Security risk, PII exposure, no data transformation layer
Success Metrics
Technical Metrics
- Data sync reliability: >99.9% success rate
- Sync latency: <1 hour for incremental updates
- Query performance: <5s for standard analytics queries
- Zero PII exposure incidents
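The reliability target is stricter than it may look when measured per table. A single hourly table runs 24 * 30 = 720 times in 30 days, and one failure gives 719/720, roughly 99.86%, which misses the target. A small Python sketch of the check (the function names are illustrative):

```python
def success_rate(succeeded: int, total: int) -> float:
    """Fraction of sync runs that succeeded."""
    if total <= 0:
        raise ValueError("total must be positive")
    return succeeded / total


def meets_target(succeeded: int, total: int, target: float = 0.999) -> bool:
    """True when the success rate is strictly above the target."""
    return success_rate(succeeded, total) > target


one_failure_in_a_month = meets_target(719, 720)  # one hourly table, 30 days
```

In practice this suggests measuring the rate across all synced tables or over a longer window, so that one transient failure does not by itself breach the metric.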
Business Metrics
- Reduced analytics implementation time: >50% reduction for new features
- Improved data completeness: 100% coverage of the user journey
- Enhanced decision-making: Weekly business metrics automated
- Compliance readiness: 100% GDPR compliance score
Implementation Plan
Phase 1: Infrastructure Setup ✅
- Create privacy-protected database views
- Configure PostHog CDP connection
- Set up network security and authentication
- Test data sync for core tables
Phase 2: Analytics Migration
- Migrate existing analytics queries to use CDP data
- Create PostHog dashboards for key business metrics
- Implement automated alerting for critical metrics
- Validate data accuracy against existing tracking
Phase 3: Advanced Analytics
- Implement cohort analysis and retention metrics
- Create funnel analysis for user onboarding
- Set up revenue attribution and feature ROI tracking
- Build predictive analytics for churn prevention
Phase 4: Optimization
- Fine-tune sync schedules based on usage patterns
- Optimize database views for query performance
- Implement data archiving for historical analysis
- Scale analytics infrastructure for growth
Security Considerations
Data Protection
- All sensitive data (emails, names) are hashed using SHA-256
- Database views filter out internal/test accounts
- PostHog read-only user has only the minimum required permissions
- Network access restricted to PostHog IP addresses
Access Control
- Database connection uses a dedicated read-only user
- Row and column filtering enforced through database views
- Audit logging enabled for all database access
- Regular security reviews of exposed data fields
Compliance
- GDPR Article 25: Privacy by design implemented through secure views
- Data minimization: Only analytics-relevant fields exposed
- Pseudonymization: Personal identifiers hashed or removed
- Retention policies: Configurable data retention periods
Monitoring and Maintenance
Operational Monitoring
- Sync job success/failure alerts
- Data quality validation checks
- Query performance monitoring
- Cost tracking for PostHog usage
Data Quality Assurance
- Automated tests for view data accuracy
- Reconciliation checks between database and PostHog
- Schema change impact assessment
- Data drift detection and alerting
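Some of these checks can be automated against sample rows pulled from a view. A minimal Python sketch, assuming the allowed column set matches the example `posthog_users` view and that any value containing an email-like string counts as a leak; the helper names and the tolerance parameter are hypothetical:

```python
import re

# Loose pattern for anything that looks like an email address.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Columns expected from the example posthog_users view.
ALLOWED_COLUMNS = {
    "id", "email_hash", "email_domain", "created_at", "updated_at",
    "industry", "hosting_region", "email_confirmed",
}


def unexpected_columns(row: dict) -> set:
    """Columns present in the row that are not on the allowlist."""
    return set(row) - ALLOWED_COLUMNS


def columns_with_email(row: dict) -> list:
    """Names of columns whose string value contains an email-like pattern."""
    return sorted(
        col for col, val in row.items()
        if isinstance(val, str) and EMAIL_RE.search(val)
    )


def reconciled(source_count: int, synced_count: int, tolerance: int = 0) -> bool:
    """True when source and synced row counts differ by at most the tolerance."""
    return abs(source_count - synced_count) <= tolerance


good_row = {"id": "u1", "email_hash": "ab" * 32, "email_domain": "example.com"}
bad_row = {"id": "u2", "email": "alice@example.com"}
```

An allowlist fails closed: a column added to the view by a later schema change is flagged until someone reviews it and adds it to the list.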
Regular Reviews
- Quarterly security audit of exposed data
- Annual privacy compliance review
- Performance optimization assessment
- Cost-benefit analysis of analytics ROI
Conclusion
The PostHog CDP integration with privacy-protected Supabase views provides a scalable, secure, and maintainable analytics solution that meets our current needs while positioning us for future growth. The approach balances comprehensive analytics capabilities with strict privacy requirements, reducing technical debt while improving data-driven decision-making.
This architecture supports our long-term vision of becoming a data-driven organization while maintaining the highest standards of user privacy and data protection.
Decision Made By: System Architecture Team
Next Review Date: 2025-10-31
Related ADRs: ADR-001 (Supabase Selection), ADR-002 (Authentication Strategy), ADR-003 (Sentry Monitoring)