Technical Infrastructure Context
TARGET_DW_ARCHITECTURE
ANALYTICAL_DB = ClickHouse (UAT: VM deployment, PROD: TBD)
TRANSFORMATION_LAYER = dbt (medallion architecture)
DATA_INGESTION = Kafka → ClickHouse direct streaming
ARCHITECTURE_PATTERN = Bronze → Silver → Gold layers
BRONZE_LAYER_FOUNDATION
TABLE_SCHEMA = finvu_events (as prototyped in hackathon)
PARTITION_STRATEGY = toYYYYMM(event_date)
ORDER_BY = (event_name, event_time, txn_id)
JSON_PRESERVATION = event_data + additional_info as JSON fields
COMPRESSION = ZSTD(3) + Delta for timestamps
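A minimal sketch of how the bronze settings above could translate into ClickHouse DDL, rendered from Python so the statement can be reviewed or templated. The column names follow the Kafka event structure in this document; the column types, the LowCardinality choices, the DEFAULT on event_date, and storing the JSON payloads as String are assumptions, not the hackathon prototype's actual schema.

```python
# Hypothetical column layout: (name, type expression, optional codec).
# Names come from this document; types and codecs per column are assumptions.
COLUMNS = [
    ("event_id", "String", None),
    ("txn_id", "String", None),
    ("event_name", "LowCardinality(String)", None),
    ("event_source", "LowCardinality(String)", None),
    ("event_time", "DateTime64(3)", "Delta, ZSTD(3)"),
    ("event_date", "Date DEFAULT toDate(event_time)", None),
    ("event_data", "String", "ZSTD(3)"),        # raw JSON text, assumed String
    ("additional_info", "String", "ZSTD(3)"),   # raw JSON text, assumed String
]


def bronze_ddl(table: str = "finvu_events") -> str:
    """Render a CREATE TABLE statement for the bronze events table."""
    col_lines = []
    for name, type_expr, codec in COLUMNS:
        line = f"    {name} {type_expr}"
        if codec:
            line += f" CODEC({codec})"
        col_lines.append(line)
    return (
        f"CREATE TABLE IF NOT EXISTS {table} (\n"
        + ",\n".join(col_lines)
        + "\n) ENGINE = MergeTree\n"
        "PARTITION BY toYYYYMM(event_date)\n"
        "ORDER BY (event_name, event_time, txn_id)"
    )


if __name__ == "__main__":
    print(bronze_ddl())
```

Monthly partitions keep backfill and retention operations coarse-grained, while the ORDER BY key favors queries that filter on event_name first and then on a time range.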
DATA_MIGRATION_CHALLENGES
HISTORICAL_BACKUPS = nested JSON with escape characters
CLEANING_SOLUTION = Go script for data preprocessing
KAFKA_EVENT_STRUCTURE = event_id, txn_id, event_name, event_source, event_time + JSON payloads
BACKFILL_REQUIREMENT = historical data from Cassandra LogDB
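The cleaning step for escaped, nested JSON can be illustrated as follows. This is a Python sketch of the general idea only, not the Go script referenced above; the function name and the max_depth guard are hypothetical. It assumes each backup record is already parsed into a dict whose string values may themselves hold JSON encoded one or more times.

```python
import json
from typing import Any


def unwrap_nested_json(value: Any, max_depth: int = 5) -> Any:
    """Recursively decode string values that themselves contain JSON."""
    if isinstance(value, str) and max_depth > 0:
        stripped = value.strip()
        # Only attempt a decode when the text looks like a JSON object,
        # array, or quoted string; everything else is returned untouched.
        if stripped[:1] in ("{", "[", '"'):
            try:
                decoded = json.loads(stripped)
            except json.JSONDecodeError:
                return value  # malformed: keep the raw text for inspection
            return unwrap_nested_json(decoded, max_depth - 1)
        return value
    if isinstance(value, dict):
        return {k: unwrap_nested_json(v, max_depth) for k, v in value.items()}
    if isinstance(value, list):
        return [unwrap_nested_json(v, max_depth) for v in value]
    return value
```

Values that fail to parse are left as-is rather than dropped, so malformed historical rows can still land in bronze and be examined later.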
DEPLOYMENT_STRATEGY
UAT_PHASE = VM-based ClickHouse + Kafka connectivity testing
PROD_DECISIONS_PENDING = ClickHouse deployment architecture
SUCCESS_CRITERIA = TBD (performance + accuracy benchmarks)
ARCHITECTURAL_CONSTRAINTS
EVENT_FLEXIBILITY = bronze layer preserves raw structure for unknown future events
REAL_TIME_CAPABILITY = operational dashboards require streaming data
HISTORICAL_ANALYSIS = support backfill from existing Cassandra data
COMPLIANCE_MAINTAINED = data-blind company requirements preserved
UX_DATA_INTEGRATION
POSTHOG_INTEGRATION = user journey tracking (recently deployed, testing phase)
ARCHITECTURE_QUESTION = ClickHouse (server events) vs PostHog (user journey) separation
CROSS_REFERENCE_GOAL = link server events to user journey via session IDs
FUTURE_NAVIGATION = seamless transition between backend + frontend insights
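The cross-reference goal above might be prototyped as a simple grouping on session ID. This sketch assumes both the server events and the PostHog journey events expose a field named session_id; that field name and the function are hypothetical, and how PostHog actually carries the session identifier would need to be confirmed during the testing phase.

```python
from collections import defaultdict


def link_by_session(server_events, journey_events):
    """Pair each session's server events with its user-journey events."""
    linked = defaultdict(lambda: {"server": [], "journey": []})
    for ev in server_events:
        linked[ev["session_id"]]["server"].append(ev)   # assumed field name
    for ev in journey_events:
        linked[ev["session_id"]]["journey"].append(ev)  # assumed field name
    return dict(linked)
```

Sessions that appear on only one side still get an entry with an empty list on the other, which makes gaps in either data source visible.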
CONSENT_JOURNEY_TRACING
JOURNEY_MAPPING_ARCHITECTURE
CONSENT_HANDLE = Primary identifier for consent journey
SESSION_ID = Multiple session IDs can map to one consent handle
EVENT_STITCHING = [Consent Handle → Session ID(s) → Individual Events → Outbound Call Mapping]
OUTBOUND_MAPPING = txn_id-based correlation for AA→FIP calls
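The stitching chain above can be sketched as a nested grouping. The field names consent_handle, session_id, and event_type are hypothetical, as is the assumption that HTTP_OUT events represent the outbound AA→FIP calls; event_time is assumed to be an ISO-8601 string so that lexical order matches chronological order.

```python
from collections import defaultdict


def stitch_journeys(events):
    """Group events as consent_handle -> session_id -> time-ordered events,
    and collect the distinct txn_ids of outbound calls per consent handle."""
    journeys = defaultdict(
        lambda: {"sessions": defaultdict(list), "outbound_txn_ids": []}
    )
    for ev in sorted(events, key=lambda e: e["event_time"]):
        journey = journeys[ev["consent_handle"]]
        journey["sessions"][ev["session_id"]].append(ev)
        # Assumption: HTTP_OUT marks an outbound call correlated by txn_id.
        is_outbound = ev.get("event_type") == "HTTP_OUT"
        if is_outbound and ev["txn_id"] not in journey["outbound_txn_ids"]:
            journey["outbound_txn_ids"].append(ev["txn_id"])
    return {
        handle: {
            "sessions": dict(journey["sessions"]),
            "outbound_txn_ids": journey["outbound_txn_ids"],
        }
        for handle, journey in journeys.items()
    }
```

Keeping sessions as a mapping under each consent handle reflects the one-to-many relationship between a consent handle and its session IDs.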
KAFKA_EVENT_STREAMING
EVENT_COVERAGE = All server-side events streamed to Kafka
EVENT_SOURCE = AA Core service (single service architecture)
STREAMING_GUARANTEE = Complete journey coverage via Kafka events
CLICKHOUSE_INTEGRATION = Direct streaming from Kafka to ClickHouse
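To make the Kafka-to-bronze mapping concrete, the sketch below splits one message into envelope columns and JSON payload columns. It assumes messages are JSON-encoded and that the payload keys are named event_data and additional_info, matching the bronze table; the function name is hypothetical. In the direct-streaming setup this mapping would happen inside ClickHouse rather than in Python, so this is illustration only.

```python
import json

# Envelope fields listed under KAFKA_EVENT_STRUCTURE in this document.
ENVELOPE_FIELDS = ("event_id", "txn_id", "event_name", "event_source", "event_time")
# Assumed payload keys, matching the bronze table's JSON columns.
PAYLOAD_FIELDS = ("event_data", "additional_info")


def to_bronze_row(message: bytes) -> dict:
    """Map one JSON-encoded Kafka message to a flat bronze row."""
    event = json.loads(message)
    row = {field: event.get(field) for field in ENVELOPE_FIELDS}
    for field in PAYLOAD_FIELDS:
        # Payloads are kept as compact JSON text so unknown structures survive.
        row[field] = json.dumps(
            event.get(field) or {}, separators=(",", ":"), sort_keys=True
        )
    # Note: unrecognized top-level keys are not carried over in this sketch.
    return row
```

Missing envelope fields come through as None and missing payloads as an empty JSON object, so a malformed message still yields a row with the full column set.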
EXPECTED_SYSTEM_CHANGES
CHANGE_FREQUENCY
NEW_EVENT_TYPES = Expected every ~6 months (not frequent)
CORE_RIGIDITY = Production system with minimal structural changes
CHANGE_CATEGORIES = [Payload changes, Tech stack migration, Minor enhancements]
ANTICIPATED_CHANGES
TECH_MIGRATION = Karaf Camel → Spring Boot (potential future)
RETRY_MECHANISMS = Enhanced retry logic implementation
CONSENT_GUARD = Fair Use Policy enforcement (already implemented)
RATE_LIMITING = FIP-specific batching based on uptime
OTP_MONITORING = OTP provider performance tracking
ASSET_MANAGEMENT = Internal system optimization
CHANGE_ADAPTATION_STRATEGY
SCHEMA_FLEXIBILITY = Bronze layer JSON preservation for unknown events
NEW_EVENT_ADOPTION = Planned process for event type additions
PAYLOAD_EVOLUTION = Support for payload structure changes
CORE_API_PATCHES = Minimal impact design for production stability
CURRENT_EVENT_CONFIGURATION
EVENT_SOURCES = [AAServer:UserEndpoint, AAServer:ConsentFlow, AAServer:DataFlow, AAServer:Notifications, AAServer:AdminApi, AAServer:PullNotifications]
SPECIAL_PROCESSORS = [FIDATA_PROCESSOR, FIREQUEST_PROCESSOR, WS_REQUEST_PROCESSOR, DISCOVERY_REQUEST_PROCESSOR]
EVENT_TYPES = [HTTP_IN, HTTP_OUT, WS_IN, WS_OUT, DEFAULT]
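Since new event types are expected only occasionally, one way to support the planned adoption process is to flag values that fall outside the current configuration without rejecting them. The sketch below assumes each event carries event_source and event_type fields; the event_type field name and the function are hypothetical, while the allowed values are copied from the configuration above.

```python
KNOWN_EVENT_SOURCES = {
    "AAServer:UserEndpoint",
    "AAServer:ConsentFlow",
    "AAServer:DataFlow",
    "AAServer:Notifications",
    "AAServer:AdminApi",
    "AAServer:PullNotifications",
}
KNOWN_EVENT_TYPES = {"HTTP_IN", "HTTP_OUT", "WS_IN", "WS_OUT", "DEFAULT"}


def find_unknown_values(events):
    """Return event sources and types that are not in the current configuration."""
    unknown = {"event_source": set(), "event_type": set()}
    for ev in events:
        source = ev.get("event_source")
        event_type = ev.get("event_type")  # assumed field name
        if source not in KNOWN_EVENT_SOURCES:
            unknown["event_source"].add(source)
        if event_type not in KNOWN_EVENT_TYPES:
            unknown["event_type"].add(event_type)
    return unknown
```

The check only reports; events with unknown values would still be stored in bronze, consistent with preserving raw structure for future event types.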