Finvu Data Warehouse Documentation

Bronze Layer Design Rationale


Executive Summary

The Bronze Layer serves as the engineering-focused staging and ingestion tier in Finvu’s Medallion Architecture, designed for complete event capture and efficient data forwarding. This layer prioritizes complete data preservation, ingestion performance, and reliable data lineage, deferring all business logic to the Silver layer, where event stitching and business context are applied.

Core Engineering Principles

  1. Complete Data Capture: Ingest all 43 event types from eventConfig-aa.json without loss
  2. Performance-Optimized: Near-real-time ingestion with a <15 minute latency target
  3. Schema Agnostic: JSON preservation for maximum flexibility during schema evolution
  4. Engineering Reliability: Robust error handling, monitoring, and data lineage
  5. Business Logic Deferred: Let Silver layer handle event correlation and business context

Medallion Architecture Flow

The Bronze Layer fits into our three-tier architecture with clear separation of concerns:

graph LR
    subgraph "Bronze Layer (Engineering-Driven)"
        INGEST["Complete Event Ingestion<br/>β€’ 43 event types captured<br/>β€’ JSON preservation<br/>β€’ Kafka metadata lineage<br/>β€’ Performance optimization"]
    end
    
    subgraph "Silver Layer (Business Transition)"
        STITCH["Event Correlation & Stitching<br/>β€’ consent_handle enrichment<br/>β€’ fiu_id attribution<br/>β€’ Journey reconstruction<br/>β€’ Business context application"]
    end
    
    subgraph "Gold Layer (Business-Driven)"
        ANALYTICS["Star Schema & Analytics<br/>β€’ Revenue attribution<br/>β€’ Customer journey analysis<br/>β€’ Dimensional modeling<br/>β€’ Business KPIs"]
    end
    
    INGEST --> STITCH
    STITCH --> ANALYTICS
    
    style INGEST fill:#ff9800
    style STITCH fill:#4caf50
    style ANALYTICS fill:#2196f3

Clear Responsibilities

  • Bronze: Maximum data capture, optimal ingestion performance, complete lineage
  • Silver: Event correlation, business context enrichment, data quality improvement
  • Gold: Business analytics, dimensional modeling, KPI calculations

Bronze Layer Architecture: Pure Ingestion Focus

Following the ClickHouse Kafka Table Engine pattern for optimal performance:

graph TB
    subgraph "Kafka Event Streaming"
        KT["Kafka Topic: finvu-aa-events<br/>43 Event Types<br/>High Volume Production Load"]
        KP["6-12 Partitions<br/>Optimized for Throughput"]
    end
    
    subgraph "Bronze Layer (Engineering Infrastructure)"
        subgraph "Step 1: Kafka Consumer (Ephemeral)"
            KTE["kafka_finvu_events<br/>Kafka Table Engine<br/>NO STORAGE - Pure Consumer"]
            KMD["Kafka Metadata Capture<br/>(_topic, _partition, _offset, _timestamp, _key)"]
        end
        
        subgraph "Step 2: Real-time Processing"
            MV["mv_kafka_to_finvu_events<br/>Minimal Processing<br/>JSON + Basic Extraction"]
            PERF["Performance Optimizations<br/>β€’ Batch processing<br/>β€’ Error resilience<br/>β€’ Schema flexibility"]
        end
        
        subgraph "Step 3: Staging Storage"
            ST["finvu_events (Bronze Storage)<br/>MergeTree Engine<br/>STAGING TABLE FOR SILVER LAYER"]
            META["Complete Data Lineage<br/>β€’ Full event data<br/>β€’ Kafka metadata<br/>β€’ Ingestion timestamps"]
        end
    end
    
    subgraph "Engineering Monitoring"
        ING["Ingestion Performance"]
        LAG["Consumer Lag Monitoring"]
        ERR["Error Rate Tracking"]
        VOL["Volume & Throughput"]
    end
    
    KT --> KTE
    KP --> KTE
    KTE --> MV
    MV --> PERF
    PERF --> ST
    ST --> META
    
    ST --> ING
    ST --> LAG
    ST --> ERR
    ST --> VOL
    
    style KT fill:#ffecb3
    style KTE fill:#fff3e0
    style MV fill:#e8f5e8
    style ST fill:#e3f2fd
    style META fill:#f3e5f5
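
The "Step 1" consumer in the diagram is an ephemeral Kafka Table Engine table whose DDL does not appear elsewhere in this document. A minimal sketch follows; the broker list, consumer group, format, and consumer count are illustrative assumptions, not production values:

-- Step 1 (sketch): ephemeral Kafka consumer table - no storage, pure consumer
-- The engine exposes the virtual columns (_topic, _partition, _offset, _timestamp, _key)
-- that the materialized view below captures for lineage
CREATE TABLE bronze.kafka_finvu_events (
    event_data String COMMENT 'Raw JSON payload, parsed downstream by the materialized view'
) ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka-1:9092,kafka-2:9092',  -- assumed broker endpoints
    kafka_topic_list = 'finvu-aa-events',
    kafka_group_name = 'bronze_finvu_consumer',       -- assumed consumer group name
    kafka_format = 'JSONAsString',                    -- whole message lands in one String column
    kafka_num_consumers = 6;                          -- illustrative; align with partition count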

Event Ingestion Strategy

Based on analysis of eventConfig-aa.json, our engineering-focused ingestion approach works as follows:

flowchart TD
    subgraph "Event Sources (Production Systems)"
        PROD["Production AA Systems<br/>β€’ 43 distinct event types<br/>β€’ 6 event source categories<br/>β€’ Variable schema patterns<br/>β€’ High-frequency streams"]
    end
    
    subgraph "Bronze Ingestion Pipeline"
        CAPTURE["Complete Event Capture<br/>β€’ Raw JSON preservation<br/>β€’ Zero data loss priority<br/>β€’ Schema evolution tolerance"]
        
        EXTRACT["Minimal Field Extraction<br/>β€’ event_name (for routing)<br/>β€’ event_timestamp (for ordering)<br/>β€’ Basic metadata only<br/>β€’ Business logic β†’ Silver"]
        
        STORE["Optimized Storage<br/>β€’ Monthly partitioning<br/>β€’ Compression optimization<br/>β€’ Index for performance<br/>β€’ TTL for compliance"]
    end
    
    subgraph "Data Forward to Silver"
        STAGE["Staging Handoff<br/>β€’ Complete event data<br/>β€’ Full Kafka lineage<br/>β€’ Quality flags<br/>β€’ Ready for business logic"]
    end
    
    PROD --> CAPTURE
    CAPTURE --> EXTRACT
    EXTRACT --> STORE
    STORE --> STAGE
    
    style PROD fill:#ffcdd2
    style CAPTURE fill:#e8f5e8
    style EXTRACT fill:#fff3e0
    style STORE fill:#e1f5fe
    style STAGE fill:#f3e5f5
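
The "Optimized Storage" stage above depends on monthly partitioning and compression behaving as designed. One way to verify both in production is a read-only query against ClickHouse's system.parts (a sketch):

-- Inspect monthly partition sizes and effective compression for the Bronze table
SELECT
    partition,
    sum(rows) AS row_count,
    formatReadableSize(sum(data_compressed_bytes)) AS compressed_size,
    formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed_size,
    round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 2) AS compression_ratio
FROM system.parts
WHERE database = 'bronze' AND table = 'finvu_events' AND active
GROUP BY partition
ORDER BY partition DESC;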

Engineering-Optimized Schema Design

Our Bronze table focuses on ingestion efficiency and data completeness:

-- Bronze Layer: Engineering-Optimized Event Staging
CREATE TABLE bronze.finvu_events (
    -- Temporal Organization (Ingestion Performance)
    event_date Date DEFAULT toDate(event_timestamp),
    event_timestamp DateTime64(3) CODEC(Delta, ZSTD(3)),
    ingestion_timestamp DateTime64(3) DEFAULT now64(3),
    
    -- Event Identity (Minimal for Routing)
    event_id String COMMENT 'Event identifier for deduplication',
    event_name LowCardinality(String) COMMENT 'Event type for Silver layer routing',
    event_source LowCardinality(String) COMMENT 'Source system classification',
    event_type LowCardinality(String) COMMENT 'Transport type (HTTP/WS/etc)',
    
    -- Complete Event Preservation (Schema Agnostic)
    event_data JSON CODEC(ZSTD(3)) COMMENT 'Complete raw event - business logic in Silver',
    event_data_size UInt32 COMMENT 'Size tracking for performance monitoring',
    
    -- Engineering Metadata (Infrastructure Support)
    data_quality LowCardinality(String) DEFAULT 'CAPTURED' COMMENT 'CAPTURED/CORRUPTED/PARTIAL',
    processing_errors String DEFAULT '' COMMENT 'Ingestion error tracking',
    is_duplicate Bool DEFAULT 0 COMMENT 'Duplicate detection flag',
    
    -- Complete Kafka Lineage (Data Engineering Essential)
    kafka_topic LowCardinality(String) COMMENT 'Source topic',
    kafka_partition UInt32 COMMENT 'Source partition',
    kafka_offset UInt64 COMMENT 'Source offset',
    kafka_timestamp DateTime COMMENT 'Kafka message timestamp',
    kafka_key String COMMENT 'Kafka message key',
    
    -- Minimal Business Context (For Silver Layer Efficiency)
    txn_id String DEFAULT '' COMMENT 'Transaction correlation ID',
    session_id String DEFAULT '' COMMENT 'Session correlation ID',
    request_url String DEFAULT '' COMMENT 'API endpoint tracking',
    response_code UInt16 DEFAULT 0 COMMENT 'HTTP response code'
    
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_name, event_timestamp, event_id)
TTL event_date + INTERVAL 7 YEAR  -- TTL must precede SETTINGS; the Date column keeps the TTL expression valid
SETTINGS index_granularity = 8192
COMMENT 'Bronze staging table - complete event capture for Silver layer processing';

Design Philosophy

  • JSON-First: Complete event preservation with minimal extraction
  • Performance-Optimized: Partitioning and indexing for high-throughput ingestion
  • Lineage-Complete: Full Kafka metadata for debugging and replay capability (see the lookup sketch below)
  • Business-Deferred: Consent handles, FIU IDs, purpose codes extracted in Silver layer
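
As an illustration of that replay capability, a single lookup recovers the exact Kafka coordinates of any ingested event. A sketch (the event_id value is a hypothetical placeholder):

-- Recover the Kafka coordinates of a suspect event for debugging or targeted replay
SELECT kafka_topic, kafka_partition, kafka_offset, kafka_timestamp, data_quality
FROM bronze.finvu_events
WHERE event_id = 'example-event-id'  -- hypothetical event under investigation
ORDER BY ingestion_timestamp DESC
LIMIT 1;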

Event Processing: Minimal & Efficient

Our materialized view focuses on ingestion performance rather than business logic:

-- Engineering-Focused Event Processing
CREATE MATERIALIZED VIEW bronze.mv_kafka_to_finvu_events TO bronze.finvu_events AS
SELECT
    -- Time handling (performance critical)
    toDate(COALESCE(
        parseDateTime64BestEffortOrNull(JSONExtractString(event_data, 'eventTimestamp')),
        _timestamp
    )) AS event_date,
    
    COALESCE(
        parseDateTime64BestEffortOrNull(JSONExtractString(event_data, 'eventTimestamp')),
        _timestamp
    ) AS event_timestamp,
    
    now64(3) AS ingestion_timestamp,
    
    -- Basic event identification (minimal extraction)
    -- JSONExtractString returns '' (not NULL) for missing keys, so nullIf is
    -- required for the COALESCE fallbacks to ever fire
    COALESCE(
        nullIf(JSONExtractString(event_data, 'eventId'), ''),
        nullIf(JSONExtractString(event_data, 'id'), ''),
        toString(generateUUIDv4())
    ) AS event_id,
    
    COALESCE(
        nullIf(JSONExtractString(event_data, 'eventName'), ''),
        'UNKNOWN_EVENT'
    ) AS event_name,
    
    -- Event source classification (for Silver layer routing)
    CASE 
        WHEN JSONExtractString(event_data, 'eventSource') != '' THEN 
            JSONExtractString(event_data, 'eventSource')
        WHEN JSONExtractString(event_data, 'eventName') LIKE 'USER:AA:%' THEN 'AAServer:UserEndpoint'
        WHEN JSONExtractString(event_data, 'eventName') LIKE 'FIU:AA:%' THEN 'AAServer:ConsentFlow'
        WHEN JSONExtractString(event_data, 'eventName') LIKE 'FIP:AA:%' THEN 'AAServer:Notifications'
        WHEN JSONExtractString(event_data, 'eventName') LIKE 'AA:FIP:%' THEN 'AAServer:DataFlow'
        WHEN JSONExtractString(event_data, 'eventName') LIKE 'AA:FIU:%' THEN 'AAServer:DataFlow'
        ELSE 'AAServer:Undefined'
    END AS event_source,
    
    -- Event type (transport classification)
    CASE
        WHEN JSONExtractString(event_data, 'eventType') != '' THEN 
            JSONExtractString(event_data, 'eventType')
        WHEN JSONExtractString(event_data, 'eventName') LIKE '%:AA:%Request' THEN 'HTTP_IN'
        WHEN JSONExtractString(event_data, 'eventName') LIKE 'AA:%:%Request' THEN 'HTTP_OUT'
        WHEN JSONExtractString(event_data, 'eventName') LIKE '%Websocket%' THEN 'WS_IN'
        ELSE 'DEFAULT'
    END AS event_type,
    
    -- Complete event preservation (core Bronze responsibility)
    -- Pass the raw String through unchanged; it is cast to the JSON target
    -- column on insert. An explicit `CAST(...) AS event_data` alias would
    -- shadow the source column and break the JSONExtract calls above
    event_data,
    length(event_data) AS event_data_size,
    
    -- Engineering quality assessment (not business quality)
    CASE 
        WHEN event_data = '' OR NOT isValidJSON(event_data) THEN 'CORRUPTED'
        WHEN length(event_data) < 50 THEN 'PARTIAL'
        ELSE 'CAPTURED'
    END AS data_quality,
    
    -- Error tracking for engineering ops
    CASE
        WHEN NOT isValidJSON(event_data) THEN 'INVALID_JSON'
        WHEN length(event_data) = 0 THEN 'EMPTY_PAYLOAD'
        ELSE ''
    END AS processing_errors,
    
    -- Basic duplicate detection
    CASE
        WHEN JSONExtractString(event_data, 'isDuplicate') = 'true' THEN 1
        ELSE 0
    END AS is_duplicate,
    
    -- Complete Kafka lineage (essential for data engineering)
    _topic AS kafka_topic,
    _partition AS kafka_partition,
    _offset AS kafka_offset,
    _timestamp AS kafka_timestamp,
    _key AS kafka_key,
    
    -- Minimal correlation IDs (for Silver layer efficiency)
    JSONExtractString(event_data, 'txnId') AS txn_id,
    JSONExtractString(event_data, 'sessionId') AS session_id,
    JSONExtractString(event_data, 'requestUrl') AS request_url,
    JSONExtractUInt(event_data, 'responseCode') AS response_code
    
FROM bronze.kafka_finvu_events
WHERE length(event_data) > 10;  -- skip empty/truncated payloads; minimal processing for maximum ingestion performance
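
Once the materialized view is live, a quick smoke test confirms that events are landing and shows the split across quality flags. A sketch (the 15-minute window is illustrative):

-- Smoke test: recent ingestion volume by data quality flag
SELECT
    data_quality,
    count(*) AS events,
    max(ingestion_timestamp) AS latest_ingestion
FROM bronze.finvu_events
WHERE ingestion_timestamp >= now() - INTERVAL 15 MINUTE
GROUP BY data_quality
ORDER BY events DESC;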

Engineering Monitoring Strategy

Focus on infrastructure health and data flow reliability:

graph TB
    subgraph "Engineering Monitoring (Bronze Focus)"
        subgraph "Ingestion Performance"
            THROUGHPUT["Throughput Monitoring<br/>β€’ Events/second ingestion rate<br/>β€’ Peak load handling<br/>β€’ Batch processing efficiency"]
            LAG["Consumer Lag Tracking<br/>β€’ Kafka offset lag<br/>β€’ Processing delays<br/>β€’ Backpressure detection"]
        end
        
        subgraph "Data Quality Engineering"
            CAPTURE["Capture Completeness<br/>β€’ All 43 event types seen<br/>β€’ No missing partitions<br/>β€’ Schema evolution handling"]
            ERRORS["Error Rate Monitoring<br/>β€’ JSON parsing failures<br/>β€’ Processing exceptions<br/>β€’ Duplicate detection rates"]
        end
        
        subgraph "Infrastructure Health"
            KAFKA["Kafka Consumer Health<br/>β€’ Connection stability<br/>β€’ Partition distribution<br/>β€’ Consumer group balance"]
            STORAGE["Storage Performance<br/>β€’ Write throughput<br/>β€’ Compression ratios<br/>β€’ Partition efficiency"]
        end
    end
    
    subgraph "Silver Layer Handoff Quality"
        READINESS["Data Readiness<br/>β€’ Complete event availability<br/>β€’ Proper temporal ordering<br/>β€’ Lineage metadata intact"]
    end
    
    THROUGHPUT --> READINESS
    LAG --> READINESS
    CAPTURE --> READINESS
    ERRORS --> READINESS
    KAFKA --> READINESS
    STORAGE --> READINESS
    
    style THROUGHPUT fill:#e3f2fd
    style CAPTURE fill:#e8f5e8
    style KAFKA fill:#fff3e0
    style READINESS fill:#c8e6c9

Key Monitoring Views

-- 1. Ingestion Performance Monitor
CREATE VIEW bronze.v_ingestion_performance AS
SELECT
    toStartOfHour(ingestion_timestamp) AS hour,
    event_source,
    count(*) AS events_ingested,
    round(count(*) / 3600, 2) AS events_per_second,
    countIf(data_quality = 'CORRUPTED') AS corrupted_events,
    countIf(processing_errors != '') AS error_count,
    max(ingestion_timestamp) AS latest_ingestion,
    dateDiff('minute', max(event_timestamp), max(ingestion_timestamp)) AS lag_minutes
FROM bronze.finvu_events
WHERE ingestion_timestamp >= now() - INTERVAL 24 HOUR
GROUP BY hour, event_source
ORDER BY hour DESC, events_ingested DESC;  -- engineering performance metrics for the Bronze layer

-- 2. Event Type Coverage Monitor
CREATE VIEW bronze.v_event_coverage AS
SELECT
    event_source,
    event_type,
    count(DISTINCT event_name) AS unique_event_types,
    count(*) AS total_events,
    min(event_timestamp) AS first_seen,
    max(event_timestamp) AS last_seen,
    round(avg(event_data_size), 0) AS avg_event_size_bytes
FROM bronze.finvu_events
WHERE ingestion_timestamp >= now() - INTERVAL 24 HOUR
GROUP BY event_source, event_type
ORDER BY total_events DESC;  -- validates complete event-type coverage

-- 3. Kafka Consumer Health Monitor
CREATE VIEW bronze.v_kafka_consumer_health AS
SELECT
    kafka_topic,
    kafka_partition,
    count(*) AS events_consumed,
    max(kafka_offset) AS latest_offset,
    max(kafka_timestamp) AS latest_kafka_timestamp,
    dateDiff('minute', max(kafka_timestamp), max(ingestion_timestamp)) AS consumer_lag_minutes,
    countIf(processing_errors != '') AS processing_errors
FROM bronze.finvu_events
WHERE ingestion_timestamp >= now() - INTERVAL 1 HOUR
GROUP BY kafka_topic, kafka_partition
ORDER BY consumer_lag_minutes DESC;  -- Kafka consumer performance and health monitoring
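
These views can back scheduled alerts directly. Two example checks (sketches; the thresholds mirror the <15 minute lag target and the 43-event-type inventory stated earlier in this document):

-- Alert check 1: partitions breaching the ingestion lag target
SELECT kafka_topic, kafka_partition, consumer_lag_minutes
FROM bronze.v_kafka_consumer_health
WHERE consumer_lag_minutes > 15
ORDER BY consumer_lag_minutes DESC;

-- Alert check 2: event-type coverage gap vs. eventConfig-aa.json
SELECT
    uniqExact(event_name) AS distinct_event_types,
    43 - uniqExact(event_name) AS missing_event_types
FROM bronze.finvu_events
WHERE ingestion_timestamp >= now() - INTERVAL 24 HOUR;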

Silver Layer Integration Strategy

The Bronze Layer prepares data for efficient Silver layer processing:

Data Handoff Design

-- Silver Layer will perform the business logic:
-- 1. Event Correlation & Stitching
SELECT 
    event_data,
    -- Extract business context in Silver (the Bronze column uses the JSON type,
    -- so nested fields are read via subcolumn access rather than JSONExtract):
    event_data.consentHandle AS consent_handle,
    event_data.fiu AS fiu_id,
    event_data.purposeCode AS purpose_code
FROM bronze.finvu_events
WHERE event_source = 'AAServer:ConsentFlow'
  AND event_timestamp >= now() - INTERVAL 1 HOUR;

-- 2. Journey Reconstruction
-- Silver layer will correlate events by consent_handle, session_id, txn_id

-- 3. Business Enrichment
-- Silver layer will apply Finvu-specific business rules and KPIs
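
For journey reconstruction specifically, a correlation query of the following shape can rebuild one transaction's event sequence from Bronze staging data. This is a sketch of Silver-side logic, shown only to illustrate what the handoff enables:

-- Journey reconstruction sketch: order each transaction's events by timestamp
SELECT
    txn_id,
    arrayMap(t -> tupleElement(t, 2),
             arraySort(t -> tupleElement(t, 1),
                       groupArray((event_timestamp, event_name)))) AS event_sequence,
    min(event_timestamp) AS journey_start,
    max(event_timestamp) AS journey_end
FROM bronze.finvu_events
WHERE txn_id != ''
  AND event_date = today()
GROUP BY txn_id
LIMIT 100;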

Clear Separation of Concerns

  • Bronze: “Did we capture the event completely and reliably?”
  • Silver: “What business meaning does this event have in context?”
  • Gold: “How does this impact our KPIs and analytics?”

Implementation Roadmap

Phase 1: Core Ingestion Infrastructure (Week 1)

  • Kafka consumer setup with production-ready configurations
  • Bronze schema deployment optimized for ingestion performance
  • Materialized view with minimal processing overhead
  • Engineering monitoring views (performance, coverage, health)

Phase 2: Production Validation (Week 2)

  • All 43 event types ingesting successfully
  • Performance validation (>10K events/minute sustained)
  • Error handling and recovery procedures tested
  • Complete Kafka lineage validation

Phase 3: Silver Layer Integration (Week 3)

  • Efficient data handoff to Silver layer validated
  • Business logic processing confirmed in Silver
  • End-to-end data flow Bronze → Silver → Gold tested
  • Operational runbooks and monitoring alerts configured

Success Criteria

This engineering-focused Bronze Layer delivers:

Engineering Excellence

  • βœ… Complete Data Capture: All 43 event types ingested without loss
  • βœ… High Performance: >10K events/minute sustained throughput
  • βœ… Reliable Processing: <15 minute ingestion lag consistently
  • βœ… Schema Flexibility: Handles event evolution without breaking
  • βœ… Complete Lineage: Full Kafka metadata for debugging and replay

Silver Layer Enablement

  • βœ… Clean Data Handoff: Well-structured staging data for business logic
  • βœ… Efficient Queries: Optimized partitioning and indexing for Silver processing
  • βœ… Context Preservation: All event data available for business enrichment
  • βœ… Quality Indicators: Clear data quality flags for Silver layer handling

Operational Readiness

  • βœ… Monitoring Coverage: Complete infrastructure and performance monitoring
  • βœ… Error Handling: Robust error detection and recovery procedures
  • βœ… Scalability: Clear scaling path based on production load patterns
  • βœ… Maintainability: Simple, reliable, and well-documented infrastructure

Philosophy

Bronze Layer is the foundation: Our job is to be the reliable, high-performance staging ground that captures everything cleanly and hands it off efficiently to the Silver layer where the real business magic happens.

We focus on engineering excellence in data ingestion, letting the Silver layer focus on business intelligence in data transformation.


This Bronze Layer represents pure engineering discipline: maximum data capture, optimal performance, and complete reliability, setting up the Silver layer for business success.