Star Schema Design ⭐️

erDiagram
    %% CENTRAL FACT TABLE
    fact_consent_journey {
        string consent_handle PK
        date event_date PK
        int32 fiu_key FK
        int32 purpose_code_key FK
        int32 fip_key FK
        date date_key FK
        int32 license_type_key FK
        int32 user_type_key FK
        int32 journey_type_key FK
        int32 fi_type_key FK
        int32 tsp_key FK
        int32 consent_initiated_count
        int32 consent_approved_count
        int32 consent_fulfilled_count
        int32 data_fetch_initiated_count
        int32 data_fetch_fulfilled_count
        decimal revenue_amount
        int32 target_consents_initiated
        int32 target_consents_fulfilled
        decimal deficit_from_target
        int32 journey_duration_minutes
        int32 fip_response_time_ms
        int32 avg_timelag_fi_notification_ms
        decimal customer_lifetime_value_estimate
        decimal market_growth_rate
        decimal conversion_rate_ci_to_cf
        decimal overall_success_rate
        decimal fi_ready_to_request_ratio
        decimal fip_attribution_ratio
        decimal aa_reliability_score
        decimal competitive_position_score
    }

    %% TIER 1 DIMENSIONS - HIGH PRIORITY
    dim_fiu {
        int32 fiu_key PK
        string fiu_name
        string client_type
        string revenue_tier
        date onboarding_date
        string revenue_model
        decimal monthly_target_volume
    }

    dim_purpose_code {
        int32 purpose_code_key PK
        string purpose_code
        string description
        string billing_model
        decimal billing_rate
        string revenue_type
        string frequency_type
    }

    dim_fip {
        int32 fip_key PK
        string fip_name
        string fip_type
        decimal uptime_score
        decimal success_rate
        string performance_tier
        string bank_category
    }

    dim_date {
        date date_key PK
        int32 year
        int32 month
        int32 day
        int32 quarter
        string month_name
        string day_of_week
        boolean is_business_day
        int32 fiscal_year
        int32 fiscal_quarter
    }

    dim_license_type {
        int32 license_type_key PK
        string license_category
        string regulatory_body
        string compliance_level
        string entity_type
        boolean is_regulated
    }

    dim_user_type {
        int32 user_type_key PK
        string user_category
        string experience_level
        boolean has_linked_accounts
        string journey_complexity
        decimal expected_success_rate
    }

    %% TIER 2 DIMENSIONS - PHASE 2
    dim_journey_type {
        int32 journey_type_key PK
        string journey_category
        string assistance_level
        string channel_type
        decimal typical_success_rate
        string target_audience
    }

    dim_fi_type {
        int32 fi_type_key PK
        string fi_category
        string data_type
        string regulatory_classification
        boolean requires_special_handling
        string sahamati_category
    }

    dim_tsp {
        int32 tsp_key PK
        string tsp_name
        string integration_type
        decimal performance_score
        string service_tier
        date partnership_start
    }

    %% RELATIONSHIPS
    fact_consent_journey ||--o{ dim_fiu : "fiu_key"
    fact_consent_journey ||--o{ dim_purpose_code : "purpose_code_key"
    fact_consent_journey ||--o{ dim_fip : "fip_key"
    fact_consent_journey ||--o{ dim_date : "date_key"
    fact_consent_journey ||--o{ dim_license_type : "license_type_key"
    fact_consent_journey ||--o{ dim_user_type : "user_type_key"
    fact_consent_journey ||--o{ dim_journey_type : "journey_type_key"
    fact_consent_journey ||--o{ dim_fi_type : "fi_type_key"
    fact_consent_journey ||--o{ dim_tsp : "tsp_key"

🔸 KIMBALL STEP 1: IDENTIFY BUSINESS PROCESS

REASONING FROM MD-SPACE:

From business_analysis_insights.md:

“OVERVIEW_FUNNEL = 5-stage executive view (CI→CA→CF→DRI→DRF)”
“DETAILED_FUNNEL = 8-stage operational view for drill-down”
“DRILL_DOWN_LATENCY = < 3 seconds from executive funnel to individual error logs”

From customer_success_team_requirements.md:

“DROP_OFF_REDUCTION = primary focus for customer success”
“FIP_PERFORMANCE_ATTRIBUTION = separate FIP issues from AA issues”
“ROOT_CAUSE_ANALYSIS = systematic drill-down from high-level metrics to specific logs”

From dw_design_business_requirements.md:

“DRILL_DOWN_CAPABILITY = 5-stage → 8-stage → error analysis → API events → logs (<3 second latency)”
“FUNNEL_CONVERSION_RATES = CI→CA→CF→DRI→DRF (5-stage executive view)”

From current_pain_points_analysis.md:

“TRUST_ISSUE = ‘data given by us is always under inspection of being right or wrong’”
“PROBLEM = assets often insufficient for new requirements”

From finvu_business_context.md:

“VALUE_PROPOSITION = reliable financial data gateway (low latency, high success rate)”
“SUCCESS_RATE = primary value proposition (reliability focus)”

✅ DECISION RATIONALE:

Highest Business Impact: Funnel optimization directly affects revenue
Multi-Team Usage: Customer Success, Business Team, Tech Team all need this
Clear Success Metrics: <3 second drill-down requirement
Data Availability: Events already captured in Kafka
Trust Building: Addresses core pain point of data reliability and accuracy

🔸 KIMBALL STEP 2: DECLARE THE GRAIN

REASONING FROM MD-SPACE:

From consent_lifecycle_detailed.md:

“CONSENT_HANDLE: temporary identifier for consent session”
“CONSENT_ARTEFACT: approved consent stored on AA (key for data requests)”

From tech_infrastructure_context.md:

“CONSENT_HANDLE = Primary identifier for consent journey”
“SESSION_ID = Multiple session IDs can map to one consent handle”
“EVENT_STITCHING = [Consent Handle → Session ID(s) → Individual Events → Outbound Call Mapping]”

From data_retention_tiering_strategy.md:

“HOT_DATA (HOURLY_ACCESS) = Customer Success Team, Support Team, Tech Team”
“WARM_DATA (DAILY_ACCESS) = Business Team, Sales, Marketing, Account Management”

From revenue_model_business_metrics.md:

“CUSTOMER_IDENTITY_RESOLUTION_ENHANCED: CURRENT_LIMITATION = mobileno@finvu ≠ unique individual user”
“SOLUTION_AVAILABLE = masked account numbers for enhanced deduplication”

✅ GRAIN RATIONALE:

Business Logic: Consent Handle is the atomic business transaction
Daily Aggregation: Supports both real-time (hourly) and daily business needs
Drill-Down Support: Can aggregate up to weekly/monthly, drill down to events
Performance: Daily grain balances detail with <3 second query requirement
Identity Consideration: Grain accommodates future customer identity resolution via masked account numbers

🔸 KIMBALL STEP 3: IDENTIFY DIMENSIONS

DECISION: Priority Dimensions Based on Business Questions

From dw_design_business_requirements.md core business questions:

🥇 TIER 1 DIMENSIONS (Immediate Implementation)

1. dim_fiu - Client Performance Analysis

-- BUSINESS REASONING: Q1: "Which FIU clients generate highest revenue per consent?"
-- FROM md-space: "FIU_WISE_BREAKDOWN = [Navi, Cred, Axis, ICICI, Bajaj Finance, RKB Finance]"
-- FROM md-space: "CLIENT_PORTFOLIO_CONFIRMED = [Navi, Cred, Axis, RKB Finance, ICICI, Bajaj Finance]"

2. dim_purpose_code - Revenue Attribution

-- BUSINESS REASONING: Q3: "How does purpose code mix impact monthly recurring revenue?"
-- FROM md-space: "PURPOSE_CODE_CORRELATION = 103 (lending), 101/102 (PFM)"
-- FROM md-space: "REVENUE_ATTRIBUTION = ₹3/fetch (lending) vs ₹0.10/month (PFM)"
-- FROM md-space: "PURPOSE_CODE_SEGMENTATION_DETAILED = 101/102 (PFM recurring), 103 (LENDING per-fetch)"

3. dim_fip - FIP Health Attribution

-- BUSINESS REASONING: Q15: "How do we separate FIP health issues from AA performance issues?"
-- FROM md-space: "FIP_HEALTH_ATTRIBUTION = separate FIP vs AA performance issues"
-- FROM md-space: "PRIMARY_SUCCESS_RATE_DRIVER = FIP functional uptime"
-- FROM md-space: "CHALLENGE = FIPs not highly functionally available"

4. dim_date - Time Analysis

-- BUSINESS REASONING: Daily/MTD tracking requirements
-- FROM md-space: "DAY_ON_DAY_TRACKING = short-term trend identification"
-- FROM md-space: "MONTH_TO_DATE_ANALYSIS = progress toward monthly goals"
-- FROM md-space: "TARGET_VS_ACTUAL = variance analysis for course correction"

5. dim_license_type - Regulatory Categorization

-- BUSINESS REASONING: Q16: "Can we automate Sahamati daily/monthly reporting?"
-- FROM md-space: "LICENSE_TYPE_DIMENSION = SEBI, RBI, other regulatory categorization"
-- FROM md-space: "LICENSE_TYPE = [SEBI, RBI, Others] regulatory categorization"
-- FROM md-space: "CLIENT_TYPES = [Regulated_FIUs, LSPs(Loan_Service_Providers)]"

6. dim_user_type - User Behavior Analysis

-- BUSINESS REASONING: Q2: "What is the cost-per-acquisition by journey type?"
-- FROM md-space: "USER_CLASSIFICATION_LOGIC: NEW_USERS = users with finvu handle created BUT no bank accounts linked"
-- FROM md-space: "EXISTING_USERS = users with at least one account already linked on finvu"
-- FROM md-space: "USER_SEGMENTATION = new vs existing user performance comparison"

🥈 TIER 2 DIMENSIONS (Phase 2 Implementation)

7. dim_journey_type - Performance Comparison

-- BUSINESS REASONING: Q13: "How do assisted vs DIY journeys perform by time/geography?"
-- FROM md-space: "JOURNEY_TYPE_DIMENSION = assisted vs DIY performance comparison"
-- FROM md-space: "ASSISTED_JOURNEYS = bank branch RMs, higher success rates"
-- FROM md-space: "DIY_JOURNEYS = PFM users, educated audience, lower success rates"

8. dim_fi_type - Regulatory Reporting

-- BUSINESS REASONING: Sahamati FI Type wise breakdown requirements
-- FROM md-space: "FI_TYPE_BREAKDOWN = deposit, mutual fund, insurance, etc."
-- FROM md-space: "SAHAMATI_REPORTING = FI Type wise breakdown required"
-- FROM md-space: "DATA_TYPES = [DEPOSIT, TERM_DEPOSIT, MUTUAL_FUNDS, ETF, etc.]"

9. dim_tsp - Technical Service Provider Segmentation

-- BUSINESS REASONING: Technical integration performance analysis
-- FROM md-space: "TSP_DIMENSION = technical service provider segmentation"
-- FROM md-space: "TSP_WISE = technical service provider segmentation"

✅ DIMENSION PRIORITY RATIONALE:

Tier 1: Directly answers revenue optimization, operational excellence, and regulatory compliance questions
Tier 2: Supports advanced analytics and deeper segmentation
Performance: Fewer dimensions initially = faster queries
Iterative: Add dimensions based on actual usage patterns
Complete Coverage: Addresses all critical business aspects from md-space

🔸 KIMBALL STEP 4: IDENTIFY FACTS

DECISION: Fact Types Based on Business Metrics

From business_analysis_insights.md key metrics:

🎯 ADDITIVE FACTS (Can be summed across all dimensions)

-- FROM md-space: "CI_GROWTH_RATE = Consent Initiated growth tracking"
consent_initiated_count Int32,

-- FROM md-space: "CONSENT_SR_TRACKING = Success Rate monitoring"  
consent_approved_count Int32,
consent_fulfilled_count Int32,

-- FROM md-space: "DF_SR_TRACKING = Data Fetch Success Rate"
data_fetch_initiated_count Int32,
data_fetch_fulfilled_count Int32,

-- FROM md-space: "REVENUE_ATTRIBUTION = ₹3/fetch (lending) vs ₹0.10/month (PFM)"
revenue_amount Decimal(10,2),

-- FROM md-space: "SURPLUS_CALCULATION = actual performance vs target variance"
target_consents_initiated Int32,
target_consents_fulfilled Int32,

-- FROM md-space: "DAILY_DEFICIT_TRACKING = daily shortfall from monthly targets"
deficit_from_target Decimal(10,2)

🎯 SEMI-ADDITIVE FACTS (Can be averaged, not summed across time)

-- FROM md-space: "Journey duration for performance analysis"
journey_duration_minutes Int32,

-- FROM md-space: "FIP_HEALTH_SCORING = provider-specific uptime"
fip_response_time_ms Int32,

-- FROM md-space: "SAHAMATI_REPORTING: DAILY_METRICS = [timelag metrics, FI ready ratios]"
avg_timelag_fi_notification_ms Int32,

-- FROM md-space: "CUSTOMER_IDENTITY_RESOLUTION_ENHANCED"
customer_lifetime_value_estimate Decimal(10,2),

-- FROM md-space: "GROWTH_RATE_TRACKING = 2x market rate achievement monitoring"
market_growth_rate Decimal(5,4)

🎯 NON-ADDITIVE FACTS (Calculated at query time)

-- FROM md-space: "FUNNEL_CONVERSION_RATES = CI→CA→CF→DRI→DRF"
conversion_rate_ci_to_cf Decimal(5,4),

-- FROM md-space: "SUCCESS_RATE = primary value proposition"
overall_success_rate Decimal(5,4),

-- FROM md-space: "FI_READY_to_FI_request_ratio"
fi_ready_to_request_ratio Decimal(5,4),

-- FROM md-space: "FIP_ATTRIBUTION_RATIO = fip_caused_failures / total_failures"
fip_attribution_ratio Decimal(5,4),

-- FROM md-space: "AA_RELIABILITY_SCORE = (total_successful - fip_caused_failures) / (total_attempts - fip_caused_failures)"
aa_reliability_score Decimal(5,4),

-- FROM md-space: "COMPETITIVE_GAP = onemoneyaa_share - finvu_share"
competitive_position_score Decimal(5,2)

✅ FACT SELECTION RATIONALE:

Business Alignment: Each fact answers specific business questions from md-space
Kimball Compliance: Proper additive/semi-additive/non-additive classification
Performance: Atomic facts enable flexible aggregation
Extensibility: Can add derived facts without schema changes
Target Tracking: Supports deficit alerting and growth monitoring requirements
Regulatory Compliance: Includes Sahamati/RBI automated reporting metrics
Competitive Analysis: Supports market positioning and competitive benchmarking

📋 COMPREHENSIVE BUSINESS ALIGNMENT

✅ ADDRESSES FROM MD-SPACE:

Revenue optimization (Q1-Q5 from business requirements)
Funnel analysis (5-stage + 8-stage drill-down capability)
FIP health attribution (separate FIP vs AA performance issues)
Regulatory compliance (Sahamati/RBI automation)
Customer success drill-down (<3 second latency requirement)
Growth tracking (2x market rate target monitoring)
Target variance analysis (daily deficit tracking)
Customer identity resolution (enhanced deduplication support)
Competitive benchmarking (market positioning analysis)

🔧 GAPS IDENTIFIED & ADDRESSED:

Added regulatory dimensions (license_type for SEBI/RBI categorization)
Added user behavior dimensions (user_type for new vs existing analysis)
Added target tracking facts (deficit monitoring and growth targets)
Added regulatory compliance facts (automated Sahamati/RBI reporting)
Added competitive analysis facts (market positioning metrics)

📋 NEXT IMMEDIATE ACTION:

Should we now:

Design the detailed star schema DDL for Tier 1 dimensions?
Map specific Kafka events to our grain and facts?
Create dbt model structure for the transformations?

Based on Holistics’ guidance on “Model As And When You Need To”, I recommend we start with #1 - detailed star schema DDL for immediate implementation.