Star Schema Design ⭐️
erDiagram
%% CENTRAL FACT TABLE
fact_consent_journey {
string consent_handle PK
date event_date PK
int32 fiu_key FK
int32 purpose_code_key FK
int32 fip_key FK
date date_key FK
int32 license_type_key FK
int32 user_type_key FK
int32 journey_type_key FK
int32 fi_type_key FK
int32 tsp_key FK
int32 consent_initiated_count
int32 consent_approved_count
int32 consent_fulfilled_count
int32 data_fetch_initiated_count
int32 data_fetch_fulfilled_count
decimal revenue_amount
int32 target_consents_initiated
int32 target_consents_fulfilled
decimal deficit_from_target
int32 journey_duration_minutes
int32 fip_response_time_ms
int32 avg_timelag_fi_notification_ms
decimal customer_lifetime_value_estimate
decimal market_growth_rate
decimal conversion_rate_ci_to_cf
decimal overall_success_rate
decimal fi_ready_to_request_ratio
decimal fip_attribution_ratio
decimal aa_reliability_score
decimal competitive_position_score
}
%% TIER 1 DIMENSIONS - HIGH PRIORITY
dim_fiu {
int32 fiu_key PK
string fiu_name
string client_type
string revenue_tier
date onboarding_date
string revenue_model
decimal monthly_target_volume
}
dim_purpose_code {
int32 purpose_code_key PK
string purpose_code
string description
string billing_model
decimal billing_rate
string revenue_type
string frequency_type
}
dim_fip {
int32 fip_key PK
string fip_name
string fip_type
decimal uptime_score
decimal success_rate
string performance_tier
string bank_category
}
dim_date {
date date_key PK
int32 year
int32 month
int32 day
int32 quarter
string month_name
string day_of_week
boolean is_business_day
int32 fiscal_year
int32 fiscal_quarter
}
dim_license_type {
int32 license_type_key PK
string license_category
string regulatory_body
string compliance_level
string entity_type
boolean is_regulated
}
dim_user_type {
int32 user_type_key PK
string user_category
string experience_level
boolean has_linked_accounts
string journey_complexity
decimal expected_success_rate
}
%% TIER 2 DIMENSIONS - PHASE 2
dim_journey_type {
int32 journey_type_key PK
string journey_category
string assistance_level
string channel_type
decimal typical_success_rate
string target_audience
}
dim_fi_type {
int32 fi_type_key PK
string fi_category
string data_type
string regulatory_classification
boolean requires_special_handling
string sahamati_category
}
dim_tsp {
int32 tsp_key PK
string tsp_name
string integration_type
decimal performance_score
string service_tier
date partnership_start
}
%% RELATIONSHIPS
fact_consent_journey ||--o{ dim_fiu : "fiu_key"
fact_consent_journey ||--o{ dim_purpose_code : "purpose_code_key"
fact_consent_journey ||--o{ dim_fip : "fip_key"
fact_consent_journey ||--o{ dim_date : "date_key"
fact_consent_journey ||--o{ dim_license_type : "license_type_key"
fact_consent_journey ||--o{ dim_user_type : "user_type_key"
fact_consent_journey ||--o{ dim_journey_type : "journey_type_key"
fact_consent_journey ||--o{ dim_fi_type : "fi_type_key"
fact_consent_journey ||--o{ dim_tsp : "tsp_key"
🔸 KIMBALL STEP 1: IDENTIFY BUSINESS PROCESS
DECISION: Focus on “Consent Journey Funnel” as Primary Business Process
REASONING FROM MD-SPACE:
From business_analysis_insights.md:
- “OVERVIEW_FUNNEL = 5-stage executive view (CI→CA→CF→DRI→DRF)”
- “DETAILED_FUNNEL = 8-stage operational view for drill-down”
- “DRILL_DOWN_LATENCY = < 3 seconds from executive funnel to individual error logs”
From customer_success_team_requirements.md:
- “DROP_OFF_REDUCTION = primary focus for customer success”
- “FIP_PERFORMANCE_ATTRIBUTION = separate FIP issues from AA issues”
- “ROOT_CAUSE_ANALYSIS = systematic drill-down from high-level metrics to specific logs”
From dw_design_business_requirements.md:
- “DRILL_DOWN_CAPABILITY = 5-stage → 8-stage → error analysis → API events → logs (<3 second latency)”
- “FUNNEL_CONVERSION_RATES = CI→CA→CF→DRI→DRF (5-stage executive view)”
From current_pain_points_analysis.md:
- “TRUST_ISSUE = ‘data given by us is always under inspection of being right or wrong’”
- “PROBLEM = assets often insufficient for new requirements”
From finvu_business_context.md:
- “VALUE_PROPOSITION = reliable financial data gateway (low latency, high success rate)”
- “SUCCESS_RATE = primary value proposition (reliability focus)”
✅ DECISION RATIONALE:
- Highest Business Impact: Funnel optimization directly affects revenue
- Multi-Team Usage: Customer Success, Business Team, Tech Team all need this
- Clear Success Metrics: <3 second drill-down requirement
- Data Availability: Events already captured in Kafka
- Trust Building: Addresses core pain point of data reliability and accuracy
🔸 KIMBALL STEP 2: DECLARE THE GRAIN
DECISION: One row per Consent Handle per Day
REASONING FROM MD-SPACE:
From consent_lifecycle_detailed.md:
- “CONSENT_HANDLE: temporary identifier for consent session”
- “CONSENT_ARTEFACT: approved consent stored on AA (key for data requests)”
From tech_infrastructure_context.md:
- “CONSENT_HANDLE = Primary identifier for consent journey”
- “SESSION_ID = Multiple session IDs can map to one consent handle”
- “EVENT_STITCHING = [Consent Handle → Session ID(s) → Individual Events → Outbound Call Mapping]”
From data_retention_tiering_strategy.md:
- “HOT_DATA (HOURLY_ACCESS) = Customer Success Team, Support Team, Tech Team”
- “WARM_DATA (DAILY_ACCESS) = Business Team, Sales, Marketing, Account Management”
From revenue_model_business_metrics.md:
- “CUSTOMER_IDENTITY_RESOLUTION_ENHANCED: CURRENT_LIMITATION = mobileno@finvu ≠ unique individual user”
- “SOLUTION_AVAILABLE = masked account numbers for enhanced deduplication”
✅ GRAIN RATIONALE:
- Business Logic: Consent Handle is the atomic business transaction
- Daily Aggregation: Supports both real-time (hourly) and daily business needs
- Drill-Down Support: Can aggregate up to weekly/monthly, drill down to events
- Performance: Daily grain balances detail with <3 second query requirement
- Identity Consideration: Grain accommodates future customer identity resolution via masked account numbers
🔸 KIMBALL STEP 3: IDENTIFY DIMENSIONS
DECISION: Priority Dimensions Based on Business Questions
From dw_design_business_requirements.md core business questions:
🥇 TIER 1 DIMENSIONS (Immediate Implementation)
1. dim_fiu - Client Performance Analysis
-- BUSINESS REASONING: Q1: "Which FIU clients generate highest revenue per consent?"
-- FROM md-space: "FIU_WISE_BREAKDOWN = [Navi, Cred, Axis, ICICI, Bajaj Finance, RKB Finance]"
-- FROM md-space: "CLIENT_PORTFOLIO_CONFIRMED = [Navi, Cred, Axis, RKB Finance, ICICI, Bajaj Finance]"
2. dim_purpose_code - Revenue Attribution
-- BUSINESS REASONING: Q3: "How does purpose code mix impact monthly recurring revenue?"
-- FROM md-space: "PURPOSE_CODE_CORRELATION = 103 (lending), 101/102 (PFM)"
-- FROM md-space: "REVENUE_ATTRIBUTION = ₹3/fetch (lending) vs ₹0.10/month (PFM)"
-- FROM md-space: "PURPOSE_CODE_SEGMENTATION_DETAILED = 101/102 (PFM recurring), 103 (LENDING per-fetch)"
3. dim_fip - FIP Health Attribution
-- BUSINESS REASONING: Q15: "How do we separate FIP health issues from AA performance issues?"
-- FROM md-space: "FIP_HEALTH_ATTRIBUTION = separate FIP vs AA performance issues"
-- FROM md-space: "PRIMARY_SUCCESS_RATE_DRIVER = FIP functional uptime"
-- FROM md-space: "CHALLENGE = FIPs not highly functionally available"
4. dim_date - Time Analysis
-- BUSINESS REASONING: Daily/MTD tracking requirements
-- FROM md-space: "DAY_ON_DAY_TRACKING = short-term trend identification"
-- FROM md-space: "MONTH_TO_DATE_ANALYSIS = progress toward monthly goals"
-- FROM md-space: "TARGET_VS_ACTUAL = variance analysis for course correction"
5. dim_license_type - Regulatory Categorization
-- BUSINESS REASONING: Q16: "Can we automate Sahamati daily/monthly reporting?"
-- FROM md-space: "LICENSE_TYPE_DIMENSION = SEBI, RBI, other regulatory categorization"
-- FROM md-space: "LICENSE_TYPE = [SEBI, RBI, Others] regulatory categorization"
-- FROM md-space: "CLIENT_TYPES = [Regulated_FIUs, LSPs(Loan_Service_Providers)]"
6. dim_user_type - User Behavior Analysis
-- BUSINESS REASONING: Q2: "What is the cost-per-acquisition by journey type?"
-- FROM md-space: "USER_CLASSIFICATION_LOGIC: NEW_USERS = users with finvu handle created BUT no bank accounts linked"
-- FROM md-space: "EXISTING_USERS = users with at least one account already linked on finvu"
-- FROM md-space: "USER_SEGMENTATION = new vs existing user performance comparison"
🥈 TIER 2 DIMENSIONS (Phase 2 Implementation)
7. dim_journey_type - Performance Comparison
-- BUSINESS REASONING: Q13: "How do assisted vs DIY journeys perform by time/geography?"
-- FROM md-space: "JOURNEY_TYPE_DIMENSION = assisted vs DIY performance comparison"
-- FROM md-space: "ASSISTED_JOURNEYS = bank branch RMs, higher success rates"
-- FROM md-space: "DIY_JOURNEYS = PFM users, educated audience, lower success rates"
8. dim_fi_type - Regulatory Reporting
-- BUSINESS REASONING: Sahamati FI Type wise breakdown requirements
-- FROM md-space: "FI_TYPE_BREAKDOWN = deposit, mutual fund, insurance, etc."
-- FROM md-space: "SAHAMATI_REPORTING = FI Type wise breakdown required"
-- FROM md-space: "DATA_TYPES = [DEPOSIT, TERM_DEPOSIT, MUTUAL_FUNDS, ETF, etc.]"
9. dim_tsp - Technical Service Provider Segmentation
-- BUSINESS REASONING: Technical integration performance analysis
-- FROM md-space: "TSP_DIMENSION = technical service provider segmentation"
-- FROM md-space: "TSP_WISE = technical service provider segmentation"
✅ DIMENSION PRIORITY RATIONALE:
- Tier 1: Directly answers revenue optimization, operational excellence, and regulatory compliance questions
- Tier 2: Supports advanced analytics and deeper segmentation
- Performance: Fewer dimensions initially = faster queries
- Iterative: Add dimensions based on actual usage patterns
- Complete Coverage: Addresses all critical business aspects from md-space
🔸 KIMBALL STEP 4: IDENTIFY FACTS
DECISION: Fact Types Based on Business Metrics
From business_analysis_insights.md key metrics:
🎯 ADDITIVE FACTS (Can be summed across all dimensions)
-- FROM md-space: "CI_GROWTH_RATE = Consent Initiated growth tracking"
consent_initiated_count Int32,
-- FROM md-space: "CONSENT_SR_TRACKING = Success Rate monitoring"
consent_approved_count Int32,
consent_fulfilled_count Int32,
-- FROM md-space: "DF_SR_TRACKING = Data Fetch Success Rate"
data_fetch_initiated_count Int32,
data_fetch_fulfilled_count Int32,
-- FROM md-space: "REVENUE_ATTRIBUTION = ₹3/fetch (lending) vs ₹0.10/month (PFM)"
revenue_amount Decimal(10,2),
-- FROM md-space: "SURPLUS_CALCULATION = actual performance vs target variance"
target_consents_initiated Int32,
target_consents_fulfilled Int32,
-- FROM md-space: "DAILY_DEFICIT_TRACKING = daily shortfall from monthly targets"
deficit_from_target Decimal(10,2)
🎯 SEMI-ADDITIVE FACTS (Can be averaged, not summed across time)
-- FROM md-space: "Journey duration for performance analysis"
journey_duration_minutes Int32,
-- FROM md-space: "FIP_HEALTH_SCORING = provider-specific uptime"
fip_response_time_ms Int32,
-- FROM md-space: "SAHAMATI_REPORTING: DAILY_METRICS = [timelag metrics, FI ready ratios]"
avg_timelag_fi_notification_ms Int32,
-- FROM md-space: "CUSTOMER_IDENTITY_RESOLUTION_ENHANCED"
customer_lifetime_value_estimate Decimal(10,2),
-- FROM md-space: "GROWTH_RATE_TRACKING = 2x market rate achievement monitoring"
market_growth_rate Decimal(5,4)
🎯 NON-ADDITIVE FACTS (Calculated at query time)
-- FROM md-space: "FUNNEL_CONVERSION_RATES = CI→CA→CF→DRI→DRF"
conversion_rate_ci_to_cf Decimal(5,4),
-- FROM md-space: "SUCCESS_RATE = primary value proposition"
overall_success_rate Decimal(5,4),
-- FROM md-space: "FI_READY_to_FI_request_ratio"
fi_ready_to_request_ratio Decimal(5,4),
-- FROM md-space: "FIP_ATTRIBUTION_RATIO = fip_caused_failures / total_failures"
fip_attribution_ratio Decimal(5,4),
-- FROM md-space: "AA_RELIABILITY_SCORE = (total_successful - fip_caused_failures) / (total_attempts - fip_caused_failures)"
aa_reliability_score Decimal(5,4),
-- FROM md-space: "COMPETITIVE_GAP = onemoneyaa_share - finvu_share"
competitive_position_score Decimal(5,2)
✅ FACT SELECTION RATIONALE:
- Business Alignment: Each fact answers specific business questions from md-space
- Kimball Compliance: Proper additive/semi-additive/non-additive classification
- Performance: Atomic facts enable flexible aggregation
- Extensibility: Can add derived facts without schema changes
- Target Tracking: Supports deficit alerting and growth monitoring requirements
- Regulatory Compliance: Includes Sahamati/RBI automated reporting metrics
- Competitive Analysis: Supports market positioning and competitive benchmarking
📋 COMPREHENSIVE BUSINESS ALIGNMENT
✅ ADDRESSES FROM MD-SPACE:
- Revenue optimization (Q1-Q5 from business requirements)
- Funnel analysis (5-stage + 8-stage drill-down capability)
- FIP health attribution (separate FIP vs AA performance issues)
- Regulatory compliance (Sahamati/RBI automation)
- Customer success drill-down (<3 second latency requirement)
- Growth tracking (2x market rate target monitoring)
- Target variance analysis (daily deficit tracking)
- Customer identity resolution (enhanced deduplication support)
- Competitive benchmarking (market positioning analysis)
🔧 GAPS IDENTIFIED & ADDRESSED:
- Added regulatory dimensions (license_type for SEBI/RBI categorization)
- Added user behavior dimensions (user_type for new vs existing analysis)
- Added target tracking facts (deficit monitoring and growth targets)
- Added regulatory compliance facts (automated Sahamati/RBI reporting)
- Added competitive analysis facts (market positioning metrics)
📋 NEXT IMMEDIATE ACTION:
Should we now:
- Design the detailed star schema DDL for Tier 1 dimensions?
- Map specific Kafka events to our grain and facts?
- Create dbt model structure for the transformations?
Based on Holistics’ guidance on “Model As And When You Need To”, I recommend we start with #1 - detailed star schema DDL for immediate implementation.