Finvu Data Warehouse Documentation


Data Retention & Tiering Strategy


Strategic Data Lifecycle Management

Consumer-driven data tiering strategy aligned with business requirements and regulatory compliance

👥 Consumer-Based Data Requirements

HOT_DATA (HOURLY_ACCESS)

CONSUMERS = [Customer Success Team, Support Team, Tech Team]
USE_CASES = [FIP health monitoring, real-time error analysis, system alerts]
DIMENSIONS = [TSP, Purpose Code, FI Type, License Type, Journey Type, User Type]
LATENCY = < 3 seconds for drill-down analysis
RETENTION = 7 days in high-performance storage

WARM_DATA (DAILY_ACCESS)

CONSUMERS = [Business Team, Sales, Marketing, Account Management]
USE_CASES = [daily funnel reports, growth tracking, client performance]
DELIVERY_SCHEDULE = every morning before business hours
RETENTION = 90 days in standard storage

COOL_DATA (MONTHLY_ACCESS)

CONSUMERS = [Executive Team, Investors]
USE_CASES = [monthly investor meetings, strategic planning, competitive analysis]
DELIVERY_SCHEDULE = 1st week of every month
RETENTION = 3 months in standard storage

COLD_DATA (YEARLY/ADHOC_ACCESS)

CONSUMERS = [RBI, Sahamati, Audit Entities, Regulators]
USE_CASES = [lifetime consent metrics, regulatory compliance, audit trails]
QUERIES = ["Lifetime consents number", "Lifetime consents fulfilled", compliance reporting]
RETENTION = 7+ years in archive storage (regulatory compliance)
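
Taken together, the four tiers above form a small policy table. A minimal sketch of how that table could be encoded for routing logic (the TierPolicy type, field names, and the cumulative age boundaries are illustrative, not an existing Finvu module):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TierPolicy:
        # One row of the tiering table above (illustrative type).
        tier: str
        access_pattern: str
        max_age_days: int          # cumulative age boundary; COLD is open-ended
        consumers: tuple

    TIER_POLICIES = (
        TierPolicy("HOT",  "hourly",  7,   ("Customer Success", "Support", "Tech")),
        TierPolicy("WARM", "daily",   90,  ("Business", "Sales", "Marketing", "Account Management")),
        # Assumption: the COOL window ("3 months") is read as 3 months beyond WARM.
        TierPolicy("COOL", "monthly", 180, ("Executive", "Investors")),
        TierPolicy("COLD", "yearly/adhoc", 7 * 365, ("RBI", "Sahamati", "Auditors", "Regulators")),
    )

    def tier_for_age(age_days: int) -> str:
        # Route a record to the first tier whose window still covers its age.
        for policy in TIER_POLICIES:
            if age_days <= policy.max_age_days:
                return policy.tier
        return "COLD"   # 7+ years: stays in archive until regulatory expiry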

ECOSYSTEM_SCALE_CONTEXT

Based on Sahamati dashboard data:
CONSENT_VOLUME = ~16.2M monthly (as of latest data)
ACCOUNT_LINKING = ~9.8M monthly (as of latest data)
GROWTH_TREND = exponential, from ~1.6M to ~16.2M monthly consents
FINVU_SHARE = estimated top 3 position in this ecosystem
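
As a rough sizing aid, the monthly ecosystem figures above convert to per-day rates with simple arithmetic (the per-event row size below is a placeholder assumption, not a measured figure):

    # Back-of-envelope ingest rates from the Sahamati figures quoted above.
    MONTHLY_CONSENTS = 16_200_000    # ~16.2M consents/month
    MONTHLY_LINKINGS = 9_800_000     # ~9.8M account linkings/month

    daily_consents = MONTHLY_CONSENTS / 30   # ~540K/day
    daily_linkings = MONTHLY_LINKINGS / 30   # ~327K/day

    # Placeholder assumption: ~1 KB per event row before compression.
    BYTES_PER_EVENT = 1_024
    hot_tier_bytes = (daily_consents + daily_linkings) * 7 * BYTES_PER_EVENT
    print(f"hot tier (7 days) ~= {hot_tier_bytes / 1e9:.1f} GB uncompressed")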

STORAGE_TIERING_ARCHITECTURE

HOT_TIER = ClickHouse main cluster (SSD storage, high compute)
WARM_TIER = ClickHouse with compression (balanced storage/compute)
COOL_TIER = ClickHouse with heavy compression (storage optimized)
COLD_TIER = Object storage (S3/GCS) with ClickHouse external tables
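
In ClickHouse, this layout maps naturally onto TTL-driven moves between volumes of a storage policy, with the cold volume backed by an S3 disk in the server's storage configuration. A sketch of the DDL, kept as a Python string so it can be reviewed before applying (table, column, volume, and policy names are assumptions; the intervals mirror the retention windows above):

    # Illustrative ClickHouse DDL for TTL-driven tier moves. All identifiers
    # (aa_events, volume and policy names) are assumptions, not Finvu's schema.
    TIERED_EVENTS_DDL = """
    CREATE TABLE aa_events
    (
        event_date   Date,
        fip_id       String,
        purpose_code LowCardinality(String),
        payload      String
    )
    ENGINE = MergeTree
    PARTITION BY toYYYYMM(event_date)
    ORDER BY (event_date, fip_id)
    -- Parts migrate between volumes as data ages, mirroring the tiers above.
    TTL event_date + INTERVAL 7 DAY TO VOLUME 'warm',
        event_date + INTERVAL 90 DAY TO VOLUME 'cool',
        event_date + INTERVAL 180 DAY TO VOLUME 'cold_s3'
    SETTINGS storage_policy = 'hot_warm_cool_cold'
    """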

DEVOPS_RETENTION_POLICIES

SERVER_LOGS

RETENTION_PERIOD = 6-12 months
STORAGE_TYPE = Standard log management
PURPOSE = System debugging, performance monitoring

APPLICATION_LOGS

RETENTION_PERIOD = 7-10 years
STORAGE_TYPE = Long-term archive
PURPOSE = Audit trails, regulatory compliance, business intelligence
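
A periodic retention sweep for these two log classes could look like the sketch below (directory paths are placeholders; the 12-month server-log bound is the upper end of the stated window, and the 1-year archive trigger for application logs is an assumption, with the 7-10 year retention then enforced in the archive store):

    import time
    from pathlib import Path

    DAY = 86_400
    SERVER_LOG_MAX_AGE = 365 * DAY    # upper bound of the 6-12 month window
    APP_LOG_ARCHIVE_AGE = 365 * DAY   # assumption: ship to archive after 1 year

    def sweep(log_dir: str, max_age: int, action: str) -> None:
        # Dry-run sweep: report files past their retention window.
        now = time.time()
        for path in Path(log_dir).glob("*.log"):
            if now - path.stat().st_mtime > max_age:
                print(f"{action}: {path}")

    sweep("/var/log/finvu/server", SERVER_LOG_MAX_AGE, "delete")              # placeholder path
    sweep("/var/log/finvu/app", APP_LOG_ARCHIVE_AGE, "archive-then-delete")   # placeholder path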

DATA_SECURITY_IMPLEMENTATION

ENCRYPTION_AT_REST = Disk encryption enabled
PII_HANDLING = All PII data masked
DATA_BLIND_COMPLIANCE = Maintained across all systems
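
One common way to keep "all PII masked" while preserving the ability to join and group is deterministic keyed hashing. A minimal sketch (the field list and the environment-variable key source are assumptions):

    import hashlib
    import hmac
    import os

    # Assumption: masking key held in a secrets manager, injected via environment.
    MASKING_KEY = os.environ["PII_MASKING_KEY"].encode()

    PII_FIELDS = {"customer_name", "customer_mobile", "account_number"}  # illustrative list

    def mask(value: str) -> str:
        # Deterministic, non-reversible token: equal inputs map to equal tokens,
        # so masked columns can still be joined/grouped without exposing raw PII.
        return hmac.new(MASKING_KEY, value.encode(), hashlib.sha256).hexdigest()[:16]

    def mask_record(record: dict) -> dict:
        return {k: (mask(v) if k in PII_FIELDS else v) for k, v in record.items()}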

DISASTER_RECOVERY_STRATEGY

CURRENT_DR_SETUP

ARCHITECTURE = Hot-Hot replication for LogDB (Cassandra)
SECONDARY_SITE = Automatic failover capability
ANALYTICS_CONTINUITY = MR/Funnels run from secondary site if primary down
MONTHLY_PROCESSES = [Billing, Reports] - run after primary site recovery
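
Hot-hot replication in Cassandra is normally expressed as a multi-datacenter keyspace using NetworkTopologyStrategy, so the secondary site holds live replicas rather than a standby copy. A sketch of what the LogDB keyspace definition might look like (keyspace name, datacenter names, and replica counts are assumptions):

    # Illustrative CQL for a two-site, hot-hot LogDB keyspace. Datacenter names
    # and replication factors are assumptions, not Finvu's actual topology.
    LOGDB_KEYSPACE_CQL = """
    -- Live replicas in both datacenters: either site can serve reads and writes.
    CREATE KEYSPACE IF NOT EXISTS logdb
    WITH replication = {
        'class': 'NetworkTopologyStrategy',
        'dc_primary': 3,
        'dc_secondary': 3
    };
    """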

DR_EXPECTATIONS

PRIMARY_SITE_RECOVERY = Target < 1 month (critical business operations)
DATA_AVAILABILITY = Immediate via hot-hot replication
BUSINESS_CONTINUITY = Analytics uninterrupted, billing deferred

COMPLIANCE_AND_AUDIT_REQUIREMENTS

REGULATORY_BODIES = [RBI, Sahamati, Industry Auditors]
AUDIT_TRAIL_DURATION = 7+ years minimum
DATA_LINEAGE = Complete event trail preservation
COMPLIANCE_QUERIES = ["Lifetime consents", "Lifetime fulfillments", "Transaction audit trails"]
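
The lifetime compliance figures above are served from the archive tier. A sketch of the first two queries as ClickHouse SQL over the long-term event table (table and column names are illustrative):

    # Illustrative ClickHouse SQL for the lifetime compliance figures above.
    # Table and column names are assumptions; the archive tier is reachable
    # through the same table via TTL moves, or through an S3-backed table.
    LIFETIME_CONSENT_METRICS_SQL = """
    SELECT
        count()                       AS lifetime_consents,
        countIf(status = 'FULFILLED') AS lifetime_consents_fulfilled
    FROM aa_events
    WHERE event_type = 'CONSENT'
    """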