Marketing teams need to send relevant offers within seconds of customer transactions (e.g., 10% off groceries). The system must be reliable, low-cost, and privacy-aware with PII minimization.
Architecture
Built a production-ready streaming architecture:
Kafka (Confluent) → Spark Structured Streaming → S3 (Parquet, partitioned by date) + SQS → Lambda for notifications
Key Features
Real-time Processing: 30-60 second end-to-end latency
PII Protection: Hashed phone numbers in data lake, raw only in notification path
Fault Tolerance: Spark checkpointing for exactly-once processing
Dual-Sink Pattern: Parallel writes to analytics (S3) and notifications (SQS)
Implementation
Broadcast Joins: Efficient offer matching using broadcast variables
Partitioned Storage: Date-based partitioning for optimal query performance
Privacy by Design: SHA-256 phone hashing for lake storage
Serverless Notifications: SQS + Lambda for scalable SMS delivery
Results & Impact
High Throughput: 10K+ transactions per minute processing capability
Low Latency: P95 < 60 seconds for real-time offer delivery
Cost Efficient: ~$50/month operational costs with serverless architecture
Production Ready: Comprehensive error handling and monitoring