Customer Experience Analytics for Fintech Apps by Natnael Yilma

Customer Experience Analytics for Fintech Apps

An analytics project that collects and analyzes customer reviews of Ethiopian mobile banking applications from the Google Play Store.

Project Overview

This project collects, analyzes, and provides insights on customer reviews for three major Ethiopian banking apps:
Commercial Bank of Ethiopia (CBE) - Mobile Banking App
Bank of Abyssinia (BOA) - Mobile Banking App
Dashen Bank - Amole App
The project is organized into multiple tasks focusing on data collection, sentiment analysis, thematic analysis, database engineering, and actionable insights.

📁 Project Structure


🚀 Getting Started

Prerequisites

Python 3.8 or higher
pip (Python package manager)
Git

Installation

Clone the repository

Create and activate virtual environment

Install dependencies

📊 Task 1: Data Collection and Preprocessing

Overview

Task 1 involves scraping Google Play Store reviews for three Ethiopian banking apps, cleaning and preprocessing the data, and creating a structured dataset.

Objectives

✅ Scrape a minimum of 400 reviews per bank (1,200+ total)
✅ Clean and preprocess the collected reviews
✅ Save the final structured dataset as clean_reviews.csv
✅ Maintain clean, organized project code

Target Apps

| Bank | App Name | App ID |
| --- | --- | --- |
| CBE | Commercial Bank of Ethiopia | com.combanketh.mobilebanking |
| BOA | Bank of Abyssinia | com.boa.boaMobileBanking |
| Dashen | Dashen Bank (Amole App) | com.cr2.amolelight |

Usage

Option 1: Run Python Script


The script (scripts/scrape_reviews.py) will:
Scrape reviews from Google Play Store for all three apps
Clean and preprocess the data
Remove duplicates and invalid records
Save the cleaned dataset to data/cleaned/clean_reviews.csv
Display summary statistics and KPI checks

Option 2: Use Jupyter Notebook


Open notebooks/task1_data_collection.ipynb and run the cells interactively for step-by-step execution.

Output

The script generates:
data/cleaned/clean_reviews.csv - Cleaned dataset with the following columns:
review_text: The review content
rating: Rating score (1-5)
date: Review date (YYYY-MM-DD format)
bank: Bank name
source: Always "Google Play"
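The column list above can be read as a small schema. The following is a minimal sketch of how one raw record might be mapped onto it, using only the standard library. The raw keys `content`, `score`, and `at` and the helper name `normalize_review` are assumptions for illustration; the project's actual script may structure this differently.

```python
from datetime import datetime
from typing import Optional

# Column order of clean_reviews.csv as listed in this README.
COLUMNS = ["review_text", "rating", "date", "bank", "source"]


def normalize_review(raw: dict, bank: str) -> Optional[dict]:
    """Map one raw record to the clean schema, or return None if it is invalid.

    The raw keys ("content", "score", "at") are assumed, not taken from the project.
    """
    text = (raw.get("content") or "").strip()
    score = raw.get("score")
    posted = raw.get("at")
    # Reject records with empty text, a rating outside 1-5, or no timestamp.
    if not text or score not in (1, 2, 3, 4, 5) or not isinstance(posted, datetime):
        return None
    return {
        "review_text": text,
        "rating": int(score),
        "date": posted.strftime("%Y-%m-%d"),  # YYYY-MM-DD, as documented above
        "bank": bank,
        "source": "Google Play",
    }


row = normalize_review(
    {"content": "  Good app ", "score": 5, "at": datetime(2025, 11, 26, 10, 30)},
    "Dashen Bank",
)
```

Returning None for invalid records lets the caller drop them in one pass, which is one way to arrive at a dataset with no missing values.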

Current Dataset Statistics

Total Reviews: 1,343 (✅ exceeds target of 1,200+)
Reviews by Bank:
Bank of Abyssinia: 491 (✅ exceeds 400)
Commercial Bank of Ethiopia: 477 (✅ exceeds 400)
Dashen Bank: 375 (slightly below 400, but maximum available)
Missing Data: 0.00% (✅ meets the <5% target)
Date Range: 2022-07-16 to 2025-11-26

Key Features

Automatic Deduplication: Removes duplicate reviews within and across banks
Data Validation: Ensures all reviews have complete information
Progress Tracking: Real-time progress bars using tqdm
Error Handling: Robust error handling for network issues
Rate Limiting: Built-in delays to avoid API throttling
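Deduplication "within and across banks" could look roughly like the sketch below. It assumes duplicates are detected by normalized review text; the README does not say whether the project keys on text, a review ID, or something else, so the function names and the matching rule here are illustrative only.

```python
from typing import Dict, List


def normalize_text(text: str) -> str:
    """Collapse case and whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def deduplicate(reviews: List[Dict[str, str]], across_banks: bool = True) -> List[Dict[str, str]]:
    """Keep the first occurrence of each review, preserving input order."""
    seen = set()
    unique = []
    for review in reviews:
        text_key = normalize_text(review["review_text"])
        # Within-bank keys include the bank name; cross-bank keys use the text alone.
        key = text_key if across_banks else (review["bank"], text_key)
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


rows = [
    {"bank": "CBE", "review_text": "Good app"},
    {"bank": "CBE", "review_text": "good   app"},
    {"bank": "BOA", "review_text": "Good app"},
    {"bank": "BOA", "review_text": "Crashes often"},
]
within = deduplicate(rows, across_banks=False)
across = deduplicate(rows, across_banks=True)
```

One design caveat: matching on text alone across banks also discards short reviews that different users legitimately repeat (for example "Good app"), so a cross-bank pass lowers per-bank counts more than a within-bank pass does.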

Challenges Encountered

Duplicate Reviews: Google Play Store returns duplicate reviews across API calls
Solution: Implemented deduplication at both bank-level and dataset-level
Limited Reviews for Dashen Bank: Only 503 total reviews available
Solution: Scraped all available reviews; 375 unique reviews retained after deduplication
API Rate Limiting: Risk of being throttled
Solution: Implemented 0.5-second delays between API calls
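The delay-between-calls approach can be sketched as a generic pagination loop. Here `fetch_page` is a hypothetical stand-in for whatever paginated call the script makes (it is not a real library function), and the retry-on-`ConnectionError` behavior is an assumption added to illustrate the network error handling mentioned under Key Features.

```python
import time
from typing import Callable, List, Optional, Tuple

# (reviews on this page, token for the next page or None when exhausted)
Page = Tuple[List[dict], Optional[str]]


def fetch_all(
    fetch_page: Callable[[Optional[str]], Page],
    delay_seconds: float = 0.5,
    max_retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Page through results, pausing between calls and retrying failed ones."""
    collected: List[dict] = []
    token: Optional[str] = None
    while True:
        for attempt in range(1, max_retries + 1):
            try:
                batch, token = fetch_page(token)
                break
            except ConnectionError:
                if attempt == max_retries:
                    raise  # give up after the last allowed attempt
                sleep(delay_seconds * attempt)  # wait a little longer each retry
        collected.extend(batch)
        if not batch or token is None:
            return collected
        sleep(delay_seconds)  # fixed pause between successful calls
```

Passing `sleep` as a parameter keeps the loop testable without real waiting: a test can supply `list.append` and check which delays were requested.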

Deliverables

scripts/scrape_reviews.py - Scraping and preprocessing script
notebooks/task1_data_collection.ipynb - Interactive notebook
data/cleaned/clean_reviews.csv - Cleaned dataset (1,343 reviews)
reports/task1_data_collection.md - Comprehensive report

📈 Key Performance Indicators (KPIs)

Task 1 KPIs

| KPI | Target | Achieved | Status |
| --- | --- | --- | --- |
| Total Reviews | 1,200+ | 1,343 | ✅ PASS |
| Reviews per Bank | 400+ | CBE: 477, BOA: 491, Dashen: 375 | ⚠️ PARTIAL |
| Missing Data | <5% | 0.00% | ✅ PASS |
| Clean Codebase | Required | ✅ | ✅ PASS |
| Documentation | Required | ✅ | ✅ PASS |

Note: Dashen Bank has 375 reviews (6.3% below target) because only 503 total reviews were available. After deduplication, 375 unique reviews remained, representing the maximum available data for this app.
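The KPI arithmetic above is easy to reproduce. The sketch below checks the reported counts against the two Task 1 targets; the function name `kpi_report` is hypothetical. For Dashen it yields a shortfall of 25 reviews out of 400, that is 6.25%, which the note rounds to 6.3%.

```python
from typing import Dict


def kpi_report(counts: Dict[str, int], per_bank_target: int = 400, total_target: int = 1200) -> dict:
    """Compare per-bank review counts with the Task 1 targets."""
    # Percentage by which each under-target bank falls short of the per-bank target.
    shortfalls = {
        bank: round(100 * (per_bank_target - n) / per_bank_target, 2)
        for bank, n in counts.items()
        if n < per_bank_target
    }
    total = sum(counts.values())
    return {
        "total": total,
        "total_pass": total >= total_target,
        "per_bank_pass": not shortfalls,
        "shortfall_pct": shortfalls,
    }


# Counts as reported in the Task 1 KPI table.
report = kpi_report({"CBE": 477, "BOA": 491, "Dashen": 375})
```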

📝 Requirements

Key dependencies for Task 1:
google-play-scraper - For scraping Google Play Store reviews
pandas - For data manipulation and analysis
numpy - For numerical operations
tqdm - For progress bars
See requirements.txt for complete list of dependencies.

🔧 Development

Running Tests


Code Style

This project follows PEP 8 Python style guidelines. Consider using:
black for code formatting
flake8 or pylint for linting

📚 Documentation

Task 1 Report: See reports/task1_data_collection.md for detailed documentation
Code Comments: All scripts include inline documentation
Notebooks: Jupyter notebooks include markdown explanations

🤝 Contributing

Create a feature branch from main
Make your changes
Test thoroughly
Submit a pull request

📄 License

[Add your license information here]

👥 Authors

Natnael Yilma

🙏 Acknowledgments

Google Play Store for providing review data
Ethiopian banking institutions (CBE, BOA, Dashen) for their mobile banking applications

💡 Task 4: Insights & Recommendations

Overview

Task 4 provides comprehensive insights, visualizations, and actionable recommendations based on customer reviews for CBE, BOA, and Dashen Bank.

Objectives

✅ Identify customer satisfaction drivers for each bank (2+ per bank)
✅ Identify customer pain points for each bank (2+ per bank)
✅ Create 3-5 high-quality visualizations
✅ Generate actionable recommendations for each bank
✅ Provide ethics and bias reflection
✅ Deliver comprehensive 3-4 page insights report

Usage

Run Python Script


The script (scripts/insights_recommendations.py) will:
Load processed sentiment analysis data
Filter for only CBE, BOA, and Dashen Bank
Generate insights (drivers and pain points) for each bank
Create 5 visualizations:
Sentiment distribution by bank
Rating distribution by bank
Theme frequency comparison
Average sentiment score comparison
Word clouds (positive vs negative)
Generate comprehensive insights report
Save all outputs to reports/ directory

Key Findings

Overall Performance Ranking

Dashen Bank - Best overall (4.15 ⭐, 73.1% 5-star reviews)
CBE - Moderate performance (3.98 ⭐, 63.4% 5-star reviews)
BOA - Needs improvement (3.12 ⭐, 39.6% 1-star reviews)
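The ranking metrics quoted above (mean rating, share of 5-star reviews, share of 1-star reviews) can be derived from (bank, rating) pairs as in this sketch. The function names are hypothetical and the toy rows are invented for illustration; they are not the project's data and do not reproduce the figures above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rating_summary(rows: Iterable[Tuple[str, int]]) -> Dict[str, dict]:
    """Return mean rating and 5-star / 1-star shares (in percent) per bank."""
    by_bank = defaultdict(list)
    for bank, rating in rows:
        by_bank[bank].append(rating)
    summary = {}
    for bank, ratings in by_bank.items():
        n = len(ratings)
        summary[bank] = {
            "mean_rating": round(sum(ratings) / n, 2),
            "five_star_pct": round(100 * ratings.count(5) / n, 1),
            "one_star_pct": round(100 * ratings.count(1) / n, 1),
        }
    return summary


def rank_banks(summary: Dict[str, dict]) -> List[str]:
    """Order banks from highest to lowest mean rating."""
    return sorted(summary, key=lambda bank: summary[bank]["mean_rating"], reverse=True)


# Invented toy rows: (bank label, star rating).
toy_rows = [("A", 5), ("A", 5), ("A", 4), ("A", 1), ("B", 1), ("B", 1), ("B", 5), ("B", 3)]
summary = rating_summary(toy_rows)
ranking = rank_banks(summary)
```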

Common Themes

Stability & Reliability - Most critical issue across all banks
Transaction Performance - Needs improvement across the board
User Interface & Experience - Key differentiator (Dashen leads)
Account Access Issues - Affects all banks to varying degrees

Top Recommendations

For BOA (Critical):
Emergency stability audit and rebuild
Remove developer options requirement
Complete UI/UX redesign
Optimize transaction speed
For CBE:
Fix update-related bugs
Improve Telebirr integration
Enhance account access systems
For Dashen:
Maintain current performance levels
Enhance transaction details
Balance security with usability

Output

The script generates:
reports/task4_insights_recommendations.md - Comprehensive 3-4 page insights report
reports/visualizations/ - Directory containing all generated charts:
sentiment_distribution.png
rating_distribution.png
theme_frequency.png
sentiment_comparison.png
wordclouds.png (optional)

Report Contents

Executive Summary - Overall statistics and key findings
Per-Bank Analysis - Detailed insights for CBE, BOA, and Dashen
Cross-Bank Comparison - Rating, sentiment, and theme comparisons
Actionable Recommendations - Priority-based recommendations for each bank
Ethics & Bias Reflection - Discussion of limitations and biases
Conclusion - Summary and strategic recommendations

Key Performance Indicators (KPIs)

| KPI | Target | Achieved | Status |
| --- | --- | --- | --- |
| Satisfaction Drivers per Bank | 2+ | 2-3 per bank | ✅ PASS |
| Pain Points per Bank | 2+ | 2-5 per bank | ✅ PASS |
| Visualizations | 3-5 | 5 | ✅ PASS |
| Actionable Recommendations | Required | 2-5 per bank | ✅ PASS |
| Ethics Reflection | Required | ✅ | ✅ PASS |
| Report Length | 3-4 pages | ~4 pages | ✅ PASS |

Deliverables

scripts/insights_recommendations.py - Main analysis script
reports/task4_insights_recommendations.md - Comprehensive insights report
reports/visualizations/ - All generated visualizations
✅ Updated README with Task 4 summary
Last Updated: 2025-01-27
Current Task: Task 4 - Insights & Recommendations ✅
Project Status: All Tasks Completed ✅

# Task 2: Sentiment & Thematic Analysis - README

## Overview

This task performs comprehensive sentiment analysis and thematic analysis on Google Play Store reviews for three Ethiopian banking apps:
- Commercial Bank of Ethiopia (CBE)
- Bank of Abyssinia (BOA)
- Dashen Bank

## Approach

### 1. Modular Code Structure

The analysis is implemented using a modular architecture with separate modules for each component:

- **`src/text_preprocessor.py`**: NLP preprocessing pipeline (lowercasing, tokenization, stopword removal, lemmatization)
- **`src/sentiment_analyzer.py`**: Sentiment analysis using multiple models (DistilBERT, VADER, TextBlob)
- **`src/keyword_extractor.py`**: Keyword and N-gram extraction using TF-IDF and spaCy
- **`src/theme_analyzer.py`**: Thematic analysis to group keywords into actionable themes

### 2. Sentiment Analysis

**Primary Model**: DistilBERT (distilbert-base-uncased-finetuned-sst-2-english)
- Fast and accurate sentiment classification
- Returns both label (Positive/Negative/Neutral) and confidence score (0-1)

**Fallback Models**:
- VADER: Rule-based sentiment analyzer for social media text
- TextBlob: Simple polarity-based sentiment analysis

### 3. NLP Preprocessing

All reviews undergo:
- Lowercasing
- Tokenization (spaCy)
- Stop-word removal
- Lemmatization
- Bigram and trigram phrase detection

### 4. Keyword Extraction

- **TF-IDF**: Extracts 1- to 3-grams with importance scoring
- **spaCy Noun Chunks**: Identifies meaningful phrases
- Separate extraction for:
  - Overall keywords per bank
  - Complaint keywords (from negative reviews)
  - Praise keywords (from positive reviews)

### 5. Thematic Analysis

Keywords are mapped to 8 predefined theme categories:
- Account Access Issues
- Transaction Performance
- Stability & Reliability
- User Interface & Experience
- Customer Support
- Feature Requests
- Security Concerns
- Network & Connectivity

Each theme includes:
- Frequency count
- Severity assessment (High/Medium/Low)
- Supporting keywords
- Representative reviews

## Results

### Dataset Statistics
- **Total Reviews Analyzed**: 957
- **Sentiment Coverage**: 100% (all reviews scored)
- **Banks**: 3 (CBE: 325, BOA: 333, Dashen: 299)

### Global Sentiment Distribution
- **Positive**: 42.84% (410 reviews)
- **Neutral**: 46.19% (442 reviews)
- **Negative**: 10.97% (105 reviews)
- **Mean Sentiment Score**: 0.6439

### Top Themes Across All Banks
1. Stability & Reliability: 140 reviews (14.63%)
2. User Interface & Experience: 113 reviews (11.81%)
3. Transaction Performance: 95 reviews (9.93%)
4. Feature Requests: 88 reviews (9.20%)
5. Customer Support: 55 reviews (5.75%)

### Per-Bank Highlights

**Commercial Bank of Ethiopia**:
- Highest positive sentiment (45.54%)
- Lowest negative sentiment (6.15%)
- Top theme: Stability & Reliability (10.2%)

**Bank of Abyssinia**:
- Highest negative sentiment (15.92%)
- Highest 1-star rating percentage (39.64%)
- Top theme: Stability & Reliability (20.4%) - **High Severity**

**Dashen Bank**:
- Highest mean sentiment score (0.6500)
- Top theme: User Interface & Experience (17.4%)

## Key Findings

1. **Polarized Ratings**: 55.69% 5-star vs 25.39% 1-star reviews indicates strong opinions
2. **Stability Issues**: Most common theme across all banks, especially critical for BOA
3. **Sentiment-Rating Anomalies**: 5.75% of reviews show mismatched sentiment and ratings
4. **Bank of Abyssinia**: Requires immediate attention for stability and reliability issues

## Output Files

1. **`data/processed/sentiment_analysis_results.csv`**
   - Contains all required columns: review_id, bank_name, review_text, rating, sentiment_label, sentiment_score, identified_theme(s), keywords
   - 957 rows of analyzed data

2. **`reports/task2_sentiment_theme.md`**
   - Comprehensive markdown report with:
     - Global analysis summary
     - Per-bank detailed analysis
     - Theme breakdowns with representative reviews
     - Actionable recommendations
     - Methodology documentation

## Usage

### Run Analysis
```bash
python scripts/sentiment_analysis.py
```

### Generate Report
```bash
python scripts/generate_report.py
```

## KPI Verification

✅ **Sentiment computed for 100% of reviews** (Target: 90%+)  
✅ **Minimum 400 reviews**: 957 reviews analyzed  
✅ **At least 2 themes per bank**: All banks have 5 themes identified  
✅ **Modular code structure**: Separate modules for each component  
✅ **Clear mapping logic**: Keywords mapped to themes with documented logic  

## Dependencies

- transformers (for DistilBERT)
- vaderSentiment
- textblob
- spacy (with en_core_web_sm model)
- scikit-learn (for TF-IDF)
- pandas, numpy

## Notes

- The analysis handles multilingual content (English and Amharic)
- Short reviews may have limited keyword extraction (expected behavior)
- Theme identification uses pattern matching and keyword mapping
- Severity assessment based on sentiment and rating distribution within themes

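The keyword-to-theme mapping described under Thematic Analysis can be illustrated with a small standard-library sketch. Only the theme names come from the Task 2 README; the keyword lists, the substring-matching rule, and the function names are assumptions, and the real `src/theme_analyzer.py` may work differently.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical keyword lists; only the theme names are taken from the README.
THEME_KEYWORDS: Dict[str, List[str]] = {
    "Account Access Issues": ["login", "password", "otp"],
    "Transaction Performance": ["transfer", "transaction", "slow"],
    "Stability & Reliability": ["crash", "bug", "not working"],
    "User Interface & Experience": ["interface", "design", "easy to use"],
}


def identify_themes(review_text: str) -> List[str]:
    """Return every theme with at least one keyword found in the review (substring match)."""
    text = review_text.lower()
    return [
        theme
        for theme, keywords in THEME_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    ]


def theme_frequency(reviews: List[str]) -> Counter:
    """Count how many reviews touch each theme; a review may count toward several."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(identify_themes(review))
    return counts


themes = identify_themes("App crashes after login")
freq = theme_frequency(["App crashes after login", "Transfer is slow", "crash again"])
```

Because a review can match several themes, per-theme percentages computed this way need not sum to 100%.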
@@ -14,10 +14,10 @@
 from theme_analyzer import ThemeAnalyzer
 
 
-def load_results(csv_path: str = 'data/processed/task2_sentiment_analysis_results.csv'):
+def load_results(csv_path: str = 'data/processed/sentiment_analysis_results.csv'):
     """Load analysis results."""
     if not os.path.exists(csv_path):
-        raise FileNotFoundError(f"Results file not found: {csv_path}. Please run task2_sentiment_analysis.py first.")
+        raise FileNotFoundError(f"Results file not found: {csv_path}. Please run sentiment_analysis.py first.")
 
     return pd.read_csv(csv_path)
 

@@ -159,7 +159,7 @@ def main():
     })
 
     # Save to CSV
-    output_path = 'data/processed/task2_sentiment_analysis_results.csv'
+    output_path = 'data/processed/sentiment_analysis_results.csv'
     os.makedirs(os.path.dirname(output_path), exist_ok=True)
     output_df.to_csv(output_path, index=False)
     print(f"✓ Saved results to {output_path}")
@@ -230,7 +230,7 @@ def main():
     print("ANALYSIS COMPLETE!")
     print("=" * 80)
     print(f"\nOutput saved to: {output_path}")
-    print(f"\nNext step: Generate detailed report using generate_report.py")
+    print(f"\nNext step: Generate detailed report using scripts/generate_report.py")
 
     return df, theme_analysis, keywords_by_bank
 

Posted May 11, 2026

Analyzed Google Play Store reviews for Ethiopian banking apps, provided insights and recommendations.

Timeline

Nov 26, 2025 - Nov 30, 2025