✅ Maintain clean, organized project code

App package IDs:
- CBE, Commercial Bank of Ethiopia: com.combanketh.mobilebanking
- BOA, Bank of Abyssinia: com.boa.boaMobileBanking
- Dashen, Dashen Bank (Amole App): com.cr2.amolelight

Task 1 files:
- scripts/scrape_reviews.py - Scraping and preprocessing script
- notebooks/task1_data_collection.ipynb - Interactive notebook
- data/cleaned/clean_reviews.csv - Cleaned dataset (1,343 reviews) with the following columns:
  - review_text: The review content
  - rating: Rating score (1-5)
  - date: Review date (YYYY-MM-DD format)
  - bank: Bank name
  - source: Always "Google Play"
- reports/task1_data_collection.md - Comprehensive report, with detailed documentation

Dependencies and tooling:
- google-play-scraper - For scraping Google Play Store reviews
- pandas - For data manipulation and analysis
- numpy - For numerical operations
- tqdm - For progress bars
- requirements.txt for complete list of dependencies
- black for code formatting
- flake8 or pylint for linting

Task 4 files:
- scripts/insights_recommendations.py - Main analysis script
- reports/task4_insights_recommendations.md - Comprehensive 3-4 page insights report
- reports/visualizations/ - Directory containing all generated charts:
  - sentiment_distribution.png
  - rating_distribution.png
  - theme_frequency.png
  - sentiment_comparison.png
  - wordclouds.png (optional)

@@ -0,0 +1,159 @@
+# Task 2: Sentiment & Thematic Analysis - README
+
+## Overview
+
+This task performs comprehensive sentiment analysis and thematic analysis on Google Play Store reviews for three Ethiopian banking apps:
+- Commercial Bank of Ethiopia (CBE)
+- Bank of Abyssinia (BOA)
+- Dashen Bank
+
+## Approach
+
+### 1. Modular Code Structure
+
+The analysis is implemented using a modular architecture with separate modules for each component:
+
+- **`src/text_preprocessor.py`**: NLP preprocessing pipeline (lowercasing, tokenization, stopword removal, lemmatization)
+- **`src/sentiment_analyzer.py`**: Sentiment analysis using multiple models (DistilBERT, VADER, TextBlob)
+- **`src/keyword_extractor.py`**: Keyword and N-gram extraction using TF-IDF and spaCy
+- **`src/theme_analyzer.py`**: Thematic analysis to group keywords into actionable themes
+
+### 2. Sentiment Analysis
+
+**Primary Model**: DistilBERT (distilbert-base-uncased-finetuned-sst-2-english)
+- Fast and accurate sentiment classification
+- Returns both label (Positive/Negative/Neutral) and confidence score (0-1)
+
+**Fallback Models**:
+- VADER: Rule-based sentiment analyzer for social media text
+- TextBlob: Simple polarity-based sentiment analysis
+
+### 3. NLP Preprocessing
+
+All reviews undergo:
+- Lowercasing
+- Tokenization (spaCy)
+- Stop-word removal
+- Lemmatization
+- Bigram and trigram phrase detection
+
+### 4. Keyword Extraction
+
+- **TF-IDF**: Extracts 1- to 3-grams with importance scoring
+- **spaCy Noun Chunks**: Identifies meaningful phrases
+- Separate extraction for:
+  - Overall keywords per bank
+  - Complaint keywords (from negative reviews)
+  - Praise keywords (from positive reviews)
+
+### 5. Thematic Analysis
+
+Keywords are mapped to 8 predefined theme categories:
+- Account Access Issues
+- Transaction Performance
+- Stability & Reliability
+- User Interface & Experience
+- Customer Support
+- Feature Requests
+- Security Concerns
+- Network & Connectivity
+
+Each theme includes:
+- Frequency count
+- Severity assessment (High/Medium/Low)
+- Supporting keywords
+- Representative reviews
+
+## Results
+
+### Dataset Statistics
+- **Total Reviews Analyzed**: 957
+- **Sentiment Coverage**: 100% (all reviews scored)
+- **Banks**: 3 (CBE: 325, BOA: 333, Dashen: 299)
+
+### Global Sentiment Distribution
+- **Positive**: 42.84% (410 reviews)
+- **Neutral**: 46.19% (442 reviews)
+- **Negative**: 10.97% (105 reviews)
+- **Mean Sentiment Score**: 0.6439
+
+### Top Themes Across All Banks
+1. Stability & Reliability: 140 reviews (14.63%)
+2. User Interface & Experience: 113 reviews (11.81%)
+3. Transaction Performance: 95 reviews (9.93%)
+4. Feature Requests: 88 reviews (9.20%)
+5. Customer Support: 55 reviews (5.75%)
+
+### Per-Bank Highlights
+
+**Commercial Bank of Ethiopia**:
+- Highest positive sentiment (45.54%)
+- Lowest negative sentiment (6.15%)
+- Top theme: Stability & Reliability (10.2%)
+
+**Bank of Abyssinia**:
+- Highest negative sentiment (15.92%)
+- Highest 1-star rating percentage (39.64%)
+- Top theme: Stability & Reliability (20.4%) - **High Severity**
+
+**Dashen Bank**:
+- Highest mean sentiment score (0.6500)
+- Top theme: User Interface & Experience (17.4%)
+
+## Key Findings
+
+1. **Polarized Ratings**: 55.69% 5-star vs 25.39% 1-star reviews indicates strong opinions
+2. **Stability Issues**: Most common theme across all banks, especially critical for BOA
+3. **Sentiment-Rating Anomalies**: 5.75% of reviews show mismatched sentiment and ratings
+4. **Bank of Abyssinia**: Requires immediate attention for stability and reliability issues
+
+## Output Files
+
+1. **`data/processed/sentiment_analysis_results.csv`**
+   - Contains all required columns: review_id, bank_name, review_text, rating, sentiment_label, sentiment_score, identified_theme(s), keywords
+   - 957 rows of analyzed data
+
+2. **`reports/task2_sentiment_theme.md`**
+   - Comprehensive markdown report with:
+     - Global analysis summary
+     - Per-bank detailed analysis
+     - Theme breakdowns with representative reviews
+     - Actionable recommendations
+     - Methodology documentation
+
+## Usage
+
+### Run Analysis
+```bash
+python scripts/sentiment_analysis.py
+```
+
+### Generate Report
+```bash
+python scripts/generate_report.py
+```
+
+## KPI Verification
+
+✅ **Sentiment computed for 100% of reviews** (Target: 90%+)
+✅ **Minimum 400 reviews**: 957 reviews analyzed
+✅ **At least 2 themes per bank**: All banks have 5 themes identified
+✅ **Modular code structure**: Separate modules for each component
+✅ **Clear mapping logic**: Keywords mapped to themes with documented logic
+
+## Dependencies
+
+- transformers (for DistilBERT)
+- vaderSentiment
+- textblob
+- spacy (with en_core_web_sm model)
+- scikit-learn (for TF-IDF)
+- pandas, numpy
+
+## Notes
+
+- The analysis handles multilingual content (English and Amharic)
+- Short reviews may have limited keyword extraction (expected behavior)
+- Theme identification uses pattern matching and keyword mapping
+- Severity assessment based on sentiment and rating distribution within themes
+

@@ -14,10 +14,10 @@
 from theme_analyzer import ThemeAnalyzer


-def load_results(csv_path: str = 'data/processed/task2_sentiment_analysis_results.csv'):
+def load_results(csv_path: str = 'data/processed/sentiment_analysis_results.csv'):
     """Load analysis results."""
     if not os.path.exists(csv_path):
-        raise FileNotFoundError(f"Results file not found: {csv_path}. Please run task2_sentiment_analysis.py first.")
+        raise FileNotFoundError(f"Results file not found: {csv_path}. Please run sentiment_analysis.py first.")

     return pd.read_csv(csv_path)

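Because this change renames the results file, a report run could silently pick up a stale or incomplete CSV. The following is a minimal sketch, not part of the PR, of a header check that could run before the report is generated. The helper name `missing_columns` is hypothetical, and the column names are taken from the README's list; the exact header spelling in the real file (notably `identified_theme(s)`) is an assumption.

```python
import csv
import io

# Column names as listed in the README. The exact spelling used in the
# real CSV header (especially "identified_theme(s)") is an assumption.
REQUIRED_COLUMNS = [
    "review_id", "bank_name", "review_text", "rating",
    "sentiment_label", "sentiment_score", "identified_theme(s)", "keywords",
]


def missing_columns(handle, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from a CSV header row."""
    reader = csv.reader(handle)
    header = next(reader, [])  # an empty file yields an empty header
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# Illustrative in-memory file that lacks the last two documented columns.
sample = io.StringIO(
    "review_id,bank_name,review_text,rating,sentiment_label,sentiment_score\n"
    "1,BOA,app keeps crashing,1,Negative,0.98\n"
)
print(missing_columns(sample))
```

With a real file, the same function could be called on `open(csv_path, newline="", encoding="utf-8")` before handing the path to `load_results`.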
@@ -159,7 +159,7 @@ def main():
     })

     # Save to CSV
-    output_path = 'data/processed/task2_sentiment_analysis_results.csv'
+    output_path = 'data/processed/sentiment_analysis_results.csv'
     os.makedirs(os.path.dirname(output_path), exist_ok=True)
     output_df.to_csv(output_path, index=False)
     print(f"✓ Saved results to {output_path}")
@@ -230,7 +230,7 @@ def main():
     print("ANALYSIS COMPLETE!")
     print("=" * 80)
     print(f"\nOutput saved to: {output_path}")
-    print(f"\nNext step: Generate detailed report using generate_report.py")
+    print(f"\nNext step: Generate detailed report using scripts/generate_report.py")

     return df, theme_analysis, keywords_by_bank

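The README's KPI section states three numeric targets: at least 400 reviews, sentiment for 90%+ of reviews, and at least 2 themes per bank. The sketch below shows one way those targets might be checked against the saved rows. It is an illustration under stated assumptions, not the PR's code: the function `check_kpis`, the `themes` key, and the `;`-separated theme encoding are all hypothetical.

```python
from collections import defaultdict


def check_kpis(rows, min_reviews=400, min_coverage=0.90, min_themes_per_bank=2):
    """Evaluate the three numeric KPIs over a list of result rows (dicts)."""
    total = len(rows)
    # A row counts as scored when it carries a non-empty sentiment label.
    scored = sum(1 for row in rows if row.get("sentiment_label"))

    themes_by_bank = defaultdict(set)
    for row in rows:
        # Assumed encoding: themes stored as one ";"-separated string.
        for theme in (row.get("themes") or "").split(";"):
            if theme.strip():
                themes_by_bank[row["bank_name"]].add(theme.strip())

    banks = {row["bank_name"] for row in rows}
    return {
        "enough_reviews": total >= min_reviews,
        "coverage_ok": total > 0 and scored / total >= min_coverage,
        "themes_ok": bool(banks) and all(
            len(themes_by_bank[bank]) >= min_themes_per_bank for bank in banks
        ),
    }


# Two illustrative rows; min_reviews is lowered so the toy data can pass.
rows = [
    {"bank_name": "BOA", "sentiment_label": "Negative",
     "themes": "Stability & Reliability; Customer Support"},
    {"bank_name": "CBE", "sentiment_label": "Positive",
     "themes": "Feature Requests"},
]
print(check_kpis(rows, min_reviews=2))
```

In this toy input the theme check fails because the CBE row carries only one theme, which is the kind of gap the per-bank target is meant to surface.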
Posted May 11, 2026
Analyzed Google Play Store reviews for Ethiopian banking apps, provided insights and recommendations.
Nov 26, 2025 - Nov 30, 2025