SHEIN Product Data Cleaning & E-commerce Analysis
Cleaned and structured a large-scale scraped e-commerce dataset (80,000+ product records across 21 CSV files).
The raw dataset contained inconsistent formatting, duplicate entries, missing values, and noisy text fields that made it unsuitable for analysis.
Key work included:
:Merging and standardising 21 raw CSV files into a single structured dataset
Removing 11,000+ duplicate products using title-based deduplication logic
Handling missing discount values using controlled null retention (no artificial imputation)
Filtering out statistical outliers without clipping or distortion
Engineering analytical features such as:
units_sold
log-transformed sales metric
price category segmentation (fixed bins using pd.cut)
discount presence flag
value efficiency score (sales-to-price ratio)
Final output: 70,292 clean, analysis-ready product records.
This project demonstrates real-world e-commerce data wrangling, feature engineering, and dataset preparation for downstream analytics and dashboarding.
1
25
RFM Customer Segmentation for E-commerce Retention Analysis
Built a customer segmentation model using RFM (Recency, Frequency, Monetary) analysis to identify high-value customers, at-risk users, and churned segments.
The dataset was structured from order-level e-commerce transaction data and processed to generate actionable customer-level insights.
Key work included:
Calculating Recency, Frequency, and Monetary values per customer
Scoring customers using percentile-based ranking and NTILE logic (SQL validation)
Segmenting customers into actionable groups:
Champions
Loyal Customers
Potential Loyalists
At-Risk Customers
Lapsed Customers
Cross-validating segmentation logic between Excel (Power Query) and SQL (PostgreSQL)
Structuring outputs for CRM and retention strategy use cases
Final output: fully segmented customer dataset ready for marketing targeting and retention campaigns.
This project demonstrates applied customer analytics for e-commerce growth and retention optimization.
1
22
Amazon Product Funnel & Performance Analysis (Python + SQL + Looker Studio)
Analyzed an Amazon product dataset to understand pricing behavior, product performance, and engagement-driven funnel patterns.
The project focused on identifying how products perform from listing characteristics through to engagement signals such as reviews and category-level competition.
Key work included:
Cleaned and structured a dataset of 224 Amazon product listings using Python (Pandas)
Standardized pricing and categorical fields for analysis readiness
Built SQL queries in BigQuery to extract performance insights across products and categories
Performed funnel-style analysis from product listing attributes → engagement metrics → review patterns
Identified concentration effects in product performance (Pareto distribution across categories)
Built a Looker Studio dashboard to visualize key metrics and category-level comparisons
Final output: an interactive analytics view of Amazon product performance and behavioral patterns.
This project demonstrates applied product analytics and funnel thinking in an e-commerce marketplace context.
1
13
Forma Active Ecommerce Pipeline
End-to-End Ecommerce Analytics Pipeline (Shopify → PostgreSQL → dbt → Tableau)
Built a full end-to-end analytics pipeline for an e-commerce dataset, simulating a real production data stack from extraction to visualization.
The project started with Shopify-style order data and progressed through a complete modern data engineering workflow.
Key work included:
Extracted and transformed Shopify-like order data using Python (json_normalize, explode for nested structures)
Built a relational PostgreSQL database (Neon) for structured storage
Modeled data using dbt with staging and mart layers
Implemented data quality tests (uniqueness, null checks, referential integrity)
Created analytics-ready models for revenue, customers, and product performance
Built a Tableau dashboard with multiple views:
Sales performance over time
Customer cohort behavior
Product performance and concentration
Final output: a production-style analytics system from raw data ingestion to business-ready dashboards.
This project demonstrates full-stack analytics capability: data engineering, modeling, and visualization.
1
17
Spain Data Analyst Job Market Analysis (SQL Exploration Project)
Conducted an exploratory SQL analysis of the data analyst job market in Spain to identify salary trends, skill demand, and market structure.
The dataset was queried and analyzed using SQL to extract insights on compensation patterns and in-demand technical skills.
Key work included:
Cleaned and explored structured job market dataset using SQL
Analyzed salary distributions across experience levels and roles
Identified most frequently requested technical skills in job postings
Built skill-salary relationships to understand market value of competencies
Visualized insights around optimal skill combinations for higher-paying roles
Final output: structured insights into the Spanish data analytics job market, including salary benchmarks and skill demand patterns.
This project demonstrates SQL-based exploratory data analysis and labor market insight generation.