Master SHEIN E-commerce Data Cleaning & Analysis Techniques
The network for creativity
Join 1.25M professional creatives like you
Connect with clients, get discovered, and run your business 100% commission-free
Creatives on Contra have earned over $150M and we are just getting started
SHEIN Product Data Cleaning & E-commerce Analysis
Cleaned and structured a large-scale scraped e-commerce dataset (80,000+ product records across 21 CSV files).
The raw dataset contained inconsistent formatting, duplicate entries, missing values, and noisy text fields that made it unsuitable for analysis.
Key work included:
Merging and standardising 21 raw CSV files into a single structured dataset
Removing 11,000+ duplicate products using title-based deduplication logic
Handling missing discount values by keeping them as nulls rather than imputing artificial values
Filtering out statistical outliers by removing the affected rows rather than clipping or otherwise distorting values
Engineering analytical features such as:
units_sold
log-transformed sales metric
price category segmentation (fixed bins using pd.cut)
discount presence flag
value efficiency score (sales-to-price ratio)
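The merge, deduplication, null-handling, and outlier steps listed above could be sketched in pandas roughly as follows. This is an illustration under assumptions, not the project's actual code: the column names (title, price, discount), the "-20%" discount format, the IQR rule with k = 1.5, and the file pattern are all hypothetical, and the sample rows are made up.

```python
import glob

import pandas as pd


def load_and_merge(pattern: str) -> pd.DataFrame:
    """Read every CSV matching `pattern`, normalise headers, and stack the frames."""
    frames = []
    for path in sorted(glob.glob(pattern)):
        frame = pd.read_csv(path)
        # Standardise header spelling so the files line up column by column.
        frame.columns = frame.columns.str.strip().str.lower().str.replace(" ", "_")
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


def dedupe_by_title(df: pd.DataFrame, title_col: str = "title") -> pd.DataFrame:
    """Drop rows whose normalised title repeats, keeping the first occurrence."""
    key = (
        df[title_col].astype(str).str.strip().str.lower()
        .str.replace(r"\s+", " ", regex=True)
    )
    # Rows with a missing title are never treated as duplicates of each other.
    is_dup = key.duplicated(keep="first") & df[title_col].notna()
    return df.loc[~is_dup].reset_index(drop=True)


def parse_discount(df: pd.DataFrame, col: str = "discount") -> pd.DataFrame:
    """Convert text such as '-20%' to 20.0; missing values stay NaN (no imputation)."""
    out = df.copy()
    digits = out[col].astype(str).str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    out[col] = pd.to_numeric(digits, errors="coerce")
    return out


def drop_iqr_outliers(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Remove rows outside [Q1 - k*IQR, Q3 + k*IQR]; values are never clipped."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df[col].between(q1 - k * iqr, q3 + k * iqr) | df[col].isna()
    return df.loc[keep].reset_index(drop=True)


# merged = load_and_merge("data/*.csv")  # hypothetical location of the 21 files

# Tiny made-up sample standing in for the merged data.
raw = pd.DataFrame({
    "title": ["Red Dress", " red  dress ", "Blue Top", "Green Hat",
              "Black Belt", "Grey Sock", "Gold Ring"],
    "price": [10.0, 10.0, 12.0, 11.0, 9.0, 13.0, 500.0],
    "discount": ["-20%", "-20%", None, "-5%", None, "-10%", None],
})

deduped = dedupe_by_title(raw)
clean = drop_iqr_outliers(parse_discount(deduped), "price")
```

In this sketch the second "red dress" row is removed as a duplicate, the 500.0 price is dropped as an outlier, and rows without a discount keep a null in that column.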
Final output: 70,292 clean, analysis-ready product records.
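The engineered features listed under "Key work" might be derived along these lines. Again this is a sketch under assumptions: the sales_text column and its "2k+ sold" format, the use of log1p for the log transform, the bin edges and labels, and the definition of the discount flag as "discount is not null" are hypothetical choices, not details confirmed by the project.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical fixed price bins and labels for pd.cut.
PRICE_BINS = [0, 10, 25, 50, np.inf]
PRICE_LABELS = ["budget", "low", "mid", "premium"]

_SOLD = re.compile(r"(\d+(?:\.\d+)?)\s*(k)?", re.IGNORECASE)


def parse_units_sold(text):
    """Turn text like '2k+ sold recently' into 2000.0; anything unparseable is NaN."""
    if not isinstance(text, str):
        return np.nan
    match = _SOLD.search(text)
    if match is None:
        return np.nan
    value = float(match.group(1))
    return value * 1000 if match.group(2) else value


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add the five analytical columns to a copy of `df`."""
    out = df.copy()
    out["units_sold"] = out["sales_text"].map(parse_units_sold)
    # log1p keeps zero sales finite; the source only says "log-transformed".
    out["log_units_sold"] = np.log1p(out["units_sold"])
    out["price_category"] = pd.cut(
        out["price"], bins=PRICE_BINS, labels=PRICE_LABELS, include_lowest=True
    )
    out["has_discount"] = out["discount"].notna()
    # Sales-to-price ratio; a zero price becomes NaN instead of dividing by zero.
    out["value_efficiency"] = out["units_sold"] / out["price"].replace(0, np.nan)
    return out


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "price": [8.0, 20.0, 60.0],
    "discount": [20.0, np.nan, 5.0],
    "sales_text": ["2k+ sold recently", "500+ sold recently", None],
})
features = add_features(sample)
```

Fixed bin edges keep the price categories comparable across reruns, whereas quantile-based bins would shift whenever the data changes.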
This project demonstrates real-world e-commerce data wrangling, feature engineering, and dataset preparation for downstream analytics and dashboarding.