Enhancing Parts Database through Advanced Web Scraping

Justin Stevenson

Data Scraper
eCommerce Manager
Data Analyst
Python
Scrapy
Selenium
Zuma, a prominent equipment rental company, faced a significant challenge: its catalog of more than 280,000 products from key vendors such as JLG, Genie, Haulotte, and Skyjack contained inaccurate data inherited from unreliable vendor information. A comprehensive database update was critical to both operational efficiency and customer satisfaction.

Objective

The project aimed to create a reliable and centralized database that accurately reflected product titles, images, descriptions, categories, costs, and relevant equipment models. This required the collection of precise data directly from the vendors, bypassing the limitations of publicly available information.
Drawing on seven years of IT experience, I led a complex web scraping initiative to gather and verify the necessary data. This meant working around anti-bot measures on private vendor websites and combining several scraping tools in a multi-stage strategy.

Process

The web scraping operation was executed in several stages:
Initial Data Collection: Using Python scripts built on Beautiful Soup 4, Scrapy, and Selenium, I worked around anti-bot protections to scrape essential product information and download equipment manuals.
Data Analysis and Cleaning: After collection, the data was rigorously analyzed and cleaned with the Python pandas library, a step that was crucial to ensuring its integrity and usability.
Database Creation: I built a new SQL database that modeled the relationships between data points such as products, categories, and equipment models.
Data Enrichment: The process also involved identifying duplicate part numbers and discovering similar and related parts, which significantly improved inventory management.
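The extraction half of the collection step can be illustrated with a minimal sketch. It is not the production scraper: fetching, Scrapy spiders, Selenium sessions, and anti-bot handling are all omitted, and it swaps Beautiful Soup for Python's standard-library `html.parser` so it runs without dependencies. The page markup, the `data-field` attributes, the `PartRecord` fields, and the sample product are hypothetical; real vendor pages are structured differently.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional

TEXT_FIELDS = ("part_number", "title", "description")


@dataclass
class PartRecord:
    """Fields extracted from one product page (hypothetical schema)."""
    part_number: str = ""
    title: str = ""
    description: str = ""
    image_urls: List[str] = field(default_factory=list)


class ProductPageParser(HTMLParser):
    """Collects product fields from elements marked with a hypothetical
    data-field attribute, e.g. <h1 data-field="title">...</h1>.
    Assumes each marked element contains plain text (no nested tags)."""

    def __init__(self) -> None:
        super().__init__()
        self.record = PartRecord()
        self._current_field: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        marker = attr_map.get("data-field")
        if tag == "img" and marker == "image" and attr_map.get("src"):
            # <img> is a void element, so record its src immediately.
            self.record.image_urls.append(attr_map["src"])
        elif marker in TEXT_FIELDS:
            self._current_field = marker

    def handle_endtag(self, tag):
        # Any closing tag ends the current text field (flat-markup assumption).
        self._current_field = None

    def handle_data(self, data):
        text = data.strip()
        if self._current_field and text:
            existing = getattr(self.record, self._current_field)
            setattr(self.record, self._current_field, f"{existing} {text}".strip())


def parse_product_page(html: str) -> PartRecord:
    """Parse one product page's HTML into a PartRecord."""
    parser = ProductPageParser()
    parser.feed(html)
    parser.close()
    return parser.record


# Hypothetical sample page used for illustration only.
SAMPLE_PAGE = """
<div class="product">
  <h1 data-field="title">Hydraulic Filter Element</h1>
  <span data-field="part_number">ABC-12345</span>
  <p data-field="description">Return-line filter for scissor lifts.</p>
  <img data-field="image" src="/img/abc-12345.jpg">
</div>
"""

if __name__ == "__main__":
    print(parse_product_page(SAMPLE_PAGE))
```

With Beautiful Soup the same lookup would usually be a CSS selector per field; the stdlib parser is used here only to keep the example self-contained.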
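The project's cleaning was done in pandas. To keep this sketch dependency-free, it uses plain Python to show the same kinds of transformations: trimming whitespace, parsing currency strings into `Decimal`, dropping rows with no part number, and removing exact duplicates. The field names (`part_number`, `title`, `cost`) and the keep-first rule are assumptions for illustration, not the project's actual cleaning rules.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, Iterable, List, Optional


def collapse_whitespace(text: str) -> str:
    """Trim and collapse internal runs of whitespace to single spaces."""
    return " ".join(text.split())


def parse_cost(raw: str) -> Optional[Decimal]:
    """Convert strings like '$1,204.50' to Decimal; return None if unparseable."""
    cleaned = raw.replace("$", "").replace(",", "").strip()
    if not cleaned:
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def clean_rows(rows: Iterable[Dict[str, str]]) -> List[dict]:
    """Normalize scraped rows (hypothetical field names), drop rows without a
    part number, and keep only the first row seen for each part number."""
    cleaned: Dict[str, dict] = {}
    for row in rows:
        part_number = row.get("part_number", "").strip().upper()
        if not part_number:
            continue  # a row without a part number cannot be matched to inventory
        if part_number in cleaned:
            continue  # duplicate after normalization; keep the first occurrence
        cleaned[part_number] = {
            "part_number": part_number,
            "title": collapse_whitespace(row.get("title", "")),
            "cost": parse_cost(row.get("cost", "")),
        }
    return list(cleaned.values())


if __name__ == "__main__":
    sample = [
        {"part_number": " abc-12345 ", "title": "  Filter   element ", "cost": "$1,204.50"},
        {"part_number": "ABC-12345", "title": "Filter element", "cost": "1204.50"},
        {"part_number": "", "title": "orphan row", "cost": ""},
    ]
    print(clean_rows(sample))
```

In pandas the equivalent steps would typically use `Series.str.strip`, `pd.to_numeric(..., errors="coerce")`, `DataFrame.dropna`, and `DataFrame.drop_duplicates(keep="first")`. `Decimal` is used instead of `float` so prices do not pick up binary rounding error.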
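One way to model the relationships from the Database Creation step is sketched below using SQLite from Python's standard library. The project's actual SQL engine and schema are not described, so every table and column name here, along with the vendor, model, and part rows, is hypothetical. Parts link to equipment models through a junction table, because one part can fit many models and one model uses many parts.

```python
import sqlite3
from typing import List

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE vendors (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE categories (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE equipment_models (
    id         INTEGER PRIMARY KEY,
    vendor_id  INTEGER NOT NULL REFERENCES vendors(id),
    model_name TEXT NOT NULL,
    UNIQUE (vendor_id, model_name)
);
CREATE TABLE parts (
    id          INTEGER PRIMARY KEY,
    vendor_id   INTEGER NOT NULL REFERENCES vendors(id),
    category_id INTEGER REFERENCES categories(id),
    part_number TEXT NOT NULL,
    title       TEXT,
    description TEXT,
    cost_cents  INTEGER,  -- money as integer cents avoids float rounding
    UNIQUE (vendor_id, part_number)
);
CREATE TABLE part_images (
    part_id INTEGER NOT NULL REFERENCES parts(id),
    url     TEXT NOT NULL,
    PRIMARY KEY (part_id, url)
);
-- Junction table: one part fits many models, one model uses many parts.
CREATE TABLE part_models (
    part_id  INTEGER NOT NULL REFERENCES parts(id),
    model_id INTEGER NOT NULL REFERENCES equipment_models(id),
    PRIMARY KEY (part_id, model_id)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO vendors (id, name) VALUES (1, 'Genie')")
    conn.execute("INSERT INTO categories (id, name) VALUES (1, 'Filters')")
    conn.executemany(
        "INSERT INTO equipment_models (id, vendor_id, model_name) VALUES (?, 1, ?)",
        [(1, "Example-Lift-A"), (2, "Example-Lift-B")],
    )
    conn.execute(
        "INSERT INTO parts (id, vendor_id, category_id, part_number, title, cost_cents) "
        "VALUES (1, 1, 1, 'ABC-12345', 'Hydraulic filter element', 120450)"
    )
    conn.executemany(
        "INSERT INTO part_models (part_id, model_id) VALUES (?, ?)", [(1, 1), (1, 2)]
    )
    conn.commit()
    return conn


def models_for_part(conn: sqlite3.Connection, part_number: str) -> List[str]:
    """Return the names of equipment models a part fits, alphabetically."""
    rows = conn.execute(
        """
        SELECT m.model_name
        FROM parts AS p
        JOIN part_models AS pm ON pm.part_id = p.id
        JOIN equipment_models AS m ON m.id = pm.model_id
        WHERE p.part_number = ?
        ORDER BY m.model_name
        """,
        (part_number,),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(models_for_part(build_demo_db(), "ABC-12345"))
```

The `UNIQUE (vendor_id, part_number)` constraint lets two vendors reuse the same part number while still rejecting exact duplicates within one vendor's catalog.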
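Duplicate and similar-part detection can be approximated as in the sketch below. The technique is an assumption, not necessarily the method the project used: it groups part numbers that differ only in punctuation or case, and it uses `difflib.SequenceMatcher` title similarity to flag candidate related parts. The 0.8 threshold and the sample parts are hypothetical. The pairwise comparison is O(n^2), so at 280,000 products it would need blocking (for example, comparing only within a category) to be practical.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize_part_number(part_number: str) -> str:
    """Drop punctuation/whitespace and uppercase, so 'ABC-12345' matches 'abc12345'.
    Assumption: vendors never use punctuation alone to distinguish different parts."""
    return re.sub(r"[^A-Za-z0-9]", "", part_number).upper()


def find_duplicates(parts: List[dict]) -> Dict[str, List[str]]:
    """Group raw part numbers that collapse to the same normalized key."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for part in parts:
        groups[normalize_part_number(part["part_number"])].append(part["part_number"])
    return {key: raw for key, raw in groups.items() if len(raw) > 1}


def similar_parts(parts: List[dict], threshold: float = 0.8) -> List[Tuple[str, str, float]]:
    """Flag pairs of distinct parts whose lowercased titles are similar.
    Compares every pair (O(n^2)); only suitable for small batches or blocked groups."""
    pairs = []
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            a, b = parts[i], parts[j]
            if normalize_part_number(a["part_number"]) == normalize_part_number(b["part_number"]):
                continue  # same part: reported by find_duplicates instead
            ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
            if ratio >= threshold:
                pairs.append((a["part_number"], b["part_number"], round(ratio, 2)))
    return pairs


if __name__ == "__main__":
    sample = [
        {"part_number": "ABC-12345", "title": "Hydraulic filter element"},
        {"part_number": "abc12345", "title": "Hydraulic Filter Element"},
        {"part_number": "DEF-678", "title": "Hydraulic filter element, 10 micron"},
    ]
    print(find_duplicates(sample))
    print(similar_parts(sample))
```

Pairs flagged this way are candidates for review rather than confirmed relationships; title similarity alone can link parts that merely share generic wording.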

Results

The project culminated in several key achievements:
Comprehensive Database: Assembled a detailed database encompassing over 280,000 products, complete with verified descriptions, images, pricing, and technical specifications.
Duplicate and Related Parts: Streamlined the identification of duplicate parts and surfaced related products, improving inventory control.
Updated Inventory Source: Provided departments with an up-to-date and centralized inventory source, improving cross-functional operations.
Customer Experience: Improved the accuracy of product details available to customers, leading to a better user experience.

Conclusion

This case study exemplifies the successful application of data engineering, web scraping, data cleaning, and database design to overcome a significant data management hurdle. The project not only optimized internal processes but also delivered substantial value to Zuma’s customers and various departments.