🤖 Web Catalogue Scraper

Capp Wiedenhoefer

Data Entry Specialist
Data Scraper
Web Developer
AWS
Puppeteer
TypeScript

Here is what my client had to say after completion of the project:

"Working with Capp has been amazing. He understood requirements well, kept me updated with progress and overall delivery is high quality. Amazing professional experience to work with him."

I decided to make the code for this project public. Any references to the identity of the site have been removed. Feel free to check it out here:

Overview

I was commissioned to extract the entire catalogue of a pharmaceutical company located abroad, focusing on publicly available drug information. The client's goal was to collect detailed data on over 500,000 drugs, including names, descriptions, usages, benefits, and images. This required a highly customized scraping solution that could handle the site's architecture, including Server-Side Rendering (SSR), while keeping data collection thorough and efficient.

Challenges and Solutions

One of the main challenges was designing a scraper that could handle both the size of the catalogue and the varied ways the pharmaceutical company's website presented its data, including its use of SSR. To address this, I developed a scraper that adapts to the different scenarios it encounters and extracts the required information without compromising on detail or accuracy.
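A minimal sketch of one way a scraper can adapt to varied data presentation, not the project's actual code: each field gets an ordered list of extractors, and the first one that yields a value wins. Everything here is an assumption made for illustration, including the `DrugRecord` shape, the helper names (`byPattern`, `firstMatch`, `extractDrug`), and the markup the patterns look for. Regular expressions are used only to keep the example dependency-free; with Puppeteer or a proper HTML parser, the same fallback idea would apply to selectors instead.

```typescript
// Shape of one catalogue entry, using the fields named in the overview.
interface DrugRecord {
  name: string | null;
  description: string | null;
  usage: string | null;
  benefits: string | null;
  imageUrl: string | null;
}

// An extractor tries one page layout and returns null when it does not apply.
type Extractor = (html: string) => string | null;

// Build an extractor from a regex with one capture group (hypothetical helper).
function byPattern(pattern: RegExp): Extractor {
  return (html) => pattern.exec(html)?.[1]?.trim() ?? null;
}

// Return the first non-empty result from an ordered list of extractors.
function firstMatch(html: string, extractors: Extractor[]): string | null {
  for (const extract of extractors) {
    const value = extract(html);
    if (value) return value;
  }
  return null;
}

// Assumed layouts: the markup below is invented, not taken from the real site.
const fieldExtractors: Record<keyof DrugRecord, Extractor[]> = {
  name: [
    byPattern(/<h1[^>]*class="drug-name"[^>]*>([^<]+)<\/h1>/),
    byPattern(/<title>([^<|]+)/), // fallback: page title before any " | "
  ],
  description: [
    byPattern(/<div[^>]*id="description"[^>]*>([^<]+)<\/div>/),
    byPattern(/<meta name="description" content="([^"]+)"/),
  ],
  usage: [byPattern(/<div[^>]*id="usage"[^>]*>([^<]+)<\/div>/)],
  benefits: [byPattern(/<div[^>]*id="benefits"[^>]*>([^<]+)<\/div>/)],
  imageUrl: [
    byPattern(/<img[^>]*class="drug-image"[^>]*src="([^"]+)"/),
    byPattern(/<meta property="og:image" content="([^"]+)"/),
  ],
};

// Extract every field from one server-rendered page; missing fields stay null.
function extractDrug(html: string): DrugRecord {
  return {
    name: firstMatch(html, fieldExtractors.name),
    description: firstMatch(html, fieldExtractors.description),
    usage: firstMatch(html, fieldExtractors.usage),
    benefits: firstMatch(html, fieldExtractors.benefits),
    imageUrl: firstMatch(html, fieldExtractors.imageUrl),
  };
}
```

Because a server-rendered page arrives with its content already in the HTML, a function like `extractDrug` can run on the raw response body. Fields that no extractor recognizes come back as `null`, which makes incomplete records easy to flag for review across a large catalogue.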

Outcome

The project culminated in the successful delivery of a comprehensive dataset to the client, encompassing all requested details for each drug in the catalogue. Recognizing the potential value of this project as a showcase of my technical capabilities, I opted to make the scraper's code publicly available on my GitHub, save for any direct references to the source URL. This decision not only demonstrates my expertise in data scraping but also contributes to the broader community by providing a resource for similar challenges.

Professional Impact

This project stands as a testament to my ability to tackle large-scale, complex data scraping tasks, offering clients bespoke solutions that meet their specific data collection needs. By sharing my work, I also underline my commitment to transparency and community engagement in the tech field, reinforcing my reputation as a skilled and resourceful developer in the area of data scraping.
