Lead Generation Data Extraction

Asif Farhan Khan

Email Marketer
Business Development Specialist
Lead Generator
BeautifulSoup
Python
SQL

Worked with dozens of freelancers; none were able to bypass the Cloudflare security. I am really grateful to have found your service for long-term collaboration.



Client's Objective

I am looking for a data scraping expert. This is not for beginners. My data source has over a million records, with IP detectors that flag scrapers if a particular IP visits pages more than 10 times. It also has Cloudflare protection and a bot detector that is invoked after a few visits. The website even limits the number of times humans can view the data before they're given a cooldown period.

It has 3 levels of security, and almost every scraper here and on Fiverr declined the job, saying it's not doable. I do not intend to waste any more money.

Project Specification

  • Search for businesses in alphabetical order on the business listing page
  • For each letter of the alphabet, around 30,000 business records are available
  • Retrieve all the data for each business, mainly:
    • BusinessName
    • BusinessRegistrationDate
    • BusinessEmail
    • BusinessNotificationEmail
  • Save all the extracted HTML data in tabular form in an XLS or CSV file
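
As an illustration of the output format described in the specification, here is a minimal sketch that writes records to CSV using the four field names listed above. The `write_businesses` helper and the sample record are hypothetical and are not taken from the delivered scripts.

```python
import csv
import io

# Column names taken from the project specification.
FIELDS = [
    "BusinessName",
    "BusinessRegistrationDate",
    "BusinessEmail",
    "BusinessNotificationEmail",
]


def write_businesses(records, out):
    """Write dict records to the file-like object `out` as CSV.

    Keys not listed in FIELDS are ignored; missing keys become empty cells.
    Returns the number of rows written.
    """
    writer = csv.DictWriter(
        out, fieldnames=FIELDS, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


# Hypothetical sample record, for illustration only.
sample = [
    {
        "BusinessName": "Acme Ltd",
        "BusinessRegistrationDate": "2019-04-02",
        "BusinessEmail": "info@acme.example",
        "BusinessNotificationEmail": "alerts@acme.example",
        "Extra": "dropped",
    }
]

buffer = io.StringIO()
rows_written = write_businesses(sample, buffer)
```

For a real run, `out` would be a file opened with `newline=""` as the `csv` module documentation recommends, so that line endings are not doubled on some platforms.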

Project Timeline

  • Client's Timeline: Flexible, but less than 3 months if possible
  • Scripting: 3 Days
  • Execution: 7 Days
  • Delivery: 10 Days

Challenges/Proposed Workarounds

  • Challenge: The initial audit of the website found that many businesses were repeated across multiple pages, which made the script slow and inefficient because it kept hitting links it had already visited.
  • Workaround: Inspecting the network requests revealed that all the business data was being fetched from an external database through an API call. Instead of the alphabetical approach, it made much more sense to call the API directly with the business ID number, which was a serial ID. This sped up the scraping by 30x.
  • Challenge: The website was one of the toughest to beat in terms of security, as it hindered nearly every approach. Apart from IP blocking, it also blocked the network and device, so no device connected to the same network could make a request.
  • Workaround: A carefully scripted program that used multiple levels of anti-detection measures was developed.
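
The first workaround avoids duplicates because a serial ID range visits each business exactly once, whereas alphabetical listing pages can repeat the same business. The sketch below shows that idea only. The endpoint and response format are not given here, so `fetch` is a hypothetical callable supplied by the caller, and `fake_fetch` is a stand-in used purely for demonstration.

```python
import logging
from typing import Callable, Iterator, Optional

logger = logging.getLogger("id_walk")


def walk_ids(
    start: int,
    stop: int,
    fetch: Callable[[int], Optional[dict]],
) -> Iterator[dict]:
    """Yield one record per ID in [start, stop), requesting each ID once.

    `fetch` is assumed to return a dict for an existing business and
    None for an unused ID. Failures are logged and skipped so that one
    bad ID does not stop the run.
    """
    for business_id in range(start, stop):
        try:
            record = fetch(business_id)
        except Exception as exc:
            logger.warning("id %s failed: %s", business_id, exc)
            continue
        if record is None:
            continue
        yield record


# Stand-in data source (hypothetical), replacing the real API lookup.
FAKE_DB = {1: {"BusinessName": "Acme"}, 3: {"BusinessName": "Zenith"}}


def fake_fetch(business_id: int) -> Optional[dict]:
    if business_id == 2:
        raise TimeoutError("simulated timeout")
    return FAKE_DB.get(business_id)


records = list(walk_ids(1, 5, fake_fetch))
```

Passing the lookup in as a parameter keeps the iteration logic testable without any network access.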

Solution Provided

The work was delivered with the client's main objective, i.e. email marketing and lead generation, as the main focus:

  • 770,000 Rows were delivered for Project 1 and 1.4 Million Rows for Project 2
  • All the rows were carefully examined to remove any untidy or useless characters
  • Data was filtered so that only rows containing the most significant columns prescribed by the client were packaged for delivery.
  • The businesses were sorted by RegistrationDate in the final output as requested
  • Errors were tracked using logs to minimise any skipped or failed records
  • A suite of tools had to be programmed, each serving a specific need
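
The cleaning, filtering and sorting steps listed above could look roughly like the following sketch. It rests on several assumptions not stated in this case study: that "untidy characters" means control characters and repeated whitespace, that the `REQUIRED` columns are the significant ones, and that registration dates are in ISO `YYYY-MM-DD` format. The function names are hypothetical.

```python
import re
from datetime import date

# Assumption: rows missing any of these columns are dropped.
REQUIRED = ("BusinessName", "BusinessRegistrationDate", "BusinessEmail")

# Assumption: "untidy" means ASCII control characters and runs of whitespace.
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")
_SPACES = re.compile(r"\s+")


def clean(value: str) -> str:
    """Replace control characters with spaces, collapse whitespace, trim."""
    return _SPACES.sub(" ", _CONTROL.sub(" ", value)).strip()


def prepare(rows):
    """Clean every cell, keep rows with all REQUIRED columns, sort by date."""
    kept = []
    for row in rows:
        tidy = {key: clean(str(value)) for key, value in row.items()}
        if all(tidy.get(column) for column in REQUIRED):
            kept.append(tidy)
    # Assumes ISO dates; date.fromisoformat raises ValueError otherwise.
    return sorted(
        kept,
        key=lambda r: date.fromisoformat(r["BusinessRegistrationDate"]),
    )


# Hypothetical sample rows, for illustration only.
rows = [
    {
        "BusinessName": "  Zenith\tLtd ",
        "BusinessRegistrationDate": "2020-01-15",
        "BusinessEmail": "z@zenith.example",
    },
    {
        "BusinessName": "Acme",
        "BusinessRegistrationDate": "2018-06-01",
        "BusinessEmail": "a@acme.example\n",
    },
    {
        "BusinessName": "NoEmail",
        "BusinessRegistrationDate": "2019-01-01",
        "BusinessEmail": "",
    },
]

result = prepare(rows)
```

Here the row without an email is dropped, and the remaining rows come back ordered from the earliest registration date to the latest.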

Database Preview

Project 2 Database (1.4 Million Rows)

Project 1 Database (700,000 Rows)


