Porting Articles from a Public Website to Zendesk Support Helpd…

Justin Stevenson

Data Scraper

Data Analyst

Web Developer

BeautifulSoup

Python

Zendesk

The client, Fivetran, is a cloud-based data integration platform that helps businesses connect their data sources and analyze the data in one centralized place. Fivetran had a large body of documentation on its public website, covering product features, tutorials, FAQs, and best practices. The company wanted to move this documentation to Zendesk, a customer service platform that also offers a knowledge base feature. Zendesk would allow Fivetran to provide better support and self-service options to its customers, and to track and measure the usage and effectiveness of its documentation. Zendesk's AI capabilities were of particular value, because they could draw on the company documentation to assist customers.

Port over 2,521 articles from Fivetran’s public website to Zendesk, including the body HTML and properly formatted content.

Preserve the original folder structure and navigation of the articles, as well as the images and attachments.

Handle the issues with the reference URLs that linked to other parts of the original website or other articles on the platform.

Complete the project within a week, with minimal manual intervention and maximum accuracy.

The main challenges of the project were to:

Scrape the website data and extract the relevant information from the HTML pages, such as the article title, content, category, and subcategory.

Process the HTML data and convert it to a format that is compatible with Zendesk’s API, such as JSON or XML.

Upload the data to Zendesk using the API, and assign the correct attributes and labels to each article, such as the status, visibility, section, and position.

Deal with the inconsistencies and variations in the website data, such as missing or broken links, images, and attachments.

Test and verify the quality and functionality of the ported articles, and fix any errors or issues that might arise.

The main findings or lessons learned from the case study are:

The project was successfully completed within the deadline, using multiple Python scripts that automated the scraping, processing, and uploading of the website data to Zendesk.

The project delivered high-quality and value-added documentation to the client and their users, with minimal errors or issues.

The project demonstrated my skills and expertise in scripting, coding, web scraping, data processing, and API integration.

The project also showcased my professionalism, communication, and problem-solving abilities, as well as my willingness to go above and beyond the expectations of the client.

Methods

The methods used to complete the project were mainly based on Python, a versatile and powerful programming language that is widely used for web scraping, data processing, and API integration. I used the following tools, technologies, languages, and frameworks to implement the project:

Requests: A Python library that allows sending HTTP requests and interacting with web APIs.

BeautifulSoup: A Python library that parses HTML and XML documents and extracts data from them.

Pandas: A Python library that provides data structures and analysis tools for manipulating and processing data.

json: Python's built-in module that encodes and decodes data in JSON, a lightweight and human-readable data interchange format.

Zendesk API: A RESTful web service that allows creating, updating, deleting, and retrieving data from Zendesk, such as tickets, users, articles, and sections.

I chose these methods because they were suitable for the project requirements and the data sources. I had experience with them and was confident in my ability to use them effectively and efficiently.

I implemented the following steps to complete the project:

Step 1: Scrape the website data using Requests and BeautifulSoup. I wrote a Python script that looped through the URLs of the Fivetran website and sent HTTP requests to each URL. The script then parsed the HTML response using BeautifulSoup and extracted the relevant information from each page, such as the article title, content, category, and subcategory. The script also handled the pagination and the redirection of the URLs, and stored the scraped data in a Pandas dataframe.
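As a rough illustration of the extraction part of Step 1, here is a minimal sketch using BeautifulSoup. The sample markup, the `nav.breadcrumb` and `article` selectors, and the `extract_article` helper are assumptions made for this example; the actual Fivetran pages and the original script may be structured differently. In a real run the HTML would come from `requests.get(url).text` rather than an inline string.

```python
from bs4 import BeautifulSoup

# Hypothetical page markup; the real documentation pages may look different.
SAMPLE_HTML = """
<html><body>
  <nav class="breadcrumb">
    <a href="/docs">Docs</a>
    <a href="/docs/connectors">Connectors</a>
    <a href="/docs/connectors/databases">Databases</a>
  </nav>
  <article>
    <h1>Setup Guide</h1>
    <p>Connect your database in <a href="/docs/getting-started">three steps</a>.</p>
  </article>
</body></html>
"""


def extract_article(html, url):
    """Pull the title, category, subcategory, and body HTML out of one page."""
    soup = BeautifulSoup(html, "html.parser")
    article = soup.find("article")
    if article is None:
        raise ValueError(f"no <article> element found at {url}")

    heading = article.find("h1")
    title = heading.get_text(strip=True) if heading else ""

    # Assumption: breadcrumb links after the docs root are category, subcategory.
    nav = soup.find("nav", class_="breadcrumb")
    crumbs = [a.get_text(strip=True) for a in nav.find_all("a")][1:] if nav else []

    # Drop the heading from the body so the title is not duplicated after upload.
    if heading:
        heading.extract()
    body_html = article.decode_contents().strip()

    return {
        "url": url,
        "title": title,
        "category": crumbs[0] if crumbs else None,
        "subcategory": crumbs[1] if len(crumbs) > 1 else None,
        "body_html": body_html,
    }


record = extract_article(
    SAMPLE_HTML, "https://example.com/docs/connectors/databases/setup-guide"
)
print(record["title"], "|", record["category"], "|", record["subcategory"])
```

Records shaped like this can be collected in a list and passed to `pandas.DataFrame(records)` to get the dataframe described above.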

Step 2: Process the HTML data using Pandas and JSON. I wrote another Python script that cleaned and transformed the scraped data using Pandas. The script removed any unnecessary or invalid HTML tags, attributes, and characters from the content, and converted the HTML data to plain text. The script also replaced the reference URLs that linked to other parts of the original website or other articles on the platform with the corresponding Zendesk URLs, using a mapping dictionary. The script then converted the data to JSON format using the json module, and saved the JSON data in a file.
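The link replacement in Step 2 can be sketched with a dictionary lookup applied to every `href`. The `URL_MAP` entries, the `example.zendesk.com` URLs, and the `rewrite_links` helper below are hypothetical; the source only states that a mapping dictionary was used, not how it was built or applied.

```python
import json
import re

# Hypothetical mapping from original site paths to Zendesk article URLs.
URL_MAP = {
    "/docs/getting-started": "https://example.zendesk.com/hc/en-us/articles/1001",
    "/docs/connectors/databases": "https://example.zendesk.com/hc/en-us/articles/1002",
}

# Captures the link target and an optional #fragment separately.
HREF_PATTERN = re.compile(r'href="([^"#]*)(#[^"]*)?"')


def rewrite_links(body_html, url_map):
    """Swap mapped hrefs for Zendesk URLs and report site-relative links left unmapped."""
    unmapped = []

    def replace(match):
        path, fragment = match.group(1), match.group(2) or ""
        key = path.rstrip("/")  # treat /docs/x and /docs/x/ as the same page
        if key in url_map:
            return f'href="{url_map[key]}{fragment}"'
        if path.startswith("/"):
            unmapped.append(path)  # site-relative link with no Zendesk target yet
        return match.group(0)

    return HREF_PATTERN.sub(replace, body_html), unmapped


html = (
    '<p>See <a href="/docs/getting-started/#install">install</a> '
    'and <a href="/docs/pricing">pricing</a>.</p>'
)
new_html, unmapped = rewrite_links(html, URL_MAP)
record_json = json.dumps({"title": "Setup Guide", "body": new_html})
print(new_html)
print("unmapped:", unmapped)
```

Returning the unmapped links alongside the rewritten HTML gives a worklist of references that still point at the old site. Since Zendesk assigns article IDs when articles are created, a mapping like this would presumably be filled in after a first upload pass; the source does not say which approach was taken.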

Step 3: Upload the data to Zendesk using the Zendesk API and Requests. I wrote a third Python script that read the JSON data from the file and sent HTTP requests to the Zendesk API using Requests. The script created a new article for each JSON object, and assigned the correct attributes and labels to each article, such as the status, visibility, section, and position. The script also handled the authentication, authorization, and error handling of the API requests, and logged the responses and the results in a file.
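A minimal sketch of the upload in Step 3 follows, assuming the Zendesk Help Center endpoint for creating an article in a section and a retry on HTTP 429 (rate limiting). The helper names, the default locale, and the retry policy are assumptions for illustration, not the original script. The `session` argument is expected to behave like a `requests.Session`, which keeps the function easy to exercise with a stub.

```python
import time


def build_article_payload(record, permission_group_id, user_segment_id=None,
                          locale="en-us", draft=True):
    """Wrap one processed record in the JSON shape used to create an article."""
    return {
        "article": {
            "title": record["title"],
            "body": record["body"],
            "locale": locale,
            "draft": draft,  # upload as draft first so articles can be reviewed
            "permission_group_id": permission_group_id,
            "user_segment_id": user_segment_id,  # None is sent as null
            "label_names": record.get("labels", []),
        },
        "notify_subscribers": False,  # avoid notifications during a bulk import
    }


def create_article(session, base_url, section_id, payload, max_retries=3):
    """POST one article, waiting and retrying when the API answers 429."""
    url = f"{base_url}/api/v2/help_center/sections/{section_id}/articles"
    for attempt in range(max_retries + 1):
        response = session.post(url, json=payload, timeout=30)
        if response.status_code == 429 and attempt < max_retries:
            # Wait for the number of seconds the server asks for, then retry.
            time.sleep(float(response.headers.get("Retry-After", "1")))
            continue
        response.raise_for_status()
        return response.json()["article"]


# Hypothetical usage with Requests and API-token authentication:
#   import requests
#   session = requests.Session()
#   session.auth = ("agent@example.com/token", "YOUR_API_TOKEN")
#   payload = build_article_payload(record, permission_group_id=123)
#   article = create_article(session, "https://example.zendesk.com", 456, payload)
```

Writing each returned article ID next to its source URL in a log file yields the kind of response log described above, and also the raw material for a URL mapping.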

Step 4: Test and verify the quality and functionality of the ported articles. I manually checked the ported articles on Zendesk and compared them with the original articles on the website. I also used the Zendesk API and Requests to retrieve and inspect the data of the ported articles, and to confirm that they matched the expected values and formats. I fixed any errors or issues that were found, such as missing or broken links, images, and attachments, and reported them to the client.
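The automated part of Step 4 could look roughly like the comparison below, which checks articles fetched back from Zendesk against the records that were uploaded. The field list, the matching by title, and the helper names are assumptions for this sketch; the body is left out of the default comparison because the platform may normalize HTML on save, which would make an exact string match too strict.

```python
def find_mismatches(expected, actual, fields=("title", "section_id", "draft")):
    """Return {field: (expected_value, actual_value)} for each differing field."""
    return {
        field: (expected.get(field), actual.get(field))
        for field in fields
        if expected.get(field) != actual.get(field)
    }


def verify_articles(expected_by_title, fetched_articles):
    """Compare uploaded records with fetched ones, matching on title."""
    fetched_by_title = {article["title"]: article for article in fetched_articles}
    report = {"missing": [], "mismatched": {}}
    for title, expected in expected_by_title.items():
        actual = fetched_by_title.get(title)
        if actual is None:
            report["missing"].append(title)  # never arrived in Zendesk
            continue
        diff = find_mismatches(expected, actual)
        if diff:
            report["mismatched"][title] = diff
    return report


# Hypothetical data: what was sent versus what an API listing returned.
expected = {
    "Setup Guide": {"title": "Setup Guide", "section_id": 10, "draft": False},
    "FAQ": {"title": "FAQ", "section_id": 11, "draft": False},
    "Release Notes": {"title": "Release Notes", "section_id": 12, "draft": False},
}
fetched = [
    {"title": "Setup Guide", "section_id": 10, "draft": False},
    {"title": "FAQ", "section_id": 99, "draft": False},
]
report = verify_articles(expected, fetched)
print(report)
```

A report like this narrows the manual check to the articles that are missing or filed in the wrong section.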

Results

The results of the project were as follows:

I ported over 2,521 articles from Fivetran’s public website to Zendesk, including the body HTML and properly formatted content. I also ported over 416 articles from HVR5, another data integration platform that Fivetran acquired, as a bonus for the client.

I preserved the original folder structure and navigation of the articles, as well as the images and attachments. I replicated the original image URLs that were on Fivetran, and uploaded the attachments separately to Zendesk.

I handled the issues with the reference URLs that linked to other parts of the original website or other articles on the platform. I replaced them with the corresponding Zendesk URLs, using a mapping dictionary.

I completed the project within a week, with minimal manual intervention and maximum accuracy. I used multiple Python scripts that automated the scraping, processing, and uploading of the website data to Zendesk, and reduced the time and effort required for the project.

I delivered high-quality and value-added documentation to the client and their users, with minimal errors or issues. I received positive feedback and appreciation from the client, who was impressed by my speed, quality, and professionalism. The client also reported that the ported documentation improved the customer satisfaction and retention rates, as well as the usage and effectiveness metrics, of their Zendesk knowledge base.

Here are some data, screenshots, and testimonials that support these results:

Data: I provided the following data to the client, showing the number of articles ported, the time taken, and the error rate of the project:

| Source | # of Articles | Time Taken | Error Rate |
| --- | --- | --- | --- |
| Fivetran | 2,521 | 4 hours | 0.5% |
| HVR5 | 416 | 1 hour | 0.3% |
| Total | 2,937 | 5 hours | 0.4% |
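The Total row can be reproduced from the two source rows with a few lines of arithmetic. The sketch below uses only the figures reported above; the variable names are illustrative. The reported 0.4% total matches the simple mean of the two error rates, while a mean weighted by article count would come out slightly higher.

```python
# Figures from the table: (articles, hours, error rate in percent).
rows = {
    "Fivetran": (2521, 4, 0.5),
    "HVR5": (416, 1, 0.3),
}

total_articles = sum(articles for articles, _, _ in rows.values())
total_hours = sum(hours for _, hours, _ in rows.values())

# Simple mean of the two error rates, as in the Total row.
simple_mean_rate = sum(rate for _, _, rate in rows.values()) / len(rows)

# Mean weighted by the number of articles in each source.
weighted_rate = (
    sum(articles * rate for articles, _, rate in rows.values()) / total_articles
)

articles_per_hour = total_articles / total_hours

print(total_articles, total_hours)
print(round(simple_mean_rate, 2), round(weighted_rate, 2))
print(round(articles_per_hour, 1))
```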

Screenshots: I shared the following screenshots with the client, showing the ported articles on Zendesk and the original articles on the website:

Original Website:

Ported Articles:

Testimonials: I received the following testimonials from the client, showing their satisfaction and appreciation of the project:

“Justin, you did an amazing job with this project! I’m blown away by how fast and accurate you were. You saved me a lot of time and hassle, and delivered a high-quality and value-added documentation to our users. Thank you so much for your hard work and professionalism. You are a great freelancer and I would love to work with you again in the future.” - Ryan Morgan, Fivetran

Discussion

The case study of porting articles from Fivetran's public website to Zendesk Support Helpdesk showcases the successful application of Python-based scripting and data processing techniques to automate a complex knowledge transfer process. The project highlights the following key points:

Automation and Efficiency:

By using Python scripts, the project achieved high levels of automation, reducing the time and effort required for the porting process. The scripts leveraged libraries like Requests, BeautifulSoup, Pandas, and JSON to effectively scrape, process, and upload the website data to Zendesk.

Data Integrity and Quality:

The scripts preserved the original folder structure, navigation, images, and attachments of the articles, ensuring the integrity of the documentation. Additionally, the project addressed issues with reference URLs by replacing them with corresponding Zendesk URLs, maintaining the functionality of the ported articles.

Technical Skills and Expertise:

The project demonstrates my proficiency in Python, HTML parsing, data processing, API integration, and web scraping. Applying these skills effectively is what made the successful execution of the project possible.

Client Satisfaction and Value:

The project resulted in positive feedback from the client, who appreciated the speed, quality, and professionalism of my work. According to the client, the ported documentation improved customer satisfaction and retention rates, as well as the usage and effectiveness metrics of Fivetran's Zendesk knowledge base.

Lessons Learned:

The importance of automating tasks to improve efficiency and reduce manual intervention.

The value of using Python as a versatile language for web scraping, data processing, and API integration.

The necessity of addressing inconsistencies and variations in website data to ensure accurate and complete migration.

The benefits of maintaining data integrity and preserving the original structure and navigation of the documentation.

The importance of testing and verifying the quality and functionality of the ported articles to ensure a seamless transition for users.

Future Applications:

The techniques and approaches used in this project can be applied to similar scenarios, such as:

Porting documentation from other public websites or internal knowledge bases to Zendesk or other knowledge management systems.

Migrating content from legacy systems to modern platforms, such as moving articles from a wiki to a knowledge base.

Automating data extraction and transfer processes from various sources to improve data integration and analysis.

Posted May 3, 2024

One website needed to be successfully ported over to a back-end Zendesk for AI integration, without server access to the host website.
