This project is a Python-based automation solution for extracting news data from the Reuters website. It was developed as part of the technical test for the Python Automation Engineer position. Its goal is to show how a bot built with Robocorp's RPA Framework can automate the extraction and processing of news data.
🟢 The Challenge
The task is to automate the extraction of news data from a news site of one's choice. For this test, the Reuters website was selected. The automation covers:
Opening the news site.
Entering a search phrase and selecting a news category.
Extracting news details such as title, date, description, and picture.
Saving the data into an Excel file.
Limiting the results to news published within a specified number of months.
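Limiting news to a number of months comes down to computing the first day of the oldest month in the window and comparing each article's date against it. The following is a minimal sketch, not the project's code: `window_start` and `in_window` are hypothetical names, and it assumes that `months=1` covers the current month only, `months=2` adds the previous month, and that values below 1 are rejected.

```python
from datetime import date


def window_start(months: int, today: date) -> date:
    """Return the first day of the oldest month covered by the window.

    months=1 -> first day of the current month,
    months=2 -> first day of the previous month, and so on.
    (Hypothetical helper; rejecting months < 1 is an assumption.)
    """
    if months < 1:
        raise ValueError("months must be at least 1")
    # Convert to a running month count so divmod handles year boundaries.
    total = today.year * 12 + (today.month - 1) - (months - 1)
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, 1)


def in_window(published: date, months: int, today: date) -> bool:
    """True if `published` lies between the window start and `today`, inclusive."""
    return window_start(months, today) <= published <= today


if __name__ == "__main__":
    # Two months counted from 10 January 2024 reach back to 1 December 2023.
    print(window_start(2, date(2024, 1, 10)))  # 2023-12-01
```

Counting months as a single integer keeps the arithmetic simple: crossing from January back into December of the previous year needs no special case.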
The Source
The automation is implemented for the Reuters website.
Parameters
The process requires the following parameters via a Robocloud work item:
main_url: The Reuters URL used to access the website.
search_input: The phrase to search for in the news.
section: The category or section of the news.
months: The number of months to retrieve news for (e.g., 1 for the current month only, 2 for the current and previous months).
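Before the browser opens, the work-item variables can be checked and converted into typed values. The sketch below is an illustration only: `SearchParams` and `parse_params` are hypothetical names, and it takes a plain dictionary standing in for the input work item's payload (in the robot, this would be read through RPA Framework's work-item library). It assumes all four variables are mandatory and that `months` must be a positive integer.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class SearchParams:
    """Typed view of the work-item variables (hypothetical container)."""
    main_url: str
    search_input: str
    section: str
    months: int


def parse_params(payload: Mapping[str, Any]) -> SearchParams:
    """Validate a work-item payload and convert it into SearchParams."""
    required = ("main_url", "search_input", "section", "months")
    missing = [key for key in required if key not in payload]
    if missing:
        raise ValueError(f"missing work item variables: {', '.join(missing)}")
    try:
        # Payload values may arrive as strings, e.g. "2".
        months = int(payload["months"])
    except (TypeError, ValueError) as exc:
        raise ValueError("months must be an integer") from exc
    if months < 1:  # assumption: non-positive values are rejected
        raise ValueError("months must be at least 1")
    return SearchParams(
        main_url=str(payload["main_url"]).strip(),
        search_input=str(payload["search_input"]).strip(),
        section=str(payload["section"]).strip(),
        months=months,
    )
```

Failing fast on a malformed work item surfaces configuration errors before any browser session is started.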
The Process
Open the Site: Navigate to the Reuters website.
Search: Enter the search phrase and select the news section.
Retrieve News: Collect the URLs of the matching news articles and extract the details from each one.
Extract Data:
Title
Date
Description
Picture filename
Number of occurrences of the search phrase in the title and description
Whether the title or description contains monetary amounts
Save Data: Store the extracted data in an Excel file, along with the downloaded news pictures.
Delete Files: Delete the files generated during the run to keep disk usage low.
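The two derived fields of the Extract Data step (the search-phrase count and the monetary-amount flag) can be computed with regular expressions. This is a minimal sketch under stated assumptions, not the project's implementation: the helper names and Excel column names are hypothetical, phrase matching is assumed to be case-insensitive and whole-word, and the recognized money formats (`$11.1`, `$111,111.11`, `11 dollars`, `20 USD`) are an assumed set.

```python
import re
from typing import Dict, Union

# Assumed money formats: "$11.1", "$111,111.11", "11 dollars", "20 USD".
MONEY_PATTERN = re.compile(
    r"\$\s?\d+(?:,\d{3})*(?:\.\d+)?"
    r"|\b\d+(?:,\d{3})*(?:\.\d+)?\s?(?:dollars?|usd)\b",
    re.IGNORECASE,
)


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of `phrase` in `text`."""
    phrase = phrase.strip()
    if not phrase:
        return 0
    # Lookarounds instead of \b so phrases that start or end with symbols still match.
    pattern = re.compile(rf"(?<!\w){re.escape(phrase)}(?!\w)", re.IGNORECASE)
    return len(pattern.findall(text))


def contains_money(text: str) -> bool:
    """True if `text` contains at least one amount in an assumed money format."""
    return MONEY_PATTERN.search(text) is not None


def build_row(title: str, date_text: str, description: str,
              picture_filename: str, phrase: str) -> Dict[str, Union[str, int, bool]]:
    """Assemble one Excel row; the column names are hypothetical."""
    return {
        "title": title,
        "date": date_text,
        "description": description,
        "picture_filename": picture_filename,
        # Count title and description separately so a match never spans both.
        "search_phrase_count": count_phrase(title, phrase) + count_phrase(description, phrase),
        "contains_money": contains_money(title) or contains_money(description),
    }
```

The lookarounds `(?<!\w)` and `(?!\w)` are used instead of `\b` because `\b` needs a word character at the phrase edge, so a phrase such as `C++` would never match with `\b`.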