Web Scraper & Sentiment Analysis

Salvatore Cancilla

Data Scientist
Data Analyst
AI Developer
Python
Selenium
Sentiment analysis is a process that involves using natural language processing and machine learning techniques to determine the emotional tone or sentiment expressed in a given text. It aims to classify the overall sentiment of the text as positive, negative, or neutral.
This analysis can be applied to various types of text data, such as social media posts, customer reviews, or news articles, providing valuable insights into public opinion, customer feedback, and brand reputation.
This small project aims to show the potential of natural language processing, using data taken directly from comments on LinkedIn.
To get the data, I wrote a Selenium script that interacts with the web pages and collects the comments.
First, the script asks for a LinkedIn username and password, and then for the name of the company to search for.
Given the name of the desired company, I set the scraper to look for the comments of users who have tagged that company.
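The prompting and collection steps could be sketched roughly as follows. This is not the original script: it assumes a Selenium 4 WebDriver that is already logged in and showing the posts that tag the company, and `comment_selector` is a hypothetical CSS selector, since the page markup is not described here. The function names are illustrative only.

```python
from getpass import getpass


def ask_inputs(ask=input, ask_secret=getpass):
    """Prompt for the three values the script needs.

    The prompt functions are parameters so the logic can run without a terminal.
    """
    username = ask("LinkedIn username: ").strip()
    password = ask_secret("LinkedIn password: ")  # getpass does not echo the input
    company = ask("Company to search for: ").strip()
    return username, password, company


def collect_comments(driver, comment_selector):
    """Return the visible text of every element matching comment_selector.

    driver: a Selenium WebDriver (assumed to be on the right page already).
    comment_selector: hypothetical CSS selector for one comment element.
    """
    # "css selector" is the value of Selenium's By.CSS_SELECTOR constant.
    elements = driver.find_elements("css selector", comment_selector)
    texts = (element.text.strip() for element in elements)
    return [text for text in texts if text]  # skip elements with no text
```

Passing the driver in, rather than creating it inside the function, keeps the browser setup and login separate from the extraction logic.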
After collecting the data, I loaded it into a dataframe and removed the information that was not useful.
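What counted as useless information is not specified, so the following is only a minimal sketch under assumptions: it treats empty strings and exact repeats as noise and collapses stray whitespace. The column name `Comment` is hypothetical.

```python
import re


def clean_comments(raw_comments):
    """Normalise scraped comments and drop empty or duplicate ones."""
    rows, seen = [], set()
    for raw in raw_comments:
        # Collapse runs of whitespace (including newlines) into single spaces.
        text = re.sub(r"\s+", " ", raw).strip()
        key = text.casefold()  # case-insensitive duplicate check
        if not text or key in seen:
            continue
        seen.add(key)
        rows.append({"Comment": text})
    return rows
```

The returned list of dictionaries can be passed directly to `pandas.DataFrame(rows)` to build the dataframe.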
This is a preview of the first dataset obtained.
Since the comments were written in several languages and the model I used was trained on English sentences, I used two libraries, LangDetect and Google Translator, to automate the translation of the comments.
LangDetect checks whether the language is English; if it is not, Google Translator translates the comment into English.
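A minimal sketch of this detect-then-translate step is shown below. It assumes the `langdetect` package for detection and, as a stand-in for "Google Translator", the `GoogleTranslator` class from the `deep-translator` package; the exact translation library used in the project is not stated. The function name and its parameters are illustrative.

```python
def to_english(text, detect_fn=None, translate_fn=None):
    """Return (language_code, english_text) for one comment.

    detect_fn and translate_fn default to langdetect and deep-translator
    (an assumption), but any callables with the same shape can be passed in.
    """
    if detect_fn is None:
        from langdetect import detect as detect_fn  # pip install langdetect
    if translate_fn is None:
        from deep_translator import GoogleTranslator  # pip install deep-translator
        translate_fn = GoogleTranslator(source="auto", target="en").translate

    # Note: langdetect raises an exception on text with no detectable
    # features (for example emoji-only comments); that case is not handled here.
    language = detect_fn(text)
    if language == "en":
        return language, text  # already English, nothing to translate
    return language, translate_fn(text)
```

Returning the detected language code alongside the translated text means the same pass can feed any extra column derived from the language.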
I also took advantage of this operation to create a new "Country" feature, because it could be useful and interesting to divide the comments by country.
Since many records contained the personal names of the users who had written the comments, I used another NLP library, spaCy, which allowed me to identify the personal names automatically and delete them.
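One way to do this with spaCy's named entity recognition is sketched below. It assumes the small English model `en_core_web_sm`, which is not named in the project, and both function names are hypothetical.

```python
def load_english_pipeline():
    """Load a spaCy pipeline with NER (model name is an assumption)."""
    import spacy  # pip install spacy; python -m spacy download en_core_web_sm
    return spacy.load("en_core_web_sm")


def remove_person_names(text, nlp):
    """Delete every span that the NER model labels as PERSON."""
    doc = nlp(text)
    pieces, cursor = [], 0
    for ent in doc.ents:  # entities come in document order and do not overlap
        if ent.label_ != "PERSON":
            continue
        pieces.append(text[cursor:ent.start_char])  # keep text before the name
        cursor = ent.end_char  # skip over the name itself
    pieces.append(text[cursor:])
    # Tidy the double spaces left behind by the removed names.
    return " ".join("".join(pieces).split())
```

A typical call would be `remove_person_names(comment, load_english_pipeline())`, loading the pipeline once and reusing it for every record.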
Training an NLP model requires time, dedication, skill, and a lot of data. A common practice in data science, however, is to use the pre-trained models available on Hugging Face.
To identify the sentiment of user comments, I used a BERT-based NLP model through the pipeline provided by Hugging Face.
To apply the pretrained model, I wrote a function that stores the sentiment of each comment in a new "Sentiment" column.
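Such a function might look like the sketch below. The specific checkpoint is not named in the project, so `load_classifier` simply calls `pipeline("sentiment-analysis")` and lets the `transformers` library pick its default English sentiment model, which is an assumption; the label strings depend on the checkpoint. The `Comment` key and the function names are hypothetical.

```python
def load_classifier():
    """Build a Hugging Face sentiment pipeline (default checkpoint, an assumption)."""
    from transformers import pipeline  # pip install transformers torch
    return pipeline("sentiment-analysis")


def add_sentiment(rows, classifier, text_key="Comment"):
    """Return copies of rows with an extra "Sentiment" entry.

    classifier: a callable that takes a list of strings and returns one
    {"label": ..., "score": ...} dict per string, as a transformers
    text-classification pipeline does.
    """
    if not rows:
        return []
    texts = [row[text_key] for row in rows]
    predictions = classifier(texts)
    # dict(row, Sentiment=...) copies the row, so the input is left untouched.
    return [dict(row, Sentiment=pred["label"]) for row, pred in zip(rows, predictions)]
```

With a dataframe, the same idea reduces to assigning the list of predicted labels to `df["Sentiment"]`.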
For a deeper sentiment analysis, more data about the users who left comments would be needed. This project aims to show the potential of natural language processing. For example, the same model could be used to identify negative comments and drive a chatbot that responds to users who leave negative feedback.