Fake News Detection Using Deep Learning Models

Segun Oni

ABSTRACT
This research project aims to develop a reliable fake news detection system using advanced deep learning models, specifically transformer-based architectures like BERT and RoBERTa. The system's objective is to combat misinformation and ensure information integrity in the digital landscape.
Traditional machine learning methods and recent advancements in deep learning are compared through an extensive literature review. Transformer-based models are selected for their ability to capture complex relationships and contextual understanding. Utilizing pre-trained models and specialized accelerators further enhances the system's efficiency.
Performance evaluation on the WELFake dataset shows that transformer-based models outperform traditional classifiers such as Random Forest and Decision Trees. The deep learning models achieve approximately 1.85% higher accuracy, recall, and F1-score, and 0.84% higher precision. They outperform Random Forest by approximately 5.53% in accuracy, 2.96% in precision, 6.65% in recall, and 5.33% in F1-score, and Decision Trees by approximately 5.95% in accuracy, 3.26% in precision, 6.93% in recall, and 5.74% in F1-score.
The system's comparison with existing solutions reaffirms the superiority of transformer-based models in fake news detection. An interactive interface ensures practicality and user-friendliness. Strategies for bias mitigation, diverse data sources, and multilingual support are proposed to address limitations, emphasizing the importance of ethical considerations.
The system's contributions include insights into transformer-based models' effectiveness, leveraging pre-trained models, and promoting ethical AI use. It offers practical applications for media organizations, social media platforms, and fact-checking agencies to combat misinformation.
Future work involves exploring ensemble models, interpretability, cross-domain transfer learning, and multimedia content detection for further advancements.
In conclusion, this research project advances fake news detection using deep learning models, with a focus on preserving information integrity. The system's contributions and potential benefits pave the way for future research in addressing misinformation and fostering reliable information dissemination.
CHAPTER 1: INTRODUCTION
1.1. Background
Fake news is fictitious information, in the form of articles, stories, or hoaxes, that is deliberately false and created to misinform or deceive (Hsu 2022). It has taken many forms throughout history, from word of mouth and print to today's internet. Before the internet, most people got their information from a few trusted sources, such as newspapers or television networks, and it was in those sources' best interest to avoid distributing fake news in order to maintain their credibility. Since the rise of the internet, we have been bombarded with information from countless sources, and spreading fake news has become far easier. One survey found that roughly 75% of people who saw fake news headlines believed them to be real (Silverman and Singer-Vine 2016).
Misinformation online can have a huge impact on society and democracy (Colomina et al. 2021). Wu et al. (2022) highlighted in their research that misinformation has been linked to political division and has been used in the past as a form of social control. It has the potential to manipulate democratic processes and distort public understanding of health topics; for example, Wu et al. (2022) found that misinformation about COVID-19 has been linked to negative health behaviours such as vaccine hesitancy. Fact-checking tools and services such as NewsGuard and Hoaxy have been built to help verify the accuracy of news articles and track how claims spread. As the problem of fake news continues to grow, effective detection methods are needed, and more and more tools are being developed to identify fake news and prevent it from spreading.
For fake news detection, deep learning models generally perform better than traditional machine learning models as the volume of available data grows (Okoro et al. 2018). Deep learning models have therefore shown great potential for this task, although no single method is perfect and there is still room for improvement. Using deep learning here also brings specific challenges. These models require large datasets of both fake and legitimate news to train on, and because fake news is often difficult to identify and collect, assembling such datasets can be hard. Moreover, fake news evolves constantly, so models must be retrained regularly on new data to keep up with the latest trends.
A pre-trained deep learning model is a model that already has prior knowledge of a specific task, having been trained on a large dataset, typically in a supervised learning setting. Pre-trained models save time and effort when developing new machine learning applications: instead of training a model from scratch, we can simply fine-tune a pre-trained model that already holds a broad representation of the task it was built for. This is especially useful when the available training dataset is not large.
The table below summarizes the advantages and drawbacks of using pre-trained deep learning models for fake news detection:
Table 1.1: Pros and Cons of using pre-trained deep learning models.
Pros:
- Faster to train on top of or fine-tune.
- More accurate than models trained from scratch when training data is limited.
Cons:
- If the dataset is large, it may be less accurate than a deep learning model trained from scratch.
- Less accurate for tasks outside the domain of the data it was pre-trained on.
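To make the fine-tuning workflow concrete, the following is a minimal sketch (not this project's actual training code) of fine-tuning a pre-trained transformer for binary fake news classification with the Hugging Face transformers library. The model name, toy texts, labels, and hyperparameters are illustrative placeholders:

```python
# Minimal sketch: fine-tuning a pre-trained transformer for fake news
# classification with the Hugging Face transformers library.
# Assumption: a real run would load a labelled dataset such as WELFake;
# the texts, labels, and hyperparameters below are placeholders.
import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "bert-base-uncased"  # "roberta-base" works the same way

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=2)  # binary task: fake vs. real

texts = ["NASA announces a new lunar mission.",          # toy examples
         "Celebrity cures cancer with one weird trick!"]
labels = [1, 0]  # 1 = real, 0 = fake (illustrative convention)

encodings = tokenizer(texts, truncation=True, padding=True, max_length=256)

class NewsDataset(torch.utils.data.Dataset):
    """Wraps tokenizer output and labels for the Trainer API."""
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

args = TrainingArguments(output_dir="ft-fakenews",
                         num_train_epochs=3,
                         per_device_train_batch_size=16)

trainer = Trainer(model=model, args=args,
                  train_dataset=NewsDataset(encodings, labels))
trainer.train()
trainer.save_model("ft-fakenews")        # reused by the deployment sketch
tokenizer.save_pretrained("ft-fakenews")
```

Because every weight except the new classification head starts from the pre-trained checkpoint, a few epochs on a modest dataset are often enough, which is the speed advantage noted in the table above.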
Feature extraction techniques in fake news detection include lexical, syntactic, and semantic features (Zhou and Zafarani 2020). Traditional approaches engineer such features explicitly, whereas transformer models learn dense vector representations that capture the meaning of each word in context. The choice of feature extraction and representation techniques, model architecture, and optimization strategy is crucial for achieving high accuracy in detecting fake news.
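For contrast with learned representations, here is a hypothetical sketch of a lexical-feature baseline: TF-IDF vectors feeding a Random Forest classifier in scikit-learn. The texts, labels, and parameters are illustrative only:

```python
# Illustrative lexical-feature baseline: TF-IDF + Random Forest.
# TF-IDF captures word-frequency (lexical) features only; unlike
# transformer embeddings, it ignores word order and context.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

texts = ["Central bank raises interest rates by 0.25 points.",
         "Doctors hate this trick that melts fat overnight!"]
labels = [1, 0]  # 1 = real, 0 = fake (illustrative)

baseline = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),      # unigrams and bigrams
    RandomForestClassifier(n_estimators=100))
baseline.fit(texts, labels)

print(baseline.predict(["Breaking: miracle pill cures all diseases"]))
```

Baselines of this kind correspond to the Random Forest and Decision Tree classifiers against which the transformer models are compared later in this report.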
The research goal is to develop an interactive web application for fake news detection by fine-tuning pre-trained deep learning models. The research focuses on computational speed and uses recent data in order to provide a tool that is relevant for detecting fake news as of the period of this research.
1.2. Problem Statement
Fake news is false or misleading information presented as news. It can negatively impact society, politics, businesses, and people's behaviour (Riccio and Gibbs 2018). In an article, Aldwairi and Alwahedi (2018) pointed out that there are various existing methods for fake news detection, but no perfect one. It has also been reported that deep learning methods show promise in this area but can be difficult to train and deploy for public use, and several challenges need to be addressed before they can be widely adopted (Figueira, Guimarães and Torgo 2018).
Because these models take a long time to train, it is difficult for them to keep up with the evolving techniques used by creators of fake news. This study proposes using pre-trained models to speed up the training process, along with other methods to increase computational speed. It also proposes using Python frameworks like Streamlit and Gradio, since they can be used to deploy machine learning models quickly and easily, as sketched below. This approach to model deployment could also be beneficial for future scalability, handling more users and requests. With more research, there is a good chance that deep learning will play a major role in combating the spread of fake news.
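As a minimal illustration of such a deployment, the sketch below wraps a fine-tuned model behind a Gradio web form; the "ft-fakenews" directory is the hypothetical output of the earlier fine-tuning sketch, and Streamlit offers a similarly small API:

```python
# Minimal deployment sketch with Gradio (assumes "ft-fakenews" holds a
# fine-tuned model and tokenizer saved by the earlier training sketch).
import gradio as gr
from transformers import pipeline

clf = pipeline("text-classification", model="ft-fakenews")

def check_article(text: str) -> str:
    """Classify one article and report the label with its confidence."""
    result = clf(text, truncation=True)[0]
    return f"{result['label']} (confidence: {result['score']:.2f})"

demo = gr.Interface(fn=check_article,
                    inputs=gr.Textbox(lines=8, label="News article text"),
                    outputs=gr.Textbox(label="Prediction"),
                    title="Fake News Detector")

if __name__ == "__main__":
    demo.launch()  # serves a local web UI in a few lines of code
```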
1.3. Personal Motivation
I am personally motivated to detect fake news using deep learning models due to the severe impact that misinformation has on society and the pressing need for effective solutions. I believe that addressing the spread of fake news is not only a technological challenge but also a moral responsibility to protect the well-being of individuals and society as a whole. As a researcher, I aim to find innovative ways to balance the trade-off between model accuracy and computational speed and provide solutions that are both accurate and efficient. I am committed to developing a research approach that enables me to stay ahead of the curve and ensure that my detection tools remain relevant and effective in the face of emerging misinformation strategies. Ultimately, my personal motivation lies in contributing to the broader fight against fake news and promoting a more informed and resilient society.
1.4. Content of the Following Chapters
Below is the structure of the content in the remaining chapters.
Chapter Two - Literature review: presents a literature review on fake news detection using deep learning models, covering methods, datasets, limitations, and gaps in previous research. Methods include classical ML and deep learning, with datasets like FakeNewsNet and LIAR. Limitations involve interpretability, scalability, and language generalization. The research aims to address gaps, including multimodal data integration and adaptability for better fake news detection.
Chapter Three - Project specifications: outlines the project's aim to develop a fake news detection system using deep learning models and pre-trained models like BERT and RoBERTa. The methodology follows a waterfall model, including data collection, preprocessing, model training, testing, and evaluation. Risks are analyzed and mitigated to ensure project success.
Chapter Four - Design Alternatives and Justification for Chosen Design: presents the candidate machine learning approaches, the justification for the models chosen, the experimental setup, the results obtained with the machine learning baselines, and the rationale for moving to deep learning models.
Chapter Five - Implementation and Results: a detailed description of how the research is conducted, covering the hardware and software used, the steps involved in implementation, and the challenges encountered. The speed of the process and the evaluation loss are also documented.
Chapter Six - Evaluation: critical analysis of the research outcomes, comparison of the achievements with existing solutions and with the project objectives, and evaluation of the functional requirements.
Chapter Seven - Conclusion: limitations, factors that could affect performance, potential benefits of the research, and recommendations for future research.