In this project, I conducted a comprehensive data analysis of depression among students using a large-scale survey dataset. The analysis was performed in R within a Jupyter Notebook as part of the Google Data Analytics Capstone project.
Objectives
Identify key demographic, academic, lifestyle, and psychological factors linked to student depression.
Assess the quality and reliability of the dataset before analysis.
Clean and transform the dataset for accuracy and consistency.
Explore trends, patterns, and correlations between variables and depression outcomes.
Provide actionable insights to support early intervention strategies for mental health.
Methodology
Loaded and inspected the dataset using R and the custom-built autofileIO package for automated file reading.
Performed ROCCC assessment (Reliable, Original, Comprehensive, Current, Cited) to evaluate data quality.
Cleaned data by handling missing values, removing duplicates, and converting categorical variables to factors.
Computed descriptive statistics to understand variable distributions.
Created data visualizations using ggplot2 to identify trends and patterns across demographics, academic pressure, lifestyle habits, and mental health indicators.
Analyzed relationships between variables (e.g., financial stress, family history of mental illness) and depression levels.
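The cleaning steps listed above (missing values, duplicates, factor conversion) can be sketched in base R as follows. This is a minimal illustration, not the project's actual code: the toy data frame and every column name in it are hypothetical, and dropping incomplete rows is only one possible way to handle missing values.

```r
# Toy data standing in for the survey; all column names are hypothetical.
students <- data.frame(
  id             = c(1, 2, 2, 3, 4, 5),
  gender         = c("Male", "Female", "Female", "Male", NA, "Female"),
  sleep_duration = c("5-6 hours", "7-8 hours", "7-8 hours", NA,
                     "5-6 hours", "Less than 5 hours"),
  depression     = c(1, 0, 0, 1, 1, 0),
  stringsAsFactors = FALSE
)

# Count missing values per column before deciding how to handle them.
missing_per_column <- colSums(is.na(students))

# Drop exact duplicate rows, then rows with any missing value
# (an assumption; imputation would be another option).
cleaned <- students[!duplicated(students), ]
cleaned <- cleaned[complete.cases(cleaned), ]

# Convert categorical variables to factors.
categorical_cols <- c("gender", "sleep_duration")
cleaned[categorical_cols] <- lapply(cleaned[categorical_cols], factor)
cleaned$depression <- factor(cleaned$depression,
                             levels = c(0, 1), labels = c("No", "Yes"))

# Inspect the resulting structure.
str(cleaned)
```

Checking `missing_per_column` before dropping rows shows how much data a given strategy would discard, which matters when a survey has many sparsely answered questions.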
Results
Found that depression prevalence was similar among male and female students (~58%).
Identified significant associations of high academic pressure, inadequate sleep duration, and unhealthy dietary habits with higher depression rates.
Observed that financial stress and a family history of mental illness were strongly associated with depression.
Produced a clear, data-driven report to guide educational institutions and policymakers in creating effective mental health intervention programs.
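Group-level findings like those above are typically backed by a prevalence table and a test of association. The sketch below shows one way to do that in base R. The data are synthetic and purely illustrative (they are not the project's figures), the column names are hypothetical, and the chi-squared test is an assumed choice of method rather than one confirmed by the write-up.

```r
# Synthetic example data; counts are made up for illustration only.
survey <- data.frame(
  academic_pressure = rep(c("Low", "High"), each = 50),
  depression = c(rep(c("Yes", "No"), times = c(15, 35)),
                 rep(c("Yes", "No"), times = c(40, 10)))
)

# Cross-tabulate the group against the depression outcome.
counts <- table(survey$academic_pressure, survey$depression)

# Row-wise proportions give the depression prevalence within each group.
prevalence <- prop.table(counts, margin = 1)[, "Yes"]

# Chi-squared test of independence between the two variables.
test <- chisq.test(counts)

print(round(prevalence, 2))
print(test$p.value)
```

A small p-value here indicates an association between the grouping variable and depression in the sample; it does not by itself show that one causes the other.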
Posted Aug 14, 2025
Conducted data analysis on student depression for Google Data Analytics Capstone.