Spotify Top 10,000 Songs Analysis

Abhinav Dubey

Analysis of Top 10,000 Songs on Spotify (1960-Present)

Overview

This project explores the 'Top 10,000 Songs on Spotify from 1960 to 2023' dataset, analysing musical trends, popularity, and how listening has evolved over the years. It combines data cleaning, exploratory data analysis (EDA), and machine learning to examine tracks, albums, artists, labels, and genres over time. It also includes predictive modelling of song popularity and a recommendation system based on musical features.

Visualisations

For an interactive exploration of the project's findings, see the Tableau Public dashboard: Interactive Music Analysis on Tableau Public
The dashboard complements the notebook by letting you filter and explore the data yourself.

Installation

To get started, clone this repository to your local machine and make sure Anaconda is installed. Then:
Create the musicanalysis environment from the provided environment.yml file: conda env create -f environment.yml
Activate the musicanalysis environment: conda activate musicanalysis
Note: This project utilises a custom-built function, describex(), housed within a custom module for enhanced data description. The function and its module are not provided in this repository to maintain the focus on the notebook's methodology and findings. Descriptions of its functionality and intended output are included within the notebook.
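Because describex() is not included in the repository, anyone running the notebook needs a substitute. The sketch below is a minimal stand-in, not the author's function. It assumes describex() behaves like an extended pandas describe() that reports dtype, missing values, and unique counts alongside basic statistics. The name describex_sketch and all of its output columns are hypothetical; check the descriptions in the notebook for the intended output.

```python
import pandas as pd


def describex_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for describex(): one summary row per column."""
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "non_null": df.notna().sum(),
        "missing": df.isna().sum(),
        "missing_pct": (df.isna().mean() * 100).round(2),
        "unique": df.nunique(),
    })
    # Add the usual statistics for numeric columns only;
    # non-numeric columns end up with NaN in these fields.
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] > 0:
        stats = numeric.describe().T[["mean", "std", "min", "max"]]
        summary = summary.join(stats)
    return summary


if __name__ == "__main__":
    # Tiny made-up frame, only to show the shape of the output.
    demo = pd.DataFrame({
        "track": ["A", "B", "C", "D"],
        "popularity": [70, 55, None, 80],
        "explicit": [False, True, False, False],
    })
    print(describex_sketch(demo))
```

Replacing calls to describex(df) with describex_sketch(df) should let the affected cells run, although the output will not match the notebook's exactly.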

Usage

This project is presented as a Jupyter Lab notebook that walks through the analysis step by step. To explore it:
Activate the musicanalysis environment: conda activate musicanalysis
Start Jupyter Lab: jupyter lab
Navigate to and open the project notebook.

Project Structure

README.md - Project overview and setup instructions.
environment.yml - Conda environment file.
Music Analysis (Top 10,000 Songs on Spotify 1960-Now).ipynb - Jupyter Lab notebook containing the project's analysis.

Analysis Overview

The analysis is organised into four parts:
Temporal Evolution of Music (1960-Present): Track trends over time, popular tracks and artists, and album and label trends.
Genre Analysis: Broad categorisation of genres into four major categories (Hip-Hop, Pop, Rock, and Others), along with genre popularity and evolution.
Miscellaneous Analysis: Popularity over the years, evolution of musical features, and distribution of explicit content.
Machine Learning: Random Forest and Gradient Boosting Regressor models for song popularity prediction, and a recommendation system based on song features.
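The machine learning part can be illustrated with a short sketch. It is not the notebook's code. It assumes scikit-learn, uses randomly generated stand-in data instead of the Kaggle dataset, and picks cosine similarity on standardised features for the recommender, which this README does not specify. The feature names and the function names make_synthetic_tracks, compare_regressors, and recommend are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature columns; the notebook's actual feature set may differ.
FEATURES = ["danceability", "energy", "valence", "tempo"]


def make_synthetic_tracks(n_tracks=300, seed=0):
    """Random stand-in data so the sketch runs without the Kaggle CSV."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n_tracks, len(FEATURES)))
    X[:, 3] = 60.0 + 140.0 * X[:, 3]  # tempo in BPM, unlike the 0-1 features
    # Made-up target on a 0-100 scale; not derived from real Spotify data.
    noise = rng.normal(0.0, 5.0, n_tracks)
    y = np.clip(50 + 30 * X[:, 0] - 20 * X[:, 2] + noise, 0, 100)
    return X, y


def compare_regressors(X, y, seed=0):
    """Fit both model families; return each one's held-out mean absolute error."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    models = {
        "random_forest": RandomForestRegressor(n_estimators=100, random_state=seed),
        "gradient_boosting": GradientBoostingRegressor(random_state=seed),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = mean_absolute_error(y_test, model.predict(X_test))
    return scores


def recommend(X, track_index, k=5):
    """Indices of the k tracks most similar to track_index (cosine similarity)."""
    Z = StandardScaler().fit_transform(X)  # put tempo on the same scale as the rest
    norms = np.linalg.norm(Z, axis=1)
    norms[norms == 0] = 1.0  # avoid division by zero for an all-zero row
    Z = Z / norms[:, None]
    similarity = Z @ Z[track_index]
    similarity[track_index] = -np.inf  # never recommend the query track itself
    return np.argsort(similarity)[::-1][:k]


if __name__ == "__main__":
    X, y = make_synthetic_tracks()
    print(compare_regressors(X, y))
    print(recommend(X, track_index=0, k=5))
```

With the real dataset, X would hold the audio-feature columns and y the popularity column. The errors printed here come from synthetic data and say nothing about the project's actual results.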

Dataset Reference

This project utilises the 'Top 10,000 Spotify Songs from 1960 to Now' dataset, contributed by JOAKIM ARVIDSSON. The dataset can be accessed on Kaggle at this link: https://www.kaggle.com/datasets/joebeachcapital/top-10000-spotify-songs-1960-now/data

Acknowledgements

I would like to thank the many YouTube creators who have shared their knowledge of data science and data analysis, which has been instrumental in my learning journey.
A special thanks to ChatGPT for its invaluable assistance throughout the development of this project.

Contributing

Feedback and contributions to this project are welcome.

Licence

This project is open-source and available under the MIT Licence. The use of the dataset is subject to the terms provided by its respective owner or contributor.

Posted Apr 24, 2025

Analysis of Spotify's top 10,000 songs from 1960-2023, exploring trends and creating a recommendation system.