Analyzing Spotify and YouTube Songs Using Python & MySQL

Drashti Patel

Data Analyst
Google Sheets
Python
SQL

Project Overview

I was able to participate in a live Data Analytics project offered by HiCounselor. In this project, I performed essential data pre-processing tasks with Python to ensure accurate and reliable data for further analysis with SQL. This included handling null values, removing irrelevant columns, eliminating duplicates, converting data types, and creating new columns. I also analyzed the dataset to determine dominant platforms and exported the cleaned dataset to a CSV file. Overcoming challenges during the data import process was a significant aspect of the project.

Process of Live Project Execution

Part 1: Data PreProcessing
Data Pre-processing is crucial for effective data analytics as unprocessed data can lead to undesirable outcomes in further applications. In this project, I used Python to perform various data modifications including handling null values, deleting or transforming irrelevant values, changing data types, removing duplicates, and more. These tasks optimized the dataset, eliminated errors, and prepared it for in-depth analysis. Importing the table posed challenges, but I successfully addressed them. The resulting dataset was high-quality and ready for advanced exploration.
I performed various tasks to enhance the quality and usability of the dataset:
1. Streamlined the data by eliminating unnecessary columns (Url_spotify, Uri, Key, Url_youtube, and Description).
2. Conducted a thorough examination of the dataset for null values and calculated the total count of null values in each column to assess data completeness and identify potential quality issues.
3. Managed null values by implementing appropriate strategies to improve overall data quality.
4. Identified and removed duplicate records, retaining the first occurrence of each unique value to ensure data integrity and reduce redundancy while preserving the original data structure.
5. Converted duration values from milliseconds to minutes, making it easier to understand and interpret time intervals in a user-friendly format.
6. Renamed the modified column as "Duration_min" to accurately reflect the data transformation.
7. Eliminated irrelevant track names starting with the "?" character to ensure the dataset only contained meaningful track information.
8. Computed the Energy to Liveness ratio for each track, quantifying the relationship between energy and liveliness attributes and storing the results in a new column named 'EnergyLiveness'.
9. Modified the data type of the 'views' column to float, enabling numerical operations and facilitating its use in subsequent calculations requiring floating-point values.
10. Analyzed the 'views' and 'stream' columns to determine the dominant platform (YouTube or Spotify) where each song track received the highest play count. Created a new column called 'most_playedon' with values 'Spotify' or 'YouTube' indicating the platform with the highest play count for each track.
11. Exported the cleaned dataset to a CSV file named "cleaned_dataset.csv" and provided a clickable file name for easy access and retrieval by users.
12. Created a MySQL table named "cleaned_dataset" using phpMyAdmin and imported the exported file "cleaned_dataset.csv" for further analysis.
Part 2: Data Analysis
During this project, I successfully answered several data analysis questions to gain valuable insights into song popularity, engagement patterns, and user preferences across different platforms.
Analysis Questions:
1. Identified the most viewed song track on YouTube.
2. Determined the song track with the highest streaming count on Spotify.
3. Identified the top 5 songs with the highest energyLiveness ratio, a popular measure of song quality.
4. Analyzed Black Eyed Peas songs to determine which platform (YouTube or Spotify) generated more engagement for their tracks.
5. Found the most liked song on YouTube for Gorillaz and included its energy and tempo.
6. Identified the prominent album types on Spotify.
7. Assisted Spotify in finding the most loved songs by selecting the top tracks with the highest combined streaming and YouTube views.
into song popularity, engagement patterns, and user preferences across different platforms.

Certificate

Partner With Drashti
View Services

More Projects by Drashti