Suicide occurs in almost every country, but suicide rates and counts differ from country to country. This project analyzes a dataset of suicide cases from 1985 to 2020. It aims to explore relationships between the suicide rate and age group, generation, country, HDI for the year, GDP for the year, and GDP per capita.
### Project description
The dataset has 12 variables: age, country, country-year, GDP for_year, gdp_per_capita, generation, HDI for year, population, sex, suicides_no, suicides/100k pop, year. It covers the years 1985 to 2020 and was downloaded from Kaggle.
The following data-wrangling steps were carried out to clean the dataset:
1. HDI for year was removed because it had many null values.
2. An ID column was added to uniquely identify the countries.
### Summary of Findings
1. Over time, GDP per capita has increased and suicide rates have decreased.
2. The number of suicides among males is more than twice that of females.
3. The number of suicides in the Russian Federation is high.
4. The age group 35-54 has high suicide rates.
5. The Baby Boomer generation has high suicide rates.
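
The findings above come from grouping the table by one dimension at a time. The queries below are a minimal sketch of that kind of aggregation, not the exact queries used in this project. They assume the columns are named as listed in the project description (`year`, `sex`, `age`, `suicides_no`, `population`, `gdp_per_capita`); the names in your import may differ, for example if the CSV header carried units.

```sql
-- Sketch only: column names are assumed from the variable list above.
-- Each row is one country/year/sex/age group, so SUM(population) counts
-- person-years and the rate is suicides per 100k person-years.

-- Trend over time: GDP per capita (unweighted average over rows) and suicide rate
SELECT `year`,
       ROUND(AVG(gdp_per_capita), 0) AS avg_gdp_per_capita,
       ROUND(SUM(suicides_no) / SUM(population) * 100000, 2) AS suicides_per_100k
FROM `suicide m`
GROUP BY `year`
ORDER BY `year`;

-- Suicide counts and rates by sex
SELECT sex,
       SUM(suicides_no) AS total_suicides,
       ROUND(SUM(suicides_no) / SUM(population) * 100000, 2) AS suicides_per_100k
FROM `suicide m`
GROUP BY sex
ORDER BY total_suicides DESC;

-- Suicide counts and rates by age group
SELECT age,
       SUM(suicides_no) AS total_suicides,
       ROUND(SUM(suicides_no) / SUM(population) * 100000, 2) AS suicides_per_100k
FROM `suicide m`
GROUP BY age
ORDER BY suicides_per_100k DESC;
```

Swapping the grouping column for `country` or `generation` gives the corresponding breakdowns. Computing the rate from summed counts and summed population avoids averaging the per-row rates, which would weight small and large groups equally.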
CREATE DATABASE portfolioproject;
USE portfolioproject;
SHOW TABLES;
-- Select the data we are going to use from the `suicide m` table
SELECT *
FROM `suicide m`;
-- Data cleaning of suicide dataset
-- Adding id column
ALTER TABLE `suicide m`
ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY;
-- Checking the number of distinct records
SELECT COUNT(DISTINCT id)
FROM `suicide m`;
-- `suicide m` has 30556 records
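-- Checking for missing data (id is NOT NULL, so test the data columns instead)
SELECT *
FROM `suicide m`
WHERE country IS NULL OR `year` IS NULL OR sex IS NULL OR age IS NULL
   OR suicides_no IS NULL OR population IS NULL;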
-- Deleting the column `HDI for year`, which will not be used in the analysis