Employee Performance Analytics Using Python & MySQL

Drashti Patel

Data Analyst

Jupyter

MySQL

Python

Project Overview

I had the valuable opportunity to participate in a live project with HiCounselor, where I collaborated with a team to complete various project modules. This project aimed to leverage a real-world dataset of employee records, allowing me to apply data analytics techniques to solve business problems and uncover valuable insights. The primary objective of this project was to evaluate the provided dataset thoroughly, extract meaningful information, and address specific business challenges. Throughout the project, I utilized structured query language (SQL) to analyze the real-world database, ensuring I could retrieve the most useful insights from the dataset. To enhance the project's performance and efficiency, I employed Python for data preprocessing tasks. This involved cleaning, transforming, and optimizing the dataset to improve the quality and reliability of our analysis. This project provided a comprehensive opportunity for SQL analysis, Python data preprocessing, and extracting insights from real-world databases. It equipped me with practical skills, a strong foundation in data analytics, and a deep understanding of its value for organizations. It showcases my proficiency in delivering actionable insights and making informed decisions through data-driven approaches.

Process of Live Project Execution

Part 1: Data PreProcessing

The first part of the project involved data processing. This involved several steps to ensure data quality and compatibility. Firstly, I imported the dataset's CSV file into a Jupyter Notebook using Python. Then, I utilized Python to remove duplicate rows and eliminate any entries with blank or missing values.

After cleaning the dataset, I proceeded to import the CSV file into phpMyAdmin. This conversion allowed me to transform the data into SQL format, enabling the execution of queries within the Sandbox environment. This process ensured the dataset was ready for further analysis and allowed for seamless integration with the project's analytical workflows.

Removing duplicates from the dataset

Removing missing values from the dataset

Part 2: Data Analysis

In the data analysis phase of the project, I focused on generating and executing SQL queries on the cleaned dataset. The objective was to address a series of 15 business problems specifically related to the HR dataset. These examples demonstrate my ability to leverage SQL queries to address complex business problems to derive insights and contribute to informed decision-making processes.

1. List the top 3 departments with the highest average training scores.

SELECT department, ROUND(AVG(avg_training_score), 2) AS average_training_score

FROM employee

GROUP BY department

ORDER by average_training_score desc

LIMIT 3;

Output

Total Records Fetched: 3

Headers:
 department, average_training_score

    

Values:

Analytics, 84.59

R&D, 84.42

Technology, 79.81

2. Find the percentage of female employees who have won awards, per department. Also show the number of female employees who won awards and total female employees.

SELECT department, 

       ROUND(COUNT(CASE WHEN gender = 'f' and awards_won = 1 then employee_id end)/

       COUNT(CASE WHEN gender = 'f' then employee_id end) * 100, 2) AS female_award_winners_percentage,

       COUNT(CASE WHEN gender = 'f' and awards_won = 1 then employee_id end) AS females_won_award,

       COUNT(CASE WHEN gender = 'f' then employee_id end) AS total_females

FROM employee

GROUP BY department;

Output

Total Records Fetched: 9

Headers: 
department, female_award_winners_percentage, females_won_award, total_females

    

Values:

HR, 1.31, 4, 306

Sales & Marketing, 2.14, 19, 889

Procurement, 3.35, 31, 926

Finance, 2.22, 4, 180

Analytics, 2.29, 3, 131

Technology, 2.20, 17, 772

Operations, 3.08, 42, 1365

Legal, 4.00, 1, 25

R&D, 0.00, 0, 16

3. List the top 5 recruitment channels with the highest average length of service for employees who have met more than 80% of their KPIs, have a previous_year_rating of 5, and an age between 25 and 45 years, grouped by department and recruitment channel.

SELECT department, recruitment_channel, ROUND(AVG(length_of_service), 2) 

       AS average_service_length

FROM employee

WHERE KPIs_met_more_than_80 > 0 AND previous_year_rating = 5 AND age BETWEEN 25 AND 45

GROUP BY department, recruitment_channel

ORDER BY average_service_length DESC

LIMIT 5;

Output

Total Records Fetched: 5

Headers:
 department, recruitment_channel, average_service_length

    

Values:

Operations, referred, 6.20

Operations, other, 6.08

Operations, sourcing, 5.93

Sales & Marketing, other, 5.77

Procurement, sourcing, 5.65

Certificate

Like this project

Posted May 12, 2023

Executed SQL queries on HR dataset, solving 15 business problems, to derive valuable insights about employee performance.

Likes

Views

247

Employee Performance Analytics Using Python & MySQL

Project Overview

Process of Live Project Execution

Certificate

Join 50k+ companies and 1M+ independents