Drashti Patel
I had the valuable opportunity to participate in a live project with HiCounselor, where I collaborated with a team to complete various project modules. This project aimed to leverage a real-world dataset of employee records, allowing me to apply data analytics techniques to solve business problems and uncover valuable insights. The primary objective of this project was to evaluate the provided dataset thoroughly, extract meaningful information, and address specific business challenges. Throughout the project, I utilized structured query language (SQL) to analyze the real-world database, ensuring I could retrieve the most useful insights from the dataset. To enhance the project's performance and efficiency, I employed Python for data preprocessing tasks. This involved cleaning, transforming, and optimizing the dataset to improve the quality and reliability of our analysis. This project provided a comprehensive opportunity for SQL analysis, Python data preprocessing, and extracting insights from real-world databases. It equipped me with practical skills, a strong foundation in data analytics, and a deep understanding of its value for organizations. It showcases my proficiency in delivering actionable insights and making informed decisions through data-driven approaches.
Part 1: Data PreProcessing
The first part of the project involved data processing. This involved several steps to ensure data quality and compatibility. Firstly, I imported the dataset's CSV file into a Jupyter Notebook using Python. Then, I utilized Python to remove duplicate rows and eliminate any entries with blank or missing values.
After cleaning the dataset, I proceeded to import the CSV file into phpMyAdmin. This conversion allowed me to transform the data into SQL format, enabling the execution of queries within the Sandbox environment. This process ensured the dataset was ready for further analysis and allowed for seamless integration with the project's analytical workflows.
Part 2: Data Analysis
In the data analysis phase of the project, I focused on generating and executing SQL queries on the cleaned dataset. The objective was to address a series of 15 business problems specifically related to the HR dataset. These examples demonstrate my ability to leverage SQL queries to address complex business problems to derive insights and contribute to informed decision-making processes.
1. List the top 3 departments with the highest average training scores.
SELECT department, ROUND(AVG(avg_training_score), 2) AS average_training_score
FROM employee
GROUP BY department
ORDER by average_training_score desc
LIMIT 3;
Output
Total Records Fetched: 3
Headers:
department, average_training_score
Values:
Analytics, 84.59
R&D, 84.42
Technology, 79.81
2. Find the percentage of female employees who have won awards, per department. Also show the number of female employees who won awards and total female employees.
SELECT department,
ROUND(COUNT(CASE WHEN gender = 'f' and awards_won = 1 then employee_id end)/
COUNT(CASE WHEN gender = 'f' then employee_id end) * 100, 2) AS female_award_winners_percentage,
COUNT(CASE WHEN gender = 'f' and awards_won = 1 then employee_id end) AS females_won_award,
COUNT(CASE WHEN gender = 'f' then employee_id end) AS total_females
FROM employee
GROUP BY department;
Output
Total Records Fetched: 9
Headers:
department, female_award_winners_percentage, females_won_award, total_females
Values:
HR, 1.31, 4, 306
Sales & Marketing, 2.14, 19, 889
Procurement, 3.35, 31, 926
Finance, 2.22, 4, 180
Analytics, 2.29, 3, 131
Technology, 2.20, 17, 772
Operations, 3.08, 42, 1365
Legal, 4.00, 1, 25
R&D, 0.00, 0, 16
3. List the top 5 recruitment channels with the highest average length of service for employees who have met more than 80% of their KPIs, have a previous_year_rating of 5, and an age between 25 and 45 years, grouped by department and recruitment channel.
SELECT department, recruitment_channel, ROUND(AVG(length_of_service), 2)
AS average_service_length
FROM employee
WHERE KPIs_met_more_than_80 > 0 AND previous_year_rating = 5 AND age BETWEEN 25 AND 45
GROUP BY department, recruitment_channel
ORDER BY average_service_length DESC
LIMIT 5;
Output
Total Records Fetched: 5
Headers:
department, recruitment_channel, average_service_length
Values:
Operations, referred, 6.20
Operations, other, 6.08
Operations, sourcing, 5.93
Sales & Marketing, other, 5.77
Procurement, sourcing, 5.65