A step-by-step process for cleaning and preprocessing survey data on salaries by profession and region of employment.
Survey data is often among the most 'unclean' data to work with, because respondents have personal preferences about how they enter their answers.
For example, when entering their age, some respondents might type the raw number, others might add the short form 'yrs', and others might write out 'years'.
Because of this inconsistency during data entry, it is crucial to 'clean' the data, that is, take it through the preprocessing stages, before it can be analysed in any programming language.
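As a rough illustration of the age example above, the sketch below reduces the different spellings to a single integer. The function name `parse_age` and the sample values are hypothetical and are not taken from the project's data; the sketch simply assumes that the first run of digits in the answer is the age.

```python
import re


def parse_age(raw):
    """Return the age as an int, or None when no number can be found.

    Assumption: the first run of digits in the answer is the age.
    """
    if raw is None:
        return None
    match = re.search(r"\d+", str(raw))
    return int(match.group()) if match else None


# Hypothetical answers showing the inconsistent entry styles.
ages = ["25", "25 yrs", "25 years", " 31yrs ", ""]
cleaned = [parse_age(a) for a in ages]
print(cleaned)  # [25, 25, 25, 31, None]
```

Returning None for unparseable answers keeps them visible as missing values, so they can be handled deliberately in a later step rather than silently dropped.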
This project walks through these preprocessing steps one by one to ensure that the data is consistent.
These steps include handling missing values and stripping non-numeric marks from numerical values so that statistical operations can be run on them.
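The two steps named above could look roughly like the following sketch. Everything specific here is an assumption made for illustration: the names `parse_salary`, `fill_missing` and `MISSING_TOKENS`, the sample values, the placeholder strings treated as missing, and the choice to fill gaps with the median (dropping the rows instead is an equally valid option). It also assumes that "," is a thousands separator and "." is the decimal point.

```python
import re
from statistics import median

# Assumed placeholders that respondents may use for "no answer".
MISSING_TOKENS = {"", "n/a", "na", "none", "-"}


def parse_salary(raw):
    """Strip marks such as currency signs and commas; return a float or None."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.lower() in MISSING_TOKENS:
        return None
    # Keep only digits and the decimal point (assumes "," separates thousands).
    digits = re.sub(r"[^0-9.]", "", text)
    try:
        return float(digits)
    except ValueError:
        # Nothing numeric was left, or the number was malformed.
        return None


def fill_missing(values):
    """Replace None entries with the median of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to compute a median from
    fill = median(known)
    return [fill if v is None else v for v in values]


# Hypothetical raw answers.
raw_salaries = ["$50,000", "42000", "N/A", "€38,500.50", ""]
parsed = [parse_salary(s) for s in raw_salaries]
filled = fill_missing(parsed)
print(parsed)  # [50000.0, 42000.0, None, 38500.5, None]
print(filled)  # [50000.0, 42000.0, 42000.0, 38500.5, 42000.0]
```

Once the values are plain floats with no gaps, statistical operations such as a mean or a median per profession or region can be computed directly.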