LINEAR REGRESSION FOR ADMISSION PREDICTION- A COMPETITIVE STRAT…

Imaobong Njokko

Data Visualizer
Business Analyst
Data Analyst
Microsoft Excel

Introduction

Here, we have the fictitious case of Duke University, a school located in the United States of America with 400 applicants for the new graduate school session. Based on different factors, which we will consider the “explanatory” or “independent” variables in this case study, we will determine the likelihood of each applicant getting accepted into the university.
The impact of these variables on our response or dependent variable, which is termed “chance of admission” in the dataset, will be analyzed using a multiple linear regression model in Microsoft Excel, and the results will be interpreted.
The aim of this analysis and study is to come up with an analytical model for the university to streamline the initial screening phase of their admissions process for international students and, in doing so, give them a competitive edge over other universities. By applying this model, the university will pick students with great academic track records who are likely to succeed and produce quality research in the process of their studies. This will improve the school’s retention rate, graduation rate, and reputation, therefore giving them a competitive advantage in attracting future applicants and funding.
The independent variables are: Graduate Record Examination (GRE) Score, Test of English as a Foreign Language (TOEFL) Score, University Rating of the applicant’s undergraduate school, Statement of Purpose (SOP), Letter of Recommendation (LOR), Cumulative Grade Point Average (CGPA), and Research. The SOP and LOR have been reviewed by the university and rated from 1 to 5 based on preset criteria, and Research is coded 1 for applicants who have previously worked on a research paper and 0 for those who have not. By analyzing the applicant dataset using the regression model, we will determine which students can move forward to the next phase of the admissions process.

Case Profile

Duke University has a graduate school that is quite reputable and is rated moderately high on the list of many top schools. However, there is still room for improvement and enhancement in their competitive strategy.
One hallmark of the top schools is the high academic standards they set for applicants. These standards give the schools a head start in selecting the right students and help ensure stronger academic output than other universities. This in turn leads to a better reputation, more research funding, and more donations for facilities, which further widens their competitive edge.
This is the strategy this case study proposes for Duke University, and it will be achieved using a multiple linear regression model. By using the model to screen applicants and estimate their chance of admission based on their prior academic records, the university will start to set itself apart through the caliber of its students and the quality of its academic output. As long as it upholds excellent teaching standards and provides appropriate support, it should see an upward trajectory in its competitive position.

Methodology

The dataset is fictitious and was obtained from Kaggle. It contains 400 rows and 9 columns, which are as follows:
Serial Number (For the student’s application)
GRE (Graduate Record Examination) Scores
TOEFL (Test of English as a Foreign Language) Scores
University Rating (Of the student’s undergraduate school)
Statement of Purpose (SOP) Strength
Letter of Recommendation (LOR) Strength
Undergraduate Cumulative GPA (CGPA)
Research Experience
Chance of Admission (0 to 1)
A multiple linear regression model is applied here because it not only shows whether each independent variable affects the dependent variable, but also estimates the degree to which each one does so. It also allows us to predict the outcome variable, giving the school a chance to streamline its application screening process and attain its competitive goals.
Multiple linear regression is a statistical technique employed to determine or estimate the relationship between a single quantitative response or dependent variable and multiple explanatory or independent variables.
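To make the definition above concrete, here is a minimal sketch of how a multiple linear regression can be fitted by ordinary least squares. It is not the Excel Analysis ToolPak procedure used in this study; the function name `fit_ols` and the tiny toy dataset are assumptions made purely for illustration, and the toy values are not taken from the admissions data.

```python
def fit_ols(rows, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    rows: list of predictor lists (one list per observation).
    y:    list of responses.
    Solves the normal equations (X^T X) b = X^T y by Gauss-Jordan elimination
    and returns [b0, b1, ..., bk].
    """
    # Prepend a column of ones so the first coefficient is the intercept.
    X = [[1.0] + [float(v) for v in r] for r in rows]
    p = len(X[0])

    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(xr[i] * xr[j] for xr in X) for j in range(p)]
        + [sum(xr[i] * yi for xr, yi in zip(X, y))]
        for i in range(p)
    ]

    for col in range(p):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]

        # Scale the pivot row, then eliminate this column from every other row.
        pv = A[col][col]
        A[col] = [v / pv for v in A[col]]
        for r in range(p):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]

    return [A[i][p] for i in range(p)]


# Toy data generated from y = 1 + 2*x1 + 3*x2 (hypothetical, not admissions data).
toy_rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
toy_y = [1, 3, 4, 6, 8]
coefficients = fit_ols(toy_rows, toy_y)
print([round(b, 6) for b in coefficients])
```

Because the toy responses follow the line exactly, the fitted intercept and slopes should recover 1, 2, and 3 up to rounding. Real data would include noise, and a statistics package would also report standard errors and p-values, which this sketch omits.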
For this analysis, we began by creating a correlation matrix, which showed that the relationships between the explanatory variables and the response variable ranged from moderately strong to very strong. On this basis, we determined that it was appropriate to proceed with the regression analysis using the independent variables in the dataset. The regression was run using the Analysis ToolPak add-in in Microsoft Excel, with the confidence level set to 99% and the residual output enabled.
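For readers who want to reproduce the correlation step outside Excel, the following is a rough sketch of a Pearson correlation matrix in plain Python. The helper names `pearson` and `correlation_matrix` and the four-row toy columns are assumptions for illustration only; they are not the 400-row dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy columns, chosen only to show the mechanics.
toy_columns = {
    "gre": [300, 310, 320, 330],
    "research": [0, 1, 0, 1],
    "chance": [0.5, 0.6, 0.7, 0.8],
}
matrix = correlation_matrix(toy_columns)
for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

Each variable correlates perfectly with itself, so the diagonal is 1, and the matrix is symmetric, which is why reading a single row or column is enough to compare every predictor against the chance of admission.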
A residual plot was created using the residual table generated, and scatterplots were also created to show the relationship between the statistically significant independent variables and the dependent variable. Then all the visualizations were compiled into one dashboard to give the university a better overview of the analysis.
Lastly, a column was created to predict the chance of admission based on the model, using the following formula: Y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + β5x5 + β6x6 + β7x7, where β0 is the intercept and each remaining term pairs one of the seven explanatory variables with its coefficient. Any new row added will automatically have a prediction generated once all the explanatory variables are filled in.
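The prediction column described above amounts to an intercept plus a sum of coefficient-times-value products. A minimal sketch of that calculation follows; the function name `predict_chance` is an assumption, and the numbers passed to it are placeholders chosen for easy checking, not the coefficients produced by the Excel model.

```python
def predict_chance(intercept, coefficients, applicant):
    """Return intercept + sum(coefficient * value) for one applicant row.

    coefficients and applicant must list the explanatory variables
    in the same order.
    """
    if len(coefficients) != len(applicant):
        raise ValueError("need exactly one value per coefficient")
    return intercept + sum(b * x for b, x in zip(coefficients, applicant))


# Placeholder numbers for illustration; NOT the fitted Excel coefficients.
example = predict_chance(1.0, [2.0, 3.0], [1.0, 1.0])
print(example)  # 6.0
```

In practice, the intercept and coefficients would be copied from the regression output, and the applicant list would hold that applicant's values for each explanatory variable in matching order.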

Result and Discussion

Correlation analysis

This correlation matrix shows the correlation between the different variables. We will focus on the correlation between the Chance of Admission and the independent variables, as seen in the first column. The values range from 0.55 to 1, with higher values indicating a stronger correlation. The variables with the highest correlation are CGPA, GRE Score, and TOEFL Score, while Research has the lowest correlation.

Regression analysis

The regression model has a multiple R value of 0.896, indicating a strong positive correlation between the observed values of the dependent variable and the values predicted by the model. The R-squared shows that the model explains 80.3% of the variability in the dependent variable. The ANOVA table shows that the regression is statistically significant overall, with a low p-value for the F-test, meaning that the independent variables jointly explain a meaningful share of the variation. The model has a standard error of 0.063, which is small relative to the 0-to-1 scale of the dependent variable, and there are 400 observations in the dataset.
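The link between the two fit statistics can be checked with simple arithmetic: squaring the multiple R of 0.896 gives the reported R-squared of about 0.803. The sketch below shows that check, along with a generic R-squared calculation from observed and predicted values. The function name `r_squared` is an assumption for illustration.

```python
def r_squared(observed, predicted):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1 - ss_res / ss_tot


# Multiple R as reported in the regression output.
multiple_r = 0.896
print(round(multiple_r ** 2, 3))  # 0.803
```

A model that predicts every observation exactly would score 1, while a model that always predicts the mean of the observed values would score 0.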
The coefficients represent the strength and direction of the relationship between each independent variable and the dependent variable. A one-unit increase in GRE Score is associated with a 0.0017 unit increase in Chance of Admission, holding the other variables constant, and the other coefficients can be interpreted in the same manner. The exception is the intercept, which is the value of the dependent variable when all the independent variables are 0. This interpretation has no managerial meaning in our analysis.
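As a worked example of this interpretation, the GRE coefficient of 0.0017 implies that a 10-point higher GRE Score is associated with a chance of admission about 0.017 higher, assuming the other variables stay fixed. The helper name `expected_change` and the 10-point difference are assumptions chosen for illustration.

```python
def expected_change(coefficient, delta):
    """Change in the predicted response when one predictor moves by delta
    and every other predictor is held constant."""
    return coefficient * delta


gre_coefficient = 0.0017  # reported in the regression output
change = expected_change(gre_coefficient, 10)
print(round(change, 4))  # 0.017
```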
The p-values indicate the statistical significance of each independent variable as a predictor of the response variable. A p-value below 0.05 indicates that the variable is a statistically significant predictor of the dependent variable at the 5% level. In this analysis, GRE Score, TOEFL Score, LOR, CGPA, and Research are statistically significant predictors of Chance of Admission, while University Rating and SOP are not.

Residual plot

In the residual plot, we see that the residuals are evenly and randomly distributed around the horizontal line at zero. No pattern is visible, which suggests that the regression model is a good fit for the data.
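A residual is simply the observed value minus the value predicted by the model. The sketch below shows how the residual table behind such a plot could be computed; the function names and the three toy observations are assumptions for illustration and are not taken from the dataset.

```python
def residuals(observed, predicted):
    """Residual for each row: observed value minus predicted value."""
    return [y - y_hat for y, y_hat in zip(observed, predicted)]


def mean(values):
    return sum(values) / len(values)


# Hypothetical toy values, not rows from the admissions dataset.
toy_observed = [0.5, 0.75, 1.0]
toy_predicted = [0.25, 1.0, 1.0]
toy_residuals = residuals(toy_observed, toy_predicted)
print(toy_residuals)        # [0.25, -0.25, 0.0]
print(mean(toy_residuals))  # 0.0
```

Plotting these residuals against the predicted values, or against each predictor, and looking for curves or funnel shapes is a visual check for whether the residuals scatter randomly around zero.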
By applying this model, Duke University can improve its ability to predict the chance of admission for each applicant more accurately, which could help reduce the chances of admitting students who have a history of poor academic performance. By admitting students who have a solid academic track record and greater chances of succeeding in their education, the school will improve its retention rates and graduation rates, along with its reputation. All these will add up to give the university a competitive edge when it comes to attracting future applicants and securing funding, ultimately improving the school’s position among the lists of top schools.
Dashboard for variables

Conclusion

The multiple linear regression model applied in this study will provide the university with a more objective and data-driven approach to its admission process than the one it currently uses. This will help the school improve the fairness and transparency of its admission process, which is important in the highly competitive world of academia today, especially for a school situated in a country where thousands of students from across the globe compete for a chance to study. There is also a growing demand for accountability from schools with regard to their admission processes and equal access to educational opportunities, and by employing this regression model, Duke University can provide evidence of a transparent and data-driven applicant selection process.
Additionally, the model can help the university identify the most important variables that influence the chances of admission and use this information to optimize their admission strategy. By focusing on the independent variables that have a stronger relationship with the chance of admission, the school can direct its efforts where they matter most by doing things like providing support or incentives for applicants with high GRE Scores, or Research experience, and so on.
In conclusion, the model will improve the school’s ability to predict the chance of admission for each applicant, which should help reduce the risk of admitting students who are likely to perform poorly or drop out early. By admitting students who have a greater chance of success in their education, the school can improve its retention rates, graduation rates, and reputation. This will give the school a competitive edge in attracting future applicants and funding, and eventually enhance its position on the lists of top schools.