Implementation of ML Algorithm for C&R Business using SAS

Angelin

Data Modelling Analyst
Business Analyst
Data Analyst
Python
SAP
Sass

This project aims to optimize transaction processes and increase operational efficiency at CRANK & ROLL by comparing six machine learning algorithms (Bayesian Network, Gradient Boosting, Random Forest, Decision Tree, Support Vector Machine (SVM), and Neural Network) on the task of classifying payment methods, in particular credit card payments, using SAS software. The models were evaluated using accuracy, precision, recall, F1 score, the confusion matrix, ROC-AUC, lift charts, and cumulative gain. The analysis found Gradient Boosting to be the best model, with the lowest misclassification rate. Implementing the Gradient Boosting model in the CRANK & ROLL transaction system optimizes transaction processes by predicting customer payment methods more accurately, which allows transactions to be processed more efficiently. A better understanding of customer payment preferences also enables CRANK & ROLL to optimize inventory management and marketing strategies and supports more informed, strategic business decisions. This study is expected to provide deeper insight into customer payment preferences and into the behavior of machine learning algorithms, particularly Gradient Boosting, in the context of a motorcycle parts retail business.



The CRANK & ROLL data first undergoes Exploratory Data Analysis (EDA) to understand its structure, patterns, and characteristics before further modeling. EDA involves examining data distribution, identifying outliers and missing values, finding duplicated features, and analyzing variable correlations. The data includes variables like product type, warehouse location, quantity sold, unit price, total sales, and payment method. Basic statistical summaries and visualizations (box plots, histograms, line charts, and pie charts) are used to understand distributions and value ranges. Correlation analysis helps identify factors influencing payment methods. After EDA, the cleaned data is analyzed using the DCOVA & I framework: Define, Collect, Organize, Visualize, Analyze, and Insights.
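The summary statistics, outlier check, and correlation analysis described above can be sketched in a few lines. The project itself was carried out in SAS; the Python below is only an illustrative sketch using the standard library, and the function names `summarize` and `pearson`, the 1.5 × IQR outlier rule, and the sample values are assumptions, not taken from the original analysis.

```python
import math
import statistics


def summarize(values):
    """Basic summary statistics plus outliers flagged by the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": median,
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical "Quantity" values; the last one is an obvious outlier.
quantity = [1, 2, 3, 4, 5, 6, 7, 8, 100]
summary = summarize(quantity)
print(summary["median"], summary["outliers"])
```

In practice the same checks would be run on each numeric column (Quantity, Unit price, Total), and the correlation on pairs such as Quantity and Total.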



Define

The first step is to define the business problem and set clear objectives. The focus is on understanding CRANK & ROLL's challenges in managing various payment methods, especially credit card payments. The goal is to identify payment patterns to optimize transaction processes and enhance operational efficiency. The study aims to compare the performance of different machine learning algorithms in classifying credit card payment methods based on sales data features. Once the problem and objectives are clear, the next step is data collection.

Collect

In the collect phase, data is sourced from Kaggle, a reputable platform providing high-quality datasets. The data used is “sales_data” from motorbike parts sales, which includes eight features:

Date (dateTime), Warehouse (Char), Client type (Char), Product line (Char), Quantity (Numeric), Unit price (Numeric), Total (Numeric), Payment (Char)

Organize

In this phase, the collected data is cleaned and organized. A unique identifier, “Unique_ID”, is added to each entry to ensure uniqueness. Duplicates are removed to avoid analysis inaccuracies, and missing values are filtered out to ensure data quality.
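A minimal sketch of this cleaning step, written in Python rather than the SAS used in the project, is shown here. The function name `organize` and the sample rows are hypothetical. The sketch also assumes the identifier is assigned after cleaning, so that exact duplicates can be detected before they receive different IDs; the original workflow may order the steps differently.

```python
def organize(rows):
    """Drop rows with missing values and exact duplicates, then add a Unique_ID."""
    seen = set()
    cleaned = []
    for row in rows:
        # Treat None and empty strings as missing values.
        if any(value is None or value == "" for value in row.values()):
            continue
        # Rows are duplicates when every field matches.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(dict(row))
    # Number the surviving rows starting from 1.
    for i, row in enumerate(cleaned, start=1):
        row["Unique_ID"] = i
    return cleaned


# Hypothetical rows using a subset of the dataset's column names.
rows = [
    {"Warehouse": "North", "Quantity": 3, "Payment": "Credit card"},
    {"Warehouse": "North", "Quantity": 3, "Payment": "Credit card"},  # duplicate
    {"Warehouse": "West", "Quantity": None, "Payment": "Cash"},       # missing value
    {"Warehouse": "Central", "Quantity": 8, "Payment": "Transfer"},
]
for row in organize(rows):
    print(row)
```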





Visualize

The organized data is visualized using tables, bar charts, pie charts, and scatter plots. These visualizations make it easier to identify patterns, trends, and relationships within the data, aiding in problem analysis.







Analyze

During the analyze phase, six machine learning algorithms (Random Forest, SVM, Bayesian Network, Decision Tree, Gradient Boosting, and Neural Network) are used to create and compare models. The goal is to determine the most effective algorithm for classifying credit card payment methods based on the sales data features. Models are evaluated on their accuracy, misclassification rate, and other performance metrics to identify the best-performing model.








Insights

The final phase involves interpreting the analysis results to provide insights into CRANK & ROLL's customer payment preferences, particularly credit card usage. This includes identifying patterns in credit card use across product categories and warehouse locations, and factors influencing payment choices. The insights help pinpoint customer segments more likely to use credit cards, achieving the research objectives.



The comparison chart displays the performance of the six classification models in predicting whether a payment was made by credit card: Bayesian Network, Gradient Boosting, Random Forest, Decision Tree, Support Vector Machine (SVM), and Neural Network. Each model is evaluated on its misclassification rate and its confusion matrix. The misclassification rate is approximately 0.116 across the models, indicating good performance with little variation in error between them.
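For reference, the misclassification rate, precision, recall, and F1 score all follow directly from the four cells of a binary confusion matrix. The Python sketch below shows the arithmetic; the function name `classification_metrics` and the counts are hypothetical and are not the study's results, which were produced in SAS.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion-matrix counts.

    tp/fn: credit card payments predicted correctly / incorrectly.
    tn/fp: non-credit-card payments predicted correctly / incorrectly.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "misclassification_rate": 1 - accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the calculation.
metrics = classification_metrics(tp=80, fp=10, fn=20, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```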

Gradient Boosting is selected as the best model, with the highest cumulative lift (1.5175) and the highest F1 score (0.925, matched only by Decision Tree). Its confusion matrix shows 657 out of 775 credit card payments and 231 out of 341 non-credit card payments correctly predicted. With a C-Statistic of 0.961, Gradient Boosting demonstrates excellent discriminative ability, much better than the Bayesian Network's C-Statistic of 0.834.
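The two ranking metrics quoted here can be illustrated with a short Python sketch. The C-Statistic is the probability that a randomly chosen credit card payment receives a higher predicted score than a randomly chosen non-credit-card payment (equivalent to ROC-AUC), and cumulative lift compares the positive rate among the top-scored records with the overall positive rate. The function names, the sample labels and scores, and the 20% depth are assumptions for illustration; the depth at which the reported lift values were measured is not stated in the source.

```python
def c_statistic(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def cumulative_lift(labels, scores, fraction):
    """Positive rate in the top `fraction` of scores divided by the overall rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical labels (1 = credit card) and predicted probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
print(c_statistic(labels, scores))
print(cumulative_lift(labels, scores, fraction=0.2))
```

The pairwise loop is quadratic in the number of records, which is fine for a sketch but would normally be replaced by a rank-based calculation on larger data.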

Random Forest, also a strong performer, has a cumulative lift of 1.3861 and an F1 score of 0.919. Its confusion matrix performance is similar to Gradient Boosting but slightly lower in precision. Neural Network and SVM also perform well, with F1 scores of 0.919 and cumulative lifts close to 1.4.

Decision Tree, although having a good F1 score (0.925), falls slightly short of Gradient Boosting in cumulative lift (1.312). Bayesian Network performs the worst overall, with the lowest cumulative lift and F1 score.

In summary, Gradient Boosting is the best model, followed by Random Forest and Neural Network, with Bayesian Network having the lowest performance among the six models.



Conclusions

This project aims to optimize transaction processes and enhance operational efficiency at CRANK & ROLL by comparing the performance of various machine learning algorithms in classifying payment methods, specifically credit cards, and addressing the challenges of understanding and predicting customer payment methods. Based on the analysis and comparison of six classification models (Bayesian Network, Gradient Boosting, Random Forest, Decision Tree, Support Vector Machine (SVM), and Neural Network), it was found that Gradient Boosting is the best model with the lowest misclassification rate of 0.1070.

The implementation of the Gradient Boosting model in CRANK & ROLL's transaction system optimizes transaction processes with more accurate predictions of customer payment methods, leading to more efficient and effective transactions. It also enhances operational efficiency by providing a better understanding of customer payment preferences, allowing CRANK & ROLL to optimize inventory management and marketing strategies. The accurate data analysis from the Gradient Boosting model provides a solid foundation for CRANK & ROLL's business decision-making, making it more precise and strategic.

This study demonstrates that the use of machine learning technology can significantly help understand customer payment patterns and improve overall business performance.
