Hyacinth Ampadu
Lead conversion is a crucial aspect of business, especially for companies that deal in apps and websites. Identifying and predicting which leads are likely to become paying customers is important, since those individuals contribute significantly to the company's profit margin. A lead conversion rate of around 11% is generally considered good, so every company wants to reach or exceed that figure. The best way to do so is to predict which leads are likely to convert, so that the appropriate arrangements can be made to help them complete the conversion process and keep the business profitable.
A lead typically converts through a series of steps: the lead hears about the app, checks it out, reads more about it elsewhere, registers with the app, and finally converts. The lead can drop off at any point in this process, so it's important to predict which leads are likely to convert and help them complete the process successfully. The ability to identify and target leads that are likely to convert can be the difference between a profitable venture and a failed one.
In this project, the use case was a mobile company trying to improve its lead conversion using ML techniques. In this blog, I will discuss my approach to helping this company improve its lead conversion, diving into the technical details of how I built the system and deployed it as a simple Streamlit web app.
Below are the tools and technologies employed in this project:
The first and most important step was to look at the characteristics of the leads that converted.
From the image above, most leads reside in the United States (over 98% of all leads), but fewer than 5% of them converted. The UK, by contrast, accounted for less than 1% of all leads yet had the highest conversion rate, at around 5.27%, which suggests that leads registering in the UK should be targeted more, since they tend to convert the most.
From the image above, Facebook is the source most leads are redirected from, accounting for almost half of all leads, yet only 3.73% of those leads convert. Medium accounts for the highest conversion rate at 6.76%, even though less than 1% of all leads are redirected from there, so leads arriving from Medium should be given the most attention.
From the image above, the phone is the device most leads use to register for the app, with almost 94% of all leads registering with it, but only 4.43% of those leads converted. The desktop has the highest conversion rate, with around 6.69% of desktop sign-ups converting, even though only 5% of all leads signed up using a desktop. This could be because desktop users have a bigger screen to look at the app and see all its features, and navigation is easier: clicking a link on the desktop can open a new tab while keeping the current one open, whereas on the phone the old tab would probably be closed. So the best option here is to target leads using desktops, as they are more likely to convert.
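The kind of breakdown described above, the share of leads in each category versus the conversion rate within that category, can be computed with a simple pandas groupby. This is a minimal sketch on made-up data; the real dataset's column names and values will differ.

```python
import pandas as pd

# Hypothetical sample of lead records; the real data has many more rows and columns.
leads = pd.DataFrame({
    "device": ["phone", "phone", "desktop", "phone", "desktop", "tv"],
    "converted": [0, 1, 1, 0, 1, 0],
})

def conversion_rate_by(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Share of all leads and conversion rate per category, as percentages."""
    grouped = df.groupby(column)["converted"]
    summary = pd.DataFrame({
        "share_of_leads_pct": 100 * grouped.size() / len(df),
        "conversion_rate_pct": 100 * grouped.mean(),
    })
    return summary.sort_values("conversion_rate_pct", ascending=False)

print(conversion_rate_by(leads, "device"))
```

Running the same function with `column="country"` or `column="source"` produces the country and redirect-source breakdowns discussed above.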
In building the models, 5 algorithms were used: Random forest, Logistic regression, Decision trees, Xgboost, and LightGBM. These were chosen for the following reasons:
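The comparison loop can be sketched as below. The scikit-learn models share a common fit/predict interface, and XGBoost's `XGBClassifier` and LightGBM's `LGBMClassifier` plug into the same loop; the synthetic data here merely stands in for the real lead features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the lead-conversion data (real features differ).
X, y = make_classification(n_samples=500, n_features=10,
                           weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# XGBClassifier and LGBMClassifier would be added to this dict in the same way.
models = {
    "random_forest": RandomForestClassifier(random_state=42),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = recall_score(y_test, model.predict(X_test))

print(scores)
```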
These are the final features used to build the predictive model:
Weights & Biases was used to track the experiments. Experiment tracking is essential for keeping a record of every experiment I performed, the parameters and configurations I tried, and how the different models compared. I ran a lot of experiments, so having this experiment tracker was vital in order to keep track of what I did and know what worked and what didn't. This ultimately helps to increase efficiency, productivity, and accuracy when developing machine learning models.
The figure above shows the experiments being tracked in wandb; the best-performing model was the random forest, so that's the model chosen for further development.
The performance of the model was evaluated on five metrics: accuracy (which isn't a good metric for imbalanced datasets), recall, precision, F1 score, and ROC AUC. The model does not do especially well in general, achieving scores of around 60% across these metrics; leads have very individual characteristics, which makes converters hard to predict, but the model makes a reasonable effort. Recall is the most important metric here, as we want to catch all leads that will convert, even if that means a few non-converting leads are predicted as converters, so a few more false positives are acceptable. The ROC curve with the AUC score is shown below.
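The five metrics come straight from scikit-learn. The following sketch computes them on small made-up labels and predicted probabilities, purely to illustrate the calls; the real evaluation uses the held-out test split.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Hypothetical held-out labels and model outputs, for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.2, 0.6, 0.7, 0.4, 0.1, 0.8, 0.3, 0.9, 0.55, 0.25]

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),       # most important: catch all converters
    "precision": precision_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "roc_auc": roc_auc_score(y_true, y_prob),     # uses probabilities, not labels
}
print(metrics)
```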
The main thing here is to minimize false negatives, leads that converted but were predicted not to convert, as more false negatives would mean missing out on key leads that could make the company a profit.
Every lead has different characteristics: where they are based, the device they use, the time they registered, and so on. Still, there is a pattern in which leads convert and which do not, and in making its predictions the model identified some features as more important than others; the feature importances are shown below.
The most important feature in predicting lead conversion is if the lead visited the app twice on the same day, which makes sense as those leads have the highest interest in the app. Generally, the length of time between registering and subscribing is the most important in predicting leads that would convert.
The country where the lead is based does not seem to be a very distinguishing feature, as no country makes the top 15 important features, but it still contributes to predicting lead conversion. The least important feature is whether the lead uses a set-top box to access the app, as it is an outmoded device.
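Since the chosen model is a random forest, the importances above come from its `feature_importances_` attribute. A minimal sketch, again on synthetic stand-in features with hypothetical names:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic features standing in for the real lead attributes.
feature_names = [f"feature_{i}" for i in range(8)]
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

# Impurity-based importances, sorted so the top features can be plotted.
importances = pd.Series(model.feature_importances_,
                        index=feature_names).sort_values(ascending=False)
print(importances.head(15))
```

These impurity-based importances always sum to 1, which is why the chart above can be read as each feature's relative share of the model's predictive power.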
GitHub Actions, my go-to CI/CD tool, was used here to automate building, testing, and release. For this project, a very simple pipeline was built: it installs dependencies, runs flake8 to make sure the code is readable and conforms to standards, and runs the tests to make sure the app is robust and won't break when deployed.
Pytest, my preferred testing tool, was used to build and run the unit tests, to make sure the app behaves the way I expect it to. Very simple tests were written, such as:
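For instance, a minimal test of this kind might look like the following; the file, function, and column names here are hypothetical, not the project's actual ones.

```python
# test_preprocessing.py -- illustrative pytest unit test.
import pandas as pd

def clean_leads(df: pd.DataFrame) -> pd.DataFrame:
    """Example preprocessing step: drop rows with missing device info."""
    return df.dropna(subset=["device"]).reset_index(drop=True)

def test_clean_leads_drops_missing_device():
    raw = pd.DataFrame({"device": ["phone", None, "desktop"],
                        "converted": [1, 0, 1]})
    cleaned = clean_leads(raw)
    # Only the row with the missing device should be removed.
    assert len(cleaned) == 2
    assert cleaned["device"].notna().all()
```

Pytest discovers any `test_*` function in a `test_*.py` file automatically, so the CI step is just `pytest` after the dependencies are installed.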
FastAPI was used to create a robust and scalable API for serving the predictive model, which takes the lead's details and returns a prediction from the trained model, and Streamlit was used as the front end, to display the results in a clean UI that company leaders can use to assess their leads.
The whole app was containerized using Docker, ensuring it runs consistently in any environment regardless of the underlying system. Docker Compose was also used to define the different services the application requires, the FastAPI backend and the Streamlit front end, and run them together in a single environment, as it allows you to easily manage and scale services as needed.
To capture user data for real-time prediction by the ML models, these are some of the techniques used:
This data is obtained from each user as they interact with the app; it goes through a pipeline of cleaning and feature engineering into a usable format and is then passed to the web app (shown in the next section) for real-time prediction, to help understand which users are more likely to convert into customers.
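The feature-engineering step of that pipeline can be sketched as a small function that turns a raw captured event into the model's input features. The field names here are illustrative, not the project's actual schema.

```python
# Sketch of turning a raw captured user event into model-ready features.
def build_features(raw_event: dict) -> dict:
    """Map a raw event dict to the binary features the model expects."""
    return {
        "device_is_desktop": int(raw_event.get("device") == "desktop"),
        "source_is_medium": int(raw_event.get("source") == "medium"),
        "visited_twice_same_day": int(raw_event.get("visits_today", 0) >= 2),
    }

event = {"device": "desktop", "source": "medium", "visits_today": 2}
print(build_features(event))
# -> {'device_is_desktop': 1, 'source_is_medium': 1, 'visited_twice_same_day': 1}
```

In production, a function like this would run on every incoming event before the features are sent to the prediction endpoint.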
Below is a demo of how the app works.
As you can see from the simple demo above, based on the user's activities in the app, the model can predict whether a lead will convert or not, and recommended actions can be taken accordingly.
The web app is only available to the mobile company, not to users. Ideally, the data would be populated automatically as explained in the previous section, but here I made the inputs manual so we can mimic user details being captured, allowing company heads to make informed decisions.
When a lead is predicted to convert, it’s crucial to capitalize on its potential.
On the other hand, when a lead is predicted to not convert, it’s essential to employ effective re-engagement strategies.
Certain user information could be added to make the predictions more robust and to better identify leads likely to convert. These additional features have great potential to increase accuracy and hence convert more leads into paying customers.
Some of these features are:
Predicting which leads are likely to convert is of utmost importance to companies, including this mobile company. Many leads start the process and fall away, so being able to identify promising leads with a high possibility of becoming paying customers is crucial, as incentives can then be offered to help them complete the process. People's minds can change at any time, but with the company's intervention, informed by the knowledge gained from this project about who is likely to convert, those leads can become paying customers who keep the business afloat and make the company profits.
Some key MLOps principles were followed to develop this application, along with analysis to further understand the data and help build a more robust predictive model. This work can be extended into a usable app or integrated into an existing system to derive value, helping reduce losses and increase profit for a company. Engineers can follow some of these principles to begin building robust apps that create value.
Code used in this project can be found here → https://github.com/JoAmps/Lead_conversion_prediction_for_a_mobile_app_company