Food Delivery Time Prediction using Machine Learning

Smruti Pote


Apr 5, 2025

🛵 Food Delivery Time Prediction with Machine Learning

Companies like Zomato and Swiggy need to give users an accurate estimate of how long it will take to deliver their food. This helps them:

- Build trust with customers 🧑‍🍳
- Improve user experience 📱
- Optimize delivery operations 🚴

But how do they do it?

They use Machine Learning (ML)! 🧠 ML models are trained to predict delivery time based on past data — like how long it took delivery partners to travel similar distances under similar conditions.

🧪 What This Article Covers

If you’re curious about how this works under the hood, you’re in the right place. This article covers how to build a food delivery time prediction system using:

- historical delivery data (e.g., distance, delivery time, traffic)
- Python and popular libraries such as scikit-learn (including its gradient boosting regressor)
- a real Machine Learning model that predicts the time taken to deliver an order

Along the way, it covers data collection and preprocessing, feature engineering (e.g., extracting time of day, day of week), model training and evaluation, and making predictions on new delivery orders.

So, for this task, we need a dataset containing data about the time taken by delivery partners to deliver food from the restaurant to the delivery location. I found an ideal dataset with all the features for this task. You can download the dataset from here.

Food Delivery Time Prediction using Python

I will start the task of food delivery time prediction by importing the necessary Python libraries and the dataset:

```python
import pandas as pd
import numpy as np
import plotly.express as px

data = pd.read_csv("deliverytime.txt")
print(data.head())
```

Let’s look at the columns and their data types before moving forward:

```python
data.info()
```

Now let’s check whether this dataset contains any null values:
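The code for this check isn’t shown. A minimal way to do it with pandas is `isnull().sum()`, which counts the missing entries in each column. The sketch below wraps that in a hypothetical helper, `missing_value_report`, and runs it on a tiny made-up frame so it is self-contained; on the real dataset you would call it on `data` (or simply run `data.isnull().sum()`), and every count should be zero if there are no missing values.

```python
import numpy as np
import pandas as pd

def missing_value_report(df):
    """Count missing (NaN/None) values per column and in total."""
    counts = df.isnull().sum()
    return counts, int(counts.sum())

# Tiny made-up frame for illustration only; with the real data,
# call missing_value_report(data) on the DataFrame loaded above.
demo = pd.DataFrame({
    "Delivery_person_Age": [25, 31, np.nan],
    "Delivery_person_Ratings": [4.8, 4.5, 4.9],
})
per_column, total = missing_value_report(demo)
print(per_column)
print("Total missing:", total)
```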

There are no missing values in the dataset, so we’re good to go!

Calculating the Distance Between Two Coordinates

The dataset doesn’t directly provide the distance between the restaurant and the delivery address. Instead, it gives us the latitude and longitude for both points. To estimate the distance between them, we can apply the Haversine formula, which calculates the great-circle distance between two points on a sphere based on their geographic coordinates.
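Written out, the Haversine formula takes latitudes φ and longitudes λ in radians, with R the Earth’s radius (the code uses a mean radius of 6371 km). The three lines map directly onto `a`, `c`, and `R * c` in `distcalculate()`:

```latex
\begin{aligned}
a &= \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right) + \cos\varphi_1 \cos\varphi_2 \,\sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) \\
c &= 2\,\operatorname{atan2}\!\left(\sqrt{a},\ \sqrt{1 - a}\right) \\
d &= R\,c
\end{aligned}
```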

Here’s how you can compute the distance between the restaurant and the delivery location using the Haversine formula:

```python
# Set the Earth's radius (in kilometers)
R = 6371

# Convert degrees to radians
def deg_to_rad(degrees):
    return degrees * (np.pi / 180)

# Calculate the distance between two points using the Haversine formula
def distcalculate(lat1, lon1, lat2, lon2):
    d_lat = deg_to_rad(lat2 - lat1)
    d_lon = deg_to_rad(lon2 - lon1)
    a = (np.sin(d_lat / 2) ** 2
         + np.cos(deg_to_rad(lat1)) * np.cos(deg_to_rad(lat2)) * np.sin(d_lon / 2) ** 2)
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return R * c

# Calculate the distance between each pair of points
data['distance'] = np.nan
for i in range(len(data)):
    data.loc[i, 'distance'] = distcalculate(data.loc[i, 'Restaurant_latitude'],
                                            data.loc[i, 'Restaurant_longitude'],
                                            data.loc[i, 'Delivery_location_latitude'],
                                            data.loc[i, 'Delivery_location_longitude'])
```

R = 6371 sets the radius of the Earth in kilometers. It is the constant the Haversine formula uses to turn an angular distance (in radians) into an actual distance (in km). Since NumPy’s trigonometric functions (such as np.sin and np.cos) expect angles in radians, the deg_to_rad() helper converts degrees to radians using: radians = degrees × π / 180.

Inside distcalculate(), d_lat and d_lon are the differences in latitude and longitude, converted to radians. a is the haversine of the central angle between the two points, c is that central angle in radians, and R * c is the distance in km.

The code then initializes a new column called 'distance' with NaN values and loops through each row of the DataFrame. For each row, it:

- retrieves the latitude and longitude of the restaurant and the delivery location,
- calculates the distance using distcalculate(), and
- stores the result in the 'distance' column.
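Because the Haversine calculation uses only NumPy operations, the row-by-row loop isn’t strictly necessary: the same formula can be applied to whole columns at once, which is typically much faster on large DataFrames. The sketch below shows one way to do this. `haversine_km` and `add_distance_column` are hypothetical helper names, `np.radians` stands in for the `deg_to_rad` helper (it performs the same conversion), and the demo coordinates are made up.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371  # same mean Earth radius as R in the loop version

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars, NumPy arrays, or pandas Series."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    d_lat = lat2 - lat1
    d_lon = lon2 - lon1
    a = np.sin(d_lat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(d_lon / 2) ** 2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))
    return EARTH_RADIUS_KM * c

def add_distance_column(df):
    """Return a copy of df with a 'distance' column computed in one vectorized call."""
    out = df.copy()
    out["distance"] = haversine_km(out["Restaurant_latitude"],
                                   out["Restaurant_longitude"],
                                   out["Delivery_location_latitude"],
                                   out["Delivery_location_longitude"])
    return out

# Made-up coordinates for illustration (not from the dataset):
# row 0 is one degree of longitude along the equator.
demo = pd.DataFrame({
    "Restaurant_latitude": [0.0, 12.97],
    "Restaurant_longitude": [0.0, 77.59],
    "Delivery_location_latitude": [0.0, 13.00],
    "Delivery_location_longitude": [1.0, 77.62],
})
print(add_distance_column(demo)["distance"].round(2).tolist())
```

On the real data, `data = add_distance_column(data)` would replace the loop; one degree of longitude at the equator comes out at roughly 111.19 km, which is a quick sanity check on the formula.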

We have now calculated the distance between the restaurant and the delivery location and added it to the dataset as a new feature, distance. You can look at the updated dataset again with print(data.head()).

Data Exploration

Now let’s explore the data to find relationships between the features. I’ll start by looking at the relationship between the distance and time taken to deliver the food:

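```python
# trendline="ols" fits an ordinary least squares line; Plotly needs the statsmodels package for this
figure = px.scatter(data_frame=data,
                    x="distance",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    trendline="ols",
                    title="Relationship Between Distance and Time Taken")
figure.show()
```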

The delivery time looks fairly consistent across distances: most delivery partners appear to complete deliveries in about 25 to 30 minutes, regardless of how far they have to travel.

Next, let’s explore how delivery time varies based on the age of the delivery partner.

```python
figure = px.scatter(data_frame=data,
                    x="Delivery_person_Age",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color="distance",
                    trendline="ols",
                    title="Relationship Between Time Taken and Age")
figure.show()
```

There is a positive linear relationship between the time taken to deliver the food and the age of the delivery partner. This suggests that younger delivery partners tend to take less time to deliver the food than older partners.

Now let’s have a look at the relationship between the time taken to deliver the food and the ratings of the delivery partner:

```python
figure = px.scatter(data_frame=data,
                    x="Delivery_person_Ratings",
                    y="Time_taken(min)",
                    size="Time_taken(min)",
                    color="distance",
                    trendline="ols",
                    title="Relationship Between Time Taken and Ratings")
figure.show()
```

There is an inverse linear relationship between the time taken to deliver the food and the ratings of the delivery partner. This suggests that delivery partners with higher ratings tend to take less time to deliver the food than partners with lower ratings.
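The scatter plots show these trends visually; to put a number on each one, you can compute the Pearson correlation of every feature with the delivery time. The sketch below uses pandas’ `DataFrame.corrwith`; the helper name `correlations_with_target` is hypothetical, and the demo rows are made up only to exercise the function (on the real data you would pass `data`).

```python
import pandas as pd

def correlations_with_target(df, features, target):
    """Pearson correlation of each feature with the target, strongest (by absolute value) first."""
    corr = df[features].corrwith(df[target])
    return corr.reindex(corr.abs().sort_values(ascending=False).index)

# Made-up rows for illustration only, not taken from the dataset
demo = pd.DataFrame({
    "Delivery_person_Age": [22, 25, 30, 35, 38],
    "Delivery_person_Ratings": [4.9, 4.8, 4.6, 4.4, 4.2],
    "distance": [3.0, 8.0, 5.0, 10.0, 4.0],
    "Time_taken(min)": [18, 22, 27, 33, 36],
})
result = correlations_with_target(
    demo,
    ["Delivery_person_Age", "Delivery_person_Ratings", "distance"],
    "Time_taken(min)",
)
print(result)
```

A value near +1 or -1 indicates a strong linear relationship; a value near 0 indicates little linear relationship, although a nonlinear pattern could still exist.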

Now let’s check whether the type of food ordered by the customer and the type of vehicle used by the delivery partner affect the delivery time:

```python
fig = px.box(data,
             x="Type_of_vehicle",
             y="Time_taken(min)",
             color="Type_of_order")
fig.show()
```

There isn’t a significant variation in delivery times based on the type of vehicle used or the kind of food being delivered.
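A box plot is easy to eyeball; a small table of medians per group makes the same comparison explicit. The sketch below uses `groupby(...).agg(["median", "count"])`; the helper `median_time_by_group` is hypothetical, and the demo rows are made up for illustration (on the real data you would pass `data`).

```python
import pandas as pd

def median_time_by_group(df, by, target="Time_taken(min)"):
    """Median delivery time and number of orders for each group defined by `by`."""
    return (df.groupby(by)[target]
              .agg(["median", "count"])
              .reset_index())

# Made-up rows for illustration only
demo = pd.DataFrame({
    "Type_of_vehicle": ["motorcycle", "motorcycle", "scooter", "scooter"],
    "Type_of_order": ["Snack", "Meal", "Snack", "Meal"],
    "Time_taken(min)": [24, 28, 25, 27],
})
print(median_time_by_group(demo, ["Type_of_vehicle", "Type_of_order"]))
print(median_time_by_group(demo, "Type_of_vehicle"))
```

The `count` column is worth checking alongside the median: a group with very few orders can show a misleadingly high or low median.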

Based on our analysis, the key factors that have the most impact on food delivery time are:

- the age of the delivery partner
- the ratings of the delivery partner
- the distance between the restaurant and the delivery location

In the following section, we’ll walk through the process of training a Machine Learning model to predict food delivery times.

Food Delivery Time Prediction Model

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Input features (x) and target (y); the DataFrame loaded earlier is named `data`
x = data[["Delivery_person_Age",
          "Delivery_person_Ratings",
          "distance"]].to_numpy()
y = data["Time_taken(min)"].to_numpy()  # 1-D target, so no .ravel() is needed

# 80/20 train-test split with a fixed seed for reproducibility
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.20,
                                                random_state=42)

# Linear Regression
lr_model = LinearRegression()
lr_model.fit(xtrain, ytrain)
lr_preds = lr_model.predict(xtest)
print("Linear Regression:")
print("MAE:", mean_absolute_error(ytest, lr_preds))
print("MSE:", mean_squared_error(ytest, lr_preds))
print("RMSE:", np.sqrt(mean_squared_error(ytest, lr_preds)))
print("R² Score:", r2_score(ytest, lr_preds))

# Random Forest Regressor
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(xtrain, ytrain)
rf_preds = rf_model.predict(xtest)
print("Random Forest Regressor:")
print("MAE:", mean_absolute_error(ytest, rf_preds))
print("MSE:", mean_squared_error(ytest, rf_preds))
print("RMSE:", np.sqrt(mean_squared_error(ytest, rf_preds)))
print("R² Score:", r2_score(ytest, rf_preds))

# Gradient Boosting Regressor
gbr_model = GradientBoostingRegressor()
gbr_model.fit(xtrain, ytrain)
gbr_preds = gbr_model.predict(xtest)
print("Gradient Boosting:")
print("MAE:", mean_absolute_error(ytest, gbr_preds))
print("RMSE:", np.sqrt(mean_squared_error(ytest, gbr_preds)))
print("R² Score:", r2_score(ytest, gbr_preds))

# Summary table comparing all three trained models
results = {"Model": [], "MAE": [], "RMSE": [], "R² Score": []}
model_names = ["Linear Regression", "Random Forest", "Gradient Boost"]
trained_models = [lr_model, rf_model, gbr_model]

for name, model in zip(model_names, trained_models):
    preds = model.predict(xtest)
    results["Model"].append(name)
    results["MAE"].append(round(mean_absolute_error(ytest, preds), 2))
    results["RMSE"].append(round(np.sqrt(mean_squared_error(ytest, preds)), 2))
    results["R² Score"].append(round(r2_score(ytest, preds), 2))

summary_df = pd.DataFrame(results)
print(summary_df)
```

This code demonstrates a complete machine learning workflow for predicting food delivery time using three different regression models: Linear Regression, Random Forest Regressor, and Gradient Boosting Regressor.

It starts by selecting the relevant features (the delivery partner’s age, their rating, and the distance between the restaurant and the delivery location) as the input variables (x), and the delivery time in minutes as the target variable (y). The dataset is then split into training and testing sets using an 80/20 ratio. Each model is trained on the training data and used to make predictions on the test set.

The models are evaluated with Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the R² score. MAE and RMSE measure the typical prediction error in minutes (RMSE penalizes large errors more heavily), while R² measures how much of the variance in delivery time the model explains (1.0 is a perfect fit). Finally, the results from all three models are summarized in a DataFrame, making it easy to see which model performs best for predicting food delivery times.
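The overview mentions making predictions on new delivery orders, but the workflow above stops at evaluation. One way to score a single new order is sketched below, under a few assumptions: `predict_delivery_time` is a hypothetical helper, the feature order must match the training order (age, rating, distance), and the toy training data is made up purely so the block runs on its own. In the article’s workflow you would pass an already-trained model such as `lr_model`, `rf_model`, or `gbr_model` instead.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same feature order as the x array used for training above
FEATURES = ["Delivery_person_Age", "Delivery_person_Ratings", "distance"]

def predict_delivery_time(model, age, rating, distance_km):
    """Predict delivery time in minutes for one new order."""
    new_order = np.array([[age, rating, distance_km]])
    return float(model.predict(new_order)[0])

# Made-up, noise-free toy data so this block is self-contained
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "Delivery_person_Age": rng.integers(20, 40, size=200),
    "Delivery_person_Ratings": rng.uniform(3.5, 5.0, size=200),
    "distance": rng.uniform(1, 20, size=200),
})
toy["Time_taken(min)"] = (10 + 0.5 * toy["Delivery_person_Age"]
                          - 3 * toy["Delivery_person_Ratings"]
                          + 0.8 * toy["distance"])

model = LinearRegression()
model.fit(toy[FEATURES].to_numpy(), toy["Time_taken(min)"].to_numpy())

pred = predict_delivery_time(model, age=28, rating=4.7, distance_km=6.5)
print(round(pred, 1))
```

For a real order, the distance input would come from the same Haversine calculation applied to the restaurant and delivery coordinates.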

Summary

To predict food delivery time accurately in real time, it’s essential to calculate the distance between the location where the food is prepared and the customer’s address. Once this distance is known, the next step is to analyze historical data to identify patterns in how long delivery partners have typically taken to cover similar distances.

This helps build a reliable prediction model. I hope you found this article on predicting food delivery time using Machine Learning in Python insightful and useful.

Check out the code and repository on GitHub: https://github.com/smrutipote/Food-Delivery-Time-using-Machine-Learning

Also check out this project live on Hugging Face: https://huggingface.co/spaces/smrup/Food-Delivery-Time-Prediction-using-Machine-Learning
