SALES PREDICTION USING PYTHON

Ali Hassan

Data Modelling Analyst
Data Analyst
AI Model Developer
Microsoft Excel
Python

Data Analysis and Modeling Documentation

Importing Libraries

In this section, we import the necessary Python libraries for data analysis and modeling, including pandas, numpy, matplotlib.pyplot, seaborn, and specific modules from scikit-learn for machine learning.
import pandas as pd

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

Loading the Dataset

We load the advertising dataset from a CSV file named 'advertising.csv' into a pandas DataFrame named data. The path in the code below points to a local copy of the file; replace it with the location of your own dataset.
# Step 1: Load and Prepare Data

data = pd.read_csv('D:\\Data Analyst\\CodSoft\\Task 3\\advertising.csv')
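Because every later step depends on the 'TV', 'Radio', 'Newspaper', and 'Sales' columns, it can help to confirm they exist right after loading. The following is a minimal sketch, not part of the original project: the helper name load_advertising and the in-memory sample rows are made up for illustration, so the snippet runs without the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for advertising.csv: a few made-up rows that use
# the column names referenced later in this write-up.
csv_text = """TV,Radio,Newspaper,Sales
100.0,20.0,30.0,12.5
200.0,35.0,10.0,18.0
50.0,5.0,40.0,7.2
"""


def load_advertising(source):
    """Read the CSV and confirm the columns the analysis relies on are present."""
    df = pd.read_csv(source)
    required = {"TV", "Radio", "Newspaper", "Sales"}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"Missing expected columns: {sorted(missing)}")
    return df


# pd.read_csv accepts a file path or a file-like object, so the same helper
# works with a real path such as 'advertising.csv'.
sample = load_advertising(io.StringIO(csv_text))
print(sample.shape)
```

Failing early with a clear message is usually easier to debug than a KeyError raised several steps later.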

Data Wrangling and Cleaning

Checking for Missing Values

We start by checking for missing values in the dataset:
We use the isnull().sum() method to calculate the number of missing values for each column.
The results are printed to the console to provide an overview of missing data.
# Step 2: Data Wrangling and Cleaning

# Check for missing values
missing_values = data.isnull().sum()
print("Missing Values:\n", missing_values)
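The check above only reports missing values; it does not say what to do when some are found. One possible approach, shown here as a sketch on made-up rows, is to drop rows with a missing target and fill gaps in the features with the column median. The DataFrame demo and the choice of median imputation are assumptions for illustration, not steps taken in the original project.

```python
import numpy as np
import pandas as pd

# Made-up rows with two gaps, for illustration only.
demo = pd.DataFrame({
    "TV": [100.0, np.nan, 50.0, 150.0],
    "Radio": [20.0, 35.0, 5.0, 10.0],
    "Newspaper": [30.0, 10.0, 40.0, 20.0],
    "Sales": [12.5, 18.0, np.nan, 15.0],
})

# Rows without a target value cannot be used for training, so drop them.
cleaned = demo.dropna(subset=["Sales"])

# Fill remaining gaps in the feature columns with each column's median.
feature_cols = ["TV", "Radio", "Newspaper"]
cleaned = cleaned.fillna(cleaned[feature_cols].median())

# Total number of missing values left after cleaning.
print(cleaned.isnull().sum().sum())
```

Whether dropping or imputing is appropriate depends on how many values are missing and why, so treat this as one option among several.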

Summary Statistics

We compute summary statistics for the dataset:
Summary statistics, including count, mean, standard deviation, minimum, quartiles, and maximum values, are generated using describe().
The summary statistics are printed to the console to provide an overview of the data's central tendencies and variability, followed by the first few rows from head() as a quick sanity check.
# Step 3: Data Analysis and Visualization

# Explore the data
print(data.describe()) # Summary statistics
print(data.head()) # Display the first few rows

Data Visualization

Pairplot Visualization

We create a pairplot to visualize relationships between variables:
The sns.pairplot() function generates scatterplots for the 'TV', 'Radio', and 'Newspaper' variables against the 'Sales' variable.
The plt.suptitle() function is used to set the title for the pairplot.
The pairplot is displayed to visually inspect potential relationships.
# Data Visualization

# Pairplot to visualize relationships between variables
sns.pairplot(data, x_vars=['TV', 'Radio', 'Newspaper'], y_vars='Sales', height=4, aspect=1)
plt.suptitle("Pairplot of Sales vs. Advertising Channels", y=1.02)
plt.show()
Figure: Sales vs. Advertising Channels

Heatmap Visualization

We visualize correlations between variables using a heatmap:
We calculate the correlation matrix using the corr() method.
The sns.heatmap() function is used to generate a heatmap with annotations.
The heatmap is displayed to visualize the strength and direction of correlations between variables.
# Heatmap to visualize correlations

correlation_matrix = data.corr()
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title("Correlation Matrix")
plt.show()
Figure: Correlation Matrix
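A heatmap shows every pairwise correlation at once. When the main question is which channel tracks sales most closely, the 'Sales' column of the correlation matrix can also be read as a ranked list. The sketch below uses a small made-up DataFrame named demo; the numbers are illustrative and do not come from the advertising dataset.

```python
import pandas as pd

# Made-up values for illustration only.
demo = pd.DataFrame({
    "TV": [10, 20, 30, 40, 50],
    "Radio": [5, 3, 8, 6, 9],
    "Newspaper": [9, 1, 7, 3, 5],
    "Sales": [12, 21, 33, 41, 52],
})

# Take the 'Sales' column of the correlation matrix, drop the trivial
# self-correlation, and sort from strongest positive to most negative.
sales_corr = demo.corr()["Sales"].drop("Sales").sort_values(ascending=False)
print(sales_corr)
```

The same line works on the real data by replacing demo with data. Correlation captures only linear association, so a low value does not rule out a non-linear relationship.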

Feature Selection

We select the features (independent variables) and the target variable (dependent variable) for our analysis:
We create a feature matrix X containing the 'TV', 'Radio', and 'Newspaper' columns.
We create a target vector y containing the 'Sales' column.
# Feature Selection

X = data[['TV', 'Radio', 'Newspaper']]
y = data['Sales']

Data Splitting

We split the dataset into training and testing sets for model evaluation:
We use train_test_split() from scikit-learn to split X and y into X_train, X_test, y_train, and y_test.
The test size is set to 20% of the data, and a random seed (random_state) is set for reproducibility.
# Data Splitting

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
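To see what the split produces, the following sketch runs train_test_split on synthetic data and prints the resulting shapes. The 200 random rows and the names X_demo, y_demo, X_tr, and X_te are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 200 random rows with the same column names.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.uniform(0, 100, size=(200, 3)),
    columns=["TV", "Radio", "Newspaper"],
)
y_demo = pd.Series(rng.uniform(0, 25, size=200), name="Sales")

# With test_size=0.2, 20% of the rows go to the test set.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.shape, X_te.shape)

# Repeating the call with the same random_state selects the same rows.
X_tr_again, _, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
print(X_tr.index.equals(X_tr_again.index))
```

Fixing random_state makes results repeatable across runs, which matters when comparing models on the same test set.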

Model Creation and Training

Creating and Training the Model

We create a Linear Regression model and train it on the training data:
We instantiate a Linear Regression model using LinearRegression().
We fit the model to the training data using fit().
# Create and Train the Model

model = LinearRegression()
model.fit(X_train, y_train)
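After fitting, a LinearRegression model exposes its learned weights through the coef_ and intercept_ attributes. The sketch below fits a model on synthetic, noise-free data with known weights so the recovered values can be checked; the data and the names X_demo, y_demo, and demo_model are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic features with the same column names as the advertising data.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(
    rng.uniform(0, 100, size=(100, 3)),
    columns=["TV", "Radio", "Newspaper"],
)

# Noise-free target built from known weights; Newspaper has no effect here.
y_demo = 3.0 + 0.05 * X_demo["TV"] + 0.1 * X_demo["Radio"] + 0.0 * X_demo["Newspaper"]

demo_model = LinearRegression().fit(X_demo, y_demo)

# Pair each coefficient with its feature name for easier reading.
coef_table = pd.Series(demo_model.coef_, index=X_demo.columns)
print(coef_table.round(3))
print(round(demo_model.intercept_, 3))
```

On the real data, running the same two print statements against model gives the estimated change in sales per unit of spend in each channel, holding the other channels fixed.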

Making Predictions

We use the trained model to make predictions on the test data:
We use the predict() method to generate predictions for X_test, resulting in y_pred.
# Make Predictions

y_pred = model.predict(X_test)

Model Evaluation

Evaluating the Model

We evaluate the model's performance:
We calculate the Mean Squared Error (MSE) and R-squared (R2) using functions from scikit-learn.
MSE measures the average squared difference between actual and predicted values.
R2 quantifies the proportion of the variance in the dependent variable explained by the independent variables.
The evaluation metrics are printed to the console.
# Evaluate the Model

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("\nModel Evaluation:")
print("Mean Squared Error:", mse)
print("R-squared:", r2)
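The metrics described above can be reproduced by hand, which makes their definitions concrete. The sketch below uses four made-up actual and predicted values and checks the manual results against scikit-learn. It also shows the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE), which are expressed in the same units as sales; these two are additions for illustration and are not reported in the original evaluation.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up values for illustration only.
y_true = np.array([10.0, 12.0, 15.0, 20.0])
y_hat = np.array([11.0, 11.0, 16.0, 18.0])

errors = y_true - y_hat                      # [-1, 1, -1, 2]

mse_manual = np.mean(errors ** 2)            # (1 + 1 + 1 + 4) / 4 = 1.75
rmse_manual = np.sqrt(mse_manual)            # square root of the MSE
mae_manual = np.mean(np.abs(errors))         # (1 + 1 + 1 + 2) / 4 = 1.25

# R-squared: 1 minus residual sum of squares over total sum of squares.
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print("MSE:", mse_manual, "RMSE:", rmse_manual, "MAE:", mae_manual)
print("R-squared:", r2_manual)

# The manual values should agree with scikit-learn's implementations.
print(np.isclose(mse_manual, mean_squared_error(y_true, y_hat)))
print(np.isclose(mae_manual, mean_absolute_error(y_true, y_hat)))
print(np.isclose(r2_manual, r2_score(y_true, y_hat)))
```

Because MSE squares each error, a few large misses can dominate it; MAE weights all errors equally, so reporting both can give a fuller picture.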

Visualization of Predicted vs. Actual Sales

We create a scatter plot to visualize predicted vs. actual sales from the test data:
We use plt.scatter() to create the scatter plot, where y_test represents actual sales and y_pred represents predicted sales.
Axes labels and a title are added to the plot for clarity.
The plot is displayed to visualize how well the model's predictions align with actual values.
# Visualization of Predicted vs. Actual Sales

plt.scatter(y_test, y_pred)
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.title("Actual Sales vs. Predicted Sales")
plt.show()
Figure: Actual vs. Predicted Sales
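A scatter plot of actual against predicted values is easier to read with a diagonal reference line, since points on that line are perfect predictions. The following sketch is a possible variation on the plot above rather than the original figure: the arrays actual and predicted are made up, and the non-interactive Agg backend is selected only so the snippet runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line to show a window
import matplotlib.pyplot as plt
import numpy as np

# Made-up values for illustration only.
actual = np.array([7.0, 10.5, 12.0, 15.5, 18.0, 22.0])
predicted = np.array([8.0, 10.0, 13.0, 15.0, 17.0, 21.0])

fig, ax = plt.subplots()
ax.scatter(actual, predicted)

# Diagonal y = x spanning the full range of both arrays.
low = min(actual.min(), predicted.min())
high = max(actual.max(), predicted.max())
ax.plot([low, high], [low, high], linestyle="--", color="gray",
        label="Perfect prediction")

ax.set_xlabel("Actual Sales")
ax.set_ylabel("Predicted Sales")
ax.set_title("Actual Sales vs. Predicted Sales")
ax.legend()
```

Points above the line are over-predictions and points below it are under-predictions, so a systematic bias shows up as a cloud shifted to one side.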

Using the Model for Predictions (Example)

We demonstrate how to use the trained model for predictions using a new data point:
We create a new DataFrame new_data containing values for 'TV', 'Radio', and 'Newspaper'.
We use the predict() method to predict sales for this new data point.
The predicted sales value is printed on the console.
# Use the Model for Predictions (Example)

new_data = pd.DataFrame({
'TV': [100],
'Radio': [25],
'Newspaper': [10]
})

predicted_sales = model.predict(new_data)
print("\nPredicted Sales:", predicted_sales[0])
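The same predict() call handles several rows at once, which is convenient for comparing budget scenarios. The sketch below trains a small model on synthetic, noise-free data so the output can be checked; the training rows, the weights, and the names train, demo_model, and scenarios are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["TV", "Radio", "Newspaper"]

# Synthetic training data: Sales is built from known weights with no noise.
train = pd.DataFrame({
    "TV": [10, 50, 100, 150, 200, 250],
    "Radio": [5, 40, 10, 30, 20, 45],
    "Newspaper": [20, 5, 30, 10, 40, 15],
})
train["Sales"] = 2.0 + 0.05 * train["TV"] + 0.1 * train["Radio"]

demo_model = LinearRegression().fit(train[features], train["Sales"])

# Each row is one hypothetical budget scenario.
scenarios = pd.DataFrame({
    "TV": [100, 200, 0],
    "Radio": [25, 25, 50],
    "Newspaper": [10, 0, 0],
})

# Select columns in the training order before predicting.
scenarios["PredictedSales"] = demo_model.predict(scenarios[features])
print(scenarios)
```

Selecting scenarios[features] explicitly keeps the column order identical to the order used during training, which avoids silently mismatched inputs.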