Dhaka Stock Market Data Analysis With Python

MOHAMMAD ULLAH

Data Visualizer

Business Analyst

Data Analyst

Microsoft Excel

MySQL

Python

Introduction

Dataset Overview

Let's proceed to the analysis.

Import the necessary libraries

Loading the dataset with pandas

Check the data types

Calculate basic summary statistics for each column (mean, median, standard deviation, etc.)

Get the top 5 companies with the highest total volume:

Explore the distribution of the 'Close' prices over time

Identify and analyze any outliers (if any) in the dataset of volume column

Create a line chart to visualize the 'Close' prices over time

Calculate and plot the daily percentage change in closing prices

Investigate the presence of any trends or seasonality in the stock prices

Apply moving averages to smooth the time series data in 15/30 day intervals against the original graph

Calculate the average closing price for each stock

Identify the top 5 and bottom 5 stocks based on average closing price.

Calculate and plot the rolling standard deviation of the 'Close' prices

Create a new column for daily price change

Analyze the distribution of daily price changes

Identify days with the largest price increases and decreases

Identify stocks with unusually high trading volume on certain days

Explore the relationship between trading volume and volatility

Correlation matrix between the 'Open' & 'High', 'Low' &'Close' prices

Create heatmaps to visualize the correlations using the seaborn package.

Acknowledgments

Introduction

Welcome to a journey through the Dhaka stock market of 2022!

This blog is your gateway to decoding market trends, deciphering data patterns, and discovering the specifics of Bangladesh's financial landscape. Through Python-powered analysis and visualizations, we'll unveil the underlying narratives behind stock movements. Join me on this data-driven journey as we unravel the Dhaka stock market's story, offering valuable insights for traders, investors, and enthusiasts seeking a deeper understanding of market dynamics.

Dataset Overview

Within this analysis, we harnessed a CSV file containing a substantial dataset for exploration. The dataset comprises 49,159 rows and 7 columns, encapsulating a wealth of information related to the Dhaka stock market. It spans the time period from January 2022 to June 2022, encompassing a diverse array of 412 companies. Key features in this dataset include columns such as Date, Name, Open, High, Low, Close, Volume. This dataset offers a comprehensive perspective on the dynamics of the Dhaka stock market, allowing for in-depth analysis and insights into this sector.

Let's proceed to the analysis.

Import the necessary libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

These libraries are essential components for our analysis.

NumPy: Utilized for numerical operations and array manipulations.

Pandas: Essential for data manipulation and analysis, offering powerful data structures and tools.

Matplotlib.pyplot: Enables the creation of visualizations, including plots, charts, and graphs.

Seaborn: A robust library for statistical data visualization, enhancing the aesthetics and overall appeal of visual representations.

Loading the dataset with pandas

In this section, the stock market data is imported using the read_csv() function from the pandas library within a .py file in VS Code. Following this, the head() function is employed to showcase the first 5 rows of the dataset, providing an initial overview of the structure and contents of the Dhaka stock market data.

Output:

df = pd.read_csv('Stock_Market_Data.csv')

print(df.head())

Check the data types

Verify the data types of all columns using the following command. Ensuring correct data formatting is crucial before analysis. Analyzing data in the wrong format can lead to inaccurate insights or results.

print(df.dtypes)

Output:

We notice an issue with the data type of the Date column—it appears as an object type when it should be in a DateTime format. Therefore, we need to convert the object type to datetime. Let’s proceed with that conversion.

# Convert 'Date' column to datetime format

df['Date'] = pd.to_datetime(df['Date'],dayfirst=True)

Utilizing pd.to_datetime(), we perform the conversion. Additionally, we set dayfirst=True since the date format is %d%m%y. Let's recheck the data types to confirm the successful conversion

print(df.dtypes)

Output:

Calculate basic summary statistics for each column (mean, median, standard deviation, etc.)

The code generates essential statistical measures for the df DataFrame using the describe() function. This function delivers crucial metrics including count, mean, standard deviation, as well as minimum and maximum values for each numerical column in the dataset. These statistics offer a comprehensive overview of the data's central tendencies and variability.

print(df.describe())

Output:

Get the top 5 companies with the highest total volume:

Handpicking key companies from the dataset allows for a deeper analysis of their market impact. The code snippet identifies the top 5 companies by their total trading volume.

# Calculate total volume for each company

volume_per_company = df.groupby('Name')['Volume'].sum()



# Get the top 5 companies with the highest total volume

top_5_companies = volume_per_company.nlargest(5).index

print(top_5_companies.to_list())

Output:

Explore the distribution of the 'Close' prices over time

Analyzing the 'Close' price distribution over time for the top 5 companies, this code generates histograms for each company. The visualizations provide insights into the variations in closing prices, facilitating a comparative assessment of their stock performance.

for name in top_5_companies:

    company_data = df[df['Name'] == name]

    plt.figure(figsize=(15, 5))

    sns.histplot(data=company_data, x="Close", bins=30, label=name)

    plt.xlabel("Closing Price Distribution")

    plt.ylabel("Frequency")

    plt.title("Distribution of Close Prices Over Time of {}".format(name))

    plt.legend()

    plt.xticks(rotation=45)

    plt.show()

(2 selected plots from the 5 output plots are provided for reference.)

Identify and analyze any outliers (if any) in the dataset of `volume` column

Outliers beyond the maximum quartile in the 'Volume' boxplots indicate:

Unprecedented Extremes: Representing values extraordinarily higher than the upper quartile (Q3) and even beyond the maximum observed values.

Exceptional Market Activity: Signifying unparalleled and exceptionally rare trading volumes for these companies, often unique and unmatched in the dataset.

Possible Data Anomalies: Needing scrutiny for accuracy, as these extreme values might sometimes arise from errors or anomalies in the data.

Unmatched Market Events: Reflecting extraordinary market occurrences or exceptional trading interests that significantly deviate from typical market behavior.

Potential Market Impact: While confirming the accuracy, highlighting moments of extreme market activity that could have impacted short-term price movements or market perceptions.

(3 selected plots from the 5 output plots are provided for reference.)

Create a line chart to visualize the 'Close' prices over time

Generate line charts depicting the 'Close' prices over time for each selected company, unraveling the historical trends and patterns in their stock performance.

# Loop through the top 5 companies based on volume

for name in top_5_companies:

    # Filter data for each specific company

    company_data = df[df['Name'] == name]



    # Create a separate line chart for each company's 'Close' prices over time

    plt.figure(figsize=(10, 4))

    plt.plot(company_data['Date'], company_data['Close'])



    # Set labels and title for the plot

    plt.xlabel('Date')

    plt.ylabel('Closing Price')

    plt.title(f'Close Prices Over Time for {name}')

    plt.xticks(rotation=45)



    # Show the plot for each company

    plt.show()

IFIC and UNIONBANK: These companies experienced a rapid decline in their 'Close' prices over the observed period, indicating a consistent downward trend in their stock prices.

FUWANGFOOD: In contrast, FUWANGFOOD exhibited an increasing trend in its 'Close' prices from January 2022 to July 2022, suggesting a rise in stock prices over this period.

These observations can provide insights into the performance of these companies within the given timeframe. It's important to analyze the reasons behind these trends—whether they are influenced by company-specific factors (like earnings reports, market news, or company performance) or broader market conditions—to better understand the drivers behind the stock price movements.

Calculate and plot the daily percentage change in closing prices

Here's a more concise breakdown of insights from daily percentage change in closing prices:

Volatility Identification: Higher fluctuations suggest more volatile stock behavior, potentially indicating greater risk.

Trend Recognition: Consistent positive or negative changes signal potential upward or downward trends in stock prices.

Event Impact: Sharp spikes or drops may align with significant company or market events, influencing stock prices.

Comparative Analysis: Comparing changes among companies highlights differing market sensitivities or strengths.

Trading Signals: Useful for short-term strategies, indicating potential buy/sell opportunities based on price movements.

Risk Evaluation: Assessing fluctuations aids in adjusting risk management strategies for investments.

  plt.figure(figsize=(15, 4))

  # Calculate daily percentage change for each company and plot individually

  for name in top_5_companies:

      plt.figure(figsize=(15, 4))

      company_data = df[df['Name'] == name]

      company_data['Daily_PCT_Change'] = company_data['Close'].pct_change() * 100



      # Plot the daily percentage change for each company

      plt.plot(company_data['Date'], company_data['Daily_PCT_Change'], label=name)



      # Set labels and title for the plot

      plt.xlabel('Date')

      plt.ylabel('Daily Percentage Change')

      plt.title(f'Daily Percentage Change in Closing Prices of {name}')

      plt.legend()

      plt.xticks(rotation=45)

      plt.show()

(2 selected plots from the 5 output plots are provided for reference.)

Investigate the presence of any trends or seasonality in the stock prices

This code snippet creates line charts displaying the stock prices over time for each of the top 5 companies, along with a rolling average trend line for a smoother representation of the trends. Here's what the code accomplishes:

Data Plotting: It generates individual line charts for each company's 'Close' prices over time.

Trend Line: Adds a rolling average line (e.g., 30-day) to visualize the trend in stock prices. This line represents the average closing price over a specified window (30 days in this case).

Visualization: Titles, labels, and legends are set for clarity in the visualization.

This code provides a visual representation of both the actual closing prices and a smoothed trend line, aiding in identifying the general trend in stock price movements over time for each company. Adjustments to the rolling window size or other visualization aspects can be made as needed for better analysis and presentation.

for name in top_5_companies:

    company_data = df[df['Name'] == name]

    plt.plot(company_data['Date'],company_data['Close'], label=name)

    # Plotting a rolling average (e.g., 30 days) for trend visualizations

    rolling_avg = company_data['Close'].rolling(window=30).mean()

    plt.plot(company_data['Date'],rolling_avg, label=f'{name} - Trend Line', linestyle='--')

    plt.title('Stock Prices Trend Line Over Time')

    plt.xlabel('Date')

    plt.ylabel('Closing Price')

    plt.legend()

    plt.show()

Apply moving averages to smooth the time series data in 15/30 day intervals against the original graph

The code depicting original closing prices alongside 15-day and 30-day moving averages offers these insights:

Trend Visualization: Shows long-term trends by smoothing price data, aiding in trend identification.

Signal Analysis: Crosses between moving averages indicate potential shifts in trends, guiding short-term decision-making.

Volatility Indication: The gap between prices and moving averages highlights market volatility.

Support/Resistance Levels: Interaction between prices and moving averages identifies potential support or resistance levels.

Momentum Confirmation: Consistent alignment between prices and moving averages confirms momentum in price movement.

# Loop through the top 5 companies based on volume

for name in top_5_companies:

    plt.figure(figsize=(12, 6))

    company_data = df[df['Name'] == name]



    # Plotting original closing prices

    plt.plot(company_data['Date'], company_data['Close'], label=name)



    # Calculate and plot moving averages (15-day and 30-day)

    moving_avg_15 = company_data['Close'].rolling(window=15).mean()

    moving_avg_30 = company_data['Close'].rolling(window=30).mean()



    plt.plot(company_data['Date'], moving_avg_15, label=f'{name} - 15-day MA', linestyle='--')

    plt.plot(company_data['Date'], moving_avg_30, label=f'{name} - 30-day MA', linestyle='-.')



    # Set labels, title, and legend

    plt.title('Stock Prices with Moving Averages Over Time')

    plt.xlabel('Date')

    plt.ylabel('Closing Price')

    plt.legend()

    plt.xticks(rotation=45)



    # Show the plot

    plt.show()

Calculate the average closing price for each stock

This code groups the DataFrame by the 'Name' column and then calculates the mean of the 'Close' prices for each group, resulting in the average closing price for each stock using the 'groupby()' function. The output will display the average closing price for all the stocks available in the dataset.

# Calculate average closing price for each stock

average_closing_price = df.groupby('Name')['Close'].mean()

# Display the average closing prices

print(average_closing_price)

Output:

Identify the top 5 and bottom 5 stocks based on average closing price.

# Calculate average closing price for each stock

average_closing_price = df.groupby('Name')['Close'].mean()



# Sort stocks based on average closing price

sorted_stocks = average_closing_price.sort_values()



# Display top and bottom stocks

print("Top 5 Stocks based on Average Closing Price:")

print(sorted_stocks.tail(5))



print("\nBottom 5 Stocks based on Average Closing Price:")

print(sorted_stocks.head(5))

Output:

Calculate and plot the rolling standard deviation of the 'Close' prices

Analyzing the rolling standard deviation provides insights into

Volatility Levels: Comparing standard deviations helps gauge volatility differences among companies.

Market Uncertainty: Peaks in standard deviation may indicate uncertain market periods or impactful events.

Risk Perception: Higher standard deviations suggest higher risk due to greater price variability.

Stability Indicator: Lower standard deviations imply more stable prices, appealing to risk-averse investors.

Trading Insights: Fluctuations guide risk management strategies and identify potential trading opportunities.

  # Calculate and plot rolling standard deviation for each of the top 5 companies

  for name in top_5_companies:

      company_data = df[df['Name'] == name]

      rolling_std = company_data['Close'].rolling(window=30).std()



      plt.figure(figsize=(12, 6))

      plt.plot(company_data['Date'], rolling_std, label=f'{name} - Rolling Std (30-day)', color='orange')

      plt.title(f'Rolling Standard Deviation of Close Prices for {name} (30-day Window)')

      plt.xlabel('Date')

      plt.ylabel('Standard Deviation')

      plt.legend()

      plt.xticks(rotation=45)

      plt.show()

Create a new column for daily price change

# Create a new column for daily price change

df['Daily_Price_Change'] = df['Close'] - df['Open']

# Display the updated DataFrame with the new column

print(df.head())

Output:

Analyze the distribution of daily price changes

# Analyze distribution of daily price changes for top 5 companies

for name in top_5_companies:

    company_data = df[df['Name'] == name]



    plt.figure(figsize=(8, 6))

    plt.hist(company_data['Daily_Price_Change'], bins=30, edgecolor='black')

    plt.title(f'Distribution of Daily Price Changes for {name}')

    plt.xlabel('Daily Price Change')

    plt.ylabel('Frequency')

    plt.grid(axis='y', alpha=0.5)

    plt.show()

Identify days with the largest price increases and decreases

# Identify days with the largest price increases

largest_increase = df.nlargest(5, 'Daily_Price_Change')

print("Days with the largest price increases:")

print(largest_increase[['Date', 'Daily_Price_Change']])

# Identify days with the largest price decreases

largest_decrease = df.nsmallest(5, 'Daily_Price_Change')

print("\nDays with the largest price decreases:")

print(largest_decrease[['Date', 'Daily_Price_Change']])

Output:

Identify stocks with unusually high trading volume on certain days

This analysis helps spot peaks in trading activity, indicating potential market events or investor interest. It shows patterns, events, or irregularities that might impact specific stocks or the market as a whole.

for name in top_5_companies:

    company_data = df[df['Name'] == name]

    plt.plot(company_data['Date'],company_data['Volume'],label=name)

    threshold = company_data['Volume'].quantile(0.95)

    high_volume_data = company_data[company_data['Volume'] > threshold]

    plt.scatter(high_volume_data['Date'],high_volume_data['Volume'],color="red",marker='o',label="{} - High Volume Days".format(name))

    plt.title('Trading Volume Over Time with Emphasis on Unusually High Volume Days')

    plt.xlabel('Date')

    plt.ylabel('Trading Volume')

    plt.legend()

    plt.show()

Explore the relationship between trading volume and volatility

# Calculate volatility (daily price range) for each company

df['Volatility'] = df['High'] - df['Low']



# Plot individual correlation heatmaps for each company

for name in top_5_companies:

    company_data = df[df['Name'] == name]

    correlation_matrix = company_data[['Volume', 'Volatility']].corr()

    plt.figure(figsize=(4, 4))

    sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f')

    plt.title(f'Correlation Heatmap of {name}')

    plt.show()

The correlation heatmaps show how trading volume relates to price volatility:

Strength of Relationship: The color intensity and the value annotated within each cell represent the correlation coefficient. A value close to 1 signifies a strong positive correlation, while a value near -1 indicates a strong negative correlation. Values closer to 0 suggest a weaker or no linear relationship.

Volume-Volatility Link: Look for patterns where higher trading volume aligns with increased volatility. Strong positive correlations indicate they often move together.

Comparing Companies: Differences in correlations among companies highlight unique trading behaviors. Higher correlations suggest more synchronized volume-volatility movements.

Trading Insights: Identify potential periods where increased trading volume predicts higher volatility, guiding potential trading strategies.

Watch for Exceptions: Notice any unexpected relationships or instances where volume and volatility don't align as anticipated, which might warrant further investigation into market behavior.

Correlation matrix between the `'Open'` & `'High'`, `'Low'` &`'Close'` prices

# Iterate over each top company

for company in top_5_companies:

    # Filter data for the current company

    company_data = df[df['Name'] == company]



    # Select columns for correlation analysis

    price_data = company_data[['Open', 'High', 'Low', 'Close']]



    # Calculate correlation matrix

    price_correlation = price_data.corr()

Create heatmaps to visualize the correlations using the `seaborn` package.

Acknowledgments

I express my sincere appreciation to Bohubrihi for their invaluable contributions to this analysis, offering guidance and resources that significantly enriched the project.

Like this project

Posted Apr 21, 2024

Likes

Views

Dhaka Stock Market Data Analysis With Python

Introduction

Dataset Overview

Let's proceed to the analysis.

Import the necessary libraries

Loading the dataset with pandas

Check the data types

Calculate basic summary statistics for each column (mean, median, standard deviation, etc.)

Get the top 5 companies with the highest total volume:

Explore the distribution of the 'Close' prices over time

Identify and analyze any outliers (if any) in the dataset of volume column

Create a line chart to visualize the 'Close' prices over time

Calculate and plot the daily percentage change in closing prices

Investigate the presence of any trends or seasonality in the stock prices

Apply moving averages to smooth the time series data in 15/30 day intervals against the original graph

Calculate the average closing price for each stock

Identify the top 5 and bottom 5 stocks based on average closing price.

Calculate and plot the rolling standard deviation of the 'Close' prices

Create a new column for daily price change

Analyze the distribution of daily price changes

Identify days with the largest price increases and decreases

Identify stocks with unusually high trading volume on certain days

Explore the relationship between trading volume and volatility

Correlation matrix between the 'Open' & 'High', 'Low' &'Close' prices

Create heatmaps to visualize the correlations using the seaborn package.

Acknowledgments

Join 50k+ companies and 1M+ independents

Identify and analyze any outliers (if any) in the dataset of `volume` column

Correlation matrix between the `'Open'` & `'High'`, `'Low'` &`'Close'` prices

Create heatmaps to visualize the correlations using the `seaborn` package.