Real-Time Object Detection | YOLOv8

Amr Khaled


Hello

You can check the code and the full model information on my GitHub account. Have fun :)

Code Explanation

We start by importing the necessary libraries and setting up our environment. We get the current working directory and print it for confirmation. Next, we download a video file from Google Drive using gdown and install the required libraries, ultralytics and supervision, which will help us with object detection and annotation.

import os
HOME = os.getcwd()
print(HOME)
!pip install ultralytics supervision
!gdown '10zzs49pm90lG5EqJpuf9X-N-sQcSSl43' -O CCTV_Input.mp4
SOURCE_VIDEO_PATH = f"{HOME}/CCTV_Input.mp4"
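
As a quick sanity check (this step is not part of the original snippet), you can confirm the video downloaded correctly and inspect its resolution, FPS, and frame count with supervision's VideoInfo helper:

import os
import supervision as sv

# Fail early if the download did not produce the expected file
assert os.path.exists(SOURCE_VIDEO_PATH), "CCTV_Input.mp4 was not downloaded"
print(sv.VideoInfo.from_video_path(SOURCE_VIDEO_PATH))  # width, height, fps, total_frames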

Now, we import more libraries including cv2 for image processing, YOLO from ultralytics for our object detection model, supervision for annotations, and numpy for numerical operations. We load the YOLOv8 model and retrieve the class names that the model can detect. For this example, we are particularly interested in class 0, which usually represents 'person' in YOLO models. We also set up a colour palette for annotations.

import cv2
from ultralytics import YOLO
import supervision as sv
import numpy as np

model = YOLO('yolov8n.pt')
CLASS_NAMES_DICT = model.model.names

selected_classes = [0]
color_palette = sv.ColorPalette.DEFAULT

color_to_use = color_palette.by_idx(1)
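
If you want to double-check the class mapping before filtering (a small illustrative check, not in the original code), you can print the entry for class 0 and the total number of classes:

print(CLASS_NAMES_DICT[0])  # expected: 'person' for the COCO-pretrained YOLOv8 weights
print(len(CLASS_NAMES_DICT), "classes available")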

Next, we create a frame generator to read frames from the video. We also set up an annotator that draws corner-style boxes around detected objects with the specified thickness and colour. Here, we're specifying the frame index (600) that we want to process.

generator = sv.get_video_frames_generator(SOURCE_VIDEO_PATH)
corner_annotator = sv.BoxCornerAnnotator(thickness=2, color=color_to_use)
frame_index = 600
iterator = iter(generator)
for i in range(frame_index + 1):
    frame = next(iterator)
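
In recent supervision versions, get_video_frames_generator also accepts start and stride arguments, so you can jump straight to the frame you want instead of advancing the iterator manually. This is only an alternative sketch; check that your installed version supports the argument:

# Jump directly to frame 600 (assumes a supervision version with the start parameter)
generator = sv.get_video_frames_generator(SOURCE_VIDEO_PATH, start=frame_index)
frame = next(iter(generator))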

We use the model to make predictions on the selected frame. The results are converted into a format suitable for supervision detections. We filter the detections to only include our selected class (people). Then, we create labels for each detected object, annotate the frame with these labels, and display the annotated frame.

results = model(frame, verbose=False)[0]
detections = sv.Detections.from_ultralytics(results)
detections = detections[np.isin(detections.class_id, selected_classes)]
labels = [
    f"{CLASS_NAMES_DICT[class_id]} {confidence:0.2f}"
    for confidence, class_id in zip(detections.confidence, detections.class_id)
]
annotated_frame = corner_annotator.annotate(scene=frame, detections=detections)
sv.plot_image(annotated_frame, (16, 16))
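
The snippet above builds the label strings but only draws the corner boxes. If you also want the class name and confidence rendered next to each box, supervision's LabelAnnotator can overlay them (a small addition on top of the original code):

# Draw the "class confidence" strings built above on top of the annotated frame
label_annotator = sv.LabelAnnotator(color=color_to_use)
annotated_frame = label_annotator.annotate(
    scene=annotated_frame, detections=detections, labels=labels)
sv.plot_image(annotated_frame, (16, 16))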

We then set everything up to process the entire video. This involves defining a callback function that will be called for each frame. Inside this function, we run the model to get detections, filter them for the selected classes, and update the tracker with these detections. We annotate each frame with the detection results and then return the annotated frame. Finally, we process the entire video, apply the callback to each frame, and save the output.

TARGET_VIDEO_PATH = f"{HOME}/Halo_output_video.mp4"
byte_tracker = sv.ByteTrack(track_thresh=0.25, track_buffer=30,
                            match_thresh=0.8, frame_rate=30)
video_info = sv.VideoInfo.from_video_path(SOURCE_VIDEO_PATH)
generator = sv.get_video_frames_generator(SOURCE_VIDEO_PATH)
corner_annotator = sv.BoxCornerAnnotator(thickness=2)
def callback(frame: np.ndarray, index: int) -> np.ndarray:
    results = model(frame, verbose=False)[0]
    detections = sv.Detections.from_ultralytics(results)
    detections = detections[np.isin(detections.class_id, selected_classes)]
    # Update the tracker so detections keep consistent IDs across frames
    detections = byte_tracker.update_with_detections(detections)

    annotated_frame = corner_annotator.annotate(
        scene=frame.copy(),
        detections=detections)
    return annotated_frame
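
The write-out step described above is handled by supervision's process_video helper (the call itself is not shown in the snippet): it reads every frame from the source video, passes it through the callback, and saves the annotated result to the target path.

# Process the whole video frame by frame and write the annotated output
sv.process_video(
    source_path=SOURCE_VIDEO_PATH,
    target_path=TARGET_VIDEO_PATH,
    callback=callback)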
