Umer Nazeer
Bag of visual words (BoVW) is a popular method for image classification and recognition that extracts local features from images and represents each image in terms of visual words. In this task we first extract features from the images, then generate a visual vocabulary (a step also known as codebook generation), and finally train two classifiers, one using Support Vector Machines and the other using Random Forest. We perform these steps on two datasets: the Objects dataset and the Flowers dataset.
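As a rough illustration of the three stages named above, the sketch below builds a codebook, converts each image into a histogram of visual words, and trains both classifiers. It is not the project's actual code: the use of k-means for the codebook, the vocabulary size, the linear SVM kernel, the number of trees and all function names are assumptions made for this example. The local descriptors (for example from SIFT) are expected as one array per image.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC


def build_codebook(descriptor_sets, n_words=50, seed=0):
    """Cluster all local descriptors; each cluster centre acts as one visual word.

    Assumption: k-means is used for clustering and n_words is chosen by hand.
    """
    usable = [d for d in descriptor_sets if d is not None and len(d) > 0]
    stacked = np.vstack(usable)
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(stacked)


def to_histogram(descriptors, codebook):
    """Represent one image as a normalised histogram of visual-word counts."""
    n_words = codebook.n_clusters
    if descriptors is None or len(descriptors) == 0:
        # An image without any descriptors maps to an all-zero histogram.
        return np.zeros(n_words)
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / hist.sum()


def train_classifiers(descriptor_sets, labels, n_words=50, seed=0):
    """Build the codebook, encode every image, and fit an SVM and a Random Forest."""
    codebook = build_codebook(descriptor_sets, n_words, seed)
    features = np.array([to_histogram(d, codebook) for d in descriptor_sets])
    svm = SVC(kernel="linear").fit(features, labels)
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(features, labels)
    return codebook, svm, forest
```

At prediction time a test image goes through the same encoding: its descriptors are passed to `to_histogram` with the codebook fitted on the training set, and the resulting histogram is given to either classifier.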
The first dataset provided to us was the Objects dataset, which contains four types of objects: accordion, dollar bill, motorbike and soccer ball. It was already split into training and test data, so it only needed to be loaded into the code to get started. The training set contained 14 images of all 4 categories. The test set contained 2 images per category (8 images in total).
The second dataset was the Flowers dataset, which contains 3670 images of five types of flowers: Daisy (633 images), Dandelion (898 images), Roses (641 images), Sunflowers (699 images) and Tulips (799 images). For this dataset, a suitable number of images had to be chosen for the training and test sets. I kept 3450 of the 3670 images in the training set and the remaining 220 in the test set. The division of training and test set per class was as follows:
Flowers Dataset
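One way to produce a 3450/220 split like the one described is to shuffle each class separately and hold out a fixed number of images per class. The sketch below is only an illustration: the figure of 44 test images per class (5 x 44 = 220) is an assumption chosen to match the totals, not the per-class division actually used, and the file names and the function `split_per_class` are hypothetical.

```python
import random


def split_per_class(files_by_class, n_test_per_class, seed=0):
    """Shuffle each class separately and hold out a fixed number of test images."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, files in files_by_class.items():
        shuffled = list(files)
        rng.shuffle(shuffled)
        test[label] = shuffled[:n_test_per_class]
        train[label] = shuffled[n_test_per_class:]
    return train, test


# Class sizes from the Flowers dataset; the file names are placeholders.
class_sizes = {"daisy": 633, "dandelion": 898, "roses": 641,
               "sunflowers": 699, "tulips": 799}
files_by_class = {label: [f"{label}_{i}.jpg" for i in range(n)]
                  for label, n in class_sizes.items()}

# Assumption: 44 test images per class, which gives 220 test images in total.
train, test = split_per_class(files_by_class, n_test_per_class=44)
```

Fixing the seed keeps the split reproducible, so both classifiers are evaluated on the same held-out images.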
Bag of Visual Words is applied to a dataset in the following sequence of steps.
First, copy the joblib files to your machine. Then import the libraries and dependencies mentioned in the code, and load the joblib files referenced in the code to restore the saved models.
The paths to the training and test sets in the code must be changed to match where the datasets are located on your machine; otherwise the code will not run.
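The setup could look roughly like the sketch below. The directory names, the `.joblib` file extension, the one-folder-per-class layout and the helper names `load_models` and `list_images` are all assumptions for illustration; substitute the paths and file names used in the actual code.

```python
from pathlib import Path

import joblib

# Hypothetical locations; change these to wherever the data and models live.
TRAIN_DIR = Path("data/objects/train")
TEST_DIR = Path("data/objects/test")
MODEL_DIR = Path("models")


def load_models(model_dir):
    """Load every .joblib file in model_dir, keyed by file name without extension."""
    model_dir = Path(model_dir)
    if not model_dir.is_dir():
        raise FileNotFoundError(f"Model directory not found: {model_dir}")
    return {p.stem: joblib.load(p) for p in sorted(model_dir.glob("*.joblib"))}


def list_images(dataset_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Return (path, label) pairs, assuming one sub-folder per class."""
    dataset_dir = Path(dataset_dir)
    pairs = []
    for class_dir in sorted(p for p in dataset_dir.iterdir() if p.is_dir()):
        for image_path in sorted(class_dir.iterdir()):
            if image_path.suffix.lower() in extensions:
                pairs.append((image_path, class_dir.name))
    return pairs
```

Keeping the paths in one place at the top of the script means only those lines need to change when the code is moved to another machine.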
The images below show the key points and features that the SIFT feature detector finds in the images; these features then contribute to the formation of the codebook, or visual vocabulary.