Vyom Modi
Highlights
🔍 Seamless Search: Search for subtitles by dialogue, using BERT sentence embeddings to match on meaning.
📈 Accurate Results: Get the most probable matches for your search query, retrieved from a Chroma vector database.
💻 User-Friendly Interface: SubSleuth's interface is designed to be intuitive and easy to navigate.
📥 Convenient Download: Download the matched subtitle files directly from the application.
Process Overview
All the files in the given dataset were converted into BERT vector embeddings and stored in ChromaDB.
We selected the 'multi-qa-MiniLM-L6-cos-v1' model from S-BERT to generate the embeddings, as it is fine-tuned specifically for semantic search on a set of question-answer pairs.
We then used this Chroma vector database in a Flask application.
The user input is first preprocessed with the same steps that were applied to the dataset, and is then vectorised.
Once the user query is vectorised, the most relevant subtitle files are returned along with their download links, ranked by cosine similarity score.
Finally, the Flask app was enhanced with a user-friendly UI, including a dynamic background whose linear and radial gradients change with the user's cursor movement, making for a more engaging experience overall.
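The retrieval step described above can be illustrated with a minimal, dependency-free sketch. It is not SubSleuth's actual code: the `preprocess` rules, the function names, the file names, and the three-dimensional toy vectors are all assumptions made for illustration. In the real application the vectors would come from the 'multi-qa-MiniLM-L6-cos-v1' model and the ranking would be handled by ChromaDB.

```python
import math
import re


def preprocess(text):
    """Hypothetical cleaning step (the project's real steps are not documented here):
    lowercase, replace punctuation with spaces, collapse whitespace.
    The same function would be applied to both the dataset and the query."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return " ".join(text.split())


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_subtitles(query_vec, index, top_k=3):
    """Return the top_k (file name, score) pairs, highest similarity first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for stored subtitle embeddings (assumed values, not real model output).
index = {
    "movie_a.srt": [0.9, 0.1, 0.0],
    "movie_b.srt": [0.1, 0.8, 0.3],
    "movie_c.srt": [0.0, 0.2, 0.9],
}

cleaned_query = preprocess("  Hello,   World! ")
query_vec = [1.0, 0.2, 0.0]  # stand-in for the embedding of cleaned_query
results = rank_subtitles(query_vec, index, top_k=2)
for name, score in results:
    print(f"{name}: {score:.3f}")
```

Running the query through the same `preprocess` function as the dataset keeps both sides in the same text form before embedding, which is the reason the process reuses the dataset's preprocessing steps.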
Demo