Simple-Vector-Store: Quick and Easy Vectorization of your Data

Aidan Tilgner

AI Developer
Python
SQLite


What is the problem?

I found myself wanting to create a chatbot that used my notes as a basis for its generations. Essentially, I wanted to build a chatbot for my site (currently at aidantilgner.dev) that would talk to visitors with my notes in mind. This concept is called "Retrieval Augmented Generation" (RAG), and it allows for powerful customization of language models to fit your business data. Specifically, I wanted to take my Obsidian knowledge base, which is essentially a directory of text files, and have a system automatically feed semantically relevant content to my chatbot before it responded to users.

While there were existing technologies to aid in my use case, such as vector databases, such systems often didn't provide synchronization features, and the setup was overkill. Using a vector database directly would mean building the database, hosting it, and then handling processes like keeping it synchronized with my files. So I set out to find a simpler solution, and quickly found a lack of straightforward, quick solutions to this problem. After realizing there was a gap in the solution space, I got to work building a simple CLI in Python to do the task I had in mind.

Ok, so how do we fix it?

Simple Vector Store (SVS) is a lightweight, simple vector store for your files. Simply point it at a directory, and it will create a vectorized version of that directory in the store of your choice, which can be easily searched semantically. Manage multiple stores, and use the REST API feature to connect your stores with other systems. Simple Vector Store is the perfect system for simple RAG-based Chatbots, knowledge base search, and plenty of other use-cases.

The synchronization feature allows you to easily maintain a vector representation of your directory over time, as it evolves. The SVS automatically detects differences between your source material and the vector store, and updates itself accordingly so that you never have to rebuild from scratch. This allows for significant performance improvements and easy tracking of invalid content against a base store. It also means that redundant calls to the OpenAI embeddings endpoint are not made, which saves real time and money.
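To make the synchronization idea concrete, here is a minimal sketch of one way to detect differences between a directory and a store: hash every file, then compare against the hashes recorded at the last sync. This is an assumption about how such a check could work, not SVS's actual implementation; the function names `hash_directory` and `plan_sync` are hypothetical.

```python
import hashlib
from pathlib import Path


def hash_directory(root: Path) -> dict[str, str]:
    """Map each file's relative path to a SHA-256 digest of its contents."""
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def plan_sync(current: dict[str, str], stored: dict[str, str]) -> dict[str, list[str]]:
    """Compare current file hashes with the hashes recorded at the last sync.

    Hypothetical helper: `stored` stands in for whatever the store remembers
    about each file. Only new or changed files need to be embedded again.
    """
    return {
        # New files, or files whose contents changed since the last sync.
        "embed": sorted(p for p, h in current.items() if stored.get(p) != h),
        # Files that were removed from the directory.
        "delete": sorted(p for p in stored if p not in current),
    }
```

Only the paths listed under `"embed"` would need a fresh embeddings request, which is where the time and cost savings come from; unchanged files are skipped entirely.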

How does it work?

Simple Vector Store uses SQLite and sqlite-vss under the hood to build a lightweight vector store in a single file. This means that you don't have to rely on hosting an entire service just to access your data or perform operations. Rather, you can perform all of your operations directly through the SQLite adapter of your choice, whether in Node.js, Python, Go, or another language.
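As a rough illustration of the single-file idea, the sketch below keeps text chunks and their embeddings in one SQLite database and ranks them by cosine similarity using only Python's standard library. It is not SVS's schema and does not use sqlite-vss: the `chunks` table, its columns, and the helper names are hypothetical, the embeddings are stored as JSON text, and the search is a brute-force scan where sqlite-vss would provide an indexed lookup. The toy two-dimensional vectors in the usage example stand in for real embeddings.

```python
import json
import math
import sqlite3


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_store(path=":memory:"):
    """Open (or create) a store. Pass a file path to persist it as a single file."""
    conn = sqlite3.connect(path)
    # Hypothetical schema for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        "id INTEGER PRIMARY KEY, "
        "source TEXT NOT NULL, "
        "content TEXT NOT NULL, "
        "embedding TEXT NOT NULL)"
    )
    return conn


def add_chunk(conn, source, content, embedding):
    """Store one chunk of text with its embedding serialized as JSON."""
    conn.execute(
        "INSERT INTO chunks (source, content, embedding) VALUES (?, ?, ?)",
        (source, content, json.dumps(embedding)),
    )


def search(conn, query_embedding, limit=3):
    """Return up to `limit` (score, source, content) tuples, best match first."""
    rows = conn.execute("SELECT source, content, embedding FROM chunks").fetchall()
    scored = [
        (cosine_similarity(query_embedding, json.loads(emb)), source, content)
        for source, content, emb in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]
```

A quick usage example with made-up vectors:

```python
conn = open_store()
add_chunk(conn, "a.md", "cats", [1.0, 0.0])
add_chunk(conn, "b.md", "dogs", [0.0, 1.0])
add_chunk(conn, "c.md", "kittens", [0.9, 0.1])
for score, source, content in search(conn, [1.0, 0.0], limit=2):
    print(source, content, round(score, 3))
```

Because everything lives in an ordinary SQLite file, the same data could be read from any language with a SQLite adapter.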

Simple Vector Store was built with Python and runs locally in your terminal, providing a minimalist but robust interface for interacting with your files. It allows multiple "stores" to be created and managed, which gives you scalability and flexibility over time. For embeddings, it uses OpenAI's Embeddings API, whose models are among the top rated for semantic search. In the future, I may add options to use other models, such as open source or locally hosted ones.
