Built a Multimodal RAG system live on stream. From zero to working CLI in one session.
The goal: a terminal-based RAG pipeline using Haystack, Gemini Multimodal Embeddings, and Flash Lite, querying complex research PDFs through a conversational loop.
Three real engineering lessons from the build:
Chunking isn't optional. Raw PDFs exhaust token limits instantly. 6-page splits solved it.
InMemoryDocumentStore works for prototyping, but re-vectorizes on every launch. Persistent DBs like Weaviate are the next step.
Haystack's pipeline is strict. Mis-wiring a retriever to a generator crashes the loop immediately. API contracts matter.
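The chunking fix from the first lesson can be sketched in plain Python. This is a minimal stand-in for Haystack's document splitters, not the stream's actual code, and it assumes page texts have already been extracted from the PDF:

```python
def chunk_pages(pages, pages_per_chunk=6):
    """Group extracted page texts into fixed-size chunks.

    Feeding a whole research PDF to the embedder blows past token
    limits; 6-page windows keep each chunk comfortably under them.
    """
    return [
        "\n".join(pages[i:i + pages_per_chunk])
        for i in range(0, len(pages), pages_per_chunk)
    ]


# e.g. a 14-page PDF becomes three chunks: pages 0-5, 6-11, 12-13
chunks = chunk_pages([f"page {i} text" for i in range(14)])
```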
Result: a CLI that maps user queries to the top 4 semantic chunks in RAM and returns grounded, non-hallucinated answers from your own documents.
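That top-4 lookup is, at its core, a cosine-similarity ranking over vectors held in RAM. A minimal sketch of the idea (the function names are hypothetical; in the real pipeline the vectors come from the Gemini embedding model and the ranking is done by Haystack's retriever):

```python
import math


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def top_k(query_vec, doc_vecs, k=4):
    """Return the indices of the k chunks most similar to the query."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda iv: cosine(query_vec, iv[1]),
        reverse=True,
    )
    return [i for i, _ in scored[:k]]
```

Those top chunks are then stuffed into the generator's prompt, which is what keeps the answers grounded in the indexed documents rather than the model's priors.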
Every mistake, every fix, on camera.