Built a Multimodal RAG system live on stream. From zero to working CLI in one session.
The goal: a terminal-based RAG pipeline using Haystack, Gemini Multimodal Embeddings, and Flash Lite, querying complex research PDFs through a conversational loop.
Three real engineering lessons from the build:
Chunking isn't optional. Raw PDFs exhaust token limits instantly; splitting them into 6-page chunks solved it.
InMemoryDocumentStore works for prototyping, but it re-embeds every document on every launch. A persistent vector DB like Weaviate is the next step.
Haystack's pipeline is strict. Mis-wiring a retriever to a generator crashes the loop immediately. API contracts matter.
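The 6-page split from the first lesson can be sketched in plain Python. This is a minimal, hedged illustration, not the stream's actual code: it assumes page texts have already been extracted from the PDF as a list of strings (`page_texts`), and simply groups consecutive pages.

```python
def chunk_pages(page_texts, pages_per_chunk=6):
    """Group consecutive page texts into fixed-size chunks.

    A simplified stand-in for a document splitter: in a real Haystack
    pipeline, a splitter component would produce Document objects instead
    of plain strings.
    """
    chunks = []
    for start in range(0, len(page_texts), pages_per_chunk):
        # Join up to 6 consecutive pages into one chunk so each
        # embedding call stays well under the model's token limit.
        chunks.append("\n".join(page_texts[start:start + pages_per_chunk]))
    return chunks

pages = [f"page {i} text" for i in range(1, 15)]  # a 14-page PDF
print(len(chunk_pages(pages)))  # → 3 (pages 1–6, 7–12, 13–14)
```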
Result: a CLI that maps user queries to the top 4 semantic chunks in RAM and returns grounded, non-hallucinated answers from your own documents.
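The top-4 retrieval step works roughly like this. A hedged, standard-library-only sketch: the real build uses Gemini multimodal embeddings through Haystack's in-memory retriever, while here toy vectors and cosine similarity stand in for both.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k_chunks(query_vec, chunk_vecs, k=4):
    """Return indices of the k chunks most similar to the query,
    best match first — the in-RAM retrieval the CLI performs on
    every user question before handing chunks to the generator."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]
```

Grounding then falls out naturally: the generator only ever sees those four chunks, so its answers are tied to your own documents.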
Every mistake, every fix, on camera.
Mani Sharan:
Ugurcan Uzunkaya this is seriously impressive! 🔥 Building a full RAG pipeline live on stream in one session is no joke!
That chunking lesson is so real — token limits hit different when you're working with raw PDFs.
And the Haystack pipeline point about API contracts is...
Uğurcan:
You can reach the code by clicking the completed work button. Thank you for your kind words!
Trashu:
great work!