Vlad Ioan's Work | ContraWork by Vlad Ioan
Vlad Ioan
pro

Vlad Ioan

AI Infrastructure Engineer | On-Premise LLM & RAG

New to Contra

Vlad is building their profile!

Cover image for Built two production AI agents
Built two production AI agents via Dify tool-calling: nmap network scanner (natural language → structured scan results) and Active Directory lookup (LDAP/NTLM auth, user/group queries). LLM orchestration with Granite-4.0-Tiny as tool-calling model.
1
8
Cover image for Built production RAG system using
Built production RAG system using Dify 1.13.x, Qdrant vector database, BGE-M3 embeddings and bge-reranker-v2-m3 reranker. Chunk size 800/overlap 200. Integrated with internal document repositories. Achieves reliable retrieval on air-gapped infrastructure with no cloud dependency.
1
12
Cover image for Optimize Your AI Workload with Multi-Model CPU Clusters
Deployed a multi-model CPU inference cluster on 2× HP ProLiant DL360 Gen9 servers (376GB RAM total). Running 5 concurrent LLM endpoints: Qwen3-Coder-30B, Qwen3-Next-80B, GLM-4.7-Flash, Granite-4.0-Tiny — all quantized (GGUF/Q4_K_M). Optimized with ik_llama + MKL for +62% throughput. Ollama-compatible API, Open WebUI frontend, Opik observability.
1
17