Production AI Data Platform: Safety, Eval, and Observability by Sergiu Nicoara

A production-grade AI data platform where reliability, observability, safety, and evaluation are first-class concerns, not afterthoughts. Most RAG implementations are built for demos: a single retrieval backend, no evaluation, no safety layer, no observability. They break silently in production. This one doesn't.

Retrieval pipeline

Hybrid dense + sparse retrieval (pgvector ANN + PostgreSQL FTS) with RRF fusion and MMR reranking. An OpenSearch backend (BM25 + kNN) is available as a runtime-selectable alternative. Multimodal ingestion: PDF and image inputs are captioned with GPT-4o Vision and embedded into a unified vector space. P95 latency is gated at 800 ms.
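
As an illustration of the RRF fusion step, here is a minimal sketch of Reciprocal Rank Fusion over two ranked lists. It is not the project's code: the function name `rrf_fuse`, the document ids, and the constant `k=60` (a commonly used default) are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document ids with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists in which it
    appears, where rank starts at 1. k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists standing in for the dense and sparse backends.
dense_hits = ["d3", "d1", "d7"]
sparse_hits = ["d1", "d9", "d3"]

fused = rrf_fuse([dense_hits, sparse_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, the dense and sparse scores never need to be put on a common scale, which is one reason it is a common choice for hybrid retrieval.
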

NL→SQL layer

Replaced hand-written prompts with DSPy BootstrapFewShot-optimised natural language to SQL. Schema-aware intent extraction trained against a 20-example golden dataset, injection-safe parameterised queries, workspace-scoped execution, and full audit logging.
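
The injection-safe, workspace-scoped part of this layer can be sketched as follows. This is a hedged illustration, not the project's implementation: it assumes the intent has already been extracted into a dictionary, uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, and the names `ALLOWED`, `build_query`, and the `orders` table are hypothetical.

```python
import sqlite3

# Hypothetical allow-list: table name -> columns the NL layer may reference.
ALLOWED = {"orders": {"id", "status", "total"}}


def build_query(intent, workspace_id):
    """Turn a structured intent into (sql, params) with a mandatory workspace filter."""
    table = intent["table"]
    if table not in ALLOWED:
        raise ValueError(f"table not allowed: {table}")
    columns = intent["columns"]
    filters = intent.get("filters", {})
    if not columns:
        raise ValueError("at least one column is required")
    for name in list(columns) + list(filters):
        if name not in ALLOWED[table]:
            raise ValueError(f"column not allowed: {name}")
    # Identifiers come only from the allow-list; values are always bound parameters.
    where = ["workspace_id = ?"] + [f"{col} = ?" for col in filters]
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE {' AND '.join(where)}"
    return sql, [workspace_id, *filters.values()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, workspace_id TEXT, status TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "ws_a", "paid", 10.0), (2, "ws_b", "paid", 20.0), (3, "ws_a", "open", 5.0)],
)

intent = {"table": "orders", "columns": ["id", "total"], "filters": {"status": "paid"}}
sql, params = build_query(intent, workspace_id="ws_a")
rows = conn.execute(sql, params).fetchall()
print(sql, params, rows)
```

The workspace filter is added by the builder rather than by the model, so a generated intent cannot drop it or point it at another workspace. With a PostgreSQL driver the placeholder syntax would differ (for example `%s` in psycopg), but the separation of identifiers and bound values stays the same.
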

Safety and guardrails

Prompt injection detection across 5 taxonomies, PII redaction, toxicity filtering, structured audit events, and safe fallback behavior under failure or policy violations. Workspace-level token-bucket rate limiting.
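
Workspace-level token-bucket rate limiting can be sketched in a few lines. The version below is an in-process illustration under stated assumptions: the class names `TokenBucket` and `WorkspaceLimiter`, the capacity and refill values, and the injectable `clock` are all hypothetical, and a deployed limiter would more likely keep bucket state in a shared store such as Redis.

```python
import time


class TokenBucket:
    """A bucket that refills continuously up to a fixed capacity."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)  # start full
        self.updated = clock()

    def allow(self, cost=1.0):
        # Refill based on elapsed time, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class WorkspaceLimiter:
    """Keeps one independent bucket per workspace id."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self._args = (capacity, refill_per_sec, clock)
        self._buckets = {}

    def allow(self, workspace_id, cost=1.0):
        if workspace_id not in self._buckets:
            self._buckets[workspace_id] = TokenBucket(*self._args)
        return self._buckets[workspace_id].allow(cost)


limiter = WorkspaceLimiter(capacity=2, refill_per_sec=1.0)
print([limiter.allow("ws_a") for _ in range(3)])  # third call exceeds the burst
print(limiter.allow("ws_b"))                      # other workspaces are unaffected
```

Passing the clock in as a parameter makes the limiter easy to test deterministically, without sleeping.
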

Evaluation and observability

Offline evaluation with Recall@K, MRR, and groundedness gates (≥ 0.70). Rolling SLO window with EWMA anomaly detection and automated remediation. Prometheus + Grafana 15-panel dashboard with alerting. 184-test suite covering safety, reliability contracts, and SQL correctness.
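
The offline metrics and the EWMA check can be illustrated with a short sketch. None of this is taken from the project: the function names, the smoothing factor `alpha`, the 3-sigma `threshold`, the `warmup` length, and the sample latencies are assumptions for the example; only the 0.70 gate value comes from the description above.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


def mrr(runs):
    """Mean reciprocal rank over (retrieved, relevant) pairs."""
    total = 0.0
    for retrieved, relevant in runs:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(runs) if runs else 0.0


def passes_gate(score, threshold=0.70):
    """Release gate: the metric must reach the threshold."""
    return score >= threshold


def ewma_anomalies(values, alpha=0.3, threshold=3.0, warmup=3):
    """Return indices whose deviation from the EWMA mean exceeds threshold * EWMA std."""
    if not values:
        return []
    mean, var = float(values[0]), 0.0
    flagged = []
    for i, x in enumerate(values[1:], start=1):
        diff = x - mean
        std = var ** 0.5
        # Skip the first few points while the variance estimate settles.
        if i >= warmup and std > 0 and abs(diff) > threshold * std:
            flagged.append(i)
        # Incremental EWMA updates for the mean and the variance.
        mean += alpha * diff
        var = (1 - alpha) * (var + alpha * diff * diff)
    return flagged


# Hypothetical latency samples in milliseconds, with one obvious spike.
latencies = [100, 102, 98, 101, 99, 100, 400, 101]
print(ewma_anomalies(latencies))
print(recall_at_k(["a", "b", "c", "d"], {"a", "d", "z"}, k=3))
```

In this sketch a spike inflates the variance estimate after it is flagged, so the points immediately following it are less likely to be flagged as well; whether that behaviour is desirable depends on how remediation is triggered.
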
Stack: Python, FastAPI, PostgreSQL, pgvector, Redis, Docker, Prometheus, Grafana, GCP, DSPy.