LLM Evaluation Framework by Sergiu Nicoara
I build evaluation infrastructure that tells you whether your AI system actually works, before it fails in production.
What you get:
LLM-as-a-Judge harness with golden datasets and multi-metric scoring
RAGAS integration: faithfulness, relevancy, factuality, context recall
Regression logging with score delta tracking across runs
Prompt injection detection (5 taxonomies) and PII redaction guardrails
HITL safety gates and output moderation
OpenTelemetry instrumentation + Jaeger trace visibility
Built for teams who need confidence in their LLM outputs, not just vibes.
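To make "regression logging with score delta tracking across runs" concrete, here is a minimal sketch in plain Python. It is an illustration only, not the code delivered with this service: the function names, the 0.02 tolerance, and the metric scores are all hypothetical, and a real harness would read scores from its evaluation runs rather than from hard-coded dictionaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDelta:
    """Change in one metric between a baseline run and the current run."""
    metric: str
    baseline: float
    current: float

    @property
    def delta(self) -> float:
        return self.current - self.baseline


def score_deltas(baseline: dict[str, float], current: dict[str, float]) -> list[MetricDelta]:
    """Compare the metrics present in both runs, sorted by metric name."""
    shared = sorted(baseline.keys() & current.keys())
    return [MetricDelta(name, baseline[name], current[name]) for name in shared]


def regressions(deltas: list[MetricDelta], tolerance: float = 0.02) -> list[MetricDelta]:
    """Keep only metrics that dropped by more than the (assumed) tolerance."""
    return [d for d in deltas if d.delta < -tolerance]


if __name__ == "__main__":
    # Hypothetical scores, invented for illustration only.
    baseline_run = {"faithfulness": 0.91, "context_recall": 0.84}
    current_run = {"faithfulness": 0.86, "context_recall": 0.85}

    for d in regressions(score_deltas(baseline_run, current_run)):
        print(f"{d.metric}: {d.baseline:.2f} -> {d.current:.2f} ({d.delta:+.2f})")
```

The tolerance keeps small run-to-run noise from being flagged; a check like this could run in CI and fail the build when any metric regresses beyond it.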
FAQs
What do I need to provide to get started?
What if I don't have a golden dataset yet?
Does this work with my stack (not LangChain/OpenAI)?
Who owns the code?
What if we need ongoing eval support after the 2 weeks?
Example work
Production AI Data Platform: Safety, Eval, and Observability
Knowledge Graph RAG: 6-Stage Retrieval on Neo4j
Real-Time Agent Observability Platform
Sergiu's other services
AI System Audit & Roadmap
$750
RAG System Design & Implementation
$5,000
Starting at
$3,500
Duration
2 weeks
Tags
FastAPI
Google Cloud Platform
LangChain
OpenAI
Python
Redis
Machine Learning
LangFuse
LangSmith
Service provided by
Sergiu Nicoara
Timișoara, Romania