ProposalGuard: Building Grounded AI Proposals with LangGraph
Portfolio project. Agentic AI engineering for RFP responses.
The problem I wanted to solve
AI-generated proposals are everywhere now. The output looks polished,
but the underlying claims often aren't real. Hallucinated case studies,
invented credentials, fabricated metrics. The polish is the feature.
The hallucinations are the liability.
I wanted to build the opposite. Output verified against real context,
scanned for bias, and human-reviewed before anything ships. Agentic AI
with guardrails wired in from the start, not bolted on after.
ProposalGuard is the lab where I'm working on it.
The architecture
A 5-node LangGraph pipeline. Each node has one job, and when a node
fails, it hands the failure context to the node that can respond to it.
01. Retrieve context (Haiku)
Embed the RFP, query a ChromaDB vector store of company case studies,
capabilities, and credentials. Haiku does the job. Fast retrieval with
reasonable filtering, not deep reasoning.
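The retrieval shape can be sketched in plain Python. This is a toy: the real node embeds with a vector model and queries ChromaDB, while this stand-in ranks documents by bag-of-words cosine similarity. All names here (`retrieve_context`, the sample store) are illustrative, not from the project.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a real vector model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_context(rfp: str, store: list[str], k: int = 2) -> list[str]:
    # Rank stored case studies / credentials by similarity to the RFP
    # and return the top k as generation context.
    q = embed(rfp)
    ranked = sorted(store, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

store = [
    "case study: migrated a healthcare RFP pipeline to cloud infrastructure",
    "credential: certified cloud architect team",
    "case study: built an e-commerce recommendation engine",
]
print(retrieve_context("cloud migration RFP for a healthcare provider", store))
```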
02. Generate draft (Sonnet / Opus)
Take the retrieved context plus the RFP, draft the full proposal.
Sonnet by default. Opus for high-stakes or longer proposals where
reasoning depth matters more than speed.
03. Grounding check (Sonnet)
This is where most proposal AI breaks. Every claim in the draft gets
verified against the retrieved context. Hallucinated metrics, fabricated
case studies, invented credentials all get flagged. If the grounding
score falls below threshold, the node passes failure context back to
node 02 for regeneration with explicit feedback about what didn't
ground.
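The control flow of the grounding node can be sketched like this. The real judge is an LLM call (Sonnet); this stand-in marks a claim grounded only if its terms appear in the retrieved context, which is enough to show the score / flag / pass-or-regen shape. Function and field names are assumptions.

```python
def grounding_check(claims: list[str], context: str,
                    threshold: float = 0.8) -> dict:
    # Toy grounding: a claim passes if every word in it appears in the
    # retrieved context. The real node asks an LLM judge instead.
    ctx = context.lower()
    failures = [c for c in claims
                if not all(w in ctx for w in c.lower().split())]
    score = 1 - len(failures) / len(claims) if claims else 1.0
    return {
        "score": score,
        "passed": score >= threshold,
        # On failure, this list flows back into the generator's prompt.
        "failed_claims": failures,
    }
```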
04. Bias detection (Haiku)
Scan output for tone and demographic bias. Pattern-match against known
bias surfaces: gendered language, demographic assumptions, exclusionary
framing. Cheaper model, narrower job.
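The flag-and-report shape of this node looks roughly like the following. The real node prompts Haiku rather than matching regexes, and the pattern list here is a hypothetical stand-in for the bias surfaces it scans.

```python
import re

# Hypothetical pattern list standing in for an LLM-backed scan.
BIAS_PATTERNS = {
    "gendered language": r"\b(chairman|manpower|salesmen)\b",
    "exclusionary framing": r"\b(young and energetic|digital native)\b",
}

def scan_bias(text: str) -> list[tuple[str, str]]:
    # Return (category, matched phrase) pairs for every hit so the
    # reviewer sees exactly what was flagged and why.
    hits = []
    for category, pattern in BIAS_PATTERNS.items():
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((category, m.group(0)))
    return hits
```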
05. Human review (HITL)
Stream the final draft to the frontend via SSE. A human approves, edits,
or rejects before anything ships. AI does the work, the human owns the
output.
Demo
Architecture decisions worth calling out
Models matched to jobs. Haiku for retrieval and bias scanning.
Sonnet for grounding. Sonnet or Opus for generation. Cost stays
predictable, latency stays acceptable, and the smart models work where
reasoning actually matters.
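The routing reduces to a small lookup. Model IDs below are placeholders, and `pick_model` is an illustrative name, but the mapping mirrors the cost/latency reasoning above.

```python
def pick_model(node: str, high_stakes: bool = False) -> str:
    # Cheap models on narrow jobs, expensive models only where
    # reasoning depth pays for itself.
    routing = {
        "retrieve": "claude-haiku",
        "bias_check": "claude-haiku",
        "grounding": "claude-sonnet",
        "generate": "claude-opus" if high_stakes else "claude-sonnet",
    }
    return routing[node]
```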
Failure injected back into the generator. The common LLM-judge
pattern is to score outputs, fail them, and regenerate blind. That
wastes tokens and rarely converges. ProposalGuard passes the grounding
failure context (which claims failed, what the retrieved context
actually said) back into the generator's prompt so the regen has the
information to fix the problem.
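Concretely, the regen prompt names each failed claim next to what the context actually said, instead of just saying "try again." A minimal sketch, with hypothetical function and parameter names:

```python
def build_regen_prompt(rfp: str, context: str, failed_claims: list[str],
                       context_excerpts: list[str]) -> str:
    # Pair each ungrounded claim with the relevant context excerpt so
    # the regeneration has the information needed to fix it.
    feedback = "\n".join(
        f'- Claim "{claim}" was not supported. Context says: "{excerpt}"'
        for claim, excerpt in zip(failed_claims, context_excerpts)
    )
    return (
        f"Draft a proposal for this RFP:\n{rfp}\n\n"
        f"Use only this context:\n{context}\n\n"
        f"Your previous draft failed grounding on these claims:\n{feedback}\n"
        "Remove or correct them; do not invent new facts."
    )
```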
Stateless graph state. Each node reads from and writes to LangGraph
state. Nothing depends on a side channel. Replays work, traces work,
debugging works.
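The state amounts to one typed dictionary that every node reads from and writes to. Field names here are assumptions, not the project's actual schema; the point is that a run can be replayed from state alone.

```python
from typing import TypedDict

class ProposalState(TypedDict, total=False):
    # Single source of truth for the pipeline: no node keeps hidden
    # state, so replaying a run is just replaying this dict.
    rfp: str
    retrieved_context: list[str]
    draft: str
    grounding_score: float
    failed_claims: list[str]
    bias_flags: list[str]
    retries: int
    approved: bool
```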
Streaming via SSE, not WebSockets. SSE is simpler. One-way matches
the data flow. The frontend doesn't need to maintain a persistent
connection. Less to break.
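The wire format underlines the simplicity argument: an SSE stream is just `data:` lines over a plain HTTP response, one per chunk, each terminated by a blank line. A minimal framing sketch (the `[DONE]` sentinel is a common convention, assumed here, not confirmed from the project):

```python
def sse_events(chunks):
    # Server-Sent Events framing: "data: <payload>\n\n" per message,
    # streamed over an ordinary HTTP response. No handshake, no
    # bidirectional channel to keep alive.
    for chunk in chunks:
        yield f"data: {chunk}\n\n"
    yield "data: [DONE]\n\n"
```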
What I'm working on now
Being honest in this section because "shipped, perfect, done" isn't
true for what this is. ProposalGuard is an active build.
The hardest problem is the grounding regeneration loop. Early versions
had the grounding node scoring claims below threshold consistently and
triggering regen loops that didn't converge. Root causes: feedback
wasn't passed back into the generator prompt, the threshold was
miscalibrated against what "good enough" actually meant, and no
max-retries cap meant the pipeline could spin.
Current work:
Explicit feedback injection from grounding into the generator prompt
Threshold calibration from real eval data
Max-retries cap with graceful degradation
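The retry cap reduces to a small routing function on the graph's conditional edge. Node names and the cap value below are illustrative; "graceful degradation" here means routing to human review with the grounding failures still attached rather than spinning on regeneration.

```python
MAX_RETRIES = 3  # assumed cap, not the project's actual value

def route_after_grounding(state: dict) -> str:
    # Conditional-edge logic: pass forward on success, regenerate on
    # failure, but never loop past the cap.
    if state["grounding_passed"]:
        return "bias_check"
    if state.get("retries", 0) >= MAX_RETRIES:
        return "human_review"  # degrade gracefully, failures flagged
    return "generate"
```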
Roadmap:
Langfuse observability for tracing the full pipeline
Evaluation suite running against a real proposal dataset
Node 3.5: context leakage detection for prompt injection defense
DECISIONS.md documenting every architectural call and tradeoff
How I work
I build portfolio projects like this when I want to push deeper into
systems that client work doesn't always demand. It keeps me current
with the actual engineering of AI products, not just the API surface.
If you're building something agentic and want a technical partner
who's done the depth work, my DMs are open. I take on AI engineering
the same way I take on greenfield builds and production rebuilds:
milestone-based, with judgment and ownership.