ProposalGuard: Building Grounded AI Proposals with LangGraph
Portfolio project. Agentic AI engineering for RFP responses.
The problem I wanted to solve
AI-generated proposals are everywhere now. The output looks polished,
but the underlying claims often aren't real. Hallucinated case studies,
invented credentials, fabricated metrics. The polish is the feature.
The hallucinations are the liability.
I wanted to build the opposite. Output verified against real context,
scanned for bias, and human-reviewed before anything ships. Agentic AI
with guardrails wired in from the start, not bolted on after.
ProposalGuard is the lab where I'm working on it.
The architecture
A 5-node LangGraph pipeline. Each node has one job, and when a node
fails, it hands the failure context to the node that can respond to it.
01. Retrieve context (Haiku)
Embed the RFP, query a ChromaDB vector store of company case studies,
capabilities, and credentials. Haiku does the job. Fast retrieval with
reasonable filtering, not deep reasoning.
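The retrieval shape can be sketched in plain Python. This is a toy: the real node embeds with a vector model and queries ChromaDB, while this stand-in ranks documents by bag-of-words cosine similarity. All names here (`retrieve_context`, the sample store) are illustrative, not from the project.

```python
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a real vector model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_context(rfp: str, store: list[str], k: int = 2) -> list[str]:
    # Rank stored case studies / credentials by similarity to the RFP
    # and return the top k as generation context.
    q = embed(rfp)
    ranked = sorted(store, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]

store = [
    "case study: migrated a healthcare RFP pipeline to cloud infrastructure",
    "credential: certified cloud architect team",
    "case study: built an e-commerce recommendation engine",
]
print(retrieve_context("cloud migration RFP for a healthcare provider", store))
```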
02. Generate draft (Sonnet / Opus)
Take the retrieved context plus the RFP, draft the full proposal.
Sonnet by default. Opus for high-stakes or longer proposals where
reasoning depth matters more than speed.
03. Grounding check (Sonnet)
This is where most proposal AI breaks. Every claim in the draft gets
verified against the retrieved context. Hallucinated metrics, fabricated
case studies, invented credentials all get flagged. If the grounding
score falls below threshold, the node passes failure context back to
node 02 for regeneration with explicit feedback about what didn't
ground.
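The control flow of the grounding node can be sketched like this. The real judge is an LLM call (Sonnet); this stand-in marks a claim grounded only if its terms appear in the retrieved context, which is enough to show the score / flag / pass-or-regen shape. Function and field names are assumptions.

```python
def grounding_check(claims: list[str], context: str,
                    threshold: float = 0.8) -> dict:
    # Toy grounding: a claim passes if every word in it appears in the
    # retrieved context. The real node asks an LLM judge instead.
    ctx = context.lower()
    failures = [c for c in claims
                if not all(w in ctx for w in c.lower().split())]
    score = 1 - len(failures) / len(claims) if claims else 1.0
    return {
        "score": score,
        "passed": score >= threshold,
        # On failure, this list flows back into the generator's prompt.
        "failed_claims": failures,
    }
```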
04. Bias detection (Haiku)
Scan output for tone and demographic bias. Pattern-match against known
bias surfaces: gendered language, demographic assumptions, exclusionary
framing. Cheaper model, narrower job.
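The flag-and-report shape of this node looks roughly like the following. The real node prompts Haiku rather than matching regexes, and the pattern list here is a hypothetical stand-in for the bias surfaces it scans.

```python
import re

# Hypothetical pattern list standing in for an LLM-backed scan.
BIAS_PATTERNS = {
    "gendered language": r"\b(chairman|manpower|salesmen)\b",
    "exclusionary framing": r"\b(young and energetic|digital native)\b",
}

def scan_bias(text: str) -> list[tuple[str, str]]:
    # Return (category, matched phrase) pairs for every hit so the
    # reviewer sees exactly what was flagged and why.
    hits = []
    for category, pattern in BIAS_PATTERNS.items():
        for m in re.finditer(pattern, text, re.IGNORECASE):
            hits.append((category, m.group(0)))
    return hits
```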
05. Human review (HITL)
Stream the final draft to the frontend via SSE. A human approves, edits,
or rejects before anything ships. AI does the work, the human owns the
output.
Demo
Architecture decisions worth calling out
Models matched to jobs. Haiku for retrieval and bias scanning.
Sonnet for grounding. Sonnet or Opus for generation. Cost stays
predictable, latency stays acceptable, and the smart models work where
reasoning actually matters.
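The routing reduces to a small lookup. Model IDs below are placeholders, and `pick_model` is an illustrative name, but the mapping mirrors the cost/latency reasoning above.

```python
def pick_model(node: str, high_stakes: bool = False) -> str:
    # Cheap models on narrow jobs, expensive models only where
    # reasoning depth pays for itself.
    routing = {
        "retrieve": "claude-haiku",
        "bias_check": "claude-haiku",
        "grounding": "claude-sonnet",
        "generate": "claude-opus" if high_stakes else "claude-sonnet",
    }
    return routing[node]
```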
Failure injected back into the generator. The common LLM-judge
pattern is to score outputs, fail them, and regenerate blind. That
wastes tokens and rarely converges. ProposalGuard passes the grounding
failure context (which claims failed, what the retrieved context
actually said) back into the generator's prompt so the regen has the
information to fix the problem.
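Concretely, the regen prompt names each failed claim next to what the context actually said, instead of just saying "try again." A minimal sketch, with hypothetical function and parameter names:

```python
def build_regen_prompt(rfp: str, context: str, failed_claims: list[str],
                       context_excerpts: list[str]) -> str:
    # Pair each ungrounded claim with the relevant context excerpt so
    # the regeneration has the information needed to fix it.
    feedback = "\n".join(
        f'- Claim "{claim}" was not supported. Context says: "{excerpt}"'
        for claim, excerpt in zip(failed_claims, context_excerpts)
    )
    return (
        f"Draft a proposal for this RFP:\n{rfp}\n\n"
        f"Use only this context:\n{context}\n\n"
        f"Your previous draft failed grounding on these claims:\n{feedback}\n"
        "Remove or correct them; do not invent new facts."
    )
```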
Stateless graph state. Each node reads from and writes to LangGraph
state. Nothing depends on a side channel. Replays work, traces work,
debugging works.
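The state amounts to one typed dictionary that every node reads from and writes to. Field names here are assumptions, not the project's actual schema; the point is that a run can be replayed from state alone.

```python
from typing import TypedDict

class ProposalState(TypedDict, total=False):
    # Single source of truth for the pipeline: no node keeps hidden
    # state, so replaying a run is just replaying this dict.
    rfp: str
    retrieved_context: list[str]
    draft: str
    grounding_score: float
    failed_claims: list[str]
    bias_flags: list[str]
    retries: int
    approved: bool
```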
Streaming via SSE, not WebSockets. SSE is simpler. One-way matches
the data flow. The frontend doesn't need to maintain a persistent
connection. Less to break.
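The wire format underlines the simplicity argument: an SSE stream is just `data:` lines over a plain HTTP response, one per chunk, each terminated by a blank line. A minimal framing sketch (the `[DONE]` sentinel is a common convention, assumed here, not confirmed from the project):

```python
def sse_events(chunks):
    # Server-Sent Events framing: "data: <payload>\n\n" per message,
    # streamed over an ordinary HTTP response. No handshake, no
    # bidirectional channel to keep alive.
    for chunk in chunks:
        yield f"data: {chunk}\n\n"
    yield "data: [DONE]\n\n"
```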
What I'm working on now
Being honest in this section because "shipped, perfect, done" isn't
true for what this is. ProposalGuard is an active build.
The hardest problem is the grounding regeneration loop. Early versions
had the grounding node scoring claims below threshold consistently and
triggering regen loops that didn't converge. Root causes: feedback
wasn't passed back into the generator prompt, the threshold was
miscalibrated against what "good enough" actually meant, and no
max-retries cap meant the pipeline could spin.
Current work:
Explicit feedback injection from grounding into the generator prompt
Threshold calibration from real eval data
Max-retries cap with graceful degradation
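The retry cap reduces to a small routing function on the graph's conditional edge. Node names and the cap value below are illustrative; "graceful degradation" here means routing to human review with the grounding failures still attached rather than spinning on regeneration.

```python
MAX_RETRIES = 3  # assumed cap, not the project's actual value

def route_after_grounding(state: dict) -> str:
    # Conditional-edge logic: pass forward on success, regenerate on
    # failure, but never loop past the cap.
    if state["grounding_passed"]:
        return "bias_check"
    if state.get("retries", 0) >= MAX_RETRIES:
        return "human_review"  # degrade gracefully, failures flagged
    return "generate"
```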
Roadmap:
Langfuse observability for tracing the full pipeline
Evaluation suite running against a real proposal dataset
Node 3.5: context leakage detection for prompt injection defense
DECISIONS.md documenting every architectural call and tradeoff
How I work
I build portfolio projects like this when I want to push deeper into
systems that client work doesn't always demand. It keeps me current
with the actual engineering of AI products, not just the API surface.
If you're building something agentic and want a technical partner
who's done the depth work, my DMs are open. I take on AI engineering
the same way I take on greenfield builds and production rebuilds:
milestone-based, with judgment and ownership.