
Familiar — AI Encounter Generation Engine

Zona Gilreath


AI-powered D&D 5e encounter generator
Familiar generates structured, ready-to-run encounter sheets for dungeon masters. It takes optional form input and produces nine distinct encounter types — each with its own schema, run-sheet layout, and type-specific guidance — driven by a RAG pipeline built on the D&D 5e SRD.

The retrieval problem

The D&D 5e SRD covers hundreds of monsters, hundreds of spells, and a large body of rules and encounter design guidance. Loading it all into every generation request is wasteful and context-inefficient. Loading none of it produces hallucinated stat blocks the moment the model reaches for a specific creature.
The solution is a two-tier RAG architecture. A curated subset of the SRD — encounter design guidelines, rules for skill challenges, trap design principles, social encounter frameworks, and general guidance for each encounter type — loads into a Gemini API-level context cache at server startup and stays warm across requests. The large datasets (monster stat blocks and spells) stay out of the context window entirely, retrieved only when the model asks for them.

Tool-backed SRD datasets

The model has access to four tools, organized by dataset. Two operate on the creature dataset (monsters and animals parsed from the SRD markdown) and two on the spell dataset. Each dataset pairs a search tool — filtering creatures by name, type, CR range, and size, and spells by name, school, class, and concentration — with a detail tool that returns the full stat block or spell description by exact name.
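A creature search backed by an in-memory dataset can be sketched as below. The field and function names are illustrative, not the project's actual schema; the point is that the search tool returns lightweight matches, leaving the full stat block to the companion detail tool.

```typescript
// Hypothetical shape of a parsed SRD creature entry and the filter behind
// the creature search tool.
interface Creature {
  name: string;
  type: string; // "beast", "undead", ...
  size: string; // "Small", "Large", ...
  cr: number;   // challenge rating as a number (CR 1/2 => 0.5)
}

interface CreatureQuery {
  name?: string; // substring match, case-insensitive
  type?: string;
  size?: string;
  minCr?: number;
  maxCr?: number;
}

// Every provided filter must match; omitted filters are ignored.
function searchCreatures(dataset: Creature[], q: CreatureQuery): Creature[] {
  return dataset.filter((c) =>
    (q.name === undefined || c.name.toLowerCase().includes(q.name.toLowerCase())) &&
    (q.type === undefined || c.type === q.type) &&
    (q.size === undefined || c.size === q.size) &&
    (q.minCr === undefined || c.cr >= q.minCr) &&
    (q.maxCr === undefined || c.cr <= q.maxCr)
  );
}
```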
The generation loop runs up to ten tool call rounds. On each round the model can issue multiple calls in parallel; the server executes them, appends the results to the conversation, and continues. The model typically searches first to find candidates, then fetches full stat blocks for the creatures it selects. The loop terminates when the model returns a text response with no pending tool calls.
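The loop's control flow can be sketched as follows, with the Gemini client abstracted behind a plain callback. The `ModelTurn` and `ToolCall` shapes are stand-ins for the SDK's actual types.

```typescript
// Hypothetical turn shapes; the real Gemini SDK types differ.
interface ToolCall { name: string; args: Record<string, unknown>; }
interface ModelTurn { text?: string; toolCalls: ToolCall[]; }

type ToolExecutor = (call: ToolCall) => string;
type Model = (history: string[]) => ModelTurn;

const MAX_ROUNDS = 10;

function runGeneration(model: Model, execute: ToolExecutor): string {
  const history: string[] = [];
  for (let round = 0; round < MAX_ROUNDS; round++) {
    const turn = model(history);
    // Terminate when the model returns text with no pending tool calls.
    if (turn.toolCalls.length === 0) return turn.text ?? "";
    // Execute every call issued this round and append results to the conversation.
    for (const call of turn.toolCalls) {
      history.push(`${call.name} -> ${execute(call)}`);
    }
  }
  return ""; // round budget exhausted without a final text response
}
```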

Encounter schema design

The output is a discriminated union of nine encounter kinds: combat, puzzle, social, skill challenge, investigation, trap, exploration, chase, and hazard. Every kind shares a common envelope — title, goal, stakes, setup, and per-class spotlight hooks — and extends it with a type-specific payload.
The payloads encode design intent structurally. A combat encounter carries creature roles (brute, controller, lurker, etc.), terrain features typed as cover, obstacle, hazard, or interactable, an XP budget, and tactical notes per creature type. A social encounter carries NPC profiles with ideals, bonds, flaws, objection lists, and patience windows. A puzzle carries clue sets where each clue names the conclusion it supports and how it’s discovered. These aren’t free-form strings — the schema constrains what the model can return and guarantees the UI has typed fields to render.
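A trimmed sketch of that union, covering two of the nine kinds with illustrative field names (the real schema has more), shows how narrowing on the discriminant gives the UI typed fields:

```typescript
// Common envelope shared by every encounter kind.
interface EncounterBase {
  title: string;
  goal: string;
  stakes: string;
  setup: string;
  spotlightHooks: Record<string, string>; // per-class hooks
}

type CreatureRole = "brute" | "controller" | "lurker" | "skirmisher";
type TerrainKind = "cover" | "obstacle" | "hazard" | "interactable";

interface CombatEncounter extends EncounterBase {
  kind: "combat";
  creatures: { name: string; role: CreatureRole; tactics: string }[];
  terrain: { kind: TerrainKind; description: string }[];
  xpBudget: number;
}

interface SocialEncounter extends EncounterBase {
  kind: "social";
  npcs: { name: string; ideal: string; bond: string; flaw: string;
          objections: string[]; patienceWindow: string }[];
}

type Encounter = CombatEncounter | SocialEncounter; // | Puzzle | ... (seven more)

// Switching on `kind` narrows to the type-specific payload.
function summarize(e: Encounter): string {
  switch (e.kind) {
    case "combat": return `${e.title}: ${e.creatures.length} creatures, ${e.xpBudget} XP`;
    case "social": return `${e.title}: ${e.npcs.length} NPCs`;
  }
}
```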
The encounter design guidelines in the cached context specify what good looks like for each type: combat encounters require at least two terrain features; puzzles require at least three clues per conclusion and two solution paths; investigation nodes must each carry clues that are independently sufficient; traps must include at least two countermeasures; social encounters must have branching consequences for success, partial success, and failure. The schema enforces the structure; the guidelines shape the content.
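Two of those puzzle guidelines could be checked mechanically as below. This is illustrative only: in the project these rules live as prompt guidance in the cached context, not as validation code, and the `Puzzle` fields here are assumed names.

```typescript
// Hypothetical puzzle payload, reduced to the fields the guidelines mention.
interface Puzzle {
  kind: "puzzle";
  clues: { conclusion: string; discovery: string }[];
  solutionPaths: string[];
}

// Returns a human-readable list of guideline violations (empty = compliant).
function violations(p: Puzzle): string[] {
  const out: string[] = [];
  // Guideline: at least three clues supporting each conclusion.
  const perConclusion = new Map<string, number>();
  for (const clue of p.clues) {
    perConclusion.set(clue.conclusion, (perConclusion.get(clue.conclusion) ?? 0) + 1);
  }
  for (const [conclusion, n] of perConclusion) {
    if (n < 3) out.push(`conclusion "${conclusion}" has only ${n} clue(s)`);
  }
  // Guideline: at least two distinct solution paths.
  if (p.solutionPaths.length < 2) out.push("fewer than two solution paths");
  return out;
}
```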

Context caching

The Gemini API supports explicit context caching: a named, server-side cache that stores a system prompt, a priming turn, and tool declarations against a specific model. Cached content is billed at a reduced token rate and doesn’t count toward per-request input tokens.
Familiar creates the cache at first request and keeps the reference for 30 minutes. Each generation call attaches the cache name rather than resending the full SRD context. If cache creation fails — cold start, quota limit, model mismatch — the server falls back to an uncached system prompt transparently. The degraded path is slower and more expensive per request but otherwise identical in behavior.
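The cache-or-fallback flow can be sketched with the Gemini client stubbed out as plain callbacks. `createCache`, `generateWithCache`, and `generateUncached` are hypothetical names, and the real calls are asynchronous.

```typescript
const CACHE_TTL_MS = 30 * 60 * 1000; // cache reference kept for 30 minutes

let cachedName: string | null = null;
let cachedAt = 0;

function generate(
  prompt: string,
  createCache: () => string,                        // returns the cache name
  generateWithCache: (cache: string, p: string) => string,
  generateUncached: (p: string) => string,          // resends full SRD context
): string {
  const now = Date.now();
  if (cachedName === null || now - cachedAt > CACHE_TTL_MS) {
    try {
      cachedName = createCache();
      cachedAt = now;
    } catch {
      cachedName = null; // cold start, quota limit, or model mismatch
    }
  }
  return cachedName !== null
    ? generateWithCache(cachedName, prompt) // attach the cache name only
    : generateUncached(prompt);             // slower and pricier, same behavior
}
```

The fallback is transparent to callers: both paths return the same shape, so a failed cache creation never surfaces as a request error.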

Tradeoffs

The generation endpoint buffers a complete JSON response rather than streaming partial text. Streaming structured data mid-object is fragile — partial JSON isn’t parseable, and a partially rendered run sheet is harder to read than a complete one that appears at once. The tradeoff is a longer wait before the first pixel updates, mitigated with a loading state.
The tool-call loop adds latency proportional to the number of rounds the model takes. A combat encounter that searches for creatures by CR and fetches two full stat blocks before generating adds 2–3 round-trips compared to a simple text generation. That cost is intentional — grounded stat blocks are worth the wait.

Posted Mar 17, 2026

Two-tier RAG pipeline on the Gemini API with 4 SRD-backed tools, 9 encounter types, and context caching. Up to 10 tool-call rounds per generation request.