Dogfood Methodology
The regression workflow is designed to answer two different questions:
- Can AgentDocs compile and audit this documentation deterministically?
- Can a coding agent safely complete a specific task using the generated context?
The first question is automated. The second remains an explicit human judgment.
The published findings preserve the June 11, 2026 baseline, a post-hardening rerun captured on June 12, 2026, and an agent workflow-layer rerun captured on June 16, 2026. Prepared crawl artifacts were rebuilt without a live network recrawl; live website results can change as upstream documentation changes.
Standard capture
Every prepared target runs:
pnpm regression:dogfood -- <target-directory>
The runner builds the target twice, compares generated-artifact hashes, runs the readiness doctor, and captures the top five results for:
authentication
quickstart
error handling
Workflow-specific queries are repeatable:
pnpm regression:dogfood -- .dogfood/fastify \
  --name fastify-local-docs \
  --query schema-validation="schema validation" \
  --query plugin=plugin \
  --query migration=migration
The runner can also enforce machine-checkable expectations:
pnpm regression:dogfood -- .dogfood/fastify \
  --query migration=migration \
  --expect-top migration="V5 Migration Guide" \
  --expect-no-mixed migration=version \
  --expect-task-pack quickstart
Use --expect-warning <label>=<warning-code> when a deliberately broad query must report a context conflict. Failed expectations are preserved in summary.json and fail the regression.
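As a rough illustration of what such expectations could check, the sketch below evaluates one --expect-top and one --expect-no-mixed assertion against captured results. The SearchResult shape, the function names, and the reading of "no mixed" as "at most one distinct value for the field" are assumptions made for illustration, not the runner's actual implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchResult:
    # Hypothetical shape of one captured search result.
    title: str
    version: Optional[str] = None


def check_expect_top(results: List[SearchResult], expected_title: str) -> bool:
    """Pass only when the first-ranked result carries the expected title."""
    return bool(results) and results[0].title == expected_title


def check_expect_no_mixed(results: List[SearchResult], field: str) -> bool:
    """Pass only when the results share at most one distinct value for `field`."""
    values = {getattr(r, field) for r in results if getattr(r, field) is not None}
    return len(values) <= 1


# Illustrative data only; these are not recorded findings.
results = [
    SearchResult("V5 Migration Guide", version="v5"),
    SearchResult("Plugins", version="v5"),
]

failures = []
if not check_expect_top(results, "V5 Migration Guide"):
    failures.append("expect-top migration")
if not check_expect_no_mixed(results, "version"):
    failures.append("expect-no-mixed migration=version")

# Any recorded failure makes the whole regression fail.
exit_code = 1 if failures else 0
```

Collecting every failure before deciding the exit code mirrors the stated behavior that failed expectations are preserved rather than aborting on the first miss.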
Task-pack routing is captured with deterministic workflow commands:
pnpm regression:dogfood -- .dogfood/fastify \
  --routing-goal migration="migrate to Fastify v5" \
  --expect-route migration=migration
--routing-goal records the output of agentdocs handoff and agentdocs verify-context for the goal. --expect-route turns that routing goal into a regression assertion. Without --expect-route, routing is report-only.
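The difference between report-only routing and an asserted route can be sketched as a three-way outcome. The function name and the outcome labels are hypothetical and chosen only to illustrate the rule stated above.

```python
from typing import Optional


def routing_outcome(actual_route: str, expected_route: Optional[str]) -> str:
    """Classify a routing result.

    With no declared expectation the result is only reported; it can
    neither pass nor fail the regression.
    """
    if expected_route is None:
        return "report-only"
    return "pass" if actual_route == expected_route else "fail"
```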
Workflow-layer reruns may also check:
agentdocs status
agentdocs handoff "<task>"
agentdocs verify-context --task "<task>"
These commands answer a different question from the original build/search harness. They do not prove that an agent completed the implementation task; they show whether the built context is fresh and whether AgentDocs can produce a task-shaped, verified handoff for the requested goal.
CI runs pnpm regression:fixtures against the committed hardening corpus. It checks version, framework, and router filtering; context-conflict warnings; tolerant MDX diagnostics; and quickstart task-pack generation without relying on live network sources.
Recorded evidence
Each target records:
- pages collected;
- chunks generated;
- entities extracted;
- task packs generated;
- readiness score;
- broken links;
- warnings and deprecations;
- top five search results for standard and workflow-specific queries;
- task-pack routing results when routing goals are declared;
- first-build and repeated-build output hashes;
- explicit search-quality judgments;
- explicit agent_task_passed judgment;
- automated expectation results;
- notes and preserved failure details.
Successful target output is written under results/:
results/
build.json
build-repeat.json
doctor.json
search-auth.json
search-quickstart.json
search-errors.json
routing-<label>-handoff.json
routing-<label>-verify.json
summary.json
  summary.csv
If a command fails, the runner writes failure.json with the command, exit code, and captured diagnostics.
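A minimal sketch of that failure-preservation pattern follows. The key names inside failure.json, the choice of stderr as the captured diagnostics, and the run_step helper are all assumptions; the actual file layout may differ.

```python
import json
import subprocess
from pathlib import Path
from typing import List


def run_step(command: List[str], out_dir: Path) -> bool:
    """Run one command; on a non-zero exit, write failure.json and return False."""
    proc = subprocess.run(command, capture_output=True, text=True)
    if proc.returncode == 0:
        return True

    # Hypothetical key names; only the three kinds of data come from the text.
    failure = {
        "command": command,
        "exitCode": proc.returncode,
        "diagnostics": proc.stderr,
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / "failure.json").write_text(json.dumps(failure, indent=2))
    return False
```

Writing the record before returning means the diagnostics survive even if the caller stops the run immediately afterwards.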
Evaluation rules
A deterministic build is necessary, but it does not prove that the context is correct. A readiness score is informative, but it does not prove that a specific workflow is safe. Relevant search results are useful, but they do not prove that an implementation task can be completed.
For that reason:
- repeated-build hashes must match;
- search quality is judged separately for standard queries;
- task-pack routing expectations fail only when explicitly declared;
- workflow-specific retrieval is inspected for version, framework, router, and runtime mixing;
- failures must preserve actionable diagnostics;
- agent_task_passed stays unknown until the specified task is completed using the generated context.
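The first and last rules can be illustrated with a short sketch: hash every file in two build outputs, require the hash maps to match, and leave the agent judgment at unknown. The directory layout, the use of SHA-256, and the function names are assumptions for illustration; the text does not specify how the runner computes its hashes.

```python
import hashlib
from pathlib import Path
from typing import Dict


def artifact_hashes(root: Path) -> Dict[str, str]:
    """Map each file's relative path to the SHA-256 digest of its bytes."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def builds_match(first: Path, repeat: Path) -> bool:
    """True only when both builds contain the same files with the same bytes."""
    return artifact_hashes(first) == artifact_hashes(repeat)


def initial_judgment(first: Path, repeat: Path) -> Dict[str, object]:
    """Automated result plus the human judgment, which starts as unknown."""
    return {
        "deterministic": builds_match(first, repeat),
        # Matching hashes never flip this value; only a completed task does.
        "agent_task_passed": "unknown",
    }
```

Keeping the two fields separate in the returned record reflects the point that a deterministic build is necessary evidence, not a verdict on the task.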
Agent task criteria
The matrix includes tasks such as:
- build Hono GET and POST routes with typed validation and deploy to Cloudflare Workers;
- build a Fastify v5 server with JSON schema validation, a plugin, and structured error handling;
- implement a React mutation with invalidation using React-specific TanStack Query evidence only;
- build a current Next.js App Router POST route handler;
- use Supabase auth and Row Level Security without exposing secret keys.
These tasks deliberately test whether generated context respects boundaries that matter in real implementation work.
Interpreting the published findings
The published table separates:
- passed regression: the automated build, audit, search capture, and repeated-build hash comparison all completed;
- failed regression: AgentDocs stopped and preserved diagnostics;
- blocked preparation: the source target could not be prepared;
- passed agent task: an agent completed the workflow successfully using the generated context;
- unknown agent task: the implementation task has not yet been judged.
This avoids turning a large page count or a high readiness score into a claim the evidence does not support.
For the exact target commands and task-specific pass criteria, see the Dogfood Workflow Matrix. For metric definitions, see the Evaluation Metrics Reference. For additional bounded live crawl examples, see Live Dogfood Runs.