Templates and Command Reference
Full generic templates for PRDs, phase plans, and specs. Complete command definitions for all 8 commands. Queue and metrics schemas. Adapt the structure to your stack; keep the constraints.
Each template below shows the generic structure first, then a concrete excerpt using the Linear running example. The structure is stack-agnostic. The examples use a specific stack (Next.js, tRPC, Drizzle, Neon) to make concepts concrete.
PRD Template
Header
# [Unit ID]: [Title]
Category: Frontend | Backend | Full Stack
Milestone: M[N]
Status: Draft | Verified
Dependencies: [Unit IDs]
Test Pair: [Unit ID]
Common Sections (All PRDs)
- Purpose - Why this unit exists. Consequence of not having it.
- Scope - In scope / out of scope. Out-of-scope items reference which unit handles them.
- Dependencies - Unit ID, title, status, why needed.
- Test Pair - Which unit verifies this one.
- Related Units - 4 relationship types: depends on, test pair, composes into, shares pattern with.
- Success Signals - Observable behaviors (not metrics).
- Metrics to Track - Quantifiable with targets and measurement method.
- Cross-Unit Integration Points - Sends to / receives from, with triggers and data shapes.
- Tier and Metering Constraints - Feature availability per tier. User experience at each limit.
- Configuration Surface - What is configurable, defaults, where it lives.
- Assumptions - Documented, not implicit.
- Open Questions - With resolution path. Zero allowed in phase plans.
- Cross-Cutting Features - Audit, search, notifications: follows defaults or overrides.
- Changelog - Date, change, reason.
Frontend Additions (F1-F13)
- F1: User Journeys - 4+ scenarios.
- F2: Page Structure
- F3: Interaction Patterns - Trigger, behavior, feedback, result.
- F4: States - 8 sub-variants: empty first-time, empty filtered, loading initial, loading refresh, loaded standard, loaded dense, error full, error partial.
- F5: Degradation - 3 levels.
- F6: Responsive
- F7: URL Structure
- F8: Keyboard
- F9: Accessibility
- F10: Performance
- F11: Brand
- F12: First-Time UX
- F13: UI Mock - Interactive HTML with state switcher.
Backend Additions (B1-B9)
- B1: Capability - Business-level.
- B2: Business Rules - Absolute constraints.
- B3: Data Flow Narratives - With failure paths.
- B4: Performance Targets
- B5: Reliability and Recovery
- B6: Security
- B7: Scalability - Launch, growth, scale.
- B8: Event Catalog
- B9: Output Mocks
Example header (Linear running example):
# M1-U03: Issue List View
Category: Frontend
Milestone: M1 (Issue Data Model + Views)
Status: Verified
Dependencies: M1-U02 (Issue CRUD API), M0-U08 (App Shell)
Test Pair: M1-U02 (verified by this frontend unit)
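The header fields are regular enough to be checked mechanically before a PRD goes to verify-prd. The following is a minimal sketch, assuming the header is a `# [Unit ID]: [Title]` line followed by `Key: value` lines. The `PrdHeader` type and `parsePrdHeader` function are hypothetical names introduced for illustration and are not part of the template.

```typescript
// Hypothetical names throughout: PrdHeader and parsePrdHeader are illustrative only.
type PrdHeader = {
  unitId: string;
  title: string;
  category: string;
  milestone: string;
  status: "Draft" | "Verified";
  dependencies: string[]; // unit IDs only, e.g. "M1-U02"
  testPair: string;
};

function parsePrdHeader(text: string): PrdHeader {
  const lines = text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);

  // First line: "# [Unit ID]: [Title]"
  const heading = /^#\s+([^:]+):\s+(.+)$/.exec(lines[0] ?? "");
  if (!heading) throw new Error("Expected a '# [Unit ID]: [Title]' heading");

  // Remaining lines: "Key: value"
  const fields = new Map<string, string>();
  for (const line of lines.slice(1)) {
    const idx = line.indexOf(":");
    if (idx > 0) fields.set(line.slice(0, idx).trim(), line.slice(idx + 1).trim());
  }
  const required = (key: string): string => {
    const value = fields.get(key);
    if (!value) throw new Error(`Missing header field: ${key}`);
    return value;
  };

  const status = required("Status");
  if (status !== "Draft" && status !== "Verified") {
    throw new Error(`Unknown status: ${status}`);
  }

  return {
    unitId: heading[1].trim(),
    title: heading[2].trim(),
    category: required("Category"),
    milestone: required("Milestone"),
    status,
    // Keep the leading unit ID and drop any parenthetical title.
    dependencies: required("Dependencies")
      .split(",")
      .map((dep) => dep.trim().split(/\s+/)[0]),
    testPair: required("Test Pair").split(/\s+/)[0],
  };
}
```

Failing fast on a missing field or an unknown status keeps template-conformance problems out of the later, more expensive verification steps.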
Phase Plan Template
16 Sections
- Overview - What this phase accomplishes in one paragraph.
- Codebase Snapshot - Existing tables, routers, components, constraint files. Used by verify-spec for drift detection.
- Pre-requisites - Codebase requirements (verified), human tasks (blocking/non-blocking), new dependencies (approval needed).
- Decisions - Every non-obvious choice: decision, context, rationale.
- Deferred Decisions - Question, why deferred, resolved by whom, measurable criteria.
- Tables and Schemas - Every column, type, default, constraint, index, RLS policy.
- API Contracts - Per procedure: name, type, auth, metering, input, output, validation, errors, optimistic strategy (Via UI / Via Cache / None).
- Background Jobs - Trigger, framework, expected duration, retry policy.
- Visual Reference - Mock file path and relevant states.
- Component References - Used (already built) and needed (build in this phase).
- Integration Points - Connects to and consumed by.
- Excerpted Guidelines - Rules copied verbatim from foundation docs. Only rules relevant to touched packages.
- Risks - Implementation risks with impact, likelihood, mitigation.
- Test Strategy - Categories, criteria per category, background job test approach (framework-specific), performance thresholds.
- Phase Completion Criteria - Tests, code quality, visual match, integration, documentation.
- Changelog
Key Detail: Optimistic Strategy per Mutation
| Pattern | When | Implementation |
|---|---|---|
| Via UI | Additive (create, toggle) | Render optimistic state in JSX using the mutation's isPending and variables. Do not touch the cache. |
| Via Cache | Reorder, move, or delete that can be rolled back | Manipulate the query cache in onMutate. Save the previous state. Restore it in onError. Invalidate in onSettled. |
| None | Irreversible (permanent delete, submit) | Wait for the server. Show loading on the trigger element. |
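The Via Cache row compresses a four-step lifecycle into one cell, so a sketch of the control flow may help. The code below is a self-contained illustration, not any library's API: `ListCache`, `runMutation`, and `deleteIssueHooks` are hypothetical stand-ins for whatever cache and mutation hooks your data-fetching library provides. Only the hook names (onMutate, onError, onSettled) come from the table.

```typescript
type Issue = { id: string; title: string };

// Hypothetical minimal cache; a real project would use its data-fetching library's cache.
class ListCache {
  private data: Issue[] = [];
  get(): Issue[] {
    return this.data;
  }
  set(next: Issue[]): void {
    this.data = next;
  }
}

type MutationHooks<TVars, TContext> = {
  onMutate: (vars: TVars) => TContext;
  onError: (err: unknown, vars: TVars, context: TContext) => void;
  onSettled: () => void;
};

// Hypothetical runner showing the order in which the hooks fire.
async function runMutation<TVars, TContext>(
  mutate: (vars: TVars) => Promise<void>,
  vars: TVars,
  hooks: MutationHooks<TVars, TContext>,
): Promise<boolean> {
  const context = hooks.onMutate(vars); // optimistic update happens before the request
  try {
    await mutate(vars);
    return true;
  } catch (err) {
    hooks.onError(err, vars, context); // roll back using the saved context
    return false;
  } finally {
    hooks.onSettled(); // runs on both success and failure
  }
}

function deleteIssueHooks(
  cache: ListCache,
  invalidate: () => void,
): MutationHooks<string, { previous: Issue[] }> {
  return {
    onMutate: (id) => {
      const previous = cache.get(); // save previous
      cache.set(previous.filter((issue) => issue.id !== id)); // optimistic removal
      return { previous };
    },
    onError: (_err, _id, context) => cache.set(context.previous), // restore
    onSettled: invalidate, // invalidate so the next read refetches
  };
}
```

The saved `previous` value travels through the returned context rather than a shared variable, so two overlapping mutations cannot overwrite each other's rollback state.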
Key Detail: Background Job Test Approach
For event-driven job frameworks (e.g., Inngest): use the framework's test engine to test full function execution, individual steps, and mocked dependencies.
For long-running task frameworks (e.g., Trigger.dev): extract business logic into pure functions. Test the pure functions with a standard test runner. Mock the task wrapper.
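The pure-function approach can be sketched as follows. The job itself (an overdue-issue reminder) and every name in the block (`IssueRow`, `selectOverdueIssueIds`, `ReminderDeps`, `runOverdueReminder`) are hypothetical and chosen only for illustration. No framework API is shown; in a real project the wrapper body would sit inside the framework's own task definition.

```typescript
type IssueRow = { id: string; dueAt: Date | null; completedAt: Date | null };

// Pure business logic: deterministic, no I/O, no framework imports.
function selectOverdueIssueIds(issues: IssueRow[], now: Date): string[] {
  return issues
    .filter(
      (issue) =>
        issue.completedAt === null &&
        issue.dueAt !== null &&
        issue.dueAt.getTime() < now.getTime(),
    )
    .map((issue) => issue.id);
}

// Hypothetical dependencies the task wrapper would supply in production.
type ReminderDeps = {
  loadIssues: () => Promise<IssueRow[]>;
  sendReminders: (issueIds: string[]) => Promise<void>;
  now: () => Date;
};

// Thin wrapper: orchestration only, so it is cheap to mock in tests.
async function runOverdueReminder(deps: ReminderDeps): Promise<number> {
  const issues = await deps.loadIssues();
  const overdue = selectOverdueIssueIds(issues, deps.now());
  if (overdue.length > 0) await deps.sendReminders(overdue);
  return overdue.length;
}
```

Passing `now` in as an argument keeps the selection logic testable with fixed dates, with no clock mocking required.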
Spec Template
Header
# Spec: [Unit ID]-P[N]-S[N] [Descriptive Name]
Phase: [Unit ID]-P[N]
Category: Frontend | Backend | Full Stack
Status: Draft | Verified | Tests Written | Implemented
Estimated Duration: [N] hours
Test Count: [N] tests in Block A
Context Budget: [N]K tokens (tests: [N]K, tasks: [N]K, refs: [N]K)
Block A: Test Spec
Categories: A1 (Unit/API), A2 (Visual/Component), A3 (Interaction/Accessibility), A4 (E2E), A5 (Performance).
Per test: unique ID, name, setup, input, assertions (deep only). No toBeDefined(), toBeTruthy(), or toBeFalsy() as standalone assertions. Every async test declares expect.assertions(N). Every DB test verifies state with an independent query.
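The shallow-assertion rule can be enforced mechanically. The following is a rough sketch of such a check, assuming test sources are available as plain strings. `findShallowAssertions` is a hypothetical helper, not an ESLint rule or a test-runner API. It is also stricter than the rule as stated: it flags every use of the three matchers, including uses that sit alongside deeper assertions.

```typescript
type ShallowFinding = { line: number; matcher: string };

// Hypothetical helper: reports the first shallow matcher found on each line.
function findShallowAssertions(source: string): ShallowFinding[] {
  const pattern = /\.(toBeDefined|toBeTruthy|toBeFalsy)\(\s*\)/;
  const findings: ShallowFinding[] = [];
  source.split("\n").forEach((text, index) => {
    const match = pattern.exec(text);
    if (match) findings.push({ line: index + 1, matcher: match[1] });
  });
  return findings;
}
```

A text scan like this is only a first filter. An AST-based lint rule would be needed to tell a standalone shallow assertion apart from one that accompanies a deep assertion on the same value.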
Block B: Implementation Spec
- B1: Pre-execution checks - Dependencies verified, pattern files exist, no merge conflicts.
- B2: Tasks - Sequential. Per task: action, file, what to build, pattern reference, mapped test IDs.
- B3: Deferred decision tasks - Measurable criteria, both paths specified.
- B4: Post-execution (ex-impl, Tier 1-2) - Full test suite, typecheck, lint, security, self-review.
- B5: Verification (ex-verify, Tier 3-4) - Mutation testing (StrykerJS, >=70%), exploratory browser testing, code review, migration generation, push to main.
- B6: Commit - Message format, squash, local only (ex-verify pushes).
Schema Tasks Use Push, Not Generate
During execution, drizzle-kit push (or equivalent) applies the schema directly to the isolated database branch, so no migration files are generated. ex-verify generates the canonical migration files as its final step before pushing to main. This avoids migration file conflicts between parallel specs.
Example test (Linear running example):
## A1-T3: issue.create validates title length
| Field | Value |
|-----------|-------|
| Setup | Create workspace and team via factories. |
| Input | Call issue.create with title of 501 characters. |
| Assert | Throws TRPCError with code BAD_REQUEST. |
| | Error message contains "title". |
| | No row inserted into issues table (verify: SELECT COUNT(*) FROM issues WHERE team_id = [team.id] returns 0). |
Command Reference
Planning Commands
| Command | Model | Input | Output | Key Checks |
|---|---|---|---|---|
| verify-prd | Deep reasoning | PRD + all existing PRDs + foundation docs | Verification report | 18 checks: template conformance, terminology, scope overlap, dependency validation, integration symmetry, states coverage, mock verification, etc. |
| generate-phase | Deep reasoning | Verified PRD + codebase | 1-3 phase plan files | Resolves all open questions. Every section filled. Tables with RLS. API contracts with optimistic strategy. Job test approaches specified. |
| generate-spec | Fast | Phase plan + codebase | 1-3 spec files | Splitting by test count (<25), duration (<2h), context (<50K tokens). Deep assertion enforcement. Context budget in header. |
| verify-spec | Fast | Spec + codebase + phase plan | Updated spec or pass | 8 drift checks (file existence, schema match, new files) + 7 fresh-eyes checks (self-contained, deep assertions, optimistic strategy match). |
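The generate-spec splitting limits in the table reduce to three comparisons. This is a minimal sketch using the thresholds stated above (fewer than 25 tests, under 2 hours, under 50K tokens). The `SpecEstimate` type and `splitReasons` function are hypothetical names, and how the estimates are produced is outside the sketch.

```typescript
type SpecEstimate = {
  testCount: number;
  estimatedHours: number;
  contextTokens: number;
};

// Limits taken from the generate-spec row; adjust them to your own toolchain.
const SPEC_LIMITS = { maxTests: 25, maxHours: 2, maxContextTokens: 50000 };

// Returns one reason per exceeded limit; an empty array means the spec fits.
function splitReasons(spec: SpecEstimate): string[] {
  const reasons: string[] = [];
  if (spec.testCount >= SPEC_LIMITS.maxTests) {
    reasons.push(`test count ${spec.testCount} is not below ${SPEC_LIMITS.maxTests}`);
  }
  if (spec.estimatedHours >= SPEC_LIMITS.maxHours) {
    reasons.push(`duration ${spec.estimatedHours}h is not below ${SPEC_LIMITS.maxHours}h`);
  }
  if (spec.contextTokens >= SPEC_LIMITS.maxContextTokens) {
    reasons.push(`context ${spec.contextTokens} tokens is not below ${SPEC_LIMITS.maxContextTokens}`);
  }
  return reasons;
}
```

Returning reasons instead of a boolean lets the command report which limit forced the split.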
Execution Commands
| Command | Model | Input | Output | Key Actions |
|---|---|---|---|---|
| ex-test | Fast | Spec Block A | Test files (all failing) | Rebase. DB branch create/drop. Generate tests. ESLint shallow check. Code review. Commit locally. |
| ex-impl | Fast | Spec Block B + test files | Implementation (Tier 1-2 verified) | Rebase. DB branch. Execute tasks sequentially. Fix-and-retest loop. Tier 1 (automated) + Tier 2 (self-review). Squash commit locally. |
| ex-verify | Deep reasoning | Spec + phase + PRD + code | Verified code on main | DB branch. StrykerJS >=70%. Exploratory testing (5-15 scenarios via extended reasoning). Code review. Migration generate. Rebase + push. |
Feedback Command
| Command | Model | Input | Output | Key Actions |
|---|---|---|---|---|
| apply-learnings | Deep reasoning | Exec log (3 sections) + metrics | Updated documents | Per-spec: categorize findings into 6 types, propose diffs, human approves. Per-milestone: trend analysis across all specs. |
Self-Scheduling
All execution commands support running without arguments. They read .machine (machine identity) and agents/queue.json (spec status and dependencies), filter by eligibility, apply priority (unblocking > critical path > conflict avoidance > FIFO), and recommend the next spec. Human confirms before proceeding.
Queue Schema
agents/queue.json
{
  "version": 1,
  "updated_at": "2026-03-26T14:30:00Z",
  "updated_by": "machine-a1",
  "ready": [
    {
      "spec_id": "M1-U02-P1-S1",
      "unit": "M1-U02",
      "title": "Issue CRUD Backend",
      "category": "backend",
      "packages_touched": ["db", "api", "types"],
      "files_modified": ["packages/api/src/root.ts"],
      "dependencies": ["M0-U01-P1-S1"],
      "estimated_hours": 1.5,
      "priority": 1
    }
  ],
  "in_progress": [],
  "completed": []
}
Status progression: ready > testing > tests-written > implementing > implemented > verifying > complete
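The progression can be encoded as an ordered list so that commands reject out-of-order updates. This sketch assumes transitions move strictly one step forward. Whether a spec may move backward (for example, for rework) is not specified here, so the sketch does not model it. `SPEC_STATUSES`, `nextStatus`, and `canTransition` are hypothetical names.

```typescript
const SPEC_STATUSES = [
  "ready",
  "testing",
  "tests-written",
  "implementing",
  "implemented",
  "verifying",
  "complete",
] as const;

type SpecStatus = (typeof SPEC_STATUSES)[number];

// Returns the status that follows `current`, or null once the spec is complete.
function nextStatus(current: SpecStatus): SpecStatus | null {
  const index = SPEC_STATUSES.indexOf(current);
  return SPEC_STATUSES[index + 1] ?? null;
}

// Assumption: only single forward steps are valid.
function canTransition(from: SpecStatus, to: SpecStatus): boolean {
  return nextStatus(from) === to;
}
```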
Conflict avoidance: files_modified tracks existing files that will be changed (not new files created). Two specs both creating new files in the same directory is fine. Two specs both modifying the same existing file must run sequentially.
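The conflict rule and the dependency check can be sketched against the queue shape shown above. Several things here are assumptions and not part of the schema: that entries in in_progress and completed have the same fields as entries in ready, that a dependency counts as met once its spec_id appears in completed, and that a lower priority number sorts first. The fuller ordering (unblocking, critical path, conflict avoidance, FIFO) is not implemented. `QueueEntry`, `SpecQueue`, `conflictsWith`, and `eligibleSpecs` are hypothetical names.

```typescript
type QueueEntry = {
  spec_id: string;
  files_modified: string[];
  dependencies: string[];
  priority: number;
};

type SpecQueue = {
  ready: QueueEntry[];
  in_progress: QueueEntry[];
  completed: QueueEntry[];
};

// Two specs conflict only when both modify the same existing file.
function conflictsWith(a: QueueEntry, b: QueueEntry): boolean {
  const files = new Set(a.files_modified);
  return b.files_modified.some((file) => files.has(file));
}

// Ready specs whose dependencies are complete and that do not touch
// a file already being modified by an in-progress spec.
function eligibleSpecs(queue: SpecQueue): QueueEntry[] {
  const done = new Set(queue.completed.map((entry) => entry.spec_id));
  return queue.ready
    .filter((entry) => entry.dependencies.every((dep) => done.has(dep)))
    .filter((entry) => !queue.in_progress.some((busy) => conflictsWith(entry, busy)))
    .sort((x, y) => x.priority - y.priority); // assumption: lower number runs first
}
```

Because new files never appear in files_modified, two specs that only create files in the same directory pass the conflict check, which matches the rule above.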
Execution Log Structure
agents/exec-log-[SpecID].md
Three sections, each appended by its respective command:
Section 1 (ex-test): Session timing, model, machine, memory snapshots, DB branch lifecycle, test files created (count per file), self-check results (spec match, auto-generated tests), ESLint results, code review, test run results (new failing, existing passing), commit hash, key learnings.
Section 2 (ex-impl): Session timing, model, machine, memory, DB branch lifecycle, task execution table (per-task: duration, tests run, fix cycles, notes), deferred decisions resolved (question, answer, measurement), Tier 1-2 results (pass/fail per check), code review, commit hash, key learnings.
Section 3 (ex-verify): Session timing, model, machine, memory, DB branch lifecycle, mutation testing (initial/final score, assertions deepened, time), exploratory testing (scenarios planned/executed, issues by severity, fixed/deferred), code review findings, Tier 3-4 results, migration files, rebase details (conflicts, resolution), push confirmation, key learnings, deferred items for apply-learnings.
Metrics Schema
agents/metrics.json
Two layers: per-spec entries and per-milestone aggregates.
Per spec: spec_id, unit, milestone, machine, category, per-command metrics (duration, model, tests created/passed, fix cycles, mutation scores, issues found), totals (total duration, deviations, human interventions, rework flag, push timestamp, commit hash).
Per milestone: specs completed, total duration, average duration, total deviations, rework rate, average mutation score.
Trend notes: Added at end of each milestone. Human-readable observations about process health, adjustments made, and rationale.
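The per-milestone layer can be derived from the per-spec entries. The sketch below shows one way to compute the aggregates listed above. The field names (`total_duration_minutes`, `mutation_score`, and so on) are hypothetical, since the schema is described here only by its contents, and the choice of minutes as the duration unit is an assumption.

```typescript
// Hypothetical field names; only the concepts come from the schema description.
type SpecMetrics = {
  spec_id: string;
  milestone: string;
  total_duration_minutes: number;
  deviations: number;
  rework: boolean;
  mutation_score: number; // percent, final score from ex-verify
};

type MilestoneAggregate = {
  milestone: string;
  specs_completed: number;
  total_duration_minutes: number;
  average_duration_minutes: number;
  total_deviations: number;
  rework_rate: number; // fraction of specs flagged as rework, 0 to 1
  average_mutation_score: number;
};

function aggregateMilestone(milestone: string, entries: SpecMetrics[]): MilestoneAggregate {
  const specs = entries.filter((entry) => entry.milestone === milestone);
  const count = specs.length;
  const sum = (pick: (entry: SpecMetrics) => number): number =>
    specs.reduce((acc, entry) => acc + pick(entry), 0);
  const totalDuration = sum((entry) => entry.total_duration_minutes);

  return {
    milestone,
    specs_completed: count,
    total_duration_minutes: totalDuration,
    // Guard against division by zero for milestones with no completed specs.
    average_duration_minutes: count === 0 ? 0 : totalDuration / count,
    total_deviations: sum((entry) => entry.deviations),
    rework_rate: count === 0 ? 0 : specs.filter((entry) => entry.rework).length / count,
    average_mutation_score: count === 0 ? 0 : sum((entry) => entry.mutation_score) / count,
  };
}
```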
Adapting These Templates
The templates above are designed to be stack-agnostic in structure but stack-specific in examples. When adapting to your project:
- Keep the section structure. The sections exist because each one prevents a specific class of error. Removing sections creates gaps that AI agents will fill with guesswork.
- Replace tool-specific references. Swap "tRPC" for your API framework, "Drizzle" for your ORM, "Neon" for your database. The patterns (RLS testing, migration strategy, optimistic UI) apply regardless of tool.
- Adjust thresholds to your context. The 70% mutation score, 50K-token context budget, and 2-hour session duration are calibrated for the toolchain described here. Your thresholds may differ.
- Start with one unit end-to-end. Do not try to set up the entire system before building anything. Pick one backend unit, create its PRD, generate its phase plan, generate its spec, and execute it. Refine the templates based on what you learn.