KitTools Release Notes

Changelog and release history for the KitTools Claude Code plugin.

#2.5.0 — 2026-05-27

#Added

Spec security reviewer — New adversarial security review agent for /kit-tools:validate-epic. Reviews feature specs for attack surface expansion, auth/authz gaps, data exposure risks, trust boundary violations, and security-relevant omissions before any code is written. Shift-left security catches design-level problems when they’re cheapest to fix.

#Changed

Validate-epic runs 6 reviewers — Security reviewer joins the existing five parallel reviewers (completionist, story quality, salty engineer, codebase fit, second opinion) for comprehensive pre-execution validation.

#2.4.3 — 2026-05-22

#Added

Codebase fit reviewer — New review dimension for /kit-tools:validate-epic. Deeply explores the actual codebase to verify implementation hints, find missed reuse opportunities, check pattern conformance, and identify duplication risks. Every finding includes file paths and function names grounded in real code exploration.
Signal feedback hook — New harvest_signals Stop hook silently captures skill telemetry from KitTools artifacts for retrospective analysis of skill performance across projects.

#Changed

Validate-epic runs reviewers in parallel — All five reviewers now spawn concurrently instead of sequentially, with consolidated finding presentation and selective re-run of individual reviewers after spec updates.
Epic pause behavior is mode-dependent — Autonomous and guarded modes now run continuously between specs. Supervised mode pauses between specs for user review. Previously, the default caused unintended pauses in autonomous/guarded execution.

#Fixed

Autonomous/guarded epic pausing between specs — Example config and skill documentation hardcoded pause-between-specs as enabled, causing agents to set it regardless of execution mode.

#2.4.2 — 2026-04-24

#Added

KitTools commit signing — Commits created by KitTools agents during orchestration now include a Co-Authored-By: KitTools + Claude trailer, making it easy to identify KitTools-originated commits in your git history.

#2.4.1 — 2026-04-24

#Fixed

Dirty-tree self-block on resume — The orchestrator wrote a run header to EXECUTION_LOG.md after the clean-worktree check passed but before story execution began. If the orchestrator then crashed, the uncommitted log change left the tree dirty, so every subsequent relaunch failed its own clean-worktree check. Fixed by committing the log header immediately after writing it, closing the dirty-tree window.

#Added

Hybrid model escalation — New escalation model role (defaults to Opus). On retry for specs marked size: L or size: XL, the implementation session upgrades from Sonnet to the escalation model. First attempt is always Sonnet — cheap exploration that produces learnings. Retry gets Opus for stories where the context is too large for Sonnet to process within the timeout.
size: frontmatter field — Feature specs can now declare size: S | M | L | XL to control session timeouts and model escalation on retry.
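The escalation rule can be pictured as a small selection function. What follows is a hypothetical sketch, not KitTools' orchestrator code: pick_impl_model, the 1-based attempt counter, and the plain "sonnet"/"opus" strings are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical sketch: names and model strings are illustrative,
# not KitTools' internal API.
ESCALATION_SIZES = {"L", "XL"}


def pick_impl_model(attempt: int, size: Optional[str],
                    default_model: str = "sonnet",
                    escalation_model: str = "opus") -> str:
    """Choose the model for an implementation session (attempt is 1-based)."""
    if attempt <= 1:
        # The first attempt always uses the cheaper default model;
        # its learnings feed the retry prompt.
        return default_model
    if size is not None and size.upper() in ESCALATION_SIZES:
        # Retries of size: L / size: XL specs move to the escalation model.
        return escalation_model
    # Small and medium specs (or specs without size:) never escalate.
    return default_model


print(pick_impl_model(1, "XL"))  # sonnet
print(pick_impl_model(2, "XL"))  # opus
print(pick_impl_model(2, "M"))   # sonnet
```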

#Changed

Story sizing raised to 5–7 criteria — Sweet spot raised from 3–5 to 5–7 acceptance criteria per story. The old 3–5 range caused planners to drop criteria to fit, producing under-specified stories. The new guidance: more stories with well-defined criteria is always better than fewer stories with compressed scope.
Story-quality-reviewer hard ceilings — Two new critical (execution-blocking) triggers: more than 10 acceptance criteria, or spanning 3+ architectural layers. Previously all oversized stories were warnings, meaning they could proceed to execution and time out.
Plan-epic sizing step — The final scope check now includes guidance for setting size: frontmatter based on spec complexity.

#2.4.0 — 2026-04-17

Foundation refactor. 2.4.0 is a deep audit of the plugin: hardening, architectural cleanup, cross-agent consistency, and a full decomposition of the orchestrator. The user-visible workflow is unchanged — every skill you invoke still behaves the same — but the internals are substantially more robust and easier to extend.

#Added

Model configurability — Per-run model selection for orchestrator-spawned sessions. Defaults: Sonnet for implementation (cost-optimized for bulk generation), Opus for verification (quality gate), Opus for post-execution validation. Override per-run via a model_config block in the execution config, or pick a preset at launch via /kit-tools:execute-epic.
Unified finding schema for review agents — Every review agent (code-quality-validator, security-reviewer, feature-compliance-reviewer, drift-detector, template-validator, and all spec/vision reviewers) now emits findings in one canonical JSON shape. Skills parse a single format instead of three text-block dialects, which makes adding new review dimensions a lot easier.
Feature spec frontmatter schema doc — New templates/specs/SCHEMA.md documents every valid field on feature-*.md and epic-*.md files, with validation rules and examples for standalone, epic-child, and epic-final cases.
Vision review split — The single vision-reviewer agent (which had three different modes baked into one prompt) is now three focused agents: vision-completionist-reviewer, vision-feasibility-reviewer, and vision-readiness-reviewer. Each has one clear job and one output shape.
Structured event logging — Orchestrator now writes a machine-readable JSONL event log (kit_tools/.execution-events.jsonl) alongside the human-readable stdout stream. Instrumented at critical failure sites. Post-mortem debugging with jq just works now.
EXECUTION_LOG.md rotation — The execution log now rotates past 5 MB, keeping one .1 backup. No more unbounded log growth across resumed runs.
Clean-worktree precondition — The orchestrator now checks for a clean git worktree before creating any branches. If the tree is dirty or the directory isn’t a git repo, you get a clear error up front instead of a confusing failure deep in branch creation.
Git recovery detection — When git merge --abort or git revert leaves the repo stuck in MERGING/REVERTING/CHERRY-PICKING/REBASING state, the orchestrator now detects it and escalates with specific remediation guidance rather than blindly retrying. Uses git rev-parse --git-dir so it works inside linked worktrees too.
State schema versioning — .execution-state.json carries a schema_version field. Newer-than-supported state aborts with a clear message; older-than-current is tolerated (auto-upgraded on next save). Corruption and malformed state get caught at load-time instead of crashing downstream.
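As an illustration of the load-time rules for schema_version, here is a minimal sketch, assuming a hypothetical CURRENT_SCHEMA_VERSION of 2 and treating a missing field as version 1. load_state, save_state, and StateError are illustrative names, not the orchestrator's real functions.

```python
import json

CURRENT_SCHEMA_VERSION = 2  # hypothetical value for illustration


class StateError(Exception):
    """Raised when execution state is unreadable or newer than supported."""


def load_state(raw: str) -> dict:
    """Parse .execution-state.json content and validate its schema_version."""
    try:
        state = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Corruption is caught here instead of crashing downstream.
        raise StateError(f"malformed state file: {exc}") from exc
    if not isinstance(state, dict):
        raise StateError("state must be a JSON object")
    # Assumption: state written before versioning counts as version 1.
    version = state.get("schema_version", 1)
    if not isinstance(version, int):
        raise StateError(f"invalid schema_version: {version!r}")
    if version > CURRENT_SCHEMA_VERSION:
        # Newer-than-supported state aborts with a clear message.
        raise StateError(
            f"state schema {version} is newer than supported "
            f"{CURRENT_SCHEMA_VERSION}; update the plugin"
        )
    # Older-than-current state is tolerated as-is.
    return state


def save_state(state: dict) -> str:
    """Serialize state, stamping the current version (auto-upgrade on save)."""
    return json.dumps({**state, "schema_version": CURRENT_SCHEMA_VERSION})
```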

#Changed

Orchestrator decomposition — The single 4,087-line execute_orchestrator.py is now a 13-module package (utils, events, config, state, specs, prompts, sessions, tests_metrics, git_ops, supervisor, execution_log, executor, entry). The CLI entry point remains execute_orchestrator.py as a thin shim — no skill changes, no workflow changes.
Explicit tool grants on all agents — Every agent declares exactly which tools it’s allowed to use. Review agents can’t Edit source. Story-verifier can’t modify code. The “independent verifier” boundary is now enforced at the tool layer, not just at the prompt layer.
Atomic state writes — State, health snapshots, test metrics, and control files now write via temp-file + fsync + atomic rename. Mid-write crashes no longer corrupt state. The supervisor polling .execution-health.json never sees a partial file.
spec-second-opinion model unpinned — The cross-model second-opinion reviewer no longer hardcodes Sonnet. The invoking skill picks a secondary model at invocation time (different from the primary), so the pattern keeps working as new models ship without needing to re-author the agent.
Supervisor cron cleanup extended — 2.3.1’s self-cleanup handled Completed and “no execution state” cases. 2.4.0 extends to Crashed, Stale, and Failed states, so a supervisor cron never lingers past a run that stopped making progress. /kit-tools:execute-epic docs now explicitly surface the cron’s lifetime (tied to the original launching session) and the laptop-sleep caveat.
/kit-tools:sync-project description sharpened — Description now leads with outcome instead of jargon. New “When to use” and “Outcome” sections make the quick / full / resume modes easier to choose between.
Prompt-injection hygiene — Nine code-reading agents (story-implementer, story-verifier, the validation reviewers, drift-detector, test-optimizer, generic-explorer, feature-fixer) got an explicit callout that code, comments, and tool output they consume may contain adversarial prompt-injection attempts and should be treated as text to analyze, never as instructions to execute.
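The temp-file + fsync + atomic rename pattern behind the new state writes looks roughly like this in Python. It is a minimal sketch of the general technique, not KitTools' code; atomic_write_json is a hypothetical name.

```python
import json
import os
import tempfile


def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON so readers see either the old file or the new one, never a partial."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the same directory (same filesystem),
    # which is what makes os.replace an atomic rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomically swap in the new file
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A reader polling the target path, such as a supervisor watching .execution-health.json, sees either the complete old file or the complete new one. For durability of the rename itself, POSIX systems also need an fsync of the containing directory, which this sketch omits.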

#Fixed

Orchestrator hang after SIGKILL — proc.wait() after killing a subprocess had no timeout. If SIGKILL didn’t take effect (zombie, permissions, uninterruptible sleep), the orchestrator could hang indefinitely. The wait is now bounded at 10 seconds: a possibly leaked PID is preferable to a stuck 24-hour autonomous run.
Stuck merge/revert recovery — Previously, a failed git merge --abort or git revert --abort would be logged as a warning and immediately retried, which would also fail. Now each abort is checked for stuck state and raises with manual-remediation guidance.
Worktree indirection for git recovery — Previously, the merge/revert state check looked at project_dir/.git/MERGE_HEAD directly, which fails in linked worktrees where .git is a file pointing elsewhere. Now uses git rev-parse --git-dir to follow the indirection correctly.
Prompt substitution drift guard — If a prompt builder typo’d a token name (e.g., {{STORRY_ID}}), the malformed {{...}} marker would silently survive into the agent’s prompt. Now every built prompt is checked for leftover tokens and raises with a specific error pointing at the typo.
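A minimal sketch of a leftover-token guard, assuming prompts use {{NAME}} placeholders filled by plain string replacement. render_prompt and the regex are illustrative, not the plugin's actual prompt builder.

```python
import re

# Matches any {{TOKEN}} marker still present after substitution.
LEFTOVER_TOKEN = re.compile(r"\{\{\s*([A-Za-z0-9_]+)\s*\}\}")


def render_prompt(template: str, values: dict) -> str:
    """Substitute {{NAME}} tokens, then refuse to return a prompt with leftovers."""
    rendered = template
    for name, value in values.items():
        rendered = rendered.replace("{{" + name + "}}", value)
    leftovers = sorted(set(LEFTOVER_TOKEN.findall(rendered)))
    if leftovers:
        # Name the offending tokens so a typo like STORRY_ID is easy to find.
        raise ValueError(f"unsubstituted prompt tokens: {', '.join(leftovers)}")
    return rendered
```

Because the check runs on the finished prompt, a substituted value that itself contains {{...}} text would also be flagged. A builder could instead compare the template's tokens against the supplied keys before substituting.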

#Removed

/kit-tools:sync-symlinks — Claude 3.5-era workaround for stale autocomplete symlinks. The plugin’s skill discovery works correctly without it now.
/kit-tools:update-kit-tools — The standard /plugin update kit-tools@washingbearlabs does the same thing natively. To add templates or hooks that weren’t initially selected in your project, re-run /kit-tools:init-project and choose the merge option.

#2.3.1 — 2026-04-10

#Fixed

Supervisor cron cleanup — The supervisor monitoring cron job now self-cleans when execution completes. Previously, the cron created by /kit-tools:execute-epic kept polling after the orchestrator finished and cleaned up its state files. Now /kit-tools:execution-status detects there’s nothing to monitor and deletes its own cron job.

#2.3.0 — 2026-04-06

#Added

Supervisor monitoring mode — New --monitor option for autonomous and guarded execution. When enabled, the launching Claude session stays active as a supervisor, checking orchestrator health every 30 minutes. The supervisor can detect crashes, split oversized stories, pause on repeated failures, and restart the orchestrator — all without requiring system-level permissions (communication happens through JSON files, not shell commands).
Story splitting — The supervisor can split stories that repeatedly fail due to scope. It writes full replacement story definitions (with proper US-NNN IDs) to a control file, and the orchestrator applies the split to the feature spec automatically.
Graduated intervention — The supervisor follows an escalation path: observe retries → intervene after exhaustion → escalate to user if intervention fails. Prevents both premature intervention and runaway failure loops.
24-hour safety net — Orchestrator self-terminates after 24 hours with a critical notification.
Health snapshots — Orchestrator writes health data (heartbeat, memory, PIDs, failure counts) after every story attempt. The supervisor reads these to assess health without running system commands.
Test metrics tracking — New kit_tools/testing/test-metrics.json tracks per-file test pass/fail counts, durations, timeouts, and last run dates across orchestration runs. Portable JSON — no external dependencies.
Verifier: tests_run result field — The verifier now reports which test files it executed, their pass/fail status, and duration. Feeds into test metrics for identifying slow or flaky tests.

#Fixed

Orchestrator: orphaned process cleanup — Claude sessions now kill their entire process group on normal exit, not just on timeout. Previously, child processes (pytest, vitest, node workers) spawned during sessions survived after the session completed, accumulating across stories in an epic and eventually exhausting system memory.
Orchestrator: regression check process handling — Regression tests now run with proper process group isolation. Timeouts kill pytest and all its children instead of only the wrapper shell.
Orchestrator: graceful process termination — Process groups are now terminated with SIGTERM first (with a grace period) before SIGKILL, allowing child processes to clean up.
Orchestrator: tmux cleanup timeout — kill_tmux_session now has a timeout to prevent hanging if tmux is unresponsive.
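The SIGTERM-then-SIGKILL sequence for a whole process group can be sketched with the standard library. This is a POSIX-only illustration that assumes each session is started with start_new_session=True. terminate_group and its default timings are hypothetical; the 10-second bound mirrors the 2.4.0 fix.

```python
import os
import signal
import subprocess
import sys


def terminate_group(proc: subprocess.Popen, grace: float = 5.0,
                    kill_wait: float = 10.0) -> None:
    """Stop a process and its children: SIGTERM, grace period, then SIGKILL.

    POSIX only. Assumes proc was started with start_new_session=True, so its
    PID is also the ID of the process group its children inherit.
    """
    pgid = proc.pid
    try:
        os.killpg(pgid, signal.SIGTERM)  # ask the whole group to shut down
    except ProcessLookupError:
        pass  # nothing left in the group
    try:
        proc.wait(timeout=grace)  # give children time to clean up
    except subprocess.TimeoutExpired:
        pass
    try:
        # Anything still alive after the grace period gets SIGKILL.
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    try:
        # Bounded wait: a leaked PID is better than hanging forever.
        proc.wait(timeout=kill_wait)
    except subprocess.TimeoutExpired:
        pass


if __name__ == "__main__":
    # A stand-in for a test runner that would otherwise outlive its session.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        start_new_session=True,
    )
    terminate_group(child, grace=2.0)
    print("exit status:", child.returncode)
```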

#Changed

Verifier: no more full-suite fallback — When targeted test detection finds no matches, the verifier now identifies and runs only relevant tests from the diff instead of falling back to the full test suite. Prevents multi-minute test runs in large codebases. Broader coverage is still enforced by the regression check and end-of-epic validation.

#2.2.2 — 2026-04-04

#Added

New Skill: /kit-tools:optimize-tests — Full test suite audit covering mapping completeness, stale test detection, coverage overlap, performance profiling, KitTools convention alignment, and suite verification. Run periodically to keep your test suite healthy as the codebase grows.
Orchestrator: intelligent retry system — Failed stories now receive structured retry context based on failure type (timeout, test failure, criteria mismatch). The orchestrator classifies failures automatically and tailors guidance for each retry attempt.
Orchestrator: adaptive timeouts — Implementation and verification sessions use separate timeout budgets (900s/600s). Optional size: S/M/L/XL in spec frontmatter scales timeouts for larger stories.
Orchestrator: pre-flight checks — Before each story, the orchestrator checks for oversized scope and test mapping gaps. Warnings are logged but don’t block execution.
Orchestrator: cross-story regression detection — After merging a story, the orchestrator runs prior stories’ tests to catch regressions. If a regression is detected, the merge is reverted and execution halts with a notification.
Orchestrator: learnings persistence — Execution learnings now persist across epics in a JSONL file. Future runs benefit from lessons learned in prior epics.
Verifier: pass-with-warnings verdict — The verifier can now return a third verdict for non-blocking concerns (style, naming). Stories merge immediately; warnings accumulate for review during validation.
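The separate timeout budgets can be illustrated with a small lookup. The 900s/600s base values come from the entry above. The per-size multipliers and the session_timeout name are assumptions, since these notes don't state how much each size scales the budget.

```python
from typing import Optional

# Base budgets from the release notes; the size multipliers are hypothetical.
BASE_TIMEOUTS = {"implementation": 900, "verification": 600}
SIZE_MULTIPLIERS = {"S": 1.0, "M": 1.0, "L": 1.5, "XL": 2.0}


def session_timeout(kind: str, size: Optional[str] = None) -> int:
    """Return a timeout in seconds for an implementation or verification session."""
    base = BASE_TIMEOUTS[kind]
    # Specs without a size: field keep the base budget.
    factor = SIZE_MULTIPLIERS.get((size or "M").upper(), 1.0)
    return int(base * factor)
```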

#Changed

Orchestrator: smarter test targeting — Complete rewrite of test detection with tiered matching (T0: explicit mapping, T1: heuristic). Directory-scoped matching preferred over global search. Match caps prevent timeout-causing over-matching.
Completionist reviewer — New “Integration & Wiring Completeness” dimension checks for UI gaps, unwired artifacts, missing cross-layer connections, and scope narrowness.
Story quality reviewer — New anti-pattern detection (vague verbs, compound criteria) and story ordering checks.
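The T0/T1 tiering, directory scoping, and match cap described above can be sketched as follows. This is an illustrative approximation, not the orchestrator's matcher: the find_tests signature, the parent-directory scoping rule, and the cap of 10 are all assumptions.

```python
import os
from typing import Dict, Iterable, List

MAX_MATCHES = 10  # hypothetical cap against over-matching


def find_tests(changed_file: str, mapping: Dict[str, List[str]],
               all_tests: Iterable[str], cap: int = MAX_MATCHES) -> List[str]:
    """Return test files for one changed source file using tiered matching."""
    # T0: an explicit mapping (for example, one kept in TESTING_GUIDE.md) wins.
    if changed_file in mapping:
        return list(mapping[changed_file])[:cap]

    # T1: naming-convention heuristic, foo.py -> test_foo.py or foo_test.py.
    stem, ext = os.path.splitext(os.path.basename(changed_file))
    candidates = {f"test_{stem}{ext}", f"{stem}_test{ext}"}
    matches = [t for t in all_tests if os.path.basename(t) in candidates]

    # Directory scoping: prefer matches whose parent directory has the same
    # name as the changed file's parent (src/auth/x.py -> tests/auth/test_x.py).
    parent = os.path.basename(os.path.dirname(changed_file))
    scoped = [t for t in matches
              if os.path.basename(os.path.dirname(t)) == parent]
    return (scoped or matches)[:cap]
```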

#2.2.1 — 2026-04-03

#Changed

/kit-tools:validate-feature → /kit-tools:validate-implementation — Renamed to better reflect that this skill validates the implementation (code on a branch), not the feature spec itself. No behavioral changes.
/kit-tools:complete-feature → /kit-tools:complete-implementation — Renamed for consistency with the epic-forward workflow. No behavioral changes.
All cross-references updated across skills, agents, hooks, orchestrator, templates, and documentation.

#2.2.0 — 2026-04-01

#Added

Pre-execution validation (/kit-tools:validate-epic) — Quality gate between planning and execution. Runs four sequential agent reviews on every feature spec in an epic before coding starts:
1. Completionist reviewer — Missing stories, uncovered goals, flow gaps
2. Story quality reviewer — Story sizing, ID format, vague criteria, integration scope
3. Salty engineer reviewer — Adversarial GAN-style review for implementation traps, hand-waving, and deployment risks
4. Second opinion (Sonnet) — Cross-model review using a different AI model to evaluate architecture decisions, feasibility, over-engineering, and alternative approaches. All alternatives require explicit trade-off statements.
- Interactive: revise specs and re-run reviews between agents. Produces a go/no-go readiness verdict.
Epic-first planning (/kit-tools:plan-epic) — All work is now structured as an epic, even single-spec features. Replaces the old binary “epic detection” gate with a scope assessment that determines how many feature specs are needed. Always generates an epic-*.md wrapper alongside feature specs.
Desktop notifications for autonomous execution — The orchestrator now sends OS-level notifications (macOS and Linux) on story failures, execution completion, crashes, and pauses. No more discovering failures hours after they happen.
READ_ME.html — Single-file HTML5 documentation page with an interactive 8-phase workflow flowchart, skills grid, hooks table, and install guide.

#Changed

/kit-tools:execute-epic (formerly execute-feature) — Epic-first entry point: selects the epic from epic-*.md files, derives execution order from the Decomposition table.
/kit-tools:complete-implementation — Enhanced learnings capture: gotchas go to GOTCHAS.md, conventions to CONVENTIONS.md, spec-writing notes to Implementation Notes. Context-aware next steps guide to the next epic or feature.
Workflow handoffs improved — seed-project, start-session, and complete-implementation now include clear next-step guidance, closing the workflow loop from init through completion and back to planning.
Repositioned from “documentation framework” to “framework for AI-assisted development” — Updated across all repos and documentation.

#Removed

/kit-tools:plan-feature — Replaced by /kit-tools:plan-epic
/kit-tools:execute-feature — Replaced by /kit-tools:execute-epic
/kit-tools:migrate — v1.x → v2.0 migration no longer supported as a dedicated skill

#2.1.4 — 2026-03-18

#Fixed

Orphaned subprocess cleanup on timeout — Timed-out execution sessions now kill the entire process group (claude + all child processes like pytest, node, etc.) instead of just the direct child. Previously, orphaned test runners would accumulate and consume CPU indefinitely after session timeouts.

#2.1.3 — 2026-03-13

#Changed

Smart test scoping — Story verification now runs only related tests instead of the full suite. Tests are matched by naming convention (e.g., foo.py → test_foo.py) or explicit test mappings in your project’s TESTING_GUIDE.md. The full suite runs only at the validate-implementation gate.
Test output control — Quiet flags suppress per-test PASSED noise while preserving full failure tracebacks and assertion diffs. A safety-net output cap prevents runaway output without hiding failure details.

#2.1.2 — 2026-03-11

#Added

Inline diff for verifier — The verifier agent now receives the full diff content inline (up to 20KB) instead of reading files one-by-one via tool calls. Large diffs are truncated with a stat summary and the verifier falls back to reading full files.
Fail-fast test flags — Verification test commands now include fail-fast flags for known runners (pytest -x, jest --bail, vitest --bail 1), stopping at the first failure instead of running the full suite. The full suite is preserved for validate-implementation.
Completion strategy — Choose how execution finishes with a new completion_strategy option:
- Create PR (default) — Pushes branch and creates a GitHub PR via gh
- Merge to main — Auto-merges to main (blocked if validation finds critical issues, falls back to PR)
- None — Leaves branch as-is for manual handling
- The orchestrator now handles completion directly instead of spawning a separate Claude session
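Appending the fail-fast flags listed above to a detected test command might look like this. The flags themselves (pytest -x, jest --bail, vitest --bail 1) come from the entry; the token-based runner detection in with_fail_fast is a simplifying assumption and would miss wrappers such as npm test.

```python
import shlex

# Fail-fast flags named in the release notes, keyed by runner executable.
FAIL_FAST_FLAGS = {
    "pytest": ["-x"],
    "jest": ["--bail"],
    "vitest": ["--bail", "1"],
}


def with_fail_fast(command: str) -> str:
    """Append the runner's fail-fast flag to a test command if the runner is known."""
    parts = shlex.split(command)
    for runner, flags in FAIL_FAST_FLAGS.items():
        if runner in parts:  # matches "npx vitest run" or "python -m pytest"
            if flags[0] not in parts:  # don't add the flag twice
                parts.extend(flags)
            break
    return shlex.join(parts)
```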

#Changed

Verifier diff accuracy — Diffs use explicit commit-based two-dot syntax instead of merge-base, eliminating ambiguity in multi-commit scenarios
Verifier workflow — Review step updated to start from the inline diff, using the Read tool only when more context is needed
Epic completion — Epics now complete via the same completion strategy instead of spawning a separate completion session
/kit-tools:execute-feature — New Step 2b prompts for completion strategy; pre-flight checks verify gh auth when PR strategy is selected

#2.1.1 — 2026-03-07

#Fixed

Epic automation state mismatch — Fixed a crash when running epic execution in autonomous mode. The skill was pre-creating state with the wrong schema; now the orchestrator handles state creation for both single-spec and epic modes.
Orchestrator crash resilience — Crash handler now registers before config load; leaked attempt branches are cleaned up on startup; archive operations are atomic (write-then-delete instead of modify-then-move)
Agent output parsing — Orchestrator now handles common LLM output quirks: markdown code fences around JSON, preamble text, and trailing commas
Verification session errors — Session errors are now checked before reading result files, preventing stale result reads
Scratchpad hook feedback — Scratchpad creation failures are now reported instead of silently swallowed
Placeholder validation accuracy — Tightened patterns to stop flagging legitimate markdown like [note] or [example] as unfilled placeholders
Manifest completeness — Added missing templates to both SEED_MANIFEST and SYNC_MANIFEST
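A minimal sketch of tolerant parsing for the three quirks named above (code fences, preamble text, trailing commas). parse_agent_json is a hypothetical helper, and the trailing-comma regex is deliberately naive.

```python
import json
import re

FENCE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)
TRAILING_COMMA = re.compile(r",\s*([}\]])")


def parse_agent_json(text: str) -> dict:
    """Extract a JSON object from LLM output that may include fences or preamble."""
    # 1. Prefer the contents of a fenced block if one is present.
    fenced = FENCE.search(text)
    if fenced:
        text = fenced.group(1)
    # 2. Drop preamble and trailing prose around the outermost object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in agent output")
    candidate = text[start:end + 1]
    # 3. Remove trailing commas before } or ]. Naive: this would also
    #    rewrite a ", }" sequence that appears inside a string value.
    candidate = TRAILING_COMMA.sub(r"\1", candidate)
    return json.loads(candidate)
```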

#Changed

Execute feature skill — State initialization now defers to the orchestrator for autonomous/guarded modes, preventing schema mismatches
Execution status skill — Token estimate display handles missing data gracefully
Story quality pre-flight — Execute-feature now checks story quality before launching (flags vague criteria, under-specified stories)
Learnings cap — Per-story learnings capped at 20 at write time to prevent state file bloat
Template versions — All 30 templates normalized to version 2.0.0
Network retry clarity — Rewrote session retry logic for clearer error categorization
Dead code cleanup — Removed unused functions and imports from orchestrator

#2.1.0 — 2026-03-04

#Added

New Skill: /kit-tools:create-vision — Interactive product vision definition with AI-assisted review
- Guided conversation captures your vision, target users, value proposition, success criteria, and feature areas
- Two-pass review: completeness scoring across 6 dimensions, then feasibility assessment
- Surfaces gaps and suggestions between rounds for iterative refinement
- Produces kit_tools/PRODUCT_VISION.md — one strategic document per project
New Template: PRODUCT_VISION.md — Singular root-level strategic document replacing Product Briefs
- Sections: Vision Statement, Target Users & Personas, Value Proposition, Success Criteria, High-Level Feature Areas, Constraints & Assumptions, Open Questions

#Changed

/kit-tools:plan-feature — Now checks for Product Vision instead of Product Briefs
- Reads vision doc for strategic context when planning features
- Step 12 updates both BACKLOG.md and MILESTONES.md with priority confirmation
- Feature specs use vision_ref: instead of brief: frontmatter
/kit-tools:init-project — Recommended workflow updated: init → seed → create-vision → plan-feature
/kit-tools:migrate — New vision/brief migration steps: creates blank vision doc if missing, flags legacy briefs for review, checks v2.0 completeness
Feature Spec and Epic templates — brief: field replaced with vision_ref: (references a section in PRODUCT_VISION.md)

#Removed

Product Brief template (PRODUCT_BRIEF.md) — Replaced by Product Vision

#2.0.0 — 2026-03-01

#Breaking Changes

kit_tools/prd/ → kit_tools/specs/ — The feature specs directory has been renamed. All internal paths, config keys, state keys, and agent tokens updated to match.
- Run /kit-tools:migrate to update existing projects automatically
- Config keys renamed: prd_path → spec_path, epic_prds → epic_specs
- State keys renamed: prd → spec, prds → specs, current_prd → current_spec
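The config and state key renames are simple dictionary rewrites. Below is a minimal, idempotent sketch using the exact key pairs listed above; rename_keys is a hypothetical helper that only touches top-level keys, and it does not cover the directory and path rewrites that /kit-tools:migrate also performs.

```python
# Key pairs taken from the 2.0.0 breaking-change notes.
CONFIG_RENAMES = {"prd_path": "spec_path", "epic_prds": "epic_specs"}
STATE_RENAMES = {"prd": "spec", "prds": "specs", "current_prd": "current_spec"}


def rename_keys(data: dict, renames: dict) -> dict:
    """Return a copy of data with legacy top-level keys renamed.

    Idempotent: already-migrated data comes back unchanged, and an existing
    new-style key is never overwritten by its legacy counterpart.
    """
    migrated = dict(data)
    for old, new in renames.items():
        if old in migrated:
            value = migrated.pop(old)
            migrated.setdefault(new, value)
    return migrated
```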

#Changed

/kit-tools:migrate rewritten — Now handles v1.x → v2.0 migration: directory rename, file renames (prd-*.md → feature-*.md), config/state key migration, hook path updates, and documentation path sweep. All steps are idempotent — safe to run multiple times.
Backwards compatibility preserved — detect_phase_completion hook checks both kit_tools/specs/ and kit_tools/prd/ paths. Archive dependency lookups check both feature-*.md and prd-*.md patterns.

#1.6.6 — 2026-03-01

#Added

PRD Compliance Agent — PRD compliance review is now a dedicated subagent (prd-compliance-reviewer) that runs in parallel with code quality and security reviews during feature validation. Previously this ran inline in the validation session, consuming context window.
Diff summarization — Large branch diffs are automatically truncated per-file (60KB budget) before being passed to validator agents. Agents are instructed to read full files when they need more context.
Prompt size guard — Implementation and verification prompts are automatically trimmed if they approach context limits (480K chars). Removes prior learnings and previous attempt diffs first, with a hard-truncate fallback.
Result schema validation — Agent result files are now validated on read. Missing required fields (like story_id, status, verdict) return clear errors instead of causing cryptic failures downstream.
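The prompt size guard's drop-then-truncate order can be sketched as below, assuming the prompt is assembled from a required core plus optional sections ranked by importance. fit_prompt and the section layout are hypothetical; only the 480K-character limit comes from the entry above.

```python
from typing import List, Tuple

PROMPT_CHAR_LIMIT = 480_000  # limit quoted in the release notes


def fit_prompt(core: str, optional: List[Tuple[str, str]],
               limit: int = PROMPT_CHAR_LIMIT) -> str:
    """Assemble a prompt, dropping optional sections until it fits.

    optional holds (name, text) pairs ordered from most to least important;
    the least important remaining section is dropped first.
    """
    sections = list(optional)
    while sections:
        prompt = "\n\n".join([core] + [text for _, text in sections])
        if len(prompt) <= limit:
            return prompt
        sections.pop()  # drop the least important remaining section
    # Fallback: the core alone is checked last and hard-truncated if needed.
    return core[:limit]
```

To match the behavior described above, prior learnings and previous-attempt diffs would be placed at the end of the optional list so they are dropped before anything else.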

#Fixed

Permanent error handling — Context window and token limit errors are now classified as permanent and cause immediate failure with notification, instead of retrying indefinitely
PRD checkbox scoping — Checkbox replacement now uses regex with line-start anchoring, preventing false positives when - [ ] appears inside descriptions or hint text
Git operation visibility — All git operations now log warnings on failure instead of silently ignoring errors
Pause timeout — Paused execution now auto-resumes after 24 hours with periodic log reminders, preventing indefinite hangs
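Line-start anchoring for checkbox replacement can be shown with a short regex sketch. check_boxes is a hypothetical helper that ticks every task-list item in the text it is given; a real implementation would presumably apply it only to one story's criteria.

```python
import re

# "- [ ]" only counts as a checkbox at the start of a line (after optional
# spaces or tabs); the same characters inside descriptions or hints are ignored.
UNCHECKED = re.compile(r"^([ \t]*[-*] )\[ \]", re.MULTILINE)


def check_boxes(markdown: str) -> str:
    """Mark every line-start task-list checkbox as done."""
    return UNCHECKED.sub(r"\1[x]", markdown)
```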

#Changed

Parallel validation — Feature validation Steps 3 (quality), 4 (security), and 5 (compliance) can now all run in parallel as independent subagents

#1.6.5 — 2026-02-26

#Fixed

Nested session errors — The orchestrator now strips the CLAUDECODE environment variable before spawning claude -p subprocesses, eliminating the “cannot be launched inside another Claude Code session” error in autonomous/guarded mode
Cleanup on error exits — All orchestrator exit paths (Ctrl+C, max retries, dependency failures, crashes) now properly clean up tmux sessions, commit tracking files, and remove temporary result files
Merge conflict handling — If merging an attempt branch into the feature branch fails, the orchestrator now aborts the merge and retries instead of silently marking the story as completed
Result file cleanup — Temporary result files are now cleaned on all retry paths, preventing stale data from being misread on restart
Hook robustness — All hooks now wrap file I/O in error handling to prevent tracebacks on encoding errors or permission issues
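Stripping CLAUDECODE before spawning a nested session amounts to copying the environment without that one variable. A minimal sketch; child_env is a hypothetical helper name.

```python
import os
from typing import Dict, Mapping, Optional


def child_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment without CLAUDECODE so a nested claude -p can start."""
    env = dict(os.environ if base is None else base)
    env.pop("CLAUDECODE", None)  # no error if the variable is absent
    return env

# Usage (not executed here):
#   subprocess.run(["claude", "-p", prompt], env=child_env(), check=True)
```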

#Changed

Notifications simplified — Removed macOS native notifications (osascript). All execution progress is now reported through in-session notifications surfaced on your next prompt. No more context-switching to Notification Center.
tmux self-cleanup — The orchestrator now kills its own tmux session on completion. No more orphaned sessions lingering after execution finishes.

#Added

Git health check — /kit-tools:start-session now checks branch state, uncommitted changes, stash, remote sync status, and recent commits before orienting. Issues are flagged with suggestions, but no actions are taken without your approval.
Plugin discoverability — Projects using KitTools now include an install hint in SYNOPSIS.md so new contributors can find and install the plugin.

#1.6.4 — 2026-02-23

#Added

Execution Notification System — Two-pronged notifications keep you informed during autonomous/guarded execution
- macOS native alerts fire immediately on completions, failures, crashes, and pauses — no need to check manually
- In-session notifications via a UserPromptSubmit hook surface a batched summary the next time you send a message to Claude
- Nine notification points cover the full execution lifecycle: story pass, story failure, single-PRD complete, validation pause, epic PRD complete, between-PRD pause, all epic PRDs complete, dependency blocked, and crash
- Crash detection — An atexit handler detects unexpected orchestrator exits, sets state to crashed, and sends both an OS alert and a file notification
Crashed status in execution-status — /kit-tools:execution-status now recognizes the crashed state with resume/reset actions (same options as stale state)

#Changed

Distribution cleanup — Test files and dev dependencies removed from the shipped plugin. Only runtime files are included in installs.

#1.6.3 — 2026-02-23

#Fixed

Unique tmux session names — Autonomous execution now uses descriptive, per-feature session names (kit-exec-{feature}) instead of a single hardcoded name
- Running multiple projects concurrently no longer risks killing each other’s tmux sessions
- Session names are stored in the execution config so /kit-tools:execution-status can find the right session
- Backwards compatible with older runs

#1.6.2 — 2026-02-23

#Added

New Skill: /kit-tools:execution-status — Check progress of autonomous execution from within Claude Code
- Shows completion percentage, per-story status table, session stats (tokens, time elapsed)
- Detects stale state when the orchestrator has crashed or exited
- Offers contextual actions based on current state: pause, resume, attach to tmux, retry
- Epic mode: shows per-PRD progress table

#1.6.1 — 2026-02-23

#Fixed

Autonomous execution launch — The orchestrator now launches in a detached tmux session instead of running in the background from within a Claude session
- Fixes nested claude -p calls being blocked by Claude Code’s recursion prevention
- If tmux is not installed, a copy-pasteable command is printed for running in a separate terminal
- Pre-flight checks now verify tmux availability for autonomous/guarded modes
- Monitoring commands (attach, tail log, check state, pause) are printed after launch

#1.6.0 — 2026-02-22

#Added

Unit Test Suite — 75 tests for the execute orchestrator covering PRD parsing, story extraction, prompt building, and test command detection
File-Based Agent Results — Agents write structured JSON result files (.story-impl-result.json, .story-verify-result.json) instead of relying on stdout parsing, eliminating the ~33% false-failure rate caused by LLM output formatting
Branch-per-Attempt Strategy — Each implementation attempt runs on a temporary branch; successful attempts merge, failed attempts are deleted cleanly (no more destructive git reset)
Patch-Based Retry Context — Failed attempt diffs are included in retry prompts so the agent takes a different approach
Token Estimation — Per-session input/output token tracking logged in execution state
Auto-Detect Test Command — Automatically finds the project’s test runner by checking package.json, pyproject.toml, pytest.ini, Makefile, and TESTING_GUIDE.md
Test Execution in Validation — /kit-tools:validate-implementation now runs the project’s test suite; failed tests are logged as critical findings
Auto-Injected Test Criteria — /kit-tools:plan-feature automatically adds “Tests written/updated” and “Full test suite passes” criteria to every code story (doc/config-only stories are exempt)
Implementation Hints — Per-story hints flow from planning to implementation, reducing agent exploration time
- plan-feature generates hints during refinement (key files, patterns, gotchas)
- Implementer agent receives hints as part of its prompt
Pause on Critical Findings — Autonomous execution pauses when validation finds critical issues, creating a .pause_execution file referencing the findings. Resumes when the file is removed after review.
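The following is a Python sketch of test-command detection. The precedence order, the mapped commands (`npm test`, `pytest`, `make test`), and the `Test command:` line in TESTING_GUIDE.md are all assumptions for illustration. These notes list which files are checked but not how each one is interpreted.

```python
from __future__ import annotations

import json
from pathlib import Path


def detect_test_command(root: Path) -> str | None:
    """Return a best-guess test command for the project at `root`, or None."""
    pkg = root / "package.json"
    if pkg.is_file() and "test" in json.loads(pkg.read_text()).get("scripts", {}):
        return "npm test"
    pyproject = root / "pyproject.toml"
    if pyproject.is_file() and "[tool.pytest" in pyproject.read_text():
        return "pytest"
    if (root / "pytest.ini").is_file():
        return "pytest"
    makefile = root / "Makefile"
    if makefile.is_file() and any(
        line.startswith("test:") for line in makefile.read_text().splitlines()
    ):
        return "make test"
    guide = root / "TESTING_GUIDE.md"
    if guide.is_file():
        # Assumed convention: a line like "Test command: `pytest -q`".
        for line in guide.read_text().splitlines():
            if line.lower().startswith("test command:"):
                return line.split(":", 1)[1].strip().strip("`")
    return None
```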

#Changed

YAML Parsing — Replaced hand-rolled frontmatter parser with PyYAML for proper handling of lists, booleans, and edge cases
Verifier Independence — Verifier agent receives git-sourced file lists (git diff --name-only) instead of trusting implementer claims
Reference-Based Context — Agent prompts pass file paths instead of inlining full contents, reducing prompt size by ~80% for large projects
Skill Structure — Four pipeline skills (execute-feature, plan-feature, validate-implementation, complete-implementation) split into SKILL.md (core workflow) + REFERENCE.md (detailed formats and examples), reducing context consumption significantly
PRD Template — Updated to v1.3.0 with Implementation Hints section and auto-injected test criteria

#Deprecated

Stdout-based result parsing — Kept for backward compatibility but superseded by file-based JSON results
reset_to_commit() — Replaced by branch-per-attempt strategy

#1.5.4 — 2026-02-19

#Fixed

Hook path resolution — Project-level hook commands now use $CLAUDE_PROJECT_DIR instead of relative paths
- Previously, hooks used python3 kit_tools/hooks/... which breaks if shell CWD drifts during a session
- Now uses python3 "$CLAUDE_PROJECT_DIR/kit_tools/hooks/..." — resolves correctly regardless of CWD
- Fixes an infinite loop scenario where a Stop hook file-not-found error re-triggers the Stop event
- Existing projects: run /kit-tools:update-kit-tools to get the updated hook paths
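The following small Python sketch shows why the absolute form is robust. `hook_command` and `resolve_hook` are hypothetical helpers, and `example_hook.py` stands in for a real hook script. Only the `$CLAUDE_PROJECT_DIR/kit_tools/hooks/` prefix comes from these notes.

```python
import os
from pathlib import Path
from typing import Mapping


def hook_command(script: str) -> str:
    """Build a hook command that the shell resolves via $CLAUDE_PROJECT_DIR."""
    # The variable is expanded when the hook runs, so the session's current
    # working directory no longer matters; the quotes protect paths with spaces.
    return f'python3 "$CLAUDE_PROJECT_DIR/kit_tools/hooks/{script}"'


def resolve_hook(script: str, env: Mapping[str, str] = os.environ) -> Path:
    """Mirror the shell's expansion in Python, e.g. to check the hook file exists."""
    return Path(env["CLAUDE_PROJECT_DIR"]) / "kit_tools" / "hooks" / script
```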

#1.5.3 — 2026-02-09

#Added

Epic Chaining — Multi-PRD epics now execute automatically on a shared epic/[name] branch
- PRD template gains epic, epic_seq, epic_final frontmatter fields
- /kit-tools:execute-feature detects epic PRDs and offers sequential execution
- Orchestrator chains PRDs: stories -> validate -> tag checkpoint -> archive -> next PRD
- Hard dependency gate blocks execution if depends_on PRDs aren’t archived
- Git tags mark each PRD checkpoint (e.g., oauth/oauth-schema-complete)
- Resume support: skips already-completed PRDs on restart
- Cross-PRD learnings carried forward to subsequent story prompts
Pause Between PRDs — Option to review after each PRD before continuing the epic
- Recommended default for epic execution
Epic-Aware Completion — /kit-tools:complete-implementation handles mid-epic and final-epic PRDs
- Mid-epic: tag + archive only (no PR or artifact cleanup)
- Final epic PRD: PR references all PRDs and checkpoint tags
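The ordering, resume, and dependency-gate rules above can be sketched in Python as follows. The `Prd` dataclass and its `feature` field are hypothetical. `epic`, `epic_seq`, and `depends_on` mirror the frontmatter fields, and the tag format follows the `oauth/oauth-schema-complete` example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Prd:
    feature: str          # hypothetical identifier, e.g. "oauth-schema"
    epic: str             # frontmatter: epic
    epic_seq: int         # frontmatter: epic_seq
    depends_on: list[str] = field(default_factory=list)


def plan_epic(prds: list[Prd], archived: set[str]) -> list[Prd]:
    """Return the PRDs still to run, in epic_seq order, enforcing the dependency gate."""
    satisfied = set(archived)
    plan = []
    for prd in sorted(prds, key=lambda p: p.epic_seq):
        if prd.feature in archived:
            continue  # resume: completed on an earlier run
        missing = [d for d in prd.depends_on if d not in satisfied]
        if missing:
            raise RuntimeError(f"{prd.feature} blocked by unarchived PRDs: {missing}")
        plan.append(prd)
        # Earlier PRDs in the chain are archived before later ones start.
        satisfied.add(prd.feature)
    return plan


def checkpoint_tag(prd: Prd) -> str:
    """Tag name for a finished PRD, e.g. oauth/oauth-schema-complete."""
    return f"{prd.epic}/{prd.feature}-complete"
```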

#Fixed

Verifier output parsing — Strips markdown code fences before parsing, fixing a ~33% false failure rate when the verifier wraps output in triple backticks
- Fallback verdict detection scans for pass/fail signals when the structured block is missing
- Raw output logged on parse failure for diagnosis
Verification-only retry — When implementation succeeded but verifier parsing failed, retries now skip re-implementation and only re-run verification
Failure detail sanitization — Log entries no longer contain raw template content from session errors
Verifier template — Now explicitly instructs the LLM to output the structured block as plain text, not inside code fences
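The following minimal Python sketch strips fences and falls back to a verdict scan. It assumes the structured block is a JSON object with a `verdict` field, which these notes do not specify. The fallback deliberately prefers `fail` when both signals appear.

```python
from __future__ import annotations

import json
import re

# An opening or closing fence line such as ```json or ```
FENCE = re.compile(r"^```[\w-]*[ \t]*$", re.MULTILINE)


def parse_verdict(raw: str) -> str | None:
    """Return "pass", "fail", or None from verifier output that may be fenced."""
    text = FENCE.sub("", raw).strip()
    try:
        # Assumed structured block: a JSON object with a "verdict" field.
        return str(json.loads(text)["verdict"]).lower()
    except (ValueError, KeyError, TypeError):
        pass
    # Fallback scan; checks "fail" first so mixed output is never read as a pass.
    if re.search(r"\bFAIL(?:ED|URE)?\b", text, re.IGNORECASE):
        return "fail"
    if re.search(r"\bPASS(?:ED)?\b", text, re.IGNORECASE):
        return "pass"
    return None
```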

#Changed

Orchestrator — Refactored into run_single_prd() and run_epic() with shared story execution loop
/kit-tools:plan-feature — Epic decomposition now sets chaining fields (epic, epic_seq, epic_final)
/kit-tools:execute-feature — Epic detection, dependency hard gate, epic/[name] branching, epic_prds config format

#1.5.2 — 2026-02-07

#Added

New Skill: /kit-tools:validate-implementation — Full branch-level validation against PRD
- Reviews entire branch diff (git diff main...HEAD) — all changes across the feature
- Three independent review passes: code quality, security, and PRD compliance
- Automatic fix loop (max 3 iterations) for critical findings
- Autonomous mode: spawns a fixer agent; supervised mode: fixes inline
Dedicated Security Review Agent — Security gets focused attention in its own review pass
- Covers injection vulnerabilities, auth gaps, secrets, input validation, insecure defaults, dependency risks
Dedicated Fix Agent — Targeted fixes for validation findings in autonomous mode
Automatic validation after execution — The orchestrator now spawns a validation session after all stories complete
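The bounded fix loop can be sketched in Python as below. The finding dictionaries and the `validate`/`fix` callables are hypothetical stand-ins for the validator and fixer agents. Only the three-iteration cap and the focus on critical findings come from these notes.

```python
from typing import Callable, Dict, List

Finding = Dict[str, str]  # hypothetical shape, e.g. {"id": "...", "severity": "critical"}


def validate_with_fixes(
    validate: Callable[[], List[Finding]],
    fix: Callable[[List[Finding]], None],
    max_iterations: int = 3,
) -> List[Finding]:
    """Fix critical findings and re-validate, at most `max_iterations` times."""
    findings = validate()
    for _ in range(max_iterations):
        critical = [f for f in findings if f.get("severity") == "critical"]
        if not critical:
            break
        fix(critical)  # a fixer agent in autonomous mode, inline edits in supervised mode
        findings = validate()
    return findings
```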

#Changed

Code quality validator — Narrowed to quality-only (security and intent alignment moved to dedicated agents)
/kit-tools:execute-feature — Completion messaging now directs to validate-implementation
/kit-tools:complete-implementation — Now cleans up execution artifacts, handles feature branch (PR/merge), and references validate-implementation
/kit-tools:close-session and /kit-tools:checkpoint — Use inline quality checks for session-level diffs instead of the full feature validation
detect_phase_completion hook — Only suggests validate-implementation when all PRD criteria are complete, not on every checkbox

#Removed

/kit-tools:validate-phase — Replaced by validate-implementation (branch-level validation)

#1.5.1 — 2026-02-06

#Added

New Skill: /kit-tools:sync-symlinks — Force-refresh skill symlinks after a plugin update
- Reads installed_plugins.json to find the correct install path
- Useful when skills appear stale after /plugin update

#Fixed

sync_skill_symlinks hook — Now reads ~/.claude/plugins/installed_plugins.json as the source of truth for the plugin install path
- Fixes issue where skill symlinks remained pointed at the previous version after a plugin update
- $CLAUDE_PLUGIN_ROOT can be stale after updates; the hook now bypasses it in favor of the authoritative JSON
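The following Python sketch reads the install path, with hedges about the file format. The registry location comes from these notes, but its JSON layout (a `plugins` map keyed by `<plugin>@<marketplace>` with `installPath` records) is an assumption. For that reason, the code accepts either a single record or a list of records.

```python
from __future__ import annotations

import json
from pathlib import Path

DEFAULT_REGISTRY = Path.home() / ".claude" / "plugins" / "installed_plugins.json"


def plugin_install_path(plugin: str, registry: Path = DEFAULT_REGISTRY) -> Path | None:
    """Read the install path for `plugin` from installed_plugins.json, or None."""
    if not registry.is_file():
        return None
    data = json.loads(registry.read_text())
    for key, entry in data.get("plugins", {}).items():
        # Assumed key format: "<plugin>@<marketplace>".
        if key.split("@", 1)[0] != plugin:
            continue
        # Assumed value shape: one record or a list of records with "installPath".
        records = entry if isinstance(entry, list) else [entry]
        for record in records:
            if isinstance(record, dict) and "installPath" in record:
                return Path(record["installPath"])
    return None
```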

#1.5.0 — 2026-02-06

#Added

Native Autonomous Execution — /kit-tools:execute-feature replaces the previous Ralph integration
- Three execution modes: Supervised, Autonomous, and Guarded
- Supervised: in-session with user review between stories
- Autonomous: spawns independent claude -p sessions per story (unlimited retries by default)
- Guarded: autonomous with human oversight on failures (3 retries default)
Story Implementer Agent — agents/story-implementer.md implements a single user story
- Explores codebase, implements changes, self-verifies, commits
- Structured output format for orchestrator parsing
Story Verifier Agent — agents/story-verifier.md independently verifies acceptance criteria
- Skeptical assessment — reads actual code, doesn’t trust implementer claims
- Runs typecheck/lint/tests as specified in criteria
Execution Orchestrator — scripts/execute_orchestrator.py manages multi-session execution
- Spawns fresh Claude sessions per story (implementation + verification)
- Pause/resume via touch kit_tools/.pause_execution
- Dual-track state: PRD checkboxes + JSON sidecar
- Execution log at kit_tools/EXECUTION_LOG.md
Git Branch Isolation — All execution happens on feature/[prd-name] branches
- Failed retries reset the working tree and never touch main
- Branch ready for user review when all stories complete
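The following Python sketch shows the pause check, assuming the orchestrator polls for the flag between stories. `wait_while_paused` and the five-second poll interval are hypothetical. The flag path is the one given above.

```python
import time
from pathlib import Path
from typing import Callable

PAUSE_FLAG = Path("kit_tools") / ".pause_execution"


def wait_while_paused(
    project_dir: Path,
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block while kit_tools/.pause_execution exists; return True if we paused."""
    flag = project_dir / PAUSE_FLAG
    paused = False
    while flag.exists():
        paused = True
        sleep(poll_seconds)
    return paused
```

From a shell, `touch kit_tools/.pause_execution` pauses the run and `rm kit_tools/.pause_execution` lets it continue.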

#Changed

PRD Template — ralph_ready field renamed to session_ready
/kit-tools:plan-feature — Removed Ralph references, uses session_ready and execute-feature
/kit-tools:complete-implementation — Removed Ralph cleanup step, updated Related Skills

#Removed

/kit-tools:export-ralph — Replaced by native execute-feature
/kit-tools:import-learnings — Learnings captured natively during execution

#1.4.0 — 2025-02-02

#Added

Epic Detection & Decomposition — /kit-tools:plan-feature now detects large features and decomposes them
- Automatic detection of epic-sized scope (>7 stories, multiple subsystems, scope keywords)
- Proposes breakdown into multiple focused PRDs
- Tracks dependencies between related PRDs with depends_on field
Ralph-Ready Validation — /kit-tools:export-ralph validates PRD scope before export
- Checks story count (target <=7) and acceptance criteria count (target <=35)
- Soft warning with strong recommendation if PRD exceeds limits
- Suggests decomposition via plan-feature if PRD is too large
Senior Dev Persona — Skills now act as senior dev reviewers
- Push back on scope creep and poorly-scoped PRDs
- Ensure PRDs are set up for implementation success

#Changed

PRD Template — Updated to v1.1.0 with new frontmatter fields
- ralph_ready: true/false — Indicates if PRD is properly scoped
- depends_on: [] — Array of feature names this PRD depends on
- Added session-fit guidelines in template comments
/kit-tools:plan-feature — Enhanced with scope validation
- Final scope check before generating PRD
- Story count limits (5-7 ideal, 8+ triggers warning)
- Acceptance criteria limits (3-5 per story, <=35 total)
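The limits above translate directly into a check like the following Python sketch. `scope_warnings` is a hypothetical helper that takes one acceptance-criteria count per story. The thresholds are the ones listed in this release.

```python
from typing import List


def scope_warnings(criteria_per_story: List[int]) -> List[str]:
    """Check a PRD against the plan-feature limits; each entry is one story's criteria count."""
    warnings = []
    stories = len(criteria_per_story)
    if stories >= 8:
        warnings.append(f"{stories} stories (5-7 is ideal); consider an epic")
    for number, count in enumerate(criteria_per_story, start=1):
        if not 3 <= count <= 5:
            warnings.append(f"story {number}: {count} acceptance criteria (3-5 recommended)")
    total = sum(criteria_per_story)
    if total > 35:
        warnings.append(f"{total} acceptance criteria in total (limit 35)")
    return warnings
```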

#1.3.0 — 2025-02-01

#Added

PRD (Product Requirements Document) System — New workflow for feature planning
- kit_tools/prd/ directory for PRD files with YAML frontmatter
- kit_tools/prd/archive/ for completed PRDs
- PRD template with user stories (US-XXX), acceptance criteria, functional requirements (FR-X)
New Skill: /kit-tools:complete-implementation — Mark PRD as completed and archive it
New Skill: /kit-tools:export-ralph — Convert a KitTools PRD to Ralph’s prd.json format
New Skill: /kit-tools:import-learnings — Import learnings from Ralph’s progress.txt back into the PRD

#Changed

/kit-tools:plan-feature — Now generates PRDs (prd-[name].md) instead of FEATURE_TODO_*.md
- User story format with acceptance criteria
- Functional requirements in FR-X format
- Implementation Notes section for capturing learnings
/kit-tools:start-session — Now checks kit_tools/prd/ for active features
/kit-tools:close-session — Prompts for Implementation Notes when working on a PRD
/kit-tools:checkpoint — Captures learnings to active PRD’s Implementation Notes

#1.1.0 — 2025-01-28

#Added

New Skill: /kit-tools:validate-phase — Code quality, security, and intent alignment validation
- Three-pass review: quality & conventions, security, intent alignment
- Findings written to persistent AUDIT_FINDINGS.md with unique IDs and severity tracking
New Agent: code-quality-validator.md — Prompt template for the validation subagent
New Template: AUDIT_FINDINGS.md — Persistent audit findings log
- Status tracking (open / resolved / dismissed)
- Severity levels (critical / warning / info)
New Hook: detect_phase_completion — Advisory hook for TODO task completions

#Changed

/kit-tools:checkpoint — Added validation step for code changes
/kit-tools:close-session — Added validation step
/kit-tools:start-session — Reviews open audit findings

#1.0.0 — 2025-01-27

#Added

Initial public release
Core Skills: init-project, seed-project, migrate, start-session, close-session, checkpoint, plan-feature, sync-project, update-kit-tools
Automation Hooks: create_scratchpad, update_doc_timestamps, remind_scratchpad_before_compact, remind_close_session
Project Type Presets: API/Backend, Web App, Full Stack, CLI Tool, Library, Mobile, Custom
25+ Documentation Templates across Core, API, Ops, UI, and Patterns categories
