juliusbrussee/cavekit

@JuliusBrussee 1039

Drive spec-driven development in Claude Code with caveman encoding and automated validation.

spec-driven-developmentclaude-codeai-codingcaveman-encodingdevelopment-workflowpluginsddcoding

Install

$ npx skills add JuliusBrussee/cavekit

README

# GitHub Repository: JuliusBrussee/cavekit

**URL:** https://github.com/JuliusBrussee/cavekit
**Author:** JuliusBrussee
**Description:** A Claude Code plugin that turns natural language into blueprints, blueprints into parallel build plans, and build plans into working software with automated iteration, validation, and cross-model peer review.
**Homepage:** https://caveman.so/
**Language:** 

## Stats
- Stars: 1039
- Forks: 74
- Open Issues: 14
- Commits: 138
- Created: 2026-03-14T23:59:42Z
- Updated: 2026-06-18T14:56:45Z
- Pushed: 2026-06-18T06:00:50Z

## README
<h1 align="center">cavekit</h1>

<p align="center">
  <strong>compressed spec-driven development for claude code</strong><br/>
  <sub>one file · one loop · zero sub-agents</sub>
</p>

---

## what this is

Plan-then-execute forgets. SDD remembers — but most SDD frameworks bury
that value under agent swarms, dashboards, and ceremony that costs more
tokens than it saves.

Cavekit is the simplest full loop: **grill → spec → research → review →
build**, over one `SPEC.md` file, no sub-agents. Three commands you run
every time; four more you reach for only when the change earns it.

The spine is three properties that earn their tokens:

- **durable spec** — `SPEC.md` at repo root survives context resets. It is
  the agent's long-term memory: lose the window, reload the spec, keep going.
- **caveman encoding** — ~75% fewer tokens than prose. Symbols, fragments,
  pipe tables. All nine skill descriptions cost ~1.1k context — 16× lighter
  than spec-kit's 18.6k. That is the whole point.
- **backprop reflex** — every test failure becomes a `§B` entry; classes
  of bug become `§V` invariants the spec never forgets.

And one rule that keeps it from bloating into the frameworks it replaces:
**right-size**. A one-line fix is just `/build`. The full chain is for
genuinely uncertain or high-blast-radius work — never for a typo.

## commands

**the loop** — run these every time:

| cmd | job |
|---|---|
| `/ck:spec` | create / amend / backprop `SPEC.md`. Sole mutator. |
| `/ck:build` | native plan → execute against spec. Names which test proves each `§V`. Auto-backprops on failure. |
| `/ck:check` | read-only drift report. Lists §V / §I / §T violations. The drift detector. |

**reach for these** — only when the change earns the ceremony:

| cmd | job |
|---|---|
| `/ck:grill` | interrogate a fuzzy idea into a sharp `§G`/`§C`, one question at a time, before you spec. |
| `/ck:research` | gather external knowledge into `§R` so build grounds in facts, not hallucinations. Every finding cites a source. |
| `/ck:review` | adversarial senior review of the spec *before* build. Refutes, hardens `§V`, ends in a go/no-go gate. |
| `/ck:deepen` | spare-budget design pass — make one shallow module deep. Behavior held, tests green before & after. |

## install

One line, via the `skills` CLI:

```bash
npx skills add JuliusBrussee/cavekit
```

Installs nine skills into `~/.claude/skills/`: `spec`, `build`, `check`
(the loop), `grill`, `research`, `review`, `deepen` (reach-for), plus
`caveman` and `backprop` (the utilities). Claude activates each when its
trigger context matches — e.g. "write a spec for…" invokes `spec`, a fuzzy
idea invokes `grill`, a risky change before build invokes `review`. Claude
Code picks them up on next launch.

Or via the Claude Code marketplace (also adds the `/ck:spec`, `/ck:build`,
`/ck:check`, `/ck:grill`, `/ck:research`, `/ck:review`, `/ck:deepen` slash
commands):

```bash
/plugin marketplace add juliusbrussee/cavekit
/plugin install ck@cavekit
```

Or clone directly:

```bash
git clone https://github.com/juliusbrussee/cavekit.git ~/.claude/plugins/cavekit
```

## format

See [`FORMAT.md`](./FORMAT.md). Sections: §G goal, §C constraints, §I
interfaces, §R research (optional, pipe table), §V invariants, §T tasks
(pipe table), §B bugs (pipe table). Each verb owns specific sections —
no verb rewrites a section it does not own.

## files

```
FORMAT.md             spec schema + caveman encoding + sectioned ownership
commands/             seven thin slash-command entry points → the skills (loop + reach-for)
skills/spec           spec mutator — sole writer
skills/build          plan-execute, verification contract
skills/check          drift report
skills/grill          sharpen a fuzzy idea → §G/§C before spec
skills/research       external knowledge → §R, every finding sourced
skills/review         adversarial senior review of the spec → hardens §V
skills/deepen         spare-budget design pass — make one module deep
skills/caveman        encoding utility
skills/backprop       bug → spec protocol (six steps)
```

## non-goals

- no sub-agents. Main Claude does the work.
- no dashboards. `cat SPEC.md` is the dashboard.
- no parallel workers. One thread, one spec, one diff.
- no JSON / YAML spec bodies. Markdown + pipe tables.
- no hooks, no orchestration binaries, no TypeScript helpers.

---

## older cavekit (the Hunt lifecycle, v3.1.0 and earlier)

The previous generation is **not deprecated** — it is frozen at tag
[`v3.1.0`](https://github.com/juliusbrussee/cavekit/tree/v3.1.0) and
remains a fully working plugin.

**What it is**:

> Spec-driven AI development with an autonomous execution loop. Four-command
> Hunt lifecycle (`/ck:sketch` → `/ck:map` → `/ck:make` → `/ck:check`),
> plus `/ck:ship`, `/ck:review`, `/ck:revise`, `/ck:status`, `/ck:design`,
> `/ck:research`, `/ck:init`, `/ck:config`, `/ck:resume`, `/ck:help` — 16
> slash commands total. 12 named sub-agents. Per-task token budgets,
> stop-hook state machine, model-tier routing, auto-backpropagation from
> test failures, tool-result caching, Codex peer review, Karpathy
> behavioral guardrails, caveman token compression, knowledge-graph
> integration, and design-system enforcement. Parallel wave execution and
> team mode.

**Pick v3.1.0** if you want the full autonomous loop, parallel agents,
peer review, or design-system workflow. **Pick v4** if you want the
distilled loop — one spec, no orchestration, right-sized ceremony.

### install the older version

Marketplace:

```bash
/plugin marketplace add juliusbrussee/[email protected]
/plugin install ck@cavekit
```

Git:

```bash
git clone -b v3.1.0 https://github.com/juliusbrussee/cavekit.git
```

Full docs live at the tag — `git checkout v3.1.0` and read the README
there for command reference, skill catalog, and the Hunt lifecycle guide.

### choosing, or moving

See [`UPGRADE.md`](./UPGRADE.md). Honest framing:
- Stay on v3.1.0 if your project has active `context/kits/` investment.
- Move to v4 if you want fewer moving parts and smaller token bills.
- It is a **two-way door** — `SPEC.md` is plain markdown; nothing traps
  you in either direction.

## ecosystem

Cavekit is one rock in the caveman family:

| repo | what |
|---|---|
| [caveman](https://github.com/JuliusBrussee/caveman) | output compression skill — *why use many token when few do trick* |
| [cavemem](https://github.com/JuliusBrussee/cavemem) | cross-agent persistent memory — *why agent forget when agent can remember* |
| **cavekit** *(you here)* | spec-driven build loop — *why agent guess when agent can know* |
| [cavegemma](https://github.com/JuliusBrussee/finetune-caveman) | Gemma 4 31B fine-tuned on caveman pairs — *why prompt every turn when weight remember* |

## philosophy

> The spec is the only artifact that earns its tokens. Everything else
> that costs tokens must either save more tokens later, or the user's
> attention, or it gets cut.

See [`CHANGELOG.md`](./CHANGELOG.md) for the full v3 → v4 break.

## license

MIT.

Information

Repository

JuliusBrussee/cavekit

Language

Unknown

Created

2026/6/18

Updated

2026/6/18

Homepage

https://github.com/JuliusBrussee/cavekit