APE builds APE.

This page preserves the original bootstrap thesis from the period when APE was the system's working name. Inquiry is the current public identity; "APE builds APE" survives here only as historical shorthand for the self-hosting bootstrap and the lore of the sub-agents.

Four stages of the bootstrap

ape_cli evolved from a mental model into a machine-verifiable contract. The versions map to a discrete progression — each stage earned the next by surviving real use.

  1. Stage 1 pre-v0.0.1

    Implicit APE

    The author directed a default AI coding agent stage-by-stage, manually enforcing the Analyze → Plan → Execute cycle through conversational discipline. No tooling existed. The methodology lived entirely in a human's mental model.

    Evidence: early commit history, unstructured conversations.
  2. Stage 2 v0.0.1 — v0.0.5

    Prompt as methodology

    The mental model was codified into a prompt. ape.agent.md formalized states, transitions, and sub-agent roles. The prompt became the transition function — executable, if imperfectly.

    Evidence: first versions of ape.agent.md, commit diffs of prompt evolution.
  3. Stage 3 v0.0.6 — v0.0.10

    Custom agent

    Deploy infrastructure (iq host get) stabilized as a single-host Copilot deployment. The cycle became self-enforcing — the agent refused to skip states, demanded issue numbers, required user gates. The system began constraining its own development.

    Evidence: early cycle versions used docs/issues/NNN-slug/ artifacts; current cycles use cleanrooms/NNN-slug/.
  4. Stage 4 v0.0.11+

    CLI + contract

    Runtime infrastructure arrived: FSM transition contract (YAML), programmatic transitions with precondition validation (iq fsm transition), declarative effects, and evolution infrastructure (.inquiry/config.yaml, cleanrooms/<branch>/mutations.md). The contract says what's legal; tests prove the contract holds. The system now named Inquiry is used to build itself — the original bootstrap thesis called this "APE builds APE" because APE was the project's working name at the time.

    Evidence: transition_contract.yaml, passing CLI + VS Code suites, tagged releases, and the issue/PR history.

Evidence today

Snapshot from the current repository state. Everything listed here is grounded in the repo, its tags, or the validated package test surfaces.

45
tagged versions
current git history
512
tests passing
CLI + VS Code packages
13
CLI commands
across 5 root modules
5
active agents
down from 9 in lore
69+
issues & PRs
every change through the cycle
v0.0.7+
with cycle artifacts
cleanrooms/NNN-slug/ today

What DARWIN has produced

When evolution.enabled is true, DARWIN reads the cycle's artifacts and files concrete mutation proposals as GitHub issues. The collapse from the original lore roster into today's live five was driven by these proposals — every deprecation is logged, with reasoning.

Example. Issue #54 proposes a change to how EXECUTE interacts with the test runner, citing three cycles where the same inefficiency appeared. The proposal is public, debatable, and subject to the same review process as any other issue. Nothing is filed silently.

The mutations aren't theoretical. The live roster — DEWEY, SOCRATES, DESCARTES, ADA, DARWIN — exists as it does because DARWIN proposed absorbing MARCOPOLO into SOCRATES, replacing SUNZI with DESCARTES's method, restoring ADA as EXECUTE's operator once the project learned that workflow belongs in phase contracts, and eventually making IDLE's operator explicit as DEWEY.

What's still missing

Honest accounting. The bootstrap is empirical but incomplete. Three gaps block the paper.

1. Structured per-cycle metrics
The cycle produces artifacts, but there's no machine-readable metrics.yaml capturing time-to-plan, plan completion rate, test pass delta, or reviewer overrides. Roadmap item #72.
2. Thirty clean cycles of data
Current reproducibility score: 2/10. Early cycles ran before the contract stabilized; only post-v0.0.11 cycles qualify as clean. Thirty is a minimum for statistical claims.
3. A test matrix across hosts
Single-host MVP today (Copilot). Adapters exist for Claude Code, Codex, Crush, and Gemini per ADR D20, but aren't wired in. The methodology > model thesis demands comparison across hosts — ideally including a local 7B — to be testable at all.

Read the full plan

The live repository now keeps its forward-looking doctrine in the architecture, roadmap, and active specs. This page remains as a public research narrative rather than an index into an internal research tree.

docs/roadmap.md

Go deeper