APE builds APE.

This page preserves the original bootstrap thesis from the period when APE was the system's working name. Inquiry is the current public identity; "APE builds APE" survives here only as historical shorthand for the self-hosting bootstrap and the lore of the sub-agents.

Four stages of the bootstrap

ape_cli evolved from a mental model into a machine-verifiable contract. The versions map to a discrete progression — each stage earned the next by surviving real use.

  1. Stage 1 (pre-v0.0.1)

    Implicit APE

    The author directed a default AI coding agent stage-by-stage, manually enforcing the Analyze → Plan → Execute cycle through conversational discipline. No tooling existed. The methodology lived entirely in a human's mental model.

    Evidence: early commit history, unstructured conversations.
  2. Stage 2 (v0.0.1 – v0.0.5)

    Prompt as methodology

    The mental model was codified into a prompt. ape.agent.md formalized states, transitions, and sub-agent roles. The prompt became the transition function — executable, if imperfect.

    Evidence: first versions of ape.agent.md, commit diffs of prompt evolution.
  3. Stage 3 (v0.0.6 – v0.0.10)

    Custom agent

    Deploy infrastructure (iq target get) stabilized as a single-target Copilot deployment. The cycle became self-enforcing — the agent refused to skip states, demanded issue numbers, required user gates. The system began constraining its own development.

    Evidence: early cycle versions used docs/issues/NNN-slug/ artifacts; current cycles use cleanrooms/NNN-slug/.
  4. Stage 4 (v0.0.11+)

    CLI + contract

    Runtime infrastructure arrived: FSM transition contract (YAML), programmatic transitions with precondition validation (iq fsm transition), declarative effects, and evolution infrastructure (.inquiry/config.yaml, .inquiry/mutations.md). The contract says what's legal; tests prove the contract holds. The system now named Inquiry is used to build itself — the original bootstrap thesis called this "APE builds APE" because APE was the project's working name at the time.

    Evidence: transition_contract.yaml, 131 passing tests, 12 GitHub releases, 69+ issues and PRs.
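
To make the contract idea concrete, here is a sketch of what one entry in a file like transition_contract.yaml might look like. The field names and structure are assumptions for illustration only — the real schema lives in the repository and may differ.

```yaml
# Hypothetical sketch of a single FSM transition entry.
# Field names are illustrative, not the actual schema.
transitions:
  - from: PLAN
    to: EXECUTE
    preconditions:
      - issue_number_present      # the agent demands an issue number
      - plan_artifact_committed   # PLAN must leave an auditable artifact
      - user_gate_approved        # a human signs off before EXECUTE
    effects:
      - create_cleanroom_dir      # e.g. a cleanrooms/NNN-slug/ workspace
```

Under this shape, a command such as iq fsm transition would check the preconditions before allowing the state change and apply the declared effects afterward — the validation mechanism is what the text describes; the YAML layout is a sketch.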

Evidence today

Numbers after two months. Everything is in git history and the GitHub API — auditable by anyone who wants to check.

- 14 versions shipped (12 GitHub releases)
- 131 tests passing, cross-platform
- 9 CLI commands across 3 modules
- 4 active agents (down from 9 in lore)
- 69+ issues & PRs, every change through the cycle
- v0.0.7+ with cycle artifacts (cleanrooms/NNN-slug/ today)

What DARWIN has produced

When evolution.enabled is true, DARWIN reads the cycle's artifacts and files concrete mutation proposals as GitHub issues. The collapse from nine lore apes to four active ones was driven by these proposals — every deprecation is logged, with reasoning.

Example. Issue #54 proposes a change to how EXECUTE interacts with the test runner, citing three cycles where the same inefficiency appeared. The proposal is public, debatable, and subject to the same review process as any other issue. Nothing is filed silently.

The mutations aren't theoretical. The four active apes exist as they do because DARWIN proposed absorbing MARCOPOLO into SOCRATES, replacing SUNZI with DESCARTES's method, and deprecating ADA's TDD phase in favor of BASHŌ's techne — and the maintainer accepted those proposals based on cycle evidence.

What's still missing

Honest accounting. The bootstrap is empirical but incomplete. Three gaps block the paper.

1. Structured per-cycle metrics
The cycle produces artifacts, but there's no machine-readable metrics.yaml capturing time-to-plan, plan completion rate, test pass delta, or reviewer overrides. Roadmap item #72.
2. Thirty clean cycles of data
Current reproducibility score: 2/10. Early cycles ran before the contract stabilized; only post-v0.0.11 cycles qualify as clean. Thirty is a minimum for statistical claims.
3. A test matrix across targets
Single-target MVP today (Copilot). Adapters exist for Claude Code, Codex, Crush, and Gemini per ADR D20, but aren't wired in. The methodology > model thesis demands comparison across targets — ideally including a local 7B — to be testable at all.

Read the full plan

The complete research plan — thesis, data sources, metrics to graph, statistical methodology — lives in the repository. It's a working document; the shape is stable, the numbers are still accumulating.

docs/research/ape_builds_ape/bootstrap-validation.md
