GeneralStaff

Open-source autonomous engineering for solo founders.

Your code. Your keys. Your control.

Status: v0.3.0 shipped · License: AGPL-3.0 · Access: BYOK · local-first

Repo: github.com/lerugray/generalstaff. Tagged v0.1.0 on 2026-04-19 after a four-day build. v0.2.0 added the usage-budget gate and multi-agent orchestration tooling. v0.3.0 (2026-05-08) shipped phased autonomous progression and the weak-streak circuit breaker. 2,030 passing tests across 69 files. Thirty-plus managed projects in rotation.

A meta-dispatcher that runs Claude Code agents on your local projects with a verification gate that cannot be prompted around, mandatory hands-off lists, and a full audit log of every prompt, response, and diff. The principled alternative to closed-source SaaS bot platforms.

The problem

Autonomous coding agents fail in one predictable way: they are industrious without judgment. Closed SaaS platforms and naive claude -p loops let agents confidently mark tasks as done when tests fail, diffs are empty, or scope was hallucinated. Polsia's top one-star review complaint on Trustpilot is false task completions. The damage compounds quietly because nobody is checking the bot's work against reality.

The approach

dispatcher → engineer → verification gate → reviewer → audit log

Verification gate. A Boolean check in the dispatcher. Tests must pass, diff must be non-empty, reviewer must confirm scope match. A cycle is not marked done until all three hold. This is not a prompt — it is code, and it fires on every cycle.
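A minimal sketch of what such a gate can look like, in Python. The `CycleResult` type and its field names are hypothetical, chosen for illustration; this is not GeneralStaff's actual code, only the shape of a three-condition Boolean check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleResult:
    """Hypothetical record of one cycle's outcome (field names are illustrative)."""
    tests_passed: bool
    diff: str
    reviewer_scope_match: bool


def gate_passes(result: CycleResult) -> bool:
    """Return True only when all three conditions hold."""
    # A diff made of whitespace only counts as empty in this sketch.
    diff_non_empty = bool(result.diff.strip())
    return result.tests_passed and diff_non_empty and result.reviewer_scope_match


print(gate_passes(CycleResult(True, "+ fix off-by-one\n", True)))  # True
print(gate_passes(CycleResult(True, "", True)))                    # False: empty diff
```

Because the check is an ordinary function over recorded facts, nothing the agent writes in its response can change the result.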

Hands-off lists. Per-project glob patterns the bot must not touch. Violations are caught by the reviewer and surfaced as true negatives. Empty list equals no registration.
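A rough sketch of the two behaviours described here: refusing to register a project with an empty list, and flagging changed paths that match a hands-off pattern. The function names are hypothetical, and the matching uses Python's `fnmatch` semantics as an assumption; the glob dialect GeneralStaff actually uses may differ.

```python
from fnmatch import fnmatchcase


def register_project(name: str, hands_off: list[str]) -> dict:
    """Refuse registration when no hands-off patterns are supplied."""
    if not hands_off:
        raise ValueError(f"project {name!r} needs a non-empty hands-off list")
    return {"name": name, "hands_off": list(hands_off)}


def hands_off_violations(changed_paths: list[str], patterns: list[str]) -> list[str]:
    """Return the changed paths that match any hands-off pattern.

    Note: in fnmatch, `*` also matches `/`, so `*.pem` matches at any depth.
    """
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


project = register_project("demo", ["secrets/*", "*.pem"])
changed = ["src/app.py", "secrets/prod.env", "certs/site.pem"]
print(hands_off_violations(changed, project["hands_off"]))
```

A non-empty result would be grounds for the reviewer to reject the cycle.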

Worktree isolation. The bot works in .bot-worktree on a bot/work branch. Your interactive work on master never conflicts with autonomous cycles. You review any cycle's diff before merging.
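For readers unfamiliar with git worktrees, the layout above can be reproduced by hand with standard git commands. The helper names below are hypothetical and this is only a sketch of the idea, not how GeneralStaff sets up its worktree internally.

```python
import subprocess


def worktree_add_command(
    branch_exists: bool,
    path: str = ".bot-worktree",
    branch: str = "bot/work",
) -> list[str]:
    """Build the `git worktree add` command for the bot's checkout."""
    if branch_exists:
        # Check out the existing branch into the worktree directory.
        return ["git", "worktree", "add", path, branch]
    # Create the branch and the worktree in one step.
    return ["git", "worktree", "add", "-b", branch, path]


def review_diff_command(base: str = "master", branch: str = "bot/work") -> list[str]:
    """Build the command that shows what the bot branch changed since it left base."""
    return ["git", "diff", f"{base}...{branch}"]


def run(cmd: list[str], repo_dir: str) -> None:
    """Run a command inside the repository, raising if git reports an error."""
    subprocess.run(cmd, cwd=repo_dir, check=True)


print(" ".join(worktree_add_command(branch_exists=False)))
print(" ".join(review_diff_command()))
```

The three-dot diff compares `bot/work` against its merge base with `master`, so interactive commits made on `master` in the meantime do not show up as bot changes.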

BYOK billing. You pay Anthropic, OpenRouter, or whoever directly. No platform credits, no SaaS middleman, no revenue share.

Open audit log. Every prompt, response, tool call, and diff in state/<project>/PROGRESS.jsonl. Fully reviewable after the fact.
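A JSONL log of this kind is append-only, one JSON object per line, and easy to read back with standard tools. The sketch below assumes hypothetical entry fields (`cycle`, `kind`, `body`); the real PROGRESS.jsonl schema is not described here and may look different.

```python
import json
import tempfile
from pathlib import Path


def append_entry(log_path: Path, entry: dict) -> None:
    """Append one JSON object as a single line, creating the directory if needed."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_entries(log_path: Path) -> list[dict]:
    """Read the whole log back, skipping blank lines."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Demo against a temporary directory; the field names are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "state" / "demo-project" / "PROGRESS.jsonl"
    append_entry(log, {"cycle": 1, "kind": "prompt", "body": "Fix the failing test"})
    append_entry(log, {"cycle": 1, "kind": "diff", "body": "+ return n - 1"})
    entries = read_entries(log)

print(len(entries))  # 2
```

Since each line stands alone, the log can also be filtered after the fact with line-oriented tools without loading the whole file.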

The Hammerstein framing

The name is borrowed from Kurt von Hammerstein-Equord's officer typology. A general staff handles execution and dispatch on behalf of command: it does not set strategy, it makes sure strategy gets executed without dropping the plates. Hammerstein's warning was about the stupid-plus-industrious quadrant: confident officers without judgment. He argued they must be dismissed at once, because they cause unbounded damage. Autonomous coding agents without verification gates live in that quadrant. GeneralStaff's architecture (verification gate, hands-off lists, default-off creative roles, open audit log) structurally prevents that failure. The architecture is the philosophy.

Full framework: Von Hammerstein's Ghost, In Daily Use. Why the verification gate is code and not a prompt: Boolean Gates, Not Prompts.

Hard rules

All ten rules are enforced either in code or by convention. They cannot be relaxed without an explicit rule-relaxation log file committed alongside the change.

1. No creative work delegation by default. Engineering and correctness work only. Creative agents are opt-in plugins with explicit warnings.
2. File-based state as single source of truth. No databases, no SaaS orchestration. A local desktop UI is permitted as a viewer and controller.
3. Sequential cycles by default. Parallel worktrees are opt-in per project via max_parallel_slots.
4. Auto-merge off by default. Users opt in per project after five clean verification-passing cycles.
5. Mandatory hands-off lists. Empty list equals no registration.
6. Verification gate is load-bearing. A cycle is not done until tests pass, the diff is non-empty, and the reviewer confirms scope match.
7. Code ownership. The bot only pushes to bot/work on your own git remote. Export equals git clone.
8. BYOK for LLM providers. API-key default; subscription support is opt-in personal-use only.
9. Open audit log. Full prompts, responses, tool calls, and diffs in PROGRESS.jsonl per cycle.
10. Local-first. No SaaS tier, no managed offering, no GeneralStaff-the-company hosting.
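
The auto-merge rule above is a good example of a rule that reduces to a small check. The sketch below is an assumption about how it could be expressed: it reads "five clean cycles" as the five most recent cycles all passing verification, and the function and parameter names are hypothetical.

```python
def auto_merge_allowed(
    opted_in: bool,
    cycle_history: list[bool],
    required_clean: int = 5,
) -> bool:
    """Allow auto-merge only for opted-in projects with a clean recent streak.

    cycle_history holds one bool per cycle, oldest first; True means the
    cycle passed the verification gate.
    """
    if not opted_in:
        return False  # off by default
    if len(cycle_history) < required_clean:
        return False  # not enough history yet
    return all(cycle_history[-required_clean:])


print(auto_merge_allowed(True, [True] * 5))            # True
print(auto_merge_allowed(True, [True] * 4 + [False]))  # False
print(auto_merge_allowed(False, [True] * 10))          # False: never opted in
```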

Roadmap

Phase 1 — Sequential MVP. Verification gate, reviewer, open audit log. Closed 2026-04-17.
Phase 2 — Multi-provider routing. Ollama, OpenRouter, Claude. Digest narrative, provider registry. Closed 2026-04-17.
Phase 3 — Dispatcher generality. Second managed project (gamr), five generality gaps surfaced and shipped same-day. Closed 2026-04-18.
Phase 4 — Parallel worktrees. Round-based concurrency, per-provider reviewer semaphore, efficiency observability. Default max_parallel_slots: 1 preserves Phase 1–3 behaviour. Closed 2026-04-18.
Phase 5 — Terminal dashboards. Visual anchor for multi-project state. Closed 2026-04-19.
Phase 6 — Local web dashboard. generalstaff serve — fleet overview, per-project drill-down, single-cycle detail, live session tail via SSE, attention inbox. Closed 2026-04-19.
Phase 7 — Pluggable engineer. engineer_provider: aider routes cycles through aider + OpenRouter (Qwen 3.6+ Plus) instead of claude -p. 10-task benchmark cleared 80% verified. Closed 2026-04-20.
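
The Phase 4 concurrency model (a cap on parallel slots plus a per-provider reviewer semaphore) can be illustrated with a small asyncio sketch. Everything here is an assumption made for illustration: the function name, the `(project, provider)` tuples, and the sleeps standing in for engineer and reviewer work. Only the default of one slot comes from the roadmap entry.

```python
import asyncio


async def run_round(
    projects: list[tuple[str, str]],
    max_parallel_slots: int = 1,
    reviewer_limit: int = 1,
) -> tuple[list[str], int]:
    """Run one round of cycles; return finished project names and peak concurrency."""
    slots = asyncio.Semaphore(max_parallel_slots)
    reviewers: dict[str, asyncio.Semaphore] = {}  # one semaphore per provider
    active = 0
    peak = 0

    async def cycle(name: str, provider: str) -> str:
        nonlocal active, peak
        async with slots:
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for the engineer step
            gate = reviewers.setdefault(provider, asyncio.Semaphore(reviewer_limit))
            async with gate:
                await asyncio.sleep(0.01)  # stand-in for the reviewer step
            active -= 1
        return name

    finished = await asyncio.gather(*(cycle(n, p) for n, p in projects))
    return list(finished), peak


fleet = [("alpha", "claude"), ("beta", "claude"), ("gamma", "ollama"), ("delta", "ollama")]
names, peak = asyncio.run(run_round(fleet))
print(names, peak)
```

With the default of one slot the cycles run strictly one after another, which is how an opt-in parallel mode can leave earlier sequential behaviour untouched.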

Post-launch

Work after v0.1.0 stopped fitting the phase-numbered narrative; subsequent features are organized by release. The full enumeration is in the CHANGELOG. Headline additions:

v0.2.0 (2026-05-02). Usage-budget gate (per-project + fleet session_budget; reads Claude Code's own 5-hour blocks via ccusage). Multi-agent orchestration tooling (four tiers from in-process subagents through detached visible cmd windows). AGENTS.md wizard (the cross-platform agent-config standard adopted by Claude Code, Cursor, Aider, Codex, and others). gs welcome first-run wizard. Claude subscription auth (Pro / Max users no longer require a separate API key). Mac / Linux session launcher. gs shim install to ~/.local/bin.
v0.3.0 (2026-05-08). Phased autonomous progression (gs phase, ROADMAP.yaml schema, opt-in auto-advance, multi-phase rollback, task templates with placeholder expansion, dashboard /phase route + commander advance button, plus phase evaluators launch_gate / git_tag / lifecycle_transition). Weak-streak circuit breaker plus gs inventory-audit CLI for fleet starvation diagnosis. Configurable consecutive-empty limits, structured engineer task-claim, greenfield work-detection fallback, full usage-budget integration test coverage. Plus QUICKSTART.md and SECURITY.md for external adopters.
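
For readers new to the pattern, a circuit breaker over consecutive weak cycles can be sketched in a few lines. This is an illustration under stated assumptions, not the shipped implementation: the class name, the default limit of 3, and the idea that a "weak" cycle is one with an empty diff or a failed gate are all hypothetical.

```python
class WeakStreakBreaker:
    """Stop dispatching to a project after too many consecutive weak cycles."""

    def __init__(self, limit: int = 3) -> None:
        if limit < 1:
            raise ValueError("limit must be at least 1")
        self.limit = limit
        self.streak = 0

    @property
    def is_open(self) -> bool:
        """True when the streak has reached the limit and dispatch should stop."""
        return self.streak >= self.limit

    def record(self, cycle_was_weak: bool) -> bool:
        """Record one cycle; any productive cycle resets the streak."""
        self.streak = self.streak + 1 if cycle_was_weak else 0
        return self.is_open


breaker = WeakStreakBreaker(limit=3)
for weak in [True, True, False, True, True, True]:
    tripped = breaker.record(weak)
print(tripped)  # True: the last three cycles were all weak
```

Resetting on any productive cycle keeps an occasional empty diff from pausing a project; only a sustained run trips the breaker.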

The Hammerstein companion

The framework that names this project also has a working executable. The Hammerstein CLI is a strategic-reasoning advisor that pressure-tests a plan before it ships. I run it before firing any plan with multi-file scope or cross-repo blast radius. Its small-model companion, the Hammerstein-7B QLoRA on Qwen2.5-7B-Instruct, bakes the framework's voice into the weights of an 8 GB-Mac-runnable artifact. Both are optional. GeneralStaff runs cleanly without them. Using them together adds a strategic gate at the plan-firing step, so a bad plan can be caught before a cycle starts rather than only when the verification gate sees a failure inside one.

How to follow

The repository is public at github.com/lerugray/generalstaff. For context on the four-day build, see GeneralStaff, from the agent side — Claude's report from inside the verification gate on launch day. The fastest way to reach me is lerugray@gmail.com.