GeneralStaff
Open-source autonomous engineering for solo founders.
Your code. Your keys. Your control.
Repo: github.com/lerugray/generalstaff. Tagged v0.1.0 on 2026-04-19 after a four-day build. v0.2.0 added the usage-budget gate and multi-agent orchestration tooling. v0.3.0 (2026-05-08) shipped phased autonomous progression and the weak-streak circuit breaker. 2,030 passing tests across 69 files. Thirty-plus managed projects in rotation.
A meta-dispatcher that runs Claude Code agents on your local projects with a verification gate that cannot be prompted around, mandatory hands-off lists, and a full audit log of every prompt, response, and diff. The principled alternative to closed-source SaaS bot platforms.
The problem
Autonomous coding agents fail in one predictable way: they are industrious without judgment. Closed SaaS platforms and naive claude -p loops let agents confidently mark tasks as done when tests fail, diffs are empty, or scope was hallucinated. The top complaint in Polsia's one-star Trustpilot reviews is false task completions. The damage compounds quietly because nobody is checking the bot's work against reality.
The approach
dispatcher → engineer → verification gate → reviewer → audit log
Verification gate. A Boolean check in the dispatcher: tests must pass, the diff must be non-empty, and the reviewer must confirm scope match. A cycle is not marked done until all three hold. This is not a prompt; it is code, and it fires on every cycle.
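The three conditions above reduce to a single conjunction. The sketch below is a minimal illustration in Python, not GeneralStaff's actual code: the CycleResult type, its field names, and the gate_passes function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CycleResult:
    """Hypothetical summary of one cycle; field names are illustrative."""
    tests_passed: bool
    diff: str
    reviewer_confirms_scope: bool


def gate_passes(result: CycleResult) -> bool:
    """Return True only when all three gate conditions hold."""
    # A diff that contains only whitespace is treated as empty (an assumption).
    diff_non_empty = bool(result.diff.strip())
    return result.tests_passed and diff_non_empty and result.reviewer_confirms_scope
```

Because the check is an ordinary function over recorded facts, nothing the agent writes in its response can change the outcome; only the test run, the diff, and the reviewer verdict can.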
Hands-off lists. Per-project glob patterns the bot must not touch. Violations are caught by the reviewer and surfaced as true negatives. Empty list equals no registration.
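A hands-off check can be sketched as a glob match over the paths a cycle changed. This is an illustration only: register_project and hands_off_violations are hypothetical names, and the sketch assumes Python's fnmatch semantics, which may differ from the glob dialect GeneralStaff actually uses.

```python
from fnmatch import fnmatchcase


def register_project(name: str, hands_off: list[str]) -> dict:
    """Refuse registration when the hands-off list is empty."""
    if not hands_off:
        raise ValueError(f"project {name!r} needs a non-empty hands-off list")
    return {"name": name, "hands_off": list(hands_off)}


def hands_off_violations(changed_paths: list[str], patterns: list[str]) -> list[str]:
    """Return the changed paths that match any hands-off pattern.

    Note: in fnmatch, '*' also matches '/' characters, so 'secrets/*'
    covers nested files as well. That is a property of this sketch.
    """
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]
```

For example, with patterns ["secrets/*", "*.pem"], a cycle that touched src/app.py and secrets/prod.env would report secrets/prod.env as the only violation.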
Worktree isolation. The bot works in .bot-worktree on a bot/work branch. Your interactive work on master never conflicts with autonomous cycles. You review any cycle's diff before merging.
BYOK billing. You pay Anthropic, OpenRouter, or whoever directly. No platform credits, no SaaS middleman, no revenue share.
Open audit log. Every prompt, response, tool call, and diff in state/<project>/PROGRESS.jsonl. Fully reviewable after the fact.
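An append-only JSONL log of this kind is simple to write and to replay. The sketch below assumes one JSON object per line under state/<project>/PROGRESS.jsonl; the function names and the event fields (kind, text) are hypothetical and not the project's real schema.

```python
import json
from pathlib import Path


def append_audit_event(state_root: Path, project: str, event: dict) -> Path:
    """Append one event as a single JSON line and return the log path."""
    log_path = state_root / project / "PROGRESS.jsonl"
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event, sort_keys=True) + "\n")
    return log_path


def read_audit_log(log_path: Path) -> list[dict]:
    """Read the log back in order, skipping blank lines."""
    with log_path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Because each line is an independent JSON document, the file can be reviewed with ordinary text tools and replayed in order without a database.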
The Hammerstein framing
The name is borrowed from Kurt von Hammerstein-Equord's officer typology. A general staff handles execution and dispatch on behalf of command — they don't make strategy, they make sure strategy gets executed without dropping the plates. Hammerstein's warning was about the stupid-plus-industrious quadrant: confident officers without judgment. He argued they must be dismissed at once, because they cause unbounded damage. Autonomous coding agents without verification gates live in that quadrant. GeneralStaff's architecture — verification gate, hands-off lists, default-off creative roles, open audit log — structurally prevents it. The architecture is the philosophy.
Full framework: Von Hammerstein's Ghost, In Daily Use. Why the verification gate is code and not a prompt: Boolean Gates, Not Prompts.
Hard rules
All ten rules are enforced either in code or by convention. They cannot be relaxed without an explicit rule-relaxation log file committed alongside the change.
- No creative work delegation by default. Engineering and correctness work only. Creative agents are opt-in plugins with explicit warnings.
- File-based state as single source of truth. No databases, no SaaS orchestration. A local desktop UI is permitted as a viewer and controller.
- Sequential cycles by default. Parallel worktrees are opt-in per project via max_parallel_slots.
- Auto-merge off by default. Users opt in per project after five clean verification-passing cycles.
- Mandatory hands-off lists. Empty list equals no registration.
- Verification gate is load-bearing. A cycle is not done until tests pass, the diff is non-empty, and the reviewer confirms scope match.
- Code ownership. The bot only pushes to bot/work on your own git remote. Export equals git clone.
- BYOK for LLM providers. API-key default; subscription support is opt-in personal-use only.
- Open audit log. Full prompts, responses, tool calls, and diffs in PROGRESS.jsonl per cycle.
- Local-first. No SaaS tier, no managed offering, no GeneralStaff-the-company hosting.
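The default-off rules can be pictured as per-project settings whose defaults are the conservative choice. The sketch below is illustrative: max_parallel_slots and its default of 1 come from the text, while auto_merge, ProjectConfig, and may_auto_merge are hypothetical names. It also assumes that "five clean cycles" means the five most recent cycles in a row, which the rules do not spell out.

```python
from dataclasses import dataclass

CLEAN_CYCLES_REQUIRED = 5  # taken from the auto-merge rule above


@dataclass
class ProjectConfig:
    """Hypothetical per-project settings with conservative defaults."""
    max_parallel_slots: int = 1  # sequential cycles unless raised
    auto_merge: bool = False     # off until the user opts in


def may_auto_merge(config: ProjectConfig, gate_results: list[bool]) -> bool:
    """Allow auto-merge only after opt-in and five consecutive clean cycles."""
    if not config.auto_merge:
        return False
    recent = gate_results[-CLEAN_CYCLES_REQUIRED:]
    return len(recent) == CLEAN_CYCLES_REQUIRED and all(recent)
```

Opting in is necessary but not sufficient in this sketch: a single failed gate within the last five cycles blocks auto-merge again.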
Roadmap
- Phase 1 — Sequential MVP. Verification gate, reviewer, open audit log. Closed 2026-04-17.
- Phase 2 — Multi-provider routing. Ollama, OpenRouter, Claude. Digest narrative, provider registry. Closed 2026-04-17.
- Phase 3 — Dispatcher generality. Second managed project (gamr), five generality gaps surfaced and shipped same-day. Closed 2026-04-18.
- Phase 4 — Parallel worktrees. Round-based concurrency, per-provider reviewer semaphore, efficiency observability. Default max_parallel_slots: 1 preserves Phase 1–3 behaviour. Closed 2026-04-18.
- Phase 5 — Terminal dashboards. Visual anchor for multi-project state. Closed 2026-04-19.
- Phase 6 — Local web dashboard. generalstaff serve: fleet overview, per-project drill-down, single-cycle detail, live session tail via SSE, attention inbox. Closed 2026-04-19.
- Phase 7 — Pluggable engineer. engineer_provider: aider routes cycles through aider + OpenRouter (Qwen 3.6+ Plus) instead of claude -p. 10-task benchmark cleared 80% verified. Closed 2026-04-20.
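The per-provider reviewer semaphore from Phase 4 can be pictured as one concurrency limit per provider name. The sketch below uses Python's asyncio and is an assumption about the general shape only: review_all, the job tuples, and the sleep standing in for a reviewer call are all hypothetical.

```python
import asyncio
from collections import defaultdict


async def review_all(jobs, per_provider_limit=1):
    """Review (provider, cycle_id) jobs with a separate limit per provider.

    Returns the reviewed cycle ids in input order plus the peak number of
    concurrent reviews observed for each provider.
    """
    semaphores = defaultdict(lambda: asyncio.Semaphore(per_provider_limit))
    active = defaultdict(int)
    peak = defaultdict(int)

    async def review(provider, cycle_id):
        async with semaphores[provider]:
            active[provider] += 1
            peak[provider] = max(peak[provider], active[provider])
            await asyncio.sleep(0.01)  # stand-in for the real reviewer call
            active[provider] -= 1
        return cycle_id

    reviewed = await asyncio.gather(*(review(p, c) for p, c in jobs))
    return list(reviewed), dict(peak)
```

With the default limit of 1, two reviews for the same provider run one after the other, while reviews for different providers can overlap.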
Post-launch
Work after v0.1.0 stopped fitting the phase-numbered narrative; subsequent features are organized by release. The full enumeration is in the CHANGELOG. Headline additions:
- v0.2.0 (2026-05-02). Usage-budget gate (per-project + fleet session_budget; reads Claude Code's own 5-hour blocks via ccusage). Multi-agent orchestration tooling (four tiers from in-process subagents through detached visible cmd windows). AGENTS.md wizard (the cross-platform agent-config standard adopted by Claude Code, Cursor, Aider, Codex, and others). gs welcome first-run wizard. Claude subscription auth (Pro / Max users no longer require a separate API key). Mac / Linux session launcher. gs shim install to ~/.local/bin.
- v0.3.0 (2026-05-08). Phased autonomous progression (gs phase, ROADMAP.yaml schema, opt-in auto-advance, multi-phase rollback, task templates with placeholder expansion, dashboard /phase route + commander advance button, plus phase evaluators launch_gate / git_tag / lifecycle_transition). Weak-streak circuit breaker plus gs inventory-audit CLI for fleet starvation diagnosis. Configurable consecutive-empty limits, structured engineer task-claim, greenfield work-detection fallback, full usage-budget integration test coverage. Plus QUICKSTART.md and SECURITY.md for external adopters.
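A circuit breaker with a configurable consecutive-empty limit can be sketched as a counter that resets on any productive cycle. The text does not define what counts as a weak cycle or what the default limit is, so WeakStreakBreaker, its limit of 3, and the "empty cycle" criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WeakStreakBreaker:
    """Opens after `limit` consecutive empty cycles (illustrative definition)."""
    limit: int = 3  # hypothetical default; the real default is not stated here
    streak: int = 0

    def record(self, cycle_was_empty: bool) -> bool:
        """Record one cycle outcome and return True if the breaker is now open."""
        # Any productive cycle resets the streak to zero.
        self.streak = self.streak + 1 if cycle_was_empty else 0
        return self.is_open

    @property
    def is_open(self) -> bool:
        return self.streak >= self.limit
```

A dispatcher built this way would stop scheduling a project once the breaker opens, rather than spending budget on cycles that keep producing nothing.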
The Hammerstein companion
The framework that names this project also has a working executable. The Hammerstein CLI is a strategic-reasoning advisor that pressure-tests a plan before it ships. I run it before firing any plan with multi-file scope or cross-repo blast radius. Its small-model companion, the Hammerstein-7B QLoRA on Qwen2.5-7B-Instruct, bakes the framework's voice into the weights of an 8 GB-Mac-runnable artifact. Both are optional. GeneralStaff runs cleanly without them. Used together, they add a strategic gate at the plan-firing step alongside the verification gate, so problems are caught before a plan fires rather than only when the gate sees a failure inside a cycle.
How to follow
The repository is public at github.com/lerugray/generalstaff. For context on the four-day build, see GeneralStaff, from the agent side — Claude's report from inside the verification gate on launch day. The fastest way to reach me is lerugray@gmail.com.