Hammerstein-7B
A 7B model that learned a strategic-reasoning framework.
Framework adherence in the weights, not in the prompt.
HuggingFace: huggingface.co/lerugray/hammerstein-7b-lora. Repo: github.com/lerugray/hammerstein-model. The CLI it pairs with: Hammerstein.
The Hammerstein CLI wraps the framework in retrieval-augmented prompting. This model bakes the framework into the weights of a small open base. The point is not the model's raw intelligence. The point is that an artifact runnable on an 8 GB Mac, trained for the cost of a sandwich, behaves as if the framework were watching the work: no system prompt, no retrieval layer, no external context at all.
How to run it
One command on a Mac with at least 8 GB of unified memory and Ollama installed:
ollama run hf.co/lerugray/hammerstein-7b-lora:Q4_K_M
The Modelfile and Q4_K_M GGUF are on the HuggingFace page. Total download is 4.68 GB. No system prompt — the framework adherence comes from training, not from prompting.
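Once pulled, the model can also be scripted against Ollama's local HTTP API instead of the interactive REPL. A minimal sketch in Python, assuming a default Ollama install listening on localhost:11434; the endpoint path and payload shape are Ollama's /api/generate interface, while the helper names and prompt text are illustrative:

```python
import json
import urllib.request

MODEL = "hf.co/lerugray/hammerstein-7b-lora:Q4_K_M"

def build_generate_request(prompt: str, model: str = MODEL) -> dict:
    # Ollama's /api/generate endpoint takes a JSON body with model, prompt,
    # and stream; stream=False returns one complete JSON object.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str) -> str:
    # Requires a running Ollama server (ollama serve, or any ollama run).
    payload = json.dumps(build_generate_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Setting stream to True instead yields incremental JSON lines, one token chunk per line, if you want output as it generates.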
How it was trained
QLoRA on Qwen2.5-7B-Instruct, rank 32, Unsloth + TRL pipeline. The training set was 308 (query, response) pairs synthesized from a teacher model against the framework corpus. Single training run on a RunPod RTX 4090, about fifty minutes, about fifty cents. End-to-end cost including data generation and pipeline iteration was $4.05. Anthropic spend during training was $0.
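The training pairs are plain (query, response) records. A sketch of what one synthesized pair might look like on disk, assuming a JSONL layout; the field names and content here are hypothetical, not the exact schema in tools/distill/:

```python
import json

# Hypothetical example of one synthesized training pair. Field names and
# text are illustrative only; the repo's actual schema may differ.
pair = {
    "query": "A competitor just undercut our pricing. How should we respond?",
    "response": "Classify the move first, then weigh responses against it...",
}

line = json.dumps(pair)       # one JSONL line per pair, 308 lines total
restored = json.loads(line)   # round-trips losslessly
```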
Reproducible: tools/distill/ in the repository contains train.py, setup_pod.sh, and HOWTO-CLOUD.md with the full RunPod and Kaggle paths. Anyone with $5 of cloud credit can clone the approach against their own corpus.
Eval design
Four conditions: gold (teacher), student (the trained adapter), ablation (same base + framework prompt, no fine-tune), and vanilla (same base, no framework, no fine-tune). The headline finding is the student-versus-ablation delta: Δ = +0.206 on the same base model. That is the framework-portability signal — same weights as ablation except for the LoRA, framework adherence improved measurably.
The student-versus-gold ratio of 1.01 looks like parity with the teacher, but it is a saturated tie at the eval's form-only ceiling. Read that number as a known limitation of the evaluation design, not as a win.
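Both headline numbers fall out of simple arithmetic over per-condition mean scores. A sketch with hypothetical score values chosen only so the arithmetic reproduces the reported Δ and ratio; the real per-condition scores and the scoring rubric are in the model card, not here:

```python
def eval_deltas(scores: dict) -> dict:
    # Headline comparisons: student vs. ablation (same base weights, the
    # LoRA is the only difference) and student vs. gold teacher, as a ratio.
    return {
        "student_vs_ablation": round(scores["student"] - scores["ablation"], 3),
        "student_vs_gold_ratio": round(scores["student"] / scores["gold"], 2),
    }

# Hypothetical per-condition means, picked to match the reported
# delta of +0.206 and ratio of 1.01.
scores = {"gold": 0.900, "student": 0.906, "ablation": 0.700, "vanilla": 0.500}
```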
Documented OOD boundary
The forgetting check is the part of the model card I want people to read. Out-of-distribution prompts trigger framework vocabulary leakage in the trained adapter at 0.312, against 0.625 for the prompted ablation and 0.000 for vanilla. Materially healthier than prompt-only on OOD, not pristine. The number is in the card because hiding it would falsify the eval's claim. This is a documented boundary, not a moat.
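The leakage numbers can be read as the fraction of OOD responses containing framework-specific vocabulary. A minimal sketch of that kind of metric with a hypothetical term list; the actual term list and matching rules live in the eval code, and this is only the shape of the computation:

```python
def leakage_rate(responses: list[str], framework_terms: list[str]) -> float:
    # A response "leaks" if it contains any framework-specific term,
    # matched case-insensitively.
    leaked = sum(
        any(term.lower() in r.lower() for term in framework_terms)
        for r in responses
    )
    return round(leaked / len(responses), 3)

# Toy illustration with hypothetical terms and OOD responses.
terms = ["strategic posture", "framework lens"]
ood = [
    "Here is a recipe for banana bread.",
    "Through the framework lens, flour is a commodity input.",
]
```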
What this is and isn't
- Is: a small open model that demonstrates a custom strategic-reasoning framework can be encoded in weights cheaply, with the failure modes documented.
- Is: a portfolio artifact for the framework — anyone who wants to compare framework adherence on their own prompts can run it against base Qwen2.5-7B-Instruct in a few minutes.
- Isn't: a general-purpose model. It speaks the framework. It will speak it when you do not want it to (see OOD section above).
- Isn't: a replacement for the Hammerstein CLI. The CLI ships the full retrieval layer and templates. The model is the offline-runnable companion.
Where to get it
HuggingFace: huggingface.co/lerugray/hammerstein-7b-lora. Source repo: github.com/lerugray/hammerstein-model. The model card has the full eval table, including the OOD numbers.
The reproducibility pipeline is in tools/distill/.