Ray Weiss

Hammerstein-7B

A 7B model that learned a strategic-reasoning framework.

Framework adherence in the weights, not in the prompt.

Status: Public on HuggingFace · Base: Qwen2.5-7B-Instruct · Method: QLoRA, rank 32 · Quant: Q4_K_M, 4.68 GB

HuggingFace: huggingface.co/lerugray/hammerstein-7b-lora. Repo: github.com/lerugray/hammerstein-model. The CLI it pairs with: Hammerstein.

The Hammerstein CLI wraps the framework in retrieval-augmented prompting. This model bakes the framework into the weights of a small open base. The point is not the model's raw intelligence. The point is that an artifact that runs on an 8 GB Mac, trained for the cost of a sandwich, behaves as if the framework were watching the work: no system prompt, no retrieval layer, no external context at all.

How to run it

One command on a Mac with at least 8 GB of unified memory and Ollama installed:

ollama run hf.co/lerugray/hammerstein-7b-lora:Q4_K_M

The Modelfile and Q4_K_M GGUF are on the HuggingFace page. Total download is 4.68 GB. No system prompt — the framework adherence comes from training, not from prompting.

How it was trained

QLoRA on Qwen2.5-7B-Instruct, rank 32, Unsloth + TRL pipeline. The training set was 308 (query, response) pairs synthesized from a teacher model against the framework corpus. Single training run on a RunPod RTX 4090, about fifty minutes, about fifty cents. End-to-end cost including data generation and pipeline iteration was $4.05. Anthropic spend during training was $0.
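The data format is the simple part. A minimal sketch of turning the (query, response) pairs into the chat-style JSONL that TRL's SFTTrainer accepts; the field names and the example pair here are assumptions for illustration, not the repo's actual schema:

```python
import json

def to_chat_jsonl(pairs):
    """Convert (query, response) pairs into chat-format JSONL records.

    The {"messages": [...]} shape is a standard format TRL's SFTTrainer
    understands; the exact schema used in tools/distill/ may differ.
    """
    lines = []
    for query, response in pairs:
        record = {
            "messages": [
                {"role": "user", "content": query},
                {"role": "assistant", "content": response},
            ]
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

# Hypothetical pair, not from the real training set:
pairs = [("What is the first move?", "Map the terrain before committing.")]
print(to_chat_jsonl(pairs))
```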

Reproducible. tools/distill/ in the repository contains train.py, setup_pod.sh, and HOWTO-CLOUD.md with the full RunPod and Kaggle paths. Anyone with $5 of cloud credit can clone the approach against their own corpus.

Eval design

Four conditions: gold (teacher), student (the trained adapter), ablation (same base + framework prompt, no fine-tune), and vanilla (same base, no framework, no fine-tune). The headline finding is the student-versus-ablation delta: Δ = +0.206 on the same base model. That is the framework-portability signal: the student shares the ablation's weights except for the LoRA, and framework adherence still improved measurably.
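The comparison arithmetic is simple enough to sketch. The per-condition scores below are placeholders chosen to reproduce the reported delta and ratio, not the actual eval output:

```python
# Hypothetical mean framework-adherence scores per condition.
# Chosen so the student-ablation delta and student/gold ratio match
# the reported +0.206 and 1.01; not the real eval values.
scores = {
    "gold": 0.850,      # teacher model
    "student": 0.859,   # trained adapter
    "ablation": 0.653,  # same base + framework prompt, no fine-tune
    "vanilla": 0.100,   # same base, no framework, no fine-tune
}

delta = scores["student"] - scores["ablation"]   # framework-portability signal
ratio = scores["student"] / scores["gold"]       # saturates at the eval's ceiling

print(f"student - ablation = {delta:+.3f}")  # +0.206
print(f"student / gold     = {ratio:.2f}")   # 1.01
```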

The student-versus-gold ratio of 1.01 looks like a tie but is a saturated tie at the form-only ceiling of the eval. That number should be read as a known limitation of the evaluation design, not a win.

Documented OOD boundary

The forgetting check is the part of the model card I want people to read. Out-of-distribution prompts trigger framework vocabulary leakage in the trained adapter at 0.312, against 0.625 for the prompted ablation and 0.000 for vanilla. Materially healthier than prompt-only on OOD, not pristine. The number is in the card because hiding it would falsify the eval's claim. This is a documented boundary, not a moat.
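The leakage number is a rate, and the metric is easy to state. A minimal sketch, assuming leakage is measured as the fraction of out-of-distribution responses that contain any framework-specific term; the term list and responses below are invented for illustration and are not the eval's actual vocabulary:

```python
def leakage_rate(responses, framework_terms):
    """Fraction of responses containing at least one framework-specific term."""
    terms = [t.lower() for t in framework_terms]
    hits = sum(
        1 for r in responses
        if any(t in r.lower() for t in terms)
    )
    return hits / len(responses)

# Invented framework vocabulary and OOD responses, for illustration only.
framework_terms = ["terrain", "commitment ladder"]
ood_responses = [
    "The recipe needs two eggs and a pinch of salt.",
    "Before choosing a laptop, map the terrain of your actual workload.",
    "Paris is the capital of France.",
]
print(f"{leakage_rate(ood_responses, framework_terms):.3f}")
```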

What this is and isn't

  • Is: a small open model that demonstrates a custom strategic-reasoning framework can be encoded in weights cheaply, with the failure modes documented.
  • Is: a portfolio artifact for the framework — anyone who wants to compare framework adherence on their own prompts can run it against base Qwen2.5-7B-Instruct in a few minutes.
  • Isn't: a general-purpose model. It speaks the framework. It will speak it when you do not want it to (see OOD section above).
  • Isn't: a replacement for the Hammerstein CLI. The CLI ships the full retrieval layer and templates. The model is the offline-runnable companion.

Where to get it

HuggingFace: huggingface.co/lerugray/hammerstein-7b-lora. Source repo: github.com/lerugray/hammerstein-model. Model card has the full eval table including the OOD numbers. The reproducibility pipeline is in tools/distill/.