sft

sft is the supervised fine-tuning stage. It reads SFT data in LLaMA-Factory (LF) format produced by trajgen, then fine-tunes a base model with LLaMA-Factory and DeepSpeed ZeRO-3 across 8 GPUs.
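For orientation, here is a minimal sketch of what a ZeRO-3 DeepSpeed config for this kind of run might look like. The keys are standard DeepSpeed options, and the "auto" values rely on the Hugging Face Trainer integration that LLaMA-Factory uses. The helper names and the specific option choices are assumptions for illustration, not the config the sft block actually ships.

```python
import json


def build_zero3_config() -> dict:
    """Hypothetical helper: return a minimal DeepSpeed ZeRO-3 config dict.

    The option choices below are illustrative assumptions, not the
    sft block's real settings.
    """
    return {
        # "auto" lets the Hugging Face Trainer fill these from its own arguments.
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "bf16": {"enabled": "auto"},
        "zero_optimization": {
            # Stage 3 shards parameters, gradients, and optimizer state across GPUs.
            "stage": 3,
            "overlap_comm": True,
            "contiguous_gradients": True,
            # Gather full 16-bit weights on save so the checkpoint is loadable as one model.
            "stage3_gather_16bit_weights_on_model_save": True,
        },
    }


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int = 8) -> int:
    """Global batch size = per-device batch * accumulation steps * GPU count."""
    return per_device * grad_accum * num_gpus


if __name__ == "__main__":
    print(json.dumps(build_zero3_config(), indent=2))
    print(effective_batch_size(per_device=1, grad_accum=8))
```

With 8 GPUs, the global batch size scales with both the per-device batch size and the number of gradient accumulation steps, so it is worth computing before changing either value.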

Full docs: coming soon. Paste the sft docs URL here when published (e.g. https://swe-sft-docs.pages.dev/docs).

Docs site not published yet

The sft block does not yet have a standalone documentation site. Once it ships, replace the placeholder above with the live URL. For now, the agent contract in subblock/sft/CLAUDE.md is the source of truth.

At a glance

Inputs: source, conversion, model, training, infrastructure, credentials. The training data dependency is typically wired from trajgen.output.sft_data_dir.
Output: checkpoint_path, which points to subblock/sft/artifacts/model/<run>/ and is consumed by rl as the starting actor.
Runs: Local (8× GPU). Long-running.
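As a sketch of how a downstream consumer might locate the output, the snippet below picks the most recently modified run directory under subblock/sft/artifacts/model/. The function name and the "newest directory wins" rule are assumptions for illustration; the block contract may define checkpoint_path differently.

```python
from pathlib import Path
from typing import Optional

# Output root named in this document; each run is assumed to get its own subdirectory.
MODEL_ROOT = Path("subblock/sft/artifacts/model")


def latest_run_dir(model_root: Path = MODEL_ROOT) -> Optional[Path]:
    """Hypothetical helper: return the most recently modified run directory.

    Returns None when the root does not exist or holds no run directories.
    """
    if not model_root.is_dir():
        return None
    runs = [p for p in model_root.iterdir() if p.is_dir()]
    if not runs:
        return None
    # Assumption: the newest modification time marks the latest run.
    return max(runs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    run_dir = latest_run_dir()
    print(run_dir if run_dir is not None else "no sft runs found")
```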

How to run

/sft:setup        # install LLaMA-Factory, register dataset, fill defaults
/sft:check        # preflight: GPUs, DeepSpeed config, dataset, base model
/sft:run          # launch training
/sft:dashboard    # parse the latest training log + WandB URL
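The "register dataset" step in /sft:setup is not spelled out here. In LLaMA-Factory, datasets are registered through an entry in dataset_info.json, so a minimal sketch of that step might look like the following. The function name, the dataset name trajgen_sft, and the choice of sharegpt formatting are all assumptions; the format trajgen actually emits should be checked against the block contract.

```python
import json
from pathlib import Path


def register_dataset(dataset_info_path: Path, name: str, file_name: str) -> dict:
    """Hypothetical helper: add or update one entry in dataset_info.json.

    Existing entries are preserved. The entry assumes sharegpt-style records,
    i.e. a "conversations" list of {"from": ..., "value": ...} turns.
    """
    info = {}
    if dataset_info_path.exists():
        info = json.loads(dataset_info_path.read_text(encoding="utf-8"))
    info[name] = {
        "file_name": file_name,
        "formatting": "sharegpt",  # assumption: trajgen output may use another format
        "columns": {"messages": "conversations"},
    }
    dataset_info_path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return info


if __name__ == "__main__":
    # Illustrative paths only; point these at the real LLaMA-Factory data directory.
    updated = register_dataset(Path("dataset_info.json"), "trajgen_sft", "trajgen_sft.json")
    print(sorted(updated))
```

Reading the file before writing keeps any datasets that are already registered, which matters when the same LLaMA-Factory checkout serves more than one block.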

Reference

Block contract: subblock/sft/CLAUDE.md
Full docs site: coming soon
