rl
Online RL from the SFT checkpoint
rl is the online reinforcement learning stage. Starting from the actor checkpoint produced by sft, it runs GRPO/GSPO against SWE-bench tasks via Harbor + vLLM + verl on a multi-GPU node, with task execution backed by Kubernetes.
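GRPO scores each rollout against the other rollouts sampled for the same task. GSPO applies importance weighting and clipping per sequence rather than per token. The sketch below is a minimal, framework-free illustration of those two quantities. It assumes one scalar reward per rollout (for example 1.0 if a task's tests pass and 0.0 otherwise, which is an assumption about the reward, not something this page specifies) and per-token log-probabilities from the current and old policies. It is not verl's implementation, and the function names are hypothetical.

```python
import math
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages for rollouts of the SAME task.

    Each reward is normalized by the group's mean and (population) standard
    deviation. Real implementations differ on details such as biased vs.
    unbiased std, so treat this as illustrative.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps keeps the division finite when every rollout got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def gspo_sequence_ratio(logp_new: List[float], logp_old: List[float]) -> float:
    """Sequence-level importance ratio used by GSPO.

    (pi_new(y|x) / pi_old(y|x)) ** (1 / |y|), computed in log space as
    exp(mean over tokens of (log pi_new - log pi_old)).
    """
    if len(logp_new) != len(logp_old) or not logp_new:
        raise ValueError("need equal-length, non-empty token log-prob lists")
    mean_diff = sum(a - b for a, b in zip(logp_new, logp_old)) / len(logp_new)
    return math.exp(mean_diff)


def gspo_clipped_term(ratio: float, advantage: float, clip_eps: float) -> float:
    """PPO-style clipped surrogate, applied once per sequence instead of per token."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, a group of four rollouts with rewards [1.0, 0.0, 0.0, 1.0] gets advantages of roughly [1, -1, -1, 1], while a group where every rollout fails gets all-zero advantages and so contributes no gradient signal. The clip_eps value is left to the caller because suitable ranges differ between token-level and sequence-level clipping.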
Full docs: coming soon. Paste the rl docs URL here once it is published (e.g. https://swe-rl-docs.pages.dev/docs).
Docs site not published yet
The rl block does not yet have a standalone documentation site. Once it ships, replace the placeholder above with the live URL. For now, the agent contract in subblock/rl/CLAUDE.md is the source of truth.
At a glance
- Inputs: model (typically wired from sft.output.checkpoint_path), infrastructure, training, data, experiment, credentials.
- Output: actor checkpoints under repos/harbor-verl-train/outputs/.
- Runs: Local (8× GPU + K8s). Long-running.
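The 8× GPU + K8s requirement can be spot-checked by hand on the training node. The following is a hypothetical sketch, not what /rl:check actually runs. It counts the devices listed by nvidia-smi -L and treats a zero exit status from kubectl cluster-info as "K8s reachable". The helper names and the REQUIRED_GPUS constant are illustrative assumptions.

```python
import re
import shutil
import subprocess

# Hypothetical threshold taken from the "8× GPU" requirement listed above.
REQUIRED_GPUS = 8


def count_gpus(listing: str) -> int:
    """Count 'GPU <n>: ...' lines in `nvidia-smi -L` output.

    Indented MIG sub-device lines ('MIG ...') are not counted.
    """
    return sum(1 for line in listing.splitlines() if re.match(r"GPU \d+:", line.strip()))


def visible_gpus() -> int:
    """Number of GPUs nvidia-smi reports, or 0 if the tool is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
    return count_gpus(result.stdout) if result.returncode == 0 else 0


def k8s_reachable(timeout_s: float = 30.0) -> bool:
    """True if `kubectl cluster-info` exits 0 for the current kubeconfig context."""
    if shutil.which("kubectl") is None:
        return False
    try:
        result = subprocess.run(
            ["kubectl", "cluster-info"], capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Usage: on the node, visible_gpus() >= REQUIRED_GPUS and k8s_reachable() should both hold before a long-running launch. /rl:check remains the authoritative preflight. It also covers the vLLM environment and the base model, which this sketch does not check.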
How to run
/rl:setup # build venv, sync submodules, apply verl patch
/rl:check # preflight: GPUs, K8s reachability, vLLM env, base model
/rl:run # launch training (long-running)
/rl:dashboard # launch the trajectory + metric web UI
Reference
- Block contract: subblock/rl/CLAUDE.md
- Full docs site: coming soon