SWE-Lego-Live

Getting Started

Three skills to operate any block — setup, check, run

Every block in SWE-Lego-Live is operated through the same three Claude Code skills, whether it's data curation, trajectory generation, SFT, or RL:

| Skill | What it does | Side effects |
| --- | --- | --- |
| /<block>:setup | Install dependencies, build envs, fill default config values | Writes files; idempotent |
| /<block>:check | Validate config, inputs, environments, and live endpoints | Read-only |
| /<block>:run | Execute the block end-to-end and archive the run | Long-running; modifies artifacts |

Replace <block> with the block name — swegen, trajgen, sft, or rl. Same three verbs, every block.

A short note on blocks

A block is a self-contained directory that owns one stage of the pipeline. Every block has the same shape — config.yaml (inputs, outputs, dependencies), scripts/ (setup, dryrun, start, archive), artifacts/ (live state and per-run archives), and a CLAUDE.md that documents the agent contract. The parent block (this repo root) wires children together by name through meta_info.subblocks[].dependencies, so values flow from one block to the next without hand-copied paths.
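As a quick illustration of the shape described above, the following sketch reports which of the four top-level entries a directory is missing. The function name check_block_shape is hypothetical and not part of the repo; only the four entry names come from the description above.

```shell
# Hypothetical helper (not shipped with the repo): verify that a directory
# has the four top-level entries every block is described as having.
check_block_shape() {
  local block_dir="$1" missing=0 entry
  # Files: the block config and the agent contract.
  for entry in config.yaml CLAUDE.md; do
    if [ ! -f "$block_dir/$entry" ]; then
      echo "missing file: $entry"
      missing=1
    fi
  done
  # Directories: lifecycle scripts and run artifacts.
  for entry in scripts artifacts; do
    if [ ! -d "$block_dir/$entry" ]; then
      echo "missing directory: $entry"
      missing=1
    fi
  done
  return "$missing"
}
```

Calling check_block_shape with a block directory prints nothing and returns 0 when all four entries are present.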

For the full design — schema, dependency rules, archive format, the root plugin — see Block Design.

Prerequisites

  • Claude Code with this repo's plugin loaded — /reload-plugins should show 1 plugin · 3 skills.
  • Docker on whichever host runs swegen and trajgen (containerized rollouts).
  • 8× GPU node for sft and rl.
  • An OpenAI-compatible LLM endpoint with an API key — swegen and trajgen both call it.
  • GitHub token(s) with repo read scope, for PR collection in swegen.

1. Clone

git clone --recurse-submodules <repo-url> SWE-Lego-Live
cd SWE-Lego-Live

If you cloned without --recurse-submodules, run git submodule update --init --recursive.
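To see whether any submodule still needs initializing, one option is to filter the output of git submodule status, which prefixes uninitialized submodules with "-". The helper name below is an assumption for illustration only.

```shell
# Hypothetical helper: print the paths of submodules that are not yet
# initialized. `git submodule status` marks those lines with a leading "-".
list_uninitialized_submodules() {
  git -C "${1:-.}" submodule status --recursive | awk '/^-/ { print $2 }'
}
```

An empty result means every submodule git knows about has been initialized.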

2. /<block>:setup — install dependencies

Run setup once per block, in pipeline order:

/swegen:setup        # clone repos, build envs, prepare default config
/trajgen:setup
/sft:setup
/rl:setup

Each :setup is idempotent and safe to re-run — it only fills in fields that aren't already set, and only builds environments that don't already exist. After setup, the block's config.yaml will still have a handful of runtime_info.input fields that only you can provide (API keys, GitHub tokens, model names). Fill those by hand.
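The "build only what doesn't exist" behavior can be pictured with a minimal sketch like the one below. It is not the repo's actual setup script: ensure_env is a hypothetical name, and mkdir stands in for whatever the real environment build does.

```shell
# Hypothetical sketch of an idempotent setup step: build an environment
# only if its directory is absent, otherwise leave it untouched.
ensure_env() {
  local env_dir="$1"
  if [ -d "$env_dir" ]; then
    echo "skip: $env_dir already exists"
    return 0
  fi
  mkdir -p "$env_dir"   # stand-in for the real (expensive) environment build
  echo "built: $env_dir"
}
```

Running it twice on the same path builds on the first call and skips on the second, which is what makes re-running setup safe.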

3. /<block>:check — preflight

/<block>:check is read-only. It validates the schema, every filled input, every inter-block dependency, repo pins, environment paths, and live LLM endpoints (via a no-cost GET /models probe). It never modifies files and never incurs API charges.

/swegen:check
/trajgen:check
/sft:check
/rl:check

Run :check after every config edit. It tells you exactly which input is unfilled, which submodule is missing, and whether the remote node is reachable, so you can fix gaps before committing to a long run.
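The endpoint probe can be approximated by hand with curl. This is a sketch under assumptions, not the check skill's actual implementation: the function name is hypothetical, the base URL is assumed to be the prefix under which the endpoint serves /models, and the key is assumed to be sent as a Bearer token, as is conventional for OpenAI-compatible APIs.

```shell
# Hypothetical probe: succeed only if GET <base_url>/models returns HTTP 200.
# Listing models does not run inference, so it does not consume tokens.
probe_llm_endpoint() {
  local base_url="${1%/}" api_key="$2" http_code
  http_code=$(curl -sS -o /dev/null -w '%{http_code}' --max-time 10 \
    -H "Authorization: Bearer $api_key" "$base_url/models") || return 1
  [ "$http_code" = "200" ]
}
```

A non-zero return covers both an unreachable host and a reachable host that rejects the request, for example because of a wrong key.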

4. /<block>:run — execute

Run a block only after its matching :check passes:

/swegen:run          # generate and verify SWE tasks
/trajgen:run         # roll the agent out across verified tasks
/sft:run             # convert trajectories, then run SFT
/rl:run              # online RL from the SFT checkpoint

Each :run executes the block's scripts/start.sh. Long-running operations (multi-hour GPU training, multi-container rollouts) launch in tmux so they survive disconnects. Every run is archived automatically — start.sh installs an EXIT trap that fires archive_run.sh regardless of how the run exits (success, error, SIGINT, SIGTERM). One entry lands in artifacts/index.yaml, and the snapshot lives in artifacts/archives/run_NNN/.
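The EXIT-trap pattern described above can be sketched as follows. This is an illustration, not the contents of start.sh: run_with_archive and archive are hypothetical names, and the echo stands in for the call to archive_run.sh.

```shell
# Hypothetical sketch of the EXIT-trap pattern: whatever the wrapped command
# does, the archive step runs when the subshell exits.
run_with_archive() {
  (
    # Stand-in for archive_run.sh; receives the run's exit code.
    archive() { echo "archived exit_code=$1"; }
    trap 'archive "$?"' EXIT
    "$@"
  )
}
```

The subshell keeps the trap scoped to a single run, and the run's own exit code is preserved as the return value of run_with_archive.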

5. Read the result

The block's own artifacts are the source of truth:

  • artifacts/index.yaml — newest entry's status (completed | failed | interrupted)
  • artifacts/archives/run_NNN/metadata.yaml — id, timestamps, exit code, repo SHAs
  • artifacts/archives/run_NNN/config.yaml — the frozen config that produced this run
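To jump straight to the newest archive from a shell, a small helper along these lines may be convenient. It is a sketch that assumes run numbers are zero-padded to a fixed width (as run_NNN suggests), so that glob order matches run order; latest_run_dir is a hypothetical name.

```shell
# Hypothetical helper: print the newest archive directory of a block.
# Assumes zero-padded run numbers, so sorted glob order equals run order.
latest_run_dir() {
  local latest="" dir
  for dir in "$1"/artifacts/archives/run_*/; do
    # Globs expand in sorted order, so the last match is the newest run.
    [ -d "$dir" ] && latest="${dir%/}"
  done
  [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Usage (path/to/block is a placeholder for a real block directory):
#   cat "$(latest_run_dir path/to/block)/metadata.yaml"
```

The function returns non-zero and prints nothing when the block has no archived runs yet.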

For visual progress, every block also ships a :dashboard skill — e.g. /trajgen:dashboard for trajectory rollouts, /sft:dashboard for training loss. See each block's own docs under Sub-block.

Working at the root

/root:check and /root:run recursively apply the same lifecycle across the whole tree, with a free-form argument that selects the target block:

/root:check                         # check root + every subblock
/root:check swegen                  # one block at a time
/root:run trajgen                   # equivalent to /trajgen:run
/root:run start the data pipeline   # paraphrased; resolves to the root orchestrator

Both forms — /<block>:check and /root:check <block> — are equivalent. Use whichever reads more naturally for what you're doing.
