Getting Started

Prerequisites

Claude Code with this repo's plugin loaded — /reload-plugins should show 1 plugin · 3 skills.
Docker on whichever host runs curator and tracer (containerized rollouts).
An OpenAI-compatible LLM endpoint with an API key — curator and tracer both call it.
An 8× GPU node for trainer.
GitHub token(s) with repo read scope, for PR collection in curator.

Step 1: Clone

git clone --recurse-submodules <repo-url> SWE-Lego-Live
cd SWE-Lego-Live

If you cloned without --recurse-submodules, run git submodule update --init --recursive.

Step 2: Configure

Every block is driven by a config.yaml. The file declares what the block is, where it runs, which repositories it pins, which sibling blocks it consumes, what values a user must provide, and what outputs the block publishes after a run.

The schema has two top-level sections:

meta_info — block identity, pinned repositories, execution environment, resources, and this block's own upstream dependencies.
runtime_info.input / runtime_info.output — values that come from outside the block tree (API keys, model names, dataset paths, human choices) and values the block produces (verified task directories, trajectory directories, checkpoint paths).

Fill rules

Every field in runtime_info.input follows one convention, so you can tell at a glance what needs your attention:

Marker	Meaning
`human`	You must replace this before a run. Validation fails until you do.
`""` (empty string)	Auto-derived at runtime or supplied via an env/file channel (the inline comment names it, e.g. `# via $WANDB_API_KEY`). You never need to edit these to run.
`null`	Semantic "unset / use default / all" — leave as is unless you want the non-default.
anything else	A real working default. Change it only if you know why.

So filling a block config means exactly two things: replace every human marker, and export the env vars named in the "" comments. Everything else can be left as is.
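The fill rules above can be sketched as a small classifier. This is a minimal illustration, not the logic of scripts/validate_config.py: the names `classify`, `walk`, and `unfilled` are hypothetical, and the input is assumed to be runtime_info.input already parsed into nested dicts.

```python
from typing import Any, Iterator

HUMAN = "human"  # marker value from the fill-rule table


def classify(value: Any) -> str:
    """Map one runtime_info.input leaf to its fill-rule category (hypothetical labels)."""
    if value == HUMAN:
        return "must_fill"      # replace before a run
    if value == "":
        return "env_or_auto"    # derived at runtime or supplied via env/file
    if value is None:
        return "unset"          # use default / all
    return "default"            # a real working default


def walk(node: Any, prefix: str = "") -> Iterator[tuple[str, Any]]:
    """Yield (dot_path, leaf_value) for every leaf in a nested dict."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from walk(child, f"{prefix}.{key}" if prefix else key)
    else:
        yield prefix, node


def unfilled(inputs: dict) -> list[str]:
    """List the dot-paths that still hold the `human` marker."""
    return [path for path, value in walk(inputs) if classify(value) == "must_fill"]
```

Calling `unfilled` on a parsed input section returns the dot-paths that still need a value, which is the list you would work through before a run.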

Dependencies between blocks

Each block declares the values it consumes from sibling blocks in its own meta_info.dependencies. The key is the dot-path in that block's runtime_info.input that receives the value; the value names the producer's output:

# subblock/tracer/config.yaml — tracer consumes curator's verified tasks
meta_info:
  dependencies:
    task_source.dataset_name:
      from: curator.output.swe_tasks_dir
      when: {task_source.provider: local}   # only enforced in this mode

# optional upstream — warns instead of failing while the producer output is absent
meta_info:
  dependencies:
    model.model_path:
      from: trainer.output.checkpoint_path
      required: false

A block with no upstream declares an explicit dependencies: {}. You normally never edit these — they document and enforce the pipeline's hand-offs.
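One way to read a dependency entry is as a small rule that is evaluated against the consumer's inputs and the producer's outputs. The sketch below is an assumption about how such a check could work, not the repository's validator: `get_path` and `check_dependency` are hypothetical names, `producer_outputs` is assumed to map a block name to its parsed runtime_info.output, and `required` is assumed to default to true.

```python
from typing import Any


def get_path(tree: dict, dot_path: str) -> Any:
    """Follow a dot-path through nested dicts; return None if any hop is missing."""
    node: Any = tree
    for key in dot_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_dependency(spec: dict, consumer_input: dict, producer_outputs: dict) -> str:
    """Return 'skipped', 'ok', 'warn', or 'fail' for one dependency entry."""
    # `when` gates enforcement: every listed input path must hold the listed value.
    for path, expected in spec.get("when", {}).items():
        if get_path(consumer_input, path) != expected:
            return "skipped"
    # `from` has the form <block>.output.<dot-path>.
    block, _, out_path = spec["from"].partition(".output.")
    value = get_path(producer_outputs.get(block, {}), out_path)
    if value not in (None, ""):
        return "ok"
    # Assumed default: a dependency is required unless it says otherwise.
    return "fail" if spec.get("required", True) else "warn"
```

With the tracer entry above, the check is skipped while task_source.provider is not local, and the optional trainer entry yields a warning instead of a failure while no checkpoint has been published.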

A worked example

Curator's LLM section ships as:

llm_api:
  api_key: human                          # never commit a real key; prefer env/.env
  api_base_url: human                     # OpenAI-compatible endpoint
  pr_model: Qwen3.6-35B-A3B
  task_model: claude-sonnet-4-6
  cc_provider_mode: openai_proxy          # native | openai_proxy
  anthropic_base_url: http://127.0.0.1:4010
  cc_proxy_port: 4010

Replace the two human fields with your endpoint and key, keep the rest, and you are done — pr_model, cc_provider_mode, and the proxy settings are working defaults you only touch when switching providers. (GitHub tokens are never stored in config.yaml: export GITHUB_TOKENS / GITHUB_TOKEN or use an ignored local token file.)
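Keeping tokens out of config.yaml means something has to read them from the environment or a local file at run time. The sketch below shows one possible lookup order and is purely illustrative: `load_github_tokens` is a hypothetical helper, the precedence (GITHUB_TOKENS, then GITHUB_TOKEN, then the token file) and the comma/whitespace separator are assumptions, and curator's actual loader may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def load_github_tokens(env: Optional[Mapping[str, str]] = None,
                       token_file: str = "gh_token.txt") -> list[str]:
    """Collect GitHub tokens from the environment, falling back to a local file."""
    env = os.environ if env is None else env
    # Assumed precedence: the plural variable wins over the singular one.
    raw = env.get("GITHUB_TOKENS") or env.get("GITHUB_TOKEN") or ""
    if not raw:
        path = Path(token_file)
        raw = path.read_text() if path.is_file() else ""
    # Assumed format: tokens separated by commas and/or whitespace.
    return [token for token in raw.replace(",", " ").split() if token]
```

An empty result is a useful early signal that neither variable is exported and no token file is present.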

What to fill, per block

Every path below is under that block's config.yaml → runtime_info.input. Everything not listed is a working default. Each block's /<block>:setup skill carries the full field reference with worked examples.

Block	Must fill (`human`)	Supplied via env / file	Commonly adjusted
curator	`llm_api.api_key`, `llm_api.api_base_url`	GitHub tokens: `GITHUB_TOKENS` / `GITHUB_TOKEN` / `gh_token.txt`	`llm_api.cc_provider_mode` (`native` vs `openai_proxy` — must match your provider or verification fails silently), `pr_model` / `task_model`, `pr_collection.*` scale knobs
tracer	— (ships working defaults)	—	`llm_api.*` (point at your endpoint; `dummy-cf` works for proxied ones), `task_source.provider` (`local` = curator hand-off, `huggingface`), `harbor_job.n_tasks` / `n_concurrent`
trainer	— (ships a working `hf_lf` source)	`$WANDB_API_KEY`, `$HF_TOKEN` (private datasets only)	`source.type` (`harbor_job` consumes tracer via the wired dep), `model.model_name_or_path`, `training.output_dir` + hyperparameters
evaluator	— (ships a MODE B local-vLLM example)	—	`llm_api.*` (MODE A remote API vs MODE B local vLLM), `task_source.dataset_name` + `version` (benchmark), `agent.*` preset (switch all four fields together)

After editing any block, run python3 scripts/validate_config.py --block subblock/<name> — it fails on every remaining human marker and on any dependency you broke, so an incomplete config can't reach a launch.

Verify

Leave runtime_info.output untouched — static path entries are pre-declared, and run-produced value entries are written back by the block's scripts after a successful run. Then validate:

python3 scripts/validate_config.py --root .          # whole tree: schema, deps, unfilled markers
python3 scripts/validate_config.py --block subblock/tracer   # one block

or run /<block>:check, which wraps the validator plus live probes (tokens, endpoints, Docker, GPUs) before you commit to a long job.
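If you script your own launches, the validator can act as a gate in front of them. The following is a minimal sketch that uses only the two invocations shown above; `validator_command` and `preflight` are hypothetical names, and it assumes the validator exits with a non-zero status when it fails and is run from the repository root.

```python
import subprocess
from typing import Optional


def validator_command(block: Optional[str] = None, root: str = ".") -> list[str]:
    """Build the validator invocation for one block, or for the whole tree."""
    cmd = ["python3", "scripts/validate_config.py"]
    cmd += ["--block", f"subblock/{block}"] if block else ["--root", root]
    return cmd


def preflight(block: Optional[str] = None) -> bool:
    """Run the validator; True means it exited with status 0 (assumed to mean success)."""
    return subprocess.run(validator_command(block)).returncode == 0
```

A launcher could then call `preflight("tracer")` and refuse to start the job when it returns False.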

Step 3: Running with Plugin Skills

Every block in SWE-Lego-Live is operated through the same plugin skills:

Skill	What it does
`/<block>:setup`	Install dependencies, build environments, and fill default config values
`/<block>:check`	Validate config, inputs, environments, dependencies, repos, and live endpoints
`/<block>:run`	Execute the selected block mode; archiving depends on the mode

Replace <block> with the block name — for example, curator, tracer, trainer, or evaluator. Same three verbs, every block.

Step 3.1: `/<block>:setup` — install dependencies

Run setup once for the block you want to operate:

/curator:setup        # clone repos, build envs, prepare default config

Each :setup is idempotent and safe to re-run — it only fills in fields that aren't already set, and only builds environments that don't already exist. After setup, the block's config.yaml will still have external runtime_info.input markers and provider settings that only you can supply. For Curator, keep GitHub tokens in GITHUB_TOKENS, GITHUB_TOKEN, or an ignored local token file rather than config.yaml.

Step 3.2: `/<block>:check` — preflight

/<block>:check is read-only. It validates the schema, filled inputs, inter-block dependencies, repo pins, environment paths, and live LLM endpoints. Block-specific probes differ: for example, /curator:check sends a small real completion request, so it can consume a small number of tokens.

/tracer:check         # validate tracer config, dependencies, Docker, and endpoints

Run :check after every config edit. It tells you exactly which input is unfilled, which submodule is missing, and whether the remote node is reachable, so you can fix gaps before committing to a long run.

Step 3.3: `/<block>:run` — execute

Only run after the matching :check passes:

/trainer:run          # convert trajectories and train

Full block modes generally execute scripts/start.sh, whose EXIT trap creates an archive. Curator is mode-sensitive: full mode uses start.sh, while smoke and single-language modes run direct commands and do not archive. Long-running operations may launch in tmux or as detached workers; follow the block-specific skill for monitoring details.

Working with Root

/root:check and /root:run recursively apply the same lifecycle across the whole tree, with a free-form argument that selects the target block:

/root:check                         # check root + every subblock
/root:check curator                  # one block at a time
/root:run tracer                   # directly executes tracer/scripts/start.sh
/root:run start the data pipeline   # paraphrased; resolves to the root orchestrator

An example usage of root can be found here.

Result Collection

The block's own artifacts are the source of truth:

artifacts/index.yaml — newest entry's status (completed | failed | interrupted)
artifacts/archives/run_NNN/metadata.yaml — id, timestamps, exit code, repo SHAs
artifacts/archives/run_NNN/config.yaml — the frozen config that produced this run
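Reading these files programmatically might look like the sketch below. The layout of index.yaml is not specified here, so everything about it is an assumption: the index is taken to be a list of entries, already parsed into Python dicts, with the newest entry last and a `status` key on each. The zero-padded three-digit run number is likewise a guess based on the run_NNN pattern, and both function names are hypothetical.

```python
VALID_STATUSES = {"completed", "failed", "interrupted"}  # values listed for index.yaml


def newest_status(entries: list[dict]) -> str:
    """Return the status of the newest run, assuming the newest entry is last."""
    if not entries:
        raise ValueError("index has no runs yet")
    status = entries[-1].get("status")
    if status not in VALID_STATUSES:
        raise ValueError(f"unexpected status: {status!r}")
    return status


def archive_dir(run_number: int) -> str:
    """Build the archive path for a run, assuming NNN is a zero-padded number."""
    return f"artifacts/archives/run_{run_number:03d}"
```

Checking the newest status before reading metadata.yaml or the frozen config.yaml avoids treating a failed or interrupted run as a result.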

Optionally, open the block dashboard for a visual summary:

/tracer:dashboard     # inspect trajectory rollouts and conversion status

Dashboards expose the same run state, logs, metrics, and artifacts in a more readable view. See each block's own docs under Blocks.