The fine-tune that crashed
A LoRA run on a 7B model that OOMs at batch size 16. The candidate has 24 GB.
Whether they reach for gradient checkpointing, QLoRA or gradient accumulation, and apply it correctly.
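As a rough illustration of one of those three options, gradient accumulation, here is a minimal sketch in plain Python. It assumes a toy one-parameter model and made-up data; the names `grad_mse` and `accumulated_grad` are hypothetical and not part of the scenario. It only shows why 8 micro-batches of 2 can stand in for one batch of 16: if each micro-batch gradient is scaled by 1/accum_steps, the sum matches the full-batch gradient while only a micro-batch is processed at any one time.

```python
# Gradient accumulation sketch in plain Python (no GPU or framework needed).
# Toy model: y_hat = w * x, loss = mean squared error over the batch.

def grad_mse(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, data, micro_batch_size):
    """Sum micro-batch gradients, each scaled by 1/accum_steps."""
    # Assumption of this sketch: the data splits into equal-sized micro-batches.
    assert len(data) % micro_batch_size == 0
    accum_steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(accum_steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        total += grad_mse(w, micro) / accum_steps
    return total

# Hypothetical data: 16 examples, matching the batch size in the scenario.
data = [(float(i), 3.0 * i + 1.0) for i in range(16)]
w = 0.5

full = grad_mse(w, data)              # one batch of 16
accum = accumulated_grad(w, data, 2)  # 8 micro-batches of 2

print(abs(full - accum) < 1e-9)
```

In a real fine-tuning run the same idea is usually a single trainer setting rather than hand-written code. The trade-off is more forward and backward passes per optimizer step in exchange for lower peak activation memory.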
- a real Linux box in the browser
- kubectl, docker, terraform, jq, yq
- cluster, repo and cloud creds pre-wired
- auto-checks running in the background
- every keystroke + every command
- terminal + screen recording
- auto-graded pass/fail per check
