The score that lies
A model scores 0.94 AUC offline and 0.71 in production. The candidate gets the training data, the code and the deploy diff.
It tests whether they suspect leakage before they reach for hyperparameters.
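One way a candidate might start looking for leakage is to score each feature on its own: a single column that separates the classes almost perfectly offline is often recorded after the outcome and will not exist at serving time. The sketch below is only an illustration under assumptions. The function names, the 0.95 threshold and the toy columns (`days_since_signup`, `refund_issued`) are hypothetical and are not taken from the exercise. It flags target leakage only and will not catch other causes of an offline/production gap, such as train/serve skew.

```python
from typing import Dict, List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC (Mann-Whitney U); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def suspicious_features(
    columns: Dict[str, List[float]],
    labels: List[int],
    threshold: float = 0.95,  # hypothetical cut-off, tune per dataset
) -> List[str]:
    """Return features whose single-column AUC is suspiciously close to perfect."""
    flagged = []
    for name, values in columns.items():
        a = auc(values, labels)
        # A feature can leak in either direction, so check both a and 1 - a.
        if max(a, 1 - a) >= threshold:
            flagged.append(name)
    return flagged


# Toy data (hypothetical): refund_issued is only known after the outcome.
labels = [0, 0, 1, 1, 0, 1]
columns = {
    "days_since_signup": [3, 5, 4, 6, 2, 1],
    "refund_issued": [0, 0, 1, 1, 0, 1],
}
print(suspicious_features(columns, labels))  # ['refund_issued']
```

A flagged feature is a prompt to check when and how that column is populated, not proof of leakage by itself.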
- a real Linux box in the browser
- kubectl, docker, terraform, jq, yq
- cluster, repo and cloud creds pre-wired
- auto-checks running in the background
- every keystroke + every command
- terminal + screen recording
- auto-graded pass/fail per check
