ProgramBench
ProgramBench Suite Explorer
Browse the Linux x86_64 cleanroom pilot roster, filter by task, language, difficulty, and source subset, then compare model-plus-runtime performance from uploaded ProgramBench eval summaries.
Task Explorer
Browse the public roster by instance metadata
programbench/abishekvashok__cmatrix.5c082c6
cmatrix
Task Summary: Reverse-engineer and reconstruct the terminal digital-rain utility from the upstream cleanroom benchmark environment.
programbench/agourlay__zip-password-finder.704700d
zip-password-finder
Task Summary: Rebuild a Rust ZIP password recovery CLI by working against the ProgramBench cleanroom task image.
programbench/ajeetdsouza__zoxide.67ca1bc
zoxide
Task Summary: Recover a smart shell directory-jumping assistant from the black-box benchmark package.
programbench/alecthomas__chroma.8d04def
chroma
Task Summary: Reconstruct the Go syntax-highlighting toolkit expected by the upstream ProgramBench evaluator.
programbench/ffmpeg__ffmpeg.360a402
FFmpeg
Task Summary: Tackle a heavyweight multimedia-system reconstruction task using the upstream FFmpeg cleanroom instance.
Official Summary
Top combinations under the current slice
Recent Runs
Latest ProgramBench reports uploaded to the arena
Language Slice
Official results by language
Difficulty Slice
Official results by difficulty
Source Slice