ProgramBench

ProgramBench Suite Explorer

Browse the Linux x86_64 cleanroom pilot roster; filter it by task, language, difficulty, and source subset; then compare model-plus-runtime performance using the uploaded ProgramBench eval summaries.

Visible Tasks: 5
Official Runs: 0
Ranked Profiles: 0
Source Subset: All Sources

Task Explorer

Browse the public roster by instance metadata

Language: C
programbench/abishekvashok__cmatrix.5c082c6

cmatrix

Instance ID: abishekvashok__cmatrix.5c082c6
Repository: abishekvashok/cmatrix
Commit: 5c082c64a1296859a11bee60c8c086655953a416
Difficulty: Easy
Source Subset: Docs Example Pilot
Active Branches: 14
Active Tests: 507
Uploaded Reports: 0

Task Summary: Reverse-engineer and reconstruct the terminal digital-rain utility from the upstream cleanroom benchmark environment.

Required artifact: submission.tar.gz

Language: Rust
programbench/agourlay__zip-password-finder.704700d

zip-password-finder

Instance ID: agourlay__zip-password-finder.704700d
Repository: agourlay/zip-password-finder
Commit: 704700d16845d73063195f1f0e3e328d353df131
Difficulty: Medium
Source Subset: Docs Example Pilot
Active Branches: 16
Active Tests: 680
Uploaded Reports: 0

Task Summary: Rebuild a Rust ZIP password recovery CLI by working against the ProgramBench cleanroom task image.

Required artifact: submission.tar.gz

Language: Rust
programbench/ajeetdsouza__zoxide.67ca1bc

zoxide

Instance ID: ajeetdsouza__zoxide.67ca1bc
Repository: ajeetdsouza/zoxide
Commit: 67ca1bc9592854dd4957b8b02f16292638475659
Difficulty: Not published
Source Subset: Docs Example Pilot
Active Branches: 2
Active Tests: 531
Uploaded Reports: 0

Task Summary: Recover a smart shell directory-jumping assistant from the black-box benchmark package.

Required artifact: submission.tar.gz

Language: Go
programbench/alecthomas__chroma.8d04def

chroma

Instance ID: alecthomas__chroma.8d04def
Repository: alecthomas/chroma
Commit: 8d04def94bbccd114a84a02ff2df5efa06bd49f3
Difficulty: Not published
Source Subset: Docs Example Pilot
Active Branches: 1
Active Tests: 503
Uploaded Reports: 0

Task Summary: Reconstruct the Go syntax-highlighting toolkit expected by the upstream ProgramBench evaluator.

Required artifact: submission.tar.gz

Language: C
programbench/ffmpeg__ffmpeg.360a402

FFmpeg

Instance ID: ffmpeg__ffmpeg.360a402
Repository: FFmpeg/FFmpeg
Commit: 360a4025fb2582d52d871ea2129d6b659598bb49
Difficulty: Hard
Source Subset: Docs Example Pilot
Active Branches: 12
Active Tests: 3041
Uploaded Reports: 0

Task Summary: Tackle a heavyweight multimedia-system reconstruction task using the upstream FFmpeg cleanroom instance.

Required artifact: submission.tar.gz

Official Summary

Top combinations under the current slice

There are no ProgramBench runs with eval summaries under the current filter yet.

Recent Runs

Latest ProgramBench reports uploaded to the arena

No ProgramBench runs have been uploaded yet.

Language Slice

Official results by language

No language slice data yet.

Difficulty Slice

Official results by difficulty

No difficulty slice data yet.

Source Slice

Official results by source subset

No source-subset slice data yet.