We evaluate DeepCode on the PaperBench benchmark (released by OpenAI), a rigorous testbed requiring AI agents to independently reproduce 20 ICML 2024 papers from scratch. The benchmark comprises 8,316 ...
The latest version of Enhanced SC can be found on the Releases page. Please note that saves from Enhanced SC are not compatible with the original version of the game ...