👋 Welcome to RefineBench — a comprehensive evaluation library for testing refinement capabilities of language models across multiple settings and domains. To reproduce the full results reported in ...
CATArena (Code Agent Tournament Arena) is an open-ended environment where LLMs write executable code agents to battle each other and then learn from each other. CATArena is an engineering-level ...
Abstract: The quality of modern software relies heavily on the effective use of static code analysis tools. To improve their usefulness, these tools should be evaluated using a framework that ...
Abstract: Large Language Models (LLMs) have grown in popularity in recent years and are now employed in a variety of software engineering domains thanks to their Natural Language Processing (NLP) ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results