Eval Quadratic Python Code Tutorial

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

👋 Welcome to RefineBench — a comprehensive evaluation library for testing refinement capabilities of language models across multiple settings and domains. To reproduce the full results reported in ...

GitHub

CATArena: Engineering-Level Tournament Evaluation Platform for LLM-Driven Code Agents

CATArena (Code Agent Tournament Arena) is an open-ended environment where LLMs write executable code agents to battle each other and then learn from each other. CATArena is an engineering-level ...

IEEE

Evaluating Python Static Code Analysis Tools Using FAIR Principles

Abstract: The quality of modern software relies heavily on the effective use of static code analysis tools. To improve their usefulness, these tools should be evaluated using a framework that ...

IEEE

Methodology for Code Synthesis Evaluation of LLMs Presented by a Case Study of ChatGPT and Copilot

Abstract: Large Language Models (LLMs) have grown in popularity in recent years and are now employed in a variety of software engineering domains thanks to their Natural Language Processing (NLP) ...
