Long Code Arena is a suite of benchmarks for code-related tasks with large contexts, up to a whole code repository. It currently spans six tasks, each with its own dataset:
- 🤗 Library-based code generation
- 🤗 CI builds repair
- 🤗 Project-level code completion
- 🤗 Commit message generation
- 🤗 Bug localization
- 🤗 Module summarization
Each task comes with its own approaches and environment. You can find the baseline solutions and the necessary information about each task in the corresponding directories.
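If the datasets are hosted on the Hugging Face Hub (as the 🤗 marks suggest), they can be loaded with the `datasets` library. The sketch below is a minimal example only: the repository id `JetBrains-Research/lca-commit-message-generation` and the `test` split are assumptions, so check each dataset card for the exact ids, configurations, and splits.

```python
# Minimal sketch: loading one Long Code Arena dataset from the Hugging Face Hub.
# The repository id and split below are assumptions; consult the dataset card
# for the exact values. Some datasets may also require a configuration name,
# passed as the second positional argument to load_dataset.
from datasets import load_dataset

dataset = load_dataset(
    "JetBrains-Research/lca-commit-message-generation",  # assumed repository id
    split="test",                                        # assumed split name
)

# Inspect the first sample to see which fields (context, target, metadata) are available.
print(dataset[0])
```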
We are excited to invite you to participate in solving our benchmarks! To submit your results, please send the following materials to our 📩 email ([email protected]):
- Results: Include a summary of your benchmark outcomes.
- Reproduction Package: To ensure the integrity and reproducibility of your results, please include the code for context collection (if any), prediction generation, and evaluation. You can use our baselines as a reference.
- Metadata: Model information, organization name, model license, context size, and any other information you find relevant.