SelfCodeAlign: Self-Alignment for Code Generation

🧐 About | ⭐️ StarCoder2-Instruct | 📝 Citation

Note

The documentation is still a work in progress and will be updated soon.

About

SelfCodeAlign is the first fully open and transparent pipeline that enhances a code language model without relying on human annotations or distilled data from large, proprietary models. This approach led to the creation of StarCoder2-Instruct, a fully transparent, permissively licensed, self-aligned code model that achieves state-of-the-art performance in coding tasks.

Authors: Yuxiang Wei, Federico Cassano, Jiawei Liu, Yifeng Ding, Naman Jain, Zachary Mueller, Harm de Vries, Leandro von Werra, Arjun Guha, Lingming Zhang.

StarCoder2-Instruct

StarCoder2-Instruct was created with an earlier version of SelfCodeAlign. It is the first entirely self-aligned code large language model (LLM) trained with a fully permissive and transparent pipeline. The open-source pipeline uses StarCoder2-15B to generate thousands of instruction-response pairs, which are then used to fine-tune StarCoder2-15B itself, without any human annotations or data distilled from large proprietary LLMs.

Model: bigcode/starcoder2-15b-instruct-v0.1
Code: bigcode-project/starcoder2-self-align
Dataset: bigcode/self-oss-instruct-sc2-exec-filter-50k

For more details, check README-SC2INST.md.
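
As a rough illustration of the self-alignment idea described above (the base model writes instruction-response pairs, and the kept pairs become its own fine-tuning data), here is a minimal toy sketch. It is not the repository's code. The names `passes_tests`, `build_sft_pairs`, and `toy_generate` are hypothetical, the `generate` callable merely stands in for sampling from the base model, and the execution-based filter is an assumption suggested only by the `exec-filter` part of the dataset name.

```python
from typing import Callable, Dict, List

Pair = Dict[str, str]


def passes_tests(response: str, tests: str) -> bool:
    """Run candidate code, then its tests; return True if nothing raises.

    Assumption: plain exec() is used here only to keep the toy short.
    It is not a sandbox and should not be used on untrusted model output.
    """
    namespace: dict = {}
    try:
        exec(response, namespace)
        exec(tests, namespace)
    except Exception:
        return False
    return True


def build_sft_pairs(seeds: List[str], generate: Callable[[str], Pair]) -> List[Pair]:
    """Keep only the instruction-response pairs whose code passes its own tests."""
    kept: List[Pair] = []
    for seed in seeds:
        candidate = generate(seed)
        if passes_tests(candidate["response"], candidate["tests"]):
            kept.append(
                {
                    "instruction": candidate["instruction"],
                    "response": candidate["response"],
                }
            )
    return kept


def toy_generate(seed: str) -> Pair:
    """Hypothetical stand-in for the base model: one good and one buggy sample."""
    if seed == "add":
        return {
            "instruction": "Write add(a, b) that returns the sum.",
            "response": "def add(a, b):\n    return a + b\n",
            "tests": "assert add(2, 3) == 5\n",
        }
    return {
        "instruction": "Write double(x) that returns twice x.",
        "response": "def double(x):\n    return x + 1\n",  # deliberately wrong
        "tests": "assert double(4) == 8\n",
    }


pairs = build_sft_pairs(["add", "double"], toy_generate)
print(len(pairs), pairs[0]["instruction"])
```

In this toy run only the `add` sample survives, because the buggy `double` response fails its own test. The surviving pairs play the role of the instruction-tuning data that would then be used to fine-tune the same base model.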

Citation

@article{wei2024selfcodealign,
  title={SelfCodeAlign: Self-Alignment for Code Generation}, 
  author={Yuxiang Wei and Federico Cassano and Jiawei Liu and Yifeng Ding and Naman Jain and Zachary Mueller and Harm de Vries and Leandro von Werra and Arjun Guha and Lingming Zhang},
  year={2024},
  journal={arXiv preprint arXiv:2410.24198}
}

