This repository contains the code for a course project that fine-tunes a DeBERTa-Xlarge model to detect AI-generated text. The model is trained with all layers active, and performance is evaluated with 5-fold cross-validation.
- Model: DeBERTa-Xlarge
- Cross-Validation: 5-fold cross-validation is employed to assess model performance and generalization capability.
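
As a rough illustration of the 5-fold scheme, the sketch below builds shuffled train/validation index splits using only the standard library. The helper name `k_fold_indices`, the seed, and the sample count are hypothetical and not taken from this repository; the project's own splitting code may differ (for example, it may stratify by label).

```python
import random


def k_fold_indices(n_samples, n_folds=5, seed=42):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        start += size
        yield train_idx, val_idx


folds = list(k_fold_indices(n_samples=23, n_folds=5))
for i, (train_idx, val_idx) in enumerate(folds):
    # A real run would fine-tune a fresh model on train_idx and score it on val_idx.
    print(f"fold {i}: train={len(train_idx)} val={len(val_idx)}")
```

Each sample lands in exactly one validation fold, so averaging the five validation scores gives an estimate of how well the model generalizes to unseen text.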
The dataset used for fine-tuning combines several publicly available datasets with additional generated samples:
- Persuade Corpus 2
- LLM Detect AI Generated Text Competition Dataset, whose data loaders were also used as a reference.
- DAIGT V4 Train Dataset
- Additional samples generated with LLaMA and GPT-2 models.
The final merged dataset can be downloaded from https://file.io/Wz3JI0DVhXF1
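
This README does not describe how the sources were merged, so the following is only a minimal sketch of one plausible approach: concatenate the sources and drop duplicate texts. The function name `merge_datasets`, the `text` and `label` field names, the label convention (0 = human, 1 = AI-generated), and the toy rows are all assumptions made for illustration.

```python
def merge_datasets(*sources):
    """Concatenate sources, dropping rows whose text repeats after whitespace normalization."""
    seen = set()
    merged = []
    for rows in sources:
        for row in rows:
            # Collapse runs of whitespace so trivially different copies match.
            key = " ".join(row["text"].split())
            if not key or key in seen:
                continue
            seen.add(key)
            merged.append({"text": row["text"], "label": int(row["label"])})
    return merged


# Toy rows standing in for the real sources (hypothetical content).
human_essays = [
    {"text": "Cars should be limited in cities.", "label": 0},
    {"text": "Cars  should be limited in cities.", "label": 0},  # duplicate
]
generated = [
    {"text": "In conclusion, limiting car usage has benefits.", "label": 1},
]

merged = merge_datasets(human_essays, generated)
print(len(merged), [row["label"] for row in merged])
```

Deduplicating before the cross-validation split helps avoid the same essay appearing in both a training fold and its validation fold.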