I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses

Overview

This project contains the training scripts, evaluation tools, and datasets for the paper "I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses". It builds on the LLAMA-FACTORY project to train and evaluate language models.

Features

- Training Scripts: Customizable training scripts built on LLAMA-FACTORY.
- Evaluation Tools: Unified prediction functions, so every model is evaluated through the same code path.
- Datasets Included: All datasets needed for training and evaluation are provided within the project.
- Perplexity Calculation (coming soon): Scripts for computing perplexity will be added.

Getting Started

Prerequisites

- Python 3.10

Set Up LLAMA-FACTORY Directory

Modify the LLAMA_FACTORY_DIRECTORY_new variable in your scripts to point to your LLAMA-FACTORY directory:
```
LLAMA_FACTORY_DIRECTORY_new = '/path/to/your/LLAMA-FACTORY'
```
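
Because the scripts rely on this variable, it can help to check the path before launching a run. The following is a minimal sketch and not part of the repository: the helper name `check_llama_factory_dir` is hypothetical, and it only verifies that the path is an existing directory.

```python
from pathlib import Path

# Placeholder from this README; replace it with your own LLAMA-FACTORY checkout.
LLAMA_FACTORY_DIRECTORY_new = '/path/to/your/LLAMA-FACTORY'


def check_llama_factory_dir(path: str) -> Path:
    """Return the resolved directory, or raise if it does not exist.

    Hypothetical helper: it does not inspect the directory contents.
    """
    directory = Path(path).expanduser()
    if not directory.is_dir():
        raise FileNotFoundError(f"LLAMA-FACTORY directory not found: {directory}")
    return directory.resolve()
```

Calling `check_llama_factory_dir(LLAMA_FACTORY_DIRECTORY_new)` at the top of a script fails early with a clear message instead of partway through training.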

Install Dependencies

Install the required Python packages:
```
pip install -r requirements.txt
```

Training

To train the model:

Modify the Training Script

In utils/train.py, locate the train_llama_factory function and update the model path so that it points to your model.
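
Rather than editing the path by hand for every run, one option is to read it from an environment variable and use the result wherever `train_llama_factory` sets the model path. This is only a sketch: `resolve_model_path` and the `MODEL_PATH` variable are hypothetical names, and how the value is wired into `train_llama_factory` depends on that function's actual code, which is not reproduced here.

```python
import os


def resolve_model_path(default: str, env_var: str = "MODEL_PATH") -> str:
    """Return the path stored in `env_var`, or `default` if it is unset or blank.

    Hypothetical helper; the environment variable name is an assumption.
    """
    value = os.environ.get(env_var, "").strip()
    return value if value else default


# Placeholder default; replace it with the location of your model.
model_path = resolve_model_path("/path/to/your/model")
```

With this in place, `MODEL_PATH=/data/models/my-model python main.py` would override the default without touching the source, assuming the script uses `model_path` where the hard-coded value used to be.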

Evaluation

To evaluate the trained model:

Modify the Evaluation Script

In evaluation/eval.py, find the do_predict_llama_factory_unify function and set the model path to the model you want to evaluate.
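
A mistyped model path is an easy mistake at this step, so a quick check before evaluation may save a failed run. The sketch below assumes the model path is a local directory (not a Hub model ID) and looks for `config.json`, which Hugging Face Transformers writes for a full model, or `adapter_config.json`, which PEFT writes for an adapter. The helper name `looks_like_model_dir` is hypothetical and not part of evaluation/eval.py.

```python
from pathlib import Path

# Config file names written by Transformers (full model) and PEFT (adapter).
EXPECTED_CONFIG_FILES = ("config.json", "adapter_config.json")


def looks_like_model_dir(path: str) -> bool:
    """Return True if `path` is a directory holding a known model config file.

    Hypothetical helper: it checks file names only and does not load the model.
    """
    directory = Path(path).expanduser()
    if not directory.is_dir():
        return False
    return any((directory / name).is_file() for name in EXPECTED_CONFIG_FILES)
```

A script could call `looks_like_model_dir(model_path)` and stop with an error message when it returns `False`.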

Datasets

All datasets required for training and evaluation are available in the dataset/ directory.

Perplexity Calculation

Note: The perplexity calculation scripts will be added soon.
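
Until those scripts are available, the standard definition may serve as a reference: perplexity is the exponential of the average negative log-likelihood per token. The sketch below is a generic illustration, not the authors' implementation. The function name `perplexity` and its input, a list of natural-log token probabilities that you would obtain from your own model, are assumptions.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Compute exp(-mean(log p)) over natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_negative_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_negative_log_likelihood)
```

As a sanity check on the definition, a model that assigns probability 0.25 to every token has a perplexity of 4, matching the intuition of choosing uniformly among four options.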

Recommended Workflow

For the best experience:

1. Create Custom Training Scripts
   - Start by creating your own training scripts based on the provided templates.
   - Customize them according to your project's needs.
2. Train Your Model
   - Use the modified training scripts to train your model.
   - Ensure all paths and configurations point to your directories and models.
3. Evaluate the Model
   - After training, use the evaluation scripts to assess your model's performance.
   - Merge or adapt the evaluation scripts into your project as needed.

Acknowledgments

LLAMA-FACTORY: This project builds upon the excellent work done in the LLAMA-FACTORY project.

