LLMTrainingTools provides a suite of utilities designed to facilitate the creation and management of training data for language learning models (LLMs), particularly chatbots. It offers tools for converting data between JSONL and SQLite formats, editing training data through a web interface, and converting CSV files to JSONL.
- Export and Import Data: Export data from the database to JSONL and import JSONL data into the database.
- Python 3.6 or above
- Pip
- An IDE (PyCharm preferred) VSCode also ok
- SQLite : For the Database operations
- Flask : For the Web UI and API
- Flask SQLAlchemy & SQLAlchemy : For the SQLite DB operations
- Jinja2 - For rendering the HTML templates
Clone the repository and install the required Python packages:
For Windows
git clone https://github.com/pradeepgudipati/LLMTrainingTools.git
cd LLMTrainingTools
python -m venv llmtraining
./llmtraining/Scripts/activate
pip install -r requirements.txt
For Other platforms please adjust above commands
To run the web application for editing the training data
> python app.py
----------------------------------------------------------------
(llmtraining) PS D:\Dev\Workspace\LLMTrainingTools> python .\app.py
BASE_DIR: D:\Dev\Workspace\LLMTrainingTools
DB_PATH: D:\Dev\Workspace\LLMTrainingTools\jsonl_data_to_db/data/qa_data.db
* Serving Flask app 'app'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on http://127.0.0.1:5000
Press CTRL+C to quit
* Restarting with stat
BASE_DIR: D:\Dev\Workspace\LLMTrainingTools
DB_PATH: D:\Dev\Workspace\LLMTrainingTools\jsonl_data_to_db/data/qa_data.db
* Debugger is active!
* Debugger PIN: 209-879-976
127.0.0.1 - - [29/Feb/2024 12:11:15] "GET / HTTP/1.1" 200 -
...
Now Open the below URL in your favorite browser - http://127.0.0.1:5000
- If you have a Question answer CSV base you can convert it to JSONL and then import it to the SQLite DB
- Click on the "CSV to JSONL" button at the top. Once the conversion is done, you will see a message "CSV to JSONL Conversion Done"
- Now Import the JSONL into the DB file (SQLite) by clicking on the "Import JSONL to DB" button. Once the import is done, you will see a message "Import JSONL to DB Done"
- Now you can see the data in the table view. You can edit the data and save it back to the DB by clicking on the "Save to DB" button. Once the save is done, you will see a message "Data Saved to DB"
- Now to export the data from the DB to JSONL, click on the "Export DB to JSONL" button. Once the export is done, you will see a message "Export DB to JSONL Done"
- This JSONL file can now be used to train the LLM model.
- Code for DB to JSONL is in db_to_jsonl.py. Verify the path for the jsonl and db files in the code
# Paths for the JSONL file and SQLite database
jsonl_file_path = "data/qa_data.jsonl"
sqlite_db_path = "data/merged_data.db"
- Now run the following command in the terminal to execute the code
python db_to_jsonl.py
- First verify the jsonl and db paths in the jsonl_to_sqllite.py
jsonl_file_path = "src/data_tools/import_utils/data/qa_data.jsonl"
sqlite_db_path = "data/qa_data.db"
- Run the following command in the terminal to execute the code
python jsonl_to_sqllite.py
- First verify the jsonl and csv paths in thecsv_to_jsonl.py
- If you have multiple CSV files then use the below path variable
# Paths
csv_files_path = './docs'
output_jsonl_file = 'training_data.jsonl'
convert_folder_csv_to_jsonl(csv_files_path, output_jsonl_file)
- If you have only 1 CSV file
# Paths
csv_files_path = './training_data.csv'
output_jsonl_file = 'training_data.jsonl'
convert_single_csv_to_jsonl(csv_file_path, output_jsonl_file)
- Execute the code
python csv_to_jsonl.py
- JSONL to SQLite DB Converter Code - jsonl_to_sqllite.py
This python code converts the jsonl data to a sqllite database with a table called messages. The messages table has 3 rows
- id - Primary key
- Question - Contains the Question
- Answer - Contains the answer to the Question
- SQLite to JSONL Converter code - db_to_jsonl.py
This code converts the sqlite database to a jsonl file The format of the jsonl is as follows
{
"messages": [
{
"role": "user",
"content": ""
},
{
"role": "assistant",
"content": ""
}
]
}
- Flask Server App Code - app.py
- Flask HTML Template - table_view.html
- Data Model llm_training_data_model.py
- JSONL Data File - qa_data.jsonl
- SQLite DB file - qa_data.db
- CSV to JSONL Converter - csv_to_jsonl.py
Start the Flask server by running the below command in the terminal
python app.py
Access the Web UI by opening the below URL in your favorite browser http://localhost:5000
Contributions are welcome! If you have ideas for improvements or want to contribute code, please feel free to reach out or submit a pull request.
This project is free to use and modify. For any queries or contributions, please contact Pradeep Gudipati on Linkedin