
data format for training #1

Open
quangtuan-0504 opened this issue Nov 16, 2023 · 4 comments

Comments

@quangtuan-0504 commented Nov 16, 2023

I have some questions, please help me answer them. My dataset has 10,000 background-music audio samples. Each sample is approximately 10 s long, sample_rate = 16 kHz, in .mp3 format, and each sample is accompanied by a text description of that sound. Please let me know what format I need to put my audio samples in to train them with your repo: what sample rate, what duration, etc. I tried running this repo with each audio sample converted to .wav format, sample_rate = 16 kHz, number of epochs = 30, update_per_step = 1000, batch_size = 2, but the result is very bad.

Thank you for reading the comment

@lyramakesmusic (Owner)

if you used the autolabeler, it's hardcoded to 44100, so you might need to dig into that cell and change it to 16000. it should also auto-set the length. 30 epochs should be quite enough to get good results if the labeling and config are correct:

entry = {
    "key": f"{key}",
    "artist": artist_name,
    "sample_rate": 44100, # Change this to 16000
    "file_extension": "wav", # You already converted, so this is OK
    "description": "",
    "keywords": "",
    "duration": length, # Double-check that this is getting set to 10 in the config files, but should be OK
    "bpm": tempo,
    "genre": result.get('genres', ""),
    "title": "",
    "name": "",
    "instrument": result.get('instruments', ""),
    "moods": result.get('moods', []),
    "path": os.path.join(dataset_path, filename),
}
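As a sketch of how those entries typically end up on disk (this helper and its paths are hypothetical, not part of the repo), each clip can get a one-line JSON sidecar next to its audio file, with sample_rate set to match the actual 16 kHz audio:

```python
import json
import os

def write_metadata(dataset_path, filename, duration, description=""):
    """Build a metadata entry for one clip and save it as a .json
    sidecar next to the audio file (hypothetical dataset layout)."""
    os.makedirs(dataset_path, exist_ok=True)
    key = os.path.splitext(filename)[0]
    entry = {
        "key": key,
        "artist": "",
        "sample_rate": 16000,       # match the real audio, not the 44100 default
        "file_extension": "wav",
        "description": description, # your per-clip text label
        "keywords": "",
        "duration": duration,       # seconds, e.g. 10 for 10 s clips
        "bpm": "",
        "genre": "",
        "title": "",
        "name": "",
        "instrument": "",
        "moods": [],
        "path": os.path.join(dataset_path, filename),
    }
    with open(os.path.join(dataset_path, key + ".json"), "w") as f:
        json.dump(entry, f)
    return entry

entry = write_metadata("/tmp/dataset", "clip_0001.wav", 10, "calm background music")
```

The exact set of fields is taken from the snippet above; unused ones can stay blank, as discussed later in this thread.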

@lyramakesmusic (Owner)

If you do have higher-quality versions of the files, you should probably be using those, since 16 kHz is pretty low. musicgen generates 32 kHz, for reference.

@quangtuan-0504 (Author) commented Nov 17, 2023

Thank you very much for giving me the answer. I want to ask a little more:

[screenshot]

I don't use automatic labeling; my .json file only has sample_rate, file_extension, description, duration, and path, and I leave the remaining fields blank. Does this lead to any errors?

This is my yaml file:

[screenshot]

This is an audio file of my training data:

[screenshot]

Here is the link to my train data set:
https://drive.google.com/file/d/1-4l7c_QmItyd1pawdl1ppLX-8xSo7sw_/view?usp=sharing

@lyramakesmusic (Owner) commented Nov 27, 2023

  • it's okay to leave those fields blank
  • max_sample_rate in datasource yaml should probably be 16000. this might help, but I don't see an obvious error in this training config
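For reference, a datasource yaml along those lines might look roughly like this (a sketch only; the field names follow the audiocraft-style dataset config convention and the paths are placeholders, so check them against your actual config):

```yaml
# hypothetical datasource config; paths are placeholders
datasource:
  max_sample_rate: 16000   # match the 16 kHz training audio
  max_channels: 1
  train: egs/train
  valid: egs/valid
  evaluate: egs/test
  generate: egs/test
```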
