
data format for training #1

Open
quangtuan-0504 opened this issue Nov 16, 2023 · 4 comments

Comments

@quangtuan-0504 commented Nov 16, 2023

I have some questions, please help me answer them. My dataset has 10,000 background-music audio samples. Each sample is approximately 10 s long, sample_rate = 16 kHz, in .mp3 format, and each sample is accompanied by a text description of that sound. Please let me know what format I need to put my audio samples in to train them with your repo: what sample rate, what duration, etc. I tried running this repo with each audio sample converted to .wav format, sample_rate = 16 kHz, number of epochs = 30, update_per_step = 1000, batch_size = 2, but the result is very bad.

Thank you for reading the comment

@lyramakesmusic (Owner)

if you used the autolabeler, it's hardcoded to 44100, so you might need to dig into that cell and change it to 16000. it should also auto-set the length. 30 epochs should be quite enough to get good results if the labeling and config are correct:

entry = {
    "key": f"{key}",
    "artist": artist_name,
    "sample_rate": 44100, # Change this to 16000
    "file_extension": "wav", # You already converted, so this is OK
    "description": "",
    "keywords": "",
    "duration": length, # Double-check that this is getting set to 10 in the config files, but should be OK
    "bpm": tempo,
    "genre": result.get('genres', ""),
    "title": "",
    "name": "",
    "instrument": result.get('instruments', ""),
    "moods": result.get('moods', []),
    "path": os.path.join(dataset_path, filename),
}
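As a sketch of how those entries typically end up on disk (this helper and its paths are hypothetical, not part of the repo), each clip can get a one-line JSON sidecar next to its audio file, with sample_rate set to match the actual 16 kHz audio:

```python
import json
import os

def write_metadata(dataset_path, filename, duration, description=""):
    """Build a metadata entry for one clip and save it as a .json
    sidecar next to the audio file (hypothetical dataset layout)."""
    os.makedirs(dataset_path, exist_ok=True)
    key = os.path.splitext(filename)[0]
    entry = {
        "key": key,
        "artist": "",
        "sample_rate": 16000,       # match the real audio, not the 44100 default
        "file_extension": "wav",
        "description": description, # your per-clip text label
        "keywords": "",
        "duration": duration,       # seconds, e.g. 10 for 10 s clips
        "bpm": "",
        "genre": "",
        "title": "",
        "name": "",
        "instrument": "",
        "moods": [],
        "path": os.path.join(dataset_path, filename),
    }
    with open(os.path.join(dataset_path, key + ".json"), "w") as f:
        json.dump(entry, f)
    return entry

entry = write_metadata("/tmp/dataset", "clip_0001.wav", 10, "calm background music")
```

The exact set of fields is taken from the snippet above; unused ones can stay blank, as discussed later in this thread.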

@lyramakesmusic (Owner)

If you do have higher-quality versions of the files, you should probably be using those, since 16 kHz is pretty low. musicgen generates 32 kHz, for reference.

@quangtuan-0504 (Author) commented Nov 17, 2023

Thank you very much for giving me the answer. I want to ask a little more:

[screenshot]

I don't use automatic labeling; my .json file only has sample_rate, file_extension, description, duration, and path, and I leave the remaining fields blank. Does this lead to any errors?

This is my yaml file:

[screenshot]

This is an audio file of my training data:

[screenshot]

Here is the link to my train data set:
https://drive.google.com/file/d/1-4l7c_QmItyd1pawdl1ppLX-8xSo7sw_/view?usp=sharing

@lyramakesmusic (Owner) commented Nov 27, 2023

  • it's okay to leave those fields blank
  • max_sample_rate in datasource yaml should probably be 16000. this might help, but I don't see an obvious error in this training config
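For reference, a datasource yaml along those lines might look roughly like this (a sketch only; the field names follow the audiocraft-style dataset config convention and the paths are placeholders, so check them against your actual config):

```yaml
# hypothetical datasource config; paths are placeholders
datasource:
  max_sample_rate: 16000   # match the 16 kHz training audio
  max_channels: 1
  train: egs/train
  valid: egs/valid
  evaluate: egs/test
  generate: egs/test
```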
