Weird alignment images and bad sound even after 100k steps. #27
Comments
Your GPU needs more RAM for training. Per comments on other Tacotron repos, you should use a batch size of 32 or above to get alignment. I also get the triangle charts with my dataset; I'm not sure what causes that.
Like I said, I did over 100k steps using a batch size of 32 on CPUs with 48 GB of RAM. I get the same problem on either the CPU or the GPU. I just tried the CPU with a batch size of 64 and got the same issue: a band shows across the top. Again, I'm not sure if it's relevant, but the wavs it generates are WAY louder than the original training wavs. I have to set the volume to about 75% to hear the training wavs well in VLC and MPlayer. The clips generated by Mimic2 I have to set to 3%, and they're still twice as loud as the training clips at 75%. Not sure what you mean by triangles? I'm complaining about the band that never goes away! You can see it in all my alignment images.
See your 100k-step picture for what I mean by triangle. The output still tends to echo when that occurs. I'm not sure why your test wavs are ending up so loud. I haven't been able to train LJ on mimic2 successfully, though one of the Mycroft folks said he was able to.
That just means that the training is working: keithito#144 So you've tried training on the LJSpeech dataset as well? Did you get the same interference I'm getting in your alignment graphs and synthesized sound?
It doesn't work, though. The generated samples from models that have the weird align/fuzzy bar triangley thing end up either filled with echo or losing coherency quickly. Aligned models from previous iterations of tacotron/mimic2 I've run haven't had those issues, and their alignment charts are much closer to ideal (i.e., just a line going from bottom left to upper right).
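The "ideal" alignment described above (a single line from bottom left to upper right) can be checked roughly in code rather than by eye. The following is only a sketch under stated assumptions: `alignment_summary` is a hypothetical helper, not part of mimic2 or Tacotron, and it assumes the alignment is available as a list of rows, one row per decoder step, each holding attention weights over encoder steps.

```python
def alignment_summary(alignment):
    """Summarize an attention alignment (hypothetical helper, not a mimic2 API).

    alignment: list of rows, one per decoder step; each row is a list of
    attention weights over encoder steps.
    Returns (monotonic_fraction, span):
      monotonic_fraction: share of consecutive decoder steps whose
                          peak-attention position does not move backward.
      span: how much of the encoder axis the peak positions cover (0..1).
    """
    if not alignment or not alignment[0]:
        raise ValueError("alignment must be a non-empty 2-D list")
    # Encoder index with the highest weight at each decoder step.
    peaks = [max(range(len(row)), key=row.__getitem__) for row in alignment]
    steps = len(peaks) - 1
    if steps > 0:
        forward = sum(1 for a, b in zip(peaks, peaks[1:]) if b >= a)
        monotonic = forward / steps
    else:
        monotonic = 1.0
    width = len(alignment[0]) - 1
    span = (max(peaks) - min(peaks)) / width if width > 0 else 0.0
    return monotonic, span
```

Under these assumptions, a clean diagonal gives values near `(1.0, 1.0)`, while attention stuck on one encoder position (a flat band) gives a span near `0.0`. When a band and a diagonal coexist, the per-step peak may land on either, so treat the numbers as a rough indicator only.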
So what changed to create this issue? Do you know roughly when it started to crop up?
A bunch of stuff was updated last September or so. As a test, try the following: preprocess all your data with the mimic2 repo, then clone keithito's tacotron repo and use it to train for 25k steps or so, by which time you should see normal alignment.
Mimic2 complains about "bias not found in checkpoint" even when the data was preprocessed with Mimic2. I guess I'll have to figure this out on my own.
Did you clear out the step, model, and checkpoint files from the previous training run?
I cut and pasted them into a different folder.
More issues: the thing is still super loud. Also, it only aligns sometimes, even with my modifications. I have a feeling the remaining issues are volume related. For now I'm just going to use Tacotron. I modified Tacotron so I can interface it with Mycroft, and it seems to be working.
I have used a different dataset (private) and trained for 18k steps using the existing mimic2 (master branch). I was able to get good alignment and also a decent voice. Could you please share your plots generated using this |
The plots above were generated using mimic2 on the LJSpeech dataset. It could be that something particular to the LJSpeech dataset makes it not work well with Mimic2; I dunno.
So I've been trying to train this on the LJSpeech dataset since it seemed like the most solid one out there. However, I've been having an issue where there is a messy band across the alignment; even after it finds alignment, the disturbance remains. The step audio clips sound great after 20k steps or so, but if you synthesize using the demo server it sounds like garbled junk even after 100k steps. Here are some alignment images from when I did 100k steps on the CPU:
This is step 1k, you can already see the band.
50k steps, band still there, but starting to align finally!
100k steps, alignment improving, but band still there!
I used the default hparams for the CPU run. Then I decided to use the GPU. The GPU is a K620 with only 2 GB of memory, so I had to set the hparams like this to avoid running out of memory (OOM):
Note that I had to bring the batch size down to conserve memory, and I also changed the sample rate from 22000 to 22050 because that's what was listed for the dataset. I thought that might be the issue. I only ran it for 12 hours, so I didn't get a lot of steps, but here are the results:
1k steps, hmm looks like that dang band again to me!
12k steps, this is where I stopped it because I didn't want to bother wasting more time, but the band is still there looking stronger than ever!
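The sample rate change mentioned above (22000 to 22050) can be verified against the audio files themselves. This is a minimal sketch using only Python's standard `wave` module; `sample_rate_counts` is a hypothetical helper and the directory in the usage comment is an assumed location, not a path from this thread.

```python
import wave
from collections import Counter
from pathlib import Path


def sample_rate_counts(wav_dir):
    """Count how many WAV files in wav_dir use each sample rate (in Hz)."""
    counts = Counter()
    for path in sorted(Path(wav_dir).glob("*.wav")):
        # The WAV header stores the sample rate; no audio is decoded here.
        with wave.open(str(path), "rb") as wav_file:
            counts[wav_file.getframerate()] += 1
    return counts


# Example usage (the path is an assumption about where the data was unpacked):
# print(sample_rate_counts("LJSpeech-1.1/wavs"))
```

If every file reports the same rate, that value is the one the hparams sample rate should match; a mix of rates would suggest resampling the data before preprocessing.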
Does anyone have any clue what could be causing this issue? Is there anything I can tweak in the hparams to correct this? Could it be an issue with the code? On a side note, if I use the demo server or listen to the alignment clips, they are much, MUCH louder than the sample data. I'm not sure if that's related or controllable somehow.
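The loudness difference described above can be put into numbers by comparing peak and RMS levels of a training clip and a generated clip. The sketch below is an illustration only: `wav_levels_dbfs` is a hypothetical helper, it assumes 16-bit PCM WAV files, and the file names in the usage comment are placeholders.

```python
import array
import math
import sys
import wave


def wav_levels_dbfs(path):
    """Return (peak_dbfs, rms_dbfs) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav_file.readframes(wav_file.getnframes())
    samples = array.array("h", frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        raise ValueError("file contains no samples")
    full_scale = 32768.0  # magnitude of the most negative 16-bit sample
    peak = max(abs(s) for s in samples) / full_scale
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) / full_scale

    def to_db(x):
        return 20 * math.log10(x) if x > 0 else float("-inf")

    return to_db(peak), to_db(rms)


# Example usage (placeholder file names):
# print(wav_levels_dbfs("training_clip.wav"))
# print(wav_levels_dbfs("generated_clip.wav"))
```

A generated clip whose peak sits at or very near 0 dBFS while the training clips sit well below it would point toward the output being normalized or clipped, which may be worth ruling out before changing other hparams.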