Simple training script for toy data? #46
Comments
Could you retry? There was an issue uncovered by Farid that has since been resolved.
Hi @xiao-xian, I uploaded a demo notebook to my fork. It's tricky with such a small dataset; I've had more luck using 30-40 models per label, since the model generalizes better. Also, if you don't pre-generate the codes before you train the transformer, the autoencoder will regenerate them on every training step, wasting 4-96 GB of VRAM, since the codes it generates are discarded by the dataloader.
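The pre-generation advice above can be sketched generically: encode each mesh once and cache the result, so the transformer's dataloader reads cached codes instead of re-running the autoencoder every step. This is a minimal stand-alone sketch, not the repo's actual API; `encode_mesh` is a hypothetical placeholder for the trained autoencoder's encode pass.

```python
# Hypothetical sketch: cache autoencoder codes so they are computed once,
# not regenerated (and thrown away) every transformer training step.

def encode_mesh(mesh):
    # Placeholder for the real autoencoder encode pass (an assumption,
    # not the library's API) -- here we just hash vertices into fake codes.
    return [hash((round(x, 4), round(y, 4), round(z, 4))) % 512
            for (x, y, z) in mesh["vertices"]]

class CodeCachingDataset:
    """Wraps a list of meshes; encodes each mesh once and caches the codes."""

    def __init__(self, meshes):
        self.meshes = meshes
        self._code_cache = {}

    def __len__(self):
        return len(self.meshes)

    def __getitem__(self, idx):
        if idx not in self._code_cache:
            # The expensive encode happens only on first access.
            self._code_cache[idx] = encode_mesh(self.meshes[idx])
        return self._code_cache[idx]

meshes = [{"vertices": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]}]
dataset = CodeCachingDataset(meshes)
first = dataset[0]
second = dataset[0]   # served from the cache, no re-encode
assert first is second
```

In practice you would persist the cached codes to disk after the autoencoder finishes training, so transformer runs never touch the autoencoder at all.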
Many thanks @MarcusLoppe!! I pulled your branch and used the notebook above to run the training. The training loss for the encoder is around 0.28.
Ah, yes, the transformer sometimes has trouble with the first token. The text doesn't guide it very well when there are no tokens yet. This issue goes away with many more mesh models, but it is a problem when dealing with a small number of meshes. It also seems to have trouble with meshes that have very few triangles: the box has only 12 triangles and always gave it some trouble, while the 112-triangle meshes were fine. Try some meshes that are a bit more 'complex'; here are 4 tables and 4 chairs that work pretty well. Apply 50 augmentations per model for more robust generalization.
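The "50 augmentations per model" suggestion above can be sketched with simple geometric transforms. This is a hypothetical recipe (random y-axis rotation, uniform scale, small jitter); the fork's actual augmentation settings may differ.

```python
import math
import random

def augment_vertices(vertices, rng):
    """One augmentation: random y-axis rotation, uniform scale, small jitter.
    The ranges here are illustrative assumptions, not the fork's settings."""
    angle = rng.uniform(0.0, 2.0 * math.pi)
    scale = rng.uniform(0.9, 1.1)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in vertices:
        # Rotate around the y (up) axis, then scale and jitter each coordinate.
        rx = cos_a * x + sin_a * z
        rz = -sin_a * x + cos_a * z
        out.append((rx * scale + rng.uniform(-0.01, 0.01),
                    y * scale + rng.uniform(-0.01, 0.01),
                    rz * scale + rng.uniform(-0.01, 0.01)))
    return out

def make_augmented_dataset(meshes, copies_per_mesh=50, seed=0):
    """Expand each mesh (a list of (x, y, z) vertices) into many variants."""
    rng = random.Random(seed)
    return [augment_vertices(m, rng) for m in meshes for _ in range(copies_per_mesh)]

table = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
augmented = make_augmented_dataset([table], copies_per_mesh=50)
assert len(augmented) == 50
```

Each augmented copy keeps the same triangle topology, so the autoencoder sees the same shape under many poses rather than memorizing one coordinate layout.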
@MarcusLoppe oh interesting, maybe I could add an option to generate the first token unconditionally? What do you think?
It kind of seems like it's already doing that; I'm guessing the cross-attention has little impact when there are no data/tokens yet. I've tried setting only_cross to True, but it doesn't have a noticeable effect on the problem.
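The "unconditional first token" idea above can be sketched as a sampling loop that ignores the text condition for step 0 only. This is a toy illustration of the control flow, not the repo's generation code; `uncond_next` and `cond_next` are hypothetical stand-ins for model calls.

```python
import random

def sample_sequence(uncond_next, cond_next, length, rng):
    """Autoregressive sampling where the FIRST token ignores the text
    condition. Both callables map a token prefix to candidate next tokens;
    they are assumptions standing in for real model forward passes."""
    tokens = []
    for step in range(length):
        candidates = uncond_next(tokens) if step == 0 else cond_next(tokens)
        tokens.append(rng.choice(candidates))
    return tokens

# Toy stand-ins: an unconditional prior over generic start tokens, and a
# deterministic "conditioned" continuation.
uncond = lambda prefix: [0, 1]
cond = lambda prefix: [prefix[-1] + 1]

rng = random.Random(42)
seq = sample_sequence(uncond, cond, length=5, rng=rng)
assert seq[0] in (0, 1)
assert all(b == a + 1 for a, b in zip(seq, seq[1:]))
```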
Hi there,
I wonder if it's possible to have a script that reproduces the toy example from an older paper. I tried to run the training, but the best thing I came up with is this:
I also constantly run into NaNs, as reported here. Thanks for any help!
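For the NaN issue mentioned above, a common workaround is to guard the training loop so a non-finite loss skips the update instead of corrupting the weights. This is a generic sketch, not code from this repo; in PyTorch you would typically pair it with `torch.nn.utils.clip_grad_norm_` and a lower learning rate.

```python
import math

def safe_training_step(compute_loss, apply_update, batch):
    """Run one step; skip the optimizer update when the loss is non-finite.
    `compute_loss` and `apply_update` are hypothetical stand-ins for the
    forward pass and optimizer step."""
    loss = compute_loss(batch)
    if not math.isfinite(loss):
        return None  # skip the poisoned step instead of propagating NaNs
    apply_update(loss)
    return loss

updates = []
assert safe_training_step(lambda b: float("nan"), updates.append, None) is None
assert updates == []
assert safe_training_step(lambda b: 0.25, updates.append, None) == 0.25
assert updates == [0.25]
```

Logging how often the guard fires is useful: frequent skips usually point at a too-high learning rate or bad samples in the dataset rather than a one-off numerical blip.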