Does GPU memory keep growing without bound as the number of epochs increases? #19

Open
DirtyBit64 opened this issue May 7, 2023 · 5 comments

Comments

@DirtyBit64

My training ran out of GPU memory at epoch 70, and I still need to train for 400 epochs in total.

@dongzhang89

@PGthree3 No, it does not. My guess is that the problem is in your code.
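
A quick way to check whether memory really grows every epoch (rather than spiking once) is to log the peak usage per epoch and look at the trend. The sketch below is not from the repository: `grows_steadily` and the sample readings are hypothetical, and it assumes the readings were recorded at the end of each epoch with something like PyTorch's `torch.cuda.max_memory_allocated()`.

```python
def grows_steadily(peaks_mb, tolerance_mb=1.0):
    """Return True if every reading exceeds the previous one by more than tolerance_mb."""
    if len(peaks_mb) < 2:
        return False  # not enough data to talk about a trend
    return all(b - a > tolerance_mb for a, b in zip(peaks_mb, peaks_mb[1:]))


# Hypothetical per-epoch peak memory readings in MB (made-up numbers).
leaking = [8100.0, 8240.0, 8385.0, 8530.0]
stable = [8100.0, 8100.5, 8099.8, 8100.2]

print(grows_steadily(leaking))  # True
print(grows_steadily(stable))   # False
```

If the readings climb steadily, one common cause in PyTorch training loops is holding on to tensors that still carry the autograd graph, for example summing `loss` for logging instead of `loss.item()`. Whether that applies here cannot be told from this thread.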

@DirtyBit64
Author

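I reduced the batch size by two and lowered the block's intermediate expansion ratio to 2, which gave a modest accuracy gain. Has the author tried tuning the intermediate expansion ratio of EVCblock? The source code uses 4; is there a large difference between 4 and 2?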

@dongzhang89

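@PGthree3 The expansion ratio is a hyperparameter, so its effect on performance should vary across datasets. We did not do that much hyperparameter tuning at the time; EVCblock simply offers an alternative to attention. If you have more detailed experimental results, you are welcome to report them or submit a pull request. Thanks.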
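
To get a feel for what the expansion ratio changes, here is a rough parameter count for a generic expand-then-project channel MLP built from two 1x1 convolutions. This is only a sketch: `channel_mlp_params` is a hypothetical helper, the 256-channel width is an arbitrary example, and the actual EVCblock in the repository may be structured differently.

```python
def channel_mlp_params(channels, ratio):
    """Parameter count of two 1x1 convs: channels -> channels*ratio -> channels."""
    hidden = channels * ratio
    expand = channels * hidden + hidden      # weights + biases of the expanding conv
    project = hidden * channels + channels   # weights + biases of the projecting conv
    return expand + project


# Compare the ratio used in the source (4) with the reduced one (2).
for ratio in (4, 2):
    print(ratio, channel_mlp_params(256, ratio))
```

Under these assumptions, halving the ratio roughly halves both the parameters and the intermediate activation size of this sub-layer, so it should also reduce memory use. Its effect on accuracy still has to be measured per dataset.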

@DirtyBit64
Author

OK, thanks for the reply!

@SmoothJing

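I reduced the batch size by two and lowered the block's intermediate expansion ratio to 2, which gave a modest accuracy gain. Has the author tried tuning the intermediate expansion ratio of EVCblock? The source code uses 4; is there a large difference between 4 and 2?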

Did you use the full CFP module directly, or only the EVC module?
