Does GLM-4 use Pre-Norm or DeepNorm (Post-Norm)?
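Feature request
Does GLM-4 use Pre-Norm or DeepNorm (Post-Norm)?
Motivation
According to the technical report, GLM-4 should have inherited DeepNorm from GLM-130B. However, the configuration file published on Hugging Face sets apply_residual_connection_post_layernorm=False, which suggests that Post-Norm is not used. The same file also sets post_layer_norm=True, but that parameter only applies a LayerNorm once, at the very end of the decoder. So which description is correct, and should Post-Norm be used?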
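To make the question concrete, here is a minimal sketch of the variants as I understand them. It is a toy in plain Python, not the actual GLM-4 modeling code: `sublayer` is a hypothetical stand-in for attention or the MLP, `alpha` is a placeholder for the DeepNorm scaling constant, and the way the two flags are wired up is my reading of the config, which may be wrong.

```python
import math

def layer_norm(x, eps=1e-5):
    # Normalize a vector to zero mean and unit variance (no learned scale/bias).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def sublayer(x):
    # Hypothetical stand-in for self-attention or the MLP.
    return [0.5 * v + 1.0 for v in x]

def add(a, b):
    return [p + q for p, q in zip(a, b)]

def pre_norm_block(x, apply_residual_connection_post_layernorm=False):
    # Pre-Norm: normalize first, then add the sublayer output to a residual.
    normed = layer_norm(x)
    # Assumption: with the flag False the residual is the un-normalized input;
    # with the flag True it is taken after the LayerNorm.
    residual = normed if apply_residual_connection_post_layernorm else x
    return add(residual, sublayer(normed))

def deepnorm_block(x, alpha):
    # DeepNorm (a Post-Norm variant): LayerNorm(alpha * x + f(x)).
    return layer_norm(add([alpha * v for v in x], sublayer(x)))

def decoder(x, num_layers, post_layer_norm=True):
    # Assumption: post_layer_norm only adds one LayerNorm after the last block.
    for _ in range(num_layers):
        x = pre_norm_block(x)
    return layer_norm(x) if post_layer_norm else x
```

If this reading is right, apply_residual_connection_post_layernorm=False together with post_layer_norm=True corresponds to `decoder(x, num_layers)` above, which is a Pre-Norm stack with one final LayerNorm and not `deepnorm_block`.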
Your contribution
See above.