fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed #1309
After the 'mlp_type' option was removed, the regular and llama-type MLPs share the same implementation, and the MLP type is now determined by whether the activation function is gated.
However, this changes the meaning of the 'intermediate_size' option in the llama configuration files. The code (megatron/model/transformer.py) now treats 'intermediate_size' as the output size of the MLP's first linear layer, which for a llama-type MLP covers both the up-projection and the gate projection. As a result, the code effectively halves 'intermediate_size' for llama-type MLPs. On top of that, the code multiplies 'intermediate_size' by 2/3, so the actual FFN width ends up at only 1/3 of the value intended in the configuration file.
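As a rough sketch of the arithmetic only (the function name and the exact order of operations here are mine, not the actual code in megatron/model/transformer.py):

```python
# Illustrative sketch of how the configured 'intermediate_size' ends up as
# the per-branch FFN width after the 'mlp_type' removal.
def effective_ffn_width(intermediate_size: int, gated: bool) -> int:
    width = int(intermediate_size * 2 / 3)  # 2/3 scaling applied by the code
    if gated:
        width //= 2  # combined up/gate projection is split into two halves
    return width

# A llama-type (gated) MLP configured with intermediate_size=11008 ends up
# with a width of 3669 -- roughly 1/3 of the intended value.
print(effective_ffn_width(11008, gated=True))  # 3669
```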
To fix this, I revised the llama configuration files and set 'intermediate_size' to 3 times its intended value.
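For example, assuming an intended FFN width of 11008 (the value used by Llama-7B; the numbers below are just an illustration), the config now needs to specify three times that:

```python
# Sanity check of the fix, using 11008 as an example intended width.
intended = 11008
configured = 3 * intended              # value now written in the config file
actual = int(configured * 2 / 3) // 2  # 2/3 scaling, then the up/gate split
assert actual == intended              # 33024 -> 22016 -> 11008
```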