fix 'intermediate_size' in Llama configuration files after the 'mlp_type' option was removed #1309
After the 'mlp_type' option was removed, the regular and llama-type MLPs share the same implementation, and the MLP type is now determined by whether the activation function is gated.
However, this changes the meaning of the 'intermediate_size' option in the llama configuration files. The code (megatron/model/transformer.py) now treats 'intermediate_size' as the output size of the MLP's first linear layer, which for a llama-type MLP covers both the up-projection and the gate projection. As a result, the code effectively halves 'intermediate_size' for llama-type MLPs. On top of that, the code multiplies 'intermediate_size' by 2/3, so the actual FFN width ends up at only 1/3 of the value intended in the configuration file.
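As a rough sketch of the arithmetic only (the function name and the exact order of operations here are mine, not the actual code in megatron/model/transformer.py):

```python
# Illustrative sketch of how the configured 'intermediate_size' ends up as
# the per-branch FFN width after the 'mlp_type' removal.
def effective_ffn_width(intermediate_size: int, gated: bool) -> int:
    width = int(intermediate_size * 2 / 3)  # 2/3 scaling applied by the code
    if gated:
        width //= 2  # combined up/gate projection is split into two halves
    return width

# A llama-type (gated) MLP configured with intermediate_size=11008 ends up
# with a width of 3669 -- roughly 1/3 of the intended value.
print(effective_ffn_width(11008, gated=True))  # 3669
```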
To fix this, I revised the llama configuration files and set 'intermediate_size' to 3 times its intended value.
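For example, assuming an intended FFN width of 11008 (the value used by Llama-7B; the numbers below are just an illustration), the config now needs to specify three times that:

```python
# Sanity check of the fix, using 11008 as an example intended width.
intended = 11008
configured = 3 * intended              # value now written in the config file
actual = int(configured * 2 / 3) // 2  # 2/3 scaling, then the up/gate split
assert actual == intended              # 33024 -> 22016 -> 11008
```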