Failed to reproduce paper's model sizes. #48

AI-Guru · 2024-08-22T09:28:56Z

Hi everyone,

I am most excited about xLSTM. Great and promising work!

Today, I am having trouble reproducing the model sizes from the paper. For example, xLSTM[7:1] is listed with 125M trainable parameters.

From the paper, I constructed the following config:

from omegaconf import OmegaConf
from dacite import from_dict
from xlstm.xlstm_lm_model import xLSTMLMModel, xLSTMLMModelConfig

# Load the config.
config_string = """ 
model:
  vocab_size: 50257
  num_blocks: 24
  embedding_dim: 384
  mlstm_block:
    mlstm:
      num_heads: 4
  slstm_block:
    slstm:
      num_heads: 4
  slstm_at: [3, 20]
  context_length: 2048
"""
config = OmegaConf.create(config_string)

# Create the model.
model_config = from_dict(xLSTMLMModelConfig, OmegaConf.to_container(config.model))
model = xLSTMLMModel(model_config)
print(model_config)
print(model)

# Get the number of parameters.
number_of_parameters = sum(p.numel() for p in model.parameters())
print(f"Number of parameters: {number_of_parameters:_}")

It yields:

Number of parameters: 60_575_792

This is roughly half of the expected parameters. What did I miss?

Cheers,
Tristan


PRamoneda · 2024-10-09T12:05:22Z

We have the same problem.

kpoeppel · 2024-10-10T11:35:36Z

It should be an embedding dimension of 768. Where did you find the 384?
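
To get a feel for how much the embedding dimension alone moves the count, here is a minimal sketch that computes only the size of one token-embedding table (vocab_size × embedding_dim) for the two values discussed above. The helper name `embedding_parameters` is hypothetical and not part of the xlstm package. It ignores all block parameters and any separate output head, so it is not an estimate of the full model size.

```python
# Vocabulary size taken from the config in the original post.
VOCAB_SIZE = 50257


def embedding_parameters(vocab_size: int, embedding_dim: int) -> int:
    """Size of one token-embedding table of shape (vocab_size, embedding_dim).

    Hypothetical helper for illustration; not an xlstm API.
    """
    return vocab_size * embedding_dim


# Compare the dimension from the original post (384) with the suggested one (768).
for dim in (384, 768):
    print(f"embedding_dim={dim}: {embedding_parameters(VOCAB_SIZE, dim):_}")
# embedding_dim=384: 19_298_688
# embedding_dim=768: 38_597_376
```

With embedding_dim 384, the embedding table alone accounts for about 19.3M of the reported 60_575_792 parameters. To get the actual total for 768, change `embedding_dim` in the config above and rerun the original script.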
