"gas" configuration doesn't do anything #149

segyges · 2024-02-04T21:03:48Z

Per this, my understanding is that the gas config in neox doesn't do anything, so it shouldn't be used and should be removed. We should be using gradient_accumulation_steps instead.

It appears that all existing pythia configs set gas to 1, which is the default for gradient_accumulation_steps anyway, so this will not matter for them. Per that same search, some of the old eval results specifically show gas at 2. If the expectation was that gas did something, that would be a serious error: the effective batch size would be half of what was intended.
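To make the halving concrete, here is a rough sketch of the arithmetic. It assumes the usual relationship (global batch = micro batch per GPU × accumulation steps × data-parallel size) and takes the claim above at face value, i.e. that only gradient_accumulation_steps is read and gas is silently ignored. The function name and the numbers (micro batch 4, 8 data-parallel ranks) are made up for illustration and are not taken from any pythia config.

```python
def effective_batch_size(micro_batch_per_gpu: int, data_parallel_size: int, config: dict) -> int:
    """Samples per optimizer step.

    Assumption (from this issue, not verified here): only
    "gradient_accumulation_steps" is honored, defaulting to 1,
    and a "gas" key has no effect.
    """
    accumulation = config.get("gradient_accumulation_steps", 1)
    return micro_batch_per_gpu * accumulation * data_parallel_size


# Hypothetical config that sets gas to 2 and expects it to take effect.
ignored = {"gas": 2}
# Same intent expressed through the key that is actually read.
honored = {"gradient_accumulation_steps": 2}

print(effective_batch_size(4, 8, ignored))  # 32: accumulation falls back to 1
print(effective_batch_size(4, 8, honored))  # 64: the batch size that was intended
```

With gas at 1 both keys give the same result, which is why the existing configs would be unaffected.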

I am not putting in a PR to replace gas with gradient_accumulation_steps because these configs are references for the settings of existing artifacts. It's not clear to me whether they should be fixed to be "correct" or, if they are, what steps would make sure they're preserved as references for those artifacts once the configuration is fixed going forward.
