There's a bug in the GPT-NeoX flops calculation here:

`gpt-neox/megatron/logging.py`, line 104 (commit `a2b2020`)
The term proportional to the vocab_size, `flops_calc2`, should share all the same prefactors as `flops_calc_1`. See page 12 of arXiv:2104.04473.

Since this term scales inversely with both hidden_dim and num_layers, it is more significant for small models and less important for large models: for Pythia-70m the error is roughly $50257 / (16 \times 6 \times 512) \approx 102\%$, while for GPT-NeoX-20B it is only $50257 / (16 \times 44 \times 6144) \approx 1.2\%$.
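For reference, here is a minimal sketch of the per-iteration FLOPs formula as written on page 12 of arXiv:2104.04473, in which the vocab term sits inside the same prefactor as the attention term. The function and variable names below are illustrative only and are not the ones used in `megatron/logging.py`:

```python
def transformer_flops_per_iteration(
    batch_size: int,
    seq_len: int,
    num_layers: int,
    hidden_size: int,
    vocab_size: int,
    checkpoint_activations: bool = True,
) -> float:
    """Approximate training FLOPs per iteration for a GPT-style model,
    following the formula on page 12 of arXiv:2104.04473."""
    # 4 matmul passes (fwd + recompute + 2x bwd) with activation checkpointing,
    # 3 passes (fwd + 2x bwd) without; 4 * 24 = 96 recovers the paper's prefactor.
    passes = 4 if checkpoint_activations else 3
    common = passes * 24 * batch_size * seq_len * num_layers * hidden_size**2
    # Both correction terms multiply the same prefactor `common`:
    #   attention term:        s / (6h)
    #   logit/embedding term:  V / (16 l h)
    return common * (
        1.0
        + seq_len / (6.0 * hidden_size)
        + vocab_size / (16.0 * num_layers * hidden_size)
    )


# Relative size of the vocab term for the two models mentioned above:
#   Pythia-70m:   50257 / (16 *  6 *  512) ~ 1.02  (~102%)
#   GPT-NeoX-20B: 50257 / (16 * 44 * 6144) ~ 0.012 (~1.2%)
```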
This bug seems to have been introduced only ~3 months ago in #1044, so it may not have had an impact on, e.g., any tests done while training Pythia.