Why are we using olympic scores stdev for RCP interpolation? #375

ShriyaPalsamudram · 2024-07-11T19:14:19Z

The general idea of stdev is to capture the variance of a population, but Olympic scoring drops the extreme values. We should therefore consider using the stdev of the entire population for interpolation.
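To make the difference concrete, here is a minimal sketch comparing the two estimates. The epoch counts are made up for illustration, the helper names are hypothetical, and the use of the sample stdev (n - 1 denominator) is an assumption rather than a statement about what `rcp_checker.py` does.

```python
import statistics


def olympic_prune(values, rejected=1):
    """Drop the `rejected` smallest and `rejected` largest values."""
    ordered = sorted(values)
    return ordered[rejected:len(ordered) - rejected]


def stdev_olympic(values, rejected=1):
    """Sample stdev after Olympic pruning (hypothetical helper)."""
    return statistics.stdev(olympic_prune(values, rejected))


def stdev_full(values):
    """Sample stdev over every run, extremes included (hypothetical helper)."""
    return statistics.stdev(values)


# Made-up epochs-to-converge for one RCP point; not real MLPerf data.
epochs = [30, 32, 33, 34, 35, 36, 41]

print("olympic stdev:", stdev_olympic(epochs))
print("full stdev:   ", stdev_full(epochs))
```

Because pruning removes exactly the runs that contribute most to the spread, the Olympic stdev for this data comes out smaller than the full-population stdev.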

Need to analyze and fix for v4.1

pgmpablo157321 · 2024-07-31T20:55:43Z

pgmpablo157321 · 2024-07-31T20:57:47Z

hiwotadese · 2024-08-01T15:42:52Z

Can we see how this affects the v4.0 scores?

pgmpablo157321 · 2024-08-02T21:00:48Z

@ShriyaPalsamudram I am getting the same results with and without the Olympic stdev. This seems to be because the RCP Stdev is only used to compute min_epochs:

logging/mlperf_logging/rcp_checker/rcp_checker.py

Lines 272 to 275 in 369260b

    
    min_epochs = self._find_min_acceptable_mean(
        record_contents['RCP Mean'],
        record_contents['RCP Stdev'],
        len(epoch_list)-samples_rejected*2)

Since we are no longer pruning based on min_epochs, the change doesn't seem to affect the results. min_epochs does feed into the Max Speedup, but that value is only used later, in a condition that checks whether the RCP passed.

logging/mlperf_logging/rcp_checker/rcp_checker.py

Line 276 in 369260b

record_contents['Max Speedup'] = record_contents['RCP Mean'] / min_epochs

logging/mlperf_logging/rcp_checker/rcp_checker.py

Line 438 in 369260b

if mean_subm_epochs >= (rcp_record["RCP Mean"] / rcp_record["Max Speedup"]):

@ShriyaPalsamudram What changes were expected when changing the Stdev?
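For reference, the two quoted lines imply that `RCP Mean / Max Speedup` is just `min_epochs` again, so the pass condition reduces to comparing the submission's mean epochs against `min_epochs`. A small sketch of that relationship follows; the function names and numbers are hypothetical, and `min_epochs` is supplied directly rather than computed by `_find_min_acceptable_mean`.

```python
import math


def max_speedup(rcp_mean, min_epochs):
    """Mirrors: record_contents['Max Speedup'] = RCP Mean / min_epochs."""
    return rcp_mean / min_epochs


def rcp_passes(mean_subm_epochs, rcp_mean, speedup):
    """Mirrors: mean_subm_epochs >= RCP Mean / Max Speedup."""
    return mean_subm_epochs >= rcp_mean / speedup


# Hypothetical values for illustration only.
rcp_mean = 40.0
min_epochs = 37.5

speedup = max_speedup(rcp_mean, min_epochs)
threshold = rcp_mean / speedup  # recovers min_epochs up to rounding

print("max speedup:", speedup)
print("threshold equals min_epochs:", math.isclose(threshold, min_epochs))
print("38 epochs passes:", rcp_passes(38.0, rcp_mean, speedup))
print("36 epochs passes:", rcp_passes(36.0, rcp_mean, speedup))
```

Under this reading, a larger RCP Stdev matters only through a lower `min_epochs`, which raises Max Speedup and loosens the pass threshold.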

ShriyaPalsamudram · 2024-08-08T15:51:10Z

Since this impacts max speedup, can we compare max speedup before and after the change for all RCP points?

pgmpablo157321 · 2024-08-14T15:20:42Z

pgmpablo157321 · 2024-08-14T16:52:49Z

Additionally, here is an example of the max_speed_up values for the last training results:
HPE-Cray-XD670-Gen11-H100-SXM5-80GB_n1_mxnet_24.04
With olympic score:

[1.018748075108887, 1.0601459916687128]

Without olympic score:

[1.0262860923600325, 1.0699036853548622]
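To put the two lists side by side, a short sketch that computes the relative change in Max Speedup for each pair, using only the values quoted above (the helper name is hypothetical):

```python
# Max Speedup values quoted above for the HPE-Cray-XD670 submission.
with_olympic = [1.018748075108887, 1.0601459916687128]
without_olympic = [1.0262860923600325, 1.0699036853548622]


def relative_change(before, after):
    """Fractional change going from `before` to `after`."""
    return (after - before) / before


changes = [relative_change(b, a) for b, a in zip(with_olympic, without_olympic)]

for before, after, change in zip(with_olympic, without_olympic, changes):
    print(f"{before:.4f} -> {after:.4f} ({change:+.2%})")
```

Both values increase by a bit under 1% without Olympic pruning, which, given `Max Speedup = RCP Mean / min_epochs`, corresponds to a slightly lower `min_epochs`.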

pgmpablo157321 mentioned this issue Jul 31, 2024

Calculate Stdev without olympic pruning #377

Merged

hiwotadese closed this as completed in #377 Aug 29, 2024
