Why are we using olympic scores stdev for RCP interpolation? #375

Closed
ShriyaPalsamudram opened this issue Jul 11, 2024 · 7 comments · Fixed by #377

Comments

@ShriyaPalsamudram
Contributor

ShriyaPalsamudram commented Jul 11, 2024

The point of the stdev is to capture the spread of the whole population, but Olympic scoring drops the extreme values, so the stdev computed after pruning understates that spread. We should consider using the stdev of the entire population for interpolation.

We need to analyze this and fix it for v4.1.
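To make the concern concrete, here is a minimal sketch comparing the stdev of a full sample with the stdev after Olympic-style pruning. The helper `olympic_prune`, the sample values, and the use of `statistics.stdev` (the sample standard deviation) are all assumptions for illustration; the compliance checker may prune and estimate differently.

```python
import statistics


def olympic_prune(values, samples_rejected=1):
    """Drop the `samples_rejected` lowest and highest values.

    Hypothetical helper, not taken from the checker's code.
    """
    ordered = sorted(values)
    return ordered[samples_rejected:len(ordered) - samples_rejected]


# Made-up epochs-to-converge values, for illustration only.
epochs = [10, 12, 12, 13, 13, 14, 14, 15, 19]

full_stdev = statistics.stdev(epochs)
olympic_stdev = statistics.stdev(olympic_prune(epochs))

print(f"stdev of all runs:      {full_stdev:.3f}")
print(f"stdev after pruning:    {olympic_stdev:.3f}")
```

With this sample the pruned stdev is smaller than the full one, because the two runs that carry most of the spread are exactly the ones that get dropped.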

@pgmpablo157321
Contributor

[Image: Stdev_olympic_prunning]

@pgmpablo157321
Contributor

[Image: RCPs_pruned_varying_Stdev]

@hiwotadese
Contributor

Can we see how this affects the v4.0 scores?

@pgmpablo157321
Contributor

@ShriyaPalsamudram I am getting the same results with and without the Olympic stdev. That seems to be because the RCP Stdev is only used to compute min_epochs:

min_epochs = self._find_min_acceptable_mean(
    record_contents['RCP Mean'],
    record_contents['RCP Stdev'],
    len(epoch_list) - samples_rejected * 2)

Since we are no longer pruning based on min_epochs, the stdev doesn't seem to affect the results. min_epochs does feed into Max Speedup, but that value is only used later, in the condition that checks whether the RCP test passed:

record_contents['Max Speedup'] = record_contents['RCP Mean'] / min_epochs

if mean_subm_epochs >= (rcp_record["RCP Mean"] / rcp_record["Max Speedup"]):

@ShriyaPalsamudram What changes were expected when changing the Stdev?
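The dependency chain described above can be sketched as follows. This is a simplified illustration, not the checker's code: `max_speedup` and `passes_rcp` are hypothetical helper names, the numbers are made up, and `min_epochs` is passed in directly because the body of `_find_min_acceptable_mean` is not shown in this thread.

```python
def max_speedup(rcp_mean, min_epochs):
    """Max Speedup as defined in the snippet above: RCP Mean / min_epochs."""
    return rcp_mean / min_epochs


def passes_rcp(mean_subm_epochs, rcp_record):
    """Same comparison as the quoted condition (hypothetical wrapper)."""
    return mean_subm_epochs >= rcp_record["RCP Mean"] / rcp_record["Max Speedup"]


# Made-up values, for illustration only.
rcp_record = {"RCP Mean": 20.0}
min_epochs = 16.0  # stands in for the result of _find_min_acceptable_mean
rcp_record["Max Speedup"] = max_speedup(rcp_record["RCP Mean"], min_epochs)

print(rcp_record["Max Speedup"])     # RCP Mean / min_epochs
print(passes_rcp(17.0, rcp_record))  # submission mean above the threshold
print(passes_rcp(15.0, rcp_record))  # submission mean below the threshold
```

Note that RCP Mean / Max Speedup reduces algebraically to min_epochs, so under these assumptions a different stdev can only change the pass/fail outcome by shifting min_epochs.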

@ShriyaPalsamudram
Contributor Author

Since this impacts max speedup, can we compare max speedup before and after the change for all RCP points?

@pgmpablo157321
Contributor

[Image: RCPs_MaxSpeedUP]

@pgmpablo157321
Contributor

Additionally, here is an example of the max_speed_up values for the last training results:
HPE-Cray-XD670-Gen11-H100-SXM5-80GB_n1_mxnet_24.04
With olympic score:

[1.018748075108887, 1.0601459916687128]

Without olympic score:

[1.0262860923600325, 1.0699036853548622]
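For a quick read on how large the difference is, the two lists above can be compared element by element. This is only arithmetic on the posted numbers; the variable names are illustrative.

```python
# Values copied from the comment above.
with_olympic = [1.018748075108887, 1.0601459916687128]
without_olympic = [1.0262860923600325, 1.0699036853548622]

# Relative change in max speedup when the Olympic stdev is not used.
rel_change = [(b - a) / a for a, b in zip(with_olympic, without_olympic)]

for a, b, r in zip(with_olympic, without_olympic, rel_change):
    print(f"{a:.4f} -> {b:.4f} ({r:+.2%})")
```

For this one system, both max speedup values go up by less than 1% when the stdev is computed without Olympic scoring.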
