What is going on when further optimization appears to make the program slower? #211

Open
andres-erbsen opened this issue Sep 24, 2023 · 4 comments

@andres-erbsen

[screenshot: CryptOpt optimizer status log]

The cartoon picture of CryptOpt I've had is that it rejects mutations that make the program slower. I know this is idealistic; even the optimization trace in the paper dips for a bit. But how is it that continuing to try more mutations seems to have a real chance of making the cycle counts in the CryptOpt output go up? I understand that there's some chance a mutation will misleadingly appear attractive due to measurement noise, but looking at the above log I can tell at a glance that a previous program would likely perform better. What is going on here? Is the theory that something changes about the machine to make both versions run slower, or does the wrong one just get picked sometimes?
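
For concreteness, here is a minimal toy sketch of the kind of accept/reject loop I have in mind, and of how measurement noise could occasionally let a truly slower candidate through. This is my own illustration, not CryptOpt's actual code; `trueCycles`, `measuredCycles`, `mutate`, and the noise model are all made up.

```typescript
// Toy model only: not CryptOpt's code.

type Program = number[]; // stand-in for a straight-line instruction sequence

// "True" cost of a program, which the optimizer never sees directly.
function trueCycles(p: Program): number {
  return p.reduce((a, b) => a + b, 0);
}

// What the optimizer observes: the true cost plus measurement jitter.
function measuredCycles(p: Program): number {
  const noise = (Math.random() - 0.5) * 4; // roughly +/-2 cycles of jitter
  return trueCycles(p) + noise;
}

// Random local change: bump one "instruction cost" up or down.
function mutate(p: Program): Program {
  const q = [...p];
  const i = Math.floor(Math.random() * q.length);
  q[i] = Math.max(1, q[i] + (Math.random() < 0.5 ? -1 : 1));
  return q;
}

let current: Program = Array.from({ length: 20 }, () => 10);
for (let step = 0; step < 10000; step++) {
  const candidate = mutate(current);
  // Greedy acceptance on *measured* cycles: when the jitter happens to favour
  // the candidate, a truly slower program can still be kept.
  if (measuredCycles(candidate) <= measuredCycles(current)) {
    current = candidate;
  }
}
console.log("true cycles after search:", trueCycles(current));
```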

@dderjoel (Collaborator) commented Oct 4, 2023

Short answer: Needs to be investigated.
We've seen this behavior for some functions on some machines. It improves the performance up to some point and then cannot recover from there. I'm having a hard time reproducing it reliably, which makes it even harder to fix.
The workaround at the moment is to let CryptOpt emit asm files every time a new log line appears on the screen (i.e. every 10%) and then post-process them to find the good ones.
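
Roughly, that post-processing step could look like the sketch below. `benchmarkAsm` here is a stand-in for whatever cycle-measurement harness you run on an emitted .asm file; it is not part of CryptOpt.

```typescript
// Sketch only: benchmark every emitted checkpoint and keep the fastest one.
import { readdirSync } from "fs";
import { join } from "path";

function pickBestCheckpoint(
  dir: string,
  benchmarkAsm: (path: string) => number, // your own cycle-measurement harness
): { path: string; cycles: number } {
  let best = { path: "", cycles: Infinity };
  for (const file of readdirSync(dir).filter((f) => f.endsWith(".asm"))) {
    const cycles = benchmarkAsm(join(dir, file));
    if (cycles < best.cycles) {
      best = { path: join(dir, file), cycles };
    }
  }
  return best;
}

// Example: console.log(pickBestCheckpoint("./results", myHarness));
```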

I'm out of ideas as to why this is happening.
We see that the instruction count is not changing significantly. So maybe the first step to investigate is to show not only the l/g ratio but also the absolute measured cycles. Then we'd see whether the C code appears to run quicker or the assembly slower.
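
As an illustration only (the real status line lives in src/optimizer/optimizer.helper.ts, and its fields and the exact definition of the ratio differ), printing both side by side could look like this:

```typescript
// Illustration only: names and the ratio definition are assumptions,
// not the actual format from src/optimizer/optimizer.helper.ts.
interface Measurement {
  libCycles: number; // absolute cycles measured for the C/library baseline
  asmCycles: number; // absolute cycles measured for the current assembly
}

function statusLine(m: Measurement): string {
  const ratio = m.libCycles / m.asmCycles; // one possible reading of "l/g"
  return `l/g ${ratio.toFixed(4)}  lib ${m.libCycles}c  asm ${m.asmCycles}c`;
}

console.log(statusLine({ libCycles: 64, asmCycles: 71 }));
```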

@andres-erbsen (Author)

Thank you for being open about the situation. I think the current workaround is acceptable, and since I've shifted to running with a smaller --evals as you suggested, the checkpoint frequency is quite comfortable.

I was able to reproduce this behavior for the above picture on 2 out of 2 machines by using all cores and flipping through tmux tabs for a minute. I have the sense that it would reproduce just as easily if I tried again. If there's a specific change you'd like me to test with to gather data, I'd be more than happy to do so.

Looking at cycl in the image, I think the assembly is running slower after the threshold. Or is it not fair to conclude that so easily?

@dderjoel (Collaborator) commented Oct 5, 2023

Let's see. The status line is generated in src/optimizer/optimizer.helper.ts.
The green number after the white cycl is the cycle delta between the good assembly and the bad assembly.
The yellow and red ones are already the absolute cycle counts, so yes, you're right that the assembly is actually measured as slower, whereas the pink L is the absolute cycle count from the library, which seems pretty constant at 64.
Also the stddev seems very low (0...1), so not too much noise there either.

So based on this, the assembly actually seems to get worse.
Now, interestingly, it gets worse by almost 15 percentage points, without an increase in stack size (roughly constant at 6) or in instruction count (roughly constant at 132 instructions).

So, from that, I think we should have a look at the assembly from around this cutoff point to find out what actually changed.

@dderjoel (Collaborator) commented Oct 5, 2023

Oh, and I just double-checked: I was wrong about the writeout every 10%. It used to work that way, but now only the last asm is written, that is, the last one from each of the bets and one final one from the run. I believe I changed that because it produced just too many files (10 times as many). The writeout happens around line 300 in src/optimizer/optimizer.class.ts.
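
Just to illustrate the direction of the fix: a per-10% checkpoint writeout could look roughly like the sketch below. The names are made up and this is not the actual code around line 300.

```typescript
// Illustration only: not the actual writeout code in optimizer.class.ts.
import { writeFileSync } from "fs";

function maybeWriteCheckpoint(
  evaluation: number, // current evaluation number
  totalEvals: number, // value of --evals
  currentAsm: string, // assembly of the current best candidate
  outPrefix: string, // output path prefix for checkpoint files
): void {
  const step = Math.floor(totalEvals / 10); // one checkpoint per 10%
  if (step > 0 && evaluation % step === 0) {
    const pct = Math.round((evaluation / totalEvals) * 100);
    writeFileSync(`${outPrefix}_${pct}pct.asm`, currentAsm);
  }
}
```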

I'm in the process of fixing that, but during the tests I'm now able to reproduce the error (50) that has been thrown in the GH CI lately. I'll try to fix that too, and push some changes to dev.
