-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Heuristically choose and preserve rdx across mulx #210
base: dev
Are you sure you want to change the base?
Heuristically choose and preserve rdx across mulx #210
Conversation
@@ -51,7 +51,7 @@ export function sanityCheckAllocations(c: CryptOpt.DynArgument): void { | |||
} | |||
byReg[r64] = varname; | |||
} | |||
if (matchArg(varname)) { | |||
if (matchArg(varname) && isMem(store)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why this is necessary. Are we checking if a value is in an register and a memory location at the same time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand why the entire conditional is necessary, but without the &&
the assertion failed with all my other changes. I believe the old conditional asks whether the variable is an input-array cell, and the new conditional asks whether it is stored in memory. The difference would be an input variable that is cached in a register, for example rdx
.
@@ -463,6 +463,9 @@ export class RegisterAllocator { | |||
const caf = (flagToCheck: AllocationFlags): boolean => | |||
((allocationReq.allocationFlags ?? AllocationFlags.NONE) & flagToCheck) === flagToCheck; | |||
const inAllocationsTemp = allocationReq.in.map((readVariable) => { | |||
const currentLocation = this._allocations[readVariable]; | |||
if (currentLocation && isRegister(currentLocation.store)) return currentLocation.store; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this condition up here?
I see that for the most part, the remaining parts are checking if it is not a register and then do something. But I am a bit unsure if there is really no side-effects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If arg1[3]
is also stored in a register, reading it from the register is more flexible than loading it from memory again. Without this new short-circuit, the next conditional would take the memory path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So instead of
mov rdx, [rsi]
mulx r8 , r9, [rsi]
it will now emit
mov rdx, [rsi]
mulx r8 , r9, rdx
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I'd like to write a test case for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No wait. This would be a square operation, which is handled differently anyway.
This emits
mov rax, [rsi]
...
mulx r8 , r9, rax
instead of
mov rax, [rsi]
...
mulx r8 , r9, [rsi]
@@ -593,7 +595,7 @@ export class RegisterAllocator { | |||
} | |||
} | |||
|
|||
if (caf(AllocationFlags.ONE_IN_MUST_BE_IN_RDX) && !inAllocations.includes(Register.rdx)) { | |||
if (caf(AllocationFlags.ONE_IN_MUST_BE_IN_RDX) && !allocationReq.in.some((i) => this._allocations[i].store === Register.rdx)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we checkthis._allocations[i]
instead of inAllocations
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not remember for sure, but I think it was a difference between [rax+4]
and arg1[1]
. You're right to question this code as I was working from examples, not deep understanding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thing is, I could see this working but also failing. inAllocations
is being used and modified in this function, whereas _allocations
is modified by the getW()
function if I remember correctly. I'd want to not touch anything if we don't know if its necessary. This whole RegisterAllocator needs a rewrite, but I've never had the time.
is that little orange block other the brown block (y=50, x=22000) the reason why we think this heuristic is better? |
I am not sure we want the heuristic. What do you think would be a good test? If you'd be more comfortable landing this change minus the heuristic, I'd be enthusiastic for that as well. |
Hm. So lets summarise. The heuristic is only impacting Additionally, the effect of preferring loading |
Closes #199
Experiment with p256_mul, this patch versus an alternative that always uses
chooseArg
.