Heuristically choose and preserve rdx across mulx #210

Open: andres-erbsen wants to merge 4 commits into base: dev

andres-erbsen commented Sep 22, 2023

Closes #199

Experiment with p256_mul: this patch versus an alternative that always uses chooseArg.

[Plot: p256_mul]

andres-erbsen marked this pull request as ready for review on September 23, 2023, 17:56
@@ -51,7 +51,7 @@ export function sanityCheckAllocations(c: CryptOpt.DynArgument): void {
}
byReg[r64] = varname;
}
-if (matchArg(varname)) {
+if (matchArg(varname) && isMem(store)) {
Collaborator

I do not understand why this is necessary. Are we checking whether a value is in a register and a memory location at the same time?

Author

I do not understand why the entire conditional is necessary, but without the && the assertion failed with all my other changes. I believe the old conditional asks whether the variable is an input-array cell, and the new conditional asks whether it is stored in memory. The difference would be an input variable that is cached in a register, for example rdx.
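
To make the distinction between the two conditionals concrete, here is a minimal sketch, not CryptOpt's actual code. `Allocation`, `looksLikeInputCell`, and `isMemStore` are hypothetical stand-ins for the real types and for `matchArg` / `isMem`, whose definitions this thread does not show. It models an input cell that is cached in rdx: the name-based test still matches, while the combined test does not.

```typescript
// Hypothetical model: a variable name mapped to where its value currently lives.
type Allocation = { store: string }; // e.g. "rdx" or "[rsi + 0x18]"

// Assumed stand-in for matchArg: does the *name* denote an input-array cell such as arg1[3]?
const looksLikeInputCell = (varname: string): boolean => /^arg\d+\[\d+\]$/.test(varname);

// Assumed stand-in for isMem: is the *location* a memory operand?
const isMemStore = (store: string): boolean => store.startsWith("[") && store.endsWith("]");

const allocations: Record<string, Allocation> = {
  "arg1[0]": { store: "[rsi + 0x0]" }, // input cell that lives only in memory
  "arg1[3]": { store: "rdx" }, // input cell that is cached in a register
  x42: { store: "r8" }, // ordinary temporary
};

// Old-style condition: looks at the name only.
const oldCond = (v: string): boolean => looksLikeInputCell(v);

// New-style condition: the name must match and the value must live in memory.
const newCond = (v: string): boolean =>
  looksLikeInputCell(v) && isMemStore(allocations[v].store);
```

Under these assumptions, `arg1[3]` is the only entry on which the two conditions disagree, which matches the "input variable cached in a register" case described above.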

@@ -463,6 +463,9 @@ export class RegisterAllocator {
const caf = (flagToCheck: AllocationFlags): boolean =>
((allocationReq.allocationFlags ?? AllocationFlags.NONE) & flagToCheck) === flagToCheck;
const inAllocationsTemp = allocationReq.in.map((readVariable) => {
const currentLocation = this._allocations[readVariable];
if (currentLocation && isRegister(currentLocation.store)) return currentLocation.store;
Collaborator

Why do we need this condition up here?
I see that, for the most part, the remaining code checks whether the location is not a register and then does something. But I am a bit unsure whether there are really no side effects.

Author

If arg1[3] is also stored in a register, reading it from the register is more flexible than loading it from memory again. Without this new short-circuit, the next conditional would take the memory path.

Collaborator

So instead of

mov rdx, [rsi]
mulx r8 , r9, [rsi]

it will now emit

mov rdx, [rsi]
mulx r8 , r9, rdx

?

Collaborator

Then I'd like to write a test case for that.

Collaborator

No wait. This would be a square operation, which is handled differently anyway.
This emits

mov rax, [rsi]
...
mulx r8 , r9, rax

instead of

mov rax, [rsi]
...
mulx r8 , r9, [rsi]
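
The short-circuit discussed in this thread can be sketched roughly as follows. This is an illustration under stated assumptions, not the real `RegisterAllocator`: `Loc`, `readOperand`, `cached`, and `home` are hypothetical names. The idea is only that a value already held in a register is read from there, and the memory operand is used as a fallback.

```typescript
// Hypothetical model of where a value may live; not CryptOpt's real types.
type Loc = { kind: "reg"; name: string } | { kind: "mem"; addr: string };

// Returns the operand text to read `variable` from.
// `cached` records current locations of values (possibly registers);
// `home` is the memory operand an input cell can always be reloaded from.
function readOperand(
  variable: string,
  cached: Map<string, Loc>,
  home: Map<string, string>,
): string {
  const current = cached.get(variable);
  // Short-circuit: if a register already holds the value, reuse it.
  if (current !== undefined && current.kind === "reg") return current.name;
  const addr = home.get(variable);
  if (addr === undefined) throw new Error(`no known location for ${variable}`);
  return addr;
}

const home = new Map([["arg1[0]", "[rsi]"]]);
const cached = new Map<string, Loc>([["arg1[0]", { kind: "reg", name: "rax" }]]);

// With the value cached in rax, the register is used as the mulx source.
const withCache = `mulx r8, r9, ${readOperand("arg1[0]", cached, home)}`;
// Without a cached copy, the memory operand is used instead.
const withoutCache = `mulx r8, r9, ${readOperand("arg1[0]", new Map(), home)}`;
```

In this toy model `withCache` corresponds to the `mulx r8, r9, rax` form and `withoutCache` to the `mulx r8, r9, [rsi]` form shown above.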

@@ -593,7 +595,7 @@ export class RegisterAllocator {
}
}

-if (caf(AllocationFlags.ONE_IN_MUST_BE_IN_RDX) && !inAllocations.includes(Register.rdx)) {
+if (caf(AllocationFlags.ONE_IN_MUST_BE_IN_RDX) && !allocationReq.in.some((i) => this._allocations[i].store === Register.rdx)) {
Collaborator

Why do we check this._allocations[i] instead of inAllocations?

Author

I do not remember for sure, but I think it was a difference between [rax+4] and arg1[1]. You're right to question this code as I was working from examples, not deep understanding.

Collaborator

Thing is, I could see this working but also failing. inAllocations is used and modified in this function, whereas _allocations is modified by the getW() function, if I remember correctly. I'd rather not touch anything if we don't know whether it's necessary. This whole RegisterAllocator needs a rewrite, but I've never had the time.

src/instructionGeneration/multiplication.ts (conversation resolved)
Collaborator

dderjoel commented Oct 4, 2023

Is that little orange block over the brown block (y=50, x=22000) the reason why we think this heuristic is better?
Did we try other functions or other machines?

@andres-erbsen
Author

I am not sure we want the heuristic. What do you think would be a good test? If you'd be more comfortable landing this change minus the heuristic, I'd be enthusiastic for that as well.

Collaborator

dderjoel commented Oct 5, 2023

Hm. So let's summarise. The heuristic only affects mulx instructions. When emitting code for a 64x64 multiplication, it now checks whether the next mulx instruction shares a factor with the current one. If it does, it prefers to load that common factor into rdx rather than leaving the decision to the optimiser.

Additionally, the change prefers reading argN[n] from a register over loading it from memory again.
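
The lookahead described in this summary could be sketched like this. It is a simplified illustration, not the code in this patch: `Mul` and `preferredRdxFactor` are hypothetical names, and a multiplication is modelled as nothing more than a pair of factor names.

```typescript
// Hypothetical representation of a 64x64 multiplication: just its two factor names.
type Mul = { factors: [string, string] };

// Returns the factor of `current` that should be placed in rdx because the
// next multiplication also uses it, or undefined when there is no next
// multiplication or no common factor (the choice is then left to the optimiser).
function preferredRdxFactor(current: Mul, next: Mul | undefined): string | undefined {
  if (next === undefined) return undefined;
  return current.factors.find((f) => next.factors.includes(f));
}

const shared = preferredRdxFactor(
  { factors: ["arg1[0]", "arg2[1]"] },
  { factors: ["arg2[2]", "arg1[0]"] },
);
const unshared = preferredRdxFactor(
  { factors: ["arg1[0]", "arg2[1]"] },
  { factors: ["arg1[1]", "arg2[2]"] },
);
```

Here `shared` is `"arg1[0]"`, so keeping that value in rdx lets the following mulx reuse it without another load, while `unshared` is `undefined` and no preference is expressed.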
