Heuristically choose and preserve rdx across mulx #210

andres-erbsen · 2023-09-22T19:40:38Z

Closes #199

Experiment with p256_mul: this patch versus an alternative that always uses chooseArg.

dderjoel · 2023-10-04T00:51:41Z

src/assembler/assembler.helper.ts

@@ -51,7 +51,7 @@ export function sanityCheckAllocations(c: CryptOpt.DynArgument): void {
        }
        byReg[r64] = varname;
      }
-      if (matchArg(varname)) {
+      if (matchArg(varname) && isMem(store)) {


I do not understand why this is necessary. Are we checking whether a value is in a register and in a memory location at the same time?

I do not understand why the entire conditional is necessary, but without the added isMem(store) check the assertion failed in combination with my other changes. I believe the old conditional asks whether the variable is an input-array cell, while the new one additionally requires that the cell is currently stored in memory. The case where they differ is an input variable that is cached in a register, for example rdx.
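A minimal sketch of how the two conditionals differ, assuming input-array cells are named like arg1[3] and memory stores are strings starting with "[". The bodies of matchArg and isMem here are hypothetical stand-ins; the real CryptOpt helpers may have different signatures and return types.

```typescript
// Hypothetical stand-ins for CryptOpt's matchArg and isMem (assumed behaviour only).
const matchArg = (varname: string): boolean => /^arg\d+\[\d+\]$/.test(varname);
const isMem = (store: string): boolean => store.startsWith("[");

// Old condition: true for any input-array cell, wherever it currently lives.
function oldCondition(varname: string, _store: string): boolean {
  return matchArg(varname);
}

// New condition: true only for input-array cells that currently live in memory.
function newCondition(varname: string, store: string): boolean {
  return matchArg(varname) && isMem(store);
}

// An input cell cached in rdx is exactly where the two conditions disagree.
console.log(oldCondition("arg1[3]", "rdx")); // true
console.log(newCondition("arg1[3]", "rdx")); // false
console.log(newCondition("arg1[3]", "[rsi + 0x18]")); // true
```

Under these assumptions, a cell that was loaded into rdx no longer enters the sanity-check branch meant for memory-resident arguments.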

dderjoel · 2023-10-04T01:23:31Z

src/registerAllocator/RegisterAllocator.class.ts

@@ -463,6 +463,9 @@ export class RegisterAllocator {
    const caf = (flagToCheck: AllocationFlags): boolean =>
      ((allocationReq.allocationFlags ?? AllocationFlags.NONE) & flagToCheck) === flagToCheck;
    const inAllocationsTemp = allocationReq.in.map((readVariable) => {
+      const currentLocation = this._allocations[readVariable];
+      if (currentLocation && isRegister(currentLocation.store)) return currentLocation.store;


Why do we need this condition up here?
I see that most of the remaining branches check whether the value is not in a register and then act on that, but I am not sure this really has no side effects.

If arg1[3] is also stored in a register, reading it from the register is more flexible than loading it from memory again. Without this new short-circuit, the next conditional would take the memory path.

So instead of

mov rdx, [rsi]
mulx r8, r9, [rsi]

it will now emit

mov rdx, [rsi]
mulx r8, r9, rdx

?

Then I'd like to write a test case for that.

No wait. This would be a square operation, which is handled differently anyway.
This emits

mov rax, [rsi]
...
mulx r8, r9, rax

instead of

mov rax, [rsi]
...
mulx r8, r9, [rsi]
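The register-first short-circuit discussed in this thread can be modeled with a simplified, hypothetical allocation table. This is not CryptOpt's API: it assumes the base address of arg1 is in rsi, that cells are 8 bytes wide, and that a store is either a register name or a memory-operand string.

```typescript
// Simplified model (hypothetical): a store is a register name or a memory operand.
type Store = string;
interface Allocation { store: Store }

const REGISTERS = new Set(["rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp",
  "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]);
const isRegister = (s: Store): boolean => REGISTERS.has(s);

// Assumption: arg1's base address is in rsi and each cell is 8 bytes.
function memoryOperand(variable: string): Store {
  const m = /^arg1\[(\d+)\]$/.exec(variable);
  if (!m) throw new Error(`no memory home for ${variable}`);
  const offset = 8 * Number(m[1]);
  return offset === 0 ? "[rsi]" : `[rsi + ${offset}]`;
}

// Where to read `variable` from when it is used as a mulx source operand.
function readLocation(
  variable: string,
  allocations: Record<string, Allocation | undefined>,
  shortCircuit: boolean,
): Store {
  const current = allocations[variable];
  // The short-circuit: if the value is already cached in a register, read it from there.
  if (shortCircuit && current && isRegister(current.store)) return current.store;
  // Otherwise take the memory path.
  return memoryOperand(variable);
}

// arg1[0] was loaded into rax by an earlier instruction.
const allocations: Record<string, Allocation | undefined> = { "arg1[0]": { store: "rax" } };
console.log(`mulx r8, r9, ${readLocation("arg1[0]", allocations, false)}`); // mulx r8, r9, [rsi]
console.log(`mulx r8, r9, ${readLocation("arg1[0]", allocations, true)}`); // mulx r8, r9, rax
```

In this model, a value with no register copy (for example arg1[2]) still falls through to the memory operand, so the short-circuit only changes the outcome when a register copy already exists.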

dderjoel · 2023-10-04T01:25:15Z

src/registerAllocator/RegisterAllocator.class.ts

@@ -593,7 +595,7 @@ export class RegisterAllocator {
      }
    }

-    if (caf(AllocationFlags.ONE_IN_MUST_BE_IN_RDX) && !inAllocations.includes(Register.rdx)) {
+    if (caf(AllocationFlags.ONE_IN_MUST_BE_IN_RDX) && !allocationReq.in.some((i) => this._allocations[i].store === Register.rdx)) {


Why do we check this._allocations[i] instead of inAllocations?

I do not remember for sure, but I think it came down to the difference between [rax+4] and arg1[1]. You're right to question this code; I was working from examples, not from a deep understanding.

Thing is, I could see this working, but I could also see it failing. inAllocations is used and modified within this function, whereas _allocations is modified by getW(), if I remember correctly. I'd rather not touch anything unless we know it is necessary. This whole RegisterAllocator needs a rewrite, but I've never had the time.
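One way the old and new rdx checks can disagree, in line with the [rax+4] versus arg1[1] recollection in this thread, is sketched below under a simplified, hypothetical model. It assumes the per-request list can hold a memory operand for an input that the allocator table records as living in rdx. The sketch also uses optional chaining, on the assumption (suggested by the `currentLocation &&` guard in the earlier hunk) that some inputs may have no entry in _allocations.

```typescript
// Simplified model (hypothetical): a store is a register name or a memory operand.
type Store = string;
interface Allocation { store: Store }

// Old check: scans the resolved input locations built inside the allocation function.
function rdxInResolvedInputs(inAllocations: Store[]): boolean {
  return inAllocations.includes("rdx");
}

// New check: asks the allocator's current table where each input variable lives.
// `?.` avoids a TypeError for inputs that have no entry in the table.
function rdxHoldsAnInput(inputs: string[], allocations: Record<string, Allocation | undefined>): boolean {
  return inputs.some((v) => allocations[v]?.store === "rdx");
}

// Illustrative case: arg1[1] is cached in rdx, but the resolved list holds its memory operand.
const inputs = ["arg1[1]", "x5"];
const allocations: Record<string, Allocation | undefined> = {
  "arg1[1]": { store: "rdx" },
  x5: { store: "r8" },
};
const inAllocations: Store[] = ["[rax + 0x8]", "r8"];

console.log(rdxInResolvedInputs(inAllocations)); // false: old check concludes no input is in rdx
console.log(rdxHoldsAnInput(inputs, allocations)); // true: rdx already holds an input
```

If this model matches what happens in practice, the old check would trigger an unnecessary move into rdx, which is exactly what the "preserve rdx" part of this change is trying to avoid.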

src/registerAllocator/RegisterAllocator.class.ts

src/instructionGeneration/multiplication.ts

dderjoel · 2023-10-04T01:33:27Z

Is that little orange block over the brown block (y=50, x=22000) the reason why we think this heuristic is better?
Did we try other functions or other machines?

andres-erbsen · 2023-10-04T18:25:33Z

I am not sure we want the heuristic. What do you think would be a good test? If you'd be more comfortable landing this change without the heuristic, I'd be happy with that as well.

dderjoel · 2023-10-05T00:56:00Z

Hm. So let's summarise. The heuristic only affects mulx instructions. When emitting code for a 64x64-bit multiplication, it now checks whether the next mulx instruction shares a common factor with the current one. If it does, it prefers to load that factor into rdx rather than leaving the decision to the optimiser.

Additionally, there is the effect of preferring to read argN[n] from registers over loading it from memory.
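The rdx-selection heuristic summarised here can be sketched as follows. This is a minimal illustration, not the code in multiplication.ts: the Mul type, chooseRdxFactor, and countRdxLoads are hypothetical names, and the fallback of loading the first factor stands in for the optimiser's choice purely so the two strategies can be compared.

```typescript
// Hypothetical representation of one 64x64-bit multiplication: its two factor variables.
interface Mul { a: string; b: string }

// Heuristic: if the next mulx shares a factor with this one, put that factor in rdx
// so it can stay there. Returns undefined when there is no shared factor,
// leaving the choice to the optimiser.
function chooseRdxFactor(current: Mul, next: Mul | undefined): string | undefined {
  if (!next) return undefined;
  for (const f of [current.a, current.b]) {
    if (f === next.a || f === next.b) return f;
  }
  return undefined;
}

// Counts how often rdx must be (re)loaded over a sequence of mulx instructions.
// Assumption for the comparison: without a preference, the first factor is loaded.
function countRdxLoads(muls: Mul[], useHeuristic: boolean): number {
  let rdx: string | undefined = undefined;
  let loads = 0;
  muls.forEach((m, i) => {
    if (rdx === m.a || rdx === m.b) return; // a factor is already in rdx: reuse it
    const preferred = useHeuristic ? chooseRdxFactor(m, muls[i + 1]) : undefined;
    rdx = preferred ?? m.a;
    loads++;
  });
  return loads;
}

const muls: Mul[] = [
  { a: "y0", b: "x0" },
  { a: "x0", b: "y1" },
  { a: "x0", b: "y2" },
];
console.log(countRdxLoads(muls, false)); // 2
console.log(countRdxLoads(muls, true)); // 1
```

In this toy sequence, looking one mulx ahead keeps x0 in rdx for all three multiplications. Whether that pays off on real p256_mul code is the open question raised in the review.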

dderjoel and others added 4 commits September 14, 2023 09:15

sketching slightly more sophisticated argument loading

d6f0538

Merge branch 'dev' into feature/reg199

b92a58c

Merge branch 'dev' into feature/reg199

b9608a0

heuristically choose and preserve rdx across mulx

ca9a928

andres-erbsen marked this pull request as ready for review September 23, 2023 17:56

dderjoel reviewed Oct 4, 2023
