Add N64 recompiler block hashes & inline 64-bit ops #1640

kannoneer · 2024-09-08T14:54:12Z

This reworks how the recompiler looks up JIT blocks so that we can bake more runtime information into them. Runtime checks can then be avoided in the generated code, which makes inlining the implementations more attractive because we no longer need to emit calls to, for example, kernel mode checks for every 64-bit op. This is expected to eventually improve the performance of recompiled code.

I expect these changes to have a small negative performance impact but haven't measured it.

Commit messages:

We'd like the recompiler to take the execution context, such as kernel
mode, into account when compiling blocks. That's why it's necessary to
identify blocks not just by address but by all the information used at
compile time. This is done by computing a 32-bit key and using that as
a block's identifier instead of the last six physical address bits, as
was done before.

Since we now have 32-bit instead of 6-bit keys, the block() function
hashes the keys before mapping them to one of the 64 pool rows. The hash
function was chosen arbitrarily to be better than a simple multiplicative
hash and is likely not the best choice for this exact task.

Pass JITContext down to leaf emit functions.

Emit inline implementations of basic 64-bit operations.

Use block compile-time information to elide kernel mode checks of
the now inlined operations.
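
The keyed lookup described in the first commit message can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: `makeBlockKey`, `mixKey`, `poolRow`, `Pool::lookup` and `Pool::store` are hypothetical names, the layout of the state bits is assumed, and murmur3's 32-bit finalizer stands in for whatever hash the PR actually uses.

```cpp
#include <cstdint>

using u32 = std::uint32_t;

struct Block {};  // stand-in for a compiled block

// Low six bits: instruction slot taken from the address.
// Upper bits: compile-time execution state (hypothetical layout).
inline u32 makeBlockKey(u32 address, u32 jitBits) {
  return ((address >> 2) & 0x3f) | (jitBits & ~u32(0x3f));
}

// Assumption: murmur3's fmix32 finalizer, used here only as a placeholder hash.
inline u32 mixKey(u32 key) {
  key ^= key >> 16;
  key *= 0x85ebca6bu;
  key ^= key >> 13;
  key *= 0xc2b2ae35u;
  key ^= key >> 16;
  return key;
}

// Map a 32-bit key to one of the 64 pool rows.
inline u32 poolRow(u32 key) { return mixKey(key) & 0x3f; }

struct Row {
  Block* block = nullptr;
  u32 tag = 0;  // full key, compared on lookup to reject row collisions
};

struct Pool {
  Row rows[1 << 6];

  // Returns the cached block, or nullptr if the row is empty or holds another key.
  Block* lookup(u32 key) const {
    const Row& row = rows[poolRow(key)];
    return (row.block != nullptr && row.tag == key) ? row.block : nullptr;
  }

  // Overwrites whatever previously occupied the row.
  void store(u32 key, Block* block) {
    Row& row = rows[poolRow(key)];
    row.block = block;
    row.tag = key;
  }
};
```

Because the tag holds the full key, two execution contexts that land on the same row cannot be handed each other's block; the cost is that one evicts the other.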

rasky · 2024-09-09T10:43:10Z

ares/n64/cpu/cpu.cpp

+        .cop1Enabled = scc.status.enable.coprocessor1 > 0,
+        .floatingPointMode = scc.status.floatingPointMode > 0,
+        .is64bit = context.bits == 64,
+        });


I'm a bit concerned that we compose the context structure for every block we run. I believe it might be better if we do this on changes instead. I understand it's harder to get right and can introduce more bugs, but I believe it's going to be faster.

So let's say we have a function that recalculates the current context and its hash key and stores them somewhere. At runtime, we just use the precalculated context and the precalculated hash key.

Then you need to call the function that recalculates the context in any code path that can change one of the variables that affect it. For instance, mtc0 of the status register is a prime suspect, and there will be others of course, but maybe not dozens of them.
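
A minimal sketch of this recompute-on-change idea, under stated assumptions: the `Status` fields, their bit positions, and the names `recomputeJitBits`, `writeStatus` and `blockKey` are all hypothetical, and `recomputeCount` exists only to make the behaviour observable.

```cpp
#include <cstdint>

using u32 = std::uint32_t;

// Hypothetical subset of the state that affects code generation.
struct Status {
  bool kernelMode = true;
  bool cop1Enabled = false;
  bool floatingPointMode = false;
  bool is64bit = false;
};

struct Cpu {
  Status status;
  u32 jitBits = 0;         // cached key bits, reused by every block lookup
  u32 recomputeCount = 0;  // instrumentation for this sketch only

  Cpu() { recomputeJitBits(); }

  // Must be called from every code path that can change `status`.
  void recomputeJitBits() {
    jitBits = (u32(status.kernelMode) << 6)
            | (u32(status.cop1Enabled) << 7)
            | (u32(status.floatingPointMode) << 8)
            | (u32(status.is64bit) << 9);
    ++recomputeCount;
  }

  // Stands in for an MTC0 write to the status register.
  void writeStatus(const Status& value) {
    status = value;
    recomputeJitBits();
  }

  // Hot path: no context struct is composed here, only the cached bits are used.
  u32 blockKey(u32 address) const {
    return ((address >> 2) & 0x3f) | jitBits;
  }
};
```

The hot path then costs one OR per lookup, while correctness depends on no state-changing path forgetting to call the recompute function.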

kannoneer · 2024-09-10

Thank you for the quick review. I moved JITContext inside the existing Context struct; it is now Context::JIT, and its contents and bit vector representation are computed only when Context::setMode() is called. This is more often than strictly needed (for example, exceptions change the mcc vector state and call setMode()), but it should be better than before.

I added separate update and toBits member functions to Context::JIT for later debugging; they can be used to check for staleness bugs in debug builds.
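
One way such a staleness check might look is sketched below. The `CpuState` struct, the two-field `JIT`, the bit positions, and the helper `jitBitsAreFresh` are assumptions made for illustration; the real `update()` in the PR takes different arguments.

```cpp
#include <cstdint>

using u32 = std::uint32_t;

// Hypothetical live CPU state that the JIT context is derived from.
struct CpuState {
  bool kernelMode;
  bool is64bit;
};

struct JIT {
  bool kernelMode = false;
  bool is64bit = false;

  // Copy the relevant fields out of the live state.
  void update(const CpuState& state) {
    kernelMode = state.kernelMode;
    is64bit = state.is64bit;
  }

  // Pack the fields into key bits (assumed layout, above the six address bits).
  u32 toBits() const {
    return (u32(kernelMode) << 6) | (u32(is64bit) << 7);
  }
};

// Rebuild the context from the live state and compare it with the cached bits.
inline bool jitBitsAreFresh(const CpuState& state, u32 cachedBits) {
  JIT fresh;
  fresh.update(state);
  return fresh.toBits() == cachedBits;
}
```

In a debug build, something like `assert(jitBitsAreFresh(state, jitBits));` just before each block lookup would catch a missed recompute at the point where it first matters.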

We'd like the recompiler to take the execution context, such as kernel mode, into account when compiling blocks. That's why it's necessary to identify blocks not just by address but by all the information used at compile time. This is done by computing a 32-bit key and using that as a block's identifier instead of the last six physical address bits, as was done before. The execution state and its representation as a bit vector are recomputed only when needed, in this case each time Context::setMode() is called, which happens on powerup, in both MTC0 and MFC0 instructions, and on exceptions.

Since we now have 32-bit instead of 6-bit keys, the block() function hashes the keys before mapping them to one of the 64 pool rows. The hash function was chosen arbitrarily to be better than a simple multiplicative hash and is likely not the best choice for this exact task.

* Pass JITContext down to leaf emit functions.
* Emit inline implementations of basic 64-bit operations.
* Use block compile-time information to elide kernel mode checks of the now inlined operations.

kannoneer · 2024-09-10T19:08:57Z

The GDB::server.hasBreakpoints() check is a bit problematic: right now it is updated only when Context::setMode() is called, even though breakpoints might be added or removed at other times as well. Perhaps it could be polled for every block? I considered it part of the execution context, but before this change it wasn't taken into account at all when looking up blocks. It seems to me that breakpoints only took effect after all Pool instances were flushed.

invertego · 2024-10-10T04:22:49Z

Sorry for the delay. I will try to review this in the next few days.

invertego

Looks good aside from the issues mentioned. How much more are you planning to do with this before taking it out of draft status?

invertego · 2024-10-18T07:17:31Z

ares/n64/cpu/recompiler.cpp

+  return (address >> 2 & 0x3f) | (jitBits & ~0x3f);
+}
+
+auto CPU::Recompiler::computePoolRow(u32 key) -> u32 {


The n64 core already pulls in xxhash.h, so XXH32_avalanche might work here as well.
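
For reference, the avalanche step is a short xor-shift/multiply finalizer. The sketch below is a standalone copy written to match upstream xxHash as far as I know it; `XXH32_avalanche` is an internal helper of xxhash.h, so the constants should be checked against the bundled header before relying on them. `xxh32Avalanche` and `poolRowFromKey` are hypothetical names.

```cpp
#include <cstdint>

using u32 = std::uint32_t;

// Standalone copy of the XXH32 finalization mix (assumed to match xxhash.h).
inline u32 xxh32Avalanche(u32 hash) {
  hash ^= hash >> 15;
  hash *= 0x85EBCA77u;  // XXH_PRIME32_2
  hash ^= hash >> 13;
  hash *= 0xC2B2AE3Du;  // XXH_PRIME32_3
  hash ^= hash >> 16;
  return hash;
}

// Hypothetical replacement for computePoolRow: mix the key, keep six bits.
inline u32 poolRowFromKey(u32 key) {
  return xxh32Avalanche(key) & 0x3f;
}
```

Every step is invertible, so the mix is a bijection on 32-bit values: distinct keys only collide because of the final six-bit mask, never inside the mix itself.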

invertego · 2024-10-18T07:23:27Z

ares/n64/cpu/context.cpp

@@ -18,6 +18,9 @@ auto CPU::Context::setMode() -> void {
    break;
  }

+  jit.update(*this, self);
+  jitBits = jit.toBits();


Should update() be responsible for recalculating jitBits?

invertego · 2024-10-18T07:28:15Z

ares/n64/cpu/context.cpp

@@ -18,6 +18,9 @@ auto CPU::Context::setMode() -> void {
    break;
  }

+  jit.update(*this, self);


We also need to make sure this is updated after a save state load.

Also when the breakpoint count changes.

invertego · 2024-10-18T07:37:35Z

ares/n64/cpu/cpu.hpp

@@ -106,6 +118,8 @@ struct CPU : Thread {
    u32  mode;
    u32  bits;
    u32  segment[8];  //512_MiB chunks
+    u32  jitBits;


The rest of the recompiler state is in the recompiler object. Is there a reason you want to keep this state separate?

invertego · 2024-10-18T07:40:35Z

ares/n64/cpu/recompiler.cpp

-    call(&CPU::DSUBU);
-    emitZeroClear(Rdn);
+    if (!checkDualAllowed(ctx)) return 1;
+    sub64(reg(0), mem(Rs), mem(Rt), set_o);


set_o not needed

invertego · 2024-10-18T07:50:31Z

ares/n64/cpu/recompiler.cpp

+    mov32_f(temp, flag_o);
+    auto didntOverflow = cmp32_jump(temp, imm(0), flag_eq);


Suggested change

-    mov32_f(temp, flag_o);
-    auto didntOverflow = cmp32_jump(temp, imm(0), flag_eq);
+    auto didntOverflow = jump(flag_no);

invertego · 2024-10-18T08:02:04Z

ares/n64/cpu/recompiler.cpp

+    // If overflow flag set: throw an exception, skip the instruction via the 'end' label.
+    mov32_f(temp, flag_o);
+    auto didntOverflow = cmp32_jump(temp, imm(0), flag_eq);
+    call(&CPU::Exception::arithmeticOverflow, &cpu.exception);


Emitting calls this way causes all the parameters to be emitted as immediates. In general it's cheaper (in terms of code footprint) to calculate addresses that are passed as arguments (see instances of lea in this file). Although, for cold code paths like this, it would be even better to call a trampoline in CPU that takes no arguments (aside from the implicit this pointer).
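
The trampoline idea can be illustrated outside the emitter. In this sketch `Cpu`, `Exception`, `raiseArithmeticOverflow` and `dadd` are hypothetical stand-ins rather than the ares types, and `dadd` is a plain C++ model of a trapping 64-bit add, not generated code.

```cpp
#include <cstdint>

using s64 = std::int64_t;
using u64 = std::uint64_t;

struct Exception {
  int overflowCount = 0;
  void arithmeticOverflow() { ++overflowCount; }
};

struct Cpu {
  Exception exception;

  // Hypothetical trampoline: the call site only has to pass `this`,
  // so no extra argument needs to be materialized in the generated code.
  void raiseArithmeticOverflow() { exception.arithmeticOverflow(); }

  // Model of a trapping 64-bit add: on signed overflow the exception is
  // raised and the destination is left untouched.
  bool dadd(s64 a, s64 b, s64& destination) {
    const u64 ua = u64(a), ub = u64(b), sum = ua + ub;
    // Overflow iff both operands differ in sign from the wrapped sum.
    const bool overflow = (((ua ^ sum) & (ub ^ sum)) >> 63) != 0;
    if (overflow) {
      raiseArithmeticOverflow();  // cold path
      return false;
    }
    destination = a + b;  // cannot overflow here
    return true;
  }
};
```

The hot path stays a single add and a branch; everything related to the exception sits behind one argument-free call.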

invertego · 2024-10-18T08:04:09Z

ares/n64/cpu/recompiler.cpp

-    emitZeroClear(Rtn);
+    if (!checkDualAllowed(ctx)) return 1;
+    add64(reg(0), mem(Rs), imm(i16), set_o);
+    if(Rtn > 0) mov64(mem(Rt), reg(0));


The add64 can also be skipped if Rtn is zero.

invertego · 2024-10-18T08:06:19Z

ares/n64/cpu/recompiler.cpp

-    emitZeroClear(Rdn);
+    if (!checkDualAllowed(ctx)) return 1;
+    sub64(reg(0), mem(Rs), mem(Rt), set_o);
+    if(Rdn > 0) mov64(mem(Rd), reg(0));


The sub64 can be skipped if Rdn is zero.
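
A mock of that emit path, assuming the instruction does not trap on overflow (as with DSUBU and DADDIU), so that a write to r0 has no observable effect. `MockEmitter` and `emitDSUBU` are hypothetical and only record which operations would be emitted; `dualAllowed` stands in for the `checkDualAllowed` early-out in the diff.

```cpp
#include <string>
#include <vector>

// Records emitted operations instead of generating machine code.
struct MockEmitter {
  std::vector<std::string> ops;
  void sub64() { ops.push_back("sub64"); }
  void mov64() { ops.push_back("mov64"); }
};

inline void emitDSUBU(MockEmitter& emitter, unsigned rd, bool dualAllowed) {
  if (!dualAllowed) return;  // stand-in for the checkDualAllowed early-out
  if (rd == 0) return;       // r0 is hard-wired to zero: nothing to compute or store
  emitter.sub64();
  emitter.mov64();
}
```

For the trapping variants the arithmetic could not be dropped this way, since the overflow check has to run even when the result is discarded.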

invertego · 2024-10-18T08:09:19Z

ares/n64/cpu/cpu.hpp

+        Block* block;
+        u32 tag;
+      };
+      Row rows[1 << 6];


How did you arrive at 64 rows? Were other numbers tried?

kannoneer marked this pull request as draft September 8, 2024 15:08

LukeUsher requested review from invertego and rasky September 9, 2024 09:00

rasky reviewed Sep 9, 2024

kannoneer added 2 commits September 10, 2024 22:00

n64: inline simple dual mode operations

ba4504a

kannoneer force-pushed the add-block-hashes branch from 382c633 to ba4504a Compare September 10, 2024 19:02

invertego requested changes Oct 18, 2024
