Add N64 recompiler block hashes & inline 64-bit ops #1640
base: master
Conversation
ares/n64/cpu/cpu.cpp
Outdated
.cop1Enabled = scc.status.enable.coprocessor1 > 0,
.floatingPointMode = scc.status.floatingPointMode > 0,
.is64bit = context.bits == 64,
});
I'm a bit concerned that we compose the context structure for every block we run. I believe it would be better to do this only on changes instead. I understand that's harder to get right and can introduce more bugs, but I believe it's going to be faster.
So, let's say, have a function that recalculates the current context and its hash key and stores them somewhere. At runtime, we just use the precalculated context and the precalculated hash key.
Then you need to call that function in any code path that can change one of the variables that affect it; for instance, mtc0 of the status register is a prime suspect, and there will be others of course, but probably not dozens of them.
Thank you for the quick review. I moved JITContext inside the existing Context struct, so it's now Context::JIT, and its contents plus its bit-vector representation are computed only when Context::setMode() is called. That is still more often than strictly needed (for example, exceptions change the scc state and call setMode()), but it should be better than before.
I added separate update() and toBits() member functions to Context::JIT for later debugging; they can be used to check for staleness bugs in debug builds.
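For reference, a minimal sketch of the shape described above; the field names, the update() parameters, and the packing order are assumptions rather than the exact code:

#include <cstdint>
using u32 = std::uint32_t;

struct JIT {
  bool cop1Enabled = false;
  bool floatingPointMode = false;
  bool is64bit = false;

  //recompute the cached fields from the current machine state
  auto update(bool cop1, bool fpMode, bool is64) -> void {
    cop1Enabled = cop1;
    floatingPointMode = fpMode;
    is64bit = is64;
  }

  //pack the fields into the bit vector that feeds the block key
  auto toBits() const -> u32 {
    return u32(cop1Enabled) << 0 | u32(floatingPointMode) << 1 | u32(is64bit) << 2;
  }
};

//possible staleness check for debug builds: after any instruction, the cached
//bits should still match a fresh recomputation, e.g.
//  assert(context.jitBits == context.jit.toBits());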
We'd like the recompiler to take the execution context, such as kernel mode, into account when compiling blocks. That makes it necessary to identify blocks not just by address but by all the information used at compile time. This is done by computing a 32-bit key and using it as a block's identifier, instead of the last six physical address bits as before. The execution state and its bit-vector representation are recomputed only when needed, in this case each time Context::setMode() is called, which happens on powerup, in both MTC0 and MFC0 instructions, and on exceptions. Since keys are now 32-bit instead of 6-bit, the block() function hashes them before mapping them to one of the 64 pool rows. The hash function was chosen somewhat arbitrarily to be better than a simple multiplicative hash and is likely not the best choice for this exact task.
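To make the hashing step concrete, here is a sketch of how a 32-bit key could be scattered over the 64 rows; this uses a generic integer mixer (the constants are from a well-known avalanche function), not necessarily the one in the PR:

#include <cstdint>
using u32 = std::uint32_t;

//mix the 32-bit block key so keys that differ only in high bits still land on
//different rows, then keep the low 6 bits to select one of the 64 pool rows
auto computePoolRow(u32 key) -> u32 {
  key ^= key >> 16;
  key *= 0x7feb352du;
  key ^= key >> 15;
  key *= 0x846ca68bu;
  key ^= key >> 16;
  return key & 0x3f;
}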
* Pass JITContext down to leaf emit functions.
* Emit inline implementations of basic 64-bit operations.
* Use block compile-time information to elide kernel mode checks of the now inlined operations (see the sketch below).
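As a rough illustration of the last point (the names here are made up for the example, not the PR's code): once the block key encodes whether 64-bit operations are legal, the emitter can decide at compile time which path to generate instead of emitting a runtime mode check:

//sketch only: the real emitter works on the recompiler's IR, not on booleans
struct JITContext {
  bool dualAllowed;  //64-bit ops legal in this context (e.g. kernel mode or 64-bit addressing)
};

enum class DualPath { InlineOp, ReservedInstructionException };

auto selectDualPath(const JITContext& ctx) -> DualPath {
  if(ctx.dualAllowed) return DualPath::InlineOp;       //no per-instruction check emitted
  return DualPath::ReservedInstructionException;       //block can only ever raise the exception
}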
The branch was force-pushed from 382c633 to ba4504a.
Sorry for the delay. I will try to review this in the next few days.
Looks good aside from the issues mentioned. How much more are you planning to do with this before taking it out of draft status?
return (address >> 2 & 0x3f) | (jitBits & ~0x3f);
}

auto CPU::Recompiler::computePoolRow(u32 key) -> u32 {
The n64 core already pulls in xxhash.h, so XXH32_avalanche might work here as well.
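A hedged sketch of that suggestion, assuming the avalanche helper is visible in the translation unit that defines computePoolRow:

//reuse xxhash's 32-bit finalizer as the key mixer, then select one of 64 rows
auto CPU::Recompiler::computePoolRow(u32 key) -> u32 {
  return XXH32_avalanche(key) & 0x3f;
}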
@@ -18,6 +18,9 @@ auto CPU::Context::setMode() -> void {
  break;
}

jit.update(*this, self);
jitBits = jit.toBits();
Should update() be responsible for recalculating jitBits?
@@ -18,6 +18,9 @@ auto CPU::Context::setMode() -> void {
  break;
}

jit.update(*this, self);
We also need to make sure this is updated after a save state load.
Also when the breakpoint count changes.
@@ -106,6 +118,8 @@ struct CPU : Thread {
u32 mode;
u32 bits;
u32 segment[8];  //512_MiB chunks
u32 jitBits;
The rest of the recompiler state is in the recompiler object. Is there a reason you want to keep this state separate?
call(&CPU::DSUBU);
emitZeroClear(Rdn);
if (!checkDualAllowed(ctx)) return 1;
sub64(reg(0), mem(Rs), mem(Rt), set_o);
set_o not needed
mov32_f(temp, flag_o);
auto didntOverflow = cmp32_jump(temp, imm(0), flag_eq);
Suggested change:
-  mov32_f(temp, flag_o);
-  auto didntOverflow = cmp32_jump(temp, imm(0), flag_eq);
+  auto didntOverflow = jump(flag_no);
// If overflow flag set: throw an exception, skip the instruction via the 'end' label.
mov32_f(temp, flag_o);
auto didntOverflow = cmp32_jump(temp, imm(0), flag_eq);
call(&CPU::Exception::arithmeticOverflow, &cpu.exception);
Emitting calls this way causes all the parameters to be emitted as immediates. In general it's cheaper (in terms of code footprint) to calculate addresses that are passed as arguments (see the instances of lea in this file). For cold code paths like this, though, it would be even better to call a trampoline in CPU that takes no arguments (aside from the implicit this pointer).
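A sketch of the trampoline idea; the member name is hypothetical:

//cold-path call target: takes no arguments besides the implicit this pointer,
//so the recompiled code does not need to materialize any argument immediates
struct CPU {
  struct Exception {
    auto arithmeticOverflow() -> void { /* raise the exception as usual */ }
  } exception;

  auto arithmeticOverflowTrampoline() -> void {
    exception.arithmeticOverflow();
  }
};

//the emitter side would then be roughly: call(&CPU::arithmeticOverflowTrampoline);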
emitZeroClear(Rtn);
if (!checkDualAllowed(ctx)) return 1;
add64(reg(0), mem(Rs), imm(i16), set_o);
if(Rtn > 0) mov64(mem(Rt), reg(0));
The add64 can also be skipped if Rtn is zero.
emitZeroClear(Rdn);
if (!checkDualAllowed(ctx)) return 1;
sub64(reg(0), mem(Rs), mem(Rt), set_o);
if(Rdn > 0) mov64(mem(Rd), reg(0));
The sub64 can be skipped if Rdn is zero.
Block* block;
u32 tag;
};
Row rows[1 << 6];
How did you arrive at 64 rows? Were other numbers tried?
Reworking the recompiler's JIT block accesses to be more flexible so that we can bake more runtime information into them. This way, runtime checks can be avoided in the generated code, which makes inlining the implementations more attractive, since we don't need to generate calls to e.g. kernel-mode checks for every 64-bit op. This is expected to eventually increase the recompiled code's performance.
I expect these changes to have a small negative performance impact but haven't measured it.