Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing support for the IR Interpreter #19457

Merged
merged 16 commits into from
Sep 16, 2024
Merged

Conversation

Nemoumbra
Copy link
Contributor

I've been working on the tracing support for a few years now, starting from the alpha MIPSLogger. I don't think it's worth listing the previous attempts and, say, the changes between https://github.com/Nemoumbra/ppsspp/tree/mips-logger and https://github.com/Nemoumbra/ppsspp/tree/mipslogger-faster, let's get to the point.

Note: this system is for experienced users only.
MIPSTracer works only in the IR Interpreter CPU core and it's disabled on Switch, Android and IOS (well, the code is there, but the UI to interact with the tracer is hidden under a preprocessor condition). For some reason the SDL build for Linux doesn't support the file picker, so the tracer can't work there. The button just doesn't do anything. I'm PRing it as is, because I intend to add the nativefiledialog-extended GTK file picker or something similar.

Modus operandi

Unmodified IR Interpreter

The IR Interpreter first checks if the pc is pointing at a special PPSSPP hack instruction which signifies that the basic block is ready for execution. If it's not a hack instruction, PPSSPP invokes the compiler (which should patch the first instruction on success) and retries. The compiler iterates over the MIPS instructions sequentially until the basic block's end and emits the PPSSPP's internal IR from the assembly. Then it runs various optimizations and saves the resulting IR code. The basic block's start position is encoded inside the emuhack instruction and later the IR is executed by IR Interpreter.

The compilation with the tracer enabled

The idea is to remember which blocks were executed and write this information to a file. If the tracing is ongoing, the compiler now will add one extra IR instruction to all blocks right after the first one (which is, generally speaking, a downcount). After the block is created, the tracer copies its content to a dedicated storage alongside with the block's size... unless it already saw a block like that before. In order to properly distinguish the blocks, we evaluate their hashes and place them in a mapping hash -> storage index, so if the hash is found in the unordered map, we don't have to copy the instructions!

In any case, we proceed by pushing the storage index and the block's start address into a supplementary vector trace_info, because identical basic blocks could appear at different addresses and we must remember them all. There's a little problem... I'm unable to prevent PPSSPP from both running fast and conserving memory as explained by the comment next to the line
trace_info.push_back({ virt_addr, storage_index }); inside MIPSTracer::prepare_block. This means that if, for some reason, PPSSPP decides to recompile old basic blocks, the trace_info will be filled with identical entries, which we can't easily filter out on the fly. Say, if the game repeatedly reloads overlays, this will cause the vector to grow until the user runs out of RAM or stops the tracing session. That shouldn't be a problem though, such cases are rather rare.

The indexes of trace_info elements represent basic blocks and they can fit into 4 bytes, therefore they are used as block identifiers. Now the tracer reaches into the IR instructions' storage and fills the constant argument of the newly added IROp::LogIRBlock. Then the tracer lets the compiler do its thing (overwrite the first MIPS instruction).

The execution

I designed the system to maximize the execution speed.
If the IR Interpreter sees the LogIRBlock instruction, it checks if we're tracing (perhaps, we're just running after a trace session, but some blocks still contain the logging instructions) and if we do, it pushes the block id into a cyclic buffer called executed_blocks. The user decides on the max number of basic blocks that can be saved to a trace. Once we run out of space, the cyclic buffer starts forgetting the oldest entries by overwriting them.
The tracing is stopped automatically once the core enters stepping.

The trace generation

We iterate over the executed_blocks (which contains trace_info indexes that represent basic blocks) and flush all of them to a file. That requires fetching the basic block's size and content from out storage, disassembling the instructions and formatting the output lines.

UI

image

I've added a new paragraph to the Developer tools, right before the Ubershaders. The first option is grayed out unless the CPU core is correct, a game is loaded and the CPU is stepping. Invalidating the JIT cache is required to force the recompilation of all basic blocks that could have been built without the logging instruction within should we ever run into them.

Performance

I tested it on Windows and noticed almost no differences in performance when compared to the IR interpreter running on its own. I also staged an experiment by sitting at the idle Class Select screen of Patapon 3 EU under different circumstances.

MIPSTracer build:

  • Debug:
    • IR Interpreter: 18 fps
    • Interpreter: 4 fps
    • IR Interpreter x MIPSTracer: 14 fps
  • Release:
    • IR Interpreter: 30 fps
    • Interpreter: 30 fps
    • IR Interpreter x MIPSTracer: 30 fps

The old "MIPSLogger faster" build (which I didn't sync with upstream):

  • Debug:
    • IR Interpreter: 10 fps
    • Interpreter: 4 fps
    • Interpreter x MIPSLogger: 💀
  • Release:
    • IR Interpreter: 30 fps
    • Interpreter: 30 fps
    • Interpreter x MIPSLogger: 27 fps

The file generation routine seems to be the longest.

Resulting trace size (Mb) Time to create the file (seconds)
654 21.2
350 11.3
108 3.3

This is, of course, the Release mode.
Unfortunately, the UI hangs until the trace is written to a file. I really wish to know if PPSSPP can schedule background processes that reflect the UI somehow.

I would really love to know if I'm missing some things worth adding (excluding the translations, that part can be added later) or reconsidering.

@anr2me
Copy link
Collaborator

anr2me commented Sep 15, 2024

Unfortunately, the UI hangs until the trace is written to a file. I really wish to know if PPSSPP can schedule background processes that reflect the UI somehow.

Is this happened when clicking on the "Flush the trace" ?
If it's, may be you can flush it on a separate thread, like the void HostnameSelectScreen::ResolverThread() at GameSettingsScreen.cpp, which runs on a separate thread due to net::DNSResolve(toResolve_, "80", &resolved, err) could blocked the thread and freezes the UI.
Should probably disable the "Flush the trace" button temporarily while the file is being flushed to prevent invoking another background thread when someone accidentally clicked the button twice :)

Copy link
Owner

@hrydgard hrydgard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a pretty cool functionality, which I can see some uses for, although I don't expect to use it very often myself.

I really like how self-contained it is, outside of MIPSTracer.cpp/h the changes are small. Well done on that!

I mainly just have some style requests, happy to merge then.

Threaded flush can be added later or in this PR.

@@ -189,6 +189,8 @@ static const IRMeta irMeta[] = {
{ IROp::RestoreRoundingMode, "RestoreRoundingMode", "" },
{ IROp::ApplyRoundingMode, "ApplyRoundingMode", "" },
{ IROp::UpdateRoundingMode, "UpdateRoundingMode", "" },

{IROp::LogIRBlock, "LogIRBlock", ""}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor formatting issue, please keep spacing consistent with above lines

Comment on lines +144 to +145
void start_tracing();
void stop_tracing();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except in older code and in containers (that might want to look STL-like), we generally format member functions in CapitalCase. Though, since this is very self-contained, a local style can be accepted too.
In that case, it's a bit weird that setLoggingPath is different than the others.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

setLoggingPath is taken from the MIPSLogger, I used to have camelCaseNames there. Will fix that.

INFO_LOG(Log::JIT, "TraceBlockStorage cleared");
}

void MIPSTracer::prepare_block(MIPSComp::IRBlock* block, MIPSComp::IRBlockCache& blocks) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can block be a const ptr or ref here?

return true;
}

void MIPSTracer::flush_block_to_file(TraceBlockInfo& block_info) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is only read from I think, should be const TraceBlockInfo &block_info

Comment on lines +63 to +64
template <typename T>
struct CyclicBuffer {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned, i'd like to see this moved into a new header in Common/Data/Collections, though would also be fine with this happening in a separate PR later.

@hrydgard hrydgard added this to the v1.18.0 milestone Sep 15, 2024
@Nemoumbra
Copy link
Contributor Author

Oh, I've just remembered I haven't mentioned the situation when the tracer dumps the last block to the trace fully instead of just halting at the PC:
image

During the development I initially thought it's just a quirk of the system, but this can theoretically be fixed by handling the last basic block in a separate function which would require knowing the current PC. I'm not sure if this is actually worth adding, because this would force the user to flush straight away after it enters stepping... And we also gotta fetch the PC from somewhere (it's not accessible from the MIPSTracer.cpp right now). Or maybe the tracer will have to save the PC for the time being until the trace is flushed. Neither of this seems optimal for me, so I'd like to know what you think.
If you'd like it to be implemented, which way? And should it be a different pull request?

@hrydgard
Copy link
Owner

hrydgard commented Sep 16, 2024

Well, as long as we know about the problem of the last block, it's not that big of a deal...

Maybe add a very prominent comment somewhere about it, until it's fixed?

Either add the comment, or post that you want it merged without it, i'm fine with either.

@hrydgard hrydgard merged commit 94f0e7a into hrydgard:master Sep 16, 2024
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants