-
Notifications
You must be signed in to change notification settings - Fork 481
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CallstackIsntr: Fix callback types #1447
base: dev
Are you sure you want to change the base?
Conversation
This change combined with #1445 introduces an inconsistency in the output of callstack_instr. Running the same recording multiple times can result in different output from callstack_instr...here's an example of what I'm seeing:
|
Oh no, I was expecting a difference with output from callstack_instr before/after these changes, but you're saying it's just producing varying output just running repeatedly with these changes? |
Correct - different consistent output would not have surprised me - but I'm seeing output that varies between different runs of panda replaying the same recording file. |
89b4117
to
ad4487d
Compare
@AndrewFasano EBE callbacks do not occur when an exit is requested. SBE/EBE are injected into the TCG stream and only occur when execution reaches that point. I am skeptical that this is the reason that we're experiencing an issue here. The recordings I've taken have tens of millions of instructions and tend to have tens of exits. I've done some analysis: #!/usr/bin/env python3
from sys import argv
from pandare import Panda
# Single arg of arch, defaults to i386
arch = "arm" if len(argv) <= 1 else argv[1]
panda = Panda(generic=arch)
from os.path import exists
recording_name = "rec"+arch
if not exists(f"{recording_name}-rr-snp"):
print("taking recording")
@panda.queue_blocking
def do_stuff():
panda.revert_sync("root")
panda.run_monitor_cmd(f"begin_record {recording_name}")
print(panda.run_serial_cmd("cat /etc/passwd && sleep 2 && cat /etc/passwd && ls -la"))
panda.run_monitor_cmd("end_record")
panda.stop_run()
panda.run()
else:
print("recording exists. not remaking recording")
bbe = 0
@panda.cb_before_block_exec
def bbed(cpu, tb):
global bbe
bbe += 1
abe = 0
num_exit = 0
@panda.cb_after_block_exec
def abed(cpu, tb, ec):
global num_exit,num_exit_tb_exit
if ec > 1:
num_exit += 1
global abe
abe += 1
sbe = 0
@panda.cb_start_block_exec
def sbed(cpu, tb):
global sbe
sbe += 1
ebe = 0
@panda.cb_end_block_exec
def ebed(cpu, tb):
global ebe
ebe += 1
panda.enable_tb_chaining()
panda.run_replay(recording_name)
print(f"{bbe=}, {sbe=}")
print(f"Number missing: {bbe-sbe=}")
print(f"{abe=}, {ebe=}")
print(f"Number missing: {abe-ebe=}")
print(f"Starts vs ends: {sbe-ebe=}, {bbe-abe=}")
print(f"{num_exit=} ") Which produces results like this:
To me this indicates that the block is checking for the exit status ahead of This does appear to be something you can check because even if the block is exited early our |
Thanks Luke, that's helpful info. I'm glad the EBE callback isn't called when a block is interrupted, that's a nice improvement over ABE! I can try building some more tests for this PR later, but I'm not sure how this PR could (in itself) cause the non-deterministic behavior unless something it's built on is more fundamentally broken 😨 |
The old logic would miss many calls if tb_chaining was enabled (which is is by default). Since this plugin did not disable tb_chaining, many calls would be missed if a user didn't disable chaining or load another plugin that disabled chaining. This commit updates the plugin to use start_block_exec and end_block_exec which work even with tb_chaining enabled.
ac4e71f
to
d152c22
Compare
I tested with the latest code and I'm still seeing issues where callstack_instr outputs different results for the same replay. |
I haven't forgotten about this, just haven't had time to track down the issue. Things are working correctly on the main |
Yes, no issues on dev. |
Oh - sorry - didn't realize the recent push was just a rebase - I thought you made some changes - ignore me. :) |
I think I know what is going wrong here. It looks like our implementation of start_block_exec runs prior to the first instruction, but not prior to the initial TCG block elements. In this case those initial TCG checks mean that if an exit event has occurred we would generate a before_block_exec, but not a start_block_exec. These are fairly rare, but over the course of millions of blocks it will occur. I had started testing out a change to start_block_exec that adds functionality earlier in the block, but I ran into some issues with LLVM generation failing. I'll likely follow up with a PR. |
@lacraig2 so if that's the issue it would mean the SBE callback is sometimes (non-deterministically) triggered once when it should be triggered twice? Are we going to have to add an exitCode to the EBE callback to indicate if a block was ended early and then just keep the existing callstack instr logic for filtering out double calls. |
I think I've found the issue. It seems like SBE should check I've implemented this as a separate PR #1471, but I have cherry picked it onto this branch so that @MarkMankins can check it with his tests. I have already confirmed that this fixes the issues for the file you sent me. Once we confirm that this works I'd like to revert this commit and merge #1471 and then this PR. |
#1445 incorrectly change the callback types without changing how they were registered. Thanks to @LauraLMann and @MarkMankins for pointing this out! I'm slightly terrified that this sort of mistake doesn't cause a compile error (and that all the CI tests still passed)!
I reverted that PR on
dev
. This PR now includes both those changes + the fix so we can discuss the issue @MarkMankins has found when the intended change is applied.