Callstack_instr: Is popping up to 10 entries sensible? #1513
Comments
Hi Coffeeri,
It's exciting to hear about efforts to build dynamic analyses on top of QEMU's modern plugin architecture - we've been exploring that direction and think it's a really promising area for future development! We have an issue about moving panda functionality to qemu plugins (#1383), a qemu fork where we've done some of this work, and we issued a (never merged) patch to upstream trying to add support for allowing plugins to interact with one another.
As for your question with callstack_instr, I'm not too sure what's going on in our implementation. The search for 10 blocks has been there since @moyix wrote it 12 years ago in PANDA 1.0 here. I suspect this might be trying to account for false positives - if a value is incorrectly added to the shadow stack, we'd skip past it if we later return to a PC that wasn't on the top of our shadow stack. Then we clean up the shadow stack and trigger the
Also, it's not related to your question, but one hard-learned lesson about
Thank you for your answer! I am actively following your effort to Allow TCG plugins to read memory. Regarding your explanation, I assumed it was a way to counter false positives. The additional map I suggested would not require this step. If you are going to re-implement the plugin in your new architecture, I will be happy to take a look and may contribute if help is needed. My suggestion does not require Capstone as a dependency. Since we have total control of the translation step, I found it unnecessary overhead to translate instructions only to disassemble them again later with Capstone. However, I definitely understand your approach, which aims to be mostly architecture-independent (I am focusing on x86(_64) only) without modifying too many moving parts of the TCG core. I don't have a better generic solution that could be compatible with QEMU's plugin system while leaving the TCG core mostly untouched. You may close this issue if no change to the current state of
@Coffeeri I'd be quite interested in a
I appreciate that you're following this issue. I'm pushing on this issue in part because I suspect that there have been many separate implementations of a lot of the core infrastructure that QEMU is lacking in the plugin sphere. I know of at least a handful. I've found that even if you don't have time to provide code to communities like QEMU it's still worth your time to provide perspective and opinions based on your experiences.
Hello there,
I am currently using an approach similar to PANDA, embedding callbacks into QEMU's v8 source code for malware research in my master's thesis. One of my core needs is to keep track of currently called, but not yet returned, procedures. This is where a shadow callstack becomes very useful.
My shadow callstack implementation (in C) ran into problems deciding when to pop and verifying that a return is correct. My approach was to add the boolean variables is_call and is_ret (with respect to the parsed opcodes in that TB) to the TranslationBlock data structure at translation time.
On a call, I push the pc to the callstack and hook the return address. When the hook is executed, it pops the callstack until it reaches a stored stack index referring to the earlier pushed pc.
While reading your implementation of the callstack_instr plugin, I learned about interrupts! I had missed them entirely before, which probably corrupted my shadow callstack.
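A minimal sketch of the first approach described above (push on call, hook the return address, pop back to a stored index). All names here (`BlockInfo`, `Frame`, `ShadowStack`, `on_block_executed`, `on_return_hook`) are hypothetical stand-ins, not QEMU or PANDA identifiers, and the sketch assumes the hook pops the frame at the stored index together with everything above it. Interrupts are not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 1024

/* Hypothetical stand-in for the fields added to TranslationBlock. */
typedef struct {
    uint64_t pc;
    uint64_t size;
    bool is_call;  /* block ends in a call instruction */
    bool is_ret;   /* block ends in a return instruction (unused here) */
} BlockInfo;

typedef struct {
    uint64_t pc;        /* pc pushed at the call */
    uint64_t ret_addr;  /* address the return hook is placed on */
} Frame;

typedef struct {
    Frame frames[MAX_DEPTH];
    size_t depth;
} ShadowStack;

/* After a block ran: if it ended in a call, push a frame.
 * Returns the stack index to store with the hook, or -1 if nothing was pushed. */
static long on_block_executed(ShadowStack *s, const BlockInfo *tb) {
    if (!tb->is_call || s->depth == MAX_DEPTH) {
        return -1;
    }
    s->frames[s->depth].pc = tb->pc;
    s->frames[s->depth].ret_addr = tb->pc + tb->size;
    return (long)s->depth++;
}

/* Return-address hook: drop the frame at `index` and everything above it.
 * If the frame is already gone, the hook does nothing. */
static void on_return_hook(ShadowStack *s, size_t index) {
    if (index < s->depth) {
        s->depth = index;
    }
    assert(s->depth <= MAX_DEPTH);
}
```

Storing the index with the hook makes the pop itself trivial, but a hook that never fires (for example, when control flow is diverted by an interrupt) leaves stale frames behind.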
What confuses me about your implementation is that you pop up to 10 TBs before TB execution (if it was not interrupted):
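As a rough illustration of the behavior in question (a hypothetical sketch, not the plugin's actual code), a bounded search of the top of the shadow stack might look like this in C. `ShadowStack`, `push`, `pop_if_return`, and `SEARCH_WINDOW` are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 1024
#define SEARCH_WINDOW 10  /* assumed window size, matching "up to 10 entries" */

typedef struct {
    uint64_t ret_addr[MAX_DEPTH];  /* expected return addresses, newest last */
    size_t depth;
} ShadowStack;

static void push(ShadowStack *s, uint64_t ret_addr) {
    assert(s->depth < MAX_DEPTH);
    s->ret_addr[s->depth++] = ret_addr;
}

/* Before the block at `pc` runs: look for `pc` among the top SEARCH_WINDOW
 * entries. If it is found, pop that entry and everything above it and return
 * the number of entries popped; otherwise leave the stack alone and return 0. */
static size_t pop_if_return(ShadowStack *s, uint64_t pc) {
    size_t limit = s->depth > SEARCH_WINDOW ? s->depth - SEARCH_WINDOW : 0;
    for (size_t i = s->depth; i > limit; i--) {
        if (s->ret_addr[i - 1] == pc) {
            size_t popped = s->depth - (i - 1);
            s->depth = i - 1;
            return popped;
        }
    }
    return 0;
}
```

With a window like this, an entry that was pushed by mistake is skipped as soon as execution returns to an address below it, while the cost per block stays bounded.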
My new approach uses an additional map that is stack-segregated, with the return address (tb->pc + tb->size) as the key and a simple uint as a counter. On every call, I add the return address with a count of 1, or increment it if it already exists.
Before every TB execution, I check whether the current pc is in that map. If so, I pop the callstack until the return address is found. Then, I decrease the related value in the map by one or remove it if it would be zero.
I'm confused and wondering if I'm misunderstanding something.
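A minimal sketch of the map-based approach described above, for a single stack. The names (`CallTracker`, `on_call`, `on_block`) are hypothetical, and the "map" is a small linear array to keep the example self-contained. One assumption goes beyond the description: the sketch also decrements the counter of every frame skipped while popping, since otherwise stale map entries would remain for frames that are no longer on the callstack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEPTH 1024

typedef struct {
    uint64_t addr;
    unsigned count;
} RetCount;

typedef struct {
    uint64_t stack[MAX_DEPTH];    /* pending return addresses, newest last */
    size_t depth;
    RetCount pending[MAX_DEPTH];  /* linear "map": return address -> count */
    size_t n_pending;
} CallTracker;

static RetCount *find(CallTracker *t, uint64_t addr) {
    for (size_t i = 0; i < t->n_pending; i++) {
        if (t->pending[i].addr == addr) {
            return &t->pending[i];
        }
    }
    return NULL;
}

/* Decrement the counter for addr; remove the entry when it reaches zero. */
static void release(CallTracker *t, uint64_t addr) {
    RetCount *e = find(t, addr);
    assert(e != NULL);
    if (--e->count == 0) {
        *e = t->pending[--t->n_pending];  /* swap-remove */
    }
}

/* On a call: ret_addr is tb->pc + tb->size of the calling block. */
static void on_call(CallTracker *t, uint64_t ret_addr) {
    assert(t->depth < MAX_DEPTH);
    t->stack[t->depth++] = ret_addr;
    RetCount *e = find(t, ret_addr);
    if (e != NULL) {
        e->count++;
    } else {
        t->pending[t->n_pending++] = (RetCount){ret_addr, 1};
    }
}

/* Before the block at `pc` runs: if pc is a pending return address, pop the
 * callstack down to and including the newest matching frame.
 * Returns the number of frames popped (0 if pc is not in the map). */
static size_t on_block(CallTracker *t, uint64_t pc) {
    if (find(t, pc) == NULL) {
        return 0;
    }
    size_t popped = 0;
    while (t->depth > 0) {
        uint64_t top = t->stack[--t->depth];
        release(t, top);
        popped++;
        if (top == pc) {
            break;
        }
    }
    return popped;
}
```

Because the map lookup decides up front whether the current pc is a pending return address, no fixed search window is needed; the counter keeps recursive calls to the same site apart.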
I'd love some feedback on whether my approach is totally off and the current behavior of callstack_instr is the sensible one.