Need heuristics to determine when a high-confidence type should not be completely trusted #5922

Open
fuzyll opened this issue Sep 11, 2024 · 2 comments
Labels
Component: Core (issue needs changes to the core); Effort: Medium (issue should take < 1 month); Impact: Medium (issue is impactful with a bad, or no, workaround); Type: Enhancement (issue is a small enhancement to existing functionality)

fuzyll commented Sep 11, 2024

What is the feature you'd like to have?
There are a number of situations where we receive high-confidence types from places like debug info (e.g. DWARF) or the demangler and the types we receive are obviously wrong. As examples:

  1. DWARF information for something like glibc will not be able to "see through" GNU indirect functions and will report the wrong name, overwriting the correct name (which probably came from the demangler).
  2. The demangler might represent a function type that has a hidden initial argument without all of its required parameters. For example, in #5920 ("Demangler should create types referenced from demangled names that don't already exist"), QString __cdecl QString::fromUtf8(const char *str, qsizetype size) actually takes three arguments, with the initial argument being a hidden pointer for the structure return value (described at https://blog.aaronballman.com/2012/02/describing-the-msvc-abi-for-structure-return-types/).

In these cases, type information from a different source (or from our own analysis) might actually be more accurate than what we're being fed from external sources. It would be nice to have a "double-check" step that applies some adjustments in these situations by either combining information from multiple sources or overriding higher-confidence data.

As a first step, it might also be useful to just detect these cases and hand their resolution over to the user (e.g. by tagging them all and having some indication of what the potentially detected problem was).
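
To make the "detect and hand off to the user" idea concrete, here is a minimal sketch in plain Python. It does not use any real API: `TypeCandidate`, `review_candidates`, the source labels, and the confidence values are all hypothetical names and numbers chosen for illustration. The sketch keeps the highest-confidence candidate but collects human-readable reasons whenever another source disagrees with it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeCandidate:
    source: str       # hypothetical label, e.g. "dwarf", "demangler", "analysis"
    confidence: int   # hypothetical scale: higher means more trusted
    name: str
    param_count: int


def review_candidates(candidates):
    """Return the highest-confidence candidate plus reasons to double-check it."""
    best = max(candidates, key=lambda c: c.confidence)
    reasons = []
    for other in candidates:
        if other is best:
            continue
        if other.name != best.name:
            reasons.append(f"name differs from {other.source}: {other.name!r}")
        if other.param_count != best.param_count:
            reasons.append(
                f"parameter count differs from {other.source}: "
                f"{other.param_count} vs {best.param_count}"
            )
    return best, reasons


# Illustrative input modeled on case 2: the demangled signature has two
# parameters, while (assumed) analysis of the function sees three.
candidates = [
    TypeCandidate("demangler", 255, "QString::fromUtf8", 2),
    TypeCandidate("analysis", 128, "QString::fromUtf8", 3),
]
best, reasons = review_candidates(candidates)
for reason in reasons:
    print(f"[needs review] {best.name}: {reason}")
```

A non-empty `reasons` list is what would drive tagging in this sketch; whether to override the high-confidence type automatically is a separate decision that it deliberately leaves to the user.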

Is your feature request related to a problem?
Yes, see above.

Are any alternative solutions acceptable?
Anything that arrives at the 'correct' solution in these cases should be acceptable.

Additional Information:
Binaries that exhibit both of these cases are available upon request.


emesare commented Sep 15, 2024

Case 2 is also fundamentally indescribable with our current calling convention API. We should identify structure/memory returns as described by the ABI and add the hidden return argument. For imports with demangled names where we don't have any backing function, we would have to assume this behavior when the return type is over some calling-convention-specific size (in this case we assume QString is not a bare type). When we can actually analyze the function, we should be able to figure it out based on the register specified.
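
The size-based fallback for imports could look roughly like the sketch below. It is plain Python, not the calling convention API: `FunctionType`, `lowered_params`, the `is_bare_type` flag, and the default threshold of 8 bytes are all assumptions made for illustration, and a real implementation would take the threshold and placement rules from the specific ABI.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FunctionType:
    return_type: str
    return_size: int         # size of the return type in bytes
    is_bare_type: bool       # hypothetical flag: type may be returned in a register
    params: Tuple[str, ...]  # parameters as written in the demangled signature


def lowered_params(ftype, max_register_return_size=8):
    """Return the parameter list with a hidden return pointer added if needed.

    max_register_return_size is a placeholder; the real limit depends on
    the calling convention in use.
    """
    fits_in_register = (
        ftype.is_bare_type and ftype.return_size <= max_register_return_size
    )
    if fits_in_register:
        return ftype.params
    hidden = f"{ftype.return_type} *__return"  # hypothetical parameter name
    return (hidden,) + ftype.params


# The signature from the issue; the size of 24 bytes is an assumed value
# used only to exceed the placeholder threshold.
from_utf8 = FunctionType(
    return_type="QString",
    return_size=24,
    is_bare_type=False,
    params=("const char *str", "qsizetype size"),
)
lowered = lowered_params(from_utf8)
print(lowered)
```

This sketch always places the hidden pointer first, which matches the static `QString::fromUtf8` case discussed here. It ignores how the hidden pointer should be ordered relative to `this` for non-static member functions, which a fuller version would need to handle per ABI.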

Here's a good resource: https://blog.aaronballman.com/2012/02/describing-the-msvc-abi-for-structure-return-types/ (I also added it to the initial comment to limit confusion).

fuzyll commented Oct 12, 2024

Issue #2275 may be required before this one.
