Need heuristics to determine when a high-confidence type should not be completely trusted #5922

fuzyll · 2024-09-11T19:59:40Z

What is the feature you'd like to have?
There are a number of situations where we receive high-confidence types from places like debug info (e.g. DWARF) or the demangler and the types we receive are obviously wrong. As examples:

1. DWARF information for something like glibc will not be able to "see through" GNU indirect functions and will report the wrong name, overwriting the correct name (which probably came from the demangler).
2. The demangler might represent a function type that has a hidden initial argument without all of its required parameters. For example, in issue #5920 ("Demangler should create types referenced from demangled names that don't already exist"), QString __cdecl QString::fromUtf8(const char *str, qsizetype size) actually takes three arguments, with the initial argument being the hidden structure return pointer (described in the resource linked in the comments below).

In these cases, type information from a different source (or from our own analysis) might actually be more accurate than what we're being fed from external sources. It would be nice to have a "double-check" step that applies some adjustments in these situations by either combining information from multiple sources or overriding higher-confidence data.

As a first step, it might also be useful to just detect these cases and hand their resolution over to the user (e.g. by tagging them all and having some indication of what the potentially detected problem was).
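
A minimal sketch of what the "detect and tag" first step could look like, written as standalone Python rather than against any real API. Everything here is hypothetical and only for illustration: the `TypeClaim` record, the `review_claims` helper, the tag strings, and the confidence values. The idea is to keep the highest-confidence claim as the applied result while recording every disagreement with lower-confidence sources, so a user can review it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeClaim:
    """One source's opinion about a function (hypothetical record)."""
    source: str        # e.g. "dwarf", "demangler", "analysis"
    confidence: int    # higher means more trusted
    name: str
    param_count: int


def review_claims(claims):
    """Return (winning claim, list of tags describing disagreements).

    The winner is still the highest-confidence claim; the tags only
    flag where another source disagrees so a human can double-check.
    """
    if not claims:
        raise ValueError("need at least one claim")
    best = max(claims, key=lambda c: c.confidence)
    tags = []
    for other in claims:
        if other is best:
            continue
        if other.name != best.name:
            tags.append(
                f"name-conflict: {best.source}={best.name!r} "
                f"vs {other.source}={other.name!r}"
            )
        if other.param_count != best.param_count:
            tags.append(
                f"param-count-conflict: {best.source}={best.param_count} "
                f"vs {other.source}={other.param_count}"
            )
    return best, tags


# Usage: debug info and the demangler disagree about a function's name.
claims = [
    TypeClaim("dwarf", 255, "impl_variant", 3),
    TypeClaim("demangler", 200, "public_name", 3),
]
winner, tags = review_claims(claims)
for tag in tags:
    print(tag)
```

Tagging instead of overriding keeps the existing confidence ordering intact, which matches the "hand resolution over to the user" option described above.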

Is your feature request related to a problem?
Yes, see above.

Are any alternative solutions acceptable?
Anything that arrives at the 'correct' solution in these cases should be acceptable.

Additional Information:
Binaries that exhibit both of these cases are available upon request.

emesare · 2024-09-15T20:29:24Z

Case 2 is also fundamentally indescribable with our current calling convention API. We should identify structure/memory returns as described by the ABI and add the hidden return argument. For imports with demangled names, where we don't have any backing function, we would have to assume this behavior when the return type is over some calling-convention-specific size (in this case we assume QString is not a bare-type). When we can actually analyze the function, we should be able to figure this out based on the register specified.

Here's a good resource: https://blog.aaronballman.com/2012/02/describing-the-msvc-abi-for-structure-return-types/. I also added it to the initial comment to limit confusion.
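
The import-side heuristic described above could be sketched roughly as follows. This is illustrative Python, not the product's API: `CallingConventionInfo`, `max_register_return_bytes`, `needs_hidden_return_pointer`, the `__return` parameter name, and the sizes used are all assumptions. It also assumes a static or free function, so the hidden pointer comes first; the real threshold and the rules for non-bare types depend on the ABI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallingConventionInfo:
    """Hypothetical summary of the part of a calling convention we need."""
    name: str
    max_register_return_bytes: int  # assumed per-convention threshold


def needs_hidden_return_pointer(return_size, is_bare_type, cc):
    """Guess whether the return value is passed through hidden memory.

    Assumption: non-bare types (e.g. classes such as QString) always
    return via memory, and bare types do so only when they are larger
    than what the convention can return in registers.
    """
    if not is_bare_type:
        return True
    return return_size > cc.max_register_return_bytes


def effective_parameters(declared, return_type, return_size, is_bare_type, cc):
    """Prepend the hidden return pointer when the heuristic says so."""
    params = list(declared)
    if needs_hidden_return_pointer(return_size, is_bare_type, cc):
        params.insert(0, f"{return_type} *__return")
    return params


# Usage: the demangled signature has two parameters, the call has three.
cc = CallingConventionInfo(name="example-cdecl", max_register_return_bytes=8)
params = effective_parameters(
    ["const char *str", "qsizetype size"],
    return_type="QString",
    return_size=24,        # assumed size, for illustration only
    is_bare_type=False,
    cc=cc,
)
print(params)
```

Once the function body can be analyzed, a guess like this would be replaced by whatever the observed register usage shows.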

fuzyll · 2024-10-12T18:10:46Z

Issue #2275 may be required before this one.

fuzyll added the following labels on Sep 11, 2024: Type: Enhancement (Issue is a small enhancement to existing functionality), Effort: Medium (Issue should take < 1 month), Impact: Medium (Issue is impactful with a bad, or no, workaround), Component: Core (Issue needs changes to the core)
