Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decouple operand decoding #278

Merged
merged 2 commits into from
Dec 7, 2021
Merged

Decouple operand decoding #278

merged 2 commits into from
Dec 7, 2021

Conversation

flobernd
Copy link
Member

@flobernd flobernd commented Dec 1, 2021

Closes #110

@flobernd flobernd requested a review from athre0z December 1, 2021 09:16
@flobernd flobernd force-pushed the deccouple-operand-decoding branch 2 times, most recently from 0633749 to 1241203 Compare December 1, 2021 17:20
@flobernd flobernd marked this pull request as ready for review December 2, 2021 13:56
@flobernd flobernd force-pushed the deccouple-operand-decoding branch 2 times, most recently from a3e7003 to ad99855 Compare December 2, 2021 14:34
Copy link
Member

@athre0z athre0z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent PR, all good changes. Reviewed everything except for the fuzzing logic, which I want to take a look at when I'm less tired, presumably sometime on the weekend.

Do we intend to keep the API as-is, or do we turn the ZydisDecoderDecodeBuffer function into ZydisDecoderDecodeBufferEx later, providing a simpler variant that automatically decodes the operands as well, naming it ZydisDecoderDecodeBuffer later? I originally thought this would be the way to go, but looking at the formatter interface, we'd also have to support the struct filled by the simple variant in that one, resulting in an explosion of complexity that I'd strongly prefer to avoid.

I'm kind of leaning towards keeping the interface as proposed in this PR as our proper interface and instead providing an even simpler function that fills an even larger struct that also does formatting in one wash, the audience of which would primarily be beginner users.

In any case, I'd be happy to keep as-is and discuss this in a separate issue later.

examples/RewriteCode.c Outdated Show resolved Hide resolved
examples/RewriteCode.c Outdated Show resolved Hide resolved
include/Zydis/DecoderTypes.h Show resolved Hide resolved
include/Zydis/SharedTypes.h Show resolved Hide resolved
/**
* Internal EVEX-specific information.
*/
struct
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intentionally not ifdef away these when EVEX support is compiled out? I guess we don't really need to if the struct is purely internal, just wondering if you meant to do that.

Copy link
Member Author

@flobernd flobernd Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was intentional. I thought about this for quite some time. As we rely on the user to allocate our structs, we have to make sure the input struct size matches our internal expectations. In a pure C/C++ environment, we could probably ifdef-ing these members away without any problem, but that's a different story for bindings. We would always have to check if the struct size is matching (e.g. going the Windows API way and introducing a size field in the ZydisDecodedInstruction that needs to be initialized by the user - extra complexity which I don't really like).

I think we should discuss this in detail. One good solution might be to introduce a ZYDIS_SIZE_OPTIMIZED_UNSAFE_FOR_BINDINGS option. This would indicate that Zydis is used in a pure C/C++ project and always built on demand (-> we know the feature defines of headers and code files are the same). This would as well allow us to e.g. change the ZydisDecodedOperand struct to unions or to introduce bit-fields in the public API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should discuss this in detail.

Agreed, but this definitely deserves a dedicated issue for proper discussion -- let's not do that in this PR.

Copy link
Member Author

@flobernd flobernd Dec 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's not do that in this PR

Will create a separate issue tomorrow.

src/Decoder.c Outdated Show resolved Hide resolved
include/Zydis/Decoder.h Show resolved Hide resolved
include/Zydis/Decoder.h Outdated Show resolved Hide resolved
include/Zydis/DecoderTypes.h Outdated Show resolved Hide resolved
include/Zydis/Formatter.h Show resolved Hide resolved
Copy link
Contributor

@mappzor mappzor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes :)

Some thoughs about API (@athre0z):

Do we intend to keep the API as-is, or do we turn the ZydisDecoderDecodeBuffer function into ZydisDecoderDecodeBufferEx later, providing a simpler variant that automatically decodes the operands as well, naming it ZydisDecoderDecodeBuffer later? I originally thought this would be the way to go, but looking at the formatter interface, we'd also have to support the struct filled by the simple variant in that one, resulting in an explosion of complexity that I'd strongly prefer to avoid.

I'm kind of leaning towards keeping the interface as proposed in this PR as our proper interface and instead providing an even simpler function that fills an even larger struct that also does formatting in one wash, the audience of which would primarily be beginner users.

What about this:

  • ZydisDecoderDecodeBuffer -> function that decodes instruction + operands (simple utility function)
  • ZydisDecoderDecodeInstruction -> minimal decoding (renamed ZydisDecoderDecodeBuffer from this PR)
  • ZydisDecoderDecodeOperands -> as is in this PR

In this concept signature for ZydisDecoderDecodeBuffer would change to accept operands, so it would be another breaking change since v3. This has some advantages though:

  • problem you highlighted (with having one big struct) is avoided
  • function still does what it did in v3 - decodes instruction and operands, just the output is split into new v4-fashion
  • this is easy to explain in porting guide and keeps migration rather staigtforward. For most existing users documentation of new ZydisDecoderDecodeBuffer would introduce them to a new concept of decoding instruction and operands separately while immediately informing them it's optional to do so because they can keep using ZydisDecoderDecodeBuffer after making minimal adjustments to existing codebases.

README.md Outdated Show resolved Hide resolved
@@ -716,6 +716,7 @@ ZYAN_STATIC_ASSERT(ZYDIS_RW_ACTION_REQUIRED_BITS <= 8);
# define ZYDIS_INSTRUCTION_DEFINITION_BASE \
ZyanU16 mnemonic ZYAN_BITFIELD(ZYDIS_MNEMONIC_REQUIRED_BITS); \
ZyanU8 operand_count ZYAN_BITFIELD( 4); \
ZyanU8 operand_count_visible ZYAN_BITFIELD( 3); \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change, it allows converting some obsolete checks inside ZydisIsDefinitionCompatible into assertion.

src/Encoder.c Outdated Show resolved Hide resolved
tools/ZydisFuzzEncoder.c Outdated Show resolved Hide resolved
tools/ZydisFuzzEncoder.c Outdated Show resolved Hide resolved
@flobernd
Copy link
Member Author

flobernd commented Dec 3, 2021

@mappzor

What about this: [...]

I like the idea of providing one function that combines instruction- and operand-decoding. However I would probably just name the utility function to something like ZydisDecoderDecode/ZydisDecoderFullDecode. Renaming ZydisDecoderDecodeBuffer -> ZydisDecoderDecodeInstruction actually makes sense!

@mappzor
Copy link
Contributor

mappzor commented Dec 3, 2021

I like the idea of providing one function that combines instruction- and operand-decoding. However I would probably just name the utility function to something like ZydisDecoderDecode/ZydisDecoderFullDecode.

Both sound good to me. However if latter one is preferred I'd go with ZydisDecoderDecodeFull to stay consistent with ZydisDecoderDecode(Instruction|Operands). It's important to document that change in porting guide and it's probably best place to mention that alternative approach to decoding exists in v4.

@flobernd flobernd force-pushed the deccouple-operand-decoding branch 7 times, most recently from a8fcd64 to 1199d4b Compare December 5, 2021 10:30
@flobernd flobernd force-pushed the deccouple-operand-decoding branch from 1199d4b to e8f71e6 Compare December 5, 2021 10:34
Comment on lines +1909 to +1945
else
if (!operand->ignore_seg_override &&
instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_ES)
{
operands[i].mem.segment = ZYDIS_REGISTER_ES;
}
else
if (!operand->ignore_seg_override &&
instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_FS)
{
operands[i].mem.segment = ZYDIS_REGISTER_FS;
}
else
if (!operand->ignore_seg_override &&
instruction->attributes & ZYDIS_ATTRIB_HAS_SEGMENT_GS)
{
operands[i].mem.segment = ZYDIS_REGISTER_GS;
}
else
{
if (operands[i].mem.segment == ZYDIS_REGISTER_NONE)
{
if ((operands[i].mem.base == ZYDIS_REGISTER_RSP) ||
(operands[i].mem.base == ZYDIS_REGISTER_RBP) ||
(operands[i].mem.base == ZYDIS_REGISTER_ESP) ||
(operands[i].mem.base == ZYDIS_REGISTER_EBP) ||
(operands[i].mem.base == ZYDIS_REGISTER_SP) ||
(operands[i].mem.base == ZYDIS_REGISTER_BP))
{
operands[i].mem.segment = ZYDIS_REGISTER_SS;
}
else
{
operands[i].mem.segment = ZYDIS_REGISTER_DS;
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan of the rightward drift here, perhaps just going with inverted conditions and gotos here might be preferable? Could also do do { .. } while (ZYAN_FALSE); and break out of it, but that's essentially just a less honest goto.

src/Formatter.c Show resolved Hide resolved
src/Encoder.c Show resolved Hide resolved
@flobernd flobernd force-pushed the deccouple-operand-decoding branch from f756d6a to e8f71e6 Compare December 6, 2021 14:43
athre0z
athre0z previously approved these changes Dec 6, 2021
Copy link
Member

@athre0z athre0z left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! :-) One remaining open comment, but it's just style / not a blocker.

Edit: Ah, wait! We should also update the porting guide.

@flobernd
Copy link
Member Author

flobernd commented Dec 7, 2021

Will update the porting guide before merge.

Open points for later:

I think we should discuss this in detail. One good solution might be to introduce a ZYDIS_SIZE_OPTIMIZED_UNSAFE_FOR_BINDINGS option. This would indicate that Zydis is used in a pure C/C++ project and always built on demand (-> we know the feature defines of headers and code files are the same). This would as well allow us to e.g. change the ZydisDecodedOperand struct to unions or to introduce bit-fields in the public API.

  • Discussion: Remove ZydisFormatterFormatEx
  • Optimize encoder (operand masks)

@flobernd flobernd merged commit 979f0be into master Dec 7, 2021
@flobernd flobernd deleted the deccouple-operand-decoding branch December 7, 2021 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Structual changes to decouple operand-decoding
3 participants