Internal Encoding of x86 Encodings

Thomas Harte edited this page Oct 2, 2023 · 3 revisions

For cached decodings, this emulator uses a fixed 32-bit base layout, optionally followed by extension words. The base layout is:

  • 8 bits: operation;
  • 8 bits:
    • b7: address size — 16- or 32-bit;
    • b6: set if this instruction has a displacement or offset attached;
    • b5: set if this instruction has an immediate operand attached;
    • [b4, b0]: the source operand's Source;
  • 16 bits:
    • [b15, b14]: this instruction's data size;
    • [b13, b10]: the length of this instruction in bytes (or 0 to indicate a length extension word is present);
    • [b9, b5]: the top five bits of this instruction's SIB;
    • [b4, b0]: the destination operand's Source.
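
As a minimal sketch of how the fields above might be read back out, the following treats the three parts as separate members of a hypothetical `PackedBase` struct. The struct, the accessor names and the member order are assumptions made for illustration only; the emulator's own `Instruction` type may store and expose them differently. Which value of b7 means 16-bit and which means 32-bit is not stated above, so the sketch returns the raw bit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the three fixed parts listed above. The names and
// the order of the members are assumptions made for this sketch.
struct PackedBase {
    uint8_t operation;  // 8 bits: operation.
    uint8_t flags;      // b7, b6, b5 and the source operand's Source.
    uint16_t fields;    // Data size, length, SIB top bits, destination Source.
};

// Accessors for the 8-bit flags part.
inline unsigned address_size_bit(const PackedBase &p) { return (p.flags >> 7) & 0x01; }
inline bool has_displacement(const PackedBase &p)     { return (p.flags & 0x40) != 0; }
inline bool has_immediate(const PackedBase &p)        { return (p.flags & 0x20) != 0; }
inline unsigned source_field(const PackedBase &p)     { return p.flags & 0x1f; }

// Accessors for the 16-bit part.
inline unsigned data_size_field(const PackedBase &p)   { return (p.fields >> 14) & 0x03; }
inline unsigned short_length(const PackedBase &p)      { return (p.fields >> 10) & 0x0f; }
inline unsigned sib_top_bits(const PackedBase &p)      { return (p.fields >> 5) & 0x1f; }
inline unsigned destination_field(const PackedBase &p) { return p.fields & 0x1f; }

// A stored length of 0 signals that a length extension word is present.
inline bool has_length_extension(const PackedBase &p) { return short_length(p) == 0; }

// Builds the 16-bit part from its four fields, checking that each fits its width.
inline uint16_t pack_fields(unsigned data_size, unsigned length, unsigned sib_top, unsigned destination) {
    assert(data_size < 4 && length < 16 && sib_top < 32 && destination < 32);
    return uint16_t((data_size << 14) | (length << 10) | (sib_top << 5) | destination);
}
```

Because the in-line length field is only four bits wide, any instruction of 16 bytes or more has to store 0 there and carry its real length in a length extension word.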

When an operand uses a SIB, the low three bits of the SIB are stored in the low three bits of that operand's Source; for this reason the Source enum treats every value from 11000b upwards as equivalent to Indirect.
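
The split of the SIB across two fields can be sketched as below. This assumes an indirect operand's Source is formed as 11000b ORed with the low three SIB bits, which is what the paragraph above implies but does not spell out; `IndirectBase`, `split_sib` and `rebuild_sib` are hypothetical names, not the emulator's.

```cpp
#include <cassert>
#include <cstdint>

// First 5-bit Source value that is treated as Indirect, per the text: 11000b.
constexpr uint8_t IndirectBase = 0x18;

// True if this Source value is in the range that carries SIB bits.
inline bool carries_sib_bits(uint8_t source) {
    return source >= IndirectBase;
}

// Splits a SIB byte into the five bits kept in the instruction's 16-bit part
// and a Source value holding the remaining three.
inline void split_sib(uint8_t sib, uint8_t &top_five, uint8_t &source) {
    top_five = uint8_t(sib >> 3);
    source = uint8_t(IndirectBase | (sib & 0x07));
}

// Reassembles the original SIB byte from the two stored pieces.
inline uint8_t rebuild_sib(uint8_t top_five, uint8_t source) {
    assert(carries_sib_bits(source));
    return uint8_t((top_five << 3) | (source & 0x07));
}
```

In standard x86 terms the five stored bits are the SIB's scale and index fields, and the three bits folded into Source are its base field.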

Extension words are 16 bits in length for 16-bit decodings and 32 bits in length for 32-bit decodings. Up to three may be present, in the order:

  1. an immediate operand;
  2. an offset or displacement; and
  3. a length extension.

If a length extension is present then it is laid out as:

  • [b15/b31, b6]: instruction length in bytes;
  • [b5, b4]: repetition attached to this instruction — repe/repne;
  • [b3, b1]: segment override attached to this instruction;
  • b0: whether the lock prefix was found.
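
A length extension word could be packed and unpacked along these lines. This is only a sketch: `LengthExtension` and both function names are hypothetical, the word is held in a `uint32_t` regardless of whether the decoding uses 16- or 32-bit extension words, and the repetition and segment fields are left as raw values because their encodings are not given here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical unpacked view of a length extension word.
struct LengthExtension {
    uint32_t length;            // [b15/b31, b6]: instruction length in bytes.
    unsigned repetition;        // [b5, b4]: raw repetition field.
    unsigned segment_override;  // [b3, b1]: raw segment override field.
    bool lock;                  // b0: lock prefix found.
};

inline LengthExtension unpack_length_extension(uint32_t word) {
    LengthExtension result;
    result.length = word >> 6;
    result.repetition = (word >> 4) & 0x03;
    result.segment_override = (word >> 1) & 0x07;
    result.lock = (word & 0x01) != 0;
    return result;
}

inline uint32_t pack_length_extension(const LengthExtension &e) {
    // The two small fields must fit in two and three bits respectively.
    assert(e.repetition < 4 && e.segment_override < 8);
    return (e.length << 6) | (e.repetition << 4) | (e.segment_override << 1) | (e.lock ? 1u : 0u);
}
```

With a 16-bit extension word the length field is ten bits wide, which is ample for x86 instructions.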

Each decoded instruction is therefore the 4-byte base plus up to three extension words, giving:

  • between 4 and 10 bytes for 16-bit decodings; and
  • between 4 and 16 bytes for 32-bit decodings.

sizeof(Instruction) is therefore either 10 or 16; Instruction provides packing_size() to give the number of bytes that are actually in use. Instruction is plain old data with a trivial destructor, so instances can safely be placed in memory such that instruction n+1 begins at the address of instruction n plus its packing_size(). Extension words therefore need be paid for only when required.
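
The size arithmetic and the back-to-back placement can be sketched as follows. Both functions are hypothetical stand-ins: `packed_size` recomputes what `packing_size()` is described as returning, and `append_packed` copies raw bytes into a `std::vector` rather than working with the emulator's real `Instruction` type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Bytes in use: the 4-byte base plus one extension word per attached item.
// Extension words are 2 bytes for 16-bit decodings and 4 for 32-bit ones.
constexpr std::size_t packed_size(
    bool is_32bit_decoding, bool has_immediate, bool has_displacement, bool has_length_extension) {
    return std::size_t(4) + std::size_t(is_32bit_decoding ? 4 : 2) *
        (std::size_t(has_immediate) + std::size_t(has_displacement) + std::size_t(has_length_extension));
}

// The bounds quoted above follow directly from the formula.
static_assert(packed_size(false, false, false, false) == 4, "16-bit minimum");
static_assert(packed_size(false, true, true, true) == 10, "16-bit maximum");
static_assert(packed_size(true, true, true, true) == 16, "32-bit maximum");

// Appends only the bytes in use to the end of the buffer, so that the next
// instruction begins immediately after this one. Returns the offset at which
// the instruction was written.
inline std::size_t append_packed(
    std::vector<uint8_t> &buffer, const void *instruction, std::size_t size_in_use) {
    assert(instruction != nullptr);
    const std::size_t offset = buffer.size();
    buffer.resize(offset + size_in_use);
    std::memcpy(buffer.data() + offset, instruction, size_in_use);
    return offset;
}
```

Copying with `std::memcpy` is only reasonable here because the type is plain old data; a consumer walking such a buffer would advance by each instruction's size in use rather than by `sizeof(Instruction)`.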