Adding support for 32-bit architectures #10

jkrshnmenon · 2023-12-05T08:28:42Z

Hi,

I was looking into using gtirb-rewriting along with ddisasm on some 32-bit applications (x86 and ARM). I saw that ddisasm supports both of these architectures, but gtirb-rewriting does not.

I see that an ABI class exists that is intended for the x86_32 architecture, but it doesn't seem to be used anywhere.

I wanted to ask how much effort you expect it would take to implement support for 32-bit x86 and ARM applications.
If it is a reasonable amount, I'd like to give it a shot if I can get some guidance on what needs to be done.

Looking forward to hearing from you.

jranieri-grammatech · 2023-12-06T16:22:06Z

Thanks for the interest! That class is used for 32-bit PE support, but as you've noticed there is not a corresponding ELF implementation and no 32-bit ARM support at all.

Relatively speaking, adding a new ABI is straightforward:

Add a new ABI subclass in gtirb_rewriting/abi.py and make sure it's registered at the bottom of the file.
Add/update tests in tests/test_abi.py.
Update tests/test_calls.py to ensure that CallPatch does the right thing for your ABI, potentially also updating CallPatch.
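
As a rough illustration of the "subclass, then register at the bottom of the file" pattern described above, here is a minimal, self-contained sketch. It is not gtirb-rewriting's actual code: the class names, the `pointer_size`/`red_zone_size` methods, the string keys, and `lookup_abi` are all hypothetical stand-ins, and the real base class in gtirb_rewriting/abi.py will have a different and larger interface.

```python
from typing import Dict, Tuple


class ABI:
    """Hypothetical base class standing in for the real one in abi.py."""

    def pointer_size(self) -> int:
        raise NotImplementedError

    def red_zone_size(self) -> int:
        raise NotImplementedError


class X86_64_ELF(ABI):
    def pointer_size(self) -> int:
        return 8

    def red_zone_size(self) -> int:
        # The x86-64 System V ABI reserves a 128-byte red zone below RSP.
        return 128


class X86_32_ELF(ABI):
    """The new ABI: 32-bit x86 on ELF."""

    def pointer_size(self) -> int:
        return 4

    def red_zone_size(self) -> int:
        # The i386 System V ABI has no red zone.
        return 0


# Registration table (hypothetical keys): a new ABI only becomes reachable
# once it has an entry here.
ABI_REGISTRY: Dict[Tuple[str, str], ABI] = {
    ("x64", "elf"): X86_64_ELF(),
    ("ia32", "elf"): X86_32_ELF(),
}


def lookup_abi(isa: str, file_format: str) -> ABI:
    """Return the registered ABI, or fail loudly for unsupported targets."""
    try:
        return ABI_REGISTRY[(isa, file_format)]
    except KeyError:
        raise NotImplementedError(
            f"no ABI registered for {isa}/{file_format}"
        ) from None
```

With this layout, a test in the spirit of tests/test_abi.py would look up the new entry and check its properties, and an unregistered pair such as `("arm", "elf")` raises `NotImplementedError` instead of silently picking the wrong ABI.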

For 32-bit ARM, there's a little bit more to do because we don't have any support for it yet:

Add a new CallPatch implementation in gtirb_rewriting/patches/call_patch.py
Update gtirb_rewriting/assembler/_mc_utils.py to know which instructions are indirect calls (note that these are LLVM instruction names, not the ISA names).
Double check that the Assembler works correctly for the 32-bit ARM variant you care about. It's possible that we'll need a more specific target triple in gtirb_rewriting/utils.py than 'arm'.

Just a heads up, I'm currently inquiring internally about how to accept an outside contribution to this repository and it will probably require you to sign a CLA.

jkrshnmenon · 2023-12-09T06:58:19Z

Thank you for your response.
Let me spend some time on this and see if I can get the 32-bit x86 ELF support ready first.
I'll keep you posted on this thread about progress or issues.

jkrshnmenon · 2023-12-14T18:26:19Z

I've managed to implement the x86 32-bit ELF support, and all the test cases pass.
I've also made some progress on the ARM end.

The only thing that I'm missing from your list is the part about updating gtirb_rewriting/assembler/_mc_utils.py.
I could not find any documentation about the LLVM instructions and would appreciate it if you could point me in the right direction.

The code is available here

jranieri-grammatech · 2023-12-15T01:53:05Z

That code is used to determine if a call instruction has a known target or is indirect. You can use mc-asm's command line interface to print out what LLVM instruction names get used for a given assembly input:

Here's an example for x86-64:

$ echo "call direct; call rax; call qword ptr [rax]" | python3 -m mcasm --syntax=intel --target=x86_64-pc-linux --filter=emit_instruction -
⚡️ emit_instruction
├── state (ParserState)
│   └── loc (SourceLocation)
│       ├── lineno = 1
│       └── offset = 1
├── inst (Instruction)
│   ├── desc (InstructionDesc)
│   │   ├── implicit_uses (list)
│   │   │   ├── [0] (Register)
│   │   │   │   ├── id = 58
│   │   │   │   ├── is_physical_register = True
│   │   │   │   └── name = 'RSP'
│   │   │   └── [1] (Register)
│   │   │       ├── id = 66
│   │   │       ├── is_physical_register = True
│   │   │       └── name = 'SSP'
│   │   └── is_call = True
│   ├── name = 'CALL64pcrel32'
│   ├── opcode = 661
│   └── operands (list)
│       └── [0] (SymbolRefExpr)
│           ├── location (SourceLocation)
│           │   ├── lineno = 1
│           │   └── offset = 6
│           ├── symbol (Symbol)
│           │   └── name = 'direct'
│           └── variant_kind = SymbolRefExpr.VariantKind.None_
├── data = b'\xe8\x00\x00\x00\x00'
└── fixups (list)
    └── [0] (Fixup)
        ├── kind_info (FixupKindInfo)
        │   ├── bit_size = 32
        │   ├── is_pc_rel = 1
        │   └── name = 'reloc_branch_4byte_pcrel'
        ├── offset = 1
        └── value (BinaryExpr)
            ├── lhs (SymbolRefExpr)
            │   ├── location (SourceLocation)
            │   │   ├── lineno = 1
            │   │   └── offset = 6
            │   ├── symbol (Symbol)
            │   │   └── name = 'direct'
            │   └── variant_kind = SymbolRefExpr.VariantKind.None_
            ├── opcode = BinaryExpr.Opcode.Add
            └── rhs (ConstantExpr)
                └── value = -4

⚡️ emit_instruction
├── state (ParserState)
│   └── loc (SourceLocation)
│       ├── lineno = 1
│       └── offset = 14
├── inst (Instruction)
│   ├── desc (InstructionDesc)
│   │   ├── implicit_uses (list)
│   │   │   ├── [0] (Register)
│   │   │   │   ├── id = 58
│   │   │   │   ├── is_physical_register = True
│   │   │   │   └── name = 'RSP'
│   │   │   └── [1] (Register)
│   │   │       ├── id = 66
│   │   │       ├── is_physical_register = True
│   │   │       └── name = 'SSP'
│   │   └── is_call = True
│   ├── name = 'CALL64r'
│   ├── opcode = 662
│   └── operands (list)
│       └── [0] (Register)
│           ├── id = 49
│           ├── is_physical_register = True
│           └── name = 'RAX'
├── data = b'\xff\xd0'
└── fixups = []

⚡️ emit_instruction
├── state (ParserState)
│   └── loc (SourceLocation)
│       ├── lineno = 1
│       └── offset = 24
├── inst (Instruction)
│   ├── desc (InstructionDesc)
│   │   ├── implicit_uses (list)
│   │   │   ├── [0] (Register)
│   │   │   │   ├── id = 58
│   │   │   │   ├── is_physical_register = True
│   │   │   │   └── name = 'RSP'
│   │   │   └── [1] (Register)
│   │   │       ├── id = 66
│   │   │       ├── is_physical_register = True
│   │   │       └── name = 'SSP'
│   │   ├── is_call = True
│   │   └── may_load = True
│   ├── name = 'CALL64m'
│   ├── opcode = 659
│   └── operands (list)
│       ├── [0] (Register)
│       │   ├── id = 49
│       │   ├── is_physical_register = True
│       │   └── name = 'RAX'
│       ├── [1] = 1
│       ├── [2] (Register)
│       ├── [3] = 0
│       └── [4] (Register)
├── data = b'\xff\x10'
└── fixups = []

You can see how there are different LLVM instruction names even though the assembly mnemonic is the same. My hope is that 32-bit ARM also has different LLVM instruction names for direct calls versus indirect calls, but I'm only really familiar with 64-bit ARM.
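
To make the direct versus indirect distinction concrete, here is a small sketch that pulls the instruction names out of a dump like the one above and classifies them. It is only an illustration, not the code in _mc_utils.py: the helper names and the name sets are hypothetical, the sets contain only the three x86-64 names that appear in the output above, and the parsing assumes the tree layout is exactly as printed there. Names for 32-bit ARM would have to be discovered the same way, by running mcasm on ARM input.

```python
import re
from typing import List

# LLVM instruction names taken from the x86-64 dump above.
# Other targets need their own entries.
DIRECT_CALLS = {"CALL64pcrel32"}
INDIRECT_CALLS = {"CALL64r", "CALL64m"}

# Matches only the "name" field directly under "inst (Instruction)";
# more deeply nested names (registers, symbols, fixups) start with a
# different prefix and are skipped.
_INST_NAME_RE = re.compile(r"^│\s{3}├── name = '([^']+)'$")

# A trimmed stand-in for the mcasm output shown above.
SAMPLE = """\
├── inst (Instruction)
│   │   │   │   └── name = 'RSP'
│   ├── name = 'CALL64pcrel32'
│           │   └── name = 'direct'
├── inst (Instruction)
│   ├── name = 'CALL64r'
│           └── name = 'RAX'
├── inst (Instruction)
│   ├── name = 'CALL64m'
"""


def instruction_names(dump: str) -> List[str]:
    """Return the LLVM instruction name of each emitted instruction."""
    names = []
    for line in dump.splitlines():
        match = _INST_NAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def classify_call(name: str) -> str:
    """Classify an LLVM instruction name as a direct or indirect call."""
    if name in DIRECT_CALLS:
        return "direct"
    if name in INDIRECT_CALLS:
        return "indirect"
    return "unknown"


if __name__ == "__main__":
    for name in instruction_names(SAMPLE):
        print(name, classify_call(name))
```

Returning "unknown" for unlisted names keeps the sketch honest about its coverage: a name that has not been checked against real mcasm output is never guessed to be direct or indirect.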

jranieri-grammatech · 2023-12-15T02:04:33Z

Another thing I've noticed is that mc-asm will probably need a change to expose the ISA-specific MCExprs used in fixups. For example, there's important data missing when parsing this assembly:

        MOVS r0, #:upper8_15:#foo
        LSLS r0, r0, #8
        ADDS r0, #:upper0_7:#foo
        LSLS r0, r0, #8
        ADDS r0, #:lower8_15:#foo
        LSLS r0, r0, #8
        ADDS r0, #:lower0_7:#foo

... but I'm not familiar enough with 32-bit ARM to know if these relocations are commonly used or not.
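
For context on what that sequence computes, the sketch below simulates it in Python. It assumes the usual meaning of the four operators, namely that `:upper8_15:`, `:upper0_7:`, `:lower8_15:`, and `:lower0_7:` select bits 31-24, 23-16, 15-8, and 7-0 of the symbol's address respectively. The function names and the example address are made up for illustration. It shows why each operator's relocation data matters: drop any one of the four and the rebuilt address is wrong.

```python
from typing import Dict

MASK32 = 0xFFFFFFFF


def split_address(addr: int) -> Dict[str, int]:
    """Split a 32-bit address into the four byte-sized relocation values."""
    return {
        "upper8_15": (addr >> 24) & 0xFF,  # bits 31-24
        "upper0_7": (addr >> 16) & 0xFF,   # bits 23-16
        "lower8_15": (addr >> 8) & 0xFF,   # bits 15-8
        "lower0_7": addr & 0xFF,           # bits 7-0
    }


def run_sequence(addr: int) -> int:
    """Simulate the MOVS/LSLS/ADDS sequence with r0 as a 32-bit register."""
    chunk = split_address(addr)
    r0 = chunk["upper8_15"]                   # MOVS r0, #:upper8_15:#foo
    r0 = (r0 << 8) & MASK32                   # LSLS r0, r0, #8
    r0 = (r0 + chunk["upper0_7"]) & MASK32    # ADDS r0, #:upper0_7:#foo
    r0 = (r0 << 8) & MASK32                   # LSLS r0, r0, #8
    r0 = (r0 + chunk["lower8_15"]) & MASK32   # ADDS r0, #:lower8_15:#foo
    r0 = (r0 << 8) & MASK32                   # LSLS r0, r0, #8
    r0 = (r0 + chunk["lower0_7"]) & MASK32    # ADDS r0, #:lower0_7:#foo
    return r0


if __name__ == "__main__":
    foo = 0x12345678  # made-up address for the symbol foo
    assert run_sequence(foo) == foo
```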

adrianherrera · 2024-06-25T21:54:58Z

Hello! Was looking at using GTIRB-rewriting on some 32-bit binaries and stumbled across this thread.

Reading through this thread, it seems that ARM32 is not 100% implemented. But it seems like x86 is? If so, could we please merge in the x86 support? That would be grand!

jranieri-grammatech · 2024-07-08T17:11:29Z

@jkrshnmenon, is there any update on this? I can dig up a CLA for you to sign to get at least the 32-bit x86 support merged if you think that's ready.

jkrshnmenon · 2024-07-08T18:11:39Z

@jranieri-grammatech Apologies for the lack of communication here. But I think the 32-bit x86 support is ready to get merged.
I can try running more tests some time soon, but unfortunately I'm a bit busy until the end of July.
I can sign the CLA any time though.

jranieri-grammatech · 2024-10-23T18:58:29Z

> @jranieri-grammatech Apologies for the lack of communication here. But I think the 32-bit x86 support is ready to get merged. I can try running more tests some time soon, but unfortunately I'm a bit busy until the end of July. I can sign the CLA any time though.

@jkrshnmenon The CLA has been added to the repository and contains instructions about where to submit it.

jkrshnmenon · 2024-10-31T20:28:40Z

@jranieri-grammatech Thank you for letting me know. I will run it by my employer just to make sure everything is in order before signing it.

7nightingale · 2024-11-12T08:23:13Z

> I've managed to implement the x86 32-bit ELF support and all the test-cases do pass. I've also made some progress on the ARM end.
>
> The only thing that I'm missing from your list is the part about updating gtirb_rewriting/assembler/_mc_utils.py. I could not find any documentation about the LLVM instructions and would appreciate if you could point me the right direction.
>
> The code is available here

"Hi, I noticed there's been some progress on 32-bit ARM support in the project, which I've reviewed with interest. Could you provide an update on the current status? If I’d like to contribute further improvements, are there specific areas or issues I should focus on?"
