Adding support for 32-bit architectures #10

Open
jkrshnmenon opened this issue Dec 5, 2023 · 11 comments

@jkrshnmenon

Hi,

I was looking into using gtirb-rewriting along with ddisasm on some 32-bit applications (x86 and ARM). I saw that ddisasm supports both of these architectures, but gtirb-rewriting does not.

I see that an ABI class exists that is intended for the x86_32 architecture, but it doesn't seem to be used anywhere.

How much effort do you expect it would take to implement support for 32-bit x86 and ARM applications?
If it is a reasonable amount, I'd like to give it a shot if I can get some guidance on what needs to be done.

Looking forward to hearing from you.

@jranieri-grammatech
Collaborator

Thanks for the interest! That class is used for 32-bit PE support, but as you've noticed there is no corresponding ELF implementation and no 32-bit ARM support at all.

Relatively speaking, adding a new ABI is straightforward:

  • Add a new ABI subclass in gtirb_rewriting/abi.py and make sure it's registered at the bottom of the file.
  • Add/update tests in tests/test_abi.py.
  • Update tests/test_calls.py to ensure that CallPatch does the right thing for your ABI, potentially also updating CallPatch.
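
As a rough illustration of the "subclass plus registration" pattern in the list above, here is a minimal, self-contained sketch. It does not use gtirb-rewriting's real classes: `SketchABI`, `register_abi`, `lookup_abi`, and the field names are all hypothetical, and the actual base class in gtirb_rewriting/abi.py may expose different members. The values shown assume the i386 System V calling convention as commonly implemented on Linux (eax, ecx, and edx are caller-saved, and the stack is kept 16-byte aligned at call sites).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class SketchABI:
    """Hypothetical stand-in for an ABI description (not the real class)."""

    name: str
    pointer_size: int  # in bytes
    stack_alignment: int  # in bytes, at call sites
    caller_saved: FrozenSet[str]


# Hypothetical registry keyed by (ISA, file format), both lower-cased.
_REGISTRY: Dict[Tuple[str, str], SketchABI] = {}


def register_abi(isa: str, file_format: str, abi: SketchABI) -> None:
    """Record an ABI so it can be found for a given module kind."""
    key = (isa.lower(), file_format.lower())
    if key in _REGISTRY:
        raise ValueError(f"ABI already registered for {key}")
    _REGISTRY[key] = abi


def lookup_abi(isa: str, file_format: str) -> SketchABI:
    """Return the registered ABI, or fail loudly if none exists."""
    key = (isa.lower(), file_format.lower())
    try:
        return _REGISTRY[key]
    except KeyError:
        raise NotImplementedError(f"no ABI registered for {key}") from None


# Registration happens once, at import time (assumed i386 System V values).
register_abi(
    "ia32",
    "elf",
    SketchABI(
        name="x86-32 ELF (sketch)",
        pointer_size=4,
        stack_alignment=16,
        caller_saved=frozenset({"eax", "ecx", "edx"}),
    ),
)
```

With this shape, an unsupported combination such as `lookup_abi("arm", "elf")` raises `NotImplementedError` until a matching ABI is registered, which mirrors the gap described in this thread.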

For 32-bit ARM, there's a little bit more to do because we don't have any support for it yet:

  • Add a new CallPatch implementation in gtirb_rewriting/patches/call_patch.py
  • Update gtirb_rewriting/assembler/_mc_utils.py to know which instructions are indirect calls (note that these are LLVM instruction names, not the ISA names).
  • Double-check that the Assembler works correctly for the 32-bit ARM variant you care about. It's possible that we'll need a more specific target triple in gtirb_rewriting/utils.py than 'arm'.

Just a heads up, I'm currently inquiring internally about how to accept an outside contribution to this repository and it will probably require you to sign a CLA.

@jkrshnmenon
Author

Thank you for your response.
Let me spend some time on this and see if I can get the 32-bit x86 ELF support ready first.
I'll keep you posted on this thread about progress or issues.

@jkrshnmenon
Author

I've managed to implement the x86 32-bit ELF support and all the test cases pass.
I've also made some progress on the ARM end.

The only thing that I'm missing from your list is the part about updating gtirb_rewriting/assembler/_mc_utils.py.
I could not find any documentation about the LLVM instruction names and would appreciate it if you could point me in the right direction.

The code is available here

@jranieri-grammatech
Collaborator

That code is used to determine whether a call instruction has a known target or is indirect. You can use mc-asm's command-line interface to print the LLVM instruction names that get used for a given assembly input.

Here's an example for x86-64:

$ echo "call direct; call rax; call qword ptr [rax]" | python3 -m mcasm --syntax=intel --target=x86_64-pc-linux --filter=emit_instruction -
⚡️ emit_instruction
├── state (ParserState)
│   └── loc (SourceLocation)
│       ├── lineno = 1
│       └── offset = 1
├── inst (Instruction)
│   ├── desc (InstructionDesc)
│   │   ├── implicit_uses (list)
│   │   │   ├── [0] (Register)
│   │   │   │   ├── id = 58
│   │   │   │   ├── is_physical_register = True
│   │   │   │   └── name = 'RSP'
│   │   │   └── [1] (Register)
│   │   │       ├── id = 66
│   │   │       ├── is_physical_register = True
│   │   │       └── name = 'SSP'
│   │   └── is_call = True
│   ├── name = 'CALL64pcrel32'
│   ├── opcode = 661
│   └── operands (list)
│       └── [0] (SymbolRefExpr)
│           ├── location (SourceLocation)
│           │   ├── lineno = 1
│           │   └── offset = 6
│           ├── symbol (Symbol)
│           │   └── name = 'direct'
│           └── variant_kind = SymbolRefExpr.VariantKind.None_
├── data = b'\xe8\x00\x00\x00\x00'
└── fixups (list)
    └── [0] (Fixup)
        ├── kind_info (FixupKindInfo)
        │   ├── bit_size = 32
        │   ├── is_pc_rel = 1
        │   └── name = 'reloc_branch_4byte_pcrel'
        ├── offset = 1
        └── value (BinaryExpr)
            ├── lhs (SymbolRefExpr)
            │   ├── location (SourceLocation)
            │   │   ├── lineno = 1
            │   │   └── offset = 6
            │   ├── symbol (Symbol)
            │   │   └── name = 'direct'
            │   └── variant_kind = SymbolRefExpr.VariantKind.None_
            ├── opcode = BinaryExpr.Opcode.Add
            └── rhs (ConstantExpr)
                └── value = -4

⚡️ emit_instruction
├── state (ParserState)
│   └── loc (SourceLocation)
│       ├── lineno = 1
│       └── offset = 14
├── inst (Instruction)
│   ├── desc (InstructionDesc)
│   │   ├── implicit_uses (list)
│   │   │   ├── [0] (Register)
│   │   │   │   ├── id = 58
│   │   │   │   ├── is_physical_register = True
│   │   │   │   └── name = 'RSP'
│   │   │   └── [1] (Register)
│   │   │       ├── id = 66
│   │   │       ├── is_physical_register = True
│   │   │       └── name = 'SSP'
│   │   └── is_call = True
│   ├── name = 'CALL64r'
│   ├── opcode = 662
│   └── operands (list)
│       └── [0] (Register)
│           ├── id = 49
│           ├── is_physical_register = True
│           └── name = 'RAX'
├── data = b'\xff\xd0'
└── fixups = []

⚡️ emit_instruction
├── state (ParserState)
│   └── loc (SourceLocation)
│       ├── lineno = 1
│       └── offset = 24
├── inst (Instruction)
│   ├── desc (InstructionDesc)
│   │   ├── implicit_uses (list)
│   │   │   ├── [0] (Register)
│   │   │   │   ├── id = 58
│   │   │   │   ├── is_physical_register = True
│   │   │   │   └── name = 'RSP'
│   │   │   └── [1] (Register)
│   │   │       ├── id = 66
│   │   │       ├── is_physical_register = True
│   │   │       └── name = 'SSP'
│   │   ├── is_call = True
│   │   └── may_load = True
│   ├── name = 'CALL64m'
│   ├── opcode = 659
│   └── operands (list)
│       ├── [0] (Register)
│       │   ├── id = 49
│       │   ├── is_physical_register = True
│       │   └── name = 'RAX'
│       ├── [1] = 1
│       ├── [2] (Register)
│       ├── [3] = 0
│       └── [4] (Register)
├── data = b'\xff\x10'
└── fixups = []

You can see that there are different LLVM instruction names (CALL64pcrel32, CALL64r, CALL64m) even though the assembly mnemonic is the same. My hope is that 32-bit ARM also has different LLVM instruction names for direct calls versus indirect calls, but I'm only really familiar with 64-bit ARM.
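
The distinction in the output above can be sketched as a simple lookup on the LLVM instruction name. This is a hypothetical, self-contained illustration rather than the actual contents of _mc_utils.py: `CallKind`, `classify_call`, and the table layout are assumptions, and only the three x86-64 names printed above are included. Names for 32-bit ARM would need to be collected the same way, by running mcasm on sample call instructions.

```python
from enum import Enum


class CallKind(Enum):
    DIRECT = "direct"  # target is a symbol or immediate known at assembly time
    INDIRECT = "indirect"  # target comes from a register or memory


# LLVM instruction names taken from the mcasm output in this thread.
# Other x86-64 call forms and all ARM names are deliberately left out.
_X86_64_CALLS = {
    "CALL64pcrel32": CallKind.DIRECT,  # call direct
    "CALL64r": CallKind.INDIRECT,  # call rax
    "CALL64m": CallKind.INDIRECT,  # call qword ptr [rax]
}


def classify_call(llvm_name: str) -> CallKind:
    """Classify a call by its LLVM instruction name (hypothetical helper)."""
    try:
        return _X86_64_CALLS[llvm_name]
    except KeyError:
        raise ValueError(f"unknown call instruction: {llvm_name!r}") from None
```

Raising on unknown names, instead of guessing, makes a missing table entry for a new architecture show up immediately in tests.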

@jranieri-grammatech
Collaborator

Another thing I've noticed is that mc-asm will probably need a change to expose the ISA-specific MCExprs used in fixups. For example, there's important data missing when parsing this assembly:

        MOVS r0, #:upper8_15:#foo
        LSLS r0, r0, #8
        ADDS r0, #:upper0_7:#foo
        LSLS r0, r0, #8
        ADDS r0, #:lower8_15:#foo
        LSLS r0, r0, #8
        ADDS r0, #:lower0_7:#foo

... but I'm not familiar enough with 32-bit ARM to know if these relocations are commonly used or not.
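
For context on what those operators encode: each one selects a single byte of the symbol's 32-bit address, and the MOVS/LSLS/ADDS sequence rebuilds the full address one byte at a time. The sketch below models that arithmetic in Python. The helper names and the sample address are made up for illustration, and nothing here uses mc-asm or reflects how such a fixup would be represented there.

```python
def split_address(addr: int) -> dict:
    """Return the byte each relocation operator would select from addr."""
    if not 0 <= addr <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return {
        "upper8_15": (addr >> 24) & 0xFF,  # bits 31..24
        "upper0_7": (addr >> 16) & 0xFF,  # bits 23..16
        "lower8_15": (addr >> 8) & 0xFF,  # bits 15..8
        "lower0_7": addr & 0xFF,  # bits 7..0
    }


def rebuild_address(parts: dict) -> int:
    """Model the instruction sequence, keeping r0 to 32 bits."""
    r0 = parts["upper8_15"]  # MOVS r0, #:upper8_15:#foo
    for operator in ("upper0_7", "lower8_15", "lower0_7"):
        r0 = (r0 << 8) & 0xFFFFFFFF  # LSLS r0, r0, #8
        r0 = (r0 + parts[operator]) & 0xFFFFFFFF  # ADDS r0, #:<operator>:#foo
    return r0


foo = 0x0804A1C0  # made-up address standing in for the symbol foo
assert rebuild_address(split_address(foo)) == foo
```

Because every operand depends on the final address of foo, a rewriter has to preserve which operator was applied to which symbol, which is the data the comment above says is currently lost.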

@adrianherrera

Hello! Was looking at using GTIRB-rewriting on some 32-bit binaries and stumbled across this thread.

Reading through this thread, it seems that ARM32 is not 100% implemented. But it seems like x86 is? If so, could we please merge in the x86 support? That would be grand!

@jranieri-grammatech
Collaborator

@jkrshnmenon, is there any update on this? I can dig up a CLA for you to sign to get at least the 32-bit x86 support merged if you think that's ready.

@jkrshnmenon
Author

@jranieri-grammatech Apologies for the lack of communication here. But I think the 32-bit x86 support is ready to get merged.
I can try running more tests some time soon, but unfortunately I'm a bit busy until the end of July.
I can sign the CLA any time though.

@jranieri-grammatech
Collaborator

@jkrshnmenon The CLA has been added to the repository and contains instructions about where to submit it.

@jkrshnmenon
Author

@jranieri-grammatech Thank you for letting me know. I will run it by my employer just to make sure everything is in order before signing it.

@7nightingale

Hi, I noticed there's been some progress on 32-bit ARM support in the project, which I've reviewed with interest. Could you provide an update on the current status? If I'd like to contribute further improvements, are there specific areas or issues I should focus on?
