index.json
[{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"Documentation Documentation is an important aspect of LIEF and since the beginning of the project, I have spent a decent amount of time keeping comprehensive and intuitive documentation.\nUsually, I don\u0026rsquo;t include documentation updates in the changelog but in this case, I thought it could be worth sharing this experience.\nLIEF is written in C++ with bindings for Python and Rust. Originally, the documentation was driven by languages API (isolated from each other) and generated by Sphinx with the Breathe plugin to reference the C++ Doxygen domain.\nRecently, Rust landed in the arena. Compared to Python and C++, the Rust language embeds a built-in documentation engine to process and generate in-code documentation into html pages.\nGiven the new Rust bindings and the Rust built-in documentation engine, two questions emerged:\nDo we want to add (yet) another API page for Rust? How do we reference Rust API in Sphinx? For the first point, I moved from a language-driven documentation structure to a functionality-driven structure. This changes the way the documentation is consumed: Instead of looking for format-specific API for a given language (e.g. Python) you look first for what you want to do (ELF processing, Dyld shared cache parsing) and then you access the language API you are interested in.\nSo instead of adding another Rust API reference page, the Rust API has been transparently integrated with the new layout.\nThe second point has been a bit more tricky to approach. 
With a reverse engineering background, I really value the cross-reference feature provided by Sphinx:\n1blah blah blah :py:class:`lief.ELF.Binary` another blah: :cpp:class:`LIEF::ELF::Binary` Python is a built-in domain supported by Sphinx and Breathe extension is doing the bridge between C++ Doxygen XML files and Sphinx. For Rust, there are some attempts to create a bridge but I decided to take another path. I created a Rust-sphinx domain that cross-references to the official or nightly Rust documentation. Basically with this custom domain, the following cross-references redirect to the official or nightly documentation:\n1:rust:module:`lief::assembly` 2:rust:enum:`lief::assembly::Instructions` Is translated into:\n1https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/assembly/index.html 2https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/assembly/enum.Instructions.html By doing so, we can leverage Sphinx\u0026rsquo;s cross-reference functionalities while still keeping the built-in Rust documentation. In addition to this Rust-specific domain, I created a .. lief-api:: directive that can pack similar cross-language API into a single block.\nFor instance, this directive:\n1.. lief-api:: lief.Binary.disassemble() 2 3 :rust:method:`lief::generic::Binary::disassemble [trait]` 4 :rust:method:`lief::generic::Binary::disassemble_symbol [trait]` 5 :rust:method:`lief::generic::Binary::disassemble_address [trait]` 6 :rust:method:`lief::generic::Binary::disassemble_slice [trait]` 7 :cpp:func:`LIEF::Binary::disassemble` 8 :py:meth:`lief.Binary.disassemble` 9 :py:meth:`lief.Binary.disassemble_from_bytes` Is rendered as:\nThis allows us to refer the API for different languages without being too verbose and impacting readability. Combined with Sphinx substitution, we can write:\nThis is an example that cross-reference |lief-disassemble| .. 
|lief-disassemble| lief-api:: lief.Binary.disassemble() :rust:method:`lief::generic::Binary::disassemble [trait]` :rust:method:`lief::generic::Binary::disassemble_symbol [trait]` :rust:method:`lief::generic::Binary::disassemble_address [trait]` :rust:method:`lief::generic::Binary::disassemble_slice [trait]` :cpp:func:`LIEF::Binary::disassemble` :py:meth:`lief.Binary.disassemble` :py:meth:`lief.Binary.disassemble_from_bytes` You can go checking out this page https://lief.re/doc/latest/formats/pe/index.html to see a concrete rendering of these changes.\nExtended Features Public Release\nThe extended version is now publicly available at this address: https://extended.lief.re Assembler \u0026amp; Disassembler Adding (or not adding) a disassembler in LIEF has been a long-standing question and with the extended version, I found a fair trade-off:\nLIEF core focuses on executable formats, free from any extra features that might have a significant impact on the build complexity or library size. On the other hand, LIEF extended provides additional functionalities that require a more complex build pipeline and increase the binary size. 
Among these extended functionalities, there are a disassembler and an assembler based on LLVM\u0026rsquo;s MC layer.\nThe disassembling API is provided at different levels:\nLIEF::Binary 1import lief 2 3pe = lief.PE.parse(\u0026#34;cmd.exe\u0026#34;) 4for inst in pe.disassemble(0x400000): 5 print(inst) 6 7 # Instruction semantic 8 print(inst.is_syscall) 9 print(inst.is_memory_access) 10 print(inst.is_call) 11 12 # Instruction operands (for AArch64 and x86-64) 13 if isinstance(inst, lief.assembly.aarch64.Instruction): 14 for idx, operand in enumerate(inst.operands): 15 match operand: 16 case lief.assembly.aarch64.operands.Register(): 17 print(f\u0026#34;OP[{idx}] -- REG: {operand.value}\u0026#34;) 18 case lief.assembly.aarch64.operands.Memory(): 19 print(f\u0026#34;OP[{idx}] -- MEM: {operand.base} {operand.offset}\u0026#34;) 20 case lief.assembly.aarch64.operands.PCRelative(): 21 print(f\u0026#34;OP[{idx}] -- PCR: {operand.value}\u0026#34;) 22 case lief.assembly.aarch64.operands.Immediate(): 23 print(f\u0026#34;OP[{idx}] -- IMM: {operand.value}\u0026#34;) LIEF::dwarf::Function 1import lief 2 3elf = lief.ELF.parse(\u0026#34;my-dbg.elf\u0026#34;) 4dwarf: lief.dwarf.DebugInfo = elf.debug_info 5func: lief.dwarf.Function = dwarf.find_function(\u0026#34;my_debug_function\u0026#34;) 6 7for inst in func.instructions: 8 print(inst) LIEF::dsc::DyldSharedCache 1import lief 2 3cache = lief.dsc.load(\u0026#34;ios-18/\u0026#34;) 4for inst in cache.disassemble(0x1886f4a44): 5 print(inst) In terms of implementation, the disassembler wraps a lazy iterator that evaluates/disassembles an instruction only when the iterator is processed. It means that you don\u0026rsquo;t pay any overhead until you access the iterator\u0026rsquo;s value:\n1# O(0) 2inst = macho.disassemble(0x400000) 3 4inst = macho.disassemble(0x400000) 5# O(10) 6for _ in range(10): 7 next(inst) The .end() sentinel of the iterator is based on two properties:\nEither a range is specified (e.g. 
macho.disassemble(0x400000, /*size*/0x1000)) and the iterator is past the end of the range. The instruction can\u0026rsquo;t be disassembled. This kind of sentinel allows us to use this API: macho.disassemble(0x400000) which will disassemble (lazily) instructions at the address 0x400000 until it fails.\nC++ \u0026 Rust \u0026 Python\nThe disassembler/assembler API is uniformly available in Rust, C++, and Python. Capstone? Nyxstone? As stated in the documentation, the major design difference with Capstone is that LIEF uses a mainstream version of LLVM with limited patches1 on the MC layer (the current version is based on LLVM 19.1.2).\nThe design difference with Nyxstone is that LLVM is hidden from the public API, which means that it does not require an LLVM version to be pre-installed on the system. Moreover, it exposes opcodes and control-flow/semantic information about the instructions.\nOn the other hand, LIEF does not provide a standalone API to disassemble arbitrary instructions. The disassembler engine is bound to the object from which the API is exposed.\nAssembler In association with a disassembler, LIEF exposes a (basic) assembly API that allows generating and patching instructions:\n1import lief 2 3elf = lief.ELF.parse(\u0026#34;my-android-obfuscated.so\u0026#34;) 4text = elf.get_section(\u0026#34;.text\u0026#34;) 5# Disassembler 6syscall = [inst for inst in elf.disassemble(bytes(text)) if inst.is_syscall] 7 8# Assembler 9for syscall_inst in syscall: 10 new_bytes = elf.assemble(syscall_inst.address, \u0026#34;nop;\u0026#34;) # Assemble AND patch 11 print(new_bytes.hex(\u0026#34;, \u0026#34;)) Warning\nIn this current version, the assembler is working pretty well for x86/x86_64 and AArch64 but might break on other architectures. In addition, llvm::MCFixup are not supported. This can be used to patch LIEF\u0026rsquo;s binary object directly at the assembly level. 
I have some plans to provide LIEF Binary context to the assembly engine so that if the binary defines a function call_me() that is either exported or present in the debug info, users would be able to leverage this function at the assembly level:\n1fn patch_with_context(macho: \u0026amp;mut lief::macho::Binary) { 2 macho.assemble(0x140000090, r#\u0026#34; 3 adrp x0, call_me; 4 add x0, x0, :lo12:call_me; 5 mov x1, 0x90; 6 str x1, [x0]; 7 \u0026#34;#); 8} And LIEF would handle the relocation/resolution process to instruct LLVM about the location and the definition of call_me.\nC++ \u0026 Rust \u0026 Python\nThe disassembler/assembler API is seamlessly available in Rust, C++, and Python :) Dyld Shared Cache Initial support for processing Apple\u0026rsquo;s Dyld shared cache with LIEF has been released along with an API to deoptimize in-cache Dylib. The API looks like this:\n1import lief 2 3cache = lief.dsc.load(\u0026#34;ios-18.1/\u0026#34;) 4for dylib in cache.libraries: 5 print(f\u0026#34;0x{dylib.address:016x} {dylib.path}\u0026#34;) 6 # Extract the dylib as a regular lief.MachO.Binary 7 macho: lief.MachO.Binary = dylib.get() Warning\nPlease note that the deoptimization feature is not working well on all the shared cache libraries. This support is going to be improved over time. One could also use this API to diff two shared caches:\n1use lief; 2let ios_17 = lief::dsc::load_from_path(\u0026#34;ios-17.7.1\u0026#34;); 3let ios_18 = lief::dsc::load_from_path(\u0026#34;ios-18.1.1\u0026#34;); 4 5let libraries_17: HashSet\u0026lt;String\u0026gt; = ios_17.libraries() 6 .map(|lib| lib.path()) 7 .collect(); 8 9let libraries_18: HashSet\u0026lt;String\u0026gt; = ios_18.libraries() 10 .map(|lib| lib.path()) 11 .collect(); 12 13println!(\u0026#34;{:?}\u0026#34;, libraries_17.symmetric_difference(\u0026amp;libraries_18)); Rust Rust bindings got their first mutable functions which are listed in the changelog. 
These mutable functions are limited but they allow us to make basic modifications like adding a library or patching assembly code:\n1fn add_library(elf: \u0026amp;mut lief::elf::Binary) { 2 elf.add_library(\u0026#34;libtest.so\u0026#34;); 3 elf.write(\u0026#34;patched.elf\u0026#34;); 4} 1fn patch_asm(macho: \u0026amp;mut lief::macho::Binary) { 2 macho.assemble(0x100004090, r#\u0026#34; 3 mov x0, x16; 4 br x0; 5 \u0026#34;#); 6 macho.write(\u0026#34;patched.macho\u0026#34;); 7} In addition, the support for the x86_64-unknown-linux-musl target triple is now available and the minimal GLIBC version for x86_64-unknown-linux-gnu has been lowered to 2.28. It means that Linux Rust bindings can now run on Debian 10, Ubuntu 19.10, \u0026hellip; while before it required Debian 11 or Ubuntu 20.04.\nThe new x86_64-unknown-linux-musl triple can be used to generate fully static executables without any dependency on libstdc++, libc, ....\nFor instance, given this code:\n1use lief; 2use lief::generic::Section; 3 4fn main() { 5 let path = std::env::args().last().unwrap(); 6 let mut file = std::fs::File::open(path).expect(\u0026#34;Can\u0026#39;t open the file\u0026#34;); 7 8 if let Some(lief::Binary::PE(pe)) = lief::Binary::from(\u0026amp;mut file) { 9 for section in pe.sections() { 10 println!( 11 \u0026#34;{:20}: [0x{:016x}-0x{:016x}]\u0026#34;, 12 section.name(), 13 section.virtual_address(), 14 section.virtual_address() + section.virtual_size() as u64 15 ); 16 } 17 } 18} We can generate a dependency-free executable by running:\n1$ cargo build [--release] --target x86_64-unknown-linux-musl 1$ ldd target/x86_64-unknown-linux-musl/release/reader 2 statically linked 1$ target/x86_64-unknown-linux-musl/release/reader steam.exe 2.text : [0x0000000000001000-0x00000000002cbe53] 3.rdata : [0x00000000002cc000-0x00000000003a7fa2] 4.data : [0x00000000003a8000-0x000000000043ada0] 5.rsrc : [0x000000000043b000-0x0000000000471b8c] 6.reloc : [0x0000000000472000-0x0000000000490c74] Python Bindings LIEF 
is now using nanobind v2.4.0 which improves the support for typing.\nAmong these typing improvements, C++ enum flags are now properly inheriting from enum.Flag which results in a better interface with Python code.\nAdditionally, typing stub files (*.pyi) are now generated with nanobind\u0026rsquo;s built-in stubgen.py instead of mypy.\nFinal Words Additional changes are listed in the detailed changelog.\nMany thanks to dornstetter and kohnakagawa for their feedback about the dyld shared cache feature.\nThank you also to Konstantin Vinogradov and dctoralves for their sponsorship.\nAll the patches have been submitted as PRs to LLVM. You can check LIEF \u0026amp; LLVM for the details\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1733788800,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1733788800,"objectID":"1494404171927e7e7a85cc70eb089b74","permalink":"https://lief.re/blog/2024-12-10-lief-0-16-0/","publishdate":"2024-12-10T00:00:00Z","relpermalink":"/blog/2024-12-10-lief-0-16-0/","section":"blog","summary":"LIEF 0.16.0 is out. This blog post highlights important changes and features","tags":null,"title":"LIEF v0.16.0","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"While this new release adds new functionalities and addresses different bugs, it is worth mentioning that it is the first release to officially expose Rust bindings! 
In addition, an extended version was also released to provide additional functionalities not strictly related to the executable formats.\nRust bindings As discussed in these blog posts:\nLIEF Rust bindings updates Rust bindings for LIEF LIEF is now available in Rust for the following architectures:\naarch64-unknown-linux-gnu x86_64-apple-darwin x86_64-pc-windows-msvc (MT/MD runtimes) x86_64-unknown-linux-gnu aarch64-apple-ios aarch64-apple-darwin I published the release on crates.io so you should be able to start using LIEF in Rust with:\n1[package] 2name = \u0026#34;lief-demo\u0026#34; 3version = \u0026#34;0.0.1\u0026#34; 4edition = \u0026#34;2021\u0026#34; 5 6[dependencies] 7lief = \u0026#34;0.15.0\u0026#34; LIEF Extended LIEF is now providing additional features thanks to an extended version. Among those features, it provides support for DWARF and PDB debug formats as well as Objective-C metadata.\nObjective-C This support is a kind of spin-off of iCDump which is now completely integrated into LIEF. Compared to the original iCDump project, it fixes the issue with the new chained relocations (c.f. issue#4) format and can be used on all the platforms supported by LIEF (including Windows) in C++/Rust/Python:\nRust:\n1let macho: lief::macho::Binary; 2 3if let Some(metadata) = macho.objc_metadata() { 4 println!(\u0026#34;Objective-C metadata found\u0026#34;); 5 for class in metadata.classes() { 6 println!(\u0026#34;name={}\u0026#34;, class.name()); 7 for method in class.methods() { 8 println!(\u0026#34; method.name={}\u0026#34;, method.name()); 9 } 10 } 11} Python:\n1import lief 2macho: lief.MachO.Binary = ... 
3metadata: lief.objc.Metadata = macho.objc_metadata 4 5if metadata is not None: 6 print(\u0026#34;Objective-C metadata found\u0026#34;) 7 8 for clazz in metadata.classes: 9 print(f\u0026#34;name={clazz.name}\u0026#34;) 10 for meth in clazz.methods: 11 print(f\u0026#34; method.name={meth.name}\u0026#34;) 12 13 # Generate a header like \u0026#34;class-dump\u0026#34; 14 print(metadata.to_decl()) DWARF \u0026amp; PDB Supporting debug formats like DWARF or PDB has been a long-standing discussion (c.f. issue #17). The main reasons to avoid supporting these formats from scratch were:\nThe maintenance effort There already exist libraries to process these debug formats: pyelftools for DWARF LLVM (DWARF \u0026amp; PDB) gimli (DWARF) On the other hand, I do understand the need to be able to process debug info (if present) from a LIEF binary object. While looking at the API of the different existing projects, I noticed that they are very good at exposing a low-level API that matches the debug format specifications but they don\u0026rsquo;t provide1 some kind of abstraction over the complexity of these specifications.\nDevelopers and reverse engineers have concepts of compilation units, functions, global variables, stack variables, etc., but before being able to access this information from a DWARF or a PDB file, you need to go through what a PDB DBI stream is or understand that the address of a function in DWARF can be determined by either DW_AT_entry_pc or DW_AT_low_pc.\nThe idea behind the support of the DWARF and PDB formats in LIEF is to:\nBridge concepts that make sense to the developers/reverse engineers with their concrete specifications in DWARF/PDB Have a (documented) C++ API and bindings for Python/Rust. 
This LIEF bridge is based on LLVM which did the heavy job of supporting DWARF \u0026amp; PDB within a single framework.\nThe DWARF \u0026amp; PDB support in LIEF leverages the LLVM API to abstract concepts as listed above.\nFor instance, you can iterate over all the PDB\u0026rsquo;s public symbols of the ntoskrnl.pdb through:\n1import lief 2 3ntoskrnl: lief.pdb.DebugInfo = lief.pdb.load(\u0026#34;./ntoskrnl.pdb\u0026#34;) 4 5for sym in ntoskrnl.public_symbols: 6 print(f\u0026#34;{sym.demangled_name}: 0x{sym.RVA:06x}\u0026#34;) If the PDB embeds extended information about the compilation units we can do (in Rust):\n1let pdb = lief::pdb::load(\u0026#34;peacecannary.pdb\u0026#34;); 2for cu in pdb.compilation_units() { 3 for func in cu.functions() { 4 if func.name().starts_with(\u0026#34;peacecannary::CObfuscator\u0026#34;) { 5 println!(\u0026#34;{}: {} (0x{:04x})\u0026#34;, cu.module_name(), func.name(), func.rva()); 6 } 7 } 8} The API for the DWARF format is pretty similar:\n1import lief 2 3elf: lief.ELF.Binary = ... 
4# If the binary embeds DWARF debug info in the ELF: 5dwarf: lief.dwarf.DebugInfo = elf.debug_info 6# Otherwise: 7dwarf: lief.dwarf.DebugInfo = lief.dwarf.load(\u0026#34;my_dwarf.dwarf\u0026#34;) 8 9for cu in dwarf.compilation_units: 10 print(f\u0026#34;Produced by: {cu.producer} in {cu.compilation_dir}\u0026#34;) 11 12 for func in cu.functions: 13 print(f\u0026#34;0x{func.address:04x}: {func.name} ({func.size} bytes)\u0026#34;) 14 15 for var in cu.variables: 16 if var.is_constexpr: 17 continue 18 # Look for global variables only 19 if var.address is not None and var.address \u0026gt; 0: 20 print(f\u0026#34;0x{var.address:04x}: {var.linkage_name} ({var.size} bytes)\u0026#34;) For more details about the API, you can take a look at these dedicated sections:\nDWARF PDB Other Updates Mach-O AI LIEF is now powered by AI supporting Apple *.hwx files which are some kind of Mach-O file for the Apple Neural Engine (ANE).\nThese *.hwx start with a new magic identifier: 0xbeefface and embed custom LC_ command like the command 0x40\nLC Command 0x40\nI could be interested in adding the support of this private command in LIEF so if anyone already reversed or has some info about the layout of this command, feel free to reach out. 
To support unknown or non-public LC commands in LIEF, I created an artificial LIEF::MachO::UnknownCommand which is a placeholder for any Mach-O commands that are not recognized by LIEF.\nFor instance, we can inspect the private 0x40 command as follows:\n1import lief 2macho = lief.MachO.parse(\u0026#34;personsemantics-u8-v4.H16.espresso.hwx\u0026#34;).at(0) 3lc_0x40: lief.MachO.UnknownCommand = macho.commands[18].command 4 5print(lc_0x40.original_command) # Outputs 0x40/64 6print(bytes(lc_0x40.data)) # Print the raw content of the command These .hwx files have been involved in the Dopamine jailbreak and you can also find a BlackHat presentation about the Apple Neural Engine: Apple Neural Engine Internal.\nPE Authenticode LIEF can inspect and verify the PE Authenticode and with this release, we can even do that in Rust!\n1use lief::pe; 2 3let mut file = std::fs::File::open(path).expect(\u0026#34;Can\u0026#39;t open the file\u0026#34;); 4if let Some(lief::Binary::PE(pe)) = lief::Binary::from(\u0026amp;mut file) { 5 let result = pe.verify_signature(pe::signature::VerificationChecks::DEFAULT); 6 if result.is_ok() { 7 println!(\u0026#34;Valid signature!\u0026#34;); 8 } else { 9 println!(\u0026#34;Signature not valid: {}\u0026#34;, result); 10 } 11 return ExitCode::SUCCESS; 12} This new release also adds the support of the Ms-CounterSignture attribute (OID: 1.3.6.1.4.1.311.3.3.1) and some other attributes like Ms-ManifestBinaryID (OID: 1.3.6.1.4.1.311.10.3.28)\nELF No breaking updates for the ELF format.\nLIEF is now able to parse and modify binaries compiled with the new DT_RELR and DT_ANDROID_REL_ relocations.\nI also added the helper: LIEF::ELF::Binary::get_relocated_dynamic_array which allows us to get a relocated view of the DT_INIT_ARRAY/DT_FINI_ARRAY.\nThis can be useful when \u0026ndash; for instance \u0026ndash; the init array values are null because of relocations:\n1import lief 2 3elf: lief.ELF.Binary = ... 4 5# Return: [0, 0, 0, 0, ...] 
6elf.get(lief.ELF.DynamicEntry.TAG.INIT_ARRAY).array 7 8# Return relocated values: [0x96db10, 0x9b9c14, 0xe7f660, 0xe7f70c, ...] 9elf.get_relocated_dynamic_array(lief.ELF.DynamicEntry.TAG.INIT_ARRAY) Enums Since the beginning of LIEF, all the enums used by the different formats were located in a single header file (e.g. LIEF/PE/enums.hpp or lief.PE.{enums, ...} in Python). Some of them were clashing with system headers that also #define some of these enums.\nTo work around this issue, we had a dirty hack based on LIEF/{ELF,PE,MachO}/undef.h that undefines these values before being included.\nIn LIEF 0.15.0 the scope of the enums has been redefined so that we should no longer need the undef.h.\nFor instance, the standalone enum LIEF::ELF::ELF_SECTION_TYPES (or lief.ELF.SECTION_TYPES) has been re-scoped in the LIEF::ELF::Section class:\n1// \u0026lt;LIEF/ELF/Section.hpp\u0026gt; 2class LIEF_API Section : public LIEF::Section { 3 enum class TYPE : uint64_t { 4 SHT_NULL = 0, /**\u0026lt; No associated section (inactive entry). */ 5 PROGBITS = 1, /**\u0026lt; Program-defined contents. */ 6 ... 7 }; 8}; This means that instead of using LIEF::ELF::ELF_SECTION_TYPES::SHT_PROGBITS or lief.ELF.SECTION_TYPES.SHT_PROGBITS you should now use:\n1- LIEF::ELF::ELF_SECTION_TYPES::SHT_PROGBITS 2+ LIEF::ELF::Section::TYPE::PROGBITS 3 4- lief.ELF.SECTION_TYPES.SHT_PROGBITS 5+ lief.ELF.Section.TYPE.PROGBITS The enums affected by this change are listed in the changelog.\nPerformances PE Parser I received some feedback about performance issues in the latest release (0.14.x) compared to former releases. 
This regression affects Mach-O and PE binaries and I\u0026rsquo;m happy to say that this v0.15.0 release should be faster on ELF, PE, and Mach-O compared to previous releases.\nThe PE regression comes from the LIEF::PE::OptionalHeader::computed_checksum introduced in LIEF 0.12.0 and discussed in this issue: #660.\nAs of LIEF 0.12.0, this computed_checksum was computed during the parsing phase, and on large binaries, this computation might have a significant impact on the performances. In LIEF 0.15.0, the OptionalHeader\u0026rsquo;s checksum can be re-computed over the LIEF::PE::Binary object:\n1import lief 2 3pe: lief.PE.Binary = ... 4computed_checksum = pe.compute_checksum() Thus, avoiding the computation during the parsing phase and moving to an \u0026ldquo;on-demand\u0026rdquo; API.\nMach-O Parser On the other hand, the Mach-O regression was pretty tricky to identify (c.f. issue #1069).\nThe root cause of the regression was these lines:\n1// https://github.com/lief-project/LIEF/blob/0.14.1/src/MachO/BinaryParser.cpp#L285-L290 2for (LARGE_LOOP) { 3 if (!is_printable(name)) { 4 ... 5 } 6} with is_printable implemented as follows:\n1bool is_printable(const std::string\u0026amp; str) { 2 return std::all_of(std::begin(str), std::end(str), 3 [] (char c) { return std::isprint\u0026lt;char\u0026gt;(c, std::locale(\u0026#34;C\u0026#34;)); }); 4} Then, while processing large Mach-O binaries with LIEF we can observe:\nOn Linux: No regression On macOS: REGRESSION On Windows: REGRESSION It turned out that std::locale(\u0026quot;C\u0026quot;) is cached by the STL on Linux but not on macOS \u0026amp; Windows. 
This means that we were invoking std::locale(\u0026quot;C\u0026quot;) for each character of each string (which has a cost).\nOne solution is to store std::locale(\u0026quot;C\u0026quot;) in a static variable as it is done \u0026ndash; under the hood \u0026ndash; in the Linux STL.\n1bool is_printable(const std::string\u0026amp; str) { 2 return std::all_of(std::begin(str), std::end(str), 3- [] (char c) { return std::isprint\u0026lt;char\u0026gt;(c, std::locale(\u0026#34;C\u0026#34;)); }); 4+ [] (char c) { 5+ static std::locale LC(\u0026#34;C\u0026#34;); 6+ return std::isprint\u0026lt;char\u0026gt;(c, LC); 7+ }); 8} The actual fix is slightly different though: 7c3f63194.\nPython Wheels LIEF Python wheels are now available for Musl-based systems. This support is motivated by the fact that Python Docker images tagged with the -alpine suffix use Alpine Linux, which is based on Musl libc.\nThus, we can now use Docker\u0026rsquo;s python-alpine as a base image to install LIEF:\n1FROM python:3.13.0b3-alpine 2 3RUN pip install --no-cache-dir lief==0.15.0 Note that the LIEF Python wheel for Alpine weighs ~2.5MB compressed and ~7MB decompressed.\nFinal Words This new Rust-oriented release is a major milestone for LIEF. 
While the library is widely used in the Python community with ~16,000 daily downloads on PyPI, I\u0026rsquo;m eager to see new use cases or issues brought by the Rust community.\nAs a reminder, there is a Discord channel where you can drop your questions and remarks (that are not issues \u0026#x1f609;).\nThank you also to arttson and lexika979 for their sponsorship.\nWhich makes sense since this is not the purpose of these projects\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":172152e4,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":172152e4,"objectID":"fe5e82b354107e244f5842d160669736","permalink":"https://lief.re/blog/2024-07-21-lief-0.15-0/","publishdate":"2024-07-21T00:00:00Z","relpermalink":"/blog/2024-07-21-lief-0.15-0/","section":"blog","summary":"This blog post introduces the v0.15.0 release","tags":null,"title":"LIEF v0.15.0","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"The Rust bindings for LIEF are getting more and more production-ready for the next official release of LIEF (v0.15.0). This blog post presents the recent updates on these bindings and some use cases.\nDocumentation The Rust documentation for the current bindings is now almost complete, so that most of the functions and structures are documented:\nOne can access the nightly doc at this address: https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/index.html\nNew Architectures Supported As mentioned in the previous blog post, the Rust bindings work with a \u0026ldquo;pre-compilation\u0026rdquo; step. Since the previous blog post, I added the support of iOS (i.e. 
aarch64-unknown-linux-gnu) which gives us this support in LIEF compared to the Rust Platform Support\nRust Tier 1 Support Triplet Support Comment aarch64-unknown-linux-gnu \u0026#x2705; i686-pc-windows-gnu \u0026#x274c; i686-pc-windows-msvc \u0026#x1f937; Could be supported if needed i686-unknown-linux-gnu \u0026#x1f937; Could be supported if needed x86_64-apple-darwin \u0026#x2705; x86_64-pc-windows-gnu \u0026#x274c; x86_64-pc-windows-msvc \u0026#x2705; x86_64-unknown-linux-gnu \u0026#x2705; Rust Tier 2 Support Triplet Support Comment aarch64-apple-ios \u0026#x2705; aarch64-apple-ios-sim \u0026#x1f937; Could be supported if needed aarch64-linux-android \u0026#x23f1;\u0026#xfe0f; Planned aarch64-apple-darwin \u0026#x2705; x86_64-unknown-linux-musl \u0026#x23f1;\u0026#xfe0f; Planned The support for some triplets like i686-pc-windows-msvc will be done on an as-needed basis so feel free to reach out or to open an issue/discussion on GitHub if you need this support.\nUses Cases 1// This code checks the PE Authenticode 2 3let path = std::env::args().last().unwrap(); 4let mut file = std::fs::File::open(path).expect(\u0026#34;Can\u0026#39;t open the file\u0026#34;); 5 6if let Some(lief::Binary::PE(pe)) = lief::Binary::from(\u0026amp;mut file) { 7 let result = pe.verify_signature(pe::signature::VerificationChecks::DEFAULT); 8 if result.is_ok() { 9 println!(\u0026#34;Valid signature!\u0026#34;); 10 } else { 11 println!(\u0026#34;Signature not valid: {}\u0026#34;, result); 12 } 13 return ExitCode::SUCCESS; 14} 15ExitCode::FAILURE 1// This code list all the libraries needed by an ELF binary as well as 2// the versioning of the symbols. 
3// Example of output: 4// Dependencies: 5// - libclang-cpp.so.17 6// - libLLVM-17.so 7// - libstdc++.so.6 8// - libc.so.6 9// Versions: 10// From libc.so.6 11// - GLIBC_ABI_DT_RELR 12// - GLIBC_2.14 13// - GLIBC_2.34 14// - GLIBC_2.32 15// From libstdc++.so.6 16// - GLIBCXX_3.4.29 17// - GLIBCXX_3.4.30 18// From libLLVM-17.so 19// - LLVM_17 20 21let mut args = std::env::args(); 22if args.len() != 2 { 23 println!(\u0026#34;Usage: {} \u0026lt;binary\u0026gt;\u0026#34;, args.next().unwrap()); 24 return ExitCode::FAILURE; 25} 26 27let path = std::env::args().last().unwrap(); 28let mut file = std::fs::File::open(\u0026amp;path).expect(\u0026#34;Can\u0026#39;t open the file\u0026#34;); 29if let Some(lief::Binary::ELF(elf)) = lief::Binary::from(\u0026amp;mut file) { 30 println!(\u0026#34;Dependencies:\u0026#34;); 31 for entry in elf.dynamic_entries() { 32 if let dynamic::Entries::Library(lib) = entry { 33 println!(\u0026#34; - {}\u0026#34;, lib.name()); 34 } 35 } 36 println!(\u0026#34;Versions:\u0026#34;); 37 for version in elf.symbols_version_requirement() { 38 println!(\u0026#34; From {}\u0026#34;, version.name()); 39 for aux in version.auxiliary_symbols() { 40 println!(\u0026#34; - {}\u0026#34;, aux.name()); 41 } 42 } 43 44 return ExitCode::SUCCESS; 45} 46println!(\u0026#34;Can\u0026#39;t process {}\u0026#34;, path); 47ExitCode::FAILURE 1// Inspecting the PE rich header 2 3let path = std::env::args().last().unwrap(); 4let mut file = std::fs::File::open(\u0026amp;path).expect(\u0026#34;Can\u0026#39;t open the file\u0026#34;); 5 6if let Some(lief::Binary::PE(pe)) = lief::Binary::from(\u0026amp;mut file) { 7 let rich_header = pe.rich_header().unwrap_or_else(|| { 8 println!(\u0026#34;Rich header not found!\u0026#34;); 9 process::exit(0); 10 }); 11 12 println!(\u0026#34;Rich header key: 0x{:x}\u0026#34;, rich_header.key()); 13 for entry in rich_header.entries() { 14 println!(\u0026#34;id: 0x{:04x} build_id: 0x{:04x} count: #{}\u0026#34;, 15 entry.id(), entry.build_id(), 
entry.count()); 16 } 17 18 return ExitCode::SUCCESS; 19} 20println!(\u0026#34;Can\u0026#39;t process {}\u0026#34;, path); 21ExitCode::FAILURE 1// Dumping which section of an iOS app is encrypted 2 3let path = std::env::args().last().unwrap(); 4let mut file = std::fs::File::open(\u0026amp;path).expect(\u0026#34;Can\u0026#39;t open the file\u0026#34;); 5 6if let Some(lief::Binary::MachO(fat)) = lief::Binary::from(\u0026amp;mut file) { 7 for macho in fat.iter() { 8 for cmd in macho.commands() { 9 if let lief::macho::Commands::EncryptionInfo(info) = cmd { 10 println!(\u0026#34;Encrypted area: 0x{:08x} - 0x{:08x} (id: {})\u0026#34;, 11 info.crypt_offset(), info.crypt_offset() + info.crypt_size(), 12 info.crypt_id() 13 ) 14 } 15 } 16 } 17 return ExitCode::SUCCESS; 18} ","date":1718496e3,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1718496e3,"objectID":"b42f778cde8701ed59cfa743776df5a0","permalink":"https://lief.re/blog/2024-06-16-rust-update/","publishdate":"2024-06-16T00:00:00Z","relpermalink":"/blog/2024-06-16-rust-update/","section":"blog","summary":"This blog post describes the recent updates in LIEF Rust bindings","tags":null,"title":"LIEF Rust bindings updates","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"LIEF Rust bindings are now available. 
This blog post introduces these bindings and the technical challenges behind this journey.\ntl;dr 1[package] 2name = \u0026#34;lief-demo\u0026#34; 3version = \u0026#34;0.0.1\u0026#34; 4edition = \u0026#34;2021\u0026#34; 5 6[dependencies] 7lief = { git = \u0026#34;https://github.com/lief-project/LIEF\u0026#34;, branch = \u0026#34;main\u0026#34;} 1use lief::Binary; 2 3fn main() { 4 let mut file = File::open(path).expect(\u0026#34;Can\u0026#39;t open the file\u0026#34;); 5 6 match Binary::from(\u0026amp;mut file) { 7 Some(Binary::ELF(elf)) =\u0026gt; { 8 for section in elf.sections() { 9 println!(\u0026#34;{}: 0x{:x}\u0026#34;, section.name(), section.virtual_address()); 10 } 11 }, 12 Some(Binary::PE(pe)) =\u0026gt; { 13 // ... 14 }, 15 Some(Binary::MachO(macho)) =\u0026gt; { 16 // ... 17 }, 18 None =\u0026gt; { 19 // Parsing error 20 } 21 } 22} Nightly documentation is available here: https://lief-rs.s3.fr-par.scw.cloud/doc/latest/lief/index.html and the package will be published on https://crates.io/crates/lief for the 0.15.0 release.\nIntroduction It has been a long journey to have Rust bindings for LIEF, and I\u0026rsquo;m happy to announce that these bindings are starting to be ready for public release.\nI\u0026rsquo;ll take this blog post as an opportunity to share the different challenges that led me to the current design of the bindings. I\u0026rsquo;m not a rust guru, so feel free to share your feedback or suggestions!\nIdiomacy First off, I\u0026rsquo;m attached to have bindings that are idiomatic in the language they target. Reaching the current state of the Rust API took me most of the time during the development. The Rust language introduces new concepts that do not exactly match what we can find in object-oriented languages. 
You can get an idea of the Rust API with these examples.\nIterate over ELF sections 1use lief::Binary; 2use lief::generic::Section; // for the \u0026#34;abstract\u0026#34; traits 3 4let path = std::env::args().last().unwrap(); 5if let Some(Binary::ELF(elf)) = Binary::parse(path.as_str()) { 6 for section in elf.sections() { 7 println!(\u0026#34;{}\u0026#34;, section.name()); 8 } 9} Get PE PDB path 1use lief::Binary; 2use lief::pe::debug::Entries::CodeViewPDB; 3 4if let Some(Binary::PE(pe)) = Binary::parse(path.as_str()) { 5 for entry in pe.debug() { 6 if let CodeViewPDB(pdb_view) = entry { 7 println!(\u0026#34;{}\u0026#34;, pdb_view.filename()); 8 } 9 } 10} Access Mach-O Dyld Info 1use lief::Binary; 2use lief::macho::commands::Commands; 3use lief::macho::binding_info::BindingInfo; 4 5if let Some(Binary::MachO(fat)) = Binary::parse(path.as_str()) { 6 for macho in fat.iter() { 7 8 // First version, iterate over the commands 9 for cmd in macho.commands() { 10 // Alternative to `if let` pattern 11 match cmd { 12 Commands::DyldInfo(dyld_info) =\u0026gt; { 13 for binding in dyld_info.bindings() { 14 if let BindingInfo::Chained(chained) = binding { 15 println!(\u0026#34;Library: 0x{:x}\u0026#34;, chained.address()); 16 } 17 } 18 } 19 _ =\u0026gt; {} 20 } 21 } 22 23 // Second version, using the helper 24 if let Some(dyld_info) = macho.dyld_info() { 25 for binding in dyld_info.bindings() { 26 if let BindingInfo::Chained(chained) = binding { 27 println!(\u0026#34;Library: 0x{:x}\u0026#34;, chained.address()); 28 } 29 } 30 } 31 } 32} Given this idiomatic goal, there were some challenges in exposing C++ code to Rust.\nPolymorphism \u0026amp; Inheritance How to idiomatically bind this C++ code in Rust?\n1class Base { 2 virtual std::string get_name() { 3 return \u0026#34;Base\u0026#34;; 4 } 5}; 6 7class Derived : public Base { 8 virtual std::string get_name() { 9 return \u0026#34;Derived\u0026#34;; 10 } 11}; 12 13class OtherDerived : public Base { 14 virtual std::string 
get_name() { 15 return \u0026#34;OtherDerived\u0026#34;; 16 } 17}; For the inheritance relationship, the idea is to leverage Rust\u0026rsquo;s enum structure in which, all the leaves of the inheritance tree are an entry of the enum:\n1pub enum Inheritance { 2 Derived(Derived), 3 OtherDerived(OtherDerived), 4} Secondly, all these objects inherit and share the get_name() virtual function. To provide this shared property in Rust, we can leverage a Rust trait that would make get_name available for the structures that implement this trait:\n1pub trait AsBase { 2 fn get_name(\u0026amp;self) -\u0026gt; String; 3} 4 5impl AsBase for Derived { 6 fn get_name(\u0026amp;self) -\u0026gt; String { 7 ... 8 } 9} 10 11impl AsBase for OtherDerived { 12 fn get_name(\u0026amp;self) -\u0026gt; String { 13 ... 14 } 15} One can also simplify the definition of the trait such as the derived objects only have to provide the FFI reference to the base class:\n1pub trait AsBase { 2- fn get_name(\u0026amp;self) -\u0026gt; String; 3+ fn as_base(\u0026amp;self) -\u0026gt; ffi::BaseImpl; 4+ 5+ fn get_name(\u0026amp;self) -\u0026gt; String { 6+ self.as_base().get_name().to_string() 7+ } 8} 9 10impl AsBase for Derived { 11- fn get_name(\u0026amp;self) -\u0026gt; String { 12+ fn as_base(\u0026amp;self) -\u0026gt; ffi::BaseImpl { 13 ... 14 } 15} 16 17impl AsBase for OtherDerived { 18 fn get_name(\u0026amp;self) -\u0026gt; String { 19 ... 20 } 21} LIEF\u0026rsquo;s Rust bindings highly rely on these patterns to expose classes with polymorphism and inheritance properties.\nLifetime In C++, we don\u0026rsquo;t have the concept of a lifetime for an object. 
For instance, it\u0026rsquo;s perfectly fine to write this code:\n1int main() { 2 LIEF::PE::Binary* pe = nullptr; 3 { 4 std::unique_ptr\u0026lt;LIEF::PE::Binary\u0026gt; pe_unique = LIEF::PE::Parser::parse(\u0026#34;...\u0026#34;); 5 pe = pe_unique.get(); 6 } 7 printf(\u0026#34;%s\\n\u0026#34;, pe-\u0026gt;get_section(\u0026#34;.text\u0026#34;).name()); // Use-after-free 8 return 0; 9} Nevertheless, the pe pointer used in printf is no longer valid because of the scope of the std::unique_ptr.\nIn Python, Nanobind and Pybind11 provide helpers to define the lifetime of an object according to its parent or its scope:\n1nb::class_\u0026lt;LIEF::PE::Binary\u0026gt;(m, \u0026#34;Binary\u0026#34;) 2 .def_prop_ro(\u0026#34;sections\u0026#34;, 3 nb::overload_cast\u0026lt;\u0026gt;(\u0026amp;Binary::sections), 4 nb::keep_alive\u0026lt;0, 1\u0026gt;()) With nb::keep_alive, we indicate that the PE Binary instance must be kept alive at least as long as the PE section iterator it returned.\nIn Rust, we could express this lifetime with something like:\n1pub struct Iterator\u0026lt;\u0026#39;a\u0026gt; { 2 pub it: ffi::Impl, 3} 4 5impl\u0026lt;\u0026#39;a\u0026gt; Iterator\u0026lt;\u0026#39;a\u0026gt; { 6 pub fn new(it: ffi::Impl) -\u0026gt; Self { 7 Self { 8 it, 9 } 10 } 11} 12 13 14impl Binary { 15 pub fn get_iterator(\u0026amp;\u0026#39;a self) { 16 Iterator::new(self.get_ffi_impl()) 17 } 18} But this code is not correct since the lifetime \u0026lt;'a\u0026gt; of the Iterator structure is not bound to an attribute in the structure. For technical FFI reasons, we can\u0026rsquo;t bind this lifetime to ffi::Impl.\nOne solution consists of using PhantomData to provide the lifetime semantic:\n1pub struct Iterator\u0026lt;\u0026#39;a\u0026gt; { 2 pub it: ffi::Impl, 3 _owner: PhantomData\u0026lt;\u0026amp;\u0026#39;a ffi::PE_Binary\u0026gt;, 4} Safety First! LIEF is developed in what we could call an \u0026ldquo;unsafe\u0026rdquo; language (i.e. C++). 
On the other hand, Rust provides strong guarantees about memory, concurrency, \u0026hellip;\nEven though LIEF\u0026rsquo;s core can\u0026rsquo;t provide the safety guarantees that Rust is giving, I tried to provide some guarantees about the bindings.\nCoverage 65% of the functions exposed by the Rust binding are covered by the test suite and you can access the coverage report here: https://lief-rs.s3.fr-par.scw.cloud/coverage/index.html (nightly generated).\nCoverage\nBy covered I mean: \"the function that bridges from C++ to Rust is executed in the test suite\". ASAN Regarding memory safety, Rust allows compiling packages with ASAN thanks to compiler options:\nexport RUSTFLAGS=\u0026#34;-Z sanitizer=address -Clink-args=-fsanitize=address\u0026#34; export TARGET_CXXFLAGS=\u0026#34;-fsanitize=address -fno-omit-frame-pointer -O1\u0026#34; ... Thus, we also leverage this option to compile both LIEF core and the Rust binding with ASAN.\nGiven the fact that 65% of the functions and 70% of the lines are test-covered, running these tests with ASAN gives us some confidence that the bindings do not introduce leaks or memory issues.\nI don\u0026rsquo;t claim that the code is free of bugs but at least these mechanisms are in place in the development cycle of the project.\nCompilation The bindings rely on autocxx to automatically generate Rust FFI code from existing C++ include files. Autocxx is powerful but it can fail to process complex headers like LIEF/ELF/Binary.hpp. Thus, I had to create some kind of wrapper over the existing LIEF/*.hpp header files so that autocxx can process them. These wrappers are available in the directory api/rust/include/\nThe time to generate the Rust FFI code for the different C++ headers is significant: about 50s with the current bindings. This generation time can be problematic for the end user especially if LIEF is indirectly imported from other dependencies. 
On the other hand, for fixed versions of LIEF, cxxgen, and autocxx, the code generated by cxxgen and autocxx is always the same. Thus, we can pregenerate and precompile these files to save time during the pure-rust compilation step.\nDocker All the different steps mentioned in the previous parts: pre-compilation, ASAN, and code coverage are CI-compiled and fully Dockerized.\nIt might also be worth mentioning that the pre-compiled FFI artifacts are also compiled and cross-compiled with Docker. Yes, cross-compiled.\nCross-Compilation \u0026amp; CI Digression\nFeel free to skip this part which is not strictly related to LIEF \u0026 Rust. LIEF uses GitHub Actions for the CI and from my experience, macOS and Windows runners are less available than Linux runners (i.e. you wait more for these runners). In addition, if you use these runners for a private repository (which is not the case for LIEF), you have a pool of 2000 minutes for the CI of the private repo. Depending on the runner you are using, these minutes are counted with a multiplier1:\nOperating system Minute multiplier Linux 1 Windows 2 OSX 10 1 minute spent on a macOS runner is equivalent to 10 minutes spent on a Linux runner. Hence, if your private project is exclusively using the macOS runner, you don\u0026rsquo;t have 2000 minutes (~33h) but 200 minutes (~3h).\nAnd then, after this pool of 2000 minutes, 1 minute on a macOS 6 vCPU is priced at 0.16$ while the same minute on a Linux 8 vCPU is priced at 0.032$.\nGiven those facts, cross-compiling for macOS and Windows can be interesting. 
LLVM provides all the facilities to perform this cross-compilation2 and since we are only generating static libraries, we don\u0026rsquo;t even need the libraries for these platforms.\nSo yes, LIEF core and the Rust FFI library are cross-compiled for Windows (MT/MD CRT) and OSX (aarch64, x86_64) with a Docker container running on Linux :)\nThe Windows and OSX runners are only used for testing that the cross-compilation worked well (i.e. ld64 and link.exe can link the cross-compiled libraries) and that the test suite is also working.\nLong story short, we save resources and CI minutes by cross-compiling for Windows and OSX in a Docker container running on a Linux runner. As a side effect, we also get fully reproducible builds. The whole pipeline (LIEF core compilation, ASAN, coverage, S3 upload) takes less than 15 minutes (with cache optimizations).\nOther Projects LIEF Rust bindings might not be suitable for all the projects. Especially, if you are looking for a pure-safety-rust library or a #![no_std] context, please consider using these alternatives which are the standard libraries in Rust:\nGoblin: https://github.com/m4b/goblin gimli-rs - object: https://github.com/gimli-rs/object Acknowledgement Thank you to Erynian for the initial introduction of autocxx, back in the days when I was working at Quarkslab \u0026#x1f609;\nhttps://docs.github.com/en/billing/managing-billing-for-github-actions/about-billing-for-github-actions#minute-multipliers\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIncluding the generation of an ad-hoc signature for the Apple Silicon binaries (c.f ld/MachO/SyntheticSections.cpp)\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1714262400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1714262400,"objectID":"2e26408ffc682262592f43592c2ea6f3","permalink":"https://lief.re/blog/2024-04-28-rust/","publishdate":"2024-04-28T00:00:00Z","relpermalink":"/blog/2024-04-28-rust/","section":"blog","summary":"LIEF Rust Bindings","tags":null,"title":"Rust bindings for 
LIEF","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"LIEF v0.14.0 is out, here is an overview of the main changes!\nWhat\u0026rsquo;s new? Python Bindings LIEF v0.14.0 comes with some internal enhancements for the bindings.\nFirst, LIEF now uses nanobind instead of Pybind11. This change is motivated by the fact that nanobind reduces the compilation time while also improving the overall performances of the bindings1.\nThe typing stubs (.pyi) are almost complete. This means that almost all the functions and classes have accurate typing information that is not object or Any.\nFinally, setuptools has been replaced by scikit-build-core as it provides a cleaner API to generate native wheels.\nELF LIEF\u0026rsquo;s ELF module now supports the GNU properties notes and exposes a friendly API to access the underlying properties information. For instance, one can check if AArch64\u0026rsquo;s PAC is used by an ELF binary using the following API:\n1import lief 2 3elf = lief.ELF.parse(\u0026#34;aarch64-binary.elf\u0026#34;) 4prop: lief.ELF.NoteGnuProperty = elf.get(lief.ELF.Note.TYPE.GNU_PROPERTY_TYPE_0) 5aarch64_feat: lief.ELF.AArch64Feature = prop.find(lief.ELF.NoteGnuProperty.Property.TYPE.AARCH64_FEATURES) 6 7if lief.ELF.AArch64Feature.FEATURE.PAC in aarch64_feat.features: 8 print(\u0026#34;PAC is supported!\u0026#34;) In addition, the ELF parser can be tweaked to disable parsing some specific parts of an ELF file. For instance, one can skip parsing the relocations as follows:\n1import lief 2config = lief.ELF.ParserConfig() 3config.parse_relocations = False 4 5# ELF object without relocations information 6elf = lief.ELF.parse(\u0026#34;some-binary.elf\u0026#34;, config) PE As of now, one of the major design issues in LIEF is the enum API. 
Indeed, when I started to develop LIEF, I wanted to have class and enum names as close as possible to the names used in the official documentation.\nBut it turned out that those names are \u0026ndash; sometimes \u0026ndash; already defined with #define in system headers. It means that if we #include a header that already defines one of these names, we have a compilation error:\n1#include \u0026lt;um/winnt.h\u0026gt; // #define IMAGE_FILE_MACHINE_AM33 0x01d3 2#include \u0026lt;LIEF/PE/enums.hpp\u0026gt; // /!\\ Compilation error on IMAGE_FILE_MACHINE_AM33 The current (hacky) workaround for this issue is an undef.h file which #undefs the names that conflict between system definitions and LIEF (c.f. LIEF/PE/undef.h).\nYes, it\u0026rsquo;s a hack and the current ongoing work to address this issue is a complete refactoring of the enums API which starts with a re-scoping. Currently, all the enums are defined in a single header file and some of them are used by only one class.\nFor instance, the enum LIEF::PE::SIG_ATTRIBUTE_TYPES has been re-scoped into LIEF::PE::Attribute:\n1// Before (v0.13.x): LIEF/PE/enums.hpp 2enum class SIG_ATTRIBUTE_TYPES { 3 UNKNOWN = 0, 4 CONTENT_TYPE, 5 ... 6}; 7 8// Now (v0.14.0): LIEF/PE/signature/Attribute.hpp 9class LIEF_API Attribute : public Object { 10 public: 11 enum class TYPE { 12 UNKNOWN = 0, 13 CONTENT_TYPE, 14 ... 15 }; 16} As of LIEF v0.14.0, the PE format is mostly impacted by this refactoring and the other formats should be progressively updated accordingly.\nOngoing Work As a reminder, LIEF is exclusively developed in my spare time, so some functionalities might take time to be completed and integrated.\nRust Bindings This is still ongoing and the bindings are almost complete for ELF, PE, and Mach-O.\nI still need to create the bindings for the enums and figure out a way to reduce the compilation time but it keeps moving!\nDWARF \u0026amp; PDB LIEF will welcome DWARF and PDB debug information support through an external extension. 
This module will provide a comprehensive API to iterate over DWARF \u0026amp; PDB information.\nFinal Word Compared to LIEF 0.13.2, this new version introduces 274 new commits, with 35 292 additions and 39 392 deletions thanks to 15 contributors!\nThe complete changelog is available here: lief-project.github.io/changelog.html#january-20-2024\nThank you also to F., antipatico, and MobSF for their sponsorship.\nhttps://nanobind.readthedocs.io/en/latest/why.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1705795200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1705795200,"objectID":"5217fc7e9b230746183413124f032c6b","permalink":"https://lief.re/blog/2024-01-20-lief-0-14-0/","publishdate":"2024-01-21T00:00:00Z","relpermalink":"/blog/2024-01-20-lief-0-14-0/","section":"blog","summary":"LIEF v0.14.0 release highlights","tags":null,"title":"LIEF v0.14.0","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"LIEF v0.13.0 is finally out! The full changelog is available here, but for those who would like a highlight of the main changes, here they are.\nMach-O in Memory Parser LIEF is now able to parse a Mach-O file from an in-memory pointer:\n1uintptr_t mhdr = ...; // Absolute address to an in-memory Mach-O file 2auto macho = LIEF::MachO::Parser::parse_from_memory(mhdr); This feature can be handy on iOS to access the in-memory content of the binary. Firstly because some parts of the application code are encrypted (thanks to the LC_ENCRYPTION_INFO commands). Secondly, it can be used on protected code that fills the __data segment with clear (original) strings (c.f. 
Gotta Catch \u0026lsquo;Em All: Frida \u0026amp; jailbreak detection).\nThis feature could also be paired with _dyld_get_image_header, once we are injected into the targeted process:\n1size_t count = _dyld_image_count(); 2for (size_t i = 0; i \u0026lt; count; ++i) { 3 llvm::StringRef Name = _dyld_get_image_name(i); 4 if (Name.contains(\u0026#34;MyApp.app/Target\u0026#34;)) { 5 auto* mhdr = _dyld_get_image_header(i); 6 auto macho = MachO::Parser::parse_from_memory((uintptr_t)mhdr); 7 } 8} It might also work to access the dyld shared cache libraries but this aspect has not been heavily tested.\nFramed ELF Sections The ELF format is \u0026ndash; by far \u0026ndash; the trickiest format, especially when it comes to dealing with sections and segments. These ELF structures are two different ways of slicing the binary data:\nSections are used by the compiler and the linker Segments are used by the system loader. LIEF implements a mechanism to deal with this dual representation so that if the user updates the .text section, the changes are also committed in the associated segment (if any).\nIt turns out that in some scenarios1, we might want to NOT commit the changes we are doing on the sections. The lief.ELF.Section.as_frame() function can be used to make the section \u0026ldquo;frame only\u0026rdquo;. All the attributes of the section will be committed in the final ELF binary but LIEF won\u0026rsquo;t consider this section to write the content in the binary.\nOne can use \u0026ndash; for instance \u0026ndash; this function to corrupt section attributes:\n1elf = lief.parse(\u0026#34;/bin/ls\u0026#34;) 2text = elf.get_section(\u0026#34;.text\u0026#34;).as_frame() 3text.offset = 0xffffff 4elf.write(\u0026#34;ls.modified\u0026#34;) As the .text section is set as \u0026ldquo;framed\u0026rdquo;, its 0xffffff offset is not considered for changing its content. Thus this code does not update anything:\n1text.content = ... 
Since the ELF loader relies only on segments, this modification does not affect the execution of the modified binary.\nInternal Changes As announced in the previous v0.12.0 release, LIEF\u0026rsquo;s code base is now free from exceptions and RTTI. This means that the core library can be compiled in an -fno-exceptions context.\nThe Python build process is now compliant with the PEP 621 pyproject.toml requirement. You can find this file, along with config-default.toml, in the api/python directory. The config-default.toml file can be used to tweak the compilation of the bindings. This is essentially an interface over LIEF\u0026rsquo;s CMake options. Lastly, the setup.py file used for compiling the binding has moved from the root directory to api/python.\nOn top of that, the Python bindings also ship stubgen interfaces (.pyi) which are handy for type checking and code completion (cf. issues/650).\nFinal Words Some features like the Rust bindings are not released yet, as they still require some ongoing work. 
I also really do hope to be able to work on enhancing the PE format modification in the next release but this will be balanced with the spare time I have to work on it :)\nThank you for using LIEF!\nhttps://passthesalt.ubicast.tv/videos/the-poor-mans-obfuscator/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1680998400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1680998400,"objectID":"9acc4ef447b5cbcde4f60bd3fdaa686c","permalink":"https://lief.re/blog/2023-04-09-lief-0-13-0/","publishdate":"2023-04-09T00:00:00Z","relpermalink":"/blog/2023-04-09-lief-0-13-0/","section":"blog","summary":"This blog post highlights the main changes in LIEF v0.13.0","tags":null,"title":"LIEF v0.13.0","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":" tl;dr\nThe next release of LIEF (v0.13.0) is fixing several Mach-O layout issues when adding new sections/segments. I also added the support for the two new load commands: LC_DYLD_CHAINED_FIXUPS LC_DYLD_EXPORTS_TRIE The support of LIEF for modifying Mach-O binaries was mostly limited to adding new load commands and thus, extending the load commands table.\nThe tutorial #11 explains the technical details to extend the load commands table which consists in shifting the content right after the load commands table and patching the relocations accordingly.\nNevertheless, the Mach-O binaries generated by LIEF after the modifications were somehow inconsistent regarding codesign. As a consequence, the binaries generated by LIEF could not be signed and executed on iOS or \u0026ndash; more recently \u0026ndash; an Apple M1.\nLIEF is now able to generate Mach-O-modified files that can be signed and that follow a strict layout, enforced by dyld and codesign. 
To better understand what was wrong, let\u0026rsquo;s consider the following script in which we add two new segments:\n1import lief 2 3target = lief.parse(\u0026#34;mbedtls_selftest_arm64.bin\u0026#34;) 4 5segment = lief.MachO.SegmentCommand(\u0026#34;__NEW\u0026#34;, [0] * 0x123) 6target.add(segment) 7 8segment = lief.MachO.SegmentCommand(\u0026#34;__NEW\u0026#34;, [0] * 0x456) 9target.add(segment) 10 11target.write(\u0026#34;test.out\u0026#34;) Under the hood, LIEF was relocating the binary to add two new LC_SEGMENT commands and was allocating space at the end of the file to store the content of the new segments. In particular, the new segments\u0026rsquo; data was located after the content of the __LINKEDIT segment, which breaks the layout required by codesign.\nThe following figure depicts the layout of a Mach-O file from the original layout to the layout generated by LIEF v0.13.0.\nIn LIEF v0.13.0 we fixed this inconsistency to make sure that the content of the new segments is located before the content of the __LINKEDIT segment. We can perform this change without breaking the binary as __LINKEDIT is a kind of self-contained blob of data1.\ncodesign requires the __LINKEDIT segment at the end of the file because the signature is appended at the end of the file. Otherwise, codesign would have to perform a relocation process similar to the one done by LIEF.\n__LINKEDIT The __LINKEDIT segment plays an important role in the layout of the Mach-O format and its execution. This segment is used to store information about the exports, the symbols, the relocations, the signature, and more broadly, information used by the dyld loader to load the binary.\nThis segment has a known layout which is described in the following figure:\nThis layout is very strict and its content must follow the same order as mentioned in the previous figure. 
In addition, there are sanity checks that ensure all the __LINKEDIT\u0026rsquo;s chunks are contiguous within the __LINKEDIT content. If the layout is wrong, the executable could run but it will likely not pass the codesign checks.\nThis strict layout can be seen \u0026ndash; at first sight \u0026ndash; as a major hurdle for modifying Mach-O files but since the __LINKEDIT segment is located at the end of the file, we can extend it or shrink it quite easily.\nLIEF v0.13.0 is able to regenerate the content of this segment from the LIEF objects stored in the LIEF::MachO::Binary object. Completely regenerating the __LINKEDIT segment enables advanced modifications like creating exports and adding or removing symbols, as discussed in the next sections.\nLC_DYLD_CHAINED_FIXUPS \u0026amp; LC_DYLD_EXPORTS_TRIE Compared to the ELF and PE formats, the relocations and the exported functions of Mach-O binaries are not wrapped by a table of entries.\nIn the Mach-O format, the relocations are encoded either:\nBy a bytecode located in the LC_DYLD_INFO command By chained fixups located in the LC_DYLD_CHAINED_FIXUPS On the other hand, the exports are encoded in a Trie located either\nIn the LC_DYLD_INFO command In the LC_DYLD_EXPORTS_TRIE LC_DYLD_CHAINED_FIXUPS appeared more recently than the LC_DYLD_INFO command; the differences between them are described in the blog post: How iOS 15 makes your app launch faster.\nThe LC_DYLD_EXPORTS_TRIE has the same structure as LC_DYLD_INFO[Export Trie] but the export information has been moved in this dedicated load command.\nConverting a Mach-O Binary into a Library Converting a binary into a library can be useful to harness a fuzzed binary or to instrument/debug a specific function in a controlled environment (like an unknown cryptography function or a whiteboxed function).\nIn the tutorial #8, we described the process to perform this transformation on an ELF binary and the transformation for a Mach-O binary is a bit 
more straightforward.\nLet\u0026rsquo;s consider the following code:\n1#include \u0026lt;stdint.h\u0026gt; 2#include \u0026lt;stdio.h\u0026gt; 3#include \u0026lt;stdlib.h\u0026gt; 4 5static int X = 1; 6 7int compute() { 8 return X++; 9} 10 11int main(int argc, const char** argv) { 12 for (size_t i = 0; i \u0026lt; argc; ++i) { 13 printf(\u0026#34;compute(): %d\\n\u0026#34;, compute()); 14 } 15 return 0; 16} It can be compiled with:\n1romain@Mac-M1 % clang -O3 -fvisibility=hidden -Wl,-x -o bin2lib.bin bin2lib.c Which produces this executable: bin2lib.bin\nTo convert this binary into a library, we first need to change its type in the Mach-O\u0026rsquo;s header:\n1import lief 2bin2lib = lief.parse(\u0026#34;bin2lib.bin\u0026#34;) 3 4bin2lib.header.file_type = lief.MachO.FILE_TYPES.DYLIB 5 6bin2lib.write(\u0026#34;bin2lib.dyld\u0026#34;) It should technically be enough, but dyld_info raises some concerns:\n1romain@Mac-M1 % dyld_info ./bin2lib.dylib 2dyld_info: \u0026#39;./bin2lib.dylib\u0026#39; in \u0026#39;./bin2lib.dylib\u0026#39; MH_DYLIB is missing LC_ID_DYLIB This can be confirmed by looking at the source code of dyld.\nTo fix this error, we just have to create a new LC_ID_DYLIB command:\n1import lief 2bin2lib = lief.parse(\u0026#34;bin2lib.bin\u0026#34;) 3 4bin2lib.header.file_type = lief.MachO.FILE_TYPES.DYLIB 5+ bin2lib.add(lief.MachO.DylibCommand.id_dylib(\u0026#34;bin2lib.dylib\u0026#34;, 0, 1, 2)) 6 7bin2lib.write(\u0026#34;bin2lib.dyld\u0026#34;) Which enables us to dlopen bin2lib.dyld\n1import ctypes 2handler = ctypes.cdll.LoadLibrary(\u0026#34;bin2lib.dyld\u0026#34;) 3# \u0026lt;CDLL \u0026#39;./bin2lib.dyld\u0026#39;, handle 208270460 at 0x107d277f0\u0026gt; Adding Symbols Thanks to the improvements on the __LINKEDIT segment, we can now create new exports. 
If we consider the stripped function int compute() from the binary in the previous section, we can create a new export as follows:\n\u003c?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?\u003e address = 0x100003f18 original.add_exported_function(address, \u0026#34;_compute\u0026#34;) Code Injection Another use case of these improvements is the capability to inject code in Mach-O file and to re-sign the modified binary. Code signing is not required for x86-64 binaries but it becomes mandatory when targeting the arm64 architecture.\nLet\u0026rsquo;s consider the library _heapq.cpython-39-darwin.so which is one of the first libraries dynamically loaded by the Python interpreter. The injection consists in:\nCreating new segments in the library _heapq.cpython-39-darwin.so that will embed our shellcode Changing the address of one of the exported functions to redirect the execution to the shellcode\u0026rsquo;s entrypoint. By running the python interpreter with the environment variable DYLD_PRINT_APIS=1 we can observe the following output:\n1romain@Mac-M1 ~ % DYLD_PRINT_APIS=1 python3 -c \u0026#34;import io\u0026#34; 2dyld[76439]: _dyld_is_memory_immutable(0x1b3f8cea0, 26) =\u0026gt; 1 3dyld[76439]: dlopen(\u0026#34;/opt/homebrew/Cellar/[email protected]/3.9.5/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload/_heapq.cpython-39-darwin.so\u0026#34;, 0x00000002) 4dyld[76439]: dlopen(_heapq.cpython-39-darwin.so) =\u0026gt; 0x208f35800 5dyld[76439]: dlsym(0x208f35800, \u0026#34;PyInit__heapq\u0026#34;) 6dyld[76439]: dlsym(\u0026#34;PyInit__heapq\u0026#34;) =\u0026gt; 0x104bcb824 It suggests that PyInit__heapq is a suitable function for redirecting the execution to the shellcode\u0026rsquo;s entrypoint. 
To create the shellcode, we can use gdelugre/shell-factory, developed by a former colleague, which provides nothing less than a C++ STL-like library to create shellcode.\nThanks to this project, we can create the following shellcode:\n1volatile uintptr_t ORIGINAL_EP = 0xdeadc0de; 2volatile uintptr_t IMAGEBASE = 0x00c0de; 3using PyInit__heapq_t = void(*)(); 4 5inline uintptr_t imagebase() { 6 /* 7 * The value of IMAGEBASE is set by the injector. 8 * After the patch, it contains the relative virtual address of \u0026amp;IMAGEBASE 9 * in the final binary. 10 */ 11 return reinterpret_cast\u0026lt;uintptr_t\u0026gt;(\u0026amp;IMAGEBASE) - IMAGEBASE; 12} 13 14SHELLCODE_ENTRY 15{ 16 uintptr_t base = imagebase(); 17 Pico::printf(\u0026#34;LIEF says hello!\\n\u0026#34;); 18 Pico::printf(\u0026#34;Time to jump on the real function: %p\\n\u0026#34;, ORIGINAL_EP); 19 auto PyInit__heapq = reinterpret_cast\u0026lt;PyInit__heapq_t\u0026gt;(base + ORIGINAL_EP); 20 return PyInit__heapq(); 21} Pico::printf\nThe attentive reader may have noticed the Pico::printf(\"[...] %p\") which is correctly supported by shell-factory (see: include/pico/format.h) The compiled shellcode can be downloaded here: lief_demo_darwin_arm64.bin. To inject the shellcode in _heapq.cpython-39-darwin.so, we first need to copy the shellcode\u0026rsquo;s segments in the library:\n1shellcode = lief.parse(\u0026#34;lief_demo_darwin_arm64.bin\u0026#34;) 2heapq = lief.parse(\u0026#34;_heapq.cpython-39-darwin.so\u0026#34;) 3 4for segment in shellcode.segments: 5 new_seg_name = segment.name.replace(\u0026#34;__\u0026#34;, \u0026#34;\u0026#34;) 6 new_seg = lief.MachO.SegmentCommand(f\u0026#34;__L{new_seg_name}\u0026#34;, list(segment.content)) 7 8 heapq.add(new_seg) Then, we have to patch the Mach-O exports trie to change the address of PyInit__heapq to the shellcode\u0026rsquo;s entrypoint:\n1shellcode_rva_entry = ... 
2for exp in heapq.dyld_info.exports: 3 if exp.symbol.name != \u0026#34;_PyInit__heapq\u0026#34;: 4 continue 5 6 original = exp.address 7 exp.address = shellcode_rva_entry 8 break Finally, we can rewrite the library:\n1heapq.write(\u0026#34;_heapq.cpython-39-darwin.so.patched\u0026#34;) and sign it:\n1romain@Mac-M1 ~ % codesign -f --verbose -s - _heapq.cpython-39-darwin.so.patched Now when running the Python interpreter, we can observe the execution of the shellcode:\n1romain@Mac-M1 ~ % python3 2LIEF says hello! 3Time to jump on the real function: 0x15f8 4Python 3.9.5 (default, May 3 2021, 19:12:05) 5[Clang 12.0.5 (clang-1205.0.22.9)] on darwin 6Type \u0026#34;help\u0026#34;, \u0026#34;copyright\u0026#34;, \u0026#34;credits\u0026#34; or \u0026#34;license\u0026#34; for more information. 7\u0026gt;\u0026gt;\u0026gt; Injection\nThe script that contains the complete logic of the transformation is available here, and _heapq.cpython-39-darwin.so.patched can be downloaded here. Surprisingly, if we open the patched version of the library (_heapq.cpython-39-darwin.so.patched) in IDA and jump to the symbol _PyInit__heapq, it actually displays this function:\nIDA Version 7.7.211224, January 18, 2022\nThis is the original function and not the function associated with the shellcode, even though the patched library prints LIEF says hello [...]\nOn the other hand, if we get the address of _PyInit__heapq with LIEF:\n1import lief 2patched = lief.parse(\u0026#34;./_heapq.cpython-39-darwin.so.patched\u0026#34;) 3symbol = patched.get_symbol(\u0026#34;_PyInit__heapq\u0026#34;) 4print(hex(symbol.export_info.address)) The result is:\n_PyInit__heapq: 0xf824 Jumping to this address gives a better output (once manually disassembled):\nWe recognize the shellcode\u0026rsquo;s entrypoint function.\nSo why does IDA show a different function, given that the function located at 0xf824 is the one that dyld resolves and executes?\nIDA is confused because Mach-O\u0026rsquo;s symbols can 
be stored in two different commands:\nLC_DYLD_INFO.export_trie / LC_DYLD_EXPORTS_TRIE\nLC_SYMTAB\nLC_DYLD_INFO.export_trie / LC_DYLD_EXPORTS_TRIE are used to store the exported symbols while LC_SYMTAB stores symbols for other purposes.\nThe important point is that the same symbol can be duplicated in these two commands with different addresses.\nIDA gives the priority to the LC_SYMTAB over the exports trie while the Mach-O loader uses the exports trie. The following figure illustrates why it can be confusing:\nActually, I intentionally took a shortcut in the LIEF script that resolves the address of _PyInit__heapq and we can programmatically access these two addresses as follows:\n1import lief 2patched = lief.parse(\u0026#34;./_heapq.cpython-39-darwin.so.patched\u0026#34;) 3symbol = patched.get_symbol(\u0026#34;_PyInit__heapq\u0026#34;) 4+ print(hex(symbol.value)) 5print(hex(symbol.export_info.address)) 6 7+ # 0x15f8 address from the LC_SYMTAB 8 # 0xf824 address from the export trie We can observe a similar issue with BinaryNinja, Ghidra and, to a lesser extent, Radare2.\nBinaryNinja Version 3.0\nGhidra Version 10.1.2 - Jan 26, 2022\nRadare2 Version: 5.6.6 - Mar 22, 2022\n1$ r2 _heapq.cpython-39-darwin.so.patched 2[0x00000000]\u0026gt; aaa 3... 
4[0x00000000]\u0026gt; ia 5 6[Imports] 7nth vaddr bind type lib name 8――――――――――――――――――――――――――――――――― 90 0x000021ec NONE FUNC PyErr_SetString 101 0x00000000 NONE FUNC PyExc_IndexError 112 0x00000000 NONE FUNC PyExc_RuntimeError 123 0x00000000 NONE FUNC PyExc_TypeError 134 0x000021f8 NONE FUNC PyList_Append 145 0x00002204 NONE FUNC PyList_SetSlice 156 0x00002210 NONE FUNC PyModuleDef_Init 167 0x0000221c NONE FUNC PyModule_AddObject 178 0x00002228 NONE FUNC PyObject_RichCompareBool 189 0x00002234 NONE FUNC PyUnicode_FromString 1910 0x00002240 NONE FUNC _PyArg_CheckPositional 2011 0x0000224c NONE FUNC _Py_Dealloc 2112 0x00000000 NONE FUNC _Py_NoneStruct 2213 0x00000000 NONE FUNC dyld_stub_binder 23 24[Exports] 25 26nth paddr vaddr bind type size lib name 27――――――――――――――――――――――――――――――――――――――――――――――――――― 280 0x000015f8 0x000015f8 GLOBAL FUNC 0 _PyInit__heapq On the other hand, the afl command outputs a better result:\n1[0x00000000]\u0026gt; afl 20x000015f8 1 12 sym._PyInit__heapq 30x00001604 6 108 sym._heapq_exec 40x00002238 1 8 fcn.00002238 50x00002220 1 8 fcn.00002220 60x00002250 1 8 fcn.00002250 7... 80x0000f824 1 88 sym.imp._PyInit__heapq Demo Conclusion These changes strengthen LIEF to read and modify Mach-O binaries. It should enable to develop and create new reverse engineering and binary analysis techniques.\nFor those who are interested in Mach-O (and ELF) tricks that could prevent static analysis tools from working correctly, I\u0026rsquo;ll present The Poor Man\u0026rsquo;s Obfuscator at Pass The Salt in July 2022 :)\nIn the general case, we can\u0026rsquo;t insert content between two arbitrary segments as it could break the binary. 
For instance, if the __TEXT segment references variables in the __DATA segment with relative addressing, inserting some data between these two segments will likely break the relative addressing.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1651968e3,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1651968e3,"objectID":"63f90daea71fec85d376b29358b1c8ea","permalink":"https://lief.re/blog/2022-05-08-macho/","publishdate":"2022-05-08T00:00:00Z","relpermalink":"/blog/2022-05-08-macho/","section":"blog","summary":"This blog post describes the enhancements made in LIEF for modifying Mach-O files","tags":null,"title":"Mach-O Support Enhancements","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"We are thrilled to announce that LIEF v0.12.0 is released! You can find the complete changelog here.\nLIEF v0.12.0: What\u0026rsquo;s New? LIEF v0.12.0 is a balanced mix of new features, internal refactoring, and performance improvement.\nNew Features Regarding the new features, we added support for recomputing the PE\u0026rsquo;s rich header and the PE\u0026rsquo;s checksum. The PE\u0026rsquo;s rich header is a well-known-hidden1 feature that can be helpful to fingerprint a PE binary.\nLIEF enables \u0026ndash; since the version v0.7.0 \u0026ndash; to access this part of the PE file with the following API:\n1import lief 2pe_file = lief.parse(\u0026#34;hello.exe\u0026#34;) 3 4rich_header = pe_file.rich_header 5print(f\u0026#34;XOR Key: {rich_header.key}\u0026#34;) 6for e in rich_header.entries: 7 print(f\u0026#34;{e.id}: {e.build_id} {e.count}\u0026#34;) In LIEF v0.12.0, we added two functions:\nLIEF::PE::RichHeader::raw: To generate the rich header blob with or without a xor key. LIEF::PE::RichHeader::hash: To generate the MD5/SHA-1/SHA-256/(\u0026hellip;) of the rich header blob. 
For those who are looking for PE\u0026rsquo;s markers or tracking PE binaries, these two functions could be used to generate a characteristic of the binary, regardless of the xor-key:\n1# [...] 2rich_header = pe_file.rich_header 3 4marker = bytes(rich_header.hash(lief.PE.ALGORITHMS.SHA_1)).hex() Still about the PE format, we added LIEF::PE::OptionalHeader::computed_checksum() which returns the re-computed value of the PE\u0026rsquo;s checksum (LIEF::PE::OptionalHeader::checksum()).\nFor regular binaries, the verification of the OptionalHeader\u0026rsquo;s checksum is not enforced by Windows and the integrity checks are usually deferred to the PE\u0026rsquo;s Authenticode. Nonetheless, verifying the checksum() value with the output of computed_checksum() could help identify binaries that would have been modified after the compilation.\nFinally, we added the support for the PE\u0026rsquo;s delayed imports in LIEF and Luca Moro added the support of the LC_FILESET_ENTRY command in the Mach-O format.\nRefactoring \u0026amp; Performance Improvement We also refactored and enhanced LIEF\u0026rsquo;s internal codebase. Among those changes, we started to get rid of the C++ exceptions as described in this blog post: LIEF RTTI \u0026amp; Exceptions\nWe also introduced a std::span like interface (based on tcbrindle/span) to avoid returning and potentially copying std::vector\u0026lt;uint8_t\u0026gt;. For instance, LIEF::Section::content now uses the span interface. Regarding the Python API, functions or properties that bind a function which returns a span, are now returning a py::memoryview instead of the list of bytes. 
The original list of bytes can be recovered as follows:\n1bin = lief.parse(\u0026#34;/bin/ls\u0026#34;) 2section = bin.get_section(\u0026#34;.text\u0026#34;) 3 4if section is not None: 5 memory_view = section.content 6 list_of_bytes = list(memory_view) Regarding performance, we did a global refactoring of the ELF builder as described in this blog post: New ELF Builder. We also reduced the memory footprint of the ELF parser. For instance, in LIEF v0.11.5 a binary of 1.5G takes 3G of RAM2 while in LIEF v0.12.0, it takes roughly the same amount of memory as the file size.\nEric Kilmer also did a nice and complete cleanup of the LIEF CMake integration.\nIn February 2022, tmp.0ut v2 was released and @netspooky presented interesting tricks on the ELF format 3 4. We fixed the ELF parser to make sure we handle these tricks.\nWhat\u0026rsquo;s Next? We started to implement Rust bindings for LIEF thanks to cxx and google/autocxx. These bindings are in their early stages and we can\u0026rsquo;t confirm they will be present in the next release. In the current development stage, the API looks like this:\n1let path = String::from(\u0026#34;/bin/ls\u0026#34;); 2 3match Binary::parse(\u0026amp;path) { 4 Binary::ELF(elf) =\u0026gt; { 5 println!(\u0026#34;ELF binary\u0026#34;); 6 for segment in elf.segments() { 7 println!(\u0026#34;Address: {:x}\u0026#34;, segment.virtual_address); 8 } 9 }, 10 Binary::PE(pe) =\u0026gt; { 11 println!(\u0026#34;PE binary\u0026#34;); 12 let mut text_section = pe.get_section(\u0026#34;.text\u0026#34;); 13 text_section.name = \u0026#34;.foo\u0026#34;; 14 text_section.file_offset = 0x123; 15 16 text_section.commit(); // Commit the changes 17 }, 18 Binary::MachO(macho) =\u0026gt; { 19 println!(\u0026#34;MachO binary\u0026#34;); 20 for command in macho.commands() { 21 match command { 22 Commands::Dylib(dylib) =\u0026gt; { 23 ... 24 }, 25 Commands::Main(main_cmd) =\u0026gt; { 26 ... 
27 }, 28 } 29 } 30 31 }, 32 Binary::Unknown(x) =\u0026gt; { 33 println!(\u0026#34;Unknown\u0026#34;); 34 }, 35} We will also merge the (still private) branch that enables to parse Mach-O from memory as well as the global improvement of the Mach-O\u0026rsquo;s builder.\nRegarding LIEF\u0026rsquo;s experimentations and work in progress, here is a list of topics on which we are working or we would like to work:\nTopic Status Parsing ELF files from memory Not started yet Parsing DART/Flutter snapshots PoC Creating an ELF from scratch PoC Parsing PE\u0026rsquo;s private authenticode: MS Counter Signature Not started yet Refactoring the PE\u0026rsquo;s builder Not started yet, priority undefined Supporting the Mach-O\u0026rsquo;s commands: LC_DYLD_CHAINED_FIXUPS / LC_DYLD_EXPORTS_TRIE Done, under testing Supporting the archive format (AR) Early stage If you are interested in supporting some of these topics, feel free to reach out.\nEnjoy!\nhttps://www.virusbulletin.com/virusbulletin/2020/01/vb2019-paper-rich-headers-leveraging-mysterious-artifact-pe-format/\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nMore generally, we have a factor 2 in memory compared to the file size. Oups \u0026hellip;\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://tmpout.sh/2/3.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://tmpout.sh/2/14.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1648339200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1648339200,"objectID":"78e4b1f7fb19b205fd17b361c0e8cc0e","permalink":"https://lief.re/blog/2022-03-27-lief-v0-12-0/","publishdate":"2022-03-27T00:00:00Z","relpermalink":"/blog/2022-03-27-lief-v0-12-0/","section":"blog","summary":"LIEF v0.12.0 is out. 
This blog post highlights the main changes and the upcoming features","tags":null,"title":"LIEF v0.12.0","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"try { When we started to develop LIEF, we chose to manage errors through C++ exceptions, as is widespread in Java. However, with a little hindsight, it was not the best choice in the design of LIEF.\nFirst off, LIEF is a library and the API functions that throw exceptions are not compatible with library users who do not use exceptions (e.g. with the -fno-exceptions flag). It is also considered bad practice, quoting from C++ Coding Standards:\nC++ Coding Standards: Item 62\n“Don't throw stones into your neighbor's garden: There is no ubiquitous binary standard for C++ exception handling.” For instance, the function LIEF::ELF::Binary::get_section(const std::string\u0026amp; name) threw an exception if the section was not found. To avoid raising the exception, the API exposes helpers that can be used to check \u0026ndash; beforehand \u0026ndash; that it will not take the exception path:\n1if (bin.has_section(\u0026#34;.toto\u0026#34;)) { 2 auto\u0026amp; sec = bin.get_section(\u0026#34;.toto\u0026#34;); // Ok no exception 3} 4 5// With exception: 6try { 7 bin.get_section(\u0026#34;.toto\u0026#34;); 8} catch (const std::exception\u0026amp;) { 9 // .toto does not exist :( 10} Actually, the has_\u0026lt;element\u0026gt; / get_\u0026lt;element\u0026gt; pattern hides another issue: performance.\nBasically, has_section(...) iterates over the list of the sections to check if a section with the given name exists and get_section() iterates again over this list to access the section. The code performs the same iteration twice. 
This is not a big deal for the ELF sections as they are quite small but it can be problematic for large sequences like the symbols table.\nIn LIEF v0.12.0 we changed the API of these functions to return a pointer on these objects instead of a reference. If the item can\u0026rsquo;t be found, it returns a nullptr.\nThe API contract of these functions is changing from raising an exception into returning a nullptr. The documentation has been updated accordingly and the list of the functions which have changed are listed here As a result, the previous code can be re-written as follows:\n1if (bin.has_section(\u0026#34;.toto\u0026#34;)) { 2 auto* sec = bin.get_section(\u0026#34;.toto\u0026#34;); // Non nullptr instead of a reference 3} 4 5// Or: 6if (auto* sec = bin.get_section(\u0026#34;.toto\u0026#34;)) { 7 // ... 8} This kind of API change is doable and meaningful for functions that aim at returning an optional object but it is less meaningful to transform a function like:\n1uint64_t Binary::virtual_address_to_offset(...) { ... } that returns an integer (while still potentially raising an exception).\nIn LIEF 0.12.0, this kind of function still raises an exception but in the next version (LIEF v0.13.0) the returned value will be wrapped by Boost\u0026rsquo;s Leaf1 such as the returned type will become:\n1// Future returned type 2result\u0026lt;uint64_t\u0026gt; Binary::virtual_address_to_offset(...) { ... } 3 4// To use it: 5auto res = bin.virtual_address_to_offset(); 6if (!res) { 7 // Error 8} else { 9 uint64_t val = res.value(); // or val = *res 10} In LIEF v0.12.0 only internal/private functions associated with the Parser/Builder module are using this mechanism and we plan to move to this mechanism in the public API2 in LIEF v0.13.0.\nWarning\nBoost LEAF is required in the public headers of LIEF. 
If you find conflicts, compilation issues, integration issues, or you think that it is a bad idea, please let us know before it becomes the default interface to manage errors. The RTTI LIEF also relies on the RTTI information which includes calling functions like typeid() or dynamic_cast\u0026lt;\u0026gt;(). For instance, to check if a Mach-O\u0026rsquo;s LoadCommand exists, the main MachO::Binary class calls at some point this helper:\n1template\u0026lt;class T\u0026gt; 2bool Binary::has_command() const { 3 static_assert(std::is_base_of\u0026lt;LoadCommand, T\u0026gt;::value, 4 \u0026#34;Require inheritance from \u0026#39;LoadCommand\u0026#39;\u0026#34;); 5 6 const auto it_cmd = std::find_if( 7 std::begin(commands_), std::end(commands_), 8 [] (const LoadCommand* command) { 9 return typeid(T) == typeid(*command); 10 }); 11 12 return it_cmd != std::end(commands_); 13} This code generates extra data for the RTTI information of the LoadCommand objects which can be perfectly fine. Actually, this RTTI information is redundant as the type of a Mach-O\u0026rsquo;s LoadCommand is already stored in the class itself:\n1class LoadCommand { 2 ... 3 private: 4 LOAD_COMMAND_TYPES command_; 5}; So instead of having these redundant RTTI, we implemented a LLVM-like RTTI3 based on classof() and that uses the already present command_ attribute. 
In the end, the previous has_command() can be updated as follows:\n1template\u0026lt;class T\u0026gt; 2bool Binary::has_command() const { 3 static_assert(std::is_base_of\u0026lt;LoadCommand, T\u0026gt;::value, 4 \u0026#34;Require inheritance from \u0026#39;LoadCommand\u0026#39;\u0026#34;); 5 const auto it_cmd = std::find_if( 6 std::begin(commands_), std::end(commands_), 7 [] (const LoadCommand* command) { 8 return T::classof(command); 9 }); 10 return it_cmd != std::end(commands_); 11} We applied this pattern for the LIEF\u0026rsquo;s object where typeid was present and as a result, we managed to completely remove this function as it was redundant with an existing attribute.\n} catch (const std::length_error\u0026) { We welcome feedback on these changes \u0026ndash; whether positive or negative \u0026ndash; as it impacts the public API.\nThank you for reading!\n} See the section Error Handling of the documentation for more details.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nIt still keeps the public headers compliant with C++11\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nhttps://llvm.org/docs/HowToSetUpLLVMStyleRTTI.html\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1644710400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1644710400,"objectID":"6ba95e8c883ea8e087f070418b4e48c4","permalink":"https://lief.re/blog/2022-02-13-lief-rtti-exceptions/","publishdate":"2022-02-13T00:00:00Z","relpermalink":"/blog/2022-02-13-lief-rtti-exceptions/","section":"blog","summary":"This blog post explains how and why we started to remove the exceptions and the RTTI in LIEF.","tags":null,"title":"LIEF RTTI \u0026 Exceptions","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"LIEF\u0026rsquo;s Modification Process Let\u0026rsquo;s start with a small recap of the LIEF modification process.\nTo enable executable file formats 
modification, LIEF transforms the raw executable formats into an object representation. This object can be manipulated with an API that is mainly exposed through the following interfaces:\nC++ | Python\nLIEF::ELF::Binary | lief.ELF.Binary\nLIEF::PE::Binary | lief.PE.Binary\nLIEF::MachO::Binary | lief.MachO.Binary\nThen, LIEF\u0026rsquo;s builders take the object representation and (try to) reconstruct an executable according to the user\u0026rsquo;s changes.\nChallenges in Modifying ELF Binaries Compared to the PE and Mach-O formats, the ELF format is by far the trickiest to handle, for both parsing and modifying. First off, there is a strong relationship between the segment\u0026rsquo;s virtual address and the file\u0026rsquo;s offset associated with its content. This relationship is ruled by the following property:\n$$\\text{\\textcolor{red}{file\\_offset}} \\equiv \\text{\\textcolor{blue}{virtual\\_address}} \\mod{\\textcolor{green}{\\text{page\\_size}}}$$ So basically, we can\u0026rsquo;t insert a segment at an arbitrary virtual address.\nThe second difficulty is about the strings table optimization that is performed on the .dynstr section. To understand how this optimization works, let\u0026rsquo;s consider these two functions:\n1int foo() { 2 return 1; 3} 4 5int do_foo() { 6 return foo(); 7} When these functions are compiled, the compiler generates two symbols whose names are referenced by the st_name field. Usually, this field points into the .dynstr section:\n1struct Elf_Sym { 2 Elf_Word st_name; // Offset of the symbol\u0026#39;s name in the .dynstr section 3 ... 
4}; Naively, we could imagine that the .dynstr section contains these two symbol names, one next to the other:\n00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000140 0d 00 00 00 12 00 01 00 00 00 00 00 00 00 00 00 |................| 00000150 0b 00 00 00 00 00 00 00 0a 00 00 00 12 00 01 00 |................| 00000160 0b 00 00 00 00 00 00 00 0b 00 00 00 00 00 00 00 |................| 00000170 00 74 6f 74 6f 2e 63 70 70 00 66 6f 6f 00 64 6f |.toto.cpp.foo.do| 00000180 5f 66 6f 6f 00 00 00 00 10 00 00 00 00 00 00 00 |_foo............| With such a layout, Elf_Sym(\u0026quot;foo\u0026quot;).st_name would point to the offset 0x17A while Elf_Sym(\u0026quot;do_foo\u0026quot;).st_name would point to the offset 0x17E.\nBut the real layout of the .dynstr is a bit smaller:\n00000130 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| 00000140 0d 00 00 00 12 00 01 00 00 00 00 00 00 00 00 00 |................| 00000150 0b 00 00 00 00 00 00 00 0a 00 00 00 12 00 01 00 |................| 00000160 0b 00 00 00 00 00 00 00 0b 00 00 00 00 00 00 00 |................| 00000170 00 74 6f 74 6f 2e 63 70 70 00 64 6f 5f 66 6f 6f |.toto.cpp.do_foo| 00000180 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 |................| 00000190 04 00 00 00 03 00 00 00 fc ff ff ff ff ff ff ff |................| As we can see, it only contains the do_foo string. Since foo is a suffix of do_foo, st_name can point to a different offset of the same string. In this layout, Elf_Sym(\u0026quot;foo\u0026quot;).st_name points to the offset 0x17D and Elf_Sym(\u0026quot;do_foo\u0026quot;).st_name points to 0x17A.\nConsequently, instead of taking the space of len(do_foo) + 1 + len(foo) + 1, it only takes len(do_foo) + 1.\nThe consequence of this optimization is that we can\u0026rsquo;t naively push back the symbol names in the .dynstr section. 
Instead, we have to sort the symbols names such as this optimization can take place.\nIn addition to this strings optimization, ELF object files (.o) generated by Clang share the same section for the names of the sections and for the symbols\u0026rsquo; names.\n1$ readelf -hWS ./hello.o 2 3ELF Header: 4 Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 5 [...] 6 Number of section headers: 11 7 Section header string table index: 1 8 9Section Headers: 10 [Nr] Name Type Address Off Size ES Flg Lk Inf Al 11 [ 0] NULL 0000000000000000 000000 000000 00 0 0 0 12 [ 1] .strtab STRTAB 0000000000000000 000199 000078 00 0 0 1 13 [..] 14 [10] .symtab SYMTAB 0000000000000000 0000c0 000090 18 1 4 8 As we can notice, the Section header string table index of the ELF header indexes the .strtab which is also the section associated with the symbols\u0026rsquo; names (cf. the link attribute of the .symtab).\nIt results that we have to consider this kind of ELF file differently from regular libraries or executables.\nThere are other nasty tricks like the management of the ELF constructors between Linux and Android but this will be covered in another blog post.\nThe New ELF Builder For the historical context, I created LIEF during my internship at Quarkslab with the supervision of Serge-Sans-Paille and Adrien Guinet and the trust/boost from Fred Raynal.\nEven though I had the chance to get valuable feedback and review from them, I clearly made poor design decisions in LIEF and the implementation of the ELF builder is one of them.\nBasically, the implementation is recursive such as in the extreme cases the builder re-computes the same information several times.\nIn the new implementation, we added a new stage in the build process that pre-computes the offsets of the new sections and the data that need to be relocated. This pre-computation enables to know exactly which parts of the ELF structures need to be relocated according to the user\u0026rsquo;s changes. 
This computation is managed by the Layout class which has two implementations depending on whether it is an ELF object or a library/executable.\nCompared to the previous ELF builder, this new implementation produces smaller files (with fewer ELF segments) as shown in the following figure. This figure compares the number of segments between the former and the new implementation:\nIn addition, it handles larger binaries faster as a consequence of the new linear implementation of the ELF builder :)\nTo perform these benchmarks, we generated ELF binaries with the modifications described in the following script:\n1import lief 2 3elf: lief.ELF.Binary = lief.parse(file_path.as_posix()) 4 5# Force relocating the .dynamic/.dynstr 6elf.add_library(\u0026#34;a_very_long_name.so\u0026#34;) 7 8# Force relocating the interpreter 9elf.interpreter = \u0026#34;/a/very/longlonglong/interpreter-1.2.3.bin\u0026#34; 10 11# Force relocating .dynsym / .gnu.hash table 12for i in range(10): 13 elf.add_exported_function(0xdeadc0de + i, f\u0026#34;new_export_{i}\u0026#34;) 14 15# Add a segment 16segment = lief.ELF.Segment() 17segment.type = lief.ELF.SEGMENT_TYPES.LOAD 18segment.content = [0xcc] * 0x23 19 20elf.add(segment) 21 22elf.write(\u0026#34;/tmp/bench.bin\u0026#34;) The raw results of the benchmark are also available here.\nFinal Words These new improvements introduce breaking changes in the ELF binaries generated by LIEF but:\nThe final binary size should be smaller.\nThe building time should be much faster.\nWe tried to cover most of the cases in the test suite but some corner cases with exotic compilers or linkers might break the final binaries.\nSince this improvement aims at being in the next release, feel free to drop an email or to open an issue if you find a bug with this new implementation.\nSince September, I have been maintaining LIEF exclusively in my spare time, so issues and new features are addressed with more delay. 
","date":1642896e3,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1642896e3,"objectID":"ace950ecf166c5acc57659728094c97a","permalink":"https://lief.re/blog/2022-01-23-new-elf-builder/","publishdate":"2022-01-23T00:00:00Z","relpermalink":"/blog/2022-01-23-new-elf-builder/","section":"blog","summary":"After spending months on refactoring the ELF builder, here are the improvements.","tags":null,"title":"New ELF Builder","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":" Tl;DR\nThis blog post is not, strictly speaking, related to LIEF but it aims at completing the previous blog about profiling code with Frida. In particular, it exposes the limits of our approach regarding the Microsoft/Itanium ABI. Long story short, the previous code does not work on Linux/OSX for virtual functions.\nThe previous blog post tried to show a use case of Frida to profile C++ functions. In particular, it exposed what we called a trick to convert a C++ member function into a void*:\n1template\u0026lt;typename Func\u0026gt; 2inline void* cast_func(Func f) { 3 union { 4 Func func; 5 void* p; 6 }; 7 func = f; 8 return p; 9} First, and as noticed by Julien Jorge, writing a union\u0026rsquo;s field and accessing another field of this union is undefined behavior:\nIt\u0026rsquo;s undefined behavior to read from the member of the union that wasn\u0026rsquo;t most recently written. Many compilers implement, as a non-standard language extension, the ability to read inactive members of a union.\nThanks also to the feedback from Julien Jorge, there is another issue when converting a C++ member function into a raw pointer.\nBasically, a member function pointer is not the same kind of pointer as a regular C function. 
While the regular size of a C function pointer is the same as sizeof(void*), the size of a member function pointer is usually greater:\n1struct Foo { 2 void bar() {} 3}; 4 5int main() { 6 printf(\u0026#34;sizeof(\u0026amp;Foo::bar): %zu\\n\u0026#34;, sizeof(\u0026amp;Foo::bar)); 7 return 0; 8} 1$ clang++ sizeof_member.cpp -o sizeof_member 2$ ./sizeof_member 3sizeof(\u0026amp;Foo::bar): 16 The layout of a member function pointer is ABI-specific but, according to LLVM\u0026rsquo;s source code, we can distinguish two ABIs that describe this layout:\nItanium CXX ABI, which is used on Linux, iOS, OSX, Android, \u0026hellip;\nMicrosoft ABI\nFor the Itanium CXX ABI and according to the official documentation, non-virtual functions have the following structure:\n1struct { 2 uintptr_t ptr; 3 ptrdiff_t adj; 4}; where ptr is the address of the function and adj is an offset applied to this in the case of multiple inheritance.\nSo our badly coded casting function cast_func() works as expected for non-virtual functions since we access the first field ptr, which is the function pointer. We can observe these two fields with the following piece of code 1:\n1template\u0026lt;typename Func\u0026gt; 2void print(Func f) { 3 union { 4 Func fcn; 5 struct { 6 uintptr_t ptr; 7 ptrdiff_t adj; 8 }; 9 }; 10 fcn = f; 11 printf(\u0026#34;%016lx | %016lx\\n\u0026#34;, ptr, adj); 12} that outputs this kind of value:\n1struct Foo { 2 void bar() {} 3}; 4 5int main() { 6 print(\u0026amp;Foo::bar); 7 return 0; 8} $ ./show_fields 00005568e19021e0 | 0000000000000000 If bar() were a virtual function, the meaning of the ptr field would be different. Still according to the Itanium CXX ABI, the value of ptr in the case of a virtual function is 1 plus the offset of the function within the v-table. In particular, we can\u0026rsquo;t access the address of the function without the this pointer, since the vtable pointer is stored in the layout of the object itself. 
2\nMicrosoft ABI Regarding the Microsoft ABI, there is not as much documentation compared to the \u0026ldquo;Linux/OSX\u0026rdquo; ABI. LLVM supports this ABI as described in clang/lib/CodeGen/MicrosoftCXXABI.cpp but I was still curious to know (without LLVM) what the layout of a function member pointer looks like. One could look at c1xx.dll/c2.dll located in the Visual Studio directory but these libraries are not straightforward to reverse.\nAlternatively, we can try to infer the layout from the assembly code output. First of all, the result of sizeof() applied to a function member pointer is 16. 16 being twice a pointer\u0026rsquo;s size on a 64-bit architecture, we can start by following the Itanium ABI and confirm or refute our assumptions:\n1struct FuncMemPtr { 2 uintptr_t unknown1; 3 uintptr_t unknown2; 4}; Then we can unpack the fields of the function member pointer with the union trick:\n1struct Base1 { 2 virtual void f() { } 3}; 4 5struct Base2 { 6 virtual void g() {} 7}; 8 9struct Derived2 : Base2, Base1 { 10 virtual void f() {} 11 virtual void g() {} 12 void h() {} 13}; 14 15template\u0026lt;typename Func\u0026gt; 16void info(Func f) { 17 union { 18 Func fcn; 19 struct { 20 uintptr_t unknown1; 21 uintptr_t unknown2; 22 }; 23 }; 24 fcn = f; 25} 26 27int main() { 28 info(\u0026amp;Derived2::h); 29 info(\u0026amp;Derived2::f); 30 return 0; 31} The layout of the non-virtual function Derived2::h() seems to follow the same layout as the Itanium ABI where we find the function pointer in the first field.\nFor the virtual function Derived2::f, we can notice a first memory write that fills the first field with a pointer to a thunk 3 function while the second field contains a constant which matches the value of the this adjustor. 
For the second field (this adjustor), we can switch from \u0026amp;Derived2::f to \u0026amp;Derived2::g to confirm that it changes accordingly to the output of /d1reportAllClassLayout\nThis leads to the following guessing:\n1struct MsvcCXXFuncMember { 2 uintptr_t fnc_ptr; // That can be a thunk for virtual function 3 int adjustor; // int because of mov DWORD and not mov QWORD in this assembly output 4}; These two fields follow the LLVM implementation:\n1struct { 2 // A pointer to the member function to call. If the member function is 3 // virtual, this will be a thunk that forwards to the appropriate vftable 4 // slot. 5 void *FunctionPointerOrVirtualThunk; 6 7 // An offset to add to the address of the vbtable pointer after 8 // (possibly) selecting the virtual base but before resolving and calling 9 // the function. 10 // Only needed if the class has any virtual bases or bases at a non-zero 11 // offset. 12 int NonVirtualBaseAdjustment; 13 14 // The offset of the vb-table pointer within the object. Only needed for 15 // incomplete types. 16 int VBPtrOffset; 17 18 // An offset within the vb-table that selects the virtual base containing 19 // the member. Loading from this offset produces a new offset that is 20 // added to the address of the vb-table pointer to produce the base. 21 int VirtualBaseAdjustmentOffset; 22}; From LLVM, we also learn that the full layout can contain up to four fields. 
We can trigger the third field with the following change:\n1@@ -11,3 +11,3 @@ 2-struct Derived2 : Base2, Base1 { 3+struct Derived2 : Base2, virtual Base1 { 4 virtual void f() {} 5@@ -32 +32,2 @@ 6 } The fourth field is a bit more tricky to trigger and the following code comes from the LLVM test suite 4\n1struct B1 { 2 void foo(); 3 int b; 4}; 5struct B2 { 6 int b2; 7 int v; 8 void foo(); 9}; 10 11struct UnspecWithVBPtr; 12int UnspecWithVBPtr::*forceUnspecWithVBPtr; 13struct UnspecWithVBPtr : B1, virtual B2 { 14 int u; 15 void foo(); 16}; We can notice that the result of sizeof() applied to UnspecWithVBPtr::foo is 24: sizeof(uintptr_t) + 3 * sizeof(int) + padding\nConclusion The profiler described in the first blog post works as expected for non-virtual but does not work with virtual functions that follow the Itanium ABI. To work with virtual functions, we would need to pass an extra parameter to the object that implements the virtual functions. By assuming that the vtable is placed at the beginning of the object\u0026rsquo;s layout, we can support such functions with the following modifications:\n1diff --git a/main.cpp b/main.cpp 2index d30a0c1..65d18eb 100644 3--- a/main.cpp 4+++ b/main.cpp 5 6+struct Foo { 7+ virtual void bar() { 8+ std::cout \u0026lt;\u0026lt; \u0026#34;In bar\u0026#34; \u0026lt;\u0026lt; std::endl; 9+ } 10+ uint8_t x = 1; 11+}; 12+ 13 14@@ -88,9 +95,10 @@ struct Profiler { 15 16- void setup() { 17- PROFILE(LIEF::ELF::Parser::init); 18- PROFILE(LIEF::ELF::Parser::parse_segments\u0026lt;LIEF::ELF::ELF64\u0026gt;); 19+ template\u0026lt;class T\u0026gt; 20+ void setup(const T\u0026amp; obj) { 21+ const uintptr_t vtable = *reinterpret_cast\u0026lt;const uintptr_t*\u0026gt;(\u0026amp;obj); 22+ profile_func(\u0026amp;Foo::bar, \u0026#34;Foo:bar\u0026#34;, vtable); 23 } 24 25@@ -98,8 +106,13 @@ struct Profiler { 26 template\u0026lt;typename Func\u0026gt; 27- void profile_func(Func func, std::string name) { 28+ void profile_func(Func func, std::string 
name, uintptr_t vtable = 0) { 29 void* addr = cast_func(func); 30+ 31+ if (vtable \u0026gt; 0) { 32+ const uintptr_t voff = reinterpret_cast\u0026lt;uintptr_t\u0026gt;(addr) - 1; 33+ addr = *reinterpret_cast\u0026lt;void**\u0026gt;(vtable + voff); 34+ } 35 funcs[reinterpret_cast\u0026lt;uintptr_t\u0026gt;(addr)] = std::move(name); 36 gum_interceptor_begin_transaction (ctx_-\u0026gt;interceptor); 37 gum_interceptor_attach (ctx_-\u0026gt;interceptor, 38@@ -130,8 +143,9 @@ int main(int argc, const char** argv) { 39 return 1; 40 } 41 42+ Foo f; 43 Profiler\u0026amp; prof = Profiler::get(); 44- prof.setup(); 45- LIEF::ELF::Parser::parse(argv[1]); 46+ prof.setup(f); 47+ f.bar(); 48 return 0; 49 } The Microsoft C++ ABI is poorly documented but the LLVM project is a good reference for that. One might also be interested in this presentation (Bringing Clang and LLVM to Visual C++ users) that outlines the challenges for LLVM developers to support this ABI.\nAcknowledgment Thanks to Julien Jorge for proofreading this post and his valuable feedback.\nWhich is still UB\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThere is an exception for the ARM architecture:\nIn the 32-bit ARM representation, the this-adjustment stored in adj is left-shifted by one, and the low bit of adj indicates whether ptr is a function pointer (including null) or the offset of a v-table entry. A virtual member function pointer sets ptr to the v-table entry offset as if by reinterpret_cast\u0026lt;fnptr_t\u0026gt;(uintfnptr_t(offset)). A null member function pointer sets ptr to a null function pointer and must ensure that the low bit of adj is clear; the upper bits of adj remain unspecified.\n\u0026#160;\u0026#x21a9;\u0026#xfe0e; A thunk function is generated by the compiler as a trampoline to the right virtual function. 
This trampoline can also be used to fix this pointer with the given adjustor.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nThe layout of this code goes beyond my understanding\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":161784e4,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":161784e4,"objectID":"a8347144c5c2855bdc158f2bb590c997","permalink":"https://lief.re/blog/2021-04-08-profiling-cpp-code-with-frida-part2/","publishdate":"2021-04-08T00:00:00Z","relpermalink":"/blog/2021-04-08-profiling-cpp-code-with-frida-part2/","section":"blog","summary":"This blog post brings additional information about using Frida to hook function in a static library. It exposes the limits of the approach regarding the different C++ ABI","tags":null,"title":"Profiling C++ code with Frida (2nd part)","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"Frida is a well-known reverse engineering framework that enables (along with other functionalities) to hook functions on closed-source binaries. While hooking is generally used to get dynamic information about functions for which we don\u0026rsquo;t have the source code, this blog post introduces another use case to profile C/C++ code.\nCode Profiling LIEF starts to be quite mature but there are still some concerns regarding:\nThe speed (especially when rebuilding large ELF binaries) The memory consumption Compilation time These limitations are \u0026ldquo;quite\u0026rdquo; acceptable on modern computers but when we target embedded systems like iPhone or Android devices, it starts to reach the limits. Since (spoiler) I started to implement a parser for the Dyld shared cache and for parsing in-memory Mach-O files, I faced some of these issues.\nTo address these problems, we must identify where are the bottleneck and ideally, without modifying too much the source code. 
To profile memory consumption, valgrind --tool=massif does the job pretty well out of the box: we don\u0026rsquo;t need to pass extra compilation flags nor modifying the source code. Regarding the code execution, we can profile it with:\nValgrind (or QBDI?) Inserting log functions in the source code Using compiler instrumentation: -finstrument-functions In the context of profiling LIEF, I\u0026rsquo;m mostly interested in profiling the code at the functions level: \u0026ldquo;How long does a function take to be executed?\u0026rdquo;\nValgrind provides CPU cycles that are somehow correlated to the execution time but it requires an extra processing step to identify the function\u0026rsquo;s overhead. Moreover, since Valgrind instruments the code, it can take time to profile a large codebase.\nOn the other hand, inserting log messages in the code is the easiest way to get the execution time of functions. I was not completely convinced with this solution since it adds log messages that are not always needed.\nFinally, Clang and GCC enable to instrument the source code through the -finstrument-functions compilation flag. This flag basically inserts the __cyg_profile_func_enter and __cyg_profile_func_exit functions at the beginning and at the end of the original functions.\nFrida works on compiled code and provides a mechanism (hook) to insert a callback before a given function and after the execution of the function. It is very similar to the -finstrument-functions, except that it is done post-compilation.\nTo setup a hook, we only have to provide a pointer to the function that aims at being hooked. In the context of profiling execution time, the callback at the beginning of the function can initialize a std::chrono object and the callback at the end of the function can print the time spent since the initialization of the std::chrono.\nLet\u0026rsquo;s take a simple example to explain what Frida does. 
If we have the following function:\n1void heavy_function() { 2 for (size_t i = 0; i \u0026lt; 1000000; ++i) { 3 // Code that takes time ... 4 } 5} Frida enables (from a logical point of view) to have:\n1void heavy_function() { 2 frida_on_enter(); 3 4 for (size_t i = 0; i \u0026lt; 1000000; ++i) { 5 // Code that takes time ... 6 } 7 8 frida_on_leave(); 9} \u0026hellip; without tweaking the compilation flags :)\nFrida Bootstrap Most of the documentation and the blog posts that we can find on the internet about Frida are based on the JavaScript API but Frida also provides in the first place the frida-gum SDK 1 that exposes a C API over the hook engine. This SDK comes with the frida-gum-example.c file that shows how to setup the hook engine.\nRegarding the API of our profiler, we would like to have :\n1#include \u0026lt;LIEF/ELF.hpp\u0026gt; 2 3// Functions to profile 4profile(\u0026amp;LIEF::ELF::Parser::parse_symbol_version); 5profile(\u0026amp;LIEF::ELF::Parser::parse_segments\u0026lt;LIEF::ELF::ELF64\u0026gt;); 6 7LIEF::ELF::Parser::parse(\u0026#34;./sample.bin\u0026#34;); And an output like:\n1$ ./run 2LIEF::ELF::Parser::parse_symbol_version() took 39ms 3LIEF::ELF::Parser::parse_segments() took 109ms I won\u0026rsquo;t go through all the details of the implementation of the profiler since the source code is on Github but the next section covers some tricky parts.\nFirstly, and as mentioned previous section, Frida takes a void* pointer on the function to hook. Therefore, we have to cast \u0026amp;LIEF::ELF::Parser::parse_symbol_version into a void*. One might want to do reinterpret_cast\u0026lt;void*\u0026gt;() on the function pointer but it does not work. 
The trick here is to use a union to get the void*:\n1template\u0026lt;typename Func\u0026gt; 2inline void* cast_func(Func f) { 3 union { 4 Func func; 5 void* p; 6 }; 7 func = f; 8 return p; 9} Secondly, the example frida-gum-example.c uses an enum to identify the function being hooked:\n1typedef enum _ExampleHookId ExampleHookId; 2enum _ExampleHookId 3{ 4 EXAMPLE_HOOK_OPEN, 5 EXAMPLE_HOOK_CLOSE 6}; 7... 8gum_interceptor_attach (interceptor, 9 GSIZE_TO_POINTER (gum_module_find_export_by_name (NULL, \u0026#34;open\u0026#34;)), 10 listener, 11 GSIZE_TO_POINTER (EXAMPLE_HOOK_OPEN)); In our case, we don\u0026rsquo;t know beforehand which functions will be hooked or profiled by the user. Consequently, instead of using an enum we use the function\u0026rsquo;s absolute address and we register its name in a map:\n1template\u0026lt;typename Func\u0026gt; 2void profile_func(Func func, std::string name) { 3 void* addr = cast_func(func); 4 funcs[reinterpret_cast\u0026lt;uintptr_t\u0026gt;(addr)] = std::move(name); 5 gum_interceptor_begin_transaction(ctx_-\u0026gt;interceptor); 6 gum_interceptor_attach(ctx_-\u0026gt;interceptor, 7 /* Target */ reinterpret_cast\u0026lt;gpointer\u0026gt;(addr), 8 /* Param */ reinterpret_cast\u0026lt;GumInvocationListener*\u0026gt;(ctx_), 9 /* id */ reinterpret_cast\u0026lt;gpointer\u0026gt;(addr)); 10 gum_interceptor_end_transaction(ctx_-\u0026gt;interceptor); 11} Last but not least, we might want to profile private or protected functions. To give the Profiler access to protected/private members, we can declare an opaque Profiler structure as a friend:\n1struct Profiler; 2 3namespace LIEF { 4class LIEF_API Parser : public LIEF::Parser { 5 public: 6 friend struct ::Profiler; 7 ... 8}; 9} Conclusion Through this blog post, we have shown that Frida also has some applications in the field of software engineering, not only in reverse engineering :)\nThis approach can be quite convenient to isolate the profiling process from the compilation process. 
It also enables to quickly switch from a given SDK version to another as long as the profiled functions still exist.\n1$ clang++ [-other-flags] LIEF-0.9.0/lib/libLIEF.a profile.cpp 2$ clang++ [-other-flags] LIEF-0.12.0/lib/libLIEF.a profile.cpp The source code used in this blog post is available on Github: lief-project/frida-profiler\nSee frida-gum-devkit-14.2.13-linux-x86_64.tar.xz on https://github.com/frida/frida/releases\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1615334400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1615334400,"objectID":"fa80b9333467808feea8afb386bea4d5","permalink":"https://lief.re/blog/2021-03-10-profiling-cpp-code-with-frida/","publishdate":"2021-03-10T00:00:00Z","relpermalink":"/blog/2021-03-10-profiling-cpp-code-with-frida/","section":"blog","summary":"This blog post introduces a new technique to profile C/C++ code with Frida","tags":null,"title":"Profiling C++ code with Frida","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":" Tl;DR\nLIEF v0.11.1 fixes some issues related to PE Authentihash computation. The new packages are available on PyPI and the SDKs can be downloaded on the official website. Enjoy!\nLIEF 0.11.0 missed handling some cases in the processing of the PE Authentihash. This new release addresses these issues and the following blog post explains the cases we did not handle.\nSection name PE section\u0026rsquo;s names are stored in a fixed char array (8 bytes) which means that a section\u0026rsquo;s name can contain trailing bytes after the null char:\n1struct pe_section { 2 char name[8]; 3 uint32_t RVA; 4 // ... 
5}; Before v0.11.1, LIEF didn\u0026rsquo;t take into account the trailing bytes and stopped reading the section\u0026rsquo;s name at the first null char:\n1this-\u0026gt;name_ = std::string(header-\u0026gt;name, sizeof(header-\u0026gt;name)).c_str(); This implementation has two drawbacks. First, we lose information since we don\u0026rsquo;t store the extra trailing bytes. Regular binaries have zero trailing bytes after the first null char but some of them might use this spot to hide data.\nSecondly, the full section name (i.e. the whole 8 bytes) is used to compute the Authentihash. Therefore, if the first null char is followed by trailing bytes different from zero, the computed hash is inconsistent.\nData directory According to the PE specifications 1 the last entry of the data directory table must contain a null entry (i.e. an entry with an RVA and size set to 0).\nIt turns out that this requirement is not enforced by the loader. In the case of the binary (bc203f2b6a\u0026hellip;) the last entry is set to 0x02b7bc68/0x01a7a0 (used for watermarking?).\nIn the previous versions of LIEF we assumed that the last entry of the data directory table was always zero. Since the last entry is used to compute the Authentihash value, LIEF reported a bad signature even though the signature was actually valid.\nThis issue has been addressed in the commit 3c65ffe\nReturn value of verify_signature() As noticed by Cedric Halbronn in the issue issues/532, the return value of LIEF::PE::Binary::verify_signature lacks information when the verification fails. 
The return value was either VERIFICATION_FLAGS.OK or VERIFICATION_FLAGS.BAD_SIGNATURE because of a fail-fast implementation of the verification flag.\nThe function now returns flags as follows:\nVERIFICATION_FLAGS.BAD_DIGEST | VERIFICATION_FLAGS.BAD_SIGNATURE | VERIFICATION_FLAGS.CERT_EXPIRED Other issues One of the critical issues raised by imidoriya and fixed in the new version is the processing of the overlay data when the \u0026ldquo;data directory signature\u0026rdquo; is located in this area (c.f. 463bb0ec3\u0026hellip;). This kind of layout triggers a memory error on this part of the processing Binary.cpp#L1174-L1187. It has been addressed in the commit 05103f5\nAcknowledgment Thanks to Andrew Williams for providing the different samples that raised some of these errors! Thank you also to Cedric Halbronn and the CERT Gouvernemental of Luxembourg for their feedback about the API.\nhttps://docs.microsoft.com/en-us/windows/win32/debug/pe-format#optional-header-data-directories-image-only\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1613952e3,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1613952e3,"objectID":"75766af7e151ed7790e370e93d7aed06","permalink":"https://lief.re/blog/2021-02-22-lief-0-11-1/","publishdate":"2021-02-22T00:00:00Z","relpermalink":"/blog/2021-02-22-lief-0-11-1/","section":"blog","summary":"This blog post outlines the fixes made in LIEF 0.11.1","tags":null,"title":"LIEF - Release 0.11.1","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":" Tl;DR\nLIEF v0.11.0 is out. The main changelog is available here and packages can be downloaded on the official website. 
Installation As for the previous versions, release packages are available on the Github release page and Python packages can be installed from PyPI:\n1$ pip install [--user] lief==0.11.0 Release Highlight It has spent more than one year since the release of the version 0.10.1 but we are glad to announce that LIEF v0.11.0 is finally out!\nThis new version does not introduce a lot of new features but rather small improvements in the different formats. One of the main changes in terms of new functionalities is the refactoring of the PE Authenticode. We fixed parsing issues and we implemented verification functions so that we can now verify a PE signed binary through:\n1import lief 2pe = lief.parse(\u0026#34;signed.exe\u0026#34;) 3assert pe.verify_signature() == lief.PE.Signature.VERIFICATION_FLAGS.OK We also improved the computation of imphash so that it can generate the same value as pefile (and therefore, Virus Total)\n1pe = lief.parse(\u0026#34;example.exe\u0026#34;) 2vt_imphash = lief.PE.get_imphash(pe, lief.PE.IMPHASH_MODE.PEFILE) 3lief_imphash = lief.PE.get_imphash(pe, lief.PE.IMPHASH_MODE.DEFAULT) Regarding the contributions, Janusz Lisiecki fixed a performance issue in the ELF builder that moved from N2 computations to Nlog(N). His contribution raised a major weakness in LIEF: performances issue when re-building objects. We started to refactor the whole ELF builder to avoid recursive calls.\nAdrien Guinet updated the bin2lib tutorial to support recent version of glibc which introduced the DF_1_PIE flag.\nkohnakagawa and Clcanny also fixed various issues related to the ELF \u0026amp; PE formats.\nNinja on Windows \u0026amp; CI We improved AppVeyor Windows CI to be more efficient on the compiler cache. 
It results in a decrease of 1-hour compilation time to ~20 minutes thanks to sccache and Ninja.\nIf Ninja is installed on Windows, one can now use the --ninja flag when calling setup.py:\n1$ python.exe .\\setup.py --ninja build install [--user] Using Ninja on Windows requires to invoke the vcvarsall.bat script beforehand. This script can be tricky to locate depending on the MSVC versions. Thankfully, setuptools provides the msvc.msvc14_get_vc_env() helper to get the environment variables that need to populate the calling script. We use it in LIEF\u0026rsquo;s setup.py as follows:\n1... 2env = os.environ 3if platform.system() == \u0026#34;Windows\u0026#34;: 4 from setuptools import msvc 5 if build_with_ninja: 6 arch = \u0026#39;x64\u0026#39; if is64 else \u0026#39;x86\u0026#39; 7 ninja_env = msvc.msvc14_get_vc_env(arch) 8 env.update(ninja_env) 9 else: 10 ... 11... Regarding the CI, we added Android and iOS SDK packages as well as Python wheels for Linux AArch64 (manylinux2014 compliant).\nThe nightly builds are available on the gh-pages branch of the repository lief-project/packages:\nThe sdk directory contains a shared and a static version of LIEF library for iOS, macOS, Android, Windows, Linux, \u0026hellip; The lief directory contains the Python wheels for the supported platforms What\u0026rsquo;s next We have a few ideas of what would like to improve and introduce in the next releases of LIEF which includes:\nRefactoring the ELF builder to address performances issues (see also #482)\nSupporting OAT/VDEX/CDEX for Android 9, 10 and 11\nSupporting Mach-O signature (as for PE Authenticode)\nSupporting Android packed relocations (in the parser and in the builder)\nImproving the C API to ease Rust bindings\nSupporting DART snapshot formats to ease reverse-engineering of Flutter applications.\nSpoiler: we can process all the clusters of a snapshot for a fixed version of the DART runtime.\n+= Fixing issues\nAlthough the roadmap mostly follows Quarkslab\u0026rsquo;s 
needs, the R\u0026amp;D time we have and the topic we enjoy to work on, we are open to the development of private or public features as it has been done for improving PE Authenticode.\nAcknowledgment Thank you to CERT Gouvernemental of Luxembourg that sponsored new functionalities in this release. Thanks also to Quarkslab for the time allocated to make this release.\n","date":1611014400,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1611014400,"objectID":"5ccc6e7fc14008e43e5ad1fd84aa8b2d","permalink":"https://lief.re/blog/2021-01-19-lief-0-11-0/","publishdate":"2021-01-19T00:00:00Z","relpermalink":"/blog/2021-01-19-lief-0-11-0/","section":"blog","summary":"This blog post summarizes the changes in LIEF 0.11.0","tags":null,"title":"LIEF - Release 0.11.0","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":"Installation Release packages are available on the Github page and Python package can be installed with:\n1$ pip install [--user] lief==0.9.0 Release highlight Android Formats This new version of LIEF comes with support for Android formats related to the ART runtime: OAT, VDEX, DEX and ART. As the OAT format is a derivation of ELF, it made sense to add it in LIEF. Basically, this format is used by Android to wrap native code being the result of Dalvik bytecode optimization.\nRegarding VDEX, DEX, and ART, these formats have somehow a relation with OAT and therefore we also choose to add them. For more information about these Android formats and how to use them, a tutorial is available in the LIEF documentation: Android Formats.\nWe can currently only parse these formats, but their modification will come step by step in the project. 
Indeed, some attacks are based on the modification of the OAT format as it has been explained by Collin Mulliner in \u0026ldquo;Inside Android’s SafetyNetAttestation: Attack and Defense\u0026rdquo; 1 and \u0026ldquo;How Samsung Secures Your Wallet \u0026amp; How To Break It\u0026rdquo; 2 by Tencent’s Xuanwu Lab. In further version we plan to provide an API to add native code in OAT.\nJSON serialization As one purpose of this project is to provide an API that can be easily integrated in other projects, we are glad to announce that JSON serialization is now available for all LIEF objects. It means that one can now access to format information through a JSON interface. Previous versions had a JSON support for ELF and PE formats, the v0.9 now supports all formats and all objects.\nObjects can be serialized with the lief.to_json function:\n1import lief 2 3gcc = lief.parse(\u0026#34;/usr/bin/gcc\u0026#34;) 4lief.to_json(gcc.header) 5 6{ 7 \u0026#39;entrypoint\u0026#39;: 4209824, 8 \u0026#39;file_type\u0026#39;: \u0026#39;EXECUTABLE\u0026#39;, 9 \u0026#39;header_size\u0026#39;: 64, 10 \u0026#39;identity_class\u0026#39;: \u0026#39;CLASS64\u0026#39;, 11 \u0026#39;identity_data\u0026#39;: \u0026#39;LSB\u0026#39; 12} 13 14libSystem = lief.parse(\u0026#34;/usr/lib/libSystem.dylib\u0026#34;) 15lief.to_json(libSystem.commands[1]) 16 17{ 18 \u0026#39;command\u0026#39;: \u0026#39;SEGMENT\u0026#39;, 19 \u0026#39;command_offset\u0026#39;: 492, 20 \u0026#39;command_size\u0026#39;: 464, 21 \u0026#39;content_hash\u0026#39;: 18446744072658165641, 22 \u0026#39;data_hash\u0026#39;: 1841536728, 23 \u0026#39;file_offset\u0026#39;: 8192, 24 \u0026#39;file_size\u0026#39;: 4096, 25 \u0026#39;flags\u0026#39;: 0, 26 \u0026#39;init_protection\u0026#39;: 3, 27 \u0026#39;max_protection\u0026#39;: 7, 28 \u0026#39;name\u0026#39;: \u0026#39;__DATA\u0026#39;, 29 \u0026#39;numberof_sections\u0026#39;: 6, 30 \u0026#39;sections\u0026#39;: [\u0026#39;__nl_symbol_ptr\u0026#39;, 31 
\u0026#39;__la_symbol_ptr\u0026#39;, 32 \u0026#39;__mod_init_func\u0026#39;, 33 \u0026#39;__const\u0026#39;, 34 \u0026#39;__data\u0026#39;, 35 \u0026#39;__common\u0026#39;], 36 \u0026#39;virtual_address\u0026#39;: 8192, 37 \u0026#39;virtual_size\u0026#39;: 4096 38} One can also disable the JSON module using a CMake configuration flag:\n1$ cmake -DLIEF_ENABLE_JSON=off ... What\u0026rsquo;s next LIEF v0.9 still has a poor support for Mach-O modification and only supports modifications on header and some Load commands.\nOne of the primitives to do more general modification on Mach-O format is the ability to add arbitrary Load commands. Some tools 3 4 exist to add commands, but they usually use padding between the load command table and the raw content or they remove / replace existing one. The main limitation with this technique is that the number of load command which can be added depends on the size of the padding. In LIEF, we took advantage of the fact that Mach-O are PIE to shift the content that follow the load command table. This enable us to inject more than one or two commands. To keep a consistent state of format (relocations, segment\u0026rsquo;s virtual address, \u0026hellip;), the Mach-O builder of LIEF rebuilds the export-trie, regenerates binding opcode, rebase opcodes, \u0026hellip;\nIn our tests, we succeeded in adding arbitrary number of LC_DYLIB command in clang as well as adding 10 new sections in the __TEXT segment. We are currently working on stabilization of the instrumentation process, but it should be merged soon in then master branch. Stay tuned!\nWe will be also be presenting about file formats instrumentation at Recon Montréal and Pass The Salt for a talk about file formats instrumentation. 
In this talk we will present techniques to perform code injection, hooking by using formats.\nSlide 58 of Inside Safetynet Attestation Attacks and Defense.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\nSlide 89 of How Samsung Secures Your Wallet And How To Break It.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\ninsert_dylib.\u0026#160;\u0026#x21a9;\u0026#xfe0e;\noptool\u0026#160;\u0026#x21a9;\u0026#xfe0e;\n","date":1528675200,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1528675200,"objectID":"6af3cb0a31132349390362d7e24f4601","permalink":"https://lief.re/blog/2018-06-11-lief-0-9-0/","publishdate":"2018-06-11T00:00:00Z","relpermalink":"/blog/2018-06-11-lief-0-9-0/","section":"blog","summary":"This blog post introduces major changes in LIEF 0.9 as well as work in progress features that will be integrated in further releases","tags":null,"title":"LIEF - Release 0.9.0","type":"featured"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":" Tl;DR\nLIEF v0.8.3 is out. The main changelog is available here and packages can be downloaded on the official website. Development process We attach a great importance to the automation of some development tasks like testing, distributing, packaging, etc. 
Here is a summary of these processes:\nEach commits is tested on\nLinux - x86-64 - Python{2.7, 3.5, 3.6} Windows - x86 / x86-64 - Python{2.7, 3.5, 3.6} OSX - x86-64 - Python{2.7, 3.5, 3.6} The test suite includes:\nTests on the Python API Tests on the C API Tests on the parsers Tests on the builders If tests succeeds packages are automatically uploaded on the https://github.com/lief-project/packages repository.\nFor tagged version, packages are uploaded on the Github release page: https://github.com/lief-project/LIEF/releases.\nDockerlief To facilitate the compilation and the use of LIEF, we created the Dockerlief repo which includes various Dockerfiles as well as the dockerlief utility. dockerlief is basically a wrapper on docker build .\nAmong Dockerfiles, we provide a Dockerfile to cross compile LIEF for Android (ARM, AARCH64, x86, x86-64)\nTo cross compile LIEF for Android ARM, one can run:\n1$ dockerlief build --api-level 21 --arm lief-android 2 3[INFO] - Location of the Dockerfiles: ~/dockerfiles 4[INFO] - Building Dockerfile: \u0026#39;lief-android\u0026#39; 5[INFO] - Target architecture: armeabi-v7a 6[INFO] - Target API Level: 21 The SDK package LIEF-0.8.3-Android_API21_armeabi-v7a.tar.gz is automatically pulled from the Docker to the current directory.\nIntegration of LibFuzzer Fuzzing our own library is a good way to detect bugs, memory leak, unsanitized inputs \u0026hellip;\nThus, we integrated LibFuzzer in the project. 
Fuzzing the LIEF ELF, PE, Mach-O parser is as simple as:\n1#include \u0026lt;LIEF/LIEF.hpp\u0026gt; 2#include \u0026lt;vector\u0026gt; 3#include \u0026lt;memory\u0026gt; 4 5extern \u0026#34;C\u0026#34; int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) { 6 std::vector\u0026lt;uint8_t\u0026gt; raw = {data, data + size}; 7 try { 8 std::unique_ptr\u0026lt;LIEF::Binary\u0026gt; b{LIEF::Parser::parse(raw)}; 9 } catch (const LIEF::exception\u0026amp; e) { 10 std::cout \u0026lt;\u0026lt; e.what() \u0026lt;\u0026lt; std::endl; 11 } 12 return 0; 13} To launch the fuzzer, one can run the following commands:\n1$ make fuzz-elf # Launch ELF Fuzzer 2$ make fuzz-pe # Launch PE Fuzzer 3$ make fuzz-macho # Launch MachO Fuzzer 4$ make fuzz # Launch ELF, PE and MachO Fuzzer ELF Play with ELF symbols - Part 2 In the tutorial #03 we demonstrated how to swap dynamic symbols between a binary and a library. In this part, we will see how we can rename these symbols.\nChanging symbol names is not a trivial modification, since modifying the string table of the PT_DYNAMIC segment has side effects:\nIt requires to update the hash table (GNU Hash / SYSV). It usually requires to extend the DYNAMIC part of the ELF format. The previous version of LIEF already implements the rebuilding of the hash table but not the extending of the DYNAMIC part.\nWith the v0.8.3 we can extend the DYNAMIC part. Therefore:\nWe can add new entries in the .dynamic section We can change dynamic symbols names We can change DT_RUNPATH and DT_RPATH without length restriction We will rename all imported functions of gpg that are imported from libgcrypt.so.20 into a_very_long_name_of_function_XX and all exported functions of libgcrypt.so.20 into the same name (XX is the symbol index). 
1\n1import lief 2 3# Load targets 4gpg = lief.parse(\u0026#34;/usr/bin/gpg\u0026#34;) 5libgcrypt = lief.parse(\u0026#34;/usr/lib/libgcrypt.so.20\u0026#34;) 6 7# Change names 8for idx, lsym in enumerate(filter(lambda e : e.exported, libgcrypt.dynamic_symbols)): 9 new_name = \u0026#39;a_very_long_name_of_function_{:d}\u0026#39;.format(idx) 10 print(\u0026#34;New name for \u0026#39;{}\u0026#39;: {}\u0026#34;.format(lsym.name, new_name)) 11 for bsym in filter(lambda e : e.name == lsym.name, gpg.dynamic_symbols): 12 bsym.name = new_name 13 lsym.name = new_name 14 15# Write back 16gpg.write(gpg.name) 17libgcrypt.write(libgcrypt.name) By using readelf we can check that function names have been modified:\n1$ readelf -s ./gpg|grep \u0026#34;a_very_long_name\u0026#34; 2 3 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) 4 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) 5 11: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) 6 13: 0000000000000000 0 FUNC GLOBAL DEFAULT UND a_very_long_name_of_funct@GCRYPT_1.6 (2) 7 ... 8 9$ readelf -s ./libgcrypt.so.20|grep \u0026#34;a_very_long_name\u0026#34; 10 11 88: 000000000000d050 6 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 12 89: 000000000000dcd0 69 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 13 90: 000000000000d310 34 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 14 91: 000000000000de70 81 FUNC GLOBAL DEFAULT 10 a_very_long_name_of_funct@@GCRYPT_1.6 15 ... 
Now if we run the new gpg binary, we get the following error:\n$ ./gpg --output bar.txt --symmetric ./foo.txt\nrelocation error: ./gpg: symbol a_very_long_name_of_function_8, version GCRYPT_1.6 not defined in file libgcrypt.so.20 with link time reference\nWe get this error because the Linux loader tries to resolve the function a_very_long_name_of_function_8 against /usr/lib/libgcrypt.so.20, and that library doesn\u0026rsquo;t include the updated names.\nOne way to fix this error is to set the environment variable LD_LIBRARY_PATH to the current directory:\n$ LD_LIBRARY_PATH=. ./gpg --output bar.txt --symmetric ./foo.txt\n$ xxd ./bar.txt|head -n1\n\n00000000: 8c0d 0407 0302 c5af 9fba cab1 9545 ebd2 .............E..\n\n$ LD_LIBRARY_PATH=. ./gpg --output foo_decrypted.txt --decrypt ./bar.txt\n$ xxd ./foo_decrypted.txt|head -n1\n\n00000000: 4865 6c6c 6f20 576f 726c 640a Hello World.\nAnother way to fix it is to add a new entry in the .dynamic section.\nAs mentioned at the beginning, we can now add new entries in the .dynamic section, so let\u0026rsquo;s add a DT_RUNPATH entry with the $ORIGIN value so that the Linux loader resolves the modified libgcrypt.so.20 instead of the system one:\n...\n# Add a DT_RUNPATH entry\ngpg += lief.ELF.DynamicEntryRunPath(\u0026#34;$ORIGIN\u0026#34;)\n\n# Write back\ngpg.write(gpg.name)\nlibgcrypt.write(libgcrypt.name)\nAnd we don\u0026rsquo;t need LD_LIBRARY_PATH anymore:\n$ readelf -d ./gpg|grep RUNPATH\n\n0x000000000000001d (RUNPATH) Library runpath: [$ORIGIN]\n\n$ ./gpg --decrypt ./bar.txt\n\ngpg: AES encrypted data\ngpg: encrypted with 1 passphrase\nHello World\nHiding its symbols\nIDA v7.0 was released recently, and its changelog mentions two changes:\nELF: describe symbols using symtab from DYNAMIC section\nELF: IDA now uses the PHT by default instead of the SHT to load segments from ELF files\nThese changes are partially true. 
Let\u0026rsquo;s see what goes wrong in IDA with the following snippet:\nimport lief\n\nid = lief.parse(\u0026#34;/usr/bin/id\u0026#34;)\ndynsym = id.get_section(\u0026#34;.dynsym\u0026#34;)\n# Corrupt the size of one symbol entry: half of the whole section\ndynsym.entry_size = dynsym.size // 2\nid.write(\u0026#34;id_test\u0026#34;)\nThis snippet defines the size of one symbol as the entire size of the .dynsym section divided by 2.\nThe normal size of ELF symbols would be:\n\u0026gt;\u0026gt;\u0026gt; print(int(lief.ELF.ELF32.SIZES.SYM)) # For 32-bit\n16\n\u0026gt;\u0026gt;\u0026gt; print(int(lief.ELF.ELF64.SIZES.SYM)) # For 64-bit\n24\nIn the case of the 64-bit id binary, we set this size to 924.\nWhen opening id_test in IDA and forcing it to use segments instead of sections for parsing, we get the following imports:\nOnly one import is resolved and the others are **hidden**. Note that id_test is still executable:\n$ id_test\nuid=1000(romain) gid=1000(romain) ...\nBy using readelf we can still retrieve the symbols, along with an error indicating that the symbol size is corrupted.\n$ readelf -s id_test\nreadelf: Error: Section 5 has invalid sh_entsize of 000000000000039c\nreadelf: Error: (Using the expected size of 24 for the rest of this dump)\n\nSymbol table \u0026#39;.dynsym\u0026#39; contains 77 entries:\n Num: Value Size Type Bind Vis Ndx Name\n 0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND\n 1: 0000000000000000 0 FUNC GLOBAL DEFAULT UND endgrent@GLIBC_2.2.5 (2)\n 2: 0000000000000000 0 FUNC GLOBAL DEFAULT UND __uflow@GLIBC_2.2.5 (2)\n 3: 0000000000000000 0 FUNC GLOBAL DEFAULT UND getenv@GLIBC_2.2.5 (2)\n 4: 0000000000000000 0 FUNC GLOBAL DEFAULT UND free@GLIBC_2.2.5 (2)\n 5: 0000000000000000 0 FUNC GLOBAL DEFAULT UND abort@GLIBC_2.2.5 (2)\n ...\n
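The figures reported by readelf are consistent with the corruption described here: 77 entries of 24 bytes give a .dynsym section of 1848 bytes, and half of that is 924 (0x39c), the value flagged as an invalid sh_entsize. A small sanity check in plain Python, with no LIEF needed (the variable names are illustrative):

```python
# Values taken from the readelf output of id_test
SYM64_SIZE = 24          # regular size of one ELF64 symbol entry
NB_SYMBOLS = 77          # entries reported by readelf for .dynsym
BOGUS_ENTSIZE = 0x39c    # sh_entsize reported as invalid by readelf

dynsym_size = NB_SYMBOLS * SYM64_SIZE   # total size of .dynsym in bytes
patched_entsize = dynsym_size // 2      # value the snippet writes to entry_size

print(dynsym_size, patched_entsize, hex(patched_entsize))  # 1848 924 0x39c

# Number of entries seen by a tool that trusts the patched entry size
print(dynsym_size // patched_entsize)  # 2
```

A parser that divides the section size by the patched entry size would therefore see only two entries, the null symbol and one import. That would be consistent with the single import observed, although the post does not state that this is how IDA computes it.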
In LIEF the (dynamic) symbol table address is computed through the DT_SYMTAB entry from the PT_DYNAMIC segment.\nTo compute the number of dynamic symbols LIEF uses three heuristics:\nBased on hash tables (Gnu Hash / Sysv Hash)\nBased on relocations\nBased on sections\nMalware is starting to use this kind of corruption, as we will see in the next part.\nRootnik Malware\nRootnik is a malware targeting Android devices. It has been analyzed by Fortinet.\nA full analysis of the malware is available on the Fortinet blog.\nThis part is focused on the ELF format analysis of one component: libshell.\nActually there are two libraries: libshella_2.10.3.1.so and libshellx_2.10.3.1.so. As they have the same purpose, we will use the x86 version.\nFirst, if we look at the ELF sections of libshellx_2.10.3.1.so we can notice that the address, offset and size of some sections like .text, .init_array, .dynstr, .dynsym are set to 0.\nThis kind of modification is used to disturb tools that rely on sections to parse some ELF structures (like objdump, readelf, IDA \u0026hellip;)\n$ readelf -S ./libshellx-2.10.3.1.so\nThere are 21 section headers, starting at offset 0x2431c:\n\nSection Headers:\n [Nr] Name Type Addr Off Size ES Flg Lk Inf Al\n [ 0] NULL 00000000 000000 000000 00 0 0 0\n [ 1] .dynsym DYNSYM 00000114 000114 000300 10 A 2 1 4\n [ 2] .dynstr STRTAB 00000414 000414 0001e2 00 A 0 0 1\n [ 3] .hash HASH 00000000 000000 000000 04 A 1 0 4\n [ 4] .rel.dyn REL 00000000 000000 000000 08 A 1 0 4\n [ 5] .rel.plt REL 00000000 000000 000000 08 AI 1 6 4\n [ 6] .plt PROGBITS 00000000 000000 000000 04 AX 0 0 16\n [ 7] .text PROGBITS 00000000 000000 000000 00 AX 0 0 16\n [ 8] .code PROGBITS 00000000 000000 000000 00 AX 0 0 16\n [ 9] .eh_frame PROGBITS 00000000 000000 000000 00 A 0 0 4\n [10] .eh_frame_hdr PROGBITS 00000000 000000 000000 00 A 0 0 4\n [11] .fini_array FINI_ARRAY 00000000 000000 000000 00 WA 0 0 4\n [12] .init_array INIT_ARRAY 00000000 000000 000000 00 WA 0 0 4\n [13] .dynamic DYNAMIC 0000ce50 00be50 0000f8 08 WA 2 0 4\n [14] .got PROGBITS 00000000 000000 000000 00 WA 0 0 4\n [15] .got.plt PROGBITS 00000000 000000 000000 00 WA 0 0 4\n [16] .data PROGBITS 00000000 000000 000000 00 WA 0 0 16\n [17] .bss NOBITS 0000d398 00c395 000000 00 WA 0 0 4\n [18] .comment PROGBITS 00000000 00c395 000045 01 MS 0 0 1\n [19] .note.gnu.gold-ve NOTE 00000000 00c3dc 00001c 00 0 0 4\n [20] .shstrtab STRTAB 00000000 024268 0000b1 00 0 0 1\nKey to Flags:\n W (write), A (alloc), X (execute), M (merge), S (strings), I (info),\n L (link order), O (extra OS processing required), G (group), T (TLS),\n C (compressed), x (unknown), o (OS specific), E (exclude),\n p (processor specific)\nIf we open the given library in IDA we have no exports, no imports and no sections:\nBased on the segments and dynamic entries we can recover most of this information:\n.init_array address and size are available through the DT_INIT_ARRAY and DT_INIT_ARRAYSZ entries\n.dynstr address and size are available through the DT_STRTAB and DT_STRSZ entries\n.dynsym address is available through the DT_SYMTAB entry\nThe script recover_shellx.py recovers the missing values, patches the sections and rebuilds a fixed library.\nNow if we open the new libshellx-2.10.3.1_FIXED.so we have access to imports / exports and some sections. The .init_array section contains 2 functions:\ntencent652524168491435794009\nsub_60C0\nThe tencent652524168491435794009 function basically does a stack alignment and sub_60C0 is one of the decryption routines 2. This function is obfuscated with graph flattening and looks like the output of the O-LLVM graph flattening pass 3:\nFortunately there are few \u0026ldquo;relevant blocks\u0026rdquo; and they are not obfuscated.\nThe function sub_60C0 basically iterates over the program headers to find the encrypted one and decrypts it using a custom algorithm (based on shift, xor, etc).\nTriggering CVE-2017-1000249\nCVE-2017-1000249 is a stack-based buffer overflow in the file utility. 
It affects versions 5.29, 5.30 and 5.31.\nBasically the overflow occurs in the size of the note description.\nUsing LIEF we can trigger the overflow as follows:\nimport lief\n\ntarget = lief.parse(\u0026#34;/usr/bin/id\u0026#34;)\nnote_build_id = target[lief.ELF.NOTE_TYPES.BUILD_ID]\n# 30-byte description used to trigger the overflow in file\nnote_build_id.description = [0x41] * 30\ntarget.write(\u0026#34;id_overflow\u0026#34;)\n$ file --version\nfile-5.29\nmagic file from /usr/share/file/misc/magic\n\n$ id_overflow\nuid=1000(romain) gid=1000(romain) ...\n\n$ file id_overflow\n*** buffer overflow detected ***: file terminated\n./id_overflow: [1] 3418 abort (core dumped) file ./id_overflow\nHere is the commit that introduced the bug: 9611f3\nPE\nThe Load Config directory is now parsed into the LoadConfiguration object. This structure evolves with the Windows versions and LIEF has been designed to support this evolution. You can take a look at LoadConfigurationV0, LoadConfigurationV6.\nOne can find the different versions of this structure in the following directories:\ninclude/LIEF/PE/LoadConfigurations\nsrc/PE/LoadConfigurations\nThe current version of LIEF is able to parse the structure up to Windows 10 build 15002 with the hotpatch table offset.\nHere are some examples of the LoadConfiguration API:\n\u0026gt;\u0026gt;\u0026gt; target = lief.parse(\u0026#34;PE64_x86-64_binary_WinApp.exe\u0026#34;)\n\u0026gt;\u0026gt;\u0026gt; target.has_configuration\nTrue\n\u0026gt;\u0026gt;\u0026gt; config = target.load_configuration\n\u0026gt;\u0026gt;\u0026gt; config.version\nWIN_VERSION.WIN10_0_15002\n\u0026gt;\u0026gt;\u0026gt; hex(config.guard_rf_failure_routine)\n\u0026#39;0x140001040\u0026#39;\nLIEF also provides an API to serialize any ELF or PE object into JSON 4\nFor example, to transform a LoadConfiguration object into JSON:\n\u0026gt;\u0026gt;\u0026gt; from lief import to_json\n\u0026gt;\u0026gt;\u0026gt; to_json(config)\n\u0026#39;{\u0026#34;characteristics\u0026#34;:248,\u0026#34;code_integrity\u0026#34;:{\u0026#34;catalog\u0026#34;:0,\u0026#34;catalog_offset\u0026#34;:0 ... }}\u0026#39; # Not fully printed\nOne can also serialize the whole Binary object:\n\u0026gt;\u0026gt;\u0026gt; to_json(target)\n\u0026#39;{\u0026#34;data_directories\u0026#34;:[{\u0026#34;RVA\u0026#34;:0,\u0026#34;size\u0026#34;:0,\u0026#34;type\u0026#34;:\u0026#34;EXPORT_TABLE\u0026#34;},{\u0026#34;RVA\u0026#34;:62584,\u0026#34;section\u0026#34; ...}}\u0026#39; # Not fully printed\nMach-O\nIn the Mach-O format, dynamic executables embed the LC_DYLD_INFO command which is associated with the dyld_info_command structure.\nThe structure is basically a list of offsets and sizes pointing to other data structures.\nFrom /usr/include/mach-o/loader.h the structure looks like this:\nstruct dyld_info_command {\n  uint32_t cmd;\n  uint32_t cmdsize;\n  uint32_t rebase_off;\n  uint32_t rebase_size;\n  uint32_t bind_off;\n  uint32_t bind_size;\n  uint32_t weak_bind_off;\n  uint32_t weak_bind_size;\n  uint32_t lazy_bind_off;\n  uint32_t lazy_bind_size;\n  uint32_t export_off;\n  uint32_t export_size;\n};\nThe dyld loader uses this structure to:\nRebase the executable\nBind symbols to addresses\nRetrieve exported functions (or symbols)\nWhereas relocations in the ELF and PE formats are basically tables, the Mach-O format uses byte streams to rebase the image and to bind symbols to addresses. For exports it uses a trie as the underlying structure.\nIn the new version of LIEF, the Mach-O parser is able to handle these underlying structures to provide a user-friendly API:\nThe export trie is represented by the ExportInfo object which is usually tied to a Symbol. The binding byte stream is represented through the BindingInfo object.\nFor the rebase byte stream, the parser creates virtual relocations to model the rebasing process. 
These virtual relocations are represented by the RelocationDyld object which, among other attributes, contains an address, a size and a type 5.\nHere is an example using the Python API:\n\u0026gt;\u0026gt;\u0026gt; id = lief.parse(\u0026#34;/usr/bin/id\u0026#34;)\n\u0026gt;\u0026gt;\u0026gt; print(id.relocations[0])\n100002000 POINTER 64 DYLDINFO __DATA.__eh_frame dyld_stub_binder\n\u0026gt;\u0026gt;\u0026gt; print(id.has_dyld_info)\nTrue\n\u0026gt;\u0026gt;\u0026gt; dyldinfo = id.dyld_info\n\u0026gt;\u0026gt;\u0026gt; print(dyldinfo.bindings[0])\nClass: STANDARD\nType: POINTER\nAddress: 0x100002010\nSymbol: ___stderrp\nSegment: __DATA\nLibrary: /usr/lib/libSystem.B.dylib\n\u0026gt;\u0026gt;\u0026gt; print(dyldinfo.exports[0])\nNode Offset: 18\nFlags: 0\nAddress: 0\nSymbol: __mh_execute_header\nConclusion\nIn this release we significantly improved the ELF builder. The Mach-O and PE parts gained new objects and new functions. LIEF is now available on PyPI and can be added to the requirements of Python projects, whatever the Python version and the target platform.\nSince v0.7.0, LIEF has been presented at RMLL and the MISP project uses it for its PyMISP objects.\nSome may complain about the C API. They are right! Until v1.0.0 we will provide a minimal C API. Once the C++ API is stable we plan to provide full APIs for Python, C, Java, OCaml 6, etc.\nThe next version should focus on the Mach-O builder, especially on adding sections and segments. 
We also plan to support PE .NET headers and fix some performance issues.\nFor questions you can join the Gitter channel\nAll Python examples are done with the 3.5 version\nSee the blog post about O-LLVM analysis: https://blog.quarkslab.com/deobfuscation-recovering-an-ollvm-protected-program.html\nAs mentioned in the Fortinet blog post, the library is packed.\nThis feature is not yet available for MachO objects\nDue to the inheritance relationship and abstraction these attributes are located in the MachO::Relocation and LIEF::Relocation objects.\nhttps://github.com/aziem/LIEF-ocaml\n","date":1509321600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1509321600,"objectID":"5390db8429858f2c07b028a24992ab0d","permalink":"https://lief.re/blog/2017-10-30-lief-0-8-3/","publishdate":"2017-10-30T00:00:00Z","relpermalink":"/blog/2017-10-30-lief-0-8-3/","section":"blog","summary":"This blog post introduces new features in LIEF 0.8.3 as well as some use cases.","tags":null,"title":"Have fun with LIEF and Executable Formats!","type":"blog"},{"authors":{"avatar":"img/avatar/romain.png","display_name":"Romain Thomas","family_name":"Thomas","given_name":"Romain","profile":"https://www.romainthomas.fr"},"categories":null,"content":" TL;DR\nLIEF is a library to parse and manipulate ELF, PE and Mach-O formats. Source code is available on GitHub and use cases are here.\nExecutable File Formats in a Nutshell\nWhen dealing with executable files, the first layer of information is the format in which the code is wrapped. We can see an executable file format as an envelope. It contains information so that the postman (i.e. Operating System) can handle and deliver (i.e. execute) it. 
The message wrapped by this envelope would be the machine code.\nThere are three mainstream formats, one per OS:\nPortable Executable (PE) for Windows systems\nExecutable and Linkable Format (ELF) for UN*X systems (Linux, Android\u0026hellip;)\nMach-O for OS-X, iOS\u0026hellip;\nOther executable file formats, such as COFF, exist but they are less relevant.\nUsually each format has a header which describes at least the target architecture, the program\u0026rsquo;s entry point and the type of the wrapped object (executable, library\u0026hellip;). Then we have blocks of data that will be mapped by the OS\u0026rsquo;s loader. These blocks of data could hold machine code (.text), read-only data (.rodata) or other OS specific information.\nFor PE there is only one kind of such block: Section. For ELF and Mach-O formats, a section has a different meaning. In these formats, sections are used by the linker at the compilation step, whereas segments (second type of block) are used by the OS\u0026rsquo;s loader at execution step. Thus sections are not mandatory for ELF and Mach-O formats and can be removed without affecting the execution.\nPurpose of LIEF\nIt turns out that many projects need to parse executable file formats but don\u0026rsquo;t use a standard library and re-implement their own parser (and the wheel). 
Moreover, these parsers are usually bound to one language.\nOn Unix systems one can find the objdump and objcopy utilities but they are limited to Unix and the API is not user-friendly.\nThe purpose of LIEF is to fill this void:\nProviding a cross-platform library which can parse and modify (to a certain extent) ELF, PE and Mach-O formats using a common abstraction\nProviding an API for different languages (Python, C++, C\u0026hellip;)\nAbstracting common features from the different formats (Section, header, entry point, symbols\u0026hellip;)\nThe following snippets show how to obtain information about an executable using the different APIs of LIEF:\nimport lief\n# ELF\nbinary = lief.parse(\u0026#34;/usr/bin/ls\u0026#34;)\nprint(binary)\n\n# PE\nbinary = lief.parse(\u0026#34;C:\\\\Windows\\\\explorer.exe\u0026#34;)\nprint(binary)\n\n# Mach-O\nbinary = lief.parse(\u0026#34;/usr/bin/ls\u0026#34;)\nprint(binary)\nWith the C++ API:\n#include \u0026lt;LIEF/LIEF.hpp\u0026gt;\n#include \u0026lt;iostream\u0026gt;\n\nint main(int argc, const char** argv) {\n  LIEF::ELF::Binary* elf = LIEF::ELF::Parser::parse(\u0026#34;/usr/bin/ls\u0026#34;);\n  LIEF::PE::Binary* pe = LIEF::PE::Parser::parse(\u0026#34;C:\\\\Windows\\\\explorer.exe\u0026#34;);\n  LIEF::MachO::Binary* macho = LIEF::MachO::Parser::parse(\u0026#34;/usr/bin/ls\u0026#34;);\n\n  std::cout \u0026lt;\u0026lt; *elf \u0026lt;\u0026lt; std::endl;\n  std::cout \u0026lt;\u0026lt; *pe \u0026lt;\u0026lt; std::endl;\n  std::cout \u0026lt;\u0026lt; *macho \u0026lt;\u0026lt; std::endl;\n\n  delete elf;\n  delete pe;\n  delete macho;\n}\nAnd finally with the C API:\n#include \u0026lt;LIEF/LIEF.h\u0026gt;\n#include \u0026lt;stdio.h\u0026gt;\n\nint main(int argc, const char** argv) {\n\n  Elf_Binary_t* elf_binary = elf_parse(\u0026#34;/usr/bin/ls\u0026#34;);\n  Pe_Binary_t* pe_binary = pe_parse(\u0026#34;C:\\\\Windows\\\\explorer.exe\u0026#34;);\n  Macho_Binary_t** macho_binaries = macho_parse(\u0026#34;/usr/bin/ls\u0026#34;);\n\n  Pe_Section_t** pe_sections = pe_binary-\u0026gt;sections;\n  Elf_Section_t** elf_sections = elf_binary-\u0026gt;sections;\n  Macho_Section_t** macho_sections = macho_binaries[0]-\u0026gt;sections;\n\n  /* Section arrays are NULL-terminated */\n  for (size_t i = 0; pe_sections[i] != NULL; ++i) {\n    printf(\u0026#34;%s\\n\u0026#34;, pe_sections[i]-\u0026gt;name);\n  }\n\n  for (size_t i = 0; elf_sections[i] != NULL; ++i) {\n    printf(\u0026#34;%s\\n\u0026#34;, elf_sections[i]-\u0026gt;name);\n  }\n\n  for (size_t i = 0; macho_sections[i] != NULL; ++i) {\n    printf(\u0026#34;%s\\n\u0026#34;, macho_sections[i]-\u0026gt;name);\n  }\n\n  elf_binary_destroy(elf_binary);\n  pe_binary_destroy(pe_binary);\n  macho_binaries_destroy(macho_binaries);\n}\nLIEF supports FAT-MachO and one can iterate over binaries as follows:\nimport lief\nbinaries = lief.MachO.parse(\u0026#34;/usr/lib/libc++abi.dylib\u0026#34;)\nfor binary in binaries:\n    print(binary)\nNote\nThe above script uses the lief.MachO.parse function instead of the lief.parse function because lief.parse returns a single lief.MachO.binary object whereas lief.MachO.parse returns a list of lief.MachO.binary (according to the FAT-MachO format).\nAlong with standard format components like headers, sections, import table, load commands, symbols, etc., LIEF is also able to parse PE Authenticode:\nimport lief\ndriver = lief.parse(\u0026#34;driver.sys\u0026#34;)\n\nfor crt in driver.signature.certificates:\n    print(crt)\nVersion: 3\nSerial Number: 61:07:02:dc:00:00:00:00:00:0b\nSignature Algorithm: SHA1_WITH_RSA_ENCRYPTION\nValid from: 2005-9-15 21:55:41\nValid to: 2016-3-15 22:5:41\nIssuer: DC=com, DC=microsoft, CN=Microsoft Root Certificate Authority\nSubject: C=US, ST=Washington, L=Redmond, O=Microsoft Corporation, CN=Microsoft Windows Verification PCA\n...\nFull API documentation is available here\nPython API\nC++ API\nC API\nArchitecture\nIn the LIEF architecture, each format implements at least the following classes:\nParser: Parse the format and decompose it into a Binary class\nBinary: Model the format and provide an API to modify and explore it.\n
Builder: Transform the binary object into a valid file.\nTo factor out the characteristics that formats have in common, we have an inheritance relationship between these characteristics.\nFor symbols it gives the following diagram:\nThis makes it possible to write cross-format utilities such as nm. nm is a Unix utility to list symbols in an executable. The source code is available here: binutils\nWith the given inheritance relationship one can write this utility for the three formats in a single script:\nimport lief\nimport sys\n\ndef nm(path):\n    # lief.parse returns the format-specific Binary (ELF, PE or Mach-O)\n    binary = lief.parse(path)\n    for symbol in binary.symbols:\n        print(symbol)\n\n    return 0\n\nif __name__ == \u0026#34;__main__\u0026#34;:\n    r = nm(sys.argv[1])\n    sys.exit(r)\nConclusion\nAs LIEF is still a young project we hope to have feedback, ideas, suggestions and pull requests.\nThe source code is available here: https://github.com/lief-project (under Apache 2.0 license) and the associated website: http://lief.quarkslab.com\nIf you are interested in use cases, you can take a look at these tutorials:\nParse and manipulate formats\nCreate a PE from scratch\nPlay with ELF symbols\nHooking\nInfecting the PLT/GOT\nThe project will be presented at the Third French Japanese Meeting on Cybersecurity\nContact\nlief [at] quarkslab [dot] com\nGitter: lief-project\nThanks\nThanks to Serge Guelton and Adrien Guinet for their advice about the design and their code review. Thanks to Quarkslab for making this project open-source.\n","date":1492473600,"expirydate":-62135596800,"kind":"page","lang":"en","lastmod":1492473600,"objectID":"b983072d4a5bc4f7b532cb32c8a7d99f","permalink":"https://lief.re/blog/2017-04-18-lief/","publishdate":"2017-04-18T00:00:00Z","relpermalink":"/blog/2017-04-18-lief/","section":"blog","summary":"Blog post about the open-sourcing of LIEF","tags":null,"title":"LIEF - Library to Instrument Executable Formats","type":"featured"}]