Handle Rust bindings in the Linux kernel #63

Open
Abyss-W4tcher opened this issue Jun 4, 2024 · 7 comments · May be fixed by #65

Comments


Abyss-W4tcher commented Jun 4, 2024

Hi,

while investigating #57, I noticed that the issue started appearing around the time Rust was integrated into the Linux kernel. With a bit more debugging, I was able to confirm that dwarf2json processes some Rust bindings in the same pool as C struct names:

$ llvm-dwarfdump vmlinux-6.5.0-14-generic --name fs_struct --show-children | grep 'DW_TAG_member' -A 2
0x063c0c3e:   DW_TAG_member
                DW_AT_name      ("_unused")
                DW_AT_type      (0x063ca2a5 "u8[0]")
--
0x063d404b:   DW_TAG_member
                DW_AT_name      ("_unused")
                DW_AT_type      (0x063d52f5 "u8[0]")

$ llvm-dwarfdump vmlinux-6.5.0-14-generic --debug-info=0x063c0c3e --show-parents
vmlinux-6.5.0-14-generic:       file format elf64-x86-64

.debug_info contents:

0x063b59a4: DW_TAG_compile_unit
              DW_AT_producer    ("clang LLVM (rustc version 1.68.2 (9eb3afe9e 2023-03-27) (built from a source tarball))")
              DW_AT_language    (DW_LANG_Rust)
              DW_AT_name        ("/build/linux-SXblTa/linux-6.5.0/rust/bindings/lib.rs/@/bindings.04c8d523-cgu.0")
              DW_AT_stmt_list   (0x00d8f3e1)
              DW_AT_comp_dir    ("/build/linux-SXblTa/linux-6.5.0/debian/build/build-generic")
              DW_AT_GNU_pubnames        (true)
              DW_AT_low_pc      (0xffffffff818060b0)
              DW_AT_high_pc     (0xffffffff81808cf0)

0x063be675:   DW_TAG_namespace
                DW_AT_name      ("bindings")

0x063be67a:     DW_TAG_namespace
                  DW_AT_name    ("bindings_raw")

0x063c0c37:       DW_TAG_structure_type
                    DW_AT_name  ("fs_struct")
                    DW_AT_byte_size     (0x00)
                    DW_AT_alignment     (1)

0x063c0c3e:         DW_TAG_member
                      DW_AT_name        ("_unused")
                      DW_AT_type        (0x063ca2a5 "u8[0]")
                      DW_AT_alignment   (1)
                      DW_AT_data_member_location        (0x00)

Should these bindings, or more broadly all Rust content, be processed separately from the regular structures? I don't think they should be discarded, but maybe they could be stored under a different parent key in the ISF?

Author

Abyss-W4tcher commented Jul 17, 2024

Hello, has anyone had a chance to look into a solution?

Unfortunately, all ISFs generated after Linux kernel 6.5 are currently invalid. :/

Member

ikelos commented Aug 24, 2024

Has anyone made any progress on this? If changes need to be made to the main symbol table format, that's possible, but I don't yet fully understand what these new structures are or how they relate, so hopefully someone can give me a rundown so we can figure out a way to sort them appropriately.

Author

Abyss-W4tcher commented Aug 24, 2024

The Ubuntu (Linux) kernel includes Rust bindings for existing C APIs. They can be inspected in a sample source package: https://bugs.launchpad.net/~canonical-kernel-team/+archive/ubuntu/ppa/+build/26995753/+files/linux-lib-rust-6.5.0-14-generic_6.5.0-14.14_amd64.deb.

For this issue, the relevant one is the fs_struct binding (usr/src/linux-lib-rust-6.5.0-14-generic/rust/bindings/bindings_generated.rs):

#[repr(C)]
#[derive(Copy, Clone)]
pub struct fs_struct {
    _unused: [u8; 0],
}

The problem is that the vmlinux DWARF information now contains two fs_struct structs: one is the "classical" C struct and the other is a Rust binding. My guess is that dwarf2json processes everything directly, instead of iterating per DW_TAG_compile_unit (see the first comment of this issue).
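To make the collision concrete, here is a minimal Python sketch. It is not dwarf2json's code (that project is written in Go); the `entries` list is a hand-written stand-in for DWARF structure entries, the function names are made up for illustration, and the C byte size and the `DW_LANG_C11` label are assumptions. Only the Rust entry (size 0, `DW_LANG_Rust`) mirrors the dump in the first comment.

```python
from collections import defaultdict

# Hand-written stand-ins for DW_TAG_structure_type entries, tagged with the
# language of their enclosing compile unit. The C byte sizes and the C
# language label are placeholders, not values read from a real kernel.
entries = [
    {"name": "fs_struct", "language": "DW_LANG_C11", "byte_size": 56},
    {"name": "fs_struct", "language": "DW_LANG_Rust", "byte_size": 0},
    {"name": "task_struct", "language": "DW_LANG_C11", "byte_size": 9792},
]


def flat_pool(entries):
    """Key types by name only: a later entry silently replaces an earlier one."""
    pool = {}
    for entry in entries:
        pool[entry["name"]] = entry
    return pool


def find_collisions(entries):
    """Return the names that are defined under more than one source language."""
    languages = defaultdict(set)
    for entry in entries:
        languages[entry["name"]].add(entry["language"])
    return {name: sorted(langs) for name, langs in languages.items() if len(langs) > 1}


pool = flat_pool(entries)
collisions = find_collisions(entries)
print(pool["fs_struct"]["byte_size"])  # the Rust placeholder won the name
print(collisions)
```

In this toy model the empty Rust binding overwrites the C definition, which is the kind of outcome that would leave a symbol table unusable for C-oriented consumers.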

To avoid completely breaking the existing ISF format, we could prefix every extracted Rust binding/data item with something like rust., resulting in:

  • fs_struct
  • rust.fs_struct

Edit: this could cause confusion with cross-references, so it may not be a workable idea (unless they are handled correctly?). Storing all Rust content under additional keys (rust_symbols, rust_types...) might be required instead, but that would also break with Volatility.
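The cross-reference concern can be sketched as follows. This is only an illustration under simplifying assumptions: the type tables map a type name to `{field name: referenced type name}`, which is not the real ISF layout, and `prefix_types` and the `wrapper` type are hypothetical. The point is that renaming a Rust type to `rust.<name>` is only safe if every reference between Rust types is rewritten at the same time.

```python
def prefix_types(types, prefix):
    """Rename every type to '<prefix>.<name>' and rewrite references between them.

    `types` maps a type name to {field name: referenced type name}. References
    to names that are not in `types` (for example base types) are left as-is.
    """
    renamed = {name: f"{prefix}.{name}" for name in types}
    return {
        renamed[name]: {
            field: renamed.get(target, target) for field, target in fields.items()
        }
        for name, fields in types.items()
    }


# Hypothetical, heavily simplified type tables; not the real ISF layout.
c_types = {"fs_struct": {"users": "int"}}
rust_types = {
    "fs_struct": {"_unused": "u8[0]"},
    "wrapper": {"inner": "fs_struct"},  # made-up Rust type referencing the binding
}

merged = {**c_types, **prefix_types(rust_types, "rust")}
print(sorted(merged))
print(merged["rust.wrapper"]["inner"])
```

Without the rewrite step, `wrapper.inner` would still point at the unprefixed `fs_struct` and resolve to the C struct, which is the confusion mentioned in the edit note.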

Member

ikelos commented Aug 24, 2024

That seems reasonable if it becomes a unique namespace (which it sounds like rust. or <language>. would). Does anyone have any idea how much effort that would be to add to dwarf2json?

Abyss-W4tcher linked a pull request on Aug 29, 2024 that will close this issue
@mkonshie
Collaborator

Sorry for the delay on this, I was able to discuss this with the dwarf2json maintainers.

This issue and discussion have been about the conflict between Rust and C types. However, we believe that a conflict between Rust and C symbols is also possible. We think that modifying the current schema, rather than adding a prefix to the type names, is probably the best way to avoid these collisions. For example, the new top-level schema could look something like this:

{
    metadata: {},
    base_types: {},
    base_types_rust: {},
    user_types: {},
    user_types_rust: {},
}

Can this new schema work with volatility3, or will changes need to be made there as well?
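As a rough illustration of the proposed layout, the Python sketch below routes type records into the keys named above. The key names come from the proposal; the input records, the `size` field, and the `split_by_language` helper are assumptions made for the example and do not describe dwarf2json's implementation.

```python
def split_by_language(entries):
    """Place Rust types under 'user_types_rust' and everything else under 'user_types'."""
    isf = {
        "metadata": {},
        "base_types": {},
        "base_types_rust": {},
        "user_types": {},
        "user_types_rust": {},
    }
    for entry in entries:
        key = "user_types_rust" if entry["language"] == "DW_LANG_Rust" else "user_types"
        isf[key][entry["name"]] = {"size": entry["byte_size"]}
    return isf


# Hand-written records; the C byte size and language label are placeholders.
entries = [
    {"name": "fs_struct", "language": "DW_LANG_C11", "byte_size": 56},
    {"name": "fs_struct", "language": "DW_LANG_Rust", "byte_size": 0},
]

isf = split_by_language(entries)
print(isf["user_types"]["fs_struct"], isf["user_types_rust"]["fs_struct"])
```

With this split, a reader that only looks up `user_types` keeps seeing the C definition. Whether volatility3 tolerates the extra top-level keys is exactly the open question here, and the sketch does not answer it.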

Separating the user types and the base types should be straightforward, but separating C symbols and Rust symbols will be more complex. This is because symbols can come from different sources, like System.map, DWARF, and the symbol table, and whether Rust and C symbols collide depends on the input source.

I'm currently looking into addressing this, but it will take some time. In the meantime, a solution could be to skip Rust compilation units altogether to avoid the collision, and then add them back after deciding on a solution.
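The interim idea of skipping Rust compilation units could look roughly like the Python sketch below. It models compile units as plain dictionaries; the unit names, the C byte size, and the `collect_types` helper are illustrative assumptions, and this is not the code of the actual fix.

```python
def collect_types(compile_units, skip_languages=("DW_LANG_Rust",)):
    """Collect structs unit by unit, ignoring units written in skip_languages."""
    types = {}
    skipped = []
    for unit in compile_units:
        if unit["language"] in skip_languages:
            skipped.append(unit["name"])  # remember what was left out
            continue
        for struct in unit["structs"]:
            types.setdefault(struct["name"], struct)
    return types, skipped


# Simplified stand-ins for DW_TAG_compile_unit entries and their children.
compile_units = [
    {
        "name": "fs/fs_struct.c",
        "language": "DW_LANG_C11",
        "structs": [{"name": "fs_struct", "byte_size": 56}],  # placeholder size
    },
    {
        "name": "rust/bindings/lib.rs",
        "language": "DW_LANG_Rust",
        "structs": [{"name": "fs_struct", "byte_size": 0}],
    },
]

types, skipped = collect_types(compile_units)
print(types["fs_struct"]["byte_size"], skipped)
```

Keeping a list of skipped units makes it easy to add them back later, once a schema or prefix scheme has been agreed on.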

Author

Abyss-W4tcher commented Aug 29, 2024

Hello, looking at a sample System.map, there is no way to tell with precision which compile unit a symbol originates from, even though some of them are conveniently prefixed with rust_:

ffffffff818095c0 T rust_fmt_argument

Many others cannot be determined precisely:

ffffffff81809570 T _RNvXs0_NvNtNtCsbwHtcUjRN57_6kernel4sync7condvar1__NtB7_7CondVarNtNtNtBb_4init10___internal10HasPinData10___pin_data

Those are exported explicitly in the Ubuntu Rust bindings:

EXPORT_SYMBOL_RUST_GPL(rust_fmt_argument);
EXPORT_SYMBOL_RUST_GPL(_RNvXs0_NvNtNtCsbwHtcUjRN57_6kernel4sync7condvar1__NtB7_7CondVarNtNtNtBb_4init10___internal10HasPinData10___pin_data);

However, I noticed that when these "Rust" symbols are exported through EXPORT_SYMBOL_RUST_GPL, they are labeled under the "GNU C11" compile unit in the vmlinux, and so end up in the same pool as regular C symbols. In other words, the symbols in System.map are not designed to carry a "language" label.

FYI, PR #65 makes use of namespace prefixes, which makes it possible to keep the existing schema while resolving conflicts and separating types and symbols. Of course, it is open for review :).

Edit: even without Rust support, some symbols exist multiple times in the same System.map/symbol list (see https://patchwork.kernel.org/project/linux-kbuild/patch/[email protected]/). This can be checked with awk '{print $3}' System.map | sort | uniq -d.
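For anyone who prefers to run the same duplicate check from a script, here is a small Python equivalent of the awk pipeline. The `duplicate_symbols` helper and the sample text are made up for illustration; only the rust_fmt_argument line is taken from this thread.

```python
from collections import Counter


def duplicate_symbols(system_map_text):
    """Return symbol names that appear more than once in System.map-style text."""
    names = []
    for line in system_map_text.splitlines():
        parts = line.split()
        if len(parts) >= 3:  # columns: address, type, name
            names.append(parts[2])
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Made-up sample; apart from rust_fmt_argument, addresses and names are placeholders.
sample = """\
ffffffff818095c0 T rust_fmt_argument
ffffffff81000000 t dup_helper
ffffffff81000040 t dup_helper
ffffffff81000080 T unique_symbol
"""

duplicates = duplicate_symbols(sample)
print(duplicates)
```

On a real file, the function can be fed the contents of System.map, for example via `open("System.map").read()`.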

Collaborator

mkonshie commented Oct 8, 2024

I opened a PR that will be a short-term fix for this problem and will unblock existing plugins. I will keep this issue open to continue discussing how Rust types/symbols should be integrated and how that will affect the current schema.
