Thunks: Analyze data layout to detect platform differences #3156

neobrain · 2023-09-25T21:46:48Z

Overview

FEX's thunking currently assumes that parameters passed across architecture boundaries use the same data layout on both sides. For example, a function of the form void CreateObject(ConfigType* config) can only be handled by FEX if ConfigType has the same members (with the same bit sizes) on both x86 and on ARM. This is generally a safe assumption when thunking 64-bit guest applications, but it fully falls apart for 32-bit guests (since pointer members of any structure will be 32-bits instead of 64-bit). However even on 64-bit, it's desirable to have some assurance that we're indeed operating on consistent data.

This PR adds a data layout analysis framework as the first step towards solving these problems. The framework is integrated as a preprocessing pass in the thunk generator. This new pass performs these tasks:

it gathers all argument types used in any thunked functions
it queries their data layout from libclang on both x86 and ARM to build an internal type name -> data layout map for each architecture
finally, it compares the two maps to detect any potential differences

For simple libraries, the data layout analysis will detect a perfect match such that no additional work is needed. For more complex libraries, future PRs will add annotations to the interface files that allow to resolve data type differences semi-automatically via "struct repacking". Since most of our current thunk libraries fall in the latter category, analysis is only enabled for testing.

Implementation details

Since data layout is queried from libclang, we must run a large chunk of logic twice to get accurate data for both the guest and the host architecture. The thunk generator will first parse the thunk interface file and query data layouts for the guest architecture, after which it repeats the same for the host architecture.

It's easy to see that this gets very intertwined quickly, especially if we want a testable design.

To facilitate separation of concerns, a class-hierarchy approach is used for the implementation hence. #3067 already initiated this by splitting interface parsing and boilerplate code emission into an AnalysisAction and a GenerateThunkLibsAction, respectively. The new functionality affects AnalysisAction and adds a number of new modules on top, illustrated in this diagram:

classDiagram
    AnalysisAction
    AnalysisAction <|-- AnalyzeDataLayoutAction
    AnalysisAction <|-- DataLayoutCompareAction
    DataLayoutCompareAction <|-- DataLayoutCompareActionForTest
    DataLayoutCompareAction <|-- GenerateThunkLibsAction
  
    class AnalysisAction {
        - Parses interface files
        - Analyzes data layout factoring in annotations
    }
    class AnalyzeDataLayoutAction {
        - Runs AnalysisAction for the guest

        - Turns any libclang objects into strings so they
        can be used across sessions
    }
    class DataLayoutCompareAction {
        - Runs AnalysisAction for the host
        - Takes the output of an AnalyzeDataLayoutAction
        for the guest arch and detects any differences
        between the guest/host results
    }
    class DataLayoutCompareActionForTest {
        - Thin layer around DataLayoutCompareAction
        used for unit-testing data layout comparison.
        - Mainly deals with lifetime properties
        of libclang objects
    }
    class GenerateThunkLibsAction {
        - Emits thunk boilerplate, factoring in results
        of the DataLayoutCompareAction
    }

Impact

This feature is currently only enabled for unit tests. To enable it for real libraries, new annotations must be added and applied to the corresponding interface files. This has been done in the preview change set in #3135, which supports all of our existing thunk libraries.

These annotations are planned to be added on a per-type basis, so that we won't need to go through each function individually. That helps cut down the work involved tremendously.

neobrain

Adding some notes for personal reference.

unittests/ThunkLibs/common.h

ThunkLibs/include/common/GeneratorInterface.h

neobrain · 2023-09-25T21:50:56Z

ThunkLibs/GuestLibs/CMakeLists.txt

+  if (BITNESS EQUAL 32)
+    target_compile_definitions(${NAME}-guest-deps INTERFACE IS_32BIT_THUNK)
+  endif ()


Changes related to 32-bit support should probably be moved to a future PR, too.

Actually this is needed for libfex_thunk_test, which we're already testing the 32-bit build of. Not including this here would require adding some temporary workaround that would be removed one PR later anyway.

ThunkLibs/Generator/data_layout.h

ThunkLibs/GuestLibs/CMakeLists.txt

unittests/ThunkLibs/abi.cpp

neobrain · 2023-09-26T16:15:15Z

unittests/ThunkLibs/common.h

 /**
 * Run the given ToolAction on the input code.
 *
 * The "silent" parameter is used to suppress non-fatal diagnostics in tests that expect failure
 */
-inline void run_tool(clang::tooling::ToolAction& action, std::string_view code, bool silent = false) {
+inline void run_tool(clang::tooling::ToolAction& action, std::string_view code, bool silent = false, std::optional<GuestABI> guest_abi = std::nullopt) {


libclang will leak memory in invocation.run() when the given ToolAction throws an exception. This is annoying but otherwise harmless. To allow AddressSanitizer to run cleanly anyway, we may consider defining something like this here though to disable memory leak checking for thunk unit tests:

const char* __asan_default_options() { return "detect_leaks=0"; }

Sonicadvance1

I'm liking what I'm seeing. The unit tests are great to see!

unittests/ThunkLibs/abi.cpp

Sonicadvance1 · 2023-09-29T16:35:02Z

unittests/ThunkLibs/abi.cpp

+        CHECK(action->GetTypeCompatibility("struct A") == TypeCompatibility::None);
+    }
+
+if (false) // TODO: Currently fails


Curly braces around for if or fix todo I guess.

Checking if the guest layout misses any members from the host layout would be a bit annoying, but I found an elegant way to implement the tested behavior:

The analysis pass already verified that the host layout doesn't missed any members from the guest layout (i.e. the reverse case). To extend this to a "both-way match", we can simply add a check for equal member count!

These are not supported yet.

neobrain · 2023-10-02T20:08:15Z

Branch updated (see changes):

Minor cleanups and implemented all easy TODOs (others refer to open questions or will be addressed in follow-up PRs)
Now disabling ASAN leak detection for tests
Addressed review feedback
Fixed and re-enabled tests previously marked as failing

Good to merge from my end 🙂

The set of these types is tracked in AnalysisAction, to which extensive verification logic is added to detect potential incompatibilities and to enforce use of annotatations where needed.

This adds a ComputeDataLayout function that maps a set of clang::Types to an internal representation of their data layout (size, member list, ...).

This runs the data layout analysis pass added in the previous change twice: Once for the host architecture and once for the guest architecture. This allows the new DataLayoutCompareAction to query architecture differences for each type, which can then be used to instruct code generation accordingly. Currently, type compatibility is classified into 3 categories: * Fully compatible (same size/alignment for the type itself and any members) * Repackable (incompatibility can be resolved with emission of automatable repacking code, e.g. when struct members are located at differing offsets due to padding bytes) * Incompatible

…void parameters

…test code

Sonicadvance1

Zip zap zoop, in to main it goes

neobrain mentioned this pull request Sep 25, 2023

[Not for review] Thunks: Implement data layout analysis and struct repacking. Add 32-bit support to Wayland and Vulkan #3135

Closed

neobrain commented Sep 25, 2023

View reviewed changes

ThunkLibs/GuestLibs/CMakeLists.txt Outdated Show resolved Hide resolved

neobrain force-pushed the feature_thunk_data_layout_analysis branch from 79c3e22 to cf2b571 Compare September 26, 2023 09:19

neobrain commented Sep 26, 2023

View reviewed changes

unittests/ThunkLibs/abi.cpp Show resolved Hide resolved

neobrain force-pushed the feature_thunk_data_layout_analysis branch from cf2b571 to 508cdf1 Compare September 26, 2023 15:20

neobrain commented Sep 26, 2023

View reviewed changes

neobrain force-pushed the feature_thunk_data_layout_analysis branch from 508cdf1 to 56ee4c1 Compare September 28, 2023 20:41

Sonicadvance1 reviewed Sep 29, 2023

View reviewed changes

neobrain force-pushed the feature_thunk_data_layout_analysis branch from 878d6b8 to 899d6a5 Compare October 2, 2023 20:02

neobrain added 2 commits October 2, 2023 22:03

Thunks: Disable 32-bit host thunks

7791e00

These are not supported yet.

Thunks/gen: Rename EmitOutput to OnAnalysisComplete

d65d299

neobrain force-pushed the feature_thunk_data_layout_analysis branch from 899d6a5 to 36fedfd Compare October 2, 2023 20:03

neobrain added 7 commits October 2, 2023 22:18

Thunks/gen: Track data types passed across architecture boundaries

371bf50

The set of these types is tracked in AnalysisAction, to which extensive verification logic is added to detect potential incompatibilities and to enforce use of annotatations where needed.

Thunks/gen: Add data layout analysis

070fa9f

This adds a ComputeDataLayout function that maps a set of clang::Types to an internal representation of their data layout (size, member list, ...).

Thunks/gen: Enforce type compatibility for function parameters

2b472cb

unittests/ThunkLibs: Add data layout tests

b04b054

unittests/ThunkLibs: Add various tests for structs repacking and for …

2d9e816

…void parameters

unittests/ThunkLibs: Specify clang resource directory when compiling …

fe681ab

…test code

neobrain force-pushed the feature_thunk_data_layout_analysis branch from 36fedfd to fe681ab Compare October 2, 2023 20:19

Sonicadvance1 approved these changes Oct 4, 2023

View reviewed changes

Sonicadvance1 merged commit 48fa4f1 into FEX-Emu:main Oct 4, 2023
9 checks passed

neobrain deleted the feature_thunk_data_layout_analysis branch October 4, 2023 15:10

neobrain mentioned this pull request Oct 4, 2023

Thunks: Add new pointer annotations to assist data layout analysis #3177

Merged

neobrain mentioned this pull request Oct 25, 2023

Thunks: Annotate pointer parameters throughout all thunked libraries #3224

Merged

neobrain mentioned this pull request Dec 8, 2023

Library Forwarding: Emit layout helpers to allow repacking struct data #3318

Merged

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thunks: Analyze data layout to detect platform differences #3156

Thunks: Analyze data layout to detect platform differences #3156

neobrain commented Sep 25, 2023 •

edited

Loading

neobrain left a comment

neobrain Sep 25, 2023

neobrain Sep 26, 2023

neobrain Sep 26, 2023 •

edited

Loading

Sonicadvance1 left a comment

Sonicadvance1 Sep 29, 2023

neobrain Oct 2, 2023

neobrain commented Oct 2, 2023

Sonicadvance1 left a comment

Thunks: Analyze data layout to detect platform differences #3156

Thunks: Analyze data layout to detect platform differences #3156

Conversation

neobrain commented Sep 25, 2023 • edited Loading

Overview

Implementation details

Impact

neobrain left a comment

Choose a reason for hiding this comment

neobrain Sep 25, 2023

Choose a reason for hiding this comment

neobrain Sep 26, 2023

Choose a reason for hiding this comment

neobrain Sep 26, 2023 • edited Loading

Choose a reason for hiding this comment

Sonicadvance1 left a comment

Choose a reason for hiding this comment

Sonicadvance1 Sep 29, 2023

Choose a reason for hiding this comment

neobrain Oct 2, 2023

Choose a reason for hiding this comment

neobrain commented Oct 2, 2023

Sonicadvance1 left a comment

Choose a reason for hiding this comment

neobrain commented Sep 25, 2023 •

edited

Loading

neobrain Sep 26, 2023 •

edited

Loading