FEX-2207
Read the blog post at FEX-Emu's Site!
This is going to be a very interesting release this month for users. Quite a large number of features landed for this release!
Automatic TSO mode migration
When FEX is running a single-threaded application, we can be optimistic and disable heavy TSO-emulation related features. This significantly speeds up some single-threaded applications. Once the program creates a thread, FEX will disable this optimization and clear its code cache to be safe.
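The policy described above can be sketched as a toy state machine. This is only an illustration under stated assumptions, not FEX's actual implementation: the class name, the `compile_block` and `on_thread_create` methods, and the string "blocks" are all hypothetical.

```python
class TSOMigrationSketch:
    """Toy model of the optimistic TSO policy; all names are hypothetical."""

    def __init__(self):
        # Optimistic start: one thread, so TSO emulation is switched off.
        self.tso_enabled = False
        # Guest address -> placeholder string standing in for compiled code.
        self.code_cache = {}

    def compile_block(self, guest_addr):
        # Blocks are "compiled" under the memory model active at the time.
        mode = "tso" if self.tso_enabled else "relaxed"
        return self.code_cache.setdefault(
            guest_addr, f"block@{guest_addr:#x}[{mode}]"
        )

    def on_thread_create(self):
        # The first extra thread ends the optimisation: turn TSO emulation
        # back on and drop every block compiled without it.
        if not self.tso_enabled:
            self.tso_enabled = True
            self.code_cache.clear()
```

Clearing the whole cache is the simple, safe choice here: any block compiled in relaxed mode may be missing the ordering guarantees a multi-threaded program relies on.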
EroFS rootfs image support
While FEX has supported SquashFS for a long time, we are now adding support for EroFS as well. The big advantage of EroFS is that it doesn't serialize accesses to a single thread. When you have dozens of threads accessing the filesystem, this is a real bottleneck. Low-end devices would end up with a single CPU core maxed out inside of the squashfuse application while multiple threads were trying to request data.
erofsfuse solves this by allowing multi-threaded decompression that scales quite well depending on the number of file requests in flight. We can see how this scales in the following benchmark graphs.
As one can see, erofsfuse scales quite well with multiple threads, while squashfuse stays pretty much flat the entire time. The downside of EroFS is that the compression ratio of its LZ4HC compression isn't quite as good as ZSTD, causing the rootfs to be larger. But the reduction in memory usage, lower read amplification, and higher bandwidth are worth it. This seriously improves the performance of using a rootfs over a network-mapped share, as some people do.
An additional problem is that the erofsfuse application requires erofs-utils version 1.5, which came out on 2022-06-13. This is really bleeding edge currently.
Nevertheless, FEXRootFSFetcher will now allow you to download a FEX rootfs image with this compression format. Just ensure you have erofsfuse installed.
FEXServer
This is a fairly significant change to how FEX-Emu operates in the background. Similar to how wine has a wineserver, FEX now requires a FEXServer to always be running.
For now, the FEXServer is taking over duty for rootfs image mounting and acting as a logging server. In the future this will be expanded to also handle code caching services and more. FEXServer will automatically start on invocation of FEX and keep running in the background until all instances of FEX close.
Pressure-vessel and Proton Fixes
FEX-Emu now officially works inside of pressure-vessel. This is the tool that Steam uses for running Proton games. Thunking doesn't yet work in this case but it is coming.
If you want to test Proton games, make sure to sign up to the latest SteamLinuxRuntime_soldier beta in the settings and give it a go. It's not currently the speediest, but it should work.
Disable FEXServer rootfs when running under pressure-vessel
Pressure-vessel sets up an x86-64 rootfs. FEX shouldn't be using the FEXServer provided rootfs in this case.
We now detect when FEX is running inside of pressure-vessel, and disable the FEXServer rootfs.
Enable Hypervisor bit
This change allows pressure-vessel to detect FEX-Emu and do FEX-specific setup for games.
Fix open syscall path emulation
The open syscall is fairly rarely used, so this has gone unnoticed for a while. We weren't wrapping this syscall in our filesystem emulation, which was preventing applications from running. With this fixed, the latest Proton Experimental branch from Steam now works!
Support thunks in pressure-vessel
Pressure-vessel uses a bunch of environment variable overrides to change where libraries are located inside of its chroot. FEX now supports this. While this is a step towards getting thunks working inside of pressure-vessel, they are not yet supported there.
Thunks
There are lots of improvements to thunks, and it's hard to capture them all. There is a heavy amount of infrastructure work going on here to make thunks more robust and stable, starting with Vulkan and GL.
- Work around lack of generic callback support in VK_EXT_debug_report (4771a34)
- Disable debug report callback (751b66d)
- Allow building thunks on a wider range of platforms (ad6fd5a)
- Add fex:is_lib_loaded (88b94be)
- Support returning host function pointers to the guest (04a1ac9)
Fix clone3 syscall's stack pointer again
In an edge case of how FEX-Emu handles clone3, it wasn't handling the stack pointer size correctly again.
Resolving this edge case once again gets Steam's web helper working with glibc 2.34.
Fix 32-bit memory allocation range scanning
When scanning for free chunks of memory in the 32-bit range, FEX-Emu needs to use a custom allocator to ensure everything returned ends up in the lower 32-bit memory space. This fixes a bug where large allocations would never find an empty space. Fixes X-Plane 11!
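To make the idea of scanning for a free chunk below the 4 GiB boundary concrete, here is a minimal first-fit sketch. It is not FEX's allocator: the function name, the `(start, end)` mapping representation, and the `low` default are assumptions chosen for illustration.

```python
def find_free_range(mappings, size, low=0x10000, high=1 << 32):
    """Return the lowest start address of a free gap of `size` bytes.

    `mappings` is a list of half-open (start, end) ranges already in use.
    Only addresses in [low, high) are considered, so any result fits in
    the lower 32-bit address space. Returns None if no gap is big enough.
    """
    cursor = low
    for start, end in sorted(mappings):
        # Is the gap between the cursor and the next mapping large enough?
        if start - cursor >= size:
            return cursor
        # Otherwise skip past this mapping (it may overlap earlier ones).
        cursor = max(cursor, end)
    # Finally, check the tail between the last mapping and the limit.
    if high - cursor >= size:
        return cursor
    return None
```

For example, with mappings at `(0x10000, 0x20000)` and `(0x30000, 0x40000)`, a request for `0x10000` bytes lands in the hole at `0x20000`, while a larger request falls through to the space after the last mapping. Checking that tail is the kind of case a scanner has to get right for large allocations.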
Optimize file descriptor to filename mapping
It is a common occurrence that FEX needs to map an open file descriptor back to a file path. This used to take 14 system calls.
Since each system call was querying filesystem metadata, these could take some time. With the optimized approach it now takes only one system call, significantly lowering file IO overhead!
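The post doesn't say which system call the optimized path uses. One common single-call way to do this on Linux is to read the `/proc/self/fd/<fd>` symlink, so the sketch below assumes that approach; `fd_to_path` is a hypothetical helper name, not a FEX function.

```python
import os
import tempfile


def fd_to_path(fd: int) -> str:
    # Linux-only assumption: /proc/self/fd/<fd> is a symlink to the open
    # file, so a single readlink call recovers its path.
    return os.readlink(f"/proc/self/fd/{fd}")


# Small demonstration: resolve the path of a freshly opened temporary file.
with tempfile.NamedTemporaryFile() as tmp:
    resolved = fd_to_path(tmp.fileno())
    expected = os.path.realpath(tmp.name)
```

On a Linux system `resolved` and `expected` refer to the same file. The link target reflects the path the kernel has on record, which is why the comparison goes through `os.path.realpath`.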
Enable Wine application profiles
While executing, Wine applications typically only showed up as wine or wine-preloader to FEX-Emu.
Now we work around this issue by scanning the arguments to find the executable name, which allows application profiles to function.
Now we can easily support SonicMania.exe.json!
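As a rough illustration of the argument scan, the sketch below looks past the Wine launcher for the first argument ending in `.exe` and derives a profile file name from it. The function name, the launcher set, and the `.exe` matching rule are assumptions; FEX's real logic may differ.

```python
import ntpath

# Launcher names taken from the text above.
WINE_LAUNCHERS = {"wine", "wine-preloader"}


def profile_name(argv):
    """Guess the application profile file name for a command line."""
    # ntpath splits on both '/' and '\\', so it copes with Unix launcher
    # paths as well as Windows-style paths to the game executable.
    program = ntpath.basename(argv[0])
    if program in WINE_LAUNCHERS:
        for arg in argv[1:]:
            if arg.lower().endswith(".exe"):
                program = ntpath.basename(arg)
                break
    return program + ".json"
```

With this sketch, `["/usr/bin/wine-preloader", "/usr/bin/wine", "C:\\Games\\SonicMania.exe"]` maps to `SonicMania.exe.json`, while a native program such as `/usr/bin/ls` keeps its own name.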
FEXRootFSFetcher fix to file hashing
It was discovered that this tool was hashing files incorrectly. The new version now hashes correctly, and the image files have been updated to use the new hash. Nothing to see here.
Fix 32-bit DRM ioctl DRM_IOCTL_WAIT_VBLANK
This ioctl does exactly what it says on the tin. Due to a copy and paste error, this wasn't actually waiting on vblank.
Fix 32-bit ioctl structure copying
A feature of the DRM subsystem allows you to extend ioctl struct definitions safely. The kernel knows the size of the ioctl structure, and if it differs from what the userspace application passes in, then it will only copy the smaller amount of data and zero out the rest.
This allows older userspaces to safely work with newer kernels. FEX wasn't reproducing this with its ioctl emulation in some V3D ioctls, resulting in unsafe execution of ioctls. This has been resolved.
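The copy-the-smaller-size-and-zero-the-rest rule can be modelled on plain byte strings. This is only a sketch of the behaviour described above, with a hypothetical function name, not kernel or FEX code.

```python
def copy_ioctl_struct(user_data: bytes, kernel_size: int) -> bytes:
    """Build the kernel-side view of an ioctl argument structure."""
    # Copy only as many bytes as both sides agree on...
    n = min(len(user_data), kernel_size)
    # ...and zero-fill whatever the caller did not provide.
    return user_data[:n] + bytes(kernel_size - n)
```

An older application passing a 2-byte struct to a kernel expecting 4 bytes gets its 2 bytes followed by zeros. A newer application passing more than the kernel knows about simply has the extra bytes ignored.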
Support CLMUL Extension
This instruction is heavily used to accelerate CRC and other hashing algorithms. It perfectly matches the AArch64 instruction as well, so implementing this was very straightforward!
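For readers unfamiliar with carry-less multiplication, the operation is an ordinary shift-and-add multiply in which the additions are replaced by XOR, so no carries propagate between bit positions. The reference sketch below models a 64x64-bit carry-less multiply; the function name is illustrative, and this is not how FEX implements the instruction.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less multiply of two 64-bit values (result is up to 127 bits)."""
    a &= MASK64
    b &= MASK64
    result = 0
    for i in range(64):
        # For every set bit in b, XOR in a shifted copy of a.
        if (b >> i) & 1:
            result ^= a << i
    return result
```

For instance, `clmul64(0b11, 0b11)` is `0b101`, whereas ordinary multiplication would give `0b1001`. This is multiplication of polynomials over GF(2), which is why it shows up in CRC computations.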
Self-modifying-code frontend improvements
Allows FEX to track code pages inside our frontend decoding. This fixes some issues where code can be changed while we are decoding things in the frontend. Now FEX can detect this and throw away what it compiled.
Developer specific improvements
Check for binfmt_misc conflict before installing
To ensure building from source doesn't result in a broken configuration, cmake will now check for conflicting binfmt_misc files before installing.
How to uninstall the conflicting binfmt_misc files is specific to how the user has installed them, so it is left up to them to find out how.
Auto CI fetching
If the CI systems need an updated rootfs, the config can now be updated and they will fetch the latest.
unittests no longer recompile forever
ASM unittests would always reglob when building, which took time. This is now fixed.
Fix ASAN bug in how register allocation data was allocated
This was hard to track down. This annoying bug, which has gone back and forth a bit, has finally been resolved!
ARM64 CPU feature detection for ASM unit tests
Automatically disables some incompatible unit tests on ARM64 devices that don't support some features. No more confusing failures.
GDB integration
This allows a plugin to be loaded in GDB to show more information than we would otherwise have, giving us both backtraces and source inside of GDB even through the JIT. This should make debugging the JIT that much easier.
Raw Changes
- AOTIR
  - Fix IRList delete (fb41ba1)
  - Fix RAData free (9242e59)
- Arm64
  - EncryptionOps
    - Fix register specifiers in PCLMUL movs (63b70ff)
  - JIT
    - Use IR names in opcode implementations (19b0a9c)
- Backends
  - Unified dispatch, interface rework, cleanups (072690a)
- CI
  - Auto rootfs fetching (c027ace)
- CMAKE
  - Create directories during configuration, fixes endless generation of unittests (e62bc24)
- CMake
  - Check for binfmt_misc conflicts before install (6d2f98a)
- CPUID
  - Enable the hypervisor bit (da8dbf1)
- Common
  - Support application profiles for games launched through wine (3913dd6)
- Config
  - Fixes AppConfig for wine-preloader (ae6a57e)
- Context
  - Fix CreateThread partial initialization issue (eac579f)
  - Decouple from CodeLoader, introduce generic CustomIREntrypoints (b0a31f7)
- CoreState
  - Add register size constants (1c1ad87)
- Dispatcher
  - Arm64
    - Fix vixl assert (0f696c6)
- Dispatchers
  - Use thread local emitters for backend callbacks (4f9bc70)
- FDUtils
  - Fix get_fdpath (aa17f64)
- FEXGDBReader
  - Fix install path (b3e090c)
- FEXRootFSFetcher
  - Update and fix xxhash file hashing (c0a8984)
- FEXServer
  - Stop leaking FDs to subprocesses (b7806e4)
  - Minor changes (1f1d070)
  - Adds -w option for waiting on current FEXServer (b020e59)
  - Adds new FEXServer service (5a19425)
- FEXServerClient
  - When running under pressure-vessel don't use FEXServer rootfs (9110546)
- IPR
  - Store copy of IRLists, Dispatcher cleanups (790bd97)
- IR
  - Remove GuestCallDirect, GuestCallIndirect (e4d659a)
  - add IsFragmentExit, IsBlockExit (2d3c6ef)
- Invalidations
  - Move invalidation locks to Context (ffcde18)
- Ioctl
  - Safely access v3d csd ioctl structure (158ba1a)
- Ioctl32
  - Fix DRM_IOCTL_WAIT_VBLANK (982518d)
- JITs
  - Qualify external includes consistently (b05e5ce)
- Linux
  - Fixes for clone3 stack size (1494aac)
  - Make get_fdpath more optimal (542ab04)
  - Fixes 32-bit allocator range scanning (4c73c71)
  - Fixes open syscall emulated path handling (4323493)
- OpcodeDispatcher
  - Handle CLMUL opcode extension (9e9ceb3)
- SMC
  - Track code pages before frontend decode (bbd9eb5)
- Scripts
  - Allow user override on tagged version (aafe7ff)
- TSO
  - Add auto migration optimisation for applications that don't need TSO (c99d1e4)
- Tests
  - IRLoader
    - Silence missing override warning (ade3a52)
- ThunkLibs
  - Fix Guest.h (302a6c9)
  - silence warnings (30a28ff)
  - vulkan
    - Work around lack of generic callback support in VK_EXT_debug_report (4771a34)
- Thunks
  - Implement generic callback support (aec5b21)
  - Soften error condition to be non-fatal (3b8491b)
  - Adds libvulkan steam pinned library thunking support (6b226dd)
  - Fix std::set crash (6a43db8)
  - Add fex:is_lib_loaded (88b94be)
  - Support returning host function pointers to the guest (04a1ac9)
  - Support pressure-vessel prefixes (e2e6f2a)
  - vulkan
    - Disable debug report callback (751b66d)
- ThunksDB
  - Fix String.find error check (d8fa53a)
- ValueDominanceValidation
  - Avoid stack exhaustion when aggregating predecessors (124097d)
- Vulkan
  - Handle queries for unknown functions more gracefully (e137c2e)
- Misc
  - Support EroFS (46fcbe2)
  - Allow building thunks on a wider range of platforms (ad6fd5a)
  - Fix inconsistent allocation schemes used for RegisterAllocationData (8a21eca)
  - Make Dispatcher per Context from per Thread, Simplify TestHarnessRunner (0a62a4c)
  - IR.json: Correct 'Dest' key to 'Desc' (51c5f94)
- gdb
  - jit integration (4449b60)
- unittests
  - Classify CPU based on CPU features (d005fdc)
  - Disable known flake in posix tests (a97fb2f)
  - Add FEXLinuxTests with a few tests (cb0935c)
  - ThunkLibs
    - Fix test failures due to missing FEX_PACKFN_LINKAGE define (a2c9d5a)