Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Code always built with -msse4.1 leads to crashes on older CPUs #493

Open
BtbN opened this issue Dec 15, 2024 · 11 comments
Open

Code always built with -msse4.1 leads to crashes on older CPUs #493

BtbN opened this issue Dec 15, 2024 · 11 comments
Labels
bug Something isn't working library Concerns the VVenC library

Comments

@BtbN
Copy link

BtbN commented Dec 15, 2024

Since vvenc seems to always set various CPU feature compiler options unconditionally (only checking if the compiler understands them), this leads to the generated code to crash near immediately on any CPU that does not support AVX2.

See also BtbN/FFmpeg-Builds#411

The issue is made even worse by the use of static initializers like

PelBufferOps g_pelBufOP = PelBufferOps();

So even if for example an ffmpeg user doesn't even want to do anything with vvenc, ffmpeg will crash instantly on startup, since the static initializer runs and was generated with advanced simd instructions by gcc.

What confuses me a bit about this is that there's clearly some degree of runtime simd level detection in https://github.com/fraunhoferhhi/vvenc/blob/master/source/Lib/CommonLib/x86/InitX86.cpp, but it'll be mostly useless if the code generated by the compiler itself uses avx2 everywhere.

A possible workaround to at least prevent the crashing static initializers would be to build specifically those with a generic arch, by attaching some compiler attribute or moving them to their own file or something.

@K-os
Copy link
Contributor

K-os commented Dec 16, 2024

Actually, the -mavx2 flag is only used for objects with code that should be called for AVX2 supporting CPUs, only. How exactly do you build VVenC?

Can you please try if the problem is solved by disabling link-time optimization (add -DVVENC_ENABLE_LINK_TIME_OPT=0 to the cmake call). We've had problems before with AVX2 code leaking to other objects due to link-time optimization.

@BtbN
Copy link
Author

BtbN commented Dec 16, 2024

https://github.com/BtbN/FFmpeg-Builds/blob/master/scripts.d/50-vvenc.sh that's the build script.
Basically nothing fancy. Static library only, already disabling LTO since it messes with static linking.

The file in question that specifically causes the SIGILL crash is aforementioned Buffer.cpp, and in there the static initializer g_pelBufOP.

@K-os
Copy link
Contributor

K-os commented Dec 16, 2024

Hmm, very interesting.

The static initializer for g_pelBufOP just sets some function pointers to the default C implementations of several functions. So, actually no need for the compiler to emit any SIMD code there.

The Buffer.cpp file is compiled without -mavx2, but with -msse4.1 . Are you sure the problematic CPU is only missing AVX2 support or could it also be missing support of SSE4.1?

@BtbN
Copy link
Author

BtbN commented Dec 16, 2024

I didn't actually check which extension the instruction(pinsrq) belongs to. It's indeed SSE4.1 and not anything AVX.
The user reporting this is using an old Phenom II CPU, so either extension is missing there.

I tried setting some compiler attributes on the static initializer, in an attempt to stop the compiler from emitting any advanced instructions, but with no success.
Moving the static initializers(There is at least one other one in TrQuant_EMT.cpp) to their own file and compiling it them with minimal instruction flags would be an option, but it'd also be relatively involved due to needing to access all the other functions.

@K-os
Copy link
Contributor

K-os commented Dec 16, 2024

Ok, now it all makes sense. We never thought anybody would use such an old CPU to encode VVC, so we took SSE4.1 as given. But it's clearly a problem if these instructions are in the static initializers.

A quick workaround could be to add the function attribute __attribute__(( target( "no-sse" ) )) to the function called by the static initializer. Unfortunately, I don't think we have access to such old machines to debug the problem.
Could you please try if adding the attribute like this in Buffer.h:78 solves the issue for this specific location?

  PelBufferOps() __attribute__(( target( "no-sse" ) ));

This would need to be done for all static initialization code, but if it stops crashing in PelBufferOps(), we know we are on the right path.

A long term fix could be getting rid of all the static initialization, but I think that needs some more work.

@BtbN
Copy link
Author

BtbN commented Dec 16, 2024

I can't test/debug at the moment, since I'm at work. Will do later tonight.
I had the same problem though, and the way I used to reproduce this was to spin up a VM on Proxmox/qemu and turn off KVM acceleration in the options so it actually emulates, and set the CPU type to qemu64.

@BtbN BtbN changed the title Code always built with -mavx2 leads to crashes on older CPUs Code always built with -msse4.1 leads to crashes on older CPUs Dec 16, 2024
@BtbN
Copy link
Author

BtbN commented Dec 16, 2024

I've managed to get past the Buffer.cpp initializer, but it does not work properly with the Rom.cpp one sadly.
Here's what I have tried: master...BtbN:vvenc:fix_global_init
No matter what I set there, it keeps generating some instructions from the sets that are explicitly disabled, like for example pshufb for the int16_t rho = line.

@BtbN
Copy link
Author

BtbN commented Dec 16, 2024

I've given up on trying to stop gcc from emitting sse instructions.
So instead I just got rid of static initializers: BtbN@f3ae9d6
Not the prettiest, but it does work and stops the crash on ffmpeg startup.

@adamjw24
Copy link
Member

I guess we also add the -msse4.1 because its mandated on x86_64 architectures. We could remove it for 32-bit builds. It would require code-changes at a few points where inline SIMD is used (it would not be used at runtime, but cause compilation failures if the -msse4.1 is removed during compilation). I will try and prepare something soon.

@BtbN
Copy link
Author

BtbN commented Dec 17, 2024

SSE4.1 is not mandated by amd64/x86_64, only SSE2 is.
According to gcc feature levels, sse4.1 and sse4.2 don't appear until v2.

I also don't think it's neccesary to remove it. Requiring SSE4.1 or x86-64-v2 is super fair.
It's just a bit unfortunate that global initializers use it.

Shall I PR the Patch that gets rid of static initializers? It solves the issue, though is not very pretty yet and will likely need a few iterations.

@adamjw24 adamjw24 added bug Something isn't working library Concerns the VVenC library labels Jan 2, 2025
@adamjw24
Copy link
Member

adamjw24 commented Jan 2, 2025

Ok, now it all makes sense. We never thought anybody would use such an old CPU to encode VVC, so we took SSE4.1 as given. But it's clearly a problem if these instructions are in the static initializers.

A quick workaround could be to add the function attribute __attribute__(( target( "no-sse" ) )) to the function called by the static initializer. Unfortunately, I don't think we have access to such old machines to debug the problem. Could you please try if adding the attribute like this in Buffer.h:78 solves the issue for this specific location?

  PelBufferOps() __attribute__(( target( "no-sse" ) ));

This would need to be done for all static initialization code, but if it stops crashing in PelBufferOps(), we know we are on the right path.

A long term fix could be getting rid of all the static initialization, but I think that needs some more work.

Could we test this with a VM as described?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working library Concerns the VVenC library
Projects
None yet
Development

No branches or pull requests

3 participants