Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differing shift behavior between u8x16 and u16x8 #241

Open
vadixidav opened this issue Apr 26, 2019 · 6 comments
Open

Differing shift behavior between u8x16 and u16x8 #241

vadixidav opened this issue Apr 26, 2019 · 6 comments
Labels
P-High Unsound Something breaks Rust safety guarantees

Comments

@vadixidav
Copy link

vadixidav commented Apr 26, 2019

I was executing something similar to the following code:

use packed_simd::{u16x8, u8x16};

fn main() {
    let x: u16x8 = u16x8::from([0xFFFF; 8]);
    let counted = x.count_ones();
    let x = (u16x8::from([1; 8]) << counted) - 1;
    assert_eq!(x, u16x8::from([0xFFFF; 8]));

    let x: u8x16 = u8x16::from([0xFF; 16]);
    let counted = x.count_ones();
    let x = (u8x16::from([1; 16]) << counted) - 1;
    assert_eq!(x, u8x16::from([0xFF; 16]));
}

If you run this, it will fail on the second assertion. The reason why is because when you shift left with u16x8, if the shift amount exceeds or equals the width of the lane (in this case 16 bits), then the result is 0, which is result you normally get with horizontal instructions. However, if we have a u8x16 and we shift left, I get different behavior. Specifically, the shift amount is modded by the number of bits (in this case 8) before shifting. So, if you shift left by 8, it actually shifts left by 0, meaning that it is unable to populate the lane with 1. I have worked around this problem with the following:

let x = (u8x16::from([1; 16]) << counted ^ counted >> 3) - 1;

What this does is it XORs the 1 bit from the 8 with the 1 that remains in bit position 0 in the number so that it properly becomes 0 before subtracting one. I feel like, under the hood, this operation should be made consistent across all architectures by checking if any bit is set in bit position 3 or above in the shift amount and making a mask from that to choose between 0 or the shifted number.

One alternative is to do the opposite and support the mod by lane width behavior for all widths.

I think that both of these shifting operations should be supported (perhaps wrapping_shift and overflowing_shift, but maybe a better name). However, the correct default behavior should be to support the same behavior Rust has with normal numbers, which is that shifts over the bit width should give back 0. So Shl and Shr should have this behavior for consistency.

System

I am running on a Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz.

I am compiling with the default rustc flags in both Debug and Release. Please let me know if you want me to test this in other configurations because I do wish for this to work consistently, since I am using it very extensively for performance-critical code.

@vadixidav
Copy link
Author

I tested it in --release mode and I actually get different behavior:

thread 'main' panicked at 'assertion failed: `(left == right)`
  left: `u16x8(0, 0, 0, 0, 60080, 33969, 21872, 0)`,
 right: `u16x8(65535, 65535, 65535, 65535, 65535, 65535, 65535, 65535)`', src/main.rs:7:5

This looks like undefined behavior to me.

@vadixidav
Copy link
Author

vadixidav commented Apr 26, 2019

I cannot reproduce the above behavior in another program I wrote in release mode, but the undefined behavior does happen in the above test case. This is especially concerning. I recommend not relying on the behavior of shifts above or equal to the lane width.

@vadixidav
Copy link
Author

This is my new solution to work around the problem in the above example:

use packed_simd::{u16x8, u8x16};

fn main() {
    let x: u16x8 = u16x8::from([0xFFFF; 8]);
    let counted = x.count_ones();
    let x = ((u16x8::from([1; 8]) << (counted % 16)) ^ counted >> 4) - 1;
    assert_eq!(x, u16x8::from([0xFFFF; 8]));

    let x: u8x16 = u8x16::from([0xFF; 16]);
    let counted = x.count_ones();
    let x = ((u8x16::from([1; 16]) << (counted % 8)) ^ counted >> 3) - 1;
    assert_eq!(x, u8x16::from([0xFF; 16]));
}

@gnzlbg gnzlbg added P-High Unsound Something breaks Rust safety guarantees labels Apr 29, 2019
@gnzlbg
Copy link
Contributor

gnzlbg commented Apr 29, 2019

I think this rings a bell, and sounds like the current API introduces UB to safe Rust, so tagging as unsound.

@TheIronBorn
Copy link
Contributor

1_u32 << 32 is undefined in LLVM:
<result> = shl i32 1, 32 ; undefined

@vadixidav
Copy link
Author

vadixidav commented Sep 27, 2020

1_u32 << 32 is undefined in LLVM:
<result> = shl i32 1, 32 ; undefined

It might be necessary to do a test to see if any bits are set above the wrap around point and mask the register where those bits are not set. It would hurt performance slightly, but it would prevent the introduction of undefined behavior and differing behavior between release and debug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-High Unsound Something breaks Rust safety guarantees
Projects
None yet
Development

No branches or pull requests

3 participants