To what degree do compilers auto-vectorize unrolled procedures? #2

Open
expikr opened this issue Nov 8, 2023 · 3 comments

Comments

expikr commented Nov 8, 2023

Suppose I write a naive, explicit 3x3 matrix multiplication function like so:

fn dgemm3_unrolled(a: [9]f64, b: [9]f64) [9]f64 {
    return .{
        a[0]*b[0] + a[1]*b[3] + a[2]*b[6],
        a[0]*b[1] + a[1]*b[4] + a[2]*b[7],
        a[0]*b[2] + a[1]*b[5] + a[2]*b[8],
        a[3]*b[0] + a[4]*b[3] + a[5]*b[6],
        a[3]*b[1] + a[4]*b[4] + a[5]*b[7],
        a[3]*b[2] + a[4]*b[5] + a[5]*b[8],
        a[6]*b[0] + a[7]*b[3] + a[8]*b[6],
        a[6]*b[1] + a[7]*b[4] + a[8]*b[7],
        a[6]*b[2] + a[7]*b[5] + a[8]*b[8],
    };
}

Would the compiler be smart enough to find a pattern to emit SIMD instructions, or does it require extra nudging?

Short of directly using @Vector and @shuffle, would writing it in a loop format provide extra information for the compiler to spot the stride pattern?
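
For comparison, here is a rough sketch of the two alternatives the question mentions: a loop form and an explicit @Vector form. The names dgemm3_looped, dgemm3_vector, and Row are made up for illustration, and the code assumes Zig 0.11 or later (range-based for, single-argument @splat). Whether either one compiles to better machine code than the unrolled version depends on the target and optimization mode, so this is something to inspect and measure rather than a recommendation.

```zig
const std = @import("std");

// Loop form: the same row-major 3x3 product, with the stride pattern
// spelled out by the indices (hypothetical name, not from the original post).
fn dgemm3_looped(a: [9]f64, b: [9]f64) [9]f64 {
    var c: [9]f64 = undefined;
    for (0..3) |i| {
        for (0..3) |j| {
            var sum: f64 = 0;
            for (0..3) |k| {
                sum += a[3 * i + k] * b[3 * k + j];
            }
            c[3 * i + j] = sum;
        }
    }
    return c;
}

// Explicit vector form: each output row is a linear combination of the
// rows of b, so only broadcasts via @splat are needed, no @shuffle.
const Row = @Vector(3, f64);

fn dgemm3_vector(a: [9]f64, b: [9]f64) [9]f64 {
    // Arrays coerce to vectors of the same length and element type.
    const b0: Row = b[0..3].*;
    const b1: Row = b[3..6].*;
    const b2: Row = b[6..9].*;
    var c: [9]f64 = undefined;
    for (0..3) |i| {
        const a0: Row = @splat(a[3 * i + 0]);
        const a1: Row = @splat(a[3 * i + 1]);
        const a2: Row = @splat(a[3 * i + 2]);
        const row = a0 * b0 + a1 * b1 + a2 * b2;
        c[3 * i + 0] = row[0];
        c[3 * i + 1] = row[1];
        c[3 * i + 2] = row[2];
    }
    return c;
}

test "both variants leave a matrix unchanged when multiplied by identity" {
    const a = [9]f64{ 1, 2, 3, 4, 5, 6, 7, 8, 9 };
    const id = [9]f64{ 1, 0, 0, 0, 1, 0, 0, 0, 1 };
    const looped = dgemm3_looped(a, id);
    const vectored = dgemm3_vector(a, id);
    try std.testing.expectEqualSlices(f64, &a, &looped);
    try std.testing.expectEqualSlices(f64, &a, &vectored);
}
```

Running this with zig test checks that the two forms agree on a simple input; it says nothing about which one vectorizes better.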

Owner

nektro commented Nov 9, 2023

I'll definitely defer to others for more specific descriptions, but in my experience I'd sum it up as "it's tricky". Sometimes adding SIMD can have massive gains; in other cases, trying to outsmart the optimizer with explicit SIMD actually led to worse performance. The best thing I'd personally recommend is trying a few variations and measuring which one ends up the best in your particular case. std.time.Timer is a great tool for that, as well as andrewrk/poop for whole-program perf measurements. When rolling your own, std.simd.suggestVectorSize(T) is a nice helper function to pick an N that won't exceed the target's SIMD register width.
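
A minimal sketch of the "try a few variations and measure" approach with std.time.Timer might look like the following. The kernel is copied from the question; the iteration count, the test matrices, and the feed-the-result-back loop are arbitrary choices made here for illustration, and the optimizer may still transform the loop in ways that skew the numbers.

```zig
const std = @import("std");

// Kernel under test: the unrolled version from the original question.
fn dgemm3_unrolled(a: [9]f64, b: [9]f64) [9]f64 {
    return .{
        a[0] * b[0] + a[1] * b[3] + a[2] * b[6],
        a[0] * b[1] + a[1] * b[4] + a[2] * b[7],
        a[0] * b[2] + a[1] * b[5] + a[2] * b[8],
        a[3] * b[0] + a[4] * b[3] + a[5] * b[6],
        a[3] * b[1] + a[4] * b[4] + a[5] * b[7],
        a[3] * b[2] + a[4] * b[5] + a[5] * b[8],
        a[6] * b[0] + a[7] * b[3] + a[8] * b[6],
        a[6] * b[1] + a[7] * b[4] + a[8] * b[7],
        a[6] * b[2] + a[7] * b[5] + a[8] * b[8],
    };
}

pub fn main() !void {
    // A rotation about the z axis (cos = 0.8, sin = 0.6), chosen so that
    // repeated products stay bounded. Arbitrary test data.
    const r = [9]f64{ 0.8, -0.6, 0, 0.6, 0.8, 0, 0, 0, 1 };
    var m = [9]f64{ 1, 0, 0, 0, 1, 0, 0, 0, 1 };

    const iterations: usize = 10_000_000;
    var timer = try std.time.Timer.start();
    for (0..iterations) |_| {
        // Feeding the result back in creates a data dependency between
        // iterations. The temporary avoids writing the result into m
        // while m is still being read as an argument.
        const next = dgemm3_unrolled(m, r);
        m = next;
    }
    const elapsed_ns = timer.read();

    // Printing an element of the result keeps the computation observable.
    std.debug.print("{d} iterations in {d} ns, m[0] = {d}\n", .{ iterations, elapsed_ns, m[0] });
}
```

Swapping in another variant of the kernel and running each with the same optimization mode (for example zig run with -O ReleaseFast) gives a rough side-by-side comparison.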

Author

expikr commented Jan 14, 2024

interesting read: https://deplinenoise.files.wordpress.com/2015/03/gdc2015_afredriksson_simd.pdf

particularly the example starting from page 38

Owner

nektro commented Jan 14, 2024

If you're familiar with assembly, https://zig.godbolt.org is a nice playground for seeing what the compiler actually emits.
