I'll definitely defer to others for more specific descriptions, but in my experience I'd sum it up as "it's tricky". Sometimes adding SIMD can bring massive gains; in other cases, trying to outsmart the optimizer with explicit SIMD actually led to worse performance. The best thing I'd personally recommend is trying a few variations and measuring which ends up best in your particular case. std.time.Timer is a great tool for that, as is andrewrk/poop for whole-program perf measurements. When rolling your own, std.simd.suggestVectorSize(T) is a nice helper function to pick an N that won't exceed the target's SIMD register width.
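To make "try a few variations and measure" concrete, here is a rough sketch of a micro-benchmark with std.time.Timer. It assumes Zig 0.11 or later syntax. The workload (summing a slice) and the names sumScalar, sumVector and N are made up for illustration, and the width of 8 is hard-coded rather than taken from the std.simd helper, whose name differs between Zig versions.

```zig
const std = @import("std");

// Hypothetical workload: sum a slice of f32, once with a plain loop and once
// with an explicit 8-wide @Vector accumulator. The width 8 is an assumption.
const N = 8;
const Vec = @Vector(N, f32);

fn sumScalar(xs: []const f32) f32 {
    var total: f32 = 0;
    for (xs) |x| total += x;
    return total;
}

fn sumVector(xs: []const f32) f32 {
    var acc: Vec = @splat(0);
    var i: usize = 0;
    // Process full chunks of N elements at a time.
    while (i + N <= xs.len) : (i += N) {
        const chunk: Vec = xs[i..][0..N].*;
        acc += chunk;
    }
    var total: f32 = @reduce(.Add, acc);
    // Scalar tail for any leftover elements.
    while (i < xs.len) : (i += 1) total += xs[i];
    return total;
}

pub fn main() !void {
    var data: [4096]f32 = undefined;
    for (&data, 0..) |*x, idx| x.* = @floatFromInt(idx % 7);

    const iterations = 100_000;
    var timer = try std.time.Timer.start();

    var i: usize = 0;
    while (i < iterations) : (i += 1) {
        // Keep the result alive so the call is not discarded outright.
        std.mem.doNotOptimizeAway(sumScalar(&data));
    }
    const scalar_ns = timer.lap(); // elapsed ns; also restarts the timer

    i = 0;
    while (i < iterations) : (i += 1) {
        std.mem.doNotOptimizeAway(sumVector(&data));
    }
    const vector_ns = timer.lap();

    std.debug.print("scalar: {d} ns, vector: {d} ns\n", .{ scalar_ns, vector_ns });
}
```

Build with an optimized mode (for example -O ReleaseFast) before comparing numbers. Micro-benchmarks like this are easy to fool, since the optimizer may hoist or simplify the loops, so treat the output as a hint and confirm with a whole-program measurement.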
Suppose I write a naive, explicit 3x3 matrix multiplication function like so:
Would the compiler be smart enough to find a pattern to emit SIMD instructions, or does it require extra nudging?
Short of directly using @Vector and @shuffle, would writing it in a loop format provide extra information for the compiler to spot the stride pattern?
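The function from the question is not reproduced here, so the following is only a sketch of the two formulations being contrasted: a loop-based multiply and an explicit @Vector one. It assumes Zig 0.11 or later and a row-major [3][3]f32 layout; Mat3, Row, mulLoop and mulVector are hypothetical names, not anything from the original post.

```zig
const std = @import("std");

// Hypothetical row-major 3x3 matrix type, assumed for illustration.
const Mat3 = [3][3]f32;
const Row = @Vector(3, f32);

// Loop formulation: the triple loop leaves vectorization up to the optimizer.
fn mulLoop(a: Mat3, b: Mat3) Mat3 {
    var out: Mat3 = undefined;
    for (0..3) |i| {
        for (0..3) |j| {
            var sum: f32 = 0;
            for (0..3) |k| sum += a[i][k] * b[k][j];
            out[i][j] = sum;
        }
    }
    return out;
}

// Explicit formulation: each output row is a linear combination of b's rows,
// computed with @Vector arithmetic.
fn mulVector(a: Mat3, b: Mat3) Mat3 {
    const b0: Row = b[0];
    const b1: Row = b[1];
    const b2: Row = b[2];
    var out: Mat3 = undefined;
    for (0..3) |i| {
        const row = @as(Row, @splat(a[i][0])) * b0 +
            @as(Row, @splat(a[i][1])) * b1 +
            @as(Row, @splat(a[i][2])) * b2;
        out[i] = row; // vectors coerce back to same-length arrays
    }
    return out;
}

test "loop and vector versions agree" {
    const a = Mat3{ .{ 1, 2, 3 }, .{ 4, 5, 6 }, .{ 7, 8, 9 } };
    const b = Mat3{ .{ 2, 0, 1 }, .{ 0, 3, 0 }, .{ 1, 0, 2 } };
    const x = mulLoop(a, b);
    const y = mulVector(a, b);
    for (0..3) |i| {
        for (0..3) |j| {
            try std.testing.expectEqual(x[i][j], y[i][j]);
        }
    }
}
```

Running this with zig test checks only that the two versions agree on small integer-valued inputs. Whether either one actually lowers to SIMD instructions depends on the target and optimization mode, so the generated assembly is worth inspecting alongside any timing numbers.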