Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HEDLEY_ASSUME_ALIGNED #1

Open
nemequ opened this issue Feb 14, 2017 · 15 comments
Open

HEDLEY_ASSUME_ALIGNED #1

nemequ opened this issue Feb 14, 2017 · 15 comments

Comments

@nemequ
Copy link
Owner

nemequ commented Feb 14, 2017

I don't want to forget about all this again, so…

I'm having trouble with __builtin_assume_aligned.

For ICC there is __assume_aligned. This is what I'm looking to emulate; you just pass it a pointer and the desired alignment:

__assume_aligned(arg, 16)
/* Compiler knows arg is 16-byte aligned */

For MSVC there is __assume, which is a bit of a pain to use, but basically you pass an expression (which evaluates to true if the variable is aligned as expected). Something like

__assume((((char*) arg) - ((char*) 0)) % (16) == 0)
/* Compiler knows arg is 16-byte aligned */

GCC 4.7+, OTOH, does things a bit different. It returns a value, and you're supposed to use the returned value:

void* x = __builtin_assume_aligned(arg, 16);
/* Compiler knows x is 16-byte aligned */

The question is, with __builtin_assume_aligned, does GCC know that arg (not x) is 16-byte aligned? Basically, can I use it like __assume_aligned?

Also, for 4.5+ (when __builtin_unreachable was added), if I do something like

#define assume_aligned(arg, align) \
  ((((char*) arg) - ((char*) 0)) % (align) == 0) ? \
    (0) : __builtin_unreachable()

Would GCC know that arg is aligned?

Next, there is the question of OpenMP. If _OPENMP >= 201307L (4.0), should we output #pragma omp simd aligned(arg, align)? If so, in addition to the builtin/intrinsic, or instead of it?

@nemequ
Copy link
Owner Author

nemequ commented Feb 14, 2017

I asked about this on a GCC ML, and the response wasn't what I was hoping for.

Basically, __builtin_assume_aligned doesn't tell GCC anything about the argument, only the return value. And that assume_aligned macro which uses an MSVC-style check coupled with __builtin_unreachable has no effect.

Something like

#define HEDLEY_ASSUME_ALIGNED(arg, align) (arg = __builtin_assume_aligned(arg, align))

would work, but we can't do that because it would evaluate arg twice. Also, arg may be read-only (or not an lvalue, though I'm not sure how much good it would do __assume_aligned in that case).

Next idea is going the opposite direction, and forcing everyone else to use a GCC-style macro:

void* foo = HEDLEY_ASSUME_ALIGNED(bar, align);

Unfortunately, this would have the same problem with the argument being evaluated twice for everyone other than GCC:

#define HEDLEY_ASSUME_ALIGNED(arg, align) (__assume_aligned((arg), (align)), (arg))

We want to return a value, so a do...while loop with a temp variable won't work (besides, that would restrict us to C99+). Just using a function won't work because the __assume_aligned call would be inside the function (it would also suck for C++ since it would rely on going through a void*). Everything else would require compiler extensions; GCC supports compound statements in parenthesis, but I don't know of anything similar in MSVC (or other compilers we may want to support) and we would need them in the other compilers, not in GCC.

Basically, I don't think there is a way to support this without pushing an ifdef into the code of people using Hedley, which is unacceptable.

I'm out of ideas, other than to ask the GCC people to change __builtin_assume_aligned to mark the argument as aligned (which I plan to do).

@Shelwien
Copy link

I also had a version like this for gcc:
#define __assume_aligned(x,y) (x=decltype(x)(__builtin_assume_aligned((void*)x,y)))
My idea is that we just have to add a restriction that first arg has to be a pointer.

What's more interesting (and what I wanted to demonstrate), is that __assume is much stronger than that,
and it should be possible to enable all kinds of other optimizations with __assume.

@nemequ
Copy link
Owner Author

nemequ commented Feb 14, 2017

void* const x = ...;
__assume_aligned(x, 32);

*Compiler explodes*.

If you care about C, you may want to change that to something like

#if !defined(__cplusplus)
#define __assume_aligned(x,y) (x=(__typeof__(x))(__builtin_assume_aligned((void*)x,y)))
#elif /* Test for C++11 mode */
#define __assume_aligned(x,y) (x=decltype(x)(__builtin_assume_aligned((void*)x,y)))
#else
...
#endif

Obviously it's a GNU extension, but decltype doesn't work at all in C mode…

@nemequ
Copy link
Owner Author

nemequ commented Feb 14, 2017

If you want, I could implement __assume for GCC using __builtin_unreachable(). Unfortunately it won't work for specifying alignment, at least not currently (see that response on the gcc-help ML), but maybe in the future…

#if HEDLEY_GCC_HAS_BUILTIN(__builtin_unreachable,4,5,0)
#  define HEDLEY_ASSUME(expr) ((expr) ? 1 : (__builtin_unreachable(), 0))
#elif ...
#endif

@Shelwien
Copy link

I think that VS/IC have some kind of tracking system for value bits.
Its totally unrelated to unreachable code.
But you can write __assume(x>0) or _assume(x&1==x) and it affects the compiled code.

Btw, what do you think about my coroutine implementation - https://github.com/Shelwien/stegdict/blob/master/Lib/coro2b.inc
And where we can discuss that?

@nemequ
Copy link
Owner Author

nemequ commented Feb 14, 2017

I think that VS/IC have some kind of tracking system for value bits.
Its totally unrelated to unreachable code.
But you can write __assume(x>0) or _assume(x&1==x) and it affects the compiled code.

I think it does work for clang. I haven't verified, but see http://llvm.org/docs/doxygen/html/Compiler_8h.html#a2fd576fb00a760ba803c8a171bff051a (grep for LLVM_ASSUME_ALIGNED). Maybe the author was just optimistic, but if not this could take care of HEDLEY_ASSUME_ALIGNED for clang.

Btw, what do you think about my coroutine implementation - https://github.com/Shelwien/stegdict/blob/master/Lib/coro2b.inc
And where we can discuss that?

Interesting. It looks like it would take me a while to review the code, a coroutine implementation could be a nice feature for portable-snippets; if you would be interested in that you can file an issue there, that would be a good place to discuss it. FWIW, I've been planning on adding libcoro to Squash for some stuff, it might be interesting to compare. Also, Microsoft has an interesting fiber API which could make for a nice backend…

@Shelwien
Copy link

An example with __assume:
https://godbolt.org/g/kUbVT3

@Shelwien
Copy link

@nemequ
Copy link
Owner Author

nemequ commented Feb 14, 2017

@RoyiAvital
Copy link

For MSVC there is __assume, which is a bit of a pain to use, but basically you pass an expression (which evaluates to true if the variable is aligned as expected). Something like

__assume((((char*) arg) - ((char*) 0)) % (16) == 0)
/* Compiler knows arg is 16-byte aligned */

Have you seen this to work?
I have not seen any documentation on MSVC stating it can be used to hint alignment.

By the way, why not:

__assume((ptrArg % 16) == 0)

Isn't this enough to say the address of the pointer is aligned?

@nemequ
Copy link
Owner Author

nemequ commented Sep 21, 2018

Have you seen this to work?
I have not seen any documentation on MSVC stating it can be used to hint alignment.

It's a thing people do, I've never actually verified that the code generated by MSVC works as expected, but I think it does. Intel has some documentation where it works at https://software.intel.com/en-us/articles/data-alignment-to-assist-vectorization

FWIW, __assume is actually pretty smart in MSVC. The compiler really does get a lot out of it considering it's such a generic construct.

By the way, why not:

That's probably okay since this is compiler-specific anyways. In theory, though, you could run into some problems because you're doing math on mixed types (an int and a pointer). With the code I have above you do the math on two pointers of the same type, then modulus on the integer type.

@RoyiAvital
Copy link

RoyiAvital commented Sep 21, 2018

I wonder if there is someone in the MSVC Compiler Team we can ask if this works and makes the compiler treat the data as aligned for aligned load when using SIMD.

@nemequ
Copy link
Owner Author

nemequ commented Apr 9, 2019

After seeing the propasal for std::assume_aligned, I decided to take another stab at this. I pushed what I have to test/assume-aligned.c in the wip/assume-aligned branch.

There is still a lot of work to do (especially testing), and it's already pretty ugly, but this version might work out. If anyone has a better test case I'd be very interested.

@yuri-kilochek
Copy link

Is might be reasinable to make HEDLEY_ASSUME_ALIGNED a no-op on MSVC, since they are no longer generating aligned loads/stores anyway: https://developercommunity.visualstudio.com/content/problem/19160/regression-from-vs-2015-in-ssseavx-instructions-ge.html#reply-72079

@travisdowns
Copy link
Collaborator

I didn't go back and read this whole issue, but at least regarding @yuri-kilochek comment, I don't think the reason for "assume aligned" is to generate aligned instructions (e.g., movaps vs movups), but rather to improve code generation in various cases where alignment is useful.

E.g., during vectorization the compiler might want the core loop to be aligned, not to use aligned load instructions, but because aligned access is more efficient. To do that, it may insert a prologue which does some scalar iterations to get to an alignment point: this code is redundant if the input is already aligned. I think that's the main purpose of assume aligned. Now I am not sure if MSVC actually ever does that - my experience is more gcc and clang.

Another reason to assume alignment is allow load-op instructions (instructions that include a memory operand) for SSE instructions, which require alignment for memory operations (AVX changed that and load-op instructions no longer needed to be aligned).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants