Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use unaligned data types for unaligned intrinsics #632

Merged
merged 1 commit into from
May 20, 2024

Conversation

Logikable
Copy link
Contributor

No description provided.

@Logikable Logikable requested review from jserv and marktwtn as code owners May 14, 2024 21:57
@jserv jserv requested a review from howjmay May 14, 2024 22:06
@howjmay
Copy link
Collaborator

howjmay commented May 14, 2024

some fix still required

tests/impl.cpp Outdated Show resolved Hide resolved
@Logikable
Copy link
Contributor Author

How do I debug the Armv7 issues? I don't have the hardware to test this locally.

@jserv
Copy link
Member

jserv commented May 16, 2024

How do I debug the Armv7 issues? I don't have the hardware to test this locally.

You can emulate Armv7 targets via QEMU. Check https://dev.to/amarjargal/running-debian-on-an-emulated-arm-machine-2i04 and specify armhf target.

@Logikable
Copy link
Contributor Author

I think I found the issue. In _mm_loadu_si64, there is a call to vld1_s64, which looks like this:

typedef __attribute__((neon_vector_type(1))) int64_t int64x1_t;
int64x1_t vld1_s64 (const int64_t * __a)
{
  return (int64x1_t) { *__a };
}

On 32-bit Arm, this is done using ldrd, and ldrd doesn't support unaligned accesses. This isn't a problem on 64-bit because it uses ldr.

Is there an unaligned version of vld1_s64? If not, what do you suggest we do?

@MaskRay
Copy link

MaskRay commented May 16, 2024

Thanks for the patch. This should fix some -fsanitize=alignment (part of -fsanitize=undefined) uses.

@jserv
Copy link
Member

jserv commented May 17, 2024

ARMv6 and Armv7 CPUs can perform unaligned accesses for most single load and store instructions up to word size. However, LDM, STM, LDRD, and STRD instructions still need to be handled separately for unaligned accesses. 64-bit variables are typically accessed using LDRD/STRD, which require 32-bit alignment. To handle unaligned 64-bit accesses, we can use a struct-based implementation, which the compiler is smart enough to handle using multiple 32-bit accesses.

Reference: Memory alignment issue

@Logikable
Copy link
Contributor Author

Logikable commented May 17, 2024

I found another way around this issue. _mm_loadu_si64 is written differently from _mm_loadu_si{16,32,128} -- specifically, the latter is written without the use of the problematic vld1_s*. I rewrote _mm_loadu_si64 to be symmetrical.

@Logikable Logikable requested a review from jserv May 18, 2024 15:59
Copy link
Member

@jserv jserv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check https://cbea.ms/git-commit/ and improve the Git commit message to address the benefits of unaligned memory access. You should squash the commits into a single one before amending the Git commit message.

Unaligned memory accesses are primarily useful in cases where space is a
concern, and data would benefit from being packed. While unaligned
accesses are known for causing non-negligible performance degradation,
in situations where performance is also a primary concern, having access
to unaligned SIMD instructions is monumentally beneficial.

This has become particularly relevant with the growing popularity of
edge and mobile machine learning.
@Logikable Logikable requested a review from jserv May 20, 2024 16:49
@jserv jserv changed the title Use unaligned data types for unaligned intrinsics. Use unaligned data types for unaligned intrinsics May 20, 2024
@jserv jserv merged commit 42c7047 into DLTcollab:master May 20, 2024
16 checks passed
@jserv
Copy link
Member

jserv commented May 20, 2024

Thank @Logikable for contributing!

@aqrit
Copy link
Contributor

aqrit commented May 21, 2024

Is __attribute__((aligned(x))) being used incorrectly?

"Cannot decrease the alignment below the natural alignment of the type."
"For a variable that is not in a structure, the minimum alignment is the natural alignment of the variable type."

https://developer.arm.com/documentation/101754/0622/armclang-Reference/Compiler-specific-Function--Variable--and-Type-Attributes/--attribute----aligned---variable-attribute

@Logikable
Copy link
Contributor Author

That only applies to structs/struct members, and the alignment can still be decreased in that situation by also specifying packed.

https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants