Type punning in `block_ldlt_internal::find_maxloc` is undefined behaviour #154

mjacobse · 2023-10-07T13:04:24Z

A union is introduced in

spral/src/ssids/cpu/kernels/block_ldlt.hxx

Lines 104 to 109 in b89ab7b

    
           // Define a union that lets us abuse T to store ints and still use 
        
           // avx blend. 
        
           union intT { 
        
              int i; 
        
              T d; 
        
           };

which is used repeatedly to store int to .i and "read it as double" from .d, for example:

spral/src/ssids/cpu/kernels/block_ldlt.hxx

Lines 114 to 116 in b89ab7b

    
           intT imax; 
        
           imax.i = std::numeric_limits<int>::max(); 
        
           SimdVecT bestr(imax.d);

This is commonly done for type punning and I believe this was allowed in C and happens to work on most platforms even in C++. But it is undefined behaviour in C++ (https://www.youtube.com/watch?v=_qzMpk-22cc&t=570s). In general, the modern way to do this in C++20 is std::bit_cast. Probably wanting to support earlier standards, one could write their own version with that possible implementation using std::memcpy.

In this specific case, it seems strange to me that any type punning is done at all though. If we have AVX, couldn't we just load the int values into the AVX register using _mm256_set1_epi32, do the operations and then extract them with _mm256_extract_epi32? Otherwise without AVX, SimdVec just turns into a plain double, which makes its use for the row and column indices in block_ldlt_internal::find_maxloc very akward. For no real reason the int values are basically type-punned to unnecessarily larger double type as which swaps are done, only to then be punned back into int.

The text was updated successfully, but these errors were encountered:

jfowkes · 2023-10-09T07:25:32Z

Very interesting, thank you @mjacobse! Perhaps @jhogg41 can comment on the reasoning behind this int to double type punning for the LDLT CPU kernel AVX calls?

I feel that the way to go on this is simply to switch to using the integer AVX calls _mm256_set1_epi32 and _mm256_extract_epi32 so as to avoid the unnecessary (and expensive) type punning of int to double and then back to int when there is no AVX.

jfowkes · 2023-10-30T14:28:29Z

@mjacobse as we haven't heard about the reasoning for this type punning I'm happy for the type punning to be removed in favour of using the integer AVX calls _mm256_set1_epi32 and _mm256_extract_epi32 as you suggest. As far as I can tell this can only result in improved performance on AVX systems (most are these days). It may well be that when SPRAL was written ten years ago these integer AVX calls were not widely supported (it seems that they may be an x86_64 specific instruction).

jfowkes added the enhancement label Oct 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Type punning in `block_ldlt_internal::find_maxloc` is undefined behaviour #154

Type punning in `block_ldlt_internal::find_maxloc` is undefined behaviour #154

mjacobse commented Oct 7, 2023

jfowkes commented Oct 9, 2023

jfowkes commented Oct 30, 2023

Type punning in block_ldlt_internal::find_maxloc is undefined behaviour #154

Type punning in block_ldlt_internal::find_maxloc is undefined behaviour #154

Comments

mjacobse commented Oct 7, 2023

jfowkes commented Oct 9, 2023

jfowkes commented Oct 30, 2023

Type punning in `block_ldlt_internal::find_maxloc` is undefined behaviour #154

Type punning in `block_ldlt_internal::find_maxloc` is undefined behaviour #154