I don't know if there's something missing in my understanding of how AVX intrinsics and std::array work, but I'm having a strange issue with Clang when I combine the two.
Sample code:
#include <array>
#include <cstddef>
#include <iostream>
#include <immintrin.h>

std::array<__m256, 1> gen_data()
{
    std::array<__m256, 1> res;
    res[0] = _mm256_set1_ps(1);
    return res;
}

int main()
{
    auto v = gen_data();
    float a[8];
    _mm256_storeu_ps(a, v[0]);
    for (size_t i = 0; i < 8; ++i)
    {
        std::cout << a[i] << std::endl;
    }
}
Output from Clang 3.5.0 (the upper four floats are garbage data):
1
1
1
1
8.82272e-39
0
5.88148e-39
0
Output from GCC 4.8.2/4.9.1 (expected):
1
1
1
1
1
1
1
1
If I instead return the result from gen_data through an output parameter, it works just fine on both compilers. I'm willing to accept that this might be a bug in Clang, but I don't know whether the code invokes UB. Testing with Clang 3.7 (an SVN build), Clang now gives the expected result. If I switch to the SSE 128-bit intrinsics (__m128), then all compilers give the expected result.
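For reference, the output-parameter workaround can be sketched like this. It is shown with the 128-bit __m128 type, since that variant behaved identically on all compilers tested; the signature and helper are illustrative, not the original code:

```cpp
#include <xmmintrin.h>  // SSE intrinsics (baseline on x86-64, no extra flags needed)

// Output-parameter variant: the caller owns the storage, so no vector
// value is returned by value across the call boundary.
void gen_data(__m128& out)
{
    out = _mm_set1_ps(1.0f);  // broadcast 1.0f into all four lanes
}

// Copy the four lanes into a plain float array for inspection.
void store_lanes(const __m128& v, float a[4])
{
    _mm_storeu_ps(a, v);  // unaligned store into a plain float array is fine
}
```

With this shape, every lane reads back as 1 on both compilers, matching the behaviour described above.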
So my questions are:
- Is there any UB here? Or is this just a bug in Clang 3.5.0?
- Is my understanding correct that __m256 is simply a 32-byte aligned chunk of memory? Or is there something else special about it that I have to be careful with?