Monday, March 23, 2015

std::array of AVX intrinsics


I don't know whether something is missing in my understanding of how AVX intrinsics and std::array work, but I'm seeing a strange issue with Clang when I combine the two.


Sample code:



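#include <array>
#include <cstddef>
#include <iostream>
#include <immintrin.h>

// Returns a one-element array holding a 256-bit vector with all 8 lanes set to 1.0f
std::array<__m256, 1> gen_data()
{
    std::array<__m256, 1> res;
    res[0] = _mm256_set1_ps(1);
    return res;
}

int main()
{
    auto v = gen_data();
    float a[8];
    _mm256_storeu_ps(a, v[0]);  // unaligned store of all 8 lanes into a
    for (std::size_t i = 0; i < 8; ++i)
    {
        std::cout << a[i] << std::endl;
    }
}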


Output from Clang 3.5.0 (upper 4 floats are garbage data):



1
1
1
1
8.82272e-39
0
5.88148e-39
0

Output from GCC 4.8.2/4.9.1 (expected):



1
1
1
1
1
1
1
1

If I instead pass v into gen_data as an output parameter, it works fine on both compilers. I'm willing to accept that this might be a bug in Clang, but I don't know whether it might be UB. A Clang 3.7* build from svn appears to give the expected result. If I switch to 128-bit SSE intrinsics (__m128), all compilers give the same expected result.
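For reference, the output-parameter workaround can be sketched as follows. This is a minimal sketch rather than the original code: it assumes GCC or Clang on x86-64 and an AVX-capable CPU, and it marks the AVX functions with `__attribute__((target("avx")))` (a GCC/Clang extension) so they compile even when the file is not built with `-mavx`. `store_lanes` is a hypothetical helper added only for illustration.

```cpp
#include <array>
#include <cassert>
#include <immintrin.h>

// Output-parameter variant of gen_data: the caller owns the storage and the
// vector is written through a reference instead of being returned by value.
// target("avx") is a GCC/Clang extension; with -mavx it is redundant.
__attribute__((target("avx")))
void gen_data(std::array<__m256, 1>& out)
{
    out[0] = _mm256_set1_ps(1);  // all 8 lanes = 1.0f
}

// Hypothetical helper: copies the 8 lanes of v[0] into dst (unaligned store).
__attribute__((target("avx")))
void store_lanes(const std::array<__m256, 1>& v, float* dst)
{
    assert(dst != nullptr);
    _mm256_storeu_ps(dst, v[0]);
}
```

Usage mirrors the original main: declare `std::array<__m256, 1> v;`, call `gen_data(v);`, then `store_lanes(v, a);` and print `a`.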


So my questions are:



  1. Is there any UB here? Or is this just a bug in Clang 3.5.0?

  2. Is my understanding that __m256 is simply a 32-byte aligned chunk of memory correct? Or is there something else special about it that I have to be careful with?



