I don't know if there's something missing in my understanding of how AVX intrinsics and std::array work, but I'm having a strange issue with Clang when I combine the two.
Sample code:
#include <array>
#include <cstddef>
#include <iostream>
#include <immintrin.h>

std::array<__m256, 1> gen_data()
{
    std::array<__m256, 1> res;
    res[0] = _mm256_set1_ps(1);
    return res;
}

int main()
{
    auto v = gen_data();
    float a[8];
    _mm256_storeu_ps(a, v[0]);
    for (size_t i = 0; i < 8; ++i)
    {
        std::cout << a[i] << std::endl;
    }
}
Output from Clang 3.5.0 (the upper four floats are garbage data):
1
1
1
1
8.82272e-39
0
5.88148e-39
0
Output from GCC 4.8.2/4.9.1 (expected):
1
1
1
1
1
1
1
1
If I instead return the result from gen_data through an output parameter, it works just fine on both compilers. I'm willing to accept that this might be a bug in Clang, but I don't know whether the code invokes UB. Testing with Clang 3.7 (an SVN build), Clang now gives the expected result. If I switch to the SSE 128-bit intrinsics (__m128), then all compilers give the expected result.
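For reference, the output-parameter workaround can be sketched like this. It is shown with the 128-bit __m128 type, since that variant behaved identically on all compilers tested; the signature and helper are illustrative, not the original code:

```cpp
#include <xmmintrin.h>  // SSE intrinsics (baseline on x86-64, no extra flags needed)

// Output-parameter variant: the caller owns the storage, so no vector
// value is returned by value across the call boundary.
void gen_data(__m128& out)
{
    out = _mm_set1_ps(1.0f);  // broadcast 1.0f into all four lanes
}

// Copy the four lanes into a plain float array for inspection.
void store_lanes(const __m128& v, float a[4])
{
    _mm_storeu_ps(a, v);  // unaligned store into a plain float array is fine
}
```

With this shape, every lane reads back as 1 on both compilers, matching the behaviour described above.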
So my questions are:
- Is there any UB here? Or is this just a bug in Clang 3.5.0?
- Is my understanding correct that __m256 is simply a 32-byte aligned chunk of memory? Or is there something else special about it that I have to be careful with?