jeudi 26 mars 2015

How to vectorize my loop with g++?


The intro-links I found while searching:



  1. 6.59.14 Loop-Specific Pragmas

  2. 2.100 Pragma Loop_Optimize

  3. How to give hint to gcc about loop count

  4. Tell gcc to specifically unroll a loop

  5. How to Force Vectorization in C++


As you can see most of them are for C, but I thought that they might work at C++ as well. Here is my code:



template<typename T>
//__attribute__((optimize("unroll-loops")))
//__attribute__ ((pure))
void foo(std::vector<T> &p1, size_t start,
size_t end, const std::vector<T> &p2) {
typename std::vector<T>::const_iterator it2 = p2.begin();
//#pragma simd
//#pragma omp parallel for
//#pragma GCC ivdep Unroll Vector
for (size_t i = start; i < end; ++i, ++it2) {
p1[i] = p1[i] - *it2;
p1[i] += 1;
}
}


int main()
{
size_t n;
double x,y;
n = 12800000;
vector<double> v,u;
for(size_t i=0; i<n; ++i) {
x = i;
y = i - 1;
v.push_back(x);
u.push_back(y);
}
using namespace std::chrono;

high_resolution_clock::time_point t1 = high_resolution_clock::now();
foo(v,0,n,u);
high_resolution_clock::time_point t2 = high_resolution_clock::now();

duration<double> time_span = duration_cast<duration<double>>(t2 - t1);

std::cout << "It took me " << time_span.count() << " seconds.";
std::cout << std::endl;
return 0;
}


I used al the hints one can see commented above, but I did not get any speedup, as a sample output shows (with the first run having uncommented this #pragma GCC ivdep Unroll Vector:



samaras@samaras-A15:~/Downloads$ g++ test.cpp -O3 -std=c++0x -funroll-loops -ftree-vectorize -o test
samaras@samaras-A15:~/Downloads$ ./test
It took me 0.026575 seconds.
samaras@samaras-A15:~/Downloads$ g++ test.cpp -O3 -std=c++0x -o test
samaras@samaras-A15:~/Downloads$ ./test
It took me 0.0252697 seconds.


Is there hope? Or the optimization flag O3 just does the trick? Any suggestions to speedup this code (the foo function) are welcome!


My version:



samaras@samaras-A15:~/Downloads$ g++ --version
g++ (Ubuntu 4.8.1-2ubuntu1~12.04) 4.8.1



Aucun commentaire:

Enregistrer un commentaire