mardi 17 mars 2015

Are STL algorithms optimized for speed?


I was testing the speed of different ways loop on a std::vector. In the code below, I consider 5 ways to calculate the sum of all elements of a vector of N = 10000000 elements:



  1. using iterators

  2. using integer indices

  3. using integer indices, unrolling by a factor 2

  4. using integer indices, unrolling by a factor 4

  5. using std::accumulate


The code is compiled with g++ for windows, the command line used to compile is:



g++ -std=c++11 -O3 loop.cpp -o loop.exe


I ran the code 4 times, measuring the time of each method, I get the following results (time in microseconds, max and min are given):



  • Iterators: 8002 - 8007

  • Int indices: 8004 - 9003

  • Unroll 2: 6004 - 7005

  • Unroll 4: 4001 - 5004

  • accumulate: 8005 - 9007


What these experiments seem to indicate is:




  1. Looping with iterators vs integer indices does not make much difference, at least with full optimization.




  2. Unrolling the loop pays off




  3. Surprisingly, the stl::accumulate gives the worse performance.




While the conclusions 1 and 2 were somewat expected, the number 3 is quite surprising. Don't all books say to use the STL algorithms instead of writing loops by myself?


Am I making any mistake in the way I am measuring the time, or in the way I interprete the results? Do you guys get a different scenario in case you try out the code given below?



#include <iostream>
#include <chrono>
#include <vector>
#include <numeric>

using namespace std;
using namespace std::chrono;



int main()
{
const int N = 10000000;
vector<int> v(N);
for (int i = 0; i<N; ++i)
v[i] = i;

//looping with iterators
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();

long long int sum = 0;
for (auto it = v.begin(); it != v.end(); ++it)
sum+=*it;

high_resolution_clock::time_point t2 = high_resolution_clock::now();

auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

cout << duration << "microseconds output = " << sum << " (Iterators)\n";
}

//looping with integers
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();

long long int sum = 0;
for (int i = 0; i<N; ++i)
sum+=v[i];

high_resolution_clock::time_point t2 = high_resolution_clock::now();

auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

cout << duration << "microseconds output = " << sum << " (integer index)\n";
}

//looping with integers (UNROLL 2)
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();

long long int sum = 0;
for (int i = 0; i<N; i+=2)
sum+=v[i]+v[i+1];

high_resolution_clock::time_point t2 = high_resolution_clock::now();

auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

cout << duration << "microseconds output = " << sum << " (integer index, UNROLL 2)\n";
}

//looping with integers (UNROLL 4)
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();

long long int sum = 0;
for (int i = 0; i<N; i+=4)
sum+=v[i]+v[i+1]+v[i+2]+v[i+3];

high_resolution_clock::time_point t2 = high_resolution_clock::now();

auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

cout << duration << "microseconds output = " << sum << " (integer index, UNROLL 4)\n";
}

//using std::accumulate
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();

long long int sum = accumulate(v.begin(), v.end(), static_cast<long long int>(0));

high_resolution_clock::time_point t2 = high_resolution_clock::now();

auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();

cout << duration << "microseconds output = " << sum << " (std::accumulate)\n";
}
return 0;
}



Aucun commentaire:

Enregistrer un commentaire