I was testing the speed of different ways loop on a std::vector. In the code below, I consider 5 ways to calculate the sum of all elements of a vector of N = 10000000 elements:
- using iterators
- using integer indices
- using integer indices, unrolling by a factor 2
- using integer indices, unrolling by a factor 4
- using std::accumulate
The code is compiled with g++ for windows, the command line used to compile is:
g++ -std=c++11 -O3 loop.cpp -o loop.exe
I ran the code 4 times, measuring the time of each method, I get the following results (time in microseconds, max and min are given):
- Iterators: 8002 - 8007
- Int indices: 8004 - 9003
- Unroll 2: 6004 - 7005
- Unroll 4: 4001 - 5004
- accumulate: 8005 - 9007
What these experiments seem to indicate is:
Looping with iterators vs integer indices does not make much difference, at least with full optimization.
Unrolling the loop pays off
Surprisingly, the stl::accumulate gives the worse performance.
While the conclusions 1 and 2 were somewat expected, the number 3 is quite surprising. Don't all books say to use the STL algorithms instead of writing loops by myself?
Am I making any mistake in the way I am measuring the time, or in the way I interprete the results? Do you guys get a different scenario in case you try out the code given below?
#include <iostream>
#include <chrono>
#include <vector>
#include <numeric>
using namespace std;
using namespace std::chrono;
int main()
{
const int N = 10000000;
vector<int> v(N);
for (int i = 0; i<N; ++i)
v[i] = i;
//looping with iterators
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();
long long int sum = 0;
for (auto it = v.begin(); it != v.end(); ++it)
sum+=*it;
high_resolution_clock::time_point t2 = high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
cout << duration << "microseconds output = " << sum << " (Iterators)\n";
}
//looping with integers
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();
long long int sum = 0;
for (int i = 0; i<N; ++i)
sum+=v[i];
high_resolution_clock::time_point t2 = high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
cout << duration << "microseconds output = " << sum << " (integer index)\n";
}
//looping with integers (UNROLL 2)
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();
long long int sum = 0;
for (int i = 0; i<N; i+=2)
sum+=v[i]+v[i+1];
high_resolution_clock::time_point t2 = high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
cout << duration << "microseconds output = " << sum << " (integer index, UNROLL 2)\n";
}
//looping with integers (UNROLL 4)
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();
long long int sum = 0;
for (int i = 0; i<N; i+=4)
sum+=v[i]+v[i+1]+v[i+2]+v[i+3];
high_resolution_clock::time_point t2 = high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
cout << duration << "microseconds output = " << sum << " (integer index, UNROLL 4)\n";
}
//using std::accumulate
{
high_resolution_clock::time_point t1 = high_resolution_clock::now();
long long int sum = accumulate(v.begin(), v.end(), static_cast<long long int>(0));
high_resolution_clock::time_point t2 = high_resolution_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>( t2 - t1 ).count();
cout << duration << "microseconds output = " << sum << " (std::accumulate)\n";
}
return 0;
}
Aucun commentaire:
Enregistrer un commentaire