c++: CUDA Maximum Threads per Block

For some reason I'm not able to launch a kernel with more than (32, 32) threads per block in two dimensions. Here is the code:


dim3 dimBlock(64, 64);
dim3 dimGrid(1, 1);
LinearKernelKernel<<<dimGrid, dimBlock>>>(dev_X, dev_K);

The kernel is simple; it does nothing:


__global__ void LinearKernelKernel(Matrix X, Matrix K)
{
}

If I change the block to dimBlock(32, 32), it works fine. Anything greater than (32, 32) produces an 'Invalid Configuration Argument' error. If I change the code to use just one dimension, it works as expected with a large number of threads per block; the problem only appears in two dimensions.

I've tested this code on two different devices, a GTX980 and a GT540. I compile with compute_35, sm_35, and the drivers are up to date.

So what could be wrong? Any ideas? Thanks in advance.
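
A small sketch of the arithmetic behind the two configurations above may help. The number of threads in a block is the product of all block dimensions, so (32, 32) is 1024 threads and (64, 64) is 4096. The code below is plain host-side C++ so that it compiles without the CUDA toolkit. `BlockDim`, `threadsPerBlock`, `fitsInBlock` and `requireFits` are hypothetical names introduced for illustration, and the limit of 1024 threads per block is an assumption that holds for devices of compute capability 2.0 and later; real code would read `maxThreadsPerBlock` from the `cudaDeviceProp` structure filled in by `cudaGetDeviceProperties`. Per-dimension limits are ignored here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for CUDA's dim3, so the sketch builds without the toolkit.
struct BlockDim {
    unsigned x, y, z;
};

// Assumed limit (compute capability 2.0 and later). In real CUDA code,
// query cudaDeviceProp::maxThreadsPerBlock via cudaGetDeviceProperties.
constexpr unsigned kAssumedMaxThreadsPerBlock = 1024;

// Total threads in one block: the product of all three dimensions.
constexpr std::uint64_t threadsPerBlock(BlockDim b) {
    return static_cast<std::uint64_t>(b.x) * b.y * b.z;
}

// True when the block's total thread count is within the given limit.
constexpr bool fitsInBlock(BlockDim b,
                           unsigned maxThreads = kAssumedMaxThreadsPerBlock) {
    return threadsPerBlock(b) <= maxThreads;
}

// Runtime guard that a launch wrapper could call before <<<...>>>.
inline void requireFits(BlockDim b,
                        unsigned maxThreads = kAssumedMaxThreadsPerBlock) {
    assert(fitsInBlock(b, maxThreads));
}

// The two configurations from the question, checked at compile time.
static_assert(fitsInBlock({32, 32, 1}), "32 * 32 = 1024 threads fits");
static_assert(!fitsInBlock({64, 64, 1}), "64 * 64 = 4096 threads does not fit");
```

Under these assumptions, a one-dimensional block of up to 1024 threads also passes the check, which matches the behaviour described above. Covering a 64 x 64 domain would then take several blocks, for example dimBlock(32, 32) with dimGrid(2, 2).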


Thursday, 26 March 2015
