mardi 17 mars 2015

Matrix the Rectangle Part transpose Cuda


im writing Cuda Program to Transpose Square Matrix, the idea is to do it in two parts depending on size of matrix; the matrix size cut into even size with Tile , and remain rectangle part left i transpose it separately Ex: 67 x 67 Matrix with Tile : 32, first part is 64x64 transposed, then second part is 3x67.


my problem is in the rectangle part, first below code shows the main code with the defined values:



const int TILE_DIM = 32;
const int BLOCK_ROWS = 8;
const int NUM_REPS = 100;

const int Nx = 2024; //size of the matrix
const int Ny = 2024;

int main(int argc, char **argv)
{
const int nx = Nx;
const int ny = Ny; // Size of the Arrays
const int mem_size = nx*ny*sizeof(int);// Size of the Orig.Arr

int *h_idata = (int*)malloc(mem_size); // original Host Arr.

int *d_idata; //device Arr.
checkCuda(cudaMalloc(&d_idata, mem_size));

dim3 dimGridX(nx / TILE_DIM, 1, 1); //grid dimension used
dim3 dimBlockX(TILE_DIM, 1, 1); // number of threads used

// the Kernel Function for only the rectangle
EdgeTransposeX << < dimGrid, dimBlock >> >(d_idata);
cudaEventRecord(startEvent, 0);
cudaEventRecord(stopEvent, 0);
cudaEventSynchronize(stopEvent);
cudaEventElapsedTime(&ms, startEvent, stopEvent);
cudaMemcpy(h_idata, d_idata, mem_size, cudaMemcpyDeviceToHost);


the Kernel Code i was advised not to use shared, so below is how ive done :



__global__ void EdgeTransposeX(int *idata)
{

int tile_C[Edge][Nx];
int tile_V[Nx][Edge];

int x = blockIdx.x * TILE_DIM + threadIdx.x;

if (x == (nEven - 1))
{

for (int j = 0; j < Nx; j++)
for (int i = 1; i <= Edge; i++)
{

tile_V[j][i - 1] = idata[j*Nx + (x + i)];
tile_C[i - 1][j] = idata[(x + i)*Nx + j];}

__syncthreads();

for (int j = 0; j < Nx; j++)
for (int i = 1; i <= Edge; i++)
{
idata[j*Nx + (x + i)] = tile_C[i - 1][j];
idata[(x + i)*Nx + j] = tile_V[j][i - 1];}

} }


the code works Okay until matrix size reaches 1025, after that it stops working, any idea why ? am i missing something here ?




Aucun commentaire:

Enregistrer un commentaire