c++: Coalesced memory access to 2d array with CUDA

Saturday, 28 February 2015

I'm working on a piece of CUDA C++ code in which the threads need to access a 2D array in global memory in both row-major and column-major order. Specifically, I need each thread block to:

Generate its own 1-D array (let's say with gridDim elements, one per block).

Write that array to global memory.

Read the n-th element of each written array, where n is the block ID.
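To make the three steps above concrete, here is a minimal sketch of the access pattern. It rests on several assumptions that are not in the question: the generated values are `int`, the arrays are stored as the rows of a square `n` x `n` matrix, the write and the read are split into two kernel launches so that the launch boundary acts as the grid-wide synchronization, and the read step simply sums its column. The names `writeRows`, `readColumn` and `runNaivePattern` are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Steps 1 and 2: block b generates a 1-D array and stores it as row b.
// Consecutive threads write consecutive addresses, so the writes coalesce.
__global__ void writeRows(int *data, int n)
{
    int b = blockIdx.x;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        data[b * n + i] = b * n + i;   // placeholder for the generated value
}

// Step 3: block b reads element b of every row, i.e. column b.
// Consecutive threads read addresses n elements apart, so the reads do not coalesce.
__global__ void readColumn(const int *data, int *colSum, int n)
{
    int b = blockIdx.x;
    int partial = 0;
    for (int r = threadIdx.x; r < n; r += blockDim.x)
        partial += data[r * n + b];
    atomicAdd(&colSum[b], partial);    // combine the per-thread partial sums
}

// Runs both kernels and checks each column sum against its closed form.
bool runNaivePattern(int n)
{
    int *data = nullptr, *colSum = nullptr;
    cudaMalloc(&data, (size_t)n * n * sizeof(int));
    cudaMalloc(&colSum, n * sizeof(int));
    cudaMemset(colSum, 0, n * sizeof(int));

    // Kernels launched on the same stream run in order, so every row is
    // written before any column is read.
    writeRows<<<n, 128>>>(data, n);
    readColumn<<<n, 128>>>(data, colSum, n);

    std::vector<int> h(n);
    cudaError_t err = cudaMemcpy(h.data(), colSum, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(data);
    cudaFree(colSum);
    if (err != cudaSuccess) return false;

    for (int b = 0; b < n; ++b) {
        // Sum over r of (r*n + b) = n * n*(n-1)/2 + n*b
        long long expected = (long long)n * n * (n - 1) / 2 + (long long)n * b;
        if (h[b] != expected) return false;
    }
    return true;
}

int main()
{
    std::printf("%s\n", runNaivePattern(256) ? "ok" : "mismatch");
    return 0;
}
```

In this layout the stride of `n` elements in `readColumn` is the part that wastes bandwidth; swapping the layout would only move the stride to the write.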

The way I see it, only the write or the read can be coalesced; the other will touch a separate cache line for each element and perform terribly. I've read that texture memory has a 2-D caching mechanism, but I don't know whether it can be used to improve this situation.
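One technique often suggested for this kind of row/column conflict is a tiled transpose staged through shared memory, so that both the global read and the global write run along rows. This does not come from the question itself; it is only a sketch under assumptions: a square `n` x `n` array of `int`, a 32 x 32 tile, and the hypothetical names `transposeTiled` and `runTransposeCheck`. Whether it actually beats the texture path on a GK104 would need to be measured.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

#define TILE 32

// Transposes an n x n matrix: out[c][r] = in[r][c].
// Launch with a TILE x TILE thread block and a ceil(n/TILE) x ceil(n/TILE) grid.
__global__ void transposeTiled(const int *in, int *out, int n)
{
    // One extra column of padding to avoid shared-memory bank conflicts
    // when the tile is read column-wise.
    __shared__ int tile[TILE][TILE + 1];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < n && y < n)
        tile[threadIdx.y][threadIdx.x] = in[y * n + x];   // read along a row
    __syncthreads();

    // Swap the block indices so the write also runs along a row of out.
    x = blockIdx.y * TILE + threadIdx.x;
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < n && y < n)
        out[y * n + x] = tile[threadIdx.x][threadIdx.y];  // write along a row
}

// Transposes a test matrix on the GPU and verifies every element on the host.
bool runTransposeCheck(int n)
{
    size_t count = (size_t)n * n;
    size_t bytes = count * sizeof(int);
    std::vector<int> h(count), t(count);
    for (size_t i = 0; i < count; ++i) h[i] = (int)i;

    int *in = nullptr, *out = nullptr;
    cudaMalloc(&in, bytes);
    cudaMalloc(&out, bytes);
    cudaMemcpy(in, h.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid((n + TILE - 1) / TILE, (n + TILE - 1) / TILE);
    transposeTiled<<<grid, block>>>(in, out, n);

    cudaError_t err = cudaMemcpy(t.data(), out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(in);
    cudaFree(out);
    if (err != cudaSuccess) return false;

    for (int r = 0; r < n; ++r)
        for (int c = 0; c < n; ++c)
            if (t[(size_t)c * n + r] != h[(size_t)r * n + c]) return false;
    return true;
}

int main()
{
    // 100 is deliberately not a multiple of TILE, to exercise the bounds checks.
    std::printf("%s\n", runTransposeCheck(100) ? "ok" : "mismatch");
    return 0;
}
```

With the transposed copy in place, block n could read row n of `out` with consecutive threads touching consecutive addresses, at the cost of one extra pass over the array.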

BTW, I am using a GTX 770, so it's a GK104 Kepler card with compute capability 3.0.

Any help or advice would be greatly appreciated! Thanks.

c++
