Saturday, 28 February 2015

Coalesced memory access to a 2D array with CUDA


I'm working on a piece of CUDA C++ code in which a 2D array in global memory essentially has to be accessed in BOTH row-major AND column-major order. Specifically, I need each thread block to:



  • Generate its own 1D array (let's say, with as many elements as there are blocks in the grid, i.e. gridDim)

  • Write that array to global memory

  • Read the n-th element of every written array, where n is the block's ID.
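To make the pattern concrete, here is a minimal sketch of the two phases as I understand them. It assumes a 1D grid of n blocks and an n x n array of floats, and it splits the work into two kernel launches so that all writes finish before any read starts. The names writeRows, readColumns and runTwoPhase are made up for illustration, and the stored values are placeholders.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Phase 1: block b stores its n-element array as row b.
// Consecutive threads write consecutive addresses, so the store is coalesced.
__global__ void writeRows(float *data, int n) {
    int b = blockIdx.x;
    for (int t = threadIdx.x; t < n; t += blockDim.x)
        data[b * n + t] = static_cast<float>(b * n + t);  // placeholder value
}

// Phase 2: block b reads element b of every row.
// Consecutive threads read addresses n floats apart, so the load is strided.
__global__ void readColumns(const float *data, float *out, int n) {
    int b = blockIdx.x;
    for (int r = threadIdx.x; r < n; r += blockDim.x)
        out[b * n + r] = data[r * n + b];  // strided load, coalesced store
}

// Hypothetical host helper: runs both phases and returns what phase 2 read.
std::vector<float> runTwoPhase(int n, int threads = 128) {
    assert(n > 0 && threads > 0);
    size_t count = static_cast<size_t>(n) * n;
    size_t bytes = count * sizeof(float);
    float *data = nullptr;
    float *out = nullptr;
    cudaMalloc(&data, bytes);
    cudaMalloc(&out, bytes);
    writeRows<<<n, threads>>>(data, n);
    readColumns<<<n, threads>>>(data, out, n);  // same stream, runs after writeRows
    std::vector<float> host(count);
    cudaMemcpy(host.data(), out, bytes, cudaMemcpyDeviceToHost);  // waits for the kernels
    cudaFree(data);
    cudaFree(out);
    return host;
}
```

Swapping the index expressions in both kernels would make the read coalesced and the write strided instead, which is the trade-off I am trying to avoid.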


The way I see it, only the write OR the read can be coalesced; the other will touch a separate cache line for each element and perform terribly. I've read that texture memory has a 2D caching mechanism, but I don't know whether it can be used to improve this situation.
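A rough way to quantify the concern is to count how many distinct memory segments a single warp touches for a given stride. The helper below is host-only arithmetic, not a measurement: it assumes 4-byte elements, a 32-thread warp, a segment-aligned base address, and a 128-byte segment size, which is an assumed figure that may not match the transaction size my card actually uses. The name segmentsTouched is made up for illustration.

```cuda
#include <cassert>

// Counts the distinct memory segments one warp touches when lane i accesses
// element i * strideElems. segmentBytes is an assumed transaction size, not a
// value queried from the hardware.
int segmentsTouched(int strideElems, int elemBytes = 4,
                    int warpSize = 32, int segmentBytes = 128) {
    assert(strideElems >= 0 && elemBytes > 0 && segmentBytes > 0);
    int count = 0;
    long long last = -1;
    for (int lane = 0; lane < warpSize; ++lane) {
        long long segment =
            static_cast<long long>(lane) * strideElems * elemBytes / segmentBytes;
        // Byte offsets never decrease with the lane index, so a change in the
        // segment index always means a new segment.
        if (segment != last) {
            ++count;
            last = segment;
        }
    }
    return count;
}
```

Under these assumptions, a unit stride touches 1 segment per warp, while a stride of 1024 floats touches 32, one per thread.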


BTW, I am using a GTX 770, so it's a GK104 Kepler card with compute capability 3.0.


Any help or advice would be greatly appreciated! Thanks.



