
I. Introduction to GPU programming.
II. Exception-safe dynamic memory handling in a CUDA project.
1. Allocating and deallocating device memory. ots::cuda::memory::Device class.
2. Accessing device memory. ots::cuda::memory::Host class.
3. Crossing host/device boundary. ots::cuda::memory::DeviceHandler class.
4. Accessing memory in __device__ code. ots::cuda::memory::Block class.
5. Handling two-dimensional memory blocks. Do not use cudaMallocPitch.
6. Allocation of memory from Host scope.
7. Tagged data. Compiler assisted data verification.
III. Calculation of partial sums in parallel.
IV. Manipulation of piecewise polynomial functions in parallel.
V. Manipulation of localized piecewise polynomial functions in parallel.

Handling two-dimensional memory blocks. Do not use cudaMallocPitch.


The same project contains the classes Device2D and Block2D with similar functionality and design ideas. However, one topic is worth mentioning. The CUDA documentation recommends using cudaMallocPitch for all 2D allocations. The author dares to recommend exactly the opposite: do not use cudaMallocPitch.
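As a sketch of the recommended alternative, a two-dimensional block can be stored as one contiguous cudaMalloc-ed buffer with manual row-major indexing. The Flat2D class below is hypothetical and only illustrates the idea; it is not the actual ots::cuda::memory::Device2D interface.

    #include <cstddef>
    #include <new>
    #include <cuda_runtime.h>

    // Hypothetical sketch, not the actual ots::cuda::memory::Device2D class:
    // a 2D block kept as one contiguous cudaMalloc-ed buffer, indexed row-major.
    class Flat2D
    {
    public:
        Flat2D( size_t width, size_t height )
            : d_data( 0 ), m_width( width ), m_height( height )
        {
            // Exception-safe acquisition: throw on failure, release in the destructor.
            if( cudaMalloc( (void**)&d_data, width * height * sizeof(double) )
                != cudaSuccess )
                throw std::bad_alloc();
        }
        ~Flat2D() { cudaFree( d_data ); }

        // Row-major offset of element (row, col); the same arithmetic works
        // in __device__ code once the raw pointer has crossed the boundary.
        size_t offset( size_t row, size_t col ) const
        {
            return row * m_width + col;
        }
        double* data() { return d_data; }

    private:
        Flat2D( const Flat2D& );            // non-copyable: owns a device pointer
        Flat2D& operator=( const Flat2D& );
        double* d_data;
        size_t m_width, m_height;
    };

Such a wrapper consumes exactly width*height*sizeof(double) bytes of device memory, which is what motivates the recommendation, as the following experiment shows.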

The author conducted a simple experiment combining cudaMallocPitch with cudaMemGetInfo. The amount of free device memory in bytes was measured before and after a call to cudaMallocPitch. A block of width=5*sizeof(double)=40 bytes and height=1,000,000 rows was allocated. The calls to cudaMemGetInfo showed that this allocation changed the amount of free memory from 916,508,672 to 404,410,368, i.e. it consumed roughly 512 MB: each 40-byte row was padded out to a 512-byte pitch, so more than 90% of the allocated memory was wasted. On the same machine, a cudaMalloc-based allocation of size=5,000,000*sizeof(double) changed the amount of free memory from 914,108,416 to 874,000,384, about 40 MB, as it should.
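The experiment is easy to reproduce. The following sketch uses only standard CUDA runtime calls; the reported pitch and free-memory figures will of course differ from device to device.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Return the current amount of free device memory in bytes.
    static size_t freeBytes()
    {
        size_t freeMem = 0, totalMem = 0;
        cudaMemGetInfo( &freeMem, &totalMem );
        return freeMem;
    }

    int main()
    {
        void* p = 0;
        size_t pitch = 0;

        // Pitched allocation: rows of 5 doubles (40 bytes), 1,000,000 rows.
        size_t before = freeBytes();
        cudaMallocPitch( &p, &pitch, 5 * sizeof(double), 1000000 );
        size_t after = freeBytes();
        std::printf( "cudaMallocPitch: pitch=%u bytes, consumed %u bytes\n",
            (unsigned)pitch, (unsigned)( before - after ) );
        cudaFree( p );

        // Plain allocation of the same 5,000,000 doubles.
        before = freeBytes();
        cudaMalloc( &p, 5000000 * sizeof(double) );
        after = freeBytes();
        std::printf( "cudaMalloc: consumed %u bytes\n",
            (unsigned)( before - after ) );
        cudaFree( p );
        return 0;
    }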




