
I. Introduction to GPU programming.
II. Exception-safe dynamic memory handling in a Cuda project.
1. Allocating and deallocating device memory. ots::cuda::memory::Device class.
2. Accessing device memory. ots::cuda::memory::Host class.
3. Crossing host/device boundary. ots::cuda::memory::DeviceHandler class.
4. Accessing memory in __device__ code. ots::cuda::memory::Block class.
5. Handling two-dimensional memory blocks. Do not use cudaMallocPitch.
6. Allocation of memory from Host scope.
7. Tagged data. Compiler assisted data verification.
III. Calculation of partial sums in parallel.
IV. Manipulation of piecewise polynomial functions in parallel.
V. Manipulation of localized piecewise polynomial functions in parallel.
Downloads. Index. Contents.

Exception-safe dynamic memory handling in a Cuda project.


There are three distinct compiler scopes in every Cuda project.

"Regular" compiler scope covers cpp files (compiled by the native C++ compiler). In this scope all the regular C++ tools are available but the Cuda API is not accessible.

"Host" compiler scope resides in cu files (compiled by the Nvidia compiler). This scope covers host-side Cuda programming. It includes memory allocation and deallocation of device-side memory and kernel launches. Most of C++ standard library and boost library are not accessible in this compiler scope.

"Device" compiler scope resides in cu files. This scope covers device-side multithreaded code. The most narrow set of tools is available in this scope. Flow control operations (if,while,for) should be avoided. Function recursion should be avoided. There is no exception throwing or handling.

Host and Device scopes operate over distinct memory domains. Data communication between these scopes is possible only via stack variables and Cuda API calls.
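
For instance, a device buffer is created and filled through Cuda API calls, and the device pointer then crosses the boundary by value as an ordinary kernel argument. A hedged sketch; square_in_place and square_on_device are hypothetical names:

    // Host scope, inside a .cu file.
    __global__ void square_in_place(float* d, unsigned int n)
    {
        unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= d[i];
    }

    void square_on_device(float* host_data, unsigned int size)
    {
        float* device_data = 0;
        cudaMalloc(reinterpret_cast<void**>(&device_data), size * sizeof(float));

        // Cuda API calls move data between the two memory domains.
        cudaMemcpy(device_data, host_data, size * sizeof(float),
                   cudaMemcpyHostToDevice);

        // The pointer itself is passed by value, i.e. as a stack variable.
        square_in_place<<<(size + 255) / 256, 256>>>(device_data, size);

        cudaMemcpy(host_data, device_data, size * sizeof(float),
                   cudaMemcpyDeviceToHost);
        cudaFree(device_data);
    }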

Regular and Host compiler scopes operate over the same memory domain. To control Cuda from the Regular compiler scope, one needs to isolate the Cuda API inside a layer of user-defined functions placed in the Host scope and then call these functions normally from the Regular scope.
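
A sketch of such a layer, using hypothetical names (device_memory.h, allocate_device_bytes, free_device_bytes) rather than the project's actual interface: the header exposes plain C++ declarations, the .cu file is the only translation unit that touches the Cuda API, and any .cpp file calls the wrappers like ordinary functions.

    // device_memory.h -- plain C++ header, usable from both Regular and Host scope.
    void* allocate_device_bytes(unsigned int size);   // returns a device pointer
    void  free_device_bytes(void* device_ptr);

    // device_memory.cu -- Host scope: the only place where the Cuda API appears.
    #include "device_memory.h"

    void* allocate_device_bytes(unsigned int size)
    {
        void* p = 0;
        if (cudaMalloc(&p, size) != cudaSuccess)
            p = 0;                       // report failure as a null pointer
        return p;
    }

    void free_device_bytes(void* device_ptr)
    {
        if (device_ptr != 0)
            cudaFree(device_ptr);
    }

    // caller.cpp -- Regular scope: ordinary C++ code, no Cuda headers needed.
    #include "device_memory.h"

    void regular_scope_example()
    {
        void* block = allocate_device_bytes(1024 * sizeof(float));
        // ... hand the pointer to other Host-scope wrappers ...
        free_device_bytes(block);
    }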

In this section we illustrate such a separation of responsibilities by implementing exception-safe and leak-safe control of device memory from the Regular scope. The source and make files are part of the PiecewisePoly project (explained in the following sections). All the files are accessible from the Downloads section.
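
The idea, in outline, is an RAII holder that lives in the Regular scope and delegates the actual Cuda calls to Host-scope wrappers such as the hypothetical allocate_device_bytes / free_device_bytes above. The sketch below only illustrates the approach; the project's actual ots::cuda::memory::Device class is described in the first subsection.

    // Regular scope (.cpp): exception-safe ownership of a device memory block.
    #include <new>

    void* allocate_device_bytes(unsigned int size);   // implemented in a .cu file
    void  free_device_bytes(void* device_ptr);

    class DeviceBlock
    {
    public:
        explicit DeviceBlock(unsigned int size)
            : ptr_(allocate_device_bytes(size))
        {
            if (ptr_ == 0)
                throw std::bad_alloc();       // safe: we are in Regular scope
        }

        ~DeviceBlock() { free_device_bytes(ptr_); }   // runs during stack unwinding

        void* get() const { return ptr_; }

    private:
        DeviceBlock(const DeviceBlock&);              // non-copyable
        DeviceBlock& operator=(const DeviceBlock&);

        void* ptr_;
    };

    // Whatever happens after construction, the device memory cannot leak:
    // if an exception propagates, ~DeviceBlock() still calls free_device_bytes.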




1. Allocating and deallocating device memory. ots::cuda::memory::Device class.
2. Accessing device memory. ots::cuda::memory::Host class.
3. Crossing host/device boundary. ots::cuda::memory::DeviceHandler class.
4. Accessing memory in __device__ code. ots::cuda::memory::Block class.
5. Handling two-dimensional memory blocks. Do not use cudaMallocPitch.
6. Allocation of memory from Host scope.
7. Tagged data. Compiler assisted data verification.



















Copyright 2007