Quantitative Analysis
Parallel Processing
Numerical Analysis
C++ Multithreading
Python for Excel
Python Utilities
Services
Author

I. Introduction into GPU programming.
II. Exception safe dynamic memory handling in Cuda project.
III. Calculation of partial sums in parallel.
IV. Manipulation of piecewise polynomial functions in parallel.
V. Manipulation of localized piecewise polynomial functions in parallel.
Downloads. Index. Contents.

Calculation of partial sums in parallel.


et MATH is a given array of numbers (integers or doubles). We would like to calculate an array MATH s.t. MATH

The presented procedure has $\log K_{0}$ -proportional processing time under assumption of infinite parallel processing capability.

We proceed as follows. Choose an integer $L$ : $1<L<<K_{0}$ .

For step 0, we calculate in parallel ( $i_{0}$ is the thread index): MATH

For step 1, we calculate MATH

We continue, for step $s$ , we calculate MATH
Partial summation in parallel figure
Summation in parallel for $L=3$ and $K_{0}=35$ .

The last step would be $s^{\ast}$ when MATH

We have MATH Similarly, MATH Note that for each $s$ MATH is a partial sum. Next, we correct the rest of the numbers.

For step 0 we calculate (on a single thread): MATH Thus, MATH Note that the final $u_{j_{1}}^{0}$ contains the final partial sum. Hence, there is no need to increment this position anymore. This fact is reflected in slightly different indexing for the following steps.

For step 1 we calculate (on $L$ threads, $j_{1}$ is the thread index): MATH We adopt the convention MATH The index $j_{2}$ varies from 0 to $L-1$ except for the boundary where $Lj_{1}-1+i~$ cannot exceed or be equal to $K_{s^{\ast}}~$ (because indexes of $u^{1}$ are in the same range as indexes of $z^{s^{\ast}-1}$ ): MATH Hence, the stated range for $j_{2}$ follows.

We have MATH MATH thus MATH

For general step $s$ we calculate (on $L^{s}$ threads, $j$ is the thread index): MATH Assuming that MATH MATH we get MATH thus MATH

At $s=s^{\ast}+1$ we have the partial sums.

It remains to describe memory allocation and positioning for the intermediate data $\left\{ z\right\} $ and $\left\{ u\right\} $ . We place (see the figure ( Partial summation in parallel )) MATH

Summary

(Calculation of partial sums) The array MATH and the parameters $K_{0}$ , $L$ : $1<L<<K_{0}$ are input data.

We introduce an operation MATH that transforms the array MATH according to the rule MATH where $K^{\ast}$ is chosen to limit the range of $i$ so that $x_{K_{0}}$ participates in the sum at most once and the range of $j$ (thread index) is MATH

Let MATH

We introduce the operation MATH that transforms the array MATH according to the rule (single threaded calculation): MATH

We introduce the operation MATH that transforms the array MATH according to the rule ( $j$ is the thread index): MATH where MATH

We consecutively perform the operations MATH MATH MATH At completion MATH contains the partial sums.





Downloads. Index. Contents.


















Copyright 2007