Calculation of partial sums in parallel.

II.

Exception safe dynamic memory handling in Cuda project.

III.

IV.

Manipulation of piecewise polynomial functions in parallel.

Manipulation of localized piecewise polynomial functions in parallel.

Downloads.

Index.

Contents.

Calculation of partial sums in parallel.

et is a given array of numbers (integers or doubles). We would like to calculate an array s.t. MATH

The presented procedure has $\log K_{0}$ -proportional processing time under assumption of infinite parallel processing capability.

We proceed as follows. Choose an integer : $1<L<<K_{0}$ .

For step 0, we calculate in parallel ( $i_{0}$ is the thread index): MATH

For step 1, we calculate MATH

We continue, for step , we calculate MATH
Partial summation in parallel figure

Summation in parallel for and $K_{0}=35$ .

The last step would be $s^{\ast}$ when MATH

We have MATH Similarly, MATH Note that for each MATH is a partial sum. Next, we correct the rest of the numbers.

For step 0 we calculate (on a single thread): MATH Thus, MATH Note that the final $u_{j_{1}}^{0}$ contains the final partial sum. Hence, there is no need to increment this position anymore. This fact is reflected in slightly different indexing for the following steps.

For step 1 we calculate (on threads, $j_{1}$ is the thread index): MATH We adopt the convention MATH The index $j_{2}$ varies from 0 to except for the boundary where $Lj_{1}-1+i~$ cannot exceed or be equal to $K_{s^{\ast}}~$ (because indexes of $u^{1}$ are in the same range as indexes of $z^{s^{\ast}-1}$ ): MATH Hence, the stated range for $j_{2}$ follows.

We have MATH MATH thus MATH

For general step we calculate (on $L^{s}$ threads, is the thread index): MATH Assuming that MATH MATH we get MATH thus MATH

At $s=s^{\ast}+1$ we have the partial sums.

It remains to describe memory allocation and positioning for the intermediate data $\left\{ z\right\}$ and $\left\{ u\right\}$ . We place (see the figure ( Partial summation in parallel )) MATH

Summary

(Calculation of partial sums) The array MATH and the parameters $K_{0}$ , : $1<L<<K_{0}$ are input data.

We introduce an operation MATH that transforms the array according to the rule MATH where $K^{\ast}$ is chosen to limit the range of so that $x_{K_{0}}$ participates in the sum at most once and the range of (thread index) is MATH