+2 votes
I want to understand why is D=O(log(n)) in Beyond Comparison-based Sorting: (Parallel) Counting Sort.

module CountSortPar([N]nat{M} ?a,b) {
[N]nat{N+1} ls,gt;
// for every a[i] count the number of elements a[j] that are less than a[i]
for(i=0..N−1)
ls[i] = sum(j=0..N−1) (a[j]<a[i]?1:0);
// for every a[i] count the number of elements a[j] that are greater than a[i]
for(i=0..N−1)
gt[i] = sum(j=0..N−1) (a[j]>a[i]?1:0);
// generate the sorted array
for(i=0..N−1)
for(j=0..N−1)
if(ls[i]<=j & j<N−gt[i])
b[j] = a[i];
}

I agree that W=O(N^2 ) and P=O(N^2 ) processors.

W=O(N^2 ) and P=O(N^2 ) are regarding to the last "for" that generates the sorted array and they can sort the array in one PRAM step D=O(1).

But in the first and second "for" the algorithm has to compare the condition a[j]<a[i], assigns the value of 1or 0 and sums in ls[i] or gt[i], and for me W=O(N^2),P=O(N) (to compare j=0..N-1 with every a[i]), and D=O(N) (due to every value of i).

I don't understand why D=O(log(n)) in the code?.
in * TF "Emb. Sys. and Rob." by (210 points)

1 Answer

+3 votes

Thank you for your question and the careful studying of the matters even though that algorithm is not an important one as explained in the slides and the video. So, why is the depth O(log(N))? When you study the bodies of the loops, you see in the first two loop an assignment whose right hand side contains a sum(j=0..N.1) over some values which are 0 or 1. To compute those sums you need O(log(N)) time as another algorithm (about array sums) explains. 

To clarify, have a look at the first loop

    for(i=0..N−1)
        ls[i] = sum(j=0..N−1) (a[j]<a[i]?1:0);

You can rewrite this code as follows:
    for(i=0..N−1) {
        s = 0;
        for(j=0..N-1) {
            next(s) = s + (a[j]<a[i]?1:0);
            pause;
        }
        ls[i] = s;
    }

However, the depth would now be even O(N): Note that we can run the N inner loops in parallel, but since they are sequential, each one requires O(N) depth which is then also the case for the entire statement.

We can improve this by arranging the inner loop's operations in a binary tree as shown in the chapter on array sums. If you would do that, the sums would be computed in O(log(N)) depth and O(N) work, and that leads to the statements on the slide that you were wondering about.

It was not explained in that detail since the algorithm is NOT a comparison-based one that can therefore have less work than O(N log(N)). These are the algorithms we do not want to study in the chapter for the reasons mentioned.

by (96.2k points)
edited by

Related questions

0 votes
1 answer
0 votes
2 answers
asked Feb 4, 2021 in * TF "Emb. Sys. and Rob." by dn (1.4k points)
0 votes
2 answers
asked Aug 18, 2020 in * TF "Emb. Sys. and Rob." by Nicola (800 points)
Imprint | Privacy Policy
...