# Parallel Merging

In the first chapter on PRAM, a parallel merging algorithm is given starting from slide 55. I understand everything up to slide 57, but then it gets confusing for me.

The slide states:

for every block i of a, we now have the greatest j=firstC[i] such that
a[i*M]>=b[j*M] holds

The computation on the slide is correct as far as I can see. Having split arrays "a" and "b" into M blocks of size M, we first compare the M first entries of the blocks of "a" with the M first entries of the blocks of "b" which yields the boolean array c[i][j]. In the example, we got

 1 1 1 1 0 1 1 1 0 1 1 1 0 0 1 1

Then, we compute a boolean array firstC such that firstC[i] is the least index j where c[i][j], i.e., a[i*M]<b[j*M] holds. In the example, array firstC is then [|0;1;1;2|].

After this, the positive entries are decremented so that we finally get firstC=[|0;0;0;1|]. For every block i of array "a", we now have the greatest j=firstC[i] such that a[i*M]>=b[j*M] holds. The elements of block a[i*M] must therefore be merged somewhere inside block b[j*M] where j=firstC[i].

It remains to find the index inside block b[j*M] with j=firstC[i] where a[i*M] must be put in between. To this end, the same procedure is repeated, i.e., we compute a boolean array d as d[i][j] := a[i*M]<b[firstC[i]*M+j] for i,j=0..M-1 which is as follows:

 1 1 1 1 0 0 1 1 0 0 0 0 0 0 1 1

We then compute for each row the first index >0, which yields firstD=[|0;2;4;2|] (note that we return M=4 if there is no index >0).

The ranks are determined such that rank[i] is the least index j where a[i*M]<b[j] holds, i.e., we then have a[i*M]<b[rank[i]]. We obtain the ranks from arrays firstC and firstD as follows: rank[i] = firstC[i]*M+firstD[i] where firstD[i]=M must be treated as zero since there is no j=0..M-1 with a[i*M]<b[firstC[i]*M+j].

by (142k points)
selected
So we have firstC=[|0;0;0;1|], j=firstC[i] and a[i*M]>=b[j*M] holds. Suppose now i=0, this leads to a[0*M]>=b[firstC[i]*M] which is equal to a>=b. This means that the first block of "a" must be merged into the first block of "b". This makes sense. But what confuses me is what happens when we plug in the actual values from the arrays. So in this example a>=b will be 1>=4, which seems odd. Are we only looking at the block positions and not at the actual values here?
Well, you are right with this. As explained above, we first compute firstC such that firstC[i] is the least index such that a[i*M]<b[firstC[i]*M] holds. In the example, a is the least element of both arrays, and therefore firstC=0.

The next step decrements the values in firstC so that firstC[i] shall become the greatest index such that a[i*M]>=b[firstC[i]*M] holds (that is so because incrementing firstC[I] would result in a[i*M]>=b[firstC[i]*M]). This means that a[i*M] has to be placed somewhere in block b[firstC[i]*M].

Now, here is the problem. If firstC[i] was 0, and we would decrement it, we would get negative index expressions. For this reason, we do not decrement it, and therefore, you are right that a[i*M]>=b[firstC[i]*M]) is not true for those values i where firstC[I] was already 0. So, you are right that in this case the statement you highlighted is wrong.

Still the algorithm is correct, since even in this case, a[i*M] has to be placed somewhere inside block b.

So, I shall correct the sentence as follows: "for every block i of a, we have found the block b[firstC[i]*M] where a[i*M] has to be placed in"
I see now were the problem is. Thank you for the great explanation!