# Definition of the length of the kernel in modulo scheduling

Hello,

I have a question regarding the chapter on static scheduling.

I have understood the definition of the initiation interval and how we have built prologue, kernel and epilogue in the example on slide 78. What confuses me is the definition of the length of the kernel, given on the next slide: "length of kernel = initiation interval π"

So in the example above the length of the kernel is 1. What does this tell us? Does this matter with respect to the hardware? Since if initiation interval is the same as the length of the kernel, wouldn't it be sufficient to argue about the initiation interval?

I was thinking about what "length" could mean in this context, but my suggestions (number of iterations of the loop which represents the kernel after code transformation or the number of instructions in one iteration) are wrong.

In other words: Is length of kernel just a synonym for initiation interval or does we gather any further information like hardware aspects or anything like that?

Indeed, that is not so easy to see, but the equation "length of kernel = initiation interval π" is indeed true. To clarify, consider the examples below first, and then the general case.

As you know, the initiation interval π is the time gap between starting the different loop bodies that then overlap in execution. For instance, considering a general loop statement

```    for(i=1..n-1) {
I1[i];
:
Im[i];
}
```

with an initiation intervall π=1, we would get

```    I1;
I2;  I1;
I3;  I2;  I1;
I4;  I3;  I2;  I1;
:
```

and therefore the following kernel of length π=1:

```    I4[i];  I3[i+1];  I2[i+2];  I1[i+3];
```

Next, consider π=2:

```    I1;
I2;
I3;  I1;
I4;  I2;
I5;  I3;  I1;
I6;  I4;  I2;
I7;  I5;  I3;  I1;
I8;  I6;  I4;  I2;
```

What is the kernel here? It has length π=2 and is as follows:

```    I7[i];  I5[i+1];  I3[i+2];  I1[i+3];
I8[i];  I6[i+1];  I4[i+2];  I2[i+3];
```

You can generalize this and in the general case, where we may consider a loop body with m=p*π many statements S[i];...;S[i][p*π -1], you would schedule the statements S[i][j] as follows and thus you get the following kernel: which has π  rows and p columns. The length of the kernel is now the number of rows of the above block, i.e., π, and in each each row, we can execute p of the statements in parallel.

Bytheway, you can now also see the pipeline: in the above case, the entire execution of a loop body S[i];...;S[i][p*π -1] is split into p stages where stage k executes S[i][k*π];...;S[i][k*π -1]. The execution of one stage of a loop body overlaps with the stages of p-1 other loop bodies.

by (162k points)
selected by