Admittedly, that is not so easy to see, but the equation "length of kernel = initiation interval π" does hold. To clarify, let us first consider some examples and then the general case.
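As you know, the initiation interval π is the time between the starts of consecutive loop bodies (iterations), which then overlap in execution. For instance, consider a loop whose body consists of the four instructions I1, I2, I3, I4 (executed in this order). With an initiation interval π=1, we get the following schedule, where each row is one time step and the leftmost instruction belongs to the oldest iteration (only the last prologue step before the steady state is shown):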
I3; I2; I1;
I4; I3; I2; I1;
and therefore the following kernel of length π=1, where I3[i+1] denotes instruction I3 of iteration i+1:
I4[i]; I3[i+1]; I2[i+2]; I1[i+3];
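To make the π=1 schedule concrete, here is a small Python sketch. It is only an illustration and assumes, as in the rows above, that every instruction takes one time step, so that instruction Ij of iteration i is issued at time i*π + (j-1). The function name `schedule` and the iteration count n are hypothetical choices. It prints one row per time step, oldest iteration first, including the full prologue and epilogue.

```python
def schedule(m, pi, n):
    """Return one row per time step; each row lists (instruction, iteration)
    pairs, oldest iteration first. Assumes instruction Ij (j = 1..m) of
    iteration i is issued at time i*pi + (j - 1)."""
    last = (n - 1) * pi + (m - 1)          # issue time of the very last instruction
    rows = []
    for t in range(last + 1):
        row = []
        for i in range(n):                  # oldest iteration first
            j = t - i * pi                  # 0-based position within the body
            if 0 <= j < m:
                row.append((f"I{j + 1}", i))
        rows.append(row)
    return rows


# Four instructions, pi = 1, six iterations.
for row in schedule(m=4, pi=1, n=6):
    print("; ".join(f"{ins}[{it}]" for ins, it in row) + ";")
```

The rows with four entries are exactly the kernel I4[i]; I3[i+1]; I2[i+2]; I1[i+3] for successive values of i; the shorter rows before and after them are the prologue and epilogue.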
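Next, consider a loop body with the eight instructions I1, ..., I8 and π=2 (again, only the last prologue steps are shown):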
I5; I3; I1;
I6; I4; I2;
I7; I5; I3; I1;
I8; I6; I4; I2;
What is the kernel here? It has length π=2 and is as follows:
I7[i]; I5[i+1]; I3[i+2]; I1[i+3];
I8[i]; I6[i+1]; I4[i+2]; I2[i+3];
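The π=2 kernel can be reproduced the same way. The sketch below again assumes one time step per instruction (Ij of iteration i issued at i*π + (j-1)); `issued_at` is a hypothetical helper. It prints the two kernel rows and checks that, once all p = m/π iterations are in flight, the row at time t+π equals the row at time t with every iteration index advanced by one. That periodicity is why π consecutive rows suffice as the kernel.

```python
def issued_at(t, m, pi, n):
    """(instruction, iteration) pairs issued at time step t, oldest iteration
    first, assuming instruction Ij of iteration i is issued at i*pi + (j - 1)."""
    row = []
    for i in range(n):
        j = t - i * pi                      # 0-based position within the body
        if 0 <= j < m:
            row.append((f"I{j + 1}", i))
    return row


m, pi, n = 8, 2, 10                         # 8 instructions, pi = 2, 10 iterations
p = m // pi                                 # iterations in flight in the steady state

# First time step at which iteration p-1 has started, so all p are in flight.
start = (p - 1) * pi
for t in range(start, start + pi):
    print("; ".join(f"{ins}[{it}]" for ins, it in issued_at(t, m, pi, n)) + ";")

# Steady state lasts until time n*pi - 1; within it, rows repeat every pi steps
# with each iteration index shifted by one.
for t in range(start, n * pi - pi):
    shifted = [(ins, it + 1) for ins, it in issued_at(t, m, pi, n)]
    assert issued_at(t + pi, m, pi, n) == shifted
```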
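This generalizes as follows. Consider a loop body with m = p*π statements S[i][0];...;S[i][p*π-1]. Assuming, as in the examples, that each statement takes one time step, you would schedule statement S[i][j] at time step i*π + j, and thus you get the following kernel:
S[i][(p-1)*π]; S[i+1][(p-2)*π]; ...; S[i+p-1][0];
S[i][(p-1)*π+1]; S[i+1][(p-2)*π+1]; ...; S[i+p-1][1];
...
S[i][p*π-1]; S[i+1][(p-1)*π-1]; ...; S[i+p-1][π-1];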
which has π rows and p columns. The length of the kernel is the number of rows of this block, i.e., π, and in each row, p statements (each from a different loop iteration) can be executed in parallel.
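The equation "length of kernel = π" can also be derived directly from issue times. The derivation below assumes, as in the examples, that statement S[i][j] is issued at time iπ + j; the symbols t(i,j) and T are introduced here only for the argument. For a fixed statement index j, consecutive iterations are issued exactly π steps apart, so every window of π consecutive time steps in the steady state contains exactly one instance of each statement, i.e., one complete loop body's worth of work:

```latex
\begin{align*}
t(i,j) &= i\pi + j, \qquad 0 \le j \le p\pi - 1
  && \text{(issue time of } S[i][j]\text{)}\\
t(i+1,j) - t(i,j) &= \pi
  && \text{(same statement, next iteration)}\\
T \le t(i,j) \le T + \pi - 1 &\iff i = \left\lfloor \frac{T + \pi - 1 - j}{\pi} \right\rfloor
  && \text{(exactly one iteration per window)}
\end{align*}
```

Writing j = kπ + r with 0 ≤ r ≤ π-1, kernel row r contains the p statements S[i+p-1-k][kπ + r] for k = 0, ..., p-1, all issued at the same time (i+p-1)π + r.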
By the way, you can now also see the pipeline: the entire execution of a loop body S[i][0];...;S[i][p*π-1] is split into p stages, where stage k (k = 0, ..., p-1) executes S[i][k*π];...;S[i][(k+1)*π-1]. While one stage of a loop body executes, it overlaps with a different stage of each of p-1 other loop bodies.
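The stage decomposition can be checked with a short Python sketch. It assumes that column c of kernel row r holds S[i+c][(p-1-c)*π + r], which reproduces the π=2 kernel of the examples; `kernel` and `stage` are hypothetical helper names. The checks confirm that all p*π statements of the body appear exactly once in the kernel, and that every row contains one statement from each of the p stages.

```python
def kernel(p, pi):
    """Kernel rows for a body S[0..p*pi-1] with initiation interval pi.
    Row r holds S[i+c][(p-1-c)*pi + r] in column c = 0..p-1; each entry is an
    (iteration offset c, statement index) pair, offsets relative to i."""
    return [[(c, (p - 1 - c) * pi + r) for c in range(p)] for r in range(pi)]


def stage(j, pi):
    """Stage k of statement S[.][j]; stage k covers S[.][k*pi .. (k+1)*pi - 1]."""
    return j // pi


p, pi = 4, 2
rows = kernel(p, pi)
for r, row in enumerate(rows):
    print(f"row {r}: " + "; ".join(f"S[i+{c}][{j}]" for c, j in row)
          + "  stages " + str([stage(j, pi) for _, j in row]))

# Every statement of the body appears exactly once in the pi kernel rows ...
assert sorted(j for row in rows for _, j in row) == list(range(p * pi))
# ... and each row executes one statement from each of the p stages.
assert all(sorted(stage(j, pi) for _, j in row) == list(range(p)) for row in rows)
```

The design choice of storing iteration offsets rather than absolute iteration numbers mirrors the I4[i]; I3[i+1]; ... notation, where i names the oldest iteration currently in flight.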