-funroll-loops

In general, C is a lousy language for expressing this kind of parallelism on the SPU. The original loop that ‘inspired’ this nonsense looks something like this:

for (j = 0; j < num_indexes; j += 3) {   
 const float *v0, *v1, *v2;
 v0 = (const float *) (vertices + indexes[j+0] * vertex_size);
 v1 = (const float *) (vertices + indexes[j+1] * vertex_size);
 v2 = (const float *) (vertices + indexes[j+2] * vertex_size);

 func(v0, v1, v2);
}

which is clear and straightforward to read, but hides some complexity: the indexes are not quadword-aligned, the address calculation is expressed as three separate multiply-adds, and the results are split into three (unpacked) variables that are then repacked inside func().
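To make that hidden complexity explicit, here is a minimal sketch in portable C. It is not SPU code and not the original author's rewrite. It assumes `vertices` is a byte pointer and `vertex_size` is a size in bytes (as the casts in the loop suggest). It emulates a quadword with a hypothetical `quad_u32` struct instead of the SPU's `vector unsigned int` and intrinsics. The three indexes are copied from a possibly unaligned position into one aligned quadword, a single lane-wise multiply produces all three byte offsets, and the offsets stay packed when handed to a hypothetical `func_packed()`, which stands in for `func()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an SPU quadword: four 32-bit lanes, 16-byte aligned. */
typedef struct {
    _Alignas(16) uint32_t lane[4];
} quad_u32;

/* Copy three 32-bit indexes from an arbitrary (possibly unaligned) position
 * into lanes 0..2 of an aligned quadword; lane 3 is left as zero. */
static quad_u32 load_triangle_indexes(const uint32_t *indexes)
{
    quad_u32 q = {{0, 0, 0, 0}};
    memcpy(q.lane, indexes, 3 * sizeof(uint32_t));
    return q;
}

/* One lane-wise multiply yields the byte offsets of all three vertices,
 * instead of three separate scalar multiply-adds. */
static quad_u32 vertex_offsets(quad_u32 idx, uint32_t vertex_size)
{
    quad_u32 out;
    for (int k = 0; k < 4; k++)
        out.lane[k] = idx.lane[k] * vertex_size;
    return out;
}

/* Hypothetical consumer that receives the offsets still packed, rather than
 * three separate pointers it would have to repack. For illustration it just
 * sums the first float of each of the three vertices. */
static float func_packed(const unsigned char *vertices, quad_u32 off)
{
    float sum = 0.0f;
    for (int k = 0; k < 3; k++) {
        float x;
        memcpy(&x, vertices + off.lane[k], sizeof x);
        sum += x;
    }
    return sum;
}

/* The original loop, restructured around packed quadwords. */
float process_triangles(const unsigned char *vertices, uint32_t vertex_size,
                        const uint32_t *indexes, size_t num_indexes)
{
    float total = 0.0f;
    for (size_t j = 0; j + 3 <= num_indexes; j += 3) {
        quad_u32 idx = load_triangle_indexes(&indexes[j]);
        quad_u32 off = vertex_offsets(idx, vertex_size);
        total += func_packed(vertices, off);
    }
    return total;
}
```

On the real SPU, the lane loop in `vertex_offsets()` would correspond to a single vector multiply, and the unaligned load would need extra shuffling. A C compiler may or may not recover that from code like this, which is the heart of the complaint above.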
