(This is wonderfully obtuse, but amused me :)

Neil Henning (@sheredom) asked:

SPU gurus of twitter unite, want a vector unsigned int with {1, 2, 3, 4} in each slot, without putting it in as elf constant, any ideas?

Interesting question. The SPU ISA generally doesn’t help build vectors with different values in each slot. In this case, there are only very small values required in each register, so it can be done with a neat little trick.

My answer:

    fsmbi r4, 0x7310  # r4 = {0x00ffffff, 0x0000ffff, 0x000000ff, 0x00000000}
    clz r5, r4        # r5 = {8,16,24,32}
    rotmi r6, r5, -3  # r6 = {1,2,3,4}

Instructions are:

  • fsmbi — form select mask byte immediate. Creates a 128 bit mask from a 16 bit value, expanding each bit of input to 8 bits of output.
  • clz — count leading zeroes. Counts the number of leading zeros in each word.
  • rotmi — rotate and mask word immediate (logical shift right by negative immediate). Shifts each word right by the negation of number of bits specified.

This solution is entirely self contained, required no pre-set state (unlike my first attempt utilising the cbd instruction). In terms of raw instruction size, it’s a whole eight bytes smaller than storing the vector in memory and loading it when needed (that being 16+4 bytes), and a little slower than using a load instruction.

(On a cursory re-examination of the SPU ISA, fsmbi is the only instruction that will construct a different value in each word of a register. A specific pattern may be generated with cbd/cbx that can be used for this problem, but it depends on the contents of another register which limits its already limited usefulness. Combining fsmbi with other immediate instructions allows for a wide range of values to be constructed independent of register state and without access to storage)


2 thoughts on “{1,2,3,4}”

Leave a Reply

Your email address will not be published. Required fields are marked *