{"id":30,"date":"2009-01-15T15:26:00","date_gmt":"2009-01-15T05:26:00","guid":{"rendered":"http:\/\/brnz.org\/hbr\/?p=30"},"modified":"2010-11-09T14:14:17","modified_gmt":"2010-11-09T04:14:17","slug":"spu-unaligned-loads","status":"publish","type":"post","link":"https:\/\/brnz.org\/hbr\/?p=30","title":{"rendered":"SPU unaligned loads"},"content":{"rendered":"<p>Extract three adjacent ushorts from an arbitrary array location.<\/p>\n<p>(Would do a lot better unrolled, I think)<\/p>\n<pre>for (j = 0; j &lt; num_indexes; j += 3) {\r\n    \/\/ Determine address of aligned qword containing indexes[j]\r\n    qword lower_qword = si_from_ptr(&amp;indexes[j]);\r\n\r\n    \/\/ Load the aligned qword containing indexes[j] and the one after it\r\n    \/\/ (lqd ignores the low four bits of the address)\r\n    qword first = si_lqd(lower_qword, 0);\r\n    qword second = si_lqd(lower_qword, 16);\r\n\r\n    \/\/ Calculate &amp;indexes[j] &amp; 15: offset of the index from 16-byte alignment\r\n    qword offset = si_andi(lower_qword, 15);\r\n\r\n    \/\/ Generate a mask to select the appropriate parts of first and\r\n    \/\/ second: form a byte select mask from (1 &lt;&lt; offset) - 1,\r\n    \/\/ which sets the last offset bytes (the ones taken from second)\r\n    qword one = si_from_uint(1);\r\n    qword mask = si_fsmb(si_sf(one, si_shl(one, offset)));\r\n\r\n    \/\/ Shift first left by offset bytes and second right by\r\n    \/\/ 16 - offset bytes (rotqmby negates its count) so the wanted\r\n    \/\/ bytes line up. This is the key interesting bit, but I'd like\r\n    \/\/ to think this could be improved upon...\r\n    first = si_shlqby(first, offset);\r\n    second = si_rotqmby(second, si_ori(offset, 16));\r\n\r\n    \/\/ Merge: is now holds indexes[j], [j+1], [j+2] in its first three halfwords\r\n    qword is = si_selb(first, second, mask);\r\n\r\n    \/\/ Expand is to uint positioning (SHUFB8 is a helper macro, not shown here)\r\n    is = si_shufb(is, is, SHUFB8(0,A,0,B,0,C,0,0));\r\n\r\n    \/\/ vs = is * vertex_size + vertices for each word slot.\r\n    \/\/ mpya multiplies the signed low halfwords of each word.\r\n    qword vs = si_mpya(is, (qword)spu_splats(vertex_size),\r\n                       (qword)spu_splats((unsigned)vertices));\r\n\r\n    func(vs);\r\n}<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>Extract three adjacent ushorts from an arbitrary array location. 
(Would do a lot better unrolled, I think) for (j = 0; j &lt; num_indexes; j += 3) { \/\/ Determine address of aligned qword containing indexes[j] qword lower_qword = si_from_ptr(&amp;indexes[j]); \/\/ Load qword containing indexes[j] and successor \u00a0qword first = si_lqd(lower_qword, 0); \u00a0qword second = &hellip; <a href=\"https:\/\/brnz.org\/hbr\/?p=30\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;SPU unaligned loads&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[4],"tags":[37],"_links":{"self":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/30"}],"collection":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=30"}],"version-history":[{"count":3,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/30\/revisions"}],"predecessor-version":[{"id":622,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/30\/revisions\/622"}],"wp:attachment":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=30"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=30"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=30"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
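The unaligned-load trick in the post (two aligned loads, two byte shifts, one select) can be checked off-target with a plain-C emulation. This is a sketch under stated assumptions: `qword_t` and the `emu_*` helpers are hypothetical stand-ins that mimic the semantics of the SPU instructions `lqd`, `shlqby`, `rotqmby`, `fsmb` and `selb` on a 16-byte array; they are not the `spu_intrinsics.h` API. A byte array plays the role of local store, and at least 32 bytes must be readable from the aligned base.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical plain-C stand-in for the SPU's 128-bit qword. */
typedef struct { uint8_t b[16]; } qword_t;

/* Like lqd: load the 16-byte block containing addr (low 4 bits ignored). */
static qword_t emu_lqd(const uint8_t *ls, uint32_t addr) {
    qword_t q;
    memcpy(q.b, ls + (addr & ~15u), 16);
    return q;
}

/* Like shlqby: shift left by n bytes, zero fill (n >= 16 gives zero). */
static qword_t emu_shlqby(qword_t q, uint32_t n) {
    qword_t r;
    n &= 31;
    for (uint32_t i = 0; i < 16; i++)
        r.b[i] = (i + n < 16) ? q.b[i + n] : 0;
    return r;
}

/* Like rotqmby: shift right by (-count & 31) bytes, zero fill. */
static qword_t emu_rotqmby(qword_t q, uint32_t count) {
    qword_t r;
    uint32_t s = (0u - count) & 31;
    for (uint32_t i = 0; i < 16; i++)
        r.b[i] = (i >= s) ? q.b[i - s] : 0;
    return r;
}

/* Like fsmb: expand each of 16 bits (MSB first) into a 0x00/0xFF byte. */
static qword_t emu_fsmb(uint32_t bits) {
    qword_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = ((bits >> (15 - i)) & 1) ? 0xFF : 0x00;
    return r;
}

/* Like selb: take bytes of b where the mask is set, else bytes of a. */
static qword_t emu_selb(qword_t a, qword_t b, qword_t m) {
    qword_t r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)((a.b[i] & ~m.b[i]) | (b.b[i] & m.b[i]));
    return r;
}

/* The sequence from the post: returns the 16 bytes starting at addr. */
static qword_t unaligned_load(const uint8_t *ls, uint32_t addr) {
    qword_t first = emu_lqd(ls, addr);
    qword_t second = emu_lqd(ls, addr + 16);
    uint32_t offset = addr & 15;
    qword_t mask = emu_fsmb((1u << offset) - 1); /* last offset bytes set */
    first = emu_shlqby(first, offset);           /* left by offset */
    second = emu_rotqmby(second, offset | 16);   /* right by 16 - offset */
    return emu_selb(first, second, mask);
}
```

On the SPU each `emu_*` helper corresponds to a single instruction, which is what makes the sequence cheap; the byte loops here only serve to check the shift and mask arithmetic for every offset, including offset 0, where the mask is empty and `second` is shifted out entirely.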