{"id":724,"date":"2010-12-01T23:10:46","date_gmt":"2010-12-01T13:10:46","guid":{"rendered":"http:\/\/brnz.org\/hbr\/?p=724"},"modified":"2010-12-02T15:42:17","modified_gmt":"2010-12-02T05:42:17","slug":"assembly-primer-part-6-%e2%80%94-moving-data-%e2%80%94-ppc","status":"publish","type":"post","link":"https:\/\/brnz.org\/hbr\/?p=724","title":{"rendered":"Assembly Primer Part 6 \u2014 Moving Data \u2014 PPC"},"content":{"rendered":"<p>These are my notes for where I can see PPC varying from ia32, as presented in the video <a href=\"http:\/\/securitytube.net\/Assembly-Primer-for-Hackers-(Part-6)-Moving-Data-video.aspx\">Part 6 \u2014 Moving Data<\/a>.<\/p>\n<p>There are notable differences between PPC and ia32 when moving\/copying data around, although not as much as is the case with SPU &#8212; PPC copes with non-natural alignments (although I&#8217;m not sure what performance penalties there are on Cell or modern ia32 arches for doing so &#8212; at the very least, the cost of the occasional extra cache line), but doesn&#8217;t have the full range of mov instructions supported by ia32.<\/p>\n<p>(When approaching this part, I wrote the SPU version first because I&#8217;ve had a lot more experience with that arch and I thought it would be quicker.  I was wrong.)<\/p>\n<h2>Moving Data<\/h2>\n<p>So, let&#8217;s look at <a href=\"http:\/\/code.securitytube.net\/MovDemo.s\">MovDemo.s<\/a> for PPC, piece by piece.<\/p>\n<p>First, the storage:<\/p>\n<pre style=\"padding-left: 30px;\"># Demo program to show how to use Data types and MOVx instructions \r\n\r\n.data\r\n    HelloWorld:\r\n        .ascii \"Hello World!\"\r\n\r\n    ByteLocation:\r\n        .byte 10\r\n\r\n    Int32:\r\n        .int 2\r\n    Int16:\r\n        .short 3\r\n    Float:\r\n        .float 10.23\r\n\r\n    IntegerArray:\r\n        .int 10,20,30,40,50\r\n<\/pre>\n<p>Same as for ia32 and SPU.\u00a0 PPC will cope with the lack of alignment.<\/p>\n<h3>1. 
Immediate value to register<\/h3>\n<pre style=\"padding-left: 30px;\">.text\r\n    .globl _start\r\n    _start:\r\n        #movl $10, %eax\r\n        li 0,10\r\n<\/pre>\n<p>Simple enough to load a small constant into a register.\u00a0 Like SPU, there&#8217;s extra work required if trying to load more complex values.<\/p>\n<p>Loading a full 32 bits of immediate data into a register requires two half-word immediate instructions, one for the upper part and one for the lower part (as will be seen for loading addresses).\u00a0 Loading 64-bit values <a href=\"http:\/\/www.ibm.com\/developerworks\/library\/l-ppc\/\">appears to take five instructions<\/a> (four immediate loads and a rotate in the middle).\u00a0 The joys of 32-bit, fixed-length instructions :)<\/p>\n<p>Aside:\u00a0 <em>li<\/em> and <em>lis<\/em> are extended mnemonics that generate <em>addi<\/em> and <em>addis<\/em> instructions &#8212; and if the second operand of these instructions is register 0, they use the value zero, not the value in gpr0.\u00a0 Special case ftw.<\/p>\n<p>Speculating: On SPU, loads from local store take 6 cycles, so it will often be quicker to load a value than to generate it.\u00a0 On PPC, it would seem that even five instructions will complete much faster than a (potential) L2 cache miss.<\/p>\n<h3>2. 
Immediate value to memory<\/h3>\n<pre style=\"padding-left: 30px;\">#movw $50, Int16\r\n\r\nli 1,50\r\nlis 2,Int16@ha\r\naddi 2,2,Int16@l\r\nsth 1,0(2)\r\n<\/pre>\n<p>There&#8217;s no instruction to write an immediate value directly to memory &#8212; the source for a write must be a register, so we load that first.<\/p>\n<p>The address is loaded in the following two instructions &#8212; @l is the lower 16 bits of the address.\u00a0 @ha refers to the upper 16 bits of the address, where the a indicates the value is &#8220;adjusted so that adding the low 16 bits will perform the correct calculation of the address accounting for signed arithmetic&#8221; (from <a href=\"http:\/\/refspecs.linuxfoundation.org\/ELF\/ppc64\/PPC-elf64abi-1.9.html#RELOC-TYPE\">here<\/a>, where these suffixes are documented).\u00a0 The adjustment is needed because addi sign-extends its 16-bit immediate: when the low half has its top bit set, @ha is one greater than the plain upper 16 bits.<\/p>\n<p>The halfword is then written to the address stored in gpr2.<\/p>\n<h3>3. Register to register<\/h3>\n<pre style=\"padding-left: 30px;\">#movl %eax, %ebx\r\nori 3,0,0\r\n<\/pre>\n<p>Like SPU, register copy can be done with Or Immediate against zero.<\/p>\n<h3>4. Memory to register<\/h3>\n<pre style=\"padding-left: 30px;\">#movl Int32, %eax\r\n\r\nlis 4,Int32@ha\r\naddi 4,4,Int32@l\r\nlwz 5,0(4)\r\n<\/pre>\n<p>Easy enough &#8212; load the address, load from the address.<\/p>\n<h3>5. Register to memory<\/h3>\n<pre style=\"padding-left: 30px;\">#movb $3, %al\r\n#movb %al, ByteLocation\r\n\r\nli 6,3\r\nlis 7,ByteLocation@ha\r\naddi 7,7,ByteLocation@l\r\nstb 6,0(7)\r\n<\/pre>\n<p>Again, load the address, store a byte to the address.<\/p>\n<h3>6. 
Register to indexed memory location<\/h3>\n<pre style=\"padding-left: 30px;\">#movl $0, %ecx\r\n#movl $2, %edi\r\n#movl $22, IntegerArray(%ecx,%edi , 4)\r\n\r\nli 7,2\r\nslwi 8,7,2  # extended mnemonic - rlwinm 8,7,2,0,31-2\r\nlis 9,IntegerArray@ha\r\naddi 9,9,IntegerArray@l\r\nli 10,22\r\nstwx 10,9,8\r\n<\/pre>\n<p>Load the element index, shift it to get the byte offset, load the base address and the value to store, then use the x-form (indexed) Store Word to write to (base address + offset).<\/p>\n<p>(The matching indexed load is lwzx.\u00a0 The z in lwz and lwzx refers to zeroing of the upper 32 bits of the 64-bit register.\u00a0 On 64-bit implementations there are also algebraic loads, lwa and lwax, that sign-extend instead.)<\/p>\n<h3>7. Indirect addressing<\/h3>\n<pre style=\"padding-left: 30px;\">#movl $Int32, %eax\r\n#movl (%eax), %ebx\r\n\r\nlis 11,Int32@ha\r\naddi 11,11,Int32@l\r\nlwz 12,0(11)\r\n\r\n#movl $9, (%eax)\r\n\r\nli 13,9\r\nstw 13,0(11)\r\n<\/pre>\n<p>More of the same kind of thing because that&#8217;s how PPC does loads and stores.<\/p>\n<h2>Concluding thoughts<\/h2>\n<p>Reasonably straightforward, a bit more limited than ia32 in addressing modes but nothing too surprising.\u00a0 Particularly compared to SPU.<\/p>\n<p>PPC does appear to have some more interesting load and store instructions that I haven&#8217;t tried here &#8212; updating index on load\/store stands out as something I&#8217;d like to take a closer look at.\u00a0 The PPC rotate and mask instructions look like some mind-bending fun to play with, but that&#8217;s something for another time.<\/p>\n<h3>Previous assembly primer notes\u2026<\/h3>\n<p>Part 1 \u2014 System Organization \u2014 <a href=\"?p=631\">PPC<\/a> \u2014 <a href=\"?p=632\">SPU<\/a><br \/>\nPart 2 \u2014 Memory Organisation \u2014 <a href=\"?p=633\">SPU<\/a><br \/>\nPart 3 \u2014 GDB Usage Primer \u2014 <a href=\"?p=634\">PPC &amp; SPU<\/a><br \/>\nPart 4 \u2014 Hello World \u2014 <a href=\"https:\/\/brnz.org\/hbr\/?p=635\">PPC<\/a> \u2014 <a href=\"?p=634\">SPU<\/a><br \/>\nPart 5 &#8212; Data Types &#8212; <a 
href=\"https:\/\/brnz.org\/hbr\/?p=685\">PPC &amp; SPU<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>These are my notes for where I can see PPC varying from ia32, as presented in the video Part 6 \u2014 Moving Data. There are notable differences between PPC and ia32 when moving\/copying data around, although not as much as is the case with SPU &#8212; PPC copes with non-natural alignments (although I&#8217;m not sure &hellip; <a href=\"https:\/\/brnz.org\/hbr\/?p=724\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Assembly Primer Part 6 \u2014 Moving Data \u2014 PPC&#8221;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[5,26],"tags":[38,39],"_links":{"self":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/724"}],"collection":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=724"}],"version-history":[{"count":11,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/724\/revisions"}],"predecessor-version":[{"id":736,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=\/wp\/v2\/posts\/724\/revisions\/736"}],"wp:attachment":[{"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=724"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=724"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/brnz.org\/hbr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=724"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\
/{rel}","templated":true}]}}