Assembly Primer Part 3 — GDB Usage Primer

These are my notes for where I can see both PPC and SPU varying from ia32, as presented in the video Part 3 — GDB Usage Primer.  The usage of gdb is effectively the same for all three architectures — I’ve noted here some of the differences in the program being debugged.

In the ia32 disassembly of SimpleDemo.c, the call instruction is generated for function calls.

When compiled for PPC, I see bl — branch to address offset from bl instruction, placing the address of the following instruction in the link register (lr).

When compiled for SPU, I see brsl — branch to address offset from brsl instruction, placing the address of the following instruction into the specified register (typically r0, used as link register).

Neither PPC nor SPU passes args on the stack (at least not for two scalar args, as with the add function in SimpleDemo.c).  The values can still be seen on the stack when examining it in gdb.  The reason appears to be that, when compiled with no optimisation, a number of registers are pushed to the stack unnecessarily.  Compiling at -O1 eliminates the superfluous pushes, so the args are no longer visible there and are only present in the appropriate registers when the function is called.

(This document on calling conventions from Intel seems to say that args get passed to functions in regs where possible on ia32 as well… I can see it happening for amd64, not ia32)

As noted above, PPC and SPU store the function return address in the link register (lr or r0), not on the stack.

All three architectures appear to put the return value in a register (eax or r3).

Previous assembly primer notes…

Part 1 — System Organization — PPC — SPU
Part 2 — Memory Organisation — SPU

Assembly Primer Part 2 — Memory Organisation — SPU

These are my notes for where I can see SPU varying from ia32, as presented in the video Part 2 — Virtual Memory Organization.

(I didn’t notice any significant differences between the information presented for ia32 and PPC, apart from what was noted from the first presentation, so there’s no separate post for that arch).

To compile SimpleDemo.c to examine on the SPU, you’ll need to add the -mstdmain option to spu-gcc (or spu-elf-gcc) so that the program will correctly receive the command line options.

If you examine the /proc/$PID/maps file when running a standalone SPU program, you’ll see something like this:

00100000-00120000 r-xp 00000000 00:00 0       [vdso]
0fd70000-0fd90000 r-xp 00000000 fe:02 1590608 /lib/libgcc_s.so.1
0fd90000-0fda0000 rw-p 00010000 fe:02 1590608 /lib/libgcc_s.so.1
0fdb0000-0fdd0000 r-xp 00000000 fe:02 292441  /lib/libpthread-2.11.2.so
0fdd0000-0fde0000 rw-p 00010000 fe:02 292441  /lib/libpthread-2.11.2.so
0fdf0000-0fe00000 r-xp 00000000 fe:02 292418  /lib/librt-2.11.2.so
0fe00000-0fe10000 rw-p 00000000 fe:02 292418  /lib/librt-2.11.2.so
0fe20000-0ff90000 r-xp 00000000 fe:02 292437  /lib/libc-2.11.2.so
0ff90000-0ffa0000 rw-p 00160000 fe:02 292437  /lib/libc-2.11.2.so
0ffa0000-0ffb0000 rw-p 00000000 00:00 0
0ffc0000-0ffe0000 r-xp 00000000 fe:02 1590211 /usr/lib/libspe2.so.2.2.80
0ffe0000-0fff0000 rw-p 00010000 fe:02 1590211 /usr/lib/libspe2.so.2.2.80
10000000-10010000 r-xp 00000000 fe:02 1821445 /usr/bin/elfspe
10010000-10020000 rw-p 00000000 fe:02 1821445 /usr/bin/elfspe
10020000-10050000 rwxp 00000000 00:00 0       [heap]
f7f60000-f7f70000 rw-p 00000000 00:00 0
f7f70000-f7fb0000 rw-s 00000000 00:13 9086    /spu/spethread-2971-268566640/mem
f7fb0000-f7fc0000 rw-p 00000000 fe:02 1463963 /home/jonathan/AssemblyLanguagePrimer/SimpleDemoSPU
f7fc0000-f7fe0000 r-xp 00000000 fe:02 292430  /lib/ld-2.11.2.so
f7fe0000-f7ff0000 rw-p 00020000 fe:02 292430  /lib/ld-2.11.2.so
ffea0000-ffff0000 rw-p 00000000 00:00 0       [stack]

This is the information for the elfspe loader for the SPU program.

(The SPU’s local store is mapped into elfspe’s address space at 0xf7f70000.  This is with randomize_va_space set to zero, so it should always be in that location.  This is possibly useful…)

There is no equivalent of this for the SPU program itself as there is no virtual memory mapping required (or possible) within the local store.  The SPU’s memory may be examined externally through the spufs interface (in this case, the file /spu/spethread-2971-268566640/mem from the above listing may be used to access the current SPU LS state), or, of course, using gdb.

Previous assembly primer notes…

Part 1 — System Organization — PPC — SPU

Assembly Primer Part 1 — System Organization — SPU

The platform I’m using is Debian Sid on a PS3 (3.15 OtherOS) with the spu-gcc toolchain.

These are my notes for where I can see the SPU varying from the ia32, as presented in the video Part 1 — System Organization.  Let me know if I’ve missed something important, obvious or got something wrong.

For reference, I’m using the SPU ABI and ISA docs.

General Purpose Registers

  • 128 registers, each 128 bits wide, treated as different data types depending on the instruction used.
    • r0 (LR) — Return Address / Link Register
    • r1 (SP) — Stack pointer information.
      • Word 0 — current stack pointer (always 16-byte aligned, grows down)
      • Word 1 — bytes of available stack space
    • r2 — Environment pointer (for languages that use one)
    • r3–r74 — First 72 qwords of a function’s argument list and its return value
    • r75–r79 — Scratch registers
    • r80–r127 — Local variable registers.  Preserved across function calls.
  • FPSCR — Floating-Point Status and Control Register
  • Channels — Used for various DMA operations, access to the decrementer, mailboxes and signalling.
  • SRR0 — Used to store the address of next instruction upon interrupt
  • LSLR — Local Store Limit Register.  0x0003ffff == 2^18 - 1 == 262143

Memory model

  • .text at address 0
  • Bottom of stack at 0x3ffff, effectively earlier if using -mstdmain.  (at least, afaict — could look more closely at how -mstdmain actually works…)