jonathan

Getting into gamedev [aka Career Motivational Speaker]

Before departing Tasmania, I visited several high schools where I talked about getting into a career in gamedev. That was the premise, at least — I really talked more about what you can do to get a job doing what you like. Lots of high school students like games and I was getting ready to move for a gamedev job, so it was a good hook.

The opportunity came about from a conversation (with a high school student) about my upcoming move, what I was doing and how it had come about, and about what part of my experience was relevant to his own. The conversation took place while the Pathway Planning Officer for a local school was nearby, and she invited me to the school to talk to some of the students there.

The problem then became how to turn a spontaneous conversation into something sufficiently well-prepared and engaging that I could talk to a room of teenagers for up to an hour. I enjoy presenting to/speaking with groups, particularly on topics that I’m passionate about, but I have little experience talking to teenagers and was somewhat uncertain about what I’d need to do to get and keep their attention. I like to keep presentations interactive and flexible: I’d rather talk about what interests the listeners than about my own prepared material. For that reason, I don’t tend to use slides and try to be interesting, engaging and memorable all on my own. (There’s always a risk of leaving out something “important”, but as there’s always far more material than I can cover in a single presentation, if the audience has been interested it’s probably a nett win :P)

For all my desire to keep it free-flowing and interactive, if I give a talk without a clear idea of what I want to talk about and how it fits together in a coherent and plausible manner, I’m going to struggle to impart any useful information/knowledge to the students that have so generously taken time out from their Social Science class (or whatever). It’s hard to evoke passion without passion. I find it easier to convey my excitement and passion for something when I’m well prepared to talk about it.

I did some reading in preparation for the talk to make sure that I wouldn’t be talking nonsense. While I was about to start in the industry, I had not worked in the industry. While I didn’t think I had many incorrect preconceptions or invalid assumptions about the industry (who would?), my lack of experience was one thing that cropped up repeatedly through recent job applications. I thought it appropriate to do my best to make sure what I had to say would be generally useful.

I read what I could find, but a couple of sites stood out in particular: there’s a lot of great advice on tinysubversions.com, particularly the material on effective networking in the games industry. Linked from there, I found a list of New Year’s Resolutions for Game Industry Newbies (or people who want to eventually be one), which I basically ripped off to form the core of my presentation (many thanks to Chris Hecker and Jonathan Blow for the list).

Here’s an outline of what I talked about:

who I am

always good for the audience to know the name of the guy they’re listening to.
talk about my education and work history with emphasis on what are likely to be common points of reference — educated/live in local area, personal history back to the age of the audience
upcoming move — mention Insomniac and the games they’ve made, find out how many people in the room knew Insomniac IP (lots)
(made the point that my own education history is not being held up as any ideal for how to get into the industry — far from it)

why I like gamedev (or talk about the sort of gamedev role I aspire to…)
the diversity of careers available in gamedev

used this to kick off some interaction: ask the audience “What goes into making a game? What sort of jobs are there in gamedev?”
purpose was to emphasise diversity of opportunity. It’s not just programmers. (more on that later, though)

nature of the industry —

games are popular

high % of people play electronic games of one kind or another
lots of money spent on games

often unreliable working situation

recent history of gamedev studios in .au (and elsewhere) is not good

not many Australian gamedevs

estimates of <3,000 gamedevs in .au
contrast: >300,000 teachers in .au (not sure if it was a useful stat, but I like it :)

opportunity in smaller scale

low entry options to making games
no guarantees of success…
the indie life is not for everyone

invite questions
on to five points (taken from the New Year’s Resolutions post — see more there)

make things

build experience, build portfolio
good idea, regardless of specialisation or desired industry

play games

play for purpose of critique, understanding
what makes this game good? why do I hate this one? how could it be better?
tie back to point 1 — make things based on what you’ve played, remake, modify, extend

learn things

generally a good idea :)
learn things that will help get to your desired career — be selective
I spruiked the UTas Bachelor of Computing (Games Technology) degree as one option
more learning -> more understanding (hopefully). Helps with 1 and 2.
what you know matters

people

who you know matters
work with people locally with similar interests — opportunity now! Useful with 1, 2, 3
be active in the wider gamedev community e.g. follow gamedevs on twitter. Caveat: don’t be an annoying fanboi. Read, watch, learn, interact in a civil fashion.
being visible to people can help when applying for jobs

learn to program

presented as “optional”
useful skill no matter what — understand how computers work and how to bend them to your will

answer questions until time/questions run out

For all the game-related content in the presentation, it was presented to make clear that these things will work outside the gamedev industry, too — do things that will help get you a job doing what you want, here are some things that can help.

Prepare yourself — opportunities come along from time to time. While you typically can’t make them happen, you can encourage their arrival. Don’t expect you can get a job with no experience/training/portfolio/etc – rather, do what you can to be as ready as you can be for when opportunities arrive.

(Additional: I was interested to hear TJ Fixman talk about similar ideas when recounting his own gamedev career path in a recent Feedback episode)

Assembly Primer Part 7 — Working with Strings — ARM

These are my notes for where I can see ARM varying from IA32, as presented in the video Part 7 — Working with Strings.

I’ve not remotely attempted to implement anything approximating optimal string operations for this part — I’m just working my way through the examples and finding obvious mappings to the ARM arch (or, at least what seem to be obvious). When I do something particularly stupid, leave a comment and let me know :)

Working with Strings

.data
    HelloWorldString:
        .asciz "Hello World of Assembly!"
    H3110:
        .asciz "H3110"

.bss
    .lcomm Destination, 100
    .lcomm DestinationUsingRep, 100
    .lcomm DestinationUsingStos, 100

Here’s the storage that the provided example StringBasics.s uses. No changes are required to compile this for ARM.

1. Simple copying using movsb, movsw, movsl

    @movl $HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString

    @movl $Destination, %edi
    movw r1, #:lower16:Destination
    movt r1, #:upper16:Destination

    @movsb
    ldrb r2, [r0], #1
    strb r2, [r1], #1

    @movsw
    ldrh r3, [r0], #2
    strh r3, [r1], #2

    @movsl
    ldr r4, [r0], #4
    str r4, [r1], #4

More visible complexity than IA32, but not too bad overall.

IA32’s movs instructions implicitly take their source and destination addresses from %esi and %edi, and increment/decrement both. Because of ARM’s load/store architecture, separate load and store instructions are required in each case, but there is support for indexing of these registers:

ARM addressing modes

According to ARM A8.5, memory access instructions commonly support three addressing modes:

Offset addressing — An offset is applied to an address from a base register and the result is used to perform the memory access. It’s the form of addressing I’ve used in previous parts and looks like [rN, offset]
Pre-indexed addressing — An offset is applied to an address from a base register, the result is used to perform the memory access and also written back into the base register. It looks like [rN, offset]!
Post-indexed addressing — An address is used as-is from a base register for memory access. The offset is applied and the result is stored back to the base register. It looks like [rN], offset and is what I’ve used in the example above.
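To make the difference between the three modes concrete, here is a small Python model of what happens to the base register in each case. It is only a sketch: the function names and the address 0x1000 are made up for illustration, and it models nothing beyond the address arithmetic.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide

def offset(regs, base, off):
    """[rN, #off]: access base + off; the base register is left alone."""
    return (regs[base] + off) & MASK32

def pre_indexed(regs, base, off):
    """[rN, #off]!: access base + off and write that address back to rN."""
    regs[base] = (regs[base] + off) & MASK32
    return regs[base]

def post_indexed(regs, base, off):
    """[rN], #off: access the unmodified base, then add off to rN."""
    addr = regs[base]
    regs[base] = (addr + off) & MASK32
    return addr

regs = {"r0": 0x1000}  # arbitrary illustrative address
for mode in (offset, pre_indexed, post_indexed):
    addr = mode(regs, "r0", 4)
    # Print the address used for the access and r0 afterwards.
    print(mode.__name__, hex(addr), hex(regs["r0"]))
```

The post-indexed form is the one that lines up with movsb/movsw/movsl: use the pointer, then step it by the size of the access.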

2. Setting / Clearing the DF flag

ARM doesn’t have a DF flag (to the best of my understanding). It could perhaps be simulated through the use of two instructions and conditional execution to select the right direction. I’ll look further into conditional execution of instructions on ARM in a later post.

3. Using Rep

ARM also doesn’t appear to have anything quite like IA32’s rep prefix. A conditional branch and a decrement will be the long-form equivalent. As branches are part of a later section, I’ll skip them for now.

    @movl $HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString

    @movl $DestinationUsingRep, %edi
    movw r1, #:lower16:DestinationUsingRep
    movt r1, #:upper16:DestinationUsingRep

    @movl $25, %ecx # set the string length in ECX
    @cld # clear the DF
    @rep movsb
    @std

    ldm r0!, {r2,r3,r4,r5,r6,r7}
    ldrb r8, [r0,#0]
    stm r1!, {r2,r3,r4,r5,r6,r7}
    strb r8, [r1,#0]

To avoid conditional branches, I’ll start with the assumption that the string length is known (25 bytes). One approach would be using multiple load instructions, but the load multiple (ldm) instruction makes it somewhat easier for us — one instruction to fetch 24 bytes, and a load register byte (ldrb) for the last one. Using the ! after the source-address register indicates that it should be updated with the address of the next byte after those that have been read.

The storing of the data back to memory is done analogously. Store multiple (stm) writes 6 registers×4 bytes = 24 bytes (with the ! to have the destination address updated). The final byte is written using strb.
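As a sanity check on the byte counting, the same copy can be modelled in Python. This is a sketch under a few assumptions: memory is a flat bytearray, the helper names are invented, and the source and destination offsets (0 and 32) are arbitrary.

```python
import struct

def ldm_writeback(mem, addr, count):
    """Model ldm rN!, {...}: read count words, return them and the updated address."""
    words = list(struct.unpack_from("<%dI" % count, mem, addr))
    return words, addr + 4 * count

def stm_writeback(mem, addr, words):
    """Model stm rN!, {...}: write the words, return the updated address."""
    struct.pack_into("<%dI" % len(words), mem, addr, *words)
    return addr + 4 * len(words)

def copy25(mem, src, dst):
    words, src = ldm_writeback(mem, src, 6)  # ldm r0!, {r2-r7}: 24 bytes
    last = mem[src]                          # ldrb r8, [r0, #0]: byte 25
    dst = stm_writeback(mem, dst, words)     # stm r1!, {r2-r7}
    mem[dst] = last                          # strb r8, [r1, #0]

text = b"Hello World of Assembly!\0"         # 24 characters plus the .asciz NUL
mem = bytearray(64)
mem[0:len(text)] = text
copy25(mem, 0, 32)
print(bytes(mem[32:32 + len(text)]))
```

The writeback on both ldm and stm is what leaves each pointer sitting on the 25th byte, ready for the single-byte load and store.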

4. Loading string from memory into EAX register

    @cld
    @leal HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString

    @lodsb
    ldrb r1, [r0, #0]

    @movb $0, %al
    mov r1, #0

    @dec %esi  @ unneeded. equiv: sub r0, r0, #1
    @lodsw
    ldrh r1, [r0, #0]

    @movw $0, %ax
    mov r1, #0

    @subl $2, %esi # Make ESI point back to the original string. unneeded. equiv: sub r0, r0, #2
    @lodsl
    ldr r1, [r0, #0]

In this section, we are shown how the IA32 lodsb, lodsw and lodsl instructions work. Again, they have implicitly assigned register usage, which isn’t how ARM operates.

So, instead of a simple, no-operand instruction like lodsb, we have a ldrb r1, [r0, #0] loading a byte from the address in r0 into r1. Because I didn’t use post indexed addressing, there’s no need to dec or subl the address after the load. If I were to do so, it could look like this:

    ldrb r1, [r0], #1
    sub r0, r0, #1

    ldrh r1, [r0], #2
    sub r0, r0, #2

    ldr r1, [r0], #4

If you trace through it in gdb, look at how the value in r0 changes after each instruction.

5. Storing strings from EAX to memory

    @leal DestinationUsingStos, %edi
    movw r0, #:lower16:DestinationUsingStos
    movt r0, #:upper16:DestinationUsingStos

    @stosb
    strb r1, [r0], #1
    @stosw
    strh r1, [r0], #2
    @stosl
    str r1, [r0], #4

Same kind of thing as for the loads. Writes the letters in r1 (being “Hell” — leftovers from the previous section) into DestinationUsingStos (the result being “HHeHell”). String processing on little endian architectures has its appeal.
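The “HHeHell” result falls straight out of the byte order, which a few lines of Python can confirm. This is a sketch only: the helper names are made up, and memory is modelled as a bytearray starting at offset 0.

```python
def store_post_indexed(mem, addr, value, size):
    """Write the low `size` bytes of value, little endian, and return addr + size."""
    low = value & ((1 << (8 * size)) - 1)
    mem[addr:addr + size] = low.to_bytes(size, "little")
    return addr + size

def stos_demo():
    r1 = int.from_bytes(b"Hell", "little")       # what ldr r1, [r0] left behind
    mem = bytearray(16)
    addr = 0
    addr = store_post_indexed(mem, addr, r1, 1)  # strb r1, [r0], #1
    addr = store_post_indexed(mem, addr, r1, 2)  # strh r1, [r0], #2
    addr = store_post_indexed(mem, addr, r1, 4)  # str  r1, [r0], #4
    return bytes(mem[:addr])

print(stos_demo())
```

Because the first character of the string sits in the least significant byte of r1, the byte and halfword stores pick off the start of the string rather than the end.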

6. Comparing Strings

    @cld
    @leal HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString
    @leal H3110, %edi
    movw r1, #:lower16:H3110
    movt r1, #:upper16:H3110

    @cmpsb
    ldrb r2, [r0,#0]
    ldrb r3, [r1,#0]
    cmp r2, r3

    @dec %esi
    @dec %edi
    @not needed because of the addressing mode used

    @cmpsw
    ldrh r2, [r0,#0]
    ldrh r3, [r1,#0]
    cmp r2, r3

    @subl $2, %esi
    @subl $2, %edi
    @not needed because of the addressing mode used
    @cmpsl
    ldr r2, [r0,#0]
    ldr r3, [r1,#0]
    cmp r2, r3

Where IA32’s cmps instructions implicitly load through the pointers in %edi and %esi, explicit loads are needed for ARM. The compare then works in pretty much the same way as for IA32, setting condition code flags in the current program status register (cpsr). If you run the above code, and check the status registers before and after execution of the cmp instructions, you’ll see the zero flag set and unset in the same way as is demonstrated in the video.

The condition code flags are:

bit 31 — negative (N)
bit 30 — zero (Z)
bit 29 — carry (C)
bit 28 — overflow (V)

There are other flags in that register. All the details are on pages B1-16 and B1-17 of the ARM Architecture Reference Manual.
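For reference, this is roughly how those four flags come out of a cmp, modelled in Python. It is a sketch rather than anything taken from the manual: the function names are invented, and it only covers the subtract-and-set-flags behaviour of cmp on 32bit values.

```python
MASK32 = 0xFFFFFFFF

def cmp_flags(a, b):
    """Flags produced by `cmp a, b`, which computes a - b and discards the result."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31                     # negative: top bit of the result
    z = int(result == 0)                 # zero: operands were equal
    c = int(a >= b)                      # carry: set when there is no borrow
    v = ((a ^ b) & (a ^ result)) >> 31   # overflow: signed result out of range
    return {"N": n, "Z": z, "C": c, "V": v}

def cpsr_top_bits(flags):
    """Place the flags at bits 31..28, as listed above."""
    return (flags["N"] << 31) | (flags["Z"] << 30) | (flags["C"] << 29) | (flags["V"] << 28)

# First bytes of both strings are 'H', so the byte compare sets Z.
print(cmp_flags(ord("H"), ord("H")))
# "He" versus "H3" as little endian halfwords: Z is clear.
print(cmp_flags(int.from_bytes(b"He", "little"), int.from_bytes(b"H3", "little")))
```

Note that ARM's carry after a subtract is the inverse of IA32's: it is set when no borrow was needed.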

And with that, I think we’ve made it (finally) to the end of this part for ARM.

{1,2,3,4}

(This is wonderfully obtuse, but amused me :)

Neil Henning (@sheredom) asked:

SPU gurus of twitter unite, want a vector unsigned int with {1, 2, 3, 4} in each slot, without putting it in as elf constant, any ideas?

Interesting question. The SPU ISA generally doesn’t help build vectors with different values in each slot. In this case, there are only very small values required in each register, so it can be done with a neat little trick.

My answer:

    fsmbi r4, 0x7310  # r4 = {0x00ffffff, 0x0000ffff, 0x000000ff, 0x00000000}
    clz r5, r4        # r5 = {8,16,24,32}
    rotmi r6, r5, -3  # r6 = {1,2,3,4}

Instructions are:

fsmbi — form select mask byte immediate. Creates a 128 bit mask from a 16 bit value, expanding each bit of input to 8 bits of output.
clz — count leading zeroes. Counts the number of leading zeros in each word.
rotmi — rotate and mask word immediate (logical shift right by negative immediate). Shifts each word right by the negation of number of bits specified.

This solution is entirely self contained, requiring no pre-set state (unlike my first attempt utilising the cbd instruction). In terms of raw instruction size, it’s a whole eight bytes smaller than storing the vector in memory and loading it when needed (12 bytes for three instructions versus 16+4 bytes), and a little slower than using a load instruction.

(On a cursory re-examination of the SPU ISA, fsmbi is the only instruction that will construct a different value in each word of a register. A specific pattern may be generated with cbd/cbx that can be used for this problem, but it depends on the contents of another register which limits its already limited usefulness. Combining fsmbi with other immediate instructions allows for a wide range of values to be constructed independent of register state and without access to storage)
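For anyone without an SPU handy, the three instructions can be traced with a rough Python model. This is a sketch, not an emulator: each function handles only the four 32bit word slots, and rotmi is assumed to be called with a shift of fewer than 32 bits.

```python
def fsmbi(imm16):
    """Expand each bit of a 16 bit immediate to a byte of 0x00 or 0xff (MSB first)."""
    words = []
    for w in range(4):
        word = 0
        for b in range(4):
            bit = (imm16 >> (15 - (4 * w + b))) & 1
            word = (word << 8) | (0xFF if bit else 0x00)
        words.append(word)
    return words

def clz(words):
    """Count leading zeros in each 32bit word."""
    return [32 - w.bit_length() for w in words]

def rotmi(words, imm):
    """Logical shift right of each word by -imm bits (this sketch assumes 0 <= -imm < 32)."""
    shift = -imm
    return [w >> shift for w in words]

r4 = fsmbi(0x7310)
r5 = clz(r4)
r6 = rotmi(r5, -3)
print([hex(w) for w in r4], r5, r6)
```

The trick is that the mask widths are multiples of eight, so the leading-zero counts are too, and a shift by three divides them down to 1, 2, 3, 4.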

TTYtter for the N900

A quick documenting of how I got TTYtter running on the N900/Maemo5.

0. Missing curl

TTYtter requires curl for OAuth, but curl isn’t packaged in the maemo5 repositories (libcurl is — which is frustrating. The particular reason for the frustration will be made clear later…)

That being the case, let’s build curl! I grabbed the sources for the version of curl that matched installed libcurl from the relevant source package page on maemo.org, unpacked the tarball and patch -p1’d the gunzipped patch.

1. What didn’t work

The first half-hearted attempt was to build curl using the cross toolchain I have installed on my gentoo desktop (built with crossdev -t arm-linux-gnueabi). I had little hope that this would just work, and a quick ./configure --host=arm-linux-gnueabi --prefix=/home/user/local && make && make install && scp -r /home/user/local n900: (or something like it) later, it didn’t. The foremost hurdle was that maemo5 uses an antiquated glibc-2.5 (2006, yeah!), and my toolchain uses (and thus generates programs that expect) glibc-2.11.3.

Persisting with my all-too-modern toolchain seemed likely to be a whole lot of effort — I decided to go with what appeared to be the Official method — the probability of success seemed marginally higher.

2. What worked

I installed scratchbox and built it there.

i. Installing scratchbox

I first found this MaemoOnGentoo outline which got me started. Rather than the emerge command listed on that page, I ended up needing something like:

emerge scratchbox scratchbox-devkit-debian scratchbox-devkit-perl \
scratchbox-devkit-cputransp scratchbox-devkit-doctools \
scratchbox-toolchain-cs2007q3-glibc2_5 scratchbox-devkit-qemu \
scratchbox-devkit-git scratchbox-devkit-svn

As per that page, I needed to re-emerge xorg-server with the kdrive USE flag to build xephyr.

Started scratchbox with /etc/init.d/scratchbox start

From that point on, the Manual Installation instructions for the SDK from maemo.org generally worked — I added a user with /scratchbox/sbin/sbox_adduser, added my user account to the sbox group. (Actually, not really knowing what I was doing, after doing that, I ran the maemo-sdk-install_5.0.sh script, which seemed to do the right thing)

I needed to manually install the Nokia binaries/apps as per the Manual Installation instructions.

That done, I was able to start the SDK UI inside a xephyr window. i.e. Xephyr :2 -host-cursor -screen 800x480x16 -dpi 96 -ac and (inside a scratchbox prompt) DISPLAY=:2 af-sb-init.sh start

(Having the UI running is the Hello, world! ‘proof’ of functionality — it may not count for much, but it’s nice to see)

ii. Building it there

Once there’s a functional scratchbox environment, the next thing to do is to build the package.

I naively followed the relevant parts of the example from the Packaging guide on maemo.org.

Taking the source (as mentioned before — de-tarballed sources with patch applied) it became apparent that the necessary configuration was already in place to build the desired .deb (so much of the guide was unneeded for this task). In fact, from what I recall, the only command from that guide that was necessary was dpkg-buildpackage -sa -rfakeroot -k<my email address> (run using the FREMANTLE_ARMEL tool config)

End result: a bunch of files, including curl_7.18.2-8maemo6+0m5_armel.deb — the frustration mentioned earlier was that the config exists to build this, and that packaging curl for maemo5 would have been approximately zero extra effort.

(Nothing is ever actually zero extra effort. I know this.)

scp curl_7.18.2-8maemo6+0m5_armel.deb n900:, and install with dpkg --install curl_7.18.2-8maemo6+0m5_armel.deb and TTYtter gets the curl.

3. The final bit

TTYtter starts, but it’s not quite working yet. Maemo5 has a prehistoric perl-5.8.3 (2004, woo!) which appears to lack the kind of UTF8 support that TTYtter wants.

To work around this, start TTYtter with the -seven option.

4. Too long; don’t care

The package is here: curl_7.18.2-8maemo6+0m5_armel.deb
(The original source is here with it)

As root, install the package (dpkg -i curl_7.18.2-8maemo6+0m5_armel.deb) and then (as the regular user) grab and run ttytter -seven

TTYtter is by far the best Twitter client I’ve used on this phone — not least because it works.

Smokey Beef Chili with Guinness

This is a recipe I obtained via twitter. I didn’t make a note of the source, and it was shared as text in an image which I printed. I am now unable to locate the original.

I’m re-posting the original text here (none of the comments within are mine) for the people that have asked me about it, with thanks to whoever was responsible. It was enjoyed by my whole family :D

Smokey Beef Chili with Guinness

500 grams of gravy beef
100 grams of streaky bacon
3 celery stalks
2 red chilies
2 red onions
2 green capsicums
8 garlic cloves
1 can of diced tomatoes
2 cans of red kidney beans
150g of tomato paste
500ml beef stock
Dried oregano
Smoked paprika
Ground cumin
Tabasco sauce
Sugar to taste (usually between 1 and 4 teaspoons)
200ml of Guinness
(Optional) 1 can of smoked chipotle peppers

Roughly dice onion, chilies, capsicum and bacon. Finely dice garlic and celery. Add to hot pot with good slug of olive oil and cook until onion becomes semi-transparent.

Meanwhile, cut gravy beef (or chuck steak, or skirt steak) into large chunks. Whack in a food processor and pulse until half the steak has disintegrated and half has been carved up into various random shapes of random size.

Once onion is browned (important) add big teaspoon of smoked paprika, flat teaspoon of cumin, 2 teaspoons of dried oregano and solid few shakes of Tabasco sauce. Cook and stir for a few minutes until mixture becomes coloured from the spices cooking through it.

Add meat and cook until brown.

Add tomato paste and diced chipotle peppers and cook out until it just starts to caramelise on the walls of the pot. Add the adobo sauce from the chipotle can to taste now as well (warning – hot!)

Add can of diced tomatoes (drained) and kidney beans (drain and rinse well first). Cook for a few minutes.

Add Guinness. Cook until most of the beer has evaporated.

Add beef stock. Bring to boil. Add sugar until the mix in the pot is slightly less sweet than you want it to be (the sauce will reduce and sweetness will increase at the end). Alternatively, add the sugar when you add the tomato paste (it will add a little to the caramelisation of the paste and add a bit of extra flavour – best for second time you cook it, so you know how much you need)

Taste the mix after it starts to boil. Add a teaspoon of dried oregano and a little extra smoked paprika if the mix isn’t as smokey in flavour as you’d like. Add tabasco sauce for extra heat.

Simmer uncovered for 1.5 hours and add the lid when the mix is just a bit wetter than you want. (e.g. you want it wetter for serving with rice than you do for tacos or nachos). Let stand for at least 1/2 hour with heat off and lid on before serving (you can give it a reheat before serving if needed and add a little water to the simmering if it starts to dry out.)

Some folks stir fresh oregano and diced chili through before serving. It doesn’t float my boat, but it may well yours.

Assembly Primer Part 6 — Moving Data — ARM

My notes for where ARM differs from IA32 in the Assembly Primer video Part 6 — Moving Data.

(There is no separate part 5 post for ARM — apart from the instructions, it’s identical to IA32. There’s even support for the .bss section, unlike SPU and PPC)

Moving Data

We’ll look at MovDemo.s for ARM. First, the storage:

.data

    HelloWorld:
        .ascii "Hello World!"

    ByteLocation:
        .byte 10

    Int32:
        .int 2
    Int16:
        .short 3
    Float:
        .float 10.23

    IntegerArray:
        .int 10,20,30,40,50

It’s the same as for IA32, PPC and SPU. Like the first two, ARM will cope with the unnatural alignment.

1. Immediate value to register

.text
.globl _start
_start:
    @movl $10, %eax

    mov r0, #10

Move the value 10 into register r0.

Something to note: the ARM assembly syntax has some slight differences. Where others use # to mark the start of a comment, ARM has @ (although # works at the start of a line). Literal values are prefixed with #, which confuses the default syntax highlighting in vim.

2. Immediate value to memory

    @movw $50, Int16

    mov r1, #50
    movw r0, #:lower16:Int16
    movt r0, #:upper16:Int16
    strh r1, [r0, #0]

We need to load the immediate value in a register (r1), the address in a register (r0) and then perform the write. To quote the Architecture Reference Manual:

The ARM architecture … incorporates … a load/store architecture, where data processing operations only operate on register contents, not directly on memory contents.

which is like PPC and SPU, and unlike IA32 — and so we’ll see similarly verbose alternatives to the IA32 examples from the video.

I’m using a movw, movt sequence to load the address, rather than ldr (as mentioned in the previous installment).

strh is, in this case, Store Register Halfword (immediate) — writes the value in r1 to the address computed from the sum of the contents of r0 and the immediate value of 0.

3. Register to register

    @movl %eax, %ebx

    mov r1,r0

mov (Move) copies the value from r0 to r1.

4. Memory to register

    @movl Int32, %eax

    movw r0, #:lower16:Int32
    movt r0, #:upper16:Int32
    ldr r1, [r0, #0]

Load the address into r0, load from the address r0+0. Here ldr is Load Register (immediate).

5. Register to memory

    @movb $3, %al
    @movb %al, ByteLocation

    mov r0, #3
    movw r1, #:lower16:ByteLocation
    movt r1, #:upper16:ByteLocation
    strb r0, [r1, #0]

Once again the same kind of thing — load 3 into r0, the address of ByteLocation into r1, perform the store.

6. Register to indexed memory location

    @movl $0, %ecx
    @movl $2, %edi
    @movl $22, IntegerArray(%ecx, %edi, 4)

    movw r0, #:lower16:IntegerArray
    movt r0, #:upper16:IntegerArray
    mov r1, #2
    mov r2, #22
    str r2, [r0, r1, lsl #2]

A little more interesting — here str is Store Register (register) which accepts two registers and an optional shift operation and amount. Here lsl is logical shift left, effectively multiplying r1 by 4 — the size of the array elements.

(GCC puts asl here. Presumably identical to logical shift left, but there’s no mention of asl in the Architecture Reference Manual. Update: ASL is referenced in the list of errors here as an obsolete name for LSL)

Two source registers and a shift is still shy of IA32’s support for calculating an address from a base address, two registers and a multiply.
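The address arithmetic behind str r2, [r0, r1, lsl #2] can be checked with a short Python sketch. The names are invented for illustration, and the array is modelled as a bytearray whose base address is simply offset 0.

```python
import struct

def scaled_address(base, index, shift):
    """Address used by [rBase, rIndex, lsl #shift]: base + (index << shift)."""
    return (base + (index << shift)) & 0xFFFFFFFF

def store_demo():
    # IntegerArray: .int 10,20,30,40,50 as little endian 32bit words
    array = bytearray(struct.pack("<5I", 10, 20, 30, 40, 50))
    addr = scaled_address(0, 2, 2)           # r0 = array base, r1 = 2, lsl #2
    struct.pack_into("<I", array, addr, 22)  # str r2, [r0, r1, lsl #2] with r2 = 22
    return list(struct.unpack("<5I", array))

print(store_demo())
```

Shifting the index left by two is the same as multiplying by the four-byte element size, so index 2 lands on the third element.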

7. Indirect addressing

    @movl $Int32, %eax
    @movl (%eax), %ebx

    movw r0, #:lower16:Int32
    movt r0, #:upper16:Int32
    ldr r1, [r0, #0]

    @movl $9, (%eax)

    mov r2, #9
    str r2, [r0, #0]

More of the same.

Concluding thoughts

In addition to the cases above, ARM has a number of other interesting addressing modes that I shall consider in more detail in the future — logical operations, auto-{increment, decrement} and multiples. Combined with conditional execution, there are some very interesting possibilities.

Assembly Primer Part 4 — Hello World — ARM

On to Assembly Primer — Part 4. This is where we start writing a small assembly program for the platform. In this case, I don’t know the language and I don’t know the ABI. Learning these from scratch ranges from interesting to tedious :)

Regarding the language (available instructions, mnemonics and assembly syntax): I’m using the ARM Architecture Reference Manual as my reference for the architecture (odd, I know). It’s very long and the documentation for each instruction is extensive — which is good because there are a lot of instructions, and many of them do a lot of things at once.

Regarding the ABI (particularly things like argument passing, return values and system calls): there’s the Procedure Call Standard for the ARM Architecture, and there are a few other references I’ve found, such as the Debian ARM EABI Port wiki page.

“EABI is the new “Embedded” ABI by ARM ltd. EABI is actually a family of ABI’s and one of the “subABIs” is GNU EABI, for Linux.”

– from Debian ARM EABI Port

System Calls

To perform a system call using the GNU EABI:

put the system call number in r7
put the arguments in r0-r6 (64bit arguments must be aligned to an even numbered register i.e. in r0+r1, r2+r3, or r4+r5)
issue the Supervisor Call instruction with a zero operand — svc #0

(Supervisor Call was previously named Software Interrupt — swi)
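The register layout described above can be written down as a small Python sketch. Everything here is illustrative: U64 and syscall_registers are hypothetical helpers, the buffer address 0x8000 and the syscall number 0 in the last line are placeholders, and putting the low word of a 64bit argument in the lower-numbered register is my assumption for a little endian system. The numbers 1 (exit) and 4 (write) are the ones used in the listings in this post.

```python
class U64(int):
    """Marks an argument as 64 bits wide (hypothetical helper for this sketch)."""

def syscall_registers(number, *args):
    """Return the register contents to set up before issuing svc #0."""
    regs = {"r7": number}
    n = 0
    for arg in args:
        if isinstance(arg, U64):
            n += n % 2                                # skip to an even register
            regs["r%d" % n] = arg & 0xFFFFFFFF        # low word (assumed order)
            regs["r%d" % (n + 1)] = (arg >> 32) & 0xFFFFFFFF
            n += 2
        else:
            regs["r%d" % n] = arg & 0xFFFFFFFF
            n += 1
    if n > 7:
        raise ValueError("arguments do not fit in r0-r6")
    return regs

print(syscall_registers(1, 0))                        # exit(0)
print(syscall_registers(4, 1, 0x8000, 12))            # write(1, buffer, 12)
print(syscall_registers(0, 3, U64(0x100000002)))      # 64bit argument skips r1
```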

Just Exit

Based on the above, it’s not difficult to reimplement JustExit.s (original) for ARM.

.text

.globl _start

_start:
        mov r7, #1
        mov r0, #0
        svc #0

mov here is Move (Immediate) which puts the #-prefixed literal into the named register.

Hello World

Likewise, the conversion of HelloWorldProgram.s (original) is not difficult:

.data 

HelloWorldString:
      .ascii "Hello World\n"

.text 

.globl _start 

_start:
      @ Load all the arguments for write ()

      mov r7, #4
      mov r0, #1
      ldr r1,=HelloWorldString
      mov r2, #12
      svc #0

      @ Need to exit the program

      mov r7, #1
      mov r0, #0
      svc #0

This includes the load register pseudo-instruction, ldr: the assembler stores the address of HelloWorldString in the literal pool, a portion of memory located in the program text, and the 32bit address is loaded from the literal pool (more details).

When compiling a similar C program with -mcpu=cortex-a8, I notice that the compiler generates Move (immediate) and Move Top — movw and movt — instructions to load the address directly from the instruction stream, which is presumably more efficient on that architecture.
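How the movw/movt pair assembles a 32bit address from two 16bit immediates can be sketched in Python. The function names mirror the instructions but the code is only a model, and the address 0x00010094 is an arbitrary example rather than where HelloWorldString actually ends up.

```python
def movw(imm16):
    """movw rd, #imm16: write the low halfword and zero the top halfword."""
    return imm16 & 0xFFFF

def movt(rd, imm16):
    """movt rd, #imm16: write the top halfword, leave the low halfword alone."""
    return (rd & 0xFFFF) | ((imm16 & 0xFFFF) << 16)

def load_address(addr):
    lower16 = addr & 0xFFFF           # what #:lower16:symbol resolves to
    upper16 = (addr >> 16) & 0xFFFF   # what #:upper16:symbol resolves to
    r1 = movw(lower16)
    r1 = movt(r1, upper16)
    return r1

print(hex(load_address(0x00010094)))
```

The order matters: movw clears the top half of the register, so it has to come before movt.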

Assembly Primer Parts 1, 2 and 3 — ARM

I had started a series of posts on assembly programming for the Cell BE PPU and SPU, based on the assembly primer video series from securitytube.net. I have recently acquired a Nokia N900, and so thought I might take the opportunity to continue the series with a look at the ARM processor as well.

Wikipedia lists the N900’s processor as a Texas Instruments OMAP3430, 600MHz ARMv7 Cortex-A8. I’m not at all familiar with the processor family, so I’ll be attempting to find out what all of this means as I go :P

I’ve set up a cross compiler on my desktop machine using Gentoo’s neat crossdev tool (built using crossdev -t arm-linux-gnueabi). The toolchain builds a functional Hello, World!

(I note that scratchbox appears to be the standard tool/environment used to build apps for Maemo — I may take a closer look at that at a later date)

I have whatever the latest public ‘stable’ Maemo 5 release is on the N900 (PR 1.3, I think), with an apt-get install openssh gdb — thus far, enough to “debug” a functional Hello, World!

What follows are some details of the Cortex-A8 architecture present in the N900, particularly in how it differs from IA32, as presented in the videos Part 1 — System Organisation, Part 2 — Virtual Memory Organization and Part 3 — GDB Usage Primer. I’ve packed them all into this post because gdb usage and Linux system usage are largely the same on ARM as they are on PPC and IA32.

Most of the following information comes from the ARM Architecture Reference Manual.

(The number of possible configurations of ARM hardware makes it interesting at times to work out exactly which features are present in my particular processor. From what I can tell, the N900’s Cortex-A8 is ARMv7-A and includes VFPv3 half, single and double precision float support, and NEON (aka Advanced SIMD). I expect I’ll find out more when I actually start to try and program the thing. As to which gcc -march, -mcpu or -mfpu options are most correct for the N900 — I have no idea.)

1. Registers

Integer

There are sixteen 32bit ARM core registers, R0 to R15, where R0–R12 are for general use. R13 contains the stack pointer (SP), R14 is the link register (LR), and R15 is the program counter (PC).

The current program status register (CPSR) contains various status and control bits.

VFPv3 (Floating point) & NEON (Advanced SIMD)

There are thirty-two doubleword (64bit) registers that can be referenced in a number of ways.

NEON instructions can access these as thirty two doubleword registers (D0–D31) or as sixteen quadword registers (Q0–Q15), able to be used interchangeably.

VFP instructions can view the same registers as 32 doubleword registers (again, D0–D31) or as 32 single word registers (S0–S31). The single word view is packed into the first 16 doubleword registers.

Something like this pic: [image: overlapping S, D and Q views of the extension registers]
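The overlap between the three views can also be written down as index arithmetic. What follows is a minimal sketch of my understanding of the mapping in the ARM Architecture Reference Manual (S<2n> and S<2n+1> are the low and high words of D<n>; D<2n> and D<2n+1> are the low and high halves of Q<n>). The function names are my own, made up for illustration.

```python
# Sketch of how the S, D and Q register views alias one another.
# Function names are hypothetical; only the index arithmetic matters.

def s_to_d(s):
    """Map single word register S<s> to (D index, half); half 0 is the low word."""
    if not 0 <= s <= 31:
        raise ValueError("single word registers are S0-S31")
    return s // 2, s % 2


def d_to_q(d):
    """Map doubleword register D<d> to (Q index, half); half 0 is the low doubleword."""
    if not 0 <= d <= 31:
        raise ValueError("doubleword registers are D0-D31")
    return d // 2, d % 2


def q_to_d(q):
    """Return the pair of doubleword registers (low, high) that make up Q<q>."""
    if not 0 <= q <= 15:
        raise ValueError("quadword registers are Q0-Q15")
    return 2 * q, 2 * q + 1


if __name__ == "__main__":
    for s in (0, 1, 31):
        d, half = s_to_d(s)
        print("S%d is the %s word of D%d" % (s, ("low", "high")[half], d))
    for q in (0, 15):
        low, high = q_to_d(q)
        print("Q%d is D%d (low) and D%d (high)" % (q, low, high))
```

Note that S31 lands in D15, which is why the single word view only reaches the first sixteen doubleword registers.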

VFP in this core (apparently) supports single and double precision floating point data types and arithmetic, as well as half precision (possibly in two different formats…).

NEON instructions support accessing values in extension registers as

8, 16, 32 or 64bit integer, signed or unsigned,
16 or 32bit floating point values, and
8 or 16bit polynomial values.
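To put numbers on that list: the number of elements (lanes) that fit in an extension register is just the register width divided by the element width. A small sketch, with the table and function names being my own invention:

```python
# Lanes per NEON register for the element widths listed above.
# ELEMENT_BITS and lanes() are hypothetical names for illustration.

REGISTER_BITS = {"D": 64, "Q": 128}

ELEMENT_BITS = {
    "integer": (8, 16, 32, 64),
    "float": (16, 32),
    "polynomial": (8, 16),
}


def lanes(register, element_bits):
    """Return how many elements of the given width fit in a D or Q register."""
    return REGISTER_BITS[register] // element_bits


if __name__ == "__main__":
    for kind, widths in ELEMENT_BITS.items():
        for bits in widths:
            print("%-10s %2d-bit: %2d lanes per D, %2d lanes per Q"
                  % (kind, bits, lanes("D", bits), lanes("Q", bits)))
```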

There’s also a floating point status and control register (FPSCR).

2. Virtual Memory Organisation

On this platform, program text appears to be loaded at 0x8000.

After an echo 0 > /proc/sys/kernel/randomize_va_space, the top of the stack appears to be 0xbf000000.
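One way to check these two addresses is to look at the process's entry in /proc/<pid>/maps. The sketch below parses lines in that format and pulls out the start of the first executable mapping and the end of the [stack] mapping. The sample text is made up to match the two addresses above (the sizes, inode and path are invented), not captured from the N900.

```python
# Sketch: find the text start and the top of the stack from /proc/<pid>/maps
# style text. SAMPLE_MAPS is an invented example, not real N900 output.

SAMPLE_MAPS = """\
00008000-00009000 r-xp 00000000 b3:02 1234 /home/user/hello
00010000-00011000 rw-p 00000000 b3:02 1234 /home/user/hello
befeb000-bf000000 rw-p 00000000 00:00 0 [stack]
"""


def parse_maps_line(line):
    """Split one maps line into (start, end, permissions, path)."""
    fields = line.split(None, 5)
    start_text, end_text = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return int(start_text, 16), int(end_text, 16), fields[1], path


def text_start(maps_text):
    """Return the start address of the first executable mapping, or None."""
    for line in maps_text.splitlines():
        start, _end, perms, _path = parse_maps_line(line)
        if "x" in perms:
            return start
    return None


def stack_top(maps_text):
    """Return the end address of the [stack] mapping, or None."""
    for line in maps_text.splitlines():
        _start, end, _perms, path = parse_maps_line(line)
        if path == "[stack]":
            return end
    return None


if __name__ == "__main__":
    print("text starts at 0x%x" % text_start(SAMPLE_MAPS))
    print("top of stack is 0x%x" % stack_top(SAMPLE_MAPS))
```

On a Linux system, passing the contents of /proc/self/maps instead of the sample should report the values for the running Python process itself.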

3. SimpleDemo

Compared to the video, there are only a couple of small differences when running SimpleDemo in gdb on ARM.

Obviously, the disassembly is not the same as for IA32. Rather than the call instructions noted in the video, you’ll see bl (Branch with Link) for the various functions called.

Where the return address is stored on the stack for IA32, the link register (lr in info registers output) stores the return address for the current function, although lr will be pushed to the stack before another function is called.
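The reason lr has to be pushed before a nested call can be seen in a toy model: each bl overwrites lr, so a function that calls another must save its own return address first. This is only a sketch of the idea, not how gdb or the hardware represents anything, and the class, method names and addresses are all hypothetical.

```python
# Toy model of bl / lr behaviour. All addresses are invented for illustration.

class ToyCpu:
    def __init__(self):
        self.pc = 0
        self.lr = None
        self.stack = []

    def bl(self, target, return_addr):
        """Branch with Link: remember the return address in lr, then jump."""
        self.lr = return_addr
        self.pc = target

    def push_lr(self):
        """Save lr on the stack so a nested bl cannot destroy it."""
        self.stack.append(self.lr)

    def bx_lr(self):
        """Return from a leaf function by jumping to lr."""
        self.pc = self.lr

    def pop_pc(self):
        """Return from a non-leaf function by popping the saved lr into pc."""
        self.pc = self.stack.pop()


def demo():
    cpu = ToyCpu()
    trace = []
    cpu.bl(target=0x8500, return_addr=0x8414)  # main calls outer
    trace.append(("in outer", cpu.lr))
    cpu.push_lr()                              # outer saves its return address
    cpu.bl(target=0x8600, return_addr=0x850c)  # outer calls leaf, lr overwritten
    trace.append(("in leaf", cpu.lr))
    cpu.bx_lr()                                # leaf returns via lr
    trace.append(("back in outer", cpu.pc))
    cpu.pop_pc()                               # outer returns via the saved lr
    trace.append(("back in main", cpu.pc))
    return trace


if __name__ == "__main__":
    for label, value in demo():
        print("%-14s 0x%x" % (label, value))
```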

(From a cursory googling, it seems that correctly displaying all VFP/NEON registers requires gdb-7.2, and I’m running the 6.8-based build from the maemo repo. crossdev will build me a gdb I can run on my desktop PC (crossdev -t arm-linux-gnueabi --ex-gdb), but I believe I still need to build a newer gdbserver to run on the N900.)

Research tastes better when served with source

I’ve been reading a range of computing-related research in recent times and there’s one thing in particular that bugs me: research presented without (or with insufficient) source code.

On several occasions I’ve attempted to implement a neat algorithm that I’ve read about and have been confounded: written explanations often confuse more than they enlighten, and pseudo-code (where present) invariably glosses over critical implementation details, disguising the complexity of the implementation and/or run-time.

(And, in my limited experience, seeking assistance from the author via email rarely elicits a helpful response. I’d like to hope that that is not the norm, but I fear that it is…)

The inclusion of source code with published research provides the means for others to more easily reproduce, test and validate the assertions from a paper, particularly exposing flawed assumptions and bugs.

More importantly, it should make it easier for others to build upon the research. The implementation reveals a great deal about research – its scope, shortcomings and opportunities for improvement – in ways that are often not exposed by the text alone. It provides a platform that can be built upon directly.

Standing on the shoulders of giants is easier if we can get a hand up from those already doing so, rather than having to re-engineer the same stepladder that got them there.

Even if done poorly, the availability of source code should still result in an improvement over the current state of affairs – I’d rather have poorly written source code than a poorly written explanation of the same.

Space restrictions for submitted papers certainly do not encourage the inclusion of source code. To be fair, it makes little sense that reams of source code should be published in paper form.

Realistically, it makes little sense to expect that recent or future research will be read printed on paper. Ever. The common 2-column A4 format for publication is terrible for reading on-screen and not particularly pleasant even when printed. Continuing to prepare research for publication in this legacy style ensures that papers remain hard to read, while also preventing easy inclusion of more effective ways of presenting information – not just source code, but larger, more detailed diagrams, interactive systems and more useful, intuitive navigation (to name a few).

The issue of intellectual property of course needs consideration, but if the technique is able to be described publicly it is not unreasonable to expect that an implementation should be no less freely available – even if both are encumbered by patent or other restrictions.

For researchers, my exhortation is to publish your source – it can only make your research more relevant and useful.

This article was originally written for #AltDevBlogADay

20101223

Less talk, more rock — The native language of video games is neither spoken nor written

X86 Opcode and Instruction Reference

X Window System Network Performance (via High Scalability)

jpeg-compressor — small; single C++ file

CCAN

Supplemental reading in Computer Science

OpenCL on the CPU: AVX and SSE

The SPF Setup Wizard
