Monthly Archives: March 2011

{1,2,3,4}

(This is wonderfully obtuse, but amused me :)

Neil Henning (@sheredom) asked:

SPU gurus of twitter unite, want a vector unsigned int with {1, 2, 3, 4} in each slot, without putting it in as elf constant, any ideas?

Interesting question. The SPU ISA generally doesn’t help build vectors with different values in each slot. In this case, only very small values are required in each slot, so it can be done with a neat little trick.

My answer:

    fsmbi r4, 0x7310  # r4 = {0x00ffffff, 0x0000ffff, 0x000000ff, 0x00000000}
    clz r5, r4        # r5 = {8,16,24,32}
    rotmi r6, r5, -3  # r6 = {1,2,3,4}

Instructions are:

  • fsmbi: form select mask byte immediate. Creates a 128-bit mask from a 16-bit value, expanding each bit of input to 8 bits of output.
  • clz: count leading zeros. Counts the number of leading zeros in each word.
  • rotmi: rotate and mask word immediate (logical shift right by negative immediate). Shifts each word right by the negation of the number of bits specified.
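As a sanity check, the three instructions can be modelled in a few lines of Python. This is a simplified sketch rather than an SPU emulator: the function names mirror the mnemonics but are hypothetical helpers, each register is treated as a list of four 32-bit words, and rotmi is only modelled for zero or negative immediates.

```python
MASK32 = 0xFFFFFFFF

def fsmbi(imm16):
    """Expand each of the 16 immediate bits (MSB first) into one byte, 0x00 or 0xff."""
    words = []
    for w in range(4):
        word = 0
        for b in range(4):
            bit = (imm16 >> (15 - (4 * w + b))) & 1
            word = (word << 8) | (0xFF if bit else 0x00)
        words.append(word)
    return words

def clz(words):
    """Count leading zero bits in each 32-bit word (32 for a zero word)."""
    return [32 - (w & MASK32).bit_length() for w in words]

def rotmi(words, imm):
    """Logical shift right of each word by -imm bits (simplified: imm <= 0 only)."""
    shift = -imm
    return [((w & MASK32) >> shift) if shift < 32 else 0 for w in words]

r4 = fsmbi(0x7310)
r5 = clz(r4)
r6 = rotmi(r5, -3)

print([hex(w) for w in r4])  # ['0xffffff', '0xffff', '0xff', '0x0']
print(r5)                    # [8, 16, 24, 32]
print(r6)                    # [1, 2, 3, 4]
```

The leading-zero counts come out as multiples of eight because fsmbi can only produce whole bytes of ones, which is why a shift right by three finishes the job.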

This solution is entirely self-contained and requires no pre-set state (unlike my first attempt utilising the cbd instruction). In terms of raw instruction size, it’s a whole eight bytes smaller than storing the vector in memory and loading it when needed (12 bytes for the three instructions, against 16+4 bytes), and a little slower than using a load instruction.

(On a cursory re-examination of the SPU ISA, fsmbi is the only instruction that will construct a different value in each word of a register. A specific pattern may be generated with cbd/cbx that can be used for this problem, but it depends on the contents of another register which limits its already limited usefulness. Combining fsmbi with other immediate instructions allows for a wide range of values to be constructed independent of register state and without access to storage)

TTYtter for the N900

A quick documenting of how I got TTYtter running on the N900/Maemo5.

0. Missing curl

TTYtter requires curl for OAuth, but curl isn’t packaged in the maemo5 repositories (libcurl is — which is frustrating. The particular reason for the frustration will be made clear later…)

That being the case, let’s build curl! I grabbed the sources for the version of curl that matched the installed libcurl from the relevant source package page on maemo.org, unpacked the tarball and applied the gunzipped patch with patch -p1.

1. What didn’t work

The first half-hearted attempt was to build curl using the cross toolchain I have installed on my gentoo desktop (built with crossdev -t arm-linux-gnueabi). I had little hope that this would just work, and a quick ./configure --host=arm-linux-gnueabi --prefix=/home/user/local && make && make install && scp -r /home/user/local n900: (or something like it) later, it didn’t. The foremost hurdle was that maemo5 uses an antiquated glibc-2.5 (2006, yeah!), and my toolchain uses (and thus generates programs that expect) glibc-2.11.3.

Persisting with my all-too-modern toolchain seemed likely to be a whole lot of effort — I decided to go with what appeared to be the Official method — the probability of success seemed marginally higher.

2. What worked

I installed scratchbox and built it there.

i. Installing scratchbox

I first found this MaemoOnGentoo outline, which got me started. Rather than the emerge command listed on that page, I ended up needing something like:

emerge scratchbox scratchbox-devkit-debian scratchbox-devkit-perl \
scratchbox-devkit-cputransp scratchbox-devkit-doctools \
scratchbox-toolchain-cs2007q3-glibc2_5 scratchbox-devkit-qemu \
scratchbox-devkit-git scratchbox-devkit-svn

As per that page, I needed to re-emerge xorg-server with the kdrive USE flag to build Xephyr.

I started scratchbox with /etc/init.d/scratchbox start

From that point on, the Manual Installation instructions for the SDK from maemo.org generally worked — I added a user with /scratchbox/sbin/sbox_adduser, added my user account to the sbox group. (Actually, not really knowing what I was doing, after doing that, I ran the maemo-sdk-install_5.0.sh script, which seemed to do the right thing)

I needed to manually install the Nokia binaries/apps as per the Manual Installation instructions.

That done, I was able to start the SDK UI inside a xephyr window. i.e. Xephyr :2 -host-cursor -screen 800x480x16 -dpi 96 -ac and (inside a scratchbox prompt) DISPLAY=:2 af-sb-init.sh start

(Having the UI running is the Hello, world! ‘proof’ of functionality — it may not count for much, but it’s nice to see)

ii. Building it there

Once there’s a functional scratchbox environment, the next thing to do is to build the package.

I naively followed the relevant parts of the example from the Packaging guide on maemo.org.

Taking the source (as mentioned before — de-tarballed sources with patch applied) it became apparent that the necessary configuration was already in place to build the desired .deb (so much of the guide was unneeded for this task). In fact, from what I recall, the only command from that guide that was necessary was dpkg-buildpackage -sa -rfakeroot -k<my email address> (run using the FREMANTLE_ARMEL tool config)

End result: a bunch of files, including curl_7.18.2-8maemo6+0m5_armel.deb — the frustration mentioned earlier was that the config exists to build this, and that packaging curl for maemo5 would have been approximately zero extra effort.

(Nothing is ever actually zero extra effort. I know this.)

Copy it across with scp curl_7.18.2-8maemo6+0m5_armel.deb n900:, install it with dpkg --install curl_7.18.2-8maemo6+0m5_armel.deb, and TTYtter gets the curl.

3. The final bit

TTYtter starts, but it’s not quite working yet. Maemo5 has a prehistoric perl-5.8.3 (2004, woo!) which appears to lack the kind of UTF8 support that TTYtter wants.

To work around this, start TTYtter with the -seven option.

4. Too long; don’t care

The package is here: curl_7.18.2-8maemo6+0m5_armel.deb
(The original source is here with it)

As root, install the package (dpkg -i curl_7.18.2-8maemo6+0m5_armel.deb) and then (as the regular user) grab and run ttytter -seven

TTYtter is by far the best Twitter client I’ve used on this phone — not least because it works.

Smokey Beef Chili with Guinness

This is a recipe I obtained via twitter. I didn’t make a note of the source, and it was shared as text in an image which I printed. I am now unable to locate the original.

I’m re-posting the original text here (none of the comments within are mine) for the people that have asked me about it, with thanks to whoever was responsible. It was enjoyed by my whole family :D

Smokey Beef Chili with Guinness

500 grams of gravy beef
100 grams of streaky bacon
3 celery stalks
2 red chilies
2 red onions
2 green capsicums
8 garlic cloves
1 can of diced tomatoes
2 cans of red kidney beans
150g of tomato paste
500ml beef stock
Dried oregano
Smoked paprika
Ground cumin
Tabasco sauce
Sugar to taste (usually between 1 and 4 teaspoons)
200ml of Guinness
(Optional) 1 can of smoked chipotle peppers

Roughly dice onion, chilies, capsicum and bacon. Finely dice garlic and celery. Add to hot pot with good slug of olive oil and cook until onion becomes semi-transparent.

Meanwhile, cut gravy beef (or chuck steak, or skirt steak) into large chunks. Whack in a food processor and pulse until half the steak has disintegrated and half has been carved up into various random shapes of random size.

Once onion is browned (important) add big teaspoon of smoked paprika, flat teaspoon of cumin, 2 teaspoons of dried oregano and solid few shakes of Tabasco sauce. Cook and stir for a few minutes until mixture becomes coloured from the spices cooking through it.

Add meat and cook until brown.

Add tomato paste and diced chipotle peppers and cook out until it just starts to caramelise on the walls of the pot. Add the adobo sauce from the chipotle can to taste now as well (warning – hot!)

Add can of diced tomatoes (drained) and kidney beans (drain and rinse well first). Cook for a few minutes.

Add Guinness. Cook until most of the beer has evaporated.

Add beef stock. Bring to boil. Add sugar until the mix in the pot is slightly less sweet than you want it to be (the sauce will reduce and sweetness will increase at the end). Alternatively, add the sugar when you add the tomato paste (it will add a little to the caramelisation of the paste and add a bit of extra flavour – best for second time you cook it, so you know how much you need)

Taste the mix after it starts to boil. Add a teaspoon of dried oregano and a little extra smoked paprika if the mix isn’t as smokey in flavour as you’d like. Add tabasco sauce for extra heat.

Simmer uncovered for 1.5 hours and add the lid when the mix is just a bit wetter than you want. (e.g. you want it wetter for serving with rice than you do for tacos or nachos). Let stand for at least 1/2 hour with heat off and lid on before serving (you can give it a reheat before serving if needed and add a little water to the simmering if it starts to dry out.)

Some folks stir fresh oregano and diced chili through before serving. It doesn’t float my boat, but it may well yours.

Assembly Primer Part 6 - Moving Data - ARM

My notes for where ARM differs from IA32 in the Assembly Primer video Part 6 — Moving Data.

(There is no separate part 5 post for ARM — apart from the instructions, it’s identical to IA32. There’s even support for the .bss section, unlike SPU and PPC)

Moving Data

We’ll look at MovDemo.s for ARM. First, the storage:

.data

    HelloWorld:
        .ascii "Hello World!"

    ByteLocation:
        .byte 10

    Int32:
        .int 2
    Int16:
        .short 3
    Float:
        .float 10.23

    IntegerArray:
        .int 10,20,30,40,50

It’s the same as for IA32, PPC and SPU. Like the first two, ARM will cope with the unnatural alignment.
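To make the "unnatural alignment" concrete, here is a small Python sketch that lays the same items out back to back and prints the offset of each label. It assumes little-endian ARM and that the assembler packs the items with no padding between them; the offsets are relative to the start of .data, not real addresses.

```python
import struct

# (label, bytes as the assembler would emit them), little-endian assumed
items = [
    ("HelloWorld", b"Hello World!"),  # .ascii adds no trailing NUL
    ("ByteLocation", struct.pack("<B", 10)),
    ("Int32", struct.pack("<i", 2)),
    ("Int16", struct.pack("<h", 3)),
    ("Float", struct.pack("<f", 10.23)),
    ("IntegerArray", struct.pack("<5i", 10, 20, 30, 40, 50)),
]

offsets = {}
data = bytearray()
for label, blob in items:
    offsets[label] = len(data)  # each item starts where the previous one ended
    data += blob

for label, off in offsets.items():
    aligned = "aligned" if off % 4 == 0 else "not word-aligned"
    print(f"{label:13s} offset {off:2d}  ({aligned})")
```

Under those assumptions, the single byte at ByteLocation pushes everything after it off its natural alignment: Int32 lands at offset 13 and Int16 at offset 17.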

1. Immediate value to register

.text
.globl _start
_start:
    @movl $10, %eax

    mov r0, #10

Move the value 10 into register r0.

Something to note: the ARM assembly syntax has some slight differences. Where others use # to mark the start of a comment, ARM has @ (although # works at the start of a line). Literal values are prefixed with #, which confuses the default syntax highlighting in vim.

2. Immediate value to memory

    @movw $50, Int16

    mov r1, #50
    movw r0, #:lower16:Int16
    movt r0, #:upper16:Int16
    strh r1, [r0, #0]

We need to load the immediate value in a register (r1), the address in a register (r0) and then perform the write. To quote the Architecture Reference Manual:

The ARM architecture … incorporates … a load/store architecture, where data processing operations only operate on register contents, not directly on memory contents.

which is like PPC and SPU, and unlike IA32 — and so we’ll see similarly verbose alternatives to the IA32 examples from the video.

I’m using a movw, movt sequence to load the address, rather than ldr (as mentioned in the previous installment).

strh is, in this case, Store Register Halfword (immediate): it writes the low 16 bits of r1 to the address computed from the sum of the contents of r0 and the immediate value of 0.
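The movw, movt and strh steps can be sketched in Python to show what each one contributes. This is a toy model under stated assumptions: the helper functions are hypothetical stand-ins for the instructions, DATA_BASE and the offset of Int16 are made-up values (the real address is chosen by the linker), and memory is little-endian.

```python
import struct

def movw(imm16):
    """movw: put a 16-bit immediate in the low half of a register, zeroing the top half."""
    return imm16 & 0xFFFF

def movt(reg, imm16):
    """movt: put a 16-bit immediate in the top half, keeping the low half."""
    return ((imm16 & 0xFFFF) << 16) | (reg & 0xFFFF)

def strh(memory, base, value, address, offset=0):
    """strh: store the low 16 bits of value at address + offset (little-endian)."""
    struct.pack_into("<H", memory, address + offset - base, value & 0xFFFF)

DATA_BASE = 0x000210A4        # hypothetical start of .data
INT16_ADDR = DATA_BASE + 17   # hypothetical address of Int16
memory = bytearray(64)        # toy memory covering the start of .data

r1 = 50                            # mov  r1, #50
r0 = movw(INT16_ADDR & 0xFFFF)     # movw r0, #:lower16:Int16
r0 = movt(r0, INT16_ADDR >> 16)    # movt r0, #:upper16:Int16
strh(memory, DATA_BASE, r1, r0, 0) # strh r1, [r0, #0]

print(hex(r0))                 # 0x210b5
print(memory[17:19].hex())     # 3200
```

The order matters in this model just as it does on the hardware: movw clears the top half, so it has to come before movt.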

3. Register to register

    @movl %eax, %ebx

    mov r1,r0

mov (Move) copies the value from r0 to r1.

4. Memory to register

    @movl Int32, %eax

    movw r0, #:lower16:Int32
    movt r0, #:upper16:Int32
    ldr r1, [r0, #0]

Load the address into r0, load from the address r0+0. Here ldr is Load Register (immediate).

5. Register to memory

    @movb $3, %al
    @movb %al, ByteLocation

    mov r0, #3
    movw r1, #:lower16:ByteLocation
    movt r1, #:upper16:ByteLocation
    strb r0, [r1, #0]

Once again the same kind of thing — load 3 into r0, the address of ByteLocation into r1, perform the store.

6. Register to indexed memory location

    @movl $0, %ecx
    @movl $2, %edi
    @movl $22, IntegerArray(%ecx, %edi, 4)

    movw r0, #:lower16:IntegerArray
    movt r0, #:upper16:IntegerArray
    mov r1, #2
    mov r2, #22
    str r2, [r0, r1, lsl #2]

A little more interesting — here str is Store Register (register) which accepts two registers and an optional shift operation and amount. Here lsl is logical shift left, effectively multiplying r1 by 4 — the size of the array elements.
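The address arithmetic for the scaled register offset is easy to check by hand, and the Python sketch below does the same thing. It is only an illustration: str_reg and lsl are hypothetical helpers, the toy memory holds nothing but the array, and the array is assumed to start at address 0 and to be little-endian.

```python
import struct

def lsl(value, amount):
    """Logical shift left, truncated to 32 bits."""
    return (value << amount) & 0xFFFFFFFF

def str_reg(memory, value, base, index, shift):
    """str value, [base, index, lsl #shift]: store a word at base + (index << shift)."""
    address = (base + lsl(index, shift)) & 0xFFFFFFFF
    struct.pack_into("<I", memory, address, value & 0xFFFFFFFF)
    return address

# Toy memory containing only IntegerArray
memory = bytearray(struct.pack("<5i", 10, 20, 30, 40, 50))

r0 = 0    # address of IntegerArray in the toy memory
r1 = 2    # mov r1, #2   (element index)
r2 = 22   # mov r2, #22  (value to store)
address = str_reg(memory, r2, r0, r1, 2)  # str r2, [r0, r1, lsl #2]

print(address)                       # 8
print(struct.unpack("<5i", memory))  # (10, 20, 22, 40, 50)
```

Index 2 shifted left by 2 gives a byte offset of 8, so the store replaces the third element and leaves the rest of the array alone.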

(GCC puts asl here. Presumably identical to logical shift left, but there’s no mention of asl in the Architecture Reference Manual. Update: ASL is referenced in the list of errors here as an obsolete name for LSL)

Two source registers and a shift is still shy of IA32’s support for calculating an address from a base address, two registers and a multiply.

7. Indirect addressing

    @movl $Int32, %eax
    @movl (%eax), %ebx

    movw r0, #:lower16:Int32
    movt r0, #:upper16:Int32
    ldr r1, [r0, #0]

    @movl $9, (%eax)

    mov r2, #9
    str r2, [r0, #0]

More of the same.

Concluding thoughts

In addition to the cases above, ARM has a number of other interesting addressing modes that I shall consider in more detail in the future — logical operations, auto-{increment, decrement} and multiples. Combined with conditional execution, there are some very interesting possibilities.

Other assembly primer notes are linked here.