Assembly Primer Part 7 — Working with Strings

These are my notes for where I can see ARM varying from IA32, as presented in the video Part 7 — Working with Strings.

I’ve not remotely attempted to implement anything approximating optimal string operations for this part — I’m just working my way through the examples and finding obvious mappings to the ARM arch (or, at least what seem to be obvious). When I do something particularly stupid, leave a comment and let me know :)

Working with Strings

.data
     HelloWorldString:
        .asciz "Hello World of Assembly!"
    H3110:
        .asciz "H3110"

.bss
    .lcomm Destination, 100
    .lcomm DestinationUsingRep, 100
    .lcomm DestinationUsingStos, 100

Here’s the storage that the provided example StringBasics.s uses. No changes are required to compile this for ARM.

1. Simple copying using movsb, movsw, movsl

    @movl $HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString

    @movl $Destination, %edi
    movw r1, #:lower16:Destination
    movt r1, #:upper16:Destination

    @movsb
    ldrb r2, [r0], #1
    strb r2, [r1], #1

    @movsw
    ldrh r3, [r0], #2
    strh r3, [r1], #2

    @movsl
    ldr r4, [r0], #4
    str r4, [r1], #4

More visible complexity than IA32, but not too bad overall.

IA32’s movs instructions implicitly take their source and destination addresses from %esi and %edi, and increment/decrement both. Because of ARM’s load/store architecture, separate load and store instructions are required in each case, but there is support for indexing of these registers:

ARM addressing modes

According to ARM A8.5, memory access instructions commonly support three addressing modes:

Offset addressing — An offset is applied to an address from a base register and the result is used to perform the memory access. It’s the form of addressing I’ve used in previous parts and looks like [rN, offset]
Pre-indexed addressing — An offset is applied to an address from a base register, the result is used to perform the memory access and also written back into the base register. It looks like [rN, offset]!
Post-indexed addressing — An address is used as-is from a base register for memory access. The offset is applied and the result is stored back to the base register. It looks like [rN], offset and is what I’ve used in the example above.

2. Setting / Clearing the DF flag

ARM doesn’t have a DF flag (to the best of my understanding). It could perhaps be simulated through the use of two instructions and conditional execution to select the right direction. I’ll look further into conditional execution of instructions on ARM in a later post.

3. Using Rep

ARM also doesn’t appear to have an instruction quite like IA32’s rep instruction. A conditional branch and a decrement will be the long-form equivalent. As branches are part of a later section, I’ll skip them for now.

    @movl $HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString

    @movl $DestinationUsingRep, %edi
    movw r1, #:lower16:DestinationUsingRep
    movt r1, #:upper16:DestinationUsingRep

    @movl $25, %ecx # set the string length in ECX
    @cld # clear the DF
    @rep movsb
    @std

    ldm r0!, {r2,r3,r4,r5,r6,r7}
    ldrb r8, [r0,#0]
    stm r1!, {r2,r3,r4,r5,r6,r7}
    strb r8, [r1,#0]

To avoid conditional branches, I’ll start with the assumption that the string length is known (25 bytes). One approach would be using multiple load instructions, but the load multiple (ldm) instruction makes it somewhat easier for us — one instruction to fetch 24 bytes, and a load register byte (ldrb) for the last one. Using the ! after the source-address register indicates that it should be updated with the address of the next byte after those that have been read.

The storing of the data back to memory is done analogously. Store multiple (stm) writes 6 registers×4 bytes = 24 bytes (with the ! to have the destination address updated). The final byte is written using strb.

4. Loading string from memory into EAX register

    @cld
    @leal HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString

    @lodsb
    ldrb r1, [r0, #0]

    @movb $0, %al
    mov r1, #0

    @dec %esi  @ unneeded. equiv: sub r0, r0, #1
    @lodsw
    ldrh r1, [r0, #0]

    @movw $0, %ax
    mov r1, #0

    @subl $2, %esi # Make ESI point back to the original string. unneeded. equiv: sub r0, r0, #2
    @lodsl
    ldr r1, [r0, #0]

In this section, we are shown how the IA32 lodsb, lodsw and lodsl instructions work. Again, they have implicitly assigned register usage, which isn’t how ARM operates.

So, instead of a simple, no-operand instruction like lodsb, we have a ldrb r1, [r0, #0] loading a byte from the address in r0 into r1. Because I didn’t use post indexed addressing, there’s no need to dec or subl the address after the load. If I were to do so, it could look like this:

    ldrb r1, [r0], #1
    sub r0, r0, #1

    ldrh r1, [r0], #2
    sub r0, r0, #2

    ldr r1, [r0], #4

If you trace through it in gdb, look at how the value in r0 changes after each instruction.

5. Storing strings from EAX to memory

    @leal DestinationUsingStos, %edi
    movw r0, #:lower16:DestinationUsingStos
    movt r0, #:upper16:DestinationUsingStos

    @stosb
    strb r1, [r0], #1
    @stosw
    strh r1, [r0], #2
    @stosl
    str r1, [r0], #4

Same kind of thing as for the loads. Writes the letters in r1 (being “Hell” — leftovers from the previous section) into DestinationUsingStos (the result being “HHeHell”). String processing on little endian architectures has its appeal.

6. Comparing Strings

    @cld
    @leal HelloWorldString, %esi
    movw r0, #:lower16:HelloWorldString
    movt r0, #:upper16:HelloWorldString
    @leal H3110, %edi
    movw r1, #:lower16:H3110
    movt r1, #:upper16:H3110

    @cmpsb
    ldrb r2, [r0,#0]
    ldrb r3, [r1,#0]
    cmp r2, r3

    @dec %esi
    @dec %edi
    @not needed because of the addressing mode used

    @cmpsw
    ldrh r2, [r0,#0]
    ldrh r3, [r1,#0]
    cmp r2, r3

    @subl $2, %esi
    @subl $2, %edi
    @not needed because of the addressing mode used
    @cmpsl
    ldr r2, [r0,#0]
    ldr r3, [r1,#0]
    cmp r2, r3

Where IA32’s cmps instructions implicitly load through the pointers in %edi and %esi, explicit loads are needed for ARM. The compare then works in pretty much the same way as for IA32, setting condition code flags in the current program status register (cpsr). If you run the above code, and check the status registers before and after execution of the cmp instructions, you’ll see the zero flag set and unset in the same way as is demonstrated in the video.

The condition code flags are:

bit 31 — negative (N)
bit 30 — zero (Z)
bit 29 — carry (C)
bit 28 — overflow (V)

There’s other flags in that register — all the details are on page B1-16 and B1-17 in the ARM Architecture Reference Manual.

And with that, I think we’ve made it (finally) to the end of this part for ARM.

Assembly Primer Part 7 — Working with Strings — ARM