ARM Assembly Programming

Machine Language

Machine language refers to the code that a CPU directly executes. A CPU is capable of performing many different operations, so each instruction code tells it which operation to do next. For instance, an ARM instruction that adds register 1 to register 2 and stores the result in register 3 can be encoded as:

Condition	Instruction Format	Use Immediate	Opcode	Set Flags	Source Register 1	Destination Register	Shift	Source Register 2
1110	00	0	0100	0	0001	0011	00000000	0010

Each of these instruction fields is defined below:

Condition
The condition field is used to make the execution of an instruction hinge on some condition. For instance, we could write an instruction that is only executed if a previous comparison was true or false. 1110 means always execute.
Instruction Format
The instruction format tells the CPU how to interpret these fields. 00 means this is an arithmetic instruction. Other types of instructions have different fields.
Use Immediate
This is used to indicate if the last operand of the instruction is an immediate value, which means a number encoded directly into the instruction. Instead of adding two values in registers, we could add one register with a number stored directly in the instruction.
Opcode
The opcode tells the CPU what to do with the operands. 0100 means to add them.
Set Flags
This tells the CPU whether to update the condition flags based on the result, so that a later conditional instruction can test them. 0 means leave the flags unchanged.
Source Register 1
The register to use for the first operand.
Destination Register
The register number to store the result.
Shift
This describes a shift to apply to the second source register before it is used. All zeros means no shift. If "Use Immediate" were 1, this field along with the next would instead encode the value to use as the second operand.
Source Register 2
The register to use for the second operand.

This instruction in hexadecimal would be 0xE0813002. When the CPU is given this instruction, it will perform the task the instruction describes, adding the values of registers 1 and 2, and storing the result in register 3.
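
To make the bit packing concrete, here is a minimal sketch in C that assembles such an instruction from its fields. It assumes the standard 32-bit ARM data-processing layout, in which the first source register occupies bits 19-16 and the destination register bits 15-12, and it only handles an unshifted register as the second operand. The function name encode_data_processing is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the fields of an ARM data-processing instruction whose second
 * operand is an unshifted register. The function name is hypothetical;
 * the bit positions are assumed to follow the standard ARM layout. */
uint32_t encode_data_processing(uint32_t cond, uint32_t opcode,
                                uint32_t set_flags, uint32_t rd,
                                uint32_t rn, uint32_t rm) {
    /* every field must fit in its allotted number of bits */
    assert(cond < 16 && opcode < 16 && set_flags < 2);
    assert(rd < 16 && rn < 16 && rm < 16);

    return (cond      << 28)   /* bits 31-28: condition               */
         | (0u        << 26)   /* bits 27-26: instruction format (00) */
         | (0u        << 25)   /* bit  25:    use immediate (no)      */
         | (opcode    << 21)   /* bits 24-21: opcode                  */
         | (set_flags << 20)   /* bit  20:    update condition flags  */
         | (rn        << 16)   /* bits 19-16: first source register   */
         | (rd        << 12)   /* bits 15-12: destination register    */
         | (0u        << 4)    /* bits 11-4:  no shift                */
         | rm;                 /* bits 3-0:   second source register  */
}
```

Calling encode_data_processing(0xE, 0x4, 0, 3, 1, 2) packs "always execute, add, destination r3, sources r1 and r2" and gives 0xE0813002.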

Machine language is specific to a particular machine. If we gave the above instruction to an x86 computer, it would not add two registers because x86 has different instruction formats, different opcodes, different sets of registers and so on.

Assembly Language

Coding directly in machine language is, of course, incredibly tedious and error-prone. Instead, we can program in "assembly language", which is essentially a human-readable version of machine language.

The instruction above in ARM assembly is simply:


add r3, r1, r2

This is much easier for humans to write. Assembly code is translated into the equivalent machine code by a program called an "assembler". There is mostly a one-to-one correspondence between assembly instructions and machine code instructions. The main exception is pseudo-instructions, such as one that loads a value or memory address too big to fit into a single instruction.

The assembler is a very simple program that removes the tedium of encoding the instructions into binary. Sometimes people say they are programming in "machine language" when they really mean assembly language. There is no reason to code directly in machine language.

Instruction Set Architecture

Assembly language is tied to the instruction set architecture (ISA) of a machine. The ISA includes things like the number and type of registers, the instructions available, how instructions are encoded, and how memory is referenced.

It is often the case that ISAs are expanded in ways that keep backwards compatibility. For instance, if one ISA specified that the opcode was a 4-bit value but only used 12 opcodes, a second version of that ISA could add new opcodes to it. Old programs would work on the new processors (they just wouldn't use the new instructions), but new programs might not work on old processors. This is the case for ARM, which is now up to version 9 of the ISA, while the GBA uses version 4.

There can be multiple CPUs that all follow the same ISA. For instance, there are multiple CPUs that implement ARM version 4. The GBA uses the ARM7TDMI, but there is also the ARM9TDMI which has higher performance, but uses more energy.

ARM is one of the two most widely used ISAs in the world. It is used in:

Hand-held devices, including game consoles, Raspberry Pis, and other small computers.
Virtually all phones and tablets.
Some laptops, including Chromebooks and the newer Mac laptops.

The other is the x86 ISA, along with its 64-bit extension, which is called x86-64 or just x64. This ISA is much more complicated than ARM and has a lot of technical baggage, but it is still widely used because it is backwards compatible with the very popular Intel CPUs dating back to the 8086 from the late 1970s.

x86 is used for most laptop and desktop computers, and most servers.

There used to be a wider diversity of ISAs, but now ARM and x86 between them cover nearly every type of computer. ARM CPUs are more energy-efficient than x86 CPUs, which is why they are used when energy is important. The main benefit of x86 is that it is compatible with a large amount of existing software.

Aside: Writing Text

In order to write simple assembly code for the GBA, it would be nice to be able to output text to the screen. This is actually not too difficult using a tiled background. This program uses this approach along with this image file:

The trick is that, except for the first 32 ASCII characters, which are unprintable control codes, the tiles are in ASCII order. So to print the letter 'A', we take its ASCII code (65), subtract 32, and use the result (33) to index the image.

We just need to load the image into a character block, and use one of the screen blocks to store which tile (letter) goes into which place. The following function takes in a string and prints it to a given part of the screen:


/* function to set text on the screen at a given location */
void set_text(char* str, int row, int col) {                    
    /* find the index in the text map to draw to */
    int index = row * 32 + col;

    /* the first 32 ASCII characters are missing from the map (control chars etc.) */
    int missing = 32; 

    /* pointer to text map we are using */
    volatile unsigned short* ptr = screen_block(24);

    /* for each character */
    while (*str) {
        /* place this character in the map */
        ptr[index] = *str - missing;

        /* move onto the next character */
        index++;
        str++;
    }   
}

Now we can call set_text to print any strings we want onto the screen. This is a useful approach for putting text onto the screen for games, but we'll also use it to test out our assembly functions.
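
To check the index arithmetic without GBA hardware, the same logic can be run on an ordinary computer. The sketch below is only an illustration: fake_map stands in for the screen block, and the names set_text_host, MAP_WIDTH and MAP_HEIGHT (and the assumed map height of 32 rows) are not part of the original program.

```c
#include <assert.h>

#define MAP_WIDTH  32   /* tiles per row, matching the row * 32 above */
#define MAP_HEIGHT 32   /* assumed number of rows in the text map */

/* stand-in for the screen block, so this runs without GBA hardware */
static unsigned short fake_map[MAP_WIDTH * MAP_HEIGHT];

/* same logic as set_text, but writing into fake_map */
void set_text_host(const char* str, int row, int col) {
    /* find the index in the text map to draw to */
    int index = row * MAP_WIDTH + col;

    /* the first 32 ASCII characters are missing from the map */
    int missing = 32;

    /* for each character */
    while (*str) {
        /* stay inside the map */
        assert(index >= 0 && index < MAP_WIDTH * MAP_HEIGHT);

        /* place the tile number for this character in the map */
        fake_map[index] = (unsigned short)(*str - missing);

        /* move on to the next character */
        index++;
        str++;
    }
}
```

After set_text_host("AB", 1, 2), entry 34 of fake_map (row 1, column 2) holds tile 33 for 'A', and entry 35 holds tile 34 for 'B'.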

First Assembly Program

Now we can start writing ARM assembly. One nice thing about C is that it is really easy to work with assembly code. We can call assembly functions from C and vice versa without having to do any sort of extra work.

As a first example, we'll make an assembly function which can add two numbers and return the result. This is done using the same add instruction detailed above. We'll make a file called "add.s" ('s' is the common suffix for assembly files) with the following:


@ add.s

@ declare add_asm as a global function so we can call it
.global	add_asm

@ here is the definition of add_asm
add_asm:
    add r0, r0, r1      @ add r0 and r1 (first two args) and place result in r0
    mov pc, lr          @ return back to the caller

Text following the '@' character is a comment in ARM assembly. You can also use C-style comments. The .global directive declares "add_asm" to be global. This has to be done for any function we wish to call from another file. We're going to call the "add_asm" function from C, so we need this. If we forget it, we will get a link error.

The line that reads add_asm: is a label which marks the start of the add_asm function. Unlike functions in higher-level languages, assembly functions do not declare the number and type of parameters, or return types. It's up to the programmer to keep that straight.

This function has two instructions. The first adds the contents of registers 0 and 1, and stores the result back into register 0.

The second instruction is the mov instruction which copies from one register to another. This one copies the lr register into the pc register. The lr register stands for "link register", and it stores the address of wherever this function is supposed to return back to. If we are done and ready to return, we should put this address into the pc register which stands for "program counter". The program counter stores the address of the next instruction the CPU executes. By setting pc to lr, we make the next instruction we execute the one we are supposed to go back to, so this line is the same as a return statement in C.

To call this function from C, all we have to do is declare it, and then call it:


/* our assembly function - this is not defined here, it is defined in add.s
 * we have to declare the function here in order to call it though! */
int add_asm(int a, int b);

/* the main function */
int main() {
    /* we set the mode to mode 0 with bg0 and bg1 on */
    *display_control = MODE0 | BG0_ENABLE | BG1_ENABLE;

    /* setup the background 0 */
    setup_background();

    /* call assembly function */
    int result = add_asm(5, 7);

    /* print it */
    char text[32];
    sprintf(text, "%d", result);
    set_text(text, 0, 0);

    /* loop forever */
    while (1) {
    }
}

The full program is available here. To run this, we actually need to compile the C code, assemble the assembly code, and then link the results together. The gbacc script does this automatically, so we can pass both files to it:

gbacc add.s main.c

ARM Calling Conventions

How did the main program know what to do with the arguments 5 and 7? And how did it get the value 12 back from the assembly function? The answer lies in the ARM "calling convention" which refers to the protocol between a function and its caller for how parameters are passed and return values are given.

In ARM, r0 through r3 are used for passing parameters into a function. In our code above, the two arguments are passed in r0 and r1. If more than four arguments are needed, they must go on the stack (which we will discuss later).

r0 through r3 are also used for the return value. Because an int fits in just one register, we put the result into r0. If the return value is too big to fit in registers, the caller must instead pass, in r0, a pointer to a location in memory where the value can be stored.
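
As a rough illustration, the comments in the following C sketch note where a typical ARM compiler would be expected to place each value under these rules. The functions sum6 and make_big and the struct big are invented for this example, and the register assignments in the comments are assumptions based on the convention described here rather than something the C code itself can show.

```c
/* a, b, c and d arrive in r0-r3. e and f do not fit in the argument
 * registers, so the caller places them on the stack.
 * The int result goes back to the caller in r0. */
int sum6(int a, int b, int c, int d, int e, int f) {
    return a + b + c + d + e + f;
}

/* eight ints: too large to hand back in registers */
struct big {
    int values[8];
};

/* The caller reserves space for the result and passes its address in r0.
 * With that register taken, n would typically travel in r1. */
struct big make_big(int n) {
    struct big result;
    for (int i = 0; i < 8; i++) {
        result.values[i] = n;
    }
    return result;
}
```

From the C side nothing looks different in either case. The compiler inserts the stack traffic and the hidden pointer, which is exactly the bookkeeping we have to do by hand in assembly.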

The calling convention also includes the use of other registers. A function has to know if it's allowed to overwrite a register, or if it is being used by the function which called it.

The full ARM register convention is as follows:

Register	Use	Aliases
`r0 - r3`	Arguments and return values	`a1 - a4`
`r4 - r11`	Used to hold local variables (preserved across function calls)	`v1 - v8`
`r12`	Intra-procedure-call scratch register (used by linker to build function addresses)	`ip`
`r13`	Stack pointer (stores address of the top of the stack)	`sp`
`r14`	Link register (stores address of instruction to return to)	`lr`
`r15`	Program counter (stores address of next instruction to execute)	`pc`

Note that we can use register numbers or the aliases to refer to them.

One of the hardest things about programming in assembly is that we do not have names for variables. We have to keep track of which registers store which variables ourselves.

Other Instructions

In addition to add, there are a number of other arithmetic/logic instructions that are available to us in ARM assembly:

sub - subtract
mul - multiply
and - bitwise and
orr - bitwise or
eor - bitwise exclusive or

The mul instruction has some unusual restrictions. We can't use the same register for the destination and the first operand. So mul r0, r0, r1 is not allowed, but mul r0, r1, r0 is. Also, we cannot use immediate values with the multiply instruction.

Note that, while some ARM processors have divide instructions, the one in the GBA does not.

We also will commonly want to copy a value from one register into another. This can be done with the mov instruction. For instance mov r0, r1 will copy the value in register 1 to register 0.

How could we write an assembly function to take three integer arguments and return the product of all three?
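
One possible answer is sketched below. The assembly is shown inside a C comment, since running it needs the ARM toolchain, and the C function mirrors it register by register so the logic can be checked on any machine. The names mul3_asm and mul3_model are hypothetical. The sketch uses r3 as a temporary, which avoids the restriction on mul because the destination never matches the first operand.

```c
#include <stdint.h>

/* Hypothetical mul3.s:
 *
 *     .global mul3_asm
 *     mul3_asm:
 *         mul r3, r0, r1      @ r3 = a * b
 *         mul r0, r3, r2      @ r0 = (a * b) * c
 *         mov pc, lr          @ return, with the result in r0
 */

/* C model of the assembly above. uint32_t arithmetic wraps at 32 bits,
 * just as mul keeps only the low 32 bits of the product. */
uint32_t mul3_model(uint32_t a, uint32_t b, uint32_t c) {
    uint32_t r0 = a, r1 = b, r2 = c, r3;

    r3 = r0 * r1;   /* mul r3, r0, r1 */
    r0 = r3 * r2;   /* mul r0, r3, r2 */

    return r0;      /* mov pc, lr */
}
```

Because r3 is one of the argument registers, the function is free to overwrite it without saving it first. From C, the assembly version would be declared as int mul3_asm(int a, int b, int c);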

Using Immediate Values

In all of the instructions above (except mul), we can replace the last operand with an immediate value, which is an integer constant stored directly inside of the instruction. To indicate an immediate value, use the # character followed by a decimal number. For instance, to add one to a register, we could use add r0, r0, #1.

To use a hexadecimal number instead, we can use #0x as a prefix. For instance, to turn on the lower four bits of register 6, we could use orr r6, r6, #0xf.

This is often used with the mov instruction to store a particular value in a register.

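Note that not all values can be used. There are in fact only 12 bits available for immediate values, and they are split into an 8-bit value and a 4-bit rotation. This means only constants that can be formed by rotating an 8-bit number right by an even number of bits can be encoded. If we attempt to use a value that cannot be encoded, the assembler will give us an error: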

program.s:8: Error: invalid constant -- `add r0,r0,#100000'

To get around this, we can use multiple instructions to build up a larger value piece by piece.
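
To see which constants will be accepted, the following C sketch tests whether a 32-bit value can be encoded. It assumes the standard ARM scheme in which the 12 bits hold an 8-bit value and a 4-bit field giving a rotation right by twice its amount. The function names are made up for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* rotate a 32-bit value left by n bits */
static uint32_t rotate_left(uint32_t v, unsigned n) {
    n &= 31;
    return n ? (v << n) | (v >> (32 - n)) : v;
}

/* Returns true if value can be written as an 8-bit number rotated right
 * by an even amount. Undoing each candidate rotation (by rotating left)
 * must leave something that fits in 8 bits. */
bool is_arm_immediate(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        if (rotate_left(value, rot) <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

By this test 100000 (0x186A0) cannot be encoded, but 0x18000 and 0x6A0 both can, and they sum to 100000. So the failing instruction could be replaced by add r0, r0, #0x18000 followed by add r0, r0, #0x6A0.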

Shifts

The ARM architecture is somewhat unusual in that there are no instructions just for performing shifts. Instead, regular move and arithmetic instructions can have their last operand contain a shift.

The ARM hardware contains a "barrel shifter" which is able to shift register values after they are loaded from the register file, but before they are input to the ALU.

In order to set r0 to the value of r1 shifted left by two, we can use the following instruction:


mov r0, r1, lsl #2

The barrel shifter allows us to perform a shift and another operation with just one instruction. For example, to implement something like the C statement:


a = b + (c << 3);

We could use the ARM instruction (with a, b, and c in r0, r1, and r2):


add r0, r1, r2, lsl #3

We have the following shift operations available:

lsl - Logical shift left.
lsr - Logical shift right.
asr - Arithmetic shift right. The difference is that an arithmetic shift keeps the sign bit. asr is for signed numbers while lsr is for unsigned ones.
ror - Rotate right. This is like a regular shift except the bits don't "fall off" the right side, they rotate around to the left. There is no built-in operator for rotate in C.
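
The four shift operations can be modelled in C as follows. This is only a sketch for shift amounts from 0 to 31, and the function names are invented for this illustration. The arithmetic shift is written out by hand because right-shifting a negative signed value in C is implementation-defined.

```c
#include <assert.h>
#include <stdint.h>

/* lsl: bits move left, zeros fill in on the right */
uint32_t shift_lsl(uint32_t v, unsigned n) {
    assert(n < 32);
    return v << n;
}

/* lsr: bits move right, zeros fill in on the left */
uint32_t shift_lsr(uint32_t v, unsigned n) {
    assert(n < 32);
    return v >> n;
}

/* asr: bits move right, copies of the sign bit fill in on the left */
uint32_t shift_asr(uint32_t v, unsigned n) {
    assert(n < 32);
    uint32_t shifted = v >> n;
    if ((v & UINT32_C(0x80000000)) && n > 0) {
        shifted |= ~(UINT32_C(0xFFFFFFFF) >> n);
    }
    return shifted;
}

/* ror: bits that leave the right side re-enter on the left */
uint32_t shift_ror(uint32_t v, unsigned n) {
    assert(n < 32);
    return n ? (v >> n) | (v << (32 - n)) : v;
}

/* model of: add r0, r1, r2, lsl #3 */
uint32_t add_lsl3(uint32_t r1, uint32_t r2) {
    return r1 + shift_lsl(r2, 3);
}
```

For example, shifting 0x80000000 right by 4 gives 0x08000000 with lsr but 0xF8000000 with asr, since asr fills the vacated bits with the sign bit.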