GBA Graphics Programming

Overview

Most games of course don't draw a static image on the screen and then loop forever. Instead they dynamically change the screen as the game is being played.

Doing this actually presents some problems:

Writing directly into video memory can cause "flickering" or "tearing" when some parts of the screen are updated before or after other parts. We would want to update the screen "all at once" to get rid of this.
Even when we update the screen all at once, the timing of when we do it is important. The hardware has a refresh cycle where the colors are sent from memory to the screen. If we update in the middle of this cycle, we could still have tearing.
The processor in the GBA runs at 16.78 MHz, and there are $240 \times 160 = 38400$ pixels on the screen. That means each pixel gets only about 437 clock cycles per second. If we wanted a frame rate of even just 30 frames per second (half of what the GBA hardware is capable of), that leaves only about 14 clock cycles to put each pixel in place. That may sound like enough, but the compiled put_pixel function we saw takes 22 machine instructions, which will take at least 22 cycles to execute, and that is before counting any actual game logic. It is not possible to directly write every pixel each frame!
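The per-pixel budget above is simple division, and a few lines of C make it easy to try other frame rates. This is only a back-of-the-envelope sketch: cycles_per_pixel, SCREEN_W and SCREEN_H are hypothetical names, and it uses the same 16.78 MHz figure as the text (the exact clock is 2^24 Hz, or 16,777,216 Hz).

```c
/* screen dimensions of the GBA in pixels */
#define SCREEN_W 240
#define SCREEN_H 160

/* hypothetical helper: CPU cycles available for each pixel if every
 * pixel on the screen is rewritten fps times per second */
double cycles_per_pixel(double clock_hz, int fps) {
    return clock_hz / ((double) SCREEN_W * SCREEN_H * fps);
}
```

For example, cycles_per_pixel(16.78e6, 1) is just under 437, and cycles_per_pixel(16.78e6, 30) is about 14.6.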

Today we will learn how to work around these issues.

Dynamic Graphics

The hello world program drew a static image, and then displayed it forever. In pseudo-code:

Draw Scene
Loop Forever:
1. Do nothing.

A dynamic program would redraw the screen in addition to handling input:

Initialize Things
Loop Forever:
1. Draw scene.
2. Check input.
3. Move objects.

This program uses this scheme to draw a green square onto the screen.

It maintains a struct called square which holds the square's position, size and color:


/* a colored square */
struct square {
    unsigned short x, y, size, color;
};

It has a function which draws square objects to the screen:


/* draw a square onto the screen */
void draw_square(struct square* s) {
    /* for each row */
    for (unsigned short row = s->y; row < (s->y + s->size); row++) {
        /* for each column */
        for (unsigned short col = s->x; col < (s->x + s->size); col++) {
            put_pixel(row, col, s->color);
        }
    }
}

Then the main function contains a loop which draws the square continuously:


/* loop forever */
while (1) {
    /* clear the screen */
    clear_screen();

    /* draw the square */
    draw_square(&s);
}

Unfortunately, this program runs into all three of the problems described above! The output looks like this, which leaves a lot to be desired:

Mode 4

In order to put the pixels on the screen all at once, and avoid the tearing, we will need to use double buffering. In double buffering, we keep two copies of the screen area: one that is displayed, and one that we modify. When we are done modifying one, we swap the buffers so the screen updates all at once, not pixel by pixel.

The GBA supports double buffering directly in hardware. We don't actually move any data to swap buffers; we just tell the hardware to use a different section of video memory as the input to the screen!

The GBA mode 3 does not support double-buffering, so we will need to use either mode 4, or mode 5. Mode 4 makes you use a color palette instead of using colors directly, and mode 5 makes you use a reduced screen resolution. Here we will use mode 4, which means we need to talk about the palette!

The Color Palette

In mode 3, the screen stores 16-bit values which refer to colors directly. The hardware reads the screen memory and puts those colors onto the screen.

In mode 4, the screen stores 8-bit values which refer to entries in a color palette, and the palette is what actually stores the color data. Now, the hardware reads the screen memory and uses those values as indices into the palette to determine the actual colors to use.

This means we are limited to a set of 256 colors, but those colors can be whatever we want them to be.

The palette is a block of 256 16-bit values beginning at address 0x5000000. To know which palette entries are used, we need to keep track of the next available index. We can then replace our make_color function with an add_color function that adds colors to the palette and returns their indices instead:


/* the address of the color palette used in graphics mode 4 */
volatile unsigned short* palette = (volatile unsigned short*) 0x5000000;

/* keep track of the next palette index */
int next_palette_index = 0;

/*
 * function which adds a color to the palette and returns the
 * index to it
 */
unsigned char add_color(unsigned char r, unsigned char g, unsigned char b) {
    unsigned short color = b << 10;
    color += g << 5;
    color += r;

    /* add the color to the palette */
    palette[next_palette_index] = color;

    /* increment the index */
    next_palette_index++;

    /* return index of color just added */
    return next_palette_index - 1;
}
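The color packing inside add_color can be checked away from the hardware. The sketch below pulls the packing out into a hypothetical rgb15 helper that also masks each component to 5 bits, and adds a bounds check the original omits, since writing past entry 255 would run off the end of the 256-entry background palette. add_color_checked is likewise a hypothetical name; the palette and the index counter are passed in as parameters so the code can be tested on a PC.

```c
/* pack 5-bit red, green and blue components (0-31 each) into the GBA's
 * 15-bit format: blue in bits 10-14, green in bits 5-9, red in bits 0-4 */
unsigned short rgb15(unsigned char r, unsigned char g, unsigned char b) {
    return (unsigned short) (((b & 0x1F) << 10) | ((g & 0x1F) << 5) | (r & 0x1F));
}

/* hypothetical variant of add_color that refuses to go past the 256
 * available entries; returns the new index, or -1 if the palette is full */
int add_color_checked(volatile unsigned short* pal, int* next_index,
                      unsigned char r, unsigned char g, unsigned char b) {
    if (*next_index >= 256) {
        return -1;
    }
    pal[*next_index] = rgb15(r, g, b);

    /* return the slot just filled, then advance the counter */
    return (*next_index)++;
}
```

On the GBA, pal would be the palette pointer at 0x5000000 and next_index would point at next_palette_index.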

We can then pass the three color components to add_color, which inserts the color into the next available slot in the palette and returns an index we can then use to refer to that color.

Writing Pixels

There is one other hardware issue we need to consider before writing a mode 4 program: the GBA video RAM cannot be written one byte at a time. If we try to write a single byte to it, that value is actually written into both bytes of the 16-bit location, overwriting the neighboring pixel.

In order to get around this, we'll have to first read the existing 16-bit value, modify half of it, and write the whole thing back to video RAM.

The put_pixel function becomes more complicated:


/* put a pixel on the screen in mode 4 */
void put_pixel(volatile unsigned short* buffer, int row, int col, unsigned char color) {
    /* find the offset, which is the regular offset divided by two
     * because each 16-bit short holds two 8-bit pixels */
    unsigned short offset = (row * WIDTH + col) >> 1;

    /* read the two pixels which are already stored there */
    unsigned short pixel = buffer[offset];

    /* if it's an odd column */
    if (col & 1) {
        /* put it in the high (left) byte, keeping the low byte */
        buffer[offset] = (color << 8) | (pixel & 0x00ff);
    } else {
        /* it's even, put it in the low (right) byte, keeping the high byte */
        buffer[offset] = (pixel & 0xff00) | color;
    }
}

This code finds the pixel offset using the same formula as mode 3, but divides it in half (because the mode 4 pixels are packed into half the memory space). It then reads the current value of the two pixels stored there.

It figures out whether this is an even or odd pixel based on the column number, and puts the new color into either the high byte (odd columns) or the low byte (even columns) of the short, leaving the neighboring pixel unchanged.
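Reading a pixel back is the mirror image of put_pixel, and it makes the packing explicit. This get_pixel is a hypothetical helper, not part of the original program, and it assumes WIDTH is 240 as elsewhere in these notes:

```c
#define WIDTH 240

/* read back the palette index stored at (row, col) in a mode 4 buffer:
 * two pixels share each 16-bit short, even columns in the low byte
 * and odd columns in the high byte */
unsigned char get_pixel(const volatile unsigned short* buffer, int row, int col) {
    unsigned short pixel = buffer[(row * WIDTH + col) >> 1];
    if (col & 1) {
        return (unsigned char) (pixel >> 8);    /* odd column: high byte */
    } else {
        return (unsigned char) (pixel & 0xFF);  /* even column: low byte */
    }
}
```

For example, if the first short of a buffer holds 0x0705, column 0 of row 0 has palette index 5 and column 1 has index 7.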

If we use the mode 3 put_pixel function in mode 4, we will get bizarre results like this:

GBA Double Buffering

We are now ready to employ double buffering, which is the reason to put up with mode 4's quirks. With double buffering, we now have two places to draw pixels. One is still at the beginning of the screen space at 0x6000000, and the other is at 0x600A000 (0xA000 = 40,960 bytes further on, enough room for the first buffer's 38,400 one-byte pixels).

We now no longer have one screen, but two buffers:


/* pointers to the front and back buffers - the front buffer is the start
 * of the screen array and the back buffer is a pointer to the second half */
volatile unsigned short* front_buffer = (volatile unsigned short*) 0x6000000;
volatile unsigned short* back_buffer = (volatile unsigned short*)  0x600A000;

Functions that draw graphics need to take the buffer as a parameter since we can no longer assume what memory address they are currently writing into:


/* draw a square onto the screen */
void draw_square(volatile unsigned short* buffer, struct square* s) {
    short row, col;
    /* for each row of the square */
    for (row = s->y; row < (s->y + s->size); row++) {
        /* loop through each column of the square */
        for (col = s->x; col < (s->x + s->size); col++) {
            /* set the screen location to this color */
            put_pixel(buffer, row, col, s->color);
        }
    }
}

/* clear the screen to a single color */
void clear_screen(volatile unsigned short* buffer, unsigned char color) {
    unsigned short row, col;
    /* set each pixel to the given color */
    for (row = 0; row < HEIGHT; row++) {
        for (col = 0; col < WIDTH; col++) {
            put_pixel(buffer, row, col, color);
        }
    }
}
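Because clear_screen calls put_pixel once per pixel, every write also pays for a read and a mask. When the whole buffer is set to one color, both bytes of every 16-bit short are identical, so a sketch like the following (fill_screen is a hypothetical name) can store each short directly: 19,200 plain writes instead of 38,400 read-modify-writes.

```c
#define WIDTH 240
#define HEIGHT 160

/* fill a whole mode 4 buffer with one palette index; both bytes of every
 * short get the same value, so no read-modify-write is needed */
void fill_screen(volatile unsigned short* buffer, unsigned char color) {
    unsigned short both = (unsigned short) ((color << 8) | color);

    /* 240 * 160 pixels, two per short */
    for (int i = 0; i < (WIDTH * HEIGHT) / 2; i++) {
        buffer[i] = both;
    }
}
```

In the main loop, fill_screen(buffer, black) could stand in for clear_screen(buffer, black).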

We also need to actually switch the buffers at some point. This is done with another bit in the display control register. If bit number 4 is 0, then the front buffer is displayed to the screen. If that bit is 1, then the back buffer is displayed instead.

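To switch buffers, we just have to invert a single bit in a hardware register. This lets the page be flipped almost instantly, so the screen changes from one complete frame to the next instead of showing a half-drawn one, which avoids the flickering and most of the tearing we saw before.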

We make a define for bit 4 (the value 0x10) called "SHOW_BACK":


/* this bit indicates whether to display the front or the back buffer
 * this allows us to refer to bit 4 of the display_control register */
#define SHOW_BACK 0x10

And then we can write a function to do the dirty work of flipping this bit. The function also returns the non-visible buffer so that you know which one you should be writing into next:


/* this function displays the buffer we just finished drawing into and
 * returns the other one, which is the buffer to draw into next */
volatile unsigned short* flip_buffers(volatile unsigned short* buffer) {
    if (buffer == front_buffer) {
        /* we just drew the front buffer: clear the back buffer bit to
         * display it, and return the back buffer pointer */
        *display_control &= ~SHOW_BACK;
        return back_buffer;
    } else {
        /* we just drew the back buffer: set the back buffer bit to
         * display it, and return the front buffer pointer */
        *display_control |= SHOW_BACK;
        return front_buffer;
    }
}
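Because a flip only inverts one bit, the same bookkeeping can also be phrased in terms of the register value itself. The sketch below is an alternative, not the author's flip_buffers: toggle_page and hidden_page are hypothetical names, and they work on a plain copy of the register value so they can be checked on a PC. On the hardware the flip itself would be `*display_control ^= SHOW_BACK;`. The test values assume MODE4 is 0x4 and BG2 is 0x400, so MODE4 | BG2 is 0x404.

```c
#include <stdint.h>

#define SHOW_BACK  0x10         /* bit 4 of the display control register */
#define FRONT_ADDR 0x6000000u   /* first page of mode 4 video memory */
#define BACK_ADDR  0x600A000u   /* second page of mode 4 video memory */

/* flipping pages only inverts bit 4, which XOR does in one step */
unsigned short toggle_page(unsigned short dispcnt) {
    return (unsigned short) (dispcnt ^ SHOW_BACK);
}

/* the page to draw into is whichever one is NOT being displayed */
uint32_t hidden_page(unsigned short dispcnt) {
    return (dispcnt & SHOW_BACK) ? FRONT_ADDR : BACK_ADDR;
}
```

Deriving the drawing page from the register avoids keeping a separate buffer variable in sync with the hardware, at the cost of reading the register each frame.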

And finally the main function calls flip_buffers every time through the main loop to implement the page flipping idea:


/* the main function */
int main() {
    /* we set the mode to mode 4 with bg2 on */
    *display_control = MODE4 | BG2;

    /* make a green square */
    struct square s = {10, 10, 15, add_color(0, 20, 2)};

    /* add black to the palette */
    unsigned char black = add_color(0, 0, 0);

    /* the buffer we start with */
    volatile unsigned short* buffer = front_buffer;

    /* loop forever */
    while (1) {
        /* clear the screen */
        clear_screen(buffer, black);

        /* draw the square */
        draw_square(buffer, &s);

        /* swap the buffers */
        buffer = flip_buffers(buffer);
    }
}

The full program can be seen here and does not exhibit the flickering and tearing. It looks like this:

Adding Movement

We can add movement to our scene pretty easily. First, we add code to test for button presses:


/* the button register holds the bits which indicate whether each button is
 * currently held down - it is active low, so a bit reads 0 while its button
 * is pressed - and it has got to be volatile as well
 */
volatile unsigned short* buttons = (volatile unsigned short*) 0x04000130;

/* the bit positions indicate each button - the first bit is for A, second for
 * B, and so on, each constant below can be ANDED into the register to get the
 * status of any one button */
#define BUTTON_A (1 << 0)
#define BUTTON_B (1 << 1)
#define BUTTON_SELECT (1 << 2)
#define BUTTON_START (1 << 3)
#define BUTTON_RIGHT (1 << 4)
#define BUTTON_LEFT (1 << 5)
#define BUTTON_UP (1 << 6)
#define BUTTON_DOWN (1 << 7)
#define BUTTON_R (1 << 8)
#define BUTTON_L (1 << 9)

/* this function checks whether a particular button is currently held down */
unsigned char button_pressed(unsigned short button) {
    /* and the button register with the button constant we want */
    unsigned short pressed = *buttons & button;

    /* the register is active low: if this value is zero, the button IS pressed */
    if (pressed == 0) {
        return 1;
    } else {
        return 0;
    }
}
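Since the register is active low, it is often convenient to invert it once per frame and then test ordinary 1 bits. This keys_held helper is a hypothetical sketch that takes the register value as a parameter, so it can be tested without the hardware; on the GBA you would call it as keys_held(*buttons).

```c
#define KEY_MASK 0x03FF   /* the ten button bits, 0 through 9 */
#define BUTTON_A (1 << 0)
#define BUTTON_B (1 << 1)

/* the register reads 0 for a pressed button, so invert it and keep only
 * the ten button bits: the result has a 1 for every button held down */
unsigned short keys_held(unsigned short keyinput) {
    return (unsigned short) (~keyinput & KEY_MASK);
}
```

With nothing pressed the register reads 0x03FF and keys_held returns 0; with only A pressed it reads 0x03FE and keys_held returns BUTTON_A.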

Then we can write a function which handles the buttons so that the square moves with the arrow keys:


/* handle the buttons which are pressed down */
void handle_buttons(struct square* s) {
    /* move the square with the arrow keys */
    if (button_pressed(BUTTON_DOWN)) {
        s->y += 3;
    }
    if (button_pressed(BUTTON_UP)) {
        s->y -= 3;
    }
    if (button_pressed(BUTTON_RIGHT)) {
        s->x += 3;
    }
    if (button_pressed(BUTTON_LEFT)) {
        s->x -= 3;
    }
}
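As written, handle_buttons lets the square leave the screen: because x and y are unsigned short, moving left from x = 1 wraps around to 65534, and put_pixel then writes far outside the buffer. One possible fix, sketched here with a hypothetical move_clamped helper that is not part of the original program, does the arithmetic in int and clamps the result so the whole square stays on screen:

```c
#define WIDTH 240
#define HEIGHT 160

/* move a coordinate by delta, keeping an object of the given size fully
 * inside [0, limit); using int avoids unsigned wrap-around near the edge */
int move_clamped(int pos, int delta, int size, int limit) {
    int next = pos + delta;
    if (next < 0) {
        next = 0;
    }
    if (next > limit - size) {
        next = limit - size;
    }
    return next;
}
```

Inside handle_buttons, the left-arrow case would then become `s->x = move_clamped(s->x, -3, s->size, WIDTH);`, and similarly for the other directions with HEIGHT for y.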

We then call it in the main loop so that the square moves around. The full program which adds this is available here.

This program has two problems:

It is very slow.
It might still exhibit screen tearing, though it may be hard to notice in this simple scene.

We will handle the second of these issues first.

Video Blanking

We may still experience screen tearing because we are not timing our buffer flip at all; we do it whenever the program happens to reach that part of the code. The issue is that the GBA hardware takes some amount of time to push pixel values onto the screen.

It does this row by row in what are called "scan lines" during the "VDraw" period. Each of the 160 rows takes 1232 machine cycles to draw. The entire VDraw period takes 197,120 cycles. We do not want to flip the buffers during this period.

If we do, the hardware will switch the buffer it reads from partway down the screen, resulting in a torn image. Super Mario World does not do this, but if it did you might see this:

The time to flip the buffers is during the "VBlank" period, which begins right after the last of the 160 rows is pushed to the screen. The VBlank period is the pause after the hardware draws the last row of one frame and before it starts the first row of the next. It lasts as long as it takes to draw 68 rows, or 83,776 machine cycles. This is plenty of time to swap the buffers.
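The timing numbers fit together as follows: VDraw plus VBlank is one full frame of 228 rows, and dividing the clock rate by the frame length gives the refresh rate, just under 60 Hz. The constant names below are hypothetical, and the clock is taken as exactly 2^24 Hz.

```c
#define CYCLES_PER_LINE 1232
#define VDRAW_LINES     160
#define VBLANK_LINES    68
#define CPU_HZ          16777216.0   /* 2^24 Hz, the "16.78 MHz" clock */

#define VDRAW_CYCLES  (VDRAW_LINES * CYCLES_PER_LINE)    /* 197,120 */
#define VBLANK_CYCLES (VBLANK_LINES * CYCLES_PER_LINE)   /*  83,776 */
#define FRAME_CYCLES  (VDRAW_CYCLES + VBLANK_CYCLES)     /* 280,896 */

/* refresh rate implied by the scanline timing, in frames per second */
double refresh_hz(void) {
    return CPU_HZ / FRAME_CYCLES;
}
```

refresh_hz() comes out at about 59.73, which is the "60 FPS" usually quoted for the GBA.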

We can set this timing up by monitoring a hardware register called the scanline counter. The hardware updates this register with the number of the row it is currently drawing: 0 to 159 during VDraw, and 160 to 227 during VBlank. This register is at memory address 0x4000006:


/* the scanline counter is a memory cell which is updated to indicate how
 * much of the screen has been drawn */
volatile unsigned short* scanline_counter = (volatile unsigned short*) 0x4000006;

The code to wait for a VBlank period just sits in an empty loop until this counter indicates that it has gotten through all 160 rows:


/* wait for the screen to be fully drawn so we can do something during vblank */
void wait_vblank() {
    /* wait until all 160 lines have been updated */
    while (*scanline_counter < 160) { }
}

Then we just call this function before swapping the buffers which will ensure we do so during VBlank and not VDraw:


/* wait for vblank before switching buffers */
wait_vblank();

/* swap the buffers */
buffer = flip_buffers(buffer);
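One caveat: wait_vblank returns immediately if it is called while a VBlank is already under way. That is harmless for this slow program, but a fast loop could then draw and flip several times inside a single VBlank. A common refinement is to wait for any current VBlank to end and then wait for the next one to begin. In this sketch the counter is read through a function pointer so the logic can be tried on a PC; wait_vblank_start, vcount_reader and fake_vcount are hypothetical names, and on the GBA the reader would simply return *scanline_counter.

```c
/* hypothetical: the scanline counter is read through a function pointer
 * so the waiting logic can be exercised off-hardware */
typedef unsigned short (*vcount_reader)(void);

/* wait for the START of the next VBlank period */
void wait_vblank_start(vcount_reader read_vcount) {
    /* if a VBlank is already in progress, let it finish first */
    while (read_vcount() >= 160) { }
    /* then wait while rows 0-159 are being drawn (VDraw) */
    while (read_vcount() < 160) { }
}

/* host-side stand-in for the hardware counter: it advances one scanline
 * per read (the real one advances with time) and wraps after line 227 */
static unsigned short sim_line = 0;

unsigned short fake_vcount(void) {
    unsigned short line = sim_line;
    sim_line = (unsigned short) ((sim_line + 1) % 228);
    return line;
}
```

Whether the wait starts in the middle of VDraw or in the middle of VBlank, it returns just as line 160 begins, so exactly one frame is shown per flip.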

This technique is sometimes called vertical synchronization or "Vsync". Roughly 60 frames per second (about 59.73, to be exact) is the refresh rate of the GBA screen, and of many other screens as well. The best we can hope to do is get each frame's work done before the next VBlank, so that we hit every VBlank period, which would mean that our game would run at 60 FPS.

If we take too long on a frame, we will miss that VBlank and have to wait through another entire VDraw period before we can swap buffers, which halves the frame rate to 30 FPS. This is why 60 FPS and 30 FPS are so common.
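The halving can be written down as a simplified model: if a frame's work takes W cycles after the last flip, the next flip waits for the first VBlank after the work is done, so each frame stays on screen for ceil(W / 280,896) refreshes. The sketch below is that model only (it ignores where in the frame the work starts), and refreshes_per_frame and effective_fps are hypothetical names.

```c
#define FRAME_CYCLES 280896   /* 228 lines * 1232 cycles per line */
#define REFRESH_HZ   59.73    /* 2^24 Hz / 280,896 cycles, rounded */

/* number of screen refreshes each frame stays up when its work takes
 * work_cycles: ceiling of work_cycles / FRAME_CYCLES, at least 1 */
int refreshes_per_frame(long work_cycles) {
    long n = (work_cycles + FRAME_CYCLES - 1) / FRAME_CYCLES;
    return n < 1 ? 1 : (int) n;
}

/* resulting frame rate under this simplified model */
double effective_fps(long work_cycles) {
    return REFRESH_HZ / refreshes_per_frame(work_cycles);
}
```

Under this model, 200,000 cycles of work per frame gives about 60 FPS, 300,000 gives about 30 FPS, and 600,000 drops to about 20 FPS.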

The full code which implements this is available here. This program runs even slower than the last (way below 30 FPS), but it will not tear the screen!

Improving the Speed

This program runs so slowly because it updates every pixel each time through the loop. There are a few ways we could fix this.

The simplest is to use a technique called "dirty rectangles", wherein you keep track of which parts of each buffer have changed since that buffer was last drawn and only redraw them.

Doing this in a general way for multiple objects is complex and hard to do efficiently, but we can implement a similar idea with the moving square program.

Instead of clearing the whole screen every frame, we will only clear the area right around the square.

We start by clearing both buffers entirely before the main loop:


/* clear whole screen first */
clear_screen(front_buffer, black);
clear_screen(back_buffer, black);

Then we write a new function "update_screen" which clears the area of the square and a little extra on each side to make up for the movement:


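/* clear the area around the square, plus a margin, to the given color */
void update_screen(volatile unsigned short* buffer, unsigned char color, struct square* s) {
    /* each buffer was last drawn two frames ago, and the square moves up
     * to 3 pixels per frame, so clear 6 extra pixels on every side */
    int margin = 6;
    int top = s->y - margin, bottom = s->y + s->size + margin;
    int left = s->x - margin, right = s->x + s->size + margin;

    /* clip the area to the screen so we never write outside the buffer */
    if (top < 0) top = 0;
    if (left < 0) left = 0;
    if (bottom > HEIGHT) bottom = HEIGHT;
    if (right > WIDTH) right = WIDTH;

    for (int row = top; row < bottom; row++) {
        for (int col = left; col < right; col++) {
            put_pixel(buffer, row, col, color);
        }
    }
}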

We then call update_screen before drawing the square, instead of clear_screen. The full version of this is available here. It is still not very fast, but runs much faster than the last one!

To get really fast moving objects, we will have to use the GBA's hardware sprite facility, which we will cover in a few weeks!