We have talked about the GBA memory system indirectly, but here we will see the complete memory system. It is actually composed of several different memories which share one address space:
Start Address | End Address | Size | Description |
0x0000:0000 | 0x0000:3fff | 16 KB | BIOS memory. |
0x0200:0000 | 0x0203:ffff | 256 KB | External work memory (larger and slower). |
0x0300:0000 | 0x0300:7fff | 32 KB | Internal work memory (smaller and faster). |
0x0400:0000 | 0x0400:03ff | 1 KB | Hardware control registers. |
0x0500:0000 | 0x0500:03ff | 1 KB | Palette memory. |
0x0600:0000 | 0x0601:7fff | 96 KB | Video RAM. |
0x0700:0000 | 0x0700:03ff | 1 KB | Sprite memory. |
0x0800:0000 | - | Variable | Game Pak ROM (executable code and data of the game). |
0x0e00:0000 | - | Variable | Cart RAM (save game data). |
Notice that the end of one section does not match up with the start of the next. This means some memory addresses are invalid and don't refer to any real memory on the GBA.
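The table and the gaps between its rows can be captured as a small lookup. The sketch below is a hypothetical helper, `gba_region`, that returns the name of the region containing an address, or `NULL` for an address in a gap. The end addresses used for the two "Variable" rows (32 MB of Game Pak ROM and 64 KB of cart RAM) are assumptions, and the sketch models only the table above rather than every hardware detail.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of the memory map table. */
struct region {
    uint32_t start;      /* first address in the region */
    uint32_t end;        /* last address in the region (inclusive) */
    const char* name;
};

/* Fixed-size rows come from the table. The last two rows use
   ASSUMED maximum sizes, since the table lists them as "Variable". */
static const struct region gba_map[] = {
    {0x00000000u, 0x00003FFFu, "BIOS"},
    {0x02000000u, 0x0203FFFFu, "EWRAM"},
    {0x03000000u, 0x03007FFFu, "IWRAM"},
    {0x04000000u, 0x040003FFu, "I/O registers"},
    {0x05000000u, 0x050003FFu, "Palette"},
    {0x06000000u, 0x06017FFFu, "VRAM"},
    {0x07000000u, 0x070003FFu, "Sprite memory"},
    {0x08000000u, 0x09FFFFFFu, "Game Pak ROM"},  /* assumed 32 MB */
    {0x0E000000u, 0x0E00FFFFu, "Cart RAM"},      /* assumed 64 KB */
};

/* Return the name of the region containing addr, or NULL if the
   address falls in one of the gaps between regions. */
const char* gba_region(uint32_t addr) {
    size_t n = sizeof gba_map / sizeof gba_map[0];
    for (size_t i = 0; i < n; i++) {
        if (addr >= gba_map[i].start && addr <= gba_map[i].end) {
            return gba_map[i].name;
        }
    }
    return NULL;
}
```

For example, `gba_region(0x03007FFF)` is the last byte of IWRAM, while `gba_region(0x03008000)` lands in the gap after it and returns `NULL`.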
The GBA does not have a cache, but it does have the external work RAM (EWRAM) and internal work RAM (IWRAM) sections, which function somewhat like a cache, except that it is the programmer's or compiler's job to manage them.
We have been relying on the compiler to handle this for us, but by programming in assembly, we can decide whether our code and data should go in the smaller, faster IWRAM.
Normally, memory is accessed through the CPU. This is usually not a problem because the CPU typically needs to process the data it loads.
There are some cases, however, where this is not optimal. For instance, if we are copying a block of memory from one address to another, it is inefficient to route every value through the CPU.
Instead, we can use direct memory access (DMA), in which memory transfers happen outside of the CPU. With DMA, blocks of memory can be transferred directly from one part of memory to another.
DMA can also be used to transfer data from memory to an I/O device such as a graphics card or hard drive.
DMA is normally handled by the operating system and device driver software. On the GBA, however, there is no operating system or drivers.
We can use DMA on the GBA, but we have to do it ourselves. To do so, there are a few more hardware registers:
/* pointer to the DMA source location */
volatile unsigned int* dma_source = (volatile unsigned int*) 0x40000D4;
/* pointer to the DMA destination location */
volatile unsigned int* dma_destination = (volatile unsigned int*) 0x40000D8;
/* pointer to the DMA count/control */
volatile unsigned int* dma_count = (volatile unsigned int*) 0x40000DC;
The dma_source register stores the source address of the DMA transfer, and dma_destination stores the destination address. To set up a DMA transfer, we store pointers to the source and destination in these two registers. The GBA has four DMA channels, numbered 0 to 3, and these three registers belong to channel 3.
The dma_count register stores both the number of values to transfer and control bits describing how the transfer should work. The lowest 16 bits hold the count. The highest bit (bit 31) is the enable bit: when it is 0, no DMA takes place, and when it is 1, a DMA transfer is triggered.
Bit 26 (counting from bit 0) is also important because it sets the size of each transferred value: if it is 0, the values are 16 bits, and if it is 1, they are 32 bits. The following defines can be used to set these values:
/* flag for turning on DMA */
#define DMA_ENABLE 0x80000000
/* flags for the sizes to transfer, 16 or 32 bits */
#define DMA_16 0x00000000
#define DMA_32 0x04000000
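As a worked example of this layout, the hypothetical helpers below pack a count and flags into a dma_count value and compute how many bytes that value would move. They restate the defines so the sketch is self-contained. Masking the count to 16 bits is a choice made here so an oversized count cannot spill into the flag bits; the original code does not do this.

```c
#include <stdint.h>

#define DMA_ENABLE 0x80000000
#define DMA_16     0x00000000
#define DMA_32     0x04000000

/* Build a dma_count value: count in the low 16 bits, flags above. */
uint32_t dma_control(uint32_t count, uint32_t flags) {
    return (count & 0xFFFFu) | flags;
}

/* Bytes moved by a transfer: the count is in 2-byte units when bit 26
   is clear and 4-byte units when it is set. A count of 0 is treated
   specially by the hardware and is not modeled here. */
uint32_t dma_bytes(uint32_t control) {
    uint32_t count = control & 0xFFFFu;
    uint32_t unit = (control & DMA_32) ? 4u : 2u;
    return count * unit;
}
```

For example, one 240-pixel row of 16-bit pixels is `dma_control(240, DMA_16 | DMA_ENABLE)`, which is 0x800000F0 and moves 480 bytes; the same count with `DMA_32` is 0x840000F0 and moves 960 bytes.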
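There are several other control bits that allow finer-grained control, for example whether the source and destination addresses increment, decrement, or stay fixed after each value, and whether the transfer starts immediately or waits for the vertical or horizontal blank.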
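To make those other control bits concrete, the sketch below gives them names and runs a pure software model of an immediate transfer on ordinary memory. The `DMA_DEST_*`, `DMA_SRC_*`, timing, repeat, and interrupt names are hypothetical; the bit positions follow the GBA's DMA control register layout. `dma_model` only illustrates what the count, size, and address-control bits mean. It is not a way to program the hardware, and it ignores the repeat, timing, and interrupt bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* flags from the text */
#define DMA_ENABLE     0x80000000
#define DMA_32         0x04000000

/* hypothetical names for other control bits */
#define DMA_DEST_DEC   0x00200000  /* bits 21-22 = 1: destination decrements */
#define DMA_DEST_FIXED 0x00400000  /* bits 21-22 = 2: destination stays fixed */
#define DMA_SRC_DEC    0x00800000  /* bits 23-24 = 1: source decrements */
#define DMA_SRC_FIXED  0x01000000  /* bits 23-24 = 2: source stays fixed */
#define DMA_REPEAT     0x02000000  /* bit 25: repeat (ignored by the model) */
#define DMA_AT_VBLANK  0x10000000  /* bits 28-29 = 1: start at vertical blank (ignored) */
#define DMA_AT_HBLANK  0x20000000  /* bits 28-29 = 2: start at horizontal blank (ignored) */
#define DMA_IRQ        0x40000000  /* bit 30: interrupt when done (ignored) */

/* Byte step applied after each unit for a 2-bit address-control field. */
static ptrdiff_t dma_step(uint32_t field, ptrdiff_t unit) {
    switch (field) {
    case 1:  return -unit;   /* decrement */
    case 2:  return 0;       /* fixed */
    default: return unit;    /* increment (mode 3 is also treated as increment here) */
    }
}

/* Software model of an immediate DMA transfer on host memory.
   It never touches GBA hardware registers. */
void dma_model(void* dest, const void* src, uint32_t control) {
    if (!(control & DMA_ENABLE)) {
        return;                                  /* enable bit clear: no transfer */
    }
    ptrdiff_t unit = (control & DMA_32) ? 4 : 2; /* bit 26 picks 16 or 32 bits */
    ptrdiff_t dstep = dma_step((control >> 21) & 3u, unit);
    ptrdiff_t sstep = dma_step((control >> 23) & 3u, unit);
    uint32_t count = control & 0xFFFFu;          /* low 16 bits: number of units */
    unsigned char* d = dest;
    const unsigned char* s = src;
    ptrdiff_t doff = 0, soff = 0;
    for (uint32_t i = 0; i < count; i++) {
        memcpy(d + doff, s + soff, (size_t) unit); /* move one unit */
        doff += dstep;
        soff += sstep;
    }
}
```

A fixed source address is what makes DMA useful for filling memory: `dma_model(buf, &value, 8 | DMA_32 | DMA_SRC_FIXED | DMA_ENABLE)` writes the same 32-bit value into all eight words of `buf`, because the source never advances while the destination does.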
Without DMA, we would write a function to transfer memory from one location to another with a loop:
/* copy data with a loop */
void memcpy16(unsigned short* dest, unsigned short* source, int amount) {
    for (int i = 0; i < amount; i++) {
        dest[i] = source[i];
    }
}
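The same loop can move more data per iteration when the data allows it. The hypothetical `memcpy32` below assumes both buffers are 4-byte aligned and that `amount` counts 32-bit words. For the same number of bytes it runs half as many iterations as `memcpy16`, which mirrors the choice between `DMA_16` and `DMA_32`.

```c
#include <stdint.h>

/* copy data with a loop, one 32-bit word per iteration
   (assumes both pointers are 4-byte aligned) */
void memcpy32(uint32_t* dest, const uint32_t* source, int amount) {
    for (int i = 0; i < amount; i++) {
        dest[i] = source[i];
    }
}
```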
We can do the same thing but faster using DMA instead:
/* copy data using DMA */
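void memcpy16_dma(unsigned short* dest, unsigned short* source, int amount) {
    /* tell the DMA controller where to read from and where to write to */
    *dma_source = (unsigned int) source;
    *dma_destination = (unsigned int) dest;
    /* the count goes in the low 16 bits; the enable bit starts the copy */
    *dma_count = amount | DMA_16 | DMA_ENABLE;
}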
This code stores the source and destination addresses in the hardware registers, then writes the amount, combined with the 16-bit flag and the enable bit, into dma_count. Setting the enable bit starts the transfer, and the CPU is suspended until the memory transfer has finished.
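A 32-bit version follows the same pattern with `DMA_32` in place of `DMA_16`. In this sketch, the hypothetical `memcpy32_dma` takes a pointer to the three consecutive registers (dma_source, dma_destination, dma_count) so it can be checked off the hardware; on the GBA you would pass `(volatile uint32_t*) 0x40000D4`. The count is in 32-bit words, so copying the same data takes half the count of the 16-bit version, and both addresses should be 4-byte aligned.

```c
#include <stdint.h>

#define DMA_ENABLE 0x80000000
#define DMA_32     0x04000000

/* copy data using 32-bit DMA; regs[0], regs[1] and regs[2] play the
   roles of dma_source, dma_destination and dma_count */
void memcpy32_dma(volatile uint32_t* regs, uint32_t* dest,
                  const uint32_t* source, uint16_t amount) {
    regs[0] = (uint32_t) (uintptr_t) source;  /* source address */
    regs[1] = (uint32_t) (uintptr_t) dest;    /* destination address */
    regs[2] = amount | DMA_32 | DMA_ENABLE;   /* word count + flags; on hardware this starts the copy */
}
```

Off the hardware, `regs` is just an array, so no copy happens; the sketch only shows which values end up in each register.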
This program tests the loop-based copy and the DMA-based one. It contains both memory copy functions above and calls one of them a number of times. The DMA one runs far faster.
Copyright © 2025 Ian Finlayson | Licensed under a Creative Commons BY-NC-SA 4.0 License.