Matt's Blog

Programming a Gameboy Color emulator

January 14th, 2018 - 19 min read

Have you ever wanted to go back and replay that nostalgic video game from your past? As time goes on, it only gets more difficult (and expensive) to track down physical copies of these old systems and game cartridges from decades ago. If you kept looking for a way to play them despite this, you have likely discovered and used emulators.

Screenshots from my emulators, Gameboy Crust and original GB emu.

An emulator is a piece of software that enables programs for one system (the guest system) to run on an entirely different system (the host). Have you ever wondered how they work? Many people who have used emulators (my younger self included) tend to write them off as some kind of black magic witchcraft. However, taking the time to understand how an emulator works under the hood is an incredibly rewarding and educational experience in many ways.

This page describes what I learned while developing an emulator for the Nintendo Gameboy Color. The host system can be any version of Linux, Windows, or macOS. Pseudocode will be presented in Rust syntax. There are many systems that can be emulated, but the choice was easy for me since the Gameboy was my first videogame console.

Theory

The Gameboy, like all old video game consoles, is nothing more than a simple computer. We know that a computer is a device that can carry out logical tasks. When we view the Gameboy as a computer, the video game cartridges are the ‘programs’ that run on it and tell it which instructions to execute.

An emulator works by virtually representing the physical hardware of the guest machine. This means that in order to create an emulator for the Gameboy, we have to simulate in software how the real hardware of the Gameboy would function. With this in mind, let’s take a look at the hardware of the Gameboy and how it all interacts so we can form an understanding of how to emulate it in code.

Hardware

The Gameboy has the following hardware features:

Memory

The Gameboy has a few different types of memory that need to all be emulated:

Memory is one of the easiest things to implement in code. Regardless of what is actually being stored in each memory location, we represent it in the same way. We can equate a block of memory in the hardware to simply allocating N bytes of memory in our program:

pub fn new() -> Wram {
    Wram {
        bytes: vec![0; WRAM_SIZE],
    }
}

Then all we need are methods to read and write to our virtual memory. The Gameboy has a 16-bit address space, so we can use an unsigned 16-bit integer to index our memory:

pub fn read(&self, address: u16) -> u8 {
    self.bytes[address as usize]
}

pub fn write(&mut self, address: u16, data: u8) {
    self.bytes[address as usize] = data;
}
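One detail the snippet above leaves open: if read is handed the absolute bus address (say 0xC000, where work RAM begins), indexing the vector with it directly would run past the end of an 8 KB buffer. A minimal sketch of one way to handle this is to subtract the base address before indexing. The constant values and the offset helper here are my own assumptions for illustration, not necessarily how the real emulator does it:

```rust
/// First bus address of work RAM (assumed value for this sketch).
const WRAM_START: u16 = 0xC000;
/// Size of one 8 KB block of work RAM (assumed value for this sketch).
const WRAM_SIZE: usize = 0x2000;

pub struct Wram {
    bytes: Vec<u8>,
}

impl Wram {
    pub fn new() -> Wram {
        Wram { bytes: vec![0; WRAM_SIZE] }
    }

    /// Translate an absolute bus address into an index into `bytes`.
    /// The modulo keeps the index in range, so addresses one block above
    /// the base land on the same bytes again (hypothetical helper).
    fn offset(address: u16) -> usize {
        (address.wrapping_sub(WRAM_START) as usize) % WRAM_SIZE
    }

    pub fn read(&self, address: u16) -> u8 {
        self.bytes[Self::offset(address)]
    }

    pub fn write(&mut self, address: u16, data: u8) {
        self.bytes[Self::offset(address)] = data;
    }
}
```

With this translation, a write to 0xC000 can be read back from 0xC000, and also from 0xE000, which is how a mirrored region could share the same backing vector.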

And that is all that is needed to emulate any kind of physical memory in the Gameboy. The only exception to this is the game cartridge which is read-only memory (ROM). Since this memory is read-only, we do not implement a write method. The ROM is still a contiguous buffer of data, but we read it in from a file instead of initializing an empty vector:

use std::fs::File;
use std::io::Read;

pub fn load(path: String) -> Rom {
    let mut buffer = Vec::new();
    let mut file = File::open(path).expect("Invalid ROM path");
    // Read the whole cartridge image into memory
    file.read_to_end(&mut buffer).expect("Unable to read ROM");

    Rom {
        bytes: buffer,
    }
}
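The ROM buffer also carries a small header that describes the cartridge. As a hedged sketch, here is a read-only Rom that returns an error from load instead of panicking, and that pulls the game title out of the header bytes starting at 0x0134. The from_bytes constructor, the 0xFF value for out-of-range reads, and the 15-byte title length are assumptions made for this example:

```rust
use std::fs;
use std::io;

pub struct Rom {
    bytes: Vec<u8>,
}

impl Rom {
    /// Load a ROM image from disk, handing any I/O error to the caller.
    pub fn load(path: &str) -> io::Result<Rom> {
        Ok(Rom { bytes: fs::read(path)? })
    }

    /// Build a Rom from bytes already in memory (hypothetical helper).
    pub fn from_bytes(bytes: Vec<u8>) -> Rom {
        Rom { bytes }
    }

    /// Read-only access. Reads past the end of the image return 0xFF,
    /// which is an assumption of this sketch.
    pub fn read(&self, address: u16) -> u8 {
        self.bytes.get(address as usize).copied().unwrap_or(0xFF)
    }

    /// Title stored in the cartridge header, starting at 0x0134.
    /// Stops at the first zero byte and skips non-printable bytes.
    pub fn name(&self) -> String {
        self.bytes
            .iter()
            .skip(0x0134)
            .take(15)
            .take_while(|&&b| b != 0)
            .filter(|b| b.is_ascii_graphic() || **b == b' ')
            .map(|&b| b as char)
            .collect()
    }
}
```

There is deliberately no write method here, matching the read-only nature of the cartridge described above.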

Memory Mapping

But what is the point of writing many sub-allocations of memory? Why not just define the entire 64 KB of memory in one contiguous vector?

The Gameboy hardware stretches its address space through memory mapping and bank switching. This means that logical blocks (banks) of memory can be swapped in and out while the address range stays the same. This is powerful because the amount of usable memory is no longer bound by the size of the address space (2^16 addresses, 0x0000 through 0xFFFF).

Consider an example: the 8 KB of video RAM is mapped to addresses 0x8000 through 0x9FFF. By writing to a special register, the entire 8 KB of VRAM can be swapped out at will for another 8 KB bank. This effectively doubles our VRAM to 16 KB while using the same address space.
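As a rough sketch of what that swap could look like in code, the struct below keeps two 8 KB banks behind the same address range. On the Gameboy Color the bank is chosen by bit 0 of the register at 0xFF4F. The Vram struct and the select_bank name are my own illustration and not taken from the emulator's source:

```rust
const VRAM_START: u16 = 0x8000;
const VRAM_BANK_SIZE: usize = 0x2000; // 8 KB per bank

pub struct Vram {
    banks: [Vec<u8>; 2],
    active: usize,
}

impl Vram {
    pub fn new() -> Vram {
        Vram {
            banks: [vec![0; VRAM_BANK_SIZE], vec![0; VRAM_BANK_SIZE]],
            active: 0,
        }
    }

    /// Called when the game writes to the bank-select register.
    /// Only bit 0 of the written value is used.
    pub fn select_bank(&mut self, value: u8) {
        self.active = (value & 0x01) as usize;
    }

    /// Both methods expect an address in 0x8000..=0x9FFF.
    pub fn read(&self, address: u16) -> u8 {
        self.banks[self.active][(address - VRAM_START) as usize]
    }

    pub fn write(&mut self, address: u16, data: u8) {
        self.banks[self.active][(address - VRAM_START) as usize] = data;
    }
}
```

The caller never sees which bank is active: the same address returns different data depending on the last value passed to select_bank.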

Since the Gameboy has this scheme of memory management, we must also emulate it. This can be done by defining a struct that has control over all of our memory and interconnects them:

pub fn new(rom: Rom) -> Interconnect {
    Interconnect {
        rom,
        gpu: Gpu::new(),
        wram: Wram::new(),
        hram: Hram::new(),
    }
}

Now we can call read and write on any address without caring which block of memory backs it, because the memory mapping and bank switching are handled automatically.

pub fn read(&self, address: u16) -> u8 {
    // Route the address to whichever module owns that range
    match address {
        ROM_START..=ROM_BANK_END => self.rom.read(address), // $0000...
        VRAM_START..=VRAM_END => self.gpu.read(address),
        ERAM_START..=ERAM_END => self.rom.read(address),
        WRAM_START..=WRAM_END => self.wram.read(address),
        ECHO_START..=ECHO_END => self.wram.read(address),
        OAM_START..=OAM_END => self.gpu.read(address),
        HRAM_START..=HRAM_END => self.hram.read(address), // ...$FFFF
        _ => panic!("Invalid read at ${:04X}", address),
    }
}

Dynamic memory addressing. The write method is similar.

The responsibility of returning the correct data is now passed on to the individual memory modules. It is there that they match the address to the appropriate vector depending on the active memory bank. Since the processor relies on data from memory, it can make use of these methods and is emulated next.
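Since the write method is only described as "similar", here is a simplified, self-contained sketch of both directions. It uses plain vectors instead of the real modules, hard-codes the address ranges, ignores writes to ROM (on real cartridges those typically drive bank-switching hardware), and returns 0xFF for regions it does not model. All of those choices, and the Bus name, are assumptions made to keep the example short:

```rust
pub struct Bus {
    rom: Vec<u8>,  // 0x0000..=0x7FFF, read-only
    vram: Vec<u8>, // 0x8000..=0x9FFF
    wram: Vec<u8>, // 0xC000..=0xDFFF, mirrored at 0xE000..=0xFDFF
    hram: Vec<u8>, // 0xFF80..=0xFFFE
}

impl Bus {
    pub fn new(rom: Vec<u8>) -> Bus {
        Bus {
            rom,
            vram: vec![0; 0x2000],
            wram: vec![0; 0x2000],
            hram: vec![0; 0x7F],
        }
    }

    pub fn read(&self, address: u16) -> u8 {
        let a = address as usize;
        match address {
            0x0000..=0x7FFF => self.rom.get(a).copied().unwrap_or(0xFF),
            0x8000..=0x9FFF => self.vram[a - 0x8000],
            0xC000..=0xDFFF => self.wram[a - 0xC000],
            0xE000..=0xFDFF => self.wram[a - 0xE000], // echo of work RAM
            0xFF80..=0xFFFE => self.hram[a - 0xFF80],
            _ => 0xFF, // regions this sketch does not model
        }
    }

    pub fn write(&mut self, address: u16, data: u8) {
        let a = address as usize;
        match address {
            0x0000..=0x7FFF => {} // ROM is read-only in this sketch
            0x8000..=0x9FFF => self.vram[a - 0x8000] = data,
            0xC000..=0xDFFF => self.wram[a - 0xC000] = data,
            0xE000..=0xFDFF => self.wram[a - 0xE000] = data,
            0xFF80..=0xFFFE => self.hram[a - 0xFF80] = data,
            _ => {} // unmodelled regions silently drop the write
        }
    }
}
```

Returning a default value for unmapped reads, rather than panicking, is a design choice: it lets a half-finished emulator keep running while the remaining regions are filled in.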

CPU

The Gameboy CPU is a modified version of the 8-bit Zilog-Z80 microprocessor and has the following features:

The 8-bit registers are each 1 byte of internal memory specifically designated for the CPU so it can quickly perform arithmetic. The PC register stores the address of the next CPU instruction to be executed. The SP register stores the top address of a LIFO stack. These registers can be represented easily:

pub struct Registers {
    pub a: u8,
    pub f: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8,
    pub e: u8,
    pub h: u8,
    pub l: u8,
    pub sp: u16,
    pub pc: u16,
}
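Many instructions treat two of the 8-bit registers as a single 16-bit pair (AF, BC, DE, HL), and the F register holds the CPU flags in its upper four bits. A small sketch of helpers for this is shown below. The method and constant names are my own, and only the HL and AF pairs are spelled out to keep it short:

```rust
#[derive(Default)]
pub struct Registers {
    pub a: u8,
    pub f: u8,
    pub b: u8,
    pub c: u8,
    pub d: u8,
    pub e: u8,
    pub h: u8,
    pub l: u8,
    pub sp: u16,
    pub pc: u16,
}

// Flag bits inside the F register
pub const FLAG_Z: u8 = 0x80; // zero
pub const FLAG_N: u8 = 0x40; // subtract
pub const FLAG_H: u8 = 0x20; // half carry
pub const FLAG_C: u8 = 0x10; // carry

impl Registers {
    /// H is the high byte and L the low byte of the HL pair.
    pub fn hl(&self) -> u16 {
        u16::from_be_bytes([self.h, self.l])
    }

    pub fn set_hl(&mut self, value: u16) {
        let [h, l] = value.to_be_bytes();
        self.h = h;
        self.l = l;
    }

    pub fn af(&self) -> u16 {
        u16::from_be_bytes([self.a, self.f])
    }

    /// The low four bits of F always read as zero, so mask them off.
    pub fn set_af(&mut self, value: u16) {
        let [a, f] = value.to_be_bytes();
        self.a = a;
        self.f = f & 0xF0;
    }

    pub fn flag(&self, mask: u8) -> bool {
        self.f & mask != 0
    }

    pub fn set_flag(&mut self, mask: u8, on: bool) {
        if on {
            self.f |= mask;
        } else {
            self.f &= !mask;
        }
    }
}
```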

The CPU has hundreds of unique instructions that are indexed by an 8-bit value. Each instruction does something important, like performing arithmetic, loading data into memory, comparing values, etc. The CPU runs at 4.19 megahertz, which means that it performs about 4.19 million cycles per second. To put that into perspective, my desktop computer has a 4 gigahertz processor, whose clock ticks roughly 1,000 times faster than the Gameboy’s CPU!

Modern video game consoles run at much higher speeds. This is why emulators for the Playstation, Xbox, or Wii tend to be slower. Not to mention that every single piece of hardware for modern consoles is becoming more and more advanced. This makes reverse engineering and emulating modern consoles an overwhelming challenge to say the least.

But let’s get back to the archaic Gameboy. Each instruction takes a different number of cycles to execute. Most instructions take 4, 8, or 12 cycles, depending on what they do.

The instruction that the CPU will execute next is found at the memory location defined by the program counter (PC) register:

// Reads the next byte and increments the program counter
fn next_byte(&mut self, memory: &Interconnect) -> u8 {
    let byte = memory.read(self.regs.pc);
    self.regs.pc = self.regs.pc.wrapping_add(1);
    byte
}

The most important part of CPU emulation is the fetch/decode/execute procedure:

// Perform one step of the fetch-decode-execute cycle
pub fn step(&mut self, memory: &mut Interconnect) -> usize {
    self.handle_interrupts(memory);
    let opcode = self.next_byte(memory);
    // Decodes/executes each operation and returns the cycles taken
    match opcode {
        0x78 => { self.regs.a = self.regs.b; 4 }, // LD A, B
        0x79 => { self.regs.a = self.regs.c; 4 }, // LD A, C
        // repeat for every instruction...
        _ => panic!("Unknown Opcode: ${:02X} @ ${:04X}", opcode, self.regs.pc)
    }
}

This procedure reads the next instruction from memory and translates/carries out the task it represents. Each branch returns the amount of cycles taken for that instruction. The Gameboy’s other hardware (video/audio/etc.) is all tied in with the processor speed. Keeping track of the cycles taken is important to keeping all other hardware in sync.
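To make the cycle counting concrete, here is a tiny self-contained version of the same loop that can actually be run. It is a sketch under simplifying assumptions: memory is a plain byte slice instead of the Interconnect, interrupts are left out, and only five opcodes are implemented (NOP, LD B,d8, LD A,d8, LD A,B and ADD A,B):

```rust
pub struct Cpu {
    pub a: u8,
    pub b: u8,
    pub f: u8, // flags: Z = 0x80, N = 0x40, H = 0x20, C = 0x10
    pub pc: u16,
}

impl Cpu {
    pub fn new() -> Cpu {
        Cpu { a: 0, b: 0, f: 0, pc: 0 }
    }

    // Reads the byte at PC and advances PC by one
    fn next_byte(&mut self, memory: &[u8]) -> u8 {
        let byte = memory[self.pc as usize];
        self.pc = self.pc.wrapping_add(1);
        byte
    }

    // Executes one instruction and returns the cycles it took
    pub fn step(&mut self, memory: &[u8]) -> usize {
        let opcode = self.next_byte(memory);
        match opcode {
            0x00 => 4,                                      // NOP
            0x06 => { self.b = self.next_byte(memory); 8 }  // LD B, d8
            0x3E => { self.a = self.next_byte(memory); 8 }  // LD A, d8
            0x78 => { self.a = self.b; 4 }                  // LD A, B
            0x80 => {                                       // ADD A, B
                let (sum, carry) = self.a.overflowing_add(self.b);
                let half = (self.a & 0x0F) + (self.b & 0x0F) > 0x0F;
                self.f = 0; // N is always cleared by an addition
                if sum == 0 { self.f |= 0x80; }
                if half { self.f |= 0x20; }
                if carry { self.f |= 0x10; }
                self.a = sum;
                4
            }
            _ => panic!("Unknown Opcode: ${:02X} @ ${:04X}", opcode, self.pc),
        }
    }
}
```

Running the program 3E 0F 06 01 80 00 through four calls to step loads 0x0F and 0x01, adds them, and executes one NOP, for a total of 8 + 8 + 4 + 4 = 24 cycles. That total is the number the rest of the hardware would be advanced by.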

Video Display

Interestingly, the Gameboy has a very smooth screen refresh rate of roughly 60 Hz (about 59.7 Hz on real hardware). This means that it draws a new frame to the LCD screen about 60 times per second. Since we know how many times the CPU cycles per second, it is easy to time our video output. All that we have to do is divide the CPU cycles per second by the number of frames per second:

const CLOCK_SPEED: i32 = 4_194_304;
const FRAME_RATE: i32 = 60;

let cycles_per_frame = CLOCK_SPEED / FRAME_RATE;

Then the GPU struct needs a variable to count the number of cycles since the last frame. Once this counter goes above cycles_per_frame, then we know that it is time to draw a frame to the screen.
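A minimal sketch of such a counter is shown below. The FrameCounter type and its tick method are hypothetical names for this example. Note two small choices that differ from a naive version: the comparison uses >= rather than >, and the leftover cycles are carried into the next frame instead of being thrown away, so the timing does not drift:

```rust
const CLOCK_SPEED: usize = 4_194_304;
const FRAME_RATE: usize = 60;
// Integer division: 4_194_304 / 60 = 69_905 cycles per frame
const CYCLES_PER_FRAME: usize = CLOCK_SPEED / FRAME_RATE;

pub struct FrameCounter {
    cycles: usize,
}

impl FrameCounter {
    pub fn new() -> FrameCounter {
        FrameCounter { cycles: 0 }
    }

    /// Adds the cycles taken by the last instruction. Returns true when
    /// a full frame's worth of cycles has accumulated, keeping any
    /// excess for the next frame.
    pub fn tick(&mut self, cycles: usize) -> bool {
        self.cycles += cycles;
        if self.cycles >= CYCLES_PER_FRAME {
            self.cycles -= CYCLES_PER_FRAME;
            true
        } else {
            false
        }
    }
}
```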

When the Gameboy is running, there are certain times when the video hardware is accessing video memory. Because games are not allowed to write to video memory when the GPU is reading from it, it is important to emulate and keep track of these hardware timings.

The Gameboy’s screen resolution is 160x144 pixels. Many old video displays update the screen in scanlines, going left-to-right and top-to-bottom. Once a scanline is drawn, the beam goes through a horizontal blanking period in which it resets to the far left, one scanline down. Once all 144 scanlines have been drawn, it goes through the vertical blanking period and resets to the top of the screen to start the process again. This rendering process can be emulated by keeping track of which of these modes the GPU is currently in.

Special system interrupts are also raised whenever the GPU enters a new mode. This is useful because it enables programmers to create special per-scanline video effects.

The meat and potatoes of our GPU is a function that keeps track of the emulated cycles and updates the internal state of the GPU accordingly. Once enough time has passed that a frame can be drawn, the internal framebuffer is cloned and passed to the emulator to be displayed.
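The mode tracking can be pictured as a pure function from "cycles since the start of the frame" to a scanline number and a mode. The sketch below uses the commonly documented nominal timings: 456 cycles per scanline, split into 80 for the OAM search, 172 for the pixel transfer and the rest for H-Blank, with 144 visible lines followed by 10 lines of V-Blank. On real hardware the transfer length varies, so treat these constants and the mode_at name as assumptions of this example:

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum StatusMode {
    HBlank,
    VBlank,
    Oam,
    Transfer,
}

const OAM_CYCLES: usize = 80;
const TRANSFER_CYCLES: usize = 172;
const LINE_CYCLES: usize = 456;
const VISIBLE_LINES: usize = 144;
const TOTAL_LINES: usize = 154;

/// Returns (scanline, mode) for a cycle count measured from the start
/// of a frame. Counts beyond one frame wrap around.
pub fn mode_at(frame_cycle: usize) -> (usize, StatusMode) {
    let cycle = frame_cycle % (LINE_CYCLES * TOTAL_LINES);
    let line = cycle / LINE_CYCLES;
    let dot = cycle % LINE_CYCLES;

    let mode = if line >= VISIBLE_LINES {
        StatusMode::VBlank
    } else if dot < OAM_CYCLES {
        StatusMode::Oam
    } else if dot < OAM_CYCLES + TRANSFER_CYCLES {
        StatusMode::Transfer
    } else {
        StatusMode::HBlank
    };
    (line, mode)
}
```

For example, cycle 0 is the OAM search of line 0, cycle 80 starts the transfer, cycle 252 starts H-Blank, and cycle 144 * 456 is the first cycle of V-Blank.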

pub fn cycles(&mut self, cycles: usize, interrupt: &mut InterruptHandler, video_sink: &mut VideoSink) {
    if !self.display_enabled() {
        return;
    }
    let old_mode = self.get_mode();
    // Set when the mode we just entered has its STAT interrupt enabled
    let mut request_interrupt = false;
    self.scanline_cycles += cycles;
    self.frame_cycles += cycles;

    // All visible scanlines have been drawn, so we are in V-Blank
    if self.frame_cycles > FRAME_PERIOD {
        // We have just entered the V-Blank period
        if old_mode != StatusMode::VBlank {
            self.set_mode(StatusMode::VBlank);
            // Call the appropriate interrupt
            interrupt.request_interrupt(InterruptFlag::VBlank);
            request_interrupt = self.STAT.is_set(Bit::Bit4);
            // Hand the finished frame to the emulator
            video_sink.append(self.frame_buffer.clone());
        }
        // We have completed the V-Blank period, reset everything
        if self.frame_cycles > VBLANK_PERIOD {
            self.scanline_cycles = 0;
            self.frame_cycles = 0;
            self.LY.clear();
            self.line_compare(interrupt);
            self.set_mode(StatusMode::Oam);
        }
    } else {
        // Update the scanline state
        match self.scanline_cycles {
            0..=OAM_PERIOD => { // OAM
                if old_mode != StatusMode::Oam {
                    self.set_mode(StatusMode::Oam);
                    request_interrupt = self.STAT.is_set(Bit::Bit5);
                }
            },
            OAM_PERIOD..=TRANSFER_PERIOD => { // Transfer
                if old_mode != StatusMode::Transfer {
                    self.set_mode(StatusMode::Transfer);
                    // Transferring data from VRAM to screen.
                    self.update_scanline();
                }
            },
            TRANSFER_PERIOD..=HBLANK_PERIOD => { // H-Blank
                // We have just entered H-Blank
                if old_mode != StatusMode::HBlank {
                    self.set_mode(StatusMode::HBlank);
                    request_interrupt = self.STAT.is_set(Bit::Bit3);
                }
            },
            _ => {},
        }
    }

    // Request an interrupt if we need to
    if request_interrupt {
        interrupt.request_interrupt(InterruptFlag::Lcdc);
    }

    // If we have finished the H-Blank period, we are on a new line
    // LY is updated even if we are in V-Blank
    if self.scanline_cycles > HBLANK_PERIOD {
        self.LY.add(1);
        self.scanline_cycles = 0;
        self.line_compare(interrupt);
    }
}

The internal framebuffer is a vector of bytes sized to the dimensions of our screen:

frame_buffer: vec![0; FRAME_WIDTH * FRAME_HEIGHT],

Once per scanline, when the GPU enters the transfer mode, we update the framebuffer with the contents of VRAM. This is done in the update_scanline function:

fn update_scanline(&mut self) {
    // Each LCDC bit enables one of the three video layers
    if self.LCDC.is_set(Bit::Bit0) {
        self.draw_background();
    }
    if self.LCDC.is_set(Bit::Bit5) {
        self.draw_window();
    }
    if self.LCDC.is_set(Bit::Bit1) {
        self.draw_sprites();
    }
}

The Gameboy has 3 distinct video layers that are all made up of 8x8 pixel tiles.

The contents of each one are defined in their own special place in VRAM. The background layer is just that: a background of graphics to be drawn to the screen. The window layer is a static layer drawn above the background. The window is typically a mostly blank strip that displays player lives, scores, etc. Sprites are the small characters that move on screen, and are typically drawn on top of the background layer.

Each one is rendered slightly differently and the intricacies are beyond what I want to write about in this blog. A full overview of my GPU implementation can be found here.
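Without going into those intricacies, one building block is shared by all three layers: turning the bytes of an 8x8 tile into pixels. Each row of a tile is stored as two bytes. The first byte holds the low bit of each pixel's color and the second byte holds the high bit, with bit 7 being the leftmost pixel. A short sketch of that decoding follows; the function name is my own:

```rust
/// Decodes one 8-pixel row of a tile into color indices 0..=3.
/// `low` is the first byte of the row, `high` is the second.
pub fn decode_tile_row(low: u8, high: u8) -> [u8; 8] {
    let mut pixels = [0u8; 8];
    for (x, pixel) in pixels.iter_mut().enumerate() {
        // Bit 7 is the leftmost pixel, bit 0 the rightmost
        let bit = 7 - x;
        let lo = (low >> bit) & 1;
        let hi = (high >> bit) & 1;
        *pixel = (hi << 1) | lo;
    }
    pixels
}
```

For instance, the row bytes 0x3C and 0x7E decode to the color indices 0, 2, 3, 3, 3, 3, 2, 0. Those indices are then looked up in a palette to get the shade or color that ends up in the framebuffer.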

Joypad Input

There are 4 directional and 4 general buttons on the Gameboy. The game can read/write to a special address in memory to determine the state of the buttons. In order to emulate this, all that is needed is a struct that maps our host keyboard keypresses to the 8-bit memory register that the game expects to read from.

pub struct Joypad {
    button_state: u8,
    directional_state: u8,
    register: MemoryRegister,
    // Which group of keys the game asked for (used by read below)
    read_next: ReadInput,
}

When we press a button on our keyboard, we want to call a function that updates the internal state of our joypad. For convenience this can be called at the same interval that we refresh the screen (60 times per second).

fn read_input(&mut self) {
    self.gameboy.joypad.set_direction_pressed(PAD_UP, self.window.is_key_down(Key::Up));
    self.gameboy.joypad.set_direction_pressed(PAD_DOWN, self.window.is_key_down(Key::Down));
    self.gameboy.joypad.set_direction_pressed(PAD_LEFT, self.window.is_key_down(Key::Left));
    self.gameboy.joypad.set_direction_pressed(PAD_RIGHT, self.window.is_key_down(Key::Right));
    self.gameboy.joypad.set_button_pressed(BUTTON_A, self.window.is_key_down(Key::A));
    self.gameboy.joypad.set_button_pressed(BUTTON_B, self.window.is_key_down(Key::S));
    self.gameboy.joypad.set_button_pressed(BUTTON_START, self.window.is_key_down(Key::Z));
    self.gameboy.joypad.set_button_pressed(BUTTON_SELECT, self.window.is_key_down(Key::X));
}

And the set_button_pressed example method looks like this (set_direction_pressed is similar):

pub fn set_button_pressed(&mut self, interrupt: &mut InterruptHandler, input: u8, is_pressed: bool) {
    if is_pressed {
        // Set the button's bit and notify the game
        self.button_state |= input;
        interrupt.request_interrupt(InterruptFlag::Joypad);
    } else {
        // Clear the button's bit
        self.button_state &= !input;
    }
}

Then, whenever the special joypad register is read from, it will accurately return the state of the buttons pressed on your host keyboard, depending on whether the game requests directional or regular buttons:

pub fn read(&self) -> u8 {
    // Bits 4-5 hold the group selection written by the game
    let result = self.register.get() & 0x30;
    // Pressed keys read as 0, so invert the internal state
    let pad = match self.read_next {
        ReadInput::Directional => !self.directional_state,
        ReadInput::Buttons => !self.button_state,
    };
    result | (pad & 0x0F)
}
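To see the inverted logic in action, here is a stripped-down, runnable version of the joypad. It is a sketch with several assumptions: the interrupt request is left out, the selection is derived directly from bits 4 and 5 of the last value written (a cleared bit selects that group), and the bit assigned to each key follows the usual hardware layout. The constant values and the write method are not taken from the emulator's source:

```rust
// Bit positions within each 4-bit group
pub const PAD_RIGHT: u8 = 0x01;
pub const PAD_LEFT: u8 = 0x02;
pub const PAD_UP: u8 = 0x04;
pub const PAD_DOWN: u8 = 0x08;
pub const BUTTON_A: u8 = 0x01;
pub const BUTTON_B: u8 = 0x02;
pub const BUTTON_SELECT: u8 = 0x04;
pub const BUTTON_START: u8 = 0x08;

#[derive(Default)]
pub struct Joypad {
    button_state: u8,      // 1 = pressed (internal representation)
    directional_state: u8, // 1 = pressed (internal representation)
    select: u8,            // bits 4-5 as last written by the game
}

impl Joypad {
    /// The game writes here to choose which group it wants to read.
    pub fn write(&mut self, value: u8) {
        self.select = value & 0x30;
    }

    pub fn set_button_pressed(&mut self, input: u8, is_pressed: bool) {
        if is_pressed {
            self.button_state |= input;
        } else {
            self.button_state &= !input;
        }
    }

    pub fn set_direction_pressed(&mut self, input: u8, is_pressed: bool) {
        if is_pressed {
            self.directional_state |= input;
        } else {
            self.directional_state &= !input;
        }
    }

    /// Low nibble: 0 means pressed. A cleared select bit enables a group.
    pub fn read(&self) -> u8 {
        let mut pressed = 0;
        if (self.select & 0x10) == 0 {
            pressed |= self.directional_state;
        }
        if (self.select & 0x20) == 0 {
            pressed |= self.button_state;
        }
        self.select | (!pressed & 0x0F)
    }
}
```

With A and Up held down, writing 0x10 (buttons selected) makes read return 0x1E, and writing 0x20 (directions selected) makes it return 0x2B. In both cases the one cleared bit in the low nibble is the pressed key.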

Bringing it all together

The last thing that is needed is a struct that ties it all together. The Emulator struct holds our Gameboy object and a window object. We load the ROM file, pass it to our Gameboy, and initialize the window.

pub fn new(rom: Rom) -> Emulator {
    let mut title = "Gameboy Crust - ".to_owned();
    title.push_str(&rom.name());
    Emulator {
        gameboy: GameBoy::new(rom),
        window: Window::new(title.as_str(), 160, 144, WindowOptions {
            borderless: false,
            title: true,
            resize: false,
            scale: Scale::X4,
        }).unwrap(),
    }
}

Now we implement a run function that will be called from main and will loop until the window is closed or Escape is pressed. This is the high level function that emulates the Gameboy. It matches the 4.19 MHz speed of the CPU, and updates the video and reads joypad input at 60 Hz:

pub fn run(&mut self) {
    while self.window.is_open() && !self.window.is_key_down(Key::Escape) {
        let start_time = Instant::now();
        let frame_time = Duration::new(0, 16_600_000); // 16.6 ms as nanoseconds

        let mut video_sink = VideoSink::new();
        let cycles_per_frame = CLOCK_SPEED / FRAME_RATE;
        let mut emulated_cycles = 0;

        // Run the Gameboy for one frame's worth of CPU cycles
        while emulated_cycles <= cycles_per_frame {
            emulated_cycles += self.gameboy.step(&mut video_sink) as i32;
        }

        if let Some(frame) = video_sink.consume() {
            self.window.update_with_buffer(frame.as_slice()).unwrap();
            self.read_input();
        }

        // We have done our calculations, wait the remaining time
        let elapsed_time = start_time.elapsed();
        if elapsed_time < frame_time {
            thread::sleep(frame_time - elapsed_time);
        }
    }
}
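The frame pacing at the end of the loop can be isolated into a small helper, which also makes it easy to test without opening a window. This is only a sketch: the function names are hypothetical, and it derives the frame time from the frame rate (1 second / 60, about 16.67 ms) instead of hard-coding 16.6 ms:

```rust
use std::thread;
use std::time::{Duration, Instant};

const FRAME_RATE: u32 = 60;

/// Length of one frame at the target frame rate.
pub fn frame_time() -> Duration {
    Duration::from_secs(1) / FRAME_RATE
}

/// How long to sleep after `elapsed` time was spent emulating.
/// Returns zero if the frame already took too long.
pub fn remaining(elapsed: Duration) -> Duration {
    frame_time().saturating_sub(elapsed)
}

/// Sleeps for whatever is left of the frame that began at `start`.
pub fn pace(start: Instant) {
    thread::sleep(remaining(start.elapsed()));
}
```

Using saturating_sub avoids the explicit comparison: if emulating a frame took longer than the budget, the remaining time is simply zero and the loop moves straight on to the next frame.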

Conclusion

Programming emulators is a long, challenging, and highly technical task. This project took me quite some time, and I had good documentation! I didn’t even have to go through the more difficult process of reverse engineering the system. This project has given me a great deal of respect for the people who are reverse engineering and writing emulators for modern consoles.

Overall, it was a great learning experience. Stepping through and debugging an emulator really forces you to learn about the system, and it is very interesting to understand how these old game consoles work. I also learned quite a bit about how some of my favorite GB games were programmed.

Documentation