Writing a 6502 Emulator

12 minute read

Published:

I recently watched an awesome video from Computerphile about writing an emulator for the Atari 2600. The 2600 runs off a 6507 processor, which is basically a modified 6502. This got me thinking: how hard would it actually be to write an emulator for a 6502 computer? At this point I’ve already built a computer with one and am close to having a working Forth interpreter–so I’m pretty familiar with how the microprocessor works internally.

Emulation definitely gets harder as systems get more complicated, but for a small 8-bit microprocessor with a small instruction set, it’s not super hard to set up a project in C to accurately read compiled 6502 assembly and reproduce the behavior one would expect.

The complete source code is available on Github. Rather than go through it line by line, I just wanted to give a broad overview of the intuition behind the code. If you’re interested in the complete code though, definitely check out the code repository!

Core Setup and Assumptions

Let’s start with assumptions. There are a lot of ways to set up an emulator depending on what you’re trying to accomplish–my primary goal here is to be able to run compiled 6502 assembly code, so I’m not trying to support every use case or setup. Any other ICs connected to the computer will be implemented entirely in software, with the exact internal connections obfuscated.

I’m also going to assume that the code is correctly compiled 6502 assembly code. At the beginning, I’m focusing specifically on 6502 code, and once I have that working I’ll move on to implementing the updates included in the 65c02 line. I’m doing my best to emulate the correct clock speeds, and once I have a working implementation I’ll also try to replicate known bugs from a real 6502 (ex. the JMP instruction using indirect indexing to the bottom of the third page would incorrectly read from memory locations $30FF and $3000 rather than $30FF and $4000).

The internal setup of the computer is pretty simple. 6502s have three 8-bit registers (A the accumulator, X, and Y), an 8-bit register for status flags, an 8-bit stack pointer, and a 16-bit address register. In total, that’s five 8-bit registers and one 16-bit register. The system supports memory access from $0000-$FFFF, so we also need an array of 65,536 bytes. In C this is pretty simple, we just initialize some global variables:

// defining byte an unsigned 8-bit integer
typedef uint8_t byte

// Accumulator, X, Y registers
byte a, x, y;

// Program Counter
uint16_t pc;

// Stack Pointer
byte stackpointer;

// Flags
// N V - B D I Z C
byte flags;

// Address bus
uint16_t address;

// Memory, 0x10000 has indices 0x0000-0xFFFF
byte memory[0x10000];

memory is an array of bytes, so we can access it either with array syntax (ex. memory[0x0001] is the second byte), or by using pointer arithmetic (ex. *(memory+0x0001) would access the same value). This comes in pretty handy, since we can access the current byte the program is at at anytime using *(memory+pc).

Normally we’d also have a collection of 8 pins corresponding to the data bus, where the processor sends or receives data to/from external sources (resp.). However, as I mentioned previously, I’m going to skip this in favor of just implementing the connections in software. This means that, for now, we won’t have any errors with things like chips not being hooked up correctly or having incorrect address decoding. Programs are essentially treated as running on a system with a 6502 whose outputs are all mapped to 64KiB of RAM. In the future, I’ll probably add in a translation layer that implements software-level address decoding and correctly misses lookups on chips that aren’t connected.

For now though, it’s simple enough to implement read/writing bytes:

byte read_byte(byte *address){
  sleep(CLOCK_TIME);
  return (*address);
}

void write_byte(byte *address, byte value){
  sleep(CLOCK_TIME);
  *address = value;
  return;
}

uint16_t read_address(byte offset){
  uint16_t val = read_byte(memory+offset+1);
  val <<= 8;
  val |= read_byte(memory+offset);
  return(val);
}

CLOCK_TIME here is a constant defined elsewhere such that sleep(CLOCK_TIME) pauses for a single clock cycle. This means that read_byte() and write_byte() both take a single clock cycle, which is roughly what happens on the 6502. There is some error in this timing due to the speed of the C code execution, but with how fast processors are nowadays that overhead should be negligible.

read_address() is another little function I wrote to help me with reading out addresses, since this happens a lot in 6502 instructions. 6502 is a little endian computer, meaning the least significant byte is store at the lowest address. This function ensures I don’t ever mess that up (since I end up looking up the difference between little and big endian pretty much every time I write a function using addresses).

Opcode Decoding

Okay, so we now have some data and registers. The next big part is a way to determine what to do for each operation. Every instruction for the 6502 is a single byte (so a number from 0x00 to 0xFF). Based on its value, we may need to read up to two additional bytes depending on the instruction. For example, if we want to jump to the code at $0140, we would need the opcode for JMP with absolute addressing, then the address $0140 (two bytes). This would look like 6C 40 01, since 6C corresponds to JMP(abs) and $0140 is entered in little endian format.

There’s a great resource online for some good rules to decode opcodes. The naive approach is to just switch...case on all 255 possible values, but that’s pretty inefficient. Some of the opcodes aren’t used, and others have a lot of patterns.

Each opcode is a byte, meaning 8 bits. Based on their values, they fall into seven main groups: Groups 1-3 (G1,G2,G3), Single Byte 1-2 (SB1 and SB2), Conditionals, and Interrupt/Subroutine (I/S). Groups 1-3 are named based on the MC6500 Microcomputer Family Programming Manual, as mentioned in the previous link. The other three groups are ones I named myself based on their patterns–they’re a little bit of a mixed bag.

SB1 is the simplest group–if the low nibble of the opcode is 0x8, then its in SB1. These are all single byte instructions, meaning we don’t have to read any bytes past the value. An easy example is opcode 0xE8, which corresponds to INX and increments the X register by one.

SB2 are also single byte instructions, and are all instructions where the low nibble is 0xA and the upper nibble is at least 0x8, so anything of the form 1000 1010.

If the instruction isn’t in SB1 or SB2, it’s probably in one of the main groups, G1-3. All of the opcodes in these groups follow the same pattern: If we regard the bits as aaa bbb cc, then cc determines which group we’re in, bbb determines the addressing mode, and aaa determines the operation. For example, G1 has cc=01, LDA has aaa=101, and the immediate addressing mode for G1 has bbb=010. Thus, LDA # has opcode 101 010 01 = 1010 1001 = 0xA9. The addressing modes are slightly different between G1 and G2-3, so I implemented two different functions to decode the addressing mode depending on which group it belongs to.

Since read_byte and write_byte take as input a pointer to a byte, the address mode decoding function can simply return the pointer to the value we’re going to access. For immediate instructions this is simply memory+pc, for absolute instructions it’s memory + *(memory+pc), and similarly for other addressing modes. When we need to use a register, we can just pass the address of the register (ex. &a for operations on the accumulator).

Within G3 is a special subgroup, the Conditionals group. These are all of the form xxy 100 00, and branch conditionally on a specified value. xx determines which flag to check (N,V,C,Z for 00, 01, 10, 11 respectively), and y determines the value to check against. For example, 0xB0 = 101 100 00 branches if C == 1, which is the Branch if Carry Set (BCS) instruction.

The last group is sort of a weird bag of leftovers, the I/S group. This only has four instructions: BRK (0x00), JSR (0x20), RTI (0x40), and RTS (0x60). These are all of the form 0aa 000 000.

The final flow of the opcode decoding logic looks like this:


byte opcode = some_value;
byte highnibble = opcode >> 4;
byte lownibble = opcode & 0x0F;
if (lownibble == 8){
  // SB1 Logic
} else if (lownibble == 0xA && highnibble > 7){
  // SB2 Logic
} else {
  byte aaa = (opcode & 0xE0) >> 5;
  byte bbb = (opcode & 0x1C) >> 2; 
  byte cc  = opcode & 0x03; 
    
  switch(cc){
    case 1:
      // G1 address decoding and opcode logic
      break;
    case 2:
      // G2 address decoding and opcode logic
      break;
    case 3:
      if (bbb == 4){
        // Conditional Branching logic
      } else if (bbb == 0 && !(aaa & 0x4)){
        // I/S Logic
      } else {
        // G3 address decoding and opcode logic
      }
      break;
  }
}

The actual address mode decoding and opcodes are a bunch of switch...case statements, so I’m not going to include them. The result is a lot fewer than 255 case statements, but there are still tons of them in the codebase.

The opcodes themselves are pretty simple to implement at this point. We have a pointer to the data we’ll need and the operation, and the logic behind each operation is very basic. For example, for ORA, we just do:

void ORA(byte *addr){
  // OR value at addr with the accumulator
  write_byte(&a, a | (*addr));

  // Set N,Z flags (0x7D = 0111 1101)
  flags = (flags & 0x7D) | 
            ((a & 0x80)) |           // N
            ((a==0) << 1);           // Z
  return;
}

The only trick here is making sure that we’re updating the flags correctly. Some of these are pretty funky, like the V flag, and some updated in situations I wasn’t aware of (for instance, LDA updates N,Z depending on the value loaded). I also learned a lot about some instructions I didn’t know existed, like PHP and PLP.

At this point, I’ve written software to decode all the opcodes and run the logic behind them accordingly.

Bugs and Other Stuff

One of the interesting things about the actual 6502 is that it has a lot of bugs in it. In order to accurately emulate 6502s running, I’m eventually going to try to incorporate these bugs. For example, I learned that if a G1-3 opcode with cc = 11 is supplied, it will interpret it as both a G1 and G2 opcode. This means that if you somehow passed opcode 0x6B = 0110 1011, it’ll see a G1 code of 0x69 (ADC immediate) and the G2 opcode 0x6A (ROR on accumulator). This actually executes both instructions simultaneously, meaning it will try to add a value to the accumulator while also rotating it. As mentioned in documentation, sometimes this resolves correctly (ex. when different registers are referenced), sometimes it has weird behavior (ex. when one instruction is a hidden instruction), and sometimes it’s totally random (ex. STA and STX stores A AND X, sometimes with an extra constant depending on the manufacturer). “Hidden” instructions are already implemented with my setup, and are created by combining a function and opcode that don’t make sense. For example, 0x89 corresponds to STA immediate, which doesn’t make any sense (but will still execute).

Next Steps

That’s it for this blog post! I spent a lot of time learning how to make Makefiles for this, as well as getting my include guards to correctly guard against endless recursive includes. I write a lot of C code for R, but it’s been a whiel since I wrote a standalone C project…so I was a little rusty. Next steps are going to be writing some logic to actually load a compiled file into memory, and then I’ll start going through all my functions to make sure they’re working as intended. As part of that, I’ll be writing a bunch of unit tests to comprehensively test everything.

As always, you can check out the complete code on Github. Thanks for reading!