6502 FORTH, Part 2: NEXT, EXIT, and DOLIST

6 minute read

Published:

In my previous post, I set up my development environment for creating a Forth interpreter from scratch. The next step is to create the foundational Forth operators next, exit, and dolist.

A Quick Background on Forth

Forth is an entirely stack-oriented language that uses functions called words. Each word executes a distinct set of instructions, and new words can be defined on the fly by programmers at runtime, either in assembly or in Forth (depending on the implementation). All data live in a data stack, and words operate by pushing or pulling values from this stack. Because of this, words operate in reverse Polish notation. As a quick example, to multiply the numbers 10 and 25 together, the internal code does the following:


#>                (stack) ->
#> 25             (stack) -> 25
#> 10             (stack) -> 10 -> 25
#> *              ...

      *  stack pops two values  *
      * stack multiplies values *
      *  stack pushes result    *

ok.               (stack) -> 250

#> CR .           (stack) -> 250 -> 250
250               (stack) -> 250

We could have written this as a single line with 25 10 * CR .. The last two commands here, CR ., start output on a new line and print out the top value of the stack (resp.).

Internally, Forth keeps a dictionary of all the words it recognizes as a linked list, along with pointers to where the code for that routine is stores. When a word is given as input, the interpreter searches the dictionary for that label, then executes the associated code. This linked list is also implemented as a stack, so most recently added words are searched first. As a result, words can be easily redefined at runtime if desired.

The interpreter depends on a small set of core words to make all this execution happen, the most important of which are next, exit, and dolist. dolist begins the execution of a compiled word, next is responsible for executing underlying instructions sequentially within a single word, and then exit returns back to the previous execution context. For any word, the execution proceeds as follows:

(word called)
DOLIST -> NEXT -> EXIT -> NEXT -> ...

next may spawn child processes that call dolist as well. This process continues until we’re all out of instructions to execute, at which point the program terminates successfully and prints ok.

Since execution of any word depends on these three words, I’m going to implement them first.

Previous Definitions:

IP=$50                     ; Forth instruction pointer
RP=$52                     ; return stack pointer
DT=$54                     ; pointer to top of dictionary stack
TMP1=$56                   ; temp value
TMP2=$58                   ; temp value

;; 1 byte variables
TPTR=$5A
TCNT=$5B

;; other
DPTR=$5C
INPUT=$7F00                ; input space
WORDSPC=$7EC0              ; temp space for parsing words (<=63 chars)

DOLIST

Every time dolist is called, it has to execute the following instructions:

  1. Store the instruction pointer on the return stack pointer
  2. Set the instruction pointer to the new word to be executed
  3. Call next to start executing the word

These are all little-endian 16 bit addresses, so we have to store each byte separately.

;;; DOLIST Definition ;;;
dolist:
  ;; First, we store the instruction pointer on the return stack
  lda IP
  sta (RP)
  inc RP
  lda IP+1
  sta (RP)
  inc RP

  ;; Next, we get the address stored at the location in IP 
  ;; (double indirect access) and store in TMP1
  lda (IP)
  sta TMP1
  ldy #1
  lda (IP),y
  sta IP+1
  lda TMP1
  sta IP

  ;; IP now points to the code word of defined word we want
  ;; to execute, so we can just fall through to next
  jmp next

NEXT

next is called immediately after dolist and exit.

;;; NEXT Definition ;;;
next:
  ;; finding the location of the next word
  ;; we advance 2 bytes
  ;; bne skips upper byte if we haven't rolled over
  ;; ex. if the address at IP,IP+1 is $00FF, INC IP
  ;;     produces $0000, when we'd need $0100
.(
  inc IP
  bne .cont1
  inc IP+1
  continue:
.)
.(
  inc IP
  bne .cont2
  inc IP+1
  continue:
.)
  ;; IP now points at location of next word to execute.
  ;; We need to fetch that location, then store it in TMP1.
  ldy #0 
  lda (IP),y
  sta TMP1
  iny
  lda (IP),y
  sta TMP1+1

  ;; Now we load the code address stored in TMP1 into TMP2
  lda (TMP1),y
  sta TMP2+1
  dey
  lda (TMP1)
  sta TMP2

  ;; Finally, we jump to the address stored in TMP2 to execute
  jmp (TMP2)  

This .() syntax was a little new for me coming from vasm–the parentheses define a local scoping, so we can reuse the continue label and the program will jump to the label defined in the most recent scope.

EXIT

The job of exit is to undo all the intermediate pointer manipulation done by next and dolist in order to return the execution context to wherever it was when dolist was initially called. This takes the form of the following steps:

  1. Remove the last address from the return stack
  2. Reset the instruction pointer to the value it previously was at
  3. Execute the next instruction
;;; EXIT Definition ;;;
exit:
  ;; remove an address from return stack
  dec RP
  dec RP

  ;; Take the value that was on return stack
  ;; and replace it in instruction pointer
  ;; (recall that in DOLIST we stored IP in RP and then incremented RP by two)
  ;; (this resets RP to where it was, and pops the value we stored back into IP)
  ldy #1
  lda (RP),y
  sta IP+1
  lda (RP)
  sta IP

  ;; Finally, execute the next instruction
  jmp next

Next Steps

I’m still a little ways off from being able to test this even on the simulated 65c02, but it is progress! Next steps will include adding the minimal amount of features necessary to be able to test that this part of the code works correctly.