6502 FORTH, Part 2: NEXT, EXIT, and DOLIST
Published:
In my previous post, I set up my development environment for creating a Forth interpreter from scratch. The next step is to create the foundational Forth operators next
, exit
, and dolist
.
A Quick Background on Forth
Forth is an entirely stack-oriented language that uses functions called words. Each word executes a distinct set of instructions, and new words can be defined on the fly by programmers at runtime, either in assembly or in Forth (depending on the implementation). All data live in a data stack, and words operate by pushing or pulling values from this stack. Because of this, words operate in reverse Polish notation. As a quick example, to multiply the numbers 10 and 25 together, the internal code does the following:
#> (stack) ->
#> 25 (stack) -> 25
#> 10 (stack) -> 10 -> 25
#> * ...
* stack pops two values *
* stack multiplies values *
* stack pushes result *
ok. (stack) -> 250
#> CR . (stack) -> 250 -> 250
250 (stack) -> 250
We could have written this as a single line with 25 10 * CR .
. The last two commands here, CR .
, start output on a new line and print out the top value of the stack (resp.).
Internally, Forth keeps a dictionary of all the words it recognizes as a linked list, along with pointers to where the code for that routine is stores. When a word is given as input, the interpreter searches the dictionary for that label, then executes the associated code. This linked list is also implemented as a stack, so most recently added words are searched first. As a result, words can be easily redefined at runtime if desired.
The interpreter depends on a small set of core words to make all this execution happen, the most important of which are next
, exit
, and dolist
. dolist
begins the execution of a compiled word, next
is responsible for executing underlying instructions sequentially within a single word, and then exit
returns back to the previous execution context. For any word, the execution proceeds as follows:
(word called)
DOLIST -> NEXT -> EXIT -> NEXT -> ...
next
may spawn child processes that call dolist
as well. This process continues until we’re all out of instructions to execute, at which point the program terminates successfully and prints ok.
Since execution of any word depends on these three words, I’m going to implement them first.
Previous Definitions:
IP=$50 ; Forth instruction pointer
RP=$52 ; return stack pointer
DT=$54 ; pointer to top of dictionary stack
TMP1=$56 ; temp value
TMP2=$58 ; temp value
;; 1 byte variables
TPTR=$5A
TCNT=$5B
;; other
DPTR=$5C
INPUT=$7F00 ; input space
WORDSPC=$7EC0 ; temp space for parsing words (<=63 chars)
DOLIST
Every time dolist
is called, it has to execute the following instructions:
- Store the instruction pointer on the return stack pointer
- Set the instruction pointer to the new word to be executed
- Call
next
to start executing the word
These are all little-endian 16 bit addresses, so we have to store each byte separately.
;;; DOLIST Definition ;;;
dolist:
;; First, we store the instruction pointer on the return stack
lda IP
sta (RP)
inc RP
lda IP+1
sta (RP)
inc RP
;; Next, we get the address stored at the location in IP
;; (double indirect access) and store in TMP1
lda (IP)
sta TMP1
ldy #1
lda (IP),y
sta IP+1
lda TMP1
sta IP
;; IP now points to the code word of defined word we want
;; to execute, so we can just fall through to next
jmp next
NEXT
next
is called immediately after dolist
and exit
.
;;; NEXT Definition ;;;
next:
;; finding the location of the next word
;; we advance 2 bytes
;; bne skips upper byte if we haven't rolled over
;; ex. if the address at IP,IP+1 is $00FF, INC IP
;; produces $0000, when we'd need $0100
.(
inc IP
bne .cont1
inc IP+1
continue:
.)
.(
inc IP
bne .cont2
inc IP+1
continue:
.)
;; IP now points at location of next word to execute.
;; We need to fetch that location, then store it in TMP1.
ldy #0
lda (IP),y
sta TMP1
iny
lda (IP),y
sta TMP1+1
;; Now we load the code address stored in TMP1 into TMP2
lda (TMP1),y
sta TMP2+1
dey
lda (TMP1)
sta TMP2
;; Finally, we jump to the address stored in TMP2 to execute
jmp (TMP2)
This .()
syntax was a little new for me coming from vasm
–the parentheses define a local scoping, so we can reuse the continue
label and the program will jump to the label defined in the most recent scope.
EXIT
The job of exit
is to undo all the intermediate pointer manipulation done by next
and dolist
in order to return the execution context to wherever it was when dolist
was initially called. This takes the form of the following steps:
- Remove the last address from the return stack
- Reset the instruction pointer to the value it previously was at
- Execute the next instruction
;;; EXIT Definition ;;;
exit:
;; remove an address from return stack
dec RP
dec RP
;; Take the value that was on return stack
;; and replace it in instruction pointer
;; (recall that in DOLIST we stored IP in RP and then incremented RP by two)
;; (this resets RP to where it was, and pops the value we stored back into IP)
ldy #1
lda (RP),y
sta IP+1
lda (RP)
sta IP
;; Finally, execute the next instruction
jmp next
Next Steps
I’m still a little ways off from being able to test this even on the simulated 65c02, but it is progress! Next steps will include adding the minimal amount of features necessary to be able to test that this part of the code works correctly.