In my previous post, I set up my development environment for creating a Forth interpreter from scratch. The next step is to create the foundational Forth operators
A Quick Background on Forth
Forth is an entirely stack-oriented language that uses functions called words. Each word executes a distinct set of instructions, and new words can be defined on the fly by programmers at runtime, either in assembly or in Forth (depending on the implementation). All data live in a data stack, and words operate by pushing or pulling values from this stack. Because of this, words operate in reverse Polish notation. As a quick example, to multiply the numbers 10 and 25 together, the internal code does the following:
#> (stack) -> #> 25 (stack) -> 25 #> 10 (stack) -> 10 -> 25 #> * ... * stack pops two values * * stack multiplies values * * stack pushes result * ok. (stack) -> 250 #> CR . (stack) -> 250 -> 250 250 (stack) -> 250
We could have written this as a single line with
25 10 * CR .. The last two commands here,
CR ., start output on a new line and print out the top value of the stack (resp.).
Internally, Forth keeps a dictionary of all the words it recognizes as a linked list, along with pointers to where the code for that routine is stores. When a word is given as input, the interpreter searches the dictionary for that label, then executes the associated code. This linked list is also implemented as a stack, so most recently added words are searched first. As a result, words can be easily redefined at runtime if desired.
The interpreter depends on a small set of core words to make all this execution happen, the most important of which are
dolist begins the execution of a compiled word,
next is responsible for executing underlying instructions sequentially within a single word, and then
exit returns back to the previous execution context. For any word, the execution proceeds as follows:
(word called) DOLIST -> NEXT -> EXIT -> NEXT -> ...
next may spawn child processes that call
dolist as well. This process continues until we’re all out of instructions to execute, at which point the program terminates successfully and prints
Since execution of any word depends on these three words, I’m going to implement them first.
dolist is called, it has to execute the following instructions:
- Store the instruction pointer on the return stack pointer
- Set the instruction pointer to the new word to be executed
nextto start executing the word
These are all little-endian 16 bit addresses, so we have to store each byte separately.
next is called immediately after
.() syntax was a little new for me coming from
vasm–the parentheses define a local scoping, so we can reuse the
continue label and the program will jump to the label defined in the most recent scope.
The job of
exit is to undo all the intermediate pointer manipulation done by
dolist in order to return the execution context to wherever it was when
dolist was initially called. This takes the form of the following steps:
- Remove the last address from the return stack
- Reset the instruction pointer to the value it previously was at
- Execute the next instruction
I’m still a little ways off from being able to test this even on the simulated 65c02, but it is progress! Next steps will include adding the minimal amount of features necessary to be able to test that this part of the code works correctly.