gemini://bbs.geminispace.org/s/FORTH/5444

Some popular languages are exactly what they are, syntactically. C, for instance, has no provisions for extending itself, other than typedef for providing names for simple or composite _data_ types. A more interesting language, like Lisp or Forth, is designed to be extended, and is largely written in itself.

Forth, in particular, has a very straightforward way to write code that generates code. Traditional Forth has two distinct modes of operation: interpreter and compiler. When interpreting, words (symbolic names roughly standing for procedures) entered into the system are executed, one by one. When compiling, Forth appends to the dictionary what amounts to calls to such procedures.

Classic Forth is normally in interpretive mode. To define a word, the ':' operator creates the metadata (name, type, etc.) for a word, and the system compiles all the words that follow, up to the ';' word, which roughly represents a return from a procedure and places the system into interpretive mode again.

Oops, I used the word 'operator' for the ':', because something is weird here - the colon, when interpreted, starts the compiler! And note that the semicolon, when compiled, must execute some code to switch to interpretive mode.

In reality, every word in Forth has at least two distinct behaviors: interpret-time and run-time. A numeric literal puts its value onto the data stack when interpreted, and compiles code to put the value onto the stack at some later time, when that code is executed. A normal, procedure-like word will run when interpreted and compile a call to itself in compile mode. And so on. A special bit designates a word as IMMEDIATE -- a word which is executed when encountered, and can therefore take over the parsing or compilation, temporarily.
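The mode flag and the IMMEDIATE bit described above can be sketched as a toy model. This is Python, not Forth, and certainly not any particular Forth's implementation; the class name ToyForth and its methods are made up for illustration, and only a handful of words are defined.

```python
# Toy model of a classic two-mode outer interpreter (illustrative only).
class ToyForth:
    def __init__(self):
        self.stack = []          # data stack
        self.compiling = False   # the interpret/compile mode flag
        self.current = None      # (name, body) of the definition in progress
        # dictionary: name -> (behavior, immediate flag)
        self.words = {
            "+":   (lambda: self.stack.append(self.stack.pop() + self.stack.pop()), False),
            "*":   (lambda: self.stack.append(self.stack.pop() * self.stack.pop()), False),
            "dup": (lambda: self.stack.append(self.stack[-1]), False),
            ";":   (self._semicolon, True),   # IMMEDIATE: runs even while compiling
        }

    def _semicolon(self):
        # Finish the definition and switch back to interpret mode.
        name, body = self.current

        def run(body=body):
            for step in body:
                step()

        self.words[name] = (run, False)
        self.current = None
        self.compiling = False

    def feed(self, text):
        tokens = iter(text.split())
        for tok in tokens:
            if tok == ":" and not self.compiling:
                # The colon, when interpreted, starts the compiler.
                self.current = (next(tokens), [])
                self.compiling = True
            elif tok in self.words:
                behavior, immediate = self.words[tok]
                if self.compiling and not immediate:
                    self.current[1].append(behavior)   # compile a call
                else:
                    behavior()                         # execute right now
            else:
                n = int(tok)                           # numeric literal
                if self.compiling:
                    self.current[1].append(lambda n=n: self.stack.append(n))
                else:
                    self.stack.append(n)
```

Feeding it ": square dup * ;" and then "3 square 4 +" compiles the first line and interprets the second. The only word that runs while compiling is the semicolon, because its immediate flag is set.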

This arrangement leads to a reasonable syntax a lot of the time: you type in some commands which get interpreted, define some words (which get compiled), etc. But the duality of state creates weird situations. Conditionals don't work - because, when an 'IF' gets interpreted, there is nothing compiled to jump around or into. So special versions of words are created for interpret-time, such as [IF] and [THEN]. This is annoying and inconsistent.

I have a long history of fighting with this inconsistency, and thought I had it beat by abolishing the modes altogether. Most of the Forths I've written are compilers (some even native compilers). I simply compile whatever comes in, and simulate interpretation when appropriate by compiling, running, and eradicating the compiled code.

Normally, this creates a system that appears much like a traditional Forth. However, there is no interpret/compile duality. All code compiles the same exact way, and the only trace of the 'interpret' mode is when we decide to run/erase it or just leave it compiled. Problem solved, I thought.
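A minimal sketch of the compile, run, erase idea, again as a Python toy rather than actual nForth code. The names CompileOnly and pseudo_interpret are hypothetical, and the "dictionary" is just a list of callables.

```python
# Toy model: there is no interpret mode; "interpreting" means
# compile the input, run it, then erase it from the dictionary.
class CompileOnly:
    def __init__(self):
        self.dictionary = []   # flat list of compiled steps
        self.stack = []        # data stack

    def compile_token(self, tok):
        if tok == "+":
            self.dictionary.append(
                lambda: self.stack.append(self.stack.pop() + self.stack.pop()))
        else:
            n = int(tok)
            self.dictionary.append(lambda n=n: self.stack.append(n))

    def pseudo_interpret(self, text):
        mark = len(self.dictionary)            # where this input starts
        for tok in text.split():
            self.compile_token(tok)            # everything compiles the same way
        for step in self.dictionary[mark:]:    # run what was just compiled
            step()
        del self.dictionary[mark:]             # and erase it again
```

After pseudo_interpret("2 3 +") the result is on the stack and the dictionary is back to where it started, which is what makes the system look like a traditional interpreter from the outside.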

But, as is often the case, I did not really solve the problem, but simply displaced it, and it rears its ugly head elsewhere. For instance, when Forth interprets code, it executes it directly where it's defined. In my system, the code is compiled into the dictionary, and at least starts executing there. That means that code that compiles code has to account for the fact that it's been invoked from the dictionary -- should it compile code after the invocation and leave garbage in the dictionary? Or should it overwrite the invocation which would normally be destroyed (and risk destroying the code it will return to, crashing the system)?

This becomes obvious in the word 'included;' which loads and compiles a file containing Forth code. Consider:

This will not work, because the double-quote will compile the filename as a string into the dictionary, and execute immediately in interpret mode, leaving a pointer to the string (and count) on the stack. The system will then promptly erase what it just compiled, string included, and any attempt to use the string to open a file will fail.
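The dangling string can be imitated with a toy byte-array dictionary in Python. The file name demo.f and the helper name are invented for the example. In a real system the bytes would likely linger until something overwrote them, so the failure would be less tidy than the empty result here.

```python
# Toy model of a dictionary-resident string that does not survive the erase step.
dictionary = bytearray(b"KERNEL")   # pretend existing dictionary contents
stack = []                          # data stack

def compile_and_run_string(text):
    """Compile a string literal at the end of the dictionary, then 'run' it
    by pushing its address and count onto the stack."""
    addr = len(dictionary)
    dictionary.extend(text.encode("ascii"))
    stack.append(addr)
    stack.append(len(text))

mark = len(dictionary)
compile_and_run_string("demo.f")    # hypothetical file name
del dictionary[mark:]               # pseudo-interpretation erases the compiled code

count = stack.pop()
addr = stack.pop()
# What a later word would find when it follows the pointer:
filename = bytes(dictionary[addr:addr + count])
```

The address and count on the stack are still perfectly good numbers; it is the thing they point to that is gone.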

So we have to at least group the expression to be compiled/executed together, using the parentheses:

This kind of works, but included must compile a bunch of stuff from the file. If it does so, the filename and the call to included will be stuck in the dictionary, and at the end of the load, upon hitting the end of the expression, the system will erase what it just loaded (as part of normal pseudo-interpretation).

And so, included must be really smart: it needs to erase all traces of itself and the filename prior to loading and compiling the file, and then it needs to convince the system to _not_ erase what it just compiled, after it returns. Yuck.

And so, the syntax is a little more involved. First of all, we force the execution of the block using square braces (which compile and always execute the block as a unit). Then we make included communicate with the block machinery (and we name it included; to indicate that after its execution nothing else in the block will run). So the final result is

Since there is no interpret/compile mode flag, and we are in interpret mode, how do we compile a new function? To define a word we would have to:

Here braces indicate that we are explicitly switching to compile mode. Although it's eerily C-like, and probably usable, I really hated every minute of it.

And so I did the unthinkable: I put the system into compile-dominant mode. Now we compile by default, and execute only by explicitly marking execution blocks.

This streamlines the definition syntax by eliminating the braces, but makes command-line interaction more annoying: anything typed in does not run unless it's in square brackets. This makes development tedious. Luckily, there is another missing piece which helps the interaction.

Working with Forth, in practice, requires marking a known-good state and interactively messing around, often failing, which results in adding junk to the dictionary. We then want to restore the known-good state and try again. To do that, I added a command familiar to old-school unix people:

sync creates a commit point in the dictionary, allowing us to add any kind of crap interactively, and then `abort` to the commit point -- or create a new one with another `sync`.
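The commit-point behavior can be sketched in a few lines of Python. This is a toy with made-up names (Dictionary, define), assuming the dictionary is a simple append-only list; it only shows what sync and abort do to it.

```python
# Toy model of a dictionary with a commit point.
class Dictionary:
    def __init__(self):
        self.entries = []   # compiled definitions, oldest first
        self.commit = 0     # index of the current commit point

    def define(self, name):
        self.entries.append(name)

    def sync(self):
        """Mark everything defined so far as known-good."""
        self.commit = len(self.entries)

    def abort(self):
        """Discard everything added since the last sync."""
        del self.entries[self.commit:]
```

Define something, sync, add junk, abort: the junk is gone and the synced definitions remain. A second sync simply moves the commit point forward.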

The same machinery makes interaction easier: instead of using square braces we can simply run from the commit-point using the .. command:

except a lot easier on the eyes (although the square brace works anywhere, not just on top of a sync point!)

By eliminating the interpret/compile flag, I thought I had tamed the beast of two distinct modes. However, I may have introduced the alternative: an infinity of distinct modes! Consider `7 VARIABLE FOO`:

In classic Forth, 7 is placed onto the stack, then a variable FOO is created in the dictionary, and 7 is placed into it. What happens in nForth, in its compile-dominant mode?

Let's say VARIABLE is an immediate word (like in classic Forth, since it has to compile). So we compile code to put 7 on the stack, then VARIABLE creates FOO and loads it with the value on the stack -- which is not 7, because 7 has not been executed yet. It's loaded with whatever is on the stack as we compile.

Easy, you say. Just put it into parentheses so it gets compiled together. ( 7 VARIABLE FOO ) -- but it does not work, because again, 7 compiles, but VARIABLE is run as soon as it's parsed, and 7 executes later.

Oh, the square brace - then the whole thing is interpreted together, right? No, because the expression is first compiled, and VARIABLE is executed right then and there during compilation, and again, before 7 gets put on the stack!

We need to forcibly execute 7 to put it on the stack, before our Forth gets to VARIABLE. Now the whole thing works.
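The ordering problem can be reproduced with a small Python toy. Everything here is illustrative: the function process, the run_literals_now switch (which stands in for forcing 7 to execute ahead of VARIABLE), and the use of None for "whatever happened to be on the stack" are assumptions of the sketch, not nForth behavior.

```python
# Toy model of compile-dominant mode with an immediate VARIABLE.
stack = []        # data stack
compiled = []     # code compiled so far, not yet executed
variables = {}    # name -> initial value captured by VARIABLE

def process(tokens, run_literals_now=False):
    """Walk the tokens in compile-dominant mode.
    Literals are compiled (deferred) unless run_literals_now is True.
    VARIABLE is immediate: it runs as soon as it is parsed."""
    it = iter(tokens)
    for tok in it:
        if tok == "VARIABLE":
            name = next(it)
            # Immediate: takes whatever is on the stack right now.
            variables[name] = stack.pop() if stack else None
        else:
            n = int(tok)
            if run_literals_now:
                stack.append(n)                               # executed first
            else:
                compiled.append(lambda n=n: stack.append(n))  # executed later

process("7 VARIABLE FOO".split())                          # 7 is only compiled
process("7 VARIABLE BAR".split(), run_literals_now=True)   # 7 runs before VARIABLE
```

FOO ends up without its 7, because the literal is still sitting in the compiled code when VARIABLE runs. BAR gets 7, because the literal was executed before VARIABLE was reached.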

Lambdas are compiled expressions which, when executed, return an address which, when executed, will run the expression... When compiling in a definition, it is simplish:

But if you do it on the command line, how do you get its address? It will compile it, but without running into it the address does not get stacked. And the only tool we have right now is to enclose the lambda into braces, which will give us its address, but will also destroy the lambda!

The whole experience has left me a little shellshocked for the last few days. I thought that after decades of crafting threaded interpreters I understood their workings inside and out, but life finds a way of making an idiot out of me yet again. Information is like that squishy toy with parts of it that pop out -- you squeeze one part in and there is a big bubble elsewhere...

Or consider that you can run a block anywhere, anytime, as part of some other compilation or interpretation -- but only if you agree to destroy it immediately. Should there be a way to compile and execute but not destroy? And words that are immediate are executed as soon as encountered, regardless of how the blocks are structured or if we are compiling or executing -- should they simply be reduced by one level of immediacy instead (and execute when compiled, but be compiled when meta-compiled)? Do we actually have multiple levels of compilation and multiple levels of interpretation? Has classic Forth just hidden it by flattening the entire infinity into two possibilities? ... So many possibilities...

Sorry, I don't mean to be a pedantic ass. This is sort of a set of implementation notes, thinking out loud.

Metaprogramming and nForth

How do we include a file?

More Syntax Annoyances

But It Gets Even Worse

What about Lambdas?

Clearly, More Thinking Is Required.
