
What it feels like to write a parser with yacc


Or why it's still a pretty good tool for the job


You may have noticed that the old website is gone. This new site uses almost the same style as the previous one, with one notable difference: the pages are generated from the gemtext ones.


I had been toying with the idea of unifying the geminispace and web experience by writing my content only in gemtext. Since I previously used markdown, I thought it would be a pretty easy transition, so I started to investigate different approaches. The only constraint, as usual, was to write the software myself because, well, it's just fun.


The first thing that came to mind was to write a proxy that would translate things on the fly: it would request the pages from my gmnxd(8) and return a nice HTML page to the client. By itself, this kind of software is quite simple to write, particularly in a language like Go, but I didn't want to run another daemon when I already have httpd(8) running. Even writing it as a FastCGI application wouldn't make much sense.


Also, I don't have that much free RAM for another standalone daemon: while gmnxd(8) handles just one request and exits, a proxy has to run pretty much all the time to serve requests. As an example of what a proxy is like:


Drew's proxy solution, in Go.


But it does not fit my needs, so out of the window it goes. I then decided to define my requirements better:


It should be transparent, meaning, no additional steps to publish pages.

It must be as light as possible.

It must be able to generate standard HTML5.

It must be able to link a custom CSS file, for better styling.


Omar Polo was doing something similar using Perl, and awk before that, but awk is not really suited to act as a parser for a free-form grammar. Perl may be a better choice, but there is no easy way to write a proper parser in it.


Then came the idea to use yacc for this project. The problem was that I had last used it, briefly, something like 10 years ago. Time to re-learn it!


Surprisingly, resources on the net are quite scarce.


All the documentation I found tries to teach yacc at a pretty high level, together with lex, without covering how it works, its environment, or what it generates. This means you will never properly understand how it works or how to write a decent parser with it. The only source of truth is the original Bell Labs paper.


The yacc paper


All in all, after reading the paper carefully and making a couple of attempts, I found that yacc produces really good and robust parsers. Yacc is an old tool, but if you are on UNIX and you don't want extra dependencies, yacc will probably be there, ready to work for you.


It takes a bit of mastery to properly define a grammar for your document, and more to write a good lexer (do not use lex), but the result is really good.
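
To give an idea of what a hand-written lexer for gemtext can look like, here is a minimal sketch in C. It is not the code from gmi2html: the token names and the gmi_classify() function are made up for this example, and in a real yacc project the token codes would come from the %token declarations in the grammar instead of a hand-written enum. Since gemtext is line oriented, the sketch only looks at the first characters of each line.

```c
#include <string.h>

/*
 * Hypothetical token codes. With yacc these would normally be generated
 * from the grammar's %token declarations, not written by hand.
 */
enum gmi_token {
	T_TEXT,		/* plain text line */
	T_LINK,		/* "=>" link line */
	T_H1,		/* "#" heading */
	T_H2,		/* "##" heading */
	T_H3,		/* "###" heading */
	T_ITEM,		/* "* " list item */
	T_QUOTE,	/* ">" quote line */
	T_PRE_TOGGLE,	/* "```" opens or closes a preformatted block */
	T_PRE_LINE	/* any other line inside a preformatted block */
};

/*
 * Classify one gemtext line (without its trailing newline).
 * in_pre must be non-zero while inside a preformatted block.
 */
enum gmi_token
gmi_classify(const char *line, int in_pre)
{
	/* The toggle is recognized both inside and outside a block. */
	if (strncmp(line, "```", 3) == 0)
		return T_PRE_TOGGLE;
	/* Inside a block nothing else is special. */
	if (in_pre)
		return T_PRE_LINE;
	if (strncmp(line, "=>", 2) == 0)
		return T_LINK;
	/* Longest heading prefix first. */
	if (strncmp(line, "###", 3) == 0)
		return T_H3;
	if (strncmp(line, "##", 2) == 0)
		return T_H2;
	if (line[0] == '#')
		return T_H1;
	/* A list item needs the space after the asterisk. */
	if (strncmp(line, "* ", 2) == 0)
		return T_ITEM;
	if (line[0] == '>')
		return T_QUOTE;
	return T_TEXT;
}
```

A yylex() built on top of something like this would read a line, flip its own in_pre flag every time it sees T_PRE_TOGGLE, and hand the token code to the parser together with the line's text.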


gmi2html code


gmi2html generates a standard HTML5 document from gemtext read from a file or standard input. I must say that, thanks to yacc, adding support for identifying the problematic line and column when reporting an error was quite easy, even if for the moment it only tells you the infamous "syntax error".
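
As an illustration of the line and column bookkeeping, here is a small sketch in C. Again, this is not taken from gmi2html: struct pos and the pos_*() functions are hypothetical names for this example. The idea is simply that the lexer updates a position for every character it consumes, so that the error routine (yyerror() in a yacc parser) can print it.

```c
#include <stdio.h>

/* Hypothetical position tracker kept by a hand-written lexer. */
struct pos {
	int line;	/* 1-based line number */
	int col;	/* 1-based column of the next character */
};

void
pos_init(struct pos *p)
{
	p->line = 1;
	p->col = 1;
}

/* Call once for every character the lexer consumes. */
void
pos_advance(struct pos *p, int c)
{
	if (c == '\n') {
		p->line++;
		p->col = 1;
	} else
		p->col++;
}

/*
 * Format a "file:line:col: message" string, the way an error
 * routine might. Returns what snprintf() returns.
 */
int
pos_format(char *buf, size_t len, const char *file,
    const struct pos *p, const char *msg)
{
	return snprintf(buf, len, "%s:%d:%d: %s", file,
	    p->line, p->col, msg);
}
```

With this in place, the error routine only has to pass the current position and the message it received to pos_format() and print the result to standard error.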


I'm not aware of similar tools in other languages. I know Go has (or used to have) a yacc package, but I don't know if it's used.


All in all, yacc demonstrates that good, well-engineered tools, even at 40 years old, are still useful and will remain so for years to come.
