Wishing for Stricter Gemtext


This is going to be an unpopular opinion, but here goes. (I also want to apologize in advance to anyone using a screen reader; this text discusses gemtext syntax and I assume screen readers will be unable to discern the subtle differences in examples.)


I've already written a tool for gemtext to HTML conversion. But for reasons I now feel the need for a more general parser.


Now, the aforementioned tool uses a whole lot of regex magic to figure out which lines are what. And then there's of course some logic to track whether we're inside a preformatted block or not. Because it translates to HTML there's also similar logic for whether we're inside a list, and I should probably add the same for blockquotes, but those issues are related to HTML rather than gemtext.


Anyway, that was then and now is now, and when I look at that code now and my current needs I feel a little unsatisfied with the needless parsing complexity of gemtext. Because even though it's very easy, it's still harder than it could be.


Consider these examples:

"=>URL text" and "=> URL text" are equivalent. (If you're reading the HTML version of this post you'll have to check the source to see how big the difference really is.)

"> Hello" is a quote, but is the quote "Hello" or " Hello", with the leading whitespace?

A line in a preformatted block cannot start with ```, as that would end the block.

When it comes to headings I have to first check for lines starting with three #, then two, then one. Because a line that starts with one # followed by another is of course not a first level heading whose text starts with a #. But it could be, technically, because there's nothing in the spec that mandates that the leading # must be followed by whitespace. And what if I want to write the heading " is a whitespace" (with a leading whitespace)? I can't, really.

A list item line starts with "* ", not "*" followed by an arbitrary amount of any whitespace as with the other line types. Thus a list item's text can actually have leading whitespace, but a heading's cannot.


So, when parsing gemtext I need to consider the first one to three characters of the line, and whether or not it is within a preformatted block.
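
To make that concrete, here is a minimal sketch of such a classifier in Python. It is not the code from my converter; the function name, the type labels and the tuple layout are made up for illustration, and stripping the whitespace after ">" is just one possible answer to the quote question above.

```python
def classify(line, in_pre=False):
    """Classify one gemtext line. Returns (line_type, text, in_pre)."""
    if line.startswith("```"):
        # A toggle line switches preformatted mode on or off.
        return ("toggle", line[3:], not in_pre)
    if in_pre:
        # Inside a preformatted block nothing else is interpreted.
        return ("pre", line, True)
    if line.startswith("=>"):
        # Whitespace after "=>" is optional, so strip whatever is there.
        return ("link", line[2:].strip(), False)
    # Headings must be tested longest prefix first.
    for prefix, kind in (("###", "h3"), ("##", "h2"), ("#", "h1")):
        if line.startswith(prefix):
            return (kind, line[len(prefix):].strip(), False)
    if line.startswith("* "):
        # Only one space is consumed, so extra leading whitespace survives.
        return ("item", line[2:], False)
    if line.startswith(">"):
        # Assumption: the quote text has its leading whitespace stripped.
        return ("quote", line[1:].strip(), False)
    return ("text", line, False)
```

Note how each branch inspects a different number of characters and handles whitespace in its own way, and how the caller has to carry the in_pre flag from line to line.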


What if I always, without exception, just had to check the first three characters? The line types could be something like this:


#  First level heading.
## Second level heading.
###Third level heading.
*  List item.
=> Link line.
>  Block quote line.
>> Preformatted line.
And of course any other line, which is just plain text.

The URL and text on a link line should be separated by a single space, not an arbitrary amount of any type of whitespace.
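
Under those rules a parser could shrink to a single table lookup. The following is only a sketch of the idea in Python, assuming the three-character prefixes listed above; PREFIXES, classify_strict and split_link are hypothetical names, not part of any existing tool or spec.

```python
# Every line type is identified by exactly its first three characters.
PREFIXES = {
    "#  ": "h1",
    "## ": "h2",
    "###": "h3",
    "*  ": "item",
    "=> ": "link",
    ">  ": "quote",
    ">> ": "pre",
}


def classify_strict(line):
    """Return (line_type, text); the text always starts at column 3."""
    kind = PREFIXES.get(line[:3])
    if kind is None:
        # Anything without a known prefix is plain text, kept as is.
        return ("text", line)
    return (kind, line[3:])


def split_link(body):
    """Split a link body into (url, label) at the first single space."""
    url, _, label = body.partition(" ")
    return (url, label)
```

There is no state to carry between lines, because a preformatted line announces itself with its own prefix, and no ordering of checks to get right, because all prefixes have the same length.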


I also think that plain text lines should be trimmed of leading and trailing whitespace when rendered, simply because I don't see any reason to keep it. It would also allow a plain text line to start with a space followed by a character combination that would otherwise change the line type.
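
As a small illustration of that escape trick, here is a Python sketch, again assuming the three-character prefixes from the list above. STRICT_PREFIXES and render_plain are hypothetical names.

```python
# The seven prefixes from the proposed line types above.
STRICT_PREFIXES = ("#  ", "## ", "###", "*  ", "=> ", ">  ", ">> ")


def render_plain(line):
    """Return the trimmed text of a plain text line, or None for other types."""
    if line[:3] in STRICT_PREFIXES:
        return None
    # Trimming removes the leading space that was used as an escape.
    return line.strip()
```

A line like " #  Not a heading" does not match any prefix because of its leading space, so it is plain text, and after trimming it renders as "#  Not a heading".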


Maybe I should have sent this to the mailing list, but it's not a proposal for change. It's just unfinished thoughts that I'd like feedback on.


-- CC0 ew0k, 2021-07-23
