-- Connecting to flexibeast.space:1965...

A convention for gemlog tags


i'm not aware of any widespread convention for designating tags on a gemlog post - if there is one, please let me know! Otherwise, at the risk of creating standard 15[a], here's what i'm going to start using on my own gemlog.


Definition


1. A ‘tags line’ is designated with the character ‘🏷’ at the beginning of a line,

2. optionally followed by

3. at least one space and/or tab, and

4. a comma-separated list of values,

5. each value containing any valid UTF-8 character above and including codepoint 0x20,

6. with the exception of a comma (‘,’).

7. The line may contain a trailing comma.

8. The list of tags cannot extend past the end of the line beginning with ‘🏷’.

9. In the presence of more than one tags line in a document, the last line wins.
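
The rules above can be sketched as a small line parser. This is a minimal illustration in Python, not a reference implementation: the name `parse_tags_line` is hypothetical, ‘🏷’ is assumed to be the bare codepoint U+1F3F7 (no variation selector), and trimming spaces around values and dropping empty values are assumptions that the definition does not spell out.

```python
TAG_PREFIX = "\U0001F3F7"  # assumed: bare U+1F3F7, no variation selector


def parse_tags_line(line):
    """Return a list of tags if `line` is a tags line, otherwise None."""
    # Rule 1: the prefix must be at the very beginning of the line.
    if not line.startswith(TAG_PREFIX):
        return None
    rest = line[len(TAG_PREFIX):]
    # Rule 2: everything after the prefix is optional (placeholder line).
    if rest == "":
        return []
    # Rule 3: at least one space and/or tab, then a non-empty list.
    body = rest.lstrip(" \t")
    if body == rest or body == "":
        return None
    # Rule 5: values may not contain characters below codepoint 0x20.
    if any(ord(ch) < 0x20 for ch in body):
        return None
    # Rules 4, 6 and 7: split on commas; a trailing comma leaves an empty
    # final field, which is dropped. Assumption: surrounding spaces are
    # trimmed and any other empty fields are dropped as well.
    return [value.strip() for value in body.split(",") if value.strip()]
```

Under these assumptions, `parse_tags_line(TAG_PREFIX + " gemini, emacs,")` returns `["gemini", "emacs"]`, a bare prefix returns an empty list, and a line with whitespace before the prefix returns `None`.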


Commentary


1. When this post was first published, line 1 referred to “the three characters ‘-<=’”, and this line of commentary originally said “the trigraph ‘-<=’ is intended to look vaguely like a physical tag, and seems reasonably unlikely to be regularly used for another purpose within gemtext.” The line prefix was changed to ‘🏷’ as a result of feedback.


2. Making everything after the ‘🏷’ character optional allows for the possibility of having a ‘placeholder’ tags line.


4. Field splitting on commas is trivial in any serious programming language[b].


7. Allowing a trailing comma is intended to make it easier to generate a conformant tag list.


8. Having the tags line be no more than a single line is intended to facilitate line-oriented parsing.


9. The “last line wins” rule is intended to handle faulty gemtext generators (which in some instances might be a human brain).
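
Two of the points above, the trailing comma and “last line wins”, can be illustrated with a short Python sketch. The function names `format_tags_line` and `last_tags_line` are hypothetical, ‘🏷’ is assumed to be the bare codepoint U+1F3F7, and treating any line that starts with the prefix as a candidate (without validating the rest of it) is a simplifying assumption.

```python
TAG_PREFIX = "\U0001F3F7"  # assumed: bare U+1F3F7, no variation selector


def format_tags_line(tags):
    """Generate a tags line; every tag is followed by a comma."""
    if not tags:
        return TAG_PREFIX  # placeholder tags line
    for tag in tags:
        if "," in tag:
            raise ValueError("a tag may not contain a comma")
    # Because a trailing comma is allowed, the last tag needs no special case.
    return TAG_PREFIX + " " + "".join(tag + "," for tag in tags)


def last_tags_line(text):
    """Scan line by line and return the last line starting with the prefix."""
    found = None
    for raw in text.split("\n"):
        line = raw.rstrip("\r")  # tolerate CRLF line endings
        if line.startswith(TAG_PREFIX):
            found = line  # later lines overwrite earlier ones
    return found
```

A generator that accidentally emits two tags lines still yields a single, predictable result: only the later line is returned by `last_tags_line`.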


Example regular expressions


The final inner parenthesised group in each example represents a capture group, whose contents can then be split on ‘,’.


PCREs


man page: ‘pcre2syntax(3)’


^🏷(?:(?: |\t)+([^\n]+))?$
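
Python's `re` module is not PCRE, but it accepts the same syntax for a pattern of this shape, so it can serve to show the capture-then-split step. This is a sketch under stated assumptions: the prefix is taken to be ‘🏷’ (bare U+1F3F7) as in the definition, the outer group is made optional so that a bare prefix matches, and the names `TAGS_RE` and `tags_from_line` are hypothetical. Trimming spaces and dropping empty fields are also assumptions.

```python
import re

TAG_PREFIX = "\U0001F3F7"  # assumed: bare U+1F3F7, no variation selector

# Prefix, then optionally: one or more spaces/tabs and a captured list.
TAGS_RE = re.compile("^" + TAG_PREFIX + r"(?:(?: |\t)+([^\n]+))?$")


def tags_from_line(line):
    """Return the tags on `line`, or None if it is not a tags line."""
    match = TAGS_RE.match(line)
    if match is None:
        return None
    captured = match.group(1)
    if captured is None:
        return []  # bare prefix: a placeholder tags line
    # Split the capture group on commas; the empty field left by a
    # trailing comma is dropped (assumption: so are other empty fields).
    return [value.strip() for value in captured.split(",") if value.strip()]
```

For example, `tags_from_line(TAG_PREFIX + " gemini, emacs,")` returns `["gemini", "emacs"]`, while a line with the prefix immediately followed by text returns `None`.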

ELisp REs


Emacs Lisp Reference Manual: ‘35.3.1 Syntax of Regular Expressions’


^🏷\\(?:\\(?: \\|\t\\)+\\([^\n]+\\)\\)?$

POSIX.2 EREs


man page: ‘regex(7)’


^🏷(( |\t)+([^\n]+))?$

Test data


A tags parser must parse the test_tags.txt file as follows:


The first four lines must not be recognised as valid (extraneous whitespace).

The last seven lines must be recognised as valid.


test_tags.txt



🏷 gemini




[a] xkcd: ‘Standards’


[b] i.e. i'm not particularly concerned about esolangs like Ook! or Piet. :-)

-- Page fetched on Tue May 21 15:01:58 2024