Wisp Markup Language


This is an explanation of the Wisp markup language. Wisp was conceived to fill a niche somewhere between text/gemini (also called gemtext) and markdown.


An interactive demo is available on the web. It uses HTML and JavaScript to render Wisp source into HTML in real time. It's probably a better way to understand Wisp than simply reading this lazy description of the syntax.


Wisp interactive demo


This is not a polished spec. I ran into some obstacles with the core premise, and I've basically abandoned it.


Wisp takes a lot of inspiration from gemtext, but it is not intended to be a gemtext replacement. Of course, I've already abandoned this experiment, so it's more accurate to say that Wisp is not intended to be anything at all at this point.


Wisp aims for a consistent and predictable markup pattern.

Wisp sacrifices some features to keep parsing simple.

Wisp tries to provide clients with some flexibility about how many features to support.


The JavaScript on the demo site that converts Wisp source to HTML is under 350 lines. A more robust (and less buggy) client would probably be longer, though.


Table of Contents

Wisp Basics

Non-tagged Line Types

Basic Line Tags

Advanced Line Tags

Commentary

Conclusion


Wisp Basics


Like gemtext, Wisp is a line-based markup language. This means that markup tags appear at the beginning of a line, and typically close at the end of the line.


Most lines of Wisp are parsed in one of two ways. Either the line begins with a tag and a space, and continues with some remaining text:

<tag><space><remainder>

or the line does not begin with a tag and a space, and is entirely text:

<text>


<tag> is a valid Wisp tag. Wisp tags may be up to 3 characters in total and may only contain the characters ; = # * > / % \

<space> is one space character (ASCII 32). The space is required when both the <tag> and <remainder> are present. It is optional if <remainder> text is omitted.

<remainder> is all characters following the first space character. The remainder is optional. Some tags may split the remainder into multiple pieces.

<text> is any line of text that doesn't begin with a tag and a space.


If the text before the first space contains any non-tag characters, the characters before the space are not a tag and the entire line must be parsed as a text line.
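
The splitting rule above can be sketched in a few lines of Python. This is only an illustration, not the demo site's code: the names split_line, TAG_CHARS and MAX_TAG_LEN are made up for this example, and the sketch does not apply the fall-back rules for unrecognized tags.

```python
# Characters that may appear in a Wisp tag, per the list above.
TAG_CHARS = set(";=#*>/%\\")
MAX_TAG_LEN = 3

def split_line(line):
    """Return (tag, remainder) for a tagged line, or (None, line) for a text line."""
    # Everything before the first space is the candidate tag.
    head, _, rest = line.partition(" ")
    if 1 <= len(head) <= MAX_TAG_LEN and set(head) <= TAG_CHARS:
        # A bare tag with no space and no remainder is also accepted.
        return head, rest
    # Any non-tag character (or an over-long tag) makes this a text line.
    return None, line

print(split_line("# Header"))   # tagged line
print(split_line("#Header"))    # text line: no space after the tag
```

Only the first space is consumed, so any further spaces stay in the remainder as literal text.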


If the tag contains valid tag characters, but they do not form a recognized tag, the client must treat them as a shorter, supported tag. See the Advanced Line Tags section for more information.


Clients must support parsing for all the single-character tags. Clients may choose not to render all of the single-character tags, but they must parse the lines correctly based on each tag. (In particular, the auto-prefix tag would have unwanted side effects if the parsing rules are ignored.)


Wisp does not support lines with multiple types. For example, Wisp does not support italics within headers, or bullet lists of links.


Spaces in text should be preserved as written. But remember that the space after a tag is not part of the text. If you want a literal space at the beginning of a line with a tag, you must include the extra space.


Some different line types are joined together in the client. This important feature allows mixing tagged text (such as emphasized text or links) inline with regular text.


Non-tagged Line Types


Text Lines


Any line that contains one or more characters and doesn't begin with a tag is a plain text line. Consecutive non-blank text lines should be joined together when rendered by the client.


(If a line contains any non-tag characters before the first space, then it does not contain a tag, and the entire line must be parsed as plain text.) When consecutive lines are joined, there must not be automatic white-space added between them. If you want a space between the last word of a line and the first word of the next line, you must add it yourself, either at the end of the first line or the beginning of the second line.
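
As a rough illustration of the joining rule, here is a Python sketch that groups consecutive non-empty lines into paragraphs and concatenates them with no separator. The function name join_text_lines is hypothetical, and the sketch assumes every input line is a plain text line (tags are not handled here).

```python
def join_text_lines(lines):
    """Group consecutive non-empty lines into paragraphs, joined with no separator."""
    paragraphs, current = [], []
    for line in lines:
        if line == "":
            # An empty line ends the current run of consecutive lines.
            if current:
                paragraphs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        paragraphs.append("".join(current))
    return paragraphs

source = ["Wisp joins lines", " exactly as written,", "so spaces matter.", "", "New paragraph."]
print(join_text_lines(source))
```

In the first paragraph, the second line supplies its own leading space, while the third does not, so "written," and "so" run together in the output.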


If a line would exceed the client's viewport, the client may wrap the text onto a new line. The wrapping algorithm to use is up to each client.


Empty Lines


An empty line is a line that contains no characters at all, not even a space or other white-space characters.


Empty lines are referenced by some other tags, but normally they are just rendered as what they are: an empty line.


Basic Line Tags


Clients must parse all of the basic line types correctly. Clients should also *render* all of the basic line types correctly, but may choose not to depending on the goals of the client.


Each basic line tag is exactly one character in length.


; Comment

; <comment text>

A single semi-colon (;) is the comment tag. Any <comment text> must not be rendered by the client.


Comments are ignored when determining whether two lines are consecutive. You can include a comment in the middle of a paragraph, as you will see in this paragraph if you examine the Wisp source on the demo site.


= Link

= <URI> <link text>

A single equals sign (=) is the link tag. The text after a link tag is divided into two parts. The first part is a URI. The second part is user-friendly link text to display to the user. The two parts are separated by a space character (ASCII 32).


Link lines should be treated as text lines when joining adjacent lines. That is, they should be joined with text lines, emphasized text, and other link lines.


Here is some Wisp source text that includes an example link:

Wisp takes a lot of inspiration from
= https://gemini.circumlunar.space/docs/specification.html gemini and gemtext
 but it is not intended to be a gemtext replacement.
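
A minimal Python sketch of how a client might split the remainder of a link line. The name parse_link is invented for this example. Returning empty link text when no text follows the URI is an assumption; the description above doesn't say what a client should do in that case.

```python
def parse_link(remainder):
    """Split a link line's remainder into (uri, link_text) at the first space."""
    uri, _, text = remainder.partition(" ")
    # Assumption: a link with no text yields an empty string for the text.
    return uri, text

uri, text = parse_link(
    "https://gemini.circumlunar.space/docs/specification.html gemini and gemtext"
)
print(uri)
print(text)
```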

# Header

# <header text>

A single number sign (#) is the header tag. The client should render <header text> differently from plain text. An advanced client could use headers on the page to generate a table of contents for the page.


Like text lines, two consecutive header lines may be joined together by the client:


# This is a very long title
#  that I split within the
#  Wisp source code

* Bullet item

* <bullet item text>

A single asterisk (*) is the bullet item tag. Each line of <bullet item text> should be rendered as an item in a list.


Unlike text lines or header lines, consecutive bullet lines are treated as separate bullet items.


This is an example list:

* Apple
* Banana
* Coconut
* Durian

> Quoted Text

> <quoted text>

A single greater-than sign (>) is the quoted text tag. The <quoted text> should be styled, indented or otherwise decorated by the client to indicate that it has a different origin from the main text.


As with text lines and header lines, the client may join consecutive quoted text lines.


Here is a famous piece of advice:

> No matter where you go,
>  there you are.
> —Abraham Lincoln

/ Emphasis

/ <emphasized text>

A single forward-slash (/) is the emphasis tag. The client may style the <emphasized text> in italics, or use another method to emphasize it.


Emphasis lines should be treated as text lines when joining adjacent lines. That is, they should be joined with text lines, link lines, and other emphasized text lines.


For example, I was
/ not
 expecting this amazing surprise party!
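
To show how inline joining might work in a client, here is a hedged Python sketch that renders a run of text, emphasis and link lines as one HTML paragraph. The function render_inline is hypothetical and handles only these three line types. Falling back to the URI when a link has no text is an assumption.

```python
from html import escape

def render_inline(lines):
    """Join text, emphasis and link lines into a single HTML paragraph."""
    parts = []
    for line in lines:
        tag, _, rest = line.partition(" ")
        if tag == "/":
            parts.append("<em>" + escape(rest) + "</em>")
        elif tag == "=":
            uri, _, text = rest.partition(" ")
            # Assumption: show the URI itself when no link text is given.
            label = escape(text or uri)
            parts.append('<a href="' + escape(uri) + '">' + label + "</a>")
        else:
            # Anything else is treated as plain text in this sketch.
            parts.append(escape(line))
    # Pieces are concatenated exactly as written; no spaces are added.
    return "<p>" + "".join(parts) + "</p>"

print(render_inline(["For example, I was ", "/ not", " expecting this amazing surprise party!"]))
```

Note that the first line in this usage example ends with an explicit space. Without it, "was" and "not" would run together, because joining never inserts white-space.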

\ Text Indicator Tag

\ <text>

A single backslash (\) is the text indicator tag. The <text> must be rendered as regular text. This can be used to "escape" lines that might otherwise be parsed as tag lines. This can be useful when writing about Wisp itself or other coding or markup languages.


This is a paragraph that happens to mention the
\ = tag in an inconvenient line break, just for demonstration.

Remember that the backslash is not escaping individual characters, as you might expect from other languages. The text indicator is its own tag, and, like all tags, it must appear at the beginning of the line and the space between the tag and the remainder text is required (if there is any remainder text at all).


% Auto-prefix Tag

% <tag> <mode-ending string>

A single percent sign (%) is the auto-prefix tag. This automatically adds <tag> to the lines that follow the auto-prefix line. If <mode-ending string> is omitted, the auto-prefixing will end at the first empty line. If <mode-ending string> is specified, the auto-prefixing will end when the parser encounters a line that exactly matches <mode-ending string>. The matching line itself is not rendered.


% >
This paragraph uses the auto-prefix tag to
 quote the entire paragraph without having to
 put the quote tag at the start of every line.
 It ends at the first empty line.

% * END
This is a bullet list
for some reason.

It has empty lines
which don't end the block.

It ends at the
line that equals "END"
(which is not rendered)
END

I guess that
% / END
 inline
 auto-prefix tags
END
 also work?
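
One way a client could handle auto-prefixing is as a pre-processing pass that rewrites each block into explicitly tagged lines. The Python sketch below is an illustration under some assumptions: the name expand_auto_prefix is made up, the prefix is inserted as the tag plus one space, and empty lines inside a block with a mode-ending string are passed through without a prefix (the description above doesn't settle that last point).

```python
def expand_auto_prefix(lines):
    """Rewrite auto-prefix blocks as explicitly tagged lines."""
    out = []
    prefix = None      # tag currently being applied, or None outside a block
    end_marker = None  # mode-ending string, or None to end at an empty line
    for line in lines:
        if prefix is None:
            tag, marker = "", ""
            if line.startswith("% "):
                tag, _, marker = line[2:].partition(" ")
            if tag:
                prefix, end_marker = tag, (marker or None)
            else:
                out.append(line)
        elif end_marker is None and line == "":
            prefix = None
            out.append(line)   # the empty line ends the block and is kept
        elif end_marker is not None and line == end_marker:
            prefix = None      # the marker line ends the block and is dropped
        elif line == "":
            out.append(line)   # assumption: empty lines stay unprefixed
        else:
            out.append(prefix + " " + line)
    return out

for line in expand_auto_prefix(["I guess that", "% / END", " inline", "END", " also work?"]):
    print(repr(line))
```

Because the prefix is the tag plus one space, a source line that starts with a space ends up with two spaces after the tag, which matches how the hand-written quote example earlier in this section is spelled.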

Advanced Line Tags


Advanced line types extend the basic line types to provide additional features.


Clients are not required to support every advanced line type, or any of them. If a client encounters an unsupported tag, it may trim characters off the end of the tag until the result is a supported tag. This is called "falling back" to a supported tag.


To use some not-real examples, if the client encounters the "//" tag and doesn't support that tag, it should treat "//" as the "/" tag instead. If a client encounters a "/=#" tag and it doesn't support that tag, it should treat "/=#" as a "/=" tag instead. If that tag is also not supported, it should treat "/=#" as a "/" tag.
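
The fall-back rule is short enough to sketch directly. In this Python illustration, fall_back and the supported set are hypothetical names, and the tag is assumed to have already been validated as one to three tag characters.

```python
def fall_back(tag, supported):
    """Trim characters off the end of tag until the result is in supported."""
    while tag and tag not in supported:
        tag = tag[:-1]
    # None means nothing matched, which shouldn't happen for a valid tag
    # if the client parses every single-character tag as required.
    return tag or None

print(fall_back("/=#", {"/", "/="}))
print(fall_back("/=#", {"/"}))
print(fall_back("=>", {"="}))
```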


;; Metadata

;; <property> <value>

Two semi-colons (;;) indicate metadata about the upcoming document. The remaining text is split into two parts: a <property> and a <value> for that property. The two parts are separated by a single space character (ASCII 32).


Note that <property> may not contain the space character, because this is used as the separator from the metadata value. The <value> may contain spaces, since it extends to the end of the line.


The value defined for a property applies to all the lines following the metadata declaration, until the same property is redefined by another declaration.


For example, the "lang" metadata property changes in the following paragraphs:


;; lang en-US
I'm writing this in English.

;; lang fr
Je suis une pipe.

;; lang en-US
You get the idea.

A metadata line that contains a property and no value ends any current metadata section for that property. That property is unset for future lines.
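
Here is a hedged Python sketch of how a client might track which metadata is in effect for each line. The name annotate_metadata is invented for this example, and pairing each line with a dictionary of active properties is just one possible design.

```python
def annotate_metadata(lines):
    """Pair each non-metadata line with the metadata in effect for it."""
    active = {}
    out = []
    for line in lines:
        if line.startswith(";; "):
            # The first space-delimited word is the property; the rest is the value.
            prop, _, value = line[3:].partition(" ")
            if prop and value:
                active[prop] = value
            elif prop:
                active.pop(prop, None)  # a property with no value unsets it
            continue
        out.append((line, dict(active)))
    return out

source = [";; lang en-US", "I'm writing this in English.", ";; lang", "No language is set here."]
for text, meta in annotate_metadata(source):
    print(meta, text)
```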


Note that the metadata tag falls back to the comment tag. If the client doesn't support metadata, the entire line is simply treated as a comment and ignored.


Wisp does not currently define any particular metadata property types or acceptable values. Since this project will likely not be developed further, it probably never will.


;# Alternative text

;# <alternative text>

A semi-colon followed by a number sign (;#) indicates alternative text (alt text). Clients may use the <alternative text> in place of any following lines, until new alt text is defined or the alt text is cleared with an empty alt text line.


An empty alt text line ends any current alt text, and indicates that there is no longer alt text for the following lines. If you have consecutive blocks of text with different alt text, it's not necessary to "close" the blocks. You simply start a new alt text.


As with gemtext, the most plausible use for this feature is to describe ASCII art or other pre-formatted text blocks, but you can use it with other text if there are any appropriate use cases. For example, these two paragraphs have alt text:


;# Made-up DNA sequences
GTACA CTCAG TTGGT ACAGA TTGAC
AGTAC GCACG GCGTT AGTAC CACCC

;# Wavy lines
\\ /\/\\/\/\//\\\\////\
\\ //\/\/\//\/\//\\///\
;#

A client that displays the primary text by default should not display the alt text. Authors should assume that users will usually see either the primary text or the alt text, but usually not both.


The alt text tag falls back to the comment tag. If the client doesn't support alt text, the entire line is simply ignored as a comment.


=> (Reserved)


As a gesture of cross-compatibility with gemtext, the => tag is not used or defined at this time. Because of the fall-back rules, this means that clients will treat "=>" as equivalent to "=", which is the link tag.


## and ### Sub-header and sub-sub-header

## <sub-header text>
### <sub-sub-header text>

Two number signs (##) indicate a sub-header. Three number signs (###) indicate a sub-sub-header. The client should set off the <sub-header text> or <sub-sub-header text> to differentiate it from the main body of the text.


The sub-header and sub-sub-header text should be rendered differently from the header text (and from each other) to differentiate the different sections of the document.


# An example header

## An example sub-header

### An example sub-sub-header

As with header lines, clients should combine sub-header and sub-sub-header lines with adjacent lines of the same type.


The sub-header and sub-sub-header tags fall back to the header tag. Clients that don't support sub-headers and sub-sub-headers may handle them as if they were equivalent to "#".


This means that sub-header or sub-sub-header lines that are adjacent to each other or header lines may be combined into a single header line by clients that don't support these advanced tags. For this reason, it's a good idea for authors to separate headers, sub-headers and sub-sub-headers from each other by an empty line.


** and *** Second- and Third-level Bullet List Items

** <bullet item text>
*** <bullet item text>

Two asterisks (**) indicate a second-level (indented) bullet list item. Three asterisks (***) indicate a third-level (twice-indented) bullet list item.


Here's an example shopping list:

* Groceries
** Bread
** Fruit
*** Bananas
*** Apples
* Hardware store
** Light bulbs

The sub-level bullet item tags fall back to the bullet item tag. Clients that don't support sub-level bullet items may handle them as if they were equivalent to "*".


This means that multi-level bullet lists may be compressed into a single-level bullet list by clients that don't support sub-level bullet list tags. This isn't ideal, and I don't have any advice for how to prepare for this situation.


*# Labeled List Items

*# <label> <list item text>

An asterisk followed by a number sign (*#) indicates a labeled list item line. The client should render the <label> in front of the <list item text>. This can be used to display ordered lists. Unlike in HTML, the author is responsible for labeling each item on the list. The client may align the labels and bullet lists as desired, but should not add any decoration (such as punctuation) to labeled list items.


Here's an example labeled list:

*# 1. This is a list.
*# 2: it's short.
*# 2.a. Sub-level labeled list items aren't specified, but you can do what you want with the labels. Maybe that's enough.
*# X. Label text can be anything
*# Purpose: You could use it for lists of word definitions I guess?

The labeled list item tag falls back to the bullet item tag. Clients that don't support labeled list items may handle them as if they were equivalent to "*".


This means that the label and text of each line would be combined into the bullet item text for each bullet.
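
A small Python sketch of both behaviours, with and without support for labeled list items. The name parse_labeled_item and its supports_labels flag are hypothetical. Splitting the label from the text at the first space is an assumption, made by analogy with the link and metadata tags.

```python
def parse_labeled_item(remainder, supports_labels=True):
    """Return (label, text) for a labeled list item, or (None, text) on fall-back."""
    if not supports_labels:
        # Falling back to "*": the label simply stays part of the bullet text.
        return None, remainder
    # Assumption: the label is everything before the first space.
    label, _, text = remainder.partition(" ")
    return label, text

print(parse_labeled_item("1. This is a list."))
print(parse_labeled_item("1. This is a list.", supports_labels=False))
```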


>> and >>> Nested Quote Lines

>> <quoted text>
>>> <quoted text>

Two greater-than signs (>>) indicate a nested quote line (a quote inside a quote). Three greater-than signs (>>>) indicate a double-nested quote line (a quote inside a quote inside a quote).


As with quote lines, clients should combine nested quote lines with adjacent lines of the exact same type (>> with >>, >>> with >>>).


Here's a brief example of nested quote lines, as they might be used to render an email conversation:

>>> Dear esteemed colleague. Please find below a
>>>  fantastic scheme for immediate return on a
>>>  small investment. [etc., etc.]

>> Wow! I'm in! What do I need to do?

> Quickly remit a small payment to the bank account
>  below. [etc., etc.]

The nested quote tags fall back to the quote tag. Clients that don't support nested quotes may handle them as if they were equivalent to ">".


This means that nested quote lines that are adjacent to each other or quote lines may be combined into a single quote line by clients that don't support these advanced tags. For this reason, it's a good idea for authors to separate quotes and nested quotes from each other by an empty line.


// Bold Emphasis

// <bolded text>

Two forward-slashes (//) form the bold emphasis tag. The client may style the <bolded text> in a heavier font weight, or use another method to emphasize it.


If the bold emphasis tag is not supported, it falls back to the emphasis tag.


Bold emphasis lines should be treated as text lines when joining adjacent lines. That is, they should be joined with text lines, link lines, and other emphasized text lines.


For example, the sign above the door read
// No Admittance!
 and I think it was serious.

/= Underlined Text

/= <underlined text>

A forward-slash followed by an equals sign (/=) is the underlined text tag. The client should render the <underlined text> with a straight line under it.


If the underlined text tag is not supported, it falls back to the emphasis tag.


Underlined text lines should be treated as text lines when joining adjacent lines. That is, they should be joined with text lines, link lines, and other underlined or emphasized text lines.


For example, if I mention
/= Persuasion
 by Jane Austen, the underlining indicates that it's
 the title of a book, in case that was unclear.

Is this even necessary? I included it because it was relatively simple, and people may expect it. But underlining has gone out of fashion on the web, since it can be confused with link text.


\\ Pre-formatted Text Indicator

\\ <text>

Two backslashes (\\) form the pre-formatted text indicator tag. The <text> should be displayed in a fixed-width font. This can be used to make diagrams or drawings out of text characters, or to format text when spacing and line breaks are important, such as poetry. Unlike normal text, pre-formatted text lines are not joined with each other or any other lines.


This is an ASCII-art drawing of a boat:

;# The drawing
\\     |\
\\     | \
\\     |--'
\\   ------
\\ .,\.,.,/,.
;#

Commentary


Here are some of the reasons why I'm not satisfied with Wisp and probably won't develop it further. I have ideas about how to address some of these concerns, but I think they will require changing the language enough that it deserves a different name. That is a project that I might return to in the future.


Spaces at Line Breaks


Strictly joining line breaks exactly as written is unsatisfying. I don't like having to put a space at the beginning or end of every line. That's not the way that English is usually written in plain text documents.


But I don't want to automatically insert spaces between every joined line because other languages don't put spaces between words at all. Even in English, there will often be exceptions related to inline tags.


For example, if a link appears at the end of a sentence, I usually don't want to include the final period in the link text. But if there was an automatic space inserted between lines, then there would be a space between the link text and the final period.


And I definitely don't want to abandon automatic line joining. That's the critical feature that allows us to achieve in-line markup with gemtext-style start-of-line tags.


I considered adding a new line type (or types?) specifically to differentiate between "joined" and "space-joined" lines. I gave up on this because Wisp wouldn't allow you to combine these with other markup, which is probably 95% of why I would want to do it in the first place (see the example about links above).


It would also be nice to have support for soft line breaks ("unjoined" lines). I think that would be easy to define as an advanced tag that extends the plain text (\) tag. I maybe meant to do that and forgot, or was frustrated that I couldn't also address the joined/space-joined problem at the same time.


Combining Tags


I'm not satisfied that certain tags can't be combined with other tags. In particular, I think it's a weakness that you can't use emphasis or links inside of quoted text.


Inconsistent Block Termination


The metadata, alt text and auto-prefix tags all act on the lines that follow them, but the two comment-based tags terminate their blocks differently from the auto-prefix tag. I don't like this inconsistency.


The metadata and alt text tags require the author to, essentially, end a tag block by starting a new one. In contrast, the auto-prefix tag ends either at the first empty line or at an author-specified line of text.


Of these options, I like the idea of ending a block at an empty line the most. I like the symmetry it has with ending a tag's effects at a new line character.


Unfortunately, the metadata and alt text tags aren't a good fit for this. They are more likely to affect multiple paragraphs than the auto-prefix tag, so it seems cumbersome to require the user to redefine them or specify a "block ending string".


They also don't work well with the "block ending string" anyway. So far, the metadata tag takes the first word after the tag as one parameter and the remainder of the line as a second parameter, while the alt text tag takes the whole remainder as its only parameter. Introducing a "block ending text" parameter to these tags would require adding another parameter, and, what's more, it would have to be optional. I can think of one or two ways to do this, but none of them appeal to me.


Bullet List Long Lines


I'm a little unsatisfied that bullet list items must be written entirely on a single line. This prevents authors who like to limit their source code line lengths from doing so. Most other elements can either use long lines or break arbitrarily, but bullet lists are an exception to this behavior.


One alternative would be to auto-join bullet items, and therefore require a blank line between each item, but that goes against the way I usually see bullet lists rendered in plain text.


Another option would be to define a different line type that auto-joins with the line above it, regardless of the type. This would also create some parsing complexity, and I don't like that it would only be needed for bullet items. If I went this route, I would probably change header parsing rules to be non-joined by default.


Arbitrary Depth Limits


Headers, bullet lists and quoted text are limited to a depth of three, because the tag length is arbitrarily limited to a length of three. Is this enough?


If an author attempts to use a four-character tag for a header, bullet list or quoted text, then Wisp won't even fall back to the longest supported tag. It will treat the entire line as a text line. This is a deliberate choice to allow a client to quickly and definitively determine whether a line has a tag by examining the first four characters only, but the end result is a little unsatisfying.
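
The four-character check can be sketched in Python as follows. The names has_tag and TAG_CHARS are made up for this illustration. The sketch only answers whether a line is tagged at all; it doesn't check that the tag is one a client recognizes.

```python
TAG_CHARS = set(";=#*>/%\\")

def has_tag(line):
    """Decide whether a line is tagged by looking at its first four characters only."""
    # A tag is at most three characters, so its space (if any) is within the window.
    tag, _, _ = line[:4].partition(" ")
    return 0 < len(tag) <= 3 and set(tag) <= TAG_CHARS

print(has_tag("### An example sub-sub-header"))
print(has_tag("#### One level too deep"))
```

With no space in the window, a four-character run such as "####" fails the length test, so the whole line is treated as text rather than falling back to "###".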


Conclusion


That's it. Thanks for reading.


emptyhallway

2020-12-20
