A proposed scheme for parsing preformatted alt text

date: 7-Sep-2020

(updated 7-Sep-2020 to clarify that the proposal is not specifically about table formats; tables are just one application)

This post is a short write-up of a chat session we had today on the #Gemini IRC channel.

The conversation started with the question of whether gemtext could support tables. You can of course link to a table in a standard format such as CSV or TSV, but perhaps a client could render that table directly?

Maybe you could put CSV into the preformatted area and clients could optionally render it as a pretty table?

Whilst the discussion started with table formatting, the proposal below is not about any particular table syntax. Rather, it aims to develop a community practice for conveying additional attributes in the alt text, which the Gemini spec permits to be non-empty, while still supporting current practice, in which the alt text is empty or just a simple label.

It would be helpful if this became part of the Gemini spec and the conventions were solidified, but it is not necessary at this stage.

Preformatted regions in Gemini

Preformatted areas in gemtext are opened and closed by lines beginning with three backticks (```), for example:

here is some      preformatted       text
where white       space              is significant

Clients typically render such preformatted text using a fixed width font and with all the original whitespace.

These preformatted regions have a number of uses, for example to show source code:

<!doctype html>
<html lang="en">
  <meta charset="utf-8">
  <title>Page title</title>
  <p>A very simple HTML 5 page.</p>
</html>

or ASCII/Unicode art:

 _____  _       _       _      ___            _
|   __||_| ___ | | ___ | |_   |  _| ___  ___ | |_
|   __|| || . || || -_||  _|  |  _|| . ||   ||  _|
|__|   |_||_  ||_||___||_|    |_|  |___||_|_||_|
          |___|

A key consideration, particularly for ASCII art, is how non-visual clients will render the content, since the picture is a graphical one, albeit constructed from letters and punctuation. Other Gemini agents, such as web crawlers, might also want to index this content for search purposes.

On the opening toggle line, any text after the backticks has no specific defined meaning. The Gemini spec says it should not be displayed, but that it may be interpreted:

>Any text following the leading "```" of a preformat toggle line which toggles preformatted mode on MAY be interpreted by the client as "alt text" pertaining to the preformatted text lines which follow the toggle line. Use of alt text is at the client's discretion, and simple clients may ignore it.

This is where the content author can provide "alt text" that clients may interpret to assist with processing and displaying the preformatted content.
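To make the location of the alt text concrete, here is a minimal Python sketch (the function name preformat_alt_texts is hypothetical) that walks the lines of a gemtext document and collects the text following each opening toggle. It tracks whether it is inside a preformatted region, because closing toggles also start with three backticks but carry no alt text.

```python
from typing import Iterator, List

TOGGLE = "`" * 3  # three backticks, built this way so the example has no literal toggle lines


def preformat_alt_texts(lines: List[str]) -> Iterator[str]:
    """Yield the alt text of every toggle line that opens a preformatted region."""
    in_preformat = False
    for line in lines:
        # Every line starting with three backticks toggles preformatted mode.
        if line.startswith(TOGGLE):
            if not in_preformat:
                # Only opening toggles carry alt text; surrounding spaces are dropped.
                yield line[len(TOGGLE):].strip()
            in_preformat = not in_preformat


doc = [
    "Some prose",
    TOGGLE + "ascii art of the word Gemini",
    " _____",
    TOGGLE,
    TOGGLE,
    "print('a region with no alt text')",
    TOGGLE,
]
print(list(preformat_alt_texts(doc)))
# ['ascii art of the word Gemini', '']
```

A region opened without any text after the backticks yields an empty string, which matches the current common practice of leaving the alt text empty.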

Parsing the alt text - Bouncepaw's scheme

Bouncepaw wrote a piece proposing some options for parsing this alt text:

Bouncepaw: Extending gemtext's preformatted text

Bouncepaw's scheme proposes a number of different "types" to indicate the role of the content:

 ```type=table
 (preformatted content continues)
 ```

A valid point that came up in our IRC discussions was that we should support screen readers and legacy clients that may not want to parse the content further.

The following scheme builds on this idea of delimited alt text within Bouncepaw's proposal and attempts to make it more flexible and backwards compatible.

Proposal

The proposed scheme to parse the alt text is presented below. There are a number of design considerations it seeks to satisfy:

1. Support screenreaders and other clients that wish to extract a plain text description

2. Low complexity, with minimal and recognisable syntax

3. Support multiple attributes if necessary

4. No preconceptions about attribute names and values

The scheme is as follows:

 ```<human descriptive text>(;<css delimited attribute values>)

Essentially this is a CSS-style delimitation scheme: attribute/value pairs are separated by semi-colons, and a colon separates each attribute from its value.

CSS is chosen as it is a well established, human friendly syntax that permits multiple attributes to be provided.

Remarks

Screenreaders may choose to read up to the first semi-colon (or to the end of the line if there is none). This matches the current default, in which no additional syntax is given.
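A screenreader or legacy client following this rule needs very little code. Here is a minimal Python sketch (spoken_description is a hypothetical name) that, like the rule itself, does not look for escaped semi-colons:

```python
def spoken_description(alt_text: str) -> str:
    """Return the part of the alt text that a screenreader would announce."""
    # str.partition returns the whole string as its first element when no
    # semi-colon is present, so plain labels need no special case.
    description, _, _ = alt_text.partition(";")
    return description.strip()


print(spoken_description("here is a table in csv;content-type:text/csv"))
# here is a table in csv
print(spoken_description("ascii art of a cat"))
# ascii art of a cat
```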

The full set of attributes is not yet specified, although two initial attributes, content-type and lang, are proposed below.

Attribute names and values are whitespace-trimmed, as in CSS, which allows them to be written in a flexible and more human-readable way.

Attribute names should not normally contain spaces, colons or semi-colons, although they can be escaped using the CSS rules for these.

Attribute values should not normally contain colons or semi-colons, although they can be escaped using the CSS rules for these.
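Escaping means a parser cannot simply split on every semi-colon or colon. The Python sketch below uses a deliberately simplified version of the CSS rule: a backslash makes the next character literal. Real CSS also defines hexadecimal escapes such as \3B, which this sketch does not decode. The function names split_unescaped and unescape are hypothetical. Escapes are left in place while splitting, so that a later split on colons still sees them, and are only removed at the end.

```python
from typing import List


def split_unescaped(text: str, sep: str, maxsplit: int = -1) -> List[str]:
    """Split text on sep, ignoring any sep preceded by a backslash.

    Escapes are kept in the returned pieces so that a later split still sees them.
    A negative maxsplit means "no limit", as with str.split.
    """
    parts: List[str] = []
    start = 0
    i = 0
    while i < len(text):
        if text[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if text[i] == sep and maxsplit != 0:
            parts.append(text[start:i])
            start = i + 1
            maxsplit -= 1
        i += 1
    parts.append(text[start:])
    return parts


def unescape(text: str) -> str:
    """Remove each escaping backslash, keeping the character it escapes."""
    out: List[str] = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


description, pair = split_unescaped(r"Ratio a\;b; note: x\:y", ";")
name, value = (unescape(p).strip() for p in split_unescaped(pair, ":", 1))
print(unescape(description).strip(), "|", name, "|", value)
# Ratio a;b | note | x:y
```

Splitting a pair with maxsplit set to 1 also means an unescaped colon later in a value is simply kept as part of that value, which is a lenient choice made for this sketch.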

The human descriptive text should not contain a semi-colon, otherwise clients may not parse it correctly. At worst, however, the description will be truncated.

The first un-named attribute is the "alt" attribute.

Attribute names should be written in lower case for simplicity.

 ```A description; attribute1: value 1 ; attribute2: value 2

 This is equivalent to:

    alt="A description"
    attribute1="value 1"
    attribute2="value 2"

 ```
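Putting these remarks together, a minimal Python sketch of a parser might look like the one below. It is an illustration, not a reference implementation, and it makes assumptions of its own: the function name parse_alt_text is hypothetical, CSS escapes are not handled, segments after the first that contain no colon are skipped, an explicit "alt:" prefix on the first segment is removed, and a named alt attribute appearing later overrides the un-named first segment.

```python
from typing import Dict


def parse_alt_text(alt_text: str) -> Dict[str, str]:
    """Parse '<description>(;<name>: <value>)*' into an attribute dictionary."""
    segments = alt_text.split(";")

    # The first segment is the un-named "alt" attribute. An explicit "alt:"
    # prefix is accepted too; other colons stay part of the description.
    first = segments[0].strip()
    name, sep, value = first.partition(":")
    if sep and name.strip().lower() == "alt":
        first = value.strip()
    attributes = {"alt": first}

    for segment in segments[1:]:
        name, sep, value = segment.partition(":")
        if not sep:
            continue  # not a name: value pair, so this sketch ignores it
        # Names and values are whitespace-trimmed; names are lower-cased.
        attributes[name.strip().lower()] = value.strip()
    return attributes


print(parse_alt_text("A description; attribute1: value 1 ; attribute2: value 2"))
# {'alt': 'A description', 'attribute1': 'value 1', 'attribute2': 'value 2'}
```

A legacy label with no semi-colon comes back as a dictionary holding only "alt", which is the backwards-compatible behaviour the proposal asks for.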

Initial attributes

The following attributes are proposed as those that could be of immediate value.

alt

The first un-named attribute is the alt attribute. It may appear elsewhere in the alt text expression in its named "alt: content" form, but should normally come first, in which case the name is not needed. This keeps the scheme backwards compatible while still giving the alt text attribute a name.

content-type

This attribute indicates the type of text held in the preformatted region. This can help clients, user agents and end users to understand and interpret the content correctly, and some may choose to render it in one or more alternative ways. For example:

text/csv - could render as a table with borders

source code - could use syntax highlighting

This indicates the type of *text* shown in the region. It is not to be used to express embedded binary content of any other kind, or extended to arbitrary MIME types. The text encoding of the current page applies.

The MIME type value is not case sensitive.
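A client could use the content-type attribute to pick a renderer, falling back to ordinary preformatted text for anything it does not recognise, so that unknown types degrade to today's behaviour. The sketch below is hypothetical: the function render_hint and the strategy labels ("table", "graph", "plain") are invented for illustration. The types are taken from the examples in this section, and matching is case-insensitive as stated above.

```python
from typing import Optional

# Hypothetical mapping from content-type to a rendering strategy.
RENDER_HINTS = {
    "text/csv": "table",
    "text/tsv": "table",
    "text/vnd.graphviz": "graph",
}


def render_hint(content_type: Optional[str]) -> str:
    """Pick a rendering strategy for a preformatted region."""
    if not content_type:
        return "plain"  # no content-type: render as ordinary preformatted text
    # MIME type values are not case sensitive, so normalise before looking up.
    return RENDER_HINTS.get(content_type.strip().lower(), "plain")


print(render_hint("Text/CSV"))        # table
print(render_hint("text/x-unknown"))  # plain
```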

 ```here is a table in csv;content-type:text/csv

 ```here is some python that your client could show with syntax highlighting; content-type: application/xpython

 ```here is a graph that could be visualised using graphviz; content-type: text/vnd.graphviz

Here is a table example using tab-delimited text (TSV):

 ```Here is a label about the table; content-type: text/tsv
 *    1    2    3
 1    2    3    4
 2    3    4    5
 3    4    5    6
 ```
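As one possible rendering, the Python sketch below turns CSV or TSV lines into a bordered text table using the standard csv module. It assumes the separators in the region really are tab characters (or commas for text/csv), pads short rows, and is only one of many layouts a client might choose; render_table is a hypothetical name.

```python
import csv
from typing import List


def render_table(lines: List[str], content_type: str) -> str:
    """Render CSV or TSV preformatted lines as a bordered text table."""
    # Assumption: text/tsv uses tabs; anything else is treated as CSV.
    delimiter = "\t" if content_type.strip().lower() == "text/tsv" else ","
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return ""
    # Pad short rows so that every row has the same number of cells.
    ncols = max(len(row) for row in rows)
    rows = [row + [""] * (ncols - len(row)) for row in rows]
    widths = [max(len(row[c]) for row in rows) for c in range(ncols)]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"
    body = [
        "| " + " | ".join(cell.ljust(w) for cell, w in zip(row, widths)) + " |"
        for row in rows
    ]
    return "\n".join([border, *body, border])


tsv = ["*\t1\t2\t3", "1\t2\t3\t4", "2\t3\t4\t5", "3\t4\t5\t6"]
print(render_table(tsv, "text/tsv"))
# +---+---+---+---+
# | * | 1 | 2 | 3 |
# | 1 | 2 | 3 | 4 |
# | 2 | 3 | 4 | 5 |
# | 3 | 4 | 5 | 6 |
# +---+---+---+---+
```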

lang

This attribute indicates the language of the content, using a standard ISO 639-1 two-letter code, as used in HTML. This allows content to be quoted in languages other than the one stated in the media type of the current page.

 ``` Some English content; lang:en
 Hello English Speaking World
 ```

 ```Un peu de Français; lang:fr
 Bonjour mes amis
 ```
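A client might sanity-check the lang value before, for example, switching a text-to-speech voice. The Python sketch below (normalise_lang is a hypothetical name) only checks the shape of a two-letter code and lower-cases it. Lower-casing is an assumption of this sketch, and confirming that a code is actually assigned in ISO 639-1 would need the official code list, which is not included here.

```python
import re
from typing import Optional

# Shape check only: exactly two ASCII letters. This does not confirm that the
# code is actually assigned in ISO 639-1.
_TWO_LETTER = re.compile(r"[A-Za-z]{2}")


def normalise_lang(value: Optional[str]) -> Optional[str]:
    """Return a lower-case two-letter language code, or None if malformed."""
    if value is None:
        return None
    code = value.strip()
    if _TWO_LETTER.fullmatch(code):
        return code.lower()
    return None


print(normalise_lang("fr"))   # fr
print(normalise_lang(" EN ")) # en
print(normalise_lang("eng"))  # None
```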

Feedback

Let me know your thoughts and feedback by email, or perhaps through a follow-up post of your own.

luke at marmaladefoo dot com
