Anchors


This is a question about the limitations of the gemini protocol. It isn't a criticism or a complaint, more of ... seeking understanding.


Why doesn't the gemini protocol have anchors? I don't mean the <a href='link'> part of the "anchor" tag, but the <a href='#anchor_on_this_page'> sort.


I remember this was one of the first bits about HTML that struck my interest when I was learning it back in the day. You could have a table of contents at the beginning of a document, and then link within the document to various 'anchors' (or in gemini: headings).


It seems like employing these would incentivize a further minimalisation of traffic. For example, rather than a client having to send new requests to a server for new documents, the links would work locally, reducing bandwidth usage. This is a perfect use case for epubs, FAQs, and reference material that employ a table of contents from that even-simpler technology (books).


geminiprotocol.net/docs/faq.gmi


Note in the link above to the official gemini protocol FAQ... It loads the entire faq.gmi document, and yet following the "table of contents" links at the top opens up a separate page with the same content repeated. This is an increase in bandwidth, as well as an added hurdle when managing the additional files, and updating the text across various files.


Now, those who wrote the gemini protocol made intelligent choices, and intentionally limited the capacity of the protocol for good reasons. I am here because I like those reasons, I think this was a good idea, and I'm excited for it. I'm curious why the anchor-to-location-in-page wasn't among them.


I am not working in the tech field... so it's entirely possible that I simply missed something obvious like "oh, anchors increase extensibility, so eventually you'd end up with too much data in the header request" or something like that. It's possible it was too complicated to figure out how to make '=>' link to one of many potential '# X Y Z' headers that include spaces. Maybe no one thought of it.


But that's the question, submitted out of curiosity: why aren't there intra-document ###anchors that can be destinations for => links?


Posted in: s/Gemini

🌲 Half_Elf_Monk

Apr 17 · 5 weeks ago


23 Comments ↓


🚀 stack · Apr 17 at 16:01:

There are many things that could make Gemini much better at minimum or no cost, but it is what it is.


My pet peeve is that every one of the linetypes requires a different parsing strategy -- it's one or two or three characters, sometimes a space is required, other times not, etc. We could be using a one-line parser instead of a wacky 50+ line conditional monstrosity. It is my conjecture that Solderpunk was high when he wrote the spec -- it is hard to imagine how a coder could make each line type be marked in a distinctly different manner.


🛰️ lufte · Apr 17 at 16:26:

I wonder if this is something that the gemini spec has to explicitly allow. The anchor (fragment) part of a URL is part of the URL standard, and a client could choose to act on it. A clickable table of contents is something that a client could perfectly well do; you could even put it on the side of the document, like in a different pane, so it's always visible.
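As a rough illustration of the client-side table of contents described above, here is a minimal Python sketch. The function name `table_of_contents` and the sample text are made up for this example; it assumes the usual gemtext rules that heading lines start with one to three `#` characters and that lines between preformatting toggles are literal text.

```python
def table_of_contents(gemtext):
    """Return (level, title, line_index) for each heading line in a gemtext string."""
    entries = []
    preformatted = False
    for index, line in enumerate(gemtext.splitlines()):
        if line.startswith("```"):
            # Toggle lines switch preformatted mode on and off.
            preformatted = not preformatted
            continue
        if preformatted or not line.startswith("#"):
            continue
        hashes = len(line) - len(line.lstrip("#"))
        level = min(hashes, 3)  # gemtext defines three heading levels
        title = line[level:].strip()
        entries.append((level, title, index))
    return entries


# Hypothetical sample document; the '#' line inside the fence is not a heading.
sample = "# Guide\nIntro text\n## Install\n```\n# not a heading\n```\n### Notes\n"
for level, title, index in table_of_contents(sample):
    print("  " * (level - 1) + title, "(line %d)" % index)
```

A client could show these entries in a side pane and scroll to `line_index` when one is selected; nothing here requires a change to the protocol.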


🍀 gritty · Apr 17 at 16:29:

I'm sure there's a good reason. I've been on here a couple years and have done okay without them. My guess is keeping with Gopher roots and not HTML.


The spec isn't final-final, so who knows, maybe it'll be added.


🕹️ skyjake [mod...] · Apr 17 at 16:40:

I very much doubt support for anchors will be added. There isn't enough bang for the buck, so to speak. You could define a set of rules for valid anchors and how a client should access them, but this feature is not that often needed nor does it bring that much additional value, so why bother with this complexity?


I suggest solving this on the server side by dynamically splitting larger .gmi files and generating URLs for each part/section as appropriate, in addition to serving the whole file as is. Something like this is already done with the Gemini F.A.Q. document, where you can access the sections as separate pages.
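A minimal sketch of the server-side splitting idea, in Python. This is not how the Gemini FAQ is actually generated; `split_sections` is a hypothetical helper that only shows how a larger .gmi file could be cut at its heading lines so each part can be served under its own URL.

```python
def split_sections(gemtext):
    """Split gemtext into (title, body) pairs, starting a new pair at each heading."""
    sections = []
    title, body = None, []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            preformatted = not preformatted
        is_heading = line.startswith("#") and not preformatted
        if is_heading:
            # Close the section collected so far (if any) before starting a new one.
            if title is not None or body:
                sections.append((title, "\n".join(body)))
            title, body = line.lstrip("#").strip(), [line]
        else:
            body.append(line)
    if title is not None or body:
        sections.append((title, "\n".join(body)))
    return sections
```

Text before the first heading comes back with a title of `None`. How section titles map to URLs is left open here; that choice belongs to the server.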


🚀 stack · Apr 17 at 17:16:

A sane client should currently ignore the anchor part of the URI... Lagrange seems to. If anyone implements it in the browser, it could be a de facto feature. Does the spec explicitly prohibit such syntax?


👤 jdcard · Apr 17 at 18:19:

There is nothing preventing client software from implementing anchors, nor from implementing the similarly useful

— link to text fragments

feature.


🍵 michaelnordmeyer · Apr 17 at 18:49:

Another point against anchors is that URLs can be redirected, but anchors cannot.


Anchors need to match headings, and an author can quickly edit a heading in a published post for whatever reason, and all already-made links for this anchor will point to the top of the document.


🍵 michaelnordmeyer · Apr 17 at 18:50:

I wish I could point to a certain section in longer documents like the specs.


💀 requiem · Apr 17 at 19:30:

Yeah, I’m also thinking that there isn’t anything keeping clients from implementing this. In fact I think that some clients (Lagrange, possibly?) already know how to deal with Markdown-style footnotes. Fairly certain I used them before, and was surprised to see it kind of work.


🚂 MrSVCD · Apr 17 at 19:40:

Essentially Lagrange has support for a toc. If you use headings they will show up in the sidebar and you can click on the heading and go to it in the page.


🚀 poster_boy · Apr 18 at 10:58:

i humbly suggest Geminaut (windows only) https://www.marmaladefoo.com/pages/geminaut#head7


🚀 stack · Apr 18 at 13:07:

but...windows... My condolences.


😺 gemalaya · Apr 21 at 09:12:

A Gemini browser can generate an internal table of contents from the document's headings (some browsers already do that). An important thing to implement would be to parse the URL fragment, and if there's a fragment, scroll to the matching heading in the document?


💀 requiem · Apr 21 at 13:36:

It would need to remember which line it found what fragment and scroll to that line. As Gemini is designed to be a "parse line by line" format, this should be achievable with minimal effort. I would personally do this with inline, markdown-style "footnotes", so "footnote [^1] " inline and then "[^1]: Footnote text" on new lines at the end. The only question is, how you could create anchors for e.g. headings. Perhaps:

[^Anchor name] # Heading text ?

But then how does this render in plaintext?


Problem is that "extra formatting" like this seems to go against gemtext's philosophy.


There's a new revision of the Gemini specification coming; and there may yet be a few more


💀 requiem · Apr 21 at 13:38:

(I say it goes against gemtext's philosophy because the markup seems to favour an "all markup is content" philosophy; there aren't elements that get always hidden like HTML's tags. Inline anchors, the way we are used to them, rely on attributes that get hidden during rendering, something that doesn't really exist in gemtext, nor does it seem desirable).


💀 requiem · Apr 22 at 16:52:

Another possible way to implement anchors occurred to me:


Extend the => property to work with any content. That way a => Link could either have a link to follow, or, if it is followed by markup in the link text, and if the URL wasn't a path or an external url but simply a #keyword then it could serve as an anchor. E.g.:

=> #anchor-to-heading-1 # This is the First Heading
The above text is rendered normally as a heading, but there's an anchor attached to it.

To scroll to it you can click here:
=> index.gmi#anchor-to-heading-1 Click here to go to anchor
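
For what it's worth, a client could tell the two kinds of link line apart with very little code. The Python sketch below is only an illustration of the proposal above, which is not part of any gemtext specification; `parse_link_line` is a hypothetical name, and it assumes that "markup in the link text" means a label that itself starts with `#`.

```python
def parse_link_line(line):
    """Classify a '=>' line as an anchor definition or an ordinary link."""
    if not line.startswith("=>"):
        return None
    parts = line[2:].strip().split(None, 1)
    if not parts:
        return None
    target = parts[0]
    label = parts[1] if len(parts) > 1 else ""
    if target.startswith("#") and label.startswith("#"):
        # '=> #name # Heading': render the label as a heading, remember the name.
        return ("anchor", target[1:], label)
    return ("link", target, label)
```

A client that does not know the proposal would still show the first form as a link whose label looks like a heading, which is the plaintext rendering question raised earlier.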

😺 gemalaya · Apr 22 at 21:03:

@requiem In my mind there was no need for a specific syntax to set the anchor name because you could derive it from the heading. So this heading:


This is the heading


would have this anchor: "this-is-the-heading" and could be accessed with a URL like:


— geminiprotocol.net/doc.gmi#this-is-the-heading


This is not ideal though because if the heading changes, URLs pointing to it with a fragment must also be changed.
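
A minimal Python sketch of deriving anchors from headings, under the assumption that the anchor is the lowercased heading text with its words joined by hyphens. `heading_slug` and `anchor_index` are hypothetical names; no client is known to use exactly this rule.

```python
import re


def heading_slug(title):
    """Lowercase the title and join its word runs with hyphens."""
    return "-".join(re.findall(r"\w+", title.lower()))


def anchor_index(gemtext):
    """Map each heading slug to the index of the first line that produces it."""
    index = {}
    preformatted = False
    for number, line in enumerate(gemtext.splitlines()):
        if line.startswith("```"):
            preformatted = not preformatted
        elif line.startswith("#") and not preformatted:
            # setdefault keeps the first heading when two headings share a slug.
            index.setdefault(heading_slug(line.lstrip("#")), number)
    return index


print(heading_slug("This is the heading"))  # this-is-the-heading
```

With the index built once while the document is parsed line by line, following a fragment is a dictionary lookup plus a scroll to the stored line. A missing key simply means the page is shown from the top.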


💀 requiem · Apr 23 at 07:15:

Hmm, yeah. Another issue is that you might want to point to things other than Headings. In lieu of inline links I would really like footnotes at least… An inline anchor would be grand.


🕹️ skyjake [mod...] · Apr 23 at 07:24:

You all might also be interested in the discussion about this topic in GitLab. Here's a link to my proposal from 2 years ago:

— https://gitlab.com/gemini-specification/gemini-text/-/issues/3#note_619188413


💀 requiem · Apr 25 at 23:57:

Speaking of anchors and inline texts, it's not _really_ anchors, but here's a little thing I made to create footnotes in Gemtext.

— See example output here.


"""
Turn generic footnote markers in Gemtext into numbered superscripts.

Write this:

    Place any footnote markers in your text.[^FN]
    Then put some footnote links at the end.
    --
    => gemini://vigilia.cc [^FN] The coolest place in geminispace

and get nicer footnotes:

    Place any footnote markers in your text.¹
    Then put some footnote links at the end.
    --
    => gemini://vigilia.cc ¹ The coolest place in geminispace
"""

import itertools
import re

# A [^FN] marker anywhere in running text.
FOOTNOTE_INLINE = re.compile(r'(\[\^FN\])', re.M)
# A link line whose label starts with [^FN]: "=> URL [^FN] description".
FOOTNOTE_END = re.compile(r'(?P<link>=>\s\S+\s)(?P<fn>\[\^FN\])(?P<desc>.*)', re.M)

# Translation table from ASCII digits to Unicode superscript digits.
SUPERSCRIPT_DIGITS = str.maketrans("0123456789", "⁰¹²³⁴⁵⁶⁷⁸⁹")

def footnote_numbering(n):
    """Return n written in superscript digits, e.g. 12 -> '¹²'."""
    return str(n).translate(SUPERSCRIPT_DIGITS)

def prettier_footnotes(body):
    """Number the footnotes in body.content (a str) in order of appearance."""
    # Fresh counters on every call, so each document starts again at 1.
    end_numbers = itertools.count(1)
    inline_numbers = itertools.count(1)

    def process_end_footnotes(match):
        number = footnote_numbering(next(end_numbers))
        return match.group("link") + number + match.group("desc")

    def process_inline_footnotes(match):
        return footnote_numbering(next(inline_numbers))

    # Link lines go first; otherwise the inline pattern would also consume their markers.
    body.content = FOOTNOTE_END.sub(process_end_footnotes, body.content)
    body.content = FOOTNOTE_INLINE.sub(process_inline_footnotes, body.content)
    return body

🌲 Half_Elf_Monk [OP] · May 02 at 16:13:

Thank you for all the excellent responses. I appreciate you. I think I have my answer: "A table of contents could (should?) be taken care of by the client." So I'll reiterate (and then drop) a case for putting something about anchors in the spec so that clients could standardize what they're doing. I hope this doesn't sound like complaining so much as a suggestion which, imho, is in line with the purposes of the protocol. And then I'll be done, thanks for your time, sorry for the bother.


Simplicity


Anchors seem like they would further the smol aims of a smoller web. They're simple to use. The idea of dynamically splitting documents apart to make FAQs is clever and impressive, but doubles the compute / data footprint of a site. It would take someone of @skyjake's talents to pull off. Regular end-users (who don't know cgi) could easily figure out anchors with the text editing program they're already using.


Anchors reduce clutter in site storage directories. If you've ever played any one of those 900+ location Interactive Fiction adventures on gemini, you have to wonder what the directory which holds all those files looks like. Anchors would allow IF to exist in one file... which, once downloaded, is navigated entirely client-side.


Yes, links might get broken if the name of a heading changes. But then again, links would also get broken if you're dynamically splitting documents. That's simply part of managing static sites.


Less Bandwidth


All the activity for anchor navigation occurs client-side... therefore it reduces the amount of bandwidth needed for a data transfer. If a smoller web/protocol is trying to minimize/conserve resources, intra-file anchors allow for a significant reduction in bandwidth usage over the protocol in the first place. That reduces the overhead load on whatever infrastructure gets gemini packets from the server to the client, because it gives the option to transfer one bigger file instead of multiple smaller ones. Fewer instances of transfer might also reduce the attack surface for a MITM attack? Idk.


Compare the overall network-wide compute load needed for: " curl gemini://address " vs running " /#heading3 " locally. Which is a more minimalistic, energy-conserving approach?


Potential arguments against


I see two (three?) downsides. a) If Solderpunk et al. just don't want to do it. Fine I guess, but bleh. b) an #anchor in the request might include too much information in the header, which makes the protocol more exploitable (solved by my solution below) and c) it changes the psychological effects of browsing geminispace.


That last one makes the most sense to me. If browsing a giant document with anchors makes it feel less gopher-ey, and that's what we're here for... then I could understand that. The difference between reading a physical book (where you have to turn pages) and scrolling through that same text on a phone (with up-and-down swiping to scroll) is measurable. We tend to remember the information from physical pages better (citations do exist for this, plz go find them yourself). Thus a gopher-ey experience of each screen showing a different "location" is qualitatively different. Fine. I'd be curious what anyone thinks of that.


Implementation Idea


There are no new line types.


Each #heading, ##subheading, and ###triple-subheading is also an anchor. There is no anchor name/title/id, you just get whatever comes after the octothorpe(s) in that heading line. Anchors are not invisible.


The Link line type can point to a new document, or a heading in the current document, or to a heading in a new document. It does so by putting the name of the #heading, ##subheading, or ###triple-subheading after the location of the file to which it links, like this:


> => gemini://gem.example.org/anchors.gmi##subheading Link Title


Anchor data is never sent as part of a request, so it does not ever need to pass through a server. Nor would any server software really need updates. This is something that occurs 100% in the client program.


When a link linetype is acted on by the user, the client would have to look at the tail end of the link line for any octothorpes. If it finds one (or more), it would parse out the file_address and the #text_string that follows it. It sends a request for the file_address as normal.


Then when that file loads, the client tries to jump to the first instance of that octothorpe+text_string it finds. If it finds none, it loads as normal. If no file_address is provided in the link, it'd search for that #text_string in the current document and jump to that point. If no #text_string is provided, or if it can't find any matching #text_strings in the document, this is the current situation and it loads the page as normal.
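
The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not a tested client feature: `split_target` and `find_anchor_line` are hypothetical names, and the sketch assumes a heading line matches when its octothorpes plus its text, with the space after the octothorpes ignored, equal the #text_string.

```python
def split_target(target):
    """Split a link target into (file_address, anchor) at the first octothorpe."""
    address, hash_sign, rest = target.partition("#")
    return address, (hash_sign + rest) or None


def find_anchor_line(gemtext, anchor):
    """Return the index of the first heading line matching the anchor, else None."""
    if not anchor:
        return None
    preformatted = False
    for number, line in enumerate(gemtext.splitlines()):
        if line.startswith("```"):
            preformatted = not preformatted
        elif line.startswith("#") and not preformatted:
            hashes = line[: len(line) - len(line.lstrip("#"))]
            if hashes + line[len(hashes):].strip() == anchor:
                return number
    return None
```

Only the `file_address` half is ever sent to a server. An empty `file_address` means "search the current document", and a result of `None` from `find_anchor_line` means the page loads from the top as it does today.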


The server software shouldn't *ever* need to deal with something like "gemini://link.to/file.gmi#My%20Second%20Point" because a) that's gross and b) there's no need for the post-octothorpe anchor bits of the request to ever leave the client. Less bandwidth.


Yet I'm guessing server software might want a way to handle those requests by filtering out what's not needed. There's no way to rely on every client stripping the ##anchors from the link, so generous servers could strip the ##subheading part off before interpreting file requests.
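
A generous server of that kind would need very little code. The Python sketch below uses the standard library's `urllib.parse.urldefrag`; `strip_fragment` is a hypothetical helper, and whether a server should tolerate such requests at all is a separate question.

```python
from urllib.parse import urldefrag


def strip_fragment(request_url):
    """Return the request URL with everything from the first '#' onward removed."""
    url, _fragment = urldefrag(request_url)
    return url


print(strip_fragment("gemini://link.to/file.gmi#My%20Second%20Point"))
```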


If you want anchors that include spaces, you're either out of luck, or need to come up with a simple way to preserve the ability to have fancy link titles, #heading-anchors, and space characters that all play nicely together. I'm sure there's a solution, but I doubt it's as simple as what I'm suggesting here.


TL;DR / end


Headings that don't contain space characters can be linked to like anchors from the => link linetype, purely within the client. This simple addition saves bandwidth, and allows for book-like readability features such as a ToC and endnotes.


I'm not mad or complaining; just excited about the protocol. Thanks for your time.


🌲 Half_Elf_Monk [OP] · May 02 at 16:17:

Also @skyjake that gitlab link is helpful discussion, and provided another interesting read. thx


🛰️ lufte · May 02 at 21:35:

I like your proposal but also I would like to consider a couple of additions.


First, the initial hash sign in the URL must not denote the level of "heading" we're trying to link, but rather is the standard fragment delimiter in the URL: see https://en.wikipedia.org/wiki/URL.


Second, the actual fragment should be URL-encoded. This would mean that for a triple subheading you would require the hash symbol that separates the fragment plus 3 hash symbols for the header, but these are encoded. This would also allow for spaces or any other text in the header.


Finally, I think clients are already forced to not send fragments in their requests and servers are forced to reject requests that include one, so that part is covered.


Example: to link to "## My second level header" in "capsule.org/home" you would request "gemini://capsule.org/home#%23%23%20My%20second%20level%20header".
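
The encoding in that example can be reproduced with Python's standard library. This is a small sketch with hypothetical helper names (`fragment_link`, `heading_from_link`), assuming the whole heading line, octothorpes included, is percent-encoded into the fragment.

```python
from urllib.parse import quote, unquote, urldefrag


def fragment_link(base_url, heading_line):
    """Build a link whose fragment is the percent-encoded heading line."""
    return base_url + "#" + quote(heading_line, safe="")


def heading_from_link(url):
    """Recover the heading line a client should look for from a link's fragment."""
    return unquote(urldefrag(url).fragment)


link = fragment_link("gemini://capsule.org/home", "## My second level header")
print(link)
print(heading_from_link(link))
```

Only the first `#` acts as the fragment delimiter; the ones belonging to the heading travel as `%23`, so spaces and any other characters in the heading survive the round trip.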


-- Page fetched on Sun May 19 21:12:42 2024