-- Leo's gemini proxy

-- Connecting to gemini.thegonz.net:1965...

-- Connected

-- Sending request

-- Meta line: 20 text/gemini

Arboreal navigation

Though I'd hoped my last post would be the last, I'm afraid this is another diohsc design post. Predictably, my TODO list didn't stay empty for long. In particular, dogfooding and thinking through the old systems' inadequacies led to an overhaul of navigation and history, and this post is about that.


Linearisation

First, the standard model: you navigate from one location to locations it links to, or occasionally jump to an unlinked location by entering a URL or selecting a bookmark. Each new location is appended to a linear chain of history, which you can go back and forth along. Having gone back, navigating to a new location branches off the chain, and the old branch is forgotten.


This is simple and familiar, and at first diohsc unthinkingly copied it.


The aim

But what is it we actually want? If we think of hyperlinked documents as forming a directed graph whose edges are the links, the aim is to explore this graph. This naturally means finding a spanning tree. Often the graph is already a tree, as when the filesystem determines the links, and I suspect users intuitively think in these terms even when there are loops. So, we can concentrate on the case of traversing a tree starting from its root.


The traditional "linearising" model above achieves this in a natural way: you follow some path in the tree, then go back and branch off to another path, and so on. So the linear history always consists of the path from the root to the current node.


Jumpiness

However, in a command-oriented client like diohsc, this approach is needlessly limiting. There's no need to interactively step back along the path and then branch off somewhere new, instead we can specify the desired link in a single command. Diohsc offers a few ways to do that, but I'll concentrate on two particularly intuitive cases.


Firstly, you can add links from one node to the queue, and then later, perhaps having navigated to a quite different part of the tree, jump straight to them.

Secondly, with a feature kindly suggested by Ben of kwiecien.us, you can refer directly to the next/previous links from history items; e.g. "<<]" shortcuts going back twice then selecting the next link.


So now we're jumping around the tree rather than strictly following its edges. This is nice, but it means that our linear history, in which we simply add the jumped to node on to the end of the chain, no longer reflects the tree structure. The history isn't a path from the root, but rather some fairly arbitrary sequence of nodes.


This is a particular problem for the "<<]" notation -- we'd like to interpret it as referring to an "aunt" of the current node, two up in the tree then one down one along -- but if "<<" is whatever node we happened to be on two back in the history, it might be something quite different.


This was the confusing state of navigation in diohsc until yesterday.


Caching

There's a related problem, which is more gemini-specific. Web browsers typically cache every page they see up to some memory limit, and will show the cached version rather than making a GET rerequest. This means the user never knows whether a request will be triggered or not -- the cached version might have since been deleted from the cache. But that isn't much of a problem, because GET requests are guaranteed idempotent, so there's little harm in making a request to refresh an expired cache item. (I won't say "no harm" -- there's still the privacy issue of unexpectedly revealing that you're revisiting the item.)


In gemini, there's no GET/POST distinction, and any request potentially has side-effects. So we must avoid "surprise request" situations, where the user expects to get a cached version but a request is made instead. So the obvious approach of caching the last N items, or the last N bytes worth of items, is no good -- the user is likely to be surprised by requests for items which have fallen out of the cache.


Solution

Here I'll describe a recently implemented solution to these problems.


The main idea is to specifically maintain the "path through the tree" discussed above, alongside but separate from a naive linear history.


So each node is situated somewhere along such a path. This is implemented as a doubly linked list, with each node having optional "parent" and "child" nodes, with loops disallowed.


Now "<" means parent, and ">" means child, and "]" means next sibling of child, i.e. the next link after the link leading to the child (if any). So now "<<]" consistently refers to an "aunt", as desired, and its "<" parent is our "<<" grandparent.


Queue items also get placed properly in the tree: when you add a link to the queue, the node containing the link is stored as the parent of the queue item.


So conceptually, our location is always somewhere along a path in a tree, and when we jump we jump to a new such path.


This gives an immediate answer to the problem of caching. The nodes along the path of a node are all stored in full, so the collection of reachable nodes form the cache. Going to an item along the current path (whether explicitly with e.g. "<5" or by following a link which refers to an uri along the path) shifts us to that node in the path rather than making a fresh request. The "repeat" command forces a request. In theory this does mean an unbounded cache, but in practice the trees seem not to get very deep. Moreover, we're caching only pages with links, i.e. gemtext, which is generally not heavy. There's no need to explicitly delete old cached nodes -- haskell's garbage collection automatically frees unreachable nodes.


Sometimes it is good to undo a jump, so after each jump an automatic mark "''" is set to the node we jumped from. You can also set numbered marks like "'4"; these are not saved to disc, and refer to a node rather than just an uri so it and its ancestors are cached.


It is also nice to have a naive linear history, but it can consist just of URIs rather than cached nodes, so can be quite long. We call this the "log", and it comes with a new target notation '$'.


Once we have this log, it also makes sense to differentiate between visited (i.e. logged) and unvisited URIs when displaying links. For efficiency, this is actually done with a separate set of hashes of URIs rather than searching through the log, but it comes to essentially the same thing. Links to unvisited URIs are displayed bolded, and moreover we have new target variants which select only unvisited links; in particular, "<}" selects the next *unvisited* sibling.


Help

Well, I seem to have written quite a lot in what was meant to be a concise exposition. Which makes me worried that it may be difficult to teach the user about these unfamiliar concepts. The terminology is already difficult -- should I assume users have enough of a maths/CS background to fluently understand tree terminology, and moreover that they draw their trees the same way up I do? I think not. Then there's the difficulty of differentiating between "up" in this tree versus "up" in the sense of removing a path segment, i.e. the relative URI "..", or between "back" along the path and "back" in the log. Currently I've written the help text using "origin" in place of "up", "location of which the base is the origin"(!!) in place of "down", and put a very terse explanation in the help along with some examples, and left the rest to experimentation. I *think* it shouldn't be too hard to intuit what's going on with a bit of playing around.


Feedback welcome, especially on the learnability!

-- Response ended

-- Page fetched on Fri Mar 29 07:54:45 2024