
This Gemini Server (part one)


I've been at least a little bit interested in Gemini since I first found out about it. Originally I was going to write a Gemini client, and I started doing that years ago, but never finished.


This time I wrote a Gemini server instead. It's really trivial. It's written in Rust and uses various crates for things:


Rustls for the TLS implementation

TLS / x.509 self-signed certificate generation (sketched just after this list)

Async runtime

Tokio support for Rustls

URL parsing (I might drop this)

Argument parsing

URL percent decoding


And various others.
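

For illustration, the certificate generation is only a few lines with a crate like rcgen (this is a sketch using the 0.12-style API, not necessarily the exact crate or code my server uses; the hostname is just an example):

```
use rcgen::generate_simple_self_signed;

fn make_cert() -> Result<(), rcgen::Error> {
    // One subject-alt-name covering the server's hostname (example value).
    let certified = generate_simple_self_signed(vec!["example.org".to_string()])?;
    // PEM-encode the certificate and private key so they can be written out
    // once and then made read-only.
    let cert_pem = certified.cert.pem();
    let key_pem = certified.key_pair.serialize_pem();
    std::fs::write("cert.pem", cert_pem).expect("write cert");
    std::fs::write("key.pem", key_pem).expect("write key");
    Ok(())
}
```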


It just serves files straight from the filesystem. I haven't done anything fancy with splice(2) to serve more efficiently (thinking about it, I guess that would need kTLS anyway, which I also haven't touched).


It applies a timeout to both receiving the request and sending the response.


I haven't put a concurrency limit in place yet, but maybe I will at some point.
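

Both limits are cheap to express with Tokio. A rough sketch (illustrative names and numbers, not my actual code):

```
use std::sync::Arc;
use tokio::sync::Semaphore;
use tokio::time::{timeout, Duration};

// Illustrative handler; not the real server's code.
async fn handle(_stream: tokio::net::TcpStream) { /* read request, send response */ }

async fn serve(listener: tokio::net::TcpListener) {
    // Cap concurrent connections at 64; a permit is released when dropped.
    let limit = Arc::new(Semaphore::new(64));
    loop {
        let Ok((stream, _addr)) = listener.accept().await else { continue };
        let limit = Arc::clone(&limit);
        tokio::spawn(async move {
            // Wait for a free slot (queueing; rejecting would be the other option).
            let _permit = limit.acquire().await.expect("semaphore closed");
            // Drop the whole exchange if it takes longer than 10 seconds.
            let _ = timeout(Duration::from_secs(10), handle(stream)).await;
        });
    }
}
```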


Lessons from this implementation


Protocol


Firstly, Gemini really really is a very simple protocol to implement. My first implementation didn't use any async; it just spawned a thread for each connection, and frankly that would have been totally fine to stick with.


Since each request is on its own connection, with its own TLS session, and there's no keep-alive to receive multiple requests on a connection, there is basically no connection management to do. And since the request is a single URL on a single CRLF-terminated line, request parsing is pretty easy.
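

To show how little there is to it, here's roughly the shape of that threaded version, with the TLS layer omitted (a sketch, not my actual code):

```
use std::io::{BufRead, BufReader, Read};
use std::net::TcpListener;

// The Gemini spec caps the request URL at 1024 bytes, plus CRLF.
const MAX_REQUEST: u64 = 1024 + 2;

fn read_request_line(stream: impl Read) -> std::io::Result<String> {
    let mut line = String::new();
    // Limit how much we'll read so a misbehaving client can't feed us forever.
    BufReader::new(stream).take(MAX_REQUEST).read_line(&mut line)?;
    // Strip the trailing CRLF to leave just the URL.
    Ok(line.trim_end_matches("\r\n").to_string())
}

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("0.0.0.0:1965")?;
    for conn in listener.incoming() {
        let stream = conn?;
        // One thread per connection; each connection carries exactly one request.
        std::thread::spawn(move || {
            // TLS handshake omitted: a real server wraps the stream in rustls first.
            if let Ok(url) = read_request_line(&stream) {
                let _ = url; // ... map the URL to a file and write the response
            }
        });
    }
    Ok(())
}
```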


Oh... but request parsing is actually not *quite* so easy. It's easy except that you still need to do percent-decoding and perhaps normalise the request. And for a "file server" type server it's important that it won't try to read outside its designated serving path, so you have to carefully map from the requested path to a file-system path. I'm not sure I've done that fully safely yet.
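

The mapping I want is roughly this shape. A sketch assuming the percent-encoding crate, which notably still doesn't defend against symlinks inside the root (part of why I'm not sure it's fully safe):

```
use percent_encoding::percent_decode_str;
use std::path::{Component, Path, PathBuf};

// Map a raw URL path onto the serving root, rejecting traversal.
// Sketch only: it does not defend against symlinks inside the root.
fn resolve(root: &Path, raw_url_path: &str) -> Option<PathBuf> {
    // Percent-decode first, so "%2e%2e" can't sneak past as "..".
    let decoded = percent_decode_str(raw_url_path).decode_utf8().ok()?;
    let mut out = root.to_path_buf();
    for comp in Path::new(decoded.as_ref()).components() {
        match comp {
            // Ordinary segments get appended under the root.
            Component::Normal(seg) => out.push(seg),
            // The leading "/" and "." segments are harmless; skip them.
            Component::RootDir | Component::CurDir => {}
            // ".." (and, on other platforms, prefixes) could escape: refuse.
            _ => return None,
        }
    }
    Some(out)
}
```

Canonicalising the result with std::fs::canonicalize and checking it still starts with the root would catch symlinks too, at the cost of extra system calls.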


Building and deploying the server


My little VPS host runs Debian 11 (Bullseye), which ships an older glibc than my personal machines run (I don't know the exact version offhand). Rust statically links the crate code, but it dynamically links against glibc (which is correct as far as I know; statically linking glibc seems like a bad idea, maybe?). So just making a release build on my local machine and then uploading the binary didn't work: the glibc versions weren't compatible.


To build it, I use the Debian 11 docker image and build inside a container using podman. This is fine, but I haven't automated it yet. I basically just do:


Make a temp clone of the server repo so I don't mess up my actual repo by mistake.

Use `cargo vendor` to grab all the deps (not actually essential). Put the config output into .cargo/config.toml so the vendored copies of stuff will be used.

Create and start a debian bullseye container. Volume mount my code directory read-only.

Also volume-mount a fresh output directory (read-write).

apt update, upgrade, install 'git', 'build-essential', 'curl'.

Create a user `builder` to do the build, and `su -l builder`.

Follow the `curl | sh` instructions from rustup.rs to install the latest stable Rust.

Copy the code into a tmp directory.

Build with `cargo build --release`.

Copy the generated binary to the output directory to get it out of the container.

Stop and remove the container.


I would like the container stuff to all be done on tmpfs, but I haven't looked up all the correct incantations for that yet.


Anyway, the result is a binary built with the latest stable Rust, but on a Debian 11 system with Debian 11's GCC and glibc. That works great: I can just copy the binary up to my little VPS and run it.


Then, server setup on the VPS:


Unique unprivileged user to run the server, of course.

Server is managed by systemd using a simple .service file.

I let the server generate its certificate, and then chmodded the certificate and its directory read-only to reduce the risk of accidentally wiping it.


Async in Rust


Rust async remains a mixture: really great and nice to use, but also kind of annoying in the number of dependencies you need to pull in and in the types involved.


The best way I've found of dealing with it continues to be to follow the old pure functional programming guidance: write a pure functional core (which does no I/O itself, and so doesn't need to care about async at all), and then wrap that with an async I/O 'driving' layer that does all the messing around with futures, timeouts, and read/write plumbing.
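

Concretely, the split looks something like this (a sketch with illustrative names, not my actual code):

```
use tokio::io::{AsyncReadExt, AsyncWriteExt};

// Pure core: decide on a response from the request alone. No I/O, no async,
// so it's trivially unit-testable.
fn respond(request: &str) -> String {
    match request {
        "gemini://example.org/" => "20 text/gemini\r\n# Hello\r\n".to_string(),
        _ => "51 Not found\r\n".to_string(),
    }
}

// Async shell: all the awaiting, buffering, and error handling lives here.
async fn drive(mut stream: tokio::net::TcpStream) -> std::io::Result<()> {
    // 1024-byte URL plus CRLF; a real driver would loop until it sees CRLF.
    let mut buf = vec![0u8; 1026];
    let n = stream.read(&mut buf).await?;
    let request = String::from_utf8_lossy(&buf[..n]);
    let reply = respond(request.trim_end());
    stream.write_all(reply.as_bytes()).await?;
    stream.shutdown().await
}
```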


Possible future stuff


It might be nice for the server to cache file content? Except... maybe not, since the operating system already caches file content, so it would only matter if the request rate were high enough that caching in the server's memory lets you skip some system calls.


It might be nice to make the server a systemd "socket-activated" unit, which would mean it could run in quite an isolated environment.
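

The Rust side of that is small. Assuming the listenfd crate, picking up a systemd-passed socket might look like this (a sketch; I haven't implemented it):

```
use listenfd::ListenFd; // assumed crate for sd_listen_fds(3) support

fn get_listener() -> std::io::Result<std::net::TcpListener> {
    let mut fds = ListenFd::from_env();
    // If systemd passed us a socket (socket activation), use it;
    // otherwise bind normally so the server still runs standalone.
    match fds.take_tcp_listener(0)? {
        Some(listener) => Ok(listener),
        None => std::net::TcpListener::bind("0.0.0.0:1965"),
    }
}
```

With that, the socket unit does the binding, so the server process itself never needs permission to bind the port.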


It might be nice to find a way for the server to load its certificate without needing to retain access to that part of the filesystem. That way the server could be run in an environment in which the only files it can see at all (the only files in its mount namespace) are the files it's allowed to serve.
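

One simple shape for that: read the key material into memory once at startup, before entering the restricted namespace, and never touch those paths again. A sketch (illustrative names):

```
// Hold the PEM data in memory for the life of the process, so the
// certificate files don't need to be visible after startup.
struct TlsMaterial {
    cert_pem: Vec<u8>,
    key_pem: Vec<u8>,
}

fn load_tls(cert_path: &str, key_path: &str) -> std::io::Result<TlsMaterial> {
    Ok(TlsMaterial {
        cert_pem: std::fs::read(cert_path)?,
        key_pem: std::fs::read(key_path)?,
    })
}
```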


It might be nice for the server to include some more dynamic stuff, e.g., I'll probably extend the existing /info endpoint to show some server stats.


It would be good to clean up the logging a lot; it's a total mess right now.


Part Two
