
shit.cx


Announcing a Gemini Monitoring Service


2020-12-11T19:44


After seeing a post to the Gemini mailing list by Stephane Bortzmeyer announcing a Nagios plugin he wrote, I began thinking about monitoring Gemini sites.


My capsule is currently monitored by AWS Route 53. It just checks that port 1965 is open. It riddles my access logs with errors, but it works.


A server obviously can't monitor itself, and it's very wasteful for each of us to run two servers just to know everything is still up. The Gemini community is remarkably tight-knit, so for us a better way may be to monitor each other's servers. Having a few servers testing your capsule from a few different locations will provide good coverage even when some of the systems performing the checks are down.


I took a look at Nagios but quickly ran away after being presented with:


> The following additional packages will be installed:

> apache2 apache2-bin apache2-data apache2-utils libapache2-mod-php libapache2-mod-php7.3 libaprutil1-dbd-sqlite3 libaprutil1-ldap libdbi1 libjs-jquery libnet-snmp-perl libpq5 libradcli4 libsnmp-base libsnmp30 libtirpc-common libtirpc3 monitoring-plugins monitoring-plugins-basic monitoring-plugins-common monitoring-plugins-standard nagios-images nagios4-cgi nagios4-common nagios4-core php-common php7.3-cli php7.3-common php7.3-json php7.3-opcache php7.3-readline python-crypto python-gpg python-ldb python-samba python-tdb rpcbind samba-common samba-common-bin samba-dsdb-modules smbclient snmp


Gemini is light; I want a lightweight monitoring solution. I started to think about what the smallest, lightest monitoring solution could be.


It really only needs to check that a response code of 20 is returned within a few seconds. Unlike on the web, speed is pretty much irrelevant. Content is static, so that doesn't need checking. If one page can be returned, it's pretty safe to say they all can be. If this isn't enough, a CGI script can be checked.
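As a rough illustration of the check described above, here is a minimal Python sketch. It is not the code from the healthchecker (which is a shell script); the function names `parse_status` and `is_healthy` are made up for this example, and skipping certificate verification is an assumption made because many capsules use self-signed certificates.

```python
import socket
import ssl


def parse_status(header: bytes) -> int:
    """Return the two-digit status code from a Gemini response header."""
    line = header.split(b"\r\n", 1)[0].decode("utf-8", errors="replace")
    status = line[:2]
    if len(status) != 2 or not (status.isascii() and status.isdigit()):
        raise ValueError(f"malformed header: {line!r}")
    return int(status)


def is_healthy(host: str, path: str = "/", port: int = 1965,
               timeout: float = 2.0) -> bool:
    """True if the capsule answers with status 20.

    `timeout` applies to each socket operation, not to the whole exchange.
    """
    # Assumption: certificates are not verified in this sketch.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                # A Gemini request is the absolute URL followed by CRLF.
                tls.sendall(f"gemini://{host}{path}\r\n".encode("utf-8"))
                # Header: status, space, up to 1024 bytes of meta, CRLF.
                header = b""
                while b"\r\n" not in header and len(header) < 1029:
                    chunk = tls.recv(1029)
                    if not chunk:
                        break
                    header += chunk
                return parse_status(header) == 20
    except (OSError, ValueError):
        # Connection refused, TLS failure, timeout, or a malformed header.
        return False
```

Calling `is_healthy("example.org")` (a placeholder hostname) would return `True` only for a status 20 response; redirects, errors and timeouts all count as unhealthy.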


I would write a loop. Each iteration would check the status of one target. If the target was down, it would quickly recheck a few times to see whether the problem was transient. If every check failed, an alarm would be sent by email. When the target recovered, a recovery email would be sent.
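The alarm and recovery logic above can be sketched as a small state transition. This is a Python illustration under assumptions, not the actual script: `confirm_down` and `step` are hypothetical names, the retry count and delay are guesses, and `notify` stands in for whatever sends the email.

```python
import time
from typing import Callable


def confirm_down(check: Callable[[], bool], retries: int = 3,
                 delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> bool:
    """Recheck a failing target; True only if every retry also fails."""
    for _ in range(retries):
        sleep(delay)
        if check():
            return False  # the failure was transient
    return True


def step(was_up: bool, check: Callable[[], bool],
         notify: Callable[[str], None], retries: int = 3,
         delay: float = 1.0,
         sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run one check and return the new state (True means up).

    `notify` is called only on transitions: "alarm" when an up target is
    confirmed down, "recovery" when a down target answers again.
    """
    if check():
        if not was_up:
            notify("recovery")
        return True
    if not was_up:
        return False  # already known to be down; no repeated alarm
    if confirm_down(check, retries, delay, sleep):
        notify("alarm")
        return False
    return True  # transient blip, still considered up
```

Because notifications fire only when the state changes, a capsule that stays down for an hour produces one alarm and one recovery message rather than one per check.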


The state of each check would be handled by Redis. A Redis key with an expiration would act as a temporary lock. While a lock existed, the loop would continue without checking the target. Once it expired, the target would be tested and another temporary lock would be made. Most of this was hacked up on the first evening. Most importantly, it's a simple, lightweight 100-line shell script. All the heavy lifting is offloaded to purpose-built tools.
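To make the expiring-lock idea concrete, here is a Python sketch that uses an in-memory stand-in for Redis so it runs without a server. `ExpiringKeys` and `due_targets` are hypothetical names. With a real Redis, the same "create only if absent, with a TTL" behaviour is what `SET key value NX EX seconds` provides; whether the script uses that exact command is an assumption. This sketch also takes the new lock just before testing rather than after, which is a simplification.

```python
import time
from typing import Callable, Dict, Iterable, List


class ExpiringKeys:
    """In-memory stand-in for Redis keys that expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def set_if_absent(self, key: str, ttl: float) -> bool:
        """Create `key` with a TTL unless a live key exists.

        Returns True if the key was created, False if it is still held.
        """
        now = self._clock()
        expiry = self._expires_at.get(key)
        if expiry is not None and expiry > now:
            return False
        self._expires_at[key] = now + ttl
        return True


def due_targets(targets: Iterable[str], locks: ExpiringKeys,
                interval: float = 60.0) -> List[str]:
    """Return the targets whose lock has expired, re-locking each one."""
    return [t for t in targets if locks.set_if_absent(f"lock:{t}", interval)]
```

The main loop can then spin freely: a target only shows up in `due_targets` once per interval, so the loop itself needs no per-target timers.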


Yesterday I put it on the server, then built out a status page that shows the state of all the tests. It's a gemtext document that is generated every minute by cron. This file will form the foundation to watch the watcher. If the file hasn't been generated recently enough, or it shows that checks aren't being performed, then an alarm can be raised for me to investigate.
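Watching the watcher could look something like the following Python sketch. The page layout here (a "Generated:" line followed by one list item per target) is invented for the example and is not the format of the real status page; `render_status`, `generated_at` and `is_stale` are hypothetical names, and the five minute threshold is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

PREFIX = "Generated: "


def render_status(states: Dict[str, bool], generated: datetime) -> str:
    """Render check results as a small gemtext page."""
    lines = ["# Status", "",
             PREFIX + generated.isoformat(timespec="seconds"), ""]
    for target in sorted(states):
        lines.append(f"* {target}: {'up' if states[target] else 'DOWN'}")
    return "\n".join(lines) + "\n"


def generated_at(page: str) -> Optional[datetime]:
    """Extract the generation timestamp, or None if missing or malformed."""
    for line in page.splitlines():
        if line.startswith(PREFIX):
            try:
                return datetime.fromisoformat(line[len(PREFIX):])
            except ValueError:
                return None
    return None


def is_stale(page: str, now: datetime,
             max_age: timedelta = timedelta(minutes=5)) -> bool:
    """True if the page is too old or carries no usable timestamp."""
    stamp = generated_at(page)
    return stamp is None or now - stamp > max_age
```

A second machine could fetch the page, call `is_stale(page, datetime.now(timezone.utc))`, and raise an alarm when it returns `True`.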


I'm happy to monitor other sites. Initially I'll be offering checks every 1 minute with a 2 second timeout, but depending on demand I might need to change it to 5 minute checks or build in concurrency.


If you want to run your own healthchecker, you can get the code from here:


https://git.sr.ht/~jonhiggs/gemini-healthchecker/


Send an email to jon@shit.cx if you'd like your capsule monitored. And just in case it's not clear, this is a free service offered as a best effort.



---




The content for this site is CC-BY-SA-4.0.
