
Managing a fleet of NixOS Part 1 - Design choices


Author: Solène

Date: 02 September 2022

Tags: bento nixos nix


Comment on Mastodon


Introduction


I have a grand project in mind, and I need to think it through before starting any implementation. This blog is the right place for me to explain what I want to do and the possible solutions.


It's related to NixOS. I would like to ease the management of a fleet of NixOS workstations that could be anywhere.


This could be useful for companies that provide NixOS workstations to their employees and want to manage them remotely, but also for people who manage NixOS systems in various places (cloud, datacenter, home, family computers).


With central management, it makes sense for users not to have root access: they would call their technical support to request a change, and their system could be updated quickly to reflect the request. This can be super useful for remote family computers when someone needs an extra program that is not installed yet, and you have taken on the responsibility of managing their system...


With NixOS, this setup makes total sense: you can potentially reproduce users' bugs since you have their configuration, you can stage new changes for testing, and users can roll back to a previous working state in case of a big regression.


The company Cachix made this possible before I figured out a solution. It's still not too late to propose an open source alternative.


Cachix Deploy


Defining the project


The purpose of this project is to have a central management system that holds the configuration files for all the NixOS systems, and that allows the administrator to make the remote NixOS systems pick up their new configuration as soon as possible when required.
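To make the goal concrete, here is a minimal Python sketch of the state such a central management system would need to track, assuming each host's configuration is identified by a revision string (for example a git commit). The names `Fleet`, `set_desired`, `report_applied` and `outdated_hosts`, as well as the host names, are hypothetical and do not come from any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Hypothetical central registry: desired vs. last reported configuration per host."""

    desired: dict[str, str] = field(default_factory=dict)  # host -> revision the admin wants
    applied: dict[str, str] = field(default_factory=dict)  # host -> revision the host reported

    def set_desired(self, host: str, revision: str) -> None:
        # The administrator publishes a new configuration revision for a host.
        self.desired[host] = revision

    def report_applied(self, host: str, revision: str) -> None:
        # A host (or the deployment tool) reports which revision is now active.
        self.applied[host] = revision

    def outdated_hosts(self) -> list[str]:
        # Hosts whose active revision differs from the desired one, sorted for stable output.
        return sorted(h for h, rev in self.desired.items() if self.applied.get(h) != rev)


if __name__ == "__main__":
    fleet = Fleet()
    fleet.set_desired("laptop-alice", "a1b2c3")
    fleet.set_desired("desktop-family", "a1b2c3")
    fleet.report_applied("laptop-alice", "a1b2c3")
    print(fleet.outdated_hosts())  # ['desktop-family']
```

How the `applied` map gets filled, and how quickly, is exactly what differs between the designs discussed in this article.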


We can imagine three different implementations at the highest level:


a scheduled job on each machine looking for changes in the source. The source could be a git repository, a tarball or anything else that can carry the configuration.

NixOS systems could connect to something like a pub/sub system and wait for an event from the central management to trigger a rebuild; the event may or may not contain information or sources.

the central management system could connect to the remote NixOS systems to trigger a build or push a build


These designs all have pros and cons. Let's look at them in more detail.


Solution 1 - Scheduled job


In this scenario, the NixOS system would use a cron job or a systemd timer to periodically check for changes and trigger the update.
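A minimal Python sketch of what one run of such a periodic job could do, assuming the configuration lives in a git repository used as a flake with one configuration per host. `fetch_remote_revision` is a hypothetical callable (it could for instance parse the output of `git ls-remote <url> HEAD`), and the repository URL `git+https://example.org/fleet.git` is a placeholder. `nixos-rebuild switch --flake` is the real NixOS command; the `run` parameter only exists so the logic can be exercised without touching the system.

```python
import subprocess
from typing import Callable, Optional


def run_command(cmd: list[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def poll_once(
    repo_url: str,
    host: str,
    last_applied: Optional[str],
    fetch_remote_revision: Callable[[], str],
    run: Callable[[list[str]], int] = run_command,
) -> Optional[str]:
    """One run of the scheduled job; returns the revision active after this run."""
    # Cheap check first, so an unchanged source costs no rebuild (hypothetical helper).
    remote = fetch_remote_revision()
    if remote == last_applied:
        return last_applied  # nothing changed, skip the rebuild
    # Pin the revision that was just checked so the build matches it exactly.
    flake_ref = f"{repo_url}?rev={remote}#{host}"
    if run(["nixos-rebuild", "switch", "--flake", flake_ref]) != 0:
        return last_applied  # rebuild failed, the next timer tick will try again
    return remote
```

Persisting `last_applied` between runs (for example in a small state file) is left out of this sketch. For the basic case, NixOS already ships the `system.autoUpgrade` module, which runs `nixos-rebuild` from a systemd timer.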


Pros


low maintenance

could interactively ask the user when they want to upgrade, if not right away


Cons


may not run at all if the system is not up at the scheduled time, or may run late depending on the situation

can't force an update as soon as possible

not very bandwidth-efficient if you poll often

no feedback to the central management about which systems received and applied the update (except by adding a call to the server?)


Solution 2 - Remote systems are listening for changes (publisher / subscriber)


In this scenario, the NixOS system would always be connected to the central management, using some kind of protocol like MQTT, BOSH or similar.
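The publish and acknowledge flow can be modeled in Python with `asyncio` queues standing in for the broker. This is only a sketch of the message flow under that assumption, not MQTT or BOSH code: the function names, the message fields and the `None` shutdown signal are all hypothetical.

```python
import asyncio


async def client(host: str, inbox: asyncio.Queue, acks: asyncio.Queue) -> None:
    """A managed NixOS host: waits for events and acknowledges each one."""
    while True:
        event = await inbox.get()  # blocks until central management publishes something
        if event is None:
            break  # hypothetical shutdown signal used only in this sketch
        # A real client would run `nixos-rebuild switch` here before acknowledging.
        await acks.put((host, event["revision"]))


async def publish_and_wait(
    inboxes: dict[str, asyncio.Queue], acks: asyncio.Queue, revision: str, timeout: float = 1.0
) -> set[str]:
    """Central side: publish an update event to every host and collect acknowledgments."""
    for inbox in inboxes.values():
        await inbox.put({"action": "rebuild", "revision": revision})
    acked: set[str] = set()
    try:
        while len(acked) < len(inboxes):
            host, rev = await asyncio.wait_for(acks.get(), timeout)
            if rev == revision:
                acked.add(host)
    except asyncio.TimeoutError:
        pass  # hosts missing from `acked` did not confirm in time
    return acked


async def main() -> set[str]:
    acks: asyncio.Queue = asyncio.Queue()
    inboxes = {h: asyncio.Queue() for h in ("laptop-alice", "desktop-family")}
    tasks = [asyncio.create_task(client(h, q, acks)) for h, q in inboxes.items()]
    acked = await publish_and_wait(inboxes, acks, "a1b2c3")
    for q in inboxes.values():
        await q.put(None)  # stop the simulated clients
    await asyncio.gather(*tasks)
    return acked


if __name__ == "__main__":
    print(sorted(asyncio.run(main())))  # ['desktop-family', 'laptop-alice']
```

The acknowledgments are what give the central management the feedback and the "which systems are up" knowledge listed below; a host that never answers simply stays out of the returned set.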


Pros


you know which systems are up

events from the central management are delivered instantly, and the sender can wait for an acknowledgment

updates should propagate very quickly

could interactively ask the user when they want to upgrade, if not right away


Cons


this can lead to privacy issues, as you know when each host is connected

this adds complexity to the server

this adds complexity to each client

firewalls usually don't like long-lived connections; an HTTPS-based solution would help get through them


Solution 3 - The central management pushes the updates to the remote systems


In this scenario, the NixOS system would be reachable over a protocol that allows running commands, such as SSH. The central management system would run a remote upgrade on it, or push the changes using tools like deploy-rs, colmena, morph or similar...
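One way to picture the push model without any extra tool is a loop over the fleet calling `nixos-rebuild switch` with its `--target-host` flag, which builds the configuration locally and then copies and activates it on the remote machine over SSH. This Python sketch assumes a flake in the current directory with one `nixosConfigurations` entry named after each host, and root SSH access to every host; both are assumptions, and the host names are hypothetical.

```python
import subprocess
from typing import Callable, Iterable


def run_command(cmd: list[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(cmd, check=False).returncode


def push_command(host: str, flake_dir: str = ".") -> list[str]:
    # Build `<flake_dir>#<host>` locally, then copy and activate it on the target over SSH.
    return [
        "nixos-rebuild", "switch",
        "--flake", f"{flake_dir}#{host}",
        "--target-host", f"root@{host}",  # assumption: root login over SSH is allowed
    ]


def push_all(hosts: Iterable[str], run: Callable[[list[str]], int] = run_command) -> list[str]:
    """Push the configuration to every host; return the hosts that failed (e.g. offline)."""
    failed = []
    for host in hosts:
        if run(push_command(host)) != 0:
            failed.append(host)
    return failed


if __name__ == "__main__":
    # Only print the command here, so running this file changes nothing.
    print(" ".join(push_command("laptop-alice")))
```

The list returned by `push_all` is the set of machines that would have to be retried later, which is the main weakness of this approach for machines that are often offline.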


Awesome-nix list: deployment-tools


Pros


update is immediate

SSH could be exposed over Tor or I2P to get through almost any firewall


Cons


offline systems may be complicated to update: you would need to retry connecting to them often until they are reachable

you can connect to the remote machine and potentially spy on the user. With the alternatives above, you could achieve the same by reconfiguring the computer to allow it, but it would have to be done on purpose


Making a choice


I tried to list the pros and cons of each setup, but I can't see a clear winner. However, I'm not convinced by Solution 1 because it gives you no feedback and no direct control over the systems, so I prefer to abandon it.


Solutions 2 and 3 are still in the running, so we basically end up with a choice between a PUSH and a PULL workflow.


Conclusion


In order to choose between 2 and 3, I will need to experiment with the Solution 2 technologies, as I have never used them (MQTT, RabbitMQ, BOSH, etc.).
