syncmail


syncmail is a script that moves maildir mails from the mail server down to my laptop. Alternatives would be to SSH to the server and read the mail there, or to host the mail via POP or IMAP. However, I only have one client system, and a local mail client using a local maildir is much faster than any network operation. The syncmail script can also run in the background to hide the network lag, at the cost of emails taking longer to show up. Keeping email open all the time and checking it frequently is probably unhealthy, though on the other hand a workplace may demand fairly immediate responses. The "right to disconnect" is one thing, though important messages may need some notification system. Opinions vary as to what messages are important and when and how frequently they will be sent.†


There have been several implementations of the syncmail script, and the differences between them may be instructive. I used to use fetchmail or fdm back in the day, but those programs broke when Gmail changed authentication methods. At workplaces I've often set up or maintained the mail services, so would simply have a mail server deliver mail to my OpenBSD desktop directly, which in hindsight was doubtless problematic with all sorts of regulations around message retention, discoverability, and whatnot. But with Outlook or Gmail being really bad, I'm going to stick to mutt.


A brief maildir primer


/blog/2023/04/07/mbox-maildir-other.gmi


The problem is to move files from directory A on computer M to directory B on computer N.


    mail-server$ cd ~/mail/inbox/
    mail-server$ ls
    cur new tmp
    mail-server$ exit
    ...
    $ cd ~/mail/inbox
    $ ls
    cur new tmp

So up on the server there is a Maildir directory, composed of the cur, new, and tmp directories. "cur" isn't strictly necessary for delivery and transfer elsewhere, but mutt will complain if it does not exist. Filenames are supposed to be unique to each message. A mail delivery agent is responsible for getting email from an SMTP server into the Maildir directory, but we're not interested in that here, beyond that new messages are written into the "tmp" directory and then renamed into the "new" directory. The rename, hopefully, is an atomic operation.
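
The tmp-then-rename step can be sketched in shell. This is only an illustration, not what any particular mail delivery agent does: the deliver function and its filename scheme (seconds, PID, a counter, and the host name) are made up for this example, and it assumes that tmp and new live on the same filesystem.

```shell
#!/bin/sh
# deliver MAILDIR - hypothetical helper, not a real MDA. Reads one
# message on stdin, writes it under tmp/, then renames it into new/.
deliver() {
    md=$1
    count=$((${count:-0} + 1))
    # assumed naming scheme, chosen only for this sketch
    name="$(date +%s).$$_$count.$(uname -n)"
    # noclobber makes the write fail if the name is already taken
    ( set -C; cat > "$md/tmp/$name" ) || return 1
    # a rename within one filesystem is atomic, so a reader of new/
    # sees either the whole message or nothing
    mv "$md/tmp/$name" "$md/new/$name"
}
```

Usage would be along the lines of "deliver ~/mail/inbox < message.txt". A stricter version would also refuse to rename over an existing file in new.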


Again, the problem here is to get all the files in "new" on the server over to the corresponding "new" directory on the client. How hard could this be?


rsync


Implicit here is SSH transport and public key authentication to make this happen automatically. Exactly how you want to set all that up can get complicated, so is not covered here.


    #!/bin/sh
    # syncmail - transfers mail. Assumes that the "cur new tmp"
    # directories already exist.
    cd ~/mail/inbox || exit 1
    exec blocksig -s '1 2 3 15 30 31' \
    /usr/local/bin/rsync \
    --ignore-existing \
    --info=BACKUP1,COPY1,DEL1,MOUNT1,NONREG1,SKIP1,SYMSAFE1 \
    --links --safe-links \
    --one-file-system \
    --partial-dir=tmp \
    --recursive \
    --remove-source-files \
    --times \
    mail-server:mail/inbox/new/ new

blocksig is a custom script that sets up a signal mask to block various signals by default. That is, I would rather the syncmail script be more likely to run to completion than be interrupted. This may conflict with random system shutdowns, but those are rare, and mail syncs are infrequent. (And one could write a shutdown script wrapper that blocks syncmail from starting, warns if one is running, etc.)

signal/blocksig.c


Getting to this list of rsync options took a lot of digging through the rsync documentation, and fiddling around with test scripts (see below). However, the options may not be suitable for your file transfer needs.


--ignore-existing Prevents a local file from being clobbered by something new on the server. In theory, this should not happen as new files are supposed to have unique names in Maildir directories.

--info Logs various things to standard output that hopefully someone reviews. Notably missing is REMOVE1; if you need to preserve a log of the files copied, the removals may also need to be logged, probably to a different "info" channel than the rest, which are present to flag various unexpected conditions that someone will need to review manually.

--links Ideally links could be ignored, or would produce an error or warning, as they should not appear in a Maildir directory.

--one-file-system Is a defensive measure in the event the Maildir is split across mount points, and hopefully will be logged.

--partial-dir=tmp Makes rsync work with Maildir; new files are written to this directory and then renamed into the "new" directory.

--remove-source-files This moves files from the server to the client.

--times The modification time is probably good to copy around; notably missing is preservation of group and ownership and modes which I do not care too much about.


A disadvantage of rsync is that it may be a bad fit for a security policy (for me, pledge and unveil), so if an attacker can figure out how to send an email or otherwise inject a file into the server's maildir directory they may be able to execute arbitrary code in rsync that could make network connections, read and write files, and run programs. There are low odds of such an exploit, but if it does exist it could be very bad. Much more likely would be any number of exploits aimed at a mail client, GnuPG, or a malicious file opened by an image or PDF viewer or an overly large and complicated Office Suite, or an overly large and complicated web browser.


A plus is that the script is short and easy to extend to pulling from other mail servers or putting files into different directories. Maybe there could be a config file with a list of source and destination statements. Another minus is that there's no means to filter messages into different destinations based on e.g. the mailing list or other rules if you get a lot of mail and want to pre-sort it in advance (or to delete the ~10,000 messages per year that someone else's Nagios sends).
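
The config file idea might look something like the following sketch. The file format (one "source destination" pair per line, with # for comments) and the sync_all function are assumptions for illustration. The command to run for each pair is passed in as an argument, so the rsync invocation could be wrapped in a small script that takes the source and destination as its two arguments.

```shell
#!/bin/sh
# sync_all CONFIG COMMAND [ARG...] - hypothetical helper. Runs
# COMMAND ARG... SOURCE DEST for each "SOURCE DEST" line in CONFIG.
sync_all() {
    conf=$1; shift
    status=0
    while read -r src dst; do
        # skip blank lines and comments
        case $src in ''|'#'*) continue ;; esac
        # stdin from /dev/null so that ssh cannot eat the config file
        "$@" "$src" "$dst" </dev/null || {
            echo "sync failed: $src -> $dst" >&2
            status=1
        }
    done < "$conf"
    return $status
}
```

A failure on one pair is reported but does not stop the remaining pairs from being tried; whether that is the right policy depends on how independent the mail sources are.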


Systems with paying customers, and admins prone to not knowing what they are doing, point to the need for logging on both sides and a stricter interpretation of the Maildir standard, e.g. to raise an error when a filename is reused, or a symlink is present, or maybe if files linger in a "tmp" directory for too long. And then you may need to worry about data retention: are the unique filenames metadata that must be forgotten or retained for some amount of time, or should the files also be copied off to an archive location that gets backed up? And so on.
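
Two of those stricter checks, symlinks and files lingering in "tmp", can be sketched with find. The maildir_lint name and the "a day or more" threshold are assumptions for this example; detecting a reused filename would need more state than is shown here.

```shell
#!/bin/sh
# maildir_lint MAILDIR - hypothetical helper. Prints suspicious
# entries and returns non-zero if there were any.
maildir_lint() {
    md=$1
    found=$( {
        # symlinks should not appear in a Maildir
        find "$md/cur" "$md/new" "$md/tmp" -type l | sed 's/^/symlink: /'
        # deliveries stuck in tmp for a day or more
        find "$md/tmp" -type f -mtime +0 | sed 's/^/stale: /'
    } )
    [ -z "$found" ] && return 0
    printf '%s\n' "$found"
    return 1
}
```

Something like "maildir_lint ~/mail/inbox || echo 'review needed'" could run from cron on both the server and the client.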


sftp


sftp also has partial file transfers and the ability to delete remote files. However, the following approach is deeply flawed. This version only came about from the notion "okay, but could I drop rsync as a third-party dependency?" Do not use this!


    #!/bin/sh
    cd ~/mail/inbox/tmp || exit 1
    # bad! do not use!
    printf 'get -f -a mail/inbox/new/*\nrm mail/inbox/new/*\n' |
    sftp mail-server
    mv * ../new

The error is the race condition between the "get *" and the "rm *": while "get" is copying files, the mail delivery agent could put a new message into the directory, which the "rm" glob would then find and remove. Whoops, silent data loss. To avoid this, either temporarily block the mail delivery agent while sftp runs, or write a more complicated script that first obtains a listing of the files and then downloads, logs, and removes only those; any new files added while that initial listing is being processed will be ignored.
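
The listing-first approach could look roughly like this. The make_batch function is hypothetical: it reads one filename per line and emits a "get" and an "rm" for exactly that name, never a glob, so files that arrive after the listing are left alone until the next run. How the listing is obtained is left out, and the check on filenames is a deliberately strict assumption so that nothing needs quoting.

```shell
#!/bin/sh
# make_batch - hypothetical helper. Reads filenames on stdin and
# prints sftp commands that fetch and then remove only those files.
make_batch() {
    while IFS= read -r name; do
        case $name in
            ''|*[!A-Za-z0-9._,:=+-]*)
                # refuse anything that would need quoting
                echo "skipping odd filename: $name" >&2
                continue ;;
        esac
        printf 'get -a mail/inbox/new/%s\n' "$name"
        printf 'rm mail/inbox/new/%s\n' "$name"
    done
}
```

The output is meant to be fed to sftp in batch mode (the -b flag), where a failed "get" aborts the batch before the matching "rm" can run.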


Race conditions can be tricky to test for, as you may need to insert a file during a very narrow window. Maybe big test files or a slow network or both will help. In production with lots of emails the odds of something hitting that window are probably 100 percent, and how would you know? Maybe from a log showing the mail being delivered and then the file getting lost, somewhere. Mail clients can also have errors that delete files. User errors are also not unknown, and humans will usually look for someone else to blame. So ideally your code should be as bulletproof and well logged as possible when other people or systems are involved.
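
Without a real server, the window can at least be made deterministic by performing the steps one at a time. In the sketch below cp and rm are local stand-ins for the sftp "get" and "rm" globs (an assumption made so that the example is self-contained), and the "late" file plays a delivery that lands between them.

```shell
#!/bin/sh
# Local stand-in for the sftp race: cp and rm play "get *" and "rm *".
work=$(mktemp -d) || exit 1
mkdir "$work/server" "$work/client"
touch "$work/server/msg1" "$work/server/msg2"

cp "$work/server"/* "$work/client"/    # "get *"
touch "$work/server/late"              # delivery lands in the window
rm "$work/server"/*                    # "rm *" also removes "late"

# "late" is now on neither side
lost=0
[ -e "$work/server/late" ] || [ -e "$work/client/late" ] || lost=1
echo "lost=$lost"
rm -rf "$work"
```

This prints lost=1. Nothing in the transfer itself reports an error, which is the "how would you know?" problem.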


golang


This script pushes the complexity of rsync into custom code that makes various SFTP calls over SSH. Disadvantages include having to write the code and bringing in the Go environment and various modules (third party attack surface) but on the other hand pledge and unveil support should help prevent the sync script from, say, opening a remote shell for an attacker, or accessing files that it should not.


syncmail.go


Ideally this code would be made generic and read from a configuration file that specifies the authentication method, username, directories, etc. but that's more work than I really want to do so there are things here to tweak. Even more fancy might be to copy files in parallel.


How to test


Start from a known state and confirm that the expected files end up in the expected locations. This should evolve into a more formal test framework that automates various checks, especially if paying customers are involved.


    #!/bin/sh
    mkdir -p tmp
    rm -rf a b
    mkdir a
    # spelled out, as brace expansion is not portable to every /bin/sh
    touch a/a a/b a/c
    mkdir b
    touch b/b
    exec rsync \
    --ignore-existing \
    --info=BACKUP1,COPY1,DEL1,MOUNT1,NONREG1,REMOVE1,SKIP1,SYMSAFE1 \
    --links --safe-links \
    --partial-dir=tmp \
    --recursive \
    --remove-source-files \
    --times \
    a/ b
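
Checking the result by eye gets old quickly. A small assertion helper is one possible step towards automation; expect_files is a made-up name, and the check is simply that a directory holds exactly the named entries.

```shell
#!/bin/sh
# expect_files DIR NAME... - hypothetical helper. Succeeds only if
# DIR contains exactly the named entries (dotfiles included).
expect_files() {
    dir=$1; shift
    want=$(printf '%s\n' "$@" | sort)
    have=$(ls -A "$dir" | sort)
    if [ "$want" != "$have" ]; then
        echo "unexpected contents in $dir" >&2
        return 1
    fi
}
```

After a run of the test script, "expect_files b a b c" should succeed, since b already held "b" and gains "a" and "c".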

Probably my bespoke scripts haven't deleted any mails…


† Stereotypical Boss: all messages are important all the time, even when you are busy wrestling heavy things out of racks, or it is 2 AM. Stereotypical Me: is the server room on fire, or flooding again? Anything else probably isn't important.


‡ Other people repeatedly email around massive files and get confused when they bump up against SMTP limits. xkcd://763 comes to mind, unless it impacts my mail servers.

-- Page fetched on Mon May 6 04:40:18 2024