Log Analysis Pitfalls


Log analysis is not without pitfalls. For example, fail2ban has suffered from a number of issues ranging from annoyances to security flaws. Since there has been some recent, uh, geminews about pulling strings out of logfiles, and at least one of the fail2ban security flaws involved bad regular expressions doing exactly that, now might be a good time to review various issues with the practice.


Annoyances


On the annoyances front, someone might configure one email per log event, which means that if the service gets a lot of hits one may then see a cascade of other problems, such as email service failures, an account being locked due to high activity, or triggers on firewall rules--the exact side effects will vary depending on how bad the deluge of email is and what that deluge triggers. This may range from a nothing burger--oh, whoops, but that's trivial to mass delete from mutt--to something more problematic, like your email provider locking your account, leaving you to resort to Twitter Shaming or something to try to get the account back from the cold clutches of corporate logic (good luck, have fun!). Yes, I have had to call a pager company to have my pager reactivated after I got paged like 900 times because some tube on the Internet got unstuck. At least there was a pager company to call back then, with human support representatives. But those cost money...


I hear people use SmartPhones instead of pagers these days?


Anyways, I do not much like using email for notification or reporting, as it can lead to hypothetical situations such as "Nagios is sending ~10,000 emails per year and for some reason all the other sysadmins filtered all emails from Nagios to Spam or /dev/null or whatever, and then waited for the guy who set up Nagios to tell them when there was a problem". Email still could be used; maybe a consolidated list of links could be emailed once an hour, in which case you would get at most 24 emails per day even if someone or something is flooding the logs. What is the worst case you can tolerate? Or maybe email isn't the right solution? Think about how else you could design it... an RSS feed, perhaps? Or maybe a database, and then build tools around that? SQLite is pretty easy to use.
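
To make the hourly idea concrete, a rough sketch (Python, untested; collect_new_events is a made-up stand-in for however the events actually get gathered, the addresses are placeholders, and it assumes an MTA listening on localhost):

    import smtplib
    import time
    from email.message import EmailMessage

    def collect_new_events():
        # hypothetical: tail a logfile, query a database, read a spool dir, ...
        return []

    def send_digest(lines, to="root@localhost", sender="logwatch@localhost"):
        msg = EmailMessage()
        msg["Subject"] = "hourly log digest (%d events)" % len(lines)
        msg["From"] = sender
        msg["To"] = to
        msg.set_content("\n".join(lines))
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)

    pending = []
    while True:
        time.sleep(3600)                      # wake up once an hour
        pending.extend(collect_new_events())
        if pending:                           # at most 24 digests per day
            send_digest(pending)
            pending.clear()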


If the building is on fire, sure, try email, try everything. But for status reports, eh, probably not the right solution. Maybe it works for you?


Security Flaws


The problem here is that logfiles can contain strings that a remote attacker can specify. Generally this requires that the attacker know something about the log format being used and how your software acts on those logs, which is a particular problem for the public fail2ban but maybe less of a concern if your code is not published anywhere. Security through obscurity may not be a good plan, though, and it is usually straightforward to write the regular expressions (or parsing code) such that unexpected things do not match.


First up, for logfile regular expressions, anchor the expression somewhere, ideally to the beginning of the line. This will also usually make the match a lot faster, as the engine does not need to try matching from the first character, then from the second character, then... hopefully you get the idea. An example may help; given a log line such as


    Dec 20 01:15:46 example gmid: 2001:db8::1:47696 GET gemini://thrig.me/blog/2022/11/19/funny-web-things.gmi 20 text/gemini;lang=en

A very bad way to match the URI: say one wants to get only the /blog/ requests, so our very bad regular expression might be


    /blog/(\S+)

which does work, if you are in a hurry, but will also match who knows what... an attacker would aim to craft a request that gets some string they provide into something this expression matches, and now you have an untrusted string being passed to who knows what. Luckily that string cannot contain spaces, which may limit the damage, but then do you decode the URL and pass that to a database or a shell command? Have fun tracing where that string goes in your code!


A better way to match is (untested):


    ^\S+\s+[0-9]{1,2}\s+[0-9:]{8}\s+\S+\s+gmid:\s+\S+\s+GET\s+gemini://[^/]+/blog/(\S+)

The key is the ^ that anchors the expression to the start of the log line; the rest is there to make the match happen only where we want it to. Certainly this is more fragile--what if the log format changes and thus nothing matches any more? This also assumes that the gemini server is sane and will not drop a 9999 character URL on you; (\S{1,1024}) might be a sensible change to limit what can be picked up. But a more specific match helps keep the regex from matching unexpected inputs, and thus keeps unexpected things from happening.
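
To see the difference, a quick demonstration (Python, equally untested; the second log line is a made-up hostile request that merely has /blog/ somewhere in its path):

    import re

    # a legitimate request, in the gmid format shown above
    ok = ("Dec 20 01:15:46 example gmid: 2001:db8::1:47696 GET "
          "gemini://thrig.me/blog/2022/11/19/funny-web-things.gmi 20 text/gemini;lang=en")
    # an invented hostile request that only mentions /blog/ along the way
    sus = ("Dec 20 01:15:47 example gmid: 2001:db8::1:47697 GET "
           "gemini://thrig.me/notablog//blog/$(reboot) 51 not found")

    sloppy = re.compile(r"/blog/(\S+)")
    strict = re.compile(
        r"^\S+\s+[0-9]{1,2}\s+[0-9:]{8}\s+\S+\s+gmid:\s+\S+\s+GET\s+"
        r"gemini://[^/]+/blog/(\S{1,1024})")

    for line in (ok, sus):
        for name, rx in (("sloppy", sloppy), ("strict", strict)):
            m = rx.search(line)
            print(name, m.group(1) if m else "no match")

The sloppy pattern happily hands back $(reboot); the strict one only matches when /blog/ sits right after the hostname, where it belongs.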


(Also the traditional syslog log format is terrible and should be changed to use ISO8601 dates or epoch.nanosecond timestamps, but it's what I have on my OpenBSD server (and desktop) and I can't be arsed to change it.)


What you do with the match will also need to be audited, especially if you let it near a bad SQL statement or a bad shell command. Bobby drop tables or shell injection attacks might be good to know about. But that's a different discussion and will vary depending on the language you are using.
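
For instance, a minimal sketch (Python with sqlite3; the hits table, the captured path, and the echo command are all invented for illustration) of keeping an untrusted capture out of the statement itself:

    import sqlite3
    import subprocess

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE hits (path TEXT)")

    # pretend this came out of the regex capture group; treat it as untrusted
    path = "2022/11/19/funny-web-things.gmi"

    # bad: string interpolation lets a crafted path rewrite the SQL
    #   db.execute("INSERT INTO hits VALUES ('%s')" % path)
    # better: the untrusted string is only ever a bound parameter
    db.execute("INSERT INTO hits VALUES (?)", (path,))
    db.commit()

    # likewise for shell commands: pass an argument list, not a shell string
    #   os.system("echo " + path)            # bad
    subprocess.run(["echo", path])           # path stays a single argument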


An Alternative To All This Log Scanning


Another method would be to modify the logging code of your gemini server to do something special with particular requests. This way you could have a static site yet perform certain actions on certain requests, without the fuss of parsing back out of a log a string the gemini server already knew. This would not help if the gemini server still returns an unsuitable response code to the client, but then again, if you're already modifying the server, why not also fix that problem while you are there?


The downside would be more code in the server, and therefore a larger attack surface there, as compared to risks from poor or fragile code in your log scanning software.


Security is teh hard?


Yet another option would be for the server to log to a database (or to some logging API?), in which case various fields might be put into various columns, or to a file with the URI separated by \0 if a database is too complicated. With a database you could maybe select on (untested)


    SELECT DISTINCT requesturl FROM somewhere WHERE mtime > ?

to get everything since the last time that code looked at the database. Putting an index on mtime is about the extent of my database chops, but on the other hand configuring an MTA correctly was for some time beyond the skill of some very highly paid DBAs, and then my pager went insane.
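
A sketch of that last-seen query (Python with sqlite3; the somewhere table and its schema are invented here, as above):

    import sqlite3

    db = sqlite3.connect("requests.db")
    db.execute("""CREATE TABLE IF NOT EXISTS somewhere
                  (requesturl TEXT, mtime INTEGER)""")
    db.execute("CREATE INDEX IF NOT EXISTS somewhere_mtime ON somewhere (mtime)")

    last_seen = 0   # persist the high-water mark between runs
    for (url,) in db.execute(
            "SELECT DISTINCT requesturl FROM somewhere WHERE mtime > ?",
            (last_seen,)):
        print(url)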


See Also


https://bobby-tables.com/

https://www.opencve.io/cve/CVE-2013-7176

https://www.opencve.io/cve/CVE-2021-32749


Conclusion


Can you avoid adding the complexity? More features means more complexity means more bugs and security holes and documentation and tests and broken implementations and and and that's what the modern web is for.


tags #security #log
