Too much freedom for the layman


A Unix file name is not a string in the sense people usually think of it; it is actually a byte string that can contain any byte except 0x2F (the slash, which is the directory separator) and 0x00 (the C string terminator). You can create a file whose name contains every possible valid byte with the following simple C program:


#include <stdio.h>

int main()
{
	char buffer[256];
	FILE *fp;

	/* Fill the name with every byte value from 1 to 255, substituting
	   1 for 0x2F ('/'), the only other byte a name may not contain.
	   255 bytes is also the usual NAME_MAX limit on Linux filesystems. */
	for (unsigned int i = 1; i != 256; ++i) {
		buffer[i - 1] = (i != 0x2F) ? i : 1;
	}
	buffer[255] = '\0';
	buffer[0] = 'x'; /* sane prefix for autocompletion tests */

	fp = fopen(buffer, "w");
	if (!fp) {
		perror("failed to open file for writing");
		return 1;
	}
	fputs("Hello world", fp);
	fclose(fp);
	return 0;
}

The resulting file will upset many other programs:


busybox ls(1) will just print most characters in the file name as '?', rendering its output unusable for further processing (a byte-safe listing sketch follows this list).

GNU coreutils ls(1) will try to quote the problematic characters, yet attempting to use its output yields a "File name too long" error, which is absolutely absurd.

GNU Bash autocompletion breaks horribly with the following error: 'bash: bad substitution: no closing "`"'

zsh autocompletion works fine as long as the prefix is sane ('x' in the example above).

Opening LibreOffice via KDE Dolphin fails with "The name contains too many characters"

Opening Chromium via KDE Dolphin fails with ERR_FILE_NOT_FOUND

Opening Emacs via KDE Dolphin fails with "File name too long"

Opening KWrite via GNOME Nautilus opens some file, but it does not contain the expected "Hello world" content

Opening Emacs via GNOME Nautilus works fine

Native shell globbing ("file *") works fine
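
For a program that must process such names, the robust approach is to treat them as raw bytes rather than text. Here is a minimal sketch (plain POSIX opendir/readdir, nothing more) that lists the current directory with every non-printable byte escaped as \xNN, so each name stays on one unambiguous line:


#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

int main()
{
	DIR *dir = opendir(".");
	struct dirent *entry;

	if (!dir) {
		perror("opendir");
		return 1;
	}
	while ((entry = readdir(dir)) != NULL) {
		const unsigned char *p = (const unsigned char *)entry->d_name;

		/* Printable bytes pass through; everything else, including
		   newlines, becomes \xNN, so one line means one name. */
		for (; *p; ++p) {
			if (isprint(*p) && *p != '\\')
				putchar(*p);
			else
				printf("\\x%02x", *p);
		}
		putchar('\n');
	}
	closedir(dir);
	return 0;
}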


This is perfectly in the spirit of Unix. Unix just provides the mechanism and never tries to bar the user from doing dangerous things, because that would also bar the user from doing smart things. We are all responsible adults, right?


We used to be; not anymore. People do crazy shit, like putting "$" or "`" into file names or creating zip archives under Windows, and they don't see the problem because it happens to work in their favorite file manager or office package.


This world would have been a much better place had file names been restricted to "^[._a-zA-Z][._0-9a-zA-Z]*$" from day one, but who, at the dawn of time, could have anticipated that computers would fall into the hands of the layman?
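
A validator for that restriction fits in a dozen lines of C. A sketch (the function names here are mine, purely illustrative):


#include <stdbool.h>
#include <stdio.h>

/* Illustrative check for the restricted format above: first byte from
   [._a-zA-Z], every following byte from [._0-9a-zA-Z]. */
static bool name_byte_ok(char c, bool allow_digit)
{
	return c == '.' || c == '_'
	    || (c >= 'a' && c <= 'z')
	    || (c >= 'A' && c <= 'Z')
	    || (allow_digit && c >= '0' && c <= '9');
}

static bool sane_name(const char *name)
{
	if (!name[0] || !name_byte_ok(name[0], false))
		return false;
	for (const char *p = name + 1; *p; ++p)
		if (!name_byte_ok(*p, true))
			return false;
	return true;
}

int main()
{
	printf("%d\n", sane_name("hello.txt")); /* prints 1 */
	printf("%d\n", sane_name("$(reboot)")); /* prints 0 */
	return 0;
}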


Today we see and feel it every day. The world of computers belongs to the layman, the breed of Unix wizards is on the brink of extinction, and yet the same mistake is being committed once again.


Unicode in domain names, Unicode in source files, Unicode fucking everywhere. People think that it is neat to have domains in their native language, but what they really should think about is the difference between "ё.com", "ë.com" and "ë.com" (the last two render identically but are different code point sequences), and the confusion, attack vectors and unnecessary work it creates. Yes, I know what punycode is -- a solution to a problem that should not have existed in the first place.
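
To make the confusion concrete: the precomposed "ë" (U+00EB) and "e" followed by a combining diaeresis (U+0308) display the same yet compare as different byte strings. A tiny demonstration, assuming the source file and terminal use UTF-8:


#include <stdio.h>
#include <string.h>

int main()
{
	const char *precomposed = "\xc3\xab";  /* U+00EB as UTF-8 */
	const char *combining   = "e\xcc\x88"; /* 'e' + U+0308 as UTF-8 */

	/* Both display as "ë", but byte-wise they are different names. */
	printf("%s vs %s\n", precomposed, combining);
	printf("equal: %s\n", strcmp(precomposed, combining) == 0 ? "yes" : "no");
	printf("lengths: %zu vs %zu\n", strlen(precomposed), strlen(combining));
	return 0;
}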


Don't get me wrong, Unicode is definitely an improvement over the zillions of single-byte encodings, but that is it. No Unicode for the sake of Unicode, please.


Yet, I understand why it happens. Being able to do something, even something as insignificant as putting fancy symbols in unexpected places, is shiny and impresses people, while the problems it creates just get kicked down the road.


The hand of Hari Seldon feels so heavy.
