date: 2022-11-16 21:55:38
categories: default
firstPublishDate: 2022-11-16 21:55:38
I have some text files written on MS-DOS in the 90s, before UTF-8 was common, and I don't remember which character encoding they use.
When I open a file, it looks like this:
```
{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation) }
{ Unit<82> FIGURES }
{ Copyright (c) 1989 Borland International, Inc. }
{**************************************************************************}
```
The `<82>` is the raw byte 0x82, and it should be `é`. I used the `file` command to detect the encoding:
```
file -bi PASCAL/FIGURES.PAS
application/octet-stream; charset=binary
```
That was not very helpful, so I installed the Python program `chardet`:
```
pip install chardet
chardet PASCAL/FIGURES.PAS
PASCAL/FIGURES.PAS: Windows-1252 with confidence 0.711673640167364
```
Windows-1252 is the Windows code page for Western European languages, French included, but running `iconv -f windows-1252 -t utf-8 PASCAL/FIGURES.PAS -o out.file` doesn't give the right result: the `<82>` byte does not become `é`.
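To see why, here is a small Python sketch that decodes the same bytes with a few candidate code pages. The sample bytes are a hypothetical excerpt modeled on the header above, not read from the real file, and the list of candidates is just my guess at the usual suspects:

```python
# Decode the same raw bytes with several candidate code pages to see
# which one turns byte 0x82 into "é".
# SAMPLE is a hypothetical excerpt modeled on the FIGURES.PAS header line.
SAMPLE = b"{ Unit\x82 FIGURES }"

# Two MS-DOS code pages, the Windows Western European code page, and ISO-8859-1.
CANDIDATES = ["cp437", "cp850", "cp1252", "latin-1"]


def decode_all(data: bytes) -> dict[str, str]:
    """Return the text produced by each candidate codec.

    errors="replace" guards against codecs that leave some bytes undefined
    (cp1252 has a few holes, e.g. 0x81), so every candidate yields a string.
    """
    return {enc: data.decode(enc, errors="replace") for enc in CANDIDATES}


if __name__ == "__main__":
    for enc, text in decode_all(SAMPLE).items():
        # !r makes invisible control characters (as produced by latin-1) visible
        print(f"{enc:8} {text!r}")
```

Both cp437 and cp850 map byte 0x82 to `é`, so this single byte cannot tell those two apart. Windows-1252 maps it to the low quotation mark `‚` (U+201A), and ISO-8859-1 maps it to an invisible control character.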
After searching online, I found that the encoding is `CP850`, the MS-DOS code page for Western European languages:
```
iconv -f CP850 -t utf-8 PASCAL/FIGURES.PAS -o out.file

{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation) }
{ Unité FIGURES }
{ Copyright (c) 1989 Borland International, Inc. }
{**************************************************************************}
```
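When `iconv` is not available, the same CP850 to UTF-8 conversion can be sketched in Python. This is a minimal sketch assuming the whole file fits in memory; the script name `cp850_to_utf8.py` and the `convert` function are hypothetical:

```python
import sys
from pathlib import Path


def convert(src: Path, dst: Path, source_encoding: str = "cp850") -> None:
    """Re-encode src, assumed to be in source_encoding, as UTF-8 into dst.

    Raises UnicodeDecodeError if a byte is undefined in source_encoding.
    """
    # Work on raw bytes so line endings pass through untouched.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: cp850_to_utf8.py SOURCE DEST", file=sys.stderr)
    else:
        # e.g. python cp850_to_utf8.py PASCAL/FIGURES.PAS out.file
        convert(Path(sys.argv[1]), Path(sys.argv[2]))
```

Reading and writing raw bytes keeps the DOS CRLF line endings unchanged, as iconv does.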