Converting text files with French character encoding to UTF-8


date: 2022-11-16 21:55:38


categories: default


firstPublishDate: 2022-11-16 21:55:38


I have some text files from MS-DOS, written in the 1990s before UTF-8 was common, and I don't remember which character encoding was used.


When I open a file, it looks like this:


{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation)                           }
{ Unit<82> FIGURES                                                            }
{ Copyright (c) 1989 Borland International, Inc.                           }
{**************************************************************************}

The `<82>` is a raw byte (hex 82) that should be displayed as é. I used the `file` command to detect the encoding:


file -bi PASCAL/FIGURES.PAS
application/octet-stream; charset=binary

Not so helpful, so I installed the Python program `chardet`:


pip install chardet
chardet PASCAL/FIGURES.PAS
PASCAL/FIGURES.PAS: Windows-1252 with confidence 0.711673640167364

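Windows-1252 is the usual Western European encoding on Windows and it covers French, but running `iconv -f windows-1252 -t utf-8 PASCAL/FIGURES.PAS -o out.file` doesn't give a good result: in Windows-1252, byte 82 is ‚ (a low quotation mark), not é.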
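
To see why the Windows-1252 guess goes wrong, the same bytes can be decoded with a few candidate encodings and compared by eye. This is only a minimal sketch using Python's built-in codecs; the helper name `try_decodings` and the list of candidates are made up for illustration:

```python
def try_decodings(data: bytes, encodings):
    """Decode data with each candidate encoding; None marks a failed decode."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# The problem word from the file header, with the raw byte 0x82
raw = b"Unit\x82"

for name, text in try_decodings(raw, ["cp1252", "cp437", "cp850"]).items():
    print(f"{name}: {text!r}")
```

Both DOS code pages in this list (CP437 and CP850) decode byte 82 as é, so this single character cannot tell them apart. Other accented letters in the files would be needed to distinguish them.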


Searching the internet, I found out that the character encoding is `CP850`, the MS-DOS code page for Western European languages:


iconv -f CP850 -t utf-8 PASCAL/FIGURES.PAS -o out.file
{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation)                           }
{ Unité FIGURES                                                            }
{ Copyright (c) 1989 Borland International, Inc.                           }
{**************************************************************************}
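
The same conversion can be sketched in Python, which may be handy for processing many files in one go. This is an illustration, not a replacement for `iconv`; the function names `recode` and `convert_file` are hypothetical, and CP850 is assumed as the source encoding:

```python
from pathlib import Path

def recode(data: bytes, source_encoding: str = "cp850") -> bytes:
    """Decode bytes from the source encoding and re-encode them as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")

def convert_file(src: Path, dst: Path, source_encoding: str = "cp850") -> None:
    """Read src as raw bytes, recode to UTF-8, and write the result to dst.

    Working on bytes leaves the DOS line endings (CR LF) untouched.
    """
    dst.write_bytes(recode(src.read_bytes(), source_encoding))
```

For example, `convert_file(Path("PASCAL/FIGURES.PAS"), Path("out.file"))` would mirror the `iconv` command above.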
