Converting text files with French character encoding to UTF-8

date: 2022-11-16 21:55:38

categories: default

firstPublishDate: 2022-11-16 21:55:38

I have some text files from MS-DOS, written in the 1990s before UTF-8 was common, and I don't remember which character encoding was used.

When I open a file, it looks like this:

{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation)                           }
{ Unit<82> FIGURES                                                            }
{ Copyright (c) 1989 Borland International, Inc.                           }
{**************************************************************************}

The <82> should be é. I used the `file` command to detect the encoding:

file -bi PASCAL/FIGURES.PAS
application/octet-stream; charset=binary

Not so helpful, so I installed the Python program `chardet`:

pip install chardet
chardet PASCAL/FIGURES.PAS
PASCAL/FIGURES.PAS: Windows-1252 with confidence 0.711673640167364
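With a confidence of only about 0.71, the guess deserves a second look. A minimal sketch for listing which non-ASCII bytes actually occur in a file, so they can be checked against candidate code pages by hand. The function names `non_ascii_bytes` and `report` are made up for this illustration:

```python
from collections import Counter
from pathlib import Path


def non_ascii_bytes(data: bytes) -> Counter:
    """Count every byte value >= 0x80 found in data."""
    return Counter(b for b in data if b >= 0x80)


def report(path: str) -> None:
    """Print each non-ASCII byte of the file with its count, most frequent first."""
    counts = non_ascii_bytes(Path(path).read_bytes())
    for value, count in counts.most_common():
        print(f"0x{value:02X}: {count}")
```

Calling `report("PASCAL/FIGURES.PAS")` would print one line per distinct high byte. A file from a French DOS program will probably contain only a handful of them, which makes a manual check practical.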

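Windows-1252 is the usual Windows encoding for Western European languages, French included, but running `iconv -f windows-1252 -t utf-8 PASCAL/FIGURES.PAS -o out.file` doesn't give a good result: in Windows-1252 the byte 0x82 is a low quotation mark (‚), not é.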
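The mismatch can be checked directly by decoding the single problem byte under a few candidate encodings. This is a small sketch, assuming `<82>` in the listing stands for the byte 0x82. The candidate list, which includes two DOS code pages, and the function name `decode_byte` are choices made for this illustration:

```python
# Candidate encodings to try; this list is an assumption for illustration.
CANDIDATES = ["cp1252", "latin-1", "cp437", "cp850"]


def decode_byte(value: int, encodings=CANDIDATES) -> dict:
    """Return {encoding: decoded character} for a single byte value."""
    raw = bytes([value])
    return {enc: raw.decode(enc, errors="replace") for enc in encodings}


if __name__ == "__main__":
    for enc, char in decode_byte(0x82).items():
        print(f"{enc:8} -> {char!r}")
```

Under `cp1252` the byte decodes to '‚' and under `latin-1` to a control character, while both DOS code pages in the list, `cp437` and `cp850`, decode it to 'é'.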

Searching the internet, I found out the character encoding is `CP850`, the MS-DOS code page for Western European languages:

iconv -f CP850 -t utf-8 PASCAL/FIGURES.PAS -o out.file
{**************************************************************************}
{ Projet : FIGDEMO (Exemple de la documentation)                           }
{ Unité FIGURES                                                            }
{ Copyright (c) 1989 Borland International, Inc.                           }
{**************************************************************************}
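The same conversion can be sketched in Python, which may be handy when there are many files to process. This assumes the files really are CP850, as found above. The function name `convert_to_utf8` and its parameters are hypothetical. Working on raw bytes keeps the DOS line endings (CRLF) untouched:

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp850") -> None:
    """Re-encode one text file from source_encoding to UTF-8.

    Reads and writes bytes so that line endings are preserved as-is.
    Raises UnicodeDecodeError if a byte is not valid in source_encoding.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```

For example, `convert_to_utf8(Path("PASCAL/FIGURES.PAS"), Path("out.file"))` should give the same result as the `iconv` call. Writing to a separate output file rather than overwriting the source leaves the original available if the encoding guess turns out to be wrong.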
