Long term data archival


Date: 20230525

Tags: backup, archival, storage, bitrot


             ___________________
            |,--------.         |
            || backup |         |
            |`--------'         [
            |        .-.        |
            |       |   |       |
            |        `-' o      |
            |        .-.        |
            |        : :        |
            |        :_;        |
            |_______.___._______|

Goal: Archive data for long-term storage.


Requirements:


- Durable storage.

- Resiliency to bit rot.

- No need for special rooms or conditions to store media.

- Easy retrieval; no need to wait hours to restore data.

- Encryption.

- Indexed archives for easy reference.


Selecting storage media


Common media choices are:


💿 Optical storage (with one notable exception) is very unreliable and will usually develop read errors within a few years. The exception is M-DISC, which we'll talk about in a bit. There's also the Syylex Glass Master Disc, but it's ridiculously expensive ($1000 per disc).


🖴 Flash storage and Solid State drives need occasional connection to power to "refresh" bits and not lose data. Plus, cheap consumer-grade SSDs are notoriously unreliable. Avoid. HDDs are also not reliable (susceptible to sudden bad sectors) and may not even start after being dormant for a few years if you're unlucky. Avoid as well.


📼 Tapes (LTO) are slow to retrieve data from, and they need expensive equipment and special storage conditions (low humidity, climate control, etc.) to be reliable for long-term data storage.


💾 Floppies. Gotta love them for nostalgia, but no.


Avoid other obscure media. Chances are the hardware you'll need to read them will be obsolete and very difficult to find in a few decades' time.


So, what to choose?


M-DISC. (Unless you have huge datasets, where tape is the only realistic option). From Wikipedia:


> M-DISC's design is intended to provide archival media longevity. M-Disc claims that properly stored M-DISC DVD recordings will last up to 1000 years. The patents protecting the M-DISC technology assert that the data layer is a glassy carbon material that is substantially inert to oxidation and has a melting point of 200–1,000 °C (392–1,832 °F). M-Discs are readable by most regular DVD players made after 2005 and Blu-Ray & BDXL disc drives and writable by most made after 2011.


Accelerated aging tests of M-DISCs indicate better durability than even the best-quality alternatives, but whether they'll last 50 or 500 years remains to be seen. Other advantages:


- No need for specific equipment to read. DVD and Blu-ray drives will probably be around for a long time.

- No need for a special storage environment; stash in a drawer and forget.

- No need to purchase special equipment to write. A good quality writer is recommended nevertheless; I got a Toshiba USB3 M-DISC writer at around $200 a few years ago.


There are M-DISC DVDs and Blu-rays; I chose the latter at 25GB capacity, which is decent. If you have huge storage requirements, you should revisit LTO storage instead.


Backup procedure


1. Create a VeraCrypt volume and put your data there.

2. Fortify the volume file with recovery data so it can be repaired after corruption.

3. Create an encrypted index of the files in the backup.

4. Burn the final files to the disc.


1. Encryption


I use VeraCrypt. It's easy to use and cross-platform: it runs on Windows, macOS, Linux _and_ OpenBSD (which is what I use). If you're sure you're always going to use one specific platform, you can use its native tool instead, such as LUKS on Linux.


To create a new VeraCrypt volume:


# veracrypt --text --create enc.vc --volume-type=normal \
  --size=<file_size_in_bytes> --filesystem=fat --encryption=aes \
  --hash=SHA-512 --random-source=/dev/urandom --keyfiles='' --pim='0'
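The size placeholder above has to leave room on the disc for the Par2 recovery data and the index file added in the later steps. Here is a minimal sketch of that arithmetic. The function name `volume_size` and the reserve figure are my own illustration, not part of VeraCrypt, and rounding down to a whole MiB is only for tidiness:

```shell
# volume_size DISC_BYTES REDUNDANCY_PCT RESERVE_BYTES
# Prints the largest volume size (in bytes, rounded down to a whole MiB)
# such that volume + REDUNDANCY_PCT% recovery data + reserve fits on the disc.
# Hypothetical helper; needs a shell with 64-bit arithmetic.
volume_size() {
    disc=$1
    pct=$2
    reserve=$3
    usable=$(( disc - reserve ))
    # volume * (100 + pct) / 100 <= usable  =>  volume <= usable * 100 / (100 + pct)
    size=$(( usable * 100 / (100 + pct) ))
    echo $(( size / 1048576 * 1048576 ))
}

# Example (assumed numbers): a single-layer 25GB disc is taken to hold
# 25025314816 bytes, with 64 MiB reserved for the filesystem, the Par2
# index and the file list. Check the real capacity of your own media.
# volume_size 25025314816 5 67108864
```

The result can be passed straight to `--size=`. Par2 files carry some overhead beyond the raw recovery percentage, so a generous reserve is the safer choice.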

To mount it:


# veracrypt --pim='0' --keyfiles='' ./enc.vc /mnt/enc

To unmount it:


# veracrypt --dismount /mnt/enc
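Between mounting and unmounting, the data has to be copied into the volume. The following is a small sketch of a copy followed by a recursive comparison, assuming a freshly created (empty) volume mounted at `/mnt/enc`. The function name `copy_and_check` is hypothetical:

```shell
# copy_and_check SRC_DIR MOUNT_DIR
# Copies the contents of SRC_DIR into MOUNT_DIR, then compares both trees.
# Returns non-zero if the copy fails or the trees differ.
copy_and_check() {
    src=$1
    dst=$2
    # "src/." copies the directory's contents (dotfiles included)
    # rather than the directory itself.
    cp -R "$src"/. "$dst"/ || return 1
    # diff -r compares file contents only and exits 0 when the trees match.
    diff -r "$src" "$dst"
}

# Example (paths are placeholders):
# copy_and_check ~/archive /mnt/enc && veracrypt --dismount /mnt/enc
```

Keep in mind that a FAT volume does not store Unix permissions or ownership and cannot hold a single file of 4 GiB or more, so wrap such data in an archive first if that matters to you.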

2. Recovering from corruption


To ensure we can recover our data in case of errors, we'll use Parchive (Par2).


Create a Par2 archive with 5% recovery size and one recovery file:

# par2 create -r 5 -n 1 -a enc.vc.par2 enc.vc

To validate a Par2 archive:

# par2 verify ./enc.vc.par2

In case of errors, repair:

# par2 repair ./enc.vc.par2
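For periodic checks of a disc, the verify and repair commands can be chained. This is only a sketch: it assumes `par2` exits with status 0 when everything is intact and non-zero otherwise, which is how I understand par2cmdline to behave, and the function name `check_or_repair` is made up for the example:

```shell
# check_or_repair PAR2_INDEX
# Verifies the files protected by PAR2_INDEX and attempts a repair only
# if verification fails. Prints one of: intact, repaired, unrecoverable.
check_or_repair() {
    index=$1
    if par2 verify "$index" >/dev/null; then
        echo "intact"
    elif par2 repair "$index" >/dev/null; then
        echo "repaired"
    else
        echo "unrecoverable"
        return 1
    fi
}

# Example, run on a writable copy of the disc contents
# (an optical disc itself is read-only, so repair cannot happen in place):
# check_or_repair ./enc.vc.par2
```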

3. Indexing


To create an encrypted list of files included in the backup:

find . | gpg --armor --cipher-algo AES-256 --symmetric >./files.txt.asc

To see all files included in the backup:


# gpg -d ./files.txt.asc
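`find` lists files in whatever order the filesystem returns them, so a sorted listing is easier to read and to compare between discs. Here is a sketch, assuming the listing is taken from the mounted volume and the encrypted index is written outside of it. The function name `list_files` is hypothetical:

```shell
# list_files DIR
# Prints every regular file under DIR as a path relative to DIR,
# sorted bytewise so the output is stable across runs.
list_files() {
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Example: build the encrypted index from the mounted volume, then
# search it later without mounting anything (the pattern is a placeholder):
# list_files /mnt/enc | gpg --armor --cipher-algo AES-256 --symmetric >./files.txt.asc
# gpg -d ./files.txt.asc | grep -i 'invoice'
```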

4. Burning files to disc


After following the above steps, you'll have a set of four files: the VeraCrypt volume, the Par2 index file, the Par2 recovery file and the encrypted file list. Burn them to the disc using Brasero or your favourite optical disc burning software. The first time you do this, before you stash the disc away, I suggest you follow the procedure backwards to make sure you can decrypt and restore the files correctly.
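Before going through the full restore, a quick byte-for-byte comparison of the burned files against the originals catches a bad burn early. This is a sketch only: it assumes the files to burn sit in one staging directory and the disc is mounted somewhere readable, and the function name `verify_burn` is hypothetical:

```shell
# verify_burn STAGING_DIR DISC_DIR
# Compares every regular file in STAGING_DIR with the file of the same
# name in DISC_DIR. Prints OK or FAIL per file; returns non-zero if any
# file is missing or differs.
verify_burn() {
    staging=$1
    disc=$2
    status=0
    for f in "$staging"/*; do
        [ -f "$f" ] || continue
        name=${f##*/}
        if cmp -s "$f" "$disc/$name" 2>/dev/null; then
            echo "OK   $name"
        else
            echo "FAIL $name"
            status=1
        fi
    done
    return "$status"
}

# Example (both paths are placeholders):
# verify_burn ~/staging /mnt/cdrom
```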


http://archive.org/details/lne-syylex-glass-dvd-accelerated-aging-report

https://en.wikipedia.org/wiki/Parchive

https://github.com/Parchive/par2cmdline

https://www.veracrypt.fr/en/Home.html
