Re: Is There A Better Hard Drive Metaphor? [ 2022-04-03 ]

Marginalia writes about trouble with writing to disk.

> Objects and classes are representations of bytes in memory, effortlessly integrated in the language. Why can't they be representations of bytes on a disk?


This sounds like serialisation. Many years ago, I learned how to do this in Java, so I thought I'd see if it still works. It does; you do something like this:

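import java.io.*;

// Thing must implement java.io.Serializable, otherwise writeObject throws NotSerializableException.
static void writeThing( Thing t, File f )
         throws IOException {
   // try-with-resources closes (and so flushes) the stream, even if writing fails
   try( ObjectOutputStream objOut = new ObjectOutputStream( new FileOutputStream( f ) ) ) {
      objOut.writeObject( t );
   }
}

static Thing readThing( File f )
         throws IOException, ClassNotFoundException {
   try( ObjectInputStream objIn = new ObjectInputStream( new FileInputStream( f ) ) ) {
      // readObject returns Object, so cast back to the expected type
      return (Thing) objIn.readObject();
   }
}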

The thing you get from readThing matches the thing you passed to writeThing. If you look inside the file it has written, there's just a representation of the bytes in memory (plus metadata).
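
To see what "plus metadata" looks like, here is a small sketch that serialises to a byte array instead of a file and peeks at the result. PeekDemo, its Thing class and the serialise helper are names made up for this illustration, not anything from the original snippet.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

final class PeekDemo {

    // Hypothetical Thing with a single field (an assumption for this sketch).
    static final class Thing implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        Thing(String name) { this.name = name; }
    }

    // Serialise any object into a byte array rather than a file.
    static byte[] serialise(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialise(new Thing("Kirk"));

        // The stream starts with a magic number (0xACED) and a version (0x0005).
        System.out.printf("%02x%02x %02x%02x%n", data[0], data[1], data[2], data[3]);

        // ISO-8859-1 maps every byte to one char, so the class name and the
        // field value show up as readable islands in the binary.
        String text = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println(text.contains("Thing")); // class name is in the stream
        System.out.println(text.contains("Kirk"));  // so is the field value
    }
}
```

So the stream carries the class name and field descriptions along with the values, which is how readObject knows what to rebuild.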


According to Wikipedia, some version of this exists in quite a few languages.

https://en.wikipedia.org/wiki/Serialization#Programming_language_support


It's a bit like teleporting people in SciFi. When Captain Kirk beams down, the transporter serialises him, then deserialises him on the planet's surface. But the Kirk on the planet isn't the same as the Kirk who was in the transporter room. He's a copy. In the code snippet above, the deserialised object isn't the same as the original. It's a copy. If you call writeThing and readThing, then you have two things. To prevent duplicate Kirks, there must be some mechanism in the transporter to keep track of where the real Kirk is. First he's in the transporter room; now he's serialised in the transporter beam; now he's on the planet so we can throw away the serialised version. (If there's a well-known story where this goes wrong, I never saw that. If there isn't, there should be.)

For data in memory, we know where it is because it has an address. If we serialise it, now we know it's in a file, and I suppose that we should really throw away the in-memory data. Then when we get it back into memory, we should really throw away the file. Or we accept that there are two copies of the data, and we try to keep them in sync.
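
The two-Kirks problem is easy to reproduce. This is a minimal sketch, assuming a Thing with one field and an equals method. CopyDemo and beam are hypothetical names, and the round trip goes through a byte array rather than a file to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Objects;

final class CopyDemo {

    // Hypothetical stand-in for Thing; equality is defined by its contents.
    static final class Thing implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Thing(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof Thing && Objects.equals(((Thing) o).name, name);
        }

        @Override public int hashCode() { return Objects.hash(name); }
    }

    // The "transporter": serialise to bytes, then deserialise straight away.
    static Thing beam(Thing t) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(t);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Thing) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Thing kirk = new Thing("Kirk");
        Thing onPlanet = beam(kirk);
        System.out.println(kirk.equals(onPlanet)); // true: same contents
        System.out.println(kirk == onPlanet);      // false: a different object
    }
}
```

Nothing in the language stops you holding on to both references, so deciding which one is "the real Kirk" is left to the program.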


OK, there's a problem with identity, but there's also a problem with readability - the file is binary gibberish. If I want to do something with the serialised data, I have to deserialise the gibberish, presumably using the same language. And there's a problem with keeping track of all the different objects that were serialised - some mapping of file names to object IDs. Or you could serialise a big tree of everything - yuk! And you'll want to come up with some way of doing atomic writes for objects that are related. This is getting hard.
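
One common way to get something like an atomic write for related objects is to put them all in one stream, write that to a temporary file, and rename it over the real one. The sketch below assumes that approach. AtomicStore, writeAll and readAll are made-up names, and it makes no attempt at durability (no fsync).

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

final class AtomicStore {

    // Write every related object into one temp file, then rename it into place.
    // A reader sees either the old file or the complete new one, never half of it.
    static void writeAll(List<? extends Serializable> related, Path target)
            throws IOException {
        // Keep the temp file in the target's directory so the rename stays on one filesystem.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "things", ".tmp");
        try {
            try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(tmp))) {
                out.writeObject(new ArrayList<Serializable>(related));
            }
            // Whether ATOMIC_MOVE replaces an existing target is platform-specific;
            // on POSIX systems the underlying rename does replace it.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // only still there if something failed
        }
    }

    static List<?> readAll(Path target) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(target))) {
            return (List<?>) in.readObject();
        }
    }
}
```

This is really the "big tree of everything" in miniature: it only stays atomic as long as everything related lives in the same file.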


Maybe all these issues come down to the fact that you can make an abstraction to treat disk like memory, but like most abstractions, it's leaky. And none of these issues are confined to serialising. Any store outside main memory, such as a database, has the same issues, but some of them will be taken out of your hands.


Anyone ever used AS/400?

I've never used an AS/400, but I've heard that they don't distinguish between memory and disk. I've no idea how that works, but it seems it's called Single Level Store:

https://en.wikipedia.org/wiki/IBM_AS/400#Single-level_store


#serialisation

