
How not to reverse engineer a file format


As a teaching assistant, I need to correct student papers handed in as PDF files. For this I use Xournal and a Wacom tablet so I can write corrections and notes by hand. For some reason, I started writing my corrections in green instead of the more common red. Before I handed in my corrected papers, the professor teaching the course sent out an email reminding us to use red for corrections if possible. Oops.


Xournal doesn't seem to offer an option to recolor annotations in batch. So I was left with the following options:


Hand in the corrections in green

Rewrite 2.5 hours' worth of corrections in red

Find a way to automatically convert the color in the Xournal .xoj files from green to red


I'm lazy so of course I went with the last option.


Attempt 1


The first thing I did was open one of the .xoj files with a text editor. It turns out it's a binary file, so this won't be that easy.


Attempt 2


The next idea I had was to search the binary file for the RGB value of the green I had used. According to Xournal, its hexadecimal RGB value was #008A00. No need for a full-fledged hex editor here; the search function of less is enough:


xxd file.xoj | less

But the RGB value was nowhere to be found.
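Even if the color had been stored as raw bytes, searching xxd's output in less is unreliable: by default xxd prints bytes in space-separated pairs and wraps every 16 bytes, so the text 008a00 can be split across a group or line boundary and never match. A minimal sketch of a byte-aligned search, using POSIX od instead of xxd; contains_bytes is a hypothetical helper name, and the pattern has to be written as lowercase hex bytes separated by single spaces:

```shell
# contains_bytes FILE 'XX YY ...' (hypothetical helper): succeed if FILE contains
# the given byte sequence, written as lowercase hex bytes separated by single spaces.
contains_bytes() {
    # od -An -v -tx1 prints every byte as " xx", 16 bytes per line (-v disables
    # the "*" duplicate-line shorthand). Joining the lines and squeezing spaces
    # yields one byte-aligned string like " 1f 8b 08 ", so a match can never
    # start in the middle of a byte or be split across a line break.
    od -An -v -tx1 "$1" | tr '\n' ' ' | tr -s ' ' | grep -Eq "(^| )$2( |\$)"
}
```

For example, `contains_bytes file.xoj '00 8a 00' && echo found` would report whether the raw bytes of #008A00 appear anywhere in the file.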


Attempt 3


What if I saved an empty .xoj file, then drew a single line, saved it under a different filename and compared the two? Bash's process substitution saved me a bit of typing, so I could just write:


vim -d <(xxd empty.xoj) <(xxd line.xoj)

This runs the two xxd commands and hands vim a file name for each (such as /dev/fd/63) from which it reads that command's output, then shows both in vim's diff mode. Pretty much the entire file contents had changed, except for the first few bytes and a few scattered bytes here and there. A small change to the drawing causing a drastic change throughout a binary file suggests that the file is compressed. I suspected gzip, since its use is so common in free software, but I wanted to make sure.
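To put a number on "pretty much the whole file", cmp -l lists every byte position at which two files differ. A small sketch, with count_diff_bytes as a hypothetical helper name:

```shell
# count_diff_bytes A B (hypothetical helper): print how many byte positions
# differ between files A and B. cmp -l prints one line per differing byte but
# only compares up to the end of the shorter file; its "EOF on ..." notice for
# files of different length goes to stderr and is discarded here.
count_diff_bytes() {
    # wc -l counts the lines; tr strips the padding some wc versions add.
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

Running `count_diff_bytes empty.xoj line.xoj` next to `wc -c empty.xoj line.xoj` would show how many bytes changed relative to the file sizes.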


I downloaded the Xournal source code and ran:


rg xoj
# Or if you don't use rg
grep -R xoj .

I got a few hits in some localization files and several in src/xo-file.c. Halfway through its includes I saw #include <zlib.h>, which all but confirmed my suspicion.


Let's decompress the file and see what we get. gunzip refuses to decompress files without a known suffix such as .gz, so I first added a .gz extension and then ran:


gunzip line.xoj.gz
less line.xoj

It turns out .xoj files are just gzip'ed XML. Not only that, in the first few lines I found what I was looking for:


<stroke tool="pen" color="green" width="1.41">
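Since the color is a plain XML attribute, it is easy to see which colors a file uses before changing anything. The sketch below (xoj_colors is a hypothetical name) streams the XML with gzip -dc, which writes to standard output and so does not need the .gz renaming step, and counts each distinct color="..." attribute. Any other element carrying a color attribute is counted too, so the numbers are not strictly strokes:

```shell
# xoj_colors FILE (hypothetical helper): count each distinct color="..."
# attribute in the XML inside a .xoj file.
xoj_colors() {
    # gzip -dc writes the decompressed XML to stdout, so the file keeps its
    # .xoj name; grep -o (a GNU/BSD extension) prints each match on its own
    # line, and sort | uniq -c counts the identical lines.
    gzip -dc "$1" | grep -o 'color="[^"]*"' | sort | uniq -c
}
```

Running `xoj_colors line.xoj` would print one line per color with its count in front.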

So to change all green strokes to red, I would just have to:


Add a .gz extension to the .xoj file

Use gunzip to uncompress it

Use sed to replace green with red

Use gzip to re-compress the file

Remove the .gz extension


I wrote the following shell script:


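#!/bin/sh
# Change OLDSTR to NEWSTR in every supplied Xournal (.xoj) file.
set -eu

# Parse the input arguments: OLDSTR NEWSTR followed by at least one file
if [ "$#" -ge 3 ]; then
	old="$1"
	new="$2"
	shift 2
else
	echo "Usage: $(basename "$0") OLDSTR NEWSTR FILE...
  Change all occurrences of OLDSTR to NEWSTR in every supplied xournal (.xoj) file" >&2
	exit 2
fi

# Process all files
while [ "$#" -gt 0 ]; do
	# .xoj files are just gzip'ed XML. gunzip only accepts files with a
	# known suffix, so temporarily rename FILE to FILE.gz.
	mv "$1" "$1".gz
	gunzip "$1".gz
	# OLDSTR is a sed regular expression; neither string may contain "/".
	# This is GNU sed syntax; BSD/macOS sed needs -i '' instead of -i.
	sed -i 's/'"$old"'/'"$new"'/g' "$1"
	# Re-compress and restore the original .xoj file name.
	gzip "$1"
	mv "$1".gz "$1"
	echo "Converted $1"
	shift
done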
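A variation on the same idea, sketched under a few assumptions of my own rather than as a replacement for the script above: a hypothetical recolor_xoj function that decompresses with gzip -dc instead of renaming files, only rewrites color="..." attributes (so the word green inside a typed text note stays untouched), and replaces the original file only after decompression and recompression succeed:

```shell
# recolor_xoj OLD NEW FILE (hypothetical helper): replace color="OLD" with
# color="NEW" in the gzip'ed XML of one .xoj file. OLD is a sed regular
# expression, and neither OLD nor NEW may contain "/" or "&".
recolor_xoj() {
    # Temporary file for the decompressed XML (assumes mktemp is available).
    xml=$(mktemp) || return 1
    # Decompress to stdout (no .gz renaming needed), rewrite only color
    # attributes and recompress next to the original. FILE itself is only
    # replaced if decompression and recompression both succeed.
    if gzip -dc "$3" > "$xml" &&
        sed "s/color=\"$1\"/color=\"$2\"/g" "$xml" | gzip > "$3.tmp"; then
        mv "$3.tmp" "$3"
        status=$?
    else
        rm -f "$3.tmp"
        status=1
    fi
    rm -f "$xml"
    return "$status"
}
```

It could be applied in a loop such as `for f in *.xoj; do recolor_xoj green red "$f"; done`. Because it writes through temporary files, it also avoids sed -i, whose syntax differs between GNU and BSD sed.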

I ran the script on a copy of one of the .xoj files first to make sure it worked correctly, and then batch converted all of them by running:


./recolor_xournal.sh green red *.xoj
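After a batch run like this, a quick check that no green strokes are left can be reassuring. A minimal sketch, where check_no_color is a hypothetical helper that prints every file still containing color="COLOR" and fails if it finds any:

```shell
# check_no_color COLOR FILE... (hypothetical helper): print each .xoj file whose
# XML still contains color="COLOR"; return success only if none do.
check_no_color() {
    color=$1
    shift
    found=0
    for f in "$@"; do
        # gzip -dc streams the XML without renaming; grep -q stops at the first match.
        if gzip -dc "$f" | grep -q "color=\"$color\""; then
            echo "$f"
            found=1
        fi
    done
    [ "$found" -eq 0 ]
}
```

For example, `check_no_color green *.xoj && echo "all clean"` would list any file that was missed.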

Conclusions


This whole process took maybe 20 minutes, a significant time saving compared to rewriting 2.5 hours' worth of corrections by hand.


Readers more experienced with the standard Unix tools will have noticed that I took a very roundabout route. I could have skipped all the binary file inspection by running the file utility on a .xoj file, which gives the following very helpful output:


file.xoj: gzip compressed data, from Unix, original size modulo 2^32 61627
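file can tell gzip data apart from its first bytes: every gzip stream begins with the magic bytes 1f 8b (RFC 1952), followed by 08 for deflate compression, which is likely why the first few bytes stayed the same in the diff from attempt 3. A minimal check of the magic bytes, with is_gzip as a hypothetical name:

```shell
# is_gzip FILE (hypothetical helper): succeed if FILE starts with the gzip
# magic bytes 1f 8b defined in RFC 1952.
is_gzip() {
    # -N2 reads only the first two bytes, -An drops the offset column and
    # -tx1 prints them as hex; tr removes the separating spaces and newline.
    [ "$(od -An -N2 -tx1 "$1" | tr -d ' \n')" = "1f8b" ]
}
```

For instance, `for f in *.xoj; do is_gzip "$f" || echo "$f is not gzip"; done` would flag any file the recoloring script could not handle.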

I guess the moral of the story is that open formats allow users to do things the authors of the software never imagined would be needed. So big thanks to the Xournal authors for making this possible.


Sotiris 2022/02/07 (originally written on 2020/11/21)


Xournal

gzip on Wikipedia

