How to maybe reverse engineer a file format


Need for Speed: Underground 2 (NFSU2) is a 2004 racing game that brought us the best cover of “Riders on the Storm”. It's a game where you can pimp your car to your heart's content with body kits, neon lights, absurdly huge speakers and all kinds of things just for show. All these modifications award you with style points; gather enough and you're invited for a photoshoot for the cover of some magazine or DVD.


Going through an archive of old save files, I noticed that all the magazine and DVD covers were stored alongside the game profile, so I started trying to visualize them.



Attempt 1


Learning from my previous attempt at reverse engineering we can start by using file:


$ file 'Doge DVD 1'
Doge DVD 1: data

That wasn't very helpful. At least we now know it's some bespoke file format.



Attempt 2


Time to look at some bytes with xxd:


$ xxd 'Doge DVD 1'
00000000: 3230 434d 2400 1000 0800 0000 0000 1000  20CM$...........
00000010: bbd0 5160 dadb 43cc 3918 bb51 0d01 0000  ..Q`..C.9..Q....
00000020: 0000 1000 2625 26ff 2625 26ff 2625 26ff  ....&%&.&%&.&%&.
00000030: 2625 26ff 2625 26ff 2625 26ff 2625 26ff  &%&.&%&.&%&.&%&.
... plus 65535 more lines of output

Now this is interesting. Note the value 0xff repeating every 4 bytes. Since this file contains an image, the pattern looks like uncompressed RGBA image data, i.e. one byte each for the red, green, blue and alpha channels of each pixel. The value 0xff might be a fully opaque alpha channel, in which case the other three values (0x26, 0x25, 0x26) would represent a very dark gray.
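
Eyeballing a hex dump only goes so far, so here is a minimal Python sketch of the same check: for each position within a 4-byte group, find the most common byte value and how often it occurs. The function name dominant_byte_per_offset is made up for illustration, and the sample is just the repeating bytes visible in the dump above, not the whole file.

```python
from collections import Counter


def dominant_byte_per_offset(data: bytes, stride: int = 4):
    """For each position within a stride-byte group, return
    (most common byte value, share of groups having that value)."""
    result = []
    for offset in range(stride):
        column = data[offset::stride]  # every stride-th byte from offset
        value, count = Counter(column).most_common(1)[0]
        result.append((value, count / len(column)))
    return result


# The seven "26 25 26 ff" groups visible at offsets 0x24 to 0x3f above.
sample = bytes.fromhex("262526ff" * 7)

for offset, (value, share) in enumerate(dominant_byte_per_offset(sample)):
    print(f"offset {offset}: 0x{value:02x} in {share:.0%} of groups")
```

On a whole file, a position dominated by very few values (such as 0xff, 0x07 or 0x00) would hint at an alpha-like channel.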


Scrolling through the output of xxd we can observe other values repeating every 4 bytes, such as 0x07:


...
0000c4c0: 2e2f 4c07 3031 4f07 2a2b 4207 2726 3a07  ./L.01O.*+B.'&:.
...

or even 0x00:


...
0000cbb0: 6a52 6200 8c71 8600 876b 8200 795d 7600  jRb..q...k..y]v.
...

Storing uncompressed RGBA values is a rather inefficient way to store an image, but it's plausible given that all files are exactly 1048612 bytes (just over 1 MiB) in size. The first 36 bytes look like some kind of header and not image data. Subtracting 36 from the file size we get 1048576 bytes, and dividing by 4, since we've got 4 bytes per pixel, we get 262144, the square root of which is exactly 512. The evidence so far seems pretty strong for a 512x512 RGBA image stored as uncompressed pixel values. Time to attempt to visualize the data.
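
The arithmetic is easy to double-check with a few lines of Python. This is only a sanity check: the 36-byte header and the 4 bytes per pixel are still assumptions at this point.

```python
import math

FILE_SIZE = 1048612   # size of every cover file, in bytes
HEADER_SIZE = 36      # assumed: bytes before the repeating pattern starts
BYTES_PER_PIXEL = 4   # assumed: one byte per channel

pixel_bytes = FILE_SIZE - HEADER_SIZE
pixels, leftover = divmod(pixel_bytes, BYTES_PER_PIXEL)
side = math.isqrt(pixels)  # integer square root

print(pixel_bytes, pixels, side)
print("square image:", leftover == 0 and side * side == pixels)
```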



Visualization attempt 1


We can use tail to skip the header and ImageMagick to convert the raw RGBA data to a PNG image:


tail -c +37 'Doge DVD 1' | convert -depth 8 -size 512x512 RGBA:- 'Doge DVD 1.png'

The resulting PNG image resized to 256x256 (40 KB).

This is it, it's actually just uncompressed image data! Obviously not everything is as expected, since the biggest part of the image is transparent. It looks like the alpha channel is used to distinguish the DVD cover text from the actual in-game screenshot it's overlaid onto.



Visualization attempt 2


We can easily remove the alpha channel with an extra option to ImageMagick:


tail -c +37 'Doge DVD 1' | convert -depth 8 -size 512x512 RGBA:- -alpha off 'Doge DVD 1.png'

The resulting PNG image resized to 256x256 (102 KB).

Getting there. But something is still off: the “EA Games” logo should be blue, not orange.



Visualization attempt 3


If you've ever used OpenCV before you're likely familiar with this issue: the channel order is actually BGRA and not RGBA. It turns out ImageMagick can also read raw image data in BGRA order:


tail -c +37 'Doge DVD 1' | convert -depth 8 -size 512x512 BGRA:- -alpha off 'Doge DVD 1.png'

The resulting PNG image resized to 256x256 (102 KB).
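
For anyone without ImageMagick at hand, the channel swap itself is simple enough to sketch in plain Python. The hypothetical helper bgra_to_ppm below drops the alpha channel, reorders BGR to RGB and wraps the result in a binary PPM (P6) header. PPM is swapped in here instead of PNG only because it needs no third-party library.

```python
def bgra_to_ppm(raw: bytes, width: int, height: int) -> bytes:
    """Convert raw BGRA pixels to a binary PPM (P6) image, dropping alpha."""
    if len(raw) != width * height * 4:
        raise ValueError("unexpected amount of pixel data")
    rgb = bytearray(width * height * 3)
    rgb[0::3] = raw[2::4]  # red is the third byte of each BGRA pixel
    rgb[1::3] = raw[1::4]  # green stays in the middle
    rgb[2::3] = raw[0::4]  # blue is the first byte of each BGRA pixel
    header = f"P6\n{width} {height}\n255\n".encode("ascii")
    return header + bytes(rgb)


# A single pure-blue pixel in BGRA order becomes (0, 0, 255) in RGB.
print(bgra_to_ppm(bytes([0xFF, 0x00, 0x00, 0xFF]), 1, 1))
```

For a real cover one would read the file in binary mode, skip the first 36 bytes and call bgra_to_ppm(raw, 512, 512).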

Now it looks correct except for the fact that magazine and DVD covers in-game don't appear square but elongated along the vertical axis (as one would expect). The “EA Games” logo also seems squished vertically.



Visualization attempt 4


Opening the image in GIMP we can measure the dimensions of the “EA Games” logo to be 55x41 pixels. Making it a circle requires stretching the image vertically by a factor of 55/41, i.e. resizing it to roughly 512x686. This is simple enough to do: we just need to tell ImageMagick to ignore the image's aspect ratio with an exclamation mark:


tail -c +37 'Doge DVD 1' | convert -depth 8 -size 512x512 BGRA:- -alpha off -resize '512x686!' 'Doge DVD 1.png'

The resulting PNG image (396 KB).

Finally!



Deciphering the header


After inspecting a few different files we can observe that the 36 header bytes have the following format:


32 30 43 4D   24 00 10 00   08 00 00 00
00 00 10 00   BB D0 51 60   XX XX XX XX
XX XX XX XX   0D 01 00 00   00 00 10 00

where XX represents bytes whose value changes between files. Let's try to interpret these values as little-endian 32-bit integers to see if we get any useful numbers. Why integers? We're working with an image, so we expect to find something like the image dimensions, size or number of channels, some kind of integer anyway. Why 32-bit? Just because it's the most common integer size on the x86 computers the game is built for. And why little-endian? Because all x86 computers the game is supposed to run on are little-endian.


1296248882      1048612         8
   1048576   1615974587         X
         X          269   1048576

The first number seems weirdly large but its bytes correspond to the printable ASCII characters “20CM” so it's likely the file's magic number.


It might not be apparent at first but we've seen some of these numbers before. The second number of the first row (1048612) is the size of the file in bytes, while the first number of the second row and the last number of the third row (both 1048576) are the size of the image data in bytes.


The remaining 3 constant numbers don't seem to match anything in the image data. Having the image data size twice is also weird. I suspect this file format can store more than a single image or even more than image data.
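
The conversion to little-endian 32-bit integers can be reproduced with Python's struct module. This is a minimal sketch using the 36 header bytes of 'Doge DVD 1' copied from the xxd output earlier. Treating all nine fields as unsigned is an assumption.

```python
import struct

# First 36 bytes of 'Doge DVD 1', copied from the xxd output earlier.
header = bytes.fromhex(
    "3230434d 24001000 08000000"
    "00001000 bbd05160 dadb43cc"
    "3918bb51 0d010000 00001000"
)

# "<" selects little-endian, "I" is an unsigned 32-bit integer.
fields = struct.unpack("<9I", header)
for row in range(0, 9, 3):
    print(" ".join(f"{value:>10}" for value in fields[row:row + 3]))

# The first field read as ASCII text instead of as a number.
magic = header[:4].decode("ascii")
print(magic)
```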


Let's check what the header of the profile data looks like, showing 12 bytes per line, grouped by 4:


$ xxd -c 12 -g 4 Doge | head -n 3
00000000: 3230434d b6d60000 08000000  20CM........
0000000c: 92d60000 bbc087f2 077e12b5  .........~..
00000018: 51742456 0d010000 92d60000  Qt$V........
... plus 4578 more lines of output

It is a multi-purpose file format after all. Let's convert those bytes to little-endian 32-bit integers:


1296248882      54966          8
     54930 4068982971 3037888007
1445229649        269      54930

Comparing this with the image header we notice the following:

The 8 and the 269 are there unchanged. Maybe the 8 is the file format version number?

54966 is the size of the file in bytes and 54930 is the size of the file without the header, same as with the image.

The size of the data without the header still appears twice. Maybe the two fields differ in some other type of data.

The second number of the second row in hexadecimal is 0xF287C0BB for the profile and 0x6051D0BB for the image. The 2 least significant bytes are almost identical: 0xC0BB for the profile and 0xD0BB for the image, differing in a single hex digit (0xC versus 0xD, a difference of 0x1000). This seems too good to be a coincidence. Maybe these 4 bytes represent two independent 2-byte fields.
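
To look at that hypothesis more closely, the 4 bytes can be unpacked as two little-endian 16-bit integers instead of one 32-bit integer. This is only a sketch of the idea. Whether the format really has two 16-bit fields here is a guess.

```python
import struct

# The 4 bytes at offset 0x10, copied from the two xxd outputs.
profile_field = bytes.fromhex("bbc087f2")
image_field = bytes.fromhex("bbd05160")

# "<2H" reads two little-endian unsigned 16-bit integers.
profile_low, profile_high = struct.unpack("<2H", profile_field)
image_low, image_high = struct.unpack("<2H", image_field)

print(f"profile: low=0x{profile_low:04X} high=0x{profile_high:04X}")
print(f"image:   low=0x{image_low:04X} high=0x{image_high:04X}")
print(f"difference of the low halves: 0x{image_low - profile_low:04X}")
```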


Conclusion


With the information we've got so far the file format looks like this:


The ASCII characters “20CM”.

The size of the whole file in bytes stored as a 32-bit integer.

The decimal number 8 stored as a 32-bit integer. Possibly the file format version number.

The size of the file data in bytes stored as a 32-bit integer. This should be the size of the whole file minus the 36 header bytes, at least for the NFSU2 image and profile sub-formats examined.

4 bytes indicating the sub-format. 0xF287C0BB for the NFSU2 profile and 0x6051D0BB for the NFSU2 image sub-formats respectively.

8 bytes whose value differs between different files of the same sub-format. Their purpose is unknown.

The decimal number 269 stored as a 32-bit integer. Its purpose is unknown.

Again the size of the file data in bytes stored as a 32-bit integer. This might be different for other sub-formats.

The file data. For the NFSU2 image sub-format that is the image data in row-major order with each pixel saved as 4 bytes in BGRA order.


All integers are stored in little-endian order and are most likely unsigned.
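
Put together, the layout above could be parsed along these lines. This is a sketch under the assumptions listed so far, not a verified implementation of the format: all field names are made up for illustration, and the sample bytes are the profile header from the xxd output earlier.

```python
import struct
from dataclasses import dataclass

# 4-byte magic, four uint32, 8 opaque bytes, two uint32: 36 bytes in total.
HEADER = struct.Struct("<4s4I8s2I")


@dataclass
class Header:
    magic: bytes
    file_size: int
    unknown_8: int        # always 8 so far, possibly a version number
    data_size: int
    sub_format: int       # 0xF287C0BB profile, 0x6051D0BB image
    per_file: bytes       # 8 bytes that change between files
    unknown_269: int      # always 269 so far
    data_size_again: int


def parse_header(blob: bytes) -> Header:
    """Parse the first 36 bytes of blob as a 20CM header."""
    header = Header(*HEADER.unpack_from(blob))
    if header.magic != b"20CM":
        raise ValueError("not a 20CM file")
    return header


# Header bytes of the 'Doge' profile, copied from the xxd output earlier.
profile = parse_header(bytes.fromhex(
    "3230434d b6d60000 08000000"
    "92d60000 bbc087f2 077e12b5"
    "51742456 0d010000 92d60000"
))
print(profile)
print(profile.file_size - profile.data_size)  # header size implied by the fields
```

Keeping the unknown fields in the dataclass, rather than skipping them, makes it easy to compare them across many files later.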


Figuring out the missing pieces will likely require cross-examining several images and profiles and maybe even files from other games using the same file format. Considering how much fun this has been, I might do it at some point.


Sotiris 2023-10-19



The best cover of “Riders on the Storm”.

Previous attempt at reverse engineering a file format.
