Resizing softraid volumes with Crystal Kolipe


Introduction


Enlarging a softraid volume on OpenBSD could seem impossible at first glance, but it's actually easier than you might think.


Today, our ever popular Crystal Kolipe takes time away from her other duties to show us how it's done, and discovers a decade-old kernel bug in the process. Yet again, Exotic Silicon finds a bug in OpenBSD that the project's own continuing code review seemingly didn't.


Fundamentals - growing a regular partition with growfs


Before we look at resizing a softraid volume, let's see how a regular FFS filesystem can be expanded.


Many users of OpenBSD will be familiar with /sbin/growfs. This utility allows an FFS filesystem to be enlarged, and is usually called after modifying the partition layout using /sbin/disklabel.


For example, if we have the following disk layout:


 #                size           offset  fstype [fsize bsize   cpg]
   c:           465.7G                0  unused
   d:           100.0G              128  4.2BSD   2048 16384     1
   e:           100.0G        209715392  4.2BSD   2048 16384     1
   f:           100.0G        419430592  4.2BSD   2048 16384     1
   g:           100.0G        629145792  4.2BSD   2048 16384     1
   h:            65.7G        838860992  4.2BSD   2048 16384     1

We can create an FFS filesystem on partition d:


 # newfs rvnd0d

 /dev/rvnd0d: 102400.0MB in 209715264 sectors of 512 bytes
 506 cylinder groups of 202.50MB, 12960 blocks, 25920 inodes each
 super-block backups (for fsck -b #) at:
  160, 414880, 829600, 1244320, 1659040, 2073760, 2488480, 2903200, 3317920,

 ...

  207774880, 208189600, 208604320, 209019040, 209433760,

We can easily see its total size by using /bin/df:


 # df -h /mnt
 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/vnd0d    96.9G    2.0K   92.0G     0%    /mnt

And we might later decide to delete partition 'e', and expand the 'd' partition to fill its space using /sbin/disklabel:


 #                size           offset  fstype [fsize bsize   cpg]
   c:           465.7G                0  unused
   d:           200.0G              128  4.2BSD   2048 16384 12960
   f:           100.0G        419430592  4.2BSD   2048 16384     1
   g:           100.0G        629145792  4.2BSD   2048 16384     1
   h:            65.7G        838860992  4.2BSD   2048 16384     1

Re-sizing the partition entry is trivial, as we are just changing a few bytes in the disklabel.


However, the filesystem that was previously on the d partition just stays there. Nothing changes. It's still the same size as before, with all of its data intact:


 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/vnd0d    96.9G    2.0K   92.0G     0%    /mnt

> Resizing a partition doesn't automatically change the underlying filesystem


If we actually want to enlarge the filesystem itself, we can use /sbin/growfs:


 # growfs -y /dev/vnd0d
 new filesystem size is: 104857616 frags
 Warning: 148544 sector(s) cannot be allocated.
 growfs: 204727.5MB (419281920 sectors) block size 16384, fragment size 2048
 	using 1011 cylinder groups of 202.50MB, 12960 blks, 25920 inodes.
 super-block backups (for fsck -b #) at:
  209848480, 210263200, 210677920, 211092640, 211507360, 211922080, 212336800,

 ...

  415964320, 416379040, 416793760, 417208480, 417623200, 418037920, 418452640,
  418867360

And after a quick fsck, we can verify that the filesystem has, indeed, grown:


 # fsck /dev/vnd0d

 ** /dev/rvnd0d
 ** Last Mounted on /mnt
 ** Phase 1 - Check Blocks and Sizes
 ** Phase 2 - Check Pathnames
 ** Phase 3 - Check Connectivity
 ** Phase 4 - Check Reference Counts
 ** Phase 5 - Check Cyl groups
 1 files, 1 used, 101528615 free (7 frags, 12691076 blocks, 0.0% fragmentation)
 # mount /dev/vnd0d /mnt
 # df -h /mnt

 Filesystem     Size    Used   Avail Capacity  Mounted on
 /dev/vnd0d     194G    2.0K    184G     0%    /mnt

No equivalent tool to resize softraid volumes


The process described above works fine for regular, unencrypted partitions. It also works fine for resizing partitions within an existing softraid volume.


However, it's not an uncommon scenario to have an existing softraid partition, particularly a softraid crypto partition, that has free space after it on the disk holding the outer RAID partition. In a virtual hosting environment, this is an extremely easy situation to encounter, as the size of any virtual disks on the host can be changed arbitrarily.


So we might start out with a partition layout like this:


 #                size           offset  fstype [fsize bsize   cpg]
   c:           931.3G                0  unused
   d:           931.3G                0    RAID

We can easily create a softraid crypto volume on this RAID partition, using /sbin/bioctl:


 # bioctl -c C -l vnd0d softraid0

This new crypto volume will appear something like:


 sd5 at scsibus3 targ 3 lun 0:
 sd5: 953674MB, 512 bytes/sector, 1953124472 sectors

If we now detach the volume...


 # bioctl -d sd5
 sd5 detached

We can now increase the size of the RAID partition. In this case, it's a virtual disk that has first been resized and then had its disklabel updated manually.


 #                size           offset  fstype [fsize bsize   cpg]
   c:             1.8T                0  unused
   d:             1.8T                0    RAID

But of course, if we now re-attach the old softraid partition, it remains the same size as before:


 # bioctl -c C -l vnd0d softraid0
 sd5 at scsibus3 targ 3 lun 0:
 sd5: 953674MB, 512 bytes/sector, 1953124472 sectors

This should come as no surprise, as we haven't modified any of the softraid metadata. We haven't done the equivalent of 'growfs' on the softraid partition itself. The extra 0.9 TB or so of free space at the end of the newly expanded RAID partition will simply not be used.


What might come as a surprise is that there is no tool in the OpenBSD base installation to do the necessary modification of the softraid metadata!


Softraid metadata, (in all its hexadecimal glory)


Curious readers might at this point be wondering what exactly needs to be changed, what the softraid metadata actually looks like, and how difficult this functionality would be to implement.


Let's dive right in and find out!



Important note:


All of the specific details noted in this article are based on the source code for OpenBSD 7.0-release. Whilst the general principles are applicable to other versions, the structure of the softraid metadata has changed several times over the past few years, and is already at version 6.


If you are using a later, (or much earlier), version of the kernel code, or examining softraid volumes that were created with such a different version, then check the structure of the metadata in use.



First, let's look at the format of the metadata that the softraid system uses. This is defined in /usr/src/sys/dev/softraidvar.h, and there are several structures of interest here.


The location of the softraid metadata is defined as a byte offset value from the beginning of the RAID partition, SR_META_OFFSET. Currently this is 8192 bytes, or 16 sectors.
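
As a rough illustration of that offset arithmetic, the sketch below reads the 512-byte metadata sector straight from a RAID partition, (or from an image file of one), using pread(2). The function name read_sr_header and the EX_ constants are made up for this example. Only the 8192 byte offset and the 512 byte sector size come from the text above.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Values taken from the text: SR_META_OFFSET is 8192 bytes, or 16 sectors. */
#define EX_META_OFFSET	8192
#define EX_SECTOR_SIZE	512

/*
 * Hypothetical helper: read the first metadata sector of a RAID partition
 * into buf, which must have room for 512 bytes.
 * Returns 0 on success, or -1 on any error or short read.
 */
int
read_sr_header(const char *path, unsigned char *buf)
{
	int fd;
	ssize_t n;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	n = pread(fd, buf, EX_SECTOR_SIZE, (off_t)EX_META_OFFSET);
	close(fd);
	return (n == EX_SECTOR_SIZE ? 0 : -1);
}
```

Called with the path of a raw RAID partition, this should leave the eight magic bytes at the start of buf if the partition really does hold softraid metadata.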


Several different pieces of metadata are defined. First, we have the metadata for the whole volume, then we have the metadata for that particular chunk. In the case of a softraid crypto volume, (which is what we will be looking at here), we then have some metadata that is specific to that RAID discipline.


If we look at a hexdump of a typical softraid crypto volume, and compare it to the sr_metadata and sr_meta_chunk structures defined in softraidvar.h, we can see where the size of the volume is stored:


 Softraid meta data

 00000000  6d 61 72 63 43 52 41 4d  06 00 00 00 04 00 00 00  |marcCRAM........|
 00000010  59 cc b1 a1 e2 17 42 25  b4 5d ac fa ea c1 ef 35  |Y.....B%.].....5|
 00000020  01 00 00 00 00 00 00 00  01 00 00 00 00 02 00 00  |................|
 00000030  02 00 00 00 43 00 00 00  78 50 6a 74 00 00 00 00  |....C...xPjt....|
                                    ^^^^^^^^^^^^^^^^^^^^^^^
 00000040  4f 50 45 4e 42 53 44 00  53 52 20 43 52 59 50 54  |OPENBSD.SR CRYPT|
 00000050  4f 00 00 00 00 00 00 00  30 30 36 00 00 00 00 00  |O.......006.....|
 00000060  8b 0d 8d b3 85 3e 44 1e  ca 6e a0 4b d1 71 b6 16  |.....>D..n.K.q..|
 00000070  73 64 35 00 00 00 00 00  00 00 00 00 00 00 00 00  |sd5.............|
 00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000090  01 00 00 00 10 02 00 00  01 00 00 00 00 00 00 00  |................|
 000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000b0  76 6e 64 30 64 00 00 00  00 00 00 00 00 00 00 00  |vnd0d...........|
 000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000d0  78 50 6a 74 00 00 00 00  78 50 6a 74 00 00 00 00  |xPjt....xPjt....|
           ^^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^^^^^^^^^^^^^^^^^
 000000e0  59 cc b1 a1 e2 17 42 25  b4 5d ac fa ea c1 ef 35  |Y.....B%.].....5|
 000000f0  fd 9b bc e5 93 91 50 20  80 87 a1 d0 a5 9d e7 dc  |......P ........|
 00000100  00 00 00 00 01 00 00 00  b0 09 00 00 4e 51 5c eb  |............NQ\.|

Highlighted in the hexdump above are: ssd_size, scm_size, and scm_coerced_size. In a softraid crypto volume, which uses only a single chunk, all three values should be the same. When looking at metadata from other RAID disciplines, they will often be different.


Remembering that these values are stored in little endian format, if we convert them to decimal we get: 0x746A5078 = 1953124472, which matches the value reported when we attach the volume:


 sd5 at scsibus3 targ 3 lun 0:
 sd5: 953674MB, 512 bytes/sector, 1953124472 sectors
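
For readers who would rather not do the byte swapping by hand, here is a minimal sketch of the same conversion in C. The helper name le64_decode is an invention for this example. It builds the value byte by byte, so it gives the same answer regardless of the endianness of the machine it runs on.

```c
#include <stdint.h>

/*
 * Decode an unsigned 64-bit little-endian value from eight bytes,
 * independent of the byte order of the host.
 */
uint64_t
le64_decode(const unsigned char *p)
{
	uint64_t v = 0;
	int i;

	/* Start with the most significant byte, which is stored last. */
	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return (v);
}
```

Feeding it the eight bytes 78 50 6a 74 00 00 00 00 from the hexdump gives 1953124472, the sector count reported on the console.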

If we look at the original disklabel again, before the partition was resized, this time with the sizes in blocks, we can see that the softraid crypto volume is exactly 528 blocks smaller than the RAID partition that contains it:


 #                size           offset  fstype [fsize bsize   cpg]
   c:       1953125000                0  unused
   d:       1953125000                0    RAID

So considering our larger, expanded RAID partition:


 #                size           offset  fstype [fsize bsize   cpg]
   c:       3906250000                0  unused
   d:       3906250000                0    RAID

All we need to do, in principle, is change the stored size values to 0xE8D4A300, since 3906250000-528=3906249472, and 3906249472=0xE8D4A300, which is 00 A3 D4 E8 in little-endian.
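
The arithmetic above can be sketched in a few lines of C. The names sr_volume_sectors, le64_encode and EX_SR_OVERHEAD are hypothetical, and the 528 sector difference is simply the figure observed for this particular crypto volume, not a value taken from the kernel headers.

```c
#include <stdint.h>

/* Gap between RAID partition and crypto volume, as observed in the text. */
#define EX_SR_OVERHEAD	528

/*
 * Hypothetical helper: volume size, in sectors, for a given partition size.
 * Returns 0 if the partition is too small to hold any volume at all.
 */
uint64_t
sr_volume_sectors(uint64_t partition_sectors)
{
	if (partition_sectors <= EX_SR_OVERHEAD)
		return (0);
	return (partition_sectors - EX_SR_OVERHEAD);
}

/* Store v in out[0] to out[7], least significant byte first. */
void
le64_encode(uint64_t v, unsigned char *out)
{
	int i;

	for (i = 0; i < 8; i++)
		out[i] = (unsigned char)(v >> (8 * i));
}
```

For a partition of 3906250000 sectors this yields 3906249472, which encodes as the bytes 00 a3 d4 e8 00 00 00 00.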


However, if we make this change and then try to assemble the volume, it fails:


 softraid0: invalid metadata checksum
 softraid0: one of the chunks has corrupt metadata; aborting assembly

This is obviously because our changes invalidated the checksum stored in ssd_checksum:


 00000000  6d 61 72 63 43 52 41 4d  06 00 00 00 04 00 00 00  |marcCRAM........|
 00000010  59 cc b1 a1 e2 17 42 25  b4 5d ac fa ea c1 ef 35  |Y.....B%.].....5|
 00000020  01 00 00 00 00 00 00 00  01 00 00 00 00 02 00 00  |................|
 00000030  02 00 00 00 43 00 00 00  00 a3 d4 e8 00 00 00 00  |....C...........|
 00000040  4f 50 45 4e 42 53 44 00  53 52 20 43 52 59 50 54  |OPENBSD.SR CRYPT|
 00000050  4f 00 00 00 00 00 00 00  30 30 36 00 00 00 00 00  |O.......006.....|
 00000060  8b 0d 8d b3 85 3e 44 1e  ca 6e a0 4b d1 71 b6 16  |.....>D..n.K.q..|

The correct checksum would be:


 00000060  98 f3 c0 ac 2f d6 5c e3  8c 12 f8 78 1c 07 d7 09  |..../.\....x....|

With this new, re-calculated checksum in place, the softraid volume can be assembled, and correctly reports its new size:


 sd5 at scsibus3 targ 3 lun 0:
 sd5: 1907348MB, 512 bytes/sector, 3906249472 sectors

Success!


However, what about the rest of the softraid metadata?


Remember that we changed the two size fields in the chunk metadata as well...


The chunk metadata, (more hex, you know you love it!)


There is a checksum for the chunk metadata, too, (which I could call a 'chunksum', but I'll resist the temptation):


 Softraid chunk metadata

 00000070  73 64 35 00 00 00 00 00  00 00 00 00 00 00 00 00  |sd5.............|
 00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 00000090  01 00 00 00 10 02 00 00  01 00 00 00 00 00 00 00  |................|
 000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000b0  76 6e 64 30 64 00 00 00  00 00 00 00 00 00 00 00  |vnd0d...........|
 000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000d0  10 a5 d4 e8 00 00 00 00  10 a5 d4 e8 00 00 00 00  |................|
 000000e0  59 cc b1 a1 e2 17 42 25  b4 5d ac fa ea c1 ef 35  |Y.....B%.].....5|
 000000f0  fd 9b bc e5 93 91 50 20  80 87 a1 d0 a5 9d e7 dc  |......P ........|
 00000100  00 00 00 00 01 00 00 00  b0 09 00 00 4e 51 5c eb  |............NQ\.|

Since we didn't update the chunksum, (OK, I couldn't resist after all), along with our changes, we might have expected the kernel to complain.


In fact, reading the softraid code, we can quickly discover that the size values in the chunk metadata are not really very important for crypto volumes anyway.

But looking closer at the chunk metadata checksum, we can clearly see that it's wrong, even for the original, unmodified data:


 000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
                                    ^^^^^^^^^^^^^^^^^^^^^^^
 000000b0  76 6e 64 30 64 00 00 00  00 00 00 00 00 00 00 00  |vnd0d...........|
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 000000d0  78 50 6a 74 00 00 00 00  78 50 6a 74 00 00 00 00  |xPjt....xPjt....|
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 000000e0  59 cc b1 a1 e2 17 42 25  b4 5d ac fa ea c1 ef 35  |Y.....B%.].....5|
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 000000f0  fd 9b bc e5 93 91 50 20  80 87 a1 d0 a5 9d e7 dc  |......P ........|
 00000100  00 00 00 00 01 00 00 00  b0 09 00 00 4e 51 5c eb  |............NQ\.|

The bytes that are highlighted above are the 72 bytes that make up sr_meta_chunk_invariant. The correct MD5 checksum of these 72 bytes would be:


 000000f0  4b 42 ea a0 06 bf 68 78  ee 4c 09 2d ce 9e 61 1d  |KB....hx.L.-..a.|

So where does the original, incorrect checksum come from? It turns out that the kernel code in /usr/src/sys/dev/softraid.c is calculating the checksum of bytes 168-184, highlighted here:


 000000a0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
                                    ^^^^^^^^^^^^^^^^^^^^^^^
 000000b0  76 6e 64 30 64 00 00 00  00 00 00 00 00 00 00 00  |vnd0d...........|
           ^^^^^^^^^^^^^^^^^^^^^^^
 000000c0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
 000000d0  78 50 6a 74 00 00 00 00  78 50 6a 74 00 00 00 00  |xPjt....xPjt....|
 000000e0  59 cc b1 a1 e2 17 42 25  b4 5d ac fa ea c1 ef 35  |Y.....B%.].....5|
 000000f0  fd 9b bc e5 93 91 50 20  80 87 a1 d0 a5 9d e7 dc  |......P ........|
 00000100  00 00 00 00 01 00 00 00  b0 09 00 00 4e 51 5c eb  |............NQ\.|

Bug alert!


This is obviously a bug, as these 16 bytes represent scm_volid, scm_chunk_id, and the first 8 bytes of scm_devname. It makes little sense to protect only these fields with a checksum, whilst leaving out the size and uuid fields, (and the last three quarters of scm_devname).


The code responsible for this is on lines 570-571 of softraid.c. Here we can see sr_checksum being called with a final parameter of sizeof(scm->scm_checksum). This value is passed directly to MD5Update, as the length parameter. If we now look at the definition of scm_checksum in softraidvar.h, we can see that it is just an array of unsigned 8-bit values, of length MD5_DIGEST_LENGTH. So sizeof(scm->scm_checksum) will always evaluate to MD5_DIGEST_LENGTH, which is ultimately 16.


The correct value for the last parameter would be sizeof(struct sr_meta_chunk_invariant), and indeed the code did do this at some point in the past, until revision 1.262, committed on 20111228, where the current code replaced it, and the bug was introduced.
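
The difference between the two sizeof expressions is easy to demonstrate in isolation. The structures below are simplified stand-ins written for this example, not copies of the real definitions in softraidvar.h. The field order follows the hexdump above, but the exact types are an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MD5_DIGEST_LENGTH	16	/* an MD5 digest is 16 bytes long */

/* Simplified stand-in for struct sr_meta_chunk_invariant, (72 bytes). */
struct ex_chunk_invariant {
	uint32_t	volid;
	uint32_t	chunk_id;
	char		devname[32];
	int64_t		size;
	int64_t		coerced_size;
	uint8_t		uuid[16];
};

/* Simplified stand-in for struct sr_meta_chunk. */
struct ex_chunk {
	struct ex_chunk_invariant	scmi;
	uint8_t				checksum[EX_MD5_DIGEST_LENGTH];
};

/* Length hashed by the buggy call: the size of the digest array itself. */
size_t
buggy_checksum_len(const struct ex_chunk *scm)
{
	return (sizeof(scm->checksum));
}

/* Length that covers the whole invariant part of the chunk metadata. */
size_t
fixed_checksum_len(void)
{
	return (sizeof(struct ex_chunk_invariant));
}
```

With these stand-ins, the first function returns 16 and the second returns 72, which is the same 16 versus 72 byte discrepancy seen in the hexdumps.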


Kernel bug analysis


Impact: LOW (But lack of code review is concerning)


Since the kernel code blissfully ignores the invalid chunk metadata checksum during normal use, and allows the chunk to be attached and used regardless, it would be easy to conclude that there is no immediate impact. Additionally, the fields which are not protected by checksum are not particularly important for the crypto discipline, so the effect of any data corruption here is also mitigated to some extent.


However, the mere fact that the chunk checksum is defined at all, but then not validated, should be a cause for concern in itself. Additionally, this is a fairly trivial bug which has presumably escaped notice for a number of years, or at least not been fixed. How many other bugs are lurking in the softraid code is an open question.


Interestingly, although the commit that introduced the bug was posted to the OpenBSD project's tech mailing list on 20111226 with a request for 'OKs', (the standard process for peer review and approval), there were no explicit OK replies from developers on the list during the next couple of days, and the commit message doesn't mention any either. The fact that new bugs can so easily be introduced due to a seeming lack of interest in peer review of new code rather makes a mockery of the long held claims on the project's website about their 'proactive auditing process'.


Perhaps less obviously, though, this bug also potentially creates an interesting leak of metadata. During forensic analysis of a softraid volume it would be possible to deduce with a high degree of confidence whether it was created using a kernel version before or after version 1.262 of softraid.c was committed. This could have implications, for example, if a business was claiming to have lost copies of financial transactions before a certain date, due to hardware failure and subsequent replacement.


Fixing the bug, Crystal to the rescue!


Fixing the bug is not as straightforward as just changing sizeof(scm->scm_checksum) to sizeof(struct sr_meta_chunk_invariant). If we read the code in sr_meta_init carefully, we can see that scm_coerced_size has not yet been set by the time we reach line 570. This only happens in line 585. As a result, the calculated checksum would still be wrong!


Instead, we need to move the call to sr_checksum into the loop below, as well as correcting its arguments.
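
To see why the ordering matters, consider this small sketch. It is deliberately not the kernel code: the structure is a cut-down stand-in, and a plain byte sum is swapped in for MD5 purely to keep the example short. All of the names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stand-in for the invariant chunk metadata. */
struct ex_invariant {
	char		devname[32];
	int64_t		size;
	int64_t		coerced_size;
};

/* Stand-in checksum: a plain byte sum, NOT the MD5 that softraid uses. */
uint32_t
ex_sum(const void *data, size_t len)
{
	const unsigned char *p = data;
	uint32_t sum = 0;

	while (len--)
		sum += *p++;
	return (sum);
}

/*
 * Fill in the structure and return the checksum that would be stored.
 * If checksum_last is zero, the checksum is taken before coerced_size is
 * assigned, which mirrors the ordering problem described above.
 */
uint32_t
ex_init_chunk(struct ex_invariant *inv, int64_t size, int checksum_last)
{
	uint32_t stored = 0;

	memset(inv, 0, sizeof(*inv));
	memcpy(inv->devname, "vnd0d", 5);
	inv->size = size;
	if (!checksum_last)
		stored = ex_sum(inv, sizeof(*inv));	/* taken too early */
	inv->coerced_size = size;
	if (checksum_last)
		stored = ex_sum(inv, sizeof(*inv));	/* covers final contents */
	return (stored);
}
```

A checksum taken too early no longer matches the finished structure, whereas one taken after the last assignment does. The same reasoning applies whatever the checksum algorithm happens to be.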


The following patch to softraid.c corrects the generation of checksums for chunk metadata.


This patch is against OpenBSD 7.0-release, but also applies cleanly to OpenBSD 6.9-release.


 Patch against OpenBSD-7.0
 --- softraid.c.dist
 +++ softraid.c
 @@ -570,2 +569,0 @@
 -		sr_checksum(sc, scm, &scm->scm_checksum,
 -		    sizeof(scm->scm_checksum));
 @@ -583,2 +581,2 @@
 -	/* Equalize chunk sizes. */
 -	SLIST_FOREACH(chunk, cl, src_link)
 +	/* Equalize chunk sizes and calculate chunk checksum. */
 +	SLIST_FOREACH(chunk, cl, src_link) {
 @@ -585,0 +584,3 @@
 +		sr_checksum(sc, scm, &scm->scm_checksum,
 +		    sizeof(struct sr_meta_chunk_invariant));
 +	}

Download the above patch as ASCII


A program to fiddle with softraid metadata


Meanwhile, back at the ranch...


Returning to the main issue of resizing a softraid volume, of course, doing this with a hex editor is not the most user friendly of experiences. It is arguably highly educational, but not at all practical for daily usage. What we really need is a program to update all three size fields to any arbitrary value, and re-calculate the correct checksums for the ssd_checksum and scm_checksum fields. Furthermore, if we can detect invalid checksum data generated by the known faulty kernel code and correct it, that would be good too.


Well, fear not my loyal fans, because your favourite hackerette has indeed put together just such a program. I'm calling it Exotic Silicon - SoftRaid Metadata Editor, or es-srme for short.


Download links:


Download a tar archive containing the source code and manual page for es-srme version 1.0.

Download the signify signature for the above tar archive.


Installation instructions:


# tar -xvf es-srme_v1.0.tar
# cd es-srme
# cc es-srme.c
# strip a.out
# mv a.out /usr/local/sbin/
# mv es-srme.8 /usr/local/man/man8/

Source code:


 /*
 Copyright 2022, Exotic Silicon, all rights reserved.
 Redistribution and use in source and binary forms, with or without modification,
 are permitted provided that the following conditions are met:
 1. This software is licensed exclusively under this specific license text.  The
    license text may not be changed, and the software including modified versions
    may not be re-licensed under any other license text.
 2. Redistributions of source code must retain the above copyright notice, this
    list of conditions, and the following disclaimer.
 3. Redistributions in binary form must reproduce the above copyright notice,
    this list of conditions, and the following disclaimer in the documentation
    and/or other materials provided with the distribution.
 4. All advertising materials mentioning features or use of this software must
    display the following acknowledgement: This product includes software
    developed by Exotic Silicon.
 5. The name of Exotic Silicon must not be used to endorse or promote products
    derived from this software without specific prior written permission.
 6. Redistributions of modified versions of the source code must be clearly
    identified as having been modified from the original.
 7. Redistributions in binary form that have been created from modified versions
    of the source code must clearly state in the documentation and/or other
    materials provided with the distribution that the source code has been
    modified from the original.
 THIS SOFTWARE IS PROVIDED 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES,
 INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
 FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
 EXOTIC SILICON BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
 EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
 OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA OR PROFITS; OR BUSINESS
 INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
 IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
 OF SUCH DAMAGE.
 */

 /* This is the source code for es-srme version 1.0. */
 /* For more details please visit: */
 /* https://www.exoticsilicon.com/research/resizing_softraid_volumes */
 /* or gemini://gemini.exoticsilicon.com/research/resizing_softraid_volumes */

 #include <stdio.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <stdlib.h>
 #include <sys/types.h>
 #include <md5.h>

 /* Magic number to identify softraid metadata */

 #define MAGIC (uint64_t)0x4d4152436372616d

 void usage(char * progname)
 {
 printf ("Usage: %s [input] [newsize]\n", progname);
 printf ("Where [input] is a 512 byte softraid partition header, usually found in\nsector 16 of a RAID partition.\n");
 printf ("If [newsize] is specified, new size values are written back to the header,\notherwise the current size values are simply displayed.\n");
 return ;
 }

 int main(int argc, char *argv[], char *env[])
 {
 int fd;
 unsigned char * buffer;
 unsigned char * buffer_md5;
 unsigned int raid_level;
 unsigned int version;
 unsigned int flag_bad_checksum;
 uint64_t magic;
 uint64_t size_vol;
 uint64_t size_chunk;
 uint64_t size_coerced;
 uint64_t size_new;
 MD5_CTX cont;

 if (argc<2 || argc>3) {
 	usage(argv[0]);
 	return (3);
 	}

 flag_bad_checksum=0;

 buffer=malloc(512);
 buffer_md5=malloc(MD5_DIGEST_LENGTH);

 if (buffer == NULL || buffer_md5 == NULL) {
 	dprintf (STDERR_FILENO, "Error allocating buffers.\n");
 	return (3);
 	}

 fd = open (argv[1], O_RDONLY);

 if (fd == -1) {
 	dprintf (STDERR_FILENO, "Error opening %s for reading.\n", argv[1]);
 	return (3);
 	}

 if (read (fd, buffer, 512) != 512) {
 	dprintf (STDERR_FILENO, "Error reading 512 bytes from %s.\n", argv[1]);
 	return (3);
 	}

 close (fd);

 /* Check magic */

 magic=*(uint64_t *)buffer;
 printf ("Read magic: %llx\n", magic);
 if (magic != MAGIC) {
 	dprintf (STDERR_FILENO, "Bad magic, should be %llx.\n", MAGIC);
 	return (3);
 	}

 /* We only support version 6 metadata */

 version=*(uint32_t *)(buffer+8);	/* 32-bit field, avoid reading it as a wider long */
 if (version != 0x06) {
 	dprintf (STDERR_FILENO, "Metadata version is %x, only version 6 is supported by this program.\n", version);
 	return (3);
 	}

 /* We only support the crypto discipline */

 raid_level=*(uint32_t *)(buffer+52);	/* 32-bit field, avoid reading it as a wider long */
 if (raid_level != 0x43) {
 	dprintf (STDERR_FILENO, "This doesn't appear to be a softraid crypto volume.\n");
 	return (3);
 	}

 size_vol=*(uint64_t *)(buffer+56);
 size_chunk=*(uint64_t *)(buffer+208);
 size_coerced=*(uint64_t *)(buffer+216);

 printf ("Volume  size is %llu sectors, %lluK, %lluM, %lluG\n",size_vol, size_vol/2, size_vol/2048, size_vol/2097152);
 printf ("Chunk   size is %llu sectors, %lluK, %lluM, %lluG\n",size_chunk, size_chunk/2, size_chunk/2048, size_chunk/2097152);
 printf ("Coerced size is %llu sectors, %lluK, %lluM, %lluG\n",size_coerced, size_coerced/2, size_coerced/2048, size_coerced/2097152);

 if (!((size_vol==size_chunk) && (size_chunk==size_coerced))) {
 	dprintf (STDERR_FILENO, "Mismatch between size fields, exiting...\n");
 	return (3);
 	}

 /* Calculate the MD5 checksum of the first 96 bytes, which is the invariant volume metadata */

 MD5Init (&cont);
 MD5Update (&cont, buffer, 96);
 MD5Final (buffer_md5, &cont);
 printf ("Calculated volume MD5: %llx/%llx\n",*(uint64_t *)buffer_md5,*(uint64_t *)(buffer_md5+8));
 printf ("    Stored volume MD5: %llx/%llx\n",*(uint64_t *)(buffer+96),*(uint64_t *)(buffer+104));

 /* Calculate the MD5 checksum of bytes 168 - 184 */
 /* Certain buggy kernels calculate and store the wrong checksum in the chunk MD5 field */
 /* We take the opportunity to detect this and optionally fix it */

 MD5Init (&cont);
 MD5Update (&cont, (buffer+168), 16);
 MD5Final (buffer_md5, &cont);

 if ( (*(uint64_t *)buffer_md5==*(uint64_t *)(buffer+240)) && (*(uint64_t *)(buffer_md5+8)==*(uint64_t *)(buffer+248)) ) {
 	printf ("Warning: This metadata seems to have been created with a buggy OpenBSD kernel!\n");
 	printf ("The chunk checksum matches bytes 168 - 184, instead of bytes 168 - 240.\n");
 	flag_bad_checksum=1;
 	}

 /* Calculate the MD5 checksum of bytes 168 - 240, which is the invariant chunk metadata */

 MD5Init (&cont);
 MD5Update (&cont, (buffer+168), 72);
 MD5Final (buffer_md5, &cont);

 printf ("Calculated  chunk MD5: %llx/%llx\n",*(uint64_t *)buffer_md5,*(uint64_t *)(buffer_md5+8));
 printf ("    Stored  chunk MD5: %llx/%llx\n",*(uint64_t *)(buffer+240),*(uint64_t *)(buffer+248));

 /* Exit here if we are in read-only mode, I.E. no new size was specified. */

 if (argc==2) {
 	return (flag_bad_checksum);
 	}

 /* Parse new size argument */

 size_new=strtonum(argv[2],0,(uint64_t)1<<62,NULL);
 if (size_new == 0) {
 	dprintf (STDERR_FILENO, "New size argument is invalid.\n");
 	return (3);
 	}

 if (size_new == size_vol) {
 	printf ("Note: supplied size value is equal to existing value!\n");
 	}

 /* Write new data to size fields */

 printf ("New     size is %llu sectors, %lluK, %lluM, %lluG\n",size_new, size_new/2, size_new/2048, size_new/2097152);
 *(uint64_t *)(buffer+56)=size_new;
 *(uint64_t *)(buffer+208)=size_new;
 *(uint64_t *)(buffer+216)=size_new;

 /* Re-calculate both checksums, and write them to the in-memory buffer */

 /* Re-calculate the MD5 checksum of the first 96 bytes, which is the invariant volume metadata */

 MD5Init (&cont);
 MD5Update (&cont, buffer, 96);
 MD5Final (buffer_md5, &cont);
 printf ("       New volume MD5: %llx/%llx\n",*(uint64_t *)buffer_md5,*(uint64_t *)(buffer_md5+8));
 *(uint64_t *)(buffer+96)=*(uint64_t *)buffer_md5;
 *(uint64_t *)(buffer+104)=*(uint64_t *)(buffer_md5+8);

 /* Calculate the MD5 checksum of bytes 168 - 240, which is the invariant chunk metadata */

 MD5Init (&cont);
 MD5Update (&cont, (buffer+168), 72);
 MD5Final (buffer_md5, &cont);

 printf ("       New  chunk MD5: %llx/%llx\n",*(uint64_t *)buffer_md5,*(uint64_t *)(buffer_md5+8));

 *(uint64_t *)(buffer+240)=*(uint64_t *)buffer_md5;
 *(uint64_t *)(buffer+248)=*(uint64_t *)(buffer_md5+8);

 /* Write the in-memory buffer over the original input file */

 fd = open (argv[1], O_WRONLY | O_TRUNC);

 if (fd == -1) {
 	dprintf (STDERR_FILENO, "Error opening %s for writing.\n", argv[1]);
 	return (3);
 	}

 if (write (fd, buffer, 512) != 512) {
 	dprintf (STDERR_FILENO, "Error writing 512 bytes to %s.\n", argv[1]);
 	return (3);
 	}

 close (fd);
 return (2);
 }

Manual page


ES-SRME(8)


NAME

es-srme - display and modify metadata of softraid volumes


SYNOPSIS

es-srme [file] [new size]


DESCRIPTION


The main use of es-srme is to modify the size fields in the metadata of a softraid volume, after the RAID partition containing it has been resized with /sbin/disklabel. The process is somewhat analogous to using /sbin/growfs to enlarge an FFS filesystem after extending its partition in the same way.


The es-srme utility can also be used to inspect the volume checksum and chunk checksum of a softraid volume, and to detect and correct a specific type of invalid chunk checksum data that has been written by a kernel version which has a known bug.


The file argument should be a regular file of exactly 512 bytes, extracted from the first metadata block of a softraid volume. This will typically be done using a command such as:


# dd if=/dev/rsd0d of=metadata skip=16 count=1

Where rsd0d is a RAID partition. es-srme can then be used to inspect the volume sizes stored in the metadata, and validate the checksums:


# es-srme metadata
Read magic: 4d4152436372616d
Volume  size is 1953124472 sectors, 976562236K, 953674M, 931G
Chunk   size is 1953124472 sectors, 976562236K, 953674M, 931G
Coerced size is 1953124472 sectors, 976562236K, 953674M, 931G
Calculated volume MD5: 5f819d7cd369af9d/d693b4e311791081
    Stored volume MD5: 5f819d7cd369af9d/d693b4e311791081
Calculated  chunk MD5: 459d3f38167c4119/a3cae6705a4e0a97
    Stored  chunk MD5: 459d3f38167c4119/a3cae6705a4e0a97

A bug in certain versions of the OpenBSD kernel causes an invalid checksum to be calculated and stored for the chunk metadata. es-srme can detect this, and will fix it by writing the correct checksum to the corresponding metadata field when the volume size is changed by the user.


Modifying the size fields of the metadata will usually be done after first resizing the RAID partition that contains the softraid volume using /sbin/disklabel. The required size is typically 528 blocks less than the size of the containing RAID partition. For example, considering the following RAID partition:


#                size           offset  fstype [fsize bsize   cpg]
  c:       7812500000                0  unused
  d:       7812500000                0    RAID

The required size for the softraid volume would likely be 7812500000-528=7812499472. Invoking es-srme with this figure for the second argument will produce output similar to the following:


# es-srme metadata 7812499472
Read magic: 4d4152436372616d
Volume  size is 1953124472 sectors, 976562236K, 953674M, 931G
Chunk   size is 1953124472 sectors, 976562236K, 953674M, 931G
Coerced size is 1953124472 sectors, 976562236K, 953674M, 931G
Calculated volume MD5: 5f819d7cd369af9d/d693b4e311791081
Stored volume MD5: 5f819d7cd369af9d/d693b4e311791081
Calculated  chunk MD5: 459d3f38167c4119/a3cae6705a4e0a97
Stored  chunk MD5: 459d3f38167c4119/a3cae6705a4e0a97
New     size is 7812499472 sectors, 3906249736K, 3814697M, 3725G
New volume MD5: 61f26ecb2f4684f2/fe1c3473a897fb5d
New  chunk MD5: 94a384417b8e3b10/ecaa2ed5cde078f8
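
The figures above can be checked with plain shell arithmetic. The following is a minimal sketch, not part of es-srme: the function names are hypothetical, and the 528 block difference is only the typical value quoted above, so it should be confirmed for the volume in question.

```shell
#!/bin/sh
# Sketch only: function names are illustrative, not part of es-srme.

# Typical difference between the RAID partition size and the softraid
# volume size, as quoted in the text. Treat this as an assumption.
SOFTRAID_DIFF=528

# volume_size PARTITION_SECTORS
# Prints the likely softraid volume size in 512-byte sectors.
volume_size() {
    echo $(( $1 - SOFTRAID_DIFF ))
}

# size_summary SECTORS
# Prints the size in sectors, K, M and G, each rounded down.
size_summary() {
    sectors=$1
    k=$(( sectors / 2 ))    # two 512-byte sectors per kilobyte
    m=$(( k / 1024 ))
    g=$(( m / 1024 ))
    echo "${sectors} sectors, ${k}K, ${m}M, ${g}G"
}

new_size=$(volume_size 7812500000)
size_summary "$new_size"
```

For the example partition of 7812500000 sectors, this prints 7812499472 sectors, 3906249736K, 3814697M, 3725G, matching the new size line reported by es-srme.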

At this point, the updated metadata in the file can be written back to the same block it was read from on the RAID partition:


# dd if=metadata of=/dev/rsd0d seek=16 count=1

This should be performed with the softraid volume detached. The softraid volume can then be re-attached, and should report its new size on the console:


# bioctl -c C -l sd0d softraid0
Passphrase:
softraid0: CRYPTO volume attached as sd5
sd5 at scsibus3 targ 3 lun 0:
sd5: 3814697MB, 512 bytes/sector, 7812499472 sectors
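
The steps in this example can be collected into a single sequence. The sketch below only prints the commands rather than running them, since writing to the wrong device would be destructive. The device names and the new size are taken from the examples above, and the use of bioctl -d to detach the volume is an assumption about a typical setup rather than something es-srme requires.

```shell
#!/bin/sh
# Dry-run sketch: prints the resize steps in order, executes nothing.
# All three values below are placeholders from the examples in this page.

RAID_PART=sd0d          # RAID partition holding the softraid volume
VOLUME=sd5              # name of the attached softraid volume
NEW_SIZE=7812499472     # new size in 512-byte sectors

print_resize_steps() {
    # Detach the volume first (assumed step, see lead-in).
    echo "bioctl -d ${VOLUME}"
    # Read the metadata block from the RAID partition into a file.
    echo "dd if=/dev/r${RAID_PART} of=metadata skip=16 count=1"
    # Update the size fields and checksums in the file.
    echo "es-srme metadata ${NEW_SIZE}"
    # Write the file back to the block it was read from.
    echo "dd if=metadata of=/dev/r${RAID_PART} seek=16 count=1"
    # Re-attach the crypto volume.
    echo "bioctl -c C -l ${RAID_PART} softraid0"
}

print_resize_steps
```

Reviewing the printed commands before typing them by hand gives a chance to check that the device names match the system in question.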

EXIT STATUS


0 - Success, no data written.

1 - Success, no data written, buggy chunk metadata checksum detected.

2 - Success, size fields updated.

3 - Failure, an error occurred.
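
A script that calls es-srme can branch on these values. The following is a small sketch with a hypothetical helper name, describe_status, that maps each documented exit status to a short message.

```shell
#!/bin/sh
# Sketch only: describe_status is an illustrative helper, not part of es-srme.

# describe_status CODE
# Prints the documented meaning of an es-srme exit status.
describe_status() {
    case "$1" in
        0) echo "success, no data written" ;;
        1) echo "success, no data written, buggy chunk checksum detected" ;;
        2) echo "success, size fields updated" ;;
        3) echo "failure, an error occurred" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Typical use, assuming es-srme is in PATH and 'metadata' was read with dd:
#   es-srme metadata 7812499472
#   describe_status $?
describe_status 2
```

A wrapper built this way would normally only write the metadata file back to the RAID partition after seeing exit status 2.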


DIAGNOSTICS


Error allocating buffers

Memory allocation failed.


Error opening file for reading

The supplied file could not be opened for reading.


Error reading 512 bytes from file

Reading the supplied file failed, or returned fewer than 512 bytes of data.


Bad magic, should be 0x4d4152436372616d

The first eight bytes of the supplied file don't match the expected magic. A likely cause is that the wrong sector has been read from the RAID partition.
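
The magic can also be checked by hand before running es-srme. This is a rough sketch using dd and od, with hypothetical function names. The order in which the eight bytes appear in the file is assumed to depend on the byte order of the machine that wrote the metadata, so both orderings are accepted here.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative, not part of es-srme.

# first8_hex FILE
# Prints the first eight bytes of FILE as 16 hex digits, in file order.
first8_hex() {
    dd if="$1" bs=8 count=1 2>/dev/null | od -An -tx1 | tr -d ' \t\n'
}

# check_magic FILE
# Compares the first eight bytes against 0x4d4152436372616d in either
# byte order. Returns non-zero if neither ordering matches.
check_magic() {
    case "$(first8_hex "$1")" in
        4d4152436372616d) echo "magic found (most significant byte first)" ;;
        6d6172634352414d) echo "magic found (least significant byte first)" ;;
        *) echo "bad magic"; return 1 ;;
    esac
}

# Typical use, with 'metadata' read from the RAID partition using dd:
#   check_magic metadata
```

If neither ordering matches, the most likely explanation is the one given above: the wrong sector was read from the RAID partition.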


Metadata version is X, only version 6 is supported by this program

Currently, es-srme only supports version 6 softraid metadata.


This doesn't appear to be a softraid crypto volume

Currently, es-srme only supports softraid crypto volumes. Other disciplines, such as RAID-1 and RAID-5, are not yet supported.


Mismatch between size fields, exiting...

The supplied file appears to contain valid softraid metadata for a softraid crypto volume, but the values of the three size fields are not all identical as would be expected from such a volume.


New size argument is invalid

The new size argument is either non-numeric, zero, negative, or exceeds 2^62.


Error opening file for writing

The supplied file could not be opened for writing.


Error writing 512 bytes to file

Writing the modified data back to file failed, or fewer than 512 bytes of data were written.


The chunk checksum matches bytes 168 - 184, instead of bytes 168 - 240

es-srme has detected that the chunk checksum has been calculated from the wrong byte range. This is an informational message, and does not stop execution of the program. If es-srme was invoked with a new size parameter, then correct checksum data will be written along with it.
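
To see the difference between the two byte ranges, an MD5 can be computed over an arbitrary slice of the metadata file. The sketch below uses a hypothetical helper, md5_range, and assumes that the ranges in the message are offsets into the 512-byte metadata block, with the end offset excluded, so that bytes 168 - 240 cover 72 bytes and bytes 168 - 184 cover 16. The digest is printed as one plain hex string, which may not be laid out the same way as the two-part values displayed by es-srme.

```shell
#!/bin/sh
# Sketch only: md5_range is an illustrative helper, not part of es-srme.

# md5_range FILE OFFSET LENGTH
# Prints the MD5 of LENGTH bytes of FILE, starting at byte OFFSET.
md5_range() {
    if command -v md5 >/dev/null 2>&1; then
        # OpenBSD style: md5 reading stdin prints only the digest.
        dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | md5
    else
        # Fallback for systems that ship md5sum instead.
        dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | md5sum | cut -d' ' -f1
    fi
}

# Typical use, with 'metadata' read from the RAID partition using dd:
#   md5_range metadata 168 72    # range the checksum should cover
#   md5_range metadata 168 16    # range matched by the buggy checksum
```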


Note: supplied size value is equal to existing value!

The supplied value for new size is the same as the existing size. This is an informational message, and does not stop execution of the program. The metadata will still be re-written, and invoking es-srme in this way can be useful in order to overwrite an invalid chunk checksum, as described above, with a valid one.


SEE ALSO


Crystal Kolipe, Resizing softraid volumes,

https://research.exoticsilicon.com/articles/resizing_softraid_volumes,

gemini://gemini.exoticsilicon.com/articles/resizing_softraid_volumes,

Exotic Silicon, 2022, Supporting material, and official webpage, (material also available via gemini).


HISTORY


The initial version of es-srme was written in March 2022.


AUTHORS


Crystal Kolipe

kolipe.c@exoticsilicon.com


CAVEATS


es-srme intentionally does not stop processing if an invalid checksum is read from the file, and it displays no specific warning (except when the chunk checksum matches the incorrect byte range described above). The only indication that a checksum is wrong will be a mismatch between the values displayed for the calculated MD5 and the stored MD5.


LICENSE


es-srme and this manual page are distributed under the following license:


Copyright 2022, Exotic Silicon, all rights reserved.


Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:


1. This software is licensed exclusively under this specific license text. The license text may not be changed, and the software including modified versions may not be re-licensed under any other license text.


2. Redistributions of source code must retain the above copyright notice, this list of conditions, and the following disclaimer.


3. Redistributions in binary form must reproduce the above copyright notice, this list of conditions, and the following disclaimer in the documentation and/or other materials provided with the distribution.


4. All advertising materials mentioning features or use of this software must display the following acknowledgement: This product includes software developed by Exotic Silicon.


5. The name of Exotic Silicon must not be used to endorse or promote products derived from this software without specific prior written permission.


6. Redistributions of modified versions of the source code must be clearly identified as having been modified from the original.


7. Redistributions in binary form that have been created from modified versions of the source code must clearly state in the documentation and/or other materials provided with the distribution that the source code has been modified from the original.


THIS SOFTWARE IS PROVIDED 'AS IS' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL EXOTIC SILICON BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Summary


Today, we've seen that it is indeed possible to expand a softraid crypto volume, despite a lack of tools in the OpenBSD base installation to do this. We've studied the structure of the softraid metadata, and identified - and fixed - a kernel bug, which was generating invalid checksums for the chunk metadata.



Copyright 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023 Exotic Silicon. All rights reserved.


-- Page fetched on Sun May 12 09:10:25 2024