Cross Compilers part 2

2023-03-06

In part one of this series, we built a cross toolchain capable of compiling C and C++ code on an x86_64 Linux machine which will then run on a riscv64 Linux machine using musl libc. As the final step, we compiled a simple `Hello world!` executable to test the toolchain. In part two, we're going to use the cross compiler to build real-world software, and explore some common pitfalls that come up because build systems are not designed with cross compilation in mind.


Best case - a simple program and a blank slate

The first thing that I want to show is compiling a relatively simple C executable without any build system. This illustrates how to call the compiler directly. It also illustrates that the cross toolchain already knows how to find the correct libraries in its sysroot, which we defined as part of the configure flags passed to gcc and binutils in part one. That is going to make you wonder how so many build systems manage to screw up what the compiler already knows how to do.


The following code is /bin/echo taken from HitchHiker Linux. It's released under the Beerware license so feel free to use it. Or not.

#include <locale.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char * argv[]) {
  int n = 0, i = 1;
  (void)setlocale(LC_ALL, "");

  if ((argc > 1) && (!strcmp(argv[1], "-n"))) {
    n = 1;
    i = 2;
  }
  for (; i < argc; i++) {
    (void)fputs(argv[i], stdout);
    if (argv[i + 1] != NULL)
      putchar(' ');
  }
  if (!n)
    putchar('\n');
  if (fflush(stdout))
    return 1;
  return 0;
}

This is a single file, and there are no dependencies other than libc. Let's compile it.

riscv64-linux-musl-gcc -std=c99 -Wall -pedantic -O2 -o echo echo.c

That should finish without errors. Let's check the resulting binary with `file`.

suaron% file echo
echo: ELF 64-bit LSB executable, UCB RISC-V, RVC, double-float ABI, version 1 (SYSV), dynamically linked, interpreter /lib/ld-musl-riscv64.so.1, not stripped

Easy peasy. Note that I could have omitted all of the CFLAGS (-std=c99 -Wall -pedantic -O2) and the command would be even simpler. Anyway, the compiler knew to look in its own sysroot to find the target system's libc and header files. In other words, cross compiling SHOULD BE EASY. In practice it just isn't, because people are clever enough to break it in imaginative ways while attempting to make other things easy.
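
If you want to see where the compiler is looking, GCC can report its configured sysroot with the `-print-sysroot` option. The following is a minimal sketch, assuming the toolchain from part one is on your PATH as `riscv64-linux-musl-gcc`. The `show_sysroot` helper is a name made up for this example, not part of any tool.

```shell
# show_sysroot is a hypothetical helper: it asks a GCC-style compiler
# for its sysroot, or says so if the compiler is not on PATH.
show_sysroot() {
  cc="$1"
  if command -v "$cc" >/dev/null 2>&1; then
    "$cc" -print-sysroot
  else
    echo "$cc: not found on PATH"
  fi
}

show_sysroot riscv64-linux-musl-gcc
```

If the compiler was configured without a sysroot, GCC prints nothing here, so an empty answer is worth noticing too.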


Next best case, a simple Makefile, no deps

Hand writing Makefiles is becoming a lost art, but it need not be. One of the cool features of make, in its many implementations, is that the variable $(CC) will generally be automatically set to something sane if you are compiling to run on the same system that the code is being compiled on. So instead of writing a make recipe that compiles object files by calling `gcc`, the recipe can be made generic by calling $(CC) instead. This also usually makes for a great cross compilation story, as you can often cross compile just by passing a new value for CC in the environment.


A great example of this scenario would be the official Lua distribution. Since Lua is written with very portable code that doesn't rely on anything beyond libc, and doesn't call functions from libc that might be specific to a certain libc implementation, we have a great situation for cross compilation. One can extract the source, go into the directory, and do:

suaron% CC=riscv64-linux-musl-gcc make
make[1]: Entering directory '/home/nathan/src/lua-5.4.4/src'
Guessing Linux
make[2]: Entering directory '/home/nathan/src/lua-5.4.4/src'
make all SYSCFLAGS="-DLUA_USE_LINUX" SYSLIBS="-Wl,-E -ldl"
make[3]: Entering directory '/home/nathan/src/lua-5.4.4/src'
riscv64-linux-musl-gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_3 -DLUA_USE_LINUX    -c -o lapi.o lapi.c
riscv64-linux-musl-gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_3 -DLUA_USE_LINUX   -c lcode.c
riscv64-linux-musl-gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_3 -DLUA_USE_LINUX    -c -o lctype.o lctype.c

... omitted for brevity ...

riscv64-linux-musl-gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_3 -DLUA_USE_LINUX    -c -o linit.o linit.c
ar rcu liblua.a lapi.o lcode.o lctype.o ldebug.o ldo.o ldump.o lfunc.o lgc.o llex.o lmem.o lobject.o lopcodes.o lparser.o lstate.o lstring.o ltable.o ltm.o lundump.o lvm.o lzio.o lauxlib.o lbaselib.o lcorolib.o ldblib.o liolib.o lmathlib.o loadlib.o loslib.o lstrlib.o ltablib.o lutf8lib.o linit.o
ar: `u' modifier ignored since `D' is the default (see `U')
ranlib liblua.a
riscv64-linux-musl-gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_3 -DLUA_USE_LINUX    -c -o lua.o lua.c
riscv64-linux-musl-gcc -o lua   lua.o liblua.a -lm -Wl,-E -ldl
riscv64-linux-musl-gcc -std=gnu99 -O2 -Wall -Wextra -DLUA_COMPAT_5_3 -DLUA_USE_LINUX    -c -o luac.o luac.c
riscv64-linux-musl-gcc -o luac   luac.o liblua.a -lm -Wl,-E -ldl
make[3]: Leaving directory '/home/nathan/src/lua-5.4.4/src'
make[2]: Leaving directory '/home/nathan/src/lua-5.4.4/src'
make[1]: Leaving directory '/home/nathan/src/lua-5.4.4/src'

Now this is a perfect situation. In some cases it will also be necessary to set variables for other tools such as `AR=riscv64-linux-musl-ar RANLIB=riscv64-linux-musl-ranlib CC=riscv64...`. Generally though, a well-crafted Makefile makes me smile as it's usually vastly easier to deal with than more complex build tools.
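
Since the binutils from part one carry the same triple prefix as the compiler, the tool names can be derived from the triple rather than typed out each time. This is only a sketch: `cross_env` is a hypothetical helper, and whether a given Makefile actually honors AR and RANLIB from the environment depends on how it was written.

```shell
# cross_env is a hypothetical helper that prints the tool variables
# for a given target triple, one assignment per line.
cross_env() {
  triple="$1"
  for tool in CC:gcc AR:ar RANLIB:ranlib; do
    var=${tool%%:*}    # text before the colon, e.g. AR
    bin=${tool#*:}     # text after the colon, e.g. ar
    echo "$var=$triple-$bin"
  done
}

cross_env riscv64-linux-musl
```

The printed assignments can be handed to a build with something like `env $(cross_env riscv64-linux-musl) make`.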

Some possible Makefile issues

One of the most common mistakes you may run across when dealing with plain Makefiles, and with any build runner in general, is hardcoded paths. It is usually not even necessary to tell the compiler where to find include files or the linker where to look for system libraries, but this is often done anyway, and with hardcoded paths to /usr/include and /usr/lib. This can have wildly unpredictable results, ranging from a complete failure to what appears to be a successful build, only for you to discover later on that the resulting binary won't run on any system as it has been contaminated with incorrect information from the host system. You only want the compiler to look inside $(sysroot)/usr/include for headers, because that's where the headers for the target are. I can't even begin to tell you how often people screw this up.
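
A quick way to spot this before building is to search the source tree for `-I` and `-L` flags that point at host directories. The sketch below runs against a throwaway Makefile so that it is self-contained. The regular expression is my own guess at the usual offenders and will not catch every variation.

```shell
# Create a throwaway Makefile containing the kind of line to look for.
workdir=$(mktemp -d)
printf 'CFLAGS += -I/usr/include -O2\nLDFLAGS += -L/usr/lib\nLIBS = -lm\n' \
  > "$workdir/Makefile"

# Report every line that hands the compiler or linker an absolute
# host path via -I or -L, with file name and line number.
hits=$(grep -rnE -- '-[IL] ?/usr/(local/)?(include|lib)' "$workdir")
echo "$hits"
rm -rf "$workdir"
```

In a real project, point the grep at the extracted source directory instead of `$workdir`.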


Another example is a developer who only wants to support folks compiling their software with gcc, and so puts something like `CC := gcc` in their Makefile. Don't do this. It's just terrible practice to fail if someone tries to use a tool that happens to be named differently than the one on your system. Another compiler may work just fine. As I mentioned above, the CC variable will automatically be set to a sane default anyway, so there is generally no good reason to set it explicitly in a Makefile. If you run across a Makefile that does this and you want to cross compile, just change `CC := gcc` to `CC ?= gcc` and you're good to go. The question mark tells make, "set this value for this var if it isn't already set to another value". This way a value for CC passed in the environment, as in the Lua example above, is respected instead of being overwritten. (Passing it as an argument to make, as in `make CC=riscv64-linux-musl-gcc`, overrides the assignment in the Makefile either way.)
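
The difference is easy to see with two throwaway Makefiles that only echo the value of CC. This is a small sketch that assumes `make` is installed. The file names `hard.mk` and `soft.mk` are made up for the demonstration, and the cross compiler is never actually invoked.

```shell
# Build two throwaway Makefiles that differ only in the assignment.
# printf is used so that the recipe line starts with a real tab.
workdir=$(mktemp -d)
printf 'CC := gcc\nshow:\n\t@echo $(CC)\n' > "$workdir/hard.mk"
printf 'CC ?= gcc\nshow:\n\t@echo $(CC)\n' > "$workdir/soft.mk"

# A value passed in the environment loses against := ...
hard=$(CC=riscv64-linux-musl-gcc make -s -f "$workdir/hard.mk" show)
# ... but wins against ?=.
soft=$(CC=riscv64-linux-musl-gcc make -s -f "$workdir/soft.mk" show)
# A value passed as an argument to make wins even against :=.
arg=$(make -s -f "$workdir/hard.mk" show CC=riscv64-linux-musl-gcc)

echo "env vs :=  -> $hard"
echo "env vs ?=  -> $soft"
echo "arg vs :=  -> $arg"
rm -rf "$workdir"
```

With GNU make, the first line should report `gcc` and the other two the cross compiler.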

GNU Autotools

By far the most common build system still widely in use is GNU's autotools. Nobody likes it, and very few people ever truly understood it. It's entirely awful. But it's actually got a pretty good story for cross compilation if the author hasn't messed up. Autotools has a number of features specifically for this use case, actually. If you want to cross compile with autotools, run `./configure --help` and see if it has the options `--host`, `--target`, or `--build`. The `--target` option should not generally be used except in the case of binutils and compilers, as it specifies the machine for which those tools should generate code. The `--build` option should be set to the target triple of the system which is building the software, while the `--host` option will be set to the target triple of the machine which the compiled code will run on. After running ./configure with those options, you can then run `make && make install` as normal.
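
Put together, a cross configure line usually ends up looking like the one printed below. This is a sketch only. `configure_cmd` is a hypothetical helper, the build triple `x86_64-pc-linux-gnu` is a placeholder for whatever your build machine really is, and `--prefix=/usr` is an assumption about where the software should live on the target.

```shell
# configure_cmd is a hypothetical helper that only prints the command,
# so it can be reviewed before it is run for real.
#   $1 = triple of the machine doing the build
#   $2 = triple of the machine the result will run on
configure_cmd() {
  echo "./configure --build=$1 --host=$2 --prefix=/usr"
}

configure_cmd x86_64-pc-linux-gnu riscv64-linux-musl
```

Many autotools packages ship a `config.guess` script alongside `config.sub`, and running it is one way to find out what the build triple should be.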


While looking through the options which `configure` will accept, look out for a `sysroot` option. Occasionally you will need to set this to the compiler's sysroot in order for things to function right in the build, because the author has gone ahead and used `-I/some/path` or `-L/some/other/path`, but has at least prefixed those paths with $(sysroot), so that once you tell ./configure where the sysroot is, the correct paths will be used. Occasionally the same thing is achieved by setting an environment variable $SYSROOT. Now, as I keep repeating, YOU DON'T GENERALLY EVER HAVE TO TELL THE COMPILER WHERE TO FIND INCLUDES OR LIBRARIES because it already knows, and in the case of a cross compiler the compiler has information that the author of the software does not. And yet, people persist in thinking that they have to explicitly set these paths in the compiler command line.


There is another issue that you may run into with autotools. If your target triple lacks the vendor string (like the compiler built in part 1 of this series) and the project you are attempting to compile has an outdated config.sub, there is a very real possibility that it will shit the bed because it thinks you have an invalid target triple, even though what you have is completely valid. This can be fixed by simply replacing the old, stale config.sub with an updated version which has this particular bug fixed.

https://git.savannah.gnu.org/cgit/config.git/plain/config.sub
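
Some projects carry more than one copy of config.sub, for instance in a build-aux directory, so it can be worth replacing all of them in one go. The sketch below uses a stand-in source tree and a stand-in "fresh" file so that it runs anywhere. In practice the fresh file would be the one downloaded from the URL above, and the directory layout here is invented.

```shell
# Stand-in source tree with two stale copies, plus a "fresh" file
# playing the role of the downloaded config.sub.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/build-aux"
echo stale > "$workdir/src/config.sub"
echo stale > "$workdir/src/build-aux/config.sub"
echo fresh > "$workdir/config.sub.new"

# Overwrite every config.sub under the tree with the fresh copy.
find "$workdir/src" -name config.sub -type f \
  -exec cp "$workdir/config.sub.new" {} \;

replaced=$(cat "$workdir/src/config.sub" "$workdir/src/build-aux/config.sub")
echo "$replaced"
rm -rf "$workdir"
```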

Libtool

Once upon a time there must have been a legitimate reason for libtool. In 2023 libtool is an anachronism at best, but is much more likely a complete nuisance. Libtool will screw up your cross builds regularly, as its whole purpose for being was to abstract away the details of how to invoke the compiler and linker so that code could be built portably across all of the dozens of different Unix variants that used to exist. These days, you're generally using Gcc, Clang or whatever crap Windows ships with. More to the point, the command line interfaces for Gcc and Clang are largely the same. There simply isn't any reason to abstract these things away in 2023 as building libraries on different platforms generally involves running the same commands now. Libtool sucks, and it should not be used. The only thing I've found that can sometimes help when it's misbehaving is to delete all .la files everywhere. (The .la files are a special libtool archive, which is used to tell libtool how to link to the library. In the case of cross compiling, they are almost certainly wrong.) Just nuke them from orbit. It's the only way to be sure.
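
Nuking them is a one-liner with find. The sketch below runs against a stand-in sysroot created in a temporary directory, with an invented `libfoo`. For real use, point it at your compiler's sysroot instead.

```shell
# Stand-in sysroot with one libtool archive and one real library.
sysroot=$(mktemp -d)
mkdir -p "$sysroot/usr/lib"
touch "$sysroot/usr/lib/libfoo.la" "$sysroot/usr/lib/libfoo.a"

# List the libtool archives first, then remove them.
find "$sysroot" -name '*.la' -type f -print
find "$sysroot" -name '*.la' -type f -exec rm -f {} +

remaining=$(ls "$sysroot/usr/lib")
echo "$remaining"
rm -rf "$sysroot"
```

Listing before deleting is deliberate, as it gives you a chance to see what is about to go.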

Dealing with dependencies

If your application depends on libraries other than libc, then those libraries must also be installed in your compiler's sysroot. Compile them with your cross compiler, configured with `--prefix=/usr`, but make sure that they are installed into the sysroot by running `make DESTDIR=${sysroot} install`. Once the program's dependencies are installed into the sysroot, the build system should pick them up if it is sane.
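
The point of DESTDIR is that the prefix baked into the software stays `/usr`, while the files physically land under the sysroot. Here is a toy demonstration, assuming `make` is installed. The Makefile, its `PREFIX` variable, and `libdemo.a` are all invented for the example and stand in for a real library's install target.

```shell
# Throwaway project: a fake library and an install target that
# honors DESTDIR the way a well-behaved Makefile does.
workdir=$(mktemp -d)
sysroot="$workdir/sysroot"
echo placeholder > "$workdir/libdemo.a"
printf 'PREFIX = /usr\ninstall:\n\tmkdir -p $(DESTDIR)$(PREFIX)/lib\n\tcp libdemo.a $(DESTDIR)$(PREFIX)/lib/\n' \
  > "$workdir/Makefile"

# PREFIX stays /usr; only DESTDIR changes where the files land.
(cd "$workdir" && make -s DESTDIR="$sysroot" install)

installed=$(cd "$sysroot" && find . -type f)
echo "$installed"
rm -rf "$workdir"
```

The file should show up as `./usr/lib/libdemo.a` relative to the sysroot, which is where the cross compiler will look for it.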

Homebrewed build scripts

These are the worst. You'll often find this sort of thing masquerading as an autotools setup where there is a hand written shell script which performs some tests against the target system and then writes either a `config.mk` file or else generates a `config.h` header which sets up some C preprocessor abuse to make the code more "portable". These scripts are almost never aware of cross compilation concerns, and are an almost sure sign of the presence of other footguns such as hardcoded paths.


Here's how I often deal with the homebrewed build system. I follow the directions that the author has given to compile the software, but for the system running the build, not cross compiled. I then redirect the output of `make` to a text file, often using the `tee` utility, from which one can usually glean the actual commands being issued. Sometimes an author will actually obfuscate this information on purpose by prefixing the commands in their Makefile recipes with '@', in which case I'll edit the Makefile so that I can get a clear picture of what commands were used to compile the software. Sometimes, if all the hand written script was doing was creating a `config.mk` file, you can then just adjust the config.mk and possibly the actual Makefile to make it cross compile. You might have to change lines in a Makefile from `-I/usr/include` to `-I$(SYSROOT)/usr/include`. Occasionally, if it isn't a huge project, I'll just write my own Makefile once I understand how to build the software.
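
Removing the '@' prefixes does not have to be done by hand. The following sketch writes a second copy of a Makefile with the leading '@' stripped from recipe lines. The Makefile contents and the name `Makefile.verbose` are made up for the example.

```shell
# Throwaway Makefile whose recipe hides its command behind '@'.
workdir=$(mktemp -d)
tab=$(printf '\t')
printf 'prog: prog.c\n\t@$(CC) -o prog prog.c\n' > "$workdir/Makefile"

# Write a copy of the Makefile with the leading '@' removed from
# recipe lines, so make echoes each command before running it.
sed "s/^${tab}@/${tab}/" "$workdir/Makefile" > "$workdir/Makefile.verbose"

recipe=$(sed -n 2p "$workdir/Makefile.verbose")
echo "$recipe"
rm -rf "$workdir"
```

The build can then be logged with something like `make -f Makefile.verbose 2>&1 | tee build.log`. With GNU make, `make -n` is another option worth trying, as it prints the commands without running them, including the ones marked with '@'.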


As an aside, I've replaced the build systems in a number of projects over the years. GNU configure scripts can become quite lengthy, and can also take much longer to run than it takes to compile the software. For HitchHiker, I stripped out the autotools build system from GNU sed and replaced it with a Makefile, cutting the build time down from a minute and thirty seconds to about eight seconds. It was a fun experiment and it really illustrates a point.

Perl

Perl is perhaps the best example of the homebrewed build script. Perl's build runner is written in large part in perl. You have to compile a mini version of perl, then use that mini perl to compile perl. There is a perl-cross project on GitHub which will allow you to cross compile perl using a hacky system, but the problem is that you actually have to then compile any perl modules that you intend to install on the finished system at the same time. Perl's build system records the tools and environment that were used when you built the interpreter, then re-uses that information to build perl modules. So if you use perl-cross, then try to build perl modules on the target machine later with a native toolchain, it will shit the bed because it can't find your cross compiler and the paths are all wrong. Basically this is one of the biggest steaming piles I've ever seen in the FLOSS world and a great example of what not to do.

Other build systems - CMake, Meson, Waf, and beyond

I hate pretty much all of these tools to one degree or another, but the better ones (CMake and Meson) will at least provide some way to be sanely configured for what you may want to do. How to do so is largely out of scope for this article and you should look at the documentation for the build system in question.


Indeed, while there is technically no reason why any piece of software cannot be cross compiled, there are practical limits to what is achievable in your spare time. The best candidates for cross compilation are those with a small number of dependencies using a build system that is not a mess of homebrewed code but rather a well thought out system. You can fairly quickly and easily cross build a headless server without a gui, but you're going to run into a lot of headache attempting to build a full desktop experience as you encounter all of the hundreds of differing build systems and find all of the wonderful bugs introduced by accident into a process that should be, as I harped on above, pretty easy and foolproof. Nevertheless, cross compilation is a tool that I use regularly to deploy code on my little army of SBCs around the house, as it's much faster to build with a fast x86_64 system stuffed with memory than on a small constrained SBC. Plus, I've come to enjoy the troubleshooting and I think it's made me a better developer overall.




All content for this site is licensed as CC BY-SA.

© 2023 by JeanG3nie
