Benchmarking The New Crystal Compiler Options (-O0 to -O3)


I noticed that v1.11.0 of Crystal is going to ship with some new compiler options that control how much optimization is done. This should help compile times a bit, so I decided to check out the git trunk, compile Crystal, and see how it handles Benben. At the same time, I decided to check the performance of the Common Lisp version of Benben to see how it's coming along, and whether it's still a good idea to continue working on that port.


For these benchmarks, I used my desktop computer, which has these specifications:


Intel Core i9-10850K, 10 physical cores, 20 logical cores

64 GB RAM

Slackware 15.0 + a custom kernel v6.6.8

Far too much disk space on both HDDs and SSDs.


The software had these versions:


Crystal: git commit e3200d9eb8814e92849503e646325b09647cfe9e (the current WIP v1.11.0)

SBCL: v2.4.0

Benben (Crystal): fossil commit 93e7e91c61c31f2738fb9ef5804ea32bd2c727a1f31f3fa04f9b5f482db234e0

Benben (Lisp): fossil commit b9b57d54983253bf092ca319e2ac6205c210231e6ed8aa1a5c96315bc1c96b00

YunoSynth: fossil commit dd7d35f6457a78a25f15dbbab958e096d7bce626

SatouSynth: fossil commit 705f3f0624a216ccbf62ca134047c13c0d680efe (not online yet)


Things to remember:


YunoSynth will always be in Crystal. Same repo as always.

SatouSynth is a rewrite of YunoSynth in Common Lisp, and is in a separate repo.

Benben is getting ported, not forked. Same repo, different branches.

SatouSynth has about 9k lines of code right now, while YunoSynth has 24,205.


The method I used for the Crystal benchmarks was this, in order:


Remove the Crystal cache (~/.cache/crystal)

Remove the /tmp/foo directory

"rake clean"

Build

Render a pre-defined playlist to /tmp/foo using the new binary.


For the Crystal version, I used this when building, adding the appropriate -Ox option and --single-module as needed:


time shards build -p -Dpreview_mt -Dremiconf_no_hjson --no-debug -Dyunosynth_wd40 -Dremiaudio_wd40 -Dremisound_wd40

This is the same as the Rakefile would do for a release build, except I removed the "--release" parameter. Those "-Dxxx_wd40" defines are just to enable some unsafe optimizations in my libraries.
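
As a rough illustration of how the per-mode command lines differ, here is a minimal Python sketch that assembles the argument list for each row of the results tables. The helper names (`build_command`, `cleanup_paths`) are made up for this example, the shell's `time` prefix is left out, and mapping the "release" row to a literal `--release` flag is an assumption. Nothing here runs a build or deletes anything; it only constructs the commands.

```python
import os

# Flags copied verbatim from the build command above.
BASE_FLAGS = [
    "-p", "-Dpreview_mt", "-Dremiconf_no_hjson", "--no-debug",
    "-Dyunosynth_wd40", "-Dremiaudio_wd40", "-Dremisound_wd40",
]

MODES = ["O0", "O1", "O2", "O3", "O0sm", "O1sm", "O2sm", "release"]


def build_command(mode):
    """Return the argv list for one benchmark mode, e.g. "O2" or "O1sm"."""
    cmd = ["shards", "build"] + BASE_FLAGS
    if mode == "release":
        # Assumption: the "release" row was built with --release itself.
        return cmd + ["--release"]
    single_module = mode.endswith("sm")
    level = mode[:-2] if single_module else mode
    if level not in ("O0", "O1", "O2", "O3"):
        raise ValueError(f"unknown mode: {mode}")
    cmd.append("-" + level)
    if single_module:
        cmd.append("--single-module")
    return cmd


def cleanup_paths():
    """Paths removed before every Crystal build, per the steps listed earlier."""
    return [os.path.expanduser("~/.cache/crystal"), "/tmp/foo"]


if __name__ == "__main__":
    for mode in MODES:
        print(mode, " ".join(build_command(mode)))
```

The "rake clean" step and the playlist render are not modelled, since the render command isn't shown here.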


Building Benben for Common Lisp is a bit different right now. I have some "prepare-deps-XXX.sh" scripts that will pre-build some of the dependencies for either a debug or release build. Then, I use "make release=1" or "make debug=1" to build the binary. The two-step approach is just to match my personal workflow, and because with Lisp I can get away with pre-building most stuff and then not rebuilding it later. This is just temporary, but it does work.


So, building for Common Lisp was done like this:


Remove the Common Lisp build cache (~/.cache/common-lisp)

Remove the /tmp/foo directory

"make clean"

"time ./prepare-deps-release.sh"

"time make release=1"

Render a pre-defined playlist to /tmp/foo using the new binary.


Finally, I rendered a playlist containing 79 songs to a directory on an SSD ("/tmp/foo"). The songs were mostly from X68000, with a few MSX2 songs thrown in as well. These were chosen because both versions of Benben currently support their chips (I could have thrown in the PC Engine as well, but oh well). So the chip emulators exercised were the YM2151, the OKI MSM6258, and the AY-3-8910.


Reading the Results


The times are all in seconds. Lower is better.


The "Avg. Samples/sec" column is the average number of audio samples that Benben is rendering per second. Higher is better.


The "Mode" column indicates how Benben was built. For Crystal, this is -Ox (so -O1, -O3, etc., without --single-module) or -Oxsm (the same, but with --single-module added). With Crystal, --release is the same as "-O3 --single-module" according to the docs, so I just labeled that combination "release" instead of "O3sm".


My Common Lisp setup only has two modes right now: release or debug.


Crystal Results


Build Times:


Mode     Time (s)
O0              9
O1             15
O2             16
O3             33
O0sm           12
O1sm           81
O2sm           87
release        91

Run Times:


Mode     Time (s)   Avg. Samples/sec
O0            204            926,130
O1             93          2,022,830
O2             85          2,228,536
O3             87          2,178,511
O0sm          200            939,526
O1sm           32          5,864,521
O2sm           28          6,830,182
release        27          7,071,019
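
To make the trade-off between build time and run time easier to see, here is a small Python sketch that works only from the numbers in the two tables above. It computes each mode's run-time speedup over -O0 and the combined build-plus-render time for a single pass over the playlist. The function names are invented for this example.

```python
# (mode, build seconds, run seconds) copied from the two Crystal tables above.
CRYSTAL = [
    ("O0", 9, 204),
    ("O1", 15, 93),
    ("O2", 16, 85),
    ("O3", 33, 87),
    ("O0sm", 12, 200),
    ("O1sm", 81, 32),
    ("O2sm", 87, 28),
    ("release", 91, 27),
]


def speedup_vs(baseline, rows):
    """Run-time speedup of every mode relative to the baseline mode."""
    base_run = next(run for mode, _, run in rows if mode == baseline)
    return {mode: base_run / run for mode, _, run in rows}


def total_seconds(rows):
    """Build time plus one render of the playlist, per mode."""
    return {mode: build + run for mode, build, run in rows}


if __name__ == "__main__":
    speedups = speedup_vs("O0", CRYSTAL)
    totals = total_seconds(CRYSTAL)
    for mode, build, run in CRYSTAL:
        print(f"{mode:8} x{speedups[mode]:.2f} vs O0, build+run {totals[mode]} s")
```

By this measure, -O2 without --single-module has the lowest combined time for one build and one render (101 seconds), while the single-module builds only come out ahead once the binary is run more than once. That only holds for this playlist on this machine.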

Common Lisp Results


Build Times:


Mode     Time (s)
debug          66
release        38

Run Times:


Mode     Time (s)   Avg. Samples/sec
debug         120          1,578,634
release        41          4,633,274
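
As a quick sanity check on the two run-time tables, multiplying each run time by its average samples/sec should give roughly the same total every time, since every build renders the same playlist. The Python sketch below does that multiplication. It assumes the average is simply total samples divided by wall-clock time, which is a guess about how Benben computes it, and the function names are made up for this example.

```python
# (label, run seconds, avg. samples/sec) copied from both run-time tables.
RUNS = [
    ("crystal O0", 204, 926_130),
    ("crystal O1", 93, 2_022_830),
    ("crystal O2", 85, 2_228_536),
    ("crystal O3", 87, 2_178_511),
    ("crystal O0sm", 200, 939_526),
    ("crystal O1sm", 32, 5_864_521),
    ("crystal O2sm", 28, 6_830_182),
    ("crystal release", 27, 7_071_019),
    ("lisp debug", 120, 1_578_634),
    ("lisp release", 41, 4_633_274),
]


def implied_totals(rows):
    """Estimated total samples rendered: seconds * samples per second."""
    return {label: secs * rate for label, secs, rate in rows}


def spread(totals):
    """Relative gap between the largest and smallest implied total."""
    lo, hi = min(totals.values()), max(totals.values())
    return (hi - lo) / lo


if __name__ == "__main__":
    totals = implied_totals(RUNS)
    for label, total in totals.items():
        print(f"{label:16} {total:,}")
    print(f"spread: {spread(totals):.2%}")
```

All ten rows land within about 2% of each other, at roughly 189 million samples. That is about what you'd expect given that the times are rounded to whole seconds.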

Thoughts


So there we have it. The new -Ox options are... kinda cool, I guess? I can see myself using -O1 or -O2 to build a quick debug binary during development, and pretty much abandoning -O0 for most of my use cases. But overall, the difference between them is pretty underwhelming unless you add --single-module. But, adding --single-module doesn't look like something I would ever do because of the increased build times (I'm not counting --release here). So really, I don't see the new options being much use for me.


On the Lisp side, it's interesting to see that the release build took *less* time than the debug build. This is pretty neat, though not useful during development since I'd then lack debug information. The speed of a release build is quite good, though doing a "-O1 --single-module" is enough to inch past it with Crystal. Still, it's impressive.
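
To put a number on that comparison, this Python sketch expresses each Crystal mode's throughput relative to the SBCL release build, using only the samples/sec figures from the tables above. The names are invented for this example.

```python
# Avg. samples/sec copied from the run-time tables above.
SBCL_RELEASE = 4_633_274
CRYSTAL_RATES = {
    "O0": 926_130,
    "O1": 2_022_830,
    "O2": 2_228_536,
    "O3": 2_178_511,
    "O0sm": 939_526,
    "O1sm": 5_864_521,
    "O2sm": 6_830_182,
    "release": 7_071_019,
}


def relative_throughput(rates, reference):
    """Each mode's samples/sec as a multiple of the reference rate."""
    return {mode: rate / reference for mode, rate in rates.items()}


def faster_than_reference(rates, reference):
    """Modes whose throughput exceeds the reference, in table order."""
    return [mode for mode, rate in rates.items() if rate > reference]


if __name__ == "__main__":
    for mode, ratio in relative_throughput(CRYSTAL_RATES, SBCL_RELEASE).items():
        print(f"{mode:8} {ratio:.2f}x SBCL release")
```

Only the three single-module builds with optimization turned on (O1sm, O2sm, and release) come out ahead of the Lisp release build. Everything else lands at less than half its throughput.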


I'm not sure if I'll use this info to decide what to do yet, or what I'll even do. So for now, I'll continue on my present course. Regardless, it was interesting to see the new compiler options for Crystal.


---------
Page served by Aya https://nanako.mooo.com/fossil/aya/
Aya is under the GNU Affero GPLv3 license