-- Leo's gemini proxy

-- Connecting to ew.srht.site:1965...

-- Connected

-- Sending request

-- Meta line: 20 text/gemini

2023-01-28

Tools: redo (part 6) The yacc/bison problem: one call produces two artifacts

tags: software

Content

Part 0: Intro

Part 1: Hello, world!

Part 2: Automatic Recording of Dependencies on Header Files

Part 3: CFLAGS and friends, config.sh, compile.do

Part 4: CFLAGS and friends, env/VAR, default.run.do

Part 5: Auto-update BUILDDATE in version.h

Part 6: The yacc/bison problem: one call produces two artifacts

Part 7: Test: Generator for N source files

My code featured in this series can be found at

https://git.sr.ht/~ew/ew.redo/

One of the things any build system must do for me, is the build of hoc, the "Higher Order Calculator" as presented in "Kernighan, Pike --- The Unix Programming Environment" published in 1984. There is this one detail: a call to bison produces two targets from one prerequisite file. bison should not be called twice during the build --- even though in the case of hoc this is an affair of seconds.

Detour: Higher Order Calculator (hoc)

Many years ago, probably 1993 or so, I came across this book and decided to retrace the hoc programm as shown in there. hoc is a calculator programm, however, its core is coded as a grammar. yacc (or more recently bison) will read the grammar and generate two C-code files from the description. Using a grammar requires fewer lines of code to write. Similarly in stage 3 of this little project the use of lex (or more recently flex) and a declarative piece of code will generate a scanner as C-code. The scanner reads the input and reports recognized tokens (like NUMBER or VAR) to its caller. I did learn a lot from this little project, and I can highly recommend it.

I did publish a separate post for this piece of software. It required a few tweaks to transport it into the world of 64bit machines, and to please newer versions of gcc. My code is published on sourcehut.org:

/en/2023/20230116-hoc-revival.gmi

https://git.sr.ht/~ew/ew.hoc

Among other things the book features a makefile, which tries to handle the following problem: yacc/bison produces two files (targets, artifacts), not only one. So a simplistic description of dependencies will cause yacc to be called twice, once to create y.tab.c and a second time to create y.tab.h. The makefile does offer a solution to this problem, and redo needs to solve this somehow as well.

Detour: cons and scons

In 1998 I came across a build tool named "cons" written in perl. It could deal with a number of build problems much better, since it did not rely on the mtime stamp, but rather on hash sums over the code plus the command to build the artifact. It was written by Bob Sidebotham and featured in The Perl Journal.

https://www.foo.be/docs/tpj/issues/vol3_1/tpj0301-0012.html

https://www.gnu.org/software/cons/

cons has been deprecated and is replaced by scons, which is written in python. scons has been heavily inspired by cons.

https://scons.org/

Puzzle Pieces to play with

stage-3 of the hoc code mentioned above consists of the following sources:

hoc.h -- defines the Symbol struct as a type and publishes a few function prototypes

hoc.y -- defines the grammar, including operators, their precedence, how to deal with them, including error handling and the main() function.

init.c -- initializes a list of builtin functions and constants

math.c -- implements said builtin functions

lex.l -- defines the regular expressions building the input scanner. It returns recognized tokens to its caller

symbol.c -- implements functions to deal with installing and retrieving Symbols, i.e. variables, constants, and function pointers.

Just to make sure the code builds, a simple Makefile is present. The desired set of commands to build hoc stage-3 from a clean directory is this:

shell$ make
bison --yacc --defines hoc.y
hoc.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
hoc.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
mv y.tab.c hoc.c
gcc -O2   -c -o hoc.o hoc.c
flex lex.l
mv lex.yy.c lex.c
gcc -O2   -c -o lex.o lex.c
gcc -O2   -c -o init.o init.c
gcc -O2   -c -o math.o math.c
gcc -O2   -c -o symbol.o symbol.c
gcc -O2 hoc.o lex.o init.o math.o symbol.o -o hoc -lm -lfl

Possible Solution in redo

How to deal with the "bison produces TWO artefacts in one call"-Problem using redo? Let's try to layout a plan. The list below indicates the neccessary .do snippets to create various targets. "hoc.run-bison" is a virtual target, its content is not used in the compiled code.

input files/sources: hoc.h hoc.y init.c lex.l math.c symbol.c

hoc.run-bison.do hoc.y --> hoc.run-bison, tmp_dir/y.tab.{c,h}

hoc.c.do tmp_dir/y.tab.c --> hoc.c

y.tab.h.do tmp_dir/y.tab.h --> y.tab.h

hoc.run-flex.do lex.l --> yy.lex.c

lex.c.do yy.lex.c --> lex.c

default.o.do {init,hoc,lex,math,symbol}.c --> {init,hoc,lex,math,symbol}.o

hoc.do {init,hoc,lex,math,symbol}.o --> hoc

all.do

clean.do

test.do

So the first step in building hoc is to write a snippet, which will call bison and produce the two artifacts. The input of this step (hoc.y) is registered with a call to redo-ifchange.

hoc.run-bison.do hoc.c.do y.tab.h.do

# hoc.run-bison.do v1
redo-ifchange "hoc.y"
bison --yacc -d hoc.y

While this does produce the desired artefacts y.tab.c and y.tab.h, we need to somehow record the state of affairs to avoid bison being called unless needed. The target hoc.run-bison is a virtual target. Its sole purpose is to record said state, so let's try to collect checksums of the artefacts.

# hoc.run-bison.do v2
redo-ifchange "hoc.y"
bison --yacc -d "hoc.y"
md5sum y.tab.c y.tab.h > "$3"
redo-always
redo-stamp < "$3"

Now we create hoc.c as a copy-if-changed from y.tab.c

# hoc.c.do
redo-ifchange "y.tab.c"
cp -a "y.tab.c" "$3"
redo-always
redo-stamp < "$3"

Following along this path ran into a problem. While the generated file y.tab.c is used in the build as hoc.c, i.e. renamed, y.tab.h is used with this name. One can instruct bison to give a different name to this artefact, however, this changed name is also added to the generated y.tab.c file. However, asking for a subdirectory will do the trick. So the final version of hoc.run-bison.do looks like this:

# hoc.run-bison.do
# place output of bison in separate directory
dir_tmp="./tmp.bison"
if [ ! -d "${dir_tmp}" ]
then
        mkdir "${dir_tmp}"
fi

redo-ifchange "hoc.y"
bison --yacc --defines="${dir_tmp}"/y.tab.h --output="${dir_tmp}"/y.tab.c "hoc.y"
md5sum "${dir_tmp}"/y.tab.h "${dir_tmp}"/y.tab.c > "$3"
redo-stamp < "$3"

Calling this snippet produces the following output:

shell$ redo clean; rm -fr .redo
redo clean (0.003s)

shell$ redo -xx hoc.run-bison
+ redo-ifchange hoc.y
+ dir_tmp=./tmp.bison
+ '[' '!' -d ./tmp.bison ']'
+ mkdir ./tmp.bison
+ bison --yacc --defines=./tmp.bison/y.tab.h --output=./tmp.bison/y.tab.c hoc.y
hoc.y: warning: 1 shift/reduce conflict [-Wconflicts-sr]
hoc.y: note: rerun with option '-Wcounterexamples' to generate conflict counterexamples
+ md5sum ./tmp.bison/y.tab.h ./tmp.bison/y.tab.c
+ redo-stamp
redo hoc.run-bison (0.050s)

shell$ ls tmp.bison/
y.tab.c  y.tab.h

Now we are in a position to add two "copy-on-change" snippets. Note that I chose to record the dependency on hoc.run-bison, since this expressedly stores the state of those artefacts.

# hoc.c.do
dir_tmp="./tmp.bison"

redo-ifchange hoc.run-bison
cp -a "${dir_tmp}/y.tab.c" "$3"
redo-always
redo-stamp < "$3"

# y.tab.h.do
dir_tmp="./tmp.bison"

redo-ifchange hoc.run-bison
cp -a "${dir_tmp}/y.tab.h" "$3"
redo-always
redo-stamp < "$3"

Using these snippets creates copies of files as desired:

shell$ redo -xx hoc.c y.tab.h
+ dir_tmp=./tmp.bison
+ redo-ifchange hoc.run-bison
+ cp -a ./tmp.bison/y.tab.c .redo.hoc.c.879937758.3
+ redo-always
+ redo-stamp
redo hoc.c (0.010s)
+ dir_tmp=./tmp.bison
+ redo-ifchange hoc.run-bison
+ cp -a ./tmp.bison/y.tab.h .redo.y.tab.h.3079584557.3
+ redo-always
+ redo-stamp
redo y.tab.h (0.009s)

hoc.run-flex.do

Adding a call to lex is much simpler, and only for symmetry reasons I chose the same implementation strategy as above. It could be done simpler.

# hoc.run-flex.do
dir_tmp="./tmp.flex"
if [ ! -d "${dir_tmp}" ]
then
        mkdir "${dir_tmp}"
fi

redo-ifchange "lex.l"
flex  --outfile=${dir_tmp}/lex.yy.c lex.l
md5sum ${dir_tmp}/lex.yy.c > "$3"
#redo-always???
redo-stamp < "$3"

# lex.c.do
dir_tmp="./tmp.flex"

redo-ifchange hoc.run-flex
cp -a "${dir_tmp}/lex.yy.c" "$3"
redo-always
redo-stamp < "$3"

The expected things are happening:

shell$ redo clean; rm -fr .redo
redo clean (0.003s)
shell$ redo -xx lex.c
+ redo-ifchange hoc.run-flex
+ dir_tmp=./tmp.flex
+ '[' '!' -d ./tmp.flex ']'
+ mkdir ./tmp.flex
+ redo-ifchange lex.l
+ flex --outfile=./tmp.flex/lex.yy.c lex.l
+ md5sum ./tmp.flex/lex.yy.c
+ redo-stamp
redo . hoc.run-flex (0.013s)
+ dir_tmp=./tmp.flex
+ redo-ifchange ./tmp.flex/lex.yy.c
+ cp -a ./tmp.flex/lex.yy.c .redo.lex.c.1259052827.3
+ redo-always
+ redo-stamp
redo lex.c (0.036s)

El Resto

Except for one thing, which I will come to shortly, the remainder of this build is straight forward. We need the remaining snippets

all.do -- because we want to play nice

hoc.do -- to call the linker producing the final executable

default.o.do -- to produce .o files from .c sources

# all.do
# hoc is the real target in this directory
redo-ifchange hoc

# hoc.do
# OBJS holds the space separated list of .o files, which are needed to build this target
OBJS="hoc.o lex.o init.o symbol.o math.o"

redo-ifchange $OBJS
gcc -O3 $OBJS -o "$3" -lm -lfl

# default.o.do
redo-ifchange $2.c y.tab.h
gcc  -MMD -MF $2.d  -o $3 -c $2.c
read DEPS <$2.d
redo-ifchange ${DEPS#*:}

Please note that default.o.do records an explicit dependency on y.tab.h. This is unobvious. Wasn't it the case that gcc -MMD would record that *.c is or is not dependant on y.tab.h? Yes and no. If y.tab.h does exist at the time of the gcc call, everything proceeds nicely. If, however, y.tab.h does not yet exist, because it has not been generated, then the build fails at this point and will not proceed. So I decided in this case to add an explicit dependency to all .c files. This is wrong in the case of math.c. One could also add some more code to default.o.do in order to deal with this possible dependency to a generated header file. But I don't like to add specific knowledge to a thing called "default" something. Creating two groups of .c files using different default rules is certainly possible, but involves subdirectories as far as I can tell.

test.do: testing the executable

# test.do
(
    cat <<EOF
1+2*(3*4)
2/3
1-8
-3-4
bla = 355
fasel = 113
bla/z
bla/fasel
PI
(bla/fasel-PI)/PI
EOF
) | ./hoc 2>&1 | tee ./test.log

The test simply pipes a number of commands into hoc, the results are, as expected:

shell$ redo test
redo test (0.004s)

shell$ cat test
./hoc: undefined variable z near line 8
        25
        0.66666667
        -7
        -7
        3.1415929
        3.1415927
        8.4913679e-08

The results are as desired, including the error message referencing the non existing variable z. More complicated things are certainly possible but not needed at this point.

Conclusion

I did some testing to change various places in hoc.y to trigger different content for hoc.c and y.tab.h respectively. As far as I can tell, the redo build behaves as expected.

My solution needed 10 .do snippets, which accumulate 111 lines of code (incl. empty lines).

The less intelligent Makefile comes in at 30 lines of code.

So yes, handling multi-artifact calls can be done, whether or not this is a good fit for redo, can be debated. Whether or not my solution is truely minimal is an entirely different question.

Cheers,

~ew

Home

-- Response ended

-- Page fetched on Thu May 2 21:46:39 2024