How to override Shake configuration on the command-line

I maintain small per-project configuration files, read via usingConfigFile. I'd like to be able to override any of those settings on the command line. It seems using shakeArgsWith (rather than shakeArgs) is the first step on the way, but I don't see an obvious way to wire that through to the values produced by getConfig. Is there a standard approach for doing this?

There isn't a standard approach, but I know several larger build systems have invented something. A combination of shakeArgsWith, readConfigFile and usingConfig should do it. Something like (untested):
main = shakeArgsWith shakeOptions [] $ \_ args -> return $ Just $ do
    file <- liftIO $ readConfigFile "myfile.cfg"
    usingConfig $ Map.union (argsToSettings args) file
    myNormalRules
Where argsToSettings is some function that parses your arguments and turns them into settings - e.g. breaking on the first = symbol or similar.
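For example, a minimal sketch of argsToSettings under that assumption (the name comes from the answer above, not from Shake; Map is whatever qualified import the snippet above already uses):

-- Sketch: turn ["foo=1", "bar=2"] into settings by breaking each
-- argument on the first '=' sign; arguments without '=' get "".
argsToSettings args = Map.fromList [ (key, drop 1 rest)
                                   | arg <- args
                                   , let (key, rest) = break (== '=') arg ]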

Related

How to find all C functions starting with a prefix in a library

For a small test framework, I want to do automatic test discovery. Right now, my plan is that all tests just have a prefix, which could basically be implemented like this:
#define TEST(name) void TEST_##name(void)
And be used like this (in different C files):
TEST(one_eq_one) { assert(1 == 1); }
The ugly part is that you would need to list all test names again in the main function.
Instead of doing that, I want to collect all tests in a library (say lib-my-unit-tests.so) and generate the main function automatically, then just link the generated main function against the library. All of this internal action can be hidden nicely with CMake.
So, I need a script that does:
1. Write "int main(void) {"
2. For all functions $f starting with 'TEST_' in lib-my-unit-tests.so do
   a) Write "extern void $f(void);"
   b) Write "$f();"
3. Write "}"
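So for two hypothetical tests named one_eq_one and two_gt_one, the generated file would look like:

int main(void) {
    extern void TEST_one_eq_one(void);
    TEST_one_eq_one();
    extern void TEST_two_gt_one(void);
    TEST_two_gt_one();
}

(Block-scope extern declarations are legal C, so the script can interleave declarations and calls.)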
Most parts of that script are easy, but I am unsure how to reliably get a list of all functions starting with the prefix.
On POSIX systems, I can try to parse the output of nm. But here, I am not sure if the names will always be the same (on my MacBook, all names start with an additional '_'). To me, it looks like it might be OS/architecture-dependent which names are generated for the binary. For Windows, I do not yet have an idea of how to do that.
So, my questions are:
Is there a better way to implement test discovery in C? (maybe something like dlsym)
How do I reliably get a list of all function names starting with a certain prefix on macOS/Linux/Windows?
A partial solution for the problem is parsing nm with a regex:
for line in $(nm "$1") ; do
    # Finds all functions starting with "TEST_" or "_TEST_"
    if [[ $line =~ ^_?(TEST_.*)$ ]] ; then
        echo "${BASH_REMATCH[1]}"
    fi
done
And then a second script consumes this output to generate a C file that calls these functions. Then, CMake calls the second script to create the test executable:
add_executable(test-executable generated_source.c)
target_link_libraries(test-executable PRIVATE library_with_test_functions)
add_custom_command(
    OUTPUT generated_source.c
    COMMAND second_script.sh library_with_test_functions.so > generated_source.c
    DEPENDS second_script.sh library_with_test_functions)
I think this works on POSIX systems, but I don't know how to solve it for Windows.
You can write a shell script using the nm or objdump utilities to list the symbols, pipe it through awk to select the appropriate names, and output the desired source lines.
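For instance, a sketch of such a generator (assuming a POSIX nm whose output is "address type name", and stripping the leading underscore that macOS prepends):

#!/bin/sh
# Sketch: emit a C file with a main() that calls every TEST_ function
# exported by the library given as $1.
syms=$(nm "$1" | awk '$2 == "T" && $3 ~ /^_?TEST_/ { sub(/^_/, "", $3); print $3 }')
for f in $syms; do echo "extern void $f(void);"; done
echo "int main(void) {"
for f in $syms; do echo "    $f();"; done
echo "}"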

Handling multiple build configurations in parallel

How can I build one set of source files using two different configurations without having to rebuild everything?
My current setup adds an option --config=rel which will load all options from build_rel.cfg and compile everything to the directory build_rel/.
data Flags = FlagCfg String
    deriving (Show, Eq)

flags = [Option ['c'] ["config"]
            (ReqArg (\x -> Right $ FlagCfg x) "CFG")
            "Specify which configuration to use for the build"]
main :: IO ()
main = shakeArgsWith shakeOptions{shakeChange = ChangeModtimeAndDigest} flags $
    \flags targets -> return $ Just $ do
        let buildDir = "build" ++
                foldr (\a def -> case (a, def) of
                           (FlagCfg cfg, "") -> '_' : cfg
                           _ -> def)
                      "" flags
        -- Settings are read from a config file.
        usingConfigFile $ buildDir ++ ".cfg"
        ...
If I then run
build --config=rel
build --config=dev
I will end up with two builds
build_rel/
build_dev/
However, every time I switch configuration I end up rebuilding everything. I would guess this is because all my oracles have "changed". I would like all oracles to be specific to my two different build directories so that changes will not interfere between builds using different configurations.
I know there is a -m option to specify where the database should be stored, but I would rather not have to pass two options that must be kept in sync all the time.
build --config=rel -m build_rel
Is there a way to update the option shakeFiles after the --config option is parsed?
Another idea was to parameterize all my Oracles to include my build configuration but then I noticed that usingConfigFile uses an Oracle and I would have to reimplement that as well. Seems clunky.
Is there some other way to build multiple targets without having to rebuild everything? It seems like such a trivial thing to do but still, I can't figure it out.
There are a few solutions:
Separate databases
If you want the two directories to be entirely unrelated, with nothing shared between them, then changing the database as well makes most sense. There's currently no "good" way to do that (either pass two flags, or pre-parse some of the command line). However, it should be easy enough to add:
shakeArgsOptionsWith
    :: ShakeOptions
    -> [OptDescr (Either String a)]
    -> (ShakeOptions -> [a] -> [String] -> IO (Maybe (ShakeOptions, Rules ())))
    -> IO ()
Which would then let you control both settings from a single flag.
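With that, a sketch (untested) of how the single --config flag could drive both the config file and the database location, reusing the question's Flags type; myNormalRules stands in for your actual rules:

main :: IO ()
main = shakeArgsOptionsWith shakeOptions flags $ \opts flags _targets -> do
    -- Pick build_<cfg> if --config=<cfg> was given, plain "build" otherwise.
    let buildDir = head $ ["build_" ++ cfg | FlagCfg cfg <- flags] ++ ["build"]
    return $ Just (opts{shakeFiles = buildDir}, do
        usingConfigFile $ buildDir ++ ".cfg"
        myNormalRules)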
Single database
If you want a single database, you could load all the config files and specify config keys like release.destination = ... and debug.destination = .... Then a rule for */output.txt would look up the config based on the prefix of the target, e.g. release/output.txt would look up release.destination. The advantage here is that anything that does not change between debug and release (e.g. documentation) could potentially be shared.
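As a sketch of that lookup (untested; assumes the config files define keys like release.destination):

-- One rule serves both flavours; the directory prefix of the target
-- selects which config keys to consult.
"*/output.txt" *> \out -> do
    let mode = takeDirectory1 out    -- "release" or "debug"
    -- Pattern-match failure aborts the build if the key is missing.
    Just dest <- getConfig $ mode ++ ".destination"
    writeFile' out dest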

Using pg_config with waf

I use waf as my build system and I want to compile a small C program using Postgres. I have included postgres.h in my program so I need to find the path to it in my wscript file. I know that I can get the path I need by running:
pg_config --includedir-server
which gives me:
/usr/include/postgresql/9.3/server
So I thought I could use something like this:
cfg.check_cfg(
    path='pg_config',
    package='',
    uselib_store='PG',
    args='--includedir-server',
)
And then build my program by:
bld.program(
    source=['testpg.c'],
    target='testpg',
    includes=['.', '../src'],
    use=['PQ', 'PG'],
)
But this fails with postgres.h: No such file or directory. I ran ./waf -v and confirmed that the proper -I flag is not being passed to gcc. My guess is this happens because pg_config does not add a -I prefix to the path it returns. Is there a way I can make waf add the prefix, or make pg_config emit it?
If pg_config produced the standard output of pkg-config-like programs (i.e. something like -Ixxx -Iyyy), your code would work, as check_cfg parses that kind of output.
Since there is no complicated parsing to do here, you can simply go with:
import subprocess
# check_output returns bytes with a trailing newline; decode and strip it.
includes = subprocess.check_output(["pg_config", "--includedir-server"])
conf.env.INCLUDES_PG = [includes.decode().strip()]
And then use it:
bld.program(
    source=['testpg.c'],
    target='testpg',
    use=['PG'],
)
See the library integration section in the waf book. It explains the naming rule that makes this work: use=['PG'] picks up configuration variables with the _PG suffix, such as INCLUDES_PG.
You can write a small plugin to ease the use :)
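For instance, a hypothetical tool module (the pg.py name and the waftools directory are made up for this sketch):

# waftools/pg.py
def configure(conf):
    # Locate pg_config, then ask it for the server include directory.
    conf.find_program('pg_config', var='PG_CONFIG')
    out = conf.cmd_and_log(conf.env.PG_CONFIG + ['--includedir-server'])
    conf.env.INCLUDES_PG = [out.strip()]

Loading it with conf.load('pg', tooldir='waftools') from your wscript's configure step keeps the pg_config plumbing in one place.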

How should I interpolate environment variables in Shake file patterns?

In my Makefiles, I prefer having the output directory defined by an environment variable rather than hard-coded (with some reasonable default value if it's unset). For example, a Make rule would look like:
$(OUTPUT_DIR)/some_file: deps
	# build commands
I have yet to figure out how to achieve a similar goal in Shake. I like using getEnvWithDefault to grab the value of the environment variable or a reasonable default, but no amount of bashing it with binds or lambdas has allowed me to combine it with (*>).
How might it be possible to interpolate an environment variable in a FilePattern for use with (*>)?
The function getEnvWithDefault runs in the Action monad, and the name of the rule has to be supplied in a context where you cannot access the Action monad, so you can't translate this pattern the way you tried. There are a few alternatives:
Option 1: Use lookupEnv before calling shake
To exactly match the behaviour of Make you can write:
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

main = do
    outputDir <- fromMaybe "output" <$> lookupEnv "OUTPUT_DIR"
    shakeArgs shakeOptions $ do
        (outputDir </> "some_file") *> \out -> do
            need deps
            -- build commands
Here we use the lookupEnv function (from System.Environment) to grab the environment variable before we start running Shake. We can then define a rule whose file name precisely matches the environment variable.
Option 2: Don't force the output in the rule
Alternatively, we can define a rule that builds some_file regardless of what directory it is in, and then use the tracked getEnvWithDefault when we say which file we want to build:
main = shakeArgs shakeOptions $ do
    "//some_file" *> \out -> do
        need deps
        -- build commands
    action $ do
        out <- getEnvWithDefault "output" "OUTPUT_DIR"
        need [out </> "some_file"]
Here the rule pattern can build anything, and the caller picks what the output should be. I prefer this variant, but there is a small risk that if the some_file pattern overlaps with another rule you might get name clashes. Introducing a unique name, so all outputs are named something like $OUTPUT_DIR/my_outputs/some_file, eliminates that risk, but is usually unnecessary.

Retrieving Global Variable Values from Command Line

In one particular project, we're trying to embed version information into shared object files. We'd like to be able to use some standard linux tool to parse the shared object to determine the version for automated testing.
Currently I have "const int plugin_version = 14;". I can use 'nm' or 'objdump' to verify that it's there:
00000000000dcfbc r plugin_version
I can't, however, seem to be able to get the value of that variable easily from the command line. I figured there'd be a POSIX tool for showing the initialized values of globals. I have contemplated using a format for the variable name as the information itself, i.e., plugin_version_14, but that seems like a huge hack. Embedding the information in the filename unfortunately is NOT an option. Any other suggestions welcome.
You could embed it as a string:
"MAGIC MARKER STRING VERSION: 4.56 END OF MAGIC"
Then just look for "MAGIC MARKER STRING" in the file and extract the version information that comes after it.
If you make this a standard, you could easily build a command-line tool to find these embedded strings in all your software.
If you also require it to be an int, a little macro magic can construct both the int and the magic string to make sure they are never out of sync.
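A sketch of that macro magic (the marker format is the one invented above):

/* Expand one token into both an int and a greppable marker string,
   so the two can never drift out of sync. */
#define EMBED_VERSION(n) \
    const int plugin_version = n; \
    const char plugin_version_marker[] = \
        "MAGIC MARKER STRING VERSION: " #n " END OF MAGIC";

EMBED_VERSION(14)

After building, strings libplugin.so | grep 'MAGIC MARKER STRING' recovers the version.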
There's a couple of options I think.
My first instinct is to make sure the version information lives in its own section in the ELF file. You can then dump it with objdump -s -j <section-name> /bin/whatever.
This rather relies on objdump being available, of course.
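For example (GCC/Clang-specific; the section name .plugin_version is arbitrary):

/* Put the version in a dedicated ELF section. */
const int plugin_version __attribute__((section(".plugin_version"))) = 14;

Then objdump -s -j .plugin_version libplugin.so prints just that section.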
Alternatively you can do what Keith suggested, and just use 'strings', along with a magical marker string. This feels a little hackish, but should work quite well.
Finally, why don't you just add a --version command line option? You can then store the version information however you like, and trivially retrieve it using the one tool which is certain to be installed on any system which has your software.
A terrible hack that I've used in the past is to embed the version information in a variable name, so nm will show:
00000000000dcfbc r plugin_version_14
Why not write your own tool to get that version in C/C++? You could use dlopen, then dlsym to get the symbol and print its value to standard output. This way you also verify that the symbol is actually there. It looks like 20-30 lines of code to me and about 20 minutes of your life :)
I know that the question is about command line, but writing such a tool yourself should be easy (especially if such a command line tool does not exist).
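A sketch of such a tool (error handling kept minimal; link with -ldl on Linux):

#include <dlfcn.h>
#include <stdio.h>

/* Usage: readversion <library.so> <symbol-name> */
int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <library.so> <symbol>\n", argv[0]);
        return 1;
    }
    void *lib = dlopen(argv[1], RTLD_NOW);
    if (!lib) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    const int *value = dlsym(lib, argv[2]);
    if (!value) {
        fprintf(stderr, "%s\n", dlerror());
        return 1;
    }
    printf("%d\n", *value);
    dlclose(lib);
    return 0;
}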
If the binary is not stripped, you could use gdb to print the variable. (I just tried to script gdb, but it seems to refuse to work if stdin is not a tty; maybe expect will do the job?)
If you can accept using Python, this might help:
import struct
import subprocess
import sys

if __name__ == '__main__':
    so = sys.argv[1]
    sym = sys.argv[2]
    # nm prints "<address> <type> <name>"; grab the address of our symbol.
    addr = subprocess.check_output('nm %s | grep %s' % (so, sym), shell=True)
    addr = int(addr.split()[0], 16)
    with open(so, 'rb') as so_file:
        so_file.seek(addr)
        data = so_file.read(4)
    print(struct.unpack('@i', data)[0])
Disclaimer: This script doesn't do any error checking (if you like it I'm sure you can come up with some ;)). It also assumes you're reading a 4-byte native int value.
$ cat global.c
const int plugin_version = 14;
$ python readsym.py global.so plugin_version
14
