Depending on a generated file - shake-build-system

What's the right way for a rule to need a generated file? Here's what I tried:
import Development.Shake
import System.IO
import Control.Monad

main = do
    s <- withBinaryFile "/dev/urandom" ReadMode $ replicateM 10 . hGetChar
    shakeArgs shakeOptions $ do
        want ["a.out"]
        "generated" *> \target -> writeFileChanged target s
        "*.out" *> \out -> do
            need ["generated"]
            writeFile' out =<< readFile' "generated"
But this results in the rule for generated not getting re-run, so a.out stays the same after repeated runs.

To solve your problem you need to add alwaysRerun to the definition of generated, so that the generated rule always runs. You are correctly depending on generated with the need (and also with the readFile', which does a need behind the scenes); it's just that generated doesn't have any input dependencies, so it never gets rerun. Adding alwaysRerun gives generated a dependency that always changes. I would expect to see:
"generated" *> \target -> do
    alwaysRerun
    writeFileChanged target s
(You can also move the definition of s down to under generated, but I have a suspicion that's just an artefact of how you've simplified your test case.)

Related

Can a shake rule determine which "needs" have changed since the last build?

I am building a shake based build system for a large Ruby (+ other things) code base, but I am struggling to deal with Ruby commands that expect to be passed a list of files to "build".
Take Rubocop (a linting tool). I can see three options:
1. need all Ruby files individually; if they change, run rubocop against each individual file that changed (very slow on first build or if many ruby files change because rubocop has a large start up time)
2. need all Ruby files; if any change, run rubocop against all the ruby files (very slow if only one or two files have changed because rubocop is slow to work out if a file has changed or not)
3. need all Ruby files; if any change, pass rubocop the list of changed dependencies as detected by Shake
The first two rules are trivial to build in shake, but my problem is I cannot work out how to represent this last case as a shake rule. Can anyone help?
There are two approaches to take with Shake, using batch or needHasChanged. For your situation I'm assuming rubocop just errors out if there are lint violations, so a standard one-at-a-time rule would be:
"*.rb-lint" %> \out -> do
    need [out -<.> "rb"]
    cmd_ "rubocop" (out -<.> "rb")
    writeFile' out ""
Use batch
The function batch describes itself as:
Useful when a command has a high startup cost - e.g. apt-get install foo bar baz is a lot cheaper than three separate calls to apt-get install.
And the code would be roughly:
batch 3 ("*.rb-lint-errors" %>)
    (\out -> do need [out -<.> "rb"]; return out) $
    (\outs -> do cmd_ "rubocop" [out -<.> "rb" | out <- outs]
                 mapM_ (flip writeFile' "") outs)
Use needHasChanged
The function needHasChanged describes itself as:
Like need but returns a list of rebuilt dependencies since the calling rule last built successfully.
So you would write:
"stamp.lint" %> \out -> do
    changed <- needHasChanged listOfAllRubyFiles
    cmd_ "rubocop" changed
    writeFile' out ""
Comparison
The advantage of batch is that it is able to run multiple batches in parallel, and you can set a cap on how much to batch. In contrast needHasChanged is simpler but is very operational. For many problems, both are reasonable solutions. Both these functions are relatively recent additions to Shake, so make sure you are using 0.17.2 or later, to ensure it has all the necessary bug fixes.

How to import a Haskell module that uses FFI without referring to the C object?

I'm trying to write a haskell module that wraps a bunch of c functions.
I want to be able to import this module like any other haskell module without referring to the c object files.
I can't find any examples about how to do this.
This is what I've tried. I have a c file "dumbCfunctions.c":
double addThree(double x) {
    return x+3;
}
and a haskell file with a module defined in it "Callfunctions.hs"
module Callfunctions (
    addThree
) where

import Foreign.C

foreign import ccall "addThree" addThree :: Double -> Double

main = print $ addThree 4
I can make an executable doing:
ghc --make -o cf_ex Callfunctions.hs dumbCfunctions.o
Which correctly gives me 7.
I can also import it into GHCi by calling ghci with
shane> ghci dumbCfunctions.o
Prelude> :l Callfunctions.hs
[1 of 1] Compiling Callfunctions ( Callfunctions.hs, interpreted )
Ok, modules loaded: Callfunctions.
*Callfunctions> addThree 3
6.0
But I want to be able to treat it like any other module without referring to "dumbCfunctions.o":
shane> ghci
Prelude> :l Callfunctions.hs
[1 of 1] Compiling Callfunctions ( Callfunctions.hs, interpreted )
Ok, modules loaded: Callfunctions.
*Callfunctions> addThree 3
But now I get the error
ByteCodeLink: can't find label
During interactive linking, GHCi couldn't find the following symbol:
addThree
This may be due to you not asking GHCi to load extra object files,
archives or DLLs needed by your current session. Restart GHCi, specifying
the missing library using the -L/path/to/object/dir and -lmissinglibname
flags, or simply by naming the relevant files on the GHCi command line.
Alternatively, this link failure might indicate a bug in GHCi.
If you suspect the latter, please send a bug report to:
glasgow-haskell-bugs@haskell.org
This makes sense because I haven't referred to the object anywhere. So I must be able to do something better by first compiling the module, but I couldn't find out how to do this. I must be looking in the wrong places.
You can create a library through Cabal, and cabal install it.
This would link the C code inside your Haskell library. Later on, when you load the module, you will not need to manually load the C parts.
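As a rough sketch of what that could look like, a minimal package description might resemble the following. The package name callfunctions, the version number, and the layout (both source files sitting next to the .cabal file) are assumptions for illustration; c-sources is the Cabal field that asks for C files to be compiled and linked into the library.

```cabal
cabal-version:      2.4
name:               callfunctions
version:            0.1.0.0
build-type:         Simple

library
    -- The Haskell module from the question
    exposed-modules:  Callfunctions
    -- C files compiled and linked into the library
    c-sources:        dumbCfunctions.c
    build-depends:    base
    default-language: Haskell2010
```

With a file along these lines, building the package (or starting GHCi through cabal repl) should make import Callfunctions work without naming dumbCfunctions.o by hand.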

Handling multiple build configurations in parallel

How can I build one set of source files using two different configurations without having to rebuild everything?
My current setup adds an option --config=rel which will load all options from build_rel.cfg and compile everything to the directory build_rel/.
data Flags = FlagCfg String
    deriving (Show, Eq)

flags = [Option ['c'] ["config"]
            (ReqArg (\x -> Right $ FlagCfg x) "CFG")
            "Specify which configuration to use for the build"]

main :: IO ()
main = shakeArgsWith shakeOptions { shakeChange=ChangeModtimeAndDigest }
       flags $
       \flags targets -> return $ Just $ do
           let buildDir = "build" ++
                          foldr (\a def -> case (a, def) of
                                    (FlagCfg cfg, "") -> '_':cfg
                                    _ -> def)
                                "" flags
           -- Settings are read from a config file.
           usingConfigFile $ buildDir ++ ".cfg"
           ...
If I then run
build --config=rel
build --config=dev
I will end up with two builds
build_rel/
build_dev/
However, every time I switch configuration I end up rebuilding everything. I would guess this is because all my oracles have "changed". I would like all oracles to be specific to my two different build directories so that changes will not interfere between builds using different configurations.
I know there is a -m option to specify where the database should be stored but I would rather not have to specify two options that have to sync all the time.
build --config=rel -m build_rel
Is there a way to update the option shakeFiles after the --config option is parsed?
Another idea was to parameterize all my Oracles to include my build configuration but then I noticed that usingConfigFile uses an Oracle and I would have to reimplement that as well. Seems clunky.
Is there some other way to build multiple targets without having to rebuild everything? It seems like such a trivial thing to do but still, I can't figure it out.
There are a few solutions:
Separate databases
If you want the two directories to be entirely unrelated, with nothing shared between them, then changing the database as well makes most sense. There's currently no "good" way to do that (either pass two flags, or pre-parse some of the command line). However, it should be easy enough to add:
shakeArgsOptionsWith
    :: ShakeOptions
    -> [OptDescr (Either String a)]
    -> (ShakeOptions -> [a] -> [String] -> IO (Maybe (ShakeOptions, Rules ())))
    -> IO ()
Which would then let you control both settings from a single flag.
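To illustrate the "single flag" idea, here is a small sketch of deriving one directory name from the parsed flags, which could then be used for both the database location and the config file. The Flag type and the buildDirFor helper are hypothetical names introduced for this example; the wiring shown in the comment assumes a callback of the shape given above and is not tested against a particular Shake version.

```haskell
-- Hypothetical flag type, mirroring the FlagCfg constructor from the question.
newtype Flag = FlagCfg String

-- | Pick the build directory from the parsed flags. As with the foldr in the
--   question, the last --config on the command line wins; with no --config
--   the directory is plain "build".
buildDirFor :: [Flag] -> FilePath
buildDirFor flags =
    case reverse [cfg | FlagCfg cfg <- flags] of
        cfg : _ -> "build_" ++ cfg
        []      -> "build"

-- In a callback of the shape shown above, the same value could feed both
-- settings (an assumption about how the pieces would be wired together):
--   let dir = buildDirFor flags
--   return $ Just (opts { shakeFiles = dir }, usingConfigFile (dir ++ ".cfg") >> rules)

main :: IO ()
main = do
    putStrLn (buildDirFor [FlagCfg "rel"])
    putStrLn (buildDirFor [])
```

Keeping the directory computation in one pure function means the database path and the config file name cannot drift apart.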
Single database
If you want a single database, you could load all the config files and specify config like release.destination = ... and debug.destination = ...; then the rule for */output.txt would look up the config based on the prefix of the target, e.g. release/output.txt would look up release.destination. The advantage here is that anything that does not change between debug and release (e.g. documentation) could potentially be shared.
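The prefix lookup can be sketched as a small pure function that turns an output path into the config key to query. The name configKeyFor is hypothetical, and the sketch assumes forward-slash separators in the target path.

```haskell
-- | Map an output path such as "release/output.txt" to the config key
--   "release.destination", using the first directory as the prefix.
--   Returns Nothing when the path has no leading directory.
configKeyFor :: String -> FilePath -> Maybe String
configKeyFor setting path =
    case break (== '/') path of
        (prefix, _ : _) | not (null prefix) -> Just (prefix ++ "." ++ setting)
        _                                   -> Nothing

main :: IO ()
main = do
    print (configKeyFor "destination" "release/output.txt")
    print (configKeyFor "destination" "debug/output.txt")
    print (configKeyFor "destination" "output.txt")
```

Inside a rule for */output.txt, the resulting key would then be passed to getConfig from Development.Shake.Config.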

How to override Shake configuration on the command-line

I maintain small configuration files per project read via usingConfigFile. I'd like to be able to override any of those settings on the command line. It seems using shakeArgsWith (rather than shakeArgs) is the first step on the way but I don't see an obvious way to wire that through to the values produced by getConfig. Is there a standard approach for doing this?
There isn't a standard approach, but I know several larger build systems have invented something. A combination of shakeArgsWith, readConfigFile and usingConfig should do it. Something like (untested):
main = shakeArgsWith shakeOptions [] $ \_ args -> return $ Just $ do
    file <- readConfigFile "myfile.cfg"
    usingConfig $ Map.union (argsToSettings args) file
    myNormalRules
Where argsToSettings is some function that parses your arguments and turns them into settings - e.g. breaking on the first = symbol or similar.
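One possible shape for argsToSettings, as a minimal sketch: it splits each argument on the first = and silently drops anything that is not of the form key=value, which is an assumption about the desired behaviour rather than something the answer specifies. It returns a plain association list; since usingConfig expects a HashMap, the result would need converting (e.g. with HashMap.fromList) before the union.

```haskell
-- | Split each "key=value" argument on the first '=' sign.
--   Arguments without an '=' (or with an empty key) are ignored here.
argsToSettings :: [String] -> [(String, String)]
argsToSettings args =
    [ (key, value)
    | arg <- args
    , (key, '=' : value) <- [break (== '=') arg]
    , not (null key)
    ]

main :: IO ()
main = mapM_ print (argsToSettings ["cc=clang", "flags=-O2 -DX=1", "all"])
```

Because the union in the snippet above is left-biased, settings parsed from the command line take precedence over those read from the file.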

How should I interpolate environment variables in Shake file patterns?

In my Makefiles, I prefer having the output directory defined by an environment variable rather than hard-coded (with some reasonable default value if it's unset). For example, a Make rule would look like
$(OUTPUT_DIR)/some_file: deps
	#build commands
I have yet to figure out how to achieve a similar goal in Shake. I like using getEnvWithDefault to grab the value of the environment variable or a reasonable default, but no amount of bashing it with binds or lambdas has allowed me to combine it with (*>).
How might it be possible to interpolate an environment variable in a FilePattern for use with (*>)?
The function getEnvWithDefault runs in the Action monad, and the name of the rule has to be supplied in a context where you cannot access the Action monad, so you can't translate this pattern the way you tried. There are a few alternatives:
Option 1: Use lookupEnv before calling shake
To exactly match the behaviour of Make you can write:
main = do
    outputDir <- fromMaybe "output" <$> lookupEnv "OUTPUT_DIR"
    shakeArgs shakeOptions $ do
        (outputDir </> "some_file") *> \out -> do
            need deps
            -- build commands
Here we use the lookupEnv function (from System.Environment) to grab the environment variable before we start running Shake. We can then define a file that precisely matches the environment variable.
Option 2: Don't force the output in the rule
Alternatively, we can define a rule that builds some_file regardless of what directory it is in, and then use the tracked getEnvWithDefault when we say which file we want to build:
main = shakeArgs shakeOptions $ do
    "//some_file" *> \out -> do
        need deps
        -- build commands
    action $ do
        out <- getEnvWithDefault "output" "OUTPUT_DIR"
        need [out </> "some_file"]
Here the rule pattern can build anything, and the caller picks what the output should be. I prefer this variant, but there is a small risk that if the some_file pattern overlaps in some way you might get name clashes. Introducing a unique name, so all outputs are named something like $OUTPUT_DIR/my_outputs/some_file eliminates that risk, but is usually unnecessary.
