Shake build: how to deal with the case where the needed files are discovered later - shake-build-system

I use Shake to build a website (with pandoc). When files are converted with pandoc, other files (CSS, bibliography, templates, etc.) may be needed, but Shake does not know about them, because the information is internal to the function that calls pandoc and only becomes visible gradually as the source files are processed.
From reading the docs, I have the impression that the function called by Shake could return a list of the files it used, and that this list could then be passed to need after the function returns. Is it correct that the order in which need is called matters?
Alternatively,
(1) I can build functions that only find which other files are needed (doing nearly the same work twice) and call them first. Or,
(2) I can break the process into steps, each producing a file, and then start a new rule that continues from this file (and the additional files) and adds the needs there. The second solution builds intermediate files and breaks the logical flow of the transformation from pandoc to HTML.
Which is better?

The answer depends on the details of the files that are depended upon (e.g. the css, bibliography):
If they are source files, you can add the dependency after they are used, using needed.
If they are generated by Shake itself, you can't use needed without first ensuring they are present. If you can statically predict a superset of the files that might be consulted, you can use orderOnly to ensure all the files the rule might depend on have been built, then use needed afterwards to declare which were actually required.
Finally, if you can't predict which files are dependencies, and you might be generating them, then the easiest is to run part of the computation twice.

Related

Convert a shared lib to a static one (again)

I have done a bit of research and I saw several answers here on SO, including this and this.
I do understand the process of creating a Shared Object actually merges all the various sources and resolves all internal links; it is thus impossible to separate the various pieces that have already been merged together.
What prevents making a static lib composed of just one module?
I do understand I would lose the ability to link "just what is needed" and that pulling in even a single function would result in inclusion of the whole thing, but I can live with that.
The reason I ask is that I need to produce a wrapper for a shared lib and, if done normally, this would mean I need to distribute two shared libs: the original one and my wrapper. This has several nasty effects in my environment, because I dynamically load my wrapper in the final program and can thus place it wherever I want, while the "original" shared lib is searched for independently, and I have a lot of trouble getting the dynamic loader to find it in the general case.
If I manage to convert the original shared lib into a static lib, then linking my wrapper against it would pull it in and I would have an "all inclusive" wrapper.
I did not find any way to obtain this result without recompiling the original lib (which might not be an option in the near future).
Either modifying the "original lib" to become static (possibly as a single object) or somehow instructing the linker for my wrapper to "pull the shared object in" would solve my problem.

Storing folder paths

Where can I store folder paths so that they can be accessed from every function in a C program?
For example, I have an executable called do_input.exe at the path c:\tests\myprog\bin\do_input.exe,
another one at C:\tools\degreesToDms.exe, etc. How and where should I store these paths?
I stored them as strings in a header file which I included in every file of the project, but someone discouraged me from doing this. Are they right?
Yes, they are absolutely right: "baking in" installation-specific strings such as file-system paths into compiled code is not a good decision, because you must recompile simply to change the locations of some key files. This limits the flexibility of other members of your team to run your tests, and may prevent your tests from being run automatically in an automated testing environment.
A better solution would use a plain-text configuration file with the locations of the key directories, and functions that read that file and produce the correct locations at run time.
Alternatively, you could provide the locations of the key directories as command-line parameters to your program. This way, users who run your program can set the correct locations without recompiling.
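For instance, here is a minimal sketch of the configuration-file approach; the file name paths.cfg, the key names, and the key=value format are just assumptions for illustration:

    /* Minimal sketch: look up a directory path in a plain-text config file
       of key=value lines instead of hard-coding it at compile time. */
    #include <stdio.h>
    #include <string.h>

    /* Copy the value for `key` from `cfgfile` into `out`; return 0 on success. */
    int lookup_path(const char *cfgfile, const char *key, char *out, size_t outsize)
    {
        char line[512];
        FILE *fp = fopen(cfgfile, "r");
        if (!fp)
            return -1;
        while (fgets(line, sizeof line, fp)) {
            char *eq = strchr(line, '=');
            if (!eq)
                continue;
            *eq = '\0';
            if (strcmp(line, key) == 0) {
                strncpy(out, eq + 1, outsize - 1);
                out[outsize - 1] = '\0';
                out[strcspn(out, "\r\n")] = '\0';   /* strip the trailing newline */
                fclose(fp);
                return 0;
            }
        }
        fclose(fp);
        return -1;
    }

With a paths.cfg containing a line such as do_input_dir=c:\tests\myprog\bin, a call like lookup_path("paths.cfg", "do_input_dir", buf, sizeof buf) yields the directory at run time, so moving the executables only requires editing the text file.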
If they stay the same, then I don't see any problem defining these paths in a ".h" header file included in all the various .c files that reference the paths. But every computer this thing will be running on may have different paths ("Tests" instead of "test"), so this is super risky programming and probably only safe if you're running it on a single machine or a set of machines that you control directly.
If the paths will change, then you need to create a storage place for these paths (e.g. static character array, etc.) and then have methods to allow these to be fetched and possibly reset dynamically (e.g. instead of writing output files to "results", maybe the user wants to change things to write files to "/tmp"). Totally depends on what you are doing in your code and what the tools you're writing will be doing.
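A sketch of that run-time approach (all names here are made up): keep each path in static storage behind small get/set functions, so the rest of the code never hard-codes it and it can be reset dynamically, e.g. from a command-line option.

    /* paths.c - one place in the program owns the results directory;
       other code only calls get_results_dir()/set_results_dir(). */
    #include <string.h>

    static char results_dir[260] = "results";   /* default, can be changed at run time */

    const char *get_results_dir(void)
    {
        return results_dir;
    }

    void set_results_dir(const char *path)      /* e.g. set_results_dir("/tmp") */
    {
        strncpy(results_dir, path, sizeof results_dir - 1);
        results_dir[sizeof results_dir - 1] = '\0';
    }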

Extract just the required functions from a C code project?

How can I extract just the required functions from a pile of C source files? Is there a tool which can be used on GNU/Linux?
Preferably FOSS, but GNU/Linux is a hard requirement.
Basically I've got about 10 .h files; I'd like to grab part of the code and get the required variables from the header files. Then I can make a single small .h file corresponding to the code I'm using in another project.
My terms might not be 100% correct.
One tool that you may or may not be aware of is cscope. It can be used to help you.
For a given set of files (more on what that means shortly), it gives you these options:
Find this C symbol:
Find this global definition:
Find functions called by this function:
Find functions calling this function:
Find this text string:
Change this text string:
Find this egrep pattern:
Find this file:
Find files #including this file:
Thus, if you know you want to use a function humungous_frogmondifier(), you can find where it is declared or defined by typing its name (or pasting its name) after 'Find this global definition'. If you then want to know what functions it calls, you use the next line. Once you've hit return after specifying the name, you will be given a list of the relevant lines in the source files above this menu on the screen. You can page through the list (if there are more entries than will fit on the screen), and at any time select one of the shown entries by number or letter, in which case cscope launches your editor on the file.
How about that list of files? If you run cscope in a directory without any setup, it will scan the source files in the directory and build its cross-reference. However, if you prefer, you can set up a list of file names in cscope.files and it will analyze those files instead. You can also include -I /path/to/directory on the cscope command line and it will find referenced headers in those directories too.
I'm using cscope 15.7a on some sizeable projects - depending on which version of the project, between about 21,000 and 25,000 files (and some smaller ones with only 10-15 thousand files). It takes about half an hour to set up this project (so I carefully rebuild the indexes once per night, and use the files for the day, accepting that they are a little less accurate at the end of the day). It allows me to track down unused stuff, and find out where stuff is used, and so on.
If you're used to an IDE, it will be primitive. If you're used to curses-mode programs (vim, etc), then it is tolerably friendly.
You suggest (in comments to the main question) that you will be doing this more than once, possibly on different (non-library) code bases. I'm not sure I see the big value in this; I've been coding C on and off for 30+ years and don't feel the need to do this very often.
But given the assumption you will, what you really want is a tool that can, for a given identifier in a system of C files and headers, find the definition of that identifier in those files, and compute the transitive closure of all the dependencies which it has. This defines a partial order over the definitions based on the depends-on relationship. Finally you want to emit the code for those definitions to an output file, in a linear order that honors the partial order determined. (You can simplify this a bit by insisting that the identifier you want is in a particular C compilation unit, but the rest of it stays the same).
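To make that concrete, here is a toy sketch of the transitive-closure-and-ordering step; the "definitions" and their depends-on edges are hard-coded and purely hypothetical, whereas a real tool would derive them from symbol-table information:

    /* Toy sketch: emit a wanted definition and everything it transitively
       depends on, each definition appearing after its dependencies. */
    #include <stdio.h>

    #define NDEFS 4

    static const char *name[NDEFS] = { "helper_a", "helper_b", "wanted_fn", "unused_fn" };

    /* deps[i][j] != 0 means definition i depends on definition j */
    static const int deps[NDEFS][NDEFS] = {
        { 0, 0, 0, 0 },   /* helper_a depends on nothing   */
        { 1, 0, 0, 0 },   /* helper_b depends on helper_a  */
        { 1, 1, 0, 0 },   /* wanted_fn depends on both     */
        { 0, 0, 0, 0 },   /* unused_fn is never reached    */
    };

    static int visited[NDEFS];

    /* Depth-first search; printing in post-order honors the partial order. */
    static void emit(int i)
    {
        if (visited[i])
            return;
        visited[i] = 1;
        for (int j = 0; j < NDEFS; j++)
            if (deps[i][j])
                emit(j);
        printf("emit definition: %s\n", name[i]);
    }

    int main(void)
    {
        emit(2);   /* ask only for wanted_fn; unused_fn never appears */
        return 0;
    }

Running it prints helper_a, helper_b, wanted_fn, which is one linear order consistent with the depends-on relation.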
Our DMS Software Reengineering Toolkit with its C Front End can be used to do this. DMS is a general-purpose program transformation system, capable of parsing source files into ASTs, performing full name resolution (i.e., building symbol tables), and doing flow analysis (though that isn't needed for your task). Given those ASTs and the symbol tables, it can be configured to compute this transitive dependency using the symbol-table information, which records where symbols are defined in the ASTs. Finally, it can be configured to assemble the ASTs of interest into a linear order honoring the partial order.
We have done all this with DMS in the past, where the problem was to generate SOA-like interfaces based on other criteria; after generating the SOA code, the tool picked out all the dependencies for the SOA code and did exactly what was required. The dependency extraction machinery is part of the C front end.
A complication for the C world is that the preprocessor may get in the way; for the particular task we accomplished, the extraction was done over a specific configuration of the application and so the preprocessor directives were all expanded away. If you want this done and retain the C preprocessor directives, you'll need something beyond what DMS can do today. (We do have experimental work that captures macros and preprocessor conditionals in the AST but that's not ready for release to production).
You'd think this problem would be harder with C++, but it is not, because the preprocessor is used far more lightly in C++ programs. While we have not done extraction for C++, it would follow exactly the same approach as for C.
So that's the good part with respect to your question.
The not-so-good part, from your point of view perhaps, is that DMS isn't FOSS; it is a commercial tool designed to be used by my company and our customers to build custom analysis and transformation tools for all those tasks you can't get off the shelf that make economic sense. Nor does DMS run natively on Linux; rather, it is a Windows-based tool. It can reach across the network using NFS to access files on other systems, including Linux, and DMS does run under Wine on Linux.

Overhead of copying lua functions

I have a lot of Lua scripts that all use the same name for their "entry point" function, and I want to run them. But I want to make it as fast as possible.
After some browsing/googling/thinking I've got two solutions.
1. I have a main lua_State. I load all the necessary libs/functions (the given ones and my own) into it. Next I lua_dump() the function from the current script's lua_State (using a linked list as the chunk container), then I lua_load() it into the main lua_State and lua_call() it there. With this solution I don't have to load all the libs for every script, so the main lua_State acts as an "environment". :)
2. I simply load the libs into every lua_State and then lua_call() them.
The question is: is the first one even logically correct? And if so, which one would you use? Is there a better solution?
Thanks in advance, and sorry for my English.
(And if the first one really is correct, are there some obvious optimization possibilities?)
As you put it, I don't see why you'd want more than one Lua state. If you have only one Lua state, all the overhead you'll have is loading the libs (once) and then loading the functions from the scripts you run (once, unless you need to 'refresh' them from the file). So simply have one state, and dofile the scripts.
If you really need those multiple lua_States, you could load only the libraries you need, as explained in the Lua Reference Manual, in the paragraph just above section 5.1.
There's also a nice, freely available chapter on optimizing Lua code in the Lua Programming Gems book.
I've recently done something similar and decided to use a single lua_State. I've custom-loaded every script file into its own environment via the use of the _ENV upvalue (generating a new environment for each as a copy of the global environment). This way the names won't conflict and I believe you can potentially run more scripts in parallel if you need that for whatever reason.
It works for my purposes as I need to access the functions in all the loaded scripts basically at random and at any time, but if all you need is to run them once then you can just load and execute them sequentially in the same lua_State.
Edit: I've noticed I actually missed the point of the question. To answer: using a single lua_State will be faster if you need to load any standard libraries (the overhead is noticeable). If you are running each script only once, you don't need lua_dump/lua_load; just do something like luaL_dofile followed by lua_pcall on the entry function, then move on (i.e. load the next file).
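To illustrate that last suggestion, here is a minimal C sketch (the script file names and the entry_point function name are assumptions): one lua_State, the standard libraries opened once, each script run with luaL_dofile and its entry function called with lua_pcall.

    /* One shared lua_State: pay the library-loading cost once, then
       load and call each script's entry function in turn. */
    #include <stdio.h>
    #include <lua.h>
    #include <lauxlib.h>
    #include <lualib.h>

    int main(void)
    {
        const char *scripts[] = { "a.lua", "b.lua", "c.lua" };   /* hypothetical files */
        lua_State *L = luaL_newstate();
        luaL_openlibs(L);                                /* standard libs, loaded once */

        for (int i = 0; i < 3; i++) {
            if (luaL_dofile(L, scripts[i]) != 0) {       /* load and run the chunk */
                fprintf(stderr, "%s\n", lua_tostring(L, -1));
                lua_pop(L, 1);
                continue;
            }
            lua_getglobal(L, "entry_point");             /* the name shared by all scripts */
            if (lua_pcall(L, 0, 0, 0) != 0) {            /* call it: 0 args, 0 results */
                fprintf(stderr, "%s\n", lua_tostring(L, -1));
                lua_pop(L, 1);
            }
        }
        lua_close(L);
        return 0;
    }

Because every script defines the same global name, each luaL_dofile simply overwrites the previous entry_point, which is fine for the load-call-move-on pattern; the _ENV trick above is only needed if all the functions must stay reachable at once.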

How to ensure unused symbols are not linked into the final executable?

First of all, my apologies to those of you who have followed my questions posted in the last few days. This might sound a little repetitive, as I have been asking questions related to -ffunction-sections & -fdata-sections and this one is along the same lines. Those questions and their answers didn't solve my problem, so I realized it is best for me to state the full problem here and let SO experts ponder it. Sorry for not doing so earlier.
So, here goes my problem:
I build a set of static libraries which provide a lot of functionalities. These static libraries will be provided to many products. Not all products will use all of the functionalities provided by my libs. The problem is that the library sizes are quite big and the products want it to be reduced. The main goal is to reduce the final executable size and not the library size itself.
Now, I did some research and found out that, if there are 4 functions in a source file and only one of them is used by the application, the linker will still include the other 3 functions in the final executable, as they all belong to the same object file. I analyzed further and found that -ffunction-sections, -fdata-sections and --gc-sections (the last one is a linker option) will ensure that only that one function gets linked.
But, these options for some reasons beyond my control cannot be used now.
Is there any other way in which I can ensure that the linker will link only the function which is strictly required and exclude all other functions even if they are in the same object file?
Are there any other ways of dealing with the problem?
Note: Reorganizing my code is almost ruled out as it is a legacy code and big.
I am dealing mainly with VxWorks & GCC here.
Thanks for any help!
Ultimately, the only way to ensure that only the functions you want are linked is to ensure that each source (object) file in the library only exports one function symbol - one (visible) function per file. Typically, there are some files which export several functions which are always all used together - the initialization and finalization functions for a package, for example. Also, there are often functions used by the exported function that do not need to be visible outside the source (object) file - make sure they are static.
If you look at Plauger's "The Standard C Library", you'll find that every function is implemented in a separate file, even if the file ends up only a few lines long (a header include, the function signature, an open brace, one line of code, and a close brace).
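As an illustration of the one-visible-function-per-file layout (all file and function names here are hypothetical), each exported function gets its own translation unit and any helpers it needs are static:

    /* frogmondify.h - the public interface */
    #ifndef FROGMONDIFY_H
    #define FROGMONDIFY_H
    int humungous_frogmondifier(int x);
    int minor_frogmondifier(int x);
    #endif

    /* humungous_frogmondifier.c - exactly one visible function */
    #include "frogmondify.h"

    static int helper(int x)            /* static: invisible outside this file */
    {
        return 3 * x;
    }

    int humungous_frogmondifier(int x)
    {
        return helper(x) + 1;
    }

    /* minor_frogmondifier.c - linking this one pulls in nothing else */
    #include "frogmondify.h"

    int minor_frogmondifier(int x)
    {
        return x - 1;
    }

After archiving both objects (e.g. ar rcs libfrog.a humungous_frogmondifier.o minor_frogmondifier.o), a program that calls only minor_frogmondifier drags in only that object file from the library, even without -ffunction-sections.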
Jay asked:
In the case of a big project, doesn't it become difficult to manage with so many files? Also, I don't find many open source projects following this model. OpenSSL is one example.
I didn't say it was widely used - it isn't. But it is the way to make sure that binaries are minimized. The compiler (linker) won't do the minimization for you - at least, I'm not aware of any that do. On a large project, you design the source files so that closely related functions that will normally all be used together are grouped in single source files. Functions that are only occasionally used should be placed in separate files. Ideally, the rarely used functions should each be in their own file; failing that, group small numbers of them into small (but non-minimal) files. That way, if one of the rarely used functions is used, you only get a limited amount of extra unused code linked.
As to number of files - yes, the technique espoused does mean a lot of files. You have to weigh the workload of managing (naming) lots of files against the benefit of minimal code size. Automatic build systems remove most of the pain; VCS systems handle lots of files.
Another alternative is to put the library code into a shared object - or dynamic link library (DLL). The programs then link with the shared object, which is loaded into memory just once and shared between programs using it. The (non-constant) data is replicated for each process. This reduces the size of the programs on disk, at the cost of fixups during the load process. However, you then don't need to worry about executable size; the executables do not include the shared objects. And you can update the library (if you're careful) without recompiling the main programs that use it. The reduced size of the executables is one reason shared libraries are popular.
