Locating data files in C program built with Autotools

I have a C program built using Autotools. In src/Makefile.am, I define a macro with the path to installed data files:
AM_CPPFLAGS = -DAM_INSTALLDIR='"$(pkgdatadir)"'
The problem is that I need to run make install before I can test the binary (since it needs to be able to find the data files).
I can define another macro with the path of the source tree so the data files can be located without installing:
AM_CPPFLAGS = -DAM_INSTALLDIR='"$(pkgdatadir)"' -DAM_TOPDIR='"$(abs_top_srcdir)"'
Now, I would like the following behavior:
If the binary was installed via make install, use AM_INSTALLDIR to fetch data files.
If the binary was not installed, use AM_TOPDIR to fetch data files.
Is this possible? Is there a better approach to this problem?

What I do (in http://rhdunn.github.com/cainteoir/) is:
const char *basedir = getenv("CAINTEOIR_DATADIR");
if (!basedir)
    basedir = DATADIR "/" PACKAGE; // e.g. /usr/share/cainteoir-engine
and then run it (in tests/harness.py) as:
CAINTEOIR_DATADIR=`pwd`/data src/apps/metadata/metadata test_file.epub
This also lets users change where the data is loaded from if they wish.
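A slightly fuller sketch of the same environment-variable fallback, written as a self-contained C helper. The name data_file_path, the file name in the test, and the fallback values for DATADIR and PACKAGE are assumptions for illustration (in a real build those macros come from the build system). Ignoring trailing slashes in the directory is also an addition, not part of the code above.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholders for this sketch only: in a real build these are supplied
   by the build system (e.g. via AM_CPPFLAGS). */
#ifndef DATADIR
#define DATADIR "/usr/share"
#endif
#ifndef PACKAGE
#define PACKAGE "cainteoir-engine"
#endif

/* Hypothetical helper: write "<basedir>/<file>" into BUF, where basedir is
   $CAINTEOIR_DATADIR if it is set and non-empty, else the compiled-in path.
   Returns 0 on success, -1 if BUF is too small. */
int data_file_path(char *buf, size_t size, const char *file)
{
    const char *basedir = getenv("CAINTEOIR_DATADIR");
    if (!basedir || basedir[0] == '\0')
        basedir = DATADIR "/" PACKAGE; /* e.g. /usr/share/cainteoir-engine */

    /* Ignore trailing slashes so ".../data/" and ".../data" behave the same. */
    size_t len = strlen(basedir);
    while (len > 1 && basedir[len - 1] == '/')
        len--;

    int n = snprintf(buf, size, "%.*s/%s", (int)len, basedir, file);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the harness invocation above (CAINTEOIR_DATADIR=`pwd`/data), lookups resolve inside the source tree; when the variable is unset, the installed location is used.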

Making the program configurable at run time, as proposed by reece, is a good solution. If for some reason you do not want it to be configurable at run time, a common solution is to build the test binary differently from the installed binary. (This has its own problems, in particular ensuring that the program you test behaves consistently with the program you install.) An easy way to do that is something like:
bin_PROGRAMS = foo
check_PROGRAMS = test-foo
test_foo_SOURCES = $(foo_SOURCES)
AM_CPPFLAGS = -DINSTALLDIR='"$(pkgdatadir)"'
test_foo_CPPFLAGS = -DINSTALLDIR='"$(abs_top_srcdir)"'
Rather than using a binary with a different name, you might want to have a dedicated tests directory and build the program using the same name as the original.
Note that I've changed the name from AM_INSTALLDIR to INSTALLDIR. Automake reserves names beginning with "AM_" for its own use, and by using that name you are stomping on Automake's namespace.

A bit of additional information first: The data files are under active development, and I have various scripts that need to call binaries using local data files, whereas installed binaries should use stable, installed data files.
My original solution made use of an environment variable, as proposed by reece. But I didn't want to manage setting up environment variables in various places, and I didn't want any risk of the wrong data files being picked up due to a mistake.
So the solution I ended up with was to define macros for both locations at build time, and add a flag (-local) to the binaries to force local data files to be used.
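A minimal sketch of such a -local switch in C. It assumes the two build-time directories are passed as INSTALLDIR and TOPDIR (following the earlier advice to avoid the AM_ prefix); the placeholder values and the name select_datadir are hypothetical.

```c
#include <string.h>

/* Placeholders for this sketch: the real values come from the build, e.g.
   AM_CPPFLAGS = -DINSTALLDIR='"$(pkgdatadir)"' -DTOPDIR='"$(abs_top_srcdir)"' */
#ifndef INSTALLDIR
#define INSTALLDIR "/usr/local/share/myprog"
#endif
#ifndef TOPDIR
#define TOPDIR "/home/me/src/myprog"
#endif

/* Return the data directory to use: the source tree when "-local" appears
   among the command-line arguments, the installed location otherwise. */
const char *select_datadir(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-local") == 0)
            return TOPDIR;
    }
    return INSTALLDIR;
}
```

Because local data must be requested explicitly on the command line, a script cannot pick up the development files by accident, which is the risk the environment-variable approach left open.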


Create directory structure in /var/lib using autotools and automake

I'm using autotools on a C project that, after installation, needs a particular directory structure in /var/lib as follows:
I'm currently using the directive AS_MKDIR_P in configure.ac like so:
But this requires running the configure script with root permissions, which I don't think is the way to go. I think the instructions to create this directory structure need to be in Makefile.am, so that make install creates the directories rather than configure, but I have no idea how to do that.
You really, really, really do not want to specify /var/lib/my-project. As the project maintainer, you have the right to specify relative paths, but the user may change DESTDIR or prefix. If you ignore DESTDIR and prefix and just install your files in /var/lib without regard for the user's requests, then your package is broken. It is not just slightly damaged; it is completely unusable. The autotools packaging must not specify absolute paths; that is the job of downstream packagers (i.e., those that build *.rpm, *.deb, *.dmg, ...). All you need to do is add something like this to Makefile.am:
configdir = $(pkgdatadir)/configurations
localdir = $(configdir)/local
extradir = $(configdir)/extra
inputdir = $(pkgdatadir)/inputs
mydatadir = $(pkgdatadir)/data
config_DATA = cfg.txt
local_DATA = local.txt
extra_DATA = extra.txt
input_DATA = input.txt
mydata_DATA = data.txt
This will put input.txt in $(DESTDIR)$(pkgdatadir)/inputs, etc. If you want that final path to be /var/lib/my-project, then you can specify datadir appropriately at configure time. For example:
$ CONFIG_SITE= ./configure --datadir=/var/lib > /dev/null
This will assign /var/lib to datadir, so that pkgdatadir will be /var/lib/my-project, and a subsequent make install DESTDIR=/path/to/foo will put the files in /path/to/foo/var/lib/my-project/. It is essential that your autotooled package honor things like prefix (which, for these files, was effectively overridden here by the explicit assignment of datadir) and DESTDIR. The appropriate time to specify paths like /var/lib is when you run the configure script. For example, you can add the options to the configure invocation in your rpm spec file, in debian/rules, or in whatever file your packaging system uses. The autotools provide a very flexible packaging system that can easily be used by many different packaging systems (unfortunately, the word "package" is highly overloaded!). Embrace that flexibility.
According to the Automake documentation, there are hooks you can specify in Makefile.am that run at specific points during installation. For my needs I will use install-exec-hook (or install-data-hook), which runs after all executables (or data files) have been installed. The $(DESTDIR) prefix keeps staged installs such as make install DESTDIR=/path/to/foo working:
install-data-hook:
	$(MKDIR_P) $(DESTDIR)/var/lib/my-project/data
	$(MKDIR_P) $(DESTDIR)/var/lib/my-project/configurations/local
	$(MKDIR_P) $(DESTDIR)/var/lib/my-project/configurations/extra
	$(MKDIR_P) $(DESTDIR)/var/lib/my-project/inputs
MKDIR_P is a variable containing the command mkdir -p, or an equivalent if the system's mkdir does not support -p. To make it available in Makefile.am, use the macro AC_PROG_MKDIR_P in configure.ac.
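For illustration only, the behaviour MKDIR_P provides (create every missing path component, and treat a directory that already exists as success) can be sketched in C with POSIX mkdir(). The function name mkdir_p and the 4096-byte path limit are assumptions; in Makefile.am you would still just use $(MKDIR_P).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create PATH and any missing parent directories, in the spirit of `mkdir -p`.
   Returns 0 on success (including when PATH already exists as a directory),
   or -1 with errno set on failure. */
int mkdir_p(const char *path, mode_t mode)
{
    char buf[4096];
    size_t len = strlen(path);
    struct stat st;

    if (len == 0) {
        errno = ENOENT;
        return -1;
    }
    if (len >= sizeof buf) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memcpy(buf, path, len + 1);

    /* Cut the string at each '/' in turn so every parent prefix is created
       first; start at buf + 1 so the root "/" of an absolute path is skipped. */
    for (char *p = buf + 1; *p != '\0'; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, mode) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    if (mkdir(buf, mode) != 0 && errno != EEXIST)
        return -1;

    /* EEXIST is also reported for a plain file of the same name, so confirm
       the final component really is a directory. */
    if (stat(buf, &st) != 0)
        return -1;
    if (!S_ISDIR(st.st_mode)) {
        errno = ENOTDIR;
        return -1;
    }
    return 0;
}
```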

Automatically find dependencies and create CMakeLists.txt with CMake (or CMake Tools in Visual Studio Code) [duplicate]

CMake offers several ways to specify the source files for a target.
One is to use globbing (documentation), for example:
Another method is to specify each file individually.
Which way is preferred? Globbing seems easy, but I heard it has some downsides.
Full disclosure: I originally preferred the globbing approach for its simplicity, but over the years I have come to recognise that explicitly listing the files is less error-prone for large, multi-developer projects.
Original answer:
The advantages to globbing are:
It's easy to add new files, as they are only listed in one place: on disk. Not globbing creates duplication.
Your CMakeLists.txt file will be shorter. This is a big plus if you have lots of files. Not globbing causes you to lose the CMake logic amongst huge lists of files.
The advantages of using hardcoded file lists are:
CMake will track the dependencies of a new file on disk correctly. If you glob, files that did not exist the first time you ran CMake will not get picked up.
You ensure that only the files you want are added. Globbing may pick up stray files that you do not want.
In order to work around the first issue, you can simply "touch" the CMakeLists.txt that does the glob, either by using the touch command or by writing the file with no changes. This will force CMake to re-run and pick up the new file.
To fix the second problem you can organize your code carefully into directories, which is what you probably do anyway. In the worst case, you can use the list(REMOVE_ITEM) command to clean up the globbed list of files:
file(GLOB to_remove file_to_remove.cpp)
list(REMOVE_ITEM list ${to_remove})
The only real situation where this can bite you is if you are using something like git-bisect to try older versions of your code in the same build directory. In that case, you may have to clean and compile more than necessary to ensure you get the right files in the list. This is such a corner case, and one where you already are on your toes, that it isn't really an issue.
The best way to specify sourcefiles in CMake is by listing them explicitly.
The creators of CMake themselves advise not to use globbing.
See: https://cmake.org/cmake/help/latest/command/file.html?highlight=glob#glob
(We do not recommend using GLOB to collect a list of source files from your source tree. If no CMakeLists.txt file changes when a source is added or removed then the generated build system cannot know when to ask CMake to regenerate.)
Of course, you might want to know what the downsides are - read on!
When Globbing Fails:
The big disadvantage to globbing is that creating/deleting files won't automatically update the build-system.
If you are the person adding the files, this may seem an acceptable trade-off; however, it causes problems for other people building your code: they update the project from version control, run a build, then contact you, complaining that "the build's broken".
To make matters worse, the failure typically gives some linking error which doesn't give any hints to the cause of the problem and time is lost troubleshooting it.
In a project I worked on we started off globbing but got so many complaints when new files were added, that it was enough reason to explicitly list files instead of globbing.
This also breaks common git workflows (git bisect and switching between feature branches).
So I can't recommend this. The problems it causes far outweigh the convenience: when someone can't build your software because of this, they may lose a lot of time tracking down the issue, or just give up.
Another note: just remembering to touch CMakeLists.txt isn't always enough. With automated builds that use globbing, I had to run cmake before every build, since files might have been added or removed since the last build *.
Exceptions to the rule:
There are times where globbing is preferable:
For setting up CMakeLists.txt files for existing projects that don't use CMake. It's a fast way to get all the sources referenced (once the build system is running, replace the globbing with explicit file lists).
When CMake isn't used as the primary build system, for example if you're using a project that doesn't use CMake and you would like to maintain your own build system for it.
For any situation where the file list changes so often that it becomes impractical to maintain. In this case it could be useful, but then you have to accept running cmake to generate build-files every time to get a reliable/correct build (which goes against the intention of CMake - the ability to split configuration from building).
* Yes, I could have written code to compare the tree of files on disk before and after an update, but this is not a nice workaround, and it is something better left to the build system.
In CMake 3.12, the file(GLOB ...) and file(GLOB_RECURSE ...) commands gained a CONFIGURE_DEPENDS option which reruns cmake if the glob's value changes.
As that was the primary disadvantage of globbing for source files, it is now okay to do so:
# Whenever this glob's value changes, cmake will rerun and update the build with the
# new/removed files.
file(GLOB_RECURSE sources CONFIGURE_DEPENDS "*.cpp")
add_executable(my_target ${sources})
However, some people still recommend avoiding globbing for sources. Indeed, the documentation states:
We do not recommend using GLOB to collect a list of source files from your source tree. ... The CONFIGURE_DEPENDS flag may not work reliably on all generators, or if a new generator is added in the future that cannot support it, projects using it will be stuck. Even if CONFIGURE_DEPENDS works reliably, there is still a cost to perform the check on every rebuild.
Personally, I consider the benefits of not having to manually manage the source file list to outweigh the possible drawbacks. If you do have to switch back to manually listed files, this can be easily achieved by just printing the globbed source list and pasting it back in.
You can safely glob (and probably should) at the cost of an additional file to hold the dependencies.
Add functions like these somewhere:
# Compare the new contents with the existing file; if it exists and is the
# same, we don't want to trigger a make by changing its timestamp.
function(update_file path content)
    set(old_content "")
    if(EXISTS "${path}")
        file(READ "${path}" old_content)
    endif()
    if(NOT old_content STREQUAL content)
        file(WRITE "${path}" "${content}")
    endif()
endfunction()
# Creates a file called CMakeDeps.cmake next to your CMakeLists.txt with
# the list of dependencies in it - this file should be treated as part of
# CMakeLists.txt (source controlled, etc.).
function(update_deps_file deps)
    set(deps_file "${CMAKE_CURRENT_SOURCE_DIR}/CMakeDeps.cmake")
    # Normalize the list so it's the same on every machine
    foreach(dep IN LISTS deps)
        file(RELATIVE_PATH rel_dep "${CMAKE_CURRENT_SOURCE_DIR}" "${dep}")
        list(APPEND rel_deps ${rel_dep})
    endforeach()
    list(SORT rel_deps)
    # Update the deps file
    set(content "# generated by make process\nset(sources ${rel_deps})\n")
    update_file("${deps_file}" "${content}")
    # Include the file so it's tracked as a generation dependency; we don't
    # need the content.
    include("${deps_file}")
endfunction()
And then go globbing:
file(GLOB_RECURSE sources LIST_DIRECTORIES false *.h *.cpp)
update_deps_file("${sources}")
add_executable(test ${sources})
You're still carting around the explicit dependencies (and triggering all the automated builds!) like before, only it's in two files instead of one.
The only change in procedure is after you've created a new file. If you don't glob, the workflow is to modify CMakeLists.txt from inside Visual Studio and rebuild; if you do glob, you run cmake explicitly, or just touch CMakeLists.txt.
Specify each file individually!
I use a conventional CMakeLists.txt and a python script to update it. I run the python script manually after adding files.
See my answer here:
I'm not a fan of globbing and have never used it for my libraries. But recently I watched a presentation by Robert Schumacher (vcpkg developer) where he recommends treating all your library sources as separate components (for example, private sources (.cpp), public headers (.h), tests and examples are all separate components) and using a separate folder for each of them (similar to how we use C++ namespaces for classes). In that case I think globbing makes sense, because it lets you clearly express this component approach and encourages other developers to follow it. For example, your library directory structure could be the following:
/include - for public headers
/src - for private headers and sources
/tests - for tests
You obviously want other developers to follow your convention (i.e., place public headers under /include and tests under /tests). file(GLOB) signals to developers that all files in a directory have the same conceptual meaning, and that any file placed in that directory matching the glob pattern will be treated the same way (for example, installed during 'make install', in the case of public headers).

How to package Tcl libraries in my own program?

In my C++ program, I used the Tcl library and linked libtcl8.5.so, but the target hosts don't have Tcl 8.5, so I copied libtcl8.5.so and the tcl8.5 directory (which contains init.tcl) there, and set the environment variable TCLLIBPATH to path/to/copied/tcl8.5. But when my program calls Tcl_Init, it fails with "package not known".
It seems the copied tcl8.5/ cannot be init correctly.
How can I solve this problem?
If you change the location of the script library directory (tcl8.5/ in your case), you need to tell the shared library part of Tcl where it is. You do this using the TCL_LIBRARY environment variable, which if set should contain the absolute path that is the location of that directory (technically, the directory that contains init.tcl).
In a normal installation of Tcl, the correct location of that directory is baked directly into the shared library, but when you move things round (or when you are running Tcl's make test) the environment variable allows you to override.
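If you would rather not ask users to export TCL_LIBRARY themselves, the program can set it before starting Tcl. A minimal sketch, assuming (hypothetically) that the copied tcl8.5/ directory sits directly inside a bundle directory whose absolute path the program already knows; the helper name and that layout are assumptions. setenv() is POSIX, so on Windows you would use something like _putenv_s instead. To be safe, call it before Tcl_FindExecutable and Tcl_CreateInterp, so the variable is already in the environment when Tcl looks for its library.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Point Tcl at a relocated script library (the directory containing
   init.tcl). Assumed layout: BUNDLE_DIR/tcl8.5. Returns 0 on success. */
int point_tcl_at_bundled_library(const char *bundle_dir)
{
    /* TCL_LIBRARY should hold an absolute path. */
    if (bundle_dir == NULL || bundle_dir[0] != '/')
        return -1;

    size_t len = strlen(bundle_dir);
    const char *sep = bundle_dir[len - 1] == '/' ? "" : "/";
    char lib[4096];
    int n = snprintf(lib, sizeof lib, "%s%stcl8.5", bundle_dir, sep);
    if (n < 0 || (size_t)n >= sizeof lib)
        return -1;

    /* Overwrite any existing value so the bundled copy wins. */
    return setenv("TCL_LIBRARY", lib, 1);
}

/* Intended call order in the real program (needs <tcl.h>):
     point_tcl_at_bundled_library("/opt/myapp");
     Tcl_FindExecutable(argv[0]);
     Tcl_Interp *interp = Tcl_CreateInterp();
     if (Tcl_Init(interp) != TCL_OK)
         fprintf(stderr, "%s\n", Tcl_GetStringResult(interp));
*/
```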
You might wish to look into alternate packaging mechanisms; there have already been a few questions in the tcl tag on this matter (but the usual favorite — a starkit — is probably not suitable for your case given the fact that the program is mainly C++).

Appending data to an executable (Windows, Unix)

I have a program which compiles and runs scripts.
To create a standalone version of the script, I reserve a large static buffer to hold the compiled script. The compiled script is copied into a copy of the program and it can then be run from that copy.
This works fine. It has some disadvantages however:
the buffer is static and takes up space even if there's no compiled program in it.
if the script to be included exceeds the buffer's size, I need to build a new version with a larger buffer.
I'd like to add the compiled script to the end of the program, but naively doing so doesn't work as the exe loader chokes on the new file size.
Is there a way to manipulate the exe so it would be acceptable for the loaders (mind this is a cross platform program)?
I would think that this is unlikely to be possible without being platform specific. Time for a common interface with different implementations (so the code that saves/loads the script is common, but the executable manipulation is specific).
On Windows you'll hit the problem that a running executable file is locked against modification. You can work around this by operating on a copy, but the only completely deterministic way to rename the copy back is to perform the move on reboot (scheduling a job might be acceptable, though).
On Windows the easiest way to add data to an image (executable or DLL) is to use resources: define a custom resource type, add it to the image with the UpdateResource function (between BeginUpdateResource and EndUpdateResource), and later retrieve it with FindResource, LoadResource and LockResource.
You said "script", so I suppose you have a separate file containing the script (a text file?). You could write a simple program that reads the script file and converts it into a compilable form (e.g. a C source file containing the initialization of an array of bytes). There are also tools you can use to convert an arbitrary file into a linkable object (.o or .obj). In the past I have used the command "objcopy" from GNU binutils. In particular, on Linux:
objcopy -I binary -O elf32-i386 mydata mydata.o
This command creates an object and three public symbols you can use to find the start, the end and the size of your data block:
Something similar may work also on Windows, provided that you install a Windows version of GNU binutils (e.g. cygwin).
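The first option, a small program that turns the script into C source, could look roughly like this sketch. The function name emit_c_array, the NAME_len companion variable and the twelve-bytes-per-line layout are assumptions for illustration. The generated file defines const arrays at file scope, which have external linkage in C, so other files can use them through extern declarations.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write DATA as C source that can be compiled and linked into a program:
     const unsigned char NAME[] = { 0x.., ... };
     const size_t NAME_len = LEN;
   Returns 0 on success, -1 on an invalid NAME or a write error. */
int emit_c_array(FILE *out, const char *name,
                 const unsigned char *data, size_t len)
{
    static const char ident[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_";

    /* NAME must be a C identifier: non-empty, identifier characters only,
       and not starting with a digit. */
    if (name[0] == '\0' || (name[0] >= '0' && name[0] <= '9')
        || name[strspn(name, ident)] != '\0')
        return -1;

    if (fprintf(out, "#include <stddef.h>\n\nconst unsigned char %s[] = {",
                name) < 0)
        return -1;
    /* An empty initializer list is not valid C before C23, so empty input
       gets one placeholder byte; NAME_len still reports 0. */
    if (len == 0 && fputs("\n    0x00,", out) == EOF)
        return -1;
    for (size_t i = 0; i < len; i++) {
        /* Start a new line every twelve bytes to keep the output readable. */
        if (fprintf(out, "%s0x%02x,", i % 12 == 0 ? "\n    " : " ",
                    (unsigned)data[i]) < 0)
            return -1;
    }
    if (fprintf(out, "\n};\nconst size_t %s_len = %zu;\n", name, len) < 0)
        return -1;
    return 0;
}
```

A file that uses the embedded script would then declare `extern const unsigned char script[]; extern const size_t script_len;` (with "script" being whatever NAME you passed) and link against the generated object.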

Powerflex Database File extensions

I am trying to understand the different file extensions for the pfxplus powerflex database. Could someone please help telling me briefly what each file is for?
Data files:
OK, so .dat is the data file.
.k1 -> .k15 are index files.
These are the critical data files at run time (combined with filelist.cfg, or a similar file such as pffiles.tab, which defines what files are available overall).
.fd is the file definition, needed for compiling programs
.tag (which you did not mention) is needed only if you need to access field names at run time (such as using a generic report tool)
.def is the file definition in human readable form, and is not needed by any process but is produced so a programmer or user can understand the file structure.
Run time:
The .ptc files are the compiled threads interpreted by the powerflex runtime.
The .prc file is a resource file that is used at runtime in conjunction with the .ptc file - it defines how a character based program is to look in a gui environment in "g-mode". It was the cheap way to upgrade character based programs when windows first started getting popular usage.
.hdr and .pc3 escape me at the moment, but are vaguely familiar - .hdr is probably another data file used with compression or special field types for later versions of pfxplus. .pc3 may in fact be the .ptc files...
