Appending data to an executable (Windows, Unix)

I have a program which compiles and runs scripts.
To create a standalone version of the script, I reserve a large static buffer to hold the compiled script. The compiled script is copied into a copy of the program and it can then be run from that copy.
This works fine. It has some disadvantages, however:
the buffer is static and takes up space even if there's no compiled program in it;
if the script to be included exceeds the buffer's size, I need to build a new version with a larger buffer.
I'd like to append the compiled script to the end of the program, but naively doing so doesn't work, as the exe loader chokes on the new file size.
Is there a way to manipulate the exe so that it would be acceptable to the loaders (mind, this is a cross-platform program)?

I would think that this is unlikely to be possible without being platform specific. Time for a common interface with different implementations (so the code that saves/loads the script is common, but the executable manipulation is specific).
On Windows you'll hit the problem that a running executable file is locked against modification. Operating on a copy works around this, but the only completely deterministic way to rename the copy back over the original is to perform the move at boot (e.g. MoveFileEx with MOVEFILE_DELAY_UNTIL_REBOOT), although scheduling a job might be acceptable.
On Windows, the easiest way to add data to an image (executable or DLL) is to use resources: define a custom resource type, add the data to the image with UpdateResource (between BeginUpdateResource and EndUpdateResource), and later retrieve it with FindResource and LoadResource.
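For the common half of such an interface, one option on platforms whose loader tolerates trailing bytes (self-extracting archives rely on this, but check it on each target) is to append the payload followed by a small fixed-size footer, and have the program read the footer back from the end of its own file. The sketch below assumes a hypothetical footer format (an 8-byte little-endian length followed by an 8-byte magic string); the names payload_append and payload_read are illustrative only, and locating the program's own file (e.g. /proc/self/exe on Linux, GetModuleFileName on Windows) is left to the platform-specific half.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical footer format: 8-byte little-endian payload length,
 * followed by an 8-byte magic string identifying the payload. */
static const char PAYLOAD_MAGIC[8] = {'S', 'C', 'R', 'I', 'P', 'T', '0', '1'};
#define FOOTER_SIZE 16

/* Append `len` bytes of `data`, then the footer, to the file at `path`. */
int payload_append(const char *path, const unsigned char *data, size_t len)
{
    unsigned char footer[FOOTER_SIZE];
    FILE *f = fopen(path, "ab");
    int ok;
    if (f == NULL)
        return -1;
    for (int i = 0; i < 8; i++)
        footer[i] = (unsigned char)((uint64_t)len >> (8 * i));
    memcpy(footer + 8, PAYLOAD_MAGIC, 8);
    ok = fwrite(data, 1, len, f) == len &&
         fwrite(footer, 1, FOOTER_SIZE, f) == FOOTER_SIZE;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Read the payload back from the end of `path`. Returns a malloc'd buffer
 * (caller frees) and sets *out_len, or NULL if no valid footer is found.
 * Offsets are longs, so very large files are not handled. */
unsigned char *payload_read(const char *path, size_t *out_len)
{
    unsigned char footer[FOOTER_SIZE];
    unsigned char *buf = NULL;
    uint64_t len = 0;
    long end = 0;
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0 || (end = ftell(f)) < FOOTER_SIZE)
        goto done;
    if (fseek(f, end - FOOTER_SIZE, SEEK_SET) != 0 ||
        fread(footer, 1, FOOTER_SIZE, f) != FOOTER_SIZE ||
        memcmp(footer + 8, PAYLOAD_MAGIC, 8) != 0)
        goto done;                        /* no payload appended */
    for (int i = 7; i >= 0; i--)
        len = (len << 8) | footer[i];     /* decode little-endian length */
    if (len > (uint64_t)(end - FOOTER_SIZE))
        goto done;                        /* corrupt length field */
    buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL)
        goto done;
    if (fseek(f, end - FOOTER_SIZE - (long)len, SEEK_SET) != 0 ||
        fread(buf, 1, (size_t)len, f) != len) {
        free(buf);
        buf = NULL;
        goto done;
    }
    *out_len = (size_t)len;
done:
    fclose(f);
    return buf;
}
```

Because the script is loaded into a heap buffer at run time, there is no fixed-size static buffer to reserve or outgrow, which addresses both disadvantages listed in the question.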

You said "script", so I suppose you have a separate file containing the script (a text file?). You could write a simple program that reads the script file and converts it into a compilable form (e.g. a C source file containing an initialized byte array). There are also tools that convert an arbitrary file into a linkable object (.o or .obj). In the past I have used the objcopy command from GNU binutils. In particular, on Linux:
objcopy -I binary -O elf32-i386 mydata mydata.o
This command creates an object file with three public symbols you can use to find the start, the end and the size of your data block:
_binary_mydata_start
_binary_mydata_end
_binary_mydata_size
Something similar may also work on Windows, provided that you install a Windows version of GNU binutils (e.g. from Cygwin).
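The "simple program" approach from the start of this answer fits in a few lines of C and needs no platform-specific object format. This is a sketch: the function name bin2c and the companion `_len` variable are my own choices (similar in spirit to the output of `xxd -i`), not a standard interface.

```c
#include <stdio.h>

/* Copy every byte read from `in` into C source written to `out`: a const
 * byte array called `name` plus a `name_len` variable holding its size. */
int bin2c(FILE *in, FILE *out, const char *name)
{
    unsigned long count = 0;
    int c;
    fprintf(out, "const unsigned char %s[] = {", name);
    while ((c = fgetc(in)) != EOF) {
        /* start a new line every 12 bytes to keep the output readable */
        fprintf(out, "%s0x%02x,", count % 12 == 0 ? "\n    " : " ", (unsigned)c);
        count++;
    }
    if (ferror(in))
        return -1;
    if (count == 0)
        fputs("\n    0x00,", out);   /* C before C23 forbids an empty initializer */
    fprintf(out, "\n};\nconst unsigned long %s_len = %luUL;\n", name, count);
    return ferror(out) ? -1 : 0;
}
```

A small build-step main() could open the script file and call bin2c(in, stdout, "script_data"); the program, compiled as C, then refers to the data with `extern const unsigned char script_data[];` and `extern const unsigned long script_data_len;`.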

Related

Suspicious static-linked executable file size

I have a BB 10/QNX app in which I need to use a more recent version of SQLite than the default one on BB 10. I thought I could do that by linking my own SQLite code into my app. I noticed that in my qmake-generated Makefile the option -lsqlite3 is passed to qcc. In the library location (/opt/bbndk/target_10_2_0_1155/qnx6/armle-v7/usr/lib) I found the following files:
size filename
559386 libsqlite3.a
560662 libsqlite3S.a
15 libsqlite3.so -> libsqlite3.so.1
496503 libsqlite3.so.1
I thought I could replace libsqlite3.a with my own file compiled from the latest sqlite3.c (amalgamation). What confuses me is that my application executable is just 180 kB, so the code from libsqlite3.a doesn't seem to be in it. If SQLite is dynamically linked, I'd expect the application archive (.bar) to contain libsqlite3.so, which also isn't the case, because the archive is just 130 kB. How is it possible that the application uses SQLite (via Qt database classes), but the SQLite code never makes it into the application archive?

Static version
When linking an executable against a static library, the linker knows you're building a "finished product": nothing will depend on your executable. This allows it to leave out unused code. Say you're using only one function from the library, which itself uses nothing else from the library: the linker pulls in only the archive member (object file) that defines that function and ignores the rest. Dropping unused code at function rather than object-file granularity additionally requires options such as -ffunction-sections at compile time and --gc-sections at link time.
Dynamic version
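As for the size of the .bar archive versus the dynamic library, it comes down to what a .bar archive really is: simply a zip archive (maybe with some metadata added; I don't know the details). There are two possibilities: either the .so file compresses extremely well, or the application relies on the system's copy of the library, which is not bundled in the .bar archive. The second is the more likely: when libsqlite3.so and libsqlite3.a sit in the same directory, -lsqlite3 normally resolves to the shared library.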

How to stop file names/paths from appearing in compiled C binary

This may be compiler-specific; if so, note that I am using the IAR EWARM 5.50 compiler (firmware development for the STM32 chip).
Our project consists of a bunch of C-code libraries that we compile first, and then the main application which compiles its C-code and then links in those libraries (pretty standard stuff).
However, if I open any of the library object files or the final application binary in a hex editor, I find a whole bunch of plain-text references to the file paths of the C files that were compiled (e.g. I see "C:\Development\trunk\Common\Encryption\SHA_1.c").
Two issues with this:
we don't really want the file paths to be easily readable, as that reveals our design somewhat;
the size of the binary grows if the C files are located in a deep subdirectory (the binary contains the full path, not just the name). This is especially important because we're dealing with firmware that has a limited amount of code space (256 KB).
Any thoughts on this? I've tried every compiler switch I can think of to "remove debug information", etc., but those paths are still in there.

"The command-line option --no_path_in_file_macros has been added. It removes the path leaving only the filename for the symbols __FILE__ and __BASE_FILE__."
This is documented in the IAR release notes:
http://supp.iar.com/FilesPublic/UPDINFO/005832/arm/doc/infocenter/iccarm_history.ENU.html
Alternatively, if you do not want to use the flag, look for uses of the __FILE__ and __BASE_FILE__ macros and remove them.
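When hunting for those macro uses, a frequent source in C code is assert(): the C standard requires a failing assert to report the source file name and line, normally via __FILE__ and __LINE__, so each assertion can embed the full path (and the stringified condition). A minimal sketch of a replacement that records only the line number is below; ASSERT_NO_PATH and assert_fail_line are hypothetical names, and what the failure hook should do on the STM32 (log, halt, reset) is up to you.

```c
#include <stdio.h>

/* Hypothetical failure hook: on firmware this might log over a UART and
 * halt; here it records the line so the behaviour can be checked. */
static unsigned g_last_assert_line;

static void assert_fail_line(unsigned line)
{
    g_last_assert_line = line;
    fprintf(stderr, "assertion failed at line %u\n", line);
}

/* Like assert(), but only __LINE__ is captured: neither __FILE__ nor the
 * text of the condition ends up as a string in the image. Add a numeric
 * module ID argument if a line number alone is ambiguous. */
#define ASSERT_NO_PATH(cond) \
    ((cond) ? (void)0 : assert_fail_line((unsigned)__LINE__))
```

Unlike assert(), this macro is not disabled by NDEBUG; wrap it in your own conditional if you need that.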

How to package Tcl libraries in my own program?

In my C++ program, I used the Tcl library and linked against libtcl8.5.so, but the target hosts don't have Tcl 8.5. So I copied libtcl8.5.so and the tcl8.5 directory (which contains init.tcl) there, and set the environment variable TCLLIBPATH to path/to/copied/tcl8.5, but when my program calls Tcl_Init, it fails with “package not known”.
It seems the copied tcl8.5/ cannot be initialized correctly.
How can I solve this problem?

If you change the location of the script library directory (tcl8.5/ in your case), you need to tell the shared library part of Tcl where it is. You do this using the TCL_LIBRARY environment variable, which, if set, should contain the absolute path of that directory (technically, the directory that contains init.tcl).
In a normal installation of Tcl, the correct location of that directory is baked directly into the shared library, but when you move things round (or when you are running Tcl's make test) the environment variable allows you to override.
You might wish to look into alternate packaging mechanisms; there have already been a few questions in the tcl tag on this matter (but the usual favorite — a starkit — is probably not suitable for your case given the fact that the program is mainly C++).
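If you would rather not require users to export TCL_LIBRARY themselves, the program can set it before touching Tcl. This is a POSIX-only sketch under stated assumptions: the helper name set_tcl_library is hypothetical, and it assumes the copied tcl8.5/ directory sits directly under an absolute application directory `app_root` that you determine yourself (config file, /proc/self/exe, etc.).

```c
#define _POSIX_C_SOURCE 200809L   /* for setenv() */
#include <stdio.h>
#include <stdlib.h>

/* Point TCL_LIBRARY at <app_root>/tcl8.5 (assumed layout: the directory
 * that contains init.tcl). Returns 0 on success, -1 on error. */
int set_tcl_library(const char *app_root)
{
    char path[4096];
    int n;
    if (app_root == NULL || app_root[0] != '/')
        return -1;                         /* TCL_LIBRARY should be absolute */
    n = snprintf(path, sizeof path, "%s/tcl8.5", app_root);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;                         /* path would be truncated */
    return setenv("TCL_LIBRARY", path, 1); /* 1 = overwrite any existing value */
}
```

Call it early in main(), before any Tcl call such as Tcl_CreateInterp(), so that Tcl_Init() can find init.tcl in the relocated directory.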

Locating data files in C program built with Autotools

I have a C program built using Autotools. In src/Makefile.am, I define a macro with the path to installed data files:
AM_CPPFLAGS = -DAM_INSTALLDIR='"$(pkgdatadir)"'
The problem is that I need to run make install before I can test the binary (since it needs to be able to find the data files).
I can define another macro with the path of the source tree so the data files can be located without installing:
AM_CPPFLAGS = -DAM_INSTALLDIR='"$(pkgdatadir)"' -DAM_TOPDIR='"$(abs_top_srcdir)"'
Now, I would like the following behavior:
If the binary was installed via make install, use AM_INSTALLDIR to fetch data files.
If the binary was not installed, use AM_TOPDIR to fetch data files.
Is this possible? Is there a better approach to this problem?

What I do (in http://rhdunn.github.com/cainteoir/) is:
const char *basedir = getenv("CAINTEOIR_DATADIR");
if (!basedir)
    basedir = DATADIR "/" PACKAGE; // e.g. /usr/share/cainteoir-engine
and then run it (in tests/harness.py) as:
CAINTEOIR_DATADIR=`pwd`/data src/apps/metadata/metadata test_file.epub
This then allows the user to change the location of where to get the data if they wish.
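The snippet above can be wrapped into a self-contained function. The fallback definitions of DATADIR and PACKAGE below are assumptions so that it compiles on its own (in the real project they come from the Autotools build), and unlike the original it also treats an empty CAINTEOIR_DATADIR as unset.

```c
#include <stdlib.h>

/* Fallbacks for a standalone build only; normally supplied by the build. */
#ifndef DATADIR
#define DATADIR "/usr/share"
#endif
#ifndef PACKAGE
#define PACKAGE "cainteoir-engine"
#endif

/* Return the data directory: CAINTEOIR_DATADIR if it is set and non-empty,
 * otherwise the compiled-in install location. */
const char *data_dir(void)
{
    const char *env = getenv("CAINTEOIR_DATADIR");
    if (env != NULL && env[0] != '\0')
        return env;
    return DATADIR "/" PACKAGE;   /* adjacent string literals are concatenated */
}
```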

Making the program able to use a run-time configuration, as proposed by reece, is a good solution. If for some reason you do not want it to be configurable at run time, a common solution is to build the test binary differently from the installed binary (this brings its own problems, in particular ensuring that the program you are testing behaves consistently with the program that is installed). An easy way to do that is something like:
bin_PROGRAMS = foo
check_PROGRAMS = test-foo
test_foo_SOURCES = $(foo_SOURCES)
AM_CPPFLAGS = -DINSTALLDIR='"$(pkgdatadir)"'
test_foo_CPPFLAGS = -DINSTALLDIR='"$(abs_top_srcdir)"'
Rather than using a binary with a different name, you might want to have a dedicated tests directory and build the program using the same name as the original.
Note that I've changed the name from AM_INSTALLDIR to INSTALLDIR. Automake reserves names beginning with "AM_" for its own use, and by using that name you are stomping on Automake's namespace.

A bit of additional information first: The data files are under active development, and I have various scripts that need to call binaries using local data files, whereas installed binaries should use stable, installed data files.
My original solution made use of an environment variable, as proposed by reece. But I didn't want to manage setting up environment variables in various places, and I didn't want any risk of the wrong data files being picked up due to a mistake.
So the solution I ended up with was to define macros for both locations at build time, and add a flag (-local) to the binaries to force local data files to be used.
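A rough sketch of that last approach: both directories are compiled in and a -local argument selects the source tree. The macro names INSTALLDIR and TOPDIR (avoiding the reserved AM_ prefix), the fallback paths and the function choose_data_dir are illustrative assumptions, not taken from the original program.

```c
#include <string.h>

/* Normally passed in by the build, e.g. in src/Makefile.am:
 *   AM_CPPFLAGS = -DINSTALLDIR='"$(pkgdatadir)"' -DTOPDIR='"$(abs_top_srcdir)"'
 * The fallbacks below only exist so the sketch compiles standalone. */
#ifndef INSTALLDIR
#define INSTALLDIR "/usr/local/share/foo"
#endif
#ifndef TOPDIR
#define TOPDIR "/home/me/src/foo"
#endif

/* Use the source tree if "-local" appears among the arguments,
 * otherwise the stable, installed data files. */
const char *choose_data_dir(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-local") == 0)
            return TOPDIR;
    return INSTALLDIR;
}
```

Making the local tree opt-in means a forgotten flag falls back to the installed files, which matches the goal of never picking up development data by mistake.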

Powerflex Database File extensions

I am trying to understand the different file extensions of the pfxplus powerflex database. Could someone briefly tell me what each file is for?
.k1
.k2
.k3
...
.k13
.k14
.k15
.fd
.def
.hdr
.prc
.pc3

Data files:
OK, so .dat is the data file.
.k1 -> .k15 are index files.
These are the critical data files at runtime (combined with filelist.cfg, or a similar file such as pffiles.tab, which defines which files are available overall).
.fd is the file definition, needed for compiling programs
.tag (which you did not mention) is needed only if you need to access field names at run time (such as using a generic report tool)
.def is the file definition in human readable form, and is not needed by any process but is produced so a programmer or user can understand the file structure.
Run time:
The .ptc files are the compiled threads interpreted by the powerflex runtime.
The .prc file is a resource file that is used at runtime in conjunction with the .ptc file: it defines how a character-based program is to look in a GUI environment in "g-mode". It was the cheap way to upgrade character-based programs when Windows first started getting popular usage.
.hdr and .pc3 escape me at the moment, but are vaguely familiar - .hdr is probably another data file used with compression or special field types for later versions of pfxplus. .pc3 may in fact be the .ptc files...
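If you need to act on these files programmatically (for example, to decide what to copy to a runtime-only machine), the mapping above can be expressed as a small lookup. This is a hypothetical helper built only from the descriptions in this answer: matching is case-sensitive, and .hdr, .pc3 and anything else unrecognised are reported as unknown because their role isn't confirmed.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* File roles as described above (hypothetical names). */
enum pfx_role {
    PFX_UNKNOWN,    /* unrecognised, including .hdr and .pc3 */
    PFX_DATA,       /* .dat: data file, needed at runtime */
    PFX_INDEX,      /* .k1 to .k15: index files, needed at runtime */
    PFX_FILEDEF,    /* .fd: file definition, needed to compile programs */
    PFX_FIELDNAMES, /* .tag: field names, only for run-time name lookup */
    PFX_DOC,        /* .def: human-readable definition, used by no process */
    PFX_PROGRAM,    /* .ptc: compiled thread run by the powerflex runtime */
    PFX_GUI_RES     /* .prc: "g-mode" GUI resource used with the .ptc */
};

enum pfx_role pfx_classify(const char *filename)
{
    static const struct { const char *ext; enum pfx_role role; } table[] = {
        {"dat", PFX_DATA}, {"fd", PFX_FILEDEF},  {"tag", PFX_FIELDNAMES},
        {"def", PFX_DOC},  {"ptc", PFX_PROGRAM}, {"prc", PFX_GUI_RES},
    };
    const char *ext = strrchr(filename, '.');
    if (ext == NULL)
        return PFX_UNKNOWN;
    ext++;   /* skip the dot */
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(ext, table[i].ext) == 0)
            return table[i].role;
    /* index files: 'k' followed only by a number from 1 to 15 */
    if (ext[0] == 'k' && isdigit((unsigned char)ext[1])) {
        char *end;
        long n = strtol(ext + 1, &end, 10);
        if (*end == '\0' && n >= 1 && n <= 15)
            return PFX_INDEX;
    }
    return PFX_UNKNOWN;
}
```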