Does Incremental Linking really reduce the size of an image? - linker

According to MSDN, an image is larger when it is incrementally linked than when it is not:
An incrementally linked program is functionally equivalent to a program that is nonincrementally linked. However, because it is prepared for subsequent incremental links, an incrementally linked executable (.exe) file or dynamic-link library (DLL) is larger than a nonincrementally linked program ...
I ran a few tests in release mode (just to test the impact of this feature) and I don't see any difference in the size of the image produced. How can this be explained? Is the MSDN information wrong? Has anyone tried this and seen what impact this linking feature has on the (released) image file?

The quote you supplied says an incrementally linked program is larger, not smaller.
The size increases because not all of the executable is relinked every time, so some old code remains in the executable and is just no longer used. The next full (non-incremental) build will remove the no longer used code. You should always do a full build (non-incremental) before actual release to remove the extraneous noise (and reduce your executable size).
There's no way to know in advance what the size difference will be, because it depends on the exact code in your application, what you change, how many times you change it, the choices the linker makes about whether it can apply a change incrementally, how long it's been since the last full build, and many other variables.

Related

How can I convert an .abs or .s19 to a C file?

I am trying to run some MC9S12DP256 example files, but I want to see the code to understand it. Are there any ways to convert a .s19 or .abs file to a C code?
An ".s19" or an ".abs" file contains mainly the machine code of the application. The source code is not included, regardless of the language used to write it. Even if it was written in assembly language, all symbolic information and comments are excluded.
However, you can try to decompile the machine code. This is not a trivial or quick task, and you need to know the target really well. I have done this with software for other processors; it is feasible for code up to a few KB.
These are the steps I recommend:
1. Get a disassembler and an assembler for the target processor, optimally from the vendor.
2. Let it disassemble the machine code into assembly source code. You might need to convert the ".s19" file into a binary file, one possible tool for this is "srecord".
3. Assemble the resulting source code again into ".s19" or ".abs", and make sure that it generates the same contents as your original.
4. Insert labels for the reset and interrupt entry points. Start at the reset entry point with your analysis.
5. Read the source code, think about what it does.
6. You will quickly "dive" into subroutines that execute small functions, like reading ADC or sending data. Place a label and replace the numerical value at the call sites with the label.
7. Expect sections of (constant) data mixed with executable code.
8. Repeat often from point 3. If you have a difference, undo your last step and redo it in another way until you produce the same contents.
If you want C source, it is usually much more difficult. You need a lot of experience with how C is compiled into machine code. Be aware that variables or even functions are commonly placed in a different order than they are declared. If you want to go that route, you usually also have to use the exact version of the compiler that generated the original machine code.
Be aware that the original might have been written in some other language entirely.
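To illustrate what the ".s19"-to-binary conversion in step 2 involves, here is a minimal sketch in Python. It is not a replacement for "srecord": it assumes well-formed Motorola S1/S2/S3 data records, ignores header and termination records, and the function names (`parse_srec_line`, `srec_to_binary`) and the 0xFF fill value are my own choices for illustration.

```python
def parse_srec_line(line):
    """Return (address, data) for an S1/S2/S3 data record, else None."""
    line = line.strip()
    if len(line) < 4 or line[0] != "S":
        return None
    # Address width in bytes depends on the record type.
    addr_len = {"1": 2, "2": 3, "3": 4}.get(line[1])
    if addr_len is None:
        return None  # header, count, or termination record
    raw = bytes.fromhex(line[2:])
    count, body, checksum = raw[0], raw[1:-1], raw[-1]
    # The count byte covers address + data + checksum.
    if count != len(body) + 1:
        raise ValueError("byte count mismatch")
    # Checksum is the one's complement of the low byte of the sum.
    if (~(count + sum(body))) & 0xFF != checksum:
        raise ValueError("bad checksum")
    address = int.from_bytes(body[:addr_len], "big")
    return address, body[addr_len:]


def srec_to_binary(text, fill=0xFF):
    """Return (base_address, image) with gaps filled by `fill`."""
    records = [r for r in map(parse_srec_line, text.splitlines()) if r]
    if not records:
        return 0, b""
    base = min(addr for addr, _ in records)
    end = max(addr + len(data) for addr, data in records)
    image = bytearray([fill]) * (end - base)
    for addr, data in records:
        image[addr - base:addr - base + len(data)] = data
    return base, bytes(image)
```

The returned base address matters for the disassembler: the reset and interrupt vectors only make sense if the image is loaded at the address it was linked for.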

Create a fixed size section with gcc and place values in it

I need to embed a binary file within an executable generated with gcc on Linux, to be executed on the host (not on a separate device).
In addition, I want to be able to change that binary content externally by using objcopy --update-section.
I could do that with __attribute__((section("name"))), but the problem is that the binary file might have different sizes at different times, so I want to allocate a section of a fixed maximum size. That way, I can swap in slightly bigger or smaller binaries in the future.
Apart from the above, I would like to give a default value to that particular section at build time (a predefined binary file that is available at build time).
This can be done with a linker script. However, as far as I understand, I would need to modify the OS default linker script, which I want to avoid.
The only thing that comes to my mind is to create an array on that section with a fixed size, using the first bytes for allocating the default binary file and padding the rest with 0xFF's for instance.
Is there a better way to do this?
As ikegami has mentioned, it's enough to specify the maximum size of the array and then initialise the values you need.

how to get added content of a file since last modification

I'm working on a project in golang that needs to index recently added file content (using framework called bleve), and I'm looking for a solution to get content of a file since last modification. My current work-around is to record the last indexed position of each file, and during indexing process later on I only retrieve file content starting from the previous recorded position.
So I wonder if there's any library or built-in functionality for this? (doesn't need to be restricted to go, any language could work)
I'll really appreciate it if anyone has a better idea than my work-around as well!
Thanks
It depends on how the files change.
If the files are append-only, then you only need to record the last offset where you stopped indexing, and start from there.
If the changes can happen anywhere, and they mostly replace old bytes with new bytes (like changing pixels of an image), then you could compute a checksum for each small chunk and re-index only those chunks whose checksums differ.
You can check out crypto package in Go standard library for computing hashes.
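A minimal sketch of the per-chunk checksum idea, written in Python with `hashlib` rather than Go's crypto package. The function names, the 4096-byte chunk size, and the choice of SHA-256 are assumptions for illustration. Note that this only helps for in-place replacement: a single inserted byte shifts every later chunk and makes them all look changed.

```python
import hashlib


def chunk_digests(data, chunk_size=4096):
    """Return one SHA-256 digest per fixed-size chunk of `data`."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def changed_chunks(old_digests, data, chunk_size=4096):
    """Return (indices of chunks that differ or are new, new digests)."""
    new_digests = chunk_digests(data, chunk_size)
    changed = [
        i for i, digest in enumerate(new_digests)
        if i >= len(old_digests) or old_digests[i] != digest
    ]
    return changed, new_digests
```

The caller would store the returned digests next to the index and pass them back in on the next run, re-indexing only the byte ranges `i * chunk_size` to `(i + 1) * chunk_size` for each changed index `i`.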
If the changes are line insertion/deletion to text files (like changes to source code), then maybe a diff algorithm can help you find the differences. Something like https://github.com/octavore/delta.
If you're running in a Unix-like system, you could just use tail. If you specify to follow the file, the process will keep waiting after reaching end of file. You can invoke this in your program with os/exec and pipe the Stdout to your program. Your program can then read from it periodically or with blocking.
The only way I can think of to do this natively in Go is like how you described. There's also a library that tries to emulate tail in Go here: https://github.com/hpcloud/tail
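The record-the-offset approach from the question can be sketched as follows (in Python, since the question allows any language). The function name `read_appended` is hypothetical, and treating a shrunken file as "start over from zero" is my assumption about how truncation or log rotation should be handled; it only makes sense for append-only files.

```python
import os


def read_appended(path, last_offset):
    """Return (new_bytes, new_offset) for data appended after last_offset."""
    size = os.path.getsize(path)
    if size < last_offset:
        # File was truncated or replaced: assume we must re-read everything.
        last_offset = 0
    with open(path, "rb") as f:
        f.seek(last_offset)
        data = f.read()
    return data, last_offset + len(data)
```

The caller persists the returned offset per file (for example next to the index) and passes it back in on the next indexing run.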

How to add (and use) binary data to compiled executable?

There are several questions dealing with some aspects of this problem, but none seems to answer it wholly. The whole problem can be summarized as follows:
You have an already compiled executable (obviously expecting the use of this technique).
You want to add arbitrarily sized binary data to it (not necessarily by itself, which would be another nasty problem to deal with).
You want the already compiled executable to be able to access this added binary data.
My particular use-case would be an interpreter, where I would like to make the user able to produce a single file executable out of an interpreter binary and the code he supplies (the interpreter binary being the executable which would have to be patched with the user supplied code as binary data).
A similar case is self-extracting archives, where a program (the archiving utility, such as zip) is capable of constructing an executable which contains a pre-built decompressor (the already compiled executable) and user-supplied data (the contents of the archive). Obviously no compiler or linker is involved in this process (Thanks, Mathias for the note and pointing out 7-zip).
Using existing questions a particular path of solution shows along the following examples:
appending data to an exe - This deals with the aspect of adding arbitrary data to arbitrary exes, without covering how to actually access it (basically simple append usually works, also true with Unix's ELF format).
Finding current executable's path without /proc/self/exe - Together with the above, this would allow getting a file name to use for opening the exe, to access the added data. There are many more questions of this kind, but none focuses specifically on the problem of getting a path suitable for actually opening the binary as a file (which goal alone might (?) be easier to accomplish - truly you don't even need the path, just the binary opened for reading).
There also may be other, probably more elegant ways around this problem than padding the binary and opening the file for reading it in. For example could the executable be made so that it becomes rather trivial to patch it later with the arbitrarily sized data so it appears "within" it being in some proper data segment? (I couldn't really find anything on this, for fixed size data it should be trivial though unless the executable has some hash)
Can this be done reasonably well with as little deviation from standard C as possible? Even more or less cross-platform? (At least from a maintenance standpoint.) Note that it would be preferred if the program that adds the binary data didn't rely on compiler tools to do it (which the user might not have), but solutions necessitating those might also be useful.
Note the already compiled executable criterion (the first point in the above list), which requires a completely different approach than solutions described in questions like C/C++ with GCC: Statically add resource files to executable/library or SDL embed image inside program executable, which ask for embedding data at compile time.
Additional notes:
The problems with the obvious approach outlined above and suggested in some comments, which is to just append to the binary and use that, are as follows:
Opening the currently running program's binary doesn't seem to be trivial (opening the executable for reading is, but finding the path to supply to the file open call is not, at least not in a reasonably cross-platform manner).
The method of acquiring the path may provide an attack surface which probably wouldn't exist otherwise. This means that a potential attacker could trick the program into seeing different binary data (provided by the attacker) than what the executable actually contains, exposing any vulnerability which might reside in the parser of the data.
It depends on how you want other systems to see your binary.
Digitally signed in Windows
The exe format allows for verifying that the file has not been modified since publishing. This would allow you to:
Compile your file
Add your data packet
Sign your file and publish it.
The advantage of following this system, is that "everybody" agrees your file has not been modified since signing.
The easiest way to achieve this scheme is to use a resource. Windows resources can be added post-linking. They are protected by the Authenticode digital signature, and your program can extract the resource data from itself.
It used to be possible to increase the signature to include binary data. Unfortunately this has been banned. There were binaries which used data in the signature section. Unfortunately this was used maliciously. Some details here msdn blog
Breaking the signature
If re-signing is not an option, then the result would be treated as insecure. It is worth noting here, that appended data is insecure, and can be modified without people being able to tell, but so is the code in your binary.
Appending data to a binary does break the digital signature, and also means the end-user can't tell if the code has been modified.
This means that any self-protection you add to your code to ensure the data blob is still secure, would not prevent your code from being modified to remove the check.
Running module
Windows GetModuleFileName allows the running path to be found.
Linux offers /proc/self/exe (or /proc/<pid>/exe).
Unix does not seem to have a method which is reliable.
Data reading
The approach of the zip format is to have a directory written at the end of the file. This means the directory can be located by reading from the end of the file, and the start of the data is then found by working backwards from it. The advantage here is that the data blob is signposted from the end of the file, rather than from its natural start.
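The "signpost at the end" idea can be sketched as below. This is not the ZIP format itself, only a simplified footer in the same spirit; the `BLOB` magic value, the footer layout (8-byte little-endian length followed by a 4-byte magic), and the function names are all assumptions made for this example.

```python
import struct

MAGIC = b"BLOB"                  # hypothetical marker, not a real format
FOOTER = struct.Struct("<Q4s")   # payload length, then magic (12 bytes)


def append_payload(path, payload):
    """Append payload plus a fixed-size footer to an existing file."""
    with open(path, "ab") as f:
        f.write(payload)
        f.write(FOOTER.pack(len(payload), MAGIC))


def read_payload(path):
    """Return the appended payload, or None if no valid footer is found."""
    with open(path, "rb") as f:
        f.seek(0, 2)  # seek to end of file
        size = f.tell()
        if size < FOOTER.size:
            return None
        f.seek(size - FOOTER.size)
        length, magic = FOOTER.unpack(f.read(FOOTER.size))
        if magic != MAGIC or length > size - FOOTER.size:
            return None
        # Walk backwards from the footer to the start of the payload.
        f.seek(size - FOOTER.size - length)
        return f.read(length)
```

A compiled program would do the same thing on its own executable once it has a path or open handle to it; the footer means it never needs to know where the original executable image ends.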

what's the difference between switch_root and run_init?

What's the difference between switch_root and run_init, besides switch_root being made by busybox while run_init is from klibc?
Thanks very much
They both perform exactly the same function, which is to switch to the "real" root and execv(3) the "real" init(8) program from an initramfs. They both assume that the filesystem that should become the root has been mounted on some directory, which they take as an argument.
(An initramfs is a (usually) temporary in-memory filesystem loaded by the bootloader. Its purpose is to do any setup that might be required before mounting the real root and switching to the real init program.)
Recent source code for run-init can be found here. run_init() is the entry point (called from run-init.c, which parses the arguments).
Recent source code for switch_root can be found here. switch_root_main() is the entry point.
The code is short for both implementations (though a bit tricky), which makes it easy to compare them by eye. The only difference seems to be that they perform slightly different sanity checks, and that recent versions of run-init have an extra option to drop selected capabilities(7) before execv()'ing the new init.
