Add variables in specific section during link time - linker

Consider the following situation:
Variables X and Y are part of a library, let us call it lib.o. We cannot adapt the corresponding source code of the library (e.g. by adding pragmas).
X has to be linked into a different section than Y; both are currently defined in the same section.
Is it possible (in general) for the linker to link individual variables out of a library to a specific (possibly new) section?
The reasoning behind this question is that the library has a buffer overflow bug (memclr with too small a buffer, variable X) that overwrites variable Y. A temporary workaround (until the vendor fixes the library) would be to link variable X into a section of its own and leave some space behind it, so that memclr would not harm important variables.

If the compiler is GCC and you have access to the linker script, you can force an entire object file (or specific sections from it) into a specific area, something like this:
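SECTIONS
{
  custom_sec :
  {
    . = ALIGN(4);
    /* use one of the following input patterns: */
    /* custom_object.o          (every section of the object) */
    /* custom_object.o (.data)  (only its .data section) */
    /* custom_object.o (.bss)   (only its .bss section) */
  }
}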
Consult the linker manual. Most linkers offer a comparable feature.
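To make this concrete for the X/Y case: the linker places whole input sections, not individual variables, so the sketch below only helps if X sits in an input section of its own (for example, because the library happened to be built with -fdata-sections, which would yield a section such as .bss.X). If X and Y share one input section of lib.o, the linker cannot pull them apart. The section name .bss.X, the output section name .guarded_x, and the 64-byte gap are assumptions for illustration; check the real names with objdump -t or a map file.

```
/* Fragment for a GNU ld script. It must appear before the generic .bss rule,
   because ld assigns an input section to the first pattern that matches. */
SECTIONS
{
  .guarded_x :
  {
    . = ALIGN(4);
    *lib.o(.bss.X)   /* hypothetical: X alone in its own input section */
    . += 64;         /* assumed gap that absorbs the memclr overrun */
  }
}
```

On a bare-metal target the startup code typically zeroes only the address range it knows as .bss, so a variable moved out of that range may need to be cleared separately.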

Related

GNU gcc ld: constant CRC value for special sections with references to external code

I am working on a project which contains a subset of code which needs to be validated from flash at runtime with a CRC value. This code is placed into its own section of flash using the linker, and during the build the CRC value is calculated and injected into the appropriate area of memory. Then at runtime the flash is read, CRCed, and compared to the stored value. This is all working correctly and as intended.
The code which is placed into this special section of flash is considered critical which is why it needs to be verified periodically as correct at runtime. The CRC value is also supposed to be used to validate that no changes were made to the critical section from version to version. This is what is not working as expected.
When the changes to non-critical sections of code are made (for example things placed into the normal .text region of flash) there are small differences in the critical code. Upon examining the changes it appears that most, perhaps all, of the changes are due to external function/variable references which are not in the critical code section of flash. This of course makes sense because the linker will be inserting the calls to other functions wherever they might get placed in flash which of course can change.
Is it possible to force the linker to make references to external functions/variables static in this section of flash? I was thinking this could be accomplished with some kind of lookup table which contained virtual memory/function addresses and then actual memory/function addresses and the critical code section would only reference the virtual addresses?
You don't say what CPU you are using, but suppose you want to call routine do_stuff from your critical section, with signature int do_stuff(int a, int b). Then you need a header in your critical section:
int tramp_do_stuff(int a, int b);
and an assembly file with a trampoline for each function in your normal section:
.org 7ffff00H ;; however you specify a fixed address in your asm
_tramp_do_stuff: ;; this address is fixed
JMP do_stuff ;; This address gets set by linker
_tramp_next_trampoline:

Why differentiate BSS and COMMON section?

I believe my question is different from this one. Here I am asking why we need to differentiate the two; the linked question only answers which variables go where.
We know that:
The COMMON section is for uninitialized global variables, and
the BSS section is for uninitialized static variables plus global variables initialized to 0.
But why differentiate the BSS and COMMON sections? In particular, can't global variables initialized to 0 go into the .data section, which is for initialized global variables? Isn't initializing a variable to 0 also an initialization?
Below is an explanation from my textbook:
in some cases the linker allows multiple modules to define global symbols with the same name. When the compiler is translating some module and encounters a weak global symbol, say, x, it does not know if other modules also define x, and if so, it cannot predict which of the multiple instances of x the linker might choose. So the compiler defers the decision to the linker by assigning x to COMMON. On the other hand, if x is initialized to zero, then it is a strong symbol, so the compiler can confidently assign it to bss.
I am really confused. It says “it does not know if other modules also define x”, but how can you define a variable twice? Is there example code that illustrates this?
The .bss section holds zero-initialized data as an optimization. It allows
the static linker to reduce the executable size (by not storing the zeros in the file), and
the runtime linker (loader) to speed up the loading process: the data is initialized efficiently, either by mapping a dedicated physical page filled with zeroes or by memset-ing the memory at startup.
A common section is used (on some platforms, e.g. Windows but not ELF) to implement so-called "common symbols", i.e. symbols which may be duplicated in different object files ("translation units"). When such a symbol falls into a common section, the static linker merges all the separate definitions (with some platform-specific rules, e.g. merge only if identical, prefer the largest definition, etc.).
On some targets common sections are used only for uninitialized data (which makes them somewhat similar to .bss) and on others also for vague symbols. In general there is no logical reason why different platforms made different choices regarding the usage of common sections; it is purely historical.
You can find some history behind common symbols in [Raymond Chen's article](https://devblogs.microsoft.com/oldnewthing/20161024-00/?p=94575).

What does @ ".intvec" mean in the declaration/initialization of int const __vector_table[] @ ".intvec" = { }

This line of C code comes from the declaration/initialization of a vector table for a microcontroller. There must be a special meaning to @ ".intvec". What is the meaning behind this?
Just a note: This process takes place before the execution of the main() function.
The toolchain is IAR, and .intvec is located at 0x00000000.
This is non-standard C code. @ is often used as a non-standard extension when you wish to declare a variable at a specific memory location. In this case it points at a segment .intvec, which will be reserved in your linker file, which is also written in some custom, tool-dependent way.
In this specific case, they want to ensure that the interrupt vector table is allocated at the designated address for it. Most likely the MCU will expect it to be placed at a certain address, commonly at the very beginning or at the very end of the memory map.
In IAR, @ is a linker directive extension to locate an object either at an absolute address, within a specific linker section, or in a specific register. On your target, no doubt the interrupt vector table is at 0x00000000.
See the section "Controlling data and function placement in memory" in the toolchain documentation.

Force all data in a C file to be in .text (or other) section

I am using gcc to compile some code for an ARM Cortex-M4F microcontroller. My application uses a large collection of data that I have conveniently written into a C file: large arrays of ints and floats, circular linked lists pointing to those various arrays, various structs, that sort of thing.
When I compile this it adds up to about 500K of data, plus a few hundred K for the actual application. gcc is putting all this data conveniently into the .data section; ld then tries to build the ELF, putting the .data section into RAM and the .text (code) section into FLASH. The MCU I am using doesn't have 500K of RAM, so it cannot build the ELF, but it does have 1M of FLASH. I tried changing my linker script so that both .data and .text are put into FLASH, which "worked", but there are some other bits of code that expect their data to go into RAM, so ultimately execution failed; I can't make that sweeping change.
What I need is to tell gcc to put every single object in this C file into the .text section so it can go into FLASH with the rest of the non-mutable stuff, or at least into some other section which I can then handle in my linker script so that it doesn't interfere with existing blobs that have no problem fitting in RAM. I don't know how to do that. Here is a very slimmed-down example of what I have:
/* data.c */
static char* option_array_0_0[] =
{
    "O=0.40",
    "K=FOO",
    "JAR=JAR",
};
static int width_array_0_0[] =
{
    0,
    1,
    1,
};
Window window_array_0[] =
{
    {
        option_array,
        width_array,
    },
};
/* main.c */
extern Window window_array_0[];
int main()
{
    /* Do stuff */
}
The stuff in data.c, window_array_0 and everything (or most everything, maybe the string arrays are going to .text?) it links to, is being put in .data which my linker script puts into RAM. I want it all in a different section which I can then put into FLASH. There are thousands of these types of arrays in there, plus hundreds of structs and dozens of other bits of information. Is this possible to change? As a test I replaced my "window_array_0" with a char[500000] of random data and that compiled without complaint so I assume it put it all into .text (as would be expected), I just don't know how to make it do so for arbitrary objects.
Thanks for your help.
As other commenters pointed out, 'static const' items usually end up in the .rodata section, which is likely to be placed right next to .text in potentially read-only memory. The caveat is that this may or may not hold in your case: it depends on the particular target and can be changed by the build process (linker options, a section specified via __attribute__((section("name"))) in the C code, the linker script, binaries tweaked after the build with various binutils, etc.).
If you need precise control over the in-memory layout and you know what you're doing, you can use an ld linker script. Among other things, it lets you specify that .rodata from data.o should be placed just before or after .text from all the other .o files linked into the executable.
You can use arm-<your toolchain variant>-ld -verbose to dump the default linker script and use it as a starting point for tweaking.
With most compilers/linkers, a variable declared static const is placed in the text section (or a read-only section next to it) instead of data. Obviously, such data must be preinitialized and not modified at run time, but that is the only kind of data that makes sense in flash.
Somewhere in the code, put something like this sample to place an object in the .text section:
__attribute__((section(".text")))
const int test[0x0A] = {0,0,0,0,0,0,0,0,0,0};
or without const if you want a variable to change:
__attribute__((section(".text")))
int test[0x0A] = {0,0,0,0,0,0,0,0,0,0};
Then try changing it:
test[0] = 0xffffffff;
This works on my 32-bit Intel Linux machine.
IIRC, code in flash is usually required to be ROPI (read only position independent). So option_array_0_0 and width_array_0_0 would need const qualifiers (read only). However:
Window window_array_0[] =
{
{
option_array,
width_array,
},
};
needs to be changed somehow (I'm assuming option_array and width_array are indeed arrays). That makes window_array_0 not position independent.
Every non-trivial embedded project should have a linker script. It defines where the different symbols (functions, global variables, debug symbols, ...) are placed in memory. See a tutorial on GCC linker scripts.
Forcing all data into one section might not be the best solution!
NOTE: Since these arrays are NOT constants, it doesn't make sense to store them in the code section, because your code would then end up overwriting the code section (which is PRETTY dangerous). Either make them const, or create a no-init section that the loader does not initialize and that you initialize yourself on reset.

Position of functions in executable

Is there a requirement in the C standard that functions in the compiled (and linked) binary appear in the order they are written in the C file?
Please assume that in the example below the compiler did not remove / inline any function, and that they all exist in the binary. The question is not about what the compiler might do with empty functions, but about the order of the functions.
For example, if I compile example.c:
void bar() { }
void foo() { bar(); }
int main() { foo(); }
Can I be sure that foo will come after bar in the output file?
No, there isn't such a requirement in the C standard. In terms of compilation and linkage, only particular properties of functions, such as extern or static linkage, etc. are mentioned explicitly, but even these are described in a mostly implementation-independent manner. There's no clause (as far as I know) in any of the standard documents so far that imposes expectations about the order of symbols in an executable.
There is no rule for this in the language. Typically, they do come in the order you expect from looking at the code, but there is nothing saying the compiler can't build a stack of functions, and output them in the completely opposite order - certainly, a function that isn't called can be deleted, and similarly, a function that is inlined and the compiler can determine that it doesn't need an external reference can be deleted in its original form.
You can find out where a function is by char *ptr = (char *)bar;.
Edit: Note, by taking the address of a function, you may alter the inlining of the function, so don't expect this to be a good way to determine what the compiler does "under normal circumstances".
It is not possible to control this through compiler switches alone. You need a two-step process; I illustrate it for ELF (the UN*X object file format) here, but it can be done analogously for Windows PE objects.
1. Instruct the compiler to generate separate / specific ELF sections in the object file for functions whose code placement you must strictly control. In GCC, this can be done either via function attributes or via command-line switches.
Depending on what type of placement you want to achieve, some of GCC's function attributes (hot, cold, ...) may already do what you need, but if not, and specific ordering / specific locations are absolutely necessary, then ...
2. Instruct the linker to order/rearrange/merge/position the input sections in a specific way into the output.
The actual code / data placement happens at link time: the linker can control object code placement for "constituent objects" (ELF sections of the source objects) within the resulting target "compound object", i.e. the resulting executable / library. This happens through linker scripts. The linker will place input sections at user-specified locations / in user-specified order into the output object if instructed to do so using a custom linker script. See the GNU binutils (ld) manual about linker scripts.
As a result (reflecting where into the output the linker actually puts the various parts of the input) you can request a Linker Mapfile to be generated; if you used a non-default / custom linker script to strictly control code placement, then you should instruct the linker to do that, so you can cross-check it did what you wanted. Otherwise, if you used the linker's default, the mapfile would tell you what's done without specific overrides - that may or may not be what you desire, but it at least is a way to check.
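The two steps can be sketched for the bar/foo example as a linker-script fragment. It assumes the objects were compiled with GCC's -ffunction-sections, which places each function in an input section named .text.<function name>. The fragment is meant to be merged into the project's complete linker script, not used on its own.

```
/* Fragment of a GNU ld script; assumes -ffunction-sections was used,
   so each function sits in its own .text.<name> input section. */
SECTIONS
{
  .text :
  {
    *(.text.bar)       /* placed first */
    *(.text.foo)       /* placed directly after bar */
    *(.text .text.*)   /* all remaining code */
  }
}
```

Passing -Wl,-Map=out.map to gcc at link time (out.map being an arbitrary file name) produces the map file for cross-checking the resulting order.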
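There is no such requirement. However, in your example, if bar() were defined after foo(), then at the call site foo would treat bar() as a not-yet-declared function; under pre-C99 rules it is implicitly declared as returning int, while C99 and later require a declaration before the call.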
