__attribute__((section("name"))) usage? - c

I have ran through code that use _ attribute _((section("name")). I understand that for gcc compiler this allows you to tell the linker to put the object created at a specific section "name" (with "name" absolute address declared in a linker file).
What is the point of doing this instead of just using the .data section?

There are many possible uses. [Edit to add note: this is just a sample of uses I've seen myself or considered, not a complete list.]
The Linux kernel, for instance, marks some code and data sections as used only during kernel bootstrap. These can be jettisoned after the kernel is running, reclaiming the space for other uses.
You can use this to mark code or data values that need patching on a particular processor variant, e.g., with or without a coprocessor.
You can use it to make things live in "special" address spaces that will be burned to PROM or saved on an EEPROM, rather than in ordinary memory.
You can use it to collect together code or data areas for purposes like initialization and cleanup, as with C++ constructors and destructors that run before the program starts and when it ends, or for using shorter addressing modes (I don't know how much that would apply on ARM as I have not written any ARM code myself).
The actual use depends on the linker script(s).

From a usecase point of view, there are lots of different types of .data, like:
data local to a specific CPU and/or NUMA node
data shared between contexts (like user/kernelspace, as are the .vdso or vsyscall pages. Or, another example, bootloader and kernel)
readonly data or other data with specific access mode/type restrictions (say, cacheability or cache residency - the latter can be specificed on some ARM SoCs)
data that persists "state transitions" (such as hibernation image loads, or crash kernel / fast reboot reinitializations)
data with specific lifetimes/lifecycles (only used in specific stages during boot or during operation, write-once data)
data specific to a particular kernel subsystem or particular kernel module
"code colocated" data (addressing offsets in x64 are plus/minus 2GB so if you want RIP-relative addressing, the data must be within that range of your currently executing code)
data mapped to some specific hardware register space VA range
So in the end it's often about attributes (the word here used in a more generic sense than what __attribute__(...) allows you to state from within gcc sourcecode. Whether another section is needed and/or useful is ... in the eye of the beholder - the system designer, that is.
The availabiltiy of the section attribute, therefore, allows for flexibility and that is, IMHO, a good thing.

Years later, I'm going to add a specific detail because it's worth writing down.
If you create your own section, you can manage it yourself. In particular, you can use preprocessor macros to insert certain data items into your special section. If the only thing that uses that special section is your preprocessor macros, then you have the ability to create a data structure in a distributed fashion.
What does this mean? It means you can write a preprocessor macro like ADD_VAR_TO_SPECIAL_SECTION(...) and concatenate a bunch of different values in random order into what amounts to an array (or just a big old pile, if they aren't all the same type) in your section.
This gives you the ability to create a (randomly-ordered) array of data at compile time. There is no initialization, no registration, no overhead. You just compile and link your code, and all the macros that were in all the different source files have added all their values into one big array.
How can you use this? Create a bunch of "modules." Register the init functions and destroy functions in an ad-hoc array. Process the array at startup time. (You can add some kind of topological sort if you need to.) You don't need to have a master list of modules anywhere, it gets built automatically. Or, create a macro to register unit test functions into a test suite. Again, it creates an ad-hoc list with no "registration" required.

Related

How to use __attribute__((section(“name”))) for a per-cpu global variable

I have come accross code that uses __attribute__((section(“name”))) for per cpu global variables. I have searched about it and have come to know that it is used for placing the data at a specified memory location with the help of linker scripts. What I do not understand is how it can be used for per-cpu data - that is, if a global variable is placed inside a section defined with __attribute__((section(“name”))), then every cpu has its own copy of this global variable. I may be wrong, but my intuition is that linker scripts are used along with __attribute__((section)) to make this happen. But I do not know how. A small working example, or a hint of how to realize this in code would be great.
Note: this question is with respect to the C language.
The operating system/kernel initializes the hardware in such a way that the data at this memory location is not globally shared, but specific to each CPU. Presumably, this is achieved by having slightly different page tables installed on each CPU. All the section attribute does is place the variable into that special memory region, but once the underlying hardware is configured to enable the lack of cross-CPU sharing, this is all it takes to make a variable CPU-specific.

GCC symbol table for local variables on stack

Of course, symbol and type information of each variable defined in a C/C++ program is available, otherwise the debuggers could not show them. But how to access this information?
A lot info about the elf is available, but that is about linking an seems to hold only global variables, not local ones on the stack i.e.
In a remote real time system (not under unix), I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
The best would be that the dump could be introduced at any time for any variable without the need to add some statements in the code upfront.
But how to access this information?
TL;DR: it's complicated.
You would need to build almost a complete debugger. You can watch this space. When the author gets around to step 9, you'll have an example to follow.
I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
RT systems do not usually lend themselves to easy debugging. The best you could probably do is take a snapshot of the entire (used portion of) the stack, and "fish out" variable values later.
To do that, you'll need to know current values of the stack pointer and instruction pointer, contents of the stack, and load addresses of all ELF objects. And you'll need to re-implement large part of a debugger (or modify existing one).
The easiest approach might be to convert (post-process) the above info into an ELF core, and then use existing debugger of your choice to analyse the values. You can use Google user-space coredumper to see what's involved. See also this answer.

Replace function behind a function pointer on microcontroller at runtime

I wonder if there is a way to load a C function with its data at runtime to the text segment of a running microcontroller system. After the function is placed in the text segment and the data is stored in data segment the function pointer to the new loaded function becomes invoked in the main application. The functionality would be similar to a boot loader except from loading a entire binary before start up. I know that you can use the scatter-loading functions of the linker to place the function pointer at a fixed address or alter the alignment in the sections. Does anyone know if this is possible and if not why?
Many thanks
Technically it is possible. Keep in mind that any solution will be non-standard, not portable, and very tricky.
Many controllers may execute code only from a read-only memory, which makes the whole concept of dynamic loading problematic:
you'd need to erase a whole page first, making sure that no other parts of the application accesses this page during load;
you'd need to flush the instruction cache (again, many controllers rely on instruction cache being always valid).
In any case you'd need to ensure that the function being replaced has no stack frame associated with it. Very hard to enforce in a multithreaded system.
Any particular architecture may offer more traps.

What is the deal with position-independent code (PIC)?

Could somebody explain why I should be interested in compiling position-independent code, and also why should I avoid it?
Making code position-independent adds a layer of abstraction, which requires an additional lookup step at runtime for certain operations (usually pertaining to accessing variables with static storage).
So if you don't need it, don't use it!
There are specific situations where you must produce PIC (namely when creating run-time loadable code, such as a plug-in module or library), but the added flexibility comes at a price.
The gory details depend on how your loader works on on whether you are building a executable or a library, but there is a sense in which this is all a problem for the build system and the compiler, not for you.
If you really want to understand you need to consider where the code gets put in the address space before execution starts and what set of branching instructions your chip provides. Are branches relative or absolute? Is access to the data segment relative or absolute?
If branches are absolute, then the code must be loaded to a reliable address or it won't work. That's position dependent code. Many simple (or older) operating systems accommodate this by always loading a program to the same place.
Relative branches mean that the can be placed at any location in memory. That is position independent code.
Again, all you need to know is the recipe for invoking your compiler and linker on your platform.
PIC code usually has to be slightly larger because the compiler can't use instructions that encode relative address offsets. Without PIC, many addresses can be encoded with 16 or 8 bits relative to current PC. Sometimes in embedded systems, PIC is useful. For example if you want to have patch code that can run at various physical addresses.

How can I manually (programmatically) place objects in my multicore project?

I am developing a mutlicore project for our embedded architecture using the gnu toolchain. In this architecture, all independent cores share the same global flat memory space. Each core has its own internal memory, which is addressable from any other core through its global 32-bit address.
There is no OS implemented and we do low-level programming, but in C instead of assembly. Each core has its own executable, generated with a separate compilation. The current method we use for inter-core communication is through calculation of absolute addresses of objects in the destination core's data space. If we build the same code for all cores, then the objects are located by the linker in the same place, so accessing an object in a remote core is merely changing the high-order bits of the address of the object in the current core and making the transaction. Similar concept allows us to share objects that are located in the external DRAM.
Things start getting complicated when:
The code is not the same in the two cores, so objects may not be allocated in similar addresses,
We sometimes use a "host", which is another processor running some control code that requires access to objects in the cores, as well as shared objects in the external memory.
In order to overcome this problem, I am looking for an elegant way of placing variables in build time. I would like to avoid changing the linker script file as possible. However, it seems like in the C level, I could only control placement up to using a combination of the section attribute (which is too coarse) and the align attribute (which doesn't guarantee the exact place).
A possible hack is to use inline assembly to define the objects and explicitly place them (using the .org and .global keywords), but it seems somewhat ugly (and we did not yet actually test this idea...)
So, here's the questions:
Is there a semistandard way, or an elegant solution for manually placing objects in a C program?
Can I declare an "uber"-extarnel objects in my code and make the linker resolve their addresses using another project's executable?
This question describes a similar situation, but there the user references a pre-allocated resource (like a peripheral) whose address is known prior to build time.
Maybe you should try to use 'placement' tag from new operator. More exactly if you have already an allocated/shared memory you may create new objects on that. Please see: create objects in pre-allocated memory
You don't say exactly what sort of data you'll be sharing, but assuming it's mostly fixed-size statically allocated variables, I would place all the data in a single struct and share only that.
The key point here is that this struct must be shared code, even if the rest of the programs are not. It would be possible to append extra fields (perhaps with a version field so that the reader can interpret it correctly), but existing fields must not be removed or modifed. structs are already used as the interface between libraries everywhere, so their layout can be relied upon (although a little more care will be need in a heterogeneous environment, as long as the type sizes and alignments are the same you should be ok).
You can then share structs by either:
a) putting them in a special section and using the linker script to put that in a known location;
b) allocating the struct in static data, and placing a pointer to that at a known location, say in your assembly start-up files; or
c) as (b), but allocate the struct on the heap, and copy the pointer to the known pointer location at run-time. The has the advantage that the pointer can be pre-adjusted for external consumers, thus avoiding a certain amount of messing about.
Hope that helps
Response to question 1: no, there isn't.
As for the rest, it depends very much of the operating system you use. On our system at the time I was in embedded, we had only one processor's memory to handle (80186 and 68030 based), but had multi-tasking but from the same binary. Our tool chain was extended to handle the memory in a certain way.
The toolchain looked like that (on 80186):
Microsoft C 16bit or Borland-C
Linker linking to our specific crt.o which defined some special symbols and segments.
Microsoft linker, generating an exe and a map file with a MS-DOS address schema
A locator that adjusted the addresses in the executable and generated a flat binary
Address patcher.
An EPROM burner (later a Flash loader).
In our assembly we defined a symbol that was always at the beginning of data segment and we patched the binary with a hard coded value coming from the located map file. This allowed the library to use all the remaining memory as a heap.
In fact, if you haven't the controle on the locator (the elf loader on linux or the exe/dll loader in windows) you're screwed.
You're well off the beaten path here - don't expect anything 'standard' for any of this :)
This answer suggests a method of passing a list of raw addresses to the linker. When linking the external executable, generate a linker map file, then process it to produce this raw symbol table.
You could also try linking the entire program (all cores' programs) into a single executable. Use section definitions and a linker script to put each core's program into its internal memory address space; you can build each core's program separately, incrementally link it to a single .o file, then use objcopy to rename its sections to contain the core ID for the linker script, and rename (hide) private symbols if you're duplicating the same code across multiple cores. Finally, manually supply the start address for each core to your bootstrap code instead of using the normal start symbol.

Resources