How can I manually (programmatically) place objects in my multicore project? - c

I am developing a mutlicore project for our embedded architecture using the gnu toolchain. In this architecture, all independent cores share the same global flat memory space. Each core has its own internal memory, which is addressable from any other core through its global 32-bit address.
There is no OS implemented and we do low-level programming, but in C instead of assembly. Each core has its own executable, generated with a separate compilation. The current method we use for inter-core communication is through calculation of absolute addresses of objects in the destination core's data space. If we build the same code for all cores, then the objects are located by the linker in the same place, so accessing an object in a remote core is merely changing the high-order bits of the address of the object in the current core and making the transaction. Similar concept allows us to share objects that are located in the external DRAM.
Things start getting complicated when:
The code is not the same in the two cores, so objects may not be allocated in similar addresses,
We sometimes use a "host", which is another processor running some control code that requires access to objects in the cores, as well as shared objects in the external memory.
In order to overcome this problem, I am looking for an elegant way of placing variables in build time. I would like to avoid changing the linker script file as possible. However, it seems like in the C level, I could only control placement up to using a combination of the section attribute (which is too coarse) and the align attribute (which doesn't guarantee the exact place).
A possible hack is to use inline assembly to define the objects and explicitly place them (using the .org and .global keywords), but it seems somewhat ugly (and we did not yet actually test this idea...)
So, here's the questions:
Is there a semistandard way, or an elegant solution for manually placing objects in a C program?
Can I declare an "uber"-extarnel objects in my code and make the linker resolve their addresses using another project's executable?
This question describes a similar situation, but there the user references a pre-allocated resource (like a peripheral) whose address is known prior to build time.

Maybe you should try to use 'placement' tag from new operator. More exactly if you have already an allocated/shared memory you may create new objects on that. Please see: create objects in pre-allocated memory

You don't say exactly what sort of data you'll be sharing, but assuming it's mostly fixed-size statically allocated variables, I would place all the data in a single struct and share only that.
The key point here is that this struct must be shared code, even if the rest of the programs are not. It would be possible to append extra fields (perhaps with a version field so that the reader can interpret it correctly), but existing fields must not be removed or modifed. structs are already used as the interface between libraries everywhere, so their layout can be relied upon (although a little more care will be need in a heterogeneous environment, as long as the type sizes and alignments are the same you should be ok).
You can then share structs by either:
a) putting them in a special section and using the linker script to put that in a known location;
b) allocating the struct in static data, and placing a pointer to that at a known location, say in your assembly start-up files; or
c) as (b), but allocate the struct on the heap, and copy the pointer to the known pointer location at run-time. The has the advantage that the pointer can be pre-adjusted for external consumers, thus avoiding a certain amount of messing about.
Hope that helps

Response to question 1: no, there isn't.
As for the rest, it depends very much of the operating system you use. On our system at the time I was in embedded, we had only one processor's memory to handle (80186 and 68030 based), but had multi-tasking but from the same binary. Our tool chain was extended to handle the memory in a certain way.
The toolchain looked like that (on 80186):
Microsoft C 16bit or Borland-C
Linker linking to our specific crt.o which defined some special symbols and segments.
Microsoft linker, generating an exe and a map file with a MS-DOS address schema
A locator that adjusted the addresses in the executable and generated a flat binary
Address patcher.
An EPROM burner (later a Flash loader).
In our assembly we defined a symbol that was always at the beginning of data segment and we patched the binary with a hard coded value coming from the located map file. This allowed the library to use all the remaining memory as a heap.
In fact, if you haven't the controle on the locator (the elf loader on linux or the exe/dll loader in windows) you're screwed.

You're well off the beaten path here - don't expect anything 'standard' for any of this :)
This answer suggests a method of passing a list of raw addresses to the linker. When linking the external executable, generate a linker map file, then process it to produce this raw symbol table.
You could also try linking the entire program (all cores' programs) into a single executable. Use section definitions and a linker script to put each core's program into its internal memory address space; you can build each core's program separately, incrementally link it to a single .o file, then use objcopy to rename its sections to contain the core ID for the linker script, and rename (hide) private symbols if you're duplicating the same code across multiple cores. Finally, manually supply the start address for each core to your bootstrap code instead of using the normal start symbol.

Related

Why must shared libraries be position independent while static libraries don't?

I understand position independent code uses offsets from the current position whilst position dependent code uses absolute addresses.
However, I don't understand why shared libraries must be treated as being position independent whilst static libraries do not?
Generally, nowadays programs have only one linear address-space for everything, backed by virtual memory (hardware and os subsystem). And everything used must be fitted into it somehow.
For that, we have to differentiate between PIC code (any position is good), relocatable code (one position is preferred), and fixed code (only one position works).
As the executable itself is priviliged, in that it is the first user code loaded (aside from a loader in some systems, though that can generally reposition itself seamlessly), it can be put wherever you want. Using that limits ASLR though.
Code in static libraries for executables can take advantage, though will limit the including code.
The order in which the shared libraries are loaded on the other hand is far less well-specified, thus while the loader can try to put it at a preferred position, it generally has to be able to put it elsewhere.
Thus, shared libraries, and code for inclusion by them, has to be PIC or at least relocatable.
PIC code is generally slightly slower, though needs fewer fixups, meaning most can be reloaded from the source-binary as needed, instead of having to be either re-relocated (which happened in Windows 95 and descendents) or swapped to disk when the space is needed.
Actually "static libraries" have position independent code, too. The difference is that the linker resolves those relative addresses to absolute addresses when building the static executable. Once the static library is linked, it cannot be executed at any other address.
For shared libraries to be able to be shared it means the code most not be changed. Therefore the code is prepared to work at any position at run-time.
All "addresses" used above mean "virtual addresses" there days. Static libraries still can be loaded and executed at different physical addresses while the virtual addresses stay the same...

GCC symbol table for local variables on stack

Of course, symbol and type information of each variable defined in a C/C++ program is available, otherwise the debuggers could not show them. But how to access this information?
A lot info about the elf is available, but that is about linking an seems to hold only global variables, not local ones on the stack i.e.
In a remote real time system (not under unix), I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
The best would be that the dump could be introduced at any time for any variable without the need to add some statements in the code upfront.
But how to access this information?
TL;DR: it's complicated.
You would need to build almost a complete debugger. You can watch this space. When the author gets around to step 9, you'll have an example to follow.
I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
RT systems do not usually lend themselves to easy debugging. The best you could probably do is take a snapshot of the entire (used portion of) the stack, and "fish out" variable values later.
To do that, you'll need to know current values of the stack pointer and instruction pointer, contents of the stack, and load addresses of all ELF objects. And you'll need to re-implement large part of a debugger (or modify existing one).
The easiest approach might be to convert (post-process) the above info into an ELF core, and then use existing debugger of your choice to analyse the values. You can use Google user-space coredumper to see what's involved. See also this answer.

__attribute__((section("name"))) usage?

I have ran through code that use _ attribute _((section("name")). I understand that for gcc compiler this allows you to tell the linker to put the object created at a specific section "name" (with "name" absolute address declared in a linker file).
What is the point of doing this instead of just using the .data section?
There are many possible uses. [Edit to add note: this is just a sample of uses I've seen myself or considered, not a complete list.]
The Linux kernel, for instance, marks some code and data sections as used only during kernel bootstrap. These can be jettisoned after the kernel is running, reclaiming the space for other uses.
You can use this to mark code or data values that need patching on a particular processor variant, e.g., with or without a coprocessor.
You can use it to make things live in "special" address spaces that will be burned to PROM or saved on an EEPROM, rather than in ordinary memory.
You can use it to collect together code or data areas for purposes like initialization and cleanup, as with C++ constructors and destructors that run before the program starts and when it ends, or for using shorter addressing modes (I don't know how much that would apply on ARM as I have not written any ARM code myself).
The actual use depends on the linker script(s).
From a usecase point of view, there are lots of different types of .data, like:
data local to a specific CPU and/or NUMA node
data shared between contexts (like user/kernelspace, as are the .vdso or vsyscall pages. Or, another example, bootloader and kernel)
readonly data or other data with specific access mode/type restrictions (say, cacheability or cache residency - the latter can be specificed on some ARM SoCs)
data that persists "state transitions" (such as hibernation image loads, or crash kernel / fast reboot reinitializations)
data with specific lifetimes/lifecycles (only used in specific stages during boot or during operation, write-once data)
data specific to a particular kernel subsystem or particular kernel module
"code colocated" data (addressing offsets in x64 are plus/minus 2GB so if you want RIP-relative addressing, the data must be within that range of your currently executing code)
data mapped to some specific hardware register space VA range
So in the end it's often about attributes (the word here used in a more generic sense than what __attribute__(...) allows you to state from within gcc sourcecode. Whether another section is needed and/or useful is ... in the eye of the beholder - the system designer, that is.
The availabiltiy of the section attribute, therefore, allows for flexibility and that is, IMHO, a good thing.
Years later, I'm going to add a specific detail because it's worth writing down.
If you create your own section, you can manage it yourself. In particular, you can use preprocessor macros to insert certain data items into your special section. If the only thing that uses that special section is your preprocessor macros, then you have the ability to create a data structure in a distributed fashion.
What does this mean? It means you can write a preprocessor macro like ADD_VAR_TO_SPECIAL_SECTION(...) and concatenate a bunch of different values in random order into what amounts to an array (or just a big old pile, if they aren't all the same type) in your section.
This gives you the ability to create a (randomly-ordered) array of data at compile time. There is no initialization, no registration, no overhead. You just compile and link your code, and all the macros that were in all the different source files have added all their values into one big array.
How can you use this? Create a bunch of "modules." Register the init functions and destroy functions in an ad-hoc array. Process the array at startup time. (You can add some kind of topological sort if you need to.) You don't need to have a master list of modules anywhere, it gets built automatically. Or, create a macro to register unit test functions into a test suite. Again, it creates an ad-hoc list with no "registration" required.

What is the deal with position-independent code (PIC)?

Could somebody explain why I should be interested in compiling position-independent code, and also why should I avoid it?
Making code position-independent adds a layer of abstraction, which requires an additional lookup step at runtime for certain operations (usually pertaining to accessing variables with static storage).
So if you don't need it, don't use it!
There are specific situations where you must produce PIC (namely when creating run-time loadable code, such as a plug-in module or library), but the added flexibility comes at a price.
The gory details depend on how your loader works on on whether you are building a executable or a library, but there is a sense in which this is all a problem for the build system and the compiler, not for you.
If you really want to understand you need to consider where the code gets put in the address space before execution starts and what set of branching instructions your chip provides. Are branches relative or absolute? Is access to the data segment relative or absolute?
If branches are absolute, then the code must be loaded to a reliable address or it won't work. That's position dependent code. Many simple (or older) operating systems accommodate this by always loading a program to the same place.
Relative branches mean that the can be placed at any location in memory. That is position independent code.
Again, all you need to know is the recipe for invoking your compiler and linker on your platform.
PIC code usually has to be slightly larger because the compiler can't use instructions that encode relative address offsets. Without PIC, many addresses can be encoded with 16 or 8 bits relative to current PC. Sometimes in embedded systems, PIC is useful. For example if you want to have patch code that can run at various physical addresses.

Fixed address variable in C

For embedded applications, it is often necessary to access fixed memory locations for peripheral registers. The standard way I have found to do this is something like the following:
// access register 'foo_reg', which is located at address 0x100
#define foo_reg *(int *)0x100
foo_reg = 1; // write to foo_reg
int x = foo_reg; // read from foo_reg
I understand how that works, but what I don't understand is how the space for foo_reg is allocated (i.e. what keeps the linker from putting another variable at 0x100?). Can the space be reserved at the C level, or does there have to be a linker option that specifies that nothing should be located at 0x100. I'm using the GNU tools (gcc, ld, etc.), so am mostly interested in the specifics of that toolset at the moment.
Some additional information about my architecture to clarify the question:
My processor interfaces to an FPGA via a set of registers mapped into the regular data space (where variables live) of the processor. So I need to point to those registers and block off the associated address space. In the past, I have used a compiler that had an extension for locating variables from C code. I would group the registers into a struct, then place the struct at the appropriate location:
typedef struct
{
BYTE reg1;
BYTE reg2;
...
} Registers;
Registers regs _at_ 0x100;
regs.reg1 = 0;
Actually creating a 'Registers' struct reserves the space in the compiler/linker's eyes.
Now, using the GNU tools, I obviously don't have the at extension. Using the pointer method:
#define reg1 *(BYTE*)0x100;
#define reg2 *(BYTE*)0x101;
reg1 = 0
// or
#define regs *(Registers*)0x100
regs->reg1 = 0;
This is a simple application with no OS and no advanced memory management. Essentially:
void main()
{
while(1){
do_stuff();
}
}
Your linker and compiler don't know about that (without you telling it anything, of course). It's up to the designer of the ABI of your platform to specify they don't allocate objects at those addresses.
So, there is sometimes (the platform i worked on had that) a range in the virtual address space that is mapped directly to physical addresses and another range that can be used by user space processes to grow the stack or to allocate heap memory.
You can use the defsym option with GNU ld to allocate some symbol at a fixed address:
--defsym symbol=expression
Or if the expression is more complicated than simple arithmetic, use a custom linker script. That is the place where you can define regions of memory and tell the linker what regions should be given to what sections/objects. See here for an explanation. Though that is usually exactly the job of the writer of the tool-chain you use. They take the spec of the ABI and then write linker scripts and assembler/compiler back-ends that fulfill the requirements of your platform.
Incidentally, GCC has an attribute section that you can use to place your struct into a specific section. You could then tell the linker to place that section into the region where your registers live.
Registers regs __attribute__((section("REGS")));
A linker would typically use a linker script to determine where variables would be allocated. This is called the "data" section and of course should point to a RAM location. Therefore it is impossible for a variable to be allocated at an address not in RAM.
You can read more about linker scripts in GCC here.
Your linker handles the placement of data and variables. It knows about your target system through a linker script. The linker script defines regions in a memory layout such as .text (for constant data and code) and .bss (for your global variables and the heap), and also creates a correlation between a virtual and physical address (if one is needed). It is the job of the linker script's maintainer to make sure that the sections usable by the linker do not override your IO addresses.
When the embedded operating system loads the application into memory, it will load it in usually at some specified location, lets say 0x5000. All the local memory you are using will be relative to that address, that is, int x will be somewhere like 0x5000+code size+4... assuming this is a global variable. If it is a local variable, its located on the stack. When you reference 0x100, you are referencing system memory space, the same space the operating system is responsible for managing, and probably a very specific place that it monitors.
The linker won't place code at specific memory locations, it works in 'relative to where my program code is' memory space.
This breaks down a little bit when you get into virtual memory, but for embedded systems, this tends to hold true.
Cheers!
Getting the GCC toolchain to give you an image suitable for use directly on the hardware without an OS to load it is possible, but involves a couple of steps that aren't normally needed for normal programs.
You will almost certainly need to customize the C run time startup module. This is an assembly module (often named something like crt0.s) that is responsible initializing the initialized data, clearing the BSS, calling constructors for global objects if C++ modules with global objects are included, etc. Typical customizations include the need to setup your hardware to actually address the RAM (possibly including setting up the DRAM controller as well) so that there is a place to put data and stack. Some CPUs need to have these things done in a specific sequence: e.g. The ColdFire MCF5307 has one chip select that responds to every address after boot which eventually must be configured to cover just the area of the memory map planned for the attached chip.
Your hardware team (or you with another hat on, possibly) should have a memory map documenting what is at various addresses. ROM at 0x00000000, RAM at 0x10000000, device registers at 0xD0000000, etc. In some processors, the hardware team might only have connected a chip select from the CPU to a device, and leave it up to you to decide what address triggers that select pin.
GNU ld supports a very flexible linker script language that allows the various sections of the executable image to be placed in specific address spaces. For normal programming, you never see the linker script since a stock one is supplied by gcc that is tuned to your OS's assumptions for a normal application.
The output of the linker is in a relocatable format that is intended to be loaded into RAM by an OS. It probably has relocation fixups that need to be completed, and may even dynamically load some libraries. In a ROM system, dynamic loading is (usually) not supported, so you won't be doing that. But you still need a raw binary image (often in a HEX format suitable for a PROM programmer of some form), so you will need to use the objcopy utility from binutil to transform the linker output to a suitable format.
So, to answer the actual question you asked...
You use a linker script to specify the target addresses of each section of your program's image. In that script, you have several options for dealing with device registers, but all of them involve putting the text, data, bss stack, and heap segments in address ranges that avoid the hardware registers. There are also mechanisms available that can make sure that ld throws an error if you overfill your ROM or RAM, and you should use those as well.
Actually getting the device addresses into your C code can be done with #define as in your example, or by declaring a symbol directly in the linker script that is resolved to the base address of the registers, with a matching extern declaration in a C header file.
Although it is possible to use GCC's section attribute to define an instance of an uninitialized struct as being located in a specific section (such as FPGA_REGS), I have found that not to work well in real systems. It can create maintenance issues, and it becomes an expensive way to describe the full register map of the on-chip devices. If you use that technique, the linker script would then be responsible for mapping FPGA_REGS to its correct address.
In any case, you are going to need to get a good understanding of object file concepts such as "sections" (specifically the text, data, and bss sections at minimum), and may need to chase down details that bridge the gap between hardware and software such as the interrupt vector table, interrupt priorities, supervisor vs. user modes (or rings 0 to 3 on x86 variants) and the like.
Typically these addresses are beyond the reach of your process. So, your linker wouldn't dare put stuff there.
If the memory location has a special meaning on your architecture, the compiler should know that and not put any variables there. That would be similar to the IO mapped space on most architectures. It has no knowledge that you're using it to store values, it just knows that normal variables shouldn't go there. Many embedded compilers support language extensions that allow you to declare variables and functions at specific locations, usually using #pragma. Also, generally the way I've seen people implement the sort of memory mapping you're trying to do is to declare an int at the desired memory location, then just treat it as a global variable. Alternately, you could declare a pointer to an int and initialize it to that address. Both of these provide more type safety than a macro.
To expand on litb's answer, you can also use the --just-symbols={symbolfile} option to define several symbols, in case you have more than a couple of memory-mapped devices. The symbol file needs to be in the format
symbolname1 = address;
symbolname2 = address;
...
(The spaces around the equals sign seem to be required.)
Often, for embedded software, you can define within the linker file one area of RAM for linker-assigned variables, and a separate area for variables at absolute locations, which the linker won't touch.
Failing to do this should cause a linker error, as it should spot that it's trying to place a variable at a location already being used by a variable with absolute address.
This depends a bit on what OS you are using. I'm guessing you are using something like DOS or vxWorks. Generally the system will have certian areas of the memory space reserved for hardware, and compilers for that platform will always be smart enough to avoid those areas for their own allocations. Otherwise you'd be continually writing random garbage to disk or line printers when you meant to be accessing variables.
In case something else was confusing you, I should also point out that #define is a preprocessor directive. No code gets generated for that. It just tells the compiler to textually replace any foo_reg it sees in your source file with *(int *)0x100. It is no different than just typing *(int *)0x100 in yourself everywhere you had foo_reg, other than it may look cleaner.
What I'd probably do instead (in a modern C compiler) is:
// access register 'foo_reg', which is located at address 0x100
const int* foo_reg = (int *)0x100;
*foo_reg = 1; // write to foo_regint
x = *foo_reg; // read from foo_reg

Resources