GCC symbol table for local variables on stack - c

Of course, symbol and type information of each variable defined in a C/C++ program is available, otherwise the debuggers could not show them. But how to access this information?
A lot of information about the ELF file is available, but it concerns linking and seems to cover only global variables, not local ones on the stack.
In a remote real time system (not under unix), I'd like to be able to peek now and then by copying some memory in a list together with the associated variable name, and later on take a look at them while the RT system goes on.
The best would be that the dump could be introduced at any time for any variable without the need to add some statements in the code upfront.

TL;DR: it's complicated.
You would need to build almost a complete debugger. You can watch this space. When the author gets around to step 9, you'll have an example to follow.
RT systems do not usually lend themselves to easy debugging. The best you could probably do is take a snapshot of the entire (used portion of) the stack, and "fish out" variable values later.
To do that, you'll need to know the current values of the stack pointer and instruction pointer, the contents of the stack, and the load addresses of all ELF objects. And you'll need to re-implement a large part of a debugger (or modify an existing one).
The easiest approach might be to convert (post-process) the above info into an ELF core file, and then use an existing debugger of your choice to analyse the values. You can use the Google user-space coredumper to see what's involved. See also this answer.


__attribute__((section("name"))) usage?

I have run across code that uses __attribute__((section("name"))). I understand that with GCC this lets you tell the linker to put the object in a specific section "name" (with the absolute address of "name" declared in a linker file).
What is the point of doing this instead of just using the .data section?
There are many possible uses. [Edit to add note: this is just a sample of uses I've seen myself or considered, not a complete list.]
The Linux kernel, for instance, marks some code and data sections as used only during kernel bootstrap. These can be jettisoned after the kernel is running, reclaiming the space for other uses.
You can use this to mark code or data values that need patching on a particular processor variant, e.g., with or without a coprocessor.
You can use it to make things live in "special" address spaces that will be burned to PROM or saved on an EEPROM, rather than in ordinary memory.
You can use it to collect together code or data areas for purposes like initialization and cleanup, as with C++ constructors and destructors that run before the program starts and when it ends, or for using shorter addressing modes (I don't know how much that would apply on ARM as I have not written any ARM code myself).
The actual use depends on the linker script(s).
From a usecase point of view, there are lots of different types of .data, like:
data local to a specific CPU and/or NUMA node
data shared between contexts (like user/kernelspace, as are the .vdso or vsyscall pages. Or, another example, bootloader and kernel)
read-only data or other data with specific access mode/type restrictions (say, cacheability or cache residency - the latter can be specified on some ARM SoCs)
data that persists "state transitions" (such as hibernation image loads, or crash kernel / fast reboot reinitializations)
data with specific lifetimes/lifecycles (only used in specific stages during boot or during operation, write-once data)
data specific to a particular kernel subsystem or particular kernel module
"code colocated" data (addressing offsets in x64 are plus/minus 2GB so if you want RIP-relative addressing, the data must be within that range of your currently executing code)
data mapped to some specific hardware register space VA range
So in the end it's often about attributes (the word here used in a more generic sense than what __attribute__(...) allows you to state from within GCC source code). Whether another section is needed and/or useful is ... in the eye of the beholder - the system designer, that is.
The availability of the section attribute, therefore, allows for flexibility and that is, IMHO, a good thing.
Years later, I'm going to add a specific detail because it's worth writing down.
If you create your own section, you can manage it yourself. In particular, you can use preprocessor macros to insert certain data items into your special section. If the only thing that uses that special section is your preprocessor macros, then you have the ability to create a data structure in a distributed fashion.
What does this mean? It means you can write a preprocessor macro like ADD_VAR_TO_SPECIAL_SECTION(...) and concatenate a bunch of different values in random order into what amounts to an array (or just a big old pile, if they aren't all the same type) in your section.
This gives you the ability to create a (randomly-ordered) array of data at compile time. There is no initialization, no registration, no overhead. You just compile and link your code, and all the macros that were in all the different source files have added all their values into one big array.
How can you use this? Create a bunch of "modules." Register the init functions and destroy functions in an ad-hoc array. Process the array at startup time. (You can add some kind of topological sort if you need to.) You don't need to have a master list of modules anywhere, it gets built automatically. Or, create a macro to register unit test functions into a test suite. Again, it creates an ad-hoc list with no "registration" required.
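A minimal sketch of the module-registration idea described above. It assumes GCC (or Clang) with GNU ld on an ELF target, where the linker provides __start_SECNAME and __stop_SECNAME symbols for any section whose name is a valid C identifier. The names REGISTER_MODULE, module_table, init_uart and init_timer are made up for illustration.

```c
#include <stddef.h>

/* One entry per registered module; every entry lands in the same section. */
typedef struct {
    const char *name;
    int (*init)(void);
} module_entry;

/* Hypothetical registration macro: drops one entry into "module_table".
 * "used" keeps the compiler from discarding the otherwise unreferenced object. */
#define REGISTER_MODULE(fn) \
    static const module_entry module_entry_##fn \
        __attribute__((used, section("module_table"))) = { #fn, fn }

/* Provided by GNU ld because "module_table" is a valid C identifier. */
extern const module_entry __start_module_table[];
extern const module_entry __stop_module_table[];

static int init_count;
static int init_uart(void)  { init_count++; return 0; }
static int init_timer(void) { init_count++; return 0; }

/* In a real project these lines would live in different source files. */
REGISTER_MODULE(init_uart);
REGISTER_MODULE(init_timer);

/* Walk the linker-assembled array; returns how many inits succeeded. */
size_t run_all_module_inits(void)
{
    size_t ok = 0;
    for (const module_entry *m = __start_module_table;
         m != __stop_module_table; ++m) {
        if (m->init() == 0)
            ok++;
    }
    return ok;
}
```

The order of the entries is whatever the linker happens to produce, so anything order-sensitive still needs a sort or a priority field.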

Maximum stack size needed for a C program on MSP430

In a C program that doesn't use recursion, it should be possible in theory to work out the maximum/worst case stack size needed to call a given function, and anything that it calls. Are there any free, open source tools that can do this, either from the source code or compiled ELF files?
Alternatively, is there a way to extract a function's stack frame size from an ELF file, so I can try to work it out manually?
I'm compiling for the MSP430 using MSPGCC 3.2.3 (I know it's an old version, but I have to use it in this case). The stack space to allocate is set in the source code, and should be as small as possible so that the rest of memory can be used for other things. I have read that you need to take account of the stack space used by interrupts, but the system I'm using already takes account of this - I'm trying to work out how much extra space to add on top of that. Also, I've read that function pointers make this difficult. In the few places where function pointers are used here, I know which functions they can call, so could take account of these cases manually if the stack space needed for the called functions and the calling functions was known.
Static analysis seems like a more robust option than stack painting at runtime, but working it out at runtime is an option if there's no good way to do it statically.
I found GCC's -fstack-usage flag, which saves the frame size for each function as it is compiled. Unfortunately, MSPGCC doesn't support it. But it could be useful for anyone who is trying to do something similar on a different platform.
While static analysis is the best method for determining maximum stack usage, you may have to resort to an experimental method. It cannot guarantee an absolute maximum, but it can give you a very good idea of your stack usage.
You can check your linker script to get the location of __STACK_END and __STACK_SIZE. You can use these to fill the stack space with an easily recognizable pattern like 0xDEAD or 0xAA55. Run your code through a torture test to try and make sure as many interrupts are generated as possible.
After the test you can examine the stack space to see how much of the stack was overwritten.
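A rough sketch of the painting approach, written against a plain array so it can be tried on a host. It assumes the stack grows towards lower addresses (as on the MSP430), so the untouched pattern survives at the low end. The function names are made up; on the target you would pass the stack's lowest address and its size in words, derived from your linker script symbols.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xAA55u  /* easily recognizable fill pattern */

/* Fill the whole stack region with the pattern before the test run. */
void stack_paint(uint16_t *lowest, size_t words)
{
    for (size_t i = 0; i < words; ++i)
        lowest[i] = STACK_PAINT;
}

/* After the run: count the words from the first overwritten one upwards.
 * Assumes a descending stack, so unused space is at the low addresses. */
size_t stack_words_used(const uint16_t *lowest, size_t words)
{
    size_t untouched = 0;
    while (untouched < words && lowest[untouched] == STACK_PAINT)
        untouched++;
    return words - untouched;
}
```

A value that happens to equal the pattern at the deepest point would make the result slightly too small, so leave some margin on top of the measured figure.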
Interesting question.
I would expect this information to be statically available in the debugging data included in debug builds.
I had a brief look at the DWARF standard, and it does specify two attributes for functions, DW_AT_frame_base and DW_AT_static_link, which can be used to compute "the frame base of the relevant instance of the subroutine that immediately encloses the subroutine or entry point".
I think that the only way to go is static analysis. You need to account for the space taken by all non-static local variables (these will mostly be pointers, but pointers are stored on the stack anyway). You also need to reserve space for the return address within the caller, which is stored on the stack so that control can return to the caller after your function returns, and you need space for all your function parameters.
Based on that, if you have a tool able to count all parameters and auto variables and figure out their sizes, you should be able to calculate the minimum stack frame size you'll need.
Please note that the compiler may also align values on the stack for your particular architecture, which could make the stack space requirements a little bigger than what you'd expect from this calculation.
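Once per-function frame sizes are known (by whatever means), the worst case for a non-recursive program is the deepest path through the call graph. A small sketch of that calculation follows; the table layout, the function names and all the byte counts are invented for illustration, not taken from any real MSP430 build.

```c
#include <stddef.h>

#define MAX_CALLEES 4

/* Hypothetical per-function record: own frame size plus who it calls. */
typedef struct {
    const char *name;
    unsigned frame_bytes;        /* locals + saved registers + return address */
    int callees[MAX_CALLEES];    /* indices into the table, -1 terminated */
} func_info;

/* Made-up example call graph: main -> {a, b}, a -> c. */
static const func_info example_funcs[] = {
    { "main", 10, {  1,  2, -1, -1 } },
    { "a",    20, {  3, -1, -1, -1 } },
    { "b",     6, { -1, -1, -1, -1 } },
    { "c",     8, { -1, -1, -1, -1 } },
};

/* Depth-first walk: own frame plus the deepest callee chain.
 * Only valid when the call graph has no recursion (no cycles). */
unsigned worst_case_stack(const func_info *funcs, int idx)
{
    unsigned deepest = 0;
    for (int i = 0; i < MAX_CALLEES && funcs[idx].callees[i] >= 0; ++i) {
        unsigned d = worst_case_stack(funcs, funcs[idx].callees[i]);
        if (d > deepest)
            deepest = d;
    }
    return funcs[idx].frame_bytes + deepest;
}
```

Calls through function pointers would be entered in the table by hand, listing every function the pointer can refer to as a callee.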
Some embedded IDEs can give info on stack usage during runtime.
I know that IAR Embedded Workbench supports it.
Be aware that interrupts occur asynchronously, so take the biggest stack usage scenario and add the interrupt context to it. If nested interrupts are supported, as on ARM processors, you need to take this into account as well.
TinyOS has some work done on stack size analysis. It is described here:
They only support AVR, but say that "MSP430 is not difficult to support but this is not super high priority". In any case, the page provides lots of resources.

Detect write to string

Is there a way for me to detect (or trigger a crash on) a write into a string without using mprotect (which I can't use)?
Currently I can detect the write only in the following read, but that's too late (the following read can come from a completely different lib).
Note: Using gdb with watchpoints failed due to optimizer moving the string around in the process memory.
Edit: The variable in question is a class member (char*) that contains some metadata as a prefix to a string. The string is the part that needs to be immutable, and the prefix must be writable. I've got a few millions of these objects in a class-static hash, and they are accessed from just about anywhere in our code.
You can try to wrap all the code that writes to memory in preprocessor macros which check the address that you're using but since most people love to use bare bones pointers (instead of library calls that encapsulate things), it will probably be a lot of effort.
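A minimal sketch of what such a checking wrapper might look like. Everything here (guard_set, checked_store, CHECKED_WRITE) is a made-up name, and it guards only a single address range; with millions of objects you would need a cheaper test, for example one based on how the objects are allocated.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Address range that must not be written (one range only, for illustration). */
static uintptr_t guard_begin, guard_end;

void guard_set(const char *begin, size_t len)
{
    guard_begin = (uintptr_t)begin;
    guard_end   = guard_begin + len;
}

/* Returns 0 and performs the store, or -1 if the target is protected. */
int checked_store(char *p, char value)
{
    uintptr_t a = (uintptr_t)p;
    if (a >= guard_begin && a < guard_end)
        return -1;
    *p = value;
    return 0;
}

/* Drop-in for "*(ptr) = (value)": crashes at the offending write site. */
#define CHECKED_WRITE(ptr, value) \
    do { if (checked_store((ptr), (value)) != 0) abort(); } while (0)
```

Because the macro aborts at the write itself, the core dump or debugger backtrace points at the culprit rather than at the later read.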
The only other options are mprotect(2) or GDB, both of which use special parts of the CPU to watch for accesses to the memory in question.
Since you can't use those either, the last option is to print the code on paper and sit down in a quiet corner for a couple of days to read it. This will usually work, but most people shun the effort (partly because it doesn't look like "real" work ;-).
I am not sure if there is a command in gdb similar to "trace" in dbx, but in dbx I remember using "trace" to track individual variables in the code; it notifies you when the variable's value changes during execution.

How can I manually (programmatically) place objects in my multicore project?

I am developing a multicore project for our embedded architecture using the GNU toolchain. In this architecture, all independent cores share the same global flat memory space. Each core has its own internal memory, which is addressable from any other core through its global 32-bit address.
There is no OS implemented and we do low-level programming, but in C instead of assembly. Each core has its own executable, generated with a separate compilation. The current method we use for inter-core communication is through calculation of absolute addresses of objects in the destination core's data space. If we build the same code for all cores, then the objects are located by the linker in the same place, so accessing an object in a remote core is merely changing the high-order bits of the address of the object in the current core and making the transaction. Similar concept allows us to share objects that are located in the external DRAM.
Things start getting complicated when:
The code is not the same in the two cores, so objects may not be allocated in similar addresses,
We sometimes use a "host", which is another processor running some control code that requires access to objects in the cores, as well as shared objects in the external memory.
In order to overcome this problem, I am looking for an elegant way of placing variables at build time. I would like to avoid changing the linker script file as far as possible. However, it seems that at the C level I can only control placement by combining the section attribute (which is too coarse) and the align attribute (which doesn't guarantee the exact place).
A possible hack is to use inline assembly to define the objects and explicitly place them (using the .org and .global keywords), but it seems somewhat ugly (and we did not yet actually test this idea...)
So, here are the questions:
Is there a semistandard way, or an elegant solution for manually placing objects in a C program?
Can I declare "uber"-external objects in my code and make the linker resolve their addresses using another project's executable?
This question describes a similar situation, but there the user references a pre-allocated resource (like a peripheral) whose address is known prior to build time.
Maybe you should try the placement form of the new operator. More precisely, if you already have allocated/shared memory, you can create new objects in it. Please see: create objects in pre-allocated memory
You don't say exactly what sort of data you'll be sharing, but assuming it's mostly fixed-size statically allocated variables, I would place all the data in a single struct and share only that.
The key point here is that this struct must be shared code, even if the rest of the programs are not. It would be possible to append extra fields (perhaps with a version field so that the reader can interpret it correctly), but existing fields must not be removed or modified. Structs are already used as the interface between libraries everywhere, so their layout can be relied upon (a little more care will be needed in a heterogeneous environment, but as long as the type sizes and alignments are the same you should be OK).
You can then share structs by either:
a) putting them in a special section and using the linker script to put that in a known location;
b) allocating the struct in static data, and placing a pointer to that at a known location, say in your assembly start-up files; or
c) as (b), but allocate the struct on the heap, and copy the pointer to the known pointer location at run-time. This has the advantage that the pointer can be pre-adjusted for external consumers, thus avoiding a certain amount of messing about.
Hope that helps
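A small sketch of option (a), assuming GCC on an ELF target. The section name .shared_data, the struct fields, and the address layout (core ID in the bits above bit 20) are all invented for illustration; the real bit positions come from your architecture manual, and the linker script still has to pin the section to a known address.

```c
#include <stdint.h>

/* Shared header, compiled into every core's program unchanged.
 * New fields may be appended; existing ones must never move. */
typedef struct {
    uint32_t version;   /* lets a reader detect appended fields */
    uint32_t command;
    uint32_t status;
} shared_block_v1;

/* Placed in its own section so the linker script can fix its address. */
shared_block_v1 core_shared
    __attribute__((section(".shared_data"), used)) = { 1u, 0u, 0u };

/* Hypothetical address layout: low 20 bits are the core-local offset,
 * the bits above select the core. */
#define CORE_ID_SHIFT 20u
#define LOCAL_MASK    0x000FFFFFu

/* Turn a core-local address into the global address on another core. */
uint32_t core_global_address(uint32_t core_id, uint32_t local_addr)
{
    return (core_id << CORE_ID_SHIFT) | (local_addr & LOCAL_MASK);
}
```

On real hardware the shared object would normally be declared volatile (or accessed through volatile pointers), since another core can change it at any time.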
Response to question 1: no, there isn't.
As for the rest, it depends very much on the operating system you use. When I worked in embedded, our systems (80186 and 68030 based) had only one processor's memory to handle, with multi-tasking, but all from the same binary. Our tool chain was extended to handle the memory in a certain way.
The toolchain looked like that (on 80186):
Microsoft C 16bit or Borland-C
Linker linking to our specific crt.o which defined some special symbols and segments.
Microsoft linker, generating an exe and a map file with a MS-DOS address schema
A locator that adjusted the addresses in the executable and generated a flat binary
Address patcher.
An EPROM burner (later a Flash loader).
In our assembly we defined a symbol that was always at the beginning of data segment and we patched the binary with a hard coded value coming from the located map file. This allowed the library to use all the remaining memory as a heap.
In fact, if you don't have control over the locator (the ELF loader on Linux or the EXE/DLL loader on Windows), you're screwed.
You're well off the beaten path here - don't expect anything 'standard' for any of this :)
This answer suggests a method of passing a list of raw addresses to the linker. When linking the external executable, generate a linker map file, then process it to produce this raw symbol table.
You could also try linking the entire program (all cores' programs) into a single executable. Use section definitions and a linker script to put each core's program into its internal memory address space; you can build each core's program separately, incrementally link it to a single .o file, then use objcopy to rename its sections to contain the core ID for the linker script, and rename (hide) private symbols if you're duplicating the same code across multiple cores. Finally, manually supply the start address for each core to your bootstrap code instead of using the normal start symbol.

C memcpy() a function

Is there any method to calculate size of a function? I have a pointer to a function and I have to copy entire function using memcpy. I have to malloc some space and know 3rd parameter of memcpy - size. I know that sizeof(function) doesn't work. Do you have any suggestions?
Functions are not first-class objects in C, which means they can't be passed to another function, they can't be returned from a function, and they can't be copied into another part of memory.
A function pointer though can satisfy all of this, and is a first class object. A function pointer is just a memory address and it usually has the same size as any other pointer on your machine.
It doesn't directly answer your question, but you should not implement call-backs from kernel code to user-space.
Injecting code into kernel-space is not a great work-around either.
It's better to treat the user/kernel barrier like an inter-process barrier. Pass data, not code, back and forth using a well-defined protocol through a char device. If you really need to pass code, just wrap it up in a kernel module. You can then dynamically load/unload it, just like a .so-based plugin system.
On a side note, at first I misread that you wanted to pass memcpy() to the kernel. Keep in mind that it is a very special function. It is defined in the C standard, quite simple, and of quite broad scope, so it is a perfect candidate to be provided as a built-in by the compiler.
Just like strlen(), strcmp() and others in GCC.
That said, the fact that it is a built-in does not prevent you from taking a pointer to it.
Even if there was a way to get the sizeof() a function, it may still fail when you try to call a version that has been copied to another area in memory. What if the compiler has emitted local or long jumps to specific memory locations? You can't just move a function in memory and expect it to run. The OS can do that, but it has all the information it takes to do it.
I was going to ask how operating systems do this but, now that I think of it, when the OS moves stuff around it usually moves a whole page and handles memory such that addresses translate to a page/offset. I'm not sure even the OS ever moves a single function around in memory.
Even in the case of the OS moving a function around in memory, the function itself must be declared or otherwise compiled/assembled to permit such action, usually through a pragma that indicates the code is relocatable. All the memory references need to be relative to its own stack frame (aka local variables) or include some sort of segment+offset structure such that the CPU, either directly or at the behest of the OS, can pick the appropriate segment value. If there was a linker involved in creating the app, the app may have to be re-linked to account for the new function address.
There are operating systems which can give each application its own 32-bit address space but it applies to the entire process and any child threads, not to an individual function.
As mentioned elsewhere, you really need a language where functions are first class objects, otherwise you're out of luck.
You want to copy a function? I do not think that this is possible in C generally.
Assume, you have a Harvard-Architecture microcontroller, where code (in other words "functions") is located in ROM. In this case you cannot do that at all.
Also, I know several compilers and linkers that optimize at file level (not only at function level). This results in machine code where parts of different C functions are interleaved.
The only way which I consider as possible may be:
Generate opcode of your function (e.g. by compiling/assembling it on its own).
Copy that opcode into an C array.
Use a proper function pointer, pointing to that array, to call this function.
Now you can perform all operations, common to typical "data", on that array.
But apart from this: did you consider a redesign of your software, so that you do not need to copy a function's content?
I don't quite understand what you are trying to accomplish, but assuming you compile with -fPIC and don't have your function do anything fancy, no other function calls, not accessing data from outside function, you might even get away with doing it once. I'd say the safest possibility is to limit the maximum size of supported function to, say, 1 kilobyte and just transfer that, and disregard the trailing junk.
If you really needed to know the exact size of a function, figure out your compiler's epilogue and prologue. This should look something like this on x86:
mov esp, ebp
pop ebp
;expect a varying length run of NOPs here
push ebp
mov ebp, esp
Disassemble your compiler's output to check, and take the corresponding assembled sequences to search for. The epilogue alone might be enough, but all of this can bomb if the searched sequence pops up too early, e.g. in data embedded in the function. Searching for the next prologue might also get you into trouble, I think.
Now please ignore everything that i wrote, since you apparently are trying to approach the problem in the wrong and inherently unsafe way. Paint us a larger picture please, WHY are you trying to do that, and see whether we can figure out an entirely different approach.
A similar discussion took place here:
They propose creating a dummy function after your function-to-be-copied, and then getting the memory pointers to both. But you need to switch off compiler optimizations for it to work.
If you have GCC >= 4.4, you could try switching off the optimizations for your function in particular using #pragma:
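The snippet that originally followed is not preserved here. As a stand-in, this is a sketch of how GCC's function-specific option pragmas (push_options, optimize, pop_options) can be combined with the dummy-end-function trick; the function names are made up, and nothing guarantees that the compiler and linker keep the two functions adjacent or in source order.

```c
#include <stddef.h>
#include <stdint.h>

#pragma GCC push_options
#pragma GCC optimize ("O0")   /* applies only up to pop_options */

/* Function we would like to copy (hypothetical example). */
int render_pixel(int x)
{
    return x * 2 + 1;
}

/* Dummy marker placed right after it in the source. */
void render_pixel_end(void)
{
}

#pragma GCC pop_options

/* Distance between the two entry points. Only meaningful if the toolchain
 * happened to emit them back to back and in this order; check the map file. */
ptrdiff_t render_pixel_span_bytes(void)
{
    return (ptrdiff_t)((uintptr_t)render_pixel_end - (uintptr_t)render_pixel);
}
```

Converting a function pointer to an integer is implementation-defined, and the result can even be negative if the functions were reordered, so treat the number as a hint to verify against the disassembly.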
Another proposed solution was not to copy the function at all, but define the function in the place where you would want to copy it to.
Good luck!
If your linker doesn't do global optimizations, then just calculate the difference between the function pointer and the address of the next function.
Note that copying the function will produce something that can't be invoked if your code isn't compiled relocatable (i.e. all addresses within the code, such as branch targets, must be relative; references to globals still work, since the globals don't move).
It sounds like you want to have a callback from your kernel driver to userspace, so that it can inform userspace when some asynchronous job has finished.
That might sound sensible, because it's the way a regular userspace library would probably do things - but for the kernel/userspace interface, it's quite wrong. Even if you manage to get your function code copied into the kernel, and even if you make it suitably position-independent, it's still wrong, because the kernel and userspace code execute in fundamentally different contexts. For just one example of the differences that might cause problems, if a page fault happens in kernel context due to a swapped-out page, that'll cause a kernel oops rather than swapping the page in.
The correct approach is for the kernel to make some file descriptor readable when the asynchronous job has finished (in your case, this file descriptor will almost certainly be the character device your driver provides). The userspace process can then wait for this event with select / poll, or with read - it can set the file descriptor non-blocking if it wants, and basically just use all the standard UNIX tools for dealing with this case. This, after all, is how the asynchronous nature of network sockets (and pretty much every other asynchronous case) is handled.
If you need to provide additional information about the event that occurred, that can be made available to the userspace process when it calls read on the readable file descriptor.
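On the userspace side this could look roughly like the following sketch. wait_for_event is a made-up helper; the descriptor would come from opening your driver's character device, and a pipe can stand in for it when trying this out.

```c
#include <poll.h>
#include <unistd.h>

/* Wait until fd becomes readable.
 * Returns 1 if readable, 0 on timeout, -1 on error or hang-up without data. */
int wait_for_event(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;                 /* 0 = timeout, -1 = poll() failed */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

After wait_for_event returns 1, a plain read on the descriptor fetches whatever details the driver chose to report about the event.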
A function isn't just an object you can copy. What about cross-references, symbols and so on? Of course you can take something like the standard Linux "binutils" package and torture your binaries, but is that what you want?
By the way, if you are simply trying to replace the memcpy() implementation, look into the LD_PRELOAD mechanism.
I can think of a way to accomplish what you want, but I won't tell you because it's a horrific abuse of the language.
A cleaner method than disabling optimizations and relying on the compiler to maintain the order of functions is to arrange for that function (or a group of functions that need copying) to be in its own section. This is compiler and linker dependent, and you'll also need to use relative addressing if you call between the functions that are copied. For those asking why you would do this: it's a common requirement in embedded systems that need to update the running code.
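A sketch of the own-section approach, assuming GCC with GNU ld on an ELF target. The section name ramfunc and the function fast_add are made up. Because the section name is a valid C identifier, GNU ld provides __start_ramfunc and __stop_ramfunc, which bound everything placed in it; with a custom linker script you would define equivalent symbols yourself.

```c
#include <stddef.h>

/* Put the function (or several) into a dedicated section. */
__attribute__((section("ramfunc"), noinline, used))
int fast_add(int a, int b)
{
    return a + b;
}

/* Section bounds, generated by GNU ld for identifier-named sections. */
extern const unsigned char __start_ramfunc[];
extern const unsigned char __stop_ramfunc[];

/* Number of bytes to copy for the whole group of functions. */
size_t ramfunc_size(void)
{
    return (size_t)(__stop_ramfunc - __start_ramfunc);
}
```

The copy itself (memcpy from __start_ramfunc into executable RAM) and the call through an adjusted function pointer are target-specific and left out here; they only work if the code in the section is position-independent.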
My suggestion is: don't.
Injecting code into kernel space is such an enormous security hole that most modern OSes forbid self-modifying code altogether.
As near as I can tell, the original poster wants to do something that is implementation-specific, and so not portable; this is going off what the C++ standard says on the subject of casting pointers-to-functions, rather than the C standard, but that should be good enough here.
In some environments, with some compilers, it might be possible to do what the poster seems to want to do (that is, copy a block of memory that is pointed to by the pointer-to-function to some other location, perhaps allocated with malloc, cast that block to a pointer-to-function, and call it directly). But it won't be portable, which may not be an issue. Finding the size required for that block of memory is itself dependent on the environment, and compiler, and may very well require some pretty arcane stuff (e.g., scanning the memory for a return opcode, or running the memory through a disassembler). Again, implementation-specific, and highly non-portable. And again, may not matter for the original poster.
The links to potential solutions all appear to make use of implementation-specific behaviour, and I'm not even sure that they do what they purport to do, but they may be suitable for the OP.
Having beaten this horse to death, I am curious to know why the OP wants to do this. It would be pretty fragile even if it works in the target environment (e.g., could break with changes to compiler options, compiler version, code refactoring, etc). I'm glad that I don't do work where this sort of magic is necessary (assuming that it is)...
I have done this on a Nintendo GBA, where I copied some low-level render functions from flash (slowish memory with 16-bit access) to the high-speed work RAM (32-bit access, at least twice as fast). This was done by taking the address of the function immediately after the function I wanted to copy: size = (int) (NextFuncPtr - SourceFuncPtr). This did work well but obviously can't be guaranteed on all platforms (it does not work on Windows for sure).
I think one solution can be as below.
For example, if you want to know the size of func() in program a.c, place indicators before and after the function.
Try writing a Perl script that compiles this file to an object file (cc -c); make sure that the pre-processor statements are not removed, since you need them later on to calculate the size from the object file.
Now search for your two indicators and find out the code size in between.
