Code that belongs to the same section but different subsections has its order of placement defined by the subsection number. I need to use this feature in a C program, i.e. I need two functions to be in the same section and in a particular order. GCC reorders functions in the same section as it pleases, which is why I need subsections. Here is the syntax for sections; I can't figure out how to specify subsections using the __attribute__ syntax.
void func1() __attribute__ ((section ("mysection")));
See Jester's comment below for assembly syntax. I am using gcc, so I am assuming gas assembler?
Here is a long explanation of why I have gotten to the point of needing subsections. Maybe one of my conclusions along the way was incorrect and you can help me avoid this.
Q: Why not create separate sections and load them contiguously?
A: I have a separate problem where I need to be able to figure out the exact beginning address of my functions ahead of time.
Q: Why do you need to know the address?
A: I want to align some code in my functions (not the function itself) to a particular alignment
Q: Why not use .align?
A: I have found that using .align inside a C function for some reason forces the function itself to be aligned to that value, and I do not want that. So I have come up with an ugly macro alternative to the .align directive:
b 1f
. = . + (1 << "#alignment") - (("#section_start" + .) & ((1 << "#alignment") - 1))
1:
Q: Why not use labels to calculate your current location? Or a label in the loader file?
A: Assembler doesn't let me - I have to use the dot operator.
Q: Tell me again why you need section_start here?
A: The dot operator is relative to the start of the section, it is not the absolute address
Q: Why are you trying this low-level stuff in C? This is dumb.
A: I agree this is dumb, but play along.
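The padding expression in the macro above can be sanity-checked with ordinary integer arithmetic. This is only a sketch: align_padding is a hypothetical helper, not part of any toolchain, and it assumes that dot is the section-relative offset at the point where the ". = ..." statement is evaluated.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the macro's expression:
 *   pad = (1 << alignment) - ((section_start + dot) & ((1 << alignment) - 1))
 *
 * section_start: absolute load address of the section (assumed known)
 * dot:           offset of the current location within that section
 * alignment:     log2 of the desired alignment
 */
uint32_t align_padding(uint32_t section_start, uint32_t dot, unsigned alignment)
{
    uint32_t size = (uint32_t)1 << alignment;
    uint32_t mask = size - 1;
    return size - ((section_start + dot) & mask);
}
```

With section_start = 0x1000, dot = 4 and alignment = 4, the expression yields 12, so the code after the gap starts at 0x1010. A location that is already aligned gets a full 16 bytes of padding rather than zero; the "b 1f" skips the gap in either case.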
I can't figure out how to use subsections, but I believe this GCC option forces function order, and I seem to have at least one example where it fixes the ordering in my test. I am slightly concerned about having to set -fno-section-anchors as well (seems like you can't only use -fno-toplevel-reorder), but this might be the best workaround I have right now.
One problem with this approach is that I lose the ability to place each function in separate sections - which has the benefit of allowing me to use the linker script to calculate the end of functions (also useful to me).
https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
-fno-toplevel-reorder -fno-section-anchors
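A minimal sketch of how this could be tried out, assuming GCC and an ELF target. The function bodies, the file name in the comment and the helper func1_precedes_func2 are made up for illustration; the source-order placement is only expected when the flags quoted above are passed.

```c
#include <stdint.h>

/* Build (assumption: GCC, ELF target, hypothetical file name order.c):
 *   gcc -O2 -fno-toplevel-reorder -fno-section-anchors -c order.c
 */

/* Same attribute syntax as in the question, applied to both declarations. */
int func1(void) __attribute__((section("mysection")));
int func2(void) __attribute__((section("mysection")));

int func1(void) { return 1; }
int func2(void) { return 2; }

/* Hypothetical check: returns 1 if func1 ended up at a lower address than
 * func2. Converting function pointers to uintptr_t is implementation-defined,
 * but works on the usual GCC targets. */
int func1_precedes_func2(void)
{
    return (uintptr_t)&func1 < (uintptr_t)&func2;
}
```

Calling func1_precedes_func2() from a test build with and without the flags is one way to see whether the option changes the layout for a given compiler version.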
I'm trying to gain some insight into how Apple's OS signpost implementation works. I'm working with the C API (there is also a Swift API). My ultimate goal is to build a RAII-style C++ wrapper class for them, which is harder than it might seem.
Expanding the os_signpost_emit_with_type macro reveals that it creates static strings from the string literals passed to that macro that look like this:
__attribute__((section("__TEXT,__oslogstring,cstring_literals"), internal_linkage)) static const char string_name[] __asm (OS_STRINGIFY(OS_CONCAT(LOS_##_ns, __COUNTER__))) = "string literal";
These strings will later appear as names for the signposts in the Instruments profiler. What I get from reading that code is that the string is placed in a specific section of the binary so that the profiler can find it. What's confusing me is the __asm statement before the assignment. Obviously via the __COUNTER__ macro, it expands to something like __asm ("LOS_##_ns0"), __asm ("LOS_##_ns1") with the number being unique for every string. I have very little in-depth knowledge when it comes to assembly. I tried to research the meaning of that statement a bit but got no useful results.
My trial-and-error testing revealed that the uniqueness of the numeric suffix generated by the __COUNTER__ macro matters: if two strings end up with the same value, one of them will shadow the other in the profiler output.
Can anyone with assembly know-how explain what's going on here to a C++ developer like me?
Bonus question: Would there be any way to generate that instruction from within C++ code where the unique numerical value generated here by __COUNTER__ would be taken from some variable?
A general note: for information on clang extensions, you generally have to refer to the gcc documentation instead. clang aims to be compatible with gcc and so they didn't bother to write independent docs.
So in your example, a few different extensions are being used. Note that none of them are part of standard C or C++.
__attribute__((section("foo"))) places the variable in the section named foo, by having the compiler emit a .section directive into the assembly before placing the label for the variable. See https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes. It sounds like you already know about this.
asm in a declaration isn't really inline assembly per se; it simply tells the compiler what symbol name to use for this variable when it emits the assembly code. The __asm is just a variant spelling of asm. See https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Asm-Labels.html#Asm-Labels. So int foo asm("bar") = 7; defines a variable which will be referred to as foo in C source, but whose label in assembly will be named bar.
__COUNTER__ is a special macro defined by the gcc/clang preprocessor that simply increments every time it is expanded. See https://gcc.gnu.org/onlinedocs/cpp/Common-Predefined-Macros.html#Common-Predefined-Macros
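To see the three extensions working together, here is a small sketch loosely modelled on the expansion quoted in the question. Everything prefixed DEMO_ or demo_ is a made-up name, and the section name uses the ELF form; on Mach-O the "segment,section" form from the question would be needed instead.

```c
#include <string.h>

#define DEMO_STRINGIFY_(x) #x
#define DEMO_STRINGIFY(x) DEMO_STRINGIFY_(x)
#define DEMO_CONCAT_(a, b) a##b
#define DEMO_CONCAT(a, b) DEMO_CONCAT_(a, b)

/* Hypothetical macro, not Apple's:
 * - section("demo_strings"): place the array in a named section
 *   (ELF-style name, an assumption).
 * - used: keep the otherwise unreferenced static from being discarded.
 * - __asm__("demo_str_N"): the symbol name at assembly level, with N taken
 *   from __COUNTER__ so that every expansion gets a distinct label.
 */
#define DEMO_STRING(c_name, text)                                        \
    __attribute__((section("demo_strings"), used))                       \
    static const char c_name[]                                           \
        __asm__(DEMO_STRINGIFY(DEMO_CONCAT(demo_str_, __COUNTER__))) = text

DEMO_STRING(first_name, "first interval");
DEMO_STRING(second_name, "second interval");

/* C code keeps using the C identifiers; only the assembly output and the
 * symbol table see the demo_str_N names. */
size_t first_name_length(void)
{
    return strlen(first_name);
}
```

Regarding the bonus question: the asm label has to be a string literal, so the number must be produced at preprocessing time (as __COUNTER__ does here) and cannot come from a runtime variable.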
Recently, when working on a project, I had a need to measure the size of a C function in order to be able to copy it somewhere else, but was not able to find any "clean" solutions (ultimately, I just wanted to have a label inserted at the end of the function that I could reference).
Having written the LLVM backend for this architecture (while it may look like ARM, it isn't) and knowing that it emitted assembly code for that architecture, I opted for the following hack (I think the comment explains it quite well):
/***************************************************************************
* if ENABLE_SDRAM_CALLGATE is enabled, this function should NEVER be called
* from C code as it will corrupt the stack pointer, since it returns before
* its epilog. this is done because clang does not provide a way to get the
* size of the function so we insert a label with inline asm to measure the
* function. in addition to that, it should not call any non-forceinlined
* functions to avoid generating a PC relative branch (which would fail if
* the function has been copied)
**************************************************************************/
void sdram_init_late(sdram_param_t* P) {
/* ... */
#ifdef ENABLE_SDRAM_CALLGATE
asm(
"b lr\n"
".globl sdram_init_late_END\n"
"sdram_init_late_END:"
);
#endif
}
It worked as desired but required some assembler glue code in order to call it and is a pretty dirty hack that only worked because I could assume several things about the code generation process.
I've also considered other ways of doing this which would work better if LLVM was emitting machine code (since this approach would break once I added an MC emitter to my LLVM backend). The approach I considered involved taking the function and searching for the terminator instruction (which would either be a b lr instruction or a variation of pop ..., lr) but that could also introduce additional complications (though it seemed better than my original solution).
Can anyone suggest a cleaner way of getting the size of a C function without having to resort to incredibly ugly and unreliable hacks such as the ones outlined above?
I think you're right that there aren't any truly portable ways to do this. Compilers are allowed to re-order functions, so taking the address of the next function in source order isn't safe (but does work in some cases).
If you can parse the object file (maybe with libbfd), you might be able to get function sizes from that.
clang's asm output has this metadata (the .size assembler directive after every function), but I'm not sure whether it ends up in the object file.
int foo(int a) { return a * a * 2; }
## clang-3.8 -O3 for amd64:
## some debug-info lines manually removed
.globl foo
foo:
.Lfunc_begin0:
.cfi_startproc
imul edi, edi
lea eax, [rdi + rdi]
ret
.Lfunc_end0:
.size foo, .Lfunc_end0-foo ####### This line
Compiling this to a .o with clang-3.8 -O3 -Wall -Wextra func-size.c -c, I can then do:
$ readelf --symbols func-size.o
Symbol table '.symtab' contains 4 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS func-size.c
2: 0000000000000000 0 SECTION LOCAL DEFAULT 2
3: 0000000000000000 7 FUNC GLOBAL DEFAULT 2 foo ### This line
The three instructions total 7 bytes, which matches up with the size output here. It doesn't include the padding to align the entry point, or the next function: the .align directives are outside the two labels that are subtracted to calculate the .size.
This probably doesn't work well for stripped executables, where even global functions are no longer present in the executable's symbol table. So you might need a two-step build process:
compile your "normal" code
get sizes of functions you care about into a table, using readelf | some text processing > sizes.c
compile sizes.c
link everything together
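The "some text processing" step could be sketched roughly as below. This is an assumption-laden illustration, not a finished tool: struct func_size, parse_readelf_func_line and format_size_entry are hypothetical names, and the parser assumes the column layout and decimal Size column shown in the readelf listing above.

```c
#include <stdio.h>
#include <string.h>

/* One row of the generated table: function name and size in bytes. */
struct func_size {
    char name[128];
    unsigned long size;
};

/* Parses one row of `readelf --symbols` output of the form
 *   Num: Value Size Type Bind Vis Ndx Name
 * Returns 1 and fills *out for FUNC symbols, 0 for every other line
 * (headers, FILE/SECTION entries, rows without a name). */
int parse_readelf_func_line(const char *line, struct func_size *out)
{
    unsigned long value, size;
    char type[16], bind[16], vis[16], ndx[16], name[128];

    int fields = sscanf(line, " %*d: %lx %lu %15s %15s %15s %15s %127s",
                        &value, &size, type, bind, vis, ndx, name);
    if (fields != 7 || strcmp(type, "FUNC") != 0)
        return 0;

    strcpy(out->name, name);
    out->size = size;
    return 1;
}

/* Formats one initializer row for a generated sizes.c table. */
int format_size_entry(char *buf, size_t cap, const struct func_size *fs)
{
    return snprintf(buf, cap, "    { \"%s\", %lu },", fs->name, fs->size);
}
```

A small driver would read the readelf output line by line, call the parser on each line, and print the formatted rows between the opening and closing lines of an array definition in sizes.c.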
Caveat
A really clever compiler could compile multiple similar functions to share a common implementation. So one of the functions jumps into the middle of the other function body. If you're lucky, all the functions are grouped together, with the "size" of each measuring from its entry point all the way to the end of the blocks of code it uses. (But that overlap would make the total sizes add up to more than the size of the file.)
Current compilers don't do this, but you can prevent it by putting the function in a separate compilation unit, and not using whole-program link-time optimization.
A compiler could decide to put a conditionally-executed block of code before the function entry point, so the branch can use a shorter encoding for a small displacement. This makes that block look like a static "helper" function which probably wouldn't be included in the "size" calculation for function. Current compilers never do this, either, though.
Another idea, which I'm not confident is safe:
Put an asm volatile with just a label definition at the end of your function, and then assume the function size is at most that + 32 bytes or something. So when you copy the function, you allocate a buffer 32B larger than your "calculated" size. Hopefully there's only a "ret" insn beyond the label, but actually it probably goes before the function epilogue which pops all the call-preserved registers it used.
I don't think the optimizer can duplicate an asm volatile statement, so it would force the compiler to jump to a common epilogue instead of duplicating the epilogue like it might sometimes for early-out conditions.
But I'm not sure there's an upper bound on how much could end up after the asm volatile.
I'm pretty new to C programming and I have the following program to debug. The problem is, I have no idea what these lines of code even mean. Could anyone point me in the right direction as to what they mean in terms of syntax and functionality? What does the code do? The code is compiled with MPLAB C30 v3.23 or higher.
fractional abcCoefficient[3] __attribute__ ((space(xmemory))); /*ABC Coefficients loaded from X memory*/
fractional controlHistory[3] __attribute__ ((space(ymemory))); /*Control History loaded from Y memory*/
fractional kCoeffs[] = {0,0,0}; /*Kp,Ki,and Kd gains array initialized to zero*/
These lines declare variables; there's no execution code associated with what you've pasted.
The environment this code is intended for understands that fractional is a type; either in the same file or in a header this file includes (directly or indirectly), fractional will be defined with a typedef statement. In your examples, each of the variables is an array of three fractional values.
The __attribute__ ((space(?memory))) entries are attributes understood by the compiler this code is meant to be built with, and they affect how the variables are managed. You'll want to consult the compiler documentation for the platform you're using.
See the GCC documentation to learn about __attribute__. However, I don't see a space(xmemory) option in there, so consult your compiler's documentation if it's not GCC; if it is GCC, then space() may be a macro.
fractional is also a custom type, search for typedef definitions for fractional.
Basically, the code is creating a bunch of arrays of type fractional. The first two make use of gcc's attribute extension (or that of whatever compiler you are using), and the last one is initialized to 0 in every position.
The first two lines declare arrays with three elements each. The type is fractional, which is probably a typedef (to a struct with numerator and denominator?).
The comments suggest that the data is stored in another memory space, perhaps some sort of Flash.
So the program seems to be for an embedded system.
It looks like "fractional" is a custom type, look for its typedef somewhere and it should get you started on what you're looking at. I expect these are variable declarations.
Macros are established using the "#define" preprocessor directive, so you can look for "#define space(x) code" somewhere to tell you what it does. Good luck.
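To make the reading of these declarations concrete, here is a host-compilable sketch. It rests on two loud assumptions: fractional is taken to be a 16-bit integer typedef (the real definition lives in the vendor headers), and TARGET_HAS_XY_MEMORY is a hypothetical, user-defined switch that would only be set when building with a compiler that knows the space() attribute.

```c
#include <stdint.h>

/* Assumption: a 16-bit fixed-point type; check the vendor header for the
 * actual typedef. */
typedef int16_t fractional;

/* Hypothetical switch: apply the memory-space attributes only where the
 * compiler understands them, and expand to nothing elsewhere. */
#ifdef TARGET_HAS_XY_MEMORY
#define IN_X_MEMORY __attribute__((space(xmemory)))
#define IN_Y_MEMORY __attribute__((space(ymemory)))
#else
#define IN_X_MEMORY
#define IN_Y_MEMORY
#endif

fractional abcCoefficient[3] IN_X_MEMORY; /* three elements, placed in X memory */
fractional controlHistory[3] IN_Y_MEMORY; /* three elements, placed in Y memory */
fractional kCoeffs[] = {0, 0, 0};         /* size 3 deduced from the initializer */
```

Without the switch the attributes vanish and the three lines are plain array definitions, which shows that the attributes only influence placement, not the type or size of the variables.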
I want to use the gcc preprocessor to write almost the same declaration 500 times. Let's say, for demonstration purposes, I would like to use a macro FOR_MACRO:
#define FOR_MACRO(x) \
#for i in {1 ... x}: \
const int arr_len_##x[i] = {i};
and calling FOR_MACRO(100) will be converted into:
const int arr_len_1[1] = {1};
const int arr_len_2[2] = {2};
...
const int arr_len_100[100] = {100};
This is not a good idea:
While possible in principle, using the preprocessor means you have to manually unroll the loop at least once, you end up with some arbitrary implementation-defined limit on loop depth and all statements will be generated in a single line.
Better use the scripting language of your choice to generate the code (possibly in a separate includeable file) and integrate that with your build process.
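The generator does not have to be a script; a tiny C program run as a build step works as well. A minimal sketch, with format_decl and emit_decls as made-up helper names:

```c
#include <stdio.h>

/* Formats the declaration for index i,
 * e.g. "const int arr_len_2[2] = {2};" for i == 2. */
int format_decl(char *buf, size_t cap, int i)
{
    return snprintf(buf, cap, "const int arr_len_%d[%d] = {%d};", i, i, i);
}

/* Writes the declarations for 1..count to out, one per line. */
void emit_decls(FILE *out, int count)
{
    char line[64];
    for (int i = 1; i <= count; ++i) {
        format_decl(line, sizeof line, i);
        fprintf(out, "%s\n", line);
    }
}
```

A main function that calls emit_decls(stdout, 100), with its output redirected to a file that the real source then includes, would produce exactly the declarations listed in the question.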
You can use Order-PP for this, if you desperately need to.
It's a scripting language implemented in the preprocessor. This means it's conceptually similar to using a scripting language to generate C code (in fact, the same) except there are no external tools and the script runs at the same time as the C compiler: everything is done with C macros. Despite being built on the preprocessor, there are no real limits to loop iterations, recursion depth, or anything like that (the limit is somewhere in the billions, you don't need to worry about it).
To emit the code requested in the question example, you could write:
#include <order/interpreter.h>
ORDER_PP( // runs Order code
8for_each_in_range(8fn(8I,
8print( 8cat(8(const int arr_len_), 8I)
([) 8I (] = {) 8I (};) )),
1, 101)
)
I can't fathom why you would do this instead of simply integrating an external language like Python into your build process (Order might be implemented using macros, but it's still a separate language to understand), but the option is there.
Order only works with GCC as far as I know; other preprocessors run out of stack too quickly (even Clang), or are not perfectly standard-compliant.
Instead of providing you with a solution for exactly your problem, are you sure it cannot be handled in a better way?
Maybe it would be better to
use one array with one more dimension
fill the data with the help of an array at runtime, as you obviously want to fill out the first entry of each array. If you leave the array uninitialized, it will (provided it is defined on module level) be put into .bss segment instead of .data and will probably need less space in the binary file.
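Both suggestions combined might look like the sketch below. ARR_COUNT and arr_len_init are illustrative names, and the obvious trade-off is that the data can no longer be const.

```c
#define ARR_COUNT 100

/* No initializer: objects with static storage are zero-initialized and
 * typically end up in .bss rather than .data. */
static int arr_len[ARR_COUNT][ARR_COUNT];

/* Row i stands in for arr_len_(i+1) from the question; only its first
 * element is set, the rest stays zero. */
void arr_len_init(void)
{
    for (int i = 0; i < ARR_COUNT; ++i)
        arr_len[i][0] = i + 1;
}
```

Every row has the full ARR_COUNT capacity here, so this spends more RAM than the individually sized arrays would, in exchange for a smaller binary image.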
You could use e.g. P99 to do such preprocessor code unrolling. But because of the limited capabilities of the preprocessor this comes with a limit, and that limit is normally way below 500.
#define _FUID1(x) __attribute__((section("__FUID1.sec"),space(prog))) int _FUID1 = (x);
I am trying to make sense of the above define, the _FUID1(x) macro. Does it relate to program memory, with the section attribute placing the variable in the code memory area?
What is the above trying to accomplish?
The macro isn't doing anything interesting or complicated at all; it just outputs a declaration for int _FUID1, with its parameter as an initializer, and with an attributes list ahead of it.
As for what the attributes list means, look at the documentation for variable attributes in GCC. section puts the variable in a named section, which allows the linker to relocate it to a special address or do some other interesting thing to it, and space isn't documented, but space(prog) sounds like a directive to put a value into the program address space instead of the data address space on a Harvard-architecture machine.
I think this is hardware specific (some Microchip unit), it places a value, for example:
__attribute__((section("__FUID1.sec"),space(prog))) int _FUID1 = (0xf1);
into unit ID register 1 (__FUID1.sec), in the program flash, to configure the hardware. See the PIC documentation (for references to FUID) and the MPLAB C30 manual (for a description of memory spaces).