Do gcc function attributes affect function pointers? - c

Do gcc's function attributes extensions have any effect when used on the type of function pointers?
The function attributes can be used when declaring the type of function pointers but at least some of them seem to have no effect.
https://stackoverflow.com/a/28740576/1128289
The gcc documentation itself does not address this issue.

Very generally, the C standard basically says that handling an object over a pointer whose type is not aligned with the type of the object itself generates undefined behaviour - There are many exceptions to this general rule, but, apparently, none of them seems to apply to your case.
That means we're moving on very unsafe grounds here, in the first place.
First, you need to distinguish function attributes into two classes:
Function attributes that actually change something in the behavior or location of the function itself like aligned or interrupt- A function that is not attributed that way will not change its inner code once you declare a pointer to it interrupt, for example (the code that is generated for the function would need to dynamically change - for example, a "return from interrupt" instruction replaced by a "return normally" one - depending on along what type of pointer it would have been called). This is obviously not possible.
Function attributes that tell the calling code something about the behaviour that can be expected from the function - like noreturn or malloc, for example. Those attributes sometimes might not modify the function code itself (you'll never know, however...), but rather tell the calling code something about assumptions it can make in order to optimise. These assumptions will affect the calling function only and thus can be triggered by tweaking a pointer (you don't even need a pointer to do that, a modified function prototype should suffice - for the language, that would actually turn out to have the same effect). If you tell the compiler it can make such assumptions and those turn out not to be true, this will, however, lead to all sorts of things go wrong. After all, C makes the programmer responsible for making sure a pointer points to the right type of thing.
There are, however, function attributes that actually restrict the amount of assumptions a calling function would be allowed to make (the most obvious would be malloc that tells calling code it returns yet untyped and uninitialized memory). Those should be relatively safe to use (I do, however, fail to come up with a use case atm)
Without very detailed knowledge on what belongs into (1) or (2) above, and what exactly a function attribute might affect in both called and calling code (which would be very difficult to achieve, because I don't recall to ever have seen that documented in detail), you thus will not be able to decide whether this pointer tweaking actually is possible and what side effects it might generate.
Also, in my opinion, there is not much of a difference between tweaking a pointer to a function and calling an external function with a (deliberately) wrong prototype. This might give you some insight on what you are actually trying to do...

Related

Can a function know what's calling it?

Can a function tell what's calling it, through the use of memory addresses maybe? For example, function foo(); gets data on whether it is being called in main(); rather than some other function?
If so, is it possible to change the content of foo(); based on what is calling it?
Example:
int foo()
{
if (being called from main())
printf("Hello\n");
if (being called from some other function)
printf("Goodbye\n");
}
This question might be kind of out there, but is there some sort of C trickery that can make this possible?
For highly optimized C it doesn't really make sense. The harder the compiler tries to optimize the less the final executable resembles the source code (especially for link-time code generation where the old "separate compilation units" problem no longer prevents lots of optimizations). At least in theory (but often in practice for some compilers) functions that existed in the source code may not exist in the final executable (e.g. may have been inlined into their caller); functions that didn't exist in the source code may be generated (e.g. compiler detects common sequences in many functions and "out-lines" them into a new function to avoid code duplication); and functions may be replaced by data (e.g. an "int abcd(uint8_t a, uint8_t b)" replaced by a abcd_table[a][b] lookup table).
For strict C (no extensions or hacks), no. It simply can't support anything like this because it can't expect that (for any compiler including future compilers that don't exist yet) the final output/executable resembles the source code.
An implementation defined extension, or even just a hack involving inline assembly, may be "technically possible" (especially if the compiler doesn't optimize the code well). The most likely approach would be to (ab)use debugging information to determine the caller from "what the function should return to when it returns".
A better way for a compiler to support a hypothetical extension like this may be for the compiler to use some of the optimizations I mentioned - specifically, split the original foo() into 2 separate versions where one version is only ever called from main() and the other version is used for other callers. This has the bonus of letting the compiler optimize out the branches too - it could become like int foo_when_called_from_main() { printf("Hello\n"); }, which could be inlined directly into the caller, so that neither version of foo exists in the final executable. Of course if foo() had other code that's used by all callers then that common code could be lifted out into a new function rather than duplicating it (e.g. so it might become like int foo_when_called_from_main() { printf("Hello\n"); foo_common_code(); }).
There probably isn't any hypothetical compiler that works like that, but there's no real reason you can't do these same optimizations yourself (and have it work on all compilers).
Note: Yes, this was just a crafty way of suggesting that you can/should refactor the code so that it doesn't need to know which function is calling it.
Knowing who called a specific function is essentially what a stack trace is visualizing. There are no general standard way of extracting that though. In theory one could write code that targeted each system type the software would run on, and implement a stack trace function for each of them. In that case you could examine the stack and see what is before the current function.
But with all that said and done, the question you should probably ask is why? Writing a function that functions in a specific way when called from a specific function is not well isolated logic. Instead you could consider passing in a parameter to the function that caused the change in logic. That would also make the result more testable and reliable.
How to actually extract a stack trace has already received many answers here: How can one grab a stack trace in C?
I think if loop in C cannot have a condition as you have mentioned.
If you want to check whether this function is called from main(), you have to do the printf statement in the main() and also at the other function.
I don't really know what you are trying to achieve but according to what I understood, what you can do is each function will pass an additional argument that would uniquely identify that function in form of a character array, integer or enumeration.
for example:
enum function{main, add, sub, div, mul};
and call functions like:
add(3,5,main);//adds 3 and 5. called from main
changes to the code would be typical like if you are adding more functions. but it's an easier way to do it.
No. The C language does not support obtaining the name or other information of who called a function.
As all other answers show, this can only be obtained using external tools, for example that use stack traces and compiler/linker emitted symbol tables.

What does an opaque function call mean in compiler optimization?

What does an opaque function call mean in the compiler optimization?I'v found it in Why do global variables cause trouble for compiler optimizations in function calls?, and 'opaque function call' really confuses me.
It seems that an opaque function call is the function call that the compiler has no information about it. But what does it mean?
As you mention in the question, an opaque function call is a call to a function that the compiler has no prior information about. This implies that the compiler can make no assumptions about the side effects of this call except what is guaranteed by the language definition. For example, since the compiler has no other information it must assume that the function call can modify any global variable and must ensure that any local changes are stored before the call, it must also reload global variables used after the call. Further, the compiler can never skip making a call to this function even when calling it looks useless since there is no way for the compiler to know this for certain.

What could happen when you call function returning int with void (*)() pointer?

I would like to know what could happen in a situation like this:
int foo()
{
return 1;
}
void bar()
{
void(*fPtr)();
fPtr = (void(*)())foo;
fPtr();
}
Address of function returning int is assigned to pointer of void(*)() type and the function pointed is called.
What does the standard say about it?
Regardless of answer to 1st question: Are we safe to call the function like this? In practise shouldnt the outcome be just that callee (foo) will put something in EAX / RAX and caller (bar) will just ignore the rax content and go on with the program? I'm interested in Windows calling convention x86 and x64.
Thanks a lot for your time
1)
From the C11 standard - 6.5.2.2 - 9
If the function is defined with a type that is not compatible with the type (of the expression) pointed to by the expression that denotes the called function, the behavior is undefined
It is clearly stated that if a function is called using a pointer of type that does not match the type it is defined with, it leads to Undefined Behavior.
But the cast is okay.
2)
Regarding your second question - In case of a well defined Calling convention XXX and implementation YYYY -
You might have disassembled a sample program (even this one) and figured out that it "works". But there are slight complications. You see, the compilers these days are very smart. There are some compilers which are capable of performing precise inter procedural analysis. Some compiler might figure out that you have behavior that is not defined and it might make some assumption that might break the behavior.
A simple example -
Since the compiler sees that this function is being called with type void(*)(), it will assume that it is not supposed to return anything, and it might remove the instructions required to return the correct value.
In this case other functions calling this functions (in a right way) will get a bad value and thus it would have visible bad effects.
PS: As pointed out by #PeterCordes any modern, sane and useful compiler won't have such an optimization and probably it is always safe to use such calls. But the intent of the answer and the example (probably too simplistic) is to remind that one must tread very carefully when dealing with UBs.
What happens in practice depends a lot on how the compiler implements this. You're assuming C is just a thin ("obvious") layer over asm, but it isn't.
In this case, a compiler can see that you're calling a function through a pointer with the wrong type (which has undefined behavior1), so it could theoretically compile bar() to:
bar:
ret
A compiler can assume undefined behavior never happens during the execution of a program. Calling bar() always results in undefined behavior. Therefore the compiler can assume bar is never called and optimize the rest of the program based on that.
1 C99, 6.3.2.3/8:
If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.
About sub-question 2:
Nearly all x86 calling conventions I know (cdecl, stdcall, syscall, fastcall, pascal, 64-bit Windows and 64-bit Linux) will allow void functions to modify the ax/eax/rax register and the difference between an int function and a void function is only that the returned value is passed in the eax register.
The same is true for the "default" calling convention on most other CPUs I have already worked with (MIPS, Sparc, ARM, V850/RH850, PowerPC, TriCore). The register name is not eax but different, of course.
So when using these calling convention you can safely call the int function using a void pointer.
There are however calling conventions where this is not the case: I've read about a calling convention that implicitly use an additional argument for non-void functions...
At the asm level only, this is safe in all normal x86 calling conventions for integer types: eax/rax is call-clobbered, and the caller doesn't have to do anything differently to call a void function vs. an int function and ignoring the return value.
For non-integer return types, this is a problem even in asm. Struct returns are done via a hidden pointer arg that displaces the other args, and the caller is going to store through it so it better not hold garbage. (Assuming the case is more complex than the one shown here, so the function doesn't just inline when optimization is enabled.) See the Godbolt link below for an example of calling through a casted function pointer that results in a store through a garbage "pointer" in rdi.
For legacy 32-bit code, FP return values are in st(0) on the x87 stack, and it's the caller's responsibility to not leave the x87 stack unbalanced. float / double / __m128 return values are safe to ignore in 64-bit ABIs, or in 32-bit code using a calling convention that returns FP values in xmm0 (SSE/SSE2).
In C, this is UB (see other answers for quotes from the standard). When possible / convenient, prefer a workaround (see below).
It's possible that future aggressive optimizations based on a no-UB assumption could break code like this. For example, a compiler might assume any path that leads to UB is never taken, so an if() condition that leads to this code running must always be false.
Note that merely compiling bar() can't break foo() or other functions that don't call bar(). There's only UB if bar() ever runs, so emitting a broken externally-visible definition for foo() (like #Ajay suggests) is not a possible consequence. (Except maybe if you use whole-program optimization and the compiler proves that bar() is always called at least once.) The compiler can break functions that call bar(), though, at least the parts of them that lead to the UB.
However, it is allowed (by accident or on purpose) by many current compilers for x86. Some users expect this to work, and this kind of thing is present in some real codebases, so compiler devs may support this usage even if they implement aggressive optimizations that would otherwise assume this function (and thus all paths that lead to it in any callers) never run. Or maybe not!
An implementation is free to define the behaviour in cases where the ISO C standard leaves the behaviour undefined. However, I don't think gcc/clang or any other compiler explicitly guarantees that this is safe. Compiler devs might or might not consider it a compiler bug if this code stopped working.
I definitely can't recommend doing this, because it may well not continue to be safe. Hopefully if compiler devs decide to break it with aggressive no-UB-assuming optimizations, there will be options to control which kinds of UB are assumed not to happen. And/or there will be warnings. As discussed in comments, whether to take a risk of possible future breakage for short-term performance / convenience benefits depends on external factors (like will lives be at risk, and how carefully you plan to maintain in the future, e.g. checking compiler warnings with future compiler versions.)
Anyway, if it works, it's because of the generosity of your compiler, not because of any kind of standards guarantee. This compiler generosity may be intentional and semi-maintained, though.
See also discussion on another answer: the compilers people actually use aim to be useful, not just standards compliant. The C standard allows enough freedom to make a compliant but not very useful implementation. (Many would argue that compilers that assume no signed overflow even on machines where it has well-defined semantics have already gone past this point, though. See also What Every C Programmer Should Know About Undefined Behavior (an LLVM blog post).)
If the compiler can't prove that it would be UB (e.g. if it can't statically determine which function a function-pointer is pointing to), there's pretty much no way it can break (if the functions are ABI-compatible). Clang's runtime UB-sanitizer would still find it, but a compiler doesn't have much choice in code-gen for calling through an unknown function pointer. It just has to call the way the ABI / calling convention says it should. It can't tell the difference between casting a function pointer to the "wrong" type and casting it back to the correct type (unless you dereference the same function pointer with two different types, which means one or the other must be UB. But the compiler would have a hard time proving it, because the first call might not return. noreturn functions don't have to be marked noreturn.)
But remember that link-time optimization / inlining / constant-propagation could let the compiler see which function is pointed to even in a function that gets a function pointer as an arg or from a global variable.
Workarounds (for a function before you take its address):
If the function won't be part of Link-Time-Optimization, you could lie to the compiler and give it a prototype that matches how you want to call it (as long as you're sure you got the asm-level calling convention is compatible).
You could write a wrapper function. It's potentially less efficient (an extra jmp if it just tail-calls the original), but if it inlines then you're cloning the function to make a version that doesn't do any of the work of creating a return value. This might still be a loss if that was cheap compared to the extra I-cache / uop cache pressure of a 2nd definition, if the version that does return a value is used too.
You could also define an alternate name for a function, using linker stuff so both symbols have the same address. That way you can have two prototypes for the same block of compiler-generated machine code.
Using the GNU toolchain, you can use an attribute on a prototype to make it a weak alias (at the asm / linker level). This doesn't work for all targets; it works for ELF object files, but IDK about Windows.
// in GNU C:
int foo(void) { return 4; }
// include this line in a header if you want; weakref is per translation unit
// a definition (or prototype) for foo doesn't have to be visible.
static void foo_void(void) __attribute((weakref("foo"))); // in C++, use the mangled name
int bar_safe(void) {
void (*goo)(void) = (void(*)())foo_void;
goo();
return 1;
}
example on Godbolt for gcc7.2 and clang5.0.
gcc7.2 inlines foo through the weak alias call to foo_void! clang doesn't, though. I think that means that this is safe, and so is function-pointer casting, in gcc. Alternatively it means that this is potentially dangerous, too. >.<
clang's undefined-behaviour sanitizer does runtime function typeinfo checking (in C++ mode only) for calls through function pointers. int () is different from void (), so it will detect and report this UB on x86. (See the asm on Godbolt). It probably doesn't mean it's actually unsafe at the moment, though, because it doesn't yet detect / warn about it at compile time.
Use the above workarounds in the code that takes the address of the function, not in the code that receives a function pointer.
You want to let the compiler see a real function with the signature that it will eventually be called with, regardless of the function pointer type you pass it through. Make an alias / wrapper with a signature that matches what the function pointer will eventually be cast to. If that means you have to cast the function pointer to pass it in the first place, so be it.
(I think it's safe to create a pointer to the wrong type as long as it's not dereferenced. It's UB to even create an unaligned pointer, even if you don't dereference, but that's different.)
If you have code that needs to deref the same function pointer as int foo(args) in one place and void foo(args) in another place, you're screwed as far as avoiding UB.
C11 ยง6.3.2.3 paragraph 8:
A pointer to a function of one type may be converted to a pointer to a
function of another type and back again; the result shall compare
equal to the original pointer. If a converted pointer is used to call
a function whose type is not compatible with the referenced type, the
behavior is undefined.

Use of (apparently) empty C function

Can anyone comment on the point of the following function, which appears to do not very much:
// Returns stored values
int getDetails(const int param1[],
int* param2,
int* param3,
int* param4)
{
(void)param1;
(void)param2;
(void)param3;
(void)param4;
return 0;
}
The comment is actually there with the code. I'm thinking it must be some kind of odd stub but it is being called and I'm racking my brains to try to imagine what I'm missing.
My best hunch so far is that the function has been deprecated but not removed and the (void)param is to avoid compiler warnings about unused variables.
Statements like (void)param1; are typically used to suppress warnings about unused function parameters. (As an aside, in C++ you could also comment out or remove the parameter names.)
You're correct that the function does nothing. If other code doesn't create a pointer to it, you could safely remove it.
It's an empty function. Casts to void suppress warnings about unused parameters.
Such functions are often used when a function must be called unconditionally or a valid function pointer must be provided, but really the function has nothing to do.
I have a few such functions in my compiler's code generator. There are two code generators actually, one for x86 and the other for MIPS. I compile and link one or the other, never both at the same time.
The code generators are different internally but have the same external API. So, some functions specific to one CPU have some work to do while the same functions for the other have nothing to do. Logically, some of them are empty since there's nothing to do.
My guess (opinion - sorry!) that it could be a stub as you say. If you have a function that takes one or more function pointers to achieve something and it does not allow for a NULL (don't bother with this) then you have to provide something for it to call.
The casts are probably to avoid the "unused parameter" warning.
You're right, it has no point.
All it does is explicitly ignore the arguments by evaluating them and casting the result to (void), and return 0.
Is the return value being used in the context of the call? The best approach is of course to remove the call and replace it with a 0 if the return value is being used, and test the program.
Some Compilers shows error/warning when you are not using the arguments passed to it , to avoid that mention that like it in your code . If the function is not called any where or not assigned to any function pointers , you can remove it as it is not doing anything specific

Function pointer cast to different signature

I use a structure of function pointers to implement an interface for different backends. The signatures are very different, but the return values are almost all void, void * or int.
struct my_interface {
void (*func_a)(int i);
void *(*func_b)(const char *bla);
...
int (*func_z)(char foo);
};
But it is not required that a backends supports functions for every interface function. So I have two possibilities, first option is to check before every call if the pointer is unequal NULL. I don't like that very much, because of the readability and because I fear the performance impacts (I haven't measured it, however). The other option is to have a dummy function, for the rare cases an interface function doesn't exist.
Therefore I'd need a dummy function for every signature, I wonder if it is possible to have only one for the different return values. And cast it to the given signature.
#include <stdio.h>
int nothing(void) {return 0;}
typedef int (*cb_t)(int);
int main(void)
{
cb_t func;
int i;
func = (cb_t) nothing;
i = func(1);
printf("%d\n", i);
return 0;
}
I tested this code with gcc and it works. But is it sane? Or can it corrupt the stack or can it cause other problems?
EDIT: Thanks to all the answers, I learned now much about calling conventions, after a bit of further reading. And have now a much better understanding of what happens under the hood.
By the C specification, casting a function pointer results in undefined behavior. In fact, for a while, GCC 4.3 prereleases would return NULL whenever you casted a function pointer, perfectly valid by the spec, but they backed out that change before release because it broke lots of programs.
Assuming GCC continues doing what it does now, it will work fine with the default x86 calling convention (and most calling conventions on most architectures), but I wouldn't depend on it. Testing the function pointer against NULL at every callsite isn't much more expensive than a function call. If you really want, you may write a macro:
#define CALL_MAYBE(func, args...) do {if (func) (func)(## args);} while (0)
Or you could have a different dummy function for every signature, but I can understand that you'd like to avoid that.
Edit
Charles Bailey called me out on this, so I went and looked up the details (instead of relying on my holey memory). The C specification says
766 A pointer to a function of one type may be converted to a pointer to a function of another type and back again;
767 the result shall compare equal to the original pointer.
768 If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
and GCC 4.2 prereleases (this was settled way before 4.3) was following these rules: the cast of a function pointer did not result in NULL, as I wrote, but attempting to call a function through a incompatible type, i.e.
func = (cb_t)nothing;
func(1);
from your example, would result in an abort. They changed back to the 4.1 behavior (allow but warn), partly because this change broke OpenSSL, but OpenSSL has been fixed in the meantime, and this is undefined behavior which the compiler is free to change at any time.
OpenSSL was only casting functions pointers to other function types taking and returning the same number of values of the same exact sizes, and this (assuming you're not dealing with floating-point) happens to be safe across all the platforms and calling conventions I know of. However, anything else is potentially unsafe.
I suspect you will get an undefined behaviour.
You can assign (with the proper cast) a pointer to function to another pointer to function with a different signature, but when you call it weird things may happen.
Your nothing() function takes no arguments, to the compiler this may mean that he can optimize the usage of the stack as there will be no arguments there. But here you call it with an argument, that is an unexpected situation and it may crash.
I can't find the proper point in the standard but I remember it says that you can cast function pointers but when you call the resulting function you have to do with the right prototype otherwise the behaviour is undefined.
As a side note, you should not compare a function pointer with a data pointer (like NULL) as thee pointers may belong to separate address spaces. There's an appendix in the C99 standard that allows this specific case but I don't think it's widely implemented. That said, on architecture where there is only one address space casting a function pointer to a data pointer or comparing it with NULL, will usually work.
You do run the risk of causing stack corruption. Having said that, if you declare the functions with extern "C" linkage (and/or __cdecl depending on your compiler), you may be able to get away with this. It would be similar then to the way a function such as printf() can take a variable number of arguments at the caller's discretion.
Whether this works or not in your current situation may also depend on the exact compiler options you are using. If you're using MSVC, then debug vs. release compile options may make a big difference.
It should be fine. Since the caller is responsible for cleaning up the stack after a call, it shouldn't leave anything extra on the stack. The callee (nothing() in this case) is ok since it wont try to use any parameters on the stack.
EDIT: this does assume cdecl calling conventions, which is usually the default for C.
As long as you can guarantee that you're making a call using a method that has the caller balance the stack rather than the callee (__cdecl). If you don't have a calling convention specified the global convention could be set to something else. (__stdcall or __fastcall) Both of which could lead to stack corruption.
This won't work unless you use implementation-specific/platform-specific stuff to force the correct calling convention. For some calling conventions the called function is responsible for cleaning up the stack, so they must know what's been pushed on.
I'd go for the check for NULL then call - I can't imagine it would have any impact on performance.
Computers can check for NULL about as fast as anything they do.
Casting a function pointer to NULL is explicitly not supported by the C standard. You're at the mercy of the compiler writer. It works OK on a lot of compilers.
It is one of the great annoyances of C that there is no equivalent of NULL or void* for function pointers.
If you really want your code to be bulletproof, you can declare your own nulls, but you need one for each function type. For example,
void void_int_NULL(int n) { (void)n; abort(); }
and then you can test
if (my_thing->func_a != void_int_NULL) my_thing->func_a(99);
Ugly, innit?

Resources