What is the calling convention that clang uses?

What is the default calling convention that the clang compiler uses? I noticed that when I return a pointer to a local array, the data it points to is not lost:
#include <stdio.h>

char *retx(void) {
    char buf[4] = "buf";
    return buf;
}

int main(void) {
    char *p1 = retx();
    puts(p1);
    return 0;
}

This is Undefined Behaviour. It might happen to work, or it might not, depending on what the compiler happened to choose when compiling for some specific target. It's literally undefined, not "guaranteed to break"; that's the entire point. Compilers can just completely ignore the possibility of UB when generating code, not using extra instructions to make sure UB breaks. (If you want that, compile with -fsanitize=undefined).
Understanding exactly what happened requires looking at the asm, not just running it and seeing whether it works.
warning: address of stack memory associated with local variable 'buf' returned [-Wreturn-stack-address]
return buf;
^~~
Clang prints this warning even without -Wall enabled. Exactly because it's not legal C, regardless of what asm calling convention you're targeting.
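For reference, here is a minimal sketch of the usual well-defined alternatives (static storage, or heap memory the caller frees):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Option 1: static storage. The buffer lives for the whole program,
   but every call shares it, so this is not reentrant. */
char *retx_static(void) {
    static char buf[4] = "buf";
    return buf;
}

/* Option 2: heap allocation. The caller owns the memory and must free it. */
char *retx_malloc(void) {
    char *buf = malloc(4);
    if (buf)
        memcpy(buf, "buf", 4);
    return buf;
}

int main(void) {
    puts(retx_static());
    char *p = retx_malloc();
    if (p) {
        puts(p);
        free(p);
    }
    return 0;
}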
Clang uses the C calling convention of the target it's compiling for.¹ Different OSes on the same ISA can have different conventions, although outside of x86 most ISAs have only one major calling convention. x86 has been around so long that the original calling conventions (stack args, no register args) were inefficient, so various 32-bit conventions evolved, and Microsoft chose a different 64-bit convention from everyone else. So there's x86-64 System V, Windows x64, i386 System V for 32-bit x86, AArch64's standard convention, PowerPC's standard convention, etc. etc.
I have tested with clang several times and every time it displayed the string
The "decision" / "luck" of whether it "works" or not is made at compile time, not runtime. Compiling / running the same source multiple times with the same compiler tells you nothing.
Look at the generated asm to find out where char buf[4] ends up.
My guess: maybe you're on Windows x64. Happening to work is more plausible there than with most calling conventions, where you'd expect buf[4] to end up below the stack pointer in main, so the call to puts, and puts itself, would be very likely to overwrite it.
If you're on Windows x64 compiling with optimization disabled, retx()'s local char buf[4] might be placed in the shadow space it owns. The caller then calls puts() with the same stack alignment, so retx's shadow space becomes puts's shadow space.
And if puts happens not to write its shadow space, then the data in memory that retx stored is still there. e.g. maybe puts is a wrapper function that in turn calls another function, without initializing a bunch of locals for itself first. But not a tailcall, so it allocates new shadow space.
(But that's not what clang 8.0 does in practice with optimization disabled. It looks like buf[4] will be placed below RSP and get stepped on there. Tested using __attribute__((ms_abi)) to get Windows x64 code-gen from Linux clang: https://godbolt.org/z/2VszYg)
But it's also possible in stack-args conventions where padding is left to align the stack pointer by 16 before a call. (e.g. modern i386 System V on Linux for 32-bit x86). puts() has an arg but retx() doesn't, so maybe buf[4] ended up in memory that the caller "allocates" as padding before pushing a pointer arg for puts.
Of course that would be unsafe because the data would be temporarily below the stack pointer, in a calling convention with no red-zone. (Only a few ABIs / calling conventions have red zones: memory below the stack pointer that's guaranteed not to be clobbered asynchronously by signal handlers, exception handlers, or debuggers calling functions in the target process.)
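To see the clobbering concretely, here's a sketch: any call made after retx() returns is likely to reuse (and overwrite) the stack area that held buf. It's still UB either way, so a given build is free to print the string, garbage, or nothing at all.

#include <stdio.h>

char *retx(void) {
    char buf[4] = "buf";
    return buf;               /* UB: buf's lifetime ends here */
}

static void clobber(void) {
    volatile char junk[64];   /* likely reuses the stack retx just vacated */
    for (int i = 0; i < 64; i++)
        junk[i] = 'X';
}

int main(void) {
    char *p1 = retx();
    clobber();                /* an intervening call overwrites the dead frame */
    puts(p1);                 /* now very likely prints garbage */
    return 0;
}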
I wondered if enabling optimization would make it inline and happen to work. But no, I tested that for Windows x64: https://godbolt.org/z/k3xGe4. clang and MSVC both optimize away any stores of "buf\0" into memory. Instead they just pass puts a pointer to some uninitialized stack memory.
Code that breaks with optimization enabled is almost always UB.
¹ Except for x86-64 System V, where clang uses an extra undocumented "feature" of the calling convention: narrow integer types as function args in registers are assumed to be sign-extended to 32 bits. gcc and clang both do this when calling, but ICC does not, so calling clang functions from ICC-compiled code can cause breakage. See "Is a sign or zero extension required when adding a 32bit offset to a pointer for the x86-64 ABI?"

Annex L of the C11 Draft N1570 recognizes some situations (i.e. "non-critical Undefined Behavior") where the Standard imposes no particular behavioral requirements but implementations that define __STDC_ANALYZABLE__ with a non-zero value should offer some guarantees, and other situations ("critical Undefined Behavior") where it would be common for implementations not to guarantee anything. Attempts to access objects past their lifetime would fall into the latter category.
While nothing would prevent an implementation from offering behavioral guarantees beyond what the Standard requires, even for critical Undefined Behavior (and some tasks require that implementations do so; many embedded-systems tasks, for example, require that programs dereference pointers to addresses whose targets do not satisfy the definition of "objects"), accessing automatic variables past their lifetime is a behavior about which few implementations would offer any guarantee beyond, perhaps, that reading an arbitrary RAM address has no side effects beyond yielding an unspecified value.
Even implementations that guaranteed how automatic objects would be laid out on the stack seldom guaranteed that the storage that held them wouldn't be overwritten between the time a function returned and the caller's next action. Unless interrupts were disabled, interrupt handling could overwrite any storage that had been used by automatic objects no longer in a live stack frame.
While many implementations can be configured to offer useful guarantees about the behavior of actions for which the Standard imposes no requirements, I can't think of any implementations that can be configured to offer sufficient guarantees to make the above code usable.

Related

GNU C compiler sabotages undefined behaviour

I have an embedded project that requires at some point that I write to address 0. So naturally I try:
*(int*)0 = 0 ;
But at optimisation level 2 or higher, the gcc compiler rubs its hands and says, in effect, "That is undefined behaviour! I can do what I like! Bwahaha!" and emits an invalid instruction to the code stream!
Here is my source file:
void f(void)
{
    *(int*)0 = 0;
}
and here is the output listing:
.file "bug.c"
.text
.p2align 4,,15
.globl _f
.def _f; .scl 2; .type 32; .endef
_f:
LFB0:
.cfi_startproc
movl $0, 0
ud2 <-- Invalid instruction!
.cfi_endproc
LFE0:
.ident "GCC: (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 7.3.0"
My question is: Why would anybody do this? What possible benefit could accrue from sabotaging code like this? Surely the obvious course of action is to issue a warning and carry on compiling?
I know the compiler is allowed to do this, I just wonder about the motivation of the compiler writer. It cost me two days and four engineering samples to track this down, so I'm a little peeved.
Edited to add: I have worked around this by using assembly language. So I'm not looking for solutions. I'm just curious why anybody would think this compiler behaviour was a good idea.
(Disclaimer: I'm not an expert on GCC internals, and this is more of a "post hoc" attempt to explain its behavior. But maybe it will be helpful.)
the gcc compiler rubs its hands and says, in effect, "That is undefined behaviour! I can do what I like! Bwahaha!" and emits an invalid instruction to the code stream!
I won't deny that there are cases where GCC does more or less that, but here there's a little more going on, and there is some method to its madness.
As I understand it, GCC isn't treating the null dereference as totally undefined here; it is making some assumptions about what it does. Its handling of null dereferences is controlled by a flag called -fdelete-null-pointer-checks, which is probably enabled by default when you turn on optimizations. From the manual:
-fdelete-null-pointer-checks
    Assume that programs cannot safely dereference null pointers, and that no code or data element resides at address zero. This option enables simple constant folding optimizations at all optimization levels. In addition, other optimization passes in GCC use this flag to control global dataflow analyses that eliminate useless checks for null pointers; these assume that a memory access to address zero always results in a trap, so that if a pointer is checked after it has already been dereferenced, it cannot be null.

    Note however that in some environments this assumption is not true. Use -fno-delete-null-pointer-checks to disable this optimization for programs that depend on that behavior.

    This option is enabled by default on most targets. On Nios II ELF, it defaults to off. On AVR, CR16, and MSP430, this option is completely disabled.

    Passes that use the dataflow information are enabled independently at different optimization levels.
So, if you are intending to actually access address 0, or if for some other reason your code will go on executing after the dereference, then you want to disable this with -fno-delete-null-pointer-checks. That will achieve the "carry on compiling" part of what you want. It will not give you warnings, however, presumably under the assumption that such dereferences are intentional.
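For an intentional write to address 0, a minimal sketch might combine that flag with volatile, which tells the compiler the access itself is wanted and must not be folded away:

/* bug.c -- compile with:
 *   gcc -O2 -fno-delete-null-pointer-checks -c bug.c
 * The volatile qualifier keeps the store from being optimized out, and
 * the flag keeps GCC from treating everything after it as unreachable. */
void f(void)
{
    *(volatile int *)0 = 0;
}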
But under default options, why are you seeing the generated code that you do, with the undefined instruction, and why isn't there a warning? I would guess that GCC's logic is running as follows:
Because -fdelete-null-pointer-checks is in effect, the compiler assumes that execution will not continue past the null dereference, but instead will trap. How the trap will be handled, it doesn't know: maybe program termination, maybe a signal or exception handler, maybe a longjmp up the stack. The null dereference itself is emitted as requested, perhaps under the assumption that you are intentionally exercising your trap handler. But either way, whatever code comes after the null dereference is now unreachable.
So now it does what any reasonable optimizing compiler does with unreachable code: it doesn't emit it. In your case, that's nothing but a ret, but whatever it is, as far as GCC is concerned it would just be wasted bytes of memory, and should be omitted.
You might think you should get a warning here, but GCC has a longstanding design decision not to warn about unreachable code, on the grounds that such warnings tended to be inconsistent and the false positives would do more harm than good. See for instance https://gcc.gnu.org/legacy-ml/gcc-help/2011-05/msg00360.html.
However, as a safety feature, GCC emits an undefined instruction (ud2 on x86) in place of the omitted unreachable code. The idea, I believe, is that just in case execution somehow does continue past the null dereference, it is better for the program to die, than to go off into the weeds and try to execute whatever memory contents happen to come next. (And indeed this can happen even on systems that do unmap the zero page; for instance, if you do struct huge *p = NULL; p->x = 0;, GCC understands this as a null dereference, even though p->x may not be on the zero page at all, and could conceivably be located at an accessible address.)
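That struct case looks like this (a sketch of the point above; whether the access actually traps depends on what is mapped at that address):

struct huge {
    char pad[1 << 20];
    int x;
};

void g(void)
{
    struct huge *p = 0;
    p->x = 0;   /* ~1 MiB past address 0, yet GCC still classifies this
                   as a null-pointer dereference */
}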
There is a warning flag, -Wnull-dereference, that will trigger a warning on your blatant null dereference. However, it only works if -fdelete-null-pointer-checks is enabled.
When would GCC's behavior be useful? Here's an example, maybe contrived, but it might get the idea across. Imagine your program has some allocation function that might fail:
struct foo *p = get_foo();
// do other stuff for a while
if (!p) {
    // 5000 lines of elaborate backup plan in case we can't get a foo
}
frob(p->bar);
Now imagine that you redesign get_foo() so that it can't fail. You forget to take out your "backup plan" code, but you go ahead and use the returned object right away:
struct foo *p = get_foo();
frob(p->bar);
// do other stuff for a while
if (!p) {
    // 5000 lines of elaborate backup plan in case we can't get a foo
}
The compiler doesn't know, a priori, that get_foo() will always return a valid pointer. But it can see that you've dereferenced it, and thus can assume that execution will only continue past that point if the pointer was not null. Therefore, it can tell that the elaborate backup plan is unreachable and should be omitted, which will save you a lot of bloat in your binary.
Incidentally, a note on the situation with clang: although, as Eric Postpischil points out, you do get a warning, what you don't get is an actual load from address 0. clang omits it and just emits ud2. This is what "doing whatever it likes" would really look like, and if you were hoping to exercise your page-zero trap handler, you are out of luck.
In describing Undefined Behavior, the Standard refers to it as resulting "upon use of a nonportable or erroneous program construct or of erroneous data", and the authors of the Standard clarify their intentions in the published Rationale: "Undefined behavior gives the implementor license not to catch certain program errors that are difficult to diagnose. It also identifies areas of possible conforming language extension: the implementor may augment the language by providing a definition of the officially undefined behavior." The question of when to extend the language in such fashion (treating various forms of UB as non-portable but correct) was left as a Quality of Implementation issue outside the Standard's jurisdiction.
The maintainers of clang and gcc take the view that the phrase "non-portable or erroneous" should be interpreted as synonymous with "erroneous", since the Standard does not forbid such an interpretation. For a compiler that will never be used to process non-portable programs and will never be fed erroneous data, such an interpretation will sometimes allow strictly conforming programs that are fed exclusively valid data to be processed more quickly than would otherwise be possible, at the expense of making the compiler less suitable for other purposes. I personally would view the range of programs that a compiler can usefully process reasonably efficiently as a much better metric of quality than the efficiency with which it can process strictly conforming programs, but people using compilers for different purposes may have different views about what makes a compiler more or less useful for those purposes.

Shall I use register class variables in modern C programs?

In C++, the keyword register was removed in its latest standard ISO/IEC 14882:2017 (C++17).
But also in C, I see more and more coders tending not to declare an object with the register storage class, because its purported benefit is almost nonexistent, as in @user253751's answer:
register does not cause the compiler to store a value in a register. register does absolutely nothing. Only extremely old compilers used register to know which variables to store in registers. New compilers do it automatically. Even 20-year-old compilers do it automatically.
Is the use of register-class variables, and with it the keyword register itself, deprecated?
Should I use register-class variables in my modern programs, or is the keyword redundant?
There is no benefit to using register. Modern compilers substantially ignore it — they can handle register allocation better than you can. The only thing it prevents is taking the address of the variable, which is not a significant benefit.
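The address restriction is easy to demonstrate (a minimal sketch; the exact diagnostic wording varies by compiler):

int main(void)
{
    register int r = 0;
    int *p = &r;    /* error: address of register variable 'r' requested */
    return *p;
}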
None of my own code uses register any more. The code I work on loses register when I get to work on a file — but it takes time to get through 17,000+ files (and I only change a file when I have an external reason to change it — but it can be a flimsy reason).
As @JonathanLeffler stated, it is ignored in most cases.
Some compilers have a special extension syntax if you want to keep the variable in a particular register.
With gcc, a global or local variable can be placed in a particular register. This is not available on all platforms; I know that the AVR and ARM ports implement it.
example:
/* declarations of the external functions used below */
int bar(int);
int bar1(int);

register int x asm("r10");   /* keep x in r10 (ARM register name) */

int foo(int y)
{
    x = bar(x);
    x = bar1(x);
    return x * x;
}
https://godbolt.org/z/qwAZ8x
More information: https://gcc.gnu.org/onlinedocs/gcc-6.1.0/gcc/Explicit-Register-Variables.html#Explicit-Register-Variables
But to be honest, I have never used it in my programming life (30+ years).
It's effectively deprecated and offers no real benefit.
C is a product of the early 1970s, and the register keyword served as a hint to the compiler that a) this particular object was going to be used a lot, so b) you might want to store it somewhere other than main memory - IOW, a register or some other "fast" memory.
It may have made a difference then - now, it's pretty much ignored. The only measurable effect is that it prevents you from taking the address of that object.
First of all, this feature is NOT deprecated, because register in this context (global or local register variables) is a GNU extension, and GNU extensions are not deprecated.
In your example, R10 (or the register that GCC internally assigns REGNO(reg) = 10) is a global register. "Global" here means that all code in your application must agree on that usage. This is usually not the case for code from libraries like libc, libm or libgcc, because they are not compiled with -ffixed-10. Moreover, global registers might conflict with the ABI. avr-gcc, for example, might pass values in R10. In avr-gcc, R2...R9 are not used by the ABI and not by code from libgcc (except for 64-bit double).
In some hard real-time app with avr-gcc I used global registers in a (premature) optimization, just to notice that the performance gain was minuscule.
Local register variables, however, are very handy when it comes to integrating non-ABI functions, for example assembly functions that don't comply with the GCC ABI, without the need for assembly wrappers.
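A hypothetical sketch of that use (ARM, GCC syntax; my_asm_routine is an assumed external assembly routine that takes its argument in r0 and returns in r0, ignoring the normal ABI otherwise):

int call_asm_routine(int v)
{
    register int r0 asm("r0") = v;   /* pin the value to r0 */
    asm volatile("bl my_asm_routine"
                 : "+r"(r0)
                 :
                 : "r1", "r2", "r3", "ip", "lr", "cc", "memory");
    return r0;
}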

Do I have to link the files with -lgcc?

If you've ever linked a kernel with gcc you may know the parameter -lgcc.
Is this parameter important ? What does it do ?
If you do some driver/kernel dev, you may use -nostdlib to cut your module loose from the (comparatively bloated) standard library. However, you also remove all the internal helpers GCC relies on to get consistent behaviour across a whole range of hardware.
http://gcc.gnu.org/onlinedocs/gcc-4.6.1/gcc/Link-Options.html
-nostdlib
    Do not use the standard system startup files or libraries when linking. No startup files and only the libraries you specify will be passed to the linker; options specifying linkage of the system libraries, such as -static-libgcc or -shared-libgcc, will be ignored.

    The compiler may generate calls to memcmp, memset, memcpy and memmove. These entries are usually resolved by entries in libc. These entry points should be supplied through some other mechanism when this option is specified.
    One of the standard libraries bypassed by -nostdlib and -nodefaultlibs is libgcc.a, a library of internal subroutines that GCC uses to overcome shortcomings of particular machines, or special needs for some languages. (See Interfacing to GCC Output, for more discussion of libgcc.a.) In most cases, you need libgcc.a even when you want to avoid other standard libraries. In other words, when you specify -nostdlib or -nodefaultlibs you should usually specify -lgcc as well. This ensures that you have no unresolved references to internal GCC library subroutines. (For example, `__main', used to ensure C++ constructors will be called; see collect2.)
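A minimal way to see this in action (a sketch; the exact helper name depends on the target):

/* udiv64.c -- on a 32-bit x86 target, GCC lowers this division to a call
 * to the libgcc helper __udivdi3.  Linking with -nostdlib alone then fails
 * with "undefined reference to `__udivdi3'"; adding -lgcc resolves it:
 *   gcc -m32 -nostdlib -ffreestanding ... udiv64.o -lgcc
 */
unsigned long long udiv64(unsigned long long a, unsigned long long b)
{
    return a / b;
}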
https://gcc.gnu.org/onlinedocs/gcc-4.6.1/gccint/Interface.html#Interface
3 Interfacing to GCC Output

GCC is normally configured to use the same function calling convention normally in use on the target system. This is done with the machine-description macros described (see Target Macros).

However, returning of structure and union values is done differently on some target machines. As a result, functions compiled with PCC returning such types cannot be called from code compiled with GCC, and vice versa. This does not cause trouble often because few Unix library routines return structures or unions.

GCC code returns structures and unions that are 1, 2, 4 or 8 bytes long in the same registers used for int or double return values. (GCC typically allocates variables of such types in registers also.) Structures and unions of other sizes are returned by storing them into an address passed by the caller (usually in a register). The target hook TARGET_STRUCT_VALUE_RTX tells GCC where to pass this address.

By contrast, PCC on most target machines returns structures and unions of any size by copying the data into an area of static storage, and then returning the address of that storage as if it were a pointer value. The caller must copy the data from that memory area to the place where the value is wanted. This is slower than the method used by GCC, and fails to be reentrant.

On some target machines, such as RISC machines and the 80386, the standard system convention is to pass to the subroutine the address of where to return the value. On these machines, GCC has been configured to be compatible with the standard compiler, when this method is used. It may not be compatible for structures of 1, 2, 4 or 8 bytes.

GCC uses the system's standard convention for passing arguments. On some machines, the first few arguments are passed in registers; in others, all are passed on the stack. It would be possible to use registers for argument passing on any machine, and this would probably result in a significant speedup. But the result would be complete incompatibility with code that follows the standard convention. So this change is practical only if you are switching to GCC as the sole C compiler for the system. We may implement register argument passing on certain machines once we have a complete GNU system so that we can compile the libraries with GCC.

On some machines (particularly the SPARC), certain types of arguments are passed “by invisible reference”. This means that the value is stored in memory, and the address of the memory location is passed to the subroutine.

If you use longjmp, beware of automatic variables. ISO C says that automatic variables that are not declared volatile have undefined values after a longjmp. And this is all GCC promises to do, because it is very difficult to restore register variables correctly, and one of GCC's features is that it can put variables in registers without your asking it to.
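The longjmp point is easy to demonstrate (a minimal sketch; whether the non-volatile value is actually rolled back depends on register allocation, so build with optimization enabled to see it):

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

int main(void)
{
    volatile int safe = 0;  /* volatile: value is guaranteed after longjmp */
    int risky = 0;          /* may live in a register and get rolled back */
    if (setjmp(env) == 0) {
        safe = 1;
        risky = 1;
        longjmp(env, 1);
    }
    printf("%d %d\n", safe, risky);  /* "1 1" or "1 0": risky is undefined */
    return 0;
}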

__fastcall vs register syntax?

Currently I have a small function which gets called very very very often (looped multiple times), taking one argument. Thus, it's a good case for a __fastcall.
I wonder though.
Is there a difference between these two syntaxes:
void __fastcall func(CTarget *pCt);
and
void func(register CTarget *pCt);
After all, those two syntaxes basically tell the compiler to pass the argument in registers, right?
Thanks!
__fastcall defines a particular convention.
It was first added by Microsoft to define a convention in which the first two arguments that fit in the ECX and EDX registers are placed in them (on x86, on x86-64 the keyword is ignored though the convention that is used already makes an even heavier use of registers anyway).
Some other compilers also have a __fastcall or fastcall. GCC's is much like Microsoft's. Borland uses EAX, EDX & ECX.
Watcom recognises the keyword for compatibility, but ignores it and uses EAX, EDX, EBX & ECX regardless. Indeed, it was the belief that this convention was behind Watcom beating Microsoft on several benchmarks a long time ago that led to the invention of __fastcall in the first place. (So MS could produce a similar effect, while the default would remain compatible with older code).
-mregparm can also be used with some compilers to change the number of registers used (some builds of the Linux kernel are compiled with ICC or GCC but with -mregparm=3, so as to get a result similar to that of __fastcall on Borland).
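GCC also exposes this per function (a sketch; 32-bit x86 only):

/* Ask GCC to pass the first three integer args in EAX, EDX and ECX: */
int __attribute__((regparm(3))) add3(int a, int b, int c);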
It's worth noting that the state of the art having moved on in many regards, (the caching that happens in CPUs being particularly relevant) __fastcall may in fact be slower than some other conventions in some cases.
None of the above is standard.
Meanwhile, register is a standard keyword originally defined as "please put this in a register if possible" but more generally meaning "The address of this automatic variable or parameter will never be used. Please make use of this in optimising, in whatever way you can". This may mean en-registering the value, it may be ignored, or it may be used in some other compiler optimisation (e.g. the fact that the address cannot be taken means certain types of aliasing error can't happen with certain optimisations).
As a rule, it's largely ignored, because compilers can tell whether you took an address and just use that information (or indeed keep a memory location, copy the value into a register for a bunch of work, then copy it back before the address is used). Conversely, it may be ignored in function signatures just to allow conventions to remain conventions (especially if exported; then it would either have to be ignored or have to be considered part of the signature; as a rule, it's ignored by most compilers).
And all of this becomes irrelevant if the compiler decides to inline, as there is then no real "argument passing" at all.
One part of register is enforced, though: taking the variable's address is forbidden, so it can serve as an assertion that you won't; any attempt to do so is then a compile-time error.
Visual Studio 2012 Microsoft documentation regarding the register keyword:
The compiler does not accept user requests for register variables; instead, it makes its own register choices when global register-allocation optimization (/Oe option) is on. However, all other semantics associated with the register keyword are honored.
Visual Studio 2012 Microsoft documentation regarding the __fastcall keyword:
The __fastcall calling convention specifies that arguments to functions are to be passed in registers, when possible. The following list shows the implementation of this calling convention.
You can still have a look at the assembler code created by the compiler to check what actually happens.
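For example (a sketch using GCC's spelling of the attribute; with MSVC, write __fastcall before the function name instead):

/* fastcall.c -- inspect the generated code with:
 *   gcc -m32 -O1 -S fastcall.c
 * Expect a and b to arrive in ECX and EDX instead of on the stack. */
int __attribute__((fastcall)) add(int a, int b)
{
    return a + b;
}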
register is essentially meaningless in modern C/C++. Compilers ignore it, putting whichever variables in registers they want (and note that a given variable will often be in a register some of the time, and in the stack some of the time, during the function's execution). It has some minor utility in hinting non-aliasing, but using restrict (or a given compiler's equivalent to restrict) is a better way to achieve that.
__fastcall does improve performance slightly, though not as much as you'd expect. If you have a small function which is called often, the number one thing to do to improve performance is to inline it.
In short, it depends on your architecture and your compiler.
The main difference between these two syntaxes is that register is standardized and __fastcall isn't, but only __fastcall actually names a calling convention; register is merely a hint.
The default calling convention for C on 32-bit x86 is cdecl: parameters are pushed onto the stack in reverse order, and the return value is left in the EAX register. The scratch registers are caller-saved, so the callee can use every data register freely.
fastcall is a register-passing convention: the first few arguments go into registers (EAX, ECX and EDX, depending on the compiler) and the remaining ones are pushed onto the stack.
The __fastcall keyword itself isn't standardized; it depends entirely on your compiler. With cl (Visual Studio), it places the first two arguments that fit into ECX and EDX, except on the x86-64 and ARM archs, where the keyword is ignored. With gcc, the first two arguments likewise go into registers, but only on 32-bit x86.
But keep in mind that compilers are able to optimize your code by themselves and greatly improve its speed, and I bet that for your function there is a better way to optimize it. Note also that you generally need to disable optimisation to observe any difference from these keywords, which is something I totally do not recommend.

cast pointer to functor, and call it

Can I do something like:

typedef void (*functor)(void* param);

// machine code of the function
char functionBody[] = {
    0xff, 0x43, 0xBC, 0xC0, 0xDE, ....
};

// cast the pointer to a function pointer
functor myFunc = (functor)functionBody;

// call through the functor
myFunc(param);
Formally, the C language doesn't allow conversions between function pointers and object pointers, so this can't be done.
However, many C implementations - perhaps even "most" - support this as an extension. Whether it works or not depends on things like memory permissions and cache coherency, which will change depending on your architecture and operating system.
It depends on memory protection: on most popular desktop platforms, you will not be able to execute code in the data segment, because of page permissions.
On modern operating systems, it depends on whether the memory is marked executable or not. On POSIX systems, it may be possible to obtain executable memory using mmap. Keep in mind that even on a given cpu architecture, calling convention may vary. For example if the caller expects the callee to clear arguments off the stack, your code had better do that or it will crash on return. (Normally, C ABIs don't make this stupid requirement, but it's something to think about.)
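On POSIX-style systems, that usually looks something like this (a sketch with minimal error handling; note that the final object-pointer-to-function-pointer cast is itself a common extension, not strictly conforming C):

#include <string.h>
#include <sys/mman.h>

typedef void (*functor)(void *param);

/* Copy the machine code into a writable anonymous mapping, then flip
   the pages to read+execute before handing back a callable pointer. */
functor make_functor(const unsigned char *code, size_t len)
{
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0)
        return NULL;
    return (functor)mem;
}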
Rather than trying to call your machine code directly as a C function pointer, it may be better to write an inline asm wrapper that calls it. This way, you have control over the calling convention.
It could work. Possibly using const char[] for the machine instructions is even better, because on many platforms static const storage is placed in the read-only program memory section.

Resources