Why calling conventions aren't used in all C programs - c

I am new to programming, and while reading Charles Petzold's book Programming Windows I stumbled upon WINAPI (I was actually surprised to see another word before a function's name besides the return type). I found that it is a calling convention, which to the best of my understanding is the way a function's arguments are pushed on the stack and its return value is handed back. I wondered why we do not use them in every C program. Are they exclusive to OS programming?

Calling conventions are typically tied to compiler, architecture and (when it comes to using system runtime libraries) the OS; they're not part of the C standard at all. In most cases, there's only one calling convention for a given architecture/compiler/OS combo, so you don't need to think about it; it just uses the only convention that OS supports.
The one place where it has mattered a lot in recent history was on 32-bit x86 systems, particularly on Windows. x86 had very few general-purpose registers, so only a few were available at all, fewer than a typical function might need for its arguments, and using them for argument passing often meant pushing whatever they used to contain to the stack. So there were a lot of trade-offs involved in calling conventions (it's faster to pass arguments in registers, but only if the caller could spare the registers or at least not be forced into excessive spilling to the stack), and Windows went with "we'll use 'em all in different scenarios".
In modern usage on x86-64 (which is far less register starved) and on non-x86 architectures (which usually had enough registers), most compilers/OSes stick with a single common calling convention, so again, you don't need to pay attention. It's a curiosity, not something you need to personally pay attention to unless you're hand-writing whole functions in assembly.

We DO use calling conventions in all C programs, but they are typically defaulted in the compiler settings and so do not have to be expressed explicitly in code, unless actually necessary (library interactions, etc).
Calling conventions are not part of the C language itself, but are handled by compiler vendors as extensions to the language.
On x86, most C compilers default to the __cdecl calling convention, but this can be changed by the compiler user if needed. Once upon a time, Windows APIs used __pascal, but for a very long time now __stdcall has been used instead. Hence the existence of the WINAPI preprocessor macro, which let Microsoft switch between them without requiring most existing code to be rewritten.
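To make the macro idea concrete, here is a minimal sketch. MYLIB_API and mylib_add are made-up names for illustration, not anything from the Windows headers. The sketch only assumes that __stdcall (MSVC) and __attribute__((stdcall)) (GCC/Clang) are available when targeting 32-bit x86, and it expands to nothing everywhere else.

```c
/* Hypothetical convention macro, in the spirit of WINAPI as described above.
   All names here are illustrative assumptions. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define MYLIB_API __stdcall                  /* MSVC targeting 32-bit x86 */
#elif defined(__GNUC__) && defined(__i386__)
#  define MYLIB_API __attribute__((stdcall))   /* GCC/Clang targeting 32-bit x86 */
#else
#  define MYLIB_API                            /* elsewhere: the platform default */
#endif

/* The macro sits between the return type and the function name, which is
   the "extra word" the question noticed. */
int MYLIB_API mylib_add(int a, int b);

int MYLIB_API mylib_add(int a, int b)
{
    return a + b;
}
```

Callers just write mylib_add(2, 3); the declaration tells the compiler which convention to use, and changing the macro in one place would change it for every function declared with it.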

Related

ABI of functions in system libraries

I'm generating machine code to call functions from existing system libraries. Most system libraries were written in C, so I'll take C as an example, but the question probably applies to any other language.
If I understand this answer correctly, C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics. For instance they can choose to pass a pointer for the returned value as an argument to obtain copy-elision.
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
Is this a real concern in practice? Or is it safe to assume that all the functions with non-mangled names from system libraries always use the system's default calling convention?
What other assumptions or considerations can I make about the ABI/calling convention of functions with non-mangled names in system libraries?
C compilers are free to choose the ABI/calling convention of a function as long as they preserve the semantics.
Well, yes and no. The ABI is often defined by the target system, in which case the compiler has to fall in line. In case there exists no ABI for the target system (often the case in microcontroller programming), the compiler is free to do as it pleases, essentially inventing the ABI.
Does this mean that no one can ever truly know what's the right way to call a function from a library, even if its C signature is known?
No, you can't, unless you know the target system and calling convention. Some systems have several "de facto" standards, such as __cdecl vs __stdcall on x86 Windows; see https://en.wikipedia.org/wiki/X86_calling_conventions
Is this a real concern in practice?
Not within a program written entirely in C. But it becomes a big problem in case the program links external libs such as Windows DLLs, possibly written in other languages. Then you have to use the right calling convention or the program will soon crash.
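One way to picture the DLL case: when you obtain a function's address at run time, the convention has to be spelled out in the pointer type, because there is no header declaration for the compiler to consult at the call site. A rough sketch follows. DEMO_CALL, binary_op, demo_mul, lookup_op and call_op are all hypothetical names, and lookup_op merely stands in for whatever loader call would hand you the address.

```c
/* Hypothetical convention macro; expands to nothing outside 32-bit x86. */
#if defined(_MSC_VER) && defined(_M_IX86)
#  define DEMO_CALL __stdcall
#elif defined(__GNUC__) && defined(__i386__)
#  define DEMO_CALL __attribute__((stdcall))
#else
#  define DEMO_CALL
#endif

/* The convention is part of the function pointer type. */
typedef int (DEMO_CALL *binary_op)(int, int);

/* Stand-in for a function that would normally live in an external library. */
static int DEMO_CALL demo_mul(int a, int b)
{
    return a * b;
}

/* Stand-in for a run-time lookup that returns the function's address. */
binary_op lookup_op(void)
{
    return demo_mul;
}

/* The compiler emits the call according to the pointer's declared convention. */
int call_op(binary_op op, int a, int b)
{
    return op(a, b);
}
```

If the pointer type named a different convention than the one the function was compiled with, the call would still compile after a cast but would corrupt the stack at run time on platforms where the conventions differ.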
It's also a very real concern whenever you attempt to mix assembler and C for the given system - the C compiler will handle stacking according to the calling convention, but in the assembler part you have to write this manually. This can also affect the C code, if it is written with care to suit assembler. You'd then pick parameter and return types that are convenient to use.
If I understand this answer correctly, C compilers are free to choose
the ABI/calling convention of a function as long as they preserve the
semantics. For instance they can choose to pass a pointer for the
returned value as an argument to obtain copy-elision.
I don't see how you conclude that from the answer you referenced. Calling conventions are a characteristic of the function, as it appears in compiled form. The compiler can do all manner of tricks at the point of call, but changing or ignoring the calling conventions of the function implementation is not one of them. Where it is possible, copy elision for returned structure values (the subject of that answer) does not rely on any such thing.
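For what it's worth, returning a large structure through a caller-supplied pointer is something many ABIs (the System V x86-64 one, for instance) specify as part of the calling convention itself, so caller and callee agree on it. Here is a sketch of the two views. struct big, make_big and make_big_into are illustrative names only, and the second function is an approximation of what such an ABI does under the hood, not the literal generated code.

```c
/* A structure too large to come back in a couple of registers. */
struct big {
    long v[8];
};

/* Source-level view: the function returns the structure by value. */
struct big make_big(long seed)
{
    struct big b;
    for (int i = 0; i < 8; i++)
        b.v[i] = seed + i;
    return b;
}

/* Approximate lowered view: the caller passes the address of the storage
   that will receive the result, and the callee fills it in. */
void make_big_into(struct big *out, long seed)
{
    for (int i = 0; i < 8; i++)
        out->v[i] = seed + i;
}
```

Both produce the same contents; which mechanism is used for make_big is fixed by the platform ABI rather than chosen call by call.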
Does this mean that no one can ever truly know what's the right way to
call a function from a library, even if its C signature is known?
Yes and no. The function signature alone does not convey anything about calling convention (with some caveats; see below), but libraries simply could not work if there were no way to know calling conventions. In practice, it is usually the case that calling convention (and ABI overall) is standardized on a per-platform basis.
Thus, for example, Linux implementations for x86_64 substantially all follow the same conventions. All the toolchains targeting that platform both use that convention for function calls and provide for functions to be called according to it. Compilers for Win64 likewise follow the appropriate (different) conventions.
Windows is in fact an interesting case, however, because historically, it has supported multiple calling conventions. In its case, there is a default convention, and different conventions can be specified in function declarations via extension keywords. The compiler knows which convention to use based on the function declaration.
Additionally, where interoperability is not a concern, a compiler can do anything within its power. So, for example, when compiling a function with internal linkage, it could, in principle, use whatever calling convention it wants, as it is in full control of both the function and all callers (ignoring the possible effect of function pointers). This is not different in kind from compilers' ability to inline functions. As a practical matter, however, I would not expect compilers to use variant calling conventions under such circumstances, and I am not aware of any that do.
Is this a real concern in practice? Or is it safe to assume that all
the functions with non-mangled names from system libraries always use
the system's default calling convention?
Name mangling has nothing to do with it. That's part of a higher-level mapping of C++ (usually) semantics onto system-level, source-language-independent object-file formats.
Generally speaking, it is safe to assume that where the appropriate function declarations are in scope (from the library's header files, typically), the compiler will generate correct calls. This is an essential interoperability characteristic that is rarely violated in practice. It cannot be construed as a universal guarantee, but in practice, it is not something that you should worry about.
What other assumptions or considerations can I make about the
ABI/calling convention of functions with non-mangled names in system
libraries?
I'm unsure what kinds of assumptions you have in mind, and I suspect you're overcomplicating things. You make sure to include the header(s) from the relevant library that declare the functions you want to call. Having done so, you rely on your compiler to generate correct calls.

What remains in C if I exclude libraries and compiler extensions?

Imagine a situation where you can't or don't want to use any of the libraries provided by the compiler as "standard", nor any external library. You can't use even the compiler extensions (such as gcc extensions).
What is the remaining part you get if you strip C language of all the things a lot of people use as a matter of course?
In that sense, a list of every callable function supported out of the box by any big C compiler (not only ANSI C) would probably be satisfying as an answer, as it'd at least approximately show the use case of the language.
First I thought about sizeof() and printf() (those were already clarified in the comments - operator + stdio), so... what remains? In-line assembly seems like an extension too, so that pretty much strips even the option to use assembly with C, if I'm right.
Probably in the matter of code it'd be easier to understand. Imagine a code compiled with only e.g. gcc main.c (output flag permitted) that has no #include, nor extern.
int main() {
// replace_me
return 0;
}
What can I call to actually do something else than "boring" type math and casting from type to type?
Note that switch, goto, if, loops and other constructs that do nothing and only allow repeating a piece of code aren't the thing I'm looking for (if it isn't obvious).
(Hopefully the edit clarified wtf I'm actually asking, but Matteo's answer pretty much did it.)
If you remove all libraries, you essentially have something similar to a freestanding implementation of C (which still has to provide a few headers, say stddef.h and stdarg.h; string.h is not on the required list before C23, but it's nothing you couldn't easily implement yourself in portable C). That's what you normally start with when programming microcontrollers and other computers that don't have a ready-made operating system, and what operating system writers in general use when they compile their operating systems.
There you typically have two ways of doing stuff besides "raw" computation:
assembly blocks (where you can do literally anything the underlying machine can do);
memory mapped IO (you set a volatile pointer to some hardware dependent location and read/write from it; that affects hardware stuff).
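The memory mapped IO item can be sketched roughly like this. On real hardware the pointer would be set to a fixed, device-specific address taken from the datasheet; since no such address exists on a hosted system, an ordinary variable stands in for the register here. UART_TX, fake_uart_tx_register and the two functions are made-up names.

```c
#include <stdint.h>

/* Stand-in for a hardware register. On a real target this would instead be
   a pointer to a fixed address given by the hardware documentation. */
static uint32_t fake_uart_tx_register;
static volatile uint32_t *const UART_TX = &fake_uart_tx_register;

/* "Send" one character by storing it to the register. Because the pointer
   is volatile-qualified, the compiler may not drop or merge the store. */
void uart_put_char(char c)
{
    *UART_TX = (uint32_t)(unsigned char)c;
}

/* Read the register back; a volatile read is likewise always performed. */
uint32_t uart_last_written(void)
{
    return *UART_TX;
}
```

Nothing here needs a library or a compiler extension: it is plain pointer arithmetic plus the volatile qualifier, which is why it works on bare metal.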
That's really all you need to build anything. After all, it all boils down to that stuff anyway: the C library of a regular hosted implementation is normally written in C itself, with some assembly used either for speed or to communicate with the operating system (typically the syscalls are invoked through some kind of interrupt).
Again, it's nothing you couldn't implement yourself. But the point of having a standard library is both to avoid continuously reinventing the wheel, and to have a set of portable functions that spare you from having to rewrite everything knowing the details of each target platform.
And mainstream operating systems, in turn, are generally written in a mix of C and assembly as well.
C has no "built-in" functions as such. A compiler implementation may include "intrinsic" functions that are implemented directly by the compiler without provision of an external library, although a prototype declaration is still required for intrinsics, so you would still normally include a header file for such declarations.
C is a systems-level language with a minimal run-time and start-up requirement. Because it can directly access memory and memory-mapped I/O, there is very little that it cannot do (and what it cannot do is what you use assembly, in-line assembly or intrinsics for). For example, much of the library code you are wondering about doing without is written in C. When running in an OS environment, however (using C as an application-level rather than system-level language), you cannot practically use C in that manner: the OS has control over such things as I/O and memory management, and in modern systems will normally prevent unmediated access to such resources. Of course, that OS itself is likely to be largely written in C (and/or C++).
In a standalone or bare-metal environment with no OS, C is often used very early in the bootstrap process, initialising hardware and establishing an application execution environment. In fact on ARM Cortex-M processors it is possible to boot directly into C code from reset, since the hardware loads an initial stack-pointer and start address from the vector table on start-up; this being enough to run C code that does not rely on library or static data initialisation - such initialisation can however be written in C before calling main().
Note that sizeof is not a function, it is an operator.
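A small sketch that shows the difference from a function call: sizeof can be applied to an expression without parentheses, and (for anything that is not a variable-length array) its operand is not evaluated at all. The function names are just for illustration.

```c
#include <stddef.h>

/* Applies sizeof to an expression with a side effect and returns i
   afterwards. The increment never happens, because the operand of sizeof
   is not evaluated here; only its type matters. */
int sizeof_does_not_evaluate(void)
{
    int i = 0;
    size_t n = sizeof i++;   /* no parentheses needed for an expression */
    (void)n;                 /* silence "unused variable" warnings */
    return i;
}

/* Common idiom: element count of an array, computed at compile time. */
size_t int_array_len(void)
{
    int a[10];
    return sizeof a / sizeof a[0];
}
```

Parentheses are only required when the operand is a type name, as in sizeof(int).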
I don't think you really understand the situation.
You don't need a header to call a function in C. You can call with unchecked parameters - a bad idea and an obsolete feature, but still supported. And if a compiler links a library by default instead of only when you explicitly tell it to, that's only a little switch within the compiler to "link libc". Notoriously Unix compilers need to be told to link the math library, it wasn't linked by default because some very early programs didn't use floating point.
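As a sketch of the "no header" point: rather than relying on the obsolete unchecked-call feature mentioned above, the variant below writes the prototypes by hand, which is all the header would have done for these two functions. abs and atoi are the real standard library functions; distance_from_text is a made-up name.

```c
/* No #include at all: the declarations are written out by hand.
   They must match what the library really provides. */
int abs(int);
int atoi(const char *);

/* Absolute difference between two decimal numbers given as text. */
int distance_from_text(const char *a, const char *b)
{
    return abs(atoi(a) - atoi(b));
}
```

This compiles and links because the C library is linked by default; the header is only a convenient, checked source of declarations.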
To be fair, some standard library functions like memcpy tend to be special-cased these days as they lend themselves to inlining and optimisation.
The standard library is documented and is usually available, though in effect deprecated by Microsoft for security reasons. You can write pretty much any function quite easily with only stdlib functions, what you can't do is fancy IO.

Why should I not use __fastcall instead of the standard __cdecl?

I've heard some people say that __fastcall is faster than __cdecl and __stdcall because it puts two parameters in registers, unlike the other conventions; but, on the other hand, it is not the standard used in C.
I would like to know what makes __fastcall undesirable as a standard in C, and when I should use it in my code.
The x86 platform is unusual in that it doesn't define a global ABI and calling convention.
Win32/x86 does, it standardizes on stdcall. There are various tradeoffs between calling conventions -- placing parameters in registers is faster, but it forces the caller to spill whatever was previously using those registers. So it's hard to predict which gives better performance.
The important thing is to have a uniform standard calling convention to enable interoperability between different compilers (and even different programming languages).
Other platforms don't have cdecl, stdcall, or fastcall conventions. They don't have the same set of registers. In some cases, they don't even have registers at all. But they still can use C code.
Win32/x86_64 doesn't use stdcall, it uses a 64-bit extension of fastcall.
Linux/x86 has a convention also.
Are you looking for a calling convention to specify for a library interface? Because for all other functions, I wouldn't specify a calling convention at all. The compiler's optimization pass (auto-inlining for instance) probably renders the calling convention useless.
But regarding fastcall: as far as I remember, it's not standardized, and therefore not suitable for library code. Here is a nice overview: Calling Conventions Demystified

Assuming a calling convention when combining C and x86 Assembly

I have some assembly routines that are called by and take arguments from C functions. Right now, I'm assuming those arguments are passed on the stack in cdecl order. Is that a fair assumption to make?
Would a compiler (GCC) detect this and make sure the arguments are passed correctly, or should I manually go and declare them cdecl? If so, will that attribute still hold if I specify a higher optimisation level?
Calling conventions mean much more than just argument ordering. There is a good pdf explaining all the details, written by Agner Fog: Calling conventions for different C++ compilers and operating systems.
This is a matter of the ABI for the platform you're writing code for. Almost all platforms follow the Unix System V ABI for C calling convention and other ABI issues, which includes both a general ABI (gABI) document detailing the common ABI characteristics across all CPU architectures, and a processor-specific ABI (psABI) document specific to the particular CPU architecture/family. When it comes to x86, this matches what you refer to as "cdecl". So from a practical standpoint, x86 assembly meant to be called from C should be written to assume "cdecl". Basically the only exception to the universality of this calling convention is Windows API functions, which use their own nonstandard "stdcall" calling convention due to legacy Win16 dll thunk compatibility issues; nonetheless, the "default" calling convention on x86 Windows is still "cdecl".
A more important concern when writing asm to be called from C is whether symbol names should be prefixed with an underscore or not. This varies widely between platforms, with the general trend being that ELF-based platforms don't use the prefix, and most other platforms do...
The quick and dirty way to do it is create a dummy C function that matches the asm function you want to implement, do a few things in the dummy C function with the passed in parameters so you can tell them apart, compile then disassemble. Not foolproof but works often.
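A minimal sketch of such a dummy function, assuming a routine that takes three int arguments (probe_args is a made-up name). Each parameter is combined with a distinct constant so that it is easy to spot in the generated assembly.

```c
/* Hypothetical stand-in for the assembly routine's C signature.
   To inspect how the arguments arrive, compile to assembly instead of
   an object file, for example:  cc -S -O0 probe.c   (writes probe.s)
   and look for the constants 0x11, 0x22 and 0x33 in the output. */
int probe_args(int first, int second, int third)
{
    return (first ^ 0x11) + (second ^ 0x22) + (third ^ 0x33);
}
```

Wherever 0x11 shows up in the listing, the other operand is the first argument, and so on, which tells you the register or stack slot each argument was passed in for that compiler and target.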

Does C have a standard ABI?

From a discussion somewhere else:
C++ has no standard ABI (Application Binary Interface)
But neither does C, right?
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.
What's your take on this?
C defines no ABI. In fact, it bends over backwards to avoid defining an ABI. People who, like me, have spent most of their programming lives programming in C on 16/32/64-bit architectures with 8-bit bytes, 2's complement arithmetic and flat address spaces will usually be quite surprised on reading the convoluted language of the current C standard.
For example, read the stuff about pointers. The standard doesn't say anything so simple as "a pointer is an address" for that would be making an assumption about the ABI. In particular, it allows for pointers being in different address spaces and having varying width.
An ABI is a mapping from the execution model of the language to a particular machine/operating system/compiler combination. It makes no sense to define one in the language specification because that runs the risk of excluding C implementations on some architectures.
C has no standard ABI in principle, but in practice, this rarely matters: You do what your OS-vendor does.
Take the calling conventions on x86 Windows, for example: The Windows API uses the so-called 'standard' calling convention (stdcall). Thus, any compiler which wants to interface with the OS needs to implement it. However, stdcall doesn't support all C90 language features (eg calling functions without prototypes, variadic functions). As Microsoft provided a C compiler, a second calling convention was necessary, called the 'C' calling convention (cdecl). Most C compilers on Windows use this as their default calling convention, and thus are interoperable.
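To see why variadic functions are the sticking point, consider a small example (sum_ints is an illustrative name). Only the caller knows how many arguments it actually passed, so a convention in which the callee removes a fixed amount of arguments from the stack does not fit, whereas a caller-cleans-up convention like cdecl does.

```c
#include <stdarg.h>

/* Sums `count` int arguments. The function body has no way to know, at
   compile time, how many arguments any particular call will pass. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}
```

The calls sum_ints(3, 1, 2, 3) and sum_ints(0) reach the same compiled function with different amounts of argument data, which is exactly the case the callee cannot clean up after on its own.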
In principle, the same could have happened with C++, but as the C++ ABI (including the calling convention) is necessarily far more elaborate, compiler vendors did not agree on a single ABI, but could still interoperate by falling back to extern "C".
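The usual shape of that fallback, seen from the C side, is a header that wraps its declarations in extern "C" only when a C++ compiler is reading it. A sketch, with a made-up function shapes_rect_area standing in for a library's interface:

```c
/* Header part (hypothetical library interface). A C compiler skips the
   extern "C" lines; a C++ compiler sees them and uses the platform's
   C ABI and unmangled names for the declarations inside. */
#ifdef __cplusplus
extern "C" {
#endif

int shapes_rect_area(int width, int height);

#ifdef __cplusplus
}
#endif

/* Implementation part, compiled as ordinary C. */
int shapes_rect_area(int width, int height)
{
    return width * height;
}
```

The same header can then be included from C and C++ callers alike, whatever C++ ABI the C++ compiler happens to use for its own code.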
The ABI for C is platform specific - it covers issues such as register allocation and calling conventions, which are obviously specific to a particular processor. Here are some examples:
The ARM ABI (includes C++)
The PowerPC Embedded ABI
The several ABIs of x86
x86 has had many calling conventions, with extensions under Windows to declare which one is used. Platform ABIs for embedded Linux have also changed over time, leading to incompatible user space. See some history of the ARM Linux port here, which shows the problems in the transition to a newer ABI.
Although several attempts have been
made at defining a single ABI for a
given architecture across multiple
operating systems (Particularly for
i386 on Unix Systems), the efforts
have not met with such success.
Instead, operating systems tend to
define their own ABIs ...
Quoting ... Linux System Programming page 4.
An ABI, even for C, has parts which are quite platform independent, parts which depend on the processor (which registers should be saved, which are used for passing parameters,...) and parts which depend on the OS (more or less the same factors as for the processor as some choices are not imposed by the architecture but are the result of trade-offs, plus some OS's have a language independent notion of exception and so a compiler for any language has to generate the right thing to handle those, handling of threads may also impose things on the ABI -- if a register points to TLS, you can't use it for what you want).
In theory, every compiler may have its own ABI. But usually, for a given processor/OS pair, the ABI is fixed by the OS vendor, which often also provides a C compiler and common libraries that use that ABI, and competitors prefer to be compatible. (I'd not be surprised if there are exceptions for some OSes for which C isn't a major programming language.)
But the OS vendor may switch ABIs for one reason or another (new versions of processors may have features that you want to use in the ABI, for one; for instance, some have asked for a 32-bit ABI for x86_64 that allows using all the registers). During the migration phase, which may last a very long time, you may have to handle two ABIs.
neither does C, right?
Right
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.
"Pretty much" might refer to architecture-specific defaults chosen by C compiler vendors being adopted by other languages. So if Keil's ARM C compiler uses left to right little endian parameter ordering, the stack to pass arguments and some predetermined register for the return value, then extern "C" from other compilers will assume compatibility with such a scheme.
While such an agreement may be considered part of an ABI, unlike a managed execution context such as the JVM or a browser sandbox, this is far from being a complete standard ABI by itself.
C does not have a standard ABI. This is easily illustrated by all the calling conventions (cdecl, fastcall and stdcall) that are used out there. Each is a different ABI.
There's no standard ABI because C has always been about maximum runtime performance and the ABI with the highest performance depends on the underlying hardware. As a result, the ABI may use only stack or prefer registers for passing function call arguments and return values as needed for any given hardware.
For example, even amd64 (a.k.a. x86-64) has two calling conventions: Microsoft x64 and the System V AMD64 ABI. The former passes the first 4 arguments in registers and the rest on the stack. The latter passes the first 6 integer arguments in registers and the rest on the stack. I have no idea why Microsoft created a non-compatible calling convention for amd64 hardware. For all I know, the Microsoft variant has slightly worse performance and was created later.
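Both conventions can be seen side by side with GCC or Clang on x86-64, which accept the ms_abi and sysv_abi function attributes. The sketch below assumes one of those compilers; on anything else the macros expand to nothing and both functions simply use the platform default. MS_ABI, SYSV_ABI, sum6_ms and sum6_sysv are illustrative names.

```c
/* Convention selectors; only meaningful with GCC/Clang on x86-64. */
#if defined(__GNUC__) && defined(__x86_64__)
#  define MS_ABI   __attribute__((ms_abi))
#  define SYSV_ABI __attribute__((sysv_abi))
#else
#  define MS_ABI
#  define SYSV_ABI
#endif

/* Microsoft x64: the first four integer arguments travel in RCX, RDX, R8
   and R9; e and f are passed on the stack. */
MS_ABI long sum6_ms(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;
}

/* System V AMD64: all six integer arguments travel in registers
   (RDI, RSI, RDX, RCX, R8, R9). */
SYSV_ABI long sum6_sysv(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;
}
```

Either function can be called normally from the same program; because the attribute is part of the declaration, the compiler emits the matching call sequence for each.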
For more information, see https://en.wikipedia.org/wiki/X86_calling_conventions
Prior to the C89 Standard, C compilers for many platforms used essentially the same ABI, save for variations in data sizes. For machines whose stack grows downward, code which calls a function would push the arguments on the stack in order from right to left and then call the function (pushing the return address in the process). A called function would leave its arguments on the stack, and the caller would at its leisure adjust the stack pointer to remove them [or, on some architectures, might adjust the stacked values in place]. While <stdarg.h> made it unnecessary for most programs to rely upon that convention, it remained in use for many years because it was simple and worked pretty well. While there was no "official" document establishing that as a cross-platform "standard", most compilers targeting machines with downward-growing stacks worked that way, leading to a greater level of consistency than exists today.
