As covered in the question What is an Application Binary Interface (ABI)?, ABIs cover details such as:
data type size and alignment;
the calling convention, which controls how function arguments are passed and return values retrieved;
the system call numbers and how an application should make system calls to the operating system;
stack frames (activation records) [not in the above question].
My questions are:
If ABIs define these rules, does the compiler that generates binaries for a target architecture have to depend on that architecture's ABI? For example, does an ARM compiler have to follow the ARM EABI?
Who designs these ABIs? The vendor, the architecture designer (core), or the compiler designers?
A compiler is just a program, like an image manipulation program: it has an input and an output, and both have to conform to some standard (a language, a file format), but how it gets from one to the other is up to the implementer. For a compiler, you only need to generate code that will work on the target platform, which is defined by the instruction set; a sane compiler will choose or create a calling convention that works for the language, but that choice is its own.
Related
I am wondering how mixing C and assembly is possible, given that compilers generate code in different ways. For example, many C compilers will pass arguments in registers rather than pushing them onto the stack when making a function call; the called function then moves those registers into the appropriate memory locations. So what happens if you write assembly code, or link with an object file created by a different compiler, that calls the C function but pushes the arguments onto the stack instead of setting the registers?
My guess is that the C compiler's assembly output does this in such a clever way that it doesn't make a difference and it will still work, but I can't be sure; looking at the assembly code, it doesn't appear that it would.
Can anyone answer my question? I am writing a compiler and need to know this so I don't make any mistakes, should I want to link with a C module in the future.
The conventions that are used for calling functions are part of what's called the "application binary interface" (ABI). If this interface is specified, then all code that follows the specification can be linked together.
There is no standard ABI for C. However, most popular platforms have one prevailing C compiler that effectively produces a de-facto standard ABI (e.g. there's one for Windows, one for Linux on x86 (32 and 64 bit), one for Linux on ARM, etc.). ABIs may specify a large number of separate "calling conventions", and your C compiler will typically let you specify the desired convention at the point of function declaration using some vendor extension.
Conversely, if there is no documented ABI for your C compiler, or for an existing bit of object code, then you cannot in general link (or otherwise interact) with it successfully.
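As a hedged sketch of those vendor extensions on 32-bit x86 (the keywords are MSVC and GCC extensions, not standard C, and the function names are made up; each declaration compiles only with the compiler named in its comment):

```c
/* Sketch for 32-bit x86; these keywords are vendor extensions,
 * so each line is accepted only by the compiler noted. */

int __cdecl   add_cdecl(int a, int b);   /* MSVC: caller cleans the stack  */
int __stdcall add_stdcall(int a, int b); /* MSVC: callee cleans the stack  */

int add_attr(int a, int b)               /* GCC on i386: same effect as    */
    __attribute__((stdcall));            /* __stdcall, via an attribute    */
```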
The memory layout of a struct is up to the compiler. So what happens when some code compiled by one compiler uses a struct generated by code compiled by another compiler?
For example, say I have a header file that declares a struct somestruct, and a function that returns the struct. One source file defines that function and is compiled by compiler A. Another source file uses that function, is compiled by compiler B, and links against the binary of the first source file.
If the two compilers create two different layouts for somestruct, then what's the layout of the variable returned by the function? Does it defer to one compiler's layout, or will there be a memory bug when the second source file tries to access elements of the struct returned by the first source file? Is it an error at compile time or link time?
The function will return the structure as laid out by the ABI of the compiler that compiled it. The caller's compiler will simply treat the function as if it conformed to its own ABI.
Assuming the two compilers use a similar ABI, in most cases no errors will be reported at compile time, at link time, or even at runtime. For compatible compilers such as Clang, GCC, and the Intel C Compiler on OS X and Linux, no errors should result (if there are, it's a compiler bug). In the real world, however, it is usually difficult to find fully compatible compilers: in most cases their ABIs are similar but not exactly the same, and such ABI mismatches are especially hard to track down, because your application appears normal and then crashes under some really weird circumstances at runtime.
Just as Basile said, C++ name mangling introduces an additional ABI difference, but such differences are more easily caught, because the linker literally cannot find the symbol of the function at link time, rather than finding a function that is not compatible.
Also, passing structures is another headache in terms of ABI because there are multiple structure-packing ABIs, sometimes even different in "compatible" compilers like GCC/MinGW and MSVC. (See also the -m[no-]ms-bitfields option in GCC, which forces GCC to use the MSVC ABI for structures.) I have also seen some cases where passing structures by pointer is more reliable than passing structures by value.
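A concrete sketch of the bit-field case (the sizes in the comments are typical for x86 and are assumptions; verify with your own compilers):

```c
#include <stdio.h>

/* Bit-field layout is implementation-defined: GCC and MSVC pack
 * this struct differently unless GCC is given -mms-bitfields. */
struct flags {
    char         tag;
    unsigned int a : 4;
    unsigned int b : 4;
};

int main(void)
{
    /* Typically 4 under GCC's native rules (the bit-fields share
     * the bytes after tag), but 8 under MSVC's rules (unsigned int
     * bit-fields start a fresh, int-aligned storage unit). */
    printf("sizeof(struct flags) = %zu\n", sizeof(struct flags));
    return 0;
}
```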
The layout of data (e.g. structures, etc.) and the calling protocol (how calls are made at the processor level) are defined in a (processor- and operating-system-specific) document called the Application Binary Interface. If both compilers follow the same ABI (for the same processor and the same operating system), their generated code should be interoperable.
See e.g. the wikipage for x86 calling conventions and the x86-64 ABI specification.
Name mangling, notably for C++, might also be an issue.
Read also Levine's book on Linkers and Loaders
As an example of implementation-defined behavior in C, the C Standard says that the sizes of the data types are implementation-defined; so, say, sizeof(int) is implementation-defined.
Does this implementation-defined behavior mean that sizeof(int) is platform-dependent, or defined by the compiler vendor, or both?
Once I compile my code, do the implementation dependencies still apply when I run it on different versions of the platform? Would I suffer a performance loss by compiling implementation-defined code on one platform and running it on another?
Yes, implementation-defined means that it depends on the platform (architecture + OS ABI + compiler).
And yes, implementation-defined features can differ across different versions of the platform.
Does this implementation-defined behavior mean that sizeof(int) is platform-dependent, or defined by the compiler vendor, or both?
In principle the compiler vendor can make that decision. In practice, if the compiler wants to emit code that calls directly to system libraries, then it has to follow the same "ABI" (Application Binary Interface) as the system, and among other things the ABI will specify the size of int. So the compiler vendor will "decide" to make it the size the ABI says.
Compilers that target multiple platforms and architectures will make the decision separately as part of the configuration of each platform. Each target then represents a different C implementation, even though you think of it as "the same compiler".
You could write a conforming C implementation in which int is a different size from what it is on the OS that runs the program. People rarely do, and the standard libraries would have to jump through extra hoops when they make system calls. It could be useful as part of an emulator, but then you might reasonably argue that the "platform" is the emulated platform, not the host platform with its different-sized int.
Once I compile my code, do the implementation dependencies still apply when I run it on different versions of the platform?
sizeof(int) is a compile-time constant, which means that the code emitted by your compiler might assume a certain value. That binary code cannot then run correctly on a different version of the platform with a different sized int.
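One way to see that the value is baked in (a minimal sketch): because sizeof(int) is a compile-time constant, it can size arrays and feed static assertions, both of which are resolved before the program ever runs.

```c
#include <stdio.h>

/* sizeof(int) is folded into the binary at compile time: */
char buffer[sizeof(int) * 8];       /* array length fixed at build      */
_Static_assert(sizeof(int) >= 2,    /* checked by the compiler (C11)    */
               "int must be at least 16 bits");

int main(void)
{
    printf("built with sizeof(int) == %zu\n", sizeof(int));
    return 0;
}
```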
Would I suffer a performance loss by compiling implementation-defined code on one platform and running it on another?
If it works at all, then there's no particular reason to assume there will be a performance loss. It generally won't work at all (see above), because binary code intended for one platform in general doesn't work on another platform. If the platforms are similar enough that it does work, it's possible that optimizations that the compiler made intended for one, are not such good optimizations on another. In that case there would be a performance loss, and the fix would be to re-compile the code targeting the correct (version of the) platform.
This does happen with ARM, and to a lesser extent with x86. Different chips in the past have offered essentially the same instruction set, but with some instructions on some chips having significantly different cost relative to other instructions. An optimization that assumes instruction X is fast would likely be a bad optimization on a different chip where instruction X is slow. As you can imagine, this kind of difference doesn't make the chip manufacturer hugely popular with compiler vendors, and even less so with assembly programmers.
Does this implementation-defined behavior mean that sizeof(int) is platform-dependent, or defined by the compiler vendor, or both?
In the C Standard terminology, the implementation is the compiler.
Here is the actual definition from the C Standard of the term implementation:
(C99, 3.12p1) implementation: particular set of software, running in a particular translation environment under particular control options, that performs translation of programs for, and supports execution of functions in, a particular execution environment
sizeof(int) is indeed implementation-dependent. It has nothing to do with performance, but rather with the architecture of the platform you are using: a CPU that is 32 bits wide behaves differently from one that is 64 bits wide, or even one that is 16 bits wide.
That is mostly what "platform-dependent" refers to, but there is also the question of cross-compiling, which brings even more issues: you can use flags such as -m32 or -m64 to select the target architecture and width, producing code that runs on a platform other than the one it was compiled on.
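A minimal sketch of that, assuming a GCC multilib toolchain where both flags work: compile the same file twice and compare the output.

```c
/* Compile twice, e.g.:  gcc -m32 sizes.c  and  gcc -m64 sizes.c
 * (assumes a multilib toolchain). On Linux, long and pointers
 * differ between the two targets, while int is 4 bytes on both. */
#include <stdio.h>

int main(void)
{
    printf("int:    %zu\n", sizeof(int));
    printf("long:   %zu\n", sizeof(long));
    printf("void *: %zu\n", sizeof(void *));
    return 0;
}
```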
According to the C standard, ISO/IEC 9899:1999 §3.4.1:
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
This means that behavior the compiler documents is implementation-defined, and sizeof() is so documented.
2 EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
Annex J, 'Portability Issues', includes lists of Unspecified Behavior (J.1), Undefined Behavior (J.2), Implementation-Defined Behavior (J.3) and Locale-Specific Behavior (J.4).
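To make the quoted EXAMPLE concrete (a sketch; the printed result depends on what your implementation documents):

```c
#include <stdio.h>

int main(void)
{
    int x = -8;
    /* Implementation-defined: most compilers document an arithmetic
     * shift here (result -2, sign bit propagated), but a logical
     * shift (zero-fill) would also be conforming. */
    printf("%d >> 2 == %d\n", x, x >> 2);
    return 0;
}
```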
I have some assembly routines that are called by and take arguments from C functions. Right now, I'm assuming those arguments are passed on the stack in cdecl order. Is that a fair assumption to make?
Would a compiler (GCC) detect this and make sure the arguments are passed correctly, or should I manually go and declare them cdecl? If so, will that attribute still hold if I specify a higher optimisation level?
Calling conventions mean much more than just argument ordering. There is a good pdf explaining all the details, written by Agner Fog: Calling conventions for different C++ compilers and operating systems.
This is a matter of the ABI for the platform you're writing code for. Almost all platforms follow the Unix System V ABI for the C calling convention and other ABI issues, which includes both a general ABI (gABI) document detailing the common ABI characteristics across all CPU architectures, and a processor-specific ABI (psABI) document specific to the particular CPU architecture/family. When it comes to x86, this matches what you refer to as "cdecl". So from a practical standpoint, x86 assembly meant to be called from C should be written to assume "cdecl". Basically the only exception to the universality of this calling convention is Windows API functions, which use their own nonstandard "stdcall" calling convention due to legacy Win16 dll thunk compatibility issues; nonetheless, the "default" calling convention on x86 Windows is still "cdecl".
A more important concern when writing asm to be called from C is whether symbol names should be prefixed with an underscore or not. This varies widely between platforms, with the general trend being that ELF-based platforms don't use the prefix, and most other platforms do...
The quick and dirty way is to create a dummy C function that matches the asm function you want to implement, do a few things in it with the passed-in parameters so you can tell them apart, then compile and disassemble. Not foolproof, but it often works.
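A minimal sketch of that approach (the file and function names are made up): give the parameters distinct, recognizable uses so each one is easy to spot in the disassembly.

```c
/* dummy.c -- compile with e.g. `gcc -O0 -c dummy.c`, then inspect
 * with `objdump -d dummy.o` to see how arguments arrive and where
 * the return value goes. */
unsigned int dummy(unsigned int a, unsigned int b, unsigned int c)
{
    /* Distinct shifts make each parameter easy to identify. */
    return (a << 8) + (b << 4) + c;
}
```

The disassembly also settles the underscore question from the previous answer: the symbol will show up as either dummy or _dummy, whichever your platform's ABI expects.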
From a discussion somewhere else:
C++ has no standard ABI (Application Binary Interface)
But neither does C, right?
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.
What's your take on this?
C defines no ABI. In fact, it bends over backwards to avoid defining one. Those who, like me, have spent most of their programming lives writing C on 16/32/64-bit architectures with 8-bit bytes, 2's-complement arithmetic and flat address spaces, will usually be quite surprised on reading the convoluted language of the current C standard.
For example, read the stuff about pointers. The standard doesn't say anything so simple as "a pointer is an address" for that would be making an assumption about the ABI. In particular, it allows for pointers being in different address spaces and having varying width.
An ABI is a mapping from the execution model of the language to a particular machine/operating system/compiler combination. It makes no sense to define one in the language specification because that runs the risk of excluding C implementations on some architectures.
C has no standard ABI in principle, but in practice this rarely matters: you do what your OS vendor does.
Take the calling conventions on x86 Windows, for example: the Windows API uses the so-called 'standard' calling convention (stdcall). Thus, any compiler that wants to interface with the OS needs to implement it. However, stdcall doesn't support all C90 language features (e.g. calling functions without prototypes, or variadic functions). As Microsoft also provided a C compiler, a second calling convention was necessary, called the 'C' calling convention (cdecl). Most C compilers on Windows use it as their default calling convention, and thus are interoperable.
In principle, the same could have happened with C++, but as the C++ ABI (including the calling convention) is necessarily far more elaborate, compiler vendors did not agree on a single ABI, but could still interoperate by falling back to extern "C".
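The usual fallback looks like this in a shared header (a standard idiom, sketched here with hypothetical names):

```c
/* mylib.h -- hypothetical header usable from both C and C++.
 * The extern "C" block tells a C++ compiler to use the C ABI
 * (no name mangling) for these declarations. */
#ifdef __cplusplus
extern "C" {
#endif

int mylib_init(void);
int mylib_process(const char *input);

#ifdef __cplusplus
}
#endif
```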
The ABI for C is platform specific - it covers issues such as register allocation and calling conventions, which are obviously specific to a particular processor. Here are some examples:
The ARM ABI (includes C++)
The PowerPC Embedded ABI
The several ABIs of x86
x86 has had many calling conventions, with extensions under Windows to declare which one is used. Platform ABIs for embedded Linux have also changed over time, leading to incompatible user space; see some history of the ARM Linux port here, which shows the problems in the transition to a newer ABI.
Although several attempts have been made at defining a single ABI for a given architecture across multiple operating systems (particularly for i386 on Unix systems), the efforts have not met with such success. Instead, operating systems tend to define their own ABIs ...
Quoting ... Linux System Programming page 4.
An ABI, even for C, has parts which are quite platform-independent, parts which depend on the processor (which registers must be preserved, which are used for passing parameters, ...), and parts which depend on the OS (more or less the same factors as for the processor, since some choices are not imposed by the architecture but are the result of trade-offs; in addition, some OSes have a language-independent notion of exception, so a compiler for any language has to generate the right thing to handle those, and the handling of threads may also impose things on the ABI: if a register points to TLS, you can't use it for your own purposes).
In theory, every compiler may have its own ABI. But usually, for a given processor/OS pair, the ABI is fixed by the OS vendor, which often also provides a C compiler and common libraries that use that ABI, and competitors prefer to be compatible. (I'd not be surprised if there were exceptions for some OSes on which C isn't a major programming language.)
But the OS vendor may switch ABIs for one reason or another; for instance, new versions of a processor may have features worth using in the ABI (some have asked for a 32-bit ABI for x86_64 that can still use all of its registers). During the migration phase, which may last a very long time, you may have to handle two ABIs.
neither does C, right?
Right.
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.
"Pretty much" might refer to the architecture-specific defaults chosen by C compiler vendors being adopted by other languages. So if Keil's ARM C compiler uses left-to-right, little-endian parameter ordering, passes arguments on the stack, and uses some predetermined register for the return value, then extern "C" in other compilers will assume compatibility with that scheme.
While such an agreement may be considered part of an ABI, unlike a managed execution context such as the JVM or a browser sandbox, it is far from a complete standard ABI by itself.
C does not have a standard ABI. This is easily illustrated by all the calling conventions (cdecl, fastcall and stdcall) that are used out there. Each is a different ABI.
There's no standard ABI because C has always been about maximum runtime performance and the ABI with the highest performance depends on the underlying hardware. As a result, the ABI may use only stack or prefer registers for passing function call arguments and return values as needed for any given hardware.
For example, even amd64 (a.k.a. x86-64) has two calling conventions: Microsoft x64 and the System V AMD64 ABI. The former puts the first 4 arguments into registers and the rest onto the stack; the latter puts the first 6 arguments into registers and the rest onto the stack. I have no idea why Microsoft created an incompatible calling convention for amd64 hardware. For all I know, the Microsoft variant has slightly worse performance and was created later.
For more information, see https://en.wikipedia.org/wiki/X86_calling_conventions
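A sketch of the difference, using a hypothetical six-argument function; the register assignments in the comment are as documented by the two conventions:

```c
/* Where the six arguments of a call to sum6 land, by convention:
 *
 *   System V AMD64 (Linux, the BSDs, OS X):
 *     a->RDI, b->RSI, c->RDX, d->RCX, e->R8, f->R9
 *     (all six in registers, none on the stack)
 *
 *   Microsoft x64 (Windows):
 *     a->RCX, b->RDX, c->R8, d->R9,
 *     e and f on the stack (plus 32 bytes of "shadow space")
 */
long sum6(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;
}
```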
Prior to the C89 Standard, C compilers for many platforms used essentially the same ABI, save for variations in data sizes. For machines whose stack grows downward, code which calls a function would push the arguments on the stack in order from right to left and then call the function (pushing the return address in the process). A called function would leave its arguments on the stack, and the caller would at its leisure adjust the stack pointer to remove them [or, on some architectures, might adjust the stacked values in place]. While <stdarg.h> made it unnecessary for most programs to rely upon that convention, it remained in use for many years because it was simple and worked pretty well. While there was no "official" document establishing that as a cross-platform "standard", most compilers targeting machines with downward-growing stacks worked that way, leading to a greater level of consistency than exists today.
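For reference, the <stdarg.h> facility mentioned above is what lets portable C walk its own argument list without knowing how the ABI delivers the arguments (a minimal sketch):

```c
#include <stdarg.h>
#include <stdio.h>

/* Sums `count` ints however the platform ABI delivers them;
 * va_arg hides whether they arrived in registers or on the stack. */
static int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);

    return total;
}

int main(void)
{
    printf("%d\n", sum_ints(3, 10, 20, 12));  /* prints 42 */
    return 0;
}
```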