Clarification on implementation-defined behavior in C

As an example of implementation-defined behavior in C, the C Standard says that the sizes of the data types are implementation-defined; so, say, sizeof(int) is implementation-defined.
Does this implementation-defined behavior mean that sizeof(int) is platform dependent, or defined by the compiler vendor, or both?
Once I compile my code, do implementation dependencies still apply when I run it on different versions of the platform? Would I get a performance loss from compiling implementation-defined code on one platform and running it on another?

Yes, implementation-defined means that it depends on the platform (architecture + OS ABI + compiler).
And yes, implementation-defined features can differ across different versions of the platform.

Does this implementation-defined behavior mean that sizeof(int) is platform dependent, or defined by the compiler vendor, or both?
In principle the compiler vendor can make that decision. In practice, if the compiler wants to emit code that calls directly to system libraries, then it has to follow the same "ABI" (Application Binary Interface) as the system, and among other things the ABI will specify the size of int. So the compiler vendor will "decide" to make it the size the ABI says.
Compilers that target multiple platforms and architectures will make the decision separately as part of the configuration of each platform. Each target then represents a different C implementation, even though you think of it as "the same compiler".
You could write a conforming C implementation in which int is a different size from what it is on the OS that runs the program. People rarely do, and the standard libraries would have to jump through extra hoops when they make system calls. It could be useful as part of an emulator, but then you might reasonably argue that the "platform" is the emulated platform, not the host platform with its different-sized int.
Once I compile my code, do implementation dependencies still apply when I run it on different versions of the platform?
sizeof(int) is a compile-time constant, which means that the code emitted by your compiler might assume a certain value. That binary code cannot then run correctly on a different version of the platform with a different sized int.
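As a minimal sketch of what "compile-time constant" means here, the compiler folds the value its target ABI prescribes straight into the emitted code:

#include <stdio.h>

int main(void)
{
    /* sizeof(int) is fixed when this translation unit is compiled; the
       binary carries the target ABI's answer with it and cannot re-ask
       the question at run time. */
    printf("sizeof(int) = %lu\n", (unsigned long)sizeof(int));
    return 0;
}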
Would I get a performance loss from compiling implementation-defined code on one platform and running it on another?
If it works at all, then there's no particular reason to assume there will be a performance loss. It generally won't work at all (see above), because binary code intended for one platform in general doesn't work on another platform. If the platforms are similar enough that it does work, it's possible that optimizations the compiler made for one are not such good optimizations on the other. In that case there would be a performance loss, and the fix would be to re-compile the code targeting the correct (version of the) platform.
This does happen with ARM, and to a lesser extent with x86. Different chips in the past have offered essentially the same instruction set, but with some instructions on some chips having significantly different cost relative to other instructions. An optimization that assumes instruction X is fast would likely be a bad optimization on a different chip where instruction X is slow. As you can imagine, this kind of difference doesn't make the chip manufacturer hugely popular with compiler vendors, and even less so with assembly programmers.

Does this implementation-defined behavior mean that sizeof(int) is platform dependent, or defined by the compiler vendor, or both?
In the C Standard terminology, the implementation is the compiler.
Here is the actual definition from the C Standard of the term implementation:
(C99, 3.12p1) implementation: particular set of software, running in a particular translation environment under particular control options, that performs translation of programs for, and supports execution of functions in, a particular execution environment

sizeof(int) is indeed implementation-dependent. It has nothing to do with performance, but rather with the architecture of the platform you are using. A CPU that is 32 bits wide will behave differently from one that is 64 bits wide, or even one that is 16 bits wide.
That is mostly what is meant by platform dependent, but there is also the question of cross-compiling, which brings even more issues. You can use flags like GCC's -m32/-m64 to specify the target architecture and width, producing code that runs on a platform other than the one it was compiled on.

According to the C standard,
ISO/IEC 9899:1999 §3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
That is, behavior for which the implementation documents its choice is implementation-defined.
sizeof() is such documented behavior.
2 EXAMPLE An example of implementation-defined behavior is the propagation of the high-order bit when a signed integer is shifted right.
Annex J 'Portability Issues' includes lists of Unspecified Behaviour (J.1), Undefined Behaviour (J.2), Implementation-Defined Behaviour (J.3) and Locale-Specific Behaviour (J.4).
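To make the standard's example concrete, a minimal sketch (the printed value is itself implementation-defined):

#include <stdio.h>

int main(void)
{
    int x = -8;
    /* C99 6.5.7p5: right-shifting a negative signed value is
       implementation-defined. An arithmetic shift (sign bit propagated)
       prints -4; a logical shift would print a large positive value. */
    printf("-8 >> 1 = %d\n", x >> 1);
    return 0;
}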

Related

How to make C code in Tru64 Unix work on 64-bit Linux?

I want to know the problems likely to be faced while moving C programs (e.g. a server process) from Tru64 Unix to 64-bit Linux, and why. What modifications would the program need, or would just recompiling the source code in the new environment do, given that both are 64-bit platforms? I am a little confused; I need to know before I start working on it.
I spent a lot of time in the early 90s (OMG I feel old...) porting 32-bit code to the Alpha architecture. This was back when it was called OSF/1.
You are unlikely to have any difficulties relating to the bit-width when going from Alpha to x86_64.
Developers are much more aware of the problems caused by assuming that sizeof(int) == sizeof(void *), for example. That was far and away the most common problem I used to have when porting code to Alpha.
Where you do find differences they will be in how the two systems differ in their conformity to various API specifications, e.g. POSIX, XOpen, etc. That said, such differences are normally easily worked around.
If the Alpha code has used the SVR4-style APIs (e.g. STREAMS), you may have more difficulty than if it has used the more BSD-like APIs.
A "64-bit architecture" is only an approximate classification of an architecture.
Ideally your code would have used only "semantic" types for all declarations of variables, in particular size_t and ptrdiff_t for sizes and pointer arithmetic, and the [u]intXX_t types where a particular width is assumed.
If this is not the case, the main point would be to compare all the standard arithmetic types (all integer types, floating-point types and pointers) to see whether they map to the same concept on both platforms. If you find differences, you know the potential trouble spots.
Check the 64-bit data model used by both platforms; most 64-bit Unix-like OSes use LP64, so it is likely that your target platforms use the same data model. This being the case, you should have few problems given that the code itself compiles and links.
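One way to pin those data-model assumptions down is a compile-time check. A minimal sketch, assuming a C11 compiler for _Static_assert (the classic negative-array-size typedef trick does the same job in C89):

#include <stddef.h>

/* Fail the build loudly if the LP64 assumptions of the port do not
   hold on the new platform. */
_Static_assert(sizeof(int) == 4,    "expected 32-bit int (LP64)");
_Static_assert(sizeof(long) == 8,   "expected 64-bit long (LP64)");
_Static_assert(sizeof(void *) == 8, "expected 64-bit pointers");
_Static_assert(sizeof(size_t) == sizeof(void *),
               "size_t should span the address space");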
If you use the same compiler (e.g. GCC) on both platforms you also need not worry about incompatible compiler extensions or differences in undefined or implementation defined behaviour. Such behaviour should be avoided in any case - even if the compilers are the same, since it may differ between target architectures. If you are not using the same compiler, then you need to be cautious about using extensions. #pragma directives are a particular issue since a compiler is allowed to quietly ignore a #pragma it does not recognise.
Finally in order to compile and link, any library dependencies outside the C standard library need to be available on both platforms. Most OS calls will be available since Unix and Linux share the same POSIX API.

C struct alignment and portability across compilers

Assume the following header file, corresponding to, for example, a shared library. The exported function takes a pointer to a custom structure defined in this header:
// lib.h
typedef struct {
    char c;
    double d;
    int i;
} A;
DLL_EXPORT void f(A* p);
If the shared library is built using one compiler and then is used from C code built with another compiler it might not work because of a different memory alignment, as Memory alignment in C-structs suggests. So, is there a way to make my structure definition portable across different compilers on the same platform?
I am interested specifically in Windows platform (apparently it does not have a well-defined ABI), though would be curious to learn about other platforms as well.
TL;DR in practice you should be fine.
The C standard does not define this but a platform ABI generally does. That is, for a given CPU architecture and operating system, there can be a definition for how C maps to assembly that allows different compilers to interoperate.
Struct alignment isn't the only thing that a platform ABI has to define, you also have function calling conventions and stuff like that.
C++ makes it even more complex and the ABI has to specify vtables, exceptions, name mangling, etc.
On Windows I think there are multiple C++ ABIs depending on compiler but C is mostly compatible across compilers. I could be wrong, not a Windows expert.
Some links:
what is an ABI? http://gcc.gnu.org/ml/libstdc++/2001-11/msg00063.html
things an ABI has to define C++ ABI issues list
example C++ ABI spec http://sourcery.mentor.com/public/cxx-abi/abi.html
how the ABI evolved on Solaris http://developers.sun.com/solaris/articles/CC_abi/CC_abi_content.html
Anyway the bottom line is that you're looking for your guarantee in the platform/compiler ABI spec, not the C standard.
The only way to know for sure is to consult the documentation of the compilers in question. However, it is usually the case that C struct layout (except, as you say, for bitfields) is defined by an ABI description for the environment you're using, and C compilers will tend to follow the native ABI.
Not only is it not guaranteed; even if you use the same compiler there might be differences due to different compiler switches used in the build, or between different versions of the same compiler with the same switches (this happened in an embedded compiler I worked on).
You need to make sure the structs are represented exactly the same: use switches, #pragmas, whatever the compiler gives you.
My advice: stay away from this altogether. Pass your arguments in the function call, not wrapped within a struct.
And even in this simple form, if you deal with two compilers, it's not trivial. You need to make sure that an int takes the same number of bytes, for example. Also, the calling convention (the argument order: left to right or right to left) can differ between compilers.
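If the struct really must cross the compiler boundary, one common mitigation is to pin the layout down and verify it at compile time. A sketch, not a guarantee: #pragma pack is an extension (though MSVC, GCC and Clang all accept this spelling), and the sizes assume an 8-byte double and a 4-byte int:

#include <stddef.h>

#pragma pack(push, 1)   /* remove all padding from the shared struct */
typedef struct {
    char c;
    double d;
    int i;
} A_packed;
#pragma pack(pop)

/* Fail the build if another compiler lays the struct out differently
   (C11 _Static_assert; pre-C11, a negative-array-size typedef works). */
_Static_assert(offsetof(A_packed, d) == 1,    "unexpected padding after c");
_Static_assert(sizeof(A_packed) == 1 + 8 + 4, "unexpected struct size");

Packing trades alignment for a fixed layout, so misaligned access to d costs cycles on some CPUs and faults on others; it is a tool for wire formats and deliberate ABIs, not a default.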

Is it possible to dynamically create equivalent of limits.h macros during compilation?

The main reason for this is an attempt to write a perfectly portable C library. After a few weeks I ended up with constants, which are unfortunately not very flexible (using constants to define other constants isn't possible).
Thanks for any advice or criticism.
What you ask is impossible. As stated before me, any standards-compliant implementation of C will have limits.h correctly defined. If it's incorrect for whatever reason, blame the vendor of the compiler. Any "dynamic" discovery of the true limits wouldn't be possible at compile time, especially if you're cross-compiling for an embedded system, where the target architecture might have smaller integers than the compiling system.
To dynamically discover the limits, you would have to do it at run time by bit-shifting, multiplying, or adding until an overflow is encountered, but then you have a variable in memory rather than a constant, which would be significantly slower. (This wouldn't be reliable anyway, since different architectures use different bit-level representations, and arithmetic sometimes gets a bit funky near the limits, especially with signed types and abstract number representations such as floats.)
Just use the standard types and limits as found in stdint.h and limits.h, or try to avoid pushing the limits altogether.
First thing that comes to my mind: have you considered using stdint.h? Thanks to that your library will be portable across C99-compliant compilers.
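Note that the premise "constants can't define other constants" doesn't hold for the preprocessor: macros from limits.h can drive other macros. A hedged sketch (the MYLIB_* names are made up for illustration):

#include <limits.h>

/* Derive library constants from the implementation's own limits at
   preprocessing time; no run-time discovery is needed. */
#if UINT_MAX >= 0xFFFFFFFFUL
  #define MYLIB_HASH_BITS 32
#else
  #define MYLIB_HASH_BITS 16
#endif

/* sizeof is also a constant expression, usable e.g. for array sizes. */
#define MYLIB_INT_BYTES sizeof(int)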

What parts of C are most portable?

I recently read an interview with Lua co-creators Luiz H. de Figueiredo and Roberto Ierusalimschy, where they discussed the design and implementation of Lua. It was very intriguing, to say the least. However, one part of the discussion brought something up in my mind. Roberto spoke of Lua as a "freestanding application" (that is, it's pure ANSI C that uses nothing from the OS). He said that the core of Lua was completely portable, and because of its purity it has been able to be ported much more easily, and to platforms never even considered (such as robots and embedded devices).
Now this makes me wonder. C in general is a very portable language. So, what parts of C (namely those in the standard library) are the most unportable? And which can be expected to work on most platforms? Should only a limited set of data types be used (e.g. avoiding short and maybe float)? What about the FILE and stdio system? malloc and free? It seems that Lua avoids all of these. Is that taking things to the extreme? Or are they the root of portability issues? Beyond this, what other things can be done to make code extremely portable?
The reason I'm asking all of this is because I'm currently writing an application in pure C89, and it's optimal that it be as portable as possible. I'm willing to take a middle road in implementing it (portable enough, but not so much that I have to write everything from scratch). Anyway, I just wanted to see what in general is key to writing the best C code.
As a final note, all of this discussion is related to C89 only.
In the case of Lua, we don't have much to complain about in the C language itself, but we have found that the C standard library contains many functions that seem harmless and straightforward to use, until you consider that they do not check their input for validity (which is fine, if inconvenient). The C standard says that handling bad input is undefined behavior, allowing those functions to do whatever they want, even crash the host program. Consider, for instance, strftime. Some libcs simply ignore invalid format specifiers, but other libcs (e.g. in Windows) crash! Now, strftime is not a crucial function. Why crash instead of doing something sensible? So, Lua has to do its own validation of input before calling strftime, and exporting strftime to Lua programs becomes a chore. Hence, we have tried to steer clear of these problems in the Lua core by aiming at freestanding for the core. But the Lua standard libraries cannot do that, because their goal is to export facilities to Lua programs, including what is available in the C standard library.
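A minimal sketch of the kind of pre-validation described above: reject a format string before handing it to strftime. The whitelist below is illustrative, not Lua's actual list:

#include <string.h>

/* Return 1 if every %-specifier in s is on the whitelist, else 0. */
int is_safe_strftime_format(const char *s)
{
    const char *valid = "aAbBcdHIjmMpSUwWxXyYZ%";
    while ((s = strchr(s, '%')) != NULL) {
        s++;                                  /* the conversion character */
        if (*s == '\0' || strchr(valid, *s) == NULL)
            return 0;                         /* trailing '%' or unknown */
        s++;
    }
    return 1;
}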
"Freestanding" has a particular meaning in the context of C. Roughly, freestanding hosts are not required to provide any of the standard libraries, including the library functions malloc/free, printf, etc. Certain standard headers are still required, but they only define types and macros (for example stddef.h).
C89 allows two types of compilers: hosted and freestanding. The basic difference is that a hosted compiler provides all of the C89 library, while a freestanding compiler need only provide <float.h>, <limits.h>, <stdarg.h>, and <stddef.h>. If you limit yourself to these headers, your code will be portable to any C89 compiler.
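A sketch of what limiting yourself to those headers looks like in practice: no library calls, only headers that even a freestanding C89 implementation must provide.

#include <limits.h>
#include <stddef.h>

/* Pure computation; compiles under any conforming C89 implementation,
   hosted or freestanding. */
size_t count_set_bits(unsigned int x)
{
    size_t n = 0;
    while (x != 0) {
        n += (size_t)(x & 1u);
        x >>= 1;
    }
    return n;
}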
This is a very broad question. I'm not going to give the definitive answer; instead I'll raise some issues.
Note that the C standard specifies certain things as "implementation-defined"; a conforming program will always compile on and run on any conforming platform, but it may behave differently depending on the platform. Specifically, there are:
Word size. sizeof(long) may be four bytes on one platform, eight on another. The sizes of short, int, long etc. each have some minimum (often relative to each other), but otherwise there are no guarantees.
Endianness. Given int a = 0x00ff;, the expression ((char *)&a)[0] may yield 0 on one platform and (with a signed char) -1 on another.
Character encoding. \0 is always the null byte, but how the other characters show up depends on the OS and other factors.
Text-mode I/O. putchar('\n') may produce a line-feed character on one platform, a carriage return on the next, and a combination of the two on yet another.
Signedness of char. It may or it may not be possible for a char to take on negative values.
Byte size. While nowadays, a byte is eight bits virtually everywhere, C caters even to the few exotic platforms where it is not.
Various word sizes and endiannesses are common. Character encoding issues are likely to come up in any text-processing application. Machines with 9-bit bytes are most likely to be found in museums. This is by no means an exhaustive list.
(And please don't write C89, that's an outdated standard. C99 added some pretty useful stuff for portability, such as the fixed-width integers int32_t etc.)
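A small probe of the implementation-defined properties listed above, kept to C89-safe formatting:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    int a = 0x00ff;
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("sizeof(int)  = %lu\n", (unsigned long)sizeof(int));
    printf("sizeof(long) = %lu\n", (unsigned long)sizeof(long));
    printf("low byte first? %s\n",
           ((unsigned char *)&a)[0] ? "yes (little-endian)" : "no");
    printf("char is %s\n", ((char)-1 < 0) ? "signed" : "unsigned");
    return 0;
}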
C was designed so that a compiler may be written to generate code for any platform and call the language it compiles, "C". Such freedom acts in opposition to C being a language for writing code that can be used on any platform.
Anyone writing code for C must decide (either deliberately or by default) what sizes of int they will support; while it is possible to write C code which will work with any legal size of int, it requires considerable effort and the resulting code will often be far less readable than code which is designed for a particular integer size. For example, if one has a variable x of type uint32_t, and one wishes to multiply it by another y, computing the result mod 4294967296, the statement x*=y; will work on platforms where int is 32 bits or smaller, or where int is 65 bits or larger, but will invoke Undefined Behavior in cases where int is 33 to 64 bits, and the product, if the operands were regarded as whole numbers rather than members of an algebraic ring that wraps mod 4294967296, would exceed INT_MAX. One could make the statement work independent of the size of int by rewriting it as x*=1u*y;, but doing so makes the code less clear, and accidentally omitting the 1u* from one of the multiplications could be disastrous.
Under the present rules, C is reasonably portable if code is only used on machines whose integer size matches expectations. On machines where the size of int does not match expectations, code is not likely to be portable unless it includes enough type coercions to render most of the language's typing rules irrelevant.
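A compilable version of the wraparound-safe multiply described above; correct on any conforming implementation regardless of the width of int:

#include <stdint.h>

/* Multiply mod 4294967296 without signed-overflow UB: the 1u forces
   the arithmetic into an unsigned type even where int is 33..64 bits. */
uint32_t mul_mod_2_32(uint32_t x, uint32_t y)
{
    return (uint32_t)(1u * x * y);
}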
Anything that is a part of the C89 standard should be portable to any compiler that conforms to that standard. If you stick to pure C89, you should be able to port it fairly easily. Any portability problems would then be due to compiler bugs or places where the code invokes implementation-specific behavior.

Does C have a standard ABI?

From a discussion somewhere else:
C++ has no standard ABI (Application Binary Interface)
But neither does C, right?
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.
What's your take on this?
C defines no ABI. In fact, it bends over backwards to avoid defining an ABI. Those people who, like me, have spent most of their programming lives programming in C on 16/32/64-bit architectures with 8-bit bytes, 2's-complement arithmetic and flat address spaces will usually be quite surprised on reading the convoluted language of the current C standard.
For example, read the stuff about pointers. The standard doesn't say anything so simple as "a pointer is an address" for that would be making an assumption about the ABI. In particular, it allows for pointers being in different address spaces and having varying width.
An ABI is a mapping from the execution model of the language to a particular machine/operating system/compiler combination. It makes no sense to define one in the language specification because that runs the risk of excluding C implementations on some architectures.
C has no standard ABI in principle, but in practice this rarely matters: you do what your OS vendor does.
Take the calling conventions on x86 Windows, for example: the Windows API uses the so-called 'standard' calling convention (stdcall). Thus, any compiler which wants to interface with the OS needs to implement it. However, stdcall doesn't support all C90 language features (e.g. calling functions without prototypes, variadic functions). As Microsoft provided a C compiler, a second calling convention was necessary, called the 'C' calling convention (cdecl). Most C compilers on Windows use this as their default calling convention, and thus are interoperable.
In principle, the same could have happened with C++, but as the C++ ABI (including the calling convention) is necessarily far more elaborate, compiler vendors did not agree on a single ABI, but could still interoperate by falling back to extern "C".
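For reference, a sketch of how the two conventions are spelled in source on Windows. __cdecl and __stdcall are MSVC keywords (32-bit GCC/Clang use __attribute__((cdecl)) and __attribute__((stdcall)) instead); the function names here are made up:

/* cdecl: the caller pops the arguments, so variadic functions work. */
int __cdecl log_message(const char *fmt, ...);

/* stdcall: the callee pops the arguments; the Windows API style. */
int __stdcall window_callback(void *hwnd, unsigned int msg);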
The ABI for C is platform specific - it covers issues such as register allocation and calling conventions, which are obviously specific to a particular processor. Here are some examples:
The ARM ABI (includes C++)
The PowerPC Embedded ABI
The several ABIs of x86
x86 has had many calling conventions, with extensions under Windows to declare which one is used. Platform ABIs for embedded Linux have also changed over time, leading to incompatible user space. See some history of the ARM Linux port here, which shows the problems in the transition to a newer ABI.
Although several attempts have been made at defining a single ABI for a given architecture across multiple operating systems (particularly for i386 on Unix systems), the efforts have not met with such success. Instead, operating systems tend to define their own ABIs ...
Quoting ... Linux System Programming page 4.
An ABI, even for C, has parts which are quite platform-independent, parts which depend on the processor (which registers should be saved, which are used for passing parameters, ...) and parts which depend on the OS (more or less the same factors as for the processor, as some choices are not imposed by the architecture but are the result of trade-offs; in addition, some OSes have a language-independent notion of exceptions, so a compiler for any language has to generate the right thing to handle those, and the handling of threads may also impose things on the ABI: if a register points to TLS, you can't use it for anything else).
In theory, every compiler may have its own ABI. But usually, for a given processor/OS pair, the ABI is fixed by the OS vendor, which often also provides a C compiler and common libraries which use that ABI, and competitors prefer to be compatible. (I'd not be surprised if there are exceptions for some OSes on which C isn't a major programming language.)
But the OS vendor may switch ABIs for one reason or another (new versions of a processor may have features you want the ABI to use; for instance, some have asked for a 32-bit ABI on x86_64 that can still use all the registers). During the migration phase, which may last a very long time, you may have to handle two ABIs.
But neither does C, right?
Right.
On any given platform it pretty much does. It wouldn't be useful as the lingua franca for inter-language communication if it lacked one.
"Pretty much" might refer to architecture-specific defaults chosen by C compiler vendors being adopted by other languages. So if Keil's ARM C compiler uses left-to-right, little-endian parameter ordering and the stack to pass arguments, with some predetermined register for the return value, then extern "C" from other compilers will assume compatibility with that scheme.
While such an agreement may be considered part of an ABI, unlike a managed execution context such as the JVM or a browser sandbox, this is far from being a complete standard ABI by itself.
C does not have a standard ABI. This is easily illustrated by all the calling conventions (cdecl, fastcall and stdcall) that are used out there. Each is a different ABI.
There's no standard ABI because C has always been about maximum runtime performance and the ABI with the highest performance depends on the underlying hardware. As a result, the ABI may use only stack or prefer registers for passing function call arguments and return values as needed for any given hardware.
For example, even amd64 (a.k.a. x86-64) has two calling conventions: Microsoft x64 and the System V AMD64 ABI. The former puts the first 4 arguments into registers and the rest onto the stack. The latter puts the first 6 arguments into registers and the rest onto the stack. I have no idea why Microsoft created an incompatible calling convention for amd64 hardware. For all I know, the Microsoft variant has slightly worse performance and was created later.
For more information, see https://en.wikipedia.org/wiki/X86_calling_conventions
Prior to the C89 Standard, C compilers for many platforms used essentially the same ABI, save for variations in data sizes. For machines whose stack grows downward, code which calls a function would push the arguments on the stack in order from right to left and then call the function (pushing the return address in the process). A called function would leave its arguments on the stack, and the caller would at its leisure adjust the stack pointer to remove them [or, on some architectures, might adjust the stacked values in place]. While <stdarg.h> made it unnecessary for most programs to rely upon that convention, it remained in use for many years because it was simple and worked pretty well. While there was no "official" document establishing that as a cross-platform "standard", most compilers targeting machines with downward-growing stacks worked that way, leading to a greater level of consistency than exists today.
