The question I have is mostly related to section four, paragraph six.
The two forms of conforming implementation are hosted and freestanding. A conforming hosted implementation shall accept any strictly conforming program.
As I understand, this constitutes the typical application environment, with filesystems, allocated memory and threads...
A conforming freestanding implementation shall accept any strictly conforming program in which the use of the features specified in the library clause (clause 7) is confined to the contents of the standard headers <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h>.
... and this constitutes the typical kernel and/or embedded, bare minimum environment that doesn't have standard filesystems, allocated memory or threads (among other things).
A conforming implementation may have extensions (including additional library functions), provided they do not alter the behavior of any strictly conforming program.
It seems as though this gives a hosted implementation the freedom to call itself a hosted or freestanding implementation, and when it comes to filesystems, allocated memory or threads (among other things), these can fall under the extension category so that it can merely implement an interface that returns a value indicating errors every time. Just to name a few:
fopen, fgets and malloc can return NULL
fprintf, fscanf, fputc and fgetc can return EOF
thrd_create can return thrd_error (indicating that "the request could not be honored")
This implies that the distinction section four, paragraph six gives is virtually meaningless. Are there any requirements guaranteeing some actual level of functionality for these functions in hosted and freestanding implementations? For example, is it required that those functions above actually be able to return something other than their corresponding failure values?
The cited paragraph already states it quite well.
A hosted execution environment is also a freestanding, but not vice versa. A compiler need only provide a freestanding implementation. gcc, for example, is strictly speaking freestanding only, as the standard library is not included. However, it assumes it is available when compiling for a hosted environment (the default), assuming the lib is available on the system (like glibc). See here
Simply put, freestanding is just the language. It is not required to support any libraries and just a few headers (mostly for common types and implementation specific stuff like numerical limits, etc.). This implies the standard library need not exist - nor do the corresponding headers. Reason is a freestanding environment most likely will not have such facilities like files, display, etc. It is used for kernels, bare-metal embedded, etc.
Note that gcc for instance will, if compiling for a hosted environment (-fhosted), assume functions used in the standard library have the corresponding meaning and might apply very aggressive optimizations (it has many of those functions built-in). For freestanding, it actually does not, so you can use a function strcmp for instance with completely different semantics. However, it assumes the mem...-functions to exist, as these are used for normal code, e.g. struct assignment.
So, when compiling for bare-metal without a standard library (or a non-standard standard library), you must use -ffreestanding.
If a hosted implementation calls itself freestanding, it is obviously not a hosted implementation anymore. Once it calls itself hosted, however, it has to provide all facilities required by the standard and is not allowed to just implement dummies, but has to provide the semantics as defined in the standard.
Just to state that clear: The cited section allows a freestanding environment to omit all functions of the library, except for the few listed headers. So you are free to supply any other library and use the same names, but do anything you like. As that would be not the standard library, there is no need for compliance.
5.1.2.1 further states that "Any library facilities available to a freestanding program, other than the minimal set required by clause 4, are implementation-defined.". That does support my statement. Sidenote: It also does not require main() as program entry point.
The hosted respectively freestanding implementations which the standard defines are minimal definitions. Both variants are the smallest common denominator of what can reasonably be implemented on a wide range of actual systems.
The rationale for defining such minimal, doable sets is to specify how a program targeting either one of the two flavors must look in order to compile and run with the expected result on ideally all implementations conforming to the respective flavor.
A program strictly conforming to the (i.e., to either) standard is maximally portable.
It goes without saying that actually existing implementations, both freestanding and hosted, typically provide a plethora of extensions, which is fine as far as the standard is concerned — with the one caveat that the extensions must not invalidate a strictly conforming program.
Back to your question: The distinction which the standard makes — the theory — is clear. Actually existing implementations — the practice — fall in a wide spectrum between and beyond the standard's minimal requirements but are bound to the "must not change the behavior" clause regarding strictly conforming programs (targeting hosted or freestanding implementations and not relying on any extensions).
By the way, your examples regarding standard library dummy functions is correct: For freestanding implementations these are just a funny case of allowed extensions: No strictly conforming program targeting freestanding environments would call them anyway. As part of a hosted environment they would be conforming but useless. (One can actually run into situations like that when certain system resources are exhausted, or they could be test stub implementations.)
There are many kinds of C implementations, targeting many kinds of different execution platforms, many of which can provide a variety of useful features and guarantees that others cannot. The authors of the Standard decided that in most cases it should be sufficiently obvious what kinds of features and guarantees should be provided by implementations targeting various platforms and application fields, and how they should be provided, that there would be no need to have a standard concern itself with such details. On the other hand, the number of applications that would require things like file I/O and the number of platforms that could provide them were sufficient to justify recognizing as "special" those implementations which included such features.
In general, implementations which are intended designed for freestanding use will be usable on platforms that would be unable to usefully handle a hosted implementation. While the Standard imposes some requirements beyond what would be practical on some of the smaller C platforms, some almost-conforming implementations of C can be quite usefully employed on processors with only enough storage to hold 256 instructions and 16 bytes' worth of variables. If something like a digital kitchen thermometer/timer gadget doesn't have a file system or console, why should it waste storage on things like descriptors for stdout?
In addition, because the Standard defines no standard means by which freestanding applications can perform I/O, and because different platforms handle I/O differently, almost freestanding applications will be targeted for a particular target platform or range of platforms. A hosted implementation which doesn't expose the natural features or guarantees the underlying platform would provide could be useful for running programs that don't require such features or guarantees. It's not possible for an embedded program to do much of anything without using platform-specific features and guarantees, however, and thus a freestanding implementation which didn't allow a programmer access to such things would be unable to do much. Quality implementations should allow programmers to use any features or guarantees that may help them accomplish what they need to do, though some may require use of compilation options to ensure that they don't do anything wacky. For some reason, it has become fashionable to regard a decision by the Standards Committee that there might be some implementations and application fields where the value of a feature or guarantee wouldn't justify the cost, as an indication that programmers should not expect implementations to provide a feature or guarantee which would be useful in low-level programming and which the platform would provide as essentially zero cost.
There are some free standing environments that cannot become hosted environment. Two known conditions that prevent a free standing environment from becoming hosted environment are:
sizeof(char) == sizeof(int);
sizeof(size_t) < sizeof(int signed);
// SIZE_MAX < INT_MAX
Hosted environment explicitly requires
sizeof(char) < sizeof(int);
// (int)EOF != (char)EOF;
sizeof(int unsigned) <= sizeof(size_t);
// UINT_MAX <= SIZE_MAX
If sizeof(char) == sizeof(int), EOF is undefined.
If argc > SIZE_MAX, you cannot call main.
Related
The declaration of strtok_s in C11, and its usage, look to be very different from the strtok_s in compilers like the latest bundled with Visual Studio 2022 (17.4.4) and also GCC 12.2.0 (looking at MinGW64 distribution).
I fear the different form has been developed as a safer and accepted alternative to strtok long before C11. What happens now if someone wants to use strtok_s and stay C11 compliant?
Are the compiler supplied libraries C11 compliant?
Maybe it's just that I've been fooled by something otherwise obvious, and someone can help me...
This is C11 (and similar is to C17 and early drafts of C23):
char *strtok_s(char * restrict s1,
rsize_t * restrict s1max,
const char * restrict s2,
char ** restrict ptr);
the same can be found as a good reference in the safec library
While MSC/VC and GCC have the form
char* strtok_s(
char* str,
const char* delimiters,
char** context
);
The C11 "Annex K bounds checking interfaces" was received with a lot of scepticism and in practice nearly no standard lib implemented it. See for example Field Experience With Annex K — Bounds Checking Interfaces.
As for the MSVC compiler, it doesn't conform to any C standard and never made such claims - you can try this out to check if you are using such a compiler or not:
#if !defined(__STDC__) || (__STDC__==0)
#error This compiler is non-conforming.
#endif
In particular, MSVC did not implement Annex K either, but already had non-standard library extensions in place prior to C11.
In practice _s means:
Possibly more safe or possibly less safe, depending on use and what the programmer expected.
Non-portable.
Possibly non-conforming.
If portability and standard conformance are important, then avoid _s functions.
In practice _s functions protect against two things: getting passed non-sanitized input or null pointers. So assuming that you do proper input sanitation and don't pass null pointers to library functions, the _s functions aren't giving you extra safety, just extra execution bloat and portability problems.
What happens now if someone wants to use strtok_s and stay C11 compliant?
You de facto can't.
And it's not limited to just strtok_s(). The entire C11 Annex K set of implementations is fractured, and because the major deviations from the standard are from Microsoft's implementation, there will probably never be a way to write portable, standard-conforming code using the Annex K functions.
Per N1967 Field Experience With Annex K — Bounds Checking Interface:
Available Implementations
Despite the specification of the APIs having been around for over a
decade only a handful of implementations exist with varying degrees of
completeness and conformance. The following is a survey of
implementations that are known to exist and their status.
While two of the implementations below are available in portable
source code form as Open Source projects, none of the popular Open
Source distribution such as BSD or Linux has chosen to make either
available to their users. At least one (GNU C Library) has repeatedly
rejected proposals for inclusion for some of the same reasons as those
noted by the Austin Group in their initial review of TR 24731-1
N1106]. It appears unlikely that the APIs will be provided by future
versions of these distributions.
Microsoft Visual Studio
Microsoft Visual Studio implements an early version of the APIs.
However, the implementation is incomplete and conforms neither to C11
nor to the original TR 24731-1. For example, it doesn't provide the
set_constraint_handler_s function but instead defines a
_invalid_parameter_handler _set_invalid_parameter_handler(_invalid_parameter_handler) function with similar behavior but a slightly different and incompatible
signature. It also doesn't define the abort_handler_s and
ignore_handler_s functions, the memset_s function (which isn't
part of the TR), or the RSIZE_MAX macro. The Microsoft
implementation also doesn't treat overlapping source and destination
sequences as runtime-constraint violations and instead has undefined
behavior in such cases.
As a result of the numerous deviations from the specification the
Microsoft implementation cannot be considered conforming or portable.
...
Safe C Library
Safe C Library [SafeC] is a fairly efficient and portable but
unfortunately very incomplete implementation of Annex K with support
for the string manipulation subset of functions declared in
<string.h>.
Due to its lack of support for Annex K facilities beyond the
<string.h> functions the Safe C Library cannot be considered a
conforming implementation.
Even the Safe C library is non-conforming.
Whether these functions are "safer" is debatable. Read the entire document.
Unnecessary Uses
A widespread fallacy originated by Microsoft's deprecation of the standard functions in an effort to increase the adoption of the APIs is that every call to the standard functions is necessarily unsafe and should be replaced by one to the "safer" API. As a result, security-minded teams sometimes naively embark on months-long projects rewriting their working code and dutifully replacing all instances of the "deprecated" functions with the corresponding APIs. This not only leads to unnecessary churn and raises the risk of injecting new bugs into correct code, it also makes the rewritten code less efficient.
Also, read the updated N1969 Updated Field Experience With Annex K — Bounds Checking Interfaces:
Despite more than a decade since the original proposal and nearly ten years since the ratification of ISO/IEC TR 24731-1:2007, and almost five years since the introduction of the Bounds checking interfaces into the C standard, no viable conforming implementations has emerged. The APIs continue to be controversial and requests for implementation continue to be rejected by implementers.
The design of the Bounds checking interfaces, though well-intentioned, suffers from far too many problems to correct. Using the APIs has been seen to lead to worse quality, less secure software than relying on established approaches or modern technologies. More effective and less intrusive approaches have become commonplace and are often preferred by users and security experts alike.
Therefore, we propose that Annex K be either removed from the next revision of the C standard, or deprecated and then removed.
The C standard makes clear that a compiler/library combination is allowed to do whatever it likes with the following code:
int doubleFree(char *p)
{
int temp = *p;
free(p);
free(p);
return temp;
}
In the event that a compiler does not require use of a particular bundled library, however, is there anything in the C standard which would forbid a library from defining a meaningful behavior? As a simple example, suppose code were written for a platform which had reference-counted pointers, such that following p = malloc(1234); __addref(p); __addref(p); the first two calls to free(p) would decrement the counter but not free the memory. Any code written for use with such a library would naturally work only with such a library (and the __addref() calls would likely fail on most others), but such a feature could be helpful in many cases when e.g. it is necessary to pass the a string repeatedly to a method which expects to be given a string produced with strdup and consequently calls free on it.
In the event that a library would define a useful behavior for some action like double-freeing a pointer, is there anything in the C standard which would authorize a compiler to unilaterally break it?
There is really two question here, your formally stated one and your broader one outlined in your comments to questions raised by others.
Your formal question is answers by the definition of undefined behavior and section 4 on conformance. The definition says (emphasis mine):
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
With emphasis on nonportable and imposes no requirements. This really says it all, the compiler is free to optimize in unpleasant manners or can also chose to make the behavior documented and well defined, this of course mean the program is no longer strictly conforming, which brings us to section 4:
A strictly conforming program shall use only those features of the language and library
specified in this International Standard.2) It shall not produce output dependent on any
unspecified, undefined, or implementation-defined behavior, and shall not exceed any
minimum implementation limit.
but a conforming implementation is allowed extensions as long as they don't break a conforming program:
A conforming implementation may have extensions (including additional
library functions), provided they do not alter the behavior of any strictly conforming
program.3)
As the C FAQ says:
There are very few realistic, useful, strictly conforming programs. On the other hand, a merely conforming program can make use of any compiler-specific extension it wants to.
Your informal question deals with compilers taking more aggressive optimization opportunies with undefined behavior and in the long run the fear this will make real world systems programming impossible. While I do understand how this relatively new aggressive stance seems very programmer unfriendly to many in the end a compiler won't last very long if people can not build useful programs with it. A related blog post by John Regehr: Proposal for a Friendly Dialect of C.
One could argue the opposite, that compilers have made a lot of effort to build extensions to support varying needs not supported by the standard. I think the article GCC hacks in the Linux kernel demonstrates this well. It goes into the many gcc extensions that the Linux kernel relies on and clang has in general attempted to support as many gcc extensions as possible.
Whether compilers have removed useful handling of undefined behavior which hampers effective systems programming is not clear to me. I think specific questions on alternatives for individual cases of undefined behavior that has been exploited in systems programming and no longer work would be useful and interesting to the community.
Does C standard mandate that platforms must not define behaviors beyond those given in standard
Quite simply, no, it does not. The standard says:
An implementation shall be accompanied by a document that defines all implementation-
defined and locale-specific characteristics and all extensions.
There is no restriction anywhere in the standard that prohibits implementations from providing any other documentation they like. If you like, you can read N1570, the latest freely available draft of the ISO C standard, and confirm the lack of any such prohibition.
In the event that a library would define a useful behavior for some action like double-freeing a pointer, is there anything in the C standard which would authorize a compiler to unilaterally break it?
A C implementation includes both the compiler and the standard library. free() is part of the standard library. The standard does not define the behavior of passing the same pointer value to free() twice, but an implementation is free to define the behavior. Any such documentation is not required, and is outside the scope of the C standard.
If a C implementation documented, for example, that calling free() a second time on the same pointer value has no effect, but then doing so actually causes the program to crash, that would violate the implementation's own documentation, but it would not violate the C standard. There is no specific requirement in the C standard that says an implementation must conform to its own documentation, beyond the documentation that's required by the standard. An implementation's conformance to its own documentation is enforce by the market and by common sense, not by the C standard.
In the event that a library would define a useful behavior for some action like double-freeing a pointer, is there anything in the C standard which would authorize a compiler to unilaterally break it?
The compiler and the standard library (i.e. the one in which free is defined) are both part of the implementation - it isn't really coherent to talk about one of them doing something "unilaterally".
If a compiler "does not require use of a particular bundled library", then (other than perhaps as a freestanding implementation) it alone is not an implementation, so the standard doesn't apply to it at all. The behavior of a combination of a library and a compiler are the responsibility of whoever chooses to combine them (which may be the author of either component, or someone else entirely) and label this combination as an implementation. It would, of course, be wise not to document extensions implemented by the library as features of this implementation without confirming that the compiler does not break them. For that matter, you would also need to make sure that the compiler doesn't break anything used internally by the library.
In answer to your main question: no, it does not. If the end result of combining a library and a compiler (and kernel, dynamic loader, etc) is a conforming hosted environment, it is a conforming implementation even if some extensions that the library's author would like to have provided are not supported by the final result of combining them, but it does not require them to work, either. Conversely, if the result does not conform - for example if the compiler breaks the internals of the library and thereby causes some library function not to conform - then it is not a conforming implementation. Any program which calls free twice on the same pointer, or uses any reserved identifier starting with two underscores, causes undefined behavior and therefore is not a strictly conforming program.
As an example of implementation defined behavior in C. The C Standard says that the size of data types are implementation defined. So, say sizeof(int) is implementation defined.
Does this implementation defined behavior mean that the size(int) is platform dependent or defined by compiler vendor or both?
Once I compile my code, does implementation dependencies would still apply when I run it on different versions of platforms? Would I get performance loss for compiling implementation defined code on one platform and running it on other?
Yes, implementation defined means that it depends on the platform (Architecture + OS ABI + compiler).
And yes, implementation defined features can differ across different versions of the platform.
Does this implementation defined behavior mean that the size(int) is platform dependent or defined by compiler vendor or both?
In principle the compiler vendor can make that decision. In practice, if the compiler wants to emit code that calls directly to system libraries, then it has to follow the same "ABI" (Application Binary Interface) as the system, and among other things the ABI will specify the size of int. So the compiler vendor will "decide" to make it the size the ABI says.
Compilers that target multiple platforms and architectures will make the decision separately as part of the configuration of each platform. Each target then represents a different C implementation, even though you think of it as "the same compiler".
You could write a conforming C implementation in which int is a different size from what it is on the OS that runs the program. People rarely do, and the standard libraries would have to jump through extra hoops when they make system calls. It could be useful as part of an emulator, but then you might reasonably argue that the "platform" is the emulated platform, not the host platform with its different-sized int.
Once I compile my code, does implementation dependencies would still apply when I run it on different versions of platforms?
sizeof(int) is a compile-time constant, which means that the code emitted by your compiler might assume a certain value. That binary code cannot then run correctly on a different version of the platform with a different sized int.
Would I get performance loss for compiling implementation defined code on one platform and running it on other?
If it works at all, then there's no particular reason to assume there will be a performance loss. It generally won't work at all (see above), because binary code intended for one platform in general doesn't work on another platform. If the platforms are similar enough that it does work, it's possible that optimizations that the compiler made intended for one, are not such good optimizations on another. In that case there would be a performance loss, and the fix would be to re-compile the code targeting the correct (version of the) platform.
This does happen with ARM, and to a lesser extent with x86. Different chips in the past have offered essentially the same instruction set, but with some instructions on some chips having significantly different cost relative to other instructions. An optimization that assumes instruction X is fast would likely be a bad optimization on a different chip where instruction X is slow. As you can imagine, this kind of difference doesn't make the chip manufacturer hugely popular with compiler vendors, and even less so with assembly programmers.
Does this implementation defined behavior mean that the size(int) is platform dependent or defined by compiler vendor or both?
In the C Standard terminology, the implementation is the compiler.
Here is the actual definition from the C Standard of the term implementation:
(C99, 3.12p1) implementation: particular set of software, running in a particular translation environment under particular control options, that performs translation of programs for, and supports execution of functions in, a particular execution environment
size(int) is indeed implementation dependent. It has nothing to do with performance, but rather architecture of the platform you are using. A CPU that is 32-bit wide will behave differently than one that is 64-bit wide or even one that is 16-bit wide.
That's what they mostly refer to by platform dependent, but also there is the question of cross-compiling, which brings even more issues. You can use flags like -m to specify the architecture and width which causes code to use run under different platforms than it was originally compiled on.
According to the C- standard
ISO/IEC 9899:1999 §3.4.1
1 implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made`
It means the behavior which is documented in compiler is implementation defined.
sizeof() is documented.
2 EXAMPLE : An example of implementation-defined behavior is the propagation of the high-order bit
when a signed integer is shifted right.
Annex J 'Portability Issues' includes a lists of Unspecified Behaviour (J.1), Undefined Behaviour (J.2), Implementation-Defined Behaviour (J.3) and Locale-Specific Behaviour (J.4).
I read this official manual about GCC. Sometimes I have a a problem with translating of text. On the page number six (chapter 2.1) I can't understand such fragment of text:
The ISO C standard defines (in clause 4) two classes of conforming
implementation. A conforming hosted implementation supports the whole
standard including all the library facilities; a conforming
freestanding implementation is only required to provide certain
library facilities: those in <float.h>, <limits.h>, <stdarg.h>, and
<stddef.h>; since AMD1, also those in <iso646.h>; since C99, also
those in <stdbool.h> and <stdint.h>; and since C11, also those in
<stdalign.h> and . In addition, complex types, added in
C99, are not required for freestanding implementations. The standard
also defines two environments for programs, a freestanding
environment, required of all implementations and which may not have
library facilities beyond those required of freestanding
implementations, where the handling of program startup and termination
are implementation-defined, and a hosted environment, which is not
required, in which all the library facilities are provided and startup
is through a function int main (void) or int main (int, char *[]). An
OS kernel would be a freestanding environment; a program using the
facilities of an operating system would normally be in a hosted
implementation.
I am not sure I understand it right...
I will rephrase how I understood it:
Exists two implementations of ISO C standard: a full (named as conforming hosted implementation), and a light (named as conforming freestanding implementation).
Exists two environments (for each of standard's implementation): a hosted environment (for full standard), and a freestanding environment (for light standard).
The light versions is for OS developing. The full versions is for programs, which will work in OS.
And I not understood the phrase about the main function.
I ask to explain me this fragment of text.
It's a bit of both.
The standard defines two runtime environments. One has all of the language, plus a tiny subset of the standard runtime library, plus additional implementation-defined stuff. That's a freestanding environment, and is (as you guessed) intended for programming on the bare metal, e.g. an OS kernel.
The other, more sophisticated environment includes all of the facilities of the above plus all of the standard runtime library. That's a hosted environment, which is intended for application programming.
Now, an implementation is only required to include the facilities of the freestanding environment. If that's all it has, it's called a freestanding implementation. Cross-compilers for deeply embedded microcontrollers are often freestanding implementations, because much of the standard C runtime doesn't make sense or is too big to fit.
Implementing the hosted environment is optional; if an implementation provides the hosted environment it's called a hosted implementation. A hosted implementation must also provide the freestanding environment, i.e. a compilation mode in which only the facilities of a freestanding implementation are available. (This mode would typically be used for compiling things like the C runtime itself, most of which is just more C.)
Finally, the standard signatures for main (int main(void) and int main(int, char **)) are part of the hosted environment. A freestanding environment can use those signatures as well, but it can also define the signature of main to be whatever it likes (void main(void) is common) or use a different name for the entry point.
C11 4/6:
The two forms of conforming implementation are hosted and freestanding. A conforming hosted implementation shall accept any strictly conforming program. A conforming freestanding implementation shall accept any strictly conforming program in which the use of the features specified in the library clause (clause 7) is confined to the contents of the standard headers <float.h>, <iso646.h>, <limits.h>, <stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and <stdnoreturn.h>.
First note that "implementation", in the context of the C standard, means "the implementation of a C compiler" and nothing else.
As you correctly state in your question, the freestanding implementation is a compiler for a system that isn't intended to have an operative system beneath it. In other words a freestanding implementation compiler produces programs that are either embedded applications running on a "bare bone" CPU, or programs that are operative systems in themselves. While a hosted implementation is a compiler intended for applications running on top of a OS.
The compiler for freestanding applications only needs to provide the above mentioned headers. The rest of the headers (such as stdio.h) are defined in the mentioned "clause 7" of the standard, but they are not mandatory for a freestanding implementation.
Note however that several libraries are not mandatory to neither hosted nor freestanding implementations, for example the complex number library:
C11 7.3.1:
"Implementations that define the macro _ _STDC_NO_COMPLEX_ _ need not
provide this header nor support any of its facilities."
Furthermore, the two different execution environments, freestanding and hosted, allow different syntax for main(), more info can be found here. A very common misunderstanding among programmers is that the only allowed form in C is int main(), which is only true in a hosted environment.
For example, a freestanding program could start at an out-of-reset interrupt service routine. From there it can call a void main() function, or it could call some other function entirely: it is implementation-defined.
What it means is that a freestanding environment is not required to execute your main() function upon startup. For example, it might be looking for _main() instead (the exact name and signature is implementation defined).
I recently read an interview with Lua co-creators Luiz H. de Figueredo and Roberto Ierusalimschy, where they discussed the design, and implementation of Lua. It was very intriguing to say the least. However, one part of the discussion brought something up in my mind. Roberto spoke of Lua as a "freestanding application" (that is, it's pure ANSI C that uses nothing from the OS.) He said, that the core of Lua was completely portable, and because of its purity has been able to be ported much more easily and to platforms never even considered (such as robots, and embedded devices.)
Now this makes me wonder. C in general is a very portable language. So, what parts of C (namely those in the the standard library) are the most unportable? and what are those that can be expected to work on most platforms? Should only a limited set of data types be used (e.g. avoiding short and maybe float)? What about the FILE and the stdio system? malloc and free? It seems that Lua avoids all of these. Is that taking things to the extreme? Or are they the root of portability issues? Beyond this, what other things can be done to make code extremely portable?
The reason I'm asking all of this, is because I'm currently writing an application in pure C89, and it's optimal that it be as portable as possible. I'm willing take a middle road in implementing it (portable enough, but no so much that I have to write everything from scratch.) Anyways, I just wanted to see what in general is key to writing the best C code.
As a final note, all of this discussion is related to C89 only.
In the case of Lua, we don't have much to complain about the C language itself but we have found that the C standard library contains many functions that seem harmless and straight-forward to use, until you consider that they do not check their input for validity (which is fine if inconveninent). The C standard says that handling bad input is undefined behavior, allowing those functions to do whatever they want, even crash the host program. Consider, for instance, strftime. Some libc's simply ignore invalid format specifiers but other libc's (e.g., in Windows) crash! Now, strftime is not a crucial function. Why crash instead of doing something sensible? So, Lua has to do its own validation of input before calling strftime and exporting strftime to Lua programs becomes a chore. Hence, we have tried to stay clear from these problems in the Lua core by aiming at freestanding for the core. But the Lua standard libraries cannot do that, because their goal is to export facilities to Lua programs, including what is available in the C standard library.
"Freestanding" has a particular meaning in the context of C. Roughly, freestanding hosts are not required to provide any of the standard libraries, including the library functions malloc/free, printf, etc. Certain standard headers are still required, but they only define types and macros (for example stddef.h).
C89 allows two types of compilers: hosted and freestanding. The basic difference is that a hosted compiler provides all of the C89 library, while a freestanding compiler need only provide <float.h>, <limits.h>, <stdarg.h>, and <stddef.h>. If you limit yourself to these headers, your code will be portable to any C89 compiler.
This is a very broad question. I'm not going to give the definite answer, instead I'll raise some issues.
Note that the C standard specifies certain things as "implementation-defined"; a conforming program will always compile on and run on any conforming platform, but it may behave differently depending on the platform. Specifically, there's
Word size. sizeof(long) may be four bytes on one platform, eight on another. The sizes of short, int, long etc. each have some minimum (often relative to each other), but otherwise there are no guarantees.
Endianness. int a = 0xff00; int b = ((char *)&a)[0]; may assign 0 to b on one platform, -1 on another.
Character encoding. \0 is always the null byte, but how the other characters show up depends on the OS and other factors.
Text-mode I/O. putchar('\n') may produce a line-feed character on one platform, a carriage return on the next, and a combination of each on yet another.
Signedness of char. It may or it may not be possible for a char to take on negative values.
Byte size. While nowadays, a byte is eight bits virtually everywhere, C caters even to the few exotic platforms where it is not.
Various word sizes and endiannesses are common. Character encoding issues are likely to come up in any text-processing application. Machines with 9-bit bytes are most likely to be found in museums. This is by no means an exhaustive list.
(And please don't write C89, that's an outdated standard. C99 added some pretty useful stuff for portability, such as the fixed-width integers int32_t etc.)
C was designed so that a compiler may be written to generate code for any platform and call the language it compiles, "C". Such freedom acts in opposition to C being a language for writing code that can be used on any platform.
Anyone writing code for C must decide (either deliberately or by default) what sizes of int they will support; while it is possible to write C code which will work with any legal size of int, it requires considerable effort and the resulting code will often be far less readable than code which is designed for a particular integer size. For example, if one has a variable x of type uint32_t, and one wishes to multiply it by another y, computing the result mod 4294967296, the statement x*=y; will work on platforms where int is 32 bits or smaller, or where int is 65 bits or larger, but will invoke Undefined Behavior in cases where int is 33 to 64 bits, and the product, if the operands were regarded as whole numbers rather than members of an algebraic ring that wraps mod 4294967296, would exceed INT_MAX. One could make the statement work independent of the size of int by rewriting it as x*=1u*y;, but doing so makes the code less clear, and accidentally omitting the 1u* from one of the multiplications could be disastrous.
Under the present rules, C is reasonably portable if code is only used on machines whose integer size matches expectations. On machines where the size of int does not match expectations, code is not likely to be portable unless it includes enough type coercions to render most of the language's typing rules irrelevant.
Anything that is a part of the C89 standard should be portable to any compiler that conforms to that standard. If you stick to pure C89, you should be able to port it fairly easily. Any portability problems would then be due to compiler bugs or places where the code invokes implementation-specific behavior.