I am using a C library provided to me already compiled. I have limited information on the compiler, version, options, etc., used when compiling the library. The library interface uses enum both in structures that are passed and directly as passed parameters.
The question is: how can I assure or establish that when I compile code to use the provided library, my compiler will use the same size for those enums? If it does not, the structures won't line up, and the parameter passing may be messed up, e.g. long vs. int.
My concern stems from the C99 standard, which states that the enum type:
shall be compatible with char, a signed integer type, or an unsigned
integer type. The choice of type is implementation-defined, but shall
be capable of representing the values of all the members of the
enumeration.
As far as I can tell, so long as the largest value fits, the compiler can pick any type it darn well pleases, effectively on a whim, potentially varying not only between compilers, but different versions of the same compiler and/or compiler options. It could pick 1, 2, 4, or 8-byte representations, resulting in potential incompatibilities in both structures and parameter passing. (It could also pick signed or unsigned, but I don't see a mechanism for that being a problem in this context.)
Am I missing something here? If I am not missing something, does this mean that enum should never be used in an API?
Update:
Yes, I was missing something. While the language specification doesn't help here, as noted by @Barmar the Application Binary Interface (ABI) does. Or if it doesn't, then the ABI is deficient. The ABI for my system indeed specifies that an enum must be a signed four-byte integer. If a compiler does not obey that, then it is a bug. Given a complete ABI and compliant compilers, enum can be used safely in an API.
APIs that use enum are depending on the assumption that the compiler will be consistent, i.e. given the same enum declaration, it will always choose the same underlying type.
While the language standard doesn't specifically require this, it would be quite perverse for a compiler to do anything else.
Furthermore, all compilers for a particular OS need to be consistent with the OS's ABI. Otherwise, you would have far more problems, such as the library using 64-bit int while the caller uses 32-bit int. Ideally, the ABI should constrain the representation of enums, to ensure compatibility.
More generally, the language specification only ensures compatibility between programs compiled with the same implementation. The ABI ensures compatibility between programs compiled with different implementations.
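If you want to guard against a non-conforming compiler anyway, a cheap, C89-compatible compile-time check can be placed in the code that uses the library. This is only a sketch; the enum and typedef names are made up, and 4 bytes is the size the asker's ABI happens to promise:
enum lib_status { LIB_OK, LIB_ERROR };

/* Compilation fails (negative array size) if the compiler picks a size other than 4 bytes. */
typedef char lib_status_abi_check[(sizeof(enum lib_status) == 4) ? 1 : -1];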
From the question:
The ABI for my system indeed specifies that an enum must be a signed four-byte integer. If a compiler does not obey that, then it is a bug.
I'm surprised about that. I suspect in reality your compiler will select a 64-bit (8 byte) size for your enum if you define an enumerated constant with a value larger than 2^32.
On my platforms (MinGW gcc 4.6.2 targeting x86 and gcc 4.4 on Linux targeting x86_64), the following code says that I get both 4 and 8 byte enums:
#include <stdio.h>

enum { a } foo;
enum { b = 0x123456789 } bar;

int main(void) {
    printf("%lu\n", (unsigned long)sizeof(foo));
    printf("%lu\n", (unsigned long)sizeof(bar));
    return 0;
}
I compiled with -Wall -std=c99 switches.
I guess you could say that this is a compiler bug. But the alternatives of removing support for enumerated constants larger than 2^32 or always using 8-byte enums both seem undesirable.
Given that these common versions of GCC don't provide a fixed size enum, I think the only safe action in general is to not use enums in APIs.
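If you do need enum-like constants at an API boundary, one workaround is to keep the enum for naming the constants but make every externally visible field and parameter an explicit fixed-width integer. A sketch, assuming <stdint.h> is available; all of the names here are illustrative:
#include <stdint.h>

enum color { COLOR_RED = 0, COLOR_GREEN = 1, COLOR_BLUE = 2 };

struct pixel {
    int32_t color;    /* holds an enum color value, but its size can never vary */
};

void set_color(struct pixel *p, int32_t c);    /* pass int32_t, not enum color */

The enum then exists only for readability and documentation; the ABI sees nothing but int32_t.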
Further notes for GCC
Compiling with "-pedantic" causes the following warnings to be generated:
main.c:4:8: warning: integer constant is too large for 'long' type [-Wlong-long]
main.c:4:12: warning: ISO C restricts enumerator values to range of 'int' [-pedantic]
The behavior can be tailored via the -fshort-enums and -fno-short-enums switches.
Results with Visual Studio
Compiling the above code with VS 2008 x86 causes the following warnings:
warning C4341: 'b' : signed value is out of range for enum constant
warning C4309: 'initializing' : truncation of constant value
And with VS 2013 x86 and x64, just:
warning C4309: 'initializing' : truncation of constant value
Related
The C11 spec on enums¹ states that the enumeration constants must have type int (1440-1441):
1440 The expression that defines the value of an enumeration constant shall be an integer constant expression that has a value representable as an int.
1441 The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted.107)
However, it indicates that the backing type of the enum can be char, a signed integer type, or an unsigned integer type, so long as it can represent all of the constants in the enum (1447-1448):
1447 Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type.
1448 The choice of type is implementation-defined,108) but shall be capable of representing the values of all the members of the enumeration.
This seems to indicate that only the compiler can know the width of an enum type, which is fine until you consider an array of enum types as part of a dynamically linked library.
Say you had a function:
enum my_enum return_fifth(enum my_enum lst[]) {
    return lst[5];
}
This would be fine when linking statically, because the compiler knows the size of my_enum, but any other C code linking to it may not.
So, how is it possible for one C library to dynamically link to another C library, and know how the compiler decided to implement the enums? (Or do most modern compilers just stick with int/uint and forgo using chars altogether?)
¹Okay, I know this website is not quite the C11 standard, whereas this one is a bit closer: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf
The C standard doesn't say anything about dynamic libraries or even static libraries; those concepts don't exist in the standard. They are purely a matter of the implementation.
But, as you said, nothing prevents a compiler from using a different type for an enumeration, which means one compiler can use one type and another compiler a different one.
This would be fine when linking statically
In fact, no. Say A is a compiler that uses char and B is a compiler that uses int, and say these types are not the same size. You compile a static library with compiler A, and you statically link this library into a program compiled by B. The link is static, and still B can't know that A doesn't use B's type for the enum.
So, how is it possible for one C library to dynamically link to another C library
Well, as I said, this is not guaranteed for a static library, and for the same reason it is not guaranteed for a dynamic library.
Or do most modern compilers just stick with int/uint and forgo using chars altogether?
The big compilers do generally agree on the same rules for the same environment, yes. But nothing in C guarantees this behavior. (On the compatibility problem: a lot of people speak of "the C ABI", even though no such thing exists in the standard.)
So the best advice is to compile your dynamic library with the same compiler and options used to compile your main program; checking your compiler's documentation is also a big plus.
The complete definition of the enum needs to be visible at the time it is used. Then the compiler will know what the size will be.
Forward declarations of enum types are not allowed.
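A minimal illustration of the point above (the enum here is made up): the full enumerator list has to appear before the type is used, because only then is its size known.
#include <stdio.h>

enum state { STATE_IDLE, STATE_BUSY, STATE_ERROR };    /* complete definition comes first */

int main(void) {
    printf("%zu\n", sizeof(enum state));    /* fine: the enumerator list is visible here */
    return 0;
}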
Some background:
the header stdint.h has been part of the C standard since C99. It provides typedefs that are guaranteed to be 8-, 16-, 32-, and 64-bit integers, both signed and unsigned. This header is not part of the C89 standard, though, and I haven't yet found any straightforward way to ensure that my datatypes have a known length.
Getting to the actual topic
The following code is how SQLite (written in C89) defines 64-bit integers, but I don't find it convincing. That is, I don't think it's going to work everywhere. Worst of all, it could fail silently:
/*
** CAPI3REF: 64-Bit Integer Types
** KEYWORDS: sqlite_int64 sqlite_uint64
**
** Because there is no cross-platform way to specify 64-bit integer types
** SQLite includes typedefs for 64-bit signed and unsigned integers.
*/
#ifdef SQLITE_INT64_TYPE
typedef SQLITE_INT64_TYPE sqlite_int64;
typedef unsigned SQLITE_INT64_TYPE sqlite_uint64;
#elif defined(_MSC_VER) || defined(__BORLANDC__)
typedef __int64 sqlite_int64;
typedef unsigned __int64 sqlite_uint64;
#else
typedef long long int sqlite_int64;
typedef unsigned long long int sqlite_uint64;
#endif
typedef sqlite_int64 sqlite3_int64;
typedef sqlite_uint64 sqlite3_uint64;
So, this is what I've been doing so far:
Checking that the "char" data type is 8 bits long, since it's not guaranteed to be. If the preprocessor variable "CHAR_BIT" is not equal to 8, compilation fails
Now that "char" is guaranteed to be 8 bits long, I create a struct containing an array of several unsigned chars, which correspond to several bytes in the integer.
I write "operator" functions for my datatypes. Addition, multiplication, division, modulo, conversion from/to string, etc.
I have abstracted this process in a header file, which is the best I can do with what I know, but I wonder if there is a more straightforward way to achieve this.
I'm asking because I want to write a portable C library.
First, you should ask yourself whether you really need to support implementations that don't provide <stdint.h>. It was standardized in 1999, and even many pre-C99 implementations are likely to provide it as an extension.
Assuming you really need this, Doug Gwyn, a member of the ISO C standard committee, created an implementation of several of the new headers for C9x (as C99 was then known), compatible with C89/C90. The headers are in the public domain and should be reasonably portable.
http://www.lysator.liu.se/(nobg)/c/q8/index.html
(As I understand it, the name "q8" has no particular meaning; he just chose it as a reasonably short and unique search term.)
One rather nasty quirk of integer types in C stems from the fact that many "modern" implementations will have, for at least one size of integer, two incompatible signed types of that size with the same bit representation and likewise two incompatible unsigned types. Most typically the types will be 32-bit "int" and "long", or 64-bit "long" and "long long". The "fixed-sized" types will typically alias to one of the standard types, though implementations are not consistent about which one.
Although compilers used to assume that accesses to one type of a given size might affect objects of the other, the authors of the Standard didn't mandate that they do so (probably because there would have been no point ordering people to do things they would do anyway and they couldn't imagine any sane compiler writer doing otherwise; once compilers started doing so, it was politically difficult to revoke that "permission"). Consequently, if one has a library which stores data in a 32-bit "int" and another which reads data from a 32-bit "long", the only way to be assured of correct behavior is to either disable aliasing analysis altogether (probably the sanest choice while using gcc) or else add gratuitous copy operations (being careful that gcc doesn't optimize them out and then use their absence as an excuse to break code--something it sometimes does as of 6.2).
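As a minimal sketch of the hazard described above, assuming a target where int and long are both 32 bits wide:
int shared;    /* one library stores its data here as a 32-bit int */

long read_back(void)
{
    /* Another library reads the same storage through a 32-bit long lvalue.
       Even when int and long have identical size and representation, this
       access violates the effective-type (strict aliasing) rules, so an
       optimizer doing type-based aliasing analysis may break it. */
    return *(long *)(void *)&shared;
}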
I am in the early stages of framing stuff out on a new project.
I defined a function with a return type of "bool"
I got this output from PC-Lint
Including file sockets.h (hdr)
bool sock_close(uint8_t socket_id);
^
"LINT: sockets.h (52, 1) Note 970: Use of modifier or type '_Bool' outside of a typedef [MISRA 2012 Directive 4.6, advisory]"
I went ahead and defined this in another header to shut lint up:
typedef bool bool_t;
Then I started wondering why I had to do that and why it changed anything. I turned to MISRA 2012 Dir 4.6. It is concerned mostly with the width and signedness of primitive types like short, int, and long.
The standard does not give any amplification, rationale, exception, or example for bool.
bool is explicitly defined as _Bool in stdbool.h in C99. So does this criterion really apply to bool?
I thought _Bool was explicitly always the "smallest standard unsigned integer type large enough to store the values 0 and 1" according to section 6.2.5 of C99. So we know bool is unsigned. Is the issue then just that _Bool is not fixed-width and is subject to being promoted somehow? Because the rationale would seem to contradict that notion.
Adherence to this guideline does not guarantee portability because the size of the int type may determine whether or not an expression is subject to integer promotion.
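For instance, whether arithmetic on 16-bit operands is promoted to int depends on how wide int is on the target (a small sketch):
#include <stdint.h>

uint16_t add16(uint16_t a, uint16_t b)
{
    /* If int is wider than 16 bits, a and b are promoted to (signed) int and
       the addition cannot wrap; if int is 16 bits, the operands promote to
       unsigned int and 0xFFFF + 1 wraps to 0. The final cast hides the
       difference here, but comparisons or wider intermediate uses would not. */
    return (uint16_t)(a + b);
}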
How does just writing typedef bool bool_t; change anything, given that I do nothing to indicate the width or the signedness in doing so? The width of bool_t will be platform-dependent too. Is there a better way to redefine bool?
A type must not be defined with a specific length unless the implemented type is actually of that length
so typedef bool bool8_t; should be totally illegal.
Is Gimpel wrong in their interpretation of Directive 4.6 or are they spot on?
Use of modifier or type '_Bool' outside of a typedef [MISRA 2012 Directive 4.6, advisory]
That's nonsense; directive 4.6 is only concerned with using the types in stdint.h rather than int, short, etc. The directive is about the basic numerical types. bool has nothing to do with that directive whatsoever, as it is not a numerical type.
For reasons unknown, the MISRA-C:2012 examples use a weird type called bool_t, which isn't standard. But MISRA by no means enforces this type to be used anywhere; in particular, it is not enforced by directive 4.6, which doesn't even mention booleans. MISRA does not discourage the use of bool or _Bool anywhere.
Is Gimpel wrong in their interpretation of Directive 4.6
Yes, their tool is giving incorrect diagnostics.
In addition, you may have to configure the tool (if possible) to tell it which bool type is used. Section 5.3.2 mentions that you might have to do so if not using _Bool, implying that all static analysers must understand _Bool. But even if the bool type is correctly configured, directive 4.6 has nothing to do with it.
A potential concern with Boolean types is that a lot of code written before C99 used a single-byte type to hold true/false values, and a fair amount of it may have used the name "bool". Attempting to store any multiple of 256 into most such types would be regarded as storing zero, while storing a non-zero multiple of 256 into a C99 "bool" would yield 1. If a piece of code which uses a C99 "bool" is ported into a piece of code that uses a typedef'ed byte, the resulting code could very easily malfunction (it's somewhat less likely that code written for a typedef'ed byte would rely upon any particular behavior when storing a value other than 0 or 1).
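A minimal illustration of that difference, assuming CHAR_BIT is 8 as on typical platforms:
#include <stdbool.h>
#include <stdio.h>

typedef unsigned char old_bool;    /* a typical pre-C99 hand-rolled "bool" */

int main(void) {
    old_bool a = 256;    /* wraps modulo 256: stored value is 0, reads as false */
    bool     b = 256;    /* conversion to _Bool: any nonzero value becomes 1, reads as true */
    printf("%d %d\n", (int)a, (int)b);    /* prints "0 1" */
    return 0;
}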
Which of these items can safely be assumed to be defined in any practically-usable platform ABI?
Value of CHAR_BIT
Size, alignment requirements and object representation of:
void*, size_t, ptrdiff_t
unsigned char and signed char
intptr_t and uintptr_t
float, double and long double
short and long long
int and long (but here I expect a "no")
Pointer to an object type for which the platform ABI specifies these properties
Pointer to function whose type only involves types for which the platform ABI specifies these properties
Object representation of a null object pointer
Object representation of a null function pointer
For example, if I have a library (compiled by an unknown, but ABI-conforming compiler) which publishes this function:
void* foo(void *bar, size_t baz, void* (*qux)());
can I assume to be able to safely call it in my program regardless of the compiler I use?
Or, taken the other way round, if I am writing a library, is there a set of types such that if I limit the library's public interface to this set, it will be guaranteed to be usable on all platforms where it builds?
I don't see how you can expect any library to be universally compatible. If that were possible, there would not be so many compiled variations of libraries.
For example, you could call a 64-bit library from a 16-bit program as long as you set up the call correctly. But you would have to know you're calling a 64-bit based library.
Portability is a much-talked about goal, but few truly achieve it. After 30+ years of system-level, firmware and application programming, I think of it as more of a fantasy versus a goal. Unfortunately, hardware forces us to optimize for the hardware. Therefore, when I write a library, I use the following:
Compile for ABI
Use a pointer to a structure for input and output for all function calls:
int lib_func(struct lib_input *input, struct lib_output *output);
Where the returned int indicates errors only. I make all error codes unique. I require the user to call an init function prior to any use of the library. The user calls it as:
lib_init(sizeof(int), sizeof(char *), sizeof(long), sizeof(long long));
So that I can decide if there will be any trouble or modify any assumptions if needed. I also add a function allowing the user to learn my data sizes and alignment in addition to version numbers.
This is not to say the user or I are expected to modify code on the fly or spend lots of CPU power reworking structures. But this allows the application to make absolutely sure it's compatible with me and vice versa.
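A sketch of what the library side of that init call might look like (the error code and exact signature are hypothetical):
#include <stddef.h>

#define LIB_ERR_ABI_MISMATCH 1

int lib_init(size_t int_size, size_t ptr_size, size_t long_size, size_t llong_size)
{
    /* Compare the caller's sizes against the ones this library was compiled with. */
    if (int_size != sizeof(int) || ptr_size != sizeof(char *) ||
        long_size != sizeof(long) || llong_size != sizeof(long long))
        return LIB_ERR_ABI_MISMATCH;    /* caller and library disagree; refuse to continue */
    return 0;
}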
The other option which I have employed in the past, is to simply include several entry-point functions with my library. For example:
int lib_func32();
int lib_func16();
int lib_func64();
It makes a bit of a mess for you, but you can then fix it up using the preprocessor:
#ifdef LIB_USE32
#define lib_function lib_func32
#endif
You can do the same with data structures but I'd recommend using the same size data structure regardless of CPU size -- unless performance is a top-priority. Again, back to the hardware!
The final option I explore is whether to have entry functions of all sizes and styles which convert the input to my library's expectations, as well as my library's output.
For example, your lib_func32(&input, &output) can be compiled to expect a 32-bit aligned, 32-bit pointer but it converts the 32-bit struct into your internal 64-bit struct then calls your 64 bit function. When that returns, it reformats the 64-bit struct to its 32-bit equivalent as pointed to by the caller.
int lib_func32(struct lib_struct32 *input32, struct lib_struct32 *output32)
{
    /* The struct tag names here are placeholders for the library's real types. */
    struct lib_struct64 input64;
    struct lib_struct64 output64;
    int retval;

    lib_convert32_to_64(input32, &input64);      /* widen the caller's 32-bit struct */
    retval = lib_func64(&input64, &output64);    /* call the native 64-bit entry point */
    lib_convert64_to_32(&output64, output32);    /* narrow the result back for the caller */
    return retval;
}
In summary, a totally portable solution is not viable. Even if you begin with total portability, eventually you will have to deviate. This is when things truly get messy. You break your style for deviations which then breaks your documentation and confuses users. I think it's better to just plan it from the start.
Hardware will always cause you to have deviations. Just consider how much trouble 'endianness' causes -- not to mention the number of CPU cycles which are used each day swapping byte orders.
The C standard contains an entire section in the appendix summarizing just that:
J.3 Implementation-defined behavior
A completely random subset:
The number of bits in a byte
Which of signed char and unsigned char is the same as char
The text encodings for multibyte and wide strings
Signed integer representation
The result of converting a pointer to an integer and vice versa (6.3.2.3). Note that this means any pointer, not just object pointers.
Update: To address your question about ABIs: An ABI (application binary interface) is not a standardized concept, and it isn't said anywhere that an implementation must even specify an ABI. The ingredients of an ABI are partly the implementation-defined behaviour of the language (though not all of it; e.g. the result of converting an out-of-range value to a signed integer type is implementation-defined, but not part of an ABI), and most of the implementation-defined aspects of the language are dictated by the hardware (e.g. signed integer representation, floating point representation, size of pointers).
However, more important aspects of an ABI are things like how function calls work, i.e. where the arguments are stored, who's responsible for cleaning up the memory, etc. It is crucial for two compilers to agree on those conventions in order for their code to be binarily compatible.
In practice, an ABI is usually the result of an implementation. Once the compiler is complete, it determines -- by virtue of its implementation -- an ABI. It may document this ABI, and other compilers, and future versions of the same compiler, may like to stick to those conventions. For C implementations on x86, this has worked rather well and there are only a few, usually well documented, free parameters that need to be communicated for code to be interoperable. But for other languages, most notably C++, you have a completely different picture: There is nothing coming near a standard ABI for C++ at all. Microsoft's compiler breaks the C++ ABI with every release. GCC tries hard to maintain ABI compatibility across versions and uses the published Itanium ABI (ironically for a now dead architecture). Other compilers may do their own, completely different thing. (And then you have of course issues with C++ standard library implementations, e.g. does your string contain one, two, or three pointers, and in which order?)
To summarize: many aspects of a compiler's ABI, especially pertaining to C, are dictated by the hardware architecture. Different C compilers for the same hardware ought to produce compatible binary code as long as certain aspects like function calling conventions are communicated properly. However, for higher-level languages all bets are off, and whether two different compilers can produce interoperable code has to be decided on a case-by-case basis.
If I understand your needs correctly, the uintN_t-style types are the only ones that will give you a binary compatibility guarantee (and of course int and char will too, but the others tend to differ), e.g. long differs between Windows and Linux: 64-bit Windows treats it as 4 bytes and 64-bit Linux as 8 bytes. If you are really dependent on the ABI, you have to plan for the platforms you are going to deliver on, and maybe use typedefs to make things standardized and readable.
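For example, a public header might pin down every type it exposes (a sketch; the names are illustrative):
#include <stdint.h>

/* Fixed-width types keep the interface layout identical across platforms. */
typedef struct {
    int32_t  id;
    uint64_t length;
} lib_record;

int32_t lib_process(const lib_record *rec);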
In a discussion, a colleague told me that he never uses enum because he experienced that some C-compilers don't cope with the enum statement correctly.
He couldn't remember which compiler exactly had problems but among the problems, there were errors when doing something like
enum my_enum {
    my_enum_first = 5,
    my_enum_second = 10
};
i.e. initializing enum values instead of letting the compiler do the automatic assignment. Another one was that the compiler decides for itself how big the enum is and therefore you could have unpredictable behavior for sizeof my_enum when compiling your code under various platforms.
To get around that, he told me it would be better to use #defines to define the constant elements. But especially when using doxygen it's quite handy to have an enum (e.g. as a function parameter) because in the generated documentation, you can simply click on my_enum and jump directly to the description of my_enum.
Another example would be code completion, where your IDE tells you what you could specify as valid parameters for functions. I know that – as long as you're compiling the code as C code – there's no type safety (i.e. I could also specify 5 instead of my_enum_first), so the use of an enum seems to be a more cosmetic thing.
The question is: do you know any compilers that have limitations regarding the usage of enum?
Edit 1:
Regarding the environment: we are developing for various embedded platforms, so there could also be a compiler for some obscure micro-controller...
Edit 2:
He could tell me that the KEIL C51 compiler didn't play well with enums. Are there any experiences with current versions of the C51 compiler?
Compilers are free to choose the size of an enum based on its range of possible values. This only really becomes an issue if you're exposing enums in your API, and users of your code may be using a different compiler or build options.
In this case, confusion can be caused by the calling code passing in a 16-bit value, for example, and the receiving code expecting it to be 32 bits. If the top 16 bits of the passed-in value are left uninitialized, then bad things will happen.
You can work around this kind of issue by including a dummy entry in your enum to enforce a minimum size.
For example:
typedef enum {
    FirstValue = 12,
    SecondValue = 25,
    DummyValue = 65536    // force the enum to be larger than 16 bits
} MyEnum;
I'm pretty sure that a compiler that doesn't play nice with enum is an invalid compiler - enum is specified in the standard, so a failure to implement it means the compiler shouldn't technically be used to compile C. (For the record, the scope of enumeration types is discussed in 6.2.1 and enumerations are defined as types in 6.2.5 of C99, so one would assume that they're a valid part of the standard from there on in!)
So no, I don't know of any such compilers.