What is the size of an enum in C?

I'm creating a set of enum values, but I need each enum value to be 64 bits wide. If I recall correctly, an enum is generally the same size as an int; but I thought I read somewhere that (at least in GCC) the compiler can make the enum any width they need to be to hold their values. So, is it possible to have an enum that is 64 bits wide?

Taken from the current C Standard (C99): http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
6.7.2.2 Enumeration specifiers
[...]
Constraints
The expression that defines the value of an enumeration constant shall be an integer
constant expression that has a value representable as an int.
[...]
Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined, but shall be
capable of representing the values of all the members of the enumeration.
Not that compilers are any good at following the standard, but essentially: if your enum holds anything other than an int, you're in deep "unsupported behavior that may come back to bite you in a year or two" territory.
Update: The latest publicly available draft of the C Standard (C11): http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf contains the same clauses. Hence, this answer still holds for C11.

An enum is only guaranteed to be large enough to hold int values. The compiler is free to choose the actual type used based on the enumeration constants defined so it can choose a smaller type if it can represent the values you define. If you need enumeration constants that don't fit into an int you will need to use compiler-specific extensions to do so.

While the previous answers are correct, some compilers have options to break the standard and use the smallest type that will contain all values.
Example with GCC (documentation in the GCC Manual):
enum ord {
    FIRST = 1,
    SECOND,
    THIRD
} __attribute__ ((__packed__));
STATIC_ASSERT( sizeof(enum ord) == 1 );

Some compilers will widen the enumeration's underlying type if you set the last enumerator to a value large enough to require the size you want:
enum value{a=0,b,c,d,e,f,g,h,i,j,l,m,n,last=0xFFFFFFFFFFFFFFFF};
Be aware that this relies on a compiler extension: the standard requires every enumeration constant to be representable as an int, so this is not portable.

We have no control over the size of an enum variable; it is entirely implementation-defined. enum merely lets you give names to integer values, so an enum typically follows the size of an int.

In C, an enum is only guaranteed to be able to hold values of type int. GCC has a compile-time option (-fshort-enums) to make it as short as possible (mainly useful when no value exceeds 64K). There is no standard compile-time option to increase its size to 64 bits.

Consider this code:
enum value{a,b,c,d,e,f,g,h,i,j,l,m,n};
enum value s;
printf("%zu\n", sizeof(s));
On a typical platform it prints 4. So no matter how many elements an enum contains, the compiler picks a single integer type for the whole enumeration; its size does not grow with the number of enumerators.

Related

Can I treat an `enum` variable as an `int` in C17?

TL;DR: Is it right to assume, given enum NAME {...};, that enum NAME n is the same as int n during execution? Can n be operated on as if it were a signed int, even though it is declared as enum NAME? The reason: I really want to use enum types for return flags, as a type 'closed' with respect to bit-operations.
For example: Let typedef enum FLAGS { F1 = 0x00000001, F2 = 0x00000002, F3 = 0x00000004 } FLAGS ;
Then, FLAGS f = F1 | F2; assigns 3 to f, throwing no related errors or warnings. This and numerous other compiler-permitted usage scenarios, such as f++, makes me think I could legit treat f as if it were a signed int. Compiler used: MSVC'19, 16.9.1, with setting "C17 (2018) Standard (/std:c17)";
I searched the standard (the sketch here) and looked at other related questions, and found no mention of what I suspected (and wished) to be a "silent promotion" of enum NAME x to signed int x, even though the identifiers have that type. This leads me to believe that the way an enum behaves when assigned a value that isn't a member is implementation-dependent. I'm asking, in part, in order to confirm or deny this claim.
C 2018 6.7.2.2 4 says:
Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration…
So the answer to “Can I treat an enum variable as an int in C17?” is no, as an object with enumerated type might be effectively a char or other integer type different from int.
However, it is effectively an integer type, so FLAGS f = F1 | F2; will work: The FLAGS type must be capable of representing its values F1 and F2, so whatever type is used for FLAGS must contain all the bits of F1 and of F2, so it contains all the bits of F1 | F2.
Technically, you could construct a trap representation by manipulating bits, so it is not guaranteed that the type is closed under bit operations. For example, if a C implementation used two's complement for 32-bit int but reserved the bit pattern 1000…0000 as a trap representation, then INT_MIN & -2 would be a trap representation. (INT_MIN would have the bit pattern 1000…0001, for −(2^31 − 1), and -2 would have the pattern 1111…1110.) This does not occur in C implementations without trap representations in their integer types.
We might question whether the fact that two types (an enumeration and its implementation-defined integer type) are compatible means we can use one as the other. Two types are compatible if they are the same (6.2.7 1), and the only things that can make types compatible but not the same involve qualifiers (like const) that are not an issue for this or involve other properties (such as array dimensions) that are not relevant to simple integer types.
This is in chapter 6.4.4.3 of the PDF you linked:
An identifier declared as an enumeration constant has type int.
Your thought of a promotion of enum NAME x to signed int x is not really true: it is the enumeration constants (the identifiers in the enumerator list) that have type int. A variable x of type enum NAME has the enumeration's implementation-defined compatible integer type, and it is promoted to int in expressions like any other integer.
Additionally, integer promotion takes place in integer operations.
EDIT
Some compilers are quite serious about the difference between enum and int, especially if they have an option to reduce the bit width to the smallest possible. For example, the one I'm using in a job's project, automatically inserts checks on each usage of an enum value against the defined values. Additionally, IIRC, it rejects all implicit conversions, we need to cast explicitly similarly to:
FLAGS f = (FLAGS)((int)F1 | (int)F2);
But this is an extension of this special beast called with specific safety options...

When should I use UINT32_C(), INT32_C(),... macros in C?

I switched to fixed-length integer types in my projects mainly because they help me think about integer sizes more clearly when using them. Including them via #include <inttypes.h> also includes a bunch of other macros like the printing macros PRIu32, PRIu64,...
To assign a constant value to a fixed length variable I can use macros like UINT32_C() and INT32_C(). I started using them whenever I assigned a constant value.
This leads to code similar to this:
uint64_t i;
for (i = UINT64_C(0); i < UINT64_C(10); i++) { ... }
Now I saw several examples which did not care about that. One is the stdbool.h include file:
#define bool _Bool
#define false 0
#define true 1
bool has a size of 1 byte on my machine, so it does not look like an int. But 0 and 1 should be integers which should be turned automatically into the right type by the compiler. If I would use that in my example the code would be much easier to read:
uint64_t i;
for (i = 0; i < 10; i++) { ... }
So when should I use the fixed length constant macros like UINT32_C() and when should I leave that work to the compiler (I'm using GCC)? What if I were writing MISRA C code?
As a rule of thumb, you should use them when the type of the literal matters. There are two things to consider: the size and the signedness.
Regarding size:
An int is guaranteed by the C standard to be able to represent values up to 32767. Since you can't get an integer literal with a smaller type than int, values up to 32767 should not need the macros. If you need larger values, then the type of the literal starts to matter and it is a good idea to use those macros.
Regarding signedness:
Integer literals with no suffix are usually of a signed type. This is potentially dangerous, as it can cause all manner of subtle bugs during implicit type promotion. For example (my_uint8_t + 1) << 31 would cause an undefined behavior bug on a 32 bit system, while (my_uint8_t + 1u) << 31 would not.
This is why MISRA has a rule stating that all integer literals should have an u/U suffix if the intention is to use unsigned types. So in my example above you could use my_uint8_t + UINT32_C(1) but you can as well use 1u, which is perhaps the most readable. Either should be fine for MISRA.
As for why stdbool.h defines true/false to be 1/0, it is because the standard explicitly says so. Boolean conditions in C still use int type, and not bool type like in C++, for backwards compatibility reasons.
It is however considered good style to treat boolean conditions as if C had a true boolean type. MISRA-C:2012 has a whole set of rules regarding this concept, called essentially boolean type. This can give better type safety during static analysis and also prevent various bugs.
It's for using smallish integer literals where the context won't result in the compiler casting it to the correct size.
I've worked on an embedded platform where int is 16 bits and long is 32 bits. If you were trying to write portable code to work on platforms with either 16-bit or 32-bit int types, and wanted to pass a 32-bit "unsigned integer literal" to a variadic function, you'd need the macro:
#define BAUDRATE UINT32_C(38400)
printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
On the 16-bit platform, the macro expands to 38400UL and on the 32-bit platform just 38400U. Those will match the PRIu32 format of either "lu" or "u".
I think that most compilers would generate identical code for (uint32_t) X as for UINT32_C(X) when X is an integer literal, but that might not have been the case with early compilers.

Initializing bit-fields

When you write
struct {
    unsigned a:3, b:2;
} x = {10, 11};
is x.b guaranteed to be 3 by ANSI C (C89)? I have read and reread the standard, but can't seem to find exactly that case.
For example, "result that cannot be represented by the
resulting unsigned integer type is reduced modulo the number that is
one greater than the largest value that can be represented by the
resulting unsigned integer type." speaks about computation, not about initialization. And moreover, bit-field is not really a type.
Also, (when speaking about unsigned t:4) "contains values in the range [0,15]", but it doesn't necessarily mean that initializer must be reduced modulo 16 to be mapped to [0,15].
Struct initialization is described in painstaking detail, but I really can't seem to find exactly that behavior. (Of course compilers do exactly that, and IBM's documentation says "when you assign a value that is out of range to a bit field, the low-order bit pattern is preserved and the appropriate bits are assigned," but I'd like to know if ANSI C standardizes that.)
"ANSI C"/C89 has been obsolete for 25 years. Therefore, my answer cites the current C standard ISO 9899:2011, also known as C11.
Pretty much everything related to bit-fields in the C standard is poorly defined. Typically, you will not find anything explicitly addressing the behavior of bit fields, but their behavior is rather specified implicitly, "between the lines". This is why you should avoid using bit fields.
However, I believe that this specific case is well-defined: it should work like any other integer initialization.
The detailed struct initialization rules you mention (6.7.9) show how the literal 11 in the initializer list is related to the variable b. Nothing strange with that. What then applies is "simple assignment", the same thing that would happen as if you wrote x.b = 11;.
When doing any kind of assignment or initialization in C, the right operand is converted to the type of the left operand. This is specified by C11 6.5.16:
In simple assignment (=), the value of the right operand is converted
to the type of the assignment expression and replaces the value stored
in the object designated by the left operand.
In your case, the literal 11 of type int is converted to a bit field of unsigned int:2.
Therefore, the rule you are looking for should be found in the chapter dealing with conversions (C11 6.3). What applies is what you already cited in your question, C11 6.3.1.3:
...if the new type is unsigned, the value is converted by repeatedly
adding or subtracting one more than the maximum value that can be
represented in the new type until the value is in the range of the new
type.
The maximum value of an unsigned int:2 is 3. One more than the maximum value is 3+1=4. The compiler should repeatedly subtract this from the value 11:
11 - (3+1) = 7 does not fit, subtract once more:
7 - (3+1) = 3 does fit, store value 3
But then of course, this is the very same thing as taking the 2 least significant bits of the decimal value 11 and storing them in the bit field.
WRT "speaks about computation, not about initialization", the C89 standard explicitly applies the rules of assignment and conversion to initialization. It also says:
A bit-field is interpreted as an integral type consisting of the specified number of bits.
Given those, while a compiler warning would clearly be in order, it seems that throwing away upper-order bits is guaranteed by the standard.

How to set the value of an enumeration constant outside the range of int?

The C99 standard requires that the expression used to define the value of an enumeration constant has a value representable as an int.
In section 6.7.2.2 paragraph 2 of the C99 standard:
The expression that defines the value of an enumeration constant shall be an integer
constant expression that has a value representable as an int.
However, enumerated types can be defined by the implementation to be compatible with any integer type, including those with a range of values outside of int.
In section 6.7.2.2 paragraph 2 of the C99 standard:
Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type.
This means that while you cannot explicitly set the value of an enumeration constant outside the range of an int, the value of an enumeration constant can be outside the range of an int if the implementation defines the enumeration type to be compatible with an integer type with a range outside of int.
Now I know one way to get a specific value outside the range of int set for an enumeration constant: dummy enumerators.
enum hack {
    DUMMY0 = INT_MAX,
    DUMMY1,
    /* supply as many more dummy enumerators as needed */
    ...
    /* declare desired enumerator */
    FOOBAR
};
This works thanks to section 6.7.2.2 paragraph 3 of the C99 standard:
An enumerator with = defines its
enumeration constant as the value of the constant expression.
...
Each subsequent enumerator with no =
defines its enumeration constant as the value of the constant expression obtained by
adding 1 to the value of the previous enumeration constant.
Unfortunately, this only works for positive values greater than INT_MAX, since the value of each subsequent enumerator is only ever incremented. Another caveat is the need to create possibly many dummy enumerators just to acquire the specific enumerator desired.
This leads to the following questions:
Is there a way to set the value of an enumeration constant to a negative value outside the range of int?
Is there a better way to set a positive value outside the range of int to an enumeration constant?
Regarding my dummy enumerator hack, does the C99 standard set a limit on the number of enumerators which may be declared in a single enum?
How to set the value of an enumeration constant outside the range of int?
You don't.
The C99 standard requires that the expression used to define the value
of an enumeration constant has a value representable as an int.
Yes, and the C11 standard didn't change any of this.
However, enumerated types can be defined by the implementation to be
compatible with any integer type, including those with a range of
values outside of int.
Also correct.
This means that while you cannot explicitly set the value of an
enumeration constant outside the range of an int, the value of an
enumeration constant can be outside the range of an int if the
implementation defines the enumeration type to be compatible with an
integer type with a range outside of int.
That's incorrect, but I think you've found a weakness in the wording in the standard. (Update: I don't think it's really a weakness; see below). You quoted 6.7.2.2:
The expression that defines the value of an enumeration constant shall
be an integer constant expression that has a value representable as an
int.
which seems to apply only when the value is defined by an explicit expression, not to a case like this:
enum too_big {
    big = INT_MAX,
    even_bigger
};
But this doesn't actually work, since even_bigger is declared as a constant of type int, which clearly cannot have the value INT_MAX + 1.
I strongly suspect that the intent is that the above declaration is illegal (a constraint violation); probably 6.7.2.2 should be reworded to make that clearer. (Update: I now think it's clear enough; see below.)
The authors of gcc seem to agree with me:
$ cat c.c
#include <limits.h>
enum huge {
big = INT_MAX,
even_bigger
};
$ gcc -c c.c
c.c:4:5: error: overflow in enumeration values
so even if your interpretation is correct, you're not likely to be able to write and use code that depends on it.
A workaround is to use integers (enumeration types are more or less thinly disguised integers anyway). A const integer object isn't a constant expression, unfortunately, so you might have to resort to using the preprocessor:
typedef long long huge_t;
#define big ((huge_t)INT_MAX)
#define even_bigger (big + 1)
This assumes that long long is wider than int, which is likely but not guaranteed (int and long long could be the same size if int is at least 64 bits).
The answer to your questions 1 and 2 is no; you can't define an enumeration constant, either negative or positive, outside the range of int.
As for your question 3, section 5.2.4.1 of the C11 standard says (roughly) that a compiler must support at least 1023 enumeration constants in a single enumeration. Most compilers don't actually impose a fixed limit, but in any case all of the constants must have values within the range INT_MIN .. INT_MAX, so that doesn't do you much good. (Multiple enumeration constants in the same type can have the same value.)
(The translation limit requirement is actually more complicated than that. A compiler must support at least one program that contains at least one instance of all of an enumerated list of limits. That's a fairly useless requirement as stated. The intent is that the easiest way to meet the requirement given by the Standard is to avoid imposing any fixed limits.)
UPDATE :
I raised this issue on the comp.std.c Usenet newsgroup. Tim Rentsch raised a good point in that discussion, and I now think that this:
enum too_big {
    big = INT_MAX,
    even_bigger
};
is a constraint violation, requiring a compiler diagnostic.
My concern was that the wording that forbids an explicit value outside the range of int:
The expression that defines the value of an enumeration constant shall
be an integer constant expression that has a value representable as an
int.
does not apply, since there is no (explicit) expression involved. But 6.7.2.2p3 says:
Each subsequent enumerator with no = defines its enumeration
constant as the value of the constant expression obtained by adding 1
to the value of the previous enumeration constant.
(emphasis added). So there is an expression whose value must be representable as an int; it just doesn't appear in the source. I'm not 100% comfortable with that, but I'd say the intent is sufficiently clear.
Here's the discussion on comp.std.c.
Maybe I cannot say anything that Keith Thompson has not already told you.
I'll give it a try, anyway.
1. Values in the range of int
In the document of ISO C99, I can see in section 6.7.2.2, paragraphs 2 and 3, the following statements:
(2) The expression that defines the value of an enumeration constant shall be an integer
constant expression that has a value representable as an int.
If you write enum T { X = (expr) } VAR, then expr is an integer number in the range of an int, which includes, at least, the range -32767 .. +32767, as you can read in 5.2.4.2.1.
Of course, paragraph (2) does not impose any restriction on the type of the identifier X.
2. Enumerator identifiers have type int
In paragraph 3 we can read this line:
(3a) The identifiers in an enumerator list are declared as constants that have type int and may appear wherever such are permitted.
This is a restriction on the type of the identifier. Now X has type int.
3. Discussion about the values
Also, the standard says:
(3b) An enumerator with = defines its enumeration constant as the value of the constant expression.
But this sentence, now, is restricted by (2) and (3a).
So, my interpretation of the standard C99 is as follows:
If you write
enum T { X = INT_MAX, BIGG}
then BIGG has type int (according to (3a)).
As Keith Thompson pointed out, the value (the "enumeration constant") of BIGG does not come from an explicit expression representing an out-of-range value (for int). Its value (in the mathematical sense) is X+1, because the next rule applies:
(3c) Each subsequent enumerator with no =
defines its enumeration constant as the value of the constant expression obtained by
adding 1 to the value of the previous enumeration constant.
There is no rule in the standard (about integer arithmetic) that defines the compiler's behaviour in this case, so it would fall into the implementation-defined class...
However, in the case that the compiler accepts this out-of-range value, I believe that the (mathematical) X+1 will be converted to a value in the int range.
But, the intended behaviour in (3b) seems to be that the C expression (X+1) == BIGG is always true.
If I am right, then I am agreeing with Keith, and the compiler has to reject the declaration with an Out of range error.
4. The integer type of enumerated types
We can read this even more:
(4a) Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type.
The declaration enum T defines a new integer type.
This type is not necessarily related to the type of the expression expr,
nor to the type of the enumerator X.
It is just another type for another thing: the integer type associated to the enum type T we are defining.
(This type will be the one assigned to the variable VAR_T).
The implementation can decide which integer type is the most appropriate.
If the expressions (like expr) have very small values, as almost always happens,
then the compiler may decide that T is of type char, for example.
If a long long is desired for some reason,
then the type of T will be long long, and so on.
However, this does not change the restrictions to int that the expression expr and the enumerator X have to follow. They are int.
The only rule that relates the types of expr and T is:
(4b) The choice of type is implementation-defined, but shall be capable of representing the values of all the members of the enumeration.
Thus, if you have enum T { X = 0, Y = 5, Z = 9 }, then the type of T may well be char.
(The situation is analogous to the case that a character constant, like 'c', which always has type int, is passed to a char variable: char c = 'c';: although 'c' is an int, its value fits in the range of a char).
On the other hand, if you have enum T { X = 20202, Y = -3 }, the compiler cannot choose char for T. It seems that int is the perfect type for T (as for any enum type), but the compiler can choose any other integer type for T whose range contains the values 20202 and -3.
If no value is negative, then the compiler could choose an unsigned integer type.
5. Summary
In summary, we can say that:
There are 4 types involved in an enum declaration: for the expressions (expr), the values (or enumeration constants, coming from expressions or just implicit), the enumerators (X), and the enum type (T).
The type of expressions (as expr) is always int. The values are clearly intended to stay within the range of an int (INT_MAX+1 seems not to be allowed). The type of the enumerators is int (as X). And the enum types (as T) are chosen by the implementation, among all the integer types allowed, so long as they can represent the values inside the declaration.
In expressions, we have that:
#define NN 5
enum T { X = 0, Y = 3, Z = (NN*3) } evar;
The expression (NN * 3) + 1 is an int.
The expression (Z + 1) is an int.
If the compiler defines T as a char,
then the expression ((evar = Z), ++evar) is a char.
In response to the recent UPDATE of Keith Thompson:
I think that you are right: the standard says ...the value of the constant expression obtained by adding 1 to the value....
Thus, a non-explicit expression "value" is considered as coming from a constant expression, too.
However, to ensure that only int "values" are allowed, one has to consider, jointly, the restriction that paragraph 2 declares: The expression that defines the value of an enumeration constant ... has a value representable as an int.
Thus, the (non-explicit expression) values also have to fit in an int.
I am convinced now that the intended constraint is that every enumerator value fits in an int.
Thank you, Keith.

structure double output

struct node
{
    double a : 23;
    int b;
} s;

int main()
{
    printf("%d\n", sizeof(s));
}
Why does this produce a compile error? I want to know why we cannot use bit-fields with the double datatype.
My answer is for C. I have no idea if it applies to C++.
I suggest you do not try to write multi-language source files. It is hard work.
no prototype in scope for printf
type of sizeof(s) and type required by "%d" do not match
missing return 0; in main (for C89)
What compiler error do you get?
I want to know why we cannot do bit-fields with the double datatype
Because the C99 Standard says so, eg (emphasis is mine)
6.7.2.1/9
A bit-field is interpreted as a signed or unsigned integer type consisting of the specified
number of bits.
C provides a special type of structure member known as a bit field, which is an integer with an explicitly specified number of bits.
Non-integral types cannot be used as base types for bit fields.
Quoted from Wikipedia:
C also provides a special type of structure member known as a bit field, which is an integer with an explicitly specified number of bits. A bit field is declared as a structure member of type int, signed int, unsigned int, or _Bool, following the member name by a colon (:) and the number of bits it should occupy. The total number of bits in a single bit field must not exceed the total number of bits in its declared type.
In the statement double a : 23; you are using a bit field with double, which is an error. You should use int instead.
Edit:
The behavior is implementation-defined if you use anything other than these. char may work on your system, but it may fail on other platforms, as it's not part of the standard.
Yep, you can't apply bit fields to double; that's why it is giving a compilation error.
Bit fields are allowed only for the signed int, unsigned int, and _Bool data types.
