Already read through this related question, but was looking for something a little more specific.
Is there a way to tell your compiler specifically how wide you want your enum to be?
If so, how do you do it? I know how to specify it in C#; is it similarly done in C?
Would it even be worth doing? When the enum value is passed to a function, will it be passed as an int-sized value regardless?
I believe there is a flag if you are using GCC.
-fshort-enums
Is there a way to tell your compiler specifically how wide you want your enum to be?
In general case no. Not in standard C.
Would it even be worth doing?
It depends on the context. If you are talking about passing parameters to functions, then no, it is not worth doing (see below). If it is about saving memory when building aggregates from enum types, then it might be worth doing. However, in C you can simply use a suitably-sized integer type instead of enum type in aggregates. In C (as opposed to C++) enum types and integer types are almost always interchangeable.
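For example, a minimal sketch of that approach (the names are made up for illustration):

#include <stdint.h>

enum color { COLOR_RED, COLOR_GREEN, COLOR_BLUE };

/* The aggregate stores the value in a fixed-size integer; the enum is
   only used as a source of named constants. */
struct pixel {
    uint8_t color;  /* holds one of the enum color values */
    uint8_t alpha;
};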
When the enum value is passed to a function, will it be passed as an int-sized value regardless?
Many (most) compilers these days pass all parameters as values of natural word size for the given hardware platform. For example, on a 64-bit platform many compilers will pass all parameters as 64-bit values, regardless of their actual size, even if type int has 32 bits in it on that platform (so, it is not generally passed as "int-sized" value on such a platform). For this reason, it makes no sense to try to optimize enum sizes for parameter passing purposes.
You can force it to be at least a certain size by defining an appropriate value. For example, if you want your enum to be stored as the same size as an int, even though all the values would fit in a char, you can do something like this:
typedef enum {
    firstValue = 1,
    secondValue = 2,
    Internal_ForceMyEnumIntSize = INT_MAX  /* INT_MAX from <limits.h> */
} MyEnum;
Note, however, that the behavior can be dependent on the implementation.
As you note, passing such a value to a function will cause it to be expanded to an int anyway, but if you are using your type in an array or a struct, then the size will matter. If you really care about element sizes, you should really use types like int8_t, int32_t, etc.
Even if you are writing strict C code, the results are going to be compiler dependent. Employing the strategies from this thread, I got some interesting results...
enum_size.c
#include <stdio.h>

enum __attribute__((__packed__)) PackedFlags {
    PACKED = 0b00000001,
};

enum UnpackedFlags {
    UNPACKED = 0b00000001,
};

int main (int argc, char * argv[]) {
    printf("packed:\t\t%lu\n", sizeof(PACKED));
    printf("unpacked:\t%lu\n", sizeof(UNPACKED));
    return 0;
}
$ gcc enum_size.c
$ ./a.out
packed: 4
unpacked: 4
$ gcc enum_size.c -fshort-enums
$ ./a.out
packed: 4
unpacked: 4
$ g++ enum_size.c
$ ./a.out
packed: 1
unpacked: 4
$ g++ enum_size.c -fshort-enums
$ ./a.out
packed: 1
unpacked: 1
In my example above, I did not realize any benefit from the __attribute__((__packed__)) modifier until I started using the C++ compiler.
EDIT:
#technosaurus's suspicion was correct.
By checking sizeof(enum PackedFlags) instead of sizeof(PACKED), I see the results I had expected:
printf("packed:\t\t%lu\n", sizeof(enum PackedFlags));
printf("unpacked:\t%lu\n", sizeof(enum UnpackedFlags));
I now see the expected results from gcc:
$ gcc enum_size.c
$ ./a.out
packed: 1
unpacked: 4
$ gcc enum_size.c -fshort-enums
$ ./a.out
packed: 1
unpacked: 1
There is also another way if the enum is part of a structure:
enum whatever { a, b, c, d };

struct something {
    char :0;
    enum whatever field : CHAR_BIT;
    char :0;
};
The :0; can be omitted if the enum field is surrounded by normal fields. If there's another bitfield before, the :0 will force byte alignment so that the field following it starts at the next byte.
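A self-contained version of that idea might look like the sketch below. Keep in mind that whether an enum may be used as a bitfield type at all, and how the bits are laid out, is implementation-defined; GCC accepts it:

#include <limits.h>
#include <stdio.h>

enum whatever { a, b, c, d };

struct something {
    char :0;                         /* pad to the next byte boundary */
    enum whatever field : CHAR_BIT;  /* store the enum value in a single byte */
    char :0;
};

int main(void)
{
    struct something s = { .field = c };
    printf("sizeof(struct something) = %zu\n", sizeof s);
    printf("field = %d\n", (int)s.field);
    return 0;
}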
In some circumstances, this may be helpful:
typedef uint8_t command_t;

enum command_enum
{
    CMD_IDENT   = 0x00, //!< Identify command
    CMD_SCENE_0 = 0x10, //!< Recall Scene 0 command
    CMD_SCENE_1 = 0x11, //!< Recall Scene 1 command
    CMD_SCENE_2 = 0x12, //!< Recall Scene 2 command
};

/* cmdVariable is 1 byte (8 bits) in size */
command_t cmdVariable = CMD_IDENT;
On one hand, the type command_t has size 1 (8 bits) and can be used for variables and function parameters.
On the other hand, you can still use the enum values for assignment; they are of type int by default, but the compiler converts them immediately when they are assigned to a variable of type command_t.
Also, if you do something unsafe like defining and using CMD_16bit = 0xFFFF, the compiler will warn you with the following message:
warning: large integer implicitly truncated to unsigned type [-Woverflow]
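As a sketch of how this pattern is typically used in an API (send_command is a hypothetical function, here only for illustration):

#include <stdint.h>

typedef uint8_t command_t;

enum command_enum {
    CMD_IDENT   = 0x00,
    CMD_SCENE_0 = 0x10,
};

/* hypothetical consumer: the parameter occupies a single byte */
static void send_command(command_t cmd) { (void)cmd; }

int main(void)
{
    command_t cmd = CMD_IDENT;  /* int-typed constant converted to uint8_t */
    send_command(cmd);
    /* command_t too_big = 0xFFFF; */  /* would trigger -Woverflow */
    return 0;
}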
As #Nyx0uf says, GCC has a flag which you can set:
-fshort-enums
Allocate to an enum type only as many bytes as it needs for the declared range of possible values. Specifically, the enum type is equivalent to the smallest integer type that has enough room.
Warning: the -fshort-enums switch causes GCC to generate code that is not binary compatible with code generated without that switch. Use it to conform to a non-default application binary interface.
Source: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.html
Additional great reading for general insight: https://www.embedded.fm/blog/2016/6/28/how-big-is-an-enum.
Interesting: adding an enum entry called ARM_EXCEPTION_MAKE_ENUM_32_BIT with a value equal to 0xffffffff, which is the equivalent of UINT32_MAX from stdint.h, forces this particular Arm_symbolic_exception_name enum to have an integer type of uint32_t. That is the sole purpose of the ARM_EXCEPTION_MAKE_ENUM_32_BIT entry! It works because uint32_t is the smallest integer type which can contain all of the enum values in this enum, namely 0 through 8, inclusive, as well as 0xffffffff, or decimal 2^32 - 1 = 4294967295.
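The pattern looks roughly like this (a reconstructed sketch, not the verbatim original; note that an enumerator value outside the range of int relies on a compiler extension before C23):

#include <stdint.h>

typedef enum {
    ARM_EXCEPTION_RESET     = 0,
    ARM_EXCEPTION_UNDEFINED = 1,
    /* ... the remaining entries, up through 8 ... */

    /* Forces the compiler to pick an underlying type of at least 32 bits,
       because 0xffffffff (UINT32_MAX) fits in nothing smaller. */
    ARM_EXCEPTION_MAKE_ENUM_32_BIT = 0xffffffff,
} Arm_symbolic_exception_name;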
Right now I can't answer your first two questions, because I am trying to find a good way to do this myself. Maybe I will edit this if I find a strategy that I like. It isn't intuitive though.
But I want to point something out that hasn't been mentioned so far, and to do so I will answer the third question like so:
It is "worth doing" when writing a C API that will be called from languages that aren't C. Anything that directly links to the C code will need to correctly understand the memory layout of all structs, parameter lists, etc in the C code's API. Unfortunately, C types like int, or worst yet, enums, are fairly unpredictably sized (changes by compiler, platform, etc), so knowing the memory layout of anything containing an enum can be dodgy unless your other programming language's compiler is also the C compiler AND it has some in-language mechanism to exploit that knowledge. It is much easier to write problem-free bindings to C libraries when the API uses predictably-sized C types like uint8_t, uint16_t, uint32_t, uint64_t, void*, uintptr_t, etc, and structs/unions composed of those predictably-sized types.
So I would care about enum sizing when it matters for program correctness, such as when memory layout and alignment issues are possible. But I wouldn't worry about it so much for optimization, not unless you have some niche situation that amplifies the opportunity cost (ex: a large array/list of enum-typed values on a memory constrained system like a small MCU).
Unfortunately, situations like what I'm mentioning are not helped by something like -fshort-enums, because this feature is vendor-specific and less predictable (e.g. another system would have to "guess" enum size by approximating GCC's algorithm for -fshort-enums enum sizing). If anything, it would allow people to compile C code in a way that would break common assumptions made by bindings in other languages (or other C code that wasn't compiled with the same option), with the expected result being memory corruption as parameters or struct members get written to, or read from, the wrong locations in memory.
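A sketch of what that looks like in practice (the mylib_* names are made up for illustration):

#include <stdint.h>

/* Named constants for the values that cross the language boundary. */
enum mylib_status {
    MYLIB_OK    = 0,
    MYLIB_ERROR = 1,
};

/* The public struct uses only fixed-width types, so a binding written in
   another language can reproduce its layout exactly. */
struct mylib_result {
    uint32_t status;       /* holds a mylib_status value */
    uint32_t payload_len;
};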
As of C23, this is finally possible in standard C:
You can put a colon and an integer type after the enum keyword (or after the tag name, if it's named) to specify the enum's fixed underlying type, which sets the size and range of the enum type.
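For example (a minimal sketch; it needs a compiler with C23 support, e.g. recent GCC or Clang with -std=c23):

#include <stdint.h>

/* C23: the fixed underlying type makes this enum exactly one byte wide. */
enum status : uint8_t {
    STATUS_OK   = 0,
    STATUS_FAIL = 1,
};

static_assert(sizeof(enum status) == 1, "status should be one byte");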
Would it even be worth doing? When the enum value is passed to a function, will it be passed as an int-sized value regardless?
On x86_64, the type of an integer does not influence whether it is passed in a register or not (as long as it fits in a single register). The size of data on the heap, however, is very significant for cache performance.
It depends on the values assigned to the enumerators.
Ex:
If a value greater than 2^32 - 1 is stored, the size allocated for the overall enum changes to the next larger size.
Storing the value 0xFFFFFFFFFFFF in an enum variable gives a warning when compiled in a 32-bit environment (a truncation warning),
whereas in a 64-bit compilation it succeeds and the size allocated is 8 bytes.
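For instance, with GCC on a 64-bit target (the exact behavior is implementation-defined, and an enumerator beyond the range of int is a compiler extension before C23):

#include <stdio.h>

enum big_values {
    SMALL      = 1,
    VERY_LARGE = 0xFFFFFFFFFFFF,  /* needs 48 bits, more than any 32-bit type */
};

int main(void)
{
    /* Typically prints 8 on a 64-bit GCC build, because the chosen
       underlying type must be able to hold VERY_LARGE. */
    printf("%zu\n", sizeof(enum big_values));
    return 0;
}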
Related
I've got some code provided by a vendor that I'm using, and it typedefs an enum with __attribute__((aligned(1), packed)). GCC is complaining about the multiple attributes:
error: ignoring attribute 'packed' because it conflicts with attribute 'aligned' [-Werror=attributes]
Not sure what the best approach is here. I feel like both of these attributes are not necessary. Would aligned(1) not also make it packed? And is this even necessary for an enum? Wouldn't it be best to have the struct that this enum goes into be packed?
Thanks!
I've removed the packed attribute and that works to make GCC happy but I want to make sure that it will still behave the same. This is going into an embedded system that relies on memory mapped registers so I need the mappings to be correct or else things won't work.
Here's an example from the code supplied by the vendor:
#define DMESCC_PACKED __attribute__ ((__packed__))
#define DMESCC_ENUM8  __attribute__ ((aligned (1), packed))

typedef enum DMESCC_ENUM8 {DMESCC_OFF, DMESCC_ON} dmescc_bittype_t;

typedef volatile struct {
    dmescc_bittype_t rx_char_avail : 1;
    dmescc_bittype_t zero_count : 1;
    dmescc_bittype_t tx_buf_empty : 1;
    dmescc_bittype_t dcd : 1;
    dmescc_bittype_t sync_hunt : 1;
    dmescc_bittype_t cts : 1;
    dmescc_bittype_t txunderrun_eom : 1;
    dmescc_bittype_t break_abort : 1;
} DMESCC_PACKED dmescc_rr0_t;
When I build the above code I get the GCC error I mentioned above.
Documentation here: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes emphasis mine:
When used on a struct, or struct member, the aligned attribute can only increase the alignment; in order to decrease it, the packed attribute must be specified as well. When used as part of a typedef, the aligned attribute can both increase and decrease alignment, and specifying the packed attribute generates a warning.
Note that the effectiveness of aligned attributes for static variables may be limited by inherent limitations in the system linker and/or object file format. On some systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an __attribute__ still only provides you with 8-byte alignment. See your linker documentation for further information.
Older GNU documentation said something else. Also, I don't know what the documentation is trying to say with "specifying the packed attribute generates a warning", because there is no warning when I do this (gcc x86_64 12.2.0, -Wall -Wextra):
typedef struct
{
    char ch;
    int i;
} __attribute__((aligned(1), packed)) aligned_packed_t;
However, this effectively places the struct on a 5 byte offset, where the first address appears to be 8 byte aligned (which might be a thing of the linker as suggested in the above docs). We'd have to place it in an array to learn more.
Since I don't really trust the GNU documentation, I did some trial & error to reveal how these work in practice. I created 4 structs:
one with aligned(1)
one with such a struct as its member and also aligned(1) in itself
one with packed
one with both aligned(1) and packed (again, this compiles cleanly with no warnings)
For each struct I created an array, then printed the address of the first 2 array items. Example:
#include <stdio.h>

typedef struct
{
    char ch;
    int i;
} __attribute__((aligned(1))) aligned_t;

typedef struct
{
    char ch;
    aligned_t aligned_member;
} __attribute__((aligned(1))) struct_aligned_t;

typedef struct
{
    char ch;
    int i;
} __attribute__((packed)) packed_t;

typedef struct
{
    char ch;
    int i;
} __attribute__((aligned(1),packed)) aligned_packed_t;

#define TEST(type,arr) \
    printf("%-16s Address: %p Size: %zu\n", #type, (void*)&arr[0], sizeof(type)); \
    printf("%-16s Address: %p Size: %zu\n", #type, (void*)&arr[1], sizeof(type));

int main (void)
{
    aligned_t        arr1 [3];
    struct_aligned_t arr2 [3];
    packed_t         arr3 [3];
    aligned_packed_t arr4 [3];

    TEST(aligned_t, arr1);
    TEST(struct_aligned_t, arr2);
    printf(" Address of member: %p\n", arr2[0].aligned_member);
    TEST(packed_t, arr3);
    TEST(aligned_packed_t, arr4);
}
Output on x86 Linux:
aligned_t Address: 0x7ffc6f3efb90 Size: 8
aligned_t Address: 0x7ffc6f3efb98 Size: 8
struct_aligned_t Address: 0x7ffc6f3efbb0 Size: 12
struct_aligned_t Address: 0x7ffc6f3efbbc Size: 12
Address of member: 0x40123000007fd8
packed_t Address: 0x7ffc6f3efb72 Size: 5
packed_t Address: 0x7ffc6f3efb77 Size: 5
aligned_packed_t Address: 0x7ffc6f3efb81 Size: 5
aligned_packed_t Address: 0x7ffc6f3efb86 Size: 5
The first struct with just aligned(1) didn't make any difference against a normal struct.
The second struct where the first struct was included as a member, to see if it would be misaligned internally, did not pack it any tighter either, nor did the member get allocated at a misaligned (1 byte) address.
The third struct with only packed did get allocated at a potentially misaligned address and packed into 5 bytes.
The fourth struct with both aligned(1) and packed works just as the one that had packed only.
So my conclusion is that "the aligned attribute can only increase the alignment" is correct, and as expected aligned(1) is therefore pointless. However, you can use the attribute to increase the alignment: __attribute__((aligned(16), packed)) did give a 16-byte size, which effectively cancels packed.
Also I can't make sense of this part of the manual:
When used as part of a typedef, the aligned attribute can both increase and decrease alignment, and specifying the packed attribute generates a warning.
Either I'm missing something or the docs are wrong (again)...
Not sure what the best approach is here. I feel like both of these
attributes are not necessary. Would aligned(1) not also make it
packed?
No, it wouldn't. From the docs:
The aligned attribute specifies a minimum alignment (in bytes) for
variables of the specified type.
and
When attached to an enum definition, the packed attribute indicates that the smallest integral type should be used.
These properties are related but neither is redundant with the other (which makes GCC's diagnostic surprising).
And is this even necessary for an enum? Wouldn't it be best to
have the struct that this enum goes into be packed?
It is meaningful for an enum to be packed regardless of how it is used to compose other types. In particular, having packed on an enum is (only) about the storage size of objects of the enum type. It does not imply packing of structure types that have members of the enum type, but you might want that, too.
On the other hand, the alignment requirement of the enum type is irrelevant to the layout of structure types that have the packed attribute. That's pretty much the point of structure packing.
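To make the distinction concrete, a small GCC-specific sketch (the sizes in the comments are typical GCC results, not guarantees):

enum __attribute__((packed)) small_e { SE_A, SE_B };  /* sizeof 1 with GCC */
enum plain_e { PE_A, PE_B };                           /* typically sizeof 4 */

/* packed on the enum shrinks objects of the enum type, but a struct that
   contains it is still padded unless the struct itself is packed. */
struct holds_small { char c; enum small_e e; };                            /* likely 2 */
struct holds_plain { char c; enum plain_e e; };                            /* likely 8 */
struct __attribute__((packed)) packed_holder { char c; enum plain_e e; };  /* likely 5 */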
I've removed the packed attribute and that works to make GCC happy but
I want to make sure that it will still behave the same. This is going
into an embedded system that relies on memory mapped registers so I
need the mappings to be correct or else things won't work.
If only one of the two attributes can be retained, then packed should be that one. Removing it very likely does cause meaningful changes, especially if the enum is used as a structure member or as the type of a memory-mapped register. I can't guarantee that removing the aligned attribute won't also cause behavioral changes, but that's less likely.
It might be worth your while to ask the vendor what version of GCC they use for development and testing, and what range of versions they claim their code is compatible with.
Overall, however, the whole thing has bad code smell. Where it is essential to control storage size exactly, explicit-size integer types such as uint8_t should be used.
Addendum
With regard to the example code added to the question: that the enum type in question is used as the type of a bitfield changes the complexity of the question. Portable code steers well clear of bitfields.
The C language specification does not guarantee that an enum type such as that one can be used as the type of a bitfield member, so you're straight away into implementation-defined territory. Not that using one of the types the specification designates as supported would delay that very long, because many of the properties of bitfields and the structures containing them are implementation-defined anyway, in particular:
which data types other than qualified and unqualified versions of _Bool, signed int, and unsigned int are allowed as the type of a bitfield member;
the size and alignment requirement of the addressable storage units in which bitfields are stored (the spec does not connect these in any way with bitfields' declared types);
whether bitfields assigned to the same addressable storage unit are arranged from least-significant position to most, or the opposite;
whether bitfields can be split across adjacent addressable storage units;
whether bitfield members may have atomic type.
GCC's definitions for these behaviors are here: https://gcc.gnu.org/onlinedocs/gcc/Structures-unions-enumerations-and-bit-fields-implementation.html#Structures-unions-enumerations-and-bit-fields-implementation. Note well that many of them depend on the target ABI.
To use the vendor code safely, you really need to know which compiler it was developed for and tested against, and if that's a version of GCC, what target ABI. If you learn or are willing to assume GCC targeting the same ABI that you are targeting, then keep packed, dump aligned(1), and test thoroughly. Otherwise, you'll probably want to do more research.
I have an enum like this
typedef enum {
    FIRST,
    SECOND,
    THIRD = 0X80000001,
    FOURTH,
    FIFTH,
} STATUS;
I am getting a pedantic warning since I am compiling my files with the option -Wpedantic:
warning: ISO C restricts enumerator values to range of 'int' [-Wpedantic]
I found that it occurs because the hex value 0X80000001, converted to an integer, exceeds the range of signed int. My purpose is to have continuous hex values as the statuses in the enum without this warning.
I cannot use macros, since that would defeat the purpose of having the enums in the first place. What code change will avoid this warning?
Enumeration constants are guaranteed to be of the same size as (signed) int. Apparently your system uses 32 bit int, so an unsigned hex literal larger than 0x7FFFFFFF will not fit.
So the warning is not just "pedantic", it hints of a possibly severe bug. Note that -pedantic in GCC does not mean "be picky and give me unimportant warnings" but rather "ensure that my code actually follows the C standard".
It appears that you want to do a list of bit masks or hardware addresses, or some other hardware-related programming. enum is unsuitable for such tasks, because in hardware-related programming, you rarely ever want to use signed types, but always unsigned ones.
If you must have a safe and portable program, then there is no elegant way to do this. C is a language with a lot of flaws, the way enum is defined by the standard is one of them.
One work-around is to use some sort of "poor man's enum", such as:
typedef uint32_t STATUS;
#define THIRD 0X80000001
If you must also have the increased type safety of an enum, then you could possibly use a struct:
typedef struct
{
    uint32_t value;
} STATUS;
Or alternatively, just declare an array of constants and use an enum to define the array index. Probably the cleanest solution but takes a little bit of extra overhead:
typedef enum {
    FIRST,
    SECOND,
    THIRD,
    FOURTH,
    FIFTH,
    STATUS_N
} STATUS;

const uint32_t STATUS_DATA [STATUS_N] =
{
    0,
    1,
    0X80000001,
    0X80000002,
    0X80000003
};
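Usage then looks something like this, building on the snippet above (send_status is a hypothetical function used only for illustration):

void send_status(uint32_t raw);  /* hypothetical consumer of the raw value */

void report(STATUS s)
{
    /* the enum only names the index; the table holds the actual values */
    send_status(STATUS_DATA[s]);
}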
I am confused about when to use macros and when to use enums. Both can be used as constants, but what is the difference between them, and what is the advantage of either one? Is it somehow related to the compiler level or not?
In terms of readability, enumerations make better constants than macros, because related values are grouped together. In addition, enum defines a new type, so readers of your program would have an easier time figuring out what can be passed to the corresponding parameter.
Compare
#define UNKNOWN 0
#define SUNDAY 1
#define MONDAY 2
#define TUESDAY 3
...
#define SATURDAY 7
to
typedef enum {
UNKNOWN,
SUNDAY,
MONDAY,
TUESDAY,
...
SATURDAY,
} Weekday;
It is much easier to read code like this
void calendar_set_weekday(Weekday wd);
than this
void calendar_set_weekday(int wd);
because you know which constants it is OK to pass.
A macro is a preprocessor thing, and the compiled code has no idea about the identifiers you create. They have been already replaced by the preprocessor before the code hits the compiler. An enum is a compile time entity, and the compiled code retains full information about the symbol, which is available in the debugger (and other tools).
Prefer enums (when you can).
In C, it is best to use enums for actual enumerations: when some variable can hold one of multiple values which can be given names. One advantage of enums is that the compiler can perform some checks beyond what the language requires, like that a switch statement on the enum type is not missing one of the cases. The enum identifiers also propagate into the debugging information. In a debugger, you can see the identifier name as the value of an enum variable, rather than just the numeric value.
Enumerations can be used just for the side effect of creating symbolic constants of integral type. For instance:
enum { buffer_size = 4096 }; /* we don't care about the type */
This practice is not that widespread, though. For one thing, buffer_size will be used as an integer and not as an enumerated type. A debugger will not render 4096 as buffer_size, because that value won't be represented as the enumerated type. If you declare some char array[buffer_size]; then sizeof array will not show up as buffer_size. In this situation, the enumeration constant disappears at compile time, so it might as well be a macro. And there are disadvantages, like not being able to control its exact type. (There might be some small advantage in some situation where the output of the preprocessing stages of translation is being captured as text: a macro will have turned into 4096, whereas buffer_size will stay as buffer_size.)
A preprocessor symbol lets us do this:
#define buffer_size 4096L /* buffer_size is a long int */
Note that various values from C's <limits.h> like UINT_MAX are preprocessor symbols and not enum symbols, with good reasons for that, because those identifiers need to have a precisely determined type. Another advantage of a preprocessor symbol is that we can test for its presence, or even make decisions based on its value:
#if ULONG_MAX > UINT_MAX
/* unsigned long is wider than unsigned int */
#endif
Of course we can test enumerated constants also, but not in such a way that we can change global declarations based on the result.
Enumerations are also ill suited for bitmasks:
enum modem_control { mc_dsr = 0x1, mc_dtr = 0x2, mc_rts = 0x4, ... }
it just doesn't make sense because when the values are combined with a bitwise OR, they produce a value which is outside of the type. Such code causes a headache, too, if it is ever ported to C++, which has (somewhat more) type-safe enumerations.
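To illustrate the problem (a sketch; whether the compiler complains at all depends on the language and compiler):

enum modem_control { mc_dsr = 0x1, mc_dtr = 0x2, mc_rts = 0x4 };

void configure_modem(void)
{
    /* 0x3 is not one of the declared enumerators, so the OR result is
       conceptually outside the enumeration even though C stores it fine;
       C++ would reject this assignment without a cast. */
    enum modem_control flags = mc_dsr | mc_dtr;
    (void)flags;
}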
Note there are some differences between macros and enums, and either of these properties may make them (un)suitable as a particular constant.
enums are signed (compatible with int). In any context where an unsigned type is required (think especially bitwise operations!), enums are out.
if long long is wider than int, big constants won't fit in an enum.
The size of an enum is (usually) sizeof(int). For arrays of small values (up to say, CHAR_MAX) you might want a char foo[] rather than an enum foo[] array.
enums are integral numbers. You can't have enum funny_number { PI=3.14, E=2.71 }.
enums are a C89 feature; K&R compilers (admittedly ancient) don't understand them.
If a macro is implemented properly (i.e. it is parenthesized so it does not suffer from grouping issues when substituted), then there's not much difference in applicability between macro and enum constants in situations where both are applicable, i.e. in situations where you need signed integer constants specifically.
However, in general case macros provide much more flexible functionality. Enums impose a specific type onto your constants: they will have type int (or, possibly, larger signed integer type), and they will always be signed. With macros you can use constant syntax, suffixes and/or explicit type conversions to produce a constant of any type.
Enums work best when you have a group of tightly associated sequential integer constants. They work especially well when you don't care about the actual values of the constants at all, i.e. when you only care about them having some well-behaved unique values. In all other cases macros are a better choice (or basically the only choice).
As a practical matter, there is little difference. They are equally usable as constants in your programs. Some may prefer one or the other for stylistic reasons, but I can't think of any technical reason to prefer one over the other.
One difference is that macros allow you to control the integral type of related constants. But an enum will use an int.
#define X 100L
enum { Y = 100L };
printf("%ld\n", X);
printf("%d\n", Y); /* Y has int type */
enum has an advantage: block scope:
{ enum { E = 12 }; }
{ enum { E = 13 }; }
With macros there is a need to #undef.
Is the sizeof(enum) == sizeof(int), always ?
Or is it compiler dependent?
Is it wrong to say that, since compilers are optimized for word lengths (memory alignment), int is the word size on a particular compiler? Does that mean there is no processing penalty if I use enums, as they would be word aligned?
Is it not better if I put all the return codes in an enum, as I clearly do not care about the values they get, only the names, while checking the return types? If that is the case, won't #define be better, as it would save memory?
What is the usual practice?
If I have to transport these return types over a network and some processing has to be done at the other end, what would you prefer: enums, #defines, or const ints?
EDIT - Just checking on the net: since compilers don't symbolically link macros, how do people debug then? By comparing the integer value with the header file?
From the answers (I am adding this line below, as I need clarification):
"So it is implementation-defined, and sizeof(enum) might be equal to sizeof(char), i.e. 1."
Does it not mean that the compiler checks the range of values in the enum and then assigns memory? I don't think so, but of course I don't know. Can someone please explain to me what "might be" means?
It is compiler dependent and may differ between enums. The following are the semantics
enum X { A, B };
// A has type int
assert(sizeof(A) == sizeof(int));
// some integer type. Maybe even int. This is
// implementation defined.
assert(sizeof(enum X) == sizeof(some_integer_type));
Note that "some integer type" in C99 may also include extended integer types (which the implementation, however, has to document, if it provides them). The type of the enumeration is some type that can store the value of any enumerator (A and B in this case).
I don't think there are any penalties in using enumerations. Enumerators are integral constant expressions too (so you may use them to initialize static or file scope variables, for example), and I prefer them to macros whenever possible.
Enumerators don't need any runtime memory. Only when you create a variable of the enumeration type, you may use runtime memory. Just think of enumerators as compile time constants.
I would just use a type that can store the enumerator values (I should know the rough range of values beforehand), cast to it, and send it over the network. Preferably the type should be some fixed-width one, like int32_t, so it doesn't come into conflict when different machines are involved. Or I would print the number and scan it on the other side, which gets rid of some of these problems.
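A sketch of that approach (the status codes and helper names here are made up; htonl and ntohl are the usual POSIX byte-order helpers):

#include <stdint.h>
#include <arpa/inet.h>  /* htonl / ntohl */

enum status { STATUS_OK = 0, STATUS_RETRY = 1, STATUS_FAIL = 2 };

/* Pin the wire representation to a fixed-width type with a defined byte
   order, independent of how wide the enum happens to be locally. */
static uint32_t status_to_wire(enum status s)
{
    return htonl((uint32_t)s);
}

static enum status status_from_wire(uint32_t wire)
{
    return (enum status)ntohl(wire);
}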
Response to Edit
Well, the compiler is not required to use any particular size. An easy thing to see is that the sign of the values matters: unsigned types can give a significant performance boost in some calculations. The following is the behavior of GCC 4.4.0 on my box:
int main(void) {
    enum X { A = 0 };
    enum X a; // X compatible with "unsigned int"
    unsigned int *p = &a;
}
But if you assign a -1, then GCC chooses to use int as the type that X is compatible with:
int main(void) {
    enum X { A = -1 };
    enum X a; // X compatible with "int"
    int *p = &a;
}
Using the option --short-enums of GCC makes it use the smallest type still fitting all the values:
int main() {
    enum X { A = 0 };
    enum X a; // X compatible with "unsigned char"
    unsigned char *p = &a;
}
In recent versions of GCC, the compiler flag has changed to -fshort-enums. On some targets, the default type is unsigned int. You can check the answer here.
C99, 6.7.2.2p4 says:
"Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined,108) but shall be capable of representing the values of all the members of the enumeration. [...]"
Footnote 108 adds:
"An implementation may delay the choice of which integer type until all enumeration constants have been seen."
So it is implementation-defined, and sizeof(enum) might be equal to sizeof(char), i.e. 1.
In choosing the size for some small range of integers, there is always a penalty. If you make it small in memory, there probably is a processing penalty; if you make it larger, there is a space penalty. It's a time-space tradeoff.
Error codes are typically #defines, because they need to be extensible: different libraries may add new error codes. You cannot do that with enums.
Is the sizeof(enum) == sizeof(int), always
The ANSI C standard says:
Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined. (6.7.2.2 Enumeration specifiers)
So I would take that to mean no.
If this is the case wont #DEFINE be better as it would save memory.
In what way would using defines save memory over using an enum? An enum is just a type that allows you to provide more information to the compiler. In the actual resulting executable, it's just turned in to an integer, just as the preprocessor converts a macro created with #define in to its value.
What is the usual practice? If I have to transport these return types over a network and some processing has to be done at the other end
If you plan to transport values over a network and process them on the other end, you should define a protocol. Decide on the size in bits of each type and the endianness (in which order the bytes are), and make sure you adhere to that in both the client and the server code. Also don't just assume that because it happens to work, you've got it right. It just might be that the endianness, for example, on your chosen client and server platforms matches, but that might not always be the case.
No.
Example: The CodeSourcery compiler
When you define an enum like this:
enum MyEnum1 {
    A=1,
    B=2,
    C=3
};
// will have sizeof 1 (fits in a char)

enum MyEnum2 {
    A=1,
    B=2,
    C=3,
    D=400
};
// will have sizeof 2 (doesn't fit in a char)
Details from their mailing list
On some compilers the size of an enum depends on how many entries are in the enum (fewer than 255 entries => byte, more than 255 entries => int).
But this depends on the compiler and the compiler settings.
enum fruits { apple, orange, strawberry, grapefruit };

char fruit = apple;
fruit = orange;
if (fruit < strawberry)
    ...

All of this works perfectly.
If you want a specific underlying type for an enum instance, just don't use the enum type itself.