Is the sizeof(enum) == sizeof(int), always? - c

Is the sizeof(enum) == sizeof(int), always ?
Or is it compiler dependent?
Is it wrong to say that, since compilers are optimized for word lengths (memory alignment), int is the word size on a particular compiler? Does that mean there is no processing penalty if I use enums, because they would be word aligned?
Is it not better to put all the return codes in an enum, since I clearly do not care about the values they get, only the names, while checking the return types? If that is the case, wouldn't #define be better, as it would save memory?
What is the usual practice?
If I have to transport these return types over a network and some processing has to be done at the other end, what would you prefer: enums, #defines, or const ints?
EDIT: Just checking on the net: since compilers don't keep symbolic information for macros, how do people debug then? Compare the integer value against the header file?
From the answers (I am adding this line below, as I need clarification):
"So it is implementation-defined, and
sizeof(enum) might be equal to
sizeof(char), i.e. 1."
Does that not mean that the compiler checks the range of values in the enum and then assigns memory? I don't think so, but of course I don't know. Can someone please explain to me what "might be" means?

It is compiler-dependent and may differ between enums. The following are the semantics:
enum X { A, B };
// A has type int
assert(sizeof(A) == sizeof(int));
// some integer type. Maybe even int. This is
// implementation defined.
assert(sizeof(enum X) == sizeof(some_integer_type));
Note that "some integer type" in C99 may also include extended integer types (which the implementation, however, has to document, if it provides them). The type of the enumeration is some type that can store the value of any enumerator (A and B in this case).
I don't think there are any penalties in using enumerations. Enumerators are integral constant expressions too (so you may use them to initialize static or file-scope variables, for example), and I prefer them to macros whenever possible.
Enumerators don't need any runtime memory. Only when you create a variable of the enumeration type may runtime memory be used. Just think of enumerators as compile-time constants.
I would just use a type that can store the enumerator values (I should know the rough range of values beforehand), cast to it, and send it over the network. Preferably the type should be a fixed-width one, like int32_t, so that there are no conflicts when different machines are involved. Or I would print the number and scan it on the other side, which avoids some of these problems.
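For instance, a minimal sketch of that approach (the error-code names are made up, and it assumes a POSIX-style <arpa/inet.h> for htonl/ntohl; the answer mentions int32_t, but uint32_t is used here to match htonl's signature):
#include <stdint.h>
#include <arpa/inet.h>  /* htonl, ntohl; assumes a POSIX system */

enum error_code { ERR_OK = 0, ERR_TIMEOUT = 1, ERR_CRC = 2 };  /* hypothetical codes */

/* Sender side: pin the value to a fixed-width type and a fixed byte order. */
static uint32_t encode_error(enum error_code e)
{
    return htonl((uint32_t)e);
}

/* Receiver side: undo the byte-order conversion and cast back to the enum. */
static enum error_code decode_error(uint32_t wire)
{
    return (enum error_code)ntohl(wire);
}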
Response to Edit
Well, the compiler is not required to use any particular size. An easy thing to see is that the sign of the values matters: unsigned types can give a significant performance boost in some calculations. The following is the behavior of GCC 4.4.0 on my box:
int main(void) {
    enum X { A = 0 };
    enum X a; // X compatible with "unsigned int"
    unsigned int *p = &a;
}
But if you assign a -1, then GCC chooses to use int as the type that X is compatible with:
int main(void) {
    enum X { A = -1 };
    enum X a; // X compatible with "int"
    int *p = &a;
}
Using GCC's --short-enums option makes it use the smallest type that still fits all the values:
int main(void) {
    enum X { A = 0 };
    enum X a; // X compatible with "unsigned char"
    unsigned char *p = &a;
}
In recent versions of GCC, the compiler flag has changed to -fshort-enums. On some targets, the default type is unsigned int.
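A quick way to check what your own compiler does is to print the sizes directly (compile the snippet below with and without -fshort-enums and compare; the exact results are implementation-defined):
#include <stdio.h>

enum X { A = 0 };   /* small non-negative values */
enum Y { B = -1 };  /* needs a signed type */

int main(void) {
    printf("sizeof(enum X) = %zu\n", sizeof(enum X));
    printf("sizeof(enum Y) = %zu\n", sizeof(enum Y));
    return 0;
}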

C99, 6.7.2.2p4 says
Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined,108) but shall be capable of representing the values of all the members of the enumeration. [...]
Footnote 108 adds
An implementation may delay the choice of which integer type until all enumeration constants have been seen.
So it is implementation-defined, and sizeof(enum) might be equal to sizeof(char), i.e. 1.
In choosing the size for a small range of integers, there is always a penalty. If you make it small in memory, there is probably a processing penalty; if you make it larger, there is a space penalty. It's a time-space trade-off.
Error codes are typically #defines, because they need to be extensible: different libraries may add new error codes. You cannot do that with enums.
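A small sketch of what that extensibility looks like in practice (the header names and codes below are invented for illustration):
/* base_errors.h -- hypothetical core library */
#define ERR_NONE        0
#define ERR_NO_MEMORY   1
#define ERR_IO          2

/* net_errors.h -- a separate library can extend the code space later
   without reopening an existing enum definition */
#define ERR_NET_BASE    100
#define ERR_TIMEOUT     (ERR_NET_BASE + 0)
#define ERR_CONN_RESET  (ERR_NET_BASE + 1)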

Is the sizeof(enum) == sizeof(int), always
The ANSI C standard says:
Each enumerated type shall be compatible with char, a signed integer type, or an unsigned integer type. The choice of type is implementation-defined. (6.7.2.2 Enumeration specifiers)
So I would take that to mean no.
If this is the case, won't #define be better, as it would save memory?
In what way would using defines save memory over using an enum? An enum is just a type that allows you to provide more information to the compiler. In the actual resulting executable, it's just turned into an integer, just as the preprocessor converts a macro created with #define into its value.
What is the usual practice? If I have to transport these return types over a network and some processing has to be done at the other end
If you plan to transport values over a network and process them on the other end, you should define a protocol. Decide on the size in bits of each type, the endianess (in which order the bytes are) and make sure you adhere to that in both the client and the server code. Also don't just assume that because it happens to work, you've got it right. It just might be that the endianess, for example, on your chosen client and server platforms matches, but that might not always be the case.
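As a minimal sketch of such a protocol decision (the field layout is invented for illustration): agree that the status code is a 4-byte unsigned integer sent most-significant byte first, then pack and unpack it explicitly so neither side depends on its native endianness.
#include <stdint.h>

/* Pack a 32-bit status code most-significant byte first ("network order"). */
static void pack_u32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)(value);
}

/* Unpack it on the receiving side; this works the same on any host. */
static uint32_t unpack_u32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}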

No.
Example: The CodeSourcery compiler
When you define an enum like this:
enum MyEnum1 {
    A = 1,
    B = 2,
    C = 3
};
// sizeof(enum MyEnum1) == 1 (the values fit in a char)

enum MyEnum2 {  // a second, separate example
    A = 1,
    B = 2,
    C = 3,
    D = 400
};
// sizeof(enum MyEnum2) == 2 (400 no longer fits in a char)
Details from their mailing list

On some compilers the size of an enum depends on how many entries the enum has (fewer than 255 entries => a byte, more than 255 entries => an int).
But this depends on the compiler and the compiler settings.

enum fruits { apple, orange, strawberry, grapefruit };
char fruit = apple;
fruit = orange;
if (fruit < strawberry)
...
All of this works perfectly.
If you want a specific underlying type for an enum instance, just don't use the type itself.


Statically assert that enum is a certain underlying type

MISRA 10.1 forbids performing arithmetic on an object of an enumerated type.
An operand of essentially enum type should not be used in an arithmetic operation because
an enum object uses an implementation-defined integer type. An operation involving an
enum object may therefore yield a result with an unexpected type. Note that an enumeration
constant from an anonymous enum has essentially signed type.
They also stipulate that the ++ and -- unary operators are treated as binary addition and subtraction for the purpose of this rule.
If I use integers in the loop control structure I'll still need to cast them back to enum later, which will violate Rule 10.5
The value of an expression should not be cast to an inappropriate
essential type
Is there a way I can use static assertions to guarantee some assumptions about the underlying enum type? This code may be reused in the future on another architecture. I'd like to deviate from 10.5 in this case with confidence that the code will throw a compile time error if some assumption about what the underlying enumerate type is violated.
Contrived example:
typedef enum { thing1, thing2, ... , thing_max } thing_index_t;
...
for (int thing_index = 0; thing_index < (int)thing_max; ++thing_index)
{
    init_something((thing_index_t)thing_index);
    //             ^~~~~~~~~~~~~~~~~~~~~~~~~~~
    // cannot cast a signed value to an enum type
    // [MISRA 2012 Rule 10.5, advisory]
}
This should always be a safe cast if I statically assert that sizeof(thing_index_t == int); and that thing1 == 0u right?
Int will always be large enough to hold my whole range of values without promotion FWIW.
Rule 10.5 is overall sound, but controlled conversions from enum to signed/unsigned aren't dangerous. MISRA's concern is that you might have an enum like enum { thing1=123, thing2=456, ... }. But if you know that the enumerator constants run from 0 to max, it is mostly safe to convert to and from integers.
You don't need a formal deviation for advisory rules. I would rather leave a comment such as
/* Violates MISRA 10.5 but iterating from thing1 to thing_num is safe.
Integer type is used since arithmetic on enums is forbidden by 10.1. */
(Or use whatever process you have in place for dealing with advisory rules.)
As for static asserts, sizeof(thing_index_t == int) proves nothing, since the allowed enumeration constant values are what matter. And thing1 == 0u is guaranteed by the C standard, so you don't need to assert that.
A static assert to ensure enum integrity should rather look like
#define THING_VALUES \
    thing1,          \
    thing2,          \
    thing_num

/* With the explicit initializers below, thing_num == 457 and the assertion
   fires, which is exactly what it is meant to catch; with plain
   thing1, thing2, thing_num it passes. */
typedef enum { thing1=123, thing2=456, thing_num } thing_index_t;

/* An integer constant expression, so it can be used in _Static_assert: */
#define expected_size ( sizeof((thing_index_t[]){ THING_VALUES }) / sizeof(thing_index_t) )

_Static_assert( thing_num+1 == expected_size, "unexpected enumerator values in thing_index_t" );
where the compound literal (thing_index_t[]){ THING_VALUES } gets a size corresponding to the number of enumeration constants in the list. expected_size is the number of items. This asserts that no special initializers such as thing1=123 are present.
The only loophole is something exotic like thing1=123, thing2=1, which this won't catch. To protect against that, you would need to go even further with macros, implementing the whole thing with X macros etc.
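One possible sketch of that X-macro idea (the THING_LIST and AS_* macro names are my own, not from the answer): the list of names is written once, and both the enum and the expected count are generated from it, so explicit initializers cannot drift out of sync with the count.
/* The single source of truth: the list of enumerator names. */
#define THING_LIST(X) \
    X(thing1)         \
    X(thing2)         \
    X(thing3)

#define AS_ENUM(name)  name,   /* expands each name into an enumerator */
#define AS_COUNT(name) +1      /* counts the names in the list */

typedef enum { THING_LIST(AS_ENUM) thing_num } thing_index_t;

/* The enumerators are generated without explicit initializers, so thing_num
   must equal the number of names in the list. */
_Static_assert(thing_num == (0 THING_LIST(AS_COUNT)),
               "thing_index_t is out of sync with THING_LIST");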

What part of the C language handles type checking?

Does a language feature allow the compiler to check the type of a variable in memory, or is type checking based only on the keyword used for the variable type?
For example:
unsigned short n = 3;
int *p = &n;
Even if int and short happened to use the same number of bytes in memory, the compiler would not implicitly convert a short * to an int *. How does the compiler know that &n isn't a valid value for p in this case?
Does a language feature allow the compiler to check the type of a variable in memory, or is type checking based only on the keyword used for the variable type?
This is a very confused question. The compiler is the thing that implements the language. Therefore a language feature might require the compiler to do certain things (in order to make the feature work), but it doesn't allow the compiler to do things.
A "variable in memory" is a run-time concept. The compiler is only involved at compile time: It translates code in some language (the source language) into another language (the target language, typically assembler code / machine code). It emits instructions that (when executed) reserve memory and use this memory to store values. But at run-time, when the program is actually executed, the compiler is no longer part of the picture, so it can't check anything.
In C, types are checked at compile time. The compiler knows the types of literals (e.g. 42 is an int and "hello" is a char [6]), and it knows the type of everything you declare (because it has to parse the declarations), including variables. Type checking and type conversion rules are unrelated to the sizes of types.
For example:
short int a = 42;
double b = a; // OK, even though commonly sizeof a == 2 and sizeof b == 8
On the other hand:
signed char c;
char *p = &c; // error, even though commonly char and signed char have the same
// size, representation, and range of possible values
It is perfectly possible to type-check C without actually generating any code.
Every expression has a type, ultimately derived from the types of the variables and literals that appear in it. The type of &n is unsigned short*, and that cannot be used to initialize a variable of type int*. This has nothing to do with examining memory, so it works regardless of context other than the variable types.

When to use uint16_t vs int and when to cast type [duplicate]

I have 2 questions about C programming:
For int and uint16_t, long and uint32_t, and so on: when should I use the u*_t types instead of int, long, and so on? I find it confusing to choose which one is best for my program.
When do I need to cast type?
I have the following statement in my program:
long * src;
long * dst;
...
memcpy(dst, src, len);
My friend changed this to
memcpy((char *)dst, (char *)src, len).
This is just an example I encountered. Generally, I am confused about when a cast is required.
Use the plain types (int etc) except when you need a precisely-sized type. You might need the precisely sized type if you are working with a wire protocol which defines that the size field shall be a 2-byte unsigned integer (hence uint16_t), but for most work, most of the time, use the plain types. (There are some caveats to this, but most of the time, most people can work with the plain types for simple numeric work. If you are working to a set of interfaces, use the types dictated by the interfaces. If you're using multiple interfaces and the types clash, you'll have to consider using casting some of the time — or change one or both interfaces. Etc.)
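To make the contrast concrete, here is a small sketch (the protocol field names are invented): uint16_t only where an external format fixes the width, plain types for ordinary work.
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: the protocol defines "length" and "type" as
   2-byte unsigned fields, so their width is dictated by the protocol. */
struct packet_header {
    uint16_t length;
    uint16_t type;
};

/* Ordinary counting: the plain types are fine here. */
static size_t count_nonzero(const int *values, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (values[i] != 0) {
            count++;
        }
    }
    return count;
}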
The casts added by your friend are pointless. The actual prototype of memcpy() is:
void *memcpy(void * restrict s1, const void * restrict s2, size_t n);
The compiler converts the long * values to void * (nominally via char * because of the cast), all of which is almost always a no-op.
More generally, you use a cast when you need to change the type of something. One place you might need it is in bitwise operations, where you want a 64-bit result but the operands are 32-bit and leaving the conversion until after the bitwise operations gives a different result from the one you wanted. For example, assuming a system where int is 32 bits and long is 64 bits.
unsigned int x = 0x012345678;
unsigned long y = (~x << 22) | 0x1111;
This would calculate ~x as a 32-bit quantity, and the shift would be performed on a 32-bit quantity, losing a number of bits. By contrast:
unsigned long z = (~(unsigned long)x << 22) | 0x1111;
ensures that the calculation is done in 64-bit arithmetic and doesn't lose any bits from the original value.
The size of "classical" types like int and long int can vary between systems. This can cause problems when, for example, accessing files with fixed-width data structures. For example, int long is currently a 64-bit integer on new systems, but only 32 bits on older systems.
The intN_t and uintN_t types were introduced with C99 and are defined in <inttypes.h>. Since they explicitly specify the number of bits, they eliminate any ambiguity. As a rule, you should use these types in preference if you are at all concerned about making your code portable.
Wikipedia has more information
If you do not want to rely on your compiler's choices, use the fixed-width types provided by the standard library headers. Every C library you compile against is guaranteed to map these names to types at least large enough to hold the values their names declare.
In your friend's specific case, one can assume that he made this cast just to point out to other readers that the two pointers actually hold character data. Or maybe he is an old-fashioned guy who remembers the times when there was no void type and the "lowest common denominator" was a pointer to char. In my developer life, if I want to emphasize some of my actions, I'll make an explicit cast even if it is, in fact, redundant.
For you 1st question, look at : https://stackoverflow.com/questions/11786113/difference-between-different-integer-types
Basically, the name with _t is the real standard type name, and the name without it is a define for the same type.
The u is for unsigned, which doesn't allow negative numbers.
As for your second question, you often need a cast when the function being called expects arguments of a different type than what you're passing.

What is the maximum value of a pointer on Mac OSX?

I'd like to make a custom pointer address printer (like the printf(%p)) and I'd like to know what is the maximum value that a pointer can have on the computer I'm using, which is an iMac OS X 10.8.5.
Someone recommended I use an unsigned long. Is the following cast appropriate, and is unsigned long big enough?
void print_address(void *pointer)
{
    unsigned long a;
    a = (unsigned long) pointer;
    [...]
}
I searched in the limits.h header but I couldn't find any mention of it. Is it a fixed value, or is there a way to find out what the maximum is on my system?
Thanks for your help!
Quick summary: Convert the pointer to uintptr_t (defined in <stdint.h>), which will give you a number in the range 0 to UINTPTR_MAX. Read on for the gory details and some unlikely problems you might theoretically run into.
In general there is no such thing as a "maximum value" for a pointer. Pointers are not numbers, and < and > comparisons aren't even defined unless both pointers point into the same (array) object or just past the end of it.
But I think that the size of a pointer is really what you're looking for. And if you can convert a void* pointer value to an unsigned 32-bit or 64-bit integer, the maximum value of that integer is going to be 2^32 - 1 or 2^64 - 1, respectively.
The type uintptr_t, declared in <stdint.h>, is an unsigned integer type such that converting a void* value to uintptr_t and back again yields a value that compares equal to the original pointer. In short, the conversion (uintptr_t)ptr will not lose information.
<stdint.h> defines a macro UINTPTR_MAX, which is the maximum value of type uintptr_t. That's not exactly the "maximum value of a pointer", but it's probably what you're looking for.
(On many systems, including Mac OSX, pointers are represented as if they were integers that can be used as indices into a linear monolithic address space. That's a common memory model, but it's not actually required by the C standard. For example, some systems may represent a pointer as a combination of a descriptor and an offset, which makes comparisons between arbitrary pointer values difficult or even impossible.)
The <stdint.h> header and the uintptr_t type were added to the C language by the 1999 standard. For MacOS, you shouldn't have to worry about pre-C99 compilers.
Note also that the uintptr_t type is optional. If pointers are bigger than any available integer type, then the implementation won't define uintptr_t. Again, you shouldn't have to worry about that for MacOS. If you want to be fanatical about portable code, then you can use
#include <stdint.h>
#ifdef UINTPTR_MAX
/* uintptr_t exists */
#else
/* uintptr_t doesn't exist; do something else */
#endif
where "something else" is left as an exercise.
You probably are looking for the value of UINTPTR_MAX defined in <stdint.h>.
As ouah's answer says, uintptr_t sounds like the type you really want.
unsigned long is not guaranteed to be able to represent a pointer value. Use uintptr_t, which is an unsigned integer type that can hold a pointer value.
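Putting the answers above together, a minimal sketch of the custom pointer printer (assuming C99 <stdint.h>/<inttypes.h>; the zero-padded width is derived from the pointer size, which assumes 8-bit bytes):
#include <inttypes.h>  /* PRIXPTR, PRIuPTR */
#include <stdint.h>    /* uintptr_t, UINTPTR_MAX */
#include <stdio.h>

static void print_address(const void *pointer)
{
    uintptr_t a = (uintptr_t)pointer;
    int width = (int)(sizeof(uintptr_t) * 2);  /* two hex digits per byte */
    printf("0x%0*" PRIXPTR "\n", width, a);
}

int main(void)
{
    int x = 42;
    print_address(&x);
    printf("UINTPTR_MAX = %" PRIuPTR "\n", (uintptr_t)UINTPTR_MAX);
    return 0;
}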

Integer types in C

Suppose I wish to write a C program (C99 or C2011) that I want to be completely portable and not tied to a particular architecture.
It seems that I would then want to make a clean break from the old integer types (int, long, short) and friends and use only int8_t, uint8_t, int32_t and so on (perhaps using the least and fast versions as well).
What then is the return type of main? Or must we stick with int? Is it required by the standard to be int?
GCC-4.2 allows me to write
#include <stdint.h>
#include <stdio.h>
int32_t main() {
    printf("Hello\n");
    return 0;
}
but I cannot use uint32_t or even int8_t because then I get
hello.c:3: warning: return type of ‘main’ is not ‘int’
This is because of a typedef, no doubt. It seems this is one case where we are stuck with having to use the unspecified size types, since it's not truly portable unless we leave the return type up to the target architecture. Is this interpretation correct? It seems odd to have "just one" plain old int in the code base but I am happy to be pragmatic.
Suppose I wish to write a C program (C99 or C2011) that I want to be
completely portable and not tied to a particular architecture.
It seems that I would then want to make a clean break from the old
integer types (int, long, short) and friends and use only int8_t,
uint8_t, int32_t and so on (perhaps using the least and fast
versions as well).
These two requirements are contradictory. That's because whether uint32_t, uint8_t and the like are available at all is itself implementation-defined (C11, 7.20.1.1/3: Exact-width integer types).
If you want your program to be truly portable, you must use the built-in types (int, long, etc.) and stick to the minimum ranges defined in the C standard (i.e. C11, 5.2.4.2.1: Sizes of integer types).
For example, the standard says that both short and int must cover at least the range -32767 to 32767. So if you want to store a value outside that range, say 42000, you'd use a long instead.
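A tiny sketch of that rule of thumb (the variable name is made up); <limits.h> shows what the current platform actually provides:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* int is only guaranteed to reach 32767, so 42000 might not fit;
       long is guaranteed to reach at least 2147483647. */
    long widget_count = 42000L;

    printf("widget_count = %ld\n", widget_count);
    printf("this platform: INT_MAX = %d, LONG_MAX = %ld\n", INT_MAX, LONG_MAX);
    return 0;
}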
The return type of main is required by the Standard to be int in C89, C99 and in C11.
Now the exact-width integer types are aliases for standard integer types. So if you use the right alias for int, it will still be valid.
For example:
int32_t main(void)
if int32_t is a typedef to int.
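If you want to verify that assumption rather than rely on it, one possible compile-time check (assuming C11 for _Generic and _Static_assert; on many platforms int32_t is indeed int, but this is not guaranteed):
#include <stdint.h>

/* 1 if int32_t is exactly int on this implementation, 0 otherwise. */
#define INT32_IS_INT _Generic((int32_t)0, int: 1, default: 0)

_Static_assert(INT32_IS_INT, "int32_t is not a typedef for int here");

int main(void)
{
    return 0;
}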
