How do I find the width of a uint_fast32_t in C?

I'd like to be able to portably fprintf() a uint_fast32_t as defined in stdint.h with all leading zeroes. For instance if my platform defined uint_fast32_t as a 64-bit unsigned integer, I would want to fprintf() with a format specifier like %016lX, but if it's a 32-bit unsigned integer, I would want to use %08lX.
Is there a macro like INTFAST32_BITS somewhere that I could use?
right now I'm using
#if UINT_FAST32_MAX == 0xFFFFFFFF
# define FORMAT "08"PRIXFAST32
#elif UINT_FAST32_MAX == 0xFFFFFFFFFFFFFFFF
# define FORMAT "016"PRIXFAST32
#endif
But this only works if uint_fast32_t is exactly 32 or 64 bits wide, and the code is also kinda clunky and hard to read.

You can find the size of a uint_fast32_t the same way you can find the size of any other type: by applying the sizeof operator to it. The unit of the result is the size of type char. Example:
size_t chars_in_a_uint_fast32_t = sizeof(uint_fast32_t);
But this is not sufficient for printing one with one of the printf-family functions. Undefined behavior arises in printf et al. if any formatting directive is mismatched with the type of the corresponding argument, and the size of uint_fast32_t is not sufficient information to make a proper match.
This is where the formatting macros described in another answer come in. These provide the appropriate size modifier and conversion letter for each type and conversion. For an X conversion of uint_fast32_t, the appropriate macro is PRIXFAST32.
You provide any flags, field width, and precision as normal. If indeed you want to use a field width that is adapted to the actual size of uint_fast32_t, then a reasonable way to do that would be to avail yourself of the option to specify that the field width is passed as an argument (of type int). Thus:
uint_fast32_t fast;
// ...
printf("%0*" PRIXFAST32 "\n", (int) sizeof(fast) * 2, fast);
I note, however, that this seems a little questionable, inasmuch as you are making provision for values wider than generally you should allow any value of that type to become. The printf function will not in any case print fewer digits than are required to express the value, supposing that it is indeed of the specified type, so I would be inclined to simply use a fixed field width sufficient for 32-bit integers (say 8).
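For example, a minimal sketch of that fixed-width approach (the value assigned to fast is just an illustration):
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint_fast32_t fast = 0xDEADBEEF;
    // Pad to 8 hex digits, enough for any 32-bit value; printf simply uses
    // more digits if the stored value happens to be wider than that.
    printf("%08" PRIXFAST32 "\n", fast);
    return 0;
}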

Related

Use of format specifiers for conversions

I am unable to deduce the internal happenings inside the machine when we print data using format specifiers.
I was trying to understand the concept of signed and unsigned integers and then found the following:
unsigned int b=-12;
printf("%d\n",b); //prints -12
printf("%u\n\n",b); //prints 4294967284
I am guessing that b actually stores the binary version of -12 as 11111111111111111111111111110100.
So, since b is unsigned, b technically stores 4294967284.
But still the format specifier %d causes the binary value of b to be printed as its signed version, i.e., -12.
However,
printf("%f\n",2); //prints 0.000000
printf("%f\n",100); //prints 0.000000
printf("%d\n",3.2); //prints 2147483639
printf("%d\n",3.1); //prints 2147483637
I kind of expected the 2 to be printed as 2.000000 and 3.2 to be printed as 3 as per type conversion norms.
Why does this not happen and what exactly takes place at machine level ?
Mismatching format specifier and argument type (like using the floating point specifier "%f" to print an int value) leads to undefined behavior.
Remember that 2 is an integer value, and vararg functions (like printf) don't really know the types of the arguments. The printf function has to rely on the format specifier and assume the argument is of the specified type.
To better understand how you get the results you get, to understand "the internal happenings", we first must make two assumptions:
The system uses 32 bits for the int type
The system uses 64 bits for the double type
Now what happens with
printf("%f\n",2); //prints 0.000000
is that the printf function sees the "%f" specifier and fetches the next argument as a 64-bit double value. Since the int value you provided in the argument list is only 32 bits, half of the bits in the double value will be unknown. The printf function will then print the (invalid) double value. If you're unlucky, some of the unknown bits might make the value a trap representation, which can cause a crash.
Similarly with
printf("%d\n",3.2); //prints 2147483639
the printf function fetches the next argument as a 32-bit int value, losing half of the bits in the 64-bit double value provided as the actual argument. Exactly which 32 bits are copied into the internal int value depends on endianness. Integers don't have trap values, so no crash happens; just an unexpected value is printed.
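For comparison, a short sketch where the arguments are explicitly converted to match the specifiers, so no undefined behavior is involved:
#include <stdio.h>

int main(void)
{
    printf("%f\n", (double)2);   // prints 2.000000
    printf("%d\n", (int)3.2);    // prints 3: the conversion truncates toward zero
    return 0;
}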
what exactly takes place at machine level ?
The stdio.h functions are quite far from the machine level. They provide a standardized abstraction layer on top of various OS API. Whereas "machine level" would refer to the generated assembler. The behavior you experience is mostly related to details of the C language rather than the machine.
On the machine level, there are no signed numbers; everything is treated as raw binary data. The compiler can turn raw binary data into a signed number by using an instruction that tells the CPU: "use what's stored at this location and treat it as a signed number". Specifically, as a two's complement signed number on all common computers. But this is irrelevant when explaining why your code misbehaves.
The integer constant 12 is of type int. When we write -12 we apply the unary - operator on that. The result is still of type int but now of value -12.
Then you attempt to store this negative number in an unsigned int. This triggers an implicit conversion to unsigned int, which should be carried out according to the C standard:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type
The maximum value of a 32-bit unsigned int is 2^32 - 1, which equals 4294967295. "One more than the maximum" gives 2^32 = 4294967296. If we calculate -12 + 4294967296 we get 4294967284. This is in range of an unsigned int and is the result you see later.
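A small sketch that checks this arithmetic (assuming, as above, a 32-bit unsigned int):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int b = -12;                    // converted: -12 + (UINT_MAX + 1)
    printf("%u\n", b);                       // prints 4294967284
    printf("%u\n", (UINT_MAX - 12u) + 1u);   // same value, computed explicitly
    return 0;
}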
Now as it happens, the printf family of functions is very unsafe. If you provide a wrong format specifier which doesn't match the type, they might crash or display the wrong result etc. - the program invokes undefined behavior.
So when you use %d or %i reserved for signed int, but pass an unsigned int, anything can happen. "Anything" includes the compiler trying to convert the passed type to match the passed format specifier. That's what happened when you used %d.
When you pass values of types completely mismatching the format specifier, the program just prints gibberish though. Because you are still invoking undefined behavior.
I kind of expected the 2 to be printed as 2.000000 and 3.2 to be printed as 3 as per type conversion norms.
The reason why the printf family can't do anything intelligent, like assuming that 2 should be converted to 2.0, is that they are variadic (variable argument) functions, meaning they can take any number of arguments. In order to make that possible, the parameters are essentially passed as raw binary through something called va_list, and all type information is lost. The printf implementation is therefore left with no type information but the format string you gave it. This is why variadic functions are so unsafe to use.
Contrast this with a regular function, which has more type safety: if you declare void foo (float f) and pass the integer constant 2 (type int), the compiler will implicitly convert it from integer to float, and perhaps also give a conversion warning.
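A minimal sketch of that contrast (foo is just the hypothetical function from the paragraph above):
#include <stdio.h>

void foo(float f)             // the prototype is visible, so the compiler converts
{
    printf("%f\n", f);        // f is promoted to double for this variadic call
}

int main(void)
{
    foo(2);                    // prints 2.000000: 2 is implicitly converted to float
    printf("%f\n", (double)2); // with printf, the caller must supply the right type
    return 0;
}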
The behaviors you observe are the result of printf interpreting the bits given to it as the type specified by the format specifier. In particular, at least for your system:
The bits for an int argument and an unsigned argument in the same position within the argument list would be passed in the same place, so when you give printf one and tell it to format the other, it uses the bits you give it as if they were the bits of the other.
The bits for an int argument and a double argument would be passed in different places—possibly a general register for the int argument and a special floating-point register for the double argument, so when you give printf one and tell it to format the other, it does not get the bits for the double to use for the int; it gets completely unrelated bits that were left lying around by previous operations.
Whenever a function is called, values for its arguments must be placed in certain places. These places vary according to the software and hardware used, and they vary by the type and number of arguments. However, for any particular argument type, argument position, and specific software and hardware used, there is a specific place (or combination of places) where the bits of that argument should be stored to be passed to the function. The rules for this are part of the Application Binary Interface (ABI) for the software and hardware being used.
First, let us neglect any compiler optimization or transformation and examine what happens when the compiler implements a function call in source code directly as a function call in assembly language. The compiler will take the arguments you provide for printf and write them to the places designated for those types of arguments. When printf executes, it examines the format string. When it sees a format specifier, it figures out what type of argument it should have, and it looks for the value of that argument in the place for that type of argument.
Now, there are two things that can happen. Say you passed an unsigned but used a format specifier for int, like %d. In every ABI I have seen, an unsigned and an int argument (in the same position within the list of arguments) are passed in the same place. So, when printf looks for the bits of the int it is expecting, it will get the bits of the unsigned you passed.
Then printf will interpret those bits as if they encoded the value for an int, and it will print the results. In other words, the bits of your unsigned value are reinterpreted as the bits of an int.1
This explains why you see “-12” when you pass the unsigned value 4,294,967,284 to printf to be formatted with %d. When the bits 11111111111111111111111111110100 are interpreted as an unsigned, they represent the value 4,294,967,284. When they are interpreted as an int, they represent the value −12 on your system. (This encoding system is called two’s complement. Other encoding systems include one’s complement and sign-and-magnitude, in which these bits would represent −1 and −2,147,483,636, respectively. Those systems are rare for plain integer types these days.)
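A well-defined way to observe that reinterpretation (a sketch assuming a 32-bit int and two's complement, as on the asker's system):
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int u = 4294967284u;   // bit pattern 11111111111111111111111111110100
    int i;
    memcpy(&i, &u, sizeof i);       // copy the bits; no value conversion happens
    printf("%d\n", i);              // prints -12 on a two's complement system
    return 0;
}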
That is the first of two things that can happen, and it is common when the type you pass is the wrong type but similar to the expected type in size and nature—it is passed in the same place the expected type would be. The second thing that can happen is that the argument you pass is passed in a different place than the argument that is expected. For example, if you pass a double as an argument, it is, on many systems, placed in a separate set of registers for floating-point values. When printf goes looking for an int argument for %d, it will not find the bits of your double at all. Instead, what it finds in the place where it looks for an int argument might be whatever bits happened to be left in a register or memory location from previous operations, or it might be the bits of the next argument in the list of arguments. In any case, this means that the value printf prints for the %d will have nothing to do with the double value you passed, because the bits of the double are not involved in any way—a completely different set of bits is used.
This is also part of the reason the C standard says it does not define the behavior when the wrong argument type is passed for a printf conversion. Once you have messed up the argument list by passing double where an int should have been, all the following arguments may be in the wrong places too. They might be in different registers from where they are expected, or they might be in different stack locations from where they are expected. printf has no way to recover from this mistake.
As stated, all of the above neglects compiler optimization. The rules of C arose out of various needs, such as accommodating the problems above and making C portable to a variety of systems. However, once those rules are written, compilers can take advantage of them to allow optimization. The C standard permits a compiler to make any transformation of a program as long as the changed program has the same behavior as the original program under the rules of the C standard. This permission allows compilers to speed up programs tremendously in some circumstances. But a consequence is that, if your program has behavior not defined by the C standard (and not defined by any other rules the compiler follows), it is allowed to transform your program into anything. Over the years, compilers have grown increasingly aggressive about their optimizations, and they continue to grow. This means, aside from the simple behaviors described above, when you pass incorrect arguments to printf, the compiler is allowed to produce completely different results. Therefore, although you may commonly see the behaviors I describe above, you may not rely on them.
Footnote
1 Note that this is not a conversion. A conversion is an operation whose input is one type and whose output is another type but has the same value (or as nearly the same as is possible, in some sense, as when we convert a double 3.5 to an int 3). In some cases, a conversion does not require any change to the bits—an unsigned 3 and an int 3 use the same bits to represent 3, so the conversion does not change the bits, and the result is the same as a reinterpretation. But they are conceptually different.

When should I use UINT32_C(), INT32_C(),... macros in C?

I switched to fixed-length integer types in my projects mainly because they help me think about integer sizes more clearly when using them. Including them via #include <inttypes.h> also includes a bunch of other macros like the printing macros PRIu32, PRIu64,...
To assign a constant value to a fixed length variable I can use macros like UINT32_C() and INT32_C(). I started using them whenever I assigned a constant value.
This leads to code similar to this:
uint64_t i;
for (i = UINT64_C(0); i < UINT64_C(10); i++) { ... }
Now I saw several examples which did not care about that. One is the stdbool.h include file:
#define bool _Bool
#define false 0
#define true 1
bool has a size of 1 byte on my machine, so it does not look like an int. But 0 and 1 should be integers which are turned automatically into the right type by the compiler. If I used that approach in my example, the code would be much easier to read:
uint64_t i;
for (i = 0; i < 10; i++) { ... }
So when should I use the fixed-length constant macros like UINT32_C() and when should I leave that work to the compiler (I'm using GCC)? What if I were writing code in MISRA C?
As a rule of thumb, you should use them when the type of the literal matters. There are two things to consider: the size and the signedness.
Regarding size:
The int type is guaranteed by the C standard to hold values up to 32767. Since you can't get an integer literal with a smaller type than int, values up to 32767 should not need the macros. If you need larger values, then the type of the literal starts to matter and it is a good idea to use those macros.
Regarding signedness:
Integer literals with no suffix are usually of a signed type. This is potentially dangerous, as it can cause all manner of subtle bugs during implicit type promotion. For example (my_uint8_t + 1) << 31 would cause an undefined behavior bug on a 32 bit system, while (my_uint8_t + 1u) << 31 would not.
This is why MISRA has a rule stating that all integer literals should have an u/U suffix if the intention is to use unsigned types. So in my example above you could use my_uint8_t + UINT32_C(1) but you can as well use 1u, which is perhaps the most readable. Either should be fine for MISRA.
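A small sketch of that pitfall and the suffix fix (my_uint8_t here is just a uint8_t variable, and a 32-bit int is assumed):
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint8_t my_uint8_t = 1;

    // my_uint8_t + 1 promotes to (signed) int; shifting a set bit into the
    // sign bit of a 32-bit int is undefined behavior:
    // uint32_t bad = (my_uint8_t + 1) << 31;

    // With a u suffix the arithmetic happens in unsigned int, so the shift
    // is well defined (it wraps modulo 2^32, giving 0 here).
    uint32_t good = (my_uint8_t + 1u) << 31;
    printf("%" PRIu32 "\n", good);
    return 0;
}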
As for why stdbool.h defines true/false to be 1/0, it is because the standard explicitly says so. Boolean conditions in C still use int type, and not bool type like in C++, for backwards compatibility reasons.
It is however considered good style to treat boolean conditions as if C had a true boolean type. MISRA-C:2012 has a whole set of rules regarding this concept, called essentially boolean type. This can give better type safety during static analysis and also prevent various bugs.
It's for using smallish integer literals in contexts where the compiler won't implicitly convert them to the correct size.
I've worked on an embedded platform where int is 16 bits and long is 32 bits. If you were trying to write portable code to work on platforms with either 16-bit or 32-bit int types, and wanted to pass a 32-bit "unsigned integer literal" to a variadic function, you'd need the macro:
#define BAUDRATE UINT32_C(38400)
printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
On the 16-bit platform, the macro expands to 38400UL and on the 32-bit platform to just 38400U. Those match the corresponding PRIu32 macro of either "lu" or "u".
I think that most compilers would generate identical code for (uint32_t) X as for UINT32_C(X) when X is an integer literal, but that might not have been the case with early compilers.

What is the purpose of "Macros for minimum-width integer constants"

In the C99 standard, Section 7.18.4.1 "Macros for minimum-width integer constants" defines macros of the form [U]INT[N]_C(x) for turning integer constants into minimum-width types, where N = 8, 16, 32, 64. Why are these macros defined when I can use the L, UL, LL or ULL suffixes instead? For example, when I want an unsigned constant integer of at least 32 bits, I can simply write 42UL instead of UINT32_C(42). Since the long data type is at least 32 bits wide, it is also portable.
So, what is the purpose of these macros?
You'd need them in places where you want to make sure that the constant doesn't become too wide:
#define myConstant UINT32_C(42)
and later
printf( "%" PRId32 " is %s\n", (hasproperty ? toto : myConstant), "rich");
Here, if the constant were UL, the conditional expression might have type unsigned long, and the variadic function call could put a 64-bit value on the stack that would be misinterpreted by printf.
They use the smallest integer type with a width of at least N, so UINT32_C(42) is only equivalent to 42UL on systems where int is smaller than 32 bits. On systems where int is 32 bits or greater, UINT32_C(42) is equivalent to 42U. You could even imagine a system where a short is 32 bits wide, in which case UINT32_C(42) would be equivalent to (unsigned short)42.
EDIT: #obareey It seems that most, if not all, implementations of the standard library do not comply with this part of the standard, perhaps because it is impossible. [glibc bug 2841] [glibc commit b7398be5]
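For illustration, a sketch of how an implementation could implement the idea described above (MY_UINT32_C is a made-up name so it does not clash with the real macro; as the EDIT notes, real libraries often just hard-code the suffix):
#include <limits.h>
#include <stdio.h>

#if UINT_MAX >= 0xFFFFFFFF          // int is at least 32 bits wide
#define MY_UINT32_C(c)  c ## U      // MY_UINT32_C(42) -> 42U
#else                               // e.g. 16-bit int, 32-bit long
#define MY_UINT32_C(c)  c ## UL     // MY_UINT32_C(42) -> 42UL
#endif

int main(void)
{
    printf("%lu\n", (unsigned long)MY_UINT32_C(42));   // 42 on either kind of platform
    return 0;
}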
The macros essentially add an integer constant suffix such as L, LL, U, UL or ULL to their argument, which basically makes them almost equivalent to the corresponding cast, except that the suffix won't ever downcast.
E.g.,
UINT32_C(42000000000) (42 billion) on an LLP64 architecture will turn into 42000000000U, which takes the type of the first unsigned type that can hold it under the usual rules for integer constants (unsigned long long on LLP64). The corresponding cast, on the other hand ((uint32_t)42000000000), would truncate it down to uint32_t (unsigned int on LLP64).
I can't think of a good use case, but I imagine it could be usable in some generic bit-twiddling macros that need at least X bits to work, but don't want to remove any extra bits if the user passes in something bigger.

will ~ operator change the data type?

When I read someone's code I found that he bothered to write an explicit type cast:
#define ULONG_MAX ((unsigned long int) ~(unsigned long int) 0)
When I write code
#include <stdio.h>

int main(void)
{
    unsigned long int max;
    max = ~(unsigned long int)0;
    printf("%lx", max);
    return 0;
}
it works as well. Is it just a meaningless coding style?
The code you read is very bad, for several reasons.
First of all user code should never define ULONG_MAX. This is a reserved identifier and must be provided by the compiler implementation.
That definition is not suitable for use in a preprocessor #if. The _MAX macros for the basic integer types must be usable there.
(unsigned long)0 is just crap. Everybody should just use 0UL, unless you know that you have a compiler that is not compliant with the recent C standards in that respect. (I don't know of any.)
Even ~0UL should not be used for that value, since unsigned long may (theoretically) have padding bits. -1UL is more appropriate, because it doesn't deal with the bit pattern of the value. It uses the guaranteed arithmetic properties of unsigned integer types: -1, converted to an unsigned type, is always the maximum value of that type. So ~ may only be used in a context where you are absolutely certain that unsigned long has no padding bits, and used like that it makes no sense anyway; -1 serves better.
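A minimal sketch comparing the two spellings (on common implementations, which have no padding bits, both agree):
#include <stdio.h>

int main(void)
{
    unsigned long a = ~0UL;   // flips every bit of 0UL
    unsigned long b = -1UL;   // maximum value by the unsigned conversion rule
    printf("%lx\n%lx\n", a, b);
    printf("%d\n", a == b);   // prints 1
    return 0;
}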
"recasting" an expression that is known to be unsigned long is just superfluous, as you observed. I can't imagine any compiler that bugs on that.
Recasting an expression may make sense when it is used in the preprocessor, but only under very restricted circumstances, and casts are interpreted differently there:
#if ((uintmax_t)-1UL) == SOMETHING
..
#endif
Here the value on the left evaluates to UINTMAX_MAX in the preprocessor and in later compiler phases. So
#define UINTMAX_MAX ((uintmax_t)-1UL)
would be an appropriate definition for a compiler implementation.
To see the value for the preprocessor, observe that there (uintmax_t) is not a cast but an unknown identifier token inside (), and that it evaluates to 0. The minus sign is then interpreted as binary minus, so we have 0-1UL, which is unsigned and thus the maximum value of the type. But that trick only works if the cast contains a single identifier token (not three, as in your example) and if the constant that follows it has a - or + sign in front of it.
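A minimal sketch of that preprocessor behaviour (uintmax_t need not even be declared, because #if only sees tokens):
#include <stdio.h>

// In #if, uintmax_t is an unknown identifier and becomes 0, so the condition
// reads ((0)-1UL) > 0: the unsigned subtraction wraps to the maximum value.
#if ((uintmax_t)-1UL) > 0
#define SEEN_AS_MAX 1
#else
#define SEEN_AS_MAX 0
#endif

int main(void)
{
    printf("%d\n", SEEN_AS_MAX);   // prints 1
    return 0;
}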
They are trying to ensure that the type of the value 0 is unsigned long. When you assign zero to a variable, it gets converted to the appropriate type.
In this case, if 0 doesn't happen to be an unsigned long then the ~ operator will be applied to whatever other type it happens to be and the result of that will be cast.
This would be a problem if the compiler decided that 0 is a short or char.
However, the type after the ~ operator should remain the same. So they are being overly cautious with the outer cast, but perhaps the inner cast is justified.
They could of course have specified the correct zero type to begin with by writing ~0UL.

What is the size of an enum in C?

I'm creating a set of enum values, but I need each enum value to be 64 bits wide. If I recall correctly, an enum is generally the same size as an int; but I thought I read somewhere that (at least in GCC) the compiler can make the enum any width they need to be to hold their values. So, is it possible to have an enum that is 64 bits wide?
Taken from the current C Standard (C99): http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf
6.7.2.2 Enumeration specifiers
[...]
Constraints
The expression that defines the value of an enumeration constant shall be an integer
constant expression that has a value representable as an int.
[...]
Each enumerated type shall be compatible with char, a signed integer type, or an
unsigned integer type. The choice of type is implementation-defined, but shall be
capable of representing the values of all the members of the enumeration.
Not that compilers are any good at following the standard, but essentially: if your enum holds anything other than an int, you're in deep "unsupported behavior that may come back to bite you in a year or two" territory.
Update: The latest publicly available draft of the C Standard (C11): http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1570.pdf contains the same clauses. Hence, this answer still holds for C11.
An enum is only guaranteed to be large enough to hold int values. The compiler is free to choose the actual type used based on the enumeration constants defined so it can choose a smaller type if it can represent the values you define. If you need enumeration constants that don't fit into an int you will need to use compiler-specific extensions to do so.
While the previous answers are correct, some compilers have options to break the standard and use the smallest type that will contain all values.
Example with GCC (documentation in the GCC Manual):
enum ord {
    FIRST = 1,
    SECOND,
    THIRD
} __attribute__ ((__packed__));
STATIC_ASSERT( sizeof(enum ord) == 1 )
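With C11, the same check can be written with the standard static_assert; a sketch that still relies on GCC's __packed__ extension:
#include <assert.h>   // static_assert (C11)

enum ord {
    FIRST = 1,
    SECOND,
    THIRD
} __attribute__ ((__packed__));

static_assert(sizeof(enum ord) == 1, "packed enum should occupy one byte");

int main(void)
{
    return 0;
}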
Just set the last value of the enum to a value large enough to make it the size you would like the enum to be; it should then be that size:
enum value { a = 0, b, c, d, e, f, g, h, i, j, l, m, n, last = 0xFFFFFFFFFFFFFFFF };
We have no control over the size of an enum variable. It depends entirely on the implementation: enum just lets the compiler give names to integer values, so an enum follows the size of an integer.
In the C language, an enum is guaranteed to be the size of an int. There is a compile-time option (-fshort-enums) to make it shorter (this is mainly useful in case the values do not exceed 64K). There is no compile-time option to increase its size to 64 bits.
Consider this code:
enum value { a, b, c, d, e, f, g, h, i, j, l, m, n };
enum value s;
printf("%zu\n", sizeof(s));
It will give 4 as output. So no matter the number of elements an enum contains, its size is always fixed.
