Determine data model by C preprocessor

I want to write a .h file conforming to C89 that would be understood by most C preprocessors like gcc, cl (Visual Studio) etc. and that would determine the data model used, i.e. how many bits the (unsigned) short, (unsigned) int and (unsigned) long types occupy. Where can I find the necessary macros? For instance, are there macros I can evaluate in order to find out whether the data model is e.g. ILP32, LP64, LLP64 or something else? It is fine for me to use compiler-specific macros, but I do not want to use architecture-specific or OS-specific macros. If possible, please also provide the necessary macros to check which compiler is used. Thank you!
ADDED 2: The goal is to allow for type definitions depending on the data model. For instance, if long is at least 48 bits wide, a 48-bit key type could be defined as long, but if not, I would need a struct for that. On the other hand, I do not want to rely on anything not guaranteed by C89, like "long is either 32-bit or 64-bit, so if ULONG_MAX != 0xFFFFFFFFlu, then long is wider than 48 bits", which does not have to be true on all C89-conforming compilers.
ADDED 1: The predefined GCC macros are described in the GCC documentation. Hence I can do the following:
#if defined(__GNUC__)
#define SHRT_BIT __SHRT_WIDTH__
#define INT_BIT __INT_WIDTH__
#define LONG_BIT __LONG_WIDTH__
#elif ???
# ???
#endif
printf(
"short is %u-bit\n"
"int is %u-bit\n"
"long is %u-bit\n",
SHRT_BIT, INT_BIT, LONG_BIT
);
Are there similar macros for other widely used compilers like cl (Visual Studio), which I could add at the location of the ??? in the code?

You can #include <limits.h> and test the values of SHRT_MAX, USHRT_MAX, INT_MAX, UINT_MAX, LONG_MAX, and ULONG_MAX.
For the signed types, if the maximum value is at least 1,073,741,824 (2^30), the type has at least 32 bits (31 for the value and 1 for the sign), and similarly for other powers of two. For the unsigned types, one fewer bit is indicated (no sign bit), so comparing to 2,147,483,648 (2^31) would indicate whether the type has at least 32 bits.
These of course give you only the width of a type, the number of bits used for the value and sign (if present). Its actual size may be larger due to padding bits.
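For instance, here is a minimal sketch (one possible approach, not a definitive recipe) of how such <limits.h> comparisons could drive the 48-bit key definition asked about in the question. key48_t and KEY48_IS_SCALAR are hypothetical names, and the right shift keeps every constant in the test within 32 bits, so the #if stays usable even where a C89 preprocessor evaluates in 32-bit (unsigned) long:
#include <limits.h>

/* Does unsigned long have at least 48 value bits?  Since ULONG_MAX is always
   2^N - 1 for N value bits, (ULONG_MAX >> 24) >= 0xFFFFFF holds exactly
   when N >= 48. */
#if (ULONG_MAX >> 24) >= 0xFFFFFF
typedef unsigned long key48_t;      /* one scalar holds the whole 48-bit key */
#define KEY48_IS_SCALAR 1
#else
typedef struct {
    unsigned short hi;   /* at least 16 bits */
    unsigned long  lo;   /* at least 32 bits */
} key48_t;
#define KEY48_IS_SCALAR 0
#endif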

Related

What is the difference between using INTXX_C macros and performing type cast to literals?

For example, this code is broken (I've just fixed it in the actual code):
uint64_t a = 1 << 60;
It can be fixed as
uint64_t a = (uint64_t)1 << 60;
but then this crossed my mind:
uint64_t a = UINT64_C(1) << 60;
I know that UINT64_C(1) is a macro that usually expands to 1ul on 64-bit systems, but what makes it different from just doing a type cast?
There is no obvious difference or advantage; these macros are somewhat redundant. There are, however, some minor, subtle differences between the cast and the macro:
(uintn_t)1 might be cumbersome to use for preprocessor purposes, whereas UINTN_C(1) expands into a single pp token.
The resulting type of the UINTN_C is actually uint_leastn_t and not uintn_t. So it is not necessarily the type you expected.
Static analysers for coding standards like MISRA-C might moan if you type 1 rather than 1u in your code, since shifting signed integers isn't a brilliant idea regardless of their size.
(uint64_t)1u is MISRA compliant, UINT64_C(1) might not be, or at least the analyser won't be able to tell, since it can't expand pp tokens like a compiler. And UINT64_C(1u) will likely not work, since this macro implementation probably looks something like this:
#define UINT64_C(n) ((uint_least64_t) n ## ull)
// BAD: 1u##ull = 1uull
In general, I would recommend using an explicit cast. Or, better yet, wrap all of this inside a named constant:
#define MY_BIT ( (uint64_t)1u << 60 )
(uint64_t)1 is formally an int value 1 cast to uint64_t, whereas 1ul is a constant 1 of type unsigned long, which is probably the same as uint64_t on a 64-bit system. As you are dealing with constants, all calculations will be done by the compiler and the result is the same.
The macro is a portable way to specify the correct suffix for a constant (literal) of type uint64_t. The suffix appended by the macro (ul, system specific) can be used for literal constants only.
The cast (uint64_t) can be used for both constant and variable values. With a constant, it will have the same effect as the suffix or suffix-adding macro, whereas with a variable of a different type it may perform a truncation or extension of the value (e.g., fill the higher bits with 0 when changing from 32 bits to 64 bits).
Whether to use UINT64_C(1) or (uint64_t)1 is a matter of taste. The macro makes it a bit more clear that you are dealing with a constant.
As mentioned in a comment, 1ul is a uint32_t, not a uint64_t, on Windows systems. I expect that the macro UINT64_C will append the platform-specific suffix corresponding to uint64_t, so it might append uLL in this case. See also https://stackoverflow.com/a/52490273/10622916.
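As a small sketch of that point (assuming an LLP64 toolchain such as 64-bit Windows, where unsigned long is 32 bits wide):
#include <stdint.h>

/* With a 32-bit unsigned long, a plain 1ul << 60 would shift by more than the
   type's width, which is undefined.  Both lines below stay in 64 bits. */
uint64_t a = UINT64_C(1) << 60;   /* the macro appends a suitable suffix, e.g. ull */
uint64_t b = (uint64_t)1 << 60;   /* the cast widens first, then shifts in 64 bits */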
UINT64_C(1) produces a single token via token pasting, whereas ((uint64_t)1) is a constant expression with the same value.
They can be used interchangeably in the sample code posted, but not in preprocessor directives such as #if expressions.
XXX_C macros should be used to define constants that can be used in #if expressions. They are only needed if the constant must have a specific type, otherwise just spelling the constant in decimal or hexadecimal without a suffix is sufficient.
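A sketch of that last point (MY_FLAG is a hypothetical name; a C99 preprocessor evaluates #if in (u)intmax_t):
#include <stdint.h>

#define MY_FLAG UINT64_C(0x100000000)   /* bit 32 set */

/* The macro form survives in a preprocessor conditional: */
#if MY_FLAG > 0xFFFFFFFF
/* ... wide-flag configuration ... */
#endif

/* A cast would not: inside #if, the identifier uint64_t is replaced by 0,
   so ((uint64_t)1 << 32) turns into ((0)1 << 32), a syntax error. */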

Maximum/minimum value of #define values [duplicate]

This question already has answers here: Type of #define variables
When using the #define directive in C, what is the maximum or minimum value the defined constant can have? For example, is
#define INT_MIN (pow(-2,31))
#define INT_MAX (pow(2,31))
an acceptable definition? I suppose a better way to ask is: what is the data type of the defined value?
#define performs token substitution. If you don't know what tokens are, you can think of this as text substitution on complete words, much like your editor's "search and replace" function could do. Therefore,
#define FOO 123456789123456789123456789123456789123456789
is perfectly valid so far — that just means that the preprocessor will replace every instance of FOO with that long number. It would also be perfectly legal (as far as preprocessing goes) to do
#define FOO this is some text that does not make sense
because the preprocessor doesn't know anything about C, and just replaces FOO with whatever it is defined as.
But this is not the answer you're probably looking for.
After the preprocessor has replaced the macro, the compiler will have to compile whatever was left in its place. And compilers will almost certainly be unable to compile either example I posted here and error out.
Integer constants can be as large as the largest integer type defined by your compiler, which is equivalent to uintmax_t (defined in <stdint.h>). For instance, if this type is 64 bits wide (very common case), the maximum valid integer constant is 18446744073709551615, i.e., 2 to the power of 64 minus 1.
This is independent of how this constant is written or constructed — whether it is done via a #define, written directly in the code, written in hexadecimal, it doesn't matter. The limit is the same, because it is given by the compiler, and the compiler runs after preprocessing is finished.
EDIT: as pointed out by @chux in comments, in recent versions of C (starting with C99), decimal constants will be signed by default unless they carry a suffix indicating otherwise (such as U/u, or a combined type/signedness suffix like ULL). In this case, the maximum valid unsuffixed constant would be whatever fits in an intmax_t value (typically half the max of uintmax_t rounded down); constants with unsigned suffixes can grow as large as a uintmax_t value can. (Note that C integer constants, signed or not, are never negative.)
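For example (a sketch, assuming a typical implementation where intmax_t and uintmax_t are 64 bits wide):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void)
{
    /* Fits in a 64-bit intmax_t, so no suffix is needed. */
    intmax_t  big    = 9223372036854775807;      /* 2^63 - 1 */
    /* Exceeds intmax_t but fits uintmax_t, so it needs an unsigned suffix. */
    uintmax_t bigger = 18446744073709551615u;    /* 2^64 - 1 */

    printf("%jd\n%ju\n", big, bigger);
    return 0;
}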
#define INT_MIN (pow(-2,31)) is not acceptable, as it forms a minimum of the wrong type.
pow() returns a double.
Consider this: INT_MIN % 2 leads to invalid code, as % cannot be done on a double.
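A sketch of that failure mode (MY_INT_MIN is a hypothetical name chosen so it does not collide with <limits.h>):
#include <math.h>

#define MY_INT_MIN (pow(-2, 31))   /* expands to an expression of type double */

int main(void)
{
    /* A conforming compiler must reject the commented-out line: the % operator
       requires integer operands, but MY_INT_MIN has type double. */
    /* int r = MY_INT_MIN % 2; */
    return 0;
}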
Your definition is ill-advised for a number of reasons:
These macro names are used in the standard library header limits.h where they are correctly defined for the toolchain's target platform.
Macros are not part of the C language proper; rather they cause replacement text to be inserted into the code for evaluation by the compiler; as such your definition will cause the function pow() to be called everywhere these macros are used - evaluated at run-time (repeatedly) rather than being a compile-time constant.
The maximum value of a 32 bit two's complement integer is not 2^31 but 2^31 - 1.
The pow() function returns a double not an integer - your macro expressions therefore have type double.
Your macros assume the integer size of the platform to be 32 bit, which need not be the case - the definitions are not portable. This is possibly true also of those in <limits.h>, but there the entire library is platform specific, and you'd use a different library/toolchain with each platform.
If you must (and you really shouldn't) define your own macros for this purpose, you should:
define them using distinct macro names,
without assumptions regarding the target platform integer width,
use a constant-expression,
use an expression having int type.
For example:
#define PLATFORM_INDEPENDENT_INT_MAX ((int)(~0u >> 1u))
#define PLATFORM_INDEPENDENT_INT_MIN ((int)~(~0u >> 1u))
Using these the following code:
#include <stdio.h>
#include <limits.h>
#define PLATFORM_INDEPENDENT_INT_MAX ((int)(~0u >> 1u))
#define PLATFORM_INDEPENDENT_INT_MIN ((int)~(~0u >> 1u))
int main()
{
printf( "Standard: %d\t%d\n", INT_MIN, INT_MAX);
printf( "Mine: %d\t%d\n", PLATFORM_INDEPENDENT_INT_MIN, PLATFORM_INDEPENDENT_INT_MAX);
return 0;
}
Outputs:
Standard: -2147483648 2147483647
Mine: -2147483648 2147483647

What is the point of the {U,}INTn_C macros in stdint.h?

When are these macros actually needed?
My system's (gcc/glibc/linux/x86_64) stdint.h uses (__-prefixed) variants of these to define:
# define INT64_MIN (-__INT64_C(9223372036854775807)-1)
# define INT64_MAX (__INT64_C(9223372036854775807))
# define UINT64_MAX (__UINT64_C(18446744073709551615))
# define INT_LEAST64_MIN (-__INT64_C(9223372036854775807)-1)
# define INT_LEAST64_MAX (__INT64_C(9223372036854775807))
# define UINT_LEAST64_MAX (__UINT64_C(18446744073709551615))
# define INT_FAST64_MIN (-__INT64_C(9223372036854775807)-1)
# define INT_FAST64_MAX (__INT64_C(9223372036854775807))
# define UINT_FAST64_MAX (__UINT64_C(18446744073709551615))
# define INTMAX_MIN (-__INT64_C(9223372036854775807)-1)
# define INTMAX_MAX (__INT64_C(9223372036854775807))
# define UINTMAX_MAX (__UINT64_C(18446744073709551615))
Yet for limits.h it seems to make do with:
# define LONG_MAX 9223372036854775807L
# define ULONG_MAX 18446744073709551615UL
Why can't stdint.h forget about the _C macros and simply do:
# define INT_LEAST64_MAX 9223372036854775807 //let it grow as needed
# define UINT_LEAST64_MAX 18446744073709551615U //just the U
What are the use cases for these macros?
The only one I could think of is where I want a sufficiently wide constant usable in cpp conditionals and at the same time I don't want it too wide:
//C guarantees longs are at least 32 bits wide
#define ONE_GIG_BUT_MAYBE_TOO_WIDE (1L<<30)
#define ONE_GIG (INT32_C(1)<<30) //possibly narrower than the define above
What is the point of the {U,}INTn_C macros in <stdint.h>?
They ensure a minimal type width and signedness for a constant.
They "expand to an integer constant expression corresponding to the type (u)int_leastN_t."
123 << 50 // likely int overflow (UB)
INT32_C(123) << 50 // likely int overflow (UB)
INT64_C(123) << 50 // well defined.
INT32_C(123)*2000000000 // likely int overflow (UB)
UINT32_C(123)*2000000000 // well defined - even though it may mathematically overflow
Useful when defining computed constants.
// well defined, but the wrong product when unsigned is 32-bit
#define TBYTE (1024u*1024*1024*1024)
// well defined, and specified to be 1099511627776u
#define TBYTE (UINT64_C(1024)*1024*1024*1024)
It also affects code via _Generic. The three expressions below could steer code to unsigned long, unsigned int, and unsigned long long respectively (see the sketch after them).
(unsigned long) 123
UINT32_C(123)
UINT64_C(123)
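A sketch of how that could look (TYPE_NAME is a hypothetical helper; which branch each constant selects depends on how uint_least32_t and uint_least64_t map onto the standard types on the platform):
#include <stdint.h>
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),            \
    unsigned int:       "unsigned int",       \
    unsigned long:      "unsigned long",      \
    unsigned long long: "unsigned long long", \
    default:            "something else")

int main(void)
{
    puts(TYPE_NAME((unsigned long)123));  /* unsigned long */
    puts(TYPE_NAME(UINT32_C(123)));       /* typically unsigned int */
    puts(TYPE_NAME(UINT64_C(123)));       /* unsigned long (LP64) or unsigned long long (e.g. LLP64) */
    return 0;
}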
The point of e.g. __UINT64_C(x) seems to be to attach the correct kind of suffix to x.
In this way, the implementer of the C standard library header files are able to separate the numerical constants (which are the same on all platforms) from the suffixes (which depend on the integer size).
For example, when building a 64-bit executable, __UINT64_C(x) would expand to x ## UL, while when building a 32-bit executable it would expand to x ## ULL.
Edit: as @PSkocik points out, for signed integers this macro is not necessary. My guess is that it is still present because (1) the suffix is necessary for unsigned values and (2) the authors wanted to keep the code consistent for signed and unsigned constants.
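For illustration, the internal definitions might look roughly like this (a glibc-style sketch; the real ones live in the implementation's private headers):
/* Pick the suffix that yields a 64-bit type for this build. */
#if __WORDSIZE == 64
# define __INT64_C(c)   c ## L
# define __UINT64_C(c)  c ## UL
#else
# define __INT64_C(c)   c ## LL
# define __UINT64_C(c)  c ## ULL
#endif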

When should I use UINT32_C(), INT32_C(),... macros in C?

I switched to fixed-length integer types in my projects mainly because they help me think about integer sizes more clearly when using them. Including them via #include <inttypes.h> also includes a bunch of other macros like the printing macros PRIu32, PRIu64,...
To assign a constant value to a fixed length variable I can use macros like UINT32_C() and INT32_C(). I started using them whenever I assigned a constant value.
This leads to code similar to this:
uint64_t i;
for (i = UINT64_C(0); i < UINT64_C(10); i++) { ... }
Now I saw several examples which did not care about that. One is the stdbool.h include file:
#define bool _Bool
#define false 0
#define true 1
bool has a size of 1 byte on my machine, so it does not look like an int. But 0 and 1 are integer constants, which the compiler should convert to the right type automatically. If I used that approach in my example, the code would be much easier to read:
uint64_t i;
for (i = 0; i < 10; i++) { ... }
So when should I use the fixed-length constant macros like UINT32_C(), and when should I leave that work to the compiler (I'm using GCC)? What if I were writing code under MISRA C?
As a rule of thumb, you should use them when the type of the literal matters. There are two things to consider: the size and the signedness.
Regarding size:
The C standard guarantees that an int can hold values up to 32767. Since you can't get an integer literal with a smaller type than int, values of 32767 or less do not need the macros. If you need larger values, then the type of the literal starts to matter and it is a good idea to use the macros.
Regarding signedness:
Integer literals with no suffix are usually of a signed type. This is potentially dangerous, as it can cause all manner of subtle bugs during implicit type promotion. For example (my_uint8_t + 1) << 31 would cause an undefined behavior bug on a 32 bit system, while (my_uint8_t + 1u) << 31 would not.
This is why MISRA has a rule stating that all integer literals should have a u/U suffix if the intention is to use unsigned types. So in my example above you could use my_uint8_t + UINT32_C(1), but you could just as well use 1u, which is perhaps the most readable. Either should be fine for MISRA.
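A sketch of that promotion hazard (assuming a platform where int is 32 bits wide):
#include <stdint.h>

uint32_t top_bit_mask(uint8_t b)
{
    /* (b + 1) promotes to signed int; shifting a set bit into bit 31 of a
       32-bit int is undefined behaviour, so the commented-out line is unsafe. */
    /* return (b + 1) << 31; */

    /* Making one operand unsigned keeps the whole expression unsigned. */
    return (b + 1u) << 31;            /* (b + UINT32_C(1)) << 31 works too */
}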
As for why stdbool.h defines true/false to be 1/0, it is because the standard explicitly says so. Boolean conditions in C still use int type, and not bool type like in C++, for backwards compatibility reasons.
It is however considered good style to treat boolean conditions as if C had a true boolean type. MISRA-C:2012 has a whole set of rules regarding this concept, called essentially boolean type. This can give better type safety during static analysis and also prevent various bugs.
It's for using smallish integer literals where the context won't result in the compiler casting it to the correct size.
I've worked on an embedded platform where int is 16 bits and long is 32 bits. If you were trying to write portable code to work on platforms with either 16-bit or 32-bit int types, and wanted to pass a 32-bit "unsigned integer literal" to a variadic function, you'd need the macro:
#define BAUDRATE UINT32_C(38400)
printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
On the 16-bit platform, the macro expands to 38400UL and on the 32-bit platform to just 38400U. Either will match the PRIu32 format macro, which is "lu" or "u" respectively.
I think that most compilers would generate identical code for (uint32_t) X as for UINT32_C(X) when X is an integer literal, but that might not have been the case with early compilers.
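A self-contained version of the BAUDRATE example above might look like this (a sketch):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

#define BAUDRATE UINT32_C(38400)

int main(void)
{
    /* UINT32_C picks whichever suffix makes BAUDRATE a uint_least32_t,
       so the argument lines up with what PRIu32 expands to ("lu" or "u"). */
    printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
    return 0;
}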

What is the purpose of "Macros for minimum-width integer constants"

In the C99 standard, Section 7.18.4.1 "Macros for minimum-width integer constants" defines macros of the form [U]INT[N]_C(x) for converting constant integers to the corresponding least-width types, where N = 8, 16, 32, 64. Why are these macros defined, since I can use the L, UL, LL or ULL suffixes instead? For example, when I want an unsigned constant integer of at least 32 bits, I can simply write 42UL instead of UINT32_C(42). Since the long data type is at least 32 bits wide, it is also portable.
So, what is the purpose of these macros?
You'd need them in places where you want to make sure that the constants don't become too wide:
#define myConstant UINT32_C(42)
and later
printf( "%" PRId32 " is %s\n", (hasproperty ? toto : myConstant), "rich");
Here, if the constant had a UL suffix, the expression might have type unsigned long, and the variadic call could put a 64-bit value on the stack that would be misinterpreted by printf.
They use the smallest integer type with a width of at least N, so UINT32_C(42) is only equivalent to 42UL on systems where int is smaller than 32 bits. On systems where int is 32 bits or greater, UINT32_C(42) is equivalent to 42U. You could even imagine a system where a short is 32 bits wide, in which case UINT32_C(42) would be equivalent to (unsigned short)42.
EDIT: @obareey It seems that most, if not all, implementations of the standard library do not comply with this part of the standard, perhaps because it is impossible. [glibc bug 2841] [glibc commit b7398be5]
The macros essentially add an integer constant suffix such as L, LL, U, UL, or ULL to their argument, which basically makes them almost equivalent to the corresponding cast, except that the suffix won't ever narrow (downcast) the value.
E.g.,
UINT32_C(42000000000) (42 billion) on an LP64 architecture will turn into 42000000000U, which will have type unsigned long under the usual rules for suffixed integer constants. The corresponding cast, on the other hand ((uint32_t)42000000000), would truncate it down to uint32_t (unsigned int on LP64).
I can't think of a good use case, but I imagine it could be usable in some generic bit-twiddling macros that need at least X bits to work, but don't want to remove any extra bits if the user passes in something bigger.
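A small sketch of the truncation the cast performs (assuming a 32-bit unsigned int and a 64-bit unsigned long long):
#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

int main(void)
{
    /* The conversion to uint32_t is done modulo 2^32:
       42000000000 - 9 * 4294967296 = 3345294336. */
    uint32_t truncated = (uint32_t)42000000000ull;
    printf("%" PRIu32 "\n", truncated);   /* prints 3345294336 */
    return 0;
}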

Resources