Maximum/minimum value of #define values [duplicate] - c

This question already has answers here:
Type of #define variables
(7 answers)
Closed 3 years ago.
When using the #define command in C, what is the maximum or minimum amount the variable can be? For example, is
#define INT_MIN (pow(-2,31))
#define INT_MAX (pow(2,31))
an acceptable definition? I suppose a better way to ask is what is the datatype of the defined value?

#define performs token substitution. If you don't know what tokens are, you can think of this as text substitution on complete words, much like your editor's "search and replace" function could do. Therefore,
#define FOO 123456789123456789123456789123456789123456789
is perfectly valid so far — that just means that the preprocessor will replace every instance of FOO with that long number. It would also be perfectly legal (as far as preprocessing goes) to do
#define FOO this is some text that does not make sense
because the preprocessor doesn't know anything about C, and just replaces FOO with whatever it is defined as.
But this is not the answer you're probably looking for.
After the preprocessor has replaced the macro, the compiler will have to compile whatever was left in its place. And compilers will almost certainly be unable to compile either example I posted here and error out.
Integer constants can be as large as the largest integer type defined by your compiler, which is equivalent to uintmax_t (defined in <stdint.h>). For instance, if this type is 64 bits wide (very common case), the maximum valid integer constant is 18446744073709551615, i.e., 2 to the power of 64 minus 1.
This is independent of how this constant is written or constructed — whether it is done via a #define, written directly in the code, written in hexadecimal, it doesn't matter. The limit is the same, because it is given by the compiler, and the compiler runs after preprocessing is finished.
EDIT: as pointed out by #chux in comments, in recent versions of C (starting with C99), decimal constants will be signed by default unless they carry a suffix indicating otherwise (such as U/u, or a combined type/signedness suffix like ULL). In this case, the maximum valid unsuffixed constant would be whatever fits in an intmax_t value (typically half the max of uintmax_t rounded down); constants with unsigned suffixes can grow as large as an uintmax_t value can. (Note that C integer constants, signed or not, are never negative.)

#define INT_MIN (pow(-2,31)) is not acceptable, as it forms a maximum of the wrong type.
pow() returns a double.
Consider this: INT_MIN % 2 leads to invalid code, as % cannot be done on a double.

Your definition is ill-advised for a number of reasons:
These macro names are used in the standard library header limits.h where they are correctly defined for the toolchain's target platform.
Macros are not part of the C language proper; rather they cause replacement text to be inserted into the code for evaluation by the compiler; as such your definition will cause the functionpow() to be called everywhere these macros are used - evaluated at run-time (repeatedly) rather then being a compile-time constant.
The maximum value of a 32 bit two's complement integer is not 231 but 231 - 1.
The pow() function returns a double not an integer - your macro expressions therefore have type double.
Your macros assume the integer size of the platform to be 32 bit, which need not be the case - the definitions are not portable. This is possibly true also of those in , but there the entire library is platform specific, and you'd use a different library/toolchain with each platform.
If you must (and you really shouldn't) define your own macros for this purpose, you should:
define them using distinct macro names,
without assumptions regarding the target platform integer width,
use a constant-expression,
use an expression having int type.
For example:
#define PLATFORM_INDEPENDENT_INT_MAX ((int)(~0u >> 1u))
#define PLATFORM_INDEPENDENT_INT_MIN ((int)~(~0u >> 1u))
Using these the following code:
#include <stdio.h>
#include <limits.h>
#define PLATFORM_INDEPENDENT_INT_MAX ((int)(~0u >> 1u))
#define PLATFORM_INDEPENDENT_INT_MIN ((int)~(~0u >> 1u))
int main()
{
printf( "Standard: %d\t%d\n", INT_MIN, INT_MAX);
printf( "Mine: %d\t%d\n", PLATFORM_INDEPENDENT_INT_MIN, PLATFORM_INDEPENDENT_INT_MAX);
return 0;
}
Outputs:
Standard: -2147483648 2147483647
Mine: -2147483648 2147483647

Related

Determine data model by C preprocessor

I want to write a .h file conforming to C89 that would be understood by most C preprocessors like gcc, cl (Visual Studio) etc. and that would determine the data model used, i.e. how many bits the (unsigned) short, (unsigned) int and (unsigned) long types occupy. Where can I find the necessary macros? For instance, are there macros I can evaluate in order to find out whether the data model is e.g. ILP32, LP64, LLP64 or something else? It is fine for me to use compiler-specific macros, but I do not want to use architecture-specific or OS-specific macros. If possible, please also provide the necessary macros to check which compiler is used. Thank you!
ADDED 2: The goal is to allow for type definitions depending on the data model. For instance, if long is at least 48-bit wide, a 48-bit key type could be defined as long, but if not, I would need a struct for that. On the other hand, I do not want to rely on anything not guaranteed by C89, like "long is either 32-bit or 64-bit, so if ULONG_MAX != 0xFFFFFFFFlu, then long is wider than 48-bit", which does not have to be true on all C89-conforming compilers.
ADDED 1: Here, the predefined GCC macros are described. Hence I can do the following:
#if defined(__GNUC__)
#define SHRT_BIT __SHRT_WIDTH__
#define INT_BIT __INT_WIDTH__
#define LONG_BIT __LONG_WIDTH__
#elif ???
# ???
#endif
printf(
"short is %u-bit\n"
"int is %u-bit\n"
"long is %u-bit\n",
SHRT_BIT, INT_BIT, LONG_BIT
);
Are there similar macros for other widely used compilers like cl (Visual Studio), which I could add at the location of the ??? in the code?
You can #include <limits.h> and test the values of SHRT_MAX, USHRT_MAX, INT_MAX, UINT_MAX, LONG_MAX, and ULONG_MAX.
For the signed types, if the maximum value is at least 1,073,741,824 (230), the type has at least 32 bits (31 for the value and 1 for the sign), and similarly for other powers of two. For the unsigned types, one fewer bit is indicated (no sign bit), so comparing to 2,147,483,648 (231) would indicate whether the type has at least 32 bits.
These of course give you only the width of a type, the number of bits used for the value and sign (if present). Its actual size may be larger due to padding bits.

What is the difference between using INTXX_C macros and performing type cast to literals?

For example this code is broken (I've just fixed it in actual code..).
uint64_t a = 1 << 60
It can be fixed as,
uint64_t a = (uint64_t)1 << 60
but then this passed my brain.
uint64_t a = UINT64_C(1) << 60
I know that UINT64_C(1) is a macro that expands usually as 1ul in 64-bit systems, but then what makes it different than just doing a type cast?
There is no obvious difference or advantage, these macros are kind of redundant. There are some minor, subtle differences between the cast and the macro:
(uintn_t)1 might be cumbersome to use for preprocessor purposes, whereas UINTN_C(1) expands into a single pp token.
The resulting type of the UINTN_C is actually uint_leastn_t and not uintn_t. So it is not necessarily the type you expected.
Static analysers for coding standards like MISRA-C might moan if you type 1 rather than 1u in your code, since shifting signed integers isn't a brilliant idea regardless of their size.
(uint64_t)1u is MISRA compliant, UINT64_c(1) might not be, or at least the analyser won't be able to tell since it can't expand pp tokens like a compiler. And UINT64_C(1u) will likely not work, since this macro implementation probably looks something like this:
#define UINT64_C(n) ((uint_least64_t) n ## ull)
// BAD: 1u##ull = 1uull
In general, I would recommend to use an explicit cast. Or better yet wrap all of this inside a named constant:
#define MY_BIT ( (uint64_t)1u << 60 )
(uint64_t)1 is formally an int value 1 casted to uint64_t, whereas 1ul is a constant 1 of type unsigned long which is probably the same as uint64_t on a 64-bit system. As you are dealing with constants, all calculations will be done by the compiler and the result is the same.
The macro is a portable way to specify the correct suffix for a constant (literal) of type uint64_t. The suffix appended by the macro (ul, system specific) can be used for literal constants only.
The cast (uint64_t) can be used for both constant and variable values. With a constant, it will have the same effect as the suffix or suffix-adding macro, whereas with a variable of a different type it may perform a truncation or extension of the value (e.g., fill the higher bits with 0 when changing from 32 bits to 64 bits).
Whether to use UINT64_C(1) or (uint64_t)1 is a matter of taste. The macro makes it a bit more clear that you are dealing with a constant.
As mentioned in a comment, 1ul is a uint32_t, not a uint64_t on windows system. I expect that the macro UINT64_C will append the platform-specific suffix corresponding to uint64_t, so it might append uLL in this case. See also https://stackoverflow.com/a/52490273/10622916.
UINT64_C(1) produces a single token via token pasting, whereas ((uint64_t)1) is a constant expression with the same value.
They can be used interchangeably in the sample code posted, but not in preprocessor directives such as #if expressions.
XXX_C macros should be used to define constants that can be used in #if expressions. They are only needed if the constant must have a specific type, otherwise just spelling the constant in decimal or hexadecimal without a suffix is sufficient.

When should I use UINT32_C(), INT32_C(),... macros in C?

I switched to fixed-length integer types in my projects mainly because they help me think about integer sizes more clearly when using them. Including them via #include <inttypes.h> also includes a bunch of other macros like the printing macros PRIu32, PRIu64,...
To assign a constant value to a fixed length variable I can use macros like UINT32_C() and INT32_C(). I started using them whenever I assigned a constant value.
This leads to code similar to this:
uint64_t i;
for (i = UINT64_C(0); i < UINT64_C(10); i++) { ... }
Now I saw several examples which did not care about that. One is the stdbool.h include file:
#define bool _Bool
#define false 0
#define true 1
bool has a size of 1 byte on my machine, so it does not look like an int. But 0 and 1 should be integers which should be turned automatically into the right type by the compiler. If I would use that in my example the code would be much easier to read:
uint64_t i;
for (i = 0; i < 10; i++) { ... }
So when should I use the fixed length constant macros like UINT32_C() and when should I leave that work to the compiler(I'm using GCC)? What if I would write code in MISRA C?
As a rule of thumb, you should use them when the type of the literal matters. There are two things to consider: the size and the signedness.
Regarding size:
An int type is guaranteed by the C standard values up to 32767. Since you can't get an integer literal with a smaller type than int, all values smaller than 32767 should not need to use the macros. If you need larger values, then the type of the literal starts to matter and it is a good idea to use those macros.
Regarding signedness:
Integer literals with no suffix are usually of a signed type. This is potentially dangerous, as it can cause all manner of subtle bugs during implicit type promotion. For example (my_uint8_t + 1) << 31 would cause an undefined behavior bug on a 32 bit system, while (my_uint8_t + 1u) << 31 would not.
This is why MISRA has a rule stating that all integer literals should have an u/U suffix if the intention is to use unsigned types. So in my example above you could use my_uint8_t + UINT32_C(1) but you can as well use 1u, which is perhaps the most readable. Either should be fine for MISRA.
As for why stdbool.h defines true/false to be 1/0, it is because the standard explicitly says so. Boolean conditions in C still use int type, and not bool type like in C++, for backwards compatibility reasons.
It is however considered good style to treat boolean conditions as if C had a true boolean type. MISRA-C:2012 has a whole set of rules regarding this concept, called essentially boolean type. This can give better type safety during static analysis and also prevent various bugs.
It's for using smallish integer literals where the context won't result in the compiler casting it to the correct size.
I've worked on an embedded platform where int is 16 bits and long is 32 bits. If you were trying to write portable code to work on platforms with either 16-bit or 32-bit int types, and wanted to pass a 32-bit "unsigned integer literal" to a variadic function, you'd need the cast:
#define BAUDRATE UINT32_C(38400)
printf("Set baudrate to %" PRIu32 "\n", BAUDRATE);
On the 16-bit platform, the cast creates 38400UL and on the 32-bit platform just 38400U. Those will match the PRIu32 macro of either "lu" or "u".
I think that most compilers would generate identical code for (uint32_t) X as for UINT32_C(X) when X is an integer literal, but that might not have been the case with early compilers.

will ~ operator change the data type?

When I read someone's code I find that he bothered to write an explicite type cast.
#define ULONG_MAX ((unsigned long int) ~(unsigned long int) 0)
When I write code
1 #include<stdio.h>
2 int main(void)
3 {
4 unsigned long int max;
5 max = ~(unsigned long int)0;
6 printf("%lx",max);
7 return 0;
8 }
it works as well. Is it just a meaningless coding style?
The code you read is very bad, for several reasons.
First of all user code should never define ULONG_MAX. This is a reserved identifier and must be provided by the compiler implementation.
That definition is not suitable for use in a preprocessor #if. The _MAX macros for the basic integer types must be usable there.
(unsigned long)0 is just crap. Everybody should just use 0UL, unless you know that you have a compiler that is not compliant with all the recent C standards with that respect. (I don't know of any.)
Even ~0UL should not be used for that value, since unsigned long may (theoretically) have padding bits. -1UL is more appropriate, because it doesn't deal with the bit pattern of the value. It uses the guaranteed arithmetic properties of unsigned integer types. -1 will always be the maximum value of an unsigned type. So ~ may only be used in a context where you are absolutely certain that unsigned long has no padding bits. But as such using it makes no sense. -1 serves better.
"recasting" an expression that is known to be unsigned long is just superfluous, as you observed. I can't imagine any compiler that bugs on that.
Recasting of expression may make sense when they are used in the preprocessor, but only under very restricted circumstances, and they are interpreted differently, there.
#if ((uintmax_t)-1UL) == SOMETHING
..
#endif
Here the value on the left evalues to UINTMAX_MAX in the preprocessor and in later compiler phases. So
#define UINTMAX_MAX ((uintmax_t)-1UL)
would be an appropriate definition for a compiler implementation.
To see the value for the preprocessor, observe that there (uintmax_t) is not a cast but an unknown identifier token inside () and that it evaluates to 0. The minus sign is then interpreted as binary minus and so we have 0-1UL which is unsigned and thus the max value of the type. But that trick only works if the cast contains a single identifier token, not if it has three as in your example, and if the integer constant has a - or + sign.
They are trying to ensure that the type of the value 0 is unsigned long. When you assign zero to a variable, it gets cast to the appropriate type.
In this case, if 0 doesn't happen to be an unsigned long then the ~ operator will be applied to whatever other type it happens to be and the result of that will be cast.
This would be a problem if the compiler decided that 0 is a short or char.
However, the type after the ~ operator should remain the same. So they are being overly cautious with the outer cast, but perhaps the inner cast is justified.
They could of course have specified the correct zero type to begin with by writing ~0UL.

"Type" of symbolic constants?

When is it appropriate to include a type conversion in a symbolic constant/macro, like this:
#define MIN_BUF_SIZE ((size_t) 256)
Is it a good way to make it behave more like a real variable, with type checking?
When is it appropriate to use the L or U (or LL) suffixes:
#define NBULLETS 8U
#define SEEK_TO 150L
You need to do it any time the default type isn't appropriate. That's it.
Typing a constant can be important at places where the automatic conversions are not applied, in particular functions with variable argument list
printf("my size is %zu\n", MIN_BUF_SIZE);
could easily crash when the width of int and size_t are different and you wouldn't do the cast.
But your macro leaves room for improvement. I'd do that as
#define MIN_BUF_SIZE ((size_t)+256U)
(see the little + sign, there?)
When given like that the macro still can be used in preprocessor expressions (with #if). This is because in the preprocessor the (size_t) evaluates to 0 and thus the result is an unsigned 256 there, too.
#define is just token pasting preprocessor.
Whatever you write in #define it will replace with the replacement text before compilation.
So either way is correct
#define A a
int main
{
int A; // A will be replaced by a
}
There are many variations in #define like variadic macro or multiline macro
But the main aim of #define is the only one explained above.
Explicitly indicating the types in a constant was more relevant in Kernighan and Richie C (before ANSI/Standard C and its function prototypes came along).
Function prototypes like double fabs(double value); now allow the compiler to generate proper type conversions when needed.
You still want to explicitly indicate the constant sizes in some cases. The examples that come to my mind right now are bit masks:
#define VALUE_1 ((short) -1) might be 16 bits long while #define VALUE_2 ((char) -1) might be 8. Therefore, given a long x, x & VALUE_1 and x & VALUE_2would give very different results.
This would also be the case for the L or LL suffixes: the constants would use different numbers of bits.

Resources