Safe low 32 bits masking of uint64_t - c

Assume the following code:
uint64_t g_global_var;
....
....
void foo(void)
{
uint64_t local_32bit_low = g_global_var & 0xFFFFFFFF;
....
}
With the current toolchain, this code works as expected, local_32bit_low indeed contains the low 32 bits of g_global_var.
I wonder whether the C standard guarantees that this code will always work as expected.
My concern is that the compiler may treat 0xFFFFFFFF as the integer value -1, which would become 0xFFFFFFFFFFFFFFFF when converted to uint64_t.
P.S.
I know that, to be on the safe side, it is better to use 0xFFFFFFFFULL in this case. The point is that I saw it in legacy code and I wonder whether it is worth fixing.

There is no problem. The integer constant 0xFFFFFFFF has the type that is able to store the value as is.
According to the C Standard (6.4.4.1 Integer constants)
5 The type of an integer constant is the first of the corresponding
list in which its value can be represented
So this value is stored as a positive value.
If the type unsigned int is a 32-bit integer type then the constant will have the type unsigned int.
Otherwise it will have one of the types that can store the value.
long int
unsigned long int
long long int
unsigned long long int
Due to the usual arithmetic conversions in the expression
g_global_var & 0xFFFFFFFF;
the constant is converted to uint64_t with its value preserved, i.e.
0x00000000FFFFFFFF
Note that there are no negative integer constants in C. For example, an expression like
-10
consists of two sub-expressions: the primary expression 10, and the sub-expression formed by applying the unary operator - to it, which is -10 and coincides with the full expression.

0xffffffff is not -1, ever. It may convert to -1 if you cast or coerce (e.g. by assignment) it to a signed 32-bit type, but integer literals in C always have their mathematical value unless they overflow.
For decimal literals, the type is the narrowest signed type that can represent the value. For hex literals, unsigned types are used before going up to the next wider signed type. So, in the common case where int is 32-bit, 0xffffffff would have type unsigned int. If you wrote it as decimal, it would have type long (if long is 64-bit) or long long (if long is only 32-bit).

The type of an unsuffixed hexadecimal or octal constant is the first of the following list in which its value can be represented:
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
(For unsuffixed decimal constants, remove the unsigned types from the above list.)
The hexadecimal constant 0xFFFFFFFF can definitely be represented by unsigned long int, so its type will be the first of int, unsigned int, long int or unsigned long int that can represent its value.
Note that although 0xFFFFFFFF > 0 always evaluates to 1 (true), it is possible for 0xFFFFFFFF > -1 to evaluate to either 0 (false) or 1 (true) on different implementations. So you need to be careful when comparing integer constants with each other or with other objects of integer type.

Others have answered the question; just a recommendation: next time (if you have a C11 compiler) you can check the type of the expression yourself using _Generic
#include <stdio.h>
#include <stdint.h>
#define print_type(x) _Generic((x), \
    int64_t: puts("int64_t"), \
    uint64_t: puts("uint64_t"), \
    default: puts("unknown") \
)
uint64_t g_global_var;
int main(void)
{
    print_type(g_global_var & 0xFFFFFFFF);
    return 0;
}
The output is
uint64_t

Related

Why do we use suffix in c variables?

What is the difference between these two declarations?
long int n=12;
long int n=12l;
int n=12l;
and
How does an unsigned variable store signed values?
unsigned int number=-13;
What is the difference between these declarations?
long int n=12;
An int value 12 is assigned to the long variable n.
long int n=12l;
A long value 12l is assigned to the long variable n.
int n=12l;
A long value 12l is assigned to the int variable n.
How can an unsigned variable allow signed values?
unsigned int number=-13;
Why do you think so?
This question is a duplicate.
Why do we use suffix in c variables?
These suffixes are not in the variables. They are in constants. We use suffixes, sometimes, to control the types of the constants.
long int n = 12;
In this, 12 is a constant of type int. Small decimal numerals with no suffix are int by default. A numeral for an integer too big to represent in int will be long int or long long int, or possibly a compiler-specific wider type.
Since n is declared to be long int, the int 12 will be converted to a long int for the initialization. This conversion will not change the value.
long int n = 12l;
Here, 12l is a constant of type long int. Since it is used to initialize a long int, no conversion is necessary.
int n = 12l;
Again, 12l is a long int. However, since it is used to initialize an int, it will be converted to int. Since int can represent the value 12, this conversion will not change the value. If the constant were too large to represent in an int, then the conversion would change the value, in some implementation-defined way.
As to why we use suffixes sometimes, consider 1 << 31. The 1 is an int, and we shift it left 31 bits. In C implementations common today, an int has 32 bits, and it can represent numbers from −2^31 to +2^31−1. However, 1 << 31 would produce 2^31, which overflows this int range. The C standard does not define the behavior when that overflow occurs.
To avoid this, we might write 1u << 31. Then 1u is an unsigned int, which can represent numbers from 0 to +2^32−1. Then 2^31 fits in this, and there is no overflow. Alternatively, if we know long int is 64 bits in the C implementation we are using, we might use 1l << 31, depending on our needs. 1 << 31 would overflow, but 1l << 31 would not.
The u suffix makes a decimal constant unsigned int, unsigned long int, or unsigned long long int, as necessary to hold its value (or possibly a wider type supported by the compiler).
The l suffix makes a decimal constant at least long int (skipping int), but the constant will still be long long int if its value is too large for long int.
If an integer constant is written using octal (starts with 0) or hexadecimal (starts with 0x), then both signed and unsigned types are considered until one is found that can represent its value: int, unsigned int, long int, unsigned long int, long long int, unsigned long long int, and then compiler-specific types.
unsigned int number = -13;
Here 13 is an int constant, and the - is the negation operator, so the result is an int with value −13. (Note there are no negative integer constants in C; you make a negative integer by negating an integer constant.)
Since it is used to initialize an unsigned int, it is converted to unsigned int. When C converts an integer to an unsigned integer type of width M (meaning the type uses M bits to represent values), it is “wrapped” modulo 2^M. This means 2^M is added to or subtracted from the value until the result is representable in the destination type. For −13 and a 32-bit unsigned int, wrapping −13 modulo 2^32 produces −13 + 4,294,967,296 = 4,294,967,283.
Wrapping modulo 2^M produces the same result as if you wrote the starting number in binary (using two’s complement with more than M bits, if the number is negative) and removed all but the last M bits. For example, −13 in 64-bit two’s complement is 11111111 11111111 11111111 11111111 11111111 11111111 11111111 11110011 in binary = FFFFFFFFFFFFFFF3 in hexadecimal. The last 32 bits are 11111111 11111111 11111111 11110011 in binary = FFFFFFF3 in hexadecimal. In plain binary (not two’s complement), that is 4,294,967,283.

Why is 0 < -0x80000000?

I have below a simple program:
#include <stdio.h>
#define INT32_MIN (-0x80000000)
int main(void)
{
    long long bal = 0;
    if(bal < INT32_MIN )
    {
        printf("Failed!!!");
    }
    else
    {
        printf("Success!!!");
    }
    return 0;
}
The condition if(bal < INT32_MIN ) is always true. How is it possible?
It works fine if I change the macro to:
#define INT32_MIN (-2147483648L)
Can anyone point out the issue?
This is quite subtle.
Every integer literal in your program has a type. Which type it has is regulated by a table in 6.4.4.1:
Suffix   Decimal Constant   Octal or Hexadecimal Constant
none     int                int
         long int           unsigned int
         long long int      long int
                            unsigned long int
                            long long int
                            unsigned long long int
If a literal number can't fit inside the default int type, it will attempt the next larger type as indicated in the above table. So for regular decimal integer literals it goes like:
Try int
If it can't fit, try long
If it can't fit, try long long.
Hex literals behave differently though! If the literal can't fit inside a signed type like int, it will first try unsigned int before moving on to trying larger types. See the difference in the above table.
So on a 32 bit system, your literal 0x80000000 is of type unsigned int.
This means that you can apply the unary - operator on the literal without invoking undefined behavior, as you otherwise would when overflowing a signed integer. Instead, you will get the value 0x80000000, a positive value.
bal < INT32_MIN invokes the usual arithmetic conversions and the result of the expression -0x80000000 is converted from unsigned int to long long. The value 0x80000000 is preserved and 0 is less than 0x80000000, hence the result.
When you replace the literal with 2147483648L you use decimal notation and therefore the compiler doesn't pick unsigned int, but rather tries to fit it inside a long. Also the L suffix says that you want a long if possible. The L suffix actually has similar rules if you continue to read the mentioned table in 6.4.4.1: if the number doesn't fit inside the requested long, which it doesn't in the 32 bit case, the compiler will give you a long long where it will fit just fine.
0x80000000 is an unsigned literal with value 2147483648.
Applying the unary minus on this still gives you an unsigned type with a non-zero value. (In fact, for a non-zero value x, the value you end up with is UINT_MAX - x + 1.)
This integer literal 0x80000000 has type unsigned int.
According to the C Standard (6.4.4.1 Integer constants)
5 The type of an integer constant is the first of the corresponding
list in which its value can be represented.
And this integer constant can be represented by the type of unsigned int.
So this expression
-0x80000000 has the same unsigned int type. Moreover, it has the same value 0x80000000, because negating an unsigned value wraps around; in two's complement terms it is calculated the following way:
-0x80000000 = ~0x80000000 + 1 => 0x7FFFFFFF + 1 => 0x80000000
A similar effect shows up with signed values if you write, for example,
int x = INT_MIN;
x = abs( x );
On typical two's complement implementations the result is again INT_MIN (formally, the behavior of abs is undefined when the result cannot be represented).
Thus, in the condition
bal < INT32_MIN
0 is compared with the unsigned value 0x80000000, converted to type long long int according to the rules of the usual arithmetic conversions.
It is evident that 0 is less than 0x80000000.
The numeric constant 0x80000000 is of type unsigned int. If we take -0x80000000 and do two's complement math on it, we get this:
~0x80000000 = 0x7FFFFFFF
0x7FFFFFFF + 1 = 0x80000000
So -0x80000000 == 0x80000000. And comparing (0 < 0x80000000) (since 0x80000000 is unsigned) is true.
A point of confusion occurs in thinking the - is part of the numeric constant.
In the code below, 0x80000000 is the numeric constant. Its type is determined only by that. The - is applied afterward and does not change the type.
#define INT32_MIN (-0x80000000)
long long bal = 0;
if (bal < INT32_MIN )
Raw unadorned numeric constants are positive.
If it is decimal, then the type assigned is first type that will hold it: int, long, long long.
If the constant is octal or hexadecimal, it gets the first type that holds it: int, unsigned, long, unsigned long, long long, unsigned long long.
0x80000000, on OP's system gets the type of unsigned or unsigned long. Either way, it is some unsigned type.
-0x80000000 is also some non-zero value and being some unsigned type, it is greater than 0. When code compares that to a long long, the values are not changed on the 2 sides of the compare, so 0 < INT32_MIN is true.
An alternate definition avoids this curious behavior
#define INT32_MIN (-2147483647 - 1)
Let us walk in fantasy land for a while where int and unsigned are 48-bit.
Then 0x80000000 fits in int and so has type int. -0x80000000 is then a negative number and the result of the printout is different.
[Back to the real world]
Since 0x80000000 fits in some unsigned type before a signed type as it is just larger than some_signed_MAX yet within some_unsigned_MAX, it is some unsigned type.
In C, whether an integer literal is signed or unsigned depends on its value and on whether it is written in decimal or in octal/hexadecimal (this is the rule for the type of integer constants, not integer promotion). On a machine with 32-bit int, the literal 0x80000000 is unsigned. Negating it wraps around, so -0x80000000 is again 0x80000000. Therefore, the comparison bal < INT32_MIN is between a signed and an unsigned operand, and before the comparison the unsigned int is converted to long long, as the following rule requires.
C11: 6.3.1.8/1:
[...] Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Therefore, bal < INT32_MIN is always true.

Type of constant in "unsigned ux = 2147483648;"

If I write this declaration:
unsigned ux = 2147483648;
(2^31), will the C compiler treat 2147483648 as an unsigned or signed value?
I've heard that constant values are always treated as signed, but I don't think that's always right.
The type of an unsuffixed decimal constant such as 2147483648 depends on the value of the constant, the ranges of the predefined types, and, in some cases, on the version of the C standard you're using.
In C89/C90, the type is the first of:
int
long int
unsigned long int
in which it fits.
In C99 and later, it's the first of:
int
long int
long long int
in which it fits.
You didn't tell us what implementation you're using, but if long int is 32 bits on your system, then 2147483648 will be of type unsigned long int if you have a pre-C99 compiler, or (signed) long long int if you have a C99 or later compiler.
But in your particular case:
unsigned ux = 2147483648;
it doesn't matter. If the constant is of type unsigned int, then it's already of the right type, and no conversion is necessary. If it's of type long long int (as it must be in C99 or later, given 32-bit long), then the value must be converted from that type to unsigned. Conversion from a signed type to an unsigned type is well defined.
So if unsigned is wide enough to represent the value 2147483648, then that's the value that will be stored in ux. And if it isn't (if unsigned int is 16 bits, for example), then the conversion will result in 0 being stored in ux.
You can exercise some control over the type of a constant by appending a suffix to it. For example, 2147483648U is guaranteed to be of some unsigned type (it could be either unsigned int or unsigned long int).
Incidentally, your question's title is currently "About Class Cast.(if I write unsigned ux=2147483648(2 to the 31 st))", but your question has nothing to do with classes (which don't exist in C) or with casts. I'll edit the question.

Unexpected sign extension of int32 or 32bit pointer when converted to uint64

I compiled this code using Visual Studio 2010 (cl.exe /W4) as a C file:
int main( int argc, char *argv[] )
{
    unsigned __int64 a = 0x00000000FFFFFFFF;
    void *orig = (void *)0xFFFFFFFF;
    unsigned __int64 b = (unsigned __int64)orig;
    if( a != b )
        printf( " problem\ta: %016I64X\tb: %016I64X\n", a, b );
    return 0;
}
There are no warnings and the result is:
problem a: 00000000FFFFFFFF b: FFFFFFFFFFFFFFFF
I suppose int orig = (int)0xFFFFFFFF would be less controversial as I'm not assigning a pointer to an integer. However the result would be the same.
Can someone explain to me where in the C standard it is covered that orig is sign extended from 0xFFFFFFFF to 0xFFFFFFFFFFFFFFFF?
I had assumed that (unsigned __int64)orig would become 0x00000000FFFFFFFF. It appears that the conversion is first to the signed __int64 type and then it becomes unsigned?
EDIT: This question has been answered: pointers are sign extended, which is why I see this behavior in gcc and msvc. However, I don't understand why (unsigned __int64)(int)0xF0000000 sign extends to 0xFFFFFFFFF0000000 while (unsigned __int64)0xF0000000 does not, and instead gives what I want, which is 0x00000000F0000000.
EDIT: An answer to the above edit. The reason that (unsigned __int64)(int)0xF0000000 is sign extended is because, as noted by user R:
Conversion of a signed type (or any type) to an unsigned type
always takes place via reduction modulo one plus the max value of
the destination type.
And in (unsigned __int64)0xF0000000, the constant 0xF0000000 starts off with an unsigned integer type because it cannot fit in an int. That already-unsigned value is then converted to unsigned __int64.
So the takeaway from this for me is with a function that's returning a 32-bit or 64-bit pointer as an unsigned __int64 to compare I must first convert the 32-bit pointer in my 32-bit application to an unsigned type before promoting to unsigned __int64. The resulting code looks like this (but, you know, better):
unsigned __int64 functionidontcontrol( char * );
unsigned __int64 x;
void *y = thisisa32bitaddress;
x = functionidontcontrol(str);
if( x != (uintptr_t)y )
EDIT again:
Here is what I found in the C99 standard:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer
type other than _Bool, if the value can be represented by the new
type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value
that can be represented in the new type until the value is in the
range of the new type.49)
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
49) The rules describe arithmetic on the mathematical value, not the
value of a given type of expression.
Converting a pointer to/from an integer is implementation defined.
Here is how gcc does it, i.e. it sign extends if the integer type is larger than the pointer type(this'll happen regardless of the integer being signed or unsigned, just because that's how gcc decided to implement it).
Presumably MSVC behaves similarly. Edit: the closest thing I can find on MSDN suggests that converting 32-bit pointers to 64-bit also sign extends.
From the C99 standard (§6.3.2.3/6):
Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
So you'll need to find your compiler's documentation that talks about that.
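Unsuffixed hexadecimal constants (e.g., 0x00000000FFFFFFFF) are not necessarily signed: with a 32-bit int this one has type unsigned int, so it is not sign extended when assigned to a 64-bit variable. If you want to make the 64-bit type explicit anyway, try replacing the value on line 3 with: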
0x00000000FFFFFFFFULL
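A suffix makes the 64-bit type of the constant explicit:
unsigned __int64 a = 0x00000000FFFFFFFFLL;
Note the LL on the end. Even without it, the constant is not interpreted as a 32-bit signed number (-1): with a 32-bit int it has type unsigned int, and its value is preserved by the conversion.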

How to cast or convert an unsigned int to int in C?

My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value is past the max int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
An unsigned int can be converted to signed (or vice versa) with a simple cast expression, as shown below:
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not directly related to the question, you may want to read the following links:
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0,INT_MAX] is implementation defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the probably most portable way is IMHO the way shown in the following code snippet:
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
    i = (int)u; /*(1)*/
else if (u >= (unsigned int)INT_MIN)
    i = -(int)~u - 1; /*(2)*/
else
    i = INT_MIN; /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it
is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow
by taking the one's complement of the value by bit-wise NOT, casts it
to 'int' (which cannot overflow now), negates the value and subtracts
one, which can also not overflow here.
Branch (3) provides the poison we have to take on one's complement or
sign/magnitude targets, because the signed integer representation
range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
    i = (int)u; /*(1)*/
else
    i = -(int)~u - 1; /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
And yes, some of the casts to 'unsigned int' are redundant, but they might help the casual reader, and some compilers issue warnings on signed/unsigned compares, because the implicit conversion causes some non-intuitive behavior by language design.
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
Some explanation from C++ Primer, 5th edition, page 35:
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the long will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book is very helpful.

Resources