Why is 0 < -0x80000000?

I have below a simple program:
#include <stdio.h>
#define INT32_MIN (-0x80000000)
int main(void)
{
    long long bal = 0;
    if (bal < INT32_MIN)
    {
        printf("Failed!!!");
    }
    else
    {
        printf("Success!!!");
    }
    return 0;
}
The condition if (bal < INT32_MIN) is always true. How is this possible?
It works fine if I change the macro to:
#define INT32_MIN (-2147483648L)
Can anyone point out the issue?

This is quite subtle.
Every integer literal in your program has a type. Which type it has is regulated by a table in 6.4.4.1:
Suffix   Decimal Constant    Octal or Hexadecimal Constant
------   ----------------    -----------------------------
none     int                 int
         long int            unsigned int
         long long int       long int
                             unsigned long int
                             long long int
                             unsigned long long int
If a literal number can't fit inside the default int type, it will attempt the next larger type as indicated in the above table. So for regular decimal integer literals it goes like:
Try int
If it can't fit, try long
If it can't fit, try long long.
Hex literals behave differently though! If the literal can't fit inside a signed type like int, it will first try unsigned int before moving on to trying larger types. See the difference in the above table.
So on a 32-bit system, your literal 0x80000000 is of type unsigned int.
This means that you can apply the unary - operator to the literal without invoking undefined behavior, as you otherwise would when overflowing a signed integer. Instead, you get the value 0x80000000, a positive value.
bal < INT32_MIN invokes the usual arithmetic conversions, and the operand 0x80000000 is converted from unsigned int to long long. The value 0x80000000 is preserved, and 0 is less than 0x80000000, hence the result.
When you replace the literal with 2147483648L, you use decimal notation, so the compiler doesn't pick unsigned int but rather tries to fit it inside a long. The L suffix also says that you want a long if possible. The suffix follows similar rules if you continue reading the table in 6.4.4.1: if the number doesn't fit inside the requested long, which it doesn't in the 32-bit case, the compiler gives you a long long, where it fits just fine.
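If your compiler supports C11, a quick _Generic sketch can confirm which type each literal actually gets; the comments assume a typical platform with 32-bit int and unsigned int:
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),            \
    int: "int",                               \
    unsigned int: "unsigned int",             \
    long: "long",                             \
    unsigned long: "unsigned long",           \
    long long: "long long",                   \
    unsigned long long: "unsigned long long", \
    default: "other")

int main(void)
{
    puts(TYPE_NAME(0x80000000));   /* "unsigned int" when int is 32-bit */
    puts(TYPE_NAME(-0x80000000));  /* still "unsigned int": - keeps the type */
    puts(TYPE_NAME(2147483648));   /* "long" or "long long", never unsigned */
    return 0;
}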

0x80000000 is an unsigned literal with value 2147483648.
Applying the unary minus on this still gives you an unsigned type with a non-zero value. (In fact, for a non-zero value x, the value you end up with is UINT_MAX - x + 1.)
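A one-line demonstration of that rule, assuming a 32-bit unsigned int:
#include <stdio.h>

int main(void)
{
    unsigned int x = 0x80000000u;

    /* Unsigned negation is reduced modulo UINT_MAX + 1, so
       -x == UINT_MAX - x + 1 == 0x80000000 here. */
    printf("%u\n", -x);  /* prints 2147483648 */
    return 0;
}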

This integer literal 0x80000000 has type unsigned int.
According to the C Standard (6.4.4.1 Integer constants)
5 The type of an integer constant is the first of the corresponding
list in which its value can be represented.
And this integer constant can be represented by the type of unsigned int.
So the expression
-0x80000000 has the same type, unsigned int. Moreover, it has the same value,
0x80000000: unsigned negation gives the same bit pattern as two's complement, calculated the following way:
-0x80000000 = ~0x80000000 + 1 => 0x7FFFFFFF + 1 => 0x80000000
This has a surprising side effect. If you write, for example:
int x = INT_MIN;
x = abs( x );
the result will again be INT_MIN. (Strictly speaking, abs(INT_MIN) is undefined behavior; on typical two's-complement implementations the value simply wraps back to INT_MIN.)
Thus in this condition
bal < INT32_MIN
0 is compared with the unsigned value 0x80000000, converted to type long long int according to the rules of the usual arithmetic conversions.
It is evident that 0 is less than 0x80000000.

The numeric constant 0x80000000 is of type unsigned int. If we take -0x80000000 and do two's complement math on it, we get this:
~0x80000000 = 0x7FFFFFFF
0x7FFFFFFF + 1 = 0x80000000
So -0x80000000 == 0x80000000. And comparing (0 < 0x80000000) (since 0x80000000 is unsigned) is true.

A point of confusion occurs in thinking the - is part of the numeric constant.
In the code below, 0x80000000 is the numeric constant. Its type is determined from that alone. The - is applied afterward and does not change the type.
#define INT32_MIN (-0x80000000)
long long bal = 0;
if (bal < INT32_MIN )
Raw unadorned numeric constants are positive.
If it is decimal, then the type assigned is first type that will hold it: int, long, long long.
If the constant is octal or hexadecimal, it gets the first type that holds it: int, unsigned, long, unsigned long, long long, unsigned long long.
0x80000000, on OP's system gets the type of unsigned or unsigned long. Either way, it is some unsigned type.
-0x80000000 is then also a non-zero value of some unsigned type, so it is greater than 0. When code compares that to a long long, the values on the two sides of the comparison are unchanged, so 0 < INT32_MIN is true.
An alternate definition avoids this curious behavior
#define INT32_MIN (-2147483647 - 1)
Let us walk in fantasy land for a while where int and unsigned are 48-bit.
Then 0x80000000 fits in int, and so its type is int. -0x80000000 is then a negative number, and the result of the printout is different.
[Back to real-word]
Since 0x80000000 fits in some unsigned type before a signed type as it is just larger than some_signed_MAX yet within some_unsigned_MAX, it is some unsigned type.
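A quick check of the alternate definition; a sketch assuming 32-bit int, with the macro renamed (MY_INT32_MIN) to avoid clashing with the real INT32_MIN from <stdint.h>:
#include <limits.h>
#include <stdio.h>

#define MY_INT32_MIN (-2147483647 - 1)  /* int minus int: stays signed */

int main(void)
{
    long long bal = 0;

    /* Both sides are signed, so the comparison behaves as expected. */
    printf("%s\n", bal < MY_INT32_MIN ? "Failed!!!" : "Success!!!");
    printf("%d\n", MY_INT32_MIN == INT_MIN);  /* prints 1 on 32-bit int */
    return 0;
}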

C's rule is that an integer constant is signed or unsigned depending on which type in the relevant list its value fits first (this is about the constant's type, not integer promotion). On a 32-bit machine the literal 0x80000000 is unsigned. Negating it yields 0x80000000 again, by two's-complement-style wraparound. Therefore the comparison bal < INT32_MIN is between a signed and an unsigned operand, and before the comparison, per the C rule, the unsigned int is converted to long long.
C11: 6.3.1.8/1:
[...] Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.
Therefore, bal < INT32_MIN is always true.

Related

How you avoid implicit conversion from short to integer during addition?

I'm doing a few integer exercises for myself, trying to fully understand integer overflow.
I kept reading about how it can be dangerous to mix integer types of different sizes. For that reason I wanted an example where a short would overflow much sooner than an int.
Here is the snippet:
unsigned int longt;
longt = 65530;
unsigned short shortt;
shortt = 65530;

if (longt > (shortt + 10)) {
    printf("it is bigger");
}
But the if-statement here is not being run, which must mean that the short is not overflowing. Thus I conclude that in the expression shortt+10 a conversion happens from short to integer.
This is a bit strange to me: when the if statement evaluates expressions, does it have the freedom to pick a new integer type as it pleases?
I then thought that if I was adding two shorts, then that would surely evaluate to a short:
unsigned int longt;
longt = 65530;
unsigned short shortt;
shortt = 65530;
shortt = shortt;

short tmp = 10;
if (longt > (shortt + tmp)) {
    printf("Ez bigger");
}
But alas, the proposition still evaluates to false.
I then try to do something completely explicit, where I actually do the addition into a short type, this time forcing it to overflow:
unsigned int longt;
longt = 65530;
unsigned short shortt;
shortt = 65530;
shortt = shortt;

short tmp = shortt + 10;
if (longt > tmp) {
    printf("Ez bigger");
}
Finally this worked, which would also be really annoying if it didn't.
This flusters me a little bit though, and it reminds me of a CTF exercise that I did a while back, where I had to exploit this code snippet:
#include <stdio.h>

int main() {
    int impossible_number;
    FILE *flag;
    char c;

    if (scanf("%d", &impossible_number)) {
        if (impossible_number > 0 && impossible_number > (impossible_number + 1)) {
            flag = fopen("flag.txt","r");
            while((c = getc(flag)) != EOF) {
                printf("%c",c);
            }
        }
    }
    return 0;
}
Here, you're supposed to trigger an overflow of the impossible_number variable, which was actually possible on the server it was deployed on, but causes issues when run locally.
You should be able to give "2147483647" as input, and then overflow and hit the if statement. However this does not happen when run locally, or running at an online compiler.
I don't get it: how do you get an expression to actually overflow the way it's supposed to, like in this example from 247ctf?
I hope someone has an answer for this.
How you avoid implicit conversion from short to integer during addition?
You don't.
C has no arithmetic operations on integer types narrower than int and unsigned int. There is no + operator for type short.
Whenever an expression of type short is used as the operand of an arithmetic operator, it is implicitly converted to int.
For example:
short s = 1;
s = s + s;
In s + s, s is promoted from short to int and the addition is done in type int. The assignment then implicitly converts the result of the addition from int to short.
Some compilers might have an option to enable a warning for the narrowing conversion from int to short, but there's no way to avoid it.
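One way to observe the promotion directly is with sizeof; a minimal sketch, with sizes shown for a typical platform where short is 16 bits and int is 32 bits:
#include <stdio.h>

int main(void)
{
    short s = 1;

    /* s + s is computed in int after promotion, so the result has
       sizeof(int), not sizeof(short). */
    printf("sizeof s       = %zu\n", sizeof s);        /* typically 2 */
    printf("sizeof (s + s) = %zu\n", sizeof (s + s));  /* typically 4 */
    return 0;
}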
What you're seeing is a result of the integer promotions. What this basically means is that any time an integer type smaller than int is used in an expression, it is converted to int.
This is detailed in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or unsigned int may be used:
— An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
That is what's happening here. So let's look at the first expression:
if (longt > (shortt+10)){
Here we have an unsigned short with value 65530 being added to the constant 10, which has type int. The unsigned short value is converted to an int value, so now we have the int value 65530 being added to the int value 10, which results in the int value 65540. We now have 65530 > 65540, which is false.
The same happens in the second case where both operands of the + operator are first promoted from unsigned short to int.
In the third case, the difference happens here:
short tmp = shortt + 10;
On the right side of the assignment, we still have the int value 65540 as before, but now this value needs to be assigned back to a short. This undergoes an implementation-defined conversion to short, which is detailed in section 6.3.1.3:
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new
type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
Paragraph 3 takes effect in this particular case. In most implementations you're likely to come across, this will typically mean "wraparound" of the value.
So how do you work with this? The closest thing you can do is either what you did, i.e. assign the intermediate result to a variable of the desired type, or cast the intermediate result:
if (longt > (short)(shortt+10)) {
As for the "impossible" input in the CTF example, that actually causes signed integer overflow as a result of the addition, and that triggers undefined behavior. For example, when I ran it on my machine, I got into the if block if I compiled with -O0 or -O1 but not with -O2.
How you avoid implicit conversion from short to integer during addition?
Not really avoidable.
On 16-bit and wider machines, the conversions short to int and unsigned short to unsigned do not affect the value. But addition overflow and the implicit conversion from int to unsigned yield a different result in 16-bit vs. 32-bit land for OP's values. In 16-bit land, unsigned short to int does not implicitly occur; instead, code does unsigned short to unsigned.
int/unsigned as 16-bit
If int/unsigned were 16-bit (common on many embedded processors), then shortt would not convert to an int, but to unsigned.
// Given 16-bit int/unsigned
unsigned int longt;
longt = 65530;  // 32-bit long constant assigned to 16-bit unsigned - no value change as value is in range.
unsigned short shortt;
shortt = 65530; // 32-bit long constant assigned to 16-bit unsigned short - no value change as value is in range.

// (shortt+10)
// shortt+10 is an unsigned short + int.
// unsigned short promotes to unsigned - no value change.
// Then, since this is unsigned + int, the int 10 converts to unsigned 10 - no value change.
// unsigned 65530 + unsigned 10 exceeds the unsigned range, so 65536 is subtracted.
// Sum is 4.
// Statement is true.
if (longt > (shortt + 10)) {
    printf("it is bigger");
}
It is called an implicit conversion.
From the C standard:
Several operators convert operand values from one type to another automatically. This subclause specifies the result required from such an implicit conversion, as well as those that result from a cast operation (an explicit conversion). The list in 6.3.1.8 summarizes the conversions performed by most ordinary operators; it is supplemented as required by the discussion of each operator in 6.5.
Every integer type has an integer conversion rank defined as follows:
— No two signed integer types shall have the same rank, even if they have the same representation.
— The rank of a signed integer type shall be greater than the rank of any signed integer type with less precision.
— The rank of long long int shall be greater than the rank of long int, which shall be greater than the rank of int, which shall be greater than the rank of short int, which shall be greater than the rank of signed char.
— The rank of any unsigned integer type shall equal the rank of the corresponding signed integer type, if any.
— The rank of any standard integer type shall be greater than the rank of any extended integer type with the same width.
— The rank of char shall equal the rank of signed char and unsigned char.
— The rank of _Bool shall be less than the rank of all other standard integer types.
— The rank of any enumerated type shall equal the rank of the compatible integer type (see 6.7.2.2).
— The rank of any extended signed integer type relative to another extended signed integer type with the same precision is implementation-defined, but still subject to the other rules for determining the integer conversion rank.
— For all integer types T1, T2, and T3, if T1 has greater rank than T2 and T2 has greater rank than T3, then T1 has greater rank than T3.
The following may be used in an expression wherever an int or unsigned int may be used:
— An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
The integer promotions preserve value including sign. As discussed earlier, whether a "plain" char is treated as signed is implementation-defined.
You can't avoid the implicit conversion, but you can cast the result of the operation to the required type:
if (longt > (short)(shortt + tmp))
{
    printf("Ez bigger");
}
https://godbolt.org/z/39Exa8E7K
But note that this conversion is implementation-defined when the value doesn't fit in short, per 6.3.1.3 paragraph 3 quoted above: the result is implementation-defined, or an implementation-defined signal is raised. You have to be very careful doing it, as it can be a source of errors that are very hard to find and debug.
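If defined wraparound is what you actually want, one option (a sketch, assuming the usual 16-bit unsigned short) is to cast to the unsigned variant instead, since conversion to an unsigned type is fully defined by 6.3.1.3 paragraph 2:
#include <stdio.h>

int main(void)
{
    unsigned int longt = 65530;
    unsigned short shortt = 65530;

    /* 65540 is reduced modulo USHRT_MAX + 1, giving 4, so the
       comparison is 65530 > 4 and the branch is taken. */
    if (longt > (unsigned short)(shortt + 10)) {
        printf("it is bigger");
    }
    return 0;
}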

Safe low 32 bits masking of uint64_t

Assume the following code:
uint64_t g_global_var;
....
....
void foo(void)
{
    uint64_t local_32bit_low = g_global_var & 0xFFFFFFFF;
    ....
}
With the current toolchain, this code works as expected, local_32bit_low indeed contains the low 32 bits of g_global_var.
I wonder whether it is guaranteed by the C standard that this code will always work as expected.
My concern is that the compiler may treat 0xFFFFFFFF as integer value of -1 and when promoting to uint64_t it would become 0xFFFFFFFFFFFFFFFF.
P.S.
I know that to be on the safe side it is better to use 0xFFFFFFFFULL in this case. The point is that I saw this in legacy code, and I wonder whether it is worth fixing or not.
There is no problem. The integer constant 0xFFFFFFFF has the type that is able to store the value as is.
According to the C Standard (6.4.4.1 Integer constants)
5 The type of an integer constant is the first of the corresponding
list in which its value can be represented
So this value is stored as a positive value.
If the type unsigned int is a 32-bit integer type then the constant will have the type unsigned int.
Otherwise it will have one of the types that can store the value.
long int
unsigned long int
long long int
unsigned long long int
Due to the usual arithmetic conversions in the expression
g_global_var & 0xFFFFFFFF;
the constant is converted to uint64_t as
0x00000000FFFFFFFF
Pay attention to the fact that in C there are no negative integer constants. For example, an expression like
-10
consists of two sub-expressions: the primary expression 10, and the sub-expression -10 formed with the unary operator -, which here coincides with the full expression.
0xffffffff is not -1, ever. It may convert to -1 if you cast or coerce (e.g. by assignment) it to a signed 32-bit type, but integer literals in C always have their mathematical value unless they overflow.
For decimal literals, the type is the narrowest signed type that can represent the value. For hex literals, unsigned types are used before going up to the next wider signed type. So, in the common case where int is 32-bit, 0xffffffff would have type unsigned int. If you wrote it as decimal, it would have type long (if long is 64-bit) or long long (if long is only 32-bit).
The type of an unsuffixed hexadecimal or octal constant is the first of the following list in which its value can be represented:
int
unsigned int
long int
unsigned long int
long long int
unsigned long long int
(For unsuffixed decimal constants, remove the unsigned types from the above list.)
The hexadecimal constant 0xFFFFFFFF can definitely be represented by unsigned long int, so its type will be the first of int, unsigned int, long int or unsigned long int that can represent its value.
Note that although 0xFFFFFFFF > 0 always evaluates to 1 (true), it is possible for 0xFFFFFFFF > -1 to evaluate to either 0 (false) or 1 (true) on different implementations. So you need to be careful when comparing integer constants with each other or with other objects of integer type.
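A small demonstration of that caveat; the comments assume a typical implementation with 32-bit int:
#include <stdio.h>

int main(void)
{
    /* 0xFFFFFFFF has type unsigned int here, so -1 is converted to
       unsigned int (becoming 0xFFFFFFFF) and the comparison prints 0.
       On an implementation with int wider than 32 bits, 0xFFFFFFFF
       would be a plain int and the same line would print 1. */
    printf("%d\n", 0xFFFFFFFF > -1);
    printf("%d\n", 0xFFFFFFFF > 0);  /* always prints 1 */
    return 0;
}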
Others have answered the question; just a recommendation: next time (if you are using C11 or later) you can check the type of the expression yourself using _Generic:
#include <stdio.h>
#include <stdint.h>

#define print_type(x) _Generic((x), \
    int64_t: puts("int64_t"),       \
    uint64_t: puts("uint64_t"),     \
    default: puts("unknown")        \
)

uint64_t g_global_var;

int main(void)
{
    print_type(g_global_var & 0xFFFFFFFF);
    return 0;
}
The output is
uint64_t

What is the default data type of number in C?

In C,
unsigned int size = 1024*1024*1024*2;
which produces the warning "integer overflow in expression...".
While
unsigned int size = 2147483648;
produces no warning?
Is the result of the first expression an int by default? Where is this specified in the C99 spec?
When using a decimal constant without any suffix, the type of the constant is the first, in this order, in which its value can be represented (the current C standard, 6.4.4 Constants p5):
int
long int
long long int
The type of the first expression is int, since every constant in it (1024 and 2) can be represented as int. The computation is therefore done in type int, and the result will overflow.
Assuming INT_MAX equals 2147483647 and LONG_MAX is greater than 2147483647, the type of the second expression is long int, since this value cannot be represented as int, but can be as long int. If INT_MAX equals LONG_MAX equals 2147483647, then the type is long long int.
unsigned int size = 1024*1024*1024*2;
The expression 1024*1024*1024*2 (where 1024 and 2 are of type signed int) produces a result of type signed int, and that value is too big for signed int. Therefore, you get the warning.
After that signed multiplication, the result is assigned to the unsigned int.
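One way to silence the warning correctly is to force the computation into unsigned arithmetic; a minimal sketch, assuming 32-bit unsigned int:
#include <stdio.h>

int main(void)
{
    /* With the u suffix the whole product is computed as unsigned int;
       2147483648 fits in a 32-bit unsigned int, so nothing overflows. */
    unsigned int size = 1024u * 1024 * 1024 * 2;
    printf("%u\n", size);  /* prints 2147483648 */
    return 0;
}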

Type of integer literals and ~ in C

I'm a C beginner, and I'm confused by the following example found in the C answer book.
One way to find the size of unsigned long long on your system is to type:
printf("%llu", (unsigned long long) ~0);
I have no idea why this syntax works?
On my system, int is 32 bits, and long long is 64 bits.
What I expected was that, since 0 is a constant of type int, ~0 calculates the complement of a 32-bit integer, which is then converted to an unsigned long long by the cast operator. This should give 2^32 - 1 as a result.
Somehow, it looks like the ~ operator already knows that it should act on 64 bits?
Does the compiler interprets this instruction as printf("%llu", ~(unsigned long long)0); ? That doesn't sound right since the cast and ~ have the same priority.
Somehow, it looks like the ~ operator already knows that it should act on 64 bits?
It's not the ~ operator, it's the cast. Here is how the integer conversion is done according to the standard:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The value of signed int ~0 corresponds to -1 on systems with two's complement representation of negative values. It cannot be represented by an unsigned long long, so the first bullet point does not apply.
The second bullet point does apply: the new type is unsigned, so one more than the maximum value of unsigned long long (that is, 2^64) is added to -1 once to bring the result into the range of unsigned long long. This has the same effect as sign-extending -1 to 64 bits.
0 is of type int, not unsigned int. ~0 will therefore (on machines that use two's complement integer representation, which is all that are in use today) be -1, not 2^32 - 1.
Assuming a 64-bit unsigned long long, (unsigned long long) -1 is -1 modulo 2^64, which is 2^64 - 1.
0 is an int
~0 is still an int, namely the value -1.
Casting an int to unsigned long long is there merely to match the type that printf expects with the conversion llu.
However, the value of -1 converted to a 64-bit unsigned long long is 0xFFFFFFFFFFFFFFFF regardless of whether int is 4 or 8 bytes, since the conversion is done modulo ULLONG_MAX + 1.
According to N1570 Committee Draft:
6.5.3.3 Unary arithmetic operators
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. If the promoted type is an unsigned type, the expression ~E is equivalent to the maximum value representable in that type minus E.
And §6.2.6.2 Language, p. 45, on the representation of signed integers (which may use sign and magnitude, two's complement, or ones' complement), says:
Which of these applies is implementation-defined, as is whether the value with sign bit 1 and all value bits zero (for the first two), or with sign bit and all value bits 1 (for ones' complement), is a trap representation or a normal value. In the case of sign and magnitude and ones' complement, if this representation is a normal value it is called a negative zero.
Hence, the behavior of the code:
printf("%llu", (unsigned long long) ~0);
is implementation-defined rather than what you might expect: the result depends on the machine's internal representation of negative integers.
According to section 6.5.3.3, a more portable way to write the code would be:
printf("%llu", (unsigned long long) ~0u);
Note, though, that ~0u has type unsigned int, so casting it to unsigned long long yields UINT_MAX (4294967295), not the all-ones unsigned long long value. To print ~0u itself, use the format string %u.
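If the goal is the all-ones value of unsigned long long itself, a suffixed literal sidesteps both the representation question and the width question; a sketch assuming 32-bit unsigned int and 64-bit unsigned long long:
#include <stdio.h>

int main(void)
{
    /* ~0ull has type unsigned long long, so the complement covers all
       of its bits directly; no conversion step is involved. */
    printf("%llu\n", ~0ull);                     /* 18446744073709551615 */

    /* ~0u is unsigned int; widening it afterwards only zero-extends. */
    printf("%llu\n", (unsigned long long) ~0u);  /* 4294967295 */
    return 0;
}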
To learn basic concept of type casting you may like to read: What exactly is a type cast in C/C++?

Is the integer constant's default type signed or unsigned?

Is an integer constant's default type signed or unsigned? For example, for 0x80000000: how can I tell whether it is a signed or an unsigned integer constant when it has no suffix?
If it is a signed integer constant, how do I explain the following case?
printf("0x80000000>>3 : %x\n", 0x80000000>>3);
output:
0x80000000>>3 : 10000000
The case below indicates that my platform uses an arithmetic right shift, not a logical one:
int n = 0x80000000;
printf("n>>3: %x\n", n>>3);
output:
n>>3: f0000000
C has different rules for decimal, octal and hexadecimal constants.
For decimal, it is the first type the value can fit in: int, long, long long
For hexadecimal, it is the first type the value can fit in: int, unsigned int, long, unsigned long, long long, unsigned long long
For example on a system with 32-bit int and unsigned int: 0x80000000 is unsigned int.
Note that for decimal constants, C90 had different rules (but rules didn't change for hexadecimal constants).
It is signed if it fits in a signed integer. To make it unsigned, append a u suffix, e.g. 1234u.
You can convert a signed value to unsigned by assigning it to an unsigned variable.
unsigned int i = 1234u; // no conversion needed
unsigned int i = 1234; // signed value 1234 now converted to unsigned
For 0x80000000, it will be unsigned if ints are 32 bits on your platform, since it doesn't fit into a signed int.
Another thing to watch out for, though, is that the behaviour of right-shifting a negative signed value is implementation-defined. On some platforms it's sign-preserving (arithmetic) and on some platforms it's a simple bitwise shift (logical).
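If you need a shift that is guaranteed to be logical, shift the bits as an unsigned value; a minimal sketch, assuming a 32-bit two's-complement int:
#include <stdio.h>

int main(void)
{
    int n = -0x7FFFFFFF - 1;  /* INT_MIN, bit pattern 0x80000000 */

    /* Right-shifting a negative signed value is implementation-defined;
       reinterpreting the bits as unsigned forces a zero-filling shift. */
    printf("n >> 3           : %x\n", n >> 3);            /* f0000000 here */
    printf("(unsigned)n >> 3 : %x\n", (unsigned)n >> 3);  /* 10000000 */
    return 0;
}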
