Comparison operation on unsigned and signed integers - c

See this code snippet
int main()
{
unsigned int a = 1000;
int b = -1;
if (a>b) printf("A is BIG! %d\n", a-b);
else printf("a is SMALL! %d\n", a-b);
return 0;
}
This gives the output: a is SMALL: 1001
I don't understand what's happening here. How does the > operator work here? Why is "a" smaller than "b"? If it is indeed smaller, why do i get a positive number (1001) as the difference?

Binary operations between different integral types are performed within a "common" type defined by so called usual arithmetic conversions (see the language specification, 6.3.1.8). In your case the "common" type is unsigned int. This means that int operand (your b) will get converted to unsigned int before the comparison, as well as for the purpose of performing subtraction.
When -1 is converted to unsigned int the result is the maximal possible unsigned int value (same as UINT_MAX). Needless to say, it is going to be greater than your unsigned 1000 value, meaning that a > b is indeed false and a is indeed small compared to (unsigned) b. The if in your code should resolve to else branch, which is what you observed in your experiment.
The same conversion rules apply to subtraction. Your a-b is really interpreted as a - (unsigned) b and the result has type unsigned int. Such value cannot be printed with %d format specifier, since %d only works with signed values. Your attempt to print it with %d results in undefined behavior, so the value that you see printed (even though it has a logical deterministic explanation in practice) is completely meaningless from the point of view of C language.
Edit: Actually, I could be wrong about the undefined behavior part. According to C language specification, the common part of the range of the corresponding signed and unsigned integer type shall have identical representation (implying, according to the footnote 31, "interchangeability as arguments to functions"). So, the result of a - b expression is unsigned 1001 as described above, and unless I'm missing something, it is legal to print this specific unsigned value with %d specifier, since it falls within the positive range of int. Printing (unsigned) INT_MAX + 1 with %d would be undefined, but 1001u is fine.

On a typical implementation where int is 32-bit, -1 when converted to an unsigned int is 4,294,967,295 which is indeed ≥ 1000.
Even if you treat the subtraction in an unsigned world, 1000 - (4,294,967,295) = -4,294,966,295 = 1,001 which is what you get.
That's why gcc will spit a warning when you compare unsigned with signed. (If you don't see a warning, pass the -Wsign-compare flag.)

You are doing unsigned comparison, i.e. comparing 1000 to 2^32 - 1.
The output is signed because of %d in printf.
N.B. sometimes the behavior when you mix signed and unsigned operands is compiler-specific. I think it's best to avoid them and do casts when in doubt.

#include<stdio.h>
int main()
{
int a = 1000;
signed int b = -1, c = -2;
printf("%d",(unsigned int)b);
printf("%d\n",(unsigned int)c);
printf("%d\n",(unsigned int)a);
if(1000>-1){
printf("\ntrue");
}
else
printf("\nfalse");
return 0;
}
For this you need to understand the precedence of operators
Relational Operators works left to right ...
so when it comes
if(1000>-1)
then first of all it will change -1 to unsigned integer because int is by default treated as unsigned number and it range it greater than the signed number
-1 will change into the unsigned number ,it changes into a very big number

Find a easy way to compare, maybe useful when you can not get rid of unsigned declaration, (for example, [NSArray count]), just force the "unsigned int" to an "int".
Please correct me if I am wrong.
if (((int)a)>b) {
....
}

The hardware is designed to compare signed to signed and unsigned to unsigned.
If you want the arithmetic result, convert the unsigned value to a larger signed type first. Otherwise the compiler wil assume that the comparison is really between unsigned values.
And -1 is represented as 1111..1111, so it a very big quantity ... The biggest ... When interpreted as unsigned.

while comparing a>b where a is unsigned int type and b is int type, b is type casted to unsigned int so, signed int value -1 is converted into MAX value of unsigned**(range: 0 to (2^32)-1 )**
Thus, a>b i.e., (1000>4294967296) becomes false. Hence else loop printf("a is SMALL! %d\n", a-b); executed.

Related

Weirdness with unsigned int, float data types and multiplication

I am not very good at C language and just met a problem I don't understand. The code is:
int main()
{
unsigned int a = 100;
unsigned int b = 200;
float c = 2;
int result_i;
unsigned int result_u;
float result_f;
result_i = (a - b)*2;
result_u = (a - b);
result_f = (a-b)*c;
printf("%d\n", result_i);
printf("%d\n", result_u);
printf("%f\n", result_f);
return 0;
}
And the output is:
-200
-100
8589934592.000000
Program ended with exit code: 0
For (a-b) is negative and a,b are unsigned int type, (a-b) is trivial. And after multiplying a float type number c, the result is 8589934592.000000. I have two questions:
First, why the result is non-trivial after multiplying int type number 2 and assigned to an int type number?
Second, why the result_u is non-trivial even though (a-b) is negative and result_u is unsigned int type?
I am using Xcode to test this code, and the compiler is the default APPLE LLVM 6.0.
Thanks!
Your assumption that a - b is negative is completely incorrect.
Since a and b have unsigned int type all arithmetic operations with these two variables are performed in the domain of unsigned int type. The same applies to mixed "unsigned int with int" arithmetic as well. Such operations implement modulo arithmetic, with the modulo being equal to UINT_MAX + 1.
This means that expression a - b produces a result of type unsigned int. It is a large positive value equal to UINT_MAX + 1 - 100. On a typical platform with 32-bit int it is 4294967296 - 100 = 4294967196.
Expression (a - b) * 2 also produces a result of type unsigned int. It is also a large positive value (UINT_MAX + 1 - 100 multiplied by 2 and taken modulo UINT_MAX + 1). On a typical platform it is 4294967096.
This latter value is too large for type int. Which means that when you force it into a variable result_i, signed integer overflow occurs. The result of signed integer overflow on assignment is implementation defined. In your case result_i ended up being -200. It looks "correct", but this is not guaranteed by the language. (Albeit it might be guaranteed by your implementation.)
Variable result_u receives the correct unsigned result - a positive value UINT_MAX + 1 - 100. But you print that result using %d format specifier in printf, instead of the proper %u. It is illegal to print unsigned int values that do not fit into the range of int using %d specifier. The behavior of your code is undefined for that reason. The -100 value you see in the output is just a manifestation of that undefined behavior. This output is formally meaningless, even though it appears "correct" at the first sight.
Finally, variable result_f receives the "proper" result of (a-b)*c expression, calculated without overflows, since the multiplication is performed in the float domain. What you see is that large positive value I mentioned above, multiplied by 2. It is likely rounded to the precision of float type though, which is implementation-defined. The exact value would be 4294967196 * 2 = 8589934392.
One can argue that the last value you printed is the only one that properly reflects the properties of unsigned arithmetic, i.e. it is "naturally" derived from the actual result of a - b.
You get negative numbers in the printf because you've asked it to print a signed integer with %d. Use %u if you want to see the actual value you ended up with. That will also show you how you ended up with the output for the float multiplication.

Why does this program output "1"?

#include<stdio.h>
main()
{
unsigned int a=1,b=2;
printf("%d\n",a-b>=0);
getch();
}
Why does this program output "1" ?
Compile with warnings enabled, and it will become quite obvious:
unsigned.c:5:22: warning: comparison of unsigned expression >= 0 is always true
[-Wtautological-compare]
printf("%d\n",a-b>=0);
~~~^ ~
You are subtracting two unsigned integers. Even though the subtraction would equal -1, since they are unsigned it wraps around and you get some very large value. There is no way that an unsigned integer could ever not be greater than or equal to zero.
Quoting ANSI C 89 standard on subtraction generating a negative number using unsigned operands:
A computation involving unsigned operands can never overflow, because
a result that cannot be represented by the resulting unsigned integer
type is reduced modulo the number that is one greater than the largest
value that can be represented by the resulting unsigned integer type
So, unsigned subtraction never generates a negative number, making your condition true, and thus prints 1.
try to print parts of your logic expression
printf("%u >= %u\n",a-b, 0);
and you will see why a-b>=0 is true.

Type conversion - unsigned to signed int/char

I tried the to execute the below program:
#include <stdio.h>
int main() {
signed char a = -5;
unsigned char b = -5;
int c = -5;
unsigned int d = -5;
if (a == b)
printf("\r\n char is SAME!!!");
else
printf("\r\n char is DIFF!!!");
if (c == d)
printf("\r\n int is SAME!!!");
else
printf("\r\n int is DIFF!!!");
return 0;
}
For this program, I am getting the output:
char is DIFF!!!
int is SAME!!!
Why are we getting different outputs for both?
Should the output be as below ?
char is SAME!!!
int is SAME!!!
A codepad link.
This is because of the various implicit type conversion rules in C. There are two of them that a C programmer must know: the usual arithmetic conversions and the integer promotions (the latter are part of the former).
In the char case you have the types (signed char) == (unsigned char). These are both small integer types. Other such small integer types are bool and short. The integer promotion rules state that whenever a small integer type is an operand of an operation, its type will get promoted to int, which is signed. This will happen no matter if the type was signed or unsigned.
In the case of the signed char, the sign will be preserved and it will be promoted to an int containing the value -5. In the case of the unsigned char, it contains a value which is 251 (0xFB ). It will be promoted to an int containing that same value. You end up with
if( (int)-5 == (int)251 )
In the integer case you have the types (signed int) == (unsigned int). They are not small integer types, so the integer promotions do not apply. Instead, they are balanced by the usual arithmetic conversions, which state that if two operands have the same "rank" (size) but different signedness, the signed operand is converted to the same type as the unsigned one. You end up with
if( (unsigned int)-5 == (unsigned int)-5)
Cool question!
The int comparison works, because both ints contain exactly the same bits, so they are essentially the same. But what about the chars?
Ah, C implicitly promotes chars to ints on various occasions. This is one of them. Your code says if(a==b), but what the compiler actually turns that to is:
if((int)a==(int)b)
(int)a is -5, but (int)b is 251. Those are definitely not the same.
EDIT: As #Carbonic-Acid pointed out, (int)b is 251 only if a char is 8 bits long. If int is 32 bits long, (int)b is -32764.
REDIT: There's a whole bunch of comments discussing the nature of the answer if a byte is not 8 bits long. The only difference in this case is that (int)b is not 251 but a different positive number, which isn't -5. This is not really relevant to the question which is still very cool.
Welcome to integer promotion. If I may quote from the website:
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions. All other types are unchanged
by the integer promotions.
C can be really confusing when you do comparisons such as these, I recently puzzled some of my non-C programming friends with the following tease:
#include <stdio.h>
#include <string.h>
int main()
{
char* string = "One looooooooooong string";
printf("%d\n", strlen(string));
if (strlen(string) < -1) printf("This cannot be happening :(");
return 0;
}
Which indeed does print This cannot be happening :( and seemingly demonstrates that 25 is smaller than -1!
What happens underneath however is that -1 is represented as an unsigned integer which due to the underlying bits representation is equal to 4294967295 on a 32 bit system. And naturally 25 is smaller than 4294967295.
If we however explicitly cast the size_t type returned by strlen as a signed integer:
if ((int)(strlen(string)) < -1)
Then it will compare 25 against -1 and all will be well with the world.
A good compiler should warn you about the comparison between an unsigned and signed integer and yet it is still so easy to miss (especially if you don't enable warnings).
This is especially confusing for Java programmers as all primitive types there are signed. Here's what James Gosling (one of the creators of Java) had to say on the subject:
Gosling: For me as a language designer, which I don't really count
myself as these days, what "simple" really ended up meaning was could
I expect J. Random Developer to hold the spec in his head. That
definition says that, for instance, Java isn't -- and in fact a lot of
these languages end up with a lot of corner cases, things that nobody
really understands. Quiz any C developer about unsigned, and pretty
soon you discover that almost no C developers actually understand what
goes on with unsigned, what unsigned arithmetic is. Things like that
made C complex. The language part of Java is, I think, pretty simple.
The libraries you have to look up.
The hex representation of -5 is:
8-bit, two's complement signed char: 0xfb
32-bit, two's complement signed int: 0xfffffffb
When you convert a signed number to an unsigned number, or vice versa, the compiler does ... precisely nothing. What is there to do? The number is either convertible or it isn't, in which case undefined or implementation-defined behaviour follows (I've not actually checked which) and the most efficient implementation-defined behaviour is to do nothing.
So, the hex representation of (unsigned <type>)-5 is:
8-bit, unsigned char: 0xfb
32-bit, unsigned int: 0xfffffffb
Look familiar? They're bit-for-bit the same as the signed versions.
When you write if (a == b), where a and b are of type char, what the compiler is actually required to read is if ((int)a == (int)b). (This is that "integer promotion" that everyone else is banging on about.)
So, what happens when we convert char to int?
8-bit signed char to 32-bit signed int: 0xfb -> 0xfffffffb
Well, that makes sense because it matches the representations of -5 above!
It's called a "sign-extend", because it copies the top bit of the byte, the "sign-bit", leftwards into the new, wider value.
8-bit unsigned char to 32-bit signed int: 0xfb -> 0x000000fb
This time it does a "zero-extend" because the source type is unsigned, so there is no sign-bit to copy.
So, a == b really does 0xfffffffb == 0x000000fb => no match!
And, c == d really does 0xfffffffb == 0xfffffffb => match!
My point is: didn't you get a warning at compile time "comparing signed and unsigned expression"?
The compiler is trying to inform you that he is entitled to do crazy stuff! :) I would add, crazy stuff will happen using big values, close to the capacity of the primitive type. And
unsigned int d = -5;
is assigning definitely a big value to d, it's equivalent (even if, probably not guaranteed to be equivalent) to be:
unsigned int d = UINT_MAX -4; ///Since -1 is UINT_MAX
Edit:
However, it is interesting to notice that only the second comparison gives a warning (check the code). So it means that the compiler applying the conversion rules is confident that there won't be errors in the comparison between unsigned char and char (during comparison they will be converted to a type that can safely represent all its possible values). And he is right on this point. Then, it informs you that this won't be the case for unsigned int and int: during the comparison one of the 2 will be converted to a type that cannot fully represent it.
For completeness, I checked it also for short: the compiler behaves in the same way than for chars, and, as expected, there are no errors at runtime.
.
Related to this topic, I recently asked this question (yet, C++ oriented).

why the result is "2" of unsigned int (1) - unsigned int (0xFFFFFFFF)

Please look at the following codes:
#include <stdlib.h>
#include <stdio.h>
int main()
{
unsigned int a = 1;
unsigned int b = -1;
printf("0x%X\n", (a-b));
return 0;
}
The result is 0x2.
I think the integer promotion should not happen because the type of both of "a" and "b" are unsigned int. But the result beats me.... I don't know the reason.
By the way, I know the arithmetic result should be 2 because 1-(-1)=2. But the type of b is unsigned int. When assign the (-1) to b, the value of b is 0xFFFFFFFF actually. It is the maximum value of unsigned int. When one small unsigned value subtract one big value, the result is not that I expect.
From the answer below, I think maybe the overflow is a good explanation。
Now I writes other test codes. It proves the overflow answer is right.
#include <stdlib.h>
#include <stdio.h>
int main()
{
unsigned int c = 1;
unsigned int d = -1;
printf("0x%llx\n", (unsigned long long)c-(unsigned long long)d);
return 0;
}
The result is "0xffffffff00000002". It is I expect.
unsigned int a = 1;
This initializes a to 1. Actually, since 1 is of type int, there's an implicit int-to-unsigned conversion, but it's a trivial conversion that doesn't change the value or representation).
unsigned int b = -1;
This is more interesting. -1 is of type int, and the initialization implicitly converts it from int to unsigned int. Since the mathematical value -1 cannot be represented as an unsigned int, the conversion is non-trivial. The rule in this case is (quoting section 6.3.1.3 of the C standard):
the value is converted by repeatedly adding or subtracting one more
than the maximum value that can be represented in the new type until
the value is in the range of the new type.
Of course it doesn't actually have to do it that way, as long as the result is the same. Adding UINT_MAX+1 ("one more than the maximum value that can be represented in the new type") to -1 yields UINT_MAX. (That addition is defined mathematically; it's not itself subject to any type conversions.)
In fact, assigning -1 to an object of any unsigned type is a good way to get the maximum value of that type without having to refer to the *_MAX macros defined in <limits.h>.
So, assuming a typical 32-bit system, we have a == 1 and b == 0xFFFFFFFF.
printf("0x%X\n", (a-b));
The mathematical result of the subtraction is -0xFFFFFFFE, but that's obviously outside the range of unsigned int. The rules for unsigned arithmetic are similar to the rules for unsigned conversion; the result is 2.
Who says you're suffering integer promotion? Let's pretend that your integers are two's complement (likely, though not mandated by the standard) and they're only four bits wide (not possible according to the standard, I'm just using this to simplify things).
int unsigned-int bit-pattern
--- ------------ -----------
1 1 0001
-1 15 1111
------
(subtract with borrow) 1 0010
(only have four bits) 0010 -> 2
You can see here that you can get the result you see without any promotion to signed or wider types.
There should be a compiler warning that you probably ignored or turned off, but it's still possible to store -1 in an unsigned integer. Internally, -1 is stored on a 32-bit machine as 0xffffffff. So if you subtract 0xffffffff from 1, you end up with -0xfffffffe, which is 2. (There are no negative numbers, a negative number is the maximum integer value plus one minus the number).
Bottom line - signed or unsigned doesn't matter at all, it only comes to play when you compare values.
And mathematically speaking, 1 - (-1) = 1+1.
If you subtract a negative number, it is the equivalent of adding a positive number.
a = 1
b = -1
(a-b) = ?
((1)-(-1)) = ?
(1-(-1)) = ?
(1+1) = ?
2 = ?
At first you might think that this isn't allowed, since you specified an unsigned int; however, you are also converting signed int (the -1 constant) to an unsigned int. So, you are effectively storing the exact same bits into the unsigned int (0xFFFF).
Then, in the expression, you take the negative of the 0xFFFF value, which of course forces the number to be signed. In effect, you are circumventing the unsigned directive at ever step.

How to cast or convert an unsigned int to int in C?

My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value is past the max int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
Unsigned int can be converted to signed (or vice-versa) by simple expression as shown below :
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not targeted to the question, you would like to read following links :
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0,INT_MAX] is implementation defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the probably most portable way is IMHO the way shown in the following code snippet:
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else if (u >= (unsigned int)INT_MIN)
i = -(int)~u - 1; /*(2)*/
else
i = INT_MIN; /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it
is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow
by taking the one's complement of the value by bit-wise NOT, casts it
to 'int' (which cannot overflow now), negates the value and subtracts
one, which can also not overflow here.
Branch (3) provides the poison we have to take on one's complement or
sign/magnitude targets, because the signed integer representation
range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else
i = -(int)~u - 1; /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
And yes, some of the casts to 'unsigned int' are redundant, but
they might help the casual reader
some compilers issue warnings on signed/unsigned compares, because the implicit cast causes some non-intuitive behavior by language design
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
Some explain from C++Primer 5th Page 35
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the length will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book is very helpful.

Resources