Adding unsigned integers in C - c

Here are two very simple programs. I would expect to get the same output, but I don't. I can't figure out why. The first outputs 251. The second outputs -5. I can understand why the 251. However, I don't see why the second program gives me a -5.
PROGRAM 1:
#include <stdio.h>
int main()
{
unsigned char a;
unsigned char b;
unsigned int c;
a = 0;
b= -5;
c = (a + b);
printf("c hex: %x\n", c);
printf("c dec: %d\n",c);
}
Output:
c hex: fb
c dec: 251
PROGRAM 2:
#include <stdio.h>
int main()
{
unsigned char a;
unsigned char b;
unsigned int c;
a = 0;
b= 5;
c = (a - b);
printf("c hex: %x\n", c);
printf("c dec: %d\n",c);
}
Output:
c hex: fffffffb
c dec: -5

In the first program, b=-5; assigns 251 to b. (Conversions to an unsigned type always reduce the value modulo one plus the max value of the destination type.)
In the second program, b=5; simply assigns 5 to b, then c = (a - b); performs the subtraction 0-5 as type int due to the default promotions - put simply, "smaller than int" types are always promoted to int before being used as operands of arithmetic and bitwise operators.
Edit: One thing I missed: Since c has type unsigned int, the result -5 in the second program will be converted to unsigned int when the assignment to c is performed, resulting in UINT_MAX-4. This is what you see with the %x specifier to printf. When printing c with %d, you get undefined behavior, because %d expects a (signed) int argument and you passed an unsigned int argument with a value that's not representable in plain (signed) int.

There are two separate issues here. The first is the fact that you are getting different hex values for what looks like the same operations. The underlying fact that you are missing is that chars are promoted to ints (as are shorts) to do arithmetic. Here is the difference:
a = 0 //0x00
b = -5 //0xfb
c = (int)a + (int)b
Here, a is extended to 0x00000000 and b is extended to 0x000000fb (not sign extended, because it is an unsigned char). Then, the addition is performed, and we get 0x000000fb.
a = 0 //0x00
b = 5 //0x05
c = (int)a - (int)b
Here, a is extended to 0x00000000 and b is extended to 0x00000005. Then, the subtraction is performed, and we get 0xfffffffb.
The solution? Stick with chars or ints; mixing them can cause things you won't expect.
The second problem is that an unsigned int is being printed as -5, clearly a signed value. However, in the string, you told printf to print its second argument, interpreted as a signed int (that's what "%d" means). The trick here is that printf doesn't know what the types of the variables you passed in. It merely interprets them in the way the string tells it to. Here's an example where we tell printf to print a pointer as an int:
int main()
{
int a = 0;
int *p = &a;
printf("%d\n", p);
}
When I run this program, I get a different value each time, which is the memory location of a, converted to base 10. You may note that this kind of thing causes a warning. You should read all of the warnings your compiler gives you, and only ignore them if you're completely sure you are doing what you intend to.

You're using the format specifier %d. That treats the argument as a signed decimal number (basically int).
You get 251 from the first program because (unsigned char)-5 is 251 then you print it like a signed decimal digit. It gets promoted to 4 bytes instead of 1, and those bits are 0, so the number looks like 0000...251 (where the 251 is binary, I just didn't convert it).
You get -5 from the second program because (unsigned int)-5 is some large value, but casted to an int, it's -5. It gets treated like an int because of the way you use printf.
Use the format specifier %ud to print unsigned decimal values.

What you're seeing is the result of how the underlying machine is representing the numbers how the C standard defines signed to unsigned type conversions (for the arithmetic) and how the underlying machine is representing numbers (for the result of the undefined behavior at the end).
When I originally wrote my response I had assumed that the C standard didn't explicitly define how signed values should be converted to unsigned values, since the standard doesn't define how signed values should be represented or how to convert unsigned values to signed values when the range is outside that of the signed type.
However, it turns out that the standard does explicitly define that when converting from negative signed to positive unsigned values. In the case of an integer, a negative signed value x will be converted to UINT_MAX+1-x, just as if it were stored as a signed value in two's complement and then interpreted as an unsigned value.
So when you say:
unsigned char a;
unsigned char b;
unsigned int c;
a = 0;
b = -5;
c = a + b;
b's value becomes 251, because -5 is converted to an unsigned type of value UCHAR_MAX-5+1 (255-5+1) using the C standard. It's then after that conversion that the addition takes place. That makes a+b the same as 0 + 251, which is then stored in c. However, when you say:
unsigned char a;
unsigned char b;
unsigned int c;
a = 0;
b = 5;
c = (a-b);
printf("c dec: %d\n", c);
In this case, a and b are promoted to unsigned ints, to match with c, so they remain 0 and 5 in value. However 0 - 5 in unsigned integer math leads to an underflow error, which is defined to result in UINT_MAX+1-5. If this had happened before the promotion, the value would be UCHAR_MAX+1-5 (i.e. 251 again).
However, the reason you see -5 printed in your output is a combination of the fact that the unsigned integer UINT_MAX-4 and -5 have the same exact binary representation, just like -5 and 251 do with a single-byte datatype, and the fact that when you used "%d" as the formatting string, that told printf to interpret the value of c as a signed integer instead of an unsigned integer.
Since a conversion from unsigned values to signed values for invalid values isn't defined, the result becomes implementation specific. In your case, since the underlying machine uses two's complement for signed values, the result is that the unsigned value UINT_MAX-4 becomes the signed value -5.
The only reason this doesn't happen in the first program because an unsigned int and a signed int can both represent 251, so converting between the two is well defined and using "%d" or "%u" doesn't matter. In the second program, however, it results in undefined behavior and becomes implementation specific since your value of UINT_MAX-4 went outside the range of an signed int.
What's happening under the hood
It's always good to double check what you think is happening or what should happen with what's actually happening, so let's look at the assembly language output from the compiler now to see exactly what's going on. Here's the meaningful part of the first program:
mov BYTE PTR [rbp-1], 0 ; a becomes 0
mov BYTE PTR [rbp-2], -5 ; b becomes -5, which as an unsigned char is also 251
movzx edx, BYTE PTR [rbp-1] ; promote a by zero-extending to an unsigned int, which is now 0
movzx eax, BYTE PTR [rbp-2] ; promote b by zero-extending to an unsigned int which is now 251
add eax, edx ; add a and b, that is, 0 and 251
Notice that although we store a signed value of -5 in the byte b, when the compiler promotes it, it promotes it by zero-extending the number, meaning it's being interpreted as the unsigned value that 11111011 represents instead of the signed value. Then the promoted values are added together to become c. This is also why the C standard defines signed to unsigned conversions the way it does -- it's easy to implement the conversions on architectures that use two's complement for signed values.
Now with program 2:
mov BYTE PTR [rbp-1], 0 ; a = 0
mov BYTE PTR [rbp-2], 5 ; b = 5
movzx edx, BYTE PTR [rbp-1] ; a is promoted to 32-bit integer with value 0
movzx eax, BYTE PTR [rbp-2] ; b is promoted to a 32-bit integer with value 5
mov ecx, edx
sub ecx, eax ; a - b is now done as 32-bit integers resulting in -5, which is '4294967291' when interpreted as unsigned
We see that a and b are once again promoted before any arithmetic, so we end up subtracting two unsigned ints, which leads to a UINT_MAX-4 due to underflow, which is also -5 as a signed value. So whether you interpret it as a signed or unsigned subtraction, due to the machine using two's complement form, the result matches the C standard without any extra conversions.

Assigning a negative number to an unsigned variable is basically breaking the rules. What you're doing is converting the negative number to a large positive number. You're not even guaranteed, technically, that the conversion is the same from one processor to another -- on a 1's complement system (if any still existed) you'd get a different value, eg.
So you get what you get. You can't expect signed algebra to still apply.

Related

Using Type Modifier (signed) Comparisons

This prints "signed comparison" https://onlinegdb.com/eA87wKQkU
#include <stdio.h>
#include <stdint.h>
int main()
{
uint64_t A = -1, B = 1;
if ((signed)A < (signed)B)
{
printf("signed comparison");
}
return 0;
}
To ensure an overall signed comparison, looks like the (signed) type modifier must be applied to A and B.
Is this correct?
Also, I haven't seen any C code using ((signed)A < (signed)B) and was wondering if it's valid C89/99?
Perhaps ((int64_t)A < (int64_t)B) is a better approach?
Thanks.
The answer to both questions is yes:
if you only convert A or B as (signed), which means (signed int), the comparison will still be performed as uint64_t because the converted value will be converted to the larger type uint64_t. Converting both A and B is hence necessary.
converting to int64_t is probably a better idea as this signed type is larger, but it should not matter in this particular example: converting A, whose value is UINT64_MAX to int or int64_t is implementation defined and may or may not produce -1. The C Standard allows for an implementation defined signal to be raised by this out of range conversion.
on most current architectures, no signal will be raised and the conversion of A will indeed produce -1 and the code will print signed comparison. Yet you should end the output with a newline for proper operation.
This is a slightly unusual solution, irrelevant for all practical purposes, but it does have the one advantage of avoiding the following conversion rule from the C standard:
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
The idea of this solution is to invert the top bit of the unsigned type in the values being compared before the comparison.
Converting -1 to uint64_t produces the value 0xffffffffffffffff in A.
Converting 1 to uint64_t produces the value 0x0000000000000001 in B.
So the comparison A < B is false.
Denoting A and B with the top bit inverted as Ax and Bx respectively, then Ax has the value 0x7fffffffffffffff and Bx has the value 0x8000000000000001. The comparison Ax < Bx is true.
One way to invert the top bit of a uint64_t value is to add or subtract INT64_MIN and convert the result back to uint64_t. Converting INT64_MIN to uint64_t produces the value 0x8000000000000000 and adding or subtracting that to another uint64_t value will invert the top bit. This also works for any other exact-width unsigned type with the 64 changed to exact width in question.
So the following will do a "signed comparison" of the uint64_t values A and B:
if ((uint64_t)(A - INT64_MIN) < (uint64_t)(B - INT64_MIN))
printf("A signed less than B\n");
Type-casting the adjusted values back to the unsigned integer type as shown above is only necessary for unsigned types whose values can all be represented by int. For example, it is necessary when A and B are of type uint8_t. A would have the value 255, B would have the value 1, Ax would have value 383 (for subtraction) or 127 (for addition), Bx would have the value 129 (for subtraction) or -127 (for addition), and Ax < Bx would be false. The type-cast would convert Ax to 127 and Bx to 129 so the comparison Ax < Bx would be true.

Clarifications about unsigned type in C

Hi I'm currently learning C and there's something that I quite don't understand.
First of all I was told that if I did this:
unsigned int c2 = -1;
printf("c2 = %u\n", c2);
It would output 255, according to this table:
But I get a weird result: c2 = 4294967295
Now what's weirder is that this works:
unsigned char c2 = -1;
printf("c2 = %d\n", c2);
But I don't understand because since a char is, well, a char why does it even print anything? Since the specifier here is %d and not %u as it should be for unsigned types.
The following code:
unsigned int c2 = -1;
printf("c2 = %u\n", c2);
Will never print 255. The table you are looking at is referring to an unsigned integer of 8 bits. An int in C needs to be at least 16 bits in order to comply with the C standard (UINT_MAX defined as 2^16-1 in paragraph §5.2.4.2.1, page 22 here). Therefore the value you will see is going to be a much larger number than 255. The most common implementations use 32 bits for an int, and in that case you'll see 4294967295 (2^32 - 1).
You can check how many bits are used for any kind of variable on your system by doing sizeof(type_or_variable) * CHAR_BIT (CHAR_BIT is defined in limits.h and represents the number of bits per byte, which is again most of the times 8).
The correct code to obtain 255 as output is:
unsigned char c = -1;
printf("c = %hhu\n", c);
Where the hh prefix specifier means (from man 3 printf):
hh: A following integer conversion corresponds to a signed char or unsigned char argument, or a following n conversion corresponds to a pointer to a signed char argument.
Anything else is just implementation defined or even worse undefined behavior.
In this declaration
unsigned char c2 = -1;
the internal representation of -1 is truncated to one byte and interpreted as unsigned char. That is all bits of the object c2 are set.
In this call
printf("c2 = %d\n", c2);
the argument that has the type unsigned char is promoted to the type int preserving its value that is 255. This value is outputted as an integer.
Is this declaration
unsigned int c2 = -1;
there is no truncation. The integer value -1 that usually occupies 4 bytes (according to the size of the type int) is interpreted as an unsigned value with all bits set.
So in this call
printf("c2 = %u\n", c2);
there is outputted the maximum value of the type unsigned int. It is the maximum value because all bits in the internal representation are set. The conversion from signed integer type to a larger unsigned integer type preserve the sign propagating it to the width of the unsigned integer object.
In C integer can have multiple representations, so multiple storage sizes and value ranges
refer to the table below for more details.

Whats wrong with this C code?

My sourcecode:
#include <stdio.h>
int main()
{
char myArray[150];
int n = sizeof(myArray);
for(int i = 0; i < n; i++)
{
myArray[i] = i + 1;
printf("%d\n", myArray[i]);
}
return 0;
}
I'm using Ubuntu 14 and gcc to compile it, what it prints out is:
1
2
3
...
125
126
127
-128
-127
-126
-125
...
Why doesn't it just count up to 150?
int value of a char can range from 0 to 255 or -127 to 127, depending on implementation.
Therefore once the value reaches 127 in your case, it overflows and you get negative value as output.
The signedness of a plain char is implementation defined.
In your case, a char is a signed char, which can hold the value of a range to -128 to +127.
As you're incrementing the value of i beyond the limit signed char can hold and trying to assign the same to myArray[i] you're facing an implementation-defined behaviour.
To quote C11, chapter §6.3.1.4,
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Because a char is a SIGNED BYTE. That means it's value range is -128 -> 127.
EDIT Due to all the below comment suggesting this is wrong / not the issue / signdness / what not...
Running this code:
char a, b;
unsigned char c, d;
int si, ui, t;
t = 200;
a = b = t;
c = d = t;
si = a + b;
ui = c + d;
printf("Signed:%d | Unsigned:%d", si, ui);
Prints: Signed:-112 | Unsigned:400
Try yourself
The reason is the same. a & b are signed chars (signed variables of size byte - 8bits). c & d are unsigned. Assigning 200 to the signed variables overflows and they get the value -56. In memory, a, b,c&d` all hold the same value, but when used their type "signdness" dictates how the value is used, and in this case it makes a big difference.
Note about standard
It has been noted (in the comments to this answer, as well as other answers) that the standard doesn't mandate that char is signed. That is true. However, in the case presented by OP, as well the code above, char IS signed.
It seems that your compiler by default considers type char like type signed char. In this case CHAR_MIN is equal to SCHAR_MIN and in turn equal to -128 while CHAR_MAX is equal to SCHAR_MAX and in turn equal to 127 (See header <limits.h>)
According to the C Standard (6.2.5 Types)
15 The three types char, signed char, and unsigned char are
collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char
For signed types one bit is used as the sign bit. So for the type signed char the maximum value corresponds to the following representation in the hexadecimal notation
0x7F
and equal to 127. The most significant bit is the signed bit and is equal to 0.
For negative values the signed bit is set to 1 and for example -128 is represented like
0x80
When in your program the value stored in char reaches its positive maximum 0x7Fand was increased it becomes equal to 0x80 that in the decimal notation is equal to -128.
You should explicitly use type unsigned char instead of the char if you want that the result of the program execution did not depend on the compiler settings.
Or in the printf statement you could explicitly cast type char to type unsigned char. For example
printf("%d\n", ( unsigned char )myArray[i]);
Or to compare results you could write in the loop
printf("%d %d\n", myArray[i], ( unsigned char )myArray[i]);

Initializing unsigned short int to signed value

#include<stdio.h>
int main()
{
unsigned short a=-1;
printf("%d",a);
return 0;
}
This is giving me output 65535. why?
When I increased the value of a in negative side the output is (2^16-1=)65535-a.
I know the range of unsigned short int is 0 to 65535.
But why is rotating in the range 0 to 65535.What is going inside?
#include<stdio.h>
int main()
{
unsigned int a=-1;
printf("%d",a);
return 0;
}
Output is -1.
%d is used for signed decimal integer than why here it is not following the rule of printing the largest value of its(int) range.
Why the output in this part is -1?
I know %u is used for printing unsigned decimal integer.
Why the behavioral is undefined in second code and not in first.?
This I have compiled in gcc compiler. It's a C code
On my machine sizeof short int is 2 bytes and size of int is 4 bytes.
In your implementation, short is 16 bits and int is 32 bits.
unsigned short a=-1;
printf("%d",a);
First, -1 is converted to unsigned short. This results in the value 65535. For the precise definition see the standard "integer conversions". To summarize: the value is taken modulo USHORT_MAX+1.
This value 65535 is assigned to a.
Then for the printf, which uses varargs, the value is promoted back to int. varargs never pass integer types smaller than int, they're always converted to int. This results in the value 65535, which is printed.
unsigned int a=-1;
printf("%d",a);
First line, same as before but modulo UINT_MAX+1. a is 4294967295.
For the printf, a is passed as an unsigned int. Since %d requires an int the behavior is undefined by the C standard. But your implementation appears to have reinterpreted the unsigned value 4294967295, which has all bits set, as as a signed integer with all-bits-set, i.e. the two's-complement value -1. This behavior is common but not guaranteed.
Variable assignment is done to the amount of memory of the type of the variable (e.g., short is 2 bytes, int is 4 bytes, in 32 bit hardware, typically). Sign of the variable is not important in the assignment. What matters here is how you are going to access it. When you assign to a 'short' (signed/unsigned) you assign the value to a '2 bytes' memory. Now if you are going to use '%d' in printf, printf will consider it 'integer' (4 bytes in your hardware) and the two MSBs will be 0 and hence you got [0|0](two MSBs) [-1] (two LSBs). Due to the new MSBs (introduced by %d in printf, migration) your sign bit is hidden in the LSBs and hence printf considers it unsigned (due to the MSBs being 0) and you see the positive value. To get a negative in this you need to use '%hd' in first case. In the second case you assigned to '4 bytes' memory and the MSB got its SIGN bit '1' (means negative) during assignment and hence you see the negative number in '%d' of printf. Hope it explains. For more clarification please comment on the answer.
NB: I used 'MSB' for a shorthand of higher-order byte(s). Please read it according to the context (e.g., 'SIGN bit' will make you read like 'Most Significant Bit'). Thanks.

Comparison operation on unsigned and signed integers

See this code snippet
int main()
{
unsigned int a = 1000;
int b = -1;
if (a>b) printf("A is BIG! %d\n", a-b);
else printf("a is SMALL! %d\n", a-b);
return 0;
}
This gives the output: a is SMALL: 1001
I don't understand what's happening here. How does the > operator work here? Why is "a" smaller than "b"? If it is indeed smaller, why do i get a positive number (1001) as the difference?
Binary operations between different integral types are performed within a "common" type defined by so called usual arithmetic conversions (see the language specification, 6.3.1.8). In your case the "common" type is unsigned int. This means that int operand (your b) will get converted to unsigned int before the comparison, as well as for the purpose of performing subtraction.
When -1 is converted to unsigned int the result is the maximal possible unsigned int value (same as UINT_MAX). Needless to say, it is going to be greater than your unsigned 1000 value, meaning that a > b is indeed false and a is indeed small compared to (unsigned) b. The if in your code should resolve to else branch, which is what you observed in your experiment.
The same conversion rules apply to subtraction. Your a-b is really interpreted as a - (unsigned) b and the result has type unsigned int. Such value cannot be printed with %d format specifier, since %d only works with signed values. Your attempt to print it with %d results in undefined behavior, so the value that you see printed (even though it has a logical deterministic explanation in practice) is completely meaningless from the point of view of C language.
Edit: Actually, I could be wrong about the undefined behavior part. According to C language specification, the common part of the range of the corresponding signed and unsigned integer type shall have identical representation (implying, according to the footnote 31, "interchangeability as arguments to functions"). So, the result of a - b expression is unsigned 1001 as described above, and unless I'm missing something, it is legal to print this specific unsigned value with %d specifier, since it falls within the positive range of int. Printing (unsigned) INT_MAX + 1 with %d would be undefined, but 1001u is fine.
On a typical implementation where int is 32-bit, -1 when converted to an unsigned int is 4,294,967,295 which is indeed ≥ 1000.
Even if you treat the subtraction in an unsigned world, 1000 - (4,294,967,295) = -4,294,966,295 = 1,001 which is what you get.
That's why gcc will spit a warning when you compare unsigned with signed. (If you don't see a warning, pass the -Wsign-compare flag.)
You are doing unsigned comparison, i.e. comparing 1000 to 2^32 - 1.
The output is signed because of %d in printf.
N.B. sometimes the behavior when you mix signed and unsigned operands is compiler-specific. I think it's best to avoid them and do casts when in doubt.
#include<stdio.h>
int main()
{
int a = 1000;
signed int b = -1, c = -2;
printf("%d",(unsigned int)b);
printf("%d\n",(unsigned int)c);
printf("%d\n",(unsigned int)a);
if(1000>-1){
printf("\ntrue");
}
else
printf("\nfalse");
return 0;
}
For this you need to understand the precedence of operators
Relational Operators works left to right ...
so when it comes
if(1000>-1)
then first of all it will change -1 to unsigned integer because int is by default treated as unsigned number and it range it greater than the signed number
-1 will change into the unsigned number ,it changes into a very big number
Find a easy way to compare, maybe useful when you can not get rid of unsigned declaration, (for example, [NSArray count]), just force the "unsigned int" to an "int".
Please correct me if I am wrong.
if (((int)a)>b) {
....
}
The hardware is designed to compare signed to signed and unsigned to unsigned.
If you want the arithmetic result, convert the unsigned value to a larger signed type first. Otherwise the compiler wil assume that the comparison is really between unsigned values.
And -1 is represented as 1111..1111, so it a very big quantity ... The biggest ... When interpreted as unsigned.
while comparing a>b where a is unsigned int type and b is int type, b is type casted to unsigned int so, signed int value -1 is converted into MAX value of unsigned**(range: 0 to (2^32)-1 )**
Thus, a>b i.e., (1000>4294967296) becomes false. Hence else loop printf("a is SMALL! %d\n", a-b); executed.

Resources