Variable conversion mixup in C - c

I have the following C code and I have to understand why the result is a fairly big positive number:
int k;
unsigned int l;
float f;
f=4; l = 1; k = -2;
printf("\n %f", (unsigned int)l+k+f);
The result is a very large number (around 4 billion, max for 32 bit integers), so I suspect it has something to do with the representation of signed negative integers (two's complement) that looks fairly big if we look at it as unsigned.
However I don't quite understand why it behaves this way and what the float has to do with the behavior (if I remove it it stops doing it). Could someone explain how does it do that ? What is the process that happens when adding the numbers that leads to this result ?

The problem is that when you add a signed int to an unsigned, C converts the result to an unsigned int, even negative ones. Since k is negative, it gets re-interpreted as a large positive number before the addition. After that the f is added, but it is small in comparison to negative 2 re-interpreted as a positive number.
Here is a short illustration of the problem:
int k = -2;
unsigned int l = 1;
printf("\n %u", l+k);
This prints 4294967295 on a 32-bit system, because -2 in two's complement representation is 0xFFFFFFFE (demo).

The answer is that when the compiler does the l+k it typecasts the k to unsigned int which turns it into that big number. If you change the order of the vars to this: l+f+k this behavior will not occur.

What happens is that a negative integer is converted to an unsigned integer by adding the maximum unsigned value, plus one, repeatedly to the signed value until it's representable. This is why you see such a large value.
With two's complement, it's the same as truncating or sign extending the value and interpreting it as unsigned which is simple for a computer to do.

Related

Negative power of 2

I am learning the characteristics of the different data type. For example, this program increasingly prints the power of 2 with four different formats: integer, unsigned integer, hexadecimal, octal
#include<stdio.h>
int main(int argc, char *argv[]){
int i, val = 1;
for (i = 1; i < 35; ++i) {
printf("%15d%15u%15x%15o\n", val, val, val, val);
val *= 2;
}
return 0;
}
It works. unsigned goes up to 2147483648. integer goes up to -2147483648. But why does it become negative?
I have a theory: is it because the maximum signed integer we can represent on a 32 bit machine is 2147483647? If so, why does it return the negative number?
First of all, you should understand that this program is undefined. It causes signed integer overflow, and this is declared undefined in the C Standard.
The technical reason is that no behavior can be predicted as different representations are allowed for negative numbers and there could even be padding bits in the representation.
The most probable reason you see a negative number in your case is that your machine uses 2's complement (look it up) to represent negative numbers while arithmetics operate on bits without overflow checks. Therefore, the highest bit is the sign bit, and if your value overflows into this bit, it turns negative.
What you describe is UB caused by integer overflow. Since the behavior is undefined, anything could happen (“When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose”), BUT, what actually happens on some machines (I suspect yours included) is this:
You start with int val = 1;. That is represented 0b00...1 in binary form. Each time you val *= 2; the value is multiplied by 2, therefore the representation changes to 0b00...10 and then to 0b00...100 and so on (the 1 bit moves one step each time). The last time you val *= 2; you get 0b100.... Now, using 2's complement (which is what I guess your machine uses, as it very common) the value is actually -1 * 0b1000... which is -2147483648
Note, that even though this might be what's really going on in your machine, it's not to be trusted or thought of as the "right" thing to happen, since, as mentioned before, this is UB
In this program, the val value will overflow, if it is a 32- bit machine, because the size of integer is 4 bytes. Now, we have 2 type of values in math, positive and negative, so to do calculation involving negative results, we use sign representations i.e int or char in C language.
Lets take the example of char, range -128 to 127, unsigned char range 0-255 .
It tells, range is divided into two parts for signed representation. So for any signed variable, if it crosses its range of +ve value, it goes into negative value. Like here in case of char, as the value goes above the 127, it becomes -ve. And suppose if you add 300 to any char or unsigned char variable what happens, it rolls over and starts again from zero.
char a=2;
a+=300;
what is the value?? now you know max value of char is 255(total 256 values, including zero), so 300-256 = 44 + 2 =46.
Hope this helps

Output of a short int and a unsigned short?

So I'm trying to interpret the following output:
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("v = %d, uv = %u\n", v, uv);
Output:
v = -12345
uv = 53191
So the question is: why is this exact output generated when this program is run on a two's complement machine?
What operations lead to this result when casting the value to unsigned short?
My answer assumes 16-bit two's complement arithmetic.
To find the value of -12345, take 12345, complement it, and add 1.
12345 is 0x3039 is 0011000000111001.
Complementing means changing all the 1's to 0's and all the 0's to 1's:
1100111111000110 is 0xcfc6 is 53190.
Add one: 53191.
So internally, -12345 is represented by 0xcfc7 = 53191.
But if you interpret it as an unsigned number, it's obviously just 53191. (And when you assign a signed value to an unsigned integer of the same size, what typically ends up happening is that you assign the exact bit pattern, without converting anything. Later, however, you will typically interpret that value differently, such as when you print it with %u.)
Another, perhaps easier way to think about this is that 16-bit arithmetic "wraps around" at 216 = 65536. So you can think of 65536 as being another name for 0 (just like 0:00 and 24:00 are both names for midnight). So -12345 is 65536 - 12345 = 53191.
The conversion rules, when converting signed integer to an unsigned integer, defined by C standard requires by repeatedly adding the TYPE_MAX + 1 to the value.
From 6.3.1.3 Signed and unsigned integers:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
If USHRT_MAX is 65535 and then adding 65535 + 1 + -12345 is 53191.
The output seen does not depend on 2's complement nor 16 or 32- bit int. The output seen is entirely defined and would be the same on a rare 1's complement or sign-magnitude machine. The result does depend on 16-bit unsigned short
-12345 is within the minimum range of a short, so no issues with that assignment. When a short is passed as a ... argument, is goes thought the usual promotion to an int with no change in value as all short are in the range of int. "%d" expects an int, so the output is "-12345"
short int v = -12345;
printf("%d\n", v); // output "-12345\n"
Assigning a negative number to a unsigned type is well defined. With a 16-bit unsigned short, the value of uv is -12345 plus the minimum multiples of USHRT_MAX+1 (65536 in this case) to a final value of 53191.
Passing an unsigned short as an ... argument, the value is converted to int or unsigned, whichever type contains the entire range of unsigned short. IAC, the values does not change. "%u" matches an unsigned. It also matches an int whose values are expressible as either an int or unsigned.
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("%u\n", v); // output "53191\n"
What operations lead to this result when casting the value to unsigned short?
The casting did not affect the final outcome. The same result would have occurred without the cast. The cast may be useful to quiet warnings.

How does C store negative numbers in signed vs unsigned integers?

Here is the example:
#include <stdio.h>
int main()
{
int x=35;
int y=-35;
unsigned int z=35;
unsigned int p=-35;
signed int q=-35;
printf("Int(35d)=%d\n\
Int(-35d)=%d\n\
UInt(35u)=%u\n\
UInt(-35u)=%u\n\
UInt(-35d)=%d\n\
SInt(-35u)=%u\n",x,y,z,p,p,q);
return 0;
}
Output:
Int(35d)=35
Int(-35d)=-35
UInt(35u)=35
UInt(-35u)=4294967261
UInt(-35d)=-35
SInt(-35u)=4294967261
Does it really matter if I declare the value as signed or unsigned int? Because, C actually only cares about how I read the value from memory. Please help me understand this and I hope you prove me wrong.
Representation of signed integers is up to the underlying platform, not the C language itself. The language definition is mostly agnostic with regard to signed integer representations. Two's complement is probably the most common, but there are other representations such as one's complement and signed magnitude.
In a two's complement system, you negate a value by inverting the bits and adding 1. To get from 5 to -5, you'd do:
5 == 0101 => 1010 + 1 == 1011 == -5
To go from -5 back to 5, you follow the same procedure:
-5 == 1011 => 0100 + 1 == 0101 == 5
Does it really matter if I declare the value as signed or unsigned int?
Yes, for the following reasons:
It affects the values you can represent: unsigned integers can represent values from 0 to 2N-1, whereas signed integers can represent values between -2N-1 and 2N-1-1 (two's complement).
Overflow is well-defined for unsigned integers; UINT_MAX + 1 will "wrap" back to 0. Overflow is not well-defined for signed integers, and INT_MAX + 1 may "wrap" to INT_MIN, or it may not.
Because of 1 and 2, it affects arithmetic results, especially if you mix signed and unsigned variables in the same expression (in which case the result may not be well defined if there's an overflow).
An unsigned int and a signed int take up the same number of bytes in memory. They can store the same byte values. However the data will be treated differently depending on if it's signed or unsigned.
See http://en.wikipedia.org/wiki/Two%27s_complement for an explanation of the most common way to represent integer values.
Since you can typecast in C you can effectively force the compiler to treat an unsigned int as signed int and vice versa, but beware that it doesn't mean it will do what you think or that the representation will be correct. (Overflowing a signed integer invokes undefined behaviour in C).
(As pointed out in comments, there are other ways to represent integers than two's complement, however two's complement is the most common way on desktop machines.)
Does it really matter if I declare the value as signed or unsigned int?
Yes.
For example, have a look at
#include <stdio.h>
int main()
{
int a = -4;
int b = -3;
unsigned int c = -4;
unsigned int d = -3;
printf("%f\n%f\n%f\n%f\n", 1.0 * a/b, 1.0 * c/d, 1.0*a/d, 1.*c/b);
}
and its output
1.333333
1.000000
-0.000000
-1431655764.000000
which clearly shows that it makes a huge difference if I have the same byte representation interpreted as signed or unsigned.
#include <stdio.h>
int main(){
int x = 35, y = -35;
unsigned int z = 35, p = -35;
signed int q = -35;
printf("x=%d\tx=%u\ty=%d\ty=%u\tz=%d\tz=%u\tp=%d\tp=%u\tq=%d\tq=%u\t",x,x,y,y,z,z,p,p,q,q);
}
the result is:
x=35 x=35 y=-35 y=4294967261 z=35 z=35 p=-35 p=4294967261 q=-35 q=4294967261
the int number store is not different, it stored with Complement style in memory,
I can use 0X... the 35 in 0X00000023, and the -35 in 0Xffffffdd, it is not different you use sigend or unsigend. it only output with different sytle. The %d and %u is not different about positive, but the negative the first position is sign, if you output with %u is 0Xffffffdd equal 4294967261, but the %d the 0Xffffffdd can be - 0X00000023 equal -35.
The most fundamental thing that variable's type defines is the way it is stored (that is - read from and written to) in memory and how are the bits interpreted, so your statement can be considered "valid".
You can also look at the problem using conversions. When you store signed and negative value in unsigned variable it gets converted to unsigned. It so happens that this conversion is reversible, so signed -35 converts to unsigned 4294967261, which - when you request it - can be converted to signed -35. That's how 2's complement encoding (see link in other answer) works.

Difference between '(unsigned)1' and '(unsigned)~0'

What is the difference between (unsigned)~0 and (unsigned)1. Why is unsigned of ~0 is -1 and unsigned of 1 is 1? Does it have something to do with the way unsigned numbers are stored in the memory. Why does an unsigned number give a signed result. It didn't give any overflow error either. I am using GCC compiler:
#include<sdio.h>
main()
{
unsigned int x=(unsigned)~0;
unsigned int y=(unsigned)1;
printf("%d\n",x); //prints -1
printf("%d\n",y); //prints 1
}
Because %d is a signed int specifier. Use %u.
which prints 4294967295 on my machine.
As others mentioned, if you interpret the highest unsigned value as signed, you get -1, see the wikipedia entry for two's complement.
Your system uses two's complement representation of negative numbers. In this representation a binary number composed of all ones represent the biggest negative number -1.
Since inverting all bits of a zero gives you a number composed of all ones, you get -1 when you re-interpret the number as a signed number by printing it with a %d which expects a signed number, not an unsigned one.
First, in your use of printf you are telling it to print the number as signed ("%d") instead of unsigned ("%u").
Second, you are right in that it has "something to do with the way numbers are stored in memory". An int (signed or unsigned) is not a single bit on your computer, but a collection of k bits. The exact value of k depends on the specifics of your computer architecture, but most likely you have k=32.
For the sake of succinctness, lets assume your ints are 8 bits long, so k=8 (this is most certainly not the case, unless you are working on a very limited embedded system,). In that case (int)0 is actually 00000000, and (int)~0 (which negates all the bits) is 11111111.
Finally, in two's complement (which is the most common binary representation of signed numbers), 11111111 is actually -1. See http://en.wikipedia.org/wiki/Two's_complement for a description of two's complement.
If you changed your print to use "%u", then it will print a positive integer that represents (2^k-1) where k is the number of bits in an integer (so probably it will print 4294967295).
printf() only knows what type of variable you passed it by what format specifiers you used in your format string. So what's happening here is that you're printing x and y as signed integers, because you used %d in your format string. Try %u instead, and you'll get a result more in line with what you're probably expecting.

Assigning negative numbers to an unsigned int?

In the C programming language, unsigned int is used to store positive values only. However, when I run the following code:
unsigned int x = -12;
printf("%d", x);
The output is still -12. I thought it should have printed out: 12, or am I misunderstanding something?
The -12 to the right of your equals sign is set up as a signed integer (probably 32 bits in size) and will have the hexadecimal value 0xFFFFFFF4. The compiler generates code to move this signed integer into your unsigned integer x which is also a 32 bit entity. The compiler assumes you only have a positive value to the right of the equals sign so it simply moves all 32 bits into x. x now has the value 0xFFFFFFF4 which is 4294967284 if interpreted as a positive number. But the printf format of %d says the 32 bits are to be interpreted as a signed integer so you get -12. If you had used %u it would have printed as 4294967284.
In either case you don't get what you expected since C language "trusts" the writer of code to only ask for "sensible" things. This is common in C. If you wanted to assign a value to x and were not sure whether the value on the right side of the equals was positive you could have written unsigned int x = abs(-12); and forced the compiler to generate code to take the absolute value of a signed integer before moving it to the unsigned integer.
The int is unsinged, but you've told printf to look at it as a signed int.
Try
unsigned int x = -12; printf("%u", x);
It won't print "12", but will print the max value of an unsigned int minus 11.
Exercise to the reader is to find out why :)
Passing %d to printf tells printf to treat the argument as a signed integer, regardless of what you actually pass. Use %u to print as unsigned.
It all has to do with interpretation of the value.
If you assume 16 bit signed and unsigned integers, then here some examples that aren't exactly correct, but demonstrate the concept.
0000 0000 0000 1100 unsigned int, and signed int value 12
1000 0000 0000 1100 signed int value -12, and a large unsigned integer.
For signed integers, the bit on the left is the sign bit.
0 = positive
1 = negative
For unsigned integers, there is no sign bit.
the left hand bit, lets you store a larger number instead.
So the reason you are not seeing what you are expecting is that.
unsigned int x = -12, takes -12 as an integer, and stores it into x. x is unsigned, so
what was a sign bit, is now a piece of the value.
printf lets you tell the compiler how you want a value to be displayed.
%d means display it as if it were a signed int.
%u means display it as if it were an unsigned int.
c lets you do this kind of stuff. You the programmer are in control.
Kind of like a firearm.
It's a tool.
You can use it correctly to deal with certain situations,
or incorrectly to remove one of your toes.
one possibly useful case is the following
unsigned int allBitsOn = -1;
That particular value sets all of the bits to 1
1111 1111 1111 1111
that can be useful sometimes.
printf('%d', x);
Means print a signed integer. You'll have to write this instead:
printf('%u', x);
Also, it'll still not print "12", it's going to be "4294967284".
They do store positive values. But you're outputting the (very high) positive value as a signed integer, so it gets re-interpreted again (in an implementation-defined fashion, I might add).
Use the format flag "%u instead.
Your program has undefined behavior because you passed the wrong type to printf (you told it you were going to pass an int but you passed an unsigned int). Consider yourself lucky that the "easiest" thing for the implementation to do was just silently print the wrong value and not jump to some code that does something harmful...
What you are missing is that the printf("%d",x) expects x to be signed, so although you assign -12 to x it is interpreted as 2's complement which would be a very large number.
However when you pass this really large number to printf it interprets it as signed thus correctly translating it back to -12.
The correct syntax to print a unsigned in print f is "%u" - try this and see what it does!
The assignment of a negative value to an unsigned int does not compute the absolute value of the negative: it interprets as an unsigned int the binary representation of the negative value, i.e., 4294967284 (2^32 - 12).
printf("%d") performs the opposite interpretation. This is why your program displays -12.
int and unsigned int are used to allocate a number of bytes to store a value nothing more.
The compiler should give warnings about signed mismatching but it really does not affect the bits in the memory that represent the value -12.
%x, %d, %u etc tells the compiler how to interrupt a number of bits when you print them.
When you are trying to display the int value you are passing it to a (int) argument and not a (unsigned int) argument and that causes it to print -12 and not 4294967284. Integers are stored in hexadecimal format and -12 for int is the same as 4294967284 for unsigned int in hexadecimal format..
That is why "%u" prints the right value you want and not "%d".. It depends on your argument type..GOOD LUCK!
The -12 is in 16-bit 2's compliment format. So do this:
if (x & 0x8000) { x = ~x+1; }
This will convert the 2's compliment -ve number to the equivalent +ve number. Good luck.
When the compiler implicitly converts -12 to an unsigned integer, the underlying binary representation remains unaltered. This conversion is purely semantic. The sign bit of the two's complement integer becomes the most significant bit of the unsigned integer. Thus when printf treats the unsigned integer as a signed integer with %d, it will see -12.
In general context when only positive numbers can be stored, negative numbers are not stored explicitly but their 2's complement is stored. In the same way here, the 2's complement of -12 will be stored in 'x' and you use %u to get it.

Resources