I am learning the characteristics of the different data types. For example, this program prints increasing powers of 2 in four different formats: signed integer, unsigned integer, hexadecimal, and octal:
#include <stdio.h>

int main(int argc, char *argv[]) {
    int i, val = 1;
    for (i = 1; i < 35; ++i) {
        printf("%15d%15u%15x%15o\n", val, val, val, val);
        val *= 2;
    }
    return 0;
}
It works. The unsigned column goes up to 2147483648, but the signed column goes up to -2147483648. Why does it become negative?
I have a theory: is it because the maximum signed integer representable on a 32-bit machine is 2147483647? If so, why do I get a negative number instead of an error?
First of all, you should understand that this program's behavior is undefined: it causes signed integer overflow, which the C Standard declares to be undefined behavior.
The technical reason is that no behavior can be predicted: different representations are allowed for negative numbers, and there could even be padding bits in the representation.
The most probable reason you see a negative number in your case is that your machine uses two's complement (look it up) to represent negative numbers, while the arithmetic operates on the bits without any overflow checks. The highest bit is therefore the sign bit, and when your value overflows into that bit, it turns negative.
What you describe is UB caused by signed integer overflow. Since the behavior is undefined, anything could happen (“When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose”), BUT what actually happens on some machines (I suspect yours included) is this:
You start with int val = 1;, which is represented in binary as 0b00...1. Each time you execute val *= 2;, the 1 bit moves one position to the left: 0b00...10, then 0b00...100, and so on. The last val *= 2; produces 0b100...0. In two's complement (which is what I guess your machine uses, as it is very common), that top bit carries the weight -2^31, so the value is -2147483648.
Note that even though this might be what's really going on in your machine, it's not to be trusted or thought of as the "right" thing to happen, since, as mentioned before, this is UB.
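If you want to watch the bit pattern move without invoking UB yourself, here is a minimal sketch that does the doubling on an unsigned copy. The cast back to int for display is implementation-defined, not undefined; the sketch assumes a 32-bit int:

#include <stdio.h>

int main(void) {
    unsigned int bits = 1;
    for (int i = 1; i <= 32; ++i) {
        /* unsigned arithmetic wraps, so this loop is well defined */
        printf("step %2d: %#010x  as signed: %d\n", i, bits, (int)bits);
        bits *= 2;
    }
    return 0;
}

On a typical two's complement machine the last line shows 0x80000000 printed as -2147483648.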
In this program, val will overflow on a 32-bit machine, because the size of an int is 4 bytes. In math we have two kinds of values, positive and negative, so to do calculations involving negative results we use signed representations, i.e. int or char in the C language.
Let's take the example of char, range -128 to 127; unsigned char, range 0 to 255.
So the range is split into two halves for the signed representation. For any signed variable, if it crosses the top of its positive range, it wraps into negative values: for char, once the value goes above 127, it becomes negative. And if you add, say, 300 to a char or unsigned char variable, it rolls over and starts again from zero.
char a = 2;
a += 300;

What is the value? You know a char holds 256 distinct values (including zero), so the addition wraps around: 300 - 256 = 44, and 44 + 2 = 46.
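A minimal program to check that arithmetic (using unsigned char, where the wraparound is fully defined; 46 fits in a signed char too, so a plain char typically ends up with the same value):

#include <stdio.h>

int main(void) {
    unsigned char a = 2;
    a += 300;            /* 2 + 300 = 302; 302 mod 256 = 46 */
    printf("%d\n", a);   /* prints 46 */
    return 0;
}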
Hope this helps
Related
Can you guys explain to me how overflow and underflow work for signed char and unsigned char?
int main() {
    signed char c;
    scanf("%d", &c);
    printf("%d\n", c);
    printf("%c\n", c);
    return 0;
}
In this case, if, thanks to scanf, I set c = 200, there is an overflow, and this is shown by the first printf.
The second printf still gives me the symbol for character code 200...
Why?
scanf's %d expects a pointer to int, so giving it anything else is undefined behavior.
You should do this instead:

int d;
scanf("%d", &d);
whatevertype c = (whatevertype)d;
However, signed integer overflow is undefined. If you instead use unsigned types, like

unsigned char c = (unsigned char)d;

then c is guaranteed to be d modulo 2 to the power of the number of bits in an unsigned char.
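Putting that together, a complete sketch of the suggested fix for the original snippet (using unsigned char, where the conversion is well defined):

#include <stdio.h>

int main(void) {
    int d;
    if (scanf("%d", &d) != 1)   /* %d must be given an int* */
        return 1;
    unsigned char c = (unsigned char)d;  /* d modulo 256 with 8-bit chars */
    printf("%d\n", c);   /* entering 200 prints 200 */
    printf("%c\n", c);
    return 0;
}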
Every compiler runs on a specific machine, with specific hardware.
Say, for example, that our machine's signed integers are 16 bits wide. This means MAX_INT will be the hex value 0x7fff, which is 32767 in decimal, and MIN_INT will be the hex value 0x8000, which is -32768 in decimal.
Most machines have an ALU control register that defines how signed integers behave in case of an overflow. This register generally has a saturation flag.
Overflow Example:
If the saturation flag is set, then when the result of the last signed-integer ALU operation is bigger than MAX_INT, the result is clamped to MAX_INT. For example, if the last operation was adding 0x7ffe to 0x2, the result will be 0x7fff.
If the saturation flag is not set, then when the result is bigger than MAX_INT, the result will probably be the lower 16 bits of the true result. In our case 0x7ffe + 0x2 = 0x8000, which is the minimum integer.
In the case of unsigned integers, the compiler guarantees us that the result will follow the definition of unsigned int addition in C.
Underflow Example:
Every machine has a MIN_FLOAT definition. Again, if the saturation flag is set, a result smaller than MIN_FLOAT will be rounded to MIN_FLOAT; otherwise the result is whatever the processor's operation produces. (Search the internet for the terms mantissa and exponent if you are interested in floating-point representation and operations.)
The reason you get an overflow in the first printf statement is that you defined c as a signed char and printed it with %d, the integer format. A char occupies one byte of memory, and declaring it signed splits its 256 values into positive and negative halves. In simple terms: an unsigned byte ranges from 0 to 255, while a signed char gives you both positive and negative values, ranging from 0 to 127 and from -1 to -128.
I have the following C code and I have to understand why the result is a fairly big positive number:
int k;
unsigned int l;
float f;

f = 4; l = 1; k = -2;
printf("\n %f", (unsigned int)l + k + f);
The result is a very large number (around 4 billion, close to the maximum of a 32-bit unsigned integer), so I suspect it has something to do with the representation of signed negative integers (two's complement), which looks fairly big if we read it as unsigned.
However, I don't quite understand why it behaves this way and what the float has to do with it (if I remove the float, it stops happening). Could someone explain what happens when these numbers are added that leads to this result?
The problem is that when you add a signed int to an unsigned int, C converts the signed operand to unsigned before the addition. Since k is negative, it gets re-interpreted as a large positive number. After that, f is added, but it is small compared to -2 re-interpreted as a positive number.
Here is a short illustration of the problem:
int k = -2;
unsigned int l = 1;
printf("\n %u", l + k);
This prints 4294967295 on a 32-bit system, because -2 converted to unsigned is 0xFFFFFFFE, and adding 1 gives 0xFFFFFFFF.
The answer is that when the compiler evaluates l + k, it converts k to unsigned int, which turns it into that big number. If you change the order of the operands to l + f + k, this behavior does not occur, because l is converted to float first.
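A minimal sketch contrasting the two operand orders (assumes a 32-bit int):

#include <stdio.h>

int main(void) {
    int k = -2;
    unsigned int l = 1;
    float f = 4;
    printf("%f\n", l + k + f); /* (l + k) wraps to UINT_MAX, then + f: huge */
    printf("%f\n", l + f + k); /* l converts to float first: prints 3.000000 */
    return 0;
}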
What happens is that the negative integer is converted to an unsigned integer by repeatedly adding one more than the maximum unsigned value until the result is representable. This is why you see such a large value.
With two's complement, it's the same as truncating or sign extending the value and interpreting it as unsigned which is simple for a computer to do.
The following C code displays the result correctly, -1.
#include <stdio.h>
int main(void)
{
    unsigned x = 1;
    unsigned y = x - 2;
    printf("%d", y);
}
But in general, is it always safe to do subtraction involving unsigned integers?
The reason I ask is that I want to do some conditional processing as follows:
unsigned x = 1; // x was defined by someone else as unsigned,
                // which I had better not change
for (int i = -5; i < 5; i++) {
    if (x + i < 0) continue;
    f(x + i); // f is a function
}
Is it safe to do so?
How are unsigned integers and signed integers different in representing integers? Thanks!
1: Yes, it is safe to subtract unsigned integers. The definition of arithmetic on unsigned integers includes that if an out-of-range value would be generated, then that value should be adjusted modulo the maximum value for the type, plus one. (This definition is equivalent to truncating high bits).
Your posted code has a bug though: printf("%d", y); causes undefined behaviour because %d expects an int, but you supplied unsigned int. Use %u to correct this.
2: When you write x+i, the i is converted to unsigned. The result of the whole expression is a well-defined unsigned value. Since an unsigned can never be negative, your test will always fail.
You also need to be careful using relational operators, because the same implicit conversion occurs there. Before I give you a fix for the code in section 2: what do you want to pass to f when x is UINT_MAX or close to it? And what is the prototype of f?
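One possible shape for such a fix, as a sketch against your loop (safe here because i stays in [-5, 4], so -i cannot overflow):

for (int i = -5; i < 5; i++) {
    /* mathematically, x + i < 0 exactly when i < 0 and x < -i */
    if (i < 0 && x < (unsigned)-i) continue;
    f(x + i);
}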
3: Unsigned integers use a "pure binary" representation.
Signed integers have three options. Two can be considered obsolete; the most common one is two's complement. All options require that a positive signed integer value has the same representation as the equivalent unsigned integer value. In two's complement, a negative signed integer is represented the same as the unsigned integer generated by adding UINT_MAX+1, etc.
If you want to inspect the representation, then do:

unsigned char *p = (unsigned char *)&x;
printf("%02X%02X%02X%02X", p[0], p[1], p[2], p[3]);

adjusting for how many bytes an unsigned int occupies on your system.
It's always safe to subtract unsigned values, as in

unsigned x = 1;
unsigned y = x - 2;

y will take on the value of -1 mod (UINT_MAX + 1), which is UINT_MAX.
Is it always safe to do subtraction, addition, or multiplication involving unsigned integers? Yes - no UB. The answer will always be the expected mathematical result reduced modulo UINT_MAX + 1.
But do not do printf("%d", y); - that is UB. Instead use printf("%u", y);.
C11 §6.2.5 9 "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
When unsigned and int meet in +, the int is converted to unsigned. So x + i has an unsigned result, and that sum is never < 0. Safe, but now if (x + i < 0) continue; is pointless. f(x + i); is safe too, but we would need to see the prototype of f() to best explain what may happen.
Unsigned integers always range from 0 to power(2,N) - 1 and have well-defined "overflow" results. Signed integers are two's complement, ones' complement, or sign-magnitude and have UB on overflow. Some compilers take advantage of that and assume overflow never occurs when generating optimized code.
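A classic illustration of that last point (a sketch; what actually happens varies with the compiler and optimization level):

#include <stdio.h>

int main(void) {
    /* An optimizer may assume signed overflow never happens, so it can
       treat "i > 0" as always true here and emit an infinite loop. */
    for (int i = 1; i > 0; i *= 2)
        printf("%d\n", i);
    return 0;
}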
Rather than really answering your questions directly, which has already been done, I'll make some broader observations that really go to the heart of your questions.
The first is that using unsigned in loop bounds where there's any chance that a signed value might crop up will eventually bite you. I've done it a number of times over 20 years, and it has ultimately bitten me every time. I'm now generally opposed to using unsigned for values that will be used for arithmetic (as opposed to being used as bitmasks and such) without an excellent justification. I have seen it cause too many problems when used, usually with the simple and appealing rationale that “in theory, this value is non-negative, so I should use the most restrictive type possible”.
I understand that x, in your example, was decided to be unsigned by someone else, and you can't change it, but you want to do something involving x over an interval potentially involving negative numbers.
The “right” way to do this, in my opinion, is first to assess the range of values that x may take. Suppose that the length of an int is 32 bits. Then the length of an unsigned int is the same. If it is guaranteed to be the case that x can never be larger than 2^31-1 (as it often is), then it is safe in principle to cast x to a signed equivalent and use that, i.e. do this:
int y = (int)x;
// Do your stuff with y
x = (unsigned)y;
If you have a long that is longer than unsigned, then even if x uses the full unsigned range, you can do this:
long y = (long)x;
// Do your stuff with y
x = (unsigned)y;
Now, the problem with either of these approaches is that before assigning back to x (e.g. x=(unsigned)y; in the immediately preceding example), you really must check that y is non-negative. However, these are exactly the cases where working with the unsigned x would have bitten you anyway, so there's no harm at all in something like:
long y = (long)x;
// Do your stuff with y
assert( y >= 0L );
x = (unsigned)y;
At least this way, you'll catch the problems and find a solution, rather than being left with a strange bug that takes hours to find because a loop bound is unexpectedly four billion.
No, it's not safe.
Integers are usually 4 bytes (32 bits) long. Signed and unsigned integers differ in representation as follows:
As far as signed integers are concerned, the most significant bit is used for the sign, so they can represent values between -2^31 and 2^31 - 1.
Unsigned integers don't use any bit for the sign, so they represent values from 0 to 2^32 - 1.
Part 2 isn't safe either, for the same reason as part 1. Since int and unsigned represent integers in different ways, once negative values enter the calculation you cannot assume what the result of x + i will be.
No, it's not safe. Trying to represent negative numbers with unsigned ints smells like a bug. Also, you should use %u to print unsigned ints.
If we slightly modify your code to put %u in printf:
#include <stdio.h>

int main(void)
{
    unsigned x = 1;
    unsigned y = x - 2;
    printf("%u", y);
}
The number printed is 4294967295
The reason the result looks correct is that C doesn't do any overflow checks and you are printing the value as a signed int (%d). That does not make it safe practice, however: if you print the value as what it really is (%u), you won't get -1.
An unsigned integer type should be thought of not as representing a number, but as a member of something called an "abstract algebraic ring", specifically the equivalence class of integers congruent modulo (MAX_VALUE+1). For purposes of examples, I'll assume "unsigned int" is 16 bits for numerical brevity; the principles would be the same with 32 bits, but all the numbers would be bigger.
Without getting too deep into the abstract-algebraic nitty-gritty, when assigning a number to an unsigned type [abstract algebraic ring], zero maps to the ring's additive identity (so adding zero to a value yields that value), and one maps to the ring's multiplicative identity (so multiplying a value by one yields that value). Adding a positive integer N to a value is equivalent to adding the multiplicative identity N times; adding a negative integer -N, or subtracting a positive integer N, yields the value which, when added to +N, would give the original value.
Thus, assigning -1 to a 16-bit unsigned integer yields 65535, precisely because adding 1 to 65535 will yield 0. Likewise -2 yields 65534, etc.
Note that in an abstract algebraic sense, every integer maps uniquely into algebraic rings of the indicated form, and a ring member maps uniquely into a smaller ring whose modulus is a factor of its own [e.g. a 16-bit unsigned integer maps uniquely to an 8-bit unsigned integer], but ring members are not uniquely convertible to larger rings or to integers. Unfortunately, C sometimes pretends that ring members are integers and implicitly converts them; that can lead to some surprising behavior.
Subtracting a value, signed or unsigned, from an unsigned value which is no smaller than int, and no smaller than the value being subtracted, will yield a result according to the rules of algebraic rings, rather than the rules of integer arithmetic. Testing whether the result of such computation is less than zero will be meaningless, because ring values are never less than zero. If you want to operate on unsigned values as though they are numbers, you must first convert them to a type which can represent numbers (i.e. a signed integer type). If the unsigned type can be outside the range that is representable with the same-sized signed type, it will need to be upcast to a larger type.
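A short sketch of that last point: to compare unsigned values as numbers, convert them to a wider signed type first (this assumes long long is wider than unsigned int):

#include <stdio.h>

int main(void) {
    unsigned int a = 3, b = 5;
    printf("%u\n", a - b);          /* ring arithmetic: 4294967294 */
    long long diff = (long long)a - (long long)b;
    printf("%lld\n", diff);         /* integer arithmetic: -2 */
    return 0;
}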
In the C programming language, unsigned int is used to store positive values only. However, when I run the following code:
unsigned int x = -12;
printf("%d", x);
The output is still -12. I thought it should have printed out: 12, or am I misunderstanding something?
The -12 to the right of your equals sign is set up as a signed integer (probably 32 bits in size) and will have the hexadecimal value 0xFFFFFFF4. The compiler generates code to move this signed integer into your unsigned integer x which is also a 32 bit entity. The compiler assumes you only have a positive value to the right of the equals sign so it simply moves all 32 bits into x. x now has the value 0xFFFFFFF4 which is 4294967284 if interpreted as a positive number. But the printf format of %d says the 32 bits are to be interpreted as a signed integer so you get -12. If you had used %u it would have printed as 4294967284.
In either case you don't get what you expected, since the C language "trusts" the writer of code to only ask for "sensible" things. This is common in C. If you wanted to assign a value to x and were not sure whether the value on the right side of the equals sign was positive, you could have written unsigned int x = abs(-12); and forced the compiler to generate code to take the absolute value of a signed integer before moving it to the unsigned integer.
The int is unsigned, but you've told printf to look at it as a signed int.
Try
unsigned int x = -12;
printf("%u", x);
It won't print "12", but will print the max value of an unsigned int minus 11.
Exercise to the reader is to find out why :)
Passing %d to printf tells printf to treat the argument as a signed integer, regardless of what you actually pass. Use %u to print as unsigned.
It all has to do with interpretation of the value.
If you assume 16-bit signed and unsigned integers, here are some examples that aren't exactly correct (the -12 pattern below is sign-magnitude rather than the two's complement real machines typically use), but that demonstrate the concept:

0000 0000 0000 1100  unsigned int, and signed int value 12
1000 0000 0000 1100  signed int value -12 (sign-magnitude), and a large unsigned integer
For signed integers, the leftmost bit is the sign bit:
0 = positive
1 = negative
For unsigned integers, there is no sign bit; the leftmost bit lets you store a larger number instead.
So the reason you are not seeing what you expect is this: unsigned int x = -12 takes -12 as an integer and stores it into x. Since x is unsigned, what was a sign bit is now just a piece of the value.
printf lets you tell the compiler how you want a value to be displayed.
%d means display it as if it were a signed int.
%u means display it as if it were an unsigned int.
C lets you do this kind of stuff. You, the programmer, are in control.
Kind of like a firearm.
It's a tool.
You can use it correctly to deal with certain situations,
or incorrectly to remove one of your toes.
One possibly useful case is the following:

unsigned int allBitsOn = -1;

That particular value sets all of the bits to 1:

1111 1111 1111 1111
that can be useful sometimes.
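For instance, a quick sketch (%#x shows the hex pattern):

#include <stdio.h>

int main(void) {
    unsigned int allBitsOn = -1;  /* well defined: converts to UINT_MAX */
    printf("%#x\n", allBitsOn);   /* 0xffffffff with a 32-bit unsigned int */
    return 0;
}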
printf("%d", x);

means print a signed integer. You'll have to write this instead:

printf("%u", x);
Also, it still won't print "12"; it's going to be "4294967284".
They do store positive values. But you're outputting the (very high) positive value as a signed integer, so it gets re-interpreted (in an implementation-defined fashion, I might add).
Use the format specifier "%u" instead.
Your program has undefined behavior because you passed the wrong type to printf (you told it you were going to pass an int but you passed an unsigned int). Consider yourself lucky that the "easiest" thing for the implementation to do was just silently print the wrong value and not jump to some code that does something harmful...
What you are missing is that printf("%d", x) expects x to be signed. Although you assign -12 to x, it is stored as a two's complement bit pattern, which read as an unsigned value would be a very large number.
However, when you pass this really large number to printf with %d, it interprets it as signed, translating it back to -12.
The correct conversion specifier to print an unsigned in printf is "%u" - try it and see what it does!
The assignment of a negative value to an unsigned int does not compute the absolute value of the negative: it interprets as an unsigned int the binary representation of the negative value, i.e., 4294967284 (2^32 - 12).
printf("%d") performs the opposite interpretation. This is why your program displays -12.
int and unsigned int simply allocate a number of bytes to store a value, nothing more.
The compiler should give a warning about the signedness mismatch, but it does not affect the bits in memory that represent the value -12.
%x, %d, %u, etc. tell printf how to interpret a group of bits when you print them.
When you display the value with %d, you are passing it as an (int) argument and not an (unsigned int) argument, and that causes it to print -12 and not 4294967284. Integers are stored as bit patterns, and -12 as an int has the same bit pattern as 4294967284 as an unsigned int.
That is why "%u" prints the value you want and "%d" does not - it depends on your argument type. GOOD LUCK!
The -12 is in two's complement format (16-bit in this example). So do this:

if (x & 0x8000) { x = ~x + 1; }

This will convert the two's complement negative number to the equivalent positive number. Good luck.
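To see it in action, a sketch assuming 16-bit values held in a wider variable:

#include <stdio.h>

int main(void) {
    unsigned int x = 0xFFF4;       /* -12 in 16-bit two's complement */
    if (x & 0x8000)
        x = (~x + 1) & 0xFFFF;     /* negate, then keep only 16 bits */
    printf("%u\n", x);             /* prints 12 */
    return 0;
}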
When the compiler implicitly converts -12 to an unsigned integer, the underlying binary representation remains unaltered. This conversion is purely semantic. The sign bit of the two's complement integer becomes the most significant bit of the unsigned integer. Thus when printf treats the unsigned integer as a signed integer with %d, it will see -12.
In general, when only positive numbers can be stored, negative numbers are not stored explicitly; their two's complement is stored instead. In the same way here, the two's complement of -12 is stored in x, and you use %u to see it.
Please explain the following paragraph.
"The next question is whether we can assign a certain value to a variable without losing precision. It is not sufficient if we just check for overflow during the addition or subtraction, because someone might add 1 to -5 and assign the result to an unsigned int. Then the actual addition does not overflow, but the result still does not fit."
When I add 1 to -5, I don't see any reason to worry; the answer is -4, as it should be.
So what is the problem with the result not fitting?
You can find the full article I was going through here:
http://www.fefe.de/intof.html
The binary representation of -4, in a 32-bit word, is as follows (hex notation):

0xfffffffc

When interpreted as an unsigned integer, this bit pattern represents the number 2**32 - 4, or 4294967292. I'm not sure I would call this phenomenon "overflow", but it is a common mistake to assign a small negative integer to a variable of unsigned type and wind up with a really big positive integer.
This trick is actually exploited for bounds checking: if you have a signed integer i and want to know if it is in the range 0 <= i < n, you can test
if ((unsigned)i < n) { ... }
which gives you the answer using one comparison instead of two. The cast to unsigned has no run-time cost; it just tells the compiler to generate an unsigned comparison instead of a signed comparison.
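A quick demonstration of the trick (a sketch):

#include <stdio.h>

int main(void) {
    unsigned n = 10;
    int i = -3;
    if ((unsigned)i < n)          /* -3 wraps to a huge unsigned value */
        printf("in range\n");
    else
        printf("out of range\n"); /* this branch is taken */
    return 0;
}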
Try assigning it to an unsigned int, not an int.
The term unsigned int is the key: by default an int datatype will hold negative and positive numbers, whereas unsigned ints are always positive. The unsigned option exists because unsigned ints can technically hold larger positive values than regular signed ints, since they don't need to spend a bit keeping track of whether the value is negative or positive.
Please see:
Signed versus Unsigned Integers
The problem is that you're storing -4 in an unsigned int. Unsigned ints can only contain zero and positive values. If you assign -4 to one, you'll actually end up getting a very large positive number (the actual value depends on how wide an int you're using).
The problem is that a storage type such as unsigned int can only hold so much. With 1 and -5 the numbers themselves are small, but with 1 and -500000000 you might end up with a confusing result. Also, unsigned storage interprets anything stored in it as positive, so you cannot meaningfully put a negative value in an unsigned variable.
Two big things to watch out for:
1. Overflow in the operation itself: 1 + -500000000
2. Issues in casting: (unsigned int)(1 + -500)
Unsigned variables, like unsigned int, cannot hold negative values. So assigning 1 - 5 to an unsigned int won't give you -4; it gives you UINT_MAX - 3, because conversion to unsigned is defined to wrap modulo UINT_MAX + 1.
Some code:
signed int first, second;
unsigned int result;

first = obtain();        // happens to be 1
second = obtain();       // happens to be -5
result = first + second; // unexpected result here - a very large number -
                         // and it's too late to check that there's a problem
Say you obtained those values from the keyboard. You need to check, before the addition, that the result can be represented in an unsigned int. That's what the article talks about.
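One way to write that check, as a sketch continuing the snippet above (assumes long long is wider than int):

#include <limits.h>

long long sum = (long long)first + second;  /* exact mathematical sum */
if (sum >= 0 && sum <= UINT_MAX) {
    result = (unsigned int)sum;             /* fits, safe to store */
} else {
    /* the mathematical result is not representable in unsigned int */
}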
By definition the number -4 cannot be represented in an unsigned int. -4 is a signed integer. The same goes for any negative number.
When you assign a negative integer to an unsigned int the actual bits of the number do not change, but they are merely represented differently. You'll get some ridiculously-large number due to the way integers are represented in binary (two's complement).
In two's complement, -4 is represented as 0xfffffffc. When 0xfffffffc is represented as an unsigned int you'll get the number 4,294,967,292.
You have to remember that fundamentally you're working with bits. You can assign -4 to an unsigned integer, and that places a particular series of bits into that memory location. Those bits can be interpreted as -4 in certain circumstances. One such circumstance is the obvious one: you've told the compiler/system that the bits in that memory location should be interpreted as a two's complement signed number. So if you do printf("%d", i), printf does its magic and converts the two's complement number to a magnitude and sign. The magnitude will be 4 and the sign will be negative, so it displays '-4'.
However, if you tell the compiler that the data at that memory location is not signed, the bits don't change, but their interpretation does. So when you do your addition, store the result in an unsigned integer memory location, and then call printf on the result, it doesn't bother looking for a sign, because by definition the value is always positive. It calculates the magnitude and prints it. The magnitude will be off, because the sign information is still encoded in the bits but is treated as magnitude information.