Unsigned integers - c

Could anyone help me understand the difference between signed/unsigned int, as well as signed/unsigned char? In this case, if it's unsigned wouldn't the value just never reach a negative number and continue on an infinite loop of 0's?
int main()
{
unsigned int n=3;
while (n>=0)
{
printf ("%d",n);
n=n-1;
}
return 0;
}

Two important things:
At one level, the difference between regular signed, versus unsigned values, is just the way we interpret the bits. If we limit ourselves to 3 bits, we have:
bits
signed
unsigned
000
0
0
001
1
1
010
2
2
011
3
3
100
-4
4
101
-3
5
110
-2
6
111
-1
7
The bit patterns don't change, it's just a matter of interpretation whether we have them represent nonnegative integers from 0 to 2N-1, or signed integers from -2N/2 to 2N/2-1.
The other important thing to know is what operations are defined on a type. For unsigned types, addition and subtraction are defined so that they "wrap around" from 0 to 2N-1. But for signed types, overflow and underflow are undefined. (On some machines they wrap around, but not all.)
Finally, there's the issue of properly matching up your printf formats. For %d, you're supposed to give it a signed integer. But you gave it unsigned instead. Strictly speaking, that results in undefined behavior, too, but in this case (and not too suprisingly), what happened was that it took the same bit pattern and printed it out as if it were signed, rather than unsigned.

wouldn't the value just never reach a negative number
Correct, it can't be negative.
and continue on an infinite loop of 0's
No, it will wrap-around from zero to the largest value of an unsigned int, which is well-defined behavior. If you use the correct conversion specifier %u instead of the incorrect %d, you'll notice this output:
3
2
1
0
4294967295
4294967294
...

Signed number representation is the categorization of positive as well as negative integers while unsigned categorizations are classifications of positive integersو and the code you wrote will run forever because n is an unsigned number and always represents a positive number.

In this case, if it's unsigned wouldn't the value just never reach a negative number ...?
You are right. But in the statement printf ("%d",n); you “deceived” the printf() function — using the type conversion specifier d — that the number in variable n is signed.
Use the type conversion specifier u instead: printf ("%u",n);
... never reach a negative number and continue on an infinite loop of 0's?
No. “Never reaching a negative number” is not the same as “stopping at 0 and resist further decrementing”.
Other people already explained this. Here is my explanation, in the form of analogies:
Imagine yourself a never ending and never beginning sequence of non-negative integers:
..., 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, ... // the biggest is 3 only for simplicity
— or numbers on an analog clock:
                    
You may increase / decrease a number forever, going round and round.

The terms signed and unsigned refer to how the CPU treats sequences of bits.
There are 2 important things to understand here:
How the CPU adds finite sequences of bits to a single finite result
How the CPU differentiates between signed and unsigned operands
Let's start with (1).
Let's take 4-bit nibbles for example.
If we ask the CPU to add 0001 and 0001, the result should be 2, or 0010.
But if we ask it to add 1111 and 0001, the result should be 16, or 10000. But it only has 4 bits to contain the result. The convention is to wrap-around, or circle, back to 0, effectively ignoring the Most Significant Bit. See also integer overflow..
Why is this relevant? Because it produces an interesting result. That is, according to the definition above, if we let x = 1111, then we get x + 1 = 0. Well, x, or 1111, now looks and behaves awfully like -1. This is the birth of signed numbers and operations. And if 1111 can be deemed as -1, then 1111 - 1 = 1110 should be -2, and so on.
Now let's look at (2).
When the C compiler sees you defining an unsigned int, it will use special CPU instructions for dealing with unsigned numbers, where it deems relevant. For example, this is relevant in jump instructions, where the CPU needs to know if you mean it to jump way forward, or slightly backward. For this it needs to know if you mean your operand to be interpreted in a signed, or unsigned way.
The operation of adding two numbers, on the other hand, is fundamentally oblivious to the consequent interpretation. The only thing is that the CPU will turn on a special flag after an addition operation, to tell you whether a wrap-around has occurred, for your own auditing.
But the important thing to understand is that the sequence of bits doesn't change; only its interpretation.
To tie all of this to your example, subtracting 1 from an unsigned 0 will simply wrap-around back to 1111, or 2^32 in your case.
Finally, there are other uses for signed/unsigned. For example, by the very fact it is defined as a different type, this allows functions to be written that define a contract where only unsigned integers, let's say, can be passed to it. Also, it's relevant when you want to display or print the number.

Related

What is signed integer overflow?

I'm learning C from CS50. When I run my code, it says 'signed integer overflow'.
#include <stdio.h>
#include <cs50.h>
int main(void)
{
int x = 41;
int c = 0;
while(x>=25)
{
c = c+1;
}
printf("%i\n", c);
}
Can someone explain what that means?
Your while condition will always be true, meaning the loop will run forever, adding 1 to c in each iteration.
Since c is a (signed) int it means it will increment slowly to its max value, and after that the next increment would be UB (undefined behavior). What many machines will do in this specific UB is to turn c negative, which I guess is not what you wanted. This happens due to a phenomenon called "signed integer overflow".
Let's assume 32-bit int and using two's complement. A signed int will look like this in binary sign bit (0 for positive, 1 for negative) | 31 bits. zero will look like 000...00, one like 000...01 and so on.
Max signed int will look like 0111...11 (2,147,483,647). When adding 1 to this number you'll get 100...000 which flipped the sign bit which will now result in a negative number. Adding another 1 to this will result in 100...001 which again has the sign bit on meaning it is still negative...
Declaring c as unsigned would ensure c remains non-negative. Also, making the loop end with while(x-- >= 25) could also be a good idea :)
"Signed integer overflow" means that you tried to store a value that's outside the range of values that the type can represent, and the result of that operation is undefined (in this particular case, your program halts with an error).
Since your while loop never terminates (x >= 25 evaluates to true, and you never change the value of x), you keep adding 1 to c until you reach a value outside the range that a signed int can represent.
Remember that in C, integral and floating-point types have fixed sizes, meaning they can only represent a fixed number of values. For example, suppose int is 3 bits wide, meaning it can only store 8 distinct values. What those values are depends on how the bit patterns are interpreted. You could store "unsigned" (non-negative) values [0..7], or "signed" (negative and non-negative) values [-3...3] or [-4..3] depending on representation. Here are several different ways you can interpret the values of three bits:
Bits Unsigned Sign-Magnitude 1's Complement 2's Complement
---- -------- ------------- -------------- --------------
000 0 0 0 0
001 1 1 1 1
010 2 2 2 2
011 3 3 3 3
100 4 -0 -3 -4
101 5 -1 -2 -3
110 6 -2 -1 -2
111 7 -3 -0 -1
Most systems use 2's Complement for signed integer values. Yes, sign-magnitude and 1's complement have positive and negative representations for zero.
So, let's say c is our 3-bit signed int. We start it at 0 and add 1 each time through the loop. Eveything's fine until c is 3 - using our 3-bit signed representation, we cannot represent the value 4. The result of the operation is undefined behavior, meaning the compiler is not required to handle the issue in any particular way. Logically, you'd expect the value to "wrap around" to a negative value based on the representation in use, but even that's not necessarily true, depending on how the compiler optimizes arithmetic operations.
Note that unsigned integer overflow is well-defined - you'll "wrap around" back to 0.
An integer can only hold so many numbers before it reaches its max value. In your while loop it says to execute it while x>=25. Since x is 41 and x never decreases in its value that means the while loop will always execute because it's always true since 41>=25.
An integer can only hold up to the number 2,147,483,647 which means that since C will keep on adding to itself as the while loop will always be true once it reaches 2,147,483,647 it will give you an error because an integer cannot go past that as it doesn't have enough memory to.
Well... You have an infinite loop because your value x will always be bigger than 25 since you dont decrease it.since the loop is infinite, your value c reaches the max size of an int (which is 2,147,483,647 if 4bytes).
You can try this in order to escape the infinite loop:
int main(void)
{
int x = 41;
int c = 0;
while (x >= 25)
{
c = c+1;
x--;
}
printf("%i\n", c);
}
First of all, you need to know what a "signed integer overflow condition" is.
It is a condition which appears when a mathematical operation results in a number which is out of bounds of the data type, which is signed integer overflow in your case.
This happens because your loop goes on infinitely, because x >= 25 will always be true.

C Bitwise Operators Example

given the following function:
int boof(int n) {
return n + ~n + 1;
}
What does this function return? I'm having trouble understanding exactly what is being passed in to it. If I called boof(10), would it convert 10 to base 2, and then do the bitwise operations on the binary number?
This was a question I had on a quiz recently, and I think the answer is supposed to be 0, but I'm not sure how to prove it.
note: I know how each bitwise operator works, I'm more confused on how the input is processed.
Thanks!
When n is an int, n + ~n will always result in an int that has all bits set.
Strictly speaking, the behavior of adding 1 to such an int will depend on the representation of signed numbers on the platform. The C standard support 3 representations for signed int:
for Two's Complement machines (the vast majority of systems in use today), the result will be 0 since an int with all bits set is -1.
on a One's Complement machine (which are pretty rare today, I believe), the result will be 1 since an int with all bits set is 0 or -0 (negative zero) or undefined behavior.
a Signed-magnitude machine (are there really any of these still in use?), an int with all bits set is a negative number with the maximum magnitude (so the actual value will depend on the size of an int). In this case adding 1 to it will result in a negative number (the exact value, again depends on the number of bits that are used to represent an int).
Note that the above ignores that it might be possible for some implementations to trap with various bit configurations that might be possible with n + ~n.
Bitwise operations will not change the underlying representation of the number to base 2 - all math on the CPU is done using binary operations regardless.
What this function does is take n and then add it to the two's complement negative representation of itself. This essentially negates the input. Anything you put in will equal 0.
Let me explain with 8 bit numbers as this is easier to visualize.
10 is represented in binary as 00001010.
Negative numbers are stored in two's complement (NOTing the number and adding 1)
So the (~n + 1) portion for 10 looks like so:
11110101 + 1 = 11110110
So if we take n + ~n+1:
00001010 + 11110110 = 0
Notice if we add these numbers together we get a left carry which will set the overflow flag, resulting in a 0. (Adding a negative and positive number together never means the overflow indicates an exception!)
See this
The CARRY and OVERFLOW flag in Binary Arithmetic

Wrap around explanation for signed and unsigned variables in C?

I read a bit in C spec that unsigned variables(in particular unsigned short int) perform some so called wrap around on integer overflow, although I couldn't find anything on signed variables except that I left with undefined behavior.
My professor told me that their values also get wrapped around (maybe he just meant gcc). I thought the bits just get truncated and the bits I left with give me some weird value!
What wrap around is and how is it different from just truncating bits.
Signed integer variables do not have wrap-around behavior in C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note BTW that GCC compiler you mentioned is known for implementing strict overflow semantics in optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior situations: GCC compiler assumes that signed integer values never wrap around. That means that GCC actually happens to be one of the compilers in which you cannot rely on wrap-around behavior of signed integer types.
For example, GCC compiler can assume that for variable int i the following condition
if (i > 0 && i + 1 > 0)
is equivalent to a mere
if (i > 0)
This is exactly what strict overflow semantics means.
Unsigned integer types implement modulo arithmetic. The modulo is equal 2^N where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.
However, C language never performs arithmetic computations in domains smaller than that of int/unsigned int. Type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). Which means that 1) the computations with unsigned short int will be preformed in the domain of int, with overflow happening when int overflows, 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.
For example, this code produces a wrap around
unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */
while this code
unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */
leads to undefined behavior.
If no int overflow happens and the result is converted back to an unsigned short int type, it is again reduced by modulo 2^N, which will appear as if the value has wrapped around.
Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).
This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.
Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -0
101 = -1
110 = -2
111 = -3
Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.
One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -3
101 = -2
110 = -1
111 = -0
I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.
Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.
Two's complement is probably the most common method of representing signed values, but it's not the only one, hence the C standard can't make any guarantees of what will happen when you overflow a signed integer type. So it leaves the behavior undefined so the compiler doesn't have to deal with interpreting multiple representations.
The undefined behavior comes from early portability issues when signed integer types could be represented either as sign & magnitude, one's complement or two's complement.
Nowadays, all architectures represent integers as two's complement that do wrap around. But be careful : since your compiler is right to assume you won't be running undefined behavior, you might encounter weird bugs when optimisation is on.
In a signed 8-bit integer, the intuitive definition of wrap around might look like going from +127 to -128 -- in two's complement binary: 0111111 (127) and 1000000 (-128). As you can see, that is the natural progress of incrementing the binary data--without considering it to represent an integer, signed or unsigned. Counter intuitively, the actual overflow takes place when moving from -1 (11111111) to 0 (00000000) in the unsigned integer's sense of wrap-around.
This doesn't answer the deeper question of what the correct behavior is when a signed integer overflows because there is no "correct" behavior according to the standard.

How does unsigned subtraction work when it wraps around?

This is a macro in the lwIP source code:
#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
Which is used to check if a TCP sequence number is less than another, taking into account when the sequence numbers wrap around. It exploits the fact that arithmetic wraps around, but I am unable to understand how this works in this particular case.
Can anyone explain what happens and why the above works ?
Take a simple 4 bit integer example where a = 5 and b = 6. The binary representation of each will be
a = 0101
b = 0110
Now when we subtract these (or take two's complement of b, sum with a, and add 1), we get the following
0101
1001
+ 1
-----
1111
1111 is equal to 15 (unsigned) or -1 (signed, again translated using two's complement). By casting the two numbers to unsigned, we ensure that if b > a, the difference between the two is going to be a large unsigned number and have it's highest bit set. When translating this large unsigned number into its signed counterpart we will always get a negative number due to the set MSB.
As nos pointed out, when a sequence number wraps around from the max unsigned value back to the min, the macro will also return that the max value is < min using the above arithmetic, hence its usefulness.
On wrap-around a will be much greater than b. If we subtract them the result will also be very large, ie have its high-order bit set. If we then treat the result as a signed value the large difference will turn into a negative number, less than 0.
If you had 2 sequence numbers 2G apart it wouldn't work, but that's not going to happen.
Because it is first cast as a signed integer before it is compared to zero. Remember that the first bit reading from left to right determines the sign in a signed number but is used to increase the unsigned int's range by an extra bit.
Example: let's say you have a 4 bit unsigned number. This would mean 1001 is 17. But as a signed integer this would be -1.
Now lets say we did b0010 - b0100. This ends up with b1110. Unsigned this is 14, and signed this is -6.

Integer overflow problem

Please explain the following paragraph.
"The next question is whether we can assign a certain value to a variable without losing precision. It is not sufficient if we just check for overflow during the addition or subtraction, because someone might add 1 to -5 and assign the result to an unsigned int. Then the actual addition does not overflow, but the result still does not fit."
when i am adding 1 to -5 i dont see any reason to worry.the answer is as it should be -4.
so what is the problem of result not being fit??
you can find the full article here through which i was going:
http://www.fefe.de/intof.html
The binary representation of -4, in a 32-bit word, is as follows (hex notation)
0xfffffffc
When interpreted as an unsigned integer, this bit pattern represents the number 2**32-4, or 18446744073709551612. I'm not sure I would call this phenomenon "overflow", but it is a common mistake to assign a small negative integer to a variable of unsigned type and wind up with a really big positive integer.
This trick is actually exploited for bounds checking: if you have a signed integer i and want to know if it is in the range 0 <= i < n, you can test
if ((unsigned)i < n) { ... }
which gives you the answer using one comparison instead of two. The cast to unsigned has no run-time cost; it just tells the compiler to generate an unsigned comparison instead of a signed comparison.
Try assigning it to a unsigned int, not an int.
The term unsigned int is the key - by default an int datatype will hold negative and positive numbers; however, unsigned ints are always positive. They provide this option because uints can technically hold greater positive values than regular signed ints because they do not need to use a bit to keep track of whether or not its negative or positive.
Please see:
Signed versus Unsigned Integers
The problem is that you're storing -4 in an unsigned int. Unsigned ints can only contain zero and positive values. If you assign -4 to one, you'll actually end up getting a very large positive number (the actual value depends on how wide an int you're using).
The problem is that the sizes of storage such as unsigned int can only hold so much. With 1 and -5 it does not matter, but with 1 and -500000000 you might end up with a confusing result. Also, unsigned storage will interpret anything stored in it as positive, so you cannot put a negative value in an unsigned variable.
Two big things to watch out for:
1. Overflow in the operation itself: 1 + -500000000
2. Issues in casting: (unsigned int)(1 + -500)
Unsigned variables, like unsigned int, cannot hold negative values. So assigning 1 - 5 to an unsigned int won't give you -4. I'm not sure what it'll give you, it's probably compiler specific.
Some code:
signed int first, second;
unsigned int result;
first = obtain(); // happens to be 1
second = obtain(); // happens to be -5
result = first + second; // unexpected result here - very large number - and it's too late to check that there's a problem
Say you obtained those values from keyboard. You need to check before addition that the result can be represented in unsigned int. That's what the article talks about.
By definition the number -4 cannot be represented in an unsigned int. -4 is a signed integer. The same goes for any negative number.
When you assign a negative integer to an unsigned int the actual bits of the number do not change, but they are merely represented differently. You'll get some ridiculously-large number due to the way integers are represented in binary (two's complement).
In two's complement, -4 is represented as 0xfffffffc. When 0xfffffffc is represented as an unsigned int you'll get the number 4,294,967,292.
You have to remember that fundamentally you're working with bits. So you can assign a value of -4 to an unsigned integer and this will place a series of bits into that memory location. Those bits can be interpreted as -4 in certain circumstances. One such circumstance is the obvious one: you've told the compiler/system that the bits in that memory location should be interpreted as a two's compliment signed number. So if you do printf("%s",i) prtinf does its magic and converts the two's compliment number to a magnitude and sign. The magnitude will be 4 and the sign will be negative, so it displays '-4'.
However, if you tell the compiler that the data at that memory location is not signed then the bits don't change but instead their interpretation does. So when you do your addition, store the result in an unsigned integer memory location and then call printf on the result it doesn't bother looking for the sign because by definition it is always positive. It calculates the magnitude and prints it. The magnitude will be off because the sign information is still encoded in the bits but it's treated as magnitude information.

Resources