What is signed integer overflow? - c

I'm learning C from CS50. When I run my code, it says 'signed integer overflow'.
#include <stdio.h>
#include <cs50.h>

int main(void)
{
    int x = 41;
    int c = 0;
    while (x >= 25)
    {
        c = c + 1;
    }
    printf("%i\n", c);
}
Can someone explain what that means?

Your while condition will always be true, meaning the loop will run forever, adding 1 to c in each iteration.
Since c is a (signed) int, it will keep counting up toward its maximum value, and the increment past that maximum is UB (undefined behavior). What many machines will actually do in this particular case is make c negative, which I guess is not what you wanted. This is the phenomenon called "signed integer overflow".
Let's assume a 32-bit int using two's complement. In binary, a signed int looks like: sign bit (0 for positive, 1 for negative) | 31 value bits. Zero looks like 000...00, one like 000...01, and so on.
The maximum signed int looks like 0111...11 (2,147,483,647). Adding 1 to that gives 1000...00, which flips the sign bit, so the result is negative. Adding another 1 gives 1000...01, which still has the sign bit set, so it is still negative...
Declaring c as unsigned would ensure c remains non-negative. Making the loop terminate, for example with while (x-- >= 25), would also be a good idea :)
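For example, here is a minimal sketch of that suggestion (keeping your counting idea, but letting x decrease so the loop actually ends):

#include <stdio.h>

int main(void)
{
    int x = 41;
    unsigned int c = 0;        // unsigned: stays non-negative, and wrap-around would be well-defined
    while (x-- >= 25)          // x decreases every iteration, so the loop terminates
    {
        c = c + 1;
    }
    printf("%u\n", c);         // prints 17 (one count for each value from 41 down to 25)
    return 0;
}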

"Signed integer overflow" means that you tried to store a value that's outside the range of values that the type can represent, and the result of that operation is undefined (in this particular case, your program halts with an error).
Since your while loop never terminates (x >= 25 evaluates to true, and you never change the value of x), you keep adding 1 to c until you reach a value outside the range that a signed int can represent.
Remember that in C, integral and floating-point types have fixed sizes, meaning they can only represent a fixed number of values. For example, suppose int is 3 bits wide, meaning it can only store 8 distinct values. What those values are depends on how the bit patterns are interpreted. You could store "unsigned" (non-negative) values [0..7], or "signed" (negative and non-negative) values [-3...3] or [-4..3] depending on representation. Here are several different ways you can interpret the values of three bits:
Bits   Unsigned   Sign-Magnitude   1's Complement   2's Complement
----   --------   --------------   --------------   --------------
000        0             0                0                0
001        1             1                1                1
010        2             2                2                2
011        3             3                3                3
100        4            -0               -3               -4
101        5            -1               -2               -3
110        6            -2               -1               -2
111        7            -3               -0               -1
Most systems use 2's Complement for signed integer values. Yes, sign-magnitude and 1's complement have positive and negative representations for zero.
So, let's say c is our 3-bit signed int. We start it at 0 and add 1 each time through the loop. Everything's fine until c is 3 - using our 3-bit signed representation, we cannot represent the value 4. The result of the operation is undefined behavior, meaning the compiler is not required to handle the issue in any particular way. Logically, you'd expect the value to "wrap around" to a negative value based on the representation in use, but even that's not necessarily true, depending on how the compiler optimizes arithmetic operations.
Note that unsigned integer overflow is well-defined - you'll "wrap around" back to 0.
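As a quick illustration of that last point (a sketch, assuming a 32-bit unsigned int):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int u = UINT_MAX;   // the largest value an unsigned int can hold
    u = u + 1;                   // well-defined: wraps around to 0 (arithmetic modulo 2^N)
    printf("%u\n", u);           // prints 0
    return 0;
}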

An integer can only hold so many values before it reaches its maximum. Your while loop executes while x >= 25. Since x is 41 and never decreases, the condition is always true and the loop runs forever.
A (32-bit) int can only hold values up to 2,147,483,647. Because the loop keeps adding 1 to c, c eventually reaches that maximum, and the next addition fails: the type doesn't have enough bits to represent a larger value, which is why you get the error.

Well... You have an infinite loop because x will always be bigger than 25, since you don't decrease it. Because the loop is infinite, c eventually reaches the maximum value of an int (2,147,483,647 if int is 4 bytes).
You can try this in order to escape the infinite loop:
int main(void)
{
    int x = 41;
    int c = 0;
    while (x >= 25)
    {
        c = c + 1;
        x--;
    }
    printf("%i\n", c);
}

First of all, you need to know what a "signed integer overflow condition" is.
It is the condition that arises when a mathematical operation produces a result that is outside the bounds of its data type, in your case a signed int.
It happens here because your loop runs forever: x >= 25 is always true, so c keeps growing until it overflows.

Related

Unsigned integers

Could anyone help me understand the difference between signed/unsigned int, as well as signed/unsigned char? In this case, if it's unsigned wouldn't the value just never reach a negative number and continue on an infinite loop of 0's?
int main()
{
    unsigned int n = 3;
    while (n >= 0)
    {
        printf ("%d",n);
        n = n - 1;
    }
    return 0;
}
Two important things:
At one level, the difference between regular signed, versus unsigned values, is just the way we interpret the bits. If we limit ourselves to 3 bits, we have:
bits   signed   unsigned
000       0         0
001       1         1
010       2         2
011       3         3
100      -4         4
101      -3         5
110      -2         6
111      -1         7
The bit patterns don't change; it's just a matter of interpretation whether we have them represent nonnegative integers from 0 to 2^N-1, or signed integers from -2^N/2 to 2^N/2-1.
The other important thing to know is what operations are defined on a type. For unsigned types, addition and subtraction are defined so that they "wrap around" from 0 to 2N-1. But for signed types, overflow and underflow are undefined. (On some machines they wrap around, but not all.)
Finally, there's the issue of properly matching up your printf formats. For %d, you're supposed to give it a signed integer, but you gave it an unsigned one instead. Strictly speaking, that results in undefined behavior too, but in this case (and not too surprisingly), what happened is that it took the same bit pattern and printed it out as if it were signed rather than unsigned.
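Here is a small sketch of that mismatch (the %d line is formally undefined behavior, but this is the typical observed result on a 32-bit, two's-complement system):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int u = UINT_MAX;   // all bits set
    printf("%u\n", u);           // correct specifier: prints 4294967295 on a 32-bit unsigned int
    printf("%d\n", u);           // wrong specifier: typically prints -1, the same bits read as signed
    return 0;
}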
wouldn't the value just never reach a negative number
Correct, it can't be negative.
and continue on an infinite loop of 0's
No, it will wrap-around from zero to the largest value of an unsigned int, which is well-defined behavior. If you use the correct conversion specifier %u instead of the incorrect %d, you'll notice this output:
3
2
1
0
4294967295
4294967294
...
Signed representations cover both positive and negative integers, while unsigned representations cover only non-negative ones. The code you wrote will run forever because n is unsigned and therefore always non-negative, so n >= 0 is always true.
In this case, if it's unsigned wouldn't the value just never reach a negative number ...?
You are right. But in the statement printf ("%d",n); you “deceived” the printf() function — using the type conversion specifier d — that the number in variable n is signed.
Use the type conversion specifier u instead: printf ("%u",n);
... never reach a negative number and continue on an infinite loop of 0's?
No. “Never reaching a negative number” is not the same as “stopping at 0 and resisting further decrements”.
Other people already explained this. Here is my explanation, in the form of analogies:
Imagine yourself a never ending and never beginning sequence of non-negative integers:
..., 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, ... // the biggest is 3 only for simplicity
— or like the numbers on an analog clock face.
You may increase / decrease a number forever, going round and round.
The terms signed and unsigned refer to how the CPU treats sequences of bits.
There are 2 important things to understand here:
How the CPU adds finite sequences of bits to a single finite result
How the CPU differentiates between signed and unsigned operands
Let's start with (1).
Let's take 4-bit nibbles for example.
If we ask the CPU to add 0001 and 0001, the result should be 2, or 0010.
But if we ask it to add 1111 and 0001, the result should be 16, or 10000. But it only has 4 bits to hold the result. The convention is to wrap around, or circle, back to 0, effectively ignoring the most significant bit. See also integer overflow.
Why is this relevant? Because it produces an interesting result. That is, according to the definition above, if we let x = 1111, then we get x + 1 = 0. Well, x, or 1111, now looks and behaves awfully like -1. This is the birth of signed numbers and operations. And if 1111 can be deemed as -1, then 1111 - 1 = 1110 should be -2, and so on.
Now let's look at (2).
When the C compiler sees you defining an unsigned int, it will use special CPU instructions for dealing with unsigned numbers, where it deems relevant. For example, this is relevant in jump instructions, where the CPU needs to know if you mean it to jump way forward, or slightly backward. For this it needs to know if you mean your operand to be interpreted in a signed, or unsigned way.
The operation of adding two numbers, on the other hand, is fundamentally oblivious to the consequent interpretation. The only thing is that the CPU will turn on a special flag after an addition operation, to tell you whether a wrap-around has occurred, for your own auditing.
But the important thing to understand is that the sequence of bits doesn't change; only its interpretation.
To tie all of this to your example, subtracting 1 from an unsigned 0 will simply wrap around back to 1111...1, which is 2^32 - 1 (4,294,967,295) in your case.
Finally, there are other uses for signed/unsigned. For example, by the very fact it is defined as a different type, this allows functions to be written that define a contract where only unsigned integers, let's say, can be passed to it. Also, it's relevant when you want to display or print the number.
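As a small, hedged illustration of how the declared signedness (rather than the bits themselves) changes what the compiler does, consider a comparison between a signed and an unsigned operand; under C's usual arithmetic conversions the signed value is converted to unsigned before the compare:

#include <stdio.h>

int main(void)
{
    int s = -1;
    unsigned int u = 1;
    // s is converted to unsigned int before the comparison, so -1 becomes UINT_MAX
    // and the compiler performs an unsigned comparison.
    if (s < u)
        printf("s compares as less than u\n");
    else
        printf("s compares as greater: -1 became a huge unsigned value\n");
    return 0;
}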

Negative power of 2

I am learning the characteristics of the different data type. For example, this program increasingly prints the power of 2 with four different formats: integer, unsigned integer, hexadecimal, octal
#include <stdio.h>

int main(int argc, char *argv[])
{
    int i, val = 1;
    for (i = 1; i < 35; ++i) {
        printf("%15d%15u%15x%15o\n", val, val, val, val);
        val *= 2;
    }
    return 0;
}
It works. The unsigned column goes up to 2147483648, while the integer column ends at -2147483648. But why does it become negative?
I have a theory: is it because the maximum signed integer we can represent on a 32-bit machine is 2,147,483,647? If so, why does it return a negative number?
First of all, you should understand that this program invokes undefined behavior: it causes signed integer overflow, which the C Standard declares undefined.
The technical reason is that no particular behavior can be guaranteed, since different representations are allowed for negative numbers and there could even be padding bits in the representation.
The most probable reason you see a negative number in your case is that your machine uses 2's complement (look it up) to represent negative numbers while arithmetics operate on bits without overflow checks. Therefore, the highest bit is the sign bit, and if your value overflows into this bit, it turns negative.
What you describe is UB caused by integer overflow. Since the behavior is undefined, anything could happen (“When the compiler encounters [a given undefined construct] it is legal for it to make demons fly out of your nose”), BUT, what actually happens on some machines (I suspect yours included) is this:
You start with int val = 1;, which is represented as 0b00...1 in binary. Each time you val *= 2; the value is multiplied by 2, so the representation changes to 0b00...10, then 0b00...100, and so on (the 1 bit moves one step to the left each time). The last time you val *= 2; you get 0b100.... Using two's complement (which is what I guess your machine uses, as it is very common), that bit pattern actually means -1 * 0b1000..., which is -2147483648.
Note, that even though this might be what's really going on in your machine, it's not to be trusted or thought of as the "right" thing to happen, since, as mentioned before, this is UB
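If you want to watch the bit marching off the top without invoking UB, one option (a sketch, not part of the original exercise) is to make val unsigned, where the wrap-around is well-defined:

#include <stdio.h>

int main(void)
{
    unsigned int val = 1;
    for (int i = 1; i < 35; ++i) {
        printf("%15u%15x\n", val, val);
        val *= 2;                // unsigned arithmetic: reaches 0x80000000, then wraps to 0
    }
    return 0;
}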
In this program val will overflow on a 32-bit machine, because the size of int there is 4 bytes. In math we have two kinds of values, positive and negative, so to do calculations that can produce negative results we use signed representations, i.e. int or char in C.
Take char as the example: a signed char has the range -128 to 127, while an unsigned char has the range 0 to 255.
That tells you the range is split into two halves for the signed representation. So when any signed variable crosses the top of its positive range, it goes into negative values. For a signed char, as soon as the value goes above 127 it becomes negative. And if you add something like 300 to a char or unsigned char variable, the value rolls over and effectively starts again from zero.
    char a = 2;
    a += 300;
What is the value? A char holds 256 distinct values (including zero), so the result is reduced modulo 256: 2 + 300 = 302, and 302 - 256 = 46.
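A sketch of that calculation, using unsigned char so the wrap-around is well-defined (with a plain or signed char, converting an out-of-range result is implementation-defined):

#include <stdio.h>

int main(void)
{
    unsigned char a = 2;
    a += 300;                 // 302 is reduced modulo 256 when stored back: 302 - 256 = 46
    printf("%d\n", a);        // prints 46
    return 0;
}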
Hope this helps

Wrap around explanation for signed and unsigned variables in C?

I read in the C spec that unsigned variables (in particular unsigned short int) perform a so-called wrap-around on integer overflow, although I couldn't find anything about signed variables except that I'm left with undefined behavior.
My professor told me that their values also get wrapped around (maybe he just meant gcc). I thought the bits just get truncated and the bits I'm left with give me some weird value!
What is wrap-around, and how is it different from just truncating bits?
Signed integer variables do not have wrap-around behavior in the C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note, by the way, that the GCC compiler you mentioned is known for implementing strict overflow semantics in its optimizations, meaning that it takes advantage of the freedom provided by such undefined-behavior situations: GCC assumes that signed integer values never wrap around. That means GCC actually happens to be one of the compilers on which you cannot rely on wrap-around behavior of signed integer types.
For example, GCC compiler can assume that for variable int i the following condition
if (i > 0 && i + 1 > 0)
is equivalent to a mere
if (i > 0)
This is exactly what strict overflow semantics means.
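A hedged sketch of what that can look like in practice (the result when i == INT_MAX depends on the compiler and optimization level, precisely because the overflow is undefined):

#include <stdio.h>
#include <limits.h>

/* With optimizations on, GCC may fold the condition to just (i > 0),
   since it assumes the signed addition i + 1 never overflows. */
int both_positive(int i)
{
    return i > 0 && i + 1 > 0;
}

int main(void)
{
    printf("%d\n", both_positive(INT_MAX));   /* may print 1 or 0 depending on the build */
    return 0;
}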
Unsigned integer types implement modulo arithmetic. The modulus is equal to 2^N, where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.
However, C never performs arithmetic computations in domains smaller than that of int/unsigned int. The type unsigned short int that you mention in your question will typically be promoted to int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). Which means that 1) the computations with unsigned short int will be performed in the domain of int, with overflow happening when int overflows, and 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.
For example, this code produces a wrap around
unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */
while this code
unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */
leads to undefined behavior.
If no int overflow happens and the result is converted back to unsigned short int, it is again reduced modulo 2^N (with N now the bit width of unsigned short), which appears as if the value has wrapped around.
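A small sketch of that last point (assuming the common 16-bit unsigned short and 32-bit int, so the promoted computation does not overflow int):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned short s = USHRT_MAX;    /* 65535 on a 16-bit unsigned short */
    s = s * 3;                       /* computed in int: 65535 * 3 = 196605, no int overflow;
                                        converting back to unsigned short reduces it modulo 65536 */
    printf("%hu\n", s);              /* prints 65533, as if the value had wrapped around */
    return 0;
}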
Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).
This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.
Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -0
101 = -1
110 = -2
111 = -3
Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.
One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -3
101 = -2
110 = -1
111 = -0
I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.
Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.
Two's complement is probably the most common method of representing signed values, but it's not the only one, hence the C standard can't make any guarantees of what will happen when you overflow a signed integer type. So it leaves the behavior undefined so the compiler doesn't have to deal with interpreting multiple representations.
The undefined behavior comes from early portability concerns, when signed integer types could be represented as sign-and-magnitude, one's complement, or two's complement.
Nowadays virtually all architectures represent integers in two's complement, which does wrap around at the hardware level. But be careful: since your compiler is entitled to assume you never invoke undefined behavior, you might encounter weird bugs when optimisation is on.
In a signed 8-bit integer, the intuitive definition of wrap-around might look like going from +127 to -128, which in two's-complement binary is 01111111 (127) followed by 10000000 (-128). As you can see, that is the natural progression of incrementing the binary data, without considering whether it represents a signed or unsigned integer. Counter-intuitively, the actual overflow in the unsigned sense of wrap-around takes place when moving from -1 (11111111) to 0 (00000000).
This doesn't answer the deeper question of what the correct behavior is when a signed integer overflows because there is no "correct" behavior according to the standard.
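For what it's worth, here is a sketch of that 8-bit progression; with a signed char the out-of-range conversion is implementation-defined rather than undefined, and common two's-complement systems show exactly this wrap:

#include <stdio.h>

int main(void)
{
    signed char c = 127;      /* 01111111 */
    c = c + 1;                /* c + 1 is computed as int (128); converting 128 back to
                                 signed char is implementation-defined, commonly -128 (10000000) */
    printf("%d\n", c);        /* typically prints -128 */
    return 0;
}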

How does unsigned subtraction work when it wraps around?

This is a macro in the lwIP source code:
#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
Which is used to check if a TCP sequence number is less than another, taking into account when the sequence numbers wrap around. It exploits the fact that arithmetic wraps around, but I am unable to understand how this works in this particular case.
Can anyone explain what happens and why the above works ?
Take a simple 4 bit integer example where a = 5 and b = 6. The binary representation of each will be
a = 0101
b = 0110
Now when we subtract them (that is, add a to the two's complement of b, i.e. the bitwise inverse of b plus 1), we get the following
0101
1001
+ 1
-----
1111
1111 is equal to 15 (unsigned) or -1 (signed, again using two's complement). By casting the two numbers to unsigned, we ensure that if b > a, the difference between the two is going to be a large unsigned number with its highest bit set. When translating this large unsigned number into its signed counterpart we will always get a negative number, due to the set MSB.
As nos pointed out, when a sequence number wraps around from the max unsigned value back to the min, the macro will also return that the max value is < min using the above arithmetic, hence its usefulness.
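Here is a small sketch of the macro in action near the wrap-around point (the cast of a large unsigned difference to int32_t is implementation-defined by the letter of the standard, but behaves as intended on the two's-complement targets lwIP runs on):

#include <stdio.h>
#include <stdint.h>

#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

int main(void)
{
    uint32_t a = 4294967290u;            /* sequence number just before the wrap */
    uint32_t b = 5u;                     /* sequence number just after the wrap */
    printf("%d\n", TCP_SEQ_LT(a, b));    /* prints 1: a is "less than" b in sequence space */
    printf("%d\n", TCP_SEQ_LT(b, a));    /* prints 0 */
    return 0;
}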
On wrap-around a will be much greater than b. If we subtract them the result will also be very large, ie have its high-order bit set. If we then treat the result as a signed value the large difference will turn into a negative number, less than 0.
If you had 2 sequence numbers 2G apart it wouldn't work, but that's not going to happen.
Because the difference is cast to a signed integer before it is compared to zero. Remember that the most significant bit determines the sign of a signed number, whereas in an unsigned int it is used to extend the range by one extra bit.
Example: let's say you have a 4-bit number. Then 1001 is 9 as unsigned, but -7 as a signed (two's-complement) value.
Now let's say we compute b0010 - b0100. The result is b1110, which is 14 as unsigned and -2 as signed.

Is there a way to AND all the bits of an integer in c and get back a simple 1 or 0?

Is there a bitwise or logical operation that can be performed on all the bits of an integer in C and returns either 1 or 0?
For example, an integer containing 0b10101010 would return 1, and 0b00000000 would return 0.
If you AND'ed all the bits of a word, only "all ones" would produce a result of 1. In your example 0b10101010 would produce zero, not one.
If instead you OR'ed all the bits, any non-zero value would result in 1.
So the following would be type-safe for any integer type without assuming two's-complement:
int i = somevalue;
int and_bits = ~i == 0 ;
int or_bits = i != 0 ;
or perhaps more intuitively:
int and_bits = i == ~0 ;
The question as originally written is self-contradictory, asking about AND but using an example demonstrating OR.
The AND of all the bits in the number will be 0 for all values that contain any 0 bit, and 1 only for the specific value with all one bits.
That can be written as (i==-1) for any signed integer i. For unsigned integers, the test is probably better written as ((~i)==0) or something similar with more type qualifications applied.
The OR of all the bits in the number will be 0 only for the special case of 0, and 1 for all nonzero cases. That can be written as !!i for any integer i.
This works because the ! operator (like all logical operators in C) is specified to test for logical truth in the usual way and return only the values 0 or 1 as appropriate. So !! is a useful idiom for converting an arbitrary C expression into 1 if the expression is true or 0 if false.
(Update: reworded to avoid undefined behavior potentially caused by the expression i+1 overflowing a signed integer. Moral: don't do bit-wise operations on signed integers unless you really enjoy the muck. I've left behind an additional bit of UB that never occurs in practice. Signed integers are not obligated to be two's complement, and so -1 might not actually be represented by a word with all bits set.)
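Putting the two idioms side by side (a sketch, assuming a two's-complement int for the -1 test):

#include <stdio.h>

int main(void)
{
    int i = 0xAA;                           /* 0b10101010 */
    int and_bits = (i == -1);               /* 1 only if every bit of i is set */
    int or_bits  = !!i;                     /* 1 if any bit of i is set */
    printf("%d %d\n", and_bits, or_bits);   /* prints: 0 1 */
    return 0;
}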
AND all bits:
i==-1
OR all bits:
i!=0
Suppose you have an int i. The value you seem to want would be
    i != 0
which will 'OR' all bits.
And after MSN's answer, I stand corrected: i == -1 will AND the bits, assuming two's complement.
In the examples you've given, you could simply check whether your variable is equal to 0. If any bit in the integer is 1, then the integer's value will be nonzero. In practice:
if (i != 0) {
    // Some bits are 1.
} else {
    // All bits are 0.
}
Or, since C casts integers to Booleans automatically:
if (i) {
...
If you really want to get a 0 or 1 value out of your comparison, you can take advantage of more automatic type-casting to turn a Boolean back into an integer:
int j = i != 0;
