How does unsigned subtraction work when it wraps around? - c

This is a macro in the lwIP source code:
#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
It is used to check whether one TCP sequence number is less than another, taking into account that sequence numbers wrap around. It exploits the fact that unsigned arithmetic wraps, but I am unable to understand how this works in this particular case.
Can anyone explain what happens and why the above works?

Take a simple 4 bit integer example where a = 5 and b = 6. The binary representation of each will be
a = 0101
b = 0110
Now when we subtract these (equivalently: invert the bits of b to get its one's complement, add that to a, and then add 1), we get the following
0101
1001
+ 1
-----
1111
1111 is equal to 15 (unsigned) or -1 (signed, again translated using two's complement). By casting the two numbers to unsigned, we ensure that if b > a, the difference between the two is going to be a large unsigned number with its highest bit set. When translating this large unsigned number into its signed counterpart we will always get a negative number due to the set MSB.
As nos pointed out, when a sequence number wraps around from the max unsigned value back to the min, the macro will also return that the max value is < min using the above arithmetic, hence its usefulness.
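Here is a minimal program you can compile to watch the macro handle a wrap (a sketch; strictly speaking, converting an out-of-range uint32_t to int32_t is implementation-defined in C, but mainstream two's complement compilers behave as shown):

#include <stdio.h>
#include <stdint.h>

#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

int main(void)
{
    printf("%d\n", TCP_SEQ_LT(100u, 200u));       /* 1: the ordinary case */
    printf("%d\n", TCP_SEQ_LT(4294967290u, 5u));  /* 1: 5 lies just past the wrap */
    printf("%d\n", TCP_SEQ_LT(5u, 4294967290u));  /* 0: and not the other way around */
    return 0;
}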

On wrap-around a will be much greater than b. If we subtract them the result will also be very large, i.e., have its high-order bit set. If we then treat the result as a signed value the large difference will turn into a negative number, less than 0.
If you had two sequence numbers more than 2^31 (2G) apart it wouldn't work, but that's not going to happen.

Because it is first cast to a signed integer before it is compared to zero. Remember that the leftmost bit determines the sign in a signed number, whereas in an unsigned number it is used to extend the range by an extra bit.
Example: let's say you have a 4-bit number. Unsigned, 1001 is 9; as a signed (two's complement) integer it is -7.
Now let's say we did b0010 - b0100. This ends up as b1110. Unsigned this is 14, and signed this is -2.
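To see this 4-bit arithmetic in a real program, you can mask to 4 bits and sign-extend by hand (a small sketch of my own, not lwIP code):

#include <stdio.h>

int main(void)
{
    unsigned a = 5, b = 6;
    unsigned diff4 = (a - b) & 0xF;   /* simulate 4-bit wrap-around: 1111 */
    printf("%u\n", diff4);            /* prints 15 */
    /* reading 1111 as a 4-bit two's complement value: subtract 2^4 */
    printf("%d\n", (int)diff4 - 16);  /* prints -1 */
    return 0;
}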


Unsigned integers

Could anyone help me understand the difference between signed/unsigned int, as well as signed/unsigned char? In this case, if it's unsigned wouldn't the value just never reach a negative number and continue on an infinite loop of 0's?
#include <stdio.h>

int main()
{
    unsigned int n = 3;
    while (n >= 0)
    {
        printf("%d", n);
        n = n - 1;
    }
    return 0;
}
Two important things:
At one level, the difference between signed and unsigned values is just the way we interpret the bits. If we limit ourselves to 3 bits, we have:
bits   signed   unsigned
000       0        0
001       1        1
010       2        2
011       3        3
100      -4        4
101      -3        5
110      -2        6
111      -1        7
The bit patterns don't change; it's just a matter of interpretation whether we have them represent nonnegative integers from 0 to 2^N - 1, or signed integers from -2^(N-1) to 2^(N-1) - 1.
The other important thing to know is what operations are defined on a type. For unsigned types, addition and subtraction are defined so that they "wrap around" from 0 to 2^N - 1. But for signed types, overflow and underflow are undefined. (On some machines they wrap around, but not all.)
Finally, there's the issue of properly matching up your printf formats. For %d, you're supposed to give it a signed integer, but you gave it an unsigned one instead. Strictly speaking, that results in undefined behavior, too, but in this case (and not too surprisingly), what happened was that it took the same bit pattern and printed it out as if it were signed rather than unsigned.
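A small illustration of this "same bits, different interpretation" point (a sketch using fixed-width types; converting an out-of-range value to int8_t is implementation-defined before C23, but yields -4 on all mainstream compilers):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t bits = 0xFC;                         /* bit pattern 11111100 */
    printf("as unsigned: %u\n", (unsigned)bits); /* prints 252 */
    printf("as signed:   %d\n", (int8_t)bits);   /* prints -4 on two's complement */
    return 0;
}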
wouldn't the value just never reach a negative number
Correct, it can't be negative.
and continue on an infinite loop of 0's
No, it will wrap-around from zero to the largest value of an unsigned int, which is well-defined behavior. If you use the correct conversion specifier %u instead of the incorrect %d, you'll notice this output:
3
2
1
0
4294967295
4294967294
...
Signed representations cover both positive and negative integers, while unsigned representations cover only non-negative integers. The code you wrote will run forever because n is an unsigned number and always represents a non-negative value.
In this case, if it's unsigned wouldn't the value just never reach a negative number ...?
You are right. But in the statement printf("%d", n); you "deceived" the printf() function, using the conversion specifier d, into treating the number in variable n as signed.
Use the type conversion specifier u instead: printf ("%u",n);
... never reach a negative number and continue on an infinite loop of 0's?
No. “Never reaching a negative number” is not the same as “stopping at 0 and resisting further decrementing”.
Other people already explained this. Here is my explanation, in the form of analogies:
Imagine a sequence of non-negative integers with no beginning and no end:
..., 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, ... // the biggest is 3 only for simplicity
or the numbers on an analog clock face, where counting past 12 wraps around to 1.
You may increase / decrease a number forever, going round and round.
The terms signed and unsigned refer to how the CPU treats sequences of bits.
There are 2 important things to understand here:
How the CPU adds two finite sequences of bits into a single finite-width result
How the CPU differentiates between signed and unsigned operands
Let's start with (1).
Let's take 4-bit nibbles for example.
If we ask the CPU to add 0001 and 0001, the result should be 2, or 0010.
But if we ask it to add 1111 and 0001, the result should be 16, or 10000. But it only has 4 bits to contain the result. The convention is to wrap around, or circle, back to 0, effectively ignoring the most significant bit. See also integer overflow.
Why is this relevant? Because it produces an interesting result. That is, according to the definition above, if we let x = 1111, then we get x + 1 = 0. Well, x, or 1111, now looks and behaves awfully like -1. This is the birth of signed numbers and operations. And if 1111 can be deemed as -1, then 1111 - 1 = 1110 should be -2, and so on.
Now let's look at (2).
When the C compiler sees you defining an unsigned int, it will use special CPU instructions for dealing with unsigned numbers, where it deems relevant. For example, this is relevant in jump instructions, where the CPU needs to know if you mean it to jump way forward, or slightly backward. For this it needs to know if you mean your operand to be interpreted in a signed, or unsigned way.
The operation of adding two numbers, on the other hand, is fundamentally oblivious to the consequent interpretation. The only thing is that the CPU will turn on a special flag after an addition operation, to tell you whether a wrap-around has occurred, for your own auditing.
But the important thing to understand is that the sequence of bits doesn't change; only its interpretation.
To tie all of this to your example, subtracting 1 from an unsigned 0 will simply wrap around to all 1 bits: 1111 in the 4-bit example, or 2^32 - 1 = 4294967295 for the 32-bit unsigned int in your case.
Finally, there are other uses for signed/unsigned. For example, by the very fact it is defined as a different type, this allows functions to be written that define a contract where only unsigned integers, let's say, can be passed to it. Also, it's relevant when you want to display or print the number.
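A two-line experiment confirms the wrap-around (assuming a 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    unsigned int n = 0;
    n = n - 1;           /* well-defined wrap-around: 0 - 1 becomes UINT_MAX */
    printf("%u\n", n);   /* prints 4294967295 */
    return 0;
}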

If a C signed integer type is stored in 22 bits, what is the smallest value it can store?

I am learning about data allocation and am a little confused.
If you are looking for the smallest or greatest value that can be stored in a certain number of bits then does it matter what the data type is?
Wouldn't the smallest or biggest number that could be stored in 22 bits be 22 1's, positive or negative? Is the first part of this question a red herring? Wouldn't the smallest value be -4194303?
A 22-bit data element can store any one of 2^22 distinct values. What those values actually mean is a matter of interpretation. That interpretation may be imposed by a compiler or some piece of hardware, or may be under the control of the programmer, and suit some specific application.
A simple interpretation, of course, would be to treat the 22 bits as an unsigned integer, with values from 0 to (2^22)-1. A two's complement signed integer is a slightly more sophisticated interpretation of the same bits. Or you (or the compiler, or CPU) could divide the 22 bits up into a mantissa and an exponent, and store a floating-point number; the range and precision would depend on how many bits were allocated to the mantissa, and how many to the exponent.
Or you could split the bits up and use some for the numerator and some for the denominator of a fraction. Or, in fact, anything else.
Some of these interpretations of the bits are built into hardware, some are implemented by compilers or libraries, and some are entirely under the programmer's control. Not all programming languages allow the programmer to manipulate individual bits in a natural or efficient way, but some do. Sometimes, using a highly unconventional interpretation of binary data can give significant efficiency gains, but usually at the expense of readability and maintainability.
So, yes, it matters what the data type is.
There is no law (of humans, logic, or nature) that says bits must represent numbers only in the pattern where one bit represents 2^0, another represents 2^1, another 2^2, and so on (with the number represented being the sum of those values for the bits that are 1). We have choices about how to use bits to represent numbers, including:
The bits do use that pattern, and so 22 bits can represent any number from 0 to the sum of 2^0 + 2^1 + 2^2 + … + 2^21 = 2^22 − 1 = 4,194,303. The smallest representable value is 0.
The bits mostly use that pattern, but it is modified so that one bit represents −2^21 instead of +2^21. This is called two’s complement, and the smallest representable value is −2^21 = −2,097,152.
The bits represent numbers as described above except the represented value is divided by 1000. This is called fixed-point. In the first case, the value represented by all bits 1 would be 4194.303, but the smallest representable value would be 0. With a combination of two’s complement and fixed-point scaled by 1/1000, the smallest representable value would be −2097.152.
The bits represent a floating-point number, where one bit represents a sign (+ or −), certain bits represent an exponent and other information, and the remaining bits represent a significand. In common floating-point formats, when all the bits in that exponent-and-other field are 1s and the significand field bits are 0s, the number represents +∞ or −∞, according to the sign bit. In such a format, the smallest representable value is −∞.
As an example, we could designate patterns of bits to represent numbers arbitrarily. We could say that 0000000000000000000000 represents 34, 0000000000000000000001 represents −15, 0000000000000000000010 represents 5, 0000000000000000000011 represents 3+4i, and so on. The smallest representable value would be whichever of those arbitrary values is smallest.
So what the smallest representable value is depends entirely on the type, since the “type” of the data includes the scheme by which the bits represent values.
If the type is a “signed integer type,” there is still some flexibility in the representation. Most modern C implementations (and other programming languages) use the two’s complement scheme described above. But the C standard still allows two other schemes:
One’s complement: If the first bit is 1, the value represented is negative, and its magnitude is given by complementing the remaining bits and interpreting them as binary. Using six bits for an example, 101001 would be negative with the magnitude of binary 10110 = 22, so −22.
Sign-and-magnitude: If the first bit is 1, the value represented is negative, and its magnitude is given by interpreting the remaining bits as binary. Using the same bits, 101001 would be negative with the magnitude of binary 01001 = 9, so −9.
In both one’s complement and sign-and-magnitude, the smallest representable value with 22 bits is −(2^21 − 1) = −2,097,151.
To stretch the question further, C defines standard integer types but allows implementations to extend the language. An implementation could define some “signed integer type” with an arbitrary scheme for representing numbers, as long as that scheme included a sign, to make the name correct.
Without going into technical jargon about doing maths with two's complement, I'll try to explain in easy words.
First you need to raise 2 to the power of the number of bits.
Let's take an example of an 8-bit type:
An unsigned 8-bit integer can store 2 ^ 8 = 256 values.
Since values start from 0, they range from 0 to 255.
Assuming you want to store signed values, you need to take half (simply divide by 2):
256 / 2 = 128.
Remember we start from zero.
You might think you can store -127 to +127, counting a zero on each side.
Just know that there is only one zero (there is nothing like +0 or -0),
so the positive half runs from 0 to 127,
which leaves the negative half starting from -1 and going down to -128.
Hence the range will be -128 to 127.
For a 22 bit signed integer you can do the math,
2 ^ 22 = 4,194,304
4194304 / 2 = 2,097,152
subtract 1 on the positive side (to make room for zero):
the range will be -2097152 to 2097151.
To answer your question,
-2097152 would be the smallest number you can store.
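A quick way to check this on a real compiler is a 22-bit signed bit-field (a sketch; assumes a two's complement implementation, which every mainstream compiler uses):

#include <stdio.h>

struct s22 { signed int v : 22; };   /* a 22-bit signed field */

int main(void)
{
    struct s22 x;
    x.v = -2097152;       /* -(2^21), the smallest storable value */
    printf("%d\n", x.v);  /* prints -2097152 */
    x.v = 2097151;        /* 2^21 - 1, the largest storable value */
    printf("%d\n", x.v);  /* prints 2097151 */
    return 0;
}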
Thanks everyone for the replies. I figured it out with the help of all of your info, but I will explain the answer to show exactly what gaps of knowledge led to my misunderstanding.
The data type does matter in this question because for signed data types the first (leftmost) bit is used to represent whether the number is positive or negative. In 4-bit two's complement, 0111 = 7 and 1111 = -1.
signed int and unsigned int use the same number of bits, typically 32. Since an unsigned int is unsigned, the first bit isn't used to represent the sign, so it can represent larger numbers with that extra bit: 1111 read as a 4-bit unsigned value is 15, whereas read as a signed value it is -1, since the leftmost bit gives the sign (1 is negative and 0 is positive).
Now to answer "If a C signed integer type is stored in 22 bits, what is the smallest value it can store?":
If you convert the all-ones pattern to decimal you get 1111111111111111111111 = 4,194,303, which is 2^22 - 1, the maximum value an unsigned 22-bit field could hold.
Since our data type is signed it has to use one bit for the sign, leaving 21 bits for the magnitude. This gives us -2^21 = -2,097,152 as the smallest value.
Thanks again, everyone.

C Bitwise Operators Example

given the following function:
int boof(int n) {
    return n + ~n + 1;
}
What does this function return? I'm having trouble understanding exactly what is being passed in to it. If I called boof(10), would it convert 10 to base 2, and then do the bitwise operations on the binary number?
This was a question I had on a quiz recently, and I think the answer is supposed to be 0, but I'm not sure how to prove it.
note: I know how each bitwise operator works, I'm more confused on how the input is processed.
Thanks!
When n is an int, n + ~n will always result in an int that has all bits set.
Strictly speaking, the behavior of adding 1 to such an int will depend on the representation of signed numbers on the platform. The C standard supports three representations for signed int:
On two's complement machines (the vast majority of systems in use today), the result will be 0, since an int with all bits set is -1.
On a one's complement machine (pretty rare today), the result will be 1, since an int with all bits set is -0 (negative zero); although negative zero may also be a trap representation, in which case the behavior is undefined.
On a sign-and-magnitude machine (are there really any of these still in use?), an int with all bits set is the negative number with the maximum magnitude (the actual value depends on the size of an int), so adding 1 to it will result in a negative number whose exact value, again, depends on the number of bits used to represent an int.
Note that the above ignores the possibility that some implementations trap on certain bit patterns that n + ~n can produce.
Bitwise operations do not "convert" the number to base 2; numbers are already stored in binary, and all math on the CPU is done on those bits regardless.
What this function does is take n and add to it the two's complement negation of itself (~n + 1 is -n). Adding a number to its own negation always yields 0, whatever you put in.
Let me explain with 8 bit numbers as this is easier to visualize.
10 is represented in binary as 00001010.
Negative numbers are stored in two's complement (NOTing the number and adding 1)
So the (~n + 1) portion for 10 looks like so:
11110101 + 1 = 11110110
So if we take n + ~n+1:
00001010 + 11110110 = 0
Notice that when we add these numbers together we get a carry out of the top bit, which sets the carry flag (not the overflow flag); adding a negative and a positive number can never cause signed overflow. The truncated 8-bit result is 0.
See: The CARRY and OVERFLOW flag in Binary Arithmetic
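A minimal program to confirm the two's complement case described above:

#include <stdio.h>

int boof(int n) {
    return n + ~n + 1;   /* ~n + 1 is -n in two's complement */
}

int main(void)
{
    printf("%d\n", boof(10));    /* prints 0 */
    printf("%d\n", boof(-123));  /* prints 0 */
    return 0;
}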

Bits representation of negative numbers

This is a question regarding the bit representation of signed integers. For example, when you want to represent -1, it is the two's complement of +1, so -1 is represented as 0xFFFFFFFF (for a 32-bit int). Now when I shift my number right by 31 and print the result it comes back as -1.
signed int a = -1;
printf("The number is %d", (a >> 31)); // this prints -1
So can anyone please explain to me how the bits are represented for negative numbers?
Thanks.
When the top bit is zero, the number is positive. When it's 1, the number is negative.
Negative numbers shifted right keep shifting a "1" in as the topmost bit to keep the number negative. That's why you're getting that answer.
For more about two's complement, see this Stackoverflow question.
@Stobor points out that some C implementations could shift 0 into the high bit instead of 1. [Verified in Wikipedia.] In Java it's dependably an arithmetic shift.
But the output given by the questioner shows that his compiler is doing an arithmetic shift.
The C standard leaves it implementation-defined whether the right shift of a negative (necessarily signed) integer shifts zeroes (logical shift right) or copies of the sign bit (arithmetic shift right) into the most significant bits. It is up to the implementation to choose.
Consequently, portable code ensures that it does not perform right shifts on negative numbers. Either it converts the value to the corresponding unsigned value before shifting (which is guaranteed to use a logical shift right, putting zeroes into the vacated bits), or it ensures that the value is positive, or it tolerates the variation in the output.
This is an arithmetic shift operation, which preserves the sign bit while shifting the value bits of a signed number.
cheers
Basically there are two types of right shift: an unsigned (logical) right shift and a signed (arithmetic) right shift. An unsigned right shift shifts the bits to the right, causing the least significant bit to be lost and the most significant bit to be replaced with a 0. With a signed right shift, the bits are shifted to the right, the least significant bit is lost, and the most significant bit is preserved. A signed right shift divides the number by a power of two (corresponding to the number of places shifted), whereas an unsigned shift is a purely logical shifting operation.
The ">>" operator performs an unsigned right shift when the data type it operates on is unsigned, and a signed right shift when the type is signed. So what you need to do is cast the value to an unsigned integer type before performing the bit manipulation, to get the desired result.
Have a look at two's complement description. It should help.
EDIT: When the below was written, the code in the question was written as:
unsigned int a = -1;
printf(("The number is %d ",(a>>31));//this prints as -1
If unsigned int is at least 32 bits wide, then your compiler isn't really allowed to produce -1 as the output of that (with the small caveat that you should be casting the unsigned value to int before you pass it to printf).
Because a is an unsigned int, assigning -1 to it must give it the value of UINT_MAX (as the smallest non-negative value congruent to -1 modulo UINT_MAX+1). As long as unsigned int has at least 32 bits on your platform, the result of shifting that unsigned quantity right by 31 will be UINT_MAX divided by 2^31, which has to fit within int. (If unsigned int is 31 bits or shorter, the shift is undefined behavior, so it can produce whatever it likes.)

Integer overflow problem

Please explain the following paragraph.
"The next question is whether we can assign a certain value to a variable without losing precision. It is not sufficient if we just check for overflow during the addition or subtraction, because someone might add 1 to -5 and assign the result to an unsigned int. Then the actual addition does not overflow, but the result still does not fit."
When I am adding 1 to -5 I don't see any reason to worry; the answer is, as it should be, -4.
So what is the problem with the result not fitting?
You can find the full article I was reading here:
http://www.fefe.de/intof.html
The binary representation of -4, in a 32-bit word, is as follows (hex notation)
0xfffffffc
When interpreted as an unsigned integer, this bit pattern represents the number 2^32 - 4, or 4,294,967,292. I'm not sure I would call this phenomenon "overflow", but it is a common mistake to assign a small negative integer to a variable of unsigned type and wind up with a really big positive integer.
This trick is actually exploited for bounds checking: if you have a signed integer i and want to know if it is in the range 0 <= i < n, you can test
if ((unsigned)i < n) { ... }
which gives you the answer using one comparison instead of two. The cast to unsigned has no run-time cost; it just tells the compiler to generate an unsigned comparison instead of a signed comparison.
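As a sketch, wrapped in a hypothetical helper (the name in_range is mine, not from the answer):

#include <stdio.h>

/* Returns 1 if i is a valid index into an array of length n.
   One unsigned comparison covers both i < 0 and i >= n, because
   a negative i converts to a huge unsigned value. */
int in_range(int i, unsigned int n) {
    return (unsigned int)i < n;
}

int main(void)
{
    printf("%d\n", in_range(3, 10));   /* 1 */
    printf("%d\n", in_range(-1, 10));  /* 0: -1 becomes UINT_MAX */
    printf("%d\n", in_range(10, 10));  /* 0 */
    return 0;
}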
Try assigning it to an unsigned int, not an int.
The term unsigned int is the key: by default an int datatype will hold negative and positive numbers, but unsigned ints are always non-negative. They provide this option because unsigned ints can technically hold greater positive values than regular signed ints, since they do not need to use a bit to keep track of whether the value is negative or positive.
Please see:
Signed versus Unsigned Integers
The problem is that you're storing -4 in an unsigned int. Unsigned ints can only contain zero and positive values. If you assign -4 to one, you'll actually end up getting a very large positive number (the actual value depends on how wide an int you're using).
The problem is that the sizes of storage such as unsigned int can only hold so much. With 1 and -5 it does not matter, but with 1 and -500000000 you might end up with a confusing result. Also, unsigned storage will interpret anything stored in it as positive, so you cannot put a negative value in an unsigned variable.
Two big things to watch out for:
1. Overflow in the operation itself: 1 + -500000000
2. Issues in casting: (unsigned int)(1 + -500)
Unsigned variables, like unsigned int, cannot hold negative values. So assigning 1 - 5 to an unsigned int won't give you -4; the conversion is actually well-defined in C, reducing the value modulo UINT_MAX + 1, so you get a very large positive number.
Some code:
signed int first, second;
unsigned int result;
first = obtain(); // happens to be 1
second = obtain(); // happens to be -5
result = first + second; // unexpected result here - very large number - and it's too late to check that there's a problem
Say you obtained those values from keyboard. You need to check before addition that the result can be represented in unsigned int. That's what the article talks about.
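Here is one way such a pre-check might look (a sketch; the helper name sum_fits_unsigned is mine, not from the article):

#include <limits.h>
#include <stdio.h>

/* Returns 1 if first + second neither overflows int nor produces a
   value that cannot be stored in an unsigned int (i.e. a negative one). */
int sum_fits_unsigned(int first, int second)
{
    if (second > 0 && first > INT_MAX - second) return 0; /* would overflow */
    if (second < 0 && first < INT_MIN - second) return 0; /* would underflow */
    return first + second >= 0;  /* negative results don't fit in unsigned */
}

int main(void)
{
    printf("%d\n", sum_fits_unsigned(1, -5));  /* 0: result -4 is negative */
    printf("%d\n", sum_fits_unsigned(1, 5));   /* 1 */
    return 0;
}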
By definition the number -4 cannot be represented in an unsigned int. -4 is a signed integer. The same goes for any negative number.
When you assign a negative integer to an unsigned int the actual bits of the number do not change, but they are merely represented differently. You'll get some ridiculously-large number due to the way integers are represented in binary (two's complement).
In two's complement, -4 is represented as 0xfffffffc. When 0xfffffffc is represented as an unsigned int you'll get the number 4,294,967,292.
You have to remember that fundamentally you're working with bits. So you can assign a value of -4 to an unsigned integer and this will place a series of bits into that memory location. Those bits can be interpreted as -4 in certain circumstances. One such circumstance is the obvious one: you've told the compiler/system that the bits in that memory location should be interpreted as a two's complement signed number. So if you do printf("%d", i), printf does its magic and converts the two's complement number to a magnitude and sign. The magnitude will be 4 and the sign will be negative, so it displays '-4'.
However, if you tell the compiler that the data at that memory location is not signed then the bits don't change but instead their interpretation does. So when you do your addition, store the result in an unsigned integer memory location and then call printf on the result it doesn't bother looking for the sign because by definition it is always positive. It calculates the magnitude and prints it. The magnitude will be off because the sign information is still encoded in the bits but it's treated as magnitude information.
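A short program demonstrating the conversion described above (assuming a 32-bit unsigned int):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned int u = -4;           /* converted: reduced modulo UINT_MAX + 1 */
    printf("%u\n", u);             /* prints 4294967292 */
    printf("%u\n", UINT_MAX - 3);  /* the same value */
    return 0;
}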
