Given the following function:
int boof(int n) {
return n + ~n + 1;
}
What does this function return? I'm having trouble understanding exactly what is being passed in to it. If I called boof(10), would it convert 10 to base 2, and then do the bitwise operations on the binary number?
This was a question I had on a quiz recently, and I think the answer is supposed to be 0, but I'm not sure how to prove it.
note: I know how each bitwise operator works, I'm more confused on how the input is processed.
Thanks!
When n is an int, n + ~n will always result in an int that has all bits set.
Strictly speaking, the behavior of adding 1 to such an int depends on the representation of signed numbers on the platform. The C standard supports three representations for signed int:
For two's-complement machines (the vast majority of systems in use today), the result will be 0, since an int with all bits set is -1.
On a one's-complement machine (which are pretty rare today, I believe), an int with all bits set is negative zero (-0), so the result will be 1; alternatively, all-bits-set may be a trap representation, in which case the behavior is undefined.
On a sign-magnitude machine (are there really any of these still in use?), an int with all bits set is the negative number of maximum magnitude, so the actual value will depend on the size of an int. In this case, adding 1 to it will result in a negative number (the exact value, again, depends on the number of bits used to represent an int).
Note that the above ignores the possibility that some implementations trap on certain bit patterns that n + ~n can produce.
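As a quick sanity check on a two's-complement machine (almost certainly what you're on), a minimal test program prints 0 for every input:

#include <stdio.h>

int boof(int n) {
    return n + ~n + 1;
}

int main(void) {
    /* On two's complement, ~n == -n - 1, so n + ~n == -1 and adding 1 gives 0. */
    printf("%d %d %d\n", boof(10), boof(-7), boof(0)); /* prints: 0 0 0 */
    return 0;
}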
Bitwise operations don't "convert" the number to base 2; the number is already stored in binary, and all math on the CPU is done on those bits regardless.
What this function does is add n to the two's-complement negation of itself (~n + 1 is exactly how two's-complement negation is computed). Since n + (-n) = 0, the result is 0 for any input.
Let me explain with 8 bit numbers as this is easier to visualize.
10 is represented in binary as 00001010.
Negative numbers are stored in two's complement (invert the bits, then add 1).
So the (~n + 1) portion for 10 looks like so:
11110101 + 1 = 11110110
So if we take n + (~n + 1):
00001010 + 11110110 = 1 00000000
Notice that adding these numbers produces a carry out of the most significant bit, which sets the carry flag, and the 8 bits that remain are all 0. (It is the carry flag, not the overflow flag: adding a positive and a negative number can never overflow.)
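You can reproduce the 8-bit arithmetic above in C with uint8_t, where the carry out of bit 7 is discarded automatically (a small sketch, not part of the original question):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t n   = 10;                  /* 00001010 */
    uint8_t neg = (uint8_t)(~n + 1);   /* 11110110, the two's complement of 10 */
    uint8_t sum = (uint8_t)(n + neg);  /* carry out of bit 7 is discarded */
    printf("%d\n", sum);               /* prints 0 */
    return 0;
}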
See: The CARRY and OVERFLOW flag in Binary Arithmetic
Could anyone help me understand the difference between signed/unsigned int, as well as signed/unsigned char? In this case, if it's unsigned wouldn't the value just never reach a negative number and continue on an infinite loop of 0's?
#include <stdio.h>

int main()
{
    unsigned int n = 3;
    while (n >= 0)
    {
        printf ("%d",n);
        n = n - 1;
    }
    return 0;
}
Two important things:
At one level, the difference between signed and unsigned values is just the way we interpret the bits. If we limit ourselves to 3 bits, we have:
bits    signed    unsigned
000        0         0
001        1         1
010        2         2
011        3         3
100       -4         4
101       -3         5
110       -2         6
111       -1         7
The bit patterns don't change; it's just a matter of interpretation whether we have them represent nonnegative integers from 0 to 2^N - 1, or signed integers from -2^(N-1) to 2^(N-1) - 1.
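To see "same bits, different interpretation" directly, store one bit pattern and read it through both a signed and an unsigned type (a sketch using 8-bit types for brevity; the int8_t result assumes a two's-complement machine):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t bits = 0xFF;           /* 11111111 */
    printf("%d\n", bits);          /* 255: unsigned interpretation */
    printf("%d\n", (int8_t)bits);  /* -1: signed (two's complement) interpretation */
    return 0;
}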
The other important thing to know is what operations are defined on a type. For unsigned types, addition and subtraction are defined so that they "wrap around" from 0 to 2N-1. But for signed types, overflow and underflow are undefined. (On some machines they wrap around, but not all.)
Finally, there's the issue of properly matching up your printf formats. For %d, you're supposed to give it a signed integer, but you gave it an unsigned one instead. Strictly speaking, that results in undefined behavior, too, but in this case (and not too surprisingly), what happened was that it took the same bit pattern and printed it out as if it were signed, rather than unsigned.
wouldn't the value just never reach a negative number
Correct, it can't be negative.
and continue on an infinite loop of 0's
No, it will wrap around from zero to the largest value of an unsigned int, which is well-defined behavior. If you use the correct conversion specifier %u instead of the incorrect %d, you'll see this output:
3
2
1
0
4294967295
4294967294
...
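For completeness, if the intent was to print 3, 2, 1, 0 and then stop, one common fix is to test before decrementing (a minimal sketch):

#include <stdio.h>

int main(void) {
    unsigned int n = 3;
    while (1) {
        printf("%u\n", n);  /* %u is the correct specifier for unsigned int */
        if (n == 0)
            break;          /* stop before wrapping around to UINT_MAX */
        n = n - 1;
    }
    return 0;
}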
Signed representations categorize both positive and negative integers, while unsigned representations categorize only nonnegative integers; the code you wrote will run forever because n is unsigned, so it always holds a nonnegative value and n >= 0 is always true.
In this case, if it's unsigned wouldn't the value just never reach a negative number ...?
You are right. But in the statement printf ("%d",n); you "deceived" the printf() function, by using the conversion specifier d, into treating the number in variable n as signed.
Use the type conversion specifier u instead: printf ("%u",n);
... never reach a negative number and continue on an infinite loop of 0's?
No. "Never reaching a negative number" is not the same as "stopping at 0 and resisting further decrements".
Other people already explained this. Here is my explanation, in the form of analogies:
Imagine yourself a never ending and never beginning sequence of non-negative integers:
..., 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, ... // the biggest is 3 only for simplicity
or like the numbers on an analog clock: you may increase or decrease a number forever, going round and round.
The terms signed and unsigned refer to how the CPU treats sequences of bits.
There are 2 important things to understand here:
How the CPU adds finite sequences of bits to a single finite result
How the CPU differentiates between signed and unsigned operands
Let's start with (1).
Let's take 4-bit nibbles for example.
If we ask the CPU to add 0001 and 0001, the result should be 2, or 0010.
But if we ask it to add 1111 and 0001, the true result should be 16, or 10000, yet there are only 4 bits to contain the result. The convention is to wrap around, or circle back to 0, effectively discarding the carry out of the most significant bit. See also integer overflow.
Why is this relevant? Because it produces an interesting result. That is, according to the definition above, if we let x = 1111, then we get x + 1 = 0. Well, x, or 1111, now looks and behaves awfully like -1. This is the birth of signed numbers and operations. And if 1111 can be deemed as -1, then 1111 - 1 = 1110 should be -2, and so on.
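You can mimic the 4-bit wrap-around in C by masking the result to the low nibble (a sketch; the & 0xF keeps only 4 bits):

#include <stdio.h>

int main(void) {
    unsigned x   = 0xF;            /* 1111 */
    unsigned sum = (x + 1) & 0xF;  /* 10000 truncated back to 4 bits */
    printf("%u\n", sum);           /* 0: so 1111 behaves just like -1 */
    return 0;
}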
Now let's look at (2).
When the C compiler sees you defining an unsigned int, it will use special CPU instructions for dealing with unsigned numbers, where it deems relevant. For example, this is relevant in jump instructions, where the CPU needs to know if you mean it to jump way forward, or slightly backward. For this it needs to know if you mean your operand to be interpreted in a signed, or unsigned way.
The operation of adding two numbers, on the other hand, is fundamentally oblivious to the consequent interpretation. The only thing is that the CPU will turn on a special flag after an addition operation, to tell you whether a wrap-around has occurred, for your own auditing.
But the important thing to understand is that the sequence of bits doesn't change; only its interpretation.
To tie all of this back to your example: subtracting 1 from an unsigned 0 simply wraps around back to all ones (1111...1), which is 2^32 - 1 in your case, i.e. UINT_MAX, or 4294967295 for a 32-bit unsigned int.
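A two-line check of that wrap-around:

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int zero = 0;
    printf("%u\n", zero - 1u);  /* wraps to 4294967295 with 32-bit unsigned int */
    printf("%u\n", UINT_MAX);   /* same value */
    return 0;
}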
Finally, there are other uses for signed/unsigned. For example, by the very fact it is defined as a different type, this allows functions to be written that define a contract where only unsigned integers, let's say, can be passed to it. Also, it's relevant when you want to display or print the number.
I've done some quick tests that a signed int to unsigned int cast in C does not change the bit values (on an online debugger).
What I want to know is whether it is guaranteed by a C standard or just the common (but not 100% sure) behaviour ?
Conversion from signed int to unsigned int does not change the bit representation on two's-complement C implementations, which are by far the most common; it does change the bit representation of negative numbers (including any negative zeroes) on one's-complement or sign-and-magnitude systems.
This is because the cast (unsigned int)a is not defined to retain the bits; rather, the result is the nonnegative remainder of dividing a by UINT_MAX + 1. Or, as the C standard (C11 6.3.1.3p2) says,
the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
The two's complement representation for negative numbers is the most commonly used representation for signed numbers exactly because it has this property: a negative value n maps to the same bit pattern as the mathematical value n + UINT_MAX + 1. That makes it possible to use the same machine instructions for signed and unsigned addition, and negative numbers work because of the wraparound.
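A small sketch of that rule: converting -5 must yield -5 + UINT_MAX + 1, whatever the machine's representation:

#include <stdio.h>
#include <limits.h>

int main(void) {
    int i = -5;
    unsigned int u = (unsigned int)i;
    /* C11 6.3.1.3p2 guarantees u == i + UINT_MAX + 1 for negative i. */
    printf("%u\n", u);              /* 4294967291 with 32-bit unsigned int */
    printf("%u\n", UINT_MAX - 4u);  /* same value */
    return 0;
}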
Casting from a signed to an unsigned integer is required to generate the correct arithmetic result (the same number), modulo the size of the unsigned integer, so to speak. That is, after
int i = anything;
unsigned int u = (unsigned int)i;
and on a machine with 32-bit ints, the requirement is that u is equal to i, modulo 2^32.
(We could also try to say that u receives the value i % 0x100000000, except it turns out that's not quite right, because the C rules say that when you divide a negative integer by a positive integer, you get a quotient rounded towards 0 and a negative remainder, which isn't the kind of modulus we want here.)
If i is 0 or positive, it's not hard to see that u will have the same bit pattern.
If i is negative, and if you're on a 2's complement machine, it turns out the result is also guaranteed to have the same bit pattern. (I'd love to present a nice proof of that result here, but I don't have time just now to try to construct it.)
The vast majority of today's machines use 2's complement. But if you were on a 1's complement or sign/magnitude machine, I'm pretty sure the bit patterns would not always be the same.
So, bottom line, the sameness of the bit patterns is not guaranteed by the C Standard, but arises due to a combination of the C Standard's requirements, and the particulars of 2's complement arithmetic.
If I have a normal 32 bit int in C, how is the number represented in memory? Specifically, how is the sign stored?
At first I thought it would be sign and magnitude but then I remembered there would be + and - 0, so then I thought it could be stored in two's complement, however when using two's complement I ended up getting a maximum value of 4294967293 with the MSB set to 0 (I got this value by summing up everything from 2^31 to 2^0) which as we all know is wrong.
The C standard doesn't mandate two's complement, but it's by far the most common option (and I think even this is an understatement). The reason you arrive at 4-billion-something, not at the correct INT_MAX, is an off-by-one error: you added the 32nd bit as well. 2^31 is the value of the MSB (because the LSB has value 2^0), so the correct value is the sum 2^0 + ... + 2^30 = 2^31 - 1.
This is implementation-defined; most machines use two's complement.
The problem is that your computation is wrong. Summing up 2^0 to 2^31 (which gives 2^32 - 1, by the way) means that you use 32 bits. That is not correct: only 31 bits are used for the actual magnitude. The MSB has a special meaning related to the sign (it is not literally "the sign"). So you need to sum from 2^0 to 2^30, which gives 2^31 - 1.
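To check the arithmetic (a sketch assuming a 32-bit int):

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* 2^0 + 2^1 + ... + 2^30 == 2^31 - 1, which is INT_MAX for a 32-bit int. */
    long long sum = 0;
    for (int k = 0; k <= 30; k++)
        sum += 1LL << k;
    printf("%lld\n", sum);    /* 2147483647 */
    printf("%d\n", INT_MAX);  /* 2147483647 */
    return 0;
}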
This is a macro in the lwIP source code:
#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)
Which is used to check if a TCP sequence number is less than another, taking into account when the sequence numbers wrap around. It exploits the fact that arithmetic wraps around, but I am unable to understand how this works in this particular case.
Can anyone explain what happens and why the above works ?
Take a simple 4 bit integer example where a = 5 and b = 6. The binary representation of each will be
a = 0101
b = 0110
Now when we subtract these (computed as a + ~b + 1, i.e. adding the two's complement of b to a), we get the following:
0101
1001
+ 1
-----
1111
1111 is equal to 15 (unsigned) or -1 (signed, again translated using two's complement). By casting the two numbers to unsigned, we ensure that if b > a, the difference between the two is going to be a large unsigned number with its highest bit set. When translating this large unsigned number into its signed counterpart, we will always get a negative number due to the set MSB.
As nos pointed out, when a sequence number wraps around from the max unsigned value back to the min, the macro will also return that the max value is < min using the above arithmetic, hence its usefulness.
On wrap-around a will be much greater than b. If we subtract them the result will also be very large, ie have its high-order bit set. If we then treat the result as a signed value the large difference will turn into a negative number, less than 0.
If you had 2 sequence numbers 2G apart it wouldn't work, but that's not going to happen.
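To watch the macro handle wrap-around, try sequence numbers on either side of the 32-bit boundary (a sketch; the uint32_t-to-int32_t conversion is implementation-defined in general, but behaves as shown on the two's-complement targets lwIP runs on):

#include <stdio.h>
#include <stdint.h>

#define TCP_SEQ_LT(a,b) ((int32_t)((uint32_t)(a) - (uint32_t)(b)) < 0)

int main(void) {
    uint32_t a = 0xFFFFFFF0u;  /* just before the wrap-around */
    uint32_t b = 0x00000010u;  /* just after the wrap-around  */
    printf("%d\n", TCP_SEQ_LT(a, b));  /* 1: a is "less than" b across the wrap */
    printf("%d\n", TCP_SEQ_LT(b, a));  /* 0 */
    return 0;
}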
Because the difference is first cast to a signed integer before it is compared to zero. Remember that the first bit (reading from left to right) determines the sign in a signed number, whereas in an unsigned int it is used to extend the range by one extra bit.
Example: say you have the 4-bit pattern 1001. As an unsigned number this is 9, but as a signed (two's complement) integer it is -7.
Now let's say we did b0010 - b0100. This ends up with b1110. Unsigned this is 14, and signed this is -2.
What do arithmetic underflow and overflow mean in C programming?
Overflow
From http://en.wikipedia.org/wiki/Arithmetic_overflow:
the condition that occurs when a calculation produces a result that is greater in magnitude than that which a given register or storage location can store or represent.
So, for instance:
#include <stdint.h>

uint32_t x = 1UL << 31;
x *= 2; // Overflow!
Note that as R. mentions in a comment below, the C standard suggests:
A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.
Of course, this is a fairly idiosyncratic definition of "overflow". Most people would refer to modulo reduction (i.e. wrap-around) as "overflow".
Underflow
From http://en.wikipedia.org/wiki/Arithmetic_underflow:
the condition in a computer program that can occur when the true result of a floating point operation is smaller in magnitude (that is, closer to zero) than the smallest value representable as a normal floating point number in the target datatype.
So, for instance:
float x = 1e-30;
x /= 1e20; // Underflow!
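To observe the underflow, compare against FLT_MIN, the smallest normal float (a sketch assuming IEEE 754 single precision):

#include <stdio.h>
#include <float.h>

int main(void) {
    float x = 1e-30f;
    x /= 1e20f;                     /* true result 1e-50 is far below FLT_MIN */
    printf("%g %g\n", x, FLT_MIN);  /* x has underflowed all the way to 0 */
    return 0;
}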
Computers use only 0s and 1s to represent data, so the range of values that can be represented is limited. Many computers use 32 bits to store integers; the largest unsigned integer that can be stored in that case is 2^32 - 1 = 4294967295. For signed integers, though, the first bit is used to represent the sign, so the largest value is 2^31 - 1 = 2147483647.
The situation where an integer falls outside the allowed range, so that it would require more bits than can be stored, is called an overflow.
Similarly, with real numbers, an exponent that is too small to be stored causes an underflow.
int, the most common data type in C, is typically a 32-bit data type (the exact width is implementation-defined). This means that each int is given 32 bits in memory. If I had the variable
int a = 2;
that would actually be represented in memory as a 32-bit binary number:
00000000000000000000000000000010.
If you have two binary numbers such as
10000000000000000000000000000000
and
10000000000000000000000000000000,
their sum would be 100000000000000000000000000000000, which is 33 bits long. However, the computer keeps only the 32 least significant bits, which are all 0. The hardware recognizes that the true sum is greater than what can be stored in 32 bits and flags the condition as an overflow.
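In C, with unsigned arithmetic, that extra bit is simply discarded (a sketch assuming a 32-bit unsigned int):

#include <stdio.h>

int main(void) {
    unsigned int a = 0x80000000u;  /* 10000000000000000000000000000000 */
    unsigned int sum = a + a;      /* true sum needs 33 bits; the low 32 are kept */
    printf("%u\n", sum);           /* 0 */
    return 0;
}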
An underflow is basically the same thing happening in the opposite direction. The single-precision floating-point format commonly used for C's float has a 23-bit fraction and a limited exponent range; if a result is smaller in magnitude than the smallest value the format can represent, its low-order bits cannot be stored. This results in an underflow and/or loss of precision.
Underflow depends exclusively upon the given algorithm and the given input data, and hence the programmer has no direct control over it. Overflow, on the other hand, depends upon the programmer's choice of how much memory space is reserved for each value, and this choice does influence the number of times overflow may occur.