I read in the C spec that unsigned variables (in particular unsigned short int) perform a so-called wrap-around on integer overflow, but I couldn't find anything about signed variables except that I am left with undefined behavior.
My professor told me that their values also wrap around (maybe he just meant gcc). I thought the bits just get truncated and the bits I am left with give me some weird value!
What is wrap-around, and how is it different from just truncating bits?
Signed integer variables do not have wrap-around behavior in the C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note, by the way, that the GCC compiler you mentioned is known for implementing strict overflow semantics in optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior: GCC assumes that signed integer values never wrap around. That means that GCC actually happens to be one of the compilers in which you cannot rely on wrap-around behavior of signed integer types.
For example, GCC compiler can assume that for variable int i the following condition
if (i > 0 && i + 1 > 0)
is equivalent to a mere
if (i > 0)
This is exactly what strict overflow semantics means.
Unsigned integer types implement modulo arithmetic. The modulus is equal to 2^N, where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.
However, the C language never performs arithmetic computations in domains smaller than that of int/unsigned int. The type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). This means that 1) computations with unsigned short int will be performed in the domain of int, with overflow happening when int overflows, and 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.
For example, this code produces a wrap around
unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */
while this code
unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */
leads to undefined behavior.
If no int overflow happens and the result is converted back to an unsigned short int type, it is again reduced by modulo 2^N, which will appear as if the value has wrapped around.
Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).
This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.
Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -0
101 = -1
110 = -2
111 = -3
Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.
One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -3
101 = -2
110 = -1
111 = -0
I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.
Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.
Two's complement is probably the most common method of representing signed values, but it's not the only one, hence the C standard can't make any guarantees of what will happen when you overflow a signed integer type. So it leaves the behavior undefined so the compiler doesn't have to deal with interpreting multiple representations.
The undefined behavior comes from early portability issues when signed integer types could be represented either as sign & magnitude, one's complement or two's complement.
Nowadays, virtually all architectures represent integers in two's complement, where the hardware does wrap around. But be careful: since your compiler is entitled to assume your program never triggers undefined behavior, you might encounter weird bugs when optimisation is on.
In a signed 8-bit integer, the intuitive definition of wrap around might look like going from +127 to -128 -- in two's complement binary: 01111111 (127) and 10000000 (-128). As you can see, that is the natural progress of incrementing the binary data--without considering it to represent an integer, signed or unsigned. Counterintuitively, the actual overflow takes place when moving from -1 (11111111) to 0 (00000000) in the unsigned integer's sense of wrap-around.
This doesn't answer the deeper question of what the correct behavior is when a signed integer overflows because there is no "correct" behavior according to the standard.
Could anyone help me understand the difference between signed/unsigned int, as well as signed/unsigned char? In this case, if it's unsigned wouldn't the value just never reach a negative number and continue on an infinite loop of 0's?
int main()
{
unsigned int n=3;
while (n>=0)
{
printf ("%d",n);
n=n-1;
}
return 0;
}
Two important things:
At one level, the difference between regular signed, versus unsigned values, is just the way we interpret the bits. If we limit ourselves to 3 bits, we have:
bits  signed  unsigned
----  ------  --------
000   0       0
001   1       1
010   2       2
011   3       3
100   -4      4
101   -3      5
110   -2      6
111   -1      7
The bit patterns don't change, it's just a matter of interpretation whether we have them represent nonnegative integers from 0 to 2^N-1, or signed integers from -2^N/2 to 2^N/2-1.
The other important thing to know is what operations are defined on a type. For unsigned types, addition and subtraction are defined so that they "wrap around" from 0 to 2^N-1. But for signed types, overflow and underflow are undefined. (On some machines they wrap around, but not all.)
Finally, there's the issue of properly matching up your printf formats. For %d, you're supposed to give it a signed integer. But you gave it unsigned instead. Strictly speaking, that results in undefined behavior, too, but in this case (and not too surprisingly), what happened was that it took the same bit pattern and printed it out as if it were signed, rather than unsigned.
wouldn't the value just never reach a negative number
Correct, it can't be negative.
and continue on an infinite loop of 0's
No, it will wrap-around from zero to the largest value of an unsigned int, which is well-defined behavior. If you use the correct conversion specifier %u instead of the incorrect %d, you'll notice this output:
3
2
1
0
4294967295
4294967294
...
Signed types can represent negative as well as positive integers, while unsigned types represent only non-negative integers, and the code you wrote will run forever because n is unsigned and therefore always non-negative (so n >= 0 is always true).
In this case, if it's unsigned wouldn't the value just never reach a negative number ...?
You are right. But in the statement printf ("%d",n); you “deceived” the printf() function — using the type conversion specifier d — that the number in variable n is signed.
Use the type conversion specifier u instead: printf ("%u",n);
... never reach a negative number and continue on an infinite loop of 0's?
No. “Never reaching a negative number” is not the same as “stopping at 0 and resisting further decrementing”.
Other people already explained this. Here is my explanation, in the form of analogies:
Imagine yourself a never ending and never beginning sequence of non-negative integers:
..., 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, ... // the biggest is 3 only for simplicity
— or numbers on an analog clock:
You may increase / decrease a number forever, going round and round.
The terms signed and unsigned refer to how the CPU treats sequences of bits.
There are 2 important things to understand here:
1. How the CPU adds finite sequences of bits to a single finite result
2. How the CPU differentiates between signed and unsigned operands
Let's start with (1).
Let's take 4-bit nibbles for example.
If we ask the CPU to add 0001 and 0001, the result should be 2, or 0010.
But if we ask it to add 1111 and 0001, the result should be 16, or 10000. But it only has 4 bits to contain the result. The convention is to wrap around, or circle, back to 0, effectively ignoring the Most Significant Bit. See also integer overflow.
Why is this relevant? Because it produces an interesting result. That is, according to the definition above, if we let x = 1111, then we get x + 1 = 0. Well, x, or 1111, now looks and behaves awfully like -1. This is the birth of signed numbers and operations. And if 1111 can be deemed as -1, then 1111 - 1 = 1110 should be -2, and so on.
Now let's look at (2).
When the C compiler sees you defining an unsigned int, it will use special CPU instructions for dealing with unsigned numbers, where it deems relevant. For example, this is relevant in jump instructions, where the CPU needs to know if you mean it to jump way forward, or slightly backward. For this it needs to know if you mean your operand to be interpreted in a signed, or unsigned way.
The operation of adding two numbers, on the other hand, is fundamentally oblivious to the consequent interpretation. The only thing is that the CPU will turn on a special flag after an addition operation, to tell you whether a wrap-around has occurred, for your own auditing.
But the important thing to understand is that the sequence of bits doesn't change; only its interpretation.
To tie all of this to your example, subtracting 1 from an unsigned 0 will simply wrap around to 1111, or 2^32 - 1 (4294967295) in your case.
Finally, there are other uses for signed/unsigned. For example, by the very fact it is defined as a different type, this allows functions to be written that define a contract where only unsigned integers, let's say, can be passed to it. Also, it's relevant when you want to display or print the number.
I'm learning C from CS50. When I run my code, it says 'signed integer overflow'.
#include <stdio.h>
#include <cs50.h>
int main(void)
{
int x = 41;
int c = 0;
while(x>=25)
{
c = c+1;
}
printf("%i\n", c);
}
Can someone explain what that means?
Your while condition will always be true, meaning the loop will run forever, adding 1 to c in each iteration.
Since c is a (signed) int it means it will increment slowly to its max value, and after that the next increment would be UB (undefined behavior). What many machines will do in this specific UB is to turn c negative, which I guess is not what you wanted. This happens due to a phenomenon called "signed integer overflow".
Let's assume 32-bit int and using two's complement. A signed int will look like this in binary sign bit (0 for positive, 1 for negative) | 31 bits. zero will look like 000...00, one like 000...01 and so on.
Max signed int will look like 0111...11 (2,147,483,647). When adding 1 to this number you'll get 100...000 which flipped the sign bit which will now result in a negative number. Adding another 1 to this will result in 100...001 which again has the sign bit on meaning it is still negative...
Declaring c as unsigned would ensure c remains non-negative. Also, making the loop end with while(x-- >= 25) could also be a good idea :)
"Signed integer overflow" means that you tried to store a value that's outside the range of values that the type can represent, and the result of that operation is undefined (in this particular case, your program halts with an error).
Since your while loop never terminates (x >= 25 evaluates to true, and you never change the value of x), you keep adding 1 to c until you reach a value outside the range that a signed int can represent.
Remember that in C, integral and floating-point types have fixed sizes, meaning they can only represent a fixed number of values. For example, suppose int is 3 bits wide, meaning it can only store 8 distinct values. What those values are depends on how the bit patterns are interpreted. You could store "unsigned" (non-negative) values [0..7], or "signed" (negative and non-negative) values [-3..3] or [-4..3] depending on representation. Here are several different ways you can interpret the values of three bits:
Bits  Unsigned  Sign-Magnitude  1's Complement  2's Complement
----  --------  --------------  --------------  --------------
000   0         0               0               0
001   1         1               1               1
010   2         2               2               2
011   3         3               3               3
100   4         -0              -3              -4
101   5         -1              -2              -3
110   6         -2              -1              -2
111   7         -3              -0              -1
Most systems use 2's Complement for signed integer values. Yes, sign-magnitude and 1's complement have positive and negative representations for zero.
So, let's say c is our 3-bit signed int. We start it at 0 and add 1 each time through the loop. Everything's fine until c is 3 - using our 3-bit signed representation, we cannot represent the value 4. The result of the operation is undefined behavior, meaning the compiler is not required to handle the issue in any particular way. Logically, you'd expect the value to "wrap around" to a negative value based on the representation in use, but even that's not necessarily true, depending on how the compiler optimizes arithmetic operations.
Note that unsigned integer overflow is well-defined - you'll "wrap around" back to 0.
An integer can only hold so many numbers before it reaches its max value. In your while loop it says to execute it while x>=25. Since x is 41 and x never decreases in its value that means the while loop will always execute because it's always true since 41>=25.
A 32-bit int can only hold values up to 2,147,483,647. Since the while condition is always true, c keeps being incremented, and once it reaches 2,147,483,647 the next increment gives you the error, because an int does not have enough bits to represent a larger value.
Well... You have an infinite loop because your value x will always be bigger than 25, since you don't decrease it. Since the loop is infinite, your value c reaches the max value of an int (which is 2,147,483,647 if int is 4 bytes).
You can try this in order to escape the infinite loop:
#include <stdio.h>

int main(void)
{
    int x = 41;
    int c = 0;
    while (x >= 25)
    {
        c = c + 1;
        x--;    /* x now shrinks, so the loop ends once x drops below 25 */
    }
    printf("%i\n", c);
}
First of all, you need to know what a "signed integer overflow condition" is.
It is a condition that appears when a mathematical operation produces a result that is outside the range of the data type, which is signed int in your case.
This happens because your loop goes on infinitely, because x >= 25 will always be true.
I think the question is self explanatory, I guess it probably has something to do with overflow but still I do not quite get it. What is happening, bitwise, under the hood?
Why does -(-2147483648) = -2147483648 (at least while compiling in C)?
Negating an (unsuffixed) integer constant:
The expression -(-2147483648) is perfectly defined in C, however it may be not obvious why it is this way.
When you write -2147483648, it is formed as the unary minus operator applied to an integer constant. If 2147483648 can't be expressed as int, then it is represented as long or long long* (whichever fits first), where the latter type is guaranteed by the C Standard to cover that value†.
To confirm that, you could examine it by:
printf("%zu\n", sizeof(-2147483648));
which yields 8 on my machine.
The next step is to apply second - operator, in which case the final value is 2147483648L (assuming that it was eventually represented as long). If you try to assign it to int object, as follows:
int n = -(-2147483648);
then the actual behavior is implementation-defined. Referring to the Standard:
C11 §6.3.1.3/3 Signed and unsigned integers
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
The most common way is to simply cut-off the higher bits. For instance, GCC documents it as:
For conversion to a type of width N, the value is reduced modulo 2^N
to be within range of the type; no signal is raised.
Conceptually, the conversion to type of width 32 can be illustrated by bitwise AND operation:
value & (2^32 - 1) // preserve 32 least significant bits
In accordance with two's complement arithmetic, the value of n is formed with all zeros and MSB (sign) bit set, which represents value of -2^31, that is -2147483648.
Negating an int object:
If you try to negate int object, that holds value of -2147483648, then assuming two's complement machine, the program will exhibit undefined behavior:
n = -n; // UB if n == INT_MIN and INT_MAX == 2147483647
C11 §6.5/5 Expressions
If an exceptional condition occurs during the evaluation of an
expression (that is, if the result is not mathematically defined or
not in the range of representable values for its type), the behavior
is undefined.
Additional references:
INT32-C. Ensure that operations on signed integers do not result in overflow
*) In the withdrawn C90 Standard, there was no long long type and the rules were different. Specifically, the sequence for an unsuffixed decimal constant was int, long int, unsigned long int (C90 §6.1.3.2 Integer constants).
†) This is due to LLONG_MAX, which must be at least +9223372036854775807 (C11 §5.2.4.2.1/1).
Note: this answer does not apply as such on the obsolete ISO C90 standard that is still used by many compilers
First of all, on C99, C11, the expression -(-2147483648) == -2147483648 is in fact false:
int is_it_true = (-(-2147483648) == -2147483648);
printf("%d\n", is_it_true);
prints
0
So how it is possible that this evaluates to true?
The machine is using 32-bit two's complement integers. The 2147483648 is an integer constant that doesn't quite fit in 32 bits, thus it will be either long int or long long int, whichever is the first in which it fits. This negated will result in -2147483648 - and again, even though the number -2147483648 can fit in a 32-bit integer, the expression -2147483648 consists of a >32-bit positive integer preceded with unary -!
You can try the following program:
#include <stdio.h>
int main() {
printf("%zu\n", sizeof(2147483647));
printf("%zu\n", sizeof(2147483648));
printf("%zu\n", sizeof(-2147483648));
}
The output on such machine most probably would be 4, 8 and 8.
Now, -2147483648 negated will again result in +2147483648, which is still of type long int or long long int, and everything is fine.
In C99, C11, the integer constant expression -(-2147483648) is well-defined on all conforming implementations.
Now, when this value is assigned to a variable of type int, with 32 bits and two's complement representation, the value is not representable in it - the values on 32-bit 2's complement would range from -2147483648 to 2147483647.
The C11 standard 6.3.1.3p3 says the following of integer conversions:
[When] the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
That is, the C standard doesn't actually define what the value in this case would be, or doesn't preclude the possibility that the execution of the program stops due to a signal being raised, but leaves it to the implementations (i.e. compilers) to decide how to handle it (C11 3.4.1):
implementation-defined behavior
unspecified behavior where each implementation documents how the choice is made
and (3.19.1):
implementation-defined value
unspecified value where each implementation documents how the choice is made
In your case, the implementation-defined behaviour is that the value is the 32 lowest-order bits [*]. Due to the 2's complement, the (long) long int value 0x80000000 has the bit 31 set and all other bits cleared. In 32-bit two's complement integers the bit 31 is the sign bit - meaning that the number is negative; all value bits zeroed means that the value is the minimum representable number, i.e. INT_MIN.
[*] GCC documents its implementation-defined behaviour in this case as follows:
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
This is not a C question, for on a C implementation featuring 32-bit two's complement representation for type int, the effect of applying the unary negation operator to an int having the value -2147483648 is undefined. That is, the C language specifically disavows designating the result of evaluating such an operation.
Consider more generally, however, how the unary - operator is defined in two's complement arithmetic: the inverse of a positive number x is formed by flipping all the bits of its binary representation and adding 1. This same definition serves as well for any negative number that has at least one bit other than its sign bit set.
Minor problems arise, however, for the two numbers that have no value bits set: 0, which has no bits set at all, and the number that has only its sign bit set (-2147483648 in 32-bit representation). When you flip all the bits of either of these, you end up with all value bits set. Therefore, when you subsequently add 1, the result overflows the value bits. If you imagine performing the addition as if the number were unsigned, treating the sign bit as a value bit, then you get
-2147483648 (decimal representation)
--> 0x80000000 (convert to hex)
--> 0x7fffffff (flip bits)
--> 0x80000000 (add one)
--> -2147483648 (convert to decimal)
Similar applies to inverting zero, but in that case the overflow upon adding 1 overflows the erstwhile sign bit, too. If the overflow is ignored, the resulting 32 low-order bits are all zero, hence -0 == 0.
I'm gonna use a 4-bit number, just to make maths simple, but the idea is the same.
In a 4-bit number, the possible values are between 0000 and 1111. That would be 0 to 15, but if you wanna represent negative numbers, the first bit is used to indicate the sign (0 for positive and 1 for negative).
So 1111 is not 15. As the first bit is 1, it's a negative number. To know its value, we use the two's-complement method as already described in previous answers: "invert the bits and add 1":
inverting the bits: 0000
adding 1: 0001
0001 in binary is 1 in decimal, so 1111 is -1.
The two's-complement method goes both ways, so if you use it with any number, it will give you the binary representation of that number with the inverted sign.
Now let's see 1000. The first bit is 1, so it's a negative number. Using the two's-complement method:
invert the bits : 0111
add 1: 1000 (8 in decimal)
So 1000 is -8. If we do -(-8), in binary it means -(1000), which actually means applying the two's-complement method to 1000. As we saw above, the result is also 1000.
So, in a 4-bit number, -(-8) equals -8.
In a 32-bit number, -2147483648 in binary is 1000..(31 zeroes), and if you apply the two's-complement method, you'll end up with the same value (the result is the same number).
That's why in a 32-bit number -(-2147483648) equals -2147483648
It depends on the version of C, the specifics of the implementation and whether we are talking about variables or literals values.
The first thing to understand is that there are no negative integer literals in C. "-2147483648" is a unary minus operation applied to a positive integer literal.
Let's assume that we are running on a typical 32-bit platform where int and long are both 32 bits and long long is 64 bits, and consider the expression.
(-(-2147483648) == -2147483648 )
The compiler needs to find a type that can hold 2147483648. A conforming C99 compiler will use type "long long", but a C90 compiler can use type "unsigned long".
If the compiler uses type long long then nothing overflows and the comparison is false. If the compiler uses unsigned long then the unsigned wraparound rules come into play and the comparison is true.
For the same reason that winding a tape deck counter 500 steps forward from 000 (through 001 002 003 ...) will show 500, and winding it 500 steps backward from 000 (through 999 998 997 ...) will also show 500.
This is two's complement notation. Of course, since 2's complement sign convention is to consider the topmost bit the sign bit, the result overflows the representable range, just like 2000000000+2000000000 overflows the representable range.
As a result, the processor's "overflow" bit will be set (seeing this requires access to the machine's arithmetic flags, generally not the case in most programming languages outside of assembler). This is the only value which will set the "overflow" bit when negating a 2's complement number: any other value's negation lies in the range representable by 2's complement.
given the following function:
int boof(int n) {
return n + ~n + 1;
}
What does this function return? I'm having trouble understanding exactly what is being passed in to it. If I called boof(10), would it convert 10 to base 2, and then do the bitwise operations on the binary number?
This was a question I had on a quiz recently, and I think the answer is supposed to be 0, but I'm not sure how to prove it.
note: I know how each bitwise operator works, I'm more confused on how the input is processed.
Thanks!
When n is an int, n + ~n will always result in an int that has all bits set.
Strictly speaking, the behavior of adding 1 to such an int will depend on the representation of signed numbers on the platform. The C standard supports 3 representations for signed int:
for Two's Complement machines (the vast majority of systems in use today), the result will be 0 since an int with all bits set is -1.
on a One's Complement machine (which are pretty rare today, I believe), the result will be 1 since an int with all bits set is 0 or -0 (negative zero) or undefined behavior.
a Signed-magnitude machine (are there really any of these still in use?), an int with all bits set is a negative number with the maximum magnitude (so the actual value will depend on the size of an int). In this case adding 1 to it will result in a negative number (the exact value, again depends on the number of bits that are used to represent an int).
Note that the above ignores that it might be possible for some implementations to trap with various bit configurations that might be possible with n + ~n.
Bitwise operations will not change the underlying representation of the number to base 2 - all math on the CPU is done using binary operations regardless.
What this function does is take n and add it to its own two's complement negation (~n + 1). Whatever you pass in, the result is 0.
Let me explain with 8 bit numbers as this is easier to visualize.
10 is represented in binary as 00001010.
Negative numbers are stored in two's complement (NOTing the number and adding 1)
So the (~n + 1) portion for 10 looks like so:
11110101 + 1 = 11110110
So if we take n + ~n+1:
00001010 + 11110110 = 0
Notice that if we add these numbers together we get a carry out of the leftmost bit. That sets the carry flag, not the overflow flag; the carry is discarded, leaving 0. (Adding a negative and a positive number can never overflow in the signed sense, so this carry does not indicate an error.)
See this
The CARRY and OVERFLOW flag in Binary Arithmetic
int v;
int sign; // the sign of v ;
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
Q1: Since v is already declared as int, why bother to cast it to int again? Is it related to portability?
Edit:
Q2:
sign = v >> (sizeof(int) * CHAR_BIT - 1);
this snippet isn't portable, since right shift of a signed int is implementation-defined; how the vacated left bits are filled is up to the compiler. So
-(int)((unsigned int)((int)v)
does the portable trick. Please explain why this works.
Doesn't a right shift of an unsigned int always pad the left bits with 0?
It's not strictly portable, since it is theoretically possible that int and/or unsigned int have padding bits.
In a hypothetical implementation where unsigned int has padding bits, shifting right by sizeof(int)*CHAR_BIT - 1 would produce undefined behaviour since then
sizeof(int)*CHAR_BIT - 1 >= WIDTH
But for all implementations where unsigned int has no padding bits - and as far as I know that means all existing implementations - the code
int v;
int sign; // the sign of v ;
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
must set sign to -1 if v < 0 and to 0 if v >= 0. (Note - thanks to Sander De Dycker for pointing it out - that if int has a negative zero, that would also produce sign = 0, since -0 == 0. If the implementation supports negative zeros and the sign for a negative zero should be -1, neither this shifting, nor the comparison v < 0 would produce that, a direct inspection of the object representation would be required.)
The cast to int before the cast to unsigned int before the shift is entirely superfluous and does nothing.
It is - disregarding the hypothetical padding bits problem - portable because the conversion to unsigned integer types and the representation of unsigned integer types is prescribed by the standard.
Conversion to an unsigned integer type is reduction modulo 2^WIDTH, where WIDTH is the number of value bits in the type, so that the result lies in the range 0 to 2^WIDTH - 1 inclusive.
Since without padding bits in unsigned int the size of the range of int cannot be larger than that of unsigned int, and the standard mandates (6.2.6.2) that signed integers are represented in one of
sign and magnitude
ones' complement
two's complement
the smallest possible representable int value is -2^(WIDTH-1). So a negative int of value -k is converted to 2^WIDTH - k >= 2^(WIDTH-1) and thus has the most significant bit set.
A non-negative int value, on the other hand cannot be larger than 2^(WIDTH-1) - 1 and hence its value will be preserved by the conversion and the most significant bit will not be set.
So when the result of the conversion is shifted by WIDTH - 1 bits to the right (again, we assume no padding bits in unsigned int, hence WIDTH == sizeof(int)*CHAR_BIT), it will produce a 0 if the int value was non-negative, and a 1 if it was negative.
It should be quite portable because when you convert int to unsigned int (via a cast), you receive a value that is 2's complement bit representation of the value of the original int, with the most significant bit being the sign bit.
UPDATE: A more detailed explanation...
I'm assuming there are no padding bits in int and unsigned int and all bits in the two types are utilized to represent integer values. It's a reasonable assumption for the modern hardware. Padding bits are a thing of the past, from where we're still carrying them around in the current and recent C standards for the purpose of backward compatibility (i.e. to be able to run code on old machines).
With that assumption, if int and unsigned int have N bits in them (N = CHAR_BIT * sizeof(int)), then per the C standard we have 3 options to represent int, which is a signed type:
sign-and-magnitude representation, allowing values from -(2^(N-1)-1) to 2^(N-1)-1
one's complement representation, also allowing values from -(2^(N-1)-1) to 2^(N-1)-1
two's complement representation, allowing values from -2^(N-1) to 2^(N-1)-1 or, possibly, from -(2^(N-1)-1) to 2^(N-1)-1
The sign-and-magnitude and one's complement representations are also a thing of the past, but let's not throw them out just yet.
When we convert int to unsigned int, the rule is that a non-negative value v (>=0) doesn't change, while a negative value v (<0) changes to the positive value of 2^N+v, hence (unsigned int)-1=UINT_MAX.
Therefore, (unsigned int)v for a non-negative v will always be in the range from 0 to 2^(N-1)-1 and the most significant bit of (unsigned int)v will be 0.
Now, for a negative v in the range from -2^(N-1) to -1 (this range is a superset of the negative ranges for the three possible representations of int), (unsigned int)v will be in the range from 2^N+(-2^(N-1)) to 2^N+(-1), simplifying which we arrive at the range from 2^(N-1) to 2^N-1. Clearly, the most significant bit of this value will always be 1.
If you look carefully at all this math, you will see that the value of (unsigned)v looks exactly the same in binary as v in 2's complement representation:
...
v = -2: (unsigned)v = 2^N - 2 = 111...110 (binary)
v = -1: (unsigned)v = 2^N - 1 = 111...111 (binary)
v = 0: (unsigned)v = 0 = 000...000 (binary)
v = 1: (unsigned)v = 1 = 000...001 (binary)
...
So, there, the most significant bit of the value (unsigned)v is going to be 0 for v>=0 and 1 for v<0.
Now, let's get back to the sign-and-magnitude and one's complement representations. These two representations may allow two zeroes, a +0 and a -0. But arithmetic computations do not visibly distinguish between +0 and -0, it's still a 0, whether you add it, subtract it, multiply it or compare it. You, as an observer, normally wouldn't see +0 or -0 or any difference from having one or the other.
Trying to observe and distinguish +0 and -0 is generally pointless and you should not normally expect or rely on the presence of two zeroes if you want to make your code portable.
(unsigned int)v won't tell you the difference between v=+0 and v=-0, in both cases (unsigned int)v will be equivalent to 0u.
So, with this method you won't be able to tell whether internally v is a -0 or a +0, you won't extract v's sign bit this way for v=-0.
But again, you gain nothing of practical value from differentiating between the two zeroes and you don't want this differentiation in portable code.
So, with this I dare to declare the method for sign extraction presented in the question quite/very/pretty-much/etc portable in practice.
This method is an overkill, though. And (int)v in the original code is unnecessary as v is already an int.
This should be more than enough and easy to comprehend:
int sign = -(v < 0);
Nope, it's just excessive casting. There is no need to cast it to an int. It doesn't hurt, however.
Edit: It's worth noting that it may be done like that so the type of v can be changed to something else, or it may have once been another data type and after it was converted to an int the cast was never removed.
It isn't. The Standard does not define the representation of integers, and therefore it's impossible to guarantee exactly what the result of that will be portably. The only way to get the sign of an integer is to do a comparison.