Why is this bit-hack code portable? - c

int v;
int sign; // the sign of v ;
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
Q1: Since v is already declared as an int, why bother casting it to int again? Is it related to portability?
Edit:
Q2:
sign = v >> (sizeof(int) * CHAR_BIT - 1);
this snippet isn't portable, since right-shifting a signed int is implementation-defined: how the vacated left bits are filled is up to the compiler. So
-(int)((unsigned int)((int)v)
does the portable trick. Please explain why this works.
Isn't a right shift of an unsigned int always guaranteed to fill the vacated left bits with 0?

It's not strictly portable, since it is theoretically possible that int and/or unsigned int have padding bits.
In a hypothetical implementation where unsigned int has padding bits, shifting right by sizeof(int)*CHAR_BIT - 1 would produce undefined behaviour since then
sizeof(int)*CHAR_BIT - 1 >= WIDTH
But for all implementations where unsigned int has no padding bits - and as far as I know that means all existing implementations - the code
int v;
int sign; // the sign of v ;
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));
must set sign to -1 if v < 0 and to 0 if v >= 0. (Note - thanks to Sander De Dycker for pointing it out - that if int has a negative zero, that would also produce sign = 0, since -0 == 0. If the implementation supports negative zeros and the sign for a negative zero should be -1, neither this shifting, nor the comparison v < 0 would produce that, a direct inspection of the object representation would be required.)
The cast to int before the cast to unsigned int before the shift is entirely superfluous and does nothing.
It is - disregarding the hypothetical padding bits problem - portable because the conversion to unsigned integer types and the representation of unsigned integer types is prescribed by the standard.
Conversion to an unsigned integer type is reduction modulo 2^WIDTH, where WIDTH is the number of value bits in the type, so that the result lies in the range 0 to 2^WIDTH - 1 inclusive.
Since without padding bits in unsigned int the size of the range of int cannot be larger than that of unsigned int, and the standard mandates (6.2.6.2) that signed integers are represented in one of
sign and magnitude
ones' complement
two's complement
the smallest possible representable int value is -2^(WIDTH-1). So a negative int of value -k is converted to 2^WIDTH - k >= 2^(WIDTH-1) and thus has the most significant bit set.
A non-negative int value, on the other hand cannot be larger than 2^(WIDTH-1) - 1 and hence its value will be preserved by the conversion and the most significant bit will not be set.
So when the result of the conversion is shifted by WIDTH - 1 bits to the right (again, we assume no padding bits in unsigned int, hence WIDTH == sizeof(int)*CHAR_BIT), it will produce a 0 if the int value was non-negative, and a 1 if it was negative.
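As a concrete illustration of the above, here is a minimal sketch, assuming a 32-bit int with no padding bits (the test value -42 is arbitrary):
#include <limits.h>
#include <stdio.h>

int main(void) {
    int v = -42;
    int sign = -(int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
    printf("%d\n", sign);   /* prints -1; with v = 42 it would print 0 */
    return 0;
}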

It should be quite portable, because when you convert an int to unsigned int (via a cast), you receive a value whose bit pattern is the 2's complement representation of the original int, with the most significant bit being the sign bit.
UPDATE: A more detailed explanation...
I'm assuming there are no padding bits in int and unsigned int and that all bits in the two types are used to represent integer values. That's a reasonable assumption for modern hardware. Padding bits are a thing of the past; they are still allowed for in the current and recent C standards only for backward compatibility (i.e. to be able to run code on old machines).
With that assumption, if int and unsigned int have N bits in them (N = CHAR_BIT * sizeof(int)), then per the C standard we have 3 options to represent int, which is a signed type:
sign-and-magnitude representation, allowing values from -(2^(N-1) - 1) to 2^(N-1) - 1
one's complement representation, also allowing values from -(2^(N-1) - 1) to 2^(N-1) - 1
two's complement representation, allowing values from -2^(N-1) to 2^(N-1) - 1 or, possibly, from -(2^(N-1) - 1) to 2^(N-1) - 1
The sign-and-magnitude and one's complement representations are also a thing of the past, but let's not throw them out just yet.
When we convert int to unsigned int, the rule is that a non-negative value v (>= 0) doesn't change, while a negative value v (< 0) changes to the positive value 2^N + v, hence (unsigned int)-1 = UINT_MAX.
Therefore, (unsigned int)v for a non-negative v will always be in the range from 0 to 2^(N-1) - 1, and the most significant bit of (unsigned int)v will be 0.
Now, for a negative v in the range from -2^(N-1) to -1 (this range is a superset of the negative ranges of the three possible representations of int), (unsigned int)v will be in the range from 2^N + (-2^(N-1)) to 2^N + (-1), which simplifies to the range from 2^(N-1) to 2^N - 1. Clearly, the most significant bit of this value will always be 1.
If you look carefully at all this math, you will see that the value of (unsigned)v looks exactly the same in binary as v in 2's complement representation:
...
v = -2: (unsigned)v = 2^N - 2 = 111...110 in binary
v = -1: (unsigned)v = 2^N - 1 = 111...111 in binary
v = 0: (unsigned)v = 0 = 000...000 in binary
v = 1: (unsigned)v = 1 = 000...001 in binary
...
So, there, the most significant bit of the value (unsigned)v is going to be 0 for v>=0 and 1 for v<0.
Now, let's get back to the sign-and-magnitude and one's complement representations. These two representations may allow two zeroes, a +0 and a -0. But arithmetic computations do not visibly distinguish between +0 and -0, it's still a 0, whether you add it, subtract it, multiply it or compare it. You, as an observer, normally wouldn't see +0 or -0 or any difference from having one or the other.
Trying to observe and distinguish +0 and -0 is generally pointless and you should not normally expect or rely on the presence of two zeroes if you want to make your code portable.
(unsigned int)v won't tell you the difference between v=+0 and v=-0, in both cases (unsigned int)v will be equivalent to 0u.
So, with this method you won't be able to tell whether internally v is a -0 or a +0, you won't extract v's sign bit this way for v=-0.
But again, you gain nothing of practical value from differentiating between the two zeroes and you don't want this differentiation in portable code.
So, with this I dare to declare the method for sign extraction presented in the question quite/very/pretty-much/etc portable in practice.
This method is an overkill, though. And (int)v in the original code is unnecessary as v is already an int.
This should be more than enough and easy to comprehend:
int sign = -(v < 0);
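For completeness, a quick sketch exercising the simpler form (the test values are arbitrary):
#include <stdio.h>

int main(void) {
    int values[] = { -7, 0, 42 };
    for (int i = 0; i < 3; i++) {
        int v = values[i];
        int sign = -(v < 0);            /* -1 if v is negative, 0 otherwise */
        printf("%d -> %d\n", v, sign);
    }
    return 0;
}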

Nope, it's just excessive casting. There is no need to cast it to an int. It doesn't hurt, however.
Edit: It's worth noting that it may have been written like that so the type of v can be changed to something else, or v may once have been another data type and the cast was never removed after it became an int.

It isn't. The Standard does not define the representation of integers, and therefore it's impossible to guarantee exactly what the result of that will be portably. The only way to get the sign of an integer is to do a comparison.

Related

How to find the most significant bit of a signed integer in C

I need to find the most significant bit of signed int N and save it in signBitN. I want to do this using bitwise only operations.
Also, how would I extend signBitN so that all its bits are equal to its most significant bit?
i.e. if its most significant bit was zero, how would I extend that to be 00000...00?
The closest I've gotten is signBitN=1&(N>>(sizeof(int)-1));
Portable expression:
1 & (x >> (CHAR_BIT * sizeof(int) - 1))
The latest C standards allow 3 representations of signed ints:
sign and magnitude
ones' complement
two's complement
See section 6.2.6.2 Integer types of C11 standard.
Only the third option is relevant in practice for modern machines.
As specified in 6.2.6.1:
Values stored in non-bit-field objects of any other object type
consist of n x CHAR_BIT bits, where n is the size of an object of that type,
in bytes.
Therefore int will consist of sizeof(int) * CHAR_BIT bits, likely 32.
Thus the highest bit of int can be read by shifting right by sizeof(int) * CHAR_BIT - 1 bits and reading the last bit with bitwise & operator.
Note that the exact value of the int after the shift is implementation defined as stated in 6.5.7.5.
On sane machines it would be:
int y = x < 0 ? -1 : 0;
The portable way would be to access the int through an array of unsigned char and set all bytes to -1 (i.e. 0xFF).
See 6.3.1.3.2:
if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum
value that can be represented in the new type until the value
is in the range of the new type.
And 6.2.6.1.2
Values stored in unsigned bit-fields and objects of type
unsigned char shall be represented using a pure binary notation.
You can use memset() for that.
int x;  /* assume x already holds the value whose sign we want */
memset(&x, (x < 0 ? -1 : 0), sizeof x);  /* every byte becomes 0xFF if x is negative, 0x00 otherwise */
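Putting the pieces together, here is a sketch of the memset idea applied to the sign-extension part of the question (signBitN is the variable name from the question; the fill value -1 is converted to the unsigned char 0xFF, so every byte of signBitN becomes 0xFF when N is negative, which reads back as -1 on two's complement machines):
#include <string.h>
#include <stdio.h>

int main(void) {
    int N = -123;                                   /* any value */
    int signBitN;
    memset(&signBitN, (N < 0 ? -1 : 0), sizeof signBitN);
    printf("%d\n", signBitN);                       /* -1 for negative N, 0 otherwise */
    return 0;
}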
If the question is how to check the MSB of the integer (for example the 31st bit of a 32-bit integer) then IMO this is portable.
#include <stdio.h>

#define MSB(i) ((i) & (((~0U) >> 1) ^ (~0U)))
#define issetMSB(i) (!!MSB(i))

int main(void)
{
    printf("%x\n", MSB(-1));      /* prints the MSB mask, e.g. 80000000 */
    printf("%x\n", issetMSB(-1)); /* prints 1 */
}

Does signed to unsigned casting in C changes the bit values

I've done some quick tests (on an online debugger) suggesting that a signed int to unsigned int cast in C does not change the bit values.
What I want to know is whether this is guaranteed by the C standard, or is just common (but not 100% certain) behaviour?
Conversion from signed int to unsigned int does not change the bit representation in two’s-complement C implementations, which are the most common, but will change the bit representation for negative numbers, including possible negative zeroes on one’s complement or sign-and-magnitude systems.
This is because the cast (unsigned int) a is not defined to retain the bits but the result is the positive remainder of dividing a by UINT_MAX + 1 (or as the C standard (C11 6.3.1.3p2) says,
the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.
The two’s complement representation for negative numbers is the most commonly used representation for signed numbers exactly because it has this property of negative value n mapping to the same bit pattern as the mathematical value n + UINT_MAX + 1 – it makes it possible to use the same machine instruction for signed and unsigned addition, and the negative numbers will work because of wraparound.
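A small sketch of that property, assuming a 32-bit two's complement int: adding in the signed view and in the unsigned view produces the same bit pattern, which is why a single add instruction can serve both.
#include <stdio.h>

int main(void) {
    int a = -3, b = 5;
    unsigned int us = (unsigned int)a + (unsigned int)b;   /* wraps modulo 2^32 */
    unsigned int ss = (unsigned int)(a + b);               /* same bit pattern */
    printf("%u %u\n", us, ss);                             /* prints 2 2 */
    return 0;
}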
Casting from a signed to an unsigned integer is required to generate the correct arithmetic result (the same number), modulo the size of the unsigned integer, so to speak. That is, after
int i = anything;
unsigned int u = (unsigned int)i;
and on a machine with 32-bit ints, the requirement is that u is equal to i, modulo 2^32.
(We could also try to say that u receives the value i % 0x100000000, except it turns out that's not quite right, because the C rules say that when you divide a negative integer by a positive integer, you get a quotient rounded towards 0 and a negative remainder, which isn't the kind of modulus we want here.)
If i is 0 or positive, it's not hard to see that u will have the same bit pattern.
If i is negative, and if you're on a 2's complement machine, it turns out the result is also guaranteed to have the same bit pattern. (I'd love to present a nice proof of that result here, but I don't have time just now to try to construct it.)
The vast majority of today's machines use 2's complement. But if you were on a 1's complement or sign/magnitude machine, I'm pretty sure the bit patterns would not always be the same.
So, bottom line, the sameness of the bit patterns is not guaranteed by the C Standard, but arises due to a combination of the C Standard's requirements, and the particulars of 2's complement arithmetic.
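A small demonstration of the modular rule described above (a sketch; the exact numbers assume a 32-bit unsigned int):
#include <stdio.h>
#include <limits.h>

int main(void) {
    int i = -5;
    unsigned int u = (unsigned int)i;   /* i + UINT_MAX + 1, i.e. UINT_MAX - 4 */
    printf("%u\n", u);                  /* 4294967291 with a 32-bit unsigned int */
    printf("%u\n", UINT_MAX - 4u);      /* the same value */
    return 0;
}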

Extracting the sign bit with shift

Is it always defined behavior to extract the sign of a 32 bit integer this way:
#include <stdint.h>
int get_sign(int32_t x) {
    return (x & 0x80000000) >> 31;
}
Do I always get a result of 0 or 1?
No, it is incorrect to do this because right shifting a signed integer with a negative value is implementation-defined, as specified in the C Standard:
6.5.7 Bitwise shift operators
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has a signed type and a nonnegative value, the value of the result is the integral part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the resulting value is implementation-defined.
You should cast x as (uint32_t) before masking and shifting.
EDIT: Wrong answer! I shall keep this answer here as an example of good-looking, intuitive, but incorrect reasoning. As explained in the other answers, there is no right shifting of a negative value in the code posted. The type of x & 0x80000000 is one of the signed or unsigned integer types depending on the implementation characteristics, but its value is always positive, either 0 or 2147483648. Right shifting this value is not implementation-defined; the result is always either 0 or 1. Whether the result is the value of the sign bit is less obvious: it is the value of the sign bit except for some very contorted corner cases, hybrid architectures quite unlikely to exist and probably not standard-conforming anyway.
Since the answer assumes that fixed width types are available, and therefore a negative zero doesn't exist1, the only correct way of extracting the sign bit is to simply check whether the value is negative:
_Bool Sign( const int32_t a )
{
    return a < 0 ;
}
1 Fixed width types require two's complement representation, which doesn't have a negative zero.
Yes, it is correct on 1s and 2s complement architectures, but for subtle reasons:
for the overwhelmingly common hardware where int is the same type as int32_t and unsigned the same as uint32_t, the constant literal 0x80000000 has type unsigned int. The left operand of the & operation is converted to unsigned int and the result of the & has the same type. The right shift is applied to an unsigned int, the value is either 0 or 1, no implementation-defined behavior.
On other platforms, 0x80000000 may have a different type and the behavior might be implementation defined:
0x80000000 can be of type int, if the int type has more than 31 value bits. In this case, x is promoted to int, and its value is unchanged.
If int uses 1s complement or 2s complement representation, the sign bit is replicated into the more significant bits. The mask operation evaluates to an int with value 0 or 0x80000000. Right shifting it by 31 positions evaluates to 0 and 1 respectively, no implementation-defined behavior either.
Conversely, if int uses sign/magnitude representation, preserving the value of x will effectively reset its 31st bit, moving the sign bit beyond the value bits. The mask operation will evaluate to 0 and the result will be incorrect.
0x80000000 can be of type long, if the int type has fewer than 31 value bits or if INT_MIN == -INT_MAX and long has more than 31 value bits. In this case, x is converted to long, and its value is unchanged, with the same consequences as for the int case. For 1s or 2s complement representation of long, the mask operation evaluates to a positive long value of either 0 or 0x80000000 and right shifting it by 31 places is defined and gives either 0 or 1; for sign/magnitude, the result should be 0 in all cases.
0x80000000 can be of type unsigned long, if the int type has fewer than 31 value bits and long has 31 value bits and uses 2s complement representation. In this case, x is converted to unsigned long keeping the sign bit intact. The mask operation evaluates to an unsigned long value of either 0 or 0x80000000 and right shifting it by 31 places is defined and gives either 0 or 1.
lastly, 0x80000000 can be of type long long, if the int type has fewer than 31 value bits or INT_MIN == -INT_MAX, and long has 31 value bits but does not use 2s complement representation. In this case, x is converted to long long, keeping its value, with the same consequences as for the int case if the long long representation is sign/magnitude.
This question was purposely contrived. The answer is that you get the correct result as long as the platform does not use sign/magnitude representation. But the C Standard insists on supporting integer representations other than 2s complement, with very subtle consequences.
EDIT: Careful reading of section 6.2.6.2 Integer types of the C Standard seems to exclude the possibility for different representations of signed integer types to coexist in the same implementation. This makes the code fully defined as posted, since the very presence of type int32_t implies 2s complement representation for all signed integer types.
Do I always get a result of 0 or 1?
Yes.
Simple answer:
0x80000000 >> 31 is always 1.
0x00000000 >> 31 is always 0.
See below.
[Edit]
Is it always defined behavior to extract the sign of a 32 bit integer this way
Yes, except for a corner case.
Should 0x80000000 be implemented as an int/long (this implies the type has more than 32 bits) and that signed integer type be sign-magnitude (or maybe one's complement) on a novel machine, then the conversion of int32_t x to that int/long would move the sign bit to a new bit location, rendering the & 0x80000000 moot.
The question is open if C supports int32_t (which must be 2's complement) and any of int/long/long long as non-2's complement.
0x80000000 is a hexadecimal constant. "The type of an integer constant is the first of the corresponding list in which its value can be represented" C11 §6.4.4.1 5: Octal or Hexadecimal Constant: int, unsigned, long or unsigned long.... Regardless of its type, it will have a value of +2,147,483,648.
The type of x & 0x80000000 will be the wider of the types of int32_t and of 0x80000000. If the 2 types are the same width and differ in sign-ness, it will be the unsigned one. INT32_MAX is +2,147,483,647 and less than +2,147,483,648, thus 0x80000000 must be a wider type (or the same width and unsigned) than int32_t. So regardless of the type of 0x80000000, x & 0x80000000 will have that same type.
It makes no difference whether int or long is implemented as 2's complement or not.
The & operation does not change the sign of the value of 0x80000000 as either it is an unsigned integer type or the sign bit is in a more significant position. x & 0x80000000 then has the value of +2,147,483,648 or 0.
Right shift of a positive number is well defined regardless of integer type. Right shift of a negative value is implementation-defined. See C11 §6.5.7 5. x & 0x80000000 is never a negative number.
Thus (x & 0x80000000) >> 31 is well defined and either 0 or 1.
return x < 0; (which does not "extract the sign bit with shift" per the post title) is understandable and is certainly the preferred code for most instances I can think of. Either approach may not make any difference in the executable code.
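A quick sanity check of that conclusion (a sketch; the platform assumptions are the usual 32-bit int ones discussed above):
#include <stdint.h>
#include <stdio.h>

int get_sign(int32_t x) {
    return (x & 0x80000000) >> 31;
}

int main(void) {
    printf("%d %d %d\n", get_sign(-5), get_sign(0), get_sign(42));   /* prints 1 0 0 */
    return 0;
}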
Whether this expression has precisely defined semantics or not, it is not the most readable way to get the sign bit. Here is a simpler alternative:
int get_sign(int32_t x) {
    return x < 0;
}
As correctly pointed out by 2501, int32_t is defined to have 2s complement representation, so comparing to 0 has the same semantics as extracting the most significant bit.
Incidentally, both functions compile to the same exact code with gcc 5.3:
get_sign(int):
movl %edi, %eax
shrl $31, %eax
ret

Wrap around explanation for signed and unsigned variables in C?

I read a bit in the C spec that unsigned variables (in particular unsigned short int) perform a so-called wrap around on integer overflow, although I couldn't find anything on signed variables except that I'm left with undefined behavior.
My professor told me that their values also get wrapped around (maybe he just meant gcc). I thought the bits just get truncated and the bits I'm left with give me some weird value!
What is wrap around, and how is it different from just truncating bits?
Signed integer variables do not have wrap-around behavior in C language. Signed integer overflow during arithmetic computations produces undefined behavior. Note BTW that GCC compiler you mentioned is known for implementing strict overflow semantics in optimizations, meaning that it takes advantage of the freedom provided by such undefined behavior situations: GCC compiler assumes that signed integer values never wrap around. That means that GCC actually happens to be one of the compilers in which you cannot rely on wrap-around behavior of signed integer types.
For example, GCC compiler can assume that for variable int i the following condition
if (i > 0 && i + 1 > 0)
is equivalent to a mere
if (i > 0)
This is exactly what strict overflow semantics means.
Unsigned integer types implement modulo arithmetic. The modulus is equal to 2^N, where N is the number of bits in the value representation of the type. For this reason unsigned integer types do indeed appear to wrap around on overflow.
However, the C language never performs arithmetic computations in domains smaller than that of int/unsigned int. The type unsigned short int that you mention in your question will typically be promoted to type int in expressions before any computations begin (assuming that the range of unsigned short fits into the range of int). Which means that 1) the computations with unsigned short int will be performed in the domain of int, with overflow happening when int overflows, 2) overflow during such computations will lead to undefined behavior, not to wrap-around behavior.
For example, this code produces a wrap around
unsigned i = USHRT_MAX;
i *= INT_MAX; /* <- unsigned arithmetic, overflows, wraps around */
while this code
unsigned short i = USHRT_MAX;
i *= INT_MAX; /* <- signed arithmetic, overflows, produces undefined behavior */
leads to undefined behavior.
If no int overflow happens and the result is converted back to an unsigned short int type, it is again reduced modulo 2^N (with N now the width of unsigned short), which will appear as if the value has wrapped around.
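A sketch of that reduction, assuming the typical 16-bit unsigned short: the addition itself happens in int, so there is no int overflow, and the assignment back reduces the result modulo 65536.
#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned short s = USHRT_MAX;   /* 65535 on typical implementations */
    s = s + 1;                      /* int arithmetic yields 65536; stored back as 0 */
    printf("%u\n", (unsigned)s);    /* prints 0 */
    return 0;
}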
Imagine you have a data type that's only 3 bits wide. This allows you to represent 8 distinct values, from 0 through 7. If you add 1 to 7, you will "wrap around" back to 0, because you don't have enough bits to represent the value 8 (1000).
This behavior is well-defined for unsigned types. It is not well-defined for signed types, because there are multiple methods for representing signed values, and the result of an overflow will be interpreted differently based on that method.
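To make the 3-bit unsigned wrap-around above concrete, here is a small sketch that emulates it by masking, since C has no 3-bit integer type:
#include <stdio.h>

int main(void) {
    unsigned int v = 7;             /* 111 in the 3-bit examples */
    v = (v + 1) & 0x7u;             /* adding 1 wraps around to 000 */
    printf("%u\n", v);              /* prints 0 */
    return 0;
}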
Sign-magnitude: the uppermost bit represents the sign; 0 for positive, 1 for negative. If my type is three bits wide again, then I can represent signed values as follows:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -0
101 = -1
110 = -2
111 = -3
Since one bit is taken up for the sign, I only have two bits to encode a value from 0 to 3. If I add 1 to 3, I'll overflow with -0 as the result. Yes, there are two representations for 0, one positive and one negative. You won't encounter sign-magnitude representation all that often.
One's-complement: the negative value is the bitwise-inverse of the positive value. Again, using the three-bit type:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -3
101 = -2
110 = -1
111 = -0
I have three bits to encode my values, but the range is [-3, 3]. If I add 1 to 3, I'll overflow with -3 as the result. This is different from the sign-magnitude result above. Again, there are two encodings for 0 using this method.
Two's-complement: the negative value is the bitwise inverse of the positive value, plus 1. In the three-bit system:
000 = 0
001 = 1
010 = 2
011 = 3
100 = -4
101 = -3
110 = -2
111 = -1
If I add 1 to 3, I'll overflow with -4 as a result, which is different from the previous two methods. Note that we have a slightly larger range of values [-4, 3] and only one representation for 0.
Two's complement is probably the most common method of representing signed values, but it's not the only one, hence the C standard can't make any guarantees of what will happen when you overflow a signed integer type. So it leaves the behavior undefined so the compiler doesn't have to deal with interpreting multiple representations.
The undefined behavior comes from early portability issues when signed integer types could be represented either as sign & magnitude, one's complement or two's complement.
Nowadays, all architectures represent integers as two's complement that do wrap around. But be careful : since your compiler is right to assume you won't be running undefined behavior, you might encounter weird bugs when optimisation is on.
In a signed 8-bit integer, the intuitive definition of wrap around might look like going from +127 to -128 -- in two's complement binary: 01111111 (127) and 10000000 (-128). As you can see, that is the natural progression of incrementing the binary data -- without considering it to represent an integer, signed or unsigned. Counter-intuitively, the actual overflow takes place when moving from -1 (11111111) to 0 (00000000), in the unsigned integer's sense of wrap-around.
This doesn't answer the deeper question of what the correct behavior is when a signed integer overflows because there is no "correct" behavior according to the standard.

Tilde C unsigned vs signed integer

For example:
unsigned int i = ~0;
Result: Max number I can assign to i
and
signed int y = ~0;
Result: -1
Why do I get -1? Shouldn't I get the maximum number that I can assign to y?
Both 4294967295 (a.k.a. UINT_MAX) and -1 have the same binary representation of 0xFFFFFFFF, or 32 bits all set to 1. This is because signed numbers are represented using two's complement. A negative number has its MSB (most significant bit) set to 1, and its value is determined by flipping all the bits, adding 1 and multiplying by -1. So if you have the MSB set to 1 and the rest of the bits also set to 1, you flip them (get 32 zeros), add 1 (get 1) and multiply by -1 to finally get -1.
This makes it easier for the CPU to do the math as it needs no special exceptions for negative numbers. For example, try adding 0xFFFFFFFF (-1) and 1. Since there is only room for 32 bits, this will overflow and the result will be 0 as expected.
See more at:
http://en.wikipedia.org/wiki/Two%27s_complement
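A quick way to reproduce the 0xFFFFFFFF + 1 example from above (a sketch, assuming a 32-bit unsigned int):
#include <stdio.h>

int main(void) {
    unsigned int a = 0xFFFFFFFFu;   /* the bit pattern of -1 on 32-bit two's complement */
    unsigned int r = a + 1u;        /* unsigned arithmetic wraps modulo 2^32 */
    printf("%u\n", r);              /* prints 0 */
    return 0;
}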
unsigned int i = ~0;
Result: Max number I can assign to i
Usually, but not necessarily. The expression ~0 evaluates to an int with all (non-padding) bits set. The C standard allows three representations for signed integers,
two's complement, in which case ~0 = -1 and assigning that to an unsigned int results in (-1) + (UINT_MAX + 1) = UINT_MAX.
ones' complement, in which case ~0 is either a negative zero or a trap representation; if it's a negative zero, the assignment to an unsigned int results in 0.
sign-and-magnitude, in which case ~0 is INT_MIN == -INT_MAX, and assigning it to an unsigned int results in (UINT_MAX + 1) - INT_MAX, which is 1 in the unlikely case that unsigned int has a width (number of value bits for unsigned integer types, number of value bits + 1 [for the sign bit] for signed integer types) smaller than that of int and 2^(WIDTH - 1) + 1 in the common case that the width of unsigned int is the same as the width of int.
The initialisation
unsigned int i = ~0u;
will always result in i holding the value UINT_MAX.
signed int y = ~0;
Result: -1
As stated above, only if the representation of signed integers uses two's complement (which nowadays is by far the most common representation).
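A short sketch contrasting the two initialisations discussed above (the signed result assumes a two's complement implementation):
#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int i = ~0u;               /* always UINT_MAX: all value bits set */
    int y = ~0;                         /* -1 on two's complement implementations */
    printf("%u %u\n", i, UINT_MAX);     /* the two values match */
    printf("%d\n", y);                  /* prints -1 */
    return 0;
}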
~0 is just an int with all bits set to 1. When interpreted as unsigned this will be equivalent to UINT_MAX. When interpreted as signed this will be -1.
Assuming 32 bit ints:
0 = 0x00000000 = 0 (signed) = 0 (unsigned)
~0 = 0xffffffff = -1 (signed) = UINT_MAX (unsigned)
Paul's answer is absolutely right. Instead of using ~0, you can use:
#include <limits.h>
signed int y = INT_MAX;
unsigned int x = UINT_MAX;
And now if you check values:
printf("x = %u\ny = %d\n", UINT_MAX, INT_MAX);
you can see max values on your system.
No, because ~ is the bitwise NOT operator, not the maximum value for type operator. ~0 corresponds to an int with all bits set to 1, which, interpreted as an unsigned gives you the max number representable by an unsigned, and interpreted as a signed int, gives you -1.
You must be on a two's complement machine.
Look up http://en.wikipedia.org/wiki/Two%27s_complement, and learn a little about Boolean algebra, and logic design. Also learning how to count in binary and addition and subtraction in binary will explain this further.
The C language uses this form of numbers, so to find the largest signed number you need to use 0x7FFFFFFF (two hex digits per byte, with the leftmost byte being 7F). To understand this you need to look up hexadecimal numbers and how they work.
Now to explain the unsigned equivalent. In signed numbers the bottom half of the values are negative (0 is treated as positive, so negative numbers actually count one further than positive numbers). Unsigned numbers are all positive. So in theory the highest number for a 32-bit int is 2^32, except that 0 still counts as a value, so it's actually 2^32 - 1. For signed numbers, half of those values are negative: dividing 2^32 by 2 gives 2^31 numbers on each side, and since 0 sits on the positive side, the range of a signed 32-bit int is (-2^31, 2^31 - 1).
Now just comparing ranges:
unsigned 32 bit int: (0, 2^32-1)
signed 32 bit int: (-2^31, 2^31-1)
unsigned 16 bit int: (0, 2^16-1)
signed 16 bit int: (-2^15, 2^15-1)
you should be able to see the pattern here.
To explain the ~0 thing takes a bit more; it has to do with subtraction in binary. Subtraction is just flipping all the bits of one number, adding 1, and then adding the two numbers together. C does this for you behind the scenes, and so do many processors (including the x86 and x64 lines of processors).
Because of this it's best to store negative numbers as though they are counting down, and in two's complement the added 1 is also hidden. Because 0 is treated as positive, negative numbers can't have a representation for 0, so they effectively start at -1 (a positive 1 after the bit flip). When decoding negative numbers we have to account for this.
