What is the difference between unsigned int and signed int in C?

Consider these definitions:
int x=5;
int y=-5;
unsigned int z=5;
How are they stored in memory? Can anybody explain the bit representation of these in memory?
Can int x=5 and int y=-5 have the same bit representation in memory?

ISO C states what the differences are.
The int data type is signed and has a minimum range of at least -32767 through 32767 inclusive. The actual values are given in limits.h as INT_MIN and INT_MAX respectively.
An unsigned int has a minimal range of 0 through 65535 inclusive with the actual maximum value being UINT_MAX from that same header file.
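As a rough illustration (a sketch, not part of the original answer), the actual limits of a given implementation can be inspected through limits.h. The helper names print_int_limits and meets_minimum_ranges are made up for this example.

```c
#include <limits.h>
#include <stdio.h>

/* Print the actual ranges chosen by this implementation. */
void print_int_limits(void)
{
    printf("int:          %d .. %d\n", INT_MIN, INT_MAX);
    printf("unsigned int: 0 .. %u\n", UINT_MAX);
}

/* Return 1 if the implementation meets the minimum ranges quoted above. */
int meets_minimum_ranges(void)
{
    return INT_MIN <= -32767 && INT_MAX >= 32767 && UINT_MAX >= 65535u;
}
```

On a conforming implementation meets_minimum_ranges() should always return 1; the printed values vary by platform.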
Beyond that, the standard does not mandate two's complement notation for encoding the values; that is just one of the possibilities. The three allowed representations would encode 5 and -5 as follows (using 16-bit data types):
     | two's complement    | ones' complement    | sign/magnitude      |
     +---------------------+---------------------+---------------------+
   5 | 0000 0000 0000 0101 | 0000 0000 0000 0101 | 0000 0000 0000 0101 |
  -5 | 1111 1111 1111 1011 | 1111 1111 1111 1010 | 1000 0000 0000 0101 |
     +---------------------+---------------------+---------------------+
In two's complement, you get a negative of a number by inverting all bits then adding 1.
In ones' complement, you get a negative of a number by inverting all bits.
In sign/magnitude, the top bit is the sign so you just invert that to get the negative.
Note that positive values have the same encoding for all representations, only the negative values are different.
Note further that, for unsigned values, you do not need to use one of the bits for a sign. That means you get more range on the positive side (at the cost of no negative encodings, of course).
And no, 5 and -5 cannot have the same encoding regardless of which representation you use. Otherwise, there'd be no way to tell the difference.
As an aside, there are currently moves underway, in both C and C++ standards, to nominate two's complement as the only encoding for negative integers.
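The three negation rules described in this answer can be sketched on raw 16-bit patterns. This is only an illustration under the assumption that uint16_t is available; the function names are hypothetical and the code works on bit patterns rather than on the machine's own signed type.

```c
#include <stdint.h>

/* Two's complement: invert all bits, then add 1. */
uint16_t neg_twos_complement(uint16_t v)
{
    return (uint16_t)(~v + 1u);
}

/* Ones' complement: invert all bits. */
uint16_t neg_ones_complement(uint16_t v)
{
    return (uint16_t)~v;
}

/* Sign/magnitude: invert only the top (sign) bit. */
uint16_t neg_sign_magnitude(uint16_t v)
{
    return (uint16_t)(v ^ 0x8000u);
}
```

For the input 5 (0x0005) these produce the three bit patterns listed for -5 in the table of this answer.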

Because it's all just about memory, in the end all the numerical values are stored in binary.
A 32 bit unsigned integer can contain values from all binary 0s to all binary 1s.
For a 32-bit signed integer, one of its bits (the most significant) acts as a flag that marks the value as positive or negative.

The C standard specifies that unsigned numbers are stored in binary (with optional padding bits). Signed numbers can be stored in one of three formats: sign and magnitude, two's complement, or ones' complement. Interestingly, that rules out certain other representations like Excess-n or Base −2.
However, most machines and compilers store signed numbers in two's complement.
int is normally 16 or 32 bits. The standard says that int should be whatever is most efficient for the underlying processor; any size is allowed as long as it is >= short and <= long.
On some machines and OSs, however, history has caused int not to be the best size for the current iteration of hardware.

Here is a very nice link which explains the storage of signed and unsigned int in C:
http://answers.yahoo.com/question/index?qid=20090516032239AAzcX1O
Taken from the above article:
"process called two's complement is used to transform positive numbers into negative numbers. The side effect of this is that the most significant bit is used to tell the computer if the number is positive or negative. If the most significant bit is a 1, then the number is negative. If it's 0, the number is positive."

Assuming int is a 16-bit integer (this depends on the C implementation; most are 32-bit nowadays), the bit representations are as follows:
5 = 0000000000000101
-5 = 1111111111111011
If the bit pattern 1111111111111011 were stored in an unsigned int, it would be decimal 65531.
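A small sketch of that reinterpretation, assuming the fixed-width types from stdint.h are available (the function name is made up for illustration). Converting a signed value to an unsigned type is defined by the C standard as reduction modulo 2^16 here, so the result does not depend on how the machine stores negative numbers.

```c
#include <stdint.h>

/* Convert a 16-bit signed value to unsigned: the result is v modulo 65536. */
uint16_t reinterpret_as_unsigned16(int16_t v)
{
    return (uint16_t)v;
}
```

Passing -5 yields 65536 - 5 = 65531, while non-negative inputs come back unchanged.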

Related

Is 2's complement representation also used on positive numbers?

When I write:
signed int a = 4;
is my computer using two's complement representation?
Because if my computer uses two's complement representation to represent the number 4, this is what will happen on an 8-bit machine:
binary value of 4 : 0000 0100
2’s complement become: 1111 1011
add 1: 1111 1100
But I read that when the most significant bit is 1, your number is negative. Here my most significant bit is 1 and my number is 4, not -4.
why my number 4 has a 1 as the most significant bit?
2’s complement become: 1111 1011
No. Where did you get that idea from? 1111 1011 is -5 in two's complement. -5 is not +4.
-4 is not the same as +4 either.
The binary value of 4 is 0000 0100. The signed number variable representation of 4 is therefore also 0000 0100.
Two's complement is irrelevant unless the number is negative.
why my number 4 has a 1 as the most significant bit?
It doesn't. Your -4 has a 1 as the msb.
When you declare a as:
signed int a;
the machine reserves the most significant bit (MSB) as a marker for positive or negative values:
0 for positive
1 for negative
Taking the 2's complement of a number (that is, taking the 1's complement and adding 1 to it, which is where the confusion in the question comes from) negates the number you are working with.
So when you take the 2's complement of 4 you get 1111 1100, which is the binary notation for -4.
Since this is a negative number, its MSB is 1.
In some sense a signed integer is in "2's complement", in that it requires the left-most bit to be reserved for the sign. "2's complement" tells how to make a positive integer negative by using "flip the bits + one". The implication is that the positive integer is in base 2 and can use only n-1 bits, with n the number of bits in an int.
So "2's complement" is a way to represent negative numbers in binary, not positive numbers in binary.
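To check this on 8-bit values, here is a minimal sketch (helper names are hypothetical). It relies on the fact that converting to an unsigned type is done modulo 256, which yields the two's complement bit pattern of the value.

```c
#include <stdint.h>

/* Return the 8-bit two's complement pattern of v. */
uint8_t pattern8(int8_t v)
{
    return (uint8_t)v;
}

/* Return 1 if the most significant bit of an 8-bit pattern is set. */
int msb_is_set(uint8_t bits)
{
    return (bits & 0x80u) != 0;
}
```

With these, +4 maps to 0000 0100 (MSB clear) and -4 maps to 1111 1100 (MSB set).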

How to get the signed complement of a number?

I want to find the signed value of a number in C. So if I have a number let's say 10, in binary (in 8 bits) it would be 0000 0110. How do I get the signed number in two's complement 1111 1110, which is -2. Using simple bitwise operations, shifts, masks, how do I do this conversion? I've been stuck on this for hours.
If we already have the binary representation of a positive number n, then the bitwise representation of -n is ~n+1, in other words, 1 plus the bitwise negation of the positive number.
http://en.wikipedia.org/wiki/Two%27s_complement
I want to find the signed value of a number in C. So if I have a number let's say 10, in binary (in 8 bits) it would be 0000 0110. How do I get the signed number in two's complement 1111 1110, which is -2.
You're confused. 0000 0110 is 6, not 10 (10 would be 0000 1010). The 8-bit two's complement of 0000 0110 is 1111 1010 (-6 if interpreted as signed, or 250 if interpreted as unsigned). The whole point is that a bit pattern and its (unsigned) n-bit two's complement add up to 2^n.
Using simple bitwise operations, shifts, masks, how do I do this conversion? I've been stuck on this for hours.
Here:
unsigned char x = 0x0a;
unsigned char twos_complement = (~x) + 1;
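A slightly fuller sketch of the same idea, with hypothetical helper names. The second function reads a pattern back as a signed value by plain arithmetic, which avoids relying on the implementation-defined conversion of out-of-range values to a signed type.

```c
#include <stdint.h>

/* 8-bit two's complement: invert the bits and add 1, keeping 8 bits. */
uint8_t twos_complement8(uint8_t x)
{
    return (uint8_t)(~x + 1u);
}

/* Interpret an 8-bit pattern as a two's complement signed value. */
int as_signed8(uint8_t bits)
{
    return bits >= 128 ? (int)bits - 256 : (int)bits;
}
```

For example, twos_complement8(0x0A) gives 0xF6, which as_signed8 reads as -10, and any nonzero pattern plus its complement sums to 256 (2^8).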

Right bit-shift giving wrong result, can someone explain

I'm right-shifting -109 by 5 bits, and I expect -3, because
-109 = -1101101 (binary)
shift right by 5 bits
-1101101 >>5 = -11 (binary) = -3
But, I am getting -4 instead.
Could someone explain what's wrong?
Code I used:
int16_t a = -109;
int16_t b = a >> 5;
printf("%d %d\n", a,b);
I used GCC on linux, and clang on osx, same result.
The thing is that you are not considering the representation of negative numbers correctly. With right shifting, the type of shift (arithmetic or logical) depends on the type of the value being shifted. If you cast your value to an unsigned type of the same width, you get a logical shift instead:
int16_t b = ((uint16_t)a) >> 5;
You are using -109 (16 bits) in your example. 109 in bits is:
00000000 01101101
If you take the 2's complement of 109 you get:
11111111 10010011
Then, you are right shifting by 5 the number 11111111 10010011:
int16_t a = -109;
int16_t b = a >> 5;             // arithmetic shift (sign-extending on GCC and Clang)
int16_t c = ((uint16_t)a) >> 5; // logical shift (zero-filling)
printf("%d %d %d\n", a, b, c);
Will yield:
-109 -4 2044
The result of right shifting a negative value is implementation defined behavior, from the C99 draft standard section 6.5.7 Bitwise shift operators paragraph 5 which says (emphasis mine going forward):
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type
or if E1 has a signed type and a nonnegative value, the value of the result is the integral
part of the quotient of E1 / 2^E2. If E1 has a signed type and a negative value, the
resulting value is implementation-defined.
If we look at gcc C Implementation-defined behavior documents under the Integers section it says:
The results of some bitwise operations on signed integers (C90 6.3, C99 and C11 6.5).
Bitwise operators act on the representation of the value including both the sign and value bits, where the sign bit is considered immediately above the highest-value value bit. Signed ‘>>’ acts on negative numbers by sign extension.
That makes it pretty clear what is happening: negative integers are sign-extended when shifted right, and the leftmost (most significant) bit is the sign bit.
So 1000 ... 0000 (32 bits) is the most negative number that you can represent with 32 bits.
Because of this, when you shift a negative number right, sign extension happens: the vacated positions on the left are filled with copies of the sign bit. For a number like -109, this is what happens:
Before shifting you have (16bit):
1111 1111 1001 0011
Then you shift 5 bits right (after the pipe are the discarded bits):
1XXX X111 1111 1100 | 1 0011
The X's are the new spaces that appear in your integer bit representation, that due to the sign extension, are filled with 1's, which give you:
1111 1111 1111 1100 | 1 0011
So by shifting: -109 >> 5, you get -4 (1111 .... 1100) and not -3.
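Another way to look at the -3 versus -4 discrepancy: a sign-extending right shift by 5 behaves like division by 32 rounded toward negative infinity, whereas the pencil-and-paper calculation in the question (and C's / operator since C99) rounds toward zero. The sketch below uses made-up helper names to contrast the two roundings.

```c
/* Truncating division, as performed by C's / operator. */
int div_trunc(int a, int b)
{
    return a / b;
}

/* Floor division: round the quotient toward negative infinity. */
int div_floor(int a, int b)
{
    int q = a / b;
    /* Adjust when there is a remainder and the operands have opposite signs. */
    if ((a % b != 0) && ((a < 0) != (b < 0))) {
        q -= 1;
    }
    return q;
}
```

Here div_trunc(-109, 32) is -3 and div_floor(-109, 32) is -4, matching the expected and the observed results respectively.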
Confirming the results with the 2's complement:
+3 = 0... 0000 0011
-3 = ~(0... 0000 0011) + 1 = 1... 1111 1100 + 1 = 1... 1111 1101
+4 = 0... 0000 0100
-4 = ~(0... 0000 0100) + 1 = 1... 1111 1011 + 1 = 1... 1111 1100
Note: Remember that the 2's complement is just the 1's complement plus one: you first invert the bits of the positive number and only then add 1.
Pablo's answer is essentially correct, but there are two small bits (no pun intended!) that may help you see what's going on.
C (like pretty much every other language) in practice uses what's called two's complement, which is simply a different way of representing negative numbers (it's used to avoid the problems that come up with other ways of handling negative numbers in binary with a fixed number of digits). The process for converting a positive number to its negative in two's complement is reasonably simple computationally (a positive number looks just like any other number in binary, except that the leftmost bit must be 0; that bit is basically the sign placeholder):
Take your number
00000000 01101101 (It has 0s padding it to the left because it's 16 bits. If it was long, it'd be padded with more zeros, etc.)
Flip the bits
11111111 10010010
Add one.
11111111 10010011.
This is the two's complement number that Pablo was referring to. It's how C holds -109, bitwise.
When you logically shift it to the right by five bits you would APPEAR to get
00000111 11111100.
This number is most definitely not -4. (It doesn't have a 1 in the first bit, so it's not negative, and it's way too large to be 4 in magnitude.) Why is C giving you negative 4 then?
The reason is basically that the ISO C standard doesn't specify how a given compiler must treat right shifts of negative numbers. GCC does what's called sign extension: the idea is to basically pad the left bits with 1s (if the initial number was negative before shifting), or 0s (if the initial number was positive before shifting).
So instead of the 5 zeros that happened in the above bit-shift, you instead get:
11111111 11111100. That number is in fact negative 4! (Which is what you were consistently getting as a result.)
To see that that is in fact -4, you can just convert it back to a positive number using the two's complement method again:
00000000 00000011 (bits flipped)
00000000 00000100 (add one).
That's four alright, so your original number (11111111 11111100) was -4.
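The zero-filling and sign-extending shifts described in this answer can be emulated on raw 16-bit patterns without depending on how a particular compiler treats >> on negative values. This is only a sketch; the function names are hypothetical and n is assumed to be in the range 0 to 15.

```c
#include <stdint.h>

/* Logical shift: vacated high bits are filled with zeros. */
uint16_t shift_right_zero_fill(uint16_t bits, unsigned n)
{
    return (uint16_t)(bits >> n);
}

/* Arithmetic shift: vacated high bits are filled with copies of the sign bit. */
uint16_t shift_right_sign_extend(uint16_t bits, unsigned n)
{
    uint16_t shifted = (uint16_t)(bits >> n);
    if (bits & 0x8000u) {
        /* Mask with the top n bits set, e.g. 0xF800 for n == 5. */
        uint16_t fill = (uint16_t)~(0xFFFFu >> n);
        shifted |= fill;
    }
    return shifted;
}
```

Applied to 0xFF93 (the pattern of -109) with n = 5, the zero-filling version gives 0x07FC (2044) and the sign-extending version gives 0xFFFC (the pattern of -4).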

Why does the range of int have a minus 1?

I read that the range of an int is dependent on a byte.
So taking int to be 4 bytes long, that's 4 * 8 bits = 32 bits.
So the range should be: 2 ^ (32-1) = 2 ^ (31)
Why do some people say it's 2^31 - 1 though?
Thanks!
Because the counting starts from 0.
The maximum value of an int is 2,147,483,647, whereas 2^31 is 2,147,483,648; hence we subtract 1.
Also, the loss of 1 bit is for the positive and negative sign.
Check this interesting wiki article on Integers:
The most common representation of a positive integer is a string of
bits, using the binary numeral system. The order of the memory bytes
storing the bits varies; see endianness. The width or precision of an
integral type is the number of bits in its representation. An integral
type with n bits can encode 2^n numbers; for example an unsigned type
typically represents the non-negative values 0 through 2^n−1. Other
encodings of integer values to bit patterns are sometimes used, for
example Binary-coded decimal or Gray code, or as printed character
codes such as ASCII.
There are four well-known ways to represent signed numbers in a binary
computing system. The most common is two's complement, which allows a
signed integral type with n bits to represent numbers from −2^(n−1)
through 2^(n−1)−1. Two's complement arithmetic is convenient because
there is a perfect one-to-one correspondence between representations
and values (in particular, no separate +0 and −0), and because
addition, subtraction and multiplication do not need to distinguish
between signed and unsigned types. Other possibilities include offset
binary, sign-magnitude, and ones' complement.
You mean 2^32 - 1, NOT 2^(32-1).
But your question is about why people use 2^31. The loss of a whole bit happens when the int is a signed one: you lose the first bit to indicate whether the number is positive or negative.
A signed int (32 bit) ranges from -2,147,483,648 to +2,147,483,647.
An unsigned int (32 bit) ranges from 0 to 4,294,967,295 (which is 2^32 - 1).
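These limits follow directly from the bit width. The sketch below computes them for a two's complement type of a given width; the function names are made up, and the width is assumed to be between 1 and 62 so that the shifts stay within 64-bit range.

```c
#include <stdint.h>

/* Largest value of a signed two's complement type: 2^(bits-1) - 1. */
int64_t signed_max(unsigned bits)
{
    return ((int64_t)1 << (bits - 1)) - 1;
}

/* Smallest value of a signed two's complement type: -2^(bits-1). */
int64_t signed_min(unsigned bits)
{
    return -((int64_t)1 << (bits - 1));
}

/* Largest value of an unsigned type: 2^bits - 1. */
uint64_t unsigned_max(unsigned bits)
{
    return ((uint64_t)1 << bits) - 1;
}
```

For bits = 32 this reproduces the three figures quoted above; for bits = 4 it gives -8, 7 and 15.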
int is a signed data type.
The first bit represents the sign, followed by bits for the value.
If the sign bit is 0, the value is simply the sum of the powers of 2 for all bits set to 1.
e.g. 0...00101 is 2^0 + 2^2 = 5
If the first bit is 1, the value is -2^32 + the sum of the powers of 2 for all bits set to 1 (including the first bit).
e.g. 1...111100 is -2^32 + 2^31 + 2^30 + ... + 2^2 = -4
All zeros thus result in zero.
If you work through the calculation, you will see that every number from -2^31 up to 2^0 + ... + 2^30 = 2^31 - 1 (both inclusive) can be created with those 32 bits.
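The rule in this answer can be turned into a short sketch that evaluates a 32-bit pattern by summing powers of two and subtracting 2^32 when the first bit is set. The function name is hypothetical, and a 64-bit accumulator is used so that the intermediate sum cannot overflow.

```c
#include <stdint.h>

/* Value of a 32-bit two's complement pattern, computed from its bits. */
int64_t twos_complement_value32(uint32_t bits)
{
    int64_t sum = 0;
    for (int i = 0; i < 32; i++) {
        if (bits & (UINT32_C(1) << i)) {
            sum += (int64_t)1 << i;   /* add 2^i for every set bit */
        }
    }
    if (bits & UINT32_C(0x80000000)) {
        sum -= (int64_t)1 << 32;      /* first bit set: subtract 2^32 */
    }
    return sum;
}
```

The pattern 0x00000005 evaluates to 5 and 0xFFFFFFFC evaluates to -4, in line with the two worked examples.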
2^(32-1) is not the same as 2^32 - 1 (as 0 is included in the range, we subtract 1).
For your understanding, let us use the small number 4 instead of 32:
2^(4-1) = 8
whereas 2^4 - 1 = 16 - 1 = 15.
Hope this helps!
Since an integer is 32 bits, it can store a total of 2^32 values. So an integer ranges from -2^31 to 2^31-1, giving a total of 2^32 values (2^31 values in the negative range + 2^31 values in the non-negative range, including 0). The first bit (the most significant bit) is reserved for the sign of the integer. You also need to understand how negative integers are stored: they are stored in 2's complement form, so -9 is stored as the 2's complement of 9.
So 9 is stored in a 32-bit system as
0000 0000 0000 0000 0000 0000 0000 1001
and -9 will be stored as
1111 1111 1111 1111 1111 1111 1111 0111 (2's complement of 9).
If an arithmetic operation on an integer exceeds the maximum value (2^31-1), the result typically wraps around to the negative values on two's complement hardware: adding 1 to 2^31-1 usually gives -2^31. Strictly speaking, though, signed integer overflow is undefined behavior in C, so a program should not rely on this.
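One caveat worth illustrating: the C standard leaves signed overflow undefined, while unsigned arithmetic is defined to wrap around modulo 2^N. A cautious program can test for overflow before adding. The sketch below uses hypothetical helper names and is one common way to write such a check, not the only one.

```c
#include <limits.h>
#include <stdbool.h>

/* Return true if a + b would fall outside the range of int. */
bool add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) ||
           (b < 0 && a < INT_MIN - b);
}

/* Unsigned addition is well defined: it wraps around modulo UINT_MAX + 1. */
unsigned wrap_add(unsigned a, unsigned b)
{
    return a + b;
}
```

add_would_overflow(INT_MAX, 1) reports true without ever performing the overflowing addition, and wrap_add(UINT_MAX, 1u) is guaranteed to be 0.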

Representation of negative numbers in C?

How does C represent negative integers?
Is it by two's complement representation or by using the MSB (most significant bit)?
-1 in hexadecimal is ffffffff.
So please clarify this for me.
ISO C (C99 section 6.2.6.2/2 in this case but it carries forward to later iterations of the standard(a)) states that an implementation must choose one of three different representations for integral data types, two's complement, ones' complement or sign/magnitude (although it's incredibly likely that the two's complement implementations far outweigh the others).
In all those representations, positive numbers are identical, the only difference being the negative numbers.
To get the negative representation for a positive number, you:
invert all bits then add one for two's complement.
invert all bits for ones' complement.
invert just the sign bit for sign/magnitude.
You can see this in the table below:
number | two's complement    | ones' complement    | sign/magnitude
=======|=====================|=====================|====================
     5 | 0000 0000 0000 0101 | 0000 0000 0000 0101 | 0000 0000 0000 0101
    -5 | 1111 1111 1111 1011 | 1111 1111 1111 1010 | 1000 0000 0000 0101
Keep in mind that ISO doesn't mandate that all bits are used in the representation. They introduce the concept of a sign bit, value bits and padding bits. Now I've never actually seen an implementation with padding bits but, from the C99 rationale document, they have this explanation:
Suppose a machine uses a pair of 16-bit shorts (each with its own sign bit) to make up a 32-bit int and the sign bit of the lower short is ignored when used in this 32-bit int. Then, as a 32-bit signed int, there is a padding bit (in the middle of the 32 bits) that is ignored in determining the value of the 32-bit signed int. But, if this 32-bit item is treated as a 32-bit unsigned int, then that padding bit is visible to the user’s program. The C committee was told that there is a machine that works this way, and that is one reason that padding bits were added to C99.
I believe the machine they may have been referring to was the Datacraft 6024 (and its successors from Harris Corp). In those machines, you had a 24-bit word used for the signed integer but, if you wanted the wider type, it strung two of them together as a 47-bit value with the sign bit of one of the words ignored:
+---------+-----------+--------+-----------+
| sign(1) | value(23) | pad(1) | value(23) |
+---------+-----------+--------+-----------+
\____________________/ \___________________/
      upper word            lower word
(a) Interestingly, given the scarcity of modern implementations that actually use the other two methods, there's been a push to have two's complement accepted as the one true method. This has gone quite a long way in the C++ standard (WG21 is the workgroup responsible for this) and is now apparently being considered for C as well (by WG14).
C allows sign/magnitude, one's complement and two's complement representations of signed integers. Most typical hardware uses two's complement for integers and sign/magnitude for floating point (and yet another possibility -- a "bias" representation for the floating point exponent).
-1 in hexadecimal is ffffffff. So please clarify me in this regard.
In two's complement (by far the most commonly used representation), each bit except the most significant bit (MSB), from right to left (increasing order of magnitude), has the value 2^n, where n starts at zero and increases by one per position. The MSB has the value -2^n.
So, for example, in an 8-bit two's complement integer the MSB has the place value -2^7 (-128), so the binary number 1111 1111 is equal to -128 + 0111 1111 (binary) = -128 + 127 = -1.
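The place-value rule for 8 bits can be sketched as follows (the function name is made up for this example): the lower seven bits contribute their usual positive weights and the MSB contributes -128.

```c
#include <stdint.h>

/* Evaluate an 8-bit two's complement pattern from its place values. */
int twos_complement_value8(uint8_t bits)
{
    int value = 0;
    for (int n = 0; n < 7; n++) {
        if (bits & (1u << n)) {
            value += 1 << n;   /* bits 0..6 weigh +2^n */
        }
    }
    if (bits & 0x80u) {
        value -= 128;          /* the MSB weighs -2^7 */
    }
    return value;
}
```

For the pattern 1111 1111 this computes 127 - 128 = -1, and for 1000 0000 it gives -128.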
One useful feature of two's complement is that a processor's ALU only requires an adder block to perform subtraction, by forming the two's complement of the right-hand operand. For example 10 - 6 is equivalent to 10 + (-6); in 8bit binary (for simplicity of explanation) this looks like:
0000 1010
+1111 1010
---------
[1]0000 0100 = 4 (decimal)
Where the [1] is the discarded carry bit. Another example; 10 - 11 == 10 + (-11):
0000 1010
+1111 0101
---------
1111 1111 = -1 (decimal)
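The two worked sums can be reproduced with a minimal sketch of subtraction built only from addition and bit inversion. The function name is hypothetical; the casts to uint8_t play the role of discarding the carry out of the eighth bit.

```c
#include <stdint.h>

/* Compute a - b in 8 bits as a + (two's complement of b). */
uint8_t sub_via_add(uint8_t a, uint8_t b)
{
    uint8_t neg_b = (uint8_t)(~b + 1u);   /* invert the bits, add 1 */
    return (uint8_t)(a + neg_b);          /* carry beyond bit 7 is dropped */
}
```

sub_via_add(10, 6) yields 0000 0100 (4) and sub_via_add(10, 11) yields 1111 1111, the 8-bit pattern for -1.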
Another feature of two's complement is that it has a single value representing zero, whereas sign-magnitude and one's complement each have two; +0 and -0.
For integral types it's usually two's complement (implementation specific). For floating point, there's a sign bit.

Resources