What is the difference between signed and unsigned int?
As you are probably aware, ints are stored internally in binary. Typically an int contains 32 bits, but in some environments might contain 16 or 64 bits (or even a different number, usually but not necessarily a power of two).
But for this example, let's look at 4-bit integers. Tiny, but useful for illustration purposes.
Since there are four bits in such an integer, it can assume one of 16 values; 16 is two to the fourth power, or 2 times 2 times 2 times 2. What are those values? The answer depends on whether this integer is a signed int or an unsigned int. With an unsigned int, the value is never negative; there is no sign associated with the value. Here are the 16 possible values of a four-bit unsigned int:
bits value
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 8
1001 9
1010 10
1011 11
1100 12
1101 13
1110 14
1111 15
And here are the 16 possible values of a four-bit signed int:
bits value
0000 0
0001 1
0010 2
0011 3
0100 4
0101 5
0110 6
0111 7
1000 -8
1001 -7
1010 -6
1011 -5
1100 -4
1101 -3
1110 -2
1111 -1
As you can see, for signed ints the most significant bit is 1 if and only if the number is negative. That is why, for signed ints, this bit is known as the "sign bit".
In layman's terms, an unsigned int is an integer that cannot be negative, and therefore has a higher range of positive values that it can assume. A signed int is an integer that can be negative, but has a lower positive range in exchange for the negative values it can assume.
int and unsigned int are two distinct integer types. (int can also be referred to as signed int, or just signed; unsigned int can also be referred to as unsigned.)
As the names imply, int is a signed integer type, and unsigned int is an unsigned integer type. That means that int is able to represent negative values, and unsigned int can represent only non-negative values.
The C language imposes some requirements on the ranges of these types. The range of int must be at least -32767 .. +32767, and the range of unsigned int must be at least 0 .. 65535. This implies that both types must be at least 16 bits. They're 32 bits on many systems, or even 64 bits on some. int typically has one extra negative value (e.g. -32768 for a 16-bit int) due to the two's complement representation used by most modern systems.
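To see the actual ranges on your platform, you can print the limits.h constants; a minimal sketch:

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* The exact values are implementation-defined; limits.h reports them. */
    printf("int:          %d .. %d\n", INT_MIN, INT_MAX);
    printf("unsigned int: 0 .. %u\n", UINT_MAX);
    return 0;
}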
Perhaps the most important difference is the behavior of signed vs. unsigned arithmetic. For signed int, overflow has undefined behavior. For unsigned int, there is no overflow; any operation that yields a value outside the range of the type wraps around, so for example UINT_MAX + 1U == 0U.
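Here is a short demonstration of that wraparound rule (a sketch; this behavior is guaranteed by the standard for unsigned arithmetic):

#include <stdio.h>
#include <limits.h>

int main(void) {
    unsigned int u = UINT_MAX;
    u = u + 1U;               /* wraps around: well-defined, yields 0 */
    printf("%u\n", u);        /* prints 0 */
    printf("%u\n", 0U - 1U);  /* wraps the other way: prints UINT_MAX */
    return 0;
}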
Any integer type, either signed or unsigned, models a subrange of the infinite set of mathematical integers. As long as you're working with values within the range of a type, everything works. When you approach the lower or upper bound of the type, you encounter a discontinuity and can get unexpected results. For signed integer types, the problems occur only for values below INT_MIN or above INT_MAX. For unsigned integer types, problems occur for very large positive values and at zero. This can be a source of bugs. For example, this is an infinite loop:
for (unsigned int i = 10; i >= 0; i--) {
    printf("%u\n", i);
}
because i is always greater than or equal to zero; that's the nature of unsigned types. (Inside the loop, when i is zero, i-- sets its value to UINT_MAX.)
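A common way to write such a countdown correctly is to move the decrement into the test, so the loop stops before the counter can wrap; a sketch of that idiom:

#include <stdio.h>

int main(void) {
    /* i-- > 0 tests i before decrementing; when i reaches 0 the test fails
       and the loop ends, so the body sees 10 down to 0 and never relies
       on i becoming negative. */
    for (unsigned int i = 11; i-- > 0; ) {
        printf("%u\n", i);
    }
    return 0;
}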
Sometimes we know in advance that the value stored in a given integer variable will never be negative, for example when it is only used to count things. In such a case we can declare the variable to be unsigned, as in unsigned int num_students;. With such a declaration, the range of permissible values (for a 32-bit int) shifts from -2147483648 .. +2147483647 to 0 .. 4294967295. Thus, declaring an integer as unsigned roughly doubles the largest value it can hold.
In practice, there are two differences:
printing (e.g. with cout in C++ or printf in C): the bit representation is interpreted as a non-negative integer by the print functions when the type is unsigned.
ordering: comparisons depend on whether the type is signed or unsigned.
This code can determine whether plain char behaves as signed or unsigned, using the ordering criterion:
#include <stdio.h>

int main(void) {
    char a = 0;
    a--;   /* becomes -1 if char is signed, the maximum char value if unsigned */
    if (0 < a)
        printf("unsigned");
    else
        printf("signed");
    return 0;
}
char is considered signed by some compilers and unsigned by others. The code above determines which one your compiler uses, applying the ordering criterion: if a is unsigned, then after a-- it will be greater than 0; if it is signed, it will be less than zero. But in both cases the bit representation of a is the same, that is, a-- makes the same change to the bits either way.
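Alternatively, you can ask limits.h directly instead of relying on the wraparound trick; a minimal sketch:

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* CHAR_MIN is 0 if plain char is unsigned, negative if it is signed. */
    if (CHAR_MIN < 0)
        printf("signed");
    else
        printf("unsigned");
    return 0;
}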
Related
I've spent about 3-4 hours reading about bit fields in C and I can't understand how they work. For example, I can't understand why the program below has the output -1, 2, -3.
#include <stdio.h>

struct REGISTER {
    int bit1 : 1;
    int      : 2;
    int bit3 : 4;
    int bit4 : 4;
};

int main(void) {
    struct REGISTER bit = { 1, 2, 13 };
    printf("%d, %d, %d\n", bit.bit1, bit.bit3, bit.bit4);
    return 0;
}
Can someone give me an explanation? I tend to think that if I use unsigned in the struct then the output would be positive. But I don't know where that -3 comes from.
Your compiler treats the type int of a bit-field as signed int.
Consider the binary representation of initializers (it is enough to consider one byte)
1 -> 0b00000001
2 -> 0b00000010
13 -> 0b00001101
So the first bit-field, having width 1, gets the bit 1. For an integer with one bit, that bit is the sign bit, so such a bit-field can represent only the two values 0 and -1 (in the two's complement representation); storing 1 sets the sign bit, and the field reads back as -1.
The second initialized bit-field (bit3) has width 4, so it stores the bit pattern 0010, which is equal to 2.
The third initialized bit-field (bit4) also has width 4, so its stored bit combination is 1101. The most significant bit is the sign bit and it is set, so the bit-field contains a negative number: -3. You can verify this by adding 3 to it:

  1101  ( = -3 )
+ 0011  ( =  3 )
  ====
  0000  ( =  0 )
It is implementation-defined whether a plain int bit-field is signed or unsigned, so relying on it is effectively an error in any program where you care about the value; if you care about the value, qualify the field explicitly as signed or unsigned.
Now, your compiler considers bit-fields without specified signedness to be signed, i.e. int bit4 : 4 declares a bit-field that is 4 bits wide and signed.
13 cannot be represented in a signed 4-bit bit-field, as the maximum value is 7 no matter how negative numbers are represented (two's complement, ones' complement, sign-and-magnitude). An implementation-defined conversion occurs: in your case the bit pattern 1101 is stored as-is in the two's complement signed bit-field, where it is read back as the two's complement negative value -3.
The same happens for the 1-bit signed bit-field: the single bit is the sign bit, hence there are only two possible values, 0 and -1, on two's complement systems. On a ones' complement or sign-and-magnitude system, a one-bit signed bit-field makes no sense at all, because it can only store 0 or a trap representation.
If you want it to be able to store value 13, you will have to use at least 5 bits or use unsigned int: 4.
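For example, both fixes look like this (a sketch; the struct name REGISTER2 is mine, not from the question):

#include <stdio.h>

struct REGISTER2 {
    signed int   wide : 5;   /* 5 signed bits: range -16 .. 15 */
    unsigned int uns  : 4;   /* 4 unsigned bits: range 0 .. 15 */
};

int main(void) {
    struct REGISTER2 r = { 13, 13 };
    printf("%d, %d\n", r.wide, r.uns);   /* prints: 13, 13 */
    return 0;
}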
Don't use signed int for this; here it leads to negative results. If we use unsigned int instead, the output comes out non-negative.
What happens behind the scenes is that the value 13 is stored in the 4-bit signed field as 1101. The MSB is 1, so it is a negative number, and the two's complement of the pattern must be taken to find its magnitude, which is what is done internally. Taking the two's complement yields 0011, which is decimal 3, and since the stored number was negative you get -3.
#include <stdio.h>

struct REGISTER {
    unsigned int bit1 : 1;
    unsigned int      : 2;
    unsigned int bit3 : 4;
    unsigned int bit4 : 4;
};

int main(void) {
    struct REGISTER bit = { 1, 2, 13 };
    printf("%d, %d, %d\n", bit.bit1, bit.bit3, bit.bit4);
    return 0;
}
Here the output will be exactly 1, 2, 13.
It's pretty easy, you're initializing the bitfields as follows:
1 as bit length 1. That means the single bit is 1, and since the type is int that bit is the sign bit. So the resulting value is -0 - 1 = -1.
2 as bit length 4. The MSB (sign bit) is 0, so the result is positive, i.e. 2.
13 as bit length 4. In binary that is 1101, so the MSB is 1 (negative), so the resulting value is -2 - 1 = -3.
You can read more about two's complement, the format in which numbers are stored on Intel architectures. The short version is that negative numbers have MSB = 1, and to calculate their value you take the negative of their inverted representation (without the sign bit) minus one.
I read about two's complement on Wikipedia and on Stack Overflow. This is what I understood, but I'm not sure if it's correct:
signed int
the leftmost bit is interpreted as -2^31, and this is how we can have negative numbers
unsigned int
the leftmost bit is interpreted as +2^31, and this is how we achieve large positive numbers
Update:
What will the compiler see when we store 3 vs -3?
I thought 3 is always 00000000000000000000000000000011
and -3 is always 11111111111111111111111111111101
example for 3 vs -3 in C:
unsigned int x = -3;
int y = 3;
printf("%d %d\n", x, y); // -3 3
printf("%u %u\n", x, y); // 4294967293 3
printf("%x %x\n", x, y); // fffffffd 3
Two's complement is a way to represent negative integers in binary.
First of all, here are the standard 32-bit integer ranges:
Signed   = -(2^31) to (2^31) - 1
Unsigned = 0 to (2^32) - 1
In two's complement, a negative is represented by inverting the bits of its positive equivalent and adding 1:
10 which is 00001010 becomes -10 which is 11110110 (if the numbers were 8-bit integers).
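You can check that rule in a few lines (a sketch; it assumes the usual two's complement hardware and uses int8_t from stdint.h as the 8-bit integer):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int8_t x = 10;
    int8_t neg = (int8_t)(~x + 1);               /* invert the bits, then add 1 */
    printf("%d\n", neg);                         /* prints -10 */
    printf("%02X\n", (unsigned)(uint8_t)neg);    /* prints F6 = 1111 0110 */
    return 0;
}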
Also, the binary representation is only important if you plan on using bitwise operators.
If you're doing basic arithmetic, then this is unimportant.
The only time this may give unexpected results outside of the aforementioned cases is taking the absolute value of the most negative value, -(2^31): its positive counterpart cannot be represented, so the result still comes out negative.
Your problem does not have to do with the representation, but the type.
A negative number stored in an unsigned integer has the same bit representation; the difference is that it becomes a very large positive number, since there is no sign bit and that bit is just another value bit.
You should also realize that ((2^32) - 5) is the exact same thing as -5 if the value is unsigned, etc.
Therefore, the following holds true:
unsigned int x = UINT_MAX - 4;  /* (2^32) - 5; UINT_MAX comes from limits.h */
unsigned int y = -5;            /* -5 converted to unsigned wraps modulo 2^32 */

if (x == y) {
    printf("Negative values wrap around in unsigned integers on underflow.");
}
else {
    printf("Unsigned integer underflow is undefined!");  /* never reached */
}
The numbers don't change, just the interpretation of the numbers. For most two's complement processors, add and subtract do the same math, but set a carry / borrow status assuming the numbers are unsigned, and an overflow status assuming the number are signed. For multiply and divide, the result may be different between signed and unsigned numbers (if one or both numbers are negative), so there are separate signed and unsigned versions of multiply and divide.
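To make the divide difference concrete, here is a sketch: the same 32-bit pattern divided by 2 gives different answers depending on whether it is treated as signed or unsigned (the signed value assumes two's complement):

#include <stdio.h>

int main(void) {
    unsigned int bits = 0xFFFFFFFEu;  /* one bit pattern, two interpretations */
    int s = (int)bits;                /* -2 on two's complement systems */

    printf("%d\n", s / 2);            /* signed divide:   -1 */
    printf("%u\n", bits / 2);         /* unsigned divide: 2147483647 */
    return 0;
}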
For 32-bit integers, for both signed and unsigned numbers, the n-th bit is interpreted as +2^n.
For signed numbers with the 31st bit set, the result is adjusted by -2^32.
Example:
1111 1111 1111 1111 1111 1111 1111 1111 (binary) as an unsigned int is interpreted as 2^31 + 2^30 + ... + 2^1 + 2^0. The interpretation of this as a signed int would be the same MINUS 2^32, i.e. 2^31 + 2^30 + ... + 2^1 + 2^0 - 2^32 = -1.
(Well, it can be said that for signed numbers with the 31st bit set, this bit is interpreted as -2^31 instead of +2^31, like you said in the question. I find this way a little less clear.)
Your representation of 3 and -3 is correct: 3 = 0x00000003, and -3 + 2^32 = 0xFFFFFFFD.
Yes, you are correct, allow me to explain a bit further for clarification purposes.
The difference between int and unsigned int is how the bits are interpreted. The machine processes unsigned and signed values in much the same way; the difference is that for signed types one of the existing bits serves as the sign bit. Two's complement notation makes the relationship between the two easy to read.
Example:
The number 5 is 0101; its two's complement negation, -5, is 1011 (invert the bits of 0101 to get 1010, then add 1).
In C++, which type to use depends on the situation; for instance, use unsigned values when functions or operators return them. ALUs handle signed and unsigned variables very similarly.
The rules for writing a number in two's complement are as follows:
If the number is positive, write it as plain binary; values up to 2^(32-1) - 1 fit.
If it is 0, use all zeroes.
For negatives, write the magnitude, then flip all the 1s and 0s and add one.
Example 2 (the beauty of two's complement):
-2 + 2 = 0 is computed as 1110 + 0010 = 10000; the carry out of the top bit is discarded, and we have our result, 0000.
If I have the following:
char v = 32; // 0010 0000
then I do:
v << 2
the number becomes negative. // 1000 0000 = -128
I read the standard, but it only says:
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.
so I don't understand whether there is a rule that when a 1 bit is shifted into the leftmost (sign) bit position, the number must become negative.
I'm using GCC.
Left shifting it twice would give 1000 0000 (binary) = 128 (decimal).
Note that in v << 2 the char operand v is first promoted to int, so the shift itself is well-defined and yields 128; the question is what happens when that 128 is stored back into a char.
If 128 is representable in char, i.e. you're on some machine (with a supporting compiler) that provides a char wider than 8 bits, then 128 is the value you get (since it's representable in such a type).
Otherwise, if a char is just 8 bits as on most common machines, a signed char using two's complement can represent only [-128, 127]. Storing 128 into it is then implementation-defined (C11 6.3.1.3), and on the usual two's complement implementations the bit pattern 1000 0000 reads back as -128.
Signed data primitives like char typically use two's complement (http://en.wikipedia.org/wiki/Twos_complement) to encode values. What you are probably looking for is unsigned char, which won't encode the value using two's complement (no negatives).
Try using unsigned char instead: a signed char spends one of its 8 bits on the sign, while unsigned char makes all 8 bits available for the value:

unsigned char var = 32;
var = var << 2;   /* 128; no sign bit to shift into */
From C traps and pitfalls
If a and b are two integer variables, known to be non-negative then to
test whether a+b might overflow use:
if ((int)((unsigned)a + (unsigned)b) < 0)
    complain();
I don't get it: how does comparing the sum of the two integers with zero tell you whether there was an overflow?
The code you saw for testing for overflow is just bogus.
For signed integers, you must test like this:
if ((a ^ b) < 0) overflow = 0;            /* opposite signs can't overflow */
else if (a > 0)  overflow = (b > INT_MAX - a);
else             overflow = (b < INT_MIN - a);
(Note the parentheses around a ^ b: the ^ operator binds more loosely than <, so an unparenthesized a^b < 0 would parse as a ^ (b < 0).)
Note that the cases can be simplified a lot if one of the two numbers is a constant.
For unsigned integers, you can test like this:
overflow = (a + b < a);
This is possible because unsigned arithmetic is defined to wrap, unlike signed arithmetic which invokes undefined behavior on overflow.
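Wrapped up as a self-contained check (a sketch; the function name add_overflows is mine, not from the book):

#include <stdio.h>
#include <limits.h>

/* Returns 1 if a + b would overflow int, else 0, without ever
   performing the possibly-overflowing signed addition itself. */
static int add_overflows(int a, int b)
{
    if ((a ^ b) < 0) return 0;              /* opposite signs: always safe */
    if (a > 0)       return b > INT_MAX - a;
    return b < INT_MIN - a;
}

int main(void) {
    printf("%d\n", add_overflows(INT_MAX, 1));  /* prints 1 */
    printf("%d\n", add_overflows(-5, 7));       /* prints 0 */
    return 0;
}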
When an overflow occurs, the sum exceeds the range of the type; for a 32-bit signed int that range is:

-2,147,483,648 <= sum <= 2,147,483,647

So when the sum overflows, it wraps around and goes back to the beginning:

2,147,483,647 + 1 = -2,147,483,648

If the sum is negative and you know the two numbers are positive, then the sum overflowed.
If a and b are known to be non-negative integers, the expression (int)((unsigned)a + (unsigned)b) will indeed yield a negative number on overflow.
Let's assume a 4-bit system (max signed integer is 7 and max unsigned integer is 15) with the following values:
a = 6
b = 4
a + b = 10 (overflow if performed with signed integers)
If instead we do the addition using the unsigned conversion, we will have:
(int)((unsigned)a + (unsigned)b) = (int)((unsigned)10) = -6
To understand why, we can quickly check the binary addition:
a = 0110; b = 0100 (the leftmost bit is the sign bit for signed int)

  0110
+ 0100
------
  1010

For unsigned int, 1010 = 10, while the same representation as a signed int means -6.
So the result of the operation is indeed < 0.
If the integers are unsigned and you're assuming IA-32, you can use some inline assembly to check the value of the CF flag. The asm could be trimmed a bit, I know.
int of(unsigned int a, unsigned int b)
{
    unsigned int flags;
    __asm__("addl %2, %1\n\t"  /* a += b, setting CF on unsigned overflow */
            "pushfl\n\t"       /* push EFLAGS...                          */
            "popl %0"          /* ...and pop it into the output register  */
            : "=r"(flags), "+r"(a)
            : "r"(b));
    return flags & 1;          /* CF is bit 0 of EFLAGS */
}
There are some good explanations on this page.
Here's the simple way from that page that I like:
Do the addition normally, then check the result; with unsigned arithmetic the wrapped-around sum is smaller than either addend (e.g. if (a + 23 < 23) overflow, where 23 is the other addend).
As we know, the addition of two numbers might overflow. One way to look at what an adder does is to decompose the sum: for two numbers a and b,

a + b == (a ^ b) + ((a & b) << 1)

The XOR is the sum of each bit pair ignoring carries, and the AND, shifted left one place, supplies the carries. (The form (a ^ b) + (a & b), said to be patented by Samsung, omits the shift and does not give the correct result; and note that this decomposition does not by itself avoid overflow.)
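A quick check of the corrected identity (a minimal sketch):

#include <stdio.h>

int main(void) {
    unsigned int a = 6, b = 7;
    /* XOR adds each bit pair without carries; AND << 1 re-injects the carries. */
    unsigned int sum = (a ^ b) + ((a & b) << 1);
    printf("%u %u\n", a + b, sum);   /* prints: 13 13 */
    return 0;
}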
Assuming two's complement representation and 8-bit integers, the most significant bit holds the sign (1 for negative and 0 for positive). Since we know the integers are non-negative, the most significant bit is 0 for both. If adding the unsigned representations of these numbers yields a 1 in the most significant bit, the addition has overflowed the signed range. To check whether an unsigned integer has a 1 in the most significant bit, you can test whether it exceeds the range of the signed type, or convert it to a signed integer, which will be negative (because the most significant bit is 1).
Example with 8-bit signed integers (range -128 to 127):

signed 127   = 0111 1111
signed 1     = 0000 0001
unsigned 127 = 0111 1111
unsigned 1   = 0000 0001
unsigned sum = 1000 0000

The sum is 128, which is not an overflow for an unsigned integer but is an overflow for a signed integer; the most significant bit gives it away.
Consider these definitions:
int x=5;
int y=-5;
unsigned int z=5;
How are they stored in memory? Can anybody explain the bit representation of these in memory?
Can int x=5 and int y=-5 have same bit representation in memory?
ISO C states what the differences are.
The int data type is signed and has a range of at least -32767 through 32767 inclusive. The actual values are given in limits.h as INT_MIN and INT_MAX respectively.
An unsigned int has a minimal range of 0 through 65535 inclusive with the actual maximum value being UINT_MAX from that same header file.
Beyond that, the standard does not mandate two's complement notation for encoding the values; that's just one of the possibilities. The three allowed representations encode 5 and -5 as follows (using 16-bit data types):

   |  two's complement   |  ones' complement   |   sign/magnitude    |
---+---------------------+---------------------+---------------------+
 5 | 0000 0000 0000 0101 | 0000 0000 0000 0101 | 0000 0000 0000 0101 |
-5 | 1111 1111 1111 1011 | 1111 1111 1111 1010 | 1000 0000 0000 0101 |
---+---------------------+---------------------+---------------------+
In two's complement, you get a negative of a number by inverting all bits then adding 1.
In ones' complement, you get a negative of a number by inverting all bits.
In sign/magnitude, the top bit is the sign so you just invert that to get the negative.
Note that positive values have the same encoding for all representations, only the negative values are different.
Note further that, for unsigned values, you do not need to use one of the bits for a sign. That means you get more range on the positive side (at the cost of no negative encodings, of course).
And no, 5 and -5 cannot have the same encoding regardless of which representation you use. Otherwise, there'd be no way to tell the difference.
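You can inspect the encodings your own machine uses by printing the values in hex (a sketch; the output shown assumes a 32-bit int with two's complement):

#include <stdio.h>

int main(void) {
    int x = 5;
    int y = -5;
    unsigned int z = 5;

    /* Convert to unsigned so the bit patterns print cleanly in hex. */
    printf("x = 0x%08X\n", (unsigned int)x);  /* 0x00000005 */
    printf("y = 0x%08X\n", (unsigned int)y);  /* 0xFFFFFFFB */
    printf("z = 0x%08X\n", z);                /* 0x00000005 */
    return 0;
}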
As an aside, there were moves underway in both the C and C++ standards to nominate two's complement as the only encoding for negative integers, and this has since been adopted (C++20 and C23 mandate two's complement).
Because it's all just about memory, in the end all the numerical values are stored in binary.
A 32-bit unsigned integer can contain values from all binary 0s to all binary 1s, i.e. from 0 to 4294967295.
With a 32-bit signed integer, the most significant bit indicates the sign, marking the value as positive or negative.
The C standard specifies that unsigned numbers are stored in binary (with optional padding bits). Signed numbers can be stored in one of three formats: sign and magnitude, two's complement, or ones' complement. Interestingly, that rules out certain other representations like excess-n or base -2.
However, most machines and compilers store signed numbers in two's complement.
int is normally 16 or 32 bits. The standard says that int should be whatever size is most efficient for the underlying processor; as long as it is at least as wide as short and no wider than long, it is allowed.
On some machines and OSes, however, history has caused int not to be the best size for the current iteration of hardware.
Here is a nice link which explains the storage of signed and unsigned int in C:
http://answers.yahoo.com/question/index?qid=20090516032239AAzcX1O
Taken from the above article:
"A process called two's complement is used to transform positive numbers into negative numbers. The side effect of this is that the most significant bit is used to tell the computer whether the number is positive or negative. If the most significant bit is 1, then the number is negative. If it's 0, the number is positive."
Assuming int is a 16-bit integer (which depends on the C implementation; most are 32-bit nowadays), the bit representations differ as follows:
5 = 0000000000000101
-5 = 1111111111111011
If the binary pattern 1111111111111011 were interpreted as an unsigned int, it would be decimal 65531.
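A quick check of that claim with the exact-width 16-bit types from stdint.h (a sketch):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int16_t s = -5;
    uint16_t u = (uint16_t)s;               /* same 16 bits, unsigned view */
    printf("%d %u\n", s, (unsigned int)u);  /* prints: -5 65531 */
    return 0;
}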