Related
This question already has answers here:
Why Do We have unsigned and signed int type in C?
(4 answers)
Closed 4 years ago.
I am studying c language and there are two different integer types, signed/unsigned.
Signed integers can present both positive and negative numbers. Why do we
need unsigned integers then?
One word answer is "range"!
When you declare a signed integer, it takes 4 bytes/ 32 bits memory (on a 32 bit system).
Out of those 32 bits, 1 bit is for sign and other 31 bits represent number. Means you can represent any number between -2,147,483,648 to 2,147,483,647 i.e. 2^31.
What if you want to use 2,147,483,648? Go for 8 bytes? But if you are not interested in negative numbers, then isn't it wastage of 4 bytes?
If you use unsigned int, all 32bits represent your number as you don't need to spare 1 bit for sign. Hence, with unsigned int, you can go from 0 to 4,294,967,295 i.e. 2^32
Same applies for other data types.
The reason is that Integer always has a fixed size. On most systems, an Integer is 32 bits large.
So no matter of having a signed or unsigned Integer, it takes always the same amout of memory. And that's where signed and unsigned differ: the range
Where an unsigned integer has a range of 0 to 4294967295 (2³²-1), the signed integer has a range of -2147483647 to 2147483648
unsigned types have an extra bit of storage, allowing them a maximum magnitude of 2CHAR_BIT * sizeof(type)-1 for positive values. This is why types like size_t, which are meant to store sizes of files, strings, arrays, etc. are unsigned.
With signed integers, one bit is reserved for the sign, so if an int is 32-bits long, you only get 31 bits to store the magnitude of the number. An unsigned int does not have this restriction; the MSB is used for magnitude as well, but it comes at the expense of no longer being able to be negative.
Signed integer overflow is undefined by the C standard, whereas unsigned integer overflow is guaranteed to wrap-around and reset to zero. For example, the following code invokes undefined behavior in C:
int a = INT_MAX;
a++;
Whereas this is guaranteed to wrap-around back to zero:
unsigned int a = UINT_MAX;
a++;
Unsigned types are generally better for performing bit operations on
There are two/three reasons, given that C must offer the greatest range of possibilities to the programmer.
The first is that an unsigned integer can hold a double (positive) value in respect to its signed counterpart. And we don't want to waste any single bit right?
The second is that a protocol, or some data structure a program must cope with, can use unsigned values, so it is handy to have that data type.
The third is that processors actually have unsigned types, so C language gives them available. May be that there are algorithms which relay on overflow, for example.
There can be still other motivations, probably I don't remember them all.
Personally, I make large use of unsigned integers in embedded applications. For example, using a single unsigned char as an index into a circular buffer of 256 elements, makes it simple and fast to increment the index without checking for overflow, because when the index overflows, it does exactly what I want it to do (reset to zero). Again, there are probably many other situations, I tell just the first that comes to my mind.
It's all about memory. They are used to represent a greater number without making use of a larger amount of memory.
Numbers are stored on the computer in binary form. Signed numeric values use a process called two's complement to transform positive numbers into negative ones where the first bit, the one that could represent the highest value is not taken into account for any calculation.
It means that the numeric signed type of your choice can only store a maximum value of N available bits minus 1 bit and the remaining bit will be used to determine the sign of the value, while an unsigned type of your choice can make use of all its available bits to store its value with the drawback of not being able to represent negative values.
This question already has answers here:
When an int is cast to a short and truncated, how is the new value determined?
(6 answers)
Closed 6 years ago.
I am playing with type conversions when I came across this in the following code.
int i = 123456;
short s = i;
printf("%i",s);
Here an int value is converted to short implicitly by C compiler. Printing this value, results in -7616 on the console output. I understand that short has a smaller range than that of int but I need to know how C language came to this value -7616 from 123456 instead of truncating it, which I guess is the intuitive option.
The value is being truncated, but not in the way you expect.
The decimal value 123456 has hex value 0x1E240. When assigning this value to a short, which in your case appears to be 16 bits, it takes just the lower 16 bits, which is 0xE240.
If s were an unsigned short, it would have the value 57920. But because it's signed, the value is interpreted as such. Since your machine appears to be using two's complement representation for negative numbers, the decimal equivalent of 0xE240 is -7616 (= 57920 - 65536).
Note that this behavior is implementation defined, so you can't depend on this happening on different machines or different compilers.
Values get truncated during this conversion in such a way that the less significant bits are preserved, but the higher bits are lost. This is consistent whether you use a big or little endian machine.
In your specific example, 123456 is written as 11110001001000000 in binary.
Keep in mind that ints use 4 bytes, so there are a bunch of leading zeros that I omitted from that number. Shorts are limited to 2 bytes, immediately truncating the number to 1110001001000000 strictly. Notice that there is a 1 in front, which in 2's complement would immediately indicate a negative number.
Going through the 2's complement process shows how you would get the result you did.
In C, when you convert an int to a short, you are already asking for trouble. In cases like when working with .wav files, you typically use int datatypes for manipulation of attributes before using a series of if statements to control truncation.
//you will need to #include <limits.h>
int i = 123456;
short s;
if(i > SHRT_MIN && i < SHRT_MAX) //if your int is within the range of shorts
{
s = i;
}
//if your int is too large or too negative, you can control the truncation
if(i < SHRT_MIN) //if int is too negative
{
i=SHRT_MIN;
}
else //if int is too positive
{
i=SHRT_MAX;
}
Let's imagine a 4bit integer is converted to a 2bit integer. The first bit determines whether the number is positive (0) or negative (1).
0111 (dec: 7) converts to 11 (dec: -1). It will cut higher bits. And since the first bit 1 represents - it will be negative.
Data type short (2 byte size) has values from [-32768] to [32767].
int main()
{
int i = 32767;
short c = i;
printf("%d\n",c);
c = i+1;
printf("%d\n",c);
c = i+2;
printf("%d\n",c);
}
//Output:
32767
-32768 << rolling
-32767 << rolling
If you see the values keep on rolling from least value (-32768) to max values if short is assigned values greater than allowed max value i.e [32767]
int main()
{
int i = -32768;
short c = i;
printf("%d\n",c);
c = i-1;
printf("%d\n",c);
c = i-2;
printf("%d\n",c);
}
//Output:
-32768
32767 << rolling
32766 << rolling
If you see the values keep on rolling from max value (32767) to least values if short is assigned values less than allowed min value i.e [-32768]
This question already has answers here:
Assigning negative numbers to an unsigned int?
(14 answers)
Closed 6 years ago.
This should be a pretty simple question but I can't seem to find the answer in my textbook and can't find the right keywords to find it online.
What does it mean when you have a negative sign in front of an unsigned int?
Specifically, if x is an unsigned int equal to 1, what is the bit value of -x?
Per the C standard, arithmetic for unsigned integers is performed modulo 2bit width. So, for a 32-bit integer, the negation will be taken mod 232 = 4294967296.
For a 32-bit number, then, the value you'll get if you negate a number n is going to be 0-n = 4294967296-n. In your specific case, assuming unsigned int is 32 bits wide, you'd get 4294967296-1 = 4294967295 = 0xffffffff (the number with all bits set).
The relevant text in the C standard is in §6.2.5/9:
a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type
It will overflow in the negative direction, i.e. if your int is 16 bits x will be 65535. The bit value will be 1111111111111111 (16 ones)
If int is 32 bits, x will be 4294967295
when you apply the "-", the Two's complement of the integer is stored in variable. see here for details
Following C code displays the result correctly, -1.
#include <stdio.h>
main()
{
unsigned x = 1;
unsigned y=x-2;
printf("%d", y );
}
But in general, is it always safe to do subtraction involving
unsigned integers?
The reason I ask the question is that I want to do some conditioning
as follows:
unsigned x = 1; // x was defined by someone else as unsigned,
// which I had better not to change.
for (int i=-5; i<5; i++){
if (x+i<0) continue
f(x+i); // f is a function
}
Is it safe to do so?
How are unsigned integers and signed integers different in
representing integers? Thanks!
1: Yes, it is safe to subtract unsigned integers. The definition of arithmetic on unsigned integers includes that if an out-of-range value would be generated, then that value should be adjusted modulo the maximum value for the type, plus one. (This definition is equivalent to truncating high bits).
Your posted code has a bug though: printf("%d", y); causes undefined behaviour because %d expects an int, but you supplied unsigned int. Use %u to correct this.
2: When you write x+i, the i is converted to unsigned. The result of the whole expression is a well-defined unsigned value. Since an unsigned can never be negative, your test will always fail.
You also need to be careful using relational operators because the same implicit conversion will occur. Before I give you a fix for the code in section 2, what do you want to pass to f when x is UINT_MAX or close to it? What is the prototype of f ?
3: Unsigned integers use a "pure binary" representation.
Signed integers have three options. Two can be considered obsolete; the most common one is two's complement. All options require that a positive signed integer value has the same representation as the equivalent unsigned integer value. In two's complement, a negative signed integer is represented the same as the unsigned integer generated by adding UINT_MAX+1, etc.
If you want to inspect the representation, then do unsigned char *p = (unsigned char *)&x; printf("%02X%02X%02X%02X", p[0], p[1], p[2], p[3]);, depending on how many bytes are needed on your system.
Its always safe to subtract unsigned as in
unsigned x = 1;
unsigned y=x-2;
y will take on the value of -1 mod (UINT_MAX + 1) or UINT_MAX.
Is it always safe to do subtraction, addition, multiplication, involving unsigned integers - no UB. The answer will always be the expected mathematical result modded by UINT_MAX+1.
But do not do printf("%d", y ); - that is UB. Instead printf("%u", y);
C11 §6.2.5 9 "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
When unsigned and int are used in +, the int is converted to an unsigned. So x+i has an unsigned result and never is that sum < 0. Safe, but now if (x+i<0) continue is pointless. f(x+i); is safe, but need to see f() prototype to best explain what may happen.
Unsigned integers are always 0 to power(2,N)-1 and have well defined "overflow" results. Signed integers are 2's complement, 1's complement, or sign-magnitude and have UB on overflow. Some compilers take advantage of that and assume it never occurs when making optimized code.
Rather than really answering your questions directly, which has already been done, I'll make some broader observations that really go to the heart of your questions.
The first is that using unsigned in loop bounds where there's any chance that a signed value might crop up will eventually bite you. I've done it a bunch of times over 20 years and it has ultimately bit me every time. I'm now generally opposed to using unsigned for values that will be used for arithmetic (as opposed to being used as bitmasks and such) without an excellent justification. I have seen it cause too many problems when used, usually with the simple and appealing rationale that “in theory, this value is non-negative and I should use the most restrictive type possible”.
I understand that x, in your example, was decided to be unsigned by someone else, and you can't change it, but you want to do something involving x over an interval potentially involving negative numbers.
The “right” way to do this, in my opinion, is first to assess the range of values that x may take. Suppose that the length of an int is 32 bits. Then the length of an unsigned int is the same. If it is guaranteed to be the case that x can never be larger than 2^31-1 (as it often is), then it is safe in principle to cast x to a signed equivalent and use that, i.e. do this:
int y = (int)x;
// Do your stuff with *y*
x = (unsigned)y;
If you have a long that is longer than unsigned, then even if x uses the full unsigned range, you can do this:
long y = (long)x;
// Do your stuff with *y*
x = (unsigned)y;
Now, the problem with either of these approaches is that before assigning back to x (e.g. x=(unsigned)y; in the immediately preceding example), you really must check that y is non-negative. However, these are exactly the cases where working with the unsigned x would have bitten you anyway, so there's no harm at all in something like:
long y = (long)x;
// Do your stuff with *y*
assert( y >= 0L );
x = (unsigned)y;
At least this way, you'll catch the problems and find a solution, rather than having a strange bug that takes hours to find because a loop bound is four billion unexpectedly.
No, it's not safe.
Integers usually are 4 bytes long, which equals to 32 bits. Their difference in representation is:
As far as signed integers is concerned, the most significant bit is used for sign, so they can represent values between -2^31 and 2^31 - 1
Unsigned integers don't use any bit for sign, so they represent values from 0 to 2^32 - 1.
Part 2 isn't safe either for the same reason as Part 1. As int and unsigned types represent integers in a different way, in this case where negative values are used in the calculations, you can't know what the result of x + i will be.
No, it's not safe. Trying to represent negative numbers with unsigned ints smells like bug. Also, you should use %u to print unsigned ints.
If we slightly modify your code to put %u in printf:
#include <stdio.h>
main()
{
unsigned x = 1;
unsigned y=x-2;
printf("%u", y );
}
The number printed is 4294967295
The reason the result is correct is because C doesn't do any overflow checks and you are printing it as a signed int (%d). This, however, does not mean it is safe practice. If you print it as it really is (%u) you won't get the correct answer.
An Unsigned integer type should be thought of not as representing a number, but as a member of something called an "abstract algebraic ring", specifically the equivalence class of integers congruent modulo (MAX_VALUE+1). For purposes of examples, I'll assume "unsigned int" is 16 bits for numerical brevity; the principles would be the same with 32 bits, but all the numbers would be bigger.
Without getting too deep into the abstract-algebraic nitty-gritty, when assigning a number to an unsigned type [abstract algebraic ring], zero maps to the ring's additive identity (so adding zero to a value yields that value), one means the ring's multiplicative identity (so multiplying a value by one yields that value). Adding a positive integer N to a value is equivalent to adding the multiplicative identity, N times; adding a negative integer -N, or subtracting a positive integer N, will yield the value which, when added to +N, would yield the original value.
Thus, assigning -1 to a 16-bit unsigned integer yields 65535, precisely because adding 1 to 65535 will yield 0. Likewise -2 yields 65534, etc.
Note that in an abstract algebraic sense, every integer can be uniquely assigned into to algebraic rings of the indicated form, and a ring member can be uniquely assigned into a smaller ring whose modulus is a factor of its own [e.g. a 16-bit unsigned integer maps uniquely to one 8-bit unsigned integer], but ring members are not uniquely convertible to larger rings or to integers. Unfortunately, C sometimes pretends that ring members are integers, and implicitly converts them; that can lead to some surprising behavior.
Subtracting a value, signed or unsigned, from an unsigned value which is no smaller than int, and no smaller than the value being subtracted, will yield a result according to the rules of algebraic rings, rather than the rules of integer arithmetic. Testing whether the result of such computation is less than zero will be meaningless, because ring values are never less than zero. If you want to operate on unsigned values as though they are numbers, you must first convert them to a type which can represent numbers (i.e. a signed integer type). If the unsigned type can be outside the range that is representable with the same-sized signed type, it will need to be upcast to a larger type.
Please explain the following paragraph.
"The next question is whether we can assign a certain value to a variable without losing precision. It is not sufficient if we just check for overflow during the addition or subtraction, because someone might add 1 to -5 and assign the result to an unsigned int. Then the actual addition does not overflow, but the result still does not fit."
when i am adding 1 to -5 i dont see any reason to worry.the answer is as it should be -4.
so what is the problem of result not being fit??
you can find the full article here through which i was going:
http://www.fefe.de/intof.html
The binary representation of -4, in a 32-bit word, is as follows (hex notation)
0xfffffffc
When interpreted as an unsigned integer, this bit pattern represents the number 2**32-4, or 18446744073709551612. I'm not sure I would call this phenomenon "overflow", but it is a common mistake to assign a small negative integer to a variable of unsigned type and wind up with a really big positive integer.
This trick is actually exploited for bounds checking: if you have a signed integer i and want to know if it is in the range 0 <= i < n, you can test
if ((unsigned)i < n) { ... }
which gives you the answer using one comparison instead of two. The cast to unsigned has no run-time cost; it just tells the compiler to generate an unsigned comparison instead of a signed comparison.
Try assigning it to a unsigned int, not an int.
The term unsigned int is the key - by default an int datatype will hold negative and positive numbers; however, unsigned ints are always positive. They provide this option because uints can technically hold greater positive values than regular signed ints because they do not need to use a bit to keep track of whether or not its negative or positive.
Please see:
Signed versus Unsigned Integers
The problem is that you're storing -4 in an unsigned int. Unsigned ints can only contain zero and positive values. If you assign -4 to one, you'll actually end up getting a very large positive number (the actual value depends on how wide an int you're using).
The problem is that the sizes of storage such as unsigned int can only hold so much. With 1 and -5 it does not matter, but with 1 and -500000000 you might end up with a confusing result. Also, unsigned storage will interpret anything stored in it as positive, so you cannot put a negative value in an unsigned variable.
Two big things to watch out for:
1. Overflow in the operation itself: 1 + -500000000
2. Issues in casting: (unsigned int)(1 + -500)
Unsigned variables, like unsigned int, cannot hold negative values. So assigning 1 - 5 to an unsigned int won't give you -4. I'm not sure what it'll give you, it's probably compiler specific.
Some code:
signed int first, second;
unsigned int result;
first = obtain(); // happens to be 1
second = obtain(); // happens to be -5
result = first + second; // unexpected result here - very large number - and it's too late to check that there's a problem
Say you obtained those values from keyboard. You need to check before addition that the result can be represented in unsigned int. That's what the article talks about.
By definition the number -4 cannot be represented in an unsigned int. -4 is a signed integer. The same goes for any negative number.
When you assign a negative integer to an unsigned int the actual bits of the number do not change, but they are merely represented differently. You'll get some ridiculously-large number due to the way integers are represented in binary (two's complement).
In two's complement, -4 is represented as 0xfffffffc. When 0xfffffffc is represented as an unsigned int you'll get the number 4,294,967,292.
You have to remember that fundamentally you're working with bits. So you can assign a value of -4 to an unsigned integer and this will place a series of bits into that memory location. Those bits can be interpreted as -4 in certain circumstances. One such circumstance is the obvious one: you've told the compiler/system that the bits in that memory location should be interpreted as a two's compliment signed number. So if you do printf("%s",i) prtinf does its magic and converts the two's compliment number to a magnitude and sign. The magnitude will be 4 and the sign will be negative, so it displays '-4'.
However, if you tell the compiler that the data at that memory location is not signed then the bits don't change but instead their interpretation does. So when you do your addition, store the result in an unsigned integer memory location and then call printf on the result it doesn't bother looking for the sign because by definition it is always positive. It calculates the magnitude and prints it. The magnitude will be off because the sign information is still encoded in the bits but it's treated as magnitude information.