Signed integer and unsigned integer [duplicate] - c

This question already has answers here:
Why Do We have unsigned and signed int type in C?
I am studying the C language and there are two different integer types, signed and unsigned.
Signed integers can represent both positive and negative numbers. Why do we
need unsigned integers then?

The one-word answer is "range"!
When you declare a signed integer, it takes 4 bytes / 32 bits of memory (on a typical 32-bit system).
Out of those 32 bits, 1 bit is for the sign and the other 31 bits represent the magnitude, which means you can represent any number between -2,147,483,648 and 2,147,483,647, i.e. roughly ±2^31.
What if you want to use 2,147,483,648? Go for 8 bytes? But if you are not interested in negative numbers, isn't that a waste of 4 bytes?
If you use unsigned int, all 32 bits represent your number, since you don't need to spare 1 bit for the sign. Hence, with unsigned int, you can go from 0 to 4,294,967,295, i.e. 2^32 - 1.
The same applies to the other integer types.
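To check the actual ranges on your own platform rather than assuming 32 bits, you can print the constants from <limits.h>; a minimal sketch:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* The exact ranges are implementation-defined; <limits.h> reports them. */
    printf("int:          %d to %d\n", INT_MIN, INT_MAX);
    printf("unsigned int: 0 to %u\n", UINT_MAX);
    return 0;
}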

The reason is that an integer always has a fixed size. On most systems, an int is 32 bits large.
So whether the integer is signed or unsigned, it always takes the same amount of memory. And that's where signed and unsigned differ: the range.
Where an unsigned integer has a range of 0 to 4,294,967,295 (2^32 - 1), the signed integer has a range of -2,147,483,648 to 2,147,483,647.

Unsigned types have an extra bit of value storage, allowing them a maximum value of 2^(CHAR_BIT * sizeof(type)) - 1. This is why types like size_t, which are meant to store sizes of files, strings, arrays, etc., are unsigned.
With signed integers, one bit is reserved for the sign, so if an int is 32 bits long, you only get 31 bits to store the magnitude of the number. An unsigned int does not have this restriction: the MSB is used for magnitude as well, but at the expense of no longer being able to be negative.
Signed integer overflow is undefined by the C standard, whereas unsigned integer arithmetic is guaranteed to wrap around modulo 2^N, so incrementing UINT_MAX yields zero. For example, the following code invokes undefined behavior in C:
int a = INT_MAX;  /* INT_MAX comes from <limits.h> */
a++;              /* undefined behavior: signed overflow */
Whereas this is guaranteed to wrap around back to zero:
unsigned int a = UINT_MAX;  /* UINT_MAX comes from <limits.h> */
a++;                        /* well-defined: a is now 0 */
Unsigned types are also generally better for performing bit operations on.
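For instance, shifts and masks on unsigned operands are fully defined, whereas shifting a 1 into the sign bit of a signed int is undefined; a small sketch, assuming unsigned int is at least 32 bits:

#include <stdio.h>

int main(void)
{
    unsigned int flags = 0;
    flags |= 1u << 31;      /* well-defined: set the top bit */
    flags ^= 1u << 31;      /* well-defined: toggle it back off */
    /* (1 << 31) on a 32-bit signed int would be undefined behavior */
    printf("%u\n", flags);  /* prints 0 */
    return 0;
}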

There are several reasons, given that C must offer the greatest range of possibilities to the programmer.
The first is that an unsigned integer can hold roughly double the (positive) value of its signed counterpart. And we don't want to waste a single bit, right?
The second is that a protocol, or some data structure a program must cope with, may use unsigned values, so it is handy to have that data type.
The third is that processors actually have unsigned types, so the C language makes them available. There may be algorithms which rely on unsigned overflow, for example.
There may be other motivations as well; these are the ones that come to mind.
Personally, I make heavy use of unsigned integers in embedded applications. For example, using a single unsigned char as an index into a circular buffer of 256 elements makes it simple and fast to increment the index without checking for overflow, because when the index overflows, it does exactly what I want it to do (wrap back to zero); a sketch follows. There are probably many other situations; this is just the first that comes to my mind.
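A minimal sketch of that idiom, assuming CHAR_BIT is 8 so an unsigned char wraps from 255 to 0 (the names are illustrative):

#define BUF_SIZE 256          /* must equal UCHAR_MAX + 1 for the trick to work */

static int buffer[BUF_SIZE];
static unsigned char head;    /* wraps 255 -> 0 by itself */

void buf_put(int value)
{
    buffer[head] = value;
    head++;                   /* no range check needed: the wrap is the modulo */
}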

It's all about memory. Unsigned types let you represent greater positive numbers without using a larger amount of memory.
Numbers are stored on the computer in binary form. Signed values typically use a scheme called two's complement to encode negative numbers, in which the most significant bit, the one that would otherwise carry the highest value, serves as the sign bit.
This means that a signed type of N bits can only use N - 1 bits for the magnitude, with the remaining bit determining the sign of the value, while an unsigned type of the same width can use all of its available bits to store its value, with the drawback of not being able to represent negative values.

Related

What's the difference between int and unsigned-int value representations in bits in C [closed]

I noticed by experiment that in unsigned int the value of a number is represented in 32 bits even if the number would fit in 1 bit; the rest of the bits hold 0. While in int, the value seems to be put in however many bits it needs, with just 1 more bit added for the sign. Can someone please explain this to me?
Sure. You're mistaken.
The C standard specifies that, as corresponding unsigned and signed integer types, unsigned int and (signed) int require the same amount of storage (C2011 6.2.5/6). The standard does not specify the exact sizes of these types, but 32 bits is a common choice. If the representation of an unsigned int takes 32 bits in a given C implementation, then so does the representation of that implementation's int.
Furthermore, although C allows a choice from among 3 alternative styles of negative-value representation, the correspondence between signed and unsigned integer representations is defined so that the value bits in the representation of an int -- those that are neither padding bits nor the one sign bit -- represent the same place value as the bits in the same position of the corresponding unsigned integer type (C2011, 6.2.6.2/2). Thus, the representation of a signed integer with non-negative value can be reinterpreted as the corresponding unsigned integer type without changing its numeric value.
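To illustrate, copying the bytes of a non-negative int into an unsigned int preserves the numeric value; a sketch using memcpy to reinterpret the representation:

#include <stdio.h>
#include <string.h>

int main(void)
{
    int s = 12345;
    unsigned int u;
    memcpy(&u, &s, sizeof u);     /* reinterpret the same bits */
    printf("%d -> %u\n", s, u);   /* prints: 12345 -> 12345 */
    return 0;
}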
Machines use fixed-length representations for numbers (at least, common machines do). Say your machine is 32-bit; that means it uses 32 bits for numbers and their arithmetic.
Usually you have an unsigned representation that can represent numbers from 0 to 2^32 - 1 (every number using all 32 bits) and a 32-bit two's-complement representation for numbers from -2^31 to 2^31 - 1 (such a representation uses the most significant bit for the sign). But whatever the encoding, a number always uses the same number of bits whatever its value.
The answer is very language-dependent, and in some languages (like C++) the answer depends on the target CPU architecture.
Many languages store both int and unsigned int in 32 bits of space.
"BigInt" support for numbers of unknown size exists in many languages; these behave much as you describe, expanding based on the requirements of the number being stored.
Some languages, like Ruby, automatically convert between the two as the math operations demand.

Signed int range confusion

This question might be very basic, but I post it here only after days of googling, for a proper basic understanding of signed integers in C.
Some say a signed int has the range
-32767 to 32767, and others say it has the range
-32768 to 32767.
Let us have int a = 5 (signed; the bit patterns below use just 1 byte for brevity, while the ranges quoted are for 2 bytes).
* In the 1st representation, a = 5 is represented as 00000101 as a positive number, and a = -5 is represented as 10000101 (so the range -32767 to 32767 is justified).
(Here, when the msb/sign bit is 0/1 the number is positive/negative, and the rest (the magnitude bits) are unchanged.)
* In the 2nd representation, a = 5 is represented as 00000101 as a positive number, and a = -5 is represented as 11111011
(the msb carries the weight -128 and the remaining bits are set so the sum comes to -5) (so the range -32768 to 32767 is justified).
So I am confused between these two. My doubt is: what is the actual range of a signed int in C, 1) or 2)?
It depends on your environment: typically int can store -2147483648 to 2147483647 if it is 32 bits long and two's complement is used, but the C specification says that int must be able to store at least -32767 to 32767.
Quote from N1256 5.2.4.2.1 Sizes of integer types <limits.h>:
Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
- minimum value for an object of type int
INT_MIN -32767 // -(2^15 - 1)
- maximum value for an object of type int
INT_MAX +32767 // 2^15 - 1
Today, signed ints are usually done in two's complement notation.
The highest bit is the "sign bit", it is set for all negative numbers.
This means that, with 16 bits total, you have fifteen bits to represent different values.
With the highest bit unset, you can (with 16 bits total) represent the values 0..32767.
With the highest bit set, and because you already have a representation for zero, you can represent the values -1..-32768.
This is, however, implementation-defined, other representations do exist as well. The actual range limits for signed integers on your platform / for your compiler are the ones found in your environment's <limits.h>. That is the only definite authority.
On today's desktop systems, an int is usually 32 or 64 bits wide, for a correspondingly much larger range than the 16-bit 32767 / 32768 you are talking of. So either those people are talking about really old platforms, really old knowledge, embedded systems, or the minimum guaranteed range -- the standard states that INT_MIN must be at least -32767, and INT_MAX be at least +32767, the lowest common denominator.
My doubt is: what is the actual range of a signed int in C, 1) [-32767 to 32767] or 2) [-32768 to 32767]?
The whole point of C and its advantage of high portability to old and new platforms is that code should not care.
C defines the range of int with 2 macros: INT_MIN and INT_MAX. The C spec specifies:
INT_MIN is -32,767 or less.
INT_MAX is +32,767 or more.
If code needs a 16-bit two's-complement type, use int16_t. If code needs a 32-bit or wider type, use long or int_least32_t, etc. Do not code assuming int is something that it is not defined to be.
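For instance, a sketch using the <stdint.h> types just mentioned (int16_t is optional in the standard, so this assumes the target provides it):

#include <inttypes.h>   /* PRId16, PRIdLEAST32; also pulls in <stdint.h> */
#include <stdio.h>

int main(void)
{
    int16_t a = -32768;           /* exact-width: 16-bit two's complement */
    int_least32_t b = 2000000000; /* at least 32 bits, always available */
    printf("%" PRId16 " %" PRIdLEAST32 "\n", a, b);
    return 0;
}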
The value 32767 is the maximum positive value you can represent in a signed 16-bit integer. The corresponding C type is short.
The int type must be at least the same size as short and at most the same size as long. On 16-bit processors the size of int is 2 bytes (the same as short); on 32-bit and higher architectures it is typically 4 bytes.
No matter the architecture, the minimum value of int is INT_MIN and the maximum value of int is INT_MAX.
Similarly, there are constants for the minimum and maximum values of short (SHRT_MIN and SHRT_MAX), long, char, etc. You don't need to use hardcoded constants or guess the minimum value for int on your system.
The representation #1 is named "sign and magnitude". It uses the most significant bit to store the sign and the remaining bits to store the absolute value of the number. It was used by some early computers, probably because it seemed a natural mapping of the way numbers are written in mathematics; however, it is not as natural for binary computers.
The representation #2 is named two's complement. The two's-complement system has the advantage that the fundamental arithmetic operations of addition, subtraction, and multiplication are identical to those for unsigned binary numbers (as long as the inputs are represented in the same number of bits and any overflow beyond those bits is discarded from the result). This is why it is the preferred encoding nowadays.
The C standard specifies the lowest limits for integer values. As it is written in the Standard (5.2.4.2.1 Sizes of integer types):
...Their implementation-defined values shall be equal or greater in
magnitude (absolute value) to those shown, with the same sign.
For objects of type int these lowest limits are
- minimum value for an object of type int
INT_MIN -32767 // -(2^15 - 1)
- maximum value for an object of type int
INT_MAX +32767 // 2^15 - 1
For the two's complement representation of integers, the number of positive values is one less than the number of negative values. So if only two bytes are used for representing objects of type int, then INT_MIN will be equal to -32768.
Take into account that 32768 is greater in magnitude than the value used in the Standard, so it satisfies the Standard's requirement.
On the other hand, for the "sign and magnitude" representation, the limits (when 2 bytes are used) will be the same as shown in the Standard, that is -32767 to 32767.
So the actual limits used in the implementation depend on the width of integers and their representation.

Formulas to determine how the processor stores integers in certain bits

I have thought about it for a bit, and for 4-byte integers I came up with the following formula:
32^4 * 256 * 8
which actually does return the 4-byte signed integer magnitude (it equals 2^31). I am not really sure how.
Is it correct? Why, and if not, how should I think about it?
With that formula I could easily calculate any range.
You could use the constants defined in limits.h to test whether your values will fit in the variables you have provided.
In two's complement (which is the representation used by almost all modern computers), the range of a signed integer of k bits is -2^(k-1) to 2^(k-1) - 1, inclusive. In sign-magnitude and one's complement representations, there are two possible representations for 0 and the range is symmetric: -(2^(k-1) - 1) to 2^(k-1) - 1.
If you need to represent this in portable C, you need to worry about integer overflow. Also, without limits.h you have no good way to know what the bit-length of an int is. Non-portably, if you believe that int has 32 bits (which is not guaranteed), then you could use ((1UL<<31)-1) as the largest possible signed integer.
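A sketch of computing the range from the bit width at run time (it assumes two's complement and no padding bits, which holds on essentially all modern platforms):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned bits = (unsigned)(CHAR_BIT * sizeof(int));  /* e.g. 32 */
    unsigned long max = (1UL << (bits - 1)) - 1;         /* 2^(k-1) - 1 */
    long min = -(long)max - 1;                           /* -2^(k-1)    */
    printf("int: %u bits, range %ld to %lu\n", bits, min, max);
    return 0;
}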

is it safe to subtract between unsigned integers?

The following C code displays the result correctly, -1.
#include <stdio.h>

int main(void)
{
    unsigned x = 1;
    unsigned y = x - 2;
    printf("%d", y);
}
But in general, is it always safe to do subtraction involving
unsigned integers?
The reason I ask the question is that I want to do some conditioning
as follows:
unsigned x = 1; // x was defined by someone else as unsigned,
                // which I had better not change.
for (int i = -5; i < 5; i++) {
    if (x + i < 0)
        continue;
    f(x + i);   // f is a function
}
Is it safe to do so?
How are unsigned integers and signed integers different in
representing integers? Thanks!
1: Yes, it is safe to subtract unsigned integers. The definition of arithmetic on unsigned integers includes that if an out-of-range value would be generated, then that value should be adjusted modulo the maximum value for the type, plus one. (This definition is equivalent to truncating high bits).
Your posted code has a bug though: printf("%d", y); causes undefined behaviour because %d expects an int, but you supplied unsigned int. Use %u to correct this.
2: When you write x+i, the i is converted to unsigned. The result of the whole expression is a well-defined unsigned value. Since an unsigned can never be negative, your test will always fail.
You also need to be careful using relational operators because the same implicit conversion will occur. Before I give you a fix for the code in section 2, what do you want to pass to f when x is UINT_MAX or close to it? What is the prototype of f ?
3: Unsigned integers use a "pure binary" representation.
Signed integers have three options. Two can be considered obsolete; the most common one is two's complement. All options require that a positive signed integer value has the same representation as the equivalent unsigned integer value. In two's complement, a negative signed integer is represented the same as the unsigned integer generated by adding UINT_MAX+1, etc.
If you want to inspect the representation, you can examine the bytes directly:
unsigned char *p = (unsigned char *)&x;
printf("%02X%02X%02X%02X", p[0], p[1], p[2], p[3]);
adjusting for how many bytes the type occupies on your system.
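A runnable version of that probe (the byte order printed depends on your machine's endianness):

#include <stdio.h>

int main(void)
{
    unsigned int x = 0xDEADBEEFu;
    unsigned char *p = (unsigned char *)&x;
    for (size_t i = 0; i < sizeof x; i++)
        printf("%02X ", p[i]);   /* e.g. "EF BE AD DE" on little-endian */
    putchar('\n');
    return 0;
}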
It's always safe to subtract unsigned values, as in:
unsigned x = 1;
unsigned y = x - 2;
y will take on the value of -1 mod (UINT_MAX + 1), i.e. UINT_MAX.
It is always safe to do subtraction, addition, and multiplication involving unsigned integers: there is no UB. The answer will always be the expected mathematical result reduced modulo UINT_MAX + 1.
But do not do printf("%d", y); that is UB. Instead use printf("%u", y);.
C11 §6.2.5 9 "A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type."
When unsigned and int are used with +, the int is converted to unsigned. So x+i has an unsigned result, and that sum is never < 0. Safe, but now if (x+i<0) continue is pointless. f(x+i); is safe, but one would need to see f()'s prototype to best explain what may happen.
Unsigned integers always range from 0 to 2^N - 1 and have well-defined "overflow" behavior. Signed integers use two's complement, one's complement, or sign-magnitude and have UB on overflow; some compilers take advantage of that and assume it never occurs when generating optimized code.
Rather than really answering your questions directly, which has already been done, I'll make some broader observations that really go to the heart of your questions.
The first is that using unsigned in loop bounds where there's any chance that a signed value might crop up will eventually bite you. I've done it a bunch of times over 20 years, and it has ultimately bitten me every time. I'm now generally opposed to using unsigned for values that will be used for arithmetic (as opposed to being used as bitmasks and such) without an excellent justification. I have seen it cause too many problems when used, usually with the simple and appealing rationale that "in theory, this value is non-negative and I should use the most restrictive type possible".
I understand that x, in your example, was decided to be unsigned by someone else, and you can't change it, but you want to do something involving x over an interval potentially involving negative numbers.
The “right” way to do this, in my opinion, is first to assess the range of values that x may take. Suppose that the length of an int is 32 bits. Then the length of an unsigned int is the same. If it is guaranteed to be the case that x can never be larger than 2^31-1 (as it often is), then it is safe in principle to cast x to a signed equivalent and use that, i.e. do this:
int y = (int)x;
// Do your stuff with *y*
x = (unsigned)y;
If you have a long that is longer than unsigned, then even if x uses the full unsigned range, you can do this:
long y = (long)x;
// Do your stuff with *y*
x = (unsigned)y;
Now, the problem with either of these approaches is that before assigning back to x (e.g. x=(unsigned)y; in the immediately preceding example), you really must check that y is non-negative. However, these are exactly the cases where working with the unsigned x would have bitten you anyway, so there's no harm at all in something like:
long y = (long)x;
// Do your stuff with *y*
assert( y >= 0L );   // assert comes from <assert.h>
x = (unsigned)y;
At least this way, you'll catch the problems and find a solution, rather than having a strange bug that takes hours to find because a loop bound is four billion unexpectedly.
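Putting that together with the loop from the question, a sketch (f's real prototype is unknown, so the stand-in below is only illustrative; it also assumes x fits in long):

#include <stdio.h>

static void f(unsigned v)        /* hypothetical stand-in for the real f */
{
    printf("f(%u)\n", v);
}

int main(void)
{
    unsigned x = 1;
    long y = (long)x;            /* signed copy; assumes x <= LONG_MAX */
    for (int i = -5; i < 5; i++) {
        if (y + i < 0)           /* now a meaningful signed test */
            continue;
        f((unsigned)(y + i));
    }
    return 0;
}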
No, it's not safe.
Integers are usually 4 bytes long, which equals 32 bits. The difference in representation is:
As far as signed integers are concerned, the most significant bit is used for the sign, so they can represent values between -2^31 and 2^31 - 1.
Unsigned integers don't use any bit for the sign, so they represent values from 0 to 2^32 - 1.
Part 2 isn't safe either, for the same reason as Part 1. Since int and unsigned types represent integers in different ways, when negative values enter the calculation the result of x + i may not be what you expect.
No, it's not safe. Trying to represent negative numbers with unsigned ints smells like a bug. Also, you should use %u to print unsigned ints.
If we slightly modify your code to put %u in printf:
#include <stdio.h>

int main(void)
{
    unsigned x = 1;
    unsigned y = x - 2;
    printf("%u", y);
}
the number printed is 4294967295.
The reason the result looked correct before is that C doesn't do any overflow checks and you were printing the value as a signed int (%d). This, however, is not safe practice. If you print it as what it really is (%u), you won't get the "correct" answer.
An unsigned integer type should be thought of not as representing a number, but as a member of something called an "abstract algebraic ring", specifically the equivalence classes of integers congruent modulo (MAX_VALUE+1). For purposes of example, I'll assume unsigned int is 16 bits for numerical brevity; the principles would be the same with 32 bits, but all the numbers would be bigger.
Without getting too deep into the abstract-algebraic nitty-gritty: when assigning a number to an unsigned type [abstract algebraic ring], zero maps to the ring's additive identity (so adding zero to a value yields that value), and one maps to the ring's multiplicative identity (so multiplying a value by one yields that value). Adding a positive integer N to a value is equivalent to adding the multiplicative identity N times; adding a negative integer -N, or subtracting a positive integer N, will yield the value which, when added to +N, would yield the original value.
Thus, assigning -1 to a 16-bit unsigned integer yields 65535, precisely because adding 1 to 65535 will yield 0. Likewise -2 yields 65534, etc.
Note that in an abstract algebraic sense, every integer can be uniquely assigned to an algebraic ring of the indicated form, and a ring member can be uniquely assigned into a smaller ring whose modulus is a factor of its own [e.g. a 16-bit unsigned integer maps uniquely to one 8-bit unsigned integer], but ring members are not uniquely convertible to larger rings or to integers. Unfortunately, C sometimes pretends that ring members are integers and implicitly converts them; that can lead to some surprising behavior.
Subtracting a value, signed or unsigned, from an unsigned value which is no smaller than int, and no smaller than the value being subtracted, will yield a result according to the rules of algebraic rings, rather than the rules of integer arithmetic. Testing whether the result of such computation is less than zero will be meaningless, because ring values are never less than zero. If you want to operate on unsigned values as though they are numbers, you must first convert them to a type which can represent numbers (i.e. a signed integer type). If the unsigned type can be outside the range that is representable with the same-sized signed type, it will need to be upcast to a larger type.
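To make the ring behavior concrete with a 16-bit unsigned type, as in the examples above:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint16_t a = (uint16_t)-1;        /* -1 mod 65536 == 65535 */
    uint16_t b = (uint16_t)(a + 1);   /* wraps back to 0       */
    printf("%u %u\n", (unsigned)a, (unsigned)b);  /* prints: 65535 0 */
    return 0;
}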

Integer overflow problem

Please explain the following paragraph.
"The next question is whether we can assign a certain value to a variable without losing precision. It is not sufficient if we just check for overflow during the addition or subtraction, because someone might add 1 to -5 and assign the result to an unsigned int. Then the actual addition does not overflow, but the result still does not fit."
When I am adding 1 to -5 I don't see any reason to worry; the answer is, as it should be, -4.
So what is the problem of the result not fitting?
You can find the full article I was going through here:
http://www.fefe.de/intof.html
The binary representation of -4, in a 32-bit word, is as follows (hex notation):
0xfffffffc
When interpreted as an unsigned integer, this bit pattern represents the number 2^32 - 4, or 4294967292. I'm not sure I would call this phenomenon "overflow", but it is a common mistake to assign a small negative integer to a variable of unsigned type and wind up with a really big positive integer.
This trick is actually exploited for bounds checking: if you have a signed integer i and want to know if it is in the range 0 <= i < n, you can test
if ((unsigned)i < n) { ... }
which gives you the answer using one comparison instead of two. The cast to unsigned has no run-time cost; it just tells the compiler to generate an unsigned comparison instead of a signed comparison.
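A complete sketch of that idiom:

#include <stdio.h>

/* Returns 1 iff 0 <= i < n, using a single unsigned comparison:
   a negative i converts to a huge unsigned value and fails the test. */
static int in_range(int i, unsigned n)
{
    return (unsigned)i < n;
}

int main(void)
{
    printf("%d %d %d\n", in_range(3, 10), in_range(-1, 10), in_range(10, 10));
    /* prints: 1 0 0 */
    return 0;
}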
Try assigning it to an unsigned int, not an int.
The term unsigned int is the key: by default an int datatype will hold both negative and positive numbers; unsigned ints, however, are always positive. They provide this option because unsigned ints can hold greater positive values than regular signed ints, since they do not need to use a bit to keep track of whether the value is negative or positive.
Please see:
Signed versus Unsigned Integers
The problem is that you're storing -4 in an unsigned int. Unsigned ints can only contain zero and positive values. If you assign -4 to one, you'll actually end up getting a very large positive number (the actual value depends on how wide an int you're using).
The problem is that a storage type such as unsigned int can only hold so much. With 1 and -5 it does not matter, but with 1 and -500000000 you might end up with a confusing result. Also, unsigned storage interprets anything stored in it as positive, so you cannot put a negative value in an unsigned variable.
Two big things to watch out for:
1. Overflow in the operation itself: 1 + -500000000
2. Issues in casting: (unsigned int)(1 + -500)
Unsigned variables, like unsigned int, cannot hold negative values. So assigning 1 - 5 to an unsigned int won't give you -4; the negative result is converted modulo UINT_MAX + 1, yielding a very large positive number.
Some code:
signed int first, second;
unsigned int result;

first = obtain();         // happens to be 1
second = obtain();        // happens to be -5
result = first + second;  // unexpected result here (a very large number),
                          // and it's too late to check that there's a problem
Say you obtained those values from the keyboard. You need to check before the addition that the result can be represented in an unsigned int. That's what the article talks about.
By definition the number -4 cannot be represented in an unsigned int. -4 is a signed integer. The same goes for any negative number.
When you assign a negative integer to an unsigned int the actual bits of the number do not change, but they are merely represented differently. You'll get some ridiculously-large number due to the way integers are represented in binary (two's complement).
In two's complement, -4 is represented as 0xfffffffc. When 0xfffffffc is represented as an unsigned int you'll get the number 4,294,967,292.
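A two-line demonstration (the output assumes a 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    unsigned int u = -4;   /* converted modulo UINT_MAX + 1 */
    printf("%u\n", u);     /* prints: 4294967292 */
    return 0;
}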
You have to remember that fundamentally you're working with bits. So you can assign a value of -4 to an unsigned integer, and this will place a series of bits into that memory location. Those bits can be interpreted as -4 in certain circumstances. One such circumstance is the obvious one: you've told the compiler/system that the bits in that memory location should be interpreted as a two's complement signed number. So if you do printf("%d", i), printf does its magic and converts the two's complement number to a magnitude and sign. The magnitude will be 4 and the sign will be negative, so it displays '-4'.
However, if you tell the compiler that the data at that memory location is unsigned, the bits don't change, but their interpretation does. So when you do your addition, store the result in an unsigned memory location, and then call printf on the result, it doesn't bother looking for the sign because by definition it is always positive. It calculates the magnitude and prints it. The magnitude will be off because the sign information is still encoded in the bits, but it's treated as magnitude information.
