Related
I have an unsigned int that actually stores a signed int, and the signed int ranges from -128 to 127.
I would like to store this value back in the unsigned int so that I can simply
apply a mask 0xFF and get the signed char.
How do I do the conversion ?
i.e.
unsigned int foo = -100;
foo = (char)foo;
char bar = foo & 0xFF;
assert(bar == -100);
The & 0xFF operation will produce a value in the range 0 to 255. It's not possible to get a negative number this way. So, even if you use & 0xFF somewhere, you will still need to apply a conversion later to get to the range -128 to 127.
In your code:
char bar = foo & 0xFF;
there is an implicit conversion to char. This relies on implementation-defined behaviour but this will work on all but the most esoteric of systems. The most common implementation definition is the inverse of the conversion that applies when converting unsigned char to char.
(Your previous line foo = (char)foo; should be removed).
However,
char bar = foo;
would produce exactly the same effect (again, except for on those esoteric systems).
Since the unsigned int foo value does not reach the boundaries of -128 or 127 the implicit conversion will work for this case. But if unsigned int foo had a bigger value you will be losing bytes at the moment when storing it in a char variable and will get unexpected results on your program.
Answering for C,
If you have an unsigned int whose value was set by assignment of a value of type char (where char happens to be a signed type) or of type signed char, where the assigned value was negative, then the stored value is the arithmetic sum of the assigned negative value and one more than UINT_MAX. This will be far beyond the range of values representable by (signed) char on any C system I have ever encountered. If you convert that value back to (signed) char, whether implicitly or via a cast, "either the result is implementation-defined, or an implementation-defined signal is raised" (C2011, 6.3.1.3/3).
Converting back to the original char value in a way that avoids implementation-defined behavior is a bit tricky (but relying on implementation-defined behavior may afford much easier approaches). Certainly, masking off all but the 8 lowest-order value bits does not do the trick, as it always gives you a positive value. Also, it assumes that char is 8 bits wide, which, in principle, is not guaranteed. It does not necessarily even give you the correct bit pattern, as C permits negative integers to be represented in any of three different ways.
Here's an approach that will work on any conforming C system:
unsigned int foo = SOME_SIGNED_CHAR_VALUE;
signed char bar;
/* ... */
if (foo <= SCHAR_MAX) {
/* foo's value is representable as a signed char */
bar = foo;
} else {
/* mask off the highest-order value bits to yield a value that fits in an int */
int foo2 = foo & INT_MAX;
/* reverse the conversion to unsigned int, as if unsigned int had the same
number of value bits as int; the other bits are already accounted for */
bar = (foo2 - INT_MAX) - 1;
}
That relies only on characteristics of integer representation and conversion that C itself defines.
Don't do it.
Casting to a smaller size may truncate the value. Casting from signed to unsigned or opposite may results wrong value (e.g. 255 -> -1).
If you have to make calculations with different data types, pick one common type, prefereably signed and long int (32-bit), and check boundaries before casting down (to smaller size).
Signed helps you detect underflows (e.g. when result gets less than 0), long int (or just simply: int, which means natural word length) suits for machines (32-bit or 64-bit), and it's big enough for most purposes.
Also try to avoid mixed types in formulas, especially when they contain division (/).
I have wrote this program as an exercise to understand how the signed and unsigned integer
work in C.
This code should print simply -9 the addition of -4+-5 stored in variable c
#include <stdio.h>
int main (void) {
unsigned int a=-4;
unsigned int b=-5;
unsigned int c=a+b;
printf("result is %u\n",c);
return 0;
}
When this code run it give me an unexpected result 4294967287.
I also have cast c from unsigned to signed integer printf ("result is %u\n",(int)c);
but also doesn't work.
please someone give explanation why the program doesn't give the exact result?
if this is an exercise in c and signed vs unsigned you should start by thinking - what does this mean?
unsigned int a=-4;
should it even compile? It seems like a contradiction.
Use a debugger to inspect the memory stored at he location of a. Do you think it will be the same in this case?
int a=-4;
Does the compiler do different things when its asked to add unsigned x to unsigned y as opposed to signed x and signed y. Ask the compiler to show you the machine code it generated in each case, read up what the instructions do
Explore investigate verify, you have the opportunity to get really interesting insights into how computers really work
You expect this:
printf("result is %u\n",c);
to print -9. That's impossible. c is of type unsigned int, and %u prints a value of type unsigned int (so good work using the right format string for the argument). An unsigned int object cannot store a negative value.
Going back a few line in your program:
unsigned int a=-4;
4 is of type (signed) int, and has the obvious value. Applying unary - to that value yields an int value of -4.
So far, so good.
Now what happens when you store this negative int value in an unsigned int object?
It's converted.
The language specifies what happens when you convert a signed int value to unsigned int: the value is adjusted to it's within the range of unsigned int. If unsigned int is 32 bits, this is done by adding or subtracting 232 as many times as necessary. In this case, the result is -4 + 232, or 4294967292. (That number makes a bit more sense if you show it in hexadecimal: 0xfffffffc.)
(The generated code isn't really going to repeatedly add or subtract 232; it's going to do whatever it needs to do to get the same result. The cool thing about using two's-complement to represent signed integers is that it doesn't have to do anything. The int value -4 and the unsigned int value 4294967292 have exactly the same bit representation. The rules are defined in terms of values, but they're designed so that they can be easily implemented using bitwise operations.)
Similarly, c will have the value -5 + 232, or 4294967291.
Now you add them together. The mathematical result is 8589934583, but that won't fit in an unsigned int. Using rules similar to those for conversion, the result is reduced to a value that's within the range of unsigned int, yielding 4294967287 (or, in hex, 0xfffffff7).
You also tried a cast:
printf ("result is %u\n",(int)c);
Here you're passing an int argument to printf, but you've told it (by using %u) to expect an unsigned int. You've also tried to convert a value that's too big to fit in an int -- and the unsigned-to-signed conversion rules do not define the result of such a conversion when the value is out of range. So don't do that.
That answer is precisely correct for 32-bit ints.
unsigned int a = -4;
sets a to the bit pattern 0xFFFFFFFC, which, interpreted as unsigned, is 4294967292 (232 - 4). Likewise, b is set to 232 - 5. When you add the two, you get 0x1FFFFFFF7 (8589934583), which is wider than 32 bits, so the extra bits are dropped, leaving 4294967287, which, as it happens, is 232 - 9. So if you had done this calculation on signed ints, you would have gotten exactly the same bit patterns, but printf would have rendered the answer as -9.
Using google, one finds the answer in two seconds..
http://en.wikipedia.org/wiki/Signedness
For example, 0xFFFFFFFF gives −1, but 0xFFFFFFFFU gives 4,294,967,295
for 32-bit code
Therefore, your 4294967287 is expected in this case.
However, what exactly do you mean by "cast from unsigned to signed does not work?"
I tried the to execute the below program:
#include <stdio.h>
int main() {
signed char a = -5;
unsigned char b = -5;
int c = -5;
unsigned int d = -5;
if (a == b)
printf("\r\n char is SAME!!!");
else
printf("\r\n char is DIFF!!!");
if (c == d)
printf("\r\n int is SAME!!!");
else
printf("\r\n int is DIFF!!!");
return 0;
}
For this program, I am getting the output:
char is DIFF!!!
int is SAME!!!
Why are we getting different outputs for both?
Should the output be as below ?
char is SAME!!!
int is SAME!!!
A codepad link.
This is because of the various implicit type conversion rules in C. There are two of them that a C programmer must know: the usual arithmetic conversions and the integer promotions (the latter are part of the former).
In the char case you have the types (signed char) == (unsigned char). These are both small integer types. Other such small integer types are bool and short. The integer promotion rules state that whenever a small integer type is an operand of an operation, its type will get promoted to int, which is signed. This will happen no matter if the type was signed or unsigned.
In the case of the signed char, the sign will be preserved and it will be promoted to an int containing the value -5. In the case of the unsigned char, it contains a value which is 251 (0xFB ). It will be promoted to an int containing that same value. You end up with
if( (int)-5 == (int)251 )
In the integer case you have the types (signed int) == (unsigned int). They are not small integer types, so the integer promotions do not apply. Instead, they are balanced by the usual arithmetic conversions, which state that if two operands have the same "rank" (size) but different signedness, the signed operand is converted to the same type as the unsigned one. You end up with
if( (unsigned int)-5 == (unsigned int)-5)
Cool question!
The int comparison works, because both ints contain exactly the same bits, so they are essentially the same. But what about the chars?
Ah, C implicitly promotes chars to ints on various occasions. This is one of them. Your code says if(a==b), but what the compiler actually turns that to is:
if((int)a==(int)b)
(int)a is -5, but (int)b is 251. Those are definitely not the same.
EDIT: As #Carbonic-Acid pointed out, (int)b is 251 only if a char is 8 bits long. If int is 32 bits long, (int)b is -32764.
REDIT: There's a whole bunch of comments discussing the nature of the answer if a byte is not 8 bits long. The only difference in this case is that (int)b is not 251 but a different positive number, which isn't -5. This is not really relevant to the question which is still very cool.
Welcome to integer promotion. If I may quote from the website:
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions. All other types are unchanged
by the integer promotions.
C can be really confusing when you do comparisons such as these, I recently puzzled some of my non-C programming friends with the following tease:
#include <stdio.h>
#include <string.h>
int main()
{
char* string = "One looooooooooong string";
printf("%d\n", strlen(string));
if (strlen(string) < -1) printf("This cannot be happening :(");
return 0;
}
Which indeed does print This cannot be happening :( and seemingly demonstrates that 25 is smaller than -1!
What happens underneath however is that -1 is represented as an unsigned integer which due to the underlying bits representation is equal to 4294967295 on a 32 bit system. And naturally 25 is smaller than 4294967295.
If we however explicitly cast the size_t type returned by strlen as a signed integer:
if ((int)(strlen(string)) < -1)
Then it will compare 25 against -1 and all will be well with the world.
A good compiler should warn you about the comparison between an unsigned and signed integer and yet it is still so easy to miss (especially if you don't enable warnings).
This is especially confusing for Java programmers as all primitive types there are signed. Here's what James Gosling (one of the creators of Java) had to say on the subject:
Gosling: For me as a language designer, which I don't really count
myself as these days, what "simple" really ended up meaning was could
I expect J. Random Developer to hold the spec in his head. That
definition says that, for instance, Java isn't -- and in fact a lot of
these languages end up with a lot of corner cases, things that nobody
really understands. Quiz any C developer about unsigned, and pretty
soon you discover that almost no C developers actually understand what
goes on with unsigned, what unsigned arithmetic is. Things like that
made C complex. The language part of Java is, I think, pretty simple.
The libraries you have to look up.
The hex representation of -5 is:
8-bit, two's complement signed char: 0xfb
32-bit, two's complement signed int: 0xfffffffb
When you convert a signed number to an unsigned number, or vice versa, the compiler does ... precisely nothing. What is there to do? The number is either convertible or it isn't, in which case undefined or implementation-defined behaviour follows (I've not actually checked which) and the most efficient implementation-defined behaviour is to do nothing.
So, the hex representation of (unsigned <type>)-5 is:
8-bit, unsigned char: 0xfb
32-bit, unsigned int: 0xfffffffb
Look familiar? They're bit-for-bit the same as the signed versions.
When you write if (a == b), where a and b are of type char, what the compiler is actually required to read is if ((int)a == (int)b). (This is that "integer promotion" that everyone else is banging on about.)
So, what happens when we convert char to int?
8-bit signed char to 32-bit signed int: 0xfb -> 0xfffffffb
Well, that makes sense because it matches the representations of -5 above!
It's called a "sign-extend", because it copies the top bit of the byte, the "sign-bit", leftwards into the new, wider value.
8-bit unsigned char to 32-bit signed int: 0xfb -> 0x000000fb
This time it does a "zero-extend" because the source type is unsigned, so there is no sign-bit to copy.
So, a == b really does 0xfffffffb == 0x000000fb => no match!
And, c == d really does 0xfffffffb == 0xfffffffb => match!
My point is: didn't you get a warning at compile time "comparing signed and unsigned expression"?
The compiler is trying to inform you that he is entitled to do crazy stuff! :) I would add, crazy stuff will happen using big values, close to the capacity of the primitive type. And
unsigned int d = -5;
is assigning definitely a big value to d, it's equivalent (even if, probably not guaranteed to be equivalent) to be:
unsigned int d = UINT_MAX -4; ///Since -1 is UINT_MAX
Edit:
However, it is interesting to notice that only the second comparison gives a warning (check the code). So it means that the compiler applying the conversion rules is confident that there won't be errors in the comparison between unsigned char and char (during comparison they will be converted to a type that can safely represent all its possible values). And he is right on this point. Then, it informs you that this won't be the case for unsigned int and int: during the comparison one of the 2 will be converted to a type that cannot fully represent it.
For completeness, I checked it also for short: the compiler behaves in the same way than for chars, and, as expected, there are no errors at runtime.
.
Related to this topic, I recently asked this question (yet, C++ oriented).
My apologies if the question seems weird. I'm debugging my code and this seems to be the problem, but I'm not sure.
Thanks!
It depends on what you want the behaviour to be. An int cannot hold many of the values that an unsigned int can.
You can cast as usual:
int signedInt = (int) myUnsigned;
but this will cause problems if the unsigned value is past the max int can hold. This means half of the possible unsigned values will result in erroneous behaviour unless you specifically watch out for it.
You should probably reexamine how you store values in the first place if you're having to convert for no good reason.
EDIT: As mentioned by ProdigySim in the comments, the maximum value is platform dependent. But you can access it with INT_MAX and UINT_MAX.
For the usual 4-byte types:
4 bytes = (4*8) bits = 32 bits
If all 32 bits are used, as in unsigned, the maximum value will be 2^32 - 1, or 4,294,967,295.
A signed int effectively sacrifices one bit for the sign, so the maximum value will be 2^31 - 1, or 2,147,483,647. Note that this is half of the other value.
Unsigned int can be converted to signed (or vice-versa) by simple expression as shown below :
unsigned int z;
int y=5;
z= (unsigned int)y;
Though not targeted to the question, you would like to read following links :
signed to unsigned conversion in C - is it always safe?
performance of unsigned vs signed integers
Unsigned and signed values in C
What type-conversions are happening?
IMHO this question is an evergreen. As stated in various answers, the assignment of an unsigned value that is not in the range [0,INT_MAX] is implementation defined and might even raise a signal. If the unsigned value is considered to be a two's complement representation of a signed number, the probably most portable way is IMHO the way shown in the following code snippet:
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else if (u >= (unsigned int)INT_MIN)
i = -(int)~u - 1; /*(2)*/
else
i = INT_MIN; /*(3)*/
Branch (1) is obvious and cannot invoke overflow or traps, since it
is value-preserving.
Branch (2) goes through some pains to avoid signed integer overflow
by taking the one's complement of the value by bit-wise NOT, casts it
to 'int' (which cannot overflow now), negates the value and subtracts
one, which can also not overflow here.
Branch (3) provides the poison we have to take on one's complement or
sign/magnitude targets, because the signed integer representation
range is smaller than the two's complement representation range.
This is likely to boil down to a simple move on a two's complement target; at least I've observed such with GCC and CLANG. Also branch (3) is unreachable on such a target -- if one wants to limit the execution to two's complement targets, the code could be condensed to
#include <limits.h>
unsigned int u;
int i;
if (u <= (unsigned int)INT_MAX)
i = (int)u; /*(1)*/
else
i = -(int)~u - 1; /*(2)*/
The recipe works with any signed/unsigned type pair, and the code is best put into a macro or inline function so the compiler/optimizer can sort it out. (In which case rewriting the recipe with a ternary operator is helpful. But it's less readable and therefore not a good way to explain the strategy.)
And yes, some of the casts to 'unsigned int' are redundant, but
they might help the casual reader
some compilers issue warnings on signed/unsigned compares, because the implicit cast causes some non-intuitive behavior by language design
If you have a variable unsigned int x;, you can convert it to an int using (int)x.
It's as simple as this:
unsigned int foo;
int bar = 10;
foo = (unsigned int)bar;
Or vice versa...
If an unsigned int and a (signed) int are used in the same expression, the signed int gets implicitly converted to unsigned. This is a rather dangerous feature of the C language, and one you therefore need to be aware of. It may or may not be the cause of your bug. If you want a more detailed answer, you'll have to post some code.
Some explain from C++Primer 5th Page 35
If we assign an out-of-range value to an object of unsigned type, the result is the remainder of the value modulo the number of values the target type can hold.
For example, an 8-bit unsigned char can hold values from 0 through 255, inclusive. If we assign a value outside the range, the compiler assigns the remainder of that value modulo 256.
unsigned char c = -1; // assuming 8-bit chars, c has value 255
If we assign an out-of-range value to an object of signed type, the result is undefined. The program might appear to work, it might crash, or it might produce garbage values.
Page 160:
If any operand is an unsigned type, the type to which the operands are converted depends on the relative sizes of the integral types on the machine.
...
When the signedness differs and the type of the unsigned operand is the same as or larger than that of the signed operand, the signed operand is converted to unsigned.
The remaining case is when the signed operand has a larger type than the unsigned operand. In this case, the result is machine dependent. If all values in the unsigned type fit in the large type, then the unsigned operand is converted to the signed type. If the values don't fit, then the signed operand is converted to the unsigned type.
For example, if the operands are long and unsigned int, and int and long have the same size, the length will be converted to unsigned int. If the long type has more bits, then the unsigned int will be converted to long.
I found reading this book is very helpful.
Given that signed and unsigned ints use the same registers, etc., and just interpret bit patterns differently, and C chars are basically just 8-bit ints, what's the difference between signed and unsigned chars in C? I understand that the signedness of char is implementation defined, and I simply can't understand how it could ever make a difference, at least when char is used to hold strings instead of to do math.
It won't make a difference for strings. But in C you can use a char to do math, when it will make a difference.
In fact, when working in constrained memory environments, like embedded 8 bit applications a char will often be used to do math, and then it makes a big difference. This is because there is no byte type by default in C.
In terms of the values they represent:
unsigned char:
spans the value range 0..255 (00000000..11111111)
values overflow around low edge as:
0 - 1 = 255 (00000000 - 00000001 = 11111111)
values overflow around high edge as:
255 + 1 = 0 (11111111 + 00000001 = 00000000)
bitwise right shift operator (>>) does a logical shift:
10000000 >> 1 = 01000000 (128 / 2 = 64)
signed char:
spans the value range -128..127 (10000000..01111111)
values overflow around low edge as:
-128 - 1 = 127 (10000000 - 00000001 = 01111111)
values overflow around high edge as:
127 + 1 = -128 (01111111 + 00000001 = 10000000)
bitwise right shift operator (>>) does an arithmetic shift:
10000000 >> 1 = 11000000 (-128 / 2 = -64)
I included the binary representations to show that the value wrapping behaviour is pure, consistent binary arithmetic and has nothing to do with a char being signed/unsigned (expect for right shifts).
Update
Some implementation-specific behaviour mentioned in the comments:
char != signed char. The type "char" without "signed" or "unsinged" is implementation-defined which means that it can act like a signed or unsigned type.
Signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer.
#include <stdio.h>
int main(int argc, char** argv)
{
char a = 'A';
char b = 0xFF;
signed char sa = 'A';
signed char sb = 0xFF;
unsigned char ua = 'A';
unsigned char ub = 0xFF;
printf("a > b: %s\n", a > b ? "true" : "false");
printf("sa > sb: %s\n", sa > sb ? "true" : "false");
printf("ua > ub: %s\n", ua > ub ? "true" : "false");
return 0;
}
[root]# ./a.out
a > b: true
sa > sb: true
ua > ub: false
It's important when sorting strings.
There are a couple of difference. Most importantly, if you overflow the valid range of a char by assigning it a too big or small integer, and char is signed, the resulting value is implementation defined or even some signal (in C) could be risen, as for all signed types. Contrast that to the case when you assign something too big or small to an unsigned char: the value wraps around, you will get precisely defined semantics. For example, assigning a -1 to an unsigned char, you will get an UCHAR_MAX. So whenever you have a byte as in a number from 0 to 2^CHAR_BIT, you should really use unsigned char to store it.
The sign also makes a difference when passing to vararg functions:
char c = getSomeCharacter(); // returns 0..255
printf("%d\n", c);
Assume the value assigned to c would be too big for char to represent, and the machine uses two's complement. Many implementation behave for the case that you assign a too big value to the char, in that the bit-pattern won't change. If an int will be able to represent all values of char (which it is for most implementations), then the char is being promoted to int before passing to printf. So, the value of what is passed would be negative. Promoting to int would retain that sign. So you will get a negative result. However, if char is unsigned, then the value is unsigned, and promoting to an int will yield a positive int. You can use unsigned char, then you will get precisely defined behavior for both the assignment to the variable, and passing to printf which will then print something positive.
Note that a char, unsigned and signed char all are at least 8 bits wide. There is no requirement that char is exactly 8 bits wide. However, for most systems that's true, but for some, you will find they use 32bit chars. A byte in C and C++ is defined to have the size of char, so a byte in C also is not always exactly 8 bits.
Another difference is, that in C, a unsigned char must have no padding bits. That is, if you find CHAR_BIT is 8, then an unsigned char's values must range from 0 .. 2^CHAR_BIT-1. THe same is true for char if it's unsigned. For signed char, you can't assume anything about the range of values, even if you know how your compiler implements the sign stuff (two's complement or the other options), there may be unused padding bits in it. In C++, there are no padding bits for all three character types.
"What does it mean for a char to be signed?"
Traditionally, the ASCII character set consists of 7-bit character encodings. (As opposed to the 8 bit EBCIDIC.)
When the C language was designed and implemented this was a significant issue. (For various reasons like data transmission over serial modem devices.) The extra bit has uses like parity.
A "signed character" happens to be perfect for this representation.
Binary data, OTOH, is simply taking the value of each 8-bit "chunk" of data, thus no sign is needed.
Arithmetic on bytes is important for computer graphics (where 8-bit values are often used to store colors). Aside from that, I can think of two main cases where char sign matters:
converting to a larger int
comparison functions
The nasty thing is, these won't bite you if all your string data is 7-bit. However, it promises to be an unending source of obscure bugs if you're trying to make your C/C++ program 8-bit clean.
Signedness works pretty much the same way in chars as it does in other integral types. As you've noted, chars are really just one-byte integers. (Not necessarily 8-bit, though! There's a difference; a byte might be bigger than 8 bits on some platforms, and chars are rather tied to bytes due to the definitions of char and sizeof(char). The CHAR_BIT macro, defined in <limits.h> or C++'s <climits>, will tell you how many bits are in a char.).
As for why you'd want a character with a sign: in C and C++, there is no standard type called byte. To the compiler, chars are bytes and vice versa, and it doesn't distinguish between them. Sometimes, though, you want to -- sometimes you want that char to be a one-byte number, and in those cases (particularly how small a range a byte can have), you also typically care whether the number is signed or not. I've personally used signedness (or unsignedness) to say that a certain char is a (numeric) "byte" rather than a character, and that it's going to be used numerically. Without a specified signedness, that char really is a character, and is intended to be used as text.
I used to do that, rather. Now the newer versions of C and C++ have (u?)int_least8_t (currently typedef'd in <stdint.h> or <cstdint>), which are more explicitly numeric (though they'll typically just be typedefs for signed and unsigned char types anyway).
The only situation I can imagine this being an issue is if you choose to do math on chars. It's perfectly legal to write the following code.
char a = (char)42;
char b = (char)120;
char c = a + b;
Depending on the signedness of the char, c could be one of two values. If char's are unsigned then c will be (char)162. If they are signed then it will an overflow case as the max value for a signed char is 128. I'm guessing most implementations would just return (char)-32.
One thing about signed chars is that you can test c >= ' ' (space) and be sure it's a normal printable ascii char. Of course, it's not portable, so not very useful.