How do I represent negative char values in hexadecimal? (C)

The following code
char buffer[BSIZE];
...
if(buffer[0]==0xef)
...
gives the compiler warning "comparison is always false due to limited range of data type".
The warning goes away when I change the check to
if(buffer[0]==0xffffffef)
This feels very counterintuitive. What is the correct way to check a char against a specific byte value in hex? (other than making it unsigned)

What's wrong with:
if (buffer[0] == '\xef')
?

To understand why buffer[0] == 0xef triggers a warning, and buffer[0] == 0xffffffef does not, you need to understand exactly what's happening in that expression.
Firstly, the == operator compares the value of two expressions, not the underlying representation - 0xef is the number 239, and will only compare equal to that number; likewise 0xffffffef is the number 4294967279 and will only compare equal to that.
There is no difference between the constants 0xef and 239 in C: both have type int and the same value. If your char has the range -128 to 127, then when you evaluate buffer[0] == 0xef the buffer[0] is promoted to int, which leaves its value unchanged. It can therefore never compare equal to 0xef, so the warning is correct.
However, there is potentially a difference between the constants 0xffffffef and 4294967279; decimal constants are always signed, but hexadecimal constant may be unsigned. On your system, it appears to have an unsigned type - probably unsigned int (because the value is too large to store in an int, but small enough to store in an unsigned int). When you evaluate buffer[0] == 0xffffffef, the buffer[0] is promoted to unsigned int. This leaves any positive value unchanged, but negative values are converted by adding UINT_MAX + 1 to them; with a char that has range -128 to 127, the promoted values are in either of the ranges 0 to 127 or 4294967168 to 4294967295. 0xffffffef lies within this range, so it is possible for the comparison to return true.
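To see those promotions concretely, here is a minimal sketch of mine, assuming a system where plain char is signed, two's complement, and int is 32 bits:

#include <stdio.h>

int main(void)
{
    char buffer[1] = { '\xef' };  /* bit pattern 0xEF; value -17 if char is signed */

    /* Promoted to int: the value stays -17, so it can never equal 239 (0xef). */
    printf("as int:          %d\n", (int)buffer[0]);
    /* Converted to unsigned int: -17 becomes UINT_MAX + 1 - 17 = 4294967279. */
    printf("as unsigned int: %u\n", (unsigned int)buffer[0]);

    printf("== 0xef:       %d\n", buffer[0] == 0xef);       /* 0 (and the warning) */
    printf("== 0xffffffef: %d\n", buffer[0] == 0xffffffef); /* 1 on such a system */
    return 0;
}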
If you are storing bit patterns rather than numbers, then you should be using unsigned char in the first place. Alternatively, you may inspect the bit pattern of an object by casting a pointer to it to unsigned char *:
if (((unsigned char *)buffer)[0] == 0xef)
(This is obviously more conveniently done by using a separate variable of type unsigned char *).
As PaulR says, you can also use buffer[0] == '\xef' - this works because '\xef' is defined to be an int constant with the value that a char object with the bit pattern 0xef would have when converted to an int; e.g. on a two's complement system with signed chars, '\xef' is a constant with the value -17.
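Putting the working alternatives side by side, as a sketch (again assuming an 8-bit, signed plain char):

#include <stdio.h>

int main(void)
{
    char buffer[1] = { '\xef' };

    /* Value-to-value comparison via a character constant: */
    if (buffer[0] == '\xef')
        puts("matched '\\xef'");

    /* Bit-pattern comparison by converting the byte to unsigned char: */
    if ((unsigned char)buffer[0] == 0xef)
        puts("matched via (unsigned char) cast");

    /* Bit-pattern comparison through an unsigned char pointer: */
    if (((unsigned char *)buffer)[0] == 0xef)
        puts("matched via unsigned char *");
    return 0;
}

All three tests succeed; the first compares values, the other two compare the underlying byte.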

This is happening because buffer contents are of type char. Making them unsigned char will work:
if ((unsigned char) (buffer[0]) == 0xef)

Do it just like with any other negative number. On a signed 8-bit char, the bit pattern 0xef holds the value 239 - 256 = -17, which is -0x11:
if (buffer[0] == -0x11)
But usually you want to use unsigned char as data type:
unsigned char buffer[BSIZE];
...
if(buffer[0]==0xef)
Reasons to use signed char are very rare. Even more rare are reasons to use "char without sign specification", which can be signed or unsigned on different platforms.

To spell out the reason explicitly: Whether char is signed or unsigned is implementation-defined. If your compiler treats char as signed by default then 0xef will be larger than the largest possible signed char (which is 127 or 0x7f) and therefore your comparison will always be false. Hence the warning.
Possible solutions are provided by the other answers.

Related

C typecasting from a signed char to int type

In the below snippet, shouldn't the output be -1 in both cases? Why am I getting -1 and 4294967295 as output?
What I understand is that the variable c here is of signed type, so shouldn't its value be -1 either way?
char c=0xff;
printf("%d %u",c,c);
c is of signed type. A char is 8 bits, so you have an 8-bit signed quantity with all bits set to 1. On a two's complement machine, that evaluates to -1.
Some compilers will warn you when you do that sort of thing. If you're using gcc/clang, switch on all the warnings.
Pedant note: On some machines it could have the value 255, should the compiler treat 'char' as unsigned.
You're getting the correct answer.
The %u format specifier indicates that the value will be an unsigned int. The compiler automatically promotes your 8-bit char to a 32-bit int. However, you have to remember that char is a signed type here. So a value of 0xff is in fact -1.
When the conversion from char to int occurs, the value is still -1, but now it's the 32-bit representation, which in binary is 11111111 11111111 11111111 11111111, or in hex 0xffffffff.
When that is interpreted as an unsigned integer, all of the bits are obviously preserved because the length is the same, but now it's handled as an unsigned quantity.
0xffffffff = 4294967295 (unsigned)
0xffffffff = -1 (signed)
There are three character types in C, char, signed char, and unsigned char. Plain char has the same representation as either signed char or unsigned char; the choice is implementation-defined. It appears that plain char is signed in your implementation.
All three types have the same size, which is probably 8 bits (CHAR_BIT, defined in <limits.h>, specifies the number of bits in a byte). I'll assume 8 bits.
char c=0xff;
Assuming plain char is signed, the value 0xff (255) is outside the range of type char. Since you can't store the value 255 in a char object, the value is implicitly converted. The result of this conversion is implementation-defined, but is very likely to be -1.
Keep this carefully in mind: 0xff is simply another way to write 255, and 0xff and -1 are two distinct values. You cannot store the value 255 in a char object; its value is -1. Integer constants, whether they're decimal, hexadecimal, or octal, specify values, not representations.
If you really want a one-byte object with the value 0xff, define it as an unsigned char, not as a char.
printf("%d %u",c,c);
When a value of an integer type narrower than int is passed to printf (or to any variadic function), it's promoted to int if int can hold the original type's entire range of values, or to unsigned int if it can't. For type char, that almost certainly means promotion to int. So this call is equivalent to:
printf("%d %u", -1, -1);
The output for the "%d" format is obvious. The output for "%u" is less obvious. "%u" tells printf that the corresponding argument is of type unsigned int, but you've passed it a value of type int. What probably happens is that the representation of the int value is treated as if it were of type unsigned int, most likely yielding UINT_MAX, which happens to be 4294967295 on your system. If you really want to do that, you should convert the value to type unsigned int. This:
printf("%d %u", -1, (unsigned int)-1);
is well defined.
Your two lines of code are playing a lot of games with various types, treating values of one type as if they were of another type, and doing implicit conversions that might yield results that are implementation-defined and/or depend on the choices your compiler happens to make.
Whatever you're trying to do, there's undoubtedly a cleaner way to do it (unless you're just trying to see what your implementation does with this particular code).
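For instance, if the intent is simply to work with the byte value 255, a cleaner sketch (my example, not from the question) would be:

#include <stdio.h>

int main(void)
{
    unsigned char c = 0xff;                /* holds 255 with no conversion surprises */

    /* c promotes to int with value 255; the cast makes "%u" well-defined. */
    printf("%d %u\n", c, (unsigned int)c); /* prints: 255 255 */
    return 0;
}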
Let us start with OP's assumption that "c, here is of signed type".
char c=0xff; // Implementation defined behavior.
0xff is a hexadecimal constant with the value of 255 and type of int.
... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. §6.3.1.3/3
So right off, the value of c is implementation defined (ID). Let us assume the common ID behavior of 8-bit wrap-around, so c --> -1.
A signed char is promoted to int when passed as a variadic argument, so printf("%d %u",c,c); is the same as printf("%d %u",-1,-1);. Printing the -1 with "%d" is not an issue, and "-1" is printed.
Printing an int -1 with "%u" is undefined behavior (UB), as it is a mismatched specifier/type and does not fall under the exception of being representable in both types. The common behavior in practice is to print the value as if it had been converted to unsigned int before being passed. When UINT_MAX == 4294967295 (4 bytes), that prints the value as -1 + (UINT_MAX + 1), i.e. "4294967295".
So with ID and UB, you get a result, but robust code would be re-written to depend on neither.

Store signed char inside unsigned int

I have an unsigned int that actually stores a signed int, and the signed int ranges from -128 to 127.
I would like to store this value back in the unsigned int so that I can simply
apply a mask 0xFF and get the signed char.
How do I do the conversion ?
i.e.
unsigned int foo = -100;
foo = (char)foo;
char bar = foo & 0xFF;
assert(bar == -100);
The & 0xFF operation will produce a value in the range 0 to 255. It's not possible to get a negative number this way. So, even if you use & 0xFF somewhere, you will still need to apply a conversion later to get to the range -128 to 127.
In your code:
char bar = foo & 0xFF;
there is an implicit conversion to char. This relies on implementation-defined behaviour but this will work on all but the most esoteric of systems. The most common implementation definition is the inverse of the conversion that applies when converting unsigned char to char.
(Your previous line foo = (char)foo; should be removed).
However,
char bar = foo;
would produce exactly the same effect (again, except for on those esoteric systems).
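A minimal sketch of that round trip (the conversion back to char is implementation-defined, but behaves as shown on typical two's complement systems):

#include <assert.h>

int main(void)
{
    unsigned int foo = -100;  /* well-defined: stores UINT_MAX + 1 - 100 */
    char bar = foo;           /* implementation-defined; usually yields -100 */

    assert(bar == -100);      /* holds on common implementations */
    return 0;
}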
Since the value in unsigned int foo stays within the limits -128 to 127, the implicit conversion works for this case. But if foo held a bigger value, you would lose the high-order bits at the moment of storing it into a char variable and get unexpected results in your program.
Answering for C,
If you have an unsigned int whose value was set by assignment of a value of type char (where char happens to be a signed type) or of type signed char, where the assigned value was negative, then the stored value is the arithmetic sum of the assigned negative value and one more than UINT_MAX. This will be far beyond the range of values representable by (signed) char on any C system I have ever encountered. If you convert that value back to (signed) char, whether implicitly or via a cast, "either the result is implementation-defined, or an implementation-defined signal is raised" (C2011, 6.3.1.3/3).
Converting back to the original char value in a way that avoids implementation-defined behavior is a bit tricky (but relying on implementation-defined behavior may afford much easier approaches). Certainly, masking off all but the 8 lowest-order value bits does not do the trick, as it always gives you a positive value. Also, it assumes that char is 8 bits wide, which, in principle, is not guaranteed. It does not necessarily even give you the correct bit pattern, as C permits negative integers to be represented in any of three different ways.
Here's an approach that will work on any conforming C system:
unsigned int foo = SOME_SIGNED_CHAR_VALUE;
signed char bar;
/* ... */
if (foo <= SCHAR_MAX) {
    /* foo's value is representable as a signed char */
    bar = foo;
} else {
    /* mask off the highest-order value bits to yield a value that fits in an int */
    int foo2 = foo & INT_MAX;
    /* reverse the conversion to unsigned int, as if unsigned int had the same
       number of value bits as int; the other bits are already accounted for */
    bar = (foo2 - INT_MAX) - 1;
}
That relies only on characteristics of integer representation and conversion that C itself defines.
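For completeness, here is that logic wrapped into a self-contained, hypothetical helper (the function name is mine, not from the answer):

#include <assert.h>
#include <limits.h>

/* Recover the signed char value originally stored in an unsigned int,
 * using only conversions whose results C itself defines. */
static signed char recover_schar(unsigned int foo)
{
    if (foo <= SCHAR_MAX) {
        return (signed char)foo;       /* value already representable */
    } else {
        int foo2 = foo & INT_MAX;      /* fits in an int */
        return (signed char)((foo2 - INT_MAX) - 1);
    }
}

int main(void)
{
    unsigned int foo = (unsigned int)-100; /* well-defined: UINT_MAX - 99 */
    assert(recover_schar(foo) == -100);
    return 0;
}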
Don't do it.
Casting to a smaller size may truncate the value. Casting from signed to unsigned or the other way around may produce a wrong value (e.g. 255 -> -1).
If you have to make calculations with different data types, pick one common type, preferably signed and long int (32-bit), and check the boundaries before casting down (to a smaller size), as sketched below.
Signed types help you detect underflow (e.g. when a result drops below 0); long int (or simply int, which means the natural word length) suits the machine (32-bit or 64-bit) and is big enough for most purposes.
Also try to avoid mixed types in formulas, especially when they contain division (/).
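As a sketch of that boundary-checking advice (the names here are illustrative, not from any particular library):

#include <limits.h>
#include <stdio.h>

/* Narrow a long to signed char only after an explicit range check. */
static int narrow_to_schar(long value, signed char *out)
{
    if (value < SCHAR_MIN || value > SCHAR_MAX)
        return -1;                 /* out of range: report instead of truncating */
    *out = (signed char)value;
    return 0;
}

int main(void)
{
    signed char c;
    if (narrow_to_schar(-100L, &c) == 0)
        printf("stored %d\n", c); /* prints: stored -100 */
    if (narrow_to_schar(300L, &c) != 0)
        puts("300 rejected");     /* 300 does not fit in a signed char */
    return 0;
}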

What does (int)(unsigned char)(x) do in C?

In ctype.h, line 20, __ismask is defined as:
#define __ismask(x) (_ctype[(int)(unsigned char)(x)])
What does (int)(unsigned char)(x) do? I guess it casts x to unsigned char (to retrieve the first byte only regardless of x), but then why is it cast to an int at the end?
(unsigned char)(x) effectively computes an unsigned char with the value of x % (UCHAR_MAX + 1). This has the effect of giving a non-negative value (between 0 and UCHAR_MAX). On most implementations UCHAR_MAX has a value of 255 (although the standard permits an unsigned char to support a larger range, such implementations are uncommon).
Since the result of (unsigned char)(x) is guaranteed to be in the range supported by an int, the conversion to int will not change the value.
The net effect is the least significant byte, as a non-negative value.
Some compilers give a warning when using a char (signed or not) type as an array index. The conversion to int shuts the compiler up.
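A small demonstration of why the inner cast matters, assuming plain char is signed, two's complement, and CHAR_BIT == 8:

#include <stdio.h>

int main(void)
{
    char c = '\xef';  /* value -17 under these assumptions */

    /* Used directly as an index this would access table[-17]: out of bounds. */
    printf("plain:                %d\n", (int)c);                  /* -17 */
    /* (unsigned char) maps it into 0..255; the int cast keeps that value. */
    printf("(int)(unsigned char): %d\n", (int)(unsigned char)c);   /* 239 */
    return 0;
}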
The unsigned char cast is to make sure the value is within the range 0..255; the resulting value is then used as an index into the _ctype array, which has 256 entries (one per possible unsigned char value); see ctype.h in Linux.
A cast to unsigned char safely extracts the least significant CHAR_BIT bits of x, due to the wraparound (modulo) properties of an unsigned type. (A cast to char would be implementation-defined where char is a signed type on the platform, since the value may not be representable.) CHAR_BIT is usually 8.
The cast to int then converts the unsigned char. On any common implementation, an int can hold every value that an unsigned char can take.
A better alternative, if you wanted to extract the 8 least significant bits would be to apply & 0xFF and cast that result to an unsigned type.
Whether char is signed or unsigned is implementation-dependent. So you need to be explicit and write unsigned char, in order not to end up with a negative value. Then it is cast to int.

Why is '\xff' not being recognized?

I know that 0xff can have different representations depending on what the variable type is. Like -1 for signed (chars/ints(?)) and 255 for unsigned chars.
But I am using the implementation-independent type uint8_t, and I've made sure that 0xff is in fact inside the structure I am iterating across. Here is the code:
struct pkt {
uint8_t msg[8];
};
int main(int argc, char **argv) {
...
struct pkt packet;
memset(&packet, 0, sizeof packet);
strcpy(packet.msg, "hello");
packet.msg[strlen("hello")] = '\xff';
crypt(&packet, argv[1]);
...
}
void crypt(struct pkt *packet, unsigned char *key) {
int size = msglen(packet->msg);
...
}
int msglen(uint8_t *msg) {
int i = 0;
while(*(msg++) != '\xff') {
i++;
}
return i;
}
I've looked into the structure, and packet.msg[5] is indeed set to 0xff. But the while loop goes into an infinite loop, like it never discovered 0xff.
Values such as 0x7f work. I haven't tried 0x80, but I suspect it probably won't work if 0xff doesn't. It probably has something to do with signedness, but I just can't see where the problem is supposed to come from.
Thanks.
EDIT: For me, it doesn't matter if I use 0x7f or 0xff. But I would just like to know what is preventing me from detecting 0xff.
If you are comparing against an unsigned type, you can't use character literals like this.
'\xff' is -1 here, not 255, because the character literal is signed on your platform.
The while condition is therefore always true. When comparing against unsigned values you should use plain numbers only (0 to 255), or cast only characters you know are < 128.
'\xff' is a character constant. It's of type int, not char (this is one way in which C differs from C++), but its value depends on whether plain char is signed or unsigned, which is implementation-defined.
The wording in the C standard is:
The hexadecimal digits that follow the backslash and the letter x in a
hexadecimal escape sequence are taken to be part of the construction
of a single character for an integer character constant or of a single
wide character for a wide character constant. The numerical value of
the hexadecimal integer so formed specifies the value of the desired
character or wide character.
If plain char is unsigned, then '\xff' is equivalent to 0xff or 255; it's of type int and has the value 255.
If plain char is signed, then '\xff' specifies a value that's outside the range of char (assuming that char is 8 bits). The wording of the standard isn't 100% clear to me, but at least with gcc the value of '\xff' is -1.
Just use the integer constant 0xff rather than the character constant '\xff'. 0xff is of type int and is guaranteed to have the value 255, which is what you want.
I know that 0xff can have different representations depending on what the variable type is. Like -1 for signed (chars/ints(?)) and 255 for unsigned chars.
This needs some explanation. The integer literal 0xFF in a C program always means 255. If you assign this to a type for which 255 is out of range, e.g. a signed char then the behaviour is implementation-defined. Typically on 2's complement systems this is defined as assigning the value -1.
Character literals have different rules to integer literals. The character literal '\xff' must be a value that can sit in a char. You appear to have signed char, so it's implementation-defined what happens here, but again the most common behaviour is that this gets value -1. Note that character literals actually have type int despite the fact that they must have values representable by char.
In the line packet.msg[strlen("hello")] = '\xff';, you try to assign (int)-1 to a uint8_t. This is out of range, but the behaviour is well-defined for out-of-range assignment to unsigned types, so the value you get is -1 (mod 256), which is 255.
Finally, when using the == or != operators (and most other operators), the values are promoted to int if they were not already int. The 8-bit value 255 is promoted to (int)255, and you compare this against (int)-1, and they differ.
To solve this, change your comparison to use either 0xFF or (uint8_t)'\xFF'.
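Applying that fix, a self-contained sketch of msglen that terminates, comparing against the int constant 0xFF:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

static int msglen(const uint8_t *msg)
{
    int i = 0;
    /* *msg is promoted to an int in the range 0..255, so 0xFF (255) can match. */
    while (*(msg++) != 0xFF)
        i++;
    return i;
}

int main(void)
{
    uint8_t msg[8];
    memset(msg, 0, sizeof msg);
    memcpy(msg, "hello", 5);
    msg[5] = 0xFF;                        /* the sentinel byte */
    printf("msglen: %d\n", msglen(msg));  /* prints: msglen: 5 */
    return 0;
}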

Difference between signed / unsigned char [duplicate]

So I know that the difference between a signed int and an unsigned int is that a bit is used to signify whether the number is positive or negative, but how does this apply to a char? How can a character be positive or negative?
There's no dedicated "character type" in C language. char is an integer type, same (in that regard) as int, short and other integer types. char just happens to be the smallest integer type. So, just like any other integer type, it can be signed or unsigned.
It is true that (as the name suggests) char is mostly intended to be used to represent characters. But characters in C are represented by their integer "codes", so there's nothing unusual in the fact that an integer type char is used to serve that purpose.
The only general difference between char and other integer types is that plain char is not synonymous with signed char, while with other integer types the signed modifier is optional/implied.
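You can see your implementation's choice directly via <limits.h>; the printed ranges depend on the platform (a sketch):

#include <limits.h>
#include <stdio.h>

int main(void)
{
    printf("char:          %d to %d\n", CHAR_MIN, CHAR_MAX);
    printf("signed char:   %d to %d\n", SCHAR_MIN, SCHAR_MAX);
    printf("unsigned char: 0 to %u\n", (unsigned int)UCHAR_MAX);
    /* If CHAR_MIN is 0, plain char is unsigned on this implementation. */
    return 0;
}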
I slightly disagree with the above. unsigned char simply means: treat the most significant bit as an ordinary value bit instead of as the sign bit when performing arithmetic operations.
It makes a difference if you use char as a number, for instance:
typedef char BYTE1;
typedef unsigned char BYTE2;
BYTE1 a;
BYTE2 b;
For variable a (assuming char is signed), only 7 bits carry magnitude; on a two's complement machine its range is -128 to 127 (the standard guarantees at least -127 to 127).
For variable b, all 8 bits carry magnitude and the range is 0 to 255 (2^8 - 1).
If you only ever use a char to hold characters you may never notice the difference, but the compiler does not simply ignore the signedness: it matters as soon as the value is promoted, compared, or shifted.
There are three char types: (plain) char, signed char and unsigned char. Any char is usually an 8-bit integer* and in that sense, a signed and unsigned char have a useful meaning (generally equivalent to uint8_t and int8_t). When used as a character in the sense of text, use a char (also referred to as a plain char). This is typically a signed char but can be implemented either way by the compiler.
* Technically, a char can be any size as long as sizeof(char) is 1, but it is usually an 8-bit integer.
The representation is the same, but the meaning is different. E.g. the bit pattern 0xFF is stored as "FF" either way; treated as a signed char it is the negative number -1, but as unsigned char it is 255. Bit shifting shows a big difference, since on most implementations the sign bit is replicated rather than shifted out: if you shift 255 right by 1 bit you get 127, while shifting -1 right (usually) has no effect.
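A quick illustration of that shifting difference (right-shifting a negative value is implementation-defined in C; most compilers use an arithmetic shift, as assumed here):

#include <stdio.h>

int main(void)
{
    unsigned char u = 0xFF;  /* 255 */
    signed char   s = -1;    /* the same bit pattern 0xFF on two's complement */

    printf("255 >> 1 = %d\n", u >> 1); /* 127: zeros fill in from the left    */
    printf("-1  >> 1 = %d\n", s >> 1); /* usually -1: the sign bit replicates */
    return 0;
}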
A signed char is a signed value which is typically smaller than, and is guaranteed not to be bigger than, a short. An unsigned char is an unsigned value which is typically smaller than, and is guaranteed not to be bigger than, a short. A type char without a signed or unsigned qualifier may behave as either a signed or unsigned char; this is usually implementation-defined, but there are a couple of cases where it is not:
If, in the target platform's character set, any of the characters required by standard C would map to a code higher than the maximum `signed char`, then `char` must be unsigned.
If `char` and `short` are the same size, then `char` must be signed.
Part of the reason there are two dialects of "C" (those where char is signed, and those where it is unsigned) is that there are some implementations where char must be unsigned, and others where it must be signed.
The same way -- e.g. if you have an 8-bit char, 7 bits can be used for magnitude and 1 for sign. So an unsigned char might range from 0 to 255, whilst a signed char might range from -128 to 127 (for example).
This is because a char is stored, to all effects, as an 8-bit number. Speaking about a negative or positive char doesn't make sense if you consider it an ASCII code (which can be just signed*), but it makes sense if you use that char to store a number, which could be in the range 0 to 255 or in -128 to 127 according to the two's complement representation.
*: it can also be unsigned; it actually depends on the implementation, I think. In that case you would have access to the extended ASCII charset provided by the encoding used.
The same way how an int can be positive or negative. There is no difference. Actually on many platforms unqualified char is signed.
