Why is '\xff' not being recognized? - c

I know that 0xff can have different representations depending on what the variable type is. Like -1 for signed (chars/ints(?)) and 255 for unsigned chars.
But I am using the implementation-independent type uint8_t, and I've made sure that 0xff is in fact inside the structure I am iterating across. Here is the code:
struct pkt {
    uint8_t msg[8];
};

int main(int argc, char **argv) {
    ...
    struct pkt packet;
    memset(&packet, 0, sizeof packet);
    strcpy(packet.msg, "hello");
    packet.msg[strlen("hello")] = '\xff';
    crypt(&packet, argv[1]);
    ...
}
void crypt(struct pkt *packet, unsigned char *key) {
    int size = msglen(packet->msg);
    ...
}
int msglen(uint8_t *msg) {
    int i = 0;
    while(*(msg++) != '\xff') {
        i++;
    }
    return i;
}
I've looked into the structure, and packet.msg[5] is indeed set to 0xff. But the while loop goes into an infinite loop, like it never discovered 0xff.
Values such as 0x7f work. I haven't tried 0x80, but I suspect it probably won't work if 0xff doesn't. It probably has something to do with signedness, but I just can't see where the problem is supposed to come from.
Thanks.
EDIT: For me, it doesn't matter whether I use 0x7f or 0xff. But I would just like to know what is preventing me from detecting 0xff.

If you are comparing against an unsigned type, you can't safely use character constants.
'\xff' is -1, not 255, because on your implementation character constants are signed.
The while condition is therefore always true. For unsigned comparisons you should use plain numbers (0 to 255), or restrict yourself to character constants you know are below 128, which have the same value either way.
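
A quick way to check on your own system (a minimal probe; the result depends on whether plain char is signed there):

#include <stdio.h>

int main(void)
{
    /* prints -1 where plain char is signed, 255 where it is unsigned */
    printf("%d\n", '\xff');
    return 0;
}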

'\xff' is a character constant. It's of type int, not char (this is one way in which C differs from C++), but its value depends on whether plain char is signed or unsigned, which is implementation-defined.
The wording in the C standard is:
The hexadecimal digits that follow the backslash and the letter x in a
hexadecimal escape sequence are taken to be part of the construction
of a single character for an integer character constant or of a single
wide character for a wide character constant. The numerical value of
the hexadecimal integer so formed specifies the value of the desired
character or wide character.
If plain char is unsigned, then '\xff' is equivalent to 0xff or 255; it's of type int and has the value 255.
If plain char is signed, then '\xff' specifies a value that's outside the range of char (assuming that char is 8 bits). The wording of the standard isn't 100% clear to me, but at least with gcc the value of '\xff' is -1.
Just use an integer constant 0xff rather than a character constant '\xff'. 0xff is of type int and is guaranteed to have the value 255, which is what you want.
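
Applied to the question's msglen, a minimal sketch of the fix (only the comparison changes; the '\xff' assignment already stored 255 into the uint8_t array, so it can stay):

#include <stdint.h>

int msglen(uint8_t *msg)
{
    int i = 0;
    /* 0xff is an int constant with value 255; *msg (0..255) promotes
       to int without changing value, so the sentinel is found */
    while (*(msg++) != 0xff) {
        i++;
    }
    return i;
}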

I know that 0xff can have different representations depending on what the variable type is. Like -1 for signed (chars/ints(?)) and 255 for unsigned chars.
This needs some explanation. The integer constant 0xFF in a C program always means 255. If you assign this to a type for which 255 is out of range, e.g. a signed char, then the behaviour is implementation-defined. Typically on two's-complement systems this is defined as assigning the value -1.
Character constants have different rules from integer constants. The character constant '\xff' must have a value that fits in a char. You appear to have signed char, so it's implementation-defined what happens here, but again the most common behaviour is that it gets the value -1. Note that character constants actually have type int, despite the fact that their values must be representable by char.
In the line packet.msg[strlen("hello")] = '\xff'; you try to assign (int)-1 to a uint8_t. This is out of range, but the behaviour is well-defined for out-of-range assignment to unsigned types, so the value you get is -1 (mod 256), which is 255.
Finally, when using the == or != operators (and most other operators), the values are promoted to int if they were not already int. The uint8_t value 255 is promoted to (int)255, you compare this against (int)-1, and they differ.
To solve this, change your comparison to use either 0xFF or (uint8_t)'\xFF'.
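
A small demonstration of the promotion chain described above (a sketch, assuming plain char is signed, as on the asker's system):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t b = '\xff';  /* -1 converted to uint8_t: -1 mod 256 == 255 */
    printf("%d\n", b == '\xff');          /* 0: b promotes to (int)255, '\xff' is (int)-1 */
    printf("%d\n", b == 0xFF);            /* 1: both sides are (int)255 */
    printf("%d\n", b == (uint8_t)'\xff'); /* 1: the cast forces -1 back to 255 */
    return 0;
}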

Related

typecasting unsigned char and signed char to int in C

#include <stdio.h>

int main()
{
    char ch1 = 128;
    unsigned char ch2 = 128;
    printf("%d\n", (int)ch1);
    printf("%d\n", (int)ch2);
}
The first printf statement outputs -128 and the second 128. As I understand it, both ch1 and ch2 will have the same binary representation of the stored number: 10000000. So when I cast both values to int, how do they end up as different values?
First of all, a char can be signed or unsigned; that depends on the compiler implementation. Since you got different results, your compiler treats char as signed.
A signed char can only hold values from -128 to 127. So the out-of-range value 128 wraps around (in an implementation-defined way) to -128.
But an unsigned char can hold values from 0 to 255, so the value 128 stays the same.
An unsigned char can have a value of 0 to 255. A signed char can have a value of -128 to 127. Setting a signed char to 128 in your compiler probably wrapped around to the lowest possible value, which is -128.
Your fundamental error here is a misunderstanding of what a cast (or any conversion) does in C. It does not reinterpret bits. It's purely an operation on values.
Assuming plain char is signed, ch1 has value -128 and ch2 has value 128. Both -128 and 128 are representable in int, and therefore the cast does not change their value. (Moreover, writing it is redundant since the default promotions automatically convert variadic arguments of types lower-rank than int up to int.) Conversions can only change the value of an expression when the original value is not representable in the destination type.
For starters, these casts
printf("%d\n", (int)ch1);
printf("%d\n", (int)ch2);
are redundant. You could just write
printf("%d\n", ch1);
printf("%d\n", ch2);
because, due to the default argument promotions, integer types whose rank is less than the rank of int are promoted to int, provided int can represent all the values of the original type.
The type char can behave either as the type signed char or unsigned char depending on compiler options.
From the C Standard (5.2.4.2.1 Sizes of integer types <limits.h>)
2 If the value of an object of type char is treated as a signed integer when used in an expression, the value of CHAR_MIN shall be the same as that of SCHAR_MIN and the value of CHAR_MAX shall be the same as that of SCHAR_MAX. Otherwise, the value of CHAR_MIN shall be 0 and the value of CHAR_MAX shall be the same as that of UCHAR_MAX. The value UCHAR_MAX shall equal 2^CHAR_BIT − 1.
So it seems that by default your compiler treats the type char as signed char.
As a result, in the declarations
char ch1 = 128;
unsigned char ch2 = 128;
the internal representation 0x80 of the value 128 is interpreted for ch1 as a signed value, because the sign bit is set. That value is equal to -128.
So the first call of printf output the value -128
printf("%d\n", (int)ch1);
while the second call of printf, where an object of type unsigned char is used,
printf("%d\n", (int)ch2);
output the value 128.
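
If you want to see which choice your compiler made, the limits from <limits.h> give a direct probe (a minimal sketch):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    /* CHAR_MIN is SCHAR_MIN (e.g. -128) where char is signed,
       and 0 where char is unsigned */
    printf("CHAR_MIN = %d, CHAR_MAX = %d\n", CHAR_MIN, CHAR_MAX);
    return 0;
}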

C typecasting from a signed char to int type

In the snippet below, shouldn't the output be -1 in both cases? Why am I getting -1 and 4294967295?
What I understand is that the variable c here is of a signed type, so shouldn't its value be -1?
char c=0xff;
printf("%d %u",c,c);
c is of signed type; a char is 8 bits. So you have an 8-bit signed quantity with all bits set to 1. On a two's-complement machine, that evaluates to -1.
Some compilers will warn you when you do that sort of thing. If you're using gcc/clang, switch on all the warnings.
Pedant note: on some machines it could have the value 255, should the compiler treat char as unsigned.
You're getting the correct answer.
The %u format specifier indicates that the value will be an unsigned int. The compiler automatically promotes your 8-bit char to a 32-bit int. However, you have to remember that char is a signed type here, so the value 0xff is in fact -1.
When the conversion from char to int occurs, the value is still -1, but it's now the 32-bit representation, which in binary is 11111111 11111111 11111111 11111111, or in hex 0xffffffff.
When that is interpreted as an unsigned integer, all of the bits are preserved because the length is the same, but now it's handled as an unsigned quantity.
0xffffffff = 4294967295 (unsigned)
0xffffffff = -1 (signed)
There are three character types in C, char, signed char, and unsigned char. Plain char has the same representation as either signed char or unsigned char; the choice is implementation-defined. It appears that plain char is signed in your implementation.
All three types have the same size, which is probably 8 bits (CHAR_BIT, defined in <limits.h>, specifies the number of bits in a byte). I'll assume 8 bits.
char c=0xff;
Assuming plain char is signed, the value 0xff (255) is outside the range of type char. Since you can't store the value 255 in a char object, the value is implicitly converted. The result of this conversion is implementation-defined, but is very likely to be -1.
Keep this carefully in mind: 0xff is simply another way to write 255, and 0xff and -1 are two distinct values. You cannot store the value 255 in a char object; its value is -1. Integer constants, whether they're decimal, hexadecimal, or octal, specify values, not representations.
If you really want a one-byte object with the value 0xff, define it as an unsigned char, not as a char.
printf("%d %u",c,c);
When a value of an integer type narrower than int is passed to printf (or to any variadic function), it's promoted to int if int can hold the original type's entire range of values, or to unsigned int if it can't. For type char, it's almost certainly promoted to int. So this call is equivalent to:
printf("%d %u", -1, -1);
The output for the "%d" format is obvious. The output for "%u" is less obvious. "%u" tells printf that the corresponding argument is of type unsigned int, but you've passed it a value of type int. What probably happens is that the representation of the int value is treated as if it were of type unsigned int, most likely yielding UINT_MAX, which happens to be 4294967295 on your system. If you really want to do that, you should convert the value to type unsigned int. This:
printf("%d %u", -1, (unsigned int)-1);
is well defined.
Your two lines of code are playing a lot of games with various types, treating values of one type as if they were of another type, and doing implicit conversions that might yield results that are implementation-defined and/or depend on the choices your compiler happens to make.
Whatever you're trying to do, there's undoubtedly a cleaner way to do it (unless you're just trying to see what your implementation does with this particular code).
Let us start by accepting OP's assumption that "c, here is of signed type".
char c=0xff; // Implementation defined behavior.
0xff is a hexadecimal constant with the value of 255 and type of int.
... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. (§6.3.1.3 ¶3)
So right off, the value of c is implementation defined (ID). Let us assume the common ID behavior of 8-bit wrap-around, so c --> -1.
A signed char is promoted to int when passed as a variadic argument, so printf("%d %u",c,c); is the same as printf("%d %u",-1, -1);. Printing the -1 with "%d" is not an issue, and "-1" is printed.
Printing an int -1 with "%u" is undefined behavior (UB), as it is a mismatched specifier/type and does not fall under the exception of being representable in both types. The common behavior is to print the value as if it had been converted to unsigned before being passed. When UINT_MAX == 4294967295 (4 bytes), that prints the value as -1 + (UINT_MAX + 1), or "4294967295".
So with ID and UB, you get a result, but robust code would be re-written to depend on neither.
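
One portable rewrite (a sketch: converting through unsigned char first makes every step well defined, and it prints 255 255 regardless of the signedness of plain char):

#include <stdio.h>

int main(void)
{
    char c = 0xff;                      /* still implementation-defined, as noted above */
    unsigned char u = (unsigned char)c; /* well defined: value reduced mod 256 */
    printf("%d %u\n", u, (unsigned)u);  /* matched specifiers and types */
    return 0;
}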

What's the difference between \xFF and 0xFF

1st - What's the difference between
#define s 0xFF
and
#define s '\xFF'
2nd - Why does the second line equal -1?
3rd - Why, after I try this (in the case of '\xFF'),
unsigned char t = s;
putchar(t);
unsigned int p = s;
printf("\n%d\n", p);
the output is
(blank)
-1
?
thanks:)
This
#define s 0xFF
is a definition of a hexadecimal integer constant. It has type int, and its value is 255 in decimal notation.
This
#define s '\xFF'
is a definition of an integer character constant represented by a hexadecimal escape sequence. It also has type int, but it represents a character, and its value is calculated differently.
According to the C Standard (6.4.4.4 Character constants, paragraph 10):
...If an integer character constant contains a single character or
escape sequence, its value is the one that results when an object with
type char whose value is that of the single character or escape
sequence is converted to type int.
It seems that by default your compiler treats values of type char as values of type signed char. So, according to the quote, the integer character constant '\xFF' has a negative value, because the sign bit (MSB) is set, and is equal to -1.
If you set the compiler option that controls whether char is signed or unsigned so that char is unsigned (e.g. -funsigned-char for gcc), then '\xFF' and 0xFF will have the same value, 255.
Take into account that hexadecimal escape sequences may be used in string literals along with any other escape sequences.
You can use \xFF in a string literal as the last character, and also as a middle character by using string concatenation, but the same is not true for 0xFF.
The difference between '\xFF' and 0xFF is analogous to the difference between 'a' and the character code of 'a' (let's assume it is 0x61 for some implementation), with the one difference that \xFF will consume further hex digits if used in a string.
When you print the character 0xFF using putchar, the output is implementation-dependent. But when you print it as an integer, due to the default promotion rules for varargs, it may print -1 or 255, on systems where char behaves as signed char or unsigned char respectively.
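
The point about consuming further hex characters is worth illustrating, because the letters A-F are themselves hex digits. A sketch (string concatenation is the usual way to terminate the escape):

/* "\xffAB" is NOT \xff followed by "AB": the escape greedily eats the
   hex digits A and B, forming \xFFAB, which is out of range for a
   string element, so a compiler will reject or warn:
   const char bad[] = "\xffAB";
   Concatenating adjacent string literals terminates the escape: */
const char good[] = "\xff" "AB"; /* three characters: 0xFF, 'A', 'B' (plus the terminator) */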

How do I represent negative char values in hexadecimal?

The following code
char buffer[BSIZE];
...
if(buffer[0]==0xef)
...
Gives the compiler warning "comparison is always false due to limited range of data type".
The warning goes away when I change the check to
if(buffer[0]==0xffffffef)
This feels very counterintuitive. What is the correct way to check a char against a specific byte value in hex? (other than making it unsigned)
What's wrong with:
if (buffer[0] == '\xef')
?
To understand why buffer[0] == 0xef triggers a warning, and buffer[0] == 0xffffffef does not, you need to understand exactly what's happening in that expression.
Firstly, the == operator compares the value of two expressions, not the underlying representation - 0xef is the number 239, and will only compare equal to that number; likewise 0xffffffef is the number 4294967279 and will only compare equal to that.
There is no difference between the constants 0xef and 239 in C: both have type int and the same value. If your char has the range -128 to 127, then when you evaluate buffer[0] == 0xef the buffer[0] is promoted to int, which leaves its value unchanged. It can therefore never compare equal to 0xef, so the warning is correct.
However, there is potentially a difference between the constants 0xffffffef and 4294967279; decimal constants are always signed, but hexadecimal constants may be unsigned. On your system, it appears to have an unsigned type - probably unsigned int (because the value is too large to store in an int, but small enough to store in an unsigned int). When you evaluate buffer[0] == 0xffffffef, buffer[0] is promoted to unsigned int. This leaves any positive value unchanged, but negative values are converted by adding UINT_MAX + 1 to them; with a char that has range -128 to 127, the promoted values are in either of the ranges 0 to 127 or 4294967168 to 4294967295. 0xffffffef lies within this range, so it is possible for the comparison to return true.
If you are storing bit patterns rather than numbers, then you should be using unsigned char in the first place. Alternatively, you may inspect the bit pattern of an object by casting a pointer to it to unsigned char *:
if (((unsigned char *)buffer)[0] == 0xef)
(This is obviously more conveniently done by using a separate variable of type unsigned char *).
As PaulR says, you can also use buffer[0] == '\xef' - this works because '\xef' is defined to be an int constant with the value that a char object with the bit pattern 0xef would have when converted to an int; e.g. on a two's-complement system with signed chars, '\xef' is a constant with the value -17.
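
Putting the alternatives side by side (a sketch, assuming an 8-bit signed plain char; all three tests match):

#include <stdio.h>

int main(void)
{
    char buffer[1] = { '\xef' }; /* bit pattern 0xEF, value -17 on this setup */

    if (buffer[0] == '\xef')                  puts("character constant");
    if ((unsigned char)buffer[0] == 0xef)     puts("cast the value");
    if (((unsigned char *)buffer)[0] == 0xef) puts("inspect via unsigned char *");
    return 0;
}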
This is happening because buffer contents are of type char. Making them unsigned char will work:
if ((unsigned char) (buffer[0]) == 0xef)
Do it just like with any other negative number, remembering that the bit pattern 0xef as a signed 8-bit value is -17, i.e. -0x11:
if (buffer[0] == -0x11)
But usually you want to use unsigned char as data type:
unsigned char buffer[BSIZE];
...
if(buffer[0]==0xef)
Reasons to use signed char are very rare. Even more rare are reasons to use "char without sign specification", which can be signed or unsigned on different platforms.
To spell out the reason explicitly: whether char is signed or unsigned is implementation-defined. If your compiler treats char as signed by default, then 0xef will be larger than the largest possible signed char (which is 127, or 0x7f), and therefore your comparison will always be false. Hence the warning.
Possible solutions are provided by the other answers.

ansi-c converting char to int representable by ascii

Hi, I am interested in those chars which are representable in the ASCII table. For that reason I am doing the following:
int t(char c) { return (int) c; }
...
if(!(t(d)>255)) { dostuff(); }
So I am only interested in ASCII-representable chars, which I assume should be less than 256 after conversion to int. Am I right? Thanks!
Usually (not always) a char is 8 bits, so all chars would typically have a value of less than 256, and your test would always succeed.
Also, ASCII only goes up to 127, not 255. The characters after that are not standard ASCII, and can vary depending on code pages.
If you are dealing with international characters you should probably use wide characters instead of char.
Use the library:
#include <ctype.h>
...
if (isascii(d)) { dostuff(); }
Two caveats:
The C standard does not specify whether char is signed or unsigned by default. If your compiler treats char as signed by default, the cast to int could result in negative values instead of the values from 128 to 255 (and this is assuming that your chars are 8-bit, too). Perhaps it's better to use unsigned char if you want to be sure this range is converted the way you expect.
Technically ASCII is from 0 to 127, everything above is some kind of extension.
char is an integral type in C. You can do the check directly:
char c;
/* assign to c */
if (c >= 0 && c <= 127) {
    /* in ASCII range */
}
I am assuming you don't want to use isascii() (it's not in the C standard, although it is POSIX).
Also, you can check if CHAR_MAX is equal to 127. If it is, you don't need the comparison with 127, since c will not exceed it by definition. Similarly, if CHAR_MIN is 0, then you don't need the comparison with 0. Both CHAR_MIN and CHAR_MAX are defined in limits.h.
I think you're thinking about an integer value overflowing a char, and therefore converting it to an int. But that doesn't help with overflow, since the damage has already been done.
The size of char is always 1 byte (per the standard). For all practical purposes this means a char variable cannot have a value bigger than 255. (There are systems where a byte has more than 8 bits, and thus a char can hold a bigger value, but these are rare nowadays.)
An additional caveat is that char is not defined as either signed or unsigned, so its range can be -128 to 127 or 0 to 255 (assuming 8 bits per byte, of course :-)).
Meanwhile, the ASCII table is 7-bit, which means it covers the range 0 to 127. So if you are only interested in ASCII symbols, you can just check whether the value of your char variable is in that range. No need to cast for the comparison.
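
Combining the caveats above, a safe membership test converts through unsigned char first, so negative char values cannot slip through. A sketch (dostuff() stands in for the question's placeholder):

#include <stdio.h>

static void dostuff(void) { puts("ASCII"); }

static int is_ascii(char c)
{
    /* via unsigned char, a signed char holding -17 becomes 239,
       which correctly fails the 7-bit range test */
    return (unsigned char)c <= 127;
}

int main(void)
{
    char d = 'A';
    if (is_ascii(d)) { dostuff(); }
    return 0;
}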
