1st - What's the difference between
#define s 0xFF
and
#define s '\xFF'
2nd - Why does the second line equal -1?
3rd - Why, after I try this (in the case of '\xFF'),
unsigned char t = s;
putchar(t);
unsigned int p = s;
printf("\n%d\n", p);
the output is
(blank)
-1
?
thanks:)
This
#define s 0xFF
is a definition of a hexadecimal integer constant. It has type int and its value is 255 in decimal notation.
This
#define s '\xFF'
is a definition of an integer character constant that is represented by a hexadecimal escape sequence. It also has type int, but it represents a character. Its value is calculated differently.
According to the C Standard (paragraph 10 of section 6.4.4.4 Character constants):
...If an integer character constant contains a single character or
escape sequence, its value is the one that results when an object with
type char whose value is that of the single character or escape
sequence is converted to type int.
It seems that by default your compiler treats values of type char as values of type signed char. So, according to the quote, the integer character constant
'\xFF' has a negative value, because the sign bit (MSB) is set; that value is -1.
If you set the compiler option that controls whether type char is signed or unsigned so that char is unsigned, then '\xFF' and 0xFF will have the same value, 255.
Take into account that hexadecimal escape sequences may be used in string literals along with any other escape sequences.
You can use '\xFF' in a string literal as the last character, and also as a middle character using string concatenation, but the same is not true for 0xFF.
The difference between '\xFF' and 0xFF is analogous to the difference between 'a' and the code of the character 'a' (let's assume it is 0x61 for some implementation), with one extra twist: the \xFF escape will consume further hex digits if used in a string literal.
When you print the character 0xFF using putchar, the output is implementation-dependent. But when you print it as an integer, due to the default argument promotions of varargs, it may print -1 or 255 on systems where char behaves as signed char or unsigned char, respectively.
Related
#include <stdio.h>

int main()
{
    int i = 577;
    printf("%c", i);
    return 0;
}
After compiling, it gives the output "A". Can anyone explain how I'm getting this?
%c will only use values up to 255 inclusive; above that it wraps around from 0 again!
577 % 256 = 65; // (char code for 'A')
This has to do with how the value is converted.
The %c format specifier expects an int argument and then converts it to type unsigned char. The character for the resulting unsigned char is then written.
Section 7.21.6.1p8 of the C standard regarding format specifiers for printf states the following regarding c:
If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
When converting a value to a smaller unsigned type, what effectively happens is that the higher order bytes are truncated and the lower order bytes have the resulting value.
Section 6.3.1.3p2 regarding integer conversions states:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Which, when two's complement representation is used, is the same as truncating the high-order bytes.
For the int value 577, whose value in hexadecimal is 0x241, the low order byte is 0x41 or decimal 65. In ASCII this code is the character A which is what is printed.
How does printing 577 with %c output "A"?
With printf(). "%c" matches an int argument*1. The int value is converted to an unsigned char value of 65 and the corresponding character*2, 'A' is then printed.
This makes no difference if a char is signed or unsigned or encoded with 2's complement or not. There is no undefined behavior (UB). It makes no difference how the argument is passed, on the stack, register, or .... The endian of int is irrelevant. The argument value is converted to an unsigned char and the corresponding character is printed.
*1All int values are allowed [INT_MIN...INT_MAX].
When a char value is passed as ... argument, it is first converted to an int and then passed.
char ch = 'A';
printf("%c", ch); // ch is converted to an `int` and passed to printf().
*2 65 is an ASCII A, the ubiquitous encoding of characters. Rarely other encodings are used.
Just output the value of the variable i in hexadecimal representation:
#include <stdio.h>
int main( void )
{
int i = 577;
printf( "i = %#x\n", i );
}
The program output will be
i = 0x241
So the least significant byte contains the hexadecimal value 0x41 that represents the ASCII code of the letter 'A'.
577 in hex is 0x241. The ASCII representation of 'A' is 0x41. You're passing an int to printf but then telling printf to treat it as a char (because of %c). A char is one-byte wide and so printf looks at the first argument you gave it and reads the least significant byte which is 0x41.
To print an integer, you need to use %d or %i.
6.4.4.4/10 ...If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
I'm having trouble understanding this paragraph. After this paragraph standard gives the example below:
Example 2: Consider implementations that use two’s complement representation for
integers and eight bits for objects that have type char. In an
implementation in which type char has the same range of values as
signed char, the integer character constant '\xFF' has the value −1;
if type char has the same range of values as unsigned char, the
character constant '\xFF' has the value +255.
What I understand from the expression "value of an object with type char" is the value we get when we interpret the object's contents with type char. But when we look at the example, it seems to be talking about the object's value in pure binary notation. Is my understanding wrong? Does an object's value always mean the bits in that object?
All "integer character constants" (the stuff between ' and ') have type int for traditional and compatibility reasons. But they are mostly meant to be used together with char, so 6.4.4.4/10 needs to make a distinction between the types. Basically, it patches up an inconsistency in the C language: we have cases such as *"\xFF" that result in type char, but '\xFF' results in type int, which is very confusing.
The value '\xFF' = 255 will always fit in an int on any implementation, but not necessarily in a char, which has implementation-defined signedness (another inconsistency in the language). The behavior of the escape sequence should be as if we stored the character constant in a char, as done in my string literal example *"\xFF".
This need for consistency with char type even though the value is stored in an int is what 6.4.4.4/10 describes. That is, printf("%d", '\xFF'); should behave just as char ch = 255; printf("%d", (int)ch);
The example is describing one possible implementation, where char is either signed or unsigned and the system uses 2's complement. Generally, the value of an object with integer type refers to decimal notation. char is an integer type, so it can have a negative decimal value (whether the character table has a matching entry for the value -1 is another story). But "raw binary" cannot have a negative value: 1111 1111 can only be said to be -1 if you say that the memory cell should be interpreted as 8-bit 2's complement. That is, if you know that a signed char is stored there. If you know that an unsigned char is stored there, then the value is 255.
What does the C 2018 standard specify for the value of a hexadecimal escape sequence such as '\xFF'?
Consider a C implementation in which char is signed and eight bits.
Clause 6.4.4.4 tells us about character constants. In paragraph 6, it discusses hexadecimal escape sequences:
The hexadecimal digits that follow the backslash and the letter x in a hexadecimal escape sequence are taken to be part of the construction of a single character for an integer character constant or of a single wide character for a wide character constant. The numerical value of the hexadecimal integer so formed specifies the value of the desired character or wide character.
The hexadecimal integer is “FF”. By the usual rules of hexadecimal notation, its value1 is 255. Note that, so far, we do not have a specific type: A “character” is a “member of a set of elements used for the organization, control, or representation of data” (3.7) or a “bit representation that fits in a byte” (3.7.1). When \xFF is used in '\xFF', it is a c-char in the grammar (6.4.4.4 1), and '\xFF' is an integer character constant. Per 6.4.4.4 2, “An integer character constant is a sequence of one or more multibyte characters enclosed in single-quotes, as in ’x’.”
6.4.4.4 9 specifies constraints on character constants:
The value of an octal or hexadecimal escape sequence shall be in the range of representable values for the corresponding type:
That is followed by a table that, for character constants with no prefix, shows the corresponding type is unsigned char.
So far, so good. Our hexadecimal escape sequence has value 255, which is in the range of an unsigned char.
Then 6.4.4.4 10 purports to tell us the value of the character constant. I quote it here with its sentences separated and labeled for reference:
(i) An integer character constant has type int.
(ii) The value of an integer character constant containing a single character that maps to a single-byte execution character is the numerical value of the representation of the mapped character interpreted as an integer.
(iii) The value of an integer character constant containing more than one character (e.g., 'ab'), or containing a character or escape sequence that does not map to a single-byte execution character, is implementation-defined.
(iv) If an integer character constant contains a single character or escape sequence, its value is the one that results when an object with type char whose value is that of the single character or escape sequence is converted to type int.
If 255 maps to an execution character, (ii) applies, and the value of '\xFF' is the value of that character. This is the first use of “maps” in the standard; it is not defined elsewhere. Should it mean anything other than a map from the value derived so far (255) to an execution character with the same value? If so, for (ii) to apply, there must be an execution character with the value 255. Then the value of '\xFF' would be 255.
Otherwise (iii) applies, and the value of '\xFF' is implementation-defined.
Regardless of whether (ii) or (iii) applies, (iv) also applies. It says the value of '\xFF' is the value of a char object whose value is 255, subsequently converted to int. But, since char is signed and eight-bit, there is no char object whose value is 255. So the fourth sentence states an impossibility.
Footnote
1 3.19 defines “value” as “precise meaning of the contents of an object when interpreted as having a specific type,” but I do not believe that technical term is being used here. “The numerical value of the hexadecimal integer” has no object to discuss yet. This appears to be a use of the word “value” in an ordinary sense.
Your demonstration leads to an interesting conclusion:
There is no portable way to write character constants with values outside the range 0 .. CHAR_MAX. This is not necessarily a problem for single characters as one can use integers in place of character constants, but there is no such alternative for string constants.
It seems type char should always be unsigned by default for consistency with many standard C library functions:
fgetc() returns an int with a negative value EOF for failure and the value of an unsigned char if a byte was successfully read. Hence the meaning and effect of fgetc() == '\xFF' is implementation defined.
the functions from <ctype.h> accept an int argument with the same values as those returned by fgetc(). Passing a negative char value has undefined behavior.
strcmp() compares strings based on the values of characters converted to unsigned char.
'\xFF' may have the value -1 which is completely unintuitive and is potentially identical to the value of EOF.
The only reason to make or keep char signed by default is compatibility with older compilers, for historical code that relies on this behavior and was written before the advent of signed char, some 30 years ago!
I strongly advise programmers to use -funsigned-char to make char unsigned by default and use signed char or better int8_t if one needs signed 8-bit variables and structure members.
As hyde commented, to avoid portability problems, char values should be cast as (unsigned char) where the signedness of char may pose problems: for example:
char str[] = "Hello world\n";
for (int i = 0; str[i]; i++)
str[i] = tolower((unsigned char)str[i]);
I know that 0xff can have different representations depending on what the variable type is. Like -1 for signed (chars/ints(?)) and 255 for unsigned chars.
But I am using the implementation-independent type uint8_t, and I've made sure that 0xff is in fact inside the structure I am iterating across. Here is the code:
struct pkt {
    uint8_t msg[8];
};

void main(int argc, char **argv) {
    ...
    struct pkt packet;
    memset(&packet, 0, sizeof packet);
    strcpy(packet.msg, "hello");
    packet.msg[strlen("hello")] = '\xff';
    crypt(&packet, argv[1]);
    ...
}

void crypt(struct pkt *packet, unsigned char *key) {
    int size = msglen(packet->msg);
    ...
}

int msglen(uint8_t *msg) {
    int i = 0;
    while (*(msg++) != '\xff') {
        i++;
    }
    return i;
}
I've looked into the structure, and packet.msg[5] is indeed set to 0xff. But the while loop goes into an infinite loop, like it never discovered 0xff.
Values such as 0x7f work. I haven't tried 0x80, but I suspect it probably won't work if 0xff doesn't. It probably has something to do with the signedness, but I just can't see where the problem comes from.
Thanks.
EDIT: For me, it doesn't matter if I use 0x7f or 0xff. But I would just like to know what is preventing me from detecting 0xff.
If you are comparing against an unsigned type, you can't safely use character literals.
'\xff' is -1, not 255, because on your platform the character literal's value comes from a signed char.
The while condition is therefore always true. With unsigned bytes you should use plain numbers (0 to 255), or cast characters you know are < 128 to unsigned.
'\xff' is a character constant. It's of type int, not char (this is one way in which C differs from C++), but its value depends on whether plain char is signed or unsigned, which is implementation-defined.
The wording in the C standard is:
The hexadecimal digits that follow the backslash and the letter x in a
hexadecimal escape sequence are taken to be part of the construction
of a single character for an integer character constant or of a single
wide character for a wide character constant. The numerical value of
the hexadecimal integer so formed specifies the value of the desired
character or wide character.
If plain char is unsigned, then '\xff' is equivalent to 0xff or 255; it's of type int and has the value 255.
If plain char is signed, then '\xff' specifies a value that's outside the range of char (assuming that char is 8 bits). The wording of the standard isn't 100% clear to me, but at least with gcc the value of '\xff' is -1.
Just use the integer constant 0xff rather than the character constant '\xff'. 0xff is of type int and is guaranteed to have the value 255, which is what you want.
I know that 0xff can have different representations depending on what the variable type is. Like -1 for signed (chars/ints(?)) and 255 for unsigned chars.
This needs some explanation. The integer literal 0xFF in a C program always means 255. If you assign this to a type for which 255 is out of range, e.g. a signed char then the behaviour is implementation-defined. Typically on 2's complement systems this is defined as assigning the value -1.
Character literals have different rules to integer literals. The character literal '\xff' must be a value that can sit in a char. You appear to have signed char, so it's implementation-defined what happens here, but again the most common behaviour is that this gets value -1. Note that character literals actually have type int despite the fact that they must have values representable by char.
In the line packet.msg[strlen("hello")] = '\xff';, you try to assign (int)-1 to a uint8_t. This is out of range, but the behaviour is well-defined for out-of-range assignment to unsigned types, so the value you get is -1 (mod 256), which is 255.
Finally, when using the == operator (and most other operators), the values are promoted to int if they were not already int. The uint8_t value 255 is promoted to (int)255, and you compare this against (int)-1, and they differ.
To solve this, change your comparison to use either 0xFF or (uint8_t)'\xFF'.
Kernighan & Ritchie says that "printable characters are always positive, though whether the char datatype is signed or unsigned is machine-dependent."
Can somebody explain to me the meaning of this line? My system has signed chars, but even with a negative value, say -90, printf does print a character (even though it's not a very familiar character).
The ASCII character set defines codepoints from 0x00 to 0x7F. It doesn't matter whether they are represented with unsigned or signed byte values, since this range is common to both.
Printable characters are between 0x20 and 0x7E, which are all part of the ASCII. The term printable character does not define every possible character in the world that is printable. Rather it is defined inside the realm of ASCII.
Byte values from 0x80 to 0xFF are not defined in ASCII and different systems assign different characters to values in this range resulting in many different types of codepages which are identical in their ASCII range but differ in this range. This is also the range where values for signed and unsigned bytes differ.
The implementation of printf looks for a single byte value when it encounters a %c specifier in its format string. This byte value may be signed or unsigned from your point of view as the caller of printf, but printf does not know this. It just passes these 8 bits to the output stream it's connected to, and that stream emits characters within 0x00 and 0xFF.
The concept of sign has no meaning inside the output pipeline where characters are emitted. Thus, whether you send 255 or -1, the character mapped to 0xFF in the specific codepage is emitted.
-90 as a signed char is being re-interpreted as an unsigned char, in which case its value is 166. (Both -90 and 166 are 0xA6 in hex.)
That's right. All binary numbers are non-negative; whether you treat a bit pattern as negative is a matter of interpretation, using the common two's complement.
The 8-bit number 10100110 is positive 166, which is greater than 127 (the maximum positive signed 8-bit number).
Using signed arithmetic, that same bit pattern is -90.
You are seeing the character whose code is 166 in your system's character set.
Using this as an example:
signed char x = -90;
printf("%c", x);
The integer promotion rules convert x into an int before passing it as an argument to printf. (Note, none of the other answers mention this detail, and several imply the argument to printf is still a signed char).
Section 7.21.6.1.6 of the standard (I'm using the C11 standard) says of the %c flag character:
If no l length modifier is present, the int argument is converted to
an unsigned char, and the resulting character is written.
So the integer -90 gets converted into an unsigned char. That means (6.3.1.3.2):
...the value is converted by repeatedly adding or subtracting one more than
the maximum value that can be represented in the new type until the value is
in the range of the new type.
If an unsigned char on your system takes the values 0 to 255 (which it almost certainly does), then the result will be -90 + 256 = 166. (Note: other answers refer to the "lowest byte" or "hex representation" assuming two's complement representation. Although this is overwhelmingly common, the C standard does not guarantee it).
The character 166 is then written to stdout, and interpreted by your terminal.