I expected the code below to print two equal lines:
#include <stdio.h>

int main(void) {
    //printf("%x %x %x\n", '\x7F', (unsigned char)'\x8A', (unsigned char)'\x8B');
    printf("%x %x %x\n", '\x7F', '\x8A', '\x8B');
    printf("%x %x %x\n", 0x7F, 0x8A, 0x8B);
    return 0;
}
My output:
7f ffffff8a ffffff8b
7f 8a 8b
I know this is probably an overflow case, but why the ffffff8a (4 bytes)...?
'\x8A' is, according to cppreference,
a single-byte integer character constant, e.g. 'a' or '\n' or '\13'.
What is particularly interesting is the following.
Such constant has type int and a value equal to the representation of c-char in the execution character set as a value of type char mapped to int.
This means that the value of '\x8A' is implementation-defined, because char can be signed or unsigned, depending on the system. If char is signed, as seems to be the case for you (and is very common), then the value of '\x8A' is negative, and as a 32-bit int it is 0xFFFFFF8A (also negative). However, if char is unsigned, then it becomes 0x0000008A (which is why the commented line in your code works as you'd expect).
The printf format specifier %x is used to convert an unsigned integer into hexadecimal representation. Although printf expects an unsigned int and you give it an int, and even though the standard says that passing an incorrect type to printf is (generally) undefined behavior, it isn't in your case. This is because the conversion from int to unsigned int is well-defined, even though the opposite isn't.
Related
I thought *(p3 + 3) would print 90, but it shows ffffff90.
Why does this happen?
I guess the MSB is 1, and %x is for reading an unsigned hexadecimal integer, so it reads 90 as a negative integer, but it is not clear, and I can't find anything about this problem in the printf reference:
https://cplusplus.com/reference/cstdio/printf/
Can anyone explain this?
Use an unsigned char *.
In your environment,
char is signed.
char is 8 bits.
Signed numbers use two's complement.
So you have a char with a bit pattern of 0x90. In this environment, that's -112. So you are effectively doing the following:
printf( "%x", (char)-112 );
When passing arguments to a variadic function like printf, the smaller integer types are implicitly promoted to int or unsigned int. So what's really happening is this:
printf( "%x", (int)(char)-112 );
So you're passing the int value -112. On your machine, that has the bit pattern FF FF FF 90 (in some unknown byte order). %x expects an unsigned integer, and thus prints that bit pattern as-is.
I have in C language hex numbers defined in string:
char chars[] = "\xfb\x54\x9c\xb2\x10\xef\x89\x51\x2f\x0b\xea\xbb\x1d\xaf\xad\xf8";
Then I want to compare the values with another. It is not working and if I print the value like:
printf("%02x\n", chars[0]);
it writes fffffffb. Why is that and how to get fb value exactly?
This is because of the sign extension.
Change
printf("%02x\n", chars[0]);
to
printf("%02x\n", (unsigned char)chars[0]);
The %x format specifier reads 4 bytes on a 32-bit machine. Since you declared chars as a (signed) character array, the value fb (a negative value) is sign-extended to fffffffb: the MSB of fb is copied into all the higher bits.
See sign extension for more details.
If you had declared chars[] as unsigned char chars[], the output would have been as expected.
As per the standard mentioning regarding the %x format specifier with fprintf()
o,u,x,X
The unsigned int argument is converted to unsigned octal (o), unsigned
decimal (u), or unsigned hexadecimal notation (x or X) in the style dddd; [...]
So, the expected type of argument to %x is unsigned int.
Now, printf() being a variadic function, only the default argument promotions are applied to its arguments. In your code, chars is an array of type char (whose signedness is implementation-defined), so in the case of
printf("%02x\n", chars[0]);
the value of chars[0] gets promoted to an int, which is not the expected type for %x. Hence, the output is wrong, as int and unsigned int are not the same type. [Refer §6.7.2, C11]. So, without an explicit cast like
printf("%02x\n", (unsigned int)chars[0]);
it invokes undefined behaviour.
FWIW, if you have a C99-compliant compiler, you can make use of the hh length modifier to work around this, like
printf("%02hhx\n", (unsigned char)chars[0]);
It's because of sign extension.
This will work as you expect:
printf("%02x\n", (unsigned char)chars[0]);
In the example below:
#include <stdio.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
    int16_t array1[] = {0xffff,0xffff,0xffff,0xffff};
    char array2[] = {0xff,0xff,0xff,0xff};
    printf("Char size: %zu \nint16_t size: %zu \n", sizeof(char), sizeof(int16_t));
    if (*array1 == *array2)
        printf("They are the same \n");
    if (array1[0] == array2[0])
        printf("They are the same \n");
    printf("%x \n", array1[0]);
    printf("%x \n", *array1);
    printf("%x \n", array2[0]);
    printf("%x \n", *array2);
}
Output:
Char size: 1
int16_t size: 2
They are the same
They are the same
ffffffff
ffffffff
ffffffff
ffffffff
Why are the 32bit values printed for both char and int16_t and why can they be compared and are considered the same?
They're the same because they're all different representations of -1.
They print as 32 bits' worth of ff because you're on a machine with 32-bit int, you used %x, and the default argument promotions took place (basically, everything smaller gets promoted to int). Try using %hx. (That'll probably get you ffff; I don't know of a way to get ff here other than by using unsigned char, or masking with & 0xff: printf("%x \n", array2[0] & 0xff).)
Expanding on "They're the same because they're all different representations of -1":
int16_t is a signed 16-bit type. It can contain values in the range -32768 to +32767.
char is an 8-bit type, and on your machine it's evidently signed also. So it can contain values in the range -128 to +127.
0xff is decimal 255, a value which can't be represented in a signed char. If you assign 0xff to a signed char, that bit pattern ends up getting interpreted not as 255, but rather as -1. (Similarly, if you assigned 0xfe, that would be interpreted not as 254, but rather as -2.)
0xffff is decimal 65535, a value which can't be represented in an int16_t. If you assign 0xffff to an int16_t, that bit pattern ends up getting interpreted not as 65535, but rather as -1. (Similarly, if you assigned 0xfffe, that would be interpreted not as 65534, but rather as -2.)
So when you said
int16_t array1[] = {0xffff,0xffff,0xffff,0xffff};
it was basically just as if you'd said
int16_t array1[] = {-1,-1,-1,-1};
And when you said
char array2[] = {0xff,0xff,0xff,0xff};
it was just as if you'd said
char array2[] = {-1,-1,-1,-1};
So that's why *array1 == *array2, and array1[0] == array2[0].
Also, it's worth noting that all of this is very much because of the types of array1 and array2. If you instead said
uint16_t array3[] = {0xffff,0xffff,0xffff,0xffff};
unsigned char array4[] = {0xff,0xff,0xff,0xff};
you would see different values printed (ffff and ff), and the values from array3 and array4 would not compare the same.
Another answer stated that "there is no type information in C at runtime". That's true but misleading in this case. When the compiler generates code to manipulate values from array1, array2, array3, and array4, the code it generates (which of course is significant at runtime!) will be based on their types. In particular, when generating code to fetch values from array1 and array2 (but not array3 and array4), the compiler will use instructions which perform sign extension when assigning to objects of larger type (e.g. 32 bits). That's how 0xff and 0xffff got changed into 0xffffffff.
Because there is no type information in C at runtime, and by using a plain %x for printing, you tell printf that the argument is an unsigned int. The poor library function just trusts you ... see the length modifiers in printf(3) for how to give printf the information it needs.
Using %x to print negative values causes undefined behaviour so you should not assume that there is anything sensible about what you are seeing.
The correct format specifier for char is %hhd, and for int16_t it is "%" PRId16. You will need #include <inttypes.h> to get the latter macro.
Because of the default argument promotions, it is also correct to use %d with char and int16_t 1. If you change your code to use %d instead of %x, it will no longer exhibit undefined behaviour, and the results will make sense.
1 The C standard doesn't actually say that, but it's assumed that that was the intent of the writers.
Given,
unsigned short y = 0xFFFF;
When I print
printf("%x", y);
I get : 0xFFFF;
But when I print
printf("%x", (signed short)y);
I get : 0xFFFFFFFF
Whole program below:
#include <stdio.h>
int main() {
unsigned short y = 0xFFFF;
unsigned short z = 0x7FFF;
printf("%x %x\n", y,z);
printf("%x %x", (signed short)y, (signed short)z);
return 0;
}
Sign extension happens when we cast a narrower type to a wider one, but here we are casting unsigned short to signed short.
In both cases, sizeof((signed short)y) and sizeof((signed short)z) report 2 bytes; a short remains 2 bytes. And when the sign bit is zero, as in the case of 0x7fff, no extension is visible.
Any help is very much appreciated!
Output of the first printf is as expected. The second printf produces undefined behavior.
In the C language, when you pass a value smaller than int as a variadic argument, that value is always implicitly converted to type int. It is not possible to physically pass a short or char variadic argument. That implicit conversion to int is where your "sign extension" takes place.
For this reason, your printf("%x", y); is equivalent to printf("%x", (int) y);. The value that is passed to printf is 0xFFFF of type int. Technically, %x format requires an unsigned int argument, but a non-negative int value is also OK (unless I'm missing some technicality). The output is 0xFFFF.
Conversion to int happens in the second case as well. I.e. your printf("%x", (signed short) y); is equivalent to printf("%x", (int) (signed short) y);. The conversion of 0xFFFF to (signed short) is implementation-defined, because 0xFFFF is apparently out of range of signed short on your platform. But most likely it produces a negative value (-1). When converted to int it produces the same negative value of type int (again, -1 represented as 0xFFFFFFFF for a 32-bit int). The further behavior is undefined, since you are passing a negative int value for format specifier %x, which requires unsigned int argument. It is illegal to use %x with negative int values.
In other words, formally your second printf prints unpredictable garbage. But practically the above explains where that 0xFFFFFFFF came from.
Let's break it down and into smaller pieces:
Given,
unsigned short y = 0xFFFF;
Assuming a two-byte unsigned short, the maximum value is 2^16 - 1, which is indeed 0xFFFF.
When I print
printf("%x", y);
Due to default argument promotion (as printf() is a variadic function), the value of y is implicitly promoted to type int. With the %x format specifier it's treated as an unsigned int. Assuming the common two's-complement representation and a four-byte int type, the most significant bit is zero, so the bit patterns of the int and unsigned int values are simply the same.
But when I print
printf("%x", (signed short)y);
What you have done is cast to a signed type that cannot represent the value 0xFFFF. Such a conversion is, as the standard says, implementation-defined, so you can get any result. After the implicit conversion to int you apparently have a bit pattern of 32 ones, which is printed as 0xFFFFFFFF.
I'm trying to read in a line of characters, then print out the hexadecimal equivalent of the characters.
For example, if I have a string that is "0xc0 0xc0 abc123", where the first 2 characters are c0 in hex and the remaining characters are abc123 in ASCII, then I should get
c0 c0 61 62 63 31 32 33
However, printf using %x gives me
ffffffc0 ffffffc0 61 62 63 31 32 33
How do I get the output I want without the "ffffff"? And why is it that only c0 (and 80) has the ffffff, but not the other characters?
You are seeing the ffffff because char is signed on your system. In C, vararg functions such as printf will promote all integers smaller than int to int. Since char is an integer (8-bit signed integer in your case), your chars are being promoted to int via sign-extension.
Since c0 and 80 have a leading 1-bit (and are negative as an 8-bit integer), they are being sign-extended while the others in your sample don't.
char int
c0 -> ffffffc0
80 -> ffffff80
61 -> 00000061
Here's a solution:
char ch = 0xC0;
printf("%x", ch & 0xff);
This will mask out the upper bits and keep only the lower 8 bits that you want.
Indeed, there is type conversion to int.
You can also force the type to char by using the %hhx length modifier.
printf("%hhX", a);
In most cases you will want to set the minimum length as well to fill the second character with zeroes:
printf("%02hhX", a);
ISO/IEC 9899:201x says:
7 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
signed char or unsigned char argument (the argument will have
been promoted according to the integer promotions, but its value shall be
converted to signed char or unsigned char before printing); or that
a following n conversion specifier applies to a pointer to a signed char
argument.
You can create an unsigned char:
unsigned char c = 0xc5;
Printing it will give C5 and not ffffffc5.
Only the chars bigger than 127 are printed with the ffffff because they are negative (char is signed).
Or you can cast the char while printing:
char c = 0xc5;
printf("%x", (unsigned char)c);
You can use hh to tell printf that the argument is an unsigned char. Use 0 to get zero padding and 2 to set the width to 2. x or X for lower/uppercase hex characters.
uint8_t a = 0x0a;
printf("%02hhX", a); // Prints "0A"
printf("0x%02hhx", a); // Prints "0x0a"
Edit: If readers are concerned about 2501's assertion that this is somehow not the 'correct' format specifiers I suggest they read the printf link again. Specifically:
Even though %c expects int argument, it is safe to pass a char because of the integer promotion that takes place when a variadic function is called.
The correct conversion specifications for the fixed-width character types (int8_t, etc.) are defined in the header <cinttypes> (C++) or <inttypes.h> (C) (although PRIdMAX, PRIuMAX, etc. are synonymous with %jd, %ju, etc.).
As for his point about signed vs unsigned, in this case it does not matter since the values must always be positive and easily fit in a signed int. There is no signed hexadecimal format specifier anyway.
Edit 2: ("when-to-admit-you're-wrong" edition):
If you read the actual C11 standard on page 311 (329 of the PDF) you find:
hh: Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
You are probably storing the value 0xc0 in a char variable, which is probably a signed type, so your value is negative (most significant bit set). When printing, it is converted to int, and to preserve the value, the compiler fills the extra bytes with 0xff (sign extension), so the negative int has the same numerical value as your negative char. To fix this, just cast to unsigned char when printing:
printf("%x", (unsigned char)variable);
You are probably printing from a signed char array. Either print from an unsigned char array or mask the value with 0xff: e.g. ar[i] & 0xFF. The c0 values are being sign extended because the high (sign) bit is set.
Try something like this:
#include <stdio.h>

int main(void)
{
    printf("%x %x %x %x %x %x %x %x\n",
           0xC0, 0xC0, 0x61, 0x62, 0x63, 0x31, 0x32, 0x33);
}
Which produces this:
$ ./foo
c0 c0 61 62 63 31 32 33