Printing a hex value stored in a string gives unexpected output - C

In C, I have hex byte values defined in a string:
char chars[] = "\xfb\x54\x9c\xb2\x10\xef\x89\x51\x2f\x0b\xea\xbb\x1d\xaf\xad\xf8";
Then I want to compare these values with others. The comparison doesn't work, and when I print a value like:
printf("%02x\n", chars[0]);
it prints fffffffb. Why is that, and how do I get exactly fb?

This is because of sign extension.
Change
printf("%02x\n", chars[0]);
to
printf("%02x\n", (unsigned char)chars[0]);
The %x format specifier reads a 4-byte unsigned int on a typical 32-bit machine. Since you declared chars as a plain char array, the value fb (a negative value) is sign-extended to fffffffb when it is fetched: the set most-significant bit of fb is copied into all the higher bits.
Refer to this for more details: sign extension.
If you had declared chars[] as unsigned char chars[], the printed output would have been as expected.
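For illustration, here is a minimal complete sketch of both variants; the outputs shown in the comments assume a signed 8-bit char and a 32-bit int, which is the common case:
#include <stdio.h>

int main(void) {
    char chars[] = "\xfb\x54\x9c\xb2\x10\xef\x89\x51\x2f\x0b\xea\xbb\x1d\xaf\xad\xf8";

    /* chars[0] is promoted to int; with a signed char, 0xfb sign-extends */
    printf("%02x\n", chars[0]);                /* prints fffffffb */

    /* casting to unsigned char first keeps just the byte value */
    printf("%02x\n", (unsigned char)chars[0]); /* prints fb */

    return 0;
}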

As per the standard, regarding the %x format specifier with fprintf():
o,u,x,X
The unsigned int argument is converted to unsigned octal (o), unsigned
decimal (u), or unsigned hexadecimal notation (x or X) in the style dddd; [...]
So, the expected type of argument to %x is unsigned int.
Now, printf() being a variadic function, only the default argument promotions are applied to its arguments. In your code, chars is an array of type char (whose signedness is implementation-defined), so in the case of
printf("%02x\n", chars[0]);
the value of chars[0] gets promoted to an int, which is not the expected type for %x. Hence the output is wrong, as int and unsigned int are not the same type [see §6.7.2, C11]. So, without an explicit cast like
printf("%02x\n", (unsigned int)chars[0]);
it invokes undefined behaviour.
FWIW, if you have a C99-compliant compiler, you can make use of the hh length modifier to handle this, like
printf("%02hhx\n", (unsigned char)chars[0]);

It's because of sign extension.
This will work as you expect:
printf("%02x\n", (unsigned char)chars[0]);

Related

Why the output of the following C program is -10

#include <stdio.h>
int main() {
    unsigned int a = -10;
    printf("a=%d\n", a);
    return 0;
}
The above code prints -10 even though a is declared unsigned. If both signed and unsigned print -10, then what is the difference between them?
printf doesn't know the type of the argument you give it. With %d you're telling it that it's a signed int, which is wrong. That's undefined behavior - anything could happen. What likely happens is that printf simply interprets the memory as a signed int anyway, and with unsigned int a = -10; you set the unsigned int to a bit pattern that reads as -10 when interpreted as a signed int. For further info on what happens when you assign a negative number to an unsigned type, check out this answer.
You actually have undefined behavior in that code.
The "%d" format is for plain signed int, and mismatching format specifier and argument leads to UB.
Since printf doesn't have any idea of the real types being passed, it has to rely only on the format specifiers. So what probably happens is that the printf function simply treats the value as a plain signed int and print it as such.
You should use
printf("a=%u\n", a);
to print a as an unsigned integer.

Why do I have to use %u with unsigned int, but can use %i with unsigned char?

I was experimenting with data types in C. My first problem was that printf() showed a negative value for an unsigned int; I fixed this by using %u instead of %i.
But unsigned char still works with %i. How is that possible?
#include <stdio.h>
int main(void) {
    unsigned int a;
    unsigned char b;
    a = -7;
    b = -1;
    printf("a=%u\nb=%i\n", a, b);
    return 0;
}
If you look at e.g. this printf (and family) reference, you will see that the "%i" format
converts a signed integer into decimal representation [-]dddd.
[Emphasis not mine]
Since you pass an unsigned int, you have a mismatched format specifier and value, which leads to undefined behavior.
Furthermore, for variadic functions (like printf), arguments smaller than int (for example char, signed or unsigned) are promoted to int. Since the resulting value is an int and the "%i" format expects exactly that, there is no mismatch between format specifier and argument type for b, which is why it appears to work.
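To make the promotion concrete, here is a minimal sketch (assuming the usual 8-bit unsigned char):
#include <stdio.h>

int main(void) {
    unsigned char b = -1; /* -1 wraps to UCHAR_MAX, i.e. 255 */
    printf("b=%i\n", b);  /* b is promoted to int 255, which matches %i: prints b=255 */
    return 0;
}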
As it has been stated before, using 'i' as the format specifier is not correct for a variable of type unsigned char.
Whenever you are unsure of what the correct format would be for any particular (integer) type, you can just take a look at inttypes.h, which contains a bunch of macros meant to be used for portable format strings. Depending on the platform you're developing for, the correct format specifiers might differ (uint16_t could be 'u' or 'hu', int32_t could be 'd' or 'ld' for instance).
You could either use this header as a "cheat sheet", or actually write your format strings like this:
printf("a=%"PRIu32"\nb=%"PRIu8"\n", a, b);
Note that for the code to actually be portable, you would of course also need to use uint8_t instead of unsigned char, and uint32_t instead of unsigned int.
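A runnable version of that suggestion might look like the sketch below (assuming a hosted C99 implementation that provides the fixed-width types):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint32_t a = -7; /* wraps to 4294967289 */
    uint8_t b = -1;  /* wraps to 255 */
    printf("a=%" PRIu32 "\nb=%" PRIu8 "\n", a, b);
    return 0;
}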

Why do hex-encoded characters greater than \x7F display differently in the printf function?

I expected the code below to show two equal lines:
#include <stdio.h>
int main(void) {
    //printf("%x %x %x\n", '\x7F', (unsigned char)'\x8A', (unsigned char)'\x8B');
    printf("%x %x %x\n", '\x7F', '\x8A', '\x8B');
    printf("%x %x %x\n", 0x7F, 0x8A, 0x8B);
    return 0;
}
My output:
7f ffffff8a ffffff8b
7f 8a 8b
I know this may be an overflow case, but why ffffff8a (4 bytes)...?
'\x8A' is, according to cppreference,
a single-byte integer character constant, e.g. 'a' or '\n' or '\13'.
What is particularly interesting is the following.
Such constant has type int and a value equal to the representation of c-char in the execution character set as a value of type char mapped to int.
This means that the conversion of '\x8A' to an unsigned int is implementation-defined, because char can be signed or unsigned, depending on the system. If char is signed, as it seems to be the case for you (and is very common), then the value of '\x8A' (which is negative) as a 32-bit int is 0xFFFFFF8A (also negative). However, if char is unsigned, then it becomes 0x0000008A (which is why the commented line in your code works as you'd think it should).
The printf format specifier %x is used to convert an unsigned integer into hexadecimal representation. Although printf expects an unsigned int and you give it an int, and even though the standard says that passing an incorrect type to printf is (generally) undefined behavior, it isn't in your case. This is because the conversion from int to unsigned int is well-defined, even though the opposite isn't.
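If you want to check which case your platform falls into, a small sketch using CHAR_MIN from <limits.h> will report the implementation-defined signedness of char:
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* CHAR_MIN is 0 if char is unsigned, negative if char is signed */
    if (CHAR_MIN < 0)
        printf("char is signed: '\\x8A' has value %d\n", '\x8A');  /* typically -118 */
    else
        printf("char is unsigned: '\\x8A' has value %d\n", '\x8A'); /* 138 */
    return 0;
}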

Tilde operator in C

unsigned short int i = 0;
printf("%u\n",~i);
Why does this code print a 32-bit number to the console? It should be 16-bit, since a short is 2 bytes.
The output is 4,294,967,295, but it should be 65,535.
%u expects an unsigned int; if you want to print an unsigned short int, use %hu.
EDIT
Lundin is correct that ~i has type int: the integer promotions are applied to i as the operand of ~, and the result has the promoted type (the same promotion would also happen simply by passing the value to a variadic function). However, printf will convert the argument back to unsigned short before printing if the %hu conversion specifier is used:
7.21.6.1 The fprintf function
...
3 The format shall be a multibyte character sequence, beginning and ending in its initial
shift state. The format is composed of zero or more directives: ordinary multibyte
characters (not %), which are copied unchanged to the output stream; and conversion
specifications, each of which results in fetching zero or more subsequent arguments,
converting them, if applicable, according to the corresponding conversion specifier, and
then writing the result to the output stream.
...
7 The length modifiers and their meanings are:
...
h Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
short int or unsigned short int argument (the argument will
have been promoted according to the integer promotions, but its value shall
be converted to short int or unsigned short int before printing);
or that a following n conversion specifier applies to a pointer to a short
int argument.
Emphasis mine.
So, the behavior is not undefined; it would only be undefined if either i or ~i were not integral types.
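Following that reading of the standard, a minimal sketch of the fix (assuming a 16-bit unsigned short):
#include <stdio.h>

int main(void) {
    unsigned short int i = 0;
    /* ~i has type int (value -1); %hu converts the value back to unsigned short */
    printf("%hu\n", ~i); /* prints 65535 */
    return 0;
}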
When you pass an argument to printf and that argument is of integer type shorter than int, it is implicitly promoted to int as per K&R argument promotion rules. Thus your printf-call actually behaves like:
printf("%u\n", (int)~i);
Notice that this is undefined behavior, since you told printf that the argument has an unsigned type whereas int is actually a signed type. Convert i to unsigned short and then to unsigned to resolve the undefined behavior and your problem:
printf("%u\n", (unsigned)(unsigned short)~i);
N1570 6.5.3.3 Unary arithmetic operators p4:
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is,
each bit in the result is set if and only if the corresponding bit in the converted operand is
not set). The integer promotions are performed on the operand, and the result has the
promoted type. ...
Integer types smaller than int are promoted to int. If sizeof(unsigned short) == 2 and sizeof(int) == 4, then the resulting type is int.
What's more, the printf conversion specifier %u expects an unsigned int, so the representation of that int is interpreted as an unsigned int. You are basically lying to the compiler, and this is undefined behaviour.
It's because the arguments to printf() are pushed onto the stack in words; there is no way inside printf to know that the argument was a short. Also, by using the %u format you are merely stating that you are passing an unsigned number.

Sign extension query in case of short

Given,
unsigned short y = 0xFFFF;
When I print
printf("%x", y);
I get: 0xFFFF
But when I print
printf("%x", (signed short)y);
I get: 0xFFFFFFFF
Whole program below:
#include <stdio.h>
int main() {
    unsigned short y = 0xFFFF;
    unsigned short z = 0x7FFF;
    printf("%x %x\n", y, z);
    printf("%x %x", (signed short)y, (signed short)z);
    return 0;
}
}
Sign extension happens when we cast a smaller type to a larger one, but here we are casting unsigned short to signed short, which are the same size.
In both cases, sizeof((signed short)y) and sizeof((signed short)z) give 2 bytes. The short remains 2 bytes, and when the sign bit is zero, as with 0x7fff, the value prints unchanged.
Any help is very much appreciated!
Output of the first printf is as expected. The second printf produces undefined behavior.
In C, when you pass a value smaller than int as a variadic argument, that value is always implicitly converted to type int. It is not possible to physically pass a short or char variadic argument. That implicit conversion to int is where your "sign extension" takes place.
For this reason, your printf("%x", y); is equivalent to printf("%x", (int) y);. The value that is passed to printf is 0xFFFF of type int. Technically, %x format requires an unsigned int argument, but a non-negative int value is also OK (unless I'm missing some technicality). The output is 0xFFFF.
Conversion to int happens in the second case as well. I.e. your printf("%x", (signed short) y); is equivalent to printf("%x", (int) (signed short) y);. The conversion of 0xFFFF to (signed short) is implementation-defined, because 0xFFFF is apparently out of range of signed short on your platform. But most likely it produces a negative value (-1). When converted to int it produces the same negative value of type int (again, -1 represented as 0xFFFFFFFF for a 32-bit int). The further behavior is undefined, since you are passing a negative int value for format specifier %x, which requires unsigned int argument. It is illegal to use %x with negative int values.
In other words, formally your second printf prints unpredictable garbage. But practically the above explains where that 0xFFFFFFFF came from.
Let's break it down and into smaller pieces:
Given,
unsigned short y = 0xFFFF;
Assuming a two-byte unsigned short, the maximum value is 2^16 - 1, which is indeed 0xFFFF.
When I print
printf("%x", y);
Due to default argument promotion (printf() being a variadic function), the value of y is implicitly promoted to type int. With the %x format specifier it's treated as an unsigned int. Assuming the common two's complement representation and a four-byte int, the most significant bit is zero after promotion, so the bit patterns of the int and the unsigned int are simply the same.
But when I print
printf("%x", (signed short)y);
What you have done is cast to a signed type that cannot represent the value 0xFFFF. Such a conversion is, as the standard says, implementation-defined, so you could get any result. After the implicit conversion to int, you apparently end up with a bit pattern of 32 ones, which is printed as 0xFFFFFFFF.
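To avoid the undefined behaviour while still observing both results, cast to unsigned int explicitly before printing; a minimal sketch, assuming a 16-bit short and a 32-bit int:
#include <stdio.h>

int main(void) {
    unsigned short y = 0xFFFF;

    printf("%x\n", (unsigned int)y); /* ffff: value-preserving conversion */

    /* (signed short)y is implementation-defined; typically -1, which
       converts to 0xFFFFFFFF as a 32-bit unsigned int */
    printf("%x\n", (unsigned int)(signed short)y);

    return 0;
}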
