What happens when %x reads a signed char? - C

I thought *(p3 + 3) would print 90, but it shows ffffff90.
Why does this happen?
I guess the MSB is 1, and %x is for reading an unsigned hexadecimal integer, so it reads 90 as a negative integer, but it's not clear to me and I can't find anything about this problem in the printf reference:
https://cplusplus.com/reference/cstdio/printf/
Can anyone explain this?

Use an unsigned char *.
In your environment:
- char is signed.
- char is 8 bits.
- Signed numbers use two's complement.
So you have a char with a bit pattern of 0x90. In this environment, that's -112. So you are effectively doing the following:
printf( "%x", (char)-112 );
When passing arguments to a variadic function like printf, the smaller integer types are implicitly promoted to int or unsigned int. So what's really happening is this:
printf( "%x", (int)(char)-112 );
So you're passing the int value -112. On your machine, that has the bit pattern FF FF FF 90 (in some unknown byte order). %x expects an unsigned integer, and thus prints that bit pattern as-is.
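A minimal sketch of both behaviours, assuming an 8-bit signed char and two's complement as described above (the question's p3 is not shown, so a local array stands in for it here):
#include <stdio.h>

int main(void)
{
    char s[4] = { (char)0x12, (char)0x34, (char)0x56, (char)0x90 };
    char *p3 = s;                               /* stand-in for the question's p3 */

    printf("%x\n", *(p3 + 3));                  /* ffffff90: sign-extended when promoted to int */
    printf("%x\n", (unsigned char)*(p3 + 3));   /* 90: converted to the range 0..255 first      */

    unsigned char *u3 = (unsigned char *)s;
    printf("%x\n", *(u3 + 3));                  /* 90: unsigned char never sign-extends         */
    return 0;
}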

Related

Convert raw ASCII data to Hex string

I have the following code to convert raw ASCII data to a hex string. The full C code can be found here
void str2hex(char* inputStr, char* outputStr)
{
    int i;
    int counter;
    i = 0;
    counter = 0;
    while (inputStr[counter] != '\0')
    {
        sprintf((char*)(outputStr + i), "%02X", inputStr[counter]);
        i += 2;
        counter += 1;
    }
    outputStr[i++] = '\0';
}
It works fine for most values. But when I try the following input from the terminal, using echo as stdin:
echo 11223344556677881122334455667788|xxd -r -p| ./CProgram --stdin
I expect the output
11223344556677881122334455667788
It returns the following output
11223344556677FF11223344556677FF
As can be seen, instead of 88 it returns FF.
How can I adjust this code to get 88 instead of FF?
There are multiple issues all coalescing into your problem.
The first issue is that it is implementation-defined whether char is a signed or an unsigned integer type. Your compiler seems to use a signed char type.
The second issue is that on most systems today, signed integers are represented using two's complement, where the most significant bit indicates the sign.
The third issue is that variadic functions like printf will do default argument promotion of their arguments. That means types smaller than int will be promoted to int. The promotion keeps the value of the converted integer, which means negative values will be sign-extended. Sign extension means that the most significant bit is copied all the way to the "top" when extending the value. So the signed byte 0xff will be extended to 0xffffffff when promoted to an int.
Now when your code tries to convert the byte 0x88 it will be treated as the negative number -120, not 136 as you might expect.
There are two possible solutions to this:
1. Explicitly use unsigned char for the input string:
   void str2hex(const unsigned char* inputStr, char* outputStr);
2. Use the hh length modifier in the printf format:
   sprintf((char*)(outputStr+i),"%02hhX", inputStr[counter]);
This tells sprintf that the argument is a single byte, and will mask out the upper bits of the (promoted) integer.
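A sketch of the fixed function combining both ideas (not verified against the rest of the original program):
#include <stdio.h>

/* Takes the bytes as unsigned char so they are promoted without sign extension. */
void str2hex(const unsigned char* inputStr, char* outputStr)
{
    int i = 0;
    int counter = 0;

    while (inputStr[counter] != '\0')
    {
        sprintf(outputStr + i, "%02X", inputStr[counter]);
        i += 2;
        counter += 1;
    }
    outputStr[i] = '\0';
}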

Why do hex-encoded characters greater than 0x7F display differently in the printf function?

I expected the code below to show two equal lines:
#include <stdio.h>
int main(void) {
    //printf("%x %x %x\n", '\x7F', (unsigned char)'\x8A', (unsigned char)'\x8B');
    printf("%x %x %x\n", '\x7F', '\x8A', '\x8B');
    printf("%x %x %x\n", 0x7F, 0x8A, 0x8B);
    return 0;
}
My output:
7f ffffff8a ffffff8b
7f 8a 8b
I know this may be an overflow case, but why the ffffff8a (4 bytes)...?
'\x8A' is, according to cppreference,
a single-byte integer character constant, e.g. 'a' or '\n' or '\13'.
What is particularly interesting is the following.
Such constant has type int and a value equal to the representation of c-char in the execution character set as a value of type char mapped to int.
This means that the value of '\x8A' is implementation-defined, because char can be signed or unsigned, depending on the system. If char is signed, as seems to be the case for you (and is very common), then the value of '\x8A' (which is negative) as a 32-bit int is 0xFFFFFF8A (also negative). However, if char is unsigned, then it becomes 0x0000008A (which is why the commented-out line in your code works as you'd expect).
The printf format specifier %x is used to convert an unsigned integer into hexadecimal representation. Although printf expects an unsigned int and you give it an int, and even though the standard says that passing an incorrect type to printf is (generally) undefined behavior, it isn't in your case. This is because the conversion from int to unsigned int is well-defined, even though the opposite isn't.
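A small sketch of both outcomes, assuming plain char is signed as in the question:
#include <stdio.h>

int main(void)
{
    printf("%x\n", '\x8A');                 /* ffffff8a here: the int value is negative */
    printf("%x\n", (unsigned char)'\x8A');  /* 8a: converted to the range 0..255 first  */
    return 0;
}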

Format specifier for hex char in C

Is there a format specifier for sprintf in C that maps a char to hex in the same way that %x maps an int to hex?
Yes and no.
Since sprintf takes a variable argument list, all arguments undergo default promotion before sprintf receives them. That means sprintf will never receive a char -- a char will always be promoted to int before sprintf receives it (and a short will as well).
Yes, since what sprintf is receiving will be an int, you can use %x to convert it to hex format, and it'll work the same whether that value started as a char, short, or int. If (as is often the case) you want to print 2 characters for each input, you can use %2.2x.
Beware one point though: if your char is signed, and you start with a negative value, the promotion to int will produce the same numerical value, which normally won't be the same bit pattern as the original char, so (for example) a char with the value -1 will normally print out as ffff if int is 16 bits, ffffffff if int is 32 bits, or ffffffffffffffff if int is 64 bits (assuming the typical 2's complement representation for signed integers).
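For example, a sketch of the negative-char case and the usual workarounds, assuming a signed char and two's complement:
#include <stdio.h>

int main(void)
{
    char c = -1;

    printf("%x\n", c);                   /* ffffffff with a 32-bit int             */
    printf("%2.2x\n", c & 0xff);         /* ff: masking keeps only the low 8 bits  */
    printf("%2.2x\n", (unsigned char)c); /* ff: converting before the promotion    */
    return 0;
}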
That's the same %x. All char values are converted to int before being passed to sprintf (or any other function that takes variable number of parameters).
printf("%x\n", 'a');
prints 61

Printing hexadecimal characters in C

I'm trying to read in a line of characters, then print out the hexadecimal equivalent of the characters.
For example, if I have a string that is "0xc0 0xc0 abc123", where the first 2 characters are c0 in hex and the remaining characters are abc123 in ASCII, then I should get
c0 c0 61 62 63 31 32 33
However, printf using %x gives me
ffffffc0 ffffffc0 61 62 63 31 32 33
How do I get the output I want without the "ffffff"? And why is it that only c0 (and 80) has the ffffff, but not the other characters?
You are seeing the ffffff because char is signed on your system. In C, vararg functions such as printf will promote all integers smaller than int to int. Since char is an integer (8-bit signed integer in your case), your chars are being promoted to int via sign-extension.
Since c0 and 80 have a leading 1-bit (and are negative as an 8-bit integer), they are being sign-extended while the others in your sample don't.
char        int
c0    ->    ffffffc0
80    ->    ffffff80
61    ->    00000061
Here's a solution:
char ch = 0xC0;
printf("%x", ch & 0xff);
This will mask out the upper bits and keep only the lower 8 bits that you want.
Indeed, there is type conversion to int.
You can also force the type to char by using the %hhx specifier.
printf("%hhX", a);
In most cases you will also want to set the minimum width so the output is padded to two characters with zeroes:
printf("%02hhX", a);
ISO/IEC 9899:201x says:
7 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
You can create an unsigned char:
unsigned char c = 0xc5;
Printing it will give C5 and not ffffffc5.
Only the chars above 127 are printed with the ffffff, because as a signed char they are negative.
Or you can cast the char while printing:
char c = 0xc5;
printf("%x", (unsigned char)c);
You can use hh to tell printf that the argument is an unsigned char. Use 0 to get zero padding and 2 to set the width to 2. x or X for lower/uppercase hex characters.
uint8_t a = 0x0a;
printf("%02hhX", a); // Prints "0A"
printf("0x%02hhx", a); // Prints "0x0a"
Edit: If readers are concerned about 2501's assertion that this is somehow not the 'correct' format specifier, I suggest they read the printf link again. Specifically:
Even though %c expects int argument, it is safe to pass a char because of the integer promotion that takes place when a variadic function is called.
The correct conversion specifications for the fixed-width character types (int8_t, etc) are defined in the header <cinttypes>(C++) or <inttypes.h> (C) (although PRIdMAX, PRIuMAX, etc is synonymous with %jd, %ju, etc).
As for his point about signed vs unsigned, in this case it does not matter, since the values must always be positive and easily fit in a signed int. There is no signed hexadecimal format specifier anyway.
Edit 2: ("when-to-admit-you're-wrong" edition):
If you read the actual C11 standard on page 311 (329 of the PDF) you find:
hh: Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
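For completeness, a sketch of the <inttypes.h> route mentioned in the first edit, using the fixed-width macros for uint8_t:
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint8_t a = 0x0a;

    printf("%02" PRIx8 "\n", a);   /* prints "0a" */
    printf("%02" PRIX8 "\n", a);   /* prints "0A" */
    return 0;
}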
You are probably storing the value 0xc0 in a char variable, which is probably a signed type, so your value is negative (most significant bit set). Then, when printing, it is converted to int, and to keep the value the same, the compiler fills the extra bytes with 0xff (sign extension), so the negative int has the same numerical value as your negative char. To fix this, just cast to unsigned char when printing:
printf("%x", (unsigned char)variable);
You are probably printing from a signed char array. Either print from an unsigned char array or mask the value with 0xff: e.g. ar[i] & 0xFF. The c0 values are being sign extended because the high (sign) bit is set.
Try something like this:
#include <stdio.h>
int main()
{
    printf("%x %x %x %x %x %x %x %x\n",
           0xC0, 0xC0, 0x61, 0x62, 0x63, 0x31, 0x32, 0x33);
}
Which produces this:
$ ./foo
c0 c0 61 62 63 31 32 33

Adding unsigned integers in C

Here are two very simple programs. I would expect to get the same output, but I don't, and I can't figure out why. The first outputs 251; the second outputs -5. I can understand the 251, but I don't see why the second program gives me -5.
PROGRAM 1:
#include <stdio.h>
int main()
{
    unsigned char a;
    unsigned char b;
    unsigned int c;
    a = 0;
    b = -5;
    c = (a + b);
    printf("c hex: %x\n", c);
    printf("c dec: %d\n", c);
}
Output:
c hex: fb
c dec: 251
PROGRAM 2:
#include <stdio.h>
int main()
{
    unsigned char a;
    unsigned char b;
    unsigned int c;
    a = 0;
    b = 5;
    c = (a - b);
    printf("c hex: %x\n", c);
    printf("c dec: %d\n", c);
}
Output:
c hex: fffffffb
c dec: -5
In the first program, b=-5; assigns 251 to b. (Conversions to an unsigned type always reduce the value modulo one plus the max value of the destination type.)
In the second program, b=5; simply assigns 5 to b, then c = (a - b); performs the subtraction 0-5 as type int due to the default promotions - put simply, "smaller than int" types are always promoted to int before being used as operands of arithmetic and bitwise operators.
Edit: One thing I missed: Since c has type unsigned int, the result -5 in the second program will be converted to unsigned int when the assignment to c is performed, resulting in UINT_MAX-4. This is what you see with the %x specifier to printf. When printing c with %d, you get undefined behavior, because %d expects a (signed) int argument and you passed an unsigned int argument with a value that's not representable in plain (signed) int.
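A sketch of that last point: printing the same value with %u instead of %d avoids passing an out-of-range value to a signed conversion (the decimal output assumes a 32-bit unsigned int):
#include <stdio.h>

int main(void)
{
    unsigned char a = 0;
    unsigned char b = 5;
    unsigned int  c = a - b;   /* int result -5, converted to UINT_MAX - 4 on assignment */

    printf("c hex: %x\n", c);  /* fffffffb                                  */
    printf("c dec: %u\n", c);  /* 4294967291 with a 32-bit unsigned int     */
    return 0;
}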
There are two separate issues here. The first is the fact that you are getting different hex values for what looks like the same operations. The underlying fact that you are missing is that chars are promoted to ints (as are shorts) to do arithmetic. Here is the difference:
a = 0 //0x00
b = -5 //0xfb
c = (int)a + (int)b
Here, a is extended to 0x00000000 and b is extended to 0x000000fb (not sign extended, because it is an unsigned char). Then, the addition is performed, and we get 0x000000fb.
a = 0 //0x00
b = 5 //0x05
c = (int)a - (int)b
Here, a is extended to 0x00000000 and b is extended to 0x00000005. Then, the subtraction is performed, and we get 0xfffffffb.
The solution? Stick with chars or ints; mixing them can cause things you won't expect.
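For instance, a sketch of one way to keep the result in the char range is to convert it back to unsigned char before storing it, which reproduces the first program's 251:
#include <stdio.h>

int main(void)
{
    unsigned char a = 0;
    unsigned char b = 5;
    unsigned int  c = (unsigned char)(a - b);  /* -5 reduced modulo 256 back to 251 */

    printf("c hex: %x\n", c);  /* fb  */
    printf("c dec: %u\n", c);  /* 251 */
    return 0;
}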
The second problem is that an unsigned int is being printed as -5, clearly a signed value. However, in the format string you told printf to print its second argument interpreted as a signed int (that's what "%d" means). The trick here is that printf doesn't know the types of the variables you passed in. It merely interprets them in the way the string tells it to. Here's an example where we tell printf to print a pointer as an int:
#include <stdio.h>
int main()
{
    int a = 0;
    int *p = &a;
    printf("%d\n", p); /* deliberately wrong: tells printf to read the pointer as an int */
}
When I run this program, I get a different value each time, which is the memory location of a, converted to base 10. You may note that this kind of thing causes a warning. You should read all of the warnings your compiler gives you, and only ignore them if you're completely sure you are doing what you intend to.
You're using the format specifier %d. That treats the argument as a signed decimal number (basically int).
You get 251 from the first program because (unsigned char)-5 is 251, and you then print it as a signed decimal number. It gets promoted to 4 bytes instead of 1, and the upper bits are all 0, so the value is still 251.
You get -5 from the second program because (unsigned int)-5 is some large value, but reinterpreted as an int it's -5. It gets treated like an int because of the way you use printf.
Use the format specifier %u to print unsigned decimal values.
What you're seeing is the result of how the C standard defines signed-to-unsigned conversions (for the arithmetic) and how the underlying machine represents numbers (for the result of the undefined behavior at the end).
When I originally wrote my response I had assumed that the C standard didn't explicitly define how signed values should be converted to unsigned values, since the standard doesn't define how signed values should be represented or how to convert unsigned values to signed values when the range is outside that of the signed type.
However, it turns out that the standard does explicitly define that when converting from negative signed to positive unsigned values. In the case of an integer, a negative signed value is converted by adding UINT_MAX + 1 to it (so -5 becomes UINT_MAX - 4), just as if it were stored as a signed value in two's complement and then interpreted as an unsigned value.
So when you say:
unsigned char a;
unsigned char b;
unsigned int c;
a = 0;
b = -5;
c = a + b;
b's value becomes 251, because -5 is converted to an unsigned type of value UCHAR_MAX-5+1 (255-5+1) using the C standard. It's then after that conversion that the addition takes place. That makes a+b the same as 0 + 251, which is then stored in c. However, when you say:
unsigned char a;
unsigned char b;
unsigned int c;
a = 0;
b = 5;
c = (a-b);
printf("c dec: %d\n", c);
In this case, a and b are promoted to int by the integer promotions, so they remain 0 and 5 in value. The subtraction 0 - 5 is then done as int and yields -5, and when that result is assigned to the unsigned int c it is converted to UINT_MAX + 1 - 5, i.e. UINT_MAX - 4. If the conversion had happened while the value was still a single byte, the result would have been UCHAR_MAX + 1 - 5 (i.e. 251 again).
However, the reason you see -5 printed in your output is a combination of the fact that the unsigned integer UINT_MAX-4 and -5 have the same exact binary representation, just like -5 and 251 do with a single-byte datatype, and the fact that when you used "%d" as the formatting string, that told printf to interpret the value of c as a signed integer instead of an unsigned integer.
Since a conversion from unsigned values to signed values for invalid values isn't defined, the result becomes implementation specific. In your case, since the underlying machine uses two's complement for signed values, the result is that the unsigned value UINT_MAX-4 becomes the signed value -5.
The only reason this doesn't happen in the first program is that an unsigned int and a signed int can both represent 251, so converting between the two is well defined and using "%d" or "%u" doesn't matter. In the second program, however, it results in undefined behavior and becomes implementation-specific, since your value of UINT_MAX-4 is outside the range of a signed int.
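A short sketch of the two directions just described (the decimal value assumes a 32-bit unsigned int; UINT_MAX comes from <limits.h>):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int u = -5;   /* well defined: -5 + (UINT_MAX + 1) == UINT_MAX - 4 */

    printf("%u\n", u);                  /* 4294967291 with a 32-bit unsigned int */
    printf("%d\n", u == UINT_MAX - 4);  /* 1: the values are equal               */
    return 0;
}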
What's happening under the hood
It's always good to double check what you think is happening or what should happen with what's actually happening, so let's look at the assembly language output from the compiler now to see exactly what's going on. Here's the meaningful part of the first program:
mov BYTE PTR [rbp-1], 0 ; a becomes 0
mov BYTE PTR [rbp-2], -5 ; b becomes -5, which as an unsigned char is also 251
movzx edx, BYTE PTR [rbp-1] ; promote a by zero-extending to an unsigned int, which is now 0
movzx eax, BYTE PTR [rbp-2] ; promote b by zero-extending to an unsigned int which is now 251
add eax, edx ; add a and b, that is, 0 and 251
Notice that although we store a signed value of -5 in the byte b, when the compiler promotes it, it promotes it by zero-extending the number, meaning it's being interpreted as the unsigned value that 11111011 represents instead of the signed value. Then the promoted values are added together to become c. This is also why the C standard defines signed to unsigned conversions the way it does -- it's easy to implement the conversions on architectures that use two's complement for signed values.
Now with program 2:
mov BYTE PTR [rbp-1], 0 ; a = 0
mov BYTE PTR [rbp-2], 5 ; b = 5
movzx edx, BYTE PTR [rbp-1] ; a is promoted to 32-bit integer with value 0
movzx eax, BYTE PTR [rbp-2] ; b is promoted to a 32-bit integer with value 5
mov ecx, edx
sub ecx, eax ; a - b is now done as 32-bit integers resulting in -5, which is '4294967291' when interpreted as unsigned
We see that a and b are once again promoted before any arithmetic, so the subtraction is done on 32-bit values, producing the bit pattern 0xFFFFFFFB, which is UINT_MAX - 4 when interpreted as unsigned and -5 when interpreted as signed. So whether you view it as a signed or an unsigned subtraction, because the machine uses two's complement form, the result matches the C standard without any extra conversions.
Assigning a negative number to an unsigned variable is basically asking for trouble: what you're doing is converting the negative number to a large positive number. (The conversion itself is defined by the standard as reduction modulo one plus the maximum value of the unsigned type, so you get the same value even on a ones' complement machine, but it is almost certainly not the value you meant.)
So you get what you get. You can't expect signed algebra to still apply.
