Format specifier for hex char in C

Is there a format specifier for sprintf in C that maps a char to hex in the same way that %x maps an int to hex?

Yes and no.
Since sprintf takes a variable argument list, all arguments undergo default promotion before sprintf receives them. That means sprintf will never receive a char -- a char will always be promoted to int before sprintf receives it (and a short will as well).
Yes, since what sprintf is receiving will be an int, you can use %x to convert it to hex format, and it'll work the same whether that value started as a char, short, or int. If (as is often the case) you want to print 2 characters for each input, you can use %2.2x.
Beware one point though: if your char is signed, and you start with a negative value, the promotion to int will produce the same numerical value, which normally won't be the same bit pattern as the original char, so (for example) a char with the value -1 will normally print out as ffff if int is 16 bits, ffffffff if int is 32 bits, or ffffffffffffffff if int is 64 bits (assuming the typical 2's complement representation for signed integers).
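For example, a minimal sketch of the difference (assuming an ASCII system, a signed char type, and 32-bit int; the exact sign-extended digits depend on the platform):

#include <stdio.h>

int main(void)
{
    char c = '\xA2';                      /* a negative value if char is signed */

    printf("%x\n", c);                    /* may print ffffffa2 (sign-extended) */
    printf("%2.2x\n", (unsigned char)c);  /* prints a2 */
    printf("%hhx\n", c);                  /* also prints a2 (C99 hh length modifier) */
    return 0;
}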

It's the same %x. All char values are converted to int before being passed to sprintf (or any other function that takes a variable number of parameters).
printf("%x\n", 'a');
prints 61

Related

How does printing 577 with %c output "A"?

#include <stdio.h>

int main()
{
    int i = 577;

    printf("%c", i);
    return 0;
}
After compiling, it gives the output "A". Can anyone explain how I'm getting this?
%c will only accept values up to 255 inclusive; beyond that it wraps around to 0 again!
577 % 256 = 65; // (char code for 'A')
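A tiny demonstration of that wrap-around (ASCII assumed):

#include <stdio.h>

int main(void)
{
    printf("%c\n", 577);   /* 577 is reduced modulo 256 to 65, so 'A' is printed */
    printf("%c\n", 65);    /* prints 'A' directly */
    return 0;
}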
This has to do with how the value is converted.
The %c format specifier expects an int argument and then converts it to type unsigned char. The character for the resulting unsigned char is then written.
Section 7.21.6.1p8 of the C standard regarding format specifiers for printf states the following regarding c:
If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
When converting a value to a smaller unsigned type, what effectively happens is that the higher-order bytes are discarded and the lower-order bytes hold the resulting value.
Section 6.3.1.3p2 regarding integer conversions states:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Which, when two's complement representation is used, is the same as truncating the high-order bytes.
For the int value 577, whose value in hexadecimal is 0x241, the low order byte is 0x41 or decimal 65. In ASCII this code is the character A which is what is printed.
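A small sketch of that truncation, using a mask to mimic the conversion to unsigned char (ASCII assumed):

#include <stdio.h>

int main(void)
{
    int i = 577;                        /* 0x241 */

    printf("%#x\n", i);                 /* 0x241 */
    printf("%#x\n", i & 0xFF);          /* 0x41, the low-order byte (decimal 65) */
    printf("%c\n", (unsigned char)i);   /* 'A' */
    return 0;
}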
How does printing 577 with %c output "A"?
With printf(). "%c" matches an int argument*1. The int value is converted to an unsigned char value of 65 and the corresponding character*2, 'A' is then printed.
It makes no difference whether char is signed or unsigned, or whether it is encoded with 2's complement or not. There is no undefined behavior (UB). It makes no difference how the argument is passed: on the stack, in a register, or otherwise. The endianness of int is irrelevant. The argument value is converted to an unsigned char and the corresponding character is printed.
*1 All int values are allowed [INT_MIN...INT_MAX].
When a char value is passed as ... argument, it is first converted to an int and then passed.
char ch = 'A';
printf("%c", ch); // ch is converted to an `int` and passed to printf().
*2 65 is 'A' in ASCII, the ubiquitous character encoding. Other encodings are rarely used.
Just output the value of the variable i in hexadecimal representation:
#include <stdio.h>

int main( void )
{
    int i = 577;

    printf( "i = %#x\n", i );
}
The program output will be
i = 0x241
So the least significant byte contains the hexadecimal value 0x41 that represents the ASCII code of the letter 'A'.
577 in hex is 0x241. The ASCII representation of 'A' is 0x41. You're passing an int to printf but then telling printf to treat it as a char (because of %c). A char is one-byte wide and so printf looks at the first argument you gave it and reads the least significant byte which is 0x41.
To print an integer, you need to use %d or %i.

Convert raw ASCII data to Hex string

I have the following code to convert raw ASCII data to a hex string. The full C code can be found here
void str2hex(char* inputStr, char* outputStr)
{
    int i;
    int counter;

    i = 0;
    counter = 0;
    while (inputStr[counter] != '\0')
    {
        sprintf((char*)(outputStr + i), "%02X", inputStr[counter]);
        i += 2;
        counter += 1;
    }
    outputStr[i++] = '\0';
}
It works fine for most of the values. But when I try the following input from the terminal, using echo as stdin:
echo 11223344556677881122334455667788 | xxd -r -p | ./CProgram --stdin
I expect the output
11223344556677881122334455667788
but instead it returns the following output:
11223344556677FF11223344556677FF
As can be seen, instead of 88 it returns FF.
How can I adjust this code to get 88 instead of FF?
There are multiple issues all coalescing into your problem.
The first issue is that it is implementation-defined whether char is a signed or unsigned integer type. Your compiler seems to have a signed char type.
The second issue is that on most systems today, signed integers are represented using two's complement, where the most significant bit indicates the sign.
The third issue is that vararg functions like printf will do default argument promotion of its arguments. That means types smaller than int will be promoted to int. And that promotion will keep the value of the converted integer, which means negative values will be sign-extended. Sign-extension means that the most significant bit will be copied all the way to the "top" when extending the value. That means the signed byte 0xff will be extended to 0xffffffff when promoted to an int.
Now when your code tries to convert the byte 0x88 it will be treated as the negative number -120, not 136 as you might expect.
There are two possible solutions to this:
Explicitly use unsigned char for the input string:
void str2hex(const unsigned char* inputStr, char* outputStr);
Use the hh prefix in the printf format:
sprintf((char*)(outputStr+i),"%02hhX", inputStr[counter]);
This tells sprintf that the argument is a single byte, and will mask out the upper bits of the (promoted) integer.
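Putting the two suggestions together, a sketch of an adjusted str2hex (the main function and test string below are only illustrative, not part of the original question):

#include <stdio.h>

void str2hex(const char* inputStr, char* outputStr)
{
    /* Read the bytes through an unsigned char pointer so 0x88 stays 136,
       and use %02hhX so sprintf prints only the low byte. */
    const unsigned char* in = (const unsigned char*)inputStr;
    int i = 0;

    while (*in != '\0')
    {
        sprintf(outputStr + i, "%02hhX", *in);
        i += 2;
        in++;
    }
    outputStr[i] = '\0';
}

int main(void)
{
    char out[16];

    str2hex("\x11\x22\x88", out);
    printf("%s\n", out);    /* prints 112288 */
    return 0;
}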

are int and char represented using the same bits internally by gcc?

I was playing around with Unicode characters (without using wchar_t support) just for fun. I'm only using the regular char data type. I noticed that while printing them in hex they were showing up as a full 4 bytes instead of just one byte.
For example, consider this C file:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = (char *) malloc(100);

    fgets(s, 100, stdin);
    while (s && *s != '\0') {
        printf("%x\n", *s);
        s++;
    }
    return 0;
}
After compiling with gcc and giving the 'cent' symbol (hex: c2 a2) as input, I get the following output
$ ./a.out
¢
ffffffc2: ?
ffffffa2: ?
a:
So instead of just printing c2 and a2 I got the whole 4 bytes as if it's an int type.
Does this mean char is not really 1-byte in length, ascii made it look like 1-byte?
Maybe the reason why the upper three bytes become 0xFFFFFF needs a bit more explanation?
The upper three bytes of the value printed for *s have a value of 0xFF due to sign extension.
The char value passed to printf is extended to an int before the call to printf.
This is due to C's default behaviour.
In the absence of signed or unsigned, the compiler can default to interpreting char as signed char or unsigned char. It is consistently one or the other unless explicitly changed with a command line option or pragmas. In this case we can see that it is signed char.
In the absence of more information (prototypes or casts), C passes:
int, so char, short, unsigned char and unsigned short are converted to int. It never passes a char, unsigned char, or signed char as a single byte; it always passes an int.
unsigned int is the same size as int so the value is passed without change
The compiler needs to decide how to convert the smaller value to an int.
signed values: the upper bytes of the int are sign extended from the smaller value, which effectively copies the top (sign) bit upwards to fill the int. If the top bit of the smaller signed value is 0, the upper bytes are filled with 0. If the top bit of the smaller signed value is 1, the upper bytes are filled with 1. Hence printf("%x ", *s) prints ffffffc2
unsigned values are not sign extended, the upper bytes of the int are 'zero padded'
Hence the reason C can call a function without a prototype (though the compiler will usually warn about that)
So you can write, and expect this to run (though I would hope your compiler issues warnings):
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */
int main (int argc, const char * argv[]) {
signed char schar[] = "\x70\x80";
unsigned char uchar[] = "\x70\x80";
printf("schar[0]=%x schar[1]=%x uchar[0]=%x uchar[1]=%x\n",
schar[0], schar[1], uchar[0], uchar[1]);
return 0;
}
That prints:
schar[0]=70 schar[1]=ffffff80 uchar[0]=70 uchar[1]=80
The char value is interpreted by my (Mac's gcc) compiler as signed char, so the compiler generates code to sign extend the char to an int before the printf call.
Where the signed char value has its top (sign) bit set (\x80), the conversion to int sign extends the char value. The sign extension fills in the upper bytes (in this case 3 more bytes to make a 4 byte int) with 1's, which get printed by printf as ffffff80
Where the signed char value has its top (sign) bit clear (\x70), the conversion to int still sign extends the char value. In this case the sign is 0, so the sign extension fills in the upper bytes with 0's, which get printed by printf as 70
My example also shows the case where the value is unsigned char. In these two cases the value is not sign extended, because the value is unsigned. Instead it is extended to int with 0 padding. It might look like printf is only printing one byte because the adjacent three bytes of the value would be 0. But it is printing the entire int; the values happen to be 0x00000070 and 0x00000080 because the unsigned char values were converted to int without sign extension.
You can force printf to only print the low byte of the int, by using suitable formatting (%hhx), so this correctly prints only the value in the original char:
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */

int main (int argc, const char * argv[]) {
    char schar[] = "\x70\x80";
    unsigned char uchar[] = "\x70\x80";

    printf("schar[0]=%hhx schar[1]=%hhx uchar[0]=%hhx uchar[1]=%hhx\n",
           schar[0], schar[1], uchar[0], uchar[1]);
    return 0;
}
This prints:
schar[0]=70 schar[1]=80 uchar[0]=70 uchar[1]=80
because printf interprets the %hhx to treat the int as an unsigned char. This does not change the fact that the char was sign extended to an int before printf was called. It is only a way to tell printf how to interpret the contents of the int.
In a way, for signed char *schar, the meaning of %hhx looks slightly misleading, but the '%x' format interprets the int as unsigned anyway, and (with my printf) there is no format to print hex for signed values (IMHO it would be confusing).
Sadly, ISO/ANSI/... don't freely publish our programming language standards, so I can't point to the specification, but searching the web might turn up working drafts. I haven't tried to find them. I would recommend "C: A Reference Manual" by Samuel P. Harbison and Guy L. Steele as a cheaper alternative to the ISO document.
HTH
No. printf is a variable-argument function; arguments to a variable-argument function are promoted to int. And in this case the char was negative, so it gets sign-extended.
%x tells printf that the value to print is an unsigned int, so the (already promoted, sign-extended) value is printed as an unsigned hex number.
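For the loop in the question, a minimal sketch of the fix, casting each char to unsigned char before printing (using %hhx instead would work as well):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *s = malloc(100);

    if (s == NULL || fgets(s, 100, stdin) == NULL)
        return 1;

    for (char *p = s; *p != '\0'; p++)
        printf("%x\n", (unsigned char)*p);   /* prints c2 and a2 for the cent symbol */

    free(s);
    return 0;
}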

How to explain this,what happens when we cast signed char to int/hex?

signed char num = 220; //DC in hex
printf("%02X\n", num);
printf("%d\n", num);
I know that signed char can only represent -128 to 127, but why does the above output:
FFFFFFDC
-36
What's the reason?
UPDATE
My above code is just contrived for my question; that is, what happens when we cast a signed char to int/hex?
As our starting point, 220 = DC in hex, and 11011100 in binary.
The first bit is the sign-bit, leaving us with 1011100. Per two's complement, if we complement it (getting 0100011), and then add one, we get 0100100 -- this is 36.
When it converts the signed char to signed int, it doesn't say "this would be 220 if it's unsigned", it says "this is -36, make it an int of -36", for which the 32-bit two's complement representation is FFFFFFDC, because it must be the negative value for the full size of int (this is called sign-extension):
+36 as a 32-bit value: 00000000000000000000000000100100
complement: 11111111111111111111111111011011
add one: 11111111111111111111111111011100
Or, in hex, FFFFFFDC.
This is why you must be careful with printf("%x", ch); (and relatives) -- if you intend to just get a two-digit value, and chars are signed, you may wind up with eight digits instead. Always specify "unsigned char" if you need it to be unsigned.
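A short sketch of that pitfall and the cast-based fix (assuming 32-bit int and two's complement; the first line relies on the common behaviour of printing a sign-extended negative value with %X):

#include <stdio.h>

int main(void)
{
    signed char num = -36;                  /* bit pattern 0xDC */

    printf("%02X\n", num);                  /* typically FFFFFFDC: sign-extended to int */
    printf("%02X\n", (unsigned char)num);   /* DC */
    printf("%d\n", num);                    /* -36 */
    return 0;
}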
As you pointed out, a signed char can have a max value of 127. The reason for the negative number, however, is due to how the char is stored in memory. All signed integer types reserve one bit for the sign, with 0/1 denoting positive/negative. The compiler, however, does not check for overflow, so when you tried to assign 220 to num, the value overflowed into the sign bit since it could not fit into the remaining 7 bits of the char (chars are 1 byte). As a result, when the value is read back from memory, the sign bit appears set, so instead of the large positive number you intended, the program sees a small negative value. Hence the output you see.
Edit
Responding to your updated question: all that happens is that the compiler will copy or expand the char to occupy 4 bytes of memory, interpret the value of the char, and rewrite it in the new int's memory. In your case, the program at run time thinks the char has a value of -36 instead of 220, because it interprets those bits as a signed char before the cast. Then, when it casts, it simply creates an int with the value -36.
Your overflow on the assignment, combined with sign extension when you promote it to an int to view it in hex ... see What's happening in the background of a unsigned char to integer type cast?
Any type smaller that "int" gets converted to "int" when it's passed via "...". This means your negative char got converted to negative int, with FFFs showing up in hex printout.

C Unsigned int providing a negative value?

I have an unsigned integer, but when I print it out using %d there is sometimes a negative value there?
Printing %d will read the integer as a signed decimal number, regardless of its defined type.
To print unsigned numbers, use %u.
This happens because of C's way of handling variable arguments. printf just pulls values from the argument list (which on many implementations lives on the call stack) and has to figure out what the data contains from the format string you give it.
This is why you need to supply the format string: C has no RTTI or 'base class' (Object in Java, for example) from which to get a generic or predefined toString.
This should work:
unsigned int a;
printf("%u\n", a);
Explanation: On most architectures, signed integers are represented in two's complement. In this system, non-negative numbers less than 2**(N-1) (where N is the number of bits in an int) are represented the same way regardless of whether you are using an int or an unsigned int. However, if the number in your unsigned int is 2**(N-1) or larger, that bit pattern represents a negative signed number under two's complement -- which is what printf gave you when you passed it "%d".
%d means printf will interpret the value as an int (which is signed). Use %u if it is an unsigned int.
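For example, a small sketch of both interpretations of the same bit pattern (assuming 32-bit int; the %d line is exactly the mismatch the question describes):

#include <stdio.h>

int main(void)
{
    unsigned int a = 4294967295u;   /* UINT_MAX when int is 32 bits */

    printf("%u\n", a);              /* 4294967295 */
    printf("%d\n", a);              /* typically -1: the same bits read as signed */
    return 0;
}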
