How does printing 577 with %c output "A"? - c

#include<stdio.h>
int main()
{
int i = 577;
printf("%c",i);
return 0;
}
After compiling, its giving output "A". Can anyone explain how i'm getting this?

%c will only accept values up to 255 included, then it will start from 0 again !
577 % 256 = 65; // (char code for 'A')

This has to do with how the value is converted.
The %c format specifier expects an int argument and then converts it to type unsigned char. The character for the resulting unsigned char is then written.
Section 7.21.6.1p8 of the C standard regarding format specifiers for printf states the following regarding c:
If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
When converting a value to a smaller unsigned type, what effectively happens is that the higher order bytes are truncated and the lower order bytes have the resulting value.
Section 6.3.1.3p2 regarding integer conversions states:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Which, when two's complement representation is used, is the same as truncating the high-order bytes.
For the int value 577, whose value in hexadecimal is 0x241, the low order byte is 0x41 or decimal 65. In ASCII this code is the character A which is what is printed.

How does printing 577 with %c output "A"?
With printf(). "%c" matches an int argument*1. The int value is converted to an unsigned char value of 65 and the corresponding character*2, 'A' is then printed.
This makes no difference if a char is signed or unsigned or encoded with 2's complement or not. There is no undefined behavior (UB). It makes no difference how the argument is passed, on the stack, register, or .... The endian of int is irrelevant. The argument value is converted to an unsigned char and the corresponding character is printed.
*1All int values are allowed [INT_MIN...INT_MAX].
When a char value is passed as ... argument, it is first converted to an int and then passed.
char ch = 'A';
printf("%c", ch); // ch is converted to an `int` and passed to printf().
*2 65 is an ASCII A, the ubiquitous encoding of characters. Rarely other encodings are used.

Just output the value of the variable i in the hexadecimal representation
#include <stdio.h>
int main( void )
{
int i = 577;
printf( "i = %#x\n", i );
}
The program output will be
i = 0x241
So the least significant byte contains the hexadecimal value 0x41 that represents the ASCII code of the letter 'A'.

577 in hex is 0x241. The ASCII representation of 'A' is 0x41. You're passing an int to printf but then telling printf to treat it as a char (because of %c). A char is one-byte wide and so printf looks at the first argument you gave it and reads the least significant byte which is 0x41.
To print an integer, you need to use %d or %i.

Related

Convert raw ASCII data to Hex string

I have the following code to convert raw ASCII data to Hex string. The full c code can be found here
void str2hex(char* inputStr, char* outputStr)
{
int i;
int counter;
i=0;
counter=0;
while(inputStr[counter] != '\0')
{
sprintf((char*)(outputStr+i),"%02X", inputStr[counter]);
i+=2; counter+=1;
}
outputStr[i++] = '\0';
}
It works fine for most of the values. But when I am trying the following input from terminal using echo as stdin echo 11223344556677881122334455667788|xxd -r -p| ./CProgram --stdin
11223344556677881122334455667788
It returns the following output
11223344556677FF11223344556677FF
As it can be seen instead of 88 it returns FF.
How can I adjust this code to get 88 instead of FF.
There are multiple issues all coalescing into your problem.
The first issue is that it's compiler-defined if char is a signed or unsigned integer type. Your compiler seem to have signed char types.
The second issue is that on most systems today, signed integers are represented using two's complement, where the most significant bit indicates the sign.
The third issue is that vararg functions like printf will do default argument promotion of its arguments. That means types smaller than int will be promoted to int. And that promotion will keep the value of the converted integer, which means negative values will be sign-extended. Sign-extension means that the most significant bit will be copied all the way to the "top" when extending the value. That means the signed byte 0xff will be extended to 0xffffffff when promoted to an int.
Now when your code tries to convert the byte 0x88 it will be treated as the negative number -120, not 136 as you might expect.
There are two possible solutions to this:
Explicitly use unsigned char for the input string:
void str2hex(const unsigned char* inputStr, char* outputStr);
Use the hh prefix in the printf format:
sprintf((char*)(outputStr+i),"%02hhX", inputStr[counter]);
This tells sprintf that the argument is a single byte, and will mask out the upper bits of the (promoted) integer.

ASCII conversion into characters

#include<stdio.h>
int main() {
int i = 3771717;
printf ("%c", i);
return 0;
}
The output is E. Isn't 69 the ASCII for E?
The %c format specifier expects an int argument which is then converted to and unsigned char as printed as a character.
The int value 3771717 gets converted to the unsigned char value 69 as per the conversion rules from a signed integer to unsigned integer. The C standard specifies this is done by repeatedly subtracting one more than the maximum value for unsigned char until it is in range. In practice, that means truncating the value to the low order byte. 3771717 decimal is 398D45 hex so that leaves us with 45 hex which is 69 decimal.
Then the character code for 69 i.e. 'E' is printed.

Dealing with char values over 127 in C

I'm quite new to C programming, and I have some problems trying to assign a value over 127 (0x7F) in a char array. In my program, I work with generic binary data and I don't face any problem printing a previously acquired byte stream (e.g. with fopen or fgets, then processed with some bitwise operations) as %c or %d.But if I try to print a character from its numerical value like this:
printf("%c\n", 128);
it just prints FFFD (the replacement character).Here is another example:
char abc[] = {126, 128, '\0'}; // Manually assigning values
printf("%c", abc[0]); // Prints "~", as expected
printf("%c", 121); // Prints "y"
pritf("%c", abc[1]; // Should print "€", I think, but I get "�"
I'm a bit confused since I can just print every character below 128 in these ways.The reason I'm asking this, is because I need to generate a (pseudo)random byte sequence using the rand() function.Here is an example:
char abc[10];
srand(time(NULL));
abc[0] = rand() % 256; // Gives something between 00:FF ...
printf("%c", abc[0]); // ... but I get "�"
If this is of any help, the source code is encoded in UTF-8, but changing encoding doesn't have any effect.
In C, a char is a different type than unsigned char and signed char. It has the range CHAR_MIN to CHAR_MAX. Yet it has the same range as one of unsigned char/signed char. Typically these are 8-bit types, but could be more. See CHAR_BIT. So the typical range is [0 to 255] or [-128 to 127]
If char is unsigned, abc[1] = 128 is fine. If char is signed, abc[1] = 128 is implementation-defined (see below). The typical I-D is the abc[1] will have the value of -128.
printf("%c\n", 128); will send the int value 128 to printf(). The "%c" will cast that value to an unsigned char. So far no problems. What appears on the output depends on how the output device handles code 128. Perhaps Ç, perhaps something else.
printf("%c", abc[1]; will send 128 or is I-D. If I-D and -128 was sent, then casting -128 to unsigned char is 128 and again the code for 128 is printed.
If the output device is expecting UTF8 sequences, a UTF8 sequence beginning with code 128 is invalid (it is an unexpected continuation byte) and many such systems will print the replacement character which is unicode FFFD.
Converting a value outside the range of of a signed char to char invokes:
the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3
First of all, let me tell you, signed-ness of a char is implementation defined.
If you have to deal with char values over 127, you can use unsigned char. It can handle 0-255.
Also, you should be using %hhu format specifier to print the value of an unsigned char.
If you're dealing with bytes, use unsigned char instead of char for your datatypes.
With regard to printing, you can print the bytes in hex instead of decimal or as characters:
printf("%02X", abc[0]);
You probably don't want to print these bytes as characters, as you'll most likely be dealing with UTF-8 character encoding which doesn't seem to be what you're looking for.

Format specifier for hex char in C

Is there a format specifier for sprintf in C that maps a char to hex in the same way that %x maps an int to hex?
Yes and no.
Since sprintf takes a variable argument list, all arguments undergo default promotion before sprintf receives them. That means sprintf will never receive a char -- a char will always be promoted to int before sprintf receives it (and a short will as well).
Yes, since what sprintf is receiving will be an int, you can use %x to convert it to hex format, and it'll work the same whether that value started as a char, short, or int. If (as is often the case) you want to print 2 characters for each input, you can use %2.2x.
Beware one point though: if your char is signed, and you start with a negative value, the promotion to int will produce the same numerical value, which normally won't be the same bit pattern as the original char, so (for example) a char with the value -1 will normally print out as ffff if int is 16 bits, ffffffff if int is 32 bits, or ffffffffffffffff if int is 64 bits (assuming the typical 2's complement representation for signed integers).
That's the same %x. All char values are converted to int before being passed to sprintf (or any other function that takes variable number of parameters).
printf("%x\n", 'a');
prints 61

are int and char represented using the same bits internally by gcc?

I was playing around with unicode characters (without using wchar_t support) just for fun. I'm only using the regular char data type. I noticed that while printing them in hex they were showing up full 4 bytes instead of just one byte.
For ex. consider this c file:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *s = (char *) malloc(100);
fgets(s, 100, stdin);
while (s && *s != '\0') {
printf("%x\n", *s);
s++;
}
return 0;
}
After compiling with gcc and giving input as 'cent' symbol (hex: c2 a2) I get the following output
$ ./a.out
¢
ffffffc2: ?
ffffffa2: ?
a:
So instead of just printing c2 and a2 I got the whole 4 bytes as if it's an int type.
Does this mean char is not really 1-byte in length, ascii made it look like 1-byte?
Maybe the reason why the upper three bytes become 0xFFFFFF needs a bit more explanation?
The upper three bytes of the value printed for *s have a value of 0xFF due to sign extension.
The char value passed to printf is extended to an int before the call to printf.
This is due to C's default behaviour.
In the absence of signed or unsigned, the compiler can default to interpret char as signed char or unsigned char. It is consistently one or the other unless explicitly changed with a command line option or pragma's. In this case we can see that it is signed char.
In the absence of more information (prototypes or casts), C passes:
int, so char, short, unsigned char unsigned short are converted to int. It never passes a char, unsigned char, signed char, as a single byte, it always passes an int.
unsigned int is the same size as int so the value is passed without change
The compiler needs to decide how to convert the smaller value to an int.
signed values: the upper bytes of the int are sign extended from the smaller value, which effectively copies the top, sign bit, upwards to fill the int. If the top bit of the smaller signed value is 0, the upper bytes are filled with 0. If the top bit of the smaller signed value is 1, the upper bytes are filled with 1. Hence printf("%x ",*s) prints ffffffc2
unsigned values are not sign extended, the upper bytes of the int are 'zero padded'
Hence the reason C can call a function without a prototype (though the compiler will usually warn about that)
So you can write, and expect this to run (though I would hope your compiler issues warnings):
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */
int main (int argc, const char * argv[]) {
signed char schar[] = "\x70\x80";
unsigned char uchar[] = "\x70\x80";
printf("schar[0]=%x schar[1]=%x uchar[0]=%x uchar[1]=%x\n",
schar[0], schar[1], uchar[0], uchar[1]);
return 0;
}
That prints:
schar[0]=70 schar[1]=ffffff80 uchar[0]=70 uchar[1]=80
The char value is interpreted by my (Mac's gcc) compiler as signed char, so the compiler generates code to sign extended the char to the int before the printf call.
Where the signed char value has its top (sign) bit set (\x80), the conversion to int sign extends the char value. The sign extension fills in the upper bytes (in this case 3 more bytes to make a 4 byte int) with 1's, which get printed by printf as ffffff80
Where the signed char value has its top (sign) bit clear (\x70), the conversion to int still sign extends the char value. In this case the sign is 0, so the sign extension fills in the upper bytes with 0's, which get printed by printf as 70
My example shows the case where the value is unsigned char. In these two cases the value is not sign extended because the value is unsigned. Instead they are extended to int with 0 padding. It might look like printf is only printing one byte because the adjacent three bytes of the value would be 0. But it is printing the entire int, it happens that the value is 0x00000070 and 0x00000080 because the unsigned char values were converted to
int without sign extension.
You can force printf to only print the low byte of the int, by using suitable formatting (%hhx), so this correctly prints only the value in the original char:
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */
int main (int argc, const char * argv[]) {
char schar[] = "\x70\x80";
unsigned char uchar[] = "\x70\x80";
printf("schar[0]=%hhx schar[1]=%hhx uchar[0]=%hhx uchar[1]=%hhx\n",
schar[0], schar[1], uchar[0], uchar[1]);
return 0;
}
This prints:
schar[0]=70 schar[1]=80 uchar[0]=70 uchar[1]=80
because printf interprets the %hhx to treat the int as an unsigned char. This does not change the fact that the char was sign extended to an int before printf was called. It is only a way to tell printf how to interpret the contents of the int.
In a way, for signed char *schar, the meaning of %hhx looks slightly misleading, but the '%x' format interprets int as unsigned anyway, and (with my printf) there is no format to print hex for signed values (IMHO it would be a confusing).
Sadly, ISO/ANSI/... don't freely publish our programming language standards, so I can't point to the specification, but searching the web might turn up working drafts. I haven't tried to find them. I would recommend "C: A Reference Manual" by Samuel P. Harbison and Guy L. Steele as a cheaper alternative to the ISO document.
HTH
No. printf is a variable argument function, arguments to a variable argument function will be promoted to an int. And in this case the char was negative, so it gets sign extended.
%x tells printf that the value to print is an unsigned int. So, it promotes the char to an unsigned int, sign extending as necessary and then prints out the resulting value.

Resources