Printing hexadecimal characters in C

I'm trying to read in a line of characters, then print out the hexadecimal equivalent of the characters.
For example, if I have a string that is "0xc0 0xc0 abc123", where the first 2 characters are c0 in hex and the remaining characters are abc123 in ASCII, then I should get
c0 c0 61 62 63 31 32 33
However, printf using %x gives me
ffffffc0 ffffffc0 61 62 63 31 32 33
How do I get the output I want without the "ffffff"? And why is it that only c0 (and 80) has the ffffff, but not the other characters?

You are seeing the ffffff because char is signed on your system. In C, vararg functions such as printf will promote all integers smaller than int to int. Since char is an integer (8-bit signed integer in your case), your chars are being promoted to int via sign-extension.
Since c0 and 80 have a leading 1-bit (and are negative as an 8-bit integer), they are being sign-extended while the others in your sample don't.
char    int
c0  ->  ffffffc0
80  ->  ffffff80
61  ->  00000061
Here's a solution:
char ch = 0xC0;
printf("%x", ch & 0xff);
This will mask out the upper bits and keep only the lower 8 bits that you want.
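As a minimal sketch of the masking approach, the helper below (char_to_hex is a hypothetical name, not from the question) formats one char into a caller-supplied buffer. It assumes only the standard snprintf.

```c
#include <stdio.h>

/* Hypothetical helper: writes the hex digits of one char into out.
   The char is promoted to int first; masking with 0xff then discards
   any sign-extended upper bits, leaving a value in the range 0..255. */
int char_to_hex(char ch, char *out, size_t out_size)
{
    return snprintf(out, out_size, "%x", (unsigned int)(ch & 0xff));
}
```

Calling char_to_hex((char)0xC0, buf, sizeof buf) should leave "c0" in buf whether plain char is signed or unsigned on the platform.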

Indeed, the argument is converted to int.
You can also make printf treat the argument as a char by using the hh length modifier, as in %hhx.
printf("%hhX", a);
In most cases you will want to set the minimum length as well to fill the second character with zeroes:
printf("%02hhX", a);
ISO/IEC 9899:201x says:
7 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
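A small sketch of the %02hhX form, assuming a hosted C99-or-later library whose printf supports the hh modifier. The helper name byte_to_hex2 is made up for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: formats one char as exactly two uppercase hex digits.
   The char is still promoted to int when passed, but the hh modifier tells
   printf to convert the value back to unsigned char before printing. */
int byte_to_hex2(char value, char *out, size_t out_size)
{
    return snprintf(out, out_size, "%02hhX", value);
}
```

With this, a char holding 0x0a is formatted as "0A", and a char holding the bit pattern 0xC0 is formatted as "C0" rather than "FFFFFFC0".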

You can create an unsigned char:
unsigned char c = 0xc5;
Printing it will give C5 and not ffffffc5.
Only the chars bigger than 127 are printed with the ffffff because they are negative (char is signed).
Or you can cast the char while printing:
char c = 0xc5;
printf("%x", (unsigned char)c);

You can use hh to tell printf that the argument is an unsigned char. Use 0 to get zero padding and 2 to set the width to 2. x or X for lower/uppercase hex characters.
uint8_t a = 0x0a;
printf("%02hhX", a); // Prints "0A"
printf("0x%02hhx", a); // Prints "0x0a"
Edit: If readers are concerned about 2501's assertion that this is somehow not the 'correct' format specifiers I suggest they read the printf link again. Specifically:
Even though %c expects int argument, it is safe to pass a char because of the integer promotion that takes place when a variadic function is called.
The correct conversion specifications for the fixed-width character types (int8_t, etc) are defined in the header <cinttypes>(C++) or <inttypes.h> (C) (although PRIdMAX, PRIuMAX, etc is synonymous with %jd, %ju, etc).
As for his point about signed vs unsigned, in this case it does not matter since the values must always be positive and easily fit in a signed int. There is no signed hexadecimal format specifier anyway.
Edit 2: ("when-to-admit-you're-wrong" edition):
If you read the actual C11 standard on page 311 (329 of the PDF) you find:
hh: Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
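For the fixed-width types, the <inttypes.h> macros mentioned above can be used instead of spelling out hh by hand. This is only a sketch: u8_to_hex is a hypothetical helper, and it assumes the implementation provides uint8_t (and therefore PRIx8).

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: formats a uint8_t as "0x" plus two lowercase hex digits.
   PRIx8 expands to the conversion specifier that matches uint8_t. */
int u8_to_hex(uint8_t value, char *out, size_t out_size)
{
    return snprintf(out, out_size, "0x%02" PRIx8, value);
}
```

Because uint8_t is unsigned, the promoted argument is never negative, so no ffffff prefix can appear.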

You are probably storing the value 0xc0 in a char variable, which is probably a signed type, so your value is negative (most significant bit set). Then, when printing, it is converted to int, and to preserve the value, the extra bytes are filled with 0xff, so the negative int has the same numerical value as your negative char. To fix this, just cast to unsigned char when printing:
printf("%x", (unsigned char)variable);

You are probably printing from a signed char array. Either print from an unsigned char array or mask the value with 0xff: e.g. ar[i] & 0xFF. The c0 values are being sign extended because the high (sign) bit is set.

Try something like this:
int main()
{
printf("%x %x %x %x %x %x %x %x\n",
0xC0, 0xC0, 0x61, 0x62, 0x63, 0x31, 0x32, 0x33);
}
Which produces this:
$ ./foo
c0 c0 61 62 63 31 32 33

Related

What happens when %x reads a signed char?

I thought *(p3 + 3) would print 90, but it shows ffffff90.
Why does this happen?
I guess the MSB is 1, and since %x reads an unsigned hexadecimal integer, it reads 90 as a negative integer. But this is not clear to me, and I can't find anything about this problem in the printf reference:
https://cplusplus.com/reference/cstdio/printf/
Can anyone explain this?
Use an unsigned char *.
In your environment,
char is signed.
char is 8 bits.
Signed numbers use two's complement.
So you have a char with a bit pattern of 0x90. In this environment, that's -112. So you are effectively doing the following:
printf( "%x", (char)-112 );
When passing to a variadic function like printf, the smaller integer types are implicitly promoted to int or unsigned int. So what's really happening is this:
printf( "%x", (int)(char)-112 );
So you're passing the int value -112. On your machine, that has the bit pattern FF FF FF 90 in hexadecimal (in some unknown byte order). %x expects an unsigned integer, and thus prints that bit pattern as-is.

I learned that in C language char type ranges from -128 to 127, but it doesn't seem like that

This might be a very basic problem, but I couldn't manage to.
Here is what I am working with.
#include <stdio.h>
int main(void)
{
char c1, c2;
int s;
c1 = 128;
c2 = -128;
s = sizeof(char);
printf("size of char: %d\n", s);
printf("c1: %x, c2: %x\n", c1, c2);
printf("true or false: %d\n", c1 == c2);
}
The result is like this.
size of char: 1
c1: ffffff80, c2: ffffff80
true or false: 1
I assigned the value 128 to a signed (plain) char, but it didn't overflow.
In addition, c1 and c2 both seem to hold 4 bytes, and -128 and 128 are the same value.
How can I understand these facts? I need your help. Thank you very much.
In c1 = 128;, 128 does not fit in the signed eight-bit char that your C implementation uses. 128 is converted to char per C 2018 6.5.16.1 2: “the value of the right operand is converted to the type of the assignment expression…”
The conversion is implementation-defined, per 6.3.1.3 3: “Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.” Your C implementation converted 128, which is 10000000 as an unsigned binary numeral, to −128, which is represented with the same bits when using two’s complement for signed binary. Thus, the result is that c1 contains the value −128.
In printf("c1: %x, c2: %x\n", c1, c2);, c1 is converted to an int. This is because the rules for calling functions with ... parameters are to apply the default argument promotions to the corresponding arguments, per 6.5.2.2 7: “The default argument promotions are performed on trailing arguments.”
The default argument promotions include the integer promotions, per 6.5.2.2 6. When the range of char is narrower than int, as it is in most C implementations, the integer promotions convert a char to an int, per 6.3.1.1 2: “If an int can represent all values of the original type…, the value is converted to an int…”
Thus, in printf("c1: %x, c2: %x\n", c1, c2);, an int value of −128 is passed as the second argument. Your C implementation uses 32-bit two’s complement for int, in which −128 is represented with the bits 11111111111111111111111110000000, which we can express in hexadecimal as ffffff80.
The format string specifies a conversion using %x. The proper argument type for %x is unsigned int. However, your C implementation has accepted the int and reinterpreted its bits as an unsigned int. Thus, the bits 11111111111111111111111110000000 are converted to the string “ffffff80”.
This explains why “ffffff80” is printed. It is not because c1 has four bytes but because it was converted to a four-byte type before being passed to printf. Further, the conversion of a negative value to that four-byte type resulted in four bytes with many bits set.
Regarding c1 == c2 evaluating to true (1), this is simply because c1 was given the value −128 as explained above, and c2 = -128; also assigns the value −128 to c2, so c1 and c2 have the same value.
The type char can behave as the type signed char or as the type unsigned char depending on a compiler option or default settings of the compiler.
In your case the type char behaves as the type signed char. In this case CHAR_MIN is equal to -128 and CHAR_MAX is equal to 127.
So an object of the type char can not hold the positive number 128. Internally this value has the following hexadecimal representation 0x80. So stored in an object of the type char it is interpreted as a negative value because the sign bit is set. This negative value is -128.
So after these statements
c1 = 128;
c2 = -128;
the both objects have the same value equal to -128.
And the output
c1: ffffff80, c2: ffffff80
of this call
printf("c1: %x, c2: %x\n", c1, c2);
shows that the both objects c1 and c2 promoted to the type int have the same representation of a negative value.
Pay attention to that it is implementation-defined behavior to assign an object of the signed type with a positive value that can not be represented in the object.
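Whether plain char is signed, and whether a given value fits in it, can be checked from <limits.h> rather than guessed. The sketch below uses two hypothetical helper names and relies only on CHAR_MIN and CHAR_MAX.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when plain char behaves like signed char. */
bool plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Hypothetical helper: true when value can be stored in a plain char
   without an implementation-defined conversion. */
bool fits_in_plain_char(int value)
{
    return value >= CHAR_MIN && value <= CHAR_MAX;
}
```

On an implementation like the one in the question (signed 8-bit char), fits_in_plain_char(128) is false, which is why c1 ends up holding -128 instead.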
In the statement
printf("c1: %x, c2: %x\n", c1, c2);
%x expects an argument of type unsigned int, but c1 and c2 are promoted from char to int by the default argument promotions, with the sign bit extended. To print the numeric value of an unsigned char as hex, you need to use the hh length modifier in the conversion:
printf("c1: %hhx, c2: %hhx\n", c1, c2 );
As for the values that can be represented in a char, it's a little more complicated than that.
The encodings for members of the basic character set [1] are guaranteed to be non-negative. Encodings for additional characters may be negative or non-negative.
Thus, depending on the implementation, a plain char may represent values in at least the range [-128..127] (assuming two's complement representation) or [0..255]. I say "at least" since CHAR_BIT may be more than 8 (there are historical systems that used 9-bit bytes and 36-bit words). A signed char will represent values in at least the range [-128..127] (again, assuming two's complement).
Assuming char is signed and 8 bits, then assigning 128 to c1 converts an out-of-range value to a signed type. That is not undefined behavior: per 6.3.1.3, either the result is implementation-defined or an implementation-defined signal is raised. Most implementations wrap the value to -128, but the language definition does not require that particular result.
[1] Upper- and lowercase Latin alphabet, decimal digits, 29 graphical characters, whitespace and control characters (line feed, form feed, tab, etc.).
It's explained here: https://en.wikipedia.org/wiki/Signed_number_representations
If -128 and 128 and all numbers in between were represented with a byte, we would have 257 numbers in that set. However, a byte has only 256 values.
The mapping is as follows, in decimal: [0..127,-128..-1] => [0b00000000..0b11111111]. Note that the first bit becomes 1 at -128, happy accident ;).
Also, your format string is incorrect and your compiler should warn you: %x expects an unsigned int (4 bytes on your system), not a char. If you take into account what I said earlier, you can see that 0x80 is indeed 0b10000000.

What causes 0xA4 to become 0xffffffa4 when reading a binary file? [duplicate]

I'm getting unexpected results when loading a binary file in C.
FILE *bin = NULL;
unsigned long file_length = 0;
bin = fopen("vs.bin", "rb");
fseek(bin, 0, SEEK_END);
file_length = ftell(bin);
fseek(bin, 0, SEEK_SET);
char *buffer = (char *)malloc(file_length);
fread(buffer, 1, file_length, bin);
for(unsigned int i = 0; i < file_length; i++) {
printf("%02x ", buffer[i]);
}
printf("\n");
What I see in the first eight values of output is this:
56 53 48 05 ffffffa4 ffffff8b ffffffef 49
But what I see when I open the binary in a hex editor is this:
56 53 48 05 A4 8B EF 49
What would cause this to happen? There are more instances of this happening throughout but I thought only sharing the first segment would suffice to illustrate the problem.
Thanks for reading.
Change char *buffer to unsigned char *buffer. Also change %02x to %02hhx.
In your C implementation, char is signed. When you read data into a buffer of char, you have signed values. When you use them in an expression (including arguments to printf), some of them have negative values. Additionally, values narrower than int are generally promoted to int. At that point, the char value −92 (which is represented with bits 0xA4) becomes the int value −92 (which is represented with bits 0xFFFFFFA4, in your C implementation).
So you have negative values that are converted to int and then printed with %02x, and %02x shows all the bits of the int. (In %02x, 2 specifies the minimum width; it does not restrict the result to two digits.)
%hhx is a proper conversion specifier for an unsigned char. %x is for unsigned int.
The format specifier %02x specifies the minimum number of digits to be printed out, not the maximum. The values a4, 8b and ef are all negative when interpreted as signed bytes, so what you're seeing is the two's complement representation of these values as 32-bit ints, which is what they're promoted to when passed to printf.
Explicitly name buffer as unsigned char or uint8_t to avoid this unintended sign-extension, and use the correct format specifier (%hhx for lowercase a-f hex digits, %hhX for uppercase).
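A minimal sketch of the fix, assuming the data arrives through an already-open FILE * (hex_dump_stream is a hypothetical helper, not the asker's code). It reads into an unsigned char buffer in fixed-size chunks, so no byte can be sign-extended.

```c
#include <stdio.h>

/* Hypothetical helper: copies every byte of in to out as two hex digits
   followed by a space, then a newline. Returns the number of bytes read. */
size_t hex_dump_stream(FILE *in, FILE *out)
{
    unsigned char chunk[256];   /* unsigned, so bytes >= 0x80 stay in 0..255 */
    size_t total = 0;
    size_t got;

    while ((got = fread(chunk, 1, sizeof chunk, in)) > 0) {
        for (size_t i = 0; i < got; i++) {
            /* An unsigned char promotes to a non-negative int. */
            fprintf(out, "%02x ", (unsigned int)chunk[i]);
        }
        total += got;
    }
    fputc('\n', out);
    return total;
}
```

Reading in chunks also avoids the fseek/ftell/malloc sequence and its unchecked return values. A caller would still need to check that fopen succeeded before passing the stream in.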

Why hex encoded characters greater than x7F displays different in printf function?

I expected the code below show two equal lines:
#include <stdio.h>
int main(void) {
//printf("%x %x %x\n", '\x7F', (unsigned char)'\x8A', (unsigned char)'\x8B');
printf("%x %x %x\n", '\x7F', '\x8A', '\x8B');
printf("%x %x %x\n", 0x7F, 0x8A, 0x8B);
return 0;
}
My output:
7f ffffff8a ffffff8b
7f 8a 8b
I know that is maybe an overflow case. But why the ffffff8a (4 bytes)...?
'\x8A' is, according to cppreference,
a single-byte integer character constant, e.g. 'a' or '\n' or '\13'.
What is particularly interesting is the following.
Such constant has type int and a value equal to the representation of c-char in the execution character set as a value of type char mapped to int.
This means that the conversion of '\x8A' to an unsigned int is implementation-defined, because char can be signed or unsigned, depending on the system. If char is signed, as it seems to be the case for you (and is very common), then the value of '\x8A' (which is negative) as a 32-bit int is 0xFFFFFF8A (also negative). However, if char is unsigned, then it becomes 0x0000008A (which is why the commented line in your code works as you'd think it should).
The printf format specifier %x is used to convert an unsigned integer into hexadecimal representation. Although printf expects an unsigned int and you give it an int, and even though the standard says that passing an incorrect type to printf is (generally) undefined behavior, it isn't in your case. This is because the conversion from int to unsigned int is well-defined, even though the opposite isn't.
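One way to make a character constant behave the same on every implementation is to convert it to unsigned char before using it as a number, as the commented-out line in the question does. A sketch, with byte_value as a hypothetical helper name:

```c
/* Hypothetical helper: maps a character constant (which has type int in C)
   to its byte value in the range 0..UCHAR_MAX. The conversion to
   unsigned char is well-defined even when the int is negative. */
unsigned int byte_value(int character_constant)
{
    return (unsigned char)character_constant;
}
```

With 8-bit bytes, byte_value('\x8A') is 0x8A whether plain char is signed or unsigned, so printing it with %x gives 8a in both cases.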

Why does printing char sometimes print 4 bytes number in C

Why does printing a hex representation of char to the screen using printf sometimes prints a 4 byte number?
This is the code I have written
#include <stdio.h>
#include <stdint.h>
int main(void) {
char testStream[8] = {'a', 'b', 'c', 'd', 0x3f, 0x9d, 0xf3, 0xb6};
int i;
for(i=0;i<8;i++){
printf("%c = 0x%X, ", testStream[i], testStream[i]);
}
return 0;
}
And following is the output:
a = 0x61, b = 0x62, c = 0x63, d = 0x64, ? = 0x3F, � = 0xFFFFFF9D, � = 0xFFFFFFF3, � = 0xFFFFFFB6
char appears to be signed on your system. With the standard "two's complement" representation of integers, having the most significant bit set means it is a negative number.
In order to pass a char to a vararg function like printf it has to be expanded to an int. To preserve its value the sign bit is copied to all the new bits (0x9D → 0xFFFFFF9D). Now the %X conversion expects and prints an unsigned int and you get to see all the set bits in the negative number rather than a minus sign.
If you don't want this, you have to either use unsigned char or cast it to unsigned char when passing it to printf. An unsigned char has no extra bits compared to a signed char and therefore the same bit pattern. When the unsigned value gets extended, the new bits will be zeros and you get what you expected in the first place.
From the C standard (C11 7.21.6.1/8) description of %X:
The unsigned int argument is converted to unsigned octal (o), unsigned
decimal (u), or unsigned hexadecimal notation (x or X) in the style dddd; the
letters abcdef are used for x conversion and the letters ABCDEF for X
conversion.
You did not provide an unsigned int as argument [1], therefore your code causes undefined behaviour.
In this case the undefined behaviour manifests itself as printf's %X code behaving as if it had been passed an unsigned int. What you are seeing is the unsigned int value which has the same bit-pattern as the negative integer value you gave as argument.
There's another issue too, with:
char testStream[8] = {'a', 'b', 'c', 'd', 0x3f, 0x9d, 0xf3, 0xb6};
On your system the range of char is -128 to +127. However 0x9d, which is 157, is out of range for char. This causes implementation-defined behaviour (and may raise a signal); the most common implementation definition here is that the char with the same bit-pattern as (unsigned char)0x9d will be selected.
[1] Although it says unsigned int, this section is usually interpreted to mean that a signed int, or any argument of lower rank, with a non-negative value is permitted too.
On your machine, char is signed by default. Change the type to unsigned char and you'll get the results you are expecting.
A Quick explanation on why this is
In computer systems, the MSB (Most Significant Bit) is the bit with the highest value (the leftmost bit). In a signed type, the MSB determines whether the number is positive or negative. Even though a char type is 8 bits long, a signed char can only use 7 bits for the magnitude because the 8th bit determines whether it is positive or negative. Here is an example:
Data Type: signed char
Decimal:   25
Binary:    00011001
           ^
           |
           --- Sign flag. 0 indicates a positive number. 1 indicates a negative number.
Because a signed char uses the 8th bit as a sign flag, the number of bits it can actually use to store a number is 7 bits. The largest value you can store in 7 bits is 127 (7F in hex).
In order to convert a number from positive to negative, computers use something called two's complement. It works by inverting all the bits, then adding 1 to the value. Here's an example:
Decimal: 25
Binary: 00011001
Decimal: -25
Binary: 11100111
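The invert-and-add-one rule can be checked with a few lines of C. This is only a sketch with a hypothetical helper name (negate8), working on 8-bit patterns held in an unsigned char.

```c
/* Hypothetical helper: returns the 8-bit two's complement of value,
   i.e. the bit pattern that represents -value in a signed 8-bit type. */
unsigned char negate8(unsigned char value)
{
    /* Invert every bit, add 1, then keep only the low 8 bits. */
    return (unsigned char)(~(unsigned int)value + 1u);
}
```

For the example above, negate8(25) yields 0xE7, which is 11100111 in binary, and applying negate8 to 0xE7 gives 25 back.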
When you declared char testStream[8], the compiler assumed you wanted signed chars. When you assigned a value of 0x9D or 0xF3, those numbers were bigger than 0x7F, which is the biggest number that can fit into the 7 bits of a signed char. Therefore, when you tried to printf the value to the screen, it was expanded into an int and filled with FFs.
I hope this explanation clears things up!
char is signed on your platform: the initializer 0x9d for the 6th character is larger than CHAR_MAX (157 > 127), so it is converted to char as the negative value -99 (157 - 256 = -99) and stored at offset 5 in testStream.
When you pass testStream[5] as an argument to printf, it is first promoted to int, with a value of -99. printf actually expects an unsigned int for the "%X" format specifier.
On your architecture, int is 32 bits with 2's complement representation of negative values, hence the value -99 passed as int is interpreted as 4294967197 (2^32-99), whose hexadecimal representation is 0xFFFFFF9D. On a different architecture, it could be something else: on 16-bit DOS, you would get 0xFF9D, on a 64-bit Cray you might get 0xFFFFFFFFFFFFFF9D.
To avoid this confusion, you should cast the operands of printf as (unsigned char). Try replacing your printf with this:
printf("%c = 0x%2X, ", (unsigned char)testStream[i], (unsigned char)testStream[i]);
What seems to happen here is an implicit char -> int -> unsigned int conversion. When a positive char is converted to int, nothing bad happens. But negative chars such as 0x9d, 0xf3 and 0xb6 stay negative when converted to int, and therefore they become 0xffffff9d, 0xfffffff3 and 0xffffffb6. Note that the actual value is not changed: as an int 0xffffff9d is -99, and as a char 0x9d is -99.
To print them properly you can change your code to
printf("%c = 0x%X, ", testStream[i] & 0xff, testStream[i] & 0xff);
