What causes 0xA4 to become 0xffffffa4 when reading a binary file? [duplicate] - c

I'm getting unexpected results when loading a binary file in C.
FILE *bin = NULL;
unsigned long file_length = 0;
bin = fopen("vs.bin", "rb");
fseek(bin, 0, SEEK_END);
file_length = ftell(bin);
fseek(bin, 0, SEEK_SET);
char *buffer = (char *)malloc(file_length);
fread(buffer, 1, file_length, bin);
for(unsigned int i = 0; i < file_length; i++) {
printf("%02x ", buffer[i]);
}
printf("\n");
What I see in the first eight values of output is this:
56 53 48 05 ffffffa4 ffffff8b ffffffef 49
But what I see when I open the binary in a hex editor is this:
56 53 48 05 A4 8B EF 49
What would cause this to happen? There are more instances of this happening throughout but I thought only sharing the first segment would suffice to illustrate the problem.
Thanks for reading.

Change char *buffer to unsigned char *buffer. Also change %02x to %02hhx.
In your C implementation, char is signed. When you read data into a buffer of char, you have signed values. When you use them in an expression (including arguments to printf), some of them have negative values. Additionally, values narrower than int are generally promoted to int. At that point, the char value −92 (which is represented with bits 0xA4) becomes the int value −92 (which is represented with bits 0xFFFFFFA4, in your C implementation).
So you have negative values that are converted to int and then printed with %02x, and %02x shows all the bits of the int. (In %02x, 2 specifies the minimum width; it does not restrict the result to two digits.)
%hhx is a proper conversion specifier for an unsigned char. %x is for unsigned int.
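As a minimal sketch of the fix described above, the helper below formats one byte with %02hhx after storing it as unsigned char. The name byte_to_hex and the static buffer are illustrative assumptions, not part of the original code.

```c
#include <stdio.h>

/* Hypothetical helper: formats one byte as two lowercase hex digits.
 * Returns a pointer to a static buffer (not thread-safe; demo only). */
const char *byte_to_hex(unsigned char byte)
{
    static char text[3];

    /* byte is unsigned char, so it is promoted to an int in 0..255;
     * hh tells printf to convert the value back to unsigned char. */
    snprintf(text, sizeof text, "%02hhx", byte);
    return text;
}
```

Applied to the question's loop, declaring buffer as unsigned char * and calling printf("%s ", byte_to_hex(buffer[i])) should give a4 rather than ffffffa4 for the fifth byte.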

The format specifier %02x specifies the minimum number of digits to be printed out, not the maximum. The values a4, 8b and ef are all negative when interpreted as signed bytes, so what you're seeing is the two's complement representation of these values as 32-bit ints, which is what they're promoted to when passed to printf.
Explicitly declare buffer as unsigned char or uint8_t to avoid this unintended sign-extension, and use the correct format specifier (%hhx for lowercase a-f hex digits, %hhX for uppercase).


Clarifications about unsigned type in C

Hi, I'm currently learning C and there's something that I don't quite understand.
First of all I was told that if I did this:
unsigned int c2 = -1;
printf("c2 = %u\n", c2);
It would output 255, according to this table:
But I get a weird result: c2 = 4294967295
Now what's weirder is that this works:
unsigned char c2 = -1;
printf("c2 = %d\n", c2);
But I don't understand because since a char is, well, a char why does it even print anything? Since the specifier here is %d and not %u as it should be for unsigned types.
The following code:
unsigned int c2 = -1;
printf("c2 = %u\n", c2);
Will never print 255. The table you are looking at refers to an 8-bit unsigned integer. An unsigned int in C must be at least 16 bits wide to comply with the C standard (UINT_MAX is required to be at least 2^16 - 1 in §5.2.4.2.1). Therefore the value you see is going to be much larger than 255. The most common implementations use 32 bits for an int, and in that case you'll see 4294967295 (2^32 - 1).
You can check how many bits are used for any type or variable on your system by computing sizeof(type_or_variable) * CHAR_BIT (CHAR_BIT is defined in limits.h and represents the number of bits per byte, which is almost always 8).
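The sizeof * CHAR_BIT calculation can be sketched as follows. The macro name BITS_OF and the two wrapper functions are hypothetical, chosen only for illustration; note that the result counts storage bits, including any padding bits an implementation may have.

```c
#include <limits.h>
#include <stddef.h>

/* Storage size in bits of a type or expression (hypothetical macro name). */
#define BITS_OF(x) (sizeof(x) * CHAR_BIT)

/* Small wrappers so the values can be printed or checked. */
size_t bits_of_unsigned_char(void) { return BITS_OF(unsigned char); }
size_t bits_of_unsigned_int(void)  { return BITS_OF(unsigned int); }
```

On a typical desktop platform, printing these with %zu is likely to show 8 and 32, but only the minimums (8 and 16) are guaranteed.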
The correct code to obtain 255 as output is:
unsigned char c = -1;
printf("c = %hhu\n", c);
Where the hh prefix specifier means (from man 3 printf):
hh: A following integer conversion corresponds to a signed char or unsigned char argument, or a following n conversion corresponds to a pointer to a signed char argument.
Anything else is implementation-defined or, even worse, undefined behavior.
In this declaration
unsigned char c2 = -1;
the value -1 is converted to unsigned char by reducing it modulo UCHAR_MAX + 1, which on a two's complement machine amounts to truncating its representation to one byte. That is, all bits of the object c2 are set.
In this call
printf("c2 = %d\n", c2);
the argument that has the type unsigned char is promoted to the type int preserving its value that is 255. This value is outputted as an integer.
In this declaration
unsigned int c2 = -1;
there is no truncation. The integer value -1 that usually occupies 4 bytes (according to the size of the type int) is interpreted as an unsigned value with all bits set.
So in this call
printf("c2 = %u\n", c2);
the output is the maximum value of the type unsigned int. It is the maximum value because all bits in the internal representation are set. Converting a negative signed integer to an unsigned integer type of the same or greater width propagates the sign bit across the full width of the unsigned object.
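The two conversions discussed in this answer can be checked with a short sketch. The function names are hypothetical; the comparisons rely only on UCHAR_MAX and UINT_MAX from limits.h, so they do not assume a particular width.

```c
#include <limits.h>

/* Converting -1 to an unsigned type wraps modulo (maximum value + 1). */
unsigned char minus_one_as_uchar(void) { unsigned char c = -1; return c; }
unsigned int  minus_one_as_uint(void)  { unsigned int  u = -1; return u; }

/* An unsigned char argument is promoted to int with its value intact,
 * which is why printing it with %d shows 255 on 8-bit-char systems. */
int promoted_value(unsigned char c) { return c; }
```

With 8-bit chars and 32-bit ints, minus_one_as_uchar() yields 255 and minus_one_as_uint() yields 4294967295, matching the outputs in the question.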
In C, integer types come in several representations, with different storage sizes and value ranges; refer to the table below for more details.

Size of characters in *argv[] when passed argument from command line [duplicate]

I have a simple program.
#include <stdio.h>
#include <string.h>
int main(int argc, char *argv[])
{
for (int i = 0; i < strlen(argv[1]); ++i)
printf("%x ", argv[1][i]);
printf("\n");
}
I run it like
$ ./program 111
31 31 31
But when I run it like
$ ./program ●●●
ffffffe2 ffffff97 ffffff8f ffffffe2 ffffff97 ffffff8f ffffffe2 ffffff97 ffffff8f
Here each ● should be encoded as 3 bytes (UTF-8): e2 97 8f, but it looks like each byte is printed as a whole unsigned int.
I don't understand where the ffffff comes from if sizeof(char) is always 1 byte.
printf() is a function accepting a variable number of arguments.
Any integer argument of a type shorter than int is automatically converted to type int.
Apparently, in your implementation, the "character" little-round-thing is composed of 3 chars, all with a negative value.
Try these
printf("%x ", (unsigned char)argv[1][i]);
printf("%hhx ", argv[1][i]); // thanks to Jonathan Leffler
UTF-8 codeunits for multi-codeunit codepoints (everything but ASCII) are all from 128 to 255, meaning outside the ASCII range.
printf() is a vararg function, and all the arguments passed to the vararg part (all but the format-string) are subject to the standard promotions.
Your implementation's plain char is 8-bit, signed, two's complement, so each of these UTF-8 code units has a negative value between -128 and -1, and after promotion you have an int with that same value.
Then you lie to printf() by asserting it's an unsigned int (%x is for unsigned int), and with two's complement your Undefined Behavior ends up printing a very big unsigned int.
You could get the right result by using %hhx, though strictly speaking you should cast the argument to unsigned char.
I don't understand where the ffffff comes from if sizeof(char) is always 1 byte.
By definition sizeof(char) is 1, but '●' is not a char in the C sense; it produces 3 chars.
Your chars are evidently signed (plain char behaves as signed char in your case), so each input ● produces 3 negative codes. Because each char is converted to an int (32 bits in your case) and the %x format treats the argument as unsigned, you get this output.
You get the same output from printf("%x", -30); -> ffffffe2
Note that for (int i = 0; i < strlen(argv[1]); ++i) is needlessly expensive: the length doesn't change, so it is better to save it, or to just write for (int i = 0; argv[1][i] != 0; ++i)
It would also be better to check that argc is at least 2 before looking into argv[1].
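Putting the suggestions from these answers together, a minimal sketch might look like the following. The function name string_to_hex and the fixed 256-byte static buffer are assumptions made for illustration; the length is computed once, and each char is cast to unsigned char before formatting.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the bytes of s as space-separated
 * lowercase hex in a static buffer (not thread-safe; long inputs
 * are cut off once the buffer is nearly full). */
const char *string_to_hex(const char *s)
{
    static char out[256];
    size_t len = strlen(s);   /* computed once, not on every iteration */
    size_t used = 0;

    out[0] = '\0';
    /* Each byte needs at most 3 characters plus the terminating NUL. */
    for (size_t i = 0; i < len && sizeof out - used >= 4; ++i) {
        used += (size_t)snprintf(out + used, sizeof out - used, "%s%02x",
                                 i ? " " : "",
                                 (unsigned int)(unsigned char)s[i]);
    }
    return out;
}
```

In main, one would check argc >= 2 first and then call puts(string_to_hex(argv[1])); for a single ● in UTF-8 this should print e2 97 8f.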

Why hex encoded characters greater than x7F displays different in printf function?

I expected the code below to show two equal lines:
#include <stdio.h>
int main(void) {
//printf("%x %x %x\n", '\x7F', (unsigned char)'\x8A', (unsigned char)'\x8B');
printf("%x %x %x\n", '\x7F', '\x8A', '\x8B');
printf("%x %x %x\n", 0x7F, 0x8A, 0x8B);
return 0;
}
My output:
7f ffffff8a ffffff8b
7f 8a 8b
I know that it may be an overflow case. But why the ffffff8a (4 bytes)...?
'\x8A' is, according to cppreference,
a single-byte integer character constant, e.g. 'a' or '\n' or '\13'.
What is particularly interesting is the following.
Such constant has type int and a value equal to the representation of c-char in the execution character set as a value of type char mapped to int.
This means that the conversion of '\x8A' to an unsigned int is implementation-defined, because char can be signed or unsigned, depending on the system. If char is signed, as it seems to be the case for you (and is very common), then the value of '\x8A' (which is negative) as a 32-bit int is 0xFFFFFF8A (also negative). However, if char is unsigned, then it becomes 0x0000008A (which is why the commented line in your code works as you'd think it should).
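The printf format specifier %x is used to convert an unsigned int into hexadecimal representation. Here printf expects an unsigned int and you give it an int. Strictly speaking, the standard says that passing an argument of the wrong type to printf is undefined behavior, and a negative int is not a value that unsigned int can represent. In practice, common two's complement implementations print the int's bit pattern as if it were an unsigned int, which is where ffffff8a comes from. Casting the argument to unsigned char or unsigned int makes the call well-defined, because the conversion from int to unsigned int is itself well-defined.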
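The difference between the bare character constant and the cast version can be sketched as below. The function names are hypothetical, and the comments assume 8-bit chars; which value raw_constant returns depends on whether plain char is signed on the implementation.

```c
#include <limits.h>

/* In C, '\x8A' has type int; its value is that of a plain char holding
 * 0x8A, so it is negative when plain char is signed. */
int raw_constant(void) { return '\x8A'; }

/* Casting to unsigned char first always yields 138 (0x8A). */
int cast_constant(void) { return (unsigned char)'\x8A'; }
```

Where char is signed, raw_constant() is expected to return -118, which as a 32-bit pattern is 0xFFFFFF8A; cast_constant() returns 138 either way.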

printf adds extra `FFFFFF` to hex print from a char array [duplicate]

Consider the following simplified code below. I want to extract some binary data/stream from a file and print it to the standard output in hexadecimal format.
I got extra 3 bytes 0xFFFFFF. What's wrong? From where did the extra bytes come?
output
in:
2000FFFFFFAF00690033005A00
out:
2000FFFFFFAF00690033005A00
program.c
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char** argv) {
int i;
char raw[10] = {0x20,0x00,0xAF,0x00,0x69,0x00,0x33,0x00,0x5A,0x00};
FILE *outfile;
char *buf;
printf("in:\n\t");
for( i=0; i<10; i++ )
printf("%02X", raw[i]);
outfile = fopen("raw_data.bin", "w+b");
fwrite(raw, 1, 10, outfile);
buf = (char *) malloc (32 * sizeof(char));
fseek(outfile, 0, SEEK_SET);
fread(buf, 1, 10, outfile);
printf("\nout:\n\t");
for( i=0; i<10; i++ )
printf("%02X", buf[i]);
printf("\n");
fclose(outfile);
return 0;
}
Sign extension. Your compiler is implementing char as a signed char. When you pass the chars to printf they are all being sign extended during their promotion to ints. When the first bit is a 0 this doesn't matter, because it gets extended with 0s.
0xAF in binary is 10101111. Since the first bit is a 1, when passing it to printf it is extended with all 1s in the conversion to int, making it 11111111111111111111111110101111, which is 0xFFFFFFAF, the hex value you have.
Solution: Use unsigned char (instead of char) to prevent the sign extension from occurring in the call
const unsigned char raw[] = {0x20,0x00,0xAF,0x00,0x69,0x00,0x33,0x00,0x5A,0x00};
All of these values in your original example are being sign extended, it's just that 0xAF is the only one with a 1 in the first bit.
Another simpler example of the same behavior:
signed char c = 0xAF; // probably gives an overflow warning
int i = c; // extra 24 bits are all 1
assert( i == 0xFFFFFFAF );
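The same sign extension can be written so that it does not assume a 32-bit int. This is only a sketch with hypothetical function names: the signed result is compared against UINT_MAX - 80 (the unsigned form of -81) instead of the literal 0xFFFFFFAF.

```c
#include <limits.h>

/* Widen a signed char to int (sign-extending), then view it as unsigned. */
unsigned int widen_signed(signed char c)     { return (unsigned int)(int)c; }

/* Widen an unsigned char to int (zero-extending), then view it as unsigned. */
unsigned int widen_unsigned(unsigned char c) { return (unsigned int)(int)c; }
```

With a 32-bit int, widen_signed((signed char)-81) equals 0xFFFFFFAF, while widen_unsigned(0xAF) stays 0xAF.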
That's because 0xAF when converted from a signed character to a signed integer is negative (it is sign extended), and the %02X format is for unsigned arguments and prints the converted value as FFFFFFAF.
The extra characters appear because printf %X will never silently truncate digits off of a value. Non-negative values get sign extended as well, but that just adds zero bits and the value still fits in 2 hex digits, so %02X can make do with two digits of output.
Note that there are 2 C dialects: one where plain char is signed, and one where it is unsigned. In yours it is signed. You may change it using an option, e.g. gcc and clang support -funsigned-char and -fsigned-char.
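To find out which of the two dialects a given build uses, one option is to look at CHAR_MIN from limits.h. The function name below is a hypothetical choice for this sketch.

```c
#include <limits.h>

/* Returns 1 if plain char is signed in this build, 0 if it is unsigned.
 * CHAR_MIN is negative exactly when plain char is a signed type. */
int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

Compiling the same file with -funsigned-char on gcc or clang should flip the result from 1 to 0 on platforms where char is signed by default.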
The printf() is a variadic function and its additional arguments (corresponding with ... part of its prototype) are subject to default argument promotions, thus char is promoted to int.
As your char has a signed (see note 1), two's complement representation, the most significant bit is set to one for the 0xAF element. During promotion the sign bit is propagated, resulting in 0xFFFFFFAF of type int, as presumably sizeof(int) == 4 in your implementation.
By the way, you are invoking undefined behaviour, since the %X format specifier should be used for an object of type unsigned int, or at least for an int whose MSB is unset (this is common, widely accepted practice).
As suggested you may consider use of unambiguous unsigned char type.
1) An implementation may choose between a signed and an unsigned representation of char. It's rather common that char is signed, but you cannot take it for granted for every other compiler on the planet. Some of them may allow you to choose between these two modes, as mentioned in Jens's answer.

Printing hexadecimal characters in C

I'm trying to read in a line of characters, then print out the hexadecimal equivalent of the characters.
For example, if I have a string that is "0xc0 0xc0 abc123", where the first 2 characters are c0 in hex and the remaining characters are abc123 in ASCII, then I should get
c0 c0 61 62 63 31 32 33
However, printf using %x gives me
ffffffc0 ffffffc0 61 62 63 31 32 33
How do I get the output I want without the "ffffff"? And why is it that only c0 (and 80) has the ffffff, but not the other characters?
You are seeing the ffffff because char is signed on your system. In C, vararg functions such as printf will promote all integers smaller than int to int. Since char is an integer (8-bit signed integer in your case), your chars are being promoted to int via sign-extension.
Since c0 and 80 have a leading 1-bit (and are negative as an 8-bit integer), they are being sign-extended while the others in your sample don't.
char int
c0 -> ffffffc0
80 -> ffffff80
61 -> 00000061
Here's a solution:
char ch = 0xC0;
printf("%x", ch & 0xff);
This will mask out the upper bits and keep only the lower 8 bits that you want.
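The masking approach can be wrapped in a small function, sketched here under the assumption of 8-bit chars. The name low_byte is hypothetical; the result is returned as unsigned int so that it matches what %x expects.

```c
/* Keeps only the low 8 bits of a (possibly negative) promoted char.
 * ch is promoted to int before the &, so the mask discards the
 * sign-extended upper bits. */
unsigned int low_byte(char ch)
{
    return (unsigned int)(ch & 0xff);
}
```

A call such as printf("%x", low_byte(ch)) with ch holding 0xC0 should print c0 whether plain char is signed or unsigned.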
Indeed, there is type conversion to int.
Also you can force type to char by using %hhx specifier.
printf("%hhX", a);
In most cases you will want to set the minimum length as well to fill the second character with zeroes:
printf("%02hhX", a);
ISO/IEC 9899:201x says:
7 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
signed char or unsigned char argument (the argument will have
been promoted according to the integer promotions, but its value shall be
converted to signed char or unsigned char before printing); or that
a following
You can create an unsigned char:
unsigned char c = 0xc5;
Printing it will give C5 and not ffffffc5.
Only the chars bigger than 127 are printed with the ffffff because they are negative (char is signed).
Or you can cast the char while printing:
char c = 0xc5;
printf("%x", (unsigned char)c);
You can use hh to tell printf that the argument is an unsigned char. Use 0 to get zero padding and 2 to set the width to 2. x or X for lower/uppercase hex characters.
uint8_t a = 0x0a;
printf("%02hhX", a); // Prints "0A"
printf("0x%02hhx", a); // Prints "0x0a"
Edit: If readers are concerned about 2501's assertion that this is somehow not the 'correct' format specifiers I suggest they read the printf link again. Specifically:
Even though %c expects int argument, it is safe to pass a char because of the integer promotion that takes place when a variadic function is called.
The correct conversion specifications for the fixed-width character types (int8_t, etc) are defined in the header <cinttypes>(C++) or <inttypes.h> (C) (although PRIdMAX, PRIuMAX, etc is synonymous with %jd, %ju, etc).
As for his point about signed vs unsigned, in this case it does not matter since the values must always be positive and easily fit in a signed int. There is no signed hexadecimal format specifier anyway.
Edit 2: ("when-to-admit-you're-wrong" edition):
If you read the actual C11 standard on page 311 (329 of the PDF) you find:
hh: Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
You are probably storing the value 0xc0 in a char variable, which is probably a signed type, so your value is negative (most significant bit set). Then, when printing, it is converted to int, and to keep the same numerical value, the compiler pads the extra bytes with 0xff, so the negative int has the same value as your negative char. To fix this, just cast to unsigned char when printing:
printf("%x", (unsigned char)variable);
You are probably printing from a signed char array. Either print from an unsigned char array or mask the value with 0xff: e.g. ar[i] & 0xFF. The c0 values are being sign extended because the high (sign) bit is set.
Try something like this:
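#include <stdio.h>
int main(void)
{
/* Integer constants are non-negative ints, so nothing is sign-extended. */
printf("%x %x %x %x %x %x %x %x\n",
0xC0, 0xC0, 0x61, 0x62, 0x63, 0x31, 0x32, 0x33);
return 0;
}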
Which produces this:
$ ./foo
c0 c0 61 62 63 31 32 33
