Why does printing char sometimes print a 4-byte number in C

Why does printing a hex representation of a char to the screen using printf sometimes print a 4-byte number?
This is the code I have written:
#include <stdio.h>
#include <stdint.h>

int main(void) {
    char testStream[8] = {'a', 'b', 'c', 'd', 0x3f, 0x9d, 0xf3, 0xb6};
    int i;
    for (i = 0; i < 8; i++) {
        printf("%c = 0x%X, ", testStream[i], testStream[i]);
    }
    return 0;
}
And the following is the output:
a = 0x61, b = 0x62, c = 0x63, d = 0x64, ? = 0x3F, � = 0xFFFFFF9D, � = 0xFFFFFFF3, � = 0xFFFFFFB6

char appears to be signed on your system. With the standard "two's complement" representation of integers, having the most significant bit set means it is a negative number.
In order to pass a char to a vararg function like printf it has to be promoted to an int. To preserve its value, the sign bit is copied to all the new bits (0x9D → 0xFFFFFF9D). Now the %X conversion expects and prints an unsigned int, so you get to see all the set bits in the negative number rather than a minus sign.
If you don't want this, you have to either use unsigned char or cast it to unsigned char when passing it to printf. An unsigned char has no extra bits compared to a signed char and therefore the same bit pattern. When the unsigned value gets extended, the new bits will be zeros and you get what you expected in the first place.
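For illustration, here is a minimal sketch of that fix applied to the code from the question (the cast is the only change):

#include <stdio.h>

int main(void) {
    char testStream[8] = {'a', 'b', 'c', 'd', 0x3f, 0x9d, 0xf3, 0xb6};
    int i;
    for (i = 0; i < 8; i++) {
        /* The cast to unsigned char makes the promotion to int fill the
           new bits with zeros, so %X prints 0x9D instead of 0xFFFFFF9D. */
        printf("%c = 0x%X, ", testStream[i], (unsigned char)testStream[i]);
    }
    printf("\n");
    return 0;
}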

From the C standard's (C11 7.21.6.1/8) description of %X:
The unsigned int argument is converted to unsigned octal (o), unsigned
decimal (u), or unsigned hexadecimal notation (x or X) in the style dddd; the
letters abcdef are used for x conversion and the letters ABCDEF for X
conversion.
You did not provide an unsigned int as argument [1], therefore your code causes undefined behaviour.
In this case the undefined behaviour manifests itself as the implementation of printf handling %X as if you had passed an unsigned int. What you are seeing is the unsigned int value that has the same bit pattern as the negative int you passed.
There's another issue too, with:
char testStream[8] = {'a', 'b', 'c', 'd', 0x3f, 0x9d, 0xf3, 0xb6};
On your system the range of char is -128 to +127. However 0x9d, which is 157, is out of range for char. This causes implementation-defined behaviour (and may raise a signal); the most common implementation definition here is that the char with the same bit-pattern as (unsigned char)0x9d will be selected.
[1] Although it says unsigned int, this section is usually interpreted to mean that a signed int, or any argument of lower rank, with a non-negative value is permitted too.
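A small demo of that implementation-defined conversion (assuming the common definition where the bit pattern is preserved):

#include <stdio.h>

int main(void) {
    char c = 0x9d;                    /* 157 is out of range for a signed char */
    printf("%d\n", c);                /* typically -99: the bit pattern read as signed */
    printf("%d\n", (unsigned char)c); /* 157: the same bit pattern read as unsigned */
    return 0;
}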

On your machine, char is signed by default. Change the type to unsigned char and you'll get the results you are expecting.
A quick explanation of why this is:
In computer systems, the MSB (Most Significant Bit) is the bit with the highest value (the leftmost bit). The MSB of a number is used to determine if the number is positive or negative. Even though a char type is 8 bits long, a signed char can only use 7 bits for the magnitude, because the 8th bit determines whether it is positive or negative. Here is an example:
Data Type: signed char
Decimal: 25
Binary: 00011001
^
|
--- Sign flag. 0 indicates a positive number; 1 indicates a negative number.
Because a signed char uses the 8th bit as a sign flag, the number of bits it can actually use to store a value is 7. The largest value you can store in 7 bits is 127 (7F in hex).
In order to convert a number from positive to negative, computers use something called two's complement. How it works is that all the bits are inverted, then 1 is added to the value. Here's an example:
Decimal: 25
Binary: 00011001
Decimal: -25
Binary: 11100111
When you declared char testStream[8], the compiler assumed you wanted signed chars. When you assigned a value of 0x9D or 0xF3, those numbers were bigger than 0x7F, the biggest number that can fit into 7 bits of a signed char. Therefore, when you tried to printf the value to the screen, it was expanded into an int and the new high bits were filled with ones (the FFs you saw).
I hope this explanation clears things up!
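A short sketch of that invert-and-add-one rule, using the 25/-25 example from above (the final conversion back to signed char is the usual implementation-defined reading):

#include <stdio.h>

int main(void) {
    unsigned char x = 25;                        /* 00011001 */
    unsigned char neg = (unsigned char)(~x + 1); /* invert all bits, then add 1 */
    printf("0x%02X\n", neg);                     /* 0xE7 == 11100111 */
    printf("%d\n", (signed char)neg);            /* typically -25 when read as signed */
    return 0;
}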

char is signed on your platform: the initializer 0x9d for the 6th character is larger than CHAR_MAX (157 > 127), so it is converted to char as the negative value -99 (157 - 256 = -99), stored at offset 5 in testStream.
When you pass testStream[5] as an argument to printf, it is first promoted to int, with a value of -99. printf actually expects an unsigned int for the "%X" format specifier.
On your architecture, int is 32 bits with 2's complement representation of negative values, hence the value -99 passed as int is interpreted as 4294967197 (2^32-99), whose hexadecimal representation is 0xFFFFFF9D. On a different architecture, it could be something else: on 16-bit DOS, you would get 0xFF9D, on a 64-bit Cray you might get 0xFFFFFFFFFFFFFF9D.
To avoid this confusion, you should cast the arguments of printf to (unsigned char). Try replacing your printf with this:
printf("%c = 0x%2X, ", (unsigned char)testStream[i], (unsigned char)testStream[i]);

What seems to happen here is an implicit char -> int -> unsigned int conversion. When a positive char is converted to int nothing bad happens. But negative chars such as 0x9d, 0xf3 and 0xb6 stay negative when converted to int and therefore become 0xffffff9d, 0xfffffff3, 0xffffffb6. Note that the actual value is not changed: 0xffffff9d == -99 as an int, just as 0x9d == -99 as a signed char.
To print them properly you can change your code to
printf("%c = 0x%X, ", testStream[i] & 0xff, testStream[i] & 0xff);

Related

Since characters from -128 to -1 are the same as characters from +128 to +255, what is the point of using unsigned char?

#include <stdio.h>
#include <conio.h>

int main()
{
    char a = -128;
    while (a <= -1)
    {
        printf("%c\n", a);
        a++;
    }
    getch();
    return 0;
}
The output of the above code is same as the output of the code below
#include <stdio.h>
#include <conio.h>

int main()
{
    unsigned char a = +128;
    while (a <= +254)
    {
        printf("%c\n", a);
        a++;
    }
    getch();
    return 0;
}
Then why do we use unsigned char and signed char?
K & R, chapter and verse, p. 43 and 44:
There is one subtle point about the conversion of characters to
integers. The language does not specify whether variables of type char
are signed or unsigned quantities. When a char is converted to an int,
can it ever produce a negative integer? The answer varies from machine
to machine, reflecting differences in architecture. On some machines,
a char whose leftmost bit is 1 will be converted to a negative integer
("sign extension"). On others, a char is promoted to an int by adding
zeros at the left end, and thus is always positive. [...] Arbitrary
bit patterns stored in character variables may appear to be negative
on some machines, yet positive on others. For portability, specify
signed or unsigned if non-character data is to be stored in char
variables.
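If you want to check which choice your implementation made, CHAR_MIN from <limits.h> tells you; a tiny sketch:

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* CHAR_MIN is 0 if plain char is unsigned, negative if it is signed */
    printf("plain char is %s here\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}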
With printing characters - no difference:
With "%c", printf() takes the int argument, converts it to unsigned char, and then prints it.
char a;
printf("%c\n",a); // a is converted to int, then passed to printf()
unsigned char ua;
printf("%c\n",ua); // ua is converted to int, then passed to printf()
With printing values (numbers) - difference when the system's char is signed:
char a = -1;
printf("%d\n",a); // --> -1
unsigned char ua = -1;
printf("%d\n",ua); // --> 255 (Assume 8-bit unsigned char)
Note: Rare machines will have int the same size as char and other concerns apply.
So if code uses a as a number rather than a character, the printing differences are significant.
The bit representation of a number is what the computer stores, but it doesn't mean anything without someone (or something) imposing a pattern onto it.
The difference between the unsigned char and signed char patterns is how we interpret the set bits. In one case we decide that zero is the smallest number and we can add bits until we get to 0xFF or binary 11111111. In the other case we decide that 0x80 is the smallest number and we can add bits until we get to 0x7F.
The reason we have the funny way of representing signed numbers (the latter pattern) is that it places zero 0x00 roughly in the middle of the sequence, and that 0xFF (which is -1, right before zero) plus 0x01 (which is 1, right after zero) add together to carry until all the bits carry off the high end, leaving 0x00 (-1 + 1 = 0). Likewise -5 + 5 = 0 by the same mechanism.
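Here is that carry in miniature (a sketch using unsigned char so the wrap-around is well defined):

#include <stdio.h>

int main(void) {
    unsigned char a = 0xFF;    /* -1 in the signed reading */
    unsigned char b = 0x01;    /* +1 */
    unsigned char sum = a + b; /* the carry falls off the high end */
    printf("0x%02X\n", sum);   /* 0x00, i.e. -1 + 1 == 0 */
    return 0;
}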
For fun, there are a lot of bit patterns that mean different things. For example 0x2a might be what we call a "number" or it might be a * character. It depends on the context we choose to impose on the bit patterns.
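For instance, a minimal sketch printing the same byte both ways:

#include <stdio.h>

int main(void) {
    unsigned char byte = 0x2a;
    printf("%u\n", byte); /* 42: the bits read as a number */
    printf("%c\n", byte); /* '*': the same bits read as a character (on ASCII systems) */
    return 0;
}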
Because unsigned char is used for one-byte integers in C89.
Note there are three distinct char related types in C89: char, signed char, unsigned char.
For character type, char is used.
unsigned char and signed char are used for one-byte integers, like short is used for two-byte integers. You should not really use signed char or unsigned char for characters. Neither should you rely on the ordering of those values.
Different types are created to tell the compiler how to "understand" the bit representation of one or more bytes. For example, say I have a byte which contains 0xFF. If it's interpreted as a signed char, it's -1; if it's interpreted as a unsigned char, it's 255.
In your case, a, no matter whether signed or unsigned, is promoted to int by the integer promotions and passed to printf(), which then implicitly converts it to unsigned char before printing it out as a character.
But let's consider another case:
#include <stdio.h>
#include <string.h>

int main(void)
{
    char a = -1;
    unsigned char b;
    memmove(&b, &a, 1);
    printf("%d %u", a, b);
}
It would be practically acceptable to simply write printf("%d %u", a, a);, but strictly that is undefined behaviour; memmove() is used just to avoid it. The output of the program above on my machine is:
-1 255
while the printf("%d %u", a, a); variant prints:
-1 4294967295
Also, think about this ridiculous question:
Suppose sizeof (int) == 4, since arrays of characters (unsigned
char[]){UCHAR_MIN, UCHAR_MIN, UCHAR_MIN, UCHAR_MIN} to (unsigned
char[]){UCHAR_MAX, UCHAR_MAX, UCHAR_MAX, UCHAR_MAX} are same as
unsigned ints from UINT_MIN to UINT_MAX, then what is the point
of using unsigned int?

What's wrong with this C code?

My sourcecode:
#include <stdio.h>

int main()
{
    char myArray[150];
    int n = sizeof(myArray);
    for (int i = 0; i < n; i++)
    {
        myArray[i] = i + 1;
        printf("%d\n", myArray[i]);
    }
    return 0;
}
I'm using Ubuntu 14 and gcc to compile it, what it prints out is:
1
2
3
...
125
126
127
-128
-127
-126
-125
...
Why doesn't it just count up to 150?
The int value stored in a char can range from 0 to 255 or from -128 to +127 (on common two's-complement implementations), depending on the implementation.
Therefore once the value goes past 127 in your case, the conversion back to char produces a negative value, which is what you see in the output.
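A one-liner illustrating that conversion (the exact result is implementation-defined, as the next answer explains):

#include <stdio.h>

int main(void) {
    char c = 127;
    c = c + 1;         /* 128 does not fit in a signed char */
    printf("%d\n", c); /* typically -128 on two's-complement systems */
    return 0;
}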
The signedness of a plain char is implementation defined.
In your case, a char is a signed char, which can hold the value of a range to -128 to +127.
As you're incrementing the value of i beyond the limit a signed char can hold and assigning it to myArray[i], you're facing implementation-defined behaviour.
To quote C11, chapter §6.3.1.3,
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
Because a char is a SIGNED BYTE. That means its value range is -128 -> 127.
EDIT Due to all the below comment suggesting this is wrong / not the issue / signdness / what not...
Running this code:
char a, b;
unsigned char c, d;
int si, ui, t;
t = 200;
a = b = t;
c = d = t;
si = a + b;
ui = c + d;
printf("Signed:%d | Unsigned:%d", si, ui);
Prints: Signed:-112 | Unsigned:400
The reason is the same. a and b are signed chars (signed variables of size byte, 8 bits). c and d are unsigned. Assigning 200 to the signed variables puts it out of range, so they end up with the value -56. In memory, a, b, c and d all hold the same bit pattern, but when used, their type's signedness dictates how the value is interpreted, and in this case it makes a big difference.
Note about standard
It has been noted (in the comments to this answer, as well as other answers) that the standard doesn't mandate that char is signed. That is true. However, in the case presented by OP, as well the code above, char IS signed.
It seems that your compiler by default treats type char as type signed char. In this case CHAR_MIN is equal to SCHAR_MIN, which is -128, while CHAR_MAX is equal to SCHAR_MAX, which is 127 (see header <limits.h>).
According to the C Standard (6.2.5 Types)
15 The three types char, signed char, and unsigned char are
collectively called the character types. The implementation shall
define char to have the same range, representation, and behavior as
either signed char or unsigned char
For signed types one bit is used as the sign bit. So for type signed char the maximum value corresponds to the following representation in hexadecimal notation
0x7F
and equals 127. The most significant bit is the sign bit and is equal to 0.
For negative values the sign bit is set to 1; for example, -128 is represented as
0x80
When the value stored in the char in your program reaches its positive maximum 0x7F and is increased, it becomes equal to 0x80, which in decimal notation is -128.
You should explicitly use type unsigned char instead of char if you do not want the result of the program to depend on compiler settings.
Or in the printf statement you could explicitly cast type char to type unsigned char. For example
printf("%d\n", ( unsigned char )myArray[i]);
Or to compare results you could write in the loop
printf("%d %d\n", myArray[i], ( unsigned char )myArray[i]);

Char and int16_t array elements both shown as 32-bit hex?

In the example below:
#include <stdio.h>
#include <stdint.h>

int main(int argc, char *argv[])
{
    int16_t array1[] = {0xffff, 0xffff, 0xffff, 0xffff};
    char array2[] = {0xff, 0xff, 0xff, 0xff};

    printf("Char size: %zu \nint16_t size: %zu \n", sizeof(char), sizeof(int16_t));

    if (*array1 == *array2)
        printf("They are the same \n");
    if (array1[0] == array2[0])
        printf("They are the same \n");

    printf("%x \n", array1[0]);
    printf("%x \n", *array1);
    printf("%x \n", array2[0]);
    printf("%x \n", *array2);
}
Output:
Char size: 1
int16_t size: 2
They are the same
They are the same
ffffffff
ffffffff
ffffffff
ffffffff
Why are the 32bit values printed for both char and int16_t and why can they be compared and are considered the same?
They're the same because they're all different representations of -1.
They print as 32 bits' worth of ff because int is 32 bits on your machine, you used %x, and the default argument promotions took place (basically, everything smaller gets promoted to int). Try using %hx. (That'll probably get you ffff; I don't know of a way to get ff here other than by using unsigned char, or masking with & 0xff: printf("%x \n", array2[0] & 0xff).)
Expanding on "They're the same because they're all different representations of -1":
int16_t is a signed 16-bit type. It can contain values in the range -32768 to +32767.
char is an 8-bit type, and on your machine it's evidently signed also. So it can contain values in the range -128 to +127.
0xff is decimal 255, a value which can't be represented in a signed char. If you assign 0xff to a signed char, that bit pattern ends up getting interpreted not as 255, but rather as -1. (Similarly, if you assigned 0xfe, that would be interpreted not as 254, but rather as -2.)
0xffff is decimal 65535, a value which can't be represented in an int16_t. If you assign 0xffff to a int16_t, that bit pattern ends up getting interpreted not as 65535, but rather as -1. (Similarly, if you assigned 0xfffe, that would be interpreted not as 65534, but rather as -2.)
So when you said
int16_t array1[] = {0xffff,0xffff,0xffff,0xffff};
it was basically just as if you'd said
int16_t array1[] = {-1,-1,-1,-1};
And when you said
char array2[] = {0xff,0xff,0xff,0xff};
it was just as if you'd said
char array2[] = {-1,-1,-1,-1};
So that's why *array1 == *array2, and array1[0] == array2[0].
Also, it's worth noting that all of this is very much because of the types of array1 and array2. If you instead said
uint16_t array3[] = {0xffff,0xffff,0xffff,0xffff};
unsigned char array4[] = {0xff,0xff,0xff,0xff};
You would see different values printed (ffff and ff), and the values from array3 and array4 would not compare the same.
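A runnable sketch of that unsigned variant:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint16_t array3[] = {0xffff, 0xffff, 0xffff, 0xffff};
    unsigned char array4[] = {0xff, 0xff, 0xff, 0xff};
    printf("%x %x\n", array3[0], array4[0]); /* ffff ff: both zero-extended */
    printf("%d\n", array3[0] == array4[0]);  /* 0: 65535 != 255 */
    return 0;
}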
Another answer stated that "there is no type information in C at runtime". That's true but misleading in this case. When the compiler generates code to manipulate values from array1, array2, array3, and array4, the code it generates (which of course is significant at runtime!) will be based on their types. In particular, when generating code to fetch values from array1 and array2 (but not array3 and array4), the compiler will use instructions which perform sign extension when assigning to objects of larger type (e.g. 32 bits). That's how 0xff and 0xffff got changed into 0xffffffff.
Because there is no type information in C at runtime, and by using a plain %x for printing, you tell printf that the argument is an unsigned int. The poor library function just trusts you ... see the length modifiers in printf(3) for how to give printf the information it needs.
Using %x to print negative values causes undefined behaviour so you should not assume that there is anything sensible about what you are seeing.
The correct format specifier for char is %hhd, and for int16_t it is "%" PRId16. You will need #include <inttypes.h> to get the latter macro.
Because of the default argument promotions, it is also correct to use %d with char and int16_t [1]. If you change your code to use %d instead of %x, it will no longer exhibit undefined behaviour, and the results will make sense.
[1] The C standard doesn't actually say that, but it's assumed that that was the intent of the writers.
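A short sketch of those specifiers in use (assuming <inttypes.h> is available):

#include <stdio.h>
#include <inttypes.h>

int main(void) {
    int16_t v = -1;
    char c = -1;
    printf("%" PRId16 "\n", v); /* -1, via the fixed-width macro */
    printf("%hhd\n", c);        /* -1, via the hh length modifier */
    printf("%d %d\n", v, c);    /* -1 -1: also fine after the default argument promotions */
    return 0;
}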

Storing a hexadecimal number in a char

#include <stdio.h>

int main(int argc, char const *argv[])
{
    char a = 0xAA;
    int b;
    b = (int)a;
    b = b >> 4;
    printf("%x\n", b);
    return 0;
}
Here the output is fffffffa. Could anyone please explain to me how this output was obtained?
The C standard allows compiler designers to choose whether char is signed or unsigned. It appears that your system uses signed chars and 32-bit ints. Since the most significant bit of 0xAA (binary 10101010) is set, the value gets sign-extended into 0xFFFFFFAA.
Shifting signed values right also sign-extends the result on most implementations (strictly, right-shifting a negative value is implementation-defined), so when you shift out the lower four bits, four ones get shifted in from the left, resulting in the final output of 0xFFFFFFFA.
EDIT: According to the C99 specification, a hexadecimal integer constant such as 0xAA in your example has type int (or a larger integer type, depending on its value). Therefore, assigning 0xAA (decimal 170) to a signed char is out of range; a proper way of assigning the value would be with a hexadecimal escape in a character constant, like this:
char a='\xAA';
It looks like 0xAA got sign extended when you put it into an int to 0xFFFFFFAA. Then, when you right-shifted it by four bits (one hex character) you ended up with 0xFFFFFFFA.
//a is 8-bits wide. If you interpret this as a signed value, it's negative
char a=0xAA;
int b; //b is 32 bits wide here, also signed
//the compiler sign-extends A to 0xFFFFFFAA to keep the value negative
b=(int)a;
b=b>>4; //right-shift maintains the sign-bit, so now you have 0xFFFFFFFA
The standard allows char to be either signed or unsigned; in your case it looks like it is signed. Assigning 0xAA to a signed char is an out-of-range conversion whose result is implementation-defined (strictly speaking it is not signed overflow or undefined behaviour, but it is not portable either). So if you change your declaration to this:
unsigned char a=0xAA;
you should get the results you expect.
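Putting that together, a sketch of the corrected program:

#include <stdio.h>

int main(void) {
    unsigned char a = 0xAA;
    int b = a;         /* zero-extended: b == 0x000000AA */
    b = b >> 4;        /* 0x0000000A */
    printf("%x\n", b); /* prints "a" */
    return 0;
}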

Printing hexadecimal characters in C

I'm trying to read in a line of characters, then print out the hexadecimal equivalent of the characters.
For example, if I have a string that is "0xc0 0xc0 abc123", where the first 2 characters are c0 in hex and the remaining characters are abc123 in ASCII, then I should get
c0 c0 61 62 63 31 32 33
However, printf using %x gives me
ffffffc0 ffffffc0 61 62 63 31 32 33
How do I get the output I want without the "ffffff"? And why is it that only c0 (and 80) has the ffffff, but not the other characters?
You are seeing the ffffff because char is signed on your system. In C, vararg functions such as printf will promote all integers smaller than int to int. Since char is an integer (8-bit signed integer in your case), your chars are being promoted to int via sign-extension.
Since c0 and 80 have a leading 1-bit (and are negative as 8-bit integers), they are being sign-extended, while the others in your sample are not:
char          int
c0      ->    ffffffc0
80      ->    ffffff80
61      ->    00000061
Here's a solution:
char ch = 0xC0;
printf("%x", ch & 0xff);
This will mask out the upper bits and keep only the lower 8 bits that you want.
Indeed, there is a type conversion to int.
You can also force the type back to char by using the %hhx specifier.
printf("%hhX", a);
In most cases you will want to set the minimum width as well, so that single-digit values are padded to two characters with a zero:
printf("%02hhX", a);
ISO/IEC 9899:201x says:
7 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a
signed char or unsigned char argument (the argument will have
been promoted according to the integer promotions, but its value shall be
converted to signed char or unsigned char before printing); or that
a following n conversion specifier applies to a pointer to a signed char argument.
You can create an unsigned char:
unsigned char c = 0xc5;
Printing it with %x will give c5 and not ffffffc5.
Only the chars bigger than 127 are printed with the ffffff because they are negative (char is signed).
Or you can cast the char while printing:
char c = 0xc5;
printf("%x", (unsigned char)c);
You can use hh to tell printf that the argument is an unsigned char. Use 0 to get zero padding and 2 to set the width to 2. x or X for lower/uppercase hex characters.
uint8_t a = 0x0a;
printf("%02hhX", a); // Prints "0A"
printf("0x%02hhx", a); // Prints "0x0a"
Edit: If readers are concerned about 2501's assertion that these are somehow not the 'correct' format specifiers, I suggest they read the printf link again. Specifically:
Even though %c expects int argument, it is safe to pass a char because of the integer promotion that takes place when a variadic function is called.
The correct conversion specifications for the fixed-width character types (int8_t, etc.) are defined in the header <cinttypes> (C++) or <inttypes.h> (C) (although PRIdMAX, PRIuMAX, etc. are synonymous with %jd, %ju, etc.).
As for his point about signed vs. unsigned, in this case it does not matter since the values must always be positive and easily fit in a signed int. There is no signed hexadecimal format specifier anyway.
Edit 2: ("when-to-admit-you're-wrong" edition):
If you read the actual C11 standard on page 311 (329 of the PDF) you find:
hh: Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
You are probably storing the value 0xc0 in a char variable, which is probably a signed type, and your value is negative (most significant bit set). When printing, it is converted to int, and to preserve its value the compiler fills the extra bytes with 0xff (sign extension), so the negative int has the same numerical value as your negative char. To fix this, just cast to unsigned char when printing:
printf("%x", (unsigned char)variable);
You are probably printing from a signed char array. Either print from an unsigned char array or mask the value with 0xff: e.g. ar[i] & 0xFF. The c0 values are being sign extended because the high (sign) bit is set.
Try something like this:
#include <stdio.h>

int main()
{
    printf("%x %x %x %x %x %x %x %x\n",
           0xC0, 0xC0, 0x61, 0x62, 0x63, 0x31, 0x32, 0x33);
}
Which produces this:
$ ./foo
c0 c0 61 62 63 31 32 33
