Defined behavior: passing a char to printf("%02X", c)

I recently came across this question, where the OP was having issues printing the hexadecimal value of a variable. I believe the problem can be summed up by the following code:
#include <stdio.h>
int main() {
char signedChar = 0xf0;
printf("Signed\n”);
printf(“Raw: %02X\n”, signedChar);
printf(“Masked: %02X\n”, signedChar &0xff);
printf(“Cast: %02X\n", (unsigned char)signedChar);
return 0;
}
This gives the following output:
Signed
Raw: FFFFFFF0
Masked: F0
Cast: F0
The format string used for each of the prints is %02X, which I’ve always interpreted as ‘print the supplied int as a hexadecimal value with at least two digits’.
The first case passes signedChar as a parameter and prints out the wrong value (because the other three bytes of the int have all of their bits set).
The second case gets around this problem, by applying a bit mask (0xFF) against the value to remove all but the least significant byte, where the char is stored. Should this work? Surely: signedChar == signedChar & 0xFF?
The third case gets around the problem by casting the character to an unsigned char (which seems to clear the top three bytes?).
For each of the three cases above, can anybody tell me whether the behavior is defined? How/where?

I don't think this behavior is completely defined by the C standard. After all, it depends on the binary representation of signed values. I will just describe how it's likely to work.
printf("Raw: %02X\n", signedChar);
(char)0xf0, which can be written as (char)-16, is converted to (int)-16; its hex representation is 0xfffffff0.
printf("Masked: %02X\n", signedChar & 0xff);
0xff is of type int, so before the & is computed, signedChar is converted to (int)-16.
((int)-16) & ((int)0xff) == (int)0x000000f0.
printf("Cast: %02X\n", (unsigned char)signedChar);
(unsigned char)0xf0, which can be written as (unsigned char)240, is promoted to (int)240; as hex that's 0x000000f0.
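Pulling the three cases together, here is a minimal sketch; the output shown assumes a 32-bit int, two's complement, and a plain char that is signed. The %02hhX line is an extra C99 option not mentioned in the question.
#include <stdio.h>

int main(void) {
    char signedChar = 0xf0;   /* implementation-defined result if plain char is signed */

    printf("Raw:    %02X\n", signedChar);                  /* FFFFFFF0: sign-extended int */
    printf("Masked: %02X\n", signedChar & 0xff);           /* F0 */
    printf("Cast:   %02X\n", (unsigned char)signedChar);   /* F0 */
    printf("hh:     %02hhX\n", (unsigned char)signedChar); /* F0: C99 length modifier */
    return 0;
}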

Related

Operator "<<= " : What does it it mean?

I need help sorting this problem out in my mind, so if anyone has had a similar problem it would help me.
Here's my code:
char c=0xAB;
printf("01:%x\n", c<<2);
printf("02:%x\n", c<<=2);
printf("03:%x\n", c<<=2);
Why does the program print:
01:fffffeac
02:ffffffac
03:ffffffb0
What I expected it to print, that is, what I got on paper, is:
01:fffffeac
02:fffffeac
03:fffffab0
I obviously realized I didn't know what the operator <<= was doing, I thought c = c << 2.
If anyone can clarify this, I would be grateful.
You're correct in thinking that
c <<= 2
is equivalent to
c = c << 2
But you have to remember that c is a single byte: on almost all systems it can hold only eight bits, while a value like 0xeac requires 12 bits.
When the value 0xeac is assigned back to c, it is truncated and the top bits are simply discarded, leaving you with 0xac (which, when promoted to an int, becomes 0xffffffac).
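A minimal sketch of that truncation, using unsigned char so the shift itself stays well defined; the low byte matches the ac the question's output shows.
#include <stdio.h>

int main(void) {
    unsigned char c = 0xAB;

    printf("c << 2 = %x\n", c << 2);   /* 2ac: the shift happens on the promoted int, nothing lost */
    c <<= 2;                           /* the 12-bit result 0x2ac is truncated to the low byte     */
    printf("c      = %x\n", c);        /* ac */
    return 0;
}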
<<= means shift and assign. It's the compound assignment version of c = c << 2;.
There are several problems here:
char c=0xAB; is not guaranteed to give a positive result, since char could be an 8 bit signed type. See Is char signed or unsigned by default?. In which case 0xAB will get translated to a negative number in an implementation-defined way. Avoid this bug by always using uint8_t when dealing with raw binary bytes.
c<<2 is subject to Implicit type promotion rules - specifically, c will get promoted to a signed int. If the previous issue occurred and your char got a negative value, c now holds a negative int.
Left-shifting negative values in C invokes undefined behavior - it is always a bug. Shifting signed operands in general is almost never correct.
%x isn't a suitable format specifier to print the int you ended up with, nor is it suitable for char.
As for how to fix the code, it depends on what you wish to achieve. In general it's recommended to cast to an unsigned type such as uint32_t before shifting.
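A minimal sketch of such a fix, assuming the goal is simply to shift the raw byte value and see all of the result (the question doesn't state the intent):
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    uint8_t c = 0xAB;                        /* raw byte: always unsigned, always 8 bits */

    uint32_t shifted = (uint32_t)c << 2;     /* shift in a wide unsigned type: no sign extension, no UB */
    printf("01:%" PRIx32 "\n", shifted);     /* 2ac */

    c = (uint8_t)shifted;                    /* truncate back to one byte only if that is intended */
    printf("02:%" PRIx32 "\n", (uint32_t)c); /* ac */
    return 0;
}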

wrong conversion of two bytes array to short in c

I'm trying to convert a 2-byte array to an unsigned short.
This is the code for the conversion:
short bytesToShort(char* bytesArr)
{
short result =(short)((bytesArr[1] << 8)|bytesArr[0]);
return result;
}
I have an input file which stores bytes, and I read its bytes in a loop (2 bytes each time) and store them in the char array N[] in this manner:
char N[3];
N[2]='\0';
while(fread(N,1,2,inputFile)==2)
When the (hex) value of N[0] is 0 the computation is correct, otherwise it's wrong. For example:
0x62 (N[0]=0x0, N[1]=0x62) will return 98 (as a short value), but 0x166 in hex (N[0]=0x6, N[1]=0x16) will return 5638 (as a short value).
In the first place, it's generally best to use type unsigned char for the bytes of raw binary data, because that correctly expresses the semantics of what you're working with. Type char, although it can be, and too frequently is, used as a synonym for "byte", is better reserved for data that are actually character in nature.
In the event that you are furthermore performing arithmetic on byte values, you almost surely want unsigned char instead of char, because the signedness of char is implementation-defined. It does vary among implementations, and on many common implementations char is signed.
With that said, your main problem appears simple. You said
166 in hex (N[0]=6,N[1]=16) will return 5638 (in short value).
but 0x166 packed into a two-byte little-endian array would be (N[0]=0x66,N[1]=0x1). What you wrote would correspond to 0x1606, which indeed is the same as decimal 5638.
The problem is sign extension due to using char. You should use unsigned char instead:
#include <stdio.h>
short bytesToShort(unsigned char* bytesArr)
{
short result = (short)((bytesArr[1] << 8) | bytesArr[0]);
return result;
}
int main()
{
printf("%04x\n", bytesToShort("\x00\x11")); // expect 0x1100
printf("%04x\n", bytesToShort("\x55\x11")); // expect 0x1155
printf("%04x\n", bytesToShort("\xcc\xdd")); // expect 0xddcc
return 0;
}
Note: the real problem in the original code is not the one presented by the OP. The problem is that it returns the wrong result for an input like "\xcc\xdd": it produces 0xffcc where it should be 0xddcc.
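A hedged sketch of a version that sidesteps sign extension entirely; the helper name bytesToU16 and the choice of uint16_t are mine. Returning an unsigned 16-bit type also keeps a set top bit from turning negative when the result is later promoted (e.g. by printf).
#include <stdio.h>
#include <stdint.h>

/* Build the 16-bit value in unsigned arithmetic; no sign extension can creep in. */
static uint16_t bytesToU16(const unsigned char *bytesArr)
{
    return (uint16_t)(((unsigned)bytesArr[1] << 8) | bytesArr[0]);
}

int main(void)
{
    unsigned char le[2] = { 0xcc, 0xdd };   /* little-endian encoding of 0xddcc */
    printf("%04x\n", bytesToU16(le));       /* prints ddcc */
    return 0;
}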

Is this Integer Promotion? How does it work?

I was just experimenting and I tried out two printf()s.
unsigned char a = 1;
a = ~a;
printf("----> %x %x %x %x", ~a, a, ~a, ++a);
This one gave the output
----> ffffff00 ff ffffff00 ff
Next one was
unsigned char a = 1;
printf("----> %x %x %x %x", ~a, a, ~a, ++a);
This one gave the output
----> fffffffd 2 fffffffd 2
Now, I know what '++' does and '~' does. I also know that the sequence of operation inside printf is from the right.
But could someone explain the difference in the number of bytes printed? A full explanation of the output would of course be helpful, but I am more interested in the number of bytes and the difference between the two cases [especially the a and ~a parts of the printf].
EDIT:
OK, it looks like the ++ part and my mistake of "I also know that the sequence of operation inside printf is from the right" have prompted every post other than the answer I was hoping for. So maybe the way I asked was wrong.
I will try again,
unsigned char a = ~1;
a = ~a;
printf("----> %x", a);
OUTPUT: ----> 1
unsigned char a = ~1;
printf("----> %x", ~a);
OUTPUT: ----> ffffff01
Why this difference?
printf("----> %x %x %x %x", ~a, a, ~a, ++a); actually invokes undefined behavior because you have a side effect on a and other expressions depending on the same lvalue. So anything can happen and it is hopeless to try and explain the output produced.
Assuming 32 bit ints in 2's complement representation, if you wrote
printf("----> %x %x %x %x", ~a, a, ~a, a + 1);
You would get different and less surprising output:
ffffff01 fe ffffff01 ff
Let me explain what is going on:
a = ~a;
a contains 1; it is converted to an int with the same value, the ~ operator applied to 1 yields -2, and converting that back to unsigned char gives 254, i.e. 0xfe.
The arguments to printf are then computed as follows:
~a: 0xfe is converted to int and all bits are complemented, yielding 0xffffff01.
a is converted to int with the same value and printed as fe.
~a again of course gives the same output.
a+1: a is converted to int before adding one; the result is 255, which prints as ff.
The explanation for your surprising outputs is that a is first converted to int and then the computation is done on the int value.
a = ~a; You have integer promotion on this line already, since the ~, like most operators in C, promotes the operand according to the rules of integer promotion.
The character containing the value 1 gets integer promoted to an int containing the value 1 before the operation is done. Assuming 32 bit int, the result of ~a is a negative, two's complement value with the hex representation 0xFFFFFFFE.
You then store this result back into the unsigned char, which truncates it and keeps only the raw binary value of the least significant byte, that is: 0xFE.
I also know that the sequence of operation inside printf is from the right.
No. The order of evaluation of function arguments is not specified by the standard. The compiler is free to evaluate them in any order it likes, and you cannot know or assume any particular order.
Even more problematic is that there is no sequence point between the evaluation of the different parameters. And since in your case you are using the same variable more than once, each access to the variable is unsequenced and your program invokes undefined behavior. Meaning that anything can happen: weird outputs, program crashes, memory corruption etc etc.
Furthermore, printf is a special case, being an obscure, variadic function. All such functions have particular rules for promotion of the arguments ("the default argument promotions"). So regardless of what promotions that happen or don't happen before you pass the result to printf, printf will ruin everything by applying its own integer promotion to the parameter.
So if you wish to toy around with promotion, printf is a very bad choice for displaying the result. Try using the sizeof operator instead. printf("%zu", sizeof(~a)); will for example print 4, because of integer promotion.
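A small sketch of that suggestion; the printed sizes assume a typical platform with a 4-byte int:
#include <stdio.h>

int main(void) {
    unsigned char a = 1;

    /* ~a is computed on the promoted int, so the expression has type int
     * (typically 4 bytes), while a itself stays 1 byte. */
    printf("sizeof a  = %zu\n", sizeof a);    /* 1 */
    printf("sizeof ~a = %zu\n", sizeof(~a));  /* typically 4 */
    return 0;
}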

Testing the endianness of a machine

Here is the program I used:
int hex = 0x23456789;
char * val = &hex;
printf("%p\n",hex);
printf("%p %p %p %p\n",*val,*(val+1),*(val+2),*(val+3));
Here is my output:
0x23456789
0xffffff89 0x67 0x45 0x23
I am working on a 64 bit CPU with a 64 bit OS. This shows my machine is little endian. Why is the first byte 0xffffff89? Why the ff's?
Firstly, you should be using %x since those aren't pointers.
The %x specifier expects an integer. Because you are passing in a value of type char, which is a signed type here, the value is converted to an integer and sign-extended.
http://en.wikipedia.org/wiki/Sign_extension
That essentially means that it takes the most significant bit and uses it for all the higher bits. So 0x89 => 0b10001001, whose highest bit is 1, becomes 0xFFFFFF89.
The proper solution is to specify a length modifier. You can get more info here: Printf Placeholders. Essentially, between the '%' and the 'x' you can put extra modifiers; 'hh' means that you are passing a char value.
int hex = 0x23456789;
char *val = (char*)&hex;
printf("%x\n",hex);
printf("%hhx %hhx %hhx %hhx\n", val[0], val[1], val[2], val[3]);
char is a signed type on your platform, so it gets promoted to int when passed as an argument, and this promotion causes sign extension. 0x89 is a negative value for char, so it gets sign-extended to 0xffffff89. This does not happen for the other values; they don't exceed CHAR_MAX, which is 127 or 0x7f on most machines. You are getting confused by this behavior because you use the wrong format specifier.
%p asks printf to format the argument as an address, but you are actually passing a value (*val).
On a 64-bit machine pointer addresses are 64-bit, so printf is adding the ff's to pad the field.
As #Martin Beckett said, %p asks printf to print a pointer, which is equivalent to %#x or %#lx (the exact format depends on your OS).
This means printf expects an int or a long (again, depending on the OS), but you are only supplying it with a char, so the value is up-cast to the appropriate type.
When you cast a smaller signed number to a bigger signed number you have to do something called sign extension in order to preserve the value. In the case of 0x89 this occurs because the sign bit is set, so the upper bytes are 0xff and get printed because they are significant.
In the case of 0x67, 0x45, 0x23 sign extension does not happen because the sign bit is not set, and so the upper bytes are 0s and thus not printed.
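A two-line sketch of that difference; it deliberately reuses the question's loose %x so the ff's show up, and the output assumes a 32-bit int and two's complement.
#include <stdio.h>

int main(void) {
    signed char byte = (signed char)0x89;    /* bit pattern 1000 1001, value -119 here */

    /* Default argument promotion turns the char into an int; the sign bit is
     * copied into the upper three bytes, which is where the ff's come from. */
    printf("%x\n", byte);                    /* ffffff89 */
    printf("%x\n", (unsigned char)byte);     /* 89: no sign bit to extend */
    return 0;
}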
I test the endian-ness with the condition ((char)((int)511) == (char)255). True means little, false means big.
I have tested this on a few separate systems, both little and big, using gcc with optimizations off and to max. In every test I have done I have gotten correct results.
You could put that condition in an if in your application before it needs to do endian-critical operations. If you only want to guarantee that you are using the right endianness for your entire application, you could instead use a static assertion method such as follows:
extern char ASSERTION__LITTLE_ENDIAN[((char)((int)511) == (char)255)?1:-1];
That line in the global scope will create a compile error if the system is not little endian and will refuse to compile. If there was no error, it compiles perfectly as if that line didn't exist. I find that the error message is pretty descriptive:
error: size of array 'ASSERTION__LITTLE_ENDIAN' is negative
Now if, like me, you're paranoid about your compiler optimizing the actual check away, you can do the following:
int endian;
{
int i = 255;
char * c = (char *)&i;
endian = (c[0] == (char)255);
}
if(endian) // if endian is little
Which compacts nicely into this macro:
#define isLittleEndian(e) int e; { int i = 255; char * c = (char *)&i; e = (c[0] == (char)255); }
isLittleEndian(endian);
if(endian) // if endian is little
Or if you use GCC, you can get away with:
#define isLittleEndian ({int i = 255; char * c = (char *)&i; (c[0] == (char)255);})
if(isLittleEndian) // if endian is little
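If you would rather avoid the pointer cast and the macro entirely, a hedged alternative sketch (the helper name isLittleEndianRuntime is mine) copies the first byte out of a known 32-bit value:
#include <stdio.h>
#include <string.h>
#include <stdint.h>

/* Inspect the first byte in memory of a known 32-bit value via memcpy,
 * which sidesteps the pointer-cast aliasing question. */
static int isLittleEndianRuntime(void)
{
    uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);      /* first byte in memory */
    return first == 1;              /* 1 on little-endian, 0 on big-endian */
}

int main(void)
{
    printf("%s-endian\n", isLittleEndianRuntime() ? "little" : "big");
    return 0;
}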

Types questions in ANSI C

I have a few questions about types in ANSI C:
1. What's the difference between "\x" at the beginning of a char and 0x at the beginning of a char (or in any other case, for that matter)? AFAIK, they both mean hexadecimal... so what's the difference?
2. When casting a char to (unsigned), rather than (unsigned char), what does it mean? Why is (unsigned)'\xFF' != 0xFF?
Thanks!
what's the difference between "\x" in the beginning of a char to 0x in the beginning of char
The difference is that 0x12 is used for specifying an integer in hexadecimal, while "\x" is an escape used inside string and character literals. An example:
#include <stdio.h>
int main(){
int ten = 0xA;
char* tenString = "1\x30";
printf("ten as integer: %d\n", ten);
printf("ten as string: %s\n", tenString);
return 0;
}
The printf's should both output a "10" (try to understand why).
when casting char to (unsigned), not (unsigned char) - what does it mean? why (unsigned)'\xFF' != 0xFF?
"unsigned" is just an abbreviation for "unsigned int". So you're casting from char to int. This will give you the numeric representation of the character in the character set your platform uses. Note that the value you get for a character is platform-dependent (typically depending on the default character encoding). For ASCII characters you will (usually) get the ASCII code, but anything beyond that will depend on platform and runtime configuration.
Understanding what a cast from one type to another does is very complicated (and often, though not always, platform-dependent), so avoid it if you can. Sometimes it is necessary, though. See e.g. need-some-clarification-regarding-casting-in-c
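To make the second question concrete, here is a short sketch, assuming a platform where plain char is signed and int is 32 bits:
#include <stdio.h>

int main(void) {
    /* With a signed plain char, '\xFF' has the value -1 (as an int). */
    printf("%x\n", (unsigned)'\xFF');              /* ffffffff: -1 converted to unsigned wraps to UINT_MAX */
    printf("%x\n", (unsigned char)'\xFF');         /* ff: the byte value itself                            */
    printf("%d\n", (unsigned char)'\xFF' == 0xFF); /* 1 */
    return 0;
}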

Resources