Convert raw ASCII data to Hex string - c

I have the following code to convert raw ASCII data to a hex string. The full C code can be found here
void str2hex(char* inputStr, char* outputStr)
{
int i;
int counter;
i=0;
counter=0;
while(inputStr[counter] != '\0')
{
sprintf((char*)(outputStr+i),"%02X", inputStr[counter]);
i+=2; counter+=1;
}
outputStr[i++] = '\0';
}
It works fine for most values. But when I try the following input from the terminal, using echo as stdin:
echo 11223344556677881122334455667788 | xxd -r -p | ./CProgram --stdin
11223344556677881122334455667788
It returns the following output:
11223344556677FF11223344556677FF
As can be seen, instead of 88 it returns FF.
How can I adjust this code to get 88 instead of FF?

There are multiple issues all coalescing into your problem.
The first issue is that it's implementation-defined whether char is a signed or unsigned integer type. Your compiler seems to use a signed char.
The second issue is that on most systems today, signed integers are represented using two's complement, where the most significant bit indicates the sign.
The third issue is that variadic functions like printf apply the default argument promotions to their variable arguments. That means types smaller than int are promoted to int. The promotion preserves the value, so negative values are sign-extended: the most significant bit is copied all the way to the "top" of the wider type. With a 32-bit int, the signed byte 0xff (-1) becomes 0xffffffff.
Now when your code tries to convert the byte 0x88, it is treated as the negative number -120, not 136 as you might expect. It is promoted to the int 0xffffff88, so "%02X" writes eight digits, FFFFFF88, instead of two. Because i only advances by 2, the next sprintf call overwrites the last six of those digits, leaving just FF; for the final byte, the terminating '\0' does the same. Writing those extra digits can also run past the end of outputStr.
There are two possible solutions to this:
Explicitly use unsigned char for the input string:
void str2hex(const unsigned char* inputStr, char* outputStr);
Use the hh prefix in the printf format:
sprintf((char*)(outputStr+i),"%02hhX", inputStr[counter]);
This tells sprintf that the argument is a char-sized value (signed char or unsigned char), so it converts the promoted int back to unsigned char before printing, masking out the upper bits.
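A minimal sketch of the first fix applied to the whole function: the signature takes const unsigned char *, as suggested above, and the loop is otherwise unchanged. Like the original, it stops at the first '\0', so binary input containing zero bytes would need an explicit length parameter instead (not shown).

```c
#include <stdio.h>

/* Fixed str2hex: reading the input through const unsigned char * means
 * bytes >= 0x80 promote to non-negative ints (e.g. 0x88 -> 136), so
 * "%02X" always produces exactly two hex digits per byte.
 * outputStr must have room for 2 * strlen(inputStr) + 1 chars. */
void str2hex(const unsigned char *inputStr, char *outputStr)
{
    size_t i = 0;
    for (size_t counter = 0; inputStr[counter] != '\0'; counter++) {
        sprintf(outputStr + i, "%02X", inputStr[counter]);
        i += 2;
    }
    outputStr[i] = '\0';  /* also correct for an empty input string */
}
```

Callers holding a plain char buffer would pass it as str2hex((const unsigned char *)buf, out).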

Related

Since characters from -128 to -1 are the same as those from +128 to +255, what is the point of using unsigned char?

#include <stdio.h>
#include <conio.h>
int main()
{
char a=-128;
while(a<=-1)
{
printf("%c\n",a);
a++;
}
getch();
return 0;
}
The output of the above code is same as the output of the code below
#include <stdio.h>
#include <conio.h>
int main()
{
unsigned char a=+128;
while(a<=+254)
{
printf("%c\n",a);
a++;
}
getch();
return 0;
}
Then why do we use unsigned char and signed char?
K & R, chapter and verse, p. 43 and 44:
There is one subtle point about the conversion of characters to
integers. The language does not specify whether variables of type char
are signed or unsigned quantities. When a char is converted to an int,
can it ever produce a negative integer? The answer varies from machine
to machine, reflecting differences in architecture. On some machines,
a char whose leftmost bit is 1 will be converted to a negative integer
("sign extension"). On others, a char is promoted to an int by adding
zeros at the left end, and thus is always positive. [...] Arbitrary
bit patterns stored in character variables may appear to be negative
on some machines, yet positive on others. For portability, specify
signed or unsigned if non-character data is to be stored in char
variables.
With printing characters - no difference:
With "%c", the function printf() takes the int argument, converts it to unsigned char, and then prints that character.
char a;
printf("%c\n",a); // a is converted to int, then passed to printf()
unsigned char ua;
printf("%c\n",ua); // ua is converted to int, then passed to printf()
With printing values (numbers) - difference when system uses a char that is signed:
char a = -1;
printf("%d\n",a); // --> -1
unsigned char ua = -1;
printf("%d\n",ua); // --> 255 (assuming 8-bit unsigned char)
Note: Rare machines will have int the same size as char and other concerns apply.
So if code uses a as a number rather than a character, the printing differences are significant.
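A small sketch of both cases, assuming the common 8-bit char; format_both is a hypothetical helper name. It formats the same bit pattern through signed char and unsigned char, once with "%c" and once with "%d":

```c
#include <stdio.h>

/* Formats -128 (signed char) and 128 (unsigned char), which share the
 * bit pattern 0x80 on 8-bit two's complement systems.
 * "%c" converts its int argument to unsigned char, so both produce the
 * same output byte; "%d" prints the promoted int, so the numbers differ. */
void format_both(char *c_out, size_t c_size, char *d_out, size_t d_size)
{
    signed char sc = -128;
    unsigned char uc = 128;
    snprintf(c_out, c_size, "%c%c", sc, uc);  /* two identical 0x80 bytes */
    snprintf(d_out, d_size, "%d %d", sc, uc); /* "-128 128" */
}
```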
The bit representation of a number is what the computer stores, but it doesn't mean anything without someone (or something) imposing a pattern onto it.
The difference between the unsigned char and signed char patterns is how we interpret the set bits. In one case we decide that zero (0x00) is the smallest number and count up until we reach 0xFF, or binary 11111111. In the other case we decide that 0x80 is the smallest number (-128) and count up, through 0xFF (-1) and 0x00 (0), until we reach 0x7F (127).
The reason we have this funny way of representing signed numbers (the latter pattern) is that it places zero (0x00) roughly in the middle of the sequence, and that 0xFF (which is -1, right before zero) plus 0x01 (which is 1, right after zero) carries all the way up until the carry falls off the high end, leaving 0x00 (-1 + 1 = 0). Likewise -5 + 5 = 0 by the same mechanism.
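The wraparound can be checked directly with unsigned char arithmetic, which is defined modulo 2^CHAR_BIT. A sketch, assuming CHAR_BIT == 8; twos_complement_byte and add_bytes are illustrative names, not library functions:

```c
/* Bit pattern that represents -n in 8-bit two's complement:
 * invert all bits, then add one. The cast keeps only the low 8 bits. */
unsigned char twos_complement_byte(unsigned char n)
{
    return (unsigned char)(~n + 1);
}

/* 8-bit addition: any carry out of bit 7 is discarded by the cast. */
unsigned char add_bytes(unsigned char a, unsigned char b)
{
    return (unsigned char)(a + b);
}
```

For example, twos_complement_byte(5) is 0xFB, and add_bytes(0xFB, 5) wraps to 0x00, matching -5 + 5 = 0.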
For fun, there are a lot of bit patterns that mean different things. For example 0x2a might be what we call a "number" or it might be a * character. It depends on the context we choose to impose on the bit patterns.
Because unsigned char is used for one-byte integers in C89.
Note that there are three distinct char-related types in C89: char, signed char, and unsigned char.
For characters, char is used.
unsigned char and signed char are used for one-byte integers, just as short is typically used for two-byte integers. You should not really use signed char or unsigned char for characters. Nor should you rely on the order of those values.
Different types are created to tell the compiler how to "understand" the bit representation of one or more bytes. For example, say I have a byte which contains 0xFF. If it's interpreted as a signed char, it's -1; if it's interpreted as an unsigned char, it's 255.
In your case, a, whether signed or unsigned, undergoes integer promotion to int and is passed to printf(), which then converts it to unsigned char before printing it as a character.
But let's consider another case:
#include <stdio.h>
#include <string.h>
int main(void)
{
char a = -1;
unsigned char b;
memmove(&b, &a, 1);
printf("%d %u", a, b);
}
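In practice you could simply write printf("%d %u", a, a);, but that passes the promoted int -1 to %u, which is strictly undefined behaviour; memmove() is used here just to avoid that by copying the byte into an unsigned char.
Its output on my machine (signed 8-bit char) is:
-1 255
(The printf("%d %u", a, a); shortcut would instead print -1 4294967295 with a 32-bit int, because %u reinterprets the sign-extended int rather than the original byte.)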
Also, think about this ridiculous question:
Suppose sizeof (int) == 4. Since the arrays of characters from (unsigned char[]){0, 0, 0, 0} to (unsigned char[]){UCHAR_MAX, UCHAR_MAX, UCHAR_MAX, UCHAR_MAX} are the same as the unsigned ints from 0 to UINT_MAX, what is the point of using unsigned int?

Dealing with char values over 127 in C

I'm quite new to C programming, and I have some problems trying to assign a value over 127 (0x7F) in a char array. In my program, I work with generic binary data and I don't face any problem printing a previously acquired byte stream (e.g. with fopen or fgets, then processed with some bitwise operations) as %c or %d. But if I try to print a character from its numerical value like this:
printf("%c\n", 128);
it just prints FFFD (the replacement character). Here is another example:
char abc[] = {126, 128, '\0'}; // Manually assigning values
printf("%c", abc[0]); // Prints "~", as expected
printf("%c", 121); // Prints "y"
printf("%c", abc[1]); // Should print "€", I think, but I get "�"
I'm a bit confused since I can print every character below 128 in these ways. The reason I'm asking is that I need to generate a (pseudo)random byte sequence using the rand() function. Here is an example:
char abc[10];
srand(time(NULL));
abc[0] = rand() % 256; // Gives something between 00:FF ...
printf("%c", abc[0]); // ... but I get "�"
If this is of any help, the source code is encoded in UTF-8, but changing encoding doesn't have any effect.
In C, char is a distinct type from unsigned char and signed char. It has the range CHAR_MIN to CHAR_MAX, yet it has the same range as one of unsigned char or signed char. Typically these are 8-bit types, but they could be wider; see CHAR_BIT. So the typical range is [0 to 255] or [-128 to 127].
If char is unsigned, abc[1] = 128 is fine. If char is signed, abc[1] = 128 is implementation-defined (I-D, see below). The typical I-D result is that abc[1] will have the value -128.
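Which of the two ranges applies can be checked from <limits.h>. A sketch with hypothetical helper names; the count of 2^CHAR_BIT values assumes a two's complement signed char, which is universal in practice and required since C23:

```c
#include <limits.h>

/* CHAR_MIN is 0 when plain char is unsigned and SCHAR_MIN when it is signed. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Number of distinct values a char can hold (256 when CHAR_BIT == 8),
 * computed in long so the subtraction cannot overflow. */
long char_value_count(void)
{
    return (long)CHAR_MAX - (long)CHAR_MIN + 1;
}
```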
printf("%c\n", 128); will send the int value 128 to printf(). The "%c" will cast that value to an unsigned char. So far no problems. What appears on the output depends on how the output device handles code 128. Perhaps Ç, perhaps something else.
printf("%c", abc[1]); will send 128 or an I-D value. If it is I-D and -128 was sent, then converting -128 to unsigned char gives 128, and again the code for 128 is printed.
If the output device expects UTF-8 sequences, a sequence beginning with code 128 is invalid (it is an unexpected continuation byte), and many such systems will print the replacement character, Unicode U+FFFD.
Converting a value outside the range of signed char to char (when char is signed) invokes:
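To see why 0x80 is rejected, here is a sketch of how a UTF-8 decoder classifies bytes. It is simplified: it does not reject overlong or out-of-range lead bytes such as 0xC0, 0xC1 or 0xF5 to 0xFF, and the function names are made up for illustration.

```c
#include <stddef.h>

/* Bytes 0x80..0xBF (bit pattern 10xxxxxx) are continuation bytes; they are
 * only valid after a lead byte, so a lone 0x80 is invalid UTF-8. */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Length of the UTF-8 sequence started by lead byte b, or 0 if b cannot
 * start a sequence (simplified; no overlong/range checks). */
size_t utf8_sequence_length(unsigned char b)
{
    if (b < 0x80) return 1;            /* 0xxxxxxx: ASCII */
    if ((b & 0xE0) == 0xC0) return 2;  /* 110xxxxx */
    if ((b & 0xF0) == 0xE0) return 3;  /* 1110xxxx */
    if ((b & 0xF8) == 0xF0) return 4;  /* 11110xxx */
    return 0;                          /* continuation or invalid lead */
}
```

The "€" expected in the question is 0x80 in Windows-1252, but in UTF-8 it is U+20AC, encoded as the three bytes E2 82 AC, so on a UTF-8 terminal printf("\xE2\x82\xAC\n") prints it.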
the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3
First of all, the signedness of char is implementation-defined.
If you have to deal with char values over 127, you can use unsigned char. It can hold 0-255 (with 8-bit chars).
Also, you should use the %hhu format specifier to print the value of an unsigned char.
If you're dealing with bytes, use unsigned char instead of char for your datatypes.
With regard to printing, you can print the bytes in hex instead of decimal or as characters:
printf("%02X", abc[0]);
You probably don't want to print these bytes as characters, as you'll most likely be dealing with UTF-8 character encoding which doesn't seem to be what you're looking for.
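Following that advice, a sketch for the rand() use case in the question: the bytes are stored as unsigned char and printed as hex. random_bytes_hex is a hypothetical name, and seeding (for example srand(time(NULL)) as in the question) is left to the caller. rand() is fine for test data but not for anything security-related.

```c
#include <stdio.h>
#include <stdlib.h>

/* Fill buf with n pseudo-random bytes and write them as hex into out,
 * which needs room for 2 * n + 1 chars. Storing into unsigned char keeps
 * every value in 0..255, so "%02X" always yields exactly two digits. */
void random_bytes_hex(unsigned char *buf, size_t n, char *out)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] = (unsigned char)(rand() % 256);
        sprintf(out + 2 * i, "%02X", buf[i]);
    }
    out[2 * n] = '\0';
}
```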

Format specifier for hex char in C

Is there a format specifier for sprintf in C that maps a char to hex in the same way that %x maps an int to hex?
Yes and no.
Since sprintf takes a variable argument list, all arguments undergo default promotion before sprintf receives them. That means sprintf will never receive a char -- a char will always be promoted to int before sprintf receives it (and a short will as well).
Yes, since what sprintf is receiving will be an int, you can use %x to convert it to hex format, and it'll work the same whether that value started as a char, short, or int. If (as is often the case) you want to print 2 characters for each input, you can use %2.2x.
Beware one point though: if your char is signed, and you start with a negative value, the promotion to int will produce the same numerical value, which normally won't be the same bit pattern as the original char, so (for example) a char with the value -1 will normally print out as ffff if int is 16 bits, ffffffff if int is 32 bits, or ffffffffffffffff if int is 64 bits (assuming the typical 2's complement representation for signed integers).
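A sketch comparing the promoted value with two ways of getting only the low byte: an explicit (unsigned char) cast, and the C99 hh length modifier. hex_three_ways is a hypothetical helper, and the eight-digit result assumes a 32-bit int:

```c
#include <stdio.h>

/* For c == -1 with a 32-bit int:
 *   promoted -> "ffffffff"  (sign-extended value)
 *   cast     -> "ff"        (low byte only, via unsigned char)
 *   hh       -> "ff"        (printf converts back to unsigned char) */
void hex_three_ways(signed char c, char *promoted, char *cast, char *hh, size_t size)
{
    /* The cast spells out the conversion %x applies to the promoted int. */
    snprintf(promoted, size, "%x", (unsigned int)c);
    snprintf(cast, size, "%02x", (unsigned char)c);
    snprintf(hh, size, "%02hhx", c);
}
```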
It's the same %x. All char values are converted to int before being passed to sprintf (or any other function that takes a variable number of parameters).
printf("%x\n", 'a');
prints 61

are int and char represented using the same bits internally by gcc?

I was playing around with Unicode characters (without using wchar_t support) just for fun. I'm only using the regular char data type. I noticed that when I printed them in hex, they showed up as a full 4 bytes instead of just one byte.
For ex. consider this c file:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
char *s = (char *) malloc(100);
fgets(s, 100, stdin);
while (s && *s != '\0') {
printf("%x\n", *s);
s++;
}
return 0;
}
After compiling with gcc and giving the 'cent' symbol (hex: c2 a2) as input, I get the following output:
$ ./a.out
¢
ffffffc2: ?
ffffffa2: ?
a:
So instead of just printing c2 and a2 I got the whole 4 bytes as if it's an int type.
Does this mean char is not really 1 byte in length, and ASCII just made it look like 1 byte?
Maybe the reason why the upper three bytes become 0xFFFFFF needs a bit more explanation?
The upper three bytes of the value printed for *s have a value of 0xFF due to sign extension.
The char value passed to printf is extended to an int before the call to printf.
This is due to C's default behaviour.
In the absence of signed or unsigned, the compiler may interpret char as either signed char or unsigned char. It is consistently one or the other unless explicitly changed with a command-line option or pragma. In this case we can see that it is signed char.
In the absence of more information (prototypes or casts), C passes:
int: char, short, unsigned char and unsigned short are converted to int. It never passes a char, unsigned char or signed char as a single byte; it always passes an int.
unsigned int: it is the same size as int, so the value is passed without change.
The compiler needs to decide how to convert the smaller value to an int.
signed values: the upper bytes of the int are sign-extended from the smaller value, which effectively copies the top (sign) bit upwards to fill the int. If the top bit of the smaller signed value is 0, the upper bytes are filled with 0s. If the top bit of the smaller signed value is 1, the upper bytes are filled with 1s. Hence printf("%x ",*s) prints ffffffc2.
unsigned values are not sign extended, the upper bytes of the int are 'zero padded'
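This is also why C can call a function without a prototype (though the compiler will usually warn about that). Strictly speaking, calling a variadic function such as printf without a prototype is undefined behaviour, and C99 and later no longer allow implicit declarations, so treat the following as a demonstration for older or lenient compilers only.
With such a compiler you can write this and expect it to run (though I would hope your compiler issues warnings):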
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */
int main (int argc, const char * argv[]) {
signed char schar[] = "\x70\x80";
unsigned char uchar[] = "\x70\x80";
printf("schar[0]=%x schar[1]=%x uchar[0]=%x uchar[1]=%x\n",
schar[0], schar[1], uchar[0], uchar[1]);
return 0;
}
That prints:
schar[0]=70 schar[1]=ffffff80 uchar[0]=70 uchar[1]=80
The char value is interpreted by my (Mac's gcc) compiler as signed char, so the compiler generates code to sign-extend the char to an int before the printf call.
Where the signed char value has its top (sign) bit set (\x80), the conversion to int sign extends the char value. The sign extension fills in the upper bytes (in this case 3 more bytes to make a 4 byte int) with 1's, which get printed by printf as ffffff80
Where the signed char value has its top (sign) bit clear (\x70), the conversion to int still sign extends the char value. In this case the sign is 0, so the sign extension fills in the upper bytes with 0's, which get printed by printf as 70
My example also shows the case where the value is unsigned char. In these two cases the value is not sign-extended because it is unsigned; instead it is extended to int with 0 padding. It might look like printf is only printing one byte because the three upper bytes of the value are 0, but it is printing the entire int: the values are 0x00000070 and 0x00000080 because the unsigned char values were converted to int without sign extension.
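The sign extension described above can also be written out by hand. A sketch with hypothetical helper names: sign_extend8 treats the low 8 bits of its argument as a two's complement byte and returns it as an int, while zero_extend8 simply keeps the low byte, like the unsigned char case:

```c
/* XOR with 0x80 then subtract 0x80: 0x00..0x7F map to 0..127 and
 * 0x80..0xFF map to -128..-1. On a two's complement int, the result's
 * bit pattern is the byte with bit 7 copied into all higher bits. */
int sign_extend8(unsigned int byte)
{
    byte &= 0xFFu;                      /* keep only the low 8 bits */
    return (int)(byte ^ 0x80u) - 0x80;
}

/* Zero extension: the upper bits are simply filled with 0. */
unsigned int zero_extend8(unsigned int byte)
{
    return byte & 0xFFu;
}
```

With a 32-bit int, sign_extend8(0xC2) is -62, whose bit pattern 0xFFFFFFC2 is what printf showed for the cent sign's first byte; zero_extend8(0xC2) stays 0xC2.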
You can force printf to only print the low byte of the int, by using suitable formatting (%hhx), so this correctly prints only the value in the original char:
/* Notice the include is 'removed' so the C compiler does default behaviour */
/* #include <stdio.h> */
int main (int argc, const char * argv[]) {
char schar[] = "\x70\x80";
unsigned char uchar[] = "\x70\x80";
printf("schar[0]=%hhx schar[1]=%hhx uchar[0]=%hhx uchar[1]=%hhx\n",
schar[0], schar[1], uchar[0], uchar[1]);
return 0;
}
This prints:
schar[0]=70 schar[1]=80 uchar[0]=70 uchar[1]=80
because %hhx tells printf to convert the int to unsigned char before printing it. This does not change the fact that the char was sign-extended to an int before printf was called; it is only a way to tell printf how to interpret the contents of the int.
In a way, for a signed char such as schar, the meaning of %hhx looks slightly misleading, but the %x format interprets the int as unsigned anyway, and there is no printf format for printing signed values in hex (IMHO it would be confusing).
Sadly, ISO/ANSI/... don't freely publish our programming language standards, so I can't point to the specification, but searching the web might turn up working drafts. I haven't tried to find them. I would recommend "C: A Reference Manual" by Samuel P. Harbison and Guy L. Steele as a cheaper alternative to the ISO document.
HTH
No. printf is a variadic function, and arguments of types smaller than int passed to it are promoted to int. In this case the char was negative, so it got sign-extended.
%x tells printf that the value to print is an unsigned int. The char has already been promoted (and sign-extended) to int by the caller, so printf reads that int's bits as an unsigned int and prints the resulting value.

How to explain this: what happens when we cast signed char to int/hex?

signed char num = 220; //DC in hex
printf("%02X\n", num);
printf("%d\n", num);
I know that signed char can only represent -128 to 127, but why does the above code output the following?
FFFFFFDC
-36
What's the reason?
UPDATE
My code above is just contrived for my question, that is, what happens when we cast signed char to int/hex?
As our starting point, 220 = DC in hex, and 11011100 in binary.
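The top bit is the sign bit, and it is set, so as an 8-bit two's complement value the byte is negative. To find its magnitude, complement all eight bits (getting 00100011) and then add one, giving 00100100, which is 36. So the byte represents -36 (equivalently, 220 - 256 = -36).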
When it converts the signed char to signed int, it doesn't say "this would be 220 if it's unsigned", it says "this is -36, make it an int of -36", for which the 32-bit two's complement representation is FFFFFFDC, because it must be the negative value for the full size of int (this is called sign-extension):
+36 as a 32-bit value: 00000000000000000000000000100100
complement: 11111111111111111111111111011011
add one: 11111111111111111111111111011100
Or, in hex, FFFFFFDC.
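The same numbers follow from modular arithmetic, assuming an 8-bit signed char, a 32-bit int, and the usual wraparound when 220 is converted to signed char:

$$
220 - 2^{8} = -36, \qquad 2^{32} - 36 = 4294967260 = \mathtt{0xFFFFFFDC}
$$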
This is why you must be careful with printf("%x", ch); (and relatives) -- if you intend to just get a two-digit value, and chars are signed, you may wind up with eight digits instead. Always specify "unsigned char" if you need it to be unsigned.
As you pointed out, a signed char can have a max value of 127. The reason for the negative number, however, is how the char is stored in memory. Signed integer types use the top bit for the sign, with 0/1 denoting positive/negative. The compiler, however, does not reject the overflow, so when you assigned 220 to num, the value spilled into the sign bit, since it could not fit into the lower 7 bits of the char (chars are 1 byte). (Strictly, converting an out-of-range value to a signed type is implementation-defined; this wraparound is the usual result.) As a result, when the value is read back, the sign bit is set, so instead of the large positive number you intended, it is seen as a small negative value. Hence the output you see.
Edit
Responding to your updated question: all that happens is that the compiler copies or expands the char into 4 bytes, interprets the value of the char, and writes that value into the new int's memory. In your case, the program at run time treats the char as having the value -36 instead of 220, because it interprets those bits as a signed char before the conversion. Then, when it converts, it simply creates an int with the value -36.
Your overflow on the assignment, combined with sign extension when you promote it to an int to view it in hex ... see What's happening in the background of a unsigned char to integer type cast?
Any type smaller than "int" gets converted to "int" when it's passed via "...". This means your negative char got converted to a negative int, with FFFs showing up in the hex printout.
