Hebrew char translates to "FFFF" in HEX - c

I have code that translates an ASCII char array to a hex char array:
void ASCIIFormatCharArray2HexFormatCharArray(char chrASCII[72], char chrHex[144])
{
    int i, j;

    memset(chrHex, 0, 144);
    for (i = 0, j = 0; i < strlen(chrASCII); i++, j += 2)
    {
        sprintf((char *)chrHex + j, "%02X", chrASCII[i]);
    }
    chrHex[j] = '\0';
}
When I pass the function the char 'א' (Alef, the Hebrew equivalent of 'A'), the function produces this:
chrHex = "FFFF"
I don't understand how 1 char translates to 2 bytes of hex ("FFFF") instead of 1 byte (like "u" in ASCII is "75" in hex), even though it's not an English letter.
I would love an explanation of how the compiler treats 'א' this way.

When “א” appears in a string literal, your compiler likely represents it with the bytes D7₁₆ and 90₁₆, although other possibilities are allowed by the C standard.
When these bytes are interpreted as a signed char, they have the values −41 and −112. When these are passed as an argument to sprintf, they are automatically promoted to int. In a 32-bit two’s complement int, the bits used to represent −41 and −112 are FFFFFFD7₁₆ and FFFFFF90₁₆.
The behavior of asking sprintf to format these with %02X is technically not defined by the C standard, because an unsigned int should be passed for X, rather than an int. However, your C implementation likely formats them as “FFFFFFD7” and “FFFFFF90”.
So the first sprintf puts “FFFFFFD7” in chrHex starting at element 0.
Then the second sprintf puts “FFFFFF90” in chrHex starting at element 2, partially overwriting the first string. Now chrHex contains “FFFFFFFF90”.
Then chrHex[j] = '\0'; puts a null character at element 4, truncating the string to “FFFF”.
To fix this, change the sprintf to expect an unsigned char and pass an unsigned char value (which will be promoted to int, but sprintf expects that for hhX and works with it):
sprintf(chrHex + j, "%02hhX", (unsigned char) chrASCII[i]);
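For reference, a minimal corrected version of the whole function could look like this (a sketch keeping the question's signature and buffer sizes):

#include <stdio.h>
#include <string.h>

/* Convert each byte of the input string to two hex digits.
   Converting through unsigned char avoids the sign-extension problem. */
void ASCIIFormatCharArray2HexFormatCharArray(char chrASCII[72], char chrHex[144])
{
    size_t i, j;
    size_t len = strlen(chrASCII);

    memset(chrHex, 0, 144);
    for (i = 0, j = 0; i < len; i++, j += 2)
    {
        sprintf(chrHex + j, "%02hhX", (unsigned char) chrASCII[i]);
    }
    chrHex[j] = '\0';
}

With this version, passing "א" (the bytes 0xD7 0x90) produces "D790" rather than "FFFF".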

Related

How to get unicode value of multibyte character stored under char * in C?

Let's assume I don't use <uchar.h> from C11 and have something like this
char *a = "Ā";
How can I get the Unicode value of this character (it's 256)? Doing something like this:
int *a_value = (int *)a;
printf("%d\n", *a_value);
doesn't work.
How is this character written in memory?
gdb shows me:
loc a = 0x555555556004 "Ā": -60 '\304'
but I don't quite get what it exactly means.
I've checked the size of a and it's 2 bytes, which is okay, but doing
printf("%d\n", a[0]);
printf("%d\n", a[1]);
also doesn't work. It gives me -60 and -128.
The value is encoded as UTF-8.
256 in binary is 100000000 (9 bits). It has more than 7 bits (but fewer than 12), so it needs 2 bytes to be encoded in UTF-8.
The 1st byte will have the first 5 bits, the 2nd byte will have the final 6 bits.
So, again 256 in binary with 11 bits is 00100000000 or 00100 followed by 000000
Final UTF-8 1st byte 11000100 ... 110 + 00100
Final UTF-8 2nd byte 10000000 ... 10 + 000000
11000100 in decimal is 196, or considering the MSB a sign bit: -60
10000000 in decimal is 128, or considering the MSB a sign bit: -128
Read more about UTF-8 encoding in the Wikipedia article
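To make the bit arithmetic concrete, here is a small sketch that reassembles the code point from the two bytes gdb showed (0xC4 and 0x80 are the unsigned values of -60 and -128):

#include <stdio.h>

int main(void)
{
    /* The two UTF-8 bytes of "Ā", viewed as unsigned values. */
    unsigned char b1 = 0xC4;   /* 110 00100 -> leading byte, carries 5 payload bits */
    unsigned char b2 = 0x80;   /* 10 000000 -> continuation byte, carries 6 payload bits */

    /* Take the low 5 bits of the first byte and the low 6 bits of the second. */
    unsigned int codepoint = ((b1 & 0x1Fu) << 6) | (b2 & 0x3Fu);

    printf("U+%04X = %u\n", codepoint, codepoint);   /* prints U+0100 = 256 */
    return 0;
}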
Two more things:
(1) You got those weird numbers because plain characters on your machine (like many) are evidently signed. You can see the "real" bytes by casting to unsigned char:
char *a = "Ā";
printf("%u %u\n", ((unsigned char *)a)[0], ((unsigned char *)a)[1]);
printf("%x %x\n", ((unsigned char *)a)[0], ((unsigned char *)a)[1]);
or by using unsigned char all along:
unsigned char *u = (unsigned char *)"Ā";
printf("%x %x\n", u[0], u[1]);
The %u version prints 196 128, and the %x versions print c4 80.
(2) I'm not sure what you meant by "not using <uchar.h> from C11", but if you don't want to do the UTF-8 conversion by hand, you can convert a "multibyte string" (which is just about invariably UTF-8) to a wide or Unicode character by using the library function mbtowc from <stdlib.h>:
wchar_t wc;
mbtowc(&wc, a, strlen(a));
printf("%d %x\n", wc, wc);
This prints 256 100 on my machine, since Ā is U+0100.
Another useful function is mbstowcs, which does this for multiple characters at once:
char *mbs = "Daß ist sehr schön";
printf("%s\n", mbs);
wchar_t wcs[20];
int n = mbstowcs(wcs, mbs, 20);
for (int i = 0; i < n; i++)
    printf("%3d %x %lc\n", wcs[i], wcs[i], wcs[i]);
When using functions like mbtowc and mbstowcs, however, you have to remember that they do not necessarily deal in UTF-8 and Unicode. There are wide character encodings other than Unicode, and there are multibyte representations other than UTF-8. In fact, to get these functions to work "correctly" on my machine I have to first call
setlocale(LC_CTYPE, "");
to tell them that it's okay to use my locale settings (namely, en_US.UTF-8), instead of the default "C" locale which does not assume Unicode.
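Putting those pieces together, a self-contained version of the mbtowc example might look like this (assuming your environment provides a UTF-8 locale):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

int main(void)
{
    /* Use the environment's locale (e.g. en_US.UTF-8) instead of the default "C" locale. */
    setlocale(LC_CTYPE, "");

    char *a = "Ā";
    wchar_t wc;

    if (mbtowc(&wc, a, strlen(a)) > 0)
        printf("%d %x\n", (int) wc, (unsigned int) wc);   /* 256 100 with a Unicode wchar_t */

    return 0;
}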

Char automatically converts to int (I guess)

I have the following code
char temp[] = { 0xAE, 0xFF };
printf("%X\n", temp[0]);
Why is the output FFFFFFAE, and not just AE?
I tried
printf("%X\n", 0b10101110);
And the output is correct: AE.
Suggestions?
The answer you're getting, FFFFFFAE, is a result of the char data type being signed. If you check the value, you'll notice that it's equal to -82, where -82 + 256 = 174, or 0xAE in hexadecimal.
The reason you get the correct output when you print 0b10101110 or even 174 is because you're using the literal values directly (which are ints), whereas in your example you first put the 0xAE value into a signed char, where it wraps around into the signed range, effectively being reinterpreted modulo 256 into [-128, 127].
So in other words:
0 = 0 = 0x00
127 = 127 = 0x7F
128 = -128 = 0xFFFFFF80
129 = -127 = 0xFFFFFF81
174 = -82 = 0xFFFFFFAE
255 = -1 = 0xFFFFFFFF
256 = 0 = 0x00
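A quick way to see this wrap-around in action (a small sketch, assuming plain char is signed and 8 bits wide on your machine):

#include <stdio.h>

int main(void)
{
    char c1 = 174;   /* wraps to -82 on a signed 8-bit char */
    char c2 = 255;   /* wraps to -1 */

    printf("%d -> %X\n", c1, c1);   /* -82 -> FFFFFFAE */
    printf("%d -> %X\n", c2, c2);   /* -1  -> FFFFFFFF */
    return 0;
}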
To fix this "problem", you could declare the same array you initially did, just make sure it is an array of unsigned char; your values should then print as you expect.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned char temp[] = { 0xAE, 0xFF };

    printf("%X\n", temp[0]);
    printf("%d\n\n", temp[0]);

    printf("%X\n", temp[1]);
    printf("%d\n\n", temp[1]);

    return EXIT_SUCCESS;
}
Output:
AE
174
FF
255
https://linux.die.net/man/3/printf
According to the man page, %x or %X accepts an unsigned integer, so printf reads a full int's worth of data (typically 4 bytes).
In any case, under most architectures you can't pass a parameter that is smaller than a word (i.e. int or long) in size, so in your case it will be converted to int.
In the first case, you're passing a char, so it is converted to int. Both are signed, so sign extension is performed, which is why you see the leading FFs.
In your second example, you're actually passing an int all the way, so no conversion is performed.
If you'd try:
printf("%X\n", (char) 0b10101110);
You'd see that FFFFFFAE will be printed.
When you pass a data type smaller than int (as char is) to a variadic function (as printf(3) is), the argument undergoes the integer promotions and is converted to int. What you observe is sign extension: because the most significant bit of the char variable is set, it is replicated into the three extra bytes needed to complete an int.
To solve this and to have the data in 8 bits, you have two possibilities:
Allow your signed char to convert to an int (with sign extension) then mask the bits 8 and above.
printf("%X\n", (int) my_char & 0xff);
Declare your variable as unsigned char, so it is promoted to a non-negative int value (no sign extension).
unsigned char my_char;
...
printf("%X\n", my_char);
This code causes undefined behaviour. The argument to %X must have type unsigned int, but you supply char.
Undefined behaviour means that anything can happen; including, but not limited to, extra F's appearing in the output.

8 Byte Number as Hex in C

I have given a number, for example n = 10, and I want to calculate its length in hex with big endian and save it in a 8 byte char pointer. In this example I would like to get the following string:
"\x00\x00\x00\x00\x00\x00\x00\x50".
How do I do that automatically in C with for example sprintf?
I am not even able to get "\x50" in a char pointer:
char tmp[1];
sprintf(tmp, "\x%x", 50); // version 1
sprintf(tmp, "\\x%x", 50); // version 2
Version 1 and 2 don't work.
I have given a number, for example n = 10, and I want to calculate its length in hex
Repeatedly divide by 16 to find the number of hexadecimal digits. A do ... while ensures the result is 1 when n==0.
int hex_length = 0;
do {
    hex_length++;
} while (number /= 16);
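For instance, the loop can be wrapped in a small test program (the function name hex_digit_count is just for illustration):

#include <stdio.h>

/* Count how many hexadecimal digits are needed to represent number (at least 1). */
static int hex_digit_count(unsigned long long number)
{
    int hex_length = 0;
    do {
        hex_length++;
    } while (number /= 16);
    return hex_length;
}

int main(void)
{
    printf("%d\n", hex_digit_count(10));    /* 1 -> "A"  */
    printf("%d\n", hex_digit_count(255));   /* 2 -> "FF" */
    printf("%d\n", hex_digit_count(0));     /* 1 -> "0"  */
    return 0;
}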
save it in a 8 byte char pointer.
C cannot force your system to use 8-byte pointers. So if your system uses 4-byte char pointers, we are out of luck. Let us assume OP's system uses 8-byte pointers. Integers may still be converted to pointers, although this may or may not result in a valid pointer.
assert(sizeof (char*) == 8);
char *char_pointer = (char *) n;
printf("%p\n", (void *) char_pointer);
In this example I would like to get the following string: "\x00\x00\x00\x00\x00\x00\x00\x50".
In C, a string includes the various characters up to and including a null character. "\x00\x00\x00\x00\x00\x00\x00\x50" is not a valid C string, yet it is a valid string literal. Code cannot construct string literals at run time; they are part of the source code. Further, the relationship between n==10 and "\x00...\x00\x50" is unclear. Instead, perhaps the goal is to store n into an 8-byte array (big endian).
char buf[8];

for (int i = 7; i >= 0; i--) {
    buf[i] = (char) n;
    n /= 256;
}
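To check the result you can print the buffer byte by byte; for n == 10 this prints \x00\x00\x00\x00\x00\x00\x00\x0A, since 10 is 0x0A (a sketch based on the loop above):

#include <stdio.h>

int main(void)
{
    unsigned long long n = 10;
    unsigned char buf[8];

    /* Store n in big-endian order: the least significant byte goes last. */
    for (int i = 7; i >= 0; i--) {
        buf[i] = (unsigned char) n;
        n /= 256;
    }

    for (int i = 0; i < 8; i++)
        printf("\\x%02X", buf[i]);
    printf("\n");

    return 0;
}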
OP's code certainly will fail, as it attempts to store a string into a buffer which is too small. Further, "\x%x" is not valid code, as \x begins an invalid escape sequence.
char tmp[1];
sprintf(tmp, "\x%x", 50); // version 1
Just do:
#include <math.h>

int i;
...
int length = (int) floor(log(i) / log(16)) + 1;
This will give you (in length) the number of hexadecimal digits needed to represent i (without 0x of course), for i > 0.
log(i) / log(base) is the log-base of i. The log16 of i gives you the exponent: raising 16 to the power of that exponent gets back i, 16^log16(i) = i.
Taking the integer part of that exponent with floor() and adding one gives the number of digits; for example, log16(255) ≈ 1.999, so 255 needs 2 digits, while log16(256) = 2, and 256 needs 3 digits.
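A quick check of that formula (a sketch; log() and floor() need <math.h>, and on many systems you must link with -lm):

#include <math.h>
#include <stdio.h>

/* Number of hex digits needed for i > 0: floor(log16(i)) + 1.
   Exact powers of 16 can be sensitive to floating-point rounding;
   the integer do/while loop above does not have that problem. */
static int hex_digits(unsigned int i)
{
    return (int) floor(log((double) i) / log(16.0)) + 1;
}

int main(void)
{
    printf("%d\n", hex_digits(10));    /* 1 -> "A"   */
    printf("%d\n", hex_digits(255));   /* 2 -> "FF"  */
    printf("%d\n", hex_digits(257));   /* 3 -> "101" */
    return 0;
}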

declaring string using pointer to int

I am trying to initialize a string using pointer to int
#include <stdio.h>

int main()
{
    int *ptr = "AAAA";
    printf("%d\n", ptr[0]);
    return 0;
}
The result of this code is 1094795585.
Could anybody explain this behavior and why the code gives this answer?
I am trying to initialize a string using pointer to int
The string literal "AAAA" is of type char[5], that is, an array of five elements of type char.
When you assign:
int *ptr = "AAAA";
you actually must use explicit cast (as types don't match):
int *ptr = (int *) "AAAA";
But, still it's potentially invalid, as int and char objects may have different alignment requirements. In other words:
alignof(char) != alignof(int)
may hold. Also, in this line:
printf("%d\n", ptr[0]);
you are invoking undefined behavior (so it might print "Hello from Mars" if the compiler so chooses), as ptr[0] dereferences ptr, thus violating the strict aliasing rule.
Note that it is valid to convert an int * to a char * and read the object through the char *, but not the opposite.
the result of this code is 1094795585
The result makes sense, but to get it you need to rewrite your program in a valid form. It might look like:
#include <stdio.h>
#include <string.h>

union StringInt {
    char s[sizeof("AAAA")];
    int n[1];
};

int main(void)
{
    union StringInt si;

    strcpy(si.s, "AAAA");
    printf("%d\n", si.n[0]);
    return 0;
}
To decipher it, you need to make some assumptions, depending on your implementation. For instance, if
int type takes four bytes (i.e. sizeof(int) == 4)
CPU has little-endian byte ordering (though it doesn't really matter here, since every letter is the same)
default character set is ASCII (the letter 'A' is represented as 0x41, that is 65 in decimal)
implementation uses two's complement representation of signed integers
then you may deduce that si.n[0] holds in memory:
0x41 0x41 0x41 0x41
that is in binary:
01000001 ...
The sign (most-significant) bit is unset, hence it is just equal to:
65 * 2^24 + 65 * 2^16 + 65 * 2^8 + 65 =
65 * (2^24 + 2^16 + 2^8 + 1) = 65 * 16843009 = 1094795585
1094795585 is correct.
'A' has the ASCII value 65, i.e. 0x41 in hexadecimal.
Four of them makes 0x41414141 which is equal to 1094795585 in decimal.
You got the value 65656565 by doing 65*100^0 + 65*100^1 + 65*100^2 + 65*100^3, but that's wrong since a byte¹ can contain 256 different values, not 100.
So the correct calculation would be 65*256^0 + 65*256^1 + 65*256^2 + 65*256^3, which gives 1094795585.
It's easier to think of memory in hexadecimal because one hexadecimal digit directly corresponds to half a byte¹, so two hex digits make one full byte¹ (cf. 0x41). Whereas in decimal, 255 fits in a single byte¹, but 256 does not.
¹ assuming CHAR_BIT == 8
65656565 is a wrong representation of the value of "AAAA": you are representing each character separately, whereas "AAAA" is stored as an array. It converts to 1094795585 because the %d specifier prints the decimal value. Run this in gdb with the following commands:
x/8xb (pointer) //this will show you the memory hex value
x/d (pointer) //this will show you the converted decimal value
@zenith gave you the answer you expected, but your code invokes UB. Anyway, you could demonstrate the same in an almost correct way:
#include <stdio.h>

int main()
{
    int i, val;
    char *pt = (char *) &val;   // cast a pointer to any type to a pointer to char: valid

    for (i = 0; i < sizeof(int); i++) pt[i] = 'A';   // assigning bytes of int: UB in the general case
    printf("%d 0x%x\n", val, val);
    return 0;
}
Assigning the bytes of an int is UB in the general case because the C standard says that "[for] signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit", and a remark adds "Some combinations of padding bits might generate trap representations, for example, if one padding bit is a parity bit."
But on common architectures there are no padding bits and all bit patterns correspond to valid numbers, so the operation is valid (but implementation-dependent) on all common systems. It is still implementation-dependent because the size of int is not fixed by the standard, nor is endianness.
So: on a 32-bit system using no padding bits, the above code will produce
1094795585 0x41414141
independently of endianness.

How to print the hexadecimal in a specific manner?

In the following code I stored the MAC address in a char array.
But even though I am storing it in a char variable, it prints as follows:
ffffffbb
ffffffcc
ffffffdd
ffffffee
ffffffff
This is the code:
#include <stdio.h>

int main()
{
    char *mac = "aa:bb:cc:dd:ee:ff";
    char a[6];
    int i;

    sscanf(mac, "%x:%x:%x:%x:%x:%x", &a[0], &a[1], &a[2], &a[3], &a[4], &a[5]);
    for (i = 0; i < 6; i++)
        printf("%x\n", a[i]);
}
I need the output to be in the following way:
aa
bb
cc
dd
ee
ff
The current printf statement is
printf("%x\n",a[i]);
How can I get the desired output and why is the printf statement printing ffffffaa even though I stored the aa in a char array?
You're using %x, which expects the argument to be unsigned int *, but you're just passing char *. This is dangerous, since sscanf() will do an int-sized write, possibly writing outside the space allocated to your variable.
Change the conversion specifier for the sscanf() to %hhx, which means unsigned char. Then change the print to match. Also, of course, make the a array unsigned char.
Also check to make sure sscanf() succeeded:
unsigned char a[6];
if(sscanf(mac, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
a, a + 1, a + 2, a + 3, a + 4, a + 5) == 6)
{
printf("daddy MAC is %02hhx:%02hhx:%02hhx:%02hhx:%02hhx:%02hhx",
a[0], a[1], a[2], a[3], a[4], a[5]);
}
Make sure to treat your a array as unsigned chars, i.e.
unsigned char a[6];
In
printf("%x\n",a[i]);
the expression a[i] yields a char. However, the standard does not specify whether char is signed or unsigned. In your case, the compiler apparently treats it as a signed type.
Since the most significant bit is set in all the bytes of your MAC address (each byte is larger than or equal to 0x80), a[i] is treated as a negative value, so printf generates the hexadecimal representation of a negative value.
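For completeness, here is a sketch of the question's program with the fixes applied (an unsigned char array plus %hhx, as described above):

#include <stdio.h>

int main(void)
{
    char *mac = "aa:bb:cc:dd:ee:ff";
    unsigned char a[6];
    int i;

    /* %hhx tells sscanf to store into unsigned char rather than unsigned int. */
    if (sscanf(mac, "%hhx:%hhx:%hhx:%hhx:%hhx:%hhx",
               &a[0], &a[1], &a[2], &a[3], &a[4], &a[5]) == 6)
    {
        for (i = 0; i < 6; i++)
            printf("%02hhx\n", a[i]);   /* prints aa, bb, cc, dd, ee, ff on separate lines */
    }
    return 0;
}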
