Convert char from big endian to little endian in C

I'm trying to convert a char variable from big endian to little endian.
Here it is exactly:
char name[12];
I know how to convert an int between big and little endian, but the char is messing me up.
I know I have to convert it to integer form first, which I have.
For converting an int this is what I used:
(item.age >> 24) | ((item.age >> 8) & 0x0000ff00) | ((item.age << 8) & 0x00ff0000) | (item.age << 24);
For converting the char, I'd like to do it in the same way, if possible, just because this is the only way I understand how to.

Endianness only affects byte order. Since a char is exactly one byte, it is the same in every byte order.
And as a note: you may want to look up the endianness conversion functions in libc (POSIX declares them in <arpa/inet.h>):
htonl, htons
ntohl, ntohs

Byte-order (a.k.a. Endian-ness) has no meaning when it comes to single-byte types.
Although char name[12] consists of 12 bytes, each element is a single byte, and the elements stay in the same order.
So you need not do anything with it...
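BTW, the size of int is not necessarily 4 bytes on all compilers, so you might want to use a more generic conversion method that works for any scalar type (apply it to one field at a time, such as item.age, not to a whole struct):
unsigned char *p = (unsigned char *)&item.age;
for (size_t i = 0; i < sizeof(item.age) / 2; i++)
{
    unsigned char temp = p[i];
    p[i] = p[sizeof(item.age) - 1 - i];
    p[sizeof(item.age) - 1 - i] = temp;
}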
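To make the point of both answers concrete, here is a minimal sketch in which only the multi-byte field is swapped and the char array is left alone. The struct layout and the names record, swap_u32 and record_swap_endianness are assumptions made for illustration; the question only shows char name[12] and item.age.

```c
#include <stdint.h>

/* Hypothetical record layout (an assumption, not taken from the question). */
struct record {
    char     name[12]; /* single-byte elements: nothing to swap */
    uint32_t age;      /* multi-byte scalar: byte order matters */
};

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap_u32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000ff00u)
         | ((v << 8) & 0x00ff0000u)
         | (v << 24);
}

/* Convert one record in place: swap the integer, leave the name untouched. */
void record_swap_endianness(struct record *r)
{
    r->age = swap_u32(r->age);
}
```

Using a fixed-width unsigned type keeps the shifts well defined; with a plain signed int, right-shifting a negative value is implementation-defined.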

Related

How to convert to integer a char[4] of "hexadecimal" numbers [C/Linux]

So I'm working with system calls in Linux. I'm using "lseek" to navigate through the file and "read" to read. I'm also using Midnight Commander to see the file in hexadecimal. The next 4 bytes I have to read are in little-endian , and look like this : "2A 00 00 00". But of course, the bytes can be something like "2A 5F B3 00". I have to convert those bytes to an integer. How do I approach this? My initial thought was to read them into a vector of 4 chars, and then to build my integer from there, but I don't know how. Any ideas?
Let me give you an example of what I've tried. I have the following bytes in file "44 00". I have to convert that into the value 68 (4 + 4*16):
char value[2];
read(fd, value, 2);
int i = (value[0] << 8) | value[1];
The variable i is 17480 instead of 68.
UPDATE: Never mind, I solved it. I mixed up the indexes when shifting. It should have been value[1] << 8 ... | value[0]
General considerations
There seem to be several pieces to the question -- at least how to read the data, what data type to use to hold the intermediate result, and how to perform the conversion. If indeed you are assuming that the on-file representation consists of the bytes of a 32-bit integer in little-endian order, with all bits significant, then I probably would not use a char[] as the intermediate, but rather a uint32_t or an int32_t. If you know or assume that the endianness of the data is the same as the machine's native endianness, then you don't need any other.
Determining native endianness
If you need to compute the host machine's native endianness, then this will do it:
static const uint32_t test = 1;
_Bool host_is_little_endian = *(char *)&test;
It is worthwhile doing that, because it may well be the case that you don't need to do any conversion at all.
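The same check can be written as a small function that copies the first byte out with memcpy instead of dereferencing a cast pointer. This is only a sketch; the name detect_little_endian_host is made up here, and it assumes little- and big-endian are the only byte orders of interest.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 otherwise. */
int detect_little_endian_host(void)
{
    const uint32_t probe = 1;
    unsigned char first_byte;

    /* Copy the byte stored at the lowest address of probe. */
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}
```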
Reading the data
I would read the data into a uint32_t (or possibly an int32_t), not into a char array. Possibly I would read it into an array of uint8_t.
uint32_t data;
size_t num_read = fread(&data, 4, 1, my_file);
if (num_read != 1) { /* ... handle error ... */ }
Converting the data
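It is worthwhile knowing whether the on-file representation matches the host's endianness, because if it does, you don't need to do any transformation (that is, you're done at this point in that case). If you do need to swap endianness, note that ntohl() and htonl() will not do it here: they convert between network (big-endian) order and host order, so on a big-endian host they leave the value unchanged. Reverse the bytes explicitly instead:
if (!host_is_little_endian) {
    data = (data >> 24) | ((data >> 8) & 0x0000ff00) | ((data << 8) & 0x00ff0000) | (data << 24);
}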
(This assumes that little- and big-endian are the only host byte orders you need to be concerned with. Historically, there have been others, which is why the byte-reorder functions come in pairs, but you are extremely unlikely ever to see one of the others.)
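An alternative that sidesteps the host-order question is to read the four bytes into an unsigned char buffer and assemble the value arithmetically. The sketch below assumes the file stores the least significant byte first, as in the question; the function name load_le32 is hypothetical.

```c
#include <stdint.h>

/* Build a 32-bit value from four bytes stored least-significant byte first.
   The shifts operate on values, not on memory layout, so the result is the
   same on any host. */
uint32_t load_le32(const unsigned char *bytes)
{
    return  (uint32_t)bytes[0]
         | ((uint32_t)bytes[1] << 8)
         | ((uint32_t)bytes[2] << 16)
         | ((uint32_t)bytes[3] << 24);
}
```

With the bytes from the question, "2A 00 00 00" decodes to 42. Using unsigned char for the buffer matters: with plain char, a byte such as B3 may be sign-extended before the shift.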
Signed integers
If you need a signed instead of unsigned integer, then you can do the same, but use a union (the members cannot be named unsigned and signed, because those are C keywords):
union {
    uint32_t u;
    int32_t s;
} data;
In all of the preceding, use data.u in place of plain data, and at the end, read out the signed result from data.s.
Suppose you point into your buffer:
unsigned char *p = &buf[20];
and you want to see the next 4 bytes as an integer and assign them to your integer, then you can cast it:
int i;
i = *(int *)p;
You just said that p is now a pointer to an int, you de-referenced that pointer and assigned it to i.
However, this depends on the endianness of your platform. If your platform has a different endianness, you may first have to reverse-copy the bytes to a small buffer and then use this technique. For example:
unsigned char ibuf[4];
for (i=3; i>=0; i--) ibuf[i]= *p++;
i = *(int *)ibuf;
EDIT
The suggestions and comments of Andrew Henle and Bodo could give (assuming a 4-byte int):
unsigned char *p = &buf[20];
int i, j;
unsigned char *pi = (unsigned char *)&i;
for (j = 3; j >= 0; j--) *pi++ = *p++;
// and the other endian, starting again from the same four bytes:
p = &buf[20];
pi = (unsigned char *)&i + 3;
for (j = 3; j >= 0; j--) *pi-- = *p++;
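The byte-copy idea can also be expressed with memcpy, which works for any alignment of the source pointer. This is a sketch under the assumption that a 32-bit unsigned value is wanted; the names load_native32 and load_reversed32 are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/* Read four bytes in the host's own byte order without casting the pointer. */
uint32_t load_native32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* safe for any alignment of p */
    return v;
}

/* Read four bytes in the opposite of the host's byte order. */
uint32_t load_reversed32(const unsigned char *p)
{
    unsigned char tmp[4];
    uint32_t v;

    for (int j = 0; j < 4; j++)
        tmp[3 - j] = p[j];     /* reverse-copy into a small buffer */
    memcpy(&v, tmp, sizeof v);
    return v;
}
```

Which of the two functions yields the intended number still depends on the host, so a caller has to know both the data's byte order and the machine's.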

Converting little endian to big endian using Bitshift Operators

I am working on endianness. My little endian program works, and gives the correct output. But I am not able to find my way around big endian. Below is what I have so far.
I know I have to use bit shifts and I don't think I am doing a good job at it. I tried asking my TAs and prof but they are not much help.
I have been following this link (convert big endian to little endian in C [without using provided func]) to understand more but cannot still make it work. Thank you for the help.
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
FILE* input;
FILE* output;
input = fopen(argv[1],"r");
output = fopen(argv[2],"w");
int value,value2;
int i;
int zipcode, population;
while(fscanf(input,"%d %d\n",&zipcode, &population)!= EOF)
{
for(i = 0; i<4; i++)
{
population = ((population >> 4)|(population << 4));
}
fwrite(&population, sizeof(int), 1, output);
}
fclose(input);
fclose(output);
return 0;
}
I'm answering not to give you the answer but to help you solve it yourself.
First ask yourself this: how many bits are in a byte? (hint: 8) Next, how many bytes are in an int? (hint: probably 4) Picture this 32-bit integer in memory:
+--------+
0x|12345678|
+--------+
Now picture it on a little-endian machine, byte-wise. It would look like this:
+--+--+--+--+
0x|78|56|34|12|
+--+--+--+--+
What shift operations are required to get the bytes into the correct spot?
Remember, when you use a bitwise operator like >>, you are operating on bits. So 1 << 24 would be the integer value 1 converted into the processor's opposite endianness.
"little-endian" and "big-endian" refer to the order of bytes (we can assume 8 bits here) in a binary representation. When referring to machines, it's about the order of the bytes in memory: on big-endian machines, the address of an int will point to its highest-order byte, while on a little-endian machine the address of an int will refer to its lowest-order byte.
When referring to binary files (or pipes or transmission protocols etc.), however, it refers to the order of the bytes in the file: a "little-endian representation" will have the lowest-order byte first and the highest-order byte last.
How does one obtain the lowest-order byte of an int? That's the low 8 bits, so it's (n & 0xFF) (or ((n >> 0) & 0xFF), the usefulness of which you will see below).
The next lowest-order byte is ((n >> 8) & 0xFF).
The next lowest-order byte is ((n >> 16) & 0xFF) ... or (((n >> 8) >> 8) & 0xFF).
And so on.
So you can peel off bytes from n in a loop and output them one byte at a time ... you can use fwrite for that but it's simpler just to use putchar or putc.
You say that your teacher requires you to use fwrite. There are two ways to do that: 1) store each extracted byte in an unsigned char b and call fwrite(&b, 1, 1, filePtr) in the loop described above. 2) Use the loop to reorder your int value by storing the bytes in the desired order in a char array rather than outputting them, then use fwrite to write the array out. The latter is probably what your teacher has in mind.
Note that, if you just use fwrite to output your int it will work ... if you're running on a little-endian machine, where the bytes of the int are already stored in the right order. But the bytes will be backwards if running on a big-endian machine.
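The second approach (fill a byte array in the desired order, then write it with one fwrite) might look like the sketch below. The function names are hypothetical, and a 32-bit unsigned value is assumed rather than a plain int.

```c
#include <stdint.h>
#include <stdio.h>

/* Store v least-significant byte first (little-endian layout). */
void encode_le32(uint32_t v, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * i)) & 0xFF);
}

/* Store v most-significant byte first (big-endian layout). */
void encode_be32(uint32_t v, unsigned char out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)((v >> (8 * (3 - i))) & 0xFF);
}

/* Write v to fp as four big-endian bytes; returns 1 on success, 0 on failure. */
int write_be32(FILE *fp, uint32_t v)
{
    unsigned char bytes[4];

    encode_be32(v, bytes);
    return fwrite(bytes, 1, sizeof bytes, fp) == sizeof bytes;
}
```

Because the bytes are produced by shifting the value, the file contents come out the same whether the program runs on a little-endian or a big-endian machine. The output file should be opened in binary mode ("wb").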
The problem with most answers to this question is portability. I've provided a portable answer here, but this received relatively little positive feedback. Note that C defines undefined behavior as: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements.
The answer I'll give here won't assume that int is 16 bits in width; It'll give you an idea of how to represent "larger int" values. It's the same concept, but uses a dynamic loop rather than two fputcs.
Declare an array of sizeof (int) unsigned chars: unsigned char big_endian[sizeof (int)];
Separate the sign and the absolute value.
int sign = value < 0;
value = sign ? -value : value;
Loop from sizeof (int) to 0, writing the least significant bytes:
size_t foo = sizeof (int);
do {
    big_endian[--foo] = value % (UCHAR_MAX + 1);
    value /= (UCHAR_MAX + 1);
} while (foo > 0);
Now insert the sign: big_endian[0] |= sign << (CHAR_BIT - 1);
Simple, yeh? Little endian is equally simple. Just reverse the order of the loop to go from 0 to sizeof (int), instead of from sizeof (int) to 0 (the sign bit then belongs in the last byte rather than the first):
size_t foo = 0;
do {
    big_endian[foo++] = value % (UCHAR_MAX + 1);
    value /= (UCHAR_MAX + 1);
} while (foo < sizeof (int));
The portable methods make more sense, because they're well defined.

Use int instead of char in char array and mask

In the bit-shifting example shown here:
unsigned long int longInt = 1234567890;
unsigned char byteArray[4];
// convert from an unsigned long int to a 4-byte array
byteArray[0] = (int)((longInt >> 24) & 0xFF) ;
byteArray[1] = (int)((longInt >> 16) & 0xFF) ;
byteArray[2] = (int)((longInt >> 8) & 0XFF);
byteArray[3] = (int)((longInt & 0XFF));
Three questions:
Why is it (int) instead of (unsigned char)? I tried it with unsigned char and it seems to compile just fine.
Is 0XFF necessary? Isn't the new bit shifted-in 0 because Wikipedia says C uses logical shifting and logical shifting shifts in 0? (EDIT: at least it doesn't seem necessary on one with >> 24?)
Can't I just do a memcpy() to copy longInt to a unsigned char buffer? Is it not so because of issue with Endianness? Is there any other reason?
1. The expression ((longInt >> 24) & 0xFF) is of type unsigned long int. With the cast to int, the expression is first converted to int and then to unsigned char. Without the cast, it is converted directly to unsigned char. There is no difference between the two results, so the cast is superfluous.
2. The 0xFF is not necessary. The conversion to unsigned char keeps only the low-order bits, which produces the same value (assuming 8-bit chars).
3. You can use memcpy, but it is not portable because the result depends on the endianness of the system: it gives different results on big-endian and little-endian systems, while the bitwise shift solution gives the same result on both.
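A small sketch can be used to check the second point. It splits the low 32 bits of a value once with the mask and once without it; the function names are made up for the example, and 8-bit chars are assumed.

```c
/* Most-significant-first split of the low 32 bits, with an explicit mask. */
void split_masked(unsigned long v, unsigned char out[4])
{
    out[0] = (v >> 24) & 0xFF;
    out[1] = (v >> 16) & 0xFF;
    out[2] = (v >> 8) & 0xFF;
    out[3] = v & 0xFF;
}

/* Same split, relying on the conversion to unsigned char to drop the
   high-order bits. Equivalent to the masked version when CHAR_BIT is 8. */
void split_unmasked(unsigned long v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}
```

For 1234567890 (0x499602D2) both versions produce the bytes 49 96 02 D2 on any host. Keeping the mask is still a common choice because it states the intent and silences narrowing-conversion warnings.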

Casting a short from a char array

I've run into a small issue here. I have an unsigned char array, and I am trying to access bytes 2-3 (0xFF and 0xFF) and get their value as a short.
Code:
unsigned char Temp[512] = {0x00,0xFF,0xFF,0x00};
short val = (short)*((unsigned char*)Temp+1)
While I would expect val to contain 0xFFFF it actually contains 0x00FF. What am I doing wrong?
There's no guarantee that you can access a short when the data is improperly aligned.
On some machines, especially RISC machines, you'd get a bus error and core dump for misaligned access. On other machines, the misaligned access would involve a trap into the kernel to fix up the error — which is only a little quicker than the core dump.
To get the result reliably, you'd be best off doing shifting and or:
val = *(Temp+1) << 8 | *(Temp+2);
or:
val = *(Temp+2) << 8 | *(Temp+1);
Note that this explicitly offers big-endian (first option) or little-endian (second) interpretation of the data.
Also note the careful use of << and |; if you use + instead of |, you have to parenthesize the shift expression or use multiplication instead of shift:
val = (*(Temp+1) << 8) + *(Temp+2);
val = *(Temp+1) * 256 + *(Temp+2);
Be logical and use either logic or arithmetic and not a mixture.
Well, you're dereferencing an unsigned char* when you should be dereferencing a short*
I think this should work:
short val = *((short*)(Temp+1));
Your problem is that you are only accessing one byte of the array:
*((unsigned char*)Temp+1) will dereference the pointer Temp+1 giving you 0xFF
(short)*((unsigned char*)Temp+1) will cast the result of the dereference to short. Casting unsigned char 0xFF to short obviously gives you 0x00FF
So what you are trying to do is *((short*)(Temp+1))
It should however be noted that what you are doing is a horrible hack. First of all, when the two chars differ, the result will obviously depend on the endianness of the machine.
Second, there is no guarantee that the accessed data is correctly aligned to be accessed as a short.
So it might be a better idea to do something like short val= *(Temp+1)<<8 | *(Temp+2) or short val= *(Temp+2)<<8 | *(Temp+1), depending on the byte order in which the data is stored
I do not recommend this approach because it is architecture-specific.
Consider the following definition of Temp:
unsigned char Temp[512] = {0x00,0xFF,0x88,0x00};
Depending on the endianness of the system, you will get different results casting Temp + 1 to a short *; on a little endian system, the result would be the value 0x88FF, but on a Big endian system, the result would be 0xFF88.
Also, I believe that this is an undefined cast because of issues with alignment.
What you could use is:
short val = (((short)Temp[1]) << 8) | Temp[2];
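The shift-based approach recommended in the answers above can be wrapped in two small helpers, one per byte order. This is a sketch; the names read_be16 and read_le16 are assumptions, and uint16_t is used instead of short because 0xFFFF does not fit in a 16-bit signed short.

```c
#include <stdint.h>

/* Interpret p[0], p[1] as a 16-bit value, most significant byte first. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

/* Interpret p[0], p[1] as a 16-bit value, least significant byte first. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[1] << 8) | p[0]);
}
```

With Temp = {0x00,0xFF,0x88,0x00}, read_be16(Temp + 1) gives 0xFF88 and read_le16(Temp + 1) gives 0x88FF on every machine, and no alignment requirement applies because only single bytes are accessed.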

How to convert from integer to unsigned char in C, given integers larger than 256?

As part of my CS course I've been given some functions to use. One of these functions takes a pointer to unsigned chars to write some data to a file (I have to use this function, so I can't just make my own purpose built function that works differently BTW). I need to write an array of integers whose values can be up to 4095 using this function (that only takes unsigned chars).
However am I right in thinking that an unsigned char can only have a max value of 256 because it is 1 byte long? I therefore need to use 4 unsigned chars for every integer? But casting doesn't seem to work with larger values for the integer. Does anyone have any idea how best to convert an array of integers to unsigned chars?
Usually an unsigned char holds 8 bits, with a max value of 255. If you want to know this for your particular compiler, print out CHAR_BIT and UCHAR_MAX from <limits.h>. You could extract the individual bytes of a 32 bit int:
#include <stdint.h>
void
pack32(uint32_t val, uint8_t *dest)
{
    dest[0] = (val & 0xff000000) >> 24;
    dest[1] = (val & 0x00ff0000) >> 16;
    dest[2] = (val & 0x0000ff00) >> 8;
    dest[3] = (val & 0x000000ff);
}
uint32_t
unpack32(const uint8_t *src)
{
    uint32_t val;
    /* cast first: src[0] << 24 would otherwise be computed in signed int */
    val  = (uint32_t)src[0] << 24;
    val |= (uint32_t)src[1] << 16;
    val |= (uint32_t)src[2] << 8;
    val |= (uint32_t)src[3];
    return val;
}
Unsigned char has a size of 1 byte, therefore you can decompose any other type to an array of unsigned chars (e.g. for a 4 byte int you can use an array of 4 unsigned chars). Your exercise is probably about generics. You should write the file as a binary file using the fwrite() function, and just write byte after byte in the file.
The following example should write a number (of any data type) to the file. The caller passes the address of the data cast to unsigned char *, which C allows for any object.
#include <stdio.h>
int homework(unsigned char *foo, size_t size)
{
    // open file for binary writing
    FILE *f = fopen("work.txt", "wb");
    if (f == NULL)
        return 1;
    // write all size bytes of the data to the file
    size_t written = fwrite(foo, 1, size, f);
    fclose(f);
    return written == size ? 0 : 1;
}
I hope the given example at least gives you a starting point.
Yes, you're right; a char/byte typically holds 8 bits, so that is 2^8 distinct numbers, which is zero to 2^8 - 1, or zero to 255. Do something like this to get the bytes:
int x = 0;
unsigned char* p = (unsigned char*)&x;
for (size_t i = 0; i < sizeof(x); i++)
{
    //Do something with p[i]
}
(Declaring i inside the for statement requires C99 or later; in older C, declare it at the top of the block.)
Do note that this code may not be portable, since it depends on the processor's internal storage of an int.
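If the byte layout has to be the same on every machine, one option is to split each value with shifts instead of walking over the int's memory. The sketch below stores each value in two bytes, high byte first; the function names are hypothetical and the values are assumed to lie in 0..4095, as stated in the question.

```c
#include <stddef.h>

/* Write each value as two bytes, high byte first.
   out must have room for 2 * count bytes. */
void ints_to_bytes(const int *values, size_t count, unsigned char *out)
{
    for (size_t i = 0; i < count; i++) {
        unsigned v = (unsigned)values[i];
        out[2 * i]     = (unsigned char)((v >> 8) & 0xFF);
        out[2 * i + 1] = (unsigned char)(v & 0xFF);
    }
}

/* Inverse of ints_to_bytes: rebuild count values from 2 * count bytes. */
void bytes_to_ints(const unsigned char *in, size_t count, int *values)
{
    for (size_t i = 0; i < count; i++)
        values[i] = (in[2 * i] << 8) | in[2 * i + 1];
}
```

The resulting unsigned char buffer can then be handed to whatever writing function the course provides.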
If you have to write an array of integers then just convert the array into a pointer to char then run through the array.
int main()
{
int data[] = { 1, 2, 3, 4 ,5 };
size_t size = sizeof(data)/sizeof(data[0]); // Number of integers.
unsigned char* out = (unsigned char*)data;
for(size_t loop =0; loop < (size * sizeof(int)); ++loop)
{
MyProfSuperWrite(out + loop); // Write 1 unsigned char
}
}
Now people have mentioned that 4095 will fit in fewer bits than a normal integer. Probably true. Thus you can save space and not write out the top bits of each integer. Personally I think this is not worth the effort. The extra code to write the value and process the incoming data is not worth the savings you would get (maybe if the data was the size of the Library of Congress). Rule one: do as little work as possible (it's easier to maintain). Rule two: optimize if asked (but ask why first). You may save space but it will cost in processing time and maintenance costs.
The part of the assignment that says "integers whose values can be up to 4095 using this function (that only takes unsigned chars)" should be giving you a huge hint. 4095 unsigned is 12 bits.
You can store the 12 bits in a 16-bit short, but that is somewhat wasteful of space: you are only using 12 of the 16 bits of the short. Since you are dealing with more than 1 byte in the conversion of characters, you may need to deal with the endianness of the result. This is the easiest option.
You could also use a bit field or some packed binary structure if you are concerned about space. That is more work.
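As an illustration of the packed option, two 12-bit values fit exactly into three bytes. The layout chosen below (high bits first) and the names pack12_pair and unpack12_pair are assumptions made for this sketch; any layout works as long as the reader and the writer agree on it.

```c
#include <stdint.h>

/* Pack two 12-bit values (0..4095) into three bytes. */
void pack12_pair(uint16_t a, uint16_t b, unsigned char out[3])
{
    out[0] = (unsigned char)((a >> 4) & 0xFF);                       /* top 8 bits of a */
    out[1] = (unsigned char)(((a & 0x0F) << 4) | ((b >> 8) & 0x0F)); /* low 4 of a, top 4 of b */
    out[2] = (unsigned char)(b & 0xFF);                              /* low 8 bits of b */
}

/* Recover the two 12-bit values from three packed bytes. */
void unpack12_pair(const unsigned char in[3], uint16_t *a, uint16_t *b)
{
    *a = (uint16_t)((in[0] << 4) | (in[1] >> 4));
    *b = (uint16_t)(((in[1] & 0x0F) << 8) | in[2]);
}
```

This saves one byte per pair compared with storing each value in 16 bits, at the cost of the extra packing and unpacking code.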
It sounds like what you really want to do is call sprintf to get a string representation of your integers. This is a standard way to convert from a numeric type to its string representation. Something like the following might get you started:
char num[5]; // Room for 4095
// Array is the array of integers, and arrayLen is its length
for (i = 0; i < arrayLen; i++)
{
sprintf (num, "%d", array[i]);
// Call your function that expects a pointer to chars
printfunc (num);
}
Without information on the function you are directed to use regarding its arguments, return value and semantics (i.e. the definition of its behaviour) it is hard to answer. One possibility is:
Given:
void theFunction(unsigned char* data, int size);
then
int array[SIZE_OF_ARRAY];
theFunction((unsigned char*)array, sizeof(array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(*array));
or
theFunction((unsigned char*)array, SIZE_OF_ARRAY * sizeof(int));
All of which will pass all of the data to theFunction(), but whether that makes any sense will depend on what theFunction() does.
