C/C++ code to convert big endian to little endian - c

I've seen several different examples of code that converts big endian to little endian and vice versa, but I've come across a piece of code someone wrote that seems to work, but I'm stumped as to why it does.
Basically, there's a char buffer that, at a certain position, contains a 4-byte int stored as big-endian. The code would extract the integer and store it as native little endian. Here's a brief example:
char test[8] = { 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07};
char *ptr = test;
int32_t value = 0;
value = ((*ptr) & 0xFF) << 24;
value |= ((*(ptr + 1)) & 0xFF) << 16;
value |= ((*(ptr + 2)) & 0xFF) << 8;
value |= (*(ptr + 3)) & 0xFF;
printf("value: %d\n", value);
value: 66051
The above code takes the first four bytes, stores it as little endian, and prints the result. Can anyone explain step by step how this works? I'm confused why ((*ptr) & 0xFF) << X wouldn't just evaluate to 0 for any X >= 8.

This code is constructing the value, one byte at a time.
First it captures the first byte (the one at the lowest address), masked to 8 bits
(*ptr) & 0xFF
And then shifts it to the highest byte
((*ptr) & 0xFF) << 24
And then assigns it to the previously 0 initialized value.
value =((*ptr) & 0xFF) << 24
Now the "magic" comes into play. Since ptr was declared as a char*, adding one to it advances the pointer by one character.
(ptr + 1) /* the next character address */
*(ptr + 1) /* the next character */
Once you see that they are using pointer math to update the relative starting address, the rest of the operations are the same as the ones already described, except that, to preserve the partially assembled value, they bitwise-OR each new byte into the existing value variable
value |= ((*(ptr + 1)) & 0xFF) << 16
Note that pointer math is why you can do things like
char* ptr = ... some value ...
while (*ptr != 0) {
... do something ...
ptr++;
}
but it comes at the price of possibly corrupting your pointer addresses, greatly increasing your risk of a segmentation fault. Some languages saw this as such a problem that they removed the ability to do pointer math. An almost-pointer that you cannot do pointer math on is typically called a reference.
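The byte-by-byte construction described above can be sketched as a small helper. This is a minimal sketch, not the original author's code: read_be32 is a hypothetical name, and it uses unsigned char and uint32_t instead of char and int32_t so that no masking is needed and the shift by 24 cannot overflow a signed int.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most-significant first.
 * read_be32 is a hypothetical helper name, not part of the original code. */
uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) |  /* first byte becomes the highest byte */
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];          /* last byte becomes the lowest byte */
}
```

Because the result is built from values rather than from the in-memory layout, the same function gives the same answer on little-endian and big-endian hosts.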

If you want to convert between little-endian and big-endian representations, you can use htonl, htons, ntohl, and ntohs. These functions convert values between host and network byte order (network byte order is big endian). Big endian is also used on some ARM-based platforms. See also: https://linux.die.net/man/3/endian

The code you might use is based on the idea that numbers on the network are sent in big-endian order.
The functions htonl() and htons() convert 32-bit and 16-bit integers to big endian when your system is little endian, and leave the values unchanged when it is already big endian.
Here is the code:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>
int main(void)
{
uint32_t x,y;
uint16_t s,z;
x=0xFF567890;
y=htonl(x);
printf("LE=%08" PRIX32 " BE=%08" PRIX32 "\n",x,y);
s=0x7891;
z=htons(s);
printf("LE=%04X BE=%04X\n",s,z);
return 0;
}
This code converts from LE to BE on an LE machine.
You can use the opposite functions, ntohl() and ntohs(), to convert from BE to LE: they convert the integers from BE to LE on LE machines and leave them unchanged on BE machines.
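Applied to the buffer from the question, ntohl() could be used roughly as follows. This is a sketch under assumptions: be32_from_buffer is a hypothetical name, and <arpa/inet.h> is a POSIX header, so it will not be available on every platform.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* POSIX: ntohl() */

/* be32_from_buffer is a hypothetical helper name. */
uint32_t be32_from_buffer(const void *buf)
{
    uint32_t raw;
    memcpy(&raw, buf, sizeof raw);  /* copy the 4 bytes without alignment assumptions */
    return ntohl(raw);              /* big-endian (network) order to host order */
}
```

On a little-endian host ntohl() swaps the bytes; on a big-endian host it returns its argument unchanged, so the caller gets the same numeric value either way.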

I'm confused why ((*ptr) & 0xFF) << X wouldn't just evaluate to 0 for any X >= 8.
I think you misinterpret the shift functionality.
value = ((*ptr) & 0xFF) << 24;
masks the value at ptr with 0xFF (keeping one byte) and then shifts it left by 24 bits, not bytes. That is a shift by 24/8 = 3 bytes, which moves it into the highest byte.

One of the key points in understanding the evaluation of ((*ptr) & 0xFF) << X
is integer promotion. *ptr is promoted to int before the & is applied, so (*ptr) & 0xFF is already an int (at least 16 bits, typically 32) when it is shifted.
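The effect of the promotion can be checked with a small sketch. Both function names are hypothetical and exist only for illustration: the first keeps the int result of the shift, the second forces it back into 8 bits, which is what the question expected to happen.

```c
#include <stdint.h>

/* Both helper names are hypothetical, for illustration only. */

/* c is promoted to int before the mask and the shift, so the shifted
 * bits are kept: for c == 0x01 the result is 0x0100. */
int shift_after_promotion(char c)
{
    return (c & 0xFF) << 8;
}

/* Converting the result back to 8 bits is what discards the shifted
 * bits: for c == 0x01 the result is 0. */
uint8_t shift_then_truncate(char c)
{
    return (uint8_t)((c & 0xFF) << 8);
}
```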

I've written the code below. It contains two functions, swapmem() and swap64().
swapmem() swaps the bytes of a memory area of arbitrary size.
swap64() swaps the bytes of a 64-bit integer.
At the end of this reply I suggest a way to solve your problem with the byte buffer.
Here is the code:
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
void * swapmem(void *x, size_t len, int retnew);
uint64_t swap64(uint64_t k);
/**
brief swapmem
This function swaps the bytes in a memory buffer.
param x
pointer to the buffer to be swapped
param len
length of the buffer to be swapped
param retnew
If this parameter is 1 the buffer is swapped into a new
buffer. The new buffer shall be deallocated by using
free() when it's no longer needed.
If this parameter is 0 the buffer is swapped in place.
return
The pointer to the memory area where the bytes have been
swapped or NULL if an error occurs.
*/
void * swapmem(void *x, size_t len, int retnew)
{
char *b = NULL, app;
size_t i;
if (x != NULL) {
if (retnew) {
b = malloc(len);
if (b!=NULL) {
for(i=0;i<len;i++) {
b[i]=*((char *)x+len-1-i);
}
}
} else {
b=(char *)x;
for(i=0;i<len/2;i++) {
app=b[i];
b[i]=b[len-1-i];
b[len-1-i]=app;
}
}
}
return b;
}
uint64_t swap64(uint64_t k)
{
return ((k << 56) |
((k & 0x000000000000FF00) << 40) |
((k & 0x0000000000FF0000) << 24) |
((k & 0x00000000FF000000) << 8) |
((k & 0x000000FF00000000) >> 8) |
((k & 0x0000FF0000000000) >> 24)|
((k & 0x00FF000000000000) >> 40)|
(k >> 56)
);
}
int main(void)
{
uint32_t x,*y;
uint16_t s,z;
uint64_t k,t;
x=0xFF567890;
/* Dynamic allocation is used to avoid to change the contents of x */
y=(uint32_t *)swapmem(&x,sizeof(x),1);
if (y!=NULL) {
printf("LE=%08X BE=%08X\n",x,*y);
free(y);
}
/* Dynamic allocation is not used. The contents of z and k will change */
z=s=0x7891;
swapmem(&z,sizeof(z),0);
printf("LE=%04X BE=%04X\n",s,z);
k=t=0x1120324351657389;
swapmem(&k,sizeof(k),0);
printf("LE=%16"PRIX64" BE=%16"PRIX64"\n",t,k);
/* LE64 to BE64 (or viceversa) using shift */
k=swap64(t);
printf("LE=%16"PRIX64" BE=%16"PRIX64"\n",t,k);
return 0;
}
After compiling the program, I was curious to see the assembly code gcc generated. I discovered that the function swap64 is compiled as shown below.
00000000004007a0 <swap64>:
4007a0: 48 89 f8 mov %rdi,%rax
4007a3: 48 0f c8 bswap %rax
4007a6: c3 retq
This result is obtained by compiling the code on a PC with an Intel i3 CPU, using any of the gcc options -Ofast, -O3, -O2, or -Os.
You may solve your problem using something like the swap64() function, such as the following function, which I've named swap32():
uint32_t swap32(uint32_t k)
{
return ((k << 24) |
((k & 0x0000FF00) << 8) |
((k & 0x00FF0000) >> 8) |
(k >> 24)
);
}
You may use it as:
uint32_t j=swap32(*(uint32_t *)ptr);
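Casting a char pointer to uint32_t * and dereferencing it can fail if the buffer is not suitably aligned, and it conflicts with C's aliasing rules. A more cautious sketch copies the bytes out first; load_swapped32 is a hypothetical name, and swap32() is repeated here only to keep the example self-contained.

```c
#include <stdint.h>
#include <string.h>

/* Same byte swap as swap32() above, repeated so the sketch stands alone. */
uint32_t swap32(uint32_t k)
{
    return (k << 24) |
           ((k & 0x0000FF00u) << 8) |
           ((k & 0x00FF0000u) >> 8) |
           (k >> 24);
}

/* load_swapped32 is a hypothetical helper name. */
uint32_t load_swapped32(const char *ptr)
{
    uint32_t raw;
    memcpy(&raw, ptr, sizeof raw);  /* safe for any alignment of ptr */
    return swap32(raw);             /* reverse the byte order */
}
```

Like the cast version, this always swaps, so it yields the big-endian value only on a little-endian host.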

Related

Copy low-order bytes of an integer whilst preserving endianness

I need to write a function that copies the specified number of low-order bytes of a given integer into an address in memory, whilst preserving their order.
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val)
I expect the usage to look like this:
uint8_t dest[3];
lo_bytes(dest, 3, 0x44332211);
// Big-endian: dest = 33 22 11
// Little-endian: dest = 11 22 33
I've tried to implement the function using bit-shifts, memcpy, and iterating over each byte of val with a for-loop, but all of my attempts failed to work on either one or the other endianness.
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
I've tried to implement the function using bit-shifts, memcpy, and
iterating over each byte of val with a for-loop, but all of my
attempts failed to work on either one or the other endianness.
All arithmetic, including bitwise arithmetic, is defined in terms of the values of the operands, not their representations. This cannot be sufficient for you because you want to obtain a result that differs depending on details of the representation style for type uint32_t.
You can operate on object representations via various approaches, but you still need to know which bytes to operate upon. That calls for some form of detection. If big-endian and little-endian are the only byte orders you're concerned with supporting, then I favor an approach similar to that given in @P__J__'s answer:
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val) {
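static const union { uint32_t i; uint8_t a[4]; } ubytes = { 1 };
/* Cast to a byte pointer so the offset is counted in bytes, not in uint32_t units */
memcpy(dest, (const uint8_t *)&val + (1 - ubytes.a[0]) * (4 - no_bytes), no_bytes);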
}
The expression (1 - ubytes.a[0]) evaluates to 1 if the representation of uint32_t is big endian, in which case the high-order bytes occur at the beginning of the representation of val. In that case, we want to skip the first 4 - no_bytes of the representation and copy the rest. If uint32_t has a little-endian representation, on the other hand, (1 - ubytes.a[0]) will evaluate to 0, with the result that the memcpy starts at the beginning of the representation. In every case, whichever bytes are copied from the representation of val, their order is maintained. That's what memcpy() does.
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
No, that doesn't even make sense. Anything that cares about a specific characteristic of a platform (e.g. endianness) can't be platform independent.
Example 1 (platform independent):
// Copy the 3 least significant bytes to dest[]
dest[0] = value & 0xFF; dest[1] = (value >> 8) & 0xFF; dest[2] = (value >> 16) & 0xFF;
Example 2 (platform independent):
// Copy the 3 most significant bytes to dest[]
dest[0] = (value >> 8) & 0xFF; dest[1] = (value >> 16) & 0xFF; dest[2] = (value >> 24) & 0xFF;
Example 3 (platform dependent):
// I want the least significant bytes on some platforms and the most significant bytes on other platforms
#ifdef PLATFORM_TYPE_A
dest[0] = value & 0xFF; dest[1] = (value >> 8) & 0xFF; dest[2] = (value >> 16) & 0xFF;
#endif
#ifdef PLATFORM_TYPE_B
dest[0] = (value >> 8) & 0xFF; dest[1] = (value >> 16) & 0xFF; dest[2] = (value >> 24) & 0xFF;
#endif
Note that it makes no real difference what the cause of the platform dependence is (if it's endianness or something else), as soon as you have a platform dependence you can't have platform independence.
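Example 1 can be generalized to any number of bytes with a loop. This is only a sketch of the platform-independent case (least significant byte first on every host), and copy_low_bytes_lsb_first is a hypothetical name; it deliberately does not reproduce the host-dependent ordering the question asks for.

```c
#include <stddef.h>
#include <stdint.h>

/* copy_low_bytes_lsb_first is a hypothetical helper name.
 * Writes the no_bytes lowest bytes of val to dest, least significant
 * byte first, regardless of the host's endianness. */
void copy_low_bytes_lsb_first(uint8_t *dest, size_t no_bytes, uint32_t val)
{
    for (size_t i = 0; i < no_bytes && i < sizeof val; i++) {
        dest[i] = (uint8_t)(val >> (8 * i));  /* byte i, counted from the low end */
    }
}
```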
int detect_endianess(void) // 1 if little endian, 0 if big endian
{
union
{
uint16_t u16;
uint8_t u8[2];
}val = {.u16 = 0x1122};
return val.u8[0] == 0x22;
}
void lo_bytes(void *dest, uint8_t no_bytes, uint32_t val)
{
if(detect_endianess())
{
memcpy(dest, &val, no_bytes);
}
else
{
memcpy(dest, (uint8_t *)(&val) + sizeof(val) - no_bytes, no_bytes);
}
}

C - Increment 18 bits in C 8051

I have been programming the 8051 for about two months now and am somewhat of a newbie to the C language. I am currently working with flash memory in order to read, write, erase, and analyze it. I am working on the write phase at the moment and one of the tasks that I need to do is specify an address location and fill that location with data then increment to the next location and fill it with complementary data. So on and so forth until I reach the end.
My dilemma is I have 18 address bits to play with and currently have three bytes allocated for those 18 bits. Is there any way that I could combine those 18 bits into an int or unsigned int and increment like that? Or is my only option to increment the first byte, then when that byte rolls over to 0x00 increment the next byte and when that one rolls over, increment the next?
I currently have:
void inc_address(void)
{
P6=address_byte1;
P7=address_byte2;
P2=address_byte3;
P5=data_byte;
while(1)
{
P6++;
if(P6==0x00){P7++;}
else if(P7==0x00){P2++;}
else if(P2 < 0x94){break;} //hex 9 is for values dealing with flash chip
P5=~data_byte;
}
}
Where address is uint32_t:
void inc_address(void)
{
// Increment address
address = (address + 1) & 0x0003ffff ;
// Assert address A0 to A15
P6 = (address & 0xff) ;
P7 = (address >> 8) & 0xff ;
// Set least significant two bits of P2 to A16,A17
// without modifying other bits in P2
P2 &= 0xFC ; // xxxxxx00
P2 |= (address >> 16) & 0x03 ; // xxxxxxAA
// Set data
P5 = ~data_byte ;
}
However, it is not clear why the function is called inc_address but also assigns ~data_byte to P5, which presumably asserts the data bus. It seems to do more than increment an address, so it is poorly and confusingly named. I also suggest that the function take the address and data as parameters rather than using global data.
Is there any way that I could combine those 18 bits into an int or
unsigned int and increment like that?
Sure. Supposing that int and unsigned int are at least 18 bits wide on your system, you can do this:
unsigned int next_address = (hi_byte << 16) + (mid_byte << 8) + low_byte + 1;
hi_byte = next_address >> 16;
mid_byte = (next_address >> 8) & 0xff;
low_byte = next_address & 0xff;
The << and >> are bitwise shift operators, and the binary & is the bitwise "and" operator.
It would be a bit safer and more portable to not make assumptions about the sizes of your types, however. To avoid that, include stdint.h, and use type uint_least32_t instead of unsigned int:
uint_least32_t next_address = ((uint_least32_t) hi_byte << 16)
+ ((uint_least32_t) mid_byte << 8)
+ (uint_least32_t) low_byte
+ 1;
// ...
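Put together, the combine, increment, and split steps might look like the sketch below. The function name next_address18 and the ADDRESS_MASK macro are hypothetical; the wrap-around at 18 bits is borrowed from the masking shown in the other answer, and the port writes (P6, P7, P2) are left out because they depend on the hardware.

```c
#include <stdint.h>

#define ADDRESS_MASK 0x3FFFFul  /* 18 address bits (hypothetical macro name) */

/* next_address18 is a hypothetical helper name.
 * Combines three address bytes, increments the 18-bit address with
 * wrap-around, and writes the three bytes back. */
uint_least32_t next_address18(uint8_t *hi_byte, uint8_t *mid_byte, uint8_t *low_byte)
{
    uint_least32_t address = ((uint_least32_t)*hi_byte << 16)
                           | ((uint_least32_t)*mid_byte << 8)
                           |  (uint_least32_t)*low_byte;

    address = (address + 1) & ADDRESS_MASK;  /* wraps from 0x3FFFF to 0 */

    *hi_byte  = (uint8_t)(address >> 16);
    *mid_byte = (uint8_t)((address >> 8) & 0xFF);
    *low_byte = (uint8_t)(address & 0xFF);
    return address;
}
```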

Bit Shifting - Finding nth byte in a number [duplicate]

I know you can get the first byte by using
int x = number & ((1<<8)-1);
or
int x = number & 0xFF;
But I don't know how to get the nth byte of an integer.
For example, 1234 is 00000000 00000000 00000100 11010010 as 32bit integer
How can I get all of those bytes? first one would be 210, second would be 4 and the last two would be 0.
int x = (number >> (8*n)) & 0xff;
where n is 0 for the first byte, 1 for the second byte, etc.
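Applied to the example from the question, the shift-and-mask expression can be wrapped in a function. This is a minimal sketch; nth_byte is a hypothetical name, and it takes an unsigned value so that the right shift is well defined for every input.

```c
#include <stdint.h>

/* nth_byte is a hypothetical helper name.
 * Returns byte n of number, where n == 0 is the least significant byte.
 * Only valid for n < 4. */
uint8_t nth_byte(uint32_t number, unsigned n)
{
    return (uint8_t)((number >> (8 * n)) & 0xFF);
}
```

For 1234 this gives 210 for n = 0, 4 for n = 1, and 0 for n = 2 and n = 3, matching the values listed in the question.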
For the (n+1)th byte in whatever order they appear in memory (which is also least- to most- significant on little-endian machines like x86):
int x = ((unsigned char *)(&number))[n];
For the (n+1)th byte from least to most significant on big-endian machines:
int x = ((unsigned char *)(&number))[sizeof(int) - 1 - n];
For the (n+1)th byte from least to most significant (any endian):
int x = ((unsigned int)number >> (n << 3)) & 0xff;
Of course, these all assume that n < sizeof(int), and that number is an int.
int nth = (number >> (n * 8)) & 0xFF;
Carry it into the lowest byte and take it in the "familiar" manner.
If you are wanting a byte, wouldn't the better solution be:
uint8_t x = (uint8_t)(number >> (8 * n)); /* uint8_t from <stdint.h>; C has no built-in byte type */
This way, you are returning and dealing with a byte instead of an int, so we are using less memory, and we don't have to do the binary and operation & 0xff just to mask the result down to a byte. I also saw that the person asking the question used an int in their example, but that doesn't make it right.
I know this question was asked a long time ago, but I just ran into this problem, and I think that this is a better solution regardless.
//was trying to do inplace, would have been better if I had swapped higher and lower bytes somehow
uint32_t reverseBytes(uint32_t value) {
uint32_t temp;
size_t size=sizeof(uint32_t);
for(int i=0; i<size/2; i++){
//get byte i
temp = (value >> (8*i)) & 0xff;
//put higher in lower byte (0xffu keeps the shifts unsigned, avoiding signed overflow at bit 31)
value = ((value & (~(0xffu << (8*i)))) | (value & ((0xffu << (8*(size-i-1)))))>>(8*(size-2*i-1))) ;
//move lower byte which was stored in temp to higher byte
value=((value & (~(0xffu << (8*(size-i-1)))))|(temp << (8*(size-i-1))));
}
return value;
}

what does a[0] = addr & 0xff?

I'm currently learning from the book "The Shellcoder's Handbook". I have a strong understanding of C, but recently I came across a piece of code that I can't grasp.
Here is the piece of code:
char a[4];
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff;
a[1] = (addr & 0xff00) >> 8;
a[2] = (addr & 0xff0000) >> 16;
a[3] = (addr) >> 24;
So the question is: what does this do? What is addr & 0xff (and the three lines below it), and what does >> 8 do to it (I know that it divides it by 2 eight times)?
Ps: don't hesitate to tell me if you have ideas for the tags that I should use.
The variable addr is 32 bits of data, while each element in the array a is 8 bits. What the code does is copy the 32 bits of addr into the array a, one byte at a time.
Let's take this line:
a[1] = (addr & 0xff00) >> 8;
And then do it step by step.
addr & 0xff00 This gets the bits 8 to 15 of the value in addr, the result after the operation is 0x0000d300.
>> 8 This shifts the bits to the right, so 0x0000d300 becomes 0x000000d3.
Assign the resulting value of the mask and shift to a[1].
The code is trying to enforce endianness on the data input. Specifically, it is trying to enforce little-endian behavior on the data. Here is the explanation:
a[0] = addr & 0xff; /* gets the LSB 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* gets the 2nd LSB 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* gets 2nd MSB 0x06 */
a[3] = (addr) >> 24; /* gets the MSB 0x08 */
So basically, the code is masking and separating out every byte of data and storing it in the array "a" in the little endian format.
unsigned char a[4]; /* I think using unsigned char is better in this case */
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff; /* get the least significant byte 0xb0 */
a[1] = (addr & 0xff00) >> 8; /* get the second least significant byte 0xd3 */
a[2] = (addr & 0xff0000) >> 16; /* get the second most significant byte 0x06 */
a[3] = (addr) >> 24; /* get the most significant byte 0x08 */
Apparently, the code isolates the individual bytes from addr to store them in the array a so they can be indexed. The first line
a[0] = addr & 0xff;
masks out the least significant byte by using 0xff as a bit mask; the subsequent lines do the same, but in addition shift the result to the rightmost position. Finally, in the last line
a[3] = (addr) >> 24;
no masking is necessary anymore, as all unnecessary information is discarded by the shift.
The code is effectively storing a 32-bit address in an array of 4 chars. As you may know, a char is one byte (8 bits here). It first copies the first byte of the address, then shifts, copies the second byte, then shifts, etc. You get the gist.
It enforces endianness, and stores the integer in little-endian format in a.
See the illustration on Wikipedia.
Also, why not visualize the bit-shifting results:
char a[4];
unsigned int addr = 0x0806d3b0;
a[0] = addr & 0xff;
a[1] = (addr & 0xff00) >> 8;
a[2] = (addr & 0xff0000) >> 16;
a[3] = (addr) >> 24;
int i = 0;
for( ; i < 4; i++ )
{
printf( "a[%d] = %02x\t", i, (unsigned char)a[i] );
}
printf("\n" );
Output:
a[0] = b0 a[1] = d3 a[2] = 06 a[3] = 08
In addition to the multiple answers given, the code has some flaws that need to be fixed to make the code portable. In particular, the char type is very dangerous to use for storing values, because of its implementation-defined signedness. Very classic C bug. If the code was taken from a book, then you should read that book sceptically.
While we are at it, we can also tidy up the code, make it overly explicit to avoid potential future maintenance bugs, remove some implicit type promotions of integer literals etc.
#include <stdint.h>
uint8_t a[4];
uint32_t addr = 0x0806d3b0UL;
a[0] = addr & 0xFFu;
a[1] = (addr >> 8) & 0xFFu;
a[2] = (addr >> 16) & 0xFFu;
a[3] = (addr >> 24) & 0xFFu;
The masks & 0xFFu are strictly speaking not needed, but they might save you from some false positive compiler warnings about wrong integer types. Alternatively, each shift result could be cast to uint8_t and that would have been fine too.
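The signedness problem mentioned above can be made visible with a small sketch. The two function names are hypothetical, and signed char is used explicitly to stand in for a platform where plain char is signed; the values in the comments assume a two's complement representation.

```c
#include <stdint.h>

/* Both helper names are hypothetical, for illustration only. */

/* Widening a negative signed char sign-extends it:
 * the byte 0xB0 (-80 as a signed char) becomes 0xFFFFFFB0. */
uint32_t widen_without_mask(signed char c)
{
    return (uint32_t)c;
}

/* Masking after promotion keeps only the low 8 bits:
 * the same byte becomes 0x000000B0. */
uint32_t widen_with_mask(signed char c)
{
    return (uint32_t)(c & 0xFF);
}
```

With uint8_t, as in the cleaned-up code above, the first case cannot occur because the stored value is never negative.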

