Optimized code for big to little endian conversion - c

In an interview, I was asked to implement big_to_little_endian() as a macro. I implemented it using the shift operator, but the interviewer wanted me to optimize it further. I could not do it. Later I googled and searched but could not find anything. Can someone help me understand how to optimize this code further?
#define be_to_le(x) (((x) >> 24) | (((x) & 0x00FF0000) >> 8) | (((x) & 0x0000FF00) << 8) | ((x) << 24))

He might have been referring to using a 16-bit op to swap the top two halfwords, then using 8-bit ops to swap the bytes within them -- it saves a couple of instructions and is easiest done in a union, though C technically doesn't like it (many compilers will accept it), and it is still compiler-dependent, since you are hoping the compiler optimizes a couple of things out:
union dword {
    unsigned int i;
    struct {
        unsigned short s0, s1;
    } s;
    struct {
        unsigned char c0, c1, c2, c3;
    } c;
};
union dword in = { .i = x };
union dword temp = { .s = { in.s.s1, in.s.s0 } };                           /* swap the 16-bit halves */
union dword out = { .c = { temp.c.c1, temp.c.c0, temp.c.c3, temp.c.c2 } };  /* swap the bytes within each half */
Even written as valid C this is type punning, so you get the idea, but I don't think the compiler will even emit what I'm hoping it will.
Or you can save an op but introduce a data dependency, so it probably runs slower:
temp = (x << 16) | (x >> 16);
out = ((0xff00ff00 & temp) >> 8) | ((0x00ff00ff & temp) << 8);
Best is to just use the compiler intrinsic, since it maps to a single bswap instruction.
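For what it's worth, a minimal sketch of the intrinsic route (assuming gcc or clang; MSVC has _byteswap_ulong instead):
#include <stdint.h>

static inline uint32_t be_to_le32(uint32_t x)
{
    return __builtin_bswap32(x);  /* compiles down to a single bswap on x86 */
}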

Related

C set 3 bits for a particular number

I am trying to understand the masking concept and want to set bits 24, 25, and 26 of a uint32_t number in C.
For example, I have:
uint32_t data = 0;
I am taking an input from the user as a uint8_t, which can only be the value 3 or 4 (011, 100).
I want to set the value 011 or 100 in bits 24, 25, and 26 of the data variable without disturbing the other bits.
Thanks.
To set bits 24, 25, and 26 of an integer without modifying the other bits, you can use this pattern:
data = (data & ~((uint32_t)7 << 24)) | ((uint32_t)(newBitValues & 7) << 24);
The first & operation clears those three bits. Then we use another & operation to ensure we have a number between 0 and 7. Then we shift it to the left by 24 bits and use | to put those bits into the final result.
I have some uint32_t casts just to ensure that this code works properly on systems where int has fewer than 32 bits, but you probably won't need those unless you are programming embedded systems.
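For example, setting bits 24..26 of an all-ones value to 4 (0b100) with that exact pattern leaves every other bit alone:
uint32_t data = 0xFFFFFFFF;
uint8_t newBitValues = 4;  /* user input, only 3 or 4 per the question */
data = (data & ~((uint32_t)7 << 24)) | ((uint32_t)(newBitValues & 7) << 24);
/* data is now 0xFCFFFFFF: bits 24..26 hold 100, everything else is untouched */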
A more general approach: a macro and a function. Both are equally efficient, as optimizing compilers do a really good job. The macro sets n bits of d at position s to nd; the function takes its parameters in the same order.
#define MASK(n) ((1ULL << (n)) - 1)
#define SMASK(n,s) (~(MASK(n) << (s)))
#define NEWDATA(d,n,s) (((d) & MASK(n)) << (s))
#define SETBITS(d,nd,n,s) (((d) & SMASK(n,s)) | NEWDATA(nd,n,s))
uint32_t setBits(uint32_t data, uint32_t newBitValues, unsigned nbits, unsigned startbit)
{
    uint32_t mask = (1UL << nbits) - 1;
    uint32_t smask = ~(mask << startbit);
    data = (data & smask) | ((newBitValues & mask) << startbit);
    return data;
}
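A quick usage sketch, assuming the setBits function above is in scope (the values are arbitrary illustrations):
#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t data = 0xAABBCCDD;
    uint8_t user = 4;                   /* per the question, only 3 or 4 */
    data = setBits(data, user, 3, 24);  /* write 3 bits at bit position 24 */
    printf("0x%08" PRIX32 "\n", data);  /* prints 0xACBBCCDD */
    return 0;
}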

copy byte-reversed uint64_t to uint8_t array

I know how to reverse the byte order (convert big endian to little endian in C [without using provided func]) - in this case I'd like to use __builtin_bswap64
I also know how to copy a 64-bit uint to a char array - ideally memcpy. (How do I convert a 64bit integer to a char array and back?)
My problem is the combination of both these. At the root of the problem, I'm trying to find a faster alternative to this code:
carr[33] = (some64bitvalue >> 56) & 0xFF;
carr[34] = (some64bitvalue >> 48) & 0xFF;
carr[35] = (some64bitvalue >> 40) & 0xFF;
carr[36] = (some64bitvalue >> 32) & 0xFF;
carr[37] = (some64bitvalue >> 24) & 0xFF;
carr[38] = (some64bitvalue >> 16) & 0xFF;
carr[39] = (some64bitvalue >> 8) & 0xFF;
carr[40] = some64bitvalue & 0xFF;
As memcpy doesn't take the result of __builtin_bswap64 as its source argument (or does it?), I tried this:
*(uint64_t *)upub+33 = __builtin_bswap64(some64bitvalue);
but I end up with the
error: lvalue required as left operand of assignment
Is there a faster alternative to the original code I'm trying to replace at all?
This:
*(uint64_t *)upub+33 = __builtin_bswap64(PplusQ[di][3]);
parses as
(*(uint64_t *) upub) + 33 = __builtin_bswap64(PplusQ[di][3]);
so the left-hand side is a uint64_t, not an lvalue.
So would this work?
*(uint64_t *) (upub+33) = __builtin_bswap64(PplusQ[di][3]);
or did you mean to cast upub to uint64_t * first, as Aconcagua commented?
*((uint64_t *) upub + 33) = __builtin_bswap64(PplusQ[di][3]);
I didn't see the type of upub mentioned, so I can't tell.
Also, I have a feeling that there may be an issue with the aliasing rules if upub is originally pointing to another type, so you may want to use something like gcc's -fno-strict-aliasing or make the assignment through a union, or one byte at a time as in your first code snippet.
You can copy as:
uint64_t tmp = __builtin_bswap64(some64bitvalue);
memcpy(upub+33,&tmp,sizeof(tmp));
assuming upub is a pointer variable.
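Putting it together as a self-contained sketch (upub and some64bitvalue are the question's names; __builtin_bswap64 assumes gcc/clang and a little-endian host):
#include <stdint.h>
#include <string.h>

/* Write some64bitvalue in big-endian byte order to bytes 33..40 of upub. */
void store_be64(uint8_t *upub, uint64_t some64bitvalue)
{
    uint64_t tmp = __builtin_bswap64(some64bitvalue);
    memcpy(upub + 33, &tmp, sizeof tmp);  /* optimizers typically fold this into a single store */
}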
When writing endian-independent code there is no alternative to bit shifts. Your code is likely already close to ideal.
What you could play around with is to use a loop instead of hard-coded numbers. Something along the lines of this:
for (uint_fast8_t i = 0; i < 8; i++)
{
    carr[i + offset] = (some64bitvalue >> (56 - i*8)) & 0xFF;
}
This may turn out slower, faster, or equal compared to what you already have, depending on the system. Overall, it doesn't make sense to discuss manual optimization like this without a specific system in mind.

Combining uint8_t, uint16_t and uint8_t

I have three values, a uint8_t, a uint16_t and a uint8_t, in that order. I am trying to combine them into one uint32_t without losing the order. I found a similar question here, but I got stuck on the uint16_t value in the middle.
For example:
uint8_t v1=0x01;
uint16_t v2=0x1001;
uint8_t v3=0x11;
uint32_t comb = 0x01100111;
I was thinking about splitting v2 into two separate uint8_t:s but realized there might be some easier way to solve it.
My try:
v2 = 0x1001;
uint8_t a = v2 & 0xFF;  /* low byte of v2 */
uint8_t b = v2 >> 8;    /* high byte of v2 */
uint16_t first = ((uint16_t)v1 << 8) | b;
uint16_t end = ((uint16_t)a << 8) | v3;
comb = ((uint32_t)first << 16) | end;
This is your implied transformation, written as a one-liner:
uint32_t comb = ((uint32_t)v1 << 24) | (((uint32_t)v2 << 8) | v3);
Basically, you have 8 + 16 + 8 bits building the 32-bit type. To put the first value at the head, cast it to 32 bits and shift left by 24 (32 - 8). Then OR in the next values, shifting each to its offset so the remaining bits are filled with zeros, casting as needed.
You use OR for the obvious reason of not losing any information.
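A quick sanity check of the one-liner against the question's example values:
#include <stdint.h>
#include <assert.h>

int main(void)
{
    uint8_t v1 = 0x01;
    uint16_t v2 = 0x1001;
    uint8_t v3 = 0x11;
    uint32_t comb = ((uint32_t)v1 << 24) | ((uint32_t)v2 << 8) | v3;
    assert(comb == 0x01100111);  /* bytes in order: v1, then v2's two bytes, then v3 */
    return 0;
}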

Changing endianness on 3 byte integer

I am receiving a 3-byte integer, which I'm storing in an array. For now, assume the array is unsigned char myarray[3].
Normally, I would convert this into a standard int using:
int mynum = ((myarray[2] << 16) | (myarray[1] << 8) | (myarray[0]));
However, before I can do this, I need to convert the data from network to host byte ordering.
So, I change the above to (it comes in 0-1-2, but it's n to h, so 0-2-1 is what I want):
int mynum = ((myarray[1] << 16) | (myarray[2] << 8) | (myarray[0]));
However, this does not seem to work. For the life of me I can't figure this out. I've looked at it so much that at this point I think I'm fried and just confusing myself. Is what I am doing correct? Is there a better way? Would the following work?
int mynum = ((myarray[2] << 16) | (myarray[1] << 8) | (myarray[0]));
int correctnum = ntohl(mynum);
Here's an alternate idea: why not just make it structured and make it explicit what you're doing? Some of the confusion you're having may be rooted in the "I'm storing in an array" premise. If instead you defined:
typedef struct {
    uint8_t highByte;
    uint8_t midByte;
    uint8_t lowByte;
} ThreeByteInt;
To turn it into an int, you just do
uint32_t ThreeByteTo32(ThreeByteInt *bytes) {
    return (bytes->highByte << 16) + (bytes->midByte << 8) + (bytes->lowByte);
}
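Usage, with hypothetical received bytes, would look like:
ThreeByteInt raw = { 0x01, 0x02, 0x03 };  /* network order: most significant byte first */
uint32_t mynum = ThreeByteTo32(&raw);     /* yields 0x010203 */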
If you receive the value in network ordering (that is, big endian) you have this situation:
myarray[0] = most significant byte
myarray[1] = middle byte
myarray[2] = least significant byte
so this should work:
int result = (((int) myarray[0]) << 16) | (((int) myarray[1]) << 8) | ((int) myarray[2]);
Besides using structures / unions with byte-sized members, you have two other ways:
- Using ntohl / htonl and masking out the high byte of the 4-byte integer with a bitwise AND, before or after the conversion.
- Doing the bit-shift operations contained in the other answers.
At any rate, you should not rely on side effects and shift data beyond the size of the data type. A shift by 16 is beyond the size of unsigned char, and relying on the implicit promotion is fragile across compilers, flags, and platform byte orders. So always do the proper cast before the bitwise operations to make it work on any compiler / platform:
int result = (((int) myarray[0]) << 16) | (((int) myarray[1]) << 8) | ((int) myarray[2]);
Why not just receive into the top 3 bytes of a 4-byte buffer? After that you can use ntohl, which is just a byte-swap instruction on most architectures. At some optimization levels it'll be faster than simple bit shifts and ORs:
union
{
    int32_t val;
    unsigned char myarray[4];
} data;
memcpy(&data, buffer, 3);
data.myarray[3] = 0;
data.val = ntohl(data.val) >> 8;  // or, on little-endian hosts, ntohl(data.val << 8)
or, in case you have copied it to the bottom 3 bytes, then ntohl alone is enough:
memcpy(&data.myarray[1], buffer, 3);
data.myarray[0] = 0;
data.val = ntohl(data.val);
unsigned char myarray[3] = { 1, 2, 3 };
# if LITTLE_ENDIAN // you figure out a way to express this on your platform
int mynum = (myarray[0] << 0) | (myarray[1] << 8) | (myarray[2] << 16);
# else
int mynum = (myarray[0] << 16) | (myarray[1] << 8) | (myarray[2] << 0);
# endif
printf("%x\n", mynum);
That prints 30201 which I think is what you want. The key is to realize that you have to shift the bytes differently per-platform: you can't easily use ntohl() because you don't know where to put the extra zero byte.

C bitfield element with non-contiguous layout

I'm looking for input on the most elegant interface to put around a memory-mapped register interface where the target object is split in the register:
union __attribute__((__packed__)) epsr_t {
    uint32_t storage;
    struct {
        unsigned reserved0   : 10;
        unsigned ICI_IT_2to7 : 6;  // TOP HALF
        unsigned reserved1   : 8;
        unsigned T           : 1;
        unsigned ICI_IT_0to1 : 2;  // BOTTOM HALF
        unsigned reserved2   : 5;
    } bits;
};
In this case, accessing the single bit T or any of the reserved fields works fine, but reading or writing ICI_IT requires code more like:
union epsr_t epsr;
// Reading:
uint8_t ici_it = (epsr.bits.ICI_IT_2to7 << 2) | epsr.bits.ICI_IT_0to1;
// Writing:
epsr.bits.ICI_IT_2to7 = ici_it >> 2;
epsr.bits.ICI_IT_0to1 = ici_it & 0x3;
At this point I've lost a chunk of the simplicity / convenience that the bitfield abstraction is trying to provide. I considered the macro solution:
#define GET_ICI_IT(_e) (((_e).bits.ICI_IT_2to7 << 2) | (_e).bits.ICI_IT_0to1)
#define SET_ICI_IT(_e, _i) do {\
    (_e).bits.ICI_IT_2to7 = (_i) >> 2;\
    (_e).bits.ICI_IT_0to1 = (_i) & 0x3;\
} while (0)
But I'm not a huge fan of macros like this as a general rule, I hate chasing them down when I'm reading someone else's code, and far be it from me to inflict such misery on others. I was hoping there was a creative trick involving structs / unions / what-have-you to hide the split nature of this object more elegantly (ideally as a simple member of an object).
I don't think there's ever a 'nice' way, and actually I wouldn't rely on bitfields... Sometimes it's better to just have a bunch of exhaustive macros to do everything you'd want to do, document them well, and then rely on them having encapsulated your problem...
#define ICI_IT_HI_SHIFT 8
#define ICI_IT_HI_MASK 0xfc
#define ICI_IT_LO_SHIFT 25
#define ICI_IT_LO_MASK 0x03
// Bits containing the ICI_IT value split in the 32-bit EPSR
#define ICI_IT_PACKED_MASK ((ICI_IT_HI_MASK << ICI_IT_HI_SHIFT) | \
                            (ICI_IT_LO_MASK << ICI_IT_LO_SHIFT))
// Packs a single 8-bit ICI_IT value x into a 32-bit EPSR e
#define PACK_ICI_IT(e,x) (((e) & ~ICI_IT_PACKED_MASK) | \
                          (((x) & ICI_IT_HI_MASK) << ICI_IT_HI_SHIFT) | \
                          (((x) & ICI_IT_LO_MASK) << ICI_IT_LO_SHIFT))
// Unpacks a split 8-bit ICI_IT value from a 32-bit EPSR e
#define UNPACK_ICI_IT(e) ((((e) >> ICI_IT_HI_SHIFT) & ICI_IT_HI_MASK) | \
                          (((e) >> ICI_IT_LO_SHIFT) & ICI_IT_LO_MASK))
Note that I haven't put type casting and normal macro stuff in, for the sake of readability. Yes, I get the irony in mentioning readability...
If you dislike macros that much just use an inline function, but the macro solution you have is fine.
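For instance, a sketch of the inline-function variant, reusing the accessor logic from the question (the names are mine):
static inline uint8_t get_ici_it(const union epsr_t *e)
{
    return (uint8_t)((e->bits.ICI_IT_2to7 << 2) | e->bits.ICI_IT_0to1);
}

static inline void set_ici_it(union epsr_t *e, uint8_t v)
{
    e->bits.ICI_IT_2to7 = v >> 2;   /* top six bits of the value */
    e->bits.ICI_IT_0to1 = v & 0x3;  /* bottom two bits */
}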
Does your compiler support anonymous structs and unions? I find them an elegant solution which gets rid of your .bits part. They are not C99 compliant, but most compilers do support them, and the feature became standard in C11.
See also this question: Anonymous union within struct not in c99?
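For illustration, the question's union with an anonymous inner struct (C11) would look like this:
union epsr_t {
    uint32_t storage;
    struct {
        unsigned reserved0   : 10;
        unsigned ICI_IT_2to7 : 6;
        unsigned reserved1   : 8;
        unsigned T           : 1;
        unsigned ICI_IT_0to1 : 2;
        unsigned reserved2   : 5;
    };  /* anonymous: fields are accessed directly, e.g. epsr.T */
};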
