Does bit-shifting properly isolate bytes on different endian systems? - c

I'm attempting to write an integer to a byte buffer. Will the following code always write in big endian format, regardless of the endianness of the system:
byte[0] = (uint8_t) ((val & 0xFF000000) >> 24);
byte[1] = (uint8_t) ((val & 0x00FF0000) >> 16);
byte[2] = (uint8_t) ((val & 0x0000FF00) >> 8);
byte[3] = (uint8_t) (val & 0x000000FF);
Unfortunately, I don't have access to htonl() and similar functions.

Yes, this will work correctly. The bit-shifting operators deal with the abstract numeric values, not with how they are represented in hardware registers or RAM. >> N is essentially equivalent to dividing by 2^N.
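For illustration, here is a minimal sketch of a serializer written in that style (the helper name put_be32 is just an example, not from the question):
#include <stdint.h>

/* Illustrative sketch: write val into buf in big-endian (network) byte order.
   The shifts operate on the numeric value, so the bytes produced are the same
   regardless of the host's byte order. */
void put_be32(uint8_t *buf, uint32_t val)
{
    buf[0] = (uint8_t)(val >> 24);
    buf[1] = (uint8_t)(val >> 16);
    buf[2] = (uint8_t)(val >> 8);
    buf[3] = (uint8_t)val;
}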

Related

C how to read middle bits?

Please note my comments, the question is still unanswered.
I have the following which I can't change:
unsigned long addr=142;
u16 offset_low, offset_middle;
u32 offset_high;
I want to set offset_low for low 16 bits, offset_middle for mid 16 bits and offset_high for higher 32 bits of addr.
So I wrote:
offset_low = addr & 0xFFFF;
offset_middle = addr & 0xFFFF0000;
offset_high = addr & 0xFFFFFFFF0000;
Is this right? Is there a cleaner way to do it instead of writing so many Fs?
Why I think it's not right?
I am working with little endian, so when doing addr & 0xFFFF0000 I will get the mid bits but with zeros, and it may load the zeros instead of the non-zero bits.
For your purpose you must shift the masked values:
unsigned long addr = 142; // 64-bit on the target system
uint16_t offset_low = addr & 0xFFFF;
uint16_t offset_middle = (addr & 0xFFFF0000) >> 16;
uint32_t offset_high = (addr & 0xFFFFFFFF00000000) >> 32;
Note that since you extract exactly 16 and 32 bits to variables with the same size, masking can be omitted:
uint64_t addr = 142;
uint16_t offset_low = addr;
uint16_t offset_middle = addr >> 16;
uint32_t offset_high = addr >> 32;
The order of bytes in memory (little endian vs big endian) is irrelevant for this question. You could read the specific parts from memory using this knowledge, reading the first 2 bytes for offset_low, the next 2 for offset_middle and the next 4 for offset_high, but extracting from the full 64-bit value is performed the same for both architectures.
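As a quick illustration (the concrete value below is just an example), the shift-based extraction produces the same parts on either architecture:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t addr = 0x1122334455667788ULL;            /* example value */
    uint16_t offset_low    = (uint16_t)addr;          /* 0x7788 */
    uint16_t offset_middle = (uint16_t)(addr >> 16);  /* 0x5566 */
    uint32_t offset_high   = (uint32_t)(addr >> 32);  /* 0x11223344 */

    /* Prints "7788 5566 11223344" on big- and little-endian hosts alike. */
    printf("%04x %04x %08x\n", (unsigned)offset_low,
           (unsigned)offset_middle, (unsigned)offset_high);
    return 0;
}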
Shifting 1 left by the desired number of bits and then subtracting 1 gives a run of that many 1 bits, as long as you don't need the top (most significant) bit of the integer type to be part of the mask.
Assuming that unsigned long in the environment has 33 bits or more, it can be written like this:
offset_low = addr & ((1UL << 16) - 1);
offset_middle = (addr >> 16) & ((1UL << 16) - 1);
offset_high = (addr >> 32) & ((1UL << 32) - 1);
Is this right?
Not quite, these would be correct:
offset_low = addr & 0xFFFF;
offset_middle = (addr >> 16) & 0xFFFF;
offset_high = addr >> 32;
You didn't shift your results to the right (and your high was just wrong).
I am working with little endian, so when doing addr & 0xFFFF0000 I will get the mid bits but with zeros, and it may load the zeros instead of the non-zero bits.
The endianness doesn't matter in code, because the code runs on the same machine. It only matters during serialization, where you write to a stream on one machine and read it back on a machine of different endianness, thus getting garbage.

Copy low-order bytes of an integer whilst preserving endianness

I need to write a function that copies the specified number of low-order bytes of a given integer into an address in memory, whilst preserving their order.
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val)
I expect the usage to look like this:
uint8_t dest[3];
lo_bytes(dest, 3, 0x44332211);
// Big-endian: dest = 33 22 11
// Little-endian: dest = 11 22 33
I've tried to implement the function using bit-shifts, memcpy, and iterating over each byte of val with a for-loop, but all of my attempts failed to work on either one or the other endianness.
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
I've tried to implement the function using bit-shifts, memcpy, and iterating over each byte of val with a for-loop, but all of my attempts failed to work on either one or the other endianness.
All arithmetic, including bitwise arithmetic, is defined in terms of the values of the operands, not their representations. That alone cannot be sufficient for you, because you want to obtain a result that differs depending on details of the representation of type uint32_t.
You can operate on object representations via various approaches, but you still need to know which bytes to operate upon. That calls for some form of detection. If big-endian and little-endian are the only byte orders you're concerned with supporting, then I favor an approach similar to that given in #P__J__'s answer:
void lo_bytes(uint8_t *dest, uint8_t no_bytes, uint32_t val) {
    static const union { uint32_t i; uint8_t a[4]; } ubytes = { 1 };
    memcpy(dest, (uint8_t *)&val + (1 - ubytes.a[0]) * (4 - no_bytes), no_bytes);
}
The expression (1 - ubytes.a[0]) evaluates to 1 if the representation of uint32_t is big endian, in which case the high-order bytes occur at the beginning of the representation of val. In that case, we want to skip the first 4 - no_bytes of the representation and copy the rest. If uint32_t has a little-endian representation, on the other hand, (1 - ubytes.a[0]) will evaluate to 0, with the result that the memcpy starts at the beginning of the representation. In every case, whichever bytes are copied from the representation of val, their order is maintained. That's what memcpy() does.
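A possible usage sketch (a hypothetical test harness, assuming the lo_bytes() defined just above is in scope together with <string.h>):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t dest[3];
    lo_bytes(dest, 3, 0x44332211);
    /* Expected: "33 22 11" on a big-endian host, "11 22 33" on a little-endian one. */
    printf("%02x %02x %02x\n", (unsigned)dest[0], (unsigned)dest[1], (unsigned)dest[2]);
    return 0;
}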
Is it possible to do this in a platform-independent way, or do I need to use #ifdefs and have a separate piece of code for each endianness?
No, that doesn't even make sense. Anything that cares about a specific characteristic of a platform (e.g. endianness) can't be platform independent.
Example 1 (platform independent):
// Copy the 3 least significant bytes to dest[]
dest[0] = value & 0xFF;
dest[1] = (value >> 8) & 0xFF;
dest[2] = (value >> 16) & 0xFF;
Example 2 (platform independent):
// Copy the 3 most significant bytes to dest[]
dest[0] = (value >> 8) & 0xFF;
dest[1] = (value >> 16) & 0xFF;
dest[2] = (value >> 24) & 0xFF;
Example 3 (platform dependent):
// I want the least significant bytes on some platforms and the most significant bytes on other platforms
#ifdef PLATFORM_TYPE_A
dest[0] = value & 0xFF;
dest[1] = (value >> 8) & 0xFF;
dest[2] = (value >> 16) & 0xFF;
#endif
#ifdef PLATFORM_TYPE_B
dest[0] = (value >> 8) & 0xFF;
dest[1] = (value >> 16) & 0xFF;
dest[2] = (value >> 24) & 0xFF;
#endif
Note that it makes no real difference what the cause of the platform dependence is (endianness or something else): as soon as you have a platform dependence, you can't have platform independence.
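For what it's worth, the platform check itself doesn't have to be hand-rolled: GCC and Clang predefine __BYTE_ORDER__, so the PLATFORM_TYPE_* macros in Example 3 could be derived from it. This is compiler-specific, not standard C, and mapping A to little endian and B to big endian here is only for illustration:
/* Compiler-specific sketch (GCC/Clang): derive the PLATFORM_TYPE_* macros
   from the predefined byte-order macro instead of defining them by hand. */
#if defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__)
    #define PLATFORM_TYPE_A
#elif defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
    #define PLATFORM_TYPE_B
#endif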
#include <stdint.h>
#include <string.h>

int detect_endianness(void) // 1 if little endian, 0 if big endian
{
    union
    {
        uint16_t u16;
        uint8_t u8[2];
    } val = { .u16 = 0x1122 };
    return val.u8[0] == 0x22;
}

void lo_bytes(void *dest, uint8_t no_bytes, uint32_t val)
{
    if (detect_endianness())
    {
        memcpy(dest, &val, no_bytes);
    }
    else
    {
        memcpy(dest, (uint8_t *)(&val) + sizeof(val) - no_bytes, no_bytes);
    }
}

Mask or not mask when converting int to byte array?

Say you have an integer and you want to convert it to a byte array. After searching various places I've seen two ways of doing this: one is shift only, the other is shift then mask. I understand the shifting part, but why masking?
For example, scenario 1:
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = someInt >> 24;
someByteArray[1] = someInt >> 16;
someByteArray[2] = someInt >> 8;
someByteArray[3] = someInt;
Scenario 2:
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = (someInt >> 24) & 0xFF;
someByteArray[1] = (someInt >> 16) & 0xFF;
someByteArray[2] = (someInt >> 8) & 0xFF;
someByteArray[3] = someInt & 0xFF;
Is there a reason for choosing one over the other?
uint8 and uint32 are not standard types in C. I assume they represent 8-bit and 32-bit unsigned integral types, respectively (such as supported by Microsoft compilers as a vendor-specific extension).
Anyways ....
The masking is more general - it ensures the result is between 0 and 0xFF regardless of the actual types of the elements of someByteArray and of someInt.
In this particular case, it makes no difference, since the conversion of uint32 to uint8 is guaranteed to use modulo arithmetic (modulo 0xFF + 0x01 which is equal to 0x100 or 256 in decimal). However, if your code is changed to use variables or arrays of different types, the masking is necessary to ensure the result is between 0 and 255 (inclusive).
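A hypothetical illustration of that point: if the destination is wider than a byte, the unmasked version no longer yields a value in the 0..255 range (the variable names and values below are made up for the example):
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t someInt = 0xAABBCCDD;               /* example value */

    /* Destination wider than one byte: the implicit truncation no longer helps. */
    uint16_t unmasked = someInt >> 16;           /* 0xAABB - two bytes survive   */
    uint16_t masked   = (someInt >> 16) & 0xFF;  /* 0x00BB - only the wanted byte */

    printf("%04x %04x\n", (unsigned)unmasked, (unsigned)masked);
    return 0;
}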
With some compilers the masking stops compiler warnings (it effectively tells the compiler that the expression produces a value between 0 and 0xFF, which can be stored in an 8-bit unsigned). However, some other compilers complain about the act of converting a larger type to an 8-bit type. Because of that, you will sometimes see a third variant, which truly demonstrates a "belts and suspenders" mindset.
uint8 someByteArray[4];
uint32 someInt;
someByteArray[0] = (uint8)((someInt >> 24) & 0xFF);
someByteArray[1] = (uint8)((someInt >> 16) & 0xFF);
someByteArray[2] = (uint8)((someInt >> 8) & 0xFF);
someByteArray[3] = (uint8)(someInt & 0xFF);

is it possible to bit mask a char array

Let's say I have the following char array
char array[32];
I want to use only the 10 most significant bits of this array as a hash value.
Is it possible to use bitwise operation on this char array?
If so, how should I do it?
Assuming your implementation has 8-bit char, and that you have a 256-bit number stored in big endian in this array, here is how to access the 10 msb of the 256-bit number.
uint16_t a;
a = (array[0] << 2 | (array[1] & 0xC0) >> 6) & 0x3FF;
I'm pretty sure you want something like this (again assuming 8-bit chars stored big endian in array):
uint16_t a = (((uint16_t)array[0] & 0xFF) << 2 | ((uint16_t)array[1] & 0xFF) >> 6) & 0x3FF;
To break that down a bit:
uint16_t byte0 = (uint16_t)array[0] & 0xFF;
uint16_t byte1 = (uint16_t)array[1] & 0xFF;
uint16_t a = (byte0 << 2 | byte1 >> 6) & 0x3FF;
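A quick worked example (the byte values are chosen arbitrarily): with array[0] = 0xAB and array[1] = 0xCD, the top 10 bits of the buffer are 10 1010 1111, i.e. 0x2AF, and the expression yields exactly that even when char is signed:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Example contents; the casts assume the usual two's-complement wraparound. */
    char array[32] = { (char)0xAB, (char)0xCD };

    uint16_t byte0 = (uint16_t)array[0] & 0xFF;      /* 0x00AB */
    uint16_t byte1 = (uint16_t)array[1] & 0xFF;      /* 0x00CD */
    uint16_t a = (byte0 << 2 | byte1 >> 6) & 0x3FF;

    printf("%03x\n", (unsigned)a);                   /* prints 2af */
    return 0;
}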

Signed right shift = strange result?

I was helping someone with their homework and ran into this strange issue. The problem is to write a function that reverses the order of bytes of a signed integer (that's how the function was specified, anyway), and this is the solution I came up with:
int reverse(int x)
{
    int reversed = 0;
    reversed = (x & (0xFF << 24)) >> 24;
    reversed |= (x & (0xFF << 16)) >> 8;
    reversed |= (x & (0xFF << 8)) << 8;
    reversed |= (x & 0xFF) << 24;
    return reversed;
}
If you pass 0xFF000000 to this function, the first assignment will result in 0xFFFFFFFF. I don't really understand what is going on, but I know it has something to do with conversions back and forth between signed and unsigned, or something like that.
If I append ul to 0xFF it works fine, which I assume is because it's forced to unsigned and then converted to signed, or something in that direction. The resulting code also changes; without the ul suffix it uses sar (shift arithmetic right), but as unsigned it uses shr as intended.
I would really appreciate it if someone could shed some light on this for me. I'm supposed to know this stuff, and I thought I did, but I'm really not sure what's going on here.
Thanks in advance!
Since x is a signed quantity, the result of (x & (0xFF << 24)) is 0xFF000000, which is also signed and thus a negative number, since the top (sign) bit is set. The >> operator on int (a signed value) performs sign extension (Edit: strictly, right-shifting a negative signed value is implementation-defined, but an arithmetic shift is what most implementations do) and propagates the sign bit value of 1 as the value is shifted to the right.
You should rewrite the function as follows to work exclusively on unsigned values:
unsigned reverse(unsigned x)
{
    unsigned reversed = 0;
    reversed = (x & (0xFFu << 24)) >> 24;
    reversed |= (x & (0xFFu << 16)) >> 8;
    reversed |= (x & (0xFFu << 8)) << 8;
    reversed |= (x & 0xFFu) << 24;
    return reversed;
}
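A brief usage sketch of that version (assuming the unsigned reverse() above is in scope; the test values are arbitrary):
#include <stdio.h>

int main(void)
{
    printf("%08x\n", reverse(0xFF000000u)); /* prints 000000ff */
    printf("%08x\n", reverse(0x12345678u)); /* prints 78563412 */
    return 0;
}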
From your results we can deduce that int is 32 bits wide on your machine.
(x & (0xFF << 24)) >> 24
In this expression 0xFF is an int, so 0xFF << 24 is also an int, as is x.
When you perform the bitwise & between two int, the result is also an int and in this case the value is 0xFF000000 which on a 32-bit machine means that the sign bit is set, so you have a negative number.
The result of performing a right-shift on an object of signed type with a negative value is implementation-defined. In your case, a sign-preserving arithmetic shift right is performed.
If you right-shift an unsigned type, then you would get the results that you were expecting for a byte reversal function. You could achieve this by making either operand of the bitwise & an unsigned type, forcing conversion of both operands to the unsigned type. (This is true on any implementation where a signed int can't hold the full range of positive values of an unsigned int, which is nearly all implementations.)
Right shift on signed types is implementation-defined; in particular, the compiler is free to do an arithmetic or a logical shift as it pleases. This is something you will not notice if the concrete value you are dealing with is positive, but as soon as it is negative you may fall into a trap.
Just don't do it, this is not portable.
x is signed, so the highest bit is used for the sign. Interpreted as a 32-bit signed int, 0xFF000000 is a negative number (-0x01000000 in two's complement). When you do the shift, the result is "sign extended": the binary digit that is added on the left to replace the former MSB that was shifted right is always a copy of the sign bit. So
0xFF000000 >> 1 == 0xFF800000
0xFF000000 >> 2 == 0xFFC00000
0xFF000000 >> 3 == 0xFFE00000
0xFF000000 >> 4 == 0xFFF00000
If the value being shifted is unsigned, or if the shift is toward the left, the new bit is 0. It's only in right-shifts of signed values that sign extension comes into play.
If you want it to work the same on all platforms with both signed and unsigned integers, change
(x & (0xFF << 24)) >> 24
into
(x >> 24) & 0xFF
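Putting it together, here is a sketch of the whole function rewritten in that shift-then-mask style, doing the intermediate arithmetic on an unsigned value so no left shift lands in the sign bit (the name reverse_bytes and the unsigned interface are just one possible choice, echoing the accepted answer above):
#include <stdint.h>

/* Sketch: byte-reverse a 32-bit value using shift-then-mask. All arithmetic
   is done on unsigned values, so the right shifts are well defined and no
   left shift touches a sign bit. */
uint32_t reverse_bytes(uint32_t x)
{
    return ((x >> 24) & 0xFFu)
         | (((x >> 16) & 0xFFu) << 8)
         | (((x >> 8)  & 0xFFu) << 16)
         | ((x & 0xFFu) << 24);
}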
If this were Java code, you would use '>>>', which is an unsigned right shift; otherwise the shift will sign-extend the value.

Resources