Portable way of splitting n-byte integer into single bytes

Portable way of splitting n-byte integer into single bytes - c

The problem is simple:
Take an 32-bit or 64-bit integer and split it up to send over an (usually)1-byte interface like uart, spi or i2c.
To do this I can easily use bit masking and shifting to get what I want. However, I want this to be portable that will work on big and little endian, but also make it work for platforms that don't discard bits but rotate through carry(masking gets rid of excess bits right?).
Example code:
uint32_t value;
uint8_t buffer[4];
buffer[0] = (value >> 24) & 0xFF;
buffer[1] = (value >> 16) & 0xFF;
buffer[2] = (value >> 8) & 0xFF;
buffer[3] = value & 0xFF;
I want to guarantee this works on any platform that supports 32 bit integers or more. I don't know if this is correct.

The code you presented is the most portable way of doing it. You convert a single unsigned integer value with 32 bits width into an array of unsigned integer values of exactly 8 bits width. The resulting bytes in the buffer array are in big endian order.
The masking is not needed. From C11 6.5.7p5:
The result of E1 >> E2 is E1 right-shifted E2 bit positions. If E1 has an unsigned type or if E1 has
a signed type and a nonnegative value, the value of the result is the integral part of the quotient of
E1 / 2^E2.
and casting to an integer with 8 bits width is (to the value) equal to masking 8 bits. So (result >> 24) & 0xff is equal to (uint8_t)(result >> 24) (to the value). As you assign to uint8_t variable the masking is not needed. Anyway I would safely assume it will be optimized out by a sane compiler.
I can recommend to take a look at one implementation that I remembered, that I guess has implemented in a really safe manner all the possible variants of splitting and composing fixed-width integers up to 64 bits from bytes and back, that is at gpsd bits.h.

Related

Purpose of bitwise operation in this example

In the function outb(0x3D5, (uint8_t) (pos & 0xFF));, I am having trouble understanding the purpose of the bitwise operation. pos is a uint16_t variable.
What is the purpose of the bitwise & with simply 8 bits of 1. The & operation would just compare this value with 8 bits of 1, which to my knowledge, would just result in the same value. How does this operation change the value of pos?
For background if it is important, the function outb is intended to move the value of the right argument into the register indexed by the port (left argument). In this case, pos is a position in VGA adapter memory. The function outb uses inline assembly and the assembly instruction out.

The & operation would just compare this value with 8 bits of 1s, which to my knowledge, would just result in the same value.
The & operation does not “compare” values. It performs a bitwise AND. When the 16-bit pos is ANDed with 0xFF, the result is the low eight bits of pos. Thus, if pos is 0x1234u, then pos & 0xFF will produce the value of 0x34u.
The conversion to uint8_t with the cast (uint8_t) would produce the same value, because a conversion of an integer to uint8_t wraps the value modulo 28, which, for non-negative integers, is equivalent to taking the low eight bits.
Thus, these expressions all produce the same value:
pos & 0xFF,
(uint8_t) pos, and
(uint8_t) (pos & 0xFF).
(For negative integers, wrapping modulo 28 and taking the low eight bits are not equivalent if the C implementation uses sign-and-magnitude or one’s complement representations, which are allowed by the C standard but are very rare these days. Of course, no uint16_t values are negative.)

32-Bit variable shifting in 8-bit MCU

I'm using 32-bit variable for storing 4 8-bit values into one 32-bit value.
32_bit_buf[0]= cmd[9]<<16 | cmd[10]<<8| cmd[11] <<0;
cmd is of unsigned char type with data
cmd [9]=AA
cmd[10]=BB
cmd[11]=CC
However when 32-bit variable is printed I'm getting 0xFFFFBBCC.
Architecture- 8-bit AVR Xmega
Language- C
Can anyone figure out where I'm going wrong.

Your architecture uses 16bit int, so shifting by 16 places is undefined. Cast your cmd[9] to a wider type, e.g. (uint32_t)cmd[9] << 16 should work.
You should also apply this cast to the other components: When you shift cmd[10] by 8 places, you could shift into the sign-bit of the 16bit signed int your operands are automatically promoted to, leading to more strange/undefined behavior.

That is because you are trying to shift value in 8 bit container (unsigned char) and get a 32 bit. The 8-bit value will be expanded to int (16 bit), but this is still not enough. You can solve the issue in many ways, one of them for e.g. could be by using the destination variable as accumulator.
32_bit_buf[0] = cmd[9];
32_bit_buf[0] <<= 8;
32_bit_buf[0] |= cmd[10];
32_bit_buf[0] <<= 8;
32_bit_buf[0] |= cmd[11];

Why first bitshifting left and then right, instead of AND-ing?

I came across this piece of C code:
typedef int gint
// ...
gint a, b;
// ...
a = (b << 16) >> 16;
For ease of notation let's assume that b = 0x11223344 at this point. As far as I can see it does the following:
b << 16 will give 0x33440000
>> 16 will give 0x00003344
So, the 16 highest bits are discarded.
Why would anyone write (b << 16) >> 16 if b & 0x0000ffff would work as well? Isn't the latter form more understandable? Is there any reason to use bitshifts in a case like this? Is there any edge-case where the two could not be the same?

Assuming that the size of int is 32 bits, then there is no need to use shifts. Indeed, bitwise & with a mask would be more readable, more portable and safer.
It should be noted that left-shifting on negative signed integers invokes undefined behavior, and that left-shifting things into the sign bits of a signed integer could also invoke undefined behavior. C11 6.5.7 (emphasis mine):
The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of
the result is E1 × 2E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and
nonnegative value, and E1 × 2E2 is representable in the result type,
then that is the resulting value; otherwise, the behavior is
undefined.
(The only possible rationale I can think of, is some pre-mature optimization for a 16-bit CPU that comes with a poor compiler. Then the code would be more efficient if you broke up the arithmetic in 16 bit chunks. But on such a system, int would most likely be 16 bits, so the code wouldn't make any sense then.)
As a side note, it doesn't make any sense to use the signed int type either. The most correct and safe type for this code would have been uint32_t.

So, the 16 highest bits are discarded.
They are not. Though it is formally implementation-defined how right-shift operation is performed on signed types, most compilers do it so as to replicate the sign bit.
Thus, the 16 highest bits are filled by the replicated value of the 15th bit as the result of this expression.

For an unsigned integral type (eg, the uint32_t we first thought was being used),
(b << 16) >> 16
is identical to b & (1<<16 - 1).
For a signed integral type though,
(b << 16)
could become negative (ie, the low int16_t would have been considered negative when taken on its own), in which case
(b << 16) >> 16
will (probably) still be negative due to sign extension. In that case, it isn't the same as the & mask, because the top bits will be set instead of zero.
Either this behaviour is deliberate (in which case the commented-out typedef is misleading), or it's a bug. I can't tell without reading the code.
Oh, and the shift behaviour in both directions is how I'd expect gcc to behave on x86, but I can't comment on how portable it is outside that. The left-shift may be UB as Lundin points out, and sign extension on the right-shift is implementation defined.

Assigning bits to a 64-bit variable

I am kinda new to bit operations. I am trying to store information in an int64_t variable like this:
int64_t u = 0;
for(i=0;i<44;i++)
u |= 1 << i;
for(;i<64;i++)
u |= 0 << i;
int t = __builtin_popcountl(u);
and what I intended with this was to store 44 1s in variable u and make sure that the remaining positions are all 0, so "t" returns 44. However, it always returns 64. With other variables, e.g. int32, it also fails. Why?

The type of an expression is generally determined by the expression itself, not by the context in which it appears.
Your variable u is of type int64_t (incidentally, uint64_t would be better since you're performing bitwise operations).
In this line:
u |= 1 << i;
since 1 is of type int, 1 << i is also of type int. If, as is typical, int is 32 bits, this has undefined behavior for larger values of i.
If you change this line to:
u |= (uint64_t)1 << i;
it should do what you want.
You could also change the 1 to 1ULL. That gives it a type of unsigned long long, which is guaranteed to be at least 64 bits but is not necessarily the same type as uint64_t.

__builtin_popcountl takes unsigned long as its paremeter, which is not always 64-bit integer. I personally use __builtin_popcountll, which takes long long. Looks like it's not the case for you
Integers have type 'int' by default, and by shifting int by anything greater or equal to 32 (to be precise, int's size in bits), you get undefined behavior. Correct usage: u |= 1LL << i; Here LL stands for long long.
Oring with zero does nothing. You can't just set bit to a particular value, you should either OR with mask (if you want to set some bits to 1s) or AND with mask's negation (if you want to set some bits to 0s), negation is done by tilda (~).

When you shift in the high bit of the 32-bit integer and and convert to 64-bit the sign bit will extend through the upper 32 bits; which you will then OR in setting all 64 bits, because your literal '1' is a signed 32 bit int by default. The shift will also not effect the upper 32 bits because the value is only 32 bit; however the conversion to 64-bit will when the the value being converted is negative.
This can be fixed by writing your first loop like this:
for(i=0;i<44;i++)
u |= (int64_t)1 << i;
Moreover, this loop does nothing since ORing with 0 will not alter the value:
for(;i<64;i++)
u |= 0 << i;

Clear lower 16 bits

I'm not so good with bitwise operators so please excuse the question but how would I clear the lower 16 bits of a 32-bit integer in C/C++?
For example I have an integer: 0x12345678 and I want to make that: 0x12340000

To clear any particular set of bits, you can use bitwise AND with the complement of a number that has 1s in those places. In your case, since the number 0xFFFF has its lower 16 bits set, you can AND with its complement:
b &= ~0xFFFF; // Clear lower 16 bits.
If you wanted to set those bits, you could instead use a bitwise OR with a number that has those bits set:
b |= 0xFFFF; // Set lower 16 bits.
And, if you wanted to flip those bits, you could use a bitwise XOR with a number that has those bits set:
b ^= 0xFFFF; // Flip lower 16 bits.
Hope this helps!

To take another path you can try
x = ((x >> 16) << 16);

One way would be to bitwise AND it with 0xFFFF0000 e.g. value = value & 0xFFFF0000

Use an and (&) with a mask that is made of the top 16 bit all ones (that will leave the top bits as they are) and the bottom bits all zeros (that will kill the bottom bits of the number).
So it'll be
0x12345678 & 0xffff0000
If the size of the type isn't known and you want to mask out only the lower 16 bits you can also build the mask in another way: use a mask that would let pass only the lower 16 bits
0xffff
and invert it with the bitwise not (~), so it will become a mask that kills only the lower 16 bits:
0x12345678 & ~0xffff

int x = 0x12345678;
int mask = 0xffff0000;
x &= mask;

Assuming the value you want to clear bits from has an unsigned type not of "small rank", this is the safest, most portable way to clear the lower 16 bits:
b &= -0x10000;
The value -0x10000 will be promoted to the type of b (an unsigned type) by modular arithmetic, resulting in all high bits being set and the low 16 bits being zero.
Edit: Actually James' answer is the safest (broadest use cases) of all, but the way his answer and mine generalize to other similar problems is a bit different and mine may be more applicable in related problems.