bitwise operations to write an individual byte of an integer - c

How do I set the nth byte of a 64-bit unsigned integer regardless of endianness in C? One of the methods I tried is setting each bit in a loop.

Assuming n = 0 is the least significant byte, why can't you just do the following:
x |= (0xffull << (n * 8));
If x = 0 and n = 2 this sets x to 0x0ff0000. Unless I am missing something? I don't see what endian-ness has to do with the problem.
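The line above turns on every bit of byte n. If the goal is to store an arbitrary value in that byte, a minimal sketch is to clear the byte with a mask first and then OR in the new value. The helper name set_byte and the use of uint64_t are assumptions for illustration, not part of the question. Shifts operate on the integer's value rather than on its layout in memory, so the result does not depend on endianness.

```c
#include <stdint.h>

/* Overwrite byte n of x with value; n = 0 is the least significant byte.
   set_byte is a hypothetical helper name, and n is assumed to be 0..7. */
static uint64_t set_byte(uint64_t x, unsigned n, uint8_t value)
{
    unsigned shift = n * 8u;
    uint64_t mask = (uint64_t)0xff << shift;           /* selects byte n */
    return (x & ~mask) | ((uint64_t)value << shift);   /* clear, then set */
}
```

For example, set_byte(0, 2, 0xff) gives 0xff0000, the same result as the one-liner above.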

Related

SWAR byte counting methods from 'Bit Twiddling Hacks' - why do they work?

Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n:
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
However, it doesn't explain why they work. What's the logic behind these macros?
Let's try for intuition on countmore.
First, ~0UL/255*(127-n) is a clever way of copying the value 127-n to all bytes in the word in parallel. Why does it work? ~0 is 255 in all bytes. Consequently, ~0/255 is 1 in all bytes. Multiplying by (127-n) does the "copying" mentioned at the outset.
The term ~0UL/255*127 is just a special case of the above where n is zero. It copies 127 into all bytes. That's 0x7f7f7f7f if words are 4 bytes. "Anding" with x zeros out the high order bit in each byte.
That's the first term ((x)&~0UL/255*127). The result is the same as x except the high bit in each byte is zeroed.
The second term ~0UL/255*(127-(n)) is as above: 127-n copied to each byte.
For any given byte x[i], adding the two terms gives us 127-n+x[i] if x[i]<=127. This quantity will have the high order bit set whenever x[i]>n. It's easiest to see this as adding two 7-bit unsigned numbers. The result "overflows" into the 8th bit because the result is 128 or more.
So it looks like the algorithm is going to use the 8th bit of each byte as a boolean indicating x[i]>n.
So what about the other case, x[i]>127? Here we know the byte is more than n because the algorithm stipulates n<=127. The 8th bit ought to be always 1. Happily, the sum's 8th bit doesn't matter because the next step "or"s the result with x. Since x[i] has the 8th bit set to 1 if and only if it's 128 or larger, this operation "forces" the 8th bit to 1 just when the sum might provide a bad value.
To summarize so far, the "or" result has the 8th bit set to 1 in its i'th byte if and only if x[i]>n. Nice.
The next operation &~0UL/255*128 sets everything to zero except all those 8th bits of interest. It's "anding" with 0x80808080...
Now the task is to find the number of these bits set to 1. For this, countmore uses some basic number theory. First it shifts right 7 bits so the bits of interest are b0, b8, b16... The value of this word is
b0 + b8*2^8 + b16*2^16 + ...
A beautiful fact is that 1 == 2^8 == 2^16 == ... mod 255. In other words, each 1 bit is 1 mod 255. It follows that finding mod 255 of the shifted result is the same as summing b0+b8+b16+...
Yikes. We're done.
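The steps above can be written out one at a time. This is only a sketch: it assumes a 64-bit word (uint64_t instead of the macro's unsigned long), and the names ONES and countmore64 are made up for illustration. As in the original macro, n must be in the range 0..127.

```c
#include <stdint.h>

#define ONES (~(uint64_t)0 / 255)   /* 0x0101010101010101: 1 in every byte */

/* Count the bytes of x that are greater than n (0 <= n <= 127). */
static unsigned countmore64(uint64_t x, unsigned n)
{
    uint64_t low7  = x & (ONES * 127);          /* clear bit 7 of every byte */
    uint64_t sum   = low7 + ONES * (127 - n);   /* bit 7 set where low 7 bits > n */
    uint64_t flags = (sum | x) & (ONES * 128);  /* bit 7 set iff the byte > n */
    return (unsigned)(flags / 128 % 255);       /* sum the flag bits via mod 255 */
}
```

No byte of sum can exceed 254, so nothing carries from one byte into the next.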
Let's analyse the countless macro. It can be simplified to the following code:
#define A(n) (0x0101010101010101UL * (0x7F+(n)))
#define B(x) ((x) & 0x7F7F7F7F7F7F7F7FUL)
#define C(x,n) (A(n) - B(x))
#define countless(x,n) (( C(x,n) & ~(x) & 0x8080808080808080UL) / 0x80 % 0xFF )
A(n) will be:
A(0) = 0x7F7F7F7F7F7F7F7F
A(1) = 0x8080808080808080
A(2) = 0x8181818181818181
A(3) = 0x8282828282828282
....
B(x) masks each byte of x with 0x7F.
If we suppose x = 0xb0b1b2b3b4b5b6b7 (where b0..b7 stand for the bytes of x) and n = 0, then C(x,n) equals 0x(0x7F-b0)(0x7F-b1)(0x7F-b2)..., with each byte already masked with 0x7F.
For example, suppose x = 0x1234567811335577 and n = 0x50. Then:
A(0x50) = 0xCFCFCFCFCFCFCFCF
B(0x1234567811335577) = 0x1234567811335577
C(0x1234567811335577, 0x50) = 0xBD9B7957BE9C7A58
~(0x1234567811335577) = 0xEDCBA987EECCAA88
0xEDCBA987EECCAA88 & 0x8080808080808080UL = 0x8080808080808080
C(0x1234567811335577, 0x50) & 0x8080808080808080 = 0x8080000080800000
(0x8080000080800000 / 0x80) % 0xFF = 4 // counts the bytes whose 0x80 flag bit is set
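The worked example can be checked against a plain loop. The sketch below assumes a 64-bit word; countless64, countless_naive and ONES are hypothetical names. Per Bit Twiddling Hacks, n must be in the range 0..128.

```c
#include <stdint.h>

#define ONES (~(uint64_t)0 / 255)   /* 0x0101010101010101: 1 in every byte */

/* Count the bytes of x that are less than n (0 <= n <= 128). */
static unsigned countless64(uint64_t x, unsigned n)
{
    /* A(n) - B(x): each byte is (127 + n) minus the low 7 bits of the byte. */
    uint64_t diff = ONES * (127 + n) - (x & (ONES * 127));
    /* Keep bit 7 only where the byte itself was below 128. */
    return (unsigned)((diff & ~x & (ONES * 128)) / 128 % 255);
}

/* Reference version: compare the bytes one by one. */
static unsigned countless_naive(uint64_t x, unsigned n)
{
    unsigned count = 0;
    for (int i = 0; i < 8; i++) {
        if (((x >> (8 * i)) & 0xff) < n) {
            count++;
        }
    }
    return count;
}
```

Both functions return 4 for x = 0x1234567811335577 and n = 0x50, matching the hand calculation (the bytes 0x12, 0x34, 0x11 and 0x33 are below 0x50).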

How do I set a new bit pattern in a certain position without changing the rest of the bits in C?

Suppose I have the following unsigned long val = 0xfedcba9876543210 and I want to change the 16 least significant bits to 0xabcd. So, the original value will be changed to unsigned long val = 0xfedcba987654abcd. I already have a function get that can return 0x3210, but I'm unsure how I can change this section of the value to 0xabcd. For more context, here is what I am trying to implement:
void set_pattern(unsigned long* val, int i, unsigned short new_pattern) {
// my attempt
unsigned short old_pattern = get(val, i); // ex: returns 0x3210 when i = 0
unsigned short* ptr = NULL;
ptr = &old_pattern;
*ptr = new_pattern;
}
When I tried my attempt, it seemed to not set the new pattern as I expected. Any help or feedback is appreciated in helping me gain a better understanding of C.
To explain Nate's comment, you want to apply a bitmask to zero out the relevant bits, then apply the new bits with a bitwise or.
Let's do it with 32 bits, and say you want to change the least 8.
Apply a bitmask to turn the least 8 bits to 0: val = val & ~0xff. ~0xff is 0xffffff00. Since 1 & x = x, the bits under the f digits of the mask retain their value, and since 0 & x = 0, the bits under the 0 digits become 0 no matter their original value.
0x12345678 val
AND 0xffffff00 ~0xff
= 0x12345600
Now that the relevant bits have been masked out (turned to 0), we can overwrite just them with a bitwise or: val = val | new_value. The irrelevant bits of new_value are 0, and x | 0 = x, so those positions retain val's value. The relevant bits of val are 0, and 0 | x = x, so those positions take new_value's value.
0x12345600 val
OR 0x000000ef new_value
= 0x123456ef
If you want to replace different bits, you need to shift the bitmask and replacement value the appropriate amount.
Let's say we want to replace 56 with ef instead. Each hex character is 4 bits, so we need to left shift both the bitmask and replacement value by 8 bits.
0x12345678 val
AND 0xffff00ff ~(0xff << 8) == ~0xff00
= 0x12340078
0x12340078 val
OR 0x0000ef00 new_value << 8 == 0xef00
= 0x1234ef78
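Applied to the question's set_pattern, the mask-then-or approach might look like the sketch below. It assumes that i selects a 16-bit field with i = 0 being the least significant one (inferred from "returns 0x3210 when i = 0"), and it swaps unsigned long and unsigned short for uint64_t and uint16_t so the widths are fixed; both choices are assumptions.

```c
#include <stdint.h>

/* Replace the i-th 16-bit field of *val with new_pattern.
   Assumes a 64-bit value, so i must be 0..3 (0 = least significant field). */
static void set_pattern(uint64_t *val, int i, uint16_t new_pattern)
{
    unsigned shift = (unsigned)i * 16u;
    uint64_t mask = (uint64_t)0xffff << shift;                  /* field to replace */
    *val = (*val & ~mask) | ((uint64_t)new_pattern << shift);   /* clear, then set */
}
```

The original attempt only modified old_pattern, a local copy, which is why *val never changed. Here the result is written back through the pointer.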
A simple solution: mask off the low 16 bits, then OR in the new value:
0xfedcba9876543210 & 0xffffffffffff0000 = 0xfedcba9876540000
0xfedcba9876540000 | 0x000000000000abcd = 0xfedcba987654abcd

Constant time string equality test return value

Looking for a constant time string equality test I found that most of them use bit trickery on the return value. For example this piece of code:
int ctiszero(const void* x, size_t n)
{
volatile unsigned char r = 0;
for (size_t i = 0; i < n; i += 1) {
r |= ((unsigned char*)x)[i];
}
return 1 & ((r - 1) >> 8);
}
What is the purpose of return 1 & ((r - 1) >> 8);? Why not a simple return !r;?
As mentioned in one of my comments, this function checks whether an array of arbitrary bytes is all zero. If all bytes are zero then 1 will be returned, otherwise 0 will be returned.
If there is at least one non-zero byte, then r will be non-zero as well. Subtracting 1 then gives a value from 0 to 254 for an 8-bit unsigned char (r is promoted to int, but the value is non-negative either way). Shifting right by 8 moves all of those bits out, so the result is zero, which masked with 1 is still zero, and that is returned.
If all the bytes are zero, then the value of r will be zero as well. But here comes the "magic": in the expression r - 1, r undergoes the usual arithmetic conversions, which promote its value to int. The value is still zero, but now it's a signed integer. Subtract 1 and you have -1, which in the usual two's complement representation with a 32-bit int is 0xffffffff. Shifted right by 8 it becomes 0x00ffffff (or stays 0xffffffff if the compiler uses an arithmetic shift; right-shifting a negative value is implementation-defined), and masking with 1 yields 1 either way, which is returned.
With constant time code, typically code that may branch (and incur run-time time differences), like return !r; is avoided.
Note that a well-optimizing compiler may emit the exact same code for return 1 & ((r - 1) >> 8); as for return !r;. This exercise is therefore, at best, code to coax the compiler into emitting constant time code.
What about uncommon platforms?
return 1 & ((r - 1) >> 8); is well explained by @Some programmer dude's good answer when int is 32-bit 2's complement - something that is very common.
With 8-bit unsigned char, and r > 0, r-1 is non-negative and 1 & ((r - 1) >> 8) returns 0 even if int is 2's complement, 1's complement or sign-magnitude, 16-bit, 32-bit etc.
When r == 0, r-1 is -1. What 1 & ((r - 1) >> 8) returns is then implementation-defined behavior. It returns 1 with int as 2's complement or 1's complement, but 0 with sign-magnitude.
// fails with sign-magnitude (rare)
// fails when byte width > 8 (uncommon)
return 1 & ((r - 1) >> 8);
Small changes can fix it to work as desired in more cases [1]. Also see @Eric Postpischil.
By ensuring r - 1 is done using unsigned math, the int encoding becomes irrelevant.
// add the u suffix to 1, and shift by the byte width (CHAR_BIT)
return 1 & ((r - 1u) >> CHAR_BIT);
[1] Somewhat rare: when unsigned char is the same size as unsigned, OP's code and this fix fail. If a wider integer type were available, the code could use that, e.g.: return 1 & ((r - 1LLU) >> CHAR_BIT);
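Put together, the unsigned variant could look like the sketch below. The name ctiszero_u is hypothetical, and the code assumes unsigned int is wider than unsigned char (the common case described above). Whether the compiler actually emits branch-free code is not guaranteed by the C standard.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if all n bytes at x are zero, 0 otherwise.
   Assumes unsigned int is wider than unsigned char. */
static int ctiszero_u(const void *x, size_t n)
{
    const unsigned char *p = x;
    volatile unsigned char r = 0;
    for (size_t i = 0; i < n; i++) {
        r |= p[i];   /* accumulate every set bit */
    }
    /* Unsigned math: r - 1u wraps to UINT_MAX when r == 0 and is at most
       UCHAR_MAX - 1 otherwise, so only the r == 0 case survives the shift. */
    return (int)(1u & ((r - 1u) >> CHAR_BIT));
}
```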
That's shorthand for r > 128 or zero. Which is to say, it's a non-ASCII character. If r's high bit is set subtracting 1 from it will leave the high bit set unless the high bit is the only bit set. Thus greater than 128 (0x80) and if r is zero, underflow will set the high bit.
The result of the for loop then is that if any bytes have the high bit set, or if all of the bytes are zero, 1 will be returned. But if all the non-zero bytes do not have the high bit set 0 will be returned.
Oddly, for a string of all 0x80 and 0x00 bytes 0 will still be returned. Not sure if that's a "feature" or not!

Setting bits in a bit stream

I have encountered the following C function while working on legacy code and I am completely baffled by the way the code is organized. I can see that the function is trying to set bits at a given position in a bit stream, but I can't get my head around the individual statements and expressions. Can somebody please explain why the developer used division by 8 (/8) and modulus 8 (%8) expressions here and there? Is there an easy way to read these kinds of bit manipulation functions in C?
static void setBits(U8 *input, U16 *bPos, U8 len, U8 val)
{
U16 pos;
if (bPos==0)
{
pos=0;
}
else
{
pos = *bPos;
*bPos += len;
}
input[pos/8] = (input[pos/8]&(0xFF-((0xFF>>(pos%8))&(0xFF<<(pos%8+len>=8?0:8-(pos+len)%8)))))
|((((0xFF>>(8-len)) & val)<<(8-len))>>(pos%8));
if ((pos/8 == (pos+len)/8)|(!((pos+len)%8)))
return;
input[(pos+len)/8] = (input[(pos+len)/8]
&(0xFF-(0xFF<<(8-(pos+len)%8))))
|((0xFF>>(8-len)) & val)<<(8-(pos+len)%8);
}
please explain why the developer used division by 8 (/8) and modulus 8 (%8) expressions here and there
First of all, note that the individual bits of a byte are numbered 0 to 7, where bit 0 is the least significant one. There are 8 bits in a byte, hence the "magic number" 8.
Generally speaking: if you have any raw data, it consists of n bytes and can therefore always be treated as an array of bytes uint8_t data[n]. To access bit x in that byte array, you can for example do the following:
Given x = 17, bit x is then found in byte number 17/8 = 2. Note that integer division "floors" the value, instead of 2.125 you get 2.
The remainder of the integer division gives you the bit position in that byte, 17%8 = 1.
So bit number 17 is located in byte 2, bit 1. data[2] gives the byte.
To mask out a bit from a byte in C, the bitwise AND operator & is used. And in order to use that, a bit mask is needed. Such bit masks are best obtained by shifting the value 1 by the desired number of bits. Bit masks are perhaps most clearly expressed in hex, and the possible bit masks for a byte will be (1<<0) == 0x01, (1<<1) == 0x02, (1<<2) == 0x04, (1<<3) == 0x08 and so on.
In this case (1<<1) == 0x02.
C code:
uint8_t data[n];
...
size_t byte_index = x / 8;
size_t bit_index = x % 8;
bool is_bit_set;
is_bit_set = ( data[byte_index] & (1<<bit_index) ) != 0;
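The same index arithmetic also works for writing a bit. The sketch below wraps both directions in two small helpers; get_bit and set_bit are hypothetical names, and they follow this answer's numbering where bit 0 is the least significant bit of each byte. The legacy setBits function in the question appears to count from the most significant bit instead, so the two are not interchangeable.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read bit x of a byte array; bit 0 is the LSB of data[0]. */
static bool get_bit(const uint8_t *data, size_t x)
{
    return (data[x / 8] & (1u << (x % 8))) != 0;
}

/* Write bit x of a byte array without touching the other bits. */
static void set_bit(uint8_t *data, size_t x, bool value)
{
    uint8_t mask = (uint8_t)(1u << (x % 8));
    if (value) {
        data[x / 8] |= mask;            /* turn the bit on */
    } else {
        data[x / 8] &= (uint8_t)~mask;  /* turn the bit off */
    }
}
```

With a zeroed array, set_bit(data, 17, true) leaves data[2] equal to 0x02, matching the byte 2, bit 1 result worked out above.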

How do I get the lower 8 bits of an int?

Let's say I have an int variable n = 8. On most machines this will be a 32 bit value. How can I get only the lower 8 bits (lowest byte) of this in binary? Also, how can I access each bit to find out what it is?
unsigned n = 8;
unsigned low8bits = n & 0xFF;
Note a few things:
For bitwise operations, always use the unsigned types
Bits can be extracted from numbers using binary masking with the & operator
To access the low 8 bits the mask is 0xFF because in binary it has its low 8 bits turned on and the rest 0
The low 8 bits of the number 8 are... 8 (think about it for a moment)
To access a certain bit of a number, say the kth bit:
unsigned n = ...;
unsigned kthbit = (1u << k) & n; /* 1u keeps the shift unsigned, so k = 31 is safe with a 32-bit unsigned */
Now, kthbit will be 0 if the kth bit of n is 0, and some positive number (2**k) if the kth bit of n is 1.
Use bitwise arithmetic to mask off the lowest 8 bits:
unsigned char c = (x & 0xFF);
To access the nth lowest bit, the equation is (x & (1 << n)) (n of zero indicates the least significant bit). A result of zero indicates the bit is clear, and non-zero indicates the bit is set.
The best way is to use the bitwise AND operator & with the proper mask.
So for the lower 8 bits:
n & 0xFF; /* 0xFF == all the lower 8 bits set */
Or as a general rule:
n & ((1<<8)-1) /* generate 0x100 then subtract 1, thus 0xFF */
You can combine with the bit shift operator to get a specific bit:
(n & (1<<3))>>3;
/* will give the value of bit 3 (the fourth bit, counting from zero) - note the >>3 is just to make the value either 0 or 1, not 0 or non-0 */
You can test if a particular bit is set in a number using << and &, ie:
if (num & (1<<3)) ...
will test if the fourth bit is set or not.
Similarly, you can extract just the lowest 8 bits (as an integer) by using & with a number which only has the lowest 8 bits set, ie num & 255 or num & 0xFF (in hexadecimal).
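The answers above can be combined into a short sketch that extracts the low byte and then reads it bit by bit. The helper names low8, bit_at and low8_to_binary are hypothetical; the code uses unsigned values throughout, as recommended above.

```c
/* Keep only the lowest 8 bits of n. */
static unsigned low8(unsigned n)
{
    return n & 0xFFu;
}

/* Value (0 or 1) of bit k of n; k = 0 is the least significant bit. */
static unsigned bit_at(unsigned n, unsigned k)
{
    return (n >> k) & 1u;
}

/* Write the low 8 bits of n into buf as '0'/'1' characters, most
   significant bit first. buf must have room for 9 chars. */
static void low8_to_binary(unsigned n, char buf[9])
{
    for (int k = 7; k >= 0; k--) {
        buf[7 - k] = bit_at(n, (unsigned)k) ? '1' : '0';
    }
    buf[8] = '\0';
}
```

For n = 8 the buffer holds "00001000": only bit 3 is set.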

Resources