I have a header that can be any number of bits long, and a variable called ByteAlign that is calculated by subtracting the current file position from the file position at the beginning of the file. The point of this variable is to pad the header to the next complete byte: if the header takes up 57 bits, ByteAlign needs to be 7 bits long to pad the header to 64 bits total, or 8 bytes.
Solutions that don't work:
Variable % 8 - 8: the result is the answer, but negative.
8 % Variable: this is completely inaccurate and gives answers like 29, which is blatantly wrong; the largest value it should give is 7.
How exactly do I do this?
The number of bytes you need to accommodate n bits is (n + 7) / 8.
The number of bits in this is 8 * ((n + 7) / 8).
The amount of padding is thus 8 * ((n + 7) / 8) - n.
This should work:
(8 - (Variable & 7)) & 7
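A quick check that this masked formula agrees with the division-based one from the previous answer. This is a sketch that assumes the header length is an unsigned count of bits; the function names pad_div and pad_mask are hypothetical:

```c
#include <assert.h>

/* Padding via rounding up: bits occupied by whole bytes, minus bits used. */
static unsigned pad_div(unsigned n) {
    return 8 * ((n + 7) / 8) - n;
}

/* Padding via masking: only the low 3 bits of n (n % 8) matter. */
static unsigned pad_mask(unsigned n) {
    return (8 - (n & 7)) & 7;
}

/* Self-check: both formulas agree and never exceed 7. */
static void check_pad(void) {
    for (unsigned n = 0; n < 1000; n++) {
        assert(pad_div(n) == pad_mask(n));
        assert(pad_mask(n) <= 7);
    }
}
```

For the 57-bit header, pad_mask(57) is (8 - 1) & 7 = 7, and a header that is already 64 bits long gets 0 padding bits.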
Related
Bit Twiddling Hacks contains the following macros, which count the number of bytes in a word x that are less than, or greater than, n:
#define countless(x,n) \
(((~0UL/255*(127+(n))-((x)&~0UL/255*127))&~(x)&~0UL/255*128)/128%255)
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)
However, it doesn't explain why they work. What's the logic behind these macros?
Let's try for intuition on countmore.
First, ~0UL/255*(127-n) is a clever way of copying the value 127-n to all bytes in the word in parallel. Why does it work? ~0 is 255 in all bytes. Consequently, ~0/255 is 1 in all bytes. Multiplying by (127-n) does the "copying" mentioned at the outset.
The term ~0UL/255*127 is just a special case of the above where n is zero. It copies 127 into all bytes. That's 0x7f7f7f7f if words are 4 bytes. "Anding" with x zeros out the high order bit in each byte.
That's the first term, ((x)&~0UL/255*127). The result is the same as x except that the high bit in each byte is zeroed.
The second term ~0UL/255*(127-(n)) is as above: 127-n copied to each byte.
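The byte-copying trick can be checked on its own. This sketch uses uint64_t instead of unsigned long so that the word is always 8 bytes; splat_byte is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Copy an 8-bit value k into every byte of a 64-bit word.
   ~0 / 255 is 0x0101010101010101 (a 1 in every byte); multiplying
   by k <= 255 cannot carry from one byte into the next. */
static uint64_t splat_byte(unsigned k) {
    assert(k <= 255);
    return ~UINT64_C(0) / 255 * k;
}
```

For example, splat_byte(127) is 0x7F7F7F7F7F7F7F7F, the mask that clears the high bit of every byte when ANDed with x.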
For any given byte x[i], adding the two terms gives us 127-n+x[i] if x[i]<=127. This quantity has the high-order bit set whenever x[i]>n. It's easiest to see this as adding two 7-bit unsigned numbers: the result "overflows" into the 8th bit exactly when it is 128 or more. It never carries into the next byte, since the sum is at most 127 + 127 = 254.
So it looks like the algorithm is going to use the 8th bit of each byte as a boolean indicating x[i]>n.
So what about the other case, x[i]>127? Here we know the byte is more than n because the algorithm stipulates n<=127. The 8th bit ought to be always 1. Happily, the sum's 8th bit doesn't matter because the next step "or"s the result with x. Since x[i] has the 8th bit set to 1 if and only if it's 128 or larger, this operation "forces" the 8th bit to 1 just when the sum might provide a bad value.
To summarize so far, the "or" result has the 8th bit set to 1 in its i'th byte if and only if x[i]>n. Nice.
The next operation &~0UL/255*128 sets everything to zero except all those 8th bits of interest. It's "anding" with 0x80808080...
Now the task is to count how many of these bits are set to 1. For this, countmore uses some basic number theory. First it divides by 128, which is a right shift by 7 bits, so the bits of interest are b0, b8, b16... The value of this word is
b0 + b8*2^8 + b16*2^16 + ...
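A beautiful fact is that 1 == 2^8 == 2^16 == ... mod 255. In other words, each 1 bit contributes 1 mod 255. It follows that taking the shifted result mod 255 gives b0+b8+b16+... mod 255, and since that sum is at most the number of bytes in the word (8 for a 64-bit word), far below 255, the remainder is exactly the count.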
Yikes. We're done.
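To convince yourself, the macro can be compared against a straightforward byte-by-byte loop. This is a sketch: countmore is copied from the question, countmore_naive is a hypothetical reference helper, both assume 8-bit bytes, and, as the reasoning above requires, n must not exceed 127. The test values fit in 32 bits so the result does not depend on whether unsigned long is 4 or 8 bytes wide.

```c
#include <assert.h>
#include <stddef.h>

/* countmore as quoted from Bit Twiddling Hacks. */
#define countmore(x,n) \
(((((x)&~0UL/255*127)+~0UL/255*(127-(n))|(x))&~0UL/255*128)/128%255)

/* Reference version: inspect each byte of x one at a time. */
static unsigned long countmore_naive(unsigned long x, unsigned long n) {
    unsigned long count = 0;
    for (size_t i = 0; i < sizeof x; i++) {
        if (((x >> (8 * i)) & 0xFF) > n)
            count++;
    }
    return count;
}
```

For x = 0x10203040 and n = 0x25, only the bytes 0x30 and 0x40 exceed n, so both versions return 2.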
Let's analyse the countless macro. Assuming a 64-bit unsigned long, we can rewrite it in simplified form as the following code:
#define A(n) (0x0101010101010101UL * (0x7F + (n)))
#define B(x) ((x) & 0x7F7F7F7F7F7F7F7FUL)
#define C(x,n) (A(n) - B(x))
#define countless(x,n) ((C(x,n) & ~(x) & 0x8080808080808080UL) / 0x80 % 0xFF)
A(n) will be:
A(0) = 0x7F7F7F7F7F7F7F7F
A(1) = 0x8080808080808080
A(2) = 0x8181818181818181
A(3) = 0x8282828282828282
....
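And B(x) masks each byte of x with 0x7F.
If we write x = 0xb0b1b2b3b4b5b6b7, where each bi stands for one byte, and take n = 0, then C(x,n) equals 0x(0x7F-b0')(0x7F-b1')(0x7F-b2')..., where bi' = bi & 0x7F. Since 0x7F + n is never smaller than bi', no byte borrows from its neighbour.
For example, suppose x = 0x1234567811335577 and n = 0x50. Then: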
A(0x50) = 0xCFCFCFCFCFCFCFCF
B(0x1234567811335577) = 0x1234567811335577
C(0x1234567811335577, 0x50) = 0xBD9B7957BE9C7A58
~(0x1234567811335577) = 0xEDCBA987EECCAA88
0xEDCBA987EECCAA88 & 0x8080808080808080UL = 0x8080808080808080
C(0x1234567811335577, 0x50) & ~x & 0x8080808080808080 = 0x8080000080800000
(0x8080000080800000 / 0x80) % 0xFF = 4 // counts the bytes whose 0x80 flag is set: 0x12, 0x34, 0x11 and 0x33 are less than 0x50
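The same computation can be written as a function over uint64_t, so it does not depend on the width of unsigned long. This is a sketch; countless64 is a hypothetical name, and n is assumed to be between 0 and 128 so that 0x7F + n still fits in a byte:

```c
#include <assert.h>
#include <stdint.h>

/* Count the bytes of x that are less than n (0 <= n <= 128). */
static uint64_t countless64(uint64_t x, uint64_t n) {
    assert(n <= 128);
    const uint64_t ones = UINT64_C(0x0101010101010101);
    uint64_t a = ones * (0x7F + n);             /* A(n): 0x7F + n in every byte     */
    uint64_t b = x & (ones * 0x7F);             /* B(x): x with high bits cleared   */
    uint64_t hi = (a - b) & ~x & (ones * 0x80); /* 0x80 in each byte where x[i] < n */
    return hi / 0x80 % 0xFF;                    /* move flags to bit 0 and sum them */
}
```

With the worked example above, countless64(0x1234567811335577, 0x50) returns 4.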
I'm looking through the DOOM source code and I found this line:
void *
Z_Malloc
(int size,
int tag,
void *user) {
int extra;
memblock_t *start;
memblock_t *rover;
memblock_t *newblock;
memblock_t *base;
size = (size + 3) & ~3; // Why is it doing this?
...
I see sizeof used a lot to create byte offsets, but I've never seen this.
I understand the caller of this function wants some memory allocated, but I'm at a loss why it would manipulate the size like this.
What is it doing?
size = (size+3) & ~3 rounds the size up to the nearest multiple of 4.
It does this so that all blocks are a multiple of 4 bytes long and every block starts at an address that is a multiple of 4.
This is necessary so that the placement of ints and pointers inside the block can be aligned to fit into single memory words, which makes accessing them more efficient. Some processors actually require it.
To see how the rounding works, let's say that size = 4x-a, where 0 <= a <= 3. We have:
size+3 = 4x + (3-a), where 3-a is also between 0 and 3.
~3 is a bit mask that includes all bits except 2^0 and 2^1, so the & operation will leave just the multiple of 4:
(size+3)&~3 = 4x
If you run this, it will be obvious:
for(int i=0; i<30; i++)
printf("%d ", (i+3) & ~3);
Output:
0 4 4 4 4 8 8 8 8 12 12 12 12 16 16 16 16 20 20 20 20 24 24 24 24 28 28 28 28 32
It rounds up to the nearest multiple of 4.
Here's how it works. Performing x = x & ~3 sets the two least significant bits of x to zero. If we assume 8-bit numbers for simplicity, 3 is stored as 00000011, so ~3 is 11111100, and a bitwise AND with this number clears the last two bits. That by itself rounds down to the nearest multiple of four, because 4 in binary is 100. Adding 3 first turns it into rounding up instead.
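The same trick generalizes to any power-of-two alignment. A sketch, with round_up_pow2 as a hypothetical name; it assumes align is a power of two, since otherwise the mask is meaningless:

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to a multiple of align (a power of two).
   Same idea as (size + 3) & ~3: add align - 1, then clear the
   low bits with the mask ~(align - 1). */
static size_t round_up_pow2(size_t size, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size + align - 1) & ~(align - 1);
}
```

With align = 4 this reproduces the DOOM expression, e.g. round_up_pow2(29, 4) is 32.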
What's the most efficient way to calculate the amount of padding for 8-bit data that needs to be a multiple of 32-bit in C?
At the moment I do it like this:
pad = (4-size%4)%4;
As long as the optimizing compiler uses bitmasking for the % 4 instead of division, I think your code is probably pretty good. This might be a slight improvement:
// only the last 2 bits (hence & 3) matter
pad = (4 - (size & 3)) & 3;
But again, the optimizing compiler is probably smart enough to be reducing your code to this anyway. I can't think of anything better.
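One way to confirm that the two versions agree is to compare them over a range of sizes. This is a sketch: both helper names are hypothetical, and size is taken as unsigned so that % never sees a negative operand.

```c
#include <assert.h>

/* The questioner's version: two modulo operations. */
static unsigned pad_mod(unsigned size) {
    return (4 - size % 4) % 4;
}

/* Mask version: only the last 2 bits of size matter. */
static unsigned pad_mask4(unsigned size) {
    return (4 - (size & 3)) & 3;
}
```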
// align n bytes on size boundary
pad n size = (~n + 1) & (size - 1)
This is similar to TypeIA's solution and uses only simple machine operations.
(~n + 1) computes the two's-complement negation of n, the value that gives 0 when added to n.
& (size - 1) keeps only the low bits that matter; this requires size to be a power of two.
Examples:
pad 13 8 = 3
pad 11 4 = 1
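The pad n size notation above, transcribed into C as a sketch. It assumes size is a power of two and uses an unsigned type (size_t) so that ~n + 1 is well-defined modular negation:

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to bring n up to a multiple of size (a power of two).
   ~n + 1 is the two's-complement negation of n; masking with
   size - 1 keeps only the low bits, i.e. (-n) mod size. */
static size_t pad(size_t n, size_t size) {
    assert(size != 0 && (size & (size - 1)) == 0);
    return (~n + 1) & (size - 1);
}
```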
pad = (-size)&3;
This should be the fastest.
size 0: pad 0
size 1: pad 3
size 2: pad 2
size 3: pad 1
I was looking at the bit-reversal code below and wondering how one comes up with this kind of thing. (source: http://www.cl.cam.ac.uk/~am21/hakmemc.html)
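#include <stdint.h>

/* reverse 8 bits (Schroeppel); a must be in 0..255.
   unsigned41 in the original appears to denote an unsigned type at least
   41 bits wide; uint64_t is assumed here instead. */
unsigned reverse_8bits(uint64_t a) {
    return ((a * 0x000202020202) /* 5 copies in 40 bits */
            & 0x010884422010)    /* where bits coincide with reverse repeated base 2^10 */
                                 /* PDP-10: 041(6 bits):020420420020(35 bits) */
           % 1023;               /* casting out 2^10 - 1's */
}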
Can someone explain what does comment "where bits coincide with reverse repeated base 2^10" mean?
Also, how does "% 1023" pull out the relevant bits? Is there a general idea behind this?
It is a very broad question you are asking.
Here is an explanation of what % 1023 might be about: you know how computing n % 9 is like summing the digits of the base-10 representation of n? For instance, 52 % 9 = 7 = 5 + 2.
The code in your question is doing the same thing with 1023 = 1024 - 1 instead of 9 = 10 - 1. It is using the operation % 1023 to gather multiple results that have been computed “independently” as 10-bit slices of a large number.
And this is the beginning of a clue as to how the constants 0x000202020202 and 0x010884422010 are chosen: they make wide integer operations operate as independent simpler operations on 10-bit slices of a large number.
Expanding on Pascal Cuoq's idea, here is an explanation.
The general idea is that, in any base, when a number is divided by (base-1), the remainder equals the sum of its digits reduced modulo (base-1); if the digit sum is smaller than base-1, the remainder is exactly the digit sum.
For example, 34 when divided by 9 leaves 7 as remainder. This is because 34 can be written as 3 * 10 + 4
i.e. 34 = 3 * 10 + 4
= 3 * (9 +1) + 4
= 3 * 9 + (3 +4)
Now, 9 divides 3 * 9, leaving remainder (3 + 4). This process extends to any base b, since (b^n - 1) is always divisible by (b-1).
Now, coming to the problem: if a number is represented in base 1024 and divided by 1023, the remainder is the sum of its digits (as long as that sum is below 1023).
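The digit-sum property is easy to check numerically. This sketch (digit_sum_base1024 is a hypothetical helper) adds up the 10-bit groups of a 64-bit value and compares the result with % 1023 for the mask 0x010884422010 from the question:

```c
#include <assert.h>
#include <stdint.h>

/* Sum the base-1024 digits (10-bit groups) of x. */
static uint64_t digit_sum_base1024(uint64_t x) {
    uint64_t sum = 0;
    while (x != 0) {
        sum += x & 0x3FF; /* lowest 10-bit digit */
        x >>= 10;         /* drop that digit */
    }
    return sum;
}
```

For the mask, the digits are 1, 34, 68, 136 and 16, which add up to 255, and 0x010884422010 % 1023 is also 255.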
To convert a binary number to base 1024, we group its bits into 10-bit chunks, starting from the right.
For example, to convert the binary number 0x010884422010 (0b10000100010000100010000100010000000010000) to base 1024, we group it into 10-bit digits as follows:
(1) (0000100010) (0001000100) (0010001000) (0000010000) =
(0b0000000001)*1024^4 + (0b0000100010)*1024^3 + (0b0001000100)*1024^2 + (0b0010001000)*1024^1 + (0b0000010000)*1024^0
So, when this number is divided by 1023, the remainder will be the sum of
0b0000000001
+ 0b0000100010
+ 0b0001000100
+ 0b0010001000
+ 0b0000010000
--------------------
0b0011111111
If you look at the digits above closely, the '1' bits in each digit occupy complementary positions. So, when added together, they collect all 8 bits of the original number.
So, in the code above, "a * 0x000202020202" creates 5 copies of the byte "a", spaced 8 bits apart. When the result is ANDed with 0x010884422010, we select 8 bits from the 5 copies of "a". Because the copies are 8 bits apart while the base-1024 digits are 10 bits wide, each digit sees a differently shifted copy. When "% 1023" is applied, the selected bits are gathered together.
So, how does it actually reverse the bits? That is a bit clever. The '1' bit in the digit 0b0000000001 is aligned with the MSB of the original byte, so the AND picks the MSB of the original byte and places it in the LSB position of that digit. Similarly, the digit 0b0000100010 is aligned with the second and sixth bits from the MSB, and so on.
So, when you add all the digits of the magic number, the resulting number will be reverse of the original byte.
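Since there are only 256 possible inputs, the trick can be checked exhaustively. This sketch uses uint64_t in place of the unsigned41 type from the HAKMEM code (assuming it denotes an unsigned type of at least 41 bits); reverse_8bits_naive is a hypothetical reference loop:

```c
#include <assert.h>
#include <stdint.h>

/* Schroeppel's trick, with uint64_t standing in for a >= 41-bit type. */
static unsigned reverse_8bits(uint64_t a) {
    return (unsigned)(((a * UINT64_C(0x000202020202)) /* 5 copies, 8 bits apart */
                       & UINT64_C(0x010884422010))     /* keep one bit per slot */
                      % 1023);                         /* add up the 10-bit digits */
}

/* Straightforward loop for comparison: bit i of a moves to bit 7 - i. */
static unsigned reverse_8bits_naive(unsigned a) {
    unsigned r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 1) | ((a >> i) & 1);
    }
    return r;
}
```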
I was recently asked in an interview how to set the 513th bit of a char[1024] in C, but I'm unsure how to approach the problem. I saw How do you set, clear, and toggle a single bit?, but how do I choose the bit from such a large array?
int bitToSet = 513;
inArray[bitToSet / 8] |= (1 << (bitToSet % 8));
...making certain assumptions about character size and desired endianness.
EDIT: Okay, fine. You can replace 8 with CHAR_BIT if you want.
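A sketch of the usual set, clear, toggle and test operations built on this indexing, using CHAR_BIT and numbering bits from the least significant end of each byte, as 1 << bit does. The helper names are hypothetical, and the array is assumed to be unsigned char rather than char to avoid sign issues when shifting:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bit k lives in byte k / CHAR_BIT, at position k % CHAR_BIT
   counted from the least significant bit. */
static void set_bit(unsigned char *a, size_t k) {
    a[k / CHAR_BIT] |= (unsigned char)(1u << (k % CHAR_BIT));
}

static void clear_bit(unsigned char *a, size_t k) {
    a[k / CHAR_BIT] &= (unsigned char)~(1u << (k % CHAR_BIT));
}

static void toggle_bit(unsigned char *a, size_t k) {
    a[k / CHAR_BIT] ^= (unsigned char)(1u << (k % CHAR_BIT));
}

static int test_bit(const unsigned char *a, size_t k) {
    return (a[k / CHAR_BIT] >> (k % CHAR_BIT)) & 1;
}
```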
#include <limits.h>
int charContaining513thBit = 513 / CHAR_BIT;
int offsetOf513thBitInChar = 513 - charContaining513thBit*CHAR_BIT;
int bit513 = array[charContaining513thBit] >> offsetOf513thBitInChar & 1;
You have to know the width of characters (in bits) on your machine. For pretty much everyone, that's 8. You can use the constant CHAR_BIT from limits.h in a C program. You can then do some fairly simple math to find the offset of the bit (depending on how you count them).
Numbering bits from the left, with the 2⁷ bit in a[0] being bit 0, the 2⁰ bit being bit 7, and the 2⁷ bit in a[1] being bit 8, this gives:
offset = 513 / CHAR_BIT; /* using integer (truncating) math, of course */
bit = 513 % CHAR_BIT;
a[offset] |= (0x80>>bit);
There are many sane ways to number bits; here are two:
          a[0]           |          a[1]
  0  1  2  3  4  5  6  7 |  8  9 10 11 12 13 14 15    the numbering used above (0x80>>bit)
  7  6  5  4  3  2  1  0 | 15 14 13 12 11 10  9  8    the numbering used by |= (1<<bit)
You could also number from the other end of the array (treating it as one very large big-endian number).
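To make the two numberings concrete, here is a sketch that sets "bit 513" under each convention. It assumes 8-bit bytes, and both function names are hypothetical:

```c
#include <assert.h>

/* Bit 0 is the 0x80 bit of a[0] (MSB-first, as with 0x80 >> bit). */
static void set_bit_msb_first(unsigned char *a, unsigned k) {
    a[k / 8] |= (unsigned char)(0x80u >> (k % 8));
}

/* Bit 0 is the 0x01 bit of a[0] (LSB-first, as with 1 << bit). */
static void set_bit_lsb_first(unsigned char *a, unsigned k) {
    a[k / 8] |= (unsigned char)(1u << (k % 8));
}
```

Both land in a[64], but the MSB-first version sets 0x40 while the LSB-first version sets 0x02.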
Small optimization:
The / and % operators are rather slow, even on a lot of modern CPUs, with modulus being slightly slower. I would replace them with the equivalent operations using bit shifting (and subtraction), which obviously only works nicely when the second operand is a power of two.
x / 8 becomes x >> 3
x % 8 becomes x-((x>>3)<<3)
For the second operation, just reuse the result from the initial division.
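A sketch of that substitution, assuming an unsigned index (for negative signed values, / and >> round differently). split_index is a hypothetical name. Note that optimizing compilers usually perform this rewrite themselves when the divisor is a power-of-two constant and the operand is unsigned.

```c
#include <assert.h>

/* For unsigned x, division and modulo by 8 reduce to a shift and a subtraction. */
static void split_index(unsigned x, unsigned *byte, unsigned *bit) {
    *byte = x >> 3;           /* x / 8 */
    *bit = x - (*byte << 3);  /* x % 8, reusing the quotient; same as x & 7 */
}
```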
Depending on the desired order (left to right versus right to left), the details might change, but the general idea, assuming 8 bits per byte, is to choose the byte as follows. I've expanded this into more lines than necessary to show the intended steps more clearly (or perhaps it just obscures the intention):
int bitNum = 513;
int bytePos = bitNum / 8;
Then the bit position would be computed as:
int bitInByte = bitNum % 8;
Then set the bit (assuming the goal is to set it to 1 as opposed to clear or toggle it):
charArray[bytePos] |= ( 1 << bitInByte );
When you say 513th, are you using index 0 or 1 for the 1st bit? If it's the former, your post refers to the bit at index 512. I think the question is valid, since everywhere else in C the first index is always 0.
BTW
static char chr[1024];
...
chr[512>>3] |= 1<<(512&0x7);