Extracting bits with a single multiplication - c

I saw an interesting technique used in an answer to another question, and would like to understand it a little better.
We're given an unsigned 64-bit integer, and we are interested in the following bits:
1.......2.......3.......4.......5.......6.......7.......8.......
Specifically, we'd like to move them to the top eight positions, like so:
12345678........................................................
We don't care about the value of the bits indicated by ., and they don't have to be preserved.
The solution was to mask out the unwanted bits, and multiply the result by 0x2040810204081. This, as it turns out, does the trick.
How general is this method? Can this technique be used to extract any subset of bits? If not, how does one figure out whether or not the method works for a particular set of bits?
Finally, how would one go about finding the (a?) correct multiplier to extract the given bits?

Very interesting question, and clever trick.
Let's look at a simple example of getting a single byte manipulated. Using unsigned 8 bit for simplicity. Imagine your number is xxaxxbxx and you want ab000000.
The solution consisted of two steps: a bit masking, followed by multiplication. The bit mask is a simple AND operation that turns uninteresting bits to zeros. In the above case, your mask would be 00100100 and the result 00a00b00.
Now the hard part: turning that into ab.......
A multiplication is a bunch of shift-and-add operations. The key is to allow overflow to "shift away" the bits we don't need and put the ones we want in the right place.
Multiplication by 4 (00000100) would shift everything left by 2 and get you to a00b0000 . To get the b to move up we need to multiply by 1 (to keep the a in the right place) + 4 (to move the b up). This sum is 5, and combined with the earlier 4 we get a magic number of 20, or 00010100. The original was 00a00b00 after masking; the multiplication gives:
000000a00b000000
00000000a00b0000 +
----------------
000000a0ab0b0000
xxxxxxxxab......
From this approach you can extend to larger numbers and more bits.
One of the questions you asked was "can this be done with any number of bits?" I think the answer is "no", unless you allow several masking operations, or several multiplications. The problem is the issue of "collisions" - for example, the "stray b" in the problem above. Imagine we need to do this to a number like xaxxbxxcx. Following the earlier approach, you would think we need {x 2, x {1 + 4 + 16}} = x 42 (oooh - the answer to everything!). Result:
00000000a00b00c00
000000a00b00c0000
0000a00b00c000000
-----------------
0000a0ababcbc0c00
xxxxxxxxabc......
As you can see, it still works, but "only just". They key here is that there is "enough space" between the bits we want that we can squeeze everything up. I could not add a fourth bit d right after c, because I would get instances where I get c+d, bits might carry, ...
So without formal proof, I would answer the more interesting parts of your question as follows: "No, this will not work for any number of bits. To extract N bits, you need (N-1) spaces between the bits you want to extract, or have additional mask-multiply steps."
The only exception I can think of for the "must have (N-1) zeros between bits" rule is this: if you want to extract two bits that are adjacent to each other in the original, AND you want to keep them in the same order, then you can still do it. And for the purpose of the (N-1) rule they count as two bits.
There is another insight - inspired by the answer of #Ternary below (see my comment there). For each interesting bit, you only need as many zeros to the right of it as you need space for bits that need to go there. But also, it needs as many bits to the left as it has result-bits to the left. So if a bit b ends up in position m of n, then it needs to have m-1 zeros to its left, and n-m zeros to its right. Especially when the bits are not in the same order in the original number as they will be after the re-ordering, this is an important improvement to the original criteria. This means, for example, that a 16 bit word
a...e.b...d..c..
Can be shifted into
abcde...........
even though there is only one space between e and b, two between d and c, three between the others. Whatever happened to N-1?? In this case, a...e becomes "one block" - they are multiplied by 1 to end up in the right place, and so "we got e for free". The same is true for b and d (b needs three spaces to the right, d needs the same three to its left). So when we compute the magic number, we find there are duplicates:
a: << 0 ( x 1 )
b: << 5 ( x 32 )
c: << 11 ( x 2048 )
d: << 5 ( x 32 ) !! duplicate
e: << 0 ( x 1 ) !! duplicate
Clearly, if you wanted these numbers in a different order, you would have to space them further. We can reformulate the (N-1) rule: "It will always work if there are at least (N-1) spaces between bits; or, if the order of bits in the final result is known, then if a bit b ends up in position m of n, it needs to have m-1 zeros to its left, and n-m zeros to its right."
#Ternary pointed out that this rule doesn't quite work, as there can be a carry from bits adding "just to the right of the target area" - namely, when the bits we're looking for are all ones. Continuing the example I gave above with the five tightly packed bits in a 16 bit word: if we start with
a...e.b...d..c..
For simplicity, I will name the bit positions ABCDEFGHIJKLMNOP
The math we were going to do was
ABCDEFGHIJKLMNOP
a000e0b000d00c00
0b000d00c0000000
000d00c000000000
00c0000000000000 +
----------------
abcded(b+c)0c0d00c00
Until now, we thought anything below abcde (positions ABCDE) would not matter, but in fact, as #Ternary pointed out, if b=1, c=1, d=1 then (b+c) in position G will cause a bit to carry to position F, which means that (d+1) in position F will carry a bit into E - and our result is spoilt. Note that space to the right of the least significant bit of interest (c in this example) doesn't matter, since the multiplication will cause padding with zeros from beyone the least significant bit.
So we need to modify our (m-1)/(n-m) rule. If there is more than one bit that has "exactly (n-m) unused bits to the right (not counting the last bit in the pattern - "c" in the example above), then we need to strengthen the rule - and we have to do so iteratively!
We have to look not only at the number of bits that meet the (n-m) criterion, but also the ones that are at (n-m+1), etc. Let's call their number Q0 (exactly n-m to next bit), Q1 (n-m+1), up to Q(N-1) (n-1). Then we risk carry if
Q0 > 1
Q0 == 1 && Q1 >= 2
Q0 == 0 && Q1 >= 4
Q0 == 1 && Q1 > 1 && Q2 >=2
...
If you look at this, you can see that if you write a simple mathematical expression
W = N * Q0 + (N - 1) * Q1 + ... + Q(N-1)
and the result is W > 2 * N, then you need to increase the RHS criterion by one bit to (n-m+1). At this point, the operation is safe as long as W < 4; if that doesn't work, increase the criterion one more, etc.
I think that following the above will get you a long way to your answer...

Very interesting question indeed. I'm chiming in with my two cents, which is that, if you can manage to state problems like this in terms of first-order logic over the bitvector theory, then theorem provers are your friend, and can potentially provide you with very fast answers to your questions. Let's re-state the problem being asked as a theorem:
"There exists some 64-bit constants 'mask' and 'multiplicand' such that, for all 64-bit bitvectors x, in the expression y = (x & mask) * multiplicand, we have that y.63 == x.63, y.62 == x.55, y.61 == x.47, etc."
If this sentence is in fact a theorem, then it is true that some values of the constants 'mask' and 'multiplicand' satisfy this property. So let's phrase this in terms of something that a theorem prover can understand, namely SMT-LIB 2 input:
(set-logic BV)
(declare-const mask (_ BitVec 64))
(declare-const multiplicand (_ BitVec 64))
(assert
(forall ((x (_ BitVec 64)))
(let ((y (bvmul (bvand mask x) multiplicand)))
(and
(= ((_ extract 63 63) x) ((_ extract 63 63) y))
(= ((_ extract 55 55) x) ((_ extract 62 62) y))
(= ((_ extract 47 47) x) ((_ extract 61 61) y))
(= ((_ extract 39 39) x) ((_ extract 60 60) y))
(= ((_ extract 31 31) x) ((_ extract 59 59) y))
(= ((_ extract 23 23) x) ((_ extract 58 58) y))
(= ((_ extract 15 15) x) ((_ extract 57 57) y))
(= ((_ extract 7 7) x) ((_ extract 56 56) y))
)
)
)
)
(check-sat)
(get-model)
And now let's ask the theorem prover Z3 whether this is a theorem:
z3.exe /m /smt2 ExtractBitsThroughAndWithMultiplication.smt2
The result is:
sat
(model
(define-fun mask () (_ BitVec 64)
#x8080808080808080)
(define-fun multiplicand () (_ BitVec 64)
#x0002040810204081)
)
Bingo! It reproduces the result given in the original post in 0.06 seconds.
Looking at this from a more general perspective, we can view this as being an instance of a first-order program synthesis problem, which is a nascent area of research about which few papers have been published. A search for "program synthesis" filetype:pdf should get you started.

Every 1-bit in the multiplier is used to copy one of the bits into its correct position:
1 is already in the correct position, so multiply by 0x0000000000000001.
2 must be shifted 7 bit positions to the left, so we multiply by 0x0000000000000080 (bit 7 is set).
3 must be shifted 14 bit positions to the left, so we multiply by 0x0000000000000400 (bit 14 is set).
and so on until
8 must be shifted 49 bit positions to the left, so we multiply by 0x0002000000000000 (bit 49 is set).
The multiplier is the sum of the multipliers for the individual bits.
This only works because the bits to be collected are not too close together, so that the multiplication of bits which do not belong together in our scheme either fall beyond the 64 bit or in the lower don't-care part.
Note that the other bits in the original number must be 0. This can be achieved by masking them with an AND operation.

(I'd never seen it before. This trick is great!)
I'll expand a bit on Floris's assertion that when extracting n bits you need n-1 space between any non-consecutive bits:
My initial thought (we'll see in a minute how this doesn't quite work) was that you could do better: If you want to extract n bits, you'll have a collision when extracting/shifting bit i if you have anyone (non-consecutive with bit i) in the i-1 bits preceding or n-i bits subsequent.
I'll give a few examples to illustrate:
...a..b...c... Works (nobody in the 2 bits after a, the bit before and the bit after b, and nobody is in the 2 bits before c):
a00b000c
+ 0b000c00
+ 00c00000
= abc.....
...a.b....c... Fails because b is in the 2 bits after a (and gets pulled into someone else's spot when we shift a):
a0b0000c
+ 0b0000c0
+ 00c00000
= abX.....
...a...b.c... Fails because b is in the 2 bits preceding c (and gets pushed into someone else's spot when we shift c):
a000b0c0
+ 0b0c0000
+ b0c00000
= Xbc.....
...a...bc...d... Works because consecutive bits shift together:
a000bc000d
+ 0bc000d000
+ 000d000000
= abcd000000
But we have a problem. If we use n-i instead of n-1 we could have the following scenario: what if we have a collision outside of the part that we care about, something we would mask away at the end, but whose carry bits end up interfering in the important un-masked range? (and note: the n-1 requirement makes sure this doesn't happen by making sure the i-1 bits after our un-masked range are clear when we shift the the ith bit)
...a...b..c...d... Potential failure on carry-bits, c is in n-1 after b, but satisfies n-i criteria:
a000b00c000d
+ 0b00c000d000
+ 00c000d00000
+ 000d00000000
= abcdX.......
So why don't we just go back to that "n-1 bits of space" requirement?
Because we can do better:
...a....b..c...d.. Fails the "n-1 bits of space" test, but works for our bit-extracting trick:
+ a0000b00c000d00
+ 0b00c000d000000
+ 00c000d00000000
+ 000d00000000000
= abcd...0X......
I can't come up with a good way to characterize these fields that don't have n-1 space between important bits, but still would work for our operation. However, since we know ahead of time which bits we're interested in we can check our filter to make sure we don't experience carry-bit collisions:
Compare (-1 AND mask) * shift against the expected all-ones result, -1 << (64-n) (for 64-bit unsigned)
The magic shift/multiply to extract our bits works if and only if the two are equal.

In addition to the already excellent answers to this very interesting question, it might be useful to know that this bitwise multiplication trick has been known in the computer chess community since 2007, where it goes under the name of Magic BitBoards.
Many computer chess engines use several 64-bit integers (called bitboards) to represent the various piece sets (1 bit per occupied square). Suppose a sliding piece (rook, bishop, queen) on a certain origin square can move to at most K squares if no blocking pieces were present. Using bitwise-and of those scattered K bits with the bitboard of occupied squares gives a specific K-bit word embedded within a 64-bit integer.
Magic multiplication can be used to map these scattered K bits to the lower K bits of a 64-bit integer. These lower K bits can then be used to index a table of pre-computed bitboards that representst the allowed squares that the piece on its origin square can actually move to (taking care of blocking pieces etc.)
A typical chess engine using this approach has 2 tables (one for rooks, one for bishops, queens using the combination of both) of 64 entries (one per origin square) that contain such pre-computed results. Both the highest rated closed source (Houdini) and open source chess engine (Stockfish) currently use this approach for its very high performance.
Finding these magic multipliers is done either using an exhaustive search (optimized with early cutoffs) or with trial and erorr (e.g. trying lots of random 64-bit integers). There have been no bit patterns used during move generation for which no magic constant could be found. However, bitwise carry effects are typically necessary when the to-be-mapped bits have (almost) adjacent indices.
AFAIK, the very general SAT-solver approachy by #Syzygy has not been used in computer chess, and neither does there appear to be any formal theory regarding existence and uniqueness of such magic constants.

Related

Is this how the + operator is implemented in C?

When understanding how primitive operators such as +, -, * and / are implemented in C, I found the following snippet from an interesting answer.
// replaces the + operator
int add(int x, int y) {
while(x) {
int t = (x & y) <<1;
y ^= x;
x = t;
}
return y;
}
It seems that this function demonstrates how + actually works in the background. However, it's too confusing for me to understand it. I believed that such operations are done using assembly directives generated by the compiler for a long time!
Is the + operator implemented as the code posted on MOST implementations? Does this take advantage of two's complement or other implementation-dependent features?
To be pedantic, the C specification does not specify how addition is implemented.
But to be realistic, the + operator on integer types smaller than or equal to the word size of your CPU get translated directly into an addition instruction for the CPU, and larger integer types get translated into multiple addition instructions with some extra bits to handle overflow.
The CPU internally uses logic circuits to implement the addition, and does not use loops, bitshifts, or anything that has a close resemblance to how C works.
When you add two bits, following is the result: (truth table)
a | b | sum (a^b) | carry bit (a&b) (goes to next)
--+---+-----------+--------------------------------
0 | 0 | 0 | 0
0 | 1 | 1 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 1
So if you do bitwise xor, you can get the sum without carry.
And if you do bitwise and you can get the carry bits.
Extending this observation for multibit numbers a and b
a+b = sum_without_carry(a, b) + carry_bits(a, b) shifted by 1 bit left
= a^b + ((a&b) << 1)
Once b is 0:
a+0 = a
So algorithm boils down to:
Add(a, b)
if b == 0
return a;
else
carry_bits = a & b;
sum_bits = a ^ b;
return Add(sum_bits, carry_bits << 1);
If you get rid of recursion and convert it to a loop
Add(a, b)
while(b != 0) {
carry_bits = a & b;
sum_bits = a ^ b;
a = sum_bits;
b = carrry_bits << 1; // In next loop, add carry bits to a
}
return a;
With above algorithm in mind explanation from code should be simpler:
int t = (x & y) << 1;
Carry bits. Carry bit is 1 if 1 bit to the right in both operands is 1.
y ^= x; // x is used now
Addition without carry (Carry bits ignored)
x = t;
Reuse x to set it to carry
while(x)
Repeat while there are more carry bits
A recursive implementation (easier to understand) would be:
int add(int x, int y) {
return (y == 0) ? x : add(x ^ y, (x&y) << 1);
}
Seems that this function demonstrates how + actually works in the
background
No. Usually (almost always) integer addition translates to machine instruction add. This just demonstrate an alternate implementation using bitwise xor and and.
Seems that this function demonstrates how + actually works in the background
No. This is translated to the native add machine instruction, which is actually using the hardware adder, in the ALU.
If you're wondering how does the computer add, here is a basic adder.
Everything in the computer is done using logic gates, which are mostly made of transistors. The full adder has half-adders in it.
For a basic tutorial on logic gates, and adders, see this. The video is extremely helpful, though long.
In that video, a basic half-adder is shown. If you want a brief description, this is it:
The half adder add's two bits given. The possible combinations are:
Add 0 and 0 = 0
Add 1 and 0 = 1
Add 1 and 1 = 10 (binary)
So now how does the half adder work? Well, it is made up of three logic gates, the and, xor and the nand. The nand gives a positive current if both the inputs are negative, so that means this solves the case of 0 and 0. The xor gives a positive output one of the input is positive, and the other negative, so that means that it solves the problem of 1 and 0. The and gives a positive output only if both the inputs are positive, so that solves the problem of 1 and 1. So basically, we have now got our half-adder. But we still can only add bits.
Now we make our full-adder. A full adder consists of calling the half-adder again and again. Now this has a carry. When we add 1 and 1, we get a carry 1. So what the full-adder does is, it takes the carry from the half-adder, stores it, and passes it as another argument to the half-adder.
If you're confused how can you pass the carry, you basically first add the bits using the half-adder, and then add the sum and the carry. So now you've added the carry, with the two bits. So you do this again and again, till the bits you have to add are over, and then you get your result.
Surprised? This is how it actually happens. It looks like a long process, but the computer does it in fractions of a nanosecond, or to be more specific, in half a clock cycle. Sometimes it is performed even in a single clock cycle. Basically, the computer has the ALU (a major part of the CPU), memory, buses, etc..
If you want to learn computer hardware, from logic gates, memory and the ALU, and simulate a computer, you can see this course, from which I learnt all this: Build a Modern Computer from First Principles
It's free if you do not want an e-certificate. The part two of the course is coming up in spring this year
C uses an abstract machine to describe what C code does. So how it works is not specified. There are C "compilers" that actually compile C into a scripting language, for example.
But, in most C implementations, + between two integers smaller than the machine integer size will be translated into an assembly instruction (after many steps). The assembly instruction will be translated into machine code and embedded within your executable. Assembly is a language "one step removed" from machine code, intended to be easier to read than a bunch of packed binary.
That machine code (after many steps) is then interpreted by the target hardware platform, where it is interpreted by the instruction decoder on the CPU. This instruction decoder takes the instruction, and translates it into signals to send along "control lines". These signals route data from registers and memory through the CPU, where the values are added together often in an arithmetic logic unit.
The arithmetic logic unit might have separate adders and multipliers, or might mix them together.
The arithmetic logic unit has a bunch of transistors that perform the addition operation, then produce the output. Said output is routed via the signals generated from the instruction decoder, and stored in memory or registers.
The layout of said transistors in both the arithmetic logic unit and instruction decoder (as well as parts I have glossed over) is etched into the chip at the plant. The etching pattern is often produced by compiling a hardware description language, which takes an abstraction of what is connected to what and how they operate and generates transistors and interconnect lines.
The hardware description language can contain shifts and loops that don't describe things happening in time (like one after another) but rather in space -- it describes the connections between different parts of hardware. Said code may look very vaguely like the code you posted above.
The above glosses over many parts and layers and contains inaccuracies. This is both from my own incompetence (I have written both hardware and compilers, but am an expert in neither) and because full details would take a career or two, and not a SO post.
Here is a SO post about an 8-bit adder. Here is a non-SO post, where you'll note some of the adders just use operator+ in the HDL! (The HDL itself understands + and generates the lower level adder code for you).
Almost any modern processor that can run compiled C code will have builtin support for integer addition. The code you posted is a clever way to perform integer addition without executing an integer add opcode, but it is not how integer addition is normally performed. In fact, the function linkage probably uses some form of integer addition to adjust the stack pointer.
The code you posted relies on the observation that when adding x and y, you can decompose it into the bits they have in common and the bits that are unique to one of x or y.
The expression x & y (bitwise AND) gives the bits common to x and y. The expression x ^ y (bitwise exclusive OR) gives the bits that are unique to one of x or y.
The sum x + y can be rewritten as the sum of two times the bits they have in common (since both x and y contribute those bits) plus the bits that are unique to x or y.
(x & y) << 1 is twice the bits they have in common (the left shift by 1 effectively multiplies by two).
x ^ y is the bits that are unique to one of x or y.
So if we replace x by the first value and y by the second, the sum should be unchanged. You can think of the first value as the carries of the bitwise additions, and the second as the low-order bit of the bitwise additions.
This process continues until x is zero, at which point y holds the sum.
The code that you found tries to explain how very primitive computer hardware might implement an "add" instruction. I say "might" because I can guarantee that this method isn't used by any CPU, and I'll explain why.
In normal life, you use decimal numbers and you have learned how to add them: To add two numbers, you add the lowest two digits. If the result is less than 10, you write down the result and proceed to the next digit position. If the result is 10 or more, you write down the result minus 10, proceed to the next digit, buy you remember to add 1 more. For example: 23 + 37, you add 3+7 = 10, you write down 0 and remember to add 1 more for the next position. At the 10s position, you add (2+3) + 1 = 6 and write that down. Result is 60.
You can do the exact same thing with binary numbers. The difference is that the only digits are 0 and 1, so the only possible sums are 0, 1, 2. For a 32 bit number, you would handle one digit position after the other. And that is how really primitive computer hardware would do it.
This code works differently. You know the sum of two binary digits is 2 if both digits are 1. So if both digits are 1 then you would add 1 more at the next binary position and write down 0. That's what the calculation of t does: It finds all places where both binary digits are 1 (that's the &) and moves them to the next digit position (<< 1). Then it does the addition: 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 is 2, but we write down 0. That's what the excludive or operator does.
But all the 1's that you had to handle in the next digit position haven't been handled. They still need to be added. That's why the code does a loop: In the next iteration, all the extra 1's are added.
Why does no processor do it that way? Because it's a loop, and processors don't like loops, and it is slow. It's slow, because in the worst case, 32 iterations are needed: If you add 1 to the number 0xffffffff (32 1-bits), then the first iteration clears bit 0 of y and sets x to 2. The second iteration clears bit 1 of y and sets x to 4. And so on. It takes 32 iterations to get the result. However, each iteration has to process all bits of x and y, which takes a lot of hardware.
A primitive processor would do things just as quick in the way you do decimal arithmetic, from the lowest position to the highest. It also takes 32 steps, but each step processes only two bits plus one value from the previous bit position, so it is much easier to implement. And even in a primitive computer, one can afford to do this without having to implement loops.
A modern, fast and complex CPU will use a "conditional sum adder". Especially if the number of bits is high, for example a 64 bit adder, it saves a lot of time.
A 64 bit adder consists of two parts: First, a 32 bit adder for the lowest 32 bit. That 32 bit adder produces a sum, and a "carry" (an indicator that a 1 must be added to the next bit position). Second, two 32 bit adders for the higher 32 bits: One adds x + y, the other adds x + y + 1. All three adders work in parallel. Then when the first adder has produced its carry, the CPU just picks which one of the two results x + y or x + y + 1 is the correct one, and you have the complete result. So a 64 bit adder only takes a tiny bit longer than a 32 bit adder, not twice as long.
The 32 bit adder parts are again implemented as conditional sum adders, using multiple 16 bit adders, and the 16 bit adders are conditional sum adders, and so on.
My question is: Is the + operator implemented as the code posted on MOST implementations?
Let's answer the actual question. All operators are implemented by the compiler as some internal data structure that eventually gets translated into code after some transformations. You can't say what code will be generated by a single addition because almost no real world compiler generates code for individual statements.
The compiler is free to generate any code as long as it behaves as if the actual operations were performed according to the standard. But what actually happens can be something completely different.
A simple example:
static int
foo(int a, int b)
{
return a + b;
}
[...]
int a = foo(1, 17);
int b = foo(x, x);
some_other_function(a, b);
There's no need to generate any addition instructions here. It's perfectly legal for the compiler to translate this into:
some_other_function(18, x * 2);
Or maybe the compiler notices that you call the function foo a few times in a row and that it is a simple arithmetic and it will generate vector instructions for it. Or that the result of the addition is used for array indexing later and the lea instruction will be used.
You simply can't talk about how an operator is implemented because it is almost never used alone.
In case a breakdown of the code helps anyone else, take the example x=2, y=6:
x isn't zero, so commence adding to y:
while(2) {
x & y = 2 because
x: 0 0 1 0 //2
y: 0 1 1 0 //6
x&y: 0 0 1 0 //2
2 <<1 = 4 because << 1 shifts all bits to the left:
x&y: 0 0 1 0 //2
(x&y) <<1: 0 1 0 0 //4
In summary, stash that result, 4, in t with
int t = (x & y) <<1;
Now apply the bitwise XOR y^=x:
x: 0 0 1 0 //2
y: 0 1 1 0 //6
y^=x: 0 1 0 0 //4
So x=2, y=4. Finally, sum t+y by resetting x=t and going back to the beginning of the while loop:
x = t;
When t=0 (or, at the beginning of the loop, x=0), finish with
return y;
Just out of interest, on the Atmega328P processor, with the avr-g++ compiler, the following code implements adding one by subtracting -1 :
volatile char x;
int main ()
{
x = x + 1;
}
Generated code:
00000090 <main>:
volatile char x;
int main ()
{
x = x + 1;
90: 80 91 00 01 lds r24, 0x0100
94: 8f 5f subi r24, 0xFF ; 255
96: 80 93 00 01 sts 0x0100, r24
}
9a: 80 e0 ldi r24, 0x00 ; 0
9c: 90 e0 ldi r25, 0x00 ; 0
9e: 08 95 ret
Notice in particular that the add is done by the subi instruction (subtract constant from register) where 0xFF is effectively -1 in this case.
Also of interest is that this particular processor does not have a addi instruction, which implies that the designers thought that doing a subtract of the complement would be adequately handled by the compiler-writers.
Does this take advantage of two's complement or other implementation-dependent features?
It would probably be fair to say that compiler-writers would attempt to implement the wanted effect (adding one number to another) in the most efficient way possible for that particularly architecture. If that requires subtracting the complement, so be it.

How do you use bitwise operators, masks, to find if a number is a multiple of another number?

So I have been told that this can be done and that bitwise operations and masks can be very useful but I must be missing something in how they work.
I am trying to calculate whether a number, say x, is a multiple of y. If x is a multiple of y great end of story, otherwise I want to increase x to reach the closest multiple of y that is greater than x (so that all of x fits in the result). I have just started learning C and am having difficulty understanding some of these tasks.
Here is what I have tried but when I input numbers such as 5, 9, or 24 I get the following respectively: 0, 4, 4.
if(x&(y-1)){ //if not 0 then multiple of y
x = x&~(y-1) + y;
}
Any explanations, examples of the math that is occurring behind the scenes, are greatly appreciated.
EDIT: So to clarify, I somewhat understand the shifting of bits to get whether an item is a multiple. (As was explained in a reply 10100 is a multiple of 101 as it is just shifted over). If I have the number 16, which is 10000, its complement is 01111. How would I use this complement to see if an item is a multiple of 16? Also can someone give a numerical explanation of the code given above? Showing this may help me understand why it does not work. Once I understand why it does not work I will be able to problem solve on my own I believe.
Why would you even think about using bit-wise operations for this? They certainly have their place but this isn't it.
A better method is to simply use something like:
unsigned multGreaterOrEqual(unsigned x, unsigned y) {
if ((x % y) == 0)
return x;
return (x / y + 1) * y;
}
In the trivial cases, every number that is an even multiple of a power of 2 is just shifted to the left (this doesn't apply when possibly altering the sign bit)
For example
10100
is 4 times
101
and
10100
is 2 time
1010
As for other multiples, they would have to be found by combining the outputs of two shifts. You might want to look up some primitive means of computer division, where division looks roughly like
x = a / b
implemented like
buffer = a
while a is bigger than b; do
yes: subtract a from b
add 1 to x
done
faster routines try to figure out higher level place values first, skipping lots of subtractions. All of these routine can be done bitwise; but it is a big pain. In the ALU these routines are done bitwise. Might want to look up a digital logic design book for more ideas.
Ok, so I have discovered what the error was in my code and since the majority say that it is impossible to calculate whether a number is a multiple of another number using masks I figured I would share what I have learned.
It is possible! - if you are using the correct data types that is.
The code given above works if y is declared as a constant unsigned long as x which was being passed in was also an unsigned long. The key point is not the long or constant part but that the number is unsigned. This sign bit causes miscalculation as the first place in the number indicates sign and when performing bitwise operations signs can get muddled.
So here is my code if we are looking for multiples of 16:
const unsigned long y = 16; //declared globally in my case
Then an unsigned long is passed to the function which runs the following code:
if(x&(y-1)){ //if not 0 then multiple of y
x = x&~(y-1) + y;
}
x will now be the size of the nearest multiple of 16.

How to use bit manipulation to find 5th bit, and to return number of 1 bits in an integer

Assume Z is an unsigned integer. Using ~, <<, >>, &, | , +, and - provide statements which return the desired result.
I am allowed to introduce new binary values if needed.
I have these problems:
1.Extract the 5th bit from the left Z.
For this I was thinking about doing something like
x x x x x x x x
& 0 0 0 0 1 0 0 0
___________________
0 0 0 0 1 0 0 0
Does this make sense for extracting the fifth bit? I am not totally sure how I would make this work by using just Z when I do not know its values. (I am relatively new to all of this). Would this type of idea work though?
2.Return the number of 1 bits in Z
Here I kind of have no idea how to work this out. What I really need to know is how to work on just Z with the operators, but I m not sure exactly how to.
Like I said I am new to this, so any help is appreciated.
Problem 1
You’re right on the money. I’d do an & and a >> so that you get either a nice 0 or 1.
result = (z & 0x08) >> 3;
However, this may not be strictly necessary. For example, if you’re trying to check whether the bit is set as part of an if conditional, you can exploit C’s definition of anything nonzero as true.
if (z & 0x08)
do_stuff();
Problem 2
There are a whole variety of ways to do this. According to that page, the following methodology dates from 1960, though it wasn’t published in C until 1988.
for (result = 0; z; result++)
z &= z - 1;
Exactly why this works might not be obvious at first, but if you work through a few examples, you’ll quickly see why it does.
It’s worth noting that this operation – determining the number of 1 bits in a number – is sufficiently important to have a name (population count or Hamming weight) and, on recent Intel and AMD processors, a dedicated instruction. If you’re using GCC, you can use the __builtin_popcount intrinsic.
Problem 1 looks right, except you should finish it by shifting the result right by 4 to get that bit after the mask.
To implement the mask, you need to know what integer is represented by a single 5th bit. That number is incidentally 2^5 = 32. So you can just AND z with 32 and shift it right by 4.
Problem 2:
int answer = 0;
while (z != 0){ //stop when there are no more 1 bits in z
//the following masks the lowest bit in z and adds it into answer
//if z ends with a 0, nothing is added, otherwise 1 is added
answer += (z & 1);
//this shifts z right by 1 to get the next higher bit
z >>= 1;
}
return answer;
To find out the value of the fifth bit, you don't care about the bottom bits so you can get rid of them:
unsigned int answer = z >> 4;
The fifth bit becomes the bottom bit, so you can strip it off with a bitwise-AND:
answer = answer & 1;
To find the number of 1-bits in a number you can apply stakSmashr's solution. You could optimise this further if you know you need to count the number of bits in a lot of integers - precompute the number of bits in every possible 8-bit number and store it in a table. There will only be 256 entries in the table so it won't use much memory. Then, you can loop over your data one byte at a time and find the answer from the table. This lookup will be quicker than looping again over each bit.

Quick integer logarithm for special case

I have integer values ranging from 32-8191 which I want to map to a roughly logarithmic scale. If I were using base 2, I could just count the leading zero bits and map them into 8 slots, but this is too course-grained; I need 32 slots (and more would be better, but I need them to map to bits in a 32-bit value), which comes out to a base of roughly 1.18-1.20 for the logarithm. Anyone have some tricks for computing this value, or a reasonable approximation, very fast?
My intuition is to break the range down into 2 or 3 subranges with conditionals, and use a small lookup table for each, but I wonder if there's some trick I could do with count-leading-zeros then refining the result, especially since the results don't have to be exact but just roughly logarithmic.
Why not use the next two bits other than the leading bit. You can first partition the number into the 8 bin, and the next two bits to further divide each bin into four. In this case, you can use a simple shift operation which is very fast.
Edit: If you think using the logarithm is a viable solution. Here is the general algorithm:
Let a be the base of the logarithm, and the range is (b_min, b_max) = (32,8191). You can find the base using the formula:
log(b_max/b_min) / log(a) = 32 bin
which give you a~1.1892026. If you use this a as the base of the logarithm, you can map the range (b_min, b_max) into (log_a(b_min), log_a(b_max)) = (20.0004,52.0004).
Now you only need to subtract the all element by a 20.0004 to get the range (0,32). It guarantees all elements are logarithmically uniform. Done
Note: Either a element may fall out of range because of numerical error. You should calculate it yourself for the exact value.
Note2: log_a(b) = log(b)/log(a)
Table lookup is one option, that table isn't that big. If an 8K table is too big, and you have a count leading zeros instruction, you can use a table lookup on the top few bits.
nbits = 32 - count_leading_zeros(v) # number of bits in number
highbits = v >> (nbits - 4) # top 4 bits. Top bit is always a 1.
log_base_2 = nbits + table[highbits & 0x7]
The table you populate with some approximation of log_2
table[i] = approx(log_2(1 + i/8.0))
If you want to stay in integer arithmetic, multiply the last line by a convenient factor.
Answer I just came up with based in IEEE 754 floating point:
((union { float v; uint32_t r; }){ x }.r >> 21 & 127) - 16
It maps 32-8192 onto 0-31 roughly logarithmically (same as hwlau's answer).
Improved version (cut out useless bitwise and):
((union { float v; uint32_t r; }){ x }.r >> 21) - 528

How to map a long integer number to a N-dimensional vector of smaller integers (and fast inverse)?

Given a N-dimensional vector of small integers is there any simple way to map it with one-to-one correspondence to a large integer number?
Say, we have N=3 vector space. Can we represent a vector X=[(int16)x1,(int16)x2,(int16)x3] using an integer (int48)y? The obvious answer is "Yes, we can". But the question is: "What is the fastest way to do this and its inverse operation?"
Will this new 1-dimensional space possess some very special useful properties?
For the above example you have 3 * 32 = 96 bits of information, so without any a priori knowledge you need 96 bits for the equivalent long integer.
However, if you know that your x1, x2, x3, values will always fit within, say, 16 bits each, then you can pack them all into a 48 bit integer.
In either case the technique is very simple you just use shift, mask and bitwise or operations to pack/unpack the values.
Just to make this concrete, if you have a 3-dimensional vector of 8-bit numbers, like this:
uint8_t vector[3] = { 1, 2, 3 };
then you can join them into a single (24-bit number) like so:
uint32_t all = (vector[0] << 16) | (vector[1] << 8) | vector[2];
This number would, if printed using this statement:
printf("the vector was packed into %06x", (unsigned int) all);
produce the output
the vector was packed into 010203
The reverse operation would look like this:
uint8_t v2[3];
v2[0] = (all >> 16) & 0xff;
v2[1] = (all >> 8) & 0xff;
v2[2] = all & 0xff;
Of course this all depends on the size of the individual numbers in the vector and the length of the vector together not exceeding the size of an available integer type, otherwise you can't represent the "packed" vector as a single number.
If you have sets Si, i=1..n of size Ci = |Si|, then the cartesian product set S = S1 x S2 x ... x Sn has size C = C1 * C2 * ... * Cn.
This motivates an obvious way to do the packing one-to-one. If you have elements e1,...,en from each set, each in the range 0 to Ci-1, then you give the element e=(e1,...,en) the value e1+C1*(e2 + C2*(e3 + C3*(...Cn*en...))).
You can do any permutation of this packing if you feel like it, but unless the values are perfectly correlated, the size of the full set must be the product of the sizes of the component sets.
In the particular case of three 32 bit integers, if they can take on any value, you should treat them as one 96 bit integer.
If you particularly want to, you can map small values to small values through any number of means (e.g. filling out spheres with the L1 norm), but you have to specify what properties you want to have.
(For example, one can map (n,m) to (max(n,m)-1)^2 + k where k=n if n<=m and k=n+m if n>m--you can draw this as a picture of filling in a square like so:
1 2 5 | draw along the edge of the square this way
4 3 6 v
8 7
if you start counting from 1 and only worry about positive values; for integers, you can spiral around the origin.)
I'm writing this without having time to check details, but I suspect the best way is to represent your long integer via modular arithmetic, using k different integers which are mutually prime. The original integer can then be reconstructed using the Chinese remainder theorem. Sorry this is a bit sketchy, but hope it helps.
To expand on Rex Kerr's generalised form, in C you can pack the numbers like so:
X = e[n];
X *= MAX_E[n-1] + 1;
X += e[n-1];
/* ... */
X *= MAX_E[0] + 1;
X += e[0];
And unpack them with:
e[0] = X % (MAX_E[0] + 1);
X /= (MAX_E[0] + 1);
e[1] = X % (MAX_E[1] + 1);
X /= (MAX_E[1] + 1);
/* ... */
e[n] = X;
(Where MAX_E[n] is the greatest value that e[n] can have). Note that these maximum values are likely to be constants, and may be the same for every e, which will simplify things a little.
The shifting / masking implementations given in the other answers are a generalisation of this, for cases where the MAX_E + 1 values are powers of 2 (and thus the multiplication and division can be done with a shift, the addition with a bitwise-or and the modulus with a bitwise-and).
There is some totally non portable ways to make this real fast using packed unions and direct accesses to memory. That you really need this kind of speed is suspicious. Methods using shifts and masks should be fast enough for most purposes. If not, consider using specialized processors like GPU for wich vector support is optimized (parallel).
This naive storage does not possess any usefull property than I can foresee, except you can perform some computations (add, sub, logical bitwise operators) on the three coordinates at once as long as you use positive integers only and you don't overflow for add and sub.
You'd better be quite sure you won't overflow (or won't go negative for sub) or the vector will become garbage.
#include <stdint.h> // for uint8_t
long x;
uint8_t * p = &x;
or
union X {
long L;
uint8_t A[sizeof(long)/sizeof(uint8_t)];
};
works if you don't care about the endian. In my experience compilers generate better code with the union because it doesn't set of their "you took the address of this, so I must keep it in RAM" rules as quick. These rules will get set off if you try to index the array with stuff that the compiler can't optimize away.
If you do care about the endian then you need to mask and shift.
I think what you want can be solved using multi-dimensional space filling curves. The link gives a lot of references on this, which in turn give different methods and insights. Here's a specific example of an invertible mapping. It works for any dimension N.
As for useful properties, these mappings are related to Gray codes.
Hard to say whether this was what you were looking for, or whether the "pack 3 16-bit ints into a 48-bit int" does the trick for you.

Resources