Bitwise carry applications - C

Call me naive, but I have always struggled in this area. I was browsing through the code for adding two numbers without the + operator and bumped into this code:
int Add(int x, int y)
{
    // Iterate till there is no carry
    while (y != 0)
    {
        // carry now contains common set bits of x and y
        int carry = x & y;

        // Sum of bits of x and y where at least one of the bits is not set
        x = x ^ y;

        // Carry is shifted by one so that adding it to x gives the
        // required sum
        y = carry << 1;
    }
    return x;
}
Now I understand how the carry is calculated, but why y != 0, and how does this code achieve the result of adding two numbers?

Basics first. Exclusive or'ing two bits is the same as the bottom digit of their sum. And'ing two bits is the same as the top bit of their sum.
A | B | A&B | A^B | A+B
-----------------------
0 | 0 |  0  |  0  |  00
0 | 1 |  0  |  1  |  01
1 | 0 |  0  |  1  |  01
1 | 1 |  1  |  0  |  10
As you can see the exclusive-or result is the same as the last digit of the sum. You can also see that the first digit of the sum is only 1 when A is 1 and B is 1.
[If you have a circuit with two inputs and two outputs, one of which is the exclusive or of the inputs and the other is the and of the inputs, it is called a half adder - because there is no facility to also input a carry (from a previous digit).]
So, to sum two bits, you calculate the XOR to get the lowest digit of the result and the AND to get the highest digit of the result.
For each individual pair of bits in a pair of numbers, I can calculate the sum of those two bits by doing both an XOR and an AND. Using four-bit numbers, for example 3 and 5:
3 0011
5 0101
------
0110 3^5 = 6 (low bit)
0001 3&5 = 1 (high bit)
In order to treat the 3 and 5 as single numbers rather than collections of four bits, each of those high bits needs to be treated as a carry and added to the next low bit to the left. We can do this by shifting the 3&5 left one bit and adding it to the 3^5, which we do by repeating the two operations:
6 0110
1<<1 0010
----
0100 6^(1<<1) = 4
0010 6&(1<<1) = 2
Unfortunately, one of the additions resulted in another carry being generated. So we can just repeat the operation.
4 0100
2<<1 0100
----
0000 4^(2<<1) = 0
0100 4&(2<<1) = 4
We still have a carry, so round we go again.
0 0000
4<<1 1000
----
1000 4^(4<<1) = 8
0000 4&(4<<1) = 0
This time, all the carries are 0 so more iterations are not going to change anything. We've finished.
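To watch those rounds happen in code, here is a minimal sketch (mine, not from the original post) that traces the loop from the question for 3 + 5, printing the partial sum and the pending carries after each round:

#include <stdio.h>

int main(void)
{
    int x = 3, y = 5;
    while (y != 0) {
        int carry = x & y;   /* bits that generate a carry */
        x = x ^ y;           /* sum without the carries */
        y = carry << 1;      /* carries move one place left */
        printf("x=%d y=%d\n", x, y);
    }
    printf("result=%d\n", x);   /* prints result=8 */
    return 0;
}

The printed pairs (6,2), (4,4), (0,8), (8,0) are exactly the XOR/AND steps worked out above.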

I will try to explain it with a simple 3-bit example (you can skip this example and jump to the actual explanation, which starts at "Now to the way we achieve the same flow from the posted code").
Let's say we want to add x = 0b011 to y = 0b101. First we add the least significant bits: 1+1 = 0b10
carry: x10
x: 011
+
y: 101
-----
xx0
Then we add the second bits (by the book we should also add the carry from the previous stage, but we can defer that to a later stage): 1+0 = 0b1
carry: 010
x: 011
+
y: 101
-----
x10
Do the same for the third bit: 0+1 = 0b1
carry: 010
x: 011
+
y: 101
-----
110
So now we have carry = 0b010 and some partial result 0b110.
Remember my comment earlier that we take care of the carry at some later stage? Now is that "later stage". We add the carry to the partial result we got (note that it is the same as if we had added the carry for each bit separately at the earlier stages). LSB addition:
NEW carry: x00
carry: 010
+
part. res.: 110
-----
xx0
Second bits addition:
NEW carry: 100
carry: 010
+
part. res.: 110
-----
x00
Third bit addition:
NEW carry: 100
carry: 010
+
part. res.: 110
-----
new part. res. 100
Now carry = NEW carry and part. res. = new part. res., and we do the same iteration once again.
For LSB
NEW carry: x00
carry: 100
+
part. res.: 100
-----
xx0
For the second bits:
NEW carry: 000
carry: 100
+
part. res.: 100
-----
x00
Third bits:
NEW carry: 1000 --> 000 since we are working with 3 bits only
carry: 100
+
part. res.: 100
-----
000
Now NEW carry is 0, so we have finished the calculation. The final result is 0b000 (overflow).
I am sure I haven't discovered anything new here. Now to the way we achieve the same flow from the posted code:
The partial result is the result without the carry: when x and y have different bits at the same position, the sum of those bits is 1; when the bits are identical, the result is 0 (1+1 => 0 with carry 1, and 0+0 => 0 with carry 0).
Thus the partial result is x ^ y (see the properties of the XOR operation). In the posted code it is x = x ^ y;.
Now let's look at the carry. We get a carry from a single-bit addition only if both bits are 1. So the bits which will set the carry bits to 1 are exactly those marked 1 in the expression x & y (only the bits set at the same position in both remain 1). But the carry should be added to the next (more significant) bit! Thus
carry = (x & y) << 1; // in the posted code it is y = carry << 1
And the iterations are repeated until the carry is 0 (as in our example).
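The same flow can also be written recursively, which makes the termination condition explicit. A minimal sketch (mine), equivalent to the posted loop:

int Add(int x, int y)
{
    /* No carry left to propagate: x holds the final sum. */
    return y == 0 ? x : Add(x ^ y, (x & y) << 1);
}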

Fastest way to compute sum of first set bit over consecutive integers?

Edit: I wish SO let me accept 2 answers because neither is complete without the other. I suggest reading both!
I am trying to come up with a fast implementation of a function that, given an unsigned 32-bit integer x, returns the sum of 2^trailing_zeros(i) for i = 1..x-1, where trailing_zeros is the count-trailing-zeros operation, which returns the number of 0 bits below the least significant 1 bit. This seems like the kind of problem that should lend itself to a clever bit-manipulation implementation that takes the same number of instructions regardless of the input, but I haven't been able to derive it.
Mathematically, 2^trailing_zeros(i) is the largest power of 2 that exactly divides i. So we are summing those largest factors for 1..x-1.
i                   |  1  2  3  4  5  6  7  8  9 10
--------------------+------------------------------
2^trailing_zeros(i) |  1  2  1  4  1  2  1  8  1  2
--------------------+------------------------------
Sum (desired value) |  0  1  3  4  8  9 11 12 20 21
It is a little easier to see the structure of 2^trailing_zeros(i) if we 'plot' the values: horizontal position, increasing from left to right, corresponds to i, and vertical position, increasing from top to bottom, corresponds to trailing_zeros(i).
i:    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
      1     1     1     1     1     1     1     1
         2           2           2           2
               4                       4
                           8
                                                 16
Here it is easier to see the pattern that 2's are always 4 apart, 8's are always 16 apart, etc. However, each pattern starts at a different time -- 8's don't begin until i=8, 16 doesn't begin until i=16, etc. If you don't take into account that the patterns don't start right away you can come up with formulas that don't work -- for example you might think to determine the number of 8's going into the total you should just compute floor(x/16) but i=25 is far enough to the right to include both of the first two 8s.
The best solution I have come up with so far is:
Set n = floor(log2(x)). This can be computed quickly using the count leading zeros operation. This tells us the highest power of two that is going to be involved in the sum.
Set sum = 0
for i = 0..n
sum += floor((x - 2^i) / 2^(i+1))*2^i + 2^i
The way this works is: for each power, it calculates the horizontal distance on the plot between x and the first appearance of that power, e.g. the distance between x and the first 8 is (x-8). It then divides by the distance between repeated instances of that power, e.g. floor((x-8)/16), which gives us how many more times that power appeared, and multiplies, giving the sum for that power, e.g. floor((x-8)/16)*8. Then we add one instance of the given power because that calculation excludes the very first time the power appears.
In practice this implementation should be pretty fast, because the division/floor can be done with a right bit shift and a power of two is just 1 shifted to the left. However, it seems like it should still be possible to do better. This implementation loops more for larger inputs, up to 32 times; it's O(log2(x)), while ideally we want O(1) without a gigantic lookup table using up all the CPU cache. I've been eyeing the BMI/BMI2 intrinsics but I don't see an obvious way to apply them.
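For concreteness, here is a minimal C sketch of that loop (the function name and the software log2 loop are mine, not from the original post; a real implementation would compute n with a count-leading-zeros intrinsic):

/* Sum of 2^trailing_zeros(i) for i = 1..x-1, 32-bit unsigned. */
unsigned SumTrailingPowers(unsigned x)
{
    if (x < 2)
        return 0;
    --x;                            /* now summing over i = 1..x */
    unsigned n = 0;                 /* n = floor(log2(x)) */
    for (unsigned t = x; t >>= 1; )
        ++n;
    unsigned sum = 0;
    for (unsigned i = 0; i <= n; ++i)
        sum += (((x - (1u << i)) >> (i + 1)) << i) + (1u << i);
    return sum;
}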
Although my goal is to implement this in a compiled language like C++ or Rust with real bit shifting and intrinsics, I've been prototyping in Python. Included below is my script that includes the implementation I described, z(x), and the code for generating the plot, tower(x).
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from math import pow, floor, log, ceil

def leading_zeros(x):
    # Despite its name, this computes the number of trailing zeros of x.
    return len(bin(x).split('b')[-1].split('1')[-1])

def f(x):
    # Naive reference: sum 2^trailing_zeros(i) for i = 1..x-1.
    s = 0
    for c, i in enumerate(range(1, x)):
        a = pow(2, len(bin(i).split('b')[-1].split('1')[-1]))
        s += a
    return s

def g(x): return sum([pow(2, i)*floor((x+pow(2, i)-1)/pow(2, i+1)) for i in range(0, 32)])

def h(x):
    s = 0
    extra = 0
    extra_s = 0
    for i in range(0, 32):
        num = (x+pow(2, i)-1)
        den = pow(2, i+1)
        fraction = num/den
        floored = floor(num/den)
        power = pow(2, i)
        product = power*floored
        if product == 0:
            break
        s += product
        extra += (fraction - floored)
        extra_s += power*fraction
        #print(f"i={i} s={s} num={num} den={den} fraction={fraction} floored={floored} power={power} product={product} extra={extra} extra_s={extra_s}")
    return s

def z(x):
    # The implementation described above; z(x) gives the sum for i = 1..x.
    upper_bound = floor(log(x, 2)) if x > 0 else 0
    s = 0
    for i in range(upper_bound+1):
        num = (x - pow(2, i))
        den = pow(2, i+1)
        fraction = num/den
        floored = floor(fraction)
        added = pow(2, i)
        s += floored * added
        s += added
        print(f"i={i} s={s} upper_bound={upper_bound} num={num} den={den} floored={floored} added={added}")
    return s
    # return sum([floor((x - pow(2,i))/pow(2,i+1) + pow(2,i)) for i in range(floor(log(x, 2)))])

def tower(x):
    # Prints the 'plot' of 2^trailing_zeros(i) for i = 1..x-1.
    table = [[" " for i in range(x)] for j in range(ceil(log(x, 2)))]
    for i in range(1, x):
        p = leading_zeros(i)
        table[p][i] = 2**p
    for row in table:
        for col in row:
            print(col, end='')
        print()

# h(9000)
for i in range(1, 16):
    tower(i)
    print((i, f(i), g(i), h(i), z(i-1)))
Based on the method of Eric Postpischil, here is a way to do it without a loop.
Note that every bit is being multiplied by its position, and the results are summed (sort of, except there is also a factor of 0.5 in it; let's put that aside for now). Let's call the values that are being added up "the partial products", just to call them something; it's not really accurate to call them that, but I can't come up with anything better. If we transpose that a little bit, it's built up like this: the lowest bit of every partial product is the lowest bit of the position of every bit, multiplied by that bit. Single-bit products are bitwise AND, and the values of the lowest bits of the positions are 0,1,0,1, etc., so the first term works out to x & 0xAAAAAAAA. The second bit of every partial product is x & 0xCCCCCCCC (which has a "weight" of 2, so it must be multiplied by 2), etc.
Then the whole thing needs to be shifted right by 1 to account for the factor of 0.5.
So in total:
unsigned CountCumulativeTrailingZeros(unsigned x)
{
    --x;
    unsigned sum = x;                  /* the factored-out 1 in (1 + b/2) */
    sum += (x >> 1) & 0x55555555;      /* bit 0 of each position, weight 1/2 */
    sum += x & 0xCCCCCCCC;             /* bit 1, weight 2/2 = 1 */
    sum += (x & 0xF0F0F0F0) << 1;      /* bit 2, weight 4/2 = 2 */
    sum += (x & 0xFF00FF00) << 2;      /* bit 3, weight 8/2 = 4 */
    sum += (x & 0xFFFF0000) << 3;      /* bit 4, weight 16/2 = 8 */
    return sum;
}
For an additional explanation, here is a more visual example. Let's temporarily drop the factor of 0.5 again, it doesn't fundamentally change the algorithm but adds some complication.
First I write above every bit of v (some example value), the position of that bit in binary (p0 is the least significant bit of the position, p1 the second bit etc). Read the ps vertically, every column is a number:
p0: 10101010101010101010101010101010
p1: 11001100110011001100110011001100
p2: 11110000111100001111000011110000
p3: 11111111000000001111111100000000
p4: 11111111111111110000000000000000
v : 00000000100001000000001000000000
So for example bit 9 is set, and it has (reading from bottom to top) 01001 above it (9 in binary).
What we want to do (why this works has been explained by Eric's answer), is take the indexes of the bits that are set, shift them to their corresponding positions, and add them. In this case, they are already at their own positions (by construction, the numbers were written at their own positions), so there is no shift, but they still need to be filtered so only the numbers that correspond to set bits survive. This is what I meant by the "single bit products": take a bit of v and multiply it by the corresponding bits of p0, p1, etc.
You can look at that as multiplying the bit value by its index as well so 2^bit * bit as mentioned in the comments. That is not how it is done here, but that is effectively what is done.
Back to the example, applying bitwise-AND results in these partial products:
pp0: 00000000100000000000001000000000
pp1: 00000000100001000000000000000000
pp2: 00000000100000000000000000000000
pp3: 00000000000000000000001000000000
pp4: 00000000100001000000000000000000
v : 00000000100001000000001000000000
The only values that are left are 01001, 10010, 10111, and they are at their corresponding positions (so, already shifted to where they need to go).
Those values must be added while keeping them at their positions. They don't need to be extracted from the strange form they are in; addition is freely reorderable (associative and commutative), so it's OK to add all the least significant bits of the partial products to the sum first, then all the second bits, and so on. But they have to be added with the right "weight": a set bit in pp0 corresponds to a 1 at that position, but a set bit in pp1 really corresponds to a 2 at that position (since it's the second bit of the number it is part of). So pp0 is used directly, but pp1 is shifted left by 1, pp2 is shifted left by 2, etc.
The factor of 0.5 must still be accounted for, which I did mostly by shifting the bits of the partial products over by one less than what their weight would imply. pp0 was shifted left by zero, so it must be shifted right by 1 now. This could be done with less complication by just putting return sum >> 1; at the end, but that would reduce the range of values the function can handle before running into integer wrapping modulo 2^32 (also it would cost an extra operation, and doing it the weird way does not).
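As a quick sanity check (mine, not part of the original answer), calling the function for small inputs reproduces the "Sum (desired value)" row from the question's table:

#include <stdio.h>

/* Assumes CountCumulativeTrailingZeros from above is in scope. */
int main(void)
{
    unsigned expected[] = { 0, 1, 3, 4, 8, 9, 11, 12, 20, 21 };
    for (unsigned x = 1; x <= 10; ++x)
        printf("x=%2u got=%2u want=%2u\n",
               x, CountCumulativeTrailingZeros(x), expected[x - 1]);
    return 0;
}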
Observe that if we count from 1 to x instead of to x−1, we have a pattern:
 x | sum | sum/x
---+-----+------
 1 |   1 | 1
 2 |   3 | 1.5
 4 |   8 | 2
 8 |  20 | 2.5
16 |  48 | 3
So we can easily calculate the sum for any power of two p as p·(1 + b/2), where b is the power's exponent (equivalently, the number of the bit that is set, or log2 of the power). We can see this by induction: if the sum from 1 to 2^b is 2^b·(1 + b/2) (which it is for b = 0), then the sum from 1 to 2^(b+1) reprises the individual term contributions twice, except that the last term adds 2^(b+1) instead of 2^b, so the sum is 2·2^b·(1 + b/2) − 2^b + 2^(b+1) = 2^(b+1)·(1 + b/2) + ½·2^(b+1) = 2^(b+1)·(1 + (b+1)/2).
Further, between any two powers of two, the lower bits reprise the previous partial sums. Thus, for any x, we can compute the cumulative number of trailing zeros by summing the sums for the set bits in it. Recalling that this provides the sum for numbers from 1 to x, we adjust to get the desired sum from 1 to x−1 by subtracting one from x before the computation:
#include <limits.h>   /* for CHAR_BIT */

unsigned CountCumulative(unsigned x)
{
    --x;
    unsigned sum = 0;
    for (unsigned bit = 0; bit < sizeof x * CHAR_BIT; ++bit)
        sum += (x & 1u << bit) * (1 + bit * .5);
    return sum;
}
We can terminate the loop when x is exhausted:
unsigned CountCumulative(unsigned x)
{
    --x;
    unsigned sum = 0;
    for (unsigned bit = 0; x; ++bit, x >>= 1)
        sum += ((x & 1) << bit) * (1 + bit * .5);
    return sum;
}
As harold points out, we can factor out the 1, as summing the value of each bit of x equals x:
unsigned CountCumulative(unsigned x)
{
    --x;
    unsigned sum = x;
    for (unsigned bit = 0; x; ++bit, x >>= 1)
        sum += ((x & 1) << bit) * bit * .5;
    return sum;
}
Then eliminate the floating-point:
unsigned CountCumulative(unsigned x)
{
    unsigned sum = --x;
    for (unsigned bit = 0; x; ++bit, x >>= 1)
        sum += ((x & 1) << bit) / 2 * bit;
    return sum;
}
Note that when bit is zero, ((x & 1) << bit) / 2 will lose the fraction, but this is irrelevant as * bit makes the contribution zero anyway. For all other values of bit, (x & 1) << bit is even, so the division does not lose anything.
This will overflow unsigned at some point, so one might want to use a wider type for the calculations.
More Code Golf
Another way to add half of each bit's value once per bit position is to shift x right once (halving its bit values) and then add it repeatedly, dropping one more low bit each time, from low to high:
unsigned CountCumulative(unsigned x)
{
    unsigned sum = --x;
    for (unsigned bit = 0; x >>= 1; ++bit)
        sum += x << bit;
    return sum;
}
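To gain confidence in this chain of transformations, a small harness (my own sketch) can compare the last version against a naive reference; i & -i isolates the lowest set bit of i, which is exactly 2^trailing_zeros(i):

#include <stdio.h>

/* Naive reference: literally sum 2^trailing_zeros(i) for i = 1..x-1. */
static unsigned Naive(unsigned x)
{
    unsigned sum = 0;
    for (unsigned i = 1; i < x; ++i)
        sum += i & -i;   /* lowest set bit of i */
    return sum;
}

/* Assumes the final CountCumulative from above is in scope. */
int main(void)
{
    for (unsigned x = 1; x <= 1000; ++x)
        if (CountCumulative(x) != Naive(x))
            printf("mismatch at x=%u\n", x);
    printf("done\n");
    return 0;
}

Keeping x modest here sidesteps the unsigned overflow mentioned above.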

K&R C programming language exercise 2-9

I don't understand exercise 2-9 in the K&R C programming language book, chapter 2, section 2.10:
Exercise 2-9. In a two's complement number system, x &= (x-1) deletes the rightmost 1-bit in x. Explain why. Use this observation to write a faster version of bitcount.
the bitcount function is:
/* bitcount: count 1 bits in x */
int bitcount(unsigned x)
{
    int b;

    for (b = 0; x != 0; x >>= 1)
        if (x & 01)
            b++;
    return b;
}
The function checks whether the lowest bit is 1, then shifts it out, repeating until x is zero.
I can't understand why x&(x-1) deletes the rightmost 1-bit.
For example, suppose x is 1010 and x-1 is 1001 in binary, and x&(x-1) would be 1011, so the rightmost bit would be there and would be one, where am I wrong?
Also, the exercise mentioned two's complement, does it have something to do with this question?
Thanks a lot!!!
First, you need to believe that K&R are correct.
Second, you may have some misunderstanding of the words.
Let me clarify it again for you: the rightmost 1-bit does not mean the rightmost bit, but the rightmost bit which is 1 in the binary form.
Let's arbitrarily assume that x is xxxxxxx1000 (each x can be 0 or 1). Then, from right to left, the fourth bit is the "rightmost 1-bit". On the basis of this understanding, let's continue with the problem.
Why can x &= (x-1) delete the rightmost 1-bit?
In a two's complement number system, -1 is represented with all 1 bit-pattern.
So x-1 is actually x+(-1), which is xxxxxxx1000 + 11111111111. Here comes the tricky point:
below the rightmost 1-bit, every 0 becomes 1; the rightmost 1-bit itself becomes 0, and a carry of 1 goes to the left side. This 1 keeps propagating to the leftmost bit and overflows out, while every 'x' bit stays x, because 'x'+'1'+'1' (carry) yields an 'x' bit and a carry of 1.
Then x & (x-1) will delete the rightmost 1-bit.
Hope you can understand it now.
Thanks.
Here is a simple way to explain it. Let's arbitrarily assume that number Y is xxxxxxx1000 (x can be 0 or 1).
xxxxxxx1000 - 1 = xxxxxxx0111
xxxxxxx1000 & xxxxxxx0111 = xxxxxxx0000 (See, the "rightmost 1" is gone.)
So the number of repetitions of Y &= (Y-1) before Y becomes 0 will be the total number of 1's in Y.
Why does x & (x-1) delete the rightmost order bit? Just try and see:
If the rightmost order bit is 1, x has a binary representation of a...b1 and x-1 is a...b0, so the bitwise AND gives a...b0: common bits are left unchanged by the AND, and 1 & 0 is 0, which clears the rightmost bit.
Otherwise, x has a binary representation of a...b10...0; x-1 is a...b01...1, and for the same reason as above x & (x-1) will be a...b00...0, again clearing the rightmost order bit.
So instead of scanning all bits to find which are 0 and which are 1, you just iterate the operation x = x & (x-1) until x is 0; the number of steps will be the number of 1-bits. It is more efficient than the naive implementation because, statistically, you use half the number of steps.
Example of code:
int bitcount(unsigned int x) {
    int nb = 0;
    while (x != 0) {
        x &= x - 1;
        nb++;
    }
    return nb;
}
I know I'm already very late (≈ 3.5 years), but your example has a mistake.
x = 1010 = 10
x - 1 = 1001 = 9
1010 & 1001 = 1000
So as you can see, it deleted the rightmost 1-bit in 10.
7 = 111
6 = 110
5 = 101
4 = 100
3 = 011
2 = 010
1 = 001
0 = 000
Observe that at the position of the rightmost 1 in any number, the bit at that same position in that number minus one is 0. Thus, ANDing x with x-1 resets (i.e. sets to 0) the rightmost 1-bit.
7 & 6 = 111 & 110 = 110 = 6
6 & 5 = 110 & 101 = 100 = 4
5 & 4 = 101 & 100 = 100 = 4
4 & 3 = 100 & 011 = 000 = 0
3 & 2 = 011 & 010 = 010 = 2
2 & 1 = 010 & 001 = 000 = 0
1 & 0 = 001 & 000 = 000 = 0

Changing the LSB to 1 in a 4 bit integer via C

I am receiving a number N where N is a 4-bit integer and I need to change its LSB to 1 without changing the other 3 bits in the number using C.
Basically, all must read XXX1.
So let's say n = 2; the binary would be 0010. I would change the LSB to 1, making the number 0011.
I am struggling to find a combination of operations that will do this. I am working with: !, ~, &, |, ^, <<, >>, +, -, =.
This has really been driving me crazy; I have been playing around with >>/<< and ~, starting out with 0xF.
Try
number |= 1;
This should set the LSB to 1 regardless of what the number is. Why? Because the bitwise OR (|) operator does exactly what its name suggests: it ORs the two numbers' bits, pair by pair. So if you have, say, 1010b and 1b (10 and 1 in decimal), the operator does this:
1 0 1 0
OR 0 0 0 1
= 1 0 1 1
And that's exactly what you want.
For your information, the
number |= 1;
statement is equivalent to
number = number | 1;
Use x = x | 0x01; to set the LSB to 1
A visualization
? ? ? ? ? ? ? ?
OR
0 0 0 0 0 0 0 1
----------------------
? ? ? ? ? ? ? 1
Therefore other bits will stay the same except the LSB is set to 1.
Use the bitwise or operator |. It looks at two numbers bit by bit, and returns the number generated by performing an OR with each bit.
int n = 2;
n = n | 1;
printf("%d\n", n); // prints the number 3
In binary, 2 = 0010, 3 = 0011, and 1 = 0001
0010
OR 0001
-------
0011
If n is not 0
n | !!n
works.
If n is 0, then !n is what you want.
UPDATE
The fancy one liner :P
n = n ? n | !!n : !n;
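A tiny sketch (mine) confirming that the one-liner agrees with the plain n | 1 for all sixteen 4-bit values:

#include <stdio.h>

int main(void)
{
    for (int n = 0; n < 16; ++n) {
        int simple = n | 1;               /* straightforward version */
        int fancy  = n ? n | !!n : !n;    /* the one-liner above */
        printf("n=%2d -> %2d %s\n", n, fancy,
               simple == fancy ? "ok" : "MISMATCH");
    }
    return 0;
}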

C - Fast conversion between binary and hex representations

Reading or writing C code, I often have difficulty translating numbers between their binary and hex representations. Masks like 0xAAAA5555 are used very often in low-level programming, but it is difficult to recognize the particular pattern of bits they represent. Is there an easy-to-remember rule for how to do the conversion quickly in your head?
Each hex digit maps exactly onto 4 bits. I usually keep in mind the 8-4-2-1 weights of each of those bits, which makes it very easy to do the conversion in your head, e.g.
A = 10 = 8+2 = 1010 ...
5 = 4+1 = 0101
just keep the 8-4-2-1 weights in mind.
A 5
8+4+2+1 8+4+2+1
1 0 1 0 0 1 0 1
I always find it easy to map hex to binary numbers. Since each hex digit can be directly mapped to a four-digit binary number, you can think of:
> 0xA4
As
> b 1010 0100
> ---- ---- (4 binary digits for each part)
> A 4
The conversion is calculated by dividing the base-10 representation by 2 and stringing the remainders together in reverse order. I do this in my head; it seems to work.
So, say you want to know what 0xAAAA5555 looks like. I just work out what A and 5 look like by doing:
A = 10
10 / 2 = 5 r 0
5 / 2 = 2 r 1
2 / 2 = 1 r 0
1 / 2 = 0 r 1
so I know the A's look like 1010 (Note that 4 fingers are a good way to remember the remainders!)
You can string blocks of 4 bits together, so A A is 1010 1010. To convert binary back to hex, I always go through base 10 again by summing up the powers of 2. You can do this by forming blocks of 4 bits (padding with 0s) and stringing the results together.
so 111011101 is 0001 1101 1101 which is (1) (1 + 4 + 8) (1 + 4 + 8) = 1 13 13 which is 1DD
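If you would rather let the machine do the digit-by-digit mapping, a 16-entry lookup table captures the same rule. A sketch (names are mine) that prints the binary expansion of a mask one hex digit at a time:

#include <stdio.h>

/* One 4-bit pattern per hex digit 0..F. */
static const char *nibble[16] = {
    "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
    "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

int main(void)
{
    unsigned mask = 0xAAAA5555u;
    for (int shift = 28; shift >= 0; shift -= 4)
        printf("%s ", nibble[(mask >> shift) & 0xF]);
    printf("\n");   /* 1010 1010 1010 1010 0101 0101 0101 0101 */
    return 0;
}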

C GPIO hex numbering

I have been given the following bit of code as an example:
Make port 0 bits 0-2 outputs, others to be inputs.
FIO0DIR = 0x00000007;
Set P0.0, P0.1, P0.2 all low (0)
FIO0CLR = 0x00000007;
I have been told that the port has 31 LEDs attached to it. I can't understand why, to enable the first 3 outputs, it is 0x00000007 and not 0x00000003?
These GPIO config registers are bitmaps.
Use your Windows calculator to convert the hex to binary:
0x00000007 = 111, or with 32 bits - 00000000000000000000000000000111 // three outputs
0x00000003 = 11, or with 32 bits - 00000000000000000000000000000011 // only two outputs
Because the value you write to the register is a binary bit-mask, with a bit being one meaning "this is an output". You don't write the "number of outputs I'd like to have"; you are setting 32 individual flags at the same time (the register shown is 32 bits wide).
The number 7 in binary is 00000111, so it has the lower-most three bits set to 1, which here seems to mean "this is an output". The decimal value 3, on the other hand, is just 00000011 in binary, thus only having two bits set to 1, which clearly is one too few.
Bits are indexed from the right, starting at 0. The decimal value of bit number n is 2^n. The decimal value of a binary number with more than one bit set is simply the sum of the values of all the set bits.
So, for instance, the decimal value of the number with bits 0, 1 and 2 set is 2^0 + 2^1 + 2^2 = 1 + 2 + 4 = 7.
Here is an awesome ASCII table showing the 8 bits of a byte and their individual values:
      +-----+-----+-----+-----+-----+-----+-----+-----+
index |  7  |  6  |  5  |  4  |  3  |  2  |  1  |  0  |
      +-----+-----+-----+-----+-----+-----+-----+-----+
value | 128 |  64 |  32 |  16 |  8  |  4  |  2  |  1  |
      +-----+-----+-----+-----+-----+-----+-----+-----+
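One way to make the intent of such masks explicit in source code is to build them from shifted 1-bits instead of writing the hex constant directly. A sketch in the style of the question's snippets (the PIN macro is mine, not from the original):

#define PIN(n) (1u << (n))

FIO0DIR = PIN(0) | PIN(1) | PIN(2);   /* == 0x00000007: P0.0-P0.2 are outputs */
FIO0CLR = PIN(0) | PIN(1) | PIN(2);   /* drive those three pins low */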
