replace addition operation with bit-shifting - c

In my embedded code I need to add offsets 0x100, 0x200, 0x300 etc. (overall number of offsets is fixed, say 64) to initial register address. Is it possible to optimize it with bit shifting? I know that multiplication by 2 is left bit-shifting by 2, but I can't get my head around addition operation.

You can't replace addition with bit shifting. Only multiplication can. This is because left-shift(<<) multiplies by x * 2^n. So
(1 << 3) == 1 * pow(3, 2)
Right shift(>>) divides x / 2^n.
Doing bit shifting instead of the equivalent multiplying/dividing is faster, and is usually used in games, where performance is critical.

Related

Applications of bitwise operators in C and their efficiency? [duplicate]

This question already has answers here:
Real world use cases of bitwise operators [closed]
(41 answers)
Closed 6 years ago.
I am new to bitwise operators.
I understand how the logic functions work to get the final result. For example, when you bitwise AND two numbers, the final result is going to be the AND of those two numbers (1 & 0 = 0; 1 & 1 = 1; 0 & 0 = 0). Same with OR, XOR, and NOT.
What I don't understand is their application. I tried looking everywhere and most of them just explain how bitwise operations work. Of all the bitwise operators I only understand the application of shift operators (multiplication and division). I also came across masking. I understand that masking is done using bitwise AND but what exactly is its purpose and where and how can I use it?
Can you elaborate on how I can use masking? Are there similar uses for OR and XOR?
The low-level use case for the bitwise operators is to perform base 2 math. There is the well known trick to test if a number is a power of 2:
if ((x & (x - 1)) == 0) {
printf("%d is a power of 2\n", x);
}
But, it can also serve a higher level function: set manipulation. You can think of a collection of bits as a set. To explain, let each bit in a byte to represent 8 distinct items, say the planets in our solar system (Pluto is no longer considered a planet, so 8 bits are enough!):
#define Mercury (1 << 0)
#define Venus (1 << 1)
#define Earth (1 << 2)
#define Mars (1 << 3)
#define Jupiter (1 << 4)
#define Saturn (1 << 5)
#define Uranus (1 << 6)
#define Neptune (1 << 7)
Then, we can form a collection of planets (a subset) like using |:
unsigned char Giants = (Jupiter|Saturn|Uranus|Neptune);
unsigned char Visited = (Venus|Earth|Mars);
unsigned char BeyondTheBelt = (Jupiter|Saturn|Uranus|Neptune);
unsigned char All = (Mercury|Venus|Earth|Mars|Jupiter|Saturn|Uranus|Neptune);
Now, you can use a & to test if two sets have an intersection:
if (Visited & Giants) {
puts("we might be giants");
}
The ^ operation is often used to see what is different between two sets (the union of the sets minus their intersection):
if (Giants ^ BeyondTheBelt) {
puts("there are non-giants out there");
}
So, think of | as union, & as intersection, and ^ as union minus the intersection.
Once you buy into the idea of bits representing a set, then the bitwise operations are naturally there to help manipulate those sets.
One application of bitwise ANDs is checking if a single bit is set in a byte. This is useful in networked communication, where protocol headers attempt to pack as much information into the smallest area as is possible in an effort to reduce overhead.
For example, the IPv4 header utilizes the first 3 bits of the 6th byte to tell whether the given IP packet can be fragmented, and if so whether to expect more fragments of the given packet to follow. If these fields were the size of ints (1 byte) instead, each IP packet would be 21 bits larger than necessary. This translates to a huge amount of unnecessary data through the internet every day.
To retrieve these 3 bits, a bitwise AND could be used along side a bit mask to determine if they are set.
char mymask = 0x80;
if(mymask & (ipheader + 48) == mymask)
//the second bit of the 6th byte of the ip header is set
Small sets, as has been mentioned. You can do a surprisingly large number of operations quickly, intersection and union and (symmetric) difference are obviously trivial, but for example you can also efficiently:
get the lowest item in the set with x & -x
remove the lowest item from the set with x & (x - 1)
add all items smaller than the smallest present item
add all items higher than the smallest present item
calculate their cardinality (though the algorithm is nontrivial)
permute the set in some ways, that is, change the indexes of the items (not all permutations are equally efficient)
calculate the lexicographically next set that contains as many items (Gosper's Hack)
1 and 2 and their variations can be used to build efficient graph algorithms on small graphs, for example see algorithm R in The Art of Computer Programming 4A.
Other applications of bitwise operations include, but are not limited to,
Bitboards, important in many board games. Chess without bitboards is like Christmas without Santa. Not only is it a space-efficient representation, you can do non-trivial computations directly with the bitboard (see Hyperbola Quintessence)
sideways heaps, and their application in finding the Nearest Common Ancestor and computing Range Minimum Queries.
efficient cycle-detection (Gosper's Loop Detection, found in HAKMEM)
adding offsets to Z-curve addresses without deconstructing and reconstructing them (see Tesseral Arithmetic)
These uses are more powerful, but also advanced, rare, and very specific. They show, however, that bitwise operations are not just a cute toy left over from the old low-level days.
Example 1
If you have 10 booleans that "work together" you can do simplify your code a lot.
int B1 = 0x01;
int B2 = 0x02;
int B10 = 0x0A;
int someValue = get_a_value_from_somewhere();
if (someValue & (B1 + B10)) {
// B1 and B10 are set
}
Example 2
Interfacing with hardware. An address on the hardware may need bit level access to control the interface. e.g. an overflow bit on a buffer or a status byte that can tell you the status of 8 different things. Using bit masking you can get down the the actual bit of info you need.
if (register & 0x80) {
// top bit in the byte is set which may have special meaning.
}
This is really just a specialized case of example 1.
Bitwise operators are particularly useful in systems with limited resources as each bit can encode a boolean. Using many chars for flags is wasteful as each takes one byte of space (when they could be storing 8 flags each).
Commonly microcontrollers have C interfaces for their IO ports in which each bit controls 1 of 8 ports. Without bitwise operators these would be quite difficult to control.
Regarding masking, it is common to use both & and |:
x & 0x0F //ensures the 4 high bits are 0
x | 0x0F //ensures the 4 low bits are 1
In microcontroller applications, you can utilize bitwise to switch between ports. In the below picture, if we would like to turn on a single port while turning off the rest, then the following code can be used.
void main()
{
unsigned char ON = 1;
TRISB=0;
PORTB=0;
while(1){
PORTB = ON;
delay_ms(200);
ON = ON << 1;
if(ON == 0) ON=1;
}
}

How to flip and reverse an int in C?

For example:
Input: 01011111
Output: 00000101
I know I can use ~ to flip a number, but I don't know good ways to reverse it. And I'm not sure whether they can be done together.
Does anyone have any ideas?
For this sort of thing I'd advise you to go to the fantastic bit twiddling hacks webpage. Here's one of the solutions from that page:
Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division):
unsigned char b; // reverse this (8-bit) byte
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
The multiply operation creates five separate copies of the 8-bit byte pattern to fan-out into a 64-bit value. The AND operation selects the bits that are in the correct (reversed) positions, relative to each 10-bit groups of bits. The multiply and the AND operations copy the bits from the original byte so they each appear in only one of the 10-bit sets. The reversed positions of the bits from the original byte coincide with their relative positions within any 10-bit set. The last step, which involves modulus division by 2^10 - 1, has the effect of merging together each set of 10 bits (from positions 0-9, 10-19, 20-29, ...) in the 64-bit value. They do not overlap, so the addition steps underlying the modulus division behave like or operations.
This method was attributed to Rich Schroeppel in the Programming Hacks section of Beeler, M., Gosper, R. W., and Schroeppel, R. HAKMEM. MIT AI Memo 239, Feb. 29, 1972.
And here's a different solution that doesn't use 64-bit integers:
Reverse the bits in a byte with 7 operations (no 64-bit):
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
Make sure you assign or cast the result to an unsigned char to remove garbage in the higher bits. Devised by Sean Anderson, July 13, 2001. Typo spotted and correction supplied by Mike Keith, January 3, 2002.

What does >> and 0xfffffff8 mean?

I was told that (i >> 3) is faster than (i/8) but I can't find any information on what >> is. Can anyone point me to a link that explains it?
The same person told me "int k = i/8, followed by k*8 is better accomplished by (i&0xfffffff8);" but again Google didn't help m...
Thanks for any links!
As explained here the >> operator is simply a bitwise shift of the bits of i. So shifting i 1 bit to the right results in an integer-division by 2 and shifting by 3 bits results in a division by 2^3=8.
But nowadays this optimization for division by a power of two should not really be done anymore, as compilers should be smart enough to do this themselves.
Similarly a bitwise AND with 0xFFFFFFF8 (1...1000, last 3 bits 0) is equal to rounding down i to the nearest multiple of 8 (like (i/8)*8 does), as it will zero the last 3 bits of i.
Bitwise shift right.
i >> 3 moves the i integer 3 places to the right [binary-way] - aka, divide by 2^3.
int x = i / 8 * 8:
1) i / 8, can be replaced with i >> 3 - bitwise shift to the right on to 3 digits (8 = 2^3)
2) i & xfffffff8 comparison with mask
For example you have:
i = 11111111
k (i/8) would be: 00011111
x (k * 8) would be: 11111000
Therefore the operation just resets last 3 bits:
And comparable time cost multiplication and division operation can be rewritten simple with
i & xfffffff8 - comparison with (... 11111000 mask)
They are Bitwise Operations
The >> operator is the bit shift operator. It takes the bit represented by the value and shifts them over a set number of slots to the right.
Regarding the first half:
>> is a bit-wise shift to the right.
So shifting a numeric value 3 bits to the right is the same as dividing by 8 and inting the result.
Here's a good reference for operators and their precedence: http://web.cs.mun.ca/~michael/c/op.html
The second part of your question involves the & operator, which is a bit-wise AND. The example is ANDing i and a number that leaves all bits set except for the 3 least significant ones. That is essentially the same thing happening when you have a number, divide it by 8, store the result as an integer, then multiply that result by 8.
The reason this is so is that dividing by 8 and storing as an integer is the same as bit-shifting to the right 3 places, and multiplying by 8 and storing the result in an int is the same as bit-shifting to the left 3 places.
So, if you're multiplying or dividing by a power of 2, such as 8, and you're going to accept the truncating of bits that happens when you store that result in an int, bit-shifting is faster, operationally. This is because the processor can skip the multiply/divide algorithm and just go straight to shifting bits, which involves few steps.
Bitwise shifting.
Suppose I have an 8 -bit integer, in binary
01000000
If I left shift (>> operator) 1 the result is
00100000
If I then right shift (<< operator) 1, I clearly get back to wear I started
01000000
It turns out that because the first binary integer is equivelant to
0*2^7 + 1*2^6 + 0*2^5 + 0*2^4 + 0*2^3 + 1*2^2 + 0*2^1 + 0*2^0
or simply 2^6 or 64
When we right shift 1 we get the following
0*2^7 + 0*2^6 + 1*2^5 + 0*2^4 + 0*2^3 + 1*2^2 + 0*2^1 + 0*2^0
or simply 2^5 or 32
Which means
i >> 1
is the same as
i / 2
If we shift once more (i >> 2), we effectively divide by 2 once again and get
i / 2 / 2
Which is really
i / 4
Not quite a mathematical proof, but you can see the following holds true
i >> n == i / (2^n)
That's called bit shifting, it's an operation on bits, for example, if you have a number on a binary base, let's say 8, it will be 1000, so
x = 1000;
y = x >> 1; //y = 100 = 4
z = x >> 3; //z = 1
Your shifting the bits in binary so for example:
1000 == 8
0100 == 4 (>> 1)
0010 == 2 (>> 2)
0001 == 1 (>> 3)
Being as you're using a base two system, you can take advantage with certain divisors (integers only!) and just bit-shift. On top of that, I believe most compilers know this and will do this for you.
As for the second part:
(i&0xfffffff8);
Say i = 16
0x00000010 & 0xfffffff8 == 16
(16 / 8) * 8 == 16
Again taking advantage of logical operators on binary. Investigate how logical operators work on binary a bit more for really clear understanding of what is going on at the bit level (and how to read hex).
>> is right shift operation.
If you have a number 8, which is represented in binary as 00001000, shifting bits to the right by 3 positions will give you 00000001, which is decimal 1. This is equivalent to dividing by 2 three times.
Division and multiplication by the same number means that you set some bits at the right to zero. The same can be done if you apply a mask. Say, 0xF8 is 11111000 bitwise and if you AND it to a number, it will set its last three bits to zero, and other bits will be left as they are. E.g., 10 & 0xF8 would be 00001010 & 11111000, which equals 00001000, or 8 in decimal.
Of course if you use 32-bit variables, you should have a mask fitting this size, so it will have all the bits set to 1, except for the three bits at the right, giving you your number - 0xFFffFFf8.
>> shifts the number to the right. Consider a binary number 0001000 which represents 8 in the decimal notation. Shifting it 3 bits to the right would give 0000001 which is the number 1 in decimal notation. Thus you see that every 1 bit shift to the right is in fact a division by 2. Note here that the operator truncates the float part of the result.
Therefore i >> n implies i/2^n.
This might be fast depending on the implementation of the compiler. But it generally takes place right in the registers and therefore is very fast as compared to traditional divide and multiply.
The answer to the second part is contained in the first one itself. Since division also truncates all the float part of the result, the division by 8 will in theory shift your number 3 bits to the right, thereby losing all the information about the rightmost 3 bits. Now when you again multiply it by 8 (which in theory means shifting left by 3 bits), you are padding the righmost 3 bits by zero after shifting the result left by 3 bits. Therefore, the complete operation can be considered as one "&" operation with 0xfffffff8 which means that the number has all bits 1 except the rightmost 4 bits which are 1000.

Bitshift to multiply by any number

Using only adding, subtracting, and bitshifting, how can I multiply an integer by a given number?
For example, I want to multiply an integer by 17.
I know that shifting left is multiplying by a multiple of 2 and shifting right is dividing by a power of 2 but I don’t know how to generalize that.
What about negative numbers? Convert to two's complement and do the same procedure?
(EDIT: OK, I got this, nevermind. You convert to two's complement and then do you shifting according to the number from left to right instead of right to left.)
Now the tricky part comes in. We can only use 3 operators.
For example, multiplying by 60 I can accomplish by using this:
(x << 5) + (x << 4) + (x << 3) + (x << 2)
Where x is the number I am multiplying. But that is 7 operators - how can I condense this to use only 3?
It's called shift-and-add. Wikipedia has a good explanation of this:
http://en.wikipedia.org/wiki/Multiplication_algorithm#Shift_and_add
EDIT:
To answer your other question, yes converting to two's compliment will work. But you need to sign extend it long enough to hold the entire product. (assuming that's what you want)
EDIT2:
If both operands are negative, just two's compliment both of them from the start and you won't have to worry about this.
Here's an example of multiplying by 3:
unsigned int y = (x << 1) + (x << 0);
(where I'm assuming that x is also unsigned).
Hopefully you should be able to generalise this.
As far as I know, there is no easy way to multiply in general using just 3 operators.
Multiplying with 60 is possible, since 60 = 64 - 4: (x << 6) - (x << 2)
17 = 16 + 1 = (2^4) + (2^0). Therefore, shift your number left 4 bits (to multiply by 2^4 = 16), and add the original number to it.
Another way to look at it is: 17 is 10001 in binary (base 2), so you need a shift operation for each of the bits set in the multiplier (i.e. bits 4 and 0, as above).
I don't know C, so I won't embarrass myself by offering code.
Numbers that would work with using only 3 operators (a shift, plus or minus, and another shift) is limited, but way more than the 3, 17 and 60 mentioned above. If a number can be represented as (2^x) +/- (2^y) it can be done with only 3 operators.

optimized byte array shifter

I'm sure this has been asked before, but I need to implement a shift operator on a byte array of variable length size. I've looked around a bit but I have not found any standard way of doing it. I came up with an implementation which works, but I'm not sure how efficient it is. Does anyone know of a standard way to shift an array, or at least have any recommendation on how to boost the performance of my implementation;
char* baLeftShift(const char* array, size_t size, signed int displacement,char* result)
{
memcpy(result,array,size);
short shiftBuffer = 0;
char carryFlag = 0;
char* byte;
if(displacement > 0)
{
for(;displacement--;)
{
for(byte=&(result[size - 1]);((unsigned int)(byte))>=((unsigned int)(result));byte--)
{
shiftBuffer = *byte;
shiftBuffer <<= 1;
*byte = ((carryFlag) | ((char)(shiftBuffer)));
carryFlag = ((char*)(&shiftBuffer))[1];
}
}
}
else
{
unsigned int offset = ((unsigned int)(result)) + size;
displacement = -displacement;
for(;displacement--;)
{
for(byte=(char*)result;((unsigned int)(byte)) < offset;byte++)
{
shiftBuffer = *byte;
shiftBuffer <<= 7;
*byte = ((carryFlag) | ((char*)(&shiftBuffer))[1]);
carryFlag = ((char)(shiftBuffer));
}
}
}
return result;
}
If I can just add to what #dwelch is saying, you could try this.
Just move the bytes to their final locations. Then you are left with a shift count such as 3, for example, if each byte still needs to be left-shifted 3 bits into the next higher byte. (This assumes in your mind's eye the bytes are laid out in ascending order from right to left.)
Then rotate each byte to the left by 3. A lookup table might be faster than individually doing an actual rotate. Then, in each byte, the 3 bits to be shifted are now in the right-hand end of the byte.
Now make a mask M, which is (1<<3)-1, which is simply the low order 3 bits turned on.
Now, in order, from high order byte to low order byte, do this:
c[i] ^= M & (c[i] ^ c[i-1])
That will copy bits to c[i] from c[i-1] under the mask M.
For the last byte, just use a 0 in place of c[i-1].
For right shifts, same idea.
My first suggestion would be to eliminate the for loops around the displacement. You should be able to do the necessary shifts without the for(;displacement--;) loops. For displacements of magnitude greater than 7, things get a little trickier because your inner loop bounds will change and your source offset is no longer 1. i.e. your input buffer offset becomes magnitude / 8 and your shift becomes magnitude % 8.
It does look inefficient and perhaps this is what Nathan was referring to.
assuming a char is 8 bits where this code is running there are two things to do first move the whole bytes, for example if your input array is 0x00,0x00,0x12,0x34 and you shift left 8 bits then you get 0x00 0x12 0x34 0x00, there is no reason to do that in a loop 8 times one bit at a time. so start by shifting the whole chars in the array by (displacement>>3) locations and pad the holes created with zeros some sort of for(ra=(displacement>>3);ra>3)] = array[ra]; for(ra-=(displacement>>3);ra>(7-(displacement&7))). a good compiler will precompute (displacement>>3), displacement&7, 7-(displacement&7) and a good processor will have enough registers to keep all of those values. you might help the compiler by making separate variables for each of those items, but depending on the compiler and how you are using it it could make it worse too.
The bottom line though is time the code. perform a thousand 1 bit shifts then a thousand 2 bit shifts, etc time the whole thing, then try a different algorithm and time it the same way and see if the optimizations make a difference, make it better or worse. If you know ahead of time this code will only ever be used for single or less than 8 bit shifts adjust the timing test accordingly.
your use of the carry flag implies that you are aware that many processors have instructions specifically for chaining infinitely long shifts using the standard register length (for single bit at a time) rotate through carry basically. Which the C language does not support directly. for chaining single bit shifts you could consider assembler and likely outperform the C code. at least the single bit shifts are faster than C code can do. A hybrid of moving the bytes then if the number of bits to shift (displacement&7) is maybe less than 4 use the assembler else use a C loop. again the timing tests will tell you where the optimizations are.

Resources