Using bitwise operations - c

How often do you use bitwise operation "hacks" to do some kind of
optimization? In what kinds of situations are they really useful?
Example: instead of using if:
if (data[c] >= 128) //in a loop
sum += data[c];
you write:
int t = (data[c] - 128) >> 31;
sum += ~t & data[c];
Of course, this assumes it produces the same intended result in this specific situation.
Is it worth it? I find it unreadable. How often do you come across
this?
Note: I saw this code in the chosen answers to: Why is processing a sorted array faster than an unsorted array?

While that code was an excellent way to show what's going on, I usually wouldn't use code like that. If it had to be fast, there are usually even faster solutions, such as using SSE on x86 or NEON on ARM. If none of that is available, sure, I'll use it, provided it helps and it's necessary.
By the way, I explain how it works in this answer
Like Skylion, one thing I've used a lot is figuring out whether a number is a power of two. Think a while about how you'd do that... then look at this: (x & (x - 1)) == 0 && x != 0
It's tricky the first time you see it, I suppose, but once you get used to it, it's just so much simpler than any alternative that doesn't use bitmath. It works because subtracting 1 from a number means that the borrow starts at the rightmost end of the number, runs through all the zeroes, and stops at the first 1, which turns into a zero. ANDing that number with the original then makes the rightmost 1 zero. Powers of two have only one 1, which disappears, leaving zero. All other numbers will have at least one 1 left, except zero, which is a special case. A common variant doesn't test for zero and is either OK with treating it as a power of two or knows that zero can't happen.
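A minimal sketch of that test as a small C helper:

#include <stdbool.h>
#include <stdint.h>

/* Power-of-two test described above: x != 0 handles the special case,
   and x & (x - 1) clears the lowest set bit. */
static bool is_power_of_two(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}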
Similarly there are other things that you can easily do with bitmath, but not so easy without. As they say, use the right tool for the job. Sometimes bitmath is the right tool.

Bitwise operations are so useful that Prof. Knuth wrote a book about them: http://www.amazon.com/The-Computer-Programming-Volume-Fascicle/dp/0321580508
Just to mention a few of the simplest ones: integer multiplication and division by a power of two (using left and right shifts), modulo with respect to a power of two, masking, and so on. When using bitwise ops, just be sure to provide sufficient comments about what's going on.
However, your example, data[c] >= 128, is not a case where this applies, IMO; just keep it as it is.
But if you want to compute data[c] % 128, then data[c] & 0x7f is much faster (where & represents bitwise AND), provided data[c] is non-negative.
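A minimal sketch of those basics; n is deliberately unsigned, since shifts and the %-versus-& equivalence need more care for signed negative values:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t n = 1000;
    printf("%u\n", n << 3);    /* n * 8   */
    printf("%u\n", n >> 2);    /* n / 4   */
    printf("%u\n", n & 0x7f);  /* n % 128 */
    return 0;
}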

There are several instances where using such hacks may be useful. For instance, they can sidestep some Java Virtual Machine "optimizations" such as branch predictors. I have found them useful only in a few cases. The main one is multiplying by -1: if you are doing it hundreds of times across a massive array, it is more efficient to simply flip the sign bit than to actually multiply. Another example where I have used them is checking whether a number is a power of 2 (since that is so easy to figure out in binary). Basically, bit hacks are useful when you want to cheat. Here is a human analogy: if you have a list of two-digit numbers and need to know which ones are greater than 29, you immediately know that any number whose first digit is 3 or more is at least 30, and vice versa. Bitwise operations simply let you perform similar cheats in binary.
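The sign-flip trick applies literally to IEEE-754 floating-point values, where negation is exactly a toggle of the sign bit. A minimal C sketch (the Java equivalent would go through Float.floatToRawIntBits):

#include <stdint.h>
#include <string.h>

/* Negate a float by flipping its IEEE-754 sign bit; memcpy avoids
   strict-aliasing problems. */
static float negate_float(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits ^= 0x80000000u;           /* toggle the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}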

Related

Determine if a given integer number is element of the Fibonacci sequence in C without using float

I recently had an interview, which I failed; I was told in the end that I did not have enough experience to work for them.
The position was embedded C software developer. The target platform was some kind of very simple 32-bit architecture whose processor does not support floating-point numbers or operations on them. Therefore double and float cannot be used.
The task was to develop a C routine for this architecture that takes one integer and returns whether or not it is a Fibonacci number. However, only an additional 1K of temporary memory may be used during execution. That means that even if I simulated very large integers, I couldn't just build up the sequence and iterate through it.
As far as I know, a positive integer n is a Fibonacci number exactly when one of
5n² + 4
or
5n² − 4
is a perfect square. Therefore I answered: it is simple, since the routine only has to determine whether that is the case.
They responded: on the target architecture no floating-point-like operations are supported, therefore no square roots can be computed with the stdlib's sqrt function. It was also mentioned that basic operations like division and modulus may not work either because of the architecture's limitations.
Then I said, okay, we could build an array of the square numbers up to 256, iterate through it, and compare the entries to the numbers given by the formulas above. They said this was a bad approach, even if it would work, so they did not accept that answer.
Finally I gave up, since I had no other ideas. I asked what the solution would be; they said it wouldn't be told, but advised me to try to find it myself. My first approach (the two formulas) should be the key, but the square root has to be computed some other way.
I googled a lot at home but never found any "alternative" square root algorithms; everywhere the use of floating-point numbers was assumed.
For operations like division and modulus, the so-called "integer-division" may be used. But what is to be used for square root?
Even though I failed the interview test, this is a very interesting topic for me: working on architectures where no floating-point operations are allowed.
Therefore my questions:
How can floating-point numbers be simulated (if only integers may be used)?
What would be a possible solution in C for the problem described above? Code examples are welcome.
The point of this type of interview is to see how you approach new problems. If you happen to already know the answer, that is undoubtedly to your credit but it doesn't really answer the question. What's interesting to the interviewer is watching you grapple with the issues.
For this reason, it is common that an interviewer will add additional constraints, trying to take you out of your comfort zone and seeing how you cope.
I think it's great that you knew that fact about recognising Fibonacci numbers. I wouldn't have known it without consulting Wikipedia. It's an interesting fact but does it actually help solve the problem?
Apparently, it would be necessary to compute 5n²±4, compute the square roots, and then verify that one of them is an integer. With access to a floating point implementation with sufficient precision, this would not be too complicated. But how much precision is that? If n can be an arbitrary 32-bit signed number, then n² is obviously not going to fit into 32 bits. In fact, 5n²+4 could be as big as 65 bits, not including a sign bit. That's far beyond the precision of a double (normally 52 bits) and even of a long double, if available. So computing the precise square root will be problematic.
Of course, we don't actually need a precise computation. We can start with an approximation, square it, and see if it is either four more or four less than 5n². And it's easy to see how to compute a good guess: it will be very close to n×√5. By using a good precomputed approximation of √5, we can easily do this computation without the need for floating point, without division, and without a sqrt function. (If the approximation isn't accurate, we might need to adjust the result up or down, but that's easy to do using the identity (n+1)² = n²+2n+1; once we have n², we can compute (n+1)² with only addition.)
We still need to solve the problem of precision, so we'll need some way of dealing with 66-bit integers. But we only need to implement addition and multiplication of positive integers, which is considerably simpler than a full-fledged bignum package. Indeed, if we can prove that our square root estimate is close enough, we could safely do the verification modulo 2³¹.
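For the square-root step itself, here is a hedged sketch of the classic bit-by-bit integer square root, which needs only shifts, additions, and comparisons. It handles 64-bit inputs, so the 65/66-bit values discussed above would still have to be reduced or handled in pieces:

#include <stdint.h>

/* Returns floor(sqrt(n)) using only shifts, adds, and compares. */
static uint32_t isqrt64(uint64_t n)
{
    uint64_t root = 0;
    uint64_t bit = 1ULL << 62;   /* highest power of four that fits in 64 bits */

    while (bit > n)
        bit >>= 2;

    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return (uint32_t)root;
}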
So the analytic solution can be made to work, but before diving into it, we should ask whether it's the best solution. One very common category of suboptimal programming is clinging desperately to the first idea you come up with even as its complications become increasingly evident. That will be one of the things the interviewer wants to know about you: how flexible are you when presented with new information or new requirements?
So what other ways are there to know whether n is a Fibonacci number? One interesting fact is that if n is Fib(k), then k is the floor of logφ(n×√5 + 0.5). Since logφ is easily computed from log2, which in turn can be approximated by a simple bitwise operation, we could try finding an approximation of k and verifying it using the classic O(log k) recursion for computing Fib(k). None of the above involves numbers bigger than the capacity of a 32-bit signed type.
Even more simply, we could just run through the Fibonacci series in a loop, checking to see if we hit the target number. Only 47 loops are necessary. Alternatively, these 47 numbers could be precalculated and searched with binary search, using far less than the 1k bytes you are allowed.
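A minimal sketch of that loop (the names are mine, not the interviewer's); the running pair is kept in 64-bit variables only so the addition cannot overflow near the top of the 32-bit range:

#include <stdbool.h>
#include <stdint.h>

/* Walk the Fibonacci sequence until we reach or pass n; at most 47 steps
   are needed for a 32-bit input. */
static bool is_fibonacci(uint32_t n)
{
    uint64_t a = 0, b = 1;
    while (b < n) {
        uint64_t next = a + b;
        a = b;
        b = next;
    }
    return n == 0 || n == b;
}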
It is unlikely an interviewer for a programming position would be testing for knowledge of a specific property of the Fibonacci sequence. Thus, unless they present the property to be tested, they are examining the candidate’s approaches to problems of this nature and their general knowledge of algorithms. Notably, the notion to iterate through a table of squares is a poor response on several fronts:
At a minimum, binary search should be the first thought for table look-up. Some calculated look-up approaches could also be proposed for discussion, such as using a find-first-set-bit instruction to index into a table.
Hashing might be another idea worth considering, especially since an efficient customized hash might be constructed.
Once we have decided to use a table, it is likely a direct table of Fibonacci numbers would be more useful than a table of squares.
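As a hedged sketch of that direct table (again, the names are mine): the 48 Fibonacci numbers that fit in 32 bits take fewer than 200 bytes, well under the 1K budget, and can be generated once and binary-searched.

#include <stdbool.h>
#include <stdint.h>

#define FIB_COUNT 48   /* Fib(0) .. Fib(47) all fit in 32 bits */

static uint32_t fib_table[FIB_COUNT];

static void init_fib_table(void)
{
    fib_table[0] = 0;
    fib_table[1] = 1;
    for (int i = 2; i < FIB_COUNT; i++)
        fib_table[i] = fib_table[i - 1] + fib_table[i - 2];
}

/* Standard binary search for membership. */
static bool is_fib_lookup(uint32_t n)
{
    int lo = 0, hi = FIB_COUNT - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (fib_table[mid] == n)
            return true;
        if (fib_table[mid] < n)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return false;
}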

Performing bit level permutations on a quadword

I'm looking for the fastest possible way to permute bits in a 64-bit integer.
Given a table called "array" representing a permutation, meaning it has a size of 64 and is filled with unique numbers (i.e. no repetition) ranging from 0 to 63, each corresponding to a bit position in a 64-bit integer, I can permute bits this way:
bit = GetBitAtPos(integer_, array[i]);
SetBitAtPos(integer_, array[i], GetBitAtPos(integer_, i));
SetBitAtPos(integer_, i, bit);
(by looping i from 0 to 63)
GetBitAtPos being
GetBitAtPos(integer_, pos) { return (integer_ >> pos) & 1; }
SetBitAtPos is based on the same principle (i.e. using C operators),
in the form SetBitAtPos(integer_, position, bool_bit_value).
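Written out in plain C, the straightforward loop might look like the following hedged sketch. It builds the result in a separate variable so every source bit is read before anything is overwritten; perm[i] is taken here to be the source position of the bit that ends up at position i:

#include <stdint.h>

static uint64_t permute_bits(uint64_t value, const int perm[64])
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t bit = (value >> perm[i]) & 1ULL;  /* fetch bit perm[i] */
        result |= bit << i;                        /* place it at position i */
    }
    return result;
}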
I was looking for a faster way, if possible, to perform this task. I'm open to any solution, including inline assembly if necessary. I have difficulty figuring out a better way than this, so I thought I'd ask.
I'd like to perform such a task to hide data in a 64-bit generated integer (where the first 4 bits can reveal information). It's a bit better than, say, an XOR mask IMO (unless I'm missing something), mostly if someone tries to find a correlation.
It also permits the inverse operation, so as not to lose the precious bits...
However I find the operation to be a bit costly...
Thanks
Since the permutation is constant, you should be able to come up with a better way than moving the bits one by one (if you're OK with publishing your secret permutation, I can have a go at it). The simplest improvement is moving bits that have the same distance (that can be a modular distance, because you can use rotates) between them in the input and output at the same time. This is a very good method if there are few such groups.
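For instance, with a made-up permutation in which bits 0-3 all move 8 places left and bits 8-11 all move 4 places right, each group costs one mask and one shift instead of four single-bit moves (a hedged sketch, not your actual permutation):

#include <stdint.h>

static uint64_t permute_two_groups(uint64_t v)
{
    uint64_t r = 0;
    r |= (v & 0x000000000000000FULL) << 8;  /* bits 0..3  -> bits 8..11 */
    r |= (v & 0x0000000000000F00ULL) >> 4;  /* bits 8..11 -> bits 4..7  */
    /* ...remaining groups of the real permutation would follow... */
    return r;
}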
If that didn't work out as well as you'd hoped, see if you can use bit_permute_steps to move all or most of the bits. See the rest of that site for more ideas.
If you can use PDEP and PEXT, you can move bits in groups where the distance between bits can arbitrarily change (but their order can not). It is, afaik, unknown how fast they will be though (and they're not available yet).
The best method is probably going to be a combination of these and other tricks mentioned in other answers.
There are too many possibilities to explore them all, really, so you're probably not going to find the best way to do the permutation, but using these ideas (and the others that were posted) you can doubtlessly find a better way than what you're currently using.
PDEP and PEXT have been available for a while now so their performance is known, at 3 cycle latency and 1/cycle throughput they're faster than most other useful permutation primitives (except trivial ones).
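When they are available, a hedged sketch of how one PEXT/PDEP pair moves an order-preserving group of bits (the masks would come from your permutation; compile with BMI2 support, e.g. -mbmi2):

#include <stdint.h>
#include <immintrin.h>   /* BMI2 intrinsics */

/* PEXT gathers the bits selected by src_mask into the low bits (order kept);
   PDEP scatters them back out to the positions selected by dst_mask. */
static uint64_t move_bit_group(uint64_t value, uint64_t src_mask, uint64_t dst_mask)
{
    return _pdep_u64(_pext_u64(value, src_mask), dst_mask);
}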
Split your bits into subsets where this method works:
Extracting bits with a single multiplication
Then combine the results using bitwise OR.
For a 64-bit number I believe the problem (of finding the best algorithm) may be unsolvable due to the huge number of possibilities. One of the most scalable and easiest to automate would be a look-up table:
result = LUT0[ value        & 0xff] +
         LUT1[(value >>  8) & 0xff] +
         LUT2[(value >> 16) & 0xff] + ...
         + LUT7[(value >> 56) & 0xff];
Each LUT entry must be 64-bit wide and it just spreads each 8 bits in a subgroup to the full range of 64 possible bins. This configuration uses 16k of memory.
The scalability comes from the fact that one can use any number of look up tables (practical range from 3 to 32?). This method is vulnerable to cache misses and it can't be parallelized (for large table sizes at least).
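A hedged sketch of how such tables could be generated from the permutation; dest_of[s] is assumed here to give the destination position of source bit s, which is my naming, not the question's:

#include <stdint.h>

static uint64_t lut[8][256];

/* Build the eight 256-entry tables once, 16 KB in total. */
static void build_luts(const int dest_of[64])
{
    for (int byte = 0; byte < 8; byte++) {
        for (int pattern = 0; pattern < 256; pattern++) {
            uint64_t out = 0;
            for (int b = 0; b < 8; b++) {
                if (pattern & (1 << b))
                    out |= 1ULL << dest_of[byte * 8 + b];
            }
            lut[byte][pattern] = out;
        }
    }
}

static uint64_t permute_with_luts(uint64_t value)
{
    uint64_t result = 0;
    for (int byte = 0; byte < 8; byte++)
        result |= lut[byte][(value >> (byte * 8)) & 0xff];
    return result;
}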
If there are certain symmetries, there are some clever tricks available --
e.g. swapping two bits in Intel:
test eax, (1<<BIT0 | 1<<BIT1)   ; parity is even when the two bits are equal (PF reflects only the low 8 bits)
jpe skip                        ; equal bits: swapping them would change nothing
xor eax, (1<<BIT0 | 1<<BIT1)    ; different bits: flipping both swaps them
skip:
This OTOH is highly vulnerable to branch mispredictions.

Practical applications of bit shifting

I totally understand how to shift bits. I've worked through numerous examples on paper and in code and don't need any help there.
I'm trying to come up with some real world examples of how bit shifting is used. Here are some examples I've been able to come up with:
Perhaps the most important example I could conceptualize had to do with endianness. In big-endian systems, the most significant byte is stored first (at the lowest address), and in little-endian systems, the least significant byte is stored first. I imagine that for files and network transmissions between systems which use opposite endian strategies, certain conversions must be made.
It seems certain optimizations could be made by compilers and processors when dealing with multiplications by powers of two (n*2, n*4, etc.): the bits are just shifted to the left. (Conversely, I suppose the same would apply for division, n/2, n/4, etc.)
In encryption algorithms, i.e. using a series of bit shifts, reversals, and combinations to obfuscate something.
Are all of these accurate examples? Is there anything you would add? I've spent quite a bit of time learning about how to implement bit shifting / reordering / byte swapping and I want to know how it can be practically applied = )
I would not agree that the most important example is endianness but it is useful. Your examples are valid.
Hash functions often use bitshifts as a way to get a chaotic behavior; not dissimilar to your cryptographic algorithms.
One common use is to use an int/long as a series of flag values that can be checked, set, and cleared by bitwise operators.
Not really widely used, but in (some) chess games the board and moves are represented with 64 bit integer values (called bitboards) so evaluating legal moves, making moves, etc. is done with bitwise operators. Lots of explanations of this on the net, but this one seems like a pretty good explanation: http://www.frayn.net/beowulf/theory.html#bitboards.
And finally, you might find that you need to count the number of bits that are set in an int/long, in some technical interviews!
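A minimal sketch of the classic interview answer, Kernighan's trick: x & (x - 1) clears the lowest set bit, so the loop runs once per set bit.

#include <stdint.h>

static int popcount32(uint32_t x)
{
    int count = 0;
    while (x) {
        x &= x - 1;   /* clear the lowest set bit */
        count++;
    }
    return count;
}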
The most common example of bitwise shift usage I know is for setting and clearing bits.
uint8_t bla = INIT_VALUE;
bla |= (1U << N); // Set N-th bit
bla &= ~(1U << N); // Clear N-th bit
Quick multiplication and division by a power of 2 - Especially important in embedded applications
CRC computation - Handy for networks, e.g. Ethernet (a bit-at-a-time sketch follows below)
Mathematical calculations that require very large numbers
Just a couple off the top of my head
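For the CRC item above, a hedged bit-at-a-time sketch of the reflected CRC-32 (polynomial 0xEDB88320) used by Ethernet and zlib; real implementations are usually table-driven, but this shows the shifting at work:

#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_bitwise(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}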

In C, would !~b ever be faster than b == 0xff?

From a long time ago I have a memory which has stuck with me that says comparisons against zero are faster than any other value (ahem Z80).
In some C code I'm writing I want to skip values which have all their bits set. Currently the type of these values is char but may change. I have two different alternatives to perform the test:
if (!~b)
/* skip */
and
if (b == 0xff)
/* skip */
Apart from the latter making the assumption that b is an 8-bit char whereas the former does not, would the former ever be faster due to the old compare-to-zero optimization trick, or are the CPUs of today way beyond this kind of thing?
If it is faster, the compiler will substitute it for you.
In general, you can't write C better than the compiler can optimize it. And it is architecture specific anyway.
In short, don't worry about it unless that sub-micro-nano-second is ultra important
From what I recall in my architecture classes, I believe they should be equally fast. Both have 2 instructions.
First example
1. Bitwise-NOT b into a temp register
2. Compare temp register equal 0
Second example
1. Subtract 0xff from b into a temp register
2. Compare temp register equal to 0
These are basically identical, and besides, even if your particular architecture requires more or less than this, is it really worth the fraction of a nanosecond? Several minutes have been spent just answering this question.
I would say it's not so much that the CPUs are beyond these kinds of tricks as that the compilers are.
The CPUs of today are, however, beyond simple tricks which pull an extra clock-tick or two of speed. Even if you do this 100,000 times a second, we are still only talking about an increase in speed of 0.00003 seconds on a single-core 3Ghz computer - it is simply not worth your time to worry about things like this.
Go with the one that will be easier for the person who is maintaining your code to understand. If you have a successful product, most of the expense in software is in maintenance. If you write cryptic code you add to that expense. If you don't have a successful product, it doesn't matter because no one will have to maintain it. I have been in situations where I had to save every byte I could, and had to resort to tricks like the one you gave, but I only do it as the very very very last resort.

How to divide big numbers?

I have a big number (integer, unsigned) stored in 2 variables (as you can see, the high and low parts of the number):
unsigned long long int high;
unsigned long long int low;
I know how to add or subtract another number of this kind.
But I need to divide such numbers. How do I do it? I know I can subtract N times, but maybe there are better solutions. ;-)
Language: C
Yes. It will involve shifts, and I don't recommend doing that in C. This is one of those rare examples where assembler can still prove its value, easily making things run hundreds of times faster (and I don't think I'm exaggerating this).
I don't claim total correctness, but the following should get you going:
(1) Initialize result to zero.
(2) Shift divisor as many bits as possible to the left, without letting it become greater than the dividend.
(3) Subtract shifted divisor from dividend and add one to result.
(4) Now shift divisor to the right until once again, it is less than the remaining dividend, and for each right-shift, left-shift result by one bit. Go back to (3) unless stopping condition is satisfied. (Stopping condition must be something like "divisor has become zero", but I'm not certain about that.)
It really feels great to get back to some REAL programming problems :-)
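A hedged C sketch of the shift-and-subtract idea for the question's high/low representation, in the standard most-significant-bit-first ("restoring") formulation rather than the exact steps above; it assumes the divisor fits in 64 bits and is nonzero:

#include <stdint.h>

typedef struct { uint64_t high, low; } u128;   /* the question's high/low pair */

/* One dividend bit per iteration, exactly like schoolbook long division in base 2. */
static u128 div_u128_by_u64(u128 n, uint64_t d, uint64_t *remainder)
{
    u128 q = { 0, 0 };
    uint64_t r = 0;

    for (int i = 127; i >= 0; i--) {
        uint64_t bit = (i >= 64) ? (n.high >> (i - 64)) & 1u
                                 : (n.low >> i) & 1u;
        int carry = (int)(r >> 63);        /* bit shifted out of the remainder */
        r = (r << 1) | bit;
        if (carry || r >= d) {
            r -= d;
            if (i >= 64)
                q.high |= 1ULL << (i - 64);
            else
                q.low |= 1ULL << i;
        }
    }
    if (remainder)
        *remainder = r;
    return q;
}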
Have you looked at any large-number libraries, such as GNU MP BigNum?
I know I can subtract N times, but maybe there are better solutions.
Subtracting N times may be slow when N is large.
Better (i.e. more complicated but faster) would be shift-and-subtract, using the algorithm you learned to do long division of decimal numbers in elementary school.
[There may also be 3rd-party library and/or compiler-specific support for such numbers.]
Hmm. I suppose if you have some headroom in "high", you could shift it all up one digit, divide high by the number, then add the remainder to the top remaining digit in low and divide low by the number, then shift everything back.
Here's another library doing 128 bit arithmetic. GnuCash: Math128.
Per my commenters below, my previous answer was stupid.
Quickly, my new answer would be that when I've tried to do this in the past, it almost always involved shifting, because it's the only operation that can be applied across multiple "words", if you will, and have it look the same as if it were one large word (with the exception of having to track carryover bits).
There are a couple different approaches to it, but I don't know of any better general direction than using shifts, unless your hardware has some special operations.
You could implement a "BigInt" type algorithm that does divisions on string arrays. Create 1 string array for each high,low pair and do the division. Store the result in another string array, then convert back to high,low integer pair.
Since the language is C, the array would probably be a character array. Consider it analogous to the "string array" I was mentioning above.
You can do addition and subtraction of arbitrarily large binary objects using the assembler looping and "add/subtract with carry (adc/sbb)" instructions. You can implement the other operations using them. I've never investigated doing anything beyond those two personally.
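In C rather than assembler, the same add-with-carry idea looks roughly like this hedged sketch (subtraction is analogous, borrowing instead of carrying):

#include <stdint.h>

typedef struct { uint64_t high, low; } u128;

/* Propagate the carry out of the low word by hand, the way adc does. */
static u128 add_u128(u128 a, u128 b)
{
    u128 r;
    r.low = a.low + b.low;
    r.high = a.high + b.high + (r.low < a.low);  /* carry from the low word */
    return r;
}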
If your processor (or your C library) has a fast 64-bit divide, you can break the 128-bit divide into pieces (the same way you'd do a 32-bit divide on processors that had 16-bit divisions).
By the way, there are all sorts of tricks you can use if you know what typical values the dividend and divisor will have. What is the source of these numbers? If a lot of your cases can be solved quickly, it might be OK if the occasional case takes a long time.
Also, if you can find cases where an approximate answer is OK, that opens the door to a lot of speedy approximations.

Resources