I have a big unsigned integer stored in two variables (the high and low parts of the number):
unsigned long long int high;
unsigned long long int low;
I know how to add or subtract another number of that kind.
But I need to divide numbers of that kind. How do I do it? I know I can subtract N times, but maybe there are better solutions. ;-)
Yes. It will involve shifts, and I don't recommend doing that in C. This is one of those rare examples where assembler can still prove its value, easily making things run hundreds of times faster (And I don't think I'm exaggerating this.)
I don't claim total correctness, but the following should get you going :
(1) Initialize result to zero.
(2) Shift divisor as many bits as possible to the left, without letting it become greater than the dividend.
(3) Subtract shifted divisor from dividend and add one to result.
(4) Now shift divisor to the right until once again, it is less than the remaining dividend, and for each right-shift, left-shift result by one bit. Go back to (3) unless stopping condition is satisfied. (Stopping condition must be something like "divisor has become zero", but I'm not certain about that.)
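A minimal sketch of the shift-and-subtract idea for the two-variable representation from the question. The `u128` struct and all function names are assumptions made for illustration. Instead of pre-shifting the divisor, this variant brings down one dividend bit per iteration, so the stopping condition is simply "all 128 bits consumed". It is written for clarity rather than speed.

```c
#include <stdint.h>

/* Hypothetical 128-bit unsigned value built from two 64-bit halves. */
typedef struct { uint64_t high, low; } u128;

static int u128_ge(u128 a, u128 b)
{
    return a.high > b.high || (a.high == b.high && a.low >= b.low);
}

static u128 u128_sub(u128 a, u128 b)
{
    u128 r;
    r.low  = a.low - b.low;
    r.high = a.high - b.high - (a.low < b.low);  /* borrow from the low half */
    return r;
}

static u128 u128_shl1(u128 a)
{
    a.high = (a.high << 1) | (a.low >> 63);
    a.low <<= 1;
    return a;
}

/* Computes *q = n / d and *r = n % d. Returns -1 if d is zero, 0 otherwise. */
int u128_divmod(u128 n, u128 d, u128 *q, u128 *r)
{
    u128 quot = {0, 0}, rem = {0, 0};

    if (d.high == 0 && d.low == 0)
        return -1;

    for (int i = 127; i >= 0; i--) {
        /* Next dividend bit, most significant first. */
        uint64_t bit = (i >= 64) ? (n.high >> (i - 64)) & 1u
                                 : (n.low >> i) & 1u;

        /* rem never holds more bits than have been consumed so far,
           so it is below 2^127 here and the shift cannot overflow. */
        rem = u128_shl1(rem);
        rem.low |= bit;

        quot = u128_shl1(quot);
        if (u128_ge(rem, d)) {       /* divisor fits: subtract, set quotient bit */
            rem = u128_sub(rem, d);
            quot.low |= 1u;
        }
    }
    *q = quot;
    *r = rem;
    return 0;
}
```

The loop always runs 128 times; skipping the leading zero bits of the dividend would be an obvious refinement.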
Have you looked at any large-number libraries, such as GNU MP BigNum?

I know I can subtract N times, but maybe there are better solutions.
Subtracting N times may be slow when N is large.
Better (i.e. more complicated but faster) would be shift-and-subtract, using the algorithm you learned to do long division of decimal numbers in elementary school.
[There may also be 3rd-party library and/or compiler-specific support for such numbers.]

Hmm. I suppose if you have some headroom in "high", you could shift it all up one digit, divide high by the number, then add the remainder to the top remaining digit in low and divide low by the number, then shift everything back.

Here's another library doing 128 bit arithmetic. GnuCash: Math128.

Quickly, my new answer would be that when I've tried to do this in the past, it almost always involved shifting, because it's the only operation that can be applied across multiple "words", if you will, and have it look the same as if it were one large word (with the exception of having to track carryover bits).
There are a couple different approaches to it, but I don't know of any better general direction than using shifts, unless your hardware has some special operations.

You could implement a "BigInt" type algorithm that does divisions on string arrays. Create 1 string array for each high,low pair and do the division. Store the result in another string array, then convert back to high,low integer pair.
Since the language is C, the array would probably be a character array. Consider it analogous to the "string array" I was mentioning above.

You can do addition and subtraction of arbitrarily large binary objects using the assembler looping and "add/subtract with carry (adc/sbb)" instructions. You can implement the other operations using them. I've never investigated doing anything beyond those two personally.

If your processor (or your C library) has a fast 64-bit divide, you can break the 128-bit divide into pieces (the same way you'd do a 32-bit divide on processors that had 16-bit divisions).
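As an illustration of breaking the divide into pieces, here is a sketch for the simplest case only: a 128-bit dividend and a divisor that fits in 32 bits. The function name and the in-place interface are assumptions, not something from the question. Each step is one ordinary 64-bit division, exactly like long division by a single digit in base 2^32. A full 128-by-128 divide needs more work than this.

```c
#include <stdint.h>

/* Divides the 128-bit value high:low by d in place and returns the remainder.
   Assumption: d != 0 and d fits in 32 bits. */
uint32_t div128_by_u32(uint64_t *high, uint64_t *low, uint32_t d)
{
    /* Split the dividend into four base-2^32 digits, most significant first. */
    uint32_t digit[4] = {
        (uint32_t)(*high >> 32), (uint32_t)*high,
        (uint32_t)(*low >> 32),  (uint32_t)*low
    };
    uint64_t rem = 0;

    for (int i = 0; i < 4; i++) {
        /* rem < d < 2^32, so rem * 2^32 + digit fits in 64 bits
           and the partial quotient fits in 32 bits. */
        uint64_t cur = (rem << 32) | digit[i];
        digit[i] = (uint32_t)(cur / d);
        rem = cur % d;
    }

    *high = ((uint64_t)digit[0] << 32) | digit[1];
    *low  = ((uint64_t)digit[2] << 32) | digit[3];
    return (uint32_t)rem;
}
```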
By the way, there are all sorts of tricks you can use if you know what typical values will be for the dividend and divisor. What is the source of these numbers? If a lot of your cases can be solved quickly, it might be OK if the occasional case takes a long time.
Also, if you can find cases where an approximate answer is OK, that opens the door to a lot of speedy approximations.


Determine if a given integer number is element of the Fibonacci sequence in C without using float

I recently had an interview, which I failed; in the end I was told I did not have enough experience to work for them.
The position was embedded C software developer. The target platform was some kind of very simple 32-bit architecture whose processor does not support floating-point numbers or operations on them. Therefore double and float cannot be used.
The task was to develop a C routine for this architecture that takes one integer and returns whether or not it is a Fibonacci number. However, only an additional 1K of temporary memory may be used during execution. That means that even if I simulate very large integers, I can't just build up the sequence and iterate through it.
As far as I know, a positive integer n is a Fibonacci number exactly when one of
5n² + 4
5n² − 4
is a perfect square. Therefore I answered: it is simple, since the routine only has to determine whether or not that is the case.
They then responded: the target architecture supports no floating-point-like operations, so square roots cannot be obtained with the standard library's sqrt function. They also mentioned that basic operations like division and modulus may not work either because of the architecture's limitations.
Then I said: okay, we can build an array with the square numbers up to 256. Then we could iterate through it and compare the entries to the numbers given by the formulas above. They said this is a bad approach, even if it would work, so they did not accept that answer.
Finally I gave up, since I had no other ideas. I asked what the solution would be; they said they would not tell me, but advised me to look for it myself. My first approach (the two formulas) should be the key, but the square root has to be computed some other way.
I googled a lot at home, but never found any "alternative" square root algorithms. All of them were allowed to use floating-point numbers.
For operations like division and modulus, the so-called "integer-division" may be used. But what is to be used for square root?
Even if I failed the interview test, this is a very interesting topic for me, to work on architectures where no floating-point operations are allowed.
Therefore my questions:
How can floating-point numbers be simulated (if only integers may be used)?
What would be a possible solution in C for the problem mentioned above? Code examples are welcome.
The point of this type of interview is to see how you approach new problems. If you happen to already know the answer, that is undoubtedly to your credit but it doesn't really answer the question. What's interesting to the interviewer is watching you grapple with the issues.
For this reason, it is common that an interviewer will add additional constraints, trying to take you out of your comfort zone and seeing how you cope.
I think it's great that you knew that fact about recognising Fibonacci numbers. I wouldn't have known it without consulting Wikipedia. It's an interesting fact but does it actually help solve the problem?
Apparently, it would be necessary to compute 5n²±4, compute the square roots, and then verify that one of them is an integer. With access to a floating point implementation with sufficient precision, this would not be too complicated. But how much precision is that? If n can be an arbitrary 32-bit signed number, then n² is obviously not going to fit into 32 bits. In fact, 5n²+4 could be as big as 65 bits, not including a sign bit. That's far beyond the precision of a double (normally 53 bits) and even of a long double, if available. So computing the precise square root will be problematic.
Of course, we don't actually need a precise computation. We can start with an approximation, square it, and see if it is either four more or four less than 5n². And it's easy to see how to compute a good guess: it will be very close to n×√5. By using a good precomputed approximation of √5, we can easily do this computation without the need for floating point, without division, and without a sqrt function. (If the approximation isn't accurate, we might need to adjust the result up or down, but that's easy to do using the identity (n+1)² = n²+2n+1; once we have n², we can compute (n+1)² with only addition.)
We still need to solve the problem of precision, so we'll need some way of dealing with 66-bit integers. But we only need to implement addition and multiplication of positive integers, which is considerably simpler than a full-fledged bignum package. Indeed, if we can prove that our square root estimation is close enough, we could safely do the verification modulo 2³¹.
So the analytic solution can be made to work, but before diving into it, we should ask whether it's the best solution. One very common category of suboptimal programming is clinging desperately to the first idea you come up with even as its complications become increasingly evident. That will be one of the things the interviewer wants to know about you: how flexible are you when presented with new information or new requirements.
So what other ways are there to know if n is a Fibonacci number? One interesting fact is that if n is Fib(k), then k is the floor of logφ(n×√5 + 0.5). Since logφ is easily computed from log2, which in turn can be approximated by a simple bitwise operation, we could try finding an approximation of k and verifying it using the classic O(log k) recursion for computing Fib(k). None of the above involved numbers bigger than the capacity of a 32-bit signed type.
Even more simply, we could just run through the Fibonacci series in a loop, checking to see if we hit the target number. Only 47 loops are necessary. Alternatively, these 47 numbers could be precalculated and searched with binary search, using far less than the 1k bytes you are allowed.
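A minimal sketch of the loop approach, assuming the input is a 32-bit signed integer as discussed above. The function name is made up for illustration. It uses only addition and comparison, so it needs no floating point, division, or square root, and only a few bytes of temporary storage.

```c
#include <stdint.h>

/* Returns 1 if n is a Fibonacci number, 0 otherwise. */
int is_fibonacci(int32_t n)
{
    uint32_t a = 0, b = 1;   /* two consecutive Fibonacci numbers */

    if (n < 0)
        return 0;

    /* Walk the sequence until a reaches or passes n. The largest value a
       can take is Fib(47) = 2971215073, which still fits in 32 unsigned
       bits. b may wrap around on the very last step, but it is never
       used after that. */
    while (a < (uint32_t)n) {
        uint32_t next = a + b;
        a = b;
        b = next;
    }
    return a == (uint32_t)n;
}
```

For example, `is_fibonacci(13)` returns 1 and `is_fibonacci(4)` returns 0.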
It is unlikely an interviewer for a programming position would be testing for knowledge of a specific property of the Fibonacci sequence. Thus, unless they present the property to be tested, they are examining the candidate’s approaches to problems of this nature and their general knowledge of algorithms. Notably, the notion to iterate through a table of squares is a poor response on several fronts:
At a minimum, binary search should be the first thought for table look-up. Some calculated look-up approaches could also be proposed for discussion, such as using find-first-set-bit instruction to index into a table.
Hashing might be another idea worth considering, especially since an efficient customized hash might be constructed.
Once we have decided to use a table, it is likely a direct table of Fibonacci numbers would be more useful than a table of squares.

How does one divide a big integer by another big integer?

I've been researching this the last few days and I have been unable to come up with an answer. I have come up with one algorithm that works if the divisor is only one word. But if the divisor is multiple words, then I get some strange answers. I know this question has been asked a few times on here, but there has been no definitive answer except use the schoolbook method or go get a book on the subject. I have been able to get every function in my big integer library to work except division. It seems that some individuals think big integer division is an NP-hard problem, and with the trouble that I'm having with it, I'm inclined to agree.
The data is stored in a structure that contains a pointer to an array of either uint16_t or uint32_t based on if the long long data type is supported or not. If long long is not supported, then uint16_t is used for the capture of any carry/overflow from multiplication and addition operations. The current functions that I have are addition, subtraction, multiply, 2's complement negation, comparison, and, or, xor, not, shift left, shift right, rotate left, rotate right, bit reversal (reflection), a few conversion routines, a random number fill routine, and some other utility routines. All these work correctly (I checked the results on a calculator) except division.
typedef struct bn_data_t bn_t;
struct bn_data_t
{
    uint32 sz1;    /* Bit Size */
    uint32 sz8;    /* Byte Size */
    uint32 szw;    /* Word Count */
    bnint *dat;    /* Data Array */
    uint32 flags;  /* Operational Flags */
};
This is related to another question that I asked about inline assembler as this is what it was for.
What I have found so far:
Algorithm for dividing very large numbers
What is the fastest algorithm for division of crazy large integers?
Newton-Raphson Division With Big Integers
And a bunch of academic papers on the subject.
What I have tried so far:
I have a basic routine working, but it divides a multi-word big integer number by a single word. I have tried to implement a Newton-Raphson algorithm, but that's not working as I have gotten some really strange results. I know about Newton's method from Calculus on which it is based, but this is integer math and not floating point. I understand the math behind the Goldschmidt division algorithm, but I am not clear on how to implement it with integer math. Part of the problem with some of these algorithms is that they call for a base 2 logarithm function. I know how to implement a logarithm function using floating point and a Taylor series, but not when using integer math.
I have tried looking at the GMP library, but the division algorithm is not very well documented and it kinda goes over my head. It seems that they are using different algorithms at different points which adds to the confusion.
For the academic papers, I mostly understand the math (I have cleared basic calculus math, multi-variable calculus, and ordinary differential equations), but once again, there is a disconnect between my mathematical knowledge and implementation using integer math. I have seen the grade school method being suggested which from what I can ascertain is something similar to a shift-subtract method, but I'm not too sure how to implement that one either. Any ideas? Code would be nice.
This is for my own personal learning experience. I want to learn how it is done.
EDIT: 4-JUN-2016
It has been a while since I have worked on this as I had other irons in the fire and other projects to work on. Now that I have revisited this project, I have finally implemented big integer division using two different algorithms. The basic one is the shift-subtract method outlined here. The high speed algorithm which uses the CPU divide instruction is called only when the divisor is one word. Both algorithms have been confirmed to work properly as the results that they produce have been checked with an online big number calculator. So now, all basic math and logic functions have been implemented. Those functions include add, subtract, multiply, divide, divide with modulus, modulus, and, or, not, xor, negate, reverse (reflection), shift left, shift right, rotate left, and rotate right. I may add additional functions as their need comes up. Thank you to everyone who responded.
The schoolbook division (long-division) algorithm, commonly used for base-10 operands, can be used for arbitrarily large operands too. I will assume we are implementing the large numbers by array of digits in base B.
When we perform long-division manually for decimal operands, we usually depend on trial-and-error to find each quotient-digit d. But this trial-and-error can be replaced with an efficient method (due to D. A. Pope and M. L. Stein) when using long-division for large operands in base B.
To guess d, we can use the first digit (e) of the divisor and the first two digits (yz) of the "current remainder" (resulting from a subtraction step of long-division). Say d1 is the estimate for d obtained by dividing the number yz by e. It can be proved that, if the divisor has certain properties (which are always achievable, see the link below), either d1 or d1-1 or d1-2 must be the required digit d. Each of these three candidates can be checked for the desired properties of d one by one.
Thus finding each quotient-digit becomes efficient, and for the rest we can follow the iterative long-division process. Please refer to the article below (written by me) for details about this algorithm and its implementation in C:
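To make the estimate concrete, here is a toy sketch in base 10 using ordinary machine integers instead of digit arrays. It is not the implementation from the article; the function name and interface are assumptions. The "certain property" is taken here to be that the divisor's first digit is at least half the base (5 or more in base 10), and the estimate is capped at the largest digit, 9.

```c
/* Toy illustration in base 10.
   Assumptions: divisor has two digits with first digit >= 5,
   and rem < 10 * divisor, so the true quotient digit is 0..9.
   *corrections receives how many times the estimate was lowered. */
unsigned quotient_digit(unsigned rem, unsigned divisor, unsigned *corrections)
{
    unsigned e  = divisor / 10;   /* first digit of the divisor */
    unsigned yz = rem / 10;       /* first two digits of the remainder */
    unsigned d  = yz / e;         /* the estimate d1 */

    if (d > 9)
        d = 9;                    /* a digit cannot exceed B - 1 */

    *corrections = 0;
    while (d * divisor > rem) {   /* try d1, then d1-1, then d1-2 */
        d--;
        (*corrections)++;
    }
    return d;
}
```

For rem = 472 and divisor = 59, the estimate is 47 / 5 = 9; since 9 × 59 = 531 is too large, it is lowered once to 8, and 8 × 59 = 472 fits.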

C atmega2560 Division of large integers

So I'm wondering about the cost of division on an ATmega2560 as well as in general:
Let's say I got something like this
unsigned long long a=some-large-value;
unsigned long long b=some-other-large-value;
unsigned long result = (a - b) / A_CONSTANT;
// A_CONSTANT is e.g. 16
How long does it actually take? Are we speaking about hundreds or thousands of cycles? And does it make a difference if I change the division to a multiplication, i.e. like so
unsigned long result = (a - b) * 1 / A_CONSTANT;
I want to use that in a time-critical application for calculating a time span which is used for determining when to execute another part of the program. Assuming the division takes too much time, what other options do I have?
This really depends on your A_CONSTANT and how good the compiler is IMO.
I've looked up the chip and it's obviously an 8 bit processor with 8 or 16 MHz.
As such, I'd consider those unsigned long long integers to be the biggest hurdle, if your division is trivial.
For the division to be trivial, A_CONSTANT would have to be a power of two (like 2, 4, 8, 16, etc.). The compiler could then optimize the whole division into a simple right shift, which completes in far fewer cycles.
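The shift can also be written out by hand. This is a sketch under the assumption that A_CONSTANT is 16 (that is, 1 << 4); the function names are made up. The second variant is an extra assumption on top of that: it is only valid when the caller knows the difference fits in 32 bits, in which case only the low halves need to be touched.

```c
#include <stdint.h>

/* Assumption for this sketch: A_CONSTANT == 16 == 1 << 4. */
#define A_CONSTANT_SHIFT 4

/* Same result as (a - b) / 16 for unsigned operands. */
uint32_t span_div(uint64_t a, uint64_t b)
{
    return (uint32_t)((a - b) >> A_CONSTANT_SHIFT);
}

/* Only valid if a - b is known to be below 2^32: the low 32 bits of the
   difference depend only on the low 32 bits of a and b. */
uint32_t span_div_narrow(uint64_t a, uint64_t b)
{
    uint32_t diff = (uint32_t)((uint32_t)a - (uint32_t)b);
    return diff >> A_CONSTANT_SHIFT;
}
```

Whether either version actually beats what the compiler generates for the plain division is something to confirm in the generated assembly.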
Switching to a multiplication won't net you anything good. As written, (a-b)*1/A_CONSTANT is evaluated left to right, so it is still the same division. If you instead write (a-b)*(1/A_CONSTANT), the result would be 0 all the time, unless A_CONSTANT is 1 (since you're obviously doing an integer division, where the result is rounded down).
So what to do or whether to consider this something for optimization heavily depends on the actual value of A_CONSTANT.
Probably the easiest way solving this (or comparing solutions) would be comparing the resulting assembly code, because it will be the final result that's actually processed. Optimizing this purely on theory is rather complicated and might even get you wrong or misleading results.
The AVR instruction set doesn't have a divide instruction, so, as mentioned in the comments, it all comes down to how the compiler you are using implements this operation.
You might want to look at the generated machine instructions to see what is actually produced and think about possible optimisations.
There is a lot of information available on Google about different implementations of integer division, for example this
Also a very good source of information.

Using bitwise operations

How often do you use bitwise operation "hacks" to do some kind of optimization? In what kind of situations is it really useful?
Example: instead of using if:
if (data[c] >= 128) // in a loop
    sum += data[c];
you write:
int t = (data[c] - 128) >> 31;
sum += ~t & data[c];
Of course assuming it does the same intended result for this specific situation.
Is it worth it? I find it unreadable. How often do you come across
Note: I saw this code here, in the chosen answer: Why is processing a sorted array faster than an unsorted array?
While that code was an excellent way to show what's going on, I usually wouldn't use code like that. If it had to be fast, there are usually even faster solutions, such as using SSE on x86 or NEON on ARM. If none of that is available, sure, I'll use it, provided it helps and it's necessary.
By the way, I explain how it works in this answer
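A minimal sketch of why the trick from the question works, assuming every element is in the range 0..255. The function names are made up. The original snippet relies on right-shifting a negative int, which is implementation-defined in C, so this variant does the shift on an unsigned value instead: `data[c] - 128` is negative exactly when the element is below 128, so the top bit of its 32-bit image is a "too small" flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference version with a branch. */
long sum_with_branch(const int *data, size_t n)
{
    long sum = 0;
    for (size_t c = 0; c < n; c++) {
        if (data[c] >= 128)
            sum += data[c];
    }
    return sum;
}

/* Branch-free version. Assumes every data[c] is in 0..255. */
long sum_branchless(const int *data, size_t n)
{
    long sum = 0;
    for (size_t c = 0; c < n; c++) {
        /* 1 if data[c] < 128 (difference is negative), else 0. */
        uint32_t below = (uint32_t)(data[c] - 128) >> 31;
        /* 0 when below, all ones (0xFFFFFFFF) otherwise. */
        uint32_t mask = below - 1u;
        /* Adds data[c] when it is >= 128, adds 0 otherwise. */
        sum += (long)(mask & (uint32_t)data[c]);
    }
    return sum;
}
```

Both functions return the same sum; the second simply trades the comparison and branch for a subtraction, a shift, and an AND.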
Like Skylion, one thing I've used a lot is figuring out whether a number is a power of two. Think a while about how you'd do that.. then look at this: (x & (x - 1)) == 0 && x != 0
It's tricky the first time you see it, I suppose, but once you get used to it it's just so much simpler than any alternative that doesn't use bitmath. It works because subtracting 1 from a number means that the borrow starts at the rightmost end of the number and runs through all the zeroes, then stops at the first 1 which turns into a zero. ANDing that number with the original then makes the rightmost 1 zero. Powers of two only have one 1, which disappears, leaving zero. All other numbers will have at least one 1 left, except zero, which is a special case. A common variant doesn't test for zero, and is OK with treating it as power of two or knows that zero can't happen.
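Written as a function (the name is only an illustration), the test described above looks like this:

```c
#include <stdint.h>

/* Returns 1 if x has exactly one bit set, i.e. is a power of two. */
int is_power_of_two(uint32_t x)
{
    /* x & (x - 1) clears the lowest set bit; a power of two has no
       other bits left. Zero is excluded explicitly. */
    return x != 0 && (x & (x - 1)) == 0;
}
```

For example, 64 (binary 1000000) gives 64 & 63 == 0, while 96 (binary 1100000) gives 96 & 95 == 64, which is non-zero.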
Similarly there are other things that you can easily do with bitmath, but not so easy without. As they say, use the right tool for the job. Sometimes bitmath is the right tool.
Bitwise operations are so useful that Prof. Knuth wrote a book about them:
Just to mention a few simplest ones: int multiplication and division by a power of two (using left and right shift), mod with respect to a power of two, masking and so on. When using bitwise ops just be sure to provide sufficient comments about what's going on.
However, your example, data[c] >= 128, is not applicable IMO; just keep it that way.
But if you want to compute data[c] % 128 then data[c] & 0x7f is much faster (where & represents bitwise AND).
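The simple cases mentioned above can be sketched as follows, with made-up function names and unsigned operands. The unsigned restriction matters: for a negative signed value the two forms differ, since in C `-1 % 128` is -1 while `-1 & 0x7f` is 127.

```c
/* Multiplication and division by a power of two, and remainder
   with respect to a power of two, for unsigned values. */
unsigned times8(unsigned x) { return x << 3; }     /* x * 8   */
unsigned div8(unsigned x)   { return x >> 3; }     /* x / 8   */
unsigned mod128(unsigned x) { return x & 0x7Fu; }  /* x % 128 */
```

Modern compilers usually perform these substitutions themselves for unsigned operands when the constant is known, so writing the plain arithmetic form is often just as fast and easier to read.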
There are several instances where using such hacks may be useful. For instance, in Java they can sidestep branch prediction by removing the branch altogether. I have found them useful only in a few cases. The main one is multiplying by -1. If you are doing it hundreds of times across a massive array, it is more efficient to simply flip the first bit than to actually multiply (this works for sign-magnitude formats such as IEEE 754 floating point; two's-complement integers need more than a sign-bit flip). Another example where I have used it is to know if a number is a power of 2 (since it's so easy to figure out in binary). Basically, bit hacks are useful when you want to cheat. Here is a human analogy. If you have a list of two-digit numbers and you need to know if they are greater than 29, you can tell immediately: if the first digit is 3 or larger, then the whole number is at least 30, and vice versa. Bitwise operations simply allow you to perform similar cheats in binary.

What is a convenient base for a bignum library & primality testing algorithm?

I am to program the Solovay-Strassen primality test presented in the original paper on RSA.
Additionally I will need to write a small bignum library, and so when searching for a convenient representation for bignum I came across this specification:
struct {
int sign;
int size;
int *tab;
} bignum;
I will also be writing a multiplication routine using the Karatsuba method.
So, for my question:
What base would be convenient to store integer data in the bignum struct?
Note: I am not allowed to use third party or built-in implementations for bignum such as GMP.
Thank you.
A power of 2.
For a simple implementation, probably half the size of a word on your machine, so that you can multiply two digits without overflow. So 65536 or 4294967296. Or possibly half the size of the largest integer type, for the same reason but maybe better performance over all.
But I've never actually implemented such a library: if you're using best known algorithms then you won't be doing school-style long multiplication. Karatsuba multiplication (and whatever other clever tricks you use) might benefit from being done in an integer that's more than twice the size of the digits, I really don't know how the performance works out. If so, then you'd be best off using 256 and 32 bit arithmetic, or 65536 and 64 bit arithmetic.
In any case if your representation is binary, then you can pick and choose larger power-of-two bases as convenient for each operation. For instance, you could treat the data as base 2^16 for multiplication, but base 2^32 for addition. It's all the same thing provided you're careful about endian-ness. I'd probably start with base 2^16 (since that forces me to get the endian-ness right to begin with, while 2^8 wouldn't), and see how I get on - as each operation is optimised, part of the optimisation is to identify the best base.
Using a size which isn't a multiple of bytes is a possibility, but then you have to use the same base for everything, because there are unused bits in the storage in specific places according to the base.
You will be doing the following operation a whole lot:
Either choose 1/4 the largest word size, or 1/2 the largest word size less a bit or two. That would be either 2^16 or 2^30 for 64 bit systems and 2^8 or 2^14 for 32 bit systems. Use the largest size the compiler supports, not the hardware.
If you choose 2^31 on a 64 bit system, that means you can add 4 products without overflow. If you choose 2^30 then you can add 16 products without overflow. The more you can add without overflow, the larger interim blocks you can use.
If you choose 1/4 the word size you will still have a native type so it will be easier to store results back out. You can pretty much ignore overflow too. This will basically make writing code faster and less error prone, and is slightly more memory efficient. I would recommend this unless you like lots of bit manipulation along with your math.
Choosing a larger base will make the big O numbers look better. In practice, while it would probably be faster to have a larger base, it will not be the 4x speed bump that you might hope for.
The base you use should be a power of 2. Since it looks like you're going to keep track of sign separately, you can use unsigned ints for storing the numbers themselves. You're going to need the ability to multiply 2 pieces/digits/units of these numbers at a time, so the size must be no more than half the word size you've got available. i.e. on x86 an unsigned int is 32 bits, so you'd want your digits to be not more than 16 bits. You may also use "long long" for the intermediate results of products of unsigned ints. Then you're looking at 2^32 for your base. One last thing to consider is that you may want to add sums of products, which will overflow unless you use fewer bits.
If performance is not a major concern, I'd just use base 256 and call it a day. You may want to use typedefs and defined constants so you can later change these parameters easily.
The integers in the tab array should be unsigned. They should be the largest possible size (base) that you can multiply and still represent the product. If your compiler/processor supports 64 bit unsigned long long, for example, you might use uint32_t for the array of "digits." If your compiler/processor can only natively produce 32 bit products, you should use uint16_t.
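A short sketch of why the product must fit, assuming uint32_t digits stored least significant first and a 64-bit intermediate. The function name and the in-place interface are assumptions for illustration. Multiplying the whole number by one digit shows the pattern that a full multiplication repeats for every digit of the other operand.

```c
#include <stddef.h>
#include <stdint.h>

/* Multiplies the n-digit number a (base 2^32, least significant digit
   first) by the single digit m in place. Returns the carry-out digit. */
uint32_t bn_mul_digit(uint32_t *a, size_t n, uint32_t m)
{
    uint64_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* Largest possible value: (2^32 - 1)^2 + (2^32 - 1) < 2^64,
           so the 64-bit intermediate cannot overflow. */
        uint64_t t = (uint64_t)a[i] * m + carry;
        a[i]  = (uint32_t)t;   /* low half is the new digit */
        carry = t >> 32;       /* high half carries into the next digit */
    }
    return (uint32_t)carry;
}
```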
When you sum two arrays you will need to deal with overflow; in assembly this is easy. In C you may opt to use one less bit (31 or 15) to make the overflow detection easier.
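A sketch of the "one less bit" idea, assuming 31-bit digits stored in uint32_t, least significant first. The function name is hypothetical. Because each digit is below 2^31, the sum of two digits plus a carry always fits in 32 bits, and the carry is simply bit 31 of that sum.

```c
#include <stddef.h>
#include <stdint.h>

/* Adds two n-digit numbers in base 2^31 (digits are 0 .. 2^31 - 1,
   least significant first) into r. Returns the carry out (0 or 1). */
uint32_t bn_add31(uint32_t *r, const uint32_t *a, const uint32_t *b, size_t n)
{
    uint32_t carry = 0;

    for (size_t i = 0; i < n; i++) {
        /* At most 2 * (2^31 - 1) + 1 = 2^32 - 1: no 32-bit overflow. */
        uint32_t s = a[i] + b[i] + carry;
        r[i]  = s & 0x7FFFFFFFu;   /* low 31 bits are the digit */
        carry = s >> 31;           /* bit 31 is the carry */
    }
    return carry;
}
```

The price is that the number is no longer a plain binary image in memory, so conversion and shifting routines need to know about the unused bit.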
Also consider endianness, and the effect it and the algorithm will have on cache behavior.
