Miller–Rabin primality test accuracy - C

I know the Miller–Rabin primality test is probabilistic. However I want to use it for a programming task that leaves no room for error.
Can we assume that it is correct with very high probability if the input numbers are 64-bit integers (i.e. long long in C)?

Miller–Rabin is indeed probabilistic, but you can trade accuracy for computation time arbitrarily. If the number you test is prime, it will always give the correct answer. The problematic case is when a number is composite, but is reported to be prime. We can bound the probability of this error using the formula on Wikipedia: if you select k different bases randomly and test them, the error probability is less than 4^(-k). So even with k = 9, you only get about a 3 in a million chance of being wrong. And with k = 40 or so it becomes ridiculously unlikely.
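
For concreteness, here is a minimal sketch of such a randomized driver (my illustration, not code from the question; it assumes gcc/clang's unsigned __int128 for the modular arithmetic, and uses rand() only for brevity):

#include <stdint.h>
#include <stdlib.h>

/* Strong-pseudoprime test to base a; n must be odd and > 2. */
static int sprp_base(uint64_t n, uint64_t a)
{
    uint64_t d = n - 1;
    unsigned s = 0;
    while ((d & 1) == 0) { d >>= 1; s++; }   /* n - 1 = 2^s * d, d odd */
    unsigned __int128 x = 1, w = a % n;
    if (w == 0) return 1;                    /* base divides n: this round says nothing */
    for (uint64_t e = d; e; e >>= 1) {       /* x = a^d mod n, square-and-multiply */
        if (e & 1) x = x * w % n;
        w = w * w % n;
    }
    if (x == 1 || x == n - 1) return 1;
    while (--s) {                            /* square up to s-1 more times */
        x = x * x % n;
        if (x == n - 1) return 1;
    }
    return 0;
}

/* k rounds with random bases: a composite survives with probability < 4^(-k).
   A real implementation would draw bases from a better generator than rand(). */
static int is_probable_prime(uint64_t n, int k)
{
    if (n < 4) return n == 2 || n == 3;
    if ((n & 1) == 0) return 0;
    while (k-- > 0) {
        uint64_t a = 2 + (uint64_t)rand() % (n - 3);  /* base in [2, n-2] */
        if (!sprp_base(n, a)) return 0;               /* definitely composite */
    }
    return 1;                                         /* probably prime */
}
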
That said, there is a deterministic version of Miller–Rabin, relying on the correctness of the generalized Riemann hypothesis. For the range up to 2^64, it is enough to check a = 2, 3, 5, 7, 11, 13, 17, 19, 23. I have a C++ implementation online which was field-tested in lots of programming contests. Here's an instantiation of the template for unsigned 64-bit ints:
bool isprime(uint64_t n) { // determines if n is a prime number
    const int pn = 9, p[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23 };
    for (int i = 0; i < pn; ++i)
        if (n % p[i] == 0) return n == p[i];
    if (n < p[pn - 1]) return 0;
    uint64_t s = 0, t = n - 1;
    while (~t & 1)
        t >>= 1, ++s;
    for (int i = 0; i < pn; ++i) {
        uint64_t pt = PowerMod(p[i], t, n);
        if (pt == 1) continue;
        bool ok = 0;
        for (int j = 0; j < s && !ok; ++j) {
            if (pt == n - 1) ok = 1;
            pt = MultiplyMod(pt, pt, n);
        }
        if (!ok) return 0;
    }
    return 1;
}
PowerMod and MultiplyMod are just primitives to multiply and exponentiate under a given modulus, using square-and-{multiply,add}.
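
They are not shown in the instantiation above; a minimal sketch of what they could look like, assuming unsigned __int128 is available (without it, MultiplyMod would be done by shift-and-add, which is the square-and-add mentioned):

#include <stdint.h>

/* Sketch only: one plausible shape for the two primitives. */
static uint64_t MultiplyMod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((unsigned __int128)a * b % m);  /* needs a 128-bit product */
}

static uint64_t PowerMod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1;
    b %= m;
    while (e) {                       /* square-and-multiply */
        if (e & 1)
            r = MultiplyMod(r, b, m);
        b = MultiplyMod(b, b, m);
        e >>= 1;
    }
    return r;
}
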

For n < 2^64, it is possible to perform strong-pseudoprime tests to the seven bases 2, 325, 9375, 28178, 450775, 9780504, and 1795265022 and completely determine the primality of n; see http://miller-rabin.appspot.com/.
A faster primality test performs a strong-pseudoprime test to base 2 followed by a Lucas pseudoprime test. It takes about 3 times as long as a single strong-pseudoprime test, so is more than twice as fast as the 7-base Miller-Rabin test. The code is more complex, but not dauntingly so.
I can post code if you're interested; let me know in the comments.

In each iteration of Miller-Rabin you need to choose a random number. If you are unlucky, this random number doesn't reveal certain composites. A small example (for the weaker Fermat test) is that 2^341 mod 341 = 2, so the composite 341 = 11 · 31 passes to base 2.
But the test guarantees that it only lets a composite pass with probability <1/4. So if you run the test 64 times with different random values, the probability drops below 2^(-128) which is enough in practice.
You should take a look at the Baillie–PSW primality test. While it may have false positives, there are no known examples, and according to Wikipedia it has been verified that no composite number below 2^64 passes the test. So it should fit your requirements.

There are efficient deterministic variants of the MR test for 64-bit values - which do not rely on the GRH - that have been exhaustively verified by exploiting GPUs and other known results.
I've listed the pertinent sections of a C program I wrote that tests the primality of any 64-bit value: (n > 1), using Jaeschke's and Sinclair's bases for the deterministic MR variant. It makes use of gcc and clang's __int128 extended type for exponentiation. If not available, an explicit routine is required. Maybe others will find this useful...
#include <inttypes.h>
/******************************************************************************/
static int sprp (uint64_t n, uint64_t a)
{
    uint64_t m = n - 1, r, y;
    unsigned int s = 1, j;
    /* assert(n > 2 && (n & 0x1) != 0); */
    while ((m & (UINT64_C(1) << s)) == 0) s++;
    r = m >> s; /* r, s s.t. 2^s * r = n - 1, r is odd. */
    if ((a %= n) == 0) /* else (0 < a < n) */
        return (1);
    {
        unsigned __int128 u = 1, w = a;
        while (r != 0)
        {
            if ((r & 0x1) != 0)
                u = (u * w) % n; /* (mul-rdx) */
            if ((r >>= 1) != 0)
                w = (w * w) % n; /* (sqr-rdx) */
        }
        if ((y = (uint64_t) u) == 1)
            return (1);
    }
    for (j = 1; j < s && y != m; j++)
    {
        unsigned __int128 u = y;
        u = (u * u) % n; /* (sqr-rdx) */
        if ((y = (uint64_t) u) <= 1) /* (n) is composite: */
            return (0);
    }
    return (y == m);
}
/******************************************************************************/
static int is_prime (uint64_t n)
{
    const uint32_t sprp32_base[] = /* (Jaeschke) */ {
        2, 7, 61, 0};
    const uint32_t sprp64_base[] = /* (Sinclair) */ {
        2, 325, 9375, 28178, 450775, 9780504, 1795265022, 0};
    const uint32_t *sprp_base;
    /* assert(n > 1); */
    if ((n & 0x1) == 0) /* even: */
        return (n == 2);
    sprp_base = (n <= UINT32_MAX) ? sprp32_base : sprp64_base;
    for (; *sprp_base != 0; sprp_base++)
        if (!sprp(n, *sprp_base)) return (0);
    return (1); /* prime. */
}
/******************************************************************************/
Note that the MR (sprp) test is slightly modified to pass values on an iteration where the base is a multiple of the candidate, as mentioned in the 'remarks' section of the website.
Update: while this has fewer base tests than Niklas' answer, it's important to note that the bases {3, 5, 7, 11, 13, 17, 19, 23, 29} provide a cheap test that allows us to eliminate candidates exceeding 29 * 29 = 841, simply using the GCD.
For n > 29 * 29, we can clearly eliminate any even value. The product of the small primes (3 * 5 * 7 * 11 * 13 * 17 * 19 * 23 * 29) = 3234846615 fits nicely in a 32-bit unsigned value. A gcd(n, 3234846615) is a lot cheaper than an MR test! If the result is not 1, then n > 841 has a small factor.
Mertens' theorem suggests that this simple gcd(u64, u64) test eliminates ~68% of all odd candidates (as composites). If you're using M-R to search for primes (randomly or incrementally), rather than just a 'one-off' test, this is certainly worthwhile!
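
As a sketch of that filter (my illustration; the constant is the product of the small primes above):

#include <stdint.h>

static uint64_t gcd_u64(uint64_t a, uint64_t b)
{
    while (b != 0) {            /* Euclid's algorithm */
        uint64_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* For odd n > 841: a nontrivial gcd with 3*5*7*11*13*17*19*23*29
   means n has a prime factor <= 29, so n is composite. */
static int has_small_odd_factor(uint64_t n)
{
    return gcd_u64(n, UINT64_C(3234846615)) != 1;
}
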

Your computer is not perfect; it has a finite probability of failing in such a way as to produce an incorrect result to a calculation. Provided the probability of the M-R test giving a false result is much smaller than the probability of some other computer failure, you are fine. There is no reason to run the M-R test for fewer than 64 iterations (a 1 in 2^128 chance of error). Most composites will fail in the first few iterations, so only the actual primes will be thoroughly tested. Use 128 iterations for a 1 in 2^256 chance of error.

Related

Need help creating function to double the digits of an integer individually

So I'm trying to make a recursive function that takes an integer, say 123, and gives back every single digit doubled individually: instead of doing 123x2, you do (100x2)+(20x2)+(3x2), with no carry between digits. There is also the condition that if any digit is greater than or equal to 5, its doubled value is replaced with a 9, so 345 becomes 689 (not 690). I am trying to create it iteratively before making it recursive, and I am running into an issue with the conditions. Here's what I have so far:
int double_digit(int g) {
    int dubl = 0;
    int mod = 10;
    int i = 1;
    while (g > 0) {
        mod = pow(10, i);
        if (g % 10 < 5) {
            dubl = dubl + g % mod * 2;
            g = g - (g % mod);
            i++;
        }
        else {
            dubl = dubl + (9 * (mod / 10));
            g = g - (g % mod);
            i++;
        }
    }
    printf("%d", dubl);
}
If you run the code, you will see that it works for digits under 5, but not for digits greater than or equal to 5. How can I fix it?
WAY too complicated... A lookup table will shrink the code down to almost nothing. This even tries to protect against receiving a negative number.
#include <stdio.h>
#include <stdlib.h>   /* for abs() */

int double_digit(int g) {
    int dd = 0;
    for( int w = abs(g), pow = 1; w; w /= 10, pow *= 10 )
        dd += pow * "\0\02\04\06\10\011\011\011\011\011"[ w % 10 ];
    return g < 0 ? 0 - dd : dd;
}

int main() {
    int tests[] = { 42, 256, 123, 578, -3256, };
    for( int i = 0; i < sizeof tests/sizeof tests[0]; i++ )
        printf( "%d = %d\n", tests[i], double_digit( tests[i] ) );
    return 0;
}
42 = 84
256 = 499
123 = 246
578 = 999
-3256 = -6499
EDIT:
C's understanding of octal bytes specified in a character array (aka string) may be new.
Here is an abridged alternative that may not be as confrontational:
int lut[] = { 0, 2, 4, 6, 8, 9, 9, 9, 9, 9, };
for( int w = abs(g), pow = 1; w; w /= 10, pow *= 10 )
    dd += pow * lut[ w % 10 ];
And, then there are branchless techniques (overkill in this instance, but useful to consider in other applications):
for( int d, w = abs(g), pow = 1; w; w /= 10, pow *= 10 )
    d = (w%10)<<1, dd += pow * (d*(d<9)+(d>8)*9);
In other words, there are lots of ways to achieve objectives. A good exercise is to try to discover those different ways and then to consider where each may be used to the most benefit (and where other methods may not be appropriate or as clear). Write clean, clear, concise code to the best of your abilities.
I've written the recursive version. Without accounting for negative values, the recursive version requires, to the best of my abilities, more code and takes a 'linear' (iterative) problem into other dimensions. Make the method fit the problem, not vice versa.
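
For reference, one possible recursive shape, if negative inputs are ignored as noted above (a sketch of my own, not the answerer's version):

/* Double each digit, capping at 9: handle the low digit, recurse on the rest. */
int double_digit_rec(int g)
{
    if (g == 0)
        return 0;
    int d = g % 10;
    return double_digit_rec(g / 10) * 10 + (d < 5 ? 2 * d : 9);
}
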

What is the time complexity of exponentiation by squaring?

Here is a code to exponentiate a number to a given power:
#include <stdio.h>

int foo(int m, int k) {
    if (k == 0) {
        return 1;
    } else if (k % 2 != 0) {
        return m * foo(m, k - 1);
    } else {
        int p = foo(m, k / 2);
        return p * p;
    }
}

int main() {
    int m, k;
    while (scanf("%d %d", &m, &k) == 2) {
        printf("%d\n", foo(m, k));
    }
    return 0;
}
How do I calculate the time complexity of the function foo?
I have been able to deduce that if k is a power of 2, the time complexity is O(log k).
But I am finding it difficult to calculate for other values of k. Any help would be much appreciated.
How do I calculate the time complexity of the function foo()?
I have been able to deduce that if k is a power of 2, the time complexity is O(log k).
First, I assume that the time needed for each function call is constant (this would for example not be the case if the time needed for a multiplication depends on the numbers being multiplied - which is the case on some computers).
We also assume that k >= 1 (for negative k, the function would run endlessly unless there is an overflow).
Let's think the value k as a binary number:
If the rightmost bit is 0 (k%2!=0 is false), the number is shifted right by one bit (foo(m,k/2)) and the function is called recursively.
If the rightmost bit is 1 (k%2!=0 is true), the bit is changed to a 0 (foo(m,k-1)) and the function is called recursively. (We don't look at the case k=1, yet.)
This means that the function is called once for each bit and it is called once for each 1 bit. Or, in other words: It is called once for each 0 bit in the number and twice for each 1 bit.
If N is the number of function calls, n1 is the number of 1 bits and n0 is the number of 0 bits, we get the following formula:
N = n0 + 2*n1 + C
The constant C (C=(-1), if I didn't make a mistake) represents the case k=1 that we ignored up to now.
This means:
N = (n0 + n1) + n1 + C
And - because n0 + n1 = floor(log2(k)) + 1:
floor(log2(k)) + C <= N <= 2*floor(log2(k)) + C
As you can see, the time complexity is always O(log(k)).
Here is a modified version that outputs statistics for a spreadsheet plot.
#include <stdio.h>
#include <math.h>

#ifndef TEST_NUM
#define TEST_NUM (100)
#endif

static size_t iter_count;

int foo(int m, int k) {
    iter_count++;
    if (k == 0) {
        return 1;
    } else if (k == 1) {
        return m;
    } else if (k % 2 != 0) {
        return m * foo(m, k - 1);
    } else {
        int p = foo(m, k / 2);
        return p * p;
    }
}

int main() {
    for (int i = 1; i < TEST_NUM; ++i) {
        iter_count = 0;
        int dummy_result = foo(1, i);
        printf("%d, %zu, %f\n", i, iter_count, log2(i));
    }
    return 0;
}
Build it:
gcc t1.c -lm -DTEST_NUM=10000
./a.out > output.csv
Now open the output file with a spreadsheet program and plot the last two output columns.
For k positive, the function foo calls itself recursively p times if k is the p-th power of 2. If k is not a power of 2, the number of recursive calls is strictly less than 2 * p, where p is the exponent of the largest power of 2 less than k.
Here is a demonstration:
let's expand the recursive call in the case k % 2 != 0:
int foo(int m, int k) {
    if (k == 1) {
        return m;
    } else
    if (k % 2 != 0) { /* 2 recursive calls */
        // return m * foo(m, k - 1);
        int p = foo(m, k / 2);
        return m * p * p;
    } else {          /* 1 recursive call */
        int p = foo(m, k / 2);
        return p * p;
    }
}
The total number of calls is floor(log2(k)) + bitcount(k), and bitcount(k) is by construction <= ceil(log2(k)).
There are no loops in the code and the time of each individual call is bounded by a constant, hence the overall time complexity of O(log k).
The number of times the function is called (recursively or not) is proportional to the number of bits required to represent the exponent in binary form.
Each call reduces the exponent by one if it is odd, or halves it if it is even: the routine performs one squaring per significant bit of the exponent, plus one extra multiplication by the base for each bit that is 1 (at most as many extra multiplications as there are bits). For an exponent between 2^31 and 2^32, the routine will do between 32 and 64 products to get the result, and will re-enter itself at most 64 times.
As the recursion is easily converted to iteration in both cases, the code you posted can be replaced with iterative code in which a while loop is used to solve the problem:
int foo(int m, int k)
{
    int prod = 1; /* last recursion foo(m, 0); */
    int sq = m;   /* squares */
    while (k) {
        if (k & 1) {
            prod *= sq; /* foo(m, k); k odd */
        }
        k >>= 1;
        sq *= sq;
    }
    return prod; /* return final product */
}
That's a huge saving! (Between 32 and 64 multiplications to raise something to the 1,000,000,000th power.)

Efficient way to find divisibility

My professor says this isn't an efficient algorithm for checking whether a number is divisible by a number from 100,000 to 150,000. I'm having trouble finding a better way. Any help would be appreciated.
unsigned short divisibility_check(unsigned long n) {
    unsigned long i;
    for (i = 100000; i <= 150000; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}
Let's say you need to find whether a positive integer K is divisible by a number between 100,000 and 150,000, and it is such a rare operation, that doing precalculations is just not worth the processor time or memory used.
If K < 100,000, it cannot be divisible by a number between 100,000 and 150,000.
If 100,000 ≤ K ≤ 150,000, it is divisible by itself. It is up to you to decide whether this counts or not.
For a K > 150,000 to be divisible by M, with 100,000 ≤ M ≤ 150,000, K must also be divisible by L = K / M. This is because K = L × M, and all three are positive integers. So, you only need to test the divisibility of K by a set of L, where ⌊ K / 150,000 ⌋ ≤ L ≤ ⌊ K / 100,000 ⌋.
However, that set of Ls becomes larger than the set of possible Ms when K ≥ 15,000,000,000. Then it is again less work to just test K for divisibility against each M, much like OP's code does now.
When implementing this as a program, the most important thing in practice is, surprisingly, the comments you add. Do not write comments that describe what the code does; write comments that explain the model or algorithm you are trying to implement (say, at the function level), and your intent of what each small block of code should accomplish.
In this particular case, you should probably add a comment to each if clause, explaining your reasoning, much like I did above.
Beginner programmers often omit comments completely. It is unfortunate, because writing good comments is a hard habit to pick up afterwards. It is definitely a good idea to learn to comment your code (as I described above -- the comments that describe what the code does are less than useful; more noise than help), and keep honing your skill on that.
A programmer whose code is maintainable is worth ten geniuses who produce write-only code. This is because all code has bugs, because humans make errors. To be an efficient developer, your code must be maintainable. Otherwise you're forced to rewrite each buggy part from scratch, wasting a lot of time. And, as you can see above, "optimization" at the algorithmic level, i.e. thinking about how to avoid having to do work, yields much better results than trying to optimize your loops or something like that. (You'll find in real life that surprisingly often, optimizing a loop in the proper way removes the loop completely.)
Even in exercises, proper comments may be the difference between "no points, this doesn't work" and "okay, I'll give you partial credit for this one, because you had a typo/off-by-one bug/thinko on line N, but otherwise your solution would have worked".
As bolov did not understand how the above leads to a "naive_with_checks" function, I'll show it implemented here.
For ease of testing, I'll show a complete test program. Supply the range of integers to test, and the range of divisors accepted, as parameters to the program (i.e. thisprogram 1 500000 100000 150000 to duplicate bolov's tests).
#include <stdlib.h>
#include <inttypes.h>
#include <limits.h>
#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <errno.h>
int is_divisible(const uint64_t number,
                 const uint64_t minimum_divisor,
                 const uint64_t maximum_divisor)
{
    uint64_t divisor, minimum_result, maximum_result, result;
    if (number < minimum_divisor) {
        return 0;
    }
    if (number <= maximum_divisor) {
        /* Number itself is a valid divisor. */
        return 1;
    }
    minimum_result = number / maximum_divisor;
    if (minimum_result < 2) {
        minimum_result = 2;
    }
    maximum_result = number / minimum_divisor;
    if (maximum_result < minimum_result) {
        maximum_result = minimum_result;
    }
    if (maximum_result - minimum_result > maximum_divisor - minimum_divisor) {
        /* The number is so large that it is the least amount of work
           to check each possible divisor. */
        for (divisor = minimum_divisor; divisor <= maximum_divisor; divisor++) {
            if (number % divisor == 0) {
                return 1;
            }
        }
        return 0;
    } else {
        /* There are fewer possible results than divisors,
           so we check the results instead. */
        for (result = minimum_result; result <= maximum_result; result++) {
            if (number % result == 0) {
                divisor = number / result;
                if (divisor >= minimum_divisor && divisor <= maximum_divisor) {
                    return 1;
                }
            }
        }
        return 0;
    }
}

int parse_u64(const char *s, uint64_t *to)
{
    unsigned long long value;
    const char *end;
    /* Empty strings are not valid. */
    if (s == NULL || *s == '\0')
        return -1;
    /* Parse as unsigned long long. */
    end = s;
    errno = 0;
    value = strtoull(s, (char **)(&end), 0);
    if (errno == ERANGE)
        return -1;
    if (end == s)
        return -1;
    /* Overflow? */
    if (value > UINT64_MAX)
        return -1;
    /* Skip trailing whitespace. */
    while (isspace((unsigned char)(*end)))
        end++;
    /* If the string does not end here, it has garbage in it. */
    if (*end != '\0')
        return -1;
    if (to)
        *to = (uint64_t)value;
    return 0;
}

int main(int argc, char *argv[])
{
    uint64_t kmin, kmax, dmin, dmax, k, count;
    if (argc != 5) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help | help ]\n", argv[0]);
        fprintf(stderr, "       %s MIN MAX MIN_DIVISOR MAX_DIVISOR\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program counts which positive integers between MIN and MAX,\n");
        fprintf(stderr, "inclusive, are divisible by MIN_DIVISOR to MAX_DIVISOR, inclusive.\n");
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }
    /* Use current locale. This may change which codes isspace() considers whitespace. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "Warning: Your C library does not support your current locale.\n");
    if (parse_u64(argv[1], &kmin) || kmin < 1) {
        fprintf(stderr, "%s: Invalid minimum positive integer to test.\n", argv[1]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[2], &kmax) || kmax < kmin || kmax >= UINT64_MAX) {
        fprintf(stderr, "%s: Invalid maximum positive integer to test.\n", argv[2]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[3], &dmin) || dmin < 2) {
        fprintf(stderr, "%s: Invalid minimum divisor to test for.\n", argv[3]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[4], &dmax) || dmax < dmin) {
        fprintf(stderr, "%s: Invalid maximum divisor to test for.\n", argv[4]);
        return EXIT_FAILURE;
    }
    count = 0;
    for (k = kmin; k <= kmax; k++)
        count += is_divisible(k, dmin, dmax);
    printf("%" PRIu64 "\n", count);
    return EXIT_SUCCESS;
}
It is useful to note that the above, running bolov's test, i.e. thisprogram 1 500000 100000 150000 only takes about 15 ms of wall clock time (13 ms CPU time), median, on a much slower Core i5-7200U processor. For really large numbers, like 280,000,000,000 to 280,000,010,000, the test does the maximum amount of work, and takes about 3.5 seconds per 10,000 numbers on this machine.
In other words, I wouldn't trust bolov's numbers to have any relation to timings for properly written test cases.
It is important to note that for any K between 1 and 500,000 (the same test bolov says their code measures), the above code does at most two divisibility tests to find if K is divisible by an integer between 100,000 and 150,000.
This solution is therefore quite efficient. It is definitely acceptable and near-optimal, when the tested K are relatively small (say, 32 bit unsigned integers or smaller), or when precomputed tables cannot be used.
Even when precomputed tables can be used, it is unclear if/when prime factorization becomes faster than the direct checks. There is certainly a tradeoff in the size and content of the precomputed tables. bolov claims that it is clearly superior to other methods, but hasn't implemented a proper "naive" divisibility test as shown above, and bases their opinion on experiments on quite small integers (1 to 500,000) that have simple prime decompositions.
As an example, a table of integers 1 to 500,000 pre-checked for divisibility takes only 62500 bytes (43750 bytes for 150,000 to 500,000). With that table, each test takes a small near-constant time (that only depends on memory and cache effects). Extending it to all 32-bit unsigned integers would require 512 MiB (536,870,912 bytes); the table can be stored in a memory-mapped read-only file, to let the OS kernel manage how much of it is mapped to RAM at any time.
Prime decomposition itself, especially using trial division, becomes more expensive than the naive approach when the number of trial divisions exceeds the range of possible divisors (50,000 divisors in this particular case). As there are 13848 primes (if one counts 1 and 2 as primes) between 1 and 150,000, the number of trial divisions can easily approach the number of divisors for sufficiently large input values.
For numbers with many prime factors, the combinatoric phase, finding if any subset of the prime factors multiply to a number between 100,000 and 150,000 is even more problematic. The number of possible combinations grows faster than exponentially. Without careful checks, this phase alone can do way more work per large input number than just trial division with each possible divisor would be.
(As an example, if you have 16 different prime factors, you already have 65,535 different combinations; more than the number of direct trial divisions. However, all such numbers are larger than 64-bit; the smallest being 2·3·5·7·11·13·17·19·23·29·31·37·41·43·47·53 = 32,589,158,477,190,044,730 which is a 65-bit number.)
There is also the problem of code complexity. The more complex the code, the harder it is to debug and maintain.
Ok, so I've implemented the version with sieve primes and factorization mentioned in the comments by m69 and it is ... way faster than the naive approach. I must admit, I didn't expect this at all.
My notations: left = 100'000 and right = 150'000.
naive: your version.
naive_with_checks: your version with simple checks:
    if (n < left): no divisor
    else if (n <= right): divisor
    else if (left * 2 >= right && n < left * 2): no divisor
factorization (above checks implemented):
    precompute the Sieve of Eratosthenes for all primes up to right (this time is not measured),
    factorize n (only with the primes from the previous step),
    generate all subsets (backtracking, depth first: i.e. generate p1^0 * p2^0 * p3^0 first, instead of p1^5 first) with the product < left, or until the product is in [left, right] (found divisor).
factorization_opt: an optimization of the previous algorithm where the subsets are not generated (no vector of subsets is created); I just pass the current product from one backtracking iteration to the next.
Nominal Animal's version: I have also run his version on my system with the same range.
I have written the program in C++ so I won't share it here.
I used std::uint64_t as the data type and I checked all numbers from 1 to 500'000 to see if each is divisible by a number in the interval [100'000, 150'000]. All versions reached the same solution: 170'836 numbers with positive results.
The setup:
Hardware: Intel Core i7-920, 4 cores with HT (all algorithm versions are single threaded), 2.66 GHz (boost 2.93 GHz),
8 MB SmartCache; memory: 6 GB DDR3 triple channel.
Compiler: Visual Studio 2017 (v141), Release x64 mode.
I must also add that I haven't profiled the programs so there is definitely room to improve the implementation. However this is enough here as the idea is to find a better algorithm.
version                | elapsed time (milliseconds)
-----------------------+----------------------------
naive                  | 167'378 ms (yes, that's a thousands separator, aka 167 seconds)
naive_with_checks      |  97'197 ms
factorization          |   7'906 ms
factorization_opt      |   7'320 ms
                       |
Nominal Animal version |      14 ms
Some analysis:
For naive vs naive_with_checks: all the numbers in [1, 200'000] can be solved with just the simple checks. As these represent 40% of all the numbers checked, the naive_with_checks version does roughly 60% of the work naive does. The execution time reflects this, as the naive_with_checks runtime is ≅58% of the naive version.
The factorization version is a whopping 12.3 times faster. That is indeed impressive. I haven't analyzed the time complexity of the algorithm.
And the final optimization brings a further 1.08x speedup. This is basically the time gained by removing the creation and copying of the small vectors of subset factors.
For those interested, the sieve precomputation (which is not included above) takes about 1 ms. And this is the naive implementation from Wikipedia, with no optimizations whatsoever.
For comparison, here's what I had in mind when I posted my comment about using prime factorization. Compiled with gcc -std=c99 -O3 -m64 -march=haswell this is slightly faster than the naive method with checks and inversion when tested with the last 10,000 integers in the 64-bit range (3.469 vs 3.624 seconds).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stdbool.h>
void eratosthenes(bool *ptr, uint64_t size) {
    memset(ptr, true, size);
    for (uint64_t i = 2; i * i < size; i++) {
        if (ptr[i]) {
            for (uint64_t j = i * i; j < size; j += i) {
                ptr[j] = false;
            }
        }
    }
}
}
bool divisible(uint64_t n, uint64_t a, uint64_t b) {
    /* check for trivial cases first */
    if (n < a) {
        return false;
    }
    if (n <= b) {
        return true;
    }
    if (n < 2 * a) {
        return false;
    }
    /* Inversion: use range n/b ~ n/a; see Nominal Animal's answer */
    if (n < a * b) {
        uint64_t c = a;
        a = (n + b - 1) / b; // n/b rounded up
        b = n / c;
    }
    /* Create prime sieve when first called, or re-calculate it when */
    /* called with a higher value of b; place before inversion in case */
    /* of a large sequential test, to avoid repeated re-calculation. */
    static bool *prime = NULL;
    static uint64_t prime_size = 0;
    if (prime_size <= b) {
        prime_size = b + 1;
        prime = realloc(prime, prime_size * sizeof(bool));
        if (!prime) {
            printf("Out of memory!\n");
            return false;
        }
        eratosthenes(prime, prime_size);
    }
    /* Factorize n into prime factors up to b, using trial division; */
    /* there are more efficient but also more complex ways to do this. */
    /* You could return here, if a factor in the range a~b is found. */
    static uint64_t factor[63];
    uint8_t factors = 0;
    for (uint64_t i = 2; i <= n && i <= b; i++) {
        if (prime[i]) {
            while (n % i == 0) {
                factor[factors++] = i;
                n /= i;
            }
        }
    }
    /* Prepare divisor sieve when first called, or re-allocate it when */
    /* called with a higher value of b; in a higher-level language, you */
    /* would probably use a different data structure for this, because */
    /* this method iterates repeatedly over a potentially sparse array. */
    static bool *divisor = NULL;
    static uint64_t div_size = 0;
    if (div_size <= b / 2) {
        div_size = b / 2 + 1;
        divisor = realloc(divisor, div_size * sizeof(bool));
        if (!divisor) {
            printf("Out of memory!\n");
            return false;
        }
    }
    memset(divisor, false, div_size);
    divisor[1] = true;
    uint64_t max = 1;
    /* Iterate over each prime factor, and for every divisor already in */
    /* the sieve, add the product of the divisor and the factor, up to */
    /* the value b/2. If the product is in the range a~b, return true. */
    for (uint8_t i = 0; i < factors; i++) {
        for (uint64_t j = max; j > 0; j--) {
            if (divisor[j]) {
                uint64_t product = factor[i] * j;
                if (product >= a && product <= b) {
                    return true;
                }
                if (product < div_size) {
                    divisor[product] = true;
                    if (product > max) {
                        max = product;
                    }
                }
            }
        }
    }
    return false;
}

int main() {
    uint64_t count = 0;
    for (uint64_t n = 18446744073709541615LLU; n <= 18446744073709551614LLU; n++) {
        if (divisible(n, 100000, 150000)) ++count;
    }
    printf("%llu", count);
    return 0;
}
And this is the naive + checks + inversion implementation I compared it with:
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
bool divisible(uint64_t n, uint64_t a, uint64_t b) {
    if (n < a) {
        return false;
    }
    if (n <= b) {
        return true;
    }
    if (n < 2 * a) {
        return false;
    }
    if (n < a * b) {
        uint64_t c = a;
        a = (n + b - 1) / b;
        b = n / c;
    }
    while (a <= b) {
        if (n % a++ == 0) return true;
    }
    return false;
}

int main() {
    uint64_t count = 0;
    for (uint64_t n = 18446744073709541615LLU; n <= 18446744073709551614LLU; n++) {
        if (divisible(n, 100000, 150000)) ++count;
    }
    printf("%llu", count);
    return 0;
}
Here is a recursive method with primes. The idea here is that if a number is divisible by a number between 100000 and 150000, there is a path of reducing the product of only the relevant primes by division that will pass through a state in the target range. (Note: the code below is meant for numbers greater than 100000*150000.) In my testing, I could not find an instance where the stack performed over 600 iterations.
# Euler sieve
def getPrimes():
    n = 150000
    a = (n+1) * [None]
    ps = ([],[])
    s = []
    p = 1
    while (p < n):
        p = p + 1
        if not a[p]:
            s.append(p)
            # Save primes less than half of 150000,
            # the only ones needed to construct
            # our candidates.
            if p < 75000:
                ps[0].append(p)
            # Save primes between 100000 and 150000
            # in case our candidate is prime.
            elif p > 100000:
                ps[1].append(p)
            limit = n / p
            new_s = []
            for i in s:
                j = i
                while j <= limit:
                    new_s.append(j)
                    a[j*p] = True
                    j = j * p
            s = new_s
    return ps
ps1, ps2 = getPrimes()
def f(n):
    # Prime candidate
    for p in ps2:
        if not (n % p):
            return True
    # (primes, prime_counts)
    ds = ([],[])
    prod = 1
    # Prepare only prime factors that could
    # construct a composite candidate.
    for p in ps1:
        while not (n % p):
            prod *= p
            if (not ds[0] or ds[0][-1] != p):
                ds[0].append(p)
                ds[1].append(1)
            else:
                ds[1][-1] += 1
            n /= p
    # Reduce the primes product to a state
    # where it's between our target range.
    stack = [(prod,0)]
    while stack:
        prod, i = stack.pop()
        # No point in reducing further
        if prod < 100000:
            continue
        # Exit early
        elif prod <= 150000:
            return True
        # Try reducing the product by different
        # prime powers, one prime at a time
        if i < len(ds[0]):
            for p in xrange(ds[1][i] + 1):
                stack.append((prod / ds[0][i]**p, i + 1))
    return False
Output:
c = 0
for ii in xrange(1099511627776, 1099511628776):
    f_i = f(ii)
    if f_i:
        c += 1
print c # 239
Here is a very simple solution with a sieve cache. If you call the divisibility_check function for many numbers in a sequence, this should be very efficient:
#include <string.h>
int divisibility_check_sieve(unsigned long n) {
    static unsigned long sieve_min = 1, sieve_max;
    static unsigned char sieve[1 << 19]; /* 1/2 megabyte */
    if (n < sieve_min || n > sieve_max) {
        sieve_min = n & ~(sizeof(sieve) - 1);
        sieve_max = sieve_min + sizeof(sieve) - 1;
        memset(sieve, 1, sizeof sieve);
        for (unsigned long m = 100000; m <= 150000; m++) {
            unsigned long i = sieve_min % m;
            if (i != 0)
                i = m - i;
            for (; i < sizeof sieve; i += m) {
                sieve[i] = 0;
            }
        }
    }
    return sieve[n - sieve_min];
}
Here is a comparative benchmark:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
int divisibility_check_naive(unsigned long n) {
    for (unsigned long i = 100000; i <= 150000; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

int divisibility_check_small(unsigned long n) {
    unsigned long i, min = n / 150000, max = n / 100000;
    min += (min == 0);
    max += (max == 0);
    if (max - min > 150000 - 100000) {
        for (i = 100000; i <= 150000; i++) {
            if (n % i == 0) {
                return 0;
            }
        }
        return 1;
    } else {
        for (i = min; i <= max; i++) {
            if (n % i == 0) {
                unsigned long div = n / i;
                if (div >= 100000 && div <= 150000)
                    return 0;
            }
        }
        return 1;
    }
}

int divisibility_check_sieve(unsigned long n) {
    static unsigned long sieve_min = 1, sieve_max;
    static unsigned char sieve[1 << 19]; /* 1/2 megabyte */
    if (n < sieve_min || n > sieve_max) {
        sieve_min = n & ~(sizeof(sieve) - 1);
        sieve_max = sieve_min + sizeof(sieve) - 1;
        memset(sieve, 1, sizeof sieve);
        for (unsigned long m = 100000; m <= 150000; m++) {
            unsigned long i = sieve_min % m;
            if (i != 0)
                i = m - i;
            for (; i < sizeof sieve; i += m) {
                sieve[i] = 0;
            }
        }
    }
    return sieve[n - sieve_min];
}

int main(int argc, char *argv[]) {
    unsigned long n, count = 0, lmin, lmax, range[2] = { 1, 500000 };
    int pos = 0, naive = 0, small = 0, sieve = 1;
    clock_t t;
    char *p;
    for (int i = 1; i < argc; i++) {
        n = strtoul(argv[i], &p, 0);
        if (*p == '\0' && pos < 2)
            range[pos++] = n;
        else if (!strcmp(argv[i], "naive"))
            naive = 1;
        else if (!strcmp(argv[i], "small"))
            small = 1;
        else if (!strcmp(argv[i], "sieve"))
            sieve = 1;
        else
            printf("invalid argument: %s\n", argv[i]);
    }
    lmin = range[0];
    lmax = range[1] + 1;
    if (naive) {
        t = clock();
        for (count = 0, n = lmin; n != lmax; n++) {
            count += divisibility_check_naive(n);
        }
        t = clock() - t;
        printf("naive: [%lu..%lu] -> %lu non-divisible numbers, %10.2fms\n",
               lmin, lmax - 1, count, t * 1000.0 / CLOCKS_PER_SEC);
    }
    if (small) {
        t = clock();
        for (count = 0, n = lmin; n != lmax; n++) {
            count += divisibility_check_small(n);
        }
        t = clock() - t;
        printf("small: [%lu..%lu] -> %lu non-divisible numbers, %10.2fms\n",
               lmin, lmax - 1, count, t * 1000.0 / CLOCKS_PER_SEC);
    }
    if (sieve) {
        t = clock();
        for (count = 0, n = lmin; n != lmax; n++) {
            count += divisibility_check_sieve(n);
        }
        t = clock() - t;
        printf("sieve: [%lu..%lu] -> %lu non-divisible numbers, %10.2fms\n",
               lmin, lmax - 1, count, t * 1000.0 / CLOCKS_PER_SEC);
    }
    return 0;
}
Here are some run times:
naive: [1..500000] -> 329164 non-divisible numbers, 158174.52ms
small: [1..500000] -> 329164 non-divisible numbers, 12.62ms
sieve: [1..500000] -> 329164 non-divisible numbers, 1.35ms
sieve: [0..4294967295] -> 3279784841 non-divisible numbers, 8787.23ms
sieve: [10000000000000000000..10000000001000000000] -> 765978176 non-divisible numbers, 2205.36ms

subset sum with negative values in c or c++

I have this code for finding a subset sum of positive values; everywhere I searched I only see positive integers, or a program written in Java at an advanced level. I want to know how to make my C program work with negative numbers. Actually, I want it to find a subset whose sum is 0. I had an idea:
Take the minimum value in the set, call it k.
Add each element in the set by the absolute value of k.
Add sum by the absolute value of k.
Perform the algorithm.
But I found that this won't work. Take the set (-5, 10) and see if any subset adds up to 5. We would convert (-5, 10) -> (0, 15) and 5 -> 10. -5 + 10 = 5, but 0 + 15 != 10.
I searched a lot of ideas on the internet but can't find the answer.
#include <stdio.h>

typedef int bool;
#define true 1
#define false 0

bool isSubsetSum(int set[], int n, int sum) {
    // Base Cases
    if (sum == 0)
        return true;
    if (n == 0 && sum != 0)
        return false;
    if (set[n - 1] > sum)
        return isSubsetSum(set, n - 1, sum);
    return isSubsetSum(set, n - 1, sum) ||
           isSubsetSum(set, n - 1, sum - set[n - 1]);
}

int main() {
    int set[] = { -3, 34, -2, 12, 5, 8 };
    int sum = 0;
    int i;
    int n = sizeof(set) / sizeof(set[0]);
    if (isSubsetSum(set, n, sum) == true)
        printf("Found a subset");
    else
        printf("No subset");
    return 0;
}
I don't really understand your strategy. You should not use the absolute value. The sum of a+b has little to do with the sum of |a|+|b| (well, there are some relations, but if you use them somewhere then I missed it ;).
If you have an algorithm that can find a subset among positive integers that adds up to x, then you can use it also for negative numbers. It won't be as efficient, but with a small trick it can work...
First you add an offset to all numbers to make them all positive. Now you look for subsets that add up to x+y*offset, where y is the size of the subset. E.g. you have
A = -1, -3, -2, 6, 12, 48
and you are looking for a subset that adds up to 0, then you first add 3 to all numbers,
B = 2, 0, 1, 9, 15, 51
and then try to find a subset of size 1 that adds up to 3, a subset of size 2 that adds up to 6, ..., a subset of size 4 that adds up to 12; that would be
12 = 2+0+1+9, i.e. 0 = -1 + -3 + -2 + 6
Doing it that way isn't very efficient, because you have to apply the algorithm N times (N = size of the input). However, if your algorithm for positives lets you fix the size of the subset, this may compensate for that loss in efficiency. A concrete sketch of the idea follows below.
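
Here is a minimal sketch of this idea (my own illustration; the helper sums_with_size, a simple DP that fixes the subset size, is hypothetical and stands in for "your algorithm for positives"):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAXN 24     /* sketch limits, chosen arbitrarily */
#define MAXT 4096

/* Does some subset of exactly k of the non-negative values v[0..n-1] sum to t? */
static bool sums_with_size(const int *v, int n, int k, int t)
{
    static bool dp[MAXN + 1][MAXT + 1];  /* dp[j][s]: j elements summing to s */
    if (k > n || n > MAXN || t < 0 || t > MAXT)
        return false;
    memset(dp, 0, sizeof dp);
    dp[0][0] = true;
    for (int i = 0; i < n; i++)              /* 0/1 knapsack with a count dimension; */
        for (int j = k; j >= 1; j--)         /* j and s run downward so each element */
            for (int s = t; s >= v[i]; s--)  /* is used at most once                 */
                if (dp[j - 1][s - v[i]])
                    dp[j][s] = true;
    return dp[k][t];
}

/* Offset trick: shift all values to be non-negative, then ask for a
   subset of size y summing to x + y*offset, for every size y. */
static bool subset_sums_to(const int *set, int n, int x)
{
    int off = 0, shifted[MAXN];
    for (int i = 0; i < n; i++)
        if (set[i] < -off)
            off = -set[i];
    for (int i = 0; i < n; i++)
        shifted[i] = set[i] + off;
    for (int y = 1; y <= n; y++)
        if (sums_with_size(shifted, n, y, x + y * off))
            return true;
    return false;
}

int main(void)
{
    int a[] = { -1, -3, -2, 6, 12, 48 };   /* the example above */
    printf("%s\n", subset_sums_to(a, 6, 0) ? "found" : "not found");
    return 0;
}
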
I guess you can try a brute force attempt by removing the test for overflow:
#include <stdio.h>

int isSubsetSum(int set[], int n, int sum, int empty_ok) {
    // Base Cases
    if (sum == 0 && empty_ok)
        return 1;
    if (n == 0)
        return 0;
    return isSubsetSum(set, n - 1, sum, empty_ok) ||
           isSubsetSum(set, n - 1, sum - set[n - 1], 1);
}

int main(void) {
    int set[] = { 3, 34, 2, 12, 5, 8 };
    int n = sizeof(set) / sizeof(set[0]);
    int sum = 6;
    if (isSubsetSum(set, n, sum, 0))   /* `== true` dropped: true is not defined here */
        printf("Found a subset");
    else
        printf("No subset");
    return 0;
}
Unfortunately, the time complexity of this solution is O(2^n).
Here is a non recursive solution for sets up to 64 elements:
int isSubsetSum(int set[], int n, int sum) {
    unsigned long long last;
    if (n == 0)
        return sum == 0;
    last = ((1ULL << (n - 1)) << 1) - 1;
    // only find non-empty subsets for a 0 sum
    for (unsigned long long bits = 1;; bits++) {
        int s = 0;
        for (int i = 0; i < n; i++) {
            s += set[i] * ((bits >> i) & 1);
        }
        if (s == sum)
            return 1;
        if (bits == last)
            return 0;
    }
}
Explanation: the type unsigned long long is guaranteed to have at least 64 value bits. bits varies from 1 to last inclusive and takes all possible bit patterns of n bits except all off. For each value of bits, I sum the elements for which the corresponding bit is set, hence all possible non-empty subsets are tested.
Code has TBD bug. Yet OP requested it to remain; I'll fix it later or take it down tomorrow.
OP's code has trouble because it is searching for the wrong sum.
By finding the minimum value and offsetting each element of set[], the problem becomes one of only positive numbers - which apparently OP has solved prior.
The trick is that the target sum needs to be offset by n*offset.
#include <stdio.h>
#include <stdbool.h>
//typedef int bool;
//#define true 1
//#define false 0

bool isSubsetSum(int set[], int n, int sum, int offset) {
    // Base Cases
    if ((sum + n*offset) == 0)
        return true;
    if (n == 0 && (sum + n*offset) != 0)
        return false;
    if (set[n - 1] > sum + n*offset)
        return isSubsetSum(set, n - 1, sum, offset);
    return isSubsetSum(set, n - 1, sum, offset) ||
           isSubsetSum(set, n - 1, sum - set[n - 1], offset);
}

int main() {
    int set[] = { -3, 34, -2, 12, 5, 8 };
    int sum = 0;
    int i;
    int n = sizeof(set) / sizeof(set[0]);
    int min = -3; // TBD code to find minimum
    for (i = 0; i < 6; i++) set[i] -= min;
    if (isSubsetSum(set, n, sum, -min) == true)
        printf("Found a subset");
    else
        printf("No subset");
    return 0;
}
Found a subset

The most efficient way to implement an integer based power function pow(int, int)

What is the most efficient way given to raise an integer to the power of another integer in C?
// 2^3
pow(2,3) == 8
// 5^5
pow(5,5) == 3125
Exponentiation by squaring.
int ipow(int base, int exp)
{
    int result = 1;
    for (;;)
    {
        if (exp & 1)
            result *= base;
        exp >>= 1;
        if (!exp)
            break;
        base *= base;
    }
    return result;
}
This is the standard method for doing modular exponentiation for huge numbers in asymmetric cryptography.
Note that exponentiation by squaring is not always optimal. It is probably the best you can do as a general method that works for all exponent values, but for a specific exponent value there might be a better sequence that needs fewer multiplications.
For instance, if you want to compute x^15, the method of exponentiation by squaring will give you:
x^15 = (x^7)*(x^7)*x
x^7 = (x^3)*(x^3)*x
x^3 = x*x*x
This is a total of 6 multiplications.
It turns out this can be done using "just" 5 multiplications via addition-chain exponentiation.
n*n = n^2
n^2*n = n^3
n^3*n^3 = n^6
n^6*n^6 = n^12
n^12*n^3 = n^15
There are no efficient algorithms to find this optimal sequence of multiplications. From Wikipedia:
The problem of finding the shortest addition chain cannot be solved by dynamic programming, because it does not satisfy the assumption of optimal substructure. That is, it is not sufficient to decompose the power into smaller powers, each of which is computed minimally, since the addition chains for the smaller powers may be related (to share computations). For example, in the shortest addition chain for a¹⁵ above, the subproblem for a⁶ must be computed as (a³)² since a³ is re-used (as opposed to, say, a⁶ = a²(a²)², which also requires three multiplies).
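
For illustration, the 5-multiplication chain for x^15 translates directly into straight-line code:

/* x^15 with the addition chain above: 5 multiplications instead of 6. */
int pow15(int x)
{
    int x3  = x * x * x;   /* 2 multiplications */
    int x6  = x3 * x3;     /* 3 */
    int x12 = x6 * x6;     /* 4 */
    return x12 * x3;       /* 5 */
}
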
If you need to raise 2 to a power, the fastest way to do so is to bit shift by the power.
2 ** 3 == 1 << 3 == 8
2 ** 30 == 1 << 30 == 1073741824 (A Gigabyte)
Here is the method in Java
private int ipow(int base, int exp)
{
    int result = 1;
    while (exp != 0)
    {
        if ((exp & 1) == 1)
            result *= base;
        exp >>= 1;
        base *= base;
    }
    return result;
}
An extremely specialized case is when you need, say, 2^x where x is negative, so the result is fractional and you cannot get it by shifting an int. You can still do 2^x in constant time by screwing with a float.
struct IeeeFloat
{
    unsigned int base : 23;
    unsigned int exponent : 8;
    unsigned int signBit : 1;
};

union IeeeFloatUnion
{
    IeeeFloat brokenOut;
    float f;
};

inline float twoToThe(char exponent)
{
    // notice how the range checking is already done on the exponent var
    static IeeeFloatUnion u;
    u.f = 2.0;
    // Change the exponent part of the float
    u.brokenOut.exponent += (exponent - 1);
    return (u.f);
}
You can get more powers of 2 by using a double as the base type.
(Thanks a lot to commenters for helping to square this post away).
There's also the possibility that learning more about IEEE floats, other special cases of exponentiation might present themselves.
power() function to work for Integers Only
int power(int base, unsigned int exp) {
    if (exp == 0)
        return 1;
    int temp = power(base, exp / 2);
    if (exp % 2 == 0)
        return temp * temp;
    else
        return base * temp * temp;
}
Complexity = O(log(exp))
power() function to work for negative exp and float base.
float power(float base, int exp) {
    if (exp == 0)
        return 1;
    float temp = power(base, exp / 2);
    if (exp % 2 == 0)
        return temp * temp;
    else {
        if (exp > 0)
            return base * temp * temp;
        else
            return (temp * temp) / base; // negative exponent computation
    }
}
Complexity = O(log(exp))
If you want the value of 2 raised to an integer power, it is always better to use the shift option:
pow(2,5) can be replaced by 1 << 5
This is much more efficient.
int pow(int base, int exponent)
{   // Does not work for negative exponents. (But that would be leaving the range of int)
    if (exponent == 0) return 1;   // base case;
    int temp = pow(base, exponent / 2);
    if (exponent % 2 == 0)
        return temp * temp;
    else
        return (base * temp * temp);
}
Just as a follow up to comments on the efficiency of exponentiation by squaring.
The advantage of that approach is that it runs in log(n) time. For example, if you were going to calculate something huge, such as x^1048575 (2^20 - 1), you only have to go through the loop 20 times, not 1 million+ using the naive approach.
Also, in terms of code complexity, it is simpler than trying to find the most optimal sequence of multiplications, a la Pramod's suggestion.
Edit:
I guess I should clarify before someone tags me for the potential for overflow. This approach assumes that you have some sort of hugeint library.
Late to the party:
Below is a solution that also deals with y < 0 as best as it can.
It uses a result of intmax_t for maximum range. There is no provision for answers that do not fit in intmax_t.
powjii(0, 0) --> 1 which is a common result for this case.
pow(0,negative), another undefined result, returns INTMAX_MAX
intmax_t powjii(int x, int y) {
    if (y < 0) {
        switch (x) {
        case 0:
            return INTMAX_MAX;
        case 1:
            return 1;
        case -1:
            return y % 2 ? -1 : 1;
        }
        return 0;
    }
    intmax_t z = 1;
    intmax_t base = x;
    for (;;) {
        if (y % 2) {
            z *= base;
        }
        y /= 2;
        if (y == 0) {
            break;
        }
        base *= base;
    }
    return z;
}
This code uses a forever loop for(;;) to avoid the final base *= base common in other looped solutions. That multiplication is 1) not needed and 2) could be int*int overflow which is UB.
A more generic solution, considering negative exponents:
private static int pow(int base, int exponent) {
    int result = 1;
    if (exponent == 0)
        return result; // base case;
    if (exponent < 0)
        return 1 / pow(base, -exponent); // integer division: 0 unless |base| == 1
    int temp = pow(base, exponent / 2);
    if (exponent % 2 == 0)
        return temp * temp;
    else
        return (base * temp * temp);
}
The O(log N) solution in Swift...
// Time complexity is O(log N)
func power(_ base: Int, _ exp: Int) -> Int {
    // 0. If the exponent is 0 then return 1 (a^0 == 1); without this guard
    //    the recursion below would never terminate for exp == 0
    if exp == 0 {
        return 1
    }
    // 1. If the exponent is 1 then return the number (e.g a^1 == a)
    // Time complexity O(1)
    if exp == 1 {
        return base
    }
    // 2. Calculate the value of the number raised to half of the exponent. This will be used to calculate the final answer by squaring the result (e.g a^2n == (a^n)^2 == a^n * a^n). The idea is that we can do half the amount of work by obtaining a^n and multiplying the result by itself to get a^2n
    // Time complexity O(log N)
    let tempVal = power(base, exp/2)
    // 3. If the exponent was odd then decompose the result in such a way that it allows you to divide the exponent in two (e.g. a^(2n+1) == a^1 * a^2n == a^1 * a^n * a^n). If the exponent is even then the result must be the base raised to half the exponent squared (e.g. a^2n == a^n * a^n = (a^n)^2).
    // Time complexity O(1)
    return (exp % 2 == 1 ? base : 1) * tempVal * tempVal
}
int pow(int const x, unsigned const e) noexcept
{
    return !e ? 1 : 1 == e ? x : (e % 2 ? x : 1) * pow(x * x, e / 2);
    //return !e ? 1 : 1 == e ? x : (((x ^ 1) & -(e % 2)) ^ 1) * pow(x * x, e / 2);
}
Yes, it's recursive, but a good optimizing compiler will optimize recursion away.
One more implementation (in Java). It may not be the most efficient solution, but the number of iterations is the same as that of the exponentiation-by-squaring solution.
public static long pow(long base, long exp) {
    if (exp == 0) {
        return 1;
    }
    if (exp == 1) {
        return base;
    }
    if (exp % 2 == 0) {
        long half = pow(base, exp / 2);
        return half * half;
    } else {
        long half = pow(base, (exp - 1) / 2);
        return base * half * half;
    }
}
I use recursion: if the exp is even, 5^10 = 25^5. (The exponent parameter must be an int for the % operator to compile.)
int pow(float base, int exp) {
    if (exp == 0)
        return 1;
    else if (exp > 0 && exp % 2 == 0) {
        return pow(base * base, exp / 2);
    } else if (exp > 0 && exp % 2 != 0) {
        return base * pow(base, exp - 1);
    }
    return 0; /* negative exponents are not handled here */
}
In addition to the answer by Elias, which causes Undefined Behaviour when implemented with signed integers, and incorrect values for high input when implemented with unsigned integers,
here is a modified version of the Exponentiation by Squaring that also works with signed integer types, and doesn't give incorrect values:
#include <stdint.h>
#define SQRT_INT64_MAX (INT64_C(0xB504F333))
int64_t alx_pow_s64 (int64_t base, uint8_t exp)
{
    int_fast64_t base_;
    int_fast64_t result;

    base_ = base;
    if (base_ == 1)
        return 1;
    if (!exp)
        return 1;
    if (!base_)
        return 0;

    result = 1;
    if (exp & 1)
        result *= base_;
    exp >>= 1;
    while (exp) {
        /* the symmetric check covers negative bases whose square would overflow */
        if (base_ > SQRT_INT64_MAX || base_ < -SQRT_INT64_MAX)
            return 0;
        base_ *= base_;
        if (exp & 1)
            result *= base_;
        exp >>= 1;
    }

    return result;
}
Considerations for this function:
(1 ** N) == 1
(N ** 0) == 1
(0 ** 0) == 1
(0 ** N) == 0
If any overflow or wrapping is going to take place, return 0;
I used int64_t, but any width (signed or unsigned) can be used with little modification. However, if you need to use a non-fixed-width integer type, you will need to change SQRT_INT64_MAX to (int)sqrt(INT_MAX) (in the case of using int) or something similar, which should be optimized, but is uglier and not a C constant expression. Also, casting the result of sqrt() to an int is not very good because of floating-point precision in the case of a perfect square, but as I don't know of any implementation where INT_MAX - or the maximum of any type - is a perfect square, you can live with that.
I have implemented an algorithm that memoizes all computed powers and then uses them when needed. So, for example, x^13 is equal to (x^2)^2^2 * x^2^2 * x, where x^2^2 is taken from the table instead of computing it once again. This is basically an implementation of Pramod's answer (but in C#).
The number of multiplications needed is Ceil(Log n).
public static int Power(int b, int exp)   // "base" is a C# keyword, hence b
{
    if (exp == 0) return 1;
    int[] tab = new int[exp + 1];
    tab[0] = 1;
    tab[1] = b;
    return Power(b, exp, tab);
}

public static int Power(int b, int exp, int[] tab)
{
    if (exp == 0) return 1;
    if (exp == 1) return b;
    int i = 1;
    while (i < exp / 2)
    {
        if (tab[2 * i] <= 0)
            tab[2 * i] = tab[i] * tab[i];
        i = i << 1;
    }
    if (exp <= i)
        return tab[i];
    else
        return tab[i] * Power(b, exp - i, tab);
}
Here is an O(1) algorithm for calculating x ** y, inspired by this comment. It works for 32-bit signed int.
For small values of y, it uses exponentiation by squaring. For large values of y, there are only a few values of x where the result doesn't overflow. This implementation uses a lookup table to read the result without calculating.
On overflow, the C standard permits any behavior, including crash. However, I decided to do bound-checking on LUT indices to prevent memory access violation, which could be surprising and undesirable.
Pseudo-code:
If `x` is between -2 and 2, use special-case formulas.
Otherwise, if `y` is between 0 and 8, use special-case formulas.
Otherwise:
    Set x = abs(x); remember if x was negative
    If x <= 10 and y <= 19:
        Load precomputed result from a lookup table
    Otherwise:
        Set result to 0 (overflow)
    If x was negative and y is odd, negate the result
C code:
#define POW9(x) x * x * x * x * x * x * x * x * x
#define POW10(x) POW9(x) * x
#define POW11(x) POW10(x) * x
#define POW12(x) POW11(x) * x
#define POW13(x) POW12(x) * x
#define POW14(x) POW13(x) * x
#define POW15(x) POW14(x) * x
#define POW16(x) POW15(x) * x
#define POW17(x) POW16(x) * x
#define POW18(x) POW17(x) * x
#define POW19(x) POW18(x) * x

int mypow(int x, unsigned y)
{
    static int table[8][11] = {
        {POW9(3), POW10(3), POW11(3), POW12(3), POW13(3), POW14(3), POW15(3), POW16(3), POW17(3), POW18(3), POW19(3)},
        {POW9(4), POW10(4), POW11(4), POW12(4), POW13(4), POW14(4), POW15(4), 0, 0, 0, 0},
        {POW9(5), POW10(5), POW11(5), POW12(5), POW13(5), 0, 0, 0, 0, 0, 0},
        {POW9(6), POW10(6), POW11(6), 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(7), POW10(7), POW11(7), 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(8), POW10(8), 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(9), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0},
        {POW9(10), 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    };
    int is_neg;
    int r;
    switch (x)
    {
    case 0:
        return y == 0 ? 1 : 0;
    case 1:
        return 1;
    case -1:
        return y % 2 == 0 ? 1 : -1;
    case 2:
        return 1 << y;
    case -2:
        return (y % 2 == 0 ? 1 : -1) << y;
    default:
        switch (y)
        {
        case 0:
            return 1;
        case 1:
            return x;
        case 2:
            return x * x;
        case 3:
            return x * x * x;
        case 4:
            r = x * x;
            return r * r;
        case 5:
            r = x * x;
            return r * r * x;
        case 6:
            r = x * x;
            return r * r * r;
        case 7:
            r = x * x;
            return r * r * r * x;
        case 8:
            r = x * x;
            r = r * r;
            return r * r;
        default:
            is_neg = x < 0;
            if (is_neg)
                x = -x;
            if (x <= 10 && y <= 19)
                r = table[x - 3][y - 9];
            else
                r = 0;
            if (is_neg && y % 2 == 1)
                r = -r;
            return r;
        }
    }
}
My case is a little different, I'm trying to create a mask from a power, but I thought I'd share the solution I found anyway.
Obviously, it only works for powers of 2.
Mask1 = 1 << (Exponent - 1);
Mask2 = Mask1 - 1;
return Mask1 + Mask2;
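
Wrapped as a function (a sketch; Exponent is assumed to be between 1 and the width of the type), the point of the two-step construction is that it never shifts by the full word width:

/* Builds the mask 2^Exponent - 1 without computing 1 << Exponent,
   which would be undefined when Exponent equals the word width. */
unsigned int mask_for(unsigned int Exponent)
{
    unsigned int Mask1 = 1u << (Exponent - 1);
    unsigned int Mask2 = Mask1 - 1;
    return Mask1 + Mask2;
}
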
In case you know the exponent (and it is an integer) at compile-time, you can use templates to unroll the loop. This can be made more efficient, but I wanted to demonstrate the basic principle here:
#include <iostream>
#include <cstdlib>   // for atoi, used in main below

template<unsigned long N>
unsigned long inline exp_unroll(unsigned base) {
    return base * exp_unroll<N - 1>(base);
}
We terminate the recursion using a template specialization:
template<>
unsigned long inline exp_unroll<1>(unsigned base) {
    return base;
}
The exponent needs to be known at compile time; the base can be supplied at runtime:

int main(int argc, char * argv[]) {
    std::cout << argv[1] << "**5 = " << exp_unroll<5>(atoi(argv[1])) << std::endl;
}
I've noticed something strange about the standard exponential squaring algorithm with gnu-GMP :
I implemented 2 nearly-identical functions - a power-modulo function using the most vanilla binary exponential squaring algorithm,
labeled ______2()
then another one basically the same concept, but re-mapped to dividing by 10 at each round instead of dividing by 2,
labeled ______10()
.
( time ( jot - 1456 9999999999 6671 | pvE0 |
gawk -Mbe '
function ______10(_, __, ___, ____, _____, _______) {
__ = +__
____ = (____+=_____=____^= \
(_ %=___=+___)<_)+____++^____—
while (__) {
if (_______= __%____) {
if (__==_______) {
return (_^__ *_____) %___
}
__-=_______
_____ = (_^_______*_____) %___
}
__/=____
_ = _^____%___
}
}
function ______2(_, __, ___, ____, _____) {
__=+__
____+=____=_____^=(_%=___=+___)<_
while (__) {
if (__ %____) {
if (__<____) {
return (_*_____) %___
}
_____ = (_____*_) %___
--__
}
__/=____
_= (_*_) %___
}
}
BEGIN {
OFMT = CONVFMT = "%.250g"
__ = (___=_^= FS=OFS= "=")(_<_)
_____ = __^(_=3)^--_ * ++_-(_+_)^_
______ = _^(_+_)-_ + _^!_
_______ = int(______*_____)
________ = 10 ^ 5 + 1
_________ = 8 ^ 4 * 2 - 1
}
GNU Awk 5.1.1, API: 3.1 (GNU MPFR 4.1.0, GNU MP 6.2.1)
.
($++NF = ______10(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:08 [6.02MiB/s] [6.02MiB/s] [ <=> ]
in0: 15.6MiB 0:00:08 [1.95MiB/s] [1.95MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
8.31s user 0.06s system 103% cpu 8.058 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
($++NF = ______2(_=$___, NR %________ +_________,_______*(_-11))) ^!___'
out9: 48.4MiB 0:00:12 [3.78MiB/s] [3.78MiB/s] [<=> ]
in0: 15.6MiB 0:00:12 [1.22MiB/s] [1.22MiB/s] [ <=> ]
( jot - 1456 9999999999 6671 | pvE 0.1 in0 | gawk -Mbe ; )
13.05s user 0.07s system 102% cpu 12.821 total
ffa16aa937b7beca66a173ccbf8e1e12 stdin
For reasons extremely counter-intuitive and unknown to me, for a wide variety of inputs I threw at it, the div-10 variant is nearly always faster. It's the matching of hashes between the two that made it truly baffling, despite computers obviously not being built in and for a base-10 paradigm.
Am I missing something critical or obvious in the code/approach that might be skewing the results in a confounding manner? Thanks.
