Ramanujan's number in C

[Hardy about Ramanujan]: I remember once going to see him when he was ill at Putney. I had ridden in taxi cab number 1729 and remarked that the number seemed to me rather a dull one, and that I hoped it was not an unfavourable omen. "No," he replied, "it is a very interesting number; it is the smallest number expressible as the sum of two cubes in two different ways."
The two different ways are 1³ + 12³ and 9³ + 10³.
I'm writing a series of functions (in C) to calculate different things related to Ramanujan's numbers. I'm now trying to write a function that returns the i-th Ramanujan's number. Since I've already created a function that checks whether a number is a Ramanujan number or not, the easy way would be to check every number, from 0 to infinity. If a given number is a Ramanujan number, increment a counter by one. Once the counter equals the index I'm looking for, I return the number. In code:
unsigned long ramanujan_index (unsigned long x, int counter, int index)
{
    if (counter == index)
        return x - 1;
    if (is_ramanujan(x))
        return ramanujan_index(x + 1, counter + 1, index);
    else
        return ramanujan_index(x + 1, counter, index);
}
It works, sure, but I'm a little worried that it's not as efficient as it could possibly be. Checking every number doesn't seem like the best solution. More so if we consider the first number is 1729, and the second is 4104. It seems that it'd take quite a lot of steps to find the 5th Ramanujan number (32832 steps, actually, since it has to check every number from 0 to 32832, which is the 5th number). Is there a better way to do so?

Here is a simple program using nested loops to enumerate Ramanujan numbers of different orders. It uses an array to store the number of ways and enumerates cubes to generate sums. The computation is performed in slices to take advantage of CPU caches and allow for ranges that exceed memory size.
This program enumerates Ramanujan numbers of order 2 up to 1 million in less than 0.01s and finds the smallest Ramanujan number of order 4 in a few hours: 6963472309248
#include <stdio.h>
#include <stdlib.h>

#define MAX_SLICE 0x400000 // use 4MB at a time

int main(int argc, char **argv) {
    int order = 2;
    size_t min = 0, max = 1000000, a, a3, b, n, i, n1, n2;

    while (*++argv) {
        char *p;
        n = strtoull(*argv, &p, 0);
        if (*p == '-') {
            min = n;
            max = strtoull(p + 1, NULL, 0);
        } else {
            if (n < 10)
                order = n;
            else
                max = n;
        }
    }
    for (n1 = min; n1 <= max; n1 = n2) {
        size_t slice = (max + 1 - n1 <= MAX_SLICE) ? max + 1 - n1 : MAX_SLICE;
        unsigned char *count = calloc(slice, 1);
        n2 = n1 + slice;
        for (a = 1; (a3 = a * a * a) < n2; a++) {
            if (a3 + a3 >= n1) {
                for (b = 1; b <= a && (n = a3 + b * b * b) < n2; b++) {
                    if (n >= n1)
                        count[n - n1]++;
                }
            }
        }
        for (i = n1; i < n2; i++) {
            if (count[i - n1] >= order)
                printf("%llu\n", (long long unsigned int)i);
        }
        free(count);
    }
    return 0;
}
Runs:
chqrlie$ time ./rama
1729
4104
13832
20683
32832
39312
40033
46683
64232
65728
110656
110808
134379
149389
165464
171288
195841
216027
216125
262656
314496
320264
327763
373464
402597
439101
443889
513000
513856
515375
525824
558441
593047
684019
704977
805688
842751
885248
886464
920673
955016
984067
994688
real 0m0.008s
user 0m0.002s
sys 0m0.002s
chqrlie$ time ./rama 10000000000 2 | wc -l
4724
real 0m7.526s
user 0m7.373s
sys 0m0.061s
chqrlie$ time ./rama 6963000000000-6964000000000 4
6963472309248
real 0m10.383s
user 0m10.243s
sys 0m0.050s


What is the time complexity of exponentiation by squaring?

Here is some code to exponentiate a number to a given power:
#include <stdio.h>

int foo(int m, int k) {
    if (k == 0) {
        return 1;
    } else if (k % 2 != 0) {
        return m * foo(m, k - 1);
    } else {
        int p = foo(m, k / 2);
        return p * p;
    }
}

int main() {
    int m, k;
    while (scanf("%d %d", &m, &k) == 2) {
        printf("%d\n", foo(m, k));
    }
    return 0;
}
How do I calculate the time complexity of the function foo?
I have been able to deduce that if k is a power of 2, the time complexity is O(log k).
But I am finding it difficult to calculate for other values of k. Any help would be much appreciated.
First, I assume that the time needed for each function call is constant (this would for example not be the case if the time needed for a multiplication depends on the numbers being multiplied - which is the case on some computers).
We also assume that k>=1 (otherwise, the function will run endlessly unless there is an overflow).
Let's think the value k as a binary number:
If the rightmost bit is 0 (k%2!=0 is false), the number is shifted right by one bit (foo(m,k/2)) and the function is called recursively.
If the rightmost bit is 1 (k%2!=0 is true), the bit is changed to a 0 (foo(m,k-1)) and the function is called recursively. (We don't look at the case k=1, yet.)
This means that the function is called once for each bit and it is called once for each 1 bit. Or, in other words: It is called once for each 0 bit in the number and twice for each 1 bit.
If N is the number of function calls, n1 is the number of 1 bits and n0 is the number of 0 bits, we get the following formula:
N = n0 + 2*n1 + C
The constant C (C=(-1), if I didn't make a mistake) represents the case k=1 that we ignored up to now.
This means:
N = (n0 + n1) + n1 + C
And - because n0 + n1 = floor(log2(k)) + 1:
floor(log2(k)) + C <= N <= 2*floor(log2(k)) + C
As you can see, the time complexity is always O(log(k)).
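As a concrete check: k = 13 is 1101 in binary, so n1 = 3 and n0 = 1, and the formula gives N = 1 + 2·3 − 1 = 6. Indeed, with the k = 1 case treated as terminal, the call chain is foo(13) → foo(12) → foo(6) → foo(3) → foo(2) → foo(1): six calls.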
Here is a modified version that outputs statistics for a spreadsheet plot.
#include <stdio.h>
#include <math.h>

#ifndef TEST_NUM
#define TEST_NUM (100)
#endif

static size_t iter_count;

int foo(int m, int k) {
    iter_count++;
    if (k == 0) {
        return 1;
    } else if (k == 1) {
        return m;
    } else if (k % 2 != 0) {
        return m * foo(m, k - 1);
    } else {
        int p = foo(m, k / 2);
        return p * p;
    }
}

int main() {
    for (int i = 1; i < TEST_NUM; ++i) {
        iter_count = 0;
        int dummy_result = foo(1, i);
        (void)dummy_result; /* silence unused-variable warnings */
        printf("%d, %zu, %f\n", i, iter_count, log2(i));
    }
    return 0;
}
Build and run it:
gcc t1.c -DTEST_NUM=10000 -lm
./a.out > output.csv
Now open the output file with a spreadsheet program and plot the last two output columns.
For positive k, the function foo calls itself recursively p times if k is the p-th power of 2 (k = 2^p). If k is not a power of 2, the number of recursive calls is strictly less than 2 * p, where p is the exponent of the largest power of 2 less than k.
Here is a demonstration:
Let's expand the recursive call in the case k % 2 != 0:
int foo(int m, int k) {
    if (k == 1) {
        return m;
    } else
    if (k % 2 != 0) { /* 2 recursive calls in the original */
        // return m * foo(m, k - 1);
        int p = foo(m, k / 2);
        return m * p * p;
    } else {          /* 1 recursive call */
        int p = foo(m, k / 2);
        return p * p;
    }
}
The total number of calls is floor(log2(k)) + bitcount(k), and bitcount(k) is by construction <= ceil(log2(k)).
There are no loops in the code and the time of each individual call is bounded by a constant, hence the overall time complexity of O(log k).
The number of times the function is called (recursively or not) per power call is proportional to the number of bits needed to represent the exponent in binary.
Each call reduces the exponent either by one (if it is odd) or by half (if it is even). This means the routine does one squaring per significant bit of the exponent, plus one extra multiplication by the base for each 1 bit in the exponent (at most as many as there are bits). For an exponent with 32 significant bits (that is, an exponent between 2^31 and 2^32), the routine performs between 32 and 64 products and re-enters itself at most 64 times.
Since the recursive calls can be accumulated into running products, the code you posted can be replaced with an iterative version in which a while loop solves the problem.
int foo(int m, int k)
{
    int prod = 1; /* last recursion, foo(m, 0) */
    int sq = m;   /* running squares */
    while (k) {
        if (k & 1) {
            prod *= sq; /* k odd: multiply in the current square */
        }
        k >>= 1;
        sq *= sq;
    }
    return prod; /* final product */
}
That's a huge saving: between 32 and 64 multiplications to raise something to the 1,000,000,000th power.
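To see the count concretely, here is a small test harness of my own (not part of the answer above) that adds a multiplication counter to the iterative version; unsigned arithmetic is used so the inevitable overflow is harmless, since only the count matters:
#include <stdio.h>

static unsigned long mults; /* counts every multiplication performed */

unsigned foo_iter(unsigned m, unsigned k) {
    unsigned prod = 1, sq = m;
    while (k) {
        if (k & 1) {
            prod *= sq; /* one multiplication per 1 bit */
            mults++;
        }
        k >>= 1;
        sq *= sq;       /* one squaring per bit */
        mults++;
    }
    return prod;
}

int main(void) {
    foo_iter(3, 1000000000);
    /* 1,000,000,000 has 30 bits, 13 of them set: expect 30 + 13 = 43 */
    printf("multiplications: %lu\n", mults);
    return 0;
}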

Efficient way to find divisibility

My professor says this isn't an efficient algorithm to check whether a number is divisible by a number from 100,000 to 150,000. I'm having trouble finding a better way. Any help would be appreciated.
unsigned short divisibility_check(unsigned long n) {
    unsigned long i;
    for (i = 100000; i <= 150000; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}
Let's say you need to find whether a positive integer K is divisible by a number between 100,000 and 150,000, and it is such a rare operation, that doing precalculations is just not worth the processor time or memory used.
If K < 100,000, it cannot be divisible by a number between 100,000 and 150,000.
If 100,000 ≤ K ≤ 150,000, it is divisible by itself. It is up to you to decide whether this counts or not.
For a K > 150,000 to be divisible by M, with 100,000 ≤ M ≤ 150,000, K must also be divisible by L = K / M. This is because K = L × M, and all three are positive integers. So, you only need to test the divisibility of K by a set of L, where ⌊ K / 150,000 ⌋ ≤ L ≤ ⌊ K / 100,000 ⌋.
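For example, for K = 1,000,000,000 there are 50,001 possible divisors M, but only L between ⌊K/150,000⌋ = 6,666 and ⌊K/100,000⌋ = 10,000 need checking: 3,335 candidates.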
However, that set of Ls becomes larger than the set of possible Ms when K ≥ 15,000,000,000. Then it is again less work to just test K for divisibility against each M, much like OP's code does now.
When implementing this as a program, the most important thing in practice is, surprisingly, the comments you add. Do not write comments that describe what the code does; write comments that explain the model or algorithm you are trying to implement (say, at the function level), and your intent of what each small block of code should accomplish.
In this particular case, you should probably add a comment to each if clause, explaining your reasoning, much like I did above.
Beginner programmers often omit comments completely. It is unfortunate, because writing good comments is a hard habit to pick up afterwards. It is definitely a good idea to learn to comment your code (as I described above -- the comments that describe what the code does are less than useful; more noise than help), and keep honing your skill on that.
A programmer whose code is maintainable is worth ten geniuses who produce write-only code. This is because all code has bugs, because humans make errors. To be an efficient developer, your code must be maintainable. Otherwise you're forced to rewrite each buggy part from scratch, wasting a lot of time. And, as you can see above, "optimization" at the algorithmic level, i.e. thinking about how to avoid having to do work, yields much better results than trying to optimize your loops or something like that. (You'll find in real life that, surprisingly often, optimizing a loop in the proper way removes the loop completely.)
Even in exercises, proper comments may be the difference between "no points, this doesn't work" and "okay, I'll give you partial credit for this one, because you had a typo/off-by-one bug/thinko on line N, but otherwise your solution would have worked".
As bolov did not understand how the above leads to a "naive_with_checks" function, I'll show it implemented here.
For ease of testing, I'll show a complete test program. Supply the range of integers to test, and the range of divisors accepted, as parameters to the program (i.e. thisprogram 1 500000 100000 150000 to duplicate bolov's tests).
#include <stdlib.h>
#include <inttypes.h>
#include <limits.h>
#include <locale.h>
#include <ctype.h>
#include <stdio.h>
#include <errno.h>

int is_divisible(const uint64_t number,
                 const uint64_t minimum_divisor,
                 const uint64_t maximum_divisor)
{
    uint64_t divisor, minimum_result, maximum_result, result;

    if (number < minimum_divisor) {
        return 0;
    }
    if (number <= maximum_divisor) {
        /* Number itself is a valid divisor. */
        return 1;
    }
    minimum_result = number / maximum_divisor;
    if (minimum_result < 2) {
        minimum_result = 2;
    }
    maximum_result = number / minimum_divisor;
    if (maximum_result < minimum_result) {
        maximum_result = minimum_result;
    }
    if (maximum_result - minimum_result > maximum_divisor - minimum_divisor) {
        /* The number is so large that it is the least amount of work
           to check each possible divisor. */
        for (divisor = minimum_divisor; divisor <= maximum_divisor; divisor++) {
            if (number % divisor == 0) {
                return 1;
            }
        }
        return 0;
    } else {
        /* There are fewer possible results than divisors,
           so we check the results instead. */
        for (result = minimum_result; result <= maximum_result; result++) {
            if (number % result == 0) {
                divisor = number / result;
                if (divisor >= minimum_divisor && divisor <= maximum_divisor) {
                    return 1;
                }
            }
        }
        return 0;
    }
}

int parse_u64(const char *s, uint64_t *to)
{
    unsigned long long value;
    const char *end;

    /* Empty strings are not valid. */
    if (s == NULL || *s == '\0')
        return -1;

    /* Parse as unsigned long long. */
    end = s;
    errno = 0;
    value = strtoull(s, (char **)(&end), 0);
    if (errno == ERANGE)
        return -1;
    if (end == s)
        return -1;

    /* Overflow? */
    if (value > UINT64_MAX)
        return -1;

    /* Skip trailing whitespace. */
    while (isspace((unsigned char)(*end)))
        end++;

    /* If the string does not end here, it has garbage in it. */
    if (*end != '\0')
        return -1;

    if (to)
        *to = (uint64_t)value;
    return 0;
}

int main(int argc, char *argv[])
{
    uint64_t kmin, kmax, dmin, dmax, k, count;

    if (argc != 5) {
        fprintf(stderr, "\n");
        fprintf(stderr, "Usage: %s [ -h | --help | help ]\n", argv[0]);
        fprintf(stderr, "       %s MIN MAX MIN_DIVISOR MAX_DIVISOR\n", argv[0]);
        fprintf(stderr, "\n");
        fprintf(stderr, "This program counts which positive integers between MIN and MAX,\n");
        fprintf(stderr, "inclusive, are divisible by MIN_DIVISOR to MAX_DIVISOR, inclusive.\n");
        fprintf(stderr, "\n");
        return EXIT_SUCCESS;
    }

    /* Use current locale. This may change which codes isspace() considers whitespace. */
    if (setlocale(LC_ALL, "") == NULL)
        fprintf(stderr, "Warning: Your C library does not support your current locale.\n");

    if (parse_u64(argv[1], &kmin) || kmin < 1) {
        fprintf(stderr, "%s: Invalid minimum positive integer to test.\n", argv[1]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[2], &kmax) || kmax < kmin || kmax >= UINT64_MAX) {
        fprintf(stderr, "%s: Invalid maximum positive integer to test.\n", argv[2]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[3], &dmin) || dmin < 2) {
        fprintf(stderr, "%s: Invalid minimum divisor to test for.\n", argv[3]);
        return EXIT_FAILURE;
    }
    if (parse_u64(argv[4], &dmax) || dmax < dmin) {
        fprintf(stderr, "%s: Invalid maximum divisor to test for.\n", argv[4]);
        return EXIT_FAILURE;
    }

    count = 0;
    for (k = kmin; k <= kmax; k++)
        count += is_divisible(k, dmin, dmax);

    printf("%" PRIu64 "\n", count);
    return EXIT_SUCCESS;
}
It is useful to note that the above, running bolov's test, i.e. thisprogram 1 500000 100000 150000 only takes about 15 ms of wall clock time (13 ms CPU time), median, on a much slower Core i5-7200U processor. For really large numbers, like 280,000,000,000 to 280,000,010,000, the test does the maximum amount of work, and takes about 3.5 seconds per 10,000 numbers on this machine.
In other words, I wouldn't trust bolov's numbers to have any relation to timings for properly written test cases.
It is important to note that for any K between 1 and 500,000 (the same range bolov says their code measures), the above code does at most two divisibility tests to find whether K is divisible by an integer between 100,000 and 150,000.
This solution is therefore quite efficient. It is definitely acceptable and near-optimal, when the tested K are relatively small (say, 32 bit unsigned integers or smaller), or when precomputed tables cannot be used.
Even when precomputed tables can be used, it is unclear if/when prime factorization becomes faster than the direct checks. There is certainly a tradeoff in the size and content of the precomputed tables. bolov claims that it is clearly superior to other methods, but hasn't implemented a proper "naive" divisibility test as shown above, and bases their opinion on experiments on quite small integers (1 to 500,000) that have simple prime decompositions.
As an example, a table of integers 1 to 500,000 pre-checked for divisibility takes only 62500 bytes (43750 bytes if only 150,000 to 500,000 are covered). With that table, each test takes a small near-constant time (that only depends on memory and cache effects). Extending it to all 32-bit unsigned integers would require 512 MiB (536,870,912 bytes); the table can be stored in a memory-mapped read-only file, to let the OS kernel manage how much of it is mapped to RAM at any time.
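A sketch of what such a precomputed table could look like, using one bit per integer (the function names here are illustrative, not from the answer):
#include <stdint.h>
#include <stdlib.h>

/* One bit per integer: bit k is set when k is divisible by some
   integer in [100000, 150000]. */
uint8_t *build_bit_table(uint64_t limit) {
    uint8_t *table = calloc((limit + 7) / 8, 1);
    if (!table) return NULL;
    for (uint64_t m = 100000; m <= 150000; m++)
        for (uint64_t k = m; k < limit; k += m)
            table[k / 8] |= (uint8_t)(1u << (k % 8));
    return table;
}

int lookup_divisible(const uint8_t *table, uint64_t k) {
    return (table[k / 8] >> (k % 8)) & 1;
}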
Prime decomposition itself, especially using trial division, becomes more expensive than the naive approach when the number of trial divisions exceeds the range of possible divisors (50,000 divisors in this particular case). As there are 13848 primes (if one counts 1 and 2 as primes) between 1 and 150,000, the number of trial divisions can easily approach the number of divisors for sufficiently large input values.
For numbers with many prime factors, the combinatoric phase (finding whether any subset of the prime factors multiplies to a number between 100,000 and 150,000) is even more problematic. The number of possible combinations grows faster than exponentially. Without careful checks, this phase alone can do far more work per large input number than trial division with each possible divisor would.
(As an example, if you have 16 different prime factors, you already have 65,535 different combinations; more than the number of direct trial divisions. However, all such numbers are larger than 64-bit; the smallest being 2·3·5·7·11·13·17·19·23·29·31·37·41·43·47·53 = 32,589,158,477,190,044,730 which is a 65-bit number.)
There is also the problem of code complexity. The more complex the code, the harder it is to debug and maintain.
Ok, so I've implemented the version with sieve primes and factorization mentioned in the comments by m69 and it is ... way faster than the naive approach. I must admit, I didn't expect this at all.
My notation: left = 100'000 and right = 150'000.
naive: your version
naive_with_checks: your version with simple checks:
    if (n < left): no divisor
    else if (n <= right): divisor
    else if (left * 2 >= right && n < left * 2): no divisor
factorization (above checks implemented):
    Precompute the Sieve of Eratosthenes for all primes up to right. This time is not measured.
    Factorize n (only with the primes from the previous step).
    Generate all subsets (backtracking, depth first: i.e. generate p1^0 * p2^0 * p3^0 first, instead of p1^5 first) with the product < left, or until the product is in [left, right] (found divisor).
factorization_opt: an optimization of the previous algorithm where the subsets are not generated (no vector of subsets is created); I just pass the current product from one backtracking iteration to the next.
Nominal Animal's version: I have also run his version on my system with the same range.
I have written the program in C++, so I won't share it here; a rough C sketch of the factorization idea is shown below.
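For the curious, here is my own minimal C sketch of the approach just described (sieve, factorize, then depth-first search over divisor products). It is an illustration under assumed names and bounds, not bolov's actual C++ program:
#include <stdint.h>

#define LEFT  100000
#define RIGHT 150000

static int primes[14000]; /* there are 13848 primes below 150000 */
static int nprimes;

/* Sieve of Eratosthenes up to RIGHT (done once; not timed above). */
static void init_primes(void) {
    static unsigned char composite[RIGHT + 1];
    for (int i = 2; i <= RIGHT; i++) {
        if (!composite[i]) {
            primes[nprimes++] = i;
            for (int j = 2 * i; j <= RIGHT; j += i)
                composite[j] = 1;
        }
    }
}

/* Depth-first search over divisors built from n's prime factors:
   extend the running product by factor[i]^0 .. factor[i]^count[i]. */
static int search(uint64_t prod, int i, int nf,
                  const uint64_t *factor, const int *count) {
    if (prod >= LEFT && prod <= RIGHT)
        return 1;                 /* found a divisor in range */
    if (i == nf)
        return 0;
    uint64_t p = 1;
    for (int e = 0; e <= count[i]; e++) {
        if (search(prod * p, i + 1, nf, factor, count))
            return 1;
        p *= factor[i];
        if (prod > RIGHT / p)     /* prune: product already too big */
            break;
    }
    return 0;
}

int divisible(uint64_t n) {
    if (n < LEFT) return 0;
    if (n <= RIGHT) return 1;
    if (!nprimes) init_primes();
    /* Factorize n with the sieved primes; larger prime factors of n
       cannot take part in a divisor that is <= RIGHT. */
    uint64_t factor[64];
    int count[64], nf = 0;
    for (int k = 0; k < nprimes; k++) {
        if (n % primes[k] == 0) {
            factor[nf] = (uint64_t)primes[k];
            count[nf] = 0;
            while (n % primes[k] == 0) { n /= primes[k]; count[nf]++; }
            nf++;
        }
    }
    return search(1, 0, nf, factor, count);
}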
I used std::uint64_t as the data type and I have checked all numbers from 1 to 500'000 to see if each is divisible by a number in the interval [100'000, 150'000]. All versions reached the same solution: 170'836 numbers with positive results.
The setup:
Hardware: Intel Core i7-920, 4 cores with HT (all algorithm versions are single threaded), 2.66 GHz (boost 2.93 GHz),
8 MB SmartCache; memory: 6 GB DDR3 triple channel.
Compiler: Visual Studio 2017 (v141), Release x64 mode.
I must also add that I haven't profiled the programs so there is definitely room to improve the implementation. However this is enough here as the idea is to find a better algorithm.
version                 | elapsed time (milliseconds)
------------------------+----------------------------
naive                   | 167'378 ms (' is a thousands separator, i.e. 167 seconds)
naive_with_checks       |  97'197 ms
factorization           |   7'906 ms
factorization_opt       |   7'320 ms
Nominal Animal version  |      14 ms
Some analysis:
For naive vs naive_with_checks: all the numbers in [1, 200'000] can be solved with just the simple checks. As these represent 40% of all the numbers checked, the naive_with_checks version does roughly 60% of the work naive does. The execution time reflects this, as the naive_with_checks runtime is about 58% of the naive version.
The factorization version is a whopping 12.3 times faster than naive_with_checks. That is indeed impressive. I haven't analyzed the time complexity of the algorithm.
And the final optimization brings a further 1.08x speedup. This is basically the time gained by removing the creation and copying of the small vectors of subset factors.
For those interested, the sieve precomputation (which is not included in the timings above) takes about 1 ms, and it is the naive implementation from Wikipedia, with no optimizations whatsoever.
For comparison, here's what I had in mind when I posted my comment about using prime factorization. Compiled with gcc -std=c99 -O3 -m64 -march=haswell this is slightly faster than the naive method with checks and inversion when tested with the last 10,000 integers in the 64-bit range (3.469 vs 3.624 seconds).
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <stdbool.h>

void eratosthenes(bool *ptr, uint64_t size) {
    memset(ptr, true, size);
    for (uint64_t i = 2; i * i < size; i++) {
        if (ptr[i]) {
            for (uint64_t j = i * i; j < size; j += i) {
                ptr[j] = false;
            }
        }
    }
}

bool divisible(uint64_t n, uint64_t a, uint64_t b) {
    /* check for trivial cases first */
    if (n < a) {
        return false;
    }
    if (n <= b) {
        return true;
    }
    if (n < 2 * a) {
        return false;
    }

    /* Inversion: use range n/b ~ n/a; see Nominal Animal's answer */
    if (n < a * b) {
        uint64_t c = a;
        a = (n + b - 1) / b; // n/b rounded up
        b = n / c;
    }

    /* Create prime sieve when first called, or re-calculate it when
       called with a higher value of b; place before inversion in case
       of a large sequential test, to avoid repeated re-calculation. */
    static bool *prime = NULL;
    static uint64_t prime_size = 0;
    if (prime_size <= b) {
        prime_size = b + 1;
        prime = realloc(prime, prime_size * sizeof(bool));
        if (!prime) {
            printf("Out of memory!\n");
            return false;
        }
        eratosthenes(prime, prime_size);
    }

    /* Factorize n into prime factors up to b, using trial division;
       there are more efficient but also more complex ways to do this.
       You could return here, if a factor in the range a~b is found. */
    static uint64_t factor[63];
    uint8_t factors = 0;
    for (uint64_t i = 2; i <= n && i <= b; i++) {
        if (prime[i]) {
            while (n % i == 0) {
                factor[factors++] = i;
                n /= i;
            }
        }
    }

    /* Prepare divisor sieve when first called, or re-allocate it when
       called with a higher value of b; in a higher-level language, you
       would probably use a different data structure for this, because
       this method iterates repeatedly over a potentially sparse array. */
    static bool *divisor = NULL;
    static uint64_t div_size = 0;
    if (div_size <= b / 2) {
        div_size = b / 2 + 1;
        divisor = realloc(divisor, div_size * sizeof(bool));
        if (!divisor) {
            printf("Out of memory!\n");
            return false;
        }
    }
    memset(divisor, false, div_size);
    divisor[1] = true;
    uint64_t max = 1;

    /* Iterate over each prime factor, and for every divisor already in
       the sieve, add the product of the divisor and the factor, up to
       the value b/2. If the product is in the range a~b, return true. */
    for (uint8_t i = 0; i < factors; i++) {
        for (uint64_t j = max; j > 0; j--) {
            if (divisor[j]) {
                uint64_t product = factor[i] * j;
                if (product >= a && product <= b) {
                    return true;
                }
                if (product < div_size) {
                    divisor[product] = true;
                    if (product > max) {
                        max = product;
                    }
                }
            }
        }
    }
    return false;
}

int main() {
    uint64_t count = 0;
    for (uint64_t n = 18446744073709541615LLU; n <= 18446744073709551614LLU; n++) {
        if (divisible(n, 100000, 150000)) ++count;
    }
    printf("%llu", count);
    return 0;
}
And this is the naive + checks + inversion implementation I compared it with:
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

bool divisible(uint64_t n, uint64_t a, uint64_t b) {
    if (n < a) {
        return false;
    }
    if (n <= b) {
        return true;
    }
    if (n < 2 * a) {
        return false;
    }
    if (n < a * b) {
        uint64_t c = a;
        a = (n + b - 1) / b;
        b = n / c;
    }
    while (a <= b) {
        if (n % a++ == 0) return true;
    }
    return false;
}

int main() {
    uint64_t count = 0;
    for (uint64_t n = 18446744073709541615LLU; n <= 18446744073709551614LLU; n++) {
        if (divisible(n, 100000, 150000)) ++count;
    }
    printf("%llu", count);
    return 0;
}
Here's a recursive method with primes. The idea here is that if a number is divisible by a number between 100000 and 150000, there is a path of reducing, by division, the product of only the relevant primes that will pass through a state in the target range. (Note: the code below is meant for numbers greater than 100000 * 150000.) In my testing, I could not find an instance where the stack went over 600 iterations.
# Euler sieve
def getPrimes():
    n = 150000
    a = (n + 1) * [None]
    ps = ([], [])
    s = []
    p = 1
    while p < n:
        p = p + 1
        if not a[p]:
            s.append(p)
            # Save primes less than half of 150000, the only
            # ones needed to construct our candidates.
            if p < 75000:
                ps[0].append(p)
            # Save primes between 100000 and 150000
            # in case our candidate is prime.
            elif p > 100000:
                ps[1].append(p)
            limit = n / p
            new_s = []
            for i in s:
                j = i
                while j <= limit:
                    new_s.append(j)
                    a[j * p] = True
                    j = j * p
            s = new_s
    return ps

ps1, ps2 = getPrimes()
def f(n):
    # Prime candidate
    for p in ps2:
        if not (n % p):
            return True
    # (primes, prime_counts)
    ds = ([], [])
    prod = 1
    # Prepare only the prime factors that could
    # construct a composite candidate.
    for p in ps1:
        while not (n % p):
            prod *= p
            if not ds[0] or ds[0][-1] != p:
                ds[0].append(p)
                ds[1].append(1)
            else:
                ds[1][-1] += 1
            n /= p
    # Reduce the primes product to a state
    # where it's inside our target range.
    stack = [(prod, 0)]
    while stack:
        prod, i = stack.pop()
        # No point in reducing further
        if prod < 100000:
            continue
        # Exit early
        elif prod <= 150000:
            return True
        # Try reducing the product by different
        # prime powers, one prime at a time
        if i < len(ds[0]):
            for p in xrange(ds[1][i] + 1):
                stack.append((prod / ds[0][i] ** p, i + 1))
    return False
Output:
c = 0
for ii in xrange(1099511627776, 1099511628776):
    f_i = f(ii)
    if f_i:
        c += 1
print c  # 239
Here is a very simple solution with a sieve cache. If you call the divisibility_check function for many numbers in a sequence, this should be very efficient:
#include <string.h>

int divisibility_check_sieve(unsigned long n) {
    static unsigned long sieve_min = 1, sieve_max;
    static unsigned char sieve[1 << 19]; /* 1/2 megabyte */
    if (n < sieve_min || n > sieve_max) {
        sieve_min = n & ~(sizeof(sieve) - 1);
        sieve_max = sieve_min + sizeof(sieve) - 1;
        memset(sieve, 1, sizeof sieve);
        for (unsigned long m = 100000; m <= 150000; m++) {
            unsigned long i = sieve_min % m;
            if (i != 0)
                i = m - i;
            for (; i < sizeof sieve; i += m) {
                sieve[i] = 0;
            }
        }
    }
    return sieve[n - sieve_min];
}
Here is a comparative benchmark:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int divisibility_check_naive(unsigned long n) {
    for (unsigned long i = 100000; i <= 150000; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

int divisibility_check_small(unsigned long n) {
    unsigned long i, min = n / 150000, max = n / 100000;
    min += (min == 0);
    max += (max == 0);
    if (max - min > 150000 - 100000) {
        for (i = 100000; i <= 150000; i++) {
            if (n % i == 0) {
                return 0;
            }
        }
        return 1;
    } else {
        for (i = min; i <= max; i++) {
            if (n % i == 0) {
                unsigned long div = n / i;
                if (div >= 100000 && div <= 150000)
                    return 0;
            }
        }
        return 1;
    }
}

int divisibility_check_sieve(unsigned long n) {
    static unsigned long sieve_min = 1, sieve_max;
    static unsigned char sieve[1 << 19]; /* 1/2 megabyte */
    if (n < sieve_min || n > sieve_max) {
        sieve_min = n & ~(sizeof(sieve) - 1);
        sieve_max = sieve_min + sizeof(sieve) - 1;
        memset(sieve, 1, sizeof sieve);
        for (unsigned long m = 100000; m <= 150000; m++) {
            unsigned long i = sieve_min % m;
            if (i != 0)
                i = m - i;
            for (; i < sizeof sieve; i += m) {
                sieve[i] = 0;
            }
        }
    }
    return sieve[n - sieve_min];
}

int main(int argc, char *argv[]) {
    unsigned long n, count = 0, lmin, lmax, range[2] = { 1, 500000 };
    int pos = 0, naive = 0, small = 0, sieve = 1;
    clock_t t;
    char *p;

    for (int i = 1; i < argc; i++) {
        n = strtoul(argv[i], &p, 0);
        if (*p == '\0' && pos < 2)
            range[pos++] = n;
        else if (!strcmp(argv[i], "naive"))
            naive = 1;
        else if (!strcmp(argv[i], "small"))
            small = 1;
        else if (!strcmp(argv[i], "sieve"))
            sieve = 1;
        else
            printf("invalid argument: %s\n", argv[i]);
    }
    lmin = range[0];
    lmax = range[1] + 1;
    if (naive) {
        t = clock();
        for (count = 0, n = lmin; n != lmax; n++) {
            count += divisibility_check_naive(n);
        }
        t = clock() - t;
        printf("naive: [%lu..%lu] -> %lu non-divisible numbers, %10.2fms\n",
               lmin, lmax - 1, count, t * 1000.0 / CLOCKS_PER_SEC);
    }
    if (small) {
        t = clock();
        for (count = 0, n = lmin; n != lmax; n++) {
            count += divisibility_check_small(n);
        }
        t = clock() - t;
        printf("small: [%lu..%lu] -> %lu non-divisible numbers, %10.2fms\n",
               lmin, lmax - 1, count, t * 1000.0 / CLOCKS_PER_SEC);
    }
    if (sieve) {
        t = clock();
        for (count = 0, n = lmin; n != lmax; n++) {
            count += divisibility_check_sieve(n);
        }
        t = clock() - t;
        printf("sieve: [%lu..%lu] -> %lu non-divisible numbers, %10.2fms\n",
               lmin, lmax - 1, count, t * 1000.0 / CLOCKS_PER_SEC);
    }
    return 0;
}
Here are some run times:
naive: [1..500000] -> 329164 non-divisible numbers, 158174.52ms
small: [1..500000] -> 329164 non-divisible numbers, 12.62ms
sieve: [1..500000] -> 329164 non-divisible numbers, 1.35ms
sieve: [0..4294967295] -> 3279784841 non-divisible numbers, 8787.23ms
sieve: [10000000000000000000..10000000001000000000] -> 765978176 non-divisible numbers, 2205.36ms

Largest number formed from an array (understanding a solution)

Write a function that, given a list of non-negative integers, arranges them such that they form the largest possible number. For example, given [0, 1, 2, 3], the largest formed number is 3210.
Logic I understand:
We compare two numbers XY (Y appended at the end of X) and YX (X appended at the end of Y). If XY is larger, then X should come before Y in output, else Y should come before. For example, let X and Y be 542 and 60. To compare X and Y, we compare 54260 and 60542. Since 60542 is greater than 54260, we put Y first. I can also write code for this.
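A minimal sketch of such a comparator, using string concatenation (the function name and buffer sizes are illustrative; it assumes non-negative values that fit in an unsigned int):
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Order X before Y when the concatenation XY is numerically larger
   than YX; both strings have equal length, so strcmp compares them
   as numbers. Usable directly with qsort(). */
int cmp_concat(const void *c, const void *d) {
    unsigned x = *(const unsigned *)c;
    unsigned y = *(const unsigned *)d;
    char xy[24], yx[24];
    snprintf(xy, sizeof xy, "%u%u", x, y);
    snprintf(yx, sizeof yx, "%u%u", y, x);
    return strcmp(yx, xy); /* descending by concatenation */
}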
What surprises me is this solution:
#include <stdio.h>
#include <stdlib.h>
#include <math.h> /* needed for pow(), floor() and log10() */

int swap(const void *c, const void *d) {
    int n1 = *(int *)c;
    int n2 = *(int *)d;
    int a = pow(10, floor(log10(n2)) + 1) * n1 + n2;
    int b = pow(10, floor(log10(n1)) + 1) * n2 + n1;
    if (n1 == 0) return 1;
    if (a < b) return 1;
    return 0;
}

int main() {
    int t = 0, tc = 0;
    scanf("%d", &t);
    for (tc = 1; tc <= t; tc++) {
        int n;
        scanf("%d", &n);
        int arr[n];
        for (int i = 0; i < n; i++) {
            scanf("%d", &arr[i]);
        }
        qsort(arr, n, sizeof(int), swap);
        for (int i = 0; i < n; i++)
            printf("%d", arr[i]);
        printf("\n");
    }
    return 0;
}
To my surprise, it passes all the test cases. Can anyone explain to me this logic?
This does exactly what you described:
int a = pow(10, floor(log10(n2)) + 1) * n1 + n2;
int b = pow(10, floor(log10(n1)) + 1) * n2 + n1;
If we're passed in X and Y, then a is XY, and b is YX.
If you're concatenating 2 and 34, you need to multiply 2 by 100 (to get 200) and then add 34 (to get 234). Where did the 100 come from? It's 10 to the power of the number of digits in 34. To get the number of digits, we take the floor of the base-10 logarithm of 34 and add 1.
So:
log10(34) ~= 1.5
floor(log10(34)) == 1
floor(log10(34)) + 1 == 2
10^2 = 100, so now we know what to multiply the first number by before adding the second.
The second line does the same thing with the variables in the opposite order (computing YX concatenated).
Finally, we return 1 if a < b and 0 otherwise. This makes it a working comparator for a sort function:
if (a < b) return 1;
EDIT
I'm not sure what this line is doing:
if (n1 == 0) return 1;
I think it may be protecting us from the result of log10(0). (I'm not sure what that returns... the mathematical result is negative infinity.)
Basically, the result of this in the comparator is "Put n2 first if n1 is 0," which is always right. (I'm just not 100% sure why it's needed.)
Let's say that an array arr[] is the solution to your problem, i.e. its elements are arranged in such a way as to produce the max result M. Therefore, swapping arbitrary array elements i and j cannot yield a result that would be greater than M.
Consider comparing arbitrary indexes i and j in your comparator function swap, and digits surrounding them:
XXXXXXXX IIIIII XXXXXXXXXXXXXXXX JJJJJJ XXXXXXXXX
-------- ------ ---------------- ------ ---------
arr[...] arr[i] arr[...] arr[j] arr[...]
Note that if the IIIIII block sorts before the JJJJJJ block, it will continue to sort ahead of it regardless of the content of the X blocks. Therefore, comparing individual elements of arr in isolation produces an optimal solution when the entire array is sorted using this comparison.
Your comparator implementation performs this logic using "decimal shifting": if you want to append the digits of x behind the digits of y, you need to decimal-shift y left by the number of digits in x. The number of digits in x is floor(log10(x)) + 1; decimal-shifting left by k positions is achieved by multiplying y by 10^k.
Note: This line
if (n1 == 0) return 1;
should be at the top, before the decimal logarithm is called. There should also be another line
if (n2 == 0) return 0;
to ensure that we do not pass zero to log10.
What has been done in the code is:
Take the array as input
Sort it in descending order (by the concatenation criterion)
Output it
The input & output parts are easy to understand.
Now, the sorting is done using qsort, which accepts a compare function. Though the function in the code is named swap, it is actually a compare function: it returns 1 when the first element is "greater" than the second one, and 0 otherwise. Like: is 54 > 45? Is 45 > 54?
Now, why does a descending sort give the desired output? Let's see an example:
54 > 45. This means that if the bigger number is in the left position, the combined number is greater. A descending sort keeps the greater number on the left.
You already have some very good explanations of why the code you posted works. However, it should be noted that this method suffers from overflow whenever the decimal-shifted version of any number exceeds the maximum representable int. If we assume a 32-bit int, then it has at most 10 digits (2147483647), so comparing relatively small numbers such as 32412 and 12345 will cause problems.
As an alternative we can compare the numbers directly using a recursive function. Let the two numbers be n1 and n2, with d1 and d2 digits respectively. Our comparison function needs to handle three cases:
If d1 == d2 we compare n1 and n2 directly, e.g. 345 and 463
If d1 < d2 we compare n1 to the d1 high-order digits of n2, e.g. for 37 and 398 we compare 37 and 39. If these are equal, we recursively compare n1 with the d2-d1 low-order digits of n2. So for 37 and 378 we'd compare 37 and 8.
If d1 > d2 we can swap n1 and n2 and compare as per case 2, though we then have to reverse the order of the result.
Here's some code to illustrate.
#include <math.h> /* for pow() and log10() */

int compare0(int n1, int d1, int n2, int d2);
int compare1(int n1, int d1, int n2, int d2);
int numDigits(int n);

int swap(const void *c, const void *d)
{
    int n1 = *(int *)c;
    int n2 = *(int *)d;
    int d1 = numDigits(n1);
    int d2 = numDigits(n2);
    return compare0(n1, d1, n2, d2);
}

int compare0(int n1, int d1, int n2, int d2)
{
    if (d1 == d2)
        return n2 - n1;
    else if (d1 < d2)
        return compare1(n1, d1, n2, d2);
    else
        return -compare1(n2, d2, n1, d1);
}

int compare1(int n1, int d1, int n2, int d2)
{
    int pd = (int)pow(10, d2 - d1);
    int nh2 = n2 / pd;
    if (n1 == nh2)
        return compare0(n1, d1, n2 % pd, d2 - d1);
    else
        return nh2 - n1;
}

int numDigits(int n)
{
    return (n == 0) ? 1 : 1 + (int)log10(n);
}

Efficient way to find the sum of digits of an 8 digit number

I have to find the sum of the first 4 digits and the sum of the last 4 digits, and compare them (for all the numbers between m and n). But when I submit my solution online, there's a problem with the time limit.
Here's my code:
#include <stdio.h>

int main()
{
    int M, N, res = 0, cnt, first4, second4, sum1, sum2;
    scanf("%d", &M);
    scanf("%d", &N);
    for (cnt = M; cnt <= N; cnt++)
    {
        first4 = cnt % 10000;
        sum1 = first4 % 10 + (first4 / 10) % 10 + (first4 / 100) % 10 + (first4 / 1000) % 10;
        second4 = cnt / 10000;
        sum2 = second4 % 10 + (second4 / 10) % 10 + (second4 / 100) % 10 + (second4 / 1000) % 10;
        if (sum1 == sum2)
            res++;
    }
    printf("%d", res);
    return 0;
}
I'm trying to find a more efficient way to do this.
Finally, if you are still interested, there is a much faster way to do this.
Your task doesn't specifically require you to calculate the sums for all the numbers,
it only asks for the number of some special numbers.
In such cases optimization techniques like memoization or dynamic programming come really handy.
In this case, when you have the first four digits of some number (let them be 1234),
you calculate their sum (in this case 10), and you immediately know
what the sum of the other four digits is supposed to be.
Any 4-digit number that yields the sum 10 can now be the other half of a valid number.
Therefore the total number of valid numbers beginning with 1234 is exactly the number of all four-digit numbers that give the sum 10.
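(By stars and bars with each digit capped at 9, there are exactly C(13,3) − 4 = 282 such four-digit combinations for the sum 10.)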
Now consider another number, say 3412. This number also has a sum equal to 10;
therefore any right side that completes 1234 also completes 3412.
What this means is that the number of valid numbers beginning with 3412 is the same
as the number of valid numbers beginning with 1234, which is in turn the same as the total number of valid numbers whose first half yields the sum 10.
Therefore if we precompute for each i the number of four digit numbers
that yield the sum i, we would know for each first four digits the exact number of
combinations of last four digits that complete a valid number,
without having to iterate over all 10000 of them.
The following implementation of this algorithm:
Precomputes the number of different ending halves for each sum of the beginning half
Splits the [M,N] interval into three subintervals, because in the first and the last block not every ending is possible
This algorithm runs quadratically faster than the naive implementation (for sufficiently big N-M).
#include <string.h>

int sum_digits(int number) {
    return number % 10 + (number / 10) % 10 + (number / 100) % 10 + (number / 1000) % 10;
}

int count(int M, int N) {
    if (M > N) return 0;
    int ret = 0;
    int tmp = 0;

    // for each i from 0 to 36, precompute the number of ways we can get
    // this sum out of a four-digit number
    int A[37];
    memset(A, 0, sizeof A); /* portable, unlike the original 37*4 */
    for (int i = 0; i <= 9999; ++i) {
        ++A[sum_digits(i)];
    }

    // nearest multiple of 10000 greater than M
    int near_M = ((M + 9999) / 10000) * 10000;
    // nearest multiple of 10000 less than N
    int near_N = (N / 10000) * 10000;

    // count all numbers up to the first multiple of 10000
    tmp = sum_digits(M / 10000);
    if (near_M <= N) {
        for (int i = M; i < near_M; ++i) {
            if (tmp == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    // count all numbers between the 10000 multiples, using the precomputed values
    for (int i = near_M / 10000; i < near_N / 10000; ++i) {
        ret += A[sum_digits(i)];
    }

    // count all numbers after the last multiple of 10000
    tmp = sum_digits(N / 10000);
    if (near_N >= M) {
        for (int i = near_N; i <= N; ++i) {
            if (tmp == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }

    // special case when there are no multiples of 10000 between M and N
    if (near_M > near_N) {
        for (int i = M; i <= N; ++i) {
            if (sum_digits(i / 10000) == sum_digits(i % 10000)) {
                ++ret;
            }
        }
    }
    return ret;
}
EDIT: I fixed the bugs mentioned in the comments.
I don't know if this would be significantly faster or not, but you might try breaking the number into two 4 digit numbers, then use a table lookup to get the sums. That way there's only one division operation instead of eight.
You can pre-compute the table of 10000 sums so that it gets compiled in, meaning there's no runtime cost at all.
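A minimal sketch of that first suggestion (the names are mine; the table is built once at startup here for brevity, though it could equally be a compiled-in static array):
/* Precompute the digit sums of 0..9999 once; afterwards each 8-digit
   number costs one division, one modulo and two table lookups. */
static unsigned char digit_sum[10000];

static void init_digit_sums(void) {
    for (int i = 0; i < 10000; i++)
        digit_sum[i] = i % 10 + i / 10 % 10 + i / 100 % 10 + i / 1000 % 10;
}

static int halves_match(int n) { /* n has 8 digits */
    return digit_sum[n / 10000] == digit_sum[n % 10000];
}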
Another slightly more complicated, but probably much faster, approach is to have a table or map of 10000 elements that is the reverse of the sum lookup table, mapping each sum to the set of four-digit numbers that produce it. Then, finding the result for a complete 10000-number range is a simple lookup on the sum of the most significant four digits. For example, to find the result for the range 12340000 - 12349999, you could use a binary search on the reverse lookup table to quickly find how many numbers in the range 0 - 9999 have the sum 10 (1 + 2 + 3 + 4).
Again - this reverse sum lookup table can be pre-computed and compiled in as a static array.
In this way, the results for complete 10000 number ranges are performed with a couple binary searches. Any partial ranges can also be handled with the reverse lookup table with slightly more complication due to having to ignore matches that are from out of the range of interest. But that complication only has to happen at most twice for your whole set of subranges.
This would reduce the complexity of the algorithm from O(N*N) to O(N log N) (I think).
update:
Here are some timings I got (Win32-x86, using VS 2013 (MSVC 12) with release build default options):
algorithm (range start, range end)    count     time
====================================================
alg1(10000000, 99999999):             4379055   1.854 seconds
alg2(10000000, 99999999):             4379055   0.049 seconds
alg3(10000000, 99999999):             4379055   0.001 seconds
with:
alg1() is the original code from the question
alg2() is my first cut suggestion (lookup precomputed sums)
alg3() is the second suggestion (binary search lookup of sum matches using a table sorted by sums)
I'm actually surprised at the size of the difference between alg1() and alg2().
You are going about this the wrong way. A little bit of cleverness is worth a lot of horsepower. You should not be comparing the first and last four digits of every number.
First - notice that the first four digits will change very slowly - so for sure you can have a loop of 10000 of the last four digits without re-computing the first sum.
Second - the sum of digits repeats itself every 9th number (until carries interfere). This is the basis of the rule "a number is divisible by 9 if its sum of digits is divisible by 9". Example:
1234 - sum = 10
1234 + 9 = 1243 - sum is still 10
What this means is that the following will work pretty well (pseudo code):
take first 4 digits of M, find sum (call it A)
find sum of last four digits of M (call it B)
subtract: C = (A - B)
If C < 9:
D = C%9
first valid number is [A][B+D]. Then step by 9, until...
You need to think a bit about the "until", and also about what to do when C >= 9. This means you need to find a zero in B and replace it with a 9, then repeat the above.
If you want to do nothing else, then see that you don't need to re-compute the sum of digits that did not change. In general, when you add 1 to a number, the sum of digits increases by 1 (unless there is a carry; then it also decreases by 9 per carried position: every 10th number, every 100th (twice, so the sum drops by 18), every 1000th (drop by 27), etc.).
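For example, going from 1299 to 1300, two trailing 9s roll over, so the sum changes by 1 − 2·9: from 21 down to 4.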
I hope this helps you think about the problem differently.
I am going to try an approach which doesn't make use of the lookup table (even though I know the second one should be faster) to investigate how much we can speed up by just optimizing the arithmetic. This algorithm can be used where the stack is an important resource...
Let's work from the idea that divisions and modulos are slow: for example, on a Cortex-R4 a 32-bit division requires up to 16 cycles while a multiplication can be done in a single cycle; with older ARMs things can be even worse.
The basic idea is to get rid of them by using digit arrays instead of integers. To keep it simple, let's show an implementation using sprintf before a pseudo-optimized version.
#include <stdio.h>

int main() {
    int count = 0;
    int nm, nmax; /* nm was missing from the original declarations */
    char num[9] = {0};
    int n;

    printf("Insert number1 ");
    scanf("%d", &nm);
    printf("Insert number2 ");
    scanf("%d", &nmax);

    while (nm <= nmax) {
        int sumup = 0, sumdown = 0;
        sprintf(num, "%d", nm);
        for (n = 0; n < 4; n++) {
            sumup += num[n] - '0';       // subtracting '0' is not necessary (see below)
            sumdown += num[7 - n] - '0'; // subtracting '0' is not necessary (see below)
        }
        if (sumup == sumdown) {
            /* whatever */
            count++;
        }
        nm++;
    }
    return 0;
}
You may want to check that the string is a valid number using strtol before the for loop, and check the length of the string using strlen. I set fixed values here as you required (I assume the length is always 8).
The downside of the algorithm shown is the sprintf in every loop iteration, which may make things worse... So we apply two major changes:
we use [0-9] instead of ['0';'9']
we drop the sprintf for a faster solution which takes in account that we need to format a digit string starting from the previous number (n-1)
Finally, the pseudo-optimized algorithm should look something like the one shown below, in which all divisions and modulos are removed (apart from those for the first number) and bytes are used instead of ASCII characters.
void pseudo_optimized() {
    int count = 0;
    int nmax, nm;
    char num[9] = {0};
    int sumup = 0, sumdown = 0;
    int n, i;

    printf("Insert number1 ");
    scanf("%d", &nm);
    printf("Insert number2 ");
    scanf("%d", &nmax);

    n = nm;
    for (i = 7; i >= 0; i--) {
        num[i] = n % 10;
        n /= 10;
    }
    while (nm <= nmax) {
        sumup = num[0] + num[1] + num[2] + num[3];
        sumdown = num[7] + num[6] + num[5] + num[4];
        if (sumup == sumdown) {
            /* whatever */
            count++;
        }
        nm++;
        /* The following loop is a faster sprintf replacement;
         * it exits at the first digit that is not 9, i.e.
         * 9 times out of 10 after a single iteration.
         */
        for (i = 7; i >= 0; i--) {
            if (num[i] == 9) {
                num[i] = 0;
            } else {
                num[i] += 1;
                break;
            }
        }
    }
}
The original algorithm takes 5.50 s on my VM; this one takes 0.95 s, tested for [00000000 => 99999999].
The weak point of this algorithm is that it still computes sums of digits that are not necessary, and uses a for loop that can be unrolled.
Update: further optimization. The sums of digits are not necessary... thinking about it, I could improve the algorithm in the following way:
int optimized() {
    int nmax = 99999999, nm = 0; /* was "int nmax=99999999, int nm=0;", which does not compile */
    clock_t time1, time2;
    char num[9] = {0};
    int sumup = 0, sumdown = 0;
    int n, i;
    int count = 0;

    n = nm;
    time1 = clock();
    for (i = 7; i >= 0; i--) {
        num[i] = n % 10;
        n /= 10;
    }
    sumup = num[0] + num[1] + num[2] + num[3];
    sumdown = num[7] + num[6] + num[5] + num[4];
    while (nm <= nmax) {
        if (sumup == sumdown) {
            count++;
        }
        nm++;
        for (i = 7; i >= 0; i--) {
            if (num[i] == 9) {
                num[i] = 0;
                if (i > 3)
                    sumdown -= 9;
                else
                    sumup -= 9;
            } else {
                num[i] += 1;
                if (i > 3)
                    sumdown++;
                else
                    sumup++;
                break;
            }
        }
    }
    time2 = clock();
    printf("Final-now %d %f\n", count, ((float)time2 - (float)time1) / 1000000);
    return 0;
}
With this we arrive at 0.76 s, which is still about 3 times slower than the result achieved on the same machine using lookup tables.
Update: optimized and unrolled:
int optimized_unrolled(int nm, int nmax) {
    char num[9] = {0};
    int sumup = 0, sumdown = 0;
    int n, i;
    int count = 0;

    n = nm;
    for (i = 7; i >= 0; i--) {
        num[i] = n % 10;
        n /= 10;
    }
    sumup = num[0] + num[1] + num[2] + num[3];
    sumdown = num[7] + num[6] + num[5] + num[4];
    while (nm <= nmax) {
        if (sumup == sumdown) {
            count++;
        }
        nm++;
        if (num[7] == 9) {
            num[7] = 0;
            if (num[6] == 9) {
                num[6] = 0;
                if (num[5] == 9) {
                    num[5] = 0;
                    if (num[4] == 9) {
                        num[4] = 0;
                        sumdown = 0;
                        if (num[3] == 9) {
                            num[3] = 0;
                            if (num[2] == 9) {
                                num[2] = 0;
                                if (num[1] == 9) {
                                    num[1] = 0;
                                    num[0]++;
                                    sumup -= 26;
                                } else {
                                    num[1]++;
                                    sumup -= 17;
                                }
                            } else {
                                num[2]++;
                                sumup -= 8;
                            }
                        } else {
                            num[3]++;
                            sumup++;
                        }
                    } else {
                        num[4]++;
                        sumdown -= 26;
                    }
                } else {
                    num[5]++;
                    sumdown -= 17;
                }
            } else {
                num[6]++;
                sumdown -= 8;
            }
        } else {
            num[7]++;
            sumdown++;
        }
    }
    return count;
}
Unrolling the loop improves the speed by about 50%. The algorithm now costs 0.36 s; on the other hand it uses the stack a bit more than the previous solution (as some 'if' statements may result in a push), so it cannot always be used. The result is comparable with Michael Burr's Alg2 on the same machine; Alg3 to Alg5 are a lot faster where the stack isn't a concern.
Note: all tests were performed on an Intel VM. I will try to run all of these algorithms on an ARM device if I have time.
#include <stdio.h>

int main() {
    int M, N;
    scanf("%d", &M);
    scanf("%d", &N);

    static int table[10000] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    {
        register int i = 0, i1, i2, i3, i4;
        for (i1 = 0; i1 < 10; ++i1)
            for (i2 = 0; i2 < 10; ++i2)
                for (i3 = 0; i3 < 10; ++i3)
                    for (i4 = 0; i4 < 10; ++i4)
                        table[i++] = table[i1] + table[i2] + table[i3] + table[i4];
    }
    register int cnt = M, second4 = M % 10000;
    int res = 0, first4 = M / 10000, sum1 = table[first4];
    for (; cnt <= N; ++cnt) {
        if (sum1 == table[second4])
            ++res;
        if (++second4 > 9999) {
            second4 -= 10000;
            if (++first4 > 9999) break;
            sum1 = table[first4];
        }
    }
    printf("%d", res);
    return 0;
}
If you know that the numbers have a fixed length like that, then you can use substring operations to get the two halves and compare them. Otherwise, your modulo operations are contributing unnecessary time.
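A small sketch of that idea (the function name is mine; it assumes the number was read as exactly 8 digit characters):
/* Compare the digit sums of the two halves of an 8-character number
   string; the '0' offsets cancel out, so no conversion is needed. */
int halves_match_str(const char *s) {
    int sum1 = s[0] + s[1] + s[2] + s[3];
    int sum2 = s[4] + s[5] + s[6] + s[7];
    return sum1 == sum2;
}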
I found a faster algorithm:
#include <stdio.h>
#include <time.h> /* was <ctime>, which is the C++ header */

int main()
{
    clock_t time1, time2;
    int M, N, res = 0, cnt, first4, second4, sum1, sum2, last4_ofM, first4_ofM, last4_ofN, first4_ofN, j;
    scanf("%d", &M);
    scanf("%d", &N);

    time1 = clock();
    for (cnt = M; cnt <= N; cnt++)
    {
        first4 = cnt % 10000;
        sum1 = first4 % 10 + (first4 / 10) % 10 + (first4 / 100) % 10 + (first4 / 1000) % 10;
        second4 = cnt / 10000;
        sum2 = second4 % 10 + (second4 / 10) % 10 + (second4 / 100) % 10 + (second4 / 1000) % 10;
        if (sum1 == sum2)
            res++;
    }
    time2 = clock();
    printf("%d\n", res);
    printf("first algorithm time: %f\n", ((float)time2 - (float)time1) / 1000000.0F);

    res = 0;
    time1 = clock();
    first4_ofM = M / 10000;
    last4_ofM = M % 10000;
    first4_ofN = N / 10000;
    last4_ofN = N % 10000;
    for (int i = first4_ofM; i <= first4_ofN; i++)
    {
        sum1 = i % 10 + (i / 10) % 10 + (i / 100) % 10 + (i / 1000) % 10;
        if (i == first4_ofM)
            j = last4_ofM;
        else
            j = 0;
        while (j <= 9999)
        {
            sum2 = j % 10 + (j / 10) % 10 + (j / 100) % 10 + (j / 1000) % 10;
            if (sum1 == sum2)
                res++;
            if (i == first4_ofN && j == last4_ofN) break;
            j++;
        }
    }
    time2 = clock();
    printf("%d\n", res);
    printf("second algorithm time: %f\n", ((float)time2 - (float)time1) / 1000000.0F);
    return 0;
}
I just don't need to recompute the sum of the first four digits every time the number changes; I only need to compute it once per 10,000 iterations. In the worst case the output is:
10000000
99999999
4379055
first algorithm time: 5.160000
4379055
second algorithm time: 2.240000
That is about twice as fast.

Faster algorithm to find how many numbers are not divisible by a given set of numbers

I am trying to solve an online judge problem: http://opc.iarcs.org.in/index.php/problems/LEAFEAT
The problem in short:
If we are given an integer L and a set of N integers s1,s2,s3..sN, we have to find how many numbers there are from 0 to L-1 which are not divisible by any of the 'si's.
For example, if we are given, L = 20 and S = {3,2,5} then there are 6 numbers from 0 to 19 which are not divisible by 3,2 or 5.
L <= 1000000000 and N <= 20.
I used the Inclusion-Exclusion principle to solve this problem:
/* Let 'T' be the number of integers that are divisible by any of the
   'si's in the given range */
for i in range 1 to N
    for all subsets A of length i
        if i is odd then:
            T += 1 + (L-1) / lcm(all the elements of A)
        else
            T -= 1 + (L-1) / lcm(all the elements of A)
return T
Here is my code to solve this problem
#include <stdio.h>

int N;
long long int L;
int C[30];
typedef struct { int i, key; } subset_e;
subset_e A[30];
int k;

int gcd(int a, int b) {
    int t;
    while (b != 0) {
        t = a % b;
        a = b;
        b = t;
    }
    return a;
}

long long int lcm(int a, int b) {
    return (a * b) / gcd(a, b);
}

long long int getlcm(int n) {
    if (n == 1) {
        return A[0].key;
    }
    int i;
    long long int rlcm = lcm(A[0].key, A[1].key);
    for (i = 2; i < n; i++) {
        rlcm = lcm(rlcm, A[i].key);
    }
    return rlcm;
}

int next_subset(int n) {
    if (k == n - 1 && A[k].i == N - 1) {
        if (k == 0) {
            return 0;
        }
        k--;
    }
    while (k < n - 1 && A[k].i == A[k + 1].i - 1) {
        if (k <= 0) {
            return 0;
        }
        k--;
    }
    A[k].key = C[A[k].i + 1];
    A[k].i++;
    return 1;
}

int main() {
    int i, j, add;
    long long int sum = 0, g, temp;

    scanf("%lld%d", &L, &N);
    for (i = 0; i < N; i++) {
        scanf("%d", &C[i]);
    }
    for (i = 1; i <= N; i++) {
        add = i % 2;
        for (j = 0; j < i; j++) {
            A[j].key = C[j];
            A[j].i = j;
        }
        temp = getlcm(i);
        g = 1 + (L - 1) / temp;
        if (add) {
            sum += g;
        } else {
            sum -= g;
        }
        k = i - 1;
        while (next_subset(i)) {
            temp = getlcm(i);
            g = 1 + (L - 1) / temp;
            if (add) {
                sum += g;
            } else {
                sum -= g;
            }
        }
    }
    printf("%lld", L - sum);
    return 0;
}
The next_subset(n) generates the next subset of size n in the array A, if there is no subset it returns 0 otherwise it returns 1. It is based on the algorithm described by the accepted answer in this stackoverflow question.
The lcm(a,b) function returns the lcm of a and b.
The get_lcm(n) function returns the lcm of all the elements in A.
It uses the property LCM(a,b,c) = LCM(LCM(a,b),c).
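For example, LCM(4, 6, 10) = LCM(LCM(4, 6), 10) = LCM(12, 10) = 60.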
When I submit the problem to the judge it gives me a 'Time Limit Exceeded'. If we solve this using brute force we get only 50% of the marks.
As there can be up to 2^20 subsets, my algorithm might be slow; hence I need a better algorithm to solve this problem.
EDIT:
After editing my code and changing the function to the Euclidean algorithm, I am getting a wrong answer, but my code now runs within the time limit. It gives a correct answer for the example test but not for any other test cases; here is a link to ideone where I ran my code: the first output is correct but the second is not.
Is my approach to this problem correct? If it is then I have made a mistake in my code, and I'll find it; otherwise can anyone please explain what is wrong?
You could also try changing your lcm function to use the Euclidean algorithm.
int gcd(int a, int b) {
    int t;
    while (b != 0) {
        t = b;
        b = a % t;
        a = t;
    }
    return a;
}

int lcm(int a, int b) {
    return (a * b) / gcd(a, b);
}
At least with Python, the speed differences between the two are pretty large:
>>> %timeit lcm1(103, 2013)
100000 loops, best of 3: 9.21 us per loop
>>> %timeit lcm2(103, 2013)
1000000 loops, best of 3: 1.02 us per loop
Typically, the lowest common multiple of a subset of k of the s_i will exceed L for k much smaller than 20. So you need to stop early.
Probably, just inserting
if (temp >= L) {
    break;
}
after
while(next_subset(i)){
temp = getlcm(i);
will be sufficient.
Also, shortcut if there are any 1s among the s_i: all numbers are divisible by 1.
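For instance, a small sketch against the variable names used in the question's program (C and N):
/* If any si equals 1, every number in [0, L-1] is divisible by it,
   so the answer is 0 and no subset enumeration is needed. */
for (i = 0; i < N; i++) {
    if (C[i] == 1) {
        printf("0");
        return 0;
    }
}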
I think the following will be faster:
unsigned gcd(unsigned a, unsigned b) {
    unsigned r;
    while (b) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}

unsigned recur(unsigned *arr, unsigned len, unsigned idx, unsigned cumul, unsigned bound) {
    if (idx >= len || bound == 0) {
        return bound;
    }
    unsigned i, g, s = arr[idx], result;
    g = s / gcd(cumul, s);
    result = bound / g;
    for (i = idx + 1; i < len; ++i) {
        result -= recur(arr, len, i, cumul * g, bound / g);
    }
    return result;
}

unsigned inex(unsigned *arr, unsigned len, unsigned bound) {
    unsigned i, result = bound;
    for (i = 0; i < len; ++i) {
        result -= recur(arr, len, i, 1, bound);
    }
    return result;
}
call it with
unsigned S[N] = {...};
inex(S, N, L-1);
You need not add the 1 for the 0 anywhere: since 0 is divisible by all numbers, compute the count of numbers 1 <= k < L which are not divisible by any s_i.
Create an array of flags with L entries. Then mark each touched leaf:
for (each size in list of sizes) {
    length = 0;
    while (length < L) {
        array[length] = TOUCHED;
        length += size;
    }
}
Then find the untouched leaves:
for (length = 0; length < L; length++) {
    if (array[length] != TOUCHED) { /* Untouched leaf! */ }
}
Note that there is no multiplication and no division involved; but you will need up to about 1 GiB of RAM. If RAM is a problem then you can use an array of bits (max. 120 MiB).
This is only a beginning though, as there are repeating patterns that can be copied instead of generated. The first pattern is from 0 to S1*S2, the next is from 0 to S1*S2*S3, the next is from 0 to S1*S2*S3*S4, etc.
Basically, you can set all values touched by S1 and then S2 from 0 to S1*S2; then copy the pattern from 0 to S1*S2 until you get to S1*S2*S3 and set all the S3's between S3 and S1*S2*S3; then copy that pattern until you get to S1*S2*S3*S4 and set all the S4's between S4 and S1*S2*S3*S4 and so on.
Next; if S1*S2*...Sn is smaller than L, you know the pattern will repeat and can generate the results for lengths from S1*S2*...Sn to L from the pattern. In this case the size of the array only needs to be S1*S2*...Sn and doesn't need to be L.
Finally, if S1*S2*...Sn is larger than L; then you could generate the pattern for S1*S2*...(Sn-1) and use that pattern to create the results from S1*S2*...(Sn-1) to S1*S2*...Sn. In this case if S1*S2*...(Sn-1) is smaller than L then the array doesn't need to be as large as L.
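As a rough sketch of the first-level pattern copying under these assumptions (the names are mine; S1*S2 serves as the repetition period, since both sizes divide it, and the buffer is byte flags as described above):
#include <stdlib.h>
#include <string.h>

/* Mark multiples of s1 and s2 in the first s1*s2 entries, then
   replicate that repeating block across the rest of the buffer. */
unsigned char *build_flags(size_t s1, size_t s2, size_t total) {
    size_t period = s1 * s2;
    unsigned char *flags = calloc(total, 1);
    if (!flags) return NULL;
    for (size_t i = 0; i < period && i < total; i += s1) flags[i] = 1;
    for (size_t i = 0; i < period && i < total; i += s2) flags[i] = 1;
    for (size_t done = period; done < total; done += period) {
        size_t chunk = total - done < period ? total - done : period;
        memcpy(flags + done, flags, chunk); /* copy instead of re-marking */
    }
    return flags;
}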
I'm afraid your understanding of the problem may not be correct.
You have L. You have a set S of K elements. You must count based on the quotients of L / si. For L = 20, K = 1, S = { 5 }, the answer is simply 16 (20 - 20 / 5). But for K > 1, you must consider the common multiples as well.
Why loop through a list of subsets? It doesn't require subset calculation, only division and multiples.
You have K distinct integers. Each number could be a prime number. You must consider common multiples. That's all.
EDIT
L = 20 and S = {3,2,5}
Leaves that could be eaten by 3: 6
Leaves that could be eaten by 2: 10
Leaves that could be eaten by 5: 4
Common multiples of pairs from S, less than L, not in S: 6, 10, 15
Actually eaten leaves = 20/3 + 20/2 + 20/5 - 20/6 - 20/10 - 20/15 = 6 + 10 + 4 - 3 - 2 - 1 = 14, so 20 - 14 = 6 leaves are never eaten.
You can keep track of the distance until then next touched leaf for each size. The distance to the next touched leaf will be whichever distance happens to be smallest, and you'd subtract this distance from all the others (and wrap whenever the distance is zero).
For example:
int sizes[4] = {2, 5, 7, 9};
int distances[4];
int currentLength = 0;

for (size = 0 to 3) {
    distances[size] = sizes[size];
}

while (currentLength < L) {
    smallest = INT_MAX;
    for (size = 0 to 3) {
        if (distances[size] < smallest) smallest = distances[size];
    }
    for (size = 0 to 3) {
        distances[size] -= smallest;
        if (distances[size] == 0) distances[size] = sizes[size];
    }
    while ((smallest > 1) && (currentLength < L)) {
        currentLength++;
        printf("%d\n", currentLength);
        smallest--;
    }
}
@A.06: are you the one with username linkinmew on OPC?
Anyway, the answer just requires you to generate all possible subsets, and then apply the inclusion-exclusion principle. This will fall well within the time bounds for the given data. For generating all possible subsets, you can easily define a recursive function.
I don't know about programming, but in math there is a simple formula which works on a set whose elements are coprime:
L = 20, S = (3,2,5)
(1 - 1/p)(1 - 1/q)(1 - 1/r)... and so on
(1 - 1/3)(1 - 1/2)(1 - 1/5) = (2/3)(1/2)(4/5) = 4/15
4/15 means there are 4 numbers in each run of 15 numbers which are not divisible by any element of S; the rest can be counted manually, e.g.:
16, 17, 18, 19, 20 (only 17 and 19, meaning there are only 2 numbers that can't be divided by any element of S)
4 + 2 = 6
So there are only 6 numbers among the first 20 that can't be divided by any element of S.

Resources