Algorithm to find Lucky Numbers - c

I came across this question. A number is called lucky if the sum of its digits, as well as the sum of the squares of its digits, is a prime number. How many numbers between A and B are lucky? 1 <= A <= B <= 10^18. I tried this:
First I generated all primes between 1 and the largest value the sum of squared digits can take (81 * 18 = 1458).
I read in A and B and find the maximum digit sum that could occur; if B is a 2-digit number, the maximum is 18 (from 99).
For each prime between 1 and that maximum, I apply an integer-partition algorithm.
For each possible partition I check whether the sum of the squares of its digits is also prime. If so, the distinct permutations of that partition are generated, and those that lie within the range are lucky numbers.
This is the implementation:
#include <stdio.h>
#include <malloc.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>

long long luckynumbers;
int primelist[1500];

int checklucky(long long possible, long long a, long long b){
    int prime = 0;
    while(possible > 0){
        int d = possible % 10;
        prime += d * d;
        possible /= 10;
    }
    if(primelist[prime]) return 1;
    else return 0;
}

long long getmax(int numdigits){
    if(numdigits == 0) return 1;
    long long maxnum = 10;
    while(numdigits > 1){
        maxnum = maxnum * 10;
        numdigits -= 1;
    }
    return maxnum;
}

void permuteandcheck(char *topermute, int d, long long a, long long b, int digits){
    if(d == strlen(topermute)){
        long long possible = atoll(topermute);
        if(possible >= getmax(strlen(topermute) - 1)){ // skip already-counted numbers like 21 vs 021 (permuted: 210)
            if(possible >= a && possible <= b){
                luckynumbers++;
            }
        }
    }
    else{
        char lastswap = '\0';
        int i;
        char temp;
        for(i = d; i < strlen(topermute); i++){
            if(lastswap == topermute[i])
                continue;
            else
                lastswap = topermute[i];
            temp = topermute[d];
            topermute[d] = topermute[i];
            topermute[i] = temp;
            permuteandcheck(topermute, d + 1, a, b, digits);
            temp = topermute[d];
            topermute[d] = topermute[i];
            topermute[i] = temp;
        }
    }
}

void findlucky(long long possible, long long a, long long b, int digits){
    if(checklucky(possible, a, b)){
        char topermute[20]; // 18 digits plus terminator (18 was one byte short)
        sprintf(topermute, "%lld", possible);
        permuteandcheck(topermute, 0, a, b, digits);
    }
}

void partitiongenerator(int k, int n, int numdigits, long long possible, long long a, long long b, int digits){
    if(k > n || numdigits > digits - 1 || k > 9) return;
    if(k == n){
        possible += (k * getmax(numdigits));
        findlucky(possible, a, b, digits);
        return;
    }
    partitiongenerator(k, n - k, numdigits + 1, (possible + k * getmax(numdigits)), a, b, digits);
    partitiongenerator(k + 1, n, numdigits, possible, a, b, digits);
}

void calcluckynumbers(long long a, long long b){
    int numdigits = 0;
    long long temp = b;
    while(temp > 0){
        numdigits++;
        temp /= 10;
    }
    long long maxnum = getmax(numdigits) - 1;
    int maxprime = 0;
    temp = maxnum;
    while(temp > 0){
        maxprime += (temp % 10);
        temp /= 10;
    }
    int start = 2;
    for(; start <= maxprime; start++){
        if(primelist[start]){
            partitiongenerator(0, start, 0, 0, a, b, numdigits);
        }
    }
}

void generateprime(){
    int i = 0;
    for(i = 0; i < 1500; i++)
        primelist[i] = 1;
    primelist[0] = 0;
    primelist[1] = 0;
    int topCandidate = 1499;
    int thisFactor = 2;
    while(thisFactor * thisFactor <= topCandidate){
        int mark = thisFactor + thisFactor;
        while(mark <= topCandidate){
            *(primelist + mark) = 0;
            mark += thisFactor;
        }
        thisFactor++;
        while(thisFactor <= topCandidate && *(primelist + thisFactor) == 0) thisFactor++;
    }
}

int main(){
    int cases = 0, casedone = 0;
    long long a, b;
    generateprime();
    fscanf(stdin, "%d", &cases);
    while(casedone < cases){
        luckynumbers = 0;
        fscanf(stdin, "%lld %lld", &a, &b);
        calcluckynumbers(a, b);
        casedone++;
    }
}
The algorithm is too slow. I think the answer can be found based on some property of the numbers. Kindly share your thoughts. Thank you.

Excellent solution, OleGG, but your code is not optimized. I have made the following changes to it:
The k loop in count_lucky no longer runs over the whole 0..9*9*i range; over 10000 cases that would run that many times, so I reduced the range through the start and end arrays.
I used an ans array to store intermediate results. It might not look like much, but over 10000 cases this is the major factor that reduces the time.
I have tested this code and it passed all the test cases. Here is the modified code:
#include <stdio.h>

const int MAX_LENGTH = 18;
const int MAX_SUM = 162;
const int MAX_SQUARE_SUM = 1458;

int primes[1460];
unsigned long long dyn_table[20][164][1460];
//changed here.......1
unsigned long long ans[19][10][164][1460]; //about 45 MB
int start[19][164]; // sized 164 so index 163 below stays in bounds
int end[19][164];
//upto here.........1

void gen_primes() {
    for (int i = 0; i <= MAX_SQUARE_SUM; ++i) {
        primes[i] = 1;
    }
    primes[0] = primes[1] = 0;
    for (int i = 2; i * i <= MAX_SQUARE_SUM; ++i) {
        if (!primes[i]) {
            continue;
        }
        for (int j = 2; i * j <= MAX_SQUARE_SUM; ++j) {
            primes[i*j] = 0;
        }
    }
}

void gen_table() {
    for (int i = 0; i <= MAX_LENGTH; ++i) {
        for (int j = 0; j <= MAX_SUM; ++j) {
            for (int k = 0; k <= MAX_SQUARE_SUM; ++k) {
                dyn_table[i][j][k] = 0;
            }
        }
    }
    dyn_table[0][0][0] = 1;
    for (int i = 0; i < MAX_LENGTH; ++i) {
        for (int j = 0; j <= 9 * i; ++j) {
            for (int k = 0; k <= 9 * 9 * i; ++k) {
                for (int l = 0; l < 10; ++l) {
                    dyn_table[i + 1][j + l][k + l*l] += dyn_table[i][j][k];
                }
            }
        }
    }
}

unsigned long long count_lucky (unsigned long long maxp) {
    unsigned long long result = 0;
    int len = 0;
    int split_max[MAX_LENGTH];
    while (maxp) {
        split_max[len] = maxp % 10;
        maxp /= 10;
        ++len;
    }
    int sum = 0;
    int sq_sum = 0;
    unsigned long long step_result;
    unsigned long long step_;
    for (int i = len-1; i >= 0; --i) {
        step_result = 0;
        int x1 = 9*i;
        for (int l = 0; l < split_max[i]; ++l) {
            //changed here........2
            step_ = 0;
            if(ans[i][l][sum][sq_sum] != 0)
            {
                step_result += ans[i][l][sum][sq_sum];
                continue;
            }
            int y = l + sum;
            int x = l*l + sq_sum;
            for (int j = 0; j <= x1; ++j) {
                if(primes[j + y])
                    for (int k = start[i][j]; k <= end[i][j]; ++k) {
                        if (primes[k + x]) {
                            step_result += dyn_table[i][j][k];
                            step_ += dyn_table[i][j][k];
                        }
                    }
            }
            ans[i][l][sum][sq_sum] = step_;
            //upto here...............2
        }
        result += step_result;
        sum += split_max[i];
        sq_sum += split_max[i] * split_max[i];
    }
    if (primes[sum] && primes[sq_sum]) {
        ++result;
    }
    return result;
}

int main(int argc, char** argv) {
    gen_primes();
    gen_table();
    //changed here..........3
    for(int i = 0; i <= 18; i++)
        for(int j = 0; j <= 163; j++)
        {
            for(int k = 0; k <= 1458; k++)
                if(dyn_table[i][j][k] != 0ll)
                {
                    start[i][j] = k;
                    break;
                }
            for(int k = 1459; k >= 0; k--) // 1459 is the last valid index (1460 read out of bounds)
                if(dyn_table[i][j][k] != 0ll)
                {
                    end[i][j] = k;
                    break;
                }
        }
    //upto here..........3
    int cases = 0;
    scanf("%d", &cases);
    for (int i = 0; i < cases; ++i) {
        unsigned long long a, b;
        scanf("%llu %llu", &a, &b);
        //changed here......4
        if(b == 1000000000000000000ll)
            b--;
        //upto here.........4
        printf("%llu\n", count_lucky(b) - count_lucky(a-1));
    }
    return 0;
}
Explanation:
gen_primes() and gen_table() are pretty much self-explanatory.
count_lucky() works as follows:
Split the number into split_max[], storing a single digit each for the ones, tens, hundreds, etc. positions.
The idea is: suppose split_max[2] = 7, so we need to calculate the result for
1 in the hundreds position and all of 00 to 99,
2 in the hundreds position and all of 00 to 99,
.
.
7 in the hundreds position and all of 00 to 99.
This is actually done (in the l loop) in terms of the sum of digits and the sum of squared digits, which have been precalculated.
For this example: the sum varies from 0 to 9*i and the sum of squares from 0 to 9*9*i; this is done in the j and k loops.
This is repeated for all lengths in the i loop.
This was the idea of OleGG.
For optimization, the following is considered:
It's useless to run the sum of squares over the full 0 to 9*9*i range, because for a particular digit sum it never covers the whole range. For example, if i = 3 and the sum equals 5, the sum of squares will not vary over all of 0 to 9*9*3. This part is stored in the start[] and end[] arrays using precomputed values.
The value for a particular number of digits, a particular digit at the most significant position, and particular running sums of digits and of squared digits is stored for memoization. It's a big table, but still only about 45 MB.
I believe this could be further optimized.

You should use DP for this task. Here is my solution:
#include <stdio.h>

const int MAX_LENGTH = 18;
const int MAX_SUM = 162;
const int MAX_SQUARE_SUM = 1458;

int primes[1459];
long long dyn_table[19][163][1459];

void gen_primes() {
    for (int i = 0; i <= MAX_SQUARE_SUM; ++i) {
        primes[i] = 1;
    }
    primes[0] = primes[1] = 0;
    for (int i = 2; i * i <= MAX_SQUARE_SUM; ++i) {
        if (!primes[i]) {
            continue;
        }
        for (int j = 2; i * j <= MAX_SQUARE_SUM; ++j) {
            primes[i*j] = 0;
        }
    }
}

void gen_table() {
    for (int i = 0; i <= MAX_LENGTH; ++i) {
        for (int j = 0; j <= MAX_SUM; ++j) {
            for (int k = 0; k <= MAX_SQUARE_SUM; ++k) {
                dyn_table[i][j][k] = 0;
            }
        }
    }
    dyn_table[0][0][0] = 1;
    for (int i = 0; i < MAX_LENGTH; ++i) {
        for (int j = 0; j <= 9 * i; ++j) {
            for (int k = 0; k <= 9 * 9 * i; ++k) {
                for (int l = 0; l < 10; ++l) {
                    dyn_table[i + 1][j + l][k + l*l] += dyn_table[i][j][k];
                }
            }
        }
    }
}

long long count_lucky (long long max) {
    long long result = 0;
    int len = 0;
    int split_max[MAX_LENGTH];
    while (max) {
        split_max[len] = max % 10;
        max /= 10;
        ++len;
    }
    int sum = 0;
    int sq_sum = 0;
    for (int i = len-1; i >= 0; --i) {
        long long step_result = 0;
        for (int l = 0; l < split_max[i]; ++l) {
            for (int j = 0; j <= 9 * i; ++j) {
                for (int k = 0; k <= 9 * 9 * i; ++k) {
                    if (primes[j + l + sum] && primes[k + l*l + sq_sum]) {
                        step_result += dyn_table[i][j][k];
                    }
                }
            }
        }
        result += step_result;
        sum += split_max[i];
        sq_sum += split_max[i] * split_max[i];
    }
    if (primes[sum] && primes[sq_sum]) {
        ++result;
    }
    return result;
}

int main(int argc, char** argv) {
    gen_primes();
    gen_table();
    int cases = 0;
    scanf("%d", &cases);
    for (int i = 0; i < cases; ++i) {
        long long a, b;
        scanf("%lld %lld", &a, &b);
        printf("%lld\n", count_lucky(b) - count_lucky(a-1));
    }
    return 0;
}
Brief explanation:
I'm calculating all primes up to 9 * 9 * MAX_LENGTH using the sieve of Eratosthenes.
Later, using DP, I build the table dyn_table, where the value X in dyn_table[i][j][k] means that there are exactly X numbers of length i with digit sum equal to j and sum of squared digits equal to k.
Then we can easily count the lucky numbers from 1 to 999..999 (len nines): we just sum up all dyn_table[len][j][k] where both j and k are prime.
To count the lucky numbers from 1 to an arbitrary X, we split the interval from 1 to X into intervals with lengths equal to powers of 10 (see the count_lucky function).
The last step is to subtract count_lucky(a-1) from count_lucky(b) (because a is included in our interval).
That's all. Precalculation works in O(log(MAX_NUMBER)^3), and each query has the same complexity.
I've tested my solution against a straightforward linear one and the results were equal.
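As a concrete illustration of the summation step described above, counting the lucky numbers below 10^len can be written like this (a sketch; it reuses primes[] and dyn_table[] from the code above and assumes gen_primes() and gen_table() have already run):
    /* Count lucky numbers in [1, 10^len - 1]. Leading zeros change neither
       digit sum, and 0 itself is excluded because primes[0] == 0. */
    long long count_lucky_below_power(int len) {
        long long total = 0;
        for (int j = 0; j <= 9 * len; ++j)
            for (int k = 0; k <= 9 * 9 * len; ++k)
                if (primes[j] && primes[k])
                    total += dyn_table[len][j][k];
        return total;
    }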

Instead of enumerating the space of numbers, enumerate the different "signatures" of numbers that are lucky, and then generate all the different combinations of digits matching those.
This can be done with trivial backtracking:
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define bitsizeof(e) (CHAR_BIT * sizeof(e))
#define countof(e) (sizeof(e) / sizeof((e)[0]))
#define BITMASK_NTH(type_t, n) ( ((type_t)1) << ((n) & (bitsizeof(type_t) - 1)))
#define OP_BIT(bits, n, shift, op) \
    ((bits)[(unsigned)(n) / (shift)] op BITMASK_NTH(typeof(*(bits)), n))
#define TST_BIT(bits, n) OP_BIT(bits, n, bitsizeof(*(bits)), & )
#define SET_BIT(bits, n) (void)OP_BIT(bits, n, bitsizeof(*(bits)), |= )

/* fast is_prime {{{ */
static uint32_t primes_below_1M[(1U << 20) / bitsizeof(uint32_t)];

static void compute_primes_below_1M(void)
{
    SET_BIT(primes_below_1M, 0);
    SET_BIT(primes_below_1M, 1);
    for (uint32_t i = 2; i < bitsizeof(primes_below_1M); i++) {
        if (TST_BIT(primes_below_1M, i))
            continue;
        for (uint32_t j = i * 2; j < bitsizeof(primes_below_1M); j += i) {
            SET_BIT(primes_below_1M, j);
        }
    }
}

static bool is_prime(uint64_t n)
{
    assert (n < bitsizeof(primes_below_1M));
    return !TST_BIT(primes_below_1M, n);
}
/* }}} */

static uint32_t prime_checks, found;
static char sig[10];
static uint32_t sum, square_sum;

static void backtrack(int startdigit, int ndigits, int maxdigit)
{
    ndigits++;
    for (int i = startdigit; i <= maxdigit; i++) {
        sig[i]++;
        sum += i;
        square_sum += i * i;
        prime_checks++;
        if (is_prime(sum) && is_prime(square_sum)) {
            found++;
        }
        if (ndigits < 18)
            backtrack(0, ndigits, i);
        sig[i]--;
        sum -= i;
        square_sum -= i * i;
    }
}

int main(void)
{
    compute_primes_below_1M();
    backtrack(1, 0, 9);
    printf("did %d signature checks, found %d lucky signatures\n",
           prime_checks, found);
    return 0;
}
When I run it, it does:
$ time ./lucky
did 13123091 signature checks, found 933553 lucky signatures
./lucky 0.20s user 0.00s system 99% cpu 0.201 total
Instead of found++ you want to generate all the distinct permutations of digits that you can build with that signature. I also precompute the primes below 1M once at the start.
I've not checked that the code is 100% correct; you may have to debug it a bit. But the rough idea is here, and I'm able to enumerate all the lucky signatures in under 0.2s (even without bugs it should not be more than twice as slow).
And of course you only want to generate the permutations that lie within A..B. You may also want to skip partitions that have more digits than B or fewer than A. Anyway, you can improve on my general idea from here.
(Note: the blurb at the start is because I cut and pasted code I wrote for Project Euler, hence the very fast is_prime that works for N <= 1M ;) )
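For the found++ replacement, one standard way to count the distinct numbers a signature yields (a sketch of my own, not part of the code above, and ignoring the A..B restriction) is the multinomial coefficient, with leading-zero strings subtracted:
    static uint64_t factorial(int n)
    {
        uint64_t f = 1;
        while (n > 1)
            f *= n--;
        return f;
    }

    /* sig[d] = how many times digit d appears; n = total digits (n <= 18, so
       everything fits in 64 bits). Permutations with a leading zero are
       subtracted: they denote shorter numbers handled by shorter signatures. */
    static uint64_t count_numbers(const char sig[10], int n)
    {
        uint64_t perms = factorial(n);
        for (int d = 0; d < 10; d++)
            perms /= factorial(sig[d]);
        if (sig[0] > 0) {
            uint64_t lead0 = factorial(n - 1) / factorial(sig[0] - 1);
            for (int d = 1; d < 10; d++)
                lead0 /= factorial(sig[d]);
            perms -= lead0;
        }
        return perms;
    }
For example, a signature with the digits 1 and 2 once each gives 2!/(1! * 1!) = 2 numbers: 12 and 21.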

For those who weren't aware already, this is a problem on the website InterviewStreet.com (and in my opinion, the most difficult one there). My approach started off similar to (and was inspired by) OleGG's below. However, after creating the first [19][163][1459] table that he did (which I'll call table1), I went in a slightly different direction. I created a second table of ragged length [19][x][3] (table2), where x is the number of unique sum pairs for the corresponding number of digits. And for the third dimension of the table, with length 3, the 1st element is the quantity of unique "sum pairs" with the sum and squareSum values held by the 2nd and 3rd elements.
For example:
//pseudocode
table2[1] = new long[10][3]
table2[1] = {{1, 0, 0}, {1, 1, 1}, {1, 2, 4},
{1, 3, 9}, {1, 4, 16}, {1, 5, 25},
{1, 6, 36}, {1, 7, 49}, {1, 8, 64}, {1, 9, 81}}
table2[2] = new long[55][3]
table2[3] = new long[204][3]
table2[4] = new long[518][3]
.
.
.
.
table2[17] = new long[15552][3]
table2[18] = new long[17547][3]
The numbers I have for the second dimension length of the array (10, 55, 204, 518, ..., 15552, 17547) can be verified by querying table1, and in a similar fashion table2 can be populated. Now using table2 we can solve large "lucky" queries much faster than OleGG's posted method, although still employing a similar "split" process as he did. For example, if you need to find lucky(00000-54321) (i.e. the lucky numbers between 0 and 54321), it breaks down to the sum of the following 5 lines:
lucky(00000-54321) = {
lucky(00000-49999) +
lucky(50000-53999) +
lucky(54000-54299) +
lucky(54300-54319) +
lucky(54320-54321)
}
Which breaks down further:
lucky(00000-49999) = {
lucky(00000-09999) +
lucky(10000-19999) +
lucky(20000-29999) +
lucky(30000-39999) +
lucky(40000-49999)
}
.
.
lucky(54000-54299) = {
lucky(54000-54099) +
lucky(54100-54199) +
lucky(54200-54299)
}
.
.
.
etc
Each of these values can be obtained easily by querying table2. For example, lucky(40000-49999) is found by adding 4 and 16 to the 2nd and 3rd elements of table2's third dimension:
sum = 0
for (i = 0; i < 518; i++)
    if (isPrime[table2[4][i][1] + 4] && isPrime[table2[4][i][2] + 4*4])
        sum += table2[4][i][0]
return sum
Or for lucky(54200-54299):
sum = 0
for (i = 0; i < 55; i++)
    if (isPrime[table2[2][i][1] + (5+4+2)]
            && isPrime[table2[2][i][2] + (5*5+4*4+2*2)])
        sum += table2[2][i][0]
return sum
Now, OleGG's solution performed significantly faster than anything else I'd tried up until then, but with my modifications described above, it performs even better than before (by a factor of roughly 100x for a large test set). However, it is still not nearly fast enough for the blind test cases given on InterviewStreet. Through some clever hack I was able to determine that I am currently running about 20x too slow to complete their test set in the allotted time.
However, I can find no further optimizations. The biggest time sink here is obviously iterating through the second dimension of table2, and the only way to avoid that would be to tabulate the results of those sums. However, there are too many possibilities to compute them all in the time given (5 seconds) or to store them all in the space given (256MB). For example, the lucky(54200-54299) loop above could be pre-computed and stored as a single value, but if it was, we'd also need to pre-compute lucky(123000200-123000299) and lucky(99999200-99999299), etc. I've done the math and it is way too many calculations to pre-compute.

I have just solved this problem.
It's just a dynamic programming problem. Take DP[n](sum-square_sum) as the DP function: DP[n](sum-square_sum) is the count of all numbers with at most n digits whose digit sum and sum of squared digits are represented by sum and square_sum respectively. For example:
DP[1](1-1) = 1 # only 1 satisfies the condition
DP[2](1-1) = 2 # both 1 and 10 satisfy the condition
DP[3](1-1) = 3 # 1 10 100
DP[3](2-4) = 3 # 11 110 101
We can easily figure out the first DP state DP[1][..][..], which is:
(0-0) => 1 (1-1) => 1 (2-4) => 1 (3-9) => 1 (4-16) => 1
(5-25) => 1 (6-36) => 1 (7-49) => 1 (8-64) => 1 (9-81) => 1
From it we can deduce DP[2], and then DP[3] ... DP[18].
The deduction rests on the fact that every time n increases by 1, for example from DP[1] to DP[2], we get a new digit (0..9), and the set of (sum, square_sum) pairs (i.e. DP[n]) must be updated accordingly.
Finally, we can traverse the DP[18] set and count of the numbers that are lucky.
Well, what about the time and space complexity of the algorithm above?
As we know, sum <= 18*9 = 162 and square_sum <= 18*9*9 = 1458, so the set of (sum, square_sum) pairs
(i.e. DP[n]) is very small, fewer than 162*1458 = 236196 entries; in fact it's much smaller than 236196.
The fact is: my Ruby program counting all the lucky numbers between 0 and 10^18 finishes in less than 1s.
ruby lucky_numbers.rb 0.55s user 0.00s system 99% cpu 0.556 total
I tested my program against a brute-force implementation, and it gives the right answers for numbers less than 10^7.
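For reference, the transition and the final counting step described above look like this in C (a sketch only; the author's Ruby program is not shown, and is_prime is assumed to be a sieve lookup like those in the other answers):
    /* dp[n][s][q] = how many digit strings of length n have digit sum s and
       square sum q; shorter numbers appear as zero-padded strings, which is
       harmless because leading zeros change neither sum. */
    static long long dp[19][163][1459];

    long long count_lucky_up_to_18_digits(const int *is_prime) {
        dp[0][0][0] = 1;
        for (int n = 0; n < 18; ++n)
            for (int s = 0; s <= 9 * n; ++s)
                for (int q = 0; q <= 81 * n; ++q)
                    for (int d = 0; d <= 9; ++d)
                        dp[n + 1][s + d][q + d * d] += dp[n][s][q];
        long long total = 0;
        for (int s = 0; s <= 162; ++s)
            for (int q = 0; q <= 1458; ++q)
                if (is_prime[s] && is_prime[q])
                    total += dp[18][s][q];
        return total;
    }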

Based on the requirements, you can do it in different ways. If I were doing it, I would calculate the prime numbers using the sieve of Eratosthenes over the required range (up to 9^2 * B.length(), the largest possible sum of squared digits), cache them (again, depending on your setup, you can use an in-memory or disk cache) and reuse them for the next run.
I just coded a fast solution (Java), as below (NOTE: Integer overflow is not checked. Just a fast example. Also, my code is not optimized.):
import java.util.ArrayList;
import java.util.Arrays;

public class LuckyNumbers {

    public static void main(String[] args) {
        int a = 0, b = 1000;
        LuckyNumbers luckyNums = new LuckyNumbers();
        ArrayList<Integer> luckyList = luckyNums.findLuckyNums(a, b);
        System.out.println(luckyList);
    }

    private ArrayList<Integer> findLuckyNums(int a, int b) {
        ArrayList<Integer> luckyList = new ArrayList<Integer>();
        int size = ("" + b).length();
        int maxNum = 81 * 4; // 9^2 * b.length(); 9 is used because it's the max digit
        System.out.println("Size : " + size + " MaxNum : " + maxNum);
        boolean[] primeArray = sieve(maxNum);
        for (int i = a; i <= b; i++) {
            String num = "" + i;
            int sumDigits = 0;
            int sumSquareDigits = 0;
            for (int j = 0; j < num.length(); j++) {
                int digit = Integer.valueOf("" + num.charAt(j));
                sumDigits += digit;
                sumSquareDigits += Math.pow(digit, 2);
            }
            if (primeArray[sumDigits] && primeArray[sumSquareDigits]) {
                luckyList.add(i);
            }
        }
        return luckyList;
    }

    private boolean[] sieve(int n) {
        boolean[] prime = new boolean[n + 1];
        Arrays.fill(prime, true);
        prime[0] = false;
        prime[1] = false;
        int m = (int) Math.sqrt(n);
        for (int i = 2; i <= m; i++) {
            if (prime[i]) {
                for (int k = i * i; k <= n; k += i) {
                    prime[k] = false;
                }
            }
        }
        return prime;
    }
}
And the output was:
[11, 12, 14, 16, 21, 23, 25, 32, 38, 41, 49, 52, 56, 58, 61, 65, 83, 85, 94, 101, 102, 104, 106, 110, 111, 113, 119, 120, 131, 133, 137, 140, 146, 160, 164, 166, 173, 179, 191, 197, 199, 201, 203, 205, 210, 223, 229, 230, 232, 250, 289, 292, 298, 302, 308, 311, 313, 317, 320, 322, 331, 335, 337, 344, 346, 353, 355, 364, 368, 371, 373, 377, 379, 380, 386, 388, 397, 401, 409, 410, 416, 434, 436, 443, 449, 461, 463, 467, 476, 490, 494, 502, 506, 508, 520, 533, 535, 553, 559, 560, 566, 580, 595, 601, 605, 610, 614, 616, 634, 638, 641, 643, 647, 650, 656, 661, 665, 674, 683, 689, 698, 713, 719, 731, 733, 737, 739, 746, 764, 773, 779, 791, 793, 797, 803, 805, 829, 830, 836, 838, 850, 863, 869, 883, 892, 896, 904, 911, 917, 919, 922, 928, 937, 940, 944, 955, 968, 971, 973, 977, 982, 986, 991]

I haven't carefully analyzed your current solution, but this might improve on it:
Since the order of digits doesn't matter, you should go through all possible combinations of the digits 0-9 of lengths 1 to 18, keeping track of the digit sum and the sum of squares as you add one digit at a time, reusing the result of the previous calculation.
So if you know that for 12 the digit sum is 3 and the sum of squares is 5, look at the numbers
120, 121, 122, ... etc. and calculate their sums trivially from the 3 and 5 for 12.
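A minimal sketch of that incremental enumeration (my reading of the suggestion, assuming an is_prime() lookup like the sieves in the other answers; zeros are skipped because they change neither sum):
    extern int is_prime(int n); /* assumed: e.g. a sieve lookup as above */
    static long long lucky_signatures = 0;

    /* Enumerate non-decreasing digit combinations of up to max_len digits,
       carrying the running digit sum s and square sum q; appending digit d
       updates them to s + d and q + d*d in O(1). */
    void combos(int min_digit, int len, int max_len, int s, int q) {
        if (len > 0 && is_prime(s) && is_prime(q))
            lucky_signatures++;
        if (len == max_len)
            return;
        for (int d = min_digit; d <= 9; ++d)
            combos(d, len + 1, max_len, s + d, q + d * d);
    }
    /* call as: combos(1, 0, 18, 0, 0); */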

Sometimes the fastest solution is incredibly simple:
uint8_t precomputedBitField[] = {
    ...
};

bool is_lucky(int number) {
    // number >> 3 picks the byte (8 bits per byte); number & 7 picks the bit
    return precomputedBitField[number >> 3] & (1 << (number & 7));
}
Just modify your existing code to generate "precomputedBitField".
If you're worried about size, to cover all numbers from 0 to 999 it will only cost you 125 bytes, so this method will probably be smaller (and a lot faster) than any other alternative.
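A sketch of the generation step, assuming some slow reference check is_lucky_slow() built from your existing code (the name is hypothetical):
    #include <stdint.h>
    #include <stdbool.h>

    uint8_t precomputedBitField[125]; /* covers 0..999, as noted above */

    extern bool is_lucky_slow(int n); /* assumed reference implementation */

    void fill_bitfield(void) {
        for (int n = 0; n < 1000; ++n)
            if (is_lucky_slow(n))
                precomputedBitField[n >> 3] |= (uint8_t)(1 << (n & 7));
    }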

I was trying to come up with a solution using Pierre's enumeration method, but never came up with a sufficiently fast way to count the permutations. OleGG's counting method is very clever, and pirate's optimizations are necessary to make it fast enough. I came up with one minor improvement, and one workaround to a serious problem.
First, the improvement: you don't have to step through all the sums and square sums one by one checking for primes in pirate's j and k loops. You have (or can easily generate) a list of primes. If you use the other variables to figure out which primes are in range, you can just step through the list of suitable primes for the sum and square sum. An array of primes, plus a lookup table giving for each value the index of the first prime >= that value, is helpful. However, this is probably only a fairly minor improvement.
The big issue is with pirate's ans cache array. It is not 45MB as claimed; with 64-bit entries, it is something like 364MB. This is outside the (current) allowed memory limits for C and Java. It can be reduced to 37MB by getting rid of the "l" dimension, which is unnecessary and hurts cache performance anyway. You're really interested in caching counts for l + sum and l*l + squaresum, not l, sum, and squaresum individually.
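For reference, the arithmetic: the four-dimensional cache holds 19 x 10 x 164 x 1460, roughly 45.5 million 8-byte entries, about 364 MB, while a three-dimensional [19][163][1459] cache indexed by (digits, l + sum, l*l + sq_sum) holds roughly 4.5 million entries, about 36 MB.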

First I would like to add that lucky numbers can be calculated by a sieve; an explanation of the sieve can be found here: http://en.wikipedia.org/wiki/Lucky_number
So you can improve your solution's speed by using a sieve to determine the numbers.

Related

Find the least common multiple of multiple numbers

The goal of this program is to find the smallest number that can be divided by the numbers 1 to 20 without any remainders. The code is working but it takes 33 seconds. Can I improve it so that it can be faster? How?
#include <stdio.h>

int main(){
    int number = 19, i, k;
label:
    number++;
    k = 0;
    for (i = 1; i <= 20; i++){
        if (number % i == 0){
            k++;
        }
    }
    if (k != 20){
        goto label;
    }
    printf("%d\n", number);
    return 0;
}
#include <stdio.h>

/* GCD returns the greatest common divisor of a and b (which must be non-zero).
   This algorithm comes from Euclid, Elements, circa 300 BCE, book (chapter)
   VII, propositions 1 and 2.
*/
static unsigned GCD(unsigned a, unsigned b)
{
    while (0 < b)
    {
        unsigned c = a % b;
        a = b;
        b = c;
    }
    return a;
}

int main(void)
{
    static const unsigned Limit = 20;
    unsigned LCM = 1;

    /* Update LCM to the least common multiple of the LCM so far and the next
       i. The least common multiple is obtained by multiplying the numbers
       and removing the duplicated common factors by dividing by the GCD.
    */
    for (unsigned i = 1; i <= Limit; ++i)
        LCM *= i / GCD(LCM, i);

    printf("The least common multiple of numbers from 1 to %u is %u.\n",
           Limit, LCM);
}
Change
    int number = 19;
to
    int number = 0;
and then
    number++;
to
    number += 20;
This is an obvious improvement that will have a significant impact even though it is still a somewhat naive brute-force approach.
At onlinegdb.com your algorithm took 102 seconds to run, whereas this change runs in less than one second and produces the same answer.
The initial product of primes value suggested in a comment will provide a further improvement.
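Put together, the suggested change looks like this (a sketch based on the original program; still brute force, just stepping by 20, since the answer must itself be a multiple of 20):
    #include <stdio.h>

    int main(void){
        int number = 0, i;
        for (;;){
            number += 20;
            for (i = 2; i <= 20; i++)     /* 1 always divides, skip it */
                if (number % i != 0)
                    break;
            if (i > 20)                   /* divisible by all of 2..20 */
                break;
        }
        printf("%d\n", number);
        return 0;
    }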
You need to multiply all the least common multiples together, but omit numbers that could be multiplied together to get any of the others. This translates to multiplying together all primes up to N, with each prime raised to the highest power that is <= N.
const unsigned primes[] = {
    2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47
};

unsigned long long answer(unsigned n){ // for your example n=20
    if (n > 46) return 0; // would overflow a 64-bit unsigned long long
    unsigned long long tmp, ret = 1;
    for (unsigned i = 0; primes[i] <= n; ++i){ // each prime no greater than n
        tmp = primes[i];
        while ((tmp * primes[i]) <= n) // highest power no greater than n
            tmp *= primes[i];
        ret *= tmp;
    }
    return ret;
}
usage: printf("%llu", answer(20));
If my math/code is right this should be fast and cover numbers up to 46. If your compiler supports unsigned __int128 it can be modified to go up to 88.
Explanation:
TL;DR version: all numbers are either prime or can be made by multiplying primes.
To get the least common multiple of a set of numbers, you break each number into its prime factors and multiply together the highest power of each prime.
Primes less than 20:
2, 3, 5, 7, 11, 13, 17, 19
Non-primes under 20:
4 = 2*2
6 = 2*3
8 = 2*2*2
9 = 3*3
10 = 2*5
12 = 2*2*3
14 = 2*7
15 = 3*5
16 = 2*2*2*2
18 = 2*3*3
20 = 2*2*5
From this we see that the maximum number of 2s is 4 and the maximum number of 3s is 2:
2 to the 4th <= 20
3 squared <= 20
All powers > 1 of the remaining primes are greater than 20.
Therefore you get:
2*2*2*2*3*3*5*7*11*13*17*19
which is what you would see if you watched the tmp variable in a debugger.
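(For the record, that product works out to 232792560, the smallest number divisible by every integer from 1 to 20.)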
Another reason this is faster is that it avoids modulus and division (which are expensive on a lot of systems).
Here's a way to do it without a hard-coded prime list or divisions (except for a single sqrt), using a sieve of Eratosthenes (circa 200 BCE).
I mark composites with 1 and maximal prime powers with -1. Then I just loop over the numbers from sqrt(n) to n and multiply together the remaining primes and maximal prime powers.
#include <stdio.h>
#include <math.h>

#define n 20

int main()
{
    int prime[100] = {0};
    int rootN = sqrt(n);
    unsigned long long inc, oldInc;
    int i;

    for (i = 2; i < rootN; i++)
    {
        if (prime[i] == 1) continue;

        // Classic sieve
        inc = i * i;
        while (inc < n)
        {
            prime[inc] = 1;
            inc += i;
        }

        // Max power of prime
        oldInc = i;
        inc = i * i;
        while (inc < n)
        {
            prime[inc] = 1;
            oldInc = inc;
            inc *= i;
        }
        prime[oldInc] = -1;
        prime[i] = 1;
    }

    inc = 1;
    for (i = rootN; i < n; i++)
    {
        if (prime[i] == 0 || prime[i] == -1)
        {
            inc = inc * i;
        }
    }

    printf("%llu", inc);
    return 0;
}

Distribute elements between equivalent arrays to achieve balanced sums

I am given a set of elements from, say, 10 to 21 (always sequential),
and I generate arrays of the same size, where the size is determined at runtime.
Example of 3 generated arrays (the number of arrays is dynamic, as is the number of elements across them; some elements can be 0s, i.e. not used):
A1 = [10, 11, 12, 13]
A2 = [14, 15, 16, 17]
A3 = [18, 19, 20, 21]
These generated arrays will be given to different processes to do some computations on the elements. My aim is to balance the load for every process that will get an array. What I mean is:
with the given example, there are
A1 = 46
A2 = 62
A3 = 78
potential iterations over the elements given to each thread.
I want to rearrange the initial arrays to give an equal amount of work to each process, so for example:
A1 = [21, 11, 12, 13] = 57
A2 = [14, 15, 16, 17] = 62
A3 = [18, 19, 20, 10] = 67
(Not an equal distribution, but fairer than the initial one.) Distributions can differ, as long as they approach some optimal distribution and are better than the worst (initial) case of the first and last arrays. As I see it, different distributions can be achieved using different indexing [where the split of arrays is made {it can be uneven}].
This works fine for the given example, but there may be weird cases...
So, I see this as a reflection problem (for lack of a proper term), where the arrays should be viewed with a diagonal through them, like:
10|111213
1415|1617
181920|21
And then an obvious substitution can be done...
I tried to implement it like this:
if(rest == 0)
    payload_size = (upper - lower) / (processes - 1);
else
    payload_size = (upper - lower) / (processes - 1) + 1;
//printf("payload size: %d\n", payload_size);
long payload[payload_size];
int m = 0;
int k = payload_size / 2;
int added = 0;  // track what has been added so far (to skip over already-added elements)
int added2 = 0; // same as 'added'
int p = 0;
for (i = lower; i <= upper; i = i + payload_size){
    for(j = i; j < (i + payload_size); j++){
        if(j <= upper){
            if((j - i) > k){
                if(added2 > j){
                    added = j;
                    payload[(j-i)] = j;
                    printf("1 adding data: %d at location: %d\n", payload[(j-i)], (j-i));
                }else{
                    printf("else..\n");
                }
            }else{
                if(added < upper - (m+1)){
                    payload[(j-i)] = upper - (p * payload_size) - (m++);
                    added2 = payload[(j-i)];
                    printf("2 adding data: %d at location: %d\n", payload[(j-i)], (j-i));
                }else{
                    payload[(j-i)] = j;
                    printf("2.5 adding data: %d at location: %d\n", payload[(j-i)], (j-i));
                }
            }
        }else{ payload[(j-i)] = '\0'; }
    }
    p++;
    k = k / 2;
    //printf("send to proc: %d\n", ((i)/payload_size)%(processes-1)+1);
}
...but failed horribly.
You can definitely see the problems in the implementation: it is poorly scalable, incomplete, messy, badly written, and so on, and on, and on...
So, I need help either with the implementation or with an idea of a better approach to do what I want to achieve, given the description.
P.S. I need the solution to be as 'in-liney' as possible (avoiding loop nesting) - that is why I am using a bunch of flags and global indexes.
Surely this can be done with extra loops and unnecessary iterations. I invite people who can do, and appreciate, the art of indexing when it comes to arrays.
I am sure there is a solution somewhere out there, but I just cannot come up with an appropriate Google query to find it.
Hint? I thought of using index % size_of_my_data to achieve this task.
P.S. Application: described here
Here is an O(n) solution I wrote using a deque (double-ended queue; a deque is not necessary and a simple array could be used, but a deque makes the code clean because of popright and popleft). The code is Python, not pseudocode, but it should be easy to understand (because it's Python):
def balancingSumProblem(seqStart = None, seqStop = None, numberOfArrays = None):
    from random import randint
    from collections import deque

    seq = deque(xrange(seqStart or randint(1, 10),
                       seqStop and seqStop + 1 or randint(11, 30)))
    arrays = [[] for _ in xrange(numberOfArrays or randint(1, 6))]
    print "# of elements: {}".format(len(seq))
    print "# of arrays: {}".format(len(arrays))
    averageNumElements = float(len(seq)) / len(arrays)
    print "average number of elements per array: {}".format(averageNumElements)

    oddIteration = True
    try:
        while seq:
            for array in arrays:
                if len(array) < averageNumElements and oddIteration:
                    array.append(seq.pop()) # pop() is like popright()
                elif len(array) < averageNumElements:
                    array.append(seq.popleft())
            oddIteration = not oddIteration
    except IndexError:
        pass

    print arrays
    print [sum(array) for array in arrays]

balancingSumProblem(10, 21, 3) # Given Example
print "\n---------\n"
balancingSumProblem() # Randomized Test
Basically, from iteration to iteration, it alternates between grabbing large elements and distributing them evenly across the arrays and grabbing small elements and distributing them evenly. It goes from the outside in (though you could go from the inside out) and uses what should be the average number of elements per array to balance it out further.
It's not 100 percent accurate on all tests, but it does a good job with most randomized tests. You can try running the code here: http://repl.it/cJg
With a simple sequence to assign, you can just iteratively add the min and max elements to each list in turn. There are some termination details to fix up, but that's the general idea. Applied to your example the output would look like:
john-schultzs-macbook-pro:~ jschultz$ ./a.out
10 21 13 18 = 62
11 20 14 17 = 62
12 19 15 16 = 62
A simple reflection assignment like this will be optimal when num_procs evenly divides num_elems. It will be sub-optimal, but still decent, when it doesn't:
#include <stdio.h>

int compute_dist(int lower, int upper, int num_procs)
{
    if (lower > upper || num_procs <= 0)
        return -1;

    int num_elems = upper - lower + 1;
    int num_elems_per_proc_floor = num_elems / num_procs;
    int num_elems_per_proc_ceil = num_elems_per_proc_floor + (num_elems % num_procs != 0);
    int procs[num_procs][num_elems_per_proc_ceil];
    int i, j, sum;

    // assign pairs of (lower, upper) to each process until we can't anymore
    for (i = 0; i + 2 <= num_elems_per_proc_floor; i += 2)
        for (j = 0; j < num_procs; ++j)
        {
            procs[j][i] = lower++;
            procs[j][i+1] = upper--;
        }

    // handle leftovers similarly to the above
    // NOTE: actually you could use just this loop alone if you set i = 0 here, but the above loop is more understandable
    for (; i < num_elems_per_proc_ceil; ++i)
        for (j = 0; j < num_procs; ++j)
            if (lower <= upper)
                procs[j][i] = ((0 == i % 2) ? lower++ : upper--);
            else
                procs[j][i] = 0;

    // print assignment results
    for (j = 0; j < num_procs; ++j)
    {
        for (i = 0, sum = 0; i < num_elems_per_proc_ceil; ++i)
        {
            printf("%d ", procs[j][i]);
            sum += procs[j][i];
        }
        printf(" = %d\n", sum);
    }

    return 0;
}

int main()
{
    compute_dist(10, 21, 3);
    return 0;
}
I have used this implementation, which I mentioned in this report (the implementation works for the cases I've used for testing - the (1-15K), (1-30K) and (1-100K) datasets - but I am not saying that it will be valid for all cases):
int aFunction(long lower, long upper, int payload_size, int processes)
{
    long result, i, j;
    MPI_Status status;
    long payload[payload_size];
    int m = 0;
    int k = (payload_size/2) + (payload_size%2) + 1;
    int lastAdded1 = 0;
    int lastAdded2 = 0;
    int p = 0;
    int substituted = 0;
    int allowUpdate = 1;
    int s;
    int times = 1;
    int times2 = 0;

    for (i = lower; i <= upper; i = i + payload_size){
        for(j = i; j < (i + payload_size); j++){
            if(j <= upper){
                if(k != 0){
                    if((j - i) >= k){
                        payload[(j-i)] = j - (m);
                        lastAdded2 = payload[(j-i)];
                    }else{
                        payload[(j-i)] = upper - (p*payload_size) - (m++) + (p*payload_size);
                        if(allowUpdate){
                            lastAdded1 = payload[(j-i)];
                            allowUpdate = 0;
                        }
                    }
                }else{
                    int n;
                    int from = lastAdded1 > lastAdded2 ? lastAdded2 : lastAdded1;
                    from = from + 1;
                    int to = lastAdded1 > lastAdded2 ? lastAdded1 : lastAdded2;
                    int tempFrom = (to-from)/payload_size + ((to-from)%payload_size > 0 ? 1 : 0);
                    for(s = 0; s < tempFrom; s++){
                        int restIndex = -1;
                        for(n = from; n < from + payload_size; n++){
                            restIndex = restIndex + 1;
                            payload[restIndex] = '\0';
                            if(n < to && n >= from){
                                payload[restIndex] = n;
                            }else{
                                payload[restIndex] = '\0';
                            }
                        }
                        from = from + payload_size;
                    }
                    return 0;
                }
            }else{ payload[(j-i)] = '\0'; }
        }
        p++;
        k = (k/2) + (k%2) + 1;
        allowUpdate = 1;
    }
    return 0;
}

Trailing zeroes in a Factorial

I am trying to write code to calculate the number of trailing zeroes in the factorial of a specific (large) number. For small numbers I get the correct result, but for large ones the deviation keeps increasing. What's wrong with my logic?
#include <stdio.h>

int main(void) {
    int t;
    scanf("%d", &t);
    while (t > 0) {
        int factorten = 0, factorfive = 0, factortwo = 0, remainingfive = 0,
            remainingtwo = 0;
        unsigned int factors = 0;
        unsigned int n;
        scanf("%u", &n);
        for (unsigned int i = n; i > 0; i--) {
            if (i % 10 == 0) {
                factorten++;
                continue;
            } else if (i % 5 == 0) {
                factorfive++;
                continue;
            } else if (i % 2 == 0) {
                // int new = i;
                // while(new % 2 == 0)
                //{
                //    new = new / 2;
                factortwo++;
                //}
                continue;
            }
        }
        factors = factors + factorten;
        printf("%u\n", factors);
        if (factorfive % 2 == 0 && factorfive != 0) {
            factors = factors + (factorfive / 2);
        } else {
            remainingfive = factorfive % 2;
            factors = factors + ((factorfive - remainingfive) / 2);
        }
        printf("%u\n", factors);
        if (factortwo % 5 == 0 && factortwo != 0) {
            factors = factors + (factortwo / 5);
        } else {
            remainingtwo = factortwo % 5;
            factors = factors + ((factortwo - remainingtwo) / 5);
        }
        printf("%u\n", factors);
        if ((remainingfive * remainingtwo % 10) == 0 &&
            (remainingfive * remainingtwo % 10) != 0) {
            factors++;
        }
        printf("%u\n", factors);
        t--;
    }
}
Sample Input:
6
3
60
100
1024
23456
8735373
Sample Output:
0
14
24
253
5861
2183837
My OUTPUT
0
13
23
235
5394
2009134
Edit: ignore the first two, they are suboptimal. The third algorithm is optimal.
I think this does what you're trying to do, but is a lot simpler and works:
int tzif(int n)
{
    int f2 = 0, f5 = 0;
    for (; n > 1; n--)
    {
        int x = n;
        for (; x % 2 == 0; x /= 2)
            f2++;
        for (; x % 5 == 0; x /= 5)
            f5++;
    }
    return f2 > f5 ? f5 : f2;
}
It counts 2-factors and 5-factors of numbers N...2. Then it returns the smaller of the two (because adding 2-factors is useless without adding 5-factors and vice-versa). Your code is too strange for me to analyze.
I think this should work too, because a factorial will have enough 2-factors to "cover" the 5-factors:
int tzif(int n)
{
    int f5 = 0;
    for (; n > 1; n--)
        for (int x = n; x % 5 == 0; x /= 5)
            f5++;
    return f5;
}
This only counts 5-factors and returns that.
Another method I think should work:
int tzif(int n)
{
    int f5 = 0;
    for (int d = 5; d <= n; d *= 5)
        f5 += n / d;
    return f5;
}
Count every fifth number (each has a 5-factor), then every 25th number (each has another 5-factor), etc.
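For example, n = 100 gives 100/5 + 100/25 = 20 + 4 = 24 trailing zeroes (100/125 is already 0), which matches the expected output for 100 above.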
Have 3 counters - c2, c5, c10.
I think the checks should be:
divisible by 5 but not by 10 -> c5++
divisible by 2 but not by 10 -> c2++
divisible by 10 -> count the number of 0's it contributes (c10++)
In the end, the number of 0's will be
smaller_of(c2, c5) + c10
Try to code using this. It should work.
First, the trailing 0s in N! are determined by the factors 2 and 5 (i.e. 10). The factors of 2 always outnumber the factors of 5, so you only need to count the factors of 5 in N!.
floor(N/5) gives you the number of multiples of 5 (5^1) in N!
floor(N/25) gives you the number of multiples of 25 (5^2) in N!
floor(N/125) gives you the number of multiples of 125 (5^3) in N!
...
floor(N/5^n) gives you the number of multiples of 5^n in N!
When you count the multiples of 5 you also include the multiples of 25, 125, ..., 5^n; when you count the multiples of 25 you also include the multiples of 125, ..., 5^n, and so on, so each extra power of 5 is counted exactly once.
So you only need to iterate over the powers of 5 less than or equal to N and add up the number of multiples of each power.
Code:
long long trailing_zeros(long long N) {
    long long zeros = 0;
    for (long long power5 = 5; power5 <= N; power5 *= 5)
        zeros += N / power5;
    return zeros;
}
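For the sample inputs above, trailing_zeros(60) = 60/5 + 60/25 = 12 + 2 = 14, and trailing_zeros(1024) = 204 + 40 + 8 + 1 = 253, both matching the expected output.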
#include <iostream>

int main()
{
    int size, i;
    std::cin >> size;
    int *fact;
    fact = new int[size];
    for (i = 0; i < size; i++)
    {
        std::cin >> fact[size];
    }
    for (i = 0; i < size; i++)
    {
        int con = 5;
        int multiple = 0;
        do
        {
            multiple = multiple + (fact[size] / con);
            con = con * 5;
        } while (con < fact[size]);
        std::cout << multiple << '\n';
    }
    return 0;
}
This code works perfectly for a single input, but for multiple inputs it prints the output for the last entered number. What is wrong? I just can't figure it out.

Faster algorithm to find how many numbers are not divisible by a given set of numbers

I am trying to solve an online judge problem: http://opc.iarcs.org.in/index.php/problems/LEAFEAT
The problem in short:
If we are given an integer L and a set of N integers s1,s2,s3..sN, we have to find how many numbers there are from 0 to L-1 which are not divisible by any of the 'si's.
For example, if we are given, L = 20 and S = {3,2,5} then there are 6 numbers from 0 to 19 which are not divisible by 3,2 or 5.
L <= 1000000000 and N <= 20.
I used the Inclusion-Exclusion principle to solve this problem:
/* Let 'T' be the number of integers that are divisible by any of the si's
   in the given range */
for i in range 1 to N
    for all subsets A of length i
        if i is odd then:
            T += 1 + (L-1)/lcm(all the elements of A)
        else
            T -= 1 + (L-1)/lcm(all the elements of A)
return T
Here is my code to solve this problem
#include <stdio.h>

int N;
long long int L;
int C[30];

typedef struct{int i, key;}subset_e;

subset_e A[30];
int k;

int gcd(int a, int b){
    int t;
    while(b != 0){
        t = a % b;
        a = b;
        b = t;
    }
    return a;
}

long long int lcm(int a, int b){
    return (a * b) / gcd(a, b);
}

long long int getlcm(int n){
    if(n == 1){
        return A[0].key;
    }
    int i;
    long long int rlcm = lcm(A[0].key, A[1].key);
    for(i = 2; i < n; i++){
        rlcm = lcm(rlcm, A[i].key);
    }
    return rlcm;
}

int next_subset(int n){
    if(k == n-1 && A[k].i == N-1){
        if(k == 0){
            return 0;
        }
        k--;
    }
    while(k < n-1 && A[k].i == A[k+1].i-1){
        if(k <= 0){
            return 0;
        }
        k--;
    }
    A[k].key = C[A[k].i+1];
    A[k].i++;
    return 1;
}

int main(){
    int i, j, add;
    long long int sum = 0, g, temp;
    scanf("%lld%d", &L, &N);
    for(i = 0; i < N; i++){
        scanf("%d", &C[i]);
    }
    for(i = 1; i <= N; i++){
        add = i % 2;
        for(j = 0; j < i; j++){
            A[j].key = C[j];
            A[j].i = j;
        }
        temp = getlcm(i);
        g = 1 + (L-1)/temp;
        if(add){
            sum += g;
        } else {
            sum -= g;
        }
        k = i-1;
        while(next_subset(i)){
            temp = getlcm(i);
            g = 1 + (L-1)/temp;
            if(add){
                sum += g;
            } else {
                sum -= g;
            }
        }
    }
    printf("%lld", L - sum);
    return 0;
}
The next_subset(n) function generates the next subset of size n in the array A; if there is no next subset it returns 0, otherwise it returns 1. It is based on the algorithm described by the accepted answer in this Stack Overflow question.
The lcm(a,b) function returns the LCM of a and b.
The getlcm(n) function returns the LCM of all the elements in A.
It uses the property LCM(a,b,c) = LCM(LCM(a,b),c).
When I submit the problem to the judge it gives me 'Time Limit Exceeded'. Solving this with brute force gets only 50% of the marks.
As there can be up to 2^20 subsets, my algorithm might be too slow; hence I need a better algorithm to solve this problem.
EDIT:
After editing my code and changing the function to the Euclidean algorithm, I am getting a wrong answer, but my code runs within the time limit. It gives a correct answer for the example test but not for any other test cases; here is a link to ideone where I ran my code: the first output is correct but the second is not.
Is my approach to this problem correct? If it is then I have made a mistake in my code, and I'll find it; otherwise can anyone please explain what is wrong?
You could also try changing your lcm function to use the Euclidean algorithm.
int gcd(int a, int b) {
    int t;
    while (b != 0) {
        t = b;
        b = a % t;
        a = t;
    }
    return a;
}

int lcm(int a, int b) {
    return (a * b) / gcd(a, b);
}
At least with Python, the speed differences between the two are pretty large:
>>> %timeit lcm1(103, 2013)
100000 loops, best of 3: 9.21 us per loop
>>> %timeit lcm2(103, 2013)
1000000 loops, best of 3: 1.02 us per loop
Typically, the lowest common multiple of a subset of k of the s_i will exceed L for k much smaller than 20. So you need to stop early.
Probably, just inserting
    if (temp >= L) {
        break;
    }
after
    while(next_subset(i)){
        temp = getlcm(i);
will be sufficient.
Also, shortcut if there are any 1s among the s_i: all numbers are divisible by 1.
I think the following will be faster:
unsigned gcd(unsigned a, unsigned b) {
    unsigned r;
    while(b) {
        r = a % b;
        a = b;
        b = r;
    }
    return a;
}

unsigned recur(unsigned *arr, unsigned len, unsigned idx, unsigned cumul, unsigned bound) {
    if (idx >= len || bound == 0) {
        return bound;
    }
    unsigned i, g, s = arr[idx], result;
    g = s / gcd(cumul, s);
    result = bound / g;
    for(i = idx + 1; i < len; ++i) {
        result -= recur(arr, len, i, cumul * g, bound / g);
    }
    return result;
}

unsigned inex(unsigned *arr, unsigned len, unsigned bound) {
    unsigned i, result = bound;
    for(i = 0; i < len; ++i) {
        result -= recur(arr, len, i, 1, bound);
    }
    return result;
}
call it with
unsigned S[N] = {...};
inex(S, N, L-1);
You need not add the 1 for the 0 anywhere: since 0 is divisible by all numbers, just compute the count of numbers 1 <= k < L which are not divisible by any s_i.
Create an array of flags with L entries. Then mark each touched leaf:
for(each size in list of sizes) {
    length = 0;
    while(length < L) {
        array[length] = TOUCHED;
        length += size;
    }
}
Then find the untouched leaves:
for(length = 0; length < L; length++) {
    if(array[length] != TOUCHED) { /* Untouched leaf! */ }
}
Note that there is no multiplication and no division involved; but you will need up to about 1 GiB of RAM. If RAM is a problem then you can use an array of bits (max. 120 MiB).
This is only a beginning though, as there are repeating patterns that can be copied instead of generated. The first pattern is from 0 to S1*S2, the next is from 0 to S1*S2*S3, the next is from 0 to S1*S2*S3*S4, etc.
Basically, you can set all values touched by S1 and then S2 from 0 to S1*S2; then copy the pattern from 0 to S1*S2 until you get to S1*S2*S3 and set all the S3's between S3 and S1*S2*S3; then copy that pattern until you get to S1*S2*S3*S4 and set all the S4's between S4 and S1*S2*S3*S4 and so on.
Next, if S1*S2*...*Sn is smaller than L, you know the pattern will repeat, and you can generate the results for lengths from S1*S2*...*Sn to L from the pattern. In this case the size of the array only needs to be S1*S2*...*Sn and doesn't need to be L.
Finally, if S1*S2*...*Sn is larger than L, you could generate the pattern for S1*S2*...*S(n-1) and use that pattern to create the results from S1*S2*...*S(n-1) to S1*S2*...*Sn. In this case, if S1*S2*...*S(n-1) is smaller than L, the array doesn't need to be as large as L.
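A minimal sketch of the counting step once one full period of the pattern has been generated (my illustration of the idea above, not code from the answer; P stands for the product of the sizes, which is a period of the touched/untouched pattern):
    /* flags[0..P-1] holds one full period of the touched/untouched pattern.
       Untouched entries in 0..L-1 = full periods * untouched-per-period,
       plus the untouched entries in the partial tail. */
    long long count_untouched(const char *flags, long long P, long long L) {
        long long per_period = 0, tail = 0, i;
        for (i = 0; i < P; ++i)
            if (!flags[i]) per_period++;
        for (i = 0; i < L % P; ++i)
            if (!flags[i]) tail++;
        return (L / P) * per_period + tail;
    }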
I'm afraid your understanding of the problem may not be correct.
You have L. You have a set S of K elements. You must count the sum of the quotients L / si. For L = 20, K = 1, S = {5}, the answer is simply 16 (20 - 20/5). But K > 1, so you must consider the common multiples as well.
Why loop through a list of subsets? It doesn't require subset enumeration, only division and multiples.
You have K distinct integers. Each could be a prime number. You must consider common multiples. That's all.
EDIT
L = 20 and S = {3,2,5}
Leaves could be eaten by 3 = 6
Leaves could be eaten by 2 = 10
Leaves could be eaten by 5 = 4
Common multiples of S, less than L, not in S = 6, 10, 15
Actually eaten leaves = 20/3 + 20/2 + 20/5 - 20/6 - 20/10 - 20/15 = 6
You can keep track of the distance until the next touched leaf for each size. The next touched leaf will be at whichever distance happens to be smallest, and you subtract this distance from all the others (wrapping back to the size whenever a distance reaches zero).
For example:
int sizes[4] = {2, 5, 7, 9};
int distances[4];
int currentLength = 0;

for(size = 0 to 3) {
    distances[size] = sizes[size];
}

while(currentLength < L) {
    smallest = INT_MAX;
    for(size = 0 to 3) {
        if(distances[size] < smallest) smallest = distances[size];
    }
    for(size = 0 to 3) {
        distances[size] -= smallest;
        if(distances[size] == 0) distances[size] = sizes[size];
    }
    while( (smallest > 1) && (currentLength < L) ) {
        currentLength++;
        printf("%d\n", currentLength);
        smallest--;
    }
}
#A.06: You are the one with username linkinmew on OPC, right?
Anyway, the answer just requires you to generate all possible subsets and then apply the inclusion-exclusion principle. This will fall well within the time bounds for the given data. For generating all possible subsets, you can easily define a recursive function.
I don't know about programming, but in math there is a theorem which works on a set whose elements are pairwise coprime:
L = 20, S = (3,2,5)
(1-1/p)(1-1/q)(1-1/r)... and so on:
(1-1/3)(1-1/2)(1-1/5) = (2/3)(1/2)(4/5) = 4/15
4/15 means there are 4 numbers in each block of 15 numbers which are not divisible by any element of S; the rest can be counted manually, e.g.
16, 17, 18, 19, 20 (only 17 and 19, meaning there are only 2 numbers that can't be divided by anything in S)
4 + 2 = 6
6/20 means there are only 6 numbers among the first 20 that can't be divided by any element of S.

Optimization of Brute-Force algorithm or Alternative?

I have a simple (brute-force) recursive solver algorithm that takes a lot of time for bigger values of the OpxCnt variable. For small values of OpxCnt, no problem, it works like a charm. The algorithm gets very slow as the OpxCnt variable gets bigger. This is to be expected, but is there any optimization or a different algorithm?
My final goal is: I want to read all the True values in the map array by
executing some number of read operations that have the minimum total operation
cost. This is not the same as the minimum number of read operations.
At function completion, there should be no True value unread.
The map array is populated by some external function; any member may be 1 or 0.
For example ::
map[4] = 1;
map[8] = 1;
1 read operation having Adr=4,Cnt=5 has the lowest cost (35)
whereas
2 read operations having Adr=4,Cnt=1 & Adr=8,Cnt=1 costs (27+27=54)
#include <string.h>

typedef unsigned int Ui32;

#define cntof(x) (sizeof(x) / sizeof((x)[0]))
#define ZERO(x) do{memset(&(x), 0, sizeof(x));}while(0)

typedef struct _S_MB_oper{
    Ui32 Adr;
    Ui32 Cnt;
}S_MB_oper;

typedef struct _S_MB_code{
    Ui32 OpxCnt;
    S_MB_oper OpxLst[20];
    Ui32 OpxPay;
}S_MB_code;

char map[65536] = {0};

static int opx_ListOkey(S_MB_code *px_kod, char *pi_map)
{
    int cost = 0;
    char map[65536];
    memcpy(map, pi_map, sizeof(map));
    for(Ui32 o = 0; o < px_kod->OpxCnt; o++)
    {
        for(Ui32 i = 0; i < px_kod->OpxLst[o].Cnt; i++)
        {
            Ui32 adr = px_kod->OpxLst[o].Adr + i;
            // ...
            if(adr < cntof(map)){map[adr] = 0x0;}
        }
    }
    for(Ui32 i = 0; i < cntof(map); i++)
    {
        if(map[i] > 0x0){return -1;}
    }
    // calculate COST...
    for(Ui32 o = 0; o < px_kod->OpxCnt; o++)
    {
        cost += 12;
        cost += 13;
        cost += (2 * px_kod->OpxLst[o].Cnt);
    }
    px_kod->OpxPay = (Ui32)cost; return cost;
}

static int opx_FindNext(char *map, int pi_idx)
{
    int i;
    if(pi_idx < 0){pi_idx = 0;}
    for(i = pi_idx; i < 65536; i++)
    {
        if(map[i] > 0x0){return i;}
    }
    return -1;
}

static int opx_FindZero(char *map, int pi_idx)
{
    int i;
    if(pi_idx < 0){pi_idx = 0;}
    for(i = pi_idx; i < 65536; i++)
    {
        if(map[i] < 0x1){return i;}
    }
    return -1;
}

static int opx_Resolver(S_MB_code *po_bst, S_MB_code *px_wrk, char *pi_map, Ui32 *px_idx, int _min, int _max)
{
    int pay, kmax, kmin = 1;
    if(*px_idx >= px_wrk->OpxCnt)
    {
        return opx_ListOkey(px_wrk, pi_map);
    }
    _min = opx_FindNext(pi_map, _min);
    // ...
    if(_min < 0){return -1;}
    kmax = (_max - _min) + 1;
    // must be less than 127 !
    if(kmax > 127){kmax = 127;}
    // is this recursion the last one ?
    if(*px_idx >= (px_wrk->OpxCnt - 1))
    {
        kmin = kmax;
    }
    else
    {
        int zero = opx_FindZero(pi_map, _min);
        // ...
        if(zero > 0)
        {
            kmin = zero - _min;
            // enforce kmax limit !?
            if(kmin > kmax){kmin = kmax;}
        }
    }
    for(int _cnt = kmin; _cnt <= kmax; _cnt++)
    {
        px_wrk->OpxLst[*px_idx].Adr = (Ui32)_min;
        px_wrk->OpxLst[*px_idx].Cnt = (Ui32)_cnt;
        (*px_idx)++;
        pay = opx_Resolver(po_bst, px_wrk, pi_map, px_idx, (_min + _cnt), _max);
        (*px_idx)--;
        if(pay > 0)
        {
            if((Ui32)pay < po_bst->OpxPay)
            {
                memcpy(po_bst, px_wrk, sizeof(*po_bst));
            }
        }
    }
    return (int)po_bst->OpxPay;
}

int main()
{
    int _max = -1, _cnt = 0;
    S_MB_code best = {0};
    S_MB_code work = {0};
    // SOME TEST DATA...
    map[ 4] = 1;
    map[ 8] = 1;
    /*
    map[64] = 1;
    map[72] = 1;
    map[80] = 1;
    map[88] = 1;
    map[96] = 1;
    */
    // SOME TEST DATA...
    for(int i = 0; i < cntof(map); i++)
    {
        if(map[i] > 0)
        {
            _max = i; _cnt++;
        }
    }
    // num of Opx can be as much as num of individual bit(s).
    if(_cnt > cntof(work.OpxLst)){_cnt = cntof(work.OpxLst);}
    best.OpxPay = 1000000000L; // invalid great number...
    for(int opx_cnt = 1; opx_cnt <= _cnt; opx_cnt++)
    {
        int rv;
        Ui32 x = 0;
        ZERO(work); work.OpxCnt = (Ui32)opx_cnt;
        rv = opx_Resolver(&best, &work, map, &x, -42, _max);
    }
    return 0;
}
You can use dynamic programming to calculate the lowest cost that covers the first i true values in map[]. Call this f(i). As I'll explain, you can calculate f(i) by looking at all f(j) for j < i, so this will take time quadratic in the number of true values -- much better than exponential. The final answer you're looking for will be f(n), where n is the number of true values in map[].
A first step is to preprocess map[] into a list of the positions of true values. (It's possible to do DP on the raw map[] array, but this will be slower if true values are sparse, and cannot be faster.)
int pos[65537]; // Every position *could* be true; 1-indexed to match scores[]
int nTrue = 0;

void getPosList() {
    for (int i = 0; i < 65536; ++i) {
        if (map[i]) pos[++nTrue] = i; // fills pos[1] .. pos[nTrue]
    }
}
When we're looking at the subproblem on just the first i true values, what we know is that the ith true value must be covered by a read that ends at pos(i). This read could start at the position of any true value j <= i; we don't know which, so we have to test all i of them and pick the best. The key property (optimal substructure) that enables DP here is that in any optimal solution to the i-sized subproblem, if the read that covers the ith true value starts at the jth true value, then the preceding j-1 true values must be covered by an optimal solution to the (j-1)-sized subproblem.
So: f(i) = min(f(j) + score(pos(j+1), pos(i))), with the minimum taken over all 0 <= j < i. pos(k) refers to the position of the kth true value in map[], and score(x, y) is the cost of a read from position x to position y, inclusive.
int scores[65537] = {0}; // We effectively start indexing at 1;
                         // scores[0] = 0: covering the first 0 true values costs nothing

int score(int i, int j);

// Calculate the minimum score that could allow the first i > 0 true values
// to be read, and store it in scores[i].
// We can assume that all lower values have already been calculated.
void calcF(int i) {
    int bestScore = INT_MAX;
    for (int j = 0; j < i; ++j) { // Always executes at least once
        int attemptScore = scores[j] + score(pos[j + 1], pos[i]);
        if (attemptScore < bestScore) {
            bestScore = attemptScore;
        }
    }
    scores[i] = bestScore;
}

int score(int i, int j) {
    return 25 + 2 * (j + 1 - i);
}

int main(int argc, char **argv) {
    // Set up map[] however you want
    getPosList();
    for (int i = 1; i <= nTrue; ++i) {
        calcF(i);
    }
    printf("Optimal solution has cost %d.\n", scores[nTrue]);
    return 0;
}
Extracting a Solution from Scores
Using this scheme, you can calculate the score of an optimal solution: it's simply f(n), where n is the number of true values in map[]. In order to actually construct the solution, you need to read back through the table of f() scores to infer which choice was made:
void printSolution() {
    int i = nTrue;
    while (i) {
        for (int j = 0; j < i; ++j) {
            if (scores[i] == scores[j] + score(pos[j + 1], pos[i])) {
                // We know that a read can be made from pos[j + 1] to pos[i] in
                // an optimal solution, so let's make it.
                printf("Read from %d to %d for cost %d.\n",
                       pos[j + 1], pos[i], score(pos[j + 1], pos[i]));
                i = j;
                break;
            }
        }
    }
}
There may be several possible choices, but all of them will produce optimal solutions.
Further Speedups
The solution above will work for an arbitrary scoring function. Because your scoring function has a simple structure, it may be that even faster algorithms can be developed.
For example, we can prove that there is a gap width above which it is always beneficial to break a single read into two reads. Suppose we have a read from position x-a to x, and another read from position y to y+b, with y > x. The combined costs of these two separate reads are 25 + 2 * (a + 1) + 25 + 2 * (b + 1) = 54 + 2 * (a + b). A single read stretching from x-a to y+b would cost 25 + 2 * (y + b - x + a + 1) = 27 + 2 * (a + b) + 2 * (y - x). Therefore the single read costs 27 - 2 * (y - x) less. If y - x > 13, this difference goes below zero: in other words, it can never be optimal to include a single read that spans a gap of 13 or more.
To make use of this property, inside calcF(), final reads could be tried in decreasing order of start-position (i.e. in increasing order of width), and the inner loop stopped as soon as any gap width exceeds 12. Because that read and all subsequent wider reads tried would contain this too-large gap and therefore be suboptimal, they need not be tried.
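Sketched out, that early-stopping loop might look like this (my adaptation of calcF above, not the author's code; it relies on the gap argument just given and on pos[], scores[] and score() as defined earlier):
    #include <limits.h>

    // Try the final read's start position from right to left (j = i-1 down
    // to 0), i.e. in increasing read width. Once widening the read would make
    // it span a gap of 13 or more untouched positions, stop: by the argument
    // above, that read and every wider one is suboptimal.
    void calcF_pruned(int i) {
        int bestScore = INT_MAX;
        for (int j = i - 1; j >= 0; --j) {
            // Widening the read from start pos[j+2] to pos[j+1] makes it span
            // the gap between them; the gap width is pos[j+2] - pos[j+1] - 1.
            if (j + 2 <= i && pos[j + 2] - pos[j + 1] - 1 >= 13)
                break;
            int attemptScore = scores[j] + score(pos[j + 1], pos[i]);
            if (attemptScore < bestScore)
                bestScore = attemptScore;
        }
        scores[i] = bestScore;
    }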
