I'm working on Project Euler #14 in C and have figured out the basic algorithm. However, it runs insufferably slowly for large limits such as the 2,000,000 I want, presumably because it regenerates each sequence from scratch, even though there should be a way to store known sequences (e.g., once we get to 16, we know from previous experience that the next numbers are 8, 4, 2, then 1).
I'm not exactly sure how to do this with C's fixed-length array, but there must be a good way (that's amazingly efficient, I'm sure). Thanks in advance.
Here's what I currently have, if it helps.
#include <stdio.h>
#define UPTO 2000000
int collatzlen(int n);
int main(){
int i, l=-1, li=-1, c=0;
for(i=1; i<=UPTO; i++){
if( (c=collatzlen(i)) > l) l=c, li=i;
}
printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
return 1;
}
/* n != 0 */
int collatzlen(int n){
int len = 0;
while(n>1) n = (n%2==0 ? n/2 : 3*n+1), len+=1;
return len;
}
Your original program needs 3.5 seconds on my machine. Is it insufferably slow for you?
My dirty and ugly version needs 0.3 seconds. It uses a global array to store the values already calculated and reuses them in later calculations.
#include <stdio.h>
#define UPTO 2000000
int collatzlen2(unsigned long n);
static unsigned long array[UPTO + 1]; // to store those already calculated
int main()
{
int i, l=-1, li=-1, c=0;
int x;
for(x = 0; x < 2000000 + 1; x++) {
array[x] = -1;//use -1 to denote not-calculated yet
}
for(i=1; i<=UPTO; i++){
if( (c=collatzlen2(i)) > l) l=c, li=i;
}
printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
return 0;
}
int collatzlen2(unsigned long n){
unsigned long len = 0;
unsigned long m = n;
while(n > 1){
if(n > 2000000 || array[n] == -1){ // outside range or not-calculated yet
n = (n%2 == 0 ? n/2 : 3*n+1);
len+=1;
}
else{ // if already calculated, use the value
len += array[n];
n = 1; // to get out of the while-loop
}
}
array[m] = len;
return len;
}
Given that this is essentially a throw-away program (i.e. once you've run it and got the answer, you're not going to be supporting it for years :), I would suggest having a global variable to hold the lengths of sequences already calculated:
int lengthfrom[UPTO] = {0};
If your maximum size is a few million, then we're talking megabytes of memory, which should easily fit in RAM at once.
The above will initialise the array to zeros at startup. In your program - for each iteration, check whether the array contains zero. If it does - you'll have to keep going with the computation. If not - then you know that carrying on would go on for that many more iterations, so just add that to the number you've done so far and you're done. And then store the new result in the array, of course.
Don't be tempted to use a local variable for an array of this size: that will try to allocate it on the stack, which won't be big enough and will likely crash.
Also - remember that with this sequence the values go up as well as down, so you'll need to cope with that in your program (probably by having the array longer than UPTO values, and using an assert() to guard against indices greater than the size of the array).
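A minimal sketch of the caching idea described above, under a few assumptions of mine: the names LIMIT and collatz_len are hypothetical, the running value is kept in a 64-bit type so that 3n+1 has room to grow, and values that climb above LIMIT are simply not looked up in the cache (instead of enlarging the array and guarding it with assert()).

```c
#include <stdint.h>

#define LIMIT 2000000u              /* assumed upper bound, matching UPTO */

/* Global, so it lives in static storage and starts out zeroed. */
static uint32_t lengthfrom[LIMIT + 1];

/* Number of steps needed to reach 1 from n (n >= 1). */
uint32_t collatz_len(uint64_t n)
{
    uint64_t start = n;
    uint32_t steps = 0;

    /* Walk until we hit 1 or a value whose length is already cached. */
    while (n != 1 && (n > LIMIT || lengthfrom[n] == 0)) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    if (n != 1)
        steps += lengthfrom[n];     /* add the remembered tail */

    if (start <= LIMIT)
        lengthfrom[start] = steps;  /* remember for later callers */
    return steps;
}
```

Calling collatz_len(i) for i = 1 .. LIMIT in increasing order means that most sequences drop below their starting value after a few steps and finish with a single cache lookup.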
If I recall correctly, your problem isn't a slow algorithm: the algorithm you have now is fast enough for what PE asks you to do. The problem is overflow: you sometimes end up multiplying your number by 3 so many times that it will eventually exceed the maximum value that can be stored in a signed int. Use unsigned ints, and if that still doesn't work (but I'm pretty sure it does), use 64 bit ints (long long).
This should run very fast, but if you want to do it even faster, the other answers already addressed that.
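To see how high a sequence climbs, one can track its peak in a 64-bit variable. This is only an illustrative sketch; collatz_peak is a hypothetical helper, not part of the original program.

```c
#include <stdint.h>

/* Largest value reached by the sequence starting at n (n >= 1). */
uint64_t collatz_peak(uint64_t n)
{
    uint64_t peak = n;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        if (n > peak)
            peak = n;               /* new highest value seen so far */
    }
    return peak;
}
```

Comparing the result with INT_MAX for a given start shows whether a 32-bit signed int would have overflowed. The start 27 only reaches 9232, but 113383, for instance, climbs past INT_MAX.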
Related
I've written a piece of code that uses a static array of size 3000.
Ordinarily, I would just use a for loop to scan in 3000 values, but it appears that I can only ever scan in a maximum of 2048 numbers. To me that seems like an issue with memory allocation, but I'm not sure.
The problem arises because I do not want a user to input the amount of numbers they intend to input. They should only input whatever amount of numbers they want, terminate the scan by inputting 0, after which the program does its work. (Otherwise I would just use malloc.)
The code is a fairly simple number occurrence counter, found below:
int main(int argc, char **argv)
{
int c;
int d;
int j = 0;
int temp;
int array[3000];
int i;
// scanning in elements to array (have just used 3000 because no explicit value for the length of the sequence is included)
for (i = 0; i < 3000; i++)
{
scanf("%d", &array[i]);
if (array[i] == 0)
{
break;
}
}
// sorting
for(c = 0; c < i-1; c++) {
for(d = 0; d < i-c-1; d++) {
if(array[d] > array[d+1]) {
temp = array[d]; // swaps
array[d] = array[d+1];
array[d+1] = temp;
}
}
}
int arrayLength = i + 1; // saving current 'i' value to use as 'n' value before reset
for(i = 0; i < arrayLength; i = j)
{
int numToCount = array[i];
int occurrence = 1; // if a number has been found the occurence is at least 1
for(j = i+1; j < arrayLength; j++) // new loops starts at current position in array +1 to check for duplicates
{
if(array[j] != numToCount) // prints immediately after finding out how many occurences there are, else adds another
{
printf("%d: %d\n", numToCount, occurrence);
break; // this break keeps 'j' at whatever value is NOT the numToCount, thus making the 'i = j' iterator restart the process at the right number
} else {
occurrence++;
}
}
}
return 0;
}
This code works perfectly for any number of inputs below 2048. An example of it not working would be inputting: 1000 1s, 1000 2s, and 1000 3s, after which the program would output:
1: 1000
2: 1000
3: 48
My question is whether there is any way to fix this so that the program will output the right amount of occurrences.
To answer your title question: The size of an array in C is limited (in theory) only by the maximum value that can be represented by a size_t variable. This is typically a 32- or 64-bit unsigned integer, so you can have (for the 32-bit case) over 4 billion elements (or much, much more in 64-bit systems).
However, what you are probably encountering in your code is a limit on the memory available to the program, where the line int array[3000]; declares an automatic variable. Space for these is generally allocated on the stack - which is a chunk of memory of limited size made available when the function (or main) is called. This memory has limited size and, in your case (assuming 32-bit, 4-byte integers), you are taking 12,000 bytes from the stack, which may cause problems.
There are two (maybe more?) ways to fix the problem. First, you could declare the array static - this would make the compiler pre-allocate the memory, so it would not need to be taken from the stack at run-time:
static int array[3000];
A second, probably better, approach would be to call malloc to allocate memory for the array; this assigns memory from the heap - which has (on almost all systems) considerably more space than the stack. It is often limited only by the available virtual memory of the operating system (many gigabytes on most modern PCs):
int *array = malloc(3000 * sizeof(int));
Also, the advantage of using malloc is that if, for some reason, there isn't enough memory available, the function will return NULL, and you can test for this.
You can access the elements of the array in the same way, using array[i] for example. Of course, you should be sure to release the memory when you've done with it, at the end of your function:
free(array);
(This will be done automatically in your case, when the program exits, but it's good coding style to get used to doing it explicitly!)
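Putting the malloc, NULL test and free steps together might look roughly like this. The function name sum_on_heap and the fill-and-sum body are made up for illustration; only the allocation pattern is the point.

```c
#include <stdlib.h>

/* Fill a heap-allocated array with 1..n and return the sum,
   or -1 if the allocation fails. */
long sum_on_heap(size_t n)
{
    int *array = malloc(n * sizeof *array);
    if (array == NULL)
        return -1;                  /* not enough memory: report it */

    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        array[i] = (int)(i + 1);    /* same array[i] syntax as before */
        sum += array[i];
    }

    free(array);                    /* release once finished */
    return sum;
}
```

Note that malloc(0) is allowed to return NULL, so this sketch may report failure for n == 0.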
I am (re-)learning C, and the book I am following is covering arrays and gives an algorithm for finding the first n primes. Being a mathematician and a decently skilled programmer in a few languages, I decided to use a different algorithm (the sieve of Eratosthenes) to get the first n primes. Writing the algorithm went well and what I have works, even for moderately large inputs: the first 50,000 primes take a bit to run, as you would expect, but there are no issues. However, when I ask for, say, 80,000 primes, a window pops up almost as soon as the program begins, saying that it is not responding and will need to quit. I made sure that the variables that hold the primes are unsigned long long int, so I should still be in the acceptable range for their values. I did some cursory browsing online, and other people who had issues with large inputs were advised to create the variables outside of main, making them global. I tried this for the variables that I could immediately move outside, but that didn't fix the issue. Possibly I need to put my arrays isPrime or primes outside of main as well? But I couldn't really see how to do that, since all of my work is in main.
I realize I should have done this with separate functions, but I was just writing it as I went, but if I moved everything into separate functions, my arrays still wouldn't be global, so I wasn't sure how to fix this issue.
I tried making them either static or extern, to try to get them out of stack memory, but naturally that didn't work, since the arrays change size depending on the input and change over time.
the code is:
#include <math.h>
#include <stdbool.h>
#include <stdio.h>
unsigned long long int i,j;
unsigned long long int numPrimes,numPlaces;
int main(void)
{
bool DEBUG=false;
printf("How many primes would you like to generate? ");
scanf("%llu",&numPrimes);
// the nth prime is bounded by n*ln(n)+n*ln(ln(n)), for n >=6
// so we need to check that far out for the nth prime
if (numPrimes>= 6)
numPlaces = (int) numPrimes*log(numPrimes)+
numPrimes*log(log(numPrimes));
else
numPlaces = numPrimes*numPrimes;
if(DEBUG)
printf("numPlaces: %llu\n\n", numPlaces);
// we will need to check each of these for being prime
// add one so that we can just ignore starting at 0
bool isPrime[numPlaces+1];
// only need numPrimes places, since that is all we are looking for
// but numbers can and will get large
unsigned long long int primes[numPrimes];
for (i=2; i<numPlaces+1;i++)
isPrime[i] = true; // everything is prime until it isn't
i=2; // represents current prime
while (i < numPlaces + 1)
{
for (j=i+1;j<numPlaces+1;j++)
{
if (isPrime[j] && j%i ==0) // only need to check if we haven't already
{
isPrime[j] = false;// j is divisibly by i, so not prime
if(DEBUG)
{
printf("j that is not prime: %llu\n",j);
printf("i that eliminated it: %llu\n\n",i);
}//DEBUG if
}//if
}//for
// ruled out everything that was divisible by i, need to choose
// the next i now.
for (j=i+1;j<numPlaces+2;j++)// here j is just a counter
{
if (j == numPlaces +1)// this is to break out of while
{
i = j;
break;
}// if j = numPlaces+1 then we are done
else if (isPrime[j]==true)
{
i = j;
if (DEBUG)
{
printf("next prime: %llu\n\n",i);
}//DEBUG if
break;
}//else if
}// for to decide i
}//while
// now we have which are prime and which are not, now to just get
// the first numPrimes of them.
primes[0]=2;
for (i=1;i<numPrimes;i++)// i is now a counter
{
// need to determine what the ith prime is, i.e. the ith true
// entry in isPrime, 2 is taken care of
// first we determine the starting value for j
// the idea here is we only need to check odd numbers of being
// prime after two, so I don't need to check everything
if (i<3)
j=3;
else if (i % 2 ==0)
j = i+1;
else
j = i;
for (;j<numPlaces+1;j+=2)// only need to consider odd nums
{
// check for primality, but we don't care if we already knew
// it was prime
if (isPrime[j] && j>primes[i-1])
{
primes[i]=j;
break;
}//if, determined the ith prime
}//for to find the ith prime
}//for to fill in primes
// at this point we have all the primes in 'primes' and now we just
// need to print them
printf(" n\t\t prime\n");
printf("___\t\t_______\n");
for(i=0;i<numPrimes;i++)
{
printf("%llu\t\t%llu\n",i+1,primes[i]);
}//for
return 0;
}//main
I suppose I could just avoid the primes array and just use the index of isPrime, if that would help? Any ideas would help thanks!
Your problem is here, in the definition of the VLA ("Variable Length Array", not "Very Large Array")
bool isPrime[numPlaces+1];
The program does not have enough space in the area for local variables for the array isPrime when numPlaces is large.
You have two options:
declare the array with a "big enough" size outside of the main function and ignore the extra space
use another area for storing the array with malloc() and friends
option 1
#include <stdbool.h>
#include <stdio.h>
unsigned long long int i,j;
bool isPrime[5000000]; /* waste memory */
int main(void)
option 2
int main(void)
{
bool *isPrime;
// ...
printf("How many primes would you like to generate? ");
scanf("%llu",&numPrimes);
// ...
// we will need to check each of these for being prime
// add one so that we can just ignore starting at 0
isPrime = malloc((numPlaces + 1) * sizeof *isPrime); // needs #include <stdlib.h>
if (isPrime == NULL)
return 1; // allocation failed
// ... use the pointer exactly as if it was an array
// ... with the same syntax as you already have
free(isPrime);
return 0;
}
The array you allocate is a stack variable (in all likelihood), and stack size is limited, so you are probably overwriting something important as soon as you hit a certain size threshold, causing the program to crash. Try using a dynamic array, allocated with malloc, to store the sieve.
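As a rough sketch of what a heap-allocated sieve could look like: the function below counts primes up to a limit rather than listing the first n, and its name count_primes is made up for this example.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Count the primes <= limit with a heap-allocated sieve.
   Returns -1 if the allocation fails. */
long count_primes(size_t limit)
{
    if (limit < 2)
        return 0;

    bool *isPrime = malloc((limit + 1) * sizeof *isPrime);
    if (isPrime == NULL)
        return -1;

    for (size_t i = 0; i <= limit; i++)
        isPrime[i] = (i >= 2);      /* everything is prime until it isn't */

    for (size_t i = 2; i <= limit / i; i++)     /* i * i <= limit, no overflow */
        if (isPrime[i])
            for (size_t j = i * i; j <= limit; j += i)
                isPrime[j] = false;

    long count = 0;
    for (size_t i = 2; i <= limit; i++)
        if (isPrime[i])
            count++;

    free(isPrime);
    return count;
}
```

Because the array lives on the heap, its size is bounded by available memory rather than by the stack, and a failed allocation shows up as a return value that can be tested instead of a crash.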
I am unable to understand why I am getting a runtime error with this code. The problem is to show that every even number >= 6 can be represented as the sum of two prime numbers.
My code is below. The problem link is http://poj.org/problem?id=2262. Thanks in advance.
#include "stdio.h"
#include "stdlib.h"
#define N 1000000
int main()
{
long int i,j,k;
long int *cp = malloc(1000000*sizeof(long int));
long int *isprime = malloc(1000000*sizeof(long int));
//long int *isprime;
long int num,flag;
//isprime = malloc(2*sizeof(long int));
for(i=0;i<N;i++)
{
isprime[i]=1;
}
j=0;
for(i=2;i<N;i++)
{
if(isprime[i])
{
cp[j] = i;
j++;
for(k=i*i;k<N;k+=i)
{
isprime[k] = 0;
}
}
}
//for(i=0;i<j;i++)
//{
// printf("%d ",cp[i]);
//}
//printf("\n");
while(1)
{
scanf("%ld",&num);
if(num==0) break;
flag = 0;
for(i=0;i<j&&num>cp[i];i++)
{
//printf("%d ",cp[i]);
if(isprime[num-cp[i]])
{
printf("%ld = %ld + %ld\n",num,cp[i],num-cp[i]);
flag = 1;
break;
}
}
if(flag==0)
{
printf("Goldbach's conjecture is wrong.\n");
}
}
free(cp);
free(isprime);
return 0;
}
Two possibilities immediately spring to mind. The first is that the user input may be failing if whatever test harness is being used does not provide any input. Without knowing more detail on the harness, this is a guess at best.
You could check that by hard-coding a value rather than accepting one from standard input.
The other possibility is the rather large memory allocations being done. It may be that you're in a constrained environment which doesn't allow that.
A simple test for that is to drop the value of N (and, by the way, use it rather than the multiple hardcoded 1000000 figures in your malloc calls). A better way would be to check the return value from malloc to ensure it's not NULL. That should be done anyway.
And, aside from that, you may want to check your Eratosthenes Sieve code. The first item that should be marked non-prime for the prime i is i + i rather than i * i as you have. I think it should be:
for (k = i + i; k < N; k += i)
The mathematical algorithm is actually okay, since any multiple of i less than i * i will already have been marked non-prime by virtue of the fact that it's a multiple of one of the primes previously checked.
Your problem lies with integer overflow. At the point where i becomes 46_349, i * i is 2_148_229_801 which, if you have a 32-bit two's complement integer (maximum value of 2_147_483_647), will wrap around to -2_146_737_495.
When that happens, the loop keeps going since that negative number is still less than your limit, but using it as an array index is, shall we say, inadvisable :-)
The reason it works with i + i is because your limit is well short of INT_MAX / 2 so no overflow happens there.
If you want to make sure that this won't be a problem if you get up near INT_MAX / 2, you can use something like:
for (k = i + i; (k < N) && (k > i); k += i)
That extra check on k should catch the wraparound event, provided your wrapping follows the "normal" behaviour - technically, I think it's undefined behaviour to wrap but most implementations simply wrap two positives back to a negative due to the two's complement nature. Be aware then that this is actually non-portable, but what that means in practice is that it will only work on 99.999% of machines out there :-)
But, if you're a stickler for portability, there are better ways to prevent overflow in the first place. I won't go into them here other than to say they involve subtracting one of the terms being summed from INT_MAX and comparing it to the other term being summed.
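For what it's worth, the portable check hinted at above could be sketched like this. The helper name add_would_fit is hypothetical, and the sketch assumes both operands are non-negative, which is the case for the sieve indices.

```c
#include <limits.h>
#include <stdbool.h>

/* Store a + b in *sum and return true, or return false if the
   addition would overflow. Both operands must be non-negative. */
bool add_would_fit(int a, int b, int *sum)
{
    if (a > INT_MAX - b)            /* a + b > INT_MAX, checked without adding */
        return false;
    *sum = a + b;
    return true;
}
```

The comparison never performs the overflowing addition itself, so it does not rely on wraparound behaviour at all.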
The only way I can get this to give an error is if I enter a value greater than 1000000 or less than 1 to the scanf().
Like this:
ubuntu#amrith:/tmp$ ./x
183475666
Segmentation fault (core dumped)
ubuntu#amrith:/tmp$
But the reason for that should be obvious. Other than that, this code looks good.
Just trying to find what went wrong!
If sizeof(long int) is 4 bytes on the OS you are using, that causes this problem.
In the code:
for(k=i*i;k<N;k+=i)
{
isprime[k] = 0;
}
Here, when you do k = i*i for large values of i, the value of k goes beyond 4 bytes and gets truncated, which may result in a negative number; the condition k<N is then satisfied, but with a negative number :). So you get a segmentation fault there.
It's good that you need only i+i, but if you need to increase the limit, take care of this problem.
I tried listing prime numbers up to 2 billion using the Sieve of Eratosthenes. Here is what I used!
The problem I am facing is that I am not able to go beyond 10 million numbers. When I tried, it reported 'Segmentation Fault'. I searched the Internet to find the cause. Some sites say it is a memory allocation limitation of the compiler itself; some say it is a hardware limitation. I am using a 64-bit processor with 4GB of RAM installed. Please suggest a way to list them.
#include <stdio.h>
#include <stdlib.h>
#define MAX 1000000
long int mark[MAX] = {0};
int isone(){
long int i;
long int cnt = 0;
for(i = 0; i < MAX ; i++){
if(mark[i] == 1)
cnt++;
}
if(cnt == MAX)
return 1;
else
return 0;
}
int main(int argc,char* argv[]){
long int list[MAX];
long int s = 2;
long int i;
for(i = 0 ; i < MAX ; i++){
list[i] = s;
s++;
}
s = 2;
printf("\n");
while(isone() == 0){
for(i = 0 ; i < MAX ; i++){
if((list[i] % s) == 0)
mark[i] = 1;
}
printf(" %lu ",s);
while(mark[++s - 2] != 0);
}
return 1;
}
long int list[MAX], declared inside main, is allocated on the stack, which is not what you want for that amount of memory (mark, being global, is not on the stack). Try long int *list = malloc(sizeof(long int) * MAX) instead to allocate heap memory. This will get you beyond ~1 million array elements.
Remember to free the memory when you no longer need it. If you don't know malloc or free, read the man pages for these functions, available via man 3 malloc and man 3 free on any Linux terminal (alternatively, you could just google man malloc).
EDIT: make that calloc(1000000, sizeof(long int)) to have a 0-initialized array, which is probably better.
Additionally, you can use every element of your array as a bitmask, to be able to store one mark per bit, and not per sizeof(long int) bytes. I'd recommend using a fixed-width integer type, like uint32_t for the array elements and then setting the (n % 32)'th bit in the (n / 32)'th element of the array to 1 instead of just setting the nth element to 1.
you can set the nth bit of an integer i by using:
uint32_t i = 0;
i |= ((uint32_t) 1) << n;
assuming you start counting at 0.
that makes your set operation on the uint32_t bitmask array for a number n:
mask[n / 32] |= ((uint32_t)1) << (n % 32);
that saves you about 97% of the memory with 32-bit elements (one bit per mark instead of 32). Have fun :D
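Wrapped into small helpers, the bitmask idea might look like the sketch below. The names mask_new, mask_set and mask_test are invented for this example.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate a zeroed bitmask able to hold nbits marks (NULL on failure). */
uint32_t *mask_new(size_t nbits)
{
    return calloc((nbits + 31) / 32, sizeof(uint32_t));
}

/* Set mark n. */
void mask_set(uint32_t *mask, size_t n)
{
    mask[n / 32] |= ((uint32_t)1) << (n % 32);
}

/* Return 1 if mark n is set, 0 otherwise. */
int mask_test(const uint32_t *mask, size_t n)
{
    return (mask[n / 32] >> (n % 32)) & 1u;
}
```

The caller is expected to check the result of mask_new against NULL and to free() the mask when done.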
Another, more advanced approach to use here is prime wheel factorization, which basically means that you declare 2,3 and 5 (and possibly even more) as prime beforehand, and use only numbers that are not divisible by one of these in your mask array. But that's a really advanced concept.
However, I have written a prime sieve with wheel factorization for 2 and 3 in C in about ~15 lines of code (also for projecteuler), so it is possible to implement this stuff efficiently ;)
The most immediate improvement is to switch to bits representing the odd numbers. Thus to cover the M=2 billion numbers, or 1 billion odds, you need 1,000,000,000 / 8 = 125 million bytes, about 120 MB, of memory (still allocate them on the heap, with the calloc function).
The bit at position i will represent the number 2*i+1. Thus when marking the multiples of a prime p, i.e. p^2, p^2+2p, ..., M, we have p^2=(2i+1)^2=4i^2+4i+1 represented by a bit at the position j=(p^2-1)/2=2i(i+1), and next multiples of p above it at position increments of p=2i+1,
for( i=1; ; ++i )
if( bit_not_set(i) )
{
p=i+i+1;
k=(p-1)*(i+1);
if( k > 1000000000) break;
for( ; k<1000000000; k+=p)
set_bit(k); // mark as composite
}
// all bits i>0 where bit_not_set(i) holds,
// represent prime numbers 2i+1
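The pseudocode above can be turned into something runnable along these lines. This is a sketch with a caller-supplied limit instead of the hard-coded one billion bits; the function name count_primes_odd_bits is hypothetical, and the bit tests are written inline where the pseudocode uses bit_not_set and set_bit.

```c
#include <stdint.h>
#include <stdlib.h>

/* Count primes <= limit using one bit per odd number:
   bit i stands for 2*i + 1. Returns -1 on allocation failure. */
long count_primes_odd_bits(uint64_t limit)
{
    if (limit < 2)
        return 0;

    uint64_t nbits = (limit + 1) / 2;           /* bits 0 .. nbits-1 */
    uint8_t *bits = calloc((size_t)((nbits + 7) / 8), 1);
    if (bits == NULL)
        return -1;

    for (uint64_t i = 1; ; ++i) {
        uint64_t p = 2 * i + 1;
        uint64_t k = 2 * i * (i + 1);           /* position of p*p */
        if (k >= nbits)
            break;
        if (bits[i / 8] & (1u << (i % 8)))
            continue;                           /* 2i+1 is composite */
        for (; k < nbits; k += p)
            bits[k / 8] |= (uint8_t)(1u << (k % 8));   /* mark as composite */
    }

    long count = 1;                             /* the prime 2 */
    for (uint64_t i = 1; i < nbits; ++i)
        if (!(bits[i / 8] & (1u << (i % 8))))
            count++;

    free(bits);
    return count;
}
```

The outer loop stops as soon as the position of p*p falls outside the bit array, which plays the same role as the break in the pseudocode.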
Next step is to switch to working in smaller segments that will fit in your cache size. This should speed things up. You will only need to reserve memory region for primes under the square root of 2 billion in value, i.e. 44721.
First, sieve this smaller region to find the primes there; then write these primes into a separate int array; then use this array of primes to sieve each segment, possibly printing the found primes to stdout or whatever.
I have written the code for what I see as a good algorithm for finding the greatest prime factor of a large number using recursion. My program crashes with any number greater than 4 assigned to the variable huge_number, though. I am not good with recursion, and the assignment does not allow any sort of loop.
#include <stdio.h>
long long prime_factor(int n, long long huge_number);
int main (void)
{
int n = 2;
long long huge_number = 60085147514;
long long largest_prime = 0;
largest_prime = prime_factor(n, huge_number);
printf("%ld\n", largest_prime);
return 0;
}
long long prime_factor (int n, long long huge_number)
{
if (huge_number / n == 1)
return huge_number;
else if (huge_number % n == 0)
return prime_factor (n, huge_number / n);
else
return prime_factor (n++, huge_number);
}
Any info as to why it is crashing and how I could improve it would be greatly appreciated.
Even after fixing the post-increment bug, which makes the recursion continue forever, this is not a good fit for a recursive solution. It boils down to how fast you can reduce the search space.
While your division of huge_number whittles it down pretty fast, the vast majority of recursive calls are done by simply incrementing n. That means you're going to use a lot of stack space.
You would be better off either:
using an iterative solution where you won't blow out the stack (if you just want to solve the problem) (a); or
finding a more suitable problem for recursion if you're just trying to learn recursion.
(a) An example of such a beast, modeled on your recursive solution, is:
#include <stdio.h>
long long prime_factor_i (int n, long long huge_number) {
while (n < huge_number) {
if (huge_number % n == 0) {
huge_number /= n;
continue;
}
n++;
}
return huge_number;
}
int main (void) {
int n = 2;
long long huge_number = 60085147514LL;
long long largest_prime = 0;
largest_prime = prime_factor_i (n, huge_number);
printf ("%lld\n", largest_prime);
return 0;
}
As can be seen from the output of that iterative solution, the largest factor is 10976461. That means the final batch of recursions in your recursive solution would require a stack depth of ten million stack frames, not something most environments will contend with easily.
If you really must use a recursive solution, you can reduce the stack space to the square root of that by using the fact that you don't have to check all the way up to the number, but only up to its square root.
In addition, other than 2, every other prime number is odd, so you can further halve the search space by only checking two plus the odd numbers.
A recursive solution taking those two things into consideration would be:
long long prime_factor_r (int n, long long huge_number) {
// Debug code for level checking.
// static int i = 0;
// printf ("recursion level = %d\n", ++i);
// Only check up to square root.
if ((long long) n * n > huge_number)
return huge_number;
// If it's a factor, reduce the number and try again.
if (huge_number % n == 0)
return prime_factor_r (n, huge_number / n);
// Select next "candidate" prime to check against, 2 -> 3,
// 2n+1 -> 2n+3 for all n >= 1.
if (n == 2)
return prime_factor_r (3, huge_number);
return prime_factor_r (n + 2, huge_number);
}
You can see I've also removed the (awkward, in my opinion) construct:
if something then
return something
else
return something else
I much prefer the less massively indented code that comes from:
if something then
return something
return something else
But that's just personal preference. In any case, that gets your recursion level down to 1662 (uncomment the debug code to verify) rather than ten million, a rather sizable reduction but still not perfect. That runs okay in my environment.
You meant n+1 instead of n++. n++ increments n after using it, so the recursive call gets the original value of n.
You are overflowing stack, because n++ post-increments the value, making a recursive call with the same values as in the current invocation.
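A tiny illustration of the difference, using made-up helper names: received() just hands back whatever argument it was given, so the two wrappers show which value actually reaches the callee.

```c
/* Returns the argument it receives, so we can see what was passed. */
static int received(int x) { return x; }

/* n++ yields the old value; the increment only affects the local n. */
int pass_post_increment(int n) { return received(n++); }

/* n + 1 passes the next value, which is what the recursion needs. */
int pass_plus_one(int n) { return received(n + 1); }
```

With n equal to 2, the first wrapper passes 2 and the second passes 3, which is why the original recursive call never makes progress.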
The crash reason is a stack overflow. I added a counter to your program and ran it (on Ubuntu 10.04, gcc 4.4.3); the counter stopped at 218287 before the core dump. A better solution is to use a loop instead of recursion.