I have a 2D array of size m*m whose elements are all 0s or 1s. Furthermore, each column of the array has a contiguous block of 1s (with 0s outside that block). The array itself is too large to be held in memory (as many as 10^6 rows), but for each column I can determine the lower bound, a, and the upper bound, b, of the 1s in that column. For a given n, I need to find the n consecutive rows that contain the maximum number of 1s. I can easily do this for small inputs by calculating the sum of each row one by one and then choosing the n consecutive rows whose sum is largest, but for large inputs it takes too much time. Is there an efficient way to calculate this? Perhaps using dynamic programming?
Here is an example code fragment showing my current approach, where successive calls to read_int() (not given here) provide the lower and upper bounds for successive columns:
long int harr[10000]={0}; //initialized to zero
for(int i=0;i<m;i++)
{
a=read_int();
b=read_int();
for(int j=a;j<=b;j++) // for finding sum of each row
harr[j]++;
}
answer=0;
for(int i=0;i<n;i++)
{
answer=answer+harr[i];
}
current=answer;
for(int i=n;i<m;i++)
{
current=current+harr[i]-harr[i-n];
if(current>answer)
{
answer=current;
}
}
For example (with m = 6 and n = 3)
Here the answer would be row 1 to row 3 with a total 1-count of 13 in those rows. (Row 2 to row 4 also maximizes the sum as there is a tie.)
Here is a different approach. Think of each pair a, b as defining an interval of the form [a,b+1). The task is to find the n consecutive indices which maximize the sum of the parenthesis depths of the numbers in that range. Every new a bumps the parenthesis depth at a up by 1. Every new b causes the parenthesis depth after b to go down by 1. In the first pass, just load these parenthesis depth deltas. Then one more pass turns the deltas into parenthesis depths. The following code illustrates this approach. I reduced m to 6 for testing purposes and replaced calls to the unknown read_int() by accesses to hard-wired arrays (which correspond to the example in the question):
#include <stdio.h>
int main(void){
int a,b,answer,current,lower,upper;
int n = 3;
int lower_bound[6] = {0,1,2,3,1,2};
int upper_bound[6] = {3,4,3,5,2,4};
int m = 6;
int harr[6]={0};
//load parenthesis depth-deltas (all initially 0)
for(int i=0;i<m;i++)
{
a = lower_bound[i];
b = upper_bound[i];
harr[a]++;
if(b < m-1)harr[b+1]--;
}
//determine p-depth at each point
for(int i = 1; i < m; i++){
harr[i] += harr[i-1];
}
//find optimal n-rows by sliding-window
answer = 0;
for(int i=0;i<n;i++)
{
answer = answer+harr[i];
}
current =answer;
lower = 0;
upper = n-1;
for(int i=n;i<m;i++)
{
current = current+harr[i]-harr[i-n];
if(current>answer)
{
answer = current;
lower = i-n+1;
upper = i;
}
}
printf("Max %d rows are %d to %d with a total sum of %d ones\n", n,lower,upper,answer);
return 0;
}
(Obviously, the loop which loads harr can be combined with the loop which computes answer. I kept it as two passes to better illustrate the logic of how the final harr values can be obtained from the parentheses deltas).
When this code is compiled and run its output is:
Max 3 rows are 1 to 3 with a total sum of 13 ones
I'm not sure how the following will scale for your 10^6 rows, but it manages the trailing sum of x consecutive rows in a single pass without function call overhead. It may be worth a try. Also ensure you are compiling with full optimizations so the compiler can add its 2 cents as well.
My original thought was to find some way to read x * n integers (from your m x n matrix) and in some fashion look at a population of set bits over that number of bytes. (checking the endianness) and taking either the first or last byte for each integer to check whether a bit was set. However, the logic seemed as costly as simply carrying the sum of the trailing x rows and stepping through the array while attempting to optimize the logic.
I don't have any benchmarks from your data to compare against, but perhaps this will give you another idea or two:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h> /* INT_MIN */
int main (int argc, char **argv) {
/* number of consecutive rows to sum */
size_t ncr = argc > 1 ? (size_t)atoi (argv[1]) : 3;
/* static array to test summing and row id logic, not
intended to simulate the 0's or 1's */
int a[][5] = {{1,2,3,4,5},
{2,3,4,5,6},
{3,4,5,6,7},
{4,5,6,7,8},
{3,4,5,6,7},
{0,1,2,3,4},
{1,2,3,4,5}};
int sum[ncr]; /* array holding sum on ncr rows */
int sumn = 0; /* sum of array values */
int max = INT_MIN; /* variable holding maximum sum */
size_t m, n, i, j, k, row = 0, sidx;
m = sizeof a / sizeof *a; /* matrix m x n dimensions */
n = sizeof *a / sizeof **a;
for (k = 0; k < ncr; k++) /* initialize vla values */
sum[k] = 0;
for (i = 0; i < m; i++) /* for each row */
{
sidx = i % ncr; /* index for sum array */
if (i > ncr - 1) { /* sum for ncr prior rows */
for (k = 0; k < ncr; k++)
sumn += sum[k];
/* note 'row' index assignment below is 1 greater
than actual but simplifies output loop indexes */
max = sumn > max ? row = i, sumn : max;
sum[sidx] = sumn = 0; /* zero index to be replaced and sumn */
}
for (j = 0; j < n; j++) /* compute sum for current row */
sum [sidx] += a[i][j];
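}
if (m >= ncr) { /* the loop never tests the final window, so test it here */
for (sumn = 0, k = 0; k < ncr; k++)
sumn += sum[k];
if (sumn > max) { max = sumn; row = m; }
}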
/* output results */
printf ("\n The maximum sum for %zu consecutive rows: %d\n\n", ncr, max);
for (i = row - ncr; i < row; i++) {
printf (" row[%zu] : ", i);
for (j = 0; j < n; j++)
printf (" %d", a[i][j]);
printf ("\n");
}
return 0;
}
Example Output
$ ./bin/arraymaxn
The maximum sum for 3 consecutive rows: 80
row[2] : 3 4 5 6 7
row[3] : 4 5 6 7 8
row[4] : 3 4 5 6 7
$ ./bin/arraymaxn 4
The maximum sum for 4 consecutive rows: 100
row[1] : 2 3 4 5 6
row[2] : 3 4 5 6 7
row[3] : 4 5 6 7 8
row[4] : 3 4 5 6 7
$ ./bin/arraymaxn 2
The maximum sum for 2 consecutive rows: 55
row[2] : 3 4 5 6 7
row[3] : 4 5 6 7 8
Note: if there are multiple equivalent maximum consecutive rows (i.e. two sets of rows where the 1's add up to the same number), the first occurrence of the maximum is selected.
I'm not sure what optimizations you are compiling with, but regardless of which code you use, you can always try simple hints to the compiler to inline all functions (if you have functions in your code) and fully optimize the code. Two helpful options are:
gcc -finline-functions -Ofast
Related
I am writing a program to break numbers in an array into their digits then store those digits in a new array. I have two problems:
It does not display the first number in the array (2) when transferred to the second array, and I am not entirely sure why.
The array may contain 0's, which would break my current for loop. Is there another way to implement a for loop to only run for as many numbers are stored in a array without knowing how big the array is?
#include <cs50.h>
#include <stdio.h>
#include <string.h>
int main(void)
{
// Setting an array equal to test variables
int sum[50] = { 2, 6, 3, 10, 32, 64 };
int i, l, k = 0, sumdig[10], dig = 0;
// Runs for every digit in array sum, increases size of separate variable k every time loop runs
for (i = 0; sum[i] > 0; i++ && k++)
{
sumdig[k] = sum[i] % 10;
dig++;
sum[i] /= 10;
// If statement checks to see if the number was two digits
if (sum[i] > 0)
{
// Advancing a place in the array
k++;
// Setting the new array position equal to the
sumdig[k] = sum[i] % 10;
dig++;
}
}
// For testing purposes - looking to see what digits have been stored
for (l = 0; l < dig; l++)
{
printf("%i\n", sumdig[l]);
}
}
This is the output:
6
3
0
1
2
3
4
6
0
Solution:
It does not display the first number in the array (2) when transferred to the second array
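Change i++ && k++ into i++, k++. On the first iteration i++ evaluates to 0 (the old value of i), so && short-circuits and k++ is never executed. The next number's digit then overwrites sumdig[0], which is why the 2 disappears.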
Is there another way to implement a for loop to only run for as many numbers are stored in an array
There are many different ways; here are a few, each for a different scenario:
1. The array length is known and fixed:
Let the compiler size the array for you from its initializer, then loop with for(i=0; i<6; i++)
2. The array length can be calculated:
Store the number of elements in a variable when you initialize the array, then loop with for(i=0; i<SizeCount; i++)
3. The array size is unknown for some reason:
This is rare, but in that case you can pick a stop marker such as -1 (or any other value that cannot occur in the data) and set all unused entries of sum to it. Then you can loop with while(sum[i] != -1). This is how strings work in C: a string ends with the NUL character '\0' (value 0). Line-based input works similarly, with the newline character \n marking the end of a line.
DEMO
Here is a demo of full code with some explanation:
#include <stdio.h>
int main(void){
int sum[] = {2, 6, 3, 10, 32, 64}; // the compiler works out the size from the initializer
int i, k = 0, sumdig[10], dig = 0;
// Runs for every number in array sum, increments the separate index k every time the loop runs
for(i = 0; i < sizeof(sum)/sizeof(int); i++, k++){
sumdig[k] = sum[i] % 10;
dig++;
sum[i] /= 10;
// If statement checks to see if the number was two digits
if (sum[i] > 0)
{
// Advancing a place in the array
k++;
// Setting the new array position equal to the
sumdig[k] = sum[i] % 10;
dig++;
}
}
// For testing purposes - looking to see what digits have been stored
for(i = 0; i < dig; i++){
printf("%i\n", sumdig[i]);
}
}
Compile and run
gcc -Wall demo.c -o demo
./demo
Output
2
6
3
0
1
2
3
4
6
I'm currently working on a program in C where I input the matrix dimensions and the elements of a matrix, which is stored in memory as a dynamic 2D array. The program then finds the maximum of each row, and after that the smallest of those row maximums.
For example,
if we have 3x3 matrix:
1 2 3
7 8 9
4 5 6
the maximums are 3, 9, 6 and the minimal maximum is 3. If the minimal maximum is positive, the program should rearrange the rows so that they follow ascending order of maximums, so the final output should be:
1 2 3
4 5 6
7 8 9
I made a dynamic array which contains the values of the maximums, each followed by the row in which it was found, for example: 3 0 6 1 9 2. But I have no idea what I should do next. It crossed my mind that even if I figured out a way to use this array with the indices I made, I would run into trouble when the same maximum value occurs in different rows, for example if the matrix was:
1 2 3
4 5 6
7 8 9
1 1 6
my array would be 3 0 6 1 9 2 6 3. I would then need an additional array for positions, and it starts to feel like inception. Maybe I could use some flag to see whether I've already encountered the same number, but algorithmically I don't know what to do. It crossed my mind to make an array and transfer the values to it, but that would waste additional space... If I found a way to determine the order in which I want to print the rows, would I need an address expression different from the one I already have? (In a double for loop, for the current element, it is *(matrix+i * numOfCols+currentCol).) I would appreciate it if somebody told me whether I am thinking correctly about the solution and gave me some advice on this problem. Thanks in advance!
I don't know if I have understood it correctly, but what you want to do is to rearrange the matrix, arranging the rows by the greatest maximum to the least...
First, I don't think you need the dynamic array, because the maximums are already ordered, and their position on the array is enough to describe the row in which they are.
To order from maximum to minimum, I would write a loop that saves the position of the current maximum and uses it to copy the corresponding row of the input matrix into the output matrix. Then set that maximum to 0 (or to -1 if you count 0 as positive) and repeat the process until all rows have been copied to the output matrix. Here is a sketch of what it would look like:
for (k = 0; k < n_rows; ++k) {
    current_max = 0;                 /* reset before each pass */
    max_row = 0;
    for (i = 0; i < n_rows; ++i) {
        if (max[i] > current_max) {
            current_max = max[i];
            max_row = i;
        }
    }
    /* copy the row holding the largest remaining maximum to output row k */
    for (c = 0; c < n_columns; ++c)
        output_matrix[k][c] = inputmatrix[max_row][c];
    max[max_row] = 0;                /* mark this row as used */
}
A plain array is not dynamic because its size cannot be changed, so in this case you can use a double pointer, for example int **matrix, to store the values of the 2D array.
The function for searching the max value of each row and the row index of each max value:
#include <stdio.h>
#include <stdlib.h>
int * max_of_row(int n, int m, int **mat) {
// allocate for n row and the row index of max value
int *matrix = malloc (sizeof(int) * n*2);
for(int i = 0; i < 2*n; i++) {
matrix[i] = 0;
}
int k = 0;
for(int i = 0; i < n; i++) {
for (int j = 0; j < m; j++) {
if(matrix[k] < mat[i][j]) {
matrix[k] = mat[i][j];
}
}
matrix[k+1] = i;
k += 2;
}
return matrix;
}
The main function for test:
int main(int argc, char const *argv[])
{
// allocate for 4 rows
int **matrix = malloc (sizeof (int *) * 4); // 4 row pointers, so sizeof (int *), not sizeof (int)
for (int i = 0; i < 4; i++) {
// allocate for 3 cols
matrix[i] = malloc(sizeof(int) * 3);
for(int j = 0; j < 3; j++){
matrix[i][j] = i+j;
}
}
int * mat = max_of_row(4, 3,matrix);
printf("matrix:\n");
for (int i = 0; i < 4; i++) {
for(int j = 0; j < 3; j++){
printf("%d ",matrix[i][j]);
}
printf("\n");
}
printf("max of row and positon\n");
for (int i = 0; i < 8; i++) {
printf("%d ", mat[i]);
}
printf("\nmax of row\n");
for (int i = 0; i < 8; i += 2) {
printf("%d ", mat[i]);
}
printf("\n");
return 0;
}
Output:
matrix:
0 1 2
1 2 3
2 3 4
3 4 5
max of row and positon
2 0 3 1 4 2 5 3
max of row
2 3 4 5
I have an array of size n and want to divide it into k sub-arrays, each of approximately the same size. I have been thinking about it for a while and know that it takes two for loops, but I am having a hard time implementing them.
What I've Tried:
//Lets call the original integer array with size n: arr
// n is the size of arr
// k is the number of subarrays wanted
int size_of_subArray = n/k;
int left_over = n%k; // When n is not divisible by k
int list_of_subArrays[k][size_of_subArray + 1];
//Lets call the original integer array with size n: arr
for(int i = 0; i < k; i++){
for(int j = 0; j < size_of_subArray; j++){
list_of_subArrays[i][j] = arr[j];
}
}
I am struggling with getting the correct indexes in the for loops.
Any Ideas?
I've refactored your code and annotated it.
The main points are:
When calculating the sub-array size, it must be rounded up
The index for arr needs to continue to increment from 0 (i.e. it is not reset to 0)
The following should work, but I didn't test it [please pardon the gratuitous style cleanup]:
// Lets call the original integer array with size n: arr
// n is the size of arr
// k is the number of subarrays wanted
// round up the size of the subarray
int subsize = (n + (k - 1)) / k;
int list_of_subArrays[k][subsize];
int arridx = 0;
int subno = 0;
// process all elements in original array
while (1) {
// get number of remaining elements to process in arr
int remain = n - arridx;
// stop when done
if (remain <= 0)
break;
// clip remaining count to amount per sub-array
if (remain > subsize)
remain = subsize;
// fill next sub-array
for (int subidx = 0; subidx < remain; ++subidx, ++arridx)
list_of_subArrays[subno][subidx] = arr[arridx];
// advance to next sub-array
++subno;
}
UPDATE:
Yes this divides the arrays into n subarrays, but it doesn't divide it evenly. Say there was an array of size 10, and wanted to divide it into 9 subarrays. Then 8 subarrays will have 1 of original array's element, but one subarray will need to have 2 elements.
Your original code had a few bugs [fixed in the above example]. Even if I were doing this for myself the above would have been the first step to get something working.
In your original question, you did say: "and each array must have approximately the same size". But, here, there is the physical size of the list sub-array [still a rounded up value].
But, I might have said something like "evenly distributed" or some such to further clarify your intent. That is, that you wanted the last sub-array/bucket to not be "short" [by a wide margin].
Given that, the code starts off somewhat the same, but needs a bit more sophistication. This is still a bit rough and might be optimized further:
#include <stdio.h>
#ifdef DEBUG
#define dbgprt(_fmt...) printf(_fmt)
#else
#define dbgprt(_fmt...) /**/
#endif
int arr[5000];
// Lets call the original integer array with size n: arr
// n is the size of arr
// k is the number of subarrays wanted
void
fnc2(int n,int k)
{
// round up the size of the subarray
int subsize = (n + (k - 1)) / k;
int list_of_subArrays[k][subsize];
dbgprt("n=%d k=%d subsize=%d\n",n,k,subsize);
int arridx = 0;
for (int subno = 0; subno < k; ++subno) {
// get remaining number of sub-arrays
int remsub = k - subno;
// get remaining number of elements
int remain = n - arridx;
// get maximum bucket size
int curcnt = subsize;
// get projected remaining size for using this bucket size
int curtot = remsub * curcnt;
// if we're too low, up it
if (curtot < remain)
++curcnt;
// if we're too high, lower it
if (curtot > remain)
--curcnt;
// each bucket must have at least one
if (curcnt < 1)
curcnt = 1;
// each bucket can have no more than the maximum
if (curcnt > subsize)
curcnt = subsize;
// last bucket is the remainder
if (curcnt > remain)
curcnt = remain;
dbgprt(" list[%d][%d] --> arr[%d] remain=%d\n",
subno,curcnt,arridx,remain);
// fill next sub-array
for (int subidx = 0; subidx < curcnt; ++subidx, ++arridx)
list_of_subArrays[subno][subidx] = arr[arridx];
}
dbgprt("\n");
}
Could you explain to me how the following two algorithms work?
int countSort(int arr[], int n, int exp)
{
int output[n];
int i, count[n] ;
for (int i=0; i < n; i++)
count[i] = 0;
for (i = 0; i < n; i++)
count[ (arr[i]/exp)%n ]++;
for (i = 1; i < n; i++)
count[i] += count[i - 1];
for (i = n - 1; i >= 0; i--)
{
output[count[ (arr[i]/exp)%n] - 1] = arr[i];
count[(arr[i]/exp)%n]--;
}
for (i = 0; i < n; i++)
arr[i] = output[i];
}
void sort(int arr[], int n)
{
countSort(arr, n, 1);
countSort(arr, n, n);
}
I wanted to apply the algorithm at this array:
After calling the function countSort(arr, n, 1) , we get this:
When I call then the function countSort(arr, n, n) , at this for loop:
for (i = n - 1; i >= 0; i--)
{
output[count[ (arr[i]/exp)%n] - 1] = arr[i];
count[(arr[i]/exp)%n]--;
}
I get output[-1]=arr[4].
But the array doesn't have such a position...
Have I done something wrong?
EDIT:Considering the array arr[] = { 10, 6, 8, 2, 3 }, the array count will contain the following elements:
what do these numbers represent? How do we use them?
Counting sort is very easy - let's say you have an array which contains numbers from range 1..3:
[3,1,2,3,1,1,3,1,2]
You can count how many times each number occurs in the array:
count[1] = 4
count[2] = 2
count[3] = 3
Now you know that in a sorted array,
number 1 will occupy positions 0..3 (from 0 to count[1] - 1), followed by
number 2 on positions 4..5 (from count[1] to count[1] + count[2] - 1), followed by
number 3 on positions 6..8 (from count[1] + count[2] to count[1] + count[2] + count[3] - 1).
Now that you know final position of every number, you can just insert every number at its correct position. That's basically what countSort function does.
However, in real life your input array would not contain just numbers from range 1..3, so the solution is to sort numbers on the least significant digit (LSD) first, then LSD-1 ... up to the most significant digit.
This way you can sort bigger numbers by sorting numbers from range 0..9 (single digit range in decimal numeral system).
This code: (arr[i]/exp)%n in countSort is used just to get those digits. n is base of your numeral system, so for decimal you should use n = 10 and exp should start with 1 and be multiplied by base in every iteration to get consecutive digits.
For example, if we want to get third digit from right side, we use n = 10 and exp = 10^2:
x = 1234,
(x/exp)%n = 2.
This algorithm is called Radix sort and is explained in detail on Wikipedia: http://en.wikipedia.org/wiki/Radix_sort
It took a bit of time to pick through your countSort routine and work out just what you were doing compared to a normal radix sort. Some versions split the iteration and the actual sort routine, which appears to be what you attempted with the countSort and sort functions. However, after going through that exercise, it was clear you had just missed including necessary parts of the sort routine. After fixing various compile/declaration issues in your original code, the following adds the pieces you overlooked.
In your countSort function, the size of your count array was wrong. It must be the size of the base, in this case 10 (you had 5). You confused the use of exp and base throughout the function. The exp variable steps through the powers of 10, allowing you to get the value and position of each element in the array when combined with a modulo base operation. You had modulo n instead. This problem also permeated your loop ranges, where a number of your loop indexes ran up to n when the correct bound was base.
You missed finding the maximum value in the original array, which is then used to limit the number of passes through the array needed to perform the sort. In fact, all of your existing loops in countSort must fall within the outer loop iterating while (m / exp > 0). Lastly, you omitted an increment of exp within the outer loop, which is necessary for applying the sort to each element within the array. I guess you just got confused, but I commend your effort in attempting to rewrite the sort routine and not just copy/pasting from somewhere else. (You may have copied/pasted, but if that's the case, you have additional problems...)
With each of those issues addressed, the sort works. Look through the changes and understand what it is doing. Radix sort and counting sort are distribution sorts: they rely on where numbers occur and manipulate indexes rather than comparing values against one another, which makes this type of sort awkward to understand at first. Let me know if you have any questions. I tried to preserve your naming convention throughout the function, adding a couple of names that were omitted and avoiding a hardcoded 10 as the base.
#include <stdio.h>
void prnarray (int *a, int sz);
void countSort (int arr[], int n, int base)
{
int exp = 1;
int m = arr[0];
int output[n];
int count[base];
int i;
for (i = 1; i < n; i++) /* find the maximum value */
m = (arr[i] > m) ? arr[i] : m;
while (m / exp > 0)
{
for (i = 0; i < base; i++)
count[i] = 0; /* zero bucket array (count) */
for (i = 0; i < n; i++)
count[ (arr[i]/exp) % base ]++; /* count keys to go in each bucket */
for (i = 1; i < base; i++) /* indexes after end of each bucket */
count[i] += count[i - 1];
for (i = n - 1; i >= 0; i--) /* map bucket indexes to keys */
{
output[count[ (arr[i]/exp) % base] - 1] = arr[i];
count[(arr[i]/exp) % base]--; /* same bucket as the line above: modulo base, not n */
}
for (i = 0; i < n; i++) /* fill array with sorted output */
arr[i] = output[i];
exp *= base; /* inc exp for next group of keys */
}
}
int main (void) {
int arr[] = { 10, 6, 8, 2, 3 };
int n = 5;
int base = 10;
printf ("\n The original array is:\n\n");
prnarray (arr, n);
countSort (arr, n, base);
printf ("\n The sorted array is\n\n");
prnarray (arr, n);
printf ("\n");
return 0;
}
void prnarray (int *a, int sz)
{
register int i;
printf (" [");
for (i = 0; i < sz; i++)
printf (" %d", a[i]);
printf (" ]\n");
}
output:
$ ./bin/sort_count
The original array is:
[ 10 6 8 2 3 ]
The sorted array is
[ 2 3 6 8 10 ]
I'm writing a program to look for the longest Collatz sequence starting under 1,000,000.
I was really proud of this code, it seemed so efficient and clean and well written... until I tried to run it. After a little debugging to get it to compile, I found that when I run the program, it crashes.
I have used both
int array[1000000];
and
int *array;
array = (int*)calloc(s, sizeof(int));
(where s=1000000)
to declare an array of 1,000,000 spaces.
So part A) of my question: Is it ridiculous or possible to declare an array of that size?
and part B) of my question: This is used for a 'checklist' of sorts, checking which numbers have already been seen. Is there a simpler or better or just different method of 'checking off' numbers that I should be using instead?
the code is as follows:
// This is a program to find the longest Collatz sequence starting under 1,000,000
#include <stdio.h>
#include <stdlib.h>
int main()
{
// Collatz sequence: IF EVEN n/2 :: IF ODD 3n+1
//define ints
int i;
int n;
int c; // counter of sequence length
int longestsequence = 0;
int beststart;
int s = 1000000; //size of array
//define int array
//int array[999999];
//define array using calloc
//define pointer for calloc int array
int *array;
// do your calloc thing
array = (int*)calloc(s, sizeof(int)); // allocates 1,000,000 spots (s) of size "int" to array "array"
//fill array
for(i = 0; i < 1000000; i++)
{
array[i] = i;
}
for(i = 999999; i > 500000; i--)
{
if(array[i] == 0) // skip if number has already been seen
goto done;
n = i;
c = 0;
//TEST
printf("Current starting number is: %d\n", i);
//TEST
while(n != 4) // run and count collatz sequence
{
//TEST
//printf("test1\n");
//TEST
if(n % 2 == 0) // EVEN
n = n/2;
else // ODD
n = 3 * n + 1;
//TEST
//printf("test2\n");
//TEST
c++;
//TEST
//printf("test3\n");
//TEST
if(n < 1000000 && array[n] != 0) // makes note of used numbers under 1000000
array[n] = 0;
//TEST
//printf("test4\n");
//TEST
}
if(longestsequence < c)
{
longestsequence = c;
beststart = i;
//TEST
printf("Current best start is: %d\n", beststart);
//TEST
}
done:
}
printf("the starting number that produces the longest Collatz sequence is...\n");
printf("%d\n", beststart);
getchar();
return 0;
}
Thanks for any and all help and suggestions! Links to helpful sources are always appreciated.
UPDATE!
1.My code now looks like this^^^^
and
2.The program runs, and then mysteriously stops at i value 999167
for(i = 999999; i > 4; i++)
You easily go beyond array boundary here. I guess what you meant was
for(i = 999999; i > 4; --i)
// ^^^
Also, with your implementation, an array of 1 million elements is not enough.
Take n == 999999 as an example. In the first step, you compute 3 * n + 1, which is obviously way larger than 1000000. A simple solution would be to change
if(array[n-1] != 0) // makes note of used numbers
array[n-1] = 0;
into
if(n < s && array[n-1] != 0) // makes note of used numbers
array[n-1] = 0;
which just disables result lookup when n is over array boundary.
You could use a simple linked list of numbers, which will reduce the memory requirements at the expense of "long" search times. I've always noticed a bit of repetition:
1
2 → 1 (already seen in 1, so link to the existing 1)
3 → 10 → 5 → 16 → 8 → 4 → 2 (already seen in 2, so link to the existing 2)
4 (link to existing after 8)
5 (link to existing after 10)
etc.
You would have a number A and possibly one more number B link to a number N for some numbers, but N would only link to one number C. For example:
A -> N -> C
3 -> 10 -> 5
20 -> 10 -> 5
B -> N -> C
Of course, you could optimize it by storing a length of the list and an extra pointer containing the next adjacent number, allowing you to implement a binary search using that length as a guide.
However, if you're only looking for the longest sequence length instead of the sequence itself, why aren't you merely storing the longest length found and comparing it to the length of the current sequence? Storing the numbers only for calculating the length seems like overkill. Something like the following pseudocode:
Longest := 0
For N = 1 To 1000000
Length := 1
X := N
While X != 1
Length := Length + 1
If IsEven(X) Then
X := X / 2
Else
X := 3 * X + 1
End If
End While
If Length > Longest Then
Longest := Length
End If
End For
Print("Longest sequence less than 1000000 is: ", Longest)
The line
n = 3 * n + 1;
ends up setting the value of n to be higher than the valid index. The highest valid index is 999999. You have to make sure that n is less than or equal to 1000000 before you access the array in:
if(array[n-1] != 0) // makes note of used numbers
array[n-1] = 0;
You don't check the array index [n-1] within the while loop to ensure it doesn't exceed the array bounds of 1,000,000. For example, in your first loop iteration i = 999,999, which makes n = 999999*3+1 = 2,999,998.
The solution is to make sure n doesn't exceed your array size before you use it as an index.