Minimize the sum of errors of representative integers - arrays

Given n integers between [0,10000] as D1,D2...,Dn, where there may be duplicates, and n can be huge:
I want to find k distinct representative integers (for example k=5) between [0,10000] as R1,R2,...,Rk, so the sum of errors of all the representative integers is minimized.
The error of a representative integer is defined below:
Assuming we have k representative integers in ascending order as {R1,R2...,Rk}, the error of Ri
is :
and i want to minimize the sum of errors of the k representative integers:
how can this be done efficiently?
EDIT1: The smallest of the k representative integers must be the smallest number in
{D1,D2...,Dn}
EDIT2: The largest of the k representative integers must be the largest number in {D1,D2...,Dn} plus 1. For example when the largest number in {D1,D2...,Dn} is 9787 then Rk is 9788.
EDIT3: A concrete example is here:
D={1,3,3,7,8,14,14,14,30} and if k=5 and R is picked as {1,6,10,17,31} then the sum of errors is :
sum of errors=(1-1)+(3-1)*2+(7-6)+(8-6)+(14-10)*3+(30-17)=32
this is because 1<=1,3,3<6 , 6<=7,8<10, 10<=14,14,14<17, 17<=30<31

Although with help from the community you have managed to state your
problem in a form which is understandable mathematically, you still do
not supply enough information to enable me (or anyone else) to give
you a definitive answer (I would have posted this as a comment, but
for some reason I don't see the "add comment" option is available to
me). In order to give a good answer to this problem, we need to know
the relative sizes of n and k versus each other and 10000 (or the
expected maximum of the Di if it isn't 10000), and whether
it is critical that you attain the exact minimum (even if this
requires an exorbitant amount of time for the calculation) or if a
close approximation would be OK, also (and if so, how close do you
need to get). In addition, in order to know what algorithm runs in
minimum time, we need to have an understanding about what kind of
hardware is going to run the algorithm (i.e., do we have m CPU
cores to run on in parallel and what is the size of m relative to k).
Another important piece of information is whether this problem will be
solved only once, or it must be solved many times but there exists
some connection between the distributions of the integers Di
from one problem to the next (e.g., the integers
Di are all random samples from a particular, unchanging
probability distribution, or perhaps each successive problem has as
its input an ever-increasing set which is the set from the previous
problem plus an extra s integers).
No reasonable algorithm for your problem should run in time which
depends in a way greater than linear in n, since building a histogram
of the n integers Di requires O(n) time, and the answer to
the optimization problem itself is only dependent on the histogram of
the integers and not on their ordering. No algorithm can run in time
less than O(n), since that is the size of the input of the problem.
A brute force search over all of the possibilities requires, (assuming
that at least one of the Di is 0 and another is 10000), for
small k, say k < 10, approximately O(10000k-2) time, so if
log10(n) >> 4(k-2), this is the optimal algorithm (since in
this case the time for the brute force search is insignificant compared to the
time to read the input). It is also interesting to note that if k is
very close to 10000, then a brute force search only requires
O(1000010002-k) (because we can search instead over the
integers which are not used as representative integers).
If you update the definition of the problem with more information, I
will attempt to edit my answer in turn.

Now the question is clarified, we observe the Ri divide the Dx into k-1 intervals, [R1,R2), [R2,R3), ... [Rk-1,Rk). Every Dx belongs to exactly one of those intervals. Let qi be the number of Dx in the interval [Ri,Ri+1), and let si be the sum of those Dx. Then each error(Ri) expression is the sum of qi terms and evaluates to si - qiRi.
Summing that over all i, we get a total error of S - sum(qiRi), where S is the sum of all the Dx. So the problem is to choose the Ri to maximize sum(qiRi). Remember each qi is the number of original data at least as large as Ri, but smaller than the next one.
Any global maximum must be a local maximum; so we imagine increasing or decreasing one of the Ri. If Ri is not one of the original data values, then we can increase it without changing any of the qi and improve our target function. So an optimal solution has each Ri (except the limiting last one) as one of the data values. I got a bit bogged down in math after that, but it seems a sensible approach is to pick the initial Ri as every (n/k)th data value (simple percentiles), then iteratively seeing if moving the R_i to the previous or next value improves the score and thus decreases the error. (The qiRi seems easier to work with, since you can read the data and count repetitions and update qi, Ri by only looking at a single data/count point. You only need to store an array of 10,000 data counts, no matter how huge the data).
data: 1 3 7 8 14 30
count: 1 2 1 1 3 1 sum(data) = 94
initial R: 1 3 8 14 31
initial Q: 1 3 1 4 sum(QR) = 74 (hence error = 20)
In this example, we could try changing the 3 or the 8 to a 7, For example if we increase the 3 to 7, then we see there are 2 3's in the initial data, so the first two Q's become 1+2, 3-2 - it turns out this decreases sum(QR)). I'm sure there are smarter patterns to detect what changes in the QR table are viable, but this seems workable.

If the distribution is near random and the selection (n) is large enough, you are wasting time, generally, trying to optimize for what will amount to real costs in time calculating to gain decreasing improvements in % from expected averages. The fastest average solution is to set the lower k-1 at the low end of intervals M/(k-1), where M is the lowest upper bound - the greatest lower bound (ie, M = max number possible - 0) and the last k at M+1. It would take order k (the best we can do with the information presented in this problem) to figure those values out. Stating what I just did is not a proof of course.
My point is this. The above discussion is one simplification that I think is very practical for one large class of sets. At the other end, it's straightforward to compute every error possible for all permutations and then select the smallest one. The running time for this makes that solution intractable in many cases. That the person asking this question expects more than the most direct and exact (intractable) answer leaves much that is open-ended. We can trim at the edges from here to eternity trying to quantify all sorts of properties along the infinite solution space for all possible permutations (or combinations) of n numbers and all k values.

The integrity of this program was partly confirmed by a modified version of it used here to produce data that matched results obtained independently by #mhum.
It finds the exact minimal error value(s) and corresponding R values for a given data set and k value(s).
/************************************************************
This program can be compiled and run (eg, on Linux):
$ gcc -std=c99 minimize-sum-errors.c -o minimize-sum-errors
$ ./minimize-sum-errors
************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
//data: Data set of values. Add extra large number at the end
int data[]={
10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240,99999,1,2,3,4,5,6,7,8,9,10,24,24,14,12,41,51,21,41,41,848,21, 547,3,2,888,4,1,66,5,4,2,11,742,95,25,365,52,441,874,51,2,145,254,24,245,54,21,87,98,65,32,25,14,25,36,25,14,47,58,58,69,85,74,71,82,82,93,93,93,12,12,45,78,78,985,412,74,3,62,125,458,658,147,432,124,328,952,635,368,634,637,874,124,35,23,65,896,21,41,745,49,2,7,8,4,8,7,2,6,5,6,9,8,9,6,5,9,5,9,5,9,11,41,5,24,98,78,45,54,65,32,21,12,18,38,48,68,78,75,72,95,92,65,55,5,87,412,158,654,219,943,218,952,357,753,159,951,485,862,1,741,22,444,452,487,478,478,495,456,444,141,238,9,445,421,441,444,436,478,51,24,24,24,24,24,24,247,741,98,99,999,111,444,323,33,5,5,5,85,85,85,85,654,456,5,4,566,445,5664,45,4556,45,5,6,5,4,56,66,5,456,5,45,6,68,7653,434,4,6,7,689,78,8,99,8700,344,65,45,8,899,86,65,3,422,3,4,3,4,7,68,544,454,545,65,4,6,878,423,64,97,8778,5456,5486,5485,545,5455,4548,81,999,8233,5223,8741,7747,7756,54,7884,5477,89,332,5999,9861,12545,9852,11452,5482,9358,9845,577,884,5589,5412,3669,32,6699,396,9629,953,321,45,5484,588,5872,85,872,87,1122,884,2458,471,22685,955,2845,6852,589,5896,2521,35696,5236,32633,56665,6633,326,5486,5487,8541,5495,2155,3,8523,65896,3852,5685,69536,1,1,1,1,1,2,3,4,5,6,
};
//N: size of data set
int N=sizeof(data)/sizeof(int);
//SavedBundle: data type to "hold" memoized values needed (minimized error sums and corresponding "list" of R values for a given round)
typedef struct _SavedBundle {
long e;
int head_index_value;
int tail_offset;
} SavedBundle;
//sb: (pts to) lookup table of all calculated values memoized
SavedBundle *sb; //holds winning values being memoized
//Sort in increasing order.
int sortfunc (const void *a, const void *b) {
return (*(int *)a - *(int *)b);
}
/****************************
Most interesting code in here
****************************/
long full_memh(int l, int n) {
long e;
long e_min=-1;
int ti;
if (sb[l*N+n].e) {
return sb[l*N+n].e; //convenience passing
}
for (int i=l+1; i<N-1; i++) {
e=0;
//sum first part
for (int j=l+1; j<i; j++) {
e+=data[j]-data[l];
}
//sum second part
if (n!=1) //general case, recursively
e+=full_memh(i, n-1);
else //base case, iteratively
for (int j=i+1; j<N-1; j++) {
e+=data[j]-data[i];
}
if (e_min==-1) {
e_min=e;
ti=i;
}
if (e<e_min) {
e_min=e;
ti=i;
}
}
sb[l*N+n].e=e_min;
sb[l*N+n].head_index_value=ti;
sb[l*N+n].tail_offset=ti*N+(n-1);
return e_min;
}
/**************************************************
Call to calculate and print results for the given k
**************************************************/
int full_memoization(int k) {
char *str;
long errorsum; //for convenience
int idx;
//Call recursive workhorse
errorsum=full_memh(0, k-2);
//Now print
str=(char *) malloc(k*20+100);
sprintf (str,"\n%6d %6d {%d,",k,errorsum,data[0]);
idx=0*N+(k-2);
for (int i=0; i<k-2; i++) {
sprintf (str+strlen(str),"%d,",data[sb[idx].head_index_value]);
idx=sb[idx].tail_offset;
}
sprintf (str+strlen(str),"%d}",data[N-1]);
printf ("%s",str);
free(str);
return 0;
}
/******************************************
Initialize and seek result for all k values
******************************************/
int main (int x, char **y) {
int t;
int i2;
qsort(data,N,sizeof(int),sortfunc);
sb = (SavedBundle *)calloc(sizeof(SavedBundle),N*N);
printf("\n Total data size: %d",N);
printf("\n k errSUM R values",N);
for (int i=3; i<=N; i++) {
full_memoization(i);
}
free(sb);
return 0;
}
Some sample results obtained:
Total data size: 375
k errSUM R values
3 475179 {1,5223,99999}
4 320853 {1,5223,56665,99999}
5 260103 {1,5223,7653,56665,99999}
6 210143 {1,5223,7653,32633,56665,99999}
7 171503 {1,421,5223,7653,32633,56665,99999}
8 142865 {1,412,2458,5223,7653,32633,56665,99999}
9 124403 {1,412,2458,5223,7653,32633,56665,65896,99999}
10 106790 {1,412,2458,5223,7653,9610,32633,56665,65896,99999}
11 93715 {1,412,2458,5223,7653,9610,22685,32633,56665,65896,99999}
12 81507 {1,412,848,2458,5223,7653,9610,22685,32633,56665,65896,99999}
13 71495 {1,412,848,2155,3610,5223,7653,9610,22685,32633,56665,65896,99999}
14 64243 {1,412,848,2155,3610,5223,6633,7747,9610,22685,32633,56665,65896,99999}
15 58355 {1,412,848,2155,3610,5223,6633,7653,8523,9610,22685,32633,56665,65896,99999}
16 53363 {1,65,412,848,2155,3610,5223,6633,7653,8523,9610,22685,32633,56665,65896,99999}
17 48983 {1,65,412,848,2155,3610,4548,5412,6633,7653,8523,9610,22685,32633,56665,65896,99999}
18 45299 {1,65,412,848,2155,3610,4548,5412,6633,7653,8523,9610,11452,22685,32633,56665,65896,99999}
19 41659 {1,65,412,848,2155,3610,4548,5412,6633,7653,8523,9610,11452,22685,32633,56665,65896,69536,99999}
20 38295 {1,65,321,441,848,2155,3610,4548,5412,6633,7653,8523,9610,11452,22685,32633,56665,65896,69536,99999}
21 35232 {1,65,321,441,848,2155,3610,4548,5412,6633,7653,8523,9610,11452,22685,32633,35696,56665,65896,69536,99999}
22 32236 {1,65,321,441,848,2155,3610,4410,5223,5455,6633,7653,8523,9610,11452,22685,32633,35696,56665,65896,69536,99999}
23 29323 {1,65,321,432,634,872,2155,3610,4410,5223,5455,6633,7653,8523,9610,11452,22685,32633,35696,56665,65896,69536,99999}
24 26791 {1,65,321,432,634,862,1690,2458,3610,4410,5223,5455,6633,7653,8523,9610,11452,22685,32633,35696,56665,65896,69536,99999}
25 25123 {1,65,321,432,634,862,1690,2458,3610,4410,5223,5455,5872,6633,7653,8523,9610,11452,22685,32633,35696,56665,65896,69536,99999}
26 23658 {1,65,321,432,634,862,1690,2458,3610,4410,5223,5455,5872,6633,7653,8233,8700,9610,11452,22685,32633,35696,56665,65896,69536,99999}
27 22333 {1,41,78,321,432,634,862,1690,2458,3610,4410,5223,5455,5872,6633,7653,8233,8700,9610,11452,22685,32633,35696,56665,65896,69536,99999}
28 21073 {1,41,78,321,432,634,862,1440,2155,2845,3610,4410,5223,5455,5872,6633,7653,8233,8700,9610,11452,22685,32633,35696,56665,65896,69536,99999}
29 19973 {1,41,78,321,432,634,848,951,1960,2458,2845,3610,4410,5223,5455,5872,6633,7653,8233,8700,9610,11452,22685,32633,35696,56665,65896,69536,99999}
30 18879 {1,41,78,321,432,634,848,951,1960,2458,2845,3610,4410,5223,5455,5872,6633,7653,8233,8700,9358,9845,11452,22685,32633,35696,56665,65896,69536,99999}
31 17786 {1,41,78,321,432,634,848,951,1960,2458,2845,3610,4410,5223,5455,5872,6633,7653,8233,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
32 16801 {1,41,78,321,432,634,848,943,1440,1960,2458,2845,3610,4410,5223,5455,5872,6633,7653,8233,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
33 15821 {1,41,78,218,321,432,634,848,943,1440,1960,2458,2845,3610,4410,5223,5455,5872,6633,7653,8233,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
34 14900 {1,41,78,218,321,421,544,634,848,943,1440,1960,2458,2845,3610,4410,5223,5455,5872,6633,7653,8233,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
35 14185 {1,41,78,218,321,421,544,634,848,943,1440,1960,2458,2845,3610,4410,5223,5455,5872,6633,7290,7747,8410,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
36 13503 {1,41,78,218,321,421,544,634,741,862,951,1440,1960,2458,2845,3610,4410,5223,5455,5872,6633,7290,7747,8410,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
37 12859 {1,21,45,78,218,321,421,544,634,741,862,951,1440,1960,2458,2845,3610,4410,5223,5455,5872,6633,7290,7747,8410,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
38 12232 {1,21,45,78,218,321,421,544,634,741,862,951,1440,1960,2458,2845,3610,4410,5223,5455,5664,5872,6633,7290,7747,8410,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
39 11662 {1,21,45,78,218,321,412,444,544,634,741,862,951,1440,1960,2458,2845,3610,4410,5223,5455,5664,5872,6633,7290,7747,8410,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
40 11127 {1,21,45,78,218,321,412,444,544,634,741,862,951,1440,1960,2458,2845,3610,4410,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
41 10623 {1,21,45,78,218,321,412,444,544,634,741,862,951,1440,1960,2458,2845,3610,4410,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9610,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
42 10121 {1,21,41,65,85,218,321,412,444,544,634,741,862,951,1440,1960,2458,2845,3610,4410,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9610,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
43 9637 {1,21,41,65,85,218,321,412,444,544,634,741,862,951,1440,1960,2458,2845,3610,3852,4410,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9610,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
44 9207 {1,21,41,65,85,218,321,412,444,544,634,741,862,951,1440,1960,2458,2845,3610,3852,4410,4840,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9610,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
45 8804 {1,21,41,65,85,218,321,412,444,544,634,741,862,943,1122,1690,2155,2458,2845,3610,3852,4410,4840,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9610,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
46 8409 {1,21,41,65,85,218,321,412,444,544,634,741,862,943,1122,1690,2155,2458,2845,3240,3610,3852,4410,4840,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9610,9845,11452,12545,22685,32633,35696,56665,65896,69536,99999}
47 8014 {1,21,41,65,85,218,321,412,444,544,634,741,862,943,1122,1690,2155,2458,2845,3240,3610,3852,4410,4840,5223,5455,5664,5872,6633,7290,7747,8233,8523,8700,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
48 7636 {1,21,41,65,85,218,321,412,444,544,634,741,862,943,1122,1690,2155,2458,2845,3240,3610,3852,4410,4840,5223,5455,5664,5872,6250,6633,7290,7747,8233,8523,8700,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
49 7273 {1,21,41,65,85,218,321,412,444,544,634,741,862,943,1122,1690,2155,2458,2845,3240,3610,3852,4410,4840,5223,5455,5664,5872,6250,6633,7290,7653,7747,8233,8523,8700,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
50 6922 {1,21,41,65,85,124,218,321,412,444,544,634,741,862,943,1122,1690,2155,2458,2845,3240,3610,3852,4410,4840,5223,5455,5664,5872,6250,6633,7290,7653,7747,8233,8523,8700,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
51 6584 {1,21,41,65,85,124,218,321,412,444,544,634,741,862,943,1122,1440,1960,2155,2458,2845,3240,3610,3852,4410,4840,5223,5455,5664,5872,6250,6633,7290,7653,7747,8233,8523,8700,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
52 6283 {1,21,41,65,85,124,218,321,412,444,544,634,741,862,943,1122,1440,1960,2155,2458,2845,3240,3610,3852,4410,4840,5223,5412,5477,5664,5872,6250,6633,7290,7653,7747,8233,8523,8700,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
53 5983 {1,21,41,65,85,124,218,321,412,444,544,634,741,862,943,1122,1440,1960,2155,2458,2845,3240,3610,3852,4410,4840,5223,5412,5477,5664,5872,6250,6633,7290,7653,7747,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
54 5707 {1,21,41,65,85,124,218,321,412,444,544,634,741,862,943,1122,1440,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,7290,7653,7747,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
55 5450 {1,21,41,65,85,124,218,321,412,441,478,544,634,741,862,943,1122,1440,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,7290,7653,7747,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
56 5196 {1,21,41,65,85,124,218,321,412,441,478,544,634,741,862,943,1122,1440,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
57 4946 {1,21,41,65,85,124,218,321,412,441,478,544,634,741,862,943,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
58 4722 {1,5,21,41,65,85,124,218,321,412,441,478,544,634,741,862,943,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
59 4536 {1,5,21,41,65,85,124,218,321,412,441,478,544,634,741,862,943,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,7840,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
60 4352 {1,5,21,41,65,85,124,218,321,412,441,478,544,634,741,848,872,951,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,7840,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
61 4172 {1,5,21,41,65,85,124,218,321,357,412,441,478,544,634,741,848,872,951,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,7840,8233,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
62 3995 {1,5,21,41,65,85,124,218,321,357,412,441,478,544,634,741,848,872,951,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,7840,8233,8410,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
63 3828 {1,5,21,41,65,85,124,218,321,357,412,441,478,544,634,741,848,872,943,985,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,7840,8233,8410,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
64 3680 {1,5,21,41,65,85,124,218,321,357,412,441,478,544,634,741,848,872,943,985,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4000,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,7840,8233,8410,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
65 3545 {1,5,21,32,45,65,85,124,218,321,357,412,441,478,544,634,741,848,872,943,985,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4000,4410,4548,4840,5223,5412,5477,5664,5872,6250,6633,6760,7290,7653,7747,7840,8233,8410,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
66 3418 {1,5,21,32,45,65,85,124,218,321,357,412,441,478,544,634,741,848,872,943,985,1122,1440,1690,1960,2155,2458,2845,3240,3610,3852,4000,4410,4548,4840,5223,5412,5477,5664,5872,5999,6250,6633,6760,7290,7653,7747,7840,8233,8410,8523,8700,9000,9358,9610,9845,10240,11452,12545,22685,32633,35696,56665,65896,69536,99999}
The output started slowing down on the PC too much before it reached 100 rows (of the possible 375).

This algorithm does not produce exact answers, but it allows very large data sets to be handled much faster than the memoized algorithm and with results that tends to be very close to the exact answer. Algorithm still needs improvement/debugging to prevent potential infinite loops in some cases, but that is not a deal-breaker since code can be added to stop that. Meanwhile, the alg excels with data sets too large to be managed by some other generally preferable algorithms (like the memoized alg answer submitted earlier). For example, on nearly 10,000 samples from the range [1-100000], k=500 is calculated in seconds on an old PC where the memo version appeared would take over an hour to do a much smaller k=90 on a much much smaller data set of size 375. For this kind of added performance, not getting the absolute lowest error sum is a very small price to pay. [I have not derived the quality of the results, but all comparisons made on data values where memo could keep up yielded not much over 10% worse, if that.]
/************************************************************
This program can be compiled and run (eg, on Linux):
$ gcc -std=c99 fast-inexact.c -o fast-inexact
$ .fast-inexact
************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
//a: Data set of values. Add extra large number at the end
int a[]={
10,40,90,160,250,360,490,640,810,1000,1210,1440,1690,1960,2250,2560,2890,3240,3610,4000,4410,
4840,5290,5760,6250,6760,7290,7840,8410,9000,9610,10240,99999,
1,2,3,4,5,6,7,8,9,10,
24,24,14,12,41,51,21,41,41,848,21, 547,3,2,888,4,1,66,5,4,2,11,742,
95,25,365,52,441,874,51,2,145,254,24,245,54,21,87,98,65,32,25,14,
25,36,25,14,47,58,58,69,85,74,71,82,82,93,93,93,12,12,45,78,78,985,
412,74,3,62,125,458,658,147,432,124,328,952,635,368,634,637,874,
124,35,23,65,896,21,41,745,49,2,7,8,4,8,7,2,6,5,6,9,8,9,6,5,9,5,9,
5,9,11,41,5,24,98,78,45,54,65,32,21,12,18,38,48,68,78,75,72,95,92,65,
55,5,87,412,158,654,219,943,218,952,357,753,159,951,485,862,1,741,22,
444,452,487,478,478,495,456,444,141,238,9,445,421,441,444,436,478,51,
24,24,24,24,24,24,247,741,98,99,999,111,444,323,33,5,5,5,85,85,85,85,
654,456,5,4,566,445,5664,45,4556,45,5,6,5,4,56,66,5,456,5,45,6,68,7653,
434,4,6,7,689,78,8,99,8700,344,65,45,8,899,86,65,3,422,3,4,3,4,7,68,
544,454,545,65,4,6,878,423,64,97,8778,5456,5486,5485,545,5455,4548,81,
999,8233,5223,8741,7747,7756,54,7884,5477,89,332,5999,9861,12545,9852,
11452,5482,9358,9845,577,884,5589,5412,3669,32,6699,396,9629,953,321,
45,5484,588,5872,85,872,87,1122,884,2458,471,22685,955,2845,6852,589,
5896,2521,35696,5236,32633,56665,6633,326,5486,5487,8541,5495,2155,3,
8523,65896,3852,5685,69536,
1,1,1,1,1,2,3,4,5,6,
//375
54,5451,545,54,885,855,8621,5,23,7,54,89,3,8545,196,35338,6412,5338,35512,8545,55483,3548,34878,37846,1545,2489,24534,84234,56465,8643,454,8,548,78,454,85,44,54564,87,85,45,48,54,564,67564,8945,864,54564,864,5453,554,7894,65456,45,5489,8424,84248,543,5454,82,54548,44,54654,8454,54,684,54,34,8,454,87,84,4,548,45456,48454,86465,4,454,45,4445,4564,484,4564,64654,56456,54,45,121,2851,15,248,24853,845,8485,384,3484,3484,3853,183,4835,83545,82,1851,6851,854,83,48434,87,34,854,943,849,468,4654,97,35,494,6549,878,65,2184,4845,4564,64,8,44,84,5,4454,4845,484,8513,897,47,8789,764,54,454,54894,454,842,181,54,81348,4518,548,51,813,1851,1841,5484,51,8431,8484,5487,79,4,31,31,84,87,74111,1,7272,7814,18,781,1,7,823,27872,8,8178,4156,485,184,84,45,18,75,18,715,48,78174,6541,8,54,8,41,8,4564,187,154,841,9,4194,53,4194,15,48,941,48,941,5,489,7415,41,49,41,54,54545,494,15,98,4189,5641,841,5145,41,416,48,414,4,841,5414,61,41,9891,61,169,19,1989,173,48154,56116,187,191,61,61418,8719,8187,51842,815,4815,4984,5,484,15,4897,18,4151,81,8941,549,1,5498,15,89,12,4,97,97,1,591,519,1,51,9,15,1655,65,2,3214,2365,8,77899,6565,6589,586,5,66,5,23669,5,9,59,9,8,569,3,3,6,96,99,955,5,96,9595,95,629,8971,81,5715,45,141,4819,84,518,81,87,2,41,5,98,41,54,9415,49,841,54,591,54,918,781,794,1221,2891,5,19878,154,9,4154,94,1518,41,49,415,49,15,4541,954,78,219,45,4515,49,9187,1549,15,985,14984,1597,91978,1541,41,5491,54197,815,914,91,78195,4179,1984,971,54,91,5198,71914,97,194,914,59419,49,4194,941,94191,41,9419,1941,914,9149,4191,1,19149,4949,454,141,1,9,489,415,4941,9841,24,8941,54,5915,198,419,24949,194,8545,4591,5498,714,54,984,5491,54,978,154,978,154,91495,41945,49,41954,8,154,94,149,4594,54,98,154,594,984,815,45,9148,4191,19,84,15,1,948,7897,184,5419,71,4194,8419,41984,954,54,1941,81798,789,459,45,4198,787,184,941,921,987,181,541,48,971,894,9145,594,19,78,48,4984,184,945,4,194,19849,454,978,4154,944,84154,9871,8489,4154,841,8945,198,710,45,4,51,541,984,982,1954,81,491,2465,498,5419,7481,5497,8515,498,4154,979,871,41,148,11971,184,94145,498,15,48,154,9418,41894,815,494,145,419,8,151,25,18940,5415,64348,74851,541,9481,24,9841,2,498,124,91,594,34614,64,8491,456,4164,81,3496,494,324,16498,15,4917,9841,546,4841,546,484,54541,654,81,246,518,4841,65,486,4165,4654,8415,4646,846,41654,864,165,45,4188,165,481,31,354,9415,491,549,484,87131,828,284,842,2,434,8434,64835,4313,143,48,35,498,7,154,8,7897,7154,654,987,564,3546,8789,715,4684,864,234,864,615,467,89,135,4198,7,654,64,189,7817,56,4,654,98,465,46,48,4,354,96,8413,54,8,768,45,165,46,81,654,3,48,7,41,54,6,71654,5618,745,4687,56461,8415,46841,654,18415,4641,684,8641,654,6848,
//1030 data points where 0 sum was reached around k=700
91971,84841,7108,25538,61927,311,13293,49323,82575,42047,42621,4528,33492,40233,8207,19313,17418,20046,97930,91319,21352,75522,80884,92887,3172,3402,30154,53295,45129,64875,76120,95241,75935,50600,14969,24058,64668,10739,74264,82103,95766,81604,31825,48253,98824,53223,50979,74839,22673,6901,6628,40582,11625,16851,74329,34832,99379,67076,64535,32430,87878,39846,87266,94771,68911,30598,78570,11443,96418,82912,14659,57422,88738,73430,37122,14757,65752,64413,55350,47566,40052,5269,63245,91024,62122,96172,73761,32491,9914,22246,56477,72743,31766,539,8060,51233,94746,38936,82773,4027,49755,75621,27878,36503,63731,96923,93088,71466,7829,91854,96506,5351,9372,792,8237,29526,10003,45061,84050,64869,44551,18686,31130,92931,77843,257,81465,33118,41736,19277,23252,95069,38862,84583,1510,78924,52875,83591,88760,51204,55668,31803,28820,72180,85375,31097,61709,65438,76378,50339,69786,24471,37894,62870,61760,37134,19589,41610,54127,65701,23447,47115,77960,13598,42731,76482,51722,53980,83969,68876,28247,64097,74556,89852,32215,28318,66235,62950,5848,45470,40770,50000,20546,47738,5013,56026,69247,9403,14276,78600,52114,49300,57225,70920,41405,25704,72529,85561,95069,24490,22578,66416,10333,42579,7541,34835,89226,88650,29651,87181,47493,73420,73326,86056,96184,881,3074,34043,12385,62809,32617,30558,47161,95675,18317,95487,1691,30156,70901,86281,29738,59373,94311,11038,62245,98438,48944,35946,67426,98144,37638,39288,90091,2419,74368,5501,53487,4721,45268,92114,77645,92420,55346,24469,10418,80834,72980,57352,54643,47955,28398,59555,4432,64450,94353,18022,7363,88904,18304,75731,28145,77099,37077,51892,77769,78618,58440,76279,93078,66569,40061,11341,95239,42097,34627,455,45190,15006,70919,28975,52242,94947,81103,98508,18289,49883,93925,10329,28593,59948,62807,53107,82485,46257,99603,81315,69200,57179,81100,70139,56208,71697,58216,18287,56682,80797,74856,68581,24932,56111,79553,44985,1078,33601,5052,46698,58454,21591,22216,75724,51901,19814,34312,93688,12404,1472,74226,42104,88751,74560,27770,52677,1257,3921,14543,38065,62154,80166,41952,83753,27875,96367,46870,56989,70061,29349,30417,14600,15638,9381,12672,32427,52193,63465,21644,79884,1788,84165,86538,32588,14481,62895,18922,17814,52043,27770,90651,30220,54177,684,12877,79534,9521,9151,62696,49504,17889,92016,34501,79437,49929,35694,79281,81751,61146,37207,14690,88139,71934,37867,42414,14138,68956,66459,78179,98301,41906,28393,66701,39038,98593,78928,19123,89097,7903,86555,7229,72289,30837,26828,75810,85795,52580,23946,52315,75066,6195,6247,10422,36205,85037,85639,37868,40653,13242,14990,17400,87468,33841,76043,15413,52200,15840,43988,4222,1163,97877,5894,27907,49478,82287,62434,88319,1326,96296,19314,63080,94678,65175,46033,18353,721,50185,87762,48604,70941,57076,15778,83744,24345,72384,93133,60848,51265,34558,58951,16594,45325,19575,41243,4129,36254,47318,68398,85336,10464,81489,49839,17483,40148,36113,1869,68571,90880,26744,26872,80029,40512,50642,85233,39595,61899,73401,33864,32744,45026,35147,62806,66004,75647,32795,25836,22709,46475,18975,89237,63503,37520,62019,72519,66694,78254,11971,26555,38208,51235,82437,73811,30071,60979,42083,59457,17922,53300,69295,14213,79140,14106,93565,39018,98767,12898,19065,99290,55406,96661,81503,17804,17835,45522,36121,15560,90373,29672,82686,95100,85898,30209,39965,18232,96036,83814,67533,31902,91084,43548,81247,34779,24890,45285,10364,29152,94940,1995,28647,63798,74587,51510,61728,52559,95367,41582,56753,92546,45668,73055,76292,80820,71398,87558,10149,25260,95802,56610,94918,65816,83004,32247,89064,94486,43603,91064,9278,44821,43852,46724,55095,8366,4778,36327,75601,71599,3061,64696,56375,58868,1881,13519,7193,3729,55724,98000,686,20422,84697,6823,99729,51581,9345,3230,33531,62041,75483,78380,13008,92322,73680,95761,3407,73779,1497,25348,4410,4715,97954,27151,96981,33027,16691,4754,50716,26714,15603,3877,63828,2177,78364,78663,90410,32799,40001,88635,31521,62240,71126,88550,45596,35836,30578,14734,48055,78423,99670,66613,25034,16271,95578,39832,59491,64164,90110,24612,94666,98316,36945,84526,23957,35914,74261,10148,89869,7362,96525,28747,19389,54348,30954,83866,24346,62858,96355,25336,89159,17438,19877,26213,67260,19395,50133,17429,80909,44168,77546,44149,40791,21306,59121,22933,97532,24283,47625,60143,50324,31150,79093,34412,15694,57816,56400,30645,44351,91535,47481,71120,45186,25358,96844,2731,37108,15691,10876,85188,81006,56378,7416,80928,73845,50342,33962,45379,14001,62637,45,66328,28684,75003,63335,81237,31773,34202,32170,51647,64902,65287,23594,39435,72560,25085,34321,96756,31878,39290,10456,313,58353,87017,45851,46863,88919,38035,94970,67059,21063,13281,91385,94599,5249,34230,96221,35681,18889,64631,49931,51949,23519,37007,59540,76583,70018,97867,98583,94493,13835,55055,56230,57409,48797,81045,97777,38919,4967,75806,36522,29159,64195,58832,51397,5911,47348,72203,31621,61132,32046,47295,81259,92105,61855,46985,8173,15735,16105,14233,36084,27771,77334,39122,50253,25481,17826,63048,64197,80649,172,94257,41669,22848,45634,72586,11604,36415,75842,95214,81968,86722,8491,5522,778,68350,83144,72919,27675,98142,63391,76649,1091,61181,77909,10498,4311,1144,73887,86234,49497,2192,89204,27685,19088,12111,74087,63381,72931,39497,22860,73816,96460,82602,26617,90907,742,77501,54128,6263,58682,81642,54077,13337,55144,73541,30715,98031,19841,26379,51787,48035,81621,81003,63135,71207,857,53082,49846,33006,69020,32600,28809,93781,27697,28789,84895,40154,42393,46255,83968,38531,59098,23078,2388,31081,47343,91678,12450,54226,9212,68542,55477,6778,75148,15625,15970,58963,76847,7532,43793,56065,6579,15151,54887,15814,80796,62039,38595,82848,30052,22450,42599,2606,11555,17245,8693,90166,98322,3856,43958,78906,24069,91181,74155,11157,26701,75147,79735,83698,59368,99053,27406,35721,38162,72535,90580,98451,36614,74207,57638,43118,31493,54616,3525,14593,70458,65804,2371,29952,95822,16967,46585,85324,44495,40046,40188,28571,49601,87926,46314,89084,54871,51785,30464,40750,88002,46775,99857,41941,70369,49355,82416,67822,88126,72305,68090,42573,6664,50620,8171,54154,64323,71018,70255,49214,19102,13961,38126,13767,75255,89885,24285,6784,45907,76710,69512,96761,36343,9178,43610,26232,20416,35417,79808,48812,1442,12738,14060,27780,73339,72251,22224,984,99484,3129,95242,5406,45172,93152,17698,79263,91020,2372,96955,93000,16632,24974,80075,22770,78679,52026,87169,97389,60924,95753,22470,73104,14341,89258,27802,15165,44009,16116,65558,26768,84349,42048,38158,69626,54520,6232,89607,70649,89678,64649,61427,73712,23429,60767,97914,19092,55872,67273,72611,17408,58426,45902,1158,3151,12460,6843,12175,39110,13795,48488,598,88102,62734,30051,85108,83685,4614,16221,89546,22251,33607,22389,28056,97714,97847,69668,14514,25876,47436,98820,80096,38333,73919,10210,53350,38424,71994,95426,16011,63218,46060,88059,54803,17782,4764,89636,75816,20450,71524,51424,66346,38996,51636,65503,35668,16180,35424,16688,71067,19510,20900,81505,44392,88822,66810,54956,22721,4020,55164,7768,53816,29369,97493,22693,50851,53883,40911,91519,8328,3488,89357,265,68837,37347,68925,53993,39617,21956,81340,90625,17603,82990,55479,96397,54300,90079,19013,58286,80248,93752,48825,87804,38548,82925,79145,68161,26215,27595,28166,84134,53883,72828,14699,57729,9756,21219,71888,58735,27888,77657,9862,29308,5713,10369,5132,16637,36379,74924,73424,7622,67815,44654,61976,37575,67544,41394,765,60364,48627,28929,35016,65876,16879,35727,58510,44848,68747,63314,45271,38285,19974,31022,46601,79594,28293,41943,93783,73472,25540,42352,12406,76008,60580,97316,35941,23328,63611,42353,32625,86073,50162,36848,48968,26200,44694,79594,18595,96664,3781,66827,18775,76745,23087,49444,9680,44804,1139,95993,35979,73285,78351,51555,55892,72987,919,6576,58724,74645,55748,15929,5263,9385,31276,81207,26297,13715,36839,23698,80161,78030,22099,23672,18946,52224,63113,52239,62193,69534,6218,9437,72951,26121,99384,89559,22585,41905,50820,17497,62181,95609,11040,48450,4672,19090,74922,25774,27276,59533,82413,90879,54002,1927,80107,48775,3426,13476,8,56421,57833,11826,84730,13248,44960,22347,77712,38664,51496,10446,26342,92613,2537,50518,15298,50077,95138,59074,38391,25995,51757,68041,45079,73503,44795,73578,73714,86823,85676,86652,24529,65036,8034,84053,27255,51289,41191,54733,32906,43556,99492,36724,54294,58745,21879,39742,9760,84910,89882,22040,74779,87799,6733,85094,51541,2561,42018,50703,77647,48273,51943,25317,76893,43156,11204,23775,31223,90188,61128,28268,90392,44615,8268,29091,49155,85684,99848,45164,39604,29565,7404,45835,49245,63540,86092,99420,29373,37648,26577,49013,58660,15547,88827,60902,28092,21057,57412,65051,95942,67104,62021,81178,64070,25825,74358,939,257,2738,85713,59958,34270,45916,48325,86031,68099,13435,96,26509,74600,67943,48462,59415,62719,63542,46788,33966,56201,74797,49645,34663,70653,13993,75123,64971,56299,24290,55613,17019,16469,10228,41530,65684,20980,5657,34485,74200,65094,45106,82980,23313,89679,78589,76463,44692,92419,28224,67769,79223,10358,12046,64508,5664,84873,98898,65128,22810,40006,78824,85890,28783,24195,88541,17804,77888,98005,3163,40585,37812,83540,65091,2368,10153,10123,67755,4906,26772,91241,61110,81443,96255,55859,82616,50686,39787,2824,67188,75051,72755,88297,70263,10886,52465,90967,67740,38048,41855,57897,36010,98311,39800,34970,61919,53605,35960,34556,75179,62847,99582,53397,64903,97681,21363,89165,18184,63832,93788,41877,73907,40862,72219,55696,34322,70520,86524,58044,96077,51867,68264,68773,42545,23291,87772,43135,35583,34601,78101,33345,33146,17877,55345,47317,93056,55095,65630,83641,32938,20800,94894,86836,18949,33463,96351,90210,15136,93306,57484,4970,44591,98902,99319,64390,71429,9453,10288,25967,27543,35135,10277,22490,95493,40130,69518,34173,68057,99926,85945,64916,85557,25386,77470,46018,32982,8214,45880,16878,64891,93415,13796,54925,65390,91184,73091,97527,20617,56281,20972,61849,6063,22834,23121,50132,23168,65229,36501,7732,61662,97901,80938,5449,4324,17177,98702,90374,47638,78220,8025,631,14868,7109,99196,81197,61445,13447,88088,61439,81253,11602,79082,91878,78818,23302,98516,17627,35548,90154,93396,36910,78363,42642,78977,42405,77288,36169,14039,62197,8267,34957,14627,80164,49675,44941,46282,38844,80471,92596,57433,129,85146,35443,63604,8925,2513,16232,75329,65547,12865,65564,40061,37036,2476,18330,67599,77393,27915,21717,39812,67331,39831,91457,82078,86552,84524,33187,78388,93499,31834,51610,48374,9599,62442,58001,79427,25427,10503,79516,447,67385,67312,58505,44860,57318,34878,5784,24556,75323,42995,80230,66273,70899,98707,35982,28881,77311,13652,56025,39197,99183,60848,78778,12611,55865,32717,30740,98863,41178,61939,7501,15980,32740,35422,11145,47642,95924,26318,2168,1409,43150,76210,83168,74739,80236,49366,56557,97408,79222,60346,18130,37973,34753,29330,6319,90128,81966,20999,93645,6116,58983,64143,64283,28777,72347,88576,31292,32646,60348,32715,27780,57558,71558,62441,13266,17590,44902,70808,47002,35206,76340,61551,21960,23786,90629,38566,37365,31258,7344,89896,68418,32815,88560,1462,20635,98402,15486,87818,87832,36665,62422,19706,36102,59077,73157,88408,62456,94333,70225,5552,24127,22076,11571,53084,13717,63016,69793,18229,59206,5782,57379,89482,91636,32108,36221,34750,75596,45483,42593,34406,5029,18772,57724,84624,39233,86922,77772,79550,46276,51367,81047,3458,57390,26930,10328,59690,9573,24463,84201,38847,23755,11125,21505,48379,61368,21367,57139,84731,655,79392,8614,574,39349,68808,91270,98549,33131,12817,91148,95310,90468,67711,69769,23201,54384,94404,70804,1585,8258,46100,64826,22817,47627,40503,7975,62228,2710,71282,16290,73345,9938,11950,57669,2767,47660,95717,4262,87231,23613,3132,5303,35138,15020,78273,55237,44423,43500,60718,8549,35593,34035,87978,94337,43352,66386,24438,4444,99250,15905,7122,12769,9057,38677,31656,91946,36395,6718,17485,60118,28635,16146,62946,37135,31640,81708,28255,76004,97645,67914,42872,64750,74295,54037,32473,77988,34434,87058,97884,71812,99386,59090,54572,94425,88778,26076,81966,74456,58589,75413,95109,35565,47090,62943,21610,15980,51616,83195,68938,77421,95579,43972,33475,80601,75068,71451,46172,53235,10594,229,14538,15779,3535,18792,74958,52376,39650,78112,30435,94447,23664,53290,14297,36564,57415,42577,62973,60178,54379,23471,86489,62865,63912,73881,27530,83805,88588,50955,79897,13092,25428,21032,55713,59695,44564,91593,38587,88118,40408,3458,52854,31271,82336,65550,75963,6379,94470,95815,57597,93874,49015,2623,78929,3542,96688,85728,93123,63370,93923,76560,81218,49577,40321,83277,75622,40868,20647,46612,53977,72510,27750,35045,37285,27490,11203,31005,67918,16305,66145,40471,68625,23301,84038,20189,36517,98936,39855,69536,39650,69495,4564,22889,54425,52056,70852,2908,50233,74545,75399,24963,43468,48828,24219,51897,9941,2651,36619,17298,43353,88712,82010,31618,63076,36919,84544,79994,83762,88616,25184,25358,90200,24379,1593,4588,99452,57834,24314,49316,55537,35259,88574,38989,461,34754,4295,90437,72079,88103,58951,94197,6143,66948,69639,73061,73794,18524,36274,36020,97571,22422,94371,55337,70290,23719,53295,26714,29618,44899,26361,6649,77299,54330,43848,2630,5462,43877,47405,17621,71980,48556,69593,10270,46270,96883,82522,97178,9692,38037,86735,12843,18547,90318,86142,36102,54725,67779,33960,70407,88337,68586,7416,32155,69809,76500,70306,61447,80616,15737,87763,27999,73308,44107,53341,12657,44004,61361,90129,11435,7085,83481,28657,86953,13639,35151,52055,98933,42279,8555,15128,98397,55654,21379,32574,97085,84159,36599,94225,9609,46510,85077,48658,2029,91337,35690,89749,37788,70610,10988,40687,76143,75614,96737,58393,53231,6038,72579,98172,68388,22202,75978,76105,62423,49984,89596,86908,52089,45258,42748,81535,12974,16453,27924,36973,55034,86639,71309,45107,17565,24195,39749,79635,82267,38262,24281,66895,22641,63707,26083,77311,11325,69871,66911,58230,47237,86189,86797,39368,87093,27872,48833,92246,72298,38752,73197,52211,18861,26297,48056,20065,65745,77341,96440,93208,79154,17581,91398,79828,59172,11980,31520,93006,90766,66947,25505,25662,51023,65611,68719,63162,26980,62111,33503,1486,79944,60541,97670,31407,32663,2175,36753,80740,98049,11417,70007,45707,20439,23306,81475,18876,79912,99032,38020,70540,25705,47679,54785,65712,60704,17881,63622,70029,11128,12249,58989,86318,73313,81666,73660,35613,13506,18101,38049,72504,24305,72834,59070,86071,40740,89168,78606,20938,68498,4058,52276,17717,47543,69021,7429,70137,18260,99292,15612,36939,26462,98679,74311,89545,57406,22151,89872,75116,77927,708,45972,38956,82904,59329,66204,82768,20511,72448,62524,87094,72666,36882,35918,2861,47804,24993,4741,22743,21102,23629,94258,42562,64222,95958,20392,86930,90823,49038,70593,4626,44967,17031,30479,24644,445,39185,42883,78137,45683,17756,54827,56304,89580,68790,85188,79520,52051,50933,47318,99246,22670,38591,17499,28045,35897,10805,97972,59912,69796,47591,87583,71044,9546,60276,47273,54530,64316,78565,55598,85509,68537,15822,53298,6913,78820,34806,23461,284,67945,80858,85813,9375,38770,83815,85870,58022,85829,52337,86324,79505,83821,11453,90476,36126,3399,87985,3568,71839,500,4500,88954,76271,29306,7336,56562,80863,6939,11635,25426,90038,43766,11098,1442,14002,76769,11714,51146,37615,88888,44783,94305,70233,19140,88140,54117,69619,93691,63860,64113,38961,85274,83126,16814,59257,35115,47597,69520,86248,82673,58660,20880,41290,12433,47215,34085,89592,19558,40045,76312,14199,74436,86458,70215,91487,71433,842,52231,56386,50702,87931,25493,45389,77371,50364,58718,16481,47618,59975,34313,10748,46240,47191,67951,40516,61168,65880,38343,23913,43060,82008,62035,75518,40273,83224,93027,21877,77435,88307,71632,1290,632,50128,9736,88768,37080,45712,4640,24781,93644,30882,33452,46119,42674,37702,89906,3306,6481,4953,12024,59540,87372,48001,2808,86196,55303,4471,18788,16408,59537,96648,92719,33499,6722,48955,33862,37879,13396,51670,43929,3939,48079,7524,30503,53480,5113,51939,94524,21433,17288,99724,16560,84499,11638,87050,60679,30010,88386,23716,28451,49719,22809,4109,25706,42582,55513,37422,47324,48847,53170,43576,84234,70617,40713,83624,15968,10641,63638,32104,65516,91641,5415,11173,5659,26470,28816,5998,33061,37595,42178,89808,43363,5269,23750,61805,51709,85293,88466,97116,4958,53628,55383,7265,38854,41885,40104,76385,44247,11543,30538,2550,62201,6462,67803,58608,73861,24575,92339,66125,11831,69331,66295,92341,93284,77231,44467,36331,47688,61093,39930,11186,17004,50702,88419,87123,4120,75947,73915,56934,53118,9829,22476,31866,85915,78365,50359,87479,80483,43982,24880,11468,69964,23984,20071,95228,63841,65270,25149,25133,58416,90652,74823,57118,96185,80077,2593,71310,39144,50045,38367,99790,79818,24103,43742,15546,60521,37955,28865,40358,85080,82134,23407,26816,91597,55285,34872,40824,81081,96452,96514,19764,63241,8287,7009,94878,93697,19786,50498,812,16033,1477,5958,72682,24719,2862,24640,75466,72096,62615,59825,51657,88987,31905,7150,1124,89274,93786,93492,12603,77640,93879,50029,38429,46416,14540,60096,99854,53701,21337,14776,23149,88425,87612,14118,73275,95677,39736,3861,38363,43532,86976,15608,57283,51214,31728,86864,3893,27587,60312,59186,75516,88981,54955,51563,28305,65703,39850,31114,26250,3584,46705,77618,3806,65896,68972,86255,44699,33004,84586,3957,78670,42706,9693,82971,71501,40885,48798,242,43439,36694,26933,51184,3285,42256,95213,33537,49816,36308,90626,951,40183,83542,11529,28352,57885,13323,48035,12880,7877,91954,93393,52484,52896,23149,50824,21749,49257,22391,26697,86114,57421,78006,81506,93889,24402,6114,92508,14331,62855,31552,80177,74401,2284,59442,18156,14048,16802,43234,64081,10157,79349,27857,18695,43181,37814,81532,19176,12963,37409,62303,51145,82240,82460,24582,65136,51998,26422,3929,13113,49884,43757,5622,55423,97518,40118,86731,92631,27205,90926,96293,52068,23709,65996,26947,96154,90337,7128,61897,4087,1991,23993,72488,86899,79374,11324,65148,88418,54336,12508,45647,16415,24710,13391,49148,95397,52338,72425,10023,68818,69355,49133,6885,23866,96638,15108,4489,91949,27222,5337,23033,81313,53666,77066,51356,802,88481,73212,23401,9683,58821,34532,78650,75983,37081,45008,45518,28358,73269,62559,78852,33061,22244,40088,55471,93262,69180,69656,21966,99573,56305,78275,32777,50891,53474,49154,20607,2148,90109,39213,57031,23313,61937,36049,86539,44626,86424,72189,79772,35617,2010,10973,83124,61716,63266,76741,70735,42465,12545,50465,55141,56235,27373,11852,54391,62949,35903,11676,42076,94570,98170,21346,17823,34684,94320,5241,17876,22221,86826,60898,6228,79471,82826,96582,25270,22077,78881,45064,40919,62442,72087,63729,34985,31142,15176,70720,52435,18755,39650,40171,90443,81261,36559,81823,67630,78532,56197,16870,77809,5654,18834,96386,3089,20120,95531,2744,78053,81987,16283,43645,86248,80595,11559,18234,59452,81379,53882,15148,5439,15665,98296,52359,40524,34081,
//+8000 rand ... cut to about 3000 to fit stackoverflow posting limits
};
//numofa: size of data set
int numofa=sizeof(a)/sizeof(int);
//Sort in increasing order. Used by slow algo to be not nearly as slow.
int sortfunc (const void *a, const void *b) {
return (*(int *)a - *(int *)b);
}
// Given 3 adjacent k values. Re-calculates the middle k value where (changing only this middle k value) the sum of the error on its left and error on its right is minimized (ie, ke[left] and ke[middle] are minimized).
int minimize_error_3(int *k, int *kai, int64_t *ke, const int left, const int middle, const int right) {
int64_t minerr=-1;
int64_t tmperr;
int64_t l_e=0, e=0; //not necessary to save errors by parts in general but
long minidx=kai[left]+1;
//printf ("%d %d %d %d ",k[middle], left, middle, right);
for (int i=kai[left]; i<kai[right]; i++) {
tmperr=0;
for (int j=kai[left]+1; j<i; j++) { //int j=kai[left]
tmperr+=a[j]-a[kai[left]];
}
e=tmperr;
for (int j=i+1; j<kai[right]; j++) {
tmperr+=a[j]-a[i];
}
if (minerr==-1)
minerr=tmperr;
if (tmperr<minerr) {
minerr=tmperr;
minidx=i;
l_e=e;
}
}
ke[left]=l_e;
ke[middle]=minerr>-1?minerr-l_e:0;
kai[middle]=minidx;
k[middle]=a[minidx];
//printf ("%d %d %d.%d ",ke[left], ke[middle], k[middle], minidx);
return 0;
}
int evenstartitercycles (int numofk) {
char *str, *str2;
int i, idx, err;
int done, moved;
int *k, *kai, *k_old;
int64_t *ke;
qsort(a,numofa,sizeof(int),sortfunc);
k=(int *) calloc(numofk,sizeof(int));
kai=(int *) malloc(numofk*sizeof(int));
k_old=(int *) malloc(numofk*sizeof(int));
ke=(int64_t *) calloc(numofk,sizeof(int64_t));
k[0]=a[0];
kai[0]=0;
k_old[0]=k[0];
for (int i=1; i<numofk-1; i++) {
k[i]=a[(numofa*i)/(numofk-1)];
kai[i]=(numofa*i)/(numofk-1);
k_old[i]=k[i];
}
k[numofk-1]=a[numofa-1];
kai[numofk-1]=numofa-1;
k_old[numofk-1]=k[numofk-1];
ke[numofk-1]=0; //already 0
i=0;
moved=1;
int at_end=0;
int min_x=k[2]-k[1]; //1 doing infin loop 0 ok but violates rule
int min_xi=1; //1 doing infin loop 0 ok but violates rule
int max_e=-1;
int max_ei=0;
while (!at_end || moved) {
if (i==0) {
moved=0;
at_end=0;
min_x=k[2]-k[1]; //?
min_xi=1;
max_e=-1; //?
max_ei=0;
}
minimize_error_3(k, kai, ke, i, i+1, i+2);
if (i>0) {
if (k[i+1]-k[i]<min_x) {
min_x=k[i+1]-k[i];
min_xi=i;
}
if (ke[i]>max_e && i>min_xi+1) {
max_e=ke[i];
max_ei=i;
}
//later do going to left version
}
if (k[i+1]!=k_old[i+1]) {
moved=1;
k_old[i+1]=k[i+1];
}
if (i<numofk-3) {
i++;
} else {
if (ke[i+1]>max_e && i+1>min_xi) {
max_e=ke[i+1];
max_ei=i+1;
}
//here see if can gain from shifting around some
if (max_ei>min_xi+3 && .1*ke[min_xi]*(k[min_xi]-k[min_xi-1])<max_e) { //fix the +3 to make it unnec??? .3??
//printf("1:%d %d %d %d ",min_x,min_xi,max_e,max_ei);
moved=1;
for (int i=min_xi; i<max_ei; i++) {
k[i]=a[kai[i+1]];
kai[i]=kai[i+1];
}
k[max_ei]=a[++kai[max_ei]];
}
i=0;
at_end=1;
}
}
err=0;
for (int i=0; i<numofk; i++)
err+=ke[i];
str=(char *) calloc(numofk,20);
for (int i=0; i<numofk; i++)
sprintf (str+strlen(str),"%d,",k[i]);
str2=(char *) calloc(numofk,20);
for (int i=0; i<numofk; i++)
sprintf (str2+strlen(str2),"%d,",(int)ke[i]);
printf ("\nevenstartitercycles(%d): The mininum error was %d, found at, k={%s} with error parts={%s} ",numofk,err,str,str2);
free(str);
free(str2);
return 0;
}
int main (int x, char **y) {
int t; //to track unique num of data if want this feature
int kmax;
qsort(a,numofa,sizeof(int),sortfunc);
t=1;
for (int i=1; i<numofa; i++)
if (a[i]!=a[i-1]) {
t++;
}
kmax=t; //t is value where we can reach 0 err sum for first time
kmax=numofa; //this will give many cases of 0 sum error for data sets that have many repeated data points.
for (int i=3; i<=kmax; i++) {
evenstartitercycles(i);
}
return 0;
}

This is is similar to one-dimensional k-medians clustering.
The DP I suggested previously won't work; I think we need a table from (n', k', i) to the optimal solution on D1 ≤ … ≤ Dn' with k' representatives of which the greatest is i. Given the bounds on D, the running time is on the order of n2 k with a very large constant, so you should probably adapt one of the heuristics that people use for k-means.

You may want to look into Ward's method using your particular distance function.

one first obvious point is that your Rs must be chosen amongst your Ds. (if you choose an R that is any D - 1 (but not another D), you'll improve the answer by incrementing your R by 1)
a second point is that your last R is useless and its error will always be 0

Related

Use the maximum CPU to solve the permutations of 10 in the shortest possible time

I am using this program written in C to determine the permutations of size 10 of an regular alphabet.
When I run the program it only uses 36% of my 3GHz CPU leaving 50% free. It also only uses 7MB of my 8GB of RAM.
I would like to use at least 70-80% of my computer's performance and not just this misery. This limitation is making this procedure very time consuming and I don't know when (number of days) I will need to have the complete output. I need help to resolve this issue in the shortest possible time, whether improving the source code or other possibilities.
Any help is welcome even if this solution goes through instead of using the C language use another one that gives me better performance in the execution of the program.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
static int count = 0;
void print_permutations(char arr[], char prefix[], int n, int k) {
int i, j, l = strlen(prefix);
char newprefix[l + 2];
if (k == 0) {
printf("%d %s\n", ++count, prefix);
return;
}
for (i = 0; i < n; i++) {
//Concatenation of currentPrefix + arr[i] = newPrefix
for (j = 0; j < l; j++)
newprefix[j] = prefix[j];
newprefix[l] = arr[i];
newprefix[l + 1] = '\0';
print_permutations(arr, newprefix, n, k - 1);
}
}
int main() {
int n = 26, k = 10;
char arr[27] = "abcdefghijklmnopqrstuvwxyz";
print_permutations(arr, "", n, k);
system("pause");
return 0;
}
There are fundamental problems with your approach:
What are you trying to achieve?
If you want to enumerate the permutations of size 10 of a regular alphabet, your program is flawed as it enumerates all combinations of 10 letters from the alphabet. Your program will produce 2610 combinations, a huge number, 141167095653376, 141,167 billion! Ignoring the numbering, which will exceed the range of type int, that's more than 1.5 Petabytes, unlikely to fit on your storage space. Writing this at the top speed of 100MB/s would take more than 20 days.
The number of permutations, that is combinations of distinct letters from the 26 letter alphabet is not quite as large: 26! / 16! which is still large: 19275223968000, 7 times less than the previous result. That is still more than 212 terabytes of storage and 3 days at 100MB/s.
Storing these permutations is therefore impractical. You could change your program to just count the permutations and measure how long it takes if the count is the expected value. The first step of course is to correct your program to produce the correct set.
Test on smaller sets to verify correctness
Given the expected size of the problem, you should first test for smaller values such as enumerating permutations of 1, 2 and 3 letters to verify that you get the expected number of results.
Once you have correctness, only then focus on performance
Selecting different output methods, from printf("%d %s\n", ++count, prefix); to ++count; puts(prefix); to just ++count;, you will see that most of the time is spent in producing the output. Once you stop producing output, you might see that strlen() consumes a significant fraction of the execution time, which is useless since you can pass the prefix length from the caller. Further improvements may come from using a common array for the current prefix, precluding the need to copy at each recursive step.
Using multiple threads each producing its own output, for example each with a different initial letter, will not improve the overall time as the bottleneck is the bandwidth of the output device. But if you reduce the program to just enumerate and count the permutations, you might get faster execution with multiple threads, one per core, thereby increasing the CPU usage. But this should be the last step in your development.
Memory use is no measure of performance
Using as much memory as possible is not a goal in itself. Some problems may require a tradeoff between memory and time, where faster solving times are achieved using more core memory, but this one does not. 8MB is actually much more than your program's actual needs: this count includes the full stack space assigned to the program, of which only a tiny fraction will be used.
As a matter of fact, using less memory may improve overall performance as the CPU will make better use of its different caches.
Here is a modified program:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
static unsigned long long count;
void print_permutations(char arr[], int n, char used[], char prefix[], int pos, int k) {
if (pos == k) {
prefix[k] = '\0';
++count;
//printf("%llu %s\n", count, prefix);
//puts(prefix);
return;
}
for (int i = 0; i < n; i++) {
if (!used[i]) {
used[i] = 1;
prefix[pos] = arr[i];
print_permutations(arr, n, used, prefix, pos + 1, k);
used[i] = 0;
}
}
}
int main(int argc, char *argv[]) {
int n = 26, k = 10;
char arr[27] = "abcdefghijklmnopqrstuvwxyz";
char used[27] = { 0 };
char perm[27];
unsigned long long expected_count;
clock_t start, elapsed;
if (argc >= 2)
k = strtol(argv[1], NULL, 0);
if (argc >= 3)
n = strtol(argv[2], NULL, 0);
start = clock();
print_permutations(arr, n, used, perm, 0, k);
elapsed = clock() - start;
expected_count = 1;
for (int i = n; i > n - k; i--)
expected_count *= i;
printf("%llu permutations, expected %llu, %.0f permutations per second\n",
count, expected_count, count / ((double)elapsed / CLOCKS_PER_SEC));
return 0;
}
Without output, this program enumerates 140 million combinations per second on my slow laptop, it would take 1.5 days to enumerate the 19275223968000 10-letter permutations from the 26-letter alphabet. It uses almost 100% of a single core, but the CPU is still 63% idle as I have a dual core hyper-threaded Intel Core i5 CPU. Using multiple threads should yield increased performance, but the program must be changed to no longer use a global variable count.
There are multiple reasons for your bad experience:
Your metric:
Your metric is fundamentally flawed. Peak-CPU% is an imprecise measurement for "how much work does my CPU do". Which normally isn't really what you're most interested in. You can inflate this number my doing more work (like starting another thread that doesn't contribute to the output at all).
Your proper metric would be items per second: How many different strings will be printed or written to a file per second. To measure that, start a test run with a smaller size (like k=4), and measure how long it takes.
Your problem: Your problem is hard. Printing or writing down all 26^10 ~1.4e+14 different words with exactly 10 letters will take some time. Even if you changed it to all permutations - which your program doesn't do - it's still ~1.9e13. The resulting file will be 1.4 petabytes - which is most likely more than your hard drive will accept. Also, if you used your CPU to 100% and used one thousand cycles for one word, it'd take 1.5 years. 1000 cycles are an upper bound, you most likely won't be faster that this while still printing your result, as printf usually takes around 1000 cycles to complete.
Your output: Writing to stdout is slow comapred to writing to a file, see https://stackoverflow.com/a/14574238/4838547.
Your program: There are issues with your program that could be a problem for your performance. However, they are dominated by the other problems stated here. With my setup, this program uses 93.6% of its runtime in printf. Therefore, optimizing this code won't yield satisfying results.

Matchmaking program in C?

The problem I am given is the following:
Write a program to discover the answer to this puzzle:"Let's say men and women are paid equally (from the same uniform distribution). If women date randomly and marry the first man with a higher salary, what fraction of the population will get married?"
From this site
My issue is that it seems that the percent married figure I am getting is wrong. Another poster asked this same question on the programmers exchange before, and the percentage getting married should be ~68%. However, I am getting closer to 75% (with a lot of variance). If anyone can take a look and let me know where I went wrong, I would be very grateful.
I realize, looking at the other question that was on the programmers exchange, that this is not the most efficient way to solve the problem. However, I would like to solve the problem in this manner before using more efficient approaches.
My code is below, the bulk of the problem is "solved" in the test function:
#include <cs50.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define ARRAY_SIZE 100
#define MARRIED 1
#define SINGLE 0
#define MAX_SALARY 1000000
bool arrayContains(int* array, int val);
int test();
int main()
{
printf("Trial count: ");
int trials = GetInt();
int sum = 0;
for(int i = 0; i < trials; i++)
{
sum += test();
}
int average = (sum/trials) * 100;
printf("Approximately %d %% of the population will get married\n", average / ARRAY_SIZE);
}
int test()
{
srand(time(NULL));
int femArray[ARRAY_SIZE][2];
int maleArray[ARRAY_SIZE][2];
// load up random numbers
for (int i = 0; i < ARRAY_SIZE; i++)
{
femArray[i][0] = (rand() % MAX_SALARY);
femArray[i][1] = SINGLE;
maleArray[i][0] = (rand() % MAX_SALARY);
maleArray[i][1] = SINGLE;
}
srand(time(NULL));
int singleFemales = 0;
for (int k = 0; k < ARRAY_SIZE; k++)
{
int searches = 0; // count the unsuccessful matches
int checkedMates[ARRAY_SIZE] = {[0 ... ARRAY_SIZE - 1] = ARRAY_SIZE + 1};
while(true)
{
// ARRAY_SIZE - k is number of available people, subtract searches for people left
// checked all possible mates
if(((ARRAY_SIZE - k) - searches) == 0)
{
singleFemales++;
break;
}
int randMale = rand() % ARRAY_SIZE; // find a random male
while(arrayContains(checkedMates, randMale)) // ensure that the male was not checked earlier
{
randMale = rand() % ARRAY_SIZE;
}
checkedMates[searches] = randMale;
// male has a greater income and is single
if((femArray[k][0] < maleArray[randMale][0]) && (maleArray[randMale][1] == SINGLE))
{
femArray[k][1] = MARRIED;
maleArray[randMale][1] = MARRIED;
break;
}
else
{
searches++;
continue;
}
}
}
return ARRAY_SIZE - singleFemales;
}
bool arrayContains(int* array, int val)
{
for(int i = 0; i < ARRAY_SIZE; i++)
{
if (array[i] == val)
return true;
}
return false;
}
In the first place, there is some ambiguity in the problem as to what it means for the women to "date randomly". There are at least two plausible interpretations:
You cycle through the unmarried women, with each one randomly drawing one of the unmarried men and deciding, based on salary, whether to marry. On each pass through the available women, this probably results in some available men being dated by multiple women, and others being dated by none.
You divide each trial into rounds. In each round, you randomly shuffle the unmarried men among the unmarried women, so that each unmarried man dates exactly one unmarried woman.
In either case, you must repeat the matching until there are no more matches possible, which occurs when the maximum salary among eligible men is less than or equal to the minimum salary among eligible women.
In my tests, the two interpretations produced slightly different statistics: about 69.5% married using interpretation 1, and about 67.6% using interpretation 2. 100 trials of 100 potential couples each was enough to produce fairly low variance between runs. In the common (non-statistical) sense of the term, for example, the results from one set of 10 runs varied between 67.13% and 68.27%.
You appear not to take either of those interpretations, however. If I'm reading your code correctly, you go through the women exactly once, and for each one you keep drawing random men until either you find one that that woman can marry or you have tested every one. It should be clear that this yields a greater chance for women early in the list to be married, and that order-based bias will at minimum increase the variance of your results. I find it plausible that it also exerts a net bias toward more marriages, but I don't have a good argument in support.
Additionally, as I wrote in comments, you introduce some bias through the way you select random integers. The rand() function returns an int between 0 and RAND_MAX, inclusive, for RAND_MAX + 1 possible values. For the sake of argument, let's suppose those values are uniformly distributed over that range. If you use the % operator to shrink the range of the result to N possible values, then that result is still uniformly distributed only if N evenly divides RAND_MAX + 1, because otherwise more rand() results map to some values than map to others. In fact, this applies to any strictly mathematical transformation you might think of to narrow the range of the rand() results.
For the salaries, I don't see why you even bother to map them to a restricted range. RAND_MAX is as good a maximum salary as any other; the statistics gleaned from the simulation don't depend on the range of salaries; but only on their uniform distribution.
For selecting random indices into your arrays, however, either for drawing men or for shuffling, you do need a restricted range, so you do need to take care. The best way to reduce bias in this case is to force the random numbers drawn to come from a range that is evenly divisible by the number of options by re-drawing as many times as necessary to ensure it:
/*
* Returns a random `int` in the half-open interval [0, upper_bound).
* upper_bound must be positive, and should not exceed RAND_MAX + 1.
*/
int random_draw(int upper_bound) {
/* integer division truncates the remainder: */
int rand_bound = (RAND_MAX / upper_bound) * upper_bound;
for (;;) {
int r = rand();
if (r < rand_bound) {
return r % upper_bound;
}
}
}

Reducing memory usage when designing a sieve of eratosthenes in C

I'm trying to design a sieve of eratosthenes in C but I've run into two strange problems which I can't figure out. Here's my basic program outline. Ask users to set a range to display primes from. If the range minimum is below 9, set the minimum as 9. Fill an array with all odd numbers in the range.
1) I'm trying to reduce memory usage by declaring variable size arrays like so:
if (max<=UINT_MAX)
unsigned int range[(max-min)/2];
else if (max<=ULONG_MAX)
unsigned long int range[(max-min)/2];
else if (max<=ULLONG_MAX)
unsigned long long int range[(max-min)/2];
Why doesn't this compile? Variables min and max are declared as ints earlier and limits.h is included. I've commented out the selection structure and just declared unsigned long long int range[(max-min)/2]; for now which compiles and works for now.
2) My code runs but it sometimes marks small primes as non primes.
#include<stdio.h>
#include<limits.h>
void prime(int min, int max)
{
int i, f=0;
//declare variable size array
/*if (max<=(int)UINT_MAX)
unsigned int range[(max-min)/2];
else if (max<=(int)ULONG_MAX)
unsigned long int range[(max-min)/2];
else if (max<=(int)ULLONG_MAX)*/
unsigned long long int range[(max-min)/2];
//fill array with all odd numbers
if (min%2==0)
{
for (i=min+1;i<=max;i+=2)
{
range[f]=i;
f+=1;
}
}
else
{
for (i=min;i<=max;i+=2)
{
range[f]=i;
f+=1;
}
}
//assign 0 to cell if divisible by any number other than itself
for (i=3;i<=sqrt(max);++i)
{
for (f=0;f<=((max-min)/2);f++)
{
if (range[f]%i==0 && f!=i)
range[f]=0;
}
}
//troubleshoot only: print full range
for (f=0;f<=((max-min)/2);f++)
{
printf("ALL: %d / %d\n", f, range[f]);
}
//display all primes
if (min==9) /*print primes lower than 9 for ranges where min<9*/
printf("2\n3\n5\n7\n");
for (f=0;f<=((max-min)/2);f++) /*print non 0 numbers in array*/
{
if (range[f]!=0)
printf("%d\n", range[f]);
}
}
int main(void)
{
int digits1, digits2;
printf("\n\n\nCalculate Prime Numbers\n");
printf("This program will display all prime numbers in a given range. \nPlease set the range.\n");
printf("Minimum: ");
scanf("%d", &digits1);
if (digits1<9)
digits1=9;
printf("Maximum: ");
scanf("%d", &digits2);
printf("Calculating...");
printf("All prime numbers between %d and %d are:\n", digits1, digits2);
prime(digits1, digits2);
getchar();
getchar();
}
For example, if digits=1 and digits2=200 my program outputs all primes between 1 and 200 except 11 and 13. 11 and 13 are sieved out and I can't figure out why this happens to more and more low numbers as digits2 is increased.
3) Finally, is my sieve a proper sieve of eratosthenes? It kind of works but I feel like there is a more efficient way of sieving out non primes but I can't figure out how to implement it. One of my goals for this program is to be as efficient as possible. Again, what I have right now is:
//assign 0 to cell if divisible by any number other than itself
for (i=3;i<=sqrt(max);++i)
{
for (f=0;f<=((max-min)/2);f++)
{
if (range[f]%i==0 && f!=i)
range[f]=0;
}
}
Thanks for reading all of that! I'm sorry for posting yet another sieve of eratosthenes related question and thank you in advance for the help!
No, it is not a proper sieve of Eratosthenes. No testing of remainders is involved in the sieve of Eratosthenes algorithm, Wikipedia is real clear on this I think. :) The whole point to it is to avoid the trial divisions, to get the primes for free, without testing.
How? By generating their multiples, from every prime that we identify, in ascending order one after another.
The multiples of a prime p are: 2p, 2p + p, 2p + p + p, ...
The odd multiples of a prime p are: 3p, 3p + 2p, 3p + 2p + 2p, ...
As we enumerate them, we mark them in the sieve array. Some will be marked twice or more, e.g. 15 will be marked for 3 and for 5 (because 3 * 5 == 5 * 3). Thus, we can start enumerating and marking from p2:
for( i=3; i*i < n; i += 2 )
if( !sieve[i] ) // if `i` is not marked as composite
for( j = i*i; j < n; j += 2*i )
{
sieve[j] = 1; // 1 for composite, initially all are 0s
}
The key to the sieve is this: we don't store the numbers in the array. It is not an array of INTs; it is an array of 1-bit flags, 0 or 1 in value. The index of an entry in the sieve array signifies the number for which the sieve holds its status: marked, i.e. composite, or not yet marked, i.e. potentially prime.
So in the end, all the non-marked entries signify the primes. You will need to devise an addressing scheme of course, e.g. an entry at index i might correspond to the number a + 2*i where a is the odd start of the range. Since your range starts at some offset, this scheme is known as offset sieve of Eratosthenes. A skeleton C implementation is here.
To minimize the memory use, we need to treat our array as a bit array. In C++ e.g. it is easy: we declare it as vector<bool> and it is automatically bit-packed for us. In C we'll have to do some bit packing and unpacking ourselves.
A word of advice: don't go skimpy on interim variables. Name every meaningful entity in your program. There shouldn't be any (max-min)/2 in your code; but instead define width = max - min and use that name. Leave optimizations in the small to the compiler. :)
To your first question: it's a scope thing. Your code is equivalent to
if (max<=UINT_MAX)
{ unsigned int range[(max-min)/2]; } // note the curly braces!
else if (max<=ULONG_MAX)
{ unsigned long int range[(max-min)/2]; }
else if (max<=ULLONG_MAX)
{ unsigned long long int range[(max-min)/2]; }
so there's three range array declarations here, each in its own scope, inside the corresponding block. Each is created on entry to its enclosing block ({) and is destroyed on exit from it (}). In other words, it doesn't exist for the rest of your prime function anymore. Practically it means that if you declare your variable inside an if block, you can only use it inside that block (between the corresponding braces { and } ).
Q1: you can not declare a symbol (here: range) twice in the same scope. It is not exactly your problem but you are trying to do this: you declare range within the if scope and it is not visible outside.

Summation of nth prime

I'm writing a program that finds the summation of prime numbers. It must use redirected input. I've written it so that it finds the largest number for the input then uses that as the nth prime. It then uses the nth prime to set the size of the array. It works up until I try to printf the sum. I can't figure out why I'm getting a seg fault there of all place.I think I've allocated the arrays correctly with malloc. why would the fault happen at printf on not when I'm using my arrays? Any suggestions on my code are also welcome.
EDIT
used a test input of 2000 form 1 to 2000 and it worked but the full test file of 10000 form 1 to 10000 crashes still looking into why. I'm guessing I didnt allocate enough space
EDIT
My problem was in my sieve I didnt take the sqrt(nthprime) so it found more primes then the array could hold
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
int nprime (int max);
void sieve_sum ( int *primes, int nthprime, int tests,int *input);
int main(void)
{
int i=0;
int max=0; //largest input
int tests; //number of tests
int nthprime; //estimated nth prime
int *primes; //array of primes
int *input; // numbers to put in to P(n), where p(n) is the summation of primes
scanf("%d",&tests); //gets number of tests
input = malloc(sizeof(int)*tests);
//test values
for(i=0; i<=tests-1; i++)
scanf("%d",&input[i]);
//finds max test value
i=0;
for (i = 0; i < tests; i++ )
{
if ( input[i] > max-1 )
max = input[i];
}
// calls nprime places value in n
nthprime = nprime(max);
primes = malloc(sizeof(int)*nthprime);
// calls sieve_sum
sieve_sum( primes, nthprime, tests, input);
//free memory
free(input);
free(primes);
return 0;
}
//finds Primes and their sum
void sieve_sum ( int *primes, int nthprime, int tests,int *input)
{
int i;
int j;
//fills in arrays with 1's
for(i=2; i<=nthprime; i++)
primes[i] = 1;
//replaces non primes with 0's
i=0;
for(i=2; i<=sqrt(nthprime); i++)
{
if(primes[i] == 1)
{
for(j=i; (i*j)<=(nthprime); j++)
primes[(i*j)] = 0;
}
}
//rewrites array with only primes
j=1;
i=0;
for(i=2; i<=nthprime; i++)
{
if(primes[i] == 1)
{
primes[j] = i;
j++;
}
}
//sums
i=0;
for ( i=1; i<=tests; i++ )
{
int sum=0;//sum of primes
j=0;
for(j=1; j<=input[i-1]; j++)
{
sum = primes[j] + sum;
}
printf("%d\n", sum );
}
return 0;
}
//finds the Nth number prime
int nprime (int max)
{
//aproximization of pi(n) (the nth prime) times 2 ensures correct allocation of memory
max = ceil( max*((log (max)) + log ((log (max)))))*2;
return (max);
}
Example input file:
20
1
2
3
4
5
6
7
8
9
10
10
9
8
7
6
5
4
3
2
1
Example output should be:
2
5
10
17
28
41
58
77
100
129
129
100
77
58
41
28
17
10
5
2
OK, so I'm going to have a shot at this without saying too much because it looks like it might be a homework problem.
My best guess is that a lot of people have shied away from even looking at your code because it's a bit too messy for their level of patience. It's not irredeemable, but you would probably get a better response if it were considerably cleaner.
So, first, a few critical comments on your code to help you clean it up: it is insufficiently well-commented to make your intentions clear at any level, including what the overall purpose of the program is; it is inconsistently indented and spaced in an unconventional manner; and your choice of variable names leaves something to be desired, which is exacerbated by the absence of comments on variable declarations.
You should compile this code with something like (assuming your source file is called sumprimes.c):
gcc -std=c99 -pedantic -Wall -Wextra -o sumprimes sumprimes.c -lm
Have a look at the warnings this produces, and it will alert you to a few, admittedly fairly minor, problems.
The major immediate issue that I can see by inspection is that your program will certainly segfault because the storage that you have allocated with malloc() is too small by a factor of sizeof(int), which you have omitted.
A static error-checker like splint would help you detect some further issues; there's no need to slavishly follow all of its recommendations, though: once you understand them all, you can decide which ones to follow.
A few other remarks:
“Magic numbers”, such as 100 in your code, are considered very bad form. As a rule of thumb, the only number that should ever appear in code literally is 0 (zero), and then only sometimes. Your 100 would be much better represented as something named (like a const int or, more traditionally, a #define) to give an indication of its meaning.
It is unconventional to “outdent” variable declarations in the manner done in your code
If a function is advertised to return a value, you should always check it for errors, e.g. make sure that the return value of malloc() is not NULL, check that the return value of scanf() (if you use it) is the expected value, etc.
As a general matter of style, it is usually considered good practice in C to declare variables one per line with a short explanatory comment. There are exceptions, but it's a reasonable rule-of-thumb.
For any kind of input, scanf() is a poor choice because it alters the state of stdin in ways that are difficult to predict unless the input is exactly as expected, which you can never depend on. If you want to read in an integer, it is much better to read what's available on stdin with fgets() into a buffer, and then use strtol(), as you can do much more effective error-checking and reporting that way.
It is not recommended to cast the return of malloc() any more.
Hope this helps.

storing known sequences in c

I'm working on Project Euler #14 in C and have figured out the basic algorithm; however, it runs insufferably slow for large numbers, e.g. 2,000,000 as wanted; I presume because it has to generate the sequence over and over again, even though there should be a way to store known sequences (e.g., once we get to a 16, we know from previous experience that the next numbers are 8, 4, 2, then 1).
I'm not exactly sure how to do this with C's fixed-length array, but there must be a good way (that's amazingly efficient, I'm sure). Thanks in advance.
Here's what I currently have, if it helps.
#include <stdio.h>
#define UPTO 2000000
int collatzlen(int n);
int main(){
int i, l=-1, li=-1, c=0;
for(i=1; i<=UPTO; i++){
if( (c=collatzlen(i)) > l) l=c, li=i;
}
printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
return 1;
}
/* n != 0 */
int collatzlen(int n){
int len = 0;
while(n>1) n = (n%2==0 ? n/2 : 3*n+1), len+=1;
return len;
}
Your original program needs 3.5 seconds on my machine. Is it insufferably slow for you?
My dirty and ugly version needs 0.3 seconds. It uses a global array to store the values already calculated. And use them in future calculations.
int collatzlen2(unsigned long n);
static unsigned long array[2000000 + 1];//to store those already calculated
int main()
{
int i, l=-1, li=-1, c=0;
int x;
for(x = 0; x < 2000000 + 1; x++) {
array[x] = -1;//use -1 to denote not-calculated yet
}
for(i=1; i<=UPTO; i++){
if( (c=collatzlen2(i)) > l) l=c, li=i;
}
printf("Greatest length:\t\t%7d\nGreatest starting point:\t%7d\n", l, li);
return 1;
}
int collatzlen2(unsigned long n){
unsigned long len = 0;
unsigned long m = n;
while(n > 1){
if(n > 2000000 || array[n] == -1){ // outside range or not-calculated yet
n = (n%2 == 0 ? n/2 : 3*n+1);
len+=1;
}
else{ // if already calculated, use the value
len += array[n];
n = 1; // to get out of the while-loop
}
}
array[m] = len;
return len;
}
Given that this is essentially a throw-away program (i.e. once you've run it and got the answer, you're not going to be supporting it for years :), I would suggest having a global variable to hold the lengths of sequences already calculated:
int lengthfrom[UPTO] = {};
If your maximum size is a few million, then we're talking megabytes of memory, which should easily fit in RAM at once.
The above will initialise the array to zeros at startup. In your program - for each iteration, check whether the array contains zero. If it does - you'll have to keep going with the computation. If not - then you know that carrying on would go on for that many more iterations, so just add that to the number you've done so far and you're done. And then store the new result in the array, of course.
Don't be tempted to use a local variable for an array of this size: that will try to allocate it on the stack, which won't be big enough and will likely crash.
Also - remember that with this sequence the values go up as well as down, so you'll need to cope with that in your program (probably by having the array longer than UPTO values, and using an assert() to guard against indices greater than the size of the array).
If I recall correctly, your problem isn't a slow algorithm: the algorithm you have now is fast enough for what PE asks you to do. The problem is overflow: you sometimes end up multiplying your number by 3 so many times that it will eventually exceed the maximum value that can be stored in a signed int. Use unsigned ints, and if that still doesn't work (but I'm pretty sure it does), use 64 bit ints (long long).
This should run very fast, but if you want to do it even faster, the other answers already addressed that.

Resources