Generating random numbers in sorted order - C

I want to generate random numbers in sorted order.
I wrote the code below:
void CreateSortedNode(pNode head)
{
    int size = 10, last = 0;
    pNode temp;
    while (size-- > 0) {
        temp = (pNode)malloc(sizeof(struct node));
        last += (rand() % 10);
        temp->data = last;  //running sum of random increments keeps the data sorted
        list_add(temp);
    }
}
[EDIT:]
Expecting the numbers to be generated in increasing (or decreasing) order, e.g. {2, 5, 9, 23, 45, 68}:
int main()
{
    int size = 10, last = 0;
    while (size-- > 0) {
        last += (rand() % 10);
        printf("%4d", last);
    }
    return 0;
}
Any better idea?

Solved back in 1979 (by Bentley and Saxe at Carnegie-Mellon):
https://apps.dtic.mil/dtic/tr/fulltext/u2/a066739.pdf
The solution is ridiculously compact in terms of code, too!
Their paper is in Pascal; I converted it to Python, so it should be easy to port to any language:
from random import random

cur_max = 100  # desired maximum random number
n = 100        # size of the array to fill
x = [0] * n    # generate an array x of size n
for i in range(n, 0, -1):
    cur_max = cur_max * random()**(1/i)  # the magic formula
    x[i-1] = cur_max
print(x)  # the results
Enjoy your sorted random numbers...
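Since the question is tagged C, here is a minimal C sketch of the same formula. The names are mine, and the rand()-based uniform draw is an assumption (any generator of doubles in (0,1) works):

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

int main(void)
{
    double cur_max = 100.0;  /* desired maximum random number */
    enum { n = 100 };        /* how many values to generate */
    double x[n];

    srand((unsigned)time(NULL));
    for (int i = n; i > 0; i--) {
        /* uniform double in (0,1); the offsets keep it away from 0 and 1 */
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        cur_max *= pow(u, 1.0 / i);  /* the magic formula */
        x[i - 1] = cur_max;
    }
    for (int i = 0; i < n; i++)
        printf("%g\n", x[i]);
    return 0;
}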

Without any information about sample size or sample universe, it's not easy to know if the following is interesting but irrelevant or a solution, but since it is in any case interesting, here goes.
The problem:
In O(1) space, produce an unbiased ordered random sample of size n from an ordered set S of size N: <S_1, S_2, ..., S_N>, such that the elements in the sample are in the same order as the elements in the ordered set.
The solution:
1. With probability n/|S|, add S_1 to the sample and decrement n.
2. Remove S_1 from S.
3. Repeat steps 1 and 2, each time with the new first element (and size) of S, until n is 0, at which point the sample will have the desired number of elements.
The solution in python:
from random import randrange

# select n random integers in order from range(N)
def sample(n, N):
    # insist that 0 <= n <= N
    for i in range(N):
        if randrange(N - i) < n:
            yield i
            n -= 1
            if n <= 0:
                break
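Since the surrounding questions are in C, the same O(N) sampler as a C sketch (my naming; it assumes rand() has been seeded and that the small modulo bias of rand() % m is acceptable here):

#include <stdio.h>
#include <stdlib.h>

/* Print n integers selected in increasing order from 0 .. N-1. */
void sample(int n, int N)
{
    for (int i = 0; i < N && n > 0; i++) {
        if (rand() % (N - i) < n) {  /* select i with probability n/(N-i) */
            printf("%d\n", i);
            n--;
        }
    }
}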
The problem with the solution:
It takes O(N) time. We'd really like to take O(n) time, since n is likely to be much smaller than N. On the other hand, we'd like to retain the O(1) space, in case n is also quite large.
A better solution (outline only)
(The following is adapted from a 1987 paper by Jeffrey Scott Vitter, "An Efficient Algorithm for Sequential Random Sampling"; see Dr. Vitter's publications page. Please read the paper for the details.)
Instead of incrementing i and selecting a random number, as in the above python code, it would be cool if we could generate a random number according to some distribution which would be the number of times that i will be incremented without any element being yielded. All we need is the distribution (which will obviously depend on the current values of n and N.)
Of course, we can derive the distribution precisely from an examination of the algorithm. That doesn't help much, though, because the resulting formula requires a lot of time to compute accurately, and the end result is still O(N).
However, we don't always have to compute it accurately. Suppose we have some easily computable, reasonably good approximation which consistently underestimates the probabilities (with the consequence that it will sometimes not make a prediction). If that approximation works, we can use it; if not, we fall back to the accurate computation. If that happens sufficiently rarely, we might be able to achieve O(n) on average. And indeed, Dr. Vitter's paper shows how to do this. (With code.)

Suppose you wanted to generate just three random numbers, x, y, and z so that they are in sorted order x <= y <= z. You will place these in some C++ container, which I'll just denote as a list like D = [x, y, z], so we can also say that x is component 0 of D, or D_0 and so on.
For any sequential algorithm that first draws a random value for x, let's say it comes up with 2.5. This tells us some information about what y has to be, namely y >= 2.5.
So, conditional on the value of x, your desired random number algorithm has to satisfy the property that p(y >= x | x) = 1. If the distribution you are drawing from is anything like a common distribution, like uniform or Gaussian, then it's easy to see that usually p(y >= x) would be some other expression involving the density for that distribution. (In fact, only a pathological distribution like a Dirac delta at "infinity" could be independent, and that would be nonsense for your application.)
So what we can speculate with great confidence is that p(y >= t | x) for various values of t is not equal to p(y >= t). That's the definition for dependent random variables. So now you know that the random variable y (second in your eventual list) is not statistically independent of x.
Another way to state it is that in your output data D, the components of D are not statistically independent observations. And in fact they must be positively correlated since if we learn that x is bigger than we thought, we also automatically learn that y is bigger than or equal to what we thought.
In this sense, a sequential algorithm that provides this kind of output is an example of a Markov Chain. The probability distribution of a given number in the sequence is conditionally dependent on the previous number.
If you really want a Markov Chain like that (I suspect that you don't), then you could instead draw a first number at random (for x) and then draw positive deltas, which you will add to each successive number, like this:
Draw a value for x, say 2.5
Draw a strictly positive value for y-x, say 13.7, so y is 2.5 + 13.7 = 16.2
Draw a strictly positive value for z-y, say 0.001, so z is 16.201
and so on...
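A minimal C sketch of that construction (the names and the uniform deltas are my own choices; any strictly positive distribution for the deltas would do):

#include <stdio.h>
#include <stdlib.h>

static double uniform01(void)
{
    return rand() / (double)RAND_MAX;
}

int main(void)
{
    int n = 5;
    double value = uniform01() * 10.0;        /* draw a value for x */
    for (int i = 0; i < n; i++) {
        printf("%f\n", value);
        value += 0.001 + uniform01() * 10.0;  /* strictly positive delta */
    }
    return 0;
}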
You just have to acknowledge that the components of your result are not statistically independent, and so you cannot use them in an application that relies on statistical independence assumptions.


Monte Carlo integration of the Gaussian function f(x) = exp(-x^2/2) in C incorrect output

I'm writing a short program to approximate the definite integral of the Gaussian function f(x) = exp(-x^2/2), and my code is as follows:
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

double gaussian(double x) {
    return exp((-pow(x,2))/2);
}

int main(void) {
    srand(0);
    double valIntegral, yReal = 0, xRand, yRand, yBound;
    int xMin, xMax, numTrials, countY = 0;
    do {
        printf("Please enter the number of trials (n): ");
        scanf("%d", &numTrials);
        if (numTrials < 1) {
            printf("Exiting.\n");
            return 0;
        }
        printf("Enter the interval of integration (a b): ");
        scanf("%d %d", &xMin, &xMax);
        while (xMin > xMax) { //keeps looping until a valid interval is entered
            printf("Invalid interval!\n");
            printf("Enter the interval of integration (a b): ");
            scanf("%d %d", &xMin, &xMax);
        }
        //check real y upper bound
        if (gaussian((double)xMax) > gaussian((double)xMin))
            yBound = gaussian((double)xMax);
        else
            yBound = gaussian((double)xMin);
        for (int i = 0; i < numTrials; i++) {
            xRand = (rand()% ((xMax-xMin)*1000 + 1))/1000.00 + xMin; //random x between xMin and xMax to 3 decimal places
            yRand = (rand()% (int)(yBound*1000 + 1))/1000.00;        //random y between 0 and yBound to 3 decimal places
            yReal = gaussian(xRand);
            if (yRand < yReal)
                countY++;
        }
        valIntegral = (xMax-xMin)*((double)countY/numTrials);
        printf("Integral of exp(-x^2/2) on [%.3lf, %.3lf] with n = %d trials is: %.3lf\n\n", (double)xMin, (double)xMax, numTrials, valIntegral);
        countY = 0; //reset countY to 0 for the next run
    } while (numTrials >= 1);
    return 0;
}
However, the output from my code doesn't match the solutions. I tried to debug and print out all the xRand, yRand and yReal values for 100 trials (and checked the yReal values for particular xRand values with Matlab, in case I had any typos), and those values didn't seem to be out of range in any way... I don't know where my mistake is.
The correct output for # of trials = 100 on [0, 1] is 0.810, and mine is 0.880; the correct output for # of trials = 50 on [-1, 0] is 0.900, and mine was 0.940. Can anyone see what I did wrong? Thanks a lot.
Another question is, I can't find a reference to the use of the following code:
double randomNumber = rand() / (double) RAND_MAX;
but it was provided by the instructor, who said it would generate a random number from 0 to 1. Why did he use '/' instead of '%' after rand()?
There are a few logical errors / discussion points in your code, both mathematics- and programming-wise.
First of all, just to get it out of the way: we're talking about the standard Gaussian here, i.e. f(x) = exp(-x^2/2) / sqrt(2*pi), except that your definition of gaussian omits the 1/sqrt(2*pi) normalising term. Given the outputs you seem to expect, this seems to have been done on purpose. Fair enough. But if you wanted to calculate the actual integral, such that a practically infinite range (e.g. [-1000, 1000]) would sum up to 1, then you would need that term.
Is my code logically correct?
No. Your code has two logical errors: one in the if statement that sets yBound, and one in the calculation of valIntegral, which is a direct consequence of the first.
For the first error, consider the shape of the curve over the integration interval to see why:
Your Monte Carlo process effectively considers a bounded box over a certain range, and then says "I will randomly place points inside this box, and then count the proportion of the total number of points that randomly fell under the curve; the integral estimate is then the area of the bounded box itself, times this proportion".
Now, if both xMin and xMax are to the left of the mean (i.e. 0), then your if statement correctly sets the box's upper bound (i.e. yBound) to gaussian(xMax), such that the topmost bound of the box contains the highest part of that curve. So, e.g., to estimate the integral over the range [-2, -1], you set the upper bound to gaussian(-1).
Similarly, if both xMin and xMax are to the right of the mean, then you correctly set yBound to gaussian(xMin).
However, if xMin < 0 < xMax, you should be setting yBound to neither gaussian(xMin) nor gaussian(xMax), since the 0 point is higher than both! So in this case your yBound should simply be at the peak of the Gaussian, i.e. gaussian(0) (which in your case of an unnormalised Gaussian takes the value 1).
Therefore, the correct if statement is as follows:
if (xMax < 0.0)
    yBound = gaussian((double)xMax);
else if (xMin > 0.0)
    yBound = gaussian((double)xMin);
else
    yBound = gaussian(0.0);
As for the second logical error, we already mentioned that the value of the integral is the "area of the bounding box" times the "proportion of successes". However, you seem to ignore the height of the box in your calculation. It is true that in the special case where xMin < 0 < xMax, the height of your unnormalised Gaussian's bounding box defaults to 1, so this term can be omitted; I suspect that this is why it may have been missed. However, in the other two cases, the height of the bounding box is necessarily less than 1, and therefore needs to be included in the calculation. So the correct calculation of valIntegral should be:
valIntegral = yBound * (xMax-xMin) * (((double)countY)/numTrials);
Why am I not getting the correct output?
Despite the above logical errors, your output should have been correct for the specific intervals [0, 1] and [-1, 0] (since they include the mean, and therefore use the correct yBound of 1). So why are you still getting a 'wrong' output?
The answer is, you are not. Your output is "correct". Except, a Monte Carlo process involves randomness, and 100 trials is not a big enough number to lead to consistent results. If you run the same range for 100 trials again and again, you'll see you'll get very different results each time (though, overall, they'll be distributed around the right value). Run with 1000000 trials, and you'll see that the result becomes a lot more precise.
What's up with that randomNumber code?
The rand() function returns an integer in the range [0, RAND_MAX], where RAND_MAX is system-specific (have a look at man 3 rand).
The modulo approach (i.e. %) works as follows: consider the range [-0.1, 0.3]. This range spans 0.4 units. 0.4 * 1000 + 1 = 401. For a random number from 0 to RAND_MAX, doing rand() modulo 401 will always result in a random number in the range [0,400]. If you then divide this back by 1000, you get a random number in the range [0, 0.4]. Add this to your xmin offset (here: -0.1) and you get a random number in the range [-0.1, 0.3].
In theory, this makes sense. However, unfortunately, as already pointed out in the other answer here, as a method it is susceptible to modulo bias, because RAND_MAX isn't necessarily exactly divisible by 401, therefore the top part of that range leading up to RAND_MAX overrepresents some numbers compared to others.
By contrast, the approach given to you by your teacher is simply saying: divide the result of the rand() function with RAND_MAX. This effectively normalises the returned random number into the range [0,1]. This is a much more straightforward thing to do, and it avoids modulo bias.
Therefore, the way I would implement this would be to make it into a function:
double randomNumber(void) {
    return rand() / (double) RAND_MAX;
}
which then simplifies your computations as follows too:
xRand = randomNumber() * (xMax-xMin) + xMin;
yRand = randomNumber() * yBound;
You can see that this is a much more accurate thing to do, if you use a normalised gaussian, i.e.
double gaussian(double x) {
    return exp((-pow(x,2.0))/2.0) / sqrt(2.0 * M_PI);
}
and then compare the two methods. You will see that the randomNumber() method for an "effectively infinite" range (e.g. [-1000,1000]) gives the correct result of 1, whereas the modulo approach tends to give numbers that are larger than 1.
Your code has no obvious bug (though there is a bug in the upper-bound calculation, as @TasosPapastylianou points out, it isn't the issue in your test cases). On 100 trials, your answer of 0.880 is closer to the actual value of the integral (0.855624...) than 0.810, and neither of those numbers is so far from the true value as to suggest an outright bug in the code. This seems to be within sampling error (though see below); a histogram of 1000 runs of a Monte Carlo integration (done in R, but with the same algorithm) of exp(-x^2/2) on [0, 1] with 100 trials bears this out.
Unless your instructor specified the algorithm and the seed in precise detail, you shouldn't expect the exact same answer.
As for your second question about rand() / (double) RAND_MAX: it is an attempt to avoid modulo bias. It is possible that such a bias is affecting your code (especially given the way you round to 3 decimal places), since it does seem to overestimate the integral (based on running it a dozen times or so). Perhaps you could use that in your code and see if you get better results.

Deterministic bit scrambling to filter coordinates

I am trying to write a function that, given an (x,y) coordinate pair and the random seed of the program, will pseudo-randomly return true for some preset percentage of all such pairs. There are no limits on x or y beyond the restrictions of the data type, which is a 32-bit signed int.
My current approach is to scramble the bits of x, y, and the seed together and then compare the resulting number to the percentage:
float percentage = 0.005;
...
unsigned int n = (x ^ y) ^ seed;
return (((float) n / UINT_MAX) < percentage);
However, it seems that this approach would be biased for certain values of x and y. For example, if it returns true for (0,a), it will also return true for (a,0).
I know this implementation that just XORs them together is naive. Is there a better bit-scrambling algorithm to use here that will not be biased?
Edit: To clarify, I am not starting with a set of (x,y) coordinates, nor am I trying to get a fixed-size set of coordinates that evaluate to true. The function should be able to evaluate a truth value for arbitrary x, y, and seed, with the percentage controlling the average frequency of "true" coordinates.
The easy solution is to use a good hashing algorithm. You can do the range check on the value of hash(seed || x || y).
Of course, selecting points individually with percentage p does not guarantee that you will end up with a sample whose size will be exactly p * N. (That's the expected size of the sample, but any given sample will deviate a bit.) If you want to get a sample of size precisely k from a universe of N objects, you can use the following simple algorithm:
1. Examine the elements of the universe one at a time, until k reaches 0.
2. When examining element i, add it to the sample if its hash value, mapped onto the range [0, N-i), is less than k. If you add the element to the sample, decrement k.
There's no way to get the arithmetic absolutely perfect (since there is no way to perfectly partition 2^i different hash values into n buckets unless n is a power of 2), so there will always be a tiny bias. (Floating point arithmetic does not help; the number of possible floating point values is also fixed, and suffers from the same bias.)
If you do 64-bit arithmetic, the bias will be truly tiny, but the arithmetic is more complicated unless your environment provides a 128-bit multiply. So you might feel satisfied with 32-bit computations, where the bias of one in a couple of thousand million [Note 1] doesn't matter. Here, you can use the fact that any 32 bits in your hash should be as unbiased as any other 32 bits, assuming your hash algorithm is any good (see below). So the following check should work fine:
// I need k elements from a remaining universe of n, and I have a 64-bit hash.
// Return true if I should select this element.
bool select(uint32_t n, uint32_t k, uint64_t hash) {
    return ((hash & (uint32_t)(-1)) * (uint64_t)n) >> 32 < k;
}

// Untested example sampler:
// select exactly k elements from U, using a seed value
std::vector<E> sample(const std::vector<E>& U, uint64_t seed, uint32_t k) {
    std::vector<E> retval;
    for (uint32_t n = U.size(); k && n;) {
        const E& elt = U[--n];
        if (select(n, k, hash_function(seed, elt))) {
            retval.push_back(elt);
            --k;
        }
    }
    return retval;
}
Assuming you need to do this a lot, you'll want to use a fast hash algorithm; since you're not actually working in a secure environment, you don't need to worry about whether the algorithm is cryptographically secure.
Many high-speed hashing algorithms work on 64-bit units, so you could maximize the speed by constructing a 128-bit input consisting of a 64-bit seed and the two 32-bit co-ordinates. You can then unroll the hash loop to do exactly two blocks.
I won't venture a guess at the best hash function for your purpose. You might want to check out one or more of these open-source hashing functions:
Farmhash https://code.google.com/p/farmhash/
Murmurhash https://code.google.com/p/smhasher/
xxhash https://code.google.com/p/xxhash/
siphash https://github.com/majek/csiphash/
... and many more.
Notes
1. A couple of billion, if you're on that side of the Atlantic.
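As a concrete illustration of the hash(seed || x || y) range check suggested above, here is a sketch that mixes the seed and coordinates with a splitmix64-style finalizer (the choice of mixer and all names are mine, not a recommendation from the list above; the constants are from Vigna's splitmix64):

#include <stdbool.h>
#include <stdint.h>

/* splitmix64 finalizer: a well-known 64-bit mixing function */
static uint64_t mix64(uint64_t z)
{
    z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
    z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
    return z ^ (z >> 31);
}

bool coord_true(int32_t x, int32_t y, uint64_t seed, double percentage)
{
    uint64_t h = mix64(seed + 0x9E3779B97F4A7C15ULL);
    h = mix64(h ^ (uint32_t)x);  /* fold in the coordinates one at a time */
    h = mix64(h ^ (uint32_t)y);
    /* range check: top 53 bits as a uniform double in [0,1) */
    return (h >> 11) * (1.0 / 9007199254740992.0) < percentage;
}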
I would prefer feeding seed, x, and y through a Combined Linear Congruential Generator.
This is generally much faster than hashing, and it is designed specifically for the purpose: To output a pseudo-random number uniformly in a certain range.
Using coefficients recommended by Wichmann-Hill (which are also used in some versions of Microsoft Excel) we can do:
si = 171 * s % 30269;
xi = 172 * x % 30307;
yi = 170 * y % 30323;
r_combined = fmod(si/30269. + xi/30307. + yi/30323., 1.);
return r_combined < percentage;
Where s is the seed on the first call, and the previous si on each subsequent call. (Thanks to rici's comment for this point.)
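A self-contained sketch of that combined check (names are mine; per the note above, s starts as the program seed and carries the previous si between calls, so results depend on call order):

#include <math.h>
#include <stdbool.h>
#include <stdint.h>

bool coord_check(int32_t x, int32_t y, uint32_t seed, double percentage)
{
    static uint32_t s = 0;
    if (s == 0)
        s = seed % 30269 + 1;  /* first call: derive s from the seed */
    s = 171u * s % 30269u;
    uint32_t xi = 172u * ((uint32_t)x % 30307u) % 30307u;
    uint32_t yi = 170u * ((uint32_t)y % 30323u) % 30323u;
    double r = fmod(s / 30269.0 + xi / 30307.0 + yi / 30323.0, 1.0);
    return r < percentage;
}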

fast algorithm of finding sums in array

I am looking for a fast algorithm:
I have an int array of size n; the goal is to find all triples x1, x2, x3 of distinct elements in the array such that x1 + x2 = x3.
For example, for an int array of size 3, [1, 2, 3], there's only one possibility: 1+2 = 3 (counting 1+2 and 2+1 as the same).
I am thinking about using pairs and hashmaps to make the algorithm fast (the fastest one I have now is still O(n^2)).
Please share your ideas for this problem, thank you.
Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you're not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).
This is basically the 3SUM problem. Without potentially unboundedly large elements, the best known algorithms are approximately O(n^2), but we've only proved that it can't be faster than O(n lg n) for most models of computation.
If the integer elements lie in the range [u, v], you can do a slightly different version of this in O(n + (v-u) lg (v-u)) with an FFT. I'm going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.
The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence a, b, c with c - b = b - a, or equivalently, a + c = 2b.
Unfortunately, the last step of the transformation back isn't as fast as I'd like, but I'll talk about that when we get there.
Let's call your original array X, which contains integers x_1, ..., x_n. We want to find indices i, j, k such that x_i + x_j = x_k.
Find the minimum u and maximum v of X in O(n) time. Let u' be min(u, u*2) and v' be max(v, v*2).
Construct a binary array (bitstring) Z of length v' - u' + 1; Z[i] will be true if either X or its double [x_1*2, ..., x_n*2] contains u' + i. This is O(n) to initialize; just walk over each element of X and set the two corresponding elements of Z.
As we're building this array, we can save the indices of any duplicates we find into an auxiliary list Y. Once Z is complete, we just check for 2 * x_i for each x_i in Y. If any are present, we're done; otherwise the duplicates are irrelevant, and we can forget about Y. (The only situation slightly more complicated is if 0 is repeated; then we need three distinct copies of it to get a solution.)
Now, a solution to your problem, i.e. x_i + x_j = x_k, will appear in Z as three evenly-spaced ones, since some simple algebraic manipulations give us 2*x_j - x_k = x_k - 2*x_i. Note that the elements on the ends are our special doubled entries (from 2X) and the one in the middle is a regular entry (from X).
Consider Z as a representation of a polynomial p, where the coefficient for the term of degree i is Z[i]. If X is [1, 2, 3, 5], then Z is 1111110001 (because we have 1, 2, 3, 4, 5, 6, and 10); p is then 1 + x + x^2 + x^3 + x^4 + x^5 + x^9.
Now, remember from high school algebra that the coefficient of x^c in the product of two polynomials is the sum, over all a, b with a + b = c, of the first polynomial's coefficient for x^a times the second's coefficient for x^b. So, if we consider q = p^2, the coefficient of x^(2j) (for a j with Z[j] = 1) will be the sum over all i of Z[i] * Z[2*j - i]. But since Z is binary, that's exactly the number of triplets i,j,k which are evenly-spaced ones in Z. Note that (j, j, j) is always such a triplet, so we only care about coefficients with value > 1.
We can then use a Fast Fourier Transform to find p^2 in O(|Z| log |Z|) time, where |Z| is v' - u' + 1. We get out another array of coefficients; call it W.
Loop over each x_k in X. (Recall that our desired evenly-spaced ones are all centered on an element of X, not 2*X.) If the corresponding W for twice this element, i.e. W[2*(x_k - u')], is 1, we know it's not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)
Otherwise, it might be the center of a progression that we want (so we need to find i and j). But, unfortunately, it might also be the center of a progression that doesn't have our desired form. So we need to check. Loop over the other elements x_i of X, and check if there's a triple with 2*x_i, x_k, 2*x_j for some j (by checking Z[2*(x_k - x_j) - u']). If so, we have an answer; if we make it through all of X without a hit, then the FFT found only spurious answers, and we have to check another element of W.
This last step is therefore O(n * 1 + (number of x_k with W[2*(x_k - u')] > 1 that aren't actually solutions)), which is maybe possibly O(n^2), which is obviously not okay. There should be a way to avoid generating these spurious answers in the output W; if we knew that any appropriate W coefficient definitely had an answer, this last step would be O(n) and all would be well.
I think it's possible to use a somewhat different polynomial to do this, but I haven't gotten it to actually work. I'll think about it some more....
Partially based on this answer.
It has to be at least O(n^2), as there are n(n-1)/2 different sums possible to check against the other members. You have to compute all of those, because any pair's sum may equal any other member (start with one example and permute all the elements to convince yourself that all must be checked). Or look at the Fibonacci numbers for something concrete.
So calculating all the sums and looking up members in a hash table gives amortised O(n^2). Or use an ordered tree if you need the best worst case.
You essentially need to find all the different sums of value pairs, so I don't think you're going to do better than O(n^2). But you can optimize by sorting the list and removing duplicate values, then only pairing a value with anything equal or greater, and stopping when the sum exceeds the maximum value in the list (see the sketch below).
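A minimal C sketch of that sort-based O(n^2) scan (naming is mine; with repeated values it prints one representative pair per match, so enumerating every duplicate combination would still need the value-to-indices map mentioned in the first answer):

#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

void find_sums(int *a, int n)
{
    qsort(a, n, sizeof *a, cmp_int);
    for (int k = 0; k < n; k++) {              /* candidate x3 = a[k] */
        int lo = 0, hi = n - 1;
        while (lo < hi) {
            if (lo == k) { lo++; continue; }   /* x1, x2, x3 must be   */
            if (hi == k) { hi--; continue; }   /* different elements   */
            long sum = (long)a[lo] + a[hi];
            if (sum == a[k]) {
                printf("%d + %d = %d\n", a[lo], a[hi], a[k]);
                lo++;
                hi--;
            } else if (sum < a[k]) {
                lo++;
            } else {
                hi--;
            }
        }
    }
}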

Perfect Power detection in linear time

I'm trying to write a C program which, given a positive integer n (> 1), detects whether there exist numbers x and r such that n = x^r.
This is what I did so far:
while (c >= d) {
    double y = pow(sum, 1.0/d);
    if (floor(y) == y) {
        out = y;
        break;
    }
    d++;
}
In the program above, c is the maximum value for the exponent (r) and d starts out equal to 2; y is the value to be checked, and the variable out is set to output that value later on. Basically, what the program does is check whether the square root of the input (sum) is an integer: if not, it tries the cube root and so on... When it finds one, it stores the value of y in out, so that sum = out^d.
My question is: is there a more efficient way to find these values? I found some documentation online, but it's far more complicated than my high-school algebra. How can I implement this more efficiently?
Thanks!
In one of your comments, you state you want this to be compatible with gigantic numbers. In that case, you may want to bring in the GMP library, which supports operations on arbitrarily large numbers, one of those operations being checking if it is a perfect power.
It is open source, so you can check out the source code and see how they do it, if you don't want to bring in the whole library.
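For reference, a minimal sketch of the GMP route (the check is mpz_perfect_power_p; link with -lgmp; the sample value is arbitrary):

#include <stdio.h>
#include <gmp.h>

int main(void)
{
    mpz_t n;
    mpz_init_set_str(n, "12345678910987654321", 10);
    if (mpz_perfect_power_p(n))
        printf("n is a perfect power\n");
    else
        printf("n is not a perfect power\n");
    mpz_clear(n);
    return 0;
}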
If n fits in a fixed-size (e.g. 32-bit) integer variable, the optimal solution is probably just hard-coding the list of such numbers and binary-searching it. Keep in mind, in int range, there are roughly
sqrt(INT_MAX) perfect squares
cbrt(INT_MAX) perfect cubes
etc.
In 32 bits, that's roughly 65536 + 2048 + 256 + 128 + 64 + ... < 70000.
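A rough sketch of that precompute-and-binary-search idea (naming is mine; the 70000-entry table is about 280 KB of static data, and duplicates such as 64 = 2^6 = 4^3 = 8^2 are left in, which is harmless for lookup):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static uint32_t powers[70000];
static size_t npowers = 0;

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Build every x^r <= UINT32_MAX with x, r >= 2, then sort for bsearch. */
void build_powers(void)
{
    for (uint64_t x = 2; x * x <= UINT32_MAX; x++) {
        uint64_t p = x * x;
        while (p <= UINT32_MAX) {
            powers[npowers++] = (uint32_t)p;
            p *= x;
        }
    }
    qsort(powers, npowers, sizeof powers[0], cmp_u32);
}

bool is_perfect_power(uint32_t n)
{
    return bsearch(&n, powers, npowers, sizeof powers[0], cmp_u32) != NULL;
}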
You need the base-r logarithm; use an identity to calculate it from the natural log.
So:
log_r(x) = log(x)/log(r)
So you need to calculate:
x = log(n)/log(r)
(In my neck of the woods, this is high-school math. Which immediately explains my having to look up whether I remembered that identity correctly :))
After calculating y with
double y = pow(sum, 1.0/d);
you can get the nearest int to it and use your own integer power function to check for the equality condition with sum:
int x = (int)(y + 0.5);
int a = your_power_func(x, d);
if (a == sum)
    break;
I guess this way you can confirm whether a number is an integer power of some other number or not.

Algorithm to pick values from array that sum closest to a target value?

I have an array of nearly sorted values 28 elements long. I need to find the set of values that sums to a target value provided to the algorithm (or if exact sum cannot be found, the closest sum Below the target value).
I currently have a simple algorithm that does the job but it doesn't always find the best match. It works under ideal circumstances with a specific set of values, but I need a more robust and accurate solution that can handle a wider variety of data sets.
The algorithm must be written in C, not C++, and is meant for an embedded system so keep that in mind.
Here's my current algorithm for reference. It iterates starting at the highest value available. If the current value is less than the target sum, it adds the value to the output and subtracts it from the target sum. This repeats until the sum has been reached or it runs out of values. It assumes a nearly ascending sorted list.
//valuesOut will hold a bitmask of the values to be used (LSB representing array index 0, next bit index 1, etc)
void pickValues(long setTo, long* valuesOut)
{
    signed char i = 27;  //last index in array
    long mask = 0x00000001;
    (*valuesOut) = 0x00000000;
    mask = mask << i;  //shift to ith bit
    while (i >= 0 && setTo > 0)  //while more values needed and available
    {
        if (VALUES_ARRAY[i] <= setTo)
        {
            (*valuesOut) |= mask;  //set ith bit
            setTo = setTo - VALUES_ARRAY[i]._dword;  //remove from remaining
        }
        //decrement and iterate
        mask = mask >> 1;
        i--;
    }
}
A few more parameters:
The array of values is likely to be nearly sorted ascending, but that cannot be enforced, so assume that there is no sorting. In fact, there may also be duplicate values.
It is quite possible that the array will hold a set of values that cannot create every sum within its range. If the exact sum cannot be found, the algorithm should return values that create the next lowest sum.
This problem is known as the subset sum problem, which is a special case of the Knapsack problem. Wikipedia is a good starting point for some algorithms.
As others have noted, this is the same as the optimization version of the subset sum problem, which is NP-complete.
Since you mentioned that you are short on memory and can probably work with approximate solutions (based on your current solution), there are polynomial time approximation algorithms for solving the optimization version of subset sum.
For instance, given an e > 0, there is a polynomial time algorithm which uses O((n log t)/e) space (t is the target sum, n is the size of the array) and gives you a subset whose sum z is no less than 1/(1+e) times the optimal.
That is, if the largest subset sum not exceeding the target is y, then the algorithm finds a subset sum z such that
z <= y <= (1+e)z
and uses space O((n log t)/e).
Such an algorithm can be found here: http://www.cs.dartmouth.edu/~ac/Teach/CS105-Winter05/Notes/nanda-scribe-3.pdf
Hope this helps.
If the values are reasonably small, it's a simple dynamic programming (DP) problem. Time complexity would be O(n * target) and memory requirements O(target). If that satisfies you, there are lots of DP tutorials on the web. For example, here the first discussed problem (with coins) is very much like yours (except that they allow each number to be used more than once):
http://www.topcoder.com/tc?module=Static&d1=tutorials&d2=dynProg
update
Yep, as others have noted, it's a simple case of the knapsack problem.
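For concreteness, here is a minimal C sketch of that DP (names are mine; it assumes positive values and a target small enough that an O(target) boolean array fits in RAM, which matters on an embedded system; recovering which values were used as a bitmask would need extra bookkeeping, e.g. a parent array):

#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Returns the largest subset sum <= target. O(n*target) time, O(target) space. */
long bestSumBelow(const long values[], int n, long target)
{
    bool reachable[target + 1];  /* C99 VLA; use a static buffer if stack is tight */
    memset(reachable, 0, sizeof reachable);
    reachable[0] = true;
    for (int i = 0; i < n; i++)
        for (long s = target; s >= values[i]; s--)  /* descending: each value used once */
            if (reachable[s - values[i]])
                reachable[s] = true;
    for (long s = target; s >= 0; s--)
        if (reachable[s])
            return s;
    return 0;  /* not reached: reachable[0] is always true */
}

int main(void)
{
    long vals[] = { 3, 34, 4, 12, 5, 2 };
    printf("%ld\n", bestSumBelow(vals, 6, 30));  /* prints 26 (2+3+4+5+12) */
    return 0;
}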
