Dynamic programming with sets - c

I have a typical question in dynamic programming.
My question: given an array = {1,2,3,4,5,6}, I have to find all the subsets whose sum is at most k. If I consider all the sets, it becomes an exponential algorithm. I thought of achieving this by dynamic programming.
Suppose k = 7.
My idea is
Pass 1: {1},{2},...,{6}
Pass 2: Pass 1 + {1,2},{1,3},{1,4},{1,5},...
Pass 3: Pass 2 + {1,2,3},...
And my algorithm stops.
I'm not able to formulate this with dynamic programming. Any inputs? How do I formulate this algorithm into a program?

A DP solution for the problem should follow the next recursive formula, and build bottom-up:
f(i, 0) = {{}}              // a set containing only an empty set
f(0, W) = {{}}   (W > 0)
f(0, W) = {}     (W < 0)    // an empty set
f(i, W) = f(i-1, W) ∪ extend(f(i-1, W - element[i]), element[i])
Where the function extend(set, e) builds a new set of sets rather than modifying its argument in place (the subsets stored in f(i-1, ·) are shared between table entries):
extend(set, e):
    result = {}
    for each s in set:    // s is a set itself
        result.add(s ∪ {e})
    return result
Note that the complexity could still be exponential (and not even pseudo-polynomial), since the number of sets generated can be exponential, and all of them are stored in the DP table.
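To make the recurrence concrete, here is a minimal Python sketch of the bottom-up construction (a sketch of my own, not the answerer's code; it tracks only f(i, k), filtering by the remaining budget instead of materializing the whole table):

def subsets_at_most(elements, k):
    # f(0, k): only the empty subset qualifies
    result = [()]
    for e in elements:
        # extend(f(i-1, k - e), e): every subset found so far whose sum
        # still leaves room for e can be extended with e
        result += [s + (e,) for s in result if sum(s) + e <= k]
    return result

print(subsets_at_most([1, 2, 3, 4, 5, 6], 7))
# [(), (1,), (2,), (1, 2), (3,), (1, 3), (2, 3), ...]

As warned above, the output itself can be exponentially large, so the enumeration stays exponential no matter how it is organized.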

Your problem is an instance of the knapsack problem, whose related decision problem is known to be NP-complete. This means that most certainly there is no sub-exponential algorithm (though a mathematical proof, namely P != NP, is still missing).
ZachLangley's comment shows that enumerating all solutions would still take exponential time in the worst case even if there were an efficient decision procedure, since producing the output already requires exponential time.
Since the decision problem is NP-complete, counting cannot be easier (otherwise you could count and afterwards test whether the result equals 0 or not).

Related

Theory of arrays in Z3: (1) model is difficult to understand, (2) do not know how to implement functions and (3) difference with sequences

Following the question published in How expressive can we be with arrays in Z3(Py)? An example, I expressed the following formula in Z3Py:
Exists i::Integer s.t. (0 <= i < |arr|) & (avg(arr) + t < arr[i])
This means: is there a position i, 0 <= i < |arr|, in the array whose value arr[i] is greater than the average of the array avg(arr) plus a given threshold t?
The solution in Z3Py:
t = Int('t')
avg_arr = Int('avg_arr')
len_arr = Int('len_arr')
i = Int('i')  # index variable for the existential
arr = Array('arr', IntSort(), IntSort())
phi_1 = And(0 <= i, i < len_arr)   # i is a valid position
phi_2 = (t + avg_arr < arr[i])     # arr[i] exceeds avg + threshold
phi = Exists(i, And(phi_1, phi_2))
s = Solver()
s.add(phi)
print(s.check())
print(s.model())
Note that (1) the formula is satisfiable and (2) each time I execute it, I get a different model. For instance, I just got: [avg_arr = 0, t = 7718, len_arr = 1, arr = K(Int, 7719)].
I have three questions now:
What does arr = K(Int, 7719) mean? Does this mean the array contains one Int element with value 7719? In that case, what does the K mean?
Of course, this implementation is wrong in the sense that the average and length values are independent of the array itself. How can I implement simple avg and len functions?
Where is the i index in the model given by the solver?
Also, in which sense would this implementation be different using sequences instead of arrays?
(1) arr = K(Int, 7719) means that it's a constant array. That is, at every location it has the value 7719. Note that this is truly "at every location," i.e., at every integer value. There's no "size" of the array in SMTLib parlance. For that, use sequences.
(2) Indeed, your average/length etc. are not related at all to the array. There are ways of modeling this using quantifiers, but I'd recommend staying away from that: they are brittle, hard to code and maintain, and furthermore any interesting theorem you want to prove will likely get an unknown as the answer.
(3) The i you declared and the i you used as the existential are completely independent of each other. (The latter is just a trick so z3 can recognize it as a value.) But I guess you removed that now.
The proper way to model such problems is using sequences. (Although, you shouldn't expect much proof performance there either.) Start here: https://microsoft.github.io/z3guide/docs/theories/Sequences/ and see how much you can push it through. Functions like avg will need a recursive definition most likely, for that you can use RecAddDefinition, for an example see: https://stackoverflow.com/a/68457868/936310
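For a flavor of what that looks like, here is a hedged Z3Py sketch along those lines (sum_seq and the multiply-through encoding of the average are my own choices, not code from the linked guide, and proof performance is not guaranteed):

from z3 import *

IntSeq = SeqSort(IntSort())

# Recursive sum over an integer sequence, defined with RecAddDefinition.
sum_seq = RecFunction('sum_seq', IntSeq, IntSort())
s = Const('s', IntSeq)
RecAddDefinition(sum_seq, [s],
                 If(Length(s) == 0,
                    0,
                    Nth(s, 0) + sum_seq(SubSeq(s, 1, Length(s) - 1))))

arr = Const('arr', IntSeq)
t = Int('t')
i = Int('i')

# avg(arr) + t < arr[i], multiplied through by Length(arr) to avoid division.
phi = Exists(i, And(0 <= i, i < Length(arr),
                    sum_seq(arr) + t * Length(arr) < Nth(arr, i) * Length(arr)))

solver = Solver()
solver.add(Length(arr) > 0, phi)
print(solver.check())  # may well be unknown: recursive definitions are hard for z3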
Stack Overflow works best when you try to code these yourself and ask very specific questions about how to proceed, as opposed to overarching questions. (But you already knew that!) Best of luck!

Can a recursive function containing a for loop that contains a call of the mentioned function be implemented using only for loops?

Similar questions have been asked and the general consensus is that anything can be converted from recursion to for loops and vice versa. However, I can't find a way to convert a function of the following pseudocode type to a for loop:
def recursive(n):
    if n == 0:
        return
    for i in range(some_number):
        do_sth...
        recursive(n-1)
In this case, there are n nested loops, and n varies depending on the given argument. When using only for loops, the number of nested loops seems to be always predetermined in the code; it doesn't vary depending on the input. Is there a way to make something like this using only for loops?
Is there a way to make something like this using only for loops?
Well, if you admit a while loop as a case of a pseudocode for loop, at least your example can be converted:
def nonrecursive(n):
    a = []               # explicit stack of saved (n, i) frames
    z = 0                # loop counter to resume from
    while n:
        while n:         # descend: run iterations, saving a frame per "call"
            i = z
            if i == some_number:
                break
            print((n, i))        # stands in for do_sth
            a += [[n, i]]        # save the current frame
            n -= 1               # "call" recursive(n - 1)
            z = 0
        if not a:
            break
        n, i = a.pop()   # "return": restore the caller's frame
        i += 1           # and advance its loop counter
        z = i
We need to be careful here.
The general true statement is that loops can replace recursion and vice versa. This can be shown lots of ways; see the structured programming theorem for ideas.
Whether for loops can replace recursion depends upon your definitions. Can your for loops run forever, or for an indefinite amount of time not known in advance? If so, they are functionally equivalent to while loops, and they can replace recursion. If your for loops cannot be made to run forever or for an unknown (initially) number of iterations, recursion cannot always be replaced.
Really, it's while loops (plus a stack data structure) that can replace recursion without much trouble.
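To make that last point concrete, here is a minimal sketch (the names and the print standing in for do_sth are mine) that replaces the recursion with a single while loop plus an explicit stack of (n, i) frames:

def nonrecursive_stack(n, some_number=3):
    stack = [(n, 0)]                # one frame: recursive(n), iteration 0 pending
    while stack:
        n, i = stack.pop()
        if n == 0 or i == some_number:
            continue                # base case, or this frame's loop is finished
        print((n, i))               # stands in for do_sth
        stack.append((n, i + 1))    # resume this frame at the next iteration
        stack.append((n - 1, 0))    # "call" recursive(n - 1); popped first (LIFO)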

Does the array “sum and/or sub” to `x`?

Goal
I would like to write an algorithm (in C) which returns TRUE or FALSE (1 or 0) depending on whether the array A given in input can "sum and/or sub" to x (see below for clarification). Note that all values of A are integers bounded between [1, x-1] that were sampled uniformly at random.
Clarification and examples
By "sum and/or sub", I mean placing "+" and "-" in front of each element of the array and summing over. Let's call this function SumSub.
int SumSub (int* A,int x)
{
...
}
SumSub({2,7,5},10)
should return TRUE as 7-2+5=10. You will note that the first element of A can also be taken as negative so that the order of elements in A does not matter.
SumSub({2,7,5,2},10)
should return FALSE as there is no way to "sum and/or sub" the elements of the array to reach the value of x. Please note, this means that all elements of A must be used.
Complexity
Let n be the length of A. The complexity of the problem is of order O(2^n) if one has to explore all possible combinations of pluses and minuses. However, some combinations are more likely than others and are therefore worth exploring first (hoping the output will be TRUE). Typically, the combination which requires subtracting all elements from the largest number is impossible (as all elements of A are lower than x). Also, if n > x, it makes no sense to try adding all the elements of A.
Question
How should I go about writing this function?
Unfortunately the subset-sum problem, which is NP-complete, reduces to your problem. Thus the exponential solution can't be avoided in the general case.
The original problem's solution is indeed exponential as you said. BUT with the given range [1, x-1] for the numbers in A[] you can make the solution pseudo-polynomial. There is a very simple dynamic programming solution.
With the order:
Time complexity: O(n^2 * x)
Memory complexity: O(n^2 * x)
where n = the number of elements in A[].
You need to use a dynamic programming approach for this.
Every value that can be made lies in the range [-n*x, n*x]. Create a 2D array of size (n+1) x (2*n*x + 1); let's call this dp[][].
dp[i][j] = whether it is possible to make the value j using all of the first i elements of A[], i.e. A[0..i-1]
so
dp[10][3] = 1 means taking the first 10 elements of A[] we CAN create the value 3
dp[10][3] = 0 means taking the first 10 elements of A[] we can NOT create the value 3
Here is a kind of pseudo code for this:
int SumSub (int* A, int n, int x)    // n = number of elements in A
{
    // dp[i][j + OFFSET] is true iff some +/- signing of A[0..i-1] sums to j.
    // Signed sums lie in [-n*x, n*x], so indices are shifted by OFFSET = n*x.
    int OFFSET = n * x;
    bool dp[n + 1][2 * n * x + 1];   // set all values of this array to false
    dp[0][OFFSET] = true;            // the empty signing sums to 0
    for (int i = 1; i <= n; i++) {
        int val = A[i - 1];
        for (int j = -n * x; j <= n * x; j++) {
            bool can = false;
            if (j + val <= n * x)
                can |= dp[i - 1][j + val + OFFSET];
            if (j - val >= -n * x)
                can |= dp[i - 1][j - val + OFFSET];
            dp[i][j + OFFSET] = can;
        }
    }
    return dp[n][x + OFFSET];
}
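For reference, here is a hedged, runnable Python translation of the same recurrence (my own sketch, not the answerer's code); it tracks the set of reachable signed sums directly, which corresponds to one row of dp[][]:

def sum_sub(A, x):
    # reachable holds every signed sum achievable using all elements seen so far
    reachable = {0}
    for a in A:
        reachable = {s + a for s in reachable} | {s - a for s in reachable}
    return x in reachable

assert sum_sub([2, 7, 5], 10)        # 7 - 2 + 5 = 10
assert not sum_sub([2, 7, 5, 2], 10)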
Unfortunately this is NP-complete even when x is restricted to the value 0, so don't expect a polynomial-time algorithm. To show this I'll give a simple reduction from the NP-hard Partition Problem, which asks whether a given multiset of positive integers can be partitioned into two parts having equal sums:
Suppose we have an instance of the Partition Problem consisting of n positive integers B_1, ..., B_n. Create from this an instance of your problem in which A_i = B_i for each 1 <= i <= n, and set x = 0.
Clearly if there is a partition of B into two parts C and D having equal sums, then there is also a solution to the instance of your problem: Put a + in front of every number in C, and a - in front of every number in D (or the other way round). Since C and D have equal sums, this expression must equal 0.
OTOH, if the solution to the instance of your problem that we just created is YES (TRUE), then we can easily create a partition of B into two parts having equal sums: just put all the positive terms in one part (say, C), and all the negative terms (without the preceding - of course) in the other (say, D). Since we know that the total value of the expression is 0, it must be that the sum of the (positive) numbers in C is equal to the (negated) sum of the numbers in D.
Thus a YES to either problem instance implies a YES to the other problem instance, which in turn implies that a NO to either problem instance implies a NO to the other problem instance -- that is, the two problem instances have equal solutions. Thus if it were possible to solve your problem in polynomial time, it would be possible to solve the NP-hard Partition Problem in polynomial time too, by constructing the above instance of your problem, solving it with your poly-time algorithm, and reporting the result it gives.

Homework: Creating O(n) algorithm for sorting

I am taking the cs50 course on edx and am doing the hacker edition of pset3 (in essence it is the advanced version).
Basically the program takes a value to be searched for as the command-line argument, and then asks for a bunch of numbers to be used in an array.
Then it sorts and searches that array for the value entered at the command-line.
The way the program is implemented, it uses a pseudo-random number generator to feed the numbers for the array.
The task is to write the search and sorting functions.
I already have the searching function, but the sorting function is supposed to be O(n).
In the regular version you were supposed to use an O(n^2) algorithm, which wasn't a problem to implement, and an O(n log n) algorithm wouldn't have been an issue either.
But the problem set specifically asks for an O(n) algorithm.
It gives a hint, saying that no number in the array is going to be negative, and none is greater than LIMIT (the numbers output by the generator are modulo'd so they are not greater than 65000). But how does that help in getting the algorithm to be O(n)?
But the counting sort algorithm, which purports to be an acceptable solution, returns a new sorted array rather than actually sorting the original one, and that contradicts the pset specification, which reads: 'As this return type of void implies, this function must not return a sorted array; it must instead "destructively" sort the actual array that it's passed by moving around the values therein.'
Also, if we decide to copy the sorted array back onto the original one using another loop, with so many consecutive loops I'm not sure the sorting function can still be considered to run in O(n). Here is the actual pset; the question is about the sorting part.
Any ideas on how to implement such an algorithm would be greatly appreciated. It's not necessary to provide actual code, just the logic of how you can create an O(n) algorithm under the conditions provided.
It gives a hint, saying that no number in the array is going to be negative, and none is greater than LIMIT (the numbers output by the generator are modulo'd so they are not greater than 65000). But how does that help in getting the algorithm to be O(n)?
That hint directly seems to point towards counting sort.
You create 65000 buckets and use them to count the number of occurrences of each number.
Then, you just revisit the buckets and you have the sorted result.
It takes advantage of the fact that:
They are integers.
They have a limited range.
Its complexity is O(n + k) for k buckets, which is O(n) here because k is a fixed constant, and as this is not a comparison-based sort, the O(n log n) lower bound on sorting does not apply. A very good visualization is here.
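A minimal sketch of that idea (Python for brevity; LIMIT comes from the problem statement). Note it sorts destructively by writing values back into the passed array, and the consecutive passes are O(n + LIMIT) overall, so the write-back loop does not break the O(n) bound:

LIMIT = 65000  # upper bound on values, per the problem statement

def counting_sort(values):
    counts = [0] * (LIMIT + 1)
    for v in values:
        counts[v] += 1            # O(n): tally occurrences of each value
    i = 0
    for v, c in enumerate(counts):
        for _ in range(c):        # total writes across all buckets: O(n)
            values[i] = v
            i += 1

a = [3, 1, 3, 2]
counting_sort(a)
print(a)  # [1, 2, 3, 3]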
As @DarkCthulhu said, counting sort is clearly what they were urging you to use. But you could also use a radix sort.
Here is a particularly concise radix-2 sort that exploits a nice connection to Gray codes. In your application it would require 16 passes over the input, one per data bit. For big inputs, the counting sort is likely to be faster; for small ones, the radix sort ought to be faster because you avoid initializing 256K bytes or more of counters.
See this article for explanation.
void sort(unsigned short *a, int len)
{
    // One pass per bit: a stable binary partition on each key's Gray code.
    // s and d swap roles every pass; safe_malloc is the author's malloc wrapper.
    unsigned short bit, *s = a, *d = safe_malloc(len * sizeof *d), *t;
    unsigned is, id0, id1;
    for (bit = 1; bit; bit <<= 1, t = s, s = d, d = t)
        for (is = id0 = 0, id1 = len; is < len; ++is)
            if (((s[is] >> 1) ^ s[is]) & bit)  // test a bit of the Gray code
                d[--id1] = s[is];              // set bits fill from the top, reversed
            else
                d[id0++] = s[is];              // clear bits fill from the bottom
    free(d);  // after 16 passes (an even number) the sorted data is back in a
}

Find duplicate entry in array of integers

As a homework question, the following task had been given:
You are given an array with integers between 1 and 1,000,000. One integer is in the array twice. How can you determine which one? Can you think of a way to do it using little extra memory?
My solutions so far:
Solution 1
Have a hash table.
Iterate through the array and store its elements in the hash table.
As soon as you find an element which is already in the hash table, it is the dup element.
Pros
It runs in O(n) time and with only 1 pass.
Cons
It uses O(n) extra memory.
Solution 2
Sort the array using merge sort (O(n log n) time).
Parse again and if you see an element twice, you have got the dup.
Pros
It doesn't use extra memory.
Cons
Running time is greater than O(n).
Can you guys think of any better solution?
The question is a little ambiguous: when the request is "which one," does it mean return the value that is duplicated, or the position in the sequence of the duplicated one? If the former, any of the following three solutions will work; if the latter, the first is the only one that will help.
Solution #1: assumes array is immutable
Build a bitmap; set the nth bit as you iterate through the array. If the bit is already set, you've found a duplicate. It runs in linear time and will work for any size array.
The bitmap would be created with as many bits as there are possible values in the array. As you iterate through the array, you check the nth bit in the bitmap. If it is set, you've found your duplicate; if it isn't, then set it. (Logic for doing this can be seen in the pseudo-code in this Wikipedia entry on Bit arrays, or use the System.Collections.BitArray class.)
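A minimal sketch of the bitmap approach (Python for brevity; the function name is mine), assuming the values lie in 1..1,000,000:

def find_duplicate_bitmap(arr, max_value=1_000_000):
    seen = bytearray(max_value // 8 + 1)   # one bit per possible value
    for v in arr:
        byte, bit = v >> 3, 1 << (v & 7)
        if seen[byte] & bit:
            return v                       # bit already set: v is the duplicate
        seen[byte] |= bit
    return None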
Solution #2: assumes array is mutable
Sort the array, and then do a linear search until the current value equals the previous value. Uses the least memory of all. Bonus points for altering the sort algorithm to detect the duplicate during a comparison operation and terminating early.
Solution #3: (assumes array length = 1,000,001)
Sum all of the integers in the array.
From that, subtract the sum of the integers 1 through 1,000,000 inclusive.
What's left will be your duplicated value.
This takes almost no extra memory and can be done in one pass if you calculate the sums at the same time.
The disadvantage is that you need to do the entire loop to find the answer.
The advantages are simplicity, and a high probability it will in fact run faster than the other solutions.
Assuming all the numbers from 1 to 1,000,000 are in the array, the sum of all numbers from 1 to 1,000,000 is (1,000,000)*(1,000,000 + 1)/2 = 500,000 * 1,000,001 = 500,000,500,000.
So just add up all the numbers in the array, subtract 500,000,500,000, and you'll be left with the number that occurred twice.
O(n) time, and O(1) memory.
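As a one-line sketch of that arithmetic (assuming exactly the stated input, the values 1..1,000,000 plus one duplicate; the function name is mine):

def find_duplicate_sum(arr, n=1_000_000):
    return sum(arr) - n * (n + 1) // 2   # n(n+1)/2 = 500,000,500,000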
If the assumption isn't true, you could try using a Bloom Filter - they can be stored much more compactly than a hash table (since they only store fact of presence), but they do run the risk of false positives. This risk can be bounded though, by our choice of how much memory to spend on the bloom filter.
We can then use the bloom filter to detect potential duplicates in O(n) time and check each candidate in O(n) time.
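A toy sketch of that plan (the Bloom filter here is deliberately simplistic, double-hashing with Python's built-in hash; real code would choose proper hash functions and size the filter for a target false-positive rate):

class Bloom:
    def __init__(self, m_bits, k_hashes):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits // 8 + 1)

    def _positions(self, item):
        h1, h2 = hash(item), hash((item, 1))
        return [(h1 + j * h2) % self.m for j in range(self.k)]

    def add(self, item):
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

def find_duplicate_bloom(arr):
    bloom = Bloom(m_bits=8 * len(arr), k_hashes=3)
    candidates = []
    for v in arr:
        if bloom.might_contain(v):
            candidates.append(v)    # a real duplicate or a false positive
        bloom.add(v)
    for c in candidates:            # verify each candidate with an O(n) scan
        if arr.count(c) > 1:
            return c
    return None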
This Python code is a modification of QuickSort:
def findDuplicate(arr):
    orig_len = len(arr)
    if orig_len <= 1:
        return None
    pivot = arr.pop(0)
    greater = [i for i in arr if i > pivot]
    lesser = [i for i in arr if i < pivot]
    if len(greater) + len(lesser) != orig_len - 1:
        return pivot
    else:
        return findDuplicate(lesser) or findDuplicate(greater)
It finds a duplicate in O(n log n), I think. It uses extra memory on the stack, but it can be rewritten to use only one copy of the original data, I believe:
def findDuplicate(arr):
    orig_len = len(arr)
    if orig_len <= 1:
        return None
    pivot = arr.pop(0)
    greater = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] > pivot]
    lesser = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] < pivot]
    if len(arr):
        return pivot
    else:
        return findDuplicate(lesser) or findDuplicate(greater)
The list comprehensions that produce greater and lesser destroy the original with calls to pop(). If arr is not empty after removing greater and lesser from it, then there must be a duplicate and it must be pivot.
The code suffers from the usual stack overflow problems on sorted data, so either a random pivot or an iterative solution which queues the data is necessary:
def findDuplicate(full):
    import copy
    q = [full]
    while len(q):
        arr = copy.copy(q.pop(0))
        orig_len = len(arr)
        if orig_len > 1:
            pivot = arr.pop(0)
            greater = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] > pivot]
            lesser = [arr.pop(i) for i in reversed(range(len(arr))) if arr[i] < pivot]
            if len(arr):
                return pivot
            else:
                q.append(greater)
                q.append(lesser)
    return None
However, now the code needs to take a deep copy of the data at the top of the loop, changing the memory requirements.
So much for computer science. The naive algorithm clobbers my code in Python, probably because of Python's sorting algorithm:
def findDuplicate(arr):
    arr = sorted(arr)
    prev = arr.pop(0)
    for element in arr:
        if element == prev:
            return prev
        else:
            prev = element
    return None
Rather than sorting the array and then checking, I would suggest writing an implementation of a comparison sort function that exits as soon as the dup is found. This needs no extra memory (depending on the algorithm you choose, obviously) and has a worst case of O(n log n) (again, depending on the algorithm), whereas sort-then-scan pays the full O(n log n) even in the best and average cases.
E.g. An implementation of in-place merge sort.
http://en.wikipedia.org/wiki/Merge_sort
Hint: Use the property that A XOR A == 0, and 0 XOR A == A.
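A sketch of how that hint plays out (assuming the array holds every value from 1 to n plus one duplicate; the function name is mine): XOR the array together with the full range, and every matched pair cancels, leaving only the duplicate:

def find_duplicate_xor(arr, n=1_000_000):
    acc = 0
    for v in arr:               # XOR in every array element
        acc ^= v
    for v in range(1, n + 1):   # XOR in each value 1..n once more
        acc ^= v
    return acc                  # matched pairs cancel; the duplicate survives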
As a variant of your solution (2), you can use radix sort. No extra memory, and it runs in linear time. You can argue that time is also affected by the size of the number representation, but you have already given bounds for that: radix sort runs in time O(k*n), where k is the number of digits, one sorted at each pass. That makes the whole algorithm O(7n) for sorting plus O(n) for checking the duplicated number, which is O(8n) = O(n).
Pros:
No extra memory.
O(n).
Cons:
Needs eight O(n) passes.
And how about the problem of finding ALL duplicates? Can this be done in less than O(n ln n) time? (Sort & scan.) (If you want to restore the original array, carry along the original index and reorder after the end, which can be done in O(n) time.)
from functools import reduce

def singleton(array):
    return reduce(lambda x, y: x ^ y, array)
Sort the integers by placing each one at the position where it should be (value v at index v). If you get a "collision", you have found the duplicated number.
Space complexity: O(1) (just the same space, which can be overwritten).
Time complexity: less than O(n) on average, because statistically you will find the collision before reaching the end.
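A hedged sketch of that idea (assuming values 1..n with exactly one repeat, so every value has a home index; the names are mine): swap each element toward its home, and the first time a home slot already holds the same value, that value is the duplicate:

def find_duplicate_inplace(arr):
    for i in range(len(arr)):
        while arr[i] != i + 1:
            j = arr[i] - 1               # home index for the value arr[i]
            if arr[j] == arr[i]:
                return arr[i]            # home already holds it: duplicate found
            arr[i], arr[j] = arr[j], arr[i]
    return None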
