How to find the biggest subset of an array given some constraints?

There is an array A[1..N]. How can I find the largest subset of the array such that the product of any two distinct elements of the subset is not a perfect cube? The upper bound for N is 100000.
Example:
For A = 1 2 4 8, the answer is {1, 2}, {1, 4}, {8, 2} or {8, 4}.
1 and 8 cannot both be in the solution, because 1*8 = 8 is a perfect cube.
Similarly for 2 and 4 (2*4 = 8).
My approaches:
1. Check all subsets of the given array and return the largest one that satisfies the constraint. This takes O(N*N*2^N).
2. Create a graph from the given array, where two nodes are connected if their product is a perfect cube. The main task is then to remove the minimum number of nodes such that no edges are left in the graph (removing a node also removes all of its edges). The main issue here is space (the representation of the graph): in the worst case the graph has O(N*N) edges.
Please help.

Explanation
Consider the factorization of each number as follows:
A[i] = x^3.y^2.z
i.e., we first find the largest cube that divides A[i] (and call its cube root x), then the largest square that divides what remains (call its square root y), and call whatever is left over z. Each prime then divides at most one of y and z, and only to the first power.
The product of A[i] with another A[j] = X^3.Y^2.Z is a cube if and only if Y = z and Z = y.
Therefore, if you group the numbers by their value of y^2.z, the groups form pairs (the group with value y^2.z is linked to the group with value z^2.y), and for each pair you cannot take elements from both linked groups.
Clearly the best case is to take all the elements from whichever group is the largest in each pair.
There is one special case, where y^2.z is equal to 1. In this case, every number in the group is already a perfect cube, and the product of any two of them is also a cube. Therefore you can add just 1 number from the set of perfect cubes.
Example
Suppose our array was (expressed as a prime factorization):
A[0] = 2^3
A[1] = 3^3
A[2] = 2^2.3.5^3
A[3] = 2^2.3.7^3
A[4] = 2.3^2.13^3
We first assign these into groups:
Value 1 = Group A (2^3, 3^3)
Value 2^2.3 = Group B (2^2.3.5^3, 2^2.3.7^3)
Value 2.3^2 = Group C (2.3^2.13^3)
Group A is paired with itself, while group B is paired with group C.
Therefore we can take one element from group A, and the whole of group B, for a total of 3 elements in the final subset.
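A minimal Python sketch of the grouping idea, assuming the values are positive integers small enough to factor by plain trial division. The names `signature` and `max_subset_size` are made up for illustration, and the function only returns the size of the best subset, not the subset itself.

```python
from collections import Counter

def signature(n):
    """Return (s, t) for a positive integer n: s = y^2*z is n with every cube
    factor removed, t = z^2*y is what a partner's s must be for the product to
    be a perfect cube. Assumption: trial division is fast enough."""
    s = t = 1
    p = 2
    while p * p <= n:
        e = 0
        while n % p == 0:
            n //= p
            e += 1
        if e % 3 == 1:      # p goes into z
            s *= p
            t *= p * p
        elif e % 3 == 2:    # p goes into y
            s *= p * p
            t *= p
        p += 1
    if n > 1:               # one leftover prime with exponent 1
        s *= n
        t *= n * n
    return s, t

def max_subset_size(values):
    groups = Counter()      # s -> how many values have that signature
    partner = {}            # s -> signature of the linked group
    for v in values:
        s, t = signature(v)
        groups[s] += 1
        partner[s] = t
    total = 0
    done = set()
    for s, count in groups.items():
        if s in done:
            continue
        t = partner[s]
        done.update((s, t))
        if s == 1:
            total += 1      # perfect cubes: any two multiply to a cube
        else:
            total += max(count, groups.get(t, 0))  # keep the larger linked group
    return total
```

For the example above, `max_subset_size([8, 27, 1500, 4116, 39546])` gives 3, and `max_subset_size([1, 2, 4, 8])` gives 2.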

You can formulate it as a maximum clique problem.
Create a graph with each number as a vertex and connect two vertices if their product is not a cube.
Now find the largest clique in the graph. See https://en.wikipedia.org/wiki/Clique_problem#Finding_maximum_cliques_in_arbitrary_graphs
Note that maximum clique is NP-hard in general, so this formulation is only practical for small inputs.
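To make the clique formulation concrete, here is a brute-force Python sketch for tiny inputs only (it tries every subset, largest first, so it is exponential). The cube test via a rounded floating-point cube root is an assumption that is fine for small values; the function names are hypothetical.

```python
from itertools import combinations

def is_cube(n):
    """True if the positive integer n is a perfect cube (small n only)."""
    r = round(n ** (1 / 3))
    return any((r + d) ** 3 == n for d in (-1, 0, 1))

def largest_clique_subset(values):
    n = len(values)
    # edge between u and v iff their product is NOT a cube
    ok = [[not is_cube(values[u] * values[v]) for v in range(n)] for u in range(n)]
    for size in range(n, 0, -1):
        for subset in combinations(range(n), size):
            if all(ok[u][v] for u, v in combinations(subset, 2)):
                return [values[i] for i in subset]
    return []
```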

Related

Given a sorted array of integers find subarrays such that the largest elements of the subarrays are within some distance of the smallest

For example, given the array a = [1, 2, 3, 7, 8, 9] and the integer i = 2, find the maximal subarrays where the distance between the largest and the smallest elements is at most i. The output for this example would be:
[1,2,3] [7,8,9]
The subarrays are maximal in the following sense: given two subarrays A and B, there exists no element b in B such that A + b satisfies the given condition. Does there exist a non-polynomial time algorithm for said problem?
This problem can be solved in linear time using the two-pointer method and two deques storing indices: the first deque keeps the minimum and the other keeps the maximum of the sliding window.
Deque for minimum (the deque for maximum is symmetric):
```python
from collections import deque

minq = deque()  # indices into a; a[minq[0]] is the current minimum

def add(a, i):
    """Add the i-th element of the array (at the right index)."""
    while minq and a[minq[-1]] > a[i]:
        # the last element has no chance to become the minimum,
        # because the newer one is better
        minq.pop()
    minq.append(i)

def extract(j):
    """Extract the j-th element (at the left index)."""
    if minq and minq[0] == j:
        minq.popleft()
```
So the min-deque always contains a non-decreasing sequence of values.
Now set the left and right indices to 0, insert index 0 into the deques, and start moving right. At every step, add the new index to the deques and check that the interval left..right is still good. When the range becomes too wide (max - min exceeds the limit), stop moving the right index, check the length of the last good interval, and compare it with the best length so far.
Then move the left index, removing elements from the deques. When max - min is good again, stop moving left and resume moving right. Repeat until the end of the array.
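A Python sketch of the whole two-pointer procedure. Instead of only tracking the best length, it records every maximal window: each time the new right element makes the window too wide, the previous window could not be extended further, so it is saved. It assumes `limit >= 0`; the function name and the (start, end) inclusive output format are choices made for illustration.

```python
from collections import deque

def maximal_windows(a, limit):
    """Return (start, end) inclusive index pairs of windows with
    max - min <= limit that cannot be extended to the left or right."""
    minq, maxq = deque(), deque()  # indices; values non-decreasing / non-increasing
    left = 0
    windows = []
    for right, value in enumerate(a):
        while minq and a[minq[-1]] > value:
            minq.pop()
        minq.append(right)
        while maxq and a[maxq[-1]] < value:
            maxq.pop()
        maxq.append(right)
        if a[maxq[0]] - a[minq[0]] > limit:
            # a[right] broke the window, so left..right-1 was maximal
            windows.append((left, right - 1))
            while a[maxq[0]] - a[minq[0]] > limit:
                if minq[0] == left:
                    minq.popleft()
                if maxq[0] == left:
                    maxq.popleft()
                left += 1
    if a:
        windows.append((left, len(a) - 1))
    return windows
```

With the example from the question, `maximal_windows([1, 2, 3, 7, 8, 9], 2)` returns `[(0, 2), (3, 5)]`, i.e. [1,2,3] and [7,8,9]. Every index enters and leaves each deque at most once, which is where the linear bound comes from.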

Find a way to separate an array so that each subarray's sum is less than or equal to a number

I have a mathematical/algorithmic problem here.
Given an array of numbers, find a way to separate it into 5 subarrays so that the sum of each subarray is less than or equal to a given number. Every number from the initial array must go into one of the subarrays and be part of its sum.
So the input to the algorithm would be:
d - representing the number that each subarrays sum has to be less or equal
A - representing the array of numbers that will be separated to different subarrays, and will be part of one sum
Algorithm complexity must be polynomial.
Thank you.
If by "subarray" you mean "subset" as opposed to "contiguous slice", it is impossible to find a polynomial-time algorithm for this problem (unless P = NP). The Partition Problem is to partition a list of numbers into two sets such that the sums of both sets are equal. It is known to be NP-complete. The Partition Problem can be reduced to your problem as follows:
Suppose that x_1, ..., x_n are positive numbers that you want to partition into 2 sets with equal sums. Let d be this common sum (the sum of the x_i divided by 2). Extend the x_i to an array A of size n+3 by adding three copies of d. The total of A is then 5d, so the only way to partition A into 5 subarrays with each sum at most d is if each sum equals exactly d. Since the numbers are positive, this in turn requires 3 of the subarrays to have length 1, each consisting of the number d. The remaining 2 subarrays would be exactly a partition of the original n numbers.
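The reduction can be checked mechanically on tiny inputs. This Python sketch (brute force, exponential, hypothetical function names) builds A and d from a Partition instance and confirms that both problems give the same yes/no answer. `Fraction` keeps d exact when the sum of the x_i is odd.

```python
from fractions import Fraction
from itertools import product

def reduction_instance(xs):
    """Map a Partition instance xs (positive numbers) to (A, d)."""
    d = Fraction(sum(xs), 2)       # the common sum each half would need
    return list(xs) + [d, d, d], d  # append three copies of d

def can_split(a, d, parts=5):
    """Brute force: can a be split into `parts` groups, each with sum <= d?"""
    for labels in product(range(parts), repeat=len(a)):
        sums = [0] * parts
        for value, label in zip(a, labels):
            sums[label] += value
        if all(s <= d for s in sums):
            return True
    return False

def has_partition(xs):
    """Brute force Partition: can xs be split into two equal-sum groups?"""
    return any(2 * sum(x for x, bit in zip(xs, bits) if bit) == sum(xs)
               for bits in product((0, 1), repeat=len(xs)))
```

For instance, `[1, 2, 3]` is a yes-instance of both problems, while `[1, 2, 5]` is a no-instance of both.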
On the other hand, if there are additional constraints on the numbers and/or on the subarrays, there might be a polynomial-time solution. But if so, you should clearly spell out what those constraints are.
Set up of the problem:
d : the upper bound for the subarray
A : the initial array
Assuming A is not sorted.
(Heuristic)
Algorithm:
1. Sort A in ascending order using a standard sorting algorithm -> O(n log n)
2. Check whether the largest element of A is greater than d -> O(1)
if yes, there is no solution
if no, continue
3. Sum up all the elements of A and call the total S. Check whether S/5 > d -> O(n)
if yes, there is no solution
if no, continue
4. Using a greedy approach, create a new subarray A_si: add each next biggest element a_j of the sorted A whose addition keeps the sum of A_si at most d, and remove a_j from the sorted A -> O(n)
Repeat step 4 until either of these conditions is satisfied:
I. When creating subarray A_si, only 5-i elements are left.
In this case, put each remaining element into its own subarray; done.
II. i = 5, i.e. 5 subarrays have been created. If elements are still left, the heuristic fails.
The algorithm described above is bounded by O(n log n) and therefore runs in polynomial time. Being a heuristic, it can fail even when a valid split exists.
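A Python sketch of this heuristic, under my own reading of condition I (stop as soon as the remaining elements are no more than the subarrays still to be created) and assuming non-negative numbers. It returns at most five lists, or `None` when the heuristic gives up; the function name is hypothetical.

```python
def greedy_split(a, d, parts=5):
    """Greedy heuristic: return up to `parts` lists whose sums are each <= d,
    or None if it gives up (it may give up even when a split exists)."""
    remaining = sorted(a, reverse=True)        # step 1: biggest first
    if remaining and remaining[0] > d:         # step 2
        return None
    if sum(remaining) > parts * d:             # step 3: S/5 > d
        return None
    result = []
    while remaining and len(result) < parts:
        if len(remaining) <= parts - len(result):
            # condition I: give every remaining element its own subarray
            result.extend([x] for x in remaining)
            return result
        current, total, rest = [], 0, []
        for x in remaining:                    # step 4: fill one subarray
            if total + x <= d:
                current.append(x)
                total += x
            else:
                rest.append(x)
        result.append(current)
        remaining = rest
    return None if remaining else result       # condition II
```

For example, `greedy_split([4, 3, 3, 2, 2, 1], 5)` returns `[[4, 1], [3], [3], [2], [2]]`.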

Maximize sum of weights with constraints given on left and right indices in array

I recently came across an interesting coding problem, which is as follows:
There are n boxes, let us assume this is an array of n boxes.
For each index i of this array, three values are given -
1.) Weight(i)
2.) Left(i)
3.) Right(i)
Left(i) means that if box i is chosen, we are not allowed to choose any of the left[i] boxes immediately to its left.
Similarly, Right(i) means that if box i is chosen, we are not allowed to choose any of the right[i] boxes immediately to its right.
Example :
Weight[2] = 5
Left[2] = 1
Right[2] = 3
Then, if I pick element at position 2, I get weight of 5 units. But, I cannot pick elements at position {1} (due to left constraint). And cannot pick elements at position {3,4,5} (due to right constraint).
Objective - We need to calculate the maximum sum of the weights we can pick.
Sample Test Case :-
**Input:**
5
2 0 3
4 0 0
3 2 0
7 2 1
9 2 0
**Output:**
13
Note - First column is weights, Second column is left constraints, Third column is right constraints
I used a Dynamic Programming approach (similar to Longest Increasing Subsequence) to reach an O(n^2) solution, but I am not able to think of an O(n log n) solution (n can be up to 10^5).
I also tried a priority queue in which elements with a lower value of (right[i] + i) get higher priority (with ties broken in favor of the lower i), but it also gives a timeout error.
Any other approach for this? or any optimization in priority queue method? I can post both of my codes if needed.
Thanks.
One approach is to use a binary indexed tree to create a data structure that makes it easy to do two operations in O(logn) time each:
Insert number into an array
Find maximum in a given range
We will use this data structure to hold the maximum weight that can be achieved by selecting box i along with an optimal selection of boxes to the left.
The key is that we will only insert values into this data structure when we reach a point where the right constraint has been met.
To find the best value for box i, we need the maximum value in the data structure over all positions up to i-left[i]-1 (box i-left[i] itself is blocked by the left constraint), which can be found in O(log n).
The final algorithm is to loop over i=0..n-1 and for each i:
Compute the result for box i by finding the maximum in the range 0..(i-left[i]-1)
Schedule the result to be added when we reach location i+right[i]
Add any previously scheduled results into our data structure
The final result is the maximum over all computed results (a result scheduled at a location past n-1 never enters the data structure).
Overall, the complexity is O(n log n), because each value of i results in one lookup and one update operation.
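A Python sketch of this approach. Because every query range starts at 0, a Fenwick tree over prefix maxima is enough; it works here because each position is written only once, so stored values only ever increase. It assumes non-negative weights (an empty selection scores 0), and `max_weight` is a hypothetical name.

```python
def max_weight(weights, left, right):
    n = len(weights)
    tree = [0] * (n + 1)  # 1-based Fenwick tree of prefix maxima

    def update(pos, value):  # pos is 0-based
        pos += 1
        while pos <= n:
            tree[pos] = max(tree[pos], value)
            pos += pos & -pos

    def query(pos):  # max over positions 0..pos; 0 if pos < 0
        pos += 1
        best = 0
        while pos > 0:
            best = max(best, tree[pos])
            pos -= pos & -pos
        return best

    scheduled = [[] for _ in range(n)]  # scheduled[k]: results to insert at step k
    answer = 0
    for i in range(n):
        best_i = weights[i] + query(i - left[i] - 1)
        answer = max(answer, best_i)
        release = i + right[i]  # boxes i+1..i+right[i] are blocked
        if release < n:
            scheduled[release].append((i, best_i))
        for j, value in scheduled[i]:
            update(j, value)
    return answer
```

With the sample input, `max_weight([2, 4, 3, 7, 9], [0, 0, 2, 2, 2], [3, 0, 0, 1, 0])` returns 13.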

Optimize queries performed on a subarray

I recently interviewed at Google. Because of this question my process didn't move forward after 2 rounds.
Suppose you are given an array of numbers. You can be given queries to:
Find the sum of the values between indexes i and j.
Update value at index i to a new given value.
Find the maximum of the values between indexes i and j.
Check whether the subarray between indexes i and j, both inclusive, is in ascending or descending order.
My solution simply scanned the subarray between indexes i and j, and he asked me to optimize it. I thought of using a hashtable so that, if the starting index is the same and the ending index is larger than a previously seen one, we reuse the stored maximum and ascending/descending status and check only the remaining part of the subarray. But that did not optimize it as much as required.
I'd love to know how I can optimize the solution so as to make it acceptable.
Constraints:
Everything from [1,10^5]
Thanks :)
All these queries can be answered in O(log N) time per query in the worst case (with O(N) preprocessing). You can build a segment tree and maintain, for each node, the sum, the maximum, and two boolean flags (indicating whether the range that corresponds to this node is sorted in ascending or descending order). All these values can be recomputed efficiently for an update query, because only O(log N) nodes can change (those on the path from the root to the leaf that corresponds to the changing element). Every other range query (sum, max, sorted or not) is decomposed into O(log N) nodes (due to the properties of a segment tree), and it is easy to combine the values of two nodes in O(1) (for example, for the sum, the result of combining 2 nodes is just the sum of their values).
Here is some code (Python). It shows what data should be stored in a node and how to combine the values of 2 nodes:
```python
from dataclasses import dataclass

@dataclass
class Node:
    is_ascending: bool
    is_descending: bool
    sum: int
    max: int
    leftPos: int   # first array index covered by this node
    rightPos: int  # last array index covered by this node

def merge(left, right, array):
    """Combine two adjacent nodes; left covers the range just before right."""
    return Node(
        is_ascending=(left.is_ascending and right.is_ascending
                      and array[left.rightPos] < array[right.leftPos]),
        is_descending=(left.is_descending and right.is_descending
                       and array[left.rightPos] > array[right.leftPos]),
        sum=left.sum + right.sum,
        max=max(left.max, right.max),
        leftPos=left.leftPos,
        rightPos=right.rightPos,
    )
```
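A self-contained Python sketch of the full segment tree built around this merge idea. Here each node summary is a plain tuple `(is_ascending, is_descending, total, maximum, first_index, last_index)`, "ascending" means strictly ascending (as in the merge above), and the array is assumed non-empty (the constraints say N >= 1). The class and method names are illustrative.

```python
def combine(a, left, right):
    """Merge two adjacent summaries (left ends just before right starts)."""
    asc_l, desc_l, sum_l, max_l, first_l, last_l = left
    asc_r, desc_r, sum_r, max_r, first_r, last_r = right
    return (asc_l and asc_r and a[last_l] < a[first_r],
            desc_l and desc_r and a[last_l] > a[first_r],
            sum_l + sum_r, max(max_l, max_r), first_l, last_r)

class SegmentTree:
    def __init__(self, values):
        self.a = list(values)
        self.n = len(self.a)
        self.tree = [None] * (4 * self.n)
        self._build(1, 0, self.n - 1)

    def _leaf(self, i):
        # a single element is both ascending and descending
        return (True, True, self.a[i], self.a[i], i, i)

    def _pull(self, node):
        self.tree[node] = combine(self.a, self.tree[2 * node], self.tree[2 * node + 1])

    def _build(self, node, lo, hi):
        if lo == hi:
            self.tree[node] = self._leaf(lo)
            return
        mid = (lo + hi) // 2
        self._build(2 * node, lo, mid)
        self._build(2 * node + 1, mid + 1, hi)
        self._pull(node)

    def update(self, i, value):
        """Set a[i] = value; only nodes on the root-to-leaf path change."""
        self.a[i] = value
        self._update(1, 0, self.n - 1, i)

    def _update(self, node, lo, hi, i):
        if lo == hi:
            self.tree[node] = self._leaf(i)
            return
        mid = (lo + hi) // 2
        if i <= mid:
            self._update(2 * node, lo, mid, i)
        else:
            self._update(2 * node + 1, mid + 1, hi, i)
        self._pull(node)

    def query(self, i, j):
        """Summary tuple for a[i..j], both inclusive (0 <= i <= j < n)."""
        return self._query(1, 0, self.n - 1, i, j)

    def _query(self, node, lo, hi, i, j):
        if i <= lo and hi <= j:
            return self.tree[node]
        mid = (lo + hi) // 2
        if j <= mid:
            return self._query(2 * node, lo, mid, i, j)
        if i > mid:
            return self._query(2 * node + 1, mid + 1, hi, i, j)
        return combine(self.a,
                       self._query(2 * node, lo, mid, i, j),
                       self._query(2 * node + 1, mid + 1, hi, i, j))
```

For `SegmentTree([3, 8, 10, -5, 2, 12, 7])`, `query(2, 5)[2]` is the range sum 19 and `query(0, 2)[0]` is `True` (3 < 8 < 10).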
As andy pointed out in the comments: The queries are quite different in nature, so the "best" solution will probably depend on which query type is executed most frequently.
However, the task
Find the sum of the values between indexes i and j.
can efficiently be solved by performing a scan/prefix sum computation of the array. Imagine an array of int values
```
index:   0   1   2   3   4   5   6
array: [ 3,  8, 10, -5,  2, 12,  7 ]
```
Then you compute the Prefix Sum:
```
index:   0   1   2   3   4   5   6   7
prefix: [0,  3, 11, 21, 16, 18, 30, 37 ]
```
Note that this can be computed particularly efficiently in parallel. In fact, this is an important building block of many parallel algorithms, as described in the thesis "Vector Models for Data-Parallel Computing" by Guy E. Blelloch.
Additionally, it can be computed in two ways: Either starting with the value from array[0], or starting with 0. This will, of course, affect how the resulting prefix array has to be accessed. Here, I started with 0, and made the resulting array one element longer than the input array. This may also be implemented differently, but in this case, it makes it easier to obey the array limits (although one would still have to clarify in which cases indices should be considered as inclusive or exclusive).
However, given this prefix sum array, one can compute the sum of elements between indices i and j in constant time, by simply subtracting the corresponding values of the prefix sum array:
sum(n=i..j)(array[n]) = (prefix[j+1] - prefix[i])
For example:
sum(n=2..5)(array[n]) = 10 + (-5) + 2 + 12 = 19
prefix[5+1] - prefix[2] = 30 - 11 = 19
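The same computation in Python, as a small sketch: `itertools.accumulate` produces the running sums, and a leading 0 gives the "one element longer" layout described above.

```python
from itertools import accumulate

array = [3, 8, 10, -5, 2, 12, 7]
prefix = [0] + list(accumulate(array))  # prefix[k] = array[0] + ... + array[k-1]

def range_sum(i, j):
    """Sum of array[i..j], both inclusive, in O(1)."""
    return prefix[j + 1] - prefix[i]

print(prefix)           # [0, 3, 11, 21, 16, 18, 30, 37]
print(range_sum(2, 5))  # 19
```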
For task 2,
Update value at index i to a new given value.
this would mean that the prefix sums would have to be updated. This could be done brute-force, in linear time, by just adding the difference of the old value and the new value to all prefix sums that appear after the modified element (but for this, also see the notes in the last section of this answer)
The tasks 3 and 4
Find the maximum of the values between indexes i and j.
Check whether the subarray between indexes i and j, both inclusive, is in ascending or descending order.
I could imagine that the maximum value could simply be tracked while building the prefix sums, as well as checking whether the values are only ascending or descending. However, when values are updated, this information would have to be re-computed.
In any case, there are some data structures that deal with prefix sums in particular. I think that a Fenwick tree might make it possible to implement some of the O(n) operations mentioned above in O(log n), but I have not yet looked at this in detail.
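For the sum and update queries, a Fenwick tree does give O(log n) per operation. A minimal Python sketch (class and method names are illustrative); it does not cover the max or sortedness queries.

```python
class FenwickTree:
    def __init__(self, values):
        self.n = len(values)
        self.values = list(values)
        self.tree = [0] * (self.n + 1)  # 1-based internal array
        for i, v in enumerate(values):
            self._add(i, v)

    def _add(self, i, delta):
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, k):
        """Sum of values[0..k-1], like prefix[k] above."""
        total = 0
        while k > 0:
            total += self.tree[k]
            k -= k & -k
        return total

    def update(self, i, value):
        """Set values[i] = value in O(log n)."""
        self._add(i, value - self.values[i])
        self.values[i] = value

    def range_sum(self, i, j):
        """Sum of values[i..j], both inclusive."""
        return self.prefix_sum(j + 1) - self.prefix_sum(i)
```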

Find the element occurring b times in an array of size n*k+b

Description
Given an array of size n*k+b in which n elements occur k times and one element occurs b times (in other words, there are n+1 distinct elements), and given that 0 < b < k, find the element occurring b times.
My Attempted solutions
The obvious solution is hashing, but it will not work if the numbers are very large. Its complexity is O(n).
Using a map to store the frequency of each element and then traversing the map to find the element occurring b times. As maps are implemented as height-balanced trees, the complexity is O(n log n).
Both of my solutions were accepted, but the interviewer wanted a linear solution without hashing. His hint was to make the height of the tree in which you store the frequencies constant, but I have not been able to figure out the correct solution yet.
I want to know how to solve this problem in linear time without hashing?
EDIT:
Sample:
Input: n=2 b=2 k=3
Array: 2 2 2 3 3 3 1 1
Output: 1
I assume:
The elements of the array are comparable.
We know the values of n and k beforehand.
A solution O(n*k+b) is good enough.
Let the number occurring only b times be S. We are trying to find S in an array of size n*k+b.
Recursive step: find the median element M of the current array slice in linear time (for example with the median-of-medians selection algorithm) and partition around it as in Quicksort.
After this step, all elements smaller than M are to the left of the first occurrence of M, all copies of M are next to each other, and all elements larger than M are to the right of all occurrences of M.
Look at the index of the leftmost M. If it is a multiple of k, every element to its left occurs exactly k times, so S >= M; otherwise S < M. Recurse on the left slice or the right slice accordingly.
So you are doing a Quicksort but recursing into only one side of each partition. You recurse O(log N) times, but on slices of 1/2, 1/4, 1/8, ... of the original size, so the total time is still O(N), where N = n*k+b.
Clarification: say n = 20 and k = 10. Then there are 21 distinct elements in the array; 20 of them occur 10 times and the last one occurs, say, 7 times. I find the median element; say it is 1111 and 11 distinct values are smaller than it. If S < 1111, the index of the leftmost occurrence of 1111 will be less than 11*10. If S >= 1111, the index will be exactly 11*10.
Full example: n = 4, k = 3, Array = {1,2,3,4,5,1,2,3,4,5,1,2,3,5}
After the first recursive step, I find that the median element is 3 and the array looks something like {1,2,1,2,1,2,3,3,3,5,4,5,5,4}. There are 6 elements to the left of 3, and 6 is a multiple of k = 3, so each element there must occur exactly 3 times. Hence S >= 3, and we recurse on the right side, and so on.
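A Python sketch of this idea. Note one deliberate swap: it uses a random pivot instead of a true linear-time median, so it is linear only in expectation, and it uses a three-way split that checks the pivot's own count directly. It assumes the elements are comparable and that k is known; the function name is hypothetical.

```python
import random

def find_b_element(arr, k):
    """Return the element occurring b times (0 < b < k) when every other
    distinct element occurs exactly k times."""
    items = list(arr)
    while True:
        pivot = random.choice(items)
        smaller = [x for x in items if x < pivot]
        pivot_count = sum(1 for x in items if x == pivot)
        if len(smaller) % k != 0:
            items = smaller        # count left of the pivot is off: S < pivot
        elif pivot_count != k:
            return pivot           # the pivot itself occurs b times
        else:
            items = [x for x in items if x > pivot]  # S > pivot
```

On the sample from the question, `find_b_element([2, 2, 2, 3, 3, 3, 1, 1], 3)` returns 1.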
An idea using cyclic groups.
To find the i-th bit of the answer, follow this procedure:
Count how many numbers in the array have the i-th bit set, and store this as cnt.
If cnt % k is non-zero, then the i-th bit of the answer is set; otherwise it is clear. (Each element occurring k times contributes a multiple of k to cnt, while S contributes b, and 0 < b < k.)
To find the whole number, repeat the above for every bit.
This solution is technically O((n*k+b) * log max N), where max N is the maximal value in the array, but because the number of bits is usually constant, it is linear in the array size.
No hashing is needed; memory usage is O(log k * log max N).
Example implementation:
```python
from functools import reduce
from random import sample, shuffle

BITS = 10  # enough bits for values in 0..1000

def generate_test_data(n, k, b):
    values = sample(range(1001), n + 1)  # n + 1 distinct values
    k_rep, b_rep = values[:n], values[n:]
    numbers = k_rep * k + b_rep * b
    shuffle(numbers)
    print("k_rep: ", k_rep)
    print("b_rep: ", b_rep)
    return numbers

def solve(data, k):
    cnts = [0] * BITS  # cnts[i] = how many numbers have bit i set
    for number in data:
        for i in range(BITS):
            cnts[i] += number >> i & 1
    # bit i of the answer is set iff cnts[i] is not a multiple of k
    return reduce(lambda acc, c: 2 * acc + (c % k > 0), reversed(cnts), 0)

print("Answer: ", solve(generate_test_data(10, 15, 13), 15))
```
In order to have a constant-height B-tree containing n distinct elements, with height h, you need z=n^(1/h) children per node: h=log_z(n), thus h=log(n)/log(z), thus log(z)=log(n)/h, thus z=e^(log(n)/h), thus z=n^(1/h).
For example, with n=1000000 and h=10, z=3.98, i.e. about 4.
The time to reach a node in that case is O(h.log(z)). If h and z are treated as "constant" (since N=n.k, log(z)=log(n^(1/h))=log((N/k)^(1/h)), which stays constant by properly choosing h based on k), you can then say that O(h.log(z))=O(1)... This is a bit far-fetched, but maybe that was the kind of thing the interviewer wanted to hear?
UPDATE: this one uses hashing, so it's not a good answer :(
In Python this would be linear time (the set removes the duplicates). It works because k*sum(set(arr)) - sum(arr) = (k - b)*S:
```python
result = (sum(set(arr)) * k - sum(arr)) // (k - b)
```
If 'k' is even and 'b' is odd, then XOR will do. :)
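The XOR trick as a short Python sketch: every element occurring an even number of times cancels itself out, so only S (occurring an odd number of times) survives. It only applies under that parity assumption, and it assumes integer elements.

```python
from functools import reduce
from operator import xor

def find_by_xor(arr):
    """Assumes k is even and b is odd: XOR of everything leaves only S."""
    return reduce(xor, arr, 0)
```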
