Is there a more elegant way of doing this? - arrays

Given an array of positive integers a, I want to output an array of integers b so that b[i] is the closest number to a[i] that is smaller than a[i] and is in {a[0], ..., a[i-1]}. If no such number exists, then b[i] = -1.
Example:
a = 2 1 7 5 7 9
b = -1 -1 2 2 5 7
b[0] = -1 since there is no number that is smaller than 2
b[1] = -1 since there is no number that is smaller than 1 from {2}
b[2] = 2, closest number to 7 that is smaller than 7 from {2,1} is 2
b[3] = 2, closest number to 5 that is smaller than 5 from {2,1,7} is 2
b[4] = 5, closest number to 7 that is smaller than 7 from {2,1,7,5} is 5
I was thinking about implementing a balanced binary tree, but that would require a lot of work. Is there an easier way of doing this?

Here is one approach:
for i ← 1 to length(A)-1 {
// A[i] is inserted into the sorted sequence A[0 .. i-1]; save A[i] to make a hole at index j
item = A[i]
j = i
// keep moving the hole to next smaller index until A[j - 1] is <= item
while j > 0 and A[j - 1] > item {
A[j] = A[j - 1] // move hole to next smaller index
j = j - 1
}
A[j] = item // put item in the hole
// if there are elements to the left of A[j] in the sorted sequence A[0 .. i-1], store the nearest one in b
// TODO : run a loop so that duplicate entries won't hamper the result
if j > 0
b[i] = A[j-1]
else
b[i] = -1;
}
Dry run:
a = 2 1 7 5 7 9
a[1] = 2
it's straightforward: set b[1] to -1
a[2] = 1
insert into subarray: [1, 2]
any elements before 1 in the sorted array? No.
So set b[2] to -1. b: [-1, -1]
a[3] = 7
insert into subarray: [1, 2, 7]
any elements before 7 in the sorted array? Yes, it's 2.
So set b[3] to 2. b: [-1, -1, 2]
a[4] = 5
insert into subarray: [1, 2, 5, 7]
any elements before 5 in the sorted array? Yes, it's 2.
So set b[4] to 2. b: [-1, -1, 2, 2]
and so on..
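For reference, here is a minimal runnable Python transcription of the pseudocode above (a sketch; it also resolves the duplicate TODO by skipping entries equal to the current item, so the reported value is strictly smaller):

def predecessors(a):
    A = list(a)                            # working copy; A[0..i] becomes sorted
    b = [-1] * len(A)
    for i in range(1, len(A)):
        item = A[i]
        j = i
        while j > 0 and A[j - 1] > item:   # move the hole to the left
            A[j] = A[j - 1]
            j -= 1
        A[j] = item                        # put item in the hole
        k = j - 1
        while k >= 0 and A[k] == item:     # skip duplicates of item
            k -= 1
        if k >= 0:
            b[i] = A[k]
    return b

print(predecessors([2, 1, 7, 5, 7, 9]))    # [-1, -1, 2, 2, 5, 7]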

Here's a sketch of a (nearly) O(n log n) algorithm that's somewhere in between the difficulty of implementing an insertion sort and a balanced binary tree: do the problem backwards, use merge/quick sort, and use binary search.
Pseudocode:
let c be a copy of a
let b be an array sized the same as a
sort c using an O(n log n) algorithm
for i from a.length-1 to 1
binary search over c for the leftmost occurrence of key a[i] // O(log n) time
remove the item found // Could take O(n) time
if there exists an item to the left of that position, b[i] = that item
otherwise, b[i] = -1
b[0] = -1
return b
There are a few implementation details that can give this a poor runtime.
For instance, since you have to remove items, doing this on a regular array and shifting things around will make this algorithm still take O(n^2) time. So, you could store key-value pairs instead. One would be the key, and the other would be the number of those keys (kind of like a multiset implemented on an array). "Removing" one would just be subtracting the second item from the pair and so on.
Eventually you will be left with a bunch of 0-value keys. This would eventually make the "if there exists an item to the left" step take roughly O(n) time, and therefore the entire algorithm would degrade to O(n^2) for that reason. So another optimization might be to batch-remove all of them periodically. For instance, when 1/2 of them are 0-values, perform a pruning.
The ideal option might be to implement another data structure that has a much more favorable remove time. Something along the lines of a modified unrolled linked list with indices could work, but it would certainly increase the implementation complexity of this approach.
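To make this concrete, here is a rough Python sketch of the backwards approach with the first two optimizations (key-count pairs, and pruning once half the entries hit zero); the bisect module stands in for the hand-rolled binary search:

import bisect

def predecessors_backwards(a):
    pairs = []                              # sorted [key, count] pairs (a multiset)
    for x in sorted(a):
        if pairs and pairs[-1][0] == x:
            pairs[-1][1] += 1
        else:
            pairs.append([x, 1])
    keys = [p[0] for p in pairs]            # parallel key list for bisect
    zeros = 0
    b = [-1] * len(a)
    for i in range(len(a) - 1, 0, -1):      # do the problem backwards
        j = bisect.bisect_left(keys, a[i])  # leftmost occurrence of a[i]
        pairs[j][1] -= 1                    # "remove" one occurrence
        if pairs[j][1] == 0:
            zeros += 1
        k = j - 1                           # nearest non-empty key to the left
        while k >= 0 and pairs[k][1] == 0:
            k -= 1
        if k >= 0:
            b[i] = pairs[k][0]
        if 2 * zeros >= len(pairs):         # prune when half the entries are zero
            pairs = [p for p in pairs if p[1] > 0]
            keys = [p[0] for p in pairs]
            zeros = 0
    return b

print(predecessors_backwards([2, 1, 7, 5, 7, 9]))  # [-1, -1, 2, 2, 5, 7]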
I've actually implemented this. I used the first two optimizations above (storing key-value pairs for compression, and pruning when 1/2 of them are 0s). Here are some benchmarks comparing the insertion sort derivative to this method:
a.length    This method    Insertion sort method
100         0.0262 ms      0.0204 ms
1000        0.2300 ms      0.8793 ms
10000       2.7303 ms      75.7155 ms
100000      32.6601 ms     7740.36 ms
300000      98.9956 ms     69523.6 ms
1000000     333.501 ms     ????? Not patient enough
So, as you can see, this algorithm grows much, much slower than the insertion sort method I posted before. However, it took 73 lines of code vs 26 lines of code for the insertion sort method. So in terms of simplicity, the insertion sort method might still be the way to go if you don't have time requirements/the input is small.

You could treat it like an insertion sort.
Pseudocode:
let arr be one array with enough space for every item in a
let b be another array with, again, enough space for all elements in a
For each item in a:
perform insertion sort on item into arr
After performing the insertion, if there exists a number to the left, append that to b.
Otherwise, append -1 to b
return b
The main thing you have to worry about is making sure that you don't make the mistake of reallocating arrays (because it would reallocate n times, which would be extremely costly). This will be an implementation detail of whatever language you use (std::vector's reserve for C++ ... arr.reserve(n) for D ... ArrayList's ensureCapacity in Java...)
A potential downfall with this approach compared to using a binary tree is that it's O(n^2) time. However, the constant factors using this method vs binary tree would make this faster for smaller sizes. If your n is smaller than 1000, this would be an appropriate solution. However, O(n log n) grows much slower than O(n^2), so if you expect a's size to be significantly higher and if there's a time limit that you are likely to breach, you might consider a more complicated O(n log n) algorithm.
There are ways to slightly improve the performance (such as using a binary insertion sort: using binary search to find the position to insert into), but generally they won't improve performance enough to matter in most cases since it's still O(n^2) time to shift elements to fit.
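As an aside, Python's bisect module gives a very compact sketch of that binary-insertion variant (bisect.insort still shifts elements, so it remains O(n^2) overall):

import bisect

def predecessors_insertion(a):
    seen, b = [], []
    for x in a:
        j = bisect.bisect_left(seen, x)         # first position with value >= x
        b.append(seen[j - 1] if j > 0 else -1)  # left neighbour is strictly smaller
        bisect.insort(seen, x)                  # insert x, keeping seen sorted
    return b

print(predecessors_insertion([2, 1, 7, 5, 7, 9]))  # [-1, -1, 2, 2, 5, 7]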

Consider this:
a = 2 1 7 5 7 9
b = -1 -1 2 2 5 7
c 0 1 2 3 4 5 6 7 8 9
0 - - - - - - - - - -
Here the index of c is the value of a[i], so indices 0, 3, 4, 6 and 8 keep null values (those numbers never occur in a),
and each entry of c contains the highest-to-date closest smaller value for that a[i].
So by step a[3] we have the following
c 0 1 2 3 4 5 6 7 8 9
0 - -1 -1 - - 2 - 2 - -
and by step a[5] we have the following
c 0 1 2 3 4 5 6 7 8 9
0 - -1 -1 - - 2 - 5 - 7
This way, when we get to the second 7 at a[4], we know that 2 is the largest smaller value recorded to date, and all we need to do is loop back through a[i-1], a[i-2], ... until we encounter a 7 again, comparing each value to the one in c[7] and replacing c[7] if it is bigger. Once a[i-1] equals the earlier 7, we put c[7] into b[i] and move on to the next a[i].
The main downfalls to this approach that I can see are:
the footprint, depending on how big c[] needs to be dimensioned;
the fact that you have to revisit elements of a[] that you've already touched. If the distribution of data is such that there is a significant gap between the two 7s, then keeping track of the highest value as you go would presumably be faster. Alternatively, it might be better to gather statistics on the a[i] up front to know what distribution exists, and then use a hybrid method, maintaining the max until no more instances of that number remain in the statistics.

Related

Convert sorted array into low high array

Interview question:
Given a sorted array of this form :
1,2,3,4,5,6,7,8,9
( A better example would be 10,20,35,42,51,66,71,84,99 but let's use above one)
Convert it to the following low high form without using extra memory or a standard library
1,9,2,8,3,7,4,6,5
A low-high form means that we use the smallest followed by highest. Then we use the second smallest and second-highest.
Initially, when he asked, I used a secondary array and the two-pointer approach. I kept one pointer at the front and the second pointer at the end; then, one by one, I copied the left and right data to my new array, moving the pointers with left++ and --right until they crossed or became the same.
After this, he asked me to do it without memory.
My approach to solving it without extra memory was along the following lines, but it was confusing and not working:
1) swap 2nd and last in **odd** (pos index 1)
1,2,3,4,5,6,7,8,9 becomes
1,9,3,4,5,6,7,8,2
then we reach even
2) swap 3rd and last in **even** (pos index 2 we are at 3 )
1,9,3,4,5,6,7,8,2 becomes (swapped 3 and 2)
1,9,2,4,5,6,7,8,3
and then swap 8 and 3
1,9,2,4,5,6,7,8,3 becomes
1,9,2,4,5,6,7,3,8
3) we reach in odd (pos index 3 we are at 4 )
1,9,2,4,5,6,7,3,8
becomes
1,9,2,8,5,6,7,3,4
4) swap even 5 to last
and here it becomes wrong
Let me start by pointing out that even registers are a kind of memory. Without any 'extra' memory (other than that occupied by the sorted array, that is) we don't even have counters! That said, here goes:
Let a be an array of n > 2 positive integers sorted in ascending order, with the positions indexed from 0 to n-1.
From i = 1 to n-2, bubble-sort the sub-array ranging from position i to position n-1 (inclusive), alternately in descending and ascending order. (Meaning that you bubble-sort in descending order if i is odd and in ascending order if it is even.)
Since to bubble-sort you only need to compare, and possibly swap, adjacent elements, you won't need 'extra' memory.
(Mind you, if you start at i = 0 and first sort in ascending order, you don't even need a to be pre-sorted.)
And one more thing: as there was no talk of it in your question, I will keep very silent on the performance of the above algorithm...
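In Python, the idea might be sketched as follows (a naive bubble sort of each suffix, so on the order of O(n^3) comparisons; hence my silence on performance):

def low_high(a):
    n = len(a)
    for i in range(1, n - 1):
        descending = (i % 2 == 1)         # odd i: descending, even i: ascending
        for end in range(n - 1, i, -1):   # plain bubble sort of the suffix a[i:]
            for j in range(i, end):
                if (a[j] < a[j + 1]) == descending:
                    a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(low_high([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # [1, 9, 2, 8, 3, 7, 4, 6, 5]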
We will make n/2 passes, and during pass k we will swap each element, from left to right, starting with the element at position 2k-1, with the last element. Example:
pass 1
  V
1,2,3,4,5,6,7,8,9
1,9,3,4,5,6,7,8,2
1,9,2,4,5,6,7,8,3
1,9,2,3,5,6,7,8,4
1,9,2,3,4,6,7,8,5
1,9,2,3,4,5,7,8,6
1,9,2,3,4,5,6,8,7
1,9,2,3,4,5,6,7,8
pass 2
      V
1,9,2,3,4,5,6,7,8
1,9,2,8,4,5,6,7,3
1,9,2,8,3,5,6,7,4
1,9,2,8,3,4,6,7,5
1,9,2,8,3,4,5,7,6
1,9,2,8,3,4,5,6,7
pass 3
          V
1,9,2,8,3,4,5,6,7
1,9,2,8,3,7,5,6,4
1,9,2,8,3,7,4,6,5
1,9,2,8,3,7,4,5,6
pass 4
              V
1,9,2,8,3,7,4,5,6
1,9,2,8,3,7,4,6,5
This should take O(n^2) swaps and uses no extra memory beyond the counters involved.
The loop invariant to prove is that the first 2k+1 positions are correct after iteration k of the loop.
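A short Python sketch of these passes (pass k starts its swaps at 0-based position 2k-1):

def low_high_passes(a):
    n = len(a)
    for k in range(1, n // 2 + 1):
        for p in range(2 * k - 1, n - 1):   # swap each position with the last
            a[p], a[n - 1] = a[n - 1], a[p]
    return a

print(low_high_passes([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # [1, 9, 2, 8, 3, 7, 4, 6, 5]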
Alright, assuming that with constant space complexity, we need to lose some of our time complexity, the following algorithm possibly works in O(n^2) time complexity.
I wrote this in python. I wrote it as quickly as possible so apologies for any syntactical errors.
# s is the array passed.
def hi_low(s):
    last = len(s) - 1
    for i in range(0, last, 2):
        index_to_swap = last
        index_to_be_swapped = i + 1
        # bubble the last element leftwards until it reaches position i + 1
        while index_to_be_swapped != index_to_swap:
            s[index_to_swap], s[index_to_swap - 1] = s[index_to_swap - 1], s[index_to_swap]
            index_to_swap -= 1
    return s
Quick explanation:
Suppose the initial list given to us is:
1 2 3 4 5 6 7 8 9
So in our program, initially,
index_to_swap = last
meaning that it is pointing to 9, and
index_to_be_swapped = i+1
is i+1, i.e. one step ahead of our current loop pointer. [Also remember we're looping with a step of 2.]
So initially,
i = 0
index_to_be_swapped = 1
index_to_swap = 8 (the last index, which holds 9)
and in the inner loop we keep swapping until index_to_swap comes down to index_to_be_swapped (equivalently, until the values at both indexes are the same):
s[index_to_swap], s[index_to_swap - 1] = s[index_to_swap - 1], s[index_to_swap]
so it'll look like:
# initially:
1 2 3 4 5 6 7 8 9
  ^             ^---index_to_swap
  ^-----index_to_be_swapped
# after 1 loop
1 2 3 4 5 6 7 9 8
  ^           ^-----index_to_swap
  ^-----index_to_be_swapped
... goes on until
1 9 2 3 4 5 6 7 8
  ^-----index_to_swap
  ^-----index_to_be_swapped
Now, the inner loop's job is done, and the main loop is run again with
1 9 2 3 4 5 6 7 8
      ^         ^---- index_to_swap
      ^------index_to_be_swapped
This runs until the loop pointer is within 2 positions of the end.
So the outer loop runs almost n/2 times, and for each outer iteration the inner loop runs almost n/2 times in the worst case, so the time complexity is n/2 * n/2 = n^2/4, which is of the order of n^2, i.e. O(n^2).
If there are any mistakes please feel free to point them out.
Hope this helps!
It will work for any sorted array of consecutive integers:
let arr = [1, 2, 3, 4, 5, 6, 7, 8, 9];
let i = arr[0];
let j = arr[arr.length - 1];
let k = 0;
while(k < arr.length) {
arr[k] = i;
if(arr[k+1]) arr[k+1] = j;
i++;
k += 2;
j--;
}
console.log(arr);
Explanation: Because it's a sorted array of consecutive integers, you need to know 3 things to produce the expected output.
Starting Value : let i = arr[0]
Ending Value (you can also find it from the length of the array, by the way): let j = arr[arr.length - 1]
Length of Array: arr.length
Loop through the array and set the value like this
arr[firstIndex] = firstValue, arr[thirdIndex] = firstValue + 1 and so on..
arr[secondIndex] = lastValue, arr[fourthIndex] = lastValue - 1 and so on..
Obviously you can do the same thing in different ways, but I think that's the simplest way.

Find a duplicate in array of integers

This was an interview question.
I was given an array of n+1 integers from the range [1,n]. The property of the array is that it has k (k>=1) duplicates, and each duplicate can appear more than twice. The task was to find an element of the array that occurs more than once in the best possible time and space complexity.
After significant struggling, I proudly came up with an O(n log n) solution that takes O(1) space. My idea was to divide the range [1,n] into two halves and determine which of the two halves contains more elements from the input array (I was using the pigeonhole principle). The algorithm continues recursively until it reaches an interval [X,X] where X occurs twice, and that is a duplicate.
The interviewer was satisfied, but then he told me that there exists an O(n) solution with constant space. He generously offered a few hints (something related to permutations?), but I had no idea how to come up with such a solution. Assuming that he wasn't lying, can anyone offer guidelines? I have searched SO and found a few (easier) variations of this problem, but not this specific one. Thank you.
EDIT: In order to make things even more complicated, interviewer mentioned that the input array should not be modified.
Take the very last element (x).
Save the element at position x (y).
If x == y you found a duplicate.
Overwrite position x with x.
Assign x = y and continue with step 2.
You are basically sorting the array; it is possible because you know where each element has to be inserted. O(1) extra space and O(n) time complexity. You just have to be careful with the indices; for simplicity I assumed the first index is 1 here (not 0), so we don't have to do +1 or -1.
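In Python, the steps above might be sketched like this (1-based values, so position x is a[x - 1]; note that it modifies the input array):

def find_duplicate_destructive(a):
    x = a[-1]              # take the very last element (x)
    while True:
        y = a[x - 1]       # save the element at position x (y)
        if x == y:
            return x       # x == y: found a duplicate
        a[x - 1] = x       # overwrite position x with x
        x = y              # assign x = y and continue

print(find_duplicate_destructive([2, 3, 4, 1, 5, 4, 6, 7, 8]))  # 4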
Edit: without modifying the input array
This algorithm is based on the idea that we have to find the entry point of the permutation cycle, then we also found a duplicate (again 1-based array for simplicity):
Example:
2 3 4 1 5 4 6 7 8
Entry: 8 7 6
Permutation cycle: 4 1 2 3
As we can see the duplicate (4) is the first number of the cycle.
Finding the permutation cycle
x = last element
x = element at position x
repeat step 2. n times (in total), this guarantees that we entered the cycle
Measuring the cycle length
a = last x from above, b = last x from above, counter c = 0
a = element at position a, b = element at position b, b = element at position b, c++ (so we make 2 steps forward with b and 1 step forward in the cycle with a)
if a == b the cycle length is c, otherwise continue with step 2.
Finding the entry point to the cycle
x = last element
x = element at position x
repeat step 2. c times (in total)
y = last element
if x == y then x is a solution (x made one full cycle and y is just about to enter the cycle)
x = element at position x, y = element at position y
repeat steps 5. and 6. until a solution was found.
The 3 major steps are all O(n) and sequential therefore the overall complexity is also O(n) and the space complexity is O(1).
Example from above:
x takes the following values: 8 7 6 4 1 2 3 4 1 2
a takes the following values: 2 3 4 1 2
b takes the following values: 2 4 2 4 2
therefore c = 4 (yes there are 5 numbers but c is only increased when making steps, not initially)
x takes the following values: 8 7 6 4 | 1 2 3 4
y takes the following values: | 8 7 6 4
x == y == 4 in the end and this is a solution!
Example 2 as requested in the comments: 3 1 4 6 1 2 5
Entering cycle: 5 1 3 4 6 2 1 3
Measuring cycle length:
a: 3 4 6 2 1 3
b: 3 6 1 4 2 3
c = 5
Finding the entry point:
x: 5 1 3 4 6 | 2 1
y: | 5 1
x == y == 1 is a solution
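Putting the three phases together in Python (read-only input, O(1) extra space, 1-based values as above):

def find_duplicate(arr):
    n = len(arr)
    x = arr[-1]                      # phase 1: n steps guarantee we are in the cycle
    for _ in range(n):
        x = arr[x - 1]
    a = b = x                        # phase 2: measure the cycle length c
    c = 0
    while True:
        a = arr[a - 1]               # a makes 1 step
        b = arr[arr[b - 1] - 1]      # b makes 2 steps
        c += 1
        if a == b:
            break
    x = arr[-1]                      # phase 3: give x a head start of c steps
    for _ in range(c):
        x = arr[x - 1]
    y = arr[-1]
    while x != y:                    # x and y meet at the entry point of the cycle
        x = arr[x - 1]
        y = arr[y - 1]
    return x

print(find_duplicate([2, 3, 4, 1, 5, 4, 6, 7, 8]))  # 4
print(find_duplicate([3, 1, 4, 6, 1, 2, 5]))        # 1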
Here is a possible implementation:
function checkDuplicate(arr) {
console.log(arr.join(", "));
let len = arr.length
,pos = 0
,done = 0
,cur = arr[0]
;
while (done < len) {
if (pos === cur) {
cur = arr[++pos];
} else {
pos = cur;
if (arr[pos] === cur) {
console.log(`> duplicate is ${cur}`);
return cur;
}
cur = arr[pos];
}
done++;
}
console.log("> no duplicate");
return -1;
}
for (t of [
[0, 1, 2, 3]
,[0, 1, 2, 1]
,[1, 0, 2, 3]
,[1, 1, 0, 2, 4]
]) checkDuplicate(t);
It is basically the solution proposed by @maraca (typed too slowly!). It has constant space requirements (for the local variables), but apart from that only uses the original array for its storage. It should be O(n) in the worst case, because as soon as a duplicate is found, the process terminates.
If you are allowed to non-destructively modify the input vector, then it is pretty easy. Suppose we can "flag" an element in the input by negating it (which is obviously reversible). In that case, we can proceed as follows:
Note: The following assumes that the vector is indexed starting at 1. Since it is probably indexed starting at 0 (in most languages), you can implement "Flag item at index i" with "Negate the item at index i-1".
Set i to 0 and do the following loop:
Increment i until item i is unflagged.
Set j to i and do the following loop:
Set j to vector[j].
if the item at j is flagged, j is a duplicate. Terminate both loops.
Flag the item at j.
If j != i, continue the inner loop.
Traverse the vector setting each element to its absolute value (i.e. unflag everything to restore the vector).
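A Python sketch of this flagging scheme (1-based values; a flag is a negated entry, and everything is unflagged before returning):

def find_duplicate_flagging(a):
    dup = None
    i = 1
    while dup is None:
        while a[i - 1] < 0:        # increment i until item i is unflagged
            i += 1
        j = i
        while True:
            j = abs(a[j - 1])      # set j to vector[j], ignoring the flag
            if a[j - 1] < 0:       # item at j is flagged: j is a duplicate
                dup = j
                break
            a[j - 1] = -a[j - 1]   # flag the item at j
            if j == i:             # cycle closed, back to the outer loop
                break
        i += 1
    for k in range(len(a)):        # unflag everything to restore the vector
        a[k] = abs(a[k])
    return dup

print(find_duplicate_flagging([2, 3, 4, 1, 5, 4, 6, 7, 8]))  # 4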
It depends on what tools you (or your app) can use. A lot of frameworks/libraries exist nowadays. For example, with the C++ standard library you can use std::map<>, as maraca mentioned.
Or, if you have time, you can make your own implementation of a binary tree, keeping in mind that inserting elements differs from a usual array. In that case you can optimize the search for duplicates as far as your particular case allows.
binary tree expl. ref:
https://www.wikiwand.com/en/Binary_tree

Smallest number that cannot be formed from sum of numbers from array

This problem was asked to me in Amazon interview -
Given an array of positive integers, you have to find the smallest positive integer that cannot be formed from the sum of numbers from the array.
Example:
Array: [4 13 2 3 1]
result = 11 (since 11 is the smallest positive number that cannot be formed from the given array elements)
What I did was:
sorted the array
calculated the prefix sum
traversed the sum array and checked whether the next element is at most 1 greater than the sum, i.e. A[j] <= sum+1; if not, the answer is sum+1
But this was an O(n log n) solution.
The interviewer was not satisfied with this and asked for a solution in less than O(n log n) time.
There's a beautiful algorithm for solving this problem in time O(n + Sort), where Sort is the amount of time required to sort the input array.
The idea behind the algorithm is to sort the array and then ask the following question: what is the smallest positive integer you cannot make using the first k elements of the array? You then scan forward through the array from left to right, updating your answer to this question, until you find the smallest number you can't make.
Here's how it works. Initially, the smallest number you can't make is 1. Then, going from left to right, do the following:
If the current number is bigger than the smallest number you can't make so far, then you know the smallest number you can't make - it's the one you've got recorded, and you're done.
Otherwise, the current number is less than or equal to the smallest number you can't make. The claim is that you can indeed make this number. Right now, you know the smallest number you can't make with the first k elements of the array (call it candidate) and are now looking at value A[k]. The number candidate - A[k] therefore must be some number that you can indeed make with the first k elements of the array, since otherwise candidate - A[k] would be a smaller number than the smallest number you allegedly can't make with the first k numbers in the array. Moreover, you can make any number in the range candidate to candidate + A[k], inclusive, because you can start with any number in the range from 1 to A[k], inclusive, and then add candidate - 1 to it. Therefore, set candidate to candidate + A[k] and increment k.
In pseudocode:
Sort(A)
candidate = 1
for i from 1 to length(A):
if A[i] > candidate: return candidate
else: candidate = candidate + A[i]
return candidate
Here's a test run on [4, 13, 2, 1, 3]. Sort the array to get [1, 2, 3, 4, 13]. Then, set candidate to 1. We then do the following:
A[1] = 1, candidate = 1:
A[1] ≤ candidate, so set candidate = candidate + A[1] = 2
A[2] = 2, candidate = 2:
A[2] ≤ candidate, so set candidate = candidate + A[2] = 4
A[3] = 3, candidate = 4:
A[3] ≤ candidate, so set candidate = candidate + A[3] = 7
A[4] = 4, candidate = 7:
A[4] ≤ candidate, so set candidate = candidate + A[4] = 11
A[5] = 13, candidate = 11:
A[5] > candidate, so return candidate (11).
So the answer is 11.
The runtime here is O(n + Sort) because outside of sorting, the runtime is O(n). You can clearly sort in O(n log n) time using heapsort, and if you know some upper bound on the numbers you can sort in time O(n log U) (where U is the maximum possible number) by using radix sort. If U is a fixed constant (say, 10^9), then radix sort runs in time O(n) and this entire algorithm then runs in time O(n) as well.
Hope this helps!
Use bitvectors to accomplish this in linear time.
Start with an empty bitvector b. Then for each element k in your array, do this:
b = b | b << k | 2^(k-1)
To be clear, the i'th bit of b is set to 1 to represent the number i, and | 2^(k-1) sets the k-th bit to 1.
After you finish processing the array, the index of the first zero in b is your answer (counting from the right, starting at 1).
b=0
process 4: b = b | b<<4 | 1000 = 1000
process 13: b = b | b<<13 | 1000000000000 = 10001000000001000
process 2: b = b | b<<2 | 10 = 1010101000000101010
process 3: b = b | b<<3 | 100 = 1011111101000101111110
process 1: b = b | b<<1 | 1 = 11111111111001111111111
First zero: position 11.
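Python integers are arbitrary-precision, so one can use them directly as the bitvector; a small sketch:

def smallest_unreachable_sum(a):
    b = 0
    for k in a:
        b = b | (b << k) | (1 << (k - 1))  # old sums, old sums + k, and k itself
    i = 1
    while b & 1:                           # find the first zero, counting from 1
        b >>= 1
        i += 1
    return i

print(smallest_unreachable_sum([4, 13, 2, 3, 1]))  # 11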
Consider all integers in the interval [2^i .. 2^(i+1) - 1], and suppose all integers below 2^i can be formed from a sum of numbers from the given array. Also suppose that we already know C, which is the sum of all numbers below 2^i. If C >= 2^(i+1) - 1, every number in this interval may be represented as a sum of the given numbers. Otherwise we could check whether the interval [2^i .. C + 1] contains any number from the given array. And if there is no such number, C + 1 is what we searched for.
Here is a sketch of an algorithm:
For each input number, determine to which interval it belongs, and update corresponding sum: S[int_log(x)] += x.
Compute prefix sum for array S: foreach i: C[i] = C[i-1] + S[i].
Filter array C to keep only entries with values lower than next power of 2.
Scan the input array once more and notice which of the intervals [2^i .. C + 1] contain at least one input number: i = int_log(x) - 1; B[i] |= (x <= C[i] + 1).
Find first interval that is not filtered out on step #3 and corresponding element of B[] not set on step #4.
If it is not obvious why we can apply step 3, here is the proof. Choose any number between 2^i and C, then sequentially subtract from it all the numbers below 2^i in decreasing order. Eventually we get either some number less than the last subtracted number, or zero. If the result is zero, just add together all the subtracted numbers and we have the representation of the chosen number. If the result is non-zero and less than the last subtracted number, this result is also less than 2^i, so it is "representable", and none of the subtracted numbers are used for its representation. When we add these subtracted numbers back, we have the representation of the chosen number. This also suggests that instead of filtering intervals one by one we could skip several intervals at once by jumping directly to the int_log of C.
Time complexity is determined by the function int_log(), which is the integer logarithm, or the index of the highest set bit in the number. If our instruction set contains the integer logarithm or any of its equivalents (count leading zeros, or tricks with floating point numbers), then the complexity is O(n). Otherwise we could use some bit hacking to implement int_log() in O(log log U) and obtain O(n * log log U) time complexity. (Here U is the largest number in the array.)
If step 1 (in addition to updating the sum) also updates the minimum value in the given range, step 4 is not needed anymore. We could just compare C[i] to Min[i+1]. This means we need only a single pass over the input array. Or we could apply this algorithm not to an array but to a stream of numbers.
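A rough Python sketch of this single-pass variant, with int_log(x) = x.bit_length() - 1 and the per-bucket minimum tracked so step 4 is not needed (here C accumulates the sum of all numbers below 2^i):

def smallest_unreachable_buckets(a):
    nbuckets = max(x.bit_length() for x in a)      # int_log(max(a)) + 1
    S = [0] * (nbuckets + 1)
    Min = [None] * (nbuckets + 1)
    for x in a:
        i = x.bit_length() - 1                     # int_log(x)
        S[i] += x
        if Min[i] is None or x < Min[i]:
            Min[i] = x
    C = 0                                          # sum of all numbers below 2^i
    for i in range(nbuckets + 1):
        if C < 2 ** (i + 1) - 1:                   # interval not fully covered yet
            if Min[i] is None or Min[i] > C + 1:   # no input number in [2^i .. C+1]
                return C + 1
        C += S[i]
    return C + 1

print(smallest_unreachable_buckets([4, 13, 2, 3, 1]))  # 11
print(smallest_unreachable_buckets([1, 2, 3, 9]))      # 7
print(smallest_unreachable_buckets([1, 1, 2, 9]))      # 5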
Several examples:
      Input:  [ 4 13  2  3  1]   [ 1  2  3  9]   [ 1  1  2  9]
    int_log:    2  3  1  1  0      0  1  1  3      0  0  1  3

    int_log:    0  1  2  3         0  1  2  3      0  1  2  3
          S:    1  5  4 13         1  5  0  9      2  2  0  9
          C:    1  6 10 23         1  6  6 15      2  4  4 13
filtered(C):    n  n  n  n         n  n  n  n      n  n  n  n
  number in
 [2^i..C+1]:       2  4  -            2  -  -         2  -  -
        C+1:             11               7               5
For multi-precision input numbers this approach needs O(n * log M) time and O(log M) space, where M is the largest number in the array. The same time is needed just to read all the numbers (and in the worst case we need every bit of them).
Still, this result may be improved to O(n * log R), where R is the value found by this algorithm (actually, the output-sensitive variant of it). The only modification needed for this optimization is, instead of processing whole numbers at once, to process them digit-by-digit: the first pass processes the low-order bits of each number (like bits 0..63), the second pass the next bits (like 64..127), etc. We can ignore all higher-order bits after the result is found. This also decreases the space requirements to O(K) numbers, where K is the number of bits in a machine word.
If you sort the array, it will work for you. Counting sort could do it in O(n), but in a practically large scenario the range can be pretty high. Quicksort at O(n log n) will do the work for you:
def smallestPositiveInteger(array):
    candidate = 1
    n = len(array)
    array = sorted(array)
    for i in range(0, n):
        if array[i] <= candidate:
            candidate += array[i]
        else:
            break
    return candidate

Find intersection between two arrays with restrictions

I have to write a program in order to find the same numbers between two arrays.
The problem is that I have to do it in the most optimized way respecting some constraints:
-Having i,j indexes for the array A and w,x indexes for the array B, if A[i]=B[w] and A[j]=B[x] and i<j, then it must hold that w<x (the matches keep their relative order);
-The maximum distance between these numbers has to be k (given by input);
-I have to use at maximum O(k) space in order to implement something to optimize the search;
-The numbers appear only once in each array (like sets).
I was thinking about constructing a balanced red-black tree with k elements of the first array in order to optimize the search process, but I am in doubt about the space it requires (I think it's not O(k), because of the pointers and the color marking).
Does anyone have a better idea for this problem?
Edit: I'll put my examples here to make it more clear:
Array A: 3 7 5 9 10 15 16 1 6 2
Array B: 4 8 5 13 1 17 2 11
Constant k = 6
Output: 5 1 2
Edit2: In the output the numbers must appear in the same sequence as they are in the arrays.
Using K as Max Distance
Assuming that when you say they must be presented in array order, the order from one array is sufficient; that is, for:
A: 1 2
B: 2 1
the result is 1 2 or 2 1, and not either 1 or 2, since the ordering is crossed.
Note that the K constraint makes this less optimal
The first observation is that anything in the larger array past index (number of elements in the smaller array) + K - 1 can be ignored
The second observation is that all values are apparently int
The third observation is that this has to be optimal for huge arrays with a K that can be close to the size of the arrays
A radix sort is O(N) and takes O(N) size, so we will use that
In order to allow for K, we can copy both arrays to parallel arrays of (value, position) pairs, not copying values that are unreachable in the larger array as per observation 1, i.e.
A: 71, 23, 42 ==> A2: { 71, 0 }, { 23, 1 }, { 42, 2 }
We can also create a similar array for results that is the same size as the smaller array
We can modify the radix sort to move values and positions together
Algorithm:
1) Copy the arrays [ O(N) ]
2) Radix sort arrays A and B by value [ O(N) ]
3) Walk A and B in parallel: [ O(N) ]
if A < B -> increment index in A
if A > B -> increment index in B
if A == B -> increment index in A and B,
and add the original A to the result IF the position difference is less than K
4) Radix sort the results by position [ O(N) ]
5) Print the result values [ O(N) ]
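A toy Python sketch of this walk (the built-in sort stands in for the radix sort, so this version is O(n log n); "less than K" is taken literally):

def intersect_within_k(A, B, k):
    A2 = sorted((v, p) for p, v in enumerate(A))   # (value, position) pairs
    B2 = sorted((v, p) for p, v in enumerate(B))
    result = []
    i = j = 0
    while i < len(A2) and j < len(B2):             # step 3: walk A and B
        if A2[i][0] < B2[j][0]:
            i += 1
        elif A2[i][0] > B2[j][0]:
            j += 1
        else:
            if abs(A2[i][1] - B2[j][1]) < k:       # position difference below K
                result.append(A2[i])
            i += 1
            j += 1
    result.sort(key=lambda vp: vp[1])              # step 4: back to array order
    return [v for v, p in result]

print(intersect_within_k([3, 7, 5, 9, 10, 15, 16, 1, 6, 2],
                         [4, 8, 5, 13, 1, 17, 2, 11], 6))   # [5, 1, 2]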

Maximizing a particular sum over all possible subarrays

Consider an array like this one below:
{1, 5, 3, 5, 4, 1}
When we choose a subarray, we reduce it to the lowest number in the subarray. For example, the subarray {5, 3, 5} becomes {3, 3, 3}. Now, the sum of the subarray is defined as the sum of the resultant subarray. For example, {5, 3, 5} the sum is 3 + 3 + 3 = 9. The task is to find the largest possible sum that can be made from any subarray. For the above array, the largest sum is 12, given by the subarray {5, 3, 5, 4}.
Is it possible to solve this problem in time better than O(n^2)?
I believe that I have an algorithm for this that runs in O(n) time. I'll first describe an unoptimized version of the algorithm, then give a fully optimized version.
For simplicity, let's initially assume that all values in the original array are distinct. This isn't true in general, but it gives a good starting point.
The key observation behind the algorithm is the following. Find the smallest element in the array, then split the array into three parts - all elements to the left of the minimum, the minimum element itself, and all elements to the right of the minimum. Schematically, this would look something like
+-----------------------+-----+-----------------------+
| left values | min | right values |
+-----------------------+-----+-----------------------+
Here's the key observation: if you take the subarray that gives the optimum value, one of two things must be true:
That array consists of all the values in the array, including the minimum value. This has total value min * n, where n is the number of elements.
That array does not include the minimum element. In that case, the subarray has to be purely to the left or to the right of the minimum value and cannot include the minimum value itself.
This gives a nice initial recursive algorithm for solving this problem:
If the sequence is empty, the answer is 0.
If the sequence is nonempty:
Find the minimum value in the sequence.
Return the maximum of the following:
The best answer for the subarray to the left of the minimum.
The best answer for the subarray to the right of the minimum.
The number of elements times the minimum.
So how efficient is this algorithm? Well, that really depends on where the minimum elements are. If you think about it, we do linear work to find the minimum, then divide the problem into two subproblems and recurse on each. This is the exact same recurrence you get when considering quicksort. This means that in the best case it will take Θ(n log n) time (if we always have the minimum element in the middle of each half), but in the worst case it will take Θ(n^2) time (if we always have the minimum value purely on the far left or the far right).
Notice, however, that all of the effort we're spending is being used to find the minimum value in each of the subarrays, which takes O(k) time for k elements. What if we could speed this up to O(1) time? In that case, our algorithm would do a lot less work. More specifically, it would do only O(n) work. The reason for this is the following: each time we make a recursive call, we do O(1) work to find the minimum element, then remove that element from the array and recursively process the remaining pieces. Each element can therefore be the minimum element of at most one of the recursive calls, and so the total number of recursive calls can't be any greater than the number of elements. This means that we make at most O(n) calls that each do O(1) work, which gives a total of O(n) work.
So how exactly do we get this magical speedup? This is where we get to use a surprisingly versatile and underappreciated data structure called the Cartesian tree. A Cartesian tree is a binary tree created out of a sequence of elements that has the following properties:
Each node is smaller than its children, and
An inorder walk of the Cartesian tree gives back the elements of the sequence in the order in which they appear.
For example, the sequence 4 6 7 1 5 0 2 8 3 has this Cartesian tree:
          0
        /   \
       1     2
      / \     \
     4   5     3
      \       /
       6     8
        \
         7
And here's where we get the magic. We can immediately find the minimum element of the sequence by just looking at the root of the Cartesian tree - that takes only O(1) time. Once we've done that, when we make our recursive calls and look at all the elements to the left of or to the right of the minimum element, we're just recursively descending into the left and right subtrees of the root node, which means that we can read off the minimum elements of those subarrays in O(1) time each. Nifty!
The real beauty is that it is possible to construct a Cartesian tree for a sequence of n elements in O(n) time. This algorithm is detailed in this section of the Wikipedia article. This means that we can get a super fast algorithm for solving your original problem as follows:
Construct a Cartesian tree for the array.
Use the above recursive algorithm, but use the Cartesian tree to find the minimum element rather than doing a linear scan each time.
Overall, this takes O(n) time and uses O(n) space, which is a time improvement over the O(n^2) algorithm you had initially.
At the start of this discussion, I made the assumption that all array elements are distinct, but this isn't really necessary. You can still build a Cartesian tree for an array with non-distinct elements in it by changing the requirement that each node is smaller than its children to be that each node is no bigger than its children. This doesn't affect the correctness of the algorithm or its runtime; I'll leave that as the proverbial "exercise to the reader." :-)
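Here is a compact Python sketch of the whole algorithm: the standard O(n) stack-based Cartesian tree construction (with the "no bigger than" tie-break, so duplicates are fine), followed by a post-order pass that computes subtree sizes and maximizes min * size:

def max_min_sum(a):
    n = len(a)
    if n == 0:
        return 0
    left = [-1] * n                 # children of the Cartesian tree, as indices
    right = [-1] * n
    spine = []                      # rightmost spine; values are non-decreasing
    for i in range(n):
        last = -1
        while spine and a[spine[-1]] > a[i]:
            last = spine.pop()      # popped chain becomes i's left subtree
        left[i] = last
        if spine:
            right[spine[-1]] = i    # hang i off the spine
        spine.append(i)
    root = spine[0]
    order, stack = [], [root]       # DFS order: parents before children
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(c for c in (left[v], right[v]) if c != -1)
    size = [1] * n
    best = 0
    for v in reversed(order):       # children first, so sizes are ready
        for c in (left[v], right[v]):
            if c != -1:
                size[v] += size[c]
        best = max(best, a[v] * size[v])   # a[v] is the minimum of its range
    return best

print(max_min_sum([1, 5, 3, 5, 4, 1]))  # 12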
This was a cool problem! I hope this helps!
Assuming that the numbers are all non-negative, isn't this just the "maximize the rectangle area in a histogram" problem, which has now become famous?
O(n) solutions are possible. This site: http://blog.csdn.net/arbuckle/article/details/710988 has a bunch of neat solutions.
To elaborate what I am thinking (it might be incorrect) think of each number as histogram rectangle of width 1.
By "minimizing" a subarray [i,j] and adding up, you are basically getting the area of the rectangle in the histogram which spans from i to j.
This has appeared before on SO: Maximize the rectangular area under Histogram, you find code and explanation, and a link to the official solutions page (http://www.informatik.uni-ulm.de/acm/Locals/2003/html/judge.html).
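For completeness, a sketch of that standard stack solution adapted to this problem (each a[i] is a bar of height a[i] and width 1; a sentinel 0 flushes the stack at the end):

def max_min_sum_stack(a):
    best = 0
    stack = []                                # (start index, height), heights increasing
    for i, h in enumerate(list(a) + [0]):     # sentinel 0 empties the stack
        start = i
        while stack and stack[-1][1] >= h:
            s, sh = stack.pop()
            best = max(best, sh * (i - s))    # bars s..i-1 all have height >= sh
            start = s
        stack.append((start, h))
    return best

print(max_min_sum_stack([1, 5, 3, 5, 4, 1]))  # 12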
The following algorithm I tried will have the order of whatever algorithm is initially used to sort the array. For example, if the initial array is sorted with binary tree sort, it will be O(n) in the best case and O(n log n) in the average case.
Gist of algorithm:
The array is sorted. The sorted values and the corresponding old indices are stored. A binary search tree is created from the corresponding old indices, which is used to determine how far one can go forwards and backwards without encountering a value less than the current value, which will result in the maximum possible subarray.
I will explain the method with the array in the question [1, 5, 3, 5, 4, 1]
                  1  5  3  5  4  1
                  -----------------
array indices =>  0  1  2  3  4  5
                  -----------------
This array is sorted. Store the values and their indices in ascending order, which will be as follows:
                            1  1  3  4  5  5
                            -----------------
original array indices =>   0  5  2  4  1  3
(referred to as old_index)  -----------------
It is important to have a reference to both the value and its old index, like an associative array.
Few terms to be clear:
old_index refers to the corresponding original index of an element (that is, its index in the original array);
For example, for element 4, old_index is 4; current_index is 3;
whereas current_index refers to the index of the element in the sorted array;
current_array_value refers to the current element value in the sorted array.
pre refers to the inorder predecessor; succ refers to the inorder successor.
Also, the min and max values can be obtained directly from the first and last elements of the sorted array, min_value and max_value respectively.
Now, the algorithm is as follows which should be performed on sorted array.
Algorithm:
Proceed from the left most element.
For each element from the left of the sorted array, apply this algorithm
if(element == min_value){
max_sum = element * array_length;
if(max_sum > current_max)
current_max = max_sum;
push current index into the BST;
}else if(element == max_value){
//here current index is the index in the sorted array
max_sum = element * (array_length - current_index);
if(max_sum > current_max)
current_max = max_sum;
push current index into the BST;
}else {
//pseudo code steps to determine maximum possible sub array with the current element
//pre is inorder predecessor and succ is inorder successor
get the inorder predecessor and successor from the BST;
if(pre == NULL){
max_sum = succ * current_array_value;
if(max_sum > current_max)
current_max = max_sum;
}else if (succ == NULL){
max_sum = ((array_length - pre) - 1) * current_array_value;
if(max_sum > current_max)
current_max = max_sum;
}else {
//find the maximum possible sub array streak from the values
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
if(max_sum > current_max)
current_max = max_sum;
}
}
For example,
original array is
                  1  5  3  5  4  1
                  -----------------
array indices =>  0  1  2  3  4  5
                  -----------------
and the sorted array is
                            1  1  3  4  5  5
                            -----------------
original array indices =>   0  5  2  4  1  3
(referred to as old_index)  -----------------
After first element:
max_sum = 6 [it will reduce to 1*6]
0
After second element:
max_sum = 6 [it will reduce to 1*6]
0
 \
  5
After third element:
0
 \
  5
 /
2
inorder traversal results in: 0 2 5
applying the algorithm,
max_sum = [((succ - old_index) - 1) + ((old_index - pre) - 1) + 1] * current_array_value;
max_sum = [((5-2)-1) + ((2-0)-1) + 1] * 3
= 12
current_max = 12 [the maximum possible value]
After fourth element:
0
 \
  5
 /
2
 \
  4
inorder traversal results in: 0 2 4 5
applying the algorithm,
max_sum = 8 [which is discarded since it is less than 12]
After fifth element:
max_sum = 10 [reduces to 2 * 5, discarded since it is less than 12]
After last element:
max_sum = 5 [reduces to 1 * 5, discarded since it is less than 12]
This algorithm will have the order of whatever algorithm is initially used to sort the array. For example, if the initial array is sorted with binary tree sort, it will be O(n) in the best case and O(n log n) in the average case.
The space complexity will be O(3n) = O(n) [O(n + n + n): n for the sorted values, another n for the old indices, and another n for constructing the BST]. However, I'm not sure about this. Any feedback on the algorithm is appreciated.
